<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Problem Parsing XML in Discussions</title>
    <link>https://community.jmp.com/t5/Discussions/Problem-Parsing-XML/m-p/79326#M36678</link>
    <description>&lt;P&gt;I need to parse an HTML page to extract some information, but I am getting unexpected results; see the script below.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;HTMLPageAsText = "
&amp;lt;html&amp;gt;
	&amp;lt;head&amp;gt;
		&amp;lt;title&amp;gt;String 1 i want to get&amp;lt;/title&amp;gt;
	&amp;lt;/head&amp;gt;
	&amp;lt;body&amp;gt;String 2 i want to get
		&amp;lt;h2&amp;gt;Possibly also this&amp;lt;/h2&amp;gt;
		&amp;lt;h2&amp;gt;why i get only this&amp;lt;/h2&amp;gt;
	&amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;
";

PageTitle="";
PageBody="";
	
	Parse XML( HTMLPageAsText,
		On Element( "title", 
			End Tag( PageTitle=XML text();show("Found title") ) 
		),
							
		On Element( "body", 
			End Tag( PageBody=XML text();show("Found body") ) 
		),
	);
	show(PageTitle,PageBody);&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;What I ideally need in the variable PageBody is:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;PageBody="String 2 i want to get
&amp;lt;h2&amp;gt;Possibly also this&amp;lt;/h2&amp;gt;
&amp;lt;h2&amp;gt;why i get only this&amp;lt;/h2&amp;gt;"&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Or, if it is only possible to get the content of the tag excluding its subtags, I would expect to get:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;PageBody="String 2 i want to get"&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Instead, what I get is:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;PageBody="why i get only this"&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;What am I doing wrong?&lt;/P&gt;</description>
    <pubDate>Wed, 17 Oct 2018 13:59:44 GMT</pubDate>
    <dc:creator>peri_a</dc:creator>
    <dc:date>2018-10-17T13:59:44Z</dc:date>
    <item>
      <title>Problem Parsing XML</title>
      <link>https://community.jmp.com/t5/Discussions/Problem-Parsing-XML/m-p/79326#M36678</link>
      <description>&lt;P&gt;I need to parse an HTML page to extract some information, but I am getting unexpected results; see the script below.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;HTMLPageAsText = "
&amp;lt;html&amp;gt;
	&amp;lt;head&amp;gt;
		&amp;lt;title&amp;gt;String 1 i want to get&amp;lt;/title&amp;gt;
	&amp;lt;/head&amp;gt;
	&amp;lt;body&amp;gt;String 2 i want to get
		&amp;lt;h2&amp;gt;Possibly also this&amp;lt;/h2&amp;gt;
		&amp;lt;h2&amp;gt;why i get only this&amp;lt;/h2&amp;gt;
	&amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;
";

PageTitle="";
PageBody="";
	
	Parse XML( HTMLPageAsText,
		On Element( "title", 
			End Tag( PageTitle=XML text();show("Found title") ) 
		),
							
		On Element( "body", 
			End Tag( PageBody=XML text();show("Found body") ) 
		),
	);
	show(PageTitle,PageBody);&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;What I ideally need in the variable PageBody is:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;PageBody="String 2 i want to get
&amp;lt;h2&amp;gt;Possibly also this&amp;lt;/h2&amp;gt;
&amp;lt;h2&amp;gt;why i get only this&amp;lt;/h2&amp;gt;"&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Or, if it is only possible to get the content of the tag excluding its subtags, I would expect to get:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;PageBody="String 2 i want to get"&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Instead, what I get is:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;PageBody="why i get only this"&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;What am I doing wrong?&lt;/P&gt;</description>
      <pubDate>Wed, 17 Oct 2018 13:59:44 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Problem-Parsing-XML/m-p/79326#M36678</guid>
      <dc:creator>peri_a</dc:creator>
      <dc:date>2018-10-17T13:59:44Z</dc:date>
    </item>
    <item>
      <title>Re: Problem Parsing XML</title>
      <link>https://community.jmp.com/t5/Discussions/Problem-Parsing-XML/m-p/79368#M36681</link>
      <description>&lt;P&gt;As far as I can tell, you are only missing something that may never have been documented: Text(), which is used in the same way as End Tag().&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;HTMLPageAsText = "
&amp;lt;html&amp;gt;
	&amp;lt;head&amp;gt;
		&amp;lt;title&amp;gt;one of four&amp;lt;/title&amp;gt;
	&amp;lt;/head&amp;gt;
	&amp;lt;body&amp;gt;two of four
		&amp;lt;h2&amp;gt;three of four&amp;lt;/h2&amp;gt;
		&amp;lt;h2&amp;gt;four of four&amp;lt;/h2&amp;gt;
	&amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;
";

title = "";
body = "";
	
Parse XML( HTMLPageAsText,
	On Element( "title", 
		End Tag( title = XML Text(); ) 
	), 				
	On Element( "body", 
		Text( body = body  || XML Text(); ), 
	)
);

show(title,body);&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;STRONG&gt;title = "one of four";&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;body = "two of four&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt; three of fourfour of four";&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Text() runs each time a new snippet of text is processed, which is why the handler appends each snippet to body instead of overwriting it.&lt;/LI&gt;
&lt;LI&gt;HTML and XML are not usually the same; your example works because it is also valid XML. A number of HTML tags, such as &amp;lt;br&amp;gt;, have no matching &amp;lt;/br&amp;gt; and will break an XML reader.&lt;/LI&gt;
&lt;LI&gt;If the XML is a bit more complicated, you might need to track the nesting levels too.&lt;/LI&gt;
&lt;/UL&gt;
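&lt;P&gt;If the headings are needed separately rather than run together in body, one option is to give "h2" its own handler. This is an untested sketch that reuses HTMLPageAsText from the script above and only the End Tag() and XML Text() pattern already used for "title"; the headings list and the h variable are names made up for illustration.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;headings = {};

Parse XML( HTMLPageAsText,
	On Element( "h2",
		// End Tag() runs once for each h2 element, after its text has been read
		End Tag(
			h = XML Text();
			Insert Into( headings, h );
		)
	)
);

Show( headings );&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Each heading can then be read by position, for example headings[1].&lt;/P&gt;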
&lt;P&gt;(I reworked your example while I was puzzling over how to do it. I'll see if I can get this documented. Thanks!)&lt;/P&gt;</description>
      <pubDate>Wed, 17 Oct 2018 15:20:17 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Problem-Parsing-XML/m-p/79368#M36681</guid>
      <dc:creator>Craige_Hales</dc:creator>
      <dc:date>2018-10-17T15:20:17Z</dc:date>
    </item>
    <item>
      <title>Re: Problem Parsing XML</title>
      <link>https://community.jmp.com/t5/Discussions/Problem-Parsing-XML/m-p/79376#M36683</link>
      <description>&lt;P&gt;Thanks!&lt;/P&gt;&lt;P&gt;This will do for now.&lt;/P&gt;&lt;P&gt;However, for future development, an additional command for On Element(), called something like XML Body(), that returns the whole content of a tag (including the nested tags as text) could be really useful.&lt;/P&gt;&lt;P&gt;On the comment about HTML vs. XML, I get the point. If the web page becomes more complicated, I will try to handle it by substituting &amp;lt;br/&amp;gt; for &amp;lt;br&amp;gt; so that it is XML compliant. I could even envision a loop that tracks unclosed tags and replaces them with the XML-correct version. However, it would be great if the XML parser raised a warning but continued execution in case of errors, similar to what web browsers do with faulty HTML.&lt;/P&gt;&lt;P&gt;Finally, regarding documentation: I agree that the documentation for Parse XML is somewhat minimal, so while you are updating it, please also include the second argument of XML Attr(). At the moment the documentation states that only 0 or 1 arguments are possible; however, I found (and used) a piece of code that passes 2 arguments, where the second is the string returned if the attribute is not found.&lt;/P&gt;&lt;P&gt;I think this is a useful feature that is not documented at the moment.&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Wed, 17 Oct 2018 15:42:14 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Problem-Parsing-XML/m-p/79376#M36683</guid>
      <dc:creator>peri_a</dc:creator>
      <dc:date>2018-10-17T15:42:14Z</dc:date>
    </item>
  </channel>
</rss>

