<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Pulling data from a webpage with multiple table pages in Discussions</title>
    <link>https://community.jmp.com/t5/Discussions/Pulling-data-from-a-webpage-with-multiple-table-pages/m-p/625399#M82420</link>
    <description>&lt;P&gt;OK, I see. I studied your blog post carefully.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks, Craige!&lt;/P&gt;</description>
    <pubDate>Mon, 24 Apr 2023 10:30:45 GMT</pubDate>
    <dc:creator>lala</dc:creator>
    <dc:date>2023-04-24T10:30:45Z</dc:date>
    <item>
      <title>Pulling data from a webpage with multiple table pages</title>
      <link>https://community.jmp.com/t5/Discussions/Pulling-data-from-a-webpage-with-multiple-table-pages/m-p/624732#M82349</link>
      <description>&lt;P&gt;Hey folks,&lt;/P&gt;&lt;P&gt;I am pulling data from a webpage but running into a problem. I can't share the web address for IP reasons.&lt;/P&gt;&lt;P&gt;When I pull the page, it opens a table but not all of the rows; it only opens the first 1000 rows:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="jearls11_1-1682064854778.png" style="width: 400px;"&gt;&lt;img src="https://community.jmp.com/t5/image/serverpage/image-id/52192i36A73D21AC2D605C/image-size/medium?v=v2&amp;amp;px=400" role="button" title="jearls11_1-1682064854778.png" alt="jearls11_1-1682064854778.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;open("URL", HTML Table(2));&lt;/PRE&gt;&lt;P&gt;The problem is that the table is split across 3 pages (see screenshot):&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="jearls11_0-1682064805144.png" style="width: 400px;"&gt;&lt;img src="https://community.jmp.com/t5/image/serverpage/image-id/52191i7B453BCFD9C5134D/image-size/medium?v=v2&amp;amp;px=400" role="button" title="jearls11_0-1682064805144.png" alt="jearls11_0-1682064805144.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My question is: how do I pull the next two pages?&lt;BR /&gt;Thank you&lt;/P&gt;</description>
      <pubDate>Fri, 09 Jun 2023 16:08:39 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Pulling-data-from-a-webpage-with-multiple-table-pages/m-p/624732#M82349</guid>
      <dc:creator>jearls11</dc:creator>
      <dc:date>2023-06-09T16:08:39Z</dc:date>
    </item>
    <item>
      <title>Re: Pulling data from a webpage with multiple table pages</title>
      <link>https://community.jmp.com/t5/Discussions/Pulling-data-from-a-webpage-with-multiple-table-pages/m-p/624918#M82366</link>
      <description>&lt;P&gt;When you select the next page on the website in a browser, does the URL change in a way that you can decipher? If so, you could run your Open() script multiple times, once for each page's URL, and then use the Concatenate() command to combine all the tables.&lt;/P&gt;</description>
      <pubDate>Fri, 21 Apr 2023 16:01:50 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Pulling-data-from-a-webpage-with-multiple-table-pages/m-p/624918#M82366</guid>
      <dc:creator>Jed_Campbell</dc:creator>
      <dc:date>2023-04-21T16:01:50Z</dc:date>
    </item>
    <item>
      <title>Re: Pulling data from a webpage with multiple table pages</title>
      <link>https://community.jmp.com/t5/Discussions/Pulling-data-from-a-webpage-with-multiple-table-pages/m-p/625152#M82401</link>
      <description>&lt;P&gt;I tried this, but the URL remains the same; I also tried page=2. Unfortunately, as it's an internal system, I cannot share it.&lt;/P&gt;</description>
      <pubDate>Sat, 22 Apr 2023 15:42:06 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Pulling-data-from-a-webpage-with-multiple-table-pages/m-p/625152#M82401</guid>
      <dc:creator>jearls11</dc:creator>
      <dc:date>2023-04-22T15:42:06Z</dc:date>
    </item>
    <item>
      <title>Re: Pulling data from a webpage with multiple table pages</title>
      <link>https://community.jmp.com/t5/Discussions/Pulling-data-from-a-webpage-with-multiple-table-pages/m-p/625166#M82403</link>
      <description>&lt;P&gt;There are several approaches, in addition to &lt;a href="https://community.jmp.com/t5/user/viewprofilepage/user-id/610"&gt;@Jed_Campbell&lt;/a&gt;'s idea (which is the easiest if the browser's address bar changes in a predictable way).&lt;/P&gt;
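&lt;P&gt;If the page number does show up in the address, the "one URL per page, then concatenate" idea might be sketched like this. The sketch is in Python only because it is easy to check. In JMP you would call Open() once per URL and then Concatenate(). The &lt;EM&gt;page&lt;/EM&gt; query parameter, the example address, and the helper names are all hypothetical, and each "table" here is simply a list of row dictionaries.&lt;/P&gt;
&lt;PRE&gt;
```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(base_url, page, param="page"):
    """Return base_url with the (hypothetical) page parameter set to `page`."""
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))  # keep any parameters already present
    query[param] = str(page)
    return urlunsplit(parts._replace(query=urlencode(query)))


def concatenate(tables):
    """Stack tables (lists of row dicts) top to bottom, like JSL Concatenate()."""
    combined = []
    for table in tables:
        combined.extend(table)
    return combined


# Usage with a made-up address: one URL per page, pages 1 through 3.
urls = [page_url("https://example.com/report?view=all", n) for n in range(1, 4)]
print(urls)
```
&lt;/PRE&gt;
&lt;P&gt;Each URL would then be opened separately, and the resulting tables passed to the concatenate step.&lt;/P&gt;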
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;0. If the site provides an API, possibly a &lt;EM&gt;REST&lt;/EM&gt; service, that will be the best choice. Maybe you can ask the owner of the system for an API.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;or, 1. Press F12 in your browser (I'm using Firefox, but Chrome and Edge are similar) and pick the Network tab, which shows you what is sent over the network. Click the next-page button on the web site and start studying the requests that are sent. It's hard to say for sure what you are looking for, but it will often be JSON data. You'll see whether each request was a POST or a GET and which headers were used. If there is a choice, JSON will likely be easier to work with than HTML.&lt;/P&gt;
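&lt;P&gt;Once the Network tab shows which request returns the rows, the paging loop itself is short. Below is a minimal Python sketch. It assumes, hypothetically, that the endpoint takes a page number and returns a JSON list of rows that is empty past the last page. &lt;EM&gt;fetch_json&lt;/EM&gt;, &lt;EM&gt;fetch_all_pages&lt;/EM&gt;, &lt;EM&gt;example_page&lt;/EM&gt;, and the example URL are made up for illustration. They are not part of any real system.&lt;/P&gt;
&lt;PRE&gt;
```python
import json
import urllib.request
from urllib.parse import urlencode


def fetch_json(url, headers=None):
    """GET url and decode the JSON body (needs network access)."""
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_all_pages(fetch_page, start=1, max_pages=1000):
    """Call fetch_page(n) for n = start, start + 1, ... until a page comes back empty.

    max_pages is a safety limit so a misbehaving endpoint cannot loop forever.
    """
    rows = []
    for page in range(start, start + max_pages):
        batch = fetch_page(page)
        if not batch:  # an empty page means we are past the last one
            break
        rows.extend(batch)
    return rows


def example_page(n):
    """Hypothetical endpoint https://example.com/api/rows?page=n returning a JSON list."""
    return fetch_json("https://example.com/api/rows?" + urlencode({"page": n}))
```
&lt;/PRE&gt;
&lt;P&gt;Usage: rows = fetch_all_pages(example_page). Because the fetching function is passed in, you can check the loop offline with a stub. The same loop also works if the real request turns out to be a POST.&lt;/P&gt;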
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;or, 2. This will be significantly harder, but will often work when other choices fail. &lt;LI-MESSAGE title="Browser Scripting with Python Selenium" uid="485000" url="https://community.jmp.com/t5/Uncharted/Browser-Scripting-with-Python-Selenium/m-p/485000#U485000" discussion_style_icon_css="lia-mention-container-editor-message lia-img-icon-blog-thread lia-fa-icon lia-fa-blog lia-fa-thread lia-fa"&gt;&lt;/LI-MESSAGE&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Scraping data from web pages is hard in the best case. In the worst case, some sites actively try to prevent automated bots from working, generally by detecting &lt;EM&gt;not a real browser&lt;/EM&gt; or &lt;EM&gt;not a real human&lt;/EM&gt;. Sites that don't update the address bar are using some sort of AJAX-like protocol with JavaScript. Those requests will show up in the F12 window, perhaps with an easy-to-decode URL and headers, and sometimes with a cookie, encoded parameters, or a password.&lt;/P&gt;
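&lt;P&gt;When the F12 window shows such a request, you can often replay it by copying its URL, headers, and cookie. Here is a hedged Python sketch that uses only the standard library. The URL, header values, and JSON payload are placeholders, not anything from the original system, and some sites will still reject a replayed request. &lt;EM&gt;build_request&lt;/EM&gt; is a hypothetical helper name.&lt;/P&gt;
&lt;PRE&gt;
```python
import json
import urllib.request


def build_request(url, headers, payload=None):
    """Rebuild a request copied from the browser's Network tab.

    With a payload the request becomes a POST carrying a JSON body;
    without one it is a plain GET. Nothing is sent until urlopen() is called.
    """
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers = {**headers, "Content-Type": "application/json"}
    return urllib.request.Request(url, data=data, headers=headers)


# Placeholder values; copy the real ones from the F12 Network tab.
copied_headers = {
    "Accept": "application/json",
    "Cookie": "session=PLACEHOLDER",
}
request = build_request("https://example.com/api/rows", copied_headers, {"page": 2})
# To actually send it: rows = json.loads(urllib.request.urlopen(request).read())
```
&lt;/PRE&gt;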
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;JMP has special handling for HTML tables, but that might not work for you unless there is a simple URL that returns each page as HTML with a table. JMP also has JSON, XML, and CSV import wizards that might help.&lt;/P&gt;</description>
      <pubDate>Sun, 23 Apr 2023 04:07:40 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Pulling-data-from-a-webpage-with-multiple-table-pages/m-p/625166#M82403</guid>
      <dc:creator>Craige_Hales</dc:creator>
      <dc:date>2023-04-23T04:07:40Z</dc:date>
    </item>
    <item>
      <title>Re: Pulling data from a webpage with multiple table pages</title>
      <link>https://community.jmp.com/t5/Discussions/Pulling-data-from-a-webpage-with-multiple-table-pages/m-p/625168#M82404</link>
      <description>&lt;P&gt;If the web page uses JavaScript to perform a timestamp conversion for its requests, how can I implement that JavaScript algorithm in JSL so that JSL can download the web data smoothly?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks, Craige!&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="2023-04-23_12-46-42.png" style="width: 890px;"&gt;&lt;img src="https://community.jmp.com/t5/image/serverpage/image-id/52237iCFAF9B89B0E2A6BB/image-size/large?v=v2&amp;amp;px=999" role="button" title="2023-04-23_12-46-42.png" alt="2023-04-23_12-46-42.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 23 Apr 2023 04:47:23 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Pulling-data-from-a-webpage-with-multiple-table-pages/m-p/625168#M82404</guid>
      <dc:creator>lala</dc:creator>
      <dc:date>2023-04-23T04:47:23Z</dc:date>
    </item>
    <item>
      <title>Re: Pulling data from a webpage with multiple table pages</title>
      <link>https://community.jmp.com/t5/Discussions/Pulling-data-from-a-webpage-with-multiple-table-pages/m-p/625209#M82407</link>
      <description>&lt;P&gt;A proper API will always be the best choice. An API will send back the data, typically as JSON, without other annoying artifacts like popups, &amp;lt;DIV&amp;gt;, etc. APIs are sometimes &lt;A href="https://en.wikipedia.org/wiki/Representational_state_transfer" target="_self"&gt;&lt;STRONG&gt;&lt;EM&gt;rest&lt;/EM&gt;&lt;/STRONG&gt;&lt;/A&gt; services.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Some websites don't want to be scraped. They lose money when ads are not viewed, bandwidth costs money, and they may have to pay for access to the data they pass along. If the site wants to make sure a human is using a real browser to access the data, using Selenium might violate its terms of service regarding robot scraping. Checking for a browser that can run JavaScript is one of the checks a site can do, and a browser controlled by Selenium runs JavaScript just like a normal browser.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Without Selenium, when you use Open() or LoadTextFile() to read a URL, there is no browser. The JavaScript is just part of the text that is returned; it is not executed (just as the HTML is part of the returned text and is not rendered to the display).&lt;/P&gt;
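&lt;P&gt;To see this for yourself, fetch a page's text and list the script elements in it. They arrive as plain text that nothing has run. Below is a small Python sketch using the standard html.parser module. &lt;EM&gt;ScriptCollector&lt;/EM&gt; and &lt;EM&gt;scripts_in&lt;/EM&gt; are made-up names for this example.&lt;/P&gt;
&lt;PRE&gt;
```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the source text of every script element in an HTML document.

    Reading a URL as text (Open(), LoadTextFile(), urllib, ...) returns this
    JavaScript as characters only; no browser is involved, so none of it runs.
    """

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.scripts[-1] += data  # script text may arrive in several chunks


def scripts_in(html_text):
    """Return the JavaScript source found in html_text, one string per script element."""
    collector = ScriptCollector()
    collector.feed(html_text)
    collector.close()
    return collector.scripts
```
&lt;/PRE&gt;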
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://www.google.com/search?q=use+selenium+to+run+javascript" target="_self"&gt;https://www.google.com/search?q=use+selenium+to+run+javascript&lt;/A&gt;&amp;nbsp; - a bunch of ideas if you actually need to run some Javascript&amp;amp;colon; JSL-&amp;gt;Python-&amp;gt;Selenium-&amp;gt;JavaScript.&lt;/P&gt;</description>
      <pubDate>Sun, 23 Apr 2023 15:00:25 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Pulling-data-from-a-webpage-with-multiple-table-pages/m-p/625209#M82407</guid>
      <dc:creator>Craige_Hales</dc:creator>
      <dc:date>2023-04-23T15:00:25Z</dc:date>
    </item>
    <item>
      <title>Re: Pulling data from a webpage with multiple table pages</title>
      <link>https://community.jmp.com/t5/Discussions/Pulling-data-from-a-webpage-with-multiple-table-pages/m-p/625379#M82416</link>
      <description>&lt;P&gt;Thanks, Craige!&lt;/P&gt;</description>
      <pubDate>Mon, 24 Apr 2023 09:45:47 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Pulling-data-from-a-webpage-with-multiple-table-pages/m-p/625379#M82416</guid>
      <dc:creator>lala</dc:creator>
      <dc:date>2023-04-24T09:45:47Z</dc:date>
    </item>
    <item>
      <title>Re: Pulling data from a webpage with multiple table pages</title>
      <link>https://community.jmp.com/t5/Discussions/Pulling-data-from-a-webpage-with-multiple-table-pages/m-p/625395#M82418</link>
      <description>I have not done it myself, but I believe you can use JMP to run Selenium, load the page with the JavaScript code, and then use Selenium to call the page's JavaScript functions from JMP. Or you can just use Selenium to operate the page. I would not try to convert a JavaScript function to JSL. The JSL+Python+Selenium approach in the blog post does pretty much what you are describing: a robot operating a real browser to scrape a web page, even supplying a password.</description>
      <pubDate>Mon, 24 Apr 2023 10:25:10 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Pulling-data-from-a-webpage-with-multiple-table-pages/m-p/625395#M82418</guid>
      <dc:creator>Craige_Hales</dc:creator>
      <dc:date>2023-04-24T10:25:10Z</dc:date>
    </item>
    <item>
      <title>Re: Pulling data from a webpage with multiple table pages</title>
      <link>https://community.jmp.com/t5/Discussions/Pulling-data-from-a-webpage-with-multiple-table-pages/m-p/625399#M82420</link>
      <description>&lt;P&gt;OK, I see. I studied your blog post carefully.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks, Craige!&lt;/P&gt;</description>
      <pubDate>Mon, 24 Apr 2023 10:30:45 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Pulling-data-from-a-webpage-with-multiple-table-pages/m-p/625399#M82420</guid>
      <dc:creator>lala</dc:creator>
      <dc:date>2023-04-24T10:30:45Z</dc:date>
    </item>
  </channel>
</rss>

