Importing web data using JMP "Internet Open"

roxyais (Community Trekker, joined May 20, 2012)

I am attempting to (legally) import sports data from a website (e.g. results from the 2012 Paralympics in London). The public site submits queries to the IPC database and returns an HTML page containing numerous tables. Which tables appear depends on the combination of Competition (London Paralympics), Sport (Athletics), Gender (F), Event (100m) and AWD Class (F37); the page has a separate table for each stage of the competition (Heats/Semi-finals/Finals) and heat number. There are about 20 AWD classes per event, and usually fewer than 20 tables per page, depending on the popularity of the event. To partially automate this in JMP, I have a script of about 360 (= 20 x 18) "Open" statements of the form:

Open(
"http://www.paralympic.org/ipc_results/results.php?competition=2004PG&gender=m&sport=athletics&discip...",
HTML Table (3)
);

This example is for the Male 100m F11 (blind) group (the third table on the page). I then concatenate the tables produced by repeating this for all events. My problem is that this process works for a script that allows for 20 tables per class (concatenating the results of 20 of the above statements), but when I attempt to include all 18 events (360 = 20 x 18), execution halts without error after the first event (e.g. 100m) and produces no output for the next event (200m). Is this because there are only 12 tables in this event/class combination (I don't think so), or is there some other problem? (I am using a 4-core, 64-bit PC with 16 GB of RAM, running Windows XP.)

I would appreciate advice from anyone who has experience using JMP's "Internet Open" option to import data from the web.

1 REPLY
ms (Super User, joined Jun 23, 2011)

I am not sure why it breaks, but it may be because the network connection becomes increasingly slow when the number of requests is large, or because a request fails for another reason.

Are you concatenating all tables at once, or one at a time (closing each source table afterwards)? The latter approach demands less from your computer.

The code below seems to work, even if the event list is expanded far beyond the two events in the example (I use JMP 10 for Mac and have a pretty fast network connection). I used Try() to avoid a stop in case there are fewer than 20 tables per page.

dt = New Table( "All results", New Column( "Source table", Character ) );

event_list = {"100", "400"}; // Only two events in this example

For( i = 1, i <= N Items( event_list ), i++,
  Try( // Escape the inner loop if there are no more tables for this event
    For( j = 1, j <= 20, j++,
      //Try( Close( dt1, no save ) ); // Close previous subtable (if any)
      dt1 = Open(
        "http://www.paralympic.org/ipc_results/results.php?competition=2004PG&gender=m&sport=athletics&discip..."
         || event_list[i] || "%20m&eclass=T11",
        HTML Table( j )
      );
      Wait( 0 ); // Give JMP a chance to complete download, increase the argument if network is slow
      dt1 << Set Name( event_list[i] || "_table" || Char( j ) );
      // Record the subtable name in every row so each result can be traced to its source
      col = dt1 << New Column( "Source table", Character );
      col << Set Each Value( dt1 << Get Name );
      dt << Concatenate( dt1, Append to first table( 1 ) );
      Close( dt1, No Save ); // Close subtable
    )
  )
);
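
Since the original script halts without an error, it may help to find out which request is the one that fails. The following is a rough, untested sketch of one way to do that: it gives each Open() its own Try() and records the event and table number when the call fails. It assumes, as the script above does, that Open() throws when the requested HTML table is not available. The names base_url, failed and more_tables are made up for this illustration, and the URL is kept truncated exactly as it appears in this thread.

```jsl
// Diagnostic sketch: only opens and closes each table, and records failures.
// The URL is truncated in the thread ("discip...") and is kept exactly as posted.
base_url = "http://www.paralympic.org/ipc_results/results.php?competition=2004PG&gender=m&sport=athletics&discip...";
event_list = {"100", "400"};
failed = {}; // hypothetical list of requests that did not yield a table

For( i = 1, i <= N Items( event_list ), i++,
  more_tables = 1; // hypothetical flag, cleared at the first failure for this event
  For( j = 1, j <= 20 & more_tables, j++,
    Try(
      dt1 = Open( base_url || event_list[i] || "%20m&eclass=T11", HTML Table( j ) );
      Close( dt1, No Save ),
      // Catch branch: runs only if Open() or Close() throws
      Insert Into( failed, event_list[i] || "m, table " || Char( j ) );
      more_tables = 0;
    )
  )
);

Show( failed ); // Writes the list of failed requests to the log
```

With this layout each event is expected to log one entry at the first table number that does not exist (for example table 13 on a page with 12 tables). An entry for table 1, or an event that is missing from the log altogether, would point to a network or request problem instead of a short page.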