Is there any way to capture data from Internet with JSL, including hyperlinks

jschroedl

Staff

Joined:

Jun 23, 2011

PMroz has the right idea, though some pages require logging in or use background requests (AJAX) to pull in data after you navigate to the page.

For those, it's better to use File > Internet Open and change Open As: to Web page. After that, you can login/click hyperlinks, etc and use File > Import Table As Data Table... when you reach the data. From there, you can click through more links like Next | Prev etc on the page and keep importing as needed.

Of course, this is impractical for lots of pages, so JSL is the way to go there, as PMroz says (assuming you're not stymied by a login page).

8 REPLIES
pmroz

Super User

Joined:

Jun 23, 2011

You're right that not all web pages are this easy.  I had never tried opening a URL as a web page and then using File > Import Table as Data Table.  Great option!

I notice that when I import Teresa's table the column headers are in row 1 of the table.  I used Edit > Copy to copy the data, opened up a blank dataset, and then used Edit > Paste with Column Names.  That worked fine, but is there an easier way to move row 1 into the column header position?

jschroedl

Staff

Joined:

Jun 23, 2011

When that happens to me, I rely on Xan's fantastic "Column Name Add-In Utility"

http://support.sas.com/demosdownloads/downarea_t4.jsp?productID=110491&jmpflag=N

I use "Move Up" to move the first row to be the column names. I'm sure you could reuse the script from the add-in if you want to automate this.

teresa

Community Trekker

Joined:

Jun 23, 2011

jschroedl, your answers are very interesting too. In fact, I combined PMroz's solution from his first message with your proposal in the last one.

Now I would like to automate Xan's add-in with JSL. I get several tables opened with the Open internet command, and each one needs to be passed through the add-in.

How can I program it? Something like:

Open( "http://...." );

Add in

Open( "http://..." );

Add in

Thank you,

Teresa

jschroedl

Staff

Joined:

Jun 23, 2011

There are a couple of ways you can script it.

1. Invoke the add-in directly (assuming you installed the add-in)

Open( "http://www.velarc.es/informes/lista_registro_barcos.php", HTML Table( 3 ) );

Include("$ADDIN_HOME(com.jmp.columnnames)\hoist.jsl");

2. Open the code from the add-in and incorporate it directly in your script perhaps as a function. Here I made the contents of hoist.jsl into a "Hoist Column Names" function and call it.

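Hoist Column Names = Function( {dt},
 Local( {ci, col},
  // Use the selected columns, or every column of dt if none are selected.
  col = dt << Get Selected Columns();
  If( !Is List( col ) | N Items( col ) == 0,
   col = {};
   For( ci = 1, ci <= N Col( dt ), ci++,
    col[ci] = Column( dt, ci )
   );
  );

  // Rename each column to the value found in its first row.
  For( ci = 1, ci <= N Items( col ), ci++,
   col[ci] << Set Name( Char( col[ci][1] ) )
  );

  // Row 1 now duplicates the column names, so remove it.
  dt << Delete Rows( 1 );
 )
);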

dt = Open( "http://www.velarc.es/informes/lista_registro_barcos.php", HTML Table( 3 ) );
Hoist Column Names(dt);
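
To handle several pages in one go, as Teresa describes, the same call can probably be wrapped in a loop. This is only a sketch: the addresses in `urls` are placeholders (the real ones were not given in the question), and `HTML Table( 1 )` assumes the first table on each page is the one wanted.

```jsl
// Assumes the Hoist Column Names function above has already been defined.
// The URLs below are placeholders; substitute the pages you actually need.
urls = {"http://example.com/page1.html", "http://example.com/page2.html"};

For( i = 1, i <= N Items( urls ), i++,
	// Assumption: the first HTML table on each page holds the data.
	dt = Open( urls[i], HTML Table( 1 ) );
	Hoist Column Names( dt );
);
```

If the pages use different table positions, the table index would need to vary per URL.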

Hope this helps,

John

jschroedl

Staff

Joined:

Jun 23, 2011

For anyone curious about how I knew what script the add-in was running, here's how you can find out for yourself:

To see what a particular menu item is doing:

- Go to View > Customize > Menus and Toolbars menu.

- Click the Change... button at the top, select the JMP Add-In radio button, then select com.jmp.columnnames in the drop-down.

- Click OK

The menu editor will show the items this Add-In has defined. In this case, it's under the Add-Ins > Column Names menu.

- Expand Add-Ins > Column Names in the tree on the left side

- Select the Move Up menu item. 

On the right side, you see that the Action is to run JSL from a file "$ADDIN_HOME(com.jmp.columnnames)\hoist.jsl".

To see the JSL source code:

- Go to View > Add-Ins... and select the add-in; "Column Names Utilities" in this case.

- Click the link next to Home Folder and the folder containing the add-in will be opened.

In my case it's: C:\Users\john\AppData\Local\SAS\JMP\Addins\com.jmp.columnnames

From there you can see the jsl (and other goodies) from the add-in.
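
As a possible shortcut, the script file can also be opened from a script window. This sketch assumes the add-in is installed under the ID shown above; opening a .jsl file normally displays it in the script editor rather than running it.

```jsl
// Assumes the add-in com.jmp.columnnames is installed.
// Opens hoist.jsl in a script editor window so you can read it.
Open( "$ADDIN_HOME(com.jmp.columnnames)\hoist.jsl" );
```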

John

teresa

Community Trekker

Joined:

Jun 23, 2011

Hi John,

Some time ago you suggested that I open a web page using:

"For those, it's better to use File > Internet Open and change Open As: to Web page. After that, you can login/click hyperlinks, etc and use File > Import Table As Data Table... when you reach the data. From there, you can click through more links like Next | Prev etc on the page and keep importing as needed."

Now I am trying to import a table from the following web page with an Open command:

Open( "http://www.marinetraffic.com/ais/datasheet.aspx?datasource=V_ARR_DEP&PORT_ID=236", HTML table(1) )

But the "Vessel's Name" column contains not only the name of the vessel but also an HREF that I would like to capture.

I tried to follow your instructions (File > Internet Open, Open as Web Page, and then File > Import Table As Data Table), but I got the same result: a column with the name and without the HREF.

How could I capture the HREF and the Prev | Next links, as you suggested in your message?

Best regards,

Teresa

jschroedl

Staff

Joined:

Jun 23, 2011

The JMP HTML import feature is only concerned with the displayed text and not other attributes of the HTML such as the link destination. So, to go deeper than just importing the text you would need to connect to the server and retrieve the underlying HTML and parse that (using the Socket functionality of JMP for example).
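
Once the raw HTML is in a string, the parsing step could look roughly like the sketch below. It does not show the retrieval itself, and it works on a made-up sample string: the file names and vessel names are invented for illustration, and the pattern assumes simple links with double-quoted href attributes and plain text inside the anchor.

```jsl
// Sample markup standing in for HTML already retrieved from the server.
// The link targets and vessel names are invented for illustration.
html = "<tr><td><a href=\!"shipdetails.aspx?id=1\!">VESSEL ONE</a></td></tr>" ||
       "<tr><td><a href=\!"shipdetails.aspx?id=2\!">VESSEL TWO</a></td></tr>";

// Group 1 captures the link target, group 2 the displayed text.
pattern = "<a[^>]*href=\!"([^\!"]*)\!"[^>]*>([^<]*)</a>";

hrefs = {};
names = {};
rest = html;
m = Regex Match( rest, pattern );
While( N Items( m ) > 0,
	Insert Into( hrefs, m[2] );
	Insert Into( names, m[3] );
	// Continue searching after the end of this match.
	rest = Substr( rest, Contains( rest, m[1] ) + Length( m[1] ) );
	m = Regex Match( rest, pattern );
);

// Put the link text and link targets side by side in a data table.
New Table( "Links",
	New Column( "Name", Character, Values( names ) ),
	New Column( "HREF", Character, Values( hrefs ) )
);
```

A regular expression like this is fragile against real-world markup (single-quoted attributes, nested tags inside the link), so it would need adjusting for the actual page.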

John

teresa

Community Trekker

Joined:

Jun 23, 2011

Oh! What a pity! I had understood that it was possible.


By the way, thank you for your very, very, very quick answer.


Yours sincerely,


Teresa