Much of the web uses RESTful APIs to move data to and from servers. REST is a simple concept that has nothing to do with sleeping; it stands for representational state transfer, but this article is not about REST. This article is about a kludgy mechanism for working around the lack of a REST API when you really need to retrieve some data from a site.
Web sites might not want you to do this for various reasons: bandwidth for data costs money, data licensing costs money, and not watching the advertisements might cost money too. The tool used here, Selenium, is nominally for testing a web site, not for speeding over a speed bump. Most sites have terms of use; you can find JMP's terms at the bottom of this page.
A complete JSL file is attached. It is written for Firefox, Windows, and the JMP web site as it looked on 8May2022. The Firefox part can probably be changed to work with many other browsers. It might work on Mac too, but that was not tested. The JMP web site will change over time and the JSL will need tweaking. That's the downside of not using an official API.
Before starting, download a driver (geckodriver for Firefox) and install Selenium as shown in the comments below. You'll need Firefox too, or do some research on the driver for your preferred browser.
Python Init();
xrc = Python Execute( {}, {By_ID, By_XPATH, rc},
"\[
try:
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.service import Service
    from selenium.webdriver.firefox.options import Options
    #
    options = webdriver.FirefoxOptions()
    #options.add_argument("--private") # example. you most likely don't want private, some things don't work.
    #
    service=Service(r'C:\Users\v1\Desktop\geckodriver.exe')
    # sometimes people use "browser" rather than "driver". It will be used below.
    driver = webdriver.Firefox(service=service,options=options)
    # return two magic values. you may need some others, just add them in the same way...
    By_ID = By.ID
    By_XPATH = By.XPATH
    rc = "ok"
except Exception as e:
    rc = repr(e)
]\"
);
If( xrc != 0 | rc != "ok", Throw( "start up Selenium failed" || Try( ": " || Char( rc ), "" ) ) );
Python Init() only needs to be done once; it connects JMP to Python and takes a few seconds the first time. You can call it again with no penalty. Python Execute(...) sends no variables in but gets three back from the code it runs: By_ID, By_XPATH, and rc. It takes a moment to load everything and start the browser.
You are looking at an empty browser controlled by JMP.
Open the JMP.COM page next. You might see a redirect that normally goes unnoticed.
nav = Function( {url}, {rc},
Python Execute( {url}, {rc},
"\[
try:
    driver.get(url)
    rc = "ok"
except Exception as e:
    rc = repr(e)
]\" );
return(rc);
);
rc = nav( "https://www.jmp.com/" );
if( rc != "ok", throw("nav: "||char(rc)));
The nav function returns "ok" or an error message. The JMP web page loads in the browser. Ignore the people in the screenshot.
The icon means the browser is remote controlled.
Script the sign-in to the JMP site. Right-click the Sign in button and choose Inspect to find out the button's HTML id value. Remember how to do this; I'll skip this explanation toward the end.
F12 might bring you to the next screen, but this way the control will already be selected.
The developer tools open with the control's id showing. Further down there will be controls that have a class but not an id; Selenium's XPath locator can handle those. The trick is similar to display box navigation: find a path that is not too brittle and is still specific enough.
Use the button id in the JSL that follows.
IDs are usually the best choice when available because they are supposed to be unique on the page. waitID waits for the login button to appear, here for up to 5 seconds (the default is 10). The wait might not be necessary, and it takes no time if the button is already there.
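The waiting that waitID relies on can be pictured as a polling loop. The sketch below is not Selenium's implementation; it is a minimal stand-in that assumes a hypothetical `find` callable which raises an exception while the element is missing. The names `wait_until` and `FakePage` are made up for this illustration. Selenium's real `WebDriverWait` polls every half second by default; the sketch uses a shorter interval so it finishes quickly.

```python
import time

def wait_until(find, timeout=10.0, poll=0.05):
    """Call find() repeatedly until it returns something truthy or time runs out.

    Returns ("ok", element) or ("timeout", None), loosely mirroring waitID's rc.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            element = find()
            if element:
                return "ok", element
        except LookupError:
            pass  # not there yet; keep polling
        if time.monotonic() >= deadline:
            return "timeout", None
        time.sleep(poll)


class FakePage:
    """Hypothetical page whose element only appears after a few lookups."""
    def __init__(self, ready_after):
        self.calls = 0
        self.ready_after = ready_after

    def find(self):
        self.calls += 1
        if self.calls < self.ready_after:
            raise LookupError("element not loaded yet")
        return "<button id='loginButton'>"


page = FakePage(ready_after=3)           # appears on the third lookup
status, element = wait_until(page.find, timeout=2.0)

already_there = FakePage(ready_after=1)  # present immediately: no waiting at all
status_fast, _ = wait_until(already_there.find, timeout=2.0)

never = FakePage(ready_after=10**9)      # never appears: the wait gives up
status_slow, _ = wait_until(never.find, timeout=0.2)

print(status, status_fast, status_slow)
```

An element that is already present costs a single lookup, which is why leaving the wait in place is cheap even when it turns out not to be needed.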
waitID = Function( {id, timeout = 10, BYformat=By_ID}, {rc},
Python Execute( {id, timeout, BYformat}, {rc},
"\[
try:
    myElem = WebDriverWait(driver, timeout).until(EC.presence_of_element_located((BYformat, id)))
    rc = "ok"
except TimeoutException:
    rc = "timeout"
except Exception as e:
    rc = repr(e)
]\"
);
Return( rc );
);
rc = waitID( "loginButton", 5 );
If( rc != "ok", Throw( "no login button: " || char(rc) ) );
clickID = Function( {id, BYformat=By_ID}, {rc},
Python Execute( {id, BYformat}, {rc},
"\[
try:
    driver.find_element(BYformat, id).click()
    rc = "ok"
except Exception as e:
    rc = repr(e)
]\"
);
return(rc);
);
rc = clickID( "loginButton" );
if( rc != "ok", throw("login button: "||char(rc)));
Cool! The sign-on screen pops up. Find the user name field next.
Now get the id for the user name field by right-clicking it.
Again, wait for the expected field. Now a keystroke function is needed.
rc = waitID( "idp-discovery-username", 10 );
If( rc != "ok", Throw( "no username field: " || char(rc) ) );
keysToID = Function( {id, txt}, {rc},
Python Execute( {id, txt}, {rc},
"\[
try:
    driver.find_element(By.ID, id).send_keys(txt)
    rc = "ok"
except Exception as e:
    rc = repr(e)
]\"
);
return(rc);
);
rc = keysToID( "idp-discovery-username", Include( "$documents/UserID.jsl" ) );
if( rc != "ok", throw("keysToID username: "||char(rc)));
Fake user name for fake password.
My user ID has scrolled off the screen and the Next button is visible. Find its id.
Click the Next button to get the password prompt.
Click it, then wait for the password field.
rc = clickID( "idp-discovery-submit" );
if( rc != "ok", throw("click submit user name: "||char(rc)));
rc = waitID( "okta-signin-password", 5 );
If( rc != "ok", Throw( "no password field: " || char(rc) ) );
Then enter the password.
rc = keysToID( "okta-signin-password", Include( "$documents/password.jsl" ) );
if( rc != "ok", throw("keysToID password: "||char(rc)));
Repeat the process: find the Sign in button.
After entering the password, click the Sign in button.
Click it.
rc = clickID( "okta-signin-submit" );
if( rc != "ok", throw("click signin submit: "||char(rc)));
We are signed in.
We must be signed in; there is an edit profile choice.
There is a search field in the picture. Type in JSL and click the magnifier. "searchField" is the id. The magnifier could be clicked, but Selenium has a submit-form mechanism that works from searchField, which is an input field in the form.
The search field needs a bigger window to be visible; here it is.
There is some asynchronous JavaScript that loads some parts of the page. Waiting for any particular field might not be necessary if the field is loaded as part of the page.
rc = waitID( "searchField" );
If( rc != "ok", Throw( "no search field: " || char(rc) ) );
rc = keysToID( "searchField", "jsl" );
if( rc != "ok", throw("keysToID search field: "||char(rc)));
submitForm = Function( {id}, {rc},
Python Execute( {id}, {rc},
"\[
try:
    driver.find_element(By.ID, id).submit()
    rc = "ok"
except Exception as e:
    rc = repr(e)
]\"
);
return(rc);
);
rc = submitForm( "searchField" );
if( rc != "ok", throw("submitForm searchField: "||char(rc)));
Now get ready to page through the results. The multi-page listing elements look like this:
The pink outer element holds three inner parts: title, description, link.
There is a list of the pink-circled data items that spans multiple pages.
getElements = Function( {id, BYformat=By_ID}, {rc},
Python Execute( {id, BYformat}, {rc},
"\[
try:
    list = driver.find_elements(BYformat,id)
    rc = "ok"
except Exception as e:
    rc = repr(e)
]\"
);
Return( rc );
);
getNElements = function({},{n},
Python Execute( {}, {n},
"\[
try:
    n = len(list)
except Exception as e:
    print(repr(e))
    n = -1
]\"
);
return(n);
);
getElementItext = function({i,id, BYformat=By_ID},{txt},
Python Execute( {i, id, BYformat}, {txt},
"\[
try:
    txt = list[int(i)].find_element(BYformat, id).text
except Exception as e:
    txt = "Error: getElementItext: " + repr(e)
]\"
);
return(txt);
);
getElementIattribute = function({i,id, BYformat=By_ID, attr},{txt},
Python Execute( {i, id, BYformat, attr}, {txt},
"\[
try:
    txt = list[int(i)].find_element(BYformat, id).get_attribute(attr)
except Exception as e:
    txt = "Error: getElementIattribute: " + repr(e)
]\"
);
return(txt);
);
Above are some functions to use in the loop below. There are buttons at the bottom of the page to go to the next page; they run some JavaScript that destroys and recreates the list of items. The functions are called again to recapture the new list. The JSL and Python are good enough for this example. They will break down if there is more than one list to keep track of at the same time; see the Python list variable. I'm pretty sure it is necessary to wait for the data to load after each next page.
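One possible way around the single-list limitation, sketched here under assumptions and not taken from the attached script, is to keep the element lists in a Python dictionary keyed by a name that JSL passes in. `element_lists`, `store_elements`, `count_elements`, and `FakeDriver` are hypothetical names; the fake driver stands in for Selenium's `driver.find_elements` so the sketch runs without a browser. It also avoids naming a variable `list`, which shadows Python's built-in type.

```python
element_lists = {}  # name -> list of elements found on the current page

def store_elements(driver, name, by, value):
    """Find elements and remember them under `name`; returns "ok" or an error string."""
    try:
        element_lists[name] = driver.find_elements(by, value)
        return "ok"
    except Exception as e:
        return repr(e)

def count_elements(name):
    """Number of elements stored under `name`, or -1 if the name is unknown."""
    found = element_lists.get(name)
    return -1 if found is None else len(found)


class FakeDriver:
    """Stand-in for a Selenium driver; returns canned results per locator."""
    def __init__(self, canned):
        self.canned = canned

    def find_elements(self, by, value):
        return list(self.canned.get((by, value), []))


fake_driver = FakeDriver({
    ("xpath", "//div[@class='result-card']"): ["card1", "card2", "card3"],
    ("xpath", "//ul[@id='pager']//a"): ["prev", "next"],
})
rc_cards = store_elements(fake_driver, "cards", "xpath", "//div[@class='result-card']")
rc_pager = store_elements(fake_driver, "pager", "xpath", "//ul[@id='pager']//a")
print(rc_cards, rc_pager, count_elements("cards"), count_elements("pager"), count_elements("missing"))
```

With this shape, the JSL wrappers would pass one extra argument (the list name) to each Python Execute call, and two lists could be held at the same time without overwriting each other.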
dt = New Table( "articles",
    New Column( "link", character,
        Set Property("Event Handler",
            Event Handler(
                Click(JSL Quote(Function( {thisTable, thisColumn, iRow}, Web( Char( thisTable:thisColumn[ iRow ] ) ); );)),
                Tip(JSL Quote(Function( {thisTable, thisColumn, iRow}, "Open " || Char( thisTable:thisColumn[ iRow ] ) || " in your browser."; );)),
                Color(JSL Quote(Function( {thisTable, thisColumn, iRow}, RGBColor("link"); );))
            )
        )
    ),
    New Column( "title", character ),
    New Column( "description", character )
);
while(1,
    rc = waitID("//div[@id='searchresults']//div[@class='result-card']",10,By_XPATH);
    If( rc != "ok", Throw( "no search results" ) );
    rc = getElements("//div[@id='searchresults']//div[@class='result-card']",By_XPATH);
    if( rc != "ok", throw("getElements: "||char(rc)));
    n = getNElements();
    for(i=0,i<n,i+=1,
        dt<<addrows(1);
        dt:title[nrows(dt)] = getElementItext(i,"a[@class='result-title_txt_all']",By_XPATH);
        dt:description[nrows(dt)] = getElementItext(i,"section[@class='result-description_txt_all']",By_XPATH);
        dt:link[nrows(dt)] = getElementIattribute(i,"a[@class='result-url']",By_XPATH,"href");
    );
    rc = waitID("//ul[@id='pager']//a[@class='pager-next']",1,By_XPATH);
    if(rc != "ok",
        rc = waitID("//ul[@id='pager']//span[@class='pager-disabled pager-next']",1,By_XPATH);
        if(rc=="ok", break(), throw("did not find expected pager button disabled"));
    ,
        rc = clickID("//ul[@id='pager']//a[@class='pager-next']",By_XPATH);
        if( rc != "ok", throw("click pager next: "||char(rc)));
    );
);
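Stripped of Selenium, the control flow of the loop above is: read the rows on the page, click Next if it is enabled, stop if Next is present but disabled, and complain if neither is found. The sketch below shows only that flow. `collect_all_pages` and `FakeSite` are hypothetical names, and the fake site is invented data, not the real jmp.com pager.

```python
def collect_all_pages(read_rows, next_is_enabled, next_is_disabled, click_next):
    """Gather rows from every page, stopping when the Next button is disabled."""
    rows = []
    while True:
        rows.extend(read_rows())
        if next_is_enabled():
            click_next()
        elif next_is_disabled():
            break  # last page reached
        else:
            raise RuntimeError("did not find expected pager button disabled")
    return rows


class FakeSite:
    """Hypothetical stand-in for paged search results."""
    def __init__(self, pages):
        self.pages = pages
        self.current = 0

    def read_rows(self):
        return list(self.pages[self.current])

    def next_is_enabled(self):
        return self.current < len(self.pages) - 1

    def next_is_disabled(self):
        return self.current == len(self.pages) - 1

    def click_next(self):
        self.current += 1


site = FakeSite([["a", "b"], ["c", "d"], ["e"]])
rows = collect_all_pages(site.read_rows, site.next_is_enabled,
                         site.next_is_disabled, site.click_next)
print(rows)
```

Checking for the disabled button explicitly, as the JSL does, separates "this is the last page" from "the pager was not found at all", which would otherwise end the loop silently with partial data.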
At this point the browser is open and this table is on the screen.
Today there were 57 entries spanning three pages.
Time to shut down the browser.
Python Submit(
"\[
driver.quit() # close the browser
]\" );
Python Term();
Towards the end there is an XPath expression:
rc = clickID("//ul[@id='pager']//a[@class='pager-next']",By_XPATH);
which means:
// - somewhere below the root of the document, find
ul - a <ul> tag (some sort of HTML list)
[@id='pager'] - the list has this id
// - more nested tags, followed by...
a - an <a> tag (link)
[@class='pager-next'] - the link has this class
It might not need to be that complicated. It is an example that uses a unique id to find an item that might not be unique if only the class were considered.
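To try the expression away from the browser, here is a small sketch using Python's standard-library `xml.etree.ElementTree`, which supports a limited XPath subset. The markup is invented for illustration and is not the real jmp.com page. ElementTree wants a leading `.` before `//`, which Selenium does not need.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed fragment: two links share the class,
# but only one of them sits inside the list with id "pager".
page = ET.fromstring("""<html><body>
  <div id="promo"><a class="pager-next" href="/elsewhere">not the pager</a></div>
  <ul id="pager">
    <li><a class="pager-prev" href="?page=1">Prev</a></li>
    <li><a class="pager-next" href="?page=3">Next</a></li>
  </ul>
</body></html>""")

# Class alone is ambiguous in this fragment.
by_class_only = page.findall(".//a[@class='pager-next']")

# Anchoring on the unique id first narrows it to the one link we want.
by_id_then_class = page.findall(".//ul[@id='pager']//a[@class='pager-next']")

print(len(by_class_only))
print([a.get("href") for a in by_id_then_class])
```

In this fragment the class-only search finds two links, while the id-anchored path finds only the pager's Next link.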