thickey1
Level III

Pull ZIP Files from HTTP Link

I have published ZIP files that I want to pull programmatically and store on my PC using JSL. I don't know up front how many files will be present.

Is this possible with JSL?

 

(attached image: zip.png)
1 ACCEPTED SOLUTION

Craige_Hales
Super User

Re: Pull ZIP Files from HTTP Link

Or maybe this is closer to what you are asking:

path="https://www.vsp.virginia.gov/downloads/"; // a page with an index of files. Yours may be different format, adjust pattern below.
html = loadtextfile(path); // get the HTML text so we can scrape the links
// somewhat custom pattern for scraping the links, may be specific to this page
urls = {}; // this list will collect the urls 
rc = patmatch(html,
	patpos(0)+ // make sure the pattern matches from the start
	patrepeat( // this is the loop that extracts the urls from the html
		(
			// the urls look like <a href="2017%20Virginia%20Firearms%20Dealers%20Procedrures%20Manual.pdf">
			// and we want just the part between the quotation marks. Quickly scan forward (patBreak)
			// for a < then see if it matches. >>url grabs the text between quotation marks.
			(patbreak("<") + "<a href=\!"" + patbreak("\!"") >> url + pattest(insertinto(urls,url);1))
			| // OR
			patlen(1) // skip forward one character
		)
		+
		patfence() // fence off the successfully matched text. There is no need to backtrack if something goes wrong.
	) + 
	patrepeat(patnotany("<"),0) + // any trailing bits of html are consumed here
	patrpos(0) // make sure the pattern matches to the end
);

if(rc==0, throw("pattern did not match everything"));
show(nitems(urls),urls[6]); // pick item 6. You'll have a different strategy.

fullpath = path||regex(urls[6],"%20"," ",GLOBALREPLACE);// minimal effort to fix up the url, might need more work

pdfblob = loadtextfile(fullpath,blob); // download item 6, it is a pdf when this was written...
savetextfile("$temp/example.pdf",pdfblob); // save it somewhere
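
Since the question was about an unknown number of ZIP files, one way to extend this is sketched below (the ".zip" filter and the $temp target folder are assumptions; adjust them to the actual page):

// sketch only: loop over every scraped link, keep the ones ending in ".zip", and save each to $temp
for( i = 1, i <= nitems(urls), i++,
	if( endswith( lowercase(urls[i]), ".zip" ),
		name = regex(urls[i],"%20"," ",GLOBALREPLACE); // same minimal url cleanup as above, might need more work
		zipblob = loadtextfile( path||name, blob );    // download as binary
		savetextfile( "$temp/"||name, zipblob );       // save under the link's own name
	)
);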
Craige


4 REPLIES
Craige_Hales
Super User

Re: Pull ZIP Files from HTTP Link

 

Here's an example using a zip file from the @wilkap presentation:

za=open("https://community.jmp.com/kvoqx44227/attachments/kvoqx44227/virtual-jug/12/1/VJUG%20July%202015.zip","zip");
zipfiles=za<<dir;
show(zipfiles);
blob=za<<read(zipfiles[4],format(blob));
dt=open(blob,jmp);
clearglobals(za);

Several things to note:

  • The file is downloaded to your temp directory; the "zip" option to Open returns a zip archive object.
  • You can get a list of members from the zip archive using <<dir.
  • You can use the blob format with the zip archive for reading binary data like JMP tables.
  • The open(blob, jmp) call uses a 2nd argument to tell Open that the blob is a JMP data table.
  • Clearing the za variable is needed if you rerun the whole script; otherwise the zip archive object keeps the file in the temp directory from being reused.
  • You could use loadtextfile/savetextfile with blobs to download the zip file to a location of your choice (and delete it when done) and then use the zip archive to process that file; a sketch of that follows this list.
  • I already checked that the 4th item in the archive directory is a JMP data table.
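
A minimal sketch of the loadtextfile/savetextfile idea from that list (the save location below is only an example; the URL is the same one used above):

// sketch only: download the zip as a blob, save your own copy, then open that copy as a zip archive
zipurl = "https://community.jmp.com/kvoqx44227/attachments/kvoqx44227/virtual-jug/12/1/VJUG%20July%202015.zip";
localzip = "$documents/VJUG July 2015.zip"; // pick your own location
zipblob = loadtextfile( zipurl, blob ); // download the zip as binary
savetextfile( localzip, zipblob );      // write the copy to disk
za = open( localzip, "zip" );           // open the local copy as a zip archive
zipfiles = za << dir;
show( zipfiles );
clearglobals( za );                     // release the archive object first
deletefile( localzip );                 // optional: remove the copy when done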

 

Craige
thickey1
Level III

Re: Pull ZIP Files from HTTP Link

Thanks Craige for the comprehensive reply. I'll take elements of both suggestions and merge them into a generic function to suit my current and future needs.

 

I know I'd have to use a regexp to find the links in the HTTP source, but I was hoping for a '<< saveLink' function for the zip part.
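
Just to illustrate the idea (JSL has no << saveLink message; the saveLink name and the example URL below are made up), a small wrapper over loadtextfile/savetextfile gets close:

// hypothetical helper, not a built-in: download a link target as a blob and save it locally
saveLink = function( {url, dest},
	{data},
	data = loadtextfile( url, blob ); // fetch the link target as binary
	savetextfile( dest, data );       // write it to the chosen destination
	dest                              // return the saved path
);
// example: saveLink( "https://www.example.com/somefile.zip", "$temp/somefile.zip" );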

 

This will work perfectly fine though.

 

Great answer(s)

 

 

Craige_Hales
Super User

Re: Pull ZIP Files from HTTP Link

Glad you can get something out of it! I'm pretty sure the pattern could be improved speed-wise. It probably doesn't make a difference for directories of only a few thousand links, but the patLen(1) fallback could skip non-link text faster. And a more flexible pattern for the links would be better too.
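
For what it's worth, one untested sketch of that speed-up: keep everything else the same, but let the fallback alternative consume everything up to and including the next "<" in a single step, so the loop jumps from tag to tag instead of advancing one character at a time (this assumes html has already been loaded as in the earlier script):

// untested sketch: same scraper loop, but the fallback skips past the next "<" in one step
urls = {};
rc = patmatch(html,
	patpos(0) +
	patrepeat(
		(
			(patbreak("<") + "<a href=\!"" + patbreak("\!"") >> url + pattest(insertinto(urls,url);1))
			| // OR: the "<" found was not an <a href=...> link, so consume up to and including it
			(patbreak("<") + patlen(1))
		)
		+
		patfence() // commit each iteration so nothing is re-scanned on backtracking
	) +
	patrepeat(patnotany("<"),0) + // trailing html with no more "<" is consumed here
	patrpos(0)
);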

 

@bryan_boone @ErnestPasour @paul_vezzetti 

 

 

Craige