Ressel
Level VI

How to import multiple pdf files into data table

Warning: I am new to scripting

 

Challenge:

  • I have about 500 pdf files, each containing a data table that I want to import into JMP and collate into a single JMP data table.

 

Workflow & issues:

  • I made a script (see below) that allows me to import the tabulated data from one pdf and append it to a table I called Final.jmp. I kept the script as a separate file, i.e. not inside a data table. Each pdf is appended to Final.jmp as a single row.
  • After each import, I have to close the Final.jmp table and overwrite it with the new, updated "final" table that contains the extra row from the newly imported pdf.
  • After each import I also have to manually close all the intermediate tables being created in the process. (boring)
  • Then I change the pdf file name in the script, press save and re-run it to import yet another pdf. And so the cycle continues.
  • This process is faster than manual copy-pasting, but it is still laborious and almost certainly far less automated than it could be.

 

Question:

  • Is it possible to have a script run repeatedly, each time selecting a new pdf for import and appending it to the same table, until all pdfs in one folder have been imported?

 

Open(
	"C:\example folder\example.pdf",
	PDF Tables(
		Table(
			table name( "1" ),
			add rows( page( 1 ), Rect( 2.0419, 0.9776, 3.1993, 1.1726 ) ),
			add rows( page( 1 ), Rect( 1.2137, 3.7819, 6.3603, 3.9069 ) ),
			add rows( page( 1 ), Rect( 0.8717, 3.9636, 3.0796, 4.0886 ) ),
			add rows(
				page( 1 ),
				Rect( 0.9853, 4.1436, 3.0029, 4.2686 ),
				column borders( 0.9853, 2.5416, 3.0029 )
			),
			add rows( page( 1 ), Rect( 4.581, 4.1436, 7.297, 4.2686 ) ),
			add rows(
				page( 1 ),
				Rect( 0.8883, 4.4136, 4.6374, 4.5386 ),
				column borders( 0.8883, 1.9603, 4.6374 )
			),
			add rows(
				page( 1 ),
				Rect( 6.0143, 4.6569, 7.2498, 4.7819 ),
				column borders( 6.0143, 6.5817, 7.2498 )
			),
			add rows( page( 1 ), Rect( 0.7556, 4.7979, 7.5908, 8.2654 ) )
		)
	)
);

Data Table( "1" ) << Stack(
	columns( :Column 3, :Column 4, :Column 5, :Column 6 ),
	Source Label Column( "Label" ),
	Stacked Data Column( "Data" ),
	Output table("2")
);

Data Table( "2" ) << Transpose(
	columns( :Data ),
	Output Table( "3" )
);

Data Table( "Final" ) << Concatenate( Data Table( "3" ) )

 

ACCEPTED SOLUTION

txnelson
Super User

Re: How to import multiple pdf files into data table

All you need to do is loop over the code, reading in one file after another and appending each one to the Final data table. The JSL below should give you a good start on how to do this.

Names default to here(1);

// Here is an example where the first thing that is done, is to create
// the list of the files to be processed
// Many different methods can be used to do this, but for this example
// the file names are read in from a directory

theFilesList = Files in directory( <path to your folder that has the .pdf files> );

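// Create the Final data table that will have all of the processed files 
// concatenated to it. New Table() creates it; Data Table( "Final" ) would
// only reference a table that is already open.
New Table( "Final" );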

// Loop across all of the files and read them in one at a time and add them
// to the Final data table
For( i = 1, i <= N Items( theFilesList ), i++,
	Open(
		theFilesList[i],
		PDF Tables(
			Table(
				table name( "1" ),
				add rows( page( 1 ), Rect( 2.0419, 0.9776, 3.1993, 1.1726 ) ),
				add rows( page( 1 ), Rect( 1.2137, 3.7819, 6.3603, 3.9069 ) ),
				add rows( page( 1 ), Rect( 0.8717, 3.9636, 3.0796, 4.0886 ) ),
				add rows( page( 1 ), Rect( 0.9853, 4.1436, 3.0029, 4.2686 ), column borders( 0.9853, 2.5416, 3.0029 ) ),
				add rows( page( 1 ), Rect( 4.581, 4.1436, 7.297, 4.2686 ) ),
				add rows( page( 1 ), Rect( 0.8883, 4.4136, 4.6374, 4.5386 ), column borders( 0.8883, 1.9603, 4.6374 ) ),
				add rows( page( 1 ), Rect( 6.0143, 4.6569, 7.2498, 4.7819 ), column borders( 6.0143, 6.5817, 7.2498 ) ),
				add rows( page( 1 ), Rect( 0.7556, 4.7979, 7.5908, 8.2654 ) )
			)
		)
	);

	Data Table( "1" ) << Stack(
		columns( :Column 3, :Column 4, :Column 5, :Column 6 ),
		Source Label Column( "Label" ),
		Stacked Data Column( "Data" ),
		Output table( "2" )
	);

	Data Table( "2" ) << Transpose( columns( :Data ), Output Table( "3" ) );

// Concatenate the processed data to the Final data table
	Data Table( "Final" ) << Concatenate( Data Table( "3" ), Append to first table( 1 ) );

// Cleanup the environment
	Close( Data Table( "1" ), nosave );
	Close( Data Table( "2" ), nosave );
	Close( Data Table( "3" ), nosave );
);
Jim


Ressel
Level VI

Re: How to import multiple pdf files into data table

@txnelson, I don't know how to adequately express my gratitude. You are very, very kind. Thank you so much for your help again!

Ressel
Level VI

Re: How to import multiple pdf files into data table

Sorry, I have to return with yet more questions because somehow, despite years of academic training, I cannot get this loop to work. Here is what I tried. First, I moved all files to my local hard disk, because we use "()" in some of our folder names on the server and I had the impression this was causing a problem. Then I wrapped the whole directory path in quotation marks, which let the script run until the next error message: something about "Path is invalid ...'Glue' ...". I googled it, but have no idea how to address it. The file name in the error message is the name of one of the files in the directory I want to work with. At first I thought the hyphen in the file name was the problem, but the error occurs even after I change that.

 

I have pasted in below:

  • A copy of the script with a fake directory name. I preserved the number of folders as well as the folder name containing parentheses.
  • The error message from the script window
  • The error message from debugging

Also, how do I include the directory AND filename in the final data table?

 

Thank you very much in advance! (If you find this too trivial to answer, I will understand.)

[Two screenshots of the error messages: Ressel_0-1608408948210.png, Ressel_1-1608408956917.png]

 

Names default to here(1);

// Here is an example where the first thing that is done, is to create
// the list of the files to be processed
// Many different methods can be used to do this, but for this example
// the file names are read in from a directory

theFilesList = Files in directory( "C:\directory1\subdirectory1\subdirectory2\subdirectory3 (nameof aperson)\subdirectory4\subdirectory5\subdirectory6\subdirectory7\subdirectory8\subdirectory9\subdirectory10" );

// Create the Final data table that will have all of the processed files 
// concatenated to it
Data Table( "Final" );

// Loop across all of the files and read them in one at a time and add them
// to the Final data table
For( i = 1, i <= N Items( theFilesList ), i++,
	Open(
		theFilesList[i],
		PDF Tables(
			Table(
				table name( "1" ),
				add rows( page( 1 ), Rect( 2.0419, 0.9776, 3.1993, 1.1726 ) ),
				add rows( page( 1 ), Rect( 1.2137, 3.7819, 6.3603, 3.9069 ) ),
				add rows( page( 1 ), Rect( 0.8717, 3.9636, 3.0796, 4.0886 ) ),
				add rows( page( 1 ), Rect( 0.9853, 4.1436, 3.0029, 4.2686 ), column borders( 0.9853, 2.5416, 3.0029 ) ),
				add rows( page( 1 ), Rect( 4.581, 4.1436, 7.297, 4.2686 ) ),
				add rows( page( 1 ), Rect( 0.8883, 4.4136, 4.6374, 4.5386 ), column borders( 0.8883, 1.9603, 4.6374 ) ),
				add rows( page( 1 ), Rect( 6.0143, 4.6569, 7.2498, 4.7819 ), column borders( 6.0143, 6.5817, 7.2498 ) ),
				add rows( page( 1 ), Rect( 0.7556, 4.7979, 7.5908, 8.2654 ) )
			)
		)
	);

	Data Table( "1" ) << Stack(
		columns( :Column 3, :Column 4, :Column 5, :Column 6 ),
		Source Label Column( "Label" ),
		Stacked Data Column( "Data" ),
		Output table( "2" )
	);

	Data Table( "2" ) << Transpose( columns( :Data ), Output Table( "3" ) );

// Concatenate the processed data to the Final data table
	Data Table( "Final" ) << Concatenate( Data Table( "3" ), Append to first table( 1 ) );

// Cleanup the environment
	Close( Data Table( "1" ), nosave );
	Close( Data Table( "2" ), nosave );
	Close( Data Table( "3" ), nosave );
);

 

txnelson
Super User

Re: How to import multiple pdf files into data table

My error.

Data Table( "Final" );

should have been written as

New Table( "Final" );
Jim
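
Pulling the thread together, here is a minimal sketch of the loop with the New Table correction applied, plus one possible way to answer the open question about recording the directory and file name. It rests on assumptions that the thread does not confirm: pdfDir is a hypothetical folder path, the column names "Source Folder" and "Source File" are made up for illustration, and every PDF is assumed to share the page layout used in the scripts above. Files In Directory() returns file names without the folder part, so the sketch prepends the folder before calling Open().

```jsl
Names Default To Here( 1 );

// Hypothetical folder (assumption); note the trailing slash.
pdfDir = "C:/example folder/";

// Files In Directory() returns bare file names; keep only the PDFs.
allFiles = Files In Directory( pdfDir );
pdfFiles = {};
For( i = 1, i <= N Items( allFiles ), i++,
	If( Ends With( Lowercase( allFiles[i] ), ".pdf" ),
		Insert Into( pdfFiles, allFiles[i] )
	)
);

// Create the empty table that collects one row per PDF.
New Table( "Final" );

For( i = 1, i <= N Items( pdfFiles ), i++,
	thisFile = pdfFiles[i];

	// Prepend the folder so Open() receives a full path.
	Open(
		pdfDir || thisFile,
		PDF Tables(
			Table(
				table name( "1" ),
				add rows( page( 1 ), Rect( 2.0419, 0.9776, 3.1993, 1.1726 ) ),
				add rows( page( 1 ), Rect( 1.2137, 3.7819, 6.3603, 3.9069 ) ),
				add rows( page( 1 ), Rect( 0.8717, 3.9636, 3.0796, 4.0886 ) ),
				add rows( page( 1 ), Rect( 0.9853, 4.1436, 3.0029, 4.2686 ), column borders( 0.9853, 2.5416, 3.0029 ) ),
				add rows( page( 1 ), Rect( 4.581, 4.1436, 7.297, 4.2686 ) ),
				add rows( page( 1 ), Rect( 0.8883, 4.4136, 4.6374, 4.5386 ), column borders( 0.8883, 1.9603, 4.6374 ) ),
				add rows( page( 1 ), Rect( 6.0143, 4.6569, 7.2498, 4.7819 ), column borders( 6.0143, 6.5817, 7.2498 ) ),
				add rows( page( 1 ), Rect( 0.7556, 4.7979, 7.5908, 8.2654 ) )
			)
		)
	);

	Data Table( "1" ) << Stack(
		columns( :Column 3, :Column 4, :Column 5, :Column 6 ),
		Source Label Column( "Label" ),
		Stacked Data Column( "Data" ),
		Output table( "2" )
	);

	Data Table( "2" ) << Transpose( columns( :Data ), Output Table( "3" ) );

	// Record where this row came from (illustrative column names).
	Data Table( "3" ) << New Column( "Source Folder", Character, Set Each Value( pdfDir ) );
	Data Table( "3" ) << New Column( "Source File", Character, Set Each Value( thisFile ) );

	// Append the processed row to the Final data table.
	Data Table( "Final" ) << Concatenate( Data Table( "3" ), Append to first table( 1 ) );

	// Close the intermediate tables without saving.
	Close( Data Table( "1" ), NoSave );
	Close( Data Table( "2" ), NoSave );
	Close( Data Table( "3" ), NoSave );
);
```

Filtering on the ".pdf" extension keeps other files in the folder out of the loop. Before re-running, it is probably worth closing any leftover tables named "1", "2", "3" or "Final" from an earlier attempt, since the script looks tables up by those exact names.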