Solved: Importing multiple pdf files

Jun 10, 2023 4:23 PM

Warning: I am new to coding!

I have about 500 pdf files containing tabulated data I want to import. I managed to create a script that will extract, stack and transpose the data as well as append/concatenate it to a table (that I called "final.jmp"). Each pdf is represented by one row in final.jmp. Now here's the workflow and issues I identified:

Since I couldn't figure out how to import multiple pdfs, I had to resort to importing each pdf individually by changing the *.pdf file name to be imported in the script (which is embedded in final.jmp).
After each import I have to close all the intermediate tables created and opened in the process, and save the resulting table as new final.jmp.
I then re-open final.jmp, edit the script to the new pdf file name, import a new pdf and so on and so forth.

My question:

Is it possible to run the script in a repetitive fashion, each time selecting a new pdf until all pdfs from a folder have been imported and collated into a final table?

Open(
	"C:\example folder\example.pdf",
	PDF Tables(
		Table(
			table name( "1" ),
			add rows( page( 1 ), Rect( 2.0419, 0.9776, 3.1993, 1.1726 ) ),
			add rows( page( 1 ), Rect( 1.2137, 3.7819, 6.3603, 3.9069 ) ),
			add rows( page( 1 ), Rect( 0.8717, 3.9636, 3.0796, 4.0886 ) ),
			add rows(
				page( 1 ),
				Rect( 0.9853, 4.1436, 3.0029, 4.2686 ),
				column borders( 0.9853, 2.5172, 3.0029 )
			),
			add rows( page( 1 ), Rect( 4.581, 4.1436, 7.297, 4.2686 ) ),
			add rows(
				page( 1 ),
				Rect( 0.8883, 4.4136, 4.6374, 4.5386 ),
				column borders( 0.8883, 1.9603, 4.6374 )
			),
			add rows( page( 1 ), Rect( 0.7556, 4.7979, 7.5908, 8.2654 ) )
		)
	)
);

Data Table( "1" ) << Stack(
	columns( :Column 3, :Column 4, :Column 5, :Column 6 ),
	Source Label Column( "Label" ),
	Stacked Data Column( "Data" ),
	Output table("2")
);

Data Table( "2" ) << Transpose(
	columns( :Data ),
	Output Table( "Final" )
)

jthi · Dec 22, 2020 08:58 AM

Possible flow:

Create a function to handle a single pdf file
Get all files in the pdf directory
Use the first file in the directory to create a "collector" data table
Loop over the rest of the files
Concatenate each transposed table to the collector data table and close it afterwards

Maybe something like this could work:

Names Default To Here(1);

//directory where .pdf files exist 
pdfDirectory = "C:\example folder\";

//function to open, stack and transpose a single pdf file
stackTransposePdfFile = function({filePath}, {Default Local},
	dtTemp = Open(
		filePath,
		PDF Tables(
			Table(
				table name("1"),
				add rows(page(1), Rect(2.0419, 0.9776, 3.1993, 1.1726)),
				add rows(page(1), Rect(1.2137, 3.7819, 6.3603, 3.9069)),
				add rows(page(1), Rect(0.8717, 3.9636, 3.0796, 4.0886)),
				add rows(page(1), Rect(0.9853, 4.1436, 3.0029, 4.2686), column borders(0.9853, 2.5172, 3.0029)),
				add rows(page(1), Rect(4.581, 4.1436, 7.297, 4.2686)),
				add rows(page(1), Rect(0.8883, 4.4136, 4.6374, 4.5386), column borders(0.8883, 1.9603, 4.6374)),
				add rows(page(1), Rect(0.7556, 4.7979, 7.5908, 8.2654))
			)
		)
	);
	
	//stack data table from pdf
	dtTemp_stacked = dtTemp << Stack(
		columns(:Column 3, :Column 4, :Column 5, :Column 6),
		Source Label Column("Label"),
		Stacked Data Column("Data"),
		invisible
	);
	Close(dtTemp, no save); //close table created directly from pdf
	
	//transpose table
	dtTemp_transposed = dtTemp_stacked << Transpose(columns(:Data), invisible);
	Close(dtTemp_stacked, no save); //close stacked table
	
	//return transposed table
	return(dtTemp_transposed);
);

//list of files in directory (this assumes that there are only .pdf files)
pdfFile_list = Files In Directory(pdfDirectory);

//use the first table as "collection table"
finalDt = stackTransposePdfFile(pdfDirectory||pdfFile_list[1]);

//loop over files in pdfFile_list starting from second file
For(i = 2, i <= N Items(pdfFile_list), i++,
	pdfDt = stackTransposePdfFile(pdfDirectory||pdfFile_list[i]);
	finalDt << Concatenate(pdfDt, Append to first table);
	Close(pdfDt, no save); //close transposed table
);

-Jarmo
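
The script above assumes the folder holds only .pdf files. If it also contains other files (for example a saved final.jmp), the file list could be filtered first. This is a minimal sketch, not a tested part of the solution: the variable allFiles is a name made up for illustration, and the folder path is the same placeholder as above.

```jsl
Names Default To Here( 1 );

// Placeholder folder, same as in the script above
pdfDirectory = "C:\example folder\";

// Files In Directory returns a list of names (not full paths)
allFiles = Files In Directory( pdfDirectory );

// Keep only the names that end in ".pdf", ignoring case
pdfFile_list = {};
For( i = 1, i <= N Items( allFiles ), i++,
	If( Ends With( Lowercase( allFiles[i] ), ".pdf" ),
		Insert Into( pdfFile_list, allFiles[i] )
	)
);

// Stop early if nothing matched, so pdfFile_list[1] is never out of range
If( N Items( pdfFile_list ) == 0,
	Throw( "No .pdf files found in " || pdfDirectory )
);
```

The filtered pdfFile_list can then replace the unfiltered one in the script above. Once the loop has finished, the collected table could be saved with something like finalDt << Save( pdfDirectory || "final.jmp" ), using the table name from the question.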

Ressel · Dec 22, 2020 01:46 PM

Thanks - I don't know where I'd be without this forum. Great stuff!

Ake · Jan 15, 2024 11:02 AM

Thanks once again for all the help we get!
