Alicia
Level III

How do I import data from multiple PDF files into one table, tagged by PDF file name?

Hi there,

 

I have a folder containing over 100 PDFs, all set out in a consistent format. The file name of each PDF is a batch number that I want to use, and each PDF contains a particular table of data that I want to extract.

 

My aim is to extract the same table of data from every PDF, then concatenate all the extracted data tables, tagged by PDF file name (batch number). Finally, I want to split the columns in the concatenated table and group it by PDF file name to end up with a final table.

 

I’ve had a go at doing this manually and can get to the final table I need; however, it is extremely time-consuming, as I have to open and select the table in every PDF separately.

 

Therefore, I wonder if there is some JSL that will be able to help with this?

 

I’m very new to scripting but have been able to create a script that gives me exactly what I need for the first two PDF files:

 

// Open the first two files (Batch 1 and Batch 2) from the folder
Open(
	"C:\Example folder\Batch 1.pdf",
	PDF Tables(
		Table(
			table name( "Batch 1" ),
			add rows( page( 1 ), Rect( 0.4143, 5.3995, 7.5889, 6.6781 ) )
		)
	)
);

Open(
	"C:\Example folder\Batch 2.pdf",
	PDF Tables(
		Table(
			table name( "Batch 2" ),
			add rows( page( 1 ), Rect( 0.4143, 5.3995, 7.5889, 6.6781 ) )
		)
	)
);

// Concatenate the data for the first two batches into one table and add a source column
Data Table( "Batch 1" ) << Concatenate(
	Data Table( "Batch 2" ),
	Create source column,
	Output table( "Concat" )
);

// Split the concatenated table by the Characteristic column, grouped by the source column (batch number), to create the Final table
Data Table( "Concat" ) << Split(
	Split By( :Characteristic ),
	Split( :Value ),
	Group( :Source Table ),
	Remaining Columns( Drop All ),
	Sort by Column Property,
	Output table( "Final" )
);

// Close all other tables
Close( "Batch 1", no save );
Close( "Batch 2", no save );
Close( "Concat", no save );

I need to adapt this script to open all the PDF files (not just the first two) to create the Final table of data, tagged by the PDF file name.

 

Is there anyone who can help me?

 

Many thanks,

 

Alicia

 

2 ACCEPTED SOLUTIONS

Craige_Hales
Super User

Re: How do I import data from multiple PDF files into one table, tagged by PDF file name?

If you've got two working, the code in "Concatenate data tables" will probably get you there. Files In Directory() may be the function you need.

There are a number of similar examples; searching for Files In Directory will probably find them.

Ask again if that doesn't get you there; I think you are close.

Craige
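
As a rough illustration of the Files In Directory() suggestion, the sketch below lists a folder's contents and keeps only the PDF files. The folder path is borrowed from the question (written with forward slashes), and the variable names are made up for this example; treat it as an untested outline rather than anyone's actual script.

```jsl
Names Default To Here( 1 );

// Assumed folder, taken from the question; forward slashes also work on Windows
folder = "C:/Example folder/";

// Files In Directory returns a list of file names without the folder path
all_files = Files In Directory( folder );

// Keep only the names that end in ".pdf" (case-insensitive)
pdf_files = {};
For( i = 1, i <= N Items( all_files ), i++,
	If( Ends With( Lowercase( all_files[i] ), ".pdf" ),
		Insert Into( pdf_files, all_files[i] )
	)
);

// Print the filtered list to the log
Show( pdf_files );
```

Each entry can then be joined to the folder path (folder || pdf_files[i]) and passed to Open() inside a loop.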


Georg
Level VII

Re: How do I import data from multiple PDF files into one table, tagged by PDF file name?

Here is an example that contains the steps you'll need. The post from @Craige_Hales helps explain how these things work.

 

Names Default To Here( 1 );

// Prepare an empty working directory
tempdir = "$TEMP/pdf_test";
If( !Directory Exists( tempdir ),
	Create Directory( tempdir ),
	// Directory already exists: delete any files left over from an earlier run
	file_lst = Files In Directory( tempdir );
	For( i = 1, i <= N Items( file_lst ), i++,
		Delete File( tempdir || "/" || file_lst[i] )
	);
);

// Open a sample table
cdt = Open( "$SAMPLE_DATA/Big Class.jmp" );

// Create ten test PDFs from the sample table
For( i = 1, i <= 10, i++,
	jrn = cdt << Journal;
	jrn << save pdf( tempdir || "/" || Char( i ) || "_Big Class.pdf" );
	jrn << close window;
);

// Read every PDF in the directory into a list of data tables
file_lst = Files In Directory( tempdir );
table_lst = {};
For( i = 1, i <= N Items( file_lst ), i++,
	Insert Into( table_lst, Open( tempdir || "/" || file_lst[i], PDF All Tables() ) )
);

// Concatenate the tables, recording each source table in a new column
If( N Items( table_lst ) > 1,
	result_table = table_lst[1] << concatenate(
		table_lst[2 :: N Items( table_lst )],
		Output Table( "ResultTable" ),
		Create Source Column
	)
);

// Close the individual PDF tables
For( i = 1, i <= N Items( table_lst ), i++,
	Close( table_lst[i], No Save )
);

// Split the concatenated table, grouped by source table
split_table = result_table << Split(
	Split By( :sex ),
	Group( :Source Table ),
	Split( :age ),
	Remaining Columns( Drop All ),
	Sort by Column Property,
	Output Table( "Split_Result" )
);
Georg
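
The example above groups by the Source Table column, which holds the names of the concatenated tables rather than the PDF file names. If the imported tables do not end up named after their files, one possible alternative is to stamp each table with its own batch column before concatenating. The sketch below shows that single step on a stand-in table; the "Batch" column name and the file name are hypothetical.

```jsl
Names Default To Here( 1 );

// Stand-in for one imported table; any open data table would do
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );

// Hypothetical file name, in the form Files In Directory would return it
file_name = "Batch 1.pdf";

// Drop the 4-character ".pdf" extension to get the batch label
batch = Substr( file_name, 1, Length( file_name ) - 4 );

// Add a character column holding the batch label on every row
dt << New Column( "Batch", Character, Nominal, Set Each Value( batch ) );
```

With a column like this in every table, Group( :Batch ) could stand in for Group( :Source Table ) in the Split step.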


Alicia
Level III

Re: How do I import data from multiple PDF files into one table, tagged by PDF file name?

Huge thanks @Craige_Hales  and @Georg for your help. I'll have a go and see if I can adapt the script. Really appreciated!
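
For readers following along, one way the adaptation could look is sketched below. It is an untested outline, not code posted in the thread. The folder path (written with forward slashes), the Rect coordinates, and the Characteristic and Value column names are copied from the question; the variable names and the placeholder table name "import" are made up for illustration. It also assumes that Open() returns a single data table reference for a one-table PDF import, and it renames each table after its file so that the Source Table column carries the batch number.

```jsl
Names Default To Here( 1 );

// Folder from the question; forward slashes also work on Windows
folder = "C:/Example folder/";
files = Files In Directory( folder );

// Open the same region of page 1 in every PDF
tables = {};
For( i = 1, i <= N Items( files ), i++,
	If( Ends With( Lowercase( files[i] ), ".pdf" ),
		dt = Open(
			folder || files[i],
			PDF Tables(
				Table(
					table name( "import" ), // placeholder, renamed below
					add rows( page( 1 ), Rect( 0.4143, 5.3995, 7.5889, 6.6781 ) )
				)
			)
		);
		// Name the table after the file, minus ".pdf" (e.g. "Batch 1")
		dt << Set Name( Substr( files[i], 1, Length( files[i] ) - 4 ) );
		Insert Into( tables, dt );
	)
);

// Concatenate needs a first table plus at least one more
If( N Items( tables ) < 2,
	Throw( "Expected at least two PDF files in " || folder )
);

// Stack all tables; Source Table records each table's name
concat = tables[1] << Concatenate(
	tables[2 :: N Items( tables )],
	Create Source Column,
	Output Table( "Concat" )
);

// One row per batch, one column per characteristic
final = concat << Split(
	Split By( :Characteristic ),
	Split( :Value ),
	Group( :Source Table ),
	Remaining Columns( Drop All ),
	Sort by Column Property,
	Output Table( "Final" )
);

// Close everything except the final table
For( i = 1, i <= N Items( tables ), i++,
	Close( tables[i], No Save )
);
Close( concat, No Save );
```

If any PDF deviates from the common layout, the fixed Rect region will pick up the wrong cells, so it is worth spot-checking a few rows of the Final table against the source files.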