What a Mess! Cleaning Up Imported PDF Data with Column Formulas (2025-US-30MP-25...

Most of our data in R&D comes from databases, Excel, or CSV files, or is entered directly into JMP. But supporting data often comes in other formats, most notably PDF files. Think of chemical reference tables, vendor information sheets, or even historic company data that now exists only in PDFs. This type of supporting information can be a helpful complement to experimental data, and thankfully, JMP has a PDF Import Wizard for bringing it in.

However, because these tables were not formatted for statistical software like JMP, importing data from PDFs often produces messy data tables that almost always need additional cleanup. This can be frustrating, especially since it can take a lot of time to disentangle even data that imports relatively cleanly.

As with many messy data problems, column formulas are the way to go for cleaning up imported PDF data. This is especially true when you record the steps in Workflow Builder, so that you can repeat them on another PDF.

If you want to learn the basics of the PDF Import Wizard, if you keep getting stuck on data cleanup after importing PDF data, if you'd like to learn how to use flag columns and column subscripts to clean up data, or if you want to learn how to loop PDF cleanup steps over multiple pages, this presentation is for you. Once you know these tips and tricks, you won’t shy away from cleaning up messy PDF data anymore!