Creating a Corpus for text analysis using abstracts only from a directory of PDF files
Sep 11, 2019 2:41 PM
I am currently working on a systematic literature review. I have been playing around with the text analysis platform in JMP and it is great (straightforward, easy to use, fast, configurable... YES!). Right now, I am stuck on what seems to be the most basic pre-analysis step: creating the JMP table that I'll use for the analysis.
Currently, I have an EndNote library with citation information for ~3,000 scientific papers that I would like to analyze. I can export this citation information in any number of formats (.xls, .csv, .xml, .RIS, etc.). The EndNote library also has a folder that contains the PDF for each reference (each in its own subfolder).
I'd like to run text analyses on 3 different subsets of information for each article: 1) title only, 2) abstract only, 3) full text. I can easily export the citation information from EndNote to create a JMP file with title and abstract fields, which covers the first two. However, I can't figure out an easy way to add a field to this same file that contains ALL of the text from each article's PDF, for the full-text analysis.
Does anyone have some advice on how to get this done easily, without a bunch of cutting and pasting, and with a minimal number of steps?
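One way to avoid the cutting and pasting is to script the folder walk outside JMP and then open the result as a table. The sketch below is only an illustration under stated assumptions: it uses Python with the third-party pypdf package (`PdfReader`, `page.extract_text()`) for text extraction, assumes one PDF per reference subfolder as described above, and writes a CSV with hypothetical columns `folder`, `file`, and `full_text`. Matching those rows to the EndNote export (for example with a JMP table join) still requires a shared key; treating the subfolder name as that key is an assumption, not something EndNote is known to guarantee.

```python
import csv
from pathlib import Path
from typing import Callable, Iterator


def pypdf_text(pdf_path: Path) -> str:
    """Extract all page text from one PDF with the third-party pypdf package."""
    from pypdf import PdfReader  # assumption: installed via `pip install pypdf`

    reader = PdfReader(str(pdf_path))
    # extract_text() may return little or nothing for scanned (image-only) pages
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def iter_pdf_rows(root: Path, extract: Callable[[Path], str] = pypdf_text) -> Iterator[dict]:
    """Yield one row per PDF found anywhere under root (e.g. one subfolder per reference)."""
    pdfs = sorted(p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".pdf")
    for pdf_path in pdfs:
        try:
            text = extract(pdf_path)
        except ImportError:
            raise  # a missing pypdf install should stop the run, not blank every row
        except Exception as exc:  # keep going if a single PDF is damaged or encrypted
            print(f"Could not read {pdf_path}: {exc}")
            text = ""
        yield {
            "folder": pdf_path.parent.name,  # hypothetical join key: the reference's subfolder
            "file": pdf_path.name,
            "full_text": " ".join(text.split()),  # collapse line breaks so each PDF fits one cell
        }


def write_corpus_csv(root: Path, out_csv: Path,
                     extract: Callable[[Path], str] = pypdf_text) -> int:
    """Write folder/file/full_text rows to a UTF-8 CSV and return the number of rows."""
    count = 0
    with out_csv.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["folder", "file", "full_text"])
        writer.writeheader()
        for row in iter_pdf_rows(root, extract):
            writer.writerow(row)
            count += 1
    return count
```

Usage would be something like `write_corpus_csv(Path("<library>.Data/PDF"), Path("fulltext_corpus.csv"))`, where both paths are placeholders for your own EndNote PDF folder and output file. The resulting CSV opens in JMP as a table whose `full_text` column can be joined to the exported citation table and fed to the text analysis platform. The `extract` parameter is there so a different PDF-to-text tool can be swapped in without touching the folder-walking logic.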
Thanks in advance for any help that you can provide.