The Word Import Tool add-in allows you to import text, tables, or images from a Word document. It can also create a table summarizing one or more Word documents.
Import text from a Word document for use in Text Explorer.
Import individual tables from a Word document as a JMP Data Table. This feature includes a preview mode for each table with the opportunity to change column headers and data types before importing.
Import images from a Word document as an Expression column. Source and page number columns are included. Titles and descriptions are pulled from each image's Alt Text properties.
Get a summary of a group of Word documents as a JMP Data Table. This information comes from OS-level file properties as well as Microsoft Word-specific properties.
Changelog:
Version 1.1 (July 13, 2017)
Version 1.2 (August 13, 2020)
Version 1.3 (September 6, 2024)
Cool tool! What a great way to rediscover the collection of DOCX files that has been growing in my Downloads folder.
First of all, thank you for this add-in. It has helped me greatly!
My question is whether there is a limit on the number of Word docs it can import? When I use the Add-in to import .docx files, it tends to stop around 300 - 350 docs (I've had as many as 500 - 600 to import in a folder, so I have to do it in 2 stages). Maybe the limit is on my end with my computer, but I thought I'd check first to see if it has something to do with the Add-in itself.
Thanks in advance for any direction.
Hi @tbidwell,
I am glad to hear you find this add-in useful!
My add-in utilizes a JSL function, Pick File(), to prompt users to select the Word document files. There is a buffer limit when using this function that can cause some selected files to be excluded. This limit relates to the total length of all of the paths, so there is no set limit on the number of files. Is the path to your Word documents fairly long? If so, you could try moving them to a shorter path to be able to import them all (although it may be easier to just import twice).
One thing I could do is add the option to select a directory and import all documents within that directory instead of requiring you to select all files. Is this something you would find useful?
Also, which of the four features of this add-in are you using?
Kind regards,
Justin
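To illustrate why the cutoff depends on path length rather than file count, here is a minimal Python sketch that groups file paths into batches whose combined character count stays under a budget. The function name `batch_by_total_length`, the `max_chars` value, and the one-separator-per-path cost are all assumptions made for illustration; the actual buffer size used by Pick File() is not stated in this thread.

```python
from typing import Iterable, List


def batch_by_total_length(paths: Iterable[str], max_chars: int) -> List[List[str]]:
    """Group paths so that each batch's combined length stays within max_chars.

    Each path is charged len(path) + 1, assuming one separator character
    per path (an assumption, not a documented detail of Pick File()).
    """
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for path in paths:
        cost = len(path) + 1
        # Start a new batch when adding this path would exceed the budget.
        if current and used + cost > max_chars:
            batches.append(current)
            current, used = [], 0
        current.append(path)
        used += cost
    if current:
        batches.append(current)
    return batches
```

With a fixed budget, long network paths fill each batch after fewer files than short local paths do, which matches the behavior described above: the same folder contents can import fully from a short path and only partially from a deep one.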
Hi @tbidwell,
Check out the new version of this add-in I just posted (v1.1). I added the ability to choose a directory and import all docx files (non-recursively) within the directory.
Thanks,
Justin
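For readers who want to see what "non-recursively" means in practice, the following Python sketch lists the .docx files directly inside one directory without descending into sub-folders. It is not the add-in's JSL code; the function name `list_docx_files` is hypothetical, and skipping names that start with `~$` (the temporary owner files Word creates while a document is open) is an extra choice made here, not something the add-in is known to do.

```python
from pathlib import Path
from typing import List


def list_docx_files(directory: str) -> List[Path]:
    """Return .docx files found directly in `directory` (sub-folders are ignored)."""
    found = []
    for entry in Path(directory).iterdir():  # iterdir() does not recurse
        if not entry.is_file():
            continue
        if entry.suffix.lower() != ".docx":
            continue
        if entry.name.startswith("~$"):  # assumption: skip Word's temporary owner files
            continue
        found.append(entry)
    return sorted(found)
```

Because only the directory is chosen, the total length of the individual file paths never has to pass through a file picker.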
Hi @Justin_Chilton,
Thank you for your response. The problem is definitely the length of the path. The files were stored on a network drive down in a number of sub-folders. Adding the option to choose a directory would work perfectly. Thank you for the updated version. I haven't tried it yet, but I'm sure it is exactly what I need. So far, I have only used the option to import an entire Word doc.
Hi @Justin_Chilton,
A quick question on the Word Import tool. I am importing some Word docs and it seems that sometimes the spaces between some of the words are eliminated. Is there any reason why this could happen or a way I can avoid it? When it happens, two words are stuck together and it makes my analysis of the Text get all messed up. The weird thing is that in the same doc, this only happens to a few of the words, not all of them. Of course, it happens to the words I'm most interested in. (Murphy's Law in action!).
In the case I have, the Word docs are letters we've sent to customers. The spaces between all words appear to be there in Word, but in the JMP table the spaces are sometimes not there. If it helps I could probably email you an example.
Hi @tbidwell,
Thanks for reaching out (and sending me a sample file via e-mail).
I believe the issue happened when the text before and after white-space characters had different formatting or spellcheck errors. This caused the white-space to be reported on its own in the Word doc, which got collapsed in JMP's XML parser.
In short, I changed it so that if there is no text for an element (meaning there could have been white-space), I use a single space instead. This seems to fix your issue, and separates these words for when you are doing text analysis.
Let me know how this works for you!
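The fix described above can be sketched in Python. This is an illustration, not the add-in's JSL code. The sample paragraph is hand-written WordprocessingML in which a formatting change puts the space in its own run, and the function name `paragraph_text` and its `fix` flag are hypothetical. Python's parser keeps whitespace-only text, so the sketch simulates a collapsing parser by treating whitespace-only text as empty.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Three runs: plain text, a whitespace-only run, then bold text.
SAMPLE = (
    f'<w:p xmlns:w="{W}">'
    '<w:r><w:t>warranty</w:t></w:r>'
    '<w:r><w:t xml:space="preserve"> </w:t></w:r>'
    '<w:r><w:rPr><w:b/></w:rPr><w:t>claim</w:t></w:r>'
    '</w:p>'
)


def paragraph_text(xml: str, fix: bool = True) -> str:
    """Join the text of all w:t elements in one paragraph."""
    root = ET.fromstring(xml)
    parts = []
    for t in root.iter(f"{{{W}}}t"):
        text = t.text or ""
        if not text.strip():
            # Simulated collapse: a whitespace-only element arrives as empty.
            # With the fix, substitute a single space; without it, keep nothing.
            text = " " if fix else ""
        parts.append(text)
    return "".join(parts)


print(paragraph_text(SAMPLE, fix=False))  # prints "warrantyclaim"
print(paragraph_text(SAMPLE, fix=True))   # prints "warranty claim"
```

This also shows why only some words were affected: the space is lost only where it sits alone in its own element, which happens at formatting or spellcheck boundaries.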
I know it's been a bit since you've checked this forum, but hoping you might be able to help me out.
When using the "Import text" function of the tool, I notice that the header and footer are excluded from the text extracted from the document. Is there a setting to modify this, or a quick fix to the underlying .jsl file that I could implement?
Thanks
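As background for the header/footer question: a .docx file is a ZIP package, and header and footer text is stored in parts separate from the main body (word/document.xml). The Python sketch below reads that text directly from the package. It is not a patch for the add-in's .jsl file, and it assumes the part names follow the word/header1.xml and word/footer1.xml pattern that Word typically produces. Strictly, these parts are located through the package's relationship files, which the sketch does not consult. The function name `header_footer_text` is hypothetical.

```python
import re
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Assumption: header/footer parts use Word's usual naming convention.
PART_NAME = re.compile(r"word/(header|footer)\d*\.xml$")


def header_footer_text(docx) -> Dict[str, str]:
    """Map each header/footer part name to its text, one line per paragraph.

    `docx` may be a file path or a binary file-like object.
    """
    result: Dict[str, str] = {}
    with zipfile.ZipFile(docx) as package:
        for name in sorted(package.namelist()):
            if not PART_NAME.match(name):
                continue
            root = ET.fromstring(package.read(name))
            paragraphs = []
            for p in root.iter(f"{{{W}}}p"):
                # Concatenate every text run inside the paragraph.
                paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W}}}t")))
            result[name] = "\n".join(paragraphs)
    return result
```

An importer that reads only word/document.xml would skip this text, which is consistent with the behavior reported above, though whether that is how the add-in works is not confirmed in this thread.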