
Word Import Tool

 

Description 

The Word Import Tool add-in allows you to import text, tables, or images from a Word document. It can also create a table summarizing one or more Word documents.

Launch Window

Import Text

Import text from a Word document for use in Text Explorer.

Import Text.png

Import Tables

Import individual tables from a Word document as JMP data tables. This feature includes a preview mode for each table, where you can change column headers and data types before importing.

Import Tables.png

Import Images

Import images from a Word document as an Expression column. Source and page number columns are included. Titles and descriptions are pulled from each image's Alt Text properties.

Import Images.png

Document Summary

Get a summary of a group of Word documents as a JMP Data Table. This information comes from OS-level file properties as well as Microsoft Word-specific properties.

 

 Document Summary.png
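To illustrate the two kinds of properties mentioned above, here is a minimal Python sketch. It is not the add-in's JSL code. It assumes the Word-specific properties are read from the `docProps/core.xml` part inside the .docx package and the OS-level properties from the file system. The function name `summarize_docx` and the column names are hypothetical.

```python
import os
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the core-properties part of an Office Open XML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Hypothetical column name -> element in docProps/core.xml.
WORD_FIELDS = [
    ("title", "dc:title"),
    ("author", "dc:creator"),
    ("last_modified_by", "cp:lastModifiedBy"),
    ("created", "dcterms:created"),
]


def summarize_docx(path):
    """Return one summary row (a dict) for a single .docx file."""
    stat = os.stat(path)
    # OS-level file properties.
    row = {
        "file": os.path.basename(path),
        "size_bytes": stat.st_size,
        "modified_epoch": stat.st_mtime,
    }
    # Word-specific properties; a .docx file is a zip archive.
    with zipfile.ZipFile(path) as archive:
        try:
            core = ET.fromstring(archive.read("docProps/core.xml"))
        except KeyError:
            core = None  # the package has no core-properties part
    for column, tag in WORD_FIELDS:
        node = core.find(tag, NS) if core is not None else None
        row[column] = node.text if node is not None else None
    return row
```

Calling `summarize_docx` on each file in a folder and collecting the rows would give a table with one row per document, similar in spirit to the summary table shown here.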

Changelog:

Version 1.1 (July 13, 2017)

  • Added ability to select a directory of files as an alternative to the previous method only allowing file selection. The two options (Select File(s) and Select Directory) appear as dropdowns when you click any of the buttons.

Version 1.2 (August 13, 2020)

  • Fixed issue reported by @tbidwell where spaces may be removed if white-space is formatted differently than surrounding text in the Word doc.

Comments

Cool tool! What a great way to rediscover the collection of DOCX files that has been growing in my Downloads folder.

First of all, thank you for this add-in.  It has helped me greatly!

 

My question is whether there is a limit on the number of Word docs it can import?  When I use the Add-in to import .docx files, it tends to stop around 300 - 350 docs (I've had as many as 500 - 600 to import in a folder, so I have to do it in 2 stages). Maybe the limit is on my end with my computer, but I thought I'd check first to see if it has something to do with the Add-in itself.

 

Thanks in advance for any direction.

Hi @tbidwell,

I am glad to hear you find this add-in useful!

My add-in utilizes a JSL function, Pick File(), to prompt users to select the Word document files. There is a buffer limit when using this function that can cause some selected files to be excluded. This limit relates to the total length of all of the paths, so there is not a set limit on the number of files. Is the path to your Word documents fairly long? If so, you could try moving them to a shorter path to be able to import them all (although, it may be easier to just import twice).

One thing I could do is add the option to select a directory and import all documents within that directory instead of requiring you to select all files. Is this something you would find useful? 

Also, which of the four features of this add-in are you using?

Kind regards,

Justin

Hi @tbidwell,

Check out the new version of this add-in I just posted (v1.1). I added the ability to choose a directory and import all docx files (non-recursively) within the directory.
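For readers curious what a non-recursive directory selection looks like, here is a rough Python sketch. The add-in itself is written in JSL, so this is only an illustration. The function name `list_docx_files` is hypothetical, and skipping Word's `~$` lock files is an assumption on my part, not something the add-in is known to do.

```python
from pathlib import Path


def list_docx_files(directory):
    """Return the .docx files directly inside `directory`, ignoring subfolders."""
    folder = Path(directory)
    return sorted(
        p
        for p in folder.iterdir()  # iterdir() does not descend into subfolders
        if p.is_file()
        and p.suffix.lower() == ".docx"
        and not p.name.startswith("~$")  # assumption: skip Word's temporary lock files
    )
```

Because only the directory path is passed in, the total length of the individual file paths no longer matters at selection time.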

Thanks,

Justin

Hi @Justin_Chilton,

 

Thank you for your response.  The problem is definitely the length of the path. The files were stored on a network drive, several sub-folders deep.  Adding the option to choose a directory would work perfectly.  Thank you for the updated version.  I haven't tried it yet, but I'm sure it is exactly what I need.  So far, I have only used the option to import an entire Word doc.

Hi @Justin_Chilton,

 

A quick question on the Word Import tool.  I am importing some Word docs and it seems that sometimes the spaces between some of the words are eliminated.  Is there any reason why this could happen or a way I can avoid it?  When it happens, two words are stuck together and it makes my analysis of the Text get all messed up.  The weird thing is that in the same doc, this only happens to a few of the words, not all of them.  Of course, it happens to the words I'm most interested in. (Murphy's Law in action!).

 

In the case I have, the Word docs are letters we've sent to customers.  The spaces between all words appear to be there in Word, but in the JMP table the spaces are sometimes not there.  If it helps I could probably email you an example.

Hi @tbidwell,

Thanks for reaching out (and sending me a sample file via e-mail).

 

I believe the issue happened when text before and after white-space characters had different formatting or spellcheck errors. This caused the white-space to be reported on its own in the Word doc, and it then got collapsed by JMP's XML parser.

 

In short, I changed it so that if there is no text for an element (meaning there could have been white-space), I use a single space instead. This seems to fix your issue, and it keeps these words separate when you are doing text analysis.
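The idea can be sketched in Python as follows. This is only an illustration under stated assumptions, not the add-in's JSL code: the sample paragraph is made up, the function names are hypothetical, and because Python's parser keeps whitespace-only text, the collapsing behavior is simulated explicitly.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Made-up paragraph: the space sits in its own run because it is bold.
SAMPLE = (
    f'<w:p xmlns:w="{W_NS}">'
    '<w:r><w:t>Dear</w:t></w:r>'
    '<w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve"> </w:t></w:r>'
    '<w:r><w:t>customer</w:t></w:r>'
    '</w:p>'
)


def run_texts(paragraph_xml):
    """Return the text of each w:t element, dropping whitespace-only text."""
    root = ET.fromstring(paragraph_xml)
    texts = []
    for t in root.iter(f"{{{W_NS}}}t"):
        text = t.text or ""
        # Simulate a parser that collapses whitespace-only text to nothing.
        texts.append(text if text.strip() else "")
    return texts


def join_naive(texts):
    """Concatenate the pieces as-is; empty pieces silently vanish."""
    return "".join(texts)


def join_with_space_fallback(texts):
    """Concatenate the pieces, using a single space for any empty piece."""
    return "".join(t if t else " " for t in texts)
```

With the sample above, `join_naive(run_texts(SAMPLE))` gives `Dearcustomer`, while `join_with_space_fallback(run_texts(SAMPLE))` gives `Dear customer`. One trade-off of this approach is that an element that was truly empty also becomes a space, which is usually harmless for text analysis.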

 

Let me know how this works for you!

Hi @Justin_Chilton,

Works great!  Thank you so much for this update.
