Level: Intermediate
Job Function: Analyst / Scientist / Engineer
Markus Schafheutle, Associate Research Fellow, Chemometrics, allnex
Mahmoud Hammoud, JMP Senior Systems Engineer, SAS
Johann Billiani, R&D Group Leader for WB Topcoat Technology, allnex
Recent advances in processor technology have made it possible to analyze large amounts of free-form text data, a task that requires computationally intensive (sparse) matrix-based techniques. Tools like JMP Text Explorer allow users to explore and model this type of data from the comfort of their own laptops. In this paper, we show a workflow that was used to analyze textual patent data (descriptions, summaries, ratings) acquired from a patent database in order to create a patent landscape. This landscape is intended to give a better understanding of current industry trends and to help identify “anomalies” or gaps, i.e., areas that are not being researched much. Special emphasis will be placed on the importance of sharpening the parsing procedure (e.g., stop word list, phrase list) in order to better separate the signal from the noise. We also share tips and tricks for text analysis based on our experience working with this kind of data.
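To make the parsing step concrete, the following is a minimal sketch in plain Python of how a stop word list and a phrase list shape a sparse document-term matrix. It is not how JMP Text Explorer is implemented; the stop words, phrases, sample texts, and the function names `tokenize` and `document_term_matrix` are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical lists; in practice they are refined iteratively for the corpus.
STOP_WORDS = {"a", "the", "of", "and", "for", "in", "with", "said", "wherein"}
PHRASES = ["water based", "top coat"]


def tokenize(text, stop_words=STOP_WORDS, phrases=PHRASES):
    """Lowercase the text, join listed phrases into single terms, drop stop words."""
    text = text.lower()
    for phrase in phrases:
        # Word boundaries prevent matches inside longer words.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, phrase.replace(" ", "_"), text)
    tokens = re.findall(r"[a-z_]+", text)
    return [t for t in tokens if t not in stop_words]


def document_term_matrix(docs):
    """Return (vocabulary, rows); each row is a sparse {term_index: count} dict."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    index = {term: i for i, term in enumerate(vocab)}
    rows = [{index[t]: c for t, c in cnt.items()} for cnt in counts]
    return vocab, rows


# Made-up example texts, not real patent data.
docs = [
    "A water based top coat for wood",
    "The coating composition of said resin",
]
vocab, rows = document_term_matrix(docs)
```

Only nonzero counts are stored per document, which is what keeps the matrix manageable when the vocabulary is large. Adding a term to the stop word list removes a column from the matrix, while adding a phrase merges several columns into one more meaningful term.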