Text Mining and Plotting Tools for KSA / DS / HEI Research Study


By Corey S Seliger

Purdue University

This publication comprises the source code for several text mining utilities written against the Stanford CoreNLP library, along with scripts that plot the formatted output of those programs.


Version 1.0, published on 25 Jul 2018, archived on 25 Aug 2018. doi:10.4231/R7MK6B49

Licensed under GNU General Public License 3.0


This publication contains three major utilities:
1) An example web scraper used to pull position descriptions from websites.
2) TextCleanupTools, a set of Java programs written against the Stanford CoreNLP library to parse and analyze unstructured text from position descriptions and course curricula.
3) A set of R scripts used to plot data points extracted from the analyzed text.

Cite this work

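Researchers should cite this work as follows:

Seliger, C. S. (2018). Text Mining and Plotting Tools for KSA / DS / HEI Research Study. Purdue University Research Repository. doi:10.4231/R7MK6B49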


The Purdue University Research Repository (PURR) is a university core research facility provided by the Purdue University Libraries, the Office of the Executive Vice President for Research and Partnerships, and Information Technology at Purdue (ITaP).