Regular Expression Dictionaries Derived from Data Scientist Positions and Course Curriculum


By Corey S Seliger

Purdue University

These dictionaries are ready for use with Stanford CoreNLP to classify data scientist, statistics, and technology phrases.


Version 1.0, published on 25 Jul 2018, archived on 25 Aug 2018. doi:10.4231/R7R78CGR

Licensed under CC0 1.0 Universal


These dictionaries were used in a study of the relationship between employer requirements for data scientists and the curriculum offered by higher education institutions. They are provided as a resource for anyone wishing to replicate that study.
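Because the dictionaries are described as ready for use with Stanford CoreNLP, the sketch below illustrates the general idea of applying a regular expression dictionary to tokenized text. It is only a rough approximation under stated assumptions: it assumes a tab-separated "pattern, then label" layout like the one CoreNLP's RegexNER mapping files use, where each space-separated piece of the pattern is matched against one token. The sample entries, the labels, and the function names `load_mapping` and `classify` are hypothetical and are not taken from the published files. Matching here ignores case, which is a choice made for the sketch.

```python
import re

# Hypothetical sample entries, NOT taken from the published dictionaries.
# Assumed layout: token-level regex pattern, a tab, then a class label.
SAMPLE_MAPPING = (
    "machine learning\tDATA_SCIENCE\n"
    "(linear|logistic) regression\tSTATISTICS\n"
    "hadoop|spark\tTECHNOLOGY\n"
)

def load_mapping(text):
    """Parse 'pattern<TAB>label' lines into (token_patterns, label) rules."""
    rules = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        pattern, label = line.split("\t")[:2]
        # One compiled regex per space-separated token in the pattern.
        token_patterns = [re.compile(p, re.IGNORECASE) for p in pattern.split(" ")]
        rules.append((token_patterns, label))
    return rules

def classify(tokens, rules):
    """Return (phrase, label) pairs for every token span matched by a rule."""
    matches = []
    for start in range(len(tokens)):
        for token_patterns, label in rules:
            end = start + len(token_patterns)
            if end > len(tokens):
                continue
            span = tokens[start:end]
            # Every token in the span must fully match its own regex.
            if all(p.fullmatch(t) for p, t in zip(token_patterns, span)):
                matches.append((" ".join(span), label))
    return matches

rules = load_mapping(SAMPLE_MAPPING)
tokens = "Experience with Spark and logistic regression required".split()
result = classify(tokens, rules)
print(result)
```

In an actual replication, the published dictionary files would be supplied to CoreNLP itself instead of this stand-in, and their real column layout should be checked before use.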

Cite this work

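Researchers should cite this work as follows:

Seliger, C. S. (2018). Regular Expression Dictionaries Derived from Data Scientist Positions and Course Curriculum. Purdue University Research Repository. doi:10.4231/R7R78CGR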

The Purdue University Research Repository (PURR) is a university core research facility provided by the Purdue University Libraries, the Office of the Executive Vice President for Research and Partnerships, and Information Technology at Purdue (ITaP).