This dataset of part-of-speech (POS) tagged building codes contains 1,522 sentences from Chapters 5 and 10 of the 2015 International Building Code. It uses the original Penn Treebank tag set for the POS tags. It includes tagging results from 5 human annotators and 7 machine taggers, along with the most commonly chosen POS tag for each word among the machine taggers and among the human annotators. For detailed explanations of the POS tags, please refer to "Building a Large Annotated Corpus of English: The Penn Treebank" [1]. For an explanation of how this dataset was developed, please refer to the following paper [2].
1. Marcus, M. P., Santorini, B., and Marcinkiewicz, M. A. (1993). "Building a Large Annotated Corpus of English: The Penn Treebank." Computational Linguistics, 19(2), 313-330.
2. Xue, X., and Zhang, J. (2019). "Evaluation of Seven Part-of-Speech Taggers in Tagging Building Codes: Identifying the Best Performing Tagger and Common Sources of Errors." Proc., ASCE Construction Research Congress, ASCE, Reston, VA, submitted.
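As an illustration of what "the most commonly chosen POS tag for each word" could mean in practice, the following is a minimal sketch of a majority vote over the tags assigned to one word. The function name `most_common_tag`, the example tags, and the tie-breaking rule (the first tag in the list wins a tie) are assumptions made for this sketch; the dataset description does not state how ties were resolved.

```python
from collections import Counter


def most_common_tag(tags):
    """Return the tag that appears most often in `tags`.

    Assumption for this sketch: when several tags tie for the highest
    count, the one that appears earliest in the list is returned.
    """
    if not tags:
        raise ValueError("at least one tag is required")
    counts = Counter(tags)
    best = max(counts.values())
    # Walk the tags in their original order so ties resolve predictably.
    for tag in tags:
        if counts[tag] == best:
            return tag


# Hypothetical Penn Treebank tags from seven taggers for a single word.
machine_tags = ["NN", "NN", "VB", "NN", "JJ", "NN", "VB"]
print(most_common_tag(machine_tags))
```

The same function could be applied separately to the five human annotations and to the seven machine tagger outputs for each word.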
Cite this work
Researchers should cite this work as follows: