Assess feasibility and potential utility of natural language processing (NLP) for storing and analyzing occupational health data.
Methods:
Basic NLP lexical analysis methods were applied to 89,000 Mine Safety and Health Administration (MSHA) free text records. Steps included tokenization, term and co-occurrence counts, term annotation, and identifying exposure–health effect relationships. Presence of terms in the Unified Medical Language System (UMLS) was assessed.
Results:
The methods efficiently demonstrated common exposures, health effects, and exposure–injury relationships. Many workplace terms are not present in UMLS or map inaccurately.
Conclusions:
Use of free text rather than narrowly defined numerically coded fields is feasible, flexible, and efficient. It has potential to encourage workers and clinicians to provide more data and to support automated knowledge creation. The lexical method used is easily generalizable to other areas. The UMLS vocabularies should be enhanced to be relevant to occupational health.
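
The lexical steps named in the Methods (tokenization, term and co-occurrence counts, term annotation, and exposure–health effect pairing) can be illustrated with a minimal sketch. This is not the pipeline used in the study: the sample narratives, the stopword list, the annotation lexicon, and all function names below are hypothetical and chosen only to show the general idea on free text.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical narratives for illustration; these are NOT actual MSHA records.
RECORDS = [
    "Employee slipped on ice and fractured left wrist.",
    "Miner inhaled dust while drilling; developed cough.",
    "Employee slipped on wet ladder and sprained ankle.",
]

# Hypothetical annotation lexicon (assumed): term -> category.
LEXICON = {
    "ice": "exposure", "dust": "exposure", "ladder": "exposure",
    "fractured": "health_effect", "cough": "health_effect",
    "sprained": "health_effect",
}

# Small assumed stopword list; a real analysis would use a fuller one.
STOPWORDS = {"on", "and", "while", "left", "the", "a"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def term_counts(records):
    """Count how often each term appears across all records."""
    counts = Counter()
    for record in records:
        counts.update(tokenize(record))
    return counts


def cooccurrence_counts(records):
    """Count unordered term pairs that appear in the same record."""
    pairs = Counter()
    for record in records:
        terms = sorted(set(tokenize(record)))
        pairs.update(combinations(terms, 2))
    return pairs


def exposure_effect_pairs(records, lexicon):
    """Count (exposure, health effect) pairs annotated within one record."""
    pairs = Counter()
    for record in records:
        terms = set(tokenize(record))
        exposures = sorted(t for t in terms if lexicon.get(t) == "exposure")
        effects = sorted(t for t in terms if lexicon.get(t) == "health_effect")
        pairs.update((e, h) for e in exposures for h in effects)
    return pairs


if __name__ == "__main__":
    print(term_counts(RECORDS).most_common(3))
    print(exposure_effect_pairs(RECORDS, LEXICON))
```

Counting pairs per record rather than per sentence is a simplifying assumption; it treats each narrative as one context in which an exposure and a health effect may be related.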