A statistical approach to term extraction

Authors

  • Rogelio Nazar
DOI: https://doi.org/10.6018/ijes/2011/2/149691
Keywords: English technical terminology, terminology extraction, computational terminography, quantitative linguistics

Abstract

This paper argues in favor of a statistical approach to terminology extraction that is general to all languages but uses language-specific parameters. In contrast to many application-oriented terminology studies, which focus on a particular language and domain, this paper adopts some general principles of the statistical properties of terms and a method to obtain the corresponding language-specific parameters. This method is used for the automatic identification of terminology and is quantitatively evaluated in an empirical study of English medical terms. The proposal is theoretically and computationally simple and does not rely on resources such as linguistic or ontological knowledge. The algorithm learns to identify terms during a training phase in which it is shown examples of both terminological and non-terminological units. With these examples, the algorithm creates a model of the terminology that accounts for the frequency of lexical, morphological and syntactic elements of the terms in relation to the non-terminological vocabulary. The model is then used to identify new terminology in previously unseen text. The comparative evaluation shows that its performance is significantly higher than that of other well-known systems.
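As a rough illustration of the training idea described in the abstract, the following minimal Python sketch counts lexical features (whole words) and crude morphological features (word-final trigrams) in examples of terms and non-terms, then scores unseen units with smoothed log-ratios. It is not the paper's algorithm: the feature set, the add-one smoothing, the function names (`features`, `train`, `score`) and the toy training data are all assumptions made for this example, and syntactic features are left out entirely.

```python
import math
from collections import Counter


def features(unit):
    """Return lexical (word) and morphological (word-final trigram) features.

    Assumption for this sketch: the last three letters of a word stand in
    for its morphology.
    """
    words = unit.lower().split()
    feats = [f"w={w}" for w in words]
    feats += [f"suf={w[-3:]}" for w in words if len(w) > 3]
    return feats


def train(terms, non_terms):
    """Count feature frequencies separately for terms and non-terms."""
    term_counts = Counter(f for u in terms for f in features(u))
    other_counts = Counter(f for u in non_terms for f in features(u))
    return term_counts, other_counts


def score(unit, model):
    """Sum of add-one-smoothed log-ratios; positive values lean towards 'term'."""
    term_counts, other_counts = model
    n_term = sum(term_counts.values())
    n_other = sum(other_counts.values())
    vocab = len(set(term_counts) | set(other_counts))
    total = 0.0
    for f in features(unit):
        p_term = (term_counts[f] + 1) / (n_term + vocab)
        p_other = (other_counts[f] + 1) / (n_other + vocab)
        total += math.log(p_term / p_other)
    return total


# Toy training examples (hypothetical, not taken from the paper's data).
terms = ["acute hepatitis", "chronic gastritis", "viral meningitis", "renal carcinoma"]
non_terms = ["last week", "many people", "other hand", "good reason"]
model = train(terms, non_terms)

# Neither candidate appears in the training examples.
for candidate in ["bacterial arthritis", "some reason"]:
    print(candidate, round(score(candidate, model), 3))
```

In this toy setting, "bacterial arthritis" receives a positive score only because the ending "-tis" is frequent among the training terms, while "some reason" scores negatively because "reason" was seen as a non-term. A realistic model would need far more training examples and, as the abstract indicates, syntactic information as well.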

Author Biography

Rogelio Nazar

was born in Mendoza, Argentina, in 1975, and currently lives in Barcelona. His initial academic background is in communication studies but, after some years of experience in the private sector in this field, he has been doing full-time research in linguistics at the Institute for Applied Linguistics at Pompeu Fabra University since 2003. His fields of interest are quantitative linguistics, corpus linguistics, computational linguistics, natural language processing, semantics and terminology.
Published
01-12-2011
How to Cite
Nazar, R. (2011). A statistical approach to term extraction. International Journal of English Studies, 11(2), 159–182. https://doi.org/10.6018/ijes/2011/2/149691