INVESTIGATING TYPE-TOKEN REGRESSION AND ITS POTENTIAL FOR AUTOMATED TEXT DISCRIMINATION

Authors

  • Pascual Cantos Gómez

Keywords:

Corpus linguistics, type-token regression, text typology, automated text classification

Abstract

The present paper is motivated by the intuition that data on lexical density alone, drawn from text samples of various languages, authors, linguistic domains, etc., might be a potential indicator for automated text discrimination. In order to find a reliable and valid lexical density index, we review and clarify the mathematical relationship between types (word forms) and tokens (words) by discussing and constructing adequate regression models that might help to differentiate text types from each other. Additionally, we use multivariate statistical models (cluster analysis and discriminant analysis) to complement the mathematical lexical density regression model (TYT-formula).
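As a rough illustration of what a type-token regression can look like, the sketch below counts cumulative types against tokens and fits a power-law curve, types ≈ a · tokens^b, by least squares on the logarithms. This is an assumption made for illustration only: the abstract does not state the form of the TYT-formula, and the function names, the sample sentence, and the power-law model are all hypothetical choices, not the author's method.

```python
import math


def type_token_curve(tokens, step=1):
    """Return (token_count, type_count) pairs, sampled every `step` tokens."""
    seen = set()
    points = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok.lower())  # case-folded word form counts as one type
        if i % step == 0:
            points.append((i, len(seen)))
    return points


def fit_power_law(points):
    """Ordinary least squares on log(types) = log(a) + b * log(tokens).

    Assumed model for illustration; not the TYT-formula from the paper.
    """
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(v) for _, v in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Hypothetical toy sample; real use would need far longer texts.
sample = "the cat sat on the mat and the dog sat on the rug"
points = type_token_curve(sample.split())
a, b = fit_power_law(points)
print(f"tokens={points[-1][0]} types={points[-1][1]} a={a:.3f} b={b:.3f}")
```

Under this assumed model, the fitted slope b for each text sample could serve as one candidate feature for the cluster and discriminant analyses mentioned above; a lower b would indicate that new word forms appear more slowly as the text grows.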

Author Biography

Pascual Cantos Gómez

Departamento de Filología Inglesa, Universidad de Murcia

Section

Articles