Corpus Linguistics, Network Analysis and Co-occurrence Matrices


  • Keith Stuart
  • Ana Botella
Keywords: corpus linguistics, co-occurrence matrices, semantic networks, knowledge discovery


This article describes research undertaken to design a methodology for the reticular (network-like) representation of the knowledge of a specific discourse community. To this end, a representative corpus of the scientific production of the members of this discourse community (Universidad Politécnica de Valencia, UPV) was created. The article presents the practical analysis (frequency, keyword, collocation and cluster analysis) carried out in the initial phases of the study, which aimed to establish the theoretical and practical framework for our matrix and network analysis of the scientific discourse of the UPV. The methodology section describes the processes used to extract from the corpus the linguistic elements needed to build co-occurrence matrices, as well as the computer tools used in the research. From these co-occurrence matrices, semantic networks of subject and discipline knowledge were generated. Finally, based on the results obtained, we suggest that it may be viable to extract and represent the intellectual capital of an academic institution by combining corpus linguistics methods with the formulations of network theory.
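The step from a corpus to co-occurrence matrices and then to a network can be illustrated with a minimal Python sketch. It is not the authors' tooling: the function names, the toy documents, the choice of the whole document as the co-occurrence window and the `min_weight` threshold are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, vocabulary):
    """Count, for each unordered pair of vocabulary terms,
    the number of documents in which both terms appear.
    (Assumption: the co-occurrence window is the whole document.)"""
    counts = Counter()
    for tokens in documents:
        present = sorted(set(tokens) & vocabulary)
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    return counts


def to_matrix(counts, terms):
    """Build a symmetric term-by-term co-occurrence matrix."""
    index = {term: i for i, term in enumerate(terms)}
    matrix = [[0] * len(terms) for _ in terms]
    for (a, b), n in counts.items():
        i, j = index[a], index[b]
        matrix[i][j] = n
        matrix[j][i] = n
    return matrix


def to_edges(counts, min_weight=1):
    """Turn pair counts into weighted network edges,
    keeping only pairs that co-occur at least min_weight times."""
    return sorted((a, b, n) for (a, b), n in counts.items() if n >= min_weight)


# Hypothetical toy corpus: each document is a list of already extracted terms.
documents = [
    ["corpus", "network", "matrix"],
    ["corpus", "matrix"],
    ["network", "knowledge"],
]
terms = sorted({"corpus", "network", "matrix", "knowledge"})

counts = cooccurrence_counts(documents, set(terms))
matrix = to_matrix(counts, terms)
edges = to_edges(counts, min_weight=2)
```

In this sketch each term is a node and each retained pair is a weighted edge, so the edge list can be passed to any network analysis tool. A sentence-level or fixed-width window could replace the document-level window without changing the rest of the code.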


How to Cite
Stuart, K., & Botella, A. (2009). Corpus Linguistics, Network Analysis and Co-occurrence Matrices. International Journal of English Studies, 9(3), 1–20. Retrieved from