Design and methodology of a multilingual semantic-ontological tagger: ESMAS-ES+
Supporting Agencies
- Project PID2022-137170OB-I00, funded by MICIU/AEI/10.13039/501100011033 and by FEDER/UE.
Abstract
The automatic tagger ESMAS-ES+ is designed to annotate texts in Spanish, French, German and Galician semantically and ontologically. Beyond testing the feasibility of a new method of analysis, its development involves investigating new approaches to intelligent information and knowledge processing and to the deep comprehension of meaning. This paper outlines the methodological principles behind the tagger’s design and surveys the techniques and strategies that can be used to generate sustainable linguistic, multilingual and technological knowledge. These insights will in turn support the development of tools that can be adapted to various languages. ESMAS-ES+ may benefit several areas of natural language processing, particularly those concerned with meaning comprehension and disambiguation, and may thereby improve the machine readability and automatic understanding of linguistic data.
Copyright (c) 2025 Journal of Linguistic Research

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The works published in this journal are subject to the following terms:
1. The Publications Service of the University of Murcia (the publisher) retains the economic rights (copyright) to the published works, and encourages and permits their reuse under the license indicated in point 2.
2. The papers are published in the electronic edition of the journal under a Creative Commons Attribution-NonCommercial-NoDerivative 3.0 Spain license. Papers may be copied, used, disseminated, transmitted and publicly exhibited if the following requirements are met: i) The authorship and the original source of publication (journal, publisher and URL of the work) must be cited; ii) The works cannot be used for commercial purposes; iii) The existence and specifications of this license must be explicitly mentioned.
3. Self-archiving conditions. Authors may disseminate electronically the pre-print version (the version before evaluation) and/or the post-print version (the version evaluated and accepted for publication) of their works. This allows the work to circulate earlier and may increase its citation and reach within the academic community. RoMEO color: green.



