Design and methodology of a multilingual semantic-ontological tagger: ESMAS-ES+

Authors

DOI: https://doi.org/10.6018/ril.662171
Keywords: ontologies, categorial meaning, natural language processing, sustainability

Supporting Agencies

  • Project PID2022-137170OB-I00, funded by MICIU/AEI/10.13039/501100011033 and by FEDER/UE.

Abstract

The automatic tagger ESMAS-ES+ is designed to annotate texts in Spanish, French, German and Galician at the semantic and ontological levels. Beyond testing the feasibility of a new method of analysis, developing the tagger involves investigating new approaches to intelligent information and knowledge processing and to the deep comprehension of meaning. This paper outlines the methodological principles behind the tagger’s design and gives an overview of the techniques and strategies that can be used to generate sustainable linguistic, multilingual and technological knowledge. These insights will in turn support the development of tools that can be adapted to other languages. ESMAS-ES+ can have a positive impact on several areas of natural language processing, particularly those concerned with meaning comprehension and disambiguation, and can thereby improve the machine readability and machine understanding of linguistic data.


Published
27-11-2025
How to Cite
Domínguez Vázquez, M. J. (2025). Design and methodology of a multilingual semantic-ontological tagger: ESMAS-ES+. Journal of Linguistic Research, 28, 175–192. https://doi.org/10.6018/ril.662171
Issue
Section
Articles