Babelscape is committed to carrying out research and developing technological innovation at the highest level in the field of multilingual Natural Language Understanding.

Research, projects & publications

All Babelscape products are the result of significant investment in Research & Development (R&D) and stem from the following macro-projects:

Large Language Models, with the goal of creating LLM systems that are multilingually scalable, secure and grounded.

Semantic Natural Language Understanding, with the central goal of devising novel approaches that determine the meaning of a text independently of the language in which it is written, by associating words and phrases with concepts, entities, predicates and their arguments, meaning structure formalisms, relations, emotions and much more.

Personalizable Multilingual Knowledge Graphs, with the goal of creating large, multilingual, domain-oriented or need-oriented knowledge graphs starting from datasets and term banks.

TraDeInterpret, which enables the semantic interpretation of strings (trademarks) across all EU languages, based on the trademark's Nice classes and its Goods & Services description.

KnowGraphs, a scientific project involving 15 Early-Stage Researchers, 2 of whom are Babelscape employees.

Babelscape has funded 5 industrial PhD positions for its employees under research agreements with Sapienza University of Rome. The company currently collaborates with Sapienza under the Convention for the Co-financing of 1 Scholarship for the National PhD Program in Artificial Intelligence (40th Cycle, DM 630/2024).

Scientific projects

LLMs4EU - Multilingual AI for Europe’s digital future

The LLMs4EU (Large Language Models for Europe) project aims to advance multilingual and trustworthy generative AI across the European Union. Coordinated by the Alliance for Language Technologies (ALT-EDIC), the initiative brings together academic and industrial stakeholders to preserve Europe’s linguistic and cultural diversity in the digital era. LLMs4EU addresses the risk that low-resource European languages may be excluded from AI progress due to limited training data. By fostering collaboration and data sharing, the project supports the development of open, inclusive, and high-quality language models. LLMs4EU promotes the fair use of language resources and ensures that future AI technologies remain transparent, safe, and aligned with core European values.

AtLaS - European Defence Framework

The AI-based Natural Language Processing of Low-Quality and Multilingual Data in Defence Applications with Self-Adaptation (AtLaS) project aims to revolutionize Human Language Technology (HLT) in Defence applications. Led by a diverse European consortium, AtLaS develops advanced HLTs for challenging Defence contexts, with the goal of robust communication and seamless information processing. The project focuses on resilient systems that withstand noisy input and handle multiple languages. Using techniques such as denoising and the integration of neural networks with semantic knowledge, AtLaS builds solutions for effective communication across diverse Defence scenarios.

KnowGraphs - EU Initial Training Network

Knowledge graphs (KGs) are a flexible knowledge representation paradigm designed to make knowledge consumable by both humans and machines. KGs are widely regarded as a key enabler for a number of increasingly popular technologies, including Web search, question answering and personal assistants, and for AI across most sectors, including Industry 4.0, personalized medicine, legislation and economics. Several large companies now use KGs as a core component of their data products. However, while KGs are rightly praised as a key technology for all future data-driven enterprises and regarded as a promising approach towards “blurring the lines between human and machine”, they currently remain out of reach for the majority of companies and users.

https://knowgraphs.eu/

Publications

Simone Teglia, Simone Tedeschi, Roberto Navigli

How Much Do Pretrained Language Models Know About Word Senses?

Proceedings of ACL 2025

Francesco Maria Molfese, Luca Moroni, Luca Gioffrè, Alessandro Scirè, Simone Conia, Roberto Navigli

Right Answer, Wrong Score: Uncovering the Inconsistencies of LLM Evaluation in Multiple-Choice Question Answering

Findings of ACL 2025

T. Nakamura, M. Mishra, S. Tedeschi, Y. Chai, J. Stillerman, F. Friedrich, P. Yadav, T. Laud, V. Chien, T. Zhuo, D. Misra, B. Bogin, X. Vu, M. Karpinska, A. Dantuluri, W. Kusa, T. Furlanello, R. Yokota, N. Muennighoff, S. Pai, T. Adewumi, V. Laippala, X. Yao, A. Junior, A. Ariyak, A. Drozd, J. Clive, K. Gupta, L. Chen, Q. Sun, K. Tsui, N. Persaud, N. Fahmy, T. Chen, M. Bansal, N. Monti, T. Dang, Z. Luo, T. Bui, R. Navigli, V. Mehta, M. Blumberg, V. May, H. Nguyen, S. Pyysalo

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order

Proceedings of COLING 2025

Luca Moroni, Giovanni Puccetti, Pere-Lluís Huguet Cabot, Andrei Stefan Bejgu, Alessio Miaschi, Edoardo Barba, Felice Dell’Orletta, Andrea Esuli, Roberto Navigli

Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation

Findings of NAACL 2025

Kenneth Enevoldsen, Isaac Chung, Imene Kerboua, Márton Kardos, Ashwin Mathur, David Stap, Jay Gala, Wissam Siblini, Dominik Krzemiński, Genta Indra Winata, Saba Sturua, Saiteja Utpala, Mathieu Ciancone, Marion Schaeffer, Gabriel Sequeira, Diganta Misra, Shreeya Dhakal, Jonathan Rystrøm, Roman Solomatin, Ömer Çağatan, Akash Kundu, Martin Bernstorff, Shitao Xiao, Akshita Sukhlecha, Bhavish Pahwa, Rafał Poświata, Kranthi Kiran GV, Shawon Ashraf, Daniel Auras, Björn Plüster, Jan Philipp Harries, Loïc Magne, Isabelle Mohr, Mariya Hendriksen, Dawei Zhu, Hippolyte Gisserot-Boukhlef, Tom Aarsen, Jan Kostkan, Konrad Wojtasik, Taemin Lee, Marek Šuppa, Crystina Zhang, Roberta Rocca, Mohammed Hamdy, Andrianos Michail, John Yang, Manuel Faysse, Aleksei Vatolin, Nandan Thakur, Manan Dey, Dipam Vasani, Pranjal Chitale, Simone Tedeschi, Nguyen Tai, Artem Snegirev, Michael Günther, Mengzhou Xia, Weijia Shi, Xing Han Lù, Jordan Clive, Gayatri Krishnakumar, Anna Maksimova, Silvan Wehrli, Maria Tikhonova, Henil Panchal, Aleksandr Abramov, Malte Ostendorff, Zheng Liu, Simon Clematide, Lester James Miranda, Alena Fenogenova, Guangyu Song, Ruqiya Bin Safi, Wen-Ding Li, Alessia Borghini, Federico Cassano, Hongjin Su, Jimmy Lin, Howard Yen, Lasse Hansen, Sara Hooker, Chenghao Xiao, Vaibhav Adlakha, Orion Weller, Siva Reddy, Niklas Muennighoff

MMTEB: Massive Multilingual Text Embedding Benchmark

ICLR 2025

Roberto Navigli, Pasquale Silvestri, Marco Lo Pinto, Dennis Rotondi, Simone Ciciliano, Alessandro Scirè

NounAtlas: Filling the Gap in Nominal Semantic Role Labeling

Proceedings of ACL 2024

Outstanding Paper Award ACL 2024

Stefano Perrella, Lorenzo Proietti, Alessandro Scirè, Edoardo Barba, Roberto Navigli

Guardians of the Machine Translation Meta-Evaluation: Sentinel Metrics Fall In!

Proceedings of ACL 2024

Andrei Stefan Bejgu, Edoardo Barba, Luigi Procopio, Alberte Fernández-Castro, Roberto Navigli

Word Sense Linking: Disambiguating Outside the Sandbox

Proceedings of ACL 2024

Alessandro Scirè, Karim Ghonim, Roberto Navigli

FENICE: Factuality Evaluation of summarization based on NLI and Claim Extraction

Proceedings of ACL 2024

Alessandro Scirè, Andrei Stefan Bejgu, Simone Tedeschi, Karim Ghonim, Federico Martelli, Roberto Navigli

Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-Oasis

arXiv

Simone Tedeschi, Felix Friedrich, Patrick Schramowski, Kristian Kersting, Roberto Navigli, Huu Nguyen, Bo Li

ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming

arXiv

Giuliano Martinelli, Francesco Maria Molfese, Simone Tedeschi, Alberte Fernández-Castro, Roberto Navigli

CNER: Concept and Named Entity Recognition

Proceedings of NAACL 2024

Simone Conia, Edoardo Barba, Abelardo Carlos Martínez Lorenzo, Pere-Lluís Huguet Cabot, Riccardo Orlando, Luigi Procopio, Roberto Navigli

MOSAICo: a Multilingual Open-text Semantically Annotated Interlinked Corpus

Proceedings of NAACL 2024

Lorenzo Proietti, Stefano Perrella, Simone Tedeschi, Giulia Vulpis, Leonardo Lavalle, Andrea Sanchietti, Andrea Ferrari, Roberto Navigli

Analyzing Homonymy Disambiguation Capabilities of Pretrained Language Models

Proceedings of LREC-COLING 2024

Iacopo Ghinassi, Simone Tedeschi, Paola Marongiu, Roberto Navigli, Barbara McGillivray

Language Pivoting from Parallel Corpora for Word Sense Disambiguation of Historical Languages: A Case Study on Latin

Proceedings of LREC-COLING 2024

Francesco Maria Molfese, Andrei Stefan Bejgu, Simone Tedeschi, Simone Conia, Roberto Navigli

CroCoAlign: A Cross-Lingual, Context-Aware and Fully-Neural Sentence Alignment System for Long Texts

Proceedings of EACL 2024

F. Martelli, A.S. Bejgu, C. Campagnano, J. Čibej, R. Costa, A. Gantar, J. Kallas, S. Peneva Koeva, K. Koppel, S. Krek, M. Langemets, V. Lipp, S. Nimb, S. Olsen, B.S. Pedersen, V. Quochi, A. Salgado, L. Simon, C. Tiberius, R. Ureña-Ruiz, R. Navigli

XL-WA: a Gold Evaluation Benchmark for Word Alignment in 14 Language Pairs

Proceedings of CLiC-it 2023

S. Tedeschi, J. Bos, T. Declerck, J. Hajic, D. Hershcovich, E.H. Hovy, A. Koller, S. Krek, S. Schockaert, R. Sennrich, E. Shutova, R. Navigli

What's the Meaning of Superhuman Performance in Today's NLU?

Proceedings of ACL 2023

Outstanding Paper Award ACL 2023

Pere-Lluís Huguet Cabot, Simone Tedeschi, Axel-Cyrille Ngonga Ngomo, Roberto Navigli

RED^FM: a Filtered and Multilingual Relation Extraction Dataset

Proceedings of ACL 2023

Alessandro Scirè, Simone Conia, Simone Ciciliano, Roberto Navigli

Echoes from Alexandria: A Large Resource for Multilingual Book Summarization

Findings of ACL 2023

Pavlo Vasylenko, Pere-Lluís Huguet Cabot, Abelardo Carlos Martínez Lorenzo, Roberto Navigli

Incorporating Graph Information in Transformer-based AMR Parsing

Findings of ACL 2023

Abelardo Carlos Martínez Lorenzo, Pere-Lluís Huguet Cabot, Roberto Navigli

AMRs Assemble! Learning to Ensemble with Autoregressive Models for AMR Parsing

Proceedings of ACL 2023

Abelardo Carlos Martínez Lorenzo, Pere-Lluís Huguet Cabot, Roberto Navigli

Cross-lingual AMR Aligner: Paying Attention to Cross-Attention

Findings of ACL 2023

Stefano Perrella, Lorenzo Proietti, Alessandro Scirè, Niccolò Campolungo, Roberto Navigli

MaTESe: Machine Translation Evaluation as a Sequence Tagging Problem

Proceedings of the Seventh Conference on Machine Translation (WMT 2022)

Sedrick Keh, Rohit Bharadwaj, Emmy Liu, Simone Tedeschi, Varun Gangal, Roberto Navigli

EUREKA: EUphemism Recognition Enhanced through Knn-based methods and Augmentation

Third Workshop on Figurative Language Processing (EMNLP 2022)

Simone Conia, Edoardo Barba, Alessandro Scirè, Roberto Navigli

Semantic Role Labeling Meets Definition Modeling: Using Natural Language to Describe Predicate Argument Structures

Findings of EMNLP 2022

Abelardo Carlos Martínez Lorenzo, Marco Maru, Roberto Navigli

Fully-Semantic Parsing and Generation: the BabelNet Meaning Representation

Proceedings of ACL 2022

Simone Tedeschi, Federico Martelli, Roberto Navigli

ID10M: Idiom Identification in 10 Languages

Findings of NAACL 2022

Simone Tedeschi, Roberto Navigli

MultiNERD: A Multilingual, Multi-Genre and Fine-Grained Dataset for Named Entity Recognition (and Disambiguation)

Findings of NAACL 2022

Simone Tedeschi, Roberto Navigli

NER4ID at SemEval-2022 Task 2: Named Entity Recognition for Idiomaticity Detection

Proceedings of SemEval 2022

Roberto Navigli, Rexhina Blloshmi, Abelardo Carlos Martínez Lorenzo

BabelNet Meaning Representation: A Fully Semantic Formalism to Overcome Language Barriers

Proceedings of AAAI-22

Pere-Lluís Huguet Cabot, Roberto Navigli

REBEL: Relation Extraction By End-to-end Language generation

Findings of EMNLP 2021: pp. 2370-2381

Rexhina Blloshmi, Michele Bevilacqua, Edoardo Fabiano, Valentina Caruso, Roberto Navigli

SPRING Goes Online: End-to-End AMR Parsing and Generation

Proceedings of EMNLP 2021: pp. 134-142

Simone Conia, Riccardo Orlando, Fabrizio Brignone, Francesco Cecconi, Roberto Navigli

InVeRo-XL: Making Cross-Lingual Semantic Role Labeling Accessible with Intelligible Verbs and Roles

Proceedings of EMNLP 2021: pp. 319-328

Riccardo Orlando, Simone Conia, Fabrizio Brignone, Francesco Cecconi, Roberto Navigli

AMuSE-WSD: An All-in-one Multilingual System for Easy Word Sense Disambiguation

Proceedings of EMNLP 2021: pp. 298-307

Simone Tedeschi, Simone Conia, Francesco Cecconi, Roberto Navigli

Named Entity Recognition for Entity Linking: What Works and What's Next

Findings of EMNLP 2021: pp. 2584-2596

Simone Tedeschi, Valentino Maiorca, Niccolò Campolungo, Francesco Cecconi, Roberto Navigli

WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER

Findings of EMNLP 2021: pp. 2521-2533

Roberto Navigli, Michele Bevilacqua, Simone Conia, Dario Montagnini, Francesco Cecconi

Ten Years of BabelNet: A Survey

Proceedings of IJCAI 2021: pp. 4559-4567

Pere-Lluís Huguet Cabot, David Abadi, Agneta Fischer, Ekaterina Shutova

Us vs. Them: A Dataset of Populist Attitudes, News Bias and Emotions

Proceedings of EACL 2021: pp. 1921-1945

Simone Conia, Fabrizio Brignone, Davide Zanfardino, Roberto Navigli

InVeRo: Making Semantic Role Labeling Accessible with Intelligible Verbs and Roles

Proceedings of EMNLP 2020: pp. 77-84

Federico Scozzafava, Marco Maru, Fabrizio Brignone, Giovanni Torrisi, Roberto Navigli

Personalized PageRank with Syntagmatic Information for Multilingual Word Sense Disambiguation

Proceedings of ACL 2020: pp. 37-46

Marco Maru, Federico Scozzafava, Federico Martelli, Roberto Navigli

SyntagNet: Challenging Supervised Word Sense Disambiguation with Lexical-Semantic Combinations

Proceedings of EMNLP/IJCNLP 2019: pp. 3532-3538