Header shape illustration 1Header shape illustration 2
Back

MOSAICo: a Multilingual Open-text Semantically Annotated Interlinked Corpus

Simone Conia, Edoardo Barba, Abelardo Carlos Martínez Lorenzo, Pere-Lluís Huguet Cabot, Riccardo Orlando, Luigi Procopio, Roberto Navigli

Abstract

Several Natural Language Understanding (NLU) tasks focus on linking text to explicit knowledge, including Word Sense Disambiguation, Semantic Role Labeling, Semantic Parsing, and Relation Extraction. In addition to the importance of connecting raw text with explicit knowledge bases, the integration of such carefully curated knowledge into deep learning models has been shown to be beneficial across a diverse range of applications, including Language Modeling and Machine Translation. Nevertheless, the scarcity of semantically-annotated corpora across various tasks and languages limits the potential advantages significantly. To address this issue, we put forward MOSAICo, the first endeavor aimed at equipping the research community with the key ingredients to model explicit semantic knowledge at a large scale, providing hundreds of millions of silver yet high-quality annotations for four NLU tasks across five languages. We describe the creation process of MOSAICo, demonstrate its quality and variety, and analyze the interplay between different types of semantic information. MOSAICo, available at https://github.com/SapienzaNLP/mosaico, aims to drop the requirement of closed, licensed datasets and represents a step towards a level playing field across languages and tasks in NLU.

June 2024, Association for Computational Linguistics

Your privacy choices

Save and continue
Sign up!
The best way to get the latest news from Babelscape and the NLP world!
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Thank you for subscribing!
You’ve been added to our mailing list, and you’ll receive our next newsletter to stay updated on the latest news from the NLP world!
Something went wrong
We are sorry, your request cannot be processed right now.
Please wait a bit and try again.
Unsubscribe
We're sorry to see you go. Please enter your email address to complete the unsubscription process.
You'll receive an email confirmation shortly.
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Check your email
We have sent you a link to your email to complete the unsubscribe process.
Something went wrong
We are sorry, your request cannot be processed right now.
Please wait a bit and try again.