Comprehendo: disambiguate and semantically tag text in hundreds of languages
API available soon

Multilingual Word Sense Disambiguation and Entity Linking with Comprehendo

Two things can make a difference in products that deal with text: semantics and multilinguality. Comprehendo delivers both: it understands text by associating explicit meanings with words, multiword expressions and phrases. These meanings come from WordAtlas and are multilingual by design.

Comprehendo is based on state-of-the-art Word Sense Disambiguation and Entity Linking, and it can be applied at large scale to any language and text genre. As a result, users can process large volumes of text (articles, blogs, posts, etc.) in multiple languages and aggregate the resulting information however they like (for instance, using Extraggo for text analytics).
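To illustrate what aggregation over disambiguated text can look like, here is a minimal sketch that assumes each document has already been reduced to a list of language-independent concept IDs. The IDs (such as "concept:election") and the documents are hypothetical placeholders, not real WordAtlas identifiers or Comprehendo output; the point is only that, once words are replaced by concepts, counting ignores the language a document was written in.

```python
from collections import Counter

# Hypothetical annotated documents: the concept IDs are illustrative placeholders,
# not real WordAtlas identifiers, and the data is invented for this sketch.
annotated_docs = [
    {"lang": "en", "concepts": ["concept:election", "concept:parliament"]},
    {"lang": "it", "concepts": ["concept:election", "concept:inflation"]},
    {"lang": "fr", "concepts": ["concept:election", "concept:parliament"]},
]

def top_concepts(docs, n=2):
    """Count concept mentions across all documents, regardless of their language."""
    counts = Counter()
    for doc in docs:
        counts.update(doc["concepts"])
    return counts.most_common(n)

print(top_concepts(annotated_docs))
# [('concept:election', 3), ('concept:parliament', 2)]
```

Because the English, Italian and French documents share concept IDs, their mentions are merged into a single count per concept.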

Comprehendo works with both standard text (sentences, paragraphs, documents, etc.) and text snippets (such as tags, word clouds, user queries, etc.), and brings two main advantages:

  • It enables the semantic aggregation of similar information written using different words, either in the same language or in different languages (for instance, what is the world talking about today in newspapers written in different languages?).
  • It enables discrimination between different uses of the same word (for example, users searching for oil might be interested in fuel or in food, depending on contextual cue words).

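The oil example can be made concrete with a toy disambiguator. This sketch uses a simplified Lesk-style overlap between the context and a set of cue words per sense; that technique is a stand-in chosen for illustration, not Comprehendo's actual method, and the sense IDs and cue words are hypothetical.

```python
# Toy sense inventory for "oil". The sense IDs and cue words are hypothetical,
# standing in for the glosses or graph neighbours a real knowledge graph would supply.
SENSES = {
    "oil": {
        "concept:oil_fuel": {"fuel", "petroleum", "barrel", "crude", "engine", "price"},
        "concept:oil_food": {"cooking", "olive", "salad", "frying", "recipe", "kitchen"},
    }
}

def disambiguate(word, context):
    """Pick the sense whose cue words overlap most with the context (simplified Lesk).

    On a tie, max() keeps the first sense listed in SENSES.
    """
    context_words = {w.strip(".,?!").lower() for w in context.split()}
    candidates = SENSES[word]
    return max(candidates, key=lambda sense: len(candidates[sense] & context_words))

print(disambiguate("oil", "crude oil price per barrel"))    # concept:oil_fuel
print(disambiguate("oil", "olive oil for a salad recipe"))  # concept:oil_food
```

A production system would draw on far richer context and knowledge than a handful of cue words, but the principle is the same: the surrounding words select the meaning.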
Disambiguation API and Features

Through its disambiguation API, Comprehendo understands text in hundreds of languages and explicitly tags ambiguous words with WordAtlas concepts and entities, with high performance and speed.
It works not only with full text but also with snippets, posts, query logs and term banks, among others.

  • 271 languages
  • explicit disambiguation
  • high performance
  • structured and unstructured text

How does it work?

Comprehendo is not a translator: it understands text and associates concepts and named entities with words and phrases. Such concepts and entities are provided by WordAtlas, our multilingual knowledge graph, which makes it possible to scale to arbitrary languages at any time. For example, given the following sentence:

The plane landed at Rome airport

Comprehendo produces the following output:

the plane landed at Rome airport

Simply by selecting a different language, the concepts and entities involved are lexicalized in the target language. For instance, reading them in Italian gives:

the plane landed at Rome airport
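A minimal sketch of why switching the display language needs no translation step: each tag points to a language-independent concept or entity, and every concept carries its own lexicalizations. The IDs, the annotation format and the two-language lexicon below are hypothetical and only mimic the idea; they are not WordAtlas's real data model.

```python
# Hypothetical multilingual entries: a concept/entity ID maps to its lexicalizations.
# IDs and structure are illustrative, not the real WordAtlas data model.
LEXICALIZATIONS = {
    "concept:airplane": {"en": "airplane", "it": "aereo"},
    "concept:land": {"en": "land", "it": "atterrare"},
    "entity:Rome": {"en": "Rome", "it": "Roma"},
    "concept:airport": {"en": "airport", "it": "aeroporto"},
}

# Hypothetical annotations for "The plane landed at Rome airport":
# each pair links a text span to a concept or entity ID.
ANNOTATIONS = [
    ("plane", "concept:airplane"),
    ("landed", "concept:land"),
    ("Rome", "entity:Rome"),
    ("airport", "concept:airport"),
]

def lexicalize(annotations, lang):
    """Pair each tagged span with the lexicalization of its concept in `lang`."""
    return [(span, LEXICALIZATIONS[cid][lang]) for span, cid in annotations]

print(lexicalize(ANNOTATIONS, "it"))
# [('plane', 'aereo'), ('landed', 'atterrare'), ('Rome', 'Roma'), ('airport', 'aeroporto')]
```

The annotations themselves never change; only the language used to look up each concept's name does.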

Thanks to the tight integration of Comprehendo and WordAtlas, you can unify textual content expressed in different languages. For instance, consider the following queries in your search engine:

computer game addiction
video game addiction
dépendance au jeu vidéo
dipendenza da videogiochi
dipendenza da giochi per computer
зависимость от компьютерных игр

All of the above queries convey the same meaning, which Comprehendo identifies as the following concepts:

computer game: a game played against a computer
addiction: being abnormally tolerant to and dependent on something that is psychologically or physically habit-forming (especially alcohol or narcotic drugs)

Again, Comprehendo does not perform machine translation: although the two concepts are named and explained in English here, they are multilingual and group together all of the lexical realizations above (and many more). As a result, web search becomes semantic and data sparsity is greatly reduced.
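The effect on search can be sketched by grouping queries by the set of concepts they express. In this minimal example a hand-written phrase lexicon with naive substring matching stands in for real disambiguation; the lexicon, the concept IDs and the helper names are hypothetical and chosen only to cover the six example queries.

```python
from collections import defaultdict

# Hypothetical phrase-to-concept lexicon standing in for a real disambiguation step.
# Concept IDs are illustrative placeholders, not real WordAtlas identifiers.
LEXICON = {
    "computer game": "concept:computer_game",
    "video game": "concept:computer_game",
    "jeu vidéo": "concept:computer_game",
    "videogiochi": "concept:computer_game",
    "giochi per computer": "concept:computer_game",
    "компьютерных игр": "concept:computer_game",
    "addiction": "concept:addiction",
    "dépendance": "concept:addiction",
    "dipendenza": "concept:addiction",
    "зависимость": "concept:addiction",
}

def concepts_of(query):
    """Return the concepts whose lexicalizations occur in the query (naive substring match)."""
    text = query.lower()
    return frozenset(cid for phrase, cid in LEXICON.items() if phrase in text)

def group_queries(queries):
    """Group queries that express exactly the same set of concepts."""
    groups = defaultdict(list)
    for query in queries:
        groups[concepts_of(query)].append(query)
    return dict(groups)

queries = [
    "computer game addiction",
    "video game addiction",
    "dépendance au jeu vidéo",
    "dipendenza da videogiochi",
    "dipendenza da giochi per computer",
    "зависимость от компьютерных игр",
]

groups = group_queries(queries)
print(len(groups))  # 1
```

All six queries, in four languages, end up under a single key, which is what lets a search engine treat them as one information need.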

Contact Us

Thank you for your interest in Babelscape. Please fill out this inquiry form to receive more information about our products.
