NLP Pipeline

large-scale, parallel, multilingual and modularized


Process text in many languages!

Our multilingual NLP Pipeline is based on a flexible API which enables effective end-to-end processing of text in the following languages:

  • Arabic
  • Chinese
  • Dutch
  • English
  • French
  • German
  • Italian
  • Japanese
  • Korean
  • Polish
  • Portuguese
  • Russian
  • Spanish

Multilinguality is a key feature of our pipeline: most modules are available in all 13 languages listed above. The pipeline also offers:

  • parallelism of independent modules
  • modularization, with an effective pipeline customized to each specific need
  • large scale, making it possible to process millions of texts in seconds
  • availability both as an online service and as an offline software package
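
To illustrate what "parallelism of independent modules" can mean in practice, here is a minimal Python sketch. It is not Babelscape's implementation or API: the module functions `tokenize` and `count_characters` and the helper `run_independent_modules` are hypothetical stand-ins. It only shows the general idea that modules which do not depend on each other's output can be run concurrently on the same input.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for pipeline modules (assumptions, not a real API).
def tokenize(text):
    """Toy tokenizer: split on whitespace."""
    return text.split()

def count_characters(text):
    """Toy module: return the length of the text."""
    return len(text)

def run_independent_modules(text, modules):
    """Run modules that do not depend on each other concurrently.

    `modules` maps a result name to a function that takes the text.
    """
    with ThreadPoolExecutor(max_workers=len(modules)) as pool:
        futures = {name: pool.submit(fn, text) for name, fn in modules.items()}
        # Collect each result under the name its module was registered with.
        return {name: future.result() for name, future in futures.items()}

results = run_independent_modules(
    "Process text in many languages",
    {"tokens": tokenize, "length": count_characters},
)
print(results)
```

A thread pool is used here only to keep the sketch short; CPU-bound modules would more plausibly run in separate processes or on separate machines.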

Modules

Our multilingual Natural Language Processing pipeline includes modules for the following tasks. Each module can be accessed separately or used as an integrated step of the pipeline:

Available online and offline
  • Language recognition
  • Tokenization
  • Morphological analysis
  • Part-of-speech tagging
  • Named Entity Recognition
Available offline
  • Term, concept and entity extraction
  • Domain labeling
  • Tag classification
  • Word Sense Disambiguation and Entity Linking
  • Semantic vector document creation
  • Semantic document similarity of sentences, paragraphs and documents
  • Sentiment analysis
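
The idea that each module can be used on its own or as a step in a customized pipeline can be sketched as follows. This is an illustrative Python example under stated assumptions, not the actual product interface: the registry `REGISTRY`, the function `build_pipeline`, and the toy module implementations are all hypothetical.

```python
# Hypothetical module implementations (toy logic, for illustration only).
def tokenization(doc):
    doc["tokens"] = doc["text"].split()
    return doc

def pos_tagging(doc):
    # Toy tagger: capitalized tokens get "PROPN", everything else "X".
    doc["pos"] = ["PROPN" if t[:1].isupper() else "X" for t in doc["tokens"]]
    return doc

# Hypothetical registry mapping module names to implementations.
REGISTRY = {"tokenization": tokenization, "pos_tagging": pos_tagging}

def build_pipeline(module_names):
    """Build a pipeline that runs only the selected modules, in order."""
    steps = [REGISTRY[name] for name in module_names]

    def run(text):
        doc = {"text": text}
        for step in steps:
            doc = step(doc)
        return doc

    return run

# Select only the module that is needed.
tokens_only = build_pipeline(["tokenization"])
doc = tokens_only("Babelscape builds NLP tools")

# Or chain modules, where a later step consumes an earlier step's output.
full = build_pipeline(["tokenization", "pos_tagging"])("Babelscape builds NLP tools")
print(doc)
print(full)
```

Keeping every module behind the same document-in, document-out signature is what lets a caller select any subset without changing the surrounding code.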

Language detector

We also offer our language detection API as a standalone service. It is intended for those who need to detect the language of a given text but do not need the full suite of NLP tools that Babelscape offers. The API is easy to use and returns its results in JSON format.
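
Since the service returns JSON, client code would typically parse the response body into native values. The Python sketch below shows that step only. The response shape is an assumption made for illustration: the field names `language` and `confidence` and the sample body are hypothetical and are not taken from the API documentation.

```python
import json

# Hypothetical response body; the field names are assumptions, not the real schema.
sample_response = '{"language": "it", "confidence": 0.98}'

def parse_detection(body):
    """Parse a (hypothetical) language detection response.

    Returns a (language_code, confidence) tuple.
    """
    data = json.loads(body)
    return data["language"], float(data["confidence"])

language, confidence = parse_detection(sample_response)
print(language, confidence)
```

Consult the service's own documentation for the actual endpoint, authentication, and response fields before adapting this sketch.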


60 languages available:

  • Arabic
  • Chinese
  • English
  • French
  • German
  • Italian
  • Spanish
  • Japanese
  • Korean
  • Portuguese
  • Afrikaans
  • Albanian
  • Basque
  • Bengali
  • Bulgarian
  • Catalan
  • Croatian
  • Czech
  • Danish
  • Dutch
  • Estonian
  • Finnish
  • Galician
  • Greek
  • Gujarati
  • Hebrew
  • Hindi
  • Hungarian
  • Indonesian
  • Irish
  • Kannada
  • Latvian
  • Lithuanian
  • Macedonian
  • Malay
  • Malayalam
  • Maltese
  • Marathi
  • Nepali
  • Norwegian (Bokmål)
  • Norwegian (Nynorsk)
  • Persian
  • Polish
  • Punjabi
  • Romanian
  • Russian
  • Simplified Chinese
  • Slovak
  • Slovenian
  • Somali
  • Swahili
  • Swedish
  • Tagalog
  • Tamil
  • Telugu
  • Thai
  • Turkish
  • Ukrainian
  • Urdu
  • Vietnamese

Features

Babelscape’s NLP pipeline is designed to work at large scale across many languages, using the same interface for each language. Users can select only the modules they need and can run dozens of tasks in parallel on the same CPU. The pipeline also integrates our flagship products, WordAtlas, Comprehendo and Extraggo, as modules, so that a full analysis of text can be performed, ranging from tokenization to semantic analysis and text analytics.

  • multilingual
  • large-scale
  • parallel
  • modular
  • flexible
  • high-performance

