Overview

Recent progress in applications of machine learning models to NLP has been driven by benchmarks that evaluate models across a wide variety of tasks. However, these broad-coverage benchmarks have been mostly limited to English, and despite an increasing interest in multilingual models, a benchmark that enables the comprehensive evaluation of such methods on a diverse range of languages and tasks is still missing.

To encourage more research on multilingual transfer learning, we introduce the Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark. XTREME covers 40 typologically diverse languages spanning 12 language families and includes 9 tasks that require reasoning about different levels of syntax or semantics.

The languages in XTREME are selected to maximize language diversity, coverage in existing tasks, and availability of training data. Among these are many under-studied languages, such as the Dravidian languages Tamil (spoken in southern India, Sri Lanka, and Singapore), Telugu and Malayalam (spoken mainly in southern India), and the Niger-Congo languages Swahili and Yoruba, spoken in Africa.

For a full description of the benchmark, languages and tasks, please see the paper "XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization."

Leaderboard results

| Model | Participant | Affiliation | Submission Date | Score | Sentence-pair Classification | Structured Prediction | Question Answering | Sentence Retrieval | Number of parameters (in millions) |
|---|---|---|---|---|---|---|---|---|---|
| Human | - | - | - | 93.3 | 95.1 | 97.0 | 87.8 | - | |
| VECO 2.0 | AliceMind | Alibaba | March 17, 2023 | 85.8 | 90.8 | 84.6 | 77.2 | 94.9 | |
| Turing ULR v6 | Alexander v-team | Microsoft | Sep 6, 2022 | 85.5 | 91.0 | 83.8 | 77.1 | 94.4 | |
| ShenNonG | Cloud Xiaowei AI | Tencent | May 22, 2022 | 85.0 | 90.4 | 83.1 | 76.3 | 94.4 | |
| Turing ULR v5 | Alexander v-team | Microsoft | Nov 24, 2021 | 84.5 | 90.3 | 81.7 | 76.3 | 93.7 | |
| CoFe | HFL | iFLYTEK | Oct 26, 2021 | 84.1 | 90.1 | 81.4 | 75.0 | 94.2 | |
| InfoXLM-XFT | Noah's Ark Lab | Huawei | Oct 5, 2021 | 82.2 | 89.3 | 75.5 | 75.2 | 92.4 | |
| VECO + HICTL | AliceMind + MT | Alibaba | Sep 21, 2021 | 82.0 | 89.0 | 76.7 | 73.4 | 93.3 | 559 |
| Ensemble-Distil-XFT (ED-XFT) | Huawei Ireland Research Center | Huawei | May 5, 2022 | 82.0 | 89.2 | 74.6 | 75.2 | 92.4 | |
| Polyglot | MLNLC | ByteDance | Apr 29, 2021 | 81.7 | 88.3 | 80.6 | 71.9 | 90.8 | |
| Unicoder + ZCode | MSRA + Cognition | Microsoft | Apr 26, 2021 | 81.6 | 88.4 | 76.2 | 72.5 | 93.7 | |
| ERNIE-M | ERNIE Team | Baidu | Jan 1, 2021 | 80.9 | 87.9 | 75.6 | 72.3 | 91.9 | 559 |
| HiCTL | DAMO MT Team | Alibaba | Mar 21, 2021 | 80.8 | 89.0 | 74.4 | 71.9 | 92.6 | |
| T-ULRv2 + StableTune | Turing | Microsoft | Oct 7, 2020 | 80.7 | 88.8 | 75.4 | 72.9 | 89.3 | 559 |
| Anonymous3 | Anonymous3 | Anonymous3 | Jan 3, 2021 | 79.9 | 88.2 | 74.6 | 71.7 | 89.0 | |
| FILTER | Dynamics 365 AI Research | Microsoft | Sep 8, 2020 | 77.0 | 87.5 | 71.9 | 68.5 | 84.4 | 559 |
| Creative | Creative | Microsoft | Sep 8, 2021 | 76.5 | 86.3 | 90.8 | 59.7 | 77.5 | |
| X-STILTs | Phang et al. | New York University | Jun 17, 2020 | 73.5 | 83.9 | 69.4 | 67.2 | 76.5 | 559 |
| xlm-roberta-large-enhanced | GTNLP | N/A | Dec 25, 2022 | 68.7 | 82.2 | 67.2 | 55.9 | 75.6 | |
| XLM-R (large) | XTREME Team | Alphabet, CMU | - | 68.2 | 82.8 | 69.0 | 62.3 | 61.6 | 559 |
| mBERT | XTREME Team | Alphabet, CMU | - | 59.6 | 73.7 | 66.3 | 53.8 | 47.7 | 178 |
| MMTE | XTREME Team | Alphabet, CMU | - | 59.3 | 74.3 | 65.3 | 52.3 | 48.9 | 190 |
| RemBERT | RemBERT Team | Alphabet | Oct 2, 2020 | 56.1 | 84.1 | 73.3 | 68.6 | NA | 575 |
| XLM | XTREME Team | Alphabet, CMU | - | 55.8 | 75.0 | 65.6 | 43.9 | 44.7 | |
| Anonymous5 | Anonymous5 | Anonymous5 | Mar 4, 2021 | 53.1 | 75.3 | 66.9 | 52.5 | 18.0 | |
| mT5 | mT5-Team | Google Research | Jan 13, 2021 | 40.9 | 89.8 | NA | 73.6 | NA | 13000 |
| Anonymous6 | Anonymous6 | Anonymous6 | Dec 20, 2022 | 39.3 | 44.2 | 0.0 | 65.5 | 34.5 | |
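The Score column appears to be a task-weighted average of the four category columns rather than their plain mean. The sketch below is a minimal illustration under one assumption: that the 9 tasks split 2/2/3/2 across sentence-pair classification, structured prediction, question answering and sentence retrieval (the split described in the XTREME paper), so each category counts in proportion to its number of tasks. The names `TASKS_PER_CATEGORY` and `overall_score` are hypothetical. Under this assumption it reproduces the Score column for the fully reported rows checked here; rows with missing categories (Human, RemBERT, mT5) seem to be aggregated differently and are not covered.

```python
# Hypothetical sketch: recompute the XTREME overall score from the four
# category averages shown on the leaderboard. Assumption: each category is
# weighted by its number of tasks (2 sentence-pair, 2 structured prediction,
# 3 QA, 2 retrieval), i.e. the overall score is a plain mean over 9 tasks.

TASKS_PER_CATEGORY = {
    "sentence_pair": 2,
    "structured_prediction": 2,
    "question_answering": 3,
    "sentence_retrieval": 2,
}


def overall_score(category_scores: dict[str, float]) -> float:
    """Task-weighted mean of per-category averages, rounded to one decimal."""
    total_tasks = sum(TASKS_PER_CATEGORY.values())  # 9 tasks under this assumption
    weighted_sum = sum(
        TASKS_PER_CATEGORY[name] * score for name, score in category_scores.items()
    )
    return round(weighted_sum / total_tasks, 1)


# Category averages copied from leaderboard rows with all four categories reported.
rows = {
    "VECO 2.0": (90.8, 84.6, 77.2, 94.9),
    "XLM-R (large)": (82.8, 69.0, 62.3, 61.6),
    "mBERT": (73.7, 66.3, 53.8, 47.7),
}

for model, scores in rows.items():
    # zip pairs the scores with the category names in the dict's insertion order
    named = dict(zip(TASKS_PER_CATEGORY, scores))
    print(f"{model}: {overall_score(named)}")
# prints 85.8, 68.2 and 59.6, matching the Score column for these rows
```

The uneven weighting matters: a plain mean of the four columns would give VECO 2.0 about 86.9 instead of 85.8, because question answering, its weakest category, accounts for three of the nine tasks.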

Task and Language Details

The tasks included in XTREME cover a range of paradigms, including sentence-pair classification, structured prediction, sentence retrieval and cross-lingual question answering. Consequently, for models to be successful on the XTREME benchmark, they must learn representations that generalize to many standard cross-lingual transfer settings.

Each of the tasks covers a subset of the 40 languages. To obtain additional data in the low-resource languages that can be used for analyses, we automatically translate the test sets of a natural language inference dataset and a question answering dataset into the remaining languages. We show that these translated test sets can serve as a reasonable proxy for performance on gold-standard test sets, with the caveat that they overestimate the performance of models that were themselves trained on translations.

| Family | Languages |
|---|---|
| Afro-Asiatic | Arabic, Hebrew |
| Austro-Asiatic | Vietnamese |
| Austronesian | Indonesian, Javanese, Malay, Tagalog |
| Basque | Basque |
| Dravidian | Malayalam, Tamil, Telugu |
| Indo-European (Indo-Aryan) | Bengali, Marathi, Hindi, Urdu |
| Indo-European (Germanic) | Afrikaans, Dutch, English, German |
| Indo-European (Romance) | French, Italian, Portuguese, Spanish |
| Indo-European (Greek) | Greek |
| Indo-European (Iranian) | Persian |
| Indo-European (Slavic) | Bulgarian, Russian |
| Japonic | Japanese |
| Kartvelian | Georgian |
| Koreanic | Korean |
| Kra-Dai | Thai |
| Niger-Congo | Swahili, Yoruba |
| Sino-Tibetan | Burmese, Mandarin |
| Turkic | Kazakh, Turkish |
| Uralic | Estonian, Finnish, Hungarian |