Languages with Languages AI
More languages. Fewer barriers.
thousands of the world's languages.
There are over 7,000 languages spoken in the world, but only a few are well-represented online. Google Research is trying to change that, one community at a time.
Data is collected from diverse sources for any language we work on, such as web data, collected speech data, and identified video transcripts.
Once enough data has been collected for a specific language, we process and validate the data for optimal performance.
During training, machine learning models identify patterns in the data. They can then spot those patterns in unseen data or generate novel data, given a prompt.
We test the model to identify issues, and improve it by adjusting the data’s quality and relevance so the language works as it would in the real world.
Once it’s ready, the model is launched in products like Gboard and YouTube, or existing models are updated in products like Gemini and Translate.
“In a world interconnected like never before, language plays an increasingly important role in access to knowledge and prosperity. Our goal is to develop technology that enables better understanding of more languages, making them accessible, removing modality barriers and empowering people to communicate effectively and have access to knowledge. This technological advancement has significant societal impact.”
YOSSI MATIAS
Head of Google Research
A discussion on language inclusivity
Our research
FAQ
The world’s linguistic diversity is stunning: about 1,300 languages have 100K+ speakers, and thousands more have fewer, so our world boasts many more languages than most of us realize.
We’ve worked hard to gather the robust datasets and quality knowledge required to support the creation and use of everyday technology by speakers of diverse languages around the world. This has helped more of the world’s people use the internet.
But billions of people still face a language barrier on the internet. And when a language is not online, it limits what we can learn, what jobs we can have, what rights we can exercise, and what stories we can tell. When a language is not online, opportunities are lost.
This is why we’re working to organize the world’s information and make it universally accessible and useful across 1,000 languages—and ideally thousands more.
Language coverage varies by product. For instance, Gboard has a Help Center page that lists the supported languages, and so does Translate. While we work to provide an aggregated view of languages covered across all products, please visit product-specific help center pages for more information on language coverage.
Coverage varies by product. Growing language support can already be utilized in many Google products, from YouTube captions, to Gboard, to searching with your voice. And as we introduce expanded language coverage in models like Gemini, many products across Google will see increased language-related capabilities as a result.