Languages with Languages AI

More languages. Fewer barriers.

See how Google is expanding support for thousands of the world's languages.
Video montage of cities and faces across the globe.
South Asian man in a white collared shirt, standing in front of a pile of baskets.
East Asian woman standing in a blue and white shirt with glasses on.

There are over 7,000 languages spoken in the world, but only a few are well represented online. Google Research is trying to change that, one community at a time.

Collecting data
Every model starts with data

For each language we work on, we collect data from diverse sources, such as web data, recorded speech, and identified video transcripts.

Preparing data
Pre-processing data

Once enough data has been collected for a specific language, we process and validate the data for optimal performance.

Model training
Machine learning begins

During training, machine learning models identify patterns in the data. They can then spot those patterns in unseen data or generate novel data, given a prompt.

Model testing
Model testing with collected data

We test the model to identify issues, and improve it by adjusting the data’s quality and relevance so the language works as it would in the real world.

Product rollout
Release vetted model

Once it’s ready, the model is launched in products like Gboard and YouTube, or existing models are updated in products like Gemini and Translate.

Languages we support
We support hundreds of languages spoken all over the world.
Learn more about language expansion
Meet the communities inspiring this work

“In a world interconnected like never before, language plays an increasingly important role in access to knowledge and prosperity. Our goal is to develop technology that enables better understanding of more languages, making them accessible, removing modality barriers and empowering people to communicate effectively and have access to knowledge. This technological advancement has significant societal impact.”

YOSSI MATIAS

Head of Google Research

FAQ

The world’s linguistic diversity is stunning: with about 1,300 languages with 100K+ speakers, and thousands more with fewer speakers, our world boasts many more languages than most of us realize.

We’ve worked hard to gather the robust datasets and quality knowledge required to support the creation and use of everyday technology by speakers of diverse languages around the world. This has helped more of the world’s people use the internet.

But billions of people still face a language barrier on the internet. And when a language is not online, it limits what we can learn, what jobs we can have, what rights we can exercise, and what stories we can tell. When a language is not online, opportunities are lost.

This is why we’re working to organize the world’s information and make it universally accessible and useful across 1,000 languages—and ideally thousands more.

Language coverage varies by product. For instance, Gboard has a Help Center page that lists the supported languages, and so does Translate. While we work to provide an aggregated view of languages covered across all products, please visit product-specific help center pages for more information on language coverage.

Coverage varies by product. Growing language support can already be utilized in many Google products, from YouTube captions, to Gboard, to searching with your voice. And as we introduce expanded language coverage in models like Gemini, many products across Google will see increased language-related capabilities as a result.

India
Home to over 1,300 languages

Google funds and supports Project Vaani in capturing India’s rich and diverse speech landscape. This can only be accomplished by meeting people where they are in their own communities. Language should be a bridge between people, not a barrier.

Project Vaani is one of the largest datasets of Indian dialects ever to exist, working to support Google’s Language Inclusivity Initiative.
Learn more about Project Vaani on the IISc website to see how we’re making the internet more accessible, one language at a time.

Watch video
“There’s a saying here: Every mile, the taste of water changes. Every four miles, the language changes.”
DINESH TEWARI
Research Program Manager, Google
Four men enjoying a meal in rural India in front of tall green grass.
Indian man standing in the sun in a tan button-up shirt.
A herd of goats in rural India.
The first step is to collect imagery that reflects the unique cultures, lifestyles, and languages of these communities, making sure it is relatable to the resident population. Our partner teams capture data on the ground in each district by recording responses to the images while maintaining diversity across gender, age, and education.
Finally, the speech data is transcribed by others from those same districts to ensure that important nuances and language variations are reflected accurately. 10% of that data gets transcribed and released into our ecosystem of language models.
“We want to make sure that when we’re collecting data, it’s anchored in the region to capture the rich landscape of India.”
PARTHA TALUKDAR
Research Scientist, Google
New York
Uniting the world’s languages

New York City is the world’s most diverse language hotspot. More languages are spoken here than in any other place on earth. This is a city where so many languages intersect on a daily basis, and the story behind that fact lives in its people.

New Yorkers come from all walks of life, and communication is at the heart of the city’s energy. Language doesn’t just tell the story of how we got here, but also of how we continue to thrive by building a community on inclusion and understanding.

A young Asian woman with glasses and short black hair standing with her bike on the sidewalk.
Prayer flags hanging off the handles of a bicycle.
Many language contributors work from Google’s NYC office, where every language initiative connects at some point. Our current efforts within the city focus on collecting data from native speakers of AAVE (African American Vernacular English) to enable better recognition of diverse voices in the US. The NYC team is part of a diverse initiative working across several departments, including Google Research, Speech, and Google DeepMind.

This project is focused on expanding language technologies across Google products and services. Our belief is that accessible information for all will result from a better understanding of languages. We’re bringing the world’s information to New Yorkers, one language at a time.
A middle-aged Latina woman wearing hoop earrings.
“Language was so essential to me moving here from Nigeria…I remember using language as a propellant into a new culture, but when I came home, it also was a tether to African culture in the best way possible.”
Uche Okonkwo
Responsible AI, Google Research
A small child walking down the front steps of a New York brownstone.
Ghana
Saving five native languages

Africa is home to thousands of languages, most of which are not documented well enough for speakers to be able to read online in their native tongue.

This is a problem that can be solved with more data, and we need a significant amount of new data to train the Automatic Speech Recognition models that power translation tools and digital language banking. Our efforts are focused on improving recognition of these underrepresented languages by addressing this shortage of speech data.

Watch video
Video clips of the University of Ghana.

Waxal helps university research teams collect and transcribe voice recordings in order to accelerate digital transformation across Africa.

This effort began in Ghana with five of the more than 80 dialects spoken there: Akan, Ikposo, Ewe, Dagbani, and Dagaare. The work begins inside local communities, gathering culturally relevant images that represent a full spectrum of life in each region to be used as prompts during recording.

The Waxal team helps to gather those recordings, their transcriptions, and the resulting data, as well as any resources and expertise that African academics and Google Research can offer.

The data is then checked for quality, accuracy, and consistency before it’s published, in an effort to promote future collaboration and keep these languages alive for generations to come.

A Ghanaian woman in a red shirt.
“Anybody now or in the future will be able to know that these languages exist. It’s important that we don’t lose the local dialects.”
DELPHINA DARKO
Master’s Student, University of Ghana
Two Ghanaian men reach out in front of them to shape a heart with their fingers.