    Explore the world's languages
    Google is bringing the world's 7,000+ languages online to help make information more accessible for everyone.
                FAQ
                What is Google Language Explorer?
                Google Language Explorer is an interactive website where anyone can explore the world’s rich diversity of languages. Here, users can discover information about thousands of languages through maps, search tools, and detailed data pages powered by the LinguaMeta data repository.
                What is LinguaMeta?
                LinguaMeta is a comprehensive, open-source dataset that powers Google Language Explorer. This repository of metadata was created by harmonizing and integrating information on the world’s languages from 19 different sources. You can read more about it in the LinguaMeta research paper.
                Who is Google Language Explorer for?
                Google Language Explorer is designed for anyone with an interest in human languages, including:

                • Linguists and researchers seeking a unified source of language metadata.
                • Students of linguistics and related fields who need reliable language metadata for their studies and projects.
                • Language community members looking for information and documented resources for their languages.
                • Natural Language Processing practitioners searching for metadata on specific languages.
                Where does the data on the Google Language Explorer site come from?
                All data found on the Google Language Explorer site comes from the LinguaMeta data repository. You can learn more about LinguaMeta and its sources by reading the research paper on its development.
                How often is this information updated?
                The LinguaMeta repository that powers Google Language Explorer is an active project. As the dataset receives major updates, this site will be updated to keep the information as current and accurate as possible.
                How do I search for a specific language?
                You can search for a language by name, country, or region, or use the interactive map to find languages spoken within a specific geographic area. Click a pin on the map for more detailed information on that language.
                How should I cite any data I find here?
                For academic use, cite both the underlying dataset (LinguaMeta) and the fact that you accessed the information through Google Language Explorer. For the LinguaMeta dataset, please cite the research paper as follows: Ritchie, Sandy, Daan van Esch, Uche Okonkwo, Shikhar Vashishth, and Emily Drummond. "LinguaMeta: Unified metadata for thousands of languages." In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pp. 10530-10538. 2024.
                What should I do if I find an error or outdated information?
                Please use the "Feedback" link available on each language page. Include as much specific information as you can in your report, and cite your sources if possible. The team will review each submission and, where appropriate, apply the change to the LinguaMeta repository, after which it will appear on the Google Language Explorer site. Thank you for helping the team improve this project.
                Can I download the entire LinguaMeta dataset?
                Yes! The full, version-controlled LinguaMeta dataset that powers Google Language Explorer is available for download from its public repository, allowing for open and reproducible research. You can find the complete dataset hosted on GitHub.
                I’m a researcher interested in contributing to LinguaMeta. How can I do that?
                Contributions are welcome! The best way to get involved is to visit the LinguaMeta repository and consult the CONTRIBUTING.md file for guidelines on how to format and submit data, such as proposing revisions or adding new language resources.
                I can’t find my language.
                The aim of Google Language Explorer and the LinguaMeta repository is to provide reliable information on living, spoken languages. Languages in the following categories are not included:

                • ancient languages
                • historical languages
                • constructed languages
                • signed languages
                • language families

                We do include macrolanguages, and we map the individual languages they encompass to the corresponding macrolanguage code. We also include languages classified as extinct, because some have been shown to retain small communities of speakers.
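As a rough illustration of mapping individual languages to a macrolanguage code, the sketch below uses a small, hypothetical lookup table built from a few ISO 639-3 assignments (Mandarin `cmn` and Cantonese `yue` under Chinese `zho`, Standard Arabic `arb` under Arabic `ara`, Swahili `swh` under `swa`). It does not reflect LinguaMeta's actual file format, and the names `MACROLANGUAGE_OF`, `macrolanguage_of`, and `encompassed_by` are illustrative only.

```python
from typing import Optional

# Hypothetical sample: ISO 639-3 individual-language codes mapped to the
# macrolanguage code that encompasses them (not LinguaMeta's real format).
MACROLANGUAGE_OF: dict[str, str] = {
    "cmn": "zho",  # Mandarin Chinese -> Chinese
    "yue": "zho",  # Cantonese -> Chinese
    "arb": "ara",  # Standard Arabic -> Arabic
    "swh": "swa",  # Swahili (individual language) -> Swahili (macrolanguage)
}


def macrolanguage_of(code: str) -> Optional[str]:
    """Return the encompassing macrolanguage code, or None if there is none."""
    return MACROLANGUAGE_OF.get(code.lower())


def encompassed_by(macro: str) -> list[str]:
    """List the individual-language codes mapped to a macrolanguage code."""
    return sorted(c for c, m in MACROLANGUAGE_OF.items() if m == macro.lower())


print(macrolanguage_of("yue"))  # zho
print(encompassed_by("zho"))    # ['cmn', 'yue']
```

Keeping the mapping in one direction (individual to macrolanguage) and deriving the reverse view on demand avoids maintaining two tables that could drift apart.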

                If your language is a living, spoken language and is not present in Google Language Explorer, please let us know through the feedback link here by clicking the “New issue” button.

                Research paper

                This research paper from Google Research introduces LinguaMeta, a unified resource for language metadata. LinguaMeta is an open-source repository of language metadata, including language codes, numbers of speakers, writing systems, official endangerment status, and more. Each datapoint can be traced back to its source, which makes it easier to identify errors and improve resources built on the data, such as the Language Explorer site.

                LinguaMeta compiles language metadata from a variety of sources, each with its own open-access license. The list of sources and their licensing information is available in the LinguaMeta GitHub repository.

                Feedback

                If you have any questions or comments about Google Language Explorer, or if you have information or data to contribute that isn’t yet on the site, please submit your feedback here by clicking the “New issue” button.

                What does this mean?

                Macrolanguage

                This term refers to a group of closely related languages that may be treated as a single language in some contexts. It is used in linguistics and in language coding standards such as ISO 639-3 to group similar but distinct varieties.

                BCP-47

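                BCP-47 is the IETF specification for language tags: short strings such as en-US or sr-Latn that identify a language and, optionally, its script, region, and other variants. These tags are widely used on the web and in software to label content by language.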

                Glottocode

                A Glottocode is the unique identifier that Glottolog, an open-access database of the world’s languages, assigns to each language, dialect, and language family it catalogues.
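As a small illustration, the sketch below checks that a string looks like a Glottocode and builds a link to its Glottolog page. Both the code shape (four lowercase letters or digits followed by four digits, as in "stan1293") and the page URL pattern are assumptions based on Glottolog's public site; please verify them against Glottolog's own documentation. The names `GLOTTOCODE` and `glottolog_url` are hypothetical.

```python
import re

# Assumed Glottocode shape: four lowercase letters or digits, then four digits.
GLOTTOCODE = re.compile(r"[a-z0-9]{4}[0-9]{4}")


def glottolog_url(code: str) -> str:
    """Return the (assumed) Glottolog page URL for a well-formed Glottocode."""
    if not GLOTTOCODE.fullmatch(code):
        raise ValueError(f"not a well-formed Glottocode: {code!r}")
    return f"https://glottolog.org/resource/languoid/id/{code}"


print(glottolog_url("stan1293"))
# https://glottolog.org/resource/languoid/id/stan1293
```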

                Wikidata_id

                Wikidata is a collaborative knowledge graph hosted by the Wikimedia Foundation. A Wikidata_id (the letter “Q” followed by a number) identifies a specific item in that knowledge graph, such as the entry for a language.
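For readers who want to follow a Wikidata_id to its source, here is a minimal sketch that validates an item ID and builds two URLs: the human-readable item page and Wikidata's `Special:EntityData` JSON export for the same item. It makes no network request; "Q1860" is used only as a sample ID, and the function name `wikidata_urls` is hypothetical.

```python
import re

# Wikidata item IDs are the letter "Q" followed by a positive integer.
WIKIDATA_ITEM = re.compile(r"Q[1-9][0-9]*")


def wikidata_urls(item_id: str) -> dict[str, str]:
    """Return the item page URL and the JSON data URL for a Wikidata item ID."""
    if not WIKIDATA_ITEM.fullmatch(item_id):
        raise ValueError(f"not a Wikidata item ID: {item_id!r}")
    return {
        "page": f"https://www.wikidata.org/wiki/{item_id}",
        "json": f"https://www.wikidata.org/wiki/Special:EntityData/{item_id}.json",
    }


print(wikidata_urls("Q1860")["json"])
# https://www.wikidata.org/wiki/Special:EntityData/Q1860.json
```

Fetching the JSON URL with any HTTP client returns the item's labels, descriptions, and statements, which is one way to cross-check a value against its source.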

                Endonym

                An endonym is the name that a language’s own speakers use for it, as opposed to its name in other languages, which can vary significantly. For example, the endonym of German is Deutsch. Consider it the original name for a language.

                Number of speakers

                This is the estimated number of people around the world who currently speak this language.

                Endangerment status

                The endangerment status of a language indicates how much risk it faces of falling out of use. It takes into account factors such as the current estimated number of speakers, and ranges from safe (not endangered), through vulnerable, to extinct.

                Scripts

                A script, or writing system, is the set of symbols and characters used to record a language in written text.

                Countries / Regions

                This refers to the countries and/or regions that host a significant population of native speakers.

                World Overview
                • Languages: 7K
                • Population: 8B
                • Countries: 190+
                • Scripts: 159