
Introducing Geospatial Reasoning

Unlocking insights with generative AI and multiple foundation models


Grounded by the best geospatial models

Google has a long history of organizing the world's geospatial information—data tied to specific locations—and making it accessible through products like Maps, Street View, Earth, and Search.

Geospatial Reasoning leverages this long-standing expertise in satellite imagery, mapping, and geospatial foundation models to help solve a wide range of problems.

Yours and ours: A unique mix of data

Whether you’re working in public health, urban development, integrated business planning, or climate resilience, Google’s data, real-time services, and AI models can accelerate your analyses and augment your proprietary models and data. Geospatial Reasoning uses generative AI to reduce the significant cost, time, and domain expertise required to combine geospatial capabilities.




Reasoning accelerates insight

Given a complex natural-language query, Gemini will plan and execute a chain of reasoning, analyzing multiple geospatial and structured data sources and using advanced AI models for task-specific inference and grounding. Geospatial Reasoning will respond with insights and data visualizations, providing rapid, trustworthy answers.
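As a rough illustration of this plan-then-execute pattern, here is a minimal Python sketch of a reasoning chain. Every name in it (Step, plan_steps, run_inference, the tool names) is a hypothetical placeholder for illustration, not the Geospatial Reasoning API.

```python
# Hypothetical sketch of a plan-then-execute reasoning chain.
# All names here are illustrative placeholders, not a real API.
from dataclasses import dataclass

@dataclass
class Step:
    tool: str    # e.g. "satellite_imagery", "weather", "population"
    query: str   # sub-question this tool should answer

def plan_steps(question: str) -> list[Step]:
    """Stand-in for an LLM planner that decomposes the question."""
    return [
        Step("satellite_imagery", "find flooded areas in the region"),
        Step("population", "estimate residents inside those areas"),
    ]

def run_inference(step: Step) -> dict:
    """Stand-in for a task-specific model grounded in geospatial data."""
    return {"tool": step.tool, "result": f"answer to: {step.query}"}

def answer(question: str) -> list[dict]:
    # Plan a chain of sub-tasks, run each against a data source,
    # and collect grounded intermediate results for synthesis.
    return [run_inference(step) for step in plan_steps(question)]

print(answer("Which neighborhoods were most affected by the flood?"))
```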




Let’s think bigger, together

Geospatial Reasoning, grounded with our new geospatial foundation models, will be able to simplify and accelerate real-world problem solving. Provide your information to receive updates on our progress, including announcements about new features, research breakthroughs, and opportunities to join our trusted tester programs.

Publications and blogs

Geospatial Reasoning: Unlocking insights with generative AI and multiple foundation models
David Schottlander and Tomer Shekel, Product Managers, Google Research
A Recipe for Improving Remote Sensing Zero Shot Generalization
Aviad Barzilai, Yotam Gigi, Vered Silverman, Yehonathan Refael, Bolous Jaber, Amr Helmy
3rd ML4RS Workshop at ICLR 2025
Abstract: Foundation models have had a significant impact across various AI applications, enabling use cases that were previously impossible. Visual language models (VLMs), in particular, have outperformed other techniques in many tasks. In remote sensing (RS), foundation models have shown improvements across various applications. However, unlike other fields, the use of VLMs with large-scale remote sensing image-text datasets remains limited. In this work, we first introduce two novel image-caption datasets for training remote sensing foundation models. The first dataset pairs aerial and satellite imagery, aligned with Google Maps data, with high-quality captions generated using Gemini. The second utilizes public web images and their corresponding alt-text, filtered to the remote sensing domain, resulting in a highly diverse dataset. We show that using these datasets to pre-train MaMMUT, a VLM architecture, results in state-of-the-art generalization performance in zero-shot classification and cross-modal retrieval on well-known public benchmarks. Second, we leverage this newly pre-trained VLM to generate inference attention maps for a novel class query (i.e., a class unseen during training). We subsequently propose an iterative self-supervised fine-tuning approach in which samples aligned with these attention maps are iteratively pseudo-labeled and used for model training.
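The iterative pseudo-labeling loop in the abstract's last two sentences can be pictured with a short sketch. The interfaces below (vlm.attention_map, vlm.fine_tune, and the fixed confidence threshold) are assumptions for illustration, not the paper's published code.

```python
# Sketch of iterative self-supervised fine-tuning via attention maps.
# `vlm.attention_map` and `vlm.fine_tune` are assumed interfaces.
def iterative_pseudo_label(vlm, images, class_query, rounds=3, threshold=0.8):
    for _ in range(rounds):
        pseudo_labeled = []
        for img in images:
            attn = vlm.attention_map(img, class_query)  # per-pixel scores in [0, 1]
            if attn.max() >= threshold:
                # Keep confident regions as pseudo-labels for the unseen class.
                pseudo_labeled.append((img, attn >= threshold))
        vlm.fine_tune(pseudo_labeled)  # retrain on samples aligned with the maps
    return vlm
```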
Enhancing Remote Sensing Representations through Mixed-Modality Masked Autoencoding
Ori Linial, George Leifman, Yochai Blau, Nadav Sherman, Yotam Gigi, Wojciech Sirko
Proceedings of the Winter Conference on Applications of Computer Vision (WACV) Workshops (2025), pp. 507-516
Abstract: This paper presents an innovative approach to pre-training models for remote sensing by integrating optical and radar data from Sentinel-2 and Sentinel-1 satellites. Using a novel variation on the masked autoencoder (MAE) framework, our model incorporates a dual-task setup: reconstructing masked Sentinel-2 images and predicting corresponding Sentinel-1 images. This multi-task design enables the encoder to capture both spectral and structural features across diverse environmental conditions. Additionally, we introduce a "mixing" strategy in the pretraining phase, combining patches from both image sources, which mitigates spatial misalignment errors and enhances model robustness. Evaluation on segmentation and classification tasks, including Sen1Floods11 and BigEarthNet, demonstrates significant improvements in adaptability and generalizability across varied downstream remote sensing applications. Our findings highlight the advantages of leveraging complementary modalities for more resilient and versatile land cover analysis.
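A minimal sketch of the dual-task objective follows, assuming mean-squared-error reconstruction and equal weighting of the two tasks; the paper's exact losses, weighting, and patch-mixing step may differ, and the encoder/decoders here are placeholder callables.

```python
# Schematic of a dual-task MAE objective: reconstruct masked optical
# patches and predict co-registered radar patches from one encoding.
import numpy as np

def dual_task_loss(encoder, decoder_s2, decoder_s1,
                   s2_patches, s1_patches, mask):
    """mask[i] is True where the i-th Sentinel-2 patch is hidden."""
    latent = encoder(s2_patches[~mask])  # encode visible patches only
    # Task 1: reconstruct the masked Sentinel-2 (optical) patches.
    loss_s2 = np.mean((decoder_s2(latent) - s2_patches[mask]) ** 2)
    # Task 2: predict the corresponding Sentinel-1 (radar) patches.
    loss_s1 = np.mean((decoder_s1(latent) - s1_patches) ** 2)
    return loss_s2 + loss_s1  # equal weighting assumed for illustration
```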
General Geospatial Inference with a Population Dynamics Foundation Model
Chaitanya Kamath, Prithul Sarker, Joydeep Paul, Yael Mayer, Sheila de Guia, Jamie McPike, Adam Boulanger, David Schottlander, Yao Xiao, Manjit Chakravarthy Manukonda, Monica Bharel, Von Nguyen, Luke Barrington, Niv Efron, Krish Eswaran, Shravya Shetty
(2024), to appear
Abstract: Supporting the health and well-being of dynamic populations around the world requires governmental agencies, organizations, and researchers to understand and reason over complex relationships between human behavior and local contexts. This support includes identifying populations at elevated risk and gauging where to target limited aid resources. Traditional approaches to these classes of problems often entail developing manually curated, task-specific features and models to represent human behavior and the natural and built environment, which can be challenging to adapt to new, or even related, tasks. To address this, we introduce the Population Dynamics Foundation Model (PDFM), which aims to capture the relationships between diverse data modalities and is applicable to a broad range of geospatial tasks. We first construct a geo-indexed dataset for postal codes and counties across the United States, capturing rich aggregated information on human behavior from maps, busyness, and aggregated search trends, and environmental factors such as weather and air quality. We then model this data and the complex relationships between locations using a graph neural network, producing embeddings that can be adapted to a wide range of downstream tasks using relatively simple models. We evaluate the effectiveness of our approach by benchmarking it on 27 downstream tasks spanning three distinct domains: health indicators, socioeconomic factors, and environmental measurements. The approach achieves state-of-the-art performance on geospatial interpolation across all tasks, surpassing existing satellite- and geotagged-image-based location encoders. In addition, it achieves state-of-the-art performance in extrapolation and super-resolution for 25 of the 27 tasks. We also show that the PDFM can be combined with a state-of-the-art forecasting foundation model, TimesFM, to predict unemployment and poverty, achieving performance that surpasses fully supervised forecasting. The full set of embeddings and sample code are publicly available for researchers. In conclusion, we have demonstrated a general-purpose approach to geospatial modeling tasks critical to understanding population dynamics by leveraging a rich set of complementary, globally available datasets that can be readily adapted to previously unseen machine learning tasks.
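Since the abstract notes the embeddings are public and are adapted with "relatively simple models," here is a hedged sketch of that pattern: fitting a linear model on fixed per-region embeddings. The file name and column layout below are assumptions for illustration, not the released data format.

```python
# Sketch of the "simple downstream model on fixed embeddings" pattern.
# File name and columns are hypothetical, not the released format.
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# One row per postal code: an ID, embedding dimensions, and a target.
df = pd.read_csv("pdfm_embeddings_with_target.csv")  # hypothetical file
X = df.filter(like="emb_").values  # embedding columns, e.g. emb_0..emb_255
y = df["target"].values           # e.g. a county-level health indicator

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = Ridge().fit(X_train, y_train)  # simple linear adapter on embeddings
print("held-out R^2:", model.score(X_test, y_test))
```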
Insights into population dynamics: A foundation model for geospatial inference
David Schottlander, Product Manager, and Gautam Prasad, Software Engineer, Google Research, Health AI Team
Towards a Trajectory-powered Foundation Model of Mobility
Ivan Kuznetsov, Shushman Choudhury, Aboudy Kreidieh
(2024)
Abstract: This position paper advocates for the development of a geospatial foundation model based on human mobility trajectories in the built environment. Such a model would be widely applicable across many important societal domains currently addressed independently, including transportation networks, data-driven urban planning and management, tourism, and sustainability. Unlike existing large vision-language models, trained primarily on text and images, this foundation model should integrate the complex spatiotemporal and multimodal data inherent to human mobility. This paper motivates this challenging research agenda, outlining many downstream applications that would be differentially impacted or enabled by such a model. It then explains the critical spatial, temporal, and contextual factors that the model must capture to effectively analyze trajectories. Finally, it concludes with several research questions and directions, laying the foundations for future exploration in this exciting and emerging field.