In brief
- Google built the largest-ever flash flood dataset by using Gemini to mine two decades of global news reports.
- The dataset now powers an AI model that predicts urban flash floods up to 24 hours in advance.
- The system fills a major data gap that long blocked flash flood forecasting.
Flash floods kill thousands of people every year. They strike fast, hit cities hardest, and for decades there was almost nothing scientists could do to see them coming, because the data to train prediction models simply didn’t exist.
On Thursday, Google said it found a way around that problem—by reading the news.
The company unveiled Groundsource, a system that uses Gemini AI to comb through millions of news articles published since 2000, pull out references to flood events, and pin each one to a location and a date. The result is a dataset of 2.6 million historical flash floods spanning more than 150 countries, and now open for anyone to download and use.
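Google hasn't published Groundsource's code, but the extraction step it describes, prompting Gemini to pull structured flood events out of raw article text, can be sketched with the public google-genai SDK. The prompt, schema, and model name below are illustrative assumptions, not the production pipeline:

```python
# Hypothetical sketch of the extraction step using the public google-genai
# SDK. The prompt, schema, and model name are illustrative assumptions,
# not Google's published Groundsource pipeline.
from google import genai
from pydantic import BaseModel

class FloodEvent(BaseModel):
    location: str         # place name, to be geocoded in a later step
    date: str             # ISO date the article attributes to the flood
    is_flash_flood: bool  # distinguishes flash floods from river floods

client = genai.Client()  # reads GEMINI_API_KEY from the environment

def extract_events(article_text: str) -> list[FloodEvent]:
    """Ask Gemini to pull structured flood events out of one news article."""
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=f"List every flood event this article reports:\n\n{article_text}",
        config={
            "response_mime_type": "application/json",
            "response_schema": list[FloodEvent],
        },
    )
    return response.parsed
```

Run across millions of articles, a step like this would yield the raw pile of location-and-date records that the cleaning stage then has to reconcile.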
That dataset was then used to train a new AI model capable of forecasting whether a flash flood is likely to hit an urban area within the next 24 hours. The forecasts are now live on Google’s Flood Hub, the same platform the company already uses to warn roughly 2 billion people about river-related flooding worldwide.
The problem Groundsource solves is surprisingly basic. Rivers have physical gauges—sensors sitting in the water that have been recording levels for decades. That’s how forecasters learned to predict when a river would overflow. City streets have nothing like that. When intense rain hits pavement and overwhelms drainage systems, the flooding happens too fast and too locally to track with traditional instruments.
Without historical records, you can’t train an AI model to recognize the pattern. Google’s fix was to treat news articles as the missing sensor.
“By turning public information into actionable data, we aren’t just analyzing the past—we’re building a more resilient future for everyone towards our goal that no one is surprised by a natural disaster,” Google said.

After stripping out ads, navigation menus, and duplicate reports, and translating non-English articles into English, the team turned millions of messy text descriptions into clean, geolocated time-series data.
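The article doesn't detail the deduplication logic, but the core idea (many reports of the same flood must collapse into one event record) can be sketched in a few lines of pandas. The column names and grid resolution here are assumptions, and events are presumed already geocoded:

```python
# Illustrative only: collapsing duplicate reports of the same flood into one
# record per place and day. Column names and the grid size are assumptions.
import pandas as pd

reports = pd.DataFrame({
    "lat":  [-1.286, -1.284, -1.286],
    "lon":  [36.817, 36.820, 36.817],
    "date": ["2024-03-01", "2024-03-01", "2024-03-02"],
})

# Snap coordinates to a coarse grid so near-identical reports merge, then
# keep one row per (grid cell, date): one historical flash flood event.
reports["cell"] = list(zip(reports["lat"].round(1), reports["lon"].round(1)))
flood_events = (
    reports.drop_duplicates(subset=["cell", "date"])
           .sort_values("date")
           .reset_index(drop=True)
)
print(flood_events)  # two distinct events survive from three reports
```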
The model trained on that data uses an LSTM (long short-term memory) neural network—a type of AI built for processing sequences over time—to ingest hourly weather forecasts along with local factors like urbanization density, soil absorption rates, and topography. It then outputs a simple signal: medium or high flood risk in the next 24 hours, for any urban area with a population density above 100 people per square kilometer.
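Google hasn't released the model itself, but the shape it describes, an LSTM over hourly forecasts combined with static local descriptors feeding a small risk classifier, is straightforward to sketch. This PyTorch version is a minimal illustration under assumed feature counts and layer sizes, not the production architecture:

```python
# A minimal sketch of the kind of model described, not Google's actual
# architecture: an LSTM reads 24 hours of forecast weather, static local
# descriptors are appended, and a linear head scores the risk level.
# Feature counts and layer sizes are assumptions.
import torch
import torch.nn as nn

class FlashFloodRisk(nn.Module):
    def __init__(self, n_weather: int = 8, n_static: int = 5, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_weather, hidden, batch_first=True)
        self.head = nn.Linear(hidden + n_static, 3)  # none / medium / high

    def forward(self, weather_seq: torch.Tensor, static_feats: torch.Tensor):
        # weather_seq:  (batch, 24, n_weather) hourly forecast variables
        # static_feats: (batch, n_static) urbanization, soil, topography, ...
        _, (h, _) = self.lstm(weather_seq)
        return self.head(torch.cat([h[-1], static_feats], dim=-1))

model = FlashFloodRisk()
risk_logits = model(torch.randn(2, 24, 8), torch.randn(2, 5))
```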
The system has real limitations. It only covers areas of about 20 square kilometers at a time, can’t tell you how bad a flood will be, and won’t perform well in regions where news coverage is thin.
Still, the early results are telling. A regional disaster authority in Southern Africa received a Flood Hub alert during the beta phase, confirmed the flood on the ground, and dispatched a humanitarian worker to manage the response. According to Google’s crisis resilience director Juliet Rothenberg, “that chain of events from a prediction in Flood Hub to boots on the ground is exactly what Flood Hub was built for.”