There is an abundance of unstructured data about historical events (news articles, government reports, and local bulletins), but extracting this information manually at scale is impossible. Our methodology analyzes news reports where flooding is a primary subject. We then use the Google Read Aloud user-agent to isolate the primary text from articles in 80 languages, which is standardized into English via the Cloud Translation API.
The most critical step of the extraction process is done using the Gemini Large Language Model (LLM). We engineered a sophisticated prompt that guides Gemini through a strict analytical verification process:
- Classification: The model distinguishes between reports of actual, ongoing, or past floods and articles that merely discuss future warnings, policy meetings, or general risk modeling.
- Temporal reasoning: Gemini anchors relative references (e.g., “last Tuesday”) against an article’s publication date to determine precise event timing.
- Spatial precision: The system identifies granular locations (neighborhoods and streets) and maps them to standardized spatial polygons using Google Maps Platform.
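To make the temporal reasoning step concrete, here is a minimal rule-based sketch of anchoring a relative reference against a publication date. It is an illustration only, not the Gemini prompt or its logic: the function name `resolve_relative_date` and the small set of supported phrases are hypothetical, and it assumes “last Tuesday” means the most recent Tuesday strictly before the publication date.

```python
from datetime import date, timedelta
from typing import Optional

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_relative_date(phrase: str, published: date) -> Optional[date]:
    """Resolve a relative date phrase against an article's publication date.

    Hypothetical helper: handles only "today", "yesterday", and
    "last <weekday>". Returns None for anything it does not recognize.
    """
    text = phrase.strip().lower()
    if text == "today":
        return published
    if text == "yesterday":
        return published - timedelta(days=1)
    if text.startswith("last "):
        name = text[len("last "):]
        if name in WEEKDAYS:
            target = WEEKDAYS.index(name)
            # Assumption: "last <weekday>" is the most recent such weekday
            # strictly before publication, so a same-weekday match goes
            # back a full week rather than zero days.
            days_back = (published.weekday() - target) % 7 or 7
            return published - timedelta(days=days_back)
    return None


# An article published on Thursday, 16 May 2024 that mentions "last Tuesday"
# resolves to Tuesday, 14 May 2024.
print(resolve_relative_date("last Tuesday", date(2024, 5, 16)))
```

A language model handles far more varied phrasing than fixed rules like these, but the underlying idea is the same: every relative expression only becomes a date once it is tied to the publication date.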
The technical validation of Groundsource confirms its reliability for high-stakes research. In manual reviews, we found that 60% of extracted events were accurate in both location and timing. Crucially, 82% were accurate enough to be practically useful for real-world analysis, for example by capturing the correct administrative district or pinpointing the event within a single day of its reported peak.
The coverage provided by Groundsource represents a massive-scale expansion over existing archives. By transforming unstructured media into data, we have generated 2.6 million events, a significant increase compared to the records found in traditional monitoring systems. Furthermore, spatiotemporal matching shows that Groundsource captured between 85% and 100% of the severe flood events recorded by GDACS between 2020 and 2026, an indication of its effectiveness in identifying high-impact disasters alongside smaller, localized events.
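As a rough illustration of what spatiotemporal matching can look like, the sketch below counts a reference event as captured when some extracted event falls within a distance and time window of it, then reports the captured fraction. The matching criteria used for Groundsource are not described here, so the `Event` type, the function names, and the 50 km and 1 day thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from math import asin, cos, radians, sin, sqrt
from typing import Sequence


@dataclass(frozen=True)
class Event:
    """Hypothetical point representation of a flood event."""
    lat: float
    lon: float
    day: date


def haversine_km(a: Event, b: Event) -> float:
    """Great-circle distance between two events, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))  # 6371 km: mean Earth radius


def is_match(ref: Event, cand: Event,
             max_km: float = 50.0, max_days: int = 1) -> bool:
    # Assumed thresholds; they are illustrative, not taken from the source.
    close_in_time = abs((ref.day - cand.day).days) <= max_days
    return close_in_time and haversine_km(ref, cand) <= max_km


def capture_rate(reference: Sequence[Event],
                 extracted: Sequence[Event]) -> float:
    """Fraction of reference events matched by at least one extracted event."""
    if not reference:
        return 0.0
    hits = sum(any(is_match(r, e) for e in extracted) for r in reference)
    return hits / len(reference)


reference = [Event(0.0, 0.0, date(2024, 1, 1)),
             Event(10.0, 10.0, date(2024, 1, 1))]
extracted = [Event(0.1, 0.1, date(2024, 1, 2))]
print(capture_rate(reference, extracted))  # 0.5
```

Real flood events cover areas rather than points, so a production comparison would more plausibly test polygon overlap than centroid distance; points keep the sketch short.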
