This project was part of my coursework for "Land Use and Environmental Modeling," a graduate level City Planning course at the University of Pennsylvania.

I went on to present the work at the 2016 ESRI User Conference. My collaborators on this project were Madeleine Helmer and Gavin Taves.


As climate change continues to drive rising sea levels and volatile weather patterns, cities and regions face a heightened risk of major flood events. With limited resources to mitigate flood damage, local authorities rely on flood outcome predictions to make informed resource allocation decisions. Traditionally, floodplains have been drawn using bottom-up hydrological models, which take information about the landscape and simulate a flood event. Though these simulations are valuable, they have at times proven inaccurate, underscoring the need to experiment with innovative methods of predicting flood outcomes.

With this project, I developed an alternative process for flood prediction: a top-down statistical model. My objective was to begin with the outcomes of a recent flood event (specifically the 2013 flood in Calgary), train a model on those results, and use the model to predict future floods. By ground-truthing the model with real-world observations, I hoped to develop a reliable predictive tool.

I used open data from Calgary to develop sophisticated variables that were correlated with the flood results. Starting from a few fundamental spatial datasets such as land cover, elevation, and riparian buffers, I created a large dataset of predictor variables. I used raster GIS tools (notably the Arc Hydro suite) to engineer features that measured spatial relationships among watersheds, riparian buffers, population centers, and water bodies.
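To illustrate the kind of raster-based feature engineering described above, here is a minimal sketch of one such predictor: the distance from each grid cell to the nearest water body, derived from a binary water raster. The toy grid, the 30 m cell size, and the use of NumPy/SciPy (rather than the Arc Hydro tools actually used in the project) are all assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Hypothetical 5x5 land-cover grid: 1 = water, 0 = land.
water = np.array([
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
])

# Euclidean distance from each cell to the nearest water cell,
# scaled by an assumed 30 m cell size. distance_transform_edt
# measures distance from nonzero cells (land) to the nearest
# zero cell (water); water cells themselves get distance 0.
CELL_SIZE_M = 30
dist_to_water = distance_transform_edt(water == 0) * CELL_SIZE_M

print(dist_to_water)
```

In a real workflow, each cell of a raster like this becomes one row's value for a predictor column, joined with elevation, land cover, and buffer-based features.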

To avoid overfitting the model, I repeatedly split the data into distinct training and test sets. By training the model on one set and testing it on the other, I could verify that the model was capturing the actual relationships among variables rather than statistical noise. Furthermore, I used repeated cross-validation to confirm that the results were consistent across a series of random samples, and I created a spatially held-out test set to check that the model was free of spatial bias.
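The validation routine above can be sketched with scikit-learn's repeated k-fold cross-validation. The synthetic data, logistic regression model, and fold counts are illustrative assumptions, not the project's actual specification; the point is that every score comes from a model evaluated on observations it never saw during training.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

# Synthetic stand-in for the engineered predictors and the
# inundated / not-inundated outcome observed in the 2013 flood.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))           # e.g. elevation, distance to water, ...
y = (X[:, 0] + X[:, 1] > 0).astype(int) # hypothetical binary flood outcome

# Repeated k-fold: 5 folds, repeated 10 times with fresh random
# splits, yielding 50 out-of-sample accuracy scores. Consistent
# scores across repeats suggest the model is not fitting noise.
cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = cross_val_score(LogisticRegression(), X, y, cv=cv)

print(scores.mean(), scores.std())
```

A spatial test set goes one step further than this sketch: instead of random folds, entire contiguous areas are withheld, so the model cannot lean on spatial autocorrelation with nearby training cells.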

This process could be replicated to predict flood outcomes for other cities, including those that have not yet experienced a major flood. Planners, engineers, and policymakers could employ models like this so that other cities may gain predictive intelligence from the experience of Calgary.


This poster was presented at the 2016 ESRI User Conference. (Click to enlarge)