Commit 7e0b0fe9 authored by Cecilia Nievas

Merge branch 'dev_gde_docs_1' into 'master'

First step of transferring documentation to Gitlab

See merge request dynamicexposure/globaldynamicexposure/gde_calculations_prototype!4
parents ac22dcc7 fbfaab0d
......@@ -8,7 +8,18 @@ NOTE: This is work in progress!
Python 3.x
(all dependencies to be listed soon!)
Dependencies (not all are needed for all individual scripts):
- Numpy (1.13.3 or later)
- Pandas (0.22.0 or later)
- Geopandas (0.7.0 or later)
- psycopg2 (2.8.5 or later)
- Shapely (1.6.4 or later)
- Matplotlib (2.1.1 or later)
- h5py (2.7.1 or later)
- json (2.0.9 or later)
- iso3166 (1.0.1 or later)
- pyproj ( or later)
# Copyright and Copyleft
# Introduction
This repository contains prototype code for the Global Dynamic Exposure (GDE) Model being developed at Section 2.6 of the GFZ German Research Centre for Geosciences.
The Global Dynamic Exposure Model combines existing building exposure models, which are usually spatially aggregated, with crowd-sourced, worldwide building information to produce a new state-of-the-art high-resolution model. One of its main sources of information is OpenBuildingMap (OBM), a twin project that continuously harvests data from OpenStreetMap (OSM), cleanses it, processes additional building attributes and makes the results available to the community.
This prototype code focuses on Europe, as it merges the building exposure model of the new European Seismic Risk Model (ESRM20), developed under the SERA project, with building-by-building data from OBM. The ESRM20 model (also referred to as the "SERA model" herein) is due to be officially released by the end of 2020, and we have so far been working with preliminary, in-progress data files.
# Overview of the Procedure
The overall procedure can be grouped into three main stages:
1. The distribution of the SERA exposure model, defined by administrative levels and units, onto a grid.
2. The retrieval of data on individual buildings from OpenBuildingMap for that same grid.
3. The combination of both sources of data.
The spacing of the grid used by the present prototype code is 10 arc-seconds, though the final code will work with a map tiles approach, handled through a Quadtree principle. In the present 10-arcsec grid, the cell ID starts from the North-West corner of the world, moves East by row, and finishes at the South-East corner of the world. First cell ID is 0, last cell ID is 8,398,079,999 (total number of cells is 8,398,080,000). There are 64,800 rows and 129,600 columns of cells.
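The numbering convention described above can be sketched as follows. This is only an illustration, not code from this repository: the function names are hypothetical, and it assumes that the North-West corner of the world is at longitude -180 and latitude 90 degrees.

```python
# Illustrative sketch of the 10-arcsec cell numbering (hypothetical helpers).
N_COLS = 129_600  # 360 degrees * 3600 arcsec / 10 arcsec
N_ROWS = 64_800   # 180 degrees * 3600 arcsec / 10 arcsec
CELL_SIZE_DEG = 10.0 / 3600.0  # grid spacing in decimal degrees


def cell_id_to_row_col(cell_id):
    """Return (row, col), with row 0 at the North and col 0 at the West."""
    if not 0 <= cell_id < N_ROWS * N_COLS:
        raise ValueError("cell_id is outside the grid")
    return divmod(cell_id, N_COLS)


def cell_id_to_bounds(cell_id):
    """Return (west, south, east, north) of a cell in decimal degrees."""
    row, col = cell_id_to_row_col(cell_id)
    west = -180.0 + col * CELL_SIZE_DEG   # assumed origin: lon = -180
    north = 90.0 - row * CELL_SIZE_DEG    # assumed origin: lat = +90
    return west, north - CELL_SIZE_DEG, west + CELL_SIZE_DEG, north


def lon_lat_to_cell_id(lon, lat):
    """Return the ID of the cell containing the point (lon, lat)."""
    # min() keeps points on the eastern/southern edge inside the last cell.
    col = min(int((lon + 180.0) / CELL_SIZE_DEG), N_COLS - 1)
    row = min(int((90.0 - lat) / CELL_SIZE_DEG), N_ROWS - 1)
    return row * N_COLS + col
```

Under these assumptions, cell ID 0 maps to row 0 and column 0, and the last cell ID maps to the last row and last column.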
The first stage consists of going one by one through the relevant administrative units of each country, determining the grid cells associated with each unit, and distributing the total number of buildings indicated by the SERA exposure model across those grid cells, according to a certain criterion ("distribution method"), such as population count or built-up area estimated from the processing of remote-sensing imagery. The proportion or distribution of building classes (structural types) is also retrieved from SERA, as well as the parameters of relevance for each building class, such as the number of people per dwelling, number of dwellings per building, cost per area, etc. All this information is stored as HDF5 files.
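As a rough illustration of the distribution step, the sketch below splits the number of buildings of one administrative unit across its cells in proportion to a per-cell weight (e.g. population count or built-up area). The function name and the purely proportional rule are assumptions made for this example; the actual scripts may treat special cases differently.

```python
def distribute_buildings(total_buildings, weights_by_cell):
    """Split total_buildings across cells proportionally to their weights.

    weights_by_cell maps cell ID -> weight (e.g. population or built-up
    area). Fractional building counts are kept, not rounded.
    """
    total_weight = sum(weights_by_cell.values())
    if total_weight <= 0:
        raise ValueError("the weights must add up to a positive value")
    return {
        cell_id: total_buildings * weight / total_weight
        for cell_id, weight in weights_by_cell.items()
    }
```

For example, 100 buildings distributed over two cells with weights 1 and 3 would give 25 and 75 buildings, respectively.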
In the second stage, information on the buildings represented in OpenBuildingMap is gathered for the same intersections of grid cells with administrative units. For each building, data on its location, footprint area, occupancy type and number of storeys is retrieved from OBM and stored in HDF5 files, one per cell ID and occupancy case (residential, commercial, industrial, other). If the SERA exposure model is defined at the location of the building, the distribution of potential classes is assigned to the building, narrowed down based on the number of storeys, if available.
In the third stage, SERA and OBM buildings are combined together. If the cell is OBM-complete, then only OBM buildings are considered for the final exposure model. If the cell is OBM-incomplete, the number of SERA buildings is treated as a theoretical number against which the number of OBM buildings is compared. If the latter is smaller than the former, the number of left-over buildings is calculated as SERA – OBM, while zero left-over buildings are considered if there are already more OBM buildings than the theoretical number from the SERA model. For the final exposure, OBM buildings retain their specific locations while all left-over buildings are treated as a set lumped at the centroid of the cell.
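The rule for left-over buildings can be written compactly as below. This is a minimal sketch with a hypothetical function name, assuming that the per-cell building counts and the OBM-completeness flag are already known.

```python
def left_over_buildings(sera_buildings, obm_buildings, obm_complete):
    """Number of SERA buildings not accounted for by OBM in one cell."""
    if obm_complete:
        # Only OBM buildings are considered in OBM-complete cells.
        return 0.0
    # SERA - OBM, but never negative when OBM already exceeds SERA.
    return max(sera_buildings - obm_buildings, 0.0)
```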
# Overview of the Code in this Repository
Scripts whose names start with `GDE_TOOLS_` contain functions that are used by all the other scripts.
All scripts that are not tools require input parameters that are read from a configuration file. The file `GDE_config_file_TEMPLATE.ini` contains the template of such a configuration file.
# Copyright and Copyleft
Copyright (C) 2020
Helmholtz-Zentrum Potsdam Deutsches GeoForschungsZentrum GFZ
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
See further details in `` and `LICENSE`.
# Acknowledgements
This project is partially funded by:
- the Real-time Earthquake Risk Reduction for a Resilient Europe (RISE) project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 821115;
- the Large-scale EXecution for Industry and Society (LEXIS) project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825532;
- the Airborne Observation of Critical Infrastructures (Luftgestützte Observation Kritischer Infrastrukturen, LOKI in German) project, which has received funding from the German Federal Ministry for Education and Research (BMBF) under funding code (FKZ) 03G0890D.
The authors would like to thank Dr. Helen Crowley, from Eucentre, for providing early access to the preliminary, in-progress ESRM20 exposure model.
\ No newline at end of file
# Configuration File
All the scripts in this repository require input parameters that are read from a configuration file. The file `GDE_config_file_TEMPLATE.ini` contains the template of such a configuration file. The file `guide_to_use_of_config_file.csv` indicates which sub-sections of the configuration file are needed by which script.
# Running the Scripts
The scripts are run from the command line as:
`python3 GDE_config_file.ini`
# Execution Order
## Core Scripts
The order in which the scripts in the present repository need to be run to produce the GDE model for a region of interest is:
1. Run ``
2. Run ``.
3. Run ``
4. Run `` (if GHS criterion desired)
5. Run `` (if GPW criterion desired)
6. Run `` (if Sat or Sat_mod criterion desired)
7. Run `` with the desired distribution method.
8. If the OpenQuake input files for the SERA model distributed onto a grid are desired (i.e. not GDE, just SERA), run `` with the desired distribution method.
9. If a CSV summarising the number of buildings, dwellings, people and costs by cell according to the SERA model is desired (i.e. not GDE, just SERA), run `` with the desired distribution method.
10. Run `` with the desired distribution method.
11. Run `` with the desired distribution method. The output is a series of CSV files that serve as input for damage/risk calculations to be run in OpenQuake, a CSV file that summarises results per cell and contains the geometry of the cells, and a CSV file that summarises results per administrative unit and contains the geometry of the administrative boundaries. The latter two files can be visualised with a GIS.
## Testing Scripts
- The scripts ``, `` and `` can be run after step 7 above. They compare the SERA-on-a-grid model against the original files of the SERA model.
- The script `` can be run after step 9 above to compare the number of buildings, people and cost per cell reported in the OpenQuake input file (generated from the grid) and the visual output CSV.
- The script `` can be run after step 6 above to create a summary of the parameters mapped (GHS, GPW, Sat, etc.) in CSV format to be read with QGIS, enabling a visual check of the results.
- The script `` can be run after step 3 above to check the areas of the cells mapped for the administrative units for which step 3 was run.
- The script `` can be run after step 11 above. It carries out different consistency checks on the resulting GDE model (see detailed description of this script).
- The script `` can be run after step 11 above. It prints to screen some summary values of the files and checks that the asset ID values are all unique.
## Other Scripts
- The script `` plots results (number of buildings, dwellings, people, costs) by administrative unit and by cell of the GDE model. It gathers in the same figure results obtained using different methods to distribute SERA to a grid.
- The following scripts are used to investigate the original SERA exposure files:
- ``
- ``
- ``
- ``
- ``
- ``
- ``
# Pre-Requisites / Initial Assumptions
- The OpenBuildingMap PostgreSQL database exists and contains, at least, the following fields (the `cell_id`, `XXX_adm_id` and `XXX_adm_level` fields are completed with ``):
| column_name | data_type |
| ------ | ------ |
| osm_id | bigint |
| way | USER-DEFINED |
| building | text |
| building_levels | text |
| gem_occupancy | text |
| way_area | double precision |
| adm3_id | bigint |
| cell_id | bigint |
| country_iso2 | character |
| res_adm_id | character varying |
| res_adm_level | integer |
| com_adm_id | character varying |
| com_adm_level | integer |
| ind_adm_id | character varying |
| ind_adm_level | integer |
- Shapefiles of the administrative boundaries associated with the SERA model have been imported to the OpenBuildingMap PostgreSQL database. Their table will look something like this:
| column_name | data_type |
| ------ | ------ |
| id_0 | bigint |
| name_0 | character varying |
| id_1 | bigint |
| name_1 | character varying |
| id_2 | bigint |
| name_2 | character varying |
| id_3 | bigint |
| name_3 | character varying |
- The structure of the Tiles PostgreSQL database exists, even if it is empty:
| column_name | data_type |
| ------ | ------ |
| cell_id | bigint |
| country | character varying |
| occupancy | character varying |
| adm_level | smallint |
| adm_id | character varying |
| area | double precision |
| geom | USER-DEFINED |
| gpw_2015_pop | double precision |
| ghs_km2 | double precision |
| sat_27f_km2 | double precision |
| sat_27f_model_km2 | double precision |
# Configuration File
All the scripts in this repository require input parameters that are read from a configuration file. The file `GDE_config_file_TEMPLATE.ini` contains the template of such a configuration file.
The configuration file is organised into sub-sections, some of which are common to several scripts while some others are specific. Those that are common are:
- `File Paths`: main input and output paths.
- `Available Results`: used by a few scripts that work together on results from several different runs (e.g. using different methods to distribute the SERA model to a grid).
- `OBM Database`: name of the database, schema, table and user to access the OBM buildings.
- `Tiles Database`: name of the database, schema, table and user to access the tiles/cells database table.
- `Admin Units Database`: name of the database, schema, table and user to access the administrative units database table.
- `Cells to Process`: the list of cells to process can be defined by means of several methods: by country, by administrative unit ID of a country, with a bounding box, specifying a number of random cells to select from a country, or by means of an arbitrary list of cell IDs. All these parameters are specified in this sub-section.
- `Ocuppancy String Groups`: mapping of occupancy strings to occupancy categories (e.g. RES1, RES2, etc., are "Res", COM1, COM11, etc., are "Com", etc.)
The ones that are specific to each script are named just like the script (e.g. `GDE_gather_SERA_and_OBM`, `GDE_plot_maps`, etc).
The file `guide_to_use_of_config_file.csv` indicates which sub-sections of the configuration file are needed by which script. The first column contains the name of the script. The second column, "Self", indicates whether a script-specific sub-section (named like the script itself) is required or not. All other columns correspond to the common sub-sections listed above. "Y" means the sub-section is needed, "N" means it is not needed. Nothing happens if sub-sections that are not needed are provided, but an error message is raised if required sub-sections are missing.
FILE,Self,File Paths,OBM Database,Tiles Database,Admin Units Database,Ocuppancy String Groups,Available Results,Cells to Process
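
The way the guide file relates scripts to sub-sections can be illustrated with the sketch below, which lists the sub-sections a script needs and reports those missing from a configuration file. The function names are hypothetical, and it is assumed that the `FILE` column holds exactly the name used for the script-specific sub-section; this is not the check the scripts themselves implement.

```python
import configparser
import csv
import io


def required_sections(guide_csv_text, script_name):
    """Return the sub-sections marked "Y" for script_name in the guide."""
    reader = csv.DictReader(io.StringIO(guide_csv_text))
    for row in reader:
        if row["FILE"] != script_name:
            continue
        # Common sub-sections: every column other than FILE and Self.
        needed = [
            column
            for column, flag in row.items()
            if column not in ("FILE", "Self") and flag.strip() == "Y"
        ]
        # "Self" refers to the sub-section named like the script itself.
        if row["Self"].strip() == "Y":
            needed.append(script_name)
        return needed
    raise KeyError(f"{script_name} not found in the guide file")


def missing_sections(config_text, needed):
    """Return the needed sub-sections absent from the configuration."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return [name for name in needed if not parser.has_section(name)]
```

With real files, the two text arguments would be the contents of `guide_to_use_of_config_file.csv` and of the user's copy of `GDE_config_file_TEMPLATE.ini`.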