Commit c4818432 authored by Cecilia Nievas

Added docs on testing scripts, tools and references

parent e6ae396b
......@@ -23,7 +23,8 @@ Section 2.6: Seismic Hazard and Risk Dynamics
GDE_TOOLS_read_SERA
===================
Tools used to read the original files of the SERA exposure model.
Tools used to read the original files of the SERA exposure model
and write the SERA HDF5 cell files and the SERA HDF5 buildings files.
"""
import numpy as np
......
......@@ -23,7 +23,7 @@ Section 2.6: Seismic Hazard and Risk Dynamics
GDE_check_consistency
=====================
See Quality Control Over the GDE.docx
See docs/05_Testing_Scripts.md.
"""
import sys
import os
......
# General
For each core script, the enumerated configurable parameters are those that are specific to that script, i.e. defined in the configuration file under a subtitle that matches the name of the file. General parameters are not explained herein but in `03_Config_File.md`.
For each core script, the enumerated configurable parameters are those that are specific to that script, i.e. defined in the configuration file under a subtitle that matches the name of the file. General parameters are not explained herein but in `03_Config_File.md` and `GDE_config_file_TEMPLATE.ini`.
# OBM_assign_cell_ids_and_adm_ids_to_footprints.py
......@@ -75,9 +75,6 @@ The input to this code is an HDF5 file with a 10-arcsec grid that is aligned wit
The method for selecting the cells to process and associated parameters need to be specified in the configuration file under `Cells to Process`.
The Global Human Settlement (GHS) dataset:
Pesaresi, Martino; Florczyk, Aneta; Schiavina, Marcello; Melchiorri, Michele; Maffenini, Luca (2019): GHS settlement grid, updated and refined REGIO model 2014 in application to GHS-BUILT R2018A and GHS-POP R2019A, multitemporal (1975-1990-2000-2015), R2019A. European Commission, Joint Research Centre (JRC) \[Dataset\] doi:10.2905/42E8BE89-54FF-464E-BE7B-BF9E64DA5218 PID: http://data.europa.eu/89h/42e8be89-54ff-464e-be7b-bf9e64da5218
# SERA_mapping_admin_units_to_cells_add_GPW.py
......@@ -187,9 +184,6 @@ One CSV file is created per occupancy case. This code does not create the associ
- `<occupancyPeriods>` night `</occupancyPeriods>`
- `<tagNames>` name_X `</tagNames>` (where X is the administrative level associated with this occupancy case and country)
OpenQuake:
Pagani, M., D. Monelli, G. Weatherill, L. Danciu, H. Crowley, V. Silva, P. Henshaw, L. Butler, M. Nastasi, L. Panzeri and M. Simionato (2014). OpenQuake engine: An open hazard (and risk) software for the global earthquake model. Seismological Research Letters 85(3), 692-702. https://github.com/gem/oq-engine
# SERA_create_visual_output_of_grid_model_full_files.py
......@@ -333,10 +327,4 @@ The first sub-group level is the occupancy case: Res, Com, Ind, Oth. Within each
- SERA_Cost_per_Bdg: the total cost per building of each of the building classes listed in SERA_classes
- SERA_Ppl_per_Bdg: the number of people per building of each of the building classes listed in SERA_classes
Attributes are used at different sub-group levels to summarise total values. Names associated with GDE and Left-Over buildings are different from each other when referring to costs and people to indicate that the value given for GDE buildings results from a weighted average of all possible building costs associated with each potential building class the building might belong to, obtained considering the probability of each class.
# References
- *OpenQuake*: Pagani, M., D. Monelli, G. Weatherill, L. Danciu, H. Crowley, V. Silva, P. Henshaw, L. Butler, M. Nastasi, L. Panzeri and M. Simionato (2014). OpenQuake engine: An open hazard (and risk) software for the global earthquake model. Seismological Research Letters 85(3), 692-702. https://github.com/gem/oq-engine
- *SERA exposure model*: Crowley, H., V. Despotaki, D. Rodrigues, V. Silva, D. Toma-Danila, E. Riga, A. Karatzetzou, S. Fotopoulou, Z. Zugic, L. Sousa and S. Ozcebe (2020). Exposure model for European seismic risk assessment. Earthquake Spectra, 36(1_suppl), pp.252-273. https://eu-risk.eucentre.it/exposure/
Attributes are used at different sub-group levels to summarise total values. Names associated with GDE and Left-Over buildings are different from each other when referring to costs and people to indicate that the value given for GDE buildings results from a weighted average of all possible building costs associated with each potential building class the building might belong to, obtained considering the probability of each class.
\ No newline at end of file
# General
For each core script, the enumerated configurable parameters are those that are specific to that script, i.e. defined in the configuration file under a subtitle that matches the name of the file. General parameters are not explained herein but in `03_Config_File.md`.
For each core script, the enumerated configurable parameters are those that are specific to that script, i.e. defined in the configuration file under a subtitle that matches the name of the file. General parameters are not explained herein but in `03_Config_File.md` and `GDE_config_file_TEMPLATE.ini`.
# SERA_testing_rebuilding_exposure_from_cells_alternative_01.py
......@@ -10,8 +10,8 @@ For each core script, the enumerated configurable parameters are those that are
The parameters that need to be specified under the `SERA_testing_rebuilding_exposure_from_cells_alternative_01` section of the configuration file are:
- countries = Countries to process. If more than one, separate with comma and space.
- admin_ids_to_ignore = 1110101. Within those countries, do not process admin units specified under this parameter. This is useful for running parts of countries only, it can be empty or ignored too.
- sera_disaggregation_to_consider = area, gpw_2015_pop, ghs, sat_27f or sat_27f_model. Select the parameter to use to distribute the SERA model to the grid.
- admin_ids_to_ignore = Within those countries, do not process administrative units specified under this parameter. This is useful for running parts of countries only, it can be empty or ignored too.
- sera_disaggregation_to_consider = area, gpw_2015_pop, ghs, sat_27f or sat_27f_model. Select the parameter used to distribute the SERA model to the grid.
- occupancy_cases = Res, Com, Ind. Occupancy cases to process.
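As an illustration of how such a section could be read, the following is a minimal sketch using Python's standard `configparser`. It is not the implementation in `GDE_TOOLS_read_config_file.py`; the helper names `split_list` and `read_section` and the sample values (e.g. the country names) are assumptions made for this example only.

```python
import configparser

# Hypothetical sample of the relevant section of the configuration file.
SAMPLE = """
[SERA_testing_rebuilding_exposure_from_cells_alternative_01]
countries = Greece, Italy
admin_ids_to_ignore =
sera_disaggregation_to_consider = gpw_2015_pop
occupancy_cases = Res, Com, Ind
"""


def split_list(raw):
    """Split a 'comma and space' separated value; an empty value gives []."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def read_section(text, section):
    """Return the script-specific parameters of one section as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    sec = parser[section]
    return {
        "countries": split_list(sec.get("countries", fallback="")),
        "admin_ids_to_ignore": split_list(sec.get("admin_ids_to_ignore", fallback="")),
        "sera_disaggregation_to_consider": sec.get(
            "sera_disaggregation_to_consider", fallback=""
        ).strip(),
        "occupancy_cases": split_list(sec.get("occupancy_cases", fallback="")),
    }


params = read_section(
    SAMPLE, "SERA_testing_rebuilding_exposure_from_cells_alternative_01"
)
print(params)
```

Leaving `admin_ids_to_ignore` empty yields an empty list in this sketch, which mirrors the statement that the parameter can be empty.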
## What the code does:
......@@ -69,7 +69,7 @@ The parameters that need to be specified under the `SERA_testing_rebuilding_expo
- countries = Countries to process. If more than one, separate with comma and space.
- min_grid_cell_id = Cell IDs with numbers below this one will be ignored (useful while running pieces of countries and not whole countries). Leave empty if no constraint applies.
- sera_disaggregation_to_consider = area, gpw_2015_pop, ghs, sat_27f or sat_27f_model. Select the parameter to use to distribute the SERA model to the grid.
- sera_disaggregation_to_consider = area, gpw_2015_pop, ghs, sat_27f or sat_27f_model. Select the parameter used to distribute the SERA model to the grid.
- occupancy_cases = Res, Com, Ind. Occupancy cases to process.
## What the code does:
......@@ -120,8 +120,8 @@ The summary TXT file refers to the whole country. If only part of the country ha
The parameters that need to be specified under the `SERA_testing_rebuilding_exposure_from_cells_alternative_03` section of the configuration file are:
- countries = Countries to process. If more than one, separate with comma and space.
- admin_ids_to_ignore = 1110101. Within those countries, do not process admin units specified under this parameter. This is useful for running parts of countries only, it can be empty or ignored too.
- sera_disaggregation_to_consider = area, gpw_2015_pop, ghs, sat_27f or sat_27f_model. Select the parameter to use to distribute the SERA model to the grid.
- admin_ids_to_ignore = Within those countries, do not process administrative units specified under this parameter. This is useful for running parts of countries only, it can be empty or ignored too.
- sera_disaggregation_to_consider = area, gpw_2015_pop, ghs, sat_27f or sat_27f_model. Select the parameter used to distribute the SERA model to the grid.
- occupancy_cases = Res, Com, Ind. Occupancy cases to process.
## What the code does:
......@@ -153,25 +153,118 @@ When reading the output CSV files, it should be noted that the difference column
# SERA_testing_compare_visual_output_vs_OQ_input_files.py
Compares the number of buildings, people and cost per cell reported in the OpenQuake input file (generated from the grid) and the visual output CSV.
## Configurable parameters:
The parameters that need to be specified under the `SERA_testing_compare_visual_output_vs_OQ_input_files` section of the configuration file are:
- country = Country to process (just one).
- admin_ids_to_ignore = Within that country, do not process administrative units specified under this parameter. This is useful for running parts of the country only. It can be empty or ignored too.
- sera_disaggregation_to_consider = area, gpw_2015_pop, ghs, sat_27f or sat_27f_model. Select the parameter used to distribute the SERA model to the grid.
- occupancy_case = Res, Com or Ind. Indicate the occupancy case to run.
- visual_output_filename = filename of the visual output CSV file that will be checked (including .csv extension).
## What the code does:
This code compares the total number of buildings, people and cost per cell stemming from the CSV files for OpenQuake (i.e. the files generated with `SERA_create_OQ_input_files.py`) and from the visual output CSV (i.e. the file generated with `SERA_create_visual_output_of_grid_model_full_files.py`). The code goes one by one through the points (lon-lat pairs) in the OQ input file, gathers the totals and compares them with those in the visual output CSV file (the connection is made via the grid tools, which determine the cell ID from the lon-lat pairs). This means that if the visual output file covers areas that the OQ input file does not, no warning is raised because such areas are left unexplored.
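The comparison logic can be sketched as follows. This is a simplified illustration and not the script itself: the row layout, the column names (`buildings`, `people`, `cost`), the tolerance and the toy `cell_id_of` function are all assumptions made for the example; in the real code the cell ID comes from the grid tools.

```python
from collections import defaultdict

KEYS = ("buildings", "people", "cost")


def totals_per_cell(oq_rows, cell_id_of):
    """Sum buildings, people and cost of all OQ points falling in each cell."""
    totals = defaultdict(lambda: {key: 0.0 for key in KEYS})
    for row in oq_rows:
        cell = cell_id_of(row["lon"], row["lat"])
        for key in KEYS:
            totals[cell][key] += row[key]
    return dict(totals)


def compare(oq_totals, visual_totals, tol=1e-6):
    """List (cell, issue) pairs, visiting only the cells present in the OQ input."""
    mismatches = []
    for cell, values in oq_totals.items():
        ref = visual_totals.get(cell)
        if ref is None:
            mismatches.append((cell, "missing in visual output"))
            continue
        for key in KEYS:
            if abs(values[key] - ref[key]) > tol:
                mismatches.append((cell, key))
    return mismatches


# Toy stand-in for the grid tools: one "cell" per integer degree (hypothetical).
def toy_cell_id(lon, lat):
    return (int(lon), int(lat))


oq_rows = [
    {"lon": 23.2, "lat": 37.5, "buildings": 1.5, "people": 4.0, "cost": 400.0},
    {"lon": 23.7, "lat": 37.9, "buildings": 2.5, "people": 6.0, "cost": 600.0},
]
visual = {
    (23, 37): {"buildings": 4.0, "people": 10.0, "cost": 1000.0},
    (24, 38): {"buildings": 9.0, "people": 20.0, "cost": 5000.0},  # not in OQ file
}
print(compare(totals_per_cell(oq_rows, toy_cell_id), visual))
```

Because the loop only visits cells found in the OQ input, the extra cell `(24, 38)` of the visual output is never inspected, which is the behaviour described above.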
# SERA_create_outputs_QGIS_for_checking.py
Creates a summary of the parameters mapped (GHS, GPW, Sat, etc.) in CSV format to be read with QGIS, enabling a visual check of the results.
## Configurable parameters:
The parameter that needs to be specified under the `SERA_create_outputs_QGIS_for_checking` section of the configuration file is:
- country = Country to process (just one).
## What the code does:
This code gathers the values of area, population (from GPW) and built-up area (from GHS, Sat and Sat-Mod) and creates an output CSV with geometry defined as Well-Known-Text that can be loaded into QGIS. This makes it possible to manually compare, in QGIS, the values of these parameters in the CSV against the corresponding ones from the original sources.
# SERA_testing_mapping_admin_units_to_cells_qualitycontrol.py
Checks the areas of the cells mapped for the administrative units for which step 3 was run.
## Configurable parameters:
The parameters that need to be specified under the `SERA_testing_mapping_admin_units_to_cells_qualitycontrol` section of the configuration file are:
- country = Country to process (just one).
- tolerance = Tolerance for testing. Inspection of results in QGIS suggests that precision beyond the 9th decimal place cannot be expected for the area in km2, while test runs suggest that precision is only achieved up to the 6th decimal place (i.e. at the m2 level). 1E-6 is a reasonable tolerance.
- number_of_cell_samples = Number of cells to test (not all cells in the country are tested, as this would take very long). 1000 is a reasonable number (in terms of running times).
## What the code does:
This code tests the results obtained with `SERA_mapping_admin_units_to_cells.py`. The tests are two:
- checking whether the areas calculated retrieving each of the three possible occupancies ('Res', 'Com', 'Ind') match with each other
- checking whether the area of the whole cell calculated all together is the same as the sum of the pieces that are being written to the database
The first check is only relevant when the three occupancy types are defined in the SERA exposure model at the same administrative level.
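The two checks can be illustrated with the following minimal sketch, which assumes that areas are plain floats in km2 and that an absolute tolerance is used. The function names and the numbers are hypothetical and are not taken from the script.

```python
def areas_agree(areas, tolerance=1e-6):
    """Check 1: areas retrieved per occupancy case ('Res', 'Com', 'Ind') match."""
    return max(areas) - min(areas) <= tolerance


def pieces_sum_to_cell(cell_area, piece_areas, tolerance=1e-6):
    """Check 2: the area of the whole cell equals the sum of its pieces."""
    return abs(cell_area - sum(piece_areas)) <= tolerance


# Hypothetical areas (km2) of one cell, retrieved for Res, Com and Ind.
print(areas_agree([0.0712345, 0.0712345, 0.0712349]))
# Hypothetical whole-cell area versus the pieces written to the database.
print(pieces_sum_to_cell(0.09, [0.03, 0.06]))
```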
Input parameters are:
- test_country
- number of samples (number of cells within that country that will be verified)
- tolerance (for the area in km2)
This code is limited in scope and might become redundant in the future.
# GDE_check_consistency.py
It carries out different consistency checks on the resulting GDE model (see detailed description of this script).
## Configurable parameters:
The parameter that needs to be specified under the `GDE_check_consistency` section of the configuration file is:
- location_var = The GDE visual output files (whose consistency is checked by this code) have names such as GDE_visual_`crit` `location_var`.csv. `location_var` is defined in `GDE_gather_SERA_and_OBM.py` as a function of how the list of cells to process is defined. Examples of `location_var` are "Greece", "Greece_3514802", "bbox_15789632513_15789638934", "arbitrary_15789632513_15789638934", etc.
## What the code does:
This code tests the consistency of different aspects of the resulting GDE model (i.e., the outputs of the `GDE_gather_SERA_and_OBM.py` code). The aspects being checked and the output files in which the results of the checks can be evaluated are:
- Visual Output by Cell:
1. Do the total numbers of OBM buildings classified as Res, Com, Ind, Oth match those in the PSQL database? `check_obm_bdgs.csv`
2. Do the total numbers of SERA buildings match the total from the SERA CSV files for Attica? `check_sera_cells.csv`
3. Do the numbers of SERA, OBM, LeftOver, and Total buildings match, considering the completeness status of the cell? `check_leftover_total_cells.txt`
4. IMPORTANT: It is not possible to check LeftOver = SERA – OBM because this operation is carried out within each administrative unit involved with the cell, not in the cell as a whole.
- Visual Output by Administrative Unit:
1. Does the number of administrative units match the SERA number of administrative units? `check_sera_CRIT_differences.txt`
2. Do the total numbers of SERA buildings match the total from the SERA CSV files for Attica? `check_sera_cells.csv`
3. Do the numbers (buildings, dwellings, people, cost) per administrative unit match the SERA numbers per administrative unit (per Res, Com, Ind)? `check_sera_admin_CRIT.csv` and `check_sera_CRIT_differences.txt`.
4. Do the total numbers of OBM buildings classified as Res, Com, Ind, Oth match those in the PSQL database? `check_obm_bdgs.csv`
- Other:
1. Do the sums of numbers of OBM buildings classified as Res, Com, Ind, Oth in the visual output by administrative unit match those of the visual output by cell? `check_obm_bdgs.csv`
2. Do the sums of numbers of LeftOver and Total buildings classified as Res, Com, Ind, Oth in the visual output by administrative unit match those of the visual output by cell? `check_leftover_total.csv` and `check_leftover_total_differences.txt`
3. Do the sums of numbers of LeftOver buildings classified as Res, Com, Ind in the visual outputs match those of the OQ input files (“cellspart”)? `check_obm_leftover_visual_vs_OQ.csv` and `check_obm_leftover_visual_vs_OQ_differences.txt`
4. Do the sums of numbers of OBM buildings with classes classified as Res, Com, Ind in the visual outputs match those of the OQ input files (“OBMpart”)? `check_obm_leftover_visual_vs_OQ.csv` and `check_obm_leftover_visual_vs_OQ_differences.txt`
The output files from this code need to be manually analysed to conclude whether the resulting GDE model is consistent with its assumptions.
# GDE_check_OQ_input_files.py
It prints to screen some summary values of the files and checks that the asset ID values are all unique.
## Configurable parameters:
No specific parameters for this code. Of the general parameters (see `03_Config_File.md` and `GDE_config_file_TEMPLATE.ini`), it needs "File Paths" and "Available Results".
## What the code does:
This code carries out checks on the GDE input files for OpenQuake, generated with `GDE_gather_SERA_and_OBM.py`. It prints to screen some summary values of the files and checks that the asset ID values are all unique.
The summary values printed to screen need to be interpreted with knowledge on the particular region being scrutinised.
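A uniqueness check of this kind can be sketched as below. This is only an illustration: the column name `id` and the sample rows are assumptions made for the example, and the actual script may read and check the files differently.

```python
import csv
import io
from collections import Counter


def duplicated_asset_ids(csv_file, id_column="id"):
    """Return the sorted asset IDs that appear more than once in the file."""
    reader = csv.DictReader(csv_file)
    counts = Counter(row[id_column] for row in reader)
    return sorted(asset_id for asset_id, n in counts.items() if n > 1)


# Hypothetical exposure rows; "A1" is deliberately repeated.
sample = io.StringIO(
    "id,lon,lat,number\n"
    "A1,23.7,37.9,2\n"
    "A2,23.8,37.9,1\n"
    "A1,23.9,38.0,4\n"
)
print(duplicated_asset_ids(sample))
```

An empty list means that all asset IDs in the file are unique.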
# GDE_check_tiles_vs_visual_CSVs.py
It reads the visual CSV output by cell and the corresponding GDE tiles HDF5 files and compares the number of buildings, cost and number of people in each cell according to each of the two. An output CSV file collects the discrepancies found, if any.
## Configurable parameters:
The parameters that need to be specified under the `GDE_check_tiles_vs_visual_CSVs` section of the configuration file are:
- path_GDE_tiles = Path to the GDE tiles HDF5 files to consider (full directory path).
- path_visual_csv = Path to the by-cell visual output CSV file to consider (full file path, including file extension).
- occupancy_cases = Res, Com, Ind. Occupancy cases to process.
- decimal_places_gral = Decimal places tolerance for all parameters except costs. Four (4) decimal places is reasonable.
- decimal_places_costs = Decimal places tolerance for costs, which tend to have larger discrepancies than other parameters. Zero (0) decimal places is reasonable because costs are large in magnitude.
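One possible reading of a "decimal places" tolerance is sketched below: two values are treated as equal when their difference rounds to zero at the given number of decimal places. This interpretation, the function name `differs` and the sample numbers are assumptions; the script may implement the comparison differently.

```python
def differs(value_a, value_b, decimal_places):
    """True if the two values differ once rounded to `decimal_places`."""
    return round(abs(value_a - value_b), decimal_places) > 0


# General parameters (e.g. number of buildings), four decimal places.
print(differs(12.00001, 12.00004, 4))
print(differs(12.0, 12.001, 4))

# Costs, zero decimal places: sub-unit discrepancies are ignored.
print(differs(1500000.2, 1500000.4, 0))
print(differs(1500000.0, 1500002.0, 0))
```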
## What the code does:
This code reads the visual CSV output by cell and the corresponding GDE tiles HDF5 files and compares the number of buildings, cost and number of people in each cell according to each of the two. An output CSV file collects the discrepancies found, if any.
The code goes through each cell ID in the CSV output and attempts to open the corresponding HDF5 file. It outputs an error if the HDF5 file is not found. The code also reads the list of all HDF5 files in the indicated directory and checks if there are any HDF5 files that are not in the CSV file, writing a warning to the output file if that is the case.
\ No newline at end of file
# General
Scripts whose names begin with "GDE_TOOLS" contain functions that are used by the main code.
# GDE_TOOLS_read_config_file.py
Tools used to read and validate the parameters in the configuration file used to run the different scripts that make up the GDE code.
NOTE: Not all possible consistency checks and verifications are carried out (e.g. not every float is tested to check whether it is a float or not).
# GDE_TOOLS_psql.py
Tools used by the GDE code to access/query/write to the PSQL databases of tiles and OBM buildings.
# GDE_TOOLS_world_grid.py
This code generates the 10-arcsec world grid over which the Global Dynamic Exposure model is built in this version of the code. Future versions of GDE will use zoom level 18 Quadtiles instead.
The grid is conceptually defined in the following way:
- The grid spacing is 10 arc-seconds.
- The grid runs from -180.0 through +180.0 in longitude.
- The grid runs from -90.0 through +90.0 in latitude.
- The top-left-most cell (NW) is cell number 0.
- The cell id increases from this first cell to the east, by "row".
- At the end of each row, the cell id "jumps" to the first (westmost) cell of the next row.
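The definition above can be turned into a cell ID calculation as in the following sketch. It assumes that rows are counted from north to south (consistent with cell 0 being the NW-most cell) and that points lying exactly on the eastern or southern edge of the grid are assigned to the last column or row. The function name is hypothetical and the functions in `GDE_TOOLS_world_grid.py` may differ.

```python
import math

CELLS_PER_DEGREE = 360            # 3600 arc-seconds / 10 arc-seconds
N_COLS = 360 * CELLS_PER_DEGREE   # cells per row (longitude -180 to +180)
N_ROWS = 180 * CELLS_PER_DEGREE   # number of rows (latitude +90 to -90)


def cell_id_from_lon_lat(lon, lat):
    """Return the ID of the 10-arcsec cell containing the point (lon, lat)."""
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError("lon/lat outside the extent of the grid")
    # Column counted eastwards from -180, row counted southwards from +90
    # (the southward row order is an assumption of this sketch).
    col = min(math.floor((lon + 180.0) * CELLS_PER_DEGREE), N_COLS - 1)
    row = min(math.floor((90.0 - lat) * CELLS_PER_DEGREE), N_ROWS - 1)
    return row * N_COLS + col


print(cell_id_from_lon_lat(-180.0, 90.0))   # NW-most cell
print(cell_id_from_lon_lat(180.0, -90.0))   # SE-most cell
```

With these constants each row holds 129600 cells, so the first cell of the second row has ID 129600.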
# GDE_TOOLS_read_SERA.py
Tools used to read the original files of the SERA exposure model and write the SERA HDF5 cell files and the SERA HDF5 buildings files (when running `SERA_distributing_exposure_to_cells.py`).
# GDE_TOOLS_GPW.py
Tools to load the population and density grids of Gridded Population of the World (GPW) v4.0. The input HDF5 files read by this code have been previously parsed from the original GPW files.
The Gridded Population of the World (GPW) v4.0 dataset: Center for International Earth Science Information Network-CIESIN-Columbia University (2016) Gridded Population of the World, Version 4 (GPWv4). NASA Socioeconomic Data and Applications Center, Palisades. http://dx.doi.org/10.7927/H4NP22DQ
# GDE_TOOLS_access_SERA_HDF.py
Tools for accessing the SERA HDF5 cell files and the SERA HDF5 buildings files of the parsed SERA model.
# GDE_TOOLS_access_OBM_HDF.py
Tools for accessing the HDF5 files of OBM buildings created when running `OBM_buildings_per_cell.py`.
# GDE_TOOLS_general.py
A variety of tools used by the GDE code.
# References
- *Global Human Settlement (GHS) dataset*: Pesaresi, M., A. Florczyk, M. Schiavina, M. Melchiorri and L. Maffenini (2019): GHS settlement grid, updated and refined REGIO model 2014 in application to GHS-BUILT R2018A and GHS-POP R2019A, multitemporal (1975-1990-2000-2015), R2019A. European Commission, Joint Research Centre (JRC) \[Dataset\] doi:10.2905/42E8BE89-54FF-464E-BE7B-BF9E64DA5218 PID: http://data.europa.eu/89h/42e8be89-54ff-464e-be7b-bf9e64da5218
- *Gridded Population of the World (GPW) v4.0 dataset*: Center for International Earth Science Information Network-CIESIN-Columbia University (2016) Gridded Population of the World, Version 4 (GPWv4). NASA Socioeconomic Data and Applications Center, Palisades. http://dx.doi.org/10.7927/H4NP22DQ
- *OpenBuildingMap*: http://www.openbuildingmap.org/
- *OpenQuake*: Pagani, M., D. Monelli, G. Weatherill, L. Danciu, H. Crowley, V. Silva, P. Henshaw, L. Butler, M. Nastasi, L. Panzeri and M. Simionato (2014). OpenQuake engine: An open hazard (and risk) software for the global earthquake model. Seismological Research Letters 85(3), 692-702. https://github.com/gem/oq-engine
- *OpenStreetMap*: https://www.openstreetmap.org
- *SERA exposure model*: Crowley, H., V. Despotaki, D. Rodrigues, V. Silva, D. Toma-Danila, E. Riga, A. Karatzetzou, S. Fotopoulou, Z. Zugic, L. Sousa and S. Ozcebe (2020). Exposure model for European seismic risk assessment. Earthquake Spectra, 36(1_suppl), pp.252-273. https://eu-risk.eucentre.it/exposure/
\ No newline at end of file