Commit e0d482e5 authored by Nicolas Garcia Ospina

Improved config file and added documentation

parent c1ad7d11
Pipeline #21418 passed with stage in 4 minutes and 22 seconds
@@ -3,6 +3,7 @@ datasource:
   pathname: /your/path/to/datasource
   raster_files_index: GHS_TEST_INDEX.shp
   built_pixel_values: [6, 5, 4, 3]
+  # method_id: 1 # optional
 tiles:
   use_txt: False
@@ -10,8 +11,8 @@ tiles:
   txt_filepath: tiles.txt
 output_pathname: ./results
-number_cores: 10
+number_cores: 1
-chunk_size: 1000
+batch_size: 1000
 database:
   host: your_host.dir.request_data
@@ -22,6 +23,7 @@ database:
   roads_table:
     tablename: planet_osm_roads
     geometry_field: way
+  process_buffer_magnitude: False
 target_database:
   host: your_host.dir.request_data
...
@@ -10,7 +10,7 @@ have different ways to describe the surface but can be reduced to a binary solut
 document a short description of the datasets can be found. For more information, check the datasets
 official sites.
-## Method_id
+## source_id
 The input datasets can be instantiated with a `method_id` if desired. This is recommended if the aim is
 to create a multi-source product. This option can be activated by setting the `method_id` argument to
@@ -33,10 +33,10 @@ datasets. The structure has two main components to follow (`file paths` and `ras
 ## File paths
-The program searches for an environment variable called `OBMGAPANALYSIS_BASE_PATH`, which is the path
+The program config file uses the path name found under `datasource.pathname`, which points to the path
 where the different datasets are stored and should follow this structure.
-    OBMGAPANALYSIS_BASE_PATH
+    datasource.pathname
     +-- dataset_1_raster_files_index.shp
     +-- dataset_1_directory
     |   +-- dataset_subdirectory_1
...
# Configuration file
The OpenBuildingMap (OBM) gap analysis program can be configured to fit user needs. This is done by
changing the different parameters within the `config.yml` file. A sample file can be found in
the package directory under the filename `config-example.yml`.
## config.yml
The `config.yml` file follows a hierarchical structure, with each hierarchy level defined by a 2-space indentation.
The different parameters are explained below.
The `datasource` section configures the path names for the raster files and the `raster_files_index`.
It also sets the remaining parameters described in the [Input dataset docs](./02_Input_datasets.md); an example follows the parameter list.
datasource: Configure your settlement layer source
  crs (str): Coordinate system associated with the DataSource, given as an EPSG code.
  pathname (str): Pathname where the dataset and raster_files_index are located.
  raster_files_index (str): Filename of the raster index.
  built_pixel_values (list): List with pixel values that represent built areas.
  source_id (int): ID related to a dataset/method used (optional).
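For orientation, a `datasource` section laid out this way might look like the sketch below. The path, index filename and pixel values come from `config-example.yml`, the CRS is the value used in the tests, and the commented `source_id` line is only an assumption of how the optional ID could be supplied:

```yaml
datasource:
  crs: epsg:3857                      # CRS of the settlement raster (value used in the tests)
  pathname: /your/path/to/datasource  # directory holding the rasters and the index shapefile
  raster_files_index: GHS_TEST_INDEX.shp
  built_pixel_values: [6, 5, 4, 3]    # GHSL pixel values counted as built
  # source_id: 1                      # optional dataset/method identifier (assumed key name)
```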
The `tiles` section provides the list of Quadkeys to be processed. The `use_txt` argument defines how the
tiles are supplied: if set to `True`, Quadkeys are read from `txt_filepath` instead of `tiles_list`. An example follows the parameter list.
tiles:
  use_txt (bool): If set to True, tiles are read from `txt_filepath`; if False, from `tiles_list`.
  tiles_list (list): List of Quadkeys as strings.
  txt_filepath (str): File path of a text file with all Quadkeys to be read.
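As a sketch, a `tiles` section listing Quadkeys directly could look as follows; the Quadkey shown is the one used in the tests, and `tiles.txt` is the filename from `config-example.yml`:

```yaml
tiles:
  use_txt: False                      # read Quadkeys from tiles_list below
  tiles_list: ["122100200320321022"]  # Quadkeys given as strings
  txt_filepath: tiles.txt             # read instead when use_txt is True
```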
The following parameters define the processing output and can improve the performance of the program.
First, `output_pathname` is the directory where CSV files are written and read for later import into SQL.
The `number_cores` parameter is the maximum number of parallel processes to run, i.e. the number of
cores that can be dedicated to the program execution. Finally, `batch_size` sets the maximum
number of tiles handled per process; each CSV file contains at most this number of tiles, assuming
all of them yield built areas (see the example after the parameter list below).
output_pathname (str): Target path name for writing and reading the CSV files.
number_cores (int): Desired maximum number of parallel processes to execute.
batch_size (int): Maximum number of tiles to be handled per process.
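A minimal sketch of these output and performance settings, using the values from `config-example.yml`:

```yaml
output_pathname: ./results   # CSV files with processed tiles are written and read here
number_cores: 1              # worker processes in the multiprocessing pool
batch_size: 1000             # Quadkeys per batch, and at most per output CSV file
```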
The last sections refer to database connections. `database` holds a database from which roads can be
extracted to refine built areas; it may also contain buildings if a tile-based built-up area is to be
calculated. The `process_buffer_magnitude` parameter defines how the OBM roads (stored as lines, not
polygons) are processed, buffering them to give them a width. Be careful to use the same units as the
`datasource.crs` (meters or degrees). `target_database` is a second database holding the table into
which processed tiles are imported (see the sketch after the parameter list below).
database:
  host (str): Postgres database host address.
  dbname (str): PostgreSQL database name.
  port (int): Port to connect to the PostgreSQL database.
  username (str): User to connect to the PostgreSQL database.
  password (str or getpass.getpass): Password for the `username` argument.
  roads_table:
    tablename (str): Table name within the database for searching.
    geometry_field (str): Name of the column with geometries.
  process_buffer_magnitude (float): Numeric magnitude for the polygon buffer
    (units are equal to the coordinate system units).
target_database:
  host (str): Postgres database host address.
  dbname (str): PostgreSQL database name.
  port (int): Port to connect to the PostgreSQL database.
  username (str): User to connect to the PostgreSQL database.
  password (str or getpass.getpass): Password for the `username` argument.
  tiles_table:
    tablename (str): Table name within the database for writing.
    geometry_field (str): Name of the column with geometries.
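To illustrate, the database sections might look like the sketch below. The host, roads table name and geometry field come from `config-example.yml`; the credentials, port and target table names are placeholders, and the 3.0 buffer mirrors the value previously hard-coded in `tileprocessor.py`:

```yaml
database:
  host: your_host.dir.request_data
  dbname: your_database            # placeholder
  port: 5432                       # assumed default PostgreSQL port
  username: your_username          # placeholder
  password: your_password          # placeholder (see getpass note above)
  roads_table:
    tablename: planet_osm_roads
    geometry_field: way
  process_buffer_magnitude: 3.0    # buffer width in datasource.crs units
target_database:
  host: your_host.dir.request_data
  dbname: your_target_database     # placeholder
  port: 5432                       # assumed default PostgreSQL port
  username: your_username          # placeholder
  password: your_password          # placeholder
  tiles_table:
    tablename: your_tiles_table    # placeholder
    geometry_field: geometry       # placeholder
```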
@@ -37,16 +37,18 @@ class DataSource:
     Args:
         crs (str): Coordinate system associated with the DataSource, given as a EPSG code.
-        pathname (str): Pathname where the dataset and explanatory dataframe are located.
+        pathname (str): Pathname where the dataset and raster_files_index are located.
         raster_files_index (str): Filename of the raster index.
-        method_id (int): ID related to a dataset/method used (optional). Default = False
+        built_pixel_values (list): List with pixel values that represent built areas.
+        source_id (int): ID related to a dataset/method used (optional). Default = False

     Attributes:
         self.crs (str): Coordinate system associated with the DataSource, given as a EPSG code.
-        self.pathname (str): Pathname where the dataset and explanatory dataframe are located.
+        self.pathname (str): Pathname where the dataset and raster_files_index are located.
         self.raster_files_index (geopandas.geodataframe.GeoDataFrame): GeoPandas dataframe
             with raster relative filepaths and respective geometries.
@@ -54,14 +56,14 @@ class DataSource:
         self.built_pixel_values (list): List with pixel values that represent built areas.
             Hints on pixel built_pixel_values available at docs/02_Input_datasets.md
-        self.method_id (int): ID related to the settlement dataset/method used (optional)
+        self.source_id (int): ID related to the settlement dataset/method used (optional)
     """

-    def __init__(self, crs, pathname, raster_files_index, built_pixel_values, method_id=False):
+    def __init__(self, crs, pathname, raster_files_index, built_pixel_values, source_id=False):
         self.crs = crs
         self.pathname = pathname
         self.raster_files_index = geopandas.read_file(
             os.path.join(pathname, raster_files_index)
         )
         self.built_pixel_values = built_pixel_values
-        self.method_id = method_id
+        self.source_id = source_id
@@ -95,6 +95,7 @@ def multiprocess_chunk(quadkey_batch):
                 datasource=datasource,
                 database_crs_number=roads_database_crs_number,
                 table_config=db_config["roads_table"],
+                buffer_magnitude=db_config["process_buffer_magnitude"],
             )
             if result is not None:
                 build_up_areas.append(result)
@@ -172,15 +173,15 @@ def main():
     # Generate a parallel process pool with each quadkey batch and process
     num_processes = config["number_cores"]
-    chunk_size = config["chunk_size"]
+    batch_size = config["batch_size"]
-    quadkey_batchs = [
-        tiles_list[i : i + chunk_size] for i in range(0, len(tiles_list), chunk_size)
+    quadkey_batches = [
+        tiles_list[i : i + batch_size] for i in range(0, len(tiles_list), batch_size)
     ]
     logging.info("Creating multiprocessing pool")
     with multiprocessing.Pool(processes=num_processes) as pool:
-        logging.info("Start parallel processing")
+        logging.info("Start parallel processing of {} batches".format(len(quadkey_batches)))
-        pool.map(multiprocess_chunk, quadkey_batchs)
+        pool.map(multiprocess_chunk, quadkey_batches)
         logging.info("Parallel processing finished, closing pool")
         pool.close()
...
@@ -197,7 +197,7 @@ class TileProcessor:
         return geometry

     @staticmethod
-    def process_dataframe_with_tile(input_dataframe, tile, buffer_magnitude=False):
+    def process_dataframe_with_tile(input_dataframe, tile, buffer_magnitude=0.0):
         """
         Returns a (multi)polygon processed with a tile object and, if desired, buffered
         by a certain magnitude.
@@ -216,7 +216,7 @@ class TileProcessor:
         """
         geometry = input_dataframe.unary_union
         geometry = TileProcessor.reproject_polygon(geometry, input_dataframe.crs, tile.crs)
-        if buffer_magnitude:
+        if buffer_magnitude > 0.0:
             geometry = geometry.buffer(buffer_magnitude)
         geometry = TileProcessor.clip_to_tile_extent(geometry, tile)
@@ -267,7 +267,7 @@ class TileProcessor:
         associated to the Tile and a given DataSource.
         Contains:
             quadkey (str): Tile quadkey
-            method_id (int): Integer associated to a predefined method
+            source_id (int): Integer associated to a predefined method
             built_area (str): Polygon string projected to WGS84 coordinates.
             size_built_area (float): Area measured in squared meters.
             last_update (str): Date when the pickle was generated.
@@ -276,7 +276,7 @@ class TileProcessor:
             tile (tile.Tile): Tile object with quadkey, crs and geometry attributes.
             datasource (datasource.DataSource): DataSource instance with crs,
-                pathname, method_id and raster_files_index attributes.
+                pathname, source_id and raster_files_index attributes.
             built_polygon (shapely.geometry.multipolygon.MultiPolygon): Shapely
                 polygon or multipolygon of the built area.
@@ -291,18 +291,20 @@ class TileProcessor:
         results = {
             "quadkey": tile.quadkey,
-            "method_id": datasource.method_id,
+            "source_id": datasource.source_id,
             "built_area": TileProcessor.reproject_polygon(built_polygon, tile.crs, "epsg:4326"),
             "size_built_area": TileProcessor.albers_area_calculation(built_polygon, tile.crs),
             "last_update": str(date.today()),
         }
-        if not results["method_id"]:
-            del results["method_id"]
+        if not results["source_id"]:
+            del results["source_id"]
         return results

     @staticmethod
-    def get_build_up_area(quadkey, datasource, database, database_crs_number, table_config):
+    def get_build_up_area(
+        quadkey, datasource, database, database_crs_number, table_config, buffer_magnitude
+    ):
         """Run the complete processing of a quadkey and returns a dictionary
         created with TileProcessor.build_dictionary.
@@ -310,7 +312,7 @@ class TileProcessor:
             quadkey (str): Quadkey code associated with a Bing quadtree tile.
             datasource (datasource.DataSource): DataSource instance with crs,
-                pathname, method_id and raster_files_index attributes.
+                pathname, source_id and raster_files_index attributes.
             database (database.Database): Database instance with credentials
                 and connection ready to perform data queries.
@@ -319,6 +321,9 @@ class TileProcessor:
             table_config (dict): Dictionary with table name, schema and geometry_field.
                 This is part of the config.yml file.
+            buffer_magnitude (float): Numeric magnitude for the polygon buffer (units are
+                equal to the coordinate system units)

         Returns:
             results (dictionary): Dictionary with built-up area information.
         """
@@ -342,7 +347,7 @@ class TileProcessor:
             tile=tile, crs_number=database_crs_number, **table_config
         )
         roads_processed = TileProcessor.process_dataframe_with_tile(
-            roads_in_tile, tile=tile, buffer_magnitude=3.0
+            roads_in_tile, tile=tile, buffer_magnitude=buffer_magnitude
        )
         refined_built_area = TileProcessor.polygon_difference(
             clip_built_geometry, roads_processed
...
@@ -33,7 +33,7 @@ def test_init():
         pathname=os.environ["TEST_BASE_PATH"],
         raster_files_index="GHS_TEST_INDEX.shp",
         built_pixel_values=[6, 5, 4, 3],  # Built pixels in GHSL
-        method_id=1,
+        source_id=1,
     )
     assert datasource.crs == "epsg:3857"
...
@@ -35,7 +35,7 @@ datasource = DataSource(
     pathname=os.environ["TEST_BASE_PATH"],
     raster_files_index="GHS_TEST_INDEX.shp",
     built_pixel_values=[6, 5, 4, 3],
-    method_id=1,
+    source_id=1,
 )
 tile = Tile("122100200320321022", datasource.crs)
...