Commit 3cb68b1d authored by Nicolas Garcia Ospina's avatar Nicolas Garcia Ospina

Included build_dictionary and fixed main program

parent 8e7f8eb4
Pipeline #20861 passed with stage
in 4 minutes and 13 seconds
@@ -2,7 +2,7 @@ SOURCES=obmgapanalysis tests setup.py
LENGTH=96

check: $(SOURCES)
	flake8 --max-line-length=$(LENGTH) --ignore=E203,W503 $^
	black --check --line-length $(LENGTH) $^
	pylint -E $^
...
@@ -14,6 +14,7 @@ data such as Global Human Settlement Layer (GHSL) and World Settlement Footprint
### Python libraries

* `numpy`
* `pyproj>=3.0.0.post1`
* `shapely`
* `babelgrid`
...
datasource:
  crs: epsg:3857
  pathname: /your/path/to/datasource

tiles:
  use_txt: False
  tiles_list: ['122100200320321020', '122100200320321021', '122100200320321022', '122100200320321023']
  txt_filepath: tiles.txt

output_pathname: ./results
number_cores: 10
chunk_size: 1000

database:
  host: your_host.dir.request_data
  dbname: gis
  port: 5433
  username: postgres
  password: secret_pass
  roads_table:
    tablename: planet_osm_roads
    geometry_field: way
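The entries in `tiles_list` are Bing-style quadkeys, the tiling scheme handled by the `babelgrid` dependency. As an illustrative sketch (not part of the repository; the function name is hypothetical), a quadkey can be decoded into tile x/y/zoom coordinates like this:

```python
def quadkey_to_tile_xyz(quadkey):
    """Decode a Bing-style quadkey into (x, y, zoom) tile coordinates.

    Each character encodes one zoom level: bit 0 selects the x half and
    bit 1 the y half of the parent tile.
    """
    x = y = 0
    zoom = len(quadkey)
    for digit in quadkey:
        d = int(digit)
        if d not in (0, 1, 2, 3):
            raise ValueError("invalid quadkey digit: {}".format(digit))
        x = (x << 1) | (d & 1)
        y = (y << 1) | (d >> 1)
    return x, y, zoom

# The 18-character quadkeys in the sample config correspond to zoom level 18.
x, y, zoom = quadkey_to_tile_xyz("122100200320321020")
```

The quadkey length doubles as the zoom level, which is why all tiles in one `tiles_list` batch are usually the same length.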
@@ -5,11 +5,26 @@ shape of a settlement by assigning pixel values to different status (e.g. 1=buil
There are currently some models based on remote sensing observations that depict the settlement
distributions around the world (e.g. [GHSL-BUILT](https://ghsl.jrc.ec.europa.eu/download.php)
available in 30m resolution or [WSF-2015](https://www.esa.int/Applications/Observing_the_Earth/Mapping_our_global_human_footprint) available in 10m resolution).
These datasets have different ways to describe the surface but can be reduced to a binary solution; this
document contains a short description of the datasets. For more information, check the datasets'
official sites.
## Method_id
The input datasets can be instantiated with a `method_id` if desired. This is recommended if the aim is
to create a multi-source product. This option can be activated by setting the `method_id` argument to
any integer value. By assigning a `method_id`, the output can be queried by this ID. A multi-source
output may be organized like this:
| method_id | description |
|:------:|:------:|
| 1 | Built areas produced with GHSL data from 2014|
| 2 | Built areas produced with GHSL data from 1975|
| 3 | Built areas produced with WSF data |
| ... | ... |
| n | Built areas retrieved with n dataset |
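A multi-source output of this kind can be queried against a small registry mapping each `method_id` to its description. The sketch below is purely illustrative (the registry and helper are hypothetical, not part of the codebase); the IDs mirror the table above:

```python
# Hypothetical registry mirroring the table above; IDs are illustrative.
METHOD_REGISTRY = {
    1: "Built areas produced with GHSL data from 2014",
    2: "Built areas produced with GHSL data from 1975",
    3: "Built areas produced with WSF data",
}

def describe_method(method_id):
    """Return the human-readable description for a method_id."""
    return METHOD_REGISTRY.get(method_id, "Unknown method_id: {}".format(method_id))
```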
# Data structure
This program makes use of the original data structure from the [GHSL-BUILT](https://ghsl.jrc.ec.europa.eu/download.php)
...
@@ -38,18 +38,26 @@ class Database:
    includes filepaths and raster footprints. These sources can either be:
    Args:
        host (str): Postgres database host address.
        dbname (str): Postgres database name.
        port (str): Port where the Postgres database can be found.
        username (str): User to connect to the Postgres database.
        password (str or getpass.getpass): Password for the username argument
    Attributes:
        self.host (str): Postgres database host address.
        self.dbname (str): Postgres database name.
        self.port (str): Port where the Postgres database can be found.
        self.username (str): User to connect to the Postgres database.
        self.password (str or getpass.getpass): Password for the username argument
""" """
@@ -74,9 +82,6 @@ class Database:
            self.cursor (psycopg2.extensions.cursor):
                Cursor object to run commands on the database.
        """
        connection_string = "host={} dbname={} user={} password={} port={}".format(
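The line above assembles a libpq-style connection string for `psycopg2`. A minimal, self-contained sketch of the same pattern, using the placeholder values from the sample config (not real credentials):

```python
# Build the libpq-style connection string the same way the class does.
# The values come from the sample config and are placeholders, not real credentials.
db_config = {
    "host": "your_host.dir",
    "dbname": "gis",
    "port": 5433,
    "username": "postgres",
    "password": "secret_pass",
}
connection_string = "host={} dbname={} user={} password={} port={}".format(
    db_config["host"],
    db_config["dbname"],
    db_config["username"],
    db_config["password"],
    db_config["port"],
)
# connection_string is then handed to psycopg2.connect(...)
```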
@@ -94,7 +99,9 @@ class Database:
        Args:
            schema (str): Database schema where the database is located. Default = "public"
            tablename (str): Table name within database for searching (e.g. "obm_buildings")
            geometry_field (str): Name of the column with geometries. Default = "geometry"

        Returns:
@@ -116,11 +123,14 @@ class Database:
        Args:
            tile (tile.Tile): Tile object with quadkey, crs and geometry attributes.
            tablename (str): Table name within database for searching (e.g. "obm_buildings")
            crs_number (str): Number of the epsg/srid code for the specified table.
            schema (str): Database schema where the database is located. Default = "public"
            geometry_field (str): Name of the column with geometries. Default = "geometry"

        Returns:
            output_features (geopandas.geodataframe.GeoDataFrame): GeoPandas dataframe
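The kind of query `get_features_in_tile` issues can be sketched as a PostGIS intersection: transform the tile geometry into the table's SRID, then filter with `ST_Intersects`. The helper below is hypothetical and the exact SQL in the class may differ, but `ST_Intersects`, `ST_Transform` and `ST_GeomFromText` are standard PostGIS functions:

```python
def build_intersection_query(
    tablename, geometry_field, tile_wkt, tile_srid, table_srid, schema="public"
):
    """Compose a PostGIS intersection query of the kind get_features_in_tile
    could issue (illustrative sketch; the class's actual SQL may differ)."""
    return (
        "SELECT {geom} FROM {schema}.{table} "
        "WHERE ST_Intersects({geom}, "
        "ST_Transform(ST_GeomFromText('{wkt}', {tile_srid}), {table_srid}))"
    ).format(
        geom=geometry_field,
        schema=schema,
        table=tablename,
        wkt=tile_wkt,
        tile_srid=tile_srid,
        table_srid=table_srid,
    )

# Tile geometries are built in epsg:4326 here; planet_osm_roads is stored in epsg:3857.
query = build_intersection_query(
    "planet_osm_roads", "way", "POLYGON((0 0, 0 1, 1 1, 0 0))", 4326, 3857
)
```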
...
@@ -41,22 +41,27 @@ class DataSource:
        raster_files_index (str): Filename of the raster index.
        method_id (int): ID related to a dataset/method used (optional). Default = False

    Attributes:
        self.crs (str): Coordinate system associated with the DataSource, given as an EPSG code.
        self.pathname (str): Pathname where the dataset and explanatory dataframe are located.
        self.raster_files_index (geopandas.geodataframe.GeoDataFrame): GeoPandas dataframe
            with raster relative filepaths and respective geometries.
        self.built_pixel_values (list): List with pixel values that represent built areas.
            Hints on built_pixel_values are available at docs/02_Input_datasets.md
        self.method_id (int): ID related to the settlement dataset/method used (optional)
    """
    def __init__(self, crs, pathname, raster_files_index, built_pixel_values, method_id=False):
        self.crs = crs
        self.pathname = pathname
        self.raster_files_index = geopandas.read_file(
            os.path.join(pathname, raster_files_index)
        )
        self.built_pixel_values = built_pixel_values
        self.method_id = method_id
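A small design note on the `method_id=False` default: if an integer ID of `0` were ever valid, a plain truthiness test on `self.method_id` would treat it as unset, so a check against the `False` sentinel itself (or a `None` default) is safer. The stand-in class below is purely illustrative, not the repository's `DataSource`:

```python
# Stand-in illustrating the method_id=False default. The identity check
# against False means a hypothetical id of 0 still counts as "set";
# a plain truthiness test (if self.method_id:) would miss it.
class DataSourceStub:
    def __init__(self, method_id=False):
        self.method_id = method_id

    def has_method_id(self):
        return self.method_id is not False

plain = DataSourceStub()
tagged = DataSourceStub(method_id=2)
zero = DataSourceStub(method_id=0)
```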
#!/usr/bin/env python3
# Copyright (C) 2021:
# Helmholtz-Zentrum Potsdam Deutsches GeoForschungsZentrum GFZ
#
# This program is free software: you can redistribute it and/or modify it
# under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or (at
# your option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero
# General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see http://www.gnu.org/licenses/.
import os
import logging
import geopandas
# Initialize log
logger = logging.getLogger(__name__)
class FileProcessor:
    @staticmethod
    def write_tiles_to_csv(
        list_of_dictionaries, output_pathname, column_geometry="built_area", crs="epsg:4326"
    ):
        """Write a csv file from a list of dictionaries.

        Args:
            list_of_dictionaries (list): List of dictionaries with built-up areas to
                write.
            output_pathname (str): Target path name to write the csv file.
            column_geometry (str): Name of the field that contains geometries.
                Default = "built_area"
            crs (str): EPSG code of the data projection. Default = "epsg:4326"
        """
        tiles_gdf = geopandas.GeoDataFrame(
            list_of_dictionaries, geometry=column_geometry, crs=crs
        )
        filepath_out = os.path.join(
            output_pathname, "{}_{}.csv".format(tiles_gdf.tile_id.iloc[0], len(tiles_gdf.index))
        )
        logger.info("Creating {}".format(filepath_out))
        tiles_gdf.to_csv(filepath_out, index=False)
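The output filename combines the first tile ID in the batch with the number of rows written. The naming scheme can be reproduced with plain strings, without geopandas (the helper name below is illustrative):

```python
import os

def output_csv_path(output_pathname, tile_ids):
    """Reproduce write_tiles_to_csv's naming scheme:
    <first tile_id>_<number of tiles>.csv (sketch without geopandas)."""
    return os.path.join(
        output_pathname, "{}_{}.csv".format(tile_ids[0], len(tile_ids))
    )

path = output_csv_path("./results", ["122100200320321020", "122100200320321021"])
```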
@@ -21,119 +21,107 @@ import logging
import os
import sys
import yaml
import numpy
from obmgapanalysis.datasource import DataSource
from obmgapanalysis.tileprocessor import TileProcessor
from obmgapanalysis.fileprocessor import FileProcessor
from obmgapanalysis.database import Database
import multiprocessing
# Add a logger printing error, warning, info and debug messages to the screen
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.info("Launching obmgapanalysis")
# Get program configuration from config.yml
with open("config.yml", "r") as ymlfile:
    config = yaml.load(ymlfile, Loader=yaml.FullLoader)
db_config = config["database"]
datasource_config = config["datasource"]
tiles_config = config["tiles"]
output_pathname = os.path.abspath(config["output_pathname"])
if not os.path.exists(output_pathname):
    os.makedirs(output_pathname)
def multiprocess_chunk(quadkey_batch):
    """
    Wrapper function that writes a CSV file with built-up areas found in the
    quadkey_batch. The output is based on TileProcessor.build_dictionary.
    This wrapper function is built to work with the multiprocessing library.

    Output filenames are: <first quadkey>_<number of tiles>.csv

    Args:
        quadkey_batch (list): List of quadkeys to process with settlement data
    """
    # Create DataSource instance to be used by all child processes
    datasource = DataSource(**datasource_config)
    logger.info(
        "DataSource raster files are registered in {}".format(
            os.path.join(datasource.pathname, datasource_config["raster_files_index"])
        )
    )
    # Connect to the OBM database
    roads_database = Database(**db_config)
    roads_database.create_connection_and_cursor()
    roads_database_crs_number = roads_database.get_crs_from_geometry_field(
        **db_config["roads_table"]
    )
    logger.info(
        "Connection established to {} in {}".format(db_config["dbname"], db_config["host"])
    )
    # Build a list with all tiles with reported built-up areas
    build_up_areas = []
    try:
        for quadkey in quadkey_batch:
            result = TileProcessor.get_build_up_area(
                quadkey=quadkey,
                database=roads_database,
                datasource=datasource,
                database_crs_number=roads_database_crs_number,
                table_config=db_config["roads_table"],
            )
            if result is not None:
                build_up_areas.append(result)
        if build_up_areas:
            # Write output into a csv file
            FileProcessor.write_tiles_to_csv(build_up_areas, output_pathname)
    finally:
        roads_database.connection.close()
def main():
    # Read tiles to be processed
    if tiles_config["use_txt"]:
        tiles_list = list(numpy.loadtxt(tiles_config["txt_filepath"], dtype=str))
    else:
        tiles_list = tiles_config["tiles_list"]

    logger.info("{} tiles will be processed".format(len(tiles_list)))
    # Generate a parallel process pool with each quadkey batch and process
    num_processes = config["number_cores"]
    chunk_size = config["chunk_size"]
    quadkey_batchs = [
        tiles_list[i : i + chunk_size] for i in range(0, len(tiles_list), chunk_size)
    ]

    logging.info("Creating multiprocessing pool")
    with multiprocessing.Pool(processes=num_processes) as pool:
        logging.info("Start parallel processing")
        pool.map(multiprocess_chunk, quadkey_batchs)
        logging.info("Parallel processing finished, closing pool")
        pool.close()
        pool.join()
    # Leave the program
    logger.info("Task finished, closing obmgapanalysis")
    sys.exit()
...
@@ -18,6 +18,9 @@
import logging
import os
from datetime import date
import rasterio
from rasterio.mask import mask
from shapely.geometry import mapping
@@ -26,8 +29,7 @@ import pyproj
from shapely.ops import transform
from functools import partial
import geopandas
from obmgapanalysis.tile import Tile

# Initialize log
logger = logging.getLogger(__name__)
@@ -40,33 +42,19 @@ class TileProcessor:
        that intersect the given tile.geometry

        Args:
            datasource (datasource.DataSource): DataSource instance with crs,
                pathname and raster_files_index attributes.
            tile (tile.Tile): Tile object with quadkey, crs and geometry attributes
        Returns:
            index_list (list): List of the filepaths of the raster files that
                intersect with the Tile.geometry.
        """
        raster_files_index = datasource.raster_files_index.copy()
        raster_files_index = raster_files_index[raster_files_index.intersects(tile.geometry)]
        raster_files_index.loc[:, "location"] = raster_files_index["location"].apply(
            lambda x: os.path.join(datasource.pathname, x)
        )

        index_list = raster_files_index.location.tolist()
@@ -84,9 +72,9 @@ class TileProcessor:
        Returns:
            pixel_values (list of numpy.ndarray): List of arrays with pixel values that
                intersect with the tile.geometry
            pixel_georeferences (list of affine.Affine): List with affine transformation
                matrices of pixel_values (allows geocoding)
        """
        pixel_values = []
@@ -109,13 +97,13 @@ class TileProcessor:
        Args:
            pixel_values (list of numpy.ndarray): List of arrays with pixel values that
                intersect with the tile.geometry
            pixel_georeferences (list of affine.Affine): List with affine transformation
                matrices of pixel_values (allows geocoding)