Commit ea631acd authored by Nicolas Garcia Ospina's avatar Nicolas Garcia Ospina
Browse files

Included the DataSource class

parent 981cadf4
Pipeline #19109 passed with stage
in 3 minutes and 12 seconds
......@@ -3,6 +3,7 @@ image: debian:bullseye-slim
# Make pip cache the installed dependencies
variables:
PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"
OBMGAPANALYSIS_BASE_PATH: "$CI_PROJECT_DIR/tests/data"
cache:
paths:
- .cache/pip
......
# Introduction
The [OpenBuildingMap](https://www.openbuildingmap.org/) (OBM) gap analysis program evaluates
the completeness of buildings in OpenStreetMap. It compares the existing buildings with global
high-resolution open-data settlement datasets as well as detailed remote sensing data with a
local scope and categorizes the status on
[Quadtree-based](https://www.sciencedirect.com/topics/engineering/quadtrees) tiles with a level-18 resolution.
# Copyright and Copyleft
Copyright (C) 2021
Helmholtz-Zentrum Potsdam Deutsches GeoForschungsZentrum GFZ
This program is free software: you can redistribute it and/or modify it under the terms of the
GNU Affero General Public License as published by the Free Software Foundation, either version
3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Affero General Public License for more details.
See further details in `README.md` and `LICENSE`.
# Input Datasets
The OpenBuildingMap (OBM) gap analysis makes use of geocoded raster files, which describe the
shape of a settlement by assigning pixel values to different status (e.g. 1=built and 0=non-built).
There are currently some models based on remote sensing observations that depict the settlement
distributions among the world (e.g. [GHSL-BUILT](https://ghsl.jrc.ec.europa.eu/download.php)
available in 30m resolution or [WSF-2015](https://www.esa.int/Applications/Observing_the_Earth/Mapping_our_global_human_footprint) available in 30m resolution).
have different ways to describe the surface but can be reduced to a binary solution, in this
document a short description of the datasets can be found. For more information, check the datasets
official sites.
# Data structure
This program makes use of the original data structure from the [GHSL-BUILT](https://ghsl.jrc.ec.europa.eu/download.php)
dataset. This structure can also be applied to any other group of rasters in order to use custom
datasets. The structure has two main components to follow (`file paths` and `raster_files_index`).
## File paths
The program searches for an environment variable called `OBMGAPANALYSIS_BASE_PATH`, which is the path
where the different datasets are stored and should follow this structure.
OBMGAPANALYSIS_BASE_PATH
+-- dataset_1_raster_files_index.shp
+-- dataset_1_directory
| +-- dataset_subdirectory_1
| +-- dataset_subdirectory_2
| +-- dataset_subdirectory_3
| +-- raster_1.tif
| +-- raster_2.tif
| +-- ...
| +-- raster_n.tif
| +-- ...
| +-- dataset_subdirectory_n
+-- ...
+-- dataset_n_raster_files_index.gpkg
+-- dataset_n_directory
The naming of the directories, subdirectories and raster tiles is arbitrary as long as it is well
established in the `raster index` files.
## raster_files_index
The `raster_files_index` files are considered lightweight guides to navigate through the entire dataset
without the need to load all the raster tiles. This eases the job of finding the raster(s) intersecting
a single quadtree tile. these files are mainly georeferenced vector files (e.g.
[shapefile](https://desktop.arcgis.com/en/arcmap/10.3/manage-data/shapefiles/what-is-a-shapefile.htm),
[geopackage](https://www.geopackage.org/), or any other supported by
[`geopandas.read_file()`](https://geopandas.org/reference/geopandas.read_file.html).
The `raster_files_index` have the following minimum structure
(example for `dataset_1_raster_files_index.shp`):
| location | geometry |
|:------:|:------:|
| dataset_1_directory/dataset_subdirectory_3/raster_1.tif | POLIGON((... ... , ... ... , ... ... , ... ... , ... ...)) |
| dataset_1_directory/dataset_subdirectory_3/raster_2.tif | POLIGON((... ... , ... ... , ... ... , ... ... , ... ...)) |
| ... | POLIGON((... ... , ... ... , ... ... , ... ... , ... ...)) |
| dataset_n_directory/dataset_subdirectory_n/raster_n.tif | POLIGON((... ... , ... ... , ... ... , ... ... , ... ...)) |
`location.dtype = str` relative path of the raster tile
`geometry.dtype = <shapely.geometry.polygon.Polygon>` Bounding box or footprint of the raster tile
# GHSL source dataset
- GHSL: The GHSL-BUILT multitemporal dataset available in 30m resolution at:
https://ghsl.jrc.ec.europa.eu/download.php
Size: ca. 12 Gb
GHS_BUILT_LDSMT_GLOBE_R2018A_3857_30_V2_0.shp (raster_files_index)
#!/usr/bin/env python3
# Copyright (c) 2021:
# Helmholtz-Zentrum Potsdam Deutsches GeoForschungsZentrum GFZ
#
# This program is free software: you can redistribute it and/or modify it
# under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or (at
# your option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero
# General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see http://www.gnu.org/licenses/.
import logging
import os
import geopandas
# Initialize log
logger = logging.getLogger(__name__)
class DataSource:
"""A DataSource holds a raster-based dataset that can be used to estimate built-up areas
in the quadtree environment.
Input dataset:
The source of information are sets of raster data and a raster index file that includes
filepaths and raster footprints. These sources can either be:
Args:
crs (str): Coordinate system associated with the DataSource, given as a EPSG code.
pathname (str): Pathname where the dataset and explanatory dataframe are located.
raster_files_index (str): Filename of the raster index.
Attributes:
self.crs (str): Coordinate system associated with the DataSource, given as a EPSG code.
self.pathname (str): Pathname where the dataset and explanatory dataframe are located.
self.raster_files_index (geopandas.geodataframe.GeoDataFrame):
GeoPandas dataframe with raster relative filepaths and respective geometries.
"""
def __init__(self, crs, pathname, raster_files_index):
self.crs = crs
self.pathname = pathname
self.raster_files_index = geopandas.read_file(
os.path.join(pathname, raster_files_index)
)
......@@ -18,7 +18,10 @@
import logging
import os
import sys
from obmgapanalysis.datasource import DataSource
from obmgapanalysis.tile import Tile
# Add a logger printing error, warning, info and debug messages to the screen
logger = logging.getLogger()
......@@ -27,10 +30,39 @@ logger.addHandler(logging.StreamHandler(sys.stdout))
def main():
# Check for OBMGAPANALYSIS_BASE_PATH environment variable
# and create it if necessary.
if "OBMGAPANALYSIS_BASE_PATH" not in os.environ:
logging.debug(
"OBMGAPANALYSIS_BASE_PATH is not a system variable \n "
"Create variable OBMGAPANALYSIS_BASE_PATH:"
)
os.environ["OBMGAPANALYSIS_BASE_PATH"] = str(input())
pass
else:
logger.info("OBMGAPANALYSIS_BASE_PATH exists")
# Read the OBMGAPANALYSIS_BASE_PATH environment variable
OBMGAPANALYSIS_BASE_PATH = os.environ["OBMGAPANALYSIS_BASE_PATH"]
# Example logging output
logger.info("obmgapanalysis has started")
# If valid instantiate the GHSL dataset as a DataSource class
if "GHSL":
datasource = DataSource(
crs="epsg:3857",
pathname=OBMGAPANALYSIS_BASE_PATH,
raster_files_index="GHS_BUILT_LDSMT_GLOBE_R2018A_3857_30_V2_0.shp",
)
logger.info(datasource.raster_files_index.head(5))
logger.info(type(datasource))
# create a tile with the same crs as the DataSource
tile = Tile("122100203023320202", datasource.crs)
logger.debug(str(tile.geometry))
logger.info(type(tile))
# Leave the program
sys.exit()
......
......@@ -51,7 +51,9 @@ class Tile:
Attributes:
self.quadkey (str): Quadkey code associated with a bing quadtree tile
self.crs (str): Coordinate system associated with the tile, given as a EPSG code.
self.geometry (shapely.geometry.polygon.Polygon): Shapely polygon of the tile
"""
......
......@@ -29,7 +29,14 @@ setup(
keywords="OpenBuildingMap, buildings, analysis",
license="AGPLv3+",
author="Helmholtz-Zentrum Potsdam Deutsches GeoForschungsZentrum GFZ",
install_requires=["pyproj==3.0.0.post1", "shapely", "babelgrid"],
install_requires=[
"pyproj==3.0.0.post1",
"shapely",
"babelgrid",
"fiona",
"rtree",
"geopandas",
],
extras_require={
"tests": tests_require,
"linters": linters_require,
......
UTF-8
\ No newline at end of file
PROJCS["WGS_1984_Web_Mercator_Auxiliary_Sphere",GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Mercator_Auxiliary_Sphere"],PARAMETER["False_Easting",0.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",0.0],PARAMETER["Standard_Parallel_1",0.0],PARAMETER["Auxiliary_Sphere_Type",0.0],UNIT["Meter",1.0]]
\ No newline at end of file
PROJCS["WGS 84 / Pseudo-Mercator",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Mercator_1SP"],PARAMETER["central_meridian",0],PARAMETER["scale_factor",1],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["X",EAST],AXIS["Y",NORTH],EXTENSION["PROJ4","+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +wktext +no_defs"],AUTHORITY["EPSG","3857"]]
#!/usr/bin/env python3
# Copyright (c) 2021:
# Helmholtz-Zentrum Potsdam Deutsches GeoForschungsZentrum GFZ
#
# This program is free software: you can redistribute it and/or modify it
# under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or (at
# your option) any later version.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero
# General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see http://www.gnu.org/licenses/.
from obmgapanalysis.datasource import DataSource
import geopandas
import os
def test_DataSource_init():
datasource = DataSource(
crs="epsg:3857",
pathname=os.environ["OBMGAPANALYSIS_BASE_PATH"],
raster_files_index="GHS_TEST_INDEX.shp",
)
assert datasource.crs == "epsg:3857"
isinstance(datasource.raster_files_index, geopandas.GeoDataFrame)
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment