Commit 5a0904bd authored by Daniel Scheffler

Revised documentation (needs to be further improved).


Signed-off-by: Daniel Scheffler <danschef@gfz-potsdam.de>
parent 53dba9d8
Pipeline #15811 passed with stage
in 9 minutes and 40 seconds
......@@ -44,7 +44,7 @@ test_gms_preprocessing:
# create the docs
- pip install -U sphinx_rtd_theme # Read-the-docs theme for SPHINX documentation
- pip install -U sphinx-autodoc-typehint
- pip install -U sphinx-autodoc-typehints
- make docs
artifacts:
paths:
......
=====
About
=====
The goal of the gms_preprocessing Python library is to provide a fully automatic
pre-processing pipeline for the spatial and spectral fusion (i.e., homogenization)
of multispectral satellite image data. It currently supports
Landsat-5, Landsat-7, Landsat-8, Sentinel-2A and Sentinel-2B.
* Free software: GNU General Public License v3 or later (GPLv3+) (`license details <https://gitext.gfz-potsdam.de/geomultisens/gms_preprocessing/blob/master/LICENSE>`_)
* Documentation: https://geomultisens.gitext-pages.gfz-potsdam.de/gms_preprocessing/doc/
* Code history: Release notes for the current and earlier versions of gms_preprocessing can be found `here <./HISTORY.rst>`_.
* OS compatibility: Linux
Feature overview
----------------
Level-1 processing:
^^^^^^^^^^^^^^^^^^^
* data import and metadata homogenization (compatibility: Landsat-5/7/8, Sentinel-2A/2B)
* equalization of acquisition and illumination geometry
* atmospheric correction (using `SICOR <https://gitext.gfz-potsdam.de/EnMAP/sicor>`_)
* correction of geometric errors (using `AROSICS <https://gitext.gfz-potsdam.de/danschef/arosics>`_)
Level-2 processing:
^^^^^^^^^^^^^^^^^^^
* spatial homogenization
* spectral homogenization (using `SpecHomo <https://gitext.gfz-potsdam.de/geomultisens/spechomo>`_)
* estimation of accuracy layers
=> application-oriented analysis dataset
......@@ -136,8 +136,8 @@ def setup(app):
# Add mappings for intersphinx extension (allows to link to the API reference of other sphinx documentations)
intersphinx_mapping = {
'geoarray': ('http://danschef.gitext.gfz-potsdam.de/geoarray/doc/', None),
'python': ('http://docs.python.org/3', None),
'geoarray': ('https://danschef.gitext-pages.gfz-potsdam.de/geoarray/doc/', None),
'python': ('https://docs.python.org/3', None),
}
......
.. include:: ../HISTORY.rst
History / Changelog
*******************
You can find the log of recent changes to the gms_preprocessing package
`here <https://gitext.gfz-potsdam.de/geomultisens/gms_preprocessing/-/blob/master/HISTORY.rst>`__.
Welcome to gms_preprocessing's documentation!
=============================================
Documentation of the gms_preprocessing package
==============================================
.. todo::
    This documentation is not yet complete but will be continuously updated in the future.
    If you find that topics are missing, feel free to suggest new entries here!
Contents:
.. toctree::
:maxdepth: 2
:maxdepth: 3
:caption: Contents:
readme
about
Source code repository <https://gitext.gfz-potsdam.de/geomultisens/gms_preprocessing>
installation
usage
contributing
......
.. highlight:: shell
============
Installation
============
Using Anaconda or Miniconda (recommended)
-----------------------------------------
Stable release
--------------
Using conda_ (latest version recommended), gms_preprocessing is installed as follows:
To install gms_preprocessing, run this command in your terminal:
.. code-block:: console
1. Create a virtual environment for gms_preprocessing (optional but recommended):
$ pip install gms_preprocessing
.. code-block:: bash
This is the preferred method to install gms_preprocessing, as it will always install the most recent stable release.
$ conda create -c conda-forge --name gms python=3
$ conda activate gms
If you don't have `pip`_ installed, this `Python installation guide`_ can guide
you through the process.
.. _pip: https://pip.pypa.io
.. _Python installation guide: http://docs.python-guide.org/en/latest/starting/installation/
2. Then install gms_preprocessing itself:
.. code-block:: bash
From sources
------------
$ conda install -c conda-forge gms_preprocessing
The sources for gms_preprocessing can be downloaded from the `Github repo`_.
You can either clone the public repository:
This is the preferred method to install gms_preprocessing, as it always installs the most recent stable release and
automatically resolves all the dependencies.
.. code-block:: console
$ git clone git://github.com/geomultisens/gms_preprocessing
Using pip (not recommended)
---------------------------
Or download the `tarball`_:
There is also a `pip`_ installer for gms_preprocessing. However, please note that gms_preprocessing depends on some
open-source packages that may cause problems when installed with pip. We therefore strongly recommend
resolving the following dependencies before running the pip installer (e.g., via conda, as sketched below the list):
.. code-block:: console
* gdal
* geopandas
* ipython
* matplotlib
* numpy
* pyhdf
* python-fmask
* pyproj
* scikit-image
* scikit-learn=0.23.2
* shapely
* scipy
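One way to resolve these dependencies beforehand is to install them from the conda-forge channel, as used above
(a sketch under the assumption that all listed packages are available there):

.. code-block:: bash

    $ conda install -c conda-forge gdal geopandas ipython matplotlib numpy pyhdf python-fmask pyproj \
        scikit-image scikit-learn=0.23.2 shapely scipy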
$ curl -OL https://gitext.gfz-potsdam.de/geomultisens/gms_preprocessing/repository/archive.tar.gz?ref=master
Then, the pip installer can be run by:
Once you have a copy of the source, you can install it with:
.. code-block:: bash
.. code-block:: console
$ pip install gms_preprocessing
$ python setup.py install
If you don't have `pip`_ installed, this `Python installation guide`_ can guide
you through the process.
.. note::
    The gms_preprocessing package has been tested with Python 3.4+. It should be fully compatible with all Python
    versions from 3.4 onwards.
.. _Github repo: https://gitext.gfz-potsdam.de/geomultisens/gms_preprocessing
.. _tarball: https://gitext.gfz-potsdam.de/geomultisens/gms_preprocessing/repository/archive.tar.gz?ref=master
.. _pip: https://pip.pypa.io
.. _Python installation guide: http://docs.python-guide.org/en/latest/starting/installation/
.. _conda: https://conda.io/docs
=====
Usage
=====
==================
Usage instructions
==================
To use gms_preprocessing in a project::
This section provides guidance on how to use gms_preprocessing
via the Python API and the command line interface.
import gms_preprocessing
Python API
**********
gms_preprocessing command line interface
****************************************
.. toctree::
:maxdepth: 4
usage/add_new_data_to_the_database.rst
usage/create_new_jobs.rst
usage/execute_jobs.rst
Command line interface
**********************
run_gms.py
----------
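A hedged example of executing an existing job via this script (the ``jobid`` subcommand and its usage are an
assumption here, not verified against the script's argument parser):

.. code-block:: bash

    $ python run_gms.py jobid 123456  # assumed subcommand; runs the job with the given database job ID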
......
Add new data manually
~~~~~~~~~~~~~~~~~~~~~
You can also add datasets to the local GeoMultiSens data storage that you previously downloaded yourself
(e.g., via EarthExplorer_ or the `Copernicus Open Access Hub`_).
As an example, the following code snippet imports two Landsat-7 scenes into the GeoMultiSens database:
.. code-block:: python
    from gms_preprocessing.misc.database_tools import add_externally_downloaded_data_to_GMSDB

    add_externally_downloaded_data_to_GMSDB(
        conn_DB=get_conn_database('geoms'),
        src_folder='/path/to/your/downloaded_data_directory/',
        filenames=['LE71510322000093SGS00.tar.gz',
                   'LE71910232012294ASN00.tar.gz'],
        satellite='Landsat-7',
        sensor='ETM+'
    )
However, this currently only works for Landsat legacy data or if the given filenames are already known to the
GeoMultiSens metadata database.
In all other cases, you have to:
1. copy the provider data archives to the GeoMultiSens data storage directory (choose the proper sub-directory
corresponding to the right sensor)
2. register the new datasets in the GeoMultiSens metadata database as follows:
.. code-block:: python
    from gms_preprocessing.misc.database_tools import update_records_in_postgreSQLdb

    entityids = ["LE70450322008300EDC00",
                 "LE70450322008284EDC01"]
    filenames = ["LE07_L1TP_045032_20081026_20160918_01_T1.tar.gz",
                 "LE07_L1TP_045032_20081010_20160918_01_T1.tar.gz"]

    # set the filename and mark each scene as 'DOWNLOADED' in the 'scenes' table
    for eN, fN in zip(entityids, filenames):
        update_records_in_postgreSQLdb(conn_params=get_conn_database('geoms'),
                                       tablename='scenes',
                                       vals2update_dict={'filename': fN,
                                                         'proc_level': 'DOWNLOADED'},
                                       cond_dict={'entityid': eN})
.. _EarthExplorer: https://earthexplorer.usgs.gov/
.. _`Copernicus Open Access Hub`: https://scihub.copernicus.eu/
.. _ref__add_new_data_to_the_database:
Add new data to the database
****************************
There are three ways to add new satellite data to the locally stored database: you can use the **WebUI**,
run the **data downloader** from the command line, or **add the data manually**.
In each case, two steps have to be carried out:
* the downloaded provider archive data need to be physically copied to the **data storage directory** on disk
* the respective metadata entries need to be added to the GeoMultiSens **metadata database**
.. hint::
    Regarding the metadata entry, the following conditions must be fulfilled for GeoMultiSens to recognize a dataset
    as properly added:

    * the **'scenes' table** of the GeoMultiSens metadata database **must contain a corresponding entry** in the
      first place (if the entry is missing, the database needs to be updated by the metadata crawler, which has to
      be done by the database administrator)
    * the **'filename' column** of the respective entry in the 'scenes' table must contain a **valid filename string**
    * the **'proc_level' column** of the respective entry in the 'scenes' table must at least be **'DOWNLOADED'**
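To check both conditions for a given scene, you can query the 'scenes' table directly. Below is a minimal sketch;
as in the other snippets, it assumes that get_conn_database('geoms') provides the database connection parameters:

.. code-block:: python

    from gms_preprocessing.misc.database_tools import get_info_from_postgreSQLdb

    # return the filename and processing level of the scene with the given entity ID
    res = get_info_from_postgreSQLdb(
        conn_params=get_conn_database('geoms'),
        tablename='scenes',
        vals2return=['filename', 'proc_level'],
        cond_dict={'entityid': ['LE70450322008300EDC00']}
    )
    print(res)  # e.g., [('LE07_L1TP_045032_20081026_20160918_01_T1.tar.gz', 'DOWNLOADED')]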
.. include:: ./using_the_data_downloader.rst
.. include:: ./add_new_data_manually.rst
.. _ref__create_new_jobs:
Create new jobs
***************
There are multiple ways to create new jobs, depending on the information you have at hand. The sections below give a
brief overview.
.. note::

    Only datasets that were previously added correctly to the local GeoMultiSens data storage can be used to create
    a new GeoMultiSens preprocessing job (see :ref:`ref__add_new_data_to_the_database`).
Create a job from a list of filenames
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The list of filenames refers to the filenames of the previously downloaded provider archive data.
.. code-block:: python
from gms_preprocessing.misc.database_tools import GMS_JOB
job = GMS_JOB(conn_db=get_conn_database('geoms'))
job.from_filenames(
list_filenames=['LE07_L1TP_045032_20081026_20160918_01_T1.tar.gz',
'LE07_L1TP_045032_20081010_20160918_01_T1.tar.gz'],
virtual_sensor_id=1,
comment='Two exemplary Landsat-7 scenes for application XY.')
# write the job into the GeoMultiSens metadata database
job.create()
.. code-block:: bash
OUT:
New job created successfully. job-ID: 26193017
The job contains:
- 2 Landsat-7 ETM+_SLC_OFF scenes
Create a job from a list of entity IDs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
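No dedicated example is given here yet. As a sketch, and assuming GMS_JOB offers a method for entity IDs analogous
to from_filenames (the method and parameter names from_entityIDlist/list_entityids are assumptions):

.. code-block:: python

    from gms_preprocessing.misc.database_tools import GMS_JOB

    job = GMS_JOB(conn_db=get_conn_database('geoms'))
    job.from_entityIDlist(  # assumed counterpart of from_filenames for entity IDs
        list_entityids=['LE70450322008300EDC00',
                        'LE70450322008284EDC01'],
        virtual_sensor_id=1,
        comment='Two exemplary Landsat-7 scenes.')
    job.create()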
Create a job from a list of scene IDs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
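Analogously, a sketch for database scene IDs (the method and parameter names from_sceneIDlist/list_sceneIDs are
assumptions; the scene IDs correspond to the query example further below):

.. code-block:: python

    job = GMS_JOB(conn_db=get_conn_database('geoms'))
    job.from_sceneIDlist(  # assumed counterpart of from_filenames for scene IDs
        list_sceneIDs=[13547246, 13552123],
        virtual_sensor_id=1,
        comment='Two exemplary Landsat-7 scenes.')
    job.create()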
Create a job from a dictionary
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Execute jobs
************
Once a job is created (see :ref:`ref__create_new_jobs`), it can be executed as follows:
.. code-block:: python
from gms_preprocessing import ProcessController
configuration = dict(
db_host='localhost',
CPUs=20
)
PC = ProcessController(jobID=123456, **configuration)
PC.run_all_processors()
This runs the job with ID 123456, using the parameters given in the configuration dictionary.
All available configuration parameters are documented in the default configuration file `options_default.json`_.
.. _`options_default.json`: https://gitext.gfz-potsdam.de/geomultisens/gms_preprocessing/blob/master/gms_preprocessing/options/options_default.json
Using the data downloader
~~~~~~~~~~~~~~~~~~~~~~~~~
The GeoMultiSens data downloader downloads the requested data and makes sure that the new dataset is properly added to
the local GeoMultiSens data storage directory as well as to the metadata database.
All you need to do is:
.. code-block:: bash
cd /opt/gms-modules # default installation path of gms-modules
bash gms-cli-frontend --download 13552123
This downloads the satellite provider archive that belongs to scene 13552123 of the GeoMultiSens metadata database.
When using the WebUI, these scene IDs are automatically passed to the downloader module. However, when running the
data downloader from the command line as shown above, you need to know the scene IDs of the scenes you want to
download.
To **find out these scene IDs**, you can query the GeoMultiSens metadata database as follows:
.. code-block:: python
    from gms_preprocessing.misc.database_tools import get_info_from_postgreSQLdb

    get_info_from_postgreSQLdb(
        conn_params=get_conn_database('geoms'),
        tablename='scenes',
        vals2return=['id'],
        cond_dict={'entityid': ['LE70450322008300EDC00',
                                'LE70450322008284EDC01']}
    )
This returns the scene IDs of two Landsat-7 scenes
with the entity IDs 'LE70450322008300EDC00' and 'LE70450322008284EDC01':
.. code-block::
OUT:
[(13547246,), (13552123,)]
......@@ -889,7 +889,7 @@ class AtmCorr(object):
self.ac_input = dict(
rs_image=rs_image,
options=self.options, # type: dict
options=self.options, # dict
logger=repr(self.logger), # only a string
script=script
)
......
......@@ -94,7 +94,7 @@ class ExceptionHandler(object):
@functools.wraps(GMS_mapper) # needed to avoid pickling errors
def wrapped_GMS_mapper(GMS_objs, **kwargs):
# type: (Union[List[GMS_object], GMS_object, collections.OrderedDict, failed_GMS_object], dict) -> any
# type: (Union[List[GMS_object], GMS_object, collections.OrderedDict, failed_GMS_object], dict) -> Union[GMS_object, List[GMS_object], failed_GMS_object] # noqa
"""
:param GMS_objs: one OR multiple instances of GMS_object or one instance of failed_GMS_object
......@@ -121,14 +121,14 @@ class ExceptionHandler(object):
# GMS_mapper inputs CONTAIN failed_GMS_objects -> log and return mapper inputs as received
else:
GMS_obj = self.get_sample_GMS_obj(self.GMS_objs) # type: failed_GMS_object
GMS_obj = self.get_sample_GMS_obj(self.GMS_objs) # failed_GMS_object
# FIXME in case self.GMS_objs is a list and the failed object is not at first position
# FIXME GMS_obj.failedMapper will not work
print("Scene %s (entity ID %s) skipped %s due to an unexpected exception in %s."
% (GMS_obj.scene_ID, GMS_obj.entity_ID, self.GMS_mapper_name,
GMS_obj.failedMapper)) # TODO should be logged by PC.logger
return self.GMS_objs # type: Union[GMS_object, List[GMS_object], failed_GMS_object]
return self.GMS_objs # Union[GMS_object, List[GMS_object]]
except OSError:
_, exc_val, _ = self.exc_details
......@@ -145,13 +145,13 @@ class ExceptionHandler(object):
elif CFG.disable_exception_handler:
raise
else:
return self.handle_failed() # type: failed_GMS_object
return self.handle_failed() # failed_GMS_object
except Exception:
if CFG.disable_exception_handler:
raise
else:
return self.handle_failed() # type: failed_GMS_object
return self.handle_failed() # failed_GMS_object
return wrapped_GMS_mapper
......
......@@ -2372,7 +2372,7 @@ def update_proc_status(GMS_mapper):
@functools.wraps(GMS_mapper) # needed to avoid pickling errors
def wrapped_GMS_mapper(GMS_objs, **kwargs):
# type: (Union[List[GMS_object], GMS_object, OrderedDict, failed_GMS_object], dict) -> any
# type: (Union[List[GMS_object], GMS_object, OrderedDict, failed_GMS_object], dict) -> Union[GMS_object, List[GMS_object]] # noqa
# noinspection PyBroadException
try:
......@@ -2415,7 +2415,7 @@ def update_proc_status(GMS_mapper):
raise
return GMS_objs # type: Union[GMS_object, List[GMS_object]]
return GMS_objs # Union[GMS_object, List[GMS_object]]
return wrapped_GMS_mapper
......
......@@ -59,6 +59,6 @@ dependencies:
- nose2
- nose-htmloutput
- rednose
- sphinx-autodoc-typehint
- sphinx-autodoc-typehints
- sphinx-argparse
- sphinx_rtd_theme