Commit 1e8082bf authored by Alison Beamish's avatar Alison Beamish
Browse files

Updated documentation based on feedback

parent 9856f7b5
Pipeline #22951 failed with stage
in 11 seconds
Sentinel-2 band number,2-blue,3-green,4-red,5-red edge 1,6-red edge 2,7-red edge 3,8-near infrared,11-shortwave infrared 1,12-shortwave infrared 2
Sentinel-2 band number,2,3,4,5,6,7,8,11,12
,,,,,,,,,
2018-03-03,1,2,3,4,5,6,7,8,9
2018-05-07,10,11,12,13,14,15,16,17,18
2018-06-06,19,20,21,22,23,24,25,26,27
......
......@@ -22,23 +22,26 @@ This manual introduces the Habitat Sampler (HaSa), an innovative tool that auton
Though development of HaSa has prioritized ease of use, this documentation assume a familiarity with the R software. The document is built successively and is intended to lead you step-by-step through the HaSa procedure of generating probability and classification maps. HaSa is still in development and any suggestions or improvements are welcomed and encouraged in our [GitHub Community Version](https://git.gfz-potsdam.de/habitat-sampler/HabitatSampler.git). If questions remain please don't hesitate to contact the authors of the package. For a detailed description of the Habitat Sampler and its applications, see [Neumann et al., (2020)](https://doi.org/10.1111/ddi.13165).
## 1.1 Usage
The tool is implemented in R and uses Leaflet [Cheng et al., 2019](https://rdrr.io/cran/leaflet/) to generate interactive maps in a web browser. There are no assumptions about the input image data and there are no constraints for the spectral-temporal-spatial domain in which the image is sampled. The tool requires the input of a priori expert user knowledge to generate reference data about expected surface classes which are delineated in the imagery or extracted from an external spectral library. The user has the choice between image classifiers [random forest](https://doi.org/10.1023/A:1010933404324) (RF) and [support vector](https://doi.org/10.1145/130385.130401) (SV).
The tool is implemented in R and uses Leaflet [(Cheng et al., 2019)](https://rdrr.io/cran/leaflet/) to generate interactive maps in a web browser. There are no assumptions about the input image data and there are no constraints for the spectral-temporal-spatial domain in which the image is sampled. The tool requires the input of a priori expert user knowledge to generate reference data about expected surface classes which are delineated in the imagery or extracted from an external spectral library. The user has the choice between image classifiers [random forest](https://doi.org/10.1023/A:1010933404324) (RF) and [support vector](https://doi.org/10.1145/130385.130401) (SV).
## 1.2 Sample datasets
The examples in this documentation use an L2A Sentinel-2 timeseries stack from 2018 (6 days, 9 bands each) and reference points that are included in the HaSa package. This Sentinel-2 data are from the Kyritz-Ruppiner Heide a former military training area north east of Berlin, Germany. The open heath-lands within the former military training area are designated protected areas under the European Natura 2000 network and are subject to management activities including tree removal, controlled burning and machine mowing. The reference data include 7 habitat classes identified by with a priori expert knowledge.
The examples in this documentation use an L2A Sentinel-2 timeseries stack from 2018 (6 days, 9 bands each) and reference points that are included in the HaSa package. This Sentinel-2 data are from the Kyritz-Ruppiner Heide a former military training area north east of Berlin, Germany. The open heathlands in the former military training area are designated protected areas under the European Natura 2000 network and are subject to management activities including tree removal, controlled burning and machine mowing. The reference data include 7 classes identified with a priori expert knowledge.
The Sentinel-2 data are downloaded and processed using the German Centre for Geosciences (GFZ) Time Series System for Sentinel-2 (GTS2). Data are made available and atmospherically corrected via a simple web application programming interface (API). Detailed information on the GTS2 system can be found [here](https://www.gfz-potsdam.de/gts2/). The metadata including the Senitnel-2 bands included and the band ID in the timeseries stack are provided below.
The Sentinel-2 data are downloaded and processed using the German Centre for Geosciences (GFZ) Time Series System for Sentinel-2 (GTS2). Data are made available and atmospherically corrected via a simple web API. Detailed information on the GTS2 system can be found [here](https://www.gfz-potsdam.de/gts2/). The metadata of the Senitnel-2 data including the band ID in the timeseries stack are provided below.
```{r S2 metadata, eval = TRUE, echo=FALSE}
library(tools)
install.packages("kableExtra")
library(kableExtra)
wd <- paste(tools::file_path_as_absolute("./"), "/", sep = "")
metadat <- read.csv(paste(wd, "Data/S2_stack_metadata.csv", sep = ""), header = T, sep = ",")
colnames(metadat) <- c("Sentinel-2 bands", "Band 2 - Blue", "Band 3 - Green", "Band 4 - Red", "Band 5 - Vegetation Red Edge 1", "Band 6 - Vegetation Red Edge 2", "Band 7 - Vegetation Red Edge 3", "Band 8 - NIR", "Band 11 - Short wave Infra-Red 1", "Band 12 - Short wave Infra-Red 2")
knitr::kable(metadat, caption = "Table 1. Sentinel-2 bands included in the timeseries stack and the correspoding band ID in the HaSa tool")
colnames(metadat) <- c("Sentinel-2 bands", "Band 2", "Band 3", "Band 4", "Band 5", "Band 6", "Band 7", "Band 8", "Band 11", "Band 12")
metadat[1,] <- c("","blue", "green", "red", "Vegetation Red Edge 1", "Vegetation Red Edge 2", "Vegetation Red Edge 3", "NIR", "SWIR 1", "SWIR 2")
knitr::kable(metadat, caption = "Table 1. Sentinel-2 bands included in the timeseries stack and the correspoding band ID in the HaSa tool") %>% kable_styling(font_size = 10)
```
Sample reference data are provided in two formats, as a data table and a point shapefile. The table includes spectral information from each class type desired for the classification and can be used directly to train the models (rows = class, columns = spectral wavebands). The first row must contain the spectral wavebands names and this must match the wavebands of the input satellite data.
Example reference data are provided in two formats, as a data table and a point shapefile. The table includes spectral information from each class type desired for the classification and can be used directly to train the models (rows = class, columns = spectral bands). The first row must contain the spectral bands names and this must match the bands of the input image data.
The point shapefile contains a point location per class and is used to extract the reference data. In the event the shapefile does not have the same projection as the input satellite data, it will be automatically reprojected. The class names must be passed as a vector in the same order as the reference spectra (rows = class).
The point shapefile contains a point location per class and is used to extract the reference data. In the event the shapefile does not have the same projection as the input image data, it will be automatically reprojected. The class names must be passed as a vector in the same order as the reference spectra (rows = class).
# 2 HaSa installation
The following procedure will lead you through the preliminary steps required to setup the HaSa tool.
......@@ -82,7 +85,7 @@ lapply(libraries, library, character.only = TRUE)
```
# 3 Load demo data
An important step preceding habitat classification is to load the Sentinel-2 satellite timeseries stack, reference data, and class names. `HaSa` provides a set of functions to guide the user through the data loading process.
An important step preceding classification is to load the Sentinel-2 satellite timeseries stack, reference data, and class names. `HaSa` provides a set of functions to guide the user through the data loading process.
## 3.1 Data directories
Before loading the input data and using `HaSa`, the user needs to define a series of directory paths. They are from where `HaSa` will read input data, and store intermediates and final results. These directory paths are relative to the working directory path, i.e., `wd`. The following code sets all the paths assuming that the root path is the current directory, i.e., the `demo` directory.
......@@ -102,8 +105,6 @@ raster::rasterOptions(tmpdir = "./RasterTmp/")
## 3.2 Satellite timeseries stack
The satellite time series is either passed as a **3.2.1** stack of images already clipped or **3.2.2** a stack of image to be clipped. In both cases, the input Satellite images needs to either have a valid projection or the projection be passed as parameter, i.e., `sat_crs_str = '+proj=utm +zone=32 +datum=WGS84 +units=m +no_defs'`, otherwise, the function will report error. Satellite time series data are available in `dataPath`.
### 3.2.1 - Clipped
The following example loads a Sentinel-2 timeseries stack clipped to the study area.
```{r handling image data clipped, eval = TRUE, results= "hide", message = FALSE}
satellite_series_path <- paste(dataPath,"SentinelStack_2018.tif",sep = "")
timeseries_stack <- HaSa::load_timeseries_stack(satellite_series_path)
......@@ -115,18 +116,6 @@ r = 19; g = 20; b = 21;
raster::plotRGB(timeseries_stack,r = r,g = g,b = b,stretch = "lin", axes = T)
```
### 3.2.2 - Not-clipped
The following example loads a Sentinel-2 timeseries stack without clipping it to the study area. `TODO`
```{r handling image data, eval = TRUE, results= "hide", message = FALSE}
satellite_series_path <- paste(dataPath,"SentinelStack_2018.tif", sep = "")
timeseries_stack <- HaSa::load_timeseries_stack(satellite_series_path)
```
```{r raster preview, eval = TRUE}
r = 19; g = 20; b = 21;
raster::plotRGB(timeseries_stack,r = r,g = g,b = b,stretch = "lin", axes = T)
```
## 3.3 Selecting reference samples
Reference samples include spectral information of the defined classes as chosen by the user. Reference data can be either passed as a table **3.3.1** or a shapefile **3.3.2**. Reference samples passed as a table can either be pre-extracted spectral information from an image or spectral information imported from a spectral library. Reference samples passed as a shapefile are point locations defined by the user on the satellite timeseries stack from where the spectral information are automatically extracted for each class. Sample reference data are available in `dataPath` in table and shapefile format.
......@@ -140,7 +129,12 @@ ref <- HaSa::load_reference_as_table(table_data_path)
Snapshot of the table of reference samples (rows = 7, columns = 54)
```{r ref table header, echo = FALSE, eval = TRUE}
ref[,c(1:3)]
ref <- read.table(table_data_path, header = T)
colnames(ref) <- c("SentinelStack_2018.1","SentinelStack_2018.2","SentinelStack_2018.3","...", "SentinelStack_2018.54")
rownames(ref) <- c("deciduous","coniferous","heather_young","heather_old",
"heather_shrub","bare_ground","xeric_grass")
ref[,4] <- "..."
knitr::kable(ref[,c(1:5)], caption = "Table 2. Reference data") %>% kable_styling(font_size = 10)
```
### 3.2.2 Reference points
......@@ -172,9 +166,9 @@ classNames <- c("deciduous","coniferous","heather_young","heather_old",
## 4.1 Calculating class probability
### 4.1.1 Plot configuration
The interactive mode of `HaSa` requires the user's expertise to define a threshold for habitat extraction. The user selects a threshold with the help of an interactive map. The interactive map includes an RGB composite of one of the Sentinel-2 scenes to assist in habitat extraction.
The interactive mode of `HaSa` requires the user's expertise to define a threshold for class extraction. The user selects a threshold with the help of an interactive map. The interactive map includes an RGB composite of one of the Sentinel-2 scenes to assist in class extraction.
The satellite timeseries stack (`SentinelStack_2018.tif`) loaded in **3.2.1** has 6 scenes and each scene includes the following bands:`Band 2 - Blue`, `Band 3 - Green`, `Band 4 - Red`, `Band 5 - Vegetation Red Edge 1`, `Band 6 - Vegetation Red Edge 2`, `Band 7 - Vegetation Red Edge 3`, `Band 8 - NIR`, `Band 11 - Short wave Infra-Red 1`, and `Band 12 - Short wave Infra-Red 2` (See Table 1). Using the clipped Sentinel-2 timeseries stack provided as input, the user can test which bands should be used in the plot using `HaSa::plot_configuration()`. The variable `plot_rgb` will later be used for the interactive procedure of habitat sampling.
The satellite timeseries stack (`SentinelStack_2018.tif`) loaded in **3.2.1** has 6 scenes and each scene includes the following bands:`Band 2 - Blue`, `Band 3 - Green`, `Band 4 - Red`, `Band 5 - Vegetation Red Edge 1`, `Band 6 - Vegetation Red Edge 2`, `Band 7 - Vegetation Red Edge 3`, `Band 8 - NIR`, `Band 11 - Short wave Infra-Red 1`, and `Band 12 - Short wave Infra-Red 2` (See Table 1). Using the clipped Sentinel-2 timeseries stack provided as input, the user can test which bands should be used in the plot using `HaSa::plot_configuration()`. The variable `plot_rgb` will later be used for the interactive procedure of class sampling.
```{r plot configuration, eval = TRUE, results='hide', message = FALSE, warning = FALSE, tidy = FALSE}
shp_path <- paste(dataPath,"Example_Reference_Points.shp", sep = "")
......@@ -194,13 +188,13 @@ Light grey indicates a low probability and forest green a high probability.
col <- colorRampPalette(c("lightgrey","orange","yellow","limegreen","forestgreen"))
```
### 4.1.3 Habitat Sampling
Habitat sampling input
### 4.1.3 Class Sampling
Class sampling input
```r
HaSa::multi_Class_Sampling(
in.raster = raster_stk, # clipped satellite time series stack [raster brick]
init.samples = 30, # starting number of spatial samples (recommended value: 30)
init.samples = 75, # starting number of spatial samples (recommended value: 75)
sample_type = "regular", # distribution of spatial samples ("random" or "regular";
# suggest: "regular") *See note 1
nb_models = 200, # number of models to collect (recommended value: 200)
......@@ -223,13 +217,13 @@ HaSa::multi_Class_Sampling(
step = 1, # at which step should the procedure start (see 2.b.1)
# (recommended value: 1 (from the beginning))
classNames = classNames, # vector with class names in the order of reference spectra
n_classes = 7, # total number of classes (habitat types) to be separated
n_classes = 7, # total number of classes to be separated
multiTest = 1, # number of test runs to compare different probability
# output *See note 4
RGB = c(19,20,21), # pallette colors for the interactive plots
overwrite = TRUE, # overwrite the KML and raster files from previous runs
save_runs = TRUE, # an Habitat object is saved into disk for each run (default TRUE)
parallel_mode = TRUE, # run loops using all available cores
save_runs = TRUE, # an class object is saved into disk for each run (default TRUE)
parallel_mode = TRUE, # run loops using all available cores *only possible on Linux machines
max_num_cores = 4, # maximum number of cores for parallelism (default 5)
plot_on_browser = FALSE # plot on the browser or inline in a notebook (default TRUE)
)
......@@ -257,22 +251,22 @@ An interactive map is plotted in a web browser (e.g., Firefox for Linux) contain
From this interactive map, the user has two choices:
1. Extract the habitat type on the basis of a threshold if the probability map meets the user's evaluation
1. Extract class type on the basis of a threshold if the probability map meets the user's evaluation
If the user chooses to extract the habitat, a user defined threshold is entered into the R console and the following files are saved:
If the user chooses to extract the class, a user defined threshold is entered into the R console and the following files are saved:
* HabitatSampler object (Run) - R Binary: The R object is used when the user wants to restart the computation at a specific step or reuse the seeds for sampling.
* probability map - *.kml, *.png, geocoded *.tif: Tiff contains all classes plotted, one class, one color. See example in the demo/Data/Results/HabitatMap_final.pdf
* threshold list - R Binary
* leaflet interactive web interface - *.html: LeafLet Map with the 3 RGB channels and the raster containing the probabilities. The file is re-written for each run
After habitat extraction is done the user can proceed automatically to the next habitat by entering 0 into the R console
After class extraction is done the user can proceed automatically to the next class by entering 0 into the R console
2. Sample the habitat type again to produce a better probability map. If after visual inspection, the user is not satisfied they can adjust the number of samples and models. It is recommended to first increase the number of models (+50).
2. Sample the class type again to produce a better probability map. If after visual inspection, the user is not satisfied they can adjust the number of samples and models. It is recommended to first increase the number of models (+50).
If the user chooses to adjust starting number of samples and number of models, the user must enter the sample/model adjustment (../..) into the R console and evaluate the interactive probability maps again.
The evaluation, thresholding or re-running, and export process is repeated for each habitat class.
The evaluation, thresholding or re-running, and export process is repeated for each class.
**Note** If convergence is not possible, i.e., no `models` can be selected or `init.samples` are not enough or another error occurs, it is possible to restart by re-running the `multi_Class_Sampling` function again with the following new argument values:
```{r eval = FALSE}
......@@ -287,10 +281,10 @@ The evaluation, thresholding or re-running, and export process is repeated for e
## 4.2 Generating classification map and summary statistics
Generate a classification map based on the saved output of the threshold probability maps
```r
plot_results <- function(
```{r eval = FALSE}
plot_results(
inPath, # input files (*.tif), equals outPath
color # vector of plot colors, one for each habitat type
color # vector of plot colors, one for each class type
)
```
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment