# Database File of Manually Classified Sentinel-2A Data

This repository contains a database of manually labeled [Sentinel-2A](http://www.esa.int/Our_Activities/Observing_the_Earth/Copernicus/Sentinel-2) spectra which were used in the paper: [Hollstein, A.; Segl, K.; Guanter, L.; Brell, M.; Enesco, M.    Ready-to-Use Methods for the Detection of Clouds, Cirrus, Snow, Shadow, Water and Clear Sky Pixels in Sentinel-2 MSI Images. Remote Sens. 2016, 8, 666.](http://www.mdpi.com/2072-4292/8/8/666).

The data itself and some associated metadata are stored in an [HDF5](https://www.hdfgroup.org/HDF5/) file which can be downloaded here: 

<https://gitext.gfz-potsdam.de/hollstei/sentinel2_manual_classification_clouds/raw/master/20160321_s2_manual_classification_data.h5>

The first dimensions of **dates**, **spectra**, and **classes** are aligned, so the class selected for each spectrum can be retrieved by index. The mapping between **class_ids** and **class_names** is given in additional attributes.
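
As a minimal sketch of this alignment (not part of the original repository), the snippet below pairs **class_ids** with **class_names** and looks up the class name for one spectrum index. It assumes the values have already been read into Python sequences, for instance with h5py, and that names may arrive as byte strings. The helper names `build_class_lookup` and `class_name_at` are hypothetical, and the ids and names in the example are made up.

```python
def build_class_lookup(class_ids, class_names):
    """Pair each class id with its name (assumes both sequences share one order)."""
    lookup = {}
    for class_id, name in zip(class_ids, class_names):
        # HDF5 strings are often returned as bytes; decode them for readability.
        if isinstance(name, bytes):
            name = name.decode("utf-8")
        lookup[int(class_id)] = str(name)
    return lookup


def class_name_at(index, classes, lookup):
    """Return the class name of the spectrum stored at position `index`."""
    return lookup[int(classes[index])]


# Made-up values for illustration; the real ids and names come from the HDF5 file.
example_lookup = build_class_lookup([10, 20], [b"cloud", b"water"])
example_classes = [20, 10, 10]
print(class_name_at(0, example_classes, example_lookup))
```

Because the first dimensions are aligned, the same index can be used to fetch the matching entry from **spectra** or **dates**.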

The figure below shows the layout of the file and some sample data:


![hdf5 file](fig/screenshot_hdfview.png)

A technical note on how the data was produced is currently under preparation.

## How the data was produced

### 1. Data Collection

Sentinel-2 data is openly available for download from the [Scientific Data Hub](https://scihub.copernicus.eu/dhus). Each product covers a 290 km wide swath and is divided into 100 km x 100 km granules in UTM/WGS84 projection. The product name includes the sensing and creation dates as well as the relative orbit number of the image.

The following image shows the division into granules of the product **S2A_OPER_PRD_MSIL1C_PDMC_20151211T153317_R021_V20151211T084342_20151211T084342.SAFE**:

![granules](fig/screenshot_granules.jpg)
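
The dates and orbit number can be pulled out of a product name with a short regular expression. This is a sketch based only on the shape of the example name above, not an official parser: it assumes the first timestamp is the creation date, `R021` is the relative orbit, and the two timestamps after `V` are the sensing start and stop. The function name `parse_product_name` is hypothetical.

```python
import re
from datetime import datetime

# Pattern for names shaped like the example above (assumed layout).
PRODUCT_RE = re.compile(
    r"_(?P<created>\d{8}T\d{6})"      # creation date
    r"_R(?P<orbit>\d{3})"             # relative orbit number
    r"_V(?P<start>\d{8}T\d{6})"       # sensing start
    r"_(?P<stop>\d{8}T\d{6})\.SAFE$"  # sensing stop
)


def parse_product_name(name):
    """Extract creation date, relative orbit and sensing times from a product name."""
    match = PRODUCT_RE.search(name)
    if match is None:
        raise ValueError(f"unrecognized product name: {name}")
    fmt = "%Y%m%dT%H%M%S"
    return {
        "created": datetime.strptime(match["created"], fmt),
        "relative_orbit": int(match["orbit"]),
        "sensing_start": datetime.strptime(match["start"], fmt),
        "sensing_stop": datetime.strptime(match["stop"], fmt),
    }


info = parse_product_name(
    "S2A_OPER_PRD_MSIL1C_PDMC_20151211T153317_R021_"
    "V20151211T084342_20151211T084342.SAFE"
)
print(info["relative_orbit"], info["sensing_start"].isoformat())
# prints: 21 2015-12-11T08:43:42
```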

The Sentinel-2 Multi-Spectral Instrument (MSI) samples 13 spectral bands, as illustrated below:

![bands](fig/1S2bands.jpg)

To build a spatially varied and representative dataset, the downloaded images cover a wide range of regions from all over the world.

| ![bands](fig/1S2bands.jpg) | ![granules](fig/screenshot_granules.jpg) |
|---|---|
| S2 spectral bands | Granules |

### 2. Data Classification

Using several spectral tools, pixels from each granule are selected manually and assigned to one of the following six classes:

| **Class** | **Coverage** |
| :-------: | ------------ |
| cloud | opaque clouds |
| cirrus | cirrus and vapor trails |
| snow | snow and ice |
| shadow | shadows from clouds, cirrus, mountains, buildings, etc. |
| water | lakes, rivers, seas |
| clear-sky | remaining: crops, mountains, urban, etc. |

The spectral tools include *false-color composites*, *image enhancements* and *graphical visualization of spectra*. The aim is for each class to be highly heterogeneous and for the classes to contain a balanced number of pixels.
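
One way to check how balanced the classes are is to count the labels per class id. The sketch below is illustrative only: it assumes the **classes** table has been read into a flat sequence of integer ids, the function name `class_shares` is hypothetical, and the labels in the example are made up.

```python
from collections import Counter


def class_shares(classes):
    """Return the fraction of labeled pixels that falls into each class id."""
    counts = Counter(int(c) for c in classes)
    total = sum(counts.values())
    return {class_id: n / total for class_id, n in sorted(counts.items())}


# Made-up labels for illustration only.
shares = class_shares([10, 10, 20, 30, 30, 30, 20, 10])
for class_id, share in shares.items():
    print(class_id, f"{share:.2f}")
```

For the full dataset, which holds several million pixels, `numpy.unique(classes, return_counts=True)` would likely be a faster way to obtain the same counts.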

The following figure shows how false-color composites help to distinguish snow.

![marokko](fig/screenshot_marokko.png)

This RGB display of the Atlas Mountains in Morocco uses bands 12/7/3. Snow pixels appear blue, whereas cloud pixels appear pinkish orange.
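
To make the band-to-channel mapping concrete, the sketch below turns a single 13-band spectrum into an RGB triple using bands 12/7/3. It is a simplified illustration, not the tool used for the figure: it assumes the spectrum is ordered as in `BAND_ORDER`, that values are reflectances between 0 and 1, and that a plain linear stretch is good enough. `BAND_ORDER` and `false_color` are hypothetical names; the actual band labels are stored in the **band** table of the file.

```python
# Assumed band order; check it against the `band` table before relying on it.
BAND_ORDER = ["B01", "B02", "B03", "B04", "B05", "B06", "B07",
              "B08", "B8A", "B09", "B10", "B11", "B12"]


def false_color(spectrum, rgb_bands=("B12", "B07", "B03"), lo=0.0, hi=1.0):
    """Map one 13-band spectrum to an (R, G, B) triple in the range 0..255."""
    channels = []
    for label in rgb_bands:
        value = spectrum[BAND_ORDER.index(label)]
        # Linear stretch from [lo, hi] to [0, 1], clipped at both ends.
        scaled = min(1.0, max(0.0, (value - lo) / (hi - lo)))
        channels.append(round(scaled * 255))
    return tuple(channels)


# Made-up spectrum: only the three displayed bands are set.
demo = [0.0] * 13
demo[BAND_ORDER.index("B12")] = 0.2
demo[BAND_ORDER.index("B07")] = 0.8
demo[BAND_ORDER.index("B03")] = 0.4
print(false_color(demo))
```

Passing a different `rgb_bands` tuple, for example `("B04", "B03", "B02")`, gives the other composites mentioned in this section.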

The next figure illustrates how pixels for some of the classes were selected.

![fiji](fig/screenshot_fiji.jpg)

This image of the Fiji coastline is displayed as two different composites, using bands 4/3/2 and bands 8a/3/2. Colored polygons mark four different classes: cyan, yellow, dark blue and green correspond to water, shadow, cloud and clear-sky pixels, respectively.

## Dataset

Our dataset consists of a total of N = 5,647,725 pixels. Pixel information is stored in several tables in the HDF5 file.

*Relative to the Sentinel-2 spatial and spectral resolutions:*

- **band** associates each band position with its label
- further band descriptions can be found in **bandwidth_nm**, **central_wavelength_nm** and **spatial_sampling_m**

*Relative to the classes:*

- **classes** (1xN table) holds the class id assigned to each pixel in the dataset
- **class_ids** lists the id of each class named in **class_names**

*Relative to the spectra:*

- **spectra** (13xN table) holds the spectral values of each pixel, one value for each of the 13 spectral bands sampled by the Sentinel-2 instrument

*Relative to the image metadata:*

- **latitude** and **longitude** hold the pixel coordinates
- each pixel belongs to a granule, identified by **granule_id**; several granules make up an image, identified by **product_id**
- pixels from the same product share the sensing date (**date**), four illumination and viewing angles (**sun_azimuth_angle**, **sun_zenith_angle**, **viewing_azimuth_angle**, **viewing_zenith_angle**) and the geographical location (**continent** and **country**)
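
The tables listed above can be inspected with the third-party h5py package. The following is a minimal sketch rather than a reference loader: it assumes the tables are stored as datasets at the top level of the file, and the helper names `summarize` and `summarize_file` are hypothetical.

```python
def summarize(group):
    """List (name, shape, dtype) for every dataset directly inside an HDF5 group."""
    rows = []
    for name, node in group.items():
        # Datasets expose shape and dtype; sub-groups do not and are skipped here.
        if hasattr(node, "shape") and hasattr(node, "dtype"):
            rows.append((name, tuple(node.shape), str(node.dtype)))
    return sorted(rows)


def summarize_file(path):
    """Open the file read-only with h5py and summarize its top level."""
    import h5py  # third-party; imported lazily so `summarize` works without it

    with h5py.File(path, "r") as handle:
        return summarize(handle)
```

With the file downloaded, `summarize_file("20160321_s2_manual_classification_data.h5")` should return one row per table, which makes it easy to confirm the shapes of **spectra** and **classes** before indexing into them.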