# Database File of Manually Classified Sentinel-2A Data

This repository contains a database of manually labeled [Sentinel-2A](http://www.esa.int/Our_Activities/Observing_the_Earth/Copernicus/Sentinel-2) spectra that were used in the paper: [Hollstein, A.; Segl, K.; Guanter, L.; Brell, M.; Enesco, M. Ready-to-Use Methods for the Detection of Clouds, Cirrus, Snow, Shadow, Water and Clear Sky Pixels in Sentinel-2 MSI Images. Remote Sens. 2016, 8, 666.](http://www.mdpi.com/2072-4292/8/8/666)

The data itself and some associated metadata are stored in an [HDF5](https://www.hdfgroup.org/HDF5/) file which can be downloaded here: 

<https://gitext.gfz-potsdam.de/hollstei/sentinel2_manual_classification_clouds/blob/e9dcb39b00967480c7ab4cac8ab44154df56c738/20160914_s2_manual_classification_data.h5>   
New Version (20170331):   
<https://gitext.gfz-potsdam.de/hollstei/sentinel2_manual_classification_clouds/blob/af33c34fc0972e24e7126d08275ccb8c70d66898/20170331_s2_manual_classification_data.h5>


The first dimensions of **dates**, **spectra**, and **classes** are aligned, so the class selected for each spectrum can be retrieved by index. The association of **class_ids** and **class_names** is given in additional attributes.
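
The lookup described above can be sketched in a few lines of Python. This is only an illustration: the helper names `build_class_lookup` and `label_spectra` are made up for this README, the numeric ids in the demo are hypothetical, and the functions take plain sequences so that they do not depend on how the ids and names are read from the file.

```python
def _text(value):
    """HDF5 strings often arrive as bytes; normalise them to str."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def build_class_lookup(class_ids, class_names):
    """Map each numeric class id to its human-readable class name."""
    if len(class_ids) != len(class_names):
        raise ValueError("class_ids and class_names must have the same length")
    return {int(i): _text(n) for i, n in zip(class_ids, class_names)}


def label_spectra(classes, lookup):
    """Return the class name for every entry of the aligned `classes` array."""
    return [lookup[int(c)] for c in classes]


# Hypothetical ids for illustration only; the real ids are stored in the file.
lookup = build_class_lookup([10, 20, 30], [b"clear-sky", b"cloud", b"water"])
print(label_spectra([20, 20, 10, 30], lookup))
```

Because the arrays are aligned, entry `i` of the returned list is the class name of spectrum `i`.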

The figure below shows the layout of the file and some sample data:


![hdf5 file](fig/h5file.png)

## How the data was produced

### 1. Data Collection

Open-access Sentinel-2 data is available for download on the [Scientific Data Hub](https://scihub.copernicus.eu/dhus). Each product consists of a 290 km wide image divided into 100 km by 100 km granules in UTM/WGS84 projection. The product name includes the sensing and creation dates, as well as the relative orbit number of the image.

The following image shows the division into granules of the product **S2A_OPER_PRD_MSIL1C_PDMC_20151211T153317_R021_V20151211T084342_20151211T084342.SAFE**:

![granules](fig/screenshot_granules.jpg)
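
The dates and orbit number encoded in a product name can be extracted with a regular expression. The sketch below is based only on the example name above and on our reading of it (first timestamp is the creation date, `R` is followed by the relative orbit, `V` by the start and stop of the sensing period); the function name `parse_product_name` and the dictionary keys are made up for illustration, and other naming schemes are not covered.

```python
import re
from datetime import datetime

# Pattern derived from the example product name; assumed, not an official spec.
PRODUCT_RE = re.compile(
    r"^(?P<mission>S2[A-Z])_(?P<file_class>[A-Z0-9]{4})_PRD_"
    r"(?P<product_type>[A-Z0-9]+)_(?P<site>[A-Z0-9]+)_"
    r"(?P<creation>\d{8}T\d{6})_R(?P<orbit>\d{3})_"
    r"V(?P<start>\d{8}T\d{6})_(?P<stop>\d{8}T\d{6})\.SAFE$"
)


def _stamp(text):
    """Convert a compact timestamp such as 20151211T153317 to a datetime."""
    return datetime.strptime(text, "%Y%m%dT%H%M%S")


def parse_product_name(name):
    """Split a product name into mission, dates and relative orbit number."""
    match = PRODUCT_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised product name: {name!r}")
    return {
        "mission": match["mission"],
        "product_type": match["product_type"],
        "creation": _stamp(match["creation"]),
        "relative_orbit": int(match["orbit"]),
        "sensing_start": _stamp(match["start"]),
        "sensing_stop": _stamp(match["stop"]),
    }


info = parse_product_name(
    "S2A_OPER_PRD_MSIL1C_PDMC_20151211T153317_R021_"
    "V20151211T084342_20151211T084342.SAFE"
)
print(info["relative_orbit"], info["sensing_start"].isoformat())
```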

To create a spatially varied and representative dataset, the downloaded images cover a wide variety of regions from all over the world.

### 2. Data Classification

Using different spectral tools, granule pixels are selected and classified into one of the following six classes:

| **Class** | **Coverage** |
| :-------: | ------------ |
| cloud | opaque clouds |
| cirrus | cirrus and vapor trails |
| snow | snow and ice |
| shadow | shadows from clouds, cirrus, mountains, buildings, etc. |
| water | lakes, rivers, seas |
| clear-sky | everything else: crops, mountains, urban areas, etc. |

Spectral tools include *false-color composites*, *image enhancements* and *graphical visualization of spectra*. Our aim is to create highly heterogeneous classes with a balanced number of pixels.
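
How balanced a set of labels is can be checked by counting the pixels per class. The following is a minimal sketch, not the procedure used to build the dataset: the helper names `class_balance` and `imbalance_ratio` and the toy ids are hypothetical.

```python
from collections import Counter


def class_balance(classes):
    """Return {class_id: share of all labelled pixels}, ordered by class id."""
    counts = Counter(int(c) for c in classes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labels given")
    return {cid: counts[cid] / total for cid in sorted(counts)}


def imbalance_ratio(classes):
    """Largest class size divided by the smallest; 1.0 means perfectly balanced."""
    counts = Counter(int(c) for c in classes)
    return max(counts.values()) / min(counts.values())


# Toy labels with hypothetical ids; the real ids come from the HDF5 file.
toy = [10, 10, 20, 20, 30, 30, 30, 30]
print(class_balance(toy))
print(imbalance_ratio(toy))
```

For the full label array, a vectorised count such as `numpy.unique(classes, return_counts=True)` will be considerably faster than iterating in Python.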

The figure below illustrates the benefit of false-color composites for distinguishing snow. For this RGB display of the Atlas Mountains in Morocco, bands 12/7/3 are selected. Snow pixels appear in blue, whereas cloud pixels appear in pinkish orange.

![marokko](fig/screenshot_marokko.png)
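
A band combination such as 12/7/3 simply assigns three of the 13 bands to the red, green and blue display channels. The sketch below does this for a single spectrum. It rests on several assumptions that should be checked against the file: the band labels are assumed to look like `B01` ... `B8A` ... `B12`, the values are assumed to lie roughly between `lo` and `hi` (0 and 1 by default), and the linear stretch is a generic choice, not necessarily the enhancement used for the figure. The function name `false_color` is made up for this example.

```python
def _norm(label):
    """Normalise a band label so that 'B8A', '8a' and b'8A' compare equal."""
    if isinstance(label, bytes):
        label = label.decode("utf-8")
    key = str(label).strip().upper()
    if key.startswith("B"):
        key = key[1:]
    return key.lstrip("0")


def false_color(spectrum, band_labels, rgb_bands=("12", "7", "3"), lo=0.0, hi=1.0):
    """Map three bands of one spectrum to an (R, G, B) tuple of 0..255 integers."""
    keys = [_norm(b) for b in band_labels]
    channels = []
    for band in rgb_bands:
        value = spectrum[keys.index(_norm(band))]
        scaled = (value - lo) / (hi - lo)  # linear stretch to 0..1
        channels.append(round(255 * min(1.0, max(0.0, scaled))))  # clip, then scale
    return tuple(channels)


# Assumed label layout and a made-up spectrum, for illustration only.
labels = ["B01", "B02", "B03", "B04", "B05", "B06", "B07",
          "B08", "B8A", "B09", "B10", "B11", "B12"]
spectrum = [0.0] * 13
spectrum[labels.index("B03")] = 1.0
spectrum[labels.index("B07")] = 0.8
spectrum[labels.index("B12")] = 0.2
print(false_color(spectrum, labels))
```

Passing `rgb_bands=("4", "3", "2")` or `rgb_bands=("8a", "3", "2")` selects other band combinations in the same way.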

The next figure illustrates the pixel classification. The Fiji coastline is displayed in two different false-color composites: (a) bands 4/3/2 and (b) bands 8a/3/2. Colored polygons represent four different classes: cyan, yellow, dark blue and green stand for water, shadow, cloud and clear-sky pixels, respectively.
![fiji](fig/screenshot_fiji.jpg)

The following graph shows four different spectral profiles from a Sentinel-2 image.
![](fig/screenshot_spectra.jpg)

## Dataset

Our dataset consists of a total of N = 5647725 pixels. Pixel information is stored in different tables in the HDF5 file.

*Relative to Sentinel-2 spatial and spectral resolutions:*

- **band** associates each band position with its label
- further band descriptions can be found in **bandwidth_nm**, **central_wavelength_nm** and **spatial_sampling_m**

*Relative to the classes:*

- **classes** (1xN table) contains the id of the class assigned to each pixel in the dataset
- **class_ids** lists the id associated with each class named in **class_names**

*Relative to the spectra:*

- **spectra** (13xN table) holds the spectral values of each pixel; the Sentinel-2 instrument samples 13 spectral bands

*Relative to the image metadata:*

- **latitude** and **longitude** give the pixel coordinates
- each pixel is located in a granule (**granule_id**), and several granules make up an image identified by a **product_id**
- pixels of the same product share the sensing date (**date**), four sun and viewing angles (**sun_azimuth_angle**, **sun_zenith_angle**, **viewing_azimuth_angle**, **viewing_zenith_angle**) and the geographical location (**continent** and **country**)
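
To see how these tables are actually laid out in a downloaded copy of the file, the contents can be listed with [h5py](https://www.h5py.org/). The following is a rough sketch under stated assumptions: the helper names `summarize`, `aligned_with` and `print_summary` are made up for this README, and the check for per-pixel tables only looks for an axis of length N on any dimension, since the axis order should be verified against the file itself.

```python
def summarize(node, prefix=""):
    """Collect (path, shape, dtype) for every dataset below a group-like node.

    Works on h5py.File and h5py.Group objects, which expose .items(), and on
    any stand-in with the same interface.
    """
    rows = []
    for name, item in node.items():
        path = f"{prefix}/{name}"
        if hasattr(item, "items"):  # group-like: descend
            rows.extend(summarize(item, path))
        else:  # dataset-like: record its shape and dtype
            rows.append((path, tuple(item.shape), str(item.dtype)))
    return sorted(rows)


def aligned_with(rows, n_pixels):
    """Return the paths of datasets that have an axis of length n_pixels."""
    return [path for path, shape, _ in rows if n_pixels in shape]


def print_summary(h5_path, n_pixels=5647725):
    """Print the layout of a local copy of the file (requires h5py)."""
    import h5py  # third-party; imported lazily so the helpers above need no extras

    with h5py.File(h5_path, "r") as f:
        rows = summarize(f)
    per_pixel = set(aligned_with(rows, n_pixels))
    for path, shape, dtype in rows:
        note = "per-pixel" if path in per_pixel else ""
        print(f"{path:30s} {str(shape):18s} {dtype:12s} {note}")
```

A call such as `print_summary("20170331_s2_manual_classification_data.h5")` then prints one line per table, assuming the file has been downloaded to the working directory.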