# Database File of Manually Classified Sentinel-2A Data

This repository contains a database of manually labeled [Sentinel-2A](http://www.esa.int/Our_Activities/Observing_the_Earth/Copernicus/Sentinel-2) spectra that were used in the paper [Hollstein, A.; Segl, K.; Guanter, L.; Brell, M.; Enesco, M. Ready-to-Use Methods for the Detection of Clouds, Cirrus, Snow, Shadow, Water and Clear Sky Pixels in Sentinel-2 MSI Images. Remote Sens. 2016, 8, 666.](http://www.mdpi.com/2072-4292/8/8/666).

The data itself and some associated metadata are stored in an [HDF5](https://www.hdfgroup.org/HDF5/) file which can be downloaded here: 

<https://gitext.gfz-potsdam.de/hollstei/sentinel2_manual_classification_clouds/raw/master/20160321_s2_manual_classification_data.h5>

The first dimensions of **dates**, **spectra**, and **classes** are aligned, so the class selected for each spectrum can be retrieved by index. The association between **class_ids** and **class_names** is given in additional attributes.
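
The alignment described above can be sketched in a few lines of Python. This is only an illustration on made-up values: the ids `10` and `20`, the byte-string names and the two-band spectra are assumptions, not contents of the file, and the sketch assumes that the first axis indexes pixels as stated above.

```python
def class_lookup(class_ids, class_names):
    """Map each numeric class id to its class name."""
    lookup = {}
    for cid, name in zip(class_ids, class_names):
        if isinstance(name, bytes):  # strings read from HDF5 may arrive as bytes
            name = name.decode("utf-8")
        lookup[int(cid)] = name
    return lookup


def spectra_of_class(spectra, classes, lookup, wanted):
    """Return the spectra whose aligned class entry carries the wanted name."""
    return [s for s, c in zip(spectra, classes) if lookup[int(c)] == wanted]


# Tiny synthetic stand-in for the real tables (all values are made up).
class_ids = [10, 20]
class_names = [b"cloud", b"water"]
classes = [10, 20, 10]                              # one class id per spectrum
spectra = [[0.9, 0.8], [0.1, 0.05], [0.7, 0.75]]    # one row per spectrum

lookup = class_lookup(class_ids, class_names)
cloud_spectra = spectra_of_class(spectra, classes, lookup, "cloud")
print(cloud_spectra)
```

With the real file, the same functions could be fed with the arrays read from **class_ids**, **class_names**, **classes** and **spectra**, provided they are stored as described here.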

The figure below shows the layout of the file and some sample data:


![hdf5 file](fig/h5file.png)

A technical note on how the data was produced is currently under preparation.

## How the data was produced

### 1. Data Collection

Openly available Sentinel-2 data can be downloaded from the [Scientific Data Hub](https://scihub.copernicus.eu/dhus). Each product consists of a 290 km wide image divided into 100 km granules in UTM/WGS84 projection. The product name includes the sensing and creation dates, as well as the relative orbit number of the image.

The following image shows the division into granules of the product **S2A_OPER_PRD_MSIL1C_PDMC_20151211T153317_R021_V20151211T084342_20151211T084342.SAFE**:

![granules](fig/screenshot_granules.jpg)
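
Since the product name carries the sensing and creation dates and the relative orbit number, these fields can be extracted with a regular expression. The sketch below is our reading of the example name above and not an official parser: we assume that the timestamp after `PDMC` is the creation date, that `R` is followed by the relative orbit, and that the two timestamps after `V` are the start and stop of sensing. The function and key names are illustrative.

```python
import re
from datetime import datetime

# Pattern derived from the example product name; the meaning of each
# field is an assumption, as explained in the text above.
PRODUCT_RE = re.compile(
    r"^(?P<mission>S2[A-Z])_OPER_PRD_(?P<level>[A-Z0-9]+)_PDMC_"
    r"(?P<created>\d{8}T\d{6})_R(?P<orbit>\d{3})_"
    r"V(?P<start>\d{8}T\d{6})_(?P<stop>\d{8}T\d{6})\.SAFE$"
)

TIME_FORMAT = "%Y%m%dT%H%M%S"


def parse_product_name(name):
    """Split a product name of the form shown above into its fields."""
    match = PRODUCT_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected product name: {name!r}")
    return {
        "mission": match["mission"],
        "level": match["level"],
        "created": datetime.strptime(match["created"], TIME_FORMAT),
        "relative_orbit": int(match["orbit"]),
        "sensing_start": datetime.strptime(match["start"], TIME_FORMAT),
        "sensing_stop": datetime.strptime(match["stop"], TIME_FORMAT),
    }


info = parse_product_name(
    "S2A_OPER_PRD_MSIL1C_PDMC_20151211T153317_R021_"
    "V20151211T084342_20151211T084342.SAFE"
)
print(info["relative_orbit"], info["sensing_start"].date())
```

Names that do not follow this layout raise a `ValueError` rather than being parsed partially.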

To create a varied and spatially representative dataset, images were downloaded for a large variety of regions from all over the world.

### 2. Data Classification

By means of different spectral tools, granule pixels are selected and classified into one of the following six classes: 

| **Class** | **Coverage** |
| :-------: | ------------ |
| cloud | opaque clouds |
| cirrus | cirrus and vapor trails |
| snow | snow and ice |
| shadow | shadows from clouds, cirrus, mountains, buildings, etc. |
| water | lakes, rivers, seas |
| clear-sky | remaining: crops, mountains, urban, etc. |

Spectral tools include *false-color composites*, *image enhancements* and *graphical visualization of spectra*. Our aim is to create highly heterogeneous classes with a balanced number of pixels.
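
Whether the classes are balanced can be checked by counting the labels. The following is a minimal sketch on made-up class ids; the ids `1`, `2` and `3` are placeholders and the helper names are ours.

```python
from collections import Counter


def class_shares(classes):
    """Return the fraction of pixels that falls into each class id."""
    counts = Counter(int(c) for c in classes)
    total = sum(counts.values())
    return {cid: n / total for cid, n in sorted(counts.items())}


def imbalance_ratio(classes):
    """Ratio between the largest and the smallest class (1.0 means balanced)."""
    counts = Counter(int(c) for c in classes)
    return max(counts.values()) / min(counts.values())


# Made-up labels: class 3 holds twice as many pixels as classes 1 and 2.
example_classes = [1, 1, 2, 2, 3, 3, 3, 3]
shares = class_shares(example_classes)
ratio = imbalance_ratio(example_classes)
print(shares, ratio)
```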

The figure below shows the benefit of false-color composites for distinguishing snow. For this RGB display of the Atlas Mountains in Morocco, bands 12/7/3 are selected. Snow pixels appear blue, whereas cloud pixels appear pink-orange.

![marokko](fig/screenshot_marokko.png)
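
A false-color composite such as the 12/7/3 display can be sketched per pixel as follows. The band order in `BAND_ORDER` is the usual Sentinel-2 sequence and is an assumption here; the order in the file should be taken from the **band** table. The linear stretch between `low` and `high` and the reflectance-like 0 to 1 range are also assumptions, not the enhancement used for the figure.

```python
# Assumed band order (the usual Sentinel-2 sequence); check the **band**
# table of the file before relying on it.
BAND_ORDER = ["B01", "B02", "B03", "B04", "B05", "B06", "B07",
              "B08", "B8A", "B09", "B10", "B11", "B12"]


def stretch(value, low, high):
    """Linearly map value from [low, high] to [0, 1], clipping outside."""
    if high <= low:
        raise ValueError("high must be larger than low")
    return min(1.0, max(0.0, (value - low) / (high - low)))


def false_color(spectrum, bands=("B12", "B07", "B03"), low=0.0, high=1.0):
    """Return an (R, G, B) tuple built from three bands of one spectrum."""
    return tuple(
        stretch(spectrum[BAND_ORDER.index(band)], low, high) for band in bands
    )


# Synthetic 13-band spectrum with made-up values 0.0, 0.05, ..., 0.6.
example_spectrum = [i / 20 for i in range(13)]
rgb = false_color(example_spectrum)
print(rgb)
```

Passing `bands=("B04", "B03", "B02")` or `bands=("B8A", "B03", "B02")` gives the band combinations used for the other composites in this README.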

The next figure illustrates the pixel classification. The Fiji coastline is displayed in two different false-color composites: (a) bands 4/3/2 and (b) bands 8a/3/2. Colored polygons mark four different classes: cyan, yellow, dark blue and green stand for water, shadow, cloud and clear-sky pixels, respectively.

![fiji](fig/screenshot_fiji.jpg)

## Dataset

Our dataset consists of a total of N=5647725 pixels. Pixel information is stored in different tables in the HDF5 file.

*Relative to Sentinel-2 spatial and spectral resolutions:*

- **band** associates a band position with its label
- further band descriptions can be found in **bandwidth_nm**, **central_wavelength_nm** and **spatial_sampling_m**

*Relative to the classes:*

- **classes** (1xN table) contains the id of the class assigned to each pixel in the dataset
- **class_ids** gives the id of each class listed in **class_names**

*Relative to the spectra:*

- **spectra** (13xN table) collects the spectral values of each pixel; the Sentinel-2 instrument samples 13 spectral bands

*Relative to the image metadata:*

- **latitude** and **longitude** hold the pixel coordinates
- each pixel belongs to a **granule_id**, and several granules form an image identified by a **product_id**
- pixels of the same product share the sensing date (**date**), four angles describing the illumination and viewing geometry (**sun_azimuth_angle**, **sun_zenith_angle**, **viewing_azimuth_angle**, **viewing_zenith_angle**) and the geographical location (**continent** and **country**)
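
To check the tables listed above against a downloaded copy of the file, their names and shapes can be printed. The sketch below assumes the third-party `h5py` package is installed; `describe` and `describe_file` are illustrative helpers and not part of this repository. `describe` only relies on `keys()`, item access and a `shape` attribute, so it works on an open `h5py.File` as well as on nested dictionaries.

```python
def describe(node, prefix=""):
    """Yield (path, shape) for every table below an HDF5-like group."""
    for name in node.keys():
        child = node[name]
        path = f"{prefix}/{name}"
        if hasattr(child, "keys"):   # a group: descend into it
            yield from describe(child, path)
        else:                        # a table: report its shape
            yield path, getattr(child, "shape", None)


def describe_file(h5_path):
    """Open an HDF5 file read-only and list its tables with their shapes."""
    import h5py  # third-party dependency, imported only when needed

    with h5py.File(h5_path, "r") as h5_file:
        return list(describe(h5_file))
```

Calling `describe_file("20160321_s2_manual_classification_data.h5")` on the downloaded file should then return one `(path, shape)` pair per table, which makes it easy to verify which axis of **spectra** and **classes** runs over the pixels.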