The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., XLIII-B2-2021, 915–922, 2021
https://doi.org/10.5194/isprs-archives-XLIII-B2-2021-915-2021

28 Jun 2021

BENCHMARKING OF CONVOLUTIONAL NEURAL NETWORK APPROACHES FOR VEGETATION LAND COVER MAPPING

B. Carpentier, A. Masse, E. Lavergne, and C. Sannier
  • CLS, Parc Technologique du Canal, 11 Rue Hermès, 31520 Ramonville-Saint-Agne, France

Keywords: Satellite image time series, Deep learning, Convolutions, Classification, Land cover map, Sparse annotations

Abstract. Satellite Image Time Series (SITS) are becoming available at high spatial, spectral and temporal resolutions across the globe from the latest remote sensing sensors. These series of images can be highly valuable when exploited by classification systems to produce frequently updated and accurate land cover maps. The richness of spectral, spatial and temporal features in SITS is a promising source of data for developing better classification algorithms. However, machine learning methods such as Random Forests (RF), despite their fruitful application to SITS for land cover mapping, are structurally unable to properly handle intertwined spatial, spectral and temporal dynamics without breaking the structure of the data. The present work therefore proposes a comparative study of several deep learning algorithms from the Convolutional Neural Network (CNN) family and evaluates their performance on SITS classification. They are compared to the iota2 processing chain, developed by CESBIO and based on an RF model. Experiments are carried out in an operational context using sparse annotations from 290 labeled polygons: fewer than 80 000 pixel time series belonging to 8 land cover classes, drawn from a year of Sentinel-2 monthly syntheses. Results on a test set of 131 polygons show that CNNs using 3D convolutions in space and time are more accurate than 1D temporal, stacked 2D and RF approaches. The best-performing models are CNNs using spatio-temporal features, namely 3D-CNN, 2D-CNN and SpatioTempCNN, a two-stream model using both 1D and 3D convolutions.
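The distinction the abstract draws between 1D temporal and 3D spatio-temporal convolutions can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the tensor layout (time, bands, height, width), the band count, and the kernel sizes are illustrative assumptions. A 1D kernel slides over a single pixel's time series and ignores its neighbours, while a 3D kernel mixes the temporal and spatial neighbourhoods in one operation.

```python
import numpy as np

# Hypothetical SITS patch: 12 monthly syntheses, 4 spectral bands, 9x9 pixels.
# The (T, C, H, W) layout is an assumption for this sketch.
T, C, H, W = 12, 4, 9, 9
rng = np.random.default_rng(0)
sits = rng.random((T, C, H, W))

def conv1d_temporal(series, kernel):
    """1D convolution along time for one pixel/band series ('valid' mode)."""
    k = len(kernel)
    return np.array([np.dot(series[t:t + k], kernel)
                     for t in range(len(series) - k + 1)])

def conv3d(volume, kernel):
    """Naive 3D convolution over (time, height, width), 'valid' mode."""
    kt, kh, kw = kernel.shape
    vt, vh, vw = volume.shape
    out = np.empty((vt - kt + 1, vh - kh + 1, vw - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[t, i, j] = np.sum(volume[t:t + kt, i:i + kh, j:j + kw] * kernel)
    return out

# 1D temporal: each pixel is processed independently, so spatial context is lost.
pixel_series = sits[:, 0, 4, 4]                         # band 0, centre pixel
out_1d = conv1d_temporal(pixel_series, np.ones(3) / 3)  # shape (10,)

# 3D spatio-temporal: one kernel spans 3 dates and a 3x3 spatial window jointly.
out_3d = conv3d(sits[:, 0], np.ones((3, 3, 3)) / 27)    # shape (10, 7, 7)
```

A deep-learning framework would implement the same contrast with learned kernels (e.g. 1D vs 3D convolution layers); the two-stream SpatioTempCNN mentioned above combines both kinds of feature in a single model.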