FULL-PHYSICS INVERSE LEARNING MACHINE FOR SATELLITE REMOTE SENSING OF OZONE PROFILE SHAPES AND TROPOSPHERIC COLUMNS

Characterizing vertical distributions of ozone from nadir-viewing satellite measurements is known to be challenging, particularly for the ozone information in the troposphere. A novel retrieval algorithm called the Full-Physics Inverse Learning Machine (FP-ILM) has been developed at DLR in order to estimate ozone profile shapes based on machine learning techniques. In contrast to traditional inversion methods, the FP-ILM algorithm formulates the profile shape retrieval as a classification problem. Its implementation comprises a training phase to derive an inverse function from synthetic measurements, and an operational phase in which the inverse function is applied to real measurements. This paper extends the FP-ILM retrieval to derive tropospheric ozone columns from GOME-2 measurements. Results for total and tropical tropospheric ozone columns are compared with those from the official GOME Data Processor (GDP) product and the convective-cloud-differential (CCD) method, respectively. Furthermore, the FP-ILM framework will be used for the near-real-time processing of the new European Sentinel sensors with their unprecedented spectral and spatial resolution and corresponding large increases in the amount of data.


MOTIVATION
Ozone (O3) plays a crucial role in the Earth's atmosphere, and its chemical processes (production and destruction) are closely related to climate change and air pollution caused by anthropogenic emissions. Therefore, accurate information on global and regional O3 vertical distributions over the troposphere and stratosphere is important to the atmospheric environment communities. Satellite remote sensing of O3 using ultraviolet (UV) radiation is comparatively mature. A number of European satellite sensors, e.g., the Global Ozone Monitoring Experiment (GOME) series, the SCanning Imaging Absorption spectroMeter for Atmospheric CHartographY (SCIAMACHY), and the Ozone Monitoring Instrument (OMI), have mapped the global and regional O3 distributions. On October 13, 2017, the TROPOspheric Monitoring Instrument onboard the Sentinel-5 Precursor (TROPOMI/S5P), which is one of the next generation of European Copernicus atmospheric composition missions, was launched from Plesetsk, Russia. With its global coverage and open data policy, the mission will support global efforts to monitor atmospheric pollution and to improve our understanding of chemical and physical processes.
These nadir-viewing satellite sensors are well suited to retrieving total column products. Total columns of O3 can be accurately and efficiently estimated by the Differential Optical Absorption Spectroscopy (DOAS) algorithm, which essentially uses the Huggins absorption band (320-360 nm). This conventional approach normally requires an ozone climatology for the air mass factor (AMF) calculation, but discrepancies between the climatological profile and the actual vertical distribution could lead to a retrieval error of up to 4 % in the derived total column amount at high solar zenith angles (SZAs) (Lerot et al., 2014). As most atmospheric ozone resides in the stratosphere above, tropospheric columns of O3 can be derived by subtracting an estimate of the stratospheric columns or by differencing total columns in cloud-free pixels from those in nearby pixels with thick/high convective clouds (the so-called CCD method). The CCD method can only be applied in the tropical region, where the assumption of a zonally invariant stratospheric column is valid. A number of relevant studies (Valks et al., 2014, Heue et al., 2016) have been conducted for GOME-2 measurements. Therefore, the retrieval of total and tropospheric O3 abundances can benefit considerably, in terms of representativity, from a reliable ozone profile shape.
However, characterizing O3 profile shapes from nadir-viewing satellite measurements is still known to be challenging, particularly for the ozone information in the troposphere. Direct retrieval of tropospheric information has also been investigated and applied to the GOME class of instruments. In general, estimating atmospheric parameters of interest directly from spectral measurements is treated as an ill-posed inverse problem that often requires an iterative inversion of large matrices and multiple calls to radiative transfer calculations. An accurate forward model is important and needs to depict the relationship between atmospheric parameters and measured intensity. Furthermore, this classical inversion method is computationally expensive and often needs additional constraints: reliable a priori knowledge (Rodgers, 2000) and the choice of regularization parameters (Xu et al., 2016) can be decisive for the retrieval outcome. Alternatively, machine learning techniques, such as neural networks (NNs), Gaussian processes, and support vector machines, can learn this relationship quickly through data-driven training. Although machine learning has been widely used in many research fields, much of its potential remains to be exploited in atmospheric retrieval applications.
To derive O3 profile shapes from satellite UV measurements quickly, we have developed a novel retrieval algorithm called the Full-Physics Inverse Learning Machine (hereafter, FP-ILM) (Xu et al., 2017) and compared the first retrievals with the RAL retrieval using the optimal estimation method (Miles et al., 2015). In this paper, we further apply the FP-ILM O3 profile shape to total column retrieval from GOME-2 measurements and to tropospheric column estimation.

ALGORITHM DESCRIPTION
The detailed theoretical background and implementation of FP-ILM can be found in the previous work (Xu et al., 2017) and are only summarized here for completeness. Figure 1 depicts a schematic diagram of the FP-ILM algorithm during the training and operational phases. During the training, the FP-ILM algorithm consists of the following main steps:
1. clustering O3 profile shapes;
2. simulating UV spectra with representative O3 profiles using "smart sampling" and a radiative transfer model;
3. obtaining the differential spectra and computing principal components from these spectra;
4. training a NN for classifying the O3 profile shape corresponding to an input;
5. developing a NN for scaling the O3 profile based on the given total vertical column density (VCD).
The reference O3 profiles (volume mixing ratios) were built on the Bodeker database (Bodeker et al., 2013) ("Tier 1.4" in this study), which draws globally on eight different spaceborne data sets merged with ozonesonde data. The McPeters/Labow climatology (McPeters and Labow, 2012) (the latest version is described in Labow et al., 2015) was merged with the Bodeker database in order to provide the tropospheric O3 concentration. We used the radiative transfer model VLIDORT (Vector LInearized Discrete Ordinate Radiative Transfer) (Spurr, 2006), which requires model parameters consisting of the solar zenith angle, viewing zenith angle, relative azimuth angle, surface albedo, and surface pressure. The simulations were done for the wavelength range between 290 and 335 nm. In particular, the so-called "smart sampling" approach (Loyola R et al., 2016) was used to generate a minimal number of training samples so that the multi-dimensional input space and the output space can be optimally covered.
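As a rough illustration of how a low-discrepancy design can cover the five-dimensional space of forward-model parameters with few samples, the sketch below draws points from a Halton sequence. It is not the smart-sampling implementation of Loyola R et al. (2016): the choice of a Halton sequence, the parameter ranges in `BOUNDS`, and the helper names `radical_inverse` and `sample_parameters` are assumptions made purely for illustration.

```python
# Hypothetical parameter ranges (not taken from the paper).
BOUNDS = {
    "solar_zenith_angle_deg": (0.0, 85.0),
    "viewing_zenith_angle_deg": (0.0, 60.0),
    "relative_azimuth_angle_deg": (0.0, 180.0),
    "surface_albedo": (0.0, 1.0),
    "surface_pressure_hpa": (500.0, 1050.0),
}
PRIMES = (2, 3, 5, 7, 11)  # one Halton base per parameter


def radical_inverse(index, base):
    """Mirror the base-`base` digits of `index` about the radix point."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def sample_parameters(n_samples, bounds=BOUNDS):
    """Return n_samples parameter sets spread quasi-uniformly over `bounds`."""
    samples = []
    for i in range(1, n_samples + 1):  # index 0 would map every parameter to its lower bound
        point = {}
        for (name, (low, high)), base in zip(bounds.items(), PRIMES):
            point[name] = low + (high - low) * radical_inverse(i, base)
        samples.append(point)
    return samples


training_inputs = sample_parameters(200)
```

Each entry of `training_inputs` would then be passed, together with a representative O3 profile, to the radiative transfer model to produce one synthetic spectrum.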
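The simulated spectra $y^{\delta}(\lambda)$ were converted into the differential spectra $y^{\delta}_{c}(\lambda)$ by a low-order polynomial fit:

$$y^{\delta}_{c}(\lambda) = y^{\delta}(\lambda) - P_{N}(\lambda, p_{c}),$$

where $P_{N}(\lambda, p_{c})$ is a polynomial of degree $N$. A total of nine principal components were extracted from the original differential spectra, with the transformed measurement vector $\mathbf{y}$ being

$$\mathbf{y} = \mathbf{U}_{M}^{T}\, \mathbf{y}^{\delta}_{c},$$

where $\mathbf{U}_{M} = [\mathbf{u}_{1}, \ldots, \mathbf{u}_{M}] \in \mathbb{R}^{M \times M}$ is an orthogonal matrix incorporating the $M$ singular vectors $\mathbf{u}_{k}$ of the covariance matrix of the original measurement vectors.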
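The two preprocessing steps can be sketched in a few lines of NumPy. This is a minimal illustration and not the FP-ILM code: the polynomial degree of 3, the mean-centring before the decomposition, the synthetic spectra, and the function names `differential_spectra` and `principal_components` are all assumptions.

```python
import numpy as np


def differential_spectra(wavelengths, spectra, degree=3):
    """Subtract a least-squares polynomial (assumed degree 3) from each spectrum."""
    # Centre and scale the wavelength axis so the fit stays well conditioned.
    x = (wavelengths - wavelengths.mean()) / (wavelengths.max() - wavelengths.min())
    vander = np.vander(x, degree + 1)                 # (n_wavelengths, degree + 1)
    coeffs, *_ = np.linalg.lstsq(vander, spectra.T, rcond=None)
    return spectra - (vander @ coeffs).T


def principal_components(diff_spectra, n_components=9):
    """Project differential spectra onto the leading singular vectors."""
    mean = diff_spectra.mean(axis=0)
    centred = diff_spectra - mean
    # The right singular vectors of the centred data are the eigenvectors
    # of its covariance matrix.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    basis = vt[:n_components].T                       # (n_wavelengths, n_components)
    return centred @ basis, basis, mean


# Synthetic stand-in for simulated spectra between 290 and 335 nm.
rng = np.random.default_rng(0)
wavelengths = np.linspace(290.0, 335.0, 91)
baseline = 0.1 + 0.002 * (wavelengths - 290.0)
spectra = baseline + 0.01 * rng.standard_normal((50, wavelengths.size))

diff = differential_spectra(wavelengths, spectra)
scores, basis, mean = principal_components(diff)
```

In an operational setting the `basis` and `mean` obtained during training would be stored and reused, so that a measured spectrum is projected onto the same nine directions as the training data.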
The input vector to the classification NN comprised the five model parameters used in the forward simulations and the nine principal components. The weights and biases of each layer were initialized with the Nguyen-Widrow procedure (Nguyen and Widrow, 1990), and the corresponding training was done with the Scaled Conjugate Gradient backpropagation algorithm (Møller, 1993), which is often used for pattern recognition applications. The input vector to the scaling NN for estimating O3 profile shapes comprised only the retrieved total VCD. For each O3 profile cluster, a scaling NN was trained with the corresponding O3 VCDs using the Nguyen-Widrow initialization procedure and the Levenberg-Marquardt backpropagation algorithm (Hagan and Menhaj, 1994).
During the operational phase, we implemented the inverse functions (i.e., both trained NNs) derived from the training phase in the framework of total column retrieval from satellite measurements. Since the AMF/VCD conversion is an iterative process, the profile shape estimated from the VCD at the current iteration was used to obtain the next iterate. With the newly retrieved VCD, the O3 profile shape was further adjusted.
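The coupling between the VCD and the profile shape can be read as a fixed-point iteration, sketched below under strong simplifications. The relation VCD = SCD / AMF, the relative convergence criterion, and the two-layer toy stand-ins `toy_profile` and `toy_amf` (which replace the trained scaling NN and the radiative transfer based AMF calculation) are assumptions for illustration only.

```python
def iterate_vcd(scd, profile_from_vcd, amf_from_profile, vcd_initial,
                rel_tol=1e-6, max_iter=20):
    """Alternate between profile scaling and AMF/VCD conversion until the VCD settles."""
    vcd = vcd_initial
    for iteration in range(1, max_iter + 1):
        profile = profile_from_vcd(vcd)            # profile scaled to the current VCD
        vcd_new = scd / amf_from_profile(profile)  # AMF/VCD conversion
        converged = abs(vcd_new - vcd) <= rel_tol * abs(vcd_new)
        vcd = vcd_new
        if converged:
            break
    # Adjust the profile once more with the final VCD.
    return vcd, profile_from_vcd(vcd), iteration


# Toy stand-ins: two layers (troposphere, stratosphere) with fixed layer AMFs.
LAYER_AMF = (1.5, 3.0)


def toy_profile(vcd):
    """Hypothetical scaling: the tropospheric fraction grows slowly with the VCD."""
    trop_fraction = 0.08 + 1.0e-4 * (vcd - 300.0)
    return (trop_fraction * vcd, (1.0 - trop_fraction) * vcd)


def toy_amf(profile):
    """Total AMF as the partial-column-weighted mean of the layer AMFs."""
    return sum(w * c for w, c in zip(LAYER_AMF, profile)) / sum(profile)


vcd, profile, n_iter = iterate_vcd(900.0, toy_profile, toy_amf, vcd_initial=300.0)
```

Because the AMF in this toy case depends only weakly on the profile shape, the loop settles within a few iterations; the behaviour with a real AMF calculation may differ.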

FIRST RESULTS
In this section, first results of total O3 columns retrieved from GOME-2 onboard the MetOp-A satellite (GOME-2A, hereafter) using FP-ILM profile shapes are presented. For comparison, we used the operational GOME-2 product generated by the GOME Data Processor (GDP) (Van Roozendael et al., 2006, Loyola et al., 2011, Hao et al., 2014), which relies on the TOMS version 8 O3 profile climatology (Bhartia, 2003, McPeters et al., 2007).
Figure 2 shows the total VCD retrieval results from GOME-2A data on November 25, 2017. The VCDs retrieved using the two O3 profile schemes agree well, suggesting that the FP-ILM profile shape used in the total ozone retrieval is reasonable and may reflect the actual measurement conditions.
The FP-ILM O3 profile can be used to obtain tropospheric ozone columns by integrating the partial columns of the layers in the troposphere. The computations using the FP-ILM and CCD schemes (not shown here) were done for monthly averaged values (below 200 hPa) on a 1.25° by 2.5° latitude-longitude grid for the tropical region between 20° N and 20° S. However, the tropospheric columns from the FP-ILM retrieval seem less sensitive to atmospheric variability, including for the identification of trends, indicating the need for further investigation.
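The integration of partial columns below 200 hPa can be sketched as follows. The layer grid, the partial-column values, and the linear-in-pressure split of the layer that straddles 200 hPa (which assumes a constant mixing ratio within that layer) are illustrative assumptions and not the paper's implementation; the function name `tropospheric_column` is hypothetical.

```python
def tropospheric_column(pressure_edges_hpa, partial_columns_du, top_hpa=200.0):
    """Sum the partial columns between the surface and `top_hpa`.

    pressure_edges_hpa: layer edges ordered from the surface upwards
                        (decreasing pressure), one more entry than layers.
    partial_columns_du: O3 partial column of each layer in Dobson units.
    """
    total = 0.0
    layers = zip(pressure_edges_hpa, pressure_edges_hpa[1:], partial_columns_du)
    for lower, upper, column in layers:
        if lower <= top_hpa:
            break                       # layer lies entirely above the integration top
        if upper >= top_hpa:
            total += column             # layer lies entirely below the integration top
        else:
            # Layer straddles top_hpa: take the fraction of its pressure thickness below it.
            total += column * (lower - top_hpa) / (lower - upper)
    return total


# Made-up four-layer profile for illustration.
edges = [1000.0, 700.0, 400.0, 100.0, 10.0]
columns = [10.0, 8.0, 12.0, 250.0]
trop_du = tropospheric_column(edges, columns)
```

Applying this per pixel and then averaging over a month within each grid cell would give values on the grid described above.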

CONCLUSIONS
This paper exploited the ability of the FP-ILM algorithm to retrieve total and tropospheric ozone from satellite UV measurements. The total ozone retrieved from GOME-2A data using the FP-ILM profile shape was comparable with that obtained using the TOMS climatology, whereas the tropospheric ozone derived from the FP-ILM retrieval showed discrepancies compared with the CCD method. Future work will focus on improving the algorithm in order to optimize the results.

Figure 2. Comparison of retrieved total O3 columns from GOME-2A data between the DOAS retrieval using the FP-ILM profile (top) and the GDP 4.8 product (bottom) on November 25, 2017.