MOLECULE-BASED SPARSE REPRESENTATION STRATEGY AS A REPLACEMENT SOLUTION OF ORTHOGONAL MATCHING PURSUIT TECHNIQUE FOR DECOMPOSITION OF TIME SERIES OF VEGETATION INDICES

This paper proposes a new strategy that introduces molecules as an alternative to the well-known orthogonal matching pursuit (OMP) method for the sparse representation of time series of vegetation indices. By preventing leakage problems, the method improves the accuracy of the soft classification of agricultural products. To do so, a library of molecules (combinations of atoms) is first generated from representative ground truth data as the probable cultivation patterns, and each vegetation index time series is then decomposed based on these molecules. The efficiency of each molecule is measured through three criteria: (1) maximizing the accuracy of reconstructing the input signal, (2) minimizing the number of atoms contributing to the molecule, and (3) minimizing the estimated negative abundances. Then, considering the uncertainties of the representative atoms, an iterative imposition of constraints is used to balance the estimated abundances. The proposed method improved on the common OMP method by 25.21% in the soft classification of temporal signals of vegetation indices.


INTRODUCTION
In recent years, sparse signal reconstruction through a dictionary of basic signals has attracted the attention of many researchers in the field of signal processing as an efficient tool. In this approach, an input signal is reconstructed as a linear combination of a minimum number of elements, called atoms, taken from a dictionary (Zhang et al., 2021). In other words, a signal $\vec{S} \in \mathbb{R}^{m}$ is sparsely estimated with the help of a dictionary $[D]_{m \times n}$ of atoms ($\vec{d}_i$, i = 1, 2, ..., n), with m << n, according to the criterion in Equation (1).

$$\vec{S} \cong [\vec{d}_1, \vec{d}_2, \ldots, \vec{d}_n] \times \beta \quad \text{s.t.} \quad \|\beta\|_0 < LoS, \quad \|\vec{S} - D \times \beta\| < \epsilon \qquad (1)$$

In Equation (1), $[\beta]_{n \times 1}$ is the sparse vector of the abundances of the dictionary atoms in the reconstruction of $\vec{S}$, $\|\cdot\|_0$ is the l0-norm operator (counting the number of nonzero elements), $\|\cdot\|$ is the l2-norm operator, LoS (level of sparsity) is the maximum allowed number of nonzero abundances, and $\epsilon$ is the threshold on the signal reconstruction accuracy. Since the number of atoms in the dictionary (n) is much larger than the length of the original signal (m), that is, an overcomplete dictionary (m << n) is used to reconstruct the signal, the number of equations is less than the number of unknowns and the system of equations is under-determined (Xun, 2017). An under-determined system has infinitely many solutions, so sparse representation methods identify a unique solution by imposing restrictions and constraints. Minimizing the cardinality (l0-norm) of the abundance vector is one such condition. However, there is no deterministic solution for estimating β under this condition, and the estimation is categorized as an NP-hard problem. Heuristic algorithms have therefore been proposed as an alternative way of solving the linear equations of sparse representation. Among the existing heuristic approaches is the orthogonal matching pursuit (OMP) algorithm. OMP is an iterative process for the sparse decomposition of signals that sequentially finds the proper atoms in a dictionary. It starts by selecting the atom of the dictionary that is most similar to the input signal. Each subsequent iteration finds the atom most similar to the residual vector, that is, the part of the input signal not explained by the previously identified atoms. This iterative process continues until the convergence
criteria mentioned in Equation (1) are met. Note that the selected atom is removed from the dictionary at each step to ensure that it is not selected again in later iterations.

Time series of cropland vegetation indices (VIs) are considered a rich source of information on phenology and cultivation schedules (Wu et al., 2018). In the following, some studies that use time series and sparse representation techniques to estimate the cultivated area of agricultural products are reviewed.

Hu et al. (2018) estimated the cultivated area of bean fields in a province of China. The study used MODIS time series images with an 8-day temporal resolution and a 250-meter spatial resolution, and produced a sub-pixel map of the bean crop by geographically weighted regression. The main motivation for this method was that locations close to each other show similar temporal response patterns. To do so, a 30-meter bean/non-bean reference map of the study area was produced by classifying Landsat time series images with the help of ground truth data. This map was resampled to the corresponding MODIS image space, and the geographically weighted regression coefficients were estimated from 4000 pixels of the ground truth map. An accuracy assessment against provincial statistics and the Landsat classification results showed that the accuracy of the results was acceptable (Hu et al., 2018).

Dimitrov et al. (2019) produced a sub-pixel map of 10 different agricultural products by classifying Proba-V satellite images with a spatial resolution of 100 meters. To do so, a classified map with a high spatial resolution was first produced from ground reference data and a time series of Sentinel-2 images. The generated map was then resampled to the corresponding Proba-V images so that the presence of the different classes was recorded in each pixel. Finally, two non-linear methods, an Artificial Neural Network (ANN) and Support Vector Regression (SVR), were trained to calculate the abundances of the different classes in the pixels of the Proba-V images. The study reports a higher accuracy for the SVR method. The availability of a high-resolution classified map is the main prerequisite of this method (Dimitrov et al., 2019).

Shakeri et al. (2020) mapped mountain almond plants in a time series extracted from Sentinel-2 multi-spectral images by means of target detection algorithms. The target detection methods used were constrained energy minimization (CEM), the matched filter (MF), the adaptive spectral matched filter (ASMF), and the adaptive coherence estimator (ACE). The methods were evaluated with the receiver operating characteristic (ROC) curve and its area under the curve (AUC). The best result was obtained with the CEM method, with an accuracy of 0.993 (Shakeri et al., 2020).

Razaghmanesh et al.
(2020), based on the phenological responses of saffron farmlands, presented a solution for detecting these farmlands through target detection algorithms applied to the normalized difference vegetation index (NDVI) time series extracted from Sentinel-2 satellite images. The approach was based on detecting the temporal-spectral responses of saffron croplands through a dictionary consisting of the temporal-spectral responses of saffron and background samples. The background sub-dictionary was selected randomly from the samples of the clustered feature space. Implementing this idea in three test areas in Neishabur city resulted in an average accuracy of 93.1% (Razaghmanesh et al., 2020).

The temporal resolution of satellite-derived time series is inversely related to their spatial resolution. As such, a temporally dense time series probably covers more than one cropland or other land cover in its ground footprint. In this situation, the vegetation index time series records a mixed (linear or non-linear) response of all land covers in the resolution cell. Decomposing a temporal signal (a vegetation index time series) into its constituent temporal patterns can therefore be considered a method of separating agricultural products in time series of satellite images with medium and low resolution. By collecting the temporal vegetation index responses of agricultural products in a dictionary, the separation of agricultural products in a mixed pixel is cast as a sparse decomposition process. Because the temporal patterns of agricultural products are highly similar, the response (β) obtained from sparse decomposition with the conventional OMP technique is not sufficiently accurate. As the linear correlation between the atoms of the dictionary increases, the
number of possible answers also increases, so if the purpose of the sparse estimation is labeling, the use of OMP can be a source of error (Zhao et al., 2017). This error occurs when abundances are assigned to incorrectly selected atoms. The incorrect abundances estimated for the initial atoms directly affect the residual vector in each iteration, which disturbs the identification of subsequent atoms. Even when the first atom is correctly identified, its inaccurate abundance disturbs the selection of the other atoms. Theoretically, this event is known as the leakage effect in estimation techniques (Shakeri et al., 2020). In this article, molecules (compositions of several atoms) are used instead of atoms in an attempt to avoid the leakage phenomenon in the sparse representation of VI time series. The details of this method are presented in the second section, and the results and discussion are presented in the third section. The last part of this article is dedicated to conclusions and suggestions.
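The greedy OMP procedure summarized in the Introduction can be illustrated with a minimal sketch. It assumes a dictionary stored as a NumPy array with atoms as columns; the function name `omp` and the parameters `max_atoms` (playing the role of LoS) and `tol` (playing the role of ε) are hypothetical names chosen for illustration, not taken from the paper.

```python
import numpy as np

def omp(D, s, max_atoms, tol=1e-6):
    """Greedy OMP sketch: D is (m, n) with atoms as columns, s has length m."""
    residual = s.astype(float).copy()
    support = []                          # indices of the atoms selected so far
    coef = np.zeros(0)
    norms = np.linalg.norm(D, axis=0)     # used to compare atoms on equal footing
    while len(support) < max_atoms and np.linalg.norm(residual) > tol:
        similarity = np.abs(D.T @ residual) / norms
        if support:
            similarity[support] = -np.inf  # a selected atom is never picked again
        support.append(int(np.argmax(similarity)))
        # Re-estimate all selected abundances jointly, then update the residual.
        coef, *_ = np.linalg.lstsq(D[:, support], s, rcond=None)
        residual = s - D[:, support] @ coef
    beta = np.zeros(D.shape[1])
    beta[support] = coef
    return beta, support
```

Because each new atom is chosen from the residual left by the earlier atoms, an early wrong choice among highly correlated atoms propagates to every later choice, which is the behavior the text refers to as leakage.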

METHODOLOGY
The time series of vegetation indices extracted from satellite images allow agricultural products to be monitored and distinguished, provided that their temporal greenness trends differ. Proba-V is a multispectral satellite mission that continues the VEGETATION instruments carried on the SPOT-4 and SPOT-5 platforms; it provides a high temporal resolution at a low to medium spatial resolution. This satellite is designed for land greenness monitoring, and its data are freely available. The data of this satellite are also used to produce global land cover maps published yearly at (https://lcviewer.vito.be/2015). These maps are known as VITO land cover maps. Cropland is one of the layers of the VITO land cover map and is given as the percentage of cultivated area in each pixel. In this research, this layer of the VITO map is assumed to be correct, and the aim is to separate the crop products of the pixels located in that layer through sparse representation techniques. For this purpose, the NDVI time series extracted from the Proba-V sensor and a dictionary of agricultural products obtained from time series of Sentinel-2 images extracted from Google Earth Engine (GEE) have been used.

Unlike the conventional linear mixing of the reflectance spectra of materials, however, the mixing of temporal signals differs with respect to the sum-to-one constraint on the abundances. Consecutive autumn and spring cultivation in a cropland (for example, planting potato after the wheat harvest) results in two hundred percent cultivation of that cropland in one crop year. Thus, if the cultivated area percentage of a mixed temporal signal is CAP %, the crop product percentage (CPP) can vary in the range CAP<CPP<2×CAP. The lower end of this range corresponds to single cultivation of all croplands contributing to the mixed signal, and the upper end to double cultivation of all of them. Usually, the correct value of CPP is unknown for
the recorded temporal signals of the vegetation index, which poses another challenge in decomposing them into the basic signals.
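The relation between CAP and CPP can be made concrete with a small worked example. The function name `crop_product_percentage` and the parameter `double_cropped_share` (the fraction of the cultivated area that carries two crops in the crop year) are hypothetical and only serve this illustration; the two endpoints of the range appear as the limiting cases of no and full double cultivation.

```python
def crop_product_percentage(cap, double_cropped_share):
    """CPP of a pixel whose cultivated area percentage is `cap` (0 to 100).

    `double_cropped_share` (0 to 1) is the assumed fraction of the cultivated
    area that is planted twice (autumn and spring) within one crop year.
    """
    if not 0.0 <= cap <= 100.0:
        raise ValueError("cap must lie between 0 and 100")
    if not 0.0 <= double_cropped_share <= 1.0:
        raise ValueError("double_cropped_share must lie between 0 and 1")
    # Single-cropped area counts once, double-cropped area counts twice.
    return cap * (1.0 + double_cropped_share)
```

For a pixel with CAP = 60% in which a quarter of the cultivated area is double-cropped, this gives CPP = 75%, which lies between 60% and 120%.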

Figure (1). Temporal response of (A) autumn planting, (B) spring planting, and (C) annual planting
In addition to the difference in how a temporal signal of the VI time series must be decomposed to estimate the contribution of cultivated crops, leakage is also a source of error in identifying the constituent atoms of the signal when the traditional OMP method is used. In other words, correct labeling is possible only when the atoms are used simultaneously in reconstructing a signal, and the estimation constraints also need to differ from those of the spectral unmixing process. Therefore, to address these challenges, a new approach based on the definition of basic molecules and the iterative application of constraints is proposed for the sparse decomposition and product labeling of NDVI time series. According to the flowchart in Figure (2), the proposed method consists of five consecutive steps, which are described in the following.

Figure (2). Flowchart of the proposed method based on the use of molecules in sparse decomposition of VI time series
Step 1 - The atoms of the dictionary are separated into subgroups, one for each agricultural product. The sub-dictionary of each product might also be separated into several distinct categories because of the intra-class scattering caused by diversity in the type of seed, irrigation, climate, and agricultural calendar.
Step 2 - After each sub-dictionary is arranged in terms of the similarity of its constituent atoms, its atoms are separated into clusters based on their diversity, and a representative is extracted from each cluster. The diversity range of the atoms of each product sub-dictionary, compared to the other sub-dictionaries, is the criterion for determining the number of clusters of that sub-dictionary. In other words, fewer representatives are extracted from sub-dictionaries of products with high similarity, and more representatives are extracted from products with more diversity. The Euclidean distance has been chosen as the measure of distinction between atoms. The representative of each cluster is simply the mean of the VI time series of the constituent atoms of that cluster. Once generated, the representatives are used in place of the original atoms of the dictionary for signal representation.
Step 3 - Since the gradual process used in the OMP technique makes it possible to identify incorrect atoms, in the third step, as the first innovation of this research, all possible non-repetitive combinations of representatives are produced in the form of a molecular library. Each combination is labeled as a molecule and stored in the form of a design matrix consisting of representative column vectors. In this study, molecules range from monoatomic to tetratomic structures. The maximum number of atoms contributing to a molecule has been chosen according to the spatial resolution of the Proba-V satellite sensor (one hectare) and the average area of cultivated land in the study area. In other words, more than four distinct temporal patterns are not expected in the area covered by each pixel of the time series extracted from Proba-V images.
Step 4 - In this step, each observed VI time series is decomposed once with every molecule of the library. The result of decomposing the input signal with each molecule is then evaluated from three different aspects: (1) the reconstruction accuracy of the signal through the RMSE index, (2) the number of atoms contributing to the molecule (Na), and (3) the sum of the absolute values of the negative abundances ($\sum_{\beta_j<0} |\beta_j|$). These have been chosen as the factors affecting the adequacy of a molecule in the decomposition process. Equation (2) represents the cost index of each molecule, which contains all the mentioned criteria for assessing the adequacy of that molecule. Hence, the molecule with the lowest cost is identified as the optimal molecule in this step.
According to Equation (2), the molecule that simultaneously has the lowest number of atoms, the highest reconstruction accuracy, and the lowest sum of the absolute values of the negative estimated abundances is selected as the optimal molecule. The size of agricultural land is generally more than one hectare. For this reason, the square of Na has been used to push the results toward molecules with the minimum number of crops in the signal reconstruction process. Finally, the product labels of the atoms participating in the optimal molecule are assigned as the product labels of the input signal.
Step 5 - After the labels of the products are identified, in the last step, their abundances are estimated by considering the differences in the way the constraints are imposed.
Because of the uncertainty introduced by the averaging process when the representatives are produced, applying constraints in recovering the abundances is necessary. Given the agricultural land cover percentage of each signal, the total abundance of cultivated crops can be up to twice this amount. The constraints for estimating the abundance(s) of the winning molecule are applied according to the pseudocode presented in Algorithm 1.
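Steps 3 and 4 can be sketched as follows, under stated assumptions. The representatives are assumed to be the columns of a NumPy array `R`, and each molecule is decomposed by unconstrained least squares. The exact form of the cost in Equation (2) is not reproduced here, so the product `(RMSE + eps) * Na**2 * (1 + sum of |negative abundances|)` is only one plausible combination of the three criteria named in Step 4; `eps` and all function names are hypothetical.

```python
from itertools import combinations
import numpy as np

def build_molecule_library(n_representatives, max_atoms=4):
    """Index tuples of all non-repetitive combinations of 1..max_atoms representatives."""
    return [combo
            for size in range(1, max_atoms + 1)
            for combo in combinations(range(n_representatives), size)]

def molecule_cost(R, s, molecule, eps=1e-6):
    """Assumed cost: (RMSE + eps) * Na^2 * (1 + sum of |negative abundances|)."""
    A = R[:, list(molecule)]                      # design matrix of the molecule
    beta, *_ = np.linalg.lstsq(A, s, rcond=None)  # all atoms estimated at once
    rmse = np.sqrt(np.mean((A @ beta - s) ** 2))
    negative_sum = np.abs(beta[beta < 0]).sum()
    cost = (rmse + eps) * len(molecule) ** 2 * (1.0 + negative_sum)
    return cost, beta

def select_optimal_molecule(R, s, max_atoms=4):
    """Return the lowest-cost molecule and its estimated abundances."""
    library = build_molecule_library(R.shape[1], max_atoms)
    costs = [molecule_cost(R, s, molecule)[0] for molecule in library]
    best = library[int(np.argmin(costs))]
    return best, molecule_cost(R, s, best)[1]

# Usage with synthetic representatives: a signal mixed from columns 1 and 3.
rng = np.random.default_rng(0)
R = rng.random((12, 5))
s = 0.6 * R[:, 1] + 0.4 * R[:, 3]
best, beta = select_optimal_molecule(R, s)
```

The exhaustive search is feasible only because the library is built from a small number of representatives and at most four atoms per molecule; with five representatives the library already holds 30 molecules.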

Algorithm (1). Iterative process of the constrained estimation of the product's abundances for the optimum molecule
Input:
  Time series signal of VI ($\vec{S}$)
  Cropland Percentage (CP) of the $\vec{S}$
  The optimal molecule (obtained in the fourth step)
Output:
  Production abundances (β)
Identifying the type of cultivation of the products contained in the optimal molecule
IF "the molecule has simultaneously the autumn and spring crop types", THEN:
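Since only part of Algorithm (1) is available, the following is a speculative sketch of one way the iterative constrained estimation could work, not the authors' procedure. It assumes that abundances and CP are expressed as fractions, that the sum of the abundances is fixed to a candidate total between CP and 2×CP when the molecule mixes autumn and spring crops (and to CP otherwise), and that the candidate with the lowest `RMSE * (1 + sum of |negative abundances|)` is kept. The function names, the grid of `n_steps` candidate totals, and this cost are all assumptions.

```python
import numpy as np

def sum_constrained_lstsq(A, s, total):
    """Least squares subject to sum(beta) == total, solved through the KKT system."""
    k = A.shape[1]
    kkt = np.zeros((k + 1, k + 1))
    kkt[:k, :k] = 2.0 * A.T @ A
    kkt[:k, k] = 1.0          # column of the Lagrange multiplier
    kkt[k, :k] = 1.0          # row of the equality constraint
    rhs = np.concatenate([2.0 * A.T @ s, [total]])
    return np.linalg.solve(kkt, rhs)[:k]

def estimate_abundances(A, s, cp, double_cropped, n_steps=11):
    """Try candidate abundance totals and keep the lowest-cost estimate."""
    totals = np.linspace(cp, 2.0 * cp, n_steps) if double_cropped else [cp]
    best = None
    for total in totals:
        beta = sum_constrained_lstsq(A, s, total)
        rmse = np.sqrt(np.mean((A @ beta - s) ** 2))
        negative_sum = np.abs(beta[beta < 0]).sum()
        cost = rmse * (1.0 + negative_sum)   # assumed cost, see the lead-in
        if best is None or cost < best[0]:
            best = (cost, beta, total)
    return best[1], best[2]
```

Here `A` is the design matrix of the optimal molecule from Step 4. Fixing the sum through an equality constraint keeps each candidate a small linear solve, so scanning the range of totals stays cheap.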

RESULTS AND DISCUSSION
The proposed method has been evaluated from two different aspects. First, a set of mixed signals was simulated with the help of the existing dictionary of agricultural products, with specific conditions imposed on the mixing of the VI time series. The simulated signals were decomposed in two ways: with the traditional OMP method and with the proposed method. The correctness of labeling and the accuracy of the estimated abundances were used as the indices for comparing OMP and the proposed method.

Labeling accuracy is the first evaluation criterion. For this purpose, two binary vectors are compared, whose length equals the total number of spring and autumn crops. In these vectors, the presence or absence of a product is indicated by one and zero, respectively. One vector corresponds to the simulated reality, and the second is filled according to the products detected for each signal. In the second step of the evaluation, the accuracy of the estimated abundances of agricultural products was assessed. This was done only for the atoms that were correctly identified in the labeling process. The difference between the abundances estimated for the correct products and their actual values in the simulation is treated as the error from which the RMSE is computed. According to the results of Table (3), an improvement of 0.27 in the RMSE of the abundances has been obtained, which seems favorable.
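The two evaluation criteria can be sketched as below. The text does not state how the two binary vectors are compared, so the fraction of matching entries is an assumption, as are the function names; the abundance RMSE is restricted to products that are both truly present and detected, following the description above.

```python
import numpy as np

def labeling_accuracy(true_labels, detected_labels):
    """Fraction of crops whose presence/absence flag matches (assumed comparison)."""
    truth = np.asarray(true_labels, dtype=bool)
    detected = np.asarray(detected_labels, dtype=bool)
    return float(np.mean(truth == detected))

def abundance_rmse(true_abundances, estimated_abundances):
    """RMSE over the products that are present in the truth and were detected."""
    truth = np.asarray(true_abundances, dtype=float)
    estimate = np.asarray(estimated_abundances, dtype=float)
    correct = (truth != 0) & (estimate != 0)
    if not correct.any():
        return float("nan")   # no correctly identified product to score
    return float(np.sqrt(np.mean((truth[correct] - estimate[correct]) ** 2)))
```

For example, with four crops, a true mixture of crops 1 and 3, and a detection of crops 1 and 2, two of the four flags agree and only crop 1 enters the abundance RMSE.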
In the second aspect of the evaluation, agricultural products were separated in the NDVI time series extracted from the PROBA-V sensor for Urmia city in West Azerbaijan province. The time series covered the 2019 crop year. For this purpose, the existing dictionary of NDVI temporal signals of agricultural products and the VITO land cover map were used. Although the agricultural statistics are not highly accurate and were obtained by generalizing field observations, the results of the proposed method are more consistent with them than those of the OMP method.

CONCLUSIONS
In this article, a method based on a molecular strategy was proposed for the sparse decomposition of temporal signals of VIs. The idea is to use the atoms of the dictionary simultaneously in estimating the abundances and thereby prevent the leakage phenomenon. The different way in which temporal signals mix in agricultural lands also led to a new approach to constrained linear decomposition. The proposed method was implemented with the help of a dictionary of the temporal responses of crop types containing more than 35,000 atoms. Decomposing simulated VI time series with a high diversity of crop products by the proposed method achieved about 75.76% accuracy in the labeling process, whereas the OMP method reached an accuracy of 50.25% under similar conditions. In abundance estimation, the proposed method led to an RMSE of 0.21, and the OMP method reached an error of 0.47. On this basis, the proposed method outperformed the OMP method, with an improvement of 25.51 percentage points in labeling and a 55% reduction in the abundance estimation error.
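The two improvement figures follow from the values reported in this section by simple arithmetic, shown here as a short check. The variable names are illustrative; the labeling gain is an absolute difference between accuracies, while the abundance gain is a relative reduction of the RMSE.

```python
labeling_proposed, labeling_omp = 75.76, 50.25   # labeling accuracy in percent
rmse_proposed, rmse_omp = 0.21, 0.47             # abundance RMSE

# Absolute gain in labeling accuracy (percentage points).
labeling_gain = labeling_proposed - labeling_omp

# Relative reduction of the abundance RMSE with respect to OMP.
rmse_reduction = (rmse_omp - rmse_proposed) / rmse_omp
```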
Figure 1(A) shows the temporal response of autumn planting, Figure 1(B) the temporal response of spring planting, and Figure 1(C) the temporal response of annual planting. Together they show how the time series responses of agricultural lands differ between one and two cultivation patterns during a cultivation year, and they allow the schematic time series responses of a field with different patterns (autumn, spring, and annual) to be compared.
Algorithm (1), steps within the iteration: generating the sum constraint on the abundances ($\sum \beta = SP$); constrained least-squares estimation of the i-th β; calculation of the i-th cost from the i-th β, using RMSE_i (the reconstruction accuracy) and the sum of the absolute values of the negative abundances.

Figure (3) shows the map of agricultural lands obtained from VITO and an NDVI image of the PROBA-V sensor related to Urmia city.

Figure (3). Map of cropland (CP) and an NDVI image of the PROBA-V related to Urmia city

Table (1) represents the details related to the simulation process of the VI time series.

Table (3) represents the RMSEs in estimating the abundance of products for the proposed method and OMP.

Table (3). The RMSEs of estimating the abundances of agricultural products