HYPERSPECTRAL INVERSION OF SOLUBLE SALT CONTENT IN MURAL PAINTING

Mural paintings are important carriers of history and culture. Owing to natural and anthropogenic factors, salt in murals and their environment becomes enriched in the surface layer as temperature changes. This induces irreversible diseases such as crispy alkali, which threaten the survival of murals. An efficient and non-destructive method for detecting salt in murals is therefore of great importance. We propose a method to predict the soluble salt content of mural paintings based on hyperspectral techniques. First, simulated samples with different salt concentrations were measured with a spectroradiometer to acquire their spectra. Next, breakpoint correction and average smoothing were applied as preprocessing, and the data set was divided. Then, the spectra were enhanced by continuum removal (CR) and the logarithm of the reciprocal (LR). The salt concentration was correlated with the spectra to extract 10 characteristic bands. Finally, salt content prediction models were established by simple linear regression (SLR) and multiple linear regression (MLR). Model accuracy was evaluated with the coefficient of determination (R²), root mean square error (RMSE), and relative percent deviation (RPD). The experimental results show that the best inversion fit is obtained by the CR-MLR model using the strongly correlated bands at 420 nm, 584 nm, and 2379 nm (calibration set: R² = 0.846, RMSE = 0.138, RPD = 3.240). This paper provides a new technical means for the non-destructive detection of salt content in murals.


INTRODUCTION
Mural paintings are colourful paintings attached to ancient buildings. As a special form of expression in the art of painting, frescoes play propaganda, educational, and totemic roles (Yu N, 2016). However, owing to natural erosion and degradation or improper human intervention, their state of preservation has become a concern in recent years. The soluble salt content in frescoes changes with external conditions: salt repeatedly dissolves, crystallizes, and expands as temperature changes. Once the salt accumulates to a certain concentration, it is carried by capillary water movement to the surface of the mural, where it becomes enriched and crystallizes (Ma H, 2020). Salt accumulation can induce a variety of irreversible diseases. Zhang Y X (2021) found that water-salt transport and the dissolution and crystallization of soluble salts directly contributed to the alkali disease of the frescoes in Cave 196 at Dunhuang, China. Jin Z L (2008) found that Na2SO4-dominated puffy alkali lesions are progressive, recurring, and highly destructive. Salt damage irreversibly weakens the artistic value of murals. Therefore, efficient non-destructive testing of soluble salt content in murals is urgently needed.
In recent years, many scholars have studied the composition and content of soluble salts in mural paintings. Yu Z R (2017) sampled the diseased area of the Mani temple and determined the salt-containing compounds and their concentrations at different depths using the ion chromatograph dispersion technique, revealing the water-salt migration pattern. Du H Y (2009) analyzed the ionic components of murals using capillary electrophoresis and quantitatively estimated concentrations from the peak heights of the electrophoretic profile. The latter method offers high separation performance and low contamination and consumption, but it suffers from some baseline noise. Both methods require physical sampling at the location to be tested on the mural itself, which might cause secondary damage to the mural.
The sensor of a spectroradiometer has a fine spectral resolution at the nanometre level, which can capture the fine, continuous spectral characteristics of the target. This makes it possible to invert salinization status quantitatively (Yao Y, 2013). To date, researchers have studied the relationship between heavy metal salts and spectral response bands in arid farmland (Xia J, 2019) and mining soil (Tu Y, 2018), developing fitting models for heavy metal content from spectral feature parameters and their transformations. Zhang S (2019) applied four spectral transformations to preprocessed spectra and developed inversion models for the contents of the heavy metals Cr, As, Ni, and Cd using partial least squares regression and radial basis function neural networks. ZE Mashimbye (2012) modelled South African soil spectra using a single band, a normalized difference salinity index, and partial least squares regression to obtain empirical models related to electrical conductivity. The use of hyperspectral techniques for detecting salt content in murals and walls is still at an exploratory stage.
In this study, spectra were collected from laboratory-made fresco test blocks with different salt concentrations, and the preprocessed data were divided into a calibration set and a validation set. The spectra were feature-enhanced by mathematical transformation, and the feature bands most strongly correlated with mural salinity were screened. Linear regression models were then developed to invert the soluble salt concentration. The results show that the correlation between spectra and salinity is stronger after smoothing and continuum removal. The coefficient of determination of the multiple linear regression model built on the band combination reached 0.846, with RMSE = 0.138, indicating high model stability. This study contributes to the exploration of the spectral characteristics of murals containing soluble salts and provides a new technical method for the non-destructive detection of soluble salt content in murals.

Sample Production and Data Collection
Mural spectral data collection, processing, and modelling were based on laboratory-made mural test blocks. The coarse and fine clay layers of the mural were simulated by mixing sediment (Li N, 2021). In actual murals, sodium sulfate migrates, penetrates, and crystallizes strongly and is the main type of salt damage. Sodium sulfate solutions at ten salinity levels were therefore prepared and added to the sample blocks. After dissolving and thorough mixing, the material was placed into a mould (16 cm × 11 cm × 1.8 cm) and left indoors in the shade, and the curing was monitored with a soil tester until the block was completely dry. The temperature was held at 21-22 °C and the relative humidity was kept constant. Qualified samples are shown in Figure 1. The ASD-FieldSpec4 portable spectroradiometer was used to collect the reflectance spectra of the samples; its specific parameters are given in Table 1. After the reflectance of the standard reflector had been measured, each sample area was divided into 3 × 3 blocks, and the centre point of one block in each row and column was selected for data acquisition. After one spectral curve had been collected, the probe was rotated by 90° about the axis perpendicular to the sample surface and the measurement was repeated, until the probe had been rotated through 270°. The four curves were averaged to give the spectrum of that point. In this way, 3 points were measured for each sample, and their spectra were averaged to produce the spectrum of the sample.
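As a minimal sketch of the averaging scheme described above (four probe rotations per point, three points per sample), the following Python fragment averages spectra band by band. The function names and the toy reflectance values are hypothetical and serve only as an illustration.

```python
def mean_spectrum(spectra):
    """Average a list of equal-length spectra band by band."""
    n = len(spectra)
    return [sum(band) / n for band in zip(*spectra)]


def sample_spectrum(points):
    """points: one entry per measurement point, each a list of rotation spectra."""
    point_means = [mean_spectrum(rotations) for rotations in points]
    return mean_spectrum(point_means)


# Toy data (made up): 3 points x 4 rotations x 2 bands.
points = [
    [[0.10, 0.20], [0.12, 0.22], [0.14, 0.24], [0.16, 0.26]],
    [[0.20, 0.30], [0.20, 0.30], [0.20, 0.30], [0.20, 0.30]],
    [[0.30, 0.40], [0.30, 0.40], [0.30, 0.40], [0.30, 0.40]],
]
avg = sample_spectrum(points)  # one averaged spectrum for the sample
```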

Data Preprocessing
To reduce, as far as possible, the random noise introduced by instrument components, human operation, and other factors, the original spectra were preprocessed. To avoid the effect of noise at the edges of the spectral range, only the 400-2450 nm interval was studied. During spectral acquisition, the responses of the different optical detection elements differ, and the resulting local discontinuities were eliminated by breakpoint correction. Point-wise measurement is also affected by edge effects and measurement jitter. After removing the outliers, the spectra measured at the different points of each sample were double-averaged to give the measured reflectance spectrum for each concentration level. Savitzky-Golay smoothing is a filtering method based on local least-squares polynomial fitting, applied directly in the signal (here, wavelength) domain. It is well suited to curves contaminated by high-frequency noise: it suppresses spurious points and improves the smoothness of the spectral curve. The general shape of the spectral curves after preprocessing is shown in Figure 2.
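The smoothing step can be illustrated with a minimal Python sketch. The window length and polynomial order used in this study are not stated, so a 5-point window with a quadratic polynomial is assumed here. The function name `savgol5` is hypothetical, and the two points at each end of the spectrum are simply left unsmoothed.

```python
# Standard 5-point Savitzky-Golay smoothing weights for a quadratic fit.
SG5 = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]


def savgol5(y):
    """Smooth y with a 5-point quadratic Savitzky-Golay filter.

    Assumption: the first and last two points are returned unchanged.
    """
    out = list(y)
    for i in range(2, len(y) - 2):
        out[i] = sum(c * y[i + k - 2] for k, c in enumerate(SG5))
    return out


# Toy example (made up): an isolated spike on a flat baseline is damped.
spike = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
smoothed = savgol5(spike)
```

A filter of this kind reproduces any locally quadratic curve exactly, so broad absorption features are preserved while isolated spikes are damped.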
The salt content at a given location on the mural samples was taken as the salt content inversion index. A total of 33 preprocessed spectral curves from the qualified samples were divided in a 9:2 ratio into a modelling (calibration) set of 27 spectra, used to build the inversion regression model, and a validation set of 6 spectra, used to evaluate the model fit. The division results are shown in Table 2. The data span the concentration gradient range of 0-1% mural salt content and cover the majority of the salt concentration range.

Figure 2. The spectral curve after preprocessing
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2022 XXIV ISPRS Congress (2022 edition), 6-11 June 2022, Nice, France

METHODOLOGY
The overall technical process is shown in Figure 3. First, the coarse and fine mud layers of simulated murals were produced and mixed with sodium sulfate to obtain laboratory-made mural test blocks at different concentration levels. Second, the sample spectral data were collected, preprocessed by breakpoint correction, spectral double averaging, and Savitzky-Golay smoothing, and divided into data sets. Then, the spectral data were subjected to continuum removal (CR) and the logarithm of the reciprocal (LR) for spectral enhancement. In addition, significance testing and correlation analysis were used to screen the strongly correlated bands. Finally, linear regression models were developed, with the selected bands introduced as explanatory variables, to invert the salinity. The validation set and the accuracy indices were used to determine the optimal salt content prediction model.

Spectral Feature Enhancement
To address the high redundancy of the data and the collinearity between bands, the preprocessed spectra were mathematically transformed. Such transformations enhance the spectral response to salt content and expose the implicit absorption features of the spectra. Ali Volkan (2011) classified soil salinity using continuum-removed spectral reflectance. Wang S W (2018) applied five spectral transformations, including the logarithm of the reciprocal, for partial least squares modelling of soil salinity; the greatest improvement in accuracy was achieved after the LR transform. CR effectively suppresses background information and highlights absorption and reflection properties. It normalizes the spectral reflectance to [0, 1], which makes absorption valleys more apparent. LR reduces the influence of illumination conditions and terrain differences on the spectra, and it enhances spectral variability in the visible range, which reduces random errors.
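A minimal Python sketch of the two transformations follows. It assumes that the continuum is the upper convex hull of the spectrum, that CR is the ratio of reflectance to that hull, and that LR uses the base-10 logarithm; none of these details is specified in the text, and the function names are hypothetical. Wavelengths are assumed to be strictly increasing and reflectance values strictly positive.

```python
import math


def upper_hull(x, y):
    """Indices of the upper convex hull of the points (x[i], y[i])."""
    hull = []
    for i in range(len(x)):
        # Drop the last hull point while it lies on or below the chord
        # joining the point before it to the new point i.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (x[b] - x[a]) * (y[i] - y[a]) - (y[b] - y[a]) * (x[i] - x[a])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return hull


def continuum_removed(x, y):
    """Divide each reflectance value by the hull (continuum) value at that band."""
    hull = upper_hull(x, y)
    cr, seg = [], 0
    for i in range(len(x)):
        # Advance to the hull segment that contains x[i].
        while x[i] > x[hull[seg + 1]]:
            seg += 1
        a, b = hull[seg], hull[seg + 1]
        t = (x[i] - x[a]) / (x[b] - x[a])
        continuum = y[a] + t * (y[b] - y[a])
        cr.append(y[i] / continuum)
    return cr


def log_reciprocal(y):
    """LR transform, assumed here to be log10(1 / R)."""
    return [math.log10(1.0 / v) for v in y]


# Toy spectrum (made up): three bands with one absorption valley.
wavelengths = [0.0, 1.0, 2.0]
reflectance = [0.5, 0.25, 1.0]
cr = continuum_removed(wavelengths, reflectance)
lr = log_reciprocal([0.1, 1.0])
```

In the toy spectrum the continuum is the straight line joining the two end points, so the end bands map to 1 and the valley maps to the ratio of its reflectance to the line above it.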

Correlation Analysis and Feature Band Selection
Irrelevant information and interfering variables are still present in the spectra after feature enhancement. Directly selecting the bands at peak and valley features for modelling is unreliable because these positions are subject to change, which affects modelling stability and inversion accuracy. Instead, correlation coefficients between the sample salt concentrations and the spectral values in every band were calculated for each spectral form and subjected to significance tests. The Pearson correlation coefficient was calculated band by band to select the characteristic bands (Guo K M, 2020). The equation is as follows.
$$ r_j = \frac{\sum_{i=1}^{n} (R_{ij} - \bar{R}_j)(S_i - \bar{S})}{\sqrt{\sum_{i=1}^{n} (R_{ij} - \bar{R}_j)^2 \, \sum_{i=1}^{n} (S_i - \bar{S})^2}} $$
where
$j$ = band number
$r_j$ = correlation coefficient between sample salt concentration and spectral data in band $j$
$R_{ij}$ = spectral reflectance of the $i$-th sample in band $j$
$\bar{R}_j$ = average over the $n$ samples in band $j$
$S_i$ = salt concentration of the $i$-th sample
$\bar{S}$ = average salt concentration of the samples
$n$ = number of modelled samples, $n = 27$
Li S M (2011) performed a single-band analysis of spectra against heavy metal content, identified eight response bands for heavy metals such as Cr and Ni, and established a regression model. The high correlation between spectral reflectance features and sample salinity reflects the potential of remote sensing tools for assessing the degree of salinity. The bands selected for the salinity features in this experiment are shown in Table 3.
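The band-by-band screening can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure actually used: the function names and toy data are hypothetical, and significance is checked against a tabulated critical value of |r| instead of computing a p-value. For n = 27 samples (25 degrees of freedom), the two-sided critical value at p < 0.05 is about 0.381; for the 4-sample toy data below (2 degrees of freedom) it is 0.950.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    A constant sequence has zero variance and would raise ZeroDivisionError.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def band_correlations(spectra, salinity):
    """spectra[i][j] is the value of sample i in band j; returns r_j per band."""
    return [pearson(list(band), salinity) for band in zip(*spectra)]


def top_bands(r, k, r_crit):
    """Indices of the k bands with the largest |r_j| among those above r_crit."""
    keep = [j for j, rj in enumerate(r) if abs(rj) > r_crit]
    return sorted(keep, key=lambda j: abs(r[j]), reverse=True)[:k]


# Toy data (made up): 4 samples x 3 bands.
salinity = [0.0, 0.2, 0.4, 0.6]
spectra = [
    [0.1, 0.5, 0.9],
    [0.2, 0.4, 0.8],
    [0.3, 0.5, 0.7],
    [0.4, 0.4, 0.6],
]
r = band_correlations(spectra, salinity)
selected = top_bands(r, k=2, r_crit=0.950)  # critical |r| for 4 samples
```

Bands 0 and 2 vary linearly with salinity and pass the screening, whereas band 1 is only weakly correlated and is rejected.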

Table 3. Characteristic bands used in modelling and their correlation coefficients
Note: ① b denotes the band in nm; (r) denotes the Pearson correlation coefficient, which passed the significance test at the p < 0.05 level (two-sided). The explanatory variables are listed in descending order of contribution.

Inversion Modelling
Simple linear regression (SLR) is one of the inversion-fitting prediction models. It involves one independent variable x and one dependent variable y. A first-order linear equation is fitted to multiple pairs of data (x, y), after which a new x can be input to predict the corresponding y. Multiple linear regression (MLR) is a regression method that relates several independent variables to a single dependent variable. The explanatory variables that are strongly correlated with the dependent variable are introduced into the regression equation, and the correlation coefficients among the explanatory variables are tested to screen out variables with strong collinearity, so that the final set of variables has low redundancy (Wang F, 2017). In this study, the soluble salt concentration was inverted from the spectral feature parameters at the characteristic bands using these two models.
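The two models can be sketched in Python as ordinary least-squares fits. This is an illustrative sketch rather than the implementation used in the study: the function names are hypothetical, the toy data are synthetic, and MLR is solved through the normal equations with a small Gaussian elimination routine so that the example is self-contained.

```python
def fit_slr(x, y):
    """Least-squares line y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def solve(a, b):
    """Solve the square system a w = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w


def fit_mlr(rows, y):
    """rows[i] holds the band values of sample i; returns [intercept, b1, ..., bk]."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[p] * d[q] for d in design) for q in range(k)] for p in range(k)]
    xty = [sum(d[p] * t for d, t in zip(design, y)) for p in range(k)]
    return solve(xtx, xty)


def predict(coef, row):
    """Predicted salt concentration for one sample's band values."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


# Synthetic data generated from y = 1 + 2*x1 - 3*x2 (no noise).
rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [1.0, 3.0, -2.0, 0.0, 2.0]
coef = fit_mlr(rows, y)
intercept, slope = fit_slr([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
```

With strongly collinear bands the normal equations become ill-conditioned, which is one practical reason for screening the explanatory variables before fitting.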

Accuracy Analysis
Goodness-of-fit statistics were used to evaluate the accuracy and quality of the models. The coefficient of determination (R²), root mean square error (RMSE), and relative percent deviation (RPD) of the calibration and validation sets were compared. R² measures how well the model fits, RMSE measures its predictive power, and RPD reduces the effect of differences in the range of sample attribute values between studies and thus gives some measure of model reliability (Xiao Z Y, 2021). The larger the R², the more stable the model; the smaller the RMSE, the higher the prediction accuracy. An RPD value between 1.4 and 1.8 indicates an acceptable model that can be used for prediction. An RPD value larger than 2.0 indicates a good fit with high reliability.
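The three indices can be computed as sketched below. The text does not give their formulas, so commonly used definitions are assumed: R² as one minus the ratio of the residual sum of squares to the total sum of squares, RMSE as the square root of the mean squared residual, and RPD as the sample standard deviation of the measured values divided by the RMSE. Function names and data are hypothetical.

```python
import math


def rmse(measured, predicted):
    """Root mean square error of the predictions."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def r_squared(measured, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1 - ss_res / ss_tot


def rpd(measured, predicted):
    """Assumed definition: standard deviation (n - 1) of measured values / RMSE."""
    n = len(measured)
    mean = sum(measured) / n
    sd = math.sqrt(sum((m - mean) ** 2 for m in measured) / (n - 1))
    return sd / rmse(measured, predicted)


# Toy values (made up): measured and predicted salt content in percent.
measured = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
predicted = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
scores = (r_squared(measured, predicted), rmse(measured, predicted), rpd(measured, predicted))
```

Because RPD scales the error by the spread of the measured values, it allows models fitted on data sets with different concentration ranges to be compared.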

Characteristic Analysis of Spectral Curves of Samples Containing Salt
The spectral curves in Figure 2 show an overall consistent trend. Pronounced asymmetric absorption valleys are formed at around 1440 nm and 1950 nm, the latter being deeper and wider. These are attributed to the strong absorption of salt in the water absorption bands. The valley depth increases non-linearly with salt concentration. The reflectance of the samples shows an overall positive correlation with salt content. The reflectance at 1% and 0.8% salinity is significantly higher than that of the other samples, leaving a clear gap between the two groups of curves. A preliminary explanation is the formation of white frost-like salt spots on the surfaces of these samples. In the 400-1400 nm range, the spectral reflectance increases with wavelength, rising most steeply at 600-800 nm.
Compared with the original spectra, the peaks and valleys of the CR and LR spectra are more prominent and more numerous, as seen in Figure 4(a-c), so the effect of different salt concentrations on the surface reflectance of the mural samples can be quantified more finely. By calculating correlation coefficients with significance tests, 10 of the 2051 bands were selected as candidate feature bands for modelling.

SLR Inversion Modelling Accuracy Analysis
The single band with the largest correlation coefficient under each spectral form was used as the independent variable to invert the soluble salt concentration of the mural samples. Four SLR inversion models were developed, as shown in Table 4. Reflectance tended to increase with soluble salt concentration, so the two were positively correlated. Soluble salt concentration was negatively correlated with the valley depth after continuum removal, and the same holds for the logarithm of the reciprocal.
Note: ① R800nm and R1413nm represent reflectance values at wavelengths of 800 nm and 1413 nm, respectively, and R420nm and R1415nm represent spectral values at 420 nm and 1415 nm after CR and LR treatment, respectively. y is the salt concentration value of the simulated mural samples in the modelling group.
The accuracies of the four SLR models are shown in Table 5. The accuracy on the modelling set is significantly lower than that on the validation set, which is tentatively attributed to the number of sample spectra. Of the six goodness-of-fit metrics, the LR model scored highest on three and the CR model on two, indicating that spectral enhancement amplifies hidden spectral information, effectively suppresses useless noise, and improves the inversion fitting accuracy. In terms of model reliability, the four models rank LR > R2 > R1 > CR. In terms of prediction accuracy, they rank LR > CR > R1 > R2.
The scatter plots of predicted versus measured salt concentration in Figure 5(a-d) allow a comparative analysis. For the R1 model, the points deviate relatively strongly from the 1:1 trend line. Under the CR model, the fitted lines of the modelling and validation groups intersect, which reflects some modelling advantage. For the R2 and LR models, the points are evenly distributed on both sides of the fitted line, and the fitted lines of the calibration and validation groups clearly converge, indicating a better correlation.
The predicted concentrations match the measured concentrations closely, and the model is stable. Taken together, the results show that the best SLR inversion model for the soluble salt content of the mural samples is the one based on the logarithm of the reciprocal at the 1415 nm band (RC² = 0.737, RMSEC = 0.135, RPDC = 1.479).

Multiple Linear Regression Modelling Analysis
The feature bands with large contributions listed in Table 3 were entered into the fitting equation as independent variables, and the results were compared with the single-band modelling accuracy. Most of the 10 selected feature bands are concentrated in the visible to near-infrared range, and these explanatory variables contain more useful spectral information. The results of the multiple linear regression models in Table 6 show that accuracy improved in all cases compared with single-band modelling. Up to three explanatory variables were introduced for each spectral transformation. As more variables are included in the model, the coefficient of determination improves, but the RMSE also increases.
The coefficient of determination of the multivariate model built from the continuum-removed spectra at 420 nm, 584 nm, and 2379 nm reached 0.846, the highest among all models, while its residuals were the lowest among the MLR models. With an RPD of 3.24, the fit is considered highly reliable. This is attributed to the amplification of implicit information under CR: the three explanatory variables are widely separated and show low collinearity with one another, yet each is highly correlated with salt concentration. The next best model is based on LR. Considering both linear regressions together, the LR-based modelling is the most stable: its modelling coefficients of determination reached 0.737 and 0.761, respectively, an improvement over those of the original spectra. As for the validation set, the RMSE of all models except the three-variable LR model is below 0.1, so they are considered to have predictive value. Among them, the two-variable regression after continuum removal reached a validation coefficient of determination of 0.926 with an RPD larger than 6, which demonstrates the potential of the model for general applicability.
The scatter plots in Figure 5 were then analysed. The three-variable linear models among the MLR models were plotted, and the overall model fit was interpreted visually from the dispersion of the points relative to the 1:1 trend line. As seen in Figure 5(f), the CR model lies closer to the fitted line in the low concentration range, and its points are more evenly distributed. As seen in Figure 5(g), the LR model is more stable and predicts better at low and medium salt concentrations, and the intersection of the two fitted lines indicates high model stability. At high salt concentrations, on the other hand, the data points are relatively scattered and lie far from the fitted line, so the prediction performance there is only moderate, although the model still has some reference value. Overall, the equation fitted by multiple linear regression to the continuum-removed valley depths at the three characteristic bands of 420 nm, 584 nm, and 2379 nm is the optimal salt concentration prediction model for the mural samples.

DISCUSSION
The variability of the original spectral reflectance across bands is not prominent enough, and strongly correlated spectral information is hidden, making the original spectra difficult to use directly for modelling. After the CR transformation, the spectral absorption valley features are enhanced and the strong correlation between the bands and salt concentration becomes apparent. Selecting combinations of characteristic bands and fitting the salt concentration linearly to the valley depth gives good predictions. The spectral enhancement approach and modelling tools in this paper are consistent with those of Tan T (2021), who performed continuum removal on smoothed spectral data and then developed a multivariate linear model with good applicability to the inverse prediction of iron oxide in the Da Wei Mountain Forest. MLR also has shortcomings, such as its dependence on the modelling data. In this study, several linear modelling approaches were attempted, and the inversion accuracy was initially explored through band combinations.
The CR-MLR model may provide a new approach to predicting the salt concentration of murals in the future.
The application of hyperspectral salinization detection to murals is a new field that is still at the initial exploration stage. Future research should focus on optimizing the modelling approaches and promoting practical applications.

CONCLUSION
To address the complexity, cost, inefficiency, and destructiveness of existing mural salt detection, we proposed a method to invert the salt content of mural paintings from hyperspectral information. First, mural samples with graded salt concentrations were produced, and their spectra were collected and preprocessed. Then, the salt-sensitive spectral information hidden in the curves was mined by different feature enhancement methods and correlation analysis. Finally, multiple linear regression models built on the high-contribution bands were used to invert the salt concentration in the murals. According to the accuracy indices and variance analysis, the CR-MLR model fitted best, with RC² = 0.846, RMSEC = 0.138, and RPDC = 3.240. This confirms that there is a strong correlation between salt content and spectral features in the murals, and that the hidden information is amplified by preprocessing and spectral enhancement.
This method can be used for the non-destructive quantitative detection of salinized areas of murals. The linear model has some limitations owing to the complex composition of real murals. Nonlinear models incorporating a mural salinization index remain a direction for future research.