AN INTEGRATED MINERAL SPECTRAL LIBRARY USING SHARED DATA FOR HYPERSPECTRAL REMOTE SENSING AND GEOLOGICAL MAPPING

A mineral spectral library (MSL) is a foundation of hyperspectral remote sensing and an important tool for storing and managing massive mineral spectral data, so that unknown rocks and minerals can be matched or identified conveniently and quickly. However, mineral spectral data are scattered across different spectral libraries worldwide, which differ in spectral resolution, mineral categories and measurement parameters; this hinders their application in field investigation, mineral identification, land cover identification and geological mapping. An integrated MSL using shared data is currently being developed at Central South University, China, to improve the properties of MSL. We collected shared spectral data and related information (e.g., mineral attribute data and spectrometer information) worldwide, applied data cleaning measures to retain the qualified spectral data, consolidated all the data in a common framework to establish a reliable and comprehensive dataset, and developed an integrated MSL for data management and diversified applications. The user can analyse the target spectrum using spectral absorption characteristic parameters, and match a measured spectral curve against the reference spectra in the integrated MSL to find the most similar curve. Notably, a new spectrum classifier was designed to limit the matching scope and thereby improve identification efficiency when specific information about the experimental sample is lacking. The integrated MSL is developed in B/S and C/S environments. The functions of the integrated MSL and its preliminary applications are demonstrated in this article.


INTRODUCTION
Hyperspectral remote sensing started in the 1980s. Its sensors acquire spectral information in hundreds of very narrow bands across the wavelength range from visible to infrared light (Magendran, Sanjeevi, 2014), which is equivalent to a complete and continuous spectral curve. Hyperspectral remote sensing was first proposed to investigate objects on the Earth's surface and has been successfully applied in the geological field (Goetz, 1981). Nowadays, researchers often obtain the reference spectra of ground objects from hyperspectral remote sensing images and establish a spectral library for spectral analysis and matching of unknown features (Mulder et al, 2013).
A mineral spectral library (MSL) is a collection of mineral spectral data obtained with multispectral and hyperspectral remote sensing instruments. It enables rapid recognition of unknown rocks and minerals and supports the management and analysis of spectral data. Because hyperspectral remote sensing produces massive spectral data and demands refined analysis, MSL plays an increasingly important role in identifying unknown rocks and minerals in laboratory tests and field investigations. However, mineral spectral data are scattered across different spectral libraries worldwide (Tong et al, 1998; Grove, 1992; Clark, 1990), which differ in spectral resolution, mineral categories and measurement parameters; this hinders their application in field investigation, mineral identification, land cover classification and geological mapping. Meanwhile, most spectral measurements of the features are not standardized, and parameters such as test conditions, particle size and geodesic structures are not recorded in detail (Li, 2008). Consequently, these irregular or disordered data are difficult for researchers to use as an identification standard.
There are several representative digital spectral databases in the world, such as those developed by the Jet Propulsion Laboratory (JPL), the United States Geological Survey (USGS) and the Johns Hopkins University (JHU) (A.M. Baldridge, 2009). Since the 1970s, research teams have developed more than 10 featured spectral libraries (Tian, Gong, 2002). The basic information of these representative spectral libraries is shown in Table 1. The spectral libraries are also linked to remote sensing data processing software such as ENVI and ERDAS (Li, 2006).

Table 1. Basic information of representative spectral libraries (columns: Spectral library, Institution)
The current trend in spectral library development is not only to accumulate spectral data, but also to use data mining and in-depth analysis techniques to gradually establish a reliable spectral knowledge base and discover universal laws, providing more and better support for remote sensing applications (Zhang et al, 2017). Therefore, we collected the shared spectral data worldwide and consolidated all the data in a common framework to establish a reliable and comprehensive dataset, and developed an integrated MSL for data management and diversified applications. The integrated MSL is currently being developed at Central South University, China, to improve the properties of MSL. The chief functions of the integrated MSL and its preliminary applications are demonstrated in the following sections.

Shared Data
The integrated MSL contains mineral spectral data from several shared spectral libraries, including USGS, JHU, JPL, the PDS Geosciences Spectral Library, the Janice Bishop Spectral Library and the Hyper-spectral Image Processing and Analysis System (HIPAS) (Table 2). The specific information of these spectral libraries is introduced below. The wavelength range of the USGS spectral library is 0.2 to 3.0 μm, and it contains 2468 spectra in seven categories: artificial grounds, coatings, liquids, minerals, organic compounds, soil and mixtures, and vegetation. The JPL database contains 3,104 minerals with spectral wavelengths ranging from 0.4 to 2.5 μm; the spectral resolution is 1 nm from 0.4 to 0.8 μm and 4 nm from 0.8 to 2.5 μm. The sample particle size is divided into 3 categories: less than 45 μm, 45 to 125 μm and 125 to 500 μm. The Johns Hopkins University Spectral Library (JHU) is divided into 15 sub-databases according to ground object categories, and its spectral wavelength ranges from 0.3 to 14 μm. The spectral data downloaded from the JHU spectral library are basically consistent with those of the JPL spectral library. The wavelength of the PDS Geosciences Spectral Library ranges from 0.3 to 26.0 μm, and the library stores 380 terrestrial samples with particle size of mm, including minerals, rocks and unconsolidated materials. Janice Bishop's Spectral Library is composed of 82 minerals, 20 rocks and 248 loose materials, with spectral wavelengths ranging from 0.3 to 26 μm.
The infrared spectrogram database is a professional chemistry database whose spectral data range from 0.25 to 5.0 μm. The ground object reflectance spectrum database includes 262 rock and mineral spectral curves with a wavelength range of 0.4 to 1.0 μm. The mineral infrared reflectance spectrum database contains 583 mineral spectra, including sulphides, halides, oxides, hydroxides and so on. The HIPAS database has 125 rock and mineral spectral curves with wavelengths ranging from 0.4 to 2.5 μm. The reflectance spectrum characteristic database contains spectra of 156 kinds of rock, with more than 1600 spectral curves in total.

MSL Structure and Functions
In the requirement analysis and system design phases of MSL development, we designed a complete work chain. The overall work chain of MSL is shown in Figure 1 and includes: 1) adding mineral spectral data and related parameters, 2) checking data quality, 3) pre-processing the spectral data and importing them into the spectral library, and 4) spectral retrieval, analysis and matching tests.
Figure 1. The work chain of MSL
Correspondingly, the main functions of MSL are mineral data import, data cleaning, mineral spectral data management, spectral retrieval, spectral application, database maintenance and backup, and system management (Figure 2). Together, these functions support the updating and application services of the spectral library.
This contribution has been peer-reviewed. https://doi.org/10.5194/isprs-archives-XLIII-B5-2020-69-2020 | © Authors 2020. CC BY 4.0 License.

Unify the semantic meaning of minerals and rocks:
We unify the semantic meaning of minerals and rocks around the world by constructing a reference table of mineral and rock names in both English and Chinese. Through this table, mineral data can be imported and retrieved in different languages. The shared data collected from most databases worldwide were integrated and consolidated in a common framework. More languages can be added to this table, depending on the storage formats of other databases.
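As an illustration of how such a reference table can support multilingual import and retrieval, the following minimal sketch maps any known name to one canonical English name. The table layout, the function names `build_index` and `canonical_name`, and the three sample entries are assumptions made for this example; they are not the schema used in the integrated MSL.

```python
# Hypothetical bilingual reference table; the three rows are examples only.
NAME_TABLE = [
    {"en": "Quartz", "zh": "石英"},
    {"en": "Calcite", "zh": "方解石"},
    {"en": "Kaolinite", "zh": "高岭石"},
]


def build_index(table):
    """Map every known name (any language, case-insensitive) to the English name."""
    index = {}
    for row in table:
        for name in row.values():
            index[name.strip().lower()] = row["en"]
    return index


def canonical_name(name, index):
    """Return the canonical English name, or None if the name is not in the table."""
    return index.get(name.strip().lower())


INDEX = build_index(NAME_TABLE)
print(canonical_name("方解石", INDEX))
print(canonical_name("QUARTZ", INDEX))
```

Adding a language in this sketch only requires one more key per row, which mirrors the extensibility described above.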

Vectorize the images of spectral curve:
Because some spectral libraries store mineral data as images, we developed a software module that vectorizes the images of spectral curves using a thinning algorithm (Shi, 2008) and a non-thinning algorithm (Naccache, 1984). Vectorizing a spectral curve in raster format transforms the set of curve pixels into a set of coordinate pairs whose x-coordinate is the band value and whose y-coordinate is the spectral absorption value. The vectorized line is generally the central skeleton line of the line body in the raster image, and there is only one absorption value for each band value.
The thinning algorithm obtains the centre point of each pixel column on the curve and extracts the skeleton line of the imaged curve. The algorithm first takes a pixel on the curve as the seed point, then tracks and searches the neighbouring pixels and determines the middle pixel of the adjacent pixel column. It then proceeds from this middle pixel and traces the centre points of the remaining pixel columns in a given direction, thus traversing the entire spectral curve and recording the screen position of every centre point (as shown in Figure 3).
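The core idea of taking one centre point per pixel column and converting it to a coordinate pair can be sketched as follows. This is a simplified illustration, not the seed-tracking implementation described above: it scans every column directly, assumes a binary image in which curve pixels equal 1, and assumes the axis ranges of the plot are known. The function names and the sample image are hypothetical.

```python
def column_centres(image):
    """Return (column, centre_row) for every pixel column crossed by the curve."""
    height, width = len(image), len(image[0])
    centres = []
    for col in range(width):
        rows = [r for r in range(height) if image[r][col] == 1]
        if rows:  # skip columns the curve does not cross
            centres.append((col, (rows[0] + rows[-1]) / 2.0))
    return centres


def to_coordinates(centres, width, height, x_range, y_range):
    """Convert pixel positions to (band value, spectral value) pairs."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    coords = []
    for col, row in centres:
        x = x_min + col * (x_max - x_min) / (width - 1)
        y = y_max - row * (y_max - y_min) / (height - 1)  # row 0 is the top edge
        coords.append((x, y))
    return coords


# Tiny 4 x 4 example image; the curve is two pixels thick in every column.
IMAGE = [
    [0, 0, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
]
CENTRES = column_centres(IMAGE)
COORDS = to_coordinates(CENTRES, 4, 4, x_range=(0.4, 1.0), y_range=(0.0, 0.3))
print(CENTRES)
print(COORDS)
```

Because each column yields at most one centre point, the result has exactly one spectral value per band value, as required of the vectorized line.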
The non-thinning algorithm extracts the boundary of the imaged curve (as shown in Figure 4). It first finds the contour of the line body, then determines the corresponding points on its two parallel edges and calculates the midpoint of each pair of points. We use a 2×2 window to match the edge of the curve (Fu, 2004) and choose the direction of movement according to the state of the four adjacent pixels in the window, until the traversal returns to the starting point.
To analyse the reliability of the vectorized results, this paper takes the spectral coordinates in the database as true values, compares them with the vectorized coordinates, and uses the coefficient of determination ($R^2$), the root mean square error (RMSE) and the mean absolute error (MAE) as quantitative indexes (as shown in Figure 5). The formulas for calculating these indexes are as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - y_{0i})^2}{\sum_{i=1}^{n} (y_{0i} - \bar{y}_0)^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - y_{0i})^2}$$
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - y_{0i} \right|$$
where $n$ = the total number of elements, $y_i$ = vectorized coordinate, $y_{0i}$ = true coordinate, and $\bar{y}_0$ = mean of the true coordinates.
The precision indexes of the thinning algorithm are: $R^2$ = 0.99996, RMSE = 2.8778e-04, MAE = 2.1365e-04. The precision indexes of the non-thinning algorithm are: $R^2$ = 0.99993, RMSE = 3.9611e-04, MAE = 3.1316e-04. The $R^2$ values of the two algorithms are close to 1. The value represented by each pixel on the vertical axis is 3.75e-04, so the error of the absorption value is about one pixel or less. The resolution of the extracted digital curves reaches 2 nm and the spectral absorption accuracy reaches pixel level.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B5-2020, 2020 XXIV ISPRS Congress (2020 edition)

Mineral coding:
Minerals and rocks related to the shared spectral data are coded according to general classification rules and mineral coding rules, which are based on mineral chemical compositions and can improve the efficiency of mineral spectral matching. The MSL first divides minerals into categories and classes based on their chemical compositions, and then subdivides the minerals of each class into groups whose members have similar chemical compositions and the same crystal structure. We established a mineral coding principle for this classification scheme. Part of the coding scheme is shown in Table 3.

Data Cleaning
We clean all the data before importing them into the integrated MSL. The mineral spectral coordinates downloaded from shared spectral databases, the spectral data vectorized from imaged curves, and the sample spectra measured at CSU must all be screened and checked for qualification (Fan, 2011).

Measures for individual spectrum:
Individual spectra were examined using the boxplot algorithm and signal-to-noise ratio (SNR) detection. The boxplot shows the overall variation range of the data and the extreme values they contain. It relies on the robustness of the median and quartiles and can be used to detect gross errors and outliers (as shown in Figure 6). The SNR is an important index of data quality. A precise SNR value is difficult to obtain, but the standard deviation and mean value of the spectral data can be used to approximate it.
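Both checks can be sketched in a few lines. The sketch below assumes the conventional boxplot fences at 1.5 times the interquartile range and approximates the SNR as the ratio of mean to standard deviation; the paper states neither the fence factor nor the exact SNR formula, so both are assumptions, as are the function names and the sample values.

```python
import statistics


def boxplot_outliers(values, k=1.5):
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR] (k = 1.5 is assumed)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def approximate_snr(values):
    """Approximate SNR as mean / standard deviation (an assumed definition)."""
    sigma = statistics.stdev(values)
    return float("inf") if sigma == 0 else statistics.mean(values) / sigma


# Hypothetical reflectance samples with one gross error at index 6.
REFLECTANCE = [0.30, 0.31, 0.29, 0.30, 0.32, 0.31, 0.95, 0.30, 0.29, 0.31]
OUTLIERS = boxplot_outliers(REFLECTANCE)
CLEANED = [v for i, v in enumerate(REFLECTANCE) if i not in OUTLIERS]
print(OUTLIERS)
print(approximate_snr(REFLECTANCE), approximate_snr(CLEANED))
```

Removing the flagged point raises the approximate SNR, which is the behaviour expected of a spectrum containing a gross error.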

Measures for multiple spectra:
For multiple sets of spectral data of the same mineral measured with the same spectrometer, three indicators are used to check whether the data are qualified: the internal conformity accuracy, the position offset of the wavelength of the main absorption peak, and the number of error points in the spectral curve.
Data obtained from the same spectral instrument under the same observation conditions should have the same or similar characteristics. If the data measured by the instrument fluctuate greatly, the instrument needs to be calibrated; the measured data are usable only when the instrument is in a relatively stable state. To check the stability of the measuring instrument, the internal conformity accuracy is evaluated under the same observation conditions. First, an average reflectivity curve is calculated from a series of repeated observations; then the internal conformity accuracy of each repeated observation is calculated from its reflectivity curve and the average curve.
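The two steps can be sketched as follows. The paper does not give the formula for the internal conformity accuracy, so this sketch assumes it is the root mean square deviation of each repeated curve from the average curve; the function names and the sample curves are hypothetical.

```python
import math


def mean_curve(curves):
    """Average reflectivity at each sampling wavelength over repeated observations."""
    count = len(curves)
    return [sum(values) / count for values in zip(*curves)]


def internal_conformity(curves):
    """RMS deviation of each curve from the average curve (assumed definition)."""
    average = mean_curve(curves)
    accuracies = []
    for curve in curves:
        squared = sum((v - a) ** 2 for v, a in zip(curve, average))
        accuracies.append(math.sqrt(squared / len(curve)))
    return accuracies


# Three hypothetical repeated observations sampled at the same wavelengths.
CURVES = [
    [0.30, 0.40, 0.50],
    [0.32, 0.42, 0.52],
    [0.28, 0.38, 0.48],
]
ACCURACY = internal_conformity(CURVES)
print(ACCURACY)
```

A curve whose value is much larger than those of the other repeats would indicate an unstable measurement.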
We locate the wavelength of the main absorption peak in each of several spectral curves and calculate the difference between each curve's peak wavelength and the mean peak wavelength of all curves. This index, called the position offset of the wavelength of the main absorption peak, detects spectral curves that deviate markedly from the other curves in the same dataset.
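A minimal sketch of this index is given below. It assumes that the main absorption peak is the global reflectance minimum of each curve, which is a simplification; the function names and the sample data are hypothetical.

```python
def main_absorption_wavelength(wavelengths, reflectance):
    """Wavelength of the main absorption, taken here as the reflectance minimum."""
    index = min(range(len(reflectance)), key=reflectance.__getitem__)
    return wavelengths[index]


def position_offsets(wavelengths, curves):
    """Offset of each curve's main absorption wavelength from the mean position."""
    peaks = [main_absorption_wavelength(wavelengths, c) for c in curves]
    mean_peak = sum(peaks) / len(peaks)
    return [p - mean_peak for p in peaks]


# Hypothetical repeated curves; the third has its absorption shifted to 2.25 um.
WAVELENGTHS = [2.10, 2.15, 2.20, 2.25, 2.30]
CURVES = [
    [0.50, 0.40, 0.20, 0.40, 0.50],
    [0.51, 0.41, 0.21, 0.41, 0.51],
    [0.50, 0.42, 0.45, 0.20, 0.50],
]
OFFSETS = position_offsets(WAVELENGTHS, CURVES)
print(OFFSETS)
```

The curve with the largest absolute offset is the candidate for rejection.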
The random error of a single measurement follows no rule, but the errors of multiple measurements as a whole obey a statistical distribution, such as the normal distribution, t-distribution, triangular distribution or uniform distribution. For repeated measurements of the same object by the same instrument in the same period of time, the repeated reflectance values at each sampling wavelength can be regarded as a group of values conforming to the normal distribution. Based on this distribution, the data point of each curve at each sampling wavelength is checked one by one to determine whether it falls within the error limit. If the error limit is not exceeded, the reflectivity value is considered qualified; otherwise, it is considered unqualified. Finally, the qualified points of each curve are counted.
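The point-by-point check can be sketched as follows. The error limit is not specified in the text, so the sketch assumes a limit of k sample standard deviations around the mean at each wavelength, with k = 2 chosen only for illustration; the function name and the sample curves are hypothetical.

```python
import statistics


def count_qualified(curves, k=2.0):
    """Count, per curve, the points lying within mean +/- k*sigma at their wavelength."""
    counts = [0] * len(curves)
    for values in zip(*curves):  # all repeated values at one sampling wavelength
        mu = statistics.mean(values)
        sigma = statistics.stdev(values)
        for index, value in enumerate(values):
            if abs(value - mu) <= k * sigma:
                counts[index] += 1
    return counts


# Six hypothetical repeated curves; the last one has a spike at the second wavelength.
CURVES = [
    [0.30, 0.40, 0.50],
    [0.31, 0.40, 0.51],
    [0.29, 0.40, 0.49],
    [0.30, 0.40, 0.50],
    [0.31, 0.40, 0.51],
    [0.29, 0.60, 0.49],
]
QUALIFIED = count_qualified(CURVES)
print(QUALIFIED)
```

Note that with n repeated curves the largest possible deviation is (n - 1)/sqrt(n) sample standard deviations, so the limit k has to be chosen with the number of repeats in mind.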

Remove duplicate data:
Curve correlation coefficients (e.g., the Pearson correlation coefficient and the angle cosine) of the spectral data were calculated to pick out highly similar spectral data, and the most representative spectrum was selected and imported into the integrated MSL. In this paper, several correlation coefficients are used to determine which curves are highly similar to each other so that redundant spectral curves can be removed.
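A minimal sketch of this step is shown below. It assumes that two curves are duplicates when both the Pearson coefficient and the angle cosine reach a threshold (0.999 here, chosen only for illustration) and it simply keeps the first curve of each duplicate group, whereas the paper selects the most representative one by a criterion it does not detail. All names and data are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally sampled curves."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    return 0.0 if denom == 0 else cov / denom


def angle_cosine(a, b):
    """Cosine of the angle between two curves treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 0.0 if denom == 0 else dot / denom


def remove_duplicates(curves, threshold=0.999):
    """Indices of curves kept after dropping near-duplicates of earlier kept curves."""
    kept = []
    for i, curve in enumerate(curves):
        duplicate = any(
            pearson(curve, curves[j]) >= threshold
            and angle_cosine(curve, curves[j]) >= threshold
            for j in kept
        )
        if not duplicate:
            kept.append(i)
    return kept


CURVES = [
    [0.50, 0.40, 0.20, 0.40, 0.50],
    [0.51, 0.41, 0.21, 0.41, 0.51],  # near-copy of the first curve
    [0.20, 0.40, 0.50, 0.40, 0.20],
]
KEPT = remove_duplicates(CURVES)
print(KEPT)
```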

Spectrum Classifier
We developed a new spectrum classifier based on data characteristics to limit the matching scope when an unknown mineral spectrum is matched with the target spectra in the integrated MSL. In general, the absorption positions of a mineral spectrum reflect the components of the mineral, while other absorption characteristic parameters such as absorption depth and absorption area reflect the mineral content (Pu, 2000). Hence, multiple absorption positions can be used to distinguish mineral classes and to establish a decision tree that classifies the mineral spectra in the integrated MSL.
The spectrum classifier first calculates the absorption positions of each mineral spectrum, and then intersects the characteristic values of multiple curves of the same mineral to determine the most representative combination of absorption positions (Su, 2008). The combinations of absorption positions of different minerals from the same mineral category or class are in turn intersected to determine a new group of absorption positions and obtain a larger tree node. Establishing the decision tree also requires the mineral classification table in MSL.
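The intersection steps can be illustrated with plain sets. The sketch below assumes that absorption positions are local reflectance minima rounded to 10 nm bins and that an unknown spectrum is a candidate for a class when it contains all positions of that class node; these choices, the function names and all numerical values are hypothetical and do not reproduce the decision tree of the integrated MSL.

```python
def absorption_positions(wavelengths_nm, reflectance, bin_nm=10):
    """Local reflectance minima, rounded to bins of bin_nm nanometres."""
    positions = set()
    for i in range(1, len(reflectance) - 1):
        if reflectance[i] < reflectance[i - 1] and reflectance[i] < reflectance[i + 1]:
            positions.add(int(round(wavelengths_nm[i] / bin_nm)) * bin_nm)
    return positions


def common_positions(position_sets):
    """Intersection of several sets of absorption positions."""
    sets = [set(s) for s in position_sets]
    return set.intersection(*sets) if sets else set()


def candidate_classes(unknown_positions, class_signatures):
    """Classes whose signature positions all occur in the unknown spectrum."""
    return [
        name
        for name, signature in class_signatures.items()
        if signature and signature <= unknown_positions
    ]


# Step 1: intersect repeated curves of the same mineral (hypothetical positions, nm).
MINERAL_A = common_positions([{1400, 1900, 2200}, {1400, 2200}, {1400, 2200, 2350}])
MINERAL_B = common_positions([{1400, 2160, 2200}, {1400, 2160, 2200, 2380}])
# Step 2: intersect minerals of the same class to obtain the larger tree node.
CLASS_SIGNATURES = {
    "class_1": common_positions([MINERAL_A, MINERAL_B]),
    "class_2": {2340},
}
# Step 3: classify an unknown spectrum sampled in two narrow windows.
UNKNOWN = absorption_positions(
    [1390, 1400, 1410, 2190, 2200, 2210],
    [0.60, 0.40, 0.58, 0.55, 0.30, 0.52],
)
print(candidate_classes(UNKNOWN, CLASS_SIGNATURES))
```

Only the spectra belonging to the returned classes would then need to be searched during matching.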
The decision tree of MSL established by the classifier is used to determine the class of unknown minerals and rocks. The spectrum classifier was applied to several known classes of minerals and rocks, and the tests show that the classification results are identical to the real classes. The decision tree of the experimental samples is shown in Figure 7. The presented spectrum classifier limits the spectral matching scope and thus greatly improves matching efficiency compared with searching the entire database.

MSL Application Modules
We developed function modules for applications in remote sensing and geological mapping. The two main application modules are spectral matching and spectral analysis.

Spectral analysis:
The spectral analysis module applies both to curves stored in MSL and to unknown spectral data. It can calculate several absorption waveform parameters of a spectral curve (e.g., absorption position, absorption width, absorption height, absorption area, number of absorption peaks, absorption symmetry, spectral slope and spectral absorption index) (Wang, 1996), and highlight the characteristics of the curve through preprocessing operations (i.e., spectral derivative and continuum removal). The waveform parameters help users to further analyse the composition and content of the sample minerals.
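Continuum removal and two of the waveform parameters can be sketched as follows. The sketch assumes the common convex-hull definition of the continuum (the upper hull of the spectrum, linearly interpolated) and takes the absorption position and depth from the minimum of the continuum-removed curve; the module in the integrated MSL may differ in detail, and the function names and sample spectrum are hypothetical.

```python
def upper_hull(points):
    """Upper convex hull of (wavelength, reflectance) points sorted by wavelength."""
    hull = []
    for p in points:
        while len(hull) >= 2:
            (ox, oy), (ax, ay) = hull[-2], hull[-1]
            cross = (ax - ox) * (p[1] - oy) - (ay - oy) * (p[0] - ox)
            if cross >= 0:  # last hull point lies on or below the new segment
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def continuum_removed(wavelengths, reflectance):
    """Divide the spectrum by its hull continuum (wavelengths strictly increasing)."""
    hull = upper_hull(list(zip(wavelengths, reflectance)))
    result, seg = [], 0
    for x, y in zip(wavelengths, reflectance):
        while seg < len(hull) - 2 and x > hull[seg + 1][0]:
            seg += 1
        (x0, y0), (x1, y1) = hull[seg], hull[seg + 1]
        continuum = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        result.append(y / continuum)
    return result


def absorption_position_and_depth(wavelengths, removed):
    """Position and depth of the deepest absorption in a continuum-removed curve."""
    index = min(range(len(removed)), key=removed.__getitem__)
    return wavelengths[index], 1.0 - removed[index]


WAVELENGTHS = [2.0, 2.1, 2.2, 2.3, 2.4]
REFLECTANCE = [0.60, 0.55, 0.30, 0.55, 0.60]
REMOVED = continuum_removed(WAVELENGTHS, REFLECTANCE)
POSITION, DEPTH = absorption_position_and_depth(WAVELENGTHS, REMOVED)
print(REMOVED)
print(POSITION, DEPTH)
```

For this sample spectrum the continuum is the straight line joining the two end points, so the continuum-removed curve equals 1 at both ends and dips at 2.2 μm.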
The spectral search module finds the target spectrum in the integrated MSL and displays its spectral curve when constraints such as the mineral name (in English and/or Chinese) or the database name are entered.

Spectral matching:
The spectral match module uses a variety of frequently used algorithms (e.g., BCM, SAM, SCF, SAI, etc.).
The binary code mapper method (BCM) compares the coding vector of the spectrum to be measured with the coding vector of the reference spectrum, and expresses the difference between them by a matching coefficient. After binary coding, the spectrum is simplified to a sequence of 0s and 1s, which greatly improves the efficiency of matching recognition (Jia, 1993). The spectral angle mapper method (SAM) regards the spectral curve as a multidimensional space vector and uses the cosine of the angle between the spectrum to be measured and the reference spectrum to express their similarity (Bough, 1998). A larger cosine means a higher similarity between the two spectra. When the cosine is equal to 1, we can judge that they are the same mineral. The correlation coefficient is calculated by SCF (Zhang, 2003) and used to measure the linear correlation of two spectral curves. The formula for calculating the correlation coefficient is as follows.
$$r = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$
where $\sigma_x$ = the standard deviation of the spectrum to be measured, $\sigma_y$ = the standard deviation of the reference spectrum, and $\sigma_{xy}$ = the covariance between the spectrum to be measured and the reference spectrum.
The spectral absorption index technology (SAI) calculates the spectral absorption index using parameters of the absorption band such as the wavelength position (P), depth (H), width (W), slope (K), degree of symmetry (S) and area (A), and compares the reference spectrum with the pixel spectrum according to the similarity of their absorption indexes (Wang, 1996). The calculation formula of the spectral absorption index (SAI) is as follows.
$$\mathrm{SAI} = \frac{d\,\rho_l + (1 - d)\,\rho_r}{\rho_m}, \qquad d = \frac{\lambda_r - \lambda_m}{\lambda_r - \lambda_l}$$
where $\rho_m$ = absorption valley reflectivity, $\rho_l$ = left endpoint reflectivity, $\rho_r$ = right endpoint reflectivity, $\lambda_l$ = left endpoint wavelength, $\lambda_r$ = right endpoint wavelength, and $\lambda_m$ = absorption valley wavelength.
The spectral matching range between the reference spectrum and the unknown spectrum, together with the matching similarity level, is used to construct a reliability index, which determines the most reliable matching result of each algorithm. Users of the integrated MSL can select a single algorithm, or run all the algorithms and compare their results to determine the most credible one and obtain the best-matched reference spectrum.
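The four similarity measures can be sketched together as follows. Several details are assumptions made for this example: the binary code sets a band to 1 when its reflectance exceeds the spectrum mean and the matching coefficient is the fraction of equal bits; the SAI takes the commonly used form of the linearly interpolated shoulder reflectance divided by the valley reflectance; and `best_match` simply returns the reference with the highest score rather than the reliability index described above. The library entries `ref_a` and `ref_b` are hypothetical.

```python
import math


def binary_code(spectrum):
    """Encode each band as 1 if its value exceeds the spectrum mean, else 0."""
    mean = sum(spectrum) / len(spectrum)
    return [1 if v > mean else 0 for v in spectrum]


def bcm_similarity(target, reference):
    """Fraction of bands whose binary codes agree (assumed matching coefficient)."""
    a, b = binary_code(target), binary_code(reference)
    return sum(x == y for x, y in zip(a, b)) / len(a)


def sam_cosine(target, reference):
    """Cosine of the spectral angle between two spectra."""
    dot = sum(x * y for x, y in zip(target, reference))
    norm = math.sqrt(sum(x * x for x in target)) * math.sqrt(sum(y * y for y in reference))
    return dot / norm


def correlation(target, reference):
    """Correlation coefficient: covariance divided by both standard deviations."""
    n = len(target)
    mx, my = sum(target) / n, sum(reference) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(target, reference))
    sx = math.sqrt(sum((x - mx) ** 2 for x in target))
    sy = math.sqrt(sum((y - my) ** 2 for y in reference))
    return 0.0 if sx * sy == 0 else cov / (sx * sy)


def sai(l_left, r_left, l_valley, r_valley, l_right, r_right):
    """Spectral absorption index from two shoulders and the valley (assumed form)."""
    d = (l_right - l_valley) / (l_right - l_left)
    return (d * r_left + (1 - d) * r_right) / r_valley


def best_match(target, library, measure):
    """Name of the library spectrum with the highest similarity to the target."""
    return max(library, key=lambda name: measure(target, library[name]))


LIBRARY = {
    "ref_a": [0.50, 0.45, 0.20, 0.45, 0.50],
    "ref_b": [0.20, 0.45, 0.50, 0.45, 0.20],
}
TARGET = [0.46, 0.41, 0.17, 0.40, 0.45]
for measure in (bcm_similarity, sam_cosine, correlation):
    print(measure.__name__, best_match(TARGET, LIBRARY, measure))
print(sai(2.0, 0.6, 2.2, 0.3, 2.4, 0.6))
```

Running all measures on the same target and comparing the returned names corresponds to the multi-algorithm comparison offered to users of the integrated MSL.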

System Development
The C# programming language and SQL Server are used to develop the C/S client and to manage the mineral spectral data of MSL. The Visual Studio platform combined with a SQL Server database enables effective management, retrieval and application of massive rock and mineral spectral data.

System interface and function demonstration
The management functions of the integrated MSL for mineral spectral data include adding, deleting, modifying and querying. In addition, data analysis of reference and unknown spectra as well as the search and matching of unknown spectra are implemented. The main interfaces and functions of the system are as follows.
The login interface is shown in Figure 8. The user logs into the system with an account and password. The system determines the user rights according to the user information table in the database. A user logged in as an ordinary user can view the spectral database and analyse the spectral absorption characteristic parameters to identify an unknown spectrum. A user logged in as the administrator can also manage the spectral database.
The data import interface is shown in Figure 9. Only the administrator can import mineral-related data as files or folders, such as mineral spectral data, vectorized coordinates, mineral attribute data, mineral category information and spectrometer information. The users of MSL can view the key parameters of the imported data.
The data cleaning interface is shown in Figure 10. The interface consists of four parts: the pre-processing module, the data quality check module, the duplicate data removal module and the data classification module. The function of this interface is to process the data and import the qualified data into the library.
Figure 10. Interface of data cleaning
The data search interface shown in Figure 11 includes the spectral data search module and the attribute data search module. According to the information entered by the user, the target data are found by fuzzy query and the corresponding mineral spectrum is displayed. The user can check the detailed coordinates of the curve and jump to the spectral analysis interface by clicking the curve node.
Figure 11. The data search interface
The spectral data analysis interface shown in Figure 12 can calculate the absorption waveform parameters and show the spectral curve with its envelope. The user can mark the corresponding absorption peak positions as points or lines on the interface, or add the envelope-removed spectral curve to the interface.
Figure 12. The data analysis interface
The spectral data match interface is shown in Figure 13. The user can import the measured mineral spectral curve (choosing whether or not to remove the envelope), click the match button to compare it with the corresponding spectral curves in the integrated MSL, and obtain the matching correlation coefficients sorted from largest to smallest. Before spectral matching, the user first needs to determine the category of the unknown spectrum. If the data parameters are unknown, the decision tree classification method is used to determine the category. The purpose of this step is to limit the spectral matching scope and improve the matching efficiency.
Figure 13. The data analysis interface

DISCUSSION AND CONCLUSIONS
We designed and developed an integrated mineral spectral library and software with functions of data import, data cleaning, data query and data application. The library was developed in the C# language, and a SQL Server database was used to manage the massive mineral spectral curves together with the corresponding description and observation information. This integrated MSL aims to improve the sharing of mineral spectral information and to promote the application of remote sensing spectral data. We collected the shared spectral data worldwide and retained the qualified data to establish a reliable and comprehensive dataset, and developed an integrated MSL for data management and diversified applications. The user of this integrated MSL can conduct a custom analysis of the spectral absorption characteristic parameters, and compare the measured spectrum with the reference spectra to obtain the matching correlation coefficient and find the most similar spectral curve. The integrated MSL can also serve as a reference for establishing other thematic spectral libraries of typical features.
Because of the wide scope of the research and the complexity of the problems, several deeper issues need further study.
Firstly, the integrated MSL should be developed in a B/S website environment, which can promote spectral applications. Secondly, the mapping function for remote sensing images should be considered in the development of MSL. Thirdly, spectral analysis should not only calculate spectral absorption characteristic parameters, but also be combined with a knowledge base to provide a basis for the comprehensive analysis of the composition and content of unknown minerals.