PRACTICAL APPROACH FOR HYPERSPECTRAL IMAGE PROCESSING IN PYTHON

ABSTRACT: Python is a very popular programming language among data scientists around the world, and it can also be used for hyperspectral data analysis. There are some toolboxes designed for spectral imaging, such as Spectral Python and HyperSpy, but there is a need for an analysis pipeline that is easy to use and agile enough for different solutions. We propose a Python pipeline built on the packages xarray, Holoviews and scikit-learn. We have also developed some tools of our own: MaskAccessor, VisualisorAccessor and a spectral index library. They, too, fulfill our goal of easy and agile data processing. In this paper we present our processing pipeline and demonstrate it in practice.


INTRODUCTION AND MOTIVATION
Python is a go-to programming language for many scientists, and it could also be a good language for hyperspectral data analysis. It has the advantage of being an actively developed, free, open source language. In addition, since it reads like pseudocode, it is easy to learn and write. There are Python tools and packages for all kinds of users, and especially for scientists. There are specialized open source tools for hyperspectral data analysis, such as Spectral Python (Boggs, n.d.) and HyperSpy (de la Peña et al., 2017), but the scope of such a specialized tool may be too narrow and its structure too strict for some purposes, for example for transferring data to a machine learning algorithm and developing tools that work together with it.
In this paper, we utilize general open source tools for different aspects of hyperspectral data analysis and determine whether they are useful for analysing and visualising hyperspectral images. We also introduce some new tools and packages, which are our own work. We aim to provide the reader with a modular set of tools that can be used in many contexts. These tools are reusable elements that work fine on their own and can be used to build more complex tools. The packages and tools will be evaluated using the following questions: How easy is it to use? How agile is it? What can we do with it?

DIFFERENT ASPECTS OF HYPERSPECTRAL DATA ANALYSIS
In this section we go through different aspects of hyperspectral data analysis and show, with an example, how the selected tools can be used for each of them. The example is divided into smaller examples, and whatever has been done in a previous example is assumed to still hold in the next one. We walk through the example in figures and in text; the source code is included in the figures. The example problem is this: we have a hyperspectral image of a forest and a dataset of two tree species, birch and pine, in that forest, and we want to use machine learning to differentiate one from the other. First we of course need to import all of the packages, as in figure 1.

Handling hyperspectral data
For handling hyperspectral data, we recommend the xarray package (Hoyer and Hamman, 2017). It provides multidimensional arrays and datasets with metadata. It is an actively developed open source project by the pydata team. The basic usage of xarray is relatively easy, and for more advanced users it offers plenty of options for handling the data. Xarray's basic idea is to have a netCDF (Rew et al., 1997) compatible multidimensional array object in Python. NetCDF stands for network Common Data Form, and the basic idea is that a netCDF file describes itself to the reader. Xarray is also easily extendable, which means that one can add new properties as they are needed. For reading files, xarray uses rasterio (Gillies et al., 2013-), which in turn uses GDAL (GDAL Development Team, 2018). Rasterio is a Python toolbox developed solely to read and write geospatial data, and it does it well. GDAL (Geospatial Data Abstraction Library) is a lower level C++ library that translates geospatial raster and vector data.
With a plain array one has to know the integer index, but with DataArray we have names like latitude, longitude and wavelength. Then we can extract data from the DataArray, as in figure 4, by telling it that we want the data where latitude is between 39°N and 40°N, longitude is between 116°E and 117°E, and wavelength is between 400 nm and 700 nm.
cube.sel(latitude=slice(39, 40), longitude=slice(116, 117), wavelength=slice(400, 700))
Figure 4. Here we use xarray's sel method to extract the data we want.
There are also other useful functionalities in xarray's DataArray. For example, two or more arrays can be attached to each other with an easy one-line command, where the user only has to align the arrays by a common dimension. Generally speaking, xarray handles dimensions well, and altering and extracting data using them is quite easy. Xarray also handles missing data well, and there is the possibility of using Dask arrays for parallel computation.
Xarray fulfills our criteria of being easy to use and agile. It has a lot of functionality, enough to keep basic and advanced users satisfied most of the time.
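As a concrete illustration of the label-based selection described above, the following sketch builds a small DataArray with named dimensions (the coordinate values are invented for the example) and extracts a subset with sel:

```python
import numpy as np
import xarray as xr

# A small stand-in cube; the coordinate values are invented for the example
cube = xr.DataArray(
    np.random.rand(4, 5, 3),
    dims=("latitude", "longitude", "wavelength"),
    coords={
        "latitude": [39.0, 39.5, 40.0, 40.5],
        "longitude": [116.0, 116.25, 116.5, 116.75, 117.0],
        "wavelength": [450, 550, 650],  # nm
    },
)
# Label-based selection: slices refer to coordinate values, not integer indices,
# and label slices are inclusive of both endpoints
subset = cube.sel(latitude=slice(39, 40), wavelength=slice(400, 700))
print(subset.shape)  # (3, 5, 3)
```

Note that no integer bookkeeping is needed: the same line works regardless of the order or resolution of the underlying arrays.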

Visualisation
For visualizing xarray data, one excellent solution is Holoviews (Stevens et al., n.d.). Holoviews is a visualization library that uses Bokeh (Bokeh Development Team, 2014), Matplotlib (Hunter, 2007) or Plotly (Plotly Technologies Inc., 2015) for showing images. All figures in this paper are produced with Holoviews, using the Bokeh or Matplotlib visualisation backends.
The basic idea of Holoviews is that visualizing data should be easy and simple. If the user wants to see anything, it should not take many lines of code. In our opinion, Holoviews succeeds in that goal.
As we move on, one will see that all images in this paper are produced with fewer than four lines of code. One basic example of producing a Holoviews image is to look at one band of a hyperspectral image, as in figure 5.
Now that we have figured out how to visualise a single channel of an image, the next logical step is to visualise the entire multidimensional dataset. This is also easy. Holoviews supports multidimensional datasets very well, and there are data backends that support multiple different data formats, including xarray. As we can see in figure 6, a more complex visualisation is easy to make. In the example we make a Holoviews dataset out of an xarray DataArray and tell Holoviews to make a series of images out of the dataset.
One of the properties of Holoviews is that one can make interactive figures using the Bokeh backend with no extra effort. With the Bokeh backend selected, the user can right away use interactive tools like zooming the image, either by scrolling or by drawing boxes on the image. A little more work is required to use the hover, tapping or selection tools, which can all be programmed to do what the user wants. Examples of using the tapping and selection tools are selecting data for further analysis or activating other visualisations with them.
A good platform for using Holoviews is the Jupyter Notebook (Kluyver et al., 2016). Jupyter Notebook is a web application where the user can code in Python, output images and write narrative between code blocks. The user has to activate Holoviews by importing it and declaring the visualisation backend, as in figure 1. When one is visualising figures in Jupyter Notebook, it is possible to fine-tune them by using output cell magic and Holoviews opts. We use output cell magic and opts in figure 6, where the lines starting with % or %% are the cell magic lines. In the example of that figure we tune the font size and the image size. This fine-tuning is absolutely necessary if one needs to produce figures for a publication, or good-looking images for any reason. The Matplotlib backend is better suited for publication quality figures.
Holoviews is purely a visualisation library. The user can make data move between two images in the same visualisation, but the developers have not built a way to get this data out for further use; the only way of getting a data output is by coding it. However, once coded, these background processes are relatively easy to attach to an image. Holoviews is an open source project developed by the ioam team. It is easy to use and can be bent to do many things. It makes beautiful images and is, all in all, an excellent choice for visualisation.

Masking and visualizing xarray
Using xarray and Holoviews together is made easy by the Holoviews developers: xarray is one of the available data backends for Holoviews. That means one can easily produce an image from xarray using Holoviews. There are still some difficulties involved, and to address them we use xarray's extensibility.
Making an extension to xarray, as in figure 7, is done by making a Python class and declaring it as a dataset or DataArray accessor.
Figure 6. Here we make a more complicated Holoviews visualisation by using a Holoviews Dataset. By using a Dataset, we get a slider that goes through the wavelength bands.
These extensions are relatively easy to make and can extend xarray's functionality to anything one might want, within reasonable limits. We have developed two DataArray accessors, MaskAccessor and VisualisorAccessor. The reason for developing both of these tools is that we want to use more complicated background interactivity tracking with Holoviews and get the data out of the visualisation.
MaskAccessor is a general masking tool for xarray, and its main function is to help collect data points for further analysis, such as machine learning or modelling. It provides an interface for selecting pixels in n-dimensional datasets. In figure 8 we see that the accessor is initiated when one imports xarray and the accessor. After that every DataArray has the property, and the user can use the accessor by calling it by name. The mask dimensions are set at initialisation to be the first two dimensions of the DataArray, but there is a reset method that can be used to change the dimensions, as we see in figure 9. One can also initialise the mask here, or just assign a new mask afterwards. The MaskAccessor class checks that the shape of the mask is correct. Finally, there is a histogram method (figure 17) that calculates histograms for each band and shows them side by side. This is translated to Python from the hsicube (Eskelinen, 2017) MATLAB package.
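To make the accessor mechanism concrete, here is a minimal sketch in the spirit of MaskAccessor. The accessor name, its methods and its behaviour are our illustration of the registration pattern, not the published implementation:

```python
import numpy as np
import xarray as xr

# Hypothetical accessor in the spirit of MaskAccessor; the name "mymask" and
# its methods are our illustration, not the published implementation.
@xr.register_dataarray_accessor("mymask")
class MyMaskAccessor:
    def __init__(self, da):
        self._da = da
        # The mask covers the first two dimensions, as described in the text
        self.mask = np.zeros([da.sizes[d] for d in da.dims[:2]], dtype=bool)

    def select(self, i, j):
        """Mark a single pixel as selected."""
        self.mask[i, j] = True

    def selected(self):
        """Return the spectra of the selected pixels as an (n, bands) array."""
        return self._da.values[self.mask]

cube = xr.DataArray(np.random.rand(4, 5, 3), dims=("y", "x", "wavelength"))
cube.mymask.select(0, 0)
cube.mymask.select(2, 3)
print(cube.mymask.selected().shape)  # (2, 3): two pixels, three bands each
```

Because xarray caches accessor instances per object, repeated calls to cube.mymask return the same accessor, so the selections accumulate.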

Machine learning
Machine learning can be handled using the scikit-learn package (Pedregosa et al., 2011). The main idea of scikit-learn is to make simple and efficient tools for data analysis. Part of the simplicity is documentation, and in scikit-learn it is done well. There is a flowchart for finding a suitable estimator, and every estimator is documented so well that one can easily learn to use it decently.
Another thing we want to point out is the variety of implemented algorithms: every well established machine learning algorithm can be found. Still, there are no duplicates, the user does not have to worry about competing implementations, and the API is consistent across the algorithms.
Other useful properties are workflows, parallel computing and fine tuning. The user can relatively easily make a workflow that preprocesses the data, does cross-validation on the desired estimators with the desired parameters, and returns the estimator that seems to produce the best result. The basic forms of the estimators are simple, but there are multiple parameters one can use to fine-tune an estimator.
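The workflow idea above can be sketched with scikit-learn's pipeline utilities; the iris dataset and the scaler-plus-SVC combination are stand-ins chosen for illustration, not the features used in the paper:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Chain preprocessing and an estimator into one workflow object, then score
# the whole workflow with 5-fold cross-validation
X, y = load_iris(return_X_y=True)
pipe = make_pipeline(StandardScaler(), SVC())
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```

Because the pipeline behaves like any estimator, it can also be dropped directly into a grid search over its parameters.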
The usage is simple, since the algorithms are well documented and their API is simple yet agile. Scikit-learn is also free and open source, and it is developed by the scikit-learn team. It fulfills our criteria of being easy to use and agile.
Now we can use the tree coordinates to train a machine learning algorithm to recognise birch from pine. We take a 30 × 30 pixel box around every tree and calculate a histogram of the box. These histograms are used to train the algorithm. We also have to turn the nan values into zeros for this. In figure 19 we use VisualisorAccessor to make the histograms and the goal vector, and to prepare them for machine learning.
In figure 20 we train the machine learning algorithm. For this example we use the support vector machine algorithm. We also do cross-validation with GridSearchCV. Both of these are functions from scikit-learn. Then we print out the results, which tell us the best accuracy score and the best parameters.
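A hedged sketch of this step is given below. The synthetic random "boxes", the histogram binning and the parameter grid are all invented for the example; only the overall shape (histogram features into an SVC tuned with GridSearchCV) follows the text:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def box_histogram(box, bins=20):
    """Turn one 30 x 30 box into a feature vector of histogram counts."""
    counts, _ = np.histogram(np.nan_to_num(box), bins=bins, range=(0.0, 1.0))
    return counts

# Synthetic stand-ins for the 30 x 30 boxes cut around birches (0) and pines (1)
boxes = [rng.beta(2 + label, 2, size=(30, 30)) for label in (0, 1) for _ in range(20)]
y = np.repeat([0, 1], 20)
X = np.array([box_histogram(box) for box in boxes])

# Cross-validated grid search over SVC parameters; the grid is illustrative
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale"]}, cv=3)
search.fit(X, y)
print(search.best_score_, search.best_params_)
```

After fitting, search.predict accepts a new histogram vector and returns the predicted species label.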
In figure 21 we use the predictor to predict the species from a 30 × 30 histogram made from a hyperspectral image of a tree. From the result we could then interpret whether the estimator classifies the histogram as a pine or a birch.
Figure 19. Calculating histograms and preparing data for the machine learning algorithm.

Other aspects
Other notable libraries for hyperspectral data analysis are the already mentioned Bokeh for advanced visualisation, scikit-image (van der Walt et al., 2014) for image data analysis, and TensorFlow (Abadi et al., 2015) and Keras (Chollet et al., 2015) for deep learning.
Bokeh is a package that has been on the rise in 2017. Bokeh makes interactive Python visualisations using JavaScript. It is a backend of Holoviews, and if one wants to understand Holoviews deeply, this is one place to look. Bokeh visualisations are generally quite beautiful, but this comes at the expense of computational complexity and increased memory usage.
Scikit-image is a sister package of scikit-learn, focused on computer vision and image processing. The same advantages as with scikit-learn apply here: the API is consistent and simple, and the wide variety of algorithms is well curated.
TensorFlow and Keras are deep learning libraries. TensorFlow is considered the state of the art in this field, but its syntax is difficult and the learning curve extremely steep. Keras uses TensorFlow as a backend and offers a simpler syntax. For a beginner in deep learning, Keras is the easier library to get started with, but as one becomes a more advanced user, TensorFlow's flexibility and greater tuning possibilities start to become more attractive.

CONCLUSIONS AND FURTHER WORK
We have gathered and further developed an agile and easy to use pipeline for hyperspectral data analysis in Python. The tools we have investigated have a wide range of advantages, such as simple APIs, a variety of different implementations, backends and tools, and extensibility.
In addition, the Python programming language has a large user base and an active developer community, which guarantees that Python keeps up with the needs of scientists. The packages mentioned in this paper are all actively developed and thoroughly tested.
In particular, xarray and Holoviews are good Python tools for hyperspectral data processing and visualisation. These tools seem like they were made for this use, yet they still provide the generality of non-specialised tools. Compared to HyperSpy and Spectral Python, our solution is much more modular and open to extension with new blocks.
Finally, we would like to suggest that, in the context of using Python for hyperspectral data analysis, there is a need to develop a graphical user interface that uses these tools and to find the best practices for utilising deep learning algorithms. We are starting to develop the graphical user interface this summer.
Deep learning algorithms are becoming more and more attractive as more computational power becomes available. The algorithms are computationally intense, but used correctly they provide strikingly good results. These algorithms can be applied to many problems in the field of hyperspectral data analysis, such as object recognition, classification or, for example, analysing the health of a crop.
For this specific toolset there is work to do on parallelisation, since the datasets are huge and parallelisation would make the computations faster.
We have also started to develop a Python library for spectral indices, and it is already quite far along. The leading principle of our implementation is to make a simple implementation of every index on the website indexdatabase.de and to wrap each implementation lightly with features that help in its usage. The point is to make the indices easily computable, so that the user can easily loop through them.
One thing we needed to define was the API for selecting bands. Many of the indices are not defined for exact wavelengths like 745 nm, but rather for red light in general, and the user needs to define this as he or she wishes. For now this is done by declaring the defaults in the form of a Python dictionary. The other thing to consider is how a band is selected. Is it selected only if there is an exact match, that is, the index wants a wavelength of 500 nm and our data has exactly that band, or is there room for approximation? This is still work in progress.
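One possible shape for such an API is sketched below. The default wavelengths, the tolerance, the function names and the NDVI example are all our own assumptions for illustration, not the library's actual interface:

```python
import numpy as np
import xarray as xr

# Hypothetical sketch of the band-selection API: defaults in a dictionary,
# nearest-neighbour matching within a tolerance. Names and values are ours.
DEFAULTS = {"red": 670, "nir": 800}  # nm, illustrative defaults

def band(cube, name, tol=10):
    """Select the band closest to the default wavelength of `name`."""
    target = DEFAULTS[name]
    chosen = cube.sel(wavelength=target, method="nearest")
    if abs(float(chosen.wavelength) - target) > tol:
        raise ValueError(f"no band within {tol} nm of {target} nm")
    return chosen

def ndvi(cube):
    """Normalised difference vegetation index from the selected bands."""
    red, nir = band(cube, "red"), band(cube, "nir")
    return (nir - red) / (nir + red)

rng = np.random.default_rng(0)
cube = xr.DataArray(
    rng.random((2, 2, 4)),
    dims=("y", "x", "wavelength"),
    coords={"wavelength": [550, 668, 805, 900]},
)
result = ndvi(cube)
print(result.shape)  # (2, 2)
```

With the defaults in one dictionary, the user can override a single entry to redefine what "red" means for a particular sensor.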
Another thing we are considering in developing this package is the bands themselves. How are they defined? There is a big difference between a camera that has the same response over an interval around the middle value and a camera that has a more Gaussian response. These differences should somehow be accounted for in software. The response function could be used in the selection, and in-between values could be interpolated from two or more bands based on their responses. The response function is definitely important in precision applications, and this problem needs to be solved.
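One way the response-weighted interpolation mentioned above could look is sketched here, assuming Gaussian band responses. This is our own illustration of the idea, not an implemented feature of the library:

```python
import numpy as np

# Hypothetical sketch: estimate the value at a target wavelength by weighting
# neighbouring bands with Gaussian response functions.
def gaussian_response(center, fwhm, wl):
    """Gaussian band response with the given centre and full width at half maximum."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((wl - center) / sigma) ** 2)

def interpolate_band(values, centers, fwhm, target):
    """Response-weighted average of band values at the target wavelength (nm)."""
    weights = gaussian_response(np.asarray(centers, dtype=float), fwhm, target)
    return float(np.sum(weights * values) / np.sum(weights))

# Two bands centred at 660 nm and 680 nm; estimate the value at 670 nm,
# which falls exactly between them and so weights the bands equally
value = interpolate_band(np.array([0.2, 0.4]), [660.0, 680.0], fwhm=20.0, target=670.0)
print(value)
```

A boxcar (flat) response would be handled the same way, with a different response function plugged in.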

Figure 1. Importing all necessary packages and declaring that Holoviews should use the Matplotlib backend.
60]
cube.coords['wavelength'] = ('band', wavelength)
cube = cube.swap_dims({'band': 'wavelength'})
cube.values[cube.values < 0] = np.nan

Figure 2. Here we read the cube, attach wavelength data to it and remove non-physical negative values.

Figure 3. A simple print command to see what the cube holds inside.
Figure 5. Here we produce a very simple Holoviews visualisation by telling a Holoviews Image to use the xarray data we provide. This is an image of a Finnish forest.
%%opts Image [fontsize={'title': 15, 'xlabel': 15, 'ylabel': 15, 'ticks': 15}, figsize=350]
ds = hv.Dataset(cube, vdims=['Value'])
ds.to(hv.Image, kdims=['x', 'y'], dynamic=True)
Figure 7. Extending xarray with a simple Accessor. Here we declare that CatAccessor is a DataArray accessor and define it.
Figure 8. When the accessor is imported, every xarray DataArray created after that has the accessor attribute.

Figure 9. Different ways of assigning a specific matrix as the mask. On figure 10 one can see four different selection methods to set the mask on individual points.
Figure 13. VisualisorAccessor has three different chooser tools.

Figure 14. Screenshot of the point chooser.

Figure 15. Screenshot of the box chooser.
Figure 17. Visualisation of the histogram. The histogram tool returns an image of the histogram, the values of the histogram and the bin edges.
Figure 18. Visualisation of the trees over the image. In this block we read the tree data as a pandas DataFrame and visualise the trees on top of one band of the cube.
Figure 20. Here we train the machine learning algorithm and print out the results. Here we see that the best estimator predicts correctly 96.6% of the time.
Figure 22. The initialisation process of the spectral index library. This is still work in progress.
Figure 23. We loop through the indices and take those that are sensible.