IMAGE CLASSIFICATION FOR MAPPING OIL PALM DISTRIBUTION VIA SUPPORT VECTOR MACHINE USING SCIKIT-LEARN MODULE

The world has been alarmed by the effects of global warming. Global warming harms the environment and, in turn, threatens the Earth's lifespan. Reducing its effects within a short period is a challenging task, since the human population is growing along with the demand for electricity and energy. Renewable energy offers an alternative way to produce energy without harming the environment. Oil palm is an agricultural crop that produces a large amount of biomass, which can be processed and used as a renewable energy source. In 2016, Malaysia reported over 5 million hectares of land covered by oil palm plantations. As the second largest oil palm producer in the world, Malaysia has an advantage in producing renewable energy sources. However, there is a need to monitor the sustainability of oil palm plantations in Malaysia via effective mapping approaches. This study used two platforms (open source and commercial) to map oil palm with a machine learning algorithm, namely Support Vector Machine (SVM). An open source, Python-based technique utilising the Scikit-learn module was used to map the oil palm distribution, and the result had an overall accuracy of 91.39%. To support and validate the Python-based image classification, the same SVM algorithm was applied in a commercial remote sensing software (ENVI), which produced an overall accuracy of 98.21%.


INTRODUCTION
Today, the energy crisis has become a serious issue, especially for developing countries, where energy demand is increasing as the population grows (Mekhilef et al., 2011; Ong et al., 2011). Mekhilef et al. (2011) stated that an alternative is needed to replace the use of fossil fuels to generate energy, because fossil fuels cannot be sustained in the near future given their impact on the environment. Malaysia is blessed with a humid, tropical climate, which has helped make it the second largest oil palm producer in the world.
Palm oil is one of the major vegetable oils and is used worldwide. Furthermore, oil palm is one of the biomass resources that can be used as a source of energy (Loh, 2017). Biodiesel extracted from palm oil is biodegradable, safe, and non-toxic, which makes it suitable as a renewable energy source. In Malaysia, the area of oil palm plantations has increased over the years, and over 5 million hectares were reported in 2016 (Table 1). Therefore, Malaysia has the potential to produce not only a renewable energy source but also cooking oil and other food products (Aziz et al., 2011; Umar et al., 2014; Mba et al., 2015). However, managing such a large area of oil palm plantations is a big challenge, especially when many tasks need to be carried out and properly planned. A proper strategy supported by suitable and adequate information is therefore essential for effective plantation management. Because of the large amount of data required, remote sensing offers an effective way to obtain it.

Table 1. Oil palm area for Malaysia (ha) by year

Remote sensing is the science of acquiring information about an object without making direct contact with it. It has been used in numerous fields and disciplines such as agriculture, urban areas, geography, and land surveying (Joshi et al., 2016; Razali et al., 2016; Norman et al., 2017). Remote sensing can not only acquire data in inaccessible areas but also obtain a huge amount of data in a very short time. It can also collect data from various sensors (active and passive) and platforms, including ground-based, aerial, and satellite platforms. The collected data are normally processed and classified using suitable remote sensing or Geographic Information System (GIS) software such as ERDAS (ERDAS, Inc), ENVI (ITT Visual Information Solutions, Boulder, CO, USA), ArcMap GIS software, and SNAP (Sentinel Application Platform). Such software provides tools for image processing, including image calibration, classification, and accuracy assessment. In other words, the software serves as a platform for image analysis and map making using various approaches and algorithms. The algorithms available for image classification include supervised and unsupervised methods, such as Random Forest (RF), Support Vector Machine (SVM), and Maximum Likelihood Classifier (MLC).
Li et al. (2015) conducted a study on mapping oil palm in Cameroon using PALSAR 50 m orthorectified mosaicked images. The study utilised SVM, Decision Tree (DT), and K-Means algorithms for image classification. Among these algorithms, SVM was found to be the ideal algorithm for oil palm mapping. Another study on oil palm mapping, conducted by Lee et al. (2016), utilised Landsat data obtained from Google Earth Engine (GEE). GEE is a cloud-based platform that allows the user to perform data acquisition and image analysis. The platform uses JavaScript, so the user must write code to obtain the data from the cloud server. Besides cloud-based platforms, programming languages can also be used directly for remote sensing analysis. Python is a popular language that has been widely used for image classification, machine learning analysis, and deep learning approaches.
Python is a well-known programming language that is widely used in various fields, including data analysis and prediction (Pedregosa et al., 2011; Li et al., 2017). Many Python libraries are available for image analysis, including image pre-processing, image classification, and the production of land use land cover (LULC) maps. In addition, Python-based image classification allows the user to tune the hyperparameters of the algorithm. The flexibility of Python programming allows the user to choose and design the procedure based on the user's needs. Due to the effectiveness of machine learning on multispectral data as mentioned by Shafri (2017), this study used a supervised machine learning algorithm imported from the Scikit-learn module. Owing to its strong performance in previous studies by Peña et al. (2014), Nooni et al. (2014), and Gilbertson et al. (2017), SVM was chosen to map the oil palm distribution. Given the versatility of the Python programming language and the number of libraries it provides, this study was conducted to assess the capability of a programming-based approach using Python version 3.5 to map the oil palm distribution via the SVM algorithm. The result obtained was then compared with that of a well-known commercial remote sensing software, ENVI (Goetz, 2009).

STUDY AREA AND SATELLITE DATA
This study was conducted in Selangor, one of the Malaysian states whose land is covered with oil palm plantations (MPOB, 2017). To test the Python programming-based approach for image classification, a pilot study was conducted in Sepang, which is located in the southern part of Selangor.
The area was chosen because it contains a variety of features and has the least cloud cover. Open source data from the Landsat 8 satellite were used in this study. The scene with the least cloud cover, acquired on 29 March 2016, was used; it comes with 11 bands, including multispectral, panchromatic, and thermal bands. Figure 1 shows the Landsat 8 image of the study area with the combination of bands 4, 3, and 2 (true colour).
To improve the quality of the image, a pan-sharpening technique was applied using the panchromatic band (Gilbertson et al., 2017; Shaharum et al., 2018). This technique increased the spatial resolution from 30 m to 15 m. The near-infrared band has proved successful in differentiating green vegetation from other features, which is helpful for oil palm detection (Candiago et al., 2015; Roy et al., 2016). Apart from the panchromatic band used for image enhancement, only the multispectral bands were used in this study for image classification.

Image Pre-Processing
The downloaded image is a raw image that needs to be corrected. Atmospheric and radiometric corrections were applied to the image in ENVI by converting the Digital Number (DN) values to reflectance values. Each pixel has a different reflectance value depending on the feature it covers. These values were later assessed by the algorithm to classify the features based on the assigned training and testing samples.
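The study performed this correction in ENVI. For readers working entirely in Python, the radiometric part of the step can be sketched with the standard Landsat 8 top-of-atmosphere (TOA) reflectance rescaling; this is an illustrative sketch, not the authors' procedure, and it does not include atmospheric correction. The function name is hypothetical, and the coefficient values below are placeholders that should be read from the scene's MTL metadata file.

```python
import numpy as np

def dn_to_toa_reflectance(dn, mult, add, sun_elevation_deg):
    """Convert Landsat 8 OLI digital numbers to TOA reflectance.

    mult, add and sun_elevation_deg correspond to REFLECTANCE_MULT_BAND_x,
    REFLECTANCE_ADD_BAND_x and SUN_ELEVATION in the scene's MTL file.
    """
    dn = np.asarray(dn, dtype=np.float64)
    # Linear rescaling, then correction for the solar elevation angle.
    reflectance = (mult * dn + add) / np.sin(np.deg2rad(sun_elevation_deg))
    # DN = 0 marks fill (no-data) pixels in Landsat 8 Level-1 products.
    return np.where(dn == 0, np.nan, reflectance)

# Placeholder values for illustration; use the real ones from the MTL file.
dn_band = np.array([[0, 8000], [12000, 20000]], dtype=np.uint16)
rho = dn_to_toa_reflectance(dn_band, mult=2.0e-5, add=-0.1, sun_elevation_deg=60.0)
```

Applying the function band by band yields a reflectance stack that can be passed to the classifier in place of the raw DN values.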

Development of Training Samples
The samples were created as Regions of Interest (ROIs) in ENVI, drawn as square polygons over the selected features. Four classes were created, namely oil palm, built-up/road, non-oil palm, and water. Each class was assigned a certain number of ROIs and a colour. The ROIs were selected based on the high-resolution image from Google Earth. The ROI samples were then exported to TIFF file format so that they could be used to classify the image via the Scikit-learn SVM module in Python.

Libraries in Python for Image Classification
Several libraries such as GDAL, NumPy, Scikit-learn, and Matplotlib were imported into the Python script. Each library has its own functions and capabilities that make it suitable for image classification (Pedregosa et al., 2011). The pre-processed image and the samples were imported using GDAL. To perform the image classification in Python, the samples must be aligned with the georeferenced image. Therefore, to ensure that the samples were placed correctly on their assigned features, they were georeferenced using the satellite image as the reference.
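Once the image and the ROI raster share the same grid, the labelled pixels can be pulled out as a sample matrix for Scikit-learn. The sketch below shows one way to do this with NumPy; it is an assumption about the workflow rather than the authors' script. The function name, the class codes, and the convention that 0 means "unlabelled" are hypothetical, and synthetic arrays stand in for the rasters that would normally be read with GDAL (for example via `ReadAsArray()`).

```python
import numpy as np

def extract_samples(image, roi):
    """Return (X, y) for all labelled pixels.

    image: array of shape (bands, rows, cols).
    roi:   array of shape (rows, cols); 0 = unlabelled, 1..k = class codes.
    """
    if image.shape[1:] != roi.shape:
        raise ValueError("ROI raster must share the image grid")
    mask = roi > 0
    X = image[:, mask].T   # (n_samples, n_bands)
    y = roi[mask]          # (n_samples,)
    return X, y

# Synthetic stand-in for a 3-band, 4 x 4 pixel image and its ROI raster.
rng = np.random.default_rng(0)
image = rng.random((3, 4, 4))
roi = np.zeros((4, 4), dtype=np.uint8)
roi[0, :2] = 1   # hypothetical code for oil palm
roi[3, 2:] = 4   # hypothetical code for water
X, y = extract_samples(image, roi)
```

The shape check guards against the misalignment problem described above: if the ROI raster was exported on a different grid, the function fails early instead of silently pairing labels with the wrong pixels.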

Support Vector Machine
SVM is an advanced machine learning algorithm that separates classes with a hyperplane placed at the maximum distance (margin) from the nearest training samples, known as support vectors (Müller et al., 1997; Mountrakis et al., 2011; Tehrany et al., 2015). It can work well even with a limited number of samples. A number of kernels are available in SVM; the Radial Basis Function (RBF) kernel was chosen to classify the image because previous studies found it to be the best-performing kernel (Foody and Mathur, 2004; Bekios-Calfa et al., 2011). The common parameters of the RBF kernel are gamma and the penalty parameter, and both were tuned in order to produce the best result.
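The tuning of gamma and the penalty parameter (named `C` in Scikit-learn) can be sketched with a cross-validated grid search. The paper does not report how the parameters were searched or which values were tried, so the grid, the five-fold cross-validation, and the synthetic two-band samples below are all assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic two-band samples for two well-separated classes (illustrative only).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.2, 0.03, (60, 2)), rng.normal(0.6, 0.03, (60, 2))])
y = np.repeat([1, 2], 60)

# Candidate values are arbitrary; the paper does not report its search range.
param_grid = {"C": [1, 10, 100], "gamma": [0.1, 1, 10]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

best_params = search.best_params_    # dictionary with the selected C and gamma
best_model = search.best_estimator_  # refitted on all samples with best_params
```

A larger `C` penalises misclassified training samples more heavily, while a larger gamma makes each support vector's influence more local; searching both together avoids tuning one against a poor value of the other.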

Accuracy Assessment
The samples were divided in a 70/30 ratio, whereby 70% of the samples were used to train the classifier and classify the image. The remaining 30% were then used to validate the output, produced in the form of a classified image. The assessment was done using the train_test_split function imported from the Scikit-learn module.
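A minimal sketch of the 70/30 split and the resulting overall accuracy is given below. The stratified split, the fixed random seed, the SVM parameter values, and the synthetic four-band samples are assumptions added for reproducibility of the example; the paper does not state them.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic four-band samples for two classes (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.2, 0.03, (50, 4)), rng.normal(0.6, 0.03, (50, 4))])
y = np.repeat([1, 2], 50)

# 70% of the samples train the classifier, 30% are held out for validation.
# stratify and random_state are assumptions, not settings from the paper.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# C and gamma are placeholder values, not the tuned values from the study.
model = SVC(kernel="rbf", C=10, gamma=10).fit(X_train, y_train)
overall_accuracy = accuracy_score(y_test, model.predict(X_test))
```

Stratifying keeps the class proportions the same in both subsets, which matters when some classes, such as water, have far fewer ROI pixels than others.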

Ground Truthing
Ground truthing was based on the available high-resolution Google Earth image and the LULC map provided by the Department of Agriculture (DOA). These references were used not only as an aid in producing the samples but also to validate the outputs.

RESULTS AND DISCUSSION
The SVM parameters were adjusted and the best output was used to represent the oil palm distribution for the area. To measure the capability of the Python programming-based approach, its result was compared with that obtained by classifying the image with the same algorithm and parameters in a commercial software, ENVI.

Classification of Oil Palm
Four classes (water, non-oil palm, built-up, and oil palm) were classified, and the classified images produced in Python and ENVI were exported to TIFF file format, as shown in Figure 3 and Figure 4 respectively. The area contains a wide range of vegetation and other features, including ponds, buildings, and oil palms. Apart from oil palm, all vegetation and trees were classified as non-oil palm.
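Producing a classified image from a trained model amounts to flattening the band stack into one row per pixel, predicting a class for each row, and reshaping the result back to the image grid. The sketch below illustrates this under stated assumptions: the helper name `classify_image` is hypothetical, the data are synthetic, and writing the result to a TIFF file (done with GDAL in the study) is omitted.

```python
import numpy as np
from sklearn.svm import SVC

def classify_image(model, image):
    """Predict a class code for every pixel of a (bands, rows, cols) array."""
    bands, rows, cols = image.shape
    pixels = image.reshape(bands, rows * cols).T   # (n_pixels, n_bands)
    return model.predict(pixels).reshape(rows, cols)

# Train on synthetic three-band samples for two classes (illustrative only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.2, 0.03, (40, 3)), rng.normal(0.6, 0.03, (40, 3))])
y = np.repeat([1, 2], 40)
model = SVC(kernel="rbf", C=10, gamma=10).fit(X, y)

# Synthetic 5 x 6 image: left half resembles class 1, right half class 2.
image = np.full((3, 5, 6), 0.2)
image[:, :, 3:] = 0.6
class_map = classify_image(model, image)
```

For a full Landsat scene the pixel matrix can be large, so predicting in row blocks is a common way to limit memory use.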

Discussion
The overall accuracy (OA) values produced in Python and ENVI were 91.39% and 98.21% respectively. Although Figures 3 and 4 show almost similar results, the OA produced in ENVI was higher than the OA produced in Python. The confusion matrices for the results produced by Python and ENVI are shown in Table 2 and Table 3 respectively. Table 2 shows that some misclassification occurred between the built-up/road and oil palm features, and that there was slight confusion between non-oil palm and oil palm. On the other hand, Table 3 shows less confusion between the oil palm and non-oil palm features. However, the classified maps show almost similar results for the oil palm and non-oil palm classes. Even though the OA produced in Python was lower than the OA produced in ENVI, the output produced in Python was judged to correspond more closely to reality.
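The relationship between a confusion matrix and the OA can be made concrete with a short sketch: the OA is the sum of the diagonal (correctly classified samples) divided by the total number of validation samples. The labels below are made up for illustration and are not the values from Table 2 or Table 3; the function names and the 0-based class codes are likewise assumptions.

```python
import numpy as np

def confusion_matrix(reference, predicted, n_classes):
    """Rows = reference class, columns = predicted class (codes 0..n_classes-1)."""
    matrix = np.zeros((n_classes, n_classes), dtype=np.int64)
    # Count one occurrence for every (reference, predicted) pair.
    np.add.at(matrix, (reference, predicted), 1)
    return matrix

def overall_accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. classified correctly."""
    return np.trace(matrix) / matrix.sum()

# Made-up labels for illustration; these are not the paper's results.
reference = np.array([0, 0, 0, 1, 1, 2, 2, 3, 3, 3])
predicted = np.array([0, 0, 1, 1, 1, 2, 0, 3, 3, 3])
cm = confusion_matrix(reference, predicted, n_classes=4)
oa = overall_accuracy(cm)
```

The off-diagonal cells show which pairs of classes are confused with each other, which is the information used above to compare the built-up/road and oil palm classes; the OA alone hides that detail.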

Conclusion
The Python programming-based approach utilising Scikit-learn to perform SVM classification produced a reasonable output. It identified the oil palm distribution similarly to the software-based technique, although the OA produced in Python was lower than the OA produced in ENVI. In addition, the time taken for the SVM classification in Python was shorter than for the commercial software-based SVM classification. This method can later be tested on a larger area for further assessment. In a nutshell, the performance of Python is convincing (based on benchmarking against industry-standard software, e.g. ENVI) and it provides a cost-effective and innovative alternative, as it is open source and free.

Future direction
There are a few methods that can be used to assess and measure the accuracy of the outputs. Depending on only one source might not be sufficient to evaluate the accuracy of the algorithms, as the OA alone does not define the precision of the output. Furthermore, besides SVM, Python provides other algorithms such as RF, Neural Network, and other machine learning algorithms, which can later be tested on other satellite data with different sensors and spatial resolutions.

Gilbertson, J. K., Kemp, J., & Van Niekerk, A. (2017). Effect of pan-sharpening multi-temporal Landsat 8 imagery for crop type differentiation using different classification

Figure 1. Study area

Figure 2. Flow chart for the workflow

Figure 3. SVM classified image using Python

Table 3. Confusion matrix produced in ENVI