The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Publications Copernicus
Articles | Volume XLII-4/W16
https://doi.org/10.5194/isprs-archives-XLII-4-W16-143-2019
01 Oct 2019

PREDICTION AND MAPPING OF COCHLODINIUM POLYKRIKOIDES RED TIDE USING MACHINE LEARNING UNDER IMBALANCED DATA

S. H. Bak, D. H. Hwang, U. Enkhjargal, and H. J. Yoon

Keywords: Machine Learning, Deep Learning, Harmful Algal Bloom, Red tide

Abstract. Cochlodinium polykrikoides (C. polykrikoides) is a phytoplankton species that causes red tides every year in the central South Sea of Korea. It is a harmful alga capable of migration, and once it forms a red tide it damages fisheries over a wide area of sea for a long period. To minimize this damage, it is important to anticipate when and where a red tide will occur and to prepare in advance. In this study, we predicted the occurrence of C. polykrikoides red tides using machine learning techniques and compared the results of each algorithm. A logistic regression model, a decision tree model, and a multilayer neural network model were used to predict red tide occurrence. To build the data set for model training, we used the red tide occurrence map provided by the National Institute of Fisheries Science, the Local Data Assimilation and Prediction System (LDAPS) provided by the Korea Meteorological Administration, and the G1SST provided by the National Oceanic and Atmospheric Administration (NOAA). The feature vectors used for modeling consisted of 59 elements derived from temperature, water temperature, precipitation, solar radiation, wind direction, and wind speed. Far fewer red tide cases than non-red-tide cases can be collected, so the data set is imbalanced. To overcome this imbalance, we oversampled the red tide occurrence data and then added noise, reducing the difference in sample counts between the two classes. To guard against over-fitting, the data set was divided in an 8:2 ratio: 80% was used as training data, and the remaining 20% was used to evaluate the performance of each model. In this evaluation, the multilayer neural network model showed the highest prediction accuracy.
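
The oversampling-with-noise step and the 8:2 split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `split_train_test` and `oversample_with_noise` and the parameter `noise_std` are hypothetical. Gaussian noise is an assumption, since the abstract does not state the noise distribution. The abstract also does not say whether oversampling was applied before or after the split; the sketch assumes the split comes first, so that noisy copies of a held-out sample cannot end up in the training data.

```python
import random


def split_train_test(features, labels, train_fraction=0.8, seed=0):
    """Shuffle the samples and split them into training and test parts."""
    rng = random.Random(seed)
    indices = list(range(len(features)))
    rng.shuffle(indices)
    cut = int(len(indices) * train_fraction)
    train_idx, test_idx = indices[:cut], indices[cut:]
    train = ([features[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([features[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test


def oversample_with_noise(features, labels, minority_label=1,
                          noise_std=0.05, seed=0):
    """Balance two classes by appending noisy copies of minority samples.

    Assumption: zero-mean Gaussian noise with standard deviation
    `noise_std` is added to every feature of each copied sample.
    """
    rng = random.Random(seed)
    minority = [x for x, y in zip(features, labels) if y == minority_label]
    if not minority:
        raise ValueError("no minority samples to oversample")
    majority_count = len(labels) - len(minority)

    # Keep the original samples and append synthetic ones after them.
    new_features, new_labels = list(features), list(labels)
    for _ in range(max(0, majority_count - len(minority))):
        base = rng.choice(minority)  # pick a minority sample to copy
        new_features.append([v + rng.gauss(0.0, noise_std) for v in base])
        new_labels.append(minority_label)
    return new_features, new_labels
```

A typical use would call `split_train_test` on the full data set, apply `oversample_with_noise` to the training part only, fit each model on the balanced training data, and score it on the untouched 20%. In practice the features would usually be standardized first so that a single `noise_std` is meaningful across all 59 elements.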