SONAR IMAGE RECOGNITION BASED ON MACHINE LEARNING FRAMEWORK

: In order to improve the robustness and generalization ability of model recognition, sonar images are enhanced by preprocessing such as conversion coordinates, interpolation, denoising and enhancement, and the transfer learning method under the Caffe framework of MATLAB as an interface is used respectively (mainly composed of 8 layers of network structure, including 5 convolutional layers and 3 full chain layers) And the transfer learning method under the Python deep learning framework Inception-Resnet-v2 model for sonar image training and recognition. First of all, part of the sonar image dataset (derived from the 2021 National Robot Underwater Competition online competition data), using MATLAB as the interface Caffe framework, the sonar image is trained to obtain a training model, and then through parameter adjustment, the convolutional neural network model of sonar image automatic recognition is obtained, and the transfer learning method can use less sonar image data to solve the problem of insufficient sonar image data, and then make the training achieve a higher recognition rate in a shorter time. When the training data is randomly sampled for testing, the sonar data recognition model based on the Caffe framework is quickly and fully recognized, and the recognition rate can reach 92% when the test sample does not participate in the training of sonar image data; The transfer learning method under the Inception-Resnet-v2 model of python deep learning framework is used to train recognition on sonar images, and the recognition rate reaches about 97%. Using the two models in this paper, it is feasible to identify sonar images with high recognition rate, which is much higher than traditional recognition methods such as SVM classifiers, and the two sonar image data recognition models based on deep learning have better recognition ability and generalization ability.


INTRODUCTION
Recognition sonar is the emission of high-frequency sound waves to the water, when the sound waves hit the underwater obstacles are returned, and then receive the echo, from the echo signal to form a sonar image.At present, the sonar image of the target can be obtained at a distance of several hundred meters, but with the maturity of sonar image acquisition technology, efficient and rapid processing of sonar images and identification of sonar images have become problems that need to be solved at present, and the automatic recognition of underwater targets has become a research hotspot.The traditional sonar image automatic recognition method is mainly to obtain the characteristics of the target image through experimental simulation, and then design a classifier to classify the sonar image.For this method, if you want to achieve a high recognition rate and accuracy rate, you must select a large number of features and a combination of various features, give different weights to adjust the parameters, low generalization ability, change the performance of this method of degrading, and train the manually designed features to the classifier robustness is not high.With the development of artificial intelligence, deep learning methods have been widely used in many fields such as speech recognition, image recognition, image classification, natural language processing, and video analysis.At present, commonly used deep learning methods are deep belief network (DEEP belief network, DBN), convolutional neural network (CNN) and recurrent neural network (RNN), etc., of which the convolutional neural network model is most widely used in the field of vision and image recognition, compared with traditional images, sonar image recognition methods are also similar.It is mainly divided into feature extraction and recognition.In the traditional method of manual design of feature re-recognition method has been replaced by convolutional neural networks and other new algorithms, convolutional neural network methods in the field of optical image recognition is widely used, with the development of the application to radar target detection, but whether it is optical images or radar image training to have a large number of image data, and sonar images do not have a huge database, so the use of convolutional neural networks in sonar images is plagued by the problem of insufficient database.The overall architecture of a CNN consists of a series of stages of the convolutional and pooled layers.Includes local connections, sharing rights, pooling, and multi-tier usage.CNNs learn to extract features from images through their own regular learning, generate unique maps through repeated learning, and finally connect layers similar to existing hierarchical neural networks to produce the desired results.This study uses two methods to train sonar images, first in MATLAB as the framework of Caffe interface, under the basis of the model, in the training process of CNN combined with the transfer learning method, through the adjustment of the network, weight adjustment training to obtain a training model.Secondly, the Python deep learning framework Inception-Resnet-v2 model is used to combine transfer learning to obtain a training model.The migratory network retains a large number of features of the original network, and only the last few layers of network structure need to be fine-tuned to reduce the difficulty of network training and the required image data.
In view of the application potential of CNN, this paper first preprocesses sonar data images to strengthen the contrast between the features and backgrounds of the images, so as to facilitate the subsequent extraction and recognition of features; then, on the basis of the Caffe model, the migration learning method and the Inception-Resnet-v2 model are combined with the transfer learning method to train samples; finally, the same image dataset and different image datasets are tested to obtain the recognition accuracy, error rate and generalization ability of the two methods.

Data sources
The sonar image data comes from the 2021 National Robot Underwater Competition online competition data, the training set is marked with categories, and the target types include cube, ball, cylinder, human body, tire, circle cage, square cage, and metal bucket.The training set is stored in 0-7 folders under the train folder, the name is the category, and the test set is under the test folder.

Build a sonar image:
From the sonar strafing can be seen, the image composed of the echo signal is a flat sector, first of all, each pixel in the plane rectangular coordinate system is converted into a polar coordinate system, and the converted pixels form a fan sonar pixel map, the principle is Figure 1.1.1,assuming that P(x,y) is any pixel point under the cartesian coordinate system, for the pixel points under the corresponding polar coordinates, any point P in the following figure is an example: Figure 1 a is the original pixel data map, assuming that the total distance of the detection of the original data is high h=5, width w= 3, Figure b is the process of output data size conversion from a Cartesian coordinate system to polar coordinates, and the calculation formula is: where start L is the starting length of the sample, length L is the total distance of the probe, and are the length and width of the converted image, respectively.Figure C calculates the coordinates corresponding to each pixel point after the pixel map is converted from a planar cartesian coordinate system to a polar coordinate system, and the calculation formula is： From the above formula, p (x , y ) is the coordinate corresponding to p(x, y) in the polar coordinates.Deriving the coordinates of point P under polar coordinates, equation ( 7

Image interpolation:
After the image is converted by coordinates, the pixels are converted from the Cartesian coordinate system to the polar coordinate system, and due to the rounding operation during the conversion process, some areas are not filled and some pixels are lost.In order for the image to be closer to the original image, and the image is uniform and does not leave blanks, the blank space where the pixels are missing is interpolated using continuous uniform interpolation.First determine whether the interpolated point is within the sector, the calculation formula is: (10) are determined to be within the sector, the interpolation point is determined to be in the sector, and then equation ( 11) calculates the signal strength of the interpolated point.For any , find two nodes adjacent to + , and use formula (11) to interpolate the formula to calculate. (11) The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLVI-3/W1-2022 7th Intl.Conference on Ubiquitous Positioning, Indoor Navigation and Location-Based Services (UPINLBS 2022), 18-19 March 2022, Wuhan, China 2.2.3 Image denoising:Sonar image in the process of receiving the signal, will be subject to a variety of different degrees of noise interference, there are many ways to remove noise, this paper uses Lee filtering to reduce the noise of the sonar subgraph, Lee filter directly weights the pixels of the window preprocessing area.Throughout the filtering process, the entire image is first transformed into the same frequency domain, then filtered and then inverted.Since the multiplication and mean squared error of the noise are minimal, the Equation 12Lee filtering algorithm is available： where R represents the original image without noise and n represents noise independent of the distribution of the original image.The letter I indicates the image that is noisy.
Suppose is a linear combination of the means of and , which can be represented by Equation 13(13) Calculate the absolute value of the difference between the

Image data and methods
Since the training of the model requires a large number of image extraction features, in order to avoid overfitting, the image is pre-processed such as data enhancement and denoising, and the preprocessed sonar images are randomly selected as the test set, and the rest are used as the training set, and the target types include cube, ball, cylinder, human body, tire, circlecage.Square cage, metal bucket 8 categories.Enter the picture data (227 * 227 * 3), the first layer of convolution of the input picture such as the picture is convoluted, and then the convoluted picture is activeivation, normalization, and pooling, etc., etc. will not change the image size and therefore will not change.The first two convolutional layers integrate convolution, activation, pooling and normalization, the third and fourth convolutional layers only include convolution and activation, the full chain layer includes activation and dropout operations, the dropout layer weakens the fitting effect of the deep neural network, taking the default value of 0.5, and the output of the final fully connected layer is the SoftMax layer of class 8.Among them, the excitation function uses ReLU, convolution is mainly used to extract the features of each small part of the picture, and the output matrix after the pooling layer input is multiplied by the original data output of the convolutional layer and the corresponding convolution, reducing the number of training parameters, reducing overfitting only retains the useful part, and the output is 8 classes.Sonar images commonly used in the total number of pictures is not large, the use of a small number of training samples to train may cause network overfitting, first of all, the weights of the model are initialized, using the method of migration learning, the convolutional neural network is regarded as a feature extractor, trained on the existing data set, the model with good classification is adjusted, the recognition model of the sonar image feature is obtained, and then the target data set is tested.Using migration learning for sonar image feature recognition, first select a dataset with a large number of labels to train the pre-trained network, move the parameters of some of its layers to the target network, remove the last fully connected layer in the original network, and add a fully connected layer to the migrated network, and train the new network with labeled data to complete the migration.According to the initialization parameters, the image is backpropagated to calculate the error and adjust the weight parameters, and the loss function is minimized by continuous calculation, and the resulting weight parameters are used as the weight parameters of the final model.

Inception module:
Sonar data image pixel position differences, sizes are not equal, it is difficult to choose the appropriate convolutional kernel size, deep networks are easy to overfit, simple stacking of large convolutional layers consume computing resources, so run a filter with multiple sizes at the same level, so the network essence is wider, not editing, so the design of the Inception module, with three filters of different sizes.

Training results of Mat Caffe convolutional neural network and transfer learning model based on image features
The training model and test are all run by MATLAB as caffe interface, and this experiment mainly trains the migration model on the three parameters of the number of trainings, the number of network data submitted each time, and the learning rate.Adjusting the number of submission networks per commit is good for solving the problem of excessive amount of data, and the optimal number of commit networks can achieve the best balance between memory efficiency and memory capacity, and the model performance and speed are optimal.The random picture set assigns each type of picture according to 80% training set and 20% test set, and sets the step to 1 in each iteration to take out loss and accelerator, and the test results are as shown in

The Inception-ResNet-v2 model combines the experimental results of the transfer learning model
The model training and testing are run under Python's deep learning framework, and the experiment is also based on two datasets of sonar images, the two datasets are: training set train and test set test, which are also divided into 8 categories.The migration model is also mainly trained on the three parameters of the number of trainings, the number of network data submitted each time, and the learning rate.Using the preprocessed sonar image with inception-ResNet-v2 network training, and then using transfer learning to obtain the final model, the experimental results are as shown in Figure 12, the results show that the correct rate is 97.9% when iterating 1500 times, and the experimental results show that the overall recognition effect of the model is good.

DISCUSS
In this paper, MATLAB is used as a caffe interface combined with transfer learning and Inception-ResNet-v2 model combined with transfer learning training to obtain two models to identify sonar images, and the accuracy rate is about 92% and 97%, respectively, which is higher than that of existing deep learning and classifiers.MATLAB as a caffe interface combined with the transfer learning model is relatively simple training test time is shorter, more suitable for simple picture recognition, in order to test the generalization ability of different models, the untrained sonar image is tested, the results show that the training model under the Caffe framework has a high recognition rate, and has better generalization ability for sonar data from different sources; Inception-ResNet-v2 model combined with the transfer learning model is more complex and takes longer but has high accuracy.Better solve the problem that the deep network is easy to overfit, the convergence speed is fast, and it has a good recognition effect and robustness.
R and R estimates and the squared and expected values to get the value of the mean squared error of R .Let a and b return to zero, respectively, find the minimum value, calculate the values of a and b respectively, and substitute the values of a and b back to Equation 13, respectively, and the final result is shown in Equation 14: (14) where the value of the coefficient k is calculated by Equation 15: (15) Using the standard deviation factor of image I ， I C as a measure, two thresholds min C and max C are set to achieve segmented denoising.The value of 5 is calculated by Equation 16: (16) The standard deviation factor of I C indicates how different the image is within the current window: when I C is larger, the part may be at the edge of the image or isolated pixels.Therefore, when min I C C  , the image is in a flat area, and the mean filter should be used for filtering; When max I C C  , the edge information of the image is preserved.When the I C value between the two thresholds is small, the noise effect in this part of the image is the largest, and a weighted enhanced Lee filtering method is used to improve the signal-to-noise ratio of the peak.Calculates the weight of the current area of the image and weights the pixel value of the image.The mathematical expression of the entire algorithm is as follows: of the data in the current window, which is the difference between the entire image standard, T is adjustable to adjust the rate of change of , and the value of in this article is 1. 2.2.4 Image enhancemen: After the construction of the sonar image and the interpolation denoising, the sonar image is colorless compared to the optical image.The corresponding value of each pixel represents the intensity of the sonar echo, the stronger the signal, the more obvious the pixel value, the figure is replaced by white, but most of the sonar images have low gray value and poor contrast, which is not conducive to subsequent analysis and processing, so the image will be enhanced.Purposefully emphasize the overall or local characteristics of the image, turn the original unclear image into a clear or emphasize the pixel features we need, expand the difference between the features of different objects in the image, suppress unwanted background features, improve image quality, strengthen image interpretation and recognition effects, and meet the needs of analysis.Image enhancement using histogram linear rollovers, with the formula 20: value of the transformed image.A is the enhancement factor, b is the offset factor, b is often 0, a as far as possible to make the image pixel values evenly distributed and ensure that the high pixel values in the original image are not distorted.

Figure 4 .Figure 5 .
Figure 4. Target type.a is a front-view sonar image, red is the target label box, from left to right is square cage, ball, tire.b is the front-view sonar image, red is the target label box, from top to bottom is the human body, the ball, the circle cage .c is the front-view sonar image, red is the target label box, from left to right is the tire , metal bucket.3.2The MATLAB interface is used to establish the Caffe sonar feature recognition model and the transfer learning modelcaffe is a clear and efficient deep learning framework, is a C++/CUDA architecture, supports command line, python and MATLAB interfaces; can be seamlessly switched between CPU and GPU, Caffe's basic workflow is designed based on a simple assumption built on the neural network, all computation is represented in the form of layers, and what the network layer does is input data and then output the results of the calculations.The caffe command-line interface can be used to learn the model, test the score of the run model, and represent the final result of the network output as a percentage, detect system performance and measure the relative execution time of the model, this command performs model detection by layer-bylayer timing and synchronization.Based on the Caffe framework, this paper uses the migration learning method to fine-tune parameters and establish a sonar image recognition model.The structure of the Caffe Net sonar image feature model is shown in Figure5, which is mainly composed of 8 layers of network structure, of which 5 convolutional layers and 3 full chain layers.

FigureFigure9.
Figure9.Training model results.caffe provides the basic input function, but this article imported the picture with the input itself, in order to test the model, a part of the training data as test data, test the trained network recognition rate, get the results as shown in Figure 10, in the test 33 times has reached 100% accuracy.

Figure 10 .
Figure 10.Training model test results.Save the trained network, randomly read the pictures in the test set under the trained model, the picture size is represented by[width, height], take out the loss and arccuracy of each iteration to draw the image, get the results of Figure11, the accuracy of the model can reach more than 92%, which is about 5% higher than the accuracy of the common deep learning.