DEEP CONVOLUTIONAL NEURAL NETWORK FOR AUTOMATIC DETECTION OF DAMAGED PHOTOVOLTAIC CELLS

The number of distributed photovoltaic (PV) plants that produce electricity has increased significantly, and the issue of monitoring and maintaining a PV plant has become of great importance, involving many challenges such as efficiency, reliability, safety, and stability. This paper presents a novel approach to estimating PV cell degradation with deep convolutional neural networks (DCNNs). While many studies have performed image classification, to the best of our knowledge this is the first to exploit data acquired with a drone equipped with a thermal infrared sensor. Experiments on the “Photovoltaic images Dataset”, a newly collected dataset, are presented to illustrate the degradation problem and to comprehensively evaluate the proposed method. Results in terms of precision, recall and F1-score show the effectiveness and suitability of the proposed approach.


INTRODUCTION
In the near future, a key challenge will be the use of renewable energy sources (RES), which are in increasing demand due to environmental and climatic challenges and the recovery of the global economy. Renewable energy sources will represent the only alternative to limit fossil fuel usage and pollution. World-leading energy outlooks from the International Energy Agency (IEA), the U.S. Energy Information Administration (EIA) and BP show trends in the energy sector with a significant increase in renewable energy conversion and utilization in the future (IEA, 2016), (EIA, 2016), (BP p.l.c., 2016).
The number of distributed photovoltaic (PV) plants that produce electricity has increased significantly, and most installations are becoming decentralized. In this sector, over the last five years, Europe has experienced one of the largest growths in electricity generation by RES in general, and by PV in particular. Thus, the issue of monitoring and maintaining a PV plant has become of great importance and involves many challenges, such as efficiency, reliability, safety, and stability (Grimaccia et al., 2015), (Ackermann et al., 2015).
Monitoring the state of health of a system is crucial; in fact, detecting the degradation of solar panels is the only way to ensure good performance over long periods of time (Jordan et al., 2018). Besides avoiding waste of energy, the reason for maintaining the correct functional status of a plant is also economic. Indeed, the degradation of long-term performance and overall reliability of PV plants can drastically reduce expected revenues. PV plants are increasingly extensive, composed of thousands of modules, and potentially affected by the following fault types at: 1. Module level: mismatches between modules, shading, glass breakage, busbar failure, diode failures, delamination, broken interconnects and hot-spots (Pieri et al., 2017).
Many methods have been proposed in the literature in the last decade for identifying damaged panels, ranging from electrical diagnostics and statistical inference from monitored control units to shading detection and so on (Drews et al., 2007), (Silvestre et al., 2014), (Chouder and Silvestre, 2010). Although standard monitoring approaches have proved effective, they can only detect power losses in a portion of the PV field, while the accurate localization of faulty modules requires string disassembly and visual and/or electrical inspection. The above-mentioned techniques, and in particular electrical diagnosis, are time-consuming, cause stops in energy generation, and often require laboratory instrumentation, making them not cost-effective for frequent inspections. Moreover, it should be noted that PV plants are often located in inaccessible places, making any intervention dangerous. Therefore, the safety of operation deeply impacts maintenance costs.
In this regard, a strong contribution was given by the recent diffusion of unmanned aerial vehicles (UAV), equipped with a thermal infrared sensor, making this technique widely accessible and affordable and becoming a de-facto standard for data acquisition to inspect the PV system to detect faults (Quater et al., 2014).
Regardless of the cause of the fault, a malfunctioning PV cell has a higher temperature than adjacent normal cells; it is hence quite easy to detect such cells with a drone equipped with a thermal infrared sensor. The challenge here is to expedite the process of detecting these anomalies, which can be very time-consuming and dependent on the visual interpretation of the operator. Current practice, in fact, is to inspect the video stream of the thermal camera frame by frame.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-2, 2018. ISPRS TC II Mid-term Symposium "Towards Photogrammetry 2020", 4-7 June 2018, Riva del Garda, Italy.

The recent literature presents different approaches to inspecting PV fields, as well as automatic tools for image processing and fault detection and classification (Malof et al., 2016). Given the above and in line with recent research trends, this paper outlines a novel approach to estimating PV cell degradation with DCNNs. While many studies in the literature have performed image classification, to the best of our knowledge this is one of the first to exploit data acquired with a drone equipped with a thermal infrared sensor. Experiments on the "Photovoltaic images Dataset", a newly collected dataset, are presented to illustrate the degradation problem and to comprehensively evaluate the method presented in this research. The DCNN that we select is based on the VGG16 network architecture (Simonyan and Zisserman, 2014), because it is well suited to image classification (Paolanti et al., 2017a), (Paolanti et al., 2017b), (Sturari et al., 2017). The dataset comprises 3336 images, captured with a Skyrobotic SR-SF6 drone equipped with a FLIR Tau 2 640 thermal camera. Images were acquired in nadir configuration, keeping an average flight height of 50 m. To obtain the ground truth of the collected pictures, the true cell degradation has been manually estimated by human annotators, thus providing a more precise and less noisy dataset.
To briefly summarize, the main contributions of this work are: (i) a demonstration that Deep Learning architectures can be applied to the detection of damaged PV cells; (ii) a challenging new dataset of thermal images collected by a drone equipped with a thermal infrared sensor, hand-labelled with ground truth; (iii) a performance comparison of different data collections for thermal image classification.
The paper is organized as follows: Section 2 describes the related work, Section 3 gives insight into the method used in this analysis, and Section 4 presents the experimental results. Conclusions and future work are presented in Section 5.

RELATED WORK
In recent years, several authors have proposed the use of UAVs for inspection activities (Hallberg et al., 1999), (Ollero and Merino, 2004). The current practice adopted by the majority of PV plant owners is to perform inspections sporadically, with random criteria and without monitoring the overall health of the installation. Visual inspection and output measurement methods can be used for fault diagnosis in PV panels with reduced output efficiency (Quater et al., 2014). Recently, methods for automated fault detection that combine aerial photogrammetry with computer vision have been developed. Two technologies are mainly involved in this process: the automatic extraction of the region of interest (ROI) of a PV array field from the given images, and the automatic diagnosis of defective panels based on the extracted PV panel areas (Kim et al., 2017).
In (Tsanakas et al., 2015) and (Rogotis et al., 2014), the authors presented methods for ROI extraction from terrestrial thermal infrared image sequences, applying image segmentation techniques (Gonzalez et al., 2004).
The Canny edge operator (Canny, 1986) and image segmentation techniques are also used in (Kim et al., 2017) to develop an algorithm for panel area extraction from thermal infrared images captured with a UAV. However, the area extraction method using the Canny edge operator did not lend itself well to creating a single polygon for the panel area, due to noise within and outside the panels. Instead, the image segmentation-based area extraction method created polygons of individual panel areas, albeit limited by imperfect linearity. Tsanakas et al. have likewise adopted the Canny edge operator in the design of a method to identify the location of hot-spot cells on a PV panel (Tsanakas et al., 2015). In this study, the authors employed a method that compares the intensity characteristics of individual panel area polygons. Another study on panel fault diagnosis was performed by Kim et al. (Kim et al., 2017), who proposed an algorithm capable of automated PV panel fault diagnosis using intensity-related statistical values of each panel, based on the extracted panel area polygons.
An additional fault detection and diagnostic method is based on hardware redundancy, in which several similar subsystems undertake the same task. Anomalies are detected by collecting and analysing each subsystem's data (Daliento et al., 2017). Another possibility for monitoring and diagnosis is based on electrical parameters, such as the Performance Ratio, considering a comparison between arrays (Bizzarri et al., 2015) or between strings by comparing currents (Baba et al., 2013); an inferential tool returning information about the operation of the PV field (Cristaldi et al., 2015); a dataset of observed string currents and voltages and their respective low-pass-filtered time derivatives (Ben-Menahem and Yang, 2012); or the I-V curve of a single string using information from the inverter (Davarifar et al., 2014). A common problem of all these monitoring techniques is the large amount of data to be analyzed.
More sophisticated algorithms and methods for fault detection and diagnosis are required, since PV array characteristics are highly nonlinear. Nowadays, research is moving towards the use of artificial intelligence and data mining. Basically, these methods can be split into three categories: signal processing methods, classification methods, and inference methods. Signal processing methods extract features of the measured signals that are relevant to a particular state of health of the PV system. The two most commonly used methods are wavelet transform techniques (il Song Kim, 2016) and the Fast Fourier Transform (FFT) (Momoh and Button, 2003). Classification methods, instead, are based on artificial intelligence, and the knowledge is built from an available dataset. Supervised learning algorithms are first trained on a large labeled dataset, from which they learn the characteristics of the system, and are then used to make predictions. Artificial neural networks (ANN) have been proposed for PV systems working under partial shading conditions (Nguyen et al., 2009), for the monitoring and supervision of the health status of a PV system (Riley and Johnson, 2012), and for short-circuit fault detection in PV arrays (Syafaruddin et al., 2011). In other works, Bayesian networks (Coleman and Zalewski, 2011) and fuzzy logic (AbdulHadi et al., 2004) have also been successful in estimating PV output or performing fault diagnoses. Data mining methods for fault detection and isolation in PV systems have also been proposed in the literature, such as the decision-tree method (Zhao et al., 2012), K-nearest neighbor, and support vector machines (SVM) (Yi and Etemadi, 2016).
Artificial neural network (ANN) models were shown to closely match the performance of array models in PV systems (Riley and Venayagamoorthy, 2011). This learning technique requires only basic system monitoring hardware. The advantages of using an ANN in a PV health monitoring system are that it needs no a priori information about the system components or topology to accurately model the output power, it can monitor the degradation of the system over its lifetime, and it can indicate catastrophic failures by monitoring the degradation rate.

METHODS
In this section, we introduce the framework as well as the dataset used for evaluation. We use a specifically trained DCNN to detect cell degradation. The framework is depicted in Figure 1. Further details are given in the following subsections. The framework is comprehensively evaluated on the "Photovoltaic images Dataset", a proprietary dataset collected for this work. The details of the data collection and ground truth labeling are discussed in subsection 3.2.

Network Architecture
The DCNN aims at providing information about cell degradation and is therefore trained with image labels indicating the anomaly in the images. The training is performed with a VGG-16 net (Simonyan and Zisserman, 2014). The VGG-16 network is chosen for its relative ease of implementation and its success in the ILSVRC-2014 competition, where it placed first in the 2a challenge. VGG-16 is a very deep network with 16 weight layers (13 convolutional and 3 fully-connected) that was originally trained on the ImageNet database, consisting of millions of labeled images in 1000 classes (Krizhevsky et al., 2012). The model was developed in Keras, a high-level neural networks library written in Python and capable of running on top of TensorFlow (https://www.tensorflow.org/). The VGG-16 network consists of 5 convolutional blocks with corresponding output filter sizes [64, 128, 256, 512, 512], followed by a fully-connected classifier. Our implementation of the VGG-16 net follows the practice in (Simonyan and Zisserman, 2014). Each image is resized to 224 × 224 pixels. Since there is no batch normalization layer in VGG-16, the input images are normalized. We use the stochastic gradient descent (SGD) optimizer with a mini-batch size of 2. After preliminary experiments, the learning rate is fixed to 0.001. We use a weight decay of 0.0001 and a momentum of 0.9. A small learning rate is used to avoid wrecking the previously learned feature weights.
The network is initially trained with a binary cross-entropy (BCE) loss for 10 epochs.
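The training setup described above can be sketched in Keras roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the classifier head (one 256-unit dense layer and a sigmoid output), the L2 penalty used as a rough stand-in for weight decay, and the three-channel input are all assumptions, and `weights=None` is used only so the sketch builds without downloading the ImageNet weights that the text relies on.

```python
from tensorflow.keras import layers, models, optimizers, regularizers
from tensorflow.keras.applications import VGG16


def build_model(weights=None):
    """Build a VGG-16 binary classifier (normal vs. damaged).

    Pass weights="imagenet" to start from the pretrained filters, as the
    text describes; None keeps this sketch free of any download.
    """
    # Convolutional base: 5 blocks with 64, 128, 256, 512, 512 filters.
    base = VGG16(include_top=False, weights=weights, input_shape=(224, 224, 3))

    # Hypothetical classifier head (the paper does not give its layout).
    x = layers.Flatten()(base.output)
    x = layers.Dense(
        256,
        activation="relu",
        # Assumption: L2 penalty as a rough stand-in for weight decay 0.0001.
        kernel_regularizer=regularizers.l2(1e-4),
    )(x)
    output = layers.Dense(1, activation="sigmoid")(x)

    model = models.Model(inputs=base.input, outputs=output)
    model.compile(
        optimizer=optimizers.SGD(learning_rate=0.001, momentum=0.9),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_model()

# Training would then follow the reported settings, for example:
# model.fit(train_images, train_labels, batch_size=2, epochs=10)
conv_filters = [l.filters for l in model.layers if isinstance(l, layers.Conv2D)]
print(len(conv_filters), sorted(set(conv_filters)))
```

Since the thermal images are single-band, they would have to be replicated to three channels (or the first layer adapted) before being fed to this input; which option the authors used is not stated.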

Photovoltaic images Dataset
A thermographic inspection of a ground-based PV system with a power of approximately 66 MW has been carried out in South Africa. The thermographic acquisitions were made over 7 working days, from 21 to 27 January 2017, with predominantly clear sky and maximum irradiation. This situation is optimal for highlighting any abnormal behaviour of entire panels or portions of them.
The analysis was carried out at a constant flight altitude of 50 meters with respect to the surface of the panels, using a Skyrobotic SR-SF6 drone equipped with a thermographic payload (Table 1) with a FLIR TAU 2 infrared camera.
The "Photovoltaic images Dataset" is composed of 3336 thermal images, divided as follows:
• 811 images with damaged PV cells;
• 2525 images with normal PV cells.
The ground truth of the collected pictures has been manually estimated by human annotators, thus providing a more precise and less noisy dataset. It has been shown in (Timofte et al., 2013), (Timofte et al., 2014) for neighbor embedding and anchored regression methods, and in (Dong et al., 2014), (Dong et al., 2016) for convolutional neural network-based methods, that more training data results in an increase in performance, up to a point where exponentially more data is necessary for any further improvement. Following the image classification literature (Chatfield et al., 2014), we also consider the flipped and rotated versions of the training images. If we rotate the original images by 90, 180 and 270 degrees and flip them left-to-right and top-to-bottom (see Figure 3 and (Freeman et al., 2000)), we get more images without altering their content. In particular, with flipped images we obtain a new, updated dataset of 4866 images (2433 normal and 2433 damaged). With rotated images we obtain 6488 images (3244 normal and 3244 damaged).
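The flip and rotation augmentation described above can be sketched with NumPy as follows. The function names and the toy image are illustrative assumptions; the arithmetic at the end only checks that the reported damaged-class counts are consistent with keeping each original image together with its augmented copies.

```python
import numpy as np


def flipped_versions(image):
    """Return the up-down and left-right flipped copies of a 2-D image."""
    return [np.flipud(image), np.fliplr(image)]


def rotated_versions(image):
    """Return copies rotated by 90, 180 and 270 degrees (counter-clockwise)."""
    return [np.rot90(image, k) for k in (1, 2, 3)]


# Toy stand-in for a resized single-band thermal image.
image = np.arange(224 * 224, dtype=np.float32).reshape(224, 224)

flips = flipped_versions(image)
rotations = rotated_versions(image)

# Each original damaged image plus its augmented copies.
n_damaged = 811
print(n_damaged * (1 + len(flips)))      # 2433
print(n_damaged * (1 + len(rotations)))  # 3244
```

Because the resized images are square, every flipped or rotated copy keeps the 224 × 224 shape expected by the network.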

Performance Evaluation Metrics
To evaluate the performance of the algorithms, the following metrics are employed:
• Accuracy: approximates the effectiveness of the algorithm by showing the probability of the true value of the class label (Sokolova et al., 2006):
$$\mathrm{Accuracy} = \frac{tp + tn}{tp + fp + tn + fn}$$
where $tp$ is the number of true positives, $tn$ the number of true negatives, $fp$ the number of false positives and $fn$ the number of false negatives.
• Recall: is a function of the correctly classified examples (true positives) and the incorrectly classified examples (false negatives):
$$\mathrm{Recall} = \frac{tp}{tp + fn}$$
• Precision: is a function of true positives and examples incorrectly classified as positives (false positives):
$$\mathrm{Precision} = \frac{tp}{tp + fp}$$
• F1-score: is a measure of a test's accuracy that combines precision and recall. It is the case β = 1 of the general F-score:
$$F_{\beta} = \frac{(1 + \beta^{2}) \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\beta^{2} \cdot \mathrm{Precision} + \mathrm{Recall}}$$
The F-score is evenly balanced when β = 1. It favours recall when β > 1, and precision when β < 1. The F1-score can be interpreted as the harmonic mean of precision and recall.
• Support: is the number of occurrences of each class in the ground truth (correct) target values.
The use of a confusion matrix can also be insightful for analyzing the results of the model. In fact, a confusion matrix depicts the information about the actual and predicted classifications made by the system (Kohavi, 1998). A confusion matrix is a specific table layout that allows visualization of the performance of an algorithm, where each column of the matrix represents the instances in a predicted class, and each row represents the instances in an actual class.
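As a worked illustration of these metrics, the sketch below builds a confusion matrix with rows as actual classes and columns as predicted classes, and derives accuracy, precision, recall, F-score and support from it using the standard definitions. The labels are made-up toy values, not results from the paper, and the function names are illustrative.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes=2):
    """Rows are actual classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for actual, predicted in zip(y_true, y_pred):
        cm[actual, predicted] += 1
    return cm


def report(cm, positive=1, beta=1.0):
    """Compute the metrics for one class treated as the positive class."""
    tp = cm[positive, positive]
    fn = cm[positive].sum() - tp      # actual positives that were missed
    fp = cm[:, positive].sum() - tp   # predicted positives that were wrong
    tn = cm.sum() - tp - fn - fp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta**2 * precision + recall
    f_score = (1 + beta**2) * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / cm.sum(),
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "support": int(tp + fn),
    }


# Toy labels: 0 = normal, 1 = damaged (illustrative values only).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
result = report(cm)
print(cm)
print(result)
```

In this toy case one damaged image is missed and one normal image is flagged, so precision, recall and F1-score for the damaged class are all 0.75, accuracy is 0.8 and the support is 4.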

RESULTS AND DISCUSSION
In this section, the results of the experiments conducted on the "Photovoltaic images Dataset" are reported. This dataset is unbalanced, i.e. the classes are not represented equally: the number of observations belonging to one class is significantly lower than the number belonging to the other class. Since classification algorithms tend to produce unsatisfactory classifiers when faced with unbalanced datasets (Chawla, 2009), in addition to the performance of the DCNN for automatic inspection, we also present the performance with the flipped and rotated (by 90, 180 and 270 degrees) versions of the training images. In this way, we increase the quality of the dataset and ensure the validity of the experiments.
In fact, dealing with unbalanced datasets entails strategies such as improving classification algorithms or balancing classes in the training data (data preprocessing) before providing the data as input to the DCNN. The main objective of balancing classes is to give both classes a comparable number of instances. Comparing the different classes reveals that both normal and anomalous images can be recognized.
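One simple way to balance classes in the training data is random undersampling of the majority class, sketched below. This is an illustrative assumption: the paper does not state which balancing procedure or random seed was used, and the function name is hypothetical. The class sizes are those of the "Photovoltaic images Dataset".

```python
import numpy as np


def undersample(labels, seed=0):
    """Return sorted indices that keep every sample of the smallest class
    and an equally sized random subset of each larger class."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    n_keep = counts.min()
    kept = [
        rng.choice(np.flatnonzero(labels == c), size=n_keep, replace=False)
        for c in classes
    ]
    return np.sort(np.concatenate(kept))


# 0 = normal (2525 images), 1 = damaged (811 images).
labels = np.array([0] * 2525 + [1] * 811)
idx = undersample(labels)
print(np.bincount(labels[idx]))  # [811 811]
```

The alternative, increasing the frequency of the minority class, can be obtained through augmented copies of the minority images instead of discarding majority images.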
Table 4, Table 5 and Table 6 report the results of the VGG-16 performance with data augmentation. As can be seen, the performance of all classifiers is good, thus demonstrating the effectiveness and suitability of the proposed approach. The classification performance with both rotated and flipped images is much higher than that with flipped images only, but slightly lower than that with rotated images only. This comparison shows that recognizing anomalies in images from an unbalanced dataset is challenging.
The following Tables show the confusion matrices of the applied VGG-16 network.

CONCLUSIONS
Monitoring and maintaining a PV plant is a challenging but rewarding task, and detecting the degradation of solar panels is the only way to ensure good performance over long periods of time. In this paper, we introduce a deep learning approach for estimating PV cell degradation. The DCNN aims at providing information about cell degradation and is therefore trained with image labels indicating the anomaly in the images.
The training has been performed with a VGG-16 net, which has been chosen for its relative ease of implementation and its effectiveness in image classification. The approach is able to achieve high precision and recall for damage classification. The experiments on the "Photovoltaic images Dataset" yield high accuracies and demonstrate the effectiveness and suitability of our approach. Further investigation will be devoted to improving our approach by employing a larger dataset and extracting additional informative features, such as conductibility. Moreover, other DCNNs will be trained and their performances compared.
Figure 2 shows two example pictures from the "Photovoltaic images Dataset". The images have a dimension of 640 × 512 pixels and contain temperature values; therefore, these images have a single band. The temperature accuracy in the files is 0.01 °C. The temperatures in the images range from −19.27 °C to 103.33 °C, with a mean of 45.71 °C and a standard deviation of 8.96 °C.
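Given these statistics, one plausible way to prepare a single-band temperature image for a three-channel network such as VGG-16 is to standardize it with the dataset mean and standard deviation and replicate the band. The sketch below is only an assumption about how such preprocessing could look; the paper does not specify its normalization scheme, and the function name and synthetic frame are illustrative.

```python
import numpy as np

# Dataset statistics reported in the text (degrees Celsius).
DATASET_MEAN = 45.71
DATASET_STD = 8.96


def to_network_input(temperature_map):
    """Standardize a single-band temperature map and replicate it to 3 channels."""
    t = np.asarray(temperature_map, dtype=np.float32)
    standardized = (t - DATASET_MEAN) / DATASET_STD
    return np.repeat(standardized[..., np.newaxis], 3, axis=-1)


# Synthetic 640 x 512 frame (rows x columns = 512 x 640) with one hot region.
frame = np.full((512, 640), 45.71, dtype=np.float32)
frame[100:110, 200:210] = 103.33

x = to_network_input(frame)
print(x.shape)  # (512, 640, 3)
```

Resizing to 224 × 224 pixels would follow as a separate step before the image is passed to the network.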

Figure 3. Figure 3a is the image seen by the network input after resizing the original image data. Figures 3b and 3c are the images flipped up-down and left-right, respectively, from the resized image. Figures 3d, 3e and 3f are the images rotated by 90, 180 and 270 degrees, respectively, from the resized image.

Class balancing is achieved by either increasing the frequency of the minority class or decreasing the frequency of the majority class. This is done in order to obtain approximately the same number of instances for both classes. Five datasets are evaluated: the unbalanced "Photovoltaic images Dataset"; the balanced "Photovoltaic images Dataset"; the "Photovoltaic images Dataset" with images flipped left-to-right and top-to-bottom; the "Photovoltaic images Dataset" with images rotated by 90, 180 and 270 degrees; and the "Photovoltaic images Dataset" with both flipped and rotated images.

Table 2.
As expected, the values of precision and recall are not very high because we deal with unbalanced data.
In fact, the metrics reflect the underlying class distribution and for the normal images we obtain high values of precision, recall (such as 100%) and F1-score.

Table 3 depicts precision, recall and F1-score of the balanced dataset. The performance of the network is good.

Table 4. Balanced dataset with data augmentation (flipped images)

Table 5. Balanced dataset with data augmentation (rotated images)