BUILDING ROOFTOPS EXTRACTION FOR SOLAR PV POTENTIAL ESTIMATION USING GIS-BASED METHODS

Green energy is increasingly used because of the depletion of traditional resources and the rise in environmental pollution, which harms the planet's air, plant life, seas, and oceans. In this context, an approach for extracting building rooftops to estimate photovoltaic (PV) potential is presented in two main phases. First, rooftops are detected from satellite images using image pre-processing techniques and a machine learning algorithm. The pre-processing steps include gamma correction, shadow and vegetation masking, k-means clustering, and connected components labelling. A Support Vector Machine (SVM) is then applied to extract the rooftops. Second, two GIS-based methods, PVGIS and the Solar Analyst Tool in ArcGIS, are used for PV estimation. Satellite images of a part of Madinaty city in Egypt were used to evaluate the approach. The precision and recall of the SVM were 95.7% and 90%, respectively. There were 112 identifiable rooftops in the image, with a total area of 26,131 m². The annual PV potential was estimated to be 9.3 and 8.7 GWh/year using PVGIS and the Solar Analyst Tool, respectively. PVGIS was more accurate because it uses more recent solar databases covering Africa, whereas the Solar Analyst Tool was less accurate because it depends on a digital elevation model with a resolution of 30 m. According to our calculations, using solar panels instead of traditional energy sources compensates for an annual average of 48% of the electric energy and the corresponding CO2 emissions.


INTRODUCTION
Coal, oil, natural gas, and other fossil fuels were the main pillars of the industrial revolution all over the world. However, the use of fossil fuels has led to global warming, which harms the planet in all aspects of life. Fossil fuels are becoming scarce in many countries, including Egypt, while the population continues to grow. Egypt's population was growing at a rate of 2% per year as of 2019, according to the World Bank collection of development indicators (Worldometers, 2020). More than 90% of Egypt's main sources of energy are natural gas or oil. Therefore, with this annual population growth rate and increasing demand, Egypt faces a rapid depletion of traditional resources. According to the Cairo Demographic Centre, Egypt's population is expected to reach 110 million by 2031 and 128 million by 2051 (Comsan, 2010), while the rate of increase in fossil fuel production is very slight.
Besides, Egypt ranked 27th in the world for CO2 emissions in 2015; per capita emissions were 2.7 tons, and 48.3% of Egypt's total CO2 emissions were due to electrical power generation (Abdallah and El-Shennawy, 2017). Thus, green energy in the form of solar energy, wind energy, and other sources helps address both issues in Egypt by increasing the dependence on renewable energy sources and decreasing CO2 emissions. Egypt is a country with high solar energy potential: it belongs to the global Sun Belt and is in an advantageous position for solar energy. According to the solar atlas (Panagiotis Kosmopoulos, Stelios Kazadzis, 2018), Egypt enjoys Direct Normal Irradiance of 1970-2800 kWh/m² with 9 to 11 sunshine hours a day all year.
Photovoltaic (PV) panels are the most popular means of harnessing solar energy, which can be collected and converted to electricity. PV panels have become the cheapest source of electrical power in regions with high solar potential. A photovoltaic system employs solar modules, each of which consists of several solar cells that generate electrical power. PV installations may be ground-mounted, rooftop-mounted, wall-mounted, or floating. The mount may be fixed or may use a single-axis or dual-axis solar tracker to follow the sun across the sky.
To get the PV potential, the areas of the rooftops, on which the PV panels will be mounted, are first calculated from the downloaded satellite image. Some methods depend on the manual digitizing of rooftops using GIS software packages. For instance, Chow, Li and Fung (2016) digitized buildings' rooftops and then multiplied the gross areas of rooftops by different PV factors to obtain the usable area based on the roof type, structural adequacy, shading, and other factors. Carl (2014) digitized a sample of the rooftops to build a relationship with the tax map key parcel data, which were available for the author together with the buildings' shapefiles in the study area using ArcGIS. The results were then introduced to the Solar Analyst tool in ArcGIS to obtain the average solar radiation on the rooftops.
On the other hand, automatic rooftop extraction methods have been widely used instead of manual digitizing. These methods depend on data segmentation and on machine learning techniques to extract the object of interest from the different segments (Baluyan et al., 2013; Ghanea, Moallem and Momeni, 2014; Joshi et al., 2014). For instance, Ghanea, Moallem and Momeni (2014) used k-means clustering for segmentation, with k set to 2 to obtain a binary image with 'semi-building' and 'non-building' layers. Sub-clustering was then conducted using fuzzy c-means to segment the 'semi-building' layer into 'buildings' and 'non-buildings'. After that, region growing was used to form buildings, and a decision tree classification algorithm was applied to divide the layers and extract the 'buildings' with an overall accuracy of 80%. Baluyan et al. (2013) carried out some pre-processing algorithms, including bilateral filtering to remove noise while preserving the edges, which facilitated the segmentation process using k-means clustering. A support vector machine (SVM) was then used to extract the buildings with a precision of 93%.
Besides ArcGIS, which was used to model solar radiation for the study area in (Carl, 2014), PVGIS has been used in several studies, as it is a reliable free web application for estimating the average solar radiation over a specific study area. Tarai and Kale (2016) used PVGIS to produce a rasterized image of PV potential for the region of Odisha in India, with an accuracy acceptable for decision-making on individual PV projects and for state policymaking. Konstantinos Mardikis et al. (2014) studied many locations, including Cairo in Egypt, where the PV potential obtained using PVGIS differed from that of a real operational PV station by only 10%.
Our goal in this study is to estimate the energy that PV panels mounted on rooftops can produce from the solar energy available in the study area, in the following two stages:
(1) Estimating the available rooftop areas through the automatic extraction of rooftops from satellite images obtained using Google Earth Pro.
(2) Modelling solar radiation and calculating the PV potential for the entire study area using two popular approaches: PVGIS and the Solar Analyst tool in ArcGIS.

STUDY AREA AND DATA
A part of Madinaty city in Cairo, Egypt is used as the study area in this research. Madinaty was chosen because it is one of the first well-planned and organized cities in Cairo and has the potential to become the first green city in Egypt. Besides, the planning of this city was the basis for the planning of other new residential cities and compounds in Egypt. Its coordinates are Latitude: 30° 04' 53.81'' N and Longitude: 31° 38' 21.62'' E. The image of the study area is an RGB image downloaded using Google Earth Pro, as indicated in Figure 1. The size of the image is 4800 × 3463 pixels, which corresponds to 697 × 500 m². The total area of the study image is about 348,500 m², including the streets and other non-building objects.
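The image and ground dimensions above fix the ground size of a pixel, which is what links areas counted in pixels to areas in square metres. The following is a minimal sketch of that conversion; the function names are illustrative and are not part of the original workflow.

```python
# Image and ground dimensions quoted in the text.
IMAGE_WIDTH_PX, IMAGE_HEIGHT_PX = 4800, 3463
GROUND_WIDTH_M, GROUND_HEIGHT_M = 697.0, 500.0


def pixel_size_m():
    """Return the ground size of one pixel (x, y) in metres."""
    return GROUND_WIDTH_M / IMAGE_WIDTH_PX, GROUND_HEIGHT_M / IMAGE_HEIGHT_PX


def pixels_to_square_metres(area_px):
    """Convert an area counted in pixels to square metres."""
    size_x, size_y = pixel_size_m()
    return area_px * size_x * size_y


size_x, size_y = pixel_size_m()
total_area_m2 = pixels_to_square_metres(IMAGE_WIDTH_PX * IMAGE_HEIGHT_PX)
print(f"pixel size: {size_x:.3f} m x {size_y:.3f} m")
print(f"total area: {total_area_m2:.0f} m^2")
```

Under these figures one pixel covers roughly 0.145 m on each side, and the full image reproduces the stated 348,500 m².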

METHODOLOGY
Our methodology is divided into two main phases: buildings' rooftops extraction and PV potential estimation. PV potential depends on two inputs. The first is the usable rooftop area available for mounting the PV panels after subtracting the shadows from neighbouring buildings, trees, and obstructions already existing on the rooftops. The second is the global, direct, and diffuse solar radiation over monthly periods, which determines the annual average irradiation. The following subsections explain the methodology in detail.

Data Preparation
The satellite image was prepared to extract all buildings' rooftops. A sample of the rooftops pattern was used as a training dataset for the machine learning (ML) technique, to be applied for the testing process. All buildings' rooftops were first digitized using ImageJ software. A sample was then taken from the 'buildings' patterns available in our study area and five features for each digitized building were calculated as follows.
• Area: indicates the area of a building in pixels.
• Mean grey value: represents the mean grey value for a 'building'.
• Standard deviation: indicates the spread of the grey values of a 'building' around their mean.
• Roundness: represents the ratio of the area to the square of the perimeter. According to (Sirmaçek and Ünsalan, 2009), the following equation describes the roundness:
$$\text{Roundness} = \frac{4\pi \cdot \text{Area}}{\text{Perimeter}^{2}}$$
The value of the roundness ranges between 0 and 1. The roundness value of 'buildings' is expected to be close to 1.
• Major_minor axis ratio: indicates the width to length ratio defined for the 'buildings'. It is expected to be close to 1, while elongated objects are expected to be close to 0.

Figure 2 indicates the data preparation process.
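As an illustration of the roundness feature, the sketch below computes it for polygons given as vertex lists. It assumes the common definition 4π·Area/Perimeter², which matches the description (area over squared perimeter, bounded by 1); the helper names and the two test shapes are hypothetical.

```python
import math


def polygon_area(vertices):
    """Shoelace formula: area of a simple polygon given as (x, y) vertices."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(vertices):
    """Sum of the edge lengths of the closed polygon."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += math.hypot(x2 - x1, y2 - y1)
    return total


def roundness(vertices):
    """Assumed definition: 4*pi*area / perimeter^2 (1 for a circle)."""
    return 4.0 * math.pi * polygon_area(vertices) / polygon_perimeter(vertices) ** 2


square = [(0, 0), (10, 0), (10, 10), (0, 10)]  # compact, building-like shape
strip = [(0, 0), (40, 0), (40, 2), (0, 2)]     # elongated, road-like shape
print(round(roundness(square), 3))  # 0.785
print(round(roundness(strip), 3))   # 0.142
```

The compact square scores much higher than the elongated strip, which is the property that makes the feature useful for separating rooftops from roads.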

Figure 2. Preparation of the reference data by digitizing using ImageJ software.

Figure 3 shows the rooftops extraction workflow. First, extensive image enhancements were introduced. K-means clustering was then used to perform the segmentation task by dividing the pixels into different classes. After that, polygon formation was carried out by the connected components algorithm. Finally, SVM was applied to extract the 'buildings' polygons from the different object polygons in the image after training it with the reference dataset. SVM was used in this research because of its ability to find an optimum hyperplane that separates inliers from outliers. It can also deal with multi-dimensional spaces. Besides, it offers many kernel functions to facilitate the separation. Lastly, it is memory efficient compared to other techniques.

Image Segmentation:
Before image segmentation, some enhancements were conducted sequentially as described below.
• Gamma correction was applied to enhance the contrast between different objects in the image. The image was converted to the LAB colour space, a three-axis colour system in which L represents lightness independently of the colour properties, while A and B are the colour dimensions. The LAB colour space includes all colours in the spectrum, as well as colours outside of human perception (Lab Color -MATLAB & Simulink, no date; Bertalmío, 2020). When L is separated, differences in brightness become clearer.
• Shadow masking was applied: shadow regions were enhanced using a shadow index computed from B and G, where B is the pixel value in the blue channel and G is the pixel value in the green channel. The enhanced shadow regions were masked to reduce the processing time and facilitate the extraction of the objects of interest.
• Gaussian blur was used as a noise reduction and smoothing filter, followed by mean-shift segmentation, which blends the colours of the same object.
• Vegetation masking was applied to the results of the mean-shift segmentation. This was done in the HSV colour space: the range of the vegetation hue was detected and then masked, both to decrease detection errors by focusing on the objects of interest ('buildings') and to decrease the processing time.

These enhancements were essential for the segmentation step, in which k-means clustering was used to cluster the image into different classes. The number of classes was set to k = 5, which provided the best results after trial and error. The segmentation results are shown in Figure 4.
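The clustering step can be sketched with a small pure-Python version of Lloyd's k-means algorithm. This is a simplified illustration that clusters scalar grey values rather than full colour pixels; the initialisation scheme and the sample values are assumptions made for the example.

```python
def kmeans_1d(values, k, iterations=50):
    """Lloyd's algorithm on scalar values; returns (centres, labels)."""
    ordered = sorted(values)
    # Deterministic initialisation: k evenly spaced samples of the sorted data.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: nearest centre for every value.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Update step: each centre moves to the mean of its members.
        new_centres = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centres.append(sum(members) / len(members) if members else centres[c])
        if new_centres == centres:
            break  # converged: no centre moved
        centres = new_centres
    return centres, labels


# Hypothetical grey values forming five well-separated groups (k = 5).
pixels = [10, 12, 11, 60, 62, 61, 110, 112, 111, 160, 161, 162, 210, 212, 211]
centres, labels = kmeans_1d(pixels, k=5)
print(centres)
print(labels)
```

In practice a library implementation operating on colour vectors would be used; the sketch only shows the assignment and update steps that k = 5 controls.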

Features Extraction:
After the segmentation, the connected components algorithm was used to form polygons for each candidate region in all classes. Two sequential steps were applied: seed point generation followed by region growing. Seed points were first generated in regions of similar intensity, where 'rgn' denotes a test region and I(xi, yi) is the intensity value of the i-th point of that region. Region growing then started from each seed using a connectivity value of Cv = 8. The resulting polygons were described by the same features as the reference data (area, mean grey value, standard deviation, major-minor axis ratio, and roundness).
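Region growing with a connectivity of 8 can be illustrated by a breadth-first flood fill over a binary mask. This is a minimal sketch, not the authors' implementation: it takes the first unlabelled foreground pixel as the seed instead of the intensity-based seed rule described above, and the mask is a toy example.

```python
from collections import deque

# Offsets of the eight neighbours of a pixel (8-connectivity).
NEIGHBOURS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def label_components(mask):
    """Label connected foreground regions of a binary grid; 0 means background."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or labels[r][c]:
                continue
            # New seed: start growing a new region from this pixel.
            current += 1
            labels[r][c] = current
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in NEIGHBOURS_8:
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = current
                        queue.append((ny, nx))
    return labels, current


mask = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
labels, count = label_components(mask)
print(count)  # 2
```

The pixel at row 1, column 2 touches the first region only diagonally, so it joins that region under 8-connectivity but would form its own region under 4-connectivity.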

Machine Learning:
The polygons formed after the segmentation were classified using the 'buildings' polygons from the reference data, which were introduced to the ML algorithm as the training dataset. SVM was applied for rooftop detection, as mentioned before. The performance of SVM is highly influenced by the choice of the kernel function. Based on the characteristics of our data, the Gaussian radial basis function (RBF) kernel gave the best results (Shashua, 2009). The RBF kernel is given by the formula below.
$$K(x, y) = \exp\left(-\frac{\lVert x - y \rVert^{2}}{2\sigma^{2}}\right)$$
where $\lVert x - y \rVert^{2}$ is the squared Euclidean distance between x and y, $\sigma^{2}$ is the variance, and the term $\frac{1}{2\sigma^{2}}$ is equal to γ, a hyperparameter of the RBF kernel that can be tuned.
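The kernel can be written out in a few lines, which makes the role of γ explicit. This is a small sketch of the standard Gaussian RBF kernel; the function name and the sample vectors are illustrative.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel exp(-gamma * ||x - y||^2) with gamma = 1 / (2 * sigma^2)."""
    gamma = 1.0 / (2.0 * sigma ** 2)
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))             # identical points give 1.0
print(round(rbf_kernel([0.0, 0.0], [1.0, 1.0]), 4))   # exp(-1) = 0.3679
```

A larger γ (smaller σ) makes the similarity fall off faster with distance, so the decision boundary follows the training samples more closely.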
k-fold cross-validation was conducted on the training dataset to estimate the performance of the model, which was then fitted as the final model. The training dataset was split into k smaller sets (folds). In each round, the model was trained on k-1 folds and validated on the remaining fold. The cross-validation output was written in the form of a scoring parameter with a confidence interval. The scoring parameter is the mean of the accuracies obtained for the individual splits, and the confidence interval is based on the standard deviation of those accuracies.
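The splitting and scoring logic can be sketched as follows. The fold accuracies are hypothetical placeholders, not the values obtained in this study, and reporting one standard deviation as the spread is an assumption about how the interval was formed.

```python
import statistics


def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, validation
        start += size


def summarise(scores):
    """Return the mean accuracy and the standard deviation across folds."""
    return statistics.mean(scores), statistics.pstdev(scores)


for train, validation in k_fold_indices(10, 5):
    print("train:", train, "validate:", validation)

# Hypothetical per-fold accuracies, used only to show the summary step.
mean_accuracy, spread = summarise([0.90, 0.95, 0.85, 0.95, 0.90])
print(f"{mean_accuracy:.2f} +/- {spread:.2f}")
```

Each sample is used for validation exactly once, so the mean score reflects performance on data the model did not see during the corresponding training round.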
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIV-M-3-2021 ASPRS 2021 Annual Conference, 29 March-2 April 2021, virtual

The output rooftops were assessed using the precision and recall metrics as follows:
$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$
where true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) represent the number of correctly identified rooftops, incorrectly identified rooftops, correctly rejected rooftops, and incorrectly rejected rooftops, respectively.
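The two metrics are simple ratios of detection counts, as the short sketch below shows. The counts used here are illustrative and are not reported in the paper; they were picked so that the outputs land near the reported precision and recall.

```python
def precision(true_positives, false_positives):
    """Share of detected rooftops that are real rooftops."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives, false_negatives):
    """Share of real rooftops that were detected."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical counts: 112 correct detections, 5 false alarms, 13 missed rooftops.
tp, fp, fn = 112, 5, 13
print(f"precision = {precision(tp, fp):.3f}")  # 0.957
print(f"recall    = {recall(tp, fn):.3f}")     # 0.896
```

Precision penalises false alarms, whereas recall penalises missed rooftops, so the two together describe both kinds of detection error.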

Solar PV Potential Estimation
This section focuses on the estimation of PV potential by mounting the PV panels on rooftops using the available rooftops' area. The estimation of the PV potential mainly depends on two steps: (1) the rooftops' areas available for the PV array installation and (2) the inputs required for the calculation of the PV potential.

Rooftops Areas Preparation:
The final result of the ML stage is a spreadsheet with the areas of the detected rooftops. Two kinds of obstruction were considered: the elevator rooms and the inter-row shading between the PV rows. A sample of the rooftops' areas and their corresponding elevator rooms was digitized. It was found that the elevator rooms represent an average of 3% of the total rooftop area. Therefore, 3% of the total rooftop area was subtracted. The inter-row shading between the rows of PV modules was considered in order to reduce the shadow-induced losses in the PV potential. The effective area covered by the PV arrays was determined by the ground coverage ratio (GCR), the ratio of the active area (service area) to the ground area. The GCR depends on the tilt angle of the PV modules, which was set equal to the local latitude of the study area to obtain the optimum average annual PV potential for a fixed module. Knowing the tilt angle of the modules and the default shading derate factor (2-3%), the GCR could be calculated from Figure

PV Module Parameters:
The parameters of the PV module were set as follows.
• Rooftop area determination: Rooftop areas were extracted from the image using the series of image pre-processing steps and SVM, and after removing the obstructions as described above.
• Solar irradiance: Solar resource data are needed to obtain the solar radiation for the study area. The choice of database is determined by the input location of the study area and the method used for solar modelling (i.e., PVGIS and ArcGIS). The solar irradiance was assumed to be uniform across the study area because of its relatively small size.
• Module and array type: The Renesola JC320S-24, a Chinese-manufactured monocrystalline solar panel, was chosen in our work. A fixed PV array was chosen as it has low acquisition and maintenance costs. The efficiency of the module is 19.67% at standard test conditions (STC). STC is defined as a solar irradiance of one kilowatt per square meter (1000 W/m²), a module temperature of 25 degrees Celsius, and a standard light spectrum of air mass 1.5 (AM 1.5), where AM is the ratio of the path length of the solar radiation through the atmosphere to the path length at zenith (normal to the earth's surface at sea level) (Würfel, 2016).
• Array tilt: The tilt angle is the angle of inclination of the PV module array measured from the horizontal. According to (Amin S., Hanania J., Stenhouse K., Yyelland B., 2018) and (Masters, 2004), setting the tilt angle of a fixed array equal to the local latitude gives the best average annual production of solar radiation. The tilt angle was therefore set to the latitude of the location, which is 30°.
• Array orientation (azimuth): A solar panel collects more energy when the sun rays are perpendicular to it. In terms of annual potential, an average direction was chosen to obtain the optimum average annual PV potential. Based on the practical recommendations of the energy education section of the University of Calgary (Amin S., Hanania J., Stenhouse K., Yyelland B., 2018), the general rule for flat rooftops is that solar panels should face true south in the northern hemisphere (and true north in the southern hemisphere). This is usually the best direction because the panels receive direct light throughout the day.
• System size: The current produced by PV panels is direct current (DC). The DC system power rating in kW, at STC, was determined using the following equation (Ntsoane, 2017):
$$\text{System Size} = \text{Array Area} \times 1\,\frac{\text{kW}}{\text{m}^{2}} \times \text{module efficiency} \qquad (6)$$
where the array area is the usable rooftop area in m² and the module efficiency is 19.67%.
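Equation (6) can be checked with a short calculation. The sketch below uses the usable area and module efficiency reported in this paper; the function and constant names are illustrative.

```python
# Irradiance at standard test conditions, in kW per square metre.
STC_IRRADIANCE_KW_PER_M2 = 1.0


def dc_system_size_kw(array_area_m2, module_efficiency):
    """DC power rating at STC: area x 1 kW/m^2 x module efficiency."""
    return array_area_m2 * STC_IRRADIANCE_KW_PER_M2 * module_efficiency


size_kw = dc_system_size_kw(array_area_m2=26224.95, module_efficiency=0.1967)
print(f"{size_kw:.2f} kW")  # 5158.45 kW
```

The result agrees with the DC system size of 5158.45 kW given in the results section.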
The output is calculated using an equation adapted from Šúri, Huld and Dunlop (2005). The equation depends on the average annual solar radiation for the entered parameters and gives a close estimate of the electrical potential:
$$E = 365 \, P_k \, r_p \, H_{h,i} \qquad (7)$$
where:
E is the yearly potential for electricity production in kilowatt-hours (kWh),
Pk is the peak power of the installed equipment in kilowatts (kW),
rp is the system performance ratio or derating factor,
Hh,i is the yearly average of daily global radiation (kWh/m²).
The system losses produced by various effects were accounted for by a factor that decreases the AC output rating of the PV system. This derate factor, also referred to as the performance ratio or conversion coefficient, decreases the total output of the PV module according to the induced losses. The total system losses were set to 14%, with an inverter efficiency of 96% for the DC to AC conversion.
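To make the yield estimate concrete, the sketch below assumes the form E = 365 · Pk · rp · Hh,i for equation (7), as used by Šúri and co-authors. The daily irradiation of 6.5 kWh/m² is a hypothetical value chosen only for illustration, while Pk and rp = 0.75 are the values quoted in this paper.

```python
DAYS_PER_YEAR = 365


def annual_energy_kwh(peak_power_kw, performance_ratio, daily_irradiation_kwh_m2):
    """Yearly electricity production, assuming E = 365 * Pk * rp * H."""
    return DAYS_PER_YEAR * peak_power_kw * performance_ratio * daily_irradiation_kwh_m2


energy_kwh = annual_energy_kwh(
    peak_power_kw=5158.45,         # DC system size from equation (6)
    performance_ratio=0.75,        # derating factor quoted in the text
    daily_irradiation_kwh_m2=6.5,  # hypothetical yearly average of daily irradiation
)
print(f"{energy_kwh / 1e6:.2f} GWh/year")  # 9.18 GWh/year
```

Because the formula is linear in every input, a 1% change in the irradiation or in the performance ratio changes the estimated yield by 1%.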

Solar Radiation Modelling:
Two methods were used to obtain the solar radiation over the study area: the Photovoltaic Geographical Information System (PVGIS) and the Solar Analyst Tool available in the ArcGIS software package. PVGIS is a web application that provides information about solar radiation and PV performance for a location supplied by the user. It is mainly used for Europe and Africa, which its main databases cover, but it can also be used for nearly any location in the world. The solar radiation database is based on the estimation of the surface solar irradiance from satellite images (Huld, Müller and Gambardella, 2012). The PVGIS-SARAH database is the one most recommended for Africa (EU SCIENCE HUB, 2020). Therefore, for our location in Egypt, we chose to work with the PVGIS-SARAH database.
ArcGIS, provided by Esri, is one of the most popular geographic information system (GIS) software packages and is used by many researchers and engineers all over the world. The Solar Analyst Tool in ArcGIS can be used to calculate the radiation in watt-hours per square meter (Wh/m²) at the earth's surface at a local scale. The data required are a digital elevation model (DEM) and the local latitude of the study area. The global solar radiation at a location is calculated as the sum of the direct and diffuse solar radiation at that location. Direct solar radiation is the largest component of global radiation, and diffuse radiation is the second largest.
A series of equations defined in the ArcGIS section of Esri's website is used to calculate the direct and diffuse solar radiation (How solar radiation is calculated-ArcGIS Pro | Documentation, 2020). The radiation can be greatly affected by the topography and surface features, so the calculation depends mainly on the generation of an upward-looking hemispherical viewshed for every cell in the DEM. A Shuttle Radar Topography Mission (SRTM) DEM with 30 m resolution was downloaded and clipped to fit the study area, as shown in Figure 6. The DEM was then used as an input for the area solar radiation tool in ArcGIS to obtain the estimated solar radiation. The output potential was calculated using equation (7). The performance ratio rp was set to 0.75, which is recommended by Šúri, Huld and Dunlop (2005) as an accepted derating factor to account for all the losses in a PV system and is also very close to the value of rp used by PVGIS.

Carbon Emissions:
The amount of CO2 emission that would be prevented by replacing fossil fuel with solar energy depends on the potential of the solar panels and was calculated using the following equation (Ntsoane, 2017):
$$\text{Carbon emissions} = \text{grid emission factor} \times \text{electricity production} \qquad (8)$$
The grid emission factor is the ratio of the amount of CO2 emitted to the electricity produced. A grid emission factor of 0.533 tCO2/MWh was used for Egypt, as given by the Institute for Global Environmental Strategies (2020), List of Grid Emission Factors version 10.8, available at (Takahashi and Louhisuo, 2021). It relates the electricity generated on the rooftops to the accompanying quantity of carbon dioxide emissions, and the output of the equation is expressed in tCO2e. According to (Reich et al., 2007), the maximum amount of CO2 emitted from using solar panels is 6 gCO2/kWh, which is very small compared to the CO2 emitted by traditional energy sources.
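Equation (8) is a single multiplication, sketched below with the grid emission factor used in this study. The 18,000 MWh input is the annual consumption of the study area reported in the results, and the 9,300 MWh input takes the PVGIS estimate as 9.3 GWh/year, the figure given in the conclusion; the function name is illustrative.

```python
# Grid emission factor for Egypt used in this study (tCO2 per MWh).
GRID_EMISSION_FACTOR_T_PER_MWH = 0.533


def carbon_emissions_t(electricity_mwh, factor=GRID_EMISSION_FACTOR_T_PER_MWH):
    """Equation (8): emissions in tonnes of CO2 for a given electricity production."""
    return factor * electricity_mwh


grid_emissions = carbon_emissions_t(18_000)   # consumption met by the grid
avoided_by_pv = carbon_emissions_t(9_300)     # production replaced by PV (PVGIS)
print(f"{grid_emissions:.1f} tCO2/year")  # 9594.0 tCO2/year
print(f"{avoided_by_pv:.1f} tCO2/year")   # 4956.9 tCO2/year
```

The first value matches the 9594 tCO2e/year quoted for fossil fuels in the discussion.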

RESULTS AND DISCUSSION
Extensive image enhancements were applied, followed by image segmentation and, finally, SVM classification. A sample of the extracted buildings' rooftops is shown in Figure 7. The k-fold cross-validation results were expressed as a scoring value that represents the average accuracy of all folds, with a confidence interval. The scoring value for fitting the reference data using the SVM was 0.917 with a confidence interval of +/-0.07. The detection accuracy of the SVM results was 95.7% for precision and 90% for recall. The detection accuracy was relatively high because SVM is effective in multi-dimensional spaces and allows the kernel type to be chosen, and the RBF kernel represents our data best.
After calculating the GCR as described in the previous section, a GCR value of 50% was chosen to account for the effect of inter-row shading. Thus, after subtracting these obstructions, the usable areas of the rooftops were ready for the PV potential estimation. The usable area of the total study area was about 29,381.51 m² with an average usable area per building of 264.5 m². An example of the output rooftops' areas before and after removing all the obstructions is shown in Table 1.

Table 1. An example of the rooftops' areas before and after removing all the obstructions.

Figure 8 shows a summary of the calculations applied to the areas to reach the usable area used for the calculation of the PV potential. The total usable area was 26224.95 m² and the module efficiency of the monocrystalline panels used was 19.67%. Using equation (6), we obtained a DC system size of 5158.45 kW. This system size was used as Pk in equation (7) to calculate the PV potential.

Based on a field survey, the average consumption of an apartment in the study area is about 500 kWh/month. Therefore, a building of six floors with four apartments each consumes 12,000 kWh/month on average, or 144,000 kWh/year. For the approximately 125 buildings in the image of the study area, the electricity consumption from traditional energy sources is 18,000,000 kWh/year. The PV potential and the corresponding carbon emissions were calculated for each method. The PV potential, the average compensation of energy, and the percentage of CO2 prevented by using PV panels are summarized in Table 2.

Table 2. The results of PV potential using the two methods.

The carbon emitted from using fossil fuels equals 9594 tCO2e/year, while the CO2 directly emitted from using PV panels is negligible. According to (Konstantinos Mardikis, Nikolaos Katsikas, Constantinos S. Psomopoulos, 2014) and (Psomopoulos et al., 2015), the PVGIS data for the African database are more recent (2005-2016), and PVGIS excels in regions located in Europe and Africa. ArcGIS, on the other hand, estimates solar radiation from the DEM used for the study area. The DEM we used was downloaded for free from the SRTM database with a spatial resolution of 30 m. Such a coarse resolution may not provide a very accurate solar radiation estimate for our case study.
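The consumption estimate above is a chain of multiplications, which the following sketch reproduces from the figures given in the text. The function name and argument names are illustrative.

```python
MONTHS_PER_YEAR = 12


def annual_consumption_kwh(apartment_kwh_per_month, apartments_per_floor, floors, buildings):
    """Yearly electricity consumption of all buildings in the study image."""
    building_kwh_per_month = apartment_kwh_per_month * apartments_per_floor * floors
    return building_kwh_per_month * MONTHS_PER_YEAR * buildings


per_building = annual_consumption_kwh(500, 4, 6, buildings=1)
study_area = annual_consumption_kwh(500, 4, 6, buildings=125)
print(per_building)  # 144000
print(study_area)    # 18000000
```

The total of 18,000,000 kWh/year (18 GWh/year) is the baseline against which the PV potential of each method is compared.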

CONCLUSION
The target of this paper is a clean and green city for Madinaty, one of the best-known and best-planned cities in Cairo, Egypt. We aim to decrease the harmful environmental impact of traditional energy sources by replacing them with cleaner, renewable sources. The focus here was solar energy harnessed by PV panels. Two popular GIS-based methods were used to estimate the PV potential: PVGIS and the Solar Analyst Tool in ArcGIS. The two methods were applied after extracting the rooftops from the satellite image of the study area downloaded from Google Earth Pro.
For the extraction method, pre-processing algorithms based on computer vision concepts were used to enhance the image and facilitate the segmentation of the objects in the image. A machine learning technique, SVM, was then used to extract the buildings. The detection accuracy of the SVM in terms of precision and recall was 95.7% and 90%, respectively. The gross area of the detected buildings was then extracted, and the losses due to obstructions and shadows on the rooftops were taken into account. The usable area was calculated to be 26224.95 m² and was used for the estimation of the PV potential.
PVGIS gave the more reliable result of 9.3 GWh/year because the database it uses for the location of our study area is more accurate than the input of the Solar Analyst Tool. The impact of carbon dioxide is greatly decreased when PV systems on the rooftops are considered. Approximately 49% of the CO2 emitted from fossil fuels is offset by the clean solar energy produced with PV technology.