ANALYSIS OF VISIBLE INFRARED IMAGING RADIOMETER SUITE CAPABILITY FOR POPULATION ESTIMATION ON JAVA ISLAND

Population data, despite their significance, are often missing or difficult to access, especially in cities/regencies that do not belong to metropolitan areas or centers of human activity. This hinders practices that are contingent on their availability. In this study, population estimation was carried out using nighttime light imagery generated by the Visible Infrared Imaging Radiometer Suite (VIIRS) instrument. The illuminated (lit) area was related to the population data using linear regression based on an allometric formula, producing a regression formula, a correlation coefficient (r), and a coefficient of determination (r²). The average r between the illuminated area and the total population was 0.86, indicating a strong correlation between the two variables. Validation using samples of population estimates from three different years yielded an average error of 73% for each city and 7% for the entire study area. The estimated numbers of residents per city/regency cannot be used as population data because of their high percent error, but the estimates for a larger regional scale, in this case the island of Java, have a much smaller percent error and can be used as an initial picture of the total population.


INTRODUCTION
Badan Pusat Statistik (BPS-Statistics Indonesia) (2020) stated that the population of Indonesia reached 268,074.6 thousand (about 268 million) in 2019, with a growth rate of 1.31% from 2010. Indonesia has a land area of 1,916,906.77 km² (BPS-Statistics Indonesia, 2020), while its waters cover 3.257 million km², or almost 60% of the total area (BPS-Statistics Indonesia, 2019). The population density is up to 140 people per km² (BPS-Statistics Indonesia, 2020). The increase in population from year to year has resulted in challenges such as growing needs for food, water (especially clean water), and energy, both renewable and non-renewable (Daily et al., 1998). Furthermore, if rapid population growth remains unchecked, one of the consequences in the next few years will be urban development, which primarily affects physical features in cities such as built-up land and land functions (Kulshrestha, 2007). Therefore, population data are needed to formulate up-to-date regulations and policies that control population size and growth rates and, ultimately, minimize the resultant adverse effects (Chowdhury et al., 2012).
Fitria (2014) describes the three methods used by BPS-Statistics Indonesia to collect population data, namely population census, population survey, and population registration. The population census, conducted every ten years, comprises a series of activities ranging from acquiring data to publishing demographic, economic, and social data from each region (Fitria, 2014). Ajie (2008) defines a population survey as population enumeration that samples an area to serve as a reference for the condition of its population, and population registration as the recording of population data that the authorities carry out regularly down to the lowest administrative levels, such as the neighborhood and urban village. However, the population census, survey, and registration require a lot of effort, energy, cost, and time in their implementation (Priyono, 1992).
To overcome these limitations, remote sensing offers image products that can be used to generate time-sensitive data.
One form of remote sensing-derived data is satellite imagery. Each satellite carries a different sensor to capture the spectral energy coming from objects on the earth's surface. For example, the Visible Infrared Imaging Radiometer Suite (VIIRS) is one of the sensors aboard the Suomi NPP satellite; its Day-Night Band (DNB) detects human presence, urban settlements, and other activities at night that require lighting (Amaral et al., 2006). Sutton et al. (1997) also explain that remote sensing products, especially nighttime light (NTL) data, are an alternative source of information for identifying urban settlements that can be used indirectly for further analysis, such as the spatial distribution and estimation of populations. Previous studies have shown a good correlation between the light-emitting (lit) area and the number of residents. Therefore, linear regression was applied to the logarithmic form of the allometric formula, which mathematically defines population counts as a function of lit area, to find the allometric coefficient and exponent. This method requires that the regression formula be back-transformed to an allometric formula. This study aimed to analyze the application of nighttime light imagery in estimating population size.

Data
The data used were the yearly composite VIIRS DNB images for 2015, 2017, and 2019 with a median data type, acquired from EOG Data Mines (http://eogdata.mines.edu/products/vnl/) and Google Earth Engine. The NTL imagery used in this study has a spatial resolution of 500 meters and a spectral range of 0.5-0.9 µm, according to the DNB sensor. Data on the number of residents per city/regency in 2015, 2017, and 2019 were obtained from BPS-Statistics Indonesia. Each city/regency in question is part of the island of Java and administratively belongs to the provinces of DKI Jakarta, Banten, Jawa Barat, Jawa Tengah, Jawa Timur, and DI Yogyakarta. Chowdhury et al. (2012) noted that NTL data need to be corrected to remove the blooming effect resulting from a fault in the sensor, which can overestimate light emissions in an urban region so that the lit area looks more extensive than it actually is. This effect can be corrected using a threshold, which, according to Liu et al. (2016), is determined by setting the pixel value of the lowest light intensity in the least developed cities as the minimum value and the highest intensity in the most developed ones as the maximum value of the threshold. This threshold approach is a trial-and-error method, meaning that it requires several attempts to find the optimal minimum and maximum values.

Lit Area Extraction
The extracted value is not the light intensity but the light-emitting or lit area, which is considered a settlement. The lit area was used as a variable in population estimation because many scholars have shown its strong linear correlation with population number (e.g., Amaral et al., 2006; Sutton et al., 1997, 2001; Zhuo et al., 2009).
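The extraction of a lit area from thresholded NTL pixels can be sketched as follows. This is a minimal illustration, not the authors' processing chain: the function name `lit_area_km2`, the sample grid, and the threshold values are hypothetical, and treating pixels above the maximum threshold as not lit is an assumption made here for simplicity. The pixel size follows the 500 m resolution stated in the Data section.

```python
import numpy as np

# One 500 m x 500 m pixel covers 0.25 km^2 (resolution taken from the Data section).
PIXEL_AREA_KM2 = 0.5 * 0.5


def lit_area_km2(radiance, lower, upper):
    """Return the lit area in km^2 of a 2-D grid of NTL pixel values.

    A pixel is counted as lit when lower <= value <= upper. Both bounds are
    thresholds that would have to be tuned by trial and error; excluding
    values above `upper` is an assumption of this sketch.
    """
    radiance = np.asarray(radiance, dtype=float)
    lit_mask = (radiance >= lower) & (radiance <= upper)
    return float(lit_mask.sum()) * PIXEL_AREA_KM2


# Hypothetical 3 x 3 grid of pixel values (not real VIIRS data).
grid = np.array([
    [0.1, 0.4, 2.0],
    [0.3, 5.5, 9.0],
    [0.0, 1.2, 80.0],
])

# Hypothetical thresholds: four pixels fall inside [1.0, 60.0].
area = lit_area_km2(grid, lower=1.0, upper=60.0)
print(area)
```

Only the count of lit pixels enters the result, which mirrors the point made above that the area, not the intensity, is the extracted variable.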

Samples
The samples in this study were population data per city/regency acquired from BPS-Statistics Indonesia. They were divided into modeling samples, used to create a population estimation model, and validation samples, used to assess the accuracy of the model generated from the modeling samples. This division followed the ratio of 7:3, i.e., seven modeling samples for every three validation samples. Chowdhury et al. (2012) noted that previous studies had found a strong linear relationship between the log of the lit area and the log of the population number. Therefore, the following logarithmic model was applied (Lo, 2001):

$$\ln(P) = a + b\,\ln(A)$$

where
P = population of a city/regency
A = lit area of the city/regency
a = intercept
b = coefficient
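The 7:3 division of samples described in this subsection can be sketched as below. The text does not state how individual cities/regencies were assigned to each group, so the random shuffle, the fixed seed, the function name `split_samples`, and the placeholder unit names are all assumptions made for illustration.

```python
import random


def split_samples(names, modeling_share=0.7, seed=0):
    """Split sample names into (modeling, validation) lists at roughly 7:3.

    A seeded shuffle keeps the split reproducible; random assignment is an
    assumption, not something stated in the source.
    """
    shuffled = list(names)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * modeling_share)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical sample identifiers standing in for cities/regencies.
units = [f"unit_{i:02d}" for i in range(1, 11)]
modeling, validation = split_samples(units)
print(len(modeling), len(validation))
```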

Linear Regression Analysis
The logarithmic model results from transforming the allometric formula that relates the built-up area to the total population, based on research conducted by Tobler (1969). The allometric formula in question is as follows:

$$P = c\,A^{b}$$

where
P = total population
A = built-up (lit) area
c = coefficient
b = exponent

The logarithmic model was used for linear regression analysis because the transformed allometric formula, $\ln(P) = \ln(c) + b\,\ln(A)$, has the basic form of a regression formula, as presented below:

$$Y = a + bX$$

where
Y = dependent variable, here $\ln(P)$
X = independent variable, here $\ln(A)$
a = intercept, here $\ln(c)$
b = coefficient

Harvey (2002) further explained that the coefficient of determination of the regression results does not determine the accuracy of the model; therefore, the above regression formula must be back-transformed to the allometric formula used for calculating the estimated population. The back-transformation from regression to allometric is shown in the formula below (Chowdhury et al., 2012):

$$\hat{P} = e^{a} A^{b}$$

where
$\hat{P}$ = estimated population
A = lit area
$e^{a}$ = coefficient
b = exponent
e = Euler's number (2.71828)
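The fitting and back-transformation steps described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the function names `fit_allometric` and `estimate_population` are hypothetical, NumPy's `polyfit` stands in for whatever regression tool was actually used, and the data are synthetic values generated from a known power law rather than the study's samples.

```python
import numpy as np


def fit_allometric(lit_area, population):
    """Fit ln(P) = a + b * ln(A) by ordinary least squares.

    Returns the intercept a, the slope b, and the correlation coefficient r
    between the log-transformed variables.
    """
    x = np.log(np.asarray(lit_area, dtype=float))
    y = np.log(np.asarray(population, dtype=float))
    b, a = np.polyfit(x, y, deg=1)  # polyfit returns the slope first
    r = np.corrcoef(x, y)[0, 1]
    return a, b, r


def estimate_population(lit_area, a, b):
    """Back-transform the regression to the allometric form P = e^a * A^b."""
    return np.exp(a) * np.asarray(lit_area, dtype=float) ** b


# Synthetic data from P = 3000 * A^0.9 (hypothetical values, for checking only).
areas = np.array([10.0, 50.0, 120.0, 400.0, 900.0])
pops = 3000.0 * areas ** 0.9

a, b, r = fit_allometric(areas, pops)
estimates = estimate_population(areas, a, b)
print(np.exp(a), b, r)
```

Because the synthetic data follow the power law exactly, the fit should recover a coefficient near 3000 and an exponent near 0.9; real samples would scatter around the fitted line.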

Percent Error Calculation
Linear regression analysis produces a correlation coefficient, a coefficient of determination, and a standard error of estimate, which were used to determine the strength of the correlation between the variables in 2015, 2017, and 2019. However, the coefficient of determination generated by the regression analysis does not show the accuracy of the model, so percent error (% error) was calculated as a better indicator of overall accuracy (Chowdhury et al., 2012). The percent error was calculated using the following formula:

$$RE = \frac{\hat{P} - P}{P} \times 100\%$$

where
RE = relative error (% error)
$\hat{P}$ = estimated or expected population
P = actual population observed

RESULTS

Linear Regression Analysis Results
The linear regression analysis used the NTL data for 2015, 2017, and 2019 together with the modeling samples. It produced the correlation coefficient (r), coefficient of determination (r²), and standard error of estimate, which describe the correlation between lit area and population. The average r² was 0.865, indicating that the lit area explains about 86% of the variation in the log-transformed population across the modeling samples. In addition, the 2017 and 2019 data generated the best models, as indicated by their highest r² and smallest standard error of estimate, with only very slight differences between the two. Because of this similarity, the two models produced allometric formulas with coefficients and exponents that were not much different (Table 2). This allows the 2017 model to estimate the population in 2019 using only the 2019 lit area data without substantial deviation from the estimates of the 2019 model itself, which in turn suggests that the 2019 model can be used for predictions in the coming years.

Population Estimation
Estimation was carried out using the allometric formulas that had been back-transformed from the regression formulas. Using these formulas, the lit area of each validation sample was converted into an estimated population to assess the accuracy of the model. A % error close to 0 indicates that the population estimated using the model represents the actual population. Figure 2 shows the % error of each validation sample in 2015, 2017, and 2019 for several cities and regencies. The % errors of the cities showed that the expected population was underestimated, i.e., smaller than the actual population observed. In contrast, the population estimates in the regencies were mostly overestimated, i.e., much larger than the actual population. The validation results also showed that the highest overestimate was in Pamekasan Regency in 2015, with a 156.69% error, while the largest underestimate, -75.59%, was in the City of Jakarta Barat in 2015.
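The sign convention used here, positive for overestimates and negative for underestimates, follows from taking the difference between the estimated and actual population relative to the actual population. A minimal sketch, with the function name `percent_error` and all population figures being hypothetical:

```python
def percent_error(estimated, actual):
    """Relative error in percent: positive = overestimate, negative = underestimate."""
    return (estimated - actual) / actual * 100.0


# Hypothetical populations, not values from the study.
over = percent_error(150_000, 100_000)   # estimate larger than actual
under = percent_error(25_000, 100_000)   # estimate smaller than actual
print(over, under)
```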

Figure 2. Graph of percent errors of the population estimation models per year by city/regency.

Table 3 groups the estimation results, both underestimated and overestimated, of the cities/regencies into their respective provinces. At the provincial level, the % errors were generally smaller than at the city/regency level because the samples used were at the city/regency level. Given that a province covers a larger area than a city/regency, these results suggest that predictions at a higher regional level also produce population estimates with higher accuracy. For the entire island of Java, the estimated population in 2015 had a 17.14% error, with a median of 42.13% and a mean of 50.44%.

By comparison, the model developed for 2017 was better in that it showed a higher correlation between lit area and population and a smaller standard error of estimate, meaning that the resulting estimates were more accurate. Table 4 shows that the estimates in 2017 were substantially different from those in 2015. Three provinces had underestimated population numbers in 2017, whereas only one province did in 2015. Jawa Barat is one of the provinces with very different estimation results, namely an overestimate in 2015 (1.10% error) but an underestimate in 2017 (-15.32% error), because the regression formulas (coefficients and exponents) derived for the two years differed considerably. However, most cities in the province tended to have underestimated results with large % errors, because many of them were densely populated and clustered. Figure 3 compares the estimates with the actual populations of the cities/regencies of Jawa Barat Province in 2017. It shows that most cities/regencies with underestimated results tended to have large differences between the estimated and actual populations. Although several cities/regencies showed overestimated results, their estimates were not much different from the actual numbers.
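One way to see why errors can shrink at the provincial or island level is that overestimates and underestimates of individual cities/regencies partly cancel when the populations are summed. The sketch below illustrates this with made-up numbers; the function names and all figures are hypothetical, and it is not a reproduction of Tables 3 to 5.

```python
def percent_error(estimated, actual):
    """Relative error in percent: positive = overestimate, negative = underestimate."""
    return (estimated - actual) / actual * 100.0


def aggregate_percent_error(estimates, actuals):
    """Percent error of the summed estimates against the summed actual values."""
    return percent_error(sum(estimates), sum(actuals))


# Hypothetical city/regency populations within one region.
estimates = [180_000, 60_000, 430_000]
actuals = [100_000, 150_000, 400_000]

unit_errors = [percent_error(e, a) for e, a in zip(estimates, actuals)]
mean_abs_unit_error = sum(abs(err) for err in unit_errors) / len(unit_errors)
region_error = aggregate_percent_error(estimates, actuals)
print(unit_errors, mean_abs_unit_error, region_error)
```

In this made-up case, the first unit is overestimated and the second underestimated by large margins, yet the regional total is off by only a few percent.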
Most of the provinces with underestimated results formed a graphic pattern, as shown in Figure 3.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLVI-4/W3-2021 Joint International Conference Geospatial Asia-Europe 2021 and GeoAdvances 2021, 5-6 October 2021, online

[...] areas (an independent variable) than the cities. This is especially because the cities tend to have small administrative areas. In addition, cities are more developed than their surroundings and experience population growth every year because urban economic and physical development attracts influxes of migrants. Figure 5 shows a graph comparing the lit areas in 2015, 2017, and 2019 in several cities/regencies. The lit area had already covered the cities entirely (100%) since 2015; thus, it could not increase in 2017 and 2019. In contrast, the lit area in the regencies continued to increase during the observation years. For these reasons, the population estimates in the cities were less accurate than in the regencies, as evidenced by the increasing % errors of the estimated populations in the cities in, for instance, the DKI Jakarta Province (see Tables 3, 4, and 5).

SUMMARY
Nighttime light (NTL) images such as VIIRS products can assist in providing population data by estimating the population based on the lit area. However, the derived population estimates have smaller percent errors for areas one regional level or more above that of the samples used. In this case, the samples were at the city/regency level, and the estimated populations of the provinces and Java Island have much smaller % errors; therefore, the province- or island-level results are better suited to giving an initial picture of the population than the city/regency-level estimates. The disadvantage of population estimation using the lit area from NTL data as the only independent variable is that, in the cities, the lit area does not increase from year to year, resulting in a bias that increases the % error of the estimates. To overcome this weakness, it is necessary to add ancillary variables, such as light intensity, that increase from year to year in each city/regency. Aside from adding more variables, another possible way to decrease errors is to apply a correction factor, i.e., a constant value, to bring the estimates closer to the actual population. However, the correction factors applied to each city/regency, province, and Java Island differ, and estimations at the city/regency level will require diverse correction factors.