PREDICTION OF POPULATION DISTRIBUTION IN 2030 USING THE INTEGRATION OF THE CA-ANN LAND COVER CHANGE METHOD WITH NUMERIC EXTRAPOLATION IN KARAWANG-BEKASI, INDONESIA

Abstract: The world population continues to grow every year. In 2022 it was recorded at 7,953,952,567, an increase of 1% from 2021. The increasing number of residents encourages the conversion of land use/land cover into settlements. In addition to changes in land use/land cover, the population also moves from one place to another, and this movement coincides with development in the destination area. Monitoring population distribution is necessary so that the population does not concentrate in a single area. Areas that are too densely populated face many problems, such as the emergence of slum buildings, congestion, and flooding. Such concentration results from poor regional planning, and the absence of predictions of future population distribution is one of the reasons why future urban planning is not optimal. This study aims to predict the distribution of the population in 2030 by integrating the prediction of land use/land cover in 2030 with the projection of the population in 2030. Land use/land cover in 2005 and 2010 is used as the primary data for predicting land use/land cover in 2030. The changes are predicted with the CA-ANN method, using altitude, distance from the river, and distance from the road as driving factors. The population in 2030 is projected by extrapolation using three mathematical equations: linear, exponential, and power. The predicted land use/land cover and the projected population in 2030 are then combined to predict the population distribution in 2030. The results show that by 2030 the settlement class increases to an area of 745.169 km², and the overall accuracy of the land cover model reaches 82%.
The largest population projection comes from the exponential equation, which reaches 13,024,668 in 2030. The 2030 population distribution models show no significant difference among the three projection models. It is hoped that this research can be a reference for policymakers in planning sustainable urban development.


INTRODUCTION
Population growth continues every year. In 2022, the world's population reached 7,953,952,567, an increase of 1% from 2021 (UN, 2022). The increasing number of residents drives changes in land use/land cover (Genet, 2020; Ojeda Olivares et al., 2019; Ouedraogo et al., 2010; Shukla et al., 2018). The larger the population, the greater the need for residential land, which is one of the driving factors of land cover change. Land cover change is a very complex phenomenon because of the complex relationships and interactions between land cover classes and the diversity and complexity of the factors that cause it, for example economic and transportation conditions (Rahmawati & Susilo, 2014). Several methods exist for modelling land cover change from related spatial information, one of which is Cellular Automata (CA). CA is widely applied to observe land cover dynamics because it can simulate various phenomena of land cover change, such as regional growth, urban sprawl, gentrification, housing growth, population dynamics, economic and employment activities, urbanization history, and land-use change (Ahmed, 2011).
One of the phenomena that can be extracted from land cover change information is population dynamics, including population distribution (Linard et al., 2011). Population distribution is closely related to population density in a region.
Population density is the ratio of the number of inhabitants to the area they occupy, expressed per unit area. Monitoring the distribution of the population is necessary so that the population does not concentrate in a single area. Overpopulated areas face many problems, such as the emergence of slum buildings (Das et al., 2021), traffic jams (Chang et al., 2021), and floods (Ferdous et al., 2020). The absence of predictions of future population distribution is one of the reasons why future urban planning is not optimal. Information about Indonesia's population distribution and population density is limited, especially in the study area, which motivates this research. This research is expected to provide a more accurate picture of the distribution of the population in the study area.
Several studies are related to this one. First, Tian et al. (2005) modelled population distribution from land cover data with China's population distribution model (CPDM), an asymmetric interpolation model. Second, Yoezer Tenzin (2019) modelled the distribution of the population using the Clark model. Third, Rahimi et al. (2021) modelled the spatiotemporal distribution of the population with taxi origin and destination data derived from taxi GPS records. Fourth, Wang et al. (2021) predicted population distribution by integrating LSTM and CA models at micro-spatiotemporal granularity. These previous studies show that no work has yet modelled and predicted future population distribution by integrating CA-ANN land cover change with numeric extrapolation. Thus, this study develops a prediction of population distribution in 2030 by integrating the CA-ANN land cover change model with a numeric extrapolation model of population change.
The study aims to determine the population distribution in 2030 by integrating the 2030 land cover prediction and the 2030 population projection. It focuses on population distribution using the CA-ANN land cover prediction method with numerical extrapolation of population projections. The results are expected to serve as material for policymakers in planning policies related to urban governance and population welfare, such as energy production planning (Ihsan, Sakti, et al., 2021; Ihsan, Sihotang, et al., 2021), water distribution (Zhao et al., 2019), and transportation (Wey & Huang, 2018).

Study Area
The locations of this study are Bekasi City, Bekasi Regency, and Karawang Regency, Indonesia. The area lies to the east of Jakarta and is a dynamic region that develops rapidly every year. It also contains many industrial and economic centers, which intensifies urbanization in the study area. A growing population increases the demand for land, causing changes in land cover. Therefore, we chose this region as our study area. The research area is shown in more detail in the accompanying study area map.

Data
This study uses two primary datasets: land cover data and population data. Land cover data are used to predict land cover in 2030, while population data are used to project the population in 2030. The study also uses a digital elevation model, road data, and river data. These three additional datasets serve as the driving factors of land cover change. The full specifications of the data used can be seen in Table 1.

Methods
Predicting the population distribution involves several stages. The first stage predicts land cover change for 2030. The second stage projects the population in 2030. The third stage integrates the predicted land cover in 2030 with the projected population in 2030. The processing stages are summarized in the flowchart in Figure 2.

CA-ANN Method for Land Cover Change Prediction
The CA-ANN model integrates two concepts: Cellular Automata (CA) and Artificial Neural Network (ANN). CA is a raster-based tool that can be used effectively for city modelling (Naghibi et al., 2016) and land cover change modelling (Stevens & Dragićević, 2007). The CA model is generally used to predict how past developments affect the future through local interactions among plots of land, as studied by Wu and Webster in 1998. In CA simulations, the result of each iteration affects the following iteration (Peruge et al., 2013). The input data for a CA model are grid-shaped data composed of many cells, so a cellular automata system is a collection of elements in the form of pixels. Cellular automata use three components. The first is the state, which gives the land cover class of a pixel. The second is the transition rules (T), which determine how states change; further rules can be added, for example rules based on slope or elevation. The third is the neighborhood (N), which expresses how a pixel is influenced by the pixels around it. In the case of land cover, a pixel is affected by the land cover of its neighboring pixels, so the number of pixels considered neighbors must be defined in the model.
Several experiments were conducted in this study to find the neighborhood size that produced the best kappa validation value; the results are shown in Table 3. Based on Table 3, the neighborhood size that produces the best kappa validation is 5×5, with a kappa of 0.807. Therefore, a 5×5 neighborhood is used in the land cover prediction model. To model land cover changes comprehensively, integration with other concepts is needed, one of which is Artificial Neural Network (ANN).
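The neighborhood concept described above can be illustrated with a short sketch. It is a minimal example under stated assumptions, not the implementation used in the study: the function name `neighborhood_counts`, the toy raster, and the class codes "S" (settlement) and "R" (rice field) are hypothetical. It simply counts the land cover classes found in a square window around one pixel, which is the kind of information a CA transition rule can draw on.

```python
from collections import Counter


def neighborhood_counts(grid, row, col, size=5):
    """Count land cover classes in a size x size window centred on (row, col).

    The centre pixel itself is excluded, and window cells that fall outside
    the raster are ignored.
    """
    half = size // 2
    counts = Counter()
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            if (r, c) == (row, col):
                continue  # skip the centre pixel
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
                counts[grid[r][c]] += 1
    return counts


# Hypothetical 5 x 5 raster: "S" = settlement, "R" = rice field.
toy_grid = [
    list("RRRRR"),
    list("RRSSR"),
    list("RSSSR"),
    list("RRSRR"),
    list("RRRRR"),
]

centre_5x5 = neighborhood_counts(toy_grid, 2, 2, size=5)
centre_3x3 = neighborhood_counts(toy_grid, 2, 2, size=3)
print(dict(centre_5x5), dict(centre_3x3))
```

Changing `size` from 3 to 5 or 7 changes how many surrounding pixels can influence the centre pixel, which is the parameter varied in the experiments reported in Table 3.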
An Artificial Neural Network (ANN) is a system based on the operation of a neural network and is used to recognize classification patterns (Gopal & Woodcock, 1996; Mas et al., 2004; Pijanowski et al., 2002). ANN applies a non-linear function and a weight to each driving factor within a network. The essential part of ANN is the weighting, which is determined by the network rather than by the user. The algorithm used is based on backpropagation learning, in which the weights are adjusted iteratively to minimize the error so that they are optimal for use in the prediction process. The advantage of ANN is that it simplifies the land cover change model when the transition rules integrate various parameters related to land change, such as accessibility, slope, and soil type (De Almeida & José, 2005; Li & Yeh, 2001). The parameters used as driving factors in this study can be seen in Table 4. Combinations of driving factors were then tested to identify the factors that most affect land cover change in the study area. The most influential combination of driving factors yields the highest validation kappa. The combinations of driving factors used in this study and the resulting validation kappa values can be seen in Table 5.
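The iterative weight adjustment described above can be sketched with a very small network. This is an illustrative example only, not the network configuration used in the study: the architecture (one hidden layer of three sigmoid units), the learning rate, the function name `train_mlp`, and the synthetic samples are all assumptions. Each sample holds three normalized driving factors (elevation, distance from the river, distance from the road), and the label marks whether the pixel changed to settlement.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_mlp(samples, labels, hidden=3, epochs=500, lr=0.5, seed=0):
    """Train a one-hidden-layer network with backpropagation.

    Returns the mean squared error recorded at each epoch.
    """
    rng = random.Random(seed)
    n_in = len(samples[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, y in zip(samples, labels):
            # Forward pass.
            h = [sigmoid(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j])
                 for j in range(hidden)]
            out = sigmoid(sum(w2[j] * h[j] for j in range(hidden)) + b2)
            total += (out - y) ** 2
            # Backward pass: error signals (the constant factor 2 of the
            # squared-error gradient is absorbed into the learning rate).
            d_out = (out - y) * out * (1.0 - out)
            d_h = [d_out * w2[j] * h[j] * (1.0 - h[j]) for j in range(hidden)]
            # Weight updates.
            for j in range(hidden):
                w2[j] -= lr * d_out * h[j]
                b1[j] -= lr * d_h[j]
                for i in range(n_in):
                    w1[j][i] -= lr * d_h[j] * x[i]
            b2 -= lr * d_out
        history.append(total / len(samples))
    return history


# Synthetic samples: (elevation, distance from river, distance from road),
# all scaled to 0..1. Label 1 = pixel changed to settlement (hypothetical).
samples = [
    (0.1, 0.6, 0.1), (0.2, 0.4, 0.2), (0.3, 0.7, 0.1), (0.2, 0.5, 0.3),
    (0.8, 0.3, 0.9), (0.7, 0.2, 0.8), (0.9, 0.6, 0.7), (0.6, 0.4, 0.9),
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

history = train_mlp(samples, labels)
print(history[0], history[-1])
```

The recorded error should fall as training proceeds, which is the behaviour the paragraph above attributes to backpropagation.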

Modeling of Population Projection 2030
Many methods can be used to project the population in 2030. The choice of method depends on the pattern of population growth in the region. This study investigates three extrapolation methods: linear, exponential, and power (Weerasekara et al., 2021). The equations are fitted to data from 2000 to 2019. The linear, exponential, and power equations used can be seen in equations 1, 2, and 3, where y is the population, x represents the year (time), and e is the base of the natural logarithm.
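The three extrapolation approaches can be sketched as follows. Because the equations themselves are not reproduced in this text, the sketch assumes the common textbook forms y = a + bx (linear), y = a·e^(bx) (exponential), and y = a·x^b (power), fitted by least squares after a logarithmic transform where needed. The function names and the synthetic series are hypothetical and are not the study's census data.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def fit_exponential(xs, ys):
    """Fit y = a * e^(b*x) through the linear form ln y = ln a + b*x."""
    ln_a, b = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(ln_a), b


def fit_power(xs, ys):
    """Fit y = a * x^b through the linear form ln y = ln a + b*ln x."""
    ln_a, b = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(ln_a), b


# Synthetic series for illustration only. x counts years from a base year
# and starts at 1 so that ln(x) is defined for the power model.
years = [1, 2, 3, 4, 5]
population = [100.0, 110.0, 121.0, 133.1, 146.41]  # grows 10% per step

a_lin, b_lin = fit_line(years, population)
a_exp, b_exp = fit_exponential(years, population)
a_pow, b_pow = fit_power(years, population)

target = 10  # year index to extrapolate to
projections = {
    "linear": a_lin + b_lin * target,
    "exponential": a_exp * math.exp(b_exp * target),
    "power": a_pow * target ** b_pow,
}
print(projections)
```

With real data, comparing each model's extrapolated value against an observed year outside the fitting period is one way to judge which form suits the growth pattern.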

Population Distribution Modelling
The population distribution model modifies the approach of An-min & Zong-jian (2002), in which the population is the population density in each land cover class multiplied by the area of that land cover, as given in equation 4. That model was modified by Nengsih (2015), who included the weight of each land cover in the calculation of population, as in equation 5. In this study, the number of inhabitants is calculated by modifying the previous two equations, which gives equation 6. The terms used in equations 4 to 6 are the total population, the population density of each type of land use, the total weight of the land cover classes, the weight of each land cover class, the total area of land cover, and the area of each type of land cover.
The weights of the land cover input data are obtained by ranking the difference between the social function and the economic function of the land (A Riqqi, 2008; Akhmad Riqqi et al., 2011). Table 6 shows the weight of the input data for each land cover. After the population in every land cover area is obtained, a grid of 30 x 30 meters is created to visualize the population density.
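The idea of distributing a projected total population over land cover classes by weight and area can be sketched as below. This is not the study's equation 6, whose exact form is not reproduced in this text; it assumes one plausible weighted allocation in which each class receives a share of the total population proportional to its weight multiplied by its area. The function name, class names, areas, and weights are all hypothetical.

```python
def allocate_population(total_population, areas, weights):
    """Share a total population among land cover classes.

    Assumed rule (illustrative): share_i = w_i * A_i / sum_j(w_j * A_j).
    Returns (population per class, density per class).
    """
    weighted = {name: areas[name] * weights[name] for name in areas}
    total_weighted = sum(weighted.values())
    population = {
        name: total_population * weighted[name] / total_weighted
        for name in areas
    }
    density = {
        name: (population[name] / areas[name] if areas[name] > 0 else 0.0)
        for name in areas
    }
    return population, density


# Hypothetical inputs: areas in km^2, unitless weights.
areas = {"settlement": 10.0, "rice field": 30.0, "forest": 60.0}
weights = {"settlement": 5.0, "rice field": 1.0, "forest": 0.0}

population, density = allocate_population(1000.0, areas, weights)
print(population)
print(density)
```

In this toy case the settlement class receives 625 of the 1,000 inhabitants despite covering only a tenth of the area, which mirrors the role the land cover weights play in concentrating population in residential classes.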

Land Cover Predictions for 2030
Land cover changes depend on the land use class of the previous period and on neighboring land use (neighborhood). The pattern of change used to predict land cover in 2030 is the land cover pattern of Bekasi City, Bekasi Regency, and Karawang Regency in 2005 and 2010. From the 2005 and 2010 land cover data, the potential for land cover change can be calculated with the Markov Chain model. In a Markov process, every state may either change from its present state to another state or remain in its present state, depending on the existing probability distribution. Such a change is called a transition, and the probability associated with it is called the transition probability. The resulting transition probability matrix can be seen in Table 7. The probabilities in the table are based on land cover changes in 2005-2010 and are taken as a picture of the likelihood of change in the future. In the matrix, the probability ranges from 0 to 1, where 0 indicates no chance of land cover change and 1 indicates that change is certain. In predicting land cover change in Bekasi City, Bekasi Regency, and Karawang Regency in 2030, the pattern of change observed in 2005-2010 is assumed to be the same as, or similar to, future land cover changes. The prediction for 2030 therefore uses the transition probabilities obtained previously. The results of the land cover prediction can be seen in Figure 4.
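The transition probabilities described above can be illustrated with a small sketch that cross-tabulates two land cover rasters of the same extent. It is a minimal example under stated assumptions: the function name `transition_matrix`, the class codes "R" (rice field) and "S" (settlement), and the two toy rasters are hypothetical and do not reproduce the values in Table 7.

```python
def transition_matrix(before, after, classes):
    """Estimate Markov transition probabilities between two rasters.

    probs[a][b] is the fraction of pixels that were class a in the earlier
    raster and are class b in the later raster. Each row sums to 1 when the
    class occurs in the earlier raster.
    """
    counts = {a: {b: 0 for b in classes} for a in classes}
    for row_before, row_after in zip(before, after):
        for cell_before, cell_after in zip(row_before, row_after):
            counts[cell_before][cell_after] += 1
    probs = {}
    for a in classes:
        total = sum(counts[a].values())
        probs[a] = {
            b: (counts[a][b] / total if total else 0.0) for b in classes
        }
    return probs


# Hypothetical rasters for an earlier and a later year.
earlier = ["RRRR", "RRSS"]
later = ["RRSS", "RSSS"]

probs = transition_matrix(earlier, later, classes=["R", "S"])
print(probs)
```

Here half of the rice field pixels become settlement and every settlement pixel stays settlement, so the "S" row of the matrix has a persistence probability of 1.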
The predicted area of each land cover class in Bekasi City, Bekasi Regency, and Karawang Regency in 2030, compared with the areas in 2005 and 2010, is shown in the graph in Figure 5.
The graph shows that, over time, the area of settlements continues to increase while forests, plantations, rice fields, and shrubs continue to decline. This indicates that, under the future projection, these land cover classes will continue to shrink.
When the 2020 extrapolation results are compared with existing data, the power equation has the smallest error, at -26,759.7 inhabitants. The linear equation has the second smallest error, at 36,805 inhabitants. The largest error occurs in the exponential equation, at 155,311 inhabitants. This shows that the power equation is the best model for projecting the 2020 population.

Population Distribution 2030
Density calculations using the mathematical model described in equation 6 produce a density value for each land cover in the Karawang-Bekasi region. The figure above shows the density calculations for the Karawang-Bekasi area in 2030 based on land cover input data and population projections. Red areas have a high population density. Land cover with a high population density is usually residential, followed by mixed gardens and moors. Conversely, blue areas have a lower population density. The visualization shows generally no significant difference among the three projections used, although the distribution from one of the projections is slightly less dense than those from the other two. The distribution pattern shows that the highest population density is in the City of Bekasi and then spreads to the surrounding area.
In this study, land cover was also predicted for 2020 in Bekasi City, Bekasi Regency, and Karawang Regency to be used as validation, which can be seen in Figure 8. The 2020 land cover prediction model was validated using a confusion matrix, which compares prediction results with existing values. The land cover predictions were compared with existing data from 2020 satellite imagery on Google Earth Pro (Google Earth Pro, 2022). In this test, 100 sample points were used, where the number of sample points was calculated using the Slovin formula with a confidence level of 90%. The sample points were distributed using the stratified random sampling method, in which sampling is based on existing strata. The strata in this study are the land cover classes: forests, fields/moors, plantations, rice fields, settlements, shrubs, and water bodies. The results of the accuracy tests on each classification can be seen in Table 8, Table 9, and Table 10.
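The choice of 100 sample points can be checked with a short calculation. The sketch assumes that the sampling formula referred to above is Slovin's formula, n = N / (1 + N·e²), and that the 90% level corresponds to a margin of error e = 0.1; both are assumptions, as is the population size N used in the example.

```python
import math


def slovin_sample_size(population_size, margin_of_error):
    """Slovin's formula n = N / (1 + N * e^2), rounded up to a whole sample."""
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)


# Hypothetical number of candidate pixels; the result is insensitive to N
# once N is large, because n approaches 1 / e^2.
n_points = slovin_sample_size(1_000_000, 0.1)
print(n_points)
```

For large N the formula approaches 1 / e², so a 10% margin of error yields about 100 points regardless of the exact raster size.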
The overall accuracy obtained from the random sampling process is 82%. A good overall accuracy for the use of satellite imagery is 80%-85%, so the value obtained falls within this range. The kappa coefficient obtained is 0.7941, which falls in the substantial class. User's accuracy ranged from 60% to 100%, while producer's accuracy ranged from 50% to 100%. Producer's accuracy indicates how accurately the producer classified objects on the map/image, expressed as a percentage of the total objects/pixels present. User's accuracy, also called consumer's accuracy, is defined as the level of correctness a user can expect when interpreting objects on the map/image generated by the producer. This study also obtained commission error and omission error scores. Commission error indicates pixels from other classes that were assigned to a particular class, causing an excess of pixels in that class. For example, the largest commission error occurs in the plantation class, at 50%, which means that as many as 6 points that do not belong to the plantation class were classified in it. Omission error is a misclassification in which a class lacks pixels because they were assigned to other classes. For example, the plantation class has the highest omission error, at 40%, which means that 4 points that should have been classified as plantation were not (Rwanga & Ndambuki, 2017; Sutanto, 2013).
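The accuracy measures discussed above can all be derived from a confusion matrix, as the following sketch shows. It is illustrative only: the function name `accuracy_report` and the two-class matrix are hypothetical. The matrix was constructed so that the plantation class shows a 50% commission error and a 40% omission error, matching the percentages quoted in the text, but it is not the study's actual table and its overall accuracy and kappa differ from the reported values.

```python
def accuracy_report(matrix, classes):
    """Compute accuracy measures from a confusion matrix.

    matrix[i][j] is the number of sample points mapped as classes[i]
    whose reference class is classes[j].
    """
    k = len(classes)
    n = sum(sum(row) for row in matrix)
    diagonal = [matrix[i][i] for i in range(k)]
    row_totals = [sum(matrix[i]) for i in range(k)]          # mapped totals
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    observed = sum(diagonal) / n                             # overall accuracy
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    users = {classes[i]: (diagonal[i] / row_totals[i] if row_totals[i] else 0.0)
             for i in range(k)}
    producers = {classes[i]: (diagonal[i] / col_totals[i] if col_totals[i] else 0.0)
                 for i in range(k)}
    return {
        "overall_accuracy": observed,
        "kappa": kappa,
        "users_accuracy": users,
        "producers_accuracy": producers,
        "commission_error": {c: 1 - v for c, v in users.items()},
        "omission_error": {c: 1 - v for c, v in producers.items()},
    }


# Hypothetical 100-point matrix. Rows: mapped class; columns: reference class.
classes = ["plantation", "other"]
matrix = [
    [6, 6],    # mapped as plantation: 6 correct, 6 actually "other"
    [4, 84],   # mapped as other: 4 actually plantation, 84 correct
]

report = accuracy_report(matrix, classes)
print(report)
```

Commission error is the complement of user's accuracy and omission error is the complement of producer's accuracy, which is why the two error types are reported per class alongside the overall figures.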

Comparison of the Distribution Model from Existing Population Data 2020 with the Result of Extrapolation 2020
Figure 9. Density calculations for the Karawang-Bekasi area in 2020 based on land cover input data and population projections.
The figure above shows the density calculations for the Karawang-Bekasi area in 2020 based on land cover input data and population projections. Red areas have a high population density. Land cover with a high population density is usually residential, followed by mixed gardens and moors. Conversely, blue areas have a lower population density. The visualization shows generally no significant difference between the population distribution based on prediction data and that based on existing data.
To examine in more detail the similarity between the distribution model from existing data and the 2020 population projections, scatterplots were made, as shown in Figure 10. The spread of points does not differ significantly among the three scatterplots, which indicates that the resulting population distributions do not differ significantly. Linear regression was then performed to quantify the similarity of the two datasets being compared. The slope (b) in all three scatterplots is close to 1, which indicates a high degree of similarity. The intercept (a) is close to 0, which likewise indicates that the two datasets are similar, and the coefficient of determination (R²) is 100%. On closer inspection, the distribution pattern closest to the existing data is the one based on the linear equation, whose slope (b) is closest to 1 and whose intercept (a) is closest to 0.
Red areas have a high population density, while blue areas have a lower population density. The higher the projected number of inhabitants, the higher the distribution value. The visualization shows a reasonably noticeable color difference between the three time periods, which indicates a significant increase in population between 2010-2020 and 2020-2030. In general, the increase in the number of residents in a region is directly proportional to changes in land cover. Land cover with a high population density is usually residential, followed by mixed gardens and moors. The two centers of population growth in the Karawang-Bekasi area are the Bekasi City Center in the west and the Cikarang Industrial Area in the south.
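The regression-based comparison of two distribution models can be sketched as follows. It is a minimal illustration, not the study's workflow: the function name `compare_grids` and the sample density values are hypothetical. Each list holds the density of the same grid cells under two models, and the fit reports the intercept (a), the slope (b), and the coefficient of determination (R²).

```python
def compare_grids(reference, modelled):
    """Fit modelled = a + b * reference by least squares.

    Returns (a, b, r_squared). Values of b near 1, a near 0, and
    r_squared near 1 indicate that the two grids are nearly identical.
    """
    n = len(reference)
    mean_x = sum(reference) / n
    mean_y = sum(modelled) / n
    sxx = sum((x - mean_x) ** 2 for x in reference)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reference, modelled))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(reference, modelled))
    ss_tot = sum((y - mean_y) ** 2 for y in modelled)
    r_squared = 1 - ss_res / ss_tot
    return a, b, r_squared


# Hypothetical per-cell densities from existing data and from a projection.
existing = [0.0, 12.5, 40.0, 62.5, 80.0]
projected = [0.0, 12.6, 40.3, 63.0, 80.6]

a, b, r_squared = compare_grids(existing, projected)
print(a, b, r_squared)
```

Because every cell of a projected grid is scaled from the same land cover pattern, a slope slightly above or below 1 mainly reflects the difference between the projected and the existing total population.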

Comparison of the 2020 Distribution Model with the World Population Model
The 2020 distribution model is compared with the global population distribution model (Gaughan et al., 2013) in Figure 12. The comparison is made image to image, to see how the spatial patterns in the two results correspond. In Figure 12, the population distribution from the global data shows that Bekasi City is the most populous region, which agrees with the distribution model produced in this study. The most noticeable difference is in the southern part of Karawang Regency, where the global population distribution model is not very detailed. In the model produced in this study, by contrast, the variation in population distribution in the south is clearly visible because the land cover data have a spatial resolution of 30 meters.

CONCLUSION
Several conclusions can be drawn from the results of this study. In the 2030 land cover prediction, the settlement class increases to an area of 745.169 km². The overall accuracy of the land cover model reached 82%. The 2030 population projections differ among the linear, exponential, and power equations, which give projected populations of 11,573,000; 13,024,668; and 12,746,158, respectively. In the 2030 population distribution model, the settlement class has the highest number of residents, with the largest value reaching 851.3, obtained from the exponential equation. The distribution is centered on the City of Bekasi and extends to the surrounding area.
This research still has some shortcomings. The land cover change prediction uses only road, elevation, and river data as driving factors; it could be improved by also considering slope and the economic centers in the region. The population projection uses three mathematical models, and other models could still be tested. Furthermore, neither the population projection nor the distribution method assigns a weight to each city, so every city is assumed to have the same value. This can be improved by adding a different weight for each city to the projections and to the distribution.