ASSESSING THE SIMILARITIES OF 3D SIMULATION MODEL OUTCOMES

The recent advancement of simulation modeling to represent phenomena in three spatial dimensions (3D) requires the development of techniques that allow comparison of modeling outputs in multiple dimensions. However, many existing techniques for map comparison in two spatial dimensions (2D) have been developed from non-spatial methods such as Cohen’s Kappa. These techniques have not yet been fully extended to deal with 3D map data or simulation outcomes. Therefore, the main objective of this study is to investigate the use of the 3D Accuracy and 3D Cohen’s Kappa coefficients to compare simulation model outputs in 3D. An existing agent-based model (ABM) of forest-fire smoke propagation was used to generate multiple scenarios for the purpose of comparing 3D simulation outputs. The 3D Accuracy and 3D Cohen’s Kappa measures produce meaningful values when comparing several scenarios with different 3D ABM outputs. This study emphasizes the need for the development of more advanced simulation output comparison techniques that operate in 3D and potentially over time (4D).


INTRODUCTION
The multidimensional characteristics of geospatial phenomena and the increase in three-dimensional (3D) and four-dimensional (4D) simulation modeling (Eshraghi et al. 2012; Gobron et al. 2011; Jjumba and Dragićević 2015; Narteau et al. 2009) indicate the necessity to develop improved methods for comparing simulation model outputs. With spatial analysis and modeling approaches advancing from two-dimensional (2D) into 3D space, and even 4D with space-time considerations, the methods required for map comparisons must evolve accordingly. However, non-spatial techniques built on concepts that are independent of the spatial dimensions can be applied to spatial scenarios, including 3D and 4D. Arguably the most common map comparison method is the simple overall map accuracy, where two maps are compared cell-by-cell to calculate the percent agreement (Van Genderen et al. 1978). Accuracy does not account for chance agreement: the two maps being compared are expected to have some degree of similarity simply due to the randomness of cell values. Cohen's Kappa coefficient (Cohen 1960, 1968), or simply Kappa, was developed to account for the chance agreement between two datasets, although it was originally developed for the field of psychology. Prior to Kappa, the mathematically identical Heidke Skill Score was developed for meteorology (Heidke 1926). However, Kappa is the commonly used name in remote sensing and GIScience studies when dealing with image or map comparisons. Since its development, Kappa has been widely adopted in remote sensing (Congalton and Mead 1986; Hudson 1987) and raster GIS for the comparison of raster-based images using paired cells, also known as pixels or raster cells, from two images or GIS data layers representing maps.
The Kappa coefficient has been further enhanced into multiple variants such as Fleiss' Kappa (Fleiss 1971), which has seen use in the comparison of more than two maps (Rogers et al. 2014), Fuzzy Kappa, which introduces spatial fuzzy logic into the calculation of Kappa (Hagen 2003), and many others designed specifically for land-use change comparisons, including Kappa Histogram, Kappa Location, Kappa Simulation, Kappa Transition, and Kappa Transition Location (van Vliet et al. 2011). The hierarchy of the various Kappa variants is shown in the accompanying figure. The techniques used by 2D map comparison methods such as Fuzzy Kappa are incompatible with 3D applications and therefore require major redevelopment before they can be used for comparisons of simulation outcomes in multiple spatial dimensions.
Both accuracy and Kappa measures are used for the comparison of 3D maps, typically generated from LiDAR data (Roberts et al. 2019; Wang et al. 2018); however, they have not yet been applied to the needs of geosimulation modeling. Therefore, the main objective of this study is to compare the similarity of 3D simulation modeling outcomes using 3D Accuracy and 3D Kappa, and then to extend the analysis theoretically towards possible spatial extensions of Kappa to 3D and 4D metrics. A previously developed 4D agent-based model (ABM) for forest fire smoke propagation (Smith and Dragićević 2019) was used to generate the simulation outcomes that serve as a case study for comparing 3D simulation outcomes with the 3D Accuracy and 3D Kappa measures.

METHODOLOGY
Similar to the resel or raster cell concept in 2D GIS (Tobler 1995), voxels are the smallest representation of 3D space, divided into a regular 3D lattice and typically forming cubes (Greene 1989; Jjumba and Dragićević 2016). Just as a raster GIS data layer represents an array of equal-size square cells, a voxel data layer represents an array of equal-size cubes. Each voxel contains a value, such as a class, that represents an attribute characterizing the 3D space that the voxel occupies; empty voxels must be represented by null values. In this study the 3D Kappa approach (Smith and Dragićević in press) has been used to compare simulation outcomes in 3D. The 3D Accuracy and 3D Kappa measures are designed for use with voxel representations of model simulation outputs. The 3D Accuracy is calculated by comparing two voxel datasets voxel-by-voxel and calculating the proportion of agreement between the pairs of voxels. The name 3D Accuracy is given to distinguish it from the cell-by-cell methods used by Accuracy in 2D map comparison. Cohen's Kappa is extended to 3D voxel datasets through 3D Kappa (K3D), which can be defined by the following equation:
$$K_{3D} = \frac{P_o - P_e}{1 - P_e} \qquad (1)$$
where Po represents the observed proportion of agreement, or observed similarity, among voxels in two 3D maps or simulation outcomes S, and Pe represents the expected proportion of agreement, or expected similarity, if voxel classes were assigned randomly to the voxels of both datasets. Po is equal to the 3D Accuracy of the two voxel datasets. The expected similarity Pe is defined by:
$$P_e = \sum_{i=1}^{n} P_{iA} \, P_{iB} \qquad (2)$$
where PiA and PiB represent the proportion of class i in simulation outcome SA and outcome SB respectively, for all n classes in the two 3D maps or simulation outcomes. To implement both 3D Accuracy and 3D Kappa for the 3D modeling outputs, a custom program in Python 2.7 was developed. This program accepts two voxel datasets converted into 3D arrays of the same extent.
It synchronously iterates over each entry of the arrays, recording relevant information, including the counts of each class per array, the number of matches, and the total number of voxels with data. Equations (1) and (2) are calculated and 3D Kappa is returned alongside Po as 3D Accuracy.
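The procedure described above can be sketched as follows. This is a minimal illustration, not the authors' program: it is written for Python 3 with NumPy rather than Python 2.7, it replaces the explicit iteration with vectorized counting, and the function name `kappa_3d`, the `NODATA` sentinel of -1 for null voxels, and the choice to compare only voxel pairs that carry data in both arrays are all assumptions.

```python
import numpy as np

NODATA = -1  # assumed sentinel for empty (null) voxels


def kappa_3d(a, b, nodata=NODATA):
    """Return (3D Accuracy, 3D Kappa) for two integer-class voxel arrays."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape:
        raise ValueError("voxel arrays must share the same extent")

    # Assumption: keep only voxel pairs that carry data in both arrays.
    valid = (a != nodata) & (b != nodata)
    va, vb = a[valid], b[valid]
    n = va.size
    if n == 0:
        raise ValueError("no voxels with data to compare")

    # Observed agreement Po, which is also the 3D Accuracy.
    po = np.count_nonzero(va == vb) / n

    # Expected agreement Pe = sum over classes of PiA * PiB (Equation 2).
    classes = np.union1d(va, vb)
    pa = np.array([np.count_nonzero(va == c) for c in classes]) / n
    pb = np.array([np.count_nonzero(vb == c) for c in classes]) / n
    pe = float(np.sum(pa * pb))

    # Equation 1; Kappa is undefined when Pe equals 1.
    kappa = (po - pe) / (1.0 - pe) if pe < 1.0 else float("nan")
    return po, kappa
```

Calling `kappa_3d(sim_a, sim_b)` on two classified voxel arrays of the same shape returns Po and K3D as a tuple; identical inputs containing more than one class give a Kappa of 1.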
This research study uses an existing ABM to simulate the propagation of forest fire smoke (Smith and Dragićević 2019) based on a hypothetical fire. The ABM operates with two primary agent types: fire agents and smoke agents. The fire agents represent fire locations and emit, at a predetermined rate, smoke agents that represent smoke in the atmosphere.
Smoke agents move using two methods, the first being carried by the wind, and the second through diffusion to increase dispersion. The local terrain acts as a lower limit for the elevation of smoke agents as they travel over, around, and through mountains and valleys. The simulation outcomes are voxelized, creating voxels with values representing the number of agents they contain, thus the concentration of smoke.
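The voxelization step can be illustrated with a short sketch that counts agents per voxel on a regular lattice. It is a hedged example only: the function `voxelize_agents` and its parameters (`positions`, `origin`, `voxel_size`, `shape`) are hypothetical names, and discarding agents that fall outside the lattice is an assumption rather than documented behaviour of the ABM.

```python
import numpy as np


def voxelize_agents(positions, origin, voxel_size, shape):
    """Count agents per voxel on a regular 3D lattice.

    positions:  (N, 3) array of x, y, z agent coordinates
    origin:     (3,) lower corner of the lattice
    voxel_size: edge length of each cubic voxel
    shape:      (nx, ny, nz) number of voxels along each axis
    """
    positions = np.asarray(positions, dtype=float).reshape(-1, 3)
    origin = np.asarray(origin, dtype=float)

    # Convert coordinates to integer voxel indices.
    idx = np.floor((positions - origin) / voxel_size).astype(int)

    # Assumption: agents outside the lattice are discarded.
    inside = np.all((idx >= 0) & (idx < np.asarray(shape)), axis=1)
    idx = idx[inside]

    counts = np.zeros(shape, dtype=int)
    # np.add.at accumulates correctly when several agents share a voxel.
    np.add.at(counts, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)
    return counts
```

The returned array holds the number of smoke agents in each voxel, which is the quantity the text uses as a proxy for smoke concentration.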

IMPLEMENTATION
The study area is located in British Columbia (BC), Canada, where fire and smoke are represented as agents that move through the mountains according to wind patterns and terrain. The 3D spatial extent of the study area is 96 km × 45 km × 10 km, with its southwest corner at 70 Mile House, BC, and its northeast corner approximately 5 km northeast of Mahood Lake, where the Murtle River joins the Clearwater River. The study area is approximately 100 km north of Kamloops, BC. The simulation uses a hypothetical forest fire that starts on August 10th, 2017 on the west coast of BC and burns for a total of 15 days. The model simulation outputs are obtained in voxel data format, recording the locations of smoke agents at a voxel resolution of 1 km. The ABM simulates the concentration of smoke agents, which is represented with five voxel classes. Voxel classes were determined by the number of smoke agents in each voxel, based on information from the Government of BC Air Quality Health Index (Government of British Columbia 2020). Class 0 represents no smoke, when voxels contain no smoke agents. Class 1 represents low smoke with 1-3 smoke agents, Class 2 represents moderate smoke with 4-6 smoke agents, Class 3 represents high smoke with 7-10 smoke agents, and Class 4 represents very high smoke with more than 10 smoke agents in the voxel.
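The class thresholds listed above can be applied to an array of per-voxel agent counts as in the following sketch. The thresholds come from the text; the names `CLASS_LOWER_BOUNDS` and `classify_smoke` are hypothetical, and the use of NumPy is an assumption about tooling.

```python
import numpy as np

# Lower bounds of Classes 1-4, from the thresholds stated in the text:
# 1-3 agents, 4-6 agents, 7-10 agents, more than 10 agents.
CLASS_LOWER_BOUNDS = np.array([1, 4, 7, 11])


def classify_smoke(counts):
    """Map per-voxel smoke agent counts to smoke classes 0-4."""
    counts = np.asarray(counts)
    # side="right" places a count equal to a bound in the higher class,
    # so 0 -> Class 0, 1 -> Class 1, 4 -> Class 2, 7 -> Class 3, 11 -> Class 4.
    return np.searchsorted(CLASS_LOWER_BOUNDS, counts, side="right")
```

The function accepts an array of any shape, so a full 3D count array can be classified in a single call.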
Three sets of comparisons were made between different output times and settings of the agent-based model. The first set of comparisons is composed of four simulation outcomes, allowing for six comparisons (Figure 2). The study area contains smoke on August 23rd at 3pm, 13 days and 15 hours into the simulation. The model is run twice to retrieve two simulation outcomes, Simulation 1 and Simulation 2. The model also has an option for the forest fire to move under the influence of the wind, which is used to retrieve Simulation 3. A second output time was selected for Simulation 4 on the 7th day, at 0:00am on August 17th, 2017, when smoke was also found to be present in the study area. Comparison 1 is between Simulation 1 and Simulation 2 of the static fire scenarios, where both simulations use the same initial conditions but different automatically generated random seeds that define the set of random values used. This comparison is used to assess the differences caused by the randomness in the model. The randomness originates from two sources: the random diffusion of smoke and the random order of the agents' movement. Comparison 2 is between Simulation 1 and Simulation 3, using the scenario with a fire that moves over space-time through the landscape of the study area. Comparison 3 is between Simulation 1 and Simulation 4, which is obtained from the alternate time step; the 3D terrain is identical, but the smoke will have experienced different wind conditions. Comparisons 4, 5, and 6 cover all other possible combinations of these four simulation outcomes.
Figure 2. Agent-based model forest-fire smoke propagation simulation outcomes in voxel data format used for six comparisons.
Smoke is present in the study area for an additional 9 hours after the output time used for Simulation 1, and a total of eight ABM simulation outputs have been generated from the two runs and are presented in Figure 3. The four outputs of one run of the model are compared against the four outputs of the second run, where the first outputs of each run are Simulation 1 and Simulation 2 from the first set of comparisons. The 16 comparisons between these two sets of four simulation outputs make up the second set of comparisons. These comparisons investigate a use case of 3D Kappa: temporally matching the patterns produced by two simulations.
The third and final set of simulation outcome comparisons is performed to explore the sensitivity of the agent-based model to the diffusion rate of the smoke. The distance used by the diffusion process is multiplied by the diffusion factor (DF), where DF = 2 is the most similar to the calibrated DF of the model. Six levels of diffusion are used, resulting in DF1, DF4, DF8, DF12, DF16, and DF20. This analysis helps assess the impact of the diffusion factor for the selected simulation runs. Due to the computer processing time required by the model, only one run for each diffusion factor was completed.

RESULTS AND DISCUSSION
The results in Table 1 show the 3D Accuracy and 3D Kappa values for the first set of comparisons. Due to the large number of voxels with no smoke in the outputs, the accuracy typically receives high values. The only comparison to receive a moderate or better 3D Kappa value is Comparison 1, between the two runs with the same initial conditions. This comparison was expected to receive much higher 3D Kappa values due to the very similar appearance of the smoke patterns. However, the 3D Accuracy of this comparison is only slightly higher than that of Comparisons 3 and 5, which obtained the lowest 3D Kappa values. These two comparisons involve Simulation 4 and, together with Comparison 6, achieve the three lowest 3D Kappa values and, with the exception of Comparison 1, the highest 3D Accuracy values. This is due to the large number of voxels with no smoke in Simulation 4, more than in the other simulations.
Figure 3. Simulation outputs S1 and S2 at initial stage and after three hours, six hours, and nine hours of smoke propagation.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B4-2021 XXIV ISPRS Congress (2021 edition)
The randomness incorporated in the ABM creates the differences between Simulation 1 and Simulation 2, presenting itself partially in the form of randomly distributed voxels within the 3D space (Figure 4). While this can make two 3D maps appear visually similar, few of these voxels with smoke in one map agree with their counterpart in the other. This leads to noticeably lower 3D Kappa values than expected and slightly lower 3D Accuracy values. In 2D spatial analysis techniques, a moving window is often used to incorporate nearby cells into the analysis of the central cell. Situations like this could be assessed more accurately if this type of moving window analysis were developed into a more advanced 3D map comparison method. Using a moving 3D window, nearby voxels could partially compensate for the disagreement between two paired voxels. A similar moving window technique was adopted for the calculation of Fuzzy Kappa (Hagen 2003) in 2D and could be expanded into a 3D Fuzzy Kappa that benefits from a 3D window space.
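To illustrate how scattered smoke voxels in a map dominated by the no-smoke class can combine high 3D Accuracy with low 3D Kappa, consider a hypothetical two-class case. The numbers are chosen for illustration only and are not taken from the study's results: both outcomes contain 95% no-smoke voxels and 5% smoke voxels, the maps agree on no-smoke in 90.5% of all voxels, and the smoke voxels coincide in only 0.5% of all voxels.

```latex
\begin{align*}
P_o &= 0.905 + 0.005 = 0.910 \\
P_e &= (0.95)(0.95) + (0.05)(0.05) = 0.9025 + 0.0025 = 0.905 \\
K_{3D} &= \frac{P_o - P_e}{1 - P_e} = \frac{0.910 - 0.905}{1 - 0.905} = \frac{0.005}{0.095} \approx 0.053
\end{align*}
```

Under these assumed proportions the 3D Accuracy is 0.91, yet the 3D Kappa is close to zero because almost all of the observed agreement is already expected by chance from the dominant no-smoke class.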

Figure 4. Randomly distributed smoke voxels highlighted in red from ABM outputs for simulations S1 and S2.
The values of 3D Accuracy and 3D Kappa for the second set of comparisons are presented in Table 2. Similar to the first set of comparisons, accuracy is very high, with a lowest value of 0.86. The accuracy values of the comparisons between outputs of the same time step are typically the highest in their row and column, although by small margins. However, 3D Kappa produces greater differences between the comparisons of outputs of the same time step and all other comparisons. Neither 3D Accuracy nor 3D Kappa includes a time component that can incorporate differences in time into the comparison of data.

CONCLUSIONS
This research study extends the two existing Accuracy and Kappa measures into 3D by proposing 3D Accuracy and 3D Kappa for comparing model simulation outcomes in 3D space. Critics of Kappa typically have concerns with how chance is accounted for (Foody 2020; Pontius and Millones 2011); however, for assessing models with built-in randomness, such as the one used in this study, this concern may be reduced. Kappa is also an often-required metric in many fields, such as remote sensing and GIS, and is applied despite the criticisms. While the 3D Accuracy and 3D Kappa metrics show potential for comparing model outputs dominated by a single voxel class, this study also revealed a greater potential to expand these methods to account for spatial neighborhoods or spatial autocorrelation in the 3D domain. The existing 2D Fuzzy Kappa method can be extended into 3D and even 4D Fuzzy Kappa counterparts to advance map and simulation outcome comparison into 3D and 4D while incorporating fuzzy logic (Smith and Dragićević in press). The proposed 3D Accuracy and 3D Kappa measures may be a first step towards enabling more comprehensive methods for comparing simulation outcomes and generated patterns in 3D.
While designed for 3D simulation outcomes and 3D map comparisons, such measures may also be useful in other fields, including 3D medical imaging, and in the training of 3D AI systems and the comparison of their predicted behaviour with real-world data. Based on the findings of this research study, more advanced methods for comparing 3D simulation outcomes and 3D patterns should be developed to facilitate the calibration and validation of simulation models such as 4D ABM or voxel automata operating in the 4D space-time domain.