CLASSIFICATION OF CONSTRUCTION FIRMS BASED ON BIM ROLES AND BIM LEVELS USING MACHINE LEARNING TECHNIQUES

Abstract: The application of Building Information Modelling (BIM) within the AEC industry has been evolving. With new developments and increasing capabilities, BIM is reshaping design, construction, operation, and maintenance processes and revolutionizing every function of the building life cycle. To maximize BIM benefits and take advantage of its capabilities, project stakeholders must define specific roles and responsibilities within projects and employ professionals with high levels of BIM proficiency, expertise, and knowledge. This study aims to classify construction firms into clusters based on their BIM capabilities, implementation, BIM levels, and the types of BIM roles they employ on construction projects. It further predicts and classifies BIM levels at the company level according to BIM usage. The methodology was based on a survey design using an online questionnaire distributed to AEC professionals in the industry. A total of 61 suitable responses were analysed using supervised and unsupervised machine learning algorithms, including Cluster Analysis, the k-Nearest Neighbours algorithm (k-NN), Random Forest, and Gradient Boosting. The findings showed that most firms were not applying BIM on their projects and that the majority of those that did were not using it to its full potential. Firms were further classified in terms of BIM levels and the types of BIM applications they use on construction projects. Random Forest had the highest performance and accuracy compared with k-NN and Gradient Boosting, although the performance and prediction results of all models were close to one another.


INTRODUCTION
Building Information Modelling (BIM) is a current trend within the construction industry. According to NIBS, "BIM is a digital representation of physical and functional characteristics of a facility. As such it serves as a shared knowledge resource for information about a facility forming a reliable basis for decisions during its lifecycle from inception onward" (NIBS, 2021). ISO 16757-1:2015 defines BIM as "construction of a model that contains the information about a building from all phases of the building life cycle" (ISO, 2015). BIM has been used in the construction industry for a long time, and its usage has recently increased within BIM-based construction networks; however, collaboration among stakeholders remains challenging (Oraee et al., 2019). BIM has gained significant momentum and evolved during the last decade, but its implementation and usage are not universally standardized and differ significantly depending on the context in which it is used (Hooper, 2015). Its applications for designers, architects, contractors, and owners/operators may differ because each group has its own intended purposes; however, all these groups share one commonality when using BIM: exchanging different types of project data and information. Depending on their size and their particular purposes of use, firms may employ experts for one or more roles such as BIM modeler, BIM specialist, and BIM coordinator. Davies et al. (2017) investigated the definitions of different BIM roles and found that many different names and variations exist, depending on the project and on country-specific standards and guidelines.
They describe BIM managers as those responsible for the overall creation, production, and implementation of BIM plans and protocols on projects, and BIM coordinators as those responsible for the exchange of models and information, working directly under the supervision of BIM managers. A BIM specialist is a person who does the modelling and has advanced knowledge of information management and BIM software ("BIM Manager, BIM Specialist and BIM Coordinator roles," n.d.). Ellis (2020) referred to the BIM specialist as a 'BIM Technician' and stated that they must "be able to understand not only how a building fits together but how a building can be modelled accurately in a BIM environment." Although these roles are crucial to the application and execution of BIM on construction projects, they are not static and continue to evolve with advancements in BIM (Akintola et al., 2017). BIM professionals act as change agents within the industry. Although these roles are accepted in the construction community, there is no universal, standard definition of their duties and responsibilities (Bosch-Sijtsema et al., 2019). In addition, as mentioned earlier, each firm uses BIM for different purposes, such as design, rendering, information sharing, clash detection, and geometry representation. This has resulted in the creation of different BIM levels within the BIM community. These levels start from BIM level 0 and include 2D, 3D, 4D, and 5D (Lorek, 2021). They are also referred to as maturity levels, which correspond to the level of information exchange in the construction sector. "2D BIM is a digital geometric model that constitutes an X and a Y axis associated with further information. 3D BIM is a digital geometric model that constitutes an X, Y and Z axis associated with further information" (Hamil, 2021). 4D and 5D, in turn, correspond to the scheduling and cost of a project, respectively.
Given these differences across the construction industry and the existence of different standards and specifications, it is imperative to evaluate and understand how construction firms are structured in terms of BIM usage and utilization. Doing so will increase the potential for collaboration among construction firms of different backgrounds and will help the industry build and develop a uniform platform for information sharing and the smooth transfer of knowledge across all segments of the construction sector. The remainder of this paper is organized as follows: purpose of the research, research methods and materials, results and discussion, and conclusion.

Purpose of Research
This research aims to categorize construction firms into groups with similar characteristics based on BIM-related attributes such as BIM usage, BIM roles, and BIM levels. Additionally, it classifies and predicts construction firms' BIM levels in terms of BIM roles (BIM specialist, BIM coordinator, and BIM manager) and overall BIM utilization. The purpose is to evaluate how similar the firms within each group are with respect to the contributing BIM factors, and to better understand and categorize the types of BIM levels being used in the construction sector.

Data Collection
The methodology was based on a survey design built around a structured online questionnaire. The survey was part of a larger study; for the purposes of this paper, portions of the original questionnaire data were used as secondary data. The questionnaire was distributed to members of the construction industry via e-mail and posted in various LinkedIn groups, which made it infeasible to calculate a response rate. All questions were closed-ended, multiple-choice questions. A total of 170 responses were received, but only 61 were deemed suitable and complete. These came from the following groups: 13 respondents from micro firms with fewer than 10 employees, 10 from small firms with between 10 and 50 employees, 6 from medium firms with between 50 and 100 employees, and 32 from large firms with over 100 employees, as shown in Figure 1. The profiles of respondents in terms of firm specialty and country of origin are shown in Tables 1 and 2, respectively. Over two-thirds of the respondents were from the USA, and the rest were from other countries across the globe. The 'Mix' category in Table 1 represents firms engaged in more than one specialty, e.g., performing both commercial and residential work.

Cluster Analysis
SPSS Modeler was used to group the construction firms into clusters in terms of their BIM attributes. TwoStep Cluster Analysis, an unsupervised machine learning technique, was deemed appropriate because the questionnaire data were binary and categorical. The data were divided into a training partition (70% of the data) and a test partition (30%).
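SPSS's TwoStep algorithm is not reproduced here. As a rough, swapped-in illustration of the same idea (grouping firms by binary BIM answers after a 70/30 split), the sketch below uses plain k-modes clustering with Hamming distance. The function names, the five-column encoding, and the `responses` rows are hypothetical and for illustration only; they are neither the study's data nor the authors' procedure.

```python
import random


def hamming(a, b):
    """Number of binary BIM attributes on which two firms' answers differ."""
    return sum(x != y for x, y in zip(a, b))


def mode_of(rows):
    """Column-wise majority answer of a group of binary rows (ties -> 1)."""
    n = len(rows)
    return tuple(int(2 * sum(col) >= n) for col in zip(*rows))


def assign(row, modes):
    """Index of the nearest cluster mode under Hamming distance."""
    return min(range(len(modes)), key=lambda j: hamming(row, modes[j]))


def k_modes(rows, k, iters=20, seed=0):
    """Plain k-modes: alternate nearest-mode assignment and mode update."""
    rng = random.Random(seed)
    modes = rng.sample(sorted(set(rows)), k)  # k distinct starting modes
    for _ in range(iters):
        labels = [assign(r, modes) for r in rows]
        new_modes = []
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            # An empty cluster keeps its previous mode.
            new_modes.append(mode_of(members) if members else modes[j])
        if new_modes == modes:  # assignments have stabilised
            break
        modes = new_modes
    return [assign(r, modes) for r in rows], modes


def split_70_30(rows, seed=0):
    """Shuffle, then cut into a 70% training and a 30% test partition."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical binary answers (BIM Levels, BIM Usage, Coordinator, Manager,
# Specialist); illustrative only, not survey data.
responses = [(1, 1, 1, 1, 1), (1, 1, 1, 1, 0), (0, 0, 0, 0, 0), (0, 1, 0, 0, 0),
             (1, 1, 0, 1, 0), (0, 0, 0, 0, 1), (1, 1, 1, 0, 1), (0, 0, 1, 0, 0),
             (1, 0, 0, 0, 0), (1, 1, 1, 1, 1)]
train, test = split_70_30(responses)
labels, modes = k_modes(train, k=2)
test_labels = [assign(r, modes) for r in test]  # test firms join the nearest mode
```

k-modes replaces cluster means with column-wise majority answers, so every cluster centre remains a valid yes/no profile; this is why it is a common choice for categorical survey data.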

Supervised Algorithms for Classification
Classification of BIM levels was conducted using several supervised machine learning techniques to model the associations and dependencies between the predicted target output and the input features. The following supervised algorithms were used and compared: (1) KNN, (2) Random Forest, and (3) Gradient Boosting.
As described by Vabalas et al. (2019), 10-fold cross-validation was used to divide the data set into 10 folds. In each of the 10 iterations, one fold was retained as the validation data and the remaining nine folds were used for training, so that each fold served exactly once as the validation set. The performance of the models was evaluated at both the class level and the overall level by comparing the area under the curve (AUC), F1-score, precision, accuracy, and specificity. These metrics are computed from the counts in the confusion matrix, where TP (true positives) is the number of positive instances correctly predicted as positive, FP (false positives) is the number of negative instances incorrectly predicted as positive, TN (true negatives) is the number of negative instances correctly predicted as negative, and FN (false negatives) is the number of positive instances incorrectly predicted as negative.
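For reference, the standard definitions of these metrics in terms of the confusion-matrix counts are given below, where FP counts negative instances predicted as positive and FN counts positive instances predicted as negative. They are assumed, not confirmed, to match the equations used in the study. For a multi-class target such as BIM Levels, each metric would typically be computed one class at a time (that class as positive, all others as negative) and then averaged; the averaging scheme is likewise an assumption.

```latex
\begin{aligned}
\text{Accuracy}    &= \frac{TP + TN}{TP + TN + FP + FN} \\[4pt]
\text{Precision}   &= \frac{TP}{TP + FP} \\[4pt]
\text{Recall}      &= \frac{TP}{TP + FN} \\[4pt]
\text{Specificity} &= \frac{TN}{TN + FP} \\[4pt]
F_1                &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{aligned}
```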

Firm Clusters
Five variables (BIM Levels, BIM Usage, BIM Coordinator, BIM Manager, and BIM Specialist) were used in the cluster analysis, resulting in four clusters, as shown in Figure 2. Figure 3 shows the predictor importance of each variable: BIM Levels had the highest importance and BIM Specialist the lowest among the five variables. The average silhouette value was 0.6, indicating good cohesion within clusters and good separation between them. The distributions show the number of firms assigned to each cluster; by default, clusters are sorted from left to right by size. According to Figure 2, cluster 1 contains the most firms (24) and cluster 3 the fewest (8). The four clusters indicate that contractors do not fully engage BIM in all their operations: only a limited number of firms use BIM to its full potential across all design and construction activities. This result is notable given that the sample consisted mostly of large firms, which would be expected to be the most advanced in BIM utilization.
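The silhouette value compares, for each firm, the mean distance a to the other members of its own cluster with the smallest mean distance b to the members of any other cluster, s = (b - a) / max(a, b), and averages s over all firms; values near 1 indicate compact, well-separated clusters. The stdlib sketch below assumes Hamming distance on binary answers. SPSS Modeler's internal distance measure for TwoStep may differ, and the toy rows are hypothetical.

```python
def hamming(a, b):
    """Number of binary BIM attributes on which two firms differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_silhouette(rows, labels, dist=hamming):
    """Average silhouette over all firms; requires at least two clusters.

    s(i) = (b - a) / max(a, b), where a is the mean distance from firm i to
    the other members of its cluster and b is the smallest mean distance from
    firm i to the members of any other cluster. Singleton clusters score 0.
    """
    labels = list(labels)
    clusters = sorted(set(labels))
    scores = []
    for i, (row, lab) in enumerate(zip(rows, labels)):
        own = [dist(row, r) for j, (r, l) in enumerate(zip(rows, labels))
               if l == lab and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist(row, r) for r, l in zip(rows, labels) if l == c) / labels.count(c)
            for c in clusters if c != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy example: two clusters of firms with three binary answers.
toy_rows = [(1, 1, 1), (1, 1, 0), (0, 0, 0), (0, 0, 1)]
toy_labels = [0, 0, 1, 1]
score = mean_silhouette(toy_rows, toy_labels)
```

In the toy example every firm has a = 1 and b = 2.5, so each s(i) equals 1.5 / 2.5 and the average is about 0.6.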

BIM Levels Classification
Four variables (BIM Usage, BIM Specialist, BIM Manager, and BIM Coordinator) were used as input variables, and BIM Levels was chosen as the target variable for the machine learning algorithms. The purpose was to classify the type of BIM usage among construction firms based on the BIM roles within those firms.
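To make the input/target setup concrete, the sketch below encodes each firm as a tuple of the four binary role/usage inputs, predicts its BIM level with a simple k-NN majority vote under Hamming distance, and scores it with 10-fold cross-validation in which each fold is held out exactly once. Only k-NN is shown; Random Forest and Gradient Boosting are omitted. The choice k = 3, the Hamming metric, the unshuffled contiguous folds, and labels such as "3D" are assumptions for illustration, not the study's settings.

```python
from collections import Counter


def hamming(a, b):
    """Number of binary inputs on which two firms differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_X, train_y, x, k=3):
    """Majority BIM level among the k training firms closest to x.

    Distance ties keep training order; vote ties go to the label that
    appears first among the nearest neighbours.
    """
    nearest = sorted(range(len(train_X)), key=lambda i: hamming(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most 1.

    The data are assumed to be shuffled beforehand.
    """
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_accuracy(X, y, k_folds=10, k_neighbours=3):
    """Hold each fold out once as validation data and train on the rest."""
    correct = 0
    for fold in k_fold_indices(len(X), k_folds):
        held_out = set(fold)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in fold:
            correct += knn_predict(train_X, train_y, X[i], k_neighbours) == y[i]
    return correct / len(X)
```

With 61 firms and 10 folds, one fold holds 7 firms and the other nine hold 6 each, so every firm is validated exactly once.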

Figure 4 shows the receiver operating characteristic (ROC) curves for the three algorithms, which plot the true positive rate against the false positive rate. The area under each curve is high and close to 1, indicating good performance by all three models. Although all models were close in terms of accuracy, Random Forest performed better than the other two. Its accuracy of 65% means that about 6.5 out of every 10 firms' BIM levels were predicted correctly. Although this accuracy is relatively low, it is attributed to the small sample size and is not considered representative of the models' performance. Similarly, recall values of around 65% mean that about 3.5 out of every 10 firms at a given BIM level were misclassified.
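The area under the ROC curve has a convenient rank interpretation: it equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counting one half. The sketch below uses that interpretation and computes a one-vs-rest AUC per BIM level from predicted class probabilities. The input format (one dict of class probabilities per firm) and the toy values are hypothetical, and the study's software may compute or average AUC differently.

```python
def auc(scores, is_positive):
    """Area under the ROC curve via its pairwise (Mann-Whitney) interpretation."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(prob_by_class, y_true):
    """Per-class AUC: class c is 'positive', every other BIM level is 'negative'."""
    return {
        c: auc([p[c] for p in prob_by_class], [t == c for t in y_true])
        for c in prob_by_class[0]
    }


# Hypothetical predicted probabilities for three firms (not study output).
probs = [{"2D": 0.7, "3D": 0.3}, {"2D": 0.2, "3D": 0.8}, {"2D": 0.6, "3D": 0.4}]
truth = ["2D", "3D", "2D"]
per_class = one_vs_rest_auc(probs, truth)
```

Because AUC depends only on how the scores rank firms, a model can have a high AUC while still misclassifying a sizeable share of firms at its chosen decision threshold, which is consistent with high ROC areas alongside a 65% accuracy.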
Figures 5, 6, and 7 show the confusion matrices for the three machine learning models. Confusion matrices are used to assess the performance of classification algorithms by comparing the actual target values with the predicted values; they also show the types of errors each model makes.
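A confusion matrix can be read with simple row and column arithmetic: the diagonal holds correct predictions, each row total is the number of firms actually at a BIM level, and each column total is the number of firms predicted at that level. The sketch below builds such a matrix and derives accuracy, per-class recall, and per-class precision from it. The five-firm example and its labels are hypothetical, not results from Figures 5 to 7.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual BIM levels, columns are predicted BIM levels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, pred)] for pred in labels] for actual in labels]


def accuracy(matrix):
    """Share of firms on the diagonal, i.e. predicted at their actual level."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def recall_by_class(matrix, labels):
    """Diagonal cell / row total: share of each actual level recovered."""
    return {lab: (row[i] / sum(row) if sum(row) else 0.0)
            for i, (lab, row) in enumerate(zip(labels, matrix))}


def precision_by_class(matrix, labels):
    """Diagonal cell / column total: share of each predicted level that is right."""
    result = {}
    for j, lab in enumerate(labels):
        col_total = sum(row[j] for row in matrix)
        result[lab] = matrix[j][j] / col_total if col_total else 0.0
    return result


# Hypothetical predictions for five firms (not the study's results).
levels = ["2D", "3D", "4D"]
actual = ["2D", "2D", "3D", "3D", "4D"]
predicted = ["2D", "3D", "3D", "3D", "2D"]
cm = confusion_matrix(actual, predicted, levels)
```

Off-diagonal cells show which levels are confused with which; in this toy case one 2D firm is predicted as 3D and the single 4D firm is predicted as 2D.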

CONCLUSION
Digitalization of the construction sector has been an ongoing topic of debate and discussion among researchers and the construction industry. Many research projects are devoted to the implementation of new technologies within the construction industry, in response to low productivity and a lack of technological advancement. The construction sector lags behind other sectors, such as manufacturing, in terms of innovation. BIM is one of the most important technological advances used by construction contractors and subcontractors because of the many benefits it provides. However, owing to the complexity of use, the financial cost of implementation and training, and compatibility issues with other industry partners, some contractors have faced challenges in adopting it. BIM levels and BIM roles are critical to the successful use of BIM on construction projects. The purpose of this paper was to classify construction contractors according to their BIM usage, BIM roles, and BIM levels. First, a cluster analysis was conducted, and the results showed that most firms in the sample did not take full advantage of what BIM has to offer: they mostly either did not use BIM or used it in a limited capacity. This suggests a lack of BIM knowledge and proficiency, even though most of the firms in the sample were large U.S. firms. Second, supervised machine learning techniques (Random Forest, Gradient Boosting, and KNN) were used to classify and predict the firms' BIM levels based on their BIM roles. Random Forest was found to be the most accurate of the three algorithms. Overall, the models performed almost equally; although accuracy was low, the results were consistent across all three models.
The limitations of this study included the small sample size, which affected the precision and accuracy of the algorithms. Moreover, BIM roles have different definitions, name variations, and responsibilities across projects, guidelines, and countries; this may have affected the responses and led to less accurate results from the machine learning algorithms. Additionally, most of the firms were large construction firms from the USA, so the results cannot be generalized to all sectors and countries. Future studies should use larger samples that include more small and medium-sized contractors and should also take firms' levels of BIM experience into consideration.