AUTOMATIC IDENTIFICATION OF ARCHAEOLOGICAL ARTIFACTS ON THE EXCAVATION SITE

Archaeological data is processed so that it can be easily accessed and used. The documentation is integrated into GIS tools in the post-excavation phase, and the final documents are completed on the basis of intermediate documents made on the excavation site. Time during the excavation is precious, and any time-consuming action is questioned so that a maximum of resources can be devoted to the most important tasks. Many tasks still rely on traditional paper entry. The aim of this study is to experiment with ways of automating the management of archaeological documents in order to minimize the repetition of recording acts of different kinds. Computer technology is gradually being integrated in the field through the use of tablets, but their use on the excavation site remains a strong constraint. The first task of this automation is to identify objects of interest during the excavation. To make this recognition of archaeological entities possible, it is necessary to ask where they are easily identifiable: in the excavation report. The hypothesis formulated here is that excavation reports can be used as a source for creating training datasets for neural networks dedicated to the recognition of archaeological objects on site. Two important steps in automating the integration of archaeological data are presented: the extraction of images and their semantics from excavation reports, and the training of a neural network to recognize archaeological entities at the site of their discovery. Extracting images and identifying what they contain makes it possible to enrich neural network training datasets. Tests were carried out to validate the ability of such tools to reliably identify particular objects. We chose a convolutional neural network (CNN), an image-based network, to test the ability to recognize archaeological objects in an excavation context. What is sought here is the ability of a neural network to recognize an object.


INTRODUCTION
The practice of archaeology generates knowledge. This knowledge is accumulated, ordered and recorded for each iteration that an excavation represents. Throughout this process, from the unearthing of remains to the publication of scientific works, the stages of documentation follow one another, methodically and painstakingly. Today, the data created is intended for digital media in order to facilitate the dissemination and reuse of the knowledge produced. How can archaeological knowledge be integrated effectively into contemporary working and dissemination media? The question of capturing information arises as soon as artefacts are discovered on the excavation site. Integrating digital tools at the earliest stage conflicts with existing methods. Data entry is still organized mainly on paper, while cameras are also present and already record certain data in digital format. The study presented here is preliminary to the use of videos during the excavation to automate a large number of data entry steps and, ultimately, the production of 3D models. Using a video made on an archaeological site requires automatic extraction methods. The information obtained would be the list of artifacts present and the ranges of the video in which they appear. This information would allow video ranges to be selected and a 3D model to be produced from the extracted images. Object tracking can also be used to prepare pre-filled data sheets. Do images from an archaeological site allow us to use a neural network? Can we easily retrieve training datasets for these tools? And finally, can we analyze a sequence of images from excavations and draw up a chronology of the objects? Before trying to integrate video cameras on an archaeological site, it is important to be able to answer these first questions.

STATE OF THE ART
The use of artificial intelligence in a field is associated with the fear that the specialists concerned will lose control of information. The aim here is to use contemporary tools to assist the archaeologist: they must allow the archaeologist to better master the creation of knowledge, of which he must remain the only one to master the content. Comparing human intelligence and artificial intelligence, (Julia, 2019) proposes to see human intelligence as "intelligence augmented" by artificial intelligence. Our capacities are increased at best. The main act of choosing, which leads to coherent production, remains human.

Archeology and AI, a question of ethics?
The knowledge produced by the field of archaeology is nothing other than data. When it comes to classifying artefacts by typology, it is quite logical to focus on image processing, and convolutional neural networks appear to be a solution for, for example, identifying ceramic shards engraved with a particular pattern (Chetouani et al., 2018). This is an obvious solution because the problem is similar to other areas where image processing is already in use. Image processing for object recognition is a specialist discipline distinct from the practice of archaeology. It is a delegation of competence from specialist to specialist at the post-excavation stage: the delegation took place chronologically before the AI was used. In this example, the use of AI is not perceived as a threat to the integrity of archaeological production. In the approach we are proposing, it is the archaeologist who is directly assisted by tools derived from AI, and this is where the ethical question arises. The introduction of new tools has an influence on the methods and therefore on the way the results are produced. What influence can the use of neural networks have on the results of archaeological practice? Is the knowledge resulting from this process still consciously produced by the archaeologist?

3D documentation of archaeology
This question already arises when datasets are acquired during excavation to produce 3D models by photogrammetry. 3D acquisition replaces drawing during excavation to document and record entire archaeological sites. The condition of a site can be preserved regardless of its evolution over time (Alby et al., 2013). The photogrammetric method saves time at the excavation site and reduces time-consuming activities while retaining quality information for the drawing (Alby, 2015). It is even possible to give on-site historical stakeholders the opportunity to manage the creation of their datasets themselves and to link them to data already produced according to reliable photogrammetric principles (Šedina et al., 2016). Excavation practice according to archaeologists' methods hybridizes with technological solutions, which provides flexibility in 3D documentation to take into account the hazards of the excavation and saves time on site (Alby et al., 2019).

Automated analysis of excavation reports
The automation that is of particular interest to us directly addresses the competence of archaeologists: what is the nature of the object discovered? The need to answer this question by means of an algorithm is motivated by the need to delegate digital data entry on the excavation site itself. Wouldn't the archaeologist be less distracted by capturing the digital data with a video camera than by having to manipulate a tablet?
This raises the question of the relevance of a formal description of archaeological objects using words and numbers. Is it possible to draw up a table of characteristics for these facts, as we would wish for our neural network? And above all, is it possible to automate this process with a computer? This is what Jean-Claude Gardin investigated in numerous scientific publications related to archaeology, which have been studied and synthesised in a single article (Dallas, 2016). According to this study, Gardin is "credited as the initiator of the first actual computer-based analysis of archaeological materials", and he particularly warns against the different interpretations and classifications possible from the same data. Thus, no matter how powerful the neural network developed, its result must be validated by a specialist, as our vector of characteristics will be difficult to transpose with integrity from one object to another. Gardin is not the only one to have considered automation in the field of archaeology. Many tasks can be improved and accelerated thanks to technology, especially the time-consuming sketching and photographing of the site (Gilboa et al., 2013). Like excavation reports, drawings take a long time to produce and depend very much on who produces them and on the archaeologist's understanding of the site and the object; the representation is therefore biased. Once again there is the problem of the transposability of characteristics from one object to another, but this time graphically.
In order to stay as close as possible to the characteristics of the excavation site itself and to the knowledge produced by the archaeologists, the excavation report seems to us to be the document offering the best compromise. It is produced by the archaeologist for his colleagues. The document reports on the research and bases its reasoning on images from the excavation site. Most excavation reports are in PDF format. (Sommerer, 2004) stresses the importance of recovering the structure of the original text despite the PDF format, which transforms it for printing. The source format of the data for our study must allow information to be extracted to generate training datasets for a neural network. The images taken from the excavation reports converted into HTML must keep their link with the semantics of the objects described. (Lopez et al., 2011) proposes a method for associating captions with images based on the list of figures and the proximity of text elements to the image.
Once the images with their related semantics have been obtained, the tool for the recognition of specific archaeological objects in these images must be implemented.

Convolutional neural network
Object detection in images is a widely covered subject, with a large number of methods that can be adapted to any type of object. The underlying goal here is therefore to evaluate the capabilities of a neural network. (Shin et al. 2016) proposes a method for implementing a neural network and details the steps involved. A convolutional neural network (CNN) is a tool for detecting patterns in an image. The main difficulty is to have an image with a resolution good enough to identify the object but, at the same time, as few pixels as possible at the input of the network. Figure 1 shows the steps of the CNN that lead to a compromise between the input images and the amount of data provided to the neural network. There are two main steps in the development of such a tool. First, the architecture of the network, that is, its structure, must be established. There is a multitude of typical network architectures, including the VGG16 architecture proposed by (Simonyan and Zisserman, 2014), which is presented as very efficient by (Shin et al. 2016). Figure 2 (Ferguson and Ak 2017) illustrates the progressive transformation of the input image into a vector containing 50 times less information. Multiple successive stages of convolution and pooling transform a 2D square image into a 1D vector connected to the "neurons".
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2020, 2020 XXIV ISPRS Congress (2020 edition)
Then, the network has to be trained: pictures from a training set designed beforehand for this purpose are passed to it so that it can update its internal coefficients and become able to detect an object in a picture. An important property is that a neural network can adapt to any object and any shape, as long as the training set is suited to this specific detection. The training dataset is a collection of labeled pictures of the object to be detected (i.e. with an attribute allowing the program to identify the picture as showing the object), as well as another set of counter-example pictures (i.e. not showing the object), which are also labeled. One of the great strengths of the neural network is that, with the right training dataset, the tool will be able to detect an archaeological artifact among several objects it has learned.
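The reduction of a 2D square image to a 1D vector by successive convolution and pooling stages can be illustrated with a minimal, pure-Python sketch. It is not the network used in the study: the 14x14 toy image, the single 3x3 averaging kernel and the two stages are assumptions chosen only to keep the example small, and a real CNN would add many learned filters and non-linear activations.

```python
def convolve2d(image, kernel):
    """Valid 2D cross-correlation of a square image with a square kernel."""
    n, k = len(image), len(kernel)
    out = n - k + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(k) for b in range(k))
            for j in range(out)
        ]
        for i in range(out)
    ]


def max_pool(image, size=2):
    """Non-overlapping max pooling; incomplete border windows are dropped."""
    n = len(image) // size
    return [
        [
            max(image[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(n)
        ]
        for i in range(n)
    ]


def feature_vector(image, kernel, stages=2):
    """Apply `stages` rounds of convolution + pooling, then flatten to 1D."""
    for _ in range(stages):
        image = max_pool(convolve2d(image, kernel))
    return [value for row in image for value in row]


# Hypothetical 14x14 image: a bright square on a dark background.
image = [[1.0 if 4 <= r <= 9 and 4 <= c <= 9 else 0.0 for c in range(14)]
         for r in range(14)]
# Hypothetical 3x3 averaging kernel (a trained network learns its kernels).
kernel = [[1.0 / 9.0] * 3 for _ in range(3)]

vector = feature_vector(image, kernel, stages=2)
print(len(image) * len(image), "input values ->", len(vector), "output values")
```

With these assumed sizes the side length goes 14, 12, 6, 4, 2, so 196 input values are reduced to a vector of 4 values that would feed the fully connected "neurons".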

PROJECT EXPECTATIONS
Two studies are brought together in this paper and are part of a single project, which aims to propose a tool to assist the archaeologist in managing the documentation of the objects discovered on the excavation site. The intention is to avoid burdening the archaeologist with digital tools such as tablets, which cannot be both reliable in excavation conditions and very effective. Making a tablet rugged enough for excavation reduces its capacity. Very rough field conditions and the need for efficient data entry cannot be reconciled in a tool that lasts over time. The time required to adapt to the tool, set against its reduced life span under these operating conditions, does not make it easily operational. The arguments for integrating IT as early as possible in the archaeological process are based on the versatility of the tablet as a tool. This versatility conflicts with the specialisation of the discipline, even more so when the robustness imposed by the environment is taken into account. Moreover, these rugged tools require user adaptation that slows down interaction considerably. The adaptation effort asked of the archaeologist is at odds with the life span and cost of such devices.
The permanent presence of video cameras on the excavation site would provide a quantity of picture data allowing the automatic capture of archaeological information and the creation of 3D data, provided that the videos can be analyzed. The production of photogrammetric datasets cannot be completely transferred to the archaeologist, just as the creation of archaeological data requires expertise. Neural networks are therefore of interest for classifying videos and extracting archaeological and photogrammetric information in a reliable way. We propose video assistance to digitize as many knowledge sources as possible. The camera is a robust, reliable, durable and ergonomic device. The data produced over the period of the excavation must be processed automatically, on a scale matching its quantity. Automation can be frightening, especially in a field of knowledge production. The use of video opens the way to a very large production of data, which can only be exploited automatically. The choice of this methodological orientation induces this dependence. It is therefore necessary to implement a reliable approach to the extraction of information. Automatic data management and interpretation are terms associated nowadays with artificial intelligence. How can knowledge management be entrusted to an automatic system? Only if the proposed solution demonstrates a very good quality of results can it be adopted. The quality of the data fed into a machine learning system is the basis of our project, and using excavation reports is our way of guaranteeing the quality of the information we want to produce. Figure 3 shows the relationship between the two stages of the study presented here. The automation of recognition is closely related to the automation of dataset generation.

Figure 3. Diagram of the presented study
While it is possible to create powerful image recognition tools, their interest is limited if the creation of the training dataset cannot be automated. The point is not to change a few practices linked to the archaeological approach; on the contrary, it is to let the archaeologist concentrate on his practice rather than being forced to think about how to digitally preserve as many traces as possible. Figure 4 shows the next phase of this study. Showing that neural networks can be used to recognize archaeological artifacts is interesting if the creation of neural networks can be automated for a maximum number of artifacts, in order to extract a maximum of information from the videos of archaeological excavations.

EXCAVATION REPORT ANALYSIS
The need to create training datasets in a field of knowledge generates the need for labelled images as close as possible to the picture data produced by archaeologists. The excavation report is a very good medium for the knowledge of the archaeological excavation. Each excavation has its report. It is a concentrate of knowledge in the field, applied specifically to the excavation, and it tends to contain exhaustive information about that excavation. Even if initially only images and descriptive semantic information are important, it should be kept in mind that the excavation report can provide much more information. Several levels of information extraction are sought: images and their semantics, and then everything related to the objects identified in the images. The picture extraction method presented here is imperative. It depends on the types of objects contained in an excavation report. It mainly uses the classic structure of excavation reports to associate the pictures in illustrations with the semantics directly related to them. The method developed is based on 3 steps for the exploitation of an excavation report.

Pre-processing to make a PDF file usable
A text has a formalism that follows rules that are part of the culture of writing. In order to retrieve information that makes maximum sense, the automatic analysis should start only once this organization has been recovered. The excavation reports put online are mainly in PDF format, a format that locks the content. It has been promoted as accessible to a very large number of people because of its design and the free distribution of its reader, even if it remains the product of a commercial company. Because it is intended for printing, its content has only a fixed form. The structure that allows this form is different from the structure of the original text. This is why the text obtained when extracting coherent text from a PDF file has lost part of its structure. The exploitation of information from a PDF must therefore follow a pre-processing step, which consists of a conversion to HTML, where the data is more structured, followed by a syntactic analysis so that the information is as accessible as possible. In this state, many of the peculiarities of the original text are accessible, although not all of its subtlety can be recovered. Recovering the structure of the paragraphs, had it been necessary, would have required further processing. This extraction still contains elements related to the creation of the original document. Several word-processed document structures can produce the same PDF file. Even if the appearance of a file can be imposed, by a graphic charter for example, the different ways of achieving it, depending on the people, services or organisations involved, produce a wide variety of excavation reports and therefore influence the ability to extract images and their semantics, which is one of the objectives of this study.
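The syntactic analysis that follows the conversion to HTML can be sketched with Python's standard html.parser module. This is only an illustration under stated assumptions: the paper does not name the converter or the parser used, the class name ReportParser and the sample markup are invented, and real converted reports are far less regular.

```python
from html.parser import HTMLParser


class ReportParser(HTMLParser):
    """Collect text blocks and <img> tags in document order."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # Keep every <img>, even without a source, so gaps stay visible.
        if tag == "img":
            self.elements.append(("img", dict(attrs).get("src")))

    def handle_data(self, data):
        # Normalize whitespace and skip empty text nodes.
        text = " ".join(data.split())
        if text:
            self.elements.append(("text", text))


def parse_report(html):
    parser = ReportParser()
    parser.feed(html)
    parser.close()
    return parser.elements


# Invented sample standing in for one page of a converted report.
sample = """
<html><body>
<p>Figure 1: example caption</p>
<img src="page7_img1.png">
<img>
<p>Body text of the report.</p>
</body></html>
"""

elements = parse_report(sample)
for kind, value in elements:
    print(kind, value)
```

Keeping text and pictures in a single ordered list preserves the order of appearance in the file, which is the information later needed to relate pictures to their captions.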

Picture Extraction
Once the PDF file is usable, each element composing it is referenced by tags. All the pictures we are interested in are therefore associated with an <img> tag, but not all pictures associated with these tags are of interest. A very large proportion of the pictures associated with these tags can be retrieved, even if some tags are not linked to any picture. The variable reliability of PDF file generation, together with the impossibility of strictly standardizing report writing practices, inevitably causes these inconsistencies. The inference used to sort the pictures of interest will be developed further. The illustrations that appear in an excavation report are of different kinds, which need to be detailed. The most obvious for this study are standalone pictures: an illustration that corresponds to a single picture is what is being sought. However, archaeological documentation also produces a very large number of drawings integrated in vector form, which are not extracted at all here. Hybrid illustrations also appear, related to the incomplete state of the objects discovered in the excavations: part of the illustration is a picture of the discovered fragment, inserted into a larger drawing representing a hypothetical reconstruction of the complete object. These hybrid illustrations contain a picture of interest that will be extracted, but it is part of a context that is more difficult to cover automatically. Excavation reports also contain illustrations in the form of pictures that are not useful for the study because they do not include elements from the excavation. The official context of an archaeological excavation means that official documents are present in digitised form, and maps are an integral part of these reports. The layout resulting from the graphic charter also generates pictures in the final report. Some illustrations are also made up of several juxtaposed pictures, such as maps from software working with tiling methods.
The pictures, reused as recovered, produce what appears to automatic picture extraction as an inconsistency. Such illustrations look coherent when the report is produced, but their structure makes them difficult to use. It is also tempting to filter the extracted pictures to reduce the number of unwanted ones. Using filters on a heterogeneous set of pictures requires thresholds determined as close as possible to the types of objects of interest. To be effective, these thresholds are established on the basis of cases that are difficult to generalize. As already mentioned, the sources and methods of producing excavation reports lead to very different forms of report, which calls for a treatment that incorporates as little specificity as possible in order to be most effective. The implementation of filters during tests demonstrated the need to stay close to more general treatments in order to retrieve the greatest number of relevant pictures.
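The trade-off between a general treatment and threshold-based filters can be made concrete with a small sketch. Everything here is hypothetical: the dictionary keys src, width and height, the file names and the 200-pixel threshold are invented for illustration and are not the filters tested in the study.

```python
def keep_picture(picture, min_width=0, min_height=0):
    """Return True when an extracted <img> entry should be kept."""
    # A tag that is not linked to any picture cannot be used at all.
    if not picture.get("src"):
        return False
    # Optional size thresholds; the defaults (0) keep every linked picture.
    return (picture.get("width", 0) >= min_width
            and picture.get("height", 0) >= min_height)


# Invented extraction result: a plan, a small artifact picture,
# an empty tag and a layout logo.
extracted = [
    {"src": "plan.png", "width": 1200, "height": 900},
    {"src": "sherd.png", "width": 180, "height": 150},
    {"src": None, "width": 0, "height": 0},
    {"src": "logo.png", "width": 90, "height": 40},
]

general = [p["src"] for p in extracted if keep_picture(p)]
strict = [p["src"] for p in extracted
          if keep_picture(p, min_width=200, min_height=200)]
print(general)
print(strict)
```

In this toy case the strict threshold removes the logo but also discards the small artifact picture, which is the kind of loss that argues for staying close to the general treatment.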
As an example of extracting pictures from an excavation report: in a document of 313 figures, 181 are rasters with the <img> tag (58%). Among these 181 rasters, 160 were extracted perfectly (88%), i.e. the extracted picture is consistent with the one observed in the report. The others were split, duplicated or incomplete because their vector part is missing. However, small omissions (legends and north arrows) are tolerable; only pictures with a major defect need to be removed. Thus, 174 pictures out of the original 181 (96%) are kept, which is a very satisfactory result.
The extraction of excavation report pictures is intended to be as automatic as possible; it is the first step towards obtaining the training dataset discussed below. For this dataset to be usable, the pictures must still be associated with the most precise semantic information possible, which can be found in the captions of these illustrations. The association of captions with pictures is treated in the following part of this section.
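The percentages quoted in this example follow directly from the counts. The short check below recomputes them; the variable names are ours, and the counts are those given in the text.

```python
def percentage(part, whole):
    """Share of `part` in `whole`, rounded to the nearest percent."""
    return round(100 * part / whole)


figures = 313   # figures in the report
rasters = 181   # figures stored as rasters with an <img> tag
perfect = 160   # rasters extracted without any defect
kept = 174      # rasters kept, minor defects tolerated

print(percentage(rasters, figures))  # share of raster figures
print(percentage(perfect, rasters))  # share extracted perfectly
print(percentage(kept, rasters))     # share kept for the dataset
print(rasters - kept)                # pictures removed for a major defect
```

The difference between 174 and 160 corresponds to the 14 pictures with tolerable omissions, and 7 pictures are removed for a major defect.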

Picture caption extraction
The purpose of this part is to produce a training dataset for a neural network. On the one hand, this requires relevant pictures extracted from the excavation report; on the other hand, it requires that they be labelled, and in such a document the label corresponds to the caption. Extracting captions for the pictures is also a means of filtering the pictures obtained, because a large part of the irrelevant pictures described above are not associated with captions. Figure 5 shows a page from an excavation report (Lefis, 2011) and the relationship between the pictures and their captions.
The pictures are extracted with their layout parameters, which makes it possible to know how they occupy the page, alone or grouped together for example. This layout has an influence on the caption: two pictures side by side can share a single caption, for example. As mentioned above, the structure of the text contained in the PDF file is not easily retrieved, so a caption spread over several lines will not be considered as a single entity. Scientific documents, of which excavation reports are a part, contain summary lists, including the list of figures used in this study to extract labelled pictures. It is easier to find a precise text within a large text than to find a caption starting from a figure. The list of figures thus proves to be a valuable shortcut. Figure 6, from the same excavation report (Lefis, 2011), is a great help in targeting picture searches in the report. In our study, the analysis of captions and their relationship to the pictures is divided into two cases: the case where the figure number is superimposed on the picture, which the layout information reveals, and the common case of a caption placed closest to the linked figure. The first case is frequent in enumerations where several pictures follow one another and the corresponding captions are grouped apart from the juxtaposed pictures. Since pictures illustrate what is said in the text, one or more references to them appear in the body of the paragraphs. These references contain text that is strongly similar to the caption and must not be mistaken for it. Conversely, some captions may not correspond to the list of figures, for example when the list is compiled manually. There are also cases where captions and pictures are inverted, certainly because of a discrepancy between the order of appearance in the file and the layout logic.
Consider the same example as for picture extraction, where the report yields 174 pictures. This report contains pictures of the first case, which is not true of other reports. This first case allows 117 captions to be associated with the 174 pictures. The second, more general case should allow most of the remaining ones to be associated. After handling the general case, 149 figures out of 174 were associated with a caption (86%). Among the 149 associations, three are wrong because of a problem of order or annotation, or because of parasitic captions. This represents 98% correct associations among those made (149) and 84% among the correctly extracted pictures (174). The association of captions with pictures still has to be generalized, as some excavation reports do not contain a list of figures and the first case is not very common. Some avenues for improvement have already been identified for a further iteration of this part of the study. The proximity between figures and captions has not yet been exploited; a statistical analysis of the document could be carried out and exploited, particularly with regard to the different styles used. In addition, the case where several pictures are linked to the same caption must be handled, as it is very common in some reports.
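One possible reading of the general case can be sketched as follows: each entry of the list of figures is searched for as an exact text block, and the caption found is attached to the nearest picture not yet used, with distance measured in order of appearance in the file rather than on the page. This is an assumption made for illustration, not the authors' implementation; the function name, the element tuples and the sample captions are invented, and the case of several pictures sharing one caption is not handled.

```python
def associate_captions(elements, figure_list):
    """Map picture sources to captions taken from the list of figures.

    elements: ordered ("img", src) and ("text", text) tuples.
    figure_list: caption strings as they appear in the list of figures.
    """
    image_positions = [i for i, (kind, src) in enumerate(elements)
                       if kind == "img" and src]
    used = set()
    associations = {}
    for caption in figure_list:
        # Exact match only: body sentences that merely quote the caption
        # are longer text blocks and are therefore ignored.
        positions = [i for i, (kind, value) in enumerate(elements)
                     if kind == "text" and value == caption]
        available = [i for i in image_positions if i not in used]
        if not positions or not available:
            continue
        caption_pos = positions[0]
        # Nearest picture in document order; on a tie the preceding
        # picture wins because it comes first in `available`.
        nearest = min(available, key=lambda i: abs(i - caption_pos))
        used.add(nearest)
        associations[elements[nearest][1]] = caption
    return associations


# Invented page content for illustration.
elements = [
    ("text", "As shown in Figure 1: general view of the trench, the pit is deep."),
    ("img", "a.png"),
    ("text", "Figure 1: general view of the trench"),
    ("img", "b.png"),
    ("text", "Figure 2: detail of the pit"),
    ("img", "logo.png"),
]
figure_list = ["Figure 1: general view of the trench",
               "Figure 2: detail of the pit"]

labelled = associate_captions(elements, figure_list)
print(labelled)
```

In this toy example the logo receives no caption and is therefore filtered out, which mirrors the filtering role of captions described above.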

NEURAL NETWORKS AND ARCHEOLOGY
This preliminary study should make it possible to validate the use of neural networks as a means of extracting archaeological knowledge from documents and of processing videos from archaeological sites. The ability to identify objects in pictures from archaeological excavations must therefore be evaluated. As the two studies were conducted in parallel, the training dataset for this part was compiled from pictures taken at a given archaeological site, in order to simulate the set of labelled pictures that an excavation report can produce.
A neural network, once trained and functional, is a black box: it is not possible to know the precise role of each of its constituent elements. To implement the convolutional neural network, the analogy with the perception of vertebrates was used. The input data must be pre-processed to reduce the resources required for such processing. Then a sequence of repeated convolution and pooling (Figure 2) makes it possible both to reduce the volume of data and to adapt it to the shape of the neural network input while maintaining the specificity of the information to be processed. The goal of training is to specialize the neural network: in the end, it must perform only one specific task, on data that it has never processed. This specialization is linked to the training applied to it on the basis of the chosen dataset. The resulting performance must allow it to be effective when applied to data other than the training set. The risk is that it only works correctly on the training dataset. It must therefore be made robust, which is the role of the dropout layer. This step randomly adds errors to the behavior of some components of the network and thus gives it the ability to adapt to data other than that of the training dataset. Implementing a neural network for a specific use does not require building a tool from scratch. It has been shown that a network pre-trained even on pictures that have nothing to do with the study is faster to train. This transfer learning technique is presented in (Shin et al. 2016).
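The effect of a dropout layer can be illustrated with a minimal sketch. It uses the common "inverted dropout" variant, which is an assumption on our part since the paper does not specify the variant or the rate: during training each activation is set to zero with probability `rate` and the survivors are rescaled, while at inference time the values pass through unchanged.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Inverted dropout applied to a list of activations."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    # At inference time, or with a zero rate, nothing is changed.
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    # Each unit survives with probability `keep`; survivors are rescaled
    # so that the expected value of each activation is preserved.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)        # fixed seed, only to make the demo repeatable
activations = [1.0] * 10      # hypothetical layer output
dropped = dropout(activations, rate=0.5, rng=rng)
print(dropped)
print(dropout(activations, rate=0.5, rng=rng, training=False))
```

Because a different random subset of units is silenced at each training step, the network cannot rely on any single component, which is what makes it more robust on data outside the training set.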
After various tests, the chosen architecture makes it possible to recognize the corresponding objects precisely. The pictures are sub-sampled before being processed by the neural network. It remains to adjust the parameters to optimize the results. Learning is done by successive modifications of the network parameters through backpropagation. The quality of the result depends on the number of epochs chosen. How, then, can we be sure that the network no longer evolves after the chosen number of epochs? The evolution of the loss function, as shown in Figure 7, helps to define the number of epochs.
[...] section that it appears in its entirety throughout the excavation. A contrast between the surrounding soil material and its interior is the main clue for its detection. It is recognized at 99% in this example. A picture that does not contain a silo, here a cactus, is not associated with a silo at a percentage of 99%. The use of neural networks for the identification of archaeological artifacts therefore seems possible. This conclusion has yet to be exploited operationally. A way must be found to extend the principle to all objects identifiable in the excavation reports. This generalisation is an integral part of the project but will not be dealt with here.
These results must also be used to exploit the flow of a video and the chronology of the filmed events that appear in it. Over the duration of the excavation, the objects take on several aspects, from their discovery to the last cleanings. They become increasingly visible over time and end up in their most identifiable form. The pictures in the excavation reports most often correspond to objects that have been fully excavated, so it becomes interesting to go back in time through the videos made on site and try to identify the discovery of each object. As can be seen on the right-hand side of Figure 8, the silo considered here is shown by its colour. The development, based on a characteristic of the histogram of the picture in which the object appears, is not yet complete. It does not identify a given silo but the oldest silo picture.
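Returning to the choice of the number of epochs from the evolution of the loss function, one simple way to read such a curve is to look for the epoch after which the loss no longer improves noticeably. The sketch below is an illustration only: the tolerance, the patience and the loss values are invented and are not taken from Figure 7.

```python
def epochs_until_plateau(losses, tolerance=0.01, patience=3):
    """Return the last epoch (1-based) that still improved the loss.

    An epoch counts as an improvement when it lowers the best loss by
    more than `tolerance`. The search stops after `patience` consecutive
    epochs without improvement; if that never happens, every epoch was
    still useful and the total number of epochs is returned.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(losses, start=1):
        if best - loss > tolerance:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch - patience
    return len(losses)


# Invented loss values, one per epoch.
losses = [0.90, 0.60, 0.40, 0.30, 0.295, 0.293, 0.292, 0.292]
print(epochs_until_plateau(losses))
```

With these invented values the loss stops improving after the fourth epoch, so training beyond that point would bring little benefit; in practice the thresholds would have to be tuned to the curve actually observed.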

CONCLUSION
The acquisition of pictures by video on an archaeological site can be envisaged as a support for collecting archaeological data, on condition that effective treatments are put in place to extract artefacts from these video sequences. The same videos can also provide the opportunity to generate 3D models from identified sequences. The evolution of image processing techniques shows that neural networks are very efficient for object recognition, and our study shows that archaeological artifacts can be recognized using these tools. To do so, the networks need to be trained, which implies generating specific training datasets for each type of artefact. Excavation reports, as concentrates of archaeological knowledge, contain the information sought: pictures and captions. The study presented also shows that the automatic extraction of artifact pictures and their captions is possible despite the structure of the report files. To create sufficiently large datasets for neural networks, this must be done on a large number of excavation reports. The use of a neural network has proven to be effective on pictures from archaeological excavations, and the results obtained are reliable. The management of artifacts of the same type during excavation was also presented, and initial results are promising for the monitoring of objects over time.