IMAGE-BASED DEEP LEARNING FOR RHEOLOGY DETERMINATION OF BINGHAM FLUIDS

In this work, a method to predict the rheological properties of ultrasonic gel, as a reference substance for cement paste, is presented. For this purpose, images are taken with a stereo camera system showing a mixing paddle that moves through ultrasonic gels of different consistency and sets them in motion. A digital elevation model (DEM) and a corresponding orthophoto are created from the image pairs using classical image matching and orthoprojection methods. These are used as inputs to a Convolutional Neural Network (CNN), which predicts the support points of a flow curve that conventionally have to be determined with a rheometer in the laboratory. A simple network architecture consisting of a small number of convolution layers is compared with a pre-trained ResNet-18 that is fine-tuned using gel images. In a second series of experiments, rheological parameters, which would otherwise have to be derived from the flow curve in a separate step, are determined directly from the images. In the third series of experiments, the influence of different factors is tested, such as the position of the cameras relative to the direction of paddle movement and the importance of the DEMs and orthophotos in the training. This paper shows that, with a suitable setup, the rheological properties of the ultrasonic gels can be predicted with satisfactory accuracy.


INTRODUCTION
Concrete is one of the most widely used building materials in the construction industry; in Germany alone, the production volume of ready-mixed concrete in 2020 was over 50 million m³ (BTB, 2021). To conserve natural resources, demolition material (concrete, masonry, rock) from the deconstruction of buildings can be reused in concrete production in the form of recycled aggregates. In today's construction industry, however, mineral resources tend to be downcycled rather than recycled, although according to the German Federal Statistical Office, the construction sector was responsible for more than 55 % of the total waste generated in Germany in 2019 (Statistisches Bundesamt (Destatis), 2021). One reason is that the properties of the demolition materials to be recycled vary widely and can negatively influence the new concrete to an unknown degree, in particular its fresh concrete properties (workability, tendency to segregate). To overcome this problem, more cement can be used when mixing concrete with recycled aggregates to compensate for these fluctuations. However, this solution is neither environmentally nor economically justifiable, because large amounts of CO₂ are released during cement production. If the properties of the fresh concrete could be determined during the mixing process itself, it would be possible to compensate for the unknown negative effects, which mainly result from the use of recycled materials and from variations in the mixing process, e.g. with suitable additives.
The ReCyCONtrol research project aims at achieving this goal. To this end, one part of the project is the observation of the concrete mixing process using optical stereo sensors. The aim is to evaluate the flow behaviour of the concrete in real time and thus to predict its rheological properties. Based on these data, the concrete can then be adjusted specifically towards the desired characteristics.
To describe the rheological properties of concrete, the Bingham model is used in most cases (Yahia et al., 2016). It is described by the plastic viscosity η in Pa·s and the yield stress τ₀ in Pa, which relate the shear rate γ̇ in 1/s to the shear stress τ in Pa by

$$\tau = \tau_0 + \eta\,\dot{\gamma}. \qquad (1)$$

The difference to a Newtonian fluid (such as water) is that a certain shear stress, the yield stress, must first be reached in order to set the substance in motion. Plastic viscosity, on the other hand, describes how flowable a substance is once it is in motion. While the plastic viscosity of fresh concrete can be roughly estimated visually by experts during the mixing process, this is not the case for the yield stress. In this work, Deep Learning is used to identify both quantities, as the network itself learns correlations between the values and the visual appearance without the need for expert knowledge.
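Equation (1) only holds while the material is flowing; below the yield stress there is no flow at all. The minimal NumPy sketch below makes this two-regime behaviour explicit. The function names and the example values (τ₀ = 20 Pa, η = 1.5 Pa·s) are hypothetical and chosen only for illustration; setting τ₀ = 0 recovers a Newtonian fluid.

```python
import numpy as np


def bingham_shear_stress(shear_rate, tau0, eta):
    """Shear stress tau = tau0 + eta * shear_rate of a flowing Bingham fluid, eq. (1).

    Only valid while the material flows (shear_rate > 0); at rest the stress can
    take any value between 0 and tau0, so it is not a function of the shear rate.
    """
    shear_rate = np.asarray(shear_rate, dtype=float)
    if np.any(shear_rate <= 0):
        raise ValueError("eq. (1) describes the flowing state; shear_rate must be > 0")
    return tau0 + eta * shear_rate


def bingham_shear_rate(tau, tau0, eta):
    """Inverse relation: no flow (shear rate 0) until the yield stress tau0 is exceeded."""
    tau = np.asarray(tau, dtype=float)
    return np.where(tau > tau0, (tau - tau0) / eta, 0.0)


# Hypothetical example values: tau0 = 20 Pa, eta = 1.5 Pa*s.
print(bingham_shear_stress(10.0, 20.0, 1.5))          # 35.0
print(bingham_shear_rate([15.0, 35.0], 20.0, 1.5))    # [ 0. 10.]
```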
As a first step and proof of concept, it is shown that it is possible to determine the rheological properties of Bingham fluids in motion from image data. Since working with fresh concrete always involves a lot of effort and the time available for working with it before it hardens is limited, this work uses coloured ultrasonic gel mixed with water to varying degrees, a commonly used reference substance for cement paste, which is one of the main components of concrete (Neroth and Vollenschaar, 2011). Clay granulate is used to simulate the aggregate in the concrete. Thus, more tests can be carried out in which different camera and illuminant positions are tested, especially with regard to 3D reconstruction.
In this paper, a learning procedure is presented that uses a digital elevation model (DEM) and an orthophoto, both derived by classical photogrammetric methods, to predict rheological properties. A regression is performed by a Convolutional Neural Network (CNN), which outputs specified values describing the rheological properties in each case. The main contribution of this paper is to demonstrate that rheological properties can, in principle, be extracted from images of Bingham fluids.
The paper next gives an overview of current research on automated processes in the construction industry. In section 3, the methodology used in this work is presented. The data set is described in section 4 including the data generation and the determination of the reference values. Finally, in section 5, the experiments are presented and the results are discussed. Section 6 concludes the paper and discusses future work.

RELATED WORKS
Automatic processes for the quality assurance of concrete are not widespread in the construction industry. Research on the automation of processes exists, among others, in the field of hardened concrete. (Song et al., 2020) present an approach that performs a semantic segmentation of concrete sections using Deep Learning to determine air voids. In (Coenen et al., 2021), a semi-supervised approach is used to segment the concrete aggregate, which can be used to determine the particle size distribution. However, an important research question is whether the quality of a concrete can be predicted before it is used.
On the construction site, the quality of the fresh concrete is currently checked with classical manual methods, such as the slump test. By taking a stereo image of the "concrete cake", (Tuan et al., 2021) show that the manual slump measurements can be replaced by image measurements. The authors argue that human involvement requires time and effort and has limited accuracy, whereas the proposed method, which does not require human involvement, has many potential advantages and ensures more reliable results. An automatic classification into slump cases via Deep Learning has also been successful. However, the concrete is only assessed once it has been mixed and must be disposed of if its quality is insufficient. Methods with which the quality can be determined before or during the mixing process are therefore of significant importance.
In (Chidiac and Mahmoodzadeh, 2009), the most common models from concrete technology and the literature for determining the plastic viscosity of concrete from its composition are reviewed. The results show that the models predict plastic viscosity in different ways and that their results vary as well. The most recent work predicting concrete properties from the composition is (Nguyen et al., 2020). Using a hybridisation of a Least Squares Support Vector Machine (LSSVM) and Particle Swarm Optimization (PSO), the yield stress is determined in addition to the plastic viscosity, with good performance. However, none of the models can handle recycled aggregates, as their composition can vary considerably, leading, for instance, to different water demands during the concrete mixing process.
In (Lau Hiu Hoong et al., 2020), the problem of fluctuations in recycled aggregates is addressed and a near real-time method is developed to determine the composition of the recycled aggregate. The methodology is based on image analysis using a CNN. With the help of this method, the recycled aggregates can be pre-sorted based on their quality. It might be possible to extend the models for rheology prediction with the previously mentioned method, but uncertainties in the prediction are still to be expected due to fluctuations in the mixing process.
Even if the desired mixing ratios are known, inaccurate ratios can still occur in the mixing process. In (Yang et al., 2020), a method is developed to monitor these ratios by taking images of the fresh concrete immediately after mixing and using Deep Learning to perform a multi-label classification that assigns the concrete to one of five classes of water-to-binder ratio and sand-to-aggregate ratio, and to one of three classes of nominal maximum particle size of the coarse aggregate.
Both (Li and An, 2014) and (Ding and An, 2018) evaluate images of concrete taken during the mixing process in a single-shaft mixer to assess its workability. While (Li and An, 2014) use classical image analysis methods and determine the slump flow value and the V-funnel flow time by extracting the shape of the concrete in the mixer using pre-defined features, (Ding and An, 2018) show that the two values can also be determined using Deep Learning. Once training data have been acquired, this method is thus independent of human experience and of insight into the appearance of concrete with different rheological properties.
The methods mentioned above usually only attempt to automate tests (e.g. the slump test) that are still carried out by hand on the construction site. It has been shown that the related values can also be determined to a certain degree during the mixing process. Our aim is to predict the rheological properties from information acquired during the mixing process by automatically determining a flow curve, from which values for the two rheological properties, yield stress and plastic viscosity, are derived. The method is evaluated against a reference flow curve measured in the laboratory. In principle, it can run in real time, so that suitable additives could be added directly during the mixing process.

Introduction
Our goal is to develop a method that can predict the rheological properties of a Bingham fluid serving as a reference substance for fresh concrete by observing the mixing process with imaging sensors. This method is later to be adapted to real fresh concrete; however, this second step is beyond the scope of this paper. Using Deep Learning, the properties of the substance that simulates fresh concrete are determined via a regression model. A CNN learns features from recorded images, which show a mixing paddle moving through the reference substance at different time steps. Since it is assumed that additional information can be derived from the three-dimensional surface shape of the flowing substance, we use a stereo camera system in our approach.
It is of course possible to feed the two stereo images to the network as input. The network then has to implicitly learn the 3D surface information assumed to be relevant for the task. In a second scenario, the 3D information is calculated beforehand and used directly as input. Since less information is provided in the first scenario and the rest (here the DEM) has to be learned, more training data would tend to be needed. Given that training data are scarce, the second scenario is investigated in this work. In a two-step procedure, the 3D surface of the substance is first calculated from the captured stereo images using classical image matching methods. In order to co-register the images and the DEM, an orthophoto is then calculated from the input images. Consequently, the input of the CNN is a two-channel image consisting of the orthophoto (as a greyscale image) and the DEM. In our work, photogrammetric processing is carried out using the commercial software Agisoft Metashape.
The outputs of the CNN are either the support points of the flow curve (9 output neurons in our case) or the regression parameters (2 output neurons), calculated from the flow curve, which are directly linked to two important quantities in concrete rheology, i.e. plastic viscosity and yield stress. Both approaches are tested because, although in most cases the main interest lies in the plastic viscosity and the yield stress, there are different methods to determine these from the flow curve; when the flow curve is predicted, any of these calculations can still be carried out individually. Moreover, the flow curve contains additional information that can be analysed by building material experts, if desired.
In the following subsections, the used CNN architectures are described as well as the training procedure.

Network architectures
We use two different network architectures in this work. The first one is a simple architecture, here referred to as Default CNN, which consists of 7 convolution layers with a 5x5 kernel and a stride of 2, each followed by batch normalisation and a ReLU activation function. A fully connected layer maps the features to the 9 or the 2 output neurons. In total, this architecture has 377 457 and 376 106 parameters, respectively, which are estimated from scratch in the training process. This architecture was chosen because it has relatively few parameters for a CNN, which is in itself a form of regularisation and can thus prevent overfitting to the training data.
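Two quantities of the Default CNN can be checked with simple arithmetic: how far seven stride-2 layers shrink the 360 px × 679 px input, and how many features enter the fully connected layer. The sketch below assumes a padding of 2 for the 5x5 kernels and assumes that the 9- and 2-output variants differ only in the final fully connected layer (weights plus biases); neither is stated explicitly in the text.

```python
def conv_out_size(n, kernel=5, stride=2, padding=2):
    """Output length of a strided convolution along one image axis."""
    return (n + 2 * padding - kernel) // stride + 1


def size_after_layers(n, layers=7, **conv_kwargs):
    """Apply the output-size formula once per convolution layer."""
    for _ in range(layers):
        n = conv_out_size(n, **conv_kwargs)
    return n


# Assumption: padding = 2 for the 5x5 kernels.
height, width = size_after_layers(360), size_after_layers(679)

# Assumption: the two variants differ only in the fully connected layer, so
# (F * 9 + 9) - (F * 2 + 2) = 7 * (F + 1) for F input features.
diff = 377_457 - 376_106
assert diff % 7 == 0
n_features = diff // 7 - 1

print(height, width, n_features)  # 3 6 192
```

Since 192 is not a multiple of 3 × 6 = 18, the features entering the fully connected layer are, under this padding assumption, probably not a plain flattening of the last feature map (a global pooling step would be one possibility). This is an inference from the parameter counts only.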
Another way to deal with limited amounts of training data is to use a pre-trained network. The weights of such networks serve as good initial values and are adapted during re-training. For a first set of experiments, an existing CNN architecture pre-trained on ImageNet (Deng et al., 2009) is used. However, the first and the last layer must be trained from a random initialisation, since a different number of input channels and output neurons is needed. We are aware that the weights were pre-trained on RGB imagery, but we assume that they can still support the training process. For this reason, too, all layers are re-trained rather than only fine-tuning the last layers.
In preliminary experiments, different architectures were tested, all with a small number of layers (ResNet-18 (He et al., 2016), VGG-11 (Simonyan and Zisserman, 2015), GoogLeNet (Szegedy et al., 2015) and AlexNet (Krizhevsky et al., 2012)). All of these architectures have shown very similar performance, which is why only the experiments with one architecture are presented in this paper; we have chosen ResNet-18. Having a 2-channel input and an output layer consisting of 9 neurons, the ResNet-18 architecture has a total of 11 177 993 trainable parameters, which is roughly 30 times more than what the Default CNN has.
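The stated parameter count of the adapted ResNet-18 can be reproduced by summing the trainable parameters of its layers. The sketch below follows the torchvision-style layout of ResNet-18 (convolutions without bias, batch normalisation with two trainable parameters per channel, a 1x1 projection shortcut where the shape changes); the function names are illustrative.

```python
def basic_block_params(c_in, c_out, stride):
    """Trainable parameters of a ResNet BasicBlock (torchvision-style layout)."""
    p = 9 * c_in * c_out + 2 * c_out      # 3x3 conv (no bias) + batch norm (gamma, beta)
    p += 9 * c_out * c_out + 2 * c_out    # second 3x3 conv + batch norm
    if stride != 1 or c_in != c_out:      # 1x1 projection shortcut + batch norm
        p += c_in * c_out + 2 * c_out
    return p


def resnet18_params(in_channels=3, outputs=1000):
    """Total trainable parameters of ResNet-18 for a given input/output size."""
    p = 7 * 7 * in_channels * 64 + 2 * 64   # stem: 7x7 conv (no bias) + batch norm
    c_in = 64
    for c_out, stride in [(64, 1), (128, 2), (256, 2), (512, 2)]:
        p += basic_block_params(c_in, c_out, stride)  # first block of the stage
        p += basic_block_params(c_out, c_out, 1)      # second block of the stage
        c_in = c_out
    return p + 512 * outputs + outputs      # fully connected head with bias


print(resnet18_params())                    # 11689512 (standard ImageNet model)
print(resnet18_params(2, 9))                # 11177993 (2-channel input, 9 outputs)
print(resnet18_params(2, 9) / 377_457)      # about 29.6, i.e. roughly 30 times
```

In torchvision, this adaptation would correspond to replacing `model.conv1` with `nn.Conv2d(2, 64, kernel_size=7, stride=2, padding=3, bias=False)` and `model.fc` with `nn.Linear(512, 9)`; the text does not say which framework was actually used.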

Training
The network is trained as follows. The network weights ω are iteratively adjusted by minimising an error function E(ω); in this work, the mean squared error (MSE) is used. For a batch of N samples, the squared differences between all K outputs ŷ_n^k of a sample x_n and the corresponding true values y_n^k are averaged over all outputs of all samples. This gives

$$E(\omega) = \frac{1}{NK}\sum_{n=1}^{N}\sum_{k=1}^{K}\left(\hat{y}_n^k - y_n^k\right)^2. \qquad (2)$$

As a further tool to prevent overfitting, in addition to those mentioned so far, weight decay is used. It is added to the error function with a factor λ as a penalty term for large weights, leading to

$$\tilde{E}(\omega) = E(\omega) + \lambda\,\omega^\top\omega. \qquad (3)$$

DATA GENERATION

Image acquisition
Ultrasonic gel is regularly used as a reference substance for cement paste (one of the main components of concrete) to investigate rheological properties (Haist et al., 2020). A major advantage of ultrasonic gel is that, unlike cement paste, its rheological properties remain constant over a longer period of time, which means it can be used several times for experiments. In our experimental set-up, we currently use grey-coloured gel mixed with water to varying degrees. To simulate the aggregate in the concrete, clay granulate is added. The mixture is then filled into a horizontal channel and set in motion with a mixing paddle. This process is recorded in several runs with a stereo camera system. A graphical illustration of the set-up can be seen in Figure 1. The cameras used in this work are Grasshopper 3 USB cameras with a focal length of 8 mm.
The 1920 px × 1200 px images were acquired at a frequency of 30 Hz. A trigger signal was sent from one camera to the other whenever a picture was taken. To check the synchronicity, an LED array was used in which 20 LEDs display a time in millisecond intervals. The combination of LEDs switched on and off generated a time stamp, which confirmed that the two images of a stereo pair were indeed captured at the same instant. One challenge was placing the cameras together with the illumination so that the ultrasonic gel would not reflect too strongly, which would lead to errors in the 3D reconstruction and beyond. For this reason, the channel was completely darkened, and two lights were placed so that they did not shine directly on the gel.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2022 XXIV ISPRS Congress (2022 edition), 6-11 June 2022, Nice, France
For each of the 21 mixed gel samples, we recorded at least 14 image sequences: in 7 sequences the mixing paddle moved towards the cameras, and in the other 7 sequences it moved away from them. Each sequence consists of 40 image pairs. A DEM with a corresponding orthophoto of size 360 px × 679 px was then created from each stereo pair and smoothed over 5 time steps within each image sequence using a box filter. Because the datum of the 3D coordinates was defined using stable markers on the edge of the channel, the results of all pairs refer to the same coordinate system. The occlusions caused by the mixing paddle were a challenge for the reconstruction. Fig. 2 shows a pair of images, and Fig. 3 shows the depth maps and the orthophotos of three different samples of the ultrasonic gel. The images also show the erroneous reconstruction in the orthophoto and the blurred depth in the depth image behind the mixing paddle caused by occlusion.
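The temporal smoothing can be pictured as a moving average along the time axis of a sequence. The sketch below is a minimal version under stated assumptions: the window of 5 frames is centred on each frame, and at the start and end of a sequence only the available frames are averaged. The edge handling and the function name are hypothetical; the same function could be applied to the orthophotos if they are smoothed as well.

```python
import numpy as np


def smooth_dem_sequence(dems, window=5):
    """Temporal box filter over a sequence of DEMs with shape (T, H, W).

    Each frame is replaced by the mean of the `window` frames centred on it;
    near the sequence boundaries only the available frames are averaged
    (assumed edge handling).
    """
    dems = np.asarray(dems, dtype=float)
    t = dems.shape[0]
    half = window // 2
    out = np.empty_like(dems)
    for i in range(t):
        lo, hi = max(0, i - half), min(t, i + half + 1)
        out[i] = dems[lo:hi].mean(axis=0)
    return out


# Toy sequence of 40 frames whose elevation equals the frame index.
seq = np.arange(40.0)[:, None, None] * np.ones((40, 2, 3))
smoothed = smooth_dem_sequence(seq)
print(smoothed[10, 0, 0], smoothed[0, 0, 0])   # 10.0 1.0
```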

Reference values
As a reference, the different gel samples are placed in a viscometer in a laboratory, and a flow curve is determined through nine support points. The measurements were carried out with the Viskomat NT from Schleibinger. 370 ml of each sample is placed in a rotatable round vessel, and a paddle is immersed. The torque (Nmm) needed to obtain a certain rotational speed of the vessel (min⁻¹) is then measured at 9 different rotational speeds (50, 40, 30, 20, 10, 8, 6, 4 and 2 min⁻¹) for 30 s each, starting at the highest speed. All samples were intended to be measured twice, but due to device failures, only one measurement was usable in eight cases. Figure 4 shows the flow curves of the 21 samples and indicates the support points. Noticeably, flow curves with similar values cluster in the middle of the value range, while the gaps between the samples are larger towards the edges of the range. From these measurements, the yield stress and the plastic viscosity can be derived.
For the experiments in which the yield stress and the plastic viscosity are to be determined directly, the related reference values are calculated from the flow curve using the method of (Haist et al., 2020). Here, a linear regression is calculated from the almost linear part of the flow curve (between 20 and 50 min⁻¹). By multiplying with previously determined factors, the slope m in Nmm·min and the y-axis intercept n in Nmm of the regression line can be mapped to the plastic viscosity in Pa·s and the yield stress in Pa.
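The regression step can be sketched with a least-squares line fit over the four support points between 20 and 50 min⁻¹. The same function could be applied to predicted support points, which is how regression parameters derived from experiment 1 would be compared with those predicted directly. The conversion factors `k_eta` and `k_tau0` are hypothetical placeholders; their actual values depend on the device and geometry and are not given here.

```python
import numpy as np

# Rotational speeds (1/min) of the nine support points, in measurement order.
SPEEDS = np.array([50, 40, 30, 20, 10, 8, 6, 4, 2], dtype=float)


def regression_parameters(torques, lo=20.0, hi=50.0):
    """Fit torque = m * speed + n on the almost linear part of the flow curve."""
    torques = np.asarray(torques, dtype=float)
    mask = (SPEEDS >= lo) & (SPEEDS <= hi)      # selects 50, 40, 30 and 20 1/min
    m, n = np.polyfit(SPEEDS[mask], torques[mask], deg=1)  # slope first, then intercept
    return m, n


def to_bingham(m, n, k_eta, k_tau0):
    """Map slope and intercept to plastic viscosity and yield stress.

    k_eta and k_tau0 are hypothetical conversion factors.
    """
    return k_eta * m, k_tau0 * n


# Synthetic flow curve: linear above 20 1/min, deviating at low speeds.
torques = 1.2 * SPEEDS + 30.0
torques[4:] += 5.0                              # low-speed points do not affect the fit
m, n = regression_parameters(torques)
print(round(m, 6), round(n, 6))                 # 1.2 30.0
```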
Table 1 lists the values of the support points and the corresponding regression parameters. The difference to a second measurement, if one exists, is given in brackets. It can be seen that the two measurements of the same sample sometimes differed considerably (e.g. sample 19 and sample 23). Especially for the eight samples with only one measurement, it is difficult to assess how much they can be trusted.

Rationale
In the experiments, the following aspects are investigated. First, a comparison between the Default CNN architecture with a random initialisation and the pre-trained ResNet-18 is carried out for the prediction of the nine support points of the flow curve.
In the second series of experiments, the regression parameters of the flow curve, which are directly linked to plastic viscosity and yield stress, are determined directly. These can then be compared with the regression parameters obtained by calculating a regression from the support points predicted in experiment 1. The third set of experiments deals with the influence of individual variables, either by working only with images in which the mixing paddle moves towards the camera or only with images in which it moves away from the camera. Especially for future use, it is interesting to see whether the network can extract more precise information from the behaviour of the substance in front of or behind the mixing paddle. In addition, the influence of the orthophoto and the DEM is tested by training with only one of the two inputs. The input to the CNN consists of individual, independent images and/or DEMs without any temporal connection. The fact that the input comes from image sequences is thus not exploited in the research reported here and will be investigated in future work.

Training configuration
The available samples are divided into a training set (15 samples), a test set and a validation set (3 samples each). To do so, 3 splits are randomly generated so that the evaluation can be done on different samples. In figure 6, the first column shows the distribution of the samples in the individual sets. The network is trained with Stochastic Gradient Descent (SGD) with a Nesterov momentum of β = 0.99 following the formulation of (Sutskever et al., 2013). The learning rate is set to 1 · 10⁻² for the Default CNN architecture and to 1 · 10⁻³ for the ResNet-18 architecture. A lower learning rate gave better results on the validation set in some preliminary tests with the pre-trained networks, probably because the weights are already fairly accurate and do not need to be adjusted much further. The weight decay parameter was set to λ = 1 · 10⁻³ for both architectures. These parameters were determined in preliminary tests. Training is carried out for a maximum of 1000 epochs (an epoch comprises one pass over the complete training set), but is terminated early if the validation accuracy has not improved for 250 epochs. The evaluation is then carried out with the parameters that achieved the best validation accuracy.
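The optimisation set-up can be summarised in three pieces: the objective of equations (2) and (3), one Nesterov momentum update in the form given by Sutskever et al. (2013), v ← βv − lr·∇E(ω + βv), ω ← ω + v, and early stopping with a patience of 250 epochs. The NumPy sketch below is illustrative only. It assumes the penalty is scaled as λ‖ω‖² (conventions with λ/2 also exist), it tracks a validation error where lower is better, and `run_epoch` is a hypothetical callable standing in for one epoch of training followed by validation.

```python
import numpy as np


def objective(pred, target, weights, lam=1e-3):
    """MSE over all N samples and K outputs, eq. (2), plus weight decay, eq. (3)."""
    mse = np.mean((np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)) ** 2)
    penalty = sum(np.sum(np.asarray(w, dtype=float) ** 2) for w in weights)
    return mse + lam * penalty


def nesterov_step(theta, v, grad_fn, lr=1e-2, beta=0.99):
    """One Nesterov momentum update; the gradient is taken at the look-ahead point."""
    v = beta * v - lr * grad_fn(theta + beta * v)
    return theta + v, v


def train_with_early_stopping(run_epoch, max_epochs=1000, patience=250):
    """Return (best epoch, stopping epoch); run_epoch(epoch) returns a validation error."""
    best_err, best_epoch = float("inf"), -1
    for epoch in range(max_epochs):
        err = run_epoch(epoch)
        if err < best_err:
            best_err, best_epoch = err, epoch     # weights of this epoch would be kept
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, max_epochs - 1


# Toy validation curve: improves until epoch 10, then stays flat.
best, stop = train_with_early_stopping(lambda e: 1.0 / (e + 1) if e <= 10 else 1.0)
print(best, stop)   # 10 260
```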
The grey values of the orthophotos are normalised per sample to a mean of 0 and a standard deviation of 1. For the DEMs, the mean elevation determined immediately before the sequence was acquired, which represents a horizontal plane without the mixing paddle, is subtracted from all DEMs of that sequence. Afterwards, all datasets of all samples are multiplied by the same factor so that the values lie between -1 and 1.
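The input preparation can be sketched as follows. Assumptions: `normalise_orthophoto` computes its statistics over whatever array it is given (a single orthophoto or all orthophotos of one gel sample stacked together); the common DEM factor is taken as the reciprocal of the largest absolute elevation over all samples, which is one way to map the values into [-1, 1]. All function names are hypothetical.

```python
import numpy as np


def normalise_orthophoto(grey):
    """Zero mean and unit standard deviation over the given array."""
    grey = np.asarray(grey, dtype=float)
    return (grey - grey.mean()) / grey.std()


def dem_relative_to_reference(dem, reference_dem):
    """Subtract the mean elevation of the paddle-free reference DEM of the sequence."""
    return np.asarray(dem, dtype=float) - np.mean(reference_dem)


def global_scale(dems):
    """Scale all DEMs of all samples by one common factor into [-1, 1] (assumed: max-abs)."""
    max_abs = max(np.max(np.abs(d)) for d in dems)
    return [d / max_abs for d in dems], max_abs


def make_input(ortho_norm, dem_scaled):
    """Stack orthophoto and DEM into a two-channel (2, H, W) network input."""
    return np.stack([ortho_norm, dem_scaled], axis=0)


ortho = normalise_orthophoto(np.arange(12.0).reshape(3, 4))
dem = dem_relative_to_reference(np.full((3, 4), 5.0), np.full((3, 4), 2.0))
scaled, max_abs = global_scale([dem, -2.0 * dem])
x = make_input(ortho, scaled[0])
print(x.shape, max_abs)   # (2, 3, 4) 6.0
```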
For numerical reasons, the reference values are also scaled to roughly the interval [0, 1]. In detail, the support points of the flow curve are divided by 300. For the regression parameters, the slope is not scaled and the y-axis intercept is divided by 200, so that the intercept lies in a similar range of values as the slope. If two reference measurements are available, one of the two is randomly selected per training sample. For validation and testing, the mean of the two measurements or regression parameters is used as the true value. Data augmentation is carried out by randomly changing the brightness and contrast of the orthophotos within a certain interval. The DEMs are augmented by adding the same random value to every pixel of the dataset.
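The target handling and augmentation can be sketched as below. The scaling constants (300 and 200) come from the text; the augmentation interval bounds (`brightness`, `contrast`, `dem_shift`) are hypothetical, since only "a certain interval" is specified, and the random generator seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)


def scale_flow_curve(support_points):
    """Flow-curve support points divided by 300."""
    return np.asarray(support_points, dtype=float) / 300.0


def scale_regression(m, n):
    """Slope unscaled, y-axis intercept divided by 200."""
    return m, n / 200.0


def pick_target(measurements, training, rng=rng):
    """Random measurement during training, mean of all measurements otherwise."""
    measurements = np.asarray(measurements, dtype=float)
    if training:
        return measurements[rng.integers(len(measurements))]
    return measurements.mean(axis=0)


def augment(ortho, dem, rng=rng, brightness=0.2, contrast=(0.8, 1.2), dem_shift=0.05):
    """Random contrast/brightness for the orthophoto, one random offset for the DEM.

    The interval bounds are hypothetical placeholders.
    """
    ortho = ortho * rng.uniform(*contrast) + rng.uniform(-brightness, brightness)
    dem = dem + rng.uniform(-dem_shift, dem_shift)   # same value added to every pixel
    return ortho, dem
```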

Results and Discussion
In this section, the results of the three experiments are presented and discussed.
5.3.1 Experiment series 1: In figure 6, the predictions for the images of the test samples of all test splits are plotted, together with the mean over all predictions of each sample and the reference values. Table 2 lists the MAE and the standard deviation of the predictions; the standard deviation can also be seen in the plots. The MAE averaged over all support points lies between 3 and 10 for the Default CNN and between 3 and 14 for the ResNet-18. In general, it is thus possible to predict the support points of the flow curves of individual samples with both architectures, with the Default CNN being slightly more accurate. For samples whose flow curves lie outside the range of the other values (samples 4 and 12), however, the prediction is not very accurate. In split 3 in particular, predicting sample 4 in the test set amounts to an extrapolation. For the Default CNN, the mean of all predictions of sample 12 in split 2 agrees with the reference (compare Fig. 6 (b)), but the standard deviation is rather high. With ResNet-18, on the other hand, the predictions are more precise, but their mean is less accurate. The MAE for sample 12 is also lower for the Default CNN.
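The per-sample statistics can be computed as sketched below. The text does not specify whether the MAE in Table 2 is computed from the individual image predictions or from their mean; this sketch assumes the former and reports the error of the mean prediction separately. Population standard deviation (NumPy's default) is assumed.

```python
import numpy as np


def sample_errors(predictions, reference):
    """Error statistics for one test sample.

    predictions: (num_images, 9) predicted support points, one row per image.
    reference:   (9,) reference support points.
    Returns the MAE of the individual predictions per support point, that MAE
    averaged over all support points, the spread (std) of the predictions per
    support point, and the MAE of the mean prediction.
    """
    predictions = np.asarray(predictions, dtype=float)
    reference = np.asarray(reference, dtype=float)
    mae_per_point = np.abs(predictions - reference).mean(axis=0)
    mae_of_mean = np.abs(predictions.mean(axis=0) - reference).mean()
    return mae_per_point, mae_per_point.mean(), predictions.std(axis=0), mae_of_mean


per_point, mae, spread, mae_mean = sample_errors([[1.0, 2.0], [3.0, 4.0]], [2.0, 3.0])
print(per_point, mae, spread, mae_mean)   # [1. 1.] 1.0 [1. 1.] 0.0
```

The toy example shows why both numbers can be informative: scattered predictions can have a sizeable MAE while their mean matches the reference exactly.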
Looking at the differences between the two measurements in table 1, it can be seen that the determination in the viscometer is also subject to errors. For example, the difference between the two measurements of one support point of sample 23 is as high as 6. Under these circumstances, the prediction accuracy achieved can be considered a success. Overall, the Default CNN delivers a slightly lower MAE and is therefore also used for the further experiments. One reason for this may be that its much smaller number of parameters alone forces the network to generalise better. Moreover, the ResNet is pre-trained on RGB images, whereas here the input consists of a greyscale image and a DEM.

5.3.2 Experiment series 2: Table 3 shows the results of the second series of experiments. Somewhat surprisingly, determining the regression parameters directly has no advantage over experiment 1. The reason may be that the regression parameters are rather prone to error, since they are determined from only four support points. For example, the difference between the slopes of the regression lines from the two measurements is 0.3, which is large compared to the range of slope values of all samples (0.6 to 1.9). If training is carried out directly with such erroneous measurements, this can lead to unstable results, whereas when training with all 9 support points, a single inaccurate value is not as significant.

5.3.3 Experiment series 3: It should be noted that training only with orthophotos gives worse results than training only with DEMs. When only images from sequences with the same direction of paddle movement are used, the accuracy is higher for the paddle moving away from the camera, suggesting that the groove the mixing paddle leaves behind is a clearer indication of the rheological properties than the somewhat elevated parts of the surface.

CONCLUSION AND FUTURE WORK
The results of this work have shown that it is possible to determine the rheological properties of ultrasonic gel, used here as a reference substance for cement paste, on the basis of image data, taking into account the potential inaccuracy of the reference values. This goes beyond what experts can achieve visually: they can typically estimate the approximate viscosity of concrete during the mixing process, but not the yield stress. It should be noted that, although viscosity and yield stress are in principle two independent variables, these two quantities are highly correlated in our experiments, as shown in Figure 5. It is therefore possible that in our experiments the yield stress (y-axis intercept) was determined only via its correlation with the plastic viscosity (slope). This presumption needs to be tested in future research.
It is also noticeable that the standard deviation of the predictions for a sample can be relatively high while the true value is still almost reached on average, which suggests that random rather than systematic errors dominate the results. If this turns out to be correct, future applications should predict the rheological parameters from several images in which the mixing paddle is located at different positions. In this sense, the use of Long Short-Term Memory (LSTM) networks (Hochreiter and Schmidhuber, 1997), which allow entire image sequences to be processed, should also be investigated. In this way, inaccurate results from individual images could be compensated for.
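The argument for combining several images can be illustrated numerically: if per-image errors are random and zero-mean, averaging n predictions reduces their spread roughly by a factor of √n, while a systematic bias would remain unchanged. The sketch below uses purely synthetic numbers (a hypothetical true value of 100, noise standard deviation 10, 40 images per sequence as in the recorded sequences); the median option is a hypothetical, more outlier-robust alternative to the mean.

```python
import numpy as np


def aggregate_sequence(per_image_predictions, robust=False):
    """Combine the per-image predictions of one sequence into a single estimate."""
    p = np.asarray(per_image_predictions, dtype=float)
    return np.median(p, axis=0) if robust else p.mean(axis=0)


# Synthetic illustration only: zero-mean random errors around a hypothetical true value.
rng = np.random.default_rng(42)
true_value, noise_sd, n_images, trials = 100.0, 10.0, 40, 2000
single = true_value + noise_sd * rng.standard_normal(trials)
averaged = np.array([
    aggregate_sequence(true_value + noise_sd * rng.standard_normal(n_images))
    for _ in range(trials)
])
# Expected to be close to 10 and to 10 / sqrt(40), i.e. about 1.6.
print(single.std(), averaged.std())
```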
In future work, the same setup will be tested with real concrete. Even though this work has shown that predicting the flow curve from images only (here: orthophotos) is more difficult, monoscopic approaches should still be considered for concrete, because its appearance, such as colour and reflections, could be a reliable indication of its rheology.

Figure 6. Results for the test sets of 3 different training splits; panel (c): ResNet-18. All predictions of the images belonging to a sample are shown with a thin line in the colour of the respective sample. The thicker dashed lines in the darker colours mark the average prediction values over all test images of a sample. The thicker solid lines in the same colour indicate the reference value (in case two measurements were available, both are plotted).
Figure 7. Results for the test sets of 3 different training splits; panel (c): training only with images from sequences in which the paddle moves towards the camera. All predictions of the images belonging to a sample are shown with a thin line in the colour of the respective sample. The thicker dashed lines in the darker colours mark the average prediction values over all test images of a sample. The thicker solid lines in the same colour indicate the reference value (in case two measurements were available, both are plotted).