A NOVEL METHOD FOR INSPECTING DEFECTS IN COMMERCIAL EGGS USING COMPUTER VISION

The objective of this work is to compare classical image processing approaches with deep learning approaches in a visual inspection system for defects in commercial eggs. Currently, many industries detect egg defects manually, which requires a large number of workers who work long hours and are exposed to visual fatigue and physical and mental discomfort. As a solution, this work proposes an automatic computer-vision technique for inspecting defects in eggs that is suitable for industrial operation. Different image processing approaches were evaluated to determine the best solution in terms of performance and processing time.


INTRODUCTION
Egg processing systems for human consumption consist of four main stages: harvesting, washing, sorting and packaging. The sorting stage is fundamental in the poultry industry, for economic as well as health reasons. At this stage, the eggs undergo physical and visual quality control so that defective eggs are removed from the line. The various causes of internal and external defects that can occur in an egg are summarized in Table 1. Some of these defects are detected during inspection by candling, a technique that consists of placing the egg over a hole and applying a light source so that details of the egg become visible through the shell. At an industrial level, candling performed by human operators is not very effective, since workers spend long hours exposed to intense light in poorly lit environments, which causes visual fatigue. This not only lowers detection efficiency but also causes various health problems in workers, such as progressive visual damage and physical and mental discomfort (Sebastián et al., 2018). To reduce the problems associated with manual techniques, many companies have invested in new technologies for automated visual inspection of eggs using different approaches, such as optical, mechanical or acoustic techniques. Optical approaches implemented in computer vision systems, in particular, have brought great benefits to production quality inspection in various industries, especially the food industry (Gomes and Leta, 2012). The main advantage of optical approaches is that they are non-destructive: inspection is performed without direct contact with the product, which minimizes damage.
* Corresponding author: marcelo.stemmer@ufsc.br

Defects
To date, several approaches based on image analysis have been proposed for the detection and classification of defects in commercial eggs. The systematic literature mapping performed in this work found that most of these approaches employ classical image processing methods, which have produced successful applications with low computational cost (Lunadei et al., 2011). However, these methods can become complex because they require several steps, such as removing the background, isolating the region of interest and applying filters. Thus, there is a demand for new techniques that can handle these complexities.
In recent years, deep learning has been developed for automated vision-based tasks such as pattern recognition and image classification (LeCun et al., 2015). Convolutional Neural Networks (CNNs), in which several layers are efficiently combined and trained, are among the most important and successful deep learning methods in image analysis. CNNs have proven to be promising tools in the food and agriculture industry (Sladojevic et al., 2016), (Farooq and Sazonov, 2017), (Shimizu et al., 2017); however, these approaches can be computationally expensive. Given the uncertainty about the advantages and disadvantages that classical and deep learning approaches may present in egg inspection systems, this research proposes a new computer-vision method for the automatic inspection of commercial eggs that detects the two most common external defects: dirt and cracks. Three image processing approaches (classical processing, image classification with a CNN, and semantic segmentation) are evaluated to determine the best solution in terms of precision and processing time.

RELATED WORK
This paper presents a series of algorithms for the detection and classification of dirt and crack defects in eggs using computer vision. Most related works employ traditional detection approaches, which rely on threshold-based segmentation, noise removal through filtering operations, edge detection, connected-component labeling and pixel counting for decision making.

Dirt detection
Several methods have been proposed for dirt detection in eggs; some of them extract the R, G and B components of the image, while others work on the gray-scale image. In (Mertens et al., 2005), to segment blood stains on brown eggs, the authors applied a logical XOR operation between the original image and the R component to accentuate the stains and remove the red hue of the eggshell. Afterwards, the G component of the image was extracted, which gave a clear differentiation between the background, the egg and the defects. To detect other stains, they worked on the gray-scale image, whose brightness and contrast were improved by histogram equalization and, for white stains, by gamma correction. The gray-scale image was then binarized with a threshold, and the background was removed, leaving only the particles associated with dirt. This work obtained an accuracy of 99% for the detection of dirt stains. A similar algorithm was proposed in (Lunadei et al., 2011), where the blue channel was subtracted from the red channel to obtain an image with a high discrepancy between the pixels of the white eggshell, the background and the dirt stains. To obtain the pixels associated with the stains, they binarized the image with Otsu's method, applied a logical XOR operation between the egg mask and the binary image, and used a labeling process to perform the classification. The proposed classification algorithm correctly classified nearly 98% of the samples.
In (Yang et al., 2018), texture features were used instead of color information to segment dirt stains on white and brown eggshells. Texture descriptors such as average brightness, average contrast, smoothness and entropy were extracted from the histogram of the gray-scale image. The average contrast and the inconsistency were chosen as the input features of Fuzzy C-Means clustering (FCM), which grouped the pixels of the dirt region. The proposed egg classification method reached an accuracy of 94.3% for white eggs and 90.5% for brown eggs.
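The histogram descriptors listed above can be illustrated with the usual statistical definitions: mean intensity for average brightness, standard deviation for average contrast, smoothness R = 1 − 1/(1 + σ²) with σ² scaled to [0, 1], and Shannon entropy. This is a minimal sketch under those assumptions, computed over a whole patch; it is not the authors' code, and the per-region computation and FCM clustering step are omitted.

```python
import numpy as np


def histogram_texture(gray, levels=256):
    """Histogram-based texture descriptors of a uint8 gray-scale patch.

    Assumed definitions: mean, standard deviation, normalized smoothness and
    entropy (in bits) of the normalized gray-level histogram.
    """
    hist = np.bincount(gray.ravel(), minlength=levels).astype(np.float64)
    p = hist / hist.sum()                      # normalized histogram
    z = np.arange(levels, dtype=np.float64)    # gray levels 0..levels-1
    mean = np.sum(z * p)                       # average brightness
    var = np.sum((z - mean) ** 2 * p)
    std = np.sqrt(var)                         # average contrast
    var_norm = var / (levels - 1) ** 2         # variance scaled to [0, 1]
    smoothness = 1.0 - 1.0 / (1.0 + var_norm)  # 0 for a constant patch
    nz = p[p > 0]                              # skip empty bins (0 * log 0 = 0)
    entropy = -np.sum(nz * np.log2(nz))
    return {"mean": mean, "contrast": std, "smoothness": smoothness, "entropy": entropy}
```

A uniform patch yields zero contrast, smoothness and entropy, while a patch split evenly between two gray levels yields an entropy of exactly 1 bit.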

Crack detection
Most of the works used egg candling lamps to highlight crack defects. The algorithms found share a common processing sequence: first the egg is segmented from the background, then an edge detection method is applied, small noise is removed and, finally, the pixels associated with the cracks are identified. Some of these algorithms used simple edge detection and segmentation techniques, as in (Omid et al., 2013) and (Abdullah et al., 2017), which obtained crack detection accuracies of 96.25% and 90.6%, respectively. Others used more sophisticated techniques, such as (Mansoory et al., 2011), which used Fuzzy C-Means (FCM) and fuzzy thresholding for segmentation and the Smallest Univalue Segment Assimilating Nucleus (SUSAN) method for edge detection. The average crack detection accuracy of this algorithm was 90%.
In (Guanjun et al., 2019), a negative Laplacian of Gaussian (LoG) filter was applied for edge detection; the image was then binarized with hysteresis thresholding, and filtering operations were applied to eliminate possible noise. Finally, an improved LFI (Local Fitting Image) index was used to distinguish the crack region from regions associated with noise caused by dark spots on the eggshell, with a recognition rate of 92.5%.
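The combination of a negative LoG filter with hysteresis thresholding can be sketched with SciPy as follows. This is an illustration of the technique, not the implementation of (Guanjun et al., 2019): `sigma`, `low` and `high` are placeholder values, the improved LFI index is not reproduced, and cracks are assumed to appear as thin bright lines, as they do under candling light. Hysteresis here means that pixels above `high` seed a region and every connected pixel above `low` is kept.

```python
import numpy as np
from scipy import ndimage


def log_hysteresis_cracks(gray, sigma=2.0, low=2.0, high=6.0):
    """Candidate crack pixels from a negative LoG response plus hysteresis.

    sigma, low and high are illustrative placeholders, not published values.
    """
    # Negative Laplacian of Gaussian: thin bright lines give positive peaks.
    response = -ndimage.gaussian_laplace(gray.astype(np.float64), sigma=sigma)
    strong = response > high
    weak = response > low
    # Hysteresis: keep each weak connected component that contains a strong pixel.
    labels, _ = ndimage.label(weak)
    keep = np.unique(labels[strong])
    keep = keep[keep > 0]
    return np.isin(labels, keep)
```

A faint line whose response never exceeds `high` is discarded even though it passes `low`, which is how hysteresis suppresses weak, isolated noise while keeping weak pixels that continue a strong crack.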
One of the papers that used deep learning for crack detection was (Nasiri et al., 2020), which applied transfer learning to a CNN based on the VGG16 model pre-trained on ImageNet (Deng et al., 2009). The VGG16 architecture was modified by replacing the fully connected layers with a classifier block that included global average pooling, dense, batch normalization and dropout layers. The training dataset consisted of images of size 224 × 224 × 3, augmented by rotation, height and width shifts, zoom, horizontal flips and shear. K-fold cross-validation was used to evaluate the model's uncertainty and performance in class estimation. The CNN model achieved an average overall accuracy of 94.84% with 5-fold cross-validation.

Dirt and crack detection
In (Nakano et al., 2001), a single method was proposed for crack and dirt detection. This method used traditional image processing approaches, such as edge detection, elimination of noise components and contour removal, to find these two defects. This was one of the studies with the highest accuracy rates: 97.3% in dirt detection, 96.8% in crack detection and 98.5% in broken egg detection. However, although the algorithm obtained good results on white-shelled eggs, its performance decreased on brown-shelled eggs.
Another similar work was presented in (Machado et al., 2009), where a series of algorithms was developed to detect more than one defect in white eggs, including dirt and cracks. These algorithms also used traditional image processing approaches. For segmentation, they used a method known as Adaptive Background Subtraction, which identifies the parts of the image that are in motion, in this case the eggs. For dirt detection, connected-component labeling was used to identify the defect region. A feature vector consisting of the means and variances of the RGB color channels was extracted from this region and fed into an artificial neural network to perform the classification. For crack detection, an edge analysis was performed in which each edge was compared with a threshold to determine the presence or absence of cracks. The accuracy rates obtained with these algorithms were 75.6% for dirt detection, 73.3% for crack detection and 62.5% for gem basement detection.
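The region feature vector described for (Machado et al., 2009) can be sketched in a few lines. This rests on assumptions: "statistical variations" is read as the per-channel variance, the region mask is taken as given (for example, one component from connected-component labeling), and the neural network that consumes the vector is omitted.

```python
import numpy as np


def region_color_features(rgb, region_mask):
    """Six-element feature vector: per-channel mean and variance inside a region.

    rgb: H x W x 3 array; region_mask: H x W boolean mask of one defect region.
    """
    pixels = rgb[region_mask].astype(np.float64)  # N x 3 array of region pixels
    return np.concatenate([pixels.mean(axis=0), pixels.var(axis=0)])
```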
In (Alon et al., 2019), images in HSV and YIQ format were used to detect dirt stains and cracks in white eggs. Both processes included image normalization, noise removal and binarization. For crack detection, an edge detection method based on the bottom-hat transformation was applied. The proposed methods reached an accuracy of 95% for dirt detection and 90% for crack detection.
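The bottom-hat (black-hat) transformation is the morphological closing of an image minus the image itself: dark structures narrower than the structuring element, such as a thin crack on a white shell, become bright, while flat regions go to zero. The sketch below uses SciPy; the 7 × 7 structuring element is an assumed value, not one taken from (Alon et al., 2019). In OpenCV the corresponding operation is `cv2.morphologyEx` with `cv2.MORPH_BLACKHAT`.

```python
import numpy as np
from scipy import ndimage


def bottom_hat(gray, size=7):
    """Morphological bottom-hat: grey closing minus the original image.

    size is a hypothetical square structuring-element width in pixels.
    """
    g = gray.astype(np.int32)
    closed = ndimage.grey_closing(g, size=(size, size))
    # Closing never decreases a pixel, so the difference is non-negative.
    return (closed - g).astype(np.uint8)
```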

Materials
This section describes the main components of the acquisition system developed for this work, the general information of the egg samples used for the development of the algorithms and the evaluation metrics to define their performance.
3.1.1 Acquisition system: A computer-vision-based acquisition system was developed, as shown in Figure 1. It consists of two main modules: a closed compartment of 0.5 m × 0.5 m × 0.5 m that ensures constant lighting conditions for image capture, and a movement module that rotates and translates the eggs through the compartment. Six columns of eggs were placed on the dual tapered rollers, which moved forward on the rotating chain. Each roller was 335 mm long, and the interval between rollers was 28 mm. The rollers drive the eggs to roll counterclockwise around the axis A-A', as shown in Figure 1.
Two Basler acA1300-60gc industrial cameras with 4-12 mm varifocal lenses and 1/2-inch manual zoom were mounted on top of the compartment to capture images under two types of illumination:
• Upper lighting: two 18 W tubular lamps were placed in the upper part of the compartment.
• Bottom lighting: 24 professional 25 W candling lights were placed to provide background lighting.
The cameras were connected via GigE to a laptop with a 4-core Intel Core i5-9300H 2.4 GHz CPU, an NVIDIA GeForce GTX 1060 GPU, Intel UHD Graphics 630 and 4 GB of RAM. The cameras acquire RGB images at 60 fps with a resolution of 1280 × 1024 pixels, and each camera captures 6 rows by 3 columns of eggs. The chain was driven by a three-phase motor. A proximity sensor associated with the motor shaft sent a pulse each time the rollers rotated 360° about their axis, and image capture was synchronized with the arrival of each pulse to ensure that the entire surface of the egg was captured.

Egg samples:
In this research, a total of 300 chicken eggs with different shell colors (white and brown), provided by the industry, were selected. The categories and number of collected egg samples are presented in Table 2. These eggs were passed several times through the closed compartment on the moving chain, using one lighting type per capture. This yielded 4,000 images like the ones shown in Figure 2.

Table 2. Categories and number of collected egg samples.

| Egg category | Description | Total eggs |
|---|---|---|
| Normal | Clean eggs, which may have small dirt stains that do not detract from the overall clean appearance of the egg; these stains should not cover more than 1/32 of the egg's surface. Eggs without cracks in the shell. | 150 |
| Dirty | Eggs with dirt stains covering more than approximately 1/32 of the egg's surface. | 100 |
| Cracked | Eggs with a broken or cracked shell. | 50 |

Evaluation metrics:
To measure the performance of the proposed algorithms, precision, recall and F1 score are used as evaluation metrics; the processing time of each algorithm is also evaluated. Precision is the proportion of predicted positive cases that are true positives, that is, of all the cases predicted as positive, how many actually are positive. It is the most convenient metric when false positives carry a high cost. Recall measures the ability to find all positive samples, that is, how many of the real positives the model labels as positive. It is the most convenient metric for model selection when false negatives carry a high cost. The F1 score, the harmonic mean of precision and recall, gives a single overall measure of a model's performance and is ideal when a balance between precision and recall is sought. Precision, recall and F1 score are defined in Equations (1) to (3), where TP, FP and FN are the numbers of true positives, false positives and false negatives:
$$P = \frac{TP}{TP + FP} \quad (1)$$
$$R = \frac{TP}{TP + FN} \quad (2)$$
$$F1 = 2 \cdot \frac{P \cdot R}{P + R} \quad (3)$$
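Precision TP/(TP + FP), recall TP/(TP + FN) and their harmonic mean F1 can be computed directly from true and predicted labels. The sketch below is illustrative only: the label names are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here rather than something the text specifies.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Per-class precision, recall and F1 from two parallel label sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Assumed convention: a metric with an empty denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, with true labels `["dirty", "dirty", "clean", "dirty", "clean"]` and predictions `["dirty", "clean", "dirty", "dirty", "clean"]`, the "dirty" class has TP = 2, FP = 1 and FN = 1, so precision, recall and F1 are all 2/3.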
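The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2021 XXIV ISPRS Congress (2021 edition)

To determine the semantic segmentation model with the best performance, the Intersection over Union (IoU) metric was used. This metric measures the similarity between two finite sample sets, here the ground-truth and predicted pixels of a class, as the size of their intersection divided by the size of their union, as given in Equation (4), where A is the set of ground-truth pixels and B the set of predicted pixels:
$$IoU = \frac{|A \cap B|}{|A \cup B|} \quad (4)$$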
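The IoU metric can be computed per class from integer label maps, such as a ground-truth mask and a model prediction. A minimal NumPy sketch, assuming classes are encoded as integers 0 to `num_classes` − 1; a class absent from both maps has an undefined IoU and is reported as NaN here.

```python
import numpy as np


def per_class_iou(y_true, y_pred, num_classes):
    """IoU = |A intersect B| / |A union B| for each class of two integer label maps."""
    ious = []
    for c in range(num_classes):
        t = y_true == c   # ground-truth pixels of class c (set A)
        p = y_pred == c   # predicted pixels of class c (set B)
        union = np.logical_or(t, p).sum()
        inter = np.logical_and(t, p).sum()
        ious.append(inter / union if union else float("nan"))
    return np.array(ious)
```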
3.2 Classification and defect detection using classical processing approaches
To detect dirt defects, images captured with upper lighting were used, since they present a greater range of values that allows different characteristics of the stains to be extracted. Cracks, on the other hand, were found to be easier to detect in images captured with candling light, since cracks let light pass through the shell, which makes the defect stand out. The proposed algorithms were implemented in Python using the OpenCV library, version 4.5.0.

Verification: As a first step, a region of interest (ROI) is defined for each individual egg. A verification algorithm is then applied to determine whether the evaluated region contains an egg. This verification is performed on the gray-scale image using segmentation based on a range of pixel values.
To define this range, the histograms of a white egg, a brown egg and an empty ROI image were analyzed. Most pixels of white eggs have intensities above 120, while those of brown eggs and empty ROIs lie below 120. Segmenting each ROI with the range [120, 255] therefore yields a binary egg mask in which the white pixels correspond to the egg and the black pixels to the background. To differentiate between brown eggs and empty ROIs, the R component of the RGB image is extracted and a basic thresholding operation with the range [56, 255] is applied to obtain the brown-egg mask. If the total number of white pixels in this mask is greater than or equal to 8,000, the image is classified as a brown egg; otherwise, it is classified as an empty ROI.
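The verification rule can be sketched with NumPy as follows, using the ranges [120, 255] (gray) and [56, 255] (R channel) and the 8,000-pixel count stated above. Assumptions: the ROI is an RGB (not OpenCV's default BGR) uint8 array, and because the text gives no pixel count for deciding that a ROI holds a white egg, `WHITE_MIN_PIXELS` is a hypothetical value. The boolean range check mirrors what `cv2.inRange` computes.

```python
import numpy as np

GRAY_RANGE = (120, 255)     # white-egg mask on the gray-scale image (from the text)
RED_RANGE = (56, 255)       # brown-egg mask on the R channel (from the text)
BROWN_MIN_PIXELS = 8000     # from the text
WHITE_MIN_PIXELS = 8000     # hypothetical: the text gives no count for white eggs


def in_range(channel, lo, hi):
    """Boolean mask of pixels with lo <= value <= hi (like cv2.inRange)."""
    return (channel >= lo) & (channel <= hi)


def verify_roi(rgb):
    """Return 'white', 'brown' or 'empty' for an RGB uint8 ROI."""
    r, g, b = (rgb[..., i].astype(np.float64) for i in range(3))
    gray = np.round(0.299 * r + 0.587 * g + 0.114 * b)  # BT.601 luma weights
    if in_range(gray, *GRAY_RANGE).sum() >= WHITE_MIN_PIXELS:
        return "white"
    if in_range(rgb[..., 0], *RED_RANGE).sum() >= BROWN_MIN_PIXELS:
        return "brown"
    return "empty"
```

For a 157 × 252 ROI (39,564 pixels), the 8,000-pixel threshold corresponds to roughly a fifth of the region.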

Dirt detection:
This algorithm is applied to the previously verified ROIs. A dirt detection algorithm was created for each egg color:
• Stain in white eggs: to find dark stains, the gray-scale image is transformed into a binary image with an adaptive threshold using a 5 × 5 kernel. To find lighter stains, the RGB color image is converted to HSV and thresholded with a lower boundary of [0, 0, 0] and an upper boundary of [61, 255, 255].
• Stain in brown eggs: to detect dark stains, the contrast of the gray-scale image is improved by histogram equalization, and an adaptive threshold with a 5 × 5 kernel is applied. To find lighter stains, a basic threshold is applied to the HSV image with the lower boundary [40, 50, 70] and the upper boundary [179, 101, 255].
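The two stain detectors can be sketched with NumPy and SciPy as below, using the HSV boundaries quoted in the bullets (OpenCV 8-bit convention, H in 0-179). This is a sketch under assumptions, not the authors' OpenCV code: the adaptive threshold is modeled as "darker than the 5 × 5 local mean minus an offset `c`", where `c` is a hypothetical value, the HSV image is assumed to have been converted already (for example with `cv2.cvtColor`), and the histogram equalization is a simplified CDF mapping rather than an exact reproduction of `cv2.equalizeHist`.

```python
import numpy as np
from scipy import ndimage

# HSV boundaries for light stains, quoted from the text.
LIGHT_STAIN_HSV = {
    "white": ((0, 0, 0), (61, 255, 255)),
    "brown": ((40, 50, 70), (179, 101, 255)),
}


def dark_stain_mask(gray, block=5, c=5.0, equalize=False):
    """Dark-stain candidates: pixels darker than their local mean minus c.

    block=5 follows the 5 x 5 kernel in the text; c is a hypothetical offset.
    Set equalize=True for brown eggs (simplified histogram equalization).
    """
    g = gray.astype(np.float64)
    if equalize:
        hist = np.bincount(gray.ravel(), minlength=256)
        cdf = np.cumsum(hist) / gray.size
        g = np.round(255 * cdf[gray])       # map each level through the CDF
    local_mean = ndimage.uniform_filter(g, size=block)
    return g < local_mean - c


def light_stain_mask(hsv, color):
    """Light-stain candidates: per-channel HSV range check, like cv2.inRange."""
    lo, hi = (np.array(v) for v in LIGHT_STAIN_HSV[color])
    return np.all((hsv >= lo) & (hsv <= hi), axis=-1)
```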
As a result of the two threshold operations for light and dark stains, two binary images are obtained for each egg color, in which the white pixels are associated with the dirt stains, the egg contour and/or the background (Figures 3b and 3c). These two images are added together, and the background and contour are then removed to keep only the pixels associated with the dirt stains (Figure 3d). Finally, to decide whether an egg is dirty or clean, the ratio of the number of white pixels associated with dirt stains to the total number of pixels associated with the egg is computed. If this ratio is greater than or equal to 0.01, the egg is classified as dirty; otherwise, it is classified as clean.
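The combination and decision step can be sketched as follows, with the 0.01 ratio taken from the text. Assumptions: adding two binary images is treated as a pixel-wise OR, and since the text does not say how the background and contour are removed, the sketch keeps only pixels inside an egg mask eroded by a hypothetical `border` of 2 pixels.

```python
import numpy as np
from scipy import ndimage


def classify_dirt(dark, light, egg_mask, ratio_threshold=0.01, border=2):
    """Combine stain masks and apply the dirty/clean ratio rule.

    border (pixels eroded from the egg mask to drop the contour) is hypothetical.
    Returns the label and the stain-to-egg pixel ratio.
    """
    combined = dark | light                                   # addition of binary images
    interior = ndimage.binary_erosion(egg_mask, iterations=border)
    stains = combined & interior                              # drop background and contour
    ratio = stains.sum() / max(int(egg_mask.sum()), 1)
    return ("dirty" if ratio >= ratio_threshold else "clean"), ratio
```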

Crack detection:
The crack detection algorithm is applied to the previously verified ROIs. First, the image is smoothed with a filtering operation. The R component of the RGB image is then extracted and subjected to a basic thresholding operation using  and  boundaries for white and brown eggs, respectively. The resulting binary image is used as a mask on the R component through a logical AND operation between the two images. To find the cracks, edge detection is applied using a Laplacian of Gaussian (LoG) filter with a sigma value of 1.9. Finally, the contour is removed to obtain only the pixels associated with the cracks (Figure 4d).
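The crack pipeline can be sketched with SciPy as below. Only the LoG sigma of 1.9 comes from the text. The R-channel boundaries are not given in this version, so `r_low` and `r_high` are required parameters. The Gaussian smoothing, `edge_threshold` and the contour-removal erosion width `border` are hypothetical choices. The sketch also assumes that cracks appear brighter than the shell in candled images, which is why the negative LoG response is thresholded.

```python
import numpy as np
from scipy import ndimage


def crack_mask(rgb, r_low, r_high, edge_threshold, smooth_sigma=1.0, border=8):
    """Sketch: smooth, mask the R channel, apply LoG (sigma=1.9), drop the contour.

    r_low/r_high stand in for the R-channel boundaries; edge_threshold,
    smooth_sigma and border are hypothetical values.
    """
    # Smoothing step (assumed Gaussian), applied spatially but not across channels.
    smooth = ndimage.gaussian_filter(rgb.astype(np.float64),
                                     sigma=(smooth_sigma, smooth_sigma, 0))
    red = smooth[..., 0]
    egg = (red >= r_low) & (red <= r_high)          # basic threshold -> egg mask
    masked = np.where(egg, red, 0.0)                # AND of the mask with the R channel
    response = -ndimage.gaussian_laplace(masked, sigma=1.9)
    edges = response > edge_threshold               # thin bright structures
    interior = ndimage.binary_erosion(egg, iterations=border)
    return edges & interior                         # remove the egg contour
```

The erosion width must exceed the spatial reach of the LoG response to the egg boundary; otherwise the contour itself would be reported as a crack.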

Classification and defects detection using deep learning approaches
Two methods were proposed: image classification and semantic segmentation. These methods were implemented using the … The ResNet-34 architecture is used as the training model for the image classification method and as the backbone for the semantic segmentation method. This architecture was selected based on the results reported in (Canziani et al., 2016), where ResNet-34 presented a good balance between accuracy, number of operations and memory use.
3.3.1 Image classification: The dataset used for CNN training consists of 10,000 images of white and brown eggs with a resolution of 157 × 252 pixels: 6,000 normal eggs, 3,500 dirty eggs and 500 cracked eggs. In addition, there were 4,000 empty images. The validation set comprised 30% of all images.
The ResNet-34 model was pre-trained on ImageNet, a dataset of 1.2 million images in 1,000 classes, and was used as a fixed feature extractor through transfer learning. A batch size of 64 was used, with a learning rate of 1e−6 for the first layers and 1.5e−3 for the last layers. The dataset was normalized using the ImageNet statistics, and data augmentation was performed by mirroring the images. Cross-entropy was used as the loss function, weight decay as the regularization technique, and precision, recall and F1 score as the evaluation metrics.
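The ImageNet normalization and mirror augmentation can be illustrated in NumPy. The per-channel mean and standard deviation below are the values commonly used with ImageNet-pretrained torchvision models; treating "mirror inversion" as a horizontal flip is an assumption, and the training framework itself is not shown.

```python
import numpy as np

# Per-channel ImageNet statistics commonly used with pretrained models (RGB order).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])


def normalize_imagenet(rgb_uint8):
    """Scale an H x W x 3 uint8 RGB image to [0, 1] and standardize each channel."""
    x = rgb_uint8.astype(np.float64) / 255.0
    return (x - IMAGENET_MEAN) / IMAGENET_STD


def mirror(rgb):
    """Horizontal flip (assumed meaning of 'mirror inversion')."""
    return rgb[:, ::-1]
```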

Semantic segmentation:
A total of 7,702 images with a resolution of 157 × 252 pixels were selected for training. These images were labeled with the graphical image annotation tool Labelme (Torralba et al., 2010); an example is shown in Figure 5. The dataset was divided into 70% for training and 30% for validation. A CNN based on the Unet-ResNet34 model (a U-Net with a ResNet-34 encoder), pre-trained on ImageNet, was trained with a batch size of 12 and a learning rate of 1e−5 for the first layers and 1e−3 for the last layers. Cross-entropy was employed as the loss function, weight decay as the regularization technique and IoU as the evaluation metric.
After training, the segmentation model reached IoU values of 0.97, 0.95, 0.91, 0.48 and 0.41 for the background, brown egg, white egg, dirt stain and crack classes, respectively. The values for the dirt and crack classes may seem low compared with the other classes, but this does not mean that the model was unable to detect these defects. It means that the prediction did not exactly match the ground-truth mask, since manually labeled defects are not 100% accurate. This can be verified in Figure 6, where the model segments the defects even though the result is not identical to the ground-truth label. Once the segmentation model is obtained, it is adapted into a classification algorithm that uses the model's predictions. These predictions are tensors of size 157 × 252 pixels, from which the sorted unique class values and the number of occurrences of each class are obtained. From these two vectors it is possible to determine the egg color, identify cracks and detect empty images. To determine the degree of dirt, the ratio of the number of pixels of the dirt class to the sum of the pixels of all other classes except the background is computed. If the ratio is greater than 0.1, the egg is classified as dirty; otherwise, it is classified as clean.
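The conversion from a predicted class map to an egg-level decision can be sketched with `np.unique(..., return_counts=True)`, which returns the sorted unique values and their occurrences. The dirt ratio rule (> 0.1) follows the text; everything else is assumed: the class indices follow the order in which the IoU values are listed, the egg color is taken as the majority egg class, a single crack pixel (`min_crack_pixels`, hypothetical) marks the egg as cracked, and the denominator of the dirt ratio excludes both background and dirt pixels.

```python
import numpy as np

# Hypothetical class indices, in the order the IoU values are listed.
BACKGROUND, BROWN, WHITE, DIRT, CRACK = range(5)


def classify_from_segmentation(pred, dirt_ratio=0.1, min_crack_pixels=1):
    """Turn a per-pixel class map (e.g. 157 x 252) into an egg-level decision."""
    classes, counts = np.unique(pred, return_counts=True)  # sorted labels + occurrences
    occ = dict(zip(classes.tolist(), counts.tolist()))
    brown, white = occ.get(BROWN, 0), occ.get(WHITE, 0)
    if brown == 0 and white == 0:
        return {"empty": True}                              # no egg pixels at all
    color = "brown" if brown >= white else "white"
    cracked = occ.get(CRACK, 0) >= min_crack_pixels
    # Dirt pixels over the pixels of all other classes except the background.
    others = sum(n for c, n in occ.items() if c not in (BACKGROUND, DIRT))
    dirty = occ.get(DIRT, 0) / others > dirt_ratio
    return {"empty": False, "color": color, "cracked": cracked, "dirty": dirty}
```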

Inspection parameterization software
A software system was created to separate the regions of interest, parameterize the algorithms and facilitate the visualization of the results. The system, which has a graphical interface for ease of use, was implemented in C++ using the Qt graphical library. An overview of the interface is presented in Figure 7.

RESULTS AND DISCUSSIONS
To evaluate the algorithms, 70 images were captured with the developed inspection system. The images had a resolution of 1280 × 1024 pixels and could contain between 0 and 18 individual eggs. The precision (P), recall (R) and F1 score (F1) metrics were evaluated. The processing time was measured from the moment an image entered the algorithm until the prediction was obtained. The results are tabulated in Table 3.
The results showed that the classical algorithms had the best processing time, 0.049 ms for each individual egg image, while the semantic segmentation algorithm had the longest, 0.47 ms. In the classification of normal eggs (clean and uncracked), the classical algorithm offered the best balance between precision and recall, with an average F1 score of 95%, outperforming the deep learning algorithms by 2.6%.
For dirt and crack classification, recall is the most appropriate metric, since the goal is to minimize false negatives, that is, dirty or cracked eggs categorized as clean and uncracked.
For dirt detection, the best result was obtained with the classical algorithms, with a recall of 95%. The deep learning algorithms obtained recalls of 89% and 65% for image classification and semantic segmentation, respectively. The low result of the semantic segmentation method was probably due to its extreme sensitivity to dirt: it detected stains that are almost imperceptible to human vision, as shown in Figure 8. This high sensitivity caused eggs initially labeled as clean to be classified as dirty. On the other hand, the semantic segmentation algorithm achieved high performance in crack detection, with a recall of 99%, outperforming the classical and image classification methods by 11% and 9%, respectively.
One of the factors that affected the classical algorithm in crack classification was the presence of dark spots on the eggshell (Zhang et al., 2016). These dark spots have irregular shapes, such as points, stripes and flakes, and vary in size and number, from hundreds to thousands. Many of them can be mistaken for cracks, since their intensity and shape are very similar. Crack detection with classical approaches therefore has to balance detecting cracks against reducing the noise caused by dark spots. This balance is difficult to achieve with traditional approaches, where classification is performed using thresholds: a threshold that reduces noise may also eliminate important information from the evaluated region. It is therefore difficult to obtain perfect results, and most of the time one factor must be favored over the other.

Algorithm selection
One of the technical requirements provided by the partner company was that the inspection system must operate at a speed of 60 Hz, which is equivalent to analyzing 4.1 eggs per second. At this speed, the proximity sensor associated with the motor shaft sends a pulse every 240 ms. Since there are two cameras and each one captures between 0 and 18 eggs, up to 36 eggs must be processed within 240 ms.
Considering the above, the three algorithms proposed in this work took, on average, less than 17 ms to predict 36 images, which means that all three can be implemented in the developed inspection system.
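The timing budget behind this conclusion is simple arithmetic: 240 ms per pulse shared by up to 36 eggs leaves about 6.7 ms per egg, and even the slowest method (36 × 0.47 ms ≈ 16.9 ms per pulse) stays far below 240 ms. The sketch below reproduces that check with the per-egg times reported in this paper, assuming eggs are processed one after another and ignoring image capture and transfer time.

```python
PULSE_PERIOD_MS = 240.0      # one proximity-sensor pulse every 240 ms
EGGS_PER_PULSE = 2 * 18      # two cameras, up to 18 eggs each

# Average per-egg processing times reported in the text (ms).
PER_EGG_MS = {"classical": 0.049, "cnn_classification": 0.11, "semantic_segmentation": 0.47}


def load_per_pulse_ms(per_egg_ms, eggs=EGGS_PER_PULSE):
    """Processing time for one pulse, assuming eggs are processed sequentially."""
    return per_egg_ms * eggs


budget_per_egg_ms = PULSE_PERIOD_MS / EGGS_PER_PULSE   # about 6.67 ms per egg

for name, t in PER_EGG_MS.items():
    load = load_per_pulse_ms(t)
    print(f"{name}: {load:.2f} ms of {PULSE_PERIOD_MS:.0f} ms, fits: {load <= PULSE_PERIOD_MS}")
```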
In terms of the evaluation metrics, the best precision rates were obtained with the classical algorithms: 94%, 89% and 99% for normal, dirty and cracked egg classification, respectively. However, these approaches have a disadvantage: their high performance depends on the images not varying. This is not easy to ensure in this type of process, because a constant light intensity cannot always be guaranteed as the light sources wear over time; the parameters must therefore be monitored constantly, which increases the cost of implementation.
In deep learning algorithms, on the other hand, this problem is less of a challenge, because the training dataset can be augmented with modified images, for example by decreasing or increasing their contrast and brightness. This allows the models to cope with small variations in the images.
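A brightness and contrast augmentation of the kind described here can be written as a random linear transform, out = α · img + β, clipped to the 8-bit range. This is a minimal sketch: the ranges for α and β are hypothetical, since the text does not give augmentation limits, and the training in this work reports only mirror augmentation.

```python
import numpy as np


def random_brightness_contrast(rgb, rng, contrast=(0.8, 1.2), brightness=(-20, 20)):
    """Apply out = alpha * img + beta with random alpha, beta; clip to uint8.

    The contrast and brightness ranges are hypothetical placeholders.
    """
    alpha = rng.uniform(*contrast)       # contrast gain
    beta = rng.uniform(*brightness)      # brightness offset
    out = alpha * rgb.astype(np.float64) + beta
    return np.clip(np.round(out), 0, 255).astype(np.uint8)
```

Passing an explicit `np.random.default_rng(seed)` keeps the augmentation reproducible across training runs.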
Image classification with a CNN obtained intermediate results among the three evaluated methods, with average precisions of 91%, 83% and 99% for the detection of normal, dirty and cracked eggs, respectively, and an average processing time of 0.11 ms. This method is a promising option for the egg inspection system proposed in this research; however, its precision in dirt detection needs to improve, either by using other architectures or by enlarging the training dataset.

CONCLUSIONS
In this study, three image processing methods were implemented for dirt and crack detection in white and brown eggs. The first method used classical processing techniques such as threshold-based segmentation, mathematical operations, filtering operations, edge detection methods and pixel counting for decision making. The second method used a CNN with the ResNet-34 architecture for classification. The third method used semantic segmentation with the Unet-ResNet34 model. Both models were pre-trained on ImageNet. The average precision obtained was 94%, 91.25% and 85.75% for the three methods, respectively. The average processing time of each method was also measured: 0.049 ms, 0.11 ms and 0.47 ms, respectively.

Table 3. Comparison of the results obtained by the proposed algorithms for egg classification.
The classical approaches presented a better balance between response time and precision. They were able to detect various characteristics of the eggs, such as color and surface defects of the eggshell, with a relatively fast response time compared with the other approaches. Nevertheless, their performance may decrease if the images vary. These approaches were also difficult to implement: several tests with different algorithms had to be performed to determine which ones classified the characteristics of the egg satisfactorily without being computationally expensive. Integrating all the algorithms, from segmentation and filtering to feature extraction and decision making, required an extended implementation time and advanced knowledge of image processing techniques.
Deep learning approaches, on the other hand, do not require this series of implementation steps, since they receive an image with its respective label and the network extracts the features itself, producing a final result as output. This eliminates the processing flow involved in classical approaches and makes them easier to implement. However, these approaches have other important considerations. For example, they require a large dataset with as many images as possible for each category, and hardware with a large memory capacity for training. In addition, networks trained with supervised learning, as in this project, require each image to be associated with a label, which is a tedious and time-consuming process and, if done using only human vision as a reference, is prone to errors.
As future work, it is proposed to implement other CNN models to improve dirt detection results. Additionally, it is proposed to evaluate other types of defect such as deformities and internal defects.