LONG WAVE INFRARED IMAGE COLORIZATION FOR PERSON RE-IDENTIFICATION

Person re-identification (ReID) in color and thermal images requires matching an object's color with its temperature. While thermal cameras increase the performance of ReID systems at night, identification of corresponding features in the visible and long-wave infrared ranges is challenging. The biggest challenge arises from the multimodal relationship between an object's color and its temperature. Modern ReID methods provide state-of-the-art results in person matching in the visible range. Hence, it is possible to perform multimodal matching by translating a thermal probe image to the color domain. After that, the synthetic color probe image is matched with images from the real color gallery set. This paper is focused on the development of the ThermalReID multispectral person ReID framework. The framework performs matching in two steps. Firstly, it colorizes the input thermal probe image using a Generative Adversarial Network (GAN). Secondly, it matches images in the color domain using color histograms and MSCR features. We evaluate the ThermalReID framework using the RegDB and ThermalWorld datasets. The results of the evaluation are twofold. Firstly, the developed GAN performs realistic colorization of thermal images. Secondly, the ThermalReID framework provides matching of persons in color and thermal images that competes with and surpasses the state-of-the-art. The developed ThermalReID framework can be used in video surveillance systems for effective person ReID at night.


INTRODUCTION
Person re-identification (ReID) is the problem of matching the same person across different images. The images can be captured by different cameras and at different times. The only requirement is that the appearance (clothes, shoes, etc.) of a person remains unchanged across all the images. A typical ReID scenario is presented in Figure 1. Let Cam 1, Cam 2, and Cam 3 be cameras located in a mall, and let P1 and P2 be persons walking through the mall. Then, for a given image $I_p(i, j)$ showing person $j$ from camera $i$, the ReID system should find the set of images $s = \{I_1, I_2, \dots\}$ from other cameras in which person $P_j$ is visible. The person ReID problem has received a lot of scholarly attention recently (Farenzena et al., 2010, Bhuiyan et al., 2015, Bhuiyan et al., 2018). While modern ReID methods provide state-of-the-art accuracy in the visible range, multispectral ReID remains challenging. Robust cross-domain feature matching is required to identify the same person in a different spectral range.
The authors participate in a research project focused on the development of a multispectral video surveillance system. The system should perform person re-identification in images captured by an outdoor thermal camera and an indoor color camera. Many robust methods for ReID in the visible range were proposed recently (Bhuiyan et al., 2018, Wei et al., 2018, Saquib Sarfraz et al., 2018). Most of these methods use either hand-crafted features (color histograms, feature descriptors) or convolutional neural networks (CNN). Most multispectral ReID methods leverage a dedicated CNN architecture to extract and match discriminative features (Wu et al., 2017, Ye et al., 2018b, Ye et al., 2018a). Recently, Kniaz et al. (Kniaz et al., 2018) proposed a new ReID approach based on a transformation of the spectral range of the probe image: the input color probe image is transformed into the thermal image domain using a dedicated Generative Adversarial Network (GAN). To the best of our knowledge, there is no research in the literature regarding colorization of thermal images for cross-domain ReID. This paper is focused on the development of ThermalReID, a cross-modality person ReID framework. We perform re-identification in two steps. Firstly, the input thermal probe image is transformed into a synthetic color image using a GAN model. After that, we match the synthetic color probe image with a color gallery set. We evaluate our ThermalReID framework on the ThermalWorld ReID and RegDB datasets. Comparison with modern ReID models demonstrates that our ThermalReID framework competes with state-of-the-art ReID methods. Our main observation is that colorization of long-wave infrared (LWIR) images is an ill-posed problem. Realistic colorization requires an additional modality. We provide such a modality as an additional 'hint' color image from a different camera. The 'hint' color image provides the desired object colors and textures to our ColorMatchGAN model. Our ThermalReID framework provides robust cross-modality person matching and can be used in video surveillance applications.
The rest of the paper is organized as follows. Section 2 presents a brief review of the literature regarding person ReID and cross-modality image-to-image translation. In Section 3, the developed ThermalReID framework is discussed. Section 4 presents the results of the evaluation of our ThermalReID framework. Section 5 summarizes the results and discusses the future implications of the developed method.

Contributions
We present three key technical contributions:
1. ColorMatchGAN model for colorization of a long-wave infrared image using a 'hint' color image.
3. Evaluation of the ThermalReID framework and baselines on two ReID datasets.

RELATED WORK
The person re-identification problem has received a lot of scholarly attention recently (Farenzena et al., 2010, Bhuiyan et al., 2015, Bhuiyan et al., 2018). Most modern approaches can be divided into three groups: direct methods, metric learning methods, and transform learning methods (Bhuiyan et al., 2018). The rest of this section presents a brief review of modern ReID methods organized by these groups.

Direct methods
Direct methods utilize hand-crafted or learned features to directly match person IDs. Recent works demonstrate significant progress in continuous ReID that utilizes multiple person detections in a frame sequence. In (Tang et al., 2017) a graph-based method was proposed that clusters person hypotheses over time.
The clustering is formulated as a minimum cost lifted multicut problem. Another approach proposed in (Bhuiyan et al., 2018) leverages a cumulative weighted brightness transfer function (CWBTF) to match person images from different views.
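A brightness transfer function (BTF) maps each brightness level observed by one camera to the corresponding level in another camera. The sketch below estimates a single BTF by matching normalized cumulative histograms, f(b) = H_B^{-1}(H_A(b)). It is a simplified, hypothetical illustration of this building block only; the weighted cumulative scheme (CWBTF) of Bhuiyan et al. is not reproduced, and the function name is our own.

```python
def brightness_transfer_function(src_pixels, dst_pixels, levels=256):
    """Hypothetical helper: map each brightness level of camera A (src) to
    camera B (dst) by cumulative-histogram matching, f(b) = H_B^-1(H_A(b)).
    Pixels are integer brightness values in 0..levels-1."""

    def cumulative(pixels):
        hist = [0] * levels
        for p in pixels:
            hist[p] += 1
        total = float(len(pixels))
        cdf, running = [], 0
        for count in hist:
            running += count
            cdf.append(running / total)
        return cdf

    cdf_a = cumulative(src_pixels)
    cdf_b = cumulative(dst_pixels)
    mapping = []
    j = 0
    for b in range(levels):
        # Smallest level in camera B whose cumulative value reaches H_A(b);
        # cdf_a is non-decreasing, so the pointer j never moves backwards.
        while j < levels - 1 and cdf_b[j] < cdf_a[b]:
            j += 1
        mapping.append(j)
    return mapping
```

For example, if camera B sees the same scene two levels brighter, `brightness_transfer_function([0, 0, 1, 1], [2, 2, 3, 3], levels=4)` maps levels 0 and 1 of camera A to levels 2 and 3 of camera B.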

Metric learning methods
Metric learning methods utilize the training data to learn metric spaces that allow effective ReID. In (Barman and Shah, 2017) an algorithm was proposed that allows effective match ranking using a graph-based approach. Siamese networks have proven efficient for learning metric spaces for person ReID (Guo and Cheung, 2018), and multilevel matching further increases the performance of their method.
A Kronecker product combined with an end-to-end trainable network provides an effective means for feature map matching (Shen et al., 2018). The evaluation of the method on the Market-1501 (Zheng et al., 2015), CUHK03 (Li et al., 2012), and DukeMTMC (Ristani et al., 2016) datasets demonstrated its robustness.
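Siamese networks for metric learning are commonly trained with a pairwise loss that pulls embeddings of the same identity together and pushes embeddings of different identities at least a margin apart. The sketch below shows the generic contrastive loss on precomputed embedding vectors; the margin value is arbitrary, the function names are illustrative, and the losses actually used in the cited works may differ.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_id, margin=1.0):
    """Generic contrastive loss for one pair of embeddings.
    Same identity: penalize any distance.
    Different identity: penalize only distances smaller than the margin."""
    d = euclidean(emb_a, emb_b)
    if same_id:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2
```

A pair of different identities that is already farther apart than the margin contributes zero loss, so training concentrates on the hard negative pairs.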

Transform learning methods
Transform learning methods aim to learn a transformation from one camera to another that increases the similarity of images of the same person under different views and lighting conditions. Most transform learning methods divide the human body into parts to extract discriminative features (Zhao et al., 2017, Li et al., 2018, Qian et al., 2017, Si et al., 2018, Su et al., 2017). A deep architecture that utilizes human part cues for efficient ReID was proposed in (Su et al., 2017). In (Zhao et al., 2017)

Transform learning methods are essential for the cross-modality ReID problem, where gallery and probe images are provided in different spectral ranges (e.g., visible and infrared). Recently, multiple deep learning-based methods were proposed that demonstrated the feasibility of cross-modality ReID. In (Wu et al., 2017) a large cross-modality dataset, SYSU-MM01, was proposed. The authors also proposed a deep learning-based method for cross-modality ReID that leverages a one-stream network and deep zero-padding.
The evaluation of the proposed method on the SYSU-MM01 dataset demonstrated that the infrared modality can increase the matching rate in low-light conditions.
Our model is based on previous research in the field of color-to-thermal image translation for person ReID (Kniaz et al., 2018).
Unlike other similar models, it uses a GAN to synthesize realistic color probe images that are matched against real color gallery images.

METHOD
We use the ThermalGAN ReID framework (Kniaz et al., 2018) as the starting point for our research. Unlike the original approach, we perform matching in the color domain. We hypothesized that a GAN could learn the transformation from the thermal range to the visible spectrum. However, naïve translation of a thermal image to a color image in RGB format was not successful. Following the assumptions of Berg et al. (Berg et al., 2018), we performed colorization of LWIR images into the Lab color space. Translation from the thermal image to the Lab domain proved to be more efficient. However, the results still suffered from various artifacts (e.g., color uncertainty).
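The Lab color space separates lightness (L) from the two chromaticity channels (a and b), so a colorization network predicting Lab does not have to entangle brightness and hue as it must in RGB. The sketch below shows the standard sRGB-to-CIELAB conversion for one pixel, assuming 8-bit sRGB input and the D65 reference white. The paper does not state which conversion routine or white point was used, so `srgb_to_lab` is a hypothetical helper for illustration.

```python
def srgb_to_lab(red, green, blue):
    """Hypothetical helper: convert one 8-bit sRGB pixel to CIELAB
    (assumed D65 reference white). Returns (L, a, b)."""

    def linearize(c):
        # Undo the sRGB transfer curve (gamma) to get linear intensity.
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(red), linearize(green), linearize(blue)

    # Linear sRGB -> CIE XYZ (D65 primaries).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    # D65 reference white used for normalization.
    xn, yn, zn = 0.95047, 1.0, 1.08883

    delta = 6.0 / 29.0

    def f(t):
        # Cube root above the threshold, linear segment near black.
        if t > delta ** 3:
            return t ** (1.0 / 3.0)
        return t / (3.0 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    lightness = 116.0 * fy - 16.0
    a_chroma = 500.0 * (fx - fy)
    b_chroma = 200.0 * (fy - fz)
    return lightness, a_chroma, b_chroma
```

White (255, 255, 255) maps to approximately (100, 0, 0) and black to (0, 0, 0); all the color information of other pixels lives in the a and b channels.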
We hypothesized that an additional modality could improve thermal image colorization. More specifically, we use a 'hint' color image as an input modality for realistic colorization. We term the resulting model ColorMatchGAN. The model should be able to produce diverse colorizations for different 'hint' color images.
The rest of this section is organized as follows. Firstly, we describe our multimodal ReID framework. After that, we discuss our colorization GAN and the architecture of its generator.
A description of the ReID features concludes the section.

Multimodal ReID framework
Let $V = \{v_{i,j}\}$ be the gallery set of color images and $t_p$ the thermal probe image, where $i \in \{1, 2, \dots, k\}$ is the camera index and $j \in \{1, 2, \dots, l\}$ is the index of a person detected in an image from camera $i$. Our ReID framework must provide the ReID score $m(v_{i,j}, t_p)$ for a given pair of visible and thermal images.
The score must be close to 1 if both images show the same person; otherwise, the score must be close to 0. We calculate the score using a pair of color images: a real gallery image $v_{i,j}$ and a synthesized probe image $v_p$. We use a dedicated GAN model to generate the synthetic image $v_p$ from the thermal probe image $t_p$, using a real color image as a 'hint'. Finally, we match the images using color histograms and MSCR features (Forssén, 2007).
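A histogram-based score with the required [0, 1] range, and the resulting gallery ranking, can be sketched as follows. The sketch assumes joint RGB histograms with 8 bins per channel and the Bhattacharyya coefficient as the similarity, which equals 1 for identical normalized histograms and 0 for histograms with no overlap. The MSCR part of the matching is omitted, and all helper names are hypothetical; this is not the exact scoring used by ThermalReID.

```python
import math


def color_histogram(pixels, bins=8):
    """Joint RGB histogram with bins**3 cells, normalized to sum to 1.
    pixels: iterable of (r, g, b) tuples with values in 0..255."""
    hist = [0.0] * (bins ** 3)
    n = 0
    for r, g, b in pixels:
        idx = ((r * bins // 256) * bins * bins
               + (g * bins // 256) * bins
               + (b * bins // 256))
        hist[idx] += 1.0
        n += 1
    return [h / n for h in hist] if n else hist


def match_score(hist_a, hist_b):
    """Bhattacharyya coefficient: 1 for identical histograms, 0 for disjoint ones."""
    return sum(math.sqrt(a * b) for a, b in zip(hist_a, hist_b))


def rank_gallery(probe_hist, gallery_hists):
    """Return gallery indices sorted from the best to the worst match."""
    scores = [match_score(probe_hist, g) for g in gallery_hists]
    return sorted(range(len(gallery_hists)), key=lambda k: scores[k], reverse=True)
```

In this setting, `probe_hist` would be computed from the synthesized color probe $v_p$ and `gallery_hists` from the real gallery images $v_{i,j}$.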

ColorMatchGAN for colorization of LWIR image
Colorization of long-wave infrared (LWIR) images is challenging. Firstly, the colorization problem requires reconstruction of three channels (red, green, blue) from a single channel (8–14 µm). Secondly, for most objects, there is no correlation between an object's color texture and its infrared texture.
Our experiments with the pix2pix model have demonstrated that naïve prediction of an object's color from an LWIR image provides unrealistic results in both the RGB and Lab color spaces. While the Lab color space improves the detail of the synthesized image, the result still suffers from false textures. We hypothesized that an additional modality could improve the quality of reconstruction. An example of such a modality is a color image that provides examples of the desired object textures and colors. We use the ThermalGAN framework (Kniaz et al., 2018) as a starting point for our research.
The ThermalGAN framework is based on the pix2pix GAN model and the U-Net (Ronneberger et al., 2015) generator. We made the following modifications to ThermalGAN: (1) a new input tensor that includes a thermal segmentation image, a relative temperature image, and a 'hint' color image, and (2) an output domain in the Lab color space. The resulting colorization framework is presented in Figure 2.
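The modified input can be pictured as a channel-wise stack of the three inputs. The minimal plain-Python sketch below assumes a single-channel segmentation map, a single-channel relative-temperature map, and a three-channel Lab 'hint' image of identical height and width, giving a channel-first 5 × H × W tensor. The channel counts, channel order, and the helper name `build_input_tensor` are assumptions; the paper does not list them.

```python
def build_input_tensor(segmentation, rel_temperature, hint_lab):
    """Hypothetical helper: stack generator inputs channel-first.

    segmentation, rel_temperature: H x W nested lists (one channel each)
    hint_lab: 3 x H x W nested lists (L, a, b channels of the 'hint' image)
    Returns a 5 x H x W nested list: [seg, temp, L, a, b].
    """
    if len(hint_lab) != 3:
        raise ValueError("the 'hint' image must have 3 channels (L, a, b)")
    h, w = len(rel_temperature), len(rel_temperature[0])
    channels = [segmentation, rel_temperature] + list(hint_lab)
    # All channels must share the same spatial size to be stacked.
    for c in channels:
        if len(c) != h or any(len(row) != w for row in c):
            raise ValueError("all inputs must share the same H x W")
    return channels
```

In a PyTorch implementation, the same idea would usually be a single concatenation along the channel dimension of the input tensors.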
The 'hint' image is selected randomly from the color gallery set. We assume that the random color image $v_{i,j}$ and the thermal probe image $t_p$ share similar textures and colors if they are located close to each other. The generator network is based on the U-Net architecture. We made the following modifications to the generator: (1) two additional convolutional layers that increase the input/output dimensions, and (2) modified strides and kernel sizes that enable the filters to cover larger objects. The original U-Net architecture consists of eight convolutional and eight deconvolutional layers. The convolutional layers successively compress the image information until it becomes a vector with shape 1 × 512. The deconvolutional layers increase the resolution of the feature vector until it reaches the size of the original image.
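The shape arithmetic of the original encoder can be checked with the standard convolution output-size formula, out = floor((n + 2p − k) / s) + 1. The sketch assumes the pix2pix defaults (256 × 256 input, 4 × 4 kernels, stride 2, padding 1, channel widths from 64 to 512). It does not reproduce the modified strides and kernel sizes of ColorMatchGAN, which are not specified here, and the function names are illustrative.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a 2D convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_shapes(size=256, channels=(64, 128, 256, 512, 512, 512, 512, 512)):
    """(channels, height, width) after each encoder layer, assuming a square
    input and the pix2pix-style defaults of conv_out."""
    shapes = []
    for c in channels:
        size = conv_out(size)
        shapes.append((c, size, size))
    return shapes
```

With these defaults, each layer halves the spatial size, so eight layers take 256 × 256 down to 512 × 1 × 1, the 1 × 512 vector mentioned above. As pure arithmetic (not the reported configuration), ten such layers would likewise reduce a 1024 × 1024 input to 1 × 1.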

Network training
The ColorMatchGAN framework was trained on the VOC split of the ThermalWorld dataset (Kniaz et al., 2018) using the PyTorch library (Paszke et al., 2017). The VOC split includes indoor and outdoor scenes to avoid domain shift. The training was performed on an NVIDIA 1080 Ti GPU and took 126 hours for G1, D1.
For network optimization, we use minibatch SGD with an Adam solver.We set the learning rate to 0.0002 with momentum parameters β1 = 0.5, β2 = 0.999 similar to (Isola et al., 2017).
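For reference, the update that the Adam solver applies with these settings is sketched below for a single scalar parameter. The ε = 1e-8 term is an assumption (the usual default), and `adam_step` is an illustrative name. In PyTorch, the same configuration is typically created with `torch.optim.Adam(params, lr=0.0002, betas=(0.5, 0.999))`.

```python
import math


def adam_step(param, grad, m, v, t, lr=2e-4, beta1=0.5, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter.
    m, v: running first and second moment estimates; t: 1-based step count.
    Returns the updated (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # momentum of the gradient
    v = beta2 * v + (1 - beta2) * grad * grad   # momentum of the squared gradient
    m_hat = m / (1 - beta1 ** t)                # bias correction for zero init
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```

On the first step, the bias-corrected ratio m_hat / sqrt(v_hat) equals ±1, so the parameter moves by about the learning rate (0.0002) regardless of the gradient's magnitude; the low β1 = 0.5 makes the momentum react quickly to the changing adversarial gradients.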

Multimodal colorization evaluation
We evaluate multimodal colorization performance using the independent part of the VOC split of the ThermalWorld dataset.
Colorization results are presented in Figure 3. We perform a qualitative assessment of the colorization results. We found that colorization results for object classes with a low diversity of surface colors (cats, dogs, humans) are much better than those for objects with a high diversity of surface colors (cars, trucks). We perform a quantitative evaluation of our ThermalReID framework using the ReID split of the ThermalWorld dataset. The dataset includes pairs of aligned color and thermal images for 516 IDs. We compare the performance of our ThermalReID framework with six baselines. We measure the performance of the frameworks in terms of cumulative matching characteristic (CMC) curves and normalized area under the curve (nAUC). The evaluation results for single-shot and multi-shot settings are presented in Figure 4. Examples of probe image colorization are shown in Figure 5.
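The CMC curve reports, for each rank k, the fraction of probes whose correct gallery identity appears among the k best-scoring gallery images. Here nAUC is taken as the mean of the CMC values over all ranks, i.e., the area under the curve normalized to [0, 1]. The sketch assumes a single-shot setting with one gallery image per identity and a precomputed similarity matrix; the exact normalization behind the reported nAUC values is an assumption, and the function names are illustrative.

```python
def cmc_curve(scores, probe_ids, gallery_ids):
    """scores[i][j]: similarity of probe i to gallery image j (higher = better).
    Returns cmc where cmc[k - 1] is the fraction of probes whose correct ID
    is ranked within the top k. Assumes every probe ID is in the gallery."""
    n_gallery = len(gallery_ids)
    hits = [0] * n_gallery  # hits[r]: probes whose first correct match is at rank r
    for i, row in enumerate(scores):
        order = sorted(range(n_gallery), key=lambda j: row[j], reverse=True)
        ranked_ids = [gallery_ids[j] for j in order]
        hits[ranked_ids.index(probe_ids[i])] += 1
    cmc, running = [], 0
    for h in hits:
        running += h
        cmc.append(running / len(scores))
    return cmc


def nauc(cmc):
    """Area under the CMC curve normalized by the number of ranks."""
    return sum(cmc) / len(cmc)
```

For three probes where only the first is matched at rank 1 and the other two at rank 3, the CMC is [1/3, 1/3, 1] and nAUC is 5/9.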

CONCLUSION
We demonstrated that generative adversarial networks are capable of realistic colorization of long-wave infrared images and can reconstruct discriminative features for effective person ReID.Furthermore, our ThermalReID framework utilizes two-stage person re-identification for matching thermal and color images.Firstly, we supply a thermal image and an additional 'hint' color image to a GAN colorization network to generate a synthetic color probe image.The 'hint' image is sampled from the gallery set and provides the desired color distribution to the colorization network.Secondly, we match the resulting synthetic color probe image with a real gallery set using color histograms and MSCR features.
Our main observation is that the colorization of thermal images is an ill-posed problem and cannot be performed without additional modalities. While training in the Lab color space provides a significant boost in colorization performance, the resulting images still suffer from unrealistic textures and false color proposals. Nevertheless, our ColorMatchGAN network can match objects in the thermal and color domains and transfer colors and textures from the 'hint' color image to the corresponding objects in the input thermal image.

Figure 3. Qualitative colorization results for different classes and networks.

Figure 4. CMC plot and nAUC for evaluation of baselines and ThermalReID method in single-shot setting (top) and multi-shot setting (bottom).

Figure 5. Qualitative method comparison. We compare the performance of various multimodal image translation frameworks on the ThermalWorld ReID dataset. We present colorization results for models trained in the RGB and Lab color spaces.