Xstream: a Highly Efficient High Speed Real-time Satellite Data Acquisition and Processing System Using Heterogeneous Computing

In the last decade, the remote sensing community has observed a significant growth in the number of satellites, sensors and their resolutions, thereby increasing the volume of data to be processed each day. Satellite data processing is a complex and time consuming activity. It consists of various tasks, such as decode, decrypt, decompress, radiometric normalization, stagger corrections, ephemeris data processing for geometric corrections etc., and finally writing of the product in the form of an image file. Each task in the processing chain is sequential in nature and has different computing needs. Conventionally the processes are cascaded in a well organized workflow to produce the data products, and are executed on general purpose high-end servers / workstations in an offline mode. Hence, these systems are considered to be ineffective for real-time applications that require quick response and just-in-time decision making, such as disaster management, homeland security and so on. This paper discusses a novel approach to process the data online (as the data is being acquired) using a heterogeneous computing platform, namely XSTREAM, which has COTS hardware of CPUs, GPUs and FPGA. This paper focuses on the process architecture, re-engineering aspects and mapping of tasks to the right computing device within the XSTREAM system, which makes it an ideal cost-effective platform for acquiring and processing satellite payload data in real-time and displaying the products in original resolution for quick response. The system has been tested for IRS CARTOSAT and RESOURCESAT series of satellites, which have a maximum data downlink speed of 210 Mbps.


INTRODUCTION
The Remote Sensing (RS) community has observed a significant growth in the number of satellites, sensors and their resolutions during the last decade, thus causing an exponential growth in the volume of data to be processed each day. On the other hand there has also been a growing demand to cut down the lead time for generating the products used by applications that demand quick response, such as disaster management, homeland security, etc.; therefore the data products need to be made available for these applications in the quickest possible time. Conventional data processing chains [1][2], though well proven, time tested and rugged, are considered to be ineffective for real-time applications that require quick response and just-in-time decision making. Examples of the processing include Decoding, Decryption, Decompression, Radiometric Normalization, Geo-coordinate computation and tagging the corner coordinates, and finally writing of the product onto the disk to make a Level-1A product. Additionally, for some of the missions such as the Cartosat series of satellites, stagger estimation and stagger correction are also done before writing of the product onto the disk.
IRS satellites such as IRS Cartosat-1 [3], Cartosat-2 [4] and Resourcesat [5] capture images of the ground and transmit the image data (also referred to as video data) after performing various onboard processing such as compression, encryption, error correction coding etc. Along with the image data, data pertaining to various sensors on-board, such as Star Sensor, Gyros, clock, health parameters etc., which are referred to as auxiliary data (ephemeris data or AUX data), are also multiplexed with the video data and transmitted to ground stations.
Video data is processed to produce a viewable image, while aux data is used to estimate and tag the geo-location of the ground features in the final image product. The data is transmitted in the form of fixed frames by attaching a pattern called Frame Sequence Code (FSC) to identify the valid data on ground. Ground systems acquire the data and process the payload data (video and aux) to generate the level-1A product [6]. Cartosat-1 has two streams with each stream having I & Q channels of 52.5 Mbps data rate, while Cartosat-2 downlinks the data in one stream having I & Q channels of 52.5 Mbps each. The data is transmitted using an X-band carrier Radio Frequency (RF) communication system. RF data reception and conversion to Intermediate Frequency (IF) at the ground stations etc. are handled by standard equipment and are always done in real-time; hence they are not discussed in this paper.
Satellite payload data processing, to generate the level-1A product from raw signal data received from the de-mod output, involves certain complex processes performed sequentially in the reverse order of on-board processing. The processes are similar for most of the remote sensing satellites, with variation in mission specific formats and processes. For example, some missions do not compress or encrypt the data, and similarly a different method may be adopted for encoding, encryption, or compression.
Figure 1 explains the conventional ground system data acquisition and processing for IRS Cartosat-1 [1]; a similar methodology is also adopted for Cartosat-2 with mission specific variations. The system employs an organized workflow to produce the usable data products, which are executed on general purpose high-end server(s) or workstation(s) in an offline mode (i.e., record the data to files, process them by reading the data from the files and writing them back on to the files on the hard disk). Therefore they consume a lot of time and computing resources, and are deemed to be sub-optimal and ineffective for applications that demand quick response.

Figure 1. Ground System Processing to generate Level-1A Product for IRS Cartosat-1/2

In the recent past, the computing paradigm has taken a radical shift from usual sequential processing to parallel/multi-core architectures due to the clock speeds hitting the limits. Today, all the major processor vendors manufacture processors with around ten cores in a single CPU. GP/GPU (General Purpose Graphics Processing Unit) is the latest addition in the High Performance Computing (HPC) domain. A GPU is an accelerator card which contains a very large number of cores, specifically designed to perform Single Instruction Multiple Data (SIMD) type of operations. Similarly, Field Programmable Gate Arrays (FPGA) are special electronic chips containing a large number of gates and Configurable Logic Blocks (CLB) which can be programmed to behave like an electronic circuit, and thus can achieve very high performance while consuming less power. Digital Signal Processors (DSPs), which are inherently parallel, are designed for specific processes and perform these tasks very fast.
On the other hand there has also been a significant progress in standardization of interfaces and programming language interfaces. PCIe [7] has emerged as the de-facto interface for many of the add-on accelerator cards, FPGA cards, and GP/GPU cards. OpenCL [8] and OpenMP [9] have emerged as ideal programming interfaces which support all multi-core architectures and are vendor neutral. All these developments have given system engineers greater flexibility to choose the right platform for a given task and integrate them into a single system, single instance of OS and a single program.
Heterogeneous computing platforms, sometimes also referred to as Hybrid Computing [10] platforms, integrate various computing platforms such as CPUs, FPGAs and GPUs into a single system which can perform complex tasks/workflows in parallel and achieve higher throughput.
[11] demonstrated the usage of Metacomputing by utilizing multiple resources and a high speed network for real-time processing and visualization of existing remote sensing data, but the process does not address the real-time pass acquisition. [12] discusses a requirement of storing the real-time data into non-real-time database using file buffers. It does not discuss elements of payload data processing. [13][14] demonstrated the use of Heterogeneous Computing using GP/GPUs for achieving high throughput for different domains.
This paper discusses a novel approach of acquiring and processing Remote Sensing Data in real-time (as the data is being received) using a heterogeneous computing platform, namely XSTREAM, having a combination of CPUs, GPUs and FPGA.
Section 2 discusses the methodology adopted, re-engineering aspects, mapping of the tasks to the right computing device and synchronization of all the modules in order to generate the data products and display them in original resolution in real time (while the pass is being acquired and processed). Section 3 discusses the test results and comparison with the conventional data processing chain in terms of quality and accuracies. Section 4 discusses the conclusion and future directions.

METHODOLOGY

Level-1A product generation process:
Figure 1 depicts the sequence of steps to be performed for generation of the level-1A product (also referred to as basic product for any analysis) from raw signal data, and a brief description of the nature of the problem is listed below. Please note that due to the staggered placement of the CCD arrays in Cartosat missions, odd and even pixels are recorded, processed in separate streams and sent as I & Q channels. Hence the following processes need to be performed for the two streams separately (step (i) to step (ix)) and finally combined to generate a level-1A product (step (x) to step (xiii)).

i. Detection of Valid Frames: Read the bit stream and search for the frame header i.e., FSC code and store the valid frames.
ii. Time Stamping: Time stamp each frame with the Ground Reception Time (GRT), which is obtained from the Time Code Translator (TCT).
iii. Aux Separation: Separate the aux data from the video data and process aux data and video data independently.
iv. Decode: Decoding is done, which is the reverse of the on-board encoding. In Cartosat missions the Reed-Solomon encoding method is used to correct byte errors in 247 bytes.
v. Decrypt: Decrypt the data based on the encryption methodology used in the mission. The method used is stream cipher and the details are out of the scope of this paper.
vi. Decompress: Most of the missions use lossy compression techniques to cope with downlink rates vis-a-vis data acquisition rates. These compressions are data dependent and produce a variable length bit stream after compression (e.g., JPEG2000 etc). In the Cartosat series, compression augmented with a rate control algorithm and Huffman coding is used. Decompression involves searching of the header, followed by decoding of the stream, de-quantization and inverse transform.
vii. Aux Processing: The star sensor data, on-board time etc. (also known as aux or ephemeris data) are used to construct orbit and attitude information (OAT) with respect to the imaging time. These are used in computing the geo-location (latitude and longitude, also known as lat/longs) and tagging the image line with the computed lat/longs.

Analysis & Design:
This section discusses the nature of processing involved for each of the tasks/steps mentioned in section 2.1 and provides the reasoning for mapping them to the chosen device. A summary of the same is shown in Table 1.
The de-mod provides digital data to the system. The satellite transmission starts slightly ahead of the actual imaging data transmission. Hence, to detect valid data, a fixed code i.e., FSC is attached to every valid frame. The FSC needs to be searched for in the ground processing systems to obtain the valid data. Time stamping of each frame is also required to detect frame losses during the transmission (if any) and to replace them with interpolated values if possible. Since the nature of these tasks is search and replace in the bit stream obtained from the de-mod, it is ideal to perform these tasks on an FPGA. Hence a COTS FPGA based card is used, which takes the Low Voltage Differential Signalling (LVDS) input data and writes the valid frames onto the main memory of the system. The card sits on a PCI-X bus and also provides logic to stamp the time through an IRIG-B interface, which can be used to correct the frame losses, wherever possible.
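The frame detection and time stamping described above can be illustrated in software. The following is a minimal sketch only: in XSTREAM these tasks run on the FPGA card with IRIG-B time, whereas here the FSC value, the frame length and the time stamp derived from the byte offset are all hypothetical stand-ins chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical values for illustration only; the mission's real FSC pattern
# and frame length are not given in the text.
FSC = bytes.fromhex("1ACFFC1D")
FRAME_LEN = 16  # total bytes per frame, including the FSC


def extract_valid_frames(stream: bytes, grt_start: float,
                         byte_rate: float) -> List[Tuple[float, bytes]]:
    """Return (time_stamp, frame) pairs for every frame that starts with the FSC."""
    frames = []
    pos = stream.find(FSC)
    while pos != -1 and pos + FRAME_LEN <= len(stream):
        # Assumed stamping rule: reception time grows linearly with byte offset.
        stamp = grt_start + pos / byte_rate
        frames.append((stamp, stream[pos:pos + FRAME_LEN]))
        # Search for the next FSC after this frame; bytes in between are
        # treated as invalid data and skipped.
        pos = stream.find(FSC, pos + FRAME_LEN)
    return frames


def make_frame(payload: bytes) -> bytes:
    """Build a test frame: FSC followed by the payload padded to FRAME_LEN."""
    body_len = FRAME_LEN - len(FSC)
    return FSC + payload.ljust(body_len, b"\x00")[:body_len]
```

A gap between consecutive time stamps that is larger than one frame period would indicate lost frames, which is the information needed for the interpolation mentioned above.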
In the Cartosat series of missions, the aux data is added to the video data by the data formatter on-board. Hence the aux data in each valid frame needs to be separated from the video data. Aux separation involves extraction of a fixed number of bytes from the valid frame and can be easily done on FPGA as well as on CPU. In some of the missions aux data is also encoded. Hence, to make the process generic and flexible, the aux separation task is mapped to CPU in the proposed method.
The next step in the process chain is RS-Decoding. RS-Encoding [17] is a common procedure adopted in satellite communications for correcting communication bit errors. RS-Encoding is a fixed block 255/247 encoding. Since the decoding process is iterative in nature, and also to balance the load between GPU and CPU, a CPU implementation is chosen using multiple threads mapped to multiple cores.
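Because each 255-byte codeword is decoded independently, the work divides naturally across threads. The sketch below shows only that partitioning; `strip_parity` is a placeholder that performs no error correction, and the assumption that the 247 data bytes come first in each codeword is ours, not the paper's. The actual system uses POSIX threads in C/C++; Python threads are used here purely to show the structure.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

N, K = 255, 247                 # codeword length and data bytes, as stated in the text
PARITY = N - K                  # 8 parity bytes per codeword
MAX_CORRECTABLE = PARITY // 2   # a Reed-Solomon code corrects up to (n - k) / 2 symbol errors


def split_codewords(buffer: bytes) -> List[bytes]:
    """Cut a buffer into complete 255-byte codewords, dropping any trailing partial one."""
    usable = len(buffer) - len(buffer) % N
    return [buffer[i:i + N] for i in range(0, usable, N)]


def strip_parity(codeword: bytes) -> bytes:
    """Placeholder decoder: returns the first K bytes and performs NO error correction."""
    return codeword[:K]


def decode_parallel(buffer: bytes,
                    decode: Callable[[bytes], bytes] = strip_parity,
                    workers: int = 4) -> bytes:
    """Decode all codewords with a pool of worker threads, preserving their order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(decode, split_codewords(buffer)))
```

A real decoder would be passed in through the `decode` parameter; the surrounding partitioning logic would stay the same.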
After the decoding process, the next step is decryption. In the Cartosat series of satellites stream cipher encryption is used. Decryption of stream cipher text involves simple XOR operations, which can be done on FPGA or GPU. Since the data is already available in the main memory of the host, the process is lightweight, and the next step also needs to be done on CPU, the task is mapped to CPU to avoid multiple data transfers.
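The XOR nature of stream cipher decryption can be shown in a few lines. This is a sketch under stated assumptions: the keystream generator below is a toy 16-bit LFSR invented for illustration and has nothing to do with the cipher used on the missions, whose details the paper leaves out of scope.

```python
from typing import Iterator


def toy_keystream(seed: int) -> Iterator[int]:
    """Hypothetical keystream: a 16-bit LFSR emitting one byte per 8 shifts.

    This is NOT the mission cipher; taps are chosen for illustration only.
    """
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        byte = 0
        for _ in range(8):
            bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
            state = (state >> 1) | (bit << 15)
            byte = (byte << 1) | (state & 1)
        yield byte


def xor_stream(data: bytes, seed: int) -> bytes:
    """Encrypt or decrypt: XOR each data byte with the next keystream byte."""
    return bytes(d ^ k for d, k in zip(data, toy_keystream(seed)))
```

Since XOR is its own inverse, the same function serves for both directions, which is why the step is cheap enough to keep on the CPU next to the data.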
As explained in the previous section, the compression results in a variable length code; hence a marker is attached to every compressed block. The decompression process involves searching for the marker on CPU, after which the blocks are sent to GPU. Additionally, to address frame loss or irrecoverable errors in one of the channels (I or Q), the blocks are aligned before being sent to GPU for decompression. The decompression is executed in a block parallel way on GPU, which is more optimal than on CPU.
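The CPU-side marker search that prepares blocks for the GPU might look like the following sketch. The two-byte marker value is hypothetical, and the code assumes the marker never occurs inside a compressed payload; a real format would have to guarantee that.

```python
from typing import List

# Hypothetical two-byte marker; the actual marker value is not given in the text.
MARKER = b"\xa5\x5a"


def split_blocks(stream: bytes, marker: bytes = MARKER) -> List[bytes]:
    """Return the variable-length payloads that follow each marker."""
    blocks = []
    start = stream.find(marker)
    while start != -1:
        # The block ends where the next marker begins, or at the end of the stream.
        nxt = stream.find(marker, start + len(marker))
        end = nxt if nxt != -1 else len(stream)
        blocks.append(stream[start + len(marker):end])
        start = nxt
    return blocks
```

Once the block boundaries are known, every block can be decompressed independently, which is what makes the block parallel execution on the GPU possible.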
Aux processing involves complex modelling, which uses the ephemeris (AUX data), and iterative computations. This process is independent of video processing and hence can easily be spawned as a separate dedicated thread on CPU. This module also performs stagger estimation and identifies the strip based on the time tag.
The next step in the process chain after the decompression is radiometric normalization, which is a pixel based operation. Pixel based operations fall in the category of SIMD, for which the GPU architecture is ideally suited. After the radiometric normalization, the images from I & Q (odd and even) need to be combined to generate the full swath image. The stagger correction process is also applied while combining the data, to optimize the process flow. Since stagger correction involves re-sampling, which is again a pixel based operation, it is also mapped to GPU.
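The three pixel based operations named above can be sketched on a single image line. This is a plain Python illustration, not the OpenCL kernels used in XSTREAM: the LUT contents, the linear interpolation used for re-sampling, and the assumption that even and odd lines have equal length are all ours.

```python
from typing import List, Sequence


def normalize(line: Sequence[int], lut: Sequence[int]) -> List[int]:
    """Radiometric normalization: replace each raw gray level by its LUT entry."""
    return [lut[p] for p in line]


def shift_line(line: Sequence[float], shift: float) -> List[float]:
    """Resample a 1-D sequence at positions x + shift by linear interpolation.

    Positions falling outside the sequence are clamped to its edges.
    """
    last = len(line) - 1
    out = []
    for x in range(len(line)):
        pos = min(max(x + shift, 0.0), float(last))
        i = min(int(pos), last - 1) if last > 0 else 0
        frac = pos - i
        right = line[min(i + 1, last)]
        out.append(line[i] * (1.0 - frac) + right * frac)
    return out


def combine(even: Sequence[float], odd: Sequence[float]) -> List[float]:
    """Interleave even- and odd-pixel lines (assumed equal length) into a full-swath line."""
    full = [0.0] * (len(even) + len(odd))
    full[0::2] = even
    full[1::2] = odd
    return full
```

Each output pixel depends only on a fixed, small neighbourhood of input pixels, which is the property that lets the same operations run as data parallel kernels on a GPU.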

Design
Since all the steps need to be performed in sequential mode, a 'pipelined architecture' as shown in Figure 2 is adopted to achieve real-time processing. A three way pipeline approach is adopted, in which each stage is performed on a different computing device. A parallel pipeline architecture requires buffer based processing. Hence in XSTREAM, based on all the processes involved, the block size is fixed to three seconds of input data.
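To give a feel for the buffer sizes implied by a three second block, the short calculation below converts the downlink rates quoted in the paper into bytes per block. It is a rough sizing sketch that assumes 1 Mbps equals 10^6 bits per second and that the whole downlink rate counts as input data.

```python
BITS_PER_MBPS = 1_000_000  # assumed decimal definition of Mbps


def block_bytes(rate_mbps: float, seconds: float = 3.0) -> int:
    """Bytes of input accumulated in one processing block."""
    return int(rate_mbps * BITS_PER_MBPS * seconds / 8)


# Downlink rates taken from the text: 105 Mbps (2 x 52.5) and 210 Mbps (4 x 52.5).
for rate in (105, 210):
    print(f"{rate} Mbps -> {block_bytes(rate) / 1e6:.3f} MB per 3 s block")
```

Under these assumptions a block holds roughly 39 MB at 105 Mbps and roughly 79 MB at 210 Mbps.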

Table 1. Mapping of tasks to computing devices
The first stage, which is mapped to an FPGA, acquires the data and does FSC validation and time stamping. The result is passed to the second stage, which is mapped to CPU, where decoding, decryption and decompression block identification tasks are performed. Additionally, the second stage also performs aux processing, which is a completely independent task and can be executed on CPU. The results of the second stage, viz., decompression blocks and discrete stagger values, are passed to GPU, which is the third stage of the pipelined architecture proposed in the XSTREAM software. The GPU performs decompression, radiometric normalization and stagger corrections as three independent kernels. The result of the third stage is a full swath and full resolution image, which is sent back to the host system. The GPU uses a high speed PCIe x16 bus for communicating with the host. Finally, the geo-tagging and writing to a file is done in a separate dedicated thread with asynchronous IO. Figure 2 shows the three way pipeline architecture of XSTREAM.
Figure 2. Three way pipeline processing adopted in XSTREAM
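The buffer based, three stage pipeline can be sketched with threads and bounded queues. This is an illustration of the pattern only: in XSTREAM the stages run on an FPGA, a CPU and a GPU, whereas here each stage is an ordinary Python callable supplied by the caller, and the queue depth of two blocks is an arbitrary choice.

```python
import queue
import threading
from typing import Callable, List

SENTINEL = object()  # unique marker signalling the end of the pass


def stage(work: Callable, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Run one pipeline stage: take a block, process it, pass it downstream."""
    while True:
        block = inbox.get()
        if block is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next stage
            return
        outbox.put(work(block))


def run_pipeline(blocks: List, stages: List[Callable]) -> List:
    """Push blocks through the stages, each stage running in its own thread."""
    queues = [queue.Queue(maxsize=2) for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
               for i, fn in enumerate(stages)]
    results: List = []

    def collect() -> None:
        # Drain the final queue so that upstream stages never block forever.
        while True:
            item = queues[-1].get()
            if item is SENTINEL:
                return
            results.append(item)

    threads.append(threading.Thread(target=collect))
    for t in threads:
        t.start()
    for b in list(blocks) + [SENTINEL]:
        queues[0].put(b)
    for t in threads:
        t.join()
    return results
```

While one block is in the last stage, the next block can already be in the middle stage and a third in the first stage, so the sustained throughput is set by the slowest stage rather than by the sum of all stages.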

XSTREAM System and Process
Based on the analysis, the XSTREAM software is designed as four separate processes, namely 'Real-Time (RT) stream Acquisition', 'RT Data processing', 'RT Full resolution scroll display' and 'XScheduler'. This is to modularize and arrest the propagation of failure of one module to others. Every process is engineered such that it can work independently. The Real-Time Acquisition module is responsible for acquiring the data and writing the valid frames into the host memory area. The data can also be flushed to disk in real-time or flushed after the pass for analysis and playback support. This procedure is adopted to provide a fail-safe option in case of failure in RT Processing.
The 'RT Data processing' module is responsible for processing the data in blocks and in real-time. Processing includes all the steps as discussed in section 2.2. 'RT Full resolution Display' displays the processed Level-1A products on a multi-screen display. The display also provides minimal navigational aids such as scroll speed adjustments, jump to arbitrary location within the strip and minimal enhancements, such as contrast stretch. This process can also be used in offline mode to view already processed strips. The 'XScheduler' controls all the processes and displays the status information, error messages, alerts requiring immediate action, logs etc., on a GUI. All the real-time processes are controlled by XScheduler. One of the main tasks of the scheduler is to handle clash scenarios by providing pre-emptive priority based scheduling as well as manual override options. XScheduler polls for the pass schedules of each mission and updates the process schedule accordingly.
Figure 3 shows the DFD and process architecture of XSTREAM software. The hardware block diagram of the host system is shown in Figure 4.
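Priority based handling of clashing passes with a manual override can be illustrated as follows. This is a simplified, static sketch and not the XScheduler logic itself: the `Pass` fields, the convention that a larger number means higher priority, and the rule that an overridden mission always wins are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pass:
    mission: str
    start: float    # seconds on any common time base
    end: float
    priority: int   # assumed convention: larger value means more important


def overlaps(a: Pass, b: Pass) -> bool:
    """True when the two passes share any time interval."""
    return a.start < b.end and b.start < a.end


def resolve(passes: List[Pass], override: Optional[str] = None) -> List[Pass]:
    """Keep a clash-free set of passes, preferring the override, then higher priority."""
    def rank(p: Pass):
        # False sorts before True, so the overridden mission comes first.
        return (p.mission != override, -p.priority, p.start)

    kept: List[Pass] = []
    for p in sorted(passes, key=rank):
        if not any(overlaps(p, k) for k in kept):
            kept.append(p)
    return sorted(kept, key=lambda p: p.start)
```

For example, if a low priority pass from 0 s to 600 s clashes with a high priority pass from 300 s to 900 s, `resolve` keeps the latter unless the operator names the former's mission as the override.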

XSTREAM Software Implementation Specifications
The entire software is built on COTS hardware and open platforms. A standard Linux operating system with ANSI C/C++ language and the gcc [18] compiler is used. On the host side, POSIX threads are used for exploiting all the cores in the CPU, while the OpenCL API is used for GP/GPU programming. The OpenCL implementation is chosen because it is independent of the GPU provider and provides portability across all the computing devices viz., CPU, GPU, etc. OpenSceneGraph [19] is used for real-time rendering of processed images on the multi-screen display. The Scheduler module is developed using the Qt [20] User Interface (UI) framework. The Qt toolkit is lightweight and is an open source project. For synchronization between all the modules within XSTREAM, standard POSIX inter process communication (IPC) mechanisms are adopted.

TEST RESULTS
The XSTREAM host system having the configuration listed in Table 2 is used for testing. The system was connected to a COTS demodulator of M/S Cortex make at NRSC-Shadnagar. The COTS High Speed Data Acquisition Card supplied by M/S Apollo Microsystems [21], Hyderabad, was used to acquire data from the de-mod and was connected to the PCI-X bus within the host system. The high speed acquisition card also had an IRIG-B interface to stamp time on the valid frames. The system was initially tested with Cartosat-2/2A/2B passes for all modes, such as Real-Time (RT), Solid State Recorder (SSR) Playback (PB) and mixed modes i.e., RT and PB. Later the software, with mission specific modifications, was also tested for Cartosat-1 and Resourcesat-2 missions. The system was continuously tested for more than a year, and specific evaluations for the following data sets were performed to study the radiometric quality and geometric accuracies.
The results of both processes i.e., conventional processing and XSTREAM processing, are compared below. Sys-I, as mentioned in the above table, is the conventional system, which consists of two Itanium based servers: one performs data acquisition and Level-0 processing, while the second server performs the Level-1A processing in offline mode. Sys-II is the XSTREAM host system having the configuration shown in Table 2.
The radiometric quality was visually verified and found to be similar for both the products. Similarly, the Level-2 products generated from both chains showed similar accuracies.

CONCLUSIONS
Satellite payload data processing to generate the level-1A product involves a sequence of multiple tasks that need to be executed one after the other. Conventionally they are done on high-end server(s) in a sequential mode, which is not only suboptimal but also introduces substantial delays in actual usage of the data, and thereby is not suitable for applications needing quick response. This work presents generation of level-1A products in real-time using a single heterogeneous computing platform, namely XSTREAM, having a combination of FPGA, GP/GPU and multi-core CPUs. The approach adopted involves reengineering of all the software modules and careful orchestration among all computing entities within a single system and a single process.
The XSTREAM system is tested for the Cartosat series of satellites having data rates of 105 Mbps (2 (I&Q) x 52.5 Mbps) and 210 Mbps (4 x 52.5 Mbps) downlink. All operational modes such as RT, PB, and SSR were tested and the products were generated in real-time (during the live pass) with latencies between 6 and 10 seconds, by using a 2-socket CPU system with 8 cores each, along with one GP/GPU card per channel (I&Q) of data. The radiometric quality of both the products, i.e., the product generated through the normal offline mode and through XSTREAM, appeared similar, while the geometric accuracy for level-1B generation was slightly inferior as compared to that of the offline processing. However, the level-2 product accuracy again showed similar results for the conventional offline processing as well as for on-line processing through XSTREAM.
The system is mission neutral, highly modular and independently scalable to support any future missions such as Cartosat-3, which has higher data rates. Future work is also oriented towards processing of SAR missions, such as RISAT-1, in which the challenge would be not only the high data rates but also signal processing. Another direction of work is to generate level-2 products in real-time with automatic identification of GCPs.

Figure 3. DFD and process architecture of XSTREAM software

Figure 4. Hardware block diagram of XSTREAM system

Figure 5. XScheduler displaying the list of passes scheduled and RT Processing stage

Level-1A product generation process (continued):
viii. Stagger Estimation: The stagger value is specific to missions having staggered placement of CCD arrays, like the Cartosat satellites. The stagger value depends on the satellite look direction (Roll bias), and hence it changes for each scene. This requires complex geo-processing using OAT values.
ix. Radiometric normalization: Due to the non-linear nature of CCD devices, a gray level normalization process is to be performed. Often this process involves passing through a Look-Up Table (LUT) for every pixel. The LUT is a pre-computed table based on lab settings, and is also routinely updated using a calibration test site.
x. Data Alignment: Since the data is processed from two independent chains, there is a possibility of data mis-alignment for reasons such as loss of frames, bit-errors, process exception in one of the channels I/Q etc. Hence the data from both I & Q channels are to be aligned using the time tag and line count extracted from the video data.
xi. Stagger Correction: As explained in step (viii), due to the staggered placement of odd pixels and even pixels in the focal plane, one of the images needs to be shifted (resampled) by the value computed in step (viii) to generate a stagger free, full swath, original resolution image.
xii. Strip Separation and Geo Tagging: For missions like Cartosat-2 (which involves manoeuvring between two successive spot acquisitions), the data may contain manoeuvre data in the case of RT acquisition, while in the case of PB the strips are streamed continuously without any manoeuvre data. Hence the process involves separating the scenes and writing the separated strips as independent images. The processed aux values (OAT) are stored in Auxiliary Data Interchange Format (ADIF) and are used to compute the geo-coordinates of the scene. The geo-coordinates are computed at regular intervals and stored in a separate file (grd file), which is used in subsequent processing, such as level-2 processing and lat/long readout in the display process etc.
xiii. Geo-Image generation: The image generated in step (xi) and the geo-coordinates computed at regular intervals in step (xii) are combined together and written in a native format as a level-1A product (the design is flexible and the product can be written in any other open format like HDF5 [15]).

Table 2. Hardware configuration of XSTREAM Host