MULTI-AGENT LEARNING FRAMEWORK FOR ENVIRONMENT REDUNDANCY IDENTIFICATION FOR MOBILE SENSORS IN AN IOT CONTEXT

From an IoT point of view, the continuous growth of cheap and versatile sensor technologies has generated a massive data flow in communication networks. Most of the time this flow carries unnecessary or redundant information, which requires larger storage centers and more time to process and analyze the data. Most of this redundancy is due to the fact that network nodes are unable to identify environmental cues that signal relevant measurement changes, and instead remain at a static location collecting the same data. In this work we propose a multi-agent learning framework based on two theoretical tools. Firstly, we use Gaussian Process Regression (GPR) to make each node capable of inferring information about the environment from its current measurement and the measurements taken by its neighbors. Secondly, we use the rate distortion function to define a boundary where the information coming from the environment is neither redundant nor misunderstood. Finally, we show how the framework is applied in a mobile sensor network in which sensors decide to be more or less exploratory by means of the parameter s of the Blahut-Arimoto algorithm, and how this affects the measurement coverage of the spatial area being sensed.


INTRODUCTION
The amount of data collected in IoT applications grows continuously due to the wide variety of sensing devices and the growing number of applications using them. New devices measure a wide range of variables in surveillance contexts such as air pollution (CO, NO2, PM2.5 and PM10) (Zheng et al., 2013), meteorological data (temperature, wind speed, humidity), assisted living (heartbeat, SpO2) (González-Valenzuela et al., 2011), disaster management and intelligent transportation, to name just some examples of the extensive spectrum of applications.
Most of these applications use analytic and statistical methods such as Reinforcement Learning (Song et al., 2013, Sendra et al., 2014), Deep Recurrent Neural Networks (Ma et al., 2015, Ong et al., 2014), Markov Decision Processes and Gaussian Process Regression (Cheng et al., 2014, Dongbing Gu and Huosheng Hu, 2012, Kunz et al., 2009) to infer the environment behavior in uncovered zones and to establish collaborative decision making between the network nodes. However, despite their good performance in spatial and temporal prediction, these methods do not specifically address the redundant information taken from the environment and injected into the network from the measurement points, which for some applications generates unnecessary data accumulation in data centers or repetitive transmission to sink nodes.
On the other hand, from an IoT point of view, applications involving node mobility might have isolated nodes or clusters during a given period of time. This requires network implementations with characteristics such as self-organization and event-driven decision making, so that nodes are able to identify environment changes based only on their neighborhood information and to use them to make decisions that improve features such as transmission times, sleep-awake schemes and mobility.
In this regard, we propose a multi-agent framework that allows the nodes of a sensor network (henceforth referred to as agents) to identify cues from the environment, with low information redundancy, through information inferred from their neighborhood knowledge. The proposed framework consists of the following steps:
• Inference of the environment information for each agent using Gaussian process regression, based on its neighborhood information.
• A decision making scheme that depends on the mutual information of each agent about its environment. To calculate the mutual information between the agent and its environment we use the rate distortion function by means of the Blahut-Arimoto algorithm.
• Agents can change their behavior depending on the input parameter s of the Blahut-Arimoto algorithm. In this way, they can decide where to move in a spatial field depending on the redundancy level of information, becoming more exploratory if they move to positions with low mutual information or less exploratory if they move to positions with high mutual information. This will be shown in section 5.
The rest of this paper is organized as follows. Section 3 describes the rate distortion function and how it can be obtained using the Blahut-Arimoto algorithm. In section 3.2.1 we describe the relevance of the parameter s to define distortion-information points. In section 3.3 we show general concepts about Gaussian process regression and its focus in our approach, and in section 3.4 we show some graph theory concepts that will be useful to describe the agent interactions when they become more or less exploratory. Finally, in section 4 we describe the proposed framework in detail.

RELATED WORK AND CONTRIBUTION
In terms of mobile sensor networks in an IoT context, a wide variety of approaches has been proposed to reduce the redundancy and energy consumption of sensing devices by means of data collection and data aggregation schemes, as reported in (Ang and Phooi Seng, 2016). In terms of data collection, in (Li et al., 2013) compressed sensing (CS) theory is used to reduce the number of sampling points taken by each sensor, thereby reducing redundant data and energy consumption, since the sparsely sampled data is reconstructed at the base station, where energy is not constrained. In (Wang et al., 2014), energy consumption is reduced by scheduling the sleep and awake modes of nodes located in zones where the measures are considered redundant. This approach uses complex network concepts, specifically a random network model, to define, through parameters such as the degree distribution, clustering coefficient and shortest path, the minimum number of nodes needed to keep the network connected within a specific zone. From the data aggregation point of view, even though most approaches mainly aim to increase the network lifetime and reduce energy consumption, redundancy reduction is also an issue that researchers take into account.
In (Zeydan et al., 2012), an adaptive and distributed routing algorithm based on game theory is proposed. In this approach, the correlated data between nodes is compressed in data streams to reduce the network load. This aggregated data, together with the energy and interference, defines the cost function used by the routing algorithm.
In (Scaglione and Servetto, 2005), each node generates an estimate of the measurements of the entire network in order to identify correlated information among measurement points and thus avoid its transmission to a sink through a multi-hop path. This approach uses the rate distortion function so that any node can obtain information from all its peers under a prescribed distortion condition. These works, despite being concerned with redundant information, are not focused on a multi-agent context where agents can make decisions according to their perception of the environment. Additionally, some of them require a full information network scheme, i.e., the nodes must be connected with all their peers.
On the other hand, in terms of research combining GPR models (also known as Kriging filters) and multi-agent learning contexts, most of the work has focused on algorithms that find informative locations through uncertainty maximization. In (Dongbing Gu and Huosheng Hu, 2012), a distributed Gaussian process regression (DPGR) that only requires neighborhood information is implemented. In this work, the information entropy is used to detect locations with high uncertainty and to define a utility function for the centroidal Voronoi tessellation (CVT) algorithm (Cortes et al., 2004). In (Nguyen et al., 2016), a sampling strategy based on entropy maximization is designed to find the most informative locations for mobile robotic wireless sensor networks (MRWSs) over a spatial field modeled with a Gaussian process. In (Oh et al., 2010), sparse Gaussian processes are first used to reduce the computational cost on each sensor before the environment state is predicted. Then, to define an exploratory rule towards the more informative locations, three informative strategies are used and compared: the informative vector machine (IVM), principal feature analysis (PFA), and a mutual information based measurement selection algorithm (MI). In (Xu and Choi, 2011), a self-organizing scheme is presented where the agents move in an anisotropic field modeled with a spatio-temporal Gaussian process. A centralized sampling strategy to determine the next position of the agent is proposed based on the minimization of the Fisher Information Matrix. In (Binney et al., 2010), a path planning approach for autonomous underwater vehicles (AUVs) is presented. The movement is based on the minimization of the uncertainty (entropy) between sampled and un-sampled positions. In order to calculate the entropy, an underlying scalar field is modeled using a Gaussian process. Even though these works are concerned with high or low uncertainty in spatial fields, their main focus is not the redundancy of information and its identification from the agent side.
With this in mind, the main contribution of this work is the ability of the agents to identify highly or weakly correlated cues from the environment based on the parameter s of the Blahut-Arimoto algorithm, which, to the best of our knowledge, has not been discussed in the literature. In this sense, agents in an IoT application can decide to be more or less exploratory in a spatial field, always within the limits defined by the rate distortion function, which guarantees a minimum mutual information scheme between agents and their environment, i.e., the minimum redundancy.

Rate Distortion Theory
In Fig. 1 we show the interaction between an agent and its environment, which emulates the transmitter-receiver relationship through a communication channel described in (Shannon, 1948). Here, we represent the information that the agent has about the environment by means of the random variable Y, which can take any value in an alphabet $\mathcal{Y}$, with probability distribution $p(y) = \Pr\{Y = y\}$, $y \in \mathcal{Y}$. The random variable X represents the environmental information, with alphabet $\mathcal{X}$ and probability distribution $p(x) = \Pr\{X = x\}$, $x \in \mathcal{X}$. The difference between X and Y can be calculated using the squared error measure

$$L(x, y) = (x - y)^2. \tag{1}$$

It means that if the agent has complete certainty about the environmental information, X = Y and L(x, y) = 0. The relationship between X and Y is given by the conditional probability p(y|x), which expresses how the knowledge of Y reduces the uncertainty about X. This can be defined in terms of the entropy as

$$I(X; Y) = H(Y) - \sum_{x \in \mathcal{X}} p(x)\, H(Y|X = x), \tag{2}$$

where I(X; Y) is the mutual information between X and Y,

$$H(Y) = -\sum_{y \in \mathcal{Y}} p(y) \log p(y) \tag{3}$$

is the entropy of Y, and

$$H(Y|X = x) = -\sum_{y \in \mathcal{Y}} p(y|x) \log p(y|x) \tag{4}$$

is the conditional entropy of Y given X = x.

It means that the mutual information can take the form

$$I(X; Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x)\, p(y|x) \log \frac{p(y|x)}{p(y)}. \tag{5}$$

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-4/W11, 2018 3rd International Conference on Smart Data and Smart Cities, 4-5 October 2018, Delft, The Netherlands

Fig. 1. Agent-environment interaction from a rate distortion theory perspective.
Now, suppose that we want to define the minimum mutual information between X and Y necessary for the agent to understand the environmental information. Logically, this minimization must be subject to an expected distortion value, i.e., a high mutual information corresponds to a low distortion and vice versa. In this sense, we have the mutual information minimization problem, also called the rate distortion function R(D), given by

$$R(D) = \min_{p(y|x)\,:\ \sum_{x}\sum_{y} p(x)\, p(y|x)\, L(x, y) \,\le\, D} I(X; Y), \tag{6}$$

where D is the expected distortion value.
Normally, minimizing the mutual information analytically is not straightforward. In this regard, the Blahut-Arimoto algorithm offers a less complex procedure to calculate it.
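For discrete alphabets, the mutual information and the expected squared-error distortion discussed in this section can be evaluated directly. The following is a minimal sketch, assuming NumPy and natural logarithms (so the result is in nats); the function names `mutual_information` and `expected_distortion` are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def mutual_information(p_x, p_y_given_x):
    """I(X;Y) in nats; p_x has shape [nx], p_y_given_x has shape [nx, ny]."""
    p_x = np.asarray(p_x, dtype=float)
    q = np.asarray(p_y_given_x, dtype=float)
    p_y = p_x @ q                          # marginal p(y) = sum_x p(x) p(y|x)
    joint = p_x[:, None] * q               # joint p(x, y) = p(x) p(y|x)
    mask = joint > 0                       # terms with p(x, y) = 0 contribute 0
    p_y_full = np.broadcast_to(p_y, q.shape)
    return float(np.sum(joint[mask] * np.log(q[mask] / p_y_full[mask])))

def expected_distortion(p_x, p_y_given_x, x_vals, y_vals):
    """E[L(X, Y)] for the squared error L(x, y) = (x - y)^2."""
    x_vals = np.asarray(x_vals, dtype=float)
    y_vals = np.asarray(y_vals, dtype=float)
    loss = (x_vals[:, None] - y_vals[None, :]) ** 2
    joint = np.asarray(p_x, dtype=float)[:, None] * np.asarray(p_y_given_x, dtype=float)
    return float(np.sum(joint * loss))

# Example: a noiseless channel over a binary alphabet (agent knows the environment exactly).
p_x = [0.5, 0.5]
identity = np.eye(2)
print(mutual_information(p_x, identity))                    # equals H(X) = ln 2 here
print(expected_distortion(p_x, identity, [0, 1], [0, 1]))   # zero distortion
```

The noiseless example corresponds to the case X = Y mentioned above: the distortion vanishes and the mutual information equals the entropy of the source.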

The Blahut-Arimoto Algorithm
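Before explaining the Blahut-Arimoto algorithm, we express (6) through the Lagrange multipliers s¹ and λ. Then

$$J\big(p(y|x)\big) = \sum_{x}\sum_{y} p(x)\, p(y|x) \log \frac{p(y|x)}{p(y)} - s\left(\sum_{x}\sum_{y} p(x)\, p(y|x)\, L(x, y) - D\right) + \sum_{x} \lambda(x) \left(\sum_{y} p(y|x) - 1\right), \tag{7}$$

where the inequality restriction p(y|x) ≥ 0 is temporarily ignored. Now, let

$$F\big(p(y), p(y|x)\big) = \sum_{x}\sum_{y} p(x)\, p(y|x) \log \frac{p(y|x)}{p(y)} - s \sum_{x}\sum_{y} p(x)\, p(y|x)\, L(x, y). \tag{8}$$

Then

$$R(D) = sD + \min_{p(y)} \min_{p(y|x)} F\big(p(y), p(y|x)\big), \tag{9}$$

which is a double minimization problem that is solved in two steps described in appendix B. The results of these two steps are the basis for the definition of the Blahut-Arimoto algorithm (Blahut, 1972), which calculates p*(y|x) and p*(y) iteratively until a convergence condition is fulfilled (see Algorithm 1).

¹ We use the symbol s as one of the Lagrange multipliers to be consistent with literature related to the Blahut-Arimoto algorithm.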
Algorithm 1: BLAHUT-ARIMOTO
Input: s, p0(y)

The inputs s and p0(y) are chosen according to the following rules: s is negative, since it is the slope of the R(D) curve, and the initial distribution is uniform, p0(y) = 1/|Y|, where |Y| is the number of elements in Y.
Note that the algorithm requires the prior distribution p(x), which in our approach is determined by the environment information gathered by each agent. We will describe how to define it in section 4.
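The iteration summarized in Algorithm 1 can be sketched with the standard alternating updates of (Blahut, 1972): p(y|x) is set proportional to p(y) exp(s L(x, y)), and p(y) is then recomputed as the marginal of p(x) p(y|x). This is a minimal sketch under stated assumptions rather than the original listing: the function name `blahut_arimoto`, the stopping tolerance `tol`, the iteration cap `max_iter`, and the 101-point discretization of the alphabet are all illustrative choices.

```python
import numpy as np

def blahut_arimoto(p_x, loss, s, p0_y=None, tol=1e-8, max_iter=2000):
    """Iterate the two Blahut-Arimoto updates for a slope parameter s <= 0.

    p_x:  source distribution p(x), shape [nx]
    loss: distortion matrix L(x, y), shape [nx, ny]
    Returns (p_y_given_x, p_y, D, R) with R in nats.
    """
    p_x = np.asarray(p_x, dtype=float)
    loss = np.asarray(loss, dtype=float)
    ny = loss.shape[1]
    # Uniform initial output distribution p0(y) = 1/|Y| unless one is given.
    p_y = np.full(ny, 1.0 / ny) if p0_y is None else np.asarray(p0_y, dtype=float)
    kernel = np.exp(s * loss)

    def channel(p_y):
        q = p_y[None, :] * kernel            # p(y|x) proportional to p(y) exp(s L(x, y))
        return q / q.sum(axis=1, keepdims=True)

    for _ in range(max_iter):
        new_p_y = p_x @ channel(p_y)         # p(y) = sum_x p(x) p(y|x)
        converged = np.max(np.abs(new_p_y - p_y)) < tol
        p_y = new_p_y
        if converged:
            break

    q = channel(p_y)
    p_y = p_x @ q
    joint = p_x[:, None] * q
    mask = joint > 0
    D = float(np.sum(joint * loss))          # expected distortion
    R = float(np.sum(joint[mask] * np.log(q[mask] / np.broadcast_to(p_y, q.shape)[mask])))
    return q, p_y, D, R

# Example: X on [-10, 10] with a discretized N(0, 3) prior (assumed 101-point grid).
x = np.linspace(-10.0, 10.0, 101)
p_x = np.exp(-x**2 / (2 * 3.0))
p_x /= p_x.sum()
loss = (x[:, None] - x[None, :]) ** 2        # squared error; reproduction alphabet = x grid
for s in (-10.0, -2.0):
    _, _, D, R = blahut_arimoto(p_x, loss, s)
    print(f"s = {s}: D = {D:.3f}, R = {R:.3f} nats")
```

Each value of s yields one {D, R} pair; sweeping s over a range of negative values traces out an approximation of the R(D) curve.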

3.2.1 The Parameter s
In order to understand the relevance of parameter s, let us consider a random variable X with alphabet $\mathcal{X} = \{x \in \mathbb{R} : -10 \le x \le 10\}$. Assuming that p(x) is a normal distribution given by $\mathcal{N}(\mu = 0, \sigma^2 = 3)$ and applying the Blahut-Arimoto algorithm for −20 ≤ s ≤ −2, we obtain the curve of the R(D) function shown in Fig. 2 (Blahut, 1972). The allowed region depicted in this figure is composed of the set of all pairs {R, D} that guarantee that the agent can recover the environmental information. However, some points in this region might use more information than necessary to represent some data, causing redundancy. The {R, D} pairs on the curve R(D) set the limit where the mutual information is minimum and still enough to understand what is happening in the environment. In other words, on the curve we have points where the information is neither redundant nor misunderstood. The relationship between the parameter s and the R(D) curve is visible if we recall Eq. (9): s is the slope of the R(D) curve at the point {D, R}, so each value of s selects one point on the curve. The parameter s also affects the certainty of the conditional distribution p(y|x). Figure 4 shows how the variance of p(y|x = −5.15) changes depending on the value of s. Note how the variance increases with s. At this point it is noticeable that the selection of s can define the level of distortion that we want to have for a specific application.
In our approach, this parameter is important to establish a greedy behavior of agents exploring the environment in a multi-agent system. This is described in more detail in section 4.

Non Parametric Learning
In a conventional learning process, linear parametric regressions of the form

$$y = f(x) = w_0 + w_1 x + \epsilon \quad (\epsilon = \text{irreducible error}) \tag{12}$$

have been widely used to predict the behavior of a variable y from a set of known values of x, where the parameters $w_n$ are calculated by minimizing an error condition. For the case of two parameters, this minimization is given by

$$\min_{w_0, w_1} \sum_{i} \left(y_i - w_0 - w_1 x_i\right)^2. \tag{13}$$

This kind of regression, despite being extensible to non-linear cases, becomes unmanageable as the number of parameters increases.
In this regard, the GPR offers a non parametric alternative to address regression problems, since it finds a distribution over all possible vectors f (x), coherent with the previously known data (named training data set).A brief explanation of GPR is given for the reader to better understand how it works and is used in our approach.
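As a point of comparison for the parametric case described at the start of this section, the two-parameter fit can be sketched as an ordinary least-squares problem. This is a minimal sketch, assuming that the error condition is the sum of squared residuals; the helper name `fit_line` and the toy data are hypothetical.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares estimate of (w0, w1) in y = w0 + w1 * x + error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x])   # columns: 1, x
    w, *_ = np.linalg.lstsq(design, y, rcond=None)
    return w

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x          # noiseless data, so the fit recovers w0 = 1, w1 = 2
w0, w1 = fit_line(x, y)
print(w0, w1)
```

Adding more basis functions only adds columns to the design matrix, which is where the number of parameters starts to grow.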

GPR
GPR consists of defining a posterior vector $z_k$ from a prior vector $z_h$ that satisfies a training data set.
The pair $(c_{hi}, z_{hi})$ denotes the training point i, in which $c_{hi}$ is a 2-D coordinate and $z_{hi}$ its corresponding value. In the same way, the pair $(c_{kj}, z_{kj})$ denotes the testing point j, where $c_{kj}$ is a 2-D coordinate and $z_{kj}$ the value that we want to predict. Stacking these values gives the vectors $z_h = [z_{h1}, z_{h2}, \dots]^T$ and $z_k = [z_{k1}, z_{k2}, \dots]^T$. Since the vectors $z_h$ and $z_k$ are assumed to be Gaussian distributed, their combination must also follow a Gaussian density given by

$$\begin{bmatrix} z_h \\ z_k \end{bmatrix} \sim \mathcal{N}\left(0, \begin{bmatrix} K & K^{*} \\ K^{*T} & K^{**} \end{bmatrix}\right),$$

where $K$ is the training data covariance matrix, $K^{*}$ the cross-covariance matrix between training and test data, and $K^{**}$ the covariance matrix for the testing points (Rogers and Girolami, 2016). In our approach, the calculation of the above covariance matrices is based on the popular RBF (Radial Basis Function) kernel, which is defined as

$$c(b_n, b_m) = \alpha \exp\left(-\gamma \lVert b_n - b_m \rVert^2\right),$$

where $c(b_n, b_m)$ is the covariance between a pair of points $b_n$ and $b_m$ belonging to any set being evaluated (training or test), and the parameters α and γ define the amplitude and smoothness of the resultant distribution.
Finally, the conditional distribution of the posterior vector given the prior vector is

$$p(z_k | z_h) = \mathcal{N}(\mu^{*}, \Sigma^{*}),$$

where

$$\mu^{*} = K^{*T} K^{-1} z_h$$

and

$$\Sigma^{*} = K^{**} - K^{*T} K^{-1} K^{*}.$$

For the calculation of $z_k$ we can use the approximation

$$z_k \approx \mu^{*} + L u, \quad u \sim \mathcal{N}(0, I),$$

where L is the square root of the covariance matrix, such that $\Sigma^{*} = LL^T$. The matrix L can be obtained using the Cholesky decomposition.
In our study, the set of predicted values given by $z_k$ represents the environmental information for each agent, which is calculated using the neighborhood information as training data. We will show it in detail in section 5.
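The GPR prediction step described in this section can be sketched in a few lines. This is a minimal sketch, assuming a zero-mean prior, an RBF kernel with hypothetical values α = γ = 1, and a small diagonal `jitter` term added for numerical stability (not part of the formulation above); the function names and the toy neighbor data are illustrative only.

```python
import numpy as np

def rbf_kernel(a, b, alpha=1.0, gamma=1.0):
    """alpha * exp(-gamma * ||a_n - b_m||^2) for every pair of rows of a and b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return alpha * np.exp(-gamma * np.maximum(sq, 0.0))

def gpr_predict(c_h, z_h, c_k, alpha=1.0, gamma=1.0, jitter=1e-8):
    """Posterior mean and covariance of z_k given training data (c_h, z_h)."""
    K = rbf_kernel(c_h, c_h, alpha, gamma) + jitter * np.eye(len(c_h))
    K_s = rbf_kernel(c_h, c_k, alpha, gamma)       # cross-covariance, shape [n_train, n_test]
    K_ss = rbf_kernel(c_k, c_k, alpha, gamma)
    mu = K_s.T @ np.linalg.solve(K, z_h)
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    return mu, cov

def gpr_sample(mu, cov, rng, jitter=1e-6):
    """Draw z_k = mu + L u with cov = L L^T obtained by Cholesky decomposition."""
    L = np.linalg.cholesky(cov + jitter * np.eye(len(mu)))
    return mu + L @ rng.standard_normal(len(mu))

rng = np.random.default_rng(0)
c_h = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # neighbor coordinates
z_h = np.array([1.0, 2.0, 0.5, 1.5])                                # neighbor measurements
c_k = np.array([[0.5, 0.5], [2.0, 2.0]])                            # points to predict
mu, cov = gpr_predict(c_h, z_h, c_k)
sample = gpr_sample(mu, cov, rng)
print(mu, np.diag(cov), sample)
```

The diagonal of the returned covariance gives the predictive variance at each test point; it shrinks near the training coordinates and approaches α far from them.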

Graph Theory in Multi-Agent Systems
Graph theory and complex networks are highly relevant for describing multi-agent systems through parameters such as the clustering coefficient, average degree and degree distribution, among others. In this section, we explain some concepts related to undirected graphs that are relevant for the analysis of results in section 5.
Consider the definition of a graph as G = (V, E), where the sets V and E represent the nodes and the edges connecting them, respectively.An edge denoted as (i, j) links the nodes i and j.If a pair of nodes are connected through an edge, they are said to be adjacent or neighbors.
A general way to describe the node interaction in a network is by means of the N × N symmetric adjacency matrix $A = \{a_{ij}\}$, which is defined such that

$$a_{ij} = \begin{cases} 1 & \text{if } (i, j) \in E, \\ 0 & \text{otherwise,} \end{cases}$$

where N is the number of nodes in the network.
The degree of a node i, denoted as $k_i$, is defined as the number of edges connected to it. In terms of the adjacency matrix, it is given by

$$k_i = \sum_{j=1}^{N} a_{ij};$$

therefore, the network average degree is

$$\langle k \rangle = \frac{1}{N} \sum_{i=1}^{N} k_i.$$

As we will see in section 5, the value of $\langle k \rangle$ shows how the number of connections of a node changes with the value of the parameter s.
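The adjacency matrix, node degrees and average degree defined above can be computed as follows. This is a minimal sketch, assuming nodes are indexed from 0 and edges are given as a list of index pairs; the helper names and the four-node example graph are hypothetical.

```python
import numpy as np

def adjacency_matrix(n_nodes, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph."""
    A = np.zeros((n_nodes, n_nodes), dtype=int)
    for i, j in edges:
        A[i, j] = 1
        A[j, i] = 1          # undirected: (i, j) and (j, i) are the same edge
    return A

def degrees(A):
    """Degree k_i of every node: the row sums of the adjacency matrix."""
    return A.sum(axis=1)

def average_degree(A):
    """Network average degree: the mean of the node degrees."""
    return float(degrees(A).mean())

# Four agents connected in a chain 0 - 1 - 2 - 3.
A = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])
print(degrees(A), average_degree(A))
```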

AGENT SELF-IDENTIFICATION OF REDUNDANT INFORMATION
So far, we have shown how GPR can be used to obtain environment information and predict its behavior from a prior set of training data. Additionally, we have explained the usefulness of the R(D) function to obtain the minimal mutual information between an agent and its environment. Now, let us suppose that we have a multi-agent system whose individuals obtain information about the environment, in order to make decisions, based only on the information that they can collect from their neighborhood. In addition, this collected information is the minimum necessary to understand what is happening in the agent's vicinity, which avoids using resources to gather and interpret unnecessary data.
In this regard, we propose the multi-agent learning framework shown in Fig. 5, which combines the environmental-predictive capacity of GPR and the low redundancy characteristics offered by the rate distortion function. This figure shows the steps of the framework, composed of the learning stage, followed by a step where each agent can fix its greedy behavior through the parameter s, and the mutual information minimization stage. Firstly, each agent calculates $\mu^{*}$ and $\Sigma^{*}$, representing the mean and covariance of the predicted set, using as training set the data taken from its neighbors, given by $c_h$ and $z_h$. Secondly, the parameter s can be tuned for each agent in order to perceive more or less distortion about its environment, allowing it to identify more or less correlated cues and make decisions about them. Thirdly, the predicted environment information, given by $\mu^{*}$ and $\Sigma^{*}$, sets the input probability distribution p(x) of the Blahut-Arimoto algorithm, which is used to calculate $p(z_{h0}|z_k)$, i.e., the conditional probability between the agent information $z_{h0}$ and the environment information $z_k$. Finally, the agent modifies its behavior according to a strategy β, which indicates the level of mutual information to follow, i.e., whether it will be more or less exploratory.

The Greedy behavior of Parameter s
As we mentioned before, the value of the parameter s determines the pair {R, D} on the R(D) curve. In our model, the choice of this parameter for each agent depends on the calculated variance $\Sigma^{*}_j$ at the predicted point $c_{kj}$. This can be understood if we consider the normal distribution $\mathcal{N}(\mu^{*}_j, \Sigma^{*}_j)$ as the input probability distribution p(x) of the Blahut-Arimoto algorithm, which, according to (Cover and Thomas, 2005), lets us calculate R(D) through the expression

$$R(D) = \begin{cases} \dfrac{1}{2} \ln \dfrac{\Sigma^{*}_j}{D}, & 0 < D \le \Sigma^{*}_j, \\ 0, & D > \Sigma^{*}_j. \end{cases}$$

Thus, considering that s is the slope of the curve R(D) at a point {D, R}, we have

$$s = \frac{dR}{dD} = -\frac{1}{2D},$$

which, evaluated at the maximum distortion value $D = \Sigma^{*}_j$, gives us the upper boundary condition

$$s = -\frac{1}{2\Sigma^{*}_j}.$$

For this value of s, the distortion perceived by each agent about the environmental information is the highest. However, we are still within the values allowed by the rate distortion function. At this point, agents prove to be more exploratory (greedy) than for values of s associated with lower distortions, where they prefer to follow the same environmental cues followed by their peers.
In section 5, we will show how the proposed model behaves in a mobile sensor network simulated inside of a 2-D grid where a measured variable has different levels on each point.The results will show the network behavior for different values of the parameter s, generating more or less exploratory interest on each agent (sensor).
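The upper boundary for s discussed in this section can be checked numerically against the closed-form Gaussian rate distortion function. This is a minimal sketch, assuming natural logarithms and a hypothetical predicted variance of 3 at one candidate point; the function names are illustrative.

```python
import numpy as np

def gaussian_rate(D, var):
    """R(D) in nats for a Gaussian source of variance `var` under squared error."""
    D = np.asarray(D, dtype=float)
    return np.where(D < var, 0.5 * np.log(var / D), 0.0)

def s_upper_bound(var):
    """Slope of R(D) at the maximum distortion D = var."""
    return -1.0 / (2.0 * var)

var = 3.0                                   # hypothetical predicted variance at one point
h = 1e-6
# One-sided finite difference of R(D) just below D = var.
slope = (gaussian_rate(var, var) - gaussian_rate(var - h, var)) / h
print(slope, s_upper_bound(var))
```

The finite-difference slope and the closed-form bound should agree closely, which illustrates why larger predicted variances push the upper boundary of s towards zero.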

Algorithm Representation
Algorithm 2 shows a brief representation of the framework implementation. Observe how the Blahut-Arimoto algorithm is included to calculate the conditional probability between the agent measurement $z_{h0}$ and the environment predictions represented as $z_{kj}$. The returned parameter β describes the strategy to be followed by the agent, which in the case of a mobile sensor network, as we will see in section 5, is given by the movement towards the estimated point $c_{kj}$ with the lowest mutual information.
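One possible reading of the decision step described above is sketched below. It is not the original Algorithm 2: the way the mutual information is evaluated at each candidate point is an assumption made here for illustration. Each candidate is given a discretized normal prior built from its predicted mean and variance, a fixed number of Blahut-Arimoto iterations is run with the chosen s, and the agent moves to the candidate with the lowest resulting mutual information. The names `mutual_information_at` and `choose_next_point`, the 41-point alphabet spanning four standard deviations, and the iteration count are all hypothetical.

```python
import numpy as np

def mutual_information_at(mu_j, var_j, s, n_bins=41, n_iter=200):
    """I(X;Y) in nats after Blahut-Arimoto iterations on a discretized N(mu_j, var_j)."""
    sd = np.sqrt(var_j)
    x = np.linspace(mu_j - 4 * sd, mu_j + 4 * sd, n_bins)     # assumed alphabet
    p_x = np.exp(-(x - mu_j) ** 2 / (2 * var_j))
    p_x /= p_x.sum()
    kernel = np.exp(s * (x[:, None] - x[None, :]) ** 2)       # exp(s L(x, y))
    p_y = np.full(n_bins, 1.0 / n_bins)                       # uniform p0(y)
    for _ in range(n_iter):
        q = p_y[None, :] * kernel
        q /= q.sum(axis=1, keepdims=True)                     # update p(y|x)
        p_y = p_x @ q                                         # update p(y)
    joint = p_x[:, None] * q
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(q[mask] / np.broadcast_to(p_y, q.shape)[mask])))

def choose_next_point(c_k, mu, var, s):
    """Return the candidate coordinate with the lowest mutual information."""
    info = np.array([mutual_information_at(m, v, s) for m, v in zip(mu, var)])
    best = int(np.argmin(info))
    return c_k[best], info

# Two hypothetical candidate points with predicted means and variances.
c_k = np.array([[0.0, 0.0], [1.0, 1.0]])
mu = np.array([1.0, 1.0])
var = np.array([0.5, 4.0])
target, info = choose_next_point(c_k, mu, var, s=-2.0)
print(target, info)
```

In a full simulation, `mu` and `var` would come from the GPR prediction over the candidate points, and s could be set per point, for example to the upper boundary derived from the predicted variance.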

CASE OF STUDY: A MOBILE SENSOR NETWORK
The framework implementation is carried out in a mobile sensor network composed of 10 sensors (agents) with mobility capabilities deployed randomly in a 2-D grid that simulates a spatial field (see Fig. 6).Observe the variation of the hypothetical measure along the 2D-grid and the initial positions of the agents.

The Greedy Effect of s
The exploratory effect of parameter s on each agent is shown in Fig. 8. In Fig. 8a, s = −10. In this case, nearby agents exhibit a preference for the same environment cue, following similar trajectories and generating clusters; therefore, the coverage is reduced. In Fig. 8b, we use the upper boundary $s = -1/(2\Sigma^{*}_j)$. In this case, the agents become more exploratory, improving the coverage.
In Fig. 9 the trajectories of all agents are shown for 20 simulation steps and different values of s. In Fig. 9a, s = −10. In this case, agents exhibit redundant trajectories, which results in a loss of coverage in some regions. In Fig. 9b, $s = -4/\Sigma^{*}_j$. The coverage in this case improves, but some agents still explore regions with redundant measures. In Fig. 9c, $s = -1/(2\Sigma^{*}_j)$, i.e., the upper boundary for the parameter s. In this case, agents show a preference for moving through regions where the environment is more variable. In terms of coverage, it performs better than the two previous cases, since the exploration is reduced in zones where the measures are redundant.
Results in Fig. 10 show the variation of the degree for each one of the 10 agents when the value of s changes. Additionally, it is possible to note how the average degree of the system increases as the value of s decreases beyond $-4/\Sigma^{*}_j$. Agents following cues with low distortion (lower values of s) tend to form clusters and increase their average degree. In terms of data transmission, this means that redundant information is collected and sent to the data center or sink. On the other hand, agents following distorted cues (higher values of s) tend to have lower average degrees, avoiding cluster formation and redundancy in data centers.
In spite of the apparent differences in the agents' behavior for different values of s, it is important to note that any selected value below the upper boundary condition is a point on the rate distortion curve, and is therefore a point of minimal mutual information, i.e., the agents make decisions using just the necessary information from the environment.

CONCLUSIONS
Our study provides a framework for learning in multi-agent systems based on GPR, to gather predictive environment information, and on the rate distortion function, to define conditions for agent decision making under low redundancy. The results, implemented in a simulated mobile sensor network, show how the parameter s of the Blahut-Arimoto algorithm allows the agents to be more or less exploratory. This behavior can improve coverage efficiency and reduce the transmission of redundant data.
On the other hand, we have shown how the network average degree keeps lower values when s is closer to its upper boundary. This means that the agents maintain a low number of connections because they are continuously moving to uncovered regions. Additionally, agents show a preference for moving in zones where the environment is variable, avoiding the exploration of places with similar measurements.
Since the agents have the ability to detect locations with redundant information, a reduction in wasted energy is implicit when they avoid moving in those directions. Additionally, agents do not require a full information scheme, i.e., they do not need to get information from all their peers in order to make predictions about the environment, which translates into a reduction of the communication cost.

Fig. 8. The exploratory effect of parameter s on each agent. (a) s = −10: nearby agents exhibit a preference for the same environment cue. (b) $s = -1/(2\Sigma^{*}_j)$.

Fig. 10. Average degree for different values of s.