STRUCTURAL SIMILARITY MEASURE OF USERS PROFILES BASED ON A WEIGHTED BIPARTITE GRAPHS

The user profile is an important tool in several fields, such as recommendation and customization systems. It is used to narrow the amount of data or results provided to a specific user, and to reduce the processing cost and time of many systems. Whatever user profile model is used, updating and enriching it is an essential step in the information retrieval process in order to obtain more interesting and satisfactory results. This has led information systems to develop several enrichment techniques, based especially on similarity methods between user profiles. Similarity methods serve several tasks, such as detecting duplicate profiles in online social networks, addressing the cold start problem, and predicting which users may become friends as well as their future intentions. In this paper, we propose a new approach to express the similarity between user profiles: a structural similarity measure based on the SimRank similarity measure and the properties of bipartite graphs, which takes advantage of the information provided by the relational structure between user profiles and their interests. Our method is characterized by the propagation of similarity between the graph's nodes over iterations, from source nodes to their successors, so it finds profiles similar to the query profile whether the links between profiles are direct or indirect.


INTRODUCTION
The fundamental purpose of information systems (IS) is to provide results that better satisfy the needs a given user expresses through a query, using similarity measures to study the resemblance between this query and a collection of documents. To facilitate this task, these systems have started to add information about the user to the query, such as browsing history, social network profiles, information entered in forms, etc. A study by (Fijałkowski, 2011) showed that the best additional information to integrate during information retrieval is the user profile. This gave rise to personalized information search systems, and then to contextual information search systems based on the user profile, which integrate it into steps of the retrieval process such as relevance feedback, query reformulation and search result ordering. Sometimes these systems face profiles that do not contain all the information that could be useful to them, especially in the case of the cold start problem (Lika, 2014). Enriching these profiles is therefore essential, and the most widely used technique is to process and analyse the information of users similar to the user whose profile is to be completed and enriched. In this paper, we propose a new structural similarity measure, based on a weighted bipartite graph, to study the similarity between profiles, since we think that the information provided by the relational structure is of interest and deserves to be studied. We first present the user profile, its uses and some similarity measures, then introduce our structural similarity approach between user profiles together with an application, and end with a conclusion and our prospects for research.

The User Profile
According to (Hasan, 2013), a user profile is a collection of personal data associated with a specific user and described by a set of attributes. These attributes may include geographic location, academic and professional experience, objectives (short term and long term), behaviours, interests (professional, entertainment, commercial products, etc.), and so on. The user profile can be built in two ways: by the user himself, in which case it is called an explicit profile, or automatically from data resulting from the interactions between the user and the system, in which case it is called an implicit profile. The latter method is the most common, since manually entering parameters (preferences, interests, ...) can be tiring for the user, and expressing his needs this way can take a long time.

The Use of User Profile
User profiles are used in several areas to speed up and facilitate data processing. In recommendation systems, (Alshammari, 2019) deals with personalized recommendations on Twitter based on the explicit modeling of user profiles. In personalization, (Tahar, 2017) presents an approach to semantics-based information search using a geo-social user profile. User profiles on social networks are also exploited to detect false information (Shu, 2019) and to extract the opinions and interests of users (Chen, 2017).
The lifecycle of a user profile goes through several stages: extraction of the user's information and data, modeling (El Achkar, 2019), construction and finally enrichment.
A user's data changes from one moment to another, which requires these profiles to be updated regularly. Some systems exploit the data of similar users to enrich a given user's profile, which has pushed researchers to develop similarity techniques and measures between user profiles, especially to overcome the well-known cold start problem (Lika, 2014). In the next part, we cite some existing similarity measures in order to introduce our similarity approach between user profiles based on a weighted bipartite graph.

Similarity Measures
There are several similarity measures in the field of information systems, which can be grouped into five main types: semantic similarity (Hliaoutakis, 2006), structural similarity (Buttler, 2004), content similarity (Stentiford, 2003), keyword similarity (Niwattanakul, 2013) and hybrid similarity (Gupta, 2014). Each type is used in a given context according to the needs and intentions of each system: for example, to compare user profiles in order to enrich them, to detect fake profiles, to match user profiles, or, in recommendation systems, to predict user behaviours and intentions.
Comparative studies of these measures show that SimRank and the cosine measure give satisfactory results, especially in the field of collaborative filtering. Another comparative study between SimRank and cosine, conducted by (Champclaux, 2008) in the field of information retrieval, shows that SimRank outperforms cosine. This motivates our approach of applying the SimRank measure to user profiles in order to study the similarity between them using a weighted bipartite graph.

OUR APPROACH
Our work revolves around the development of a structural similarity measure between user profiles, based on the SimRank structural similarity measure (Jeh, 2002). This part is organized as follows: we start by presenting the SimRank similarity measure on an oriented (directed) graph, then the generic bipartite SimRank measure, in order to introduce our approach and the methodology we follow to measure the similarity between two user profiles. (Jeh, 2002) proposed a measure of structural similarity between objects in a domain involving an object-to-object relationship. In this approach, the objects and their relations are modeled by an oriented graph G(V, E), in which the nodes V represent the domain objects studied and the arcs E represent the relations between these objects.

The SimRank Model Based on an Oriented Graph
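The initial assumption is that "objects are similar if they are connected by similar objects". The aim of this approach is to determine the similarities between the nodes of the graph by assigning each pair of nodes a similarity score called SimRank. Let I(v) be the set of predecessors of a node v, |I(v)| the cardinality of this set, and I_i(v) its i-th element. The SimRank score S(a, b) between an object a and an object b is defined by:

\[ S(a, b) = \frac{C}{|I(a)|\,|I(b)|} \sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} S\big(I_i(a), I_j(b)\big) \tag{1} \]

where C is a constant between 0 and 1, S(a, a) = 1, and S(a, b) = 0 when I(a) or I(b) is empty.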
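The recursive definition of formula (1) is usually evaluated by fixed-point iteration, starting from the identity (each node is similar only to itself). The following Python sketch illustrates this under stated assumptions: the function name `simrank`, the decay value 0.8, the iteration count and the toy graph are all hypothetical choices made for illustration, not values taken from this paper.

```python
from itertools import product


def simrank(predecessors, c=0.8, iterations=10):
    """Iterative SimRank on a directed graph.

    predecessors maps every node v to the list of its predecessors I(v).
    c is the decay constant of formula (1); 0.8 is an assumed value.
    """
    nodes = list(predecessors)
    # Start from the identity: a node is only similar to itself.
    scores = {(a, b): float(a == b) for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        updated = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                updated[(a, b)] = 1.0
                continue
            in_a, in_b = predecessors[a], predecessors[b]
            if not in_a or not in_b:
                # A node without predecessors is similar to no other node.
                updated[(a, b)] = 0.0
                continue
            total = sum(scores[(x, y)] for x, y in product(in_a, in_b))
            updated[(a, b)] = c * total / (len(in_a) * len(in_b))
        scores = updated
    return scores


# Hypothetical toy graph: "x" and "y" share the single predecessor "hub".
graph = {"hub": [], "x": ["hub"], "y": ["hub"]}
scores = simrank(graph)
print(scores[("x", "y")])  # c * S(hub, hub) = 0.8
```

In this toy graph, "x" and "y" have no arc between them, yet they obtain a non-zero score because they are pointed to by the same node, which is the intuition behind the initial assumption above.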

The SimRank Model Based on Bipartite Graph
The SimRank measure was extended by (Jeh, 2002) to domains with two types of objects. The appropriate structure to represent such a domain is a bipartite graph, on which two types of similarity scores can be calculated:
_ The similarity score between type 1 objects: two objects of type 1 are considered similar if they point to similar objects of type 2.
_ The similarity score between type 2 objects: two objects of type 2 are considered similar if they are pointed to by similar type 1 objects.
These notions can be formalized by two mutually recursive functions. Writing O(v) for the set of successors of a node v and I(v) for its set of predecessors, the SimRank score S(a, b) between two distinct objects a and b of type 1, and the score S(c, d) between two distinct objects c and d of type 2, are defined by:

\[ S(a, b) = \frac{C_1}{|O(a)|\,|O(b)|} \sum_{i=1}^{|O(a)|} \sum_{j=1}^{|O(b)|} S\big(O_i(a), O_j(b)\big), \qquad S(c, d) = \frac{C_2}{|I(c)|\,|I(d)|} \sum_{i=1}^{|I(c)|} \sum_{j=1}^{|I(d)|} S\big(I_i(c), I_j(d)\big) \tag{2} \]

where C_1 and C_2 are constants between 0 and 1. Experiments with these formulas by (Champclaux, 2009) have shown that this similarity measure is able to order objects according to their relationships and illustrates the phenomenon of similarity propagation, which is the basis of our approach for studying the similarity between user profiles from their interests.

Our Structural Similarity Approach between User Profiles Based On Weighted Bipartite Graphs
As mentioned earlier, the user profile is a set of data made up of various kinds of information: personal, professional, and especially the user's interests, which we use in our study. Applying the SimRank structural measure described above involves three steps. First, the data is represented as a bipartite graph in which the type 1 nodes are user profiles and the type 2 nodes are the interests of these users. Second, the structural relationship between them is defined as belonging: a profile contains interests and, conversely, interests are contained in a profile, so a profile node is connected by an arc to an interest node if the profile contains this interest. Finally, user profiles similar to a given user profile are searched for by applying SimRank. The query, which contains the user profile for which we are searching for similar profiles, is integrated into this graph as an additional profile node.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIV-4/W3-2020, 2020 5th International Conference on Smart City Applications, 7-8 October 2020, Virtual Safranbolu, Turkey (online)
Example: consider a search corpus composed of two profiles, each made up of a set of interests:
_ Profile1: {interest1, interest2, interest4, interest5}
_ Profile2: {interest2, interest3, interest5}
Given a search query R-Profile: {interest1, interest3, interest5}, which represents the user profile for which we are searching for similar profiles, the corpus and the query are represented by a bipartite graph G. Our goal is to sort the profiles based on their similarity to the R-Profile query.
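As an illustration of how the example above can be encoded, the following Python sketch stores the bipartite graph as a mapping from each profile node to its interest nodes, and derives the reverse mapping from each interest node to the profiles that contain it. The variable names `profiles` and `interest_to_profiles` are hypothetical; only the profile and interest labels come from the example.

```python
# Profile nodes (type 1) and the interest nodes (type 2) they point to.
profiles = {
    "Profile1": {"interest1", "interest2", "interest4", "interest5"},
    "Profile2": {"interest2", "interest3", "interest5"},
    "R-Profile": {"interest1", "interest3", "interest5"},  # the query node
}

# Reverse index: for each interest node, the profile nodes pointing to it.
interest_to_profiles = {}
for profile, interests in profiles.items():
    for interest in interests:
        interest_to_profiles.setdefault(interest, set()).add(profile)

for interest in sorted(interest_to_profiles):
    print(interest, sorted(interest_to_profiles[interest]))
```

Both directions are needed by a bipartite similarity computation: the first mapping gives the successors of a profile, the second gives the predecessors of an interest.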

The Formula
In the information retrieval field, the best results are obtained when documents are represented as weighted term lists, which is why we adopt this principle and add the weights of users' interests to this approach. Such a description translates into a weighted bipartite graph in which the arcs between profile nodes and interest nodes are weighted by the weight of each interest in each profile. The SimRank formulas adapted to our approach are presented as follows. Consider a corpus described by C and P, where:
C = (cj), j = 1..m, is the set of the corpus's interests, and m is the total number of these interests.
P = (pi), i = 1..n, is the set of the corpus's profiles, and n is the total number of profiles in this collection.
Each profile is written pi = (wi1, wi2, …, wij, …, wim), where wij is the weight of the interest j in the profile i. These weights are assigned to the profile-interest arcs so that the interests' weights are taken into account.
The similarity between two profiles is defined by formula (3), which involves: a propagation constant M, set to M = 0.9; the set of interests of the profile Pi; the number of interests belonging to the profile Pi; and each individual interest of the profile Pi, where Pi is a profile of the collection.
The similarity between two interests is defined by formula (4), which involves: the set of profiles containing a given interest; the number of profiles containing that interest; and each individual profile containing that interest.
The formulas reflect the fact that the similarity of two profiles strongly depends on the similarity of the interests they contain and, reciprocally, the similarity of two interests depends on the similarity of the profiles to which they belong. This is due to the structural relationship between each profile and its interests.
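To make the mutual recursion between profile similarity and interest similarity concrete, the following Python sketch implements one possible weighted bipartite SimRank. It is an assumption, not the exact formulas (3) and (4): each pair of neighbours contributes its similarity multiplied by the product of the two arc weights, and the sum is normalised by the total of those products, so that with unit weights the computation reduces to the bipartite SimRank with both constants equal to M. The function and variable names, the iteration count and the weighting scheme are hypothetical; the constant M = 0.9 and the profile contents are those given in this paper.

```python
from itertools import product


def weighted_bipartite_simrank(weights, m=0.9, iterations=5):
    """Assumed weighted variant of bipartite SimRank.

    weights[profile][interest] is the weight of the arc profile -> interest.
    Returns (profile_scores, interest_scores), dicts keyed by node pairs.
    """
    # Reverse adjacency: interest -> {profile: arc weight}.
    holders = {}
    for profile_name, row in weights.items():
        for interest, w in row.items():
            holders.setdefault(interest, {})[profile_name] = w

    def identity(nodes):
        return {(a, b): float(a == b) for a, b in product(nodes, repeat=2)}

    def propagate(neighbours, neighbour_scores):
        """Damped, weight-averaged similarity of the neighbours of each pair."""
        scores = {}
        for a, b in product(neighbours, repeat=2):
            if a == b:
                scores[(a, b)] = 1.0
                continue
            total = norm = 0.0
            for (x, wx), (y, wy) in product(neighbours[a].items(),
                                            neighbours[b].items()):
                total += wx * wy * neighbour_scores[(x, y)]
                norm += wx * wy
            scores[(a, b)] = m * total / norm if norm else 0.0
        return scores

    profile_scores = identity(weights)
    interest_scores = identity(holders)
    for _ in range(iterations):
        # Interests are similar if similar profiles contain them ...
        interest_scores = propagate(holders, profile_scores)
        # ... and profiles are similar if they contain similar interests.
        profile_scores = propagate(weights, interest_scores)
    return profile_scores, interest_scores


HEAVY = {"int1", "int2", "int3"}  # weight 2; every other interest weighs 1


def profile(*interests):
    return {i: 2.0 if i in HEAVY else 1.0 for i in interests}


corpus = {
    "R-Profile": profile("int1", "int2", "int3", "int4", "int5"),
    "P1": profile("int1", "int2", "int3", "int6", "int7"),
    "P2": profile("int4", "int5", "int8", "int9"),
    "P3": profile("int6", "int7", "int8"),
    "P4": profile("int10", "int11"),
}

profile_scores, _ = weighted_bipartite_simrank(corpus)
for name in ("P1", "P2", "P3", "P4"):
    print(name, round(profile_scores[("R-Profile", name)], 3))
```

Because the weighting scheme and the number of iterations are assumed, the printed scores need not match those reported in this paper. The qualitative behaviour is the point: P3 shares no interest with R-Profile but still receives a positive score through P1 and P2, whereas P4, which is disconnected from the rest of the graph, stays at 0.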

APPLICATION
In this part, we apply the formulas presented previously to a corpus composed of four profiles (P1, P2, P3 and P4) and a query (R-Profile). The R-Profile query is composed of five interests: int1, int2, int3, int4 and int5. The P1 profile is composed of five interests: int1, int2 and int3, which it shares with the query (R-Profile), plus int6 and int7. The P2 profile is composed of four interests: int4 and int5, which it shares with the query (R-Profile), plus int8 and int9. The P3 profile is composed of three interests: int6 and int7, which it shares with the P1 profile, and int8, which it shares with the P2 profile. Finally, the P4 profile contains the interests int10 and int11. The interests int1, int2 and int3 have a weight of 2; the other interests have a weight of 1. This example is illustrated in the accompanying figures.
We report the similarity scores that each profile obtains with the R-Profile query under our approach, the SimRank similarity and the cosine similarity. According to the results, the profile P1 is the most relevant for the query, ahead of P2, itself ahead of P3. If we apply the cosine measure between the query and P3, we get a similarity score of 0 since they have no term in common, whereas our approach gives a score of 0.27. This is due to the direct resemblance of P3 to P1 and P2, and to the direct resemblance of P1 and P2 to R-Profile. These transitive relations reflect the phenomenon of similarity propagation between the corpus profiles, which gives strength to our approach.
We also notice that SimRank and our approach obtain similar scores, since our approach is an extension of the SimRank similarity to which we add the interest weights of the profiles.
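The cosine result for P3 can be checked directly. The following Python sketch represents each profile as a sparse vector of interest weights and computes the cosine similarity with the query. Using the interest weights as vector components is an assumption about how the cosine baseline was set up, and the helper names `profile` and `cosine` are hypothetical.

```python
import math

HEAVY = {"int1", "int2", "int3"}  # weight 2; every other interest weighs 1


def profile(*interests):
    """Sparse weighted vector: interest -> weight."""
    return {i: 2.0 if i in HEAVY else 1.0 for i in interests}


def cosine(u, v):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


query = profile("int1", "int2", "int3", "int4", "int5")
p1 = profile("int1", "int2", "int3", "int6", "int7")
p3 = profile("int6", "int7", "int8")

print(cosine(query, p3))  # 0.0: the two vectors share no interest
```

Cosine only sees interests that appear in both vectors, so any profile without a direct overlap with the query scores 0, whatever its indirect links through other profiles.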

Analysis and Discussion
We have described the adaptation of a graph-based object comparison method to the comparison of user profiles. This adaptation resulted in the definition of a new similarity function that takes into account the graph structure induced by the relationship between profiles and their interests. Conceptually, a profile is considered as a node of a bipartite graph to which the interest nodes are connected. The similarity between profiles is calculated as the average of the similarities of the interests that compose them. Reciprocally, the similarity between interests is calculated as the average of the similarities of the profiles that contain them. From this recursive definition, we defined two formulas, one for the similarity between profiles and the other for the similarity between interests, which together give a structural similarity measure both between profiles and between interests. We also took into account the weighting of the profiles' interests, which conceptually translates into weighting the graph arcs between the profiles and their interests.
In our approach, a high similarity score between two profiles depends more on the proportion of common interests than on the proportion of non-common interests. In addition, our approach finds profiles similar to the query (R-Profile) whether the link between them is direct or indirect, since the algorithm propagates similarities between profiles over iterations from node to node, from source nodes to their successors.

CONCLUSION AND PERSPECTIVES
We have presented a structural similarity measure between user profiles that is able to retrieve similar profiles even if there is no direct link or common interest between them. Our approach can be used in several domains: for example, to address the well-known cold start problem (Lika, 2014), or to study the evolution of community graphs. In our case, we aim to take advantage of the similarity propagation property of our approach to detect nodes that may become connected in the future in a given network, for example, in friend networks, to predict which users may become friends as well as their future intentions, especially since a user is likely to be influenced by the interests and behaviours of the users in his network.