Learning graph-structured data using Poincaré embeddings and Riemannian K-means algorithms

  • 2019-07-02 21:45:22
  • Hatem Hajri, Hadi Zaatiti, Georges Hebrail
  • 17

Abstract

Recent literature has shown several benefits of hyperbolic embedding ofgraph-structured data (GSD) in representing their structures and latentrelations. While several studies have explored the ability of hyperbolicembedding to represent data (for example, by quantifying their mean averageprecision) and their ability to produce better visualisations of clusters, onlyfew works exploited the effectiveness of hyperbolic embedding to performlearning on the initial GSD. Motivated by innovative ideas from the fields of Brain computer interfacesand Radar processing, this paper presents a new scheme for learning GSD basedon hyperbolic embedding, Riemannian barycentre (i.e. Fr\'echet or geometricmean) and $K$-means algorithms as a significant tool that derives from it. Themain idea is as follows. Relying on the Riemannian barycentre, we define anotion of minimal variance which allows us to choose an embedding betweendifferent ones. This embedding is used thereafter together with $K$-meansalgorithms to perform unsupervised clustering and in combination with thenearest neighbour rule to perform supervised learning. We demonstrate theperformance of the proposed framework through several experiments on real-worldsocial networks and hierarchical GSD. The obtained results outperform theircounterparts in high-dimensional Euclidean spaces and recent proposed geometricapproaches.

 

Quick Read (beta)

Learning graph-structured data using Poincaré embeddings and Riemannian K-means algorithms

Hatem Hajri Institut de recherche technologique SystemX, 8 Avenue de la Vauve, 91120 Palaiseau, France.
11email: [email protected]
   Hadi Zaatiti Institut de recherche technologique SystemX, 8 Avenue de la Vauve, 91120 Palaiseau, France.
11email: [email protected]
   Georges Hebrail Institut de recherche technologique SystemX, 8 Avenue de la Vauve, 91120 Palaiseau, France.
11email: [email protected]
Abstract

Recent literature has shown several benefits of hyperbolic embedding of graph-structured data (GSD) in representing their structures and latent relations. While several studies have explored the ability of hyperbolic embedding to represent data (for example, by quantifying their mean average precision) and their ability to produce better visualisations of clusters, only few works exploited the effectiveness of hyperbolic embedding to perform learning on the initial GSD.

Motivated by innovative ideas from the fields of Brain computer interfaces and Radar processing, this paper presents a new scheme for learning GSD based on hyperbolic embedding, Riemannian barycentre (i.e. Fréchet or geometric mean) and K-means algorithms as a significant tool that derives from it. The main idea is as follows. Relying on the Riemannian barycentre, we define a notion of minimal variance which allows us to choose an embedding between different ones. This embedding is used thereafter together with K-means algorithms to perform unsupervised clustering and in combination with the nearest neighbour rule to perform supervised learning. We demonstrate the performance of the proposed framework through several experiments on real-world social networks and hierarchical GSD. The obtained results outperform their counterparts in high-dimensional Euclidean spaces and recent proposed geometric approaches.

Keywords: Poincaré embeddings, Riemannian barycentre, Hyperbolic K-means clustering.

1 Introduction

In recent years, the idea of embedding data in new spaces has proven effective in many applications. Indeed, several techniques have become very popular because of their great ability to represent data while reducing the complexity and dimensionality of the space. For instance, Word2vec [25] and Glove [28] are widely used tools in natural language processing, Nod2vec [19], Graph2vec [26] and DeepWalk [29] are commonly used for community detection, link prediction and node classification in social networks [15].
A major achievement in recent years has been the discovery of hyperbolic embeddings [27, 13, 31]. Although it has been speculated since several years that hyperbolic spaces would better represent GSD than Euclidean spaces [18, 22, 11, 2], it is only recently that these speculations have been proven effective through concrete studies and applications [27, 13, 31]. As outlined by [27], Euclidean embeddings require large dimensions to capture certain complex relations such as the Wordnet noun hierarchy [16]. On the other hand, this complexity can be captured by a simple model of hyperbolic geometry such as the Poincaré disk of two dimensions [31]. Hyperbolic embeddings also provide better visualisation of clusters on graphs than Euclidean embeddings [13].

The present paper is concerned with learning GSD represented by an adjacency matrix. Examples of these GSD include social networks, hierarchical lexical databases such as Wordnet and Lexical entailments datasets such as Hyperlex [27, 13, 31]. In the state of the art, one can distinguish two different approaches for clustering this type of data. The first one applies pure clustering techniques on graphs such as spectral clustering algorithms [32], power iteration clustering [24] and label propagation [38]. The second one is two-step and may be called Euclidean clustering after (Euclidean) embedding. First it embedds data in Euclidean spaces using techniques such as Nod2vec, Graph2vec and DeepWalk and then applies traditional clustering techniques such as K-means algorithms. This approach appeared notably in [37, 33, 35]. Our main objective throughout the paper is to present the hyperbolic counterpart of this approach.

The motivation for this paper comes from recent works in Brain computer interfaces [10] and Radar processing [8]. In particular [10] used the distance to the Riemannian barycentre on the space of covariance matrices to classify brain computer signals. [8] used the Riemannian median on the Poincaré disk to detect outliers in Radar data based on a similar idea. In this paper, we apply for the first time (to the best of our knowledge) these techniques in link with hyperbolic embeddings of GSD. Our main contributions can be summarised as follows:

Unsupervised learning on GSD: We consider the task of clustering nodes on a GSD with a known number of classes denoted K. For this, we first generate several hyperbolic embeddings on the Poincaré disk 𝔻={z:|z|<1} following [27]. For each embedding, we run a Riemannian K-means algorithm. Finally we keep the embedding with minimal total variance, a notion which we introduce. This procedure is evaluated on 5 real-data social networks and compared with its analog on the Euclidean space with 10 dimensions and with the recent clustering method proposed in [1]. Experiments show that our method outperforms these two approaches.

As another application, we considered the task of clustering a typical example of hierarchical GSD which is a subtree of Wordnet. We focused on the representation of clusters and the interpretation of their contents.

Supervised learning on GSD: We adapted the previous idea to the context of supervised learning by keeping the embedding which has minimal total variance on the training dataset. Non labelled nodes are then classified according to the nearest barycentre rule. This approach is compared with the recent hyperbolic supervised approach [14] based on support vector machines. Experiments show the advantage of our method over that of [14].

The rest of the paper is organised as follows. Section 2 reviews necessary optimisation tools on the Poincaré disk as a Riemannian manifold of negative curvature. In particular, we discuss existence, uniqueness and numerical computations of the Riemannian barycentre and then deduce a Riemannian K-means clustering algorithm on this space. Using barycentres to unroll K-means algorithms have had several applications including object detection and tracking, shape classification, and image segmentation [34, 23, 20, 17, 9]. In these works the reference space is the manifold of covariance matrices. Here, we adapt this idea to the Poincaré space. Section 3 reviews the Poincaré embedding as introduced in [27] and presents our approach to learn GSD. Finally Section 4 provides experimental results and a comparison with the state of the art11 1 The package generating these results will be made public in the near future..

2 Statistical learning on the Poincaré disk

This section reviews the Riemannian geometry of 𝔻, discusses the existence and uniqueness of the Riemannian barycentre, focuses on its numerical computations and as a consequence derives a Riemannian K-means algorithm on this space.

The Poincaré disk is commonly equipped with the Riemannian metric known as the Poincaré metric and expressed as

gp(X,Y)=X,Y(1-|p|2)2 (1)

where p𝔻,X,Y2 and X,Y is the scalar product on 2. The Riemannian metric (1) induces a Riemannian distance between z0 and z1 given as follows:

d(z0,z1)=12log((1+|z1-z01-z0¯z1|)(1-|z1-z01-z0¯z1|)-1)

This distance can also be expressed as 12arcosh(1+|z1-z0|2(1-|z0|2)(1-|z1|2)) which is half the distance used in [27].

The Poincaré metric (1) turns 𝔻 into a Riemannian manifold of negative sectional curvature [21, 36]. As a result, 𝔻 enjoys the property of existence and uniqueness of the Riemannian barycentre [3]. More precisely, for every set of points {zi,1in} in 𝔻, the empirical Riemmanian barycentre z^n

z^n=argminz𝔻(i=1nd2(z,zi)) (2)

exists, is unique and belongs to 𝔻. Several stochastic gradient algorithms can be applied to numerically approximate z^n [12, 5, 7, 6]. In this paper, we will use the algorithm of [8] which has proven effective for Radar applications. For this, we first recall the definitions of the Riemannian exponential and logarithmic maps.

For given z𝔻 and v2 the exponential map is

Expz(v)=(z+eiθ)e2||v||+(z-eiθ)(1+z¯eiθ)e2||v||+(1-z¯eiθ)

with

θ=arg(v),||v||=|v|(1-|z|2)

The exponential map Expz is a diffeomorphism from 2 to 𝔻. Its inverse, called the logarithmic map and denoted by logz is given by

Logz(y)=(1-|z|2)atanh(|y-z1-z¯y|)eiθ

with θ=arg(y-z1-z¯y) and atanh is the inverse of the hyperbolic tangent. For more details on the previous formulas, we refer to [4, 36]. For numerical computation of the barycentre, we will use Algorithm 2 below from [8].

{algorithm}

[!ht] SGD on 𝔻 for barycentre computation InputsZ={zi,1in}: a subset of 𝔻 (complex), zinit: barycentre initialisation (complex), τb: step size (strictly positive float)
Outputz^: numerical approximation of the Riemannian barycentre (complex)

\[email protected]@algorithmic

[1] \Statez^zinit \Commentrandom or a trickier initialisation (e.g. , average mean)

\State

d=1e6 \Commenta significantly large number \Repeat\State                           μ2ni=1nLogz^(zi), z^Expz^(τbμ),  d|μ|2(1-|z^|2)2 \Untild is small
\Returnz^

A direct consequence of existence and uniqueness of the Riemannian barycentre is K-means algorithm illustrated in Algorithm 2. In the description we use the word centroid to denote Riemannian barycentre.

{algorithm}

[!ht] K-means clustering on 𝔻 Inputs   K: number of clusters (integer), Z: set of n complex numbers that are a subset of 𝔻 (complex),τb: barycentre approximation step size (strictly positive float)
OutputN: set of K centroids (complex), labels: n labels of the input data (table of integers) \[email protected]@algorithmic[1] \StateInitialize K centroids, N={ν1,ν2,,νK} randomly in 𝔻 \Repeat\ForziZ \State                           ciargminj{1,,K}d(zi,νj) \Commentd is the Riemannian distance \EndFor\Forj{1,,K} \State                          νj𝚁𝚒𝚎𝚖𝚖𝚊𝚗𝚒𝚊𝚗_𝙱𝚊𝚛𝚢𝚌𝚎𝚗𝚝𝚎𝚛({zi|ci=j},𝚁𝚊𝚗𝚍𝚘𝚖,τb)) \EndFor\Untilconvergence \ReturnN, labels={ci}

3 Learning GSD using Poincaré embeddings and Riemannian optimisation

This section starts by reviewing the approach of [27] to embed GSD in the Poincaré disk. Based on this embedding and K-means algorithm introduced in the previous section, we present our algorithm to perform supervised and unsupervised clustering tasks.
Given a GSD (𝒱,), where 𝒱 and and are the nodes and edges datasets, the probability of an edge e={u,v} is modelled in [27] using the Fermi-Dirac distribution

P(u|v,Θ)=1ed(u,v)+1

To embed 𝒱 in 𝔻, [27] learns a map u𝒱θu𝔻 by minimising

u𝒱v𝒩(u)log(P(u|v,Θ)),Θ={θu}u𝒱 (3)

where 𝒩(u) is the set of neighbours of u. Following [25], (3) is optimised by selecting small number of negative samples according to a priori distribution P𝒩. Taking this into account, the objective function (3) can be written as

(Θ)=log(σ(-d(u,v)))+i=1M𝔼P𝒩[log(σ(d(ui,v)))] (4)

with σ(x)=(1+e-x)-1 the softmax function 22 2 Since the Riemannian gradient of dp is given by θdp(θ,θ)=-p*dp-1(θ,θ)*logθ(θ)d(θ,θ) (see [8]), we actually used d2 instead of d to avoid division by d(θ,θ) and get better stability. We also notice that [27] proposes to maximise (4) for social networks GSD and to maximise (Θ)=log(e-d(u,v)v𝒩(u)e-d(u,v)) for hierarchical GSD. We found that with the latter loss function, the embedded nodes quickly approach the boundary of the disk and that using (4) also for hierarchical GSD gives us more stable results. . In practice, (Θ) is optimised by generating nodes on the graph using DeepWalk [29] and then sampling from these nodes using the unigram distribution raised to 3/4 [25].

Given a cluster 𝒞={zi,1in} in 𝔻, with barycentre z^, we define its variance as

Var(𝒞)=1ni=1nd2(z^,zi)

This definition is in accordance with the use of d2 mentioned in the footnote. The variance will be used to choose one embedding between several ones. In fact, it is reasonable to give more confidence on the embedding for which the variance is minimal, that is points are more concentrated around barycentres. This idea will be justified empirically in the next section.

Finally, based on the Riemannian barycentre, Algorithm 3 below presents our scheme to perform unsupervised clustering. The main idea is to embed GSD in the Poincaré disk, form clusters using the ground truth data and lastly associate points to clusters according to the nearest barycentre rule.

In what follows, the Poincare_Embedding function, given an adjacency matrix of an input GSD, minimises (4) and outputs the embedding of every node on 𝔻.

{algorithm}

[!ht] Unsupervised clustering algorithm. Inputs𝒜: adjacency matrix of a GSD with 0,1 entries, Pembedding: an object containing the embedding parameters, K: number of classes, PK_Means: an object containing the K-means algorithm parameters, NE: number of experiments
OutputsBest_Embedding: embedding of input graph with minimal total variance, N: barycentres of each cluster, labels: cluster labels for each node

\[email protected]@algorithmic

[1] \Repeat\CommentExperimentally each execution is run in parallel \State                          Embeddingi𝙿𝚘𝚒𝚗𝚌𝚊𝚛𝚎_𝙴𝚖𝚋𝚎𝚍𝚍𝚒𝚗𝚐(𝒜,Pembedding) \State                          Ni,labelsi𝙺_𝙼𝚎𝚊𝚗𝚜(K,Embeddingi,PK_Means) \State                           Clusters{c1,,cK|cj={zqEmbeddingi|labelsi[q]=j}} \State                          vari=maxj{1,,K}(Var(cj))33 3 Experimentally we chose the second largest variance, because the cluster with maximum variance is often dispersed and does not reflect the quality of the embedding. This may be relaxed by taking higher dimensional hyperbolic balls. \UntilNE embeddings have been computed \State

Best_EmbeddingEmbeddingbest|best=argmini{1,,NE}(vari)
\Return

Best_Embedding,Nbest,labelsbest

Algorithm 3 below presents our scheme to perform supervised clustering. Details regarding the implementation aspects together with experimental results of Algorithms 3 and 3 will be provided and discussed in the next section.

{algorithm}

Supervised clustering algorithm Inputs𝒜: adjacency matrix of a GSD with 0,1 entries, Pembedding: an object containing the embedding parameters (object), Ground_Truth: The ground truth of each node in the GSD (table of integers), τb: barycentre approximation step size (strictly positive float)
OutputClass: computed cluster of each node used for training (integer) \[email protected]@algorithmic[1] \State                          Embedding𝙿𝚘𝚒𝚗𝚌𝚊𝚛𝚎_𝙴𝚖𝚋𝚎𝚍𝚍𝚒𝚗𝚐(𝒜,Pembedding) \State                           Train_Data,Test_Data𝚜𝚙𝚕𝚒𝚝(Embedding) \State\CommentEmbedded nodes are divided into two parts \State                       {c1,,cK} Compute_Clusters(Train_Data,Ground_Truth) \Forq{1,,K} \State                          νq𝚁𝚒𝚎𝚖𝚊𝚗𝚗𝚒𝚊𝚗_𝙱𝚊𝚛𝚢𝚌𝚎𝚗𝚝𝚛𝚎(cq,𝚁𝚊𝚗𝚍𝚘𝚖,τb) \EndFor\ForuTest_Data \State                          Class(u)argminp{1,,K}(d(νp,u)) \Commentd is the Riemannian distance \EndFor\State\ReturnClass \State\CommentPerformances are then obtained by comparing the ground truth with the computed centroids of the Test_Data.

4 Experimental results

The algorithms from the previous sections are implemented as a package that does the following: given the adjacency matrix of a binary graph as input, it performs embedding over the Poincaré disk, applies Riemannian K-means clustering and provides visualisation of the computed clusters.

The package is implemented in Python and makes use of multiprocessing to run a number of experiments in parallel. All computations are performed on a machine using four cores equipped with an Intel Core i5 running at a 2.71 GHz frequency. The threshold τb is set to 0.005 for all experiments. The datasets used in the paper are given in Table 1 with their number of nodes (Nodes) and their number of edges (Edges) [30].

Dataset Nodes Edges K
Karate 34 77 2
Polblogs 1224 16781 2
Polbooks 105 441 3
Football 115 613 12
Adjnoun 112 425 2
Mammals subtree 1179 6541 NA
Table 1: Datasets used in the paper and their charachteristics.

4.1 Unsupervised clustering

Social Networks. In this part, we are interested in applying the previous algorithms to the datasets presented above.

Comparison criterion.

For each dataset 10 experiments are performed. Each experiment is conducted in two steps. In the first step, we generate a Poincaré and an Euclidean embedding which uses DeepWalk [29]. An intermediate step of the latter algorithm is the generation of random walks that captures the structure of the graph. The same random walks are used for both embeddings. In the second step, we apply Riemannian (resp. Euclidean) K-means algorithm over the embedding. For the Euclidean embedding, we set the space dimension to 10 and use the Euclidean distance and barycentre. For each dataset, we choose the embedding having the second greatest variance as given in Algorithm 3 as the best one (in the Riemannian and Euclidean case). Finally, for each dataset, we computed the mean average performance of the 10 embeddings (Riemannian and Euclidean). The results are presented in Table 2 with the following abbreviations:

PBPE: Performance of the best Poincaré embedding

PBEE: Performance of the best Euclidean embedding

APPE: Average performance of the 10 Poincaré embeddings

APEE: Average performance of the 10 Euclidean embeddings

Dataset PBPE PBEE APPE APEE [1]
Karate 91.2% 70.6% 91.4% 65.8% -
Polblogs 92.8% 51.9% 92.5% 53.5% -
Polbooks 84.8% 77.1% 80% 62% 75%
Football 87% 67.8% 69.4% 56.8% 77%
Adjnoun 51.8% 51.8 % 52.5% 51.6% 51%
Table 2: Comparative performances table of K-means Poincaré clustering for different examples compared to Euclidean K-means clustering. The best results are highlighted in bold text.

In addition to Table 2, Figure 1 provides visualisations of the computed clusters by the best Poincaré embedding for each dataset. Each cluster is represented with a different color and its barycentre by a square symbol.

Performance increase with the number of experiments.

In order to justify our choice of the best embedding as having the minimal total variance, we plot for the Football dataset the evolution of the PBPE with respect to the number of experiments NE. The obtained plot (Figure 2), shows indeed that the PBPE increases as NE grows from 1 to 10.

(a) Karate.
(b) Polblogs.
(c) Polbooks.
(d) Football.
(e) Adjnoun.
Figure 1: Visualisation of the computed clusters for the experiments having the best embedding (according to the defined variance criterion), the barycentres are represented by square shapes.

Hierarchical GSD. In this part, we are interested in applications over hierarchical GSD. We consider an example from Wordnet which is the mammals subtree. Figure 3 illustrates the obtained clusters with K=6. The barycentres are represented with squares. Figure 4 then shows explicitly some of the nodes labels, chosen randomly. A focus is given for nodes near barycentres on one hand and at boundaries between distinct clusters on the other hand.

Figure 2: PBPE performance increase with the number of experiments NE for the Football dataset.
Figure 3: Partitioning the mammals subtree into six clusters by K-means algorithms.
Figure 4: Close up over the mammal subtree labels: at the boundaries of distinct clusters and inside the cluster in a small neighborhood of the barycentre (the later is represented by a square)

Notice that the obtained clusters discern between different types of mammals. For example the blue cluster contains mostly canine mammals while the orange one contains mostly larger mammals (lion, tigress and so on…).

4.2 Supervised learning

In this section, we exploit Riemannian barycentres to perform supervised clustering. Each dataset from the previous list is divided into five parts with (almost) equal sizes. The ground truth of 4/5 the data will be used to define the clusters and the remaining 1/5 will serve for testing in a cross-validation fashion. Following Algorithm 3, each element from the test dataset is assigned to one cluster according to the nearest neighbor barycentre rule. This experiment is performed 10 times. Finally the performance of this method is evaluated using ground truth of the test data. Table 3 presents the obtained results with the following abbreviation:

CVMAP: Average performance obtained using cross-validation over the number of performed experiments.

Dataset CVMAP [14]
Karate 93.9% 86%
Polblogs 92.4% 93%
Polbooks 83.3% 73%
Football 77.9 % 24%
Adjnoun 57.8 % -
Table 3: Cross validation performances of the supervised clustering algorithm.

4.3 Discussion and comparison with the state of the art

First we point out that using the notion of minimal variance it is always possible to generate more embeddings than what we did previously (10 embeddings) and it is also possible to increase the dimension of the Poincaré ball as in [27]. This reasoning takes advantage of the nice properties of hyperbolic manifolds and is completely unsupervised in the sense that it does not require any ground truth. Thus, improvements in the results given above remain possible in both supervised and unsupervised settings.

Unsupervised clustering. In this paragraph, we comment on results of Table 2. First the advantages of Poincaré clustering over Euclidean clustering are straightforward and confirm that hyperbolic spaces represent GSD more suitably than Euclidean spaces. Regarding Poincaré clustering, notice that the best embedding as defined before is not necessarily the one with the highest clustering performance since in two situations PBPE<APPE. However in both cases, the gaps are slight and do not exceed 0.7%. The advantage of PBPE is more visible for the Football dataset where it largely exceeds APPE (with more than 17%) and for the Polbooks dataset where it exceeds APPE with 4.8%. Our results outperform that of [1] whose authors proposed an embedding on generalised surfaces and considered the datasets Polbooks, Football, Adjnoun for testing. They obtained approximate successes rates of 75,77,51% respectively. The improvements for these datasets using our proposed scheme are significant: 9.8,𝟏𝟎 and 0.8%.

Supervised clustering. In this paragraph, we compare our results with that of [14] in which the authors used a generalisation of the SVM method to the Poincaré disk. [14] considered the datasets Karate, Polbooks, Football, Polblogs and reported mean approximate successes of 86,73,24,93% respectively over 5 cross-validation trials over 5 different embeddings using the embedding of [13]. We obtained significant improvements for the datasets Karate, Polbooks, Football of 7.9, 10.3, and 53.9% with a slight gap of 0.6% for the Polblogs dataset.

4.4 Multidimensional embedding and clustering

In a multidimensional setting, we considered K-means clustering after Poincaré embeddings in the hyperbolic space given by the product 𝒫=𝔻××𝔻 of n Poincaré disks. This space is equipped with the product metric. The current publicly available Python implementation provides this multidimensional setting. However, experimentally, we did not observe a significant increase in performances for the values of n ranging till 10 for the datasets used in this paper. This may be explained by the fact that these datasets are not very large. In future work, we aim to experiment with larger datasets while increasing the dimension of the hyperbolic space in order to better study the scalability of our approach.

5 Conclusion

Several recent studies [27, 13, 14, 31] have concluded that hyperbolic spaces, even in small dimensions, are more suitable embedding spaces than Euclidean spaces for representing a GSD. In this paper, we used Poincaré embeddings [25] to propose a new method for clustering GSD data. This method is based on Riemannian K-means algorithms and a notion of minimal variance that allowed us to choose one embedding among several ones.

The proposed method has been tested on several datasets and has shown improvements in the state-of-the-art for both supervised and unsupervised clustering tasks. In particular, these performances were achieved at minimal cost:

  • We used the Poincaré disk of only 2 dimensions.

  • We got visualisation with high level representation of clusters on graphs.

Our results on clustering with the DeepWalk Euclidean embedding, suggest that getting good performances with this approach or other varieties of it such as Graph2vec [26], Nod2vec [19] would need very high dimensional Euclidean representation spaces.

Acknowledgement. The first author is grateful to Jeanine Harb for drawing his attention to the paper [27] during the machine learning seminar at IRT SystemX.

References

  • [1] Aalto, M., Verma, N.: Metric learning on manifolds. CoRR https://arxiv.org/abs/1902.01738 (2018)
  • [2] Adcock, A.B., Sullivan, B.D., Mahoney, M.W.: Tree-like structure in large social and information networks. In: 2013 IEEE 13th International Conference on Data Mining. pp. 1–10 (Dec 2013)
  • [3] Afsari, B.: Riemannian Lp center of mass: existence, uniqueness and convexity. Proc. Amer. Math. Soc. 139(2), 655–6673
  • [4] Anderson, J.W.: Hyperbolic geometry; 2nd ed. Springer undergraduate mathematics series, Springer, Berlin (2005), https://cds.cern.ch/record/1164418
  • [5] Arnaudon, M., Dombry, C., Phan, A., Yang, L.: Stochastic algorithms for computing means of probability measures. Stoch. Proc. Appl. 58(9), 1473–1455 (2012)
  • [6] Arnaudon, M., Miclo, L.: Means in complete manifolds : uniqueness and approximation. ESAIM Probability and statistics 18, 185–206 (2014)
  • [7] Arnaudon, M., Yang, L., Barbaresco, F.: Stochastic algorithms for computing p-means of probability measures, geometry of Radar Toeplitz covariance matrices and applications to HR Doppler processing. In: International Radar Symposium (IRS). pp. 651–656 (2011)
  • [8] Arnaudon, M., Barbaresco, F., Yang, L.: Riemannian medians and means with applications to radar signal processing. J. Sel. Topics Signal Processing 7(4), 595–604 (2013)
  • [9] Banerjee, A., Merugu, S., Dhillon, I.S., Ghosh, J.: Clustering with bregman divergences. J. Mach. Learn. Res. 6, 1705–1749 (Dec 2005), http://dl.acm.org/citation.cfm?id=1046920.1194902
  • [10] Barachant, A., Bonnet, S., Congedo, M., Jutten, C.: Multiclass brain-computer interface classification by riemannian geometry. IEEE Trans. Biomed. Engineering 59(4), 920–928 (2012)
  • [11] Boguñá, M., Papadopoulos, F., Krioukov, D.: Sustaining the Internet with Hyperbolic Mapping. Nature Communications 1(62) (Oct 2010)
  • [12] Bonnabel, S.: Stochastic gradient descent on Riemannian manifolds. IEEE Trans. Autom. Control. 122(4), 2217–2229 (2013)
  • [13] Chamberlain, B.P., Clough, J., Deisenroth, M.P.: Neural embeddings of graphs in hyperbolic space. 13th international workshop on mining and learning with graphs. Available at https://arxiv.org/abs/1705.10359 (2017)
  • [14] Cho, H., Demeo, B., Peng, J., Berger, B.: Large-margin classification in hyperbolic space. CoRR abs/1806.00437 (2018)
  • [15] Cui, P., Wang, X., Pei, J., Zhu, W.: A survey on network embedding. CoRR abs/1711.08752 (2017)
  • [16] Fellbaum, C. (ed.): WordNet: an electronic lexical database. MIT Press (1998)
  • [17] Goh, A., Vidal, R.: Clustering and dimensionality reduction on riemannian manifolds. In: CVPR. IEEE Computer Society (2008)
  • [18] Gromov, M.: Hyperbolic Groups, pp. 75–263. Springer New York, New York, NY (1987)
  • [19] Grover, A., Leskovec, J.: node2vec: Scalable feature learning for networks. KDD : proceedings. International Conference on Knowledge Discovery & Data Mining 2016, 855–864 (2016)
  • [20] Gu, X., Deng, J.D., Purvis, M.K.: Improving superpixel-based image segmentation by incorporating color covariance matrix manifolds. In: ICIP. pp. 4403–4406. IEEE (2014)
  • [21] Helgason, S.: Differential geometry, Lie groups, and symmetric spaces. American Mathematical Society (2001)
  • [22] Krioukov, D., Papadopoulos, F., Kitsak, M., Vahdat, A., Boguñá, M.: Hyperbolic geometry of complex networks. Phys. Rev. E 82, 036106 (Sep 2010)
  • [23] Kurtek, S., Klassen, E., Ding, Z., Jacobson, S., Jacobson, J.B., Avison, M., Srivastava, A.: Parameterization-invariant shape comparisons of anatomical surfaces. IEEE Trans. Med. Imaging 30(3), 849–858 (2011)
  • [24] Lin, F., Cohen, W.W.: Power iteration clustering. In: ICML (2010)
  • [25] Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26, pp. 3111–3119. Curran Associates, Inc. (2013), http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
  • [26] Narayanan, A., Chandramohan, M., Venkatesan, R., Chen, L., Liu, Y., Jaiswal, S.: graph2vec: Learning distributed representations of graphs. CoRR abs/1707.05005 (2017)
  • [27] Nickel, M., Kiela, D.: Poincaré embeddings for learning hierarchical representations. In: Advances in Neural Information Processing Systems 30, pp. 6338–6347. Curran Associates, Inc. (2017)
  • [28] Pennington, J., Socher, R., Manning, C.D.: Glove: Global vectors for word representation. In: In EMNLP (2014)
  • [29] Perozzi, B., Al-Rfou, R., Skiena, S.: Deepwalk: Online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 701–710. KDD ’14, ACM, New York, NY, USA (2014). https://doi.org/10.1145/2623330.2623732, http://doi.acm.org/10.1145/2623330.2623732
  • [30] Rossi, R.A., Ahmed, N.K.: The network data repository with interactive graph analytics and visualization. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (2015), http://networkrepository.com
  • [31] Sala, F., Sa, C.D., Gu, A., Ré, C.: Representation tradeoffs for hyperbolic embeddings. In: Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. pp. 4457–4466 (2018), http://proceedings.mlr.press/v80/sala18a.html
  • [32] Spielmat, D.A.: Spectral partitioning works: Planar graphs and finite element meshes. In: Proceedings of the 37th Annual Symposium on Foundations of Computer Science. pp. 96–. FOCS ’96, IEEE Computer Society, Washington, DC, USA (1996), http://dl.acm.org/citation.cfm?id=874062.875505
  • [33] Tu, C., Wang, H., Zeng, X., Liu, Z., Sun, M.: Community-enhanced network representation learning for network analysis. CoRR abs/1611.06645 (2016), http://arxiv.org/abs/1611.06645
  • [34] Tuzel, O., Porikli, F., Meer, P.: Pedestrian detection via classification on riemannian manifolds. IEEE Trans. Pattern Anal. Mach. Intell. 30(10), 1713–1727 (2008), http://dblp.uni-trier.de/db/journals/pami/pami30.html#TuzelPM08
  • [35] Wang, D., Cui, P., Zhu, W.: Structural deep network embedding. In: Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 1225–1234. KDD ’16, ACM, New York, NY, USA (2016). https://doi.org/10.1145/2939672.2939753, http://doi.acm.org/10.1145/2939672.2939753
  • [36] Yang, L.: Médianes de mesures de probabilité dans les variétés riemanniennes et applications à la détection de cibles radar. PhD thesis (2011), http://www.theses.fr/2011POIT2311
  • [37] Zheng, V.W., Cavallari, S., Cai, H., Chang, K.C., Cambria, E.: From node embedding to community embedding. CoRR abs/1610.09950 (2016), http://arxiv.org/abs/1610.09950
  • [38] Zhu, X., Ghahramani, Z.: Learning from labeled and unlabeled data with label propagation. Tech. rep. (2002)