Learning Graphs from Noisy Epidemic Cascades

  • 2019-08-12 16:59:13
  • Jessica Hoffmann, Constantine Caramanis
  • 0

Abstract

We consider the problem of learning the weighted edges of a graph byobserving the noisy times of infection for multiple epidemic cascades on thisgraph. Past work has considered this problem when the cascade information,i.e., infection times, are known exactly. Though the noisy setting is wellmotivated by many epidemic processes (e.g., most human epidemics), to the bestof our knowledge, very little is known about when it is solvable. Previous workon the no-noise setting critically uses the ordering information. If noise canreverse this -- a node's reported (noisy) infection time comes after thereported infection time of some node it infected -- then we are unable to seehow previous results can be extended. We therefore tackle two versions of the noisy setting: the limited-noisesetting, where we know noisy times of infections, and the extreme-noisesetting, in which we only know whether or not a node was infected. We provide apolynomial time algorithm for recovering the structure of bidirectional treesin the extreme-noise setting, and show our algorithm almost matches lowerbounds established in the no-noise setting, and hence is optimal up tolog-factors. We extend our results for general degree-bounded graphs, whereagain we show that our (poly-time) algorithm can recover the structure of thegraph with optimal sample complexity. We also provide the first efficientalgorithm to learn the weights of the bidirectional tree in the limited-noisesetting. Finally, we give a polynomial time algorithm for learning the weightsof general bounded-degree graphs in the limited-noise setting. This algorithmextends to general graphs (at the price of exponential running time), provingthe problem is solvable in the general case. All our algorithms work for anynoise distribution, without any restriction on the variance.

 

Quick Read (beta)

Learning Graphs from Noisy Epidemic Cascades

Jessica Hoffmann, Constantine Caramanis
August 24, 2019
Abstract

We consider the problem of learning the weighted edges of a graph by observing the noisy times of infection for multiple epidemic cascades on this graph. Past work has considered this problem when the cascade information, i.e., infection times, are known exactly. Though the noisy setting is well motivated by many epidemic processes (e.g., most human epidemics), to the best of our knowledge, very little is known about when it is solvable. Previous work on the no-noise setting critically uses the ordering information. If noise can reverse this – a node’s reported (noisy) infection time comes after the reported infection time of some node it infected – then we are unable to see how previous results can be extended. We therefore tackle two versions of the noisy setting: the limited-noise setting, where we know noisy times of infections, and the extreme-noise setting, in which we only know whether or not a node was infected. We provide a polynomial time algorithm for recovering the structure of bidirectional trees in the extreme-noise setting, and show our algorithm almost matches lower bounds established in the no-noise setting, and hence is optimal up to log-factors. We extend our results for general degree-bounded graphs, where again we show that our (poly-time) algorithm can recover the structure of the graph with optimal sample complexity. We also provide the first efficient algorithm to learn the weights of the bidirectional tree in the limited-noise setting. Finally, we give a polynomial time algorithm for learning the weights of general bounded-degree graphs in the limited-noise setting. This algorithm extends to general graphs (at the price of exponential running time), proving the problem is solvable in the general case. All our algorithms work for any noise distribution, without any restriction on the variance.

1 Introduction

Epidemic models accurately represent (among other processes) the spread of diseases, information (rumors, viral videos, news stories, etc.), the spread of malevolent agents in a network (computer viruses, malicious apps, etc.), or even biological processes (pathways in cell signaling networks, chains of activation in the gene regulatory network, etc.).

We focus on epidemics that spread on an underlying graph [30], as opposed to the fully mixed models introduced in the early literature [4].

Most settings assume we know the underlying graph and aim to study properties of the spread. Much work has been done in detection [2, 3, 27, 26, 25, 24, 21], where the goal is to decide whether or not there is indeed an infection. This problem is of importance in deciding whether or not a computer network is under attack, for instance, or whether a product gets sold through word-of-mouth or thanks to the advertisement campaign (or both [28]). More specifically, the problem of source detection [32, 33, 31, 34, 35] or obfuscation [11, 13, 12] has been extensively studied. On the other side of the spectrum, both experimental and theoretical work has tackled the problem of modeling [7, 36, 17], predicting the growth [6, 38], and controlling the spread of epidemics [9, 10, 18, 14].

In this work, we take the opposite approach: assuming we know some properties of the spread, can we recover the underlying graph? The early works on this subject proposed a few heuristics and experimentally proved their effectiveness [16, 19]. Netrapalli et al. [29] established the first theoretical guarantees for this problem for discrete-time infections. They proved one can recover the edges of any graph with correlation decay, with access to the times of infection for multiple cascades spreading on the graph. They introduced a likelihood, proved it decouples into convex subproblems, and demonstrated that the edges of the graph can therefore be obtained efficiently. They also proved a sample complexity lower bound and showed their method is within a log factor of it. Abrahao et al. [1] also introduced a method of solving this problem, this time for a more realistic, continuous-time infection model, through learning only the first edge of each cascade. Zarezade et al. [37] proposed a first experimental attempt to tackle the case of correlated cascades using Hawkes processes. Khim et al. [22] extended the theoretical results to the case where the the cascades spreading on the graph are not independent, which required completely new machinery involving martingales and weighted Pólya urns.

All the results above assume we have perfect knowledge of the properties of the spread we use to reconstruct the graph. For most of the literature, those are the times of infection for all nodes for each cascade. This assumption may hold for online epidemics, as information is usually dated (for instance, posts or retweets on social networks have time stamps). For human networks, however, this assumption is often unrealistic: official diagnosis (and hence recording by any tracking entity such as the CDC) may come days, weeks, or in important examples such as HIV, years after the actual moment of infection. Moreover, this can be highly variable from person to person, hence the infector is often diagnosed after the infectee. Similar issues arise with biological networks: we only know the expression of a gene when a measure is taken, which can happen after a typically arbitrary delay.

We therefore develop a method for learning the graph of epidemics with noisy times of infection. We demonstrate that past approaches are unusable, due to the fact that even small levels of noise are typically enough to cause order-of-diagnosis to differ from order-of-infection. We develop new techniques to deal with this setting, and prove our algorithm can reliably learn the edges of a tree in the limited-noise setting, for any noise distribution. We also show we can learn the structure of any bounded degree graph from a very weak observation model, in an sample-optimal fashion (up to log-factors). We finally provide an algorithm which learns the weights of any bounded-degree graph in the limited-noise setting.

Table 1: Notations
G=(V,E) Graph G, V set of nodes, E set of edges.
N Number of nodes in the graph.
Tim Random variable for the actual time of infection of node i during cascade m.
nim Noise of node i during cascade m.
Tim=Tim+nim Random variable for the noisy time of infection of node i during cascade m.
Iim Random boolean variable. (Iim=True node i was infected during cascade m).
pij Weight of edge (i,j), corresponding to the probability that i infects j.

1.1 Model

\includegraphics

[width = ]pen-1.png

(a) At t=0, node 1 is the source, in the infected state. It can possibly infect node 2, 3 and 4, all in the susceptible state.
\includegraphics

[width = ]pen-2.png

(b) At t=1, nodes 3 and 4 are infected. Node 3 can infect node 5, in the susceptible state. Node 4 can infect node 2, in the susceptible state, but not node 1, since node 1 is now removed.
\includegraphics

[width = ]pen-3.png

(c) At t=2, node 2 is infected. All its neighbors are in the removed state, so new node can be infected.
\includegraphics

[width = ]pen-4.png

(d) At t=3, the cascade stops, even if node 5 remains in the susceptible state.
Figure 1: A complete cascade.

We observe epidemics spreading on a graph, and aim to reconstruct the graph structure from noisy estimates of the times of infection. In this section, we specify the exact propagation model, the noisy observation model, and the two learning tasks this work tackles.

Propagation model: We consider a particular variant of the independent cascade model, close to the one-step model introduced by [15] and further studied by [20]. The epidemic spreads in discrete time on a weighted directed graph G=(V,E), where parents can infect their children with probability equal to the weight of the edge between them, but children cannot infect their parents. We allow bidirectional edges: it is possible that both (i,j)E and (j,i)E, possibly with different weights. For each edge (i,j)E, the corresponding weight pij is such that 0<pminpijpmax<1.

This process is an instance of a SusceptibleInfectedRemoved (SIR) process. Each node starts out in the susceptible state. As in [1], each cascade m starts at a positive time 11 1 Most of the literature considers the initial time of infection to be 0. This is because when we have access to the exact times of infection, we can make this assumption without loss of generality. In our case, it would imply we know exactly when an outbreak started, which is usually not the case. on a unique node source node, picked uniformly among the nodes of the graph. Once the source becomes infected, it is removed from the graph, and if it has children, each is infected at the next time step independently according to a probability specified by the weight of the edge shared with the source.

The process ends when there are no newly infected nodes (either because no infection happened during the previous time step, in which case some nodes may never be infected, or because all the nodes of the graph are removed). One realization of this process from start to finish is called a cascade. If two nodes are infected during the same cascade, we say that they are co-infected for this cascade. This process is illustrated in Figure 1.

Observation model: Let Tim be a random variable corresponding to the time of infection of node i during cascade m, and let tim be its realization (if i stays in the susceptible state during cascade m, we have tim=). We introduce three observation models.

In the no-noise setting, we have access to the exact times of infection Tim.

In the limited-noise setting, we never get to observe the exact times of infection Tim, but only a noisy version Tim=Tim+nim (with realization tim), where all the nim are i.i.d., and represent the noise added to the Tim. We assume nim follows a known distribution 𝒟. The only restriction we put on 𝒟 is that it cannot have infinite value (i.e., tim=tim=, and we know for a fact when nodes have been infected or not).

In the extreme-noise setting, we take the previous setting to the extreme, and we assume that instead of having access to the noisy times of infection Tim, we only have access to the infection status of the nodes Iim. We know Iim=True if i was infected during cascade m, and Iim=False otherwise. Note that Tim<Iim=True, so we can always deduce the infection status from the noisy times of infection. However, we cannot guess the noisy times of infection from the infection status: the (noisy) times of infections contain strictly more information than the infection status.

For these three settings, we call a sample the vector of all observations for the cascade m. In the no-noise setting, this is the extended-integer vector {tim}iV. In the limited-noise setting, this is the extended-integer vector {tim}iV. In the extreme-noise setting, this is the boolean vector corresponding to the realization of {Iim}iV. We also use the notation T={Tim}iVm=1M (respectively t={tim}iVm=1M) for the matrix representing the random variable (respectively the realizations) of all the samples.

Learning tasks: We focus on two different learning tasks. When we learn the structure of a graph, it means that for any two nodes i and j, we can decide whether or not there exists an edge between these two nodes (whatever its direction). When we learn the weights of the graph, it means that for every two nodes i and j, we learn the exact value22 2 When (i,j)E, we have pij=0. of both pij and pji up to precision ϵ.

1.2 Why is it a hard problem?

1.2.1 Counting approaches

Most approaches in the no-noise setting relate to counting. In our setting, for instance, a natural (and consistent) estimator for pij is to count how often an infection occurred along an edge, and divide it by how often such an infection could have happened:

pij^=Number of times j becomes infected one time step after iNumber of times i was infected before j.
\includegraphics

[width=0.20]ijk.png

Figure 2: Possible scenarios which could have led to Ti=2, Tj=3 and Tk=4. In the no-noise setting, this implies Ti=2,Tj=3,Tk=4, and there is only one possible infection pattern.
\includegraphics

[width = ]ijk.png T n i 0 2 j 1 2 k 2 2

(a)
\includegraphics

[width = ]i-jk.png T n i 0 2 j 1 2 k 1 3

(b)
\includegraphics

[width = ]kij.png T n i 1 1 j 2 1 k 0 4

(c)
\includegraphics

[width = ]jik.png T n i 1 1 j 0 3 k 2 2

(d)
\includegraphics

[width = ]j-ik.png T n i 1 1 j 0 3 k 1 3

(e)
\includegraphics

[width = ]kji.png T n i 2 0 j 1 2 k 0 4

(f)
\includegraphics

[width = ]ikj.png T n i 0 2 j 2 1 k 1 3

(g)
\includegraphics

[width = ]k-ij.png T n i 1 1 j 1 2 k 0 4

(h)
\includegraphics

[width = ]jki.png T n i 2 0 j 0 3 k 1 3

(i)
Figure 3: Possible scenarios which could have led to Ti=2, Tj=3 and Tk=4. We have Tl=Tl+nl. In the limited-noise setting, there are nine possible infection patterns (many more scenarios with the same infection pattern, but different noise values, are not shown).

In the no-noise setting, j could only have been infected by a node signaling exactly one time step before j. However, in the limited-noise setting, j signaling its infection one time step after i could stem from a variety of scenarios:

  • i could have indeed infected j: cases a), b) and c) of Figure LABEL:fig:second above.

  • j could have infected i, but the noise flipped the order of signaling: cases d), e) and f).

  • No infection happen between i and j, and the probability of infectin depends mainly on another node k: cases g), e) and f). This could happen for any other node k in the graph.

The natural estimator introduced earlier is therefore not consistent anymore; instead, it tends to a quantity which depends on pij, but also pji, and pik,pki,pjk,pkj as well, for all the other nodes k in the graph. By counting the number of times j became infected one time step after i, we are not counting the number of infections along the edge (i,j) anymore, but instead a mixture of all the scenarios described above, which not only include the cases where j infected i, but also events in which the cascade spread through another node k, and the edge (i,j) was irrelevant to the process. Using this estimator, or any obvious (to us) extension of it, would not only imply learning the wrong weights for the edges, but also learning edges when there are no edges. Our first contribution is therefore to design a new set of estimators, from which we can deduce the value of pij (Sections 2.3 and 3.2).

Adding noise in the time of infection not only reverses the cascade chronology, it also exponentially increased the number of possible infection patterns that could have happened. Bounding the realm of possibilities is therefore our second step towards solving the problem (Section 2.1).

1.2.2 Max-likelihood approaches

Another common approach is to use likelihood-based methods. For instance, in [29], the authors develop a max-likelihood-based approach to learn the edges of the graph. They prove the log-likelihood has desirable properties: it decouples into only one local problem for each node, and this local problem is convex (up to the change of variable θij=-log(1-pij)):

(T,Pij) =log(1N1iN(j;tj<ti-1(1-pji))Probability that i was notinfected before ti.(1-j;tj=ti-1(1-pji))Probability that i wasinfected at ti.)
(T,Pij) =log(1N)+i=1Nj;tj<ti-1log(1-pji)+j;tj=ti-1log(1-pji)

In our setting, the log-likelihood has none of these properties. It is not convex, and it is unclear any method other than brute force could find its maximum. Moreover, it does not decouple anymore, and even computing the log-likelihood itself takes exponential time.

(T,Pij) =log(1N(t1,,tN)(t1-1,,tN-1)This is the rootof the difficulty.1iN(n(ti-ti))Noise at node i is ti-ti.
  (j;tj<ti-1(1-pji))Probability that i was notinfected before ti.(1-j;tj=ti-1(1-pji))Probability that i wasinfected at ti.)
=log(1N)+log((t1,,tN)(t1-1,,tN-1)1iNn(ti-ti)(j;tj<ti-1(1-pji))(1-j;tj=ti-1(1-pji)))

When dealing with hidden variables, a common technique would be to use the Expectation-Maximization algorithm [8]. However, in our setting, the number of hidden states is i=1Nti, which can be as large as (N-1)N. This prohibits any realistic use of the Expectation-Maximization algorithm for networks with more than twelve nodes. Moreover, except for the recent contributions [23], very little is known about the theoretical convergence of the Expectation-Maximization algorithm.

1.3 Contributions

The contributions of this article are multiple:

  • To the best of our knowledge, we are the first to tackle the problem of learning the edges of a graph from noisy times of infection, a simple but natural extension of a well-known problem.

  • We provide the first efficient algorithm for learning the structure and weights of a bidirectional tree in this setting. We also establish a tree-specific lower bound which shows that our algorithm is sample-optimal (up to log-factors) for learning the structure of the tree (Section 2).

  • We prove it is possible to learn the structure of any bounded-degree graph in the extreme setting for which we only have access to the infection status (i.e., whether or not a node was infected). Moreover, we can do so with almost optimal sample complexity, according to the bound established in [29].

  • We provide polynomial algorithms for learning the weights of bounded degree graphs.

  • Finally, we extend the results from bounded-degree graphs to general graphs. This proves the problem is solvable under any noise distribution, although the exponential sample complexity and running time prohibits any use of this algorithm in practice (Section 3.2).

2 Learning bidirectional trees

The bidirectional tree is the simplest example which illustrates some of the difficulties of the noisy setting. Indeed, for a directed tree, the true sequence of infections can be reconstructed, and we can use techniques from the no-noise setting. For a bidirectional tree, those techniques cannot be extended. However, the uniqueness of paths in the bidirectional tree still makes this problem considerably easier than the general setting. We therefore start by presenting a solution for the bidirectional tree. The key ideas here generalize to the neighborhood-based decomposition we introduce below, which forms our key conceptual approach for the general problem.

This section contains three contributions. First, we show how to learn the structure of a tree using only the infection status, i.e., what we call the extreme-noise setting (Section 2.1). For each cascade, we only know which nodes were infected. We show this contains enough information to learn the structure of bidirectional tree. Second, we establish a lower bound for the no-noise setting, and show our algorithm has the same dependency in the number of nodes N as this lower bound (up to log-factors). In other words, for the task of learning the structure of any tree, an optimal algorithm in the no-noise setting would need as many cascades as our algorithm needs in the extreme-noise setting (up to log-factors).

Finally, we show how we can leverage this learned structure to learn the weights of the tree, this time when we have noisy access to the times of infection, i.e., the limited-noise setting (Section 2.3). We provide sample complexity for this task.

2.1 Tree structure

As illustrated in Section 1.2, the number of edges that could exist is much higher in the limited-noise setting than the number of actual edges in the tree. Our first key contribution is therefore to introduce a new estimator, h^ij, which keeps track of the fraction of cascades for which i and j were both infected. This estimator can therefore be computed only with the infection status in the extreme-noise setting. Using this estimator, we show that in the specific case where the graph is a tree, we learn the structure of this tree, i.e. whether or not both pij=pij=0.

Our algorithm for learning the edges of the tree relies on one central observation: hi,j achieves a kind of local maximum if there is an edge between i and j (Lemma 1). This observation relies heavily on the fact that there is uniqueness of paths on a tree. Let us now dive into the proof.

Definition 1.

Let h^i,j be the fraction of cascades in which both i and j became infected. We have:

h^i,jM(Iim&Ijm):=hi,j.

We now show that the limit hi,j of the estimator h^i,j satisfies a local maximum property on the edges of the tree:

Lemma 1.

If i and j are not neighbors, let (u0,u1,,ud) be the path between them, with u0=i, ud=j, and d>1. Then:

r{0,d-1},hi,j<hur,ur+1.
Proof.

We consider the case in which both i and j have been infected. There is a unique source of infection and a unique path between i and j. Therefore, all the nodes on the path from i to j must have been infected as well. In particular, both ur and ur+1 were infected. This shows Iim&IjmIurm&Iur+1m, so (Iim&Ijm)(Iurm&Iur+1m), and therefore hi,jhur,ur+1.

What’s more, every time ur and ur+1 became infected, at least one more infection along an edge must have occurred in order for j to become infected as well. This occurred with probability at most pmax<1. Therefore, hi,j=(Iim&Ijm)pmax(Iurm&Iur+1m)=pmaxhur,ur+1. We conclude hi,j<hur,ur+1.

This simple lemma allows us to design Algorithm 2.1. Indeed, suppose we have access to all the limits hi,j. By ordering them in decreasing order, we can deduce the structure of the tree by greedily adding every edge unless it forms a cycle33 3 This algorithm is very similar in spirit to Kruskal’s algorithm for finding the maximum spanning tree .

{algorithm}

[H] Learn the undirected edges of the tree. {algorithmic}[1] \ProcedureLearnTree{hi,j}i,jV \Commenthi,j limit of h^i,j. \Statepairs_h_edge[(hi,j,(i,j)) for 1ijnnodes] \StateSort(pairs_h_edge) by decreasing order \Stateedges_tree[] \For(,potential_edge)pairs_h_edge \IfAdding potential_edge to edges_tree does not create a cycle \StateAdd potential_edge to edges_tree \EndIf\EndFor\Returnedges_tree \EndProcedure We show that if we have access to the limits hi,j of the estimators h^i,j, the algorithm above correctly find the structure of the tree.

Lemma 2.

Algorithm 2.1 correctly finds all the N-1 pairs (i,j) such that there exists at least one directed edge between i and j.

Proof.

We show that in the for-loop at line 1, we add an edge to edges_tree if and only if this edge was a real edge in the original tree. We prove it by induction on the elements of the sorted list pairs_h_edge.

When no element has been selected, the proposition is trivially true.
Suppose now that t elements of pairs_h_edge have been examined so far. Let (,(i,j)) be the t+1th element. Two cases arise:

  1. 1.

    i and j are not neighbors. Let (u0,,ud) be the path between them, with u0=i and ud=j. In this case, using Lemma 1, r,hur,ur+1>hi,j. In other words, all the pairs (ur,ur+1) have already been considered by the algorithm. By induction, we have kept all of them in edges_tree. Therefore, adding the pair (i,j) would form a cycle. This pair is not kept in edges_tree, which is what we wanted since it is not an edge in the original tree.

  2. 2.

    i and j are neighbors. Suppose that adding this pair forms a cycle. Then there is a sequence (v0=i,,vd=j) of nodes such that hvk,vk+1 were all bigger than hi,j, and the pairs (vk,vk+1) were kept by the algorithm for all k. However, by uniqueness of paths in a tree, there exists a pair (va,va+1) such that the path connecting va and va+1 in the original tree goes through (i,j). Using Lemma 1, this means hi,j>hva,va+1, which is a contradiction. Therefore, adding this pair in edges_tree does not form a cycle. This pair is kept in edges_tree.

Therefore, this algorithm keeps all the edges, and only the edges of the tree, so it recovers the tree structure.

We next quantify how many cascades M are needed for Algorithm 2.1 to be correct if we replace the {hi,j}i,jV by their estimates {h^i,j}i,jV. We note that we do not require h^i,j to be close to their limit, but only need the order of the h^i,j to be the same as the order of the hi,j. We identify events which guarantee that the order is the same (Corollary 1):

Definition 2.

Let H3:={(i,j,k){1,,N}3,pij+pji>0&pjk+pjk>0} be the set of triplets of nodes such that at least one directed edge exists between the first and the second node, as well as between the second and the third node.

Proposition 1.

If:

(i,j,k)3,h^i,j>h^i,k and h^j,k>h^i,k,

then for all paths (u0,,ud) in the tree, with d>1, we have:

r{0,,d-1},h^ur,ur+1>h^u0,ud.
Proof.

For r{0,,d-2}, we have by hypothesis h^ur,ur+1>h^ur,ur+2. Now, we recall that h^u0,ud is the number of cascades for which both u0 and ud were infected. By uniqueness of paths in the tree, every time both u0 and ud were infected, both ur and ur+2 must have been infected as well. This shows that h^u0,udh^ur,ur+2. Notice that this is a deterministic property, not an asymptotic property. Therefore, h^ur,ur+1>h^ur,ur+2h^u0,ud.

For r=d-1, we follow an identical reasoning, but with h^ur,ur+1>h^ur-1,ur+1.

Corollary 1.

If:

(i,j,k)3,h^i,j>h^i,k and h^j,k>h^i,k,

then the correctness of Algorithm 2.1 is preserved when given h^ as input instead of h.

In other words, Algorithm 2.1 outputs a correct set of undirected edges with finite samples.

Proof.

According to Proposition 1, for all paths (u0,,ud) in the tree, with d>1, we have that r{0,,d-1},h^ur,ur+1>h^u0,ud. As shown in the proof of Lemma 2, this is the only property of the input needed in order to yield the correct output.

Proposition 2.

With M=N(log(1δ)+2log(N))pmin(1-pmax) cascades, with probability at least 1-δ, we have:

(i,j,k)3,h^i,j>h^i,k and h^j,k>h^i,k.
Proof.

Let us consider one triplet (i,j,k) in 3. We recall that h^i,k is the number of cascades for which both i and k were infected. Since the only path from i to k is through j, we always have that h^i,jh^i,k and h^j,kh^i,k. We notice that to obtain h^i,j>h^i,k, we only need one cascade for which both i and j got infected, but not k. We lower bound the probability Ptriplet identified of this cascade happening. For each cascade m, we have:

Ptriplet identified =(Iim&Ijm&Not(Ikm))
(i was a source, i infected jj did not infect k)
+(j was a source, j infected ij did not infect k)
1Npij(1-pjk)+1Npji(1-pjk)
1Npmin(1-pmax).

The probability that this event never occurs during the M cascades is upper bounded by:

(h^i,j=h^i,k) (1-1Npmin(1-pmax))M
e-MNpmin(1-pmax)
δN2.

Now, there are N-1 edges in a tree, therefore |3|(N-1)2<N2. By union bound:

((i,j,k)3h^i,j=h^i,k) (i,j,k)3)(h^i,j=h^i,k)
<N2δN2
δ.

Notice that 3 contains both (i,j,k) and (k,j,i). We have therefore proven that with probability at least 1-δ, when considering M=N(log(1δ)+2log(N))pmin(1-pmax) cascades, we have (i,j,k)3,h^i,j>h^i,k and h^j,k>h^i,k.

Putting together Proposition 2 and Corollary 1, we obtain our first theorem for learning the undirected edges with finite samples:

Theorem 1.

With M=N(log(1δ)+2log(N))pmin(1-pmax) cascades, with probability at least 1-δ, we can learn the structure of a any bidirectional tree in the extreme-noise setting, i.e., when we only have access to the infection status of the nodes.

2.2 Lower bound

In this section, we prove a lower bound for trees in the no-noise setting. With very minor adjustments, we adapt the lower bound of [29]. Since for a general tree, the max degree can be up to N-1, we design a lower bound which is independent from the max-degree. Let G be a tree drawn uniformly from 𝒢, the set of all possible trees on N nodes, and G^ be the reconstructed graph from the times of infection. GTG^ therefore forms a Markov chain. We have:

H(G) =I(G;G^)+H(G|G^)
(data processing inequality) I(G;T)+H(G|G^)
(independent cascades) MI(T1;T1)+H(G|G^)
(I(X,Y)H(X)) =Mi=1NH(Ti1)+H(G|G^)
(Fano’s inequality) Mi=1NH(Ti1)+(1+Pelog(|𝒢|))

Since G is drawn uniformly from 𝒢, H(G)=log(|𝒢|). There are NN-2 trees on N nodes, according to Cayley’s formula [5], so H(G)=(N-2)log(N).

In conclusion:

M(1-Pe)(N-2)log(N)-1NH(Ti1)=Ω(log(N)H(Ti1))

Using the same kind of techniques as in [29], we can assume H(Ti1)log(N)N. Therefore:

Theorem 2.

In the no-noise setting, we need M=Ω(N) cascades to learn the tree structure.

In our extreme-noise setting, when we have only access to the infection status of the nodes, we can learn the tree structure with the same sample complexity (up to log-factors) as the no-noise setting!

2.3 Tree weights

In this section, we assume we are in the limited-noise setting, and we have access to the times of infection. We also assume we have already learned the structure of the tree.

Once we have reduced the set of possible edges by learning the structure of the bidirectional tree, learning the weights of the edges is still non-trivial. Indeed, from Tim and Tjm, it is still impossible to know whether this sample is useful for estimating pij (case when i infected j), or whether we should use this sample for estimating pji instead (case when j infected i). What is more, we only get one sample per node and per cascade, so it is impossible to know what really happened during that cascade. However, knowing the distribution of the noise, it is possible to compute the probability that the noise maintained the order of infections. Using this information and the reduced set of known undirected edges, we can compute two sets of N(N-1) estimators, from which it is possible to infer the weights of all edges in the tree.

We introduce these two sets of N(N-1) estimators, or, in other words, two estimators for each directed edge. These estimators tend to multivariate polynomials of the weights of most edges of the tree. Thus in general these polynomials have exponentially many terms; however, when i and j are neighbors, it is possible to express them concisely using a quantity 𝒫j(i), which we define formally below. This succinct representation is the key idea we exploit to solve the resulting system of equations.

Once we know the structure of the bidirectional tree, we can consider the four estimators for each undirected edge (two estimators for each directed edge). They form a system of four equations and four unknowns, which we solve to obtain the weights of the edges.

Definition 3.

𝒫i(j) is the probability that j became infected before any node on the path from j to i, including i, became infected.

We now introduce the estimators:

Definition 4.

We introduce 2 sets of N(N-1) estimators:

f^i<j =Fraction of infections for which i and j got infected, and i reported before j.
g^i,j =Fraction of infections for which i got infected, but j did not.

By the law of large numbers, as the number of cascades scales, f^i<j tends to fi<j and g^i,j to gi,j, where

fi<j := (Tim<Tjm<),
gi,j := (Tim<,Tjm=).

We now compute the exact values of these two quantities. Let us assume the (unique) path between i and j has length d. We call (u0,,ud) the set of nodes on the path from i to j, with u0=i and ud=j. We then have:

Lemma 3.

Recall fi<j and gi,j are the expectation of the estimators defined above. We have:

fi<j =𝒫j(i)pijs-d+1+𝒫i(j)pjisd+1
+l=1d-1𝒫i,j(ul)pulipuljs2l-d+1.
g^i,j =𝒫j(i)(1-pij)+l=1d-1𝒫i,j(ul)puli(1-pulj).

What’s more, when i and j are neighbors (which implies d=1), the expressions simplify to:

fi<j =𝒫j(i)pijs0+𝒫i(j)pjis2
gi,j =𝒫j(i)(1-pij).
Proof.
fi<j =(Tim<Tjm<)
=(i got infected before any nodes on the path from i to j,
then infected j, and the noise did not flip the times of infection)
+(j got infected before any nodes on the path from i to j,
then infected i, and the noise flipped the times of infection)
+(One node on the path from i to j got infected before i and j,
then infected both i and j, but i reported before j)
=𝒫j(i)pijs-d+1
+𝒫i(j)pjisd+1
+l=1d-1𝒫i,j(ul)pulipuljs2l-d+1.

This expression is involved in general. However, if i and j are neighbors, then there are no nodes ul on the path between i and j, other than i and j themselves. What’s more, pij=pij, and d=1. Therefore:

fi<j=𝒫j(i)pijs0+𝒫i(j)pjis2.

Let us now focus on gi,j:

gi,j =(Tim<,Tjm=)
=(i got infected before any nodes on the path from i to j, but j did not get infected)
+(One node on the path from i to j got infected before i and j, then infected i but not j)
=𝒫j(i)(1-pij)
+l=1d-1𝒫i,j(ul)puli(1-pulj).

As before, this expression is complex in general, but simplifies if i and j are neighbors, in which case:

gi,j=𝒫j(i)(1-pij).

Using the simplified expression only for when i and j are neighbors, we obtain:

Proposition 3.

If we know (i,j) is an edge in the original tree, then the probability of infection along this edge is given by:

pij=fi<js0-fj<is2gi,j(s02-s22)+fi<js0-fj<is2.
Proof.

According to Lemma 3, we had four second-order equations, with 4 unknowns: pij, pji, 𝒫j(i) and 𝒫i(j). We solve it, and obtain the wanted result. See Appendix A for details.

Combining all the pieces, we obtain our first theorem for infinite samples:

Theorem 3.

It is possible to learn the weights of a bidirectional tree in the limited-noise setting.

Now that we have proven the problem is solvable, we establish the number of samples needed to learn the weights with the method above.

Lemma 4.

With M=N2ϵ2log(6δ)((s02-s22+s0+s2)pmax+s0+s2)2(s02-s22)2 samples, with probability at least 1-δ, we have:

|p^ij-pij|ϵ.
Proof.

Using Hoeffding’s inequality:

(|f^i<j-fi<j|>ϵ1) 2e-2Mϵ12,
(|f^j<i-fj<i|>ϵ1) 2e-2Mϵ12,
(|g^i,j-gi,j|>ϵ1) 2e-2Mϵ12.

Choosing M=1ϵ12log(6δ), we have that with probability at least 1-δ, all the following hold:

|f^i<j-fi<j| ϵ1,
|f^j<i-fj<i| ϵ1,
|g^i,j-gi,j| ϵ1.

Hence, with probability at least 1-δ, we have (see Appendix A for details):

p^ij-pij ϵ1(s02-s22+s0+s2)pmaxgi,j(s02-s22)+fi<js0-fj<is2
+ϵ1s0+s2gi,j(s02-s22)+fi<js0-fj<is2+o(ϵ1).

We use the results from Lemma 3 to bound the denominator by s02-s22N. In the end, we obtain:

|p^ij-pij|ϵ1N(s02-s22+s0+s2)pmax+s0+s2s02-s22+o(ϵ1).

We choose ϵ1=ϵNs02-s22(s02-s22+s0+s2)pmax+s0+s2. Therefore:
With M=N2ϵ2log(6δ)((s02-s22+s0+s2)pmax+s0+s2)2(s02-s22)2 samples, with probability at least 1-δ, we have |p^ij-pij|ϵ.

By a union bound on all the weights of the tree, knowing there are at most 2N(N-1)<2N2 directed edges in a directed tree, we obtain the following sample complexity:

Theorem 4.

With M=N2ϵ2log(12N2δ)((s02-s22+s0+s2)pmax+s0+s2)2(s02-s22)2 cascades, with probability 1-δ, we can learn all the weights of the edges of a bidirectional tree in the limited-noise setting, i.e., when we only have access to the noisy times of infection.

3 Bounded-degree graphs

In the previous section, the algorithm presented relies heavily on the uniqueness of paths. This property implies that we can deduce the edges from the nodes which are co-infected the most often. However, this is not true for a general bounded-degree graph. In Figure 4, we can see that the two nodes i and j would be co-infected frequently despite not sharing an edge. This makes the task of finding the structure much more challenging than for the bidirectional tree.

In this section, we show how the main ideas for learning the structure of the bidirectional tree can be extended for learning the structure of general bounded-degree graphs, in the extreme-noise setting, with almost optimal sample complexity. The framework for learning the weights of the edges in the limited-noise setting is - to the best of our knowledge - not extendable to general bounded-degree graphs; we therefore develop a new algorithm to learn the weights for general bounded-degree graphs.

\includegraphics

[width = 0.5]lotsEdges-clean.png

Figure 4: Two nodes can be co-infected frequently without sharing an edge.

3.1 Bounded-degree structure

In the previous section, we introduced the estimator h^i,j, which records the fraction of cascades for which both i and j are infected. From a local maximum property of this estimator, we deduced the structure of the tree, in a sample efficient fashion. Indeed, if there exists a path between i and k, and the first edge on this path is (i,j), then if i and k are infected, j must have been infected as well.

We want to build on this idea for a bounded-degree graph of maximum degree d. However, for such a graph, there may be multiple paths leading from i to j, and we cannot guarantee a single node will be infected each time. However, if i is a node, 𝒩i is its neighborhood, and k𝒩i is another node of the graph, we can guarantee that if both i and k are infected, there exists a node in 𝒩i which is infected. Moreover, 𝒩i is the set of smallest size for which at least one node is infected at the same time as i the most frequently. This leads us to a new set of estimators:

Definition 5.

Let i be a node of the graph, and let S be a set such that |S|<d and iS. We define a new set of estimators:

h^i,S =Fraction of cascades for which i is infected, and at least one node of S is infected.

Let us assume that for each pair (i,S), we have access to the limit hi,S of h^i,S. We now introduce an algorithm showing how to leverage these limits to learn the structure of any bounded-degree graph. {algorithm}[H] Learn the undirected edges of any graph of maximum degree d. {algorithmic}[1] \ProcedureLearnGraph{hi,S}iV|S|d \Stateedges[] \Fori=1nnodes \StateS_max_i set such that hi,S_max_i is maximal, and such that Size(S_max_i) is minimal. \Fornj in S_max_i \StateAdd edge (i,nj) to edges \EndFor\EndFor\Returnedges \EndProcedure

We show this algorithm is correct.

Lemma 5.

Algorithm 3.1 correctly finds the neighborhood of each node.

Proof.

Let us recall that hi,S is the probability that node i and at least one node of S are co-infected. To prove the correctness of this algorithm, it suffices to prove:

iV,Ss.t.|S|d,hi,Shi,𝒩i𝒩iS

Let pick a set S such that 𝒩iS, and let k be a node in 𝒩iS. We know hi,S{k}hi,S+(i and k are the only infected nodes). Since i and k are neighbors, (i and k are the only infected nodes)>0, and therefore hi,S{k}>hi,S. Following this line of reasoning, if 𝒩iS, S we can always increase the value of hi,S by adding a node of 𝒩i.

However, it is impossible to increase the value of hi,𝒩i, because if i and any other node of the graph are co-infected, we know one node of 𝒩i is also infected. Therefore, the algorithm is correct.

Unfortunately, we do not have direct access to hi,S. We therefore study how many samples are needed to replace hi,S by its estimate h^i,S while preserving the correctness of the algorithm. Just like in Section 2.1, we notice that we do not need the {h^i,S}iV|S|d to be close to their limit, we only need the indexes of their ordering to be the same. We notice that the neighborhood 𝒩i of i is the set of smallest size which is the most often infected at the same time as i is (𝒩i=argmax|R|dhi,R). Therefore, we must have that for every set S, h^i,Sh^i,𝒩i. However, some other sets might achieve the same value if we do not observe enough cascades. This could happen in two cases:

  • Not all the nodes of the neighborhood were infected, and therefore a subset T1𝒩i of the neighborhood is such that h^i,T1=h^i,𝒩i. Since |T1|<|𝒩i|, the algorithm would return T1 and not 𝒩i, which would be a failure case.

  • Some other node k is always infected every time a specific node j𝒩i is infected. The set T2=𝒩i{j}{k} will therefore be such that h^i,T2=h^i,𝒩i, and the algorithm would not know which maximum to pick. This is a failure case as well.

We identify events which guarantee that the failure cases above cannot arise. Let Ei,jm be the event that i and j were the only infected nodes during cascade m. Ei,j=1mMEi,jm is therefore the event that there exists a cascade for which only i and j were infected. If such a cascade exists, the set Smax=argmax|R|dh^i,R must contain j. If the event Ei,j happens for every node j in the neighborhood of i, we can therefore guarantee Algorithm 3.1 is correct. We characterize the sample complexity needed for this below.

Proposition 4.

Let i be a node, and let j be one of its neighbors. Let S be a set of size |S|d, such that iS and jS. With probability at least 1-δdNd+2, among M=(d+2)Nlog(N)+Nlog(dδ)pmin(1-pmax)2(d-1) cascades, there exists a cascade in which i and j are infected, but no nodes of S are infected.

Proof.

We notice that if, during cascade m, the only infected nodes are i and j, no nodes of S are infected. This is exactly the event Ei,jm defined above.

(Ei,jm) =(i and j are the only infected nodes during cascade m)
(cascade started at ij was the only infected neighbor of i, and j did not infect any nodes)
1Npmin(1-pmax)2(d-1).

The probability that this never happens during M cascade is exactly the complement of the event Ei,j defined above:

(Not(Ei,j)) (1-1Npmin(1-pmax)2(d-1))M
e-MpminN(1-pmax)2(d-1)
δdNd+2.

Lemma 6.

With probability at least 1-δ, if we observe M=(d+2)Nlog(N)+Nlog(dδ)pmin(1-pmax)2(d-1) cascades, Algorithm 3.1 is correct, and has running time O(MNd+2)=O(dNd+3log(N)).

Proof.

Let 𝒮={S𝒫(V),|S|d} be the set of sets of nodes of size at most d, let 𝒮i={S𝒮,iS} be the set of sets of 𝒮 which do not contain i, and let 𝒩i={j,(i,j)E} be the neighborhood of i. We pick a set S𝒮, such that S𝒩i.

We first prove that h^i,Sh^i,𝒩i. Since the neighborhood of i separates i from the rest of the graph, and infected nodes are connected, we can conclude that every time i and another node of S are infected, one node in the neighborhood 𝒩i of i is infected as well. Therefore, we cannot increase h^i,S without also increasing h^i,𝒩i. In particular, this means that even if |S|>|𝒩i| (for instance if 𝒩iS), we have h^i,Sh^i,𝒩i. We therefore know that 𝒩iargmaxR𝒮ih^i,R.

We now prove that with probability at least 1-δ, 𝒩i is the set of argmaxR𝒮ih^i,R of minimal size. To do so, we notice that if there exists a cascade such that i and j𝒩i are infected, but no node of S is infected, then this implies h^i,S<h^i,𝒩i. Indeed, as shown above, every time we increase h^i,S, we also increase h^i,𝒩i, and we know there exists one cascade for which we increased h^i,𝒩i without increasing h^i,S. We now calculate the probability Pfailure that such a cascade does not exist for all nodes i in the graph, all nodes j in their neighborhood, and all sets S which do not include i or j.

Pfailure (iV,j𝒩i,S𝒮i,j,every time i and j are infected, a node of S is also infected)
iVj𝒩iS𝒮i,j(every time i and j are infected, a node of S is also infected).

We use Proposition 4 to bound this quantity:

Pfailure iVj𝒩iS𝒮i,jδdNd+2
NdNd+1δdNd+2
δ.

We now know that with probability at least 1-δ:

Rj𝒩i𝒮i,j,h^i,R<h^i,𝒩i.

This implies that no set of j𝒩i𝒮i,j can belong in argmaxR𝒮ih^i,R. However, we have:

𝒮ij𝒩i𝒮i,j={S𝒮,𝒩iS}.

In particular, this means that 𝒩i is the only set of 𝒮ij𝒩i𝒮i,j of minimal size. This shows that with probability at least 1-δ, 𝒩i is the set of argmaxR𝒮ih^i,R of minimal size. This proves Algorithm 3.1 is correct, and that we can learn the structure of any graph of maximum degree d with M=(d+2)Nlog(N)+Nlog(dδ)pmin(1-pmax)2(d-1) cascades.

Since we do at most one operation by pair of (node, set) and by cascade, the running time is 𝒪(NNd+1M), which is what we wanted to prove.

This leads to our theorem for learning the structure of any bounded-degree graph.

Theorem 5.

With probability at least 1-δ, in the extreme-noise setting, we can learn the structure of any graph of maximum degree d with M=O(dNlog(Nδ)pmin(1-pmax)2d) cascades in polynomial time.

Let us now assume that pmax1d. This assumption is reasonable when you expect a constant number of infections by time step. For instance, it makes sense for real diseases, for which carriers have to meet to transmit it (we can only meet a constant number of people each day). It would not make sense for social networks, in which it is possible to reach many followers with each post.

Corollary 2.

With probability at least 1-δ, in the extreme-noise setting, if we assume pmax1d and pmin constant, we can learn the structure of any graph of maximum degree d with M=O(dNlog(Nδ)). This sample complexity is optimal up to log-factors, and almost matches the lower bound established in the no-noise setting.

Proof.

We need at least 𝒪(dNlog(N)) samples to learn the structure of a bounded-degree graph with maximum degree d, according to the lower bound in [29].

3.2 Bounded-degree weights

For the remainder of this paper, we state results in the limited-noise setting.

If we consider cascades of size k, the exact probability of infection between two nodes is a multivariate polynomial of degree N on N(N-1) variables (the variables here would be the weights of the graph), with a sum of up to 2l=1k(N-2)!(N-k)! terms. If the graph has more than five nodes, the resulting polynomial is of degree more than five.

For our algorithm, we therefore only use cascades of size 1 or 2. This is a waste of the data, since we simply discard cascades of larger size. However, we are not aware of techniques on how to utilize larger cascades in the limited-noise setting. Cascades of size 1 or 2 are simple enough that we can write explicitly their probability. This allows use to:

  1. 1.

    Design estimators for which we can calculate the exact limit.

  2. 2.

    Combine these estimators to transform a polynomial system of degree N to a polynomial system of degree 2.

  3. 3.

    Solve this system exactly and obtain the probabilities of infection.

3.3 Estimators

We start by designing a few estimators:

Definition 6.

We introduce two sets of N(N-1) estimators and one set of N estimators. These estimators can be computed even if we only have access to the noisy times of infection.

h^i,j2 =Fraction of cascades for which only i and j are infected.
f^i<j2 =Fraction of cascades for which only i and j are infected, and ti<tj.
e^i1 =Fraction of cascades for which only i is infected.

To simplify the coming notations, let us introduce sk, which is the probability that the noise on j has delay at least k relative to the noise on i. Since the noise is i.i.d., this value is independent from i and j: sk=(nj-nik)44 4 For instance, for geometric noise of parameter q, we have: sk=tj=max(0,k)ti=0tj-k(1-q)ti+tj=(1-q)max(0,k)(1-(1-q)1-min(0,k)2-q).. For instance, if i infected j during cascade m, the probability that the noise did not flip the order of infection (i.e. Tim<Tjm) is (njni)=s0. In the reverse case, the probability that the noise flipped the order of infection is (Tjm+nj<Tim+ni)=(1+nj<ni)=(ni-nj2)=s2.

We now compute the limits of those estimators.

Proposition 5.

As the number of cascades M goes to infinity, the estimators introduced above tend to the following limit:

h^i,j2 M1N(pij+pji)ki,j(1-pik)(1-pjk),
f^i<j2 M1N(pijs0+pjis2)ki,j(1-pik)(1-pjk),
e^i1 M1N(1-pij)ki,j(1-pik).
Proof.

The proof is very similar to 3. See details in Appendix B.2.

3.4 Solving the system

The limit of these estimators, as seen as a function of the probabilities of infection, is a complex polynomial on up to 2(N-1) variables. The crux of our algorithm is to combine those estimators in order to cancel out most of these variables, and create N-1 systems of two equations of degree 2 and two unknowns, which we then solve.

Proposition 6.

Let V^ij=f^i<j2h^i,j2+Ne^i1e^j1. Then the limit of V^ij as the number of cascades M goes to infinity only depends on the variables pij and pji :

V^ijMpijs0+pjis21+pijpji.
Proof.

Since all the estimators converge towards a constant, we can use Slutsky’s lemma to find the limit of V^ij. Let Vij be the limit of V^ij as M goes to infinity. We notice that the ki,j(1-pik)(1-pjk) parts cancel each other out:

Vij =fi<j2hi,j2+Nei1ej1
=1N(pijs0+pjis2)[ki,j(1-pik)(1-pjk)](1N(pij+pji)+N(1-pij)(1-pji)N2)[ki,j(1-pjk)(1-pjk)]
=(pijs0+pjis2)1+pijpji.

We can therefore use this equality to deduce the weights of all the edges of the graph:

Theorem 6.

For any graph, for any noise distribution having finite values, we can learn the weights of all the edges of the graph. In particular, we can compute a quantity which converges to the true weight of each edge:

p^ij=2(V^jis2-V^ijs0)(s02-s22)+(s02-s22)2-4(V^jis2-V^ijs0)(V^ijs2-V^jis0).
Proof.

We present a sketch of the proof here. The details can be found in Appendix B.1. We know V^ij tends to Vij=pijs0+pjis21+pijpji. Using both Vij and Vji, we can establish this second-degree equation:

Vjis2-Vijs0+(s02-s22)pij+(Vijs2-Vjis0)pij2=0

We recall that by definition, s0s2. We also notice that if pij=q1 and pji=q2 is a pair of solutions of this system, then pij=1q2 and pji=1q1 forms the other pair of solution, which implies there is uniqueness of solutions in [0,pmax]. Since the real probabilities of infection satisfy this system, we also know the solution exists. Let Δ=(s02-s22)2-4(Vjis2-Vijs0)(Vijs2-Vjis0). The only solution of this system in [0,pmax] is then:

pij =2(Vjis2-Vijs0)(s02-s22)+Δ.

3.5 Sample complexity

We establish the sample complexity needed to estimate pij with precision ϵ. To do so, we start by estimating Vij. Note that we only consider the pair of nodes i and j if, among the M samples, there exists a cascade of size 2 in which i and j are the only infected nodes (i.e. hi,j2>0). Otherwise, we set p^ij=p^ji=0 as our estimate for pij.

Proposition 7.

With probability 1-δN2, with M= samples, we can estimate Vij with precision ϵV.

Proof.

We present a sketch of the proof; the details can be found in Appendix B.2. As in Proposition 4, we use Hoeffding’s inequality:

(|f^i<j2-fi<j2|>ϵ1) 2e-2Mϵ12,
(|h^i,j2-hi,j2|>ϵ1) 2e-2Mϵ12,
(|e^i1-ei1|>ϵ1) 2e-2Mϵ12,
(|e^j1-ej1|>ϵ1) 2e-2Mϵ12.

We use this to bound above Vij:

V^ij Vij[1+ϵ1fi<j2+ϵ11+N(ei1+ej1)hi,j2+Nei1ej1+o(ϵ1)]

By bounding below the denominators and bounding above the numerator, we finally obtain:

|V^ij-Vij| ϵ14Npmins2(1-pmax)2d+o(ϵ1).

Therefore, by union bound, and by choosing ϵ14Npmins2(1-pmax)2d=ϵV, and setting 2e-2Mϵ12=δ3N2, we obtain:
With M=1ϵV216N2pmin2s22(1-pmax)4d2log(3N)-log(δ)2 samples, we can guarantee |V^ij-Vij|ϵV with probability at least 1-δN2.

Once we have estimated Vij with precision ϵV, estimating pij is unfortunately still not an easy task. Indeed, let Δ=(s02-s22)2-4(Vjis2-Vijs0)(Vijs2-Vjis0). We have pij=-(s02-s22)+Δ2(Vijs2-Vjis0), which means that Δ has to be positive for this quantity to be defined. However, for general values of Vij and Vji, Δ can be negative. We therefore use the framework of constrained optimization to bound Δ away from 0.

Lemma 7.

Let Δ=(s02-s22)2-4(Vjis2-Vijs0)(Vijs2-Vjis0). We have:

Δ(s02-s22)2(1-pmax)21+pmax2.
Proof.

We find the lower bound on Δ by reformulating the problem as a constrained optimization problem, and introducing the corresponding Lagrangian multipliers. The details can be found in Appendix B.1.

Now that we have established this bound on Δ, we can give the sample complexity needed to estimate pij.

Proposition 8.

Assuming we can estimate Vij within precision ϵV, then we can estimate pij within precision ϵ=6ϵV(1+pmax2)(s02-s22)2(1-pmax)2.

Proof.

This is a simple derivation which can be found in Appendix B.2.

We finally piece everything together, and use a union bound on all the pij to obtain the final sample complexity of our algorithm for learning the weights of general bounded-degree graphs.

Theorem 7.

In the limited-noise setting, with probability at least 1-δ, with M=O(e4pmax(d+1)pmin2s22(s02-s22)4N2ϵ2log(Nδ)) samples, we can learn the weights of any bounded-degree graph up to precision epsilon.

Proof.

We obtain the desired bound by combining Proposition 7 and Proposition 8. See details in Appendix B.2.

Once again, if we assume pmax1d and pmin constant, we obtain the following sample complexity:

Corollary 3.

In the limited-noise setting, with probability at least 1-δ, if pmax1d and pmin constant, we can learn the weights of any bounded-degree graph up to precision epsilon with M=O(1s22(s02-s22)4N2ϵ2log(Nδ)) samples.

4 General graphs

In the limited-noise setting, we notice that nothing prevents us from using the algorithm for learning bounded-degree weights for general graph. This proves this problem is solvable for any graph, and any noise distribution. As explained in Section 1.2, this is not an obvious result. However, if we do not assume pmax1N, the sample complexity is now exponential.

Theorem 8.

In the limited-noise setting, with probability at least 1-δ, it is possible to learn all the weights of any graph, for any noise distribution, with finite (but potentially exponential) sample complexity.

5 Discussion and Future work

In this paper, we presented the first results to learn the edges of a graph from noisy times of infection. We showed we learn the structure of any bidirectional tree or any bounded-degree graph (note that not all trees are bounded-degree graphs) with optimal sample complexity (up to log-factors).

However, our results are not tight for learning the weights of bidirectional trees or bounded-degree graphs. In the no-noise setting, [29] proves this can be achieved with 𝒪(d2Nlog(N)) for graphs of maximum degree d with correlation decay. Our results have sample complexity 𝒪(N2log(N)) without any assumption of correlation decay. It is an open problem to understand whether it is possible to achieve a sample complexity of 𝒪(d2Nlog(N)) in the limited-noise setting, and whether assuming correlation decay is necessary to obtain such a result.

Moreover, for learning the weights of a general bounded-degree graph, we only use cascades of size 1 or 2. If we are given an infinite number of cascades of size bigger than 2, our current algorithm cannot learn the weights of the graph. Future work could develop an algorithm without such a weakness.

All our results for learning the weights of the edges are in the limited-noise setting. Whether or not it is possible to learn the noise in the extreme-noise setting is an other question of interest for future work.

Finally, we have made no restriction on the distribution of the noise we add, other than it is finite. It would be interesting to study whether stronger restrictions on the noise (for instance Gaussian noise) would lead to stronger results. It would also be interesting to allow infinite noise, and develop algorithms which are robust to errors in the infection status of a node (our current algorithms can return wrong graph structure with only one adversarially chosen false positive).

References

  • [1] Bruno Abrahao, Flavio Chierichetti, Robert Kleinberg, and Alessandro Panconesi. Trace complexity of network inference. Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’13, page 491, 2013.
  • [2] Ery Arias-castro, Emmanuel J Candès, and Arnaud Durand. Detection of an anomalous cluster in a network. The Annals of Statistics, 39(1):278–304, 2011.
  • [3] Ery Arias-castro and S T Nov. Detecting a Path of Correlations in a Network. pages 1–12.
  • [4] Daniel Bernoulli and Sally Blower. An attempt at a new analysis of the mortality caused by smallpox and of the advantages of inoculation to prevent it. Reviews in medical virology, 14:275–288, 2004.
  • [5] A. Cayley. A theorem on trees. In Collected Mathematical Papers Vol. 13, pages 26–28. Cambridge University Press, 1897.
  • [6] Justin Cheng, Lada A. Adamic, P. Alex Dow, Jon Kleinberg, and Jure Leskovec. Can Cascades be Predicted? In Proceedings of the 23rd international conference on World wide web (WWW’ 14), 2014.
  • [7] Michela Del Vicario, Alessandro Bessi, Fabiana Zollo, Fabio Petroni, Antonio Scala, Guido Caldarelli, H. Eugene Stanley, and Walter Quattrociocchi. The spreading of misinformation online. Proceedings of the National Academy of Sciences, page 201517441, 2016.
  • [8] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal ofthe Royal Statistical Society, 39(1):1–38, 1977.
  • [9] Kimon Drakopoulos, Asuman Ozdaglar, and John N. Tsitsiklis. An efficient curing policy for epidemics on graphs. arXiv preprint arXiv:1407.2241, (December):1–10, 2014.
  • [10] Kimon Drakopoulos, Asuman Ozdaglar, and John N. Tsitsiklis. A lower bound on the performance of dynamic curing policies for epidemics on graphs. (978):3560–3567, 2015.
  • [11] Giulia Fanti, Peter Kairouz, Sewoong Oh, Kannan Ramchandran, and Pramod Viswanath. Rumor source obfuscation on irregular trees. In Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science (SIGMETRICS’ 16 ), pages 153–164. ACM, 2016.
  • [12] Giulia Fanti, Peter Kairouz, Sewoong Oh, Kannan Ramchandran, and Pramod Viswanath. Hiding the Rumor Source. IEEE Transactions on Information Theory, 63(10):6679–6713, 2017.
  • [13] Giulia Fanti, Peter Kairouz, Sewoong Oh, and Pramod Viswanath. Spy vs. Spy: Rumor Source Obfuscation. Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS’ 14), pages 271–284, 2015.
  • [14] Mehrdad Farajtabar, Jiachen Yang, Xiaojing Ye, Huan Xu, Rakshit Trivedi, Elias Khalil, Shuang Li, Le Song, and Hongyuan Zha. Fake News Mitigation via Point Process Based Intervention. In Proceedings of the 34th International Conference on Machine Learning (ICML’ 17), 2017.
  • [15] Ken Goldberg, Theresa Roeder, Dhruv Gupta, and Chris Perkins. Eigentaste: A Constant Time Collaborative Filtering Algorithm. Information Retrieval, 4(2):133–151, 2001.
  • [16] Manuel Gomez-rodriguez, Jure Leskovec, and Andreas Krause. Inferring Networks of Diffusion and Influence. In ACM Transactions on Knowledge Discovery from Data (TKDD’ 12), volume 5, 2012.
  • [17] Manuel Gomez-Rodriguez, Jure Leskovec, and Bernhard Schölkopf. Structure and Dynamics of Information Pathways in Online Media. In 6th International Conference on Web Search and Data Mining (WSDM 2013), 2013.
  • [18] Jessica Hoffmann and Constantine Caramanis. The Cost of Uncertainty in Curing Epidemics. Proceedings of the ACM on Measurement and Analysis of Computing Systems (SIGMETRICS’ 18), 2(2):11–13, 2018.
  • [19] Tomoharu Iwata, Amar Shah, and Zoubin Ghahramani. Discovering Latent Influence in Online Social Activities via Shared Cascade Poisson Processes. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD’ 13), 2013.
  • [20] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’03, 2003.
  • [21] Justin Khim and Po-Ling Loh. Permutation Tests for Infection Graphs. pages 1–28, 2017.
  • [22] Justin Khim and Po-Ling Loh. A theory of maximum likelihood for weighted infection graphs. pages 1–47, 2018.
  • [23] Jeongyeol Kwon, Wei Qian, Constantine Caramanis, Yudong Chen, and Damek Davis. Global Convergence of the EM Algorithm for Mixtures of Two Component Linear Regression. XX:1–57, 2019.
  • [24] Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, and Natalie Glance. Cost-effective Outbreak Detection in Networks. Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD ’07), page 420, 2007.
  • [25] Eli A. Meirom, Chris Milling, Constantine Caramanis, Shie Mannor, Ariel Orda, and Sanjay Shakkottai. Localized epidemic detection in networks with overwhelming noise. pages 1–27, 2014.
  • [26] Chris Milling, Constantine Caramanis, Shie Mannor, and Sanjay Shakkottai. Network Forensics : Random Infection vs Spreading Epidemic. In Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems (SIGMETRICS’ 12), 2012.
  • [27] Chris Milling, Constantine Caramanis, Shie Mannor, and Sanjay Shakkottai. Local detection of infections in heterogeneous networks. Proceedings - IEEE INFOCOM, 26:1517–1525, 2015.
  • [28] Seth Myers, Chenguang Zhu, and Jure Leskovec. Information Diffusion and External Influence in Networks. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD’ 12), pages 33–41, 2012.
  • [29] Praneeth Netrapalli and Sujay Sanghavi. Learning the Graph of Epidemic Cascades. In Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems (SIGMETRICS’ 12), pages 211–222, 2012.
  • [30] M. E. J. Newman. Networks: An Introduction, volume 23. 2014.
  • [31] Devavrat Shah and Tauhid Zaman. Detecting sources of computer viruses in networks: theory and experiment. In ACM SIGMETRICS Performance Evaluation Review, volume 38, pages 203–214. ACM, 2010.
  • [32] Devavrat Shah and Tauhid Zaman. Rumors in a Network : Who ’ s the Culprit ? IEEE Transactions on information theory, 57(8):1–43, 2010.
  • [33] Devavrat Shah and Tauhid Zaman. Rumor centrality: a universal source detector. In ACM SIGMETRICS Performance Evaluation Review, volume 40, pages 199–210. ACM, 2012.
  • [34] Sam Spencer and R Srikant. On the impossibility of localizing multiple rumor sources in a line graph. ACM SIGMETRICS Performance Evaluation Review, 43(2):66–68, 2015.
  • [35] Zhaoxu Wang, Wenxiang Dong, Wenyi Zhang, and Chee Wei Tan. Rumor source detection with multiple observations: Fundamental limits and algorithms. In ACM SIGMETRICS Performance Evaluation Review, volume 42, pages 1–13. ACM, 2014.
  • [36] Liang Wu and Huan Liu. Tracing Fake-News Footprints: Characterizing Social Media Messages by How They Propagate. In (WSDM 2018) The 11th ACM International Conference on Web Search and Data Mining, 2018.
  • [37] Ali Zarezade, Ali Khodadadi, Mehrdad Farajtabar, Hamid R Rabiee, and Hongyuan Zha. Correlated Cascades : Compete or Cooperate. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), pages 238–244, 2017.
  • [38] Qingyuan Zhao, Murat A. Erdogdu, Hera Y. He, Anand Rajaraman, and Jure Leskovec. SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity. Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’15 ), 2015.

Appendix A Bidirectional tree

We include here the full calculations for learning the weights of the bidirectional tree.

Proposition 9.

If we know (i,j) is an edge in the original tree, then the probability of infection along this edge is given by:

pij=fi<js0-fj<is2gi,j(s02-s22)+fi<js0-fj<is2.
Proof.

According to Lemma 3, we have:

fi<j =𝒫j(i)pijs0+𝒫i(j)pjis2
gi,j =𝒫j(i)(1-pij)
fj<i =𝒫i(j)pjis0+𝒫j(i)pijs2
gj,i =𝒫i(j)(1-pji).

We have 4 second-order equations, with 4 unknowns: pij, pji, 𝒫j(i) and 𝒫i(j). We solve it:

fi<j+fj<i =(𝒫j(i)pij+𝒫i(j)pji)(s0+s2)
fi<j-fj<i =(𝒫j(i)pij-𝒫i(j)pji)(s0-s2)
𝒫j(i)pij =12(fi<j+fj<is0+s2+fi<j-fj<is0-s2)
=fi<js0-fj<is2s02-s22
𝒫j(i) =gi,j+𝒫j(i)pij
=gi,j+fi<js0-fj<is2s02-s22
pij =𝒫j(i)pij𝒫j(i)
pij =fi<js0-fj<is2gi,j(s02-s22)+fi<js0-fj<is2.

Lemma 8.

With M=N2ϵ2log(6δ)((s02-s22+s0+s2)pmax+s0+s2)2(s02-s22)2 samples, with probability at least 1-δ, we have:

|p^ij-pij|ϵ.
Proof.

Using Hoeffding’s inequality:

(|f^i<j-fi<j|>ϵ1) 2e-2Mϵ12,
(|f^j<i-fj<i|>ϵ1) 2e-2Mϵ12,
(|g^i,j-gi,j|>ϵ1) 2e-2Mϵ12.

Choosing M=1ϵ12log(6δ), we have that with probability at least 1-δ, all the following hold:

|f^i<j-fi<j| ϵ1,
|f^j<i-fj<i| ϵ1,
|g^i,j-gi,j| ϵ1.

Hence, with probability at least 1-δ, we have:

p^ij =f^i<js0-f^j<is2g^i,j(s02-s22)+f^i<js0-f^j<is2
(fi<j+ϵ1)s0-(fj<i-ϵ1)s2(gi,j-ϵ1)(s02-s22)+(fi<j-ϵ1)s0-(fj<i+ϵ1)s2
=fi<js0-fj<is2+ϵ1(s0+s2)gi,j(s02-s22)+fi<js0-fj<is2-ϵ1(s02-s22+s0+s2)
=pij11-ϵ1(s02-s22+s0+s2)gi,j(s02-s22)+fi<js0-fj<is2
+ϵ1s0+s2gi,j(s02-s22)+fi<js0-fj<is2+o(ϵ1)
=pij(1+ϵ1(s02-s22+s0+s2)gi,j(s02-s22)+fi<js0-fj<is2)
+ϵ1s0+s2gi,j(s02-s22)+fi<js0-fj<is2+o(ϵ1)
p^ij-pij ϵ1(s02-s22+s0+s2)pmaxgi,j(s02-s22)+fi<js0-fj<is2
+ϵ1s0+s2gi,j(s02-s22)+fi<js0-fj<is2+o(ϵ1).

Using the results from Lemma 3, we have:

fi<j =𝒫j(i)pijs0+𝒫i(j)pjis2,
gi,j =𝒫j(i)(1-pij).

We use it to simplify the denominator:

denominator =gi,j(s02-s22)+fi<js0-fj<is2
=(𝒫j(i)(1-pij))(s02-s22)
+(𝒫j(i)pijs0+𝒫i(j)pjis2)s0
-(𝒫i(j)pjis0+𝒫j(i)pijs2)s2
=𝒫j(i)(s02-s22)
s02-s22N.

Plugging back above:

p^ij-pij ϵ1(s02-s22+s0+s2)pmaxgi,j(s02-s22)+fi<js0-fj<is2
+ϵ1s0+s2gi,j(s02-s22)+fi<js0-fj<is2+o(ϵ1)
ϵ1N(s02-s22+s0+s2)pmaxs02-s22+ϵ1Ns0+s2s02-s22+o(ϵ1).

By symmetry, we obtain:

|p^ij-pij|ϵ1N(s02-s22+s0+s2)pmax+s0+s2s02-s22+o(ϵ1).

By choosing ϵ1=ϵNs02-s22(s02-s22+s0+s2)pmax+s0+s2, we therefore have:
With M=N2ϵ2log(6δ)((s02-s22+s0+s2)pmax+s0+s2)2(s02-s22)2 samples, with probability at least 1-δ, we have |p^ij-pij|ϵ.

Appendix B Bounded-degree graphs

B.1 Solving the system

Lemma 9.

Let Δ=(s02-s22)2-4(Vjis2-Vijs0)(Vijs2-Vjis0). We have:

Δ(s02-s22)2(1-pmax)21+pmax2.
Proof.

Finding a lower bound for Δ can be achieved through minimizing Δ, or maximizing (Vjis2-Vijs0)(Vijs2-Vjis0). We want to solve:

maximizeVij,Vji (Vjis2-Vijs0)(Vijs2-Vjis0)
subject to Vij=pijs0+pjis21+pijpji,
Vji=pjis0+pijs21+pijpji,
pij0,
pmax-pij0,
pji0,
pmax-pji0.

To do so, we introduce Lagrangian multipliers. By replacing Vij and Vji with their actual value, the optimization problem above only has affine constraints, so it satisfies the linearity constraint qualification for the Karush-Kuhn-Tucker conditions. In other words, all the partial derivatives of the Lagrangian are equal to 0 for an optimal point.

=(Vij,Vji,pij,pji,λ1,λ2,μ1,μ2,μ3,μ4)
=(Vjis2-Vijs0)(Vijs2-Vjis0)
-λ1(vij(1+pijpji)-pijs0+pjis2)
-λ2(vji(1+pijpji)-pjis0+pijs2)
-μ1pij-μ2(pmax-pij)
-μ3pji-μ4(pmax-pji).

We calculate the gradients of .

Vij =Vji(s02+s22)-2Vijs0s2-λ1(1+pijpji),
Vji =Vij(s02+s22)-2Vjis0s2-λ2(1+pijpji),
pij =-λ1(Vijpji-s0)-λ2(Vjipji-s2)-μ1+μ2,
pji =-λ1(Vijpij-s2)-λ2(Vjipij-s0)-μ3+μ4.

From now on, we find the set X0 of points for which all the partial derivatives are null. We know the solution of the maximization problem is the point of X0 which maximizes the objective function.

Let us assume an interior point solution exists. For this point, all the gradients of are equal to 0. Since it is an interior point, we also have μ1=μ2=μ3=μ4=0 by complementary slackness. Solving this system, we obtain:

λ1 =(s02-s22)pjis0-pijs2(1+pijpji)2,
λ2 =(s02-s22)pijs0-pjis2(1+pijpji)2.

Plugging this in above, the condition pij=0 becomes pijpji(1-pji)=0. However, this is impossible for an interior point, since 0<pij,pji<pmax<1. Therefore, the extrema of Δ are attained when at least one constraint is active.

We notice that if the conditions pij=0 or pji=0 are active, then (Vjis2-Vijs0)(Vijs2-Vjis0)=0. Let us suppose (without loss of generality, by symmetry of the problem) that we have pij=pmax. The objective function is then increasing in pji. Therefore, Δ is minimized when pij=pji=pmax, which implies vij=Vji=pmax(s0+s2)1+pmax2. In this case:

Δ (s02-s22)2-4Vij2(s0-s2)2
(s02-s22)2-4(pmax(s0+s2)1+pmax2)2(s0-s2)2
(s02-s22)2[1-4pmax1+pmax2]
(s02-s22)2(1-pmax)21+pmax2.

This expression is always positive, which is what we wanted.

Theorem 9.

For any graph, for any noise distribution having finite values, we can learn the weights of all the edges of the graph. In particular, we can compute a quantity which converges to the true weight of each edge:

p^ij=2(V^jis2-V^ijs0)(s02-s22)+(s02-s22)2-4(V^jis2-V^ijs0)(V^ijs2-V^jis0).
Proof.
V^ij MVij
:=pijs0+pjis21+pijpji
pji =Vij-pijs0s2-Vijpij

We can plug this in Vji:

Vji =pjis0+pijs21+pijpji
Vji[1+pijVij-pijs0s2-Vijpij] =Vij-pijs0s2-Vijpijs0+pijs2

After some shuffling around, we obtain the second-degree equation:

Vjis2-Vijs0+(s02-s22)pij+(Vijs2-Vjis0)pij2=0

We recall that by definition, s0s2. We also notice that if pij=q1 and pji=q2 is a pair of solutions of this system, then pij=1q2 and pji=1q1 forms the other pair of solution, which implies there is uniqueness of solutions in [0,pmax]. Since the real probabilities of infection satisfy this system, we also know the solution exists. Let Δ=(s02-s22)2-4(Vjis2-Vijs0)(Vijs2-Vjis0). The only solution of this system in [0,pmax] is:

pij =-(s02-s22)+Δ2(Vijs2-Vjis0)
=(s02-s22)2-(s02-s22)2-4(Vjis2-Vijs0)(Vijs2-Vjis0)2(Vjis0-Vijs2)((s02-s22)+Δ)
=2(Vjis2-Vijs0)(s02-s22)+Δ

B.2 Sample complexity

Proposition 10.

As the number of cascades M goes to infinity, the estimators below tend to the following limit:

h^i,j2 M1N(pij+pji)ki,j(1-pik)(1-pjk)
f^i<j2 M1N(pijs0+pjis2)ki,j(1-pik)(1-pjk)
e^i1 M1N(1-pij)ki,j(1-pik)
Proof.

Using the law of large numbers:

f^i<j2 M𝔼[f^i<j2]
=(i source, infects j, no other infections, delay 0)
+(j source, infects i, no other infections, delay 2)
=1Npijki,j(1-pik)ki,j(1-pjk)s0
+1Npjiki,j(1-pjk)ki,j(1-pik)s2
=1N(pijs0+pjis2)ki,j(1-pik)(1-pjk).

In the same vein, we have:

h^i,j2 M𝔼[h^i,j2]
=(i source, infects j, no other infections)
+(j source, infects i, no other infections)
=1N(pij+pji)ki,j(1-pik)(1-pjk)
e^i1 M𝔼[e^i1]
=(i source, no other infections)
=1Nki(1-pik)
=1N(1-pij)ki,j(1-pik).

Proposition 11.

With probability 1-δN2, with M= samples, we can estimate Vij with precision ϵV.

Proof.

As in Proposition 4, we use Hoeffding’s inequality:

(|f^i<j2-fi<j2|>ϵ1) 2e-2Mϵ12,
(|h^i,j2-hi,j2|>ϵ1) 2e-2Mϵ12,
(|e^i1-ei1|>ϵ1) 2e-2Mϵ12,
(|e^j1-ej1|>ϵ1) 2e-2Mϵ12.

We use this to bound above Vij:

V^ij =f^i<j2h^i,j2+Ne^i1e^j1
fi<j2+ϵ1hi,j2+ϵ1+N(ei1+ϵ1)(ej1+ϵ1)
=Vij1+ϵ1fi<j21+ϵ1+Nϵ1(ei1+ej1)hi,j2+Nei1ej1
=Vij[1+ϵ1fi<j2+ϵ11+N(ei1+ej1)hi,j2+Nei1ej1+o(ϵ1)]

We bound below the denominators:

fi<j2 1N(pijs0+pjis2)(1-pmax)2d
pmins2N(1-pmax)2d
hi,j2+Nei1ej1 1N(1+pijpji)(1-pmax)2d
1N(1-pmax)2d

We bound above the numerator:

N(ei1+ei1) =N(1Nki(1-pik)+1Nkj(1-pjk))
N(1N+1N)
2.

Plugging in above:

V^ij =Vij[1+ϵ1fi<j2+ϵ11+N(ei1+ej1)hi,j2+Nei1ej1+o(ϵ1)]
Vij[1+ϵ1pmins2N(1-pmax)2d+ϵ1(1+2)1N(1-pmax)2d+o(ϵ1)].

Using Vij1, and by symmetry:

|V^ij-Vij| =ϵ1N(1-pmax)2d[1pmins2+3]+o(ϵ1)
=ϵ1N(1-pmax)2d[1+3pmins2pmins2]+o(ϵ1)
ϵ14Npmins2(1-pmax)2d+o(ϵ1).

Therefore, by union bound, and by choosing ϵ14Npmins2(1-pmax)2d=ϵV, and setting 2e-2Mϵ12=δ3N2, we obtain:
With M=1ϵV216N2pmin2s22(1-pmax)4d2log(3N)-log(δ)2 samples, we can guarantee |V^ij-Vij|ϵV with probability at least 1-δN2. ∎

Proposition 12.

Assuming we can estimate Vij within precision ϵV, then we can estimate pij within precision ϵ=6ϵV(1+pmax2)(s02-s22)2(1-pmax)2.

Proof.

If we know |V^ij-Vij|<ϵV and |V^ji-Vji|<ϵV:

p^ij =2(V^jis2-V^ijs0)(s02-s22)+(s02-s22)2-4(V^jis2-V^ijs0)(V^ijs2-V^jis0)
2(Vjis2-Vijs0)+2ϵV(s0+s2)(s02-s22)2+Δ-4ϵV(s0+s2)2(Vij+Vji).

We recall s0+s21, Vij1, pij1, and 1+pmax2Δ1 (Lemma 7). Hence:

p^ij 2(Vjis2-Vijs0)+2ϵV(s02-s22)2+Δ1-8ϵVΔ
pij(1+4ϵVΔ((s02-s22)2+Δ))+2ϵV(s02-s22)2+Δ+o(ϵV)
pij+6ϵVΔ.

By symmetry, and using the bound on Δ stated above, we conclude that if we know Vij and Vji up to precision ϵV, we know pij up to precision ϵ=6ϵV(1+pmax2)(s02-s22)2(1-pmax)2.

Theorem 10.

In the limited-noise setting, with probability at least 1-δ, with M=O(e4pmax(d+1)pmin2s22(s02-s22)4N2ϵ2log(Nδ)) samples, we can learn the weights of any bounded-degree graph up to precision epsilon.

Proof.

With probability at least 1-δN2, using
M=1ϵV216N2pmin2s22(1-pmax)4d2log(3N)-log(δ)2 samples, we can guarantee |V^ij-Vij|ϵV with probability at least 1-δN2 samples, knowing 1ϵV=6(1+pmax2)ϵ(s02-s22)2(1-pmax)2. This gives us a sample complexity of:

M 18ϵ2(s02-s22)4(1-pmax)4(d+1)N2[4pmins2]2log(9N2δ)
1152e4pmax(d+1)pmin2s22(s02-s22)4N2ϵ2log(9N2δ)
=𝒪(e4pmax(d+1)pmin2s22(s02-s22)4N2ϵ2log(Nδ)).