Abstract
We consider the problem of learning the weighted edges of a graph byobserving the noisy times of infection for multiple epidemic cascades on thisgraph. Past work has considered this problem when the cascade information,i.e., infection times, are known exactly. Though the noisy setting is wellmotivated by many epidemic processes (e.g., most human epidemics), to the bestof our knowledge, very little is known about when it is solvable. Previous workon the nonoise setting critically uses the ordering information. If noise canreverse this  a node's reported (noisy) infection time comes after thereported infection time of some node it infected  then we are unable to seehow previous results can be extended. We therefore tackle two versions of the noisy setting: the limitednoisesetting, where we know noisy times of infections, and the extremenoisesetting, in which we only know whether or not a node was infected. We provide apolynomial time algorithm for recovering the structure of bidirectional treesin the extremenoise setting, and show our algorithm almost matches lowerbounds established in the nonoise setting, and hence is optimal up tologfactors. We extend our results for general degreebounded graphs, whereagain we show that our (polytime) algorithm can recover the structure of thegraph with optimal sample complexity. We also provide the first efficientalgorithm to learn the weights of the bidirectional tree in the limitednoisesetting. Finally, we give a polynomial time algorithm for learning the weightsof general boundeddegree graphs in the limitednoise setting. This algorithmextends to general graphs (at the price of exponential running time), provingthe problem is solvable in the general case. All our algorithms work for anynoise distribution, without any restriction on the variance.
Quick Read (beta)
Learning Graphs from Noisy Epidemic Cascades
Abstract
We consider the problem of learning the weighted edges of a graph by observing the noisy times of infection for multiple epidemic cascades on this graph. Past work has considered this problem when the cascade information, i.e., infection times, are known exactly. Though the noisy setting is well motivated by many epidemic processes (e.g., most human epidemics), to the best of our knowledge, very little is known about when it is solvable. Previous work on the nonoise setting critically uses the ordering information. If noise can reverse this – a node’s reported (noisy) infection time comes after the reported infection time of some node it infected – then we are unable to see how previous results can be extended. We therefore tackle two versions of the noisy setting: the limitednoise setting, where we know noisy times of infections, and the extremenoise setting, in which we only know whether or not a node was infected. We provide a polynomial time algorithm for recovering the structure of bidirectional trees in the extremenoise setting, and show our algorithm almost matches lower bounds established in the nonoise setting, and hence is optimal up to logfactors. We extend our results for general degreebounded graphs, where again we show that our (polytime) algorithm can recover the structure of the graph with optimal sample complexity. We also provide the first efficient algorithm to learn the weights of the bidirectional tree in the limitednoise setting. Finally, we give a polynomial time algorithm for learning the weights of general boundeddegree graphs in the limitednoise setting. This algorithm extends to general graphs (at the price of exponential running time), proving the problem is solvable in the general case. All our algorithms work for any noise distribution, without any restriction on the variance.
1 Introduction
Epidemic models accurately represent (among other processes) the spread of diseases, information (rumors, viral videos, news stories, etc.), the spread of malevolent agents in a network (computer viruses, malicious apps, etc.), or even biological processes (pathways in cell signaling networks, chains of activation in the gene regulatory network, etc.).
We focus on epidemics that spread on an underlying graph [30], as opposed to the fully mixed models introduced in the early literature [4].
Most settings assume we know the underlying graph and aim to study properties of the spread. Much work has been done in detection [2, 3, 27, 26, 25, 24, 21], where the goal is to decide whether or not there is indeed an infection. This problem is of importance in deciding whether or not a computer network is under attack, for instance, or whether a product gets sold through wordofmouth or thanks to the advertisement campaign (or both [28]). More specifically, the problem of source detection [32, 33, 31, 34, 35] or obfuscation [11, 13, 12] has been extensively studied. On the other side of the spectrum, both experimental and theoretical work has tackled the problem of modeling [7, 36, 17], predicting the growth [6, 38], and controlling the spread of epidemics [9, 10, 18, 14].
In this work, we take the opposite approach: assuming we know some properties of the spread, can we recover the underlying graph? The early works on this subject proposed a few heuristics and experimentally proved their effectiveness [16, 19]. Netrapalli et al. [29] established the first theoretical guarantees for this problem for discretetime infections. They proved one can recover the edges of any graph with correlation decay, with access to the times of infection for multiple cascades spreading on the graph. They introduced a likelihood, proved it decouples into convex subproblems, and demonstrated that the edges of the graph can therefore be obtained efficiently. They also proved a sample complexity lower bound and showed their method is within a log factor of it. Abrahao et al. [1] also introduced a method of solving this problem, this time for a more realistic, continuoustime infection model, through learning only the first edge of each cascade. Zarezade et al. [37] proposed a first experimental attempt to tackle the case of correlated cascades using Hawkes processes. Khim et al. [22] extended the theoretical results to the case where the the cascades spreading on the graph are not independent, which required completely new machinery involving martingales and weighted Pólya urns.
All the results above assume we have perfect knowledge of the properties of the spread we use to reconstruct the graph. For most of the literature, those are the times of infection for all nodes for each cascade. This assumption may hold for online epidemics, as information is usually dated (for instance, posts or retweets on social networks have time stamps). For human networks, however, this assumption is often unrealistic: official diagnosis (and hence recording by any tracking entity such as the CDC) may come days, weeks, or in important examples such as HIV, years after the actual moment of infection. Moreover, this can be highly variable from person to person, hence the infector is often diagnosed after the infectee. Similar issues arise with biological networks: we only know the expression of a gene when a measure is taken, which can happen after a typically arbitrary delay.
We therefore develop a method for learning the graph of epidemics with noisy times of infection. We demonstrate that past approaches are unusable, due to the fact that even small levels of noise are typically enough to cause orderofdiagnosis to differ from orderofinfection. We develop new techniques to deal with this setting, and prove our algorithm can reliably learn the edges of a tree in the limitednoise setting, for any noise distribution. We also show we can learn the structure of any bounded degree graph from a very weak observation model, in an sampleoptimal fashion (up to logfactors). We finally provide an algorithm which learns the weights of any boundeddegree graph in the limitednoise setting.
$G=(V,E)$  Graph $G$, $V$ set of nodes, $E$ set of edges. 

$N$  Number of nodes in the graph. 
${T}_{i}^{m}$  Random variable for the actual time of infection of node $i$ during cascade $m$. 
${n}_{i}^{m}$  Noise of node $i$ during cascade $m$. 
${T}_{i}^{{}^{\prime}m}={T}_{i}^{m}+{n}_{i}^{m}$  Random variable for the noisy time of infection of node $i$ during cascade $m$. 
${I}_{i}^{m}$  Random boolean variable. (${I}_{i}^{{}^{\prime}m}=True\iff $ node $i$ was infected during cascade $m$). 
${p}_{ij}$  Weight of edge $(i,j)$, corresponding to the probability that $i$ infects $j$. 
1.1 Model
We observe epidemics spreading on a graph, and aim to reconstruct the graph structure from noisy estimates of the times of infection. In this section, we specify the exact propagation model, the noisy observation model, and the two learning tasks this work tackles.
Propagation model: We consider a particular variant of the independent cascade model, close to the onestep model introduced by [15] and further studied by [20]. The epidemic spreads in discrete time on a weighted directed graph $G=(V,E)$, where parents can infect their children with probability equal to the weight of the edge between them, but children cannot infect their parents. We allow bidirectional edges: it is possible that both $(i,j)\in E$ and $(j,i)\in E$, possibly with different weights. For each edge $(i,j)\in E$, the corresponding weight ${p}_{ij}$ is such that $$.
This process is an instance of a $Susceptible\to Infected\to Removed$ (SIR) process. Each node starts out in the susceptible state. As in [1], each cascade $m$ starts at a positive time ^{1}^{1} 1 Most of the literature considers the initial time of infection to be 0. This is because when we have access to the exact times of infection, we can make this assumption without loss of generality. In our case, it would imply we know exactly when an outbreak started, which is usually not the case. on a unique node source node, picked uniformly among the nodes of the graph. Once the source becomes infected, it is removed from the graph, and if it has children, each is infected at the next time step independently according to a probability specified by the weight of the edge shared with the source.
The process ends when there are no newly infected nodes (either because no infection happened during the previous time step, in which case some nodes may never be infected, or because all the nodes of the graph are removed). One realization of this process from start to finish is called a cascade. If two nodes are infected during the same cascade, we say that they are coinfected for this cascade. This process is illustrated in Figure 1.
Observation model: Let ${T}_{i}^{m}$ be a random variable corresponding to the time of infection of node $i$ during cascade $m$, and let ${t}_{i}^{m}$ be its realization (if $i$ stays in the susceptible state during cascade $m$, we have ${t}_{i}^{m}=\mathrm{\infty}$). We introduce three observation models.
In the nonoise setting, we have access to the exact times of infection ${T}_{i}^{m}$.
In the limitednoise setting, we never get to observe the exact times of infection ${T}_{i}^{m}$, but only a noisy version ${T}_{i}^{{}^{\prime}m}={T}_{i}^{m}+{n}_{i}^{m}$ (with realization ${t}_{i}^{{}^{\prime}m}$), where all the ${n}_{i}^{m}$ are i.i.d., and represent the noise added to the ${T}_{i}^{m}$. We assume ${n}_{i}^{m}$ follows a known distribution $\mathcal{D}$. The only restriction we put on $\mathcal{D}$ is that it cannot have infinite value (i.e., ${t}_{i}^{{}^{\prime}m}=\mathrm{\infty}\iff {t}_{i}^{m}=\mathrm{\infty}$, and we know for a fact when nodes have been infected or not).
In the extremenoise setting, we take the previous setting to the extreme, and we assume that instead of having access to the noisy times of infection ${T}_{i}^{{}^{\prime}m}$, we only have access to the infection status of the nodes ${I}_{i}^{m}$. We know ${I}_{i}^{m}=True$ if $i$ was infected during cascade $m$, and ${I}_{i}^{m}=False$ otherwise. Note that $$, so we can always deduce the infection status from the noisy times of infection. However, we cannot guess the noisy times of infection from the infection status: the (noisy) times of infections contain strictly more information than the infection status.
For these three settings, we call a sample the vector of all observations for the cascade $m$. In the nonoise setting, this is the extendedinteger vector ${\{{t}_{i}^{m}\}}_{i\in V}$. In the limitednoise setting, this is the extendedinteger vector ${\{{t}_{i}^{{}^{\prime}m}\}}_{i\in V}$. In the extremenoise setting, this is the boolean vector corresponding to the realization of ${\{{I}_{i}^{m}\}}_{i\in V}$. We also use the notation ${T}^{\prime}={\{{T}_{i}^{{}^{\prime}m}\}}_{i\in V}^{m=1\mathrm{\dots}M}$ (respectively ${t}^{\prime}={\{{t}_{i}^{{}^{\prime}m}\}}_{i\in V}^{m=1\mathrm{\dots}M}$) for the matrix representing the random variable (respectively the realizations) of all the samples.
Learning tasks: We focus on two different learning tasks. When we learn the structure of a graph, it means that for any two nodes $i$ and $j$, we can decide whether or not there exists an edge between these two nodes (whatever its direction). When we learn the weights of the graph, it means that for every two nodes $i$ and $j$, we learn the exact value^{2}^{2} 2 When $(i,j)\notin E$, we have ${p}_{ij}=0$. of both ${p}_{ij}$ and ${p}_{ji}$ up to precision $\u03f5$.
1.2 Why is it a hard problem?
1.2.1 Counting approaches
Most approaches in the nonoise setting relate to counting. In our setting, for instance, a natural (and consistent) estimator for ${p}_{ij}$ is to count how often an infection occurred along an edge, and divide it by how often such an infection could have happened:
$$\widehat{{p}_{ij}}=\frac{\text{Number of times}j\text{becomes infected one time step after}i}{\text{Number of times}i\text{was infected before}j}.$$ 
In the nonoise setting, $j$ could only have been infected by a node signaling exactly one time step before $j$. However, in the limitednoise setting, $j$ signaling its infection one time step after $i$ could stem from a variety of scenarios:

•
$i$ could have indeed infected $j$: cases a), b) and c) of Figure LABEL:fig:second above.

•
$j$ could have infected $i$, but the noise flipped the order of signaling: cases d), e) and f).

•
No infection happen between $i$ and $j$, and the probability of infectin depends mainly on another node $k$: cases g), e) and f). This could happen for any other node $k$ in the graph.
The natural estimator introduced earlier is therefore not consistent anymore; instead, it tends to a quantity which depends on ${p}_{ij}$, but also ${p}_{ji}$, and ${p}_{ik},{p}_{ki},{p}_{jk},{p}_{kj}$ as well, for all the other nodes $k$ in the graph. By counting the number of times $j$ became infected one time step after $i$, we are not counting the number of infections along the edge $(i,j)$ anymore, but instead a mixture of all the scenarios described above, which not only include the cases where $j$ infected $i$, but also events in which the cascade spread through another node $k$, and the edge $(i,j)$ was irrelevant to the process. Using this estimator, or any obvious (to us) extension of it, would not only imply learning the wrong weights for the edges, but also learning edges when there are no edges. Our first contribution is therefore to design a new set of estimators, from which we can deduce the value of ${p}_{ij}$ (Sections 2.3 and 3.2).
Adding noise in the time of infection not only reverses the cascade chronology, it also exponentially increased the number of possible infection patterns that could have happened. Bounding the realm of possibilities is therefore our second step towards solving the problem (Section 2.1).
1.2.2 Maxlikelihood approaches
Another common approach is to use likelihoodbased methods. For instance, in [29], the authors develop a maxlikelihoodbased approach to learn the edges of the graph. They prove the loglikelihood has desirable properties: it decouples into only one local problem for each node, and this local problem is convex (up to the change of variable ${\theta}_{ij}=\mathrm{log}(1{p}_{ij})$):
$\mathcal{L}(T,{P}_{ij})$  $$  
$\mathcal{L}(T,{P}_{ij})$  $$ 
In our setting, the loglikelihood has none of these properties. It is not convex, and it is unclear any method other than brute force could find its maximum. Moreover, it does not decouple anymore, and even computing the loglikelihood itself takes exponential time.
$\mathcal{L}(T,{P}_{ij})$  $=\mathrm{log}({\displaystyle \frac{1}{N}}\cdot \underset{\begin{array}{c}\text{This is the root}\\ \text{of the difficulty.}\end{array}}{\underset{\u23df}{{\displaystyle \sum _{({t}_{1},\mathrm{\dots},{t}_{N})\le ({t}_{1}^{\prime}1,\mathrm{\dots},{t}_{N}^{\prime}1)}}}}\cdot {\displaystyle \prod _{1\le i\le N}}\underset{\begin{array}{c}\text{Noise at node}\\ \text{}i\text{is}{t}_{i}^{\prime}{t}_{i}\text{.}\end{array}}{\underset{\u23df}{\left(n({t}_{i}^{\prime}{t}_{i})\right)}}$  
$$  
$$ 
When dealing with hidden variables, a common technique would be to use the ExpectationMaximization algorithm [8]. However, in our setting, the number of hidden states is $\prod _{i=1}^{N}}{t}_{i}^{\prime$, which can be as large as ${(N1)}^{N}$. This prohibits any realistic use of the ExpectationMaximization algorithm for networks with more than twelve nodes. Moreover, except for the recent contributions [23], very little is known about the theoretical convergence of the ExpectationMaximization algorithm.
1.3 Contributions
The contributions of this article are multiple:

•
To the best of our knowledge, we are the first to tackle the problem of learning the edges of a graph from noisy times of infection, a simple but natural extension of a wellknown problem.

•
We provide the first efficient algorithm for learning the structure and weights of a bidirectional tree in this setting. We also establish a treespecific lower bound which shows that our algorithm is sampleoptimal (up to logfactors) for learning the structure of the tree (Section 2).

•
We prove it is possible to learn the structure of any boundeddegree graph in the extreme setting for which we only have access to the infection status (i.e., whether or not a node was infected). Moreover, we can do so with almost optimal sample complexity, according to the bound established in [29].

•
We provide polynomial algorithms for learning the weights of bounded degree graphs.

•
Finally, we extend the results from boundeddegree graphs to general graphs. This proves the problem is solvable under any noise distribution, although the exponential sample complexity and running time prohibits any use of this algorithm in practice (Section 3.2).
2 Learning bidirectional trees
The bidirectional tree is the simplest example which illustrates some of the difficulties of the noisy setting. Indeed, for a directed tree, the true sequence of infections can be reconstructed, and we can use techniques from the nonoise setting. For a bidirectional tree, those techniques cannot be extended. However, the uniqueness of paths in the bidirectional tree still makes this problem considerably easier than the general setting. We therefore start by presenting a solution for the bidirectional tree. The key ideas here generalize to the neighborhoodbased decomposition we introduce below, which forms our key conceptual approach for the general problem.
This section contains three contributions. First, we show how to learn the structure of a tree using only the infection status, i.e., what we call the extremenoise setting (Section 2.1). For each cascade, we only know which nodes were infected. We show this contains enough information to learn the structure of bidirectional tree. Second, we establish a lower bound for the nonoise setting, and show our algorithm has the same dependency in the number of nodes $N$ as this lower bound (up to logfactors). In other words, for the task of learning the structure of any tree, an optimal algorithm in the nonoise setting would need as many cascades as our algorithm needs in the extremenoise setting (up to logfactors).
Finally, we show how we can leverage this learned structure to learn the weights of the tree, this time when we have noisy access to the times of infection, i.e., the limitednoise setting (Section 2.3). We provide sample complexity for this task.
2.1 Tree structure
As illustrated in Section 1.2, the number of edges that could exist is much higher in the limitednoise setting than the number of actual edges in the tree. Our first key contribution is therefore to introduce a new estimator, ${\widehat{h}}_{ij}$, which keeps track of the fraction of cascades for which $i$ and $j$ were both infected. This estimator can therefore be computed only with the infection status in the extremenoise setting. Using this estimator, we show that in the specific case where the graph is a tree, we learn the structure of this tree, $i.e.$ whether or not both ${p}_{ij}={p}_{ij}=0$.
Our algorithm for learning the edges of the tree relies on one central observation: ${h}_{i,j}$ achieves a kind of local maximum if there is an edge between $i$ and $j$ (Lemma 1). This observation relies heavily on the fact that there is uniqueness of paths on a tree. Let us now dive into the proof.
Definition 1.
Let ${\widehat{h}}_{i\mathrm{,}j}$ be the fraction of cascades in which both $i$ and $j$ became infected. We have:
$${\widehat{h}}_{i,j}{\to}_{M\to \mathrm{\infty}}\mathbb{P}({I}_{i}^{m}\&{I}_{j}^{m}):={h}_{i,j}.$$ 
We now show that the limit ${h}_{i,j}$ of the estimator ${\widehat{h}}_{i,j}$ satisfies a local maximum property on the edges of the tree:
Lemma 1.
If $i$ and $j$ are not neighbors, let $\mathrm{(}{u}_{\mathrm{0}}\mathrm{,}{u}_{\mathrm{1}}\mathrm{,}\mathrm{\dots}\mathrm{,}{u}_{d}\mathrm{)}$ be the path between them, with ${u}_{\mathrm{0}}\mathrm{=}i$, ${u}_{d}\mathrm{=}j$, and $d\mathrm{>}\mathrm{1}$. Then:
$$ 
Proof.
We consider the case in which both $i$ and $j$ have been infected. There is a unique source of infection and a unique path between $i$ and $j$. Therefore, all the nodes on the path from $i$ to $j$ must have been infected as well. In particular, both ${u}_{r}$ and ${u}_{r+1}$ were infected. This shows ${I}_{i}^{m}\&{I}_{j}^{m}\Rightarrow {I}_{{u}_{r}}^{m}\&{I}_{{u}_{r+1}}^{m}$, so $\mathbb{P}({I}_{i}^{m}\&{I}_{j}^{m})\le \mathbb{P}({I}_{{u}_{r}}^{m}\&{I}_{{u}_{r+1}}^{m})$, and therefore ${h}_{i,j}\le {h}_{{u}_{r},{u}_{r+1}}$.
What’s more, every time ${u}_{r}$ and ${u}_{r+1}$ became infected, at least one more infection along an edge must have occurred in order for $j$ to become infected as well. This occurred with probability at most $$. Therefore, ${h}_{i,j}=\mathbb{P}({I}_{i}^{m}\&{I}_{j}^{m})\le {p}_{max}\cdot \mathbb{P}({I}_{{u}_{r}}^{m}\&{I}_{{u}_{r+1}}^{m})={p}_{max}\cdot {h}_{{u}_{r},{u}_{r+1}}$. We conclude $$.
∎
This simple lemma allows us to design Algorithm 2.1. Indeed, suppose we have access to all the limits ${h}_{i,j}$. By ordering them in decreasing order, we can deduce the structure of the tree by greedily adding every edge unless it forms a cycle^{3}^{3} 3 This algorithm is very similar in spirit to Kruskal’s algorithm for finding the maximum spanning tree .
[H] {algorithmic}[1] \ProcedureLearnTree${\{{h}_{i,j}\}}_{i,j\in V}$ \Comment${h}_{i,j}$ limit of ${\widehat{h}}_{i,j}$. \State$pairs\mathrm{\_}h\mathrm{\_}edge\leftarrow [({h}_{i,j},(i,j))$ for $1\le i\le j\le {n}_{nodes}]$ \State$\mathrm{\text{Sort}}(pairs\mathrm{\_}h\mathrm{\_}edge)$ by decreasing order \State$edges\mathrm{\_}tree\leftarrow []$ \For$(\sim ,potential\mathrm{\_}edge)\in pairs\mathrm{\_}h\mathrm{\_}edge$ \IfAdding $potential\mathrm{\_}edge$ to $edges\mathrm{\_}tree$ does not create a cycle \StateAdd $potential\mathrm{\_}edge$ to $edges\mathrm{\_}tree$ \EndIf\EndFor\Return$edges\mathrm{\_}tree$ \EndProcedure We show that if we have access to the limits ${h}_{i,j}$ of the estimators ${\widehat{h}}_{i,j}$, the algorithm above correctly find the structure of the tree.
Lemma 2.
Algorithm 2.1 correctly finds all the $N\mathrm{}\mathrm{1}$ pairs $\mathrm{(}i\mathrm{,}j\mathrm{)}$ such that there exists at least one directed edge between $i$ and $j$.
Proof.
We show that in the forloop at line 1, we add an edge to $edges\mathrm{\_}tree$ if and only if this edge was a real edge in the original tree. We prove it by induction on the elements of the sorted list $pairs\mathrm{\_}h\mathrm{\_}edge$.
When no element has been selected, the proposition is trivially true.
Suppose now that $t$ elements of $pairs\mathrm{\_}h\mathrm{\_}edge$ have been examined so far. Let $(\sim ,(i,j))$ be the $t+{1}^{th}$ element. Two cases arise:

1.
$i$ and $j$ are not neighbors. Let $({u}_{0},\mathrm{\dots},{u}_{d})$ be the path between them, with ${u}_{0}=i$ and ${u}_{d}=j$. In this case, using Lemma 1, $\forall r,{h}_{{u}_{r},{u}_{r+1}}>{h}_{i,j}$. In other words, all the pairs $({u}_{r},{u}_{r+1})$ have already been considered by the algorithm. By induction, we have kept all of them in $edges\mathrm{\_}tree$. Therefore, adding the pair $(i,j)$ would form a cycle. This pair is not kept in $edges\mathrm{\_}tree$, which is what we wanted since it is not an edge in the original tree.

2.
$i$ and $j$ are neighbors. Suppose that adding this pair forms a cycle. Then there is a sequence $({v}_{0}=i,\mathrm{\dots},{v}_{d}=j)$ of nodes such that ${h}_{{v}_{k},{v}_{k+1}}$ were all bigger than ${h}_{i,j}$, and the pairs $({v}_{k},{v}_{k+1})$ were kept by the algorithm for all $k$. However, by uniqueness of paths in a tree, there exists a pair $({v}_{a},{v}_{a+1})$ such that the path connecting ${v}_{a}$ and ${v}_{a+1}$ in the original tree goes through $(i,j)$. Using Lemma 1, this means ${h}_{i,j}>{h}_{{v}_{a},{v}_{a+1}}$, which is a contradiction. Therefore, adding this pair in $edges\mathrm{\_}tree$ does not form a cycle. This pair is kept in $edges\mathrm{\_}tree$.
Therefore, this algorithm keeps all the edges, and only the edges of the tree, so it recovers the tree structure.
∎
We next quantify how many cascades $M$ are needed for Algorithm 2.1 to be correct if we replace the ${\{{h}_{i,j}\}}_{i,j\in V}$ by their estimates ${\{{\widehat{h}}_{i,j}\}}_{i,j\in V}$. We note that we do not require ${\widehat{h}}_{i,j}$ to be close to their limit, but only need the order of the ${\widehat{h}}_{i,j}$ to be the same as the order of the ${h}_{i,j}$. We identify events which guarantee that the order is the same (Corollary 1):
Definition 2.
Let ${\mathrm{H}}_{\mathrm{3}}\mathrm{:=}\mathrm{\{}\mathrm{(}i\mathrm{,}j\mathrm{,}k\mathrm{)}\mathrm{\in}{\mathrm{\{}\mathrm{1}\mathrm{,}\mathrm{\dots}\mathrm{,}N\mathrm{\}}}^{\mathrm{3}}\mathrm{,}{p}_{i\mathit{}j}\mathrm{+}{p}_{j\mathit{}i}\mathrm{>}\mathrm{0}\mathrm{\&}{p}_{j\mathit{}k}\mathrm{+}{p}_{j\mathit{}k}\mathrm{>}\mathrm{0}\mathrm{\}}$ be the set of triplets of nodes such that at least one directed edge exists between the first and the second node, as well as between the second and the third node.
Proposition 1.
If:
$$\forall (i,j,k)\in {\mathscr{H}}_{3},{\widehat{h}}_{i,j}>{\widehat{h}}_{i,k}\mathit{\text{and}}{\widehat{h}}_{j,k}{\widehat{h}}_{i,k},$$ 
then for all paths $\mathrm{(}{u}_{\mathrm{0}}\mathrm{,}\mathrm{\dots}\mathrm{,}{u}_{d}\mathrm{)}$ in the tree, with $d\mathrm{>}\mathrm{1}$, we have:
$$\forall r\in \{0,\mathrm{\dots},d1\},{\widehat{h}}_{{u}_{r},{u}_{r+1}}>{\widehat{h}}_{{u}_{0},{u}_{d}}.$$ 
Proof.
For $r\in \{0,\mathrm{\dots},d2\}$, we have by hypothesis ${\widehat{h}}_{{u}_{r},{u}_{r+1}}>{\widehat{h}}_{{u}_{r},{u}_{r+2}}$. Now, we recall that ${\widehat{h}}_{{u}_{0},{u}_{d}}$ is the number of cascades for which both ${u}_{0}$ and ${u}_{d}$ were infected. By uniqueness of paths in the tree, every time both ${u}_{0}$ and ${u}_{d}$ were infected, both ${u}_{r}$ and ${u}_{r+2}$ must have been infected as well. This shows that ${\widehat{h}}_{{u}_{0},{u}_{d}}\le {\widehat{h}}_{{u}_{r},{u}_{r+2}}$. Notice that this is a deterministic property, not an asymptotic property. Therefore, ${\widehat{h}}_{{u}_{r},{u}_{r+1}}>{\widehat{h}}_{{u}_{r},{u}_{r+2}}\ge {\widehat{h}}_{{u}_{0},{u}_{d}}$.
For $r=d1$, we follow an identical reasoning, but with ${\widehat{h}}_{{u}_{r},{u}_{r+1}}>{\widehat{h}}_{{u}_{r1},{u}_{r+1}}$.
Corollary 1.
If:
$$\forall (i,j,k)\in {\mathscr{H}}_{3},{\widehat{h}}_{i,j}>{\widehat{h}}_{i,k}\mathit{\text{and}}{\widehat{h}}_{j,k}{\widehat{h}}_{i,k},$$ 
then the correctness of Algorithm 2.1 is preserved when given $\widehat{h}$ as input instead of h.
In other words, Algorithm 2.1 outputs a correct set of undirected edges with finite samples.
Proof.
According to Proposition 1, for all paths $({u}_{0},\mathrm{\dots},{u}_{d})$ in the tree, with $d>1$, we have that $\forall r\in \{0,\mathrm{\dots},d1\},{\widehat{h}}_{{u}_{r},{u}_{r+1}}>{\widehat{h}}_{{u}_{0},{u}_{d}}$. As shown in the proof of Lemma 2, this is the only property of the input needed in order to yield the correct output.
Proposition 2.
With $M\mathrm{=}\frac{N\mathit{}\mathrm{\left(}\mathrm{log}\mathit{}\mathrm{\left(}\frac{\mathrm{1}}{\delta}\mathrm{\right)}\mathrm{+}\mathrm{2}\mathit{}\mathrm{log}\mathit{}\mathrm{(}N\mathrm{)}\mathrm{\right)}}{{p}_{m\mathit{}i\mathit{}n}\mathit{}\mathrm{(}\mathrm{1}\mathrm{}{p}_{m\mathit{}a\mathit{}x}\mathrm{)}}$ cascades, with probability at least $\mathrm{1}\mathrm{}\delta $, we have:
$$\forall (i,j,k)\in {\mathscr{H}}_{3},{\widehat{h}}_{i,j}>{\widehat{h}}_{i,k}\mathit{\text{and}}{\widehat{h}}_{j,k}{\widehat{h}}_{i,k}.$$ 
Proof.
Let us consider one triplet $(i,j,k)$ in ${\mathscr{H}}_{3}$. We recall that ${\widehat{h}}_{i,k}$ is the number of cascades for which both $i$ and $k$ were infected. Since the only path from $i$ to $k$ is through $j$, we always have that ${\widehat{h}}_{i,j}\ge {\widehat{h}}_{i,k}\text{and}{\widehat{h}}_{j,k}\ge {\widehat{h}}_{i,k}$. We notice that to obtain ${\widehat{h}}_{i,j}>{\widehat{h}}_{i,k}$, we only need one cascade for which both $i$ and $j$ got infected, but not $k$. We lower bound the probability ${P}_{\text{triplet identified}}$ of this cascade happening. For each cascade $m$, we have:
${P}_{\text{triplet identified}}$  $=\mathbb{P}({I}_{i}^{m}\&{I}_{j}^{m}\&\mathrm{\text{Not}}({I}_{k}^{m}))$  
$\ge \mathbb{P}(i\text{was a source,}i\text{infected}j\text{,}j\text{did not infect}k)$  
$\mathrm{\hspace{1em}}+\mathbb{P}(j\text{was a source,}j\text{infected}i\text{,}j\text{did not infect}k)$  
$\ge {\displaystyle \frac{1}{N}}\cdot {p}_{ij}\cdot (1{p}_{jk})+{\displaystyle \frac{1}{N}}\cdot {p}_{ji}\cdot (1{p}_{jk})$  
$\ge {\displaystyle \frac{1}{N}}{p}_{min}(1{p}_{max}).$ 
The probability that this event never occurs during the $M$ cascades is upper bounded by:
$\mathbb{P}({\widehat{h}}_{i,j}={\widehat{h}}_{i,k})$  $\le {\left(1{\displaystyle \frac{1}{N}}{p}_{min}(1{p}_{max})\right)}^{M}$  
$\le {e}^{\frac{M}{N}{p}_{min}(1{p}_{max})}$  
$\le {\displaystyle \frac{\delta}{{N}^{2}}}.$ 
Now, there are $N1$ edges in a tree, therefore $$. By union bound:
$\mathbb{P}({\displaystyle \bigcup _{(i,j,k)\in {\mathscr{H}}_{3}}}{\widehat{h}}_{i,j}={\widehat{h}}_{i,k})$  $\le {\displaystyle \sum _{(i,j,k)\in {\mathscr{H}}_{3})}}\mathbb{P}({\widehat{h}}_{i,j}={\widehat{h}}_{i,k})$  
$$  
$\le \delta .$ 
Notice that ${\mathscr{H}}_{3}$ contains both $(i,j,k)$ and $(k,j,i)$. We have therefore proven that with probability at least $1\delta $, when considering $M=\frac{N\left(\mathrm{log}\left(\frac{1}{\delta}\right)+2\mathrm{log}(N)\right)}{{p}_{min}(1{p}_{max})}$ cascades, we have $\forall (i,j,k)\in {\mathscr{H}}_{3},{\widehat{h}}_{i,j}>{\widehat{h}}_{i,k}\text{and}{\widehat{h}}_{j,k}{\widehat{h}}_{i,k}.$
Putting together Proposition 2 and Corollary 1, we obtain our first theorem for learning the undirected edges with finite samples:
Theorem 1.
With $M\mathrm{=}\frac{N\mathit{}\mathrm{\left(}\mathrm{log}\mathit{}\mathrm{\left(}\frac{\mathrm{1}}{\delta}\mathrm{\right)}\mathrm{+}\mathrm{2}\mathit{}\mathrm{log}\mathit{}\mathrm{(}N\mathrm{)}\mathrm{\right)}}{{p}_{m\mathit{}i\mathit{}n}\mathit{}\mathrm{(}\mathrm{1}\mathrm{}{p}_{m\mathit{}a\mathit{}x}\mathrm{)}}$ cascades, with probability at least $\mathrm{1}\mathrm{}\delta $, we can learn the structure of a any bidirectional tree in the extremenoise setting, i.e., when we only have access to the infection status of the nodes.
2.2 Lower bound
In this section, we prove a lower bound for trees in the nonoise setting. With very minor adjustments, we adapt the lower bound of [29]. Since for a general tree, the max degree can be up to $N1$, we design a lower bound which is independent from the maxdegree. Let $G$ be a tree drawn uniformly from $\mathcal{G}$, the set of all possible trees on $N$ nodes, and $\widehat{G}$ be the reconstructed graph from the times of infection. $G\leftrightarrow T\leftrightarrow \widehat{G}$ therefore forms a Markov chain. We have:
$H(G)$  $=I(G;\widehat{G})+H(G\widehat{G})$  
(data processing inequality)  $\le I(G;T)+H(G\widehat{G})$  
(independent cascades)  $\le M\cdot I({T}^{1};{T}^{{}^{\prime}1})+H(G\widehat{G})$  
($I(X,Y)\le H(X)$)  $=M\cdot {\displaystyle \sum _{i=1}^{N}}H({T}_{i}^{1})+H(G\widehat{G})$  
(Fano’s inequality)  $\le M\cdot {\displaystyle \sum _{i=1}^{N}}H({T}_{i}^{1})+(1+{P}_{e}\mathrm{log}(\mathcal{G}))$ 
Since $G$ is drawn uniformly from $\mathcal{G}$, $H(G)=\mathrm{log}(\mathcal{G})$. There are ${N}^{N2}$ trees on $N$ nodes, according to Cayley’s formula [5], so $H(G)=(N2)\cdot \mathrm{log}(N)$.
In conclusion:
$$M\ge \frac{(1{P}_{e})(N2)\mathrm{log}(N)1}{N\cdot H({T}_{i}^{1})}=\mathrm{\Omega}\left(\frac{\mathrm{log}(N)}{H({T}_{i}^{1})}\right)$$ 
Using the same kind of techniques as in [29], we can assume $H({T}_{i}^{1})\sim \frac{\mathrm{log}(N)}{N}$. Therefore:
Theorem 2.
In the nonoise setting, we need $M\mathrm{=}\mathrm{\Omega}\mathit{}\mathrm{\left(}N\mathrm{\right)}$ cascades to learn the tree structure.
In our extremenoise setting, when we have only access to the infection status of the nodes, we can learn the tree structure with the same sample complexity (up to logfactors) as the nonoise setting!
2.3 Tree weights
In this section, we assume we are in the limitednoise setting, and we have access to the times of infection. We also assume we have already learned the structure of the tree.
Once we have reduced the set of possible edges by learning the structure of the bidirectional tree, learning the weights of the edges is still nontrivial. Indeed, from ${T}_{i}^{{}^{\prime}m}$ and ${T}_{j}^{{}^{\prime}m}$, it is still impossible to know whether this sample is useful for estimating ${p}_{ij}$ (case when $i$ infected $j$), or whether we should use this sample for estimating ${p}_{ji}$ instead (case when $j$ infected $i$). What is more, we only get one sample per node and per cascade, so it is impossible to know what really happened during that cascade. However, knowing the distribution of the noise, it is possible to compute the probability that the noise maintained the order of infections. Using this information and the reduced set of known undirected edges, we can compute two sets of $N(N1)$ estimators, from which it is possible to infer the weights of all edges in the tree.
We introduce these two sets of $N(N1)$ estimators, or, in other words, two estimators for each directed edge. These estimators tend to multivariate polynomials of the weights of most edges of the tree. Thus in general these polynomials have exponentially many terms; however, when $i$ and $j$ are neighbors, it is possible to express them concisely using a quantity ${\mathcal{P}}_{\overline{)j}}\phantom{\rule{veryverythickmathspace}{0ex}}(\to i)$, which we define formally below. This succinct representation is the key idea we exploit to solve the resulting system of equations.
Once we know the structure of the bidirectional tree, we can consider the four estimators for each undirected edge (two estimators for each directed edge). They form a system of four equations and four unknowns, which we solve to obtain the weights of the edges.
Definition 3.
${\mathcal{P}}_{\overline{)i}}\phantom{\rule{veryverythickmathspace}{0ex}}(\to j)$ is the probability that $j$ became infected before any node on the path from $j$ to $i$, including $i$, became infected.
We now introduce the estimators:
Definition 4.
We introduce 2 sets of $N\mathit{}\mathrm{(}N\mathrm{}\mathrm{1}\mathrm{)}$ estimators:
$$  $=\mathit{\text{Fraction of infections for which}}i\mathit{\text{and}}j\mathit{\text{got infected, and}}i\mathit{\text{reported before}}j\mathit{\text{.}}$  
${\widehat{g}}_{i,\overline{)j}}$  $=\mathit{\text{Fraction of infections for which}}i\mathit{\text{got infected, but}}j\mathit{\text{did not.}}$ 
By the law of large numbers, as the number of cascades scales, $$ tends to $$ and ${\widehat{g}}_{i,\overline{)j}}$ to ${g}_{i,\overline{)j}}$, where
$$  $:=$  $$  
${g}_{i,\overline{)j}}$  $:=$  $$ 
We now compute the exact values of these two quantities. Let us assume the (unique) path between $i$ and $j$ has length $d$. We call $({u}_{0},\mathrm{\dots},{u}_{d})$ the set of nodes on the path from $i$ to $j$, with ${u}_{0}=i$ and ${u}_{d}=j$. We then have:
Lemma 3.
Recall $$ and ${g}_{i\mathrm{,}\overline{)j}}$ are the expectation of the estimators defined above. We have:
$$  $={\mathcal{P}}_{\overline{)j}}(\to i)\cdot {p}_{i\to j}\cdot {s}_{d+1}+{\mathcal{P}}_{\overline{)i}}(\to j)\cdot {p}_{j\to i}\cdot {s}_{d+1}$  
$\mathrm{\hspace{1em}}+{\displaystyle \sum _{l=1}^{d1}}{\mathcal{P}}_{\overline{)i},\overline{)j}}(\to {u}_{l})\cdot {p}_{{u}_{l}\to i}\cdot {p}_{{u}_{l}\to j}\cdot {s}_{2ld+1}.$  
${\widehat{g}}_{i,\overline{)j}}$  $={\mathcal{P}}_{\overline{)j}}(\to i)\cdot (1{p}_{i\to j})+{\displaystyle \sum _{l=1}^{d1}}{\mathcal{P}}_{\overline{)i},\overline{)j}}(\to {u}_{l})\cdot {p}_{{u}_{l}\to i}\cdot (1{p}_{{u}_{l}\to j}).$ 
What’s more, when $i$ and $j$ are neighbors (which implies $d\mathrm{=}\mathrm{1}$), the expressions simplify to:
$$  $={\mathcal{P}}_{\overline{)j}}(\to i)\cdot {p}_{ij}\cdot {s}_{0}+{\mathcal{P}}_{\overline{)i}}(\to j)\cdot {p}_{ji}\cdot {s}_{2}$  
${g}_{i,\overline{)j}}$  $={\mathcal{P}}_{\overline{)j}}(\to i)\cdot (1{p}_{ij}).$ 
Proof.
$$  $$  
$=\mathbb{P}(i\text{got infected before any nodes on the path from}i\text{to}j\text{,}$  
$\mathrm{\hspace{1em}}\text{then infected}j\text{, and the noise did not flip the times of infection})$  
$\mathrm{\hspace{1em}}+\mathbb{P}(j\text{got infected before any nodes on the path from}i\text{to}j\text{,}$  
$\mathrm{\hspace{1em}}\text{then infected}i\text{, and the noise flipped the times of infection})$  
$\mathrm{\hspace{1em}}+\mathbb{P}(\text{One node on the path from}i\text{to}j\text{got infected before}i\text{and}j\text{,}$  
$\mathrm{\hspace{1em}}\text{then infected both}i\text{and}j\text{, but}i\text{reported before}j)$  
$={\mathcal{P}}_{\overline{)j}}(\to i)\cdot {p}_{i\to j}\cdot {s}_{d+1}$  
$\mathrm{\hspace{1em}}+{\mathcal{P}}_{\overline{)i}}(\to j)\cdot {p}_{j\to i}\cdot {s}_{d+1}$  
$\mathrm{\hspace{1em}}+{\displaystyle \sum _{l=1}^{d1}}{\mathcal{P}}_{\overline{)i},\overline{)j}}(\to {u}_{l})\cdot {p}_{{u}_{l}\to i}\cdot {p}_{{u}_{l}\to j}\cdot {s}_{2ld+1}.$ 
This expression is involved in general. However, if $i$ and $j$ are neighbors, then there are no nodes ${u}_{l}$ on the path between $i$ and $j$, other than $i$ and $j$ themselves. What’s more, ${p}_{i\to j}={p}_{ij}$, and $d=1$. Therefore:
$$ 
Let us now focus on ${g}_{i,\overline{)j}}$:
${g}_{i,\overline{)j}}$  $$  
$=\mathbb{P}(i\text{got infected before any nodes on the path from}i\text{to}j\text{, but}j\text{did not get infected})$  
$\mathrm{\hspace{1em}}+\mathbb{P}(\text{One node on the path from}i\text{to}j\text{got infected before}i\text{and}j\text{, then infected}i\text{but not}j)$  
$={\mathcal{P}}_{\overline{)j}}(\to i)\cdot (1{p}_{i\to j})$  
$\mathrm{\hspace{1em}}+{\displaystyle \sum _{l=1}^{d1}}{\mathcal{P}}_{\overline{)i},\overline{)j}}(\to {u}_{l})\cdot {p}_{{u}_{l}\to i}\cdot (1{p}_{{u}_{l}\to j}).$ 
As before, this expression is complex in general, but simplifies if $i$ and $j$ are neighbors, in which case:
$${g}_{i,\overline{)j}}={\mathcal{P}}_{\overline{)j}}(\to i)\cdot (1{p}_{ij}).$$ 
∎
Using the simplified expression only for when $i$ and $j$ are neighbors, we obtain:
Proposition 3.
If we know $\mathrm{(}i\mathrm{,}j\mathrm{)}$ is an edge in the original tree, then the probability of infection along this edge is given by:
$$ 
Proof.
According to Lemma 3, we had four secondorder equations, with 4 unknowns: ${p}_{ij}$, ${p}_{ji}$, ${\mathcal{P}}_{\overline{)j}}\phantom{\rule{veryverythickmathspace}{0ex}}(\to i)$ and ${\mathcal{P}}_{\overline{)i}}\phantom{\rule{veryverythickmathspace}{0ex}}(\to j)$. We solve it, and obtain the wanted result. See Appendix A for details.
∎
Combining all the pieces, we obtain our first theorem for infinite samples:
Theorem 3.
It is possible to learn the weights of a bidirectional tree in the limitednoise setting.
Now that we have proven the problem is solvable, we establish the number of samples needed to learn the weights with the method above.
Lemma 4.
With $M\mathrm{=}\frac{{N}^{\mathrm{2}}}{{\u03f5}^{\mathrm{2}}}\mathit{}\mathrm{log}\mathit{}\mathrm{\left(}\frac{\mathrm{6}}{\delta}\mathrm{\right)}\mathit{}\frac{{\mathrm{\left(}\mathrm{(}{s}_{\mathrm{0}}^{\mathrm{2}}\mathrm{}{s}_{\mathrm{2}}^{\mathrm{2}}\mathrm{+}{s}_{\mathrm{0}}\mathrm{+}{s}_{\mathrm{2}}\mathrm{)}\mathit{}{p}_{m\mathit{}a\mathit{}x}\mathrm{+}{s}_{\mathrm{0}}\mathrm{+}{s}_{\mathrm{2}}\mathrm{\right)}}^{\mathrm{2}}}{{\mathrm{(}{s}_{\mathrm{0}}^{\mathrm{2}}\mathrm{}{s}_{\mathrm{2}}^{\mathrm{2}}\mathrm{)}}^{\mathrm{2}}}$ samples, with probability at least $\mathrm{1}\mathrm{}\delta $, we have:
$${\widehat{p}}_{ij}{p}_{ij}\le \u03f5.$$ 
Proof.
Using Hoeffding’s inequality:
$$  $\le 2{e}^{2M{\u03f5}_{1}^{2}},$  
$$  $\le 2{e}^{2M{\u03f5}_{1}^{2}},$  
$\mathbb{P}({\widehat{g}}_{i,\overline{)j}}{g}_{i,\overline{)j}}>{\u03f5}_{1})$  $\le 2{e}^{2M{\u03f5}_{1}^{2}}.$ 
Choosing $M=\frac{1}{{\u03f5}_{1}^{2}}\mathrm{log}\left(\frac{6}{\delta}\right)$, we have that with probability at least $1\delta $, all the following hold:
$$  $\le {\u03f5}_{1},$  
$$  $\le {\u03f5}_{1},$  
${\widehat{g}}_{i,\overline{)j}}{g}_{i,\overline{)j}}$  $\le {\u03f5}_{1}.$ 
Hence, with probability at least $1\delta $, we have (see Appendix A for details):
${\widehat{p}}_{ij}{p}_{ij}$  $$  
$$ 
We use the results from Lemma 3 to bound the denominator by $\frac{{s}_{0}^{2}{s}_{2}^{2}}{N}$. In the end, we obtain:
$${\widehat{p}}_{ij}{p}_{ij}\le {\u03f5}_{1}N\frac{({s}_{0}^{2}{s}_{2}^{2}+{s}_{0}+{s}_{2}){p}_{max}+{s}_{0}+{s}_{2}}{{s}_{0}^{2}{s}_{2}^{2}}+o({\u03f5}_{1}).$$ 
We choose ${\u03f5}_{1}=\frac{\u03f5}{N}\frac{{s}_{0}^{2}{s}_{2}^{2}}{({s}_{0}^{2}{s}_{2}^{2}+{s}_{0}+{s}_{2}){p}_{max}+{s}_{0}+{s}_{2}}$. Therefore:
With $M=\frac{{N}^{2}}{{\u03f5}^{2}}\mathrm{log}\left(\frac{6}{\delta}\right)\frac{{\left(({s}_{0}^{2}{s}_{2}^{2}+{s}_{0}+{s}_{2}){p}_{max}+{s}_{0}+{s}_{2}\right)}^{2}}{{({s}_{0}^{2}{s}_{2}^{2})}^{2}}$ samples, with probability at least $1\delta $, we have ${\widehat{p}}_{ij}{p}_{ij}\le \u03f5$.
∎
By a union bound on all the weights of the tree, knowing there are at most $$ directed edges in a directed tree, we obtain the following sample complexity:
Theorem 4.
With $M\mathrm{=}\frac{{N}^{\mathrm{2}}}{{\u03f5}^{\mathrm{2}}}\mathit{}\mathrm{log}\mathit{}\mathrm{\left(}\frac{\mathrm{12}\mathit{}{N}^{\mathrm{2}}}{\delta}\mathrm{\right)}\mathit{}\frac{{\mathrm{\left(}\mathrm{(}{s}_{\mathrm{0}}^{\mathrm{2}}\mathrm{}{s}_{\mathrm{2}}^{\mathrm{2}}\mathrm{+}{s}_{\mathrm{0}}\mathrm{+}{s}_{\mathrm{2}}\mathrm{)}\mathit{}{p}_{m\mathit{}a\mathit{}x}\mathrm{+}{s}_{\mathrm{0}}\mathrm{+}{s}_{\mathrm{2}}\mathrm{\right)}}^{\mathrm{2}}}{{\mathrm{(}{s}_{\mathrm{0}}^{\mathrm{2}}\mathrm{}{s}_{\mathrm{2}}^{\mathrm{2}}\mathrm{)}}^{\mathrm{2}}}$ cascades, with probability $\mathrm{1}\mathrm{}\delta $, we can learn all the weights of the edges of a bidirectional tree in the limitednoise setting, i.e., when we only have access to the noisy times of infection.
3 Boundeddegree graphs
In the previous section, the algorithm presented relies heavily on the uniqueness of paths. This property implies that we can deduce the edges from the nodes which are coinfected the most often. However, this is not true for a general boundeddegree graph. In Figure 4, we can see that the two nodes $i$ and $j$ would be coinfected frequently despite not sharing an edge. This makes the task of finding the structure much more challenging than for the bidirectional tree.
In this section, we show how the main ideas for learning the structure of the bidirectional tree can be extended for learning the structure of general boundeddegree graphs, in the extremenoise setting, with almost optimal sample complexity. The framework for learning the weights of the edges in the limitednoise setting is  to the best of our knowledge  not extendable to general boundeddegree graphs; we therefore develop a new algorithm to learn the weights for general boundeddegree graphs.
3.1 Boundeddegree structure
In the previous section, we introduced the estimator ${\widehat{h}}_{i,j}$, which records the fraction of cascades for which both $i$ and $j$ are infected. From a local maximum property of this estimator, we deduced the structure of the tree, in a sample efficient fashion. Indeed, if there exists a path between $i$ and $k$, and the first edge on this path is $(i,j)$, then if $i$ and $k$ are infected, $j$ must have been infected as well.
We want to build on this idea for a boundeddegree graph of maximum degree $d$. However, for such a graph, there may be multiple paths leading from $i$ to $j$, and we cannot guarantee a single node will be infected each time. However, if $i$ is a node, ${\mathcal{N}}_{i}$ is its neighborhood, and $k\notin {\mathcal{N}}_{i}$ is another node of the graph, we can guarantee that if both $i$ and $k$ are infected, there exists a node in ${\mathcal{N}}_{i}$ which is infected. Moreover, ${\mathcal{N}}_{i}$ is the set of smallest size for which at least one node is infected at the same time as $i$ the most frequently. This leads us to a new set of estimators:
Definition 5.
Let $i$ be a node of the graph, and let $S$ be a set such that $$ and $i\mathrm{\notin}S$. We define a new set of estimators:
${\widehat{h}}_{i,S}$  $=\mathit{\text{Fraction of cascades for which}}i\mathit{\text{is infected, and at least one node of S is infected.}}$ 
Let us assume that for each pair $(i,S)$, we have access to the limit ${h}_{i,S}$ of ${\widehat{h}}_{i,S}$. We now introduce an algorithm showing how to leverage these limits to learn the structure of any boundeddegree graph. {algorithm}[H] {algorithmic}[1] \ProcedureLearnGraph${\{{h}_{i,S}\}}_{i\in V}^{S\le d}$ \State$edges\leftarrow []$ \For$i=1\mathrm{\dots}{n}_{nodes}$ \State$S\mathrm{\_}max\mathrm{\_}i\leftarrow $ set such that ${h}_{i,S\mathrm{\_}max\mathrm{\_}i}$ is maximal, and such that $\mathrm{\text{Size}}(S\mathrm{\_}max\mathrm{\_}i)$ is minimal. \For${n}_{j}$ in $S\mathrm{\_}max\mathrm{\_}i$ \StateAdd edge $(i,{n}_{j})$ to $edges$ \EndFor\EndFor\Return$edges$ \EndProcedure
We show this algorithm is correct.
Lemma 5.
Algorithm 3.1 correctly finds the neighborhood of each node.
Proof.
Let us recall that ${h}_{i,S}$ is the probability that node $i$ and at least one node of $S$ are coinfected. To prove the correctness of this algorithm, it suffices to prove:
$$\forall i\in V,\forall Ss.t.S\le d,{h}_{i,S}\ge {h}_{i,{\mathcal{N}}_{i}}\u27f9{\mathcal{N}}_{i}\subseteq S$$ 
Let pick a set $S$ such that ${\mathcal{N}}_{i}\setminus S\ne \mathrm{\varnothing}$, and let $k$ be a node in ${\mathcal{N}}_{i}\setminus S$. We know ${h}_{i,S\cup \{k\}}\ge {h}_{i,S}+\mathbb{P}(i\text{and}k\text{are the only infected nodes})$. Since $i$ and $k$ are neighbors, $\mathbb{P}(i\text{and}k\text{are the only infected nodes})0$, and therefore ${h}_{i,S\cup \{k\}}>{h}_{i,S}$. Following this line of reasoning, if ${\mathcal{N}}_{i}\setminus S\ne \mathrm{\varnothing}$, $S$ we can always increase the value of ${h}_{i,S}$ by adding a node of ${\mathcal{N}}_{i}$.
However, it is impossible to increase the value of ${h}_{i,{\mathcal{N}}_{i}}$, because if $i$ and any other node of the graph are coinfected, we know one node of ${\mathcal{N}}_{i}$ is also infected. Therefore, the algorithm is correct.
∎
Unfortunately, we do not have direct access to ${h}_{i,S}$. We therefore study how many samples are needed to replace ${h}_{i,S}$ by its estimate ${\widehat{h}}_{i,S}$ while preserving the correctness of the algorithm. Just like in Section 2.1, we notice that we do not need the ${\{{\widehat{h}}_{i,S}\}}_{i\in V}^{S\le d}$ to be close to their limit, we only need the indexes of their ordering to be the same. We notice that the neighborhood ${\mathcal{N}}_{i}$ of $i$ is the set of smallest size which is the most often infected at the same time as $i$ is (${\mathcal{N}}_{i}=\underset{R\le d}{\mathrm{arg}\mathrm{max}}{h}_{i,R}$). Therefore, we must have that for every set $S$, ${\widehat{h}}_{i,S}\le {\widehat{h}}_{i,{\mathcal{N}}_{i}}$. However, some other sets might achieve the same value if we do not observe enough cascades. This could happen in two cases:

•
Not all the nodes of the neighborhood were infected, and therefore a subset ${T}_{1}\subset {\mathcal{N}}_{i}$ of the neighborhood is such that ${\widehat{h}}_{i,{T}_{1}}={\widehat{h}}_{i,{\mathcal{N}}_{i}}$. Since $$, the algorithm would return ${T}_{1}$ and not ${\mathcal{N}}_{i}$, which would be a failure case.

•
Some other node $k$ is always infected every time a specific node $j\in {\mathcal{N}}_{i}$ is infected. The set ${T}_{2}={\mathcal{N}}_{i}\setminus \{j\}\cup \{k\}$ will therefore be such that ${\widehat{h}}_{i,{T}_{2}}={\widehat{h}}_{i,{\mathcal{N}}_{i}}$, and the algorithm would not know which maximum to pick. This is a failure case as well.
We identify events which guarantee that the failure cases above cannot arise. Let ${E}_{i,j}^{m}$ be the event that $i$ and $j$ were the only infected nodes during cascade $m$. ${E}_{i,j}={\displaystyle \bigcup _{1\le m\le M}}{E}_{i,j}^{m}$ is therefore the event that there exists a cascade for which only $i$ and $j$ were infected. If such a cascade exists, the set ${S}_{max}=\underset{R\le d}{\mathrm{arg}\mathrm{max}}{\widehat{h}}_{i,R}$ must contain $j$. If the event ${E}_{i,j}$ happens for every node $j$ in the neighborhood of $i$, we can therefore guarantee Algorithm 3.1 is correct. We characterize the sample complexity needed for this below.
Proposition 4.
Let $i$ be a node, and let $j$ be one of its neighbors. Let $S$ be a set of size $\mathrm{}S\mathrm{}\mathrm{\le}d$, such that $i\mathrm{\notin}S$ and $j\mathrm{\notin}S$. With probability at least $\mathrm{1}\mathrm{}\frac{\delta}{d\mathrm{\cdot}{N}^{d\mathrm{+}\mathrm{2}}}$, among $M\mathrm{=}\frac{\mathrm{(}d\mathrm{+}\mathrm{2}\mathrm{)}\mathrm{\cdot}N\mathit{}\mathrm{log}\mathit{}\mathrm{(}N\mathrm{)}\mathrm{+}N\mathit{}\mathrm{log}\mathit{}\mathrm{\left(}\frac{d}{\delta}\mathrm{\right)}}{{p}_{m\mathit{}i\mathit{}n}\mathit{}{\mathrm{(}\mathrm{1}\mathrm{}{p}_{m\mathit{}a\mathit{}x}\mathrm{)}}^{\mathrm{2}\mathit{}\mathrm{(}d\mathrm{}\mathrm{1}\mathrm{)}}}$ cascades, there exists a cascade in which $i$ and $j$ are infected, but no nodes of $S$ are infected.
Proof.
We notice that if, during cascade $m$, the only infected nodes are $i$ and $j$, no nodes of $S$ are infected. This is exactly the event ${E}_{i,j}^{m}$ defined above.
$\mathbb{P}({E}_{i,j}^{m})$  $=\mathbb{P}(i\text{and}j\text{are the only infected nodes during cascade}m)$  
$\ge \mathbb{P}(\text{cascade started at}i\text{,}j\text{was the only infected neighbor of}i\text{, and}j\text{did not infect any nodes})$  
$\ge {\displaystyle \frac{1}{N}}\cdot {p}_{min}\cdot {(1{p}_{max})}^{2(d1)}.$ 
The probability that this never happens during $M$ cascade is exactly the complement of the event ${E}_{i,j}$ defined above:
$\mathbb{P}(\mathrm{\text{Not}}({E}_{i,j}))$  $\le {\left(1{\displaystyle \frac{1}{N}}\cdot {p}_{min}\cdot {(1{p}_{max})}^{2(d1)}\right)}^{M}$  
$\le {e}^{M\cdot \frac{{p}_{min}}{N}\cdot {(1{p}_{max})}^{2(d1)}}$  
$\le {\displaystyle \frac{\delta}{d\cdot {N}^{d+2}}}.$ 
∎
Lemma 6.
With probability at least $\mathrm{1}\mathrm{}\delta $, if we observe $M\mathrm{=}\frac{\mathrm{(}d\mathrm{+}\mathrm{2}\mathrm{)}\mathrm{\cdot}N\mathit{}\mathrm{log}\mathit{}\mathrm{(}N\mathrm{)}\mathrm{+}N\mathit{}\mathrm{log}\mathit{}\mathrm{\left(}\frac{d}{\delta}\mathrm{\right)}}{{p}_{m\mathit{}i\mathit{}n}\mathit{}{\mathrm{(}\mathrm{1}\mathrm{}{p}_{m\mathit{}a\mathit{}x}\mathrm{)}}^{\mathrm{2}\mathit{}\mathrm{(}d\mathrm{}\mathrm{1}\mathrm{)}}}$ cascades, Algorithm 3.1 is correct, and has running time $\mathrm{O}\mathit{}\mathrm{(}M\mathrm{\cdot}{N}^{d\mathrm{+}\mathrm{2}}\mathrm{)}\mathrm{=}\mathrm{O}\mathit{}\mathrm{(}d\mathrm{\cdot}{N}^{d\mathrm{+}\mathrm{3}}\mathit{}\mathrm{log}\mathit{}\mathrm{(}N\mathrm{)}\mathrm{)}$.
Proof.
Let $\mathcal{S}=\{S\in \mathcal{P}(V),S\le d\}$ be the set of sets of nodes of size at most $d$, let ${\mathcal{S}}_{\overline{)i}}=\{S\in \mathcal{S},i\notin S\}$ be the set of sets of $\mathcal{S}$ which do not contain $i$, and let ${\mathcal{N}}_{i}=\{j,(i,j)\in E\}$ be the neighborhood of $i$. We pick a set $S\in \mathcal{S}$, such that $S\ne {\mathcal{N}}_{i}$.
We first prove that ${\widehat{h}}_{i,S}\le {\widehat{h}}_{i,{\mathcal{N}}_{i}}$. Since the neighborhood of $i$ separates $i$ from the rest of the graph, and infected nodes are connected, we can conclude that every time $i$ and another node of $S$ are infected, one node in the neighborhood ${\mathcal{N}}_{i}$ of $i$ is infected as well. Therefore, we cannot increase ${\widehat{h}}_{i,S}$ without also increasing ${\widehat{h}}_{i,{\mathcal{N}}_{i}}$. In particular, this means that even if $S>{\mathcal{N}}_{i}$ (for instance if ${\mathcal{N}}_{i}\subset S$), we have ${\widehat{h}}_{i,S}\le {\widehat{h}}_{i,{\mathcal{N}}_{i}}$. We therefore know that ${\mathcal{N}}_{i}\in \underset{R\in {\mathcal{S}}_{\overline{)i}}}{\mathrm{arg}\mathrm{max}}{\widehat{h}}_{i,R}$.
We now prove that with probability at least $1\delta $, ${\mathcal{N}}_{i}$ is the set of $\underset{R\in {\mathcal{S}}_{\overline{)i}}}{\mathrm{arg}\mathrm{max}}{\widehat{h}}_{i,R}$ of minimal size. To do so, we notice that if there exists a cascade such that $i$ and $j\in {\mathcal{N}}_{i}$ are infected, but no node of $S$ is infected, then this implies $$. Indeed, as shown above, every time we increase ${\widehat{h}}_{i,S}$, we also increase ${\widehat{h}}_{i,{\mathcal{N}}_{i}}$, and we know there exists one cascade for which we increased ${\widehat{h}}_{i,{\mathcal{N}}_{i}}$ without increasing ${\widehat{h}}_{i,S}$. We now calculate the probability ${P}_{\mathrm{failure}}$ that such a cascade does not exist for all nodes $i$ in the graph, all nodes $j$ in their neighborhood, and all sets $S$ which do not include $i$ or $j$.
${P}_{\mathrm{failure}}$  $\le \mathbb{P}(\exists i\in V,\exists j\in {\mathcal{N}}_{i},\exists S\in {\mathcal{S}}_{\overline{)i},\overline{)j}},\text{every time}i\text{and}j\text{are infected, a node of}S\text{is also infected})$  
$\le {\displaystyle \sum _{i\in V}}{\displaystyle \sum _{j\in {\mathcal{N}}_{i}}}{\displaystyle \sum _{S\in {\mathcal{S}}_{\overline{)i},\overline{)j}}}}\mathbb{P}(\text{every time}i\text{and}j\text{are infected, a node of}S\text{is also infected}).$ 
We use Proposition 4 to bound this quantity:
${P}_{\mathrm{failure}}$  $\le {\displaystyle \sum _{i\in V}}{\displaystyle \sum _{j\in {\mathcal{N}}_{i}}}{\displaystyle \sum _{S\in {\mathcal{S}}_{\overline{)i},\overline{)j}}}}{\displaystyle \frac{\delta}{d\cdot {N}^{d+2}}}$  
$\le N\cdot d\cdot {N}^{d+1}\cdot {\displaystyle \frac{\delta}{d\cdot {N}^{d+2}}}$  
$\le \delta .$ 
We now know that with probability at least $1\delta $:
$$ 
This implies that no set of ${\bigcup}_{j\in {\mathcal{N}}_{i}}{\mathcal{S}}_{\overline{)i},\overline{)j}}$ can belong in $\underset{R\in {\mathcal{S}}_{\overline{)i}}}{\mathrm{arg}\mathrm{max}}{\widehat{h}}_{i,R}$. However, we have:
$${\mathcal{S}}_{\overline{)i}}\setminus \bigcup _{j\in {\mathcal{N}}_{i}}{\mathcal{S}}_{\overline{)i},\overline{)j}}=\{S\in \mathcal{S},{\mathcal{N}}_{i}\subseteq S\}.$$ 
In particular, this means that ${\mathcal{N}}_{i}$ is the only set of ${\mathcal{S}}_{\overline{)i}}\setminus {\bigcup}_{j\in {\mathcal{N}}_{i}}{\mathcal{S}}_{\overline{)i},\overline{)j}}$ of minimal size. This shows that with probability at least $1\delta $, ${\mathcal{N}}_{i}$ is the set of $\underset{R\in {\mathcal{S}}_{\overline{)i}}}{\mathrm{arg}\mathrm{max}}{\widehat{h}}_{i,R}$ of minimal size. This proves Algorithm 3.1 is correct, and that we can learn the structure of any graph of maximum degree $d$ with $M=\frac{(d+2)\cdot N\mathrm{log}(N)+N\mathrm{log}\left(\frac{d}{\delta}\right)}{{p}_{min}{(1{p}_{max})}^{2(d1)}}$ cascades.
Since we do at most one operation by pair of (node, set) and by cascade, the running time is $\mathcal{O}(N\cdot {N}^{d+1}\cdot M)$, which is what we wanted to prove.
∎
This leads to our theorem for learning the structure of any boundeddegree graph.
Theorem 5.
With probability at least $\mathrm{1}\mathrm{}\delta $, in the extremenoise setting, we can learn the structure of any graph of maximum degree $d$ with $M\mathrm{=}\mathrm{O}\mathit{}\mathrm{\left(}\frac{d\mathrm{\cdot}N\mathit{}\mathrm{log}\mathit{}\mathrm{\left(}\frac{N}{\delta}\mathrm{\right)}}{{p}_{m\mathit{}i\mathit{}n}\mathit{}{\mathrm{(}\mathrm{1}\mathrm{}{p}_{m\mathit{}a\mathit{}x}\mathrm{)}}^{\mathrm{2}\mathit{}d}}\mathrm{\right)}$ cascades in polynomial time.
Let us now assume that ${p}_{max}\sim \frac{1}{d}$. This assumption is reasonable when you expect a constant number of infections by time step. For instance, it makes sense for real diseases, for which carriers have to meet to transmit it (we can only meet a constant number of people each day). It would not make sense for social networks, in which it is possible to reach many followers with each post.
Corollary 2.
With probability at least $\mathrm{1}\mathrm{}\delta $, in the extremenoise setting, if we assume ${p}_{m\mathit{}a\mathit{}x}\mathrm{\sim}\frac{\mathrm{1}}{d}$ and ${p}_{m\mathit{}i\mathit{}n}$ constant, we can learn the structure of any graph of maximum degree $d$ with $M\mathrm{=}\mathrm{O}\mathit{}\mathrm{\left(}d\mathrm{\cdot}N\mathit{}\mathrm{log}\mathit{}\mathrm{\left(}\frac{N}{\delta}\mathrm{\right)}\mathrm{\right)}$. This sample complexity is optimal up to logfactors, and almost matches the lower bound established in the nonoise setting.
Proof.
We need at least $\mathcal{O}\left(d\cdot N\mathrm{log}\left(N\right)\right)$ samples to learn the structure of a boundeddegree graph with maximum degree $d$, according to the lower bound in [29].
3.2 Boundeddegree weights
For the remainder of this paper, we state results in the limitednoise setting.
If we consider cascades of size $k$, the exact probability of infection between two nodes is a multivariate polynomial of degree $N$ on $N(N1)$ variables (the variables here would be the weights of the graph), with a sum of up to $2\cdot {\displaystyle \sum _{l=1}^{k}}{\displaystyle \frac{(N2)!}{(Nk)!}}$ terms. If the graph has more than five nodes, the resulting polynomial is of degree more than five.
For our algorithm, we therefore only use cascades of size 1 or 2. This is a waste of the data, since we simply discard cascades of larger size. However, we are not aware of techniques on how to utilize larger cascades in the limitednoise setting. Cascades of size 1 or 2 are simple enough that we can write explicitly their probability. This allows use to:

1.
Design estimators for which we can calculate the exact limit.

2.
Combine these estimators to transform a polynomial system of degree $N$ to a polynomial system of degree 2.

3.
Solve this system exactly and obtain the probabilities of infection.
3.3 Estimators
We start by designing a few estimators:
Definition 6.
We introduce two sets of $N\mathit{}\mathrm{(}N\mathrm{}\mathrm{1}\mathrm{)}$ estimators and one set of $N$ estimators. These estimators can be computed even if we only have access to the noisy times of infection.
${\widehat{h}}_{i,j}^{2}$  $=\mathit{\text{Fraction of cascades for which only}}i\mathit{\text{and}}j\mathit{\text{are infected.}}$  
$$  $$  
${\widehat{e}}_{i}^{1}$  $=\mathit{\text{Fraction of cascades for which only}}i\mathit{\text{is infected.}}$ 
To simplify the coming notations, let us introduce ${s}_{k}$, which is the probability that the noise on $j$ has delay at least $k$ relative to the noise on $i$. Since the noise is i.i.d., this value is independent from $i$ and $j$: ${s}_{k}=\mathbb{P}({n}_{j}{n}_{i}\ge k)$^{4}^{4} 4 For instance, for geometric noise of parameter $q$, we have: ${s}_{k}={\displaystyle \sum _{{t}_{j}=\mathrm{max}(0,k)}^{\mathrm{\infty}}}{\displaystyle \sum _{{t}_{i}=0}^{{t}_{j}k}}{(1q)}^{{t}_{i}+{t}_{j}}={(1q)}^{\mathrm{max}(0,k)}\left(1{\displaystyle \frac{{(1q)}^{1\mathrm{min}(0,k)}}{2q}}\right).$. For instance, if $i$ infected $j$ during cascade $m$, the probability that the noise did not flip the order of infection (i.e. $$) is $\mathbb{P}({n}_{j}\ge {n}_{i})={s}_{0}$. In the reverse case, the probability that the noise flipped the order of infection is $$.
We now compute the limits of those estimators.
Proposition 5.
As the number of cascades $M$ goes to infinity, the estimators introduced above tend to the following limit:
${\widehat{h}}_{i,j}^{2}$  ${\to}_{M\to \mathrm{\infty}}{\displaystyle \frac{1}{N}}({p}_{ij}+{p}_{ji}){\displaystyle \prod _{k\ne i,j}}(1{p}_{ik})(1{p}_{jk}),$  
$$  ${\to}_{M\to \mathrm{\infty}}{\displaystyle \frac{1}{N}}({p}_{ij}\cdot {s}_{0}+{p}_{ji}\cdot {s}_{2}){\displaystyle \prod _{k\ne i,j}}(1{p}_{ik})(1{p}_{jk}),$  
${\widehat{e}}_{i}^{1}$  ${\to}_{M\to \mathrm{\infty}}{\displaystyle \frac{1}{N}}(1{p}_{ij}){\displaystyle \prod _{k\ne i,j}}(1{p}_{ik}).$ 
3.4 Solving the system
The limit of these estimators, as seen as a function of the probabilities of infection, is a complex polynomial on up to $2(N1)$ variables. The crux of our algorithm is to combine those estimators in order to cancel out most of these variables, and create $N1$ systems of two equations of degree 2 and two unknowns, which we then solve.
Proposition 6.
Let $$. Then the limit of ${\widehat{V}}_{i\mathit{}j}$ as the number of cascades $M$ goes to infinity only depends on the variables ${p}_{i\mathit{}j}$ and ${p}_{j\mathit{}i}$ :
$${\widehat{V}}_{ij}{\to}_{M\to \mathrm{\infty}}\frac{{p}_{ij}\cdot {s}_{0}+{p}_{ji}\cdot {s}_{2}}{1+{p}_{ij}\cdot {p}_{ji}}.$$ 
Proof.
Since all the estimators converge towards a constant, we can use Slutsky’s lemma to find the limit of ${\widehat{V}}_{ij}$. Let ${V}_{ij}$ be the limit of ${\widehat{V}}_{ij}$ as $M$ goes to infinity. We notice that the $\prod _{k\ne i,j}}(1{p}_{ik})(1{p}_{jk})$ parts cancel each other out:
${V}_{ij}$  $$  
$={\displaystyle \frac{\frac{1}{N}({p}_{ij}\cdot {s}_{0}+{p}_{ji}\cdot {s}_{2})\left[{\displaystyle \prod _{k\ne i,j}}(1{p}_{ik})(1{p}_{jk})\right]}{\left(\frac{1}{N}({p}_{ij}+{p}_{ji})+N\frac{(1{p}_{ij})(1{p}_{ji})}{{N}^{2}}\right)\left[{\displaystyle \prod _{k\ne i,j}}(1{p}_{jk})(1{p}_{jk})\right]}}$  
$={\displaystyle \frac{({p}_{ij}\cdot {s}_{0}+{p}_{ji}\cdot {s}_{2})}{1+{p}_{ij}\cdot {p}_{ji}}}.$ 
∎
We can therefore use this equality to deduce the weights of all the edges of the graph:
Theorem 6.
For any graph, for any noise distribution having finite values, we can learn the weights of all the edges of the graph. In particular, we can compute a quantity which converges to the true weight of each edge:
$${\widehat{p}}_{ij}=\frac{2({\widehat{V}}_{ji}{s}_{2}{\widehat{V}}_{ij}{s}_{0})}{\left({s}_{0}^{2}{s}_{2}^{2}\right)+\sqrt{{\left({s}_{0}^{2}{s}_{2}^{2}\right)}^{2}4({\widehat{V}}_{ji}{s}_{2}{\widehat{V}}_{ij}{s}_{0})({\widehat{V}}_{ij}{s}_{2}{\widehat{V}}_{ji}{s}_{0})}}.$$ 
Proof.
We present a sketch of the proof here. The details can be found in Appendix B.1. We know ${\widehat{V}}_{ij}$ tends to ${V}_{ij}=\frac{{p}_{ij}\cdot {s}_{0}+{p}_{ji}\cdot {s}_{2}}{1+{p}_{ij}\cdot {p}_{ji}}$. Using both ${V}_{ij}$ and ${V}_{ji}$, we can establish this seconddegree equation:
$${V}_{ji}{s}_{2}{V}_{ij}{s}_{0}+\left({s}_{0}^{2}{s}_{2}^{2}\right){p}_{ij}+\left({V}_{ij}{s}_{2}{V}_{ji}{s}_{0}\right){p}_{ij}^{2}=0$$ 
We recall that by definition, ${s}_{0}\ge {s}_{2}$. We also notice that if ${p}_{ij}={q}_{1}$ and ${p}_{ji}={q}_{2}$ is a pair of solutions of this system, then ${p}_{ij}=\frac{1}{{q}_{2}}$ and ${p}_{ji}=\frac{1}{{q}_{1}}$ forms the other pair of solution, which implies there is uniqueness of solutions in $[0,{p}_{max}]$. Since the real probabilities of infection satisfy this system, we also know the solution exists. Let $\mathrm{\Delta}={\left({s}_{0}^{2}{s}_{2}^{2}\right)}^{2}4({V}_{ji}{s}_{2}{V}_{ij}{s}_{0})({V}_{ij}{s}_{2}{V}_{ji}{s}_{0})$. The only solution of this system in $[0,{p}_{max}]$ is then:
${p}_{ij}$  $={\displaystyle \frac{2({V}_{ji}{s}_{2}{V}_{ij}{s}_{0})}{\left({s}_{0}^{2}{s}_{2}^{2}\right)+\sqrt{\mathrm{\Delta}}}}.$ 
∎
3.5 Sample complexity
We establish the sample complexity needed to estimate ${p}_{ij}$ with precision $\u03f5$. To do so, we start by estimating ${V}_{ij}$. Note that we only consider the pair of nodes $i$ and $j$ if, among the $M$ samples, there exists a cascade of size 2 in which $i$ and $j$ are the only infected nodes (i.e. ${h}_{i,j}^{2}>0$). Otherwise, we set ${\widehat{p}}_{ij}={\widehat{p}}_{ji}=0$ as our estimate for ${p}_{ij}$.
Proposition 7.
With probability $\mathrm{1}\mathrm{}\frac{\delta}{{N}^{\mathrm{2}}}$, with $M\mathrm{=}$ samples, we can estimate ${V}_{i\mathit{}j}$ with precision ${\u03f5}_{V}$.
Proof.
We present a sketch of the proof; the details can be found in Appendix B.2. As in Proposition 4, we use Hoeffding’s inequality:
$$  $\le 2{e}^{2M{\u03f5}_{1}^{2}},$  
$\mathbb{P}({\widehat{h}}_{i,j}^{2}{h}_{i,j}^{2}>{\u03f5}_{1})$  $\le 2{e}^{2M{\u03f5}_{1}^{2}},$  
$\mathbb{P}({\widehat{e}}_{i}^{1}{e}_{i}^{1}>{\u03f5}_{1})$  $\le 2{e}^{2M{\u03f5}_{1}^{2}},$  
$\mathbb{P}({\widehat{e}}_{j}^{1}{e}_{j}^{1}>{\u03f5}_{1})$  $\le 2{e}^{2M{\u03f5}_{1}^{2}}.$ 
We use this to bound above ${V}_{ij}$:
${\widehat{V}}_{ij}$  $$ 
By bounding below the denominators and bounding above the numerator, we finally obtain:
${\widehat{V}}_{ij}{V}_{ij}$  $\le {\u03f5}_{1}{\displaystyle \frac{4N}{{p}_{min}{s}_{2}{(1{p}_{max})}^{2d}}}+o({\u03f5}_{1}).$ 
Therefore, by union bound, and by choosing ${\u03f5}_{1}\frac{4N}{{p}_{min}{s}_{2}{(1{p}_{max})}^{2d}}={\u03f5}_{V}$, and setting $2{e}^{2M{\u03f5}_{1}^{2}}=\frac{\delta}{3{N}^{2}}$, we obtain:
With $M=\frac{1}{{\u03f5}_{V}^{2}}\frac{16{N}^{2}}{{p}_{min}^{2}{s}_{2}^{2}{(1{p}_{max})}^{4d}}\frac{2\mathrm{log}(3N)\mathrm{log}(\delta )}{2}$ samples, we can guarantee ${\widehat{V}}_{ij}{V}_{ij}\le {\u03f5}_{V}$ with probability at least $1\frac{\delta}{{N}^{2}}$.
∎
Once we have estimated ${V}_{ij}$ with precision ${\u03f5}_{V}$, estimating ${p}_{ij}$ is unfortunately still not an easy task. Indeed, let $\mathrm{\Delta}={\left({s}_{0}^{2}{s}_{2}^{2}\right)}^{2}4({V}_{ji}{s}_{2}{V}_{ij}{s}_{0})({V}_{ij}{s}_{2}{V}_{ji}{s}_{0})$. We have ${p}_{ij}=\frac{\left({s}_{0}^{2}{s}_{2}^{2}\right)+\sqrt{\mathrm{\Delta}}}{2({V}_{ij}{s}_{2}{V}_{ji}{s}_{0})}$, which means that $\mathrm{\Delta}$ has to be positive for this quantity to be defined. However, for general values of ${V}_{ij}$ and ${V}_{ji}$, $\mathrm{\Delta}$ can be negative. We therefore use the framework of constrained optimization to bound $\mathrm{\Delta}$ away from 0.
Lemma 7.
Let $\mathrm{\Delta}\mathrm{=}{\mathrm{\left(}{s}_{\mathrm{0}}^{\mathrm{2}}\mathrm{}{s}_{\mathrm{2}}^{\mathrm{2}}\mathrm{\right)}}^{\mathrm{2}}\mathrm{}\mathrm{4}\mathit{}\mathrm{(}{V}_{j\mathit{}i}\mathit{}{s}_{\mathrm{2}}\mathrm{}{V}_{i\mathit{}j}\mathit{}{s}_{\mathrm{0}}\mathrm{)}\mathit{}\mathrm{(}{V}_{i\mathit{}j}\mathit{}{s}_{\mathrm{2}}\mathrm{}{V}_{j\mathit{}i}\mathit{}{s}_{\mathrm{0}}\mathrm{)}$. We have:
$$\mathrm{\Delta}\ge {({s}_{0}^{2}{s}_{2}^{2})}^{2}\frac{{(1{p}_{max})}^{2}}{1+{p}_{max}^{2}}.$$ 
Proof.
We find the lower bound on $\mathrm{\Delta}$ by reformulating the problem as a constrained optimization problem, and introducing the corresponding Lagrangian multipliers. The details can be found in Appendix B.1.
∎
Now that we have established this bound on $\mathrm{\Delta}$, we can give the sample complexity needed to estimate ${p}_{ij}$.
Proposition 8.
Assuming we can estimate ${V}_{i\mathit{}j}$ within precision ${\u03f5}_{V}$, then we can estimate ${p}_{i\mathit{}j}$ within precision $\u03f5\mathrm{=}\frac{\mathrm{6}\mathit{}{\u03f5}_{V}\mathit{}\mathrm{(}\mathrm{1}\mathrm{+}{p}_{m\mathit{}a\mathit{}x}^{\mathrm{2}}\mathrm{)}}{{\mathrm{(}{s}_{\mathrm{0}}^{\mathrm{2}}\mathrm{}{s}_{\mathrm{2}}^{\mathrm{2}}\mathrm{)}}^{\mathrm{2}}\mathit{}{\mathrm{(}\mathrm{1}\mathrm{}{p}_{m\mathit{}a\mathit{}x}\mathrm{)}}^{\mathrm{2}}}$.
Proof.
This is a simple derivation which can be found in Appendix B.2.
∎
We finally piece everything together, and use a union bound on all the ${p}_{ij}$ to obtain the final sample complexity of our algorithm for learning the weights of general boundeddegree graphs.
Theorem 7.
In the limitednoise setting, with probability at least $\mathrm{1}\mathrm{}\delta $, with $M\mathrm{=}\mathrm{O}\mathit{}\mathrm{\left(}\frac{{e}^{\mathrm{4}\mathit{}{p}_{m\mathit{}a\mathit{}x}\mathit{}\mathrm{(}d\mathrm{+}\mathrm{1}\mathrm{)}}}{{p}_{m\mathit{}i\mathit{}n}^{\mathrm{2}}\mathit{}{s}_{\mathrm{2}}^{\mathrm{2}}\mathit{}{\mathrm{(}{s}_{\mathrm{0}}^{\mathrm{2}}\mathrm{}{s}_{\mathrm{2}}^{\mathrm{2}}\mathrm{)}}^{\mathrm{4}}}\mathit{}\frac{{N}^{\mathrm{2}}}{{\u03f5}^{\mathrm{2}}}\mathit{}\mathrm{log}\mathit{}\mathrm{\left(}\frac{N}{\delta}\mathrm{\right)}\mathrm{\right)}$ samples, we can learn the weights of any boundeddegree graph up to precision $e\mathit{}p\mathit{}s\mathit{}i\mathit{}l\mathit{}o\mathit{}n$.
Once again, if we assume ${p}_{max}\sim \frac{1}{d}$ and ${p}_{min}$ constant, we obtain the following sample complexity:
Corollary 3.
In the limitednoise setting, with probability at least $\mathrm{1}\mathrm{}\delta $, if ${p}_{m\mathit{}a\mathit{}x}\mathrm{\sim}\frac{\mathrm{1}}{d}$ and ${p}_{m\mathit{}i\mathit{}n}$ constant, we can learn the weights of any boundeddegree graph up to precision $e\mathit{}p\mathit{}s\mathit{}i\mathit{}l\mathit{}o\mathit{}n$ with $M\mathrm{=}\mathrm{O}\mathit{}\mathrm{\left(}\frac{\mathrm{1}}{{s}_{\mathrm{2}}^{\mathrm{2}}\mathit{}{\mathrm{(}{s}_{\mathrm{0}}^{\mathrm{2}}\mathrm{}{s}_{\mathrm{2}}^{\mathrm{2}}\mathrm{)}}^{\mathrm{4}}}\mathit{}\frac{{N}^{\mathrm{2}}}{{\u03f5}^{\mathrm{2}}}\mathit{}\mathrm{log}\mathit{}\mathrm{\left(}\frac{N}{\delta}\mathrm{\right)}\mathrm{\right)}$ samples.
4 General graphs
In the limitednoise setting, we notice that nothing prevents us from using the algorithm for learning boundeddegree weights for general graph. This proves this problem is solvable for $any$ graph, and $any$ noise distribution. As explained in Section 1.2, this is not an obvious result. However, if we do not assume ${p}_{max}\sim \frac{1}{N}$, the sample complexity is now exponential.
Theorem 8.
In the limitednoise setting, with probability at least $\mathrm{1}\mathrm{}\delta $, it is possible to learn all the weights of $a\mathit{}n\mathit{}y$ graph, for $a\mathit{}n\mathit{}y$ noise distribution, with finite (but potentially exponential) sample complexity.
5 Discussion and Future work
In this paper, we presented the first results to learn the edges of a graph from noisy times of infection. We showed we learn the structure of any bidirectional tree or any boundeddegree graph (note that not all trees are boundeddegree graphs) with optimal sample complexity (up to logfactors).
However, our results are not tight for learning the weights of bidirectional trees or boundeddegree graphs. In the nonoise setting, [29] proves this can be achieved with $\mathcal{O}({d}^{2}N\mathrm{log}(N))$ for graphs of maximum degree $d$ with correlation decay. Our results have sample complexity $\mathcal{O}({N}^{2}\mathrm{log}(N))$ without any assumption of correlation decay. It is an open problem to understand whether it is possible to achieve a sample complexity of $\mathcal{O}({d}^{2}N\mathrm{log}(N))$ in the limitednoise setting, and whether assuming correlation decay is necessary to obtain such a result.
Moreover, for learning the weights of a general boundeddegree graph, we only use cascades of size 1 or 2. If we are given an infinite number of cascades of size bigger than 2, our current algorithm cannot learn the weights of the graph. Future work could develop an algorithm without such a weakness.
All our results for learning the weights of the edges are in the limitednoise setting. Whether or not it is possible to learn the noise in the extremenoise setting is an other question of interest for future work.
Finally, we have made no restriction on the distribution of the noise we add, other than it is finite. It would be interesting to study whether stronger restrictions on the noise (for instance Gaussian noise) would lead to stronger results. It would also be interesting to allow infinite noise, and develop algorithms which are robust to errors in the infection status of a node (our current algorithms can return wrong graph structure with only one adversarially chosen false positive).
References
 [1] Bruno Abrahao, Flavio Chierichetti, Robert Kleinberg, and Alessandro Panconesi. Trace complexity of network inference. Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining  KDD ’13, page 491, 2013.
 [2] Ery Ariascastro, Emmanuel J Candès, and Arnaud Durand. Detection of an anomalous cluster in a network. The Annals of Statistics, 39(1):278–304, 2011.
 [3] Ery Ariascastro and S T Nov. Detecting a Path of Correlations in a Network. pages 1–12.
 [4] Daniel Bernoulli and Sally Blower. An attempt at a new analysis of the mortality caused by smallpox and of the advantages of inoculation to prevent it. Reviews in medical virology, 14:275–288, 2004.
 [5] A. Cayley. A theorem on trees. In Collected Mathematical Papers Vol. 13, pages 26–28. Cambridge University Press, 1897.
 [6] Justin Cheng, Lada A. Adamic, P. Alex Dow, Jon Kleinberg, and Jure Leskovec. Can Cascades be Predicted? In Proceedings of the 23rd international conference on World wide web (WWW’ 14), 2014.
 [7] Michela Del Vicario, Alessandro Bessi, Fabiana Zollo, Fabio Petroni, Antonio Scala, Guido Caldarelli, H. Eugene Stanley, and Walter Quattrociocchi. The spreading of misinformation online. Proceedings of the National Academy of Sciences, page 201517441, 2016.
 [8] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal ofthe Royal Statistical Society, 39(1):1–38, 1977.
 [9] Kimon Drakopoulos, Asuman Ozdaglar, and John N. Tsitsiklis. An efficient curing policy for epidemics on graphs. arXiv preprint arXiv:1407.2241, (December):1–10, 2014.
 [10] Kimon Drakopoulos, Asuman Ozdaglar, and John N. Tsitsiklis. A lower bound on the performance of dynamic curing policies for epidemics on graphs. (978):3560–3567, 2015.
 [11] Giulia Fanti, Peter Kairouz, Sewoong Oh, Kannan Ramchandran, and Pramod Viswanath. Rumor source obfuscation on irregular trees. In Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science (SIGMETRICS’ 16 ), pages 153–164. ACM, 2016.
 [12] Giulia Fanti, Peter Kairouz, Sewoong Oh, Kannan Ramchandran, and Pramod Viswanath. Hiding the Rumor Source. IEEE Transactions on Information Theory, 63(10):6679–6713, 2017.
 [13] Giulia Fanti, Peter Kairouz, Sewoong Oh, and Pramod Viswanath. Spy vs. Spy: Rumor Source Obfuscation. Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS’ 14), pages 271–284, 2015.
 [14] Mehrdad Farajtabar, Jiachen Yang, Xiaojing Ye, Huan Xu, Rakshit Trivedi, Elias Khalil, Shuang Li, Le Song, and Hongyuan Zha. Fake News Mitigation via Point Process Based Intervention. In Proceedings of the 34th International Conference on Machine Learning (ICML’ 17), 2017.
 [15] Ken Goldberg, Theresa Roeder, Dhruv Gupta, and Chris Perkins. Eigentaste: A Constant Time Collaborative Filtering Algorithm. Information Retrieval, 4(2):133–151, 2001.
 [16] Manuel Gomezrodriguez, Jure Leskovec, and Andreas Krause. Inferring Networks of Diffusion and Influence. In ACM Transactions on Knowledge Discovery from Data (TKDD’ 12), volume 5, 2012.
 [17] Manuel GomezRodriguez, Jure Leskovec, and Bernhard Schölkopf. Structure and Dynamics of Information Pathways in Online Media. In 6th International Conference on Web Search and Data Mining (WSDM 2013), 2013.
 [18] Jessica Hoffmann and Constantine Caramanis. The Cost of Uncertainty in Curing Epidemics. Proceedings of the ACM on Measurement and Analysis of Computing Systems (SIGMETRICS’ 18), 2(2):11–13, 2018.
 [19] Tomoharu Iwata, Amar Shah, and Zoubin Ghahramani. Discovering Latent Influence in Online Social Activities via Shared Cascade Poisson Processes. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD’ 13), 2013.
 [20] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining  KDD ’03, 2003.
 [21] Justin Khim and PoLing Loh. Permutation Tests for Infection Graphs. pages 1–28, 2017.
 [22] Justin Khim and PoLing Loh. A theory of maximum likelihood for weighted infection graphs. pages 1–47, 2018.
 [23] Jeongyeol Kwon, Wei Qian, Constantine Caramanis, Yudong Chen, and Damek Davis. Global Convergence of the EM Algorithm for Mixtures of Two Component Linear Regression. XX:1–57, 2019.
 [24] Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, and Natalie Glance. Costeffective Outbreak Detection in Networks. Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD ’07), page 420, 2007.
 [25] Eli A. Meirom, Chris Milling, Constantine Caramanis, Shie Mannor, Ariel Orda, and Sanjay Shakkottai. Localized epidemic detection in networks with overwhelming noise. pages 1–27, 2014.
 [26] Chris Milling, Constantine Caramanis, Shie Mannor, and Sanjay Shakkottai. Network Forensics : Random Infection vs Spreading Epidemic. In Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems (SIGMETRICS’ 12), 2012.
 [27] Chris Milling, Constantine Caramanis, Shie Mannor, and Sanjay Shakkottai. Local detection of infections in heterogeneous networks. Proceedings  IEEE INFOCOM, 26:1517–1525, 2015.
 [28] Seth Myers, Chenguang Zhu, and Jure Leskovec. Information Diffusion and External Influence in Networks. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD’ 12), pages 33–41, 2012.
 [29] Praneeth Netrapalli and Sujay Sanghavi. Learning the Graph of Epidemic Cascades. In Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems (SIGMETRICS’ 12), pages 211–222, 2012.
 [30] M. E. J. Newman. Networks: An Introduction, volume 23. 2014.
 [31] Devavrat Shah and Tauhid Zaman. Detecting sources of computer viruses in networks: theory and experiment. In ACM SIGMETRICS Performance Evaluation Review, volume 38, pages 203–214. ACM, 2010.
 [32] Devavrat Shah and Tauhid Zaman. Rumors in a Network : Who ’ s the Culprit ? IEEE Transactions on information theory, 57(8):1–43, 2010.
 [33] Devavrat Shah and Tauhid Zaman. Rumor centrality: a universal source detector. In ACM SIGMETRICS Performance Evaluation Review, volume 40, pages 199–210. ACM, 2012.
 [34] Sam Spencer and R Srikant. On the impossibility of localizing multiple rumor sources in a line graph. ACM SIGMETRICS Performance Evaluation Review, 43(2):66–68, 2015.
 [35] Zhaoxu Wang, Wenxiang Dong, Wenyi Zhang, and Chee Wei Tan. Rumor source detection with multiple observations: Fundamental limits and algorithms. In ACM SIGMETRICS Performance Evaluation Review, volume 42, pages 1–13. ACM, 2014.
 [36] Liang Wu and Huan Liu. Tracing FakeNews Footprints: Characterizing Social Media Messages by How They Propagate. In (WSDM 2018) The 11th ACM International Conference on Web Search and Data Mining, 2018.
 [37] Ali Zarezade, Ali Khodadadi, Mehrdad Farajtabar, Hamid R Rabiee, and Hongyuan Zha. Correlated Cascades : Compete or Cooperate. In Proceedings of the ThirtyFirst AAAI Conference on Artificial Intelligence (AAAI17), pages 238–244, 2017.
 [38] Qingyuan Zhao, Murat A. Erdogdu, Hera Y. He, Anand Rajaraman, and Jure Leskovec. SEISMIC: A SelfExciting Point Process Model for Predicting Tweet Popularity. Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’15 ), 2015.
Appendix A Bidirectional tree
We include here the full calculations for learning the weights of the bidirectional tree.
Proposition 9.
If we know $\mathrm{(}i\mathrm{,}j\mathrm{)}$ is an edge in the original tree, then the probability of infection along this edge is given by:
$$ 
Proof.
According to Lemma 3, we have:
$$  $={\mathcal{P}}_{\overline{)j}}(\to i)\cdot {p}_{ij}\cdot {s}_{0}+{\mathcal{P}}_{\overline{)i}}(\to j)\cdot {p}_{ji}\cdot {s}_{2}$  
${g}_{i,\overline{)j}}$  $={\mathcal{P}}_{\overline{)j}}(\to i)\cdot (1{p}_{ij})$  
$$  $={\mathcal{P}}_{\overline{)i}}(\to j)\cdot {p}_{ji}\cdot {s}_{0}+{\mathcal{P}}_{\overline{)j}}(\to i)\cdot {p}_{ij}\cdot {s}_{2}$  
${g}_{j,\overline{)i}}$  $={\mathcal{P}}_{\overline{)i}}(\to j)\cdot (1{p}_{ji}).$ 
We have 4 secondorder equations, with 4 unknowns: ${p}_{ij}$, ${p}_{ji}$, ${\mathcal{P}}_{\overline{)j}}\phantom{\rule{veryverythickmathspace}{0ex}}(\to i)$ and ${\mathcal{P}}_{\overline{)i}}\phantom{\rule{veryverythickmathspace}{0ex}}(\to j)$. We solve it:
$$  $=({\mathcal{P}}_{\overline{)j}}(\to i)\cdot {p}_{ij}+{\mathcal{P}}_{\overline{)i}}(\to j)\cdot {p}_{ji})\cdot ({s}_{0}+{s}_{2})$  
$$  $=({\mathcal{P}}_{\overline{)j}}(\to i)\cdot {p}_{ij}{\mathcal{P}}_{\overline{)i}}(\to j)\cdot {p}_{ji})\cdot ({s}_{0}{s}_{2})$  
$\u27f9{\mathcal{P}}_{\overline{)j}}(\to i)\cdot {p}_{ij}$  $$  
$$  
${\mathcal{P}}_{\overline{)j}}\phantom{\rule{veryverythickmathspace}{0ex}}(\to i)$  $={g}_{i,\overline{)j}}+{\mathcal{P}}_{\overline{)j}}(\to i)\cdot {p}_{ij}$  
$$  
$\u27f9{p}_{ij}$  $={\displaystyle \frac{{\mathcal{P}}_{\overline{)j}}(\to i)\cdot {p}_{ij}}{{\mathcal{P}}_{\overline{)j}}\phantom{\rule{veryverythickmathspace}{0ex}}(\to i)}}$  
${p}_{ij}$  $$ 
∎
Lemma 8.
With $M\mathrm{=}\frac{{N}^{\mathrm{2}}}{{\u03f5}^{\mathrm{2}}}\mathit{}\mathrm{log}\mathit{}\mathrm{\left(}\frac{\mathrm{6}}{\delta}\mathrm{\right)}\mathit{}\frac{{\mathrm{\left(}\mathrm{(}{s}_{\mathrm{0}}^{\mathrm{2}}\mathrm{}{s}_{\mathrm{2}}^{\mathrm{2}}\mathrm{+}{s}_{\mathrm{0}}\mathrm{+}{s}_{\mathrm{2}}\mathrm{)}\mathit{}{p}_{m\mathit{}a\mathit{}x}\mathrm{+}{s}_{\mathrm{0}}\mathrm{+}{s}_{\mathrm{2}}\mathrm{\right)}}^{\mathrm{2}}}{{\mathrm{(}{s}_{\mathrm{0}}^{\mathrm{2}}\mathrm{}{s}_{\mathrm{2}}^{\mathrm{2}}\mathrm{)}}^{\mathrm{2}}}$ samples, with probability at least $\mathrm{1}\mathrm{}\delta $, we have:
$${\widehat{p}}_{ij}{p}_{ij}\le \u03f5.$$ 
Proof.
Using Hoeffding’s inequality:
$$  $\le 2{e}^{2M{\u03f5}_{1}^{2}},$  
$$  $\le 2{e}^{2M{\u03f5}_{1}^{2}},$  
$\mathbb{P}({\widehat{g}}_{i,\overline{)j}}{g}_{i,\overline{)j}}>{\u03f5}_{1})$  $\le 2{e}^{2M{\u03f5}_{1}^{2}}.$ 
Choosing $M=\frac{1}{{\u03f5}_{1}^{2}}\mathrm{log}\left(\frac{6}{\delta}\right)$, we have that with probability at least $1\delta $, all the following hold:
$$  $\le {\u03f5}_{1},$  
$$  $\le {\u03f5}_{1},$  
${\widehat{g}}_{i,\overline{)j}}{g}_{i,\overline{)j}}$  $\le {\u03f5}_{1}.$ 
Hence, with probability at least $1\delta $, we have:
${\widehat{p}}_{ij}$  $$  
$$  
$$  
$$  
$$  
$$  
$$  
${\widehat{p}}_{ij}{p}_{ij}$  $$  
$$ 
Using the results from Lemma 3, we have:
$$  $={\mathcal{P}}_{\overline{)j}}(\to i)\cdot {p}_{ij}\cdot {s}_{0}+{\mathcal{P}}_{\overline{)i}}(\to j)\cdot {p}_{ji}\cdot {s}_{2},$  
${g}_{i,\overline{)j}}$  $={\mathcal{P}}_{\overline{)j}}(\to i)\cdot (1{p}_{ij}).$ 
We use it to simplify the denominator:
denominator  $$  
$=({\mathcal{P}}_{\overline{)j}}(\to i)\cdot (1{p}_{ij}))\cdot ({s}_{0}^{2}{s}_{2}^{2})$  
$\mathrm{\hspace{1em}}+({\mathcal{P}}_{\overline{)j}}(\to i)\cdot {p}_{ij}\cdot {s}_{0}+{\mathcal{P}}_{\overline{)i}}(\to j)\cdot {p}_{ji}\cdot {s}_{2})\cdot {s}_{0}$  
$\mathrm{\hspace{1em}}({\mathcal{P}}_{\overline{)i}}(\to j)\cdot {p}_{ji}\cdot {s}_{0}+{\mathcal{P}}_{\overline{)j}}(\to i)\cdot {p}_{ij}\cdot {s}_{2}){s}_{2}$  
$={\mathcal{P}}_{\overline{)j}}(\to i)\cdot ({s}_{0}^{2}{s}_{2}^{2})$  
$\ge {\displaystyle \frac{{s}_{0}^{2}{s}_{2}^{2}}{N}}.$ 
Plugging back above:
${\widehat{p}}_{ij}{p}_{ij}$  $$  
$$  
$\le {\u03f5}_{1}N{\displaystyle \frac{({s}_{0}^{2}{s}_{2}^{2}+{s}_{0}+{s}_{2}){p}_{max}}{{s}_{0}^{2}{s}_{2}^{2}}}+{\u03f5}_{1}N{\displaystyle \frac{{s}_{0}+{s}_{2}}{{s}_{0}^{2}{s}_{2}^{2}}}+o({\u03f5}_{1}).$ 
By symmetry, we obtain:
$${\widehat{p}}_{ij}{p}_{ij}\le {\u03f5}_{1}N\frac{({s}_{0}^{2}{s}_{2}^{2}+{s}_{0}+{s}_{2}){p}_{max}+{s}_{0}+{s}_{2}}{{s}_{0}^{2}{s}_{2}^{2}}+o({\u03f5}_{1}).$$ 
By choosing ${\u03f5}_{1}=\frac{\u03f5}{N}\frac{{s}_{0}^{2}{s}_{2}^{2}}{({s}_{0}^{2}{s}_{2}^{2}+{s}_{0}+{s}_{2}){p}_{max}+{s}_{0}+{s}_{2}}$, we therefore have:
With $M=\frac{{N}^{2}}{{\u03f5}^{2}}\mathrm{log}\left(\frac{6}{\delta}\right)\frac{{\left(({s}_{0}^{2}{s}_{2}^{2}+{s}_{0}+{s}_{2}){p}_{max}+{s}_{0}+{s}_{2}\right)}^{2}}{{({s}_{0}^{2}{s}_{2}^{2})}^{2}}$ samples, with probability at least $1\delta $, we have ${\widehat{p}}_{ij}{p}_{ij}\le \u03f5$.
∎
Appendix B Boundeddegree graphs
B.1 Solving the system
Lemma 9.
Let $\mathrm{\Delta}\mathrm{=}{\mathrm{\left(}{s}_{\mathrm{0}}^{\mathrm{2}}\mathrm{}{s}_{\mathrm{2}}^{\mathrm{2}}\mathrm{\right)}}^{\mathrm{2}}\mathrm{}\mathrm{4}\mathit{}\mathrm{(}{V}_{j\mathit{}i}\mathit{}{s}_{\mathrm{2}}\mathrm{}{V}_{i\mathit{}j}\mathit{}{s}_{\mathrm{0}}\mathrm{)}\mathit{}\mathrm{(}{V}_{i\mathit{}j}\mathit{}{s}_{\mathrm{2}}\mathrm{}{V}_{j\mathit{}i}\mathit{}{s}_{\mathrm{0}}\mathrm{)}$. We have:
$$\mathrm{\Delta}\ge {({s}_{0}^{2}{s}_{2}^{2})}^{2}\frac{{(1{p}_{max})}^{2}}{1+{p}_{max}^{2}}.$$ 
Proof.
Finding a lower bound for $\mathrm{\Delta}$ can be achieved through minimizing $\mathrm{\Delta}$, or maximizing $({V}_{ji}{s}_{2}{V}_{ij}{s}_{0})({V}_{ij}{s}_{2}{V}_{ji}{s}_{0})$. We want to solve:
$\underset{{V}_{ij},{V}_{ji}}{\text{maximize}}$  $({V}_{ji}{s}_{2}{V}_{ij}{s}_{0})({V}_{ij}{s}_{2}{V}_{ji}{s}_{0})$  
subject to  ${V}_{ij}={\displaystyle \frac{{p}_{ij}{s}_{0}+{p}_{ji}{s}_{2}}{1+{p}_{ij}{p}_{ji}}},$  
${V}_{ji}={\displaystyle \frac{{p}_{ji}{s}_{0}+{p}_{ij}{s}_{2}}{1+{p}_{ij}{p}_{ji}}},$  
${p}_{ij}\ge 0,$  
${p}_{max}{p}_{ij}\ge 0,$  
${p}_{ji}\ge 0,$  
${p}_{max}{p}_{ji}\ge 0.$ 
To do so, we introduce Lagrangian multipliers. By replacing ${V}_{ij}$ and ${V}_{ji}$ with their actual value, the optimization problem above only has affine constraints, so it satisfies the linearity constraint qualification for the KarushKuhnTucker conditions. In other words, all the partial derivatives of the Lagrangian are equal to 0 for an optimal point.
$\mathcal{L}$  $=\mathcal{L}({V}_{ij},{V}_{ji},{p}_{ij},{p}_{ji},{\lambda}_{1},{\lambda}_{2},{\mu}_{1},{\mu}_{2},{\mu}_{3},{\mu}_{4})$  
$=({V}_{ji}{s}_{2}{V}_{ij}{s}_{0})({V}_{ij}{s}_{2}{V}_{ji}{s}_{0})$  
$\mathrm{\hspace{1em}}{\lambda}_{1}({v}_{ij}(1+{p}_{ij}{p}_{ji}){p}_{ij}{s}_{0}+{p}_{ji}{s}_{2})$  
$\mathrm{\hspace{1em}}{\lambda}_{2}({v}_{ji}(1+{p}_{ij}{p}_{ji}){p}_{ji}{s}_{0}+{p}_{ij}{s}_{2})$  
$\mathrm{\hspace{1em}}{\mu}_{1}{p}_{ij}{\mu}_{2}({p}_{max}{p}_{ij})$  
$\mathrm{\hspace{1em}}{\mu}_{3}{p}_{ji}{\mu}_{4}({p}_{max}{p}_{ji}).$ 
We calculate the gradients of $\mathcal{L}$.
$\frac{\partial \mathcal{L}}{\partial {V}_{ij}}$  $={V}_{ji}({s}_{0}^{2}+{s}_{2}^{2})2{V}_{ij}{s}_{0}{s}_{2}{\lambda}_{1}(1+{p}_{ij}{p}_{ji}),$  
$\frac{\partial \mathcal{L}}{\partial {V}_{ji}}$  $={V}_{ij}({s}_{0}^{2}+{s}_{2}^{2})2{V}_{ji}{s}_{0}{s}_{2}{\lambda}_{2}(1+{p}_{ij}{p}_{ji}),$  
$\frac{\partial \mathcal{L}}{\partial {p}_{ij}}$  $={\lambda}_{1}({V}_{ij}{p}_{ji}{s}_{0}){\lambda}_{2}({V}_{ji}{p}_{ji}{s}_{2}){\mu}_{1}+{\mu}_{2},$  
$\frac{\partial \mathcal{L}}{\partial {p}_{ji}}$  $={\lambda}_{1}({V}_{ij}{p}_{ij}{s}_{2}){\lambda}_{2}({V}_{ji}{p}_{ij}{s}_{0}){\mu}_{3}+{\mu}_{4}.$ 
From now on, we find the set ${X}^{0}$ of points for which all the partial derivatives are null. We know the solution of the maximization problem is the point of ${X}^{0}$ which maximizes the objective function.
Let us assume an interior point solution exists. For this point, all the gradients of $\mathcal{L}$ are equal to 0. Since it is an interior point, we also have ${\mu}_{1}={\mu}_{2}={\mu}_{3}={\mu}_{4}=0$ by complementary slackness. Solving this system, we obtain:
${\lambda}_{1}$  $=({s}_{0}^{2}{s}_{2}^{2}){\displaystyle \frac{{p}_{ji}{s}_{0}{p}_{ij}{s}_{2}}{{(1+{p}_{ij}{p}_{ji})}^{2}}},$  
${\lambda}_{2}$  $=({s}_{0}^{2}{s}_{2}^{2}){\displaystyle \frac{{p}_{ij}{s}_{0}{p}_{ji}{s}_{2}}{{(1+{p}_{ij}{p}_{ji})}^{2}}}.$ 
Plugging this in above, the condition $\frac{\partial \mathcal{L}}{\partial {p}_{ij}}=0$ becomes ${p}_{ij}{p}_{ji}(1{p}_{ji})=0$. However, this is impossible for an interior point, since $$. Therefore, the extrema of $\mathrm{\Delta}$ are attained when at least one constraint is active.
We notice that if the conditions ${p}_{ij}=0$ or ${p}_{ji}=0$ are active, then $({V}_{ji}{s}_{2}{V}_{ij}{s}_{0})({V}_{ij}{s}_{2}{V}_{ji}{s}_{0})=0$. Let us suppose (without loss of generality, by symmetry of the problem) that we have ${p}_{ij}={p}_{max}$. The objective function is then increasing in ${p}_{ji}$. Therefore, $\mathrm{\Delta}$ is minimized when ${p}_{ij}={p}_{ji}={p}_{max}$, which implies ${v}_{ij}={V}_{ji}=\frac{{p}_{max}({s}_{0}+{s}_{2})}{1+{p}_{max}^{2}}$. In this case:
$\mathrm{\Delta}$  $\ge {({s}_{0}^{2}{s}_{2}^{2})}^{2}4{V}_{ij}^{2}{({s}_{0}{s}_{2})}^{2}$  
$\ge {({s}_{0}^{2}{s}_{2}^{2})}^{2}4{\left({\displaystyle \frac{{p}_{max}({s}_{0}+{s}_{2})}{1+{p}_{max}^{2}}}\right)}^{2}{({s}_{0}{s}_{2})}^{2}$  
$\ge {({s}_{0}^{2}{s}_{2}^{2})}^{2}\left[14{\displaystyle \frac{{p}_{max}}{1+{p}_{max}^{2}}}\right]$  
$\ge {({s}_{0}^{2}{s}_{2}^{2})}^{2}{\displaystyle \frac{{(1{p}_{max})}^{2}}{1+{p}_{max}^{2}}}.$ 
This expression is always positive, which is what we wanted.
∎
Theorem 9.
For any graph, for any noise distribution having finite values, we can learn the weights of all the edges of the graph. In particular, we can compute a quantity which converges to the true weight of each edge:
$${\widehat{p}}_{ij}=\frac{2({\widehat{V}}_{ji}{s}_{2}{\widehat{V}}_{ij}{s}_{0})}{\left({s}_{0}^{2}{s}_{2}^{2}\right)+\sqrt{{\left({s}_{0}^{2}{s}_{2}^{2}\right)}^{2}4({\widehat{V}}_{ji}{s}_{2}{\widehat{V}}_{ij}{s}_{0})({\widehat{V}}_{ij}{s}_{2}{\widehat{V}}_{ji}{s}_{0})}}.$$ 
Proof.
${\widehat{V}}_{ij}$  ${\to}_{M\to \mathrm{\infty}}{V}_{ij}$  
$:={\displaystyle \frac{{p}_{ij}\cdot {s}_{0}+{p}_{ji}\cdot {s}_{2}}{1+{p}_{ij}\cdot {p}_{ji}}}$  
${p}_{ji}$  $={\displaystyle \frac{{V}_{ij}{p}_{ij}\cdot {s}_{0}}{{s}_{2}{V}_{ij}\cdot {p}_{ij}}}$ 
We can plug this in ${V}_{ji}$:
${V}_{ji}$  $={\displaystyle \frac{{p}_{ji}\cdot {s}_{0}+{p}_{ij}\cdot {s}_{2}}{1+{p}_{ij}\cdot {p}_{ji}}}$  
${V}_{ji}\left[1+{p}_{ij}\cdot {\displaystyle \frac{{V}_{ij}{p}_{ij}\cdot {s}_{0}}{{s}_{2}{V}_{ij}\cdot {p}_{ij}}}\right]$  $={\displaystyle \frac{{V}_{ij}{p}_{ij}\cdot {s}_{0}}{{s}_{2}{V}_{ij}\cdot {p}_{ij}}}\cdot {s}_{0}+{p}_{ij}\cdot {s}_{2}$ 
After some shuffling around, we obtain the seconddegree equation:
$${V}_{ji}{s}_{2}{V}_{ij}{s}_{0}+\left({s}_{0}^{2}{s}_{2}^{2}\right){p}_{ij}+\left({V}_{ij}{s}_{2}{V}_{ji}{s}_{0}\right){p}_{ij}^{2}=0$$ 
We recall that by definition, ${s}_{0}\ge {s}_{2}$. We also notice that if ${p}_{ij}={q}_{1}$ and ${p}_{ji}={q}_{2}$ is a pair of solutions of this system, then ${p}_{ij}=\frac{1}{{q}_{2}}$ and ${p}_{ji}=\frac{1}{{q}_{1}}$ forms the other pair of solution, which implies there is uniqueness of solutions in $[0,{p}_{max}]$. Since the real probabilities of infection satisfy this system, we also know the solution exists. Let $\mathrm{\Delta}={\left({s}_{0}^{2}{s}_{2}^{2}\right)}^{2}4({V}_{ji}{s}_{2}{V}_{ij}{s}_{0})({V}_{ij}{s}_{2}{V}_{ji}{s}_{0})$. The only solution of this system in $[0,{p}_{max}]$ is:
${p}_{ij}$  $={\displaystyle \frac{\left({s}_{0}^{2}{s}_{2}^{2}\right)+\sqrt{\mathrm{\Delta}}}{2({V}_{ij}{s}_{2}{V}_{ji}{s}_{0})}}$  
$={\displaystyle \frac{{\left({s}_{0}^{2}{s}_{2}^{2}\right)}^{2}{\left({s}_{0}^{2}{s}_{2}^{2}\right)}^{2}4({V}_{ji}{s}_{2}{V}_{ij}{s}_{0})({V}_{ij}{s}_{2}{V}_{ji}{s}_{0})}{2({V}_{ji}{s}_{0}{V}_{ij}{s}_{2})\left(\left({s}_{0}^{2}{s}_{2}^{2}\right)+\sqrt{\mathrm{\Delta}}\right)}}$  
$={\displaystyle \frac{2({V}_{ji}{s}_{2}{V}_{ij}{s}_{0})}{\left({s}_{0}^{2}{s}_{2}^{2}\right)+\sqrt{\mathrm{\Delta}}}}$ 
∎
B.2 Sample complexity
Proposition 10.
As the number of cascades $M$ goes to infinity, the estimators below tend to the following limit:
${\widehat{h}}_{i,j}^{2}$  ${\to}_{M\to \mathrm{\infty}}{\displaystyle \frac{1}{N}}({p}_{ij}+{p}_{ji}){\displaystyle \prod _{k\ne i,j}}(1{p}_{ik})(1{p}_{jk})$  
$$  ${\to}_{M\to \mathrm{\infty}}{\displaystyle \frac{1}{N}}({p}_{ij}\cdot {s}_{0}+{p}_{ji}\cdot {s}_{2}){\displaystyle \prod _{k\ne i,j}}(1{p}_{ik})(1{p}_{jk})$  
${\widehat{e}}_{i}^{1}$  ${\to}_{M\to \mathrm{\infty}}{\displaystyle \frac{1}{N}}(1{p}_{ij}){\displaystyle \prod _{k\ne i,j}}(1{p}_{ik})$ 
Proof.
Using the law of large numbers:
$$  $$  
$=\mathbb{P}(i\text{source, infects}j\text{, no other infections, delay 0})$  
$\mathrm{\hspace{1em}}+\mathbb{P}(j\text{source, infects}i\text{, no other infections, delay 2})$  
$={\displaystyle \frac{1}{N}}{p}_{ij}{\displaystyle \prod _{k\ne i,j}}(1{p}_{ik}){\displaystyle \prod _{k\ne i,j}}(1{p}_{jk})\cdot {s}_{0}$  
$\mathrm{\hspace{1em}}+{\displaystyle \frac{1}{N}}{p}_{ji}{\displaystyle \prod _{k\ne i,j}}(1{p}_{jk}){\displaystyle \prod _{k\ne i,j}}(1{p}_{ik})\cdot {s}_{2}$  
$={\displaystyle \frac{1}{N}}({p}_{ij}\cdot {s}_{0}+{p}_{ji}\cdot {s}_{2}){\displaystyle \prod _{k\ne i,j}}(1{p}_{ik})(1{p}_{jk}).$ 
In the same vein, we have:
${\widehat{h}}_{i,j}^{2}$  ${\to}_{M\to \mathrm{\infty}}\mathbb{E}[{\widehat{h}}_{i,j}^{2}]$  
$=\mathbb{P}(i\text{source, infects}j\text{, no other infections})$  
$\mathrm{\hspace{1em}}+\mathbb{P}(j\text{source, infects}i\text{, no other infections})$  
$={\displaystyle \frac{1}{N}}({p}_{ij}+{p}_{ji}){\displaystyle \prod _{k\ne i,j}}(1{p}_{ik})(1{p}_{jk})$  
${\widehat{e}}_{i}^{1}$  ${\to}_{M\to \mathrm{\infty}}\mathbb{E}[{\widehat{e}}_{i}^{1}]$  
$=\mathbb{P}(i\text{source, no other infections})$  
$={\displaystyle \frac{1}{N}}{\displaystyle \prod _{k\ne i}}(1{p}_{ik})$  
$={\displaystyle \frac{1}{N}}(1{p}_{ij}){\displaystyle \prod _{k\ne i,j}}(1{p}_{ik}).$ 
∎
Proposition 11.
With probability $\mathrm{1}\mathrm{}\frac{\delta}{{N}^{\mathrm{2}}}$, with $M\mathrm{=}$ samples, we can estimate ${V}_{i\mathit{}j}$ with precision ${\u03f5}_{V}$.
Proof.
As in Proposition 4, we use Hoeffding’s inequality:
$$  $\le 2{e}^{2M{\u03f5}_{1}^{2}},$  
$\mathbb{P}({\widehat{h}}_{i,j}^{2}{h}_{i,j}^{2}>{\u03f5}_{1})$  $\le 2{e}^{2M{\u03f5}_{1}^{2}},$  
$\mathbb{P}({\widehat{e}}_{i}^{1}{e}_{i}^{1}>{\u03f5}_{1})$  $\le 2{e}^{2M{\u03f5}_{1}^{2}},$  
$\mathbb{P}({\widehat{e}}_{j}^{1}{e}_{j}^{1}>{\u03f5}_{1})$  $\le 2{e}^{2M{\u03f5}_{1}^{2}}.$ 
We use this to bound above ${V}_{ij}$:
${\widehat{V}}_{ij}$  $$  
$$  
$$  
$$ 
We bound below the denominators:
$$  $\ge {\displaystyle \frac{1}{N}}({p}_{ij}{s}_{0}+{p}_{ji}{s}_{2}){(1{p}_{max})}^{2d}$  
$\ge {\displaystyle \frac{{p}_{min}{s}_{2}}{N}}{(1{p}_{max})}^{2d}$  
${h}_{i,j}^{2}+N{e}_{i}^{1}{e}_{j}^{1}$  $\ge {\displaystyle \frac{1}{N}}(1+{p}_{ij}{p}_{ji}){(1{p}_{max})}^{2d}$  
$\ge {\displaystyle \frac{1}{N}}{(1{p}_{max})}^{2d}$ 
We bound above the numerator:
$N({e}_{i}^{1}+{e}_{i}^{1})$  $=N\left({\displaystyle \frac{1}{N}}{\displaystyle \prod _{k\ne i}}(1{p}_{ik})+{\displaystyle \frac{1}{N}}{\displaystyle \prod _{k\ne j}}(1{p}_{jk})\right)$  
$\le N\left({\displaystyle \frac{1}{N}}+{\displaystyle \frac{1}{N}}\right)$  
$\le 2.$ 
Plugging in above:
${\widehat{V}}_{ij}$  $$  
$\le {V}_{ij}\left[1+{\displaystyle \frac{{\u03f5}_{1}}{\frac{{p}_{min}{s}_{2}}{N}{(1{p}_{max})}^{2d}}}+{\displaystyle \frac{{\u03f5}_{1}(1+2)}{\frac{1}{N}{(1{p}_{max})}^{2d}}}+o({\u03f5}_{1})\right].$ 
Using ${V}_{ij}\le 1$, and by symmetry:
${\widehat{V}}_{ij}{V}_{ij}$  $={\u03f5}_{1}{\displaystyle \frac{N}{{(1{p}_{max})}^{2d}}}\left[{\displaystyle \frac{1}{{p}_{min}{s}_{2}}}+3\right]+o({\u03f5}_{1})$  
$=\le {\u03f5}_{1}{\displaystyle \frac{N}{{(1{p}_{max})}^{2d}}}\left[{\displaystyle \frac{1+3{p}_{min}{s}_{2}}{{p}_{min}{s}_{2}}}\right]+o({\u03f5}_{1})$  
$\le {\u03f5}_{1}{\displaystyle \frac{4N}{{p}_{min}{s}_{2}{(1{p}_{max})}^{2d}}}+o({\u03f5}_{1}).$ 
Therefore, by union bound, and by choosing ${\u03f5}_{1}\frac{4N}{{p}_{min}{s}_{2}{(1{p}_{max})}^{2d}}={\u03f5}_{V}$, and setting $2{e}^{2M{\u03f5}_{1}^{2}}=\frac{\delta}{3{N}^{2}}$, we obtain:
With $M=\frac{1}{{\u03f5}_{V}^{2}}\frac{16{N}^{2}}{{p}_{min}^{2}{s}_{2}^{2}{(1{p}_{max})}^{4d}}\frac{2\mathrm{log}(3N)\mathrm{log}(\delta )}{2}$ samples, we can guarantee ${\widehat{V}}_{ij}{V}_{ij}\le {\u03f5}_{V}$ with probability at least $1\frac{\delta}{{N}^{2}}$.
∎
Proposition 12.
Assuming we can estimate ${V}_{i\mathit{}j}$ within precision ${\u03f5}_{V}$, then we can estimate ${p}_{i\mathit{}j}$ within precision $\u03f5\mathrm{=}\frac{\mathrm{6}\mathit{}{\u03f5}_{V}\mathit{}\mathrm{(}\mathrm{1}\mathrm{+}{p}_{m\mathit{}a\mathit{}x}^{\mathrm{2}}\mathrm{)}}{{\mathrm{(}{s}_{\mathrm{0}}^{\mathrm{2}}\mathrm{}{s}_{\mathrm{2}}^{\mathrm{2}}\mathrm{)}}^{\mathrm{2}}\mathit{}{\mathrm{(}\mathrm{1}\mathrm{}{p}_{m\mathit{}a\mathit{}x}\mathrm{)}}^{\mathrm{2}}}$.
Proof.
If we know $$ and $$:
${\widehat{p}}_{ij}$  $={\displaystyle \frac{2({\widehat{V}}_{ji}{s}_{2}{\widehat{V}}_{ij}{s}_{0})}{\left({s}_{0}^{2}{s}_{2}^{2}\right)+\sqrt{{\left({s}_{0}^{2}{s}_{2}^{2}\right)}^{2}4({\widehat{V}}_{ji}{s}_{2}{\widehat{V}}_{ij}{s}_{0})({\widehat{V}}_{ij}{s}_{2}{\widehat{V}}_{ji}{s}_{0})}}}$  
$\le {\displaystyle \frac{2({V}_{ji}{s}_{2}{V}_{ij}{s}_{0})+2{\u03f5}_{V}({s}_{0}+{s}_{2})}{{\left({s}_{0}^{2}{s}_{2}^{2}\right)}^{2}+\sqrt{\mathrm{\Delta}4{\u03f5}_{V}{({s}_{0}+{s}_{2})}^{2}({V}_{ij}+{V}_{ji})}}}.$ 
We recall ${s}_{0}+{s}_{2}\le 1$, ${V}_{ij}\le 1$, ${p}_{ij}\le 1$, and $1+{p}_{max}^{2}\le \mathrm{\Delta}\le 1$ (Lemma 7). Hence:
${\widehat{p}}_{ij}$  $\le {\displaystyle \frac{2({V}_{ji}{s}_{2}{V}_{ij}{s}_{0})+2{\u03f5}_{V}}{{\left({s}_{0}^{2}{s}_{2}^{2}\right)}^{2}+\sqrt{\mathrm{\Delta}}\sqrt{18\frac{{\u03f5}_{V}}{\mathrm{\Delta}}}}}$  
$\le {p}_{ij}\left(1+{\displaystyle \frac{4{\u03f5}_{V}}{\sqrt{\mathrm{\Delta}}\left({\left({s}_{0}^{2}{s}_{2}^{2}\right)}^{2}+\sqrt{\mathrm{\Delta}}\right)}}\right)+{\displaystyle \frac{2{\u03f5}_{V}}{{\left({s}_{0}^{2}{s}_{2}^{2}\right)}^{2}+\sqrt{\mathrm{\Delta}}}}+o({\u03f5}_{V})$  
$\le {p}_{ij}+{\displaystyle \frac{6{\u03f5}_{V}}{\mathrm{\Delta}}}.$ 
By symmetry, and using the bound on $\mathrm{\Delta}$ stated above, we conclude that if we know ${V}_{ij}$ and ${V}_{ji}$ up to precision ${\u03f5}_{V}$, we know ${p}_{ij}$ up to precision $\u03f5=\frac{6{\u03f5}_{V}(1+{p}_{max}^{2})}{{({s}_{0}^{2}{s}_{2}^{2})}^{2}{(1{p}_{max})}^{2}}$.
∎
Theorem 10.
In the limitednoise setting, with probability at least $\mathrm{1}\mathrm{}\delta $, with $M\mathrm{=}\mathrm{O}\mathit{}\mathrm{\left(}\frac{{e}^{\mathrm{4}\mathit{}{p}_{m\mathit{}a\mathit{}x}\mathit{}\mathrm{(}d\mathrm{+}\mathrm{1}\mathrm{)}}}{{p}_{m\mathit{}i\mathit{}n}^{\mathrm{2}}\mathit{}{s}_{\mathrm{2}}^{\mathrm{2}}\mathit{}{\mathrm{(}{s}_{\mathrm{0}}^{\mathrm{2}}\mathrm{}{s}_{\mathrm{2}}^{\mathrm{2}}\mathrm{)}}^{\mathrm{4}}}\mathit{}\frac{{N}^{\mathrm{2}}}{{\u03f5}^{\mathrm{2}}}\mathit{}\mathrm{log}\mathit{}\mathrm{\left(}\frac{N}{\delta}\mathrm{\right)}\mathrm{\right)}$ samples, we can learn the weights of any boundeddegree graph up to precision $e\mathit{}p\mathit{}s\mathit{}i\mathit{}l\mathit{}o\mathit{}n$.
Proof.
With probability at least $1\frac{\delta}{{N}^{2}}$, using
$M=\frac{1}{{\u03f5}_{V}^{2}}\frac{16{N}^{2}}{{p}_{min}^{2}{s}_{2}^{2}{(1{p}_{max})}^{4d}}\frac{2\mathrm{log}(3N)\mathrm{log}(\delta )}{2}$ samples, we can guarantee ${\widehat{V}}_{ij}{V}_{ij}\le {\u03f5}_{V}$ with probability at least $1\frac{\delta}{{N}^{2}}$ samples, knowing $\frac{1}{{\u03f5}_{V}}=\frac{6(1+{p}_{max}^{2})}{\u03f5{({s}_{0}^{2}{s}_{2}^{2})}^{2}{(1{p}_{max})}^{2}}$. This gives us a sample complexity of:
$M$  $\ge {\displaystyle \frac{18}{{\u03f5}^{2}{({s}_{0}^{2}{s}_{2}^{2})}^{4}{(1{p}_{max})}^{4(d+1)}}}{N}^{2}{\left[{\displaystyle \frac{4}{{p}_{min}{s}_{2}}}\right]}^{2}\mathrm{log}\left({\displaystyle \frac{9{N}^{2}}{\delta}}\right)$  
$\ge {\displaystyle \frac{1152\cdot {e}^{4{p}_{max}(d+1)}}{{p}_{min}^{2}{s}_{2}^{2}{({s}_{0}^{2}{s}_{2}^{2})}^{4}}}{\displaystyle \frac{{N}^{2}}{{\u03f5}^{2}}}\mathrm{log}\left({\displaystyle \frac{9{N}^{2}}{\delta}}\right)$  
$=\mathcal{O}\left({\displaystyle \frac{{e}^{4{p}_{max}(d+1)}}{{p}_{min}^{2}{s}_{2}^{2}{({s}_{0}^{2}{s}_{2}^{2})}^{4}}}{\displaystyle \frac{{N}^{2}}{{\u03f5}^{2}}}\mathrm{log}\left({\displaystyle \frac{N}{\delta}}\right)\right).$ 
∎