Toward Universal Testing of Dynamic Network Models

  • 2020-02-13 18:12:10
  • Abram Magner, Wojciech Szpankowski
  • 0

Abstract

Numerous networks in the real world change over time, in the sense that nodesand edges enter and leave the networks. Various dynamic random graph modelshave been proposed to explain the macroscopic properties of these systems andto provide a foundation for statistical inferences and predictions. It is ofinterest to have a rigorous way to determine how well these models matchobserved networks. We thus ask the following goodness of fit question: given asequence of observations/snapshots of a growing random graph, along with acandidate model M, can we determine whether the snapshots came from M or fromsome arbitrary alternative model that is well-separated from M in some naturalmetric? We formulate this problem precisely and boil it down to goodness of fittesting for graph-valued, infinite-state Markov processes and exhibit andanalyze a universal test based on non-stationary sampling for a natural classof models.

 

Quick Read (beta)

[

\NameAbram Magner \Email[email protected]
\addrDepartment of Computer Science
University at Albany
   SUNY
Albany
   NY    USA
\AND\NameWojciech Szpankowski \Email[email protected]
\addrDepartment of Computer Science
Purdue University
West Lafayette
   IN    USA
Abstract

Numerous networks in the real world change over time, in the sense that nodes and edges enter and leave the networks. Various dynamic random graph models have been proposed to explain the macroscopic properties of these systems and to provide a foundation for statistical inferences and predictions. It is of interest to have a rigorous way to determine how well these models match observed networks. We thus ask the following goodness of fit question: given a sequence of observations/snapshots of a growing random graph, along with a candidate model M, can we determine whether the snapshots came from M or from some arbitrary alternative model that is well-separated from M in some natural metric? We formulate this problem precisely and boil it down to goodness of fit testing for graph-valued, infinite-state Markov processes and exhibit and analyze a universal test based on non-stationary sampling for a natural class of models.

Testing Dynamic Network Models]Toward Universal Testing of Dynamic Network Models

1 Introduction

Time-varying networks abound in various domains. To explain their macroscopic properties (e.g., subgraph frequencies, diameter, degree distribution, etc.) and to make predictions and other inferences (such as link prediction, community detection, etc.), a plethora of generative models have been proposed (see [Newman(2010), Hofstad(2016)] for broad overviews of classical random graph models; for a mathematical approach via exchangeable random structures, see, for example, [Veitch and Roy(2016), Orbanz and Roy(2013)]). For scientific reasons it is of interest to be able to judge how well the models (which postulate particular mechanisms of growth) reflect reality. This leads naturally to the problem of goodness of fit testing for dynamic network mechanisms. At a high level, this entails determining, in some formal way, whether or not a given observed sequence of graph snapshots is more likely to have originated from some candidate growth mechanism or, alternatively from a sufficiently different one.

Related work

Regarding related work, goodness of fit testing is a classical topic in statistics, but the focus is typically on real-valued and categorical data over small alphabets [Diakonikolas et al.(2017)Diakonikolas, Gouleakis, Peebles, and Price, Wasserman(2010)], where the input to the problem is a sequence of independent, identically distributed (iid) random variables from an unknown probability distribution. This is in contrast with the setting of dynamic graph models, where what is given is a trajectory (extremely non-iid) from an unknown model, and each element of the trajectory is a graph, which potentially shares a large amount of structure with its temporal neighbors.

Closer in spirit to our work is the (rather recent) literature on testing of static random graph models [Ghoshdastidar et al.(2017)Ghoshdastidar, Gutzeit, Carpentier, and von Luxburg]. Here (in the one-sample case), what is given is a sequence of iid graphs (not a trajectory of an evolving graph) coming from an unknown model, and the task is to distinguish the unknown model from some candidate distribution. In works so far, the distribution class under consideration is a particular parametric family or satisfies strong independence assumptions (e.g., in [Ghoshdastidar et al.(2017)Ghoshdastidar, Gutzeit, Carpentier, and von Luxburg], all edges are assumed to be independent), which is inherently not dynamic.

There has been work on testing of dynamic graph models, such as [Jeong et al.(2003)Jeong, Néda, and Barabási], but most such methods have not come with rigorous guarantees and have proposed tests tailored to particular parameteric distributions (so they are more properly considered to deal with parameter estimation than goodness of fit testing). In contrast, we formulate our results in terms of classes of models that we prove to be distinguishable.

With regard to the specific types of models of interest, our focus in the present paper is on model classes that are given by inequalities that control the rate of change of the conditional distributions governing the evolution of the sample graphs with respect to time. This affords us some flexibility in defining tractable model classes. There exist alternative theories on which a model class could be based, such as the theory of graphexes for sparse exchangeable random graphs (described in [Borgs et al.(2017)Borgs, Chayes, Cohn, and Veitch]). To a graphex is associated a growing random graph model, and the problem akin to ours in that setting is to produce consistent estimators of the graphex from a graph trajectory. This has been done in [Veitch and Roy(2016)]. However, while the set of models that can be parameterized by a graphex is quite rich, it has limitations that make it unsuitable for some of the models that we wish to consider, such as preferential attachment and uniform attachment (according to results in [Borgs et al.(2017)Borgs, Chayes, Cohn, and Veitch], both of these converge to the same degenerate graphex). We mention also [Ryabko(2017)], which deals with hypothesis testing on infinite random graphs. The focus there is on testing properties of infinite, bounded-degree graph isomorphism classes based on random walk samples; in contrast, our concern is with graph trajectories, which fundamentally carry more information than isomorphism classes, and our sampling model is different.

Additionally, one of the goals of our paper is to analyze the particularly natural scheme of non-stationary sampling (which was inspired by the algorithm sketched, but not fully specified, in [Jeong et al.(2003)Jeong, Néda, and Barabási]), and the guarantees on it lend themselves to phrasing in terms of the sequence of conditional distributions of a model, as we have done.

Finally, we mention recent work on goodness of fit testing for Markov chains [Daskalakis et al.(2018)Daskalakis, Dikkala, and Gravin]. In that work, as in ours, what is given is a single trajectory from a Markov chain. However, they restrict to a chain of constant size and assume homogeneous transition probabilities, ergodicity, symmetry, etc. See also more recent work that drops the symmetry condition [Cherapanamjeri and Bartlett(2019), Wolfer and Kontorovich(2019)]. General hypothesis testing problems relating to finite Markov chains under these assumptions have been considered in the more distant past [Natarajan(1985), Nakagawa and Kanaya(1993), Baum and Petrie(1966)]. In the setting of our paper, none of these assumptions hold.

Our contributions

In this paper, we give a rigorous formulation of goodness of fit testing for dynamic random graph models satisfying a Markov property, including a natural distortion measure on dynamic mechanisms. We identify (at an intuitive level) the key properties of dynamic models that make universal testing (i.e., testing with provable guarantees for a broad range of models) feasible and, motivated by these, propose a test of goodness of fit for models in a natural, infinite class based on non-stationary sampling, and show that this test succeeds with high probability. Our work may also be viewed as contributing toward the theoretical understanding of the generality of the idea of non-stationary sampling. Intuitively speaking, we find that it is not completely general, because of pathological special cases, but we can carve out a class of models defined by very natural conditions for which this technique is useful.

There are several novel technical challenges that arise in the problem of testing of dynamic random graph models. In particular, the problem deals with distinguishing between a candidate model and an arbitrary unknown model from a model class, both of which take the form of Markov processes on an infinite state space, so that classical tools for finite-state, ergodic Markov chains do not apply. Furthermore, identification of appropriate metrics for measuring the distance between two dynamic models is a nontrivial philosophical problem: we argue that, for example, total variation distance is of limited applicability in this setting, because it is intuitively substantially too stringent for exploratory data analysis: one may consider two models that are, intuitively, driven by the same mechanism (e.g., preferential attachment), but that have total variation distance 1 because of small model differences.

Having settled on some measure, identification of an appropriate model class under which our candidate test can be shown to succeed with high probability is nontrivial, due to the large number of potentially pathological models that need to be ruled out.

The initial inspiration for the present work was to explore principled techniques for model validation in order to explore the validity of the application of the algorithms in [Sreedharan et al.(2019)Sreedharan, Magner, Szpankowski, and Grama] to real networks, such as protein interaction and brain region co-excitation networks. In that work, the authors devised algorithms and information-theoretic bounds to estimate the arrival order of nodes from a snapshot of a growing graph, assuming preferential attachment as a generative model. We regard the present paper as forming initial groundwork for subsequent developments along the lines of model validation.

1.1 Organization of the paper

In the main body of the paper, we state the problem formally and give main definitions and results and experiments. We give the proofs in Appendix A.

2 Formal problem setting

We begin by formulating the problem in general. The most basic object of study is a (discrete-time) dynamic random graph model.

{definition}

[Dynamic random graph model] A dynamic random graph model (or dynamic mechanism) is a graph-valued Markov process. More specifically, it is specified by a sequence of conditional distributions Pr[G0], {Pr[Gt|Gt-1]}t=1. The Markov assumption is a natural one, especially in light of the fact that many dynamic graph models satisfy it (including the various preferential attachment models, the Cooper-Frieze web graph model [Newman(2010)], the dynamic stochastic block model, the duplication-divergence model, etc). It has additionally already appeared explicitly in works on dynamic graphs [Avin et al.(2008)Avin, Koucký, and Lotker]. However, one can imagine plausible situations in which it does not hold: for example, nodes may be nostalgic, in the sense that they may choose to connect with others that were, at some point in the past, of high degree, or with nodes to which they had been previously frequently connected. We do not deal explicitly with such situations in the present work (but it is possible that they may be accommodated by a suitable change in the definition of the network under consideration).

We next define a distortion measure on Markov processes, which will be used to define our testing problem. {definition}[Distortion measure on Markov processes] Consider two Markov processes X0=(X0,1,,X0,n,), X1=(X1,1,,X1,n,). We define the following distortion:

dn(X0,X1)=j=1n-1𝔼ZX1,j[dTV([X0,j+1|X0,j=Z],[X1,j+1|X1,j=Z])], (1)

where dTV is the total variation distance between the laws of the two random variables. We note that the proposed distortion measure is not symmetric, which is natural in our problem setting. In general, the choice of distortion measure depends on the application at hand. It implicitly encodes which models we choose to regard as distinguishable, and in subsequent work we intend to examine this issue more closely. The measure under consideration here has the following natural interpretation, coming from the interpretation of the total variation distance: consider a setting in which we observe a sample Z from X1 at the time step t-1, for a uniformly randomly selected time t[n]. A fair coin BBernoulli(1/2) is then flipped, and we then observe a sample from XB,t, conditioned on XB,t-1=Z. We are then asked to guess the value of B (i.e., which process generated the next value conditioned on the initial value that we saw). Up to constant factors, dn(X0,X1)/n gives the success probability of an optimal hypothesis test for this problem (this is folklore about total variation distance). In other words, dn(,) measures the distinguishing power of a uniformly random timestep from a trajectory from one of the models. See Example 2 below for an illustration.

Problem statement:

Having introduced dynamic mechanisms and a measure of distortion between them, we come to our main problem statement. Fix a particular natural class 𝒞 of models with some distinguished member M0. Let 𝒢 denote the set of all graphs (for simplicity, let us throughout only consider multigraphs with finitely many vertices and edges). Let BBernoulli(1/2). We wish to exhibit a test for M0, which takes the form of a function F=FM0:𝒢n{0,1} mapping sequences of graphs to {0,1}; that is, the input to the test F is an observed dynamic graph trajectory with n timesteps, and the output represents whether or not the test decides that the trajectory came from M0. Ideally, the test should distinguish trajectories coming from M0 from those coming from an arbitrary unknown M1𝒞 satisfying dn(M0,M1)nϵ (we will find that ϵ cannot always be taken to be arbitrarily close to 0, due to an intrinsic property of a model which we call its non-stationary sampling radius). More precisely, F takes n steps G1,,Gn from MB and should satisfy F(G1,,Gn)=B with high probability (asymptotically as n). In what follows, we will exhibit a class 𝒞 and a universal test F for it that succeeds with high probability under natural conditions.

The class will be inspired by (and will contain) the (linear) preferential attachment model, which has received extensive attention over several years in various communities as a simple model that produces a power law degree distribution [Albert and Barabási(2002), Bollobás et al.(2001)Bollobás, Riordan, Spencer, and Tusnády]. This model, denoted by 𝒫𝒜(m,n), is defined as follows: there is a parameter m. The graph G1 is a single vertex (called 1) with m self loops. Conditioned on Gt-1, Gt is sampled as follows: we have Gt-1Gt, and there is an additional vertex t with m edges to vertices in [t-1]={1,,t}, chosen independently as follows: the probability that a particular choice goes to vertex v[t-1] is given by degt-1(v)2m(t-1), where degt-1(v) is the degree of v in Gt-1. There are several small tweaks to this model, but they have no bearing on the analysis in this paper. In examples, we will also mention the uniform attachment model, which is similar to preferential attachment, except that the conditional probability of a given vertex v being chosen at time t is 1/(t-1). We denote this model by 𝒰𝒜(m,n).

{example}

Consider the case of uniform attachment versus preferential attachment graphs with shared parameter m1. Let us consider dn(𝒫𝒜,𝒰𝒜). The jth term of the sum defining the distortion measure is an expectation with respect to a random variable Z𝒰𝒜(m,j); that is, Z is a uniform attachment graph on j vertices. Now, the conditional probability distribution of each of the choices of the j+1st vertex, given Z, in the uniform attachment model is simply the uniform distribution on the vertices [j]={1,,j}. Meanwhile, the conditional distribution of each choice made by the j+1st vertex, given Z, in the preferential attachment model depends on the degrees of vertices in Z. It is known that with high probability in the uniform attachment model, as j, there are Θ(j) vertices with degree m, and so the preferential attachment distribution assigns Θ(j) vertices a conditional probability equal to 12(j-1). In particular, this implies that dn(𝒫𝒜,𝒰𝒜)=Ω(n). A more careful analysis, using known results from the literature, can yield precise asymptotics.

3 Main results

We will build slowly to our main results and model class definition. The main difficulty in the problem is that we are tasked with inferring information about a sequence of probability distributions, usually given only a single (or few) graph trajectory, which implies only a single sample of each conditional distribution. In general, then, we must appropriately restrict our model class 𝒞 in order to yield a solvable problem. The key insight is that, between consecutive time steps, the network structure, and, hence, the corresponding conditional distributions, should not change significantly (note, however, that this is a nontrivial phenomenon to formalize). Our plan of attack will be to pretend that the sequence of updates to the graph immediately after a given time point are independently sampled from the same distribution and to construct an estimator of our metric from these samples. For simplicity of exposition, we restrict below to the case of growing networks, where a single vertex is added at each timestep, and it connects to some set of previous vertices. In this setting, an update (with respect to a given graph sequence) at time t takes the form of a selection (a multiset) of vertices present in the graph at the previous timestep to which vertex t will connect. We will denote the set of possible updates at time t by 𝔻t. For a given update κ𝔻t, we will use tκ to denote the event that vertex t connects to the vertices specified by κ at time t. We denote by Ut the update at time t in a given sample graph trajectory.

Formally, we will define the test statistic as follows. Fix C(n) (which we call the sample width) and M(n) (the number of probe points), and, for t[n] and κ𝔻t, let μ^t,κ denote the following probability:

μ^t,κ=|{h[t,t+C(n)):hκ}||{h[t,t+C(n)):Uh[t-1]}|. (2)

Let μ^t denote the probability distribution giving probability μ^t,κ to κ, for each possible κ. That is, μ^t is an empirical probability distribution over updates. We note that the sample intervals corresponding to different probe points in general can overlap substantially.

Let pt,κM0=pt,κ denote the conditional probability given to update κ at time t by model M0. For example, when M0 is preferential attachment with parameter m1 and Gt-1 is given, κ is a multiset {v1,,vm} of cardinality m of vertices in Gt-1 (in particular, those to which the new vertex t will connect), and pt,κ is given by

pt,κ=i=1mdegt-1(vi)2m(t-1). (3)

Given an input trajectory G=(G1,,Gn), the test statistic is computed by randomly sampling M(n) probe points r1rM(n)[n] and computing

S(n):=SM0(G1,,Gn)=j=1M(n)dTV(μ^rj,prjM0). (4)

We often shall write S:=S(n) for SM0(G1,,Gn). This may be thought of as an estimate of dn(MB,M0) via non-stationary sampling. Now we are ready for present our test. We will further restrict to the case where the cardinality of an update is at most some fixed m1. This is natural in light of the fact that many real networks tend to be sparse.

Algorithm: Test-DynamicGraph (n,M0,M(n),C(n)). Input: Sample trajectory G=(G1,,Gn)MB, distance threshold D0. Output: An estimate B^{0,1} of B. 1. Select M(n) probe points r1,,rM(n) uniformly at random with replacement from [n] and compute μ^rj for each j[M(n)]. 2. Compute S(n):=SM0(G1,,Gn)=j=1M(n)dTV(μ^rj,prjM0). 3. Set α=α(n)=Dn/2, and compute 𝔼M0[SM0], where 𝔼M0[SM0] can be estimated by sampling G=(G1,,Gn) from M0 (or, possibly, computed analytically). 4. If |S(n)-𝔼M0[SM0]|>α:=α(n), set B^=1, else B^=0.

We note that although S has the form of an estimate of dn(MB,M0), it does not converge in probability to this value (indeed, it is likely that a single trajectory is insufficient to estimate S). However, as we will see, the above test succeeds with high probability under natural conditions.

To state our main result for this test, we need to develop some concepts related to our model class (from which both M0 and M1 will come). The general pattern of the analysis of the test to show that it succeeds with high probability in distinguishing two models will proceed in two steps:

  • Show that |𝔼M1[S]-𝔼M0[S]| is sufficiently large.

  • Show that S is well-concentrated under both M0 and M1.

Let us examine the first step more closely. We can lower bound the expected value difference as follows: let Sj denote the jth term in the sum (4) defining the test statistic (note that it is defined with respect to M0). We have

𝔼M0[Sj]=𝔼M0[dTV(μ^rj,prjM0)], (5)

and

𝔼M1[Sj]=𝔼M1[dTV(μ^rj,prjM0)]. (6)

By the reverse triangle inequality, this becomes

𝔼M1[Sj]|𝔼M1[dTV(μ^rj,prjM1)]-𝔼M1[dTV(prjM1,prjM0)]|. (7)

This can be further lower bounded by

𝔼M1[Sj]𝔼M1[dTV(prjM1,prjM0)]-𝔼M1[dTV(μ^rj,prjM1)]. (8)

Thus, we have

|𝔼M1[Sj]-𝔼M0[Sj]|𝔼M1[dTV(prjM1,prjM0)]-𝔼M1[dTV(μ^rj,prjM1)]-𝔼M0[dTV(μ^rj,prjM0)]. (9)

The positive term measures the contribution of the point rj to the distance between M0 and M1. Meanwhile, the negative terms are both intrinsic estimation errors from non-stationary sampling. In order for two models to be easy to distinguish, we desire that these terms be small in absolute value. In the sequel we call 𝔼M[SM] the non-stationary sampling radius.

We will show in Section A.1 a concentration result for the test statistic for any M0,M1 in the model class 𝒞 (inspired by the analysis of non-stationary sampling on the preferential attachment model) described below in the following definition. {definition}[Bounded-degree model class] Let 𝒞=𝒞m, for any fixed m1, denote the class of dynamic random graph models M taking the following form: for each time step t, there is a positive integer-valued random variable Γtm, independent of Gt-1 and all other Γt, denoting the number of edges to be added at timestep t between a new vertex t and vertices present in the graph at the previous timestep. Conditioned on Γt and Gt-1, the random variable Ut consists of Γt independent and identically distributed choices of vertices in [t-1], according to a probability distribution {πt,v}v=1t-1 (note that a vertex may be chosen multiple times in a given timestep, which would yield a multigraph). We call 𝒞m the class of bounded-degree models. For instance, in preferential attachment, Γt=m with probability 1, and πt,v is simply degt-1(v)2m(t-1), so that pt,Ut=vUtπt,v.

We further stipulate the following conditions. As will be spelled out explicitly in Example 3, these are inspired by the basic properties of the preferential attachment model that imply concentration and tail bounds on its degree sequence.

{definition}

[Further natural model class conditions]

  1. (i)

    Each πt,v is dependent on Gt-1 only through a random variable Yt,v, which, conditioned on Gt-1, is independent of any other Yt,v with tt and v. In other words, Gt-1Yt,vpt,v forms a Markov chain (cf. [Cover and Thomas(2006)]).

  2. (ii)

    There is a positive constant Δ such that, for each t, at most Δ vertices satisfy Yt,v-Yt-1,v0.

  3. (iii)

    We have πt,t-1=Θ(1/t), uniformly in t, with probability 1.

  4. (iv)

    Only vertices chosen in a given timestep can increase significantly in conditional probability: there exist some constants 0<c1,c2, with c2<1, such that, for any timestep t, if a vertex v is not chosen for connection, then

    πt+1,v-πt,vc1πt,v/t. (10)

    If, on the other hand, v is chosen for connection, then we only require that

    πt+1,v-πt,vc2(1-πt,v)/t. (11)
{example}

The linear preferential attachment model is easily seen to be contained in 𝒞=𝒞m, taking Yt,v=degt(v). The second constraint follows for this model because only at most m vertices’ degrees change after a given time step. The third constraint follows because the degree of vertex t immediately after it is added is m, the parameter of the model. That is, πt,t-1=m2m(t-1)=12(t-1)=Θ(1/t). The final constraint can be seen as follows. If a vertex v is not chosen in timestep t, then the change in its conditional probability at time t+1 is given by

πt+1,v-πt,v=degt(v)2mt-degt(v)2m(t-1)=degt(v)2m(1t-1t-1)=-πt,vt(1+O(t-2)). (12)

On the other hand, if a vertex is chosen (say, exactly once) at timestep t, then

πt+1,v-πt,v=degt-1(v)+12mt-degt-1(v)2m(t-1)=-πt,vt(1+O(t-2))+12mt. (13)

For another example, consider the uniform attachment model with parameter m. In this case, πt,v=1t-1 for any v. We may take Yt,v to be degt-1(v) (though πt,v is trivially independent of Gt-1), so that conditions (i) and (ii) are trivially satisfied, as is condition (iii). Since, in any timestep, we have πt+1,v-πt,v=1t-1t-1=-1t(t-1), condition (iv) is satisfied as well. Mixtures of preferential and uniform attachment can additionally be seen to fit into this model class.

Additionally, models involving preferential attachment to vertices based, e.g., on the number of triangles in which they participate are also in 𝒞. While many natural models (such as nonlinear preferential attachment and the standard duplication-divergence model) are not obviously contained in this class, similar results to ours (concerning our proposed test statistic) nonetheless may be shown to hold with a minor generalization of our model class.

We note that conditions (iii) and (iv) together imply an important property of models in the class: for a model M, let NtM(q) denote the number of vertices v satisfying πt,vM=q (note that this is a random variable). Furthermore, let RM,t(q) denote the following set:

RM,t(q)={v:πt,vMq}. (14)

Then conditions (iii) and (iv) together imply that, for any M0,M1𝒞, possibly with M0=M1, we have that

q1𝔼M1[NtM0(x)]𝑑x=𝔼M1[|RM0,t(q)|]C(qt)1+γ, (15)

for some γ<-2, some C>0, and all q[0,1], tn. Intuitively, the function NtM0(q) is conceptually related to the degree sequence of the graph at time t, and, in preferential attachment, it is exactly the degree sequence. Thus, this inequality is akin to a power law degree sequence tail.

Inequality (15) can be seen using the following reasoning: we will prove Theorem 3 below, which will allow us to upper bound |RM1,t(q)|, for any model M1, in terms of |RM0,t(q)|, where M0 is some other model. In particular, we will choose M0 to be preferential attachment, and so this will allow us to upper bound |RM1,t(q)| in terms of the degree sequence tail of preferential attachment graphs.

{theorem}

Consider two models M0,M1 satisfying conditions (iii) and (iv) of Definition 3 with constants cb,k, for b{0,1} and k{0,1,2}. Suppose, further, that M0 satisfies inequalities (10) and (11) with asymptotic equality (as t), and that c1,kc0,k for each k{1,2}. Then for any sequence of vertex choices v1,,vt,, we have that there exists some C>0 for which

RM1,t(Cq)RM0,t(q). (16)

Therefore,

|RM1,t(Cq)||RM0,t(q)|. (17)
{proof}

We note that if all vertices v satisfy

πv,tM1Cπv,tM0, (18)

then vRM1,t(Cq) implies that πv,tM1Cq, which implies that

πv,tM0q, (19)

so that vRM0,t(q), and the inclusion claimed by the theorem is verified.

It thus remains to verify (18). We can do this by induction on t. At time t, when vertex t appears, our initial condition says that

πt,tM1=c1,0/t=c1,0c0,0c0,0t=c1,0c0,0c0,0/t. (20)

so we can take C=c1,0/c0,0, and this verifies the base case.

Now, for the inductive hypothesis, we have that for any vertex v,

πv,tM1Cπv,tM0, (21)

so, for a vvt,

πv,t+1M1Cπv,tM0+c1,1πv,tM1/tCπv,t+1M0 (22)

as long as c1,1c0,1. Similarly, as long as c1,2c0,2, the same inequality holds for v=vt. To conclude the proof of (15), we note that the method of proof used to establish upper bounds on the expected degree sequence for preferential attachment applies for arbitrary fixed values of c1>0 and c2(0,1).

As hinted above, conditions (iii) and (iv) are important in that they imply (15). This motivates the definition of a more general model class, this time parametrized by a model M0. {definition}[More general model class] Suppose that M is a bounded-degree (with degree bound m>0) model satisfying conditions (i) and (ii) of Definition 3, in addition to (15) for M0=M1=M. We define the model class 𝒞M,m=𝒞M to be the set of models M again satisfying (i) and (ii) of Definition 3, and also (15) with M0=M,M1=M and M0=M,M1=M. In what follows, for simplicity, we will say that a pair of models (M0,M1) satisfies Definition 3 if M0 is as in the definition and M1𝒞M0.

We arrive at our main result. For a pair of models M0 and M1, it will be phrased in terms of dn(M0,M1) as well as the sum of their non-stationary sampling radii. Intuitively, the non-stationary sampling radius of a model measures the fidelity with which non-stationary sampling of a sample trajectory from a model can estimate the transition probabilities of the model itself. Thus, to distinguish two models via non-stationary sampling, it is required that they be well-separated and that they be estimable via non-stationary sampling. The expressions in the theorem capture this more precisely.

{theorem}

[Distinguishability for a natural model class] Let 𝒞 be as in Definition 3. Let M0,M1𝒞 be any pair of models such that

dn(M0,M1)-𝔼M0[SM0]-𝔼M1[SM1]>Dn, (23)

where D is the constant input given in the algorithm. Alternatively, let (M0,M1) satisfy Definition 3, again satisfying the inequality (23).

Then the test based on the statistic S defined in (4), with C(n) and M(n) both chosen to be Θ(n), succeeds with probability 1-Θ(n-δ) for some δ=δ(D)>0.

In order to estimate 𝔼M[SM], we will also show that a model’s non-stationary sampling radius can be efficiently estimated from samples, provided that it is a member of 𝒞. {theorem}[Estimability of the non-stationary sampling radius] Let M𝒞. Then the following concentration result holds, again in the setting where M(n) and C(n) are Θ(n): for any c>0, there exists some ϵ>0 for which

PrM[|SM-𝔼M[SM]|cn]=O(n-ϵ). (24)

This theorem implies that one may estimate the non-stationary sampling radius of M with high probability up to an arbitrarily small multiplicative error by taking a single sample trajectory and computing SM on it.

3.1 Running time and sample complexity upper bounds

Here we consider the running time of the calculation of the proposed test. We stress that the focus of this paper is not on algorithmic issues, but rather on conditions under which the error probability of the test can be shown to decay to 0. However, the running time is polynomial in all relevant parameters, provided that the conditional distributions of M0 can be computed in polynomial time. {theorem}[Running time of the proposed test] Suppose that the probability distribution ptM0 can be computed in time X(t). Then a naive algorithm for computing SM0(G1,,Gn) works in time Θ(nX(n)+n2). {proof} A simple way to perform the computation is to compute each term of the sum defining the test statistic independently of any other one. There are M(n)=Θ(n) such terms, and, for each term, computing the total variation distance requires at most X(n)+Θ(n) time. The result is that the running time is at most O(nX(n)+n2). As an example, computing the conditional distribution at each timestep t takes time X(t)=O(t) under uniform and preferential attachment models. Therefore, the running time of the above algorithm for such models is O(n2).

Regarding sample complexity, we may consider a sample to correspond to a graph at any timestep that we use to compute the test statistic. Since the intervals corresponding to different probe points can overlap substantially, graphs corresponding to certain timesteps may appear several times in the test statistic calculation. However, these only count as one sample, and so we have a sample complexity of at most O(n) (it is likely that this can be improved). We stress that our theorems remain nontrivial and valuable, because they show conditions under which the probability of error decays to 0 – a priori, it need not be the case that this happens, even with full information about the graph trajectory.

4 Experimental results

We empirically investigate the non-stationary sampling radius for a certain subset of models. Since this quantity is crucial for our theoretical guarantees, it is of interest to build intuition about it.

We consider a mixture model interpolating between uniform and preferential attachment in the case m=1. Namely, at each step, the conditional probability of vertex v[t-1] is given by πt,v=βt-1+(1-β)degt-1(v)2(t-1), for a parameter β[0,1]. Note that β=0 corresponds to preferential attachment, while β=1 corresponds to uniform.

We plotted estimates for the non-stationary sampling radius of several models as β ranges from 0 to 1. The result is in Figure 1. We see a clear linear trend, with the quantity minimized at β=0 (preferential attachment). On this basis, and given the close relation between the empirical distribution coming from non-stationary sampling and the degrees of vertices, we have the following conjecture. {conjecture} The model in 𝒞 with the minimum non-stationary sampling radius is preferential attachment.

Figure 1: Plot of empirical estimates of 𝔼M[SM] for mixtures of uniform and preferential attachment with various values of the mixing parameter. Preferential attachment corresponds to mixing parameter β=0, while uniform corresponds to β=1.0. Here, n=90. The average is over 40 samples.

We additionally have results applying our test to the case of distinguishing uniform attachment (playing the role of M0) from preferential attachment. See Figure 2. The precision jumps significantly at n=10.

Figure 2: Precision of our test distinguishing uniform attachment from preferential as n increases to 150 (here, precision is the fraction of samples declared to be from preferential attachment, normalized by the number that, indeed, are). The fraction of trials in which preferential attachment was correctly identified (i.e., the recall) was always 1. For each n, 40 samples were taken, with each sample coming from uniform attachment with probability 1/2.

References

  • [Albert and Barabási(2002)] Réka Albert and Albert-László Barabási. Statistical mechanics of complex networks. Rev. Mod. Phys., 74:47–97, Jan 2002. doi: 10.1103/RevModPhys.74.47. URL https://link.aps.org/doi/10.1103/RevModPhys.74.47.
  • [Avin et al.(2008)Avin, Koucký, and Lotker] Chen Avin, Michal Koucký, and Zvi Lotker. How to explore a fast-changing world (cover time of a simple random walk on evolving graphs). In Automata, Languages and Programming, pages 121–132, Berlin, Heidelberg, 2008. Springer Berlin Heidelberg. ISBN 978-3-540-70575-8.
  • [Baum and Petrie(1966)] Leonard E. Baum and Ted Petrie. Statistical inference for probabilistic functions of finite state markov chains. Ann. Math. Statist., 37(6):1554–1563, 12 1966. doi: 10.1214/aoms/1177699147. URL https://doi.org/10.1214/aoms/1177699147.
  • [Bollobás et al.(2001)Bollobás, Riordan, Spencer, and Tusnády] Béla Bollobás, Oliver Riordan, Joel Spencer, and Gábor Tusnády. The degree sequence of a scale-free random graph process. Random Structures & Algorithms, 18(3):279–290, 2001. doi: 10.1002/rsa.1009. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/rsa.1009.
  • [Borgs et al.(2017)Borgs, Chayes, Cohn, and Veitch] Christian Borgs, Jennifer T. Chayes, Henry Cohn, and Victor Veitch. Sampling perspectives on sparse exchangeable graphs. arXiv e-prints, art. arXiv:1708.03237, Aug 2017.
  • [Cherapanamjeri and Bartlett(2019)] Yeshwanth Cherapanamjeri and Peter L. Bartlett. Testing symmetric markov chains without hitting. In COLT, 2019.
  • [Cover and Thomas(2006)] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing). Wiley-Interscience, New York, NY, USA, 2006. ISBN 0471241954.
  • [Daskalakis et al.(2018)Daskalakis, Dikkala, and Gravin] Constantinos Daskalakis, Nishanth Dikkala, and Nick Gravin. Testing symmetric markov chains from a single trajectory. In Proceedings of the 31st Conference On Learning Theory, volume 75 of Proceedings of Machine Learning Research, pages 385–409. PMLR, 06–09 Jul 2018. URL http://proceedings.mlr.press/v75/daskalakis18a.html.
  • [Diakonikolas et al.(2017)Diakonikolas, Gouleakis, Peebles, and Price] Ilias Diakonikolas, Themis Gouleakis, John Peebles, and Eric Price. Sample-optimal identity testing with high probability. Electronic Colloquium on Computational Complexity (ECCC), 24:133, 2017.
  • [Ghoshdastidar et al.(2017)Ghoshdastidar, Gutzeit, Carpentier, and von Luxburg] D. Ghoshdastidar, M. Gutzeit, A. Carpentier, and U. von Luxburg. Two-sample hypothesis testing for inhomogeneous random graphs. preprint, 2017. arXiv preprint (arXiv:1707.00833).
  • [Hofstad(2016)] Remco van der Hofstad. Random Graphs and Complex Networks, volume 1 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2016. doi: 10.1017/9781316779422.
  • [Jeong et al.(2003)Jeong, Néda, and Barabási] H. Jeong, Z. Néda, and A. L. Barabási. Measuring preferential attachment in evolving networks. EPL (Europhysics Letters), 61(4):567, 2003. URL http://stacks.iop.org/0295-5075/61/i=4/a=567.
  • [Nakagawa and Kanaya(1993)] Kenji Nakagawa and Fumio Kanaya. On the converse theorem in statistical hypothesis testing for markov chains. IEEE Trans. Information Theory, 39:629–633, 1993.
  • [Natarajan(1985)] S. Natarajan. Large deviations, hypotheses testing, and source coding for finite markov chains. IEEE Transactions on Information Theory, 31(3):360–365, May 1985. ISSN 0018-9448. doi: 10.1109/TIT.1985.1057036.
  • [Newman(2010)] Mark Newman. Networks: An Introduction. Oxford University Press, Inc., New York, NY, USA, 2010. ISBN 0199206651, 9780199206650.
  • [Orbanz and Roy(2013)] Peter Orbanz and Daniel Roy. Bayesian models of graphs, arrays and other exchangeable random structures. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37, 12 2013. doi: 10.1109/TPAMI.2014.2334607.
  • [Ryabko(2017)] Daniil Ryabko. Hypotheses testing on infinite random graphs. In ALT, 2017.
  • [Sreedharan et al.(2019)Sreedharan, Magner, Szpankowski, and Grama] J.K. Sreedharan, A. Magner, W. Szpankowski, and A. Grama. Inferring temporal information from a snapshot of a dynamic network. Nature Scientific Reports, 9:3057, 2019.
  • [Veitch and Roy(2016)] Victor Veitch and Daniel M. Roy. Sampling and estimation for (sparse) exchangeable graphs. ArXiv, abs/1611.00843, 2016.
  • [Wasserman(2010)] Larry Wasserman. All of Statistics: A Concise Course in Statistical Inference. Springer Publishing Company, Incorporated, 2010. ISBN 1441923225, 9781441923226.
  • [Wolfer and Kontorovich(2019)] Geoffrey Wolfer and Aryeh Kontorovich. Minimax testing of identity to a reference ergodic markov chain. ArXiv, abs/1902.00080, 2019.

Appendix A Proofs

In this section we prove Theorem 3 and Theorem 3.

A roadmap of the proof of Theorem 3 is as follows: we show that, regardless of which of M0 or M1 generated the observed trajectory, the test statistic S is well-concentrated around its mean (in both cases, this is typically Θ(n)). Furthermore, the hypothesis (23) allows us to conclude that 𝔼M0[S] and 𝔼M1[S] are well-separated. Thus, we are able to find an appropriate decision boundary for our test. The concentration of the test statistic boils down to concentration of each of its terms, which we are able to establish by rewriting them as an appropriate double integral in terms of a counting function (Nt(p,q) below) which is analogous to the degree sequence in preferential attachment. We are then able to adapt techniques used to establish concentration of the degree sequence to complete the proof. The case for general m1 (where we recall that m is an upper bound on the number of edges that may be added in a given timestep) is a corollary of the case for m=1, and so below we handle the case of m=1. This simplifies the notation somewhat: updates are now single vertices, and so we can use pt,v in place of πt,v, etc.

A.1 Proof of Theorem 3

The analysis of the test boils down to showing concentration of the test statistic when the observed graph trajectory is from either M0 or M1. We will start by proving the following.

{proposition}

[Concentration of dTV(μ^t,pt)] We have that for any models M0,M1 as in the setting of Theorem 3, there is some C>0 such that for any sufficiently small ϵ1>0, there exists ϵ2>0 for which

PrM1[|dTV(μ^t,ptM0)-𝔼M1[dTV(μ^t,ptM0)]|>t-ϵ1]Ct-ϵ2. (25)

In order to show Proposition A.1, we rewrite the total variation distance as follows: in general, consider discrete random variables X and Y. Then

dTV(X,Y)=120101N(p,q)|p-q|𝑑q𝑑p, (26)

where N(p,q) is the number of elements x in the common domain of X,Y for which Pr[X=x]=p and Pr[Y=x]=q, and the integrals are with respect to discrete (Dirac) measures supported on the set of values for which N(p,q) can possibly be nonzero. We see, then, that concentration of dTV(μ^t,ptM0) will follow from concentration of Nt(p,q) (now parameterized by t), where, in our case, the random variables in question are distributed according to μ^t (which will correspond to the p integrating variable) and ptM0=pt (which will correspond to the q integrating variable). More explicitly, Nt(p,q) is the number of vertices v in the interval [t-1] whose empirical probability μ^t,v is equal to p and whose conditional probability at time t according to model M0 is equal to q. Note that Nt(p,q) is a random variable, since it depends on μ^t, which itself depends on the observed trajectory. It is interesting to observe at this point that Nt(p,q) is closely related, in the preferential attachment case, to the degree distribution, so ideas used in proving concentration of the degree sequence in that model can be generalized to prove concentration for Nt(p,q). The next lemma gives the necessary concentration result. {lemma}[Concentration for Nt(p,q)] Consider a model M1𝒞. Then

PrM1[|Nt(p,q)-𝔼M1[Nt(p,q)]|x]2exp(-x2cC(n)), (27)

for arbitrary p,q[0,1], some positive constant c, and we recall that C(n)=Θ(n). To prove this, we will use the method of bounded differences. Define the martingale Zs, for s{t,t+1,,t+C(n)}, as

Zs=𝔼M1[Nt(p,q)|Gs]. (28)

The next lemma establishes the necessary bound on the martingale differences. {lemma}[Martingale difference bound] Let t[n], and consider the setting of Proposition A.1. Then for all s[t+1,,t+C(n)],

|Zs-Zs-1|2Δ, (29)

where Δ is the constant in the specification of 𝒞 in Definition 3. {proof} We consider the following pair of dynamic mechanisms: M1 and M1, which are coupled as follows: at times 1,,s-1, they are equal. At times s and beyond, they evolve independently but according to M1. We will show that we can rewrite the martingale differences in terms of M1 and M1 as follows:

Zs-Zs-1=v=1t(PrM1[μ^t,v=p,pt,v=q|Gs]-PrM1[μ^t,v=p,pt,v=q|Gs]). (30)

We will let Gs be the graph at time s according to M1 and Gs be the graph at time s according to process M1. To prove the identity, by linearity of expectation, we have

Zs=v=1tPrM1[μ^t,v=p,pt,v=q|Gs]. (31)

An expression for Zs-1 can similarly be written, with the only change being in the conditioning: Gs-1 instead of Gs. Now, conditioning on Gs has the same effect on random variables from M1 as conditioning on Gs-1, since the choices made by M1 are independent of Gs,Gs+1,. This gives us (30).

It remains to upper bound the right-hand side of (30). We note that

PrM1[μ^t,v=p,pt,v=q|Gs]=𝔼M1[Pr[μ^t,v=p,pt,v=q|Gs,Gs]]. (32)

I.e., we have conditioned on the choice at time s according to M1. We thus have

PrM1[μ^t,v=p,pt,v=q|Gs]-PrM1[μ^t,v=p,pt,v=q|Gs] (33)
=𝔼M1[PrM1[μ^t,v=p,pt,v=q|Gs]-PrM1[μ^t,v=p,pt,v=q|Gs,Gs]]. (34)

The quantity inside the expectation measures the difference in probabilities after the choice at time s is made according to each mechanism. Using the conditional independence properties of the model class, at most 2Δ vertices v are such that this difference is nonzero. More precisely, let W,W be the sets of vertices v in the two respective trajectories such that Ys,vYs-1,v. Since all future vertex choices relating to vertices not in these sets are conditionally independent of them given Gs,Gs, the above difference is only nonzero for vertices in W and W. By the model assumptions, their cardinalities are both at most Δ. This completes the proof.

With Lemma A.1 in hand, we apply the Azuma-Hoeffding inequality to prove Lemma A.1. The probability upper bound in Lemma A.1 is given by

2exp(-x2s=t+1t+C(n)(2Δ)2)=2exp(-x2(2Δ)2C(n)).

We are finally ready to prove Proposition A.1. {proof}[Proof of Proposition A.1] We will bound the integral representation (26) of the total variation distance.

We will start by ruling out large values of q. In particular, set q=Ω(tδ), where δ=-1-γ-ϵγ, for an arbitrarily small positive ϵ. That is, we wish to upper bound the integral

|01q01Nt(p,q)-𝔼M1[Nt(p,q)]dqdp|. (35)

By (15), we have

q01𝔼M1[Nt(p,q)]𝑑qq01𝔼M1[NM0,t(q)]𝑑q=O((qt)1+γ), (36)

where we recall that γ<-2. Similarly, by Markov’s inequality, we have that for any x>0,

Pr[q=q01Nt(p,q)dq>x]Pr[q=q01NM0,t(q)dq>x]q=q01𝔼M1[NM0,t(q)]𝑑qxO((qt)1+γ)x. (37)

In particular, we will choose x=Θ((qt)1+γ+ϵ), which has the following consequence:

Pr[q=q01Nt(p,q)dq>x]=O((qt)-ϵ), (38)

and since q=Θ(t-δ) with δ>-1, this tends to 0 polynomially fast in t. All of this has the consequence that

Pr[|01q01Nt(p,q)-𝔼M1[Nt(p,q)]dqdp|>(qt)1+γ+ϵ]O((qt)-ϵ). (39)

We can thus ignore the range where q=Ω(t-1-γ-ϵγ) for arbitrary fixed positive ϵ. Now we focus on the small q range: δ-1-γ-ϵγ.

We will split the integral as follows:

0t-1-γ-ϵγ01|Nt(p,q)-𝔼M1[Nt(p,q)]||p-q|𝑑p𝑑q (40)
=0t-1-γ-ϵγ0p0|Nt(p,q)-𝔼M1[Nt(p,q)]||p-q|𝑑p𝑑q (41)
 +0t-1-γ-ϵγp01|Nt(p,q)-𝔼M1[Nt(p,q)]||p-q|𝑑p𝑑q, (42)

where p0=(1+c)t-1-γ-ϵγ, with c some small positive constant. The first integral on the right-hand side can be handled by noticing that |p-q|=O(t-1-γ-ϵγ) throughout, and by Lemma A.1 (with x=t1/2+ε1), with probability exponentially close to 1 (with respect to t), we have

|Nt(p,q)-𝔼M1[Nt(p,q)]|O(t1/2+ϵ1). (43)

Thus, the entire first integral is O(t1/2+ϵ1-1+γ+ϵγ), which tends to 0 because γ<-2.

To handle the second integral (where |p-q| may be Θ(1)), we notice that |p-q|1, and then, with high probability (by Markov’s inequality), |Nt(p,q)-𝔼M1[Nt(p,q)]|𝔼M1[Nt(p,q)]tϵ2. This last inequality can be seen more precisely as follows: since Nt(p,q) is a cardinality, it is lower bounded by 0. Thus, if Nt(p,q)<𝔼M1[Nt(p,q)], the difference between the two must be at most 𝔼M1[Nt(p,q)], which is certainly less than 𝔼M1[Nt(p,q)]tϵ2. On the other hand, we can see by Markov’s inequality that

Pr[Nt(p,q)>𝔼M1[Nt(p,q)]tϵ2]𝔼M1[Nt(p,q)]𝔼M1[Nt(p,q)]tϵ2=t-ϵ2.

We can further upper bound by 𝔼M1[Nt(p,q)]𝔼M1[Nμ^t(p)] (where Nμ^t(p) is the number of vertices having conditional probability p according to the distribution μ^t), and since pp0, 𝔼M1[Nμ^t(p)]=O(t-ϵ3), which is a consequence of the model class definition. In particular, the expected number of vertices v with pt,vM1=p is at most Ct-ϵ, because of (15) and the fact that pp0 (in exactly the same way that we concluded (36)). This can be used to show that 𝔼M1[Nμ^t(p)]=O(t-ϵ3), using a proof analogous to the one that upper bounds the maximum degree for preferential attachment graphs.

Then, we are left with the double integral

t-ϵ3+ϵ20t-1-γ-ϵγ011𝑑q𝑑p=O(t-1-γ-ϵγ+ϵ2). (44)

Recalling that γ<-2, we can make the exponent negative by setting ϵ and ϵ2 to be small enough, so the right-hand side of the above expression tends to 0. More precisely, since ϵ>0, the exponent is at most -1γ-1+ϵ2. Since γ<-2, we have -1γ<1/2, so the exponent is less than -1/2+ϵ2. Thus, if we choose ϵ2<1/2, the resulting exponent is negative. This yields the claimed result.

Now, we use Proposition A.1 to complete the proof of Theorem 3. We shall show that the test statistic is well concentrated. In particular, we want to show the following.

{theorem}

[Concentration of the test statistic] We have, for any models M0,M1𝒞 (or M0, M1 as in Definition 3), and for any b{0,1} and any small positive constant c, that

PrMb[|SM0-𝔼Mb[SM0]|>cn]O(n-ϵ1), (45)

for some ϵ1. {proof} We note that the terms Sj=dTV(μ^rj,prjM0) of the sum defining S may be heavily dependent, due to the fact that the sampling intervals are likely to overlap heavily. To circumvent this, define X to be the number of probe points rj for which

|Sj-𝔼Mb[Sj]|>c/2, (46)

where we recall that Sj is the term in the sum defining S corresponding to the jth probe point. By Markov’s inequality, for arbitrarily small ϵ1>0,

Pr[X>n1-ϵ1]𝔼[X]n1-ϵ1. (47)

We can calculate 𝔼[X] using linearity of expectation, and we see from Proposition A.1 that each term decays at least polynomially fast in n to 0. We thus have

Pr[X>n1-ϵ1]n-ϵ2, (48)

for some small ϵ1,ϵ2>0. Now, if Xn1-ϵ1, then |S-𝔼[S]|O(n1-ϵ1)+cn/2(1-O(n-ϵ1)), which implies the desired result.

Now, to finish the analysis of the error of the test, consider first the case where the sample trajectory comes from M0. Then with probability 1-O(n-ϵ1), we have that |S-𝔼M0[SM0]|<cn for arbitrary fixed c>0, so that the test correctly outputs 0.

When the sample trajectory comes from M1, note that for any small enough constant ϵ3>0, there is some ϵ1>0 such that with probability 1-O(n-ϵ1), |S-𝔼M1[S]|<ϵ3n. Then, by the condition in the theorem,

|𝔼M1[S]-𝔼M0[S]|dn(M0,M1)-𝔼M0[SM0]-𝔼M1[SM1]Dn, (49)

where D is the constant alluded to in the theorem statement. Then, by the reverse triangle inequality, |S-𝔼M0[S]|||𝔼M0[S]-𝔼M1[S]|-|𝔼M1[S]-𝔼M0[S]||Dn-ϵ3n, and for small enough ϵ3, this is Dn/2. Thus, |S-𝔼M0[S]|>α(n) with high probability.

Theorem 3 is an immediate invocation of Theorem A.1.

We note that the proof of concentration for m1 follows by generalizing Lemma A.1. In particular, this may be done by induction on m, where m=1 forms the base case. We introduce some convenient notation: let Nt(j)(p,q) denote the number of multisets S of vertices from [t-1] of cardinality at most j whose empirical probability (according to the choices of the vertices in the interval [t,t+C(n)]) is p and for which vSπt,v=q (where the product takes into account the number of times each v occurs in S). In particular, Nt(m)(p,q)=Nt(p,q).

Now, we can write Nt(m)(p,q) by considering the conditional probability assigned to the first vertex chosen in teach timestep:

Nt(m)(p,q)=ppqqNt(1)(p,q)Nt(m-1)(p/p,q/q)𝑑p𝑑q. (50)

As above, we may neglect all but small values of p and q (on the order of O(1/t)), and the concentration of the remaining integral follows by induction.