Graph Neural Networks (GNNs) have become a topic of intense research recentlydue to their powerful capability in high-dimensional classification andregression tasks for graph-structured data. However, as GNNs typically definethe graph convolution by the orthonormal basis for the graph Laplacian, theysuffer from high computational cost when the graph size is large. This paperintroduces the Haar basis, a sparse and localized orthonormal system for graph,constructed from a coarse-grained chain on the graph. The graph convolutionunder Haar basis --- the Haar convolution can be defined accordingly for GNNs.The sparsity and locality of the Haar basis allow Fast Haar Transforms (FHTs)on graph, by which a fast evaluation of Haar convolution between the graphsignals and the filters can be achieved. We conduct preliminary experiments onGNNs equipped with Haar convolution, which can obtain state-of-the-art resultsfor a variety of geometric deep learning tasks.
Quick Read (beta)
Haar Transforms for Graph Neural Networks
Graph Neural Networks (GNNs) have become a topic of intense research recently due to their powerful capability in high-dimensional classification and regression tasks for graph-structured data. However, as GNNs typically define the graph convolution by the orthonormal basis for the graph Laplacian, they suffer from high computational cost when the graph size is large. This paper introduces the Haar basis, a sparse and localized orthonormal system for graph, constructed from a coarse-grained chain on the graph. The graph convolution under Haar basis — the Haar convolution can be defined accordingly for GNNs. The sparsity and locality of the Haar basis allow Fast Haar Transforms (FHTs) on graph, by which a fast evaluation of Haar convolution between the graph signals and the filters can be achieved. We conduct preliminary experiments on GNNs equipped with Haar convolution, which can obtain state-of-the-art results for a variety of geometric deep learning tasks.
Convolutional neural networks (CNNs) have been a very successful machinery in many high-dimensional regression and classification tasks on Euclidean domains [1, 2]. Recently, its generalization to non-Euclidean domains, known as geometric deep learning, has attracted growing attention, due to its great potential in pattern recognition and regression for graph-structured data, see .
The Laplacian, which carries the structural features of the data, is a second-order isotropic differential operator that admits a natural generalization to graphs and manifolds. In GNNs, input signals are convoluted with filters under an orthonormal system for the Laplacian. However, as the algebraic properties of regular Euclidean grids are lost in general manifolds and graphs, FFTs for the Laplacian are not available. This leads to the issue that the computation of convolution for graph signal is not always efficient, especially when the graph dataset is large.
In this paper, we introduce an alternative orthonormal system on graph, the Haar basis. It then defines a new graph convolution for GNNs — Haar convolution. Due to the sparsity and locality of the Haar basis, fast Haar transforms (FHTs) can be achieved on graph-structured data. This would significantly improve the computational efficiency of GNNs as the Haar convolution guarantees the linear computational complexity. We apply Haar convolution to GNNs and give a novel type of deep convolutional neural networks on graph — HANet. Numerical tests on real graph datasets show that HANet achieves good performance and computational efficiency in classification and regression tasks. The major contributions of the paper are as follows.
The Haar basis is introduced for graph. Both theoretical analysis and real examples of the sparsity and locality are given. With these properties, the fast algorithms for Haar transforms (FHTs) are proposed.
The Haar convolution under the Haar basis is given. By FHTs, the computational cost for the Haar convolution is proportional to the size of the data on the graph. Using Haar basis for Haar convolution, the comprehensive geometric information of the graph data can be well preserved.
GNN with Haar convolution (HANet) is proposed. We demonstrate experimentally that HANet provides great efficiency and achieves good performance on a broad range of high-dimensional regression and classification problems on graphs.
II Related Work
Graph Neural Networks. Developing deep neural networks for graph-structured data has received extensive attention in recent years [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]. Bruna et al.  first propose graph convolution, which is defined by graph Fourier transforms under the orthogonal basis of the graph Laplacian. The graph convolution uses Laplacian eigendecomposition which is computationally expensive. Defferrard et al.  approximate smooth filters in the spectral domain by Chebyshev polynomials. Kipf and Welling  simplify the convolutional layer by exploiting first-order Chebyshev polynomial for filters. Following this line, several acceleration methods for graph convolutional networks are proposed [19, 20]. Graph wavelet neural networks  replace graph Fourier transform by graph wavelet transform in the graph convolution, where Chebyshev polynomials are used to approximate the graph wavelet basis . Although GWNN circumvents the Laplacian eigendecomposition, the matrix inner-product operations are nevertheless not avoidable in wavelet transforms for convolution computation. Graph convolutional networks with attention mechanisms [23, 24] can effectively learn the importance between nodes and its neighbors, which is more suitable for node classification task rather than graph regression ones. But much computational and memory cost is requested to perform the attention mechanism in the convolutional layers. Some GNN models [25, 26, 27] use multi-scale information and higher order adjacency matrix to define graph convolution. To increase the scalability of the model for large-scale graph, Hamilton et al.  propose the framework Graph-SAGE with sampling and a neural network based aggregator over a fixed size node neighbor. Diffusion convolutional neural networks  is developed by using diffusion operator for graph convolution. MoNet  introduces a general methodology to define spatial-based graph convolution by the weighted average of multiple weighting functions on neighborhood.  provides a unified framework, the Message Passing Neural Networks (MPNNs), by which some existing GNN models are incorporated. Xu et al.  present a theoretical analysis for the expressive power of GNNs and propose a simple but powerful variation of GNN, the graph isomorphism network. In , the authors propose the path integral based convolution for GNNs (PAN) which generalizes the graph Laplacian to maximal entropy transition matrix derived from a path integral formalism.
Haar Basis and Fast Algorithms. Haar basis rooted in the theory of Haar wavelet basis as first introduced by Haar , is a special case of Daubechies wavelets , and later developed onto graph by Belkin et al. , see also . The fast computation for sparse matrix multiplication is well-studied (see e.g. ). The sparse Fourier transforms by which sparse Haar transforms can be developed are seen in [39, 40, 41].
III Graph convolution with Haar basis
III-A Graph Fourier transform
Bruna et al.  first define the graph convolution based on spectral graph theory . An un-directed weighted graph is a triplet with vertices , edges and weights . Let the real-valued space on the graph with inner product . The (normalized) eigenvectors of the graph Laplacian forms an orthonormal basis for . Denoted by the number of vertices of the graph and let be the (Fourier) base matrix whose columns form the Fourier basis for . The graph convolution is given by
where is the adjoint discrete Fourier transform of , is the forward discrete Fourier transform of on , is the element-wise Hadamard product.
While graph convolution defined in (1) is conceptually important, it has some limitations in practice. First, the base matrix is obtained by using eigendecomposition of the graph Laplacian in the sense that , where is the diagonal matrix of corresponding eigenvalues. The computational complexity is proportional to , which is impractical when the number of vertices of the graph is quite large. Second, the computation of the forward and inverse graph Fourier transforms (e.g. and ) have computational cost due to the multiplication by (dense) matrices and . In general, there is no fast algorithms for the graph Fourier transforms. Third, filters in the spectral domain cannot guarantee the localization in the spatial (vertex) domain, and parameters need to be tuned in the convolutional layer with filters (hidden nodes) and features for each vertex.
To alleviate the cost of computing the graph Fourier transform, Chebyshev polynomials  are used to construct localized polynomial filters for graph convolution, which is called ChebNet. Kipf and Welling  further simplify the ChebNet to obtain graph convolutional networks (GCNs). However, such a polynomial-based approximation strategy may lose some information in the spectral graph convolutional layer, and matrix multiplication is still not avoidable as FFTs is not available for graph convolution. Thus, the graph convolution in this scenario is also computationally expensive, especially for dense graph of large size. We propose an alternative orthonormal basis that allows fast computation for the corresponding graph convolution, which then improves the scalability and efficiency of existing graph models. The basis we use is the Haar basis on a graph. The Haar basis replaces the matrix of eigenvectors in (1) and forms a highly sparse matrix, which reflects the clustering information of the graph. The sparsity of the Haar transform matrix allows fast computation (in nearly linear computational complexity) of the corresponding graph convolution.
III-B Haar basis
A basis for is a set of vectors on which are linearly independent and orthogonal (i.e. if ). The construction of the Haar basis exploits a chain of the graph. For a graph , a graph is called a coarse-grained graph of if and each vertex of has a parent in . Each vertex of is called a cluster of . Let two integers , . A coarse-grained chain for is a set of graphs if and is a coarse-grained graph of for , and the top level or the coarsest level and the bottom level or the finest level. If the top level of the chain has only one node, becomes a tree. The chain gives a hierarchical partition for the graph . For details about graphs and chains, refer to examples in [42, 43, 37, 44].
Step 1. Let be a coarse-grained graph of . Each vertex is a cluster of . We define vectors on by, for ,
and for ,
where is the number of vertices of , and is the indicator function for the th vertex on given by
Then, forms an orthonormal basis for . For each , we extend to a vector on by
here means the number of vertices in whose common parent is . For , is a cluster of and can be viewed as a subset of . We let be the number of vertices in the cluster , for which we label the vertices by
where for ,
The resulting is an orthonormal basis for .
Step 2. Let be a coarse-grained chain for the graph . An orthonormal basis for is generated using (2) and (3). We then repeatedly use Step 1: for , we generate an orthonormal basis for from the orthonormal basis for the coarse-grained graph that was derived in the previous steps. We call the function at the finest level , the Haar global orthonormal basis or simply the Haar basis for associated with the chain . The orthonormal basis for , is called the associated (orthonormal) basis for the Haar basis .
For each level , the is an orthonormal basis for , and in particular, is an orthonormal basis for ; each basis is the Haar basis for the chain .
Let be a chain. If each parent of level , , contains at least two children, the number of different values of the Haar basis , , is bounded by a constant.
The Haar basis depends on the chain for the graph. If the topology of the graph is well reflected by the clustering of the chain, the Haar basis then contains the crucial geometric information of the graph. For example, by using -means clustering algorithm  or METIS algorithm  one can generate a chain that reveals desired geometric properties of the graph.
Figure 1b shows a chain with levels of a graph . Here, for each level, the vertices are given by
Figure 1a shows the Haar basis for the chain . There are in total vectors of the Haar basis for . From construction, the Haar basis and the associated basis , are closely connected: the can be reduced to and the can be reduced to . This connection would allow fast algorithms for Haar transforms as given in Algorithms 1 and 2. In Figure 1, the matrix of the Haar basis vectors on has good sparsity. With the increase of the graph size, the sparsity of the Haar basis matrix becomes more prominent. We will later demonstrate this finding in our experimental study.
III-C Haar convolution
With the Haar basis constructed in Section III-B, we can define Haar convolution as an alternative form of spectral graph convolution in (1). Let be the Haar basis associated with a chain of a graph . Denoted by the Haar transform matrix. We define by
the adjoint Haar transform for the signal on , and by
the forward Haar transform for (coefficients) vector . We call the matrix Haar transform matrix.
The Haar convolution for a filter and a signal on can be defined as
Computationally, (6) is obtained by performing forward Haar transform of the element-wise Hadamard product between adjoint Haar transform of and . Compared with the Laplacian based spectral graph convolution given in (1), the Haar convolution has several characteristics: (i) the Haar transform matrix is sparse so that the computation of or is more efficient than or ; (ii) as the Haar basis is constructed based on the chain of the graph which reflects the clustering property for vertices, the Haar convolution can potentially extract abstract features (localized components) over , which can be viewed as the learning representation for graph signal. From this perspective, Haar convolution might be more suitable for graph-level based modelling tasks (e.g. graph classification/regression) than node-level ones (e.g. semi-supervised node classification); (iii) by means of the sparsity of the Haar basis, adjoint and forward Haar transforms can be implemented by the fast algorithms that have nearly linear computational complexity (with respect to the size of ).
We can change the execution of ADHTs on the filter (e.g. ) and regard the filter as defined in the frequency domain, and can thus refine Haar convolution as .
Fast algorithms for Haar tranforms. The computation of Haar transforms can also be accelerated by using sparse matrix multiplications due to the sparsity of the Haar transform matrix. This would allow the linear computational complexity with sparsity of the Haar transform matrix. Moreover, a similar computational strategy to the sparse Fourier transforms [41, 48] can be used to have the Haar transforms achieve an even faster algorithm with complexity for graph with nodes and Haar transform matrix with non-zero elements. By the sparsity of Haar transform matrix, fast Haar transforms (FHTs) which includes adjoint Haar transform and forward Haar transform can be developed to speed up the implementation of Haar convolution. Theorems 4 and 5 in the following section show that the computational cost of adjoint and forward Haar transform can reach and . They are nearly linear computational complexity and are then called fast Haar transforms (FHTs). Meanwhile, the Haar convolution can be evaluated by FHTs in steps.
Weight sharing. We can use weight sharing in Haar convolution to reduce the number of parameters of the filter, and capture the common feature of the nodes which are in the same cluster. As the clustering contains the information of the neighbourhood, we can use the chain for weight sharing: the vertices of the graph which have the same parent at a coarser level share a parameter of the filter. Here, the coarser level is some fixed level , . For example, the weight sharing rule for chain in Figure 1b is: assign the weight for each node , on the top level, the filter (or the weight vector) at the bottom level is then . In this way, we can use the filter with two independent parameters to convolute with the input vector with components.
IV Fast algorithms under Haar basis
For the Haar convolution introduced in Definition 6, we can develop an efficient computational strategy by virtue of the sparsity of the Haar basis, as mentioned. Let be a chain of the graph . For convenience, we label the vertices of the level- graph by .
Fast computation for adjoint Haar transform . The adjoint Haar transform in (4) can be computed in the following way. For , let be the number of children of , i.e. the number of vertices of which belongs to the cluster , for . For , let for . For and , we define the weight factor for by
Let . Then, the weighted chain is a filtration if each parent in the chain has at least two children. See e.g. [37, Definition 2.3].
Let be the Haar basis for the filtration of a graph . We define the weighted sum for by
and for and ,
For each vertex of , the is the weighted sum of the at the level for those vertices of whose parent is .
The adjoint Haar transform can be evaluated by the following theorem.
Let be the Haar basis for the filtration of a graph . Then, the adjoint Haar transform for the vector on the graph can be computed by, for ,
By the relation between and ,
thus completing the proof. ∎
Fast computation for forward Haar transform . The forward Haar transform in (5) can be computed, as follows.
Let be the Haar basis for a filtration (,) of the graph and , is the associated bases at . Then, the forward Haar transform for vector in can be computed by, for ,
where for , is the parent of at level , and and
where the weight factors for are given by (7).
Let for and . For , let the th vertex of . For , there exists such that the parent at level of . By the property of the Haar basis, for each eigenvector there exists such that , is a constant for the vertices of who have the same parent at level . Then,
where the product of the weights in the third equality only depends upon the level and the vertex , and we have let
thus completing the proof. ∎
For each , let be the integer such that , where . Evaluating
Computational Complexity Analysis. Algorithm 1 gives the computational steps for the evaluation of , in Theorem 4. In the first step of Algorithm 1, the total number of summations to compute all elements of (13) is no more than ; In the second step, the total number of multiplication and summation operations is at most . Here is the constant which bounds the number of distinct values of the Haar basis (see Proposition 2). Thus, the computational steps of Algorithm 1 are .
By Theorem 5, the evaluation of the forward Haar transform can be implemented by Algorithm 2. In the first step of Algorithm 2, the number of multiplications is no more than ; in the second step, the number of summations is no more than as well; in the third step, the computational steps are ; in the last step, the total number of summations and multiplications is . Thus, the total computational steps are .
Proposition 6 shows that DFTs can recover the graph signal from the forward Haar transform . This means that forward and adjoint Haar transforms have zero-loss in graph signal transmission.
The Haar convolution can be evaluated fast by FHTs in Algorithms 1 and 2. We show the computational steps for Haar convolution in Algorithm 3. From the above discussion, the computational complexity of Algorithm 3 is . We call Algorithm 3 the fast Haar convolution (algorithm). This can be used in graph neural networks which we introduce now.
V Graph neural networks with Haar transforms
The Haar convolution of (1) can be applied to all architectures of graph neural networks. For graph classification and regression tasks, we can apply the model with convolutional layer consisting of -hidden neutrons and a non-linear activation function (e.g. ReLU): for ,
for input graph data with nodes and input features (for each vertex). Here, the feature of the input signal is convolved with the learnable filter by Haar transforms, and then all Haar-transformed features are combined as a new feature . This gives the output matrix . If we write as the diagonal matrix of filter , the convolutional layer has the compact form of the second equality in (V-A). We call the GNNs with Haar convolution in (V-A) HANet.
Weight detaching. For each layer, parameters need to be tuned. To reduce the number of parameters, we can replace the filter matrix by a unified diagonal filter matrix and a compression matrix (which is a detaching approach used in conventional CNN for extracting features). This then leads to a concise form
Then, it requires parameters to train. Recall that constructing the Haar basis uses a chain for the graph , one can implement weight sharing based on the same chain structure. Specifically, -means clustering algorithm  or METIS algorithm  can be used to generate a chain that reveals desired geometric properties of the graph. Suppose we consider a coarser level () having clusters, then all vertices in the same cluster share the common filter parameter. Similarly, the corresponding children vertices in level share the same filter parameters as used in their parent vertices, until the bottom level corresponds to the whole set of vertices of the input graph. Thus, the parameter complexity is reduced to .
The HANet can be evaluated by performing -times fast Haar convolutions (consisting of -times adjoint and forward Haar transforms). The total computational cost is . Deep GNNs with Haar convolution can be built by stacking multiple Haar convolutional layers of (15), followed by an output layer.
HANet for graph classification and regression. Graph classification and regression can be formulated as supervised learning. Given a collection of graph-structured data with labels , the objective of the classification task is to find a mapping that can classify or predict labels. The model of HANet uses a similar architecture of deep convolutional neural network which has several Haar convolutional layers and fully connected dense layers. Figure 2a shows the flowchat of HANet with multiple Haar convolutional layers: the chain and the Haar basis and the associated basis , are pre-computed; graph-structured input is Haar-convoluted with filter which is of length but with independent parameters, where is expanded from level to by weight sharing, and the output of the first layer is the ReLU of the Haar convolution of and ; the graph pooling reduces of size to of size ; and in the second Haar convolutional layer, the input is and the Haar basis is ; the following layers continue this process; the final Haar convolutional layer is fully connected by one or multiple dense layers. For classification, the softmax function is applied to the last dense layer.
HANet for node classification. In node classification, the whole graph is the only single input, where a small proportion of nodes are labeled. The output is the graph with all unknown labels predicted. We can use the network with two layers as
where and are the Haar convolutional layers
where we use the modified Haar convolution . For a graph with nodes and features, in the first Haar convolutional layer, the filter contains parameters and is extended to a matrix by weight sharing, where is the number of nodes at the coarsest level. The plays the role of weight compression and feature extraction. The first layer is activated by the rectifier and the second layer is fully connected with softmax. The , which is defined in , is the square matrix of size determined by the adjacency matrix of the input graph. This smoothing operation compensates the information lost in coarsening by taking a weighted average of features of each vertex and its neighbours. For vertices that are densely connected, it makes their features more similar and significantly improves the ease of node classification task .
V-B Technical components
Fast computation for HANet. Complexity analysis of FHTs in the previous section shows that HANet is more efficient than GNNs with Fourier basis. The graph convolution of the latter will inccur computational steps. Various strategies are proposed to improve the computational performance for graph convolution. For example, ChebNet  and GCN  use localized polynomial approximation for the spectral filters; GWNN  constructs sparse and localized graph wavelet basis matrix for graph convolution. These methods implement the multiplication between a sparse matrix (e.g. the refined adjacency matrix in GCN or the wavelet basis matrix in GWNN ) and input matrix in the convolutional layer. However, to compute either or , the computational complexity to a great extent relies on the sparse degree of or , i.e. roughly proportional to , where , , represents the percentage of non-zero elements in a square matrix. The may be significantly higher than as long as is not extremely small, indicating that our FHTs usually outperform these methods especially when is quite large and . Moreover, sparse matrix multiplication or sparse FHTs as mentioned can be used to improve the computational speed of Haar convolution.
Chain. In HANet, the chain and the Haar basis can be pre-computed since the graph structure is already known. In particular, the chain is computed by a modified version of the METIS algorithm , which fast generates a chain for the weight matrix of a graph. In general, the parents of a chain from METIS have at least two children, and then the weighted chain is a filtration and the condition of Proposition 2 is satisfied.
Weight sharing for filter. In the HANet one can use weight sharing given in Section III-C for filters. By doing this, we exploit the local geometry of the graph-structured data to extract the common feature of neighbour nodes and meanwhile reduce the independent parameters of the filter. Weight sharing can take place in each convolutional layer of HANet. For chain with which the Haar basis is associated, weight sharing can act from the coarsest level to the finest level or from any level coarser than to . For a filtration, the weight sharing could shrink the number of parameters by rate at least, see Figure 2b.
Graph pooling. We use max graph pooling between two convolutional layers of HANet. Each pooled input is the maximum of the previous layer whose locations are at the children of the current level. The pooling uses the same chain as the Haar basis at the same layer. For example, after pooling, the second layer uses the chain , as illustrated in Figure 2. By the construction of Haar basis in Section III-B, the new Haar basis associated with is exactly the pre-computed basis .
In this section, we test the proposed graph neural networks with Haar transforms on MNIST (graph signal classification), Quantum Chemistry (graph regression) and Citation Networks (node classification). The experiments for graph classification were carried out under the Google Colab environment with Tesla K80 GPU while for node classification were under the unix environment with a 3.3GHz Intel Core i7 CPU and 16GB RAM. All the methods were implemented in TensorFlow. SGD+Momentum and Adam optimization methods were used in the experiments.
VI-A MNIST for graph signal classification.
We treat the MNIST digits classification as a learning problem on graphs, following [17, 30], where each pixel of an image with resolution is one of the vertices of the graph. Of images, we use for training, for validation and the remaining for test. We compare HANet against GNN with graph Laplacian  and ChebNet , all with LeNet-5-like architecture . The edges are determined by the spatial relation between vertices. A vertex has an edge to the neighbours and has no edge to other vertices. The task is to recognize an image of hand-written digit as one of the ten digits . We use similar hyper-parameter setting from ChebNet11 1 https://github.com/mdeff/cnn_graph: dropout probability , regularization weight , initial learning rate , learning rate decay , momentum . The chain for Haar basis is . The finest level has nodes and the graphs at the following levels have , , , and nodes. For weight sharing and graph pooling we use the chain for the th convolutional layer. The test accuracy of HANet for MNIST is , which is close to of ChebNet and higher than of the GNN with graph Laplacian. This shows that HANet is able to learn complicated big graph data in graph signal classification where the Haar convolution preserves the geometric information of data.
VI-B Quantum Chemistry for graph regression.
We test HANet on QM7 [51, 52]. The QM7 contains molecules, each of which is represented by the Coulomb (energy) matrix and labeled with atomization energy. We treat each molecule as a weighted graph where the nodes are the atoms and the adjacency matrix is the -Coulomb matrix of the molecule, where the true number of atoms may be less than . As in most cases the adjacency matrix is not full ranked, we take the average of the Coulomb matrices of all molecules as the common adjacency matrix, for which we generate the Haar basis. To avoid exploding gradients in parameter optimization, we take the standard score of each entry over all Coulomb matrices as input.
The architecture of HANet contains layers of Haar convolution with and filters and then fully connected layers with and neurons. As the graph is not big, we do not use graph pooling or weight sharing. Following , we use mean squared error (MSE) plus regularization as the loss function in training and mean absolute error (MAE) as the test metric. We repeat the experiment over splits with the same proportion of training and test data but with different random seeds. We report the average performance and standard deviation of HANet in Table 1 compared against other public results  by methods Random Forest (RF) , Multitask Networks (Multitask) , Kernel Ridge Regression (KRR) , Graph Convolutional models (GC) , Deep Tensor Neural Network (DTNN) , ANI-1 , KRR and Multitask with Coulomb Matrix featurization (KRR(CM)/Multitask(CM)) . It shows that HANet ranks third in the list with average test MAE and average relative MAE , and thus offers a good approximator for QM7 regression.
|Dataset||Haar basis size||Sparsity||Basis Generate Time (s)||Adjoint FHT Time (s)||Forward FHT Time (s)|
VI-C Citation Networks for node classification.
We test the model (16) on citation networks Citeseer, Cora and Pubmed , following the experimental setup of [62, 18]. The Citeseer, Cora and Pubmed are , and classification problems with nodes , and , edges , and , features , and , and label rates , and respectively.
In Table II, we compare the performance of the model (16) of HANet with methods Multilayer Perceptron (MLP), Manifold Regularization (ManiReg) , Semi-supervised Embedding (SemiEmb) , Traditional Label Propagation (LP) , DeepWalk , Link-based Classification (ICA) , Planetoid , ChebNet  and GCN . We repeat the experiment times with different random seeds and report the average test accuracy of HANet. HANet has the top test accuracies on Cora and Pubmed and ranks second on Citeseer. Figure 4 shows the mean and standard deviation of validation accuracies and the validation loss up to epoch of HANet and GCN. HANet achieves slightly higher max accuracy as well as smaller standard deviation, and the loss also converges faster than GCN.
Haar basis and FHTs. In Figure 3a, we show the matrix of the Haar basis vectors for Cora, which has sparsity (i.e. the proportion of zero entries) . The associated chain has , , , , , , , , , , nodes from level to . Figure 3b shows the comparison of time for FHTs with direct matrix product. It illustrates that FHTs have nearly linear computational cost while the cost of matrix product grows at for a graph of size . Figure 3c shows the comparison of time for generating the Haar basis and the basis for graph Laplacian. The Haar basis needs much less time than that for graph Laplacian. Table III gives the sparsity and the CPU time for generating Haar basis and FHTs on three datasets. All sparsity values for three datasets are very high (around ), and the computational cost of FHTs is proportional to .
We introduce Haar convolution for GNNs, which has a fast implementation in view of the sparsity of the Haar basis matrix. This reduces the computational cost of graph convolution to linear complexity. Haar basis gives a sparse representation of graph data, meanwhile, the geometric property of the data is preserved. Haar convolution can act as an effective alternative of the convolution by graph Laplacian for GNNs, as illustrated by extensive experimental study on benchmarks. The models for HANet can possibly be improved and generalized. For example, training of HANet can be accelerated using neighbour sampling , importance sampling , or variance reduction . Application of Haar convolution in other architectures is also expected.
-  A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in NIPS, 2012, pp. 1097–1105.
-  Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
-  M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst, “Geometric deep learning: going beyond euclidean data,” IEEE Signal Processing Magazine, vol. 34, no. 4, pp. 18–42, 2017.
-  J. Bruna, W. Zaremba, A. Szlam, and Y. Lecun, “Spectral networks and locally connected networks on graphs,” in ICLR, 2014.
-  M. Henaff, J. Bruna, and Y. LeCun, “Deep convolutional networks on graph-structured data,” arXiv preprint arXiv:1506.05163, 2015.
-  F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, “The graph neural network model,” IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 61–80, 2009.
-  Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel, “Gated graph sequence neural networks,” arXiv preprint arXiv:1511.05493, 2015.
-  D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams, “Convolutional networks on graphs for learning molecular fingerprints,” in Advances in neural information processing systems, 2015, pp. 2224–2232.
-  M. Niepert, M. Ahmed, and K. Kutzkov, “Learning convolutional neural networks for graphs,” in International Conference on Machine Learning, 2016, pp. 2014–2023.
-  P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner et al., “Relational inductive biases, deep learning, and graph networks,” arXiv preprint arXiv:1806.01261, 2018.
-  J. Zhou, G. Cui, Z. Zhang, C. Yang, Z. Liu, and M. Sun, “Graph neural networks: A review of methods and applications,” arXiv preprint arXiv:1812.08434, 2018.
-  Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, “A comprehensive survey on graph neural networks,” arXiv preprint arXiv:1901.00596, 2019.
-  Z. Zhang, P. Cui, and W. Zhu, “Deep learning on graphs: A survey,” arXiv preprint arXiv:1812.04202, 2018.
-  M. Nickel, K. Murphy, V. Tresp, and E. Gabrilovich, “A review of relational machine learning for knowledge graphs,” Proceedings of the IEEE, vol. 104, no. 1, pp. 11–33, 2015.
-  P. Goyal and E. Ferrara, “Graph embedding techniques, applications, and performance: A survey,” Knowledge-Based Systems, vol. 151, pp. 78–94, 2018.
-  W. L. Hamilton, R. Ying, and J. Leskovec, “Representation learning on graphs: Methods and applications,” arXiv preprint arXiv:1709.05584, 2017.
-  M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural networks on graphs with fast localized spectral filtering,” in NIPS, 2016, pp. 3844–3852.
-  T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in ICLR, 2017.
-  J. Chen, J. Zhu, and L. Song, “Stochastic training of graph convolutional networks with variance reduction,” in ICML, 2018, pp. 941–949.
-  J. Chen, T. Ma, and C. Xiao, “FastGCN: fast learning with graph convolutional networks via importance sampling,” arXiv:1801.10247, 2018.
-  B. Xu, H. Shen, Q. Cao, Y. Qiu, and X. Cheng, “Graph wavelet neural network,” in International Conference on Learning Representations, 2019.
-  D. K. Hammond, P. Vandergheynst, and R. Gribonval, “Wavelets on graphs via spectral graph theory,” Applied and Computational Harmonic Analysis, vol. 30, no. 2, pp. 129–150, 2011.
-  K. K. Thekumparampil, C. Wang, S. Oh, and L.-J. Li, “Attention-based graph neural network for semi-supervised learning,” arXiv preprint arXiv:1803.03735, 2018.
-  P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, “Graph attention networks,” arXiv preprint arXiv:1710.10903, 2017.
-  A. Susnjara, N. Perraudin, D. Kressner, and P. Vandergheynst, “Accelerated filtering on graphs using lanczos method,” arXiv preprint arXiv:1509.04537, 2015.
-  R. Liao, Z. Zhao, R. Urtasun, and R. S. Zemel, “Lanczosnet: Multi-scale deep graph convolutional networks,” in International Conference on Learning Representations, 2019.
-  F. Wu, T. Zhang, A. H. d. Souza Jr, C. Fifty, T. Yu, and K. Q. Weinberger, “Simplifying graph convolutional networks,” arXiv preprint arXiv:1902.07153, 2019.
-  W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” in NIPS, 2017, pp. 1024–1034.
-  J. Atwood and D. Towsley, “Diffusion-convolutional neural networks,” in Advances in Neural Information Processing Systems, 2016, pp. 1993–2001.
-  F. Monti, D. Boscaini, J. Masci, E. Rodola, J. Svoboda, and M. M. Bronstein, “Geometric deep learning on graphs and manifolds using mixture model CNNs,” in CVPR, 2017, pp. 5425–5434.
-  J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, “Neural message passing for quantum chemistry,” in Proceedings of the 34th International Conference on Machine Learning. JMLR. org, 2017, pp. 1263–1272.
-  K. Xu, W. Hu, J. Leskovec, and S. Jegelka, “How powerful are graph neural networks?” arXiv preprint arXiv:1810.00826, 2018.
-  Z. Ma, M. Li, and Y. G. Wang, “Pan: Path integral based convolution for deep graph neural networks,” in Workshop on Learning and Reasoning with Graph-Structured Representation. The 36th International Conference on Machine Learning, 2019.
-  A. Haar, “Zur theorie der orthogonalen funktionensysteme,” Mathematische Annalen, vol. 69, no. 3, pp. 331–371, 1910.
-  I. Daubechies, Ten lectures on wavelets. SIAM, 1992, vol. 61.
-  M. Belkin, P. Niyogi, and V. Sindhwani, “Manifold regularization: A geometric framework for learning from labeled and unlabeled examples,” Journal of Machine Learning Research, vol. 7, no. Nov, pp. 2399–2434, 2006.
-  C. Chui, F. Filbir, and H. Mhaskar, “Representation of functions on big data: graphs and trees,” Applied and Computational Harmonic Analysis, vol. 38, no. 3, pp. 489 – 509, 2015.
-  G. H. Golub and C. F. Van Loan, Matrix computations. JHU press, 2012, vol. 3.
-  P. Indyk and M. Kapralov, “Sample-optimal Fourier sampling in any constant dimension,” in 2014 IEEE 55th Annual Symposium on Foundations of Computer Science. IEEE, 2014, pp. 514–523.
-  P. Indyk, M. Kapralov, and E. Price, “(nearly) sample-optimal sparse fourier transform,” in Proceedings of the Twenty-fifth Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 2014, pp. 480–499.
-  H. Hassanieh, P. Indyk, D. Katabi, and E. Price, “Simple and practical algorithm for sparse fourier transform,” in Proceedings of the Twenty-third Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 2012, pp. 1183–1194.
-  F. R. Chung and F. C. Graham, Spectral graph theory. American Mathematical Society, 1997, no. 92.
-  D. K. Hammond, P. Vandergheynst, and R. Gribonval, “Wavelets on graphs via spectral graph theory,” Applied and Computational Harmonic Analysis, vol. 30, no. 2, pp. 129–150, 2011.
-  Y. G. Wang and X. Zhuang, “Tight framelets and fast framelet filter bank transforms on manifolds,” Applied and Computational Harmonic Analysis, 2018.
-  M. Gavish, B. Nadler, and R. R. Coifman, “Multiscale wavelets on trees, graphs and high dimensional data: theory and applications to semi supervised learning,” in ICML, 2010, pp. 367–374.
-  S. Lloyd, “Least squares quantization in PCM,” IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 129–137, 1982.
-  G. Karypis and V. Kumar, “A fast and high quality multilevel scheme for partitioning irregular graphs,” SIAM Journal on Scientific Computing, vol. 20, no. 1, pp. 359–392, 1998.
-  H. Hassanieh, P. Indyk, D. Katabi, and E. Price, “Nearly optimal sparse fourier transform,” in 44th Annual ACM Symposium on Theory of Computing, STOC’12, 2012, pp. 563–577.
-  Q. Li, Z. Han, and X.-M. Wu, “Deeper insights into graph convolutional networks for semi-supervised learning,” arXiv:1801.07606, 2018.
-  Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
-  L. C. Blum and J.-L. Reymond, “970 million druglike small molecules for virtual screening in the chemical universe database GDB-13,” Journal of the American Chemical Society, vol. 131, p. 8732, 2009.
-  M. Rupp, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, “Fast and accurate modeling of molecular atomization energies with machine learning,” Physical Review Letters, vol. 108, p. 058301, 2012.
-  J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, “Neural message passing for quantum chemistry,” arXiv preprint arXiv:1704.01212, 2017.
-  Z. Wu, B. Ramsundar, E. N. Feinberg, J. Gomes, C. Geniesse, A. S. Pappu, K. Leswing, and V. Pande, “MoleculeNet: a benchmark for molecular machine learning,” Chemical Science, vol. 9, no. 2, pp. 513–530, 2018.
-  L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
-  B. Ramsundar, S. Kearnes, P. Riley, D. Webster, D. Konerding, and V. Pande, “Massively multitask networks for drug discovery,” arXiv preprint arXiv:1502.02072, 2015.
-  C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
-  H. Altae-Tran, B. Ramsundar, A. S. Pappu, and V. Pande, “Low data drug discovery with one-shot learning,” ACS Central Science, vol. 3, no. 4, pp. 283–293, 2017.
-  K. T. Schütt, F. Arbabzadah, S. Chmiela, K. R. Müller, and A. Tkatchenko, “Quantum-chemical insights from deep tensor neural networks,” Nature Communications, vol. 8, p. 13890, 2017.
-  J. S. Smith, O. Isayev, and A. E. Roitberg, “ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost,” Chemical Science, vol. 8, no. 4, pp. 3192–3203, 2017.
-  P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad, “Collective classification in network data,” AI Magazine, vol. 29, no. 3, p. 93, 2008.
-  Z. Yang, W. W. Cohen, and R. Salakhutdinov, “Revisiting semi-supervised learning with graph embeddings,” in ICML, 2016, pp. 40–48.
-  J. Weston, F. Ratle, H. Mobahi, and R. Collobert, “Deep learning via semi-supervised embedding,” in Neural Networks: Tricks of the Trade. Springer, 2012, pp. 639–655.
-  X. Zhu, Z. Ghahramani, and J. D. Lafferty, “Semi-supervised learning using gaussian fields and harmonic functions,” in ICML, 2003, pp. 912–919.
-  B. Perozzi, R. Al-Rfou, and S. Skiena, “Deepwalk: Online learning of social representations,” in ACM SIGKDD. ACM, 2014, pp. 701–710.
-  Q. Lu and L. Getoor, “Link-based classification,” in ICML, 2003, pp. 496–503.