Edge features
A GNN defines a graph as \(G=\left(V,E\right)\), where \(V\) and \(E\) represent the sets of nodes and edges respectively. Nodes are divided into word nodes and document nodes. The weight of an edge between two word nodes is defined by Point-wise Mutual Information (PMI)\(^{4}\), and the weight of an edge between a word node and a document node is defined by Term Frequency-Inverse Document Frequency (TF-IDF)\(^{25}\). The weight of the edge between two nodes \(i\) and \(j\) is defined as:
\[
A_{i,j}=\begin{cases}
\text{PMI}\left(i,j\right), & i,j \text{ are words and } \text{PMI}\left(i,j\right)>0\\
\text{TF-IDF}_{ij}, & i \text{ is a document and } j \text{ is a word}\\
1, & i=j\\
0, & \text{otherwise}
\end{cases}
\]
where \(A_{i,j}\) is the adjacency matrix. The term frequency is the number of times a word appears in a document, and the inverse document frequency is the logarithm of the total number of documents divided by the number of documents containing the word. PMI is a popular word-association measure: co-occurrence statistics are collected over all documents in the corpus with a sliding window of fixed size, and the weight between two word nodes is computed from them. The PMI value for a pair of words \(i\) and \(j\) is calculated as:
\[
\text{PMI}\left(i,j\right)=\log\frac{p\left(i,j\right)}{p\left(i\right)p\left(j\right)},\qquad
p\left(i,j\right)=\frac{\#W\left(i,j\right)}{\#W},\qquad
p\left(i\right)=\frac{\#W\left(i\right)}{\#W}
\]
where \(\#W\left(i\right)\) is the number of sliding windows in the corpus containing word \(i\), \(\#W\left(i,j\right)\) is the number of windows containing both words \(i\) and \(j\), and \(\#W\) is the total number of sliding windows in the corpus. To extract multidimensional edge features, the \(N\times N\) adjacency matrix \(A_{i,j}\) is expanded into an \(N\times N\times P\) tensor \({\hat{E}}_{ijp}\), where \(P\) is the dimensionality of the edge features. The specific process is shown in Figure 3.
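As a minimal sketch of the PMI computation described above (a toy corpus and helper names of our own choosing, not the authors' code), the window counts \(\#W(i)\), \(\#W(i,j)\), and \(\#W\) can be collected and turned into edge weights as follows:

```python
# Sketch: PMI edge weights between word pairs from fixed-size sliding windows.
from itertools import combinations
from math import log

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]
window_size = 3

# Collect all sliding windows of `window_size` tokens across the corpus.
windows = []
for doc in corpus:
    tokens = doc.split()
    if len(tokens) <= window_size:
        windows.append(tokens)
    else:
        for k in range(len(tokens) - window_size + 1):
            windows.append(tokens[k:k + window_size])

# #W(i): windows containing word i; #W(i, j): windows containing both i and j.
word_count = {}
pair_count = {}
for w in windows:
    uniq = set(w)
    for t in uniq:
        word_count[t] = word_count.get(t, 0) + 1
    for a, b in combinations(sorted(uniq), 2):
        pair_count[(a, b)] = pair_count.get((a, b), 0) + 1

num_windows = len(windows)  # #W

def pmi(i, j):
    """PMI(i, j) = log( p(i, j) / (p(i) * p(j)) ); 0 if i, j never co-occur."""
    a, b = sorted((i, j))
    if (a, b) not in pair_count:
        return 0.0
    p_ij = pair_count[(a, b)] / num_windows
    p_i = word_count[i] / num_windows
    p_j = word_count[j] / num_windows
    return log(p_ij / (p_i * p_j))

print(round(pmi("cat", "sat"), 3))  # → 0.288
```

Only word pairs with positive PMI contribute edges to the graph, so in practice the returned value would be clipped at zero before filling \(A_{i,j}\).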
Figure 3. Expanding the adjacency matrix \(A_{i,j}\) into a higher-dimensional tensor.
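The expansion step can be sketched as follows (an assumed NumPy illustration, not the paper's implementation; the initialisation of the extra channels is our assumption, since the text only states that they are learned during training):

```python
# Sketch: expanding an N x N adjacency matrix A into an N x N x P
# edge-feature tensor E_hat. Every channel starts as a copy of the original
# PMI/TF-IDF weights; during training the channels diverge as the extra
# dimensions learn new weights.
import numpy as np

N, P = 4, 3
rng = np.random.default_rng(0)

# Toy symmetric adjacency with self-loops, standing in for PMI/TF-IDF weights.
A = rng.random((N, N))
A = (A + A.T) / 2
np.fill_diagonal(A, 1.0)

# Broadcast A along a new trailing axis to obtain the P-channel tensor.
E_hat = np.repeat(A[:, :, None], P, axis=2)

print(E_hat.shape)  # → (4, 4, 3)
```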
The tensor changes as the network is trained, and the extra dimensions hold the newly learned weights. After this expansion, the adjacency matrix represents edge features as continuous multidimensional values, which exploits edge features more fully than a traditional GNN. After network training is completed, \({\hat{E}}_{ijp}\) is normalized as follows:
\[
E=\Big\Vert_{p=1}^{P}\,D^{-1/2}\,{\hat{E}}_{ijp}\,D^{-1/2}
\]
The \(\Vert\) operator denotes concatenation. Herein, the initial node feature matrix \(X\) of the graph is defined as the identity matrix; that is, each word or document is represented as a one-hot vector. After two layers of GCN, \(X\) is sent to Softmax for classification:
\[
Z=\text{softmax}\left(E\,\text{ReLU}\left(EXW_{0}\right)W_{1}\right)
\]
where \(E=D^{-1/2}E_{ij}D^{-1/2}\) denotes the normalized edge-feature matrix, and \(W_{0}\) and \(W_{1}\) are the trainable weight matrices. The output of the second GCN layer is taken as the final representation of each document, which is then fed to the Softmax layer for classification.
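The pipeline above can be sketched end to end as a NumPy forward pass. This is a simplified illustration under our own assumptions, not the authors' model: each channel of the edge tensor is symmetrically normalised \(D^{-1/2}\hat{E}_{p}D^{-1/2}\), and the channels are averaged into a single propagation matrix (a simplification of the concatenation in the text) before the two GCN layers and the softmax:

```python
# Sketch: two-layer GCN forward pass over an edge-feature tensor, with
# identity (one-hot) node features and a softmax classifier.
import numpy as np

rng = np.random.default_rng(1)
N, P, H, C = 5, 2, 8, 3   # nodes, edge channels, hidden dim, classes

# Toy symmetric edge-feature tensor with self-loops on every channel.
E_hat = rng.random((N, N, P))
E_hat = (E_hat + E_hat.transpose(1, 0, 2)) / 2
for p in range(P):
    np.fill_diagonal(E_hat[:, :, p], 1.0)

def normalize(Ep):
    """Symmetric normalisation D^{-1/2} E_p D^{-1/2} of one channel."""
    d = Ep.sum(axis=1)                      # degree of each node
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return d_inv_sqrt @ Ep @ d_inv_sqrt

# Normalise each channel, then average channels into one propagation matrix
# (the text concatenates channels instead; averaging keeps the demo simple).
E = np.mean([normalize(E_hat[:, :, p]) for p in range(P)], axis=0)

X = np.eye(N)                               # one-hot node features
W0 = rng.normal(scale=0.1, size=(N, H))     # first-layer weights
W1 = rng.normal(scale=0.1, size=(H, C))     # second-layer weights

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Z = softmax( E * ReLU(E X W0) * W1 )
Z = softmax(E @ np.maximum(E @ X @ W0, 0.0) @ W1)

print(Z.shape)        # → (5, 3); one class distribution per node
```

In a trained model \(W_{0}\) and \(W_{1}\) would be learned by backpropagation and the rows of \(Z\) corresponding to document nodes would give the predicted class probabilities.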