Combining BERT and GCN
Here, BERT is trained in a separate branch of the network as an auxiliary classifier [6]. Combining BERT with GCN lets the network exploit the advantages of a large-scale pre-trained model, resulting in faster convergence and better performance. In terms of specific implementation, the auxiliary classifier is constructed by feeding the BERT document embeddings \(X\) directly into a softmax layer:

\[
Z_{\text{BERT}} = \operatorname{softmax}(WX)
\]

where \(W\) is the weight matrix of the classifier.
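As a concrete illustration, below is a minimal PyTorch sketch of this auxiliary branch. The class name, the hidden size, and the use of BERT [CLS] vectors as the document embeddings \(X\) are illustrative assumptions, not details fixed by the text.

```python
import torch
import torch.nn as nn

class BertAuxClassifier(nn.Module):
    """Auxiliary classifier: feeds BERT document embeddings X into softmax."""

    def __init__(self, hidden_dim: int = 768, num_classes: int = 5):
        super().__init__()
        # W in Z_BERT = softmax(WX): a single linear map over the embeddings
        self.linear = nn.Linear(hidden_dim, num_classes)

    def forward(self, doc_embeddings: torch.Tensor) -> torch.Tensor:
        # doc_embeddings: (batch, hidden_dim), e.g. BERT [CLS] vectors
        logits = self.linear(doc_embeddings)
        return torch.softmax(logits, dim=-1)  # Z_BERT: class probabilities
```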
Finally, linear interpolation is used to combine the predictions of BERT and GCN:

\[
Z = \lambda Z_{\text{GCN}} + (1 - \lambda) Z_{\text{BERT}}
\]

where \(\lambda \in [0, 1]\) controls the trade-off between the two branches (\(\lambda = 1\) uses only the GCN prediction, \(\lambda = 0\) only the BERT prediction).
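The combination step itself reduces to one line. The sketch below assumes both branches already output class-probability tensors of the same shape; the \(\lambda = 0.7\) default is purely illustrative.

```python
import torch

def interpolate(z_gcn: torch.Tensor, z_bert: torch.Tensor,
                lam: float = 0.7) -> torch.Tensor:
    """Z = lam * Z_GCN + (1 - lam) * Z_BERT, with lam in [0, 1]."""
    return lam * z_gcn + (1.0 - lam) * z_bert
```

Because \(Z\) is a convex combination of two probability distributions, it remains a valid distribution, so the usual cross-entropy loss can be applied to it unchanged.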
The interpolation improves performance because \(Z_{\text{BERT}}\) acts directly on the GCN input: the loss back-propagates through the auxiliary branch into the shared document embeddings, ensuring that the GCN input is tuned and optimized towards the classification objective. This helps the multi-layer GCN model overcome inherent shortcomings such as vanishing gradients and over-smoothing [27], resulting in improved performance.
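To make the gradient argument concrete, the following sketch uses random stand-ins for both branches and checks that the loss reaches the shared embeddings \(X\) through the interpolated output; all shapes and the \(\lambda = 0.7\) value are assumptions for illustration.

```python
import torch

x = torch.randn(8, 768, requires_grad=True)       # stand-in for BERT embeddings X
w_bert = torch.randn(768, 5, requires_grad=True)  # auxiliary classifier weights W
w_gcn = torch.randn(768, 5, requires_grad=True)   # stand-in for the GCN branch

z_bert = torch.softmax(x @ w_bert, dim=-1)        # Z_BERT = softmax(WX)
z_gcn = torch.softmax(x @ w_gcn, dim=-1)          # a real GCN would also use the graph
z = 0.7 * z_gcn + 0.3 * z_bert                    # interpolated prediction Z

loss = -torch.log(z[:, 0]).mean()                 # toy negative log-likelihood
loss.backward()
print(x.grad.abs().sum() > 0)                     # True: X is optimized directly
```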