Earlier works applied attention mechanisms to graphs before full graph transformers: GAT, Graph-BERT (Zhang et al., 2020).
Baselines for GT:
Works that encode graph structure better for the graph transformer:
SAN: Spectral Attention Network - NeurIPS 2021, learns positional encodings from the Laplacian spectrum.
Graphormer - 2021, adds structural features in the form of centrality and spatial encodings.
GraphiT - 2021, uses relative positional encodings based on diffusion kernels.
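A minimal sketch of the Graphormer-style idea above: a degree-indexed ("centrality") embedding is added to node features, and a shortest-path-distance-indexed ("spatial") scalar bias is added to the attention logits. Function names, the Floyd-Warshall helper, and the single-head, single-projection attention are illustrative simplifications, not the paper's implementation.

```python
import numpy as np

def shortest_path_distances(adj):
    # Floyd-Warshall over a small dense adjacency matrix.
    n = adj.shape[0]
    d = np.where(adj > 0, 1.0, np.inf)
    np.fill_diagonal(d, 0.0)
    for k in range(n):
        d = np.minimum(d, d[:, [k]] + d[[k], :])
    return d

def biased_self_attention(x, adj, deg_embed, spd_bias):
    # Centrality encoding: add a degree-indexed embedding to node features.
    deg = adj.sum(axis=1).astype(int)
    x = x + deg_embed[np.clip(deg, 0, len(deg_embed) - 1)]
    # Spatial encoding: add a distance-indexed bias to the attention logits.
    n, f = x.shape
    scores = (x @ x.T) / np.sqrt(f)
    d = shortest_path_distances(adj)
    d = np.where(np.isinf(d), len(spd_bias) - 1, d).astype(int)
    scores = scores + spd_bias[np.clip(d, 0, len(spd_bias) - 1)]
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)
    return w @ x
```

In the real model `deg_embed` and `spd_bias` are learned parameters; here they are plain arrays so the mechanics are visible.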
For scalable graph transformers:
Some prior work proposed sampling-based approaches for graph transformers:
Gophormer - 2021, samples ego-graphs and runs attention within each sample.
NAGphormer - 2022, aggregates multi-hop neighborhoods into a token sequence per node.
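The sampling idea behind these methods can be sketched as follows: pick a node, keep at most a fixed number of its neighbors, and compute attention only inside that small ego-graph, so per-node cost is independent of graph size. The function name and fixed 1-hop budget are illustrative assumptions, not either paper's exact scheme.

```python
import numpy as np

def sample_ego_graph(adj, center, max_neighbors, rng):
    # Keep at most `max_neighbors` 1-hop neighbors of `center`; attention is
    # then computed only within this ego-graph instead of over the full graph.
    nbrs = np.flatnonzero(adj[center])
    if len(nbrs) > max_neighbors:
        nbrs = rng.choice(nbrs, size=max_neighbors, replace=False)
    return np.concatenate(([center], nbrs))
```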
Others use linear-time attention:
NodeFormer - 2022
Performer - 2021 (general-purpose linear attention, not graph-specific)
Exphormer - 2023 proposed sparse attention for graph transformers through two mechanisms: virtual global nodes and expander graphs.
spectral expansion, pseudorandomness, and sparsity → a linear-complexity graph transformer
that can scale to larger graphs
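The sparse attention pattern described above can be sketched as a mask: attention is allowed along graph edges, along a sparse set of random permutation edges (a stand-in here for an explicit expander construction), and to/from one virtual global node. This is an illustrative simplification, not the paper's construction.

```python
import numpy as np

def sparse_attention_mask(adj, rng, expander_degree=2):
    # Allowed pairs = graph edges + random permutation edges + virtual node.
    n = adj.shape[0]
    mask = adj > 0
    for _ in range(expander_degree):
        perm = rng.permutation(n)      # O(n) extra edges per round
        mask[np.arange(n), perm] = True
        mask[perm, np.arange(n)] = True
    full = np.zeros((n + 1, n + 1), dtype=bool)
    full[:n, :n] = mask
    full[n, :] = True    # virtual node attends to all nodes
    full[:, n] = True    # and every node attends to it
    np.fill_diagonal(full, True)
    return full
```

Each node attends to O(degree + expander_degree) others plus the virtual node, so total attention cost is linear in edges rather than quadratic in nodes.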
GTs largely operate by encoding graph structure as a soft inductive bias; they can be seen as a graph adaptation of the Transformer architecture.