Attention mechanisms were applied to graphs before full graph transformers, e.g. GAT and Graph-BERT (Zhang et al.)

Baselines for graph transformers (GT):

Approaches to better structural encoding in graph transformers:

SAN: Spectral Attention Network (NeurIPS 2021) builds learned positional encodings from the Laplacian spectrum.

Graphormer (2021) adds structural features in the form of centrality and spatial encodings.
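A minimal sketch of Graphormer's two encodings, with illustrative names and random stand-ins for the learned embedding tables (this is not the paper's implementation): degree centrality becomes an additive node-feature embedding, and shortest-path distance becomes an additive attention-logit bias.

```python
import numpy as np

def graphormer_encodings(adj, dim=8, max_dist=5, seed=0):
    """Toy centrality + spatial encodings in the spirit of Graphormer.

    adj: (n, n) 0/1 adjacency matrix of an undirected graph.
    Returns (node_feats, attn_bias)."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]

    # Centrality encoding: one embedding row per degree value, added to
    # node features (random arrays stand in for learned tables here).
    deg = adj.sum(axis=1).astype(int)
    deg_table = rng.standard_normal((n, dim))
    node_feats = deg_table[deg]                # (n, dim)

    # Spatial encoding: BFS shortest-path distances, mapped through a
    # per-distance scalar bias added to the attention logits.
    dist = np.full((n, n), max_dist)
    for s in range(n):
        dist[s, s] = 0
        frontier, seen, d = [s], {s}, 0
        while frontier and d < max_dist:
            d += 1
            nxt = []
            for u in frontier:
                for v in np.nonzero(adj[u])[0]:
                    if int(v) not in seen:
                        seen.add(int(v))
                        dist[s, v] = d
                        nxt.append(int(v))
            frontier = nxt
    bias_table = rng.standard_normal(max_dist + 1)
    attn_bias = bias_table[dist]               # (n, n)
    return node_feats, attn_bias

# 4-cycle: 0-1-2-3-0
adj = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
feats, bias = graphormer_encodings(adj)
```

In a real model the embedding tables are trained end to end and the bias is added inside each attention head before the softmax.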

GraphiT (2021) uses relative positional encodings based on diffusion kernels.

Scalable graph transformers:

Some prior work proposed sampling-based approaches for graph transformers:

Gophormer (2021)

NAGphormer (2022)

Others use linear-time transformers:

NodeFormer (2022)

Performer (2021)
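Linear-time attention replaces softmax(QKᵀ)V, which costs O(n²) in the number of tokens/nodes, with a kernel feature map φ so that φ(Q)(φ(K)ᵀV) can be computed in O(n). A minimal sketch using the elu(x)+1 feature map (as in linear transformers by Katharopoulos et al.); Performer instead approximates the softmax kernel with random features, so this simpler map is for illustration only:

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """O(n) kernelized attention: phi(Q) @ (phi(K).T @ V), row-normalized.

    phi(x) = elu(x) + 1 is strictly positive, so each output row is a
    (near-)convex combination of the rows of V."""
    def phi(x):
        return np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                      # (d, d_v): one pass over all keys/values
    z = Qp @ Kp.sum(axis=0)            # (n,): per-query normalization
    return (Qp @ kv) / (z[:, None] + eps)

rng = np.random.default_rng(0)
n, d = 6, 4
Q, K, V = rng.standard_normal((3, n, d))
out = linear_attention(Q, K, V)
```

The key point is the order of operations: Kᵀ V is a small (d, d_v) summary, so the n×n attention matrix is never materialized.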

Exphormer (ICML 2023)

It proposes sparse attention for graph transformers through two mechanisms: virtual global nodes and expander graphs. Drawing on spectral expansion, pseudorandomness, and sparsity, this yields a linear-complexity graph transformer that scales to larger graphs.
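A sketch of the interaction graph Exphormer attends over, simplified for illustration: the "expander" here is approximated by superimposing random permutations (a standard random construction, not the paper's explicit one). Original edges plus constant-degree expander edges plus a virtual global node give an attention pattern whose edge count grows linearly in n (assuming bounded graph degree).

```python
import numpy as np

def exphormer_mask(adj, expander_degree=3, seed=0):
    """Sparse attention mask: original edges, random expander-like edges,
    and a virtual global node connected to every real node."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    mask = np.zeros((n + 1, n + 1), dtype=bool)  # index n = virtual node
    mask[:n, :n] = adj.astype(bool)

    # Approximate a d-regular expander with d random permutations,
    # added symmetrically.
    for _ in range(expander_degree):
        perm = rng.permutation(n)
        mask[np.arange(n), perm] = True
        mask[perm, np.arange(n)] = True

    # Virtual global node attends to, and is attended by, every node.
    mask[n, :] = True
    mask[:, n] = True
    np.fill_diagonal(mask, True)  # keep self-attention
    return mask

adj = np.eye(8, k=1, dtype=int) + np.eye(8, k=-1, dtype=int)  # path graph
mask = exphormer_mask(adj)
```

Attention is then computed only over pairs where the mask is True, so cost scales with the number of mask entries rather than n².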

Graph transformers largely operate by encoding graph structure as a soft inductive bias; they can be seen as a graph adaptation of the Transformer architecture.