Analysis of mean-field models arising from self-attention dynamics in transformer architectures with layer normalization
Together with Martin Burger, Samira Kabri, Yury Korolev, and Lukas Weigand, we posted a new preprint titled "Analysis of mean-field models arising from self-attention dynamics in transformer architectures with layer normalization".