Analysis of mean-field models arising from self-attention dynamics in transformer architectures with layer normalization
I’m happy to share that our paper “Analysis of mean-field models arising from self-attention dynamics in transformer architectures with layer normalization” has been published in Philosophical Transactions of the Royal Society A.