Showing 1–2 of 2 results for author: Koubbi, H

  1. arXiv:2410.06833  [pdf, other]

    cs.LG math.AP math.DS

    Dynamic metastability in the self-attention model

    Authors: Borjan Geshkovski, Hugo Koubbi, Yury Polyanskiy, Philippe Rigollet

    Abstract: We consider the self-attention model - an interacting particle system on the unit sphere, which serves as a toy model for Transformers, the deep neural network architecture behind the recent successes of large language models. We prove the appearance of dynamic metastability conjectured in [GLPR23] - although particles collapse to a single cluster in infinite time, they remain trapped near a confi…

    Submitted 9 October, 2024; originally announced October 2024.

  2. arXiv:2402.15415  [pdf, other]

    cs.LG math.DS stat.ML

    The Impact of LoRA on the Emergence of Clusters in Transformers

    Authors: Hugo Koubbi, Matthieu Boussard, Louis Hernandez

    Abstract: In this paper, we employ the mathematical framework on Transformers developed by \citet{sander2022sinkformers,geshkovski2023emergence,geshkovski2023mathematical} to explore how variations in attention parameters and initial token values impact the structural dynamics of token clusters. Our analysis demonstrates that while the clusters within a modified attention matrix dynamics can exhibit signifi…

    Submitted 23 February, 2024; originally announced February 2024.