It seems that the title of the Transformer architecture paper has been resonating more and more through our minds lately. Is attention really all we need? For some years now, the NLP community has clearly believed so, with transformers being the key component of SotA architectures for language modeling, translation, and summarization tasks.
A few months ago, a transformer-based architecture for image classification, proposed in this paper by Google, outperformed SotA CNN-based architectures. One of the drawbacks of this architecture is the (understandably) large number of parameters and the resources required to train it.
In a fresh paper, Facebook has…
In this post, I will share my understanding of the Vision Transformer architecture. All the drawings in this post are original content, based on the paper and on other tutorials, which will be referenced where appropriate.
Transformer architectures have been a major breakthrough in Natural Language Processing (NLP) since being proposed in 2017. Google's BERT and OpenAI's GPT-2 / GPT-3 architectures have been the state-of-the-art solutions for various tasks, including language modeling, text summarization, and question answering.