What if you don’t have a dataset of 300M images to train your vision transformer on? Get some help from the good ol’ CNNs via distillation!


It seems that the title of the Transformer architecture paper, “Attention Is All You Need,” is resonating more and more through our minds lately. Is attention really all we need? For some years now, the NLP community has seemed convinced of it, with transformers being the key…