Vision Transformer and its Applications

The vision transformer is a recent breakthrough in computer vision. While transformer-based models have dominated natural language processing since 2017, CNN-based models have continued to deliver state-of-the-art performance on vision problems. Last year, a group of researchers from Google showed how to make a transformer work on recognition and called the result the "vision transformer". Follow-up work by the community demonstrated superior performance of vision transformers not only in recognition but also in downstream tasks such as detection, segmentation, multi-modal learning, and scene text recognition, to mention a few.

In this talk, Rowel Atienza takes a deeper look at the model architecture of vision transformers. Most importantly, he focuses on the concept of self-attention and its role in vision. He then presents different model implementations that use the vision transformer as the main backbone. Since self-attention can be applied beyond transformers, he also discusses a promising direction in building general-purpose model architectures, in particular networks that can process a variety of data formats such as text, audio, image, and video.
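As a rough illustration of the self-attention idea the talk centers on, the sketch below splits an image into patches, projects each patch to a token, and applies a single self-attention head over those tokens. It is a minimal NumPy sketch and not the speaker's implementation: the 16x16 patch size, the 32x32 input, the 64-dimensional embedding, the random weights, and the helper names `image_to_patches` and `self_attention` are all assumptions made for the example, and it leaves out the class token, positional embeddings, multiple heads, and the MLP blocks of a full vision transformer.

```python
import numpy as np


def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, c) -> (gh, gw, p, p, c) -> (num_patches, p * p * c)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # pairwise patch-to-patch scores
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights


# Hypothetical sizes and random weights, chosen only for illustration.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
patches = image_to_patches(image, patch_size=16)   # 4 patches of 16*16*3 values
d_model = 64
w_embed = rng.normal(scale=0.02, size=(patches.shape[1], d_model))
tokens = patches @ w_embed                         # linear patch embedding
w_q, w_k, w_v = (rng.normal(scale=0.02, size=(d_model, d_model)) for _ in range(3))
out, weights = self_attention(tokens, w_q, w_k, w_v)
print(patches.shape, out.shape, weights.shape)
```

Each row of `weights` describes how strongly one patch attends to every other patch, which is the sense in which self-attention lets every image region draw on global context within a single layer.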
