RT-DETR - real-time object detection with transformers

This video covers RT-DETR, one of the most recent advancements in transformer-based object detection. RT-DETR focuses on inference speed. By reworking parts of the original DETR architecture, as well as of later SOTA models like DINO, it manages to outperform CNN-based models like YOLO in both accuracy and speed.

Important links:
Original paper: https://arxiv.org/pdf/2304.08069
RepVGG paper: https://arxiv.org/pdf/2101.03697
PANet paper: https://arxiv.org/pdf/1803.01534

00:00 - Intro
04:04 - Hybrid Encoder
11:04 - Feature Fusion
13:23 - Fusion Block
20:10 - Uncertainty-Minimal Query Selection
24:35 - Similarities to DINO
25:40 - Results