07-SAM3 vs Grounding DINO + SAM2 - which wins for scientific images?
In video 7 of Applied LLMs for Scientists, we test SAM3, Meta's new unified vision-language model, against our Grounding DINO + SAM2 pipeline on real kidney histology images. SAM3 takes a text prompt and produces segmentation masks in a single pass, replacing the two-model pipeline with one.

We run a three-way comparison on the same image with the same prompt:
- SAM3 zero-shot
- Grounding DINO + SAM2 zero-shot
- Fine-tuned Grounding DINO + SAM2 (24 images, Val F1 = 0.70)

The results are not what you might expect. SAM3 fails on domain-specific prompts but recovers with geometric ones. Fine-tuned DINO+SAM2 wins clearly. The lesson: foundation models trained on natural images do not automatically transfer to scientific domains, and fine-tuning on even a small dataset still matters.

We also walk through setting up SAM3 and running both backends in the updated annotation tool (v5).

Code: https://github.com/bnsreenu/llm-visio...
Playlist: Applied LLMs for Scientists
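For readers unfamiliar with a detection score such as the Val F1 quoted above, here is a minimal sketch of one common way to compute it. The description does not say how the video's F1 was calculated, so everything below is an assumption for illustration: boxes are `(x1, y1, x2, y2)` tuples, a prediction counts as correct when its IoU with an unmatched ground-truth box is at least 0.5, and matching is greedy. The function names `box_iou` and `detection_f1` are hypothetical and do not come from the linked repository.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_f1(pred_boxes, gt_boxes, iou_threshold=0.5):
    """F1 = 2*TP / (2*TP + FP + FN) with greedy one-to-one box matching.

    Assumption: the 0.5 IoU threshold and greedy matching are illustrative
    choices, not necessarily what the video's evaluation uses.
    """
    unmatched_gt = list(range(len(gt_boxes)))
    tp = 0
    for pred in pred_boxes:
        best_j, best_iou = None, iou_threshold
        for j in unmatched_gt:
            iou = box_iou(pred, gt_boxes[j])
            if iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j is not None:
            # Each ground-truth box can be matched at most once.
            unmatched_gt.remove(best_j)
            tp += 1
    fp = len(pred_boxes) - tp  # predictions with no matching ground truth
    fn = len(gt_boxes) - tp    # ground-truth boxes that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example with made-up boxes: one hit, one false positive, one miss.
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
pred = [(1, 1, 10, 10), (50, 50, 60, 60)]
score = detection_f1(pred, gt)
```

In this toy case one prediction overlaps a ground-truth box well enough to count, one is spurious, and one ground-truth box is missed, so precision and recall are both 0.5. The same function could be applied to the boxes from each of the three setups to compare them on equal terms.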


