07-SAM3 vs Grounding DINO + SAM2 - which wins for scientific images?

In video 7 of Applied LLMs for Scientists, we test SAM3, Meta's new unified vision-language model, against our Grounding DINO + SAM2 pipeline on real kidney histology images. SAM3 takes a text prompt and produces segmentation masks in a single pass, replacing the two-model pipeline with a single model.

We run a three-way comparison on the same image with the same prompt:
1. SAM3 zero-shot
2. Grounding DINO + SAM2 zero-shot
3. Fine-tuned Grounding DINO + SAM2 (24 images, Val F1 = 0.70)

The results are not what you might expect. SAM3 fails on domain-specific prompts but recovers with geometric ones. Fine-tuned DINO+SAM2 wins clearly. The lesson: foundation models trained on natural images do not automatically transfer to scientific domains, and fine-tuning on even a small dataset still matters.

We also walk through setting up SAM3 and running both backends in the updated annotation tool (v5).

Code: https://github.com/bnsreenu/llm-visio...
Playlist: Applied LLMs for Scientists