The 0.9B OCR Model That Beats Gemini? (GLM-OCR) | Benchmarks + Demo | Live Coding + Q&A (Mar 19th)

GLM-OCR packs just 0.9B parameters (a 0.4B CogViT visual encoder and a 0.5B GLM language decoder), yet it tops OmniDocBench V1.5 with a score of 94.62, approaching Gemini-level performance. A Multi-Token Prediction mechanism lets it decode multiple tokens per step, keeping latency low enough for edge deployment and production workloads.

In this stream I first benchmark GLM-OCR across 8 diverse datasets (captchas, LaTeX equations, receipts, date stamps, jersey numbers, container serials, tire codes, and license plates) to test its limits on real-world images. Then I build a complete smart parking management system that chains license plate detection, OC-SORT multi-object tracking, and GLM-OCR into a pipeline that reads plates automatically as vehicles enter a lot. Both Colab notebooks are linked below so you can follow along.

Resources:
📓 How to Perform OCR with GLM-OCR: https://colab.research.google.com/git...
📓 Smart Parking Management with GLM-OCR: https://colab.research.google.com/git...
📄 GLM-OCR Paper: https://arxiv.org/abs/2603.10910
🤗 GLM-OCR on HuggingFace: https://huggingface.co/zai-org/GLM-OCR

Stay updated with the projects I'm working on at https://github.com/roboflow and https://github.com/SkalskiP! ⭐
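The description does not say how GLM-OCR is scored on the 8 benchmark datasets. For short-text OCR like captchas, plates, and serials, two common choices are exact-match accuracy and character error rate (CER, total edit distance divided by total reference characters). The sketch below is a minimal, assumption-laden illustration of those metrics, not the stream's evaluation code; the function names and the normalization rule (drop whitespace, uppercase) are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free when characters match)
            ))
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    """Hypothetical normalization: drop all whitespace and uppercase, typical for plates and serials."""
    return "".join(text.split()).upper()


def score(predictions: list[str], targets: list[str]) -> dict[str, float]:
    """Exact-match accuracy and corpus-level CER over one dataset."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    exact = edits = ref_chars = 0
    for pred, target in zip(predictions, targets):
        p, t = normalize(pred), normalize(target)
        exact += p == t
        edits += levenshtein(p, t)
        ref_chars += len(t)
    return {
        "exact_match": exact / len(targets),
        "cer": edits / max(ref_chars, 1),  # guard against an all-empty reference set
    }


print(score(["AB 123", "XY9"], ["AB123", "XY8"]))  # {'exact_match': 0.5, 'cer': 0.125}
```

Reporting both numbers per dataset separates "almost right" reads (low CER, low exact match) from outright failures, which matters for tasks like plates where a single wrong character makes the read unusable.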
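The parking system chains plate detection, OC-SORT tracking, and GLM-OCR. One reason to put a tracker in the middle is that it gives each vehicle a stable ID, so OCR reads can be aggregated across frames and a plate reported once per vehicle instead of once per frame. Below is a minimal sketch of that glue logic, assuming each stage is a plain callable. `PlateReader`, `detect`, `track`, `read`, `min_votes`, and the majority-vote rule are hypothetical illustrations, not the notebook's code and not the real OC-SORT or GLM-OCR APIs.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage interfaces (assumptions, not real library signatures):
#   detect(frame)    -> list of plate boxes (x1, y1, x2, y2)
#   track(boxes)     -> list of (track_id, box), e.g. backed by OC-SORT
#   read(frame, box) -> plate text from an OCR model such as GLM-OCR
Box = tuple[int, int, int, int]


@dataclass
class PlateReader:
    detect: Callable[[object], list[Box]]
    track: Callable[[list[Box]], list[tuple[int, Box]]]
    read: Callable[[object, Box], str]
    min_votes: int = 3  # identical reads required before a plate is confirmed

    def __post_init__(self) -> None:
        self.votes: dict[int, Counter] = defaultdict(Counter)  # track_id -> read counts
        self.reported: dict[int, str] = {}                     # track_id -> confirmed plate

    def process(self, frame: object) -> list[tuple[int, str]]:
        """Run one frame through detect -> track -> OCR; return newly confirmed (track_id, plate)."""
        events = []
        for track_id, box in self.track(self.detect(frame)):
            if track_id in self.reported:
                continue  # plate already confirmed for this vehicle, so skip the OCR call
            text = self.read(frame, box).strip().upper()
            if not text:
                continue  # ignore empty reads
            self.votes[track_id][text] += 1
            plate, count = self.votes[track_id].most_common(1)[0]
            if count >= self.min_votes:
                self.reported[track_id] = plate
                events.append((track_id, plate))
        return events


# Toy demo with stand-in stages: one vehicle (track 7) seen over four frames.
readings = iter(["ab123", "AB128", "AB123", "AB123"])
reader = PlateReader(
    detect=lambda frame: [(0, 0, 10, 5)],
    track=lambda boxes: [(7, b) for b in boxes],
    read=lambda frame, box: next(readings),
)
for frame_idx in range(4):
    print(frame_idx, reader.process(frame_idx))
# 0 []
# 1 []
# 2 []
# 3 [(7, 'AB123')]
```

The single misread ("AB128") is outvoted, and once track 7 is confirmed the OCR model is no longer called for it, which keeps per-frame cost down when many vehicles stay in view.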
