Do Not Install oMLX Before Watching This
oMLX is a new MLX server for Mac that claims it can run AI models too big for your RAM by using your SSD as overflow memory. If that actually worked, it would be a real upgrade for anyone running local AI on a Mac mini, MacBook, or Mac Studio — you could finally run a 33 GB or even a 64 GB model on a 48 GB machine without dropping ten grand on an M3 Ultra workstation. The biggest constraint on a Mac is that you can’t upgrade the RAM after you buy it. So a tool that lets you punch above your weight class is exactly what most of us want. I put oMLX through three real tests on my 48 GB M4 Mac mini, against the two tools I actually use day to day: Ollama and LM Studio. Same 31-billion parameter model loaded into all three. Same hardware. Same workload. Test A was a baseline — how much RAM does each tool reserve before you even start talking to it? Test B was the conversation test — does it stay fast across a real 5-prompt build, or does it slow down as the context grows? Test C was the headline claim — can oMLX actually run a 33 GB model that doesn’t fit in normal RAM? The conversation test was the same exact 5-prompt sequence across all three tools: (1) Architecture of a Python web scraper. (2) Write the actual code. (3) Add CAPTCHA handling. (4) Add an analytics module. (5) Refactor everything into a proper Python package with a config file and a CLI. Each prompt builds on the last, which forces the model to keep every line we’ve already written in its context. The conversation gets harder as it gets longer. That’s exactly the workload where oMLX’s SSD cache trick is supposed to pay off. 📘 FREE BESTSELLER CHATGPT PROFITS: https://jonlinks.com/gift 🔔 SUBSCRIBE TO MY CHANNEL: https://jonlinks.com/subscribe MY PROGRAMS 🐲 Coloring Book AI: https://jonlinks.com/color 🔨Top Prompts: https://jonlinks.com/101 FOLLOW ME: 💻 BLOG: https://ServeNoMaster.com 🎙️PODCAST: https://ArtificialIntelligencePod.com 📷 INSTAGRAM: / servenomaster 💙 FACEBOOK: / servenomaster 🐦 TWITTER: / servenojonathan 🏢 LINKEDIN: / servenomaster 📷 PINTEREST: / servenomaster TOOLS: The Software and Tools I Use Daily: https://servenomaster.com/tools In this video: 00:00 The promise oMLX is built on 01:30 How I test AI tools (real workloads, not demos) 02:30 Step 0 — the install nightmare 05:30 Test A — RAM usage on a 31B model 07:30 Test B — 5-prompt conversation, Ollama vs oMLX speed 10:00 Test C — the bigger model that crashed my Mac 12:00 The verdict + what I'm running instead 13:30 Final thoughts ——————————————————————— Affiliate links mean I earn a commission from qualifying purchases. This helps support my channel at no additional cost to you.

The M4 Mac mini - I got it WRONG!

I Tested Every AI Website Builder—They're All Lying to You

The Local AI Hardware Mistake Everyone Makes

WWDC 2026 Impressions: Yeah, That's About Right

This Is The Best Local Model Runner For Apple Silicon (oMLX)

Dario Amodei WARNS: People Have No Idea What's Coming In 2027

12 Self-Hosted Apps to Finally Quit Big Tech.

I Hacked This Temu Router. What I Found Should Be Illegal.

Something is jamming GPS over Europe. Here's what we found

I replaced a $20,000 server with this

Quantum Just Killed AI Data Centers

Palantir. IT’S WORSE Than You Think

Trump Attends NBA Finals, Cries Election Fraud in California & Storms Out of Interview

you need to use Hermes RIGHT NOW!! (goodbye OpenClaw!!)

AI Bubble: How AI's push towards IPOs became a death drive | Ed Zitron

$10,000 Mac Studio vs. $10 AI Agent

Why Google Just Gave Away Gemma 4 for Free

Lokale KI ist jetzt WIRKLICH brauchbar (und auf dieser Hardware läuft sie)

Apple's Most Affordable AI Cluster

