OpenAI: Introducing GPT-Oss

The gpt-oss-120b model achieves near-parity with OpenAI o4-mini on core reasoning benchmarks, while running efficiently on a single 80 GB GPU. The gpt-oss-20b model delivers similar results to OpenAI o3‑mini on common benchmarks and can run on edge devices with just 16 GB of memory, making it ideal for on-device use cases, local inference, or rapid iteration without costly infrastructure.

I've been running the smaller model on my M4 Pro MacBook Pro (24GB RAM) and it's worked great. The performance obviously isn't as good as it is from the server, but it's pretty darn good, and since it's running locally it will never run into rate limits or light up GPUs in a server farm. I've asked it some coding questions and it's done quite well with them, absolutely better than the local models I've used before, and honestly about on par with what I'd expect from the best models out there. I'm really excited to see these get integrated into IDEs and see how they get on with real coding workflows. OpenAI says these models "demonstrate strong tool use," which is core to how useful a model is in coding tasks (this is a big reason Claude was the first game-changer in LLMs powering apps like Cursor, for example).
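If you want to poke at it yourself, here's roughly what that looks like: a minimal sketch, assuming the 20b model is being served through an OpenAI-compatible local endpoint (Ollama and LM Studio both expose one). The base URL, port, and model tag below are assumptions; match them to whatever your local server reports.

```python
# Minimal sketch: query a locally hosted gpt-oss-20b through an
# OpenAI-compatible endpoint. The base_url, port, and model tag are
# assumptions -- adjust them to your own local setup.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # local server, not api.openai.com
    api_key="not-needed-locally",          # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="gpt-oss:20b",  # model tag as exposed by the local server
    messages=[
        {"role": "user", "content": "Explain tail-call optimization in two sentences."}
    ],
)
print(response.choices[0].message.content)
```

The nice part is that anything already written against the hosted API can point at the local model just by swapping the base URL.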

As always, these local-only models struggle with current events. The knowledge cutoff for this model is June 2024, so asking about anything newer than that results in gibberish, unless of course you tie it to a real-time search tool like the ones ChatGPT/Claude/Gemini on the web have at their disposal.
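For what it's worth, wiring in a search tool is less exotic than it sounds. Here's a rough sketch using OpenAI-style tool calling against a local endpoint; the endpoint, model tag, and the `web_search()` helper are all assumptions for illustration, and whatever search API you actually have access to would slot in where the helper is.

```python
# Rough sketch of tying a local model to a real-time search tool via
# OpenAI-style tool calling. Endpoint, model tag, and web_search() are
# illustrative assumptions, not anything shipped with gpt-oss.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed-locally")

def web_search(query: str) -> str:
    # Hypothetical helper: call a real search API here and return a
    # short text summary of the results for the model to read.
    return f"(search results for: {query})"

tools = [{
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web for current information.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user", "content": "Who won the most recent World Cup final?"}]
first = client.chat.completions.create(model="gpt-oss:20b", messages=messages, tools=tools)

# If the model decides it needs fresh data, it returns a tool call
# instead of an answer; run the tool and feed the result back.
call = first.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
messages.append(first.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id, "content": web_search(args["query"])})

final = client.chat.completions.create(model="gpt-oss:20b", messages=messages, tools=tools)
print(final.choices[0].message.content)
```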

I've made this comparison a lot recently, but Toy Story took months to render on a massive server farm, while Cyberpunk 2077 looks way more advanced, has way more going on, and renders at almost 100 frames per second on my laptop. Everything that takes tons of energy to do on servers today will one day become trivial to run on the devices in our bags and eventually our pockets. Local models are the future of sustainability and cost reductions for consumers (who wants to pay $20/month for every AI tool?), and very good models like these move us closer to that world.