A lot changed for LLMs in 2024
I thought this was a fascinating post by Simon Willison: Things We Learned About LLMs in 2024
This increase in efficiency and reduction in price is my single favourite trend from 2024. I want the utility of LLMs at a fraction of the energy cost and it looks like that’s what we’re getting.
I think the idea of “infinite” energy with minimal cost and negligible environmental impact is something we should be striving for as a people, but in the meantime, the radical reduction in LLM energy requirements is something I’m excited to see.
Also, I see people compare LLM power usage to Bitcoin, but it’s worth noting that, as I talked about in this members’ post, Bitcoin’s energy use is hundreds of times greater than that of LLMs. A key difference is that Bitcoin is fundamentally built on using more and more power over time, while LLMs will keep getting more efficient as the technology improves.
OpenAI themselves are charging 100x less for a prompt compared to the GPT-3 days. I have it on good authority that neither Google Gemini nor Amazon Nova (two of the least expensive model providers) are running prompts at a loss.
I think this means that, as individual users, we don’t need to feel any guilt at all for the energy consumed by the vast majority of our prompts. The impact is likely neglible [sic] compared to driving a car down the street or maybe even watching a video on YouTube.
Likewise, training. DeepSeek v3 training for less than $6m is a fantastic sign that training costs can and should continue to drop.
For less efficient models I find it useful to compare their energy usage to commercial flights. The largest Llama 3 model cost about the same as a single digit number of fully loaded passenger flights from New York to London. That’s certainly not nothing, but once trained that model can be used by millions of people at no extra training cost.
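To get a feel for how that kind of comparison works, here’s a rough back-of-envelope sketch. Every number in it is an assumption I’ve plugged in purely for illustration (roughly the right order of magnitude), not a figure from Simon’s post or from Meta — the point is just the method: power times time for the GPUs, fuel energy for the flight.

```python
# Rough back-of-envelope: training-run energy vs. a transatlantic flight.
# All values below are illustrative assumptions, not reported figures.
gpu_hours = 6_400_000        # assumed total GPU-hours for a large training run
gpu_power_kw = 0.7           # assumed ~700 W draw per accelerator under load

training_energy_mwh = gpu_hours * gpu_power_kw / 1000   # kWh -> MWh

fuel_burn_kg = 50_000        # assumed jet fuel for one New York -> London flight
jet_fuel_mj_per_kg = 43      # approximate energy density of jet fuel
flight_energy_mwh = fuel_burn_kg * jet_fuel_mj_per_kg / 3600   # MJ -> MWh

print(f"Training run: ~{training_energy_mwh:,.0f} MWh")
print(f"One flight:   ~{flight_energy_mwh:,.0f} MWh")
print(f"Ratio:        ~{training_energy_mwh / flight_energy_mwh:.1f} flights")
```

With those made-up-but-plausible inputs you land in single-digit-flights territory, which is at least consistent with the comparison Simon is making.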
This is all great to hear, although that doesn’t mean the big companies out there aren’t massively increasing their datacenter investment in the meantime.
The much bigger problem here is the enormous competitive buildout of the infrastructure that is imagined to be necessary for these models in the future.
[…]
Is this infrastructure necessary? DeepSeek v3’s $6m training cost and the continued crash in LLM prices might hint that it’s not. But would you want to be the big tech executive that argued NOT to build out this infrastructure only to be proven wrong in a few years’ time?
I think this speaks to a bubble: on the one hand, every executive is going to want to advocate for more investment now, but on the other, things like DeepSeek v3 point towards radically cheaper training in the future.
Then on how running these models locally on a Mac has changed this year:
Last year it felt like my lack of a Linux/Windows machine with an NVIDIA GPU was a huge disadvantage in terms of trying out new models.
[…]
The llama.cpp ecosystem helped a lot here, but the real breakthrough has been Apple’s MLX library, “an array framework for Apple Silicon”. It’s fantastic.
That’s awesome to hear! I’m not really clued into this part of the LLM world, but it’s good to see Apple putting in the work, and the community building on it, to get these models running well on Macs.
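If you’re curious what that looks like in practice, here’s a minimal sketch using the mlx-lm package on an Apple Silicon Mac. The model name and prompt are placeholders I’ve picked for illustration, not something from Simon’s post.

```python
# Minimal sketch of running a local model on Apple Silicon with mlx-lm.
# Assumes `pip install mlx-lm`; the model below is a placeholder from the
# mlx-community Hugging Face org — swap in whichever model you prefer.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

prompt = "Summarise what changed for LLMs in 2024 in one sentence."
response = generate(model, tokenizer, prompt=prompt, max_tokens=200)
print(response)
```

Quantised community conversions like the one above are a big part of what makes this practical on a laptop’s unified memory.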
I think this is a really good read for those who want to understand how the world of LLMs has changed in the past year. Things are changing fast, and it’s important to keep up to date with what’s going on, whether you want to support or oppose this tech.