Some of the hottest new apps in 2023 have been tools built on ChatGPT, Whisper, and Stable Diffusion. These new tools for generating text, transcribing audio, and generating images, respectively, have led to a boom in innovation. Think what you want about the tech powering these tools, but not since 2008 have we seen so many apps released that literally were not possible before.
But while it's great that companies like OpenAI have made APIs easy enough to use that basically anyone can build an app around them, I'd argue it's fundamentally problematic that so much of this has to happen off-device. It's problematic for Apple, but it's also problematic for users, and it would be good if the OS could make this better.
This is bad for Apple because the core technology powering the most innovative apps out there doesn't run on its devices; most apps use a web API which sends data over the network. Ideally, for speed and privacy, these generative tasks would all be done locally.
But I hear you! Stable Diffusion, Whisper, and Llama can run locally on your phone! True, although now we get into one of the issues with these local models: they're huge. I have two apps installed on my iPhone that use Whisper to transcribe audio to text. They're great, and they're pretty minimal apps, but these two apps alone (Transcriptionist and Aiko) use up 4.6GB of storage on my phone. That's a hell of a lot of space to be used by two apps where surely 90% of the footprint is the Whisper model they both use.
My prediction: Advanced Siri
I think iOS 18 will feature some sort of advanced Siri functionality that will have these sorts of models built into the OS, and app developers will be able to integrate with them using something like SiriKit. This will mean apps will be able to do more processing locally, that processing will be faster, their bundle sizes will be much smaller, and all apps can pull from the same models so you don't have to have a bunch of 2GB apps on your phone that all have the same models loaded in them.
Of course these models are going to take up a decent amount of space no matter what, and people with lower-capacity devices may not want them (or they may not want them for personal reasons), so maybe this would be something you opt into the first time you install an app that uses one of these models. For example, I wouldn't have any models installed on my iPhone when I upgrade to iOS 18, but when I install a transcription app that uses this new SiriKit functionality, the app could check whether I have the required model on my phone and have the system prompt me to download it if I want to use the app.
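To make that opt-in flow a little more concrete, here's a rough sketch of how it might work. To be clear, none of this is a real Apple API: `SharedModelStore`, `ensure_model`, and the `"speech-to-text"` model name are all names I made up, and I've written it in Python just to keep it short (the real thing would presumably be a Swift API). The idea is simply that the OS keeps one shared copy of each model, and an app asks for it before it starts working.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable


class ModelStatus(Enum):
    READY = auto()     # the model is on the device and usable
    DECLINED = auto()  # the user chose not to download it


@dataclass
class SharedModelStore:
    """Hypothetical OS-level store holding one copy of each model for all apps."""

    installed: set[str] = field(default_factory=set)

    def ensure_model(self, name: str, ask_user: Callable[[str], bool]) -> ModelStatus:
        # Already downloaded (perhaps by another app): nothing to do.
        if name in self.installed:
            return ModelStatus.READY
        # Otherwise the system prompts the user, who can opt in or decline.
        if ask_user(name):
            self.installed.add(name)  # stands in for the actual download
            return ModelStatus.READY
        return ModelStatus.DECLINED


store = SharedModelStore()

# The first transcription app triggers the prompt, and the user accepts.
first_app = store.ensure_model("speech-to-text", ask_user=lambda name: True)

# A second app needing the same model finds it already installed,
# so the user is never asked again and nothing is downloaded twice.
second_app = store.ensure_model("speech-to-text", ask_user=lambda name: False)
```

The part I care about is the second call: because the model lives with the OS rather than inside each app bundle, the second app gets it for free.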
It's surely not a 1.0 thing (and maybe not an ever thing), but it would be interesting if Apple could allow third parties to install their own models as the SiriKit brains. Don't like Apple's text generation? Install OpenAI's or Google's or Anthropic's… SiriKit could abstract away the model itself, so apps could integrate with the SiriKit SDK and the model used to generate the output wouldn't really matter that much.
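Here's roughly what I mean by abstracting the model away, again as a made-up sketch rather than anything Apple has announced. `TextGenerator`, `AssistantSDK`, and both generator classes are hypothetical names, the "models" just echo the prompt with a label, and it's Python only for brevity. The app talks to the SDK; the user (or the system) decides which model sits behind it.

```python
from typing import Protocol


class TextGenerator(Protocol):
    """Anything that can turn a prompt into text can act as the 'brain'."""

    def generate(self, prompt: str) -> str: ...


class BuiltInGenerator:
    # Placeholder for the platform's own model.
    def generate(self, prompt: str) -> str:
        return f"[built-in] {prompt}"


class ThirdPartyGenerator:
    # Placeholder for a model installed from another vendor.
    def generate(self, prompt: str) -> str:
        return f"[third-party] {prompt}"


class AssistantSDK:
    """Hypothetical SDK surface that apps integrate with."""

    def __init__(self, default: TextGenerator) -> None:
        self._active = default

    def set_provider(self, provider: TextGenerator) -> None:
        # In this sketch the user's choice of model is just a swap here.
        self._active = provider

    def generate_text(self, prompt: str) -> str:
        return self._active.generate(prompt)


def summarize_note(sdk: AssistantSDK, note: str) -> str:
    # App code: it only knows about the SDK, never about a specific model.
    return sdk.generate_text(f"Summarize: {note}")
```

Swapping `BuiltInGenerator` for `ThirdPartyGenerator` via `set_provider` changes the output of `summarize_note` without touching a line of the app's own code, which is the whole point.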
This solution seems so obvious to me that it feels like a matter of when rather than a matter of if. If these LLMs have legs (and I think they do), then it's simply ridiculous for this all to live outside the platform-holder's purview.