
Oops, I made a benchmark


I didn't really set out to do it, but my Quick Subtitles app actually makes for a pretty interesting benchmark tool. Back in October I compared sustained performance between the iPhone 17 Pro and iPhone Air using the app's batch feature. It didn't take much work to turn that feature into a bespoke benchmarking mode, so that's what I did.

The test

This benchmark is pretty simple: you give it an audio or video file, and it transcribes that file using Apple's on-device language model over and over and over again. I maxed everything out by giving it a Cozy Zone podcast episode to transcribe 20 times in a row. After each run, it logs how many words per second it transcribed in that run, then starts again.
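
The loop described above is simple enough to sketch. This is not Quick Subtitles' actual code. It is a minimal Python illustration that assumes a hypothetical `transcribe` function, which takes a file path and returns the transcript text, standing in for Apple's on-device transcription. Words are counted by splitting on whitespace, and each run is timed with wall-clock time.

```python
import time
from typing import Callable, List


def run_benchmark(transcribe: Callable[[str], str], path: str, runs: int = 20) -> List[float]:
    """Transcribe the same file `runs` times in a row and return words per second per run.

    `transcribe` is a hypothetical stand-in for the real on-device
    transcription call: it takes a file path and returns transcript text.
    """
    rates: List[float] = []
    for run in range(1, runs + 1):
        start = time.perf_counter()
        transcript = transcribe(path)
        elapsed = time.perf_counter() - start
        words = len(transcript.split())  # naive word count: split on whitespace
        # Guard against a zero-length timing on a trivially fast stub.
        rate = words / elapsed if elapsed > 0 else float("inf")
        rates.append(rate)
        print(f"Run {run}: {rate:.0f} words per second")
    return rates


if __name__ == "__main__":
    # Hypothetical fake transcriber so the sketch runs anywhere.
    def fake_transcribe(path: str) -> str:
        time.sleep(0.05)
        return "hello and welcome to the show " * 20

    run_benchmark(fake_transcribe, "episode.m4a", runs=3)
```

Because every run transcribes the same file, any drop in words per second from one run to the next reflects the device slowing down, not a change in the workload.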

To be clear, this is a very specific benchmark that tests the performance of a combination of components in the system on a chip, including the Neural Engine, CPU, and memory. This is not a wide-ranging, general-purpose benchmark.

But that's not what I'm using it for. In this case, I'm using it to test thermal throttling. See, whatever combination of components this tests, it generates heat…a lot of heat. I wanted to know how quickly each device would thermally throttle and, once it did, how much performance it lost.

iPhone 16e

This is the one you're probably least interested in, but here's our baseline.

  • A18 processor, 6-core CPU and 4-core GPU

As we can see, the first transcript hit 170 words per second, and by the 4th one we were about as throttled as we could get. Performance stayed around 66-70% of the max for most of the run.

iPhone Air

The iPhone Air is where things get more interesting.

  • A19 Pro processor, 6-core CPU and 5-core GPU

This one was basically the same exact story, just with a higher starting point. We started at a very good 208 words per second, but by the 4th run we were bottoming out around 130 words per second, or about 62% of the max performance.
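
The "percent of max" figures in these results come from dividing each run's rate by the fastest run. Here is a small Python helper that does this for a list of per-run rates. The name `percent_of_peak` is made up for this sketch. The example input uses only the Air's first run and the approximate floor quoted above.

```python
from typing import List


def percent_of_peak(rates: List[float]) -> List[float]:
    """Express each run's words-per-second rate as a percentage of the fastest run."""
    if not rates:
        return []
    peak = max(rates)
    return [rate / peak * 100 for rate in rates]


# iPhone Air: first run vs. roughly where it bottomed out
print(percent_of_peak([208, 130]))  # [100.0, 62.5]
```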

iPhone 17 Pro

Here's where we get to see the benefit of a vapor chamber.

  • A19 Pro processor, 6-core CPU and 6-core GPU

This one starts at a highest-yet 217 words per second on the first run and drops to 151 on the 20th and final run. That's a drop to 70% of its peak performance, but you can see a pretty linear trend as it gets marginally slower each time. What this tells me is that the vapor chamber is doing some good work. Since it's passive cooling rather than active cooling, though, the phone still eventually gets pretty darn hot and needs to throttle.
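
If you treat that decline as a straight line, you can roughly estimate how much speed the phone loses per run using only the first and last runs. The Python sketch below does this. The helper name `average_drop_per_run` is hypothetical. The linear assumption is a simplification, so this gives an average and does not describe any individual run.

```python
def average_drop_per_run(first: float, last: float, runs: int) -> float:
    """Average loss in words per second between consecutive runs,
    assuming a straight-line decline from the first run to the last."""
    if runs < 2:
        raise ValueError("need at least two runs")
    return (first - last) / (runs - 1)


drop = average_drop_per_run(217, 151, 20)
print(f"{drop:.2f} words per second lost per run")  # 3.47 words per second lost per run
print(f"{151 / 217:.1%} of peak on the final run")  # 69.6% of peak on the final run
```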

These last two were especially interesting because back in October I ran a similar test and suspected that the Air was throttling due to worse thermals, but I didn't have granular enough data to prove it. I think this test shows pretty conclusively that its raw performance on this workload is comparable to the Pro phones, but only for a couple of minutes.

MacBook Pro

Now let's get kind of unfair and bring a freaking laptop to the shootout.

  • M4 Pro, 14-core CPU and 20-core GPU

There are a couple of notable things with this test. First, in terms of throttling, buddy, this thing doesn't throttle. Outside of the first run, which was oddly a bit slower than the rest, every other run came in at almost exactly 227 words per second.

And second, while this is objectively faster than the iPhones, it's not that much faster. Its best run was only about 5% faster than the iPhone 17 Pro's best run, which is pretty remarkable for the phone. Basically, if you need to transcribe a one-hour podcast, your phone and Mac will be about the same speed doing it, but if you need to transcribe a season all at once, do that on the Mac.
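
Here is the arithmetic behind "about 5%" and "about the same speed," as a small Python sketch. The 9,000-word figure for a one-hour podcast is an assumption for illustration only. It corresponds to roughly 150 spoken words per minute and was not measured in this test. The helper names are also hypothetical.

```python
def percent_faster(a: float, b: float) -> float:
    """How much faster rate `a` is than rate `b`, as a percentage."""
    return (a / b - 1) * 100


def seconds_to_transcribe(words: int, words_per_second: float) -> float:
    """Time to transcribe `words` at a steady rate (ignores throttling)."""
    return words / words_per_second


print(f"{percent_faster(227, 217):.1f}% faster")  # 4.6% faster

ASSUMED_WORDS = 9_000  # hypothetical one-hour episode at ~150 words per minute
print(f"Mac:   {seconds_to_transcribe(ASSUMED_WORDS, 227):.1f} s")  # Mac:   39.6 s
print(f"Phone: {seconds_to_transcribe(ASSUMED_WORDS, 217):.1f} s")  # Phone: 41.5 s
```

For a single episode, the phone runs at close to its first-run speed, so the gap is a couple of seconds. Over a whole season, the phone's throttling compounds while the Mac holds steady.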

iPad Pro

While we're at it, how about an iPad Pro as well?

  • M4 processor

This looks just like the MacBook Pro chart: completely steady (besides an odd dip in the middle).


Benchmark mode will be in a Quick Subtitles update, which should be out before the end of the year. Shameless plug…More Birchtree subscribers get beta access to all of my apps, and it should be in the beta in the next day or two (depending on App Review time, which yes, also impacts TestFlight).