Gemini 2.0: speed and deep research impress

Posted by Matt Birchler
— 4 min read

From The Keyword, Google's blog: Introducing Gemini 2.0: Our New AI Model for the Agentic Era

Today we’re excited to launch our next era of models built for this new agentic era: introducing Gemini 2.0, our most capable model yet. With new advances in multimodality — like native image and audio output — and native tool use, it will enable us to build new AI agents that bring us closer to our vision of a universal assistant.

I just started playing with this, but there’s some interesting stuff here in Gemini’s updates.

Full disclosure: I typically find ChatGPT and Claude trading blows for the best answers, while Gemini, for whatever reason, has never clicked with me.

Speed, baby!

The first update is Gemini 2.0 Flash, which is a more capable model and a very fast one as well. I struggled to think of a test that would make them all generate the same amount of content in an interesting way, so I gave them this prompt:

Come up with a list of 200 baby names I can pick from. rank them from most normal to most unusual.

Then I ran it in all of the “fast” models I had available to me to see how long they took to make the list.

Assistant        | Time
Gemini 2.0 Flash | 6 seconds
Gemini 1.5 Flash | 9 seconds
ChatGPT 4o-mini  | 13 seconds
Claude 3.5 Haiku | 28 seconds
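If you want to try something similarly unscientific yourself, here's a rough sketch of timing a single request against the Gemini REST API in vanilla JavaScript. The endpoint shape matches how Google documents generateContent, but the model ID and request body here are my assumptions, so check the current API docs before relying on them. (My numbers above came from the web UI, so API timings won't match exactly.)

```javascript
// Rough sketch: wall-clock timing of one generateContent request.
// Assumptions: the model ID and request shape below; verify against
// the current Gemini API docs before using.
const API_KEY = "YOUR_API_KEY";       // placeholder
const MODEL = "gemini-2.0-flash-exp"; // assumed model ID at launch

async function timePrompt(prompt) {
  const url = `https://generativelanguage.googleapis.com/v1beta/models/${MODEL}:generateContent?key=${API_KEY}`;
  const start = performance.now();
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
  });
  const data = await res.json();
  console.log(`Done in ${((performance.now() - start) / 1000).toFixed(1)}s`);
  return data;
}

timePrompt("Come up with a list of 200 baby names I can pick from. rank them from most normal to most unusual.");
```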

Then for a more code-based workflow, I asked this:

Write me a simple html app that can show someone a picture of a duck and then let the user use a color picker to pick out individual colors on the image. They should be able to copy the hex value of their selected color to the clipboard when they click on the penguin. Please use css to make this look nice and javascript to perform the interactions. Don't use any third party libraries, just use vanilla javascript.

Assistant        | Time
Gemini 2.0 Flash | 6 seconds
Gemini 1.5 Flash | 10 seconds (doesn’t work)
ChatGPT 4o-mini  | 22 seconds
Claude 3.5 Haiku | 24 seconds

All of the apps they spit out needed a little work to actually get the job done, but Gemini 1.5 Flash literally wouldn’t do anything useful for me. It wrote code, but it ignored the entire duck image part each time I asked. All other Gemini models did a fine job, though.
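For reference, here's a minimal sketch of the kind of page the prompt asks for: it draws the image into a canvas, reads the pixel under the cursor, and copies the hex value on click. This is my own hand-rolled version, not any model's output, and "duck.jpg" is a placeholder path.

```html
<!DOCTYPE html>
<html>
<head>
  <style>
    body { font-family: sans-serif; text-align: center; }
    canvas { cursor: crosshair; border: 1px solid #ccc; }
    #swatch {
      display: inline-block; width: 2em; height: 2em;
      vertical-align: middle; border: 1px solid #000;
    }
  </style>
</head>
<body>
  <canvas id="pic"></canvas>
  <p><span id="swatch"></span> <span id="hex">Click the image to copy a color</span></p>
  <script>
    // "duck.jpg" is a placeholder: any same-origin image works
    // (cross-origin images taint the canvas and block getImageData).
    const canvas = document.getElementById("pic");
    const ctx = canvas.getContext("2d");
    const img = new Image();
    img.src = "duck.jpg";
    img.onload = () => {
      canvas.width = img.width;
      canvas.height = img.height;
      ctx.drawImage(img, 0, 0);
    };
    canvas.addEventListener("click", async (e) => {
      // Read the pixel under the cursor and format it as a hex color.
      const rect = canvas.getBoundingClientRect();
      const x = Math.floor(e.clientX - rect.left);
      const y = Math.floor(e.clientY - rect.top);
      const [r, g, b] = ctx.getImageData(x, y, 1, 1).data;
      const hex = "#" + [r, g, b].map(v => v.toString(16).padStart(2, "0")).join("");
      document.getElementById("swatch").style.background = hex;
      document.getElementById("hex").textContent = hex + " copied!";
      await navigator.clipboard.writeText(hex); // needs a secure context (https/localhost)
    });
  </script>
</body>
</html>
```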

That's all the timing I did, and it's nowhere near as rigorous as what real researchers would do, but I will say it backs up my initial feelings using the tool, which is that Gemini 2.0 Flash feels very fast compared to all the LLMs I've used in the past. It's so quick, in fact, that it outpaces the “text fade in” animation that plays on the Gemini website, so it's done generating the response well before the CSS has caught up. The ChatGPT and Claude web interfaces both render text to the screen as soon as it's ready, so they don't seem to have this issue. Honestly, I think the animation dulls the impact of the speed in some cases, but that's a good problem to have, I suppose.
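To picture what I mean, the effect looks like each streamed chunk getting its own CSS fade, something like this sketch (my guess at the general technique, definitely not Gemini's actual code):

```html
<!-- Each chunk fades in over 400ms, so if chunks arrive faster than
     that, the animation lags behind the finished response. -->
<style>
  .chunk { opacity: 0; animation: fadein 0.4s ease forwards; }
  @keyframes fadein { to { opacity: 1; } }
</style>
<div id="out"></div>
<script>
  // Simulate streamed chunks arriving faster than the fade completes.
  const words = "This text arrives faster than it fades in".split(" ");
  words.forEach((w, i) => {
    setTimeout(() => {
      const span = document.createElement("span");
      span.className = "chunk";
      span.textContent = w + " ";
      document.getElementById("out").append(span);
    }, i * 100); // 100ms per chunk vs. a 400ms fade
  });
</script>
```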

I personally value accuracy and web citations in these models more than raw speed, so these fast models aren’t what I’ll use going forward, but I definitely welcome more performant models as we try to get LLMs to run more locally and to consume less power when they do still need to run in the cloud.

Gemini with Deep Research

This one is really interesting, and I think it could end up being really cool in certain circumstances. The basic idea seems to be that you ask Gemini a question about a topic, ideally something complex with several things you want to know, and it goes out and reads a bunch of websites to generate a Google Doc with all of its findings. It takes a few minutes to do this, so you can even close the browser tab and come back when it's done; once it finishes, it creates a pretty decent “executive summary” of the topic at hand.

The best way to explain this is to just show you the results, so here's one where I asked it to tell me some things about the 2024 Formula One championship. This is something I know a lot about and can easily fact-check. Here's the Google Doc where I do just that, and truthfully, it did quite a good job, minus one spot where it seems to have just crapped out.

Here’s another one where I asked it to explain the differences between the ActivityPub and AT protocol…protocols, something I know much less about. A quick scan of this seems to generally align with the bits I do know, but I can’t speak to its overall accuracy.

The thing I like about this is that all of the websites sourced are linked inline in the final document. This makes it trivially easy for me to check its work as I go. Like I said earlier, I value accuracy over speed, and nothing makes me more confident in what an LLM tells me than being able to see a reputable human tell me the same thing.

Does this change my chatbot of choice?

I mentioned at the start that I prefer ChatGPT and Claude to Gemini, and I'm pretty confident that preference still stands even with these updates. I think Claude is a very good coding assistant, and its web UI is so much better than anyone else's at letting me play with the generated code as I iterate on it. Meanwhile, ChatGPT continues to have some of the best designers in this space building friendly features that are super reliable and just work how I want them to. I even find ChatGPT's web search feature to be surprisingly effective, and I hated Perplexity, so that's saying something!

If you do use and like Gemini though, I’m happy for you 😁