Gemini 2.0: our new AI model for the agentic era

simonw · 2024-12-11T17:15:12 1733937312

I released a new llm-gemini plugin with support for the Gemini 2.0 Flash model. Here's how to use it in the terminal:

    llm install -U llm-gemini
    llm -m gemini-2.0-flash-exp 'prompt goes here'

LLM installation: https://llm.datasette.io/en/stable/setup.html

Worth noting that the Gemini models have the ability to write and then execute Python code. I tried that like this:

    llm -m gemini-2.0-flash-exp -o code_execution 1 \
      'write and execute python to generate a 80x40 ascii art fractal'

Here's the result: https://gist.github.com/simonw/0d8225d62e8d87ce843fde471d143...

It can't make outbound network calls though, so this fails:

    llm -m gemini-2.0-flash-exp -o code_execution 1 \
      'write python code to retrieve https://simonwillison.net/ and use a regex to extract the title, run that code'

Amusingly Gemini itself doesn't know that it can't make network calls, so it tries several different approaches before giving up: https://gist.github.com/simonw/2ccfdc68290b5ced24e5e0909563c...
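For reference, here is a minimal sketch of the kind of script that prompt asks for. It is not what Gemini actually wrote (that is in the gist); the function names are made up for illustration, and a regex is only a rough way to pull a title out of HTML. It needs outbound network access, which is exactly what the code execution sandbox lacks.

```python
import re
import urllib.request
from typing import Optional

# Matches the first <title>...</title> pair, across newlines, case-insensitively.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html: str) -> Optional[str]:
    """Return the text of the first <title> element, or None if there is none."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None


def fetch_title(url: str, timeout: float = 10.0) -> Optional[str]:
    """Download a page and extract its title (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_title(html)


# Usage, on a machine that can reach the network:
#   print(fetch_title("https://simonwillison.net/"))
```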

The new model seems very good at vision:

    llm -m gemini-2.0-flash-exp describe -a https://static.simonwillison.net/static/2024/pelicans.jpg

I got back a solid description, see here: https://gist.github.com/simonw/32172b6f8bcf8e55e489f10979f8f...

simonw · 2024-12-11T20:45:07 1733949907

Published some more detailed notes on my explorations of Gemini 2.0 here https://simonwillison.net/2024/Dec/11/gemini-2/

bravura · 2024-12-11T21:27:38 1733952458

Question: Have you tried using this for video?

Alternately, if I wanted to pipe a bunch of screencaps into it and get one grand response, how would I do that?

e.g. "Does the user perform a thumbs up gesture in any of these stills?"

[edit: also, do you know the vision pricing? I couldn't find it easily]
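One possible way to send several stills with a single prompt, assuming the `llm` CLI accepts a repeated `-a` attachment option as its documentation describes. The helper name and the frame file names below are hypothetical; the sketch only builds the command line and does not call the model.

```python
from typing import List, Sequence


def build_llm_command(prompt: str, image_paths: Sequence[str],
                      model: str = "gemini-2.0-flash-exp") -> List[str]:
    """Build an `llm` argv that attaches every image to one prompt."""
    command = ["llm", "-m", model, prompt]
    for path in image_paths:
        # One -a flag per attachment; paths or URLs are passed through unchanged.
        command += ["-a", path]
    return command


stills = ["frame_001.jpg", "frame_002.jpg", "frame_003.jpg"]  # hypothetical files
command = build_llm_command(
    "Does the user perform a thumbs up gesture in any of these stills?", stills
)
# To actually run it (requires llm, llm-gemini and a configured API key):
#   import subprocess; subprocess.run(command, check=True)
```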

simonw · 2024-12-11T21:53:03 1733953983

Previous Gemini models worked really well for video, and this one can even handle streaming video: https://simonwillison.net/2024/Dec/11/gemini-2/#the-streamin...

bravura · 2024-12-11T22:55:06 1733957706

Wow this is amazing. It just gave me critique on my bodyweight squat form.

But I also found it hard to prompt to tutor in French or Portuguese; the accents were gruesomely bad.

pcwelder · 2024-12-11T20:26:44 1733948804

Code execution is okay, but soon runs into the problem of missing packages that it can't install.

Practically, sandboxing hasn't been super important for me. Running claude with mcp based shell access has been working fine for me, as long as you instruct it to use venv, temporary directory, etc.
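A rough sketch of what "use venv, temporary directory" can mean in practice, using only the Python standard library. The function name is made up for illustration. This is not a sandbox: it only keeps packages and working files away from the rest of the system, and the code can still touch anything the user account can.

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def run_in_scratch_venv(code: str) -> str:
    """Run a Python snippet with a fresh venv inside a temporary directory."""
    with tempfile.TemporaryDirectory() as scratch:
        env_dir = Path(scratch) / "venv"
        # with_pip=False keeps creation fast and offline; use True to allow installs.
        venv.create(env_dir, with_pip=False)
        if sys.platform == "win32":
            python = env_dir / "Scripts" / "python.exe"
        else:
            python = env_dir / "bin" / "python"
        result = subprocess.run(
            [str(python), "-c", code],
            cwd=scratch,          # relative file writes land in the scratch dir
            capture_output=True,
            text=True,
            check=True,
            timeout=60,
        )
        return result.stdout      # the scratch dir is deleted when the block exits
```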

mnky9800n · 2024-12-11T22:21:51 1733955711

Can it run ipython? Then you could use ipython magic to pip install things:

https://ipython.readthedocs.io/en/stable/interactive/magics....

UltraSane · 2024-12-11T20:35:10 1733949310

Is there a guide on how to do that?

pcwelder · 2024-12-11T20:58:19 1733950699

For building an MCP server? The official docs do a great job

https://modelcontextprotocol.io/introduction

My own mcp server could be an inspiration on Mac. It's based on pexpect to enable repl session and has some tricks to prevent bad commands.

https://github.com/rusiaaman/wcgw

However, I recommend creating one with your own customised prompts and tools for maximum benefit.

stavros · 2024-12-11T21:22:48 1733952168

I wrote a program that can do more or less the same thing, if you only care about the LLM running commands to help you do something:

https://github.com/skorokithakis/sysaidmin

rafram · 2024-12-11T19:08:30 1733944110

> Some pelicans have white on their heads, suggesting that some of them are older birds.

Interesting theory!

smackay · 2024-12-11T19:17:02 1733944622

Brown Pelican (Pelecanus occidentalis) heads are white in the breeding season. Birds start breeding aged three to five. So technically the statement is correct but I wonder if Gemini didn't get its pelicans and cormorants in a muddle. The mainland European Great Cormorant (Phalacrocorax carbo sinensis) has a head that gets progressively whiter as birds age.

crowcroft · 2024-12-11T15:59:23 1733932763

Big companies can be slow to pivot, and Google has been famously bad at getting people aligned and driving in one direction.

But once they do get moving in the right direction, they can achieve things that smaller companies can't. Google has an insane amount of talent in this space, and seems to be getting the right results from that now.

Remains to be seen how well they will be able to productize and market, but hard to deny that their LLM models aren't really, really good though.

StableAlkyne · 2024-12-11T17:51:52 1733939512

> Remains to be seen how well they will be able to productize and market

The challenge is trust.

Google is one of the leaders in AI and are home to incredibly talented developers. But they also have an incredibly bad track record of supporting their products.

It's hard to justify committing developers and money to a product when there's a good chance you'll just have to pivot again once they get bored. Say what you will about Microsoft, but at least I can rely on their obsession with supporting outdated products.

egeozcan · 2024-12-11T18:00:44 1733940044

> they also have an incredibly bad track record of supporting their products

Incredibly bad track record of supporting products that don't grow. I'm not saying this to defend Google, I'm still (perhaps unreasonably) angry because of Reader, it's just that there is a pattern and AI isn't likely to fit that for a long while.

RandomThoughts3 · 2024-12-11T18:54:28 1733943268

I’m sad for reader but it was a somewhat niche product. Inbox I can’t forgive. It was insanely good and was killed because it was a threat to Gmail.

My main issue with Google is that internal politics affect users all the time. See the debacle of anything built on top of Android being treated as a second-class citizen.

You can’t trust a company which can’t shield users from its internal politics. It means nothing is aligned correctly for users to be taken seriously.

makeitdouble · 2024-12-12T00:22:51 1733962971

> products that don't grow.

I think we all acknowledge this.

The question is seldom "why" they kill it (I'd argue ultimately it doesn't matter), it's about how fast and what they offer as a migration path for those who boarded the train.

That also means the minute Gemini stops looking like a growing product it's gone from this world, where Microsoft backed alternatives have a fighting chance to get some leeway to recover or pivot.

esafak · 2024-12-11T23:32:20 1733959940

Why would they grow if they don't vocally support them? Launch and hope for the best does not work; it's not the wild west on the Internet any more.

msabalau · 2024-12-11T20:48:35 1733950115

Yeah, either AI is significant, in which case Google isn't going to kill it, or AI is a bubble, in which case any of the alternatives one might pick can easily crash and die long before Google end-of-lifes anything.

This isn't some minor consumer play, like a random tablet or Stadia. Anyone who has been paying attention would have noticed that AI has been an important, consistent, long term strategic interest of Google's for a very long time. They've been killing off the failed/minor products to invest in this.

mannycalavera42 · 2024-12-11T20:22:37 1733948557

not going to miss the opportunity to upvote on the grief of having lost Reader

TIPSIO · 2024-12-11T18:10:59 1733940659

Yes. Imagine Google banning your entire Google account / Gmail because you violated their gray area AI terms ([1] or [2]). Or, one of your users did via an app you made using an API key and their models.

With that being said, I have been extremely bullish on Google AI for a long time. I imagine they land at being the best and cheapest for the foreseeable future.

[1] https://policies.google.com/terms/generative-ai

[2] https://policies.google.com/terms/generative-ai/use-policy

estebarb · 2024-12-11T19:03:30 1733943810

For me that is a reason for not touching anything from Google for building stuff. I can afford losing my Amazon account, but Google's one would be too much. At least they should be clear in their terms that getting banned at cloud doesn't mean getting banned from Gmail/Docs/Photos...

bippihippi1 · 2024-12-11T19:11:25 1733944285

why not just make a business / project account?

rtsil · 2024-12-11T19:26:53 1733945213

That won't help. Their TOS and policies are vague enough that they can terminate all accounts you own (under "Use of multiple accounts for abuse" for instance).

TIPSIO · 2024-12-11T21:35:59 1733952959

To be fair, I believe this is reserved for things like fighting fraud.

dbdoskey · 2024-12-11T21:52:53 1733953973

It has happened a few times to people who had a Google Play app banned: sometimes the personal account would get banned as well.

https://www.xda-developers.com/google-developer-account-ban-...

YetAnotherNick · 2024-12-12T00:59:25 1733965165

Even if it is warranted on their part, the 1% false positive will be detrimental to those affected. And we all know there is no way to reach out to them in case the account is automatically flagged.

estebarb · 2024-12-12T01:02:30 1733965350

I asked Gemini about banning risks, and it answered:

Gemini: Yes, there is a potential risk of your Google account being suspended if your SaaS is used to process inappropriate content, even if you use Gemini to reject the request. While Gemini can help you filter and identify harmful content, it's not a foolproof solution.

Here are some additional measures you can take to protect your account:

* Content moderation: Implement a robust content moderation system to filter out inappropriate content before it reaches Gemini. This can include keyword-based filtering, machine learning models, and human review.

...

* Regularly review usage: Monitor your usage of Gemini to identify any suspicious activity.

* Follow Google's terms of service: Make sure that your use of Gemini complies with Google's terms of service.

By taking these steps, you can minimize the risk of your account being suspended and ensure that your SaaS is used responsibly.

---

In a follow up question I asked about how to implement robust content moderation and it suggested humans reviewing each message...

dotancohen · 2024-12-11T19:49:51 1733946591

  > Google is one of the leaders in AI and are home to incredibly talented developers. But they also have an incredibly bad track record of supporting their products.

This is why we've stayed with Anthropic. Every single person I work with on my current project is sore at Google for discontinuing one product or another - and not a single one of them mentioned Reader.

We do run some non-customer facing assets in Google Cloud. But the website and API are on AWS.

bastardoperator · 2024-12-11T21:02:06 1733950926

Putting your trust in Google is a fool's errand. I don't know anyone that doesn't have a story.

meta_x_ai · 2024-12-12T00:09:05 1733962145

Google has 4 Billion users. It's delusional to think that you don't know anyone or you live in an incredibly small bubble

TacticalCoder · 2024-12-12T00:28:04 1733963284

> But they also have an incredibly bad track record of supporting their products.

I don't know about that: my wife built her first SME on Google Workspace / GSuite / Google Apps for domain (this thing changed names so many times I lost track). She's now running her second company on Google tools, again.

All she needs is a browser. At one point I switched her from Windows to OS X. Then from OS X to Ubuntu.

Now I just installed Debian GNU/Linux on her desktop: she fires up a browser and opens up Google's GMail / GSuite / spreadsheets and does everything from there.

She's been a happy paying customer of Google products for a great many years and there's actually phone support for paying customers.

I honestly don't have many bad things to say. It works fine. 2FA is top notch.

It's a much better experience than being stuck in the Windows "Updating... 35%" "here's an ad on your taskbar" "your computer is now slow for no reason" world.

I don't think they'll pull the plug on GSuite: it's powering millions and millions of paying SMEs around the world.

fluoridation · 2024-12-11T18:54:01 1733943241

>Say what you will about Microsoft, but at least I can rely on their obsession with supporting outdated products.

Eh... I don't know about that. Their tech graveyard isn't as populous as Google's, but it's hardly empty. A few that come to mind: ATL, MFC, Silverlight, UWP.

bri3d · 2024-12-11T19:18:15 1733944695

Besides Silverlight (which was supported all the way until the end of 2021!), you can still not only run but _write new applications_ using all of the listed technologies.

fluoridation · 2024-12-11T19:38:49 1733945929

That doesn't constitute support when it comes to development platforms. They've not received any updates in years or decades. What they've done is simply not remove the build capability from the toolchains. That is, not even the work that would be required to no longer support them in any way. Compare that to C#, which has evolved rapidly over the same time period.

Fidelix · 2024-12-11T20:14:34 1733948074

That's different from "killing" the product / technology, which is what Google does.

fluoridation · 2024-12-11T20:32:56 1733949176

Only because they operate different businesses. Google is primarily a service provider. They have few software products that are not designed to integrate with their servers. Many of Microsoft's businesses work fundamentally differently. There's nothing Microsoft could do to Windows to disable all MFC applications and only MFC applications, and if there was it would involve more work than simply not doing anything else with MFC.

px1999 · 2024-12-11T23:11:04 1733958664

The business model doesn't matter.

I can write something with Microsoft tech and expect it with reasonable likelihood to work in 10 years (even their service-based stuff), but can't say the same about anything from Google.

That alone stops me/my org buying stuff from Google.

fluoridation · 2024-12-11T23:26:05 1733959565

I'm not contending that Microsoft and Google are equivalent in this regard, I'm saying that Microsoft does have a history of releasing technologies and then letting them stagnate.

panabee · 2024-12-11T16:58:28 1733936308

With many research areas converging to comparable levels, the most critical piece is arguably vertical integration and forgoing the Nvidia tax.

They haven't wielded this advantage as powerfully as possible, but changes here could signal how committed they are to slaying the search cash cow.

Nadella deservedly earned acclaim for transitioning Microsoft from the Windows era to cloud and mobile.

It will be far more impressive if Google can defy the odds and conquer the innovator's dilemma with search.

Regardless, congratulations to Google on an amazing release and pushing the frontiers of innovation.

bloomingkales · 2024-12-11T17:17:59 1733937479

They have to not get blind sided by Sora, while at the same time fighting the cloud war against MS/Amazon.

Weirdly Google is THE AI play. If AI is not set to change everything and truly is a hype cycle, then Google stock withstands and grows. If AI is the real deal, then Google still withstands due to how much bigger the pie will get.

whimsicalism · 2024-12-11T22:56:47 1733957807

sora is not a big factor in this

crowcroft · 2024-12-11T17:06:13 1733936773

They need an iPod to iPhone like transition. If they can pull it off it will be incredible for the business.

TacticalCoder · 2024-12-12T00:37:03 1733963823

> Nadella deservedly earned acclaim for transitioning Microsoft from the Windows era to cloud and mobile.

You mean by shifting away from Windows for mobile and focusing on iOS and Android?

crazygringo · 2024-12-11T18:26:26 1733941586

> and Google has been famously bad at getting people aligned and driving in one direction.

To be fair, it's not that they're bad at it -- it's that they generally have an explicit philosophy against it. It's a choice.

Google management doesn't want to "pick winners". It prefers to let multiple products (like messaging apps, famously) compete and let the market decide. According to this way of thinking, you come out ahead in the long run because you increase your chances of having the winning product.

Gemini is a great example of when they do choose to focus on a single strategy, however. Cloud was another great example.

xnx · 2024-12-11T19:18:25 1733944705

I definitely agree that multiple competing products is a deliberate choice, but it was foolish to pursue it for so long in a space like messaging apps that has network effects.

As a user I still always wish that there were fewer apps with the best features of both. Google's 2(!) apps for AI podcasts being a recent example: https://notebooklm.google.com/ and https://illuminate.google.com/home

tbarbugli · 2024-12-11T19:40:03 1733946003

Google is not winning on cloud, AWS is winning and MS gaining ground.

surajrmal · 2024-12-11T20:28:08 1733948888

Parent didn't claim Google is winning. Only that there is a cohesive push and investment in a single product/platform.

rrdharan · 2024-12-11T21:22:00 1733952120

That was 2023; more recently Microsoft is losing ground to Google (in 2024).

pelorat · 2024-12-11T16:37:08 1733935028

Well, compared to github copilot (paid), I think Gemini Free is actually better at writing non-archaic code.

rafaelmn · 2024-12-11T16:41:18 1733935278

Using Claude 3.5 sonnet ?

jacooper · 2024-12-11T19:45:24 1733946324

Gemini is coming to copilot soon anyway.

manishsharan · 2024-12-11T16:14:15 1733933655

>> hard to deny that their LLM models aren't really, really good though.

The context window of Gemini 1.5 pro is incredibly large and it retains the memory of things in the middle of the window well. It is quite a game changer for RAG applications.

KaoruAoiShiho · 2024-12-11T19:14:16 1733944456

It looks like long context degraded from 1.5 to 2.0 according to the 2.0 launch benchmarks.

caeril · 2024-12-11T19:44:59 1733946299

Bear in mind that a "1 million token" context window isn't actually that. You're being sold a sparse attention model, which is guaranteed to drop critical context. Google TPUs aren't running inference on a TERABYTE of fp8 query-key inputs, let alone TWO of fp16.

Google's marketing wins again, I guess.
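The terabyte figures above can be reproduced with simple arithmetic, under one reading of the comment: a single dense score matrix with one entry per (query, key) pair for a 1,000,000-token context. That reading is an assumption, the function name is made up, and the sketch says nothing about how Gemini is actually implemented.

```python
def dense_attention_score_bytes(context_tokens: int, bytes_per_value: int) -> int:
    """Bytes needed for one full context_tokens x context_tokens score matrix."""
    return context_tokens * context_tokens * bytes_per_value


TOKENS = 1_000_000
fp8_bytes = dense_attention_score_bytes(TOKENS, 1)    # 1 byte per entry
fp16_bytes = dense_attention_score_bytes(TOKENS, 2)   # 2 bytes per entry

print(f"fp8:  {fp8_bytes / 1e12:.0f} TB")
print(f"fp16: {fp16_bytes / 1e12:.0f} TB")
```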

bwb · 2024-12-11T18:36:39 1733942199

So far, for my tests, it has performed terribly compared to ChatGPT and Claude. I hope this version is better.

TacticalCoder · 2024-12-12T00:21:38 1733962898

> but hard to deny that their LLM models aren't really, really good though

Although I do still pay for ChatGPT, I find it dog slow. ChatGPT is simply way too slow to generate answers. It feels like --even though of course it's not doing the same thing-- I'm back to the 80s with my 8-bit computer printing things line by line.

Gemini OTOH doesn't feel like that: answers are super fast.

To me low latency is going to be the killer feature. People won't keep paying for models that are dog slow to answer.

I'll probably be cancelling my ChatGPT subscription soon.

talldayo · 2024-12-11T16:54:01 1733936041

BERT and Gemma 2B were both some of the highest-performing edge models of their time. Google does really well - in terms of pushing efficiency in the community they're second to none. They also don't need to rely on inordinate amounts of compute because Google's differentiating factor is the products they own and how they integrate it. OpenAI is API-minded, Google is laser-focused on the big-picture experience.

For example; those little AI-generated YouTube summaries that have been rolling out are wonderful. They don't require heavyweight LLMs to generate, and can create pretty effective summaries using nothing but a transcript. It's not only more useful than the other AI "features" I interact with regularly, it doesn't demand AGI or chain-of-thought.

closewith · 2024-12-11T18:09:59 1733940599

> Google is laser-focused on the big-picture experience.

This doesn't match my experience of any Google product.

talldayo · 2024-12-11T19:53:56 1733946836

I disagree - another way you could phrase this is that Google is presbyopic. They're very capable of thinking long-term (eg. Google Deepmind and AI as a whole, cloud, video, Drive/GSuite, etc.), but as a result they struggle to respond to quick market changes. AdSense is the perfect example of Google "going long" on a product and reaping the rewards to monopolistic ends. They can corner a market when they set their sights on it.

I don't think Google (or really any of FAANG) makes "good" products anymore. But I do think there are things to appreciate in each org, and compared to the way Apple and Microsoft are flailing helplessly I think Google has proven themselves in software here.

lxgr · 2024-12-11T23:17:51 1733959071

Google does software/features relatively well, but they are completely lost when it comes to marketing, shipping, and continuing to support products.

Or how would you describe their handling of Stadia, or their weird obsession about shipping and cancelling about a dozen instant messengers?

bushbaba · 2024-12-11T17:35:28 1733938528

Yet, Google continues to show it'll deprecate its APIs, Services, and Functionality to the detriment of your own business. I'm not sure enterprises will trust Google's LLM over the alternatives. Too many have been burned throughout the years, including GCP customers.

The fact that GCP needs to have these pages, and that these lists are not 100% comprehensive, is telling enough. https://cloud.google.com/compute/docs/deprecations https://cloud.google.com/chronicle/docs/deprecations https://developers.google.com/maps/deprecations

Steve Yegge rightfully called this out, and yet no change has been made. https://medium.com/@steve.yegge/dear-google-cloud-your-depre...

weatherlite · 2024-12-11T17:47:22 1733939242

GCP grew 35% last quarter, just saying...

Jabbles · 2024-12-11T21:58:57 1733954337

"just saying" things that are false.

Google Cloud grew 35% year over year, when comparing the 3 months ending September 30th 2024 with 2023.

https://abc.xyz/assets/94/93/52071fba4229a93331939f9bc31c/go... page 12

surajrmal · 2024-12-11T22:31:04 1733956264

Isn't that the typical interpretation of what the parent comment said? How is it false?

mattmerr · 2024-12-11T23:50:02 1733961002

I read parent comment "grew 35% last quarter" as (income on 2024-09-30) is 1.35 * (income on 2024-07-01)

The balance sheet shows (income on days from 2024-07-01 through 09-30) is 1.35 * (income on days from 2023-07-01 through 09-30)

These are different because with heavily handwavey math the first is growing 35% in a single quarter and the second is growing 35% annually (by comparing like-for-like quarters)
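To put numbers on the difference between the two readings, here is a small sketch that assumes growth compounds at a steady rate each quarter (a simplification; real revenue does not grow that smoothly). The function names are made up for illustration.

```python
def annualized_from_quarterly(quarterly_growth: float) -> float:
    """Yearly growth implied by compounding the same quarterly growth four times."""
    return (1 + quarterly_growth) ** 4 - 1


def quarterly_from_annual(annual_growth: float) -> float:
    """Steady quarterly growth that compounds to the given yearly growth."""
    return (1 + annual_growth) ** 0.25 - 1


# Reading 1: 35% quarter over quarter compounds to roughly 232% per year.
print(f"{annualized_from_quarterly(0.35):.1%}")

# Reading 2: 35% year over year is only about 7.8% per quarter.
print(f"{quarterly_from_annual(0.35):.1%}")
```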

superq · 2024-12-12T00:35:00 1733963700

35% over 12 months != 35% over 3 months.

aerhardt · 2024-12-11T19:35:31 1733945731

> seems to be getting the right results

> hard to deny that their LLM models aren't really, really good though

I'm so scarred by how much their first Gemini releases sucked that the thought of trying it again doesn't even cross my mind.

Are you telling us you're buying this press release wholesale, or you've tried the tech they're talking about and love it, or you have some additional knowledge not immediately evident here? Because it's not clear from your comment where you are getting that their LLM models are really good.

MaxDPS · 2024-12-11T22:18:23 1733955503

I’ve been using Gemini 1.5 Pro for coding and it’s been great.

bradhilton · 2024-12-11T15:51:03 1733932263

Beats Gemini 1.5 Pro at all but two of the listed benchmarks. Google DeepMind is starting to get their bearings in the LLM era. These are the minds behind AlphaGo/Zero/Fold. They control their own hardware destiny with TPUs. Bullish.

p1esk · 2024-12-11T15:58:38 1733932718

Are these benchmarks still meaningful?

maeil · 2024-12-11T16:53:15 1733935995

No, and they haven't been for at least half a year. Utterly optimized for by the providers. Nowadays if a model would be SotA for general use but not #1 on any of these benchmarks, I doubt they'd even release it.

CamperBob2 · 2024-12-11T20:14:57 1733948097

I've started keeping an eye out for original brainteasers, just for that reason. GCHQ's Christmas puzzle just came out [1], and o1-pro got 6 out of 7 of them right. It took about 20 minutes in total.

I wasn't going to bother trying those because I was pretty sure it wouldn't get any of them, but decided to give it an easy one (#4) and was impressed at the CoT.

Meanwhile, Google's newest 2.0 Flash model went 0 for 7.

1: https://metro.co.uk/2024/12/11/gchq-christmas-puzzle-2024-re...

iamdelirium · 2024-12-11T21:42:08 1733953328

Why are you comparing flash vs o1-pro, wouldn't a more fair comparison be flash vs mini?

iamdelirium · 2024-12-11T21:49:54 1733953794

I just asked o1-mini the first two questions and it got them wrong.

CamperBob2 · 2024-12-11T23:57:26 1733961446

It's the only Google model that my account has access to that accepts .PNG files. I assumed it was the latest/greatest experimental 2.0 release.

If they want a rematch, they'll need to bring their 'A' game next time, because o1-pro is crazy good.

nrvn · 2024-12-11T21:19:45 1733951985

Did it get the 8 right? The linked article provides the wrong answer btw.

CamperBob2 · 2024-12-12T00:01:04 1733961664

I didn't see a straightforward way to submit the final problem, because I used different contexts for each of the 7 subproblems.

Given the right prompt, though, I'm sure it could handle the 'find the corresponding letter from the landmarks to form an anagram' part. That's easier than most of the other problems.

You're saying the ultimate answer isn't 'PROTECTING THE UNITED KINGDOM'?

p1esk · 2024-12-11T21:41:20 1733953280

Wow! That’s all I need to know about Google’s model.

Workaccount2 · 2024-12-11T22:43:55 1733957035

What is impressive about this new model is that it is the lightweight version (flash).

There will probably be a 2.0 pro (which will be 4o/sonnet class) and maybe an ultra (o1(?)/Opus).

danpalmer · 2024-12-11T22:09:39 1733954979

That's a comparison of multiple GPT-4 models working together... against a single GPT-4 mini style model.

dagmx · 2024-12-11T15:58:43 1733932723

Regarding TPU’s, sure for the stuff that’s running on the cloud.

However their on device TPUs lag behind the competition and Google still seem to struggle to move significant parts of Gemini to run on device as a result.

Of course, Gemini is provided as a subscription service as well so perhaps they’re not incentivized to move things locally.

I am curious if they’ll introduce something like Apple’s private cloud compute.

whimsicalism · 2024-12-11T16:03:53 1733933033

i don’t think they need to win the on device market.

we need to separate inference and training - the real winners are those who have the training compute. you can always have other companies help with inference

maeil · 2024-12-11T16:51:02 1733935862

> i don’t think they need to win the on device market.

The second Apple comes out with strong on-device AI - and it very much looks like they will - Google will have to respond on Android. They can't just sit and pray that e.g. Samsung makes a competitive chip for this purpose.

SimianSci · 2024-12-11T19:44:54 1733946294

I think Apple is uniquely disadvantaged in the AI race to a point people don't realize. They have less training data to use, having famously been focused on privacy for its users and thus having no particular advantage in this space due to not having customer data to train on. They have little to no cloud business, and while they operate a couple of services for their users, they do not have the infrastructure scale to compete with hyperscaler cloud vendors such as Google and Microsoft. Most of what they would need to spend on training new models would require that they hand over lots of money to the very companies that already have their own models, supercharging their competition.

There is a chance that Apple might come out with a very sophisticated on-device model. The problem is that they would only be able to compete with other on-device models. The magnitude of compute needed to keep pace with SOTA models is not achievable on a single device. It will take many generations of Apple silicon in order to compete with the compute of existing datacenters.

Google also already has competitive silicon in this space with the Tensor series processors, which are being fabbed at Samsung plants today. There is no sitting and praying necessary on their part as they already compete.

Apple is a very distant competitor in the space of AI, and I see no reason to assume this will change, they are uniquely disadvantaged by several of the choices they made on their way to mobile supremacy. The only thing they currently have going for them is the development of their own ARM silicon which may give them the ability to compete with Google's TPU chips, but there is far more needed to be competitive here than the ability to avoid the Nvidia tax.

mark_l_watson · 2024-12-12T01:30:25 1733967025

It is likely Apple can get additional data by creating synthetic data for user interactions.

About 7 years ago I trained GAN models to generate synthetic data, and it worked so well. The state of the art has increased a lot in 7 years, so Apple will be fine.

simonw · 2024-12-11T22:00:33 1733954433

"having famously been focused on privacy for its users and thus having no particular advantage in this space due to not having customer data to train on"

That may not be as big a disadvantage as you think.

Anthropic claim that they did not use any data from their users when they trained Claude 3.5 Sonnet.

whimsicalism · 2024-12-11T22:49:47 1733957387

sure but they certainly acquired data from mass scraping (including of data produced by their users) and/or data brokering aka paying someone to do the same.

whimsicalism · 2024-12-11T20:08:28 1733947708

yeah i’ve never understood the outsized optimism for apple’s ai strategy, especially on hn.

they’re a little bit less of a nobody than they used to be, but they’re basically a nobody when it comes to frontier research/scaling. and the best model matters way more than on-device which can always just be distilled later and find some random startup/chipco to do inference

msabalau · 2024-12-11T20:59:25 1733950765

Theory: Apple's lifestyle branding is quite important to the identity of many in the community here. I mean, look at the buy-in at launch for Apple Vision Pro by so many people on HN--it made actual Apple communities and publications look like jaded skeptics.

reportingsjr · 2024-12-11T19:13:21 1733944401

The Android on chip AI is and has been leagues better than what is available on iOS.

If anything, I think the upcoming iOS AI update will bring them to a similar level as android/google.

petra · 2024-12-11T17:31:42 1733938302

But given inference time compute, to give a strong reply reasonably fast, you'll need a lot of compute, very rarely used.

Economically this fits the cloud much better.

dagmx · 2024-12-11T16:38:02 1733935082

At what point does the on device stuff eat into their market share though? As on device gets better, who will pay for cloud compute? Other than enterprise use.

I’m not saying on device will ever truly compete at quality, but I believe it’ll be good enough that most people don’t care to pay for cloud services.

whimsicalism · 2024-12-11T17:02:14 1733936534

You're still focused on inference :)

inference basically does not matter, it is a commodity

dagmx · 2024-12-11T17:08:41 1733936921

You’re still focused on training :)

training doesn’t matter if inference costs are high and people don’t pay for them

whimsicalism · 2024-12-11T17:28:50 1733938130

but inference costs aren't high already and there are tons of hardware companies that can do relatively cheap LLM inference

dagmx · 2024-12-11T17:42:48 1733938968

Inference costs per invocation aren’t high. Scale it out to billions of users and it’s a different story.

Training is amortized over each inference, so the cost of inference also needs to include the cost of training to break even unless made up elsewhere
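
To make the amortization point concrete, here is a rough sketch with made-up numbers. The training cost, per-request inference cost and request volumes are all hypothetical placeholders, not figures for any real model, and `BreakEven` and `costPerRequest` are illustrative names. It only shows how the training share of each request shrinks as volume grows.

```java
class BreakEven {
    // Cost of one request once the one-off training bill is spread
    // evenly across every request the model ever serves.
    static double costPerRequest(double trainingCost,
                                 double inferenceCostPerRequest,
                                 double totalRequests) {
        return inferenceCostPerRequest + trainingCost / totalRequests;
    }

    public static void main(String[] args) {
        double trainingCost = 100_000_000.0; // hypothetical: $100M training run
        double inferenceCost = 0.001;        // hypothetical: $0.001 of compute per request

        // At a billion requests the training share dominates the per-request cost.
        System.out.println(costPerRequest(trainingCost, inferenceCost, 1e9));
        // At a trillion requests the raw inference cost dominates instead.
        System.out.println(costPerRequest(trainingCost, inferenceCost, 1e12));
    }
}
```

Under these assumed numbers the break-even price per request is set mostly by training at low volume and mostly by inference at high volume.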

rowanG077 · 2024-12-11T18:10:18 1733940618

That makes no sense. Inference costs dwarf training costs pretty quickly if you have a successful product. Afaik there is no commodity hardware that can run state of the art models like chatgpt-o1.

whimsicalism · 2024-12-11T18:44:35 1733942675

> Afaik there is no commodity hardware that can run state of the art models like chatgpt-o1.

Stack enough GPUs and any of them can run o1. Building a chip to infer LLMs is much easier than building a training chip.

Just because one cost dwarfs another does not mean that this is where the most marginal value from developing a better chip will be, especially if other people are just doing it for you. Google gets a good model, inference providers will be begging to be able to run it on their platform, or to just sell google their chips - and as I said, inference chips are much easier.

menaerus · 2024-12-11T20:33:30 1733949210

Each GPU costs ~50k. You need at least 8 of them to run mid-sized models. Then you need a server to plug those GPUs into. That's not commodity hardware.

whimsicalism · 2024-12-11T22:48:53 1733957333

more like ~$16k for 16 3090s. AMD chips can also run these models. The parts are expensive but there is a competitive market in processors that can do LLM inference. Less so in training.

vineyardmike · 2024-12-11T17:09:39 1733936979

I don’t think the AI market will ever really be a healthy one until inference vastly outnumbers training. What does it say about AI if training is done more than inference?

I agree that the on-device inference market is not important yet.

whimsicalism · 2024-12-11T17:28:19 1733938099

done more != where the value is at

inference hardware is a commodity in a way that training is not

YetAnotherNick · 2024-12-11T16:45:57 1733935557

If the model weights are not open, you can't run them on device anyway.

kridsdale1 · 2024-12-11T16:50:22 1733935822

The Pixel 9 runs many small proprietary Gemini models on the internal TPU.

griomnib · 2024-12-11T17:17:11 1733937431

And yet these new models still haven’t reached feature parity with Google Assistant, which can turn my flashlight on, but with all the power of burning down a rainforest, Gemini still cannot interact with my actual phone.

lern_too_spel · 2024-12-11T18:27:20 1733941640

I just tried asking my phone to turn on the flashlight using Gemini. It worked. https://9to5google.com/2024/11/07/gemini-utilities-extension...

griomnib · 2024-12-11T19:26:19 1733945179

Ok I tried literally last week on Pixel 7a and it didn’t work. What model do you have? Maybe it requires a phone that can do on-device models?

_puk · 2024-12-11T23:22:55 1733959375

Works on a Pixel 4A 5G..

Pretty sure that's not doing any fancy on-device models!

That said, there was a popup today saying that assistant is now using Gemini, so I just enabled it to try. Could well have changed in the last week.

staticman2 · 2024-12-11T22:21:51 1733955711

I just tried it on my Galaxy Ultra s23 and it worked. I then disconnected internet and it did not work.

YetAnotherNick · 2024-12-11T17:28:06 1733938086

Gemini nano weights are leaked and google doesn't care about it being leaked. Google would definitely care if Pro weights are leaked.

onlyrealcuzzo · 2024-12-11T20:00:13 1733947213

Is there any phone in the world that can realistically run pro weights?

mupuff1234 · 2024-12-11T16:13:01 1733933581

Majority of people want better performance, running locally is just a nice to have feature.

griomnib · 2024-12-11T17:18:23 1733937503

Latency is a huge factor in performance, and local models often have a huge edge. Especially on mobile devices that could be offline entirely.

dagmx · 2024-12-11T16:34:46 1733934886

They’ll care though when they have to pay for it, or when they’re in an area with poor reception.

vineyardmike · 2024-12-11T17:19:29 1733937569

Poor reception is rapidly becoming a non-issue for most of the developed world. I can’t think of the last time I had poor reception (in America) and wasn’t on an airplane.

As the global human population increasingly urbanizes, it’ll become increasingly easy to blanket it with cell towers. Poor(er) regions of the world will increase reception more slowly, but they’re also more likely to have devices that don’t support on-device models.

Also, Gemini Flash is basically positioned as a free model, (nearly) free API, free in GUI, free in Search Results, Free in a variety of Google products, etc. No one will be paying for it.

dagmx · 2024-12-11T17:32:42 1733938362

Many major cities have significant dead spots for coverage. It’s not just for developing areas.

Flash is free for API use at a low rate limit. Gemini as a whole is not free to Android users (free right now with subscription costs beyond a time period for advanced features) and isn’t free to Google without some monetary incentive. Hence why I also originally asked about private cloud compute alternatives with Google.

michaelmrose · 2024-12-11T23:32:30 1733959950

I ride a ferry from a city of 50k to a city of 700k in the US and work in a building with apartments upstairs basically a concrete cave.

I see poor reception in both areas and only one has WiFi.

mupuff1234 · 2024-12-11T16:39:05 1733935145

They pay to run it locally as well (more expensive hardware)

And sure, poor reception will be an issue, but most people would still absolutely take a helpful remote assistant over a dumb local assistant.

And you don't exactly see people complaining that they can't run Google/YouTube/etc locally.

dagmx · 2024-12-11T17:02:25 1733936545

Your first sentence has the fallacy that you’re attributing the cost of the device to a single feature against the cost of that single feature.

Most people are unlikely to buy the device for the AI features alone. It’s a value add to the device they’d buy anyway.

So you need the paid for option to be significantly better than the free one that comes with the device.

Your second sentence assumes the local one is dumb. What happens when local ones get better? Again how much better is the cloud one to compete on cost?

To your last sentence, it assumes data fetching from the cloud. Which is valid but a lot of data is local too. Are people really going to pay for what Google search is giving them for free?

mupuff1234 · 2024-12-11T17:08:00 1733936880

I think it's a more likely assumption that on device performance will trail off device models by a significant margin for at least the next few years - of course if magically you can make it work locally with the same level of performance it would be better.

Plus a lot of the "agentic" stuff is interaction with the outside world, connectivity is a must regardless.

dagmx · 2024-12-11T17:09:52 1733936992

My point is that you do NOT need the same level of performance. You need an adequate level of performance that the cost to get more performance isn’t worth it to most people.

mupuff1234 · 2024-12-11T17:13:19 1733937199

And my point is that it's way too early to try to optimize for running locally, if performance really stabilizes and comes to a halt (which may likely happen) then it makes more sense to optimize.

Plus once you start with on device features you start limiting your development speed and flexibility.

jsight · 2024-12-11T17:51:03 1733939463

It isn't really hypothetical. Lots of good models run well on a modern Macbook Pro.

YetAnotherNick · 2024-12-11T16:48:04 1733935684

You can run a model >100x faster in the cloud than on device with DDR RAM. This would make up for the reception.

dagmx · 2024-12-11T17:03:05 1733936585

And you can’t run the cloud model at all if you can’t talk to the cloud.

YetAnotherNick · 2024-12-11T17:25:06 1733937906

Yes, but I can't imagine situations where I "have" to run a model when I don't have internet at that time. My life would be more affected with the rest of the internet than having to run a small stupid model locally. At the very least until the hallucination is completely solved, as I need internet to verify the models.

dagmx · 2024-12-11T17:34:32 1733938472

You’re assuming the model is purely for generation though. Several of the Gemini features are lookup of things across data available to it. A lot of that data can be local to device.

That is currently Apple’s path with Apple Intelligence for example.

michaelmrose · 2024-12-11T23:35:34 1733960134

Hallucination can't be solved because bogus output is categorically the same sort of thing as useful output.

It has no world model. It doesn't know truth any more than it knows bullshit, just a statistical relationship between words.

VirusNewbie · 2024-12-11T18:37:25 1733942245

If you look at where talent is going, it's Anthropic that is the real competitor to Google, not OpenAI.

JeremyNT · 2024-12-11T18:11:46 1733940706

Yeah they've been slow to release end-user facing stuff but it's obvious that they're just grinding away internally.

They've ceded the fast mover advantage, but with a massive installed base of Android devices, a team of experts who basically created the entire field, a huge hardware presence (that THEY own), massive legal expertise, existing content deals, and a suite of vertically integrated services, I feel like the game is theirs to lose at this point.

The only caution is regulation / anti-trust action, but with a Trump administration that seems far less likely.

serjester · 2024-12-11T18:34:14 1733942054

Buried in the announcement is the real gem — they’re releasing a new SDK that actually looks like it follows modern best practices. Could be a game-changer for usability.

They’ve had OpenAI-compatible endpoints for a while, but it’s never been clear how serious they were about supporting them long-term. Nice to see another option showing up. For reference, their main repo (not kidding) recommends setting up a Kubernetes cluster and a GCP bucket to submit batch requests.

[1] https://github.com/googleapis/python-genai

redrix · 2024-12-12T00:55:34 1733964934

Oh wow, it supports directly specifying a Pydantic model as an output schema that it will adhere to for structured JSON output. That’s fantastic!

https://github.com/googleapis/python-genai?tab=readme-ov-fil...

mark_l_watson · 2024-12-12T01:31:24 1733967084

I looked carefully at the SDK earlier today - it does look very nice, but it is also a work in progress.

pkkkzip · 2024-12-11T20:30:10 1733949010

it's interesting that just as the LLM hype appears to be simmering down, DeepMind is making big strides. I'm more excited by this than any of OpenAI's announcements.

airstrike · 2024-12-11T16:03:03 1733932983

OT: I’m not entirely sure why, but "agentic" sets my teeth on edge. I don't mind the concept, but the word itself has that hollow, buzzwordy flavor I associate with overblown LinkedIn jargon, particularly as it is not actually in the dictionary...unlike perfectly serviceable entries such as "versatile", "multifaceted" or "autonomous"

OutOfHere · 2024-12-11T16:51:18 1733935878

To play devil's advocate, the correct use of the word would be when multiple AIs are coordinating and handing off tasks to each other with limited context, such that the handoffs are dynamically decided at runtime by the AI, not by any routine code. I have yet to see a single example where this is required. Most problems can be solved with static workflows and simple rule based code. As such, I do believe that >95% of the usage of the word is marketing nonsense.

jasonsteving · 2024-12-12T01:13:28 1733966008

You nailed an interesting nuance there about agents needing to make their own decisions!

I'm getting fairly excited about "agentic" solutions to the point that I even went out of my way to build "AgentOfCode" (https://github.com/JasonSteving99/agent-of-code) to automate solving Advent of Code puzzles by iteratively debugging executions of generated unit tests (intentionally not competing on the global leaderboard).

And even for this, there's actually only a SINGLE place in the whole "agent" where the models themselves actually make a "decision" on what step to take next, and that's simply deciding whether to refactor the generated unit tests or the generated solution based on the given error message from a prior failure.

maeil · 2024-12-11T16:55:24 1733936124

I actually have built such a tool (two AIs, each with different capabilities), but still cringe at calling it agentic. Might just be an instinctive reflex.

danpalmer · 2024-12-11T22:17:12 1733955432

I think this sort of usage is already happening, but perhaps in the internal details or uninteresting parts, such as content moderation. Most good LLM products are in fact using many LLM calls under the hood, and I would expect that results from one are influencing which others get used.

wepple · 2024-12-11T18:15:31 1733940931

Versatile is far worse. It’s so broad to the point of meaninglessness. My garden rake is fairly versatile.

Agentic to me means that it acts somewhat under its own authority rather than a single call to an LLM. It has a small degree of agency.

thom · 2024-12-11T16:33:00 1733934780

I'm personally very glad that the word has adhered itself to a bunch of AI stuff, because people had started talking about "living more agentically" which I found much more aggravating. Now if anyone states that out loud you immediately picture them walking into doors and misunderstanding simple questions, so it will hopefully die out.

geodel · 2024-12-11T16:19:08 1733933948

Huh, all three words you mentioned as replacements are equally buzzwordy and I see them a lot in CVs while screening candidates for job interviews.

lolinder · 2024-12-11T16:33:06 1733934786

They agree—they're saying that at least those buzzwords are in the dictionary, not that they'd be a good replacement for "agentic".

raincole · 2024-12-11T16:24:41 1733934281

Versatile implies it can do more kinds of tasks (than its predecessor or competitor). Agentic implies it requires less human intervention.

I don't think these are necessary buzzwords if the product really does what they imply.

airstrike · 2024-12-11T16:21:55 1733934115

At least all three of them are actually in the dictionary

hombre_fatal · 2024-12-11T19:58:50 1733947130

That's not necessarily a good thing because they are overloaded while novel jargon is specific.

We use new words so often that we take it for granted. You've passively picked up dozens of new words over the last 5 or 10 years without questioning them.

ramoz · 2024-12-11T17:02:33 1733936553

Need a general term for autonomous intelligent decision making.

aithrowawaycomm · 2024-12-11T17:57:26 1733939846

No, we need a scientific understanding of autonomous intelligent decision-making. The problem with “agentic AI” is the same old “Artificial Intelligence, Natural Stupidity” problem: we have no clue what “reasoning” or “intelligence” or “autonomous” actually means in animals, and trying to apply these terms to AI without understanding them (or inventing a new term without nailing down the underlying concept) is doomed to fail.

airstrike · 2024-12-11T17:04:28 1733936668

Isn't that just "intelligent"?

ramoz · 2024-12-11T17:39:12 1733938752

We need something to describe a behavioral element in business processes. Something goes into it, something comes out of it - though in this case nondeterminism is involved and it may not be concrete outputs so much as further actioning.

Intelligence is a characteristic.

airstrike · 2024-12-11T17:52:06 1733939526

Volitional, independent, spontaneous, free-willed, sovereign...

m3kw9 · 2024-12-11T16:57:52 1733936272

Yeah I hate it when AI companies throw around words like AGI and agentic capabilities. It’s nonsense to most people and ambiguous at best

christianqchung · 2024-12-11T23:43:24 1733960604

This is what other replies are missing - I've been following AI closely since GPT 2 and it's not immediately clear what agentic means, so to other people, the term must be even less clear. Using the word autonomous can't be worse than agentic imo.

og_kalu · 2024-12-11T15:43:20 1733931800

The Gemini 2 models support native audio and image generation but the latter won't be generally available till January. Really excited for that as well as 4o's image generation (whenever that comes out). Steerability has lagged behind aesthetics in image generation for a while now and it'd be great to see a big advance in that.

Also a whole lot of computer vision tasks (via LLMs) could be unlocked with this. Think Inpainting, Style Transfer, Text Editing in the wild, Segmentation, Edge detection etc

They have a demo: https://www.youtube.com/watch?v=7RqFLp0TqV0

jncfhnb · 2024-12-11T16:01:25 1733932885

These are not computer vision tasks…

newfocogi · 2024-12-11T17:35:19 1733938519

Maybe some of these tasks are arguably not aligned with the traditional applications of CV, but Segmentation and Edge detection are definitely computer vision in every definition I've come across - before and after NNs took over.

Jabrov · 2024-12-11T17:12:20 1733937140

What are they, then…?

85392_school · 2024-12-11T17:36:33 1733938593

The first two are tasks which involve making images. They could be called image generation or image editing.

losvedir · 2024-12-11T16:51:26 1733935886

This naming is confusing...

Anyway, I'm glad that this Google release is actually available right away! I pay for Gemini Advanced and I see "Gemini Flash 2.0" as an option in the model selector.

I've been going through Advent of Code this year, and testing each problem with each model (GPT-4o, o1, o1 Pro, Claude Sonnet, Opus, Gemini Pro 1.5). Gemini has done decent, but is probably the weakest of the bunch. It failed (unexpectedly to me) on Day 10, but when I tried Flash 2.0 it got it! So at least in that one benchmark, the new Flash 2.0 edged out Pro 1.5.

I look forward to seeing how it handles upcoming problems!

I should say: Gemini Flash didn't quite get it out of the box. It actually had a syntax error in the for loop, which caused it to fail to compile, which is an unusual failure mode for these models. Maybe it was a different version of Java or something (I'm also trying to learn Java with AoC this year...). But when I gave Flash 2.0 the compilation error, it did fix it.

For the more Java proficient, can someone explain why it may have provided this code:

     for (int[] current = queue.remove(0)) {

which was a compilation error for me? The corrected code it gave me afterwards was just

     for (int[] current : queue) {

and with that one change the class ran and gave the right solution.

srameshc · 2024-12-11T16:57:28 1733936248

I use Claude and Gemini a lot for coding and I realized there is no good or best model. Every model has its upsides and downsides. I was trying to get authentication working according to the newer guidelines of Manifest V3 for browser extensions and every model is terrible. It is one use case where there is not much information or good documentation, so every model makes up stuff. But this is my experience and I don't speak for everyone.

huijzer · 2024-12-11T17:04:39 1733936679

Relatedly, I'm starting to think more and more that AI is great for mediocre stuff. If you just need to do the 1000th website, it can do that. Do you want to build a new framework? Then there will probably be fewer useful suggestions. (Still not useless though. I do like it a lot for refactoring while building xrcf.)

EDIT: One reason that led me to think it's better for mediocre stuff was seeing the Sora model generate videos. Yes it can create semi-novel stuff through combinations of existing stuff, but it can't stick to a coherent "vision" throughout the video. It's not like a movie by a great director like Tarantino where every detail is right and all details point to the same vision. Instead, Sora is just flailing around. I see the same in software. Sometimes the suggestions go towards one style and the next moment into another. I guess AI currently is just way more limited in its context length. Tarantino has been refining his style for 30 years now. And he has always been tuning his model towards his vision. AI in comparison seems to always just take everything and turn it into one mediocre blob. It's not useless, but I think it's good to keep in mind for now that you can only use it to generate mediocre stuff.

meiraleal · 2024-12-11T19:19:14 1733944754

We got to the point that AI isn't great because it is not like a Tarantino movie. What a time to be alive.

copperx · 2024-12-11T17:22:33 1733937753

That's when having a huge context is valuable. Dump all of the new documentation into the model along with your query and the chances of success hugely increase.

monkmartinez · 2024-12-11T17:06:27 1733936787

This is true for all newish code bases. You need to provide the context it needs to get the problem right. It has been my experience that one or two examples with new functions or new requirements will suffice for a correction.

xnx · 2024-12-11T19:34:02 1733945642

> I use Claude and Gemini a lot for coding and I realized there is no good or best model.

True to a point, but is anyone using GPT2 for anything still? Sometimes the better model completely supplants others.

notamy · 2024-12-11T17:00:54 1733936454

> For the more Java proficient, can someone explain why it may have provided this code:

To me that reads like it was trying to accomplish something like

    int[] current;
    while((current = queue.pop()) != null) {
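
A runnable sketch of that drain loop, offered as a guess at the intent rather than the model's actual solution. It assumes `queue` is a `Deque<int[]>` (the original post does not show its type) and uses `poll()`, which returns `null` on an empty deque, whereas `pop()` throws instead. The class name `QueueDrain`, the `drain` helper and the sample data are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class QueueDrain {
    // Drains the queue front to back and records the first value of each entry.
    static List<Integer> drain(Deque<int[]> queue) {
        List<Integer> visited = new ArrayList<>();
        int[] current;
        // poll() removes the head and returns null once the queue is empty.
        while ((current = queue.poll()) != null) {
            visited.add(current[0]);
            // New work can be appended while draining; a for-each over the
            // same collection would typically fail with
            // ConcurrentModificationException if it were modified mid-loop.
            if (current[0] == 1) {
                queue.add(new int[] {3, 0});
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Deque<int[]> queue = new ArrayDeque<>();
        queue.add(new int[] {1, 0});
        queue.add(new int[] {2, 0});
        System.out.println(drain(queue)); // prints [1, 2, 3]
    }
}
```

The difference from `for (int[] current : queue)` is that this form consumes the queue and tolerates items being added during the loop, which is the usual shape of a breadth-first search.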

rybosome · 2024-12-11T17:18:00 1733937480

I can't comment on why the model gave you that code, but I can tell you why it was not correct.

`queue.remove(0)` gives you an `int[]`, which is also what you were assigning to `current`. So logically it's a single element, not an iterable. If you had wanted to iterate over each item in the array, it would need to be:

    for (int[] current : queue) {
        for (int c : current) {
            // ...do stuff...
        }
    }

Alternatively, if you wanted to iterate over each element in the queue and treat the int array as a single element, the revised solution is the correct one.

ianmcgowan · 2024-12-11T16:55:52 1733936152

A tangent, but is there a clear best choice amongst those models for AOC type questions?

siliconc0w · 2024-12-11T16:25:10 1733934310

What's everyone's favorite LLM leaderboard? Gemini 2 seems to be edging out 4o on chatbot arena (https://lmarena.ai/?leaderboard)

danpalmer · 2024-12-11T22:38:56 1733956736

Notably, GPT-4o is a "full size" model, whereas Gemini 2 Flash is the small and efficient variant in that family as far as I understand it.

IAmGraydon · 2024-12-11T19:51:26 1733946686

https://aider.chat/docs/leaderboards/

zhyder · 2024-12-11T18:39:13 1733942353

I like that https://artificialanalysis.ai/leaderboards/models describes both quality and speed (tokens/s and first chunk s). Not sure how accurate it is; anyone know? Speed and variance of it in particular seems difficult to pin down because providers obviously vary it with load to control their costs.

manishsharan · 2024-12-11T17:07:48 1733936868

Leaderboards are not that useful for measuring real-life effectiveness of the models, at least in my day-to-day usage.

I am currently struggling to diagnose an ipv6 mis-configuration in my enormous aws cloudformation yaml code. I gave the same input to Claude Opus, Gemini and ChatGPT (o1 and 4o).

4o was the worst. verbose and waste of my time.

Claude completely went off-tangent and began recommending fixes for ipv4 while I specifically asked for ipv6 issues

o1 made a suggestion which I tried out and it fixed it. It literally found a needle in the haystack. The solution is working well now.

Gemini made a suggestion which almost got it right but it was not a full solution.

I must clarify diagnosing network issues on AWS VPC is not my expertise and I use the LLMs to supplement my knowledge.

blastbking · 2024-12-11T20:06:37 1733947597

Sonnet 3.5 as of today is superior to Opus, curious if sonnet could have solved your problem

lossolo · 2024-12-11T20:25:46 1733948746

https://livebench.ai/#/

SV_BubbleTime · 2024-12-11T16:30:10 1733934610

AI benchmarks and leaderboards are complete nonsense though.

Find something you like, use it, be ready to look again in a month or two.

falcor84 · 2024-12-11T17:20:32 1733937632

With the accelerating progress, the "be ready to look again" is becoming a full time job that we need to be able to delegate in some way, and I haven't found anything better than benchmarks, leaderboards and reviews.

EDIT: Typo

siliconc0w · 2024-12-11T18:01:50 1733940110

FWIW I've found the 'coding' 'category' of the leaderboard to be reasonably accurate. Claude was the best, o1-mini then was typically stronger, now the Gemini Exp 1206 is at the top.

I find myself just paying a la carte via the API rather than paying the $20/mo so I can switch between the models.

hombre_fatal · 2024-12-11T20:04:58 1733947498

poe.com has a decent model where you buy credits and spend them talking to any LLM which makes it nice to swap between them even during the same conversation instead of paying for multiple subscriptions.

Though gpt-4o could say "David Mayer" on poe.com but not on chat.openai.com which makes me wonder if they sometimes cheat and sneak in different models.

tkgally · 2024-12-12T00:59:37 1733965177

I tried accessing Gemini 2.0 Flash through Google AI Studio in the Safari browser on my iPhone, and to my surprise it worked. After I gave it access to my microphone and camera, I was able to have a pretty smooth conversation with it about what it saw through the camera. I pointed the camera at things in my room and asked what they were, and it identified them accurately. It was also able to read text in both English and Japanese. It correctly named a note I played on a piano when I showed it the keyboard with my finger playing the note, but it couldn’t identify notes by sound alone.

The latency was low, though the conversation got cut off a few times.

jncfhnb · 2024-12-11T16:00:35 1733932835

Am I alone in thinking the word “agentic” is dumb as shit?

Most of these things seem to just be a system prompt and a tool that get invoked as part of a pipeline. They’re hardly “agents”.

They’re modules.

thomassmith65 · 2024-12-11T16:39:28 1733935168

It's easier for consultants and sales people to sell to enterprise if the terminology is familiar but mysterious.

Bad

  1. installed Antivirus software
  2. added screen-size CSS rules
  3. copied 'Assets' harddrive to DropBox
  4. edited homepage to include Bitcoin wallet address link
  5. upgraded to ChatGPT Pro

"Good"

  1. Cyber-security defenses
  2. Responsive Design implementation
  3. Cloud Storage
  4. Blockchain Technology gateway
  5. Agentic enhancements

xnx · 2024-12-11T16:44:57 1733935497

Controlling a browser in Project Mariner seems very agentic: https://youtu.be/Fs0t6SdODd8?t=86

Agentus · 2024-12-11T18:38:42 1733942322

The beauty of LLMs isn’t just that these coding objects speak human vernacular, but that they can be concatenated with human vernacular prompts, and that itself can be used sensibly as an input, command or output without necessarily causing an error, even if a given series of input combinations wasn't preprogrammed.

I have an A.I. textbook with agent terminology that was written in pre-LLM days. Agents are just autonomous-ish code that loops on itself with some extra functionality. LLMs in their elegance can more easily self-loop out of the box just on the basis of concatenating language prompts, sensibly. They are almost agent-ready out of the box by this very elegant quality (the textbook agentic diagram is just a conceptual self-perpetuation loop), except…

Except they fail at a lot or get stuck at hiccups. But, here is a novel thought. What if an LLM becomes more agentic (ie more able to sustain autonomous chain prompts that do actions without a terminal failure) and less copilotee not by more complex controlling wrapper self perpetuation code, but by means of training the core llm itself to more fluidly function in agentic scenarios.

a better agentically performing llm that isn't mislabeled with a bad buzzword might not reveal itself in its wrapper control code but through it just performing better in a typical agentic loop or environment conditions with whatever initiating prompt, control wrapper code, or pipeline that initiates its self-perpetuation cycle.

Havoc · 2024-12-11T21:56:49 1733954209

>“agentic” is dumb as shit?

It'll create endless consulting opportunities for projects that never go anywhere and add nothing of value unless you value rich consultants.

uludag · 2024-12-11T18:31:07 1733941867

Definitely not alone. With all this money at stake, coining dumb terms like this might make you a pretty penny.

It's like a meme that can be milked for monetization.

WA · 2024-12-11T19:45:21 1733946321

Gemini, too, for the sole reason that non-native speakers have no clue how to pronounce it.

kaashif · 2024-12-11T20:11:05 1733947865

Also, people at NASA pronounce it two ways, even native speakers of English.

purple-leafy · 2024-12-11T23:12:09 1733958729

pronounced: juh-meany .... right?

coayer · 2024-12-12T01:25:47 1733966747

I say jem-in-eye in my English accent, Google search says jeh·muh·nai

EternalFury · 2024-12-11T16:27:51 1733934471

Think of Google as of a tanker ship. It takes a while to change course, but it has great momentum. Sundar just needs to make sure the course is right.

CSMastermind · 2024-12-11T18:16:39 1733940999

That's almost word for word what people said about Windows Phone when I was at Microsoft.

rrrrrrrrrrrryan · 2024-12-11T20:43:52 1733949832

Windows Phone was actually great though, and would've eventually been a major player in the space if Microsoft were stubborn enough to stick with it long enough, like they did with the Xbox.

By his own admission, Gates was extremely distracted at the time by the antitrust cases in Europe, and he let the initiative die.

onlyrealcuzzo · 2024-12-11T20:04:20 1733947460

But Windows Phone was actually good, like Zune; it was just late, and it was incredibly popular to hate Microsoft at the time.

Additionally, Microsoft didn't really have any advantage in the smart phone space.

Google is already a product the majority of people on the planet use regularly to answer questions.

That seems like a competitive advantage to me.

machiaweliczny · 2024-12-11T20:18:50 1733948330

Yeah, I liked my windows phone, not sure why they killed it

atorodius · 2024-12-11T18:26:52 1733941612

Was the Windows Phone ever at the frontier tho?

scarmig · 2024-12-11T23:47:58 1733960878

Windows Phone was superior to everything else on the market at the time. But phones are an ecosystem, and MS was a latecomer.

zaptrem · 2024-12-11T18:44:11 1733942651

It is a lot easier to switch LLMs than it is to switch smartphone platforms.

griomnib · 2024-12-11T17:25:00 1733937900

And where is the ship headed if they are no longer supporting the open web?

Publishers are being squeezed and going under, or replacing humans with hallucinated genai slop.

It’s like we’re taking the private equity model of extracting value and killing something off to the entire web.

I’m not sure where this is headed, but I don’t think Sundar has any strategy here other than playing catch up.

Demis’ goal is pretty transparently positioning himself to take over.