Hacker News new | past | comments | ask | show | jobs | submit login

This was already posted here: https://news.ycombinator.com/item?id=43221377 but I’m really surprised at the lack of attention this model is getting. The responsiveness and apparent personality are pretty mind blowing. It’s similar to what OpenAI had initially demoed for advanced voice mode, at least for the voice conversation portion.

The demo interactions are recorded, which is mentioned in their disclaimer under the demo UI. What isn't mentioned though is that they include past conversations in the context for the model on future interactions. It was pretty surprising to be greeted with something like "welcome back" and the model being able to reference what was said in previous interactions. The full disclaimer on the page for the demo is:

" 1. Microphone permission is required. 2. Calls are recorded for quality review but not used for ML training and are deleted within 30 days. 3. By using this demo, you are agreeing to our "

edit: Actually this has been posted quite a few times already and had good visibility a couple days ago: - https://news.ycombinator.com/item?id=43200400 Others: https://hn.algolia.com/?q=sesame.com




It was genuinely startling how human it felt. Apparently they are planning on open-sourcing some of their work as well as selling glasses (presumably with the voice assistant). I’m very excited to have a voice assistant like this and am almost a bit worried I will start feeling emotionally attached to a voice assistant with this level of human-like sound.


I still feel like they don't have the right amount of human to them, maybe it's because I'm Australian and it sounds like I'm hearing an American robot?

Edit: well I asked the "male" model to speak more like an Australian and yep, getting way more uncanny. If it had an Australian accent I think it would mess with me more


Maybe the ability to personalize the voice so it is more... robotic or based on a fictional thing like Knight Rider would help to change the attachment to something more... healthy?


Yeah this is straight up creepy, and I also can't stand chatgpt saying "Lmao" and "Yeah". Keep it formal & robotic.


What ever did you tell ChatGPT so it responded with "lmao"?

I told it that it should behave explicitly like a computer in the system prompt, sort of worked.


After multiple prompts and utterly garbage output: https://i.imgur.com/5aOARCV.png

I'm almost positive that some AI systems have a backend that analyzes the sentiment of your messages and if you threaten to cancel billing it will notice your defcon-1 sentiment and spin up some more powerful instances behind the scenes to tide you over.

This is actually much more stressful than working without any AI as I have to decompress from constantly verbally obliterating a robotic intern.

I'll try with the system prompt. Also love your username.


> After multiple prompts

It generally maintains the tone you set. Remember that it outputs most likely tokens based on the system prompt of its owners + your system prompt + the whole conversation. If OpenAI and default system prompt tell it that it's a helpful cheerful secretary/assistant, you get best results if you talk to it "professionally".

I heard you could make Claude say "kurwa" a lot while helping you program in Go if you convince it that you want a conversation with your ziomek Seba from your backyard with whom you like to share kebab and browar, so there goes.


> This was already posted here: https://news.ycombinator.com/item?id=43221377 but I’m really surprised at the lack of attention this model is getting.

I'm surprised by the lack of attention that Gemini 2.0 with native audio output got. They have a demo at https://youtu.be/qE673AY-WEI, which I think is really good too. The main problem with Google's model is that this audio output is not supported by the API, but you can try it at https://aistudio.google.com.

In general, text to speech is pretty good nowadays I think. For example, this is a little math video that I made a few days ago: https://www.youtube.com/watch?v=G1mvLrCfjFM with the (old) Google text to speech API. Honestly, I think the narration is better than I personally could have done. It's calm, well pronounced, and sounds relatively enthusiastic.


>They have a demo at https://youtu.be/qE673AY-WEI

That's not a demo, that's a video. Anyone can make something like that in an afternoon with a couple friends and a microphone.

Also, Google is known for putting out fake "demos", remember the Google Duplex scam?


Scam? Duplex worked.


I thought it was announced and never heard from again. It may have worked but it never shipped did it?


I made some restaurant reservations, it worked.


I see, I guess it was never a standalone product then, from reading a Reddit post, it’s a feature built into assistant. Thanks, solves a mystery for me.


It was never real. They even admitted they used real people for the service. It was a scam.

Also, that would be quite hard to pull today, 2025, after transformers etc. There's absolutely no chance they were sitting on that back in 2018.


I know people who worked on it. It was real. They used real people for some calls, in some cases, but a vast majority of calls made through the system with 100% automatic.


Source: trust me, bro.

Meanwhile, the CEO of the f company admitted it was false, but sure, you know better ;).


He didn't, otherwise you would have linked to a quote. But OK. Believe what you want.


I doesn't work today, let alone 6 years ago.

But good work defending your master.


How do I get to this in aistudio.google.com?


I think the one under "Stream Realtime" should be similar to the demo. It's only Gemini 2.0 flash though and not the full one.


It really is an astonishing technological feat! Also note that the largest model they trained is only 8.3B parameters (8B backbone + .3B decoder). It's exciting to think that they're going to be releasing this model under an Apache 2.0 license.


Just realizing how uncanny valley it is to talk to AI and it never remembers anything you said in the past. Imagine if a human did that. It’s like you are talking to Tom Hanks’ Mr. Short Term Memory from SNL over and over.

https://youtube.com/watch?v=C6ufImch00g


I does remember but you have to ask for. Try to say "make a bookmark at this point" and later ask for that bookmark. You can even give the bookmark a name or ask it to do so for you.


That can easily be fixed if you attach it to a RAG system


> 2. Calls are recorded for quality review but not used for ML training and are deleted within 30 days.

Sounds (pun intended) reasonable.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: