Anuiran's comments

Awesome, but I'm surprised by the constrained context window, given that it's ballooning everywhere else.

Am I missing something? 8K seems quite low in the current landscape.


Honestly, I swear to god, I've been working 12 hours a day with these for a year now (llama.cpp, Claude, OpenAI, Mistral, Gemini):

The long context window isn't worth much and is currently creating more problems than it solves for the bigs, with their "unlimited" use pricing models.

Let's take Claude 3's web UI as an example. Say we build it and go the obvious route: we simply use as much of the context as possible, given the chat history.
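Concretely, "the obvious route" is something like this greedy packing. A minimal sketch, not Anthropic's actual code; `count_tokens` is a stand-in for whatever tokenizer you use:

    def pack_context(messages, budget_tokens, count_tokens):
        # Walk the history newest-first and keep every message
        # that still fits under the context budget.
        kept, used = [], 0
        for msg in reversed(messages):
            n = count_tokens(msg)
            if used + n > budget_tokens:
                break
            kept.append(msg)
            used += n
        return list(reversed(kept))  # restore chronological order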

Well, now once you're 50-100K tokens in, the initial prefill takes forever, on the order of 10 seconds. Now we have to display a warning whenever that is the case.

Now we're generating an extreme amount of load on GPUs for prefill, and it's extremely unlikely to be helpful. Writing code? The previous messages are probably ones that needed revisions. Input costs ~$0.02 / 1,000 tokens, and that price isn't arbitrary: prefill is expensive and runs on the GPU.

It's less expensive than generation, but not by much. So now we're burning ~$2 worth of GPU time for the 100K conversation, while all of the bigs charge a flat fee per month.
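Back-of-envelope with the numbers above (the ~$0.02 / 1K figure is the illustrative rate from this thread, not an official rate card):

    PRICE_PER_1K = 0.02  # assumed prefill cost, dollars per 1K tokens

    def prefill_cost(context_tokens):
        return context_tokens / 1000 * PRICE_PER_1K

    print(prefill_cost(100_000))  # -> 2.0 dollars for one 100K-token turn
    # And every new message re-prefills the whole history,
    # so a flat monthly fee erodes fast.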

Now even our _paid_ customers have to take message limits on all our models. (This is true; Anthropic quietly introduced them at the end of last week.)

Functionally:

The output limit is 4096 tokens, so tasks that are a map function (e.g., reword Moby Dick in Zoomer slang) need the input split into ~4096-token chunks anyway.
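In practice that's just a chunked map. A minimal sketch, where `llm_call` and the crude whitespace-based token count are stand-ins for whatever you actually use:

    def map_over_chunks(text, llm_call, chunk_tokens=4096):
        # Split the input into ~chunk_tokens pieces and run the
        # "map" task on each piece independently, since the output
        # cap forces this split regardless of the input window.
        words = text.split()
        chunks = [" ".join(words[i:i + chunk_tokens])
                  for i in range(0, len(words), chunk_tokens)]
        return [llm_call(chunk) for chunk in chunks]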

The only use cases I've seen thus far that _legitimately_ benefit are needle-in-a-haystack stuff, video with Gemini, or cases with huge inputs and small outputs, like putting 6.5 Harry Potter books into Gemini and getting a Mermaid diagram out connecting the characters.


As a user, I've been putting in some long mathematical research papers and asking detailed questions about them in order to understand certain parts better. I feel some benefit from it because the model can access the full context of the paper, so it is less likely to misunderstand notation that was defined earlier, etc.


Same, that's super useful.


I don't need a million tokens, but 8k is absolutely too few for many of the use cases that I find important. YMMV.


I don't think it's a YMMV thing: no one claims it is useless; in fact, there are several specific examples of it being necessary.


That depends on your use cases. I thought it's not hard to push the window to 32K or even 100K if we change the position embeddings.
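If I understand the position-interpolation trick correctly, the idea is to rescale positions so a longer sequence maps back into the range the model was trained on before computing the rotary angles. A rough sketch of the idea, not any particular library's API:

    def rope_angles(pos, dim, scale=4.0, base=10000.0):
        # Position interpolation: divide the position by `scale` so a
        # 32K sequence reuses the angle range an 8K-trained model saw.
        inv_freq = [base ** (-2.0 * i / dim) for i in range(dim // 2)]
        return [(pos / scale) * f for f in inv_freq]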


This is similar to my thoughts: "code" is for humans. AI doesn't need a game engine or massive software; some future video game just needs to output the next frame and respond to input. Little to no code required.


I really like this test.


That's really interesting. Even if you specifically tell it to "write non-rhyming, free verse, iambic pentameter prose", it absolutely cannot generate appropriate output.


Some examples here of what it can do:

https://sites.google.com/view/genie-2024/home


FF7 is not a remake in the traditional sense, which is all I can say without spoilers, and it will diverge more and more as the other parts come out.


Oh, so it's more like the Evangelion Rebuild. That's cool. I haven't touched FF7 Remake yet.


The other interim CEO turned on the board and supported Sam, I think.


This just makes the board seem very incompetent, TBH. Why select a CEO who is not aligned with the board?


This just seems like moving the goalposts.

If all AI stuff was 500x worse than it is currently that would still be a big deal.

Tech does not stand still; at any rate of improvement, this stuff will become insanely high quality.


A step beyond that: Software 2.0 and beyond does not generate source code. Your app simply lives in the LLM.


I'm wondering how the LLM will be able to make use of hardware, or handle high demand. Can it be relied on to have data integrity?


Many AI image generators can do text now, especially some unreleased ones like Muse from Google.

What goes on behind the scenes isn't necessarily 1:1 with the old image generation stuff either way.


The colors are really hard for me to see, but otherwise it is a cool idea.

