
I am so despondent at the lack of creativity in most of the (many, many) LLM-powered projects that are popping up. I have seen hardly anything that goes beyond "it's a chatbot, but with a special prompt". Like, is this the best we can expect from this supposedly ground-breaking technology?



I agree. I wrote a book on LangChain and LlamaIndex [1] and was initially very enthusiastic about possible applications. However, most of what I now do is just writing simple scripts to interact with my data. I feel like my "lines of code per month" metrics are at an all-time low. I wanted a local chat interface that worked with all the books I have written and the various local PDFs I have collected on the semantic web and other technologies. I ended up with two Python scripts, using a new library called embedchain (which builds on LangChain), that total 30 lines of code. It is so easy to do this stuff that I am not sure about the idea of using products from the new flood of LLM startups. All of this does require either using the OpenAI APIs, or the Hugging Face APIs, or renting something like a Lambda Labs GPU server and running a 33B model (if you use FastChat, you get an OpenAI-compatible API).
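
For a sense of scale, something in that spirit might look like this (a sketch against the early embedchain API, where add() took a data type and then a source; the file names are made up, and OPENAI_API_KEY must be set):

    # Chat with local PDFs and web pages in a handful of lines.
    from embedchain import App

    bot = App()
    bot.add("pdf_file", "semantic_web_notes.pdf")     # a local PDF
    bot.add("web_page", "https://example.com/notes")  # any web page

    print(bot.query("What do my notes say about RDF and SPARQL?"))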

I would urge companies and people to build their own stuff, because there is so much value in learning the tech. For off-the-shelf LLM tools, it is hard to beat OpenAI's web app, Microsoft's Bing+ChatGPT and Office 365 integrations, and Google's beta Bard integrations with Google Docs, etc.

[1] https://leanpub.com/langchain


Part of this post is very similar to a famous reply to the Dropbox launch post on HN, where someone said, roughly: "why do we need Dropbox? Here are the steps to roll your own".

The average non-tech person doesn't know what an HTTP API is.


This trope gets trotted out every time someone casts a skeptical eye on buzzy tech. In this case GP is saying that OpenAI, Microsoft and Google's apps are already "Dropbox-y" enough and that (paraphrasing fairly hard) the flood of "AI" startups are wafer thin combinations of "Bootstrap UI + GPT API calls + a vector store".

Maybe there's a killer unicorn app hiding in one of those wafer thin wrappers, but OP's point is precisely that it's disappointing how wafer-ish all these "exciting applications" actually are, relative to the massive sense of expectation that existed only a few months ago.


>[1] https://leanpub.com/langchain

Too short to call it a book.


For what it's worth, I am working on a new chapter, which will be the 5th update to the book. This topic is a moving target!


I mean, if D&D can release a book that's 43 pages long (with many pictures), this more than qualifies.


Remember that the most downloaded app in the App Store's first year was an infantile thing called iBeer… it actually hit $20k of sales per day.

That was an extremely bizarre example, but most mobile apps from the first wave were also pretty dull.

Give it some time. I think something similar will happen: most LLM apps from this first wave will soon be forgotten, and when people shift the focus from the technology back to users and their real pain points, the good stuff will surface.


Because these are the people who are trying to "front run" with an MVP. They think if they can build something and get feedback, they'll be ahead with their fuller vision.

I see a lot of companies experimenting with it, and there are a few press releases here and there, but there is a next generation that will be more ambitious. You just can't build anything more ambitious in this timeframe yet.


Exactly.

All these projects are desperately slapping buzzwords such as 'LLMs', 'AI', 'ChatGPT', etc., onto their products to pretend they are somehow revolutionary, despite sitting on someone else's AI model in the cloud, accessed via an API they do not own.

They are slowly realizing that there is little to no moat with these LLMs, let alone any serious use case other than summarizing or rewording text.

Everything else requires another human to triple-check the bullshit that the so-called black-box AI outputs.


Most of the stuff it's actually good at (like NLP tasks) is both super boring and requires a secondary layer of processing to catch hallucinations. Not as cool a sales pitch to everyone on the "it's alive!" hype train.


I've been working on NLP stuff using LLMs for a while, and it's not that the problems aren't somewhat interesting, but solving them is ridiculously tedious.

Most of my time on that project has gone into playing around with different prompts to make it do what I'm hoping for, with almost no mental model for understanding problems and solutions: mostly just coming up with experiments at random, reading the results very carefully, and checking how consistent they are.

I moved towards finding and isolating the parts LLMs are good (and reliable!) at, and using deterministic approaches for everything I possibly can. That part is not too tedious, but all the black-box trial and error (with all the waiting and errors)...

Luckily the client doesn't expect LLMs to live up to the hype and mainly just wants some reasonably useful features so they can say they use AI. I wouldn't enjoy dealing with someone who thinks it's super easy and that I just need to write a few scripts because they tried this in ChatGPT once.


Any advice for people building the second layer stuff?


Just basic sanity checking. You can use an LLM call for this, or something procedural. Generally it's just on the order of "does the output conform to this structure?", "is the returned text actually in the input?", etc.
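
For instance, a minimal procedural version of those two checks (the quote/label JSON schema here is a made-up example):

    import json

    def sanity_check(source_text: str, llm_output: str) -> bool:
        try:
            data = json.loads(llm_output)           # does it parse at all?
        except json.JSONDecodeError:
            return False
        if not isinstance(data, dict) or set(data) != {"quote", "label"}:
            return False                            # does it conform to the structure?
        return data["quote"] in source_text         # is the returned text actually in the input?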


Same. You just know most of the paid apps are going to be abandoned in a few months.


Thanks to this comment:

https://news.ycombinator.com/item?id=36529885

I realized how primitive even pure prompting is. Stable Diffusion is kind of primitive too, but its input/prompting methods are light-years ahead.


All the prompting techniques from Stable Diffusion work with regular LLMs. I have complained bitterly about this lack of tooling in NLP and wrote a GitHub gist with sample implementations to prove it.

NLP folks don't know shit about prompt engineering, which is ironic.

https://gist.github.com/Hellisotherpeople/45c619ee22aac6865c...

I still can't emphasize certain tokens in ChatGPT, or mathematically average them. Not sure why NLP folks don't bother implementing these things, even in the oobabooga frontend (which is supposed to be the automatic1111 of LLMs).
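
With open models, the embedding-scaling flavor of emphasis is only a few lines. A rough sketch against the Hugging Face transformers API (gpt2 is a stand-in, the weight is arbitrary, a recent transformers version is assumed, and whether this matches Stable Diffusion's attention-level weighting is debatable):

    # Scale a chosen token's input embedding before generation, SD "(word:1.5)" style.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    emphasis = {"quiet": 1.5}
    ids = tok("A story about a quiet village", return_tensors="pt").input_ids
    embeds = model.get_input_embeddings()(ids).detach().clone()
    for i, tid in enumerate(ids[0]):
        word = tok.decode(int(tid)).strip()
        if word in emphasis:
            embeds[0, i] *= emphasis[word]   # boost that token's influence

    out = model.generate(inputs_embeds=embeds, max_new_tokens=40, do_sample=True)
    print(tok.decode(out[0], skip_special_tokens=True))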


I just realized we already chatted briefly about banned tokens (I'm the Grover tongue-twister guy), but I somehow completely missed this gist at the time. Total facepalm moment; it would have been a helpful reference.


I'd love to continue this conversation, as I was going to write a long reply to your previous comment anyway. Please reach out to the email in my "about" box. Would love to chat!


AHhhhhhhhhhhhhhhhhhhhhhhhhhhhh. That's me screaming. I constantly wonder why this stuff doesn't exist in LLMs. But my technical depth and competence are quite low. Way lower than the people implementing the models and samplers. So I just assume: there must be a good reason, right? Right?

But recently I threw just a bit of similar-ish stuff, like what you describe there, into a TTS model, barely knowing anything, and yeah, it totally works and is fun and cool. The stuff that doesn't work fails in interesting and strange ways, so it almost STILL works. (Well, it gives people really bizarre speech impediments, at least...)

I was just working on prompt editing, actually. Which is weird to imagine in a TTS model. It makes sense for future tokens of course, for words the model has not said yet. But I think it even makes sense for the past, right? You can rewrite the past context, and it still changes the model's future audio output. In Bark these are two different things: one is the text prompt, and one is the generated audio tokens/context, which is not the same. (The text and the past audio are concatenated in the Bark prompt, so this idea makes sense in Bark but not in other models. You could change either the text OR 'what was generated with the text' independently.)

As long as you don't rewrite the span touching the most recent token, at 0 seconds - if it's a segment 2 to 4 seconds in the past, say - it should influence future output but not cause a discontinuity in the audio. I think?

BTW, an easy and fun thing: just let generation parameters be dependent variables. Of anything.

A trivial example: why is temperature just a number? Why not a function? Like, the temp varies according to how far along in the generation you are. For music, just that is already a fun tool: as a music segment starts or ends, the style transitions. Or spike the temperature at regular intervals - use a sine wave for temp, with the current token position as input. You can probably imagine that works great in a music model.
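
A sketch of that sine-wave temperature, as a plain sampling loop over an HF-style causal LM (the constants and the 64-token period are made up):

    import math
    import torch

    def temp(pos):
        # oscillates between 0.5 and 1.1 with a 64-token period
        return 0.8 + 0.3 * math.sin(2 * math.pi * pos / 64)

    def sample(model, ids, n_tokens):
        for pos in range(n_tokens):
            logits = model(ids).logits[0, -1]                # next-token logits
            probs = torch.softmax(logits / temp(pos), dim=-1)
            nxt = torch.multinomial(probs, 1)                # sample one token id
            ids = torch.cat([ids, nxt[None]], dim=1)         # append and continue
        return ids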

Even in a TTS model this gets you weird and diverse speech patterns.

The thing is: I really have a very low level of competence. Total monkey hitting keys and googling, and even I can make it work, easily. Sampling is just a loop, okay: what if I copy the logits from sample A and subtract them from sample B? What if I take the last generation, save the tokens, and ban them in the next? Really, just do anything and you end up in interesting places in the model that you didn't know existed, and they're often cool. (Recently: TTS outputting overlapping speech, for example.)
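
That "ban the last generation's tokens" move really is just a couple of lines inside the loop; a sketch, assuming 1-D logits over the vocabulary:

    import torch

    def ban(logits, banned_ids):
        out = logits.clone()
        out[list(banned_ids)] = float("-inf")   # these can never be sampled now
        return out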

Like, I recently generated French accents from any voice in the Bark TTS model, with no fine-tuning, no training, actually not even really any AI. Just by counting token frequencies in the French voices, and having the sampler loop go, "Okay, let's bump those logits up a bit, and the others down", and it just somehow works. No LoRAs, no fine-tuning, no stats; it's middle-school-level math, but it sounded great.
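
In code, that trick could be something in this shape (a sketch; the token sequences and the strength constant are illustrative, not the actual values used):

    import torch
    from collections import Counter

    def accent_bias(french_token_seqs, vocab_size, strength=2.0):
        # count which audio tokens the French voice prompts actually use
        counts = Counter(t for seq in french_token_seqs for t in seq)
        freq = torch.zeros(vocab_size)
        for tid, n in counts.items():
            freq[tid] = n
        freq /= freq.sum()
        # bump common tokens up, rare ones down, centered at zero
        return strength * (freq - freq.mean())

    # in the sampling loop: logits = logits + accent_bias(french_prompts, logits.numel())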

(I'm in a bit of a stream-of-consciousness ramble mode from lack of sleep, but I'll keep going on this message anyway so I don't forget to come back to your post when I'm back at normal capacity. And just hope I don't cringe too hard reading this when better rested.)

Oh I'd love to hear your thoughts on negative prompts in LLMs.

1) What does 'working correctly' look like?

For an audio LLM, I'm thinking something like: a negative prompt of "I'm screaming and I hate you!!!" makes the model more inclined to generate quieter, friendlier speech from your positive prompt. Something like that?

2) How to make it work.

This is probably very model-dependent and fiddly. My first thought was to generate two samples in sequence. The first sample is the negative prompt; save all the logits and tokens, then use them as a negative influence on the second prompt. At least in Bark you can't just flat-out subtract them, or what you actually get is more like 'the opposite of speech' than 'the opposite of your prompt'. But when I did French accents I basically just fiddled with a bunch of constant values and weights until it worked, so I'm hoping the same applies. I can imagine a more complicated version where you do some more math to figure out what's unique about a text prompt, versus 'a generic sentence from that language', and only push on those logits. I suppose that might be necessary.
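
Incidentally, that "weighted subtraction with fiddled constants" scheme lands close to classifier-free guidance applied to logits. A minimal sketch of the combination step (the scale constant would need per-model fiddling):

    def guided(pos_logits, neg_logits, scale=1.5):
        # scale > 1 extrapolates away from the negative prompt's distribution
        return neg_logits + scale * (pos_logits - neg_logits)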


>Like, is this the best we can expect from this supposedly ground-breaking technology?

There's an insidious reason for this. It's because LLMs are one of the few technologies we don't fully understand and can't fully control.

It's a stark contrast with traditional engineering.


There's cool stuff going on in medicine: https://www.nature.com/articles/s41586-023-06160-y


I'm genuinely curious how you would classify my product, Wonder, an AI-powered browser for kids: hellowonder.ai

Does it cross the threshold?


Yeah, I'll raise my hand here to say I built a file manager that automates file-management tasks by using AI to write Bash scripts (Aerome.net). It's still super primitive, though, if I'm being honest. I think the problem is that it's way harder to write a cross-platform file manager, or a browser wrapper in your case, than it is to write a chat interface on top of ChatGPT. I suspect in a year or two many good use cases will emerge, as people write more complicated software to take advantage of LLMs' capabilities.

I'm going to check out your browser thing later tonight; it looks good!


The fact that this tech gave us so much low-hanging fruit is a testament to how powerful it is.


I'm working on something to help you code with LLMs on projects of any size. If you go to https://inventai.xyz and sign in with your email, you'll be notified on release.

I started off with something to create AI art, but that didn't really take off. I'm also disappointed with DALL-E 2 lagging behind the others in terms of image quality. So now I'm focusing on code generation.



