
appreciate the question on hparams for websearch!

one of the main reasons i build these ai search tools from scratch is that i can fully control the depth and breadth (and also customize the loader for whatever data/sites). also, current web search offerings aren't very transparent about which sites they only have snippets for versus full text.

having computer use + websearch is definitely something very powerful (openai's deep research essentially)
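
a rough sketch of what i mean by controlling depth and breadth (web_search, fetch_full_text, and llm_followups here are hypothetical placeholders for whatever search api, custom loader, and llm you wire in):

    def web_search(query: str) -> list[str]:
        return []  # stub: return result URLs from your search API

    def fetch_full_text(url: str) -> str:
        return ""  # stub: custom loader, full text rather than snippets

    def llm_followups(query: str, pages: list[str]) -> list[str]:
        return []  # stub: ask the LLM for follow-up queries given findings

    def research(query: str, breadth: int = 5, depth: int = 2) -> list[str]:
        # breadth = hits fetched per query, depth = rounds of follow-ups
        pages, queries = [], [query]
        for _ in range(depth):
            next_queries = []
            for q in queries:
                for url in web_search(q)[:breadth]:
                    pages.append(fetch_full_text(url))
                next_queries += llm_followups(q, pages)
            queries = next_queries
        return pages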


How does this compare to the likes of Fish Audio? Wish they supported voice cloning from longer audio, though.

Haven't looked into this space for a few months, but iirc the previous SOTA was GPT-SoVITS or something?


This is the clear SOTA at the moment, even better than ElevenLabs in a technical sense, because you can specify emotion, speed, etc.


anyone tried this? is it actually better overall than xgboost/catboost?


Benchmark of TabPFN (pre-v2) compared to XGBoost, LightGBM, and CatBoost: https://x.com/FrankRHutter/status/1583410845307977733 (discussed at https://news.ycombinator.com/item?id=33486914)


Yes, it actually is, but the limits on the number of rows and features can be a hindrance.


if anything i would consider embeddings a bit overrated, or at least it's safer to underrate them.

They're not the silver bullet many initially hoped for, and they're not a complete replacement for simpler methods like BM25. They have only a limited "semantic understanding" (and as people throw increasingly large chunks into embedding models, the meaning gets even fuzzier).

Overly high expectations lead people to believe that embeddings will retrieve exactly what they mean, and with larger top-k values and LLMs that are exceptionally good at rationalizing responses, it can be difficult to notice mismatches unless you examine the results closely.


Absolutely. Embeddings have been around for a while, and most people don't realize that it wasn't until Microsoft's E5 series of models that they even benchmarked as well as BM25 on retrieval scores, while being significantly more costly to compute.

I think sparse retrieval with cross-encoder reranking is still significantly better than dense embeddings alone. Embedding indexes are also difficult to scale, since HNSW consumes too much memory above a few million vectors and IVF-PQ has recall issues.
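
A minimal sketch of that two-stage pipeline, assuming rank_bm25 for the sparse stage and a sentence-transformers cross-encoder for reranking (the corpus and model name are illustrative):

    from rank_bm25 import BM25Okapi
    from sentence_transformers import CrossEncoder

    docs = ["BM25 is a lexical ranking function.",
            "HNSW is a graph index for dense vectors.",
            "Cross-encoders score query-document pairs jointly."]
    bm25 = BM25Okapi([d.lower().split() for d in docs])
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    def search(query: str, k: int = 2) -> list[str]:
        # Stage 1: cheap sparse retrieval over the whole corpus.
        scores = bm25.get_scores(query.lower().split())
        candidates = [docs[i] for i in scores.argsort()[::-1][:k * 10]]
        # Stage 2: expensive cross-encoder scoring of the candidates only.
        pair_scores = reranker.predict([(query, d) for d in candidates])
        ranked = sorted(zip(pair_scores, candidates), reverse=True)
        return [d for _, d in ranked[:k]]

The point of the split is that the cross-encoder, which reads query and document together, only ever sees the small candidate set.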


Off-the-shelf embedding models definitely overpromise and underdeliver. In ten years I'd be very surprised if, in any competitive domain, companies weren't fine-tuning embedding models for search on their own data.


My startup (Atomic Canyon) developed embedding models for the nuclear energy space[0].

Let's just say that if you think off-the-shelf embedding models are going to work well with this kind of highly specialized content you're going to have a rough time.

[0] - https://huggingface.co/atomic-canyon/fermi-1024


> they're not a complete replacement for simpler methods like BM25

There are embedding approaches that balance "semantic understanding" with BM25-style lexical matching.

They're still pretty obscure outside of the information-retrieval space, but sparse embeddings[0] are the most widely used.

[0] - https://zilliz.com/learn/sparse-and-dense-embeddings
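
A sketch of how one family of these (SPLADE-style learned sparse embeddings) is computed; this assumes the naver/splade-cocondenser-ensembledistil checkpoint, and the pooling shown is the standard max over the sequence of log(1 + ReLU(logits)):

    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    model_id = "naver/splade-cocondenser-ensembledistil"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id)

    def sparse_embed(text: str) -> dict[int, float]:
        tokens = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            logits = model(**tokens).logits  # (1, seq_len, vocab_size)
        # Max-pool log(1 + ReLU(logits)) over the sequence, masked by
        # attention: one non-negative weight per vocabulary term.
        weights, _ = torch.max(
            torch.log1p(torch.relu(logits))
            * tokens.attention_mask.unsqueeze(-1),
            dim=1,
        )
        vec = weights.squeeze(0)
        idx = vec.nonzero().squeeze(1)
        return {int(i): float(vec[i]) for i in idx}

The output is mostly zeros, so it drops into an inverted index the same way BM25 term weights do, while still expanding to semantically related terms.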


What’s more interesting to me here are the calibration graphs:

• LLMs, at least GPT models, tend to overstate their confidence.

• A frequency-based approach appears to achieve calibration closer to the ideal.

This kinda passes my vibe test. That said, I wonder: rather than running 100 trials, could we approximate this with something like a log-probability ratio? This would especially apply in cases where the answer is yes or no, assuming the answer doesn't span more than one token.
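
A rough sketch of what that could look like with the OpenAI chat API's logprobs option (model name and prompt are illustrative, and this only renormalizes over the "yes"/"no" tokens):

    import math
    from openai import OpenAI

    client = OpenAI()

    def yes_no_confidence(question: str) -> float:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user",
                       "content": question + " Answer 'yes' or 'no' only."}],
            max_tokens=1,
            logprobs=True,
            top_logprobs=5,
        )
        # Log-probs of the top candidate first tokens.
        top = resp.choices[0].logprobs.content[0].top_logprobs
        p = {t.token.strip().lower(): math.exp(t.logprob) for t in top}
        p_yes, p_no = p.get("yes", 0.0), p.get("no", 0.0)
        # Renormalize over just the two answers of interest.
        return p_yes / (p_yes + p_no) if (p_yes + p_no) > 0 else 0.5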


If you imagine a future where LLMs get faster and cheaper even without getting better, it means we could automatically repeat questions 100x, and every answer could come with a pretty good confidence measure.
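
A minimal sketch of that repeat-and-tally idea, under the same yes/no framing as above (model name illustrative; sampling at temperature 1 so the majority share works as a confidence estimate):

    from collections import Counter
    from openai import OpenAI

    client = OpenAI()

    def sampled_confidence(question: str, n: int = 100) -> tuple[str, float]:
        answers = []
        for _ in range(n):
            out = client.chat.completions.create(
                model="gpt-4o-mini",
                temperature=1.0,
                max_tokens=1,
                messages=[{"role": "user",
                           "content": question + " Answer 'yes' or 'no' only."}],
            )
            answers.append(out.choices[0].message.content.strip().lower())
        # Majority answer and the fraction of runs that agreed with it.
        answer, count = Counter(answers).most_common(1)[0]
        return answer, count / n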


yeah, this is by far the most interesting part of this page. the fact that LLMs can know what they know is not trivial.


Side note: I feel like ChatGPT's long-term memory isn't implemented properly. If you check the 'saved memories,' they are just bad.


100% agree. I have seen similar issues with both the quality and the performance of the ChatGPT Memory feature.

Shameless plug: We have been working at Mem0 to solve the long-term memory problem for LLMs. GitHub: https://github.com/mem0ai/mem0


question: any good on-device-sized image embedding models?

tried https://github.com/unum-cloud/uform, which i do like, especially since it also supports languages other than English. Any recommendations for alternatives?


I have successfully used OpenCLIP models for embedding and similar-image search. The smallest model listed on that UForm page is 79 million parameters, so I presume that you can use other models of similar size. There are a few OpenCLIP models with 80 million or fewer parameters listed here:

https://github.com/mlfoundations/open_clip/blob/main/docs/mo...

When embeddings are quantized to int8, they still work very well for similarity (no differences in the top-10 search results on my test set). I haven't tried quantizing the models themselves.
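
A sketch of that setup, assuming one of the smaller OpenCLIP checkpoints (the model/pretrained tag is just an example), with int8 quantization applied to the embedding vectors rather than the weights:

    import numpy as np
    import open_clip
    import torch
    from PIL import Image

    model, _, preprocess = open_clip.create_model_and_transforms(
        "ViT-B-32", pretrained="laion2b_s34b_b79k"
    )
    model.eval()

    def embed(path: str) -> np.ndarray:
        img = preprocess(Image.open(path)).unsqueeze(0)
        with torch.no_grad():
            v = model.encode_image(img)
        v = v / v.norm(dim=-1, keepdim=True)  # unit norm for cosine similarity
        return v.squeeze(0).numpy()

    def quantize_int8(v: np.ndarray) -> np.ndarray:
        # Symmetric linear quantization of the embedding to int8;
        # dot products on these rank near-identically to fp32.
        scale = np.abs(v).max() / 127.0
        return np.round(v / scale).astype(np.int8)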


I've been working on a small personal project similar to this and agree that replicating the overall experience provided by Perplexity.ai, or even improving on it for personal use, isn't that challenging. (Concerns about scale and cost are less significant in personal projects, and Perplexity doesn't do much planning or query expansion, nor does it dig super deep into the sources, afaik.)

I must say, though, that they are doing a commendable job integrating sources like YouTube and Reddit. These platforms benefit from special preprocessing and indeed add value.


The Groq demo was indeed impressive. I work with LLMs a lot at work, and a generation speed of 500+ tokens/s would definitely change how we use these products. (Especially considering it's an early-stage product.)

But the "completely novel silicon architecture" and the "self-developed LPU" (claiming not to use GPUs)... make me a bit skeptical. After all, pure speed might be achievable by stacking computational power and quantizing the model. Shouldn't innovation at the GPU level be quite challenging, especially to achieve such groundbreaking speeds?


I work at Groq. We aren't using GPUs at all. This is a novel hardware architecture of ours that enables this high throughput and low latency. Nothing sketchy about it.


> Shouldn't innovation at the GPU level be quite challenging, especially to achieve such groundbreaking speeds?

GPUs are general-purpose; a purpose-built chip that does better isn't that hard to make. Google didn't have to work hard to invent TPUs, which follow the same idea: they said their first tests proved the concept worked, so it didn't require anything near Nvidia's scale or expertise.


more on the LPU and data center: https://wow.groq.com/lpu-inference-engine/

price and speed benchmark: https://wow.groq.com/


The place where I work was an early adopter of LLMs, having started working with them a year ago.

When I built stuff with GPT-3, especially in the earlier days, I got the strong impression that we were doing machine learning without NumPy and pandas.

With LangChain, many of the systems I have built can be done in just one or two lines, making life much easier for the rest of us. I also believe that LangChain's agent framework is underappreciated; it was well ahead of its time until the official ChatGPT plugins were released. (I've contributed to LangChain a bit too.)
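
To give a sense of the one-or-two-liner claim, a sketch using the LangChain API surface as of early 2023 (prompt and model settings are illustrative):

    from langchain.chains import LLMChain
    from langchain.llms import OpenAI
    from langchain.prompts import PromptTemplate

    prompt = PromptTemplate(
        input_variables=["question"],
        template="Answer concisely: {question}",
    )
    # One line to wire a prompt template to an LLM, one line to run it.
    chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
    print(chain.run(question="What is BM25?"))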

Unfortunately, the documentation is indeed lacking. While I understand the need to move quickly, it isn't good that some crucial concepts, like custom LLMs, have inadequate documentation. (Perhaps an LLM built on top of the repo would be more effective than documentation at this point.)


The docs are lacking, but the underlying code is so simple that it's just a few clicks/rg searches away from figuring out what you need. It's mostly ways to do string templating. IMO the ergonomics of LangChain need an overhaul; there are too many ways to do the same thing, and there's too much type erasure, which makes it hard to play to the strengths of a particular LLM. For example, it's still a pain to distinguish between using a chat-oriented LLM and a regular completion one.

Observability in the code also seems really poor, and performance seems to be an afterthought. I tell friends who ask about LangChain that it's great to experiment with but not something I'd put into production. Hopefully this funding helps them shore things up.


Are you saying you'd use something else in production?

> For example, it's still a pain to distinguish between using a chat-oriented LLM and a regular completion one.

Totally agree. After using it for a few weeks, this is one of the most visible weaknesses in the design.


> Are you saying you'd use something else in production?

Absolutely. The LangChain interface is quite easy to build atop any old LLM, and if you're just using OpenAI, then OpenAI already distributes clients in a bunch of languages that you can just use. Even then, you're calling a few HTTP endpoints and stuffing some data into a payload. It's really basic stuff. I'd prototype in LangChain, grab the prompts and agents I ended up using, and reimplement them in $MY_PL on $MY_PLATFORM. That's what I find so fun about these LLMs: they're trivial to use for anyone who can ship text off to an endpoint (whether networked or local) and receive a text response.
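
To make the "few HTTP endpoints" point concrete, a bare sketch against the OpenAI chat completions endpoint (model and prompt are illustrative):

    import os
    import requests

    resp = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": "gpt-3.5-turbo",
            "messages": [{"role": "user",
                          "content": "Summarize BM25 in one line."}],
        },
        timeout=30,
    )
    print(resp.json()["choices"][0]["message"]["content"])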


This is what blows my mind. They raised a $10M seed on what is (no disrespect intended) a wafer-thin abstraction whose core an experienced dev could implement in a few days. Obviously the raise was about the momentum and mindshare LangChain holds, but still. Wow.


Agreed. The text handling for feeding prompts into an LLM and parsing the response is about as simple and straightforward as it gets in programming. It is nice to have control over that process and know what your code is doing every step of the way. It doesn't need to do much anyway; the LLM is doing most of the hard work. I'm trying to understand it, but I just can't see the value in a black-box library for interfacing with an LLM when it's so easy to DIY.


I agree that their current implementation is, as you said, something an experienced dev could do in a couple of days. But they have the potential to build a really robust library here. The thing is, there are a lot of small things (stop sequences, token estimation, API-call backoff, generic structures) that are just a pain to build yourself.

But you're right that their moat will probably be razor-thin. A few senior devs could get together and probably hackathon a library that's just like LangChain but much more robust. Thanks for an idea for what I'm gonna do this weekend lol.
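
As an example of those small things, a generic sketch of API-call backoff (not LangChain's actual implementation; retry count and delays are arbitrary):

    import random
    import time
    from functools import wraps

    def with_backoff(max_retries: int = 5, base_delay: float = 1.0):
        """Retry a flaky API call with jittered exponential backoff."""
        def decorator(fn):
            @wraps(fn)
            def wrapper(*args, **kwargs):
                for attempt in range(max_retries):
                    try:
                        return fn(*args, **kwargs)
                    except Exception:
                        if attempt == max_retries - 1:
                            raise  # out of retries, surface the error
                        # Sleep 1s, 2s, 4s, ... with up to 2x random jitter.
                        time.sleep(base_delay * 2 ** attempt
                                   * (1 + random.random()))
            return wrapper
        return decorator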


> But you're right that their moat will probably be razor-thin. A few senior devs could get together and probably hackathon a library that's just like LangChain but much more robust. Thanks for an idea for what I'm gonna do this weekend lol.

Did you do it? I'm doing something similar but in Rust, for my product. It'll be open source soon enough.


I was about to rant about the documentation, but I just checked and it seems to have improved a lot.


I wonder, is it already possible for an AI to write documentation from scratch based on a code base?


I agree that the agents are underappreciated.

To make them more accessible, I rewrote them in ~200 lines of code, so you can easily understand how they work.

They have access to a Python console, Google search, and Hacker News search:

https://github.com/mpaepper/llm_agents
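
For anyone curious, the core loop is tiny. A condensed sketch of the pattern (the tool set, model, and prompt format here are illustrative stand-ins, not the repo's exact code):

    import re
    from openai import OpenAI

    client = OpenAI()
    TOOLS = {"search": lambda q: f"(stub) top result for {q!r}"}

    def run_agent(question: str, max_steps: int = 5) -> str:
        transcript = (
            "Answer the question using this format:\n"
            "Thought: ...\nAction: search[query]\nObservation: ...\n"
            "Repeat as needed, then end with Finish[answer].\n\n"
            f"Question: {question}\n"
        )
        for _ in range(max_steps):
            # Stop before the model hallucinates its own Observation.
            out = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": transcript}],
                stop=["Observation:"],
            ).choices[0].message.content
            transcript += out
            if (m := re.search(r"Finish\[(.*?)\]", out)):
                return m.group(1)
            if (m := re.search(r"Action: (\w+)\[(.*?)\]", out)):
                tool = TOOLS.get(m.group(1), lambda q: "unknown tool")
                # Feed the real tool output back in as the Observation.
                transcript += f"\nObservation: {tool(m.group(2))}\n"
        return "(no answer within step budget)"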


In case anyone misses it buried in the readme, your accompanying blog post looks like the solid introduction to the subject that I've been looking for: https://www.paepper.com/blog/posts/intelligent-agents-guided...


I was looking through LangChain's docs and code last weekend. I'm surprised at how well it is documented, actually. I thought it was fairly feature-rich vis-à-vis potential chaining opportunities, but with obvious room to grow. Quite impressive, all things considered.

Excited to see what happens going forward.


I think LangChain is already outdated and it (and its copycats) are going to cripple the entire field.


It is, but it'll still get copious funding and let a few sly engineers escape the Matrix - so what's the harm?


Could you expand on what you think is the state of the art and direction we should be heading in?


Well, has the community tried analyzing what could be done better, and iterating on the design?

No, the design from an old academic paper was used with a much newer model. And now everyone is just copying that.

It works because the new models are impressive. But the design is far from elegant or particularly efficient. And now, because tons of data in that shape is going to be generated, we'll be stuck with it.


I mean, it's based on a paper[0] from November, no? Or is that called "old" in the AI world?

[0]: https://ai.googleblog.com/2022/11/react-synergizing-reasonin...


Yes, relatively old. The issue is that this approach was designed to work with "classical" language models, trained with "The Pile"-style methods. The particular one in the paper was PaLM 540B.

So essentially you have an approach designed for models that are not really instruction-following and that truly are stochastic parrots.

The models have changed substantially since then, but the approach of chaining them in this particular way stuck, and it is getting copied everywhere without much thought.


Your answer doesn't make much sense with regard to LangChain, to be honest.


Sure. I’m just expressing my opinion that the design is suboptimal and that the level of design is literally “You are GPT-3 and you suck at math” [quote from the LangChain code base].

I don't want to see further expansion of this. I'm not offering a higher-level design, because I'm not sure about the safety of all of this.

But having a poor and potentially unstable design like LangChain also doesn’t contribute to it :-/


Sorry to bring up an older thread, but I was looking into LangChain recently, and I was thinking of making something similar but in other languages. Do you have any insight into which direction LangChain-like alternatives should take?


It seems pretty rare that communities get "stuck with" a framework. Frameworks are pretty fluid. E.g., Python didn't get "stuck on Django," and it didn't get "stuck on Flask," and it's not "stuck on FastAPI" now; the ecosystem continued to evolve, and none of these projects even had to die for a different vision of how a framework should be organized to capture the zeitgeist. They've each got dedicated communities that continue to improve them.

Similarly, I expect creative hackers to pursue new approaches in the space of LLM frameworks and for some of those to catch on, and that they don't need to uproot langchain to do so.


The difference is that lots of data is being generated, and open models in particular are trained on it. So there is a certain level of stickiness.

A file format is a close analogy. Imagine someone invents something nasty and bulky, like a twisted form of XML, and simply because it is early, it catches on. It could be buggy, unstable, and ugly, but it still wins, because it was early.

The call here is to examine LangChain's design a bit more closely, and maybe consider that starting from scratch is a good idea.


Could you expand on what you think is the state of the art and direction we should be heading in?


Why? What do you think the better model is?


There are two JavaScript alternatives, but how robust their development will be remains to be seen:

https://github.com/hwchase17/langchainjs

https://github.com/cfortuner/promptable


Where do you see LangChain fitting into the ecosystem once OpenAI rolls out plugins more widely?


it still works well with other LLMs (like LLaMA and more).

various small, open-source, vertical LLMs vs. one large GPT model would be quite interesting.


Then we will connect plugins visually, the way Unreal's Blueprint visual scripting allows.


Is this supposition, or actually the direction LangChain is headed?


It's unfortunate that pretty much all shiny AI things have horrible documentation. I see a lot of misinformation in non-researcher AI circles, and I feel like it sometimes stems from that.


it is ironic that the documentation is lacking when it could be generated with an LLM, using LangChain itself

