budududuroiu · 2024-05-30T03:26:23

All I want is semantic search. I can make sense of what I find, I don’t want a Transformer to regurgitate slop to me

wildrhythms · 2024-05-30T12:27:11

Right. I think there's value in being able to tell the LLM to give me the result in JSON format, for example. But the hallucinatory nature of LLMs poisons everything.

budududuroiu · 2024-05-30T13:13:49

Yeah, I still don’t fully trust that pipeline, because running it twice could give me different results, even if they’re structured.

kjkjadksj · 2024-05-30T16:22:32

Wish you could supply the random seed yourself to at least make it a little more deterministic
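For illustration only, here is a minimal sketch of what a caller-supplied seed buys you at the sampling step. The vocabulary, weights, and the `sample_tokens` helper are all made up for this example and are not any vendor's API; a real inference stack can have other sources of nondeterminism that a seed alone does not remove.

```python
import random

def sample_tokens(vocab, weights, n, seed):
    """Draw n tokens from a fixed distribution using a caller-supplied seed.

    Hypothetical helper: it stands in for the sampling step of a language
    model, not for any real LLM API.
    """
    rng = random.Random(seed)  # private generator, so no global state leaks in
    return rng.choices(vocab, weights=weights, k=n)

# Toy next-token distribution (assumed values, for illustration only).
vocab = ["yes", "no", "maybe"]
weights = [0.5, 0.3, 0.2]

run_a = sample_tokens(vocab, weights, 10, seed=42)
run_b = sample_tokens(vocab, weights, 10, seed=42)

# Same seed and same distribution give the same sequence on every run.
print(run_a == run_b)
```

With the seed fixed, the two runs are identical; drop the seed and they will usually differ.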

budududuroiu · 2024-05-22T01:23:36

An interface for vector search is 100x more helpful to me than an LLM spitting out the same content as slop.

The key to vector search is how you chunk your data, but I have some libraries to help with that
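As a rough sketch of what chunking means here, assuming the simplest possible strategy (fixed-size word windows with some overlap, so a sentence cut at a boundary still appears whole in a neighbouring chunk). The `chunk_words` name and its default sizes are hypothetical; this is not taken from any particular library.

```python
def chunk_words(text, size=50, overlap=10):
    """Split text into word windows of `size` words, sharing `overlap` words.

    Hypothetical helper for illustration; real chunkers often split on
    sentences, headings, or tokens instead of whitespace-separated words.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

sample = "a b c d e f g h i j"
for chunk in chunk_words(sample, size=4, overlap=2):
    print(chunk)
```

Each chunk would then be embedded and indexed separately, which is why the window size and overlap end up shaping what vector search can find.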

budududuroiu · 2024-05-17T00:08:38

> We offer customers a choice around these practices

If you’re so customer-first, and make it so easy to opt out, just make it opt-in instead. Oh wait, you’re just a lying pathetic corpo

hu3 · 2024-05-17T00:14:42

Sadly, even Firefox is using opt-out these days. I feel we're going downhill with regards to privacy.

https://news.ycombinator.com/item?id=40355982

> What Firefox’s search data collection means for you

> We understand that any new data collection might spark some questions. Simply put, this new method only categorizes the websites that show up in your searches — not the specifics of what you’re personally looking up.

> Sensitive topics, like searching for particular health care services, are categorized only under broad terms like health or society. Your search activities are handled with the same level of confidentiality as all other data regardless of any local laws surrounding certain health services.

> Remember, you can always opt out of sending any technical or usage data to Firefox. Here’s a step-by-step guide on how to adjust your settings. We also don’t collect category data when you use Private Browsing mode on Firefox.

> As far as user experience goes, you won’t see any visible changes in your browsing. Our new approach to data will just enable us to better refine our product features and offerings in ways that matter to you.

budududuroiu · 2024-05-17T00:26:13

Yeah these statements are always so benevolent, with the patronising undertone of “this is for your own good, for a better experience”.

SMH

budududuroiu · 2024-05-14T00:34:16

What’s wrong with Australia, I swear for the past couple years these guys saw every authoritarian dystopian movie and went “hold my biya”

10u152 · 2024-05-14T01:26:42

It’s depressing, both major political parties are enthusiastic about the idea of removing as many rights of citizens as possible.

budududuroiu · 2024-05-14T01:29:49

But why? Like for example, in the US you can chalk it up to reducing labor rights to drive down cost and increase stock price, but what’s the reason Australia has this authoritarian boner atm?

a_bonobo · 2024-05-14T01:56:54

Australia is a special case as it does not have a national-level bill of rights [1]. There are various cases that enshrine aspects of human rights as Australia is party to many UN human rights agreements, but there is no central easy document people can go back to like the US constitution or the German Grundgesetz. That allows for easy overreach that has to, time and time again, be pushed back by courts and then enshrined in new laws. For example, age discrimination is unconstitutional in some countries but is only illegal in Australia since the Age Discrimination Act 2004. So Australia needs piecemeal laws for every civil rights issue.

[1] https://peo.gov.au/understand-our-parliament/how-parliament-...

nojvek · 2024-05-14T17:54:15

Australia is very much a nanny state. You get fined for all sorts of things: jaywalking, biking without a helmet, speeding over 5km/h, red light cameras, changing lanes without signalling, etc.

shermozle · 2024-05-14T01:41:33

Cops ask, cops get. Same as America.

budududuroiu · 2024-05-12T04:22:45

I’m going to approach this from a totally different angle, which may be wrong.

To me, periods of intense geopolitical tension seem to correlate with a suppression of civil liberties in the West (see: McCarthyism and the Red Scare).

Now, with algorithmically driven feeds and a dying media apparatus clamouring to maintain relevance, it’s easier than ever for foreign state actors (and domestic ruling classes) to exploit genuine civil expression of dissatisfaction for division and partisanship.

budududuroiu · 2024-05-08T02:34:56

My issue with RAG systems isn’t hallucinations. Yes, sure, those are important. My issue is recall. Given a petabyte-scale index of chunks, how can I make sure that my RAG system surfaces the “ground truth” I need, and not just “the most similar vector”?

This, I think, is scarier: a healthcare-oriented (or any industry) RAG system retrieving a bad but linguistically very similar answer.
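A toy sketch of the gap between "most similar vector" and "ground truth", using made-up 3-dimensional embeddings and hypothetical chunk ids. The numbers are invented purely to show how recall@k is measured; they say nothing about any real embedding model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, index, k):
    """Return the ids of the k chunks most similar to the query."""
    ranked = sorted(index, key=lambda cid: cosine(query, index[cid]), reverse=True)
    return ranked[:k]

def recall_at_k(retrieved, relevant):
    """Fraction of the relevant chunk ids that were actually retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)

# Invented embeddings: the wrong chunk sits closer to the query than the right one.
index = {
    "similar_but_wrong": [0.9, 0.1, 0.0],
    "ground_truth": [0.6, 0.8, 0.0],
    "unrelated": [0.0, 0.0, 1.0],
}
query = [1.0, 0.2, 0.0]
relevant = {"ground_truth"}

for k in (1, 2):
    hits = top_k(query, index, k)
    print(k, hits, recall_at_k(hits, relevant))
```

In this toy setup the top-1 result is the similar-but-wrong chunk, so recall@1 is 0 even though the similarity score looks excellent; the ground truth only shows up at k=2. Measuring this needs a labelled set of relevant chunks per query, which is the hard part at scale.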

thenaturalist · 2024-05-08T08:46:31

You're correctly identifying an issue that by now I think everyone is facing globally: Realizing the bottleneck to performance or improvements of LLMs isn't necessarily quantity, but inevitably quality.

Which is a much harder problem to solve outside a few highly standardized niches/industries.

I think synthetic data generation as a means to guide LLMs over a larger than optimal search space is going to be quite interesting.

budududuroiu · 2024-05-08T09:09:24

To me synthetic data generation makes no sense. Mathematically your LLM is learning a distribution (let’s say of human knowledge). Let’s assume your LLM models human knowledge perfectly. In that case, what can you achieve? Just sampling the same data that your model mapped perfectly.

However, if your model’s distribution is wrong, you’re basically going to have an even more skewed distribution in models trained using the synthetic data.

To me, it seems like the architecture is the next place for improvements. If you can’t synthesise the entirety of human knowledge using transformers, there’s an issue there.

The smell that points me in that direction is the fact that up until recently, you could quantise models heavily with little drop in performance, but recent Llama3 research shows that’s not the case anymore

budududuroiu · 2024-05-08T02:30:34

Those “do-not-search here” chunks wouldn’t be retrieved during vector search and reranking, because they would likely have a very low cross-encoder score against a question like “Who are the business partners of X?”.
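A minimal sketch of that filtering step, with a loud caveat: `toy_relevance` below is plain word overlap standing in for a cross-encoder, not a real one, and the question, chunks, and `min_score` threshold are all invented for the example.

```python
def toy_relevance(question, chunk):
    """Stand-in scorer: Jaccard overlap of lowercase words.

    NOT a cross-encoder. A real reranker would score the (question, chunk)
    pair jointly with a trained model; this only mimics the interface.
    """
    q_words = set(question.lower().split())
    c_words = set(chunk.lower().split())
    if not q_words or not c_words:
        return 0.0
    return len(q_words & c_words) / len(q_words | c_words)

def rerank(question, candidates, score_fn, min_score):
    """Score each candidate, drop those below min_score, sort best first."""
    scored = [(score_fn(question, chunk), chunk) for chunk in candidates]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in kept]

question = "who are the business partners of acme"
candidates = [
    "do not search here for this topic",
    "the business partners of acme are foo and bar",
]

print(rerank(question, candidates, toy_relevance, min_score=0.2))
```

The instruction-like chunk shares no words with the question, scores 0, and is dropped before anything reaches the generator. Whether a trained cross-encoder scores such chunks as low in practice is the claim being made above, not something this sketch demonstrates.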

budududuroiu · 2024-05-02T01:43:39

You can buy fully functional Android phones for less than $200, with better cameras, and battery life, and larger screens, and… well not a software-locked touch screen

budududuroiu · 2024-05-02T01:36:21

… or, you could do the same with a web app… and OAuth.

But a web app doesn’t get you the hype, the VC funding, and the galaxy-brained Twitter tech-bro adoration.

budududuroiu · 2024-04-30T03:39:23

Off topic, but all these LLM pins/tools/devices keep building around chat. Chat is a genuinely horrible interface for conveying information to users, but it provides a nice layer of polite plausible deniability when the response is total garbage.

giantrobot · 2024-04-30T04:59:14

Not only is chat a poor UX, I don't want to talk to shit to use it. In a pinch I'll use Siri to set a timer or alarm but in general I just don't want to talk to things. I also don't really want my things talking to me most of the time.

I can read faster than the computer can talk and type/touch faster (and more clearly) than I can talk. Unless the voice output is really finely tuned a lot of stuff I usually search for sounds like gibberish when read back with TTS. I've yet to find even AI TTS that recognizes source code or technical abbreviations and reads them correctly.

Then there's of course the privacy aspect, or complete lack of privacy.

add-sub-mul-div · 2024-04-30T03:52:24

Likewise, after mastering how to find answers by typing in the correct few keywords, why go backwards and use slow/ambiguous written or spoken conversation as a query language?

MattRix · 2024-04-30T04:07:31

Not sure if I understand this complaint, since you can still use just a few keywords if you want, and it works fine, but you can also now be much more precise and it works even better.

throwaway5959 · 2024-04-30T04:34:29

Because frankly Google and the open web are so bogged down by ads that they’ve become unusable. Try visiting any news site or a site like Fandom. Some sites I can’t even read because the ads take up the whole page on mobile and I can’t close them. Ads just totally ruin it. Meanwhile, with ChatGPT, I can ask it questions and it just gives me an answer. It’s not perfect, but neither are search results.

cjk2 · 2024-04-30T04:36:57

Yes the chat interface is terrible. I actually had an LLM try and gaslight me into changing the subject a couple of weeks back. I’m not sure how I managed to get down that hole but it was frustrating. And eventually the information was wrong.

If I wanted that experience I’d talk to a human.