
Our age gap is less than 10 years but here's my two cents: laziness/convenience. Back in the 90s and 2000s, you had to be ready to spend a lot of time fiddling with setups and maintenance, as well as some MAJOR early-days security flaws (think the IRC days). Corporate-owned ecosystems solved that problem: you log in and forget about it. They won with what some people call user experience. The lower the entry barrier, the quicker something picks up. Back when I was in school I was the biggest Nokia fanboi, and even then I acknowledged that downloading a shady jar file and installing it on my phone was iffy. At a later stage, when I was a bit older and could afford it, I got my first Android phone, and the existence of a marketplace was a breath of fresh air. The problem is that many people (annoyingly, even now) fail to realize or admit that those types of centralization put handcuffs on your wrists the moment you say "OK, that works for me". Whether that's social logins, cloud providers, services or anything else - it's all the same. For example, if OpenAI decided today to close off their APIs for good, I reckon tens if not hundreds of thousands of "AI" startups would collapse immediately, since they fully rely on OpenAI's APIs. Same with AWS, GCP, Azure or any other provider. And as we see with the current fiascos with Twitter, TikTok and Bambu Lab, just to name a few from the past two days, it is abundantly clear that people are in dire need of backups. As much as I used to find Google Drive and Docs convenient, I've personally moved away and self-host everything now. The only thing I rely on (and only as a backup plan to access my home network) is a VPN I host over at Hetzner. But again - this is my backup.

Whether the corporations saw that as an opportunity at an early stage or were simply at the right place at the right time, I can't say. I'm leaning more towards the latter, since I've worked at corporations and success in those environments is most commonly a moderately-educated gamble.


Yeah, I think it's clear that laziness/convenience is the answer.

You're absolutely right about people needing backups -- but of course self-hosting is too big a hurdle to expect most folks to clear.

I wonder what can be done to make the "better" options easier. Can this even be done by the private sector alone given the incentives of capitalism? I'm unsure.

Given how many things we've seen happen in the social media space back-to-back (Elon taking over Twitter, Meta pandering to the new US governing party, TikTok's ban), I can't imagine these events will slow down. That at least fills me with hope that more people will wonder "does it have to be this way?" ...obviously that won't be enough for true mass adoption, but it's a start


I think there are two aspects to this:

* The software: different open source solutions have very different requirements at a high level: language, platform or even system requirements. Say you want to take messaging off centralized platforms: you need to host something like Matrix, which is very well made and polished but takes a lot of resources to run. Alternatively, you could use Jabber, which scales like no other but is absolute hell to set up and maintain. The same can be said about music, videos, movies and all other things.

* Operations: probably simple if you ask someone on HN, but you still need to understand networking, operating systems and file systems. I started using Linux when I was 11, back in the distant year 2000, and even now I'm not very enthusiastic if I have to make changes to my ZFS setup. You also need to consider backups, security and resources. Say you wanna run OpenStreetMap (which we recently started doing at work). Awesome, but that requires an ungodly amount of fiddling, in addition to an astonishing amount of time just to unpack the data, even on enterprise hardware.

If you are in the tech world, https://github.com/awesome-selfhosted/awesome-selfhosted is a great place to start. But if you want to make it simpler... Idk... A lot of people would need to put in a lot of effort, as in build a Linux distro around this idea, along with "recommended hardware", one-click installs (a very dumbed-down equivalent of Portainer), and some backup and alerting mechanisms built into the system. It's a tough question and frankly I don't have the answer.


Seems I spoke too soon about the US making a good decision for once when it comes to cyber and civil security. Well... I wonder what muskov will come up with, now that Twitter is still largely inaccessible in China but TikTok is welcome in the US.

TikTok is still blocked in China.

Douyin is the Chinese TikTok equivalent. China isn't opposed to the concept of short form video, they just want to segregate Chinese users into their own app

Douyin also doesn't allow nearly as much brainrot as you see on TikTok, and definitely doesn't allow anything that challenges the CCP.

I heard they opened Douyin registration recently

Yes, but it's irrelevant - Apple and Google refuse to allow Douyin on the iOS and Google Play stores, as of writing (and I do not see this changing)

Sounds like it's a good time to break app store duopoly

Historically the US has always been behind the curve on personal online privacy and security threats compared to Europe (and it's not like we are doing a good job at all, Europe is pretty shit as well), but here, today, I am utterly jealous of the US. Let's hope the EU commission gets their shit together and follows soon.

No, it has not, and it will not in the foreseeable future. This is one of my responsibilities at work. LLMs are not feasible when you have a dataset of 10 million items that you need to classify relatively fast and at a reasonable cost. LLMs are great at mid-level complexity tasks given a reasonable volume of data - they can take away the tedious job of figuring out what you are looking at, or even come up with some basic mapping. But anything at large volumes? Nah. Real life example: "is '20 bottles of ferric chloride' a service or a product?"

One prompt? Fair. 10? Still ok. 100? You're pushing it. 10M - get help.


You are not pushing it at 100. I can classify "is '20 bottles of ferric chloride' a service or a product?" in probably 2 seconds with a 4090. Something most people don't realize is that you can run multiple inferences in parallel. So with something like a 4090 and some solid few-shot prompts, instead of having it classify one example at a time, you can do 5 per request. You can probably run 100 parallel inferences at 5 at a time, for a rate of about 250 a second on a 4090. So in 11 hours I'll be done. I'm going with a 7-8B model too. Some of the 1.5-3B models are great and will run even faster. Take a competent developer who knows Python and how to use an OpenAI-compatible API, and they can put this together in 10-15 minutes, with no data science/scikit-learn or other NLP toolchain experience.
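
To make that concrete, here's a minimal sketch assuming a local OpenAI-compatible server (e.g. vLLM on localhost:8000) serving a small instruct model; the endpoint, model name and batch sizes are placeholders rather than a tested setup:

  import asyncio
  from openai import AsyncOpenAI

  # Placeholder endpoint for a local OpenAI-compatible server (e.g. vLLM).
  client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="unused")

  async def classify(batch):
      # Several items per request: ask for one label per line.
      resp = await client.chat.completions.create(
          model="some-7b-instruct",  # placeholder model name
          messages=[
              {"role": "system", "content": "For each line, answer 'product' or 'service', one answer per line."},
              {"role": "user", "content": "\n".join(batch)},
          ],
      )
      return resp.choices[0].message.content.splitlines()

  async def classify_all(items, per_request=5, concurrency=100):
      sem = asyncio.Semaphore(concurrency)  # cap in-flight requests
      async def run(batch):
          async with sem:
              return await classify(batch)
      batches = [items[i:i + per_request] for i in range(0, len(items), per_request)]
      return await asyncio.gather(*(run(b) for b in batches))

  print(asyncio.run(classify_all(["20 bottles of ferric chloride", "alarm installation"])))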

So for personal, medium or even large workloads, I think it has killed it. The workload needs to be extremely large before LLMs stop making sense. If you are classifying or segmenting comments on a social media platform where you need to deal with billions a day, then an LLM would be a very inefficient approach, but for 90+% of use cases, I think it wins.

I'm assuming you are going to run it locally because everyone is paranoid about their data. It's even cheaper if you use a cloud API.


Or you can build a DistilBERT model and get your egregiously inefficient 2 seconds down to tens of milliseconds.
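
For reference, that route could look roughly like this, assuming you've already fine-tuned a DistilBERT checkpoint on labeled product/service examples (the model path is a placeholder):

  from transformers import pipeline

  # Placeholder path to a DistilBERT checkpoint fine-tuned on product/service labels.
  clf = pipeline("text-classification", model="./distilbert-product-service", device=0)

  # Batched inference: short strings classify in milliseconds each on a modest GPU.
  items = ["20 bottles of ferric chloride", "alarm installation in two locations"]
  print(clf(items, batch_size=256))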

If you have to classify user input as they're entering it to provide a response (so it can't be batched), 2 seconds could potentially be really slow.

Though LLMs sure have made creating training data to train old school models for those cases a lot easier.


Yeah, that's what I do: use an LLM to help make training data for small models. It's so much more efficient, fast, and, ergo, scalable.

two seconds is a VERY VERY VERY long time. That is mind-bogglingly, insanely slow.

Yes and no. Having used these tools extensively, I think it will be some time before LLMs are truly performant. Even smaller models can't be compared to running optimized code with efficient data structures. And smaller models (in general) do reduce the quality of your results in most cases. Maybe LLMs will kill off NLP and other pursuits pretty soon, but at the moment, each has its tradeoffs.

Correct me if I'm wrong, but if you run multiple inferences at the same time on the same GPU, you will need to load multiple models into VRAM and the models will fight for resources, right? So running 10 parallel inferences will slow everything down 5 times, right? Or am I missing something?

Inference for a single example is memory-bound. By doing batch inference, you can interleave computation with memory loads without losing much speed (up until you cross the compute-bound threshold).

You will most likely be using the same model, so there's just one copy to load into VRAM.

No, the key is to use the full context window so you structure the prompt as something like: For each line below, repeat the line, add a comma then output whether it most closely represents a product or service:

20 bottles of ferric chloride

salesforce

...
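
In code form, assembling that prompt is nearly a one-liner (the items and wording are just the example above):

  items = ["20 bottles of ferric chloride", "salesforce"]
  prompt = ("For each line below, repeat the line, add a comma, then output whether it "
            "most closely represents a product or service:\n\n" + "\n".join(items))
  print(prompt)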


Appreciate the concrete advice in this response. Thank you.

At 2s per query for 10M entries, that's about 231 days to run through the database.

FFS... "Lots of writers, few readers". Read again and do the math: 2 seconds, multiplied by 10 million records which contain this, as well as "alarm installation in two locations" and a whole bunch of other crap with little to no repetition (<2%), and where does that get you? 2 * 10,000,000 = 20,000,000 SECONDS!!!! A day has 86,400 seconds (24 * 3600 = 86,400). The data pipeline needs to finish in <24 hours. Everyone needs to get this into their heads somehow: LLMs are not a silver bullet. They will not cure cancer anytime soon, nor will they be effective or cheap enough to run at massive scale. And I don't mean cheap as in "oh, just get an OpenAI subscription hurr durr". Throwing money mindlessly at something is never an effective way to solve a problem.

Assuming the 10M records is ~2000M input tokens + 200M output tokens, this would cost $300 to classify using llama-3.3-70b[1]. If using llama lets you do this in say one day instead of two days for a traditional NLP pipeline, it's worthwhile.

[1]: https://openrouter.ai/meta-llama/llama-3.3-70b-instruct


> ...two days for a traditional NLP pipeline

Why 2 days? Machine Learning took over the NLP space 10-15 years ago, so the comparison is between small, performant task-specific models versus LLMs. There is no reason to believe the "traditional" NLP pipelines are inherently slower than Large Language Models, and they aren't.


My claim is not that it would take two days for such a pipeline to run, but that it would take two days to make an NLP pipeline, whereas an LLM pipeline would be faster to make.

Why are you using 2 seconds? The commenter you are responding to hypothesized being able to do 250/s based on "100 parallel inferences at 5 at a time". Not speaking to the validity of that, but I find it strange that you ran with the 2 seconds number after seemingly having stopped reading at that line, while yourself lamenting that people don't read and telling them to "read again".

Ok, let me dumb it down for you: you have a cockroach in your bathroom and you want to kill it. You have an RPG and you have a slipper. Are you gonna use the RPG or are you going to use the slipper? Even if your bathroom is somehow fine after getting shot with an RPG, isn't that overkill? If you can code and train a binary classifier in 2 hours that uses nearly zero resources and gives you good enough results (in my case, way above my targets), why use a ton of resources, libraries, RAGs, hardware and, hell, even electricity? I mean, how hard is this to comprehend, really?

https://deviq.com/antipatterns/shiny-toy
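
For reference, the slipper-sized version can be as small as this sketch, assuming you have a few thousand labeled rows (the two training examples below are stand-ins):

  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.linear_model import LogisticRegression
  from sklearn.pipeline import make_pipeline

  # Stand-in training data; in practice you'd have a few thousand labeled rows.
  texts = ["20 bottles of ferric chloride", "alarm installation in two locations"]
  labels = ["product", "service"]

  # Character n-grams cope reasonably well with messy, abbreviation-heavy line items.
  clf = make_pipeline(
      TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
      LogisticRegression(max_iter=1000),
  )
  clf.fit(texts, labels)

  # Inference is CPU-cheap: millions of rows fit comfortably in a daily pipeline.
  print(clf.predict(["installation of security cameras"]))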


This thread is chock full of people who have no clue about what traditional AI even is. I'm sorry you have to deal with literal children

Sure, but this doesn't answer my question nor tie into your last comment at all. It's Saturday evening in much of the world, are you sober?

OP said 2 seconds as if that wasn't an eternity...

But then they said 250/second when running multiple inferences? Again, I don't know if their assertions about running multiple inferences are correct, but why focus on the wrong number instead of addressing the actual claim?

250/s is still nothing compared to an actual NLP pipeline that takes a few ms per item, because you can parallelize that too.

I know it's hard to understand, but you can achieve a throughput that is a few orders of magnitude higher.


I think your intuition on this might be lagging a fair bit behind the current state of LLMs.

System message: answer with just "service" or "product"

User message (variable): 20 bottles of ferric chloride

Response: product

Model: OpenAI GPT-4o-mini

$0.075/1Mt batch input * 27 input tokens * 10M jobs = $20.25

$0.300/1Mt batch output * 1 output token * 10M jobs = $3.00

It's a sub-$25 job.

You'd need to be doing 20 times that volume every single day to even start to justify hiring an NLP engineer instead.
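
If it helps make this concrete, the per-item request is about as simple as API calls get; a sketch with the openai Python client (the prices quoted above assume the Batch API, but the prompt shape is the same):

  from openai import OpenAI

  client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

  resp = client.chat.completions.create(
      model="gpt-4o-mini",
      messages=[
          {"role": "system", "content": 'answer with just "service" or "product"'},
          {"role": "user", "content": "20 bottles of ferric chloride"},
      ],
      max_tokens=1,  # one output token, per the cost math above
  )
  print(resp.choices[0].message.content)  # -> product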


You might be able to use an even cheaper model. Google Gemini 1.5 Flash 8B is Input: $0.04 / Output: $0.15 per 1M tokens.

17 input tokens and 2 output tokens * 10 million jobs = 170,000,000 input tokens, 20,000,000 output tokens... which costs a total of $6.38 https://tools.simonwillison.net/llm-prices

As for rate limits, https://ai.google.dev/pricing#1_5flash-8B says 4,000 requests per minute and 4 million tokens per minute - so you could run those 10 million jobs in about 2500 minutes or 42 hours. I imagine you could pull a trick like sending 10 items in a single prompt to help speed that up, but you'd have to test carefully to check the accuracy effects of doing that.


The question is not average cost but marginal cost of quality - same as voice recognition, which had relatively low uptake even at ~2-4% error rates due to context switching costs for error correction.

So you'd have to account for the work of catching the residue of 2-8%+ error from LLMs. I believe the premise is that for NLP, that's just incremental work, but for LLMs that could be impossible to correct (i.e., cost per next-percentage-correction explodes), for lack of easily controllable (or even understandable) models.

But it's most rational in business to focus on the easy majority with lower costs, and ignore hard parts that don't lead to dramatically larger TAM.


I am absolutely not an expert in NLP, but I wouldn't be surprised if, for many kinds of problems, LLMs had a far lower error rate than any traditional NLP software.

Like, lemmatization is pretty damn dumb in traditional NLP, while a better LLM model will be orders of magnitude more correct.


This assumes you don’t care about our rapidly depleting carbon budget.

No matter how much energy you save personally, running your jobs on Sam A's earth-killing ten-thousand-GPU cluster is literally against your own self-interest of delaying climate disasters.

LLMs have huge negative externalities; there is a moral argument to only use them when other tools won't work.


It's digging fossil carbon out of the ground that's the problem, not using electricity. Switch to electricity not from fossil carbon and you're golden.

Haha, this is pretty good. I’m going to take a plane to SF while I laugh at this.

How do you validate these classifications?

The same way you validate it if you didn't use an LLM.

Isn't validating easier and cheaper than classifying (which requires expensive engineers)? I mean, the skill is not as expensive - many companies do this at scale.

The same way you check performance for any problem like this: by creating one or more manually-labeled test datasets, randomly sampled from the target data and looking at the resulting precision, recall, f-scores etc. LLMs change pretty much nothing about evaluation for most NLP tasks.
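
For illustration, with scikit-learn that evaluation is a few lines (the gold/pred values here are stand-ins):

  from sklearn.metrics import classification_report

  # Hand-labeled random sample (gold) vs. the classifier's output (pred); stand-in data.
  gold = ["product", "service", "product", "product", "service"]
  pred = ["product", "service", "service", "product", "service"]

  # Per-class precision, recall and F1, plus overall accuracy; works the same
  # whether the predictions came from an LLM or a traditional classifier.
  print(classification_report(gold, pred))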

You need a domain expert either way. I mentioned in another reply that one of my niches is implementing call centers with Amazon Connect and Amazon Lex (the NLP engine).

https://news.ycombinator.com/item?id=42748189

I don't know beforehand the domain they are working in; I do validation testing with them.


Yeah... Let's talk time needed for 10M prompts and how that fits into a daily pipeline. Enlighten us, please.

Run them all in parallel with a cloud function in less than a minute?

Obviously all the LLM API providers have a rate limit. Not a fan of GP's sarcastic tone, but I suppose many of us would like to know roughly what that limit would be for a small business using such APIs.

The rate limits for Gemini 1.5 Flash are 2000 requests per minute and 4 million tokens per minute. Higher limits are available on request.

https://ai.google.dev/pricing#1_5flash

4o-mini's rate limits scale based on your account history, from 500RPM/200,000TPM to 30,000RPM/150,000,000TPM.

https://platform.openai.com/docs/guides/rate-limits


Surprisingly, DeepSeek doesn't have a rate limit: https://api-docs.deepseek.com/quick_start/rate_limit

I've heard from people running 100+ prompts in parallel against it.


Yes, how did I not think of throwing more money at cloud providers on top of feeding OpenAI, when I could have just coded a simple binary classifier and run everything on something as insignificant as an 8th-gen, quad-core i5...

Did I mention OpenAI?

Ah, my bad, someone further up the thread did.

Really, it boils down to the balance of time and cost, and the skill set of the person getting the job done.

But you seem really anti-establishment (hung up over a $25 cloud spend), so you do you.

Just don't expect everyone else to agree with you.


Also can’t you just combine multiple classification requests into a single prompt?

Yes, for such a simple labelling task request rate limits are more likely the bottleneck than token rate limits.

>You'd need to be doing 20 times that volume every single day to even start to justify hiring an NLP engineer instead.

How much for the “prompt engineer”? Who is going to be doing the work and validating the output?


You do not need a prompt engineer to create: “answer with just "service" or "product"”

Most classification prompts can be extremely easy and intuitive. The idea that you have to hire a completely different prompt engineer is kind of funny. In fact, you might be able to get the LLM itself to help revise the prompt.


All software engineers are (or can be) prompt engineers, at least to the level of trivial jobs like this. It's just an API call and a one-liner instruction. Odds are very good at most companies that they have someone on staff who can knock this out in short order. No specialized hiring required.

> ..and validating the output?

You glossed over the meat of the question.


Your validation approach doesn't really change based on the classification method (LLM vs NLP).

At that volume you're going to use automated tests with known correct answers + random sampling for human validation.


Prompt engineering is less and less of an issue the simpler the job is and the more powerful the model is. You also don't need someone with deep NLP knowledge to measure and understand the output.

>less and less of an issue the simpler the job

Correct, everything is easy and simple if you make it simple and easy…


Plenty of simple jobs required people with deeper knowledge of AI in the past, now for many tasks in businesses you can skip over a lot of that and use a llm.

Simple things were not always easy. Many of them are, now.


That’s the argument the article makes but the reasoning is a little questionable on a few fronts:

- It uses f16 for the data format whereas quantization can reduce the memory burden without a meaningful drop in accuracy, especially as compared with traditional NLP techniques.

- The quality of LLMs typically outperforms OpenCV + NER.

- You can choose to replace just part of the pipeline instead of using the LLM for everything (e.g. using text-only 3B or 1B models to replace the NER model while keeping OpenCV)

- The LLM compute cost per unit of quality per watt is constantly decreasing. Meaning even if it's too expensive today, the system you've spent time building, tuning and maintaining is quickly becoming obsolete.

- Talking with new grads in NLP programs, all the focus is basically on LLMs.

- The capability + quality out of models / size of model keeps increasing. That means your existing RAM & performance budget keeps absorbing problems that seemed previously out of reach

Now of course traditional techniques are valuable because they can be an important tool in bringing down costs (fixed function accelerator vs general purpose compute), but it’s going to become more niche and specialized with most tasks transitioning to LLMs I think.

The “bitter lesson” paper is really relevant to these kinds of discussions.


Not an independent player, so obviously important to be critical of papers like this [1], but it's claiming a ~10x drop in LLM inference cost every year. This lines up with the technical papers I'm seeing that continually improve performance, plus the related HW improvements.

That's obviously not sustainable indefinitely, but these kinds of exponentials are precisely why people often make incorrect conclusions about how long change will take to happen. Just a reminder: CPUs doubled in performance every 18 months, and for 20 years that kept upending software companies who weren't in tune with this cycle (i.e. who focused on performance instead of features). For example, even if you're spending $10k/month for the LLM vs $100/month to process the 10M items, it can still be more beneficial to go the LLM route, since you can buy cheaper expertise to put together your LLM pipeline than for the NLP route, making up the ~$100k/year difference (assuming the performance otherwise works, and that's before the improved quality and robustness of the LLM solution provides extra revenue to offset).

[1] https://a16z.com/llmflation-llm-inference-cost/


What NLP approaches are you using to solve the "is '20 bottles of ferric chloride' a service or a product?" problem?

How about a naive Bayesian Bag of Words? Just find/scrape/generate with an LLM a large enough corpus of products/services, build the term-frequency matrix, calculate the class priors and P(term|class), and run inference with a straightforward application of Bayes' theorem.

This particular problem, at least to me, seems trivial, and to use an LLM for anything like this for more than a hundred cases seems incredibly wasteful.
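
A minimal sketch of that with scikit-learn, where the four training rows stand in for the scraped/LLM-generated corpus:

  from sklearn.feature_extraction.text import CountVectorizer
  from sklearn.naive_bayes import MultinomialNB
  from sklearn.pipeline import make_pipeline

  # Stand-in corpus; in practice, a few thousand scraped or LLM-generated examples per class.
  texts = ["20 bottles of ferric chloride", "annual boiler maintenance",
           "salesforce subscription", "pack of 100 syringes"]
  labels = ["product", "service", "service", "product"]

  # CountVectorizer builds the term-frequency matrix; MultinomialNB handles
  # the class priors, P(term|class) and the Bayes-theorem inference.
  clf = make_pipeline(CountVectorizer(), MultinomialNB())
  clf.fit(texts, labels)

  print(clf.predict(["alarm installation in two locations"]))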


That’s sort of like asking a horse and buggy driver whether automobiles are going to put them out of business.

I think for the most part, casual NLP is dead because of LLMs. And LLM costs are going to plummet soon, so the large-scale NLP you're talking about is probably dead within 5 years or less. The fact that you can replace programmers with prompts is huge, in my opinion: no one needs to learn an NLP API anymore, just stuff it into a prompt. Once the cost of running LLMs decreases to meet the cost of programmers, it's game over.


> LLM costs

Inference costs, not training costs.

> The fact that you can replace programmers

You can’t… not for any real project. For quick mockups they’re serviceable

> That’s sort of like asking a horse and buggy driver whether automobiles

Kind of an insult to OP, no? Horse and buggy drivers were not highly educated experts in their field.

Maybe take the word of domain experts rather than AI company marketing teams.


> Maybe take the word of domain experts rather than AI company marketing teams.

Appeal to authority is a well known logical fallacy.

I know how dead NLP is personally because I've never been able to get NLP working, but once ChatGPT came around, I was able to classify texts extremely easily. It's transformational.

I was able to get ChatGPT to classify posts based on how political they were on a scale of 1 to 10 and which political leaning they had, and then classify the person's likely political affiliations.

All of this without needing to learn any APIs or anything about NLPs. Sorry, but given my experience, NLPs are dead in the water right now, except in terms of cost. And cost will go down exponentially, as costs always do. Right now I'm waiting for the RTX 5090 so I can just do it myself with an open source LLM.


> NLPs are dead in the water right now, except in terms of cost.

False.

With all due respect, the fact that you're referring to natural language parsing as "NLPs" makes me question whether you have any experience or modest knowledge around this topic, so it's rather bold of you to make such sweeping generalizations.

It works for your use case because you're just one person running it on your home computer with consumer hardware. Some of us have to run NLP related processing (POS taggers, keyword extraction, etc) in a professional environment at tremendous scale, and reaching for an LLM would absolutely kill our performance.


My understanding is that inference models can absolutely scale down, we are only at the beginning of them getting minimized, and they are trivial to parallelize. That's not a good combo to bet against: their price/performance/efficiency will quickly drop/grow/grow.

“I couldn’t be bothered learning something, and now I don’t have to! Checkmate!”

While LLM’s can have their uses, let’s not get carried away.


That’s true. I did avoid learning traditional NLP techniques because for my use case - call centers - LLMs do a much better job.

Context for the problem space:

https://dl.acm.org/doi/fullHtml/10.1145/3442381.3449870


Performance and cost are trade-offs though. You could just as well say that LLMs are dead in the water, except in terms of performance.

It does seem likely we’ll soon have cheap enough LLM inference to displace traditional NLP entirely, although not quite yet.


> Appeal to authority is a well known logical fallacy.

I did not make an appeal to authority. I made an appeal to expertise.

It’s why you’d trust a doctor’s medical opinion over a child’s.

I'm not saying "listen to this guy because they're captain of NLP". I'm saying listen because experts have spent years of hands on experience with things like getting NLP working at all.

> I know how dead NLP is personally because I’ve never been able to get NLP working

So you’re not an expert in the field. Barely know anything about it, but you’re okay hand waving away expertise bc you got a toy NLP Demo working…

That’s great, dude.

> I was able to get ChatGPT to classify posts based on how political they were on a scale of 1 to 10

And I know you didn’t compare the results against classic NLP to see if there was any improvements because you don’t know how…


> I did not make an appeal to authority. I made an appeal to expertise.

Lol

> I’m saying listen because experts have spent years of hands on experience with things like getting NLP working at all.

“It is difficult to get a man to understand something, when his salary depends on his not understanding it.”

Upton Sinclair

> Barely know anything about it, but you’re okay hand waving away expertise bc you got a toy NLP Demo working…

Yes that’s my point. I don’t know anything about implementing an NLP but got something that works pretty well using an LLM extremely quickly and easily.

> And I know you didn’t compare the results against classic NLP to see if there was any improvements because you don’t know NLP…

Do you cross reference all your Google searches to make sure they are giving you the best results vs Bing and DDG?

Do you cross reference the results from your NLP with LLMs to see if there were any improvements?


> Lol

Great argument

> “It is difficult to get a man to understand something, when his salary depends on his not understanding it.”

NLP professionals are also LLM professionals. LLMs are tools in an NLP toolkit. LLMs don't make the NLP professional obsolete the way they make handwritten spam obsolete.

I was going to explain this further but you literally wouldn’t understand.

> Do you cross reference all your Google searches to make sure they are giving you the best results vs Bing and DDG?

…Yes I do…

That’s why I cancelled my kagi subscription. It was just as good as DDG.

> Do you cross reference the results from your NLP with LLMs to see if there were any improvements?

Yes I do… because I want to use the best tool for the job. Not just the first one I was able to get working…


I haven’t understood these types of uses. How do you validate the score that the LLM gives?

The same way you validate scores given by NLPs I assume. You run various tests and look at the results and see if they match what you would expect.

> Inference costs, not training costs.

Why does training cost matter if you have a general intelligence that can do the task for you, that’s getting cheaper to run the task on?

> for quick mockups they’re serviceable

I know multiple startups that use LLMs as their core bread-and-butter intelligence platform instead of tuned but traditional NLP models

> take the word of domain experts

I guess? I wouldn’t call myself an expert by any means but I’ve been working on NLP problems for about 5 years. Most people I know in NLP-adjacent fields have converged around LLMs being good for most (but obviously not all) problems.

> kind of an insult

Depends on whether you think OP intended to offend, ig


> I know multiple startups that use LLMs as their core bread-and-butter intelligence platform instead of tuned but traditional NLP models

It seems like LLMs would be perfect for start-ups that are iterating quickly. As the business, problem, and data mature though I would expect those LLMs to be consolidated into simpler models. This makes sense from a cost and reliability perspective. I wonder also about the impact of making your core IP a set of prompts beholden to the behavior of someone else’s model.


> Why does training cost matter if you have a general intelligence that can do the task for you, that’s getting cheaper to run the task on?

Assuming we didn’t need to train it ever again, it wouldn’t. But we don’t have that, so…

> I know multiple startups that use LLMs as their core bread-and-butter intelligence platform instead of tuned but traditional NLP models

Okay? Did that system write itself entirely? Did it replace the programmers that actually made it?

If so, they should pivot into a Devin competitor.

> Most people I know in NLP-adjacent fields have converged around LLMs being good for most (but obviously not all) problems.

Yeah, LLMs are quite good at common NLP tasks, but AFAIK are not SOTA at any specific task.

Either way, LLMs obviously don’t kill the need for the NLP field.


The reply didn't say that the expert is uneducated, just that their tool is obsolete. Better to look at facts the way they are; sugar-coating doesn't serve anyone.

> The fact that you can replace programmers with prompts

No, you can't. The only thing LLMs replace is internet commentators.


As I explained below, I avoided having to learn anything about ML, PyTorch or any other APIs when trying to classify posts based on how political they were and which affiliation they leaned towards. That was holding me back, and it was easily replaced by an LLM and a prompt. It literally took me minutes instead of what would have taken days or weeks, and the results are more than good enough.

> what would have taken days or weeks

Nah, searching Stackoverflow and Github doesn't take "weeks".

That said, due to how utterly broken internet search is nowadays, using an LLM as a search engine proxy is viable.


GPT 3.5 is more accurate at classifying tweets as liberal than it is at identifying posts that are conservative.

If you're going for rough approximation, LLMs are great, and good enough. More care and conventional ML methods are appropriate as the stakes increase though.


GPT 3.5 has been very, very obsolete in terms of price-per-performance for over a year. Bit of a straw man.

No you can’t; LLMs are dog shit at internet banter, too neutered

>The fact that you can replace programmers with prompts

This is how you end up with thousands of lines of slop that you have no idea how they function.


While I agree with both you and the article, I also think it'll depend on more than just the volume of your data. We have quite a lot of documents that we classify. It's around 10-100k a month, some rather large, others simple invoices. We used to have a couple of AI specialists who handled the classification with local NLP models, but when they left we had to find alternatives. For us this was the AI services in the cloud we use, and the result has been a document warehouse which is both easier for the business to manage and a "pipeline" which is much cheaper than having those AI specialists on the payroll.

I imagine this wouldn't be the case if we were to do more classification projects, but we aren't. We did try to find replacements first, but it was impossible for us to attract any talent, which isn't too much of a surprise considering it's mainly maintenance. Using external consultants for that maintenance proved to be almost more expensive than having two full time employees.


I suspect any solution like that will be wholesale thrown away in a year or two. Unless the damn thing is going to make money in the next 2-3 years, we are all mostly going to write throwaway code.

Everything is such an opportunity cost nowadays. It's like trying to capture value out of a transient amorphous cloud: you can't hold any of it in your hand, but the phenomenon is clearly occurring.


Can you talk about the main non-LLM NLP tools you use? e.g. BERT models?

> One prompt? Fair. 10? Still ok. 100? You're pushing it. 10M - get help.

Assuming you could do 10M+ LLM calls for this task at trivial cost and time, would you do it? i.e. is the only thing keeping you away from LLM the fact they're currently too cumbersome to use?


Why not just run a local LLM for practically free? You can even trivially parallelize it with multiple instances.

I would believe that many NLP problems can be easily solved even by smaller LLM models.


I see LLMs best used as part of a more traditional NLP pipeline.

For example, an approach that does me well is clustering then using LLMs on representative docs. Tools like bertopic are great for this.

I also don't see a clear cut difference between the two in certain areas. Embeddings are critical in LLM pipelines, but for me anyway, also "old school" tools.

I think NLP as described in the article is certainly under threat, but the tools and approaches complement LLM use well, are far more efficient, and distinguish the pros from the neophytes.

If you're using LLMs for NLP-type tasks, but don't know the NLP tools, you're missing out.



10M items @ 10 tokens each ("20 bottles of ferric chloride" etc) plus 10M tokens out (category) is 100M tokens in 10M tokens out.

Claude Haiku is $0.25 per 1M tokens in, $1.05 per 1M out, so cost would be ~$35.

GPT-4o mini is even cheaper at $0.15 per 1M in.

Of course if your volume justifies the hardware cost you could always run Llama locally, for the cost of the electricity used.


So what would you use to classify whether a document is a critique or something else in 1M documents in a non-English language?

This is a real problem I am dealing with at a library project.

Each document is between 100 to 10k tokens.

Most top (read: most expensive) LLMs available in OpenRouter work great; it's the cost (and speed) that is the issue.

If I could come up with something locally runnable that would be fantastic.

Presumably BERT based classifiers would work if I had one properly trained for the language.



You can use embeddings to build classification models using various methods. Not sure what qualifies as "get help" level of cost/throughput, but certainly most providers offer large embedding APIs at much lower cost/higher throughput than their completion APIs.

For context, 10M would cost ~$27.

Say Gemini Flash 8B, allowing ~28 tokens for prompt input at $0.075/1M tokens, plus 2 output tokens at $0.30/1M. Works out to $0.0000027 per classification. Or in other words, for 1 penny you could do this classification ~3,700 times.
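
As a sketch of the embedding route mentioned above, using a small local model as a stand-in for a provider's embedding API:

  from sentence_transformers import SentenceTransformer
  from sklearn.linear_model import LogisticRegression

  # Small local embedding model as a stand-in for a hosted embedding API.
  model = SentenceTransformer("all-MiniLM-L6-v2")

  texts = ["20 bottles of ferric chloride", "annual boiler maintenance"]  # stand-in data
  labels = ["product", "service"]

  # Embed once, then train a cheap classifier on top of the vectors.
  X = model.encode(texts)
  clf = LogisticRegression(max_iter=1000).fit(X, labels)

  print(clf.predict(model.encode(["salesforce subscription"])))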


Prompt caching would lower the cost, and later, similar tech will lower inference costs too. You have fewer than 25 tokens; that's between $1 and $5.

There may be some use case, but I'm not convinced by the one you gave.


So there's a bit of an issue with prompt caching implementations: for both OpenAI API and Claude's API, you need a minimum of 1024 tokens to build the cache for whatever reason. For simple problems, that can be hard to hit and may require padding the system prompt a bit.

> LLMs are not feasible when you have a dataset of 10 million items that you need to classify relatively fast and at a reasonable cost.

What? That's simply not true.

Current embedding models are incredibly fast and cheap and will, in the vast majority of NLP tasks, get you far better results than any local set of features you can develop yourself.

I've also done this at work numerous times, and have been working on various NLP tasks for over a decade now. For all future traditional NLP tasks, the first pass is going to be to fetch LLM embeddings and stick a fairly simple classification model on top.

> One prompt? Fair. 10? Still ok. 100? You're pushing it. 10M - get help.

"Prompting" is not how you use LLMs for classification tasks. Sure you can build 0-shot classifiers for some tricky tasks, but if you're doing classification for documents today and you're not starting with an embedding model you're missing some easy gains.


Embedding models are not LLMs in the sense that the term is being used in the title of this post. They are “traditional NLP.”

Can you recommend a way to classify a small number of objects? Local only and Python preferably.

So TLDR: You agree with the author, but not for the same reasons?

> they seem to be doing everything right in the last few years

About that... Not like there isn't a lot to be desired from the Linux drivers: I'm running a K80 and M40 in a workstation at home, and the thought of ever having to touch the drivers, now that the system is operational, terrifies me. It is by far the biggest "don't fix it if it ain't broke" thing in my life.


Use a filesystem that snapshots AND do a complete backup.

Buy a second system which you can touch?

That IS the second system (my AI home rig). I've given up on Nvidia for my main computer because of their horrid drivers. I switched to Intel Arc about a month ago and I love it. The only downside is that I have a Xeon in my main computer and Intel never really bothered to make Arc compatible with Xeons, so I had to hack around in my BIOS, hoping I didn't mess everything up. Luckily for me, it all went well, so now I'm probably one of a dozen or so people worldwide running Xeon + Arc on Linux. That said, the fact that I don't have to deal with Nvidia's wretched Linux drivers does bring a smile to my face.

To be honest, it is still a problem to this day in some areas. A few years back I was making a gift which involved a ton of electronics and embedded hardware. One of the components I had laying around was a real-time clock which refused to go beyond 31.12.1999, no matter what I did. Turned out that (in 2021, iirc) there were still RTC modules on the market that had this problem. But it did not affect me all that much, since the date had to be displayed on screen, so I did the hacky-patchy thing: trim the first two bytes from the year string and prepend "20". I bet there are tons of software and hardware out there that use dirty hacks like that to "fix it".
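
The hack, roughly, as a sketch; the DD.MM.YYYY format matches the date above, but the exact wrap-around behavior of the module is my assumption:

  def fix_rtc_year(date_str: str) -> str:
      # The RTC can't go past 1999, so e.g. 2021 comes out as "31.12.1921":
      # drop the century digits of the year and hard-code "20" in front.
      day, month, year = date_str.split(".")
      return f"{day}.{month}.20{year[2:]}"

  print(fix_rtc_year("31.12.1921"))  # -> 31.12.2021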

It's lucky that leap years are every 4 years, and 4 factors as a whole integer into 100. Otherwise that wouldn't have worked?

But they don't happen every 100 years, except every 400 years, so the year 2000 is a special case which caused a bunch of issues: https://en.wikipedia.org/wiki/Year_2000_problem#After_Januar...

I'm really glad to see Geerling branching out of his initial niche coverage of everything-Raspberry-Pi (which I personally find a bit boring 95% of the time).


Most likely an effort to boost the DAU numbers. I quit Facebook over a decade ago because I truly felt that it was pointless. At the time I was convincing myself that it was a way to stay in touch with a certain number of people I would otherwise have no way of contacting. Then a friend said something that changed my mind: "If someone is not actively a part of your life, chances are there's a good reason they are not". And he was right: I deleted it, knowing full well that I'd have no other way to connect to hundreds of people. Over a decade later, I haven't had a reason to try to contact any of them. At this point I don't even know what Facebook looks like, but all this AI-generated crap is just as pointless as the ads that would get shoved down my throat if I didn't have a very aggressive ad blocker. As much as I was strongly against ad blockers 10 years ago, since many sites and blogs relied on ads alone to get some reward for their effort, we are at a point where the internet is unusable without one. All major platforms are flooded with AI-generated crap. And I mean Facebook, Medium, StackExchange, hell, I'm willing to bet a good chunk of the papers coming out these days are mostly AI-generated. And don't even get me started on Musk's shithole that is Twitter. No, I am not saying that AI is not useful or helpful - it is, but it should be a supplement, not the primary ingredient, let alone the sole ingredient.

The true value of the internet used to be the collective knowledge, not a mass-produced, regurgitated set of tokens and pixel values. Personally, I've gone back to the pre-RSS days and have a list of personal blogs I scroll through for things I find interesting, avoiding large platforms altogether. Interestingly enough, I've been finding more and more motivation to start writing myself, though I rarely get the chance to push it through to the end; in most cases I get stuck at 95% for many months until I find the time to do the remaining 5% of the work. That's how many I have lined up so far:

  $ git status . | wc -l
  59


Not sure if it'll help you, but do read this:

https://maggieappleton.com/garden-history

Perhaps all you need is a perspective shift. Wishing you luck with your personal writing journey!


Given that Denmark gave Ukraine F-16s and didn't bat an eyelash, I'd be all for it. I would be pretty happy if they handed out 2-3 fully loaded B-2s and were like "hey guys, go nuts with them".


I think the one thing no one really mentions is the fact that drones are extremely easy to build in the comfort of your living room. If you have a 3D printer, some spare parts laying around and a cheap electronics store around the corner (check, check and check in my case), you can build a drone for less than 300 bucks, which is far less than the retail ones - completely autonomous, no radio link needed; it takes off, does what it does and lands on its own with no human intervention.


We all have that same cheap electronics store, it's called AliExpress.


Delivery success rate is pretty much a random number generator here. Sometimes stuff arrives on time, sometimes it shows up 6 months late, sometimes never. For me it's either the physical store, Amazon or Mouser.

