Exactly. Things like changing the signature of the API for chat completions are an example. OpenAI is looking for any kind of moat, so they make the completions API more complicated by including “roles”, which are really just dumb templates for prompts that they try to force you to build your program around. It’s a race to the bottom and they aren’t going to win, because they already got greedy and they don’t have any true advantage in IP.
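
For reference, the "roles" in question are just labels on prompt segments in the request payload. A minimal sketch of a chat completions call (model name, prompts, and key are placeholders):

  import requests

  # Each message carries a "role"; the API makes you structure your
  # program around these labeled prompt segments.
  payload = {
      "model": "gpt-4o-mini",  # placeholder model name
      "messages": [
          {"role": "system", "content": "You are a terse assistant."},
          {"role": "user", "content": "Summarize this changelog."},
      ],
  }
  resp = requests.post(
      "https://api.openai.com/v1/chat/completions",
      headers={"Authorization": "Bearer YOUR_API_KEY"},  # placeholder key
      json=payload,
  )
  print(resp.json()["choices"][0]["message"]["content"])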

That’s a great analogy! The chemical imbalance thing has always sounded to me like there “are spirits in the system”

The brain imbalance narrative really is a double-edged sword.

On the one hand, it can help people understand that a condition is real and not imagined, especially people other than the sufferer. "Look at this brain scan, it looks different from a healthy person's" is quite tangible and convincing.

On the other hand, it can lead to a sense of helplessness, and notions that because there's a physical element to the mental illness, it's a permanent disability, or to be exclusively treated with pills and injections. That just doesn't follow. Sometimes it's true, but in many cases, lifestyle interventions can help, talk therapy can help.


That's just your projection. A proper neurological diagnosis can absolutely help a patient find a viable path to recovery, as well as enable them to find support groups.

And not every neurological disorder is fixable. And whether this is "sometimes" or "in many cases" is still up in the air, unless you can provide solid citations proving otherwise. So there is no need for this kind of judgement.


Yeah dude I think you're the one who is projecting. You're reading a lot into what I wrote that just isn't there.

I'm addressing the words you wrote, mainly this idea that learned helplessness is a significant concern when diagnosing chemical imbalances. Learned helplessness is tangential to a proper diagnosis; there are still CBT and other techniques for developing coping mechanisms around mental health issues, but those cannot be utilized without first receiving a diagnosis.

For example, in the case of ADHD, it's clear that patients have a reduced dopamine receptor count. It's not just a chemical imbalance, it's structural. Medication is the only effective solution for severe cases, and the result is night and day. Knowing this helps me make better decisions around my diagnosis, and also alleviates the great shame and anxiety that come hand in hand with ADHD.

I don't wish to devolve into an argument and I apologize if my initial comment was too aggressive.


I think this is a major selling point. If developers could upload their CVs, the exercise platform could package together the user's YoE and industry experience with their learning velocity and main language. I think that would be a solid product that hiring people would pay money for.

The issue is we don’t actually recognize that there are two very different careers that fall under the “software” umbrella.

One is developing hard software, software that needs to be performant on a hardware level. Like operating systems, low level libraries, embedded software, databases. The ability to quickly identify and apply data structures and algorithms leetcode style is very important in hard software.

The second is soft-software, which needs to be performant on the organizational budget/timeline level. This type of software engineering is more about glueing together hard software in the right way to solve business problems. Leetcode style interviews make no sense here, because the glue is usually bash or Python and it isn’t really doing much besides orchestrating hard software and delegating work to it.


And yet it's still not clear that the standard anxiety-driven interview process is in any way helpful for sussing out actual ability to perform in the former category.


Imagine you were to pause and take an exact carbon copy of the physical state of the universe at the current moment. When you press play in both universes, they will play out in exactly the same way until the end of time, and you would never be able to tell the two apart. That is determinism.

If the universe were not deterministic, for reasons like non-deterministic physics, souls existing, etc., you would potentially see differences between the two as they played out separately.

Another way to think about determinism vs non-determinism is that if something is deterministic, then if you have perfect information about it, you can predict its future states exactly. On the flip side, if something is non-deterministic, no matter how much information you gather about it, you will never be able to exactly predict its future states (only probabilistically).
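
A toy way to picture that prediction criterion (just an illustration, not physics): step the same update rule from the same starting state in two copies and they never diverge; inject any randomness and they drift apart.

  import random

  def step(state):
      # deterministic update rule: same input, same output, every time
      return (state * 31 + 7) % 1000

  def noisy_step(state):
      # non-deterministic update rule: injects a coin flip each step
      return (state * 31 + 7 + random.randint(0, 1)) % 1000

  a = b = 42  # "carbon copy": identical initial states
  for _ in range(100):
      a, b = step(a), step(b)
  print(a == b)  # True: the two copies never diverge

  a = b = 42
  for _ in range(100):
      a, b = noisy_step(a), noisy_step(b)
  print(a == b)  # almost certainly False: they drift apart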

Einstein is not so subtly telling us “god doesn’t play dice”, that there are no random physical properties of the universe, and that magic (like souls) doesn’t exist, in his opinion.

Free will means we have a choice in what we do next, but if we go and examine your carbon copy in the other universe, you will see they did all of the same things you did. Furthermore, if we were granted perfect information about the universe and had a good enough GPU, we could, from the physical principles of the universe, predict the future state of everything in the universe until the end of time, including all of your future choices; hence, no free will.

My thought is that people are mixing up the human concept of free will (freedom to make decisions based on how you feel and what you know) with determinism, the idea that the first state of all matter in the universe determined all future states, and that those states are inevitable. The catch is that determinism only matters to human decision making if we have perfect information about the current state of the universe, which we don't; so until we do, we have to keep guessing what happens next.


SQLAlchemy stands out as a library having probably one of the most complete and pragmatic APIs for database access across all languages.

It is no small feat to create compatibility for modern Python features like type hints and async in a library that has its roots in Python 2; it has absolutely exceeded expectations in that regard.


> SQLAlchemy stands out as a library having probably one of the most complete and pragmatic APIs for database access across all languages.

I can't disagree more. Identity map based ORMs are _awful_ to use, in almost every way.


What's your gripe with the identity map specifically? The map is bound to a session; sessions are short-lived in SQLAlchemy and usually map to a single transaction.
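
For anyone unfamiliar, the identity map just means you get one Python object per row per session. A rough sketch (SQLAlchemy 2.0 style; the model and table are made up):

  from sqlalchemy import create_engine
  from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column

  class Base(DeclarativeBase):
      pass

  class User(Base):  # hypothetical model
      __tablename__ = "users"
      id: Mapped[int] = mapped_column(primary_key=True)
      name: Mapped[str]

  engine = create_engine("sqlite://")
  Base.metadata.create_all(engine)

  with Session(engine) as session:
      session.add(User(id=1, name="ada"))
      session.commit()
      a = session.get(User, 1)
      b = session.get(User, 1)
      print(a is b)  # True: same object, courtesy of the identity map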


What’s bad about them?


SQLAlchemy in general is great, but the dataclass integration feels non-Pythonic to me, perhaps due to catering first to the typing crowd instead of the ergonomic one.
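
For reference, the typing-first dataclass style in question looks roughly like this (SQLAlchemy 2.0; the model is made up):

  from sqlalchemy.orm import DeclarativeBase, Mapped, MappedAsDataclass, mapped_column

  class Base(MappedAsDataclass, DeclarativeBase):
      pass

  class Article(Base):  # hypothetical model
      __tablename__ = "articles"
      id: Mapped[int] = mapped_column(primary_key=True, init=False)
      title: Mapped[str]
      score: Mapped[int] = mapped_column(default=0)

  # A dataclass-style __init__ is generated from the annotations, and the
  # Mapped[...] types are what a type checker sees.
  a = Article(title="Why SQLAlchemy does what it does")
  print(a.title, a.score)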


I felt that too, but over time decided that it trades theoretical Pythonicity for the practical benefit of being flexible enough to work properly with SQL.

One other reason for its popularity and success is how engaged the original developer is with the overall community.


What they need is a proper migration diff and generation tool with strong defaults. Alembic is meh and the DX is poor. Prisma's and Django's migration/diff tools are the gold standard.

(There's also Atlas for Python, but it isn't much better; it involves a lot of fiddly config files: https://atlasgo.io/guides/orms/sqlalchemy)


Prisma defeats the entire point of SQL with its weird, almost GraphQL-like thing. Apparently developers can't be trusted with SQL because you can do something big and dumb with it.


Alembic is amazing. I haven't found a better tool in any language. I actually use it even when Python isn't my main language.


Agreed, SQLAlchemy impressed me a decade ago and it still does today.


I seriously don’t understand how searching for a file in Windows takes so long and yields such crappy results. What abomination must there be under the hood for it to be this consistently bad for all these years? Microsoft devs, chime in if you have any insight.


Somehow I don't even think it is enshittification, because their search has been bad forever. Even on all previous versions of Windows Server.

OK OK, maybe it would slow things down to index shared drives? But how do you fuck up simple search on the LOCAL computer too? I have to use PowerShell to do searching: "gci -recurse" (gci is the built-in alias for Get-ChildItem). And it wasn't too many more lines of code to start searching the contents of Word and Excel files (although this takes a lot longer, at least it works).


WinXP Search was pretty good.


I do - it just makes sense if you are riding public transport or touching stuff that isn’t as clean as you’d think. Like your phone, wallet, keys, backpack, shoe laces, pant pockets, etc. Not obsessively, but if I haven’t washed my hands in a few hours or feel they are dirty I wash them before eating.


This is really neat. Have you had any issues with hallucination?


Thanks for asking! Yes, but we're actively addressing them.

We do a few things under the hood to make hallucinations significantly less likely. First, we make sure every single statement made by the LLM has a fact ID associated with it... Then we've fine-tuned "verification" LLMs that review all statements to make sure that assertions being made are backed up by facts, and that the facts are actually aligned with the assertion.

It's still possible for the LLM to hallucinate in this process, but the likelihood is much lower.
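
A simplified sketch of that claim-to-fact pattern (all names made up, not the actual system): every generated statement must cite a fact ID, and a second pass checks the statement against the cited fact before it is surfaced.

  facts = {
      "f1": "The order shipped on 2024-03-02.",
      "f2": "The customer is on the Pro plan.",
  }

  def draft_statements(question: str) -> list[dict]:
      # stand-in for the drafting LLM; it must attach a fact_id per claim
      return [{"text": "Your order shipped on March 2nd.", "fact_id": "f1"}]

  def verify(statement: dict) -> bool:
      # stand-in for the fine-tuned verifier LLM: does the cited fact
      # actually support the claim? Here, a trivial placeholder check.
      fact = facts.get(statement["fact_id"])
      return fact is not None and "shipped" in fact

  answer = [s["text"] for s in draft_statements("Where is my order?") if verify(s)]
  print(answer)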


Here's a simple rule, based on the fact that no one has shown that an LLM, or a compound LLM system, can produce output that doesn't need to be verified for correctness by a human, for any input:

The rate at which LLMs and compound LLM systems can produce output > the rate at which humans can verify that output.

I think it follows that we should not use LLMs for anything critical.

The gung-ho adoption and ham-fisting of LLMs into critical processes, like AWS's migration to Java 17 or root cause analysis, is plainly premature, naive, and dangerous.


This is a highly relevant and accurate point. Let me explain how this happens in real life, as opposed to the breathless C-type hucksterism:

We have a project working on a very large code-base in .NET Web Forms (and other old tech) that needs to be updated to more modern tech so it can be on .NET 8 and run on Linux to save hosting costs. I realize this is more complicated than just converting to later versions of Java, but it's roughly the same idea. The original estimate was 5 devs for 5 years. C-types decide it's time to use LLMs to help this get done. We use Co-Pilot and later others, of which Claude turned out to be the most useful. Senior devs create processes that offshore teams start using to convert code. Target tech can be varied based on updated requirements, so some went to Razor Pages, some to JS with a .NET API, some to other stuff. It looks to be a pretty good modernization at the start.

Then the Senior devs start trying to vet the changes. This turns out to be a monumental undertaking: they are literally swamped reviewing code output from the offshore teams. Many, many subtle bugs were introduced. It was noted that the bugs came from the LLMs, not the offshore team.

A very real fatigue sets in among senior devs where all they're doing is vetting machine-generated code. I can't tell you how mind-numbing this becomes. You start to use the LLMs to help review, which seems good but really compounds the problem.

Due to the time this is taking, some parts of the code start to be vetted by just the offshore team, and only the "important things" get reviewed by Senior devs.

This works fine for exactly 5 weeks after the first live deploy. At that point the live system experiences a major meltdown and causes an outage affecting a large number of customers. All hands on deck, trying to find the problem. Days go by, the system limps along on restarts and patches, until the actual primary culprit is found, which turns out to be a == that for some reason was turned into a != in a particularly gnarly set of boolean logic. There were other problems as well, but that particular one wreaked the most havoc.

Now they're back to formal, very careful code reviews, and I moved on to a different project after threatening to leave. If this is the future of programming, it's going to be a royal slog.


> Here's a simple rule, based on the fact that no one has shown that an LLM, or a compound LLM system, can produce output that doesn't need to be verified for correctness by a human, for any input:

I'm still not sure why some of us are so convinced there isn't an answer to properly verifying LLM output. In so many circumstances, output that has been pushed 90-95% of the way there can very easily be pushed to 100% by topping it off with a deterministic system.

Do I depend on an LLM to perform 8-digit multiplication? Absolutely not, because, like you say, I can't verify the correctness that would drive the statistics of whatever answer it spits out. But why can't I ask an LLM to write the Python code to perform the same calculation and read me its output?
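
A trivial sketch of that idea (hypothetical, not a real workflow): the model only has to produce code that is easy to eyeball, and the interpreter does the arithmetic deterministically.

  def multiply_via_generated_code(a: int, b: int) -> int:
      # Pretend this one-liner came back from the model. It is trivial
      # for a human to verify by reading, and the actual arithmetic is
      # done exactly by the Python interpreter, not statistically by
      # the model.
      llm_code = f"result = {a} * {b}"
      scope: dict = {}
      exec(llm_code, scope)
      return scope["result"]

  print(multiply_via_generated_code(12345678, 87654321))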

> I think it follows that we should not use LLMs for anything critical.

While we are at it I think we should also institute an IQ threshold for employees to contribute to or operate around critical systems. If we can’t be sure to an absolute degree that they will not make a mistake, then there is no purpose to using them. All of their work will simply need to be double checked and verified anyway.


1. There isn't one answer to how to do it. If you have an answer to validation for your specific use case, go for it. This is not trivial, because most of the flashy things people want to use LLMs for, like code generation and automated RCAs, are hard or impossible to verify without running into the I Need A More Intelligent Model problem.

2. I believe this falsely equates what LLMs do with human intelligence. There is a skill threshold for interacting with critical systems; for humans it comes down to "will they screw this up?" And a human can clear it because humans are generally intelligent: they can make good decisions to predict and handle potential failure modes.


Also, let’s remember the most important thing about replacing humans with AI - a human is accountable for what they do.

That is, ignoring all the other myriad, multidimensional nuances of human/social interaction that allow you to trust a person (and which are non-existent when you interact with an AI).


Why not automate verification itself, then? It's not possible now, and I would probably never advocate for using LLMs in critical settings, but it might be possible to build field-specific verification systems for LLMs with robustness guarantees.


If the verification systems for LLMs are built out of LLMs, you haven't addressed the problem at all, just hand-waved a homunculus that itself requires verification.

If the verification systems for LLMs are not built out of LLMs and they're somehow more robust than LLMs at human-language problem solving and analysis, then you should be using the technology the verification system uses instead of LLMs in the first place!


> If the verification systems for LLMs are not built out of LLMs and they're somehow more robust than LLMs at human-language problem solving and analysis, then you should be using the technology the verification system uses instead of LLMs in the first place!

The issue is not in the verification system, but in putting quantifiable bounds on your answer set. If I ask an LLM to multiply large numbers together, I can also very easily verify the generated answer by topping it off with a deterministic function.

I.e., rather than hoping that an LLM can accurately multiply two 10-digit numbers, I have a much easier (and verifiable) solution: ask it to perform the calculation using Python and read me the output.


Spitballing: if you had a digital model of a commercial airplane, you could have an LLM write all of the component code for the flight system, then iteratively test it against the digital model under all possible real-world circumstances.

I think automating verification in general might require general intelligence, though I'm not an expert.
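
A very loose sketch of what that might look like (everything here is invented and absurdly simplified compared to a real flight system): sweep the simulated model over a grid of conditions and assert invariants on the LLM-written component.

  import itertools

  def llm_written_throttle_controller(altitude_m: float, target_m: float) -> float:
      # stand-in for component code an LLM might have produced
      error = target_m - altitude_m
      return max(0.0, min(1.0, 0.5 + 0.001 * error))

  def simulate(controller, altitude_m: float, target_m: float, steps: int = 500) -> float:
      # toy "digital model": throttle above 0.5 climbs, below 0.5 descends
      for _ in range(steps):
          altitude_m += (controller(altitude_m, target_m) - 0.5) * 100.0
      return altitude_m

  # Sweep a grid of scenarios and check an invariant: the aircraft ends
  # up near its target altitude in every one of them.
  for alt, target in itertools.product(range(0, 12000, 1000), repeat=2):
      final = simulate(llm_written_throttle_controller, float(alt), float(target))
      assert abs(final - target) < 50.0, (alt, target, final)
  print("all scenarios within tolerance")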


The same is true of computers; in fact, it has been mathematically proven that there is no general way to decide whether an arbitrary computer program is correct.

But that hasn't stopped the last 40 years from happening because computers made fewer mistakes than the next best alternative. The same needs to be true of LLMs.


The theorem you're alluding to (Rice's theorem) says it is impossible to create a general algorithm that decides a non-trivial property for arbitrary computer programs.

There is nothing in the theory that prevents you creating a program that verifies a particular specific program.

There is an entire field dedicated to doing just that.


The issue is that to verify a program you need to have a spec. To generate a spec you need to solve the general problem.

This is what gets swept under the rug whenever formal methods are brought up.


That is not true at all. You do not need to generate a spec. All you need to do is prove a property. This can be done in many ways.

For example, many things can be proven about the following program without having to solve any general problem at all:

echo "hello world"

Similarly for quicksort, merge sort, and all sorts of things. The degree of formality doesn't have to go all the way to formal methods, which are only a very small part of the whole field.
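
As a toy illustration of the distinction (deliberately informal, nothing like real formal methods): you can establish a non-trivial property of one specific program, here by exhausting a bounded input space, without any general algorithm for deciding properties of arbitrary programs.

  from collections import Counter
  from itertools import product

  def my_sort(xs):
      # the one specific program whose property we want to establish
      return sorted(xs)

  # Non-trivial property: the output is ordered and is a permutation of
  # the input. Checked exhaustively over a bounded domain; no general
  # program-property decider required.
  for length in range(5):
      for xs in product(range(4), repeat=length):
          out = my_sort(list(xs))
          assert all(a <= b for a, b in zip(out, out[1:]))
          assert Counter(out) == Counter(xs)
  print("property holds on all", sum(4 ** n for n in range(5)), "inputs")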


> echo "hello world"

Congratulations, you just launched all the world's nuclear missiles.

This is to spec since you didn't provide one and we just fed the teletype output into the 'arm and launch' module of the missiles.


What you're saying is equivalent to throwing out all of mathematics because of the incompleteness theorem and starting to pray to fried egg jellyfish on full moons.


No, that's what OP is saying about LLMs.

