
Trail of Bits if you want it done right.


Explain why it’s a huge mistake, please.


I think the idea is the ambiguity between a zip file from your coworker's website and an entirely separate phishing website which downloads an entirely different zip file with a malicious payload.

Anything that introduces unnecessary and previously unforeseen ambiguity to the olds is just another path to filling the internet with scams.


Browser vendors should just splash users with one of those click-through security warnings. Make it bright yellow.

I'd be very entertained by drama from owners of those domains, but in my opinion, such a thing would be completely justified.


Here’s the problem: the biggest browser vendor is the one selling the domains!


Well, we also have .com as a common extension on Windows machines?


Check out familyphotos.zip


A link reading attachment.zip is no longer a 'safe' file but, e.g., a browser window.


Did you learn to write or draw without studying the samples of artists and authors before you?


Kids one hundred percent learn to draw without sampling from other authors.


Do kids draw in a private room, with no other kids around, and are blindfolded from birth until one day, they're given crayons and paper? Do they also not see pictures in books read to them by adults? They may not have taken an art history class and ingested an Internet's worth of Artstation and Greg Rutkowski, but they have plenty of outside influence.

Of course, their limited motor skills get in the way, but they're working on it.


The most stereotypical kid drawing is of themselves in a house with their parents. They do not need to sample other drawings to get to that.

Regardless, if we go back a few centuries, humans definitely demonstrated their ability to physically draw on paper, without sampling from sources other than their eyes.

But even if we were to consider that human eyes are effectively a way to “collect” samples, that’s another thing not taken into account in the aforementioned article.


gnunits*


That’s DARPA’s bread and butter. They’re a research org, not a productization org.


Yes. Most of the work is not done in the DARPA building; it's generally done at federal contractor sites. The daily work in the building is phone calls, reviewing papers, email, building PowerPoint slides, and trying to come up with the next program. In my time, most PMs managed between 3 and 5 programs, some of their own creation and some inherited from outgoing PMs (term limits plus federal contracting timelines mean the originating PM is generally not around to see a program through to its conclusion).


Program Managers are fixed term - up to three 2-year terms (though Director Tompkins generally doesn’t like PMs to be around longer than 4 years - see section 1101 of the Strom Thurmond NDAA for more details about the hiring authority).

I don’t know about the admin vs director types split you’re talking about (at least in the PM ranks). Office directors generally provided top cover and some very sweeping research directions, but as a PM it was your job to come up with research programs that were roughly compatible, convince the office director they were, and then manage the technical execution with the performers, doing dog and pony shows when the Pentagon called.


How ironic that Strom Thurmond had anything to do with any limits on tenure for such posts...

This guy was grifting in the Senate (etc.) for 48 (!) years...

https://www.ranker.com/list/facts-about-strom-thurmond/micha... (I do not know if these are true, but it's not the first time I've heard some of these allegations.)



Well damn that is pretty cool. I’ve not used UTM or the Apple Virt Framework but gonna have to look into that now! Thank you for the link.


UTM also has its own docs on it:

https://docs.getutm.app/advanced/rosetta/

And FYI, this is also available in Docker Desktop for macOS, which allows better performance when running x86_64 containers compared to the older solution where Docker was using qemu.


Do you have any sources for that? I basically gave up on Docker for M1 since the qemu performance is terrible.

Last time I checked, Rosetta only worked for one-off binaries, wouldn’t really allow docker to run more x86 code inside.


It is used for individual binaries. The Linux kernel is arm, but the binaries running on it can be x86 running on Rosetta. This works particularly well with containers, which come with their own libc.
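Once the Rosetta option is flipped on in Docker Desktop's settings, a quick sanity check looks something like this (the image and command are just what I'd reach for; any amd64 image would do):

  docker run --rm --platform linux/amd64 alpine uname -m
  # prints x86_64, even though the VM's kernel is arm64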


Search for Rosetta on this page

https://docs.docker.com/desktop/settings/mac/


I'm not sure what you're asking here?

Rosetta is a tool for running x86-64 binaries on an arm64 OS.

I think you're asking if rosetta lets you run an x86 kernel, to which the answer is no - the whole point of this framework is to support virtualization, i.e. the OS is running directly on the hardware. The moment the OS can't do that, there is no point in doing anything other than emulation.


Yes, I've used it with Terraform plugins by setting the arch to x86 and running via docker. Needed to do this as some plugins don't have arm64 versions.


The downside is that there is no JIT cache, so it's much slower than Rosetta on the Mac, as it always does the translation on demand.


On the run, so apologies for not reading the docs you attached yet, but I’ve been having a helluva time trying to cross-compile Rust from an M1 for x86_64. Will this method help in this situation? I’ve tried running an x86_64 VM in emulation to compile the Rust bits, but it’s excruciatingly slow.


Google’s implementation allows multiple passkeys. I have a passkey for my iPhone, one from Windows Hello, and 3 from different yubikeys I’ve picked up over the years. The yubikey implementation requires setting a PIN, so I’m not even really worried about one of them being lost or stolen.


Could you please elaborate in detail on how someone not using YubiKeys should adopt this with hardware keys, and how many keys should one have?

How are keys different from an OTP app like Authy?


The Apple security model for Apple ID and iCloud already works like this. Every device is effectively a “passkey” even if they don’t call it that. Been that way for a while now.

Every device (except accessories like AppleTV, HomePod, etc.) you log into iCloud with effectively has Super Admin control over your entire iCloud account. Any logged-in device can remove, modify, or change (almost) anything without a password… including the account password (hence the almost). Once authorized, that access is controlled by biometrics with a backup PIN. As long as you maintain control of a single device you have access to everything.

Yubikeys work the same way. Doubly so when dealing with resident credentials and passkeys. Keep as many as possible—just make sure the PIN is not obvious. If you hold the key and know the PIN, you can do anything. No other information is needed.

The big difference between this and OTP is twofold: 1) much more resistant against phishing; and 2) the underlying key is less likely (or impossible-ish) to be exposed. Phishing a 2FA OTP is actually not hard with a good fake UI. It just requires someone to act quickly on the other end, or a good script that can quickly change the password/security settings once a password and OTP are successfully phished.


You’re an inspiration. I’m going to open an accounting firm and refuse to use calculators.


Username checks out because you missed the point.

Yes, I know calculators can encourage people to not think; I should know because I wrote one. [1]

But the current "AI" tech is so much worse on that front. It's a difference of degree, and that degree does matter.

By the way, when I was learning to fly helicopters [2], I used my calculator to calculate weight and balances, but I also did it by hand!

[1]: https://git.gavinhoward.com/gavin/bc

[2]: https://gavinhoward.com/2022/09/grounded-for-life-losing-the...
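For the curious, a weight-and-balance check is just moment arithmetic, so it's easy to do both ways. A toy sketch (the station numbers are made up, not from any real aircraft):

  // moment = weight * arm; CG = total moment / total weight
  const stations = [
    { name: "empty aircraft",    weight: 1680, arm: 108.5 },
    { name: "pilot + passenger", weight: 340,  arm: 97.0  },
    { name: "fuel",              weight: 180,  arm: 106.0 },
  ];
  const totalWeight = stations.reduce((w, s) => w + s.weight, 0);
  const totalMoment = stations.reduce((m, s) => m + s.weight * s.arm, 0);
  console.log(`CG = ${(totalMoment / totalWeight).toFixed(1)} inches aft of datum`);

Then you check that the total weight and the CG both land inside the published envelope.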


I think you missed their point - calculators don't exist in accountancy firms to 'encourage accountants not to think', they exist because they dramatically speed up accountancy and make accountants more productive.

Sure you can open an accountancy business and refuse to use calculators, but that's just working with a strange self-imposed limit rather than using technology to best support your business.


And you missed my point.

Yes, calculators do that. I'm arguing that "AI" does not let programmers write better code faster. It lets them write worse code faster, or better code slower.


I'm sure people said the same thing about compilers. And then interpreters. Even today people complain about interpreted languages being too slow and not requiring people to understand "enough" of what's actually happening.

Turns out that really doesn't matter. I think your argument is incredibly weak; the fact that some people don't use these tools effectively doesn't mean that nobody can. Whoever figures this stuff out is going to win, that's just how it works.

That is to say, there will always be a niche for people who refuse to move up the chain of abstraction: they're actually incredibly necessary. However, as low-level foundations improve, the possibilities enabled higher up the chain grow at an exponentially-higher rate, and so that's where most of the work is needed. Career-wise it might be better to avoid AI if that's what you want to do, but as a business I can't see a dogmatic stance against these tools being anything but an own goal.


> Turns out that really doesn't matter.

Except that it does!

For every level of abstraction, you lose something, and abstractions are leaky.

The lower levels of abstraction make you lose the least, and they are also the least leaky. The higher you go, the more you lose, and the more leaky.

What I'm claiming is that these "AI" tools have definitely reached the point where the losses and the leaks are too large to justify. And I'm betting my career on that.


We all rely on abstractions over layers we don't deal with directly, that's just a fact. You're not running a home-grown OS on custom-built hardware made from materials you mined out of the ground yourself. AI is just another layer. Not everyone operates on the highest, newest layer, and that's absolutely fine. You can carve your niche anywhere you like. Telling yourself that the layer above you isn't feasible isn't going to do you any favors but it does generate buzz on social media which seems like it's the goal here.

You're not betting anything because the cost for you to change your mind and start working with AI tools is exactly 0. This rhetoric is just marketing. I'm sure you'll find the customers that are right for you, but you can at least admit that this kind of talk is putting the aesthetic preference of what you want work to look like above what's actually the most effective. Again, I'm sure you'll find customers who share those aesthetic preferences, but to pretend like it's actually an engineering concern is marketing gone too far.


> We all rely on abstractions over layers we don't deal with directly, that's just a fact.

Did I ever deny that? Sure, some of those layers are worth it. That doesn't address my assertion that these "AI" tools are not.

> Telling yourself that the layer above you isn't feasible isn't going to do you any favors but it does generate buzz on social media which seems like it's the goal here.

You're halfway there.

> You're not betting anything because the cost for you to change your mind and start working with AI tools is exactly 0.

And here is where you contradict yourself.

If I'm getting loud about this bet, and making customers because of this bet, then it will cost me a lot to start working with "AI" tools. My customers will have come to be because I don't, so if I start, I could easily lose all of them!

> This rhetoric is just marketing.

Yep! But that's what makes my bet actually cost something. I'm doing this on purpose.

> I'm sure you'll find the customers that are right for you, but you can at least admit that this kind of talk is putting the aesthetic preference of what you want work to look like above what's actually the most effective.

No, I will not admit that because I believe very strongly that my software will be better, including engineering-wise, than my competitors who use these "AI" tools.


> Yes, calculators do that. I'm arguing that "AI" does not let programmers write better code faster. It lets them write worse code faster, or better code slower.

The idea isn't to write better code faster, it's to build better products faster.

Although IMO, in the future AI will probably enable programmers to write better code too (faster, fewer bugs, more secure, more frequently refactored, etc.)


> The idea isn't to write better code faster, it's to build better products faster.

All else being equal, better code means better products.

Also, to have a better product without better code, you're implying that the design of the product is better and that these "AI" tools help with that.

Until they can reason, they cannot help with design.


I think that, all else being equal, better code means you aren’t changing the system as fast and have likely stagnated on the business or growth side. Maybe that is appropriate for where your company is, but worse-is-better wins so often.

And I would bet that AI design would help things where the existing designers are bad, e.g. so much open source UI (that is, not cli UX) written by devs, but it is still a bit away from the top quality like Steve Jobs.

Maybe this is like the transition from hand-crafted things to machined things; we go from a world with some excellent design and some meh design to a world with more uniform but less great designs.


I don't need my business to grow. I want to support myself and my wife. That's it. You can call that whatever you like, but stagnation isn't it, unless you think that SQLite is stagnant because SQLite had the same business model.

"AI" design will not help until we have a true AI that can reason. (I don't think we ever will.)

Why is reasoning necessary? Because design is about understanding constraints and working within them while still producing a functional thing. A next-word-predictor will never be able to do that.


GPT4 can clearly already reason IMO (I mean it can play chess fairly well without ever being taught, or if you create a puzzle from scratch and give it to it, it can try to work it out and describe the logical approach it took). It’s definitely surprising that a next-word generator has developed the ability to reason, but I guess that’s where we are!

What is your definition of reasoning that you do not think GPT-4 would demonstrate signs of?


> What is your definition of reasoning that you do not think GPT-4 would demonstrate signs of?

Heh, there have been many attempts to define reasoning. I haven't seen a good one yet.

However, I'm going to throw my hat into the ring, so be on the lookout for a blog post with that. I've got a draft and a lot of ideas. I'm spending the time to make it good.


Well GPT4 certainly fulfils the existing definitions of reasoning, so maybe you should call your thing something else instead of redefining ‘reasoning’ to mean something different?

Otherwise it’s just moving the goalposts.


GPT4 is certainly not fulfilling the definition of reasoning. It's borrowing the intelligence of every human who wrote something that went into its model.

To demonstrate this, ask it to prove something that most or all people believe. Say some "intuitive" math thing. Perhaps the fact that factorial grows faster than exponential functions.

And no, don't just have it explain it, have it prove it, as in a full mathematical proof. Give it a minimal set of axioms to start with.

Merriam-Webster's definition of "reasoning" [1] says that reasoning is:

> the drawing of inferences or conclusions through the use of reason

So starting GPT4 off with some axioms would give it a starting point to base its inferences on.

Then, if it does prove it, take away one axiom. Since you started with a minimal set, it should now be impossible for GPT4 to prove that fact, and it should tell you this.

Having GPT4 prove something with as few axioms as possible, and also admit that it cannot prove something with too few axioms, is a great test of whether it is truly reasoning.

[1]: https://www.merriam-webster.com/dictionary/reasoning
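(For reference, the shape of the proof I'd be fishing for—my own sketch of the standard argument, in LaTeX, not GPT output:

  For fixed $c > 1$ and $m = \lceil 2c \rceil$, whenever $n > m$:
  $$\frac{n!}{c^n} \;=\; \frac{m!}{c^m} \prod_{k=m+1}^{n} \frac{k}{c} \;\ge\; \frac{m!}{c^m}\, 2^{\,n-m} \;\longrightarrow\; \infty,$$
  since every factor $k/c$ with $k > m \ge 2c$ is at least $2$.

So $n!$ eventually dominates $c^n$ for any fixed base $c$.)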


In order for an AI to reason it doesn’t mean it has to be able to reason about everything at any level - most humans can’t rediscover fundamental mathematical theorems from basic axioms, particularly if you keep removing them until they fail, but I don’t think that means most humans are unable to reason.

Take this problem instead which certainly requires some reasoning to answer:

“Consider a theoretical world where people who are shorter always have bigger feet. Ben is taller than Paul, and Paul is taller than Andrew. Steve is shorter than Andrew. Everyone walks the same number of steps each day. All other things being equal, who would step on the most bugs and why?”

I think it’s a logical error to say “AI can’t reason about this, so that proves that it can’t reason about anything at all” (particularly if that example is something most humans can’t do!). The LLMs reasoning is limited compared to human reasoning right now, although it is still definitely demonstrating reasoning.


> "Consider a theoretical world where people who are shorter always have bigger feet. Ben is taller than Paul, and Paul is taller than Andrew. Steve is shorter than Andrew. Everyone walks the same number of steps each day. All other things being equal, who would step on the most bugs and why?"

Because Ben is the tallest, his feet are the biggest, and because he takes the same amount of steps as the others, the amount of area he steps on is larger than the area that the others step on.

Therefore Ben is most likely to be the one to step on the most bugs.

Easy. And I'm not brilliant.

The problem with testing these tools is that you need to ask it a question that is not in their training sets. Most things have been proven, so if a proof is in its training set, the LLM just regurgitates it.

But I also disagree: if the "AI" can't reason about that, it can't reason because that one is so simple my pre-Kindergarten nieces and nephews can do it.

But even if not, the LLMs should have "knowledge" about exponential functions and factorial because the humans who wrote the material in their training sets did. So it's not a lack of knowledge.

And I claim that most humans could rediscover theorems from basic axioms; you've just never asked them to.


“In this theoretical world, shorter people have bigger feet. Given the information provided, we can deduce the following height order:

Ben (tallest), Paul, Andrew, Steve (shortest).

Since shorter people have bigger feet in this world, we can also deduce the following order for foot size:

Steve (biggest feet), Andrew, Paul, Ben (smallest feet).

Assuming that everyone walks the same number of steps each day and all other things being equal, the person with the biggest feet would be more likely to step on the most bugs simply because their larger foot size would cover a greater surface area, increasing the likelihood of coming into contact with bugs on the ground.

Therefore, Steve, who is the shortest and has the biggest feet, would step on the most bugs.”

GPT4 solved it correctly. You didn’t.


My bad. I would have if I hadn't gotten mixed up on the shorter vs taller. You know this too.

And GPT4 didn't solve it correctly. It's a probability, not a certainty, that the shortest person will step on more bugs.


Sure, you would have got it right if you didn't get it wrong.

At the very least, this should be evidence that the problem wasn't a totally trivial, easy, pre-kindergarten-level problem, though, and it did manage to correctly solve it.

It required understanding new axioms (shorter = bigger feet) and inferring that people with bigger feet would crush more bugs without this being mentioned in the challenge.

Your dismissal that the AI messed up because it didn't phrase the correct answer back in the way you liked is a little harsh IMO, as the AI's explanation does make it clear it is basing it on likelihoods ("the person with the biggest feet would be more likely...").


That mix up must be the human touch you’ve spoken so highly of.


That all makes sense.


Everyone has limited time, and if AI assistance can increase the speed you can develop & iterate the product to better match user needs that is how it can result in a better product.

Equally if it can help devs launch a month earlier, that’s a huge advantage in terms of working out early product/market fit.

All things being equal, I would rather have a company with better product/market fit than one with great code (even though both are important!).


> if AI assistance can increase the speed you can develop & iterate the product to better match user needs that is how it can result in a better product.

That's a very big "if", and one I just don't think will exist.

Also, that only helps at the beginning. As the product gets more complex, I believe the AI will help less and less, until velocity becomes slower than at companies like mine.

And product/market fit is just a way for companies to cover up the fact that their founders wanted to found a company, not solve a real problem. If you solve a real problem first, founding a company is simple and you "just" have to sell your solution.


I'm sure that all was very rewarding for you. I'm not sure how it translates into a business. We don't want to teach accountants to deploy their own calculator from the command line and we don't want pilots to do math while they're flying.

You act like encouraging people not to think is a problem. Thing is, you'd be wrong. We want people not just to think, but to focus. If I'm a pilot and I have to worry about the runtime environment of the command-line calculator I use to hand-calculate my route and cockpit configuration, is that a good use of my focus? I think most people would say no. We definitely want to discourage the pilot from actively thinking about that kind of stuff. Should they have a grasp of the basics in case of emergency? Sure. Do we have a sustainable and efficient system of transportation if that's how our pilots spend their time? No.


You don't know much about aviation, do you?

Aviation is about redundancy. Redundancy is a good use of a pilot's focus. That's why I did both. I didn't blindly trust my calculator to not have bugs (even though I wrote it!), and I didn't blindly trust my hand calculations to be correct.

If they agree, though, it's a good sign that everything is in good order. That's what redundancy is for, to ensure that a problem in one thing does not lead to another problem, like in the Swiss cheese model of accidents.


Redundancy is not a good use of focus for most people (except SREs and the like). The whole point of redundancy is to remove something from focus. I'm guessing you checked everything by hand for the sole purpose of not having to focus on these calculations mid-flight. I'm sure most commercial pilots rely on a larger organization to make these checks for them and their organization probably employs its own system of checks and redundancies at scale. Putting that all on the pilot is not going to give you a sustainable transportation business.

If you're just trying to get off Gilligan's island, that's another thing entirely.


The final authority, and final liability, for the airplane is on the head of the pilots.

A commercial pilot friend has told me that they still check the calculations. When they don't, they get accidents like the Gimli Glider.

It's like saying that putting all of the legal checks on the lawyers is not going to give you a sustainable business. But we all know that's wrong.


What was the outcome of that incident? Fault was found with the Air Canada procedures, training, and manuals, while the Captain and First Officer went on to receive FAI Diplomas for Outstanding Airmanship. What did not happen was mass calls for the individual pilots to write their own calculation software. I'm not sure what point you're trying to make here, as reading about the incident you mentioned only paints a clearer picture that systematic redundancy is the responsibility of those creating and maintaining the system, not of those using it.


Yes, they did, but I think they should have also been censured for their lack of care as well.

When I wrote that calculator, I didn't write it for flight. I wrote it as a general calculator and just used it for flight. I would have used the GNU bc if I didn't write my own.

So it's a bit disingenuous to claim that I am claiming that pilots should write their own.

And pilots are included in those maintaining the system; they're not just using it.


It is the ability to do the calculations that matters, not that you would do so in actual practice. He very explicitly mentioned 'while I was learning'.


that's a weird flex


How often are calculators confidently wrong?


As a math teacher, this is such a funny comparison to keep reading.

Yes, there's a difference between a deterministic outcome and a non-deterministic one. But throw humans into the loop, and it becomes more interesting. I can't count the number of times I've listened to someone argue their answer must be right because they got it from the calculator. And it's not just students; as a teacher I've always paid attention to how adults use math.

With calculators or GPT tools, or any other automated assistant, judgement and validation continues to matter.


> I can't count the number of times I've listened to someone argue their answer must be right because they got it from the calculator.

Answers from calculators are always right! But the human may have asked the wrong question.


There are a bunch of well-known areas where popular calculators tend to give incorrect answers: https://apcentral.collegeboard.org/courses/resources/example...

It’s mostly fine until it isn’t. AI will probably operate in the same capacity. We already have so much incorrect information out there that’s part of our pop culture. Even down to things like the fact that Darth Vader never said, “Luke, I am your father,” and Mae West never said, “Why don’t you come see me sometime?”

Even basic movie quotes are beyond our ability to get right. Hilariously, I just asked ChatGPT about these quotes and it explained that these are common misquotes, told me what was actually said in these movies, and explained some relevant context.

Sherlock never said, “Elementary, my dear Watson” even once in the books. Kirk never said, “Beam me up, Scotty.” We’re much less correct than we like to think. And somehow we’ve survived.

ChatGPT is fallible just like we are. We’ll manage, just like we always have.


I have another theory about all those quotes. Regarding that Darth Vader quote, if quoted exactly, i.e. "I am your father", it isn't immediately obvious the quote is from Star Wars. "Luke" gives you a context. The Sherlock and Kirk quotes are synthesized from what the characters actually said, and arguably the precise wording doesn't matter, because the point of the quote is to bring up images of the characters and situations, not of those specific words.


Go type .1*.2 into any JavaScript console.

Edit: slapping a few more in here:

https://learn.microsoft.com/en-us/office/troubleshoot/excel/...

https://daviddeley.com/pentbug/index.htm
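For anyone who doesn't want to open a console, this is what comes back (plus the usual workaround of asking the question in scaled integers; the numbers are just my example):

  > 0.1 * 0.2
  0.020000000000000004
  > (10 * 20) / 10000   // same question in integer hundredths, one division at the end
  0.02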


The answer in the Javascript console is still a correct answer. The user did not specify a level of precision, and web browsers are programmed to use a precision level which is reasonable under most circumstances. If the user needs a higher level of precision, he or she needs to specify that as part of the question (such as by not using floating point numbers).

I don't mean to be pedantic. I teach coding to elementary school students, and this is something fundamental I try to make them understand. A computer will always do what you tell it to do. A bug is when you accidentally tell a computer to do something different than what you'd intended.

Going back to the calculator example, if a student used a calculator and got the wrong answer, the problem didn't come from the calculator. This is useful to understand; it can help the student work backwards to figure out what did go wrong.

AI is different in that we've instructed the computer to develop and follow its own instructions. When ChatGPT gives the wrong answer, it is in fact giving the right answer according to the instructions it was instructed to write for itself. With this many layers of abstraction, however, the maxim that computers "always do what you tell them" is no longer useful. No human truly knows what the computer is trying to do.


> I don't mean to be pedantic.

I'm sorry in advance, but this reply is just to meet pedantry with pedantry.

> A computer will always do what you tell it to do.

This is the Bohr model of computers. It's the kind of thing you tell elementary school students because it's conceptually simple and mostly right, but I think we know better here on HN. Pedantically, computers don't always do what you tell them to, because they don't always hear what you tell them, and what you tell them can be corrupted even when they do hear it.

For instance, random particles from outer space can cause a computer to behave quite randomly: https://www.thegamer.com/how-ionizing-particle-outer-space-h...

  why was nobody able to pull it off, even when replicating exactly the inputs that DOTA_Teabag had used? Simple: this glitch requires a phenomenon known as a single-event upset, which is very much out of any player's control.
I don't think we can reasonably say that in this instance, the computer behaved according to what the user told it to do. In fact, it responded to the user and the environment.


That's true. An earlier version of my comment called out hardware problems as an exception—insufficient error correction for cosmic-ray bit flips is fundamentally a hardware problem—but I removed it before posting. In a way, I feel hardware bugs do still follow this principle: The electrons in the circuits are behaving as they always do, just not in the way we intended. But I agree this gets philosophically messy—no one "programmed" the electrons.

My underlying point is that, at least in 99.999% of cases, the problem isn't the calculator, it's the human using the calculator incorrectly. And although you could draw some parallels between calculators and AIs with regard to selecting the right tool and knowing when and how to use it, I'd say the randomness involved in an LLM is fundamentally different.


I don't think it's fundamentally different, and I think you're conflating complexity with randomness.


>The answer in the Javascript console is still a correct answer.

It's wrong in the same way that saying 1/1 = 1.0004 is wrong. It's not a matter of chosen precision in that it doesn't make the answer correct when you increase the number of zeros between 1 and 4.


It makes it less wrong. For most calculations people do, we don't need that many digits of precision for any one calculation.


That's true. I think that it is analogous to the discussion of AI limitations. Both of these are tools and are not categorically exclusive.

In the case of translation of floating point numbers from base-2 to base-10 we have to make approximations which will often be slightly wrong forever without regard for amount of precision.

With AI, depending on the pre-conditions, the AI could be stuck in a state of being slightly wrong forever for a specific question without regard to further refinement of the query.

These are both still useful as tools. We just need to be able to work on the amount of refinement of the answer that the AI gives, which may be able to be solved fairly well through prompt engineering, if not through the advancement of GPT itself.


One deck is made of wood. One deck is made of steel. They will behave differently after years of weathering.

Just because they are both decks doesn't mean they are the same.


Are both useful in some context?


In the same way that nearly any two arbitrary objects are useful in some context.


From the perspective of an investor who just wants their stonks to go up, sure. From the perspective of a sailor who wants the deck to not crumble beneath their feet in a storm, no.


Answers from AI might not always be right and a human has to learn to judge them or refine their prompts accordingly. In either case there's a tool that a human must use and become savvy with.


<< Answers from calculators are always right! But the human may have asked the wrong question.

I actually agree with you, but, in the same vein, does it not mean that the user did not ask the correct prompt?


No, they are not.


Calculators have no hallucinations, LLMs do. They can literally say that 1+1 is not 2.


People keep saying this, pointing out the "mistakes with confidence" aspect of LLMs, but as someone who is continually amazed by ChatGPT and finds it very useful in my day-to-day, it's hard for me to take this objection seriously if presented as a reason not to use AI.

That is, for me, the output of ChatGPT or other AI tools is the starting point of my investigation, not the end output. Yes, if you just blindly paste the output from an AI tool you're going to have a bad time, but we also standardize code reviews into the human code-writing process - this isn't that different.

Just giving one specific example, I find ChatGPT to be an incredibly efficient "documentation lookup tool". E.g. it's great if I'm working with a new technology or API and I want to know "what my options are", but I don't know what keywords to search for; it can help give me a really good "lay of the land", and from there I can read on my own to get more specifics.


Maybe you haven't used it enough. ChatGPT is wrong all the time for me, sometimes insultingly wrong. The confidence in its incorrect answers just makes it that much worse.

I can't buy any of this hype for a "word-putting-together" algorithm. It's not real intelligence.


Please give some examples then. I've found the GPT-4 version to be remarkably accurate, and when it makes mistakes it's not hard to spot them.

For example, I commented last week that I've found ChatGPT to be a great tool for managing my task list, and for whatever reason the "verbal" back-and-forth works much better for my brain than a simple checklist-based todo app: https://news.ycombinator.com/item?id=35390644 . But, I also pointed out how it will get the sums for my "task estimate totals by group" wrong. But it's so easy to see this mistake, and after using it for a while I have a good understanding for when it's likely to occur, that it doesn't lessen the value I get from using the tool.


OK, here's one: this substack [1] was flying around a week or two ago, asserting that the marginal value of programmers will fall to zero by 2030. What a dream! No more annoying nerds!

The code in the post is wrong. For this "trivial" example, if you just blindly copied it into your code, it would not do what you want it to do. I love this example not just because it's ironic, but because it's a perfect illustration of how you need to know the answer before you ask for the solution. If you don't know what you're doing, you're gonna have a bad time.

I'm not at all concerned about the value of programmers falling to zero. I'm concerned that a lot of bad programmers are going to get their pants pulled down.

[1] https://skventures.substack.com/p/societys-technical-debt-an...

(Edit: and as a totally hot take, while I'm not worried about good programmers, I think the marginal value of multi-thousand word, think-piece blogposts is rapidly falling to zero. Who needs to pay Paul Kedrosky and Eric Norlon to write silly, incorrect articles, when ChatGPT will do it for free?)


OK, so we are 100% in agreement then? I absolutely don't believe the marginal value of programmers will fall to zero by 2030 (but, to clarify, the way you phrased your original sentence I thought it was that an LLM made this assertion, not some random VC dudes). I also highlighted in my posts that I use AI as an aid to my processes, "That is, for me, the output of ChatGPT or other AI tools is the starting point of my investigation, not the end output. Yes, if you just blindly paste the output from an AI tool you're going to have a bad time, but we also standardize code reviews into the human code-writing process - this isn't that different."

Also, I think the coding example in that substack highlights that one of the most important characteristics of good programmers has always been clarifying requirements. I had to read the phrase "remove all ASCII emojis except the one for shrugs" a couple times because it wasn't immediately clear to me what was meant by "ASCII emojis". I think this example also highlights what happens when you have 2 "VC bros" who don't know what they're talking about highlighting the "clever" nature of what ChatGPT did, because it is totally wrong. Still, I'd easily bet that I could create a much clearer prompt and give it to ChatGPT and get better results, and still have it save me time in writing the boiler plate structure for my code.


You asked for an example and I provided one that I thought illustrated the mistakes GPT makes in a vivid way -- mistakes that are already leading people astray. The fact that this particular example was coupled with a silly prediction is just gravy.

In short, I don't know if we "agree", but I think OP is/was correct that GPT generates lots of subtle mistakes. I'd go so far as to say that the folks filling this thread with "I don't see any problems!" comments are probably revealing that they're not very critical readers of the output.

Now for a wild prediction of my own: maybe the rise of GPT will finally mean the end of these absurd leetcode interview problems. The marginal value of remembering leetcode solutions is falling to zero. The marginal value of detecting an error in code is shooting up. Completely different skills.


Getting back to that example from that post, though, thinking about it more, "remove all ASCII emojis except the one for shrugs" makes absolutely no sense, because you can't represent shrugs (either with a unicode "Person shrugging" character emoji, or the "kaomoji" version from that code sample that uses Japanese characters) in ASCII, at all. So yes, asking an LLM a non-sensical question is likely to get you a non-sensical response, and it's important to know when you're asking a non-sensical question.
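To be concrete, here's roughly how I'd spell out one sane reading of that prompt before handing it to the model—the Unicode property and the kaomoji below are my own guesses at what the authors meant, not anything from the post or from ChatGPT:

  // Remove pictographic emoji but keep the shrug kaomoji (which isn't ASCII anyway).
  const SHRUG = "¯\\_(ツ)_/¯";
  const PLACEHOLDER = "\u0000SHRUG\u0000";
  function stripEmojiKeepShrug(text) {
    return text
      .replaceAll(SHRUG, PLACEHOLDER)              // protect the kaomoji (belt and braces)
      .replace(/\p{Extended_Pictographic}/gu, "")  // drop pictographic emoji
      .replaceAll(PLACEHOLDER, SHRUG);             // put the shrug back
  }

Whether that's what the substack authors actually wanted is exactly the requirements question a programmer still has to ask.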


Well, explain it however you like, but the point is that GPT is more than happy to confidently emit gibberish, and if you don't know enough to write the code yourself (or you're outsourcing your thinking to it), then you're going to get fooled.

I'd possibly argue that knowing how to ask the right question is tantamount to knowing the answer.


That code is wrong and I wonder if the author is familiar with the property of code encapsulated in the halting problem. Generically, reading code does not grant one the ability to predict what will happen when that code runs.

Whatever, time will tell. I still haven’t quite figured out how to make good use of GPT-4 in my daily work flow, tho it seems it might be possible.

Has anyone asked it to make an entry for the IOCCC?


For a time, I was attempting to use it for game advice during my recent playthrough of Demon's Souls remake (What's the best build for X? What's the best weapon for X?). I asked ChatGPT where to find the NPC The Filthy Woman in a certain level. ChatGPT answered that that NPC doesn't exist, and perhaps I had the wrong game? That NPC most certainly does exist.

I was also using it to generate some Java code for a bit. That is, until it started giving me maven dependencies that didn't exist, and classes that didn't exist, but definitely looked like they would at first glance.


> I asked ChatGPT where to find the NPC The Filthy Woman in a certain level. ChatGPT answered that that NPC doesn't exist, and perhaps I had the wrong game? That NPC most certainly does exist.

OK, wow - that example kind of perfectly proves my point. If I were to ask ChatGPT an extremely specific, low-level question about an extremely niche topic, then I would absolutely be on "high alert" that it wouldn't know the answer. And while I agree the "confidence" with which ChatGPT asserts its answers (though I'd argue the GPT-4 version does a much better job at not being over-confident than 3.5) is off-putting, I think it's pretty easy to detect where it's wrong.

I'd also be curious about your Java example. There was a good YouTube video of a guy that got ChatGPT to write a "population" game for him. In some cases on first try it would output code that had compile errors, e.g. because it had wrong versions of Python dependencies. He would just paste the errors back in to ChatGPT and ChatGPT would correct itself. Again, though, this highlights my point that I use ChatGPT as the start of my processes, a 1st draft if you will. I don't just ask it to write some code, then when I get an error throw my hands up and say "see how dumb ChatGPT is." To each their own, though.


>OK, wow - that example kind of perfectly proves my point. If I were to ask ChatGPT an extremely specific, low-level question about an extremely niche topic, then I would absolutely be on "high alert" that it wouldn't know the answer. And while I agree the "confidence" with which ChatGPT asserts its answers (though I'd argue the GPT-4 version does a much better job at not being over-confident than 3.5) is off-putting, I think it's pretty easy to detect where it's wrong.

I don't consider a popular video game from 2009 to be "extremely niche", and I also shouldn't have to know what ChatGPT knows. And no, I don't think it's easy to detect where it's wrong if you don't know the right answer, and it's actually pretty useless when you have to spend time confirming answers.


I think these type of errors gets mostly resolved with a search plugin.


Out of curiosity was this 3.5 or 4?


I don't believe it was version 4 yet.


Do you happen to know what messages are gonna get dropped by the client if the conversation becomes too long?


It's still just guessing. Ask ChatGPT to provide some links to documentation and check them.

LLMs are great for “creative” work: images, poems, games - entertainment based on imaginary things.


There are three types of lies, as the saying goes: lies, damned lies, and statistics. But why are statistics considered lies?

Because of how they're used.

If you think of AI as a source of truth, obviously you're going to run into trouble: it "lies"! But if instead of thinking of it in isolation, you think of the person+AI producing results, then you should trust that person exactly as much as you would whether or not they use AI.


Depends on the calculator. Floating point imprecision is well documented.


That's true. But we know exactly why it can't do it.


How important is the "knowing why" if the mistakes are still there? And in reverse, we "know" GPT doesn't use a calculator unless specially pointed at one.

Floating point errors creeping in is why we have to use quaternions instead of matrices for 3D games. Apparently. I'd already given up on doing my own true-3D game engine by that point.

In some sense we "know why" humans make mistakes too — and in many fields from advertising to political zeitgeist we manipulate using knowledge of common human flaws.

On this basis I think the application of pedagogical and psychological studies to AI will be increasingly important.


Well documented is the key difference.


As often as people put in incorrect values. As often as someone goes beyond the range. Always when someone tries to add yellow to blue. Often when the wrong formula is used.

And in this case you don't get to put in the formula.


AI is confidently wrong about a lot of things, but that doesn't mean it's useless. It means you need to verify what it generates. Doing that for code is much easier than prose. AI that produces wrong code is immediately and obviously wrong. It can't really fool you. It's easy to test. You can even ask AI to write tests for the code it produces to demonstrate it's correct.


Tests help but tests can be wrong as well, if this was all so easy we wouldn't have any bugs.


Just because it is making obvious errors doesn’t mean it isn’t also making subtle errors.


This is the thing. ChatGPT shouldn't be that confident in its wording IMO. If it just said "according to me" instead of stating things as fact, people would have far fewer problems with it.

We know this wording is just show but people still get swayed by it and believe it must be true.


Yeah and it can actually reflect on itself and its mistakes when prompted, so I can see a fix like this coming soon. Sometimes just asking, Are you sure? is enough for it to apologize and say a part of its answer wasn't based in known fact.

Also another point while I'm here: Many many humans I've met are often confident when incorrect as well and can and will bullshit an answer when it suits their comfort.


But no one asserts the existence of those people will forever change society.


People are confidently wrong all the time and yet we still seem to get stuff done.

A tool is a tool. It has good uses and not-so-good uses. As the human, you figure out where it works and where it doesn't.


Every time the human supervising it makes a mistake in their role in the relationship between user and tool.


There's a lot of work happening at the moment around self-reflection and getting LLMs to identify and correct their own hallucinations and mistakes.

https://arxiv.org/pdf/2303.11366.pdf


I suspect some temporality will need to be added. There are times when writing the code you have a question because the code exposes an unexpressed choice in the requirements. When you are coding in linear time, you then know to go ask the question. I am not sure that just generating the most likely or most rewarded response will do that easily. It seems to just arbitrarily pick the most likely requirement.


Every time they are wrong, which is every time the user slips on a key.


How often are people?


I think the difference, right now at least, is that people will go, "well, I'm not sure about this so I think we should look it up, but this is what I think" - the AI doesn't do that. It lies in the same exact way it tells truths. How are you supposed to make decisions based off of that information?


Does it lie? Or just get things wrong sometimes?

Lying requires knowledge that what you are saying is not the truth, and usually there's a motive for doing so.

I don't think ChatGPT is there yet... or is it?


Technically, what ChatGPT is doing is bullshitting because it doesn't have any knowledge of or concern for truthfulness.

https://en.m.wikipedia.org/wiki/On_Bullshit


Sure, it's not lying, you're right, there's no will there, I'm anthropomorphizing. It is producing entirely wrong facts / pseudo-opinions (as it can't actually have an opinion).


I was about to suggest "pathologically dishonest", but then I looked up the term and that seems to require being biased in favour of the speaker and knowing that you're saying falsehoods.

"Confabulate" however, appears to be a good description. Confabulation is, I'm told, associated with Alzheimer's, and GPT's output does sometimes remind me of a few things my mum said while she was ill.


Presumably the same way you make decisions on any piece of information. You should not be blindly trusting a single source.


I was too vague I think. The only place where I can see it being acceptable right now is code because I have a whole other system that will call out failures - I can rely on my IDE and my own expertise to hopefully catch issues when they appear.

Outside of the code use case, what should I rely on ChatGPT for that won't have me also looking for the information somewhere else? I suppose subjective soft things, like writing communications. But I can't rely on it for information.


Again, the idea that you should rely on any single source for information is the issue. Nothing changes with ChatGPT other than the apparent expectation that it is infallible.


So what are the use cases that I would use ChatGPT to find information that would speed up my work but would still require me to verify the information? If nothing changes with ChatGPT what is its use as a tool (assuming you want to use it to get information)?


It seemed to do a good job of outlining a JoJo season in the style of a Shakespearean comedy.

I wouldn’t ride in a vehicle it designed tho, based on my week of asking it to do Go programming.


Sometimes it does, but I asked ChatGPT (not 4) to give me song lyrics for a song that it should have had data for, and it gave me entirely wrong lyrics. I asked again and it gave me more bad lyrics, not even close, and it didn't even pretend it didn't know; the lyrics would have been convincing. If I didn't already know the material I wouldn't know it was confabulating.


Calculators use floating point and can have catastrophic errors if not used correctly. So yes, calculators can be confidently wrong.


The equivalent is more like having your accountants tell ChatGPT what they want the software you already have to do. You're adding an extra thing into the mix, and while the person will ostensibly be checking the AI's work, they will eventually become dependent on it. When something goes wrong nobody will know how to fix it. You'll miss payroll or something and be absolutely fucked.

As a software engineer, I'm not concerned about people using AI to write simple functions. That's not where my value is - it's absolutely incidental to me.


I'm going to open a law firm, and instead of using AI to generate volumes of documents, I will write them all, in cursive.


This comment is not useful. Calculators are not a replacement for human intelligence and creativity.


Accounting firms don't use calculators that often. They mostly use "What number do you need from us to give you fantasized credit ratings to fool the taxpayers". Accounting firms would benefit greatly from biased AI.

