Hacker News | Closi's comments

Although I think if you asked people 20 years ago to describe a test for something AGI would do, they would be more likely to say “writing a poem” or “making art” than “turning Xunit code to Tunit”

IMO I think if you said to someone in the 90s “well we invented something that can tell jokes, make unique art, write stories and hold engaging conversations, although we haven’t yet reached AGI because it can’t transpile code accurately - I mean it can write full applications if you give it some vague requirements, but they have to be reasonably basic, like only the sort of thing a junior dev could write in a day it can write in 20 seconds, so not AGI” they would say “of course you have invented AGI, are you insane!!!”.

LLMs to me are still a technology of pure science fiction come to life before our eyes!


Tell them that humans need to babysit it and double-check its answers before anything can be done with them, since it isn't as reliable as a human, and no, they wouldn't have called it AGI back then either.

The whole point of AGI is that it is general like a human; if it has glaring weaknesses like the current AI has, it isn't AGI, and the same was true back then. That an AGI can write a poem doesn't mean that being able to write a poem makes something an AGI; it's just an example of what AI couldn't do 20 years ago.


Why do human programmers need code review then if they are intelligent?

And why can’t expert programmers deploy code without testing it? Surely they should just be able to write it perfectly first time without errors if they were actually intelligent.


> Why do human programmers need code review then if they are intelligent?

Human programmers don't need code reviews; they can test things themselves. Code review is just an optimization for scaling up, not a requirement for making programs.

Also, an AGI is allowed to have another AGI review its code; the point is that there shouldn't be a human in the loop.

> And why can’t expert programmers deploy code without testing it?

They can test it themselves, and likewise the AGI model is allowed to test its own work.


Well, AI can write unit tests, write application code, then run the tests and iterate - agents in Cursor are doing this already.

Just not for more complex applications.
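The write/run/iterate loop described above can be sketched as follows. This is a minimal sketch, not any particular agent's implementation: `run_tests` and `generate_patch` are hypothetical callbacks standing in for the test runner and the LLM rewrite step.

```python
def iterate_until_green(run_tests, generate_patch, max_attempts=5):
    """Agent loop: run the tests, hand any failure log back to the model
    for another patch, and repeat until the suite is green or attempts run out.

    run_tests:      () -> (passed: bool, log: str)
    generate_patch: (log: str) -> None   # the LLM rewrite step (hypothetical)
    """
    for _ in range(max_attempts):
        passed, log = run_tests()
        if passed:
            return True          # suite is green: stop iterating
        generate_patch(log)      # feed the failure output back to the model
    return False                 # give up after max_attempts
```

The key design point is that the failure log, not a human, drives the next rewrite, which is exactly why the loop works for simple tasks and stalls on complex ones.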

Code review does often find bugs in code…

Put another way, I’m not a strong dev, but good LLMs can write lots of code with fewer bugs than me!

I also think it’s quite a “programmer mentality” that most of the tests in this forum for whether something is or isn’t AGI ultimately boil down to whether it can write bug-free code, rather than whether it can negotiate or sympathise or be humorous or write an engaging screenplay… I’m not saying AGI is good at those things yet, but it’s interesting that we talk about the test of AGI being transpiling code rather than understanding philosophy.


> Put another way, I’m not a strong dev, but good LLMs can write lots of code with fewer bugs than me!

But the AI still can't replace you; it doesn't learn as it goes, and therefore fails to navigate long-term tasks the way humans do. When humans write a big program, they learn how to write it as they write it; current AI cannot do that.


Strictly speaking, it can, but its ability to do so is limited by its context size.

Which keeps growing - Gemini is at 2 million tokens now, which is several books' worth of text.

Note also that context is roughly the equivalent of short-term memory in humans, while long-term memory is more like RAG.
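That short-term/long-term analogy can be sketched as a toy model. This is purely illustrative, with made-up names: the real systems use embeddings and vector search for retrieval, not the keyword overlap used here.

```python
from collections import deque

class Memory:
    """Toy model of the analogy above: the context window acts as verbatim
    short-term memory, while retrieval over a store acts as long-term memory."""

    def __init__(self, context_limit=4):
        self.context = deque(maxlen=context_limit)  # recent turns, verbatim
        self.long_term = []                         # everything ever seen

    def remember(self, text):
        self.context.append(text)    # oldest turn falls out when full
        self.long_term.append(text)  # nothing falls out here

    def retrieve(self, query):
        """Crude stand-in for RAG: keyword overlap instead of embeddings."""
        words = set(query.lower().split())
        return max(self.long_term,
                   key=lambda t: len(words & set(t.lower().split())),
                   default=None)
```

The point of the analogy: what falls out of `context` is genuinely gone from the model's working state, but a retrieval step can pull it back in, just as recalling a long-term memory brings it into short-term memory.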


You presumably understand the poster's underlying point, though - that the definition of 'general intelligence' does not need to be 'at above-average human level', and that humans can be intelligent without being able to use a computer or do some sort of job on a VM.

This is an incredibly specific test/definition of AGI - particularly when you remember that I would probably say an octopus counts as an intelligent being, yet it can't use Outlook...

In comp sci it’s been deterministic, but in other science disciplines (e.g. medicine) it’s not. Also, in lots of science it looks non-deterministic until it isn’t (e.g. medicine is theoretically deterministic, but you have to reason about it experimentally and with probabilities - that doesn’t mean novel drugs aren’t technological advancements).

And while the kind of errors hasn’t changed, the quantity and severity of the errors has dropped dramatically in a relatively short span of time.


The problem has always been that every token is suspect.

It's the whole answer being correct that's the important thing, and if you compare GPT-3 with where we are today, only five years later, the progress in accuracy, knowledge and intelligence is jaw-dropping.

I have no idea what you're talking about, because they still screw up in exactly the same ways as GPT-3.

The hallucination quantity and severity are way lower in new frontier models.

But not more predictable or regular.

> 99% of air cargo to the UK does NOT come to Heathrow.

Not even slightly true - Heathrow carries over 50% of air freight and is a major hub.

(https://www.heathrow.com/company/cargo)


> Heathrow carries over 50% of air freight and is a major hub.

Not denying it, but it does depend on what you're sending.

For example, if you send something by DHL, it has a significantly greater chance of going through East Midlands Airport than it does Heathrow.

Same for UPS and others. The bulk of their recent investments have been away from Heathrow.

The non-Heathrow sites have better road connections, and more importantly for air cargo, the noise abatement rules at non-Heathrow sites are more relaxed.

The other problem with Heathrow is that BA has its fingers in the pie and holds too many slots, which limits any growth on the independent freight side.

Heathrow has effectively hit its capacity limit. That may or may not change if they ever build the third runway.


> Not denying it

Your original post did though!

Heathrow undoubtedly does the most air cargo. Sure, express freight often comes into EMA on dedicated flights, but lots of freight comes in the hold of passenger aircraft, and that’s where Heathrow is king. The lack of passenger traffic is undoubtedly a key reason why EMA only does a fifth of Heathrow’s air cargo, even though, as you have noted, it’s ideally located to serve a lot of the UK.


It's really quite incredible how people just make shit up, even on this board.


Dragging downloads into the applications folder is 100% amazing imo!

It's because conceptually everything is supposed to be a standalone executable.

The Windows alternative is random installers that put files all over your machine, and then questionable uninstallers that still leave half the files scattered everywhere.


Except a lot of applications aren't standalone executables, they leave a bunch of configuration data that doesn't get uninstalled.


But they don't know that, so they don't care. macOS is very appealing to authoritarian control freaks because it gives them the impression of being "in control"; of course the reality is quite different.

One particularly dumb thing is that apps sometimes ship assets in their App Bundle that will be modified or used as templates, and these get copied to their Application Support folder. So there is no application install process per se, but one effectively happens at first launch, and now you are wasting hard drive space storing files that are duplicated somewhere else. macOS/iOS App Bundles are extremely large for many cumulative reasons like that. I will note that Apple has no interest whatsoever in improving the situation, considering how much they charge for storage...
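You can estimate the wasted space yourself. A rough sketch, with hypothetical paths ("SomeApp" is a placeholder, not a real app): sum the bytes under the bundle's resources and under the Application Support copy.

```python
import os

def dir_size(path):
    """Total bytes of regular files under path (skips anything non-file)."""
    total = 0
    for root, _, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if os.path.isfile(fp):
                total += os.path.getsize(fp)
    return total

# Hypothetical usage: compare an app bundle's resources to the copy
# it made in Application Support at first launch.
# bundle = "/Applications/SomeApp.app/Contents/Resources"
# support = os.path.expanduser("~/Library/Application Support/SomeApp")
# print(dir_size(bundle), dir_size(support))
```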


Windows OS is bad for a tablet compared to iPadOS.

iPadOS is bad as a laptop compared to Windows.

Nobody yet has been able to fully square the two different paradigms.


ChromeOS balances the different paradigms reasonably well with Crostini and the ability to run Linux-based applications with native hardware virtualization. Solves for every power-user case you can't get from Android or the browser.


As soon as I'm running virtualization to get an app to run, the OS has already lost, IMO.


It doesn't balance the privacy reasonably well.


I recommend the Starlite if you don’t depend primarily on battery power. Personally I won’t be going back to Apple’s golden handcuffs, and would never consider the user-hostile MS.


Operating on battery power is usually a pretty crucial requirement for a tablet.


It works for five hours or so on battery. But I mostly watch movies on it while plugged in.


This is Nuance, which Microsoft acquired in 2021 :)


It looks great! Although the demo shows horrible security practices...

Clearly authentication shouldn't rely on prompt engineering.

Particularly when at the end of the demo it says "we have tested it again and now it shows that the security issue is fixed" - no, it's not fixed! It's hidden! It's still a gaping security hole. Clearly just a very bad example, particularly considering the context is banking.


Appreciate the feedback! Completely agree - authentication should be handled at the system level, not just in prompts. This demo is meant to showcase how teams can build test cases from real failures and ensure fixes work before deployment. We’ll consider using a better example.


Your post suggests authorization as a feature:

> For each replay that we run, Roark checks if the agent follows key flows (e.g. verifying identity before sharing account details)

I don't know if AI will be more susceptible or less susceptible to phishing than humans, but this feels like a bad practice.


Appreciate the feedback! To clarify, Roark isn’t handling authentication itself - it’s a testing and observability tool to help teams catch when their AI fails to follow expected security protocols (like verifying identity before sharing sensitive info).

That said, totally fair point that this example could be clearer—we’ll keep that in mind for future demos. Thanks for calling it out!


Again though, verifying identity before sharing sensitive info shouldn’t be down to the LLM following its prompt - it should be enforced by design.


Or just allow competitors in?

Why does the monopoly need paying off? It’s risk/reward for them; if they end up losing money on their venture, why should the taxpayer foot the bill?


They agreed to set up service in the Falklands on the condition that they be allowed a monopoly. If you take that away, you’re breaking a deal you made.


You don’t have to break the deal, just serve notice.

A bit of googling on this agreement shows that there is a 5 year notice clause, so they can just serve notice now if they want. No need to pay anyone off.

(https://www.openfalklands.org.fk/leos-and-vsat-legislation-i...)


> Why does the monopoly need paying off?

Presumably because they have a legally binding piece of paper saying they do? Changing the law to cut out the monopoly would be the government reneging on the agreement it made with the monopoly, and presumably a tort - hence the need to pony up the cash.


They could also simply fail to enforce the law. Or use that as a threat to come to a negotiated settlement with the monopoly.


> They could also simply fail to enforce the law

The article says that that’s the status quo, but Starlink isn’t OK with it. Not enforcing the law would also seem to be actionable under any agreement signed. Finally, any precedent set here will make it much harder for the government to strike any other kind of deal next time it wants to encourage a company to invest.


The reputation and credit rating of governments falls if they fail to respect contracts they've signed under their own laws.


They could just exit using the clauses in the agreement, though - the one they have seems to include a 5-year notice clause.

I’m not saying break contracts, just that there are usually sensible alternatives for exiting these sorts of things that don’t involve just paying someone off.


I would imagine there is a contract stipulating that they’re a monopoly because otherwise the economics don’t work out.


There is, but it has a 5-year notice period that can be invoked.

