Although I think if you asked people 20 years ago to describe a test for something AGI would do, they would be more likely to say “writing a poem” or “making art” than “turning Xunit code to Tunit”
IMO, if you said to someone in the 90s “well, we invented something that can tell jokes, make unique art, write stories and hold engaging conversations, although we haven’t yet reached AGI because it can’t transpile code accurately - I mean, it can write full applications if you give it some vague requirements, but they have to be reasonably basic; the sort of thing a junior dev could write in a day, it can write in 20 seconds - so not AGI” they would say “of course you have invented AGI, are you insane!!!”.
LLMs to me are still a technology of pure science fiction come to life before our eyes!
Tell them humans need to babysit it and double-check its answers to do anything, since it isn't as reliable as a human, and no, they wouldn't have called it AGI back then either.
The whole point of AGI is that it is general like a human; if it has glaring weaknesses like current AI does, it isn't AGI, and that was just as true back then. That an AGI can write a poem doesn't mean being able to write a poem makes something an AGI; it's just an example of what AI couldn't do 20 years ago.
Why do human programmers need code review then if they are intelligent?
And why can’t expert programmers deploy code without testing it? Surely they should just be able to write it perfectly first time without errors if they were actually intelligent.
> Why do human programmers need code review then if they are intelligent?
Human programmers don't need code reviews; they can test things themselves. Code review is just an optimization for scaling teams, not a requirement for writing programs.
Also, the AGI is allowed to have another AGI review its code; the point is that there shouldn't be a human in the loop.
> And why can’t expert programmers deploy code without testing it?
They can test it themselves, and the AGI model is likewise allowed to test its own work.
Well, AGI can write unit tests, write application code, then run the tests and iterate - agents in Cursor are doing this already.
Just not for more complex applications.
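That write-tests-then-iterate loop can be sketched minimally in Python; the model call is faked with a list of canned attempts, so every name here is a hypothetical stand-in rather than any real agent API:

```python
# Hedged sketch of an agent's write-test-iterate loop.
# The "LLM" is simulated by a list of candidate implementations.

def run_tests(impl):
    # A stand-in test suite: the function under test must double its input.
    try:
        return impl(2) == 4 and impl(0) == 0
    except Exception:
        return False

# Pretend successive model attempts, from buggy to correct.
candidates = [
    lambda x: x + 2,   # passes impl(2) == 4 but fails impl(0) == 0
    lambda x: x * 2,   # correct
]

solution = None
for attempt in candidates:   # iterate until the tests pass
    if run_tests(attempt):
        solution = attempt
        break

print(solution(21))  # 42
```

The loop stops at the first candidate that satisfies the tests, which is the essence of what the agent workflow automates.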
Code review does often find bugs in code…
Put another way, I’m not a strong dev, but good LLMs can write lots of code with fewer bugs than me!
I also think it’s quite a “programmer mentality” that most of the tests in this forum for whether something is or isn’t AGI ultimately boil down to whether it can write bug-free code, rather than whether it can negotiate or sympathise or be humorous or write an engaging screenplay… I’m not saying AGI is good at those things yet, but it’s interesting that we talk about the test of AGI being transpiling code rather than understanding philosophy.
> Put another way, I’m not a strong dev but good LLMs can write lots of code with less bugs than me!
But the AI still can't replace you: it doesn't learn as it goes, and therefore fails to navigate long-term tasks the way humans do. When a human writes a big program, he learns how to write it as he writes it; current AI cannot do that.
You presumably understand the poster's underlying point, though: the definition of 'general intelligence' does not need to be 'at an above-average human level', and humans can be intelligent without being able to use a computer or do some sort of job on a VM.
This is an incredibly specific test/definition of AGI - particularly remembering that I would probably say an octopus classes as an intelligent being, yet it can't use Outlook...
In comp sci it’s been deterministic, but in other science disciplines (eg medicine) it’s not. Also in lots of science it looks non-deterministic until it’s not (eg medicine is theoretically deterministic, but you have to reason about it experimentally and with probabilities - doesn’t mean novel drugs aren’t technological advancements).
And while the kind of errors hasn’t changed, the quantity and severity of the errors has dropped dramatically in a relatively short span of time.
It's the whole answer being correct that's the important thing, and if you compare GPT-3 with where we are today, only 5 years later, the progress in accuracy, knowledge and intelligence is jaw-dropping.
> Heathrow carries over 50% of air freight and is a major hub.
Not denying it, but it does depend on what you're sending.
For example, if you send something by DHL, it has a significantly greater chance of going through East Midlands Airport than it does Heathrow.
Same for UPS and others. The bulk of their recent investments have been away from Heathrow.
The non-Heathrow sites have better road connections, and more importantly for air cargo, the noise abatement rules at non-Heathrow sites are more relaxed.
The other problem with Heathrow is that BA have their finger in the pie and hold too many slots, which limits any growth on the independent freight side.
Heathrow has effectively hit its capacity limit. That may or may not change if they ever build the third runway.
Heathrow undoubtedly does the most air cargo. Sure, express freight often comes into EMA on dedicated flights, but lots of freight comes in the hold of passenger aircraft, and that’s where Heathrow is king. The lack of passenger traffic is undoubtedly a key reason why EMA only does 1/5th of Heathrow’s air cargo, and, as you have noted, it’s ideally located to serve a lot of the UK.
Dragging downloads into the applications folder is 100% amazing imo!
It's because conceptually everything is supposed to be a standalone executable.
The windows alternative is random installers that put files all over your machine and then questionable uninstallers that still leave half the files discarded everywhere.
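The "everything is a standalone executable" paradigm can be sketched in shell. A macOS .app is just a self-contained directory tree, so "install" is one copy and "uninstall" is one delete; `Example.app` and the `/tmp/Applications-demo` target below are hypothetical stand-ins built from plain directories, so this runs anywhere:

```shell
# Build a minimal stand-in for an app bundle's directory layout.
mkdir -p Example.app/Contents/MacOS Example.app/Contents/Resources
printf '<plist version="1.0"/>\n' > Example.app/Contents/Info.plist

# "Installing" is a single copy (drag to /Applications, simulated here);
# "uninstalling" is a single delete. No installer scattering files around.
mkdir -p /tmp/Applications-demo
cp -R Example.app /tmp/Applications-demo/
rm -rf Example.app

# The whole app still lives under one directory.
ls /tmp/Applications-demo/Example.app/Contents
```

Everything the app ships with stays inside that one directory, which is exactly why dragging to the Applications folder is the whole install process.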
But they don't know that so they don't care.
macOS is very appealing to authoritarian control freaks because it gives them the impression of being "in control"; of course, the reality is quite different.
One particularly dumb thing is that apps sometimes ship assets in their App Bundle that will be modified or used as templates, and these get copied to their Application Support folder. So there is no application install process per se, but one effectively happens at first launch, and now you are wasting hard drive space storing files that have been duplicated somewhere else.
macOS/iOS App Bundles are extremely large for many cumulative reasons like that. I will note that Apple has no interest whatsoever in improving the situation considering how much they charge for storage...
ChromeOS balances the different paradigms reasonably well with Crostini and the ability to run Linux-based applications with native hardware virtualization. Solves for every power-user case you can't get from Android or the browser.
I recommend the StarLite if you don’t depend primarily on battery power. Personally I won’t be going back to Apple’s golden handcuffs, and would never consider the user-hostile MS.
It looks great! Although the demo shows horrible security practices...
Clearly authentication shouldn't rely on prompt engineering.
Particularly when at the end of the demo it says "we have tested it again and now it shows that the security issue is fixed" - no, it's not fixed, it's hidden! It's still a gaping security hole. Clearly just a very bad example, particularly considering the context is banking.
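The point about not relying on prompt engineering can be sketched as a hard gate in the tool layer. This is a minimal Python sketch with hypothetical names (`Session`, `verify_identity`, `get_balance` are illustrative, not any real framework's API):

```python
# Sketch: enforce identity checks in code, outside the prompt.
# The model can only reach data through these functions, so no
# amount of clever prompting can talk its way past the gate.

class Session:
    def __init__(self):
        self.verified = False

def verify_identity(session, answer, expected):
    # Stand-in for a real auth flow (OTP, knowledge check, etc.).
    session.verified = (answer == expected)
    return session.verified

def get_balance(session, accounts):
    # Hard gate: runs entirely outside the prompt.
    if not session.verified:
        raise PermissionError("identity not verified")
    return accounts["balance"]

s = Session()
accounts = {"balance": 1234.56}
try:
    get_balance(s, accounts)          # blocked before verification
except PermissionError:
    print("denied")
verify_identity(s, "secret", "secret")
print(get_balance(s, accounts))       # prints 1234.56
```

The contrast with the demo is that here a "jailbroken" prompt changes nothing: the check lives in the tool call, not in the instructions the model was given.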
Appreciate the feedback! Completely agree - authentication should be handled at the system level, not just in prompts. This demo is meant to showcase how teams can build test cases from real failures and ensure fixes work before deployment. We’ll consider using a better example.
Appreciate the feedback! To clarify, Roark isn’t handling authentication itself - it’s a testing and observability tool to help teams catch when their AI fails to follow expected security protocols (like verifying identity before sharing sensitive info).
That said, totally fair point that this example could be clearer; we’ll keep that in mind for future demos. Thanks for calling it out!
They agreed to set up service in the Falklands on the condition that they be allowed to have a monopoly. If you take that away, you’re breaking a deal you made.
You don’t have to break the deal, just serve notice.
A bit of googling on this agreement shows that there is a 5 year notice clause, so they can just serve notice now if they want. No need to pay anyone off.
Presumably because they have a piece of legally binding paper saying they do? Changing the law to cut out the monopoly is the government reneging on the agreement they made with the monopoly to pony up the cash and presumably a tort.
The article says that that’s the status quo but Starlink aren’t ok with it. Not enforcing the law would also seem to be actionable under any agreement signed though. Finally, any precedent set here will make it much harder for the government to strike any other kind of deal next time it wants to encourage a company to invest.
They could just exit using the clauses in the agreement, though; the one they have seems to include a 5 year notice clause.
I’m not saying break contracts, just saying there are usually sensible alternatives to exiting these sort of things that don’t involve just paying someone off.