Right on, I couldn’t agree more. We are living during a period of exponential progress. I like AlphaCode’s approach of using language models with search. In the last year I have experimented with mixing language models for NLP with semantic web/linked data tasks, so much simpler than what AlphaCode does, but I have been having fun. I have added examples for this in new additions to two of my books, but if you want a 90 second play-time, here is a simple colab notebook https://colab.research.google.com/drive/1FX-0eizj2vayXsqfSB2...
We live in exciting times. Given exponential rates of progress, I can’t really even imagine what breakthroughs we will see in the next year, let alone the next five years.
EDIT: I forgot to mention: having GitHub/OpenAI tools like CoPilot always running in the background in PyCharm and VSCode has in a few short months changed my workflow, for the better.
I mostly work with data mining on my personal projects (a couple of hours every day), and I'm pretty sure I haven't had to write a single regex since I started using Copilot. It's hard for me to even imagine how I used to do it before, and how much time I've wasted on stupid mistakes and typos. Now I just write a comment describing what I want, plus an example string. It does the job without me having to modify anything 99% of the time, even for complex stuff. Sure, sometimes it gives an overly complicated solution, but it almost always works. Exciting times.
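To give a concrete (made-up) illustration of the kind of prompt I mean: a comment plus a sample string, and the sort of pattern that comes back. This isn't an actual Copilot transcript, just the shape of the workflow:

    import re

    # Extract the date from a log line like the example below.
    EXAMPLE = "2022-02-03 12:34:56 ERROR something broke"
    DATE_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})")

    m = DATE_RE.match(EXAMPLE)
    if m:
        year, month, day = m.groups()
        print(year, month, day)  # -> 2022 02 03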
I never said anything about not understanding them.
If it's something simple, like getting a date from a string, a quick glance will tell me if it'll work. If it's more complex, a quick glance or two will give me the general idea and then I can test it against what I think will be edge cases.
If you don't know what kind of string you'll have to process, then you can't really know whether any regex is correct; and if you do know, testing it is in most cases pretty easy and quick. You'd have to test even if you wrote it yourself.
And in the cases where it's wrong it usually gives me a good starting point.
Who cares if they're correct? If they fail a test, you can fix them. If they turn out to be wrong in production you can isolate the example and add it as a test. Producing regexes that pass all current tests but contain a subtle bug satisfies 100% of what programmers are incentivized to do. Producing them very quickly will get you promoted.
I think we will very soon start seeing a clear separation between programmers ("co-pilot operators") and software engineers (those who do the thinking and understanding when there's someone "who cares").
We already have that bifurcation in the sense that the majority of programmers can use frameworks, libraries, etc. to get work done but don't have the deeper knowledge to build them well.
With a statement like "satisfy 100% of what programmers are incentivized to do", I feel like you see programmers only as cogs, and not as people invested in the success of their product.
Generally the people using regexes care if they're correct. Frequently, all possible input variants are not enumerated in tests. Frequently, companies want to have confidence in their production code. Imagine this regex is deployed on a sign up flow, and its failures invisibly increase your churn rate. Can it happen with a hand crafted regex? Yes, of course. But I'd imagine it will happen even more frequently with an AI produced custom regex plus a person who doesn't actually understand regexes.
Programmers often feel invested in the success of their product but that's not what they're incentivized to do. They're incentivized to produce fast results that are bad in ways that you have to be a programmer to understand.
If you have to be a programmer to understand why something's bad, who's going to prevent it? This is a major unsolved problem in the structure and organization of working.
The CTO who used to be a dev. More generally anyone in management with a technical background. They may not exist in some companies, but that's not because it's "a major unsolved problem in the structure and organization of working", it's because the company sucks in that regard.
Except testers and users aren’t programmers. Code reviews are what’s supposed to catch this stuff, but it’s rare for a team lead or other programmer to investigate every single commit.
Thanks for the good example. The first time I tried OpenAI's assistant (before I had access to GitHub CoPilot) I wrote a comment in Javascript code that read something like "perform SPARQL query to find out where Bill Gates works" and code was generated using an appropriate library, the SPARQL (like SQL) query was correct, and the code worked. Blew my mind.
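For the curious, here is roughly what that looks like, written by hand in Python with SPARQLWrapper against Wikidata (the generated code was Javascript with a different library; wd:Q5284 and wdt:P108 are, as far as I know, the Wikidata IDs for Bill Gates and "employer"):

    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
    sparql.setQuery("""
        SELECT ?employerLabel WHERE {
          wd:Q5284 wdt:P108 ?employer .   # Q5284 = Bill Gates, P108 = employer
          SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
        }
    """)
    sparql.setReturnFormat(JSON)
    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["employerLabel"]["value"])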
I installed the VSCode and PyCharm CoPilot plugins, and signed in with my GitHub account (you first need to request CoPilot access and wait until you get it).
As you type comments or code, CoPilot will sometimes autocomplete up to about 10 lines of code, based on the content of the file you are editing (maybe just the code close to the edit point?).
My other tools? I use LispWorks Professional, Emacs with Haskell support, and sometimes IntelliJ. I work a lot on remote servers so I depend on SSH/Mosh and tmux also.
For me both are true. The suggestions are usually annoying and wrong; however, sometimes Copilot makes an observation that is non-obvious, i.e. it suggests code that makes me realize that I was about to write a subtle bug. I keep it on as a sanity check.
When you're on that curve, it's indistinguishable until you hit the plateau. We're in an era where AI is continuing to improve and has already surpassed a level that many people doubted was achievable. Nobody knows when that progress will plateau. It's entirely possible that we plateau _after_ surpassing human-level intelligence.
If you have actual data you can take the derivative of the curve and tell, with a lot of confidence, that you're on an S-curve by the time you hit the middle, long before the plateau: https://miro.medium.com/max/700/1*6A3A_rt4YmumHusvTvVTxw.png
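A minimal sketch of that idea (my own toy example, not real progress data): look at the successive increments and check whether they are still growing or have started to shrink.

    import numpy as np

    def growth_phase(values):
        increments = np.diff(values)        # discrete first derivative
        acceleration = np.diff(increments)  # discrete second derivative
        if np.all(acceleration > 0):
            return "still accelerating: looks exponential so far"
        return "decelerating: consistent with the middle/late part of an S-curve"

    # Toy data: a logistic curve sampled past its midpoint.
    t = np.arange(12)
    logistic = 100 / (1 + np.exp(-(t - 6)))
    print(growth_phase(logistic))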
Exactly correct. We really don't know if the progress is exponential or like a Sigmoid squashing function. You just changed my opinion, a bit, on this.
The AI technology today has practical value for some use cases but it's basically just clever parlor tricks. There has been near zero discernable progress toward artificial general intelligence. We don't yet have a computer that can learn and make optimal resource usage decisions in an open world environment as well as a mouse. In most respects we're not even at the insect level yet.
Yes Excel has more features today. So what? We're still not making any measurable progress towards true AGI. We don't even know what the development path or ultimate goal looks like except for some vague hand waving about passing the Turing Test.
I work for a company that provides "ai" services in a "boring" domain. The time saved and the accuracy (for a legally audited result) we provide are quantifiable, and when I saw the numbers I was surprised. So while a lot of the "AI" hype is stupid marketing bs, some of us are actually doing real stuff; it's not just parlor tricks. You might also be surprised by how many domains there are in which a small improvement has huge yield in terms of raw money and/or quality of life.
The steep part of a sigmoid curve DOES grow almost exponentially though. What's the point of saying this? It's like saying, "sure we are making great progress, but eventually the universe will come to an end and we'll all be dead." Who cares? Why not worry about that when we're there?
The point is that it is a completely different framework to think about the future.
The exponential model is close to what is behind Ray Kurzweil's reasoning about the Singularity and how the future will be completely different and we're all going to be gods or immortals or doomed or something dramatic in that vein.
The S-curve is more boring: it means that the future of computing technology might not be that mind-blowing after all, and that we might already have reaped most of the low-hanging fruit.
A bit like airplanes, or space tech, you know; have you seen those improving by a 10x factor recently?
100% this. Exponential progress is only a thing if potential progress is infinite. If the potential progress is finite (hint: it is), the rate of progress eventually hits progressively diminishing returns.
The underlying rule here, in my opinion, is the law of diminishing returns. (log shaped curve)
AlphaZero is already capable of optimizing itself in the limited problem space of Chess.
Infinitely increasing the computing power of this system won't give it properties it does not already have, there is no singularity point to be found ahead.
And I am not sure that there are any singularities lying ahead in any other domains with the current approach of ML/AI.
And building on that, the real bottleneck in most domains isn't going to be computer power, it's going to be human understanding of how to curate data or tweak parameters.
We've already seen, in games with simple rules and win conditions, that giving computers data on what we think are good human games can make them perform worse than not giving them data. Most problems can't be encapsulated perfectly by humans in a set of rules and win conditions that leave processing power to just fill in the details. And while curating data and calibrating learning processes is an area we've improved on hugely to get where we are with ML, it's not something where human knowledge seems more likely to reach an inflection point than to hit diminishing returns.
Amazing stuff for sure. Looking at the example on page 59, though, I certainly see a description that contains sufficient information to implement against. I read this, and then I jump back into the tech spec that I'm writing to find:
(1) The product specification and use cases are so poorly defined that I need to anticipate the use cases, design a system that is general enough to accommodate them, and implement it in a way that is easily changeable to accommodate the future departures from my assumptions.
(2) As I do this, I need to consider the existing systems that I'm building on top of to ensure there is no regression when this feature rolls out
(3) I consider the other teams that are doing similar work and make judgement calls about whether to write independent systems that do one thing each, or to collaborate on a general-enough system with multiple team ownership.
(4) The tech that I use to implement this must be within the narrow slice of company-sanctioned tech.
(5) I weigh constant tradeoffs on speed to market, maintainability and ownership.
I'm sure there's more, but this stuff is _hard_. If autonomous driving for white collar work is coming, as put forth by comments here, I'd like to see indications that the actual hard part of the job is in jeopardy of being executed effectively.
Maybe I don't want to believe it, so I can't see it. I'll grant that. But I truly do not see it.
I'm mostly with you for the immediate future. So even if cars driving themselves to our offices to write code for 8 hours is in the cards I'd be curious to hear more tactical informed guesses. What would an intermediate stage look like?
Would the demand for junior developers evaporate if more experienced people can 10x their daily LoC productivity? Or the other way around? Would languages with higher-level abstractions (e.g. comparable to Scala if not Haskell) win, or would something like JS dominate?
I debated where in the whole thread, if anywhere, to comment that it is unfair to call it a dog. It is unfair to dogs, and misleading to any human trying to form useful abstractions about it. It lacks any of the dog's general intelligence and social skill which are integral to how we humans think about the sophistication of a dog and understand its adaptability and utility. To me, it is about as accurate as characterizing a mannequin as a monkey with mediocre fashion skills.
But, your post makes me realize that a job could be threatened by an AI just superficially emulating a worker, because its existence might exploit the social vulnerabilities of the workplace, in spite of having no general intelligence nor exploitative social skills of its own.
A lot of people who are skeptical about AI progress call it "statistical modeling" and point to large data sets involved and large amounts of hardware thrown at it. Sort of implying it's some sort of a brute-force trick.
I'm afraid they do not understand the size of the problem/solution space. Suppose a problem and a solution are each 1000 characters long, over an alphabet of 32 characters. Then a model is a function F: X -> X where X is a set of 2^5000 elements, and there are |X|^|X| = (2^5000)^(2^5000) such functions; the goal of the training process is to find a good one.
The training set for a language model would be well under 1 PB, so the task here is to use roughly 2^50 bytes of training data to pick a function out of that space.
It's obvious that no amount of brute-forcing can possibly find it. And no classical mathematical statistical modeling can possibly help. A problem like this can only be approached via ANNs trained through backpropagation, and only because this training process is known to generalize.
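A quick back-of-the-envelope check of those numbers (my own arithmetic, following the 1000-character / 32-symbol setup above):

    import math

    ALPHABET = 32
    LENGTH = 1000

    log2_X = LENGTH * int(math.log2(ALPHABET))   # |X| = 2**5000 possible strings
    # Number of functions F: X -> X is |X| ** |X|; its log2 is |X| * log2|X|.
    log2_num_functions = (2 ** log2_X) * log2_X  # exact integer; Python handles it fine
    log2_training_set = 50                       # ~1 PB of training data = 2**50 bytes

    print(log2_X)                        # 5000
    print(len(str(log2_num_functions)))  # ~1509 digits just to write down the exponent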
We are still inferring a best fit based on available data, so “statistical modelling” is far more descriptive of what we are doing than “intelligence”. That doesn’t mean it isn’t impressive. These non-classical statistical inferences go beyond what we have been able to do with classical statistics, which is indeed impressive.
Personally I only grow skeptical when claims are made about some vague Artificial General Intelligence or some other magic which no amount of inference is gonna give us.
Given that NN components are demonstrated to have universal computation properties, it is more like deriving a program from training data. Technically the program here is a 'curve', but it's an insanely complex one.
Do you consider programming math? Since technically you write a function which deals with 0s and 1s.
There is no need to reduce programming to math (even though you can technically do so), just as there is no need to reduce ‘talking’ to math just because speech can be represented with numbers. In fact you can represent most things (probably all things) with numbers.
I do think it is interesting philosophically how much more powerful machine learning algorithms are than traditional linear models. And perhaps it is a good description to say that a “best fit program” is inferred. However, these algorithms are still limited to inference. And—while useful—calling them intelligent is not very accurate—at least in the way most people use the word—no matter how complicated a construct they are inferring.
I think diff inference might give us that: basically something like dreaming (generating) other samples and checking how they are different. This would help with understanding how a thing is defined, and maybe it works for all kinds of things.
Obviously it's tough to find such an algorithm that works for all types of data/embeddings.
I would think of it as fitting a function which is over-parameterised. So there is intrinsically a model of how things are thought to behave, it’s just a relatively simple one that is over-parameterised so it captures patterns in the data. Get it wrong and it’s over-fit, get it wrong another way and it doesn’t generalise, etc. There can also be an internal representation which is not easily interpretable, an internal representation which is usually of a dimensionality much lower than the number of parameters.
It is in this sense that it is a brute-force approach because we should like to know the underlying model, but instead we can throw a hugely over-parameterised but relatively simple model at a problem and it will learn (statistically) the underlying phenomenon. Like you say it does much better than a combinatorial brute-force approach.
As someone who is skeptical, but open minded, about the impact these technologies will have on practical programming I think I'm one of the "people" in "people are complaining..." The article makes some assumptions about what such people think that certainly aren't true for me:
1. That we are unimpressed.
I'm gobsmacked.
2. That we don't think these are significant advances.
They're obviously huge advances.
3. That we don't think these models will have practical applications.
It's hard to imagine they won't.
4. That we think these systems are rubbish because they get things wrong.
I'm a programmer. I make mistakes all the time.
Having countered those views, the article then seems to imply that it follows that "we’ve now entered a world where 'programming' will look different." As someone who makes a living writing software I obviously have an interest in knowing whether that's true. I don't see much evidence of it yet.
These systems are certainly not (yet) capable of replacing a human programmer altogether, and whether they could ever do so is unknown. I'm interested in the implications of the technologies that have been developed so far - i.e. with the claim that "we've now entered a world..." So the question is about how useful these systems can be for human programmers, as tools.
The reason I'm skeptical of it is that the only model I've seen so far for such tooling is "have the machine generate code and have the human review it, select from candidate solutions and fix bugs". The problem is that doing so is, I expect, harder for the human than writing the code in the first place. I've mentioned this concern several times and not seen anybody even attempt to explain to me why I'm wrong about it. For example, at [1] I pointed out why some generated solutions for a particular problem would have only made my job harder and got accused of "screaming at a child for imperfect grammar."
Reviewing and fixing code is harder than writing it. Please explain why I'm wrong about that (it's certainly true for me, but maybe most people don't feel that way?), why it won't be a problem in practice or what planned applications there are for these technologies that would avoid the problem.
Please don't accuse me of cruelty to dogs or children.
> Post: I realize that it “only” solves about a third of the contest problems, making it similar to a mediocre human programmer.
If this is what academia perceives software development to be then it's no wonder we have software that is so disconnected from the human problems it aims to solve.
Programmers don't routinely [re-]invent complex algorithms. We parse complex and contradictory requirements from humans, and compile them into (hopefully) simple solutions on a computer.
The solutions to "competition programming" problems are a Google search away. If you want to take it up as a professional sport then, sure, AI might just replace you (as it already has done with many other mind sports such as chess).
I agree that reviewing code is much much harder than writing something that works. Hasn't anyone here written what they thought was a smart, compact solution or feature, only to have it ripped to shreds during a code review?
I have worked in large decade-old codebases and sometimes the code is truly puzzling. Usually, an old-timer helps out when you run into a block and explains design decisions taken by people who moved on. This is the crucial factor that determines how long a task takes. A task could be resolved in a few hours with the help of a senior engineer, which otherwise could take days to weeks.
All said, I think simply generating the code will not be enough to replace programmers. Now, if an AI can generate code AND explain queries about logic, syntax, etc, the game is over.
> have the machine generate code and have the human review it
The best characterization of such massive language models I've seen is "world class bullshitters": they are very good at producing output that superficially looks plausible, but neither know nor care whether what they're saying has any relationship to the truth.
This is virtually guaranteed to make such code reviews very frustrating, and of course AlphaCode & co have no capability of explaining why they wrote a certain line. I can't see this having much of a role in a high quality code base, but I suspect the sort of management which currently would offshore coding to the lowest bidder would be quite enamored of it.
I think using these tools might become a science or an art form in its own right. You'll have to give these tools the input they need to produce the most useful answers to you. In the short term at least, this is not going to take away your need to think. But it might change how you think, and it might make you more productive when your problem aligns well with these tools.
I think that you are correct, we will see AI used as a copilot for artists and content creators. I have had access to the OpenAI APIs for GPT-3 since last summer, and in addition to using this for NLP tasks, I have also experimented with using GPT-3 to help me work on a sci-fi short book that I have been occasionally writing for years. It has been useful for expanding sections that I have been stuck on. I feel that I should copyright my story as being authored by The Internet since every content creator whose text, programming listings, etc. that goes into training these language models has in some sense been a contributor to my short story.
You know how you can break writing down into 2 phases: creative and edit?
What if you did the creative part, braindumping all your ideas and vision into a word document. Then had an AI edit your incoherent story into engaging prose.
I find the creative part easy, but the editing part tedious.
Like, a doctor I know asked me to write a review for him. I did the braindump part, but have been procrastinating on the edit part.
Unfortunately, GPT-3, CoPilot, AlphaCode, etc., also excel more at the creative part (helped out by their enormous training database, which saves you from doing the Google search to match your high-level description to examples), but they're still dogshit at editing, because that is the part that actually requires a detailed understanding of the text. So the princess is still in the next castle.
Hmm it intuitively seems like the editing part should be the easy part, because the output is basically a template, and you just need to fill it in, and tweak it a little bit…
I get with code it’s a bit less so, but still, there are plenty of design patterns which basically cover a lot of the output space…
The context window of GPT-3 is vanishingly small compared to any novel-like work. GPT-3 will happily invent descriptions of characters and objects that contradict what it suggested three pages ago.
Not saying it can't help you with getting an idea what to describe, but this is the kind of "editing" that it won't be good at.
Potentially, but that doesn't address the problem of replacing writing code with the (harder) process of reading, verifying and fixing it. However well I engineer my inputs I'll still have to review, verify and fix the outputs.
This problem has been (albeit imperfectly) addressed in speech recognition. When error corrections are made through the UI, the engine can learn from those corrections. Presumably over time the corrections needed in AlphaCode will become more semantic than syntactic. But you're right, correcting subtly flawed code or text is way harder than writing from scratch.
One of the most difficult problems in software development in general is coming up with the right test cases that cover the real-world domain's needs. If you have an AI like this at your disposal, that you can throw test cases at and it will give you the stupidest piece of code that makes them pass, then at the least you have something that helps you iterate on your test cases a lot more effectively, which would be a great boon.
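As a toy illustration of what I mean (entirely my own example, nothing generated by AlphaCode): an under-specified test suite and the "stupidest" implementation that satisfies it, which immediately tells you which tests are missing.

    def test_my_abs():
        # Two tests that are supposed to pin down "absolute value"...
        assert my_abs(-3) == 3
        assert my_abs(2) == 2

    # ...and the laziest implementation that makes them pass: clearly wrong in
    # general, which is exactly the feedback that pushes me to add better tests.
    def my_abs(x):
        return 3 if x == -3 else 2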
AlphaCode is a piece of academic research. It's about demonstrating the possibility of using a particular type of model to solve a particular type of task. It's not about making a practical tool.
People can certainly take this approach and build tools. That might take different shapes and forms. There are many ways these models can be specialized, fine-tuned, combined with other approaches, etc.
For example, somebody can try to apply them to a narrow use case, e.g. generate front-end code in React/JS and CSS from a backend API. They can fine-tune it on best example of React code, add a way to signal uncertainty so that specification can be clarified, etc.
Nobody expects these models to be able to write an entire OS kernel any time soon. But a lot of types of programming are far more regular, repetitive and can be verified a lot easier.
>The problem is that doing so is, I expect, harder for the human than writing the code in the first place.
Programming is mostly about writing boilerplate code using well-known architectural patterns and technologies; nothing extraordinary, but it takes time (at least in my experience). If I can describe a project in a few abstract words, and the AI generates the rest, it can considerably improve my productivity, and I don't think it's going to be harder to review than code written by a junior dev who also makes mistakes and doesn't usually get it right on the first try anyway (given the AI is pretrained to know what kind of architecture we prefer). I can envision a future where programmers are basically AI operators who iterate on the requirements with the client/stakeholders and let AI do the rest. It looks like we're almost there (with GitHub Copilot and all), and I think it's enough to "revolutionize" the industry, because it changes the way we approach problems and makes us far more productive with less effort.
If a junior dev writes truly head-scratching code, you could ping that person and ask why they wrote this line a certain way, as opposed to a more straight-forward way. Correct me if I'm wrong but you can't ask an ML model to do that (yet).
True. But is it really important "why"? I think what's more important is whether we can correct AI's output in a way that makes it learn to avoid writing similar head-scratching code in the future.
Unless the AI can fix the bug itself, a human is going to have to explore the AI-generated code. The AI cannot help you here and you are going to have to figure out the "why" all by yourself. It will become incredibly easy to generate code with prompts but fixing any problem is going to be a pain. The "Tech Debt" would be astronomical.
In ten years ML models may be "thinking" the same thing about us. Our "more understandable" code is full of inefficiencies and bugs and the bots wonder why we can't see this.
> Reviewing and fixing code is harder than writing it. Please explain why I'm wrong about that (it's certainly true for me, but maybe most people don't feel that way?)
At the very least, I don't think it being true for you (or me) means much. Presumably like me, you learned primarily to write code and have a lot of experience with it, and have gotten better and better at that with time. I assume that if I had spent as much time and effort learning to efficiently review and fix code as I have spent learning to write it that it would be much easier.
I certainly know of artists who have shipped video games they authored alone, who wouldn't call themselves programmers or know where to start programming something from scratch, who got by with a high level engine and copy and pasting starting points from the internet. (I would call them programmers, but their style is much closer to review and edit than my own.)
You raise an interesting point about the difficulty of code writing versus code review, and I think the point generalizes nicely to many other areas where AI is currently rapidly progressing.
For example, self-driving cars are also currently in this weird place where they're prone to making lots of errors, so a simple task for the human (driving) is replaced with a much more taxing one (keeping an eye on an AI driver and being ready to react at any moment), if we want to do it safely.
Whether any of those applications will be able to progress past that weird space into "no/minimal supervision needed" remains an open question.
I think this type of model will have a massive impact on the software industry. 99% of programming tasks in the wild don't involve any kind of algorithmic design; they are more like implementing a CRUD pattern, writing SQL queries, etc. This kind of work is easier to automate, but the training data is more difficult to source. If and when these models are applied to more mundane problems, I'd expect immediately better performance and utility.
We're also in the very very early days of code generation models. Even I can see some ways to improve AlphaCode:
- the generate->cluster->test process feels like a form of manual feature engineering. This meta layer should be learned as well, possibly with RL
- programming is inherently compositional. Ideally it should perform the generate->cluster->test step for each function and hierarchically build up the whole program, instead of in a single step as it does now
- source code is really meant for humans to read. The canonical form of software is more like the object code produced by the compiler. You can probably just produce this directly
It's interesting that AI is being aggressively applied to areas where AI practitioners are domain experts. Think programming, data analysis etc.
We programmers and data scientists might find ourselves among the first half of knowledge workers to be replaced and not among the last as we previously thought.
Compilers didn't replace any jobs, they created more. Similarly, this type of AI-assisted programming will allow more people to program and make existing programmers more productive.
I was thinking over a really long time period. There is at least 20-30 more years of general purpose programming being a highly sought after skill. But with time most programming is going to be done by AI that is directed by domain experts.
In my view this type of system will only be usable by Real Computer Scientists and will completely kill off the workaday hacker. Think of all the people who bitterly complain that a C++ compiler does something unexpected under the banner of UB. That crowd cannot cope with a world in which you have to exactly describe your requirements to an AI. It is also analogous to TDD, so all the TDD haters, which is the overwhelming majority of hackers, are toast.
You can write code that is valid in the sense that it compiles, but that falls outside the C++ standard. It is the programmer's duty not to write such code, because the compiler usually assumes there is no UB in your code and can do unintuitive things with optimizations.
e.g.:
    #include <cstdint>

    void do_important_thing(int z);

    int foo(int8_t x) {
        x += 120;  // adds 120 to an int8_t; the line the whole argument hinges on
        return x;
    }

    void bar(int8_t y) {
        int z = foo(y);
        if (y > 8) {
            do_important_thing(z);
        }
    }
`do_important_thing` may be optimized out because:
1. signed overflow is UB. The compiler then assumes that everything passed to foo is less than 8, so the `y > 8` branch can never be taken;
To be pedantic, C has no 8- or 16-bit addition operators, since everything narrower than int is promoted to int before doing arithmetic. Therefore, the `x += 120;` line never overflows, since it is actually `x = (int8_t)((int)x + 120);`, and the possible range of `(int)x + 120` is comfortably within the range of expressible ints, while the conversion back to int8_t is implementation-defined (in practice it wraps) rather than UB. So the compiler can't optimize out do_important_thing in your example.
Instead of semantically correct Python, programmer and data scientists’ jobs will be to work in semantically correct English. Fundamentally the job won’t change (you’ll be programming the AI rather than program the machine directly).
> source code is really meant for humans to read. The canonical form of software is more like the object code produced by the compiler. You can probably just produce this directly
The key advantage of producing source code is that you can usually tell what the produced program does.
I think the validation phase of auto-coding full-blown apps is much more complex than AutoCode is ready for. When coding up a specific function, it's pretty easy to assess whether it maps input to output as intended. But composing functions into modules is much harder to validate, much less entire programs.
And specifying a full app to be autocoded is most certainly NOT a solved problem.
Until AutoCode can build an app that employs compound AND complex behaviors, like Angry Birds, or a browser, I'll continue to see it as little more than a write-only copy/paste/derive-driven macro generator.
Reading this I’m reminded of the debates around ORMs. At a basic level they drastically simplify your CRUD app. Until they make trivial errors no self-respecting programmer would (think N+1 queries), and then you need someone who actually understands what’s going on to fix it.
That doesn’t mean you shouldn’t ever use ORMs, or that in simple cases they aren’t “good enough”. But at some level of complexity it breaks down.
AI-assisted programming is the new leaky abstraction.
Because the whole thing is, a dog’s abstract mental capabilities are far below a human — thus why it would be ASTOUNDING that a dog could master even a primitive form of speaking English.
On the other hand, here we are brute forcing a solution from analyzing millions of man-years of published English speech, by using a huge array of computing power to precompute various answers and sift them.
It is a bit like “solving checkers” and then claiming “wow, a dog could have done this”. It is like making a cheat sheet for a test by analyzing all answers ever given, and then summarizing them in a vector of 102827 dimensions, and claiming that it is the same as coming up with clever and relevant one liners on the spot using the brain of a dog.
Excuse my presumption, but it seems that you arrive at the logical conclusion "it might be possible to simulate intelligence with a sufficiently big cheat sheet" - and then you disregard it because you're uncomfortable with it. We already know this is the case for specialized environments, so the "only" question left is how far does this generalize.
In my opinion, more ridiculous claims have already been proven by science (for example Quantum Mechanics).
Also you have to make a distinction between the optimizing process (evolution/training neural nets) and the intelligent agent itself (human/machine intelligence).
I don’t disregard it. It isn’t about discomfort. In fact, I think that “solving checkers” is very useful, if your goal is to get the highest quality answers in checkers.
The problem I have is comparing that to having a dog speak English. It’s totally wrong. You had access to all these computing resources, and the sum total of millions of work by humans. You didn’t bootstrap from nothing like AlphaZero did, but just remixed all possible interesting combinations, then selected the ones you liked. And you try to compare this “top down” approach to a bottom-up one?
The top down approach may give BETTER answers and be MORE intelligent. But the way it arrives at this is far less impressive. In fact, it would be rather expected.
The vocal apparatus is not there, but there is certainly more cognition than people think dogs have. (There's a question I wonder about: does language enable thought, or does thought enable language?)
I'm not so certain. Seems like the owner is doing a lot of work to make sense out of those utterances. I'd like to see Bunny say what he's about to do, then do it. Or watch his owner do something with something, then describe it.
edit: or just have a conversation of any kind longer than a 2 minute video. Or one without the owner in the room, where she talks back with the dog using the same board. That would at least be amenable to Turing.
edit2: here's a test - put headphones on the owner and block her vision of the board. Sometimes pipe in the actual buttons the dog is pressing, other times pipe in arbitrary words. See if the owner makes sense of the random words.
I'm trying, but I don't see it at all with these examples.
1) just seemed like random pressing until the dog pressed "paw", then the owner repeated loudly "something in your paw?" The dog presented its paw, then the owner decided "hurt" "stranger" "paw" was some sort of splinter she found there. The dog wasn't even limping.
2) I didn't get any sense of the presses relating to anything the dog was doing, and since the owner was repeating loudly the thing she wanted the dog to find, I was a bit surprised. Then the dog presses "sound," the owner connects this with a sound I can't hear, then they go outside to look for something I can't see.
Billie the Cat: I simply saw no connection between the button presses and anything the cat did. The cat pressed "outside" but didn't want to go outside. The cat presses "ouch noise" and the owner asks if a sound I didn't hear hurt her ears. Then the cat presses "pets" and the owner asks if the cat wants a pet? The cat presses "noise" and the owner continues the monologue apologizing for the painful noise and offering to buy her cat a pet. Sorry to recount most of the thing, but I don't get it at all.
-----
Not trying to debunk talking pets, but I'm not seeing anything here. I at least expected the dog to be smart enough to press particular buttons for particular things, but I suspect the buttons are too close together for it to reliably distinguish them from each other. I'd be pretty easy to convince that you could teach a dog to press a button to go outside, a different button when they wanted a treat, and a different button when they wanted their belly rubbed. In fact I'd be tough to convince that you couldn't teach a dog to do that. Whatever's being claimed here, however, I'm not seeing.
-----
edit: to add a little more, I'm not even sure that *I* could reliably do what they're claiming the dog is doing. To remember which button is which without being able to read is like touch typing, but worse because the buttons seem to be mounted on re-arrangeable puzzle pieces. Maybe I could associate the color of those pieces with words, but that would only cover the center button on each piece.
If a dog were using specific buttons for language (or if I were doing the same thing) I'd expect the dog to press a lot of buttons, until he heard the sound he was looking for, then to press that button over and over. Not to just walk straight to a button and press.
I think the cat just presses the buttons when it wants the owner to come, and presses them again when the owner says something high pitched at the end and looks at it in expectation.
I saw a lot of videos about this Bunny dog on tiktok, but discarded it as a gimmick, not believing it's real. Your comment motivated me to look into it more (30 seconds of time).
This NYT article at least does not discredit it [0]. Have you looked more into it? Do you think it would be useful to train your dog to do it?
With my cat (I've got the buttons, but haven't done anything with them yet) it would be useful to find out if he wants food, attention, or is complaining about the water bowl or litter box.
Even being able to distinguish those would be a "win".
Current machine learning models have around ~100B parameters; the human brain has ~100T synapses. Assuming one DNN parameter is equivalent to one synapse, the biggest models are still 1000 times smaller than the human brain.
A cat or dog would have around ~10T synapses.
AlphaCode has ~50B parameters, which is 20 times fewer than the number of synapses in a mouse brain (~1T). A honey bee has ~1B synapses.
So AlphaCode would sit somewhere between a honey bee and a domestic mouse.
I'd like to see how AlphaCode would solve a problem no human has solved before (or is very unlikely to have solved).
For example: given 3 beads and 2 stacks for the tens and ones positions of a number, one can make 4 different numbers when stacking all the beads, as on an abacus. Without using all the beads, one can of course make more numbers. The question is how many different numbers one can make using n beads, allowing both full and partial usage of the beads.
It's indeed a very simple problem for 5-7 year olds, and I hardly think anyone cannot solve it. Nevertheless, I seriously doubt AlphaCode can solve it. What does that say about its supposed intelligence?
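For the curious, a brute-force count of the answer I have in mind (my own sketch; I'm assuming any number of beads can sit on either stack and that 0 counts as a number):

    def count_numbers(n):
        """Distinct values 10*tens + ones reachable with at most n beads
        split across the tens stack and the ones stack."""
        values = set()
        for tens in range(n + 1):
            for ones in range(n + 1 - tens):
                values.add(10 * tens + ones)
        return len(values)

    print(count_numbers(3))  # 10; restricting to full usage (tens + ones == 3) gives the 4 numbers 30, 21, 12, 3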
Because the backspace question (essentially: is T a subsequence of S with a deletion size of N?) probably occurs hundreds of times, in one form or another, within AlphaCode's training corpus.
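For reference, the usual solution to that backspace problem is just a short greedy scan from the end of both strings (my reconstruction from memory; the contest version also reads multiple test cases):

    def can_type(s, t):
        """Can t be obtained by typing s, pressing backspace instead of some characters?"""
        i, j = len(s) - 1, len(t) - 1
        while i >= 0:
            if j >= 0 and s[i] == t[j]:
                i -= 1
                j -= 1
            else:
                i -= 2  # discard this character together with the one typed before it
        return j < 0

    print(can_type("ababa", "ba"))  # True
    print(can_type("ba", "b"))      # False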
Any leetcode grinder can tell you there are a few dozen types of competitive programming problem (monostack, breadth-first state search, binary search over solution space, etc.) so solutions to new problems are often very similar to solutions for old problems.
The training corpus for these code transformers is so large that almost all evaluation involves asking them to generate code from the training corpus.
To evaluate CoPilot, we should ask questions that are unusual enough they can't be answered through regurgitation of the training corpus.
What does CoPilot generate, given this prompt:
// A Go function to set the middle six bits of an unsigned 64-bit integer to 1.
> To evaluate CoPilot, we should ask questions that are unusual enough they can't be answered through regurgitation of the training corpus.
Exactly! It's great that Copilot can generate correct code for a given question, but we cannot gauge its full capability unless we try it on a range of different questions, especially ones that are not found in the training data.
I mentioned this in the other AlphaCode post: It would be nice to know how "unusual" a given question is. Maybe an exact replica exists in the training data, or a solution exists but in a different programming language, or a solution can be constructed by combining two samples from the training data.
Quantifying the "unusual-ness" of a question will make it easier to gauge the capability of models like AlphaCode. I wrote a simple metric that uses nearest neighbors (https://arxiv.org/abs/2109.12075). There are also other tools to do this: conformal predictors, which are used in classification methods, and the RETRO transformer (https://arxiv.org/pdf/2112.04426.pdf) has a calculation for the effect of "dataset leakage".
> It does not seem to be copying from the training data in any meaningful way.
My point is, I would like to verify this claim with different metrics, because we probably have different interpretations of the word "meaningful".
AlphaCode measures similarity between programs via longest common substrings. That's better than nothing, but that would mean two programs that differ only in variable naming would not be considered similar. If two programs differed only in the names of the variables/functions, I would consider that copying.
I think there are better comparisons of structural similarity: compare the ASTs, or the bytecode/assembly code generated, the control flow graphs, or perform SSA and compare the blocks generated. Each of these might have weaknesses as well, but they won't be as obvious as variable renaming, and so we'd get a better idea of what AlphaCode is copying, and therefore a better idea of its full capabilities.
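To make the AST comparison concrete, here's a tiny sketch of the variable-renaming check I have in mind, using Python's ast module (my own code, and deliberately naive: it renames every identifier, builtins included, but it catches the "same code, different names" case that longest common substrings miss):

    import ast

    class Canonicalize(ast.NodeTransformer):
        """Rename every identifier to v0, v1, ... in order of first appearance."""
        def __init__(self):
            self.names = {}

        def _canon(self, name):
            return self.names.setdefault(name, f"v{len(self.names)}")

        def visit_FunctionDef(self, node):
            node.name = self._canon(node.name)
            self.generic_visit(node)
            return node

        def visit_arg(self, node):
            node.arg = self._canon(node.arg)
            return node

        def visit_Name(self, node):
            node.id = self._canon(node.id)
            return node

    def same_up_to_renaming(src_a, src_b):
        dump = lambda src: ast.dump(Canonicalize().visit(ast.parse(src)))
        return dump(src_a) == dump(src_b)

    print(same_up_to_renaming("def f(a): return a + 1",
                              "def total(items): return items + 1"))  # True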
I expect AlphaCode performs well on Python because the training data is dominated by Python, but Python isn't ideal for comparing program structure. I wonder which programming language (given enough training data) would be best suited for language model generation and analysis.
Sure, I think this would be a really interesting study to do! I have been meaning to scrape a big chunk of GitHub in order to do this kind of analysis. I think I would be surprised (just based on my own use of Codex and Copilot) if they were copying at the level of "same code but renamed variables" either. Past that I think comparisons get pretty difficult, and to some degree I would actually be more impressed if it understood enough about programs to be able to mimic higher-level structure without just memorizing text.
Why do you think Python isn't good for comparing program structure? Certainly you could compare ASTs and bytecode pretty easily; I think it's actually much easier to do so than in C/C++ since you don't have to deal with preprocessor junk and compile flags influencing the meaning of the code. There's less available for classical analyses like data and control flow analysis, in part because those are much harder in dynamic languages, but there are some tools out there, like the ones used in PyPy for getting SSA CFGs [1].
I feel like many people are equivocating a bit on what they mean by "regurgitating" here. It seems clear that the models have at best a shaky grasp on code semantics (e.g., they are bad at things like predicting what the output of some code will be: https://arxiv.org/abs/2112.00114), and struggle with problems that are very different from anything they've seen before.
> I would actually be more impressed if it understood enough about programs to be able to mimic higher-level structure without just memorizing text.
Agreed.
> Why do you think Python isn't good for comparing program structure?
From recent experience I prefer control flow analysis, or something that results in a graph structure. As you said, that's harder with dynamic languages. I also think some Python features (english-like syntax, f-strings, division converts int to float, whitespace indentation, loops vs generator expressions) make structural comparisons messy, but that may just be bias.
The ideal language would be one with minimal syntax, where we can target a decent range of programs, and obtain as much info as possible about program structure without actually running the program. I've come across LISP-without-macros in Dreamcoder (https://arxiv.org/abs/2006.08381), BF++ (https://arxiv.org/abs/2101.09571), and a couple of others which I can't remember right now. I think the APL family (APL/J/K) would interesting because fewer characters to generate, but each character has a lot of meaning.
Right now I'm looking at flow-based programming (FBP) for this: In FBP the control flow is explicit - the program code describes a directed acyclic graph (DAG), so comparing program structure becomes straightforward (subgraph isomorphism with some heuristics). I'm writing a toy FBP language that draws images (https://github.com/mayahq/flatland), with which I aim to test what these models understand.
CoPilot is helpless if it needs to do more than just regurgitate someone else's code.
The training of these models on GitHub, so they regurgitate licensed code without attribution, is the greatest theft of intellectual property in the history of Man. Perhaps not according to the letter of the law, but surely according to the spirit.
I like CoPilot's answer better than yours, and I think it's closer to what most people would do; clearly 0x3F is the wrong constant but the approach is good.
For fun I rephrased the prompt a little. "Middle bits" is kind of vague; when provided an explicit description of which bits you want to set it does fine:
Prompt:
// Function to set bits 29-34 in a uint64 to 1
func setbits(x uint64) uint64 {
Middle bits is not ambiguous, but CoPilot hasn't seen code for that phrase in its training so it has nothing to regurgitate.
You spelled out exactly what to do, in terms of what it has seen in its training, and it was able to regurgitate a solution.
By asking questions that require mathematical reasoning or are too far from the training corpus, I can create an endless list of simple problems that CoPilot can't solve.
Look at my comment history to see another one (swapping bits).
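(For anyone following along: the behaviour I was after, assuming zero-indexed bits and taking 29 through 34 as the middle six of a 64-bit value, is just an OR with a shifted mask. In Python:)

    MIDDLE_SIX = 0x3F << 29  # bits 29..34 set, i.e. 0x7E0000000

    def set_middle_six_bits(x):
        return (x | MIDDLE_SIX) & 0xFFFFFFFFFFFFFFFF  # keep it within 64 bits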
It would be interesting to know how much of this improvement in the 25 years since he was a student comes from Moore's law, other hardware improvements, various new technologies not related to AI, the amount of money being thrown at the problem... and how much of it comes from advances in our understanding of AI.
In this work, we argue that algorithmic progress has an aspect that is both straightforward to measure and interesting: reductions over time in the compute needed to reach past capabilities. We show that the number of floating-point operations required to train a classifier to AlexNet-level performance on ImageNet has decreased by a factor of 44x between 2012 and 2019. This corresponds to algorithmic efficiency doubling every 16 months over a period of 7 years. By contrast, Moore's Law would only have yielded an 11x cost improvement. We observe that hardware and algorithmic efficiency gains multiply and can be on a similar scale over meaningful horizons, which suggests that a good model of AI progress should integrate measures from both.
Reinforcement learning has achieved great success in many applications. However, sample efficiency remains a key challenge, with prominent methods requiring millions (or even billions) of environment steps to train. (...) This is the first time an algorithm achieves super-human performance on Atari games with such little data. EfficientZero's performance is also close to DQN's performance at 200 million frames while we consume 500 times less data. EfficientZero's low sample complexity and high performance can bring RL closer to real-world applicability.
A 500x improvement over the ~10 years since DQN is roughly a 2x improvement in sample efficiency every year.
We compare the impact of hardware advancement and algorithm advancement for SAT solving over the last two decades. In particular, we compare 20-year-old SAT-solvers on new computer hardware with modern SAT-solvers on 20-year-old hardware. Our findings show that the progress on the algorithmic side has at least as much impact as the progress on the hardware side.
AI research also has tiny budgets compared to the biggest scientific projects:
Where did you get the $10M figure for GPT-3? That sounds awfully cheap considering the cost of compute alone: one estimate was $4.6M for a single training run [0], while other sources [1] put it at $12M per run. I highly doubt that OpenAI nailed the training process right on the second or even first go respectively (according to your figure).
So even conservative estimates put the compute cost alone at least one order of magnitude higher than your figure of $10M.
A recent paper shows that empowering a language model to search a text corpus (or the internet) for additional information could improve model efficiency by 25 times [1]. So you only need a small model because you can consult the text to get trivia.
That's 25x in one go. Maybe we'll get the chance to run GPT-4 ourselves and not need 20 GPU cards and a $1M computer.
I love this take. Most AI results provoke a torrent of articles listing pratfalls that prove it's not AGI. Of course it's not AGI! But it is as unexpected as a talking dog. Take a second to be amazed, at least amused. Then read how they did it and think about how to do better.
I mean, I'm not so impressed, because it seems like someone's figured out the ventriloquist trick and is just spamming it to make anything talk. It's fun enough, but it's unclear what this is achieving.
I guess there is a bunch of data hiding behind the curtains, and there is a human feeding the data to the model. However I don’t agree with GP here: a ventriloquist’s dummy is not doing anything a human can’t, whereas a well-trained language model can produce in seconds output that would take a human weeks.
This metaphor doesn't do justice to parrots or language models. Parrots only speak a phrase or two. LMs can write full essays.
On the other hand parrots can act in the environment, LMs are isolated from the world and society. So a parrot has a chance to test its ideas out, but LMs don't.
Marketing people love to make false claims, setting crazy expectations. Increased competition encourages these small lies, and sometimes even academic fraud.
I agree. The talking dog analogy deflates those claims while still pointing out what is unique and worth following up on about the results.
Meanwhile, the chorus of "look this AI still makes dumb mistakes and is not AGI" takes has gotten louder in many circles than the marketing drumbeat. It risks drowning out actual progress and persuading sensitive researchers to ignore meaningful ML results, which will result in a less representative ML community going forward.
It's rarely productive to take internet criticism into account, but it feels like AI is an especially strong instance of this. It seems like a lot of folks just want to pooh pooh any possible outcome. I'm not sure why this is. Possibly because of animosity toward big tech, given big tech is driving a lot of the research and practical implementation in this area?
> It seems like a lot of folks just want to pooh pooh any possible outcome. I'm not sure why this is. Possibly because of animosity toward big tech
It's much simpler, and deeper: most humans believe they are special/unique/non-machine-like spiritual beings. Anything suggesting they could be as simple as the result of mechanical matrix multiplications is deeply disturbing and unacceptable.
There is a rich recent anti-AGI literature written by philosophy people which basically boils down to this: "a machine could never be as meaningful and creative as I am, because I am human, while the AGI is just operations on bits".
though at the same time, and in the same population, the existence of other planets full of conscious beings was basically non-controversial:
Life, as it exists on Earth in the form of men, animals and plants, is to be found, let us suppose in a high form in the solar and stellar regions.
Rather than think that so many stars and parts of the heavens are uninhabited and that this earth of ours alone is peopled—and that with beings perhaps of an inferior type—we will suppose that in every region there are inhabitants, differing in nature by rank and all owing their origin to God, who is the center and circumference of all stellar regions.
Of the inhabitants then of worlds other than our own we can know still less having no standards by which to appraise them.
Has it been overhyped? Some of the ML created in the last 8 years is in most major products now. It has been transformative even if you don’t see it; it informs most of the things you use. We’re not close to AGI, but I’ve never heard an actual researcher, or the orgs they work for, make that claim. They just consistently show that for the tasks they pick they beat most baselines, and in a lot of cases humans. The models just don’t generalize, but with transformers we’re able to get them to perform above baselines for multiple problems, and that’s the excitement. I’m not sure who has overhyped it for you, but it’s delivering in line with my expectations ever since the first breakthroughs in 2013/2014 that let neural nets actually work.
It's just that the day to day instances of "AI" that you might run into are nowhere near the level of hype they initially got. For instance all kinds of voice assistants are just DUMB. Like, so so bad they actively put people off using them, with countless examples of them failing at even the most basic queries. And the instances where they feel smart it looks like it's only because you actually hit a magic passphrase that someone hardcoded in.
My point is - if you don't actually work with state of the art AI research, then yeah, it's easy to see it as nothing more than overhyped garbage, because that's exactly what's being sold to regular consumers.
I agree that the assistants are not as good as I would expect. But there are also self-driving cars making heavy use of AI, and even at their current state I am personally impressed. Indirectly, we also got help during the pandemic with protein folding and mRNA vaccine development [1], and I remember a competition for speeding up the delivery of cold-storage mRNA vaccines by quickly figuring out which ones could fail.
A. Most people still think Google search is good. B. Unless you work for Google specifically on that search team I'm going to say you don't know what you're talking about. So we can safely throw that point away.
I've implemented a natural-language search using bleeding-edge work; the results, I can assure you, are impressive.
Everything from route planning to spam filtering has seen major upgrades thanks to ML in the last 8 years. Someone mentioned the zoom backgrounds, besides that image generation and the field of image processing in general. Document classification, translation. Recommendations. Malware detection, code completion. I could go on.
No one promised me AGI so idk what you're on about and that certainly wasn't the promise billed to me when things thawed out this time but the results have pretty undeniably changed a lot of tech we use.
Why would you discount someone who has been measuring relevancy of search results and only accept information from a group of people who don't use the system? You are making the mistake of identifying the wrong group as experts.
You may have implemented something that impressed you, but when you moved that solution into real use, were others as impressed?
That's what is probably happening with the google search team. A lot of impressive demos, pats on the back, metrics being met but it falls apart in production.
Most people don't think Google's search is good. Most people on Google's team probably think it's better than ever. Those are two different groups.
Spam filtering may have had upgrades but it is not really better for it and in many cases worse.
One of Deepmind's goals is AGI, so it is tempting to evaluate their publications for progress towards AGI. Problem is, how do you evaluate progress towards AGI?
"Our long term aim is to solve intelligence, developing more general and capable problem-solving systems, known as artificial general intelligence (AGI)."
AGI is a real problem but the proposed pace is marketing fluff -- on the ground they're just doing good work and moving our baselines incrementally. If a new technique for let's say document translation is 20% cheaper/easier to build and 15% more effective that is a breakthrough. It is not a glamorous world redefining breakthrough but progress is more often than not incremental. I'd say more so than the big eureka moments.
Dipping into my own speculation, to your point about how to measure: between our (humanity's) superiority complex and the way we keep moving the baselines, I don't know if people will acknowledge AGI unless and until it's far superior to us. If even an average-adult-level intelligence is produced, I can see a bunch of people just treating it poorly and telling the researchers that it's not good enough.
Edit: And maybe I should amend my original statement to say I've never heard a researcher promise me about AGI. That said that statement from DeepMind doesn't really promise anything other than they're working towards it.
If we are going to start saying "but it hasn't achieved X yet when Y said it would" as a way to classify a field as overhyped then I don't know what even remains.
I mean, Zoom can change your video background in real time, and people all over the world do so every day. This was an unimaginable breakthrough 10 years ago.
This is sort of the interesting thing with AI. It's a moving target. Every time when an AI problem gets cracked, it's "yea but that's not really AI, just a stupid hack".
Take autonomous cars. Sure, Musk is over-hyping, but we are making progress.
I imagine it will go something like:
Step 1) support for drivers (anti-sleep or collision warnings)... done?
Step 2) autonomous driving in one area, perfect conditions, using expensive sensors
Step n) gradual iteration removes those qualifications one by one
.. yes, it will take 10/20 years before cars can drive autonomously in chaotic conditions such as "centre of Paris in the rain". But at each of those steps value is created, and at each step people will say "yea but..".
ML is most certainly AI. I had a visceral feeling you'd respond with this. Sorry, but whatever magic you have in your head isn't AI -- this is real AI, and you're moving goalposts like a lot of people tend to do.
You have single cell organisms which are able to sense their nearby surroundings and make a choice based on the input - they can differentiate food from other materials and know how to move towards it. They are a system which can process complex input and make a decision based on that input. Yet you wouldn't call a basic single cell organism intelligent in any way. The term usually used is that it's simply a biochemical reaction that makes them process the input and make a choice, but you wouldn't call it intelligence and in fact no biologist ever would.
I feel the same principle should apply to software - yes, you've built a mathematical model which can take input and make a decision based on the internal algorithms, if you trained it to detect background in video then that's what it will do.
But it's not intelligence. It's no different than the bacteria deciding what to eat because certain biological receptors were triggered. I think calling it intelligent is one of the biggest lies IT professionals tell themselves and others.
That's not to say the technology isn't impressive - it certainly is. But it's not AI in my opinion.
> We’re not close to AGI but I’ve never heard an actual researcher make that claim or the orgs they work for.
The fact that the researchers were clear about that doesn't absolve the marketing department, CEOs, journalists and pundits from their BS claims that we're doing something like AGI.
> The machine learning techniques that were developed and enhanced during the last decade are not magical, like any other machines/software.
You might be using a different definition of "magical" than what others are using in this context.
Of course, when you break down ML techniques, it's all just math running on FETs. So no, it's not extra-dimensional hocus pocus, but absolutely nobody is using that particular definition.
We've seen unexpected superhuman performance from ML, and in many cases, it's been inscrutable to the observer as to how that performance was achieved.
Think move 37 in game #2 of Lee Sedol vs. AlphaGo. This move was shocking to observers, in that it appeared to be "bad", but was ultimately part of a winning strategy for AlphaGo. And this was all done in the backdrop of sudden superhuman performance in a problem domain that was "safe from ML".
When people use the term "magic" in this context, think of "Any sufficiently advanced technology is indistinguishable from magic" mixed with the awe of seeing a machine do something unexpected.
And don't forget, the human brain is just a lump of matter that consumes only 20W of energy to achieve what it does. No magic here either, just physics. Synthetically replicating (and completely surpassing) its functionality is a question of "when", not "if".
Was Go ever "safe from ML", as opposed to "[then] state of the art can't even play Go without a handicap"? It seems like exactly the sort of thing ML should be good at: approximating Nash-equilibrium responses in a perfect-information game with a big search space (and humans setting a low bar, since we're nowhere near finding an algorithmic or brute-force solution). Is it really magical that computers running enough simulations expose limitations in human Go theory? Arguably one interesting lesson was that humans were so bad at playing that AlphaGo Zero was better off not having its dataset biased by curated human play. Yes, it's a clear step forward compared with only being able to beat humans at games that can be fully brute-forced, or a pocket calculator being much faster and more reliable than the average human at arithmetic thanks to a simple, tractable architecture. But it's also one of the least magical-seeming applications, given we already had the calculators and chess engines (especially compared with something like playing Jeopardy), unless you had unjustifiably strong priors about how special human Go theory was.
I think people are completely wrong to pooh-pooh the utility of computers being better at search and calculation in an ever wider range of applied fields. But linking computers surpassing humans at more examples of those problems to certainty that we'll synthetically replicate brain functionality we barely understand is a stretch, and exactly why AGI sceptics feel the need to point out that this is just a tool iterating through existing programs and sticking lines of code together until the program outputs the desired output, not evidence of reasoning in a more human-like way.
AlphaGo is decidedly not brute force, under any meaningful definition of the term. It's Monte Carlo tree search, augmented by a neural network to give stronger priors on which branches are worth exploring. There is an explore/exploit trade-off to manage, which takes it out of the realm of brute force. The previous best Go programs used Monte Carlo tree search alone, or with worse heuristics for the priors. AlphaGo improves drastically on the priors, which is arguably exactly the part of the problem that one would attribute to understanding the game: of the available moves, which ones look the best?
They used a fantastic amount of compute for their solution, but, as has uniformly been the case for neural networks, the compute required for both training and inference has dropped rapidly after the initial research result.
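For what it's worth, here is a minimal sketch (plain Python, with an illustrative constant and a made-up `children` structure, not DeepMind's actual implementation) of the PUCT-style selection rule that blends the network's prior with an exploration bonus. It's the explore/exploit trade-off described above in about a dozen lines:

    import math

    def puct_select(children, c_puct=1.5):
        # children: list of dicts with 'prior' (policy-net probability for the move),
        # 'value' (mean simulation value Q) and 'visits' (simulation count).
        total_visits = sum(ch["visits"] for ch in children)
        def score(ch):
            # Exploration bonus U: large when the prior is strong and visits are few.
            u = c_puct * ch["prior"] * math.sqrt(total_visits + 1) / (1 + ch["visits"])
            return ch["value"] + u
        return max(children, key=score)

    # Toy usage: a lightly visited move with a decent prior outranks the heavily visited one.
    children = [
        {"prior": 0.6, "value": 0.10, "visits": 40},
        {"prior": 0.3, "value": 0.05, "visits": 2},
        {"prior": 0.1, "value": 0.30, "visits": 1},
    ]
    print(puct_select(children))

The point is that the learned prior steers which branches get simulated at all, which is exactly the part you'd call "understanding the position" rather than brute force.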
If recent philosophy taught us anything, it's that brains are special. The hard problem of consciousness shows science is insufficient to rise to the level of entitlement of humans; we're exceptions flying over the physical laws of nature, we have free will, first-person POV, and other magical stuff like that. Or we have to believe in panpsychism or dualism, like in the Middle Ages. Anything to lift the human status.
Maybe we should start from "humans are the greatest thing ever" and then try to fit our world knowledge to that conclusion. We feel it right in our qualia that we are right, and qualia are ineffable.
> The hard problem of consciousness shows science is insufficient to raise to the level of entitlement of humans, we're exceptions raising over the physical laws of nature, we have free will and other magical stuff like that.
That's not my understanding of the 'hard problem of consciousness'. Admittedly, all I know about the subject is what I've heard from D Chalmers in half-a-dozen podcast interviews.
It's not like people can arbitrarily choose their own loss function; our drivers, needs and desires are what they are, you don't get to just redefine what makes you happy (otherwise clinical depression would not be a thing); they change over time and can be affected by various factors (things like heroin or brain injury can adjust your loss function) but it's not something within our conscious control. So I would not put that as a distinguishing factor between us and machines.
People always reach for these analogies. "Planes don't fly like birds." "Submarines don't swim like fish."
Backpropagation has zero creativity. It's an elaborate mechanical parrot, and nothing more. It can never relate to you on a personal level, because it never experiences the world. It has no conception of what the world is.
> Backpropagation has zero creativity. It's an elaborate mechanical parrot, and nothing more. It can never relate to you on a personal level, because it never experiences the world. It has no conception of what the world is.
The problem is: it's not really clear how much creativity we have, and how much of it is better explained by highly constrained randomized search and optimization.
> It can never relate to you on a personal level
Well, sure. Even if/once we reach AGI, it's going to be a highly alien creature.
> because it never experiences the world.
Hard to put this on a rigorous setting.
> It has no conception of what the world is.
It has imperfect models of the world it is presented. So do we!
> At least a dog gets hungry.
I don't think "gets hungry" is a very meaningful way to put this. But, yes: higher living beings act with agency in their environment (and most deep learning AIs we build don't, instead having rigorous steps of interaction not forming any memory of the interaction) and have mechanisms to seek novelty in those interactions. I don't view these as impossible barriers to leap over.
I agree GPT isn't grounded and that's a problem, but it's a weird point to argue against AlphaCode. AlphaCode is grounded by actual code execution: its coding experience is no less real than people's.
AlphaGo is grounded because it experienced Go, and has a very good conception of what Go is. I similarly expect OpenAI's formal math effort to succeed. Doing math (e.g. choosing a problem and posing a conjecture) benefits from real world experience, but proving a theorem really doesn't. Writing a proof does, but it's a separate problem.
I think software engineering requires real world experience, but competitive programming probably doesn't.
If anything is overhyped in AI it's deep reinforcement learning and its achievements in video games or the millionth GAN that can generate some image. But when it solves a big scientific problem that was considered a decade away, that's pretty magical.
I believe modelling the space of images deserves a bit more appreciation, and the approach is so unexpected - the generator never gets to see a real image.
The GANs are backdooring their way into really interesting outcomes, though. They're fantastic for compression: You compress the hell out of an input image or audio, then use the compressed features as conditioning for the GAN. This works great for super-resolution on images and speech compression.
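As a toy illustration of the conditioning trick described above (compress hard, then let a generator fill the detail back in), here is a minimal PyTorch sketch. The module names and sizes are mine, and the discriminator/adversarial loss that would make it a GAN is omitted entirely; this only shows the "compressed features as conditioning" plumbing:

    import torch
    import torch.nn as nn

    class TinyEncoder(nn.Module):
        """Squeeze a 1x64x64 image down to a short feature code."""
        def __init__(self, code_dim=32):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(1, 16, 4, stride=4), nn.ReLU(),   # 64 -> 16
                nn.Conv2d(16, 32, 4, stride=4), nn.ReLU(),  # 16 -> 4
                nn.Flatten(),
                nn.Linear(32 * 4 * 4, code_dim),
            )
        def forward(self, x):
            return self.net(x)

    class TinyConditionalGenerator(nn.Module):
        """Generate a 1x64x64 image from noise, conditioned on the code."""
        def __init__(self, code_dim=32, noise_dim=16):
            super().__init__()
            self.fc = nn.Linear(code_dim + noise_dim, 64 * 4 * 4)
            self.net = nn.Sequential(
                nn.ConvTranspose2d(64, 32, 4, stride=4), nn.ReLU(),  # 4 -> 16
                nn.ConvTranspose2d(32, 1, 4, stride=4), nn.Tanh(),   # 16 -> 64
            )
        def forward(self, code, noise):
            h = self.fc(torch.cat([code, noise], dim=1)).view(-1, 64, 4, 4)
            return self.net(h)

    # Round-trip a random "image" through the compressed code.
    enc, gen = TinyEncoder(), TinyConditionalGenerator()
    img = torch.randn(1, 1, 64, 64)
    code = enc(img)                       # the heavily compressed representation
    fake = gen(code, torch.randn(1, 16))  # generator reconstructs detail from it
    print(code.shape, fake.shape)

In the real systems the generator is trained adversarially (plus reconstruction/perceptual losses), which is what lets it hallucinate plausible detail from such a tiny code.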
It’s like a lot of the crypto stuff. The research is really cool, and making real progress toward new capabilities. Simultaneously there are a lot of people and companies seizing on that work to promote products of questionable quality, or to make claims of universal applicability (and concomitant doom) that they can’t defend. Paying attention in this sort of ecosystem basically requires one to be skeptical of everything.
> but it feels like AI is an especially strong instance of this. It seems like a lot of folks just want to pooh pooh any possible outcome. I'm not sure why this is.
I presume the amount of hype AI research has been getting for the past 4 decades might be at least part of the reason. I also think AI is terribly named. We are assigning “intelligence” to basically a statistical inference model before philosophers and psychologists have even figured out what “intelligence” is (at least in a non-racist way).
I know that both the quality and (especially) the quantity of inference done with machine learning algorithms is really impressive indeed. But when people advocate AI research as a step towards some "artificial general intelligence", people (rightly) will raise questions and start pooh-poohing it.
The naming does indeed matter here. The concept of general intelligence is filled with pseudo-science and has a history of racism (see The Mismeasure of Man by Stephen Jay Gould). Non-linear statistical inference with very large matrices could be assigned intelligence as it is very useful, but it is by no means the same type of intelligence we ascribe to humans (or even dogs for that matter).
If your plant actually looks like a moss you probably shouldn’t call it a rose. (Even though your moss is actually quite amazing).
It's because you have to pay really close attention to tell if it's real or hype. It's really easy to make a cool demo in machine learning, cherrypick outputs, etc.
People think more of what we already have is going to go farther. 1 horse to the carriage gets you to the market. 2 horses to the next village. 4 to town and 6 cross states. Given enough horses we should reach the moon, right?
With absolutely no evidence (as none can be had about the future) I believe that AI can be reached with computers and programming languages as different from the current ones as rockets are to horses.
Scientists in the 50s expected to get language translation within 10 years, as soon as computers had enough computational power. They were real scientists, not "data AI scientists" who have little mathematical culture and no awareness of brain studies and the problems in this field.
But yeah, all aboard the hype train, we have a dog who speaks English! Not a state machine that just does something similar to what it was programmed on using statistical tricks. This is SO COOL!! PROGRAMMERS ARE DEAD!!!111 WOOOHOO SCIEEENCEEE!!
The approach reminds me more of junior devs who have no interest in fully understanding the code/problem and they just make semi-random changes to the code until the compiler is happy/the test is green.
It does write its own tests, in the sense that it checks the generated programs against the example data provided and discards the ones that fail. I imagine many of the coding challenges it's trained on come with a few tests as well.
I meant actually coming up with examples consisting of specific problems and their correct solutions (and maybe some counterexamples.)
Ironically, I had just replaced 'test cases' with 'tests', because I thought that the former might seem too generic, and arguably satisfiable merely by rephrasing the problem statement as a test case to be satisfied.
That would imply the AI has already solved the problem, as it needs the solution in order to generate tests. E.g. an AI can't test addition without being able to add, and so on.
The problem here is to write a program, not to solve the problem that the program is intended to solve. Clearly people can write tests for programs without necessarily being able to write a program to meet the specification, and in some cases without being able to solve the problem that the program is required to solve (e.g., plausibly, a person could write tests for a program for solving Sudoku puzzles without being able to do that themselves, and it is possible to test programs that will be employed to find currently-unknown prime numbers.)
Having said that, your point is kind-of what I was getting at here, though in a way that was probably way too tongue-in-cheek for its own good: When we consider all the activities that go into writing a program, the parts that AlphaCode does not have to do are not trivial. Being given solved test cases is what allows it to succeed (sometimes) with an approach that involves producing a very large number of mostly-wrong candidates, and searching through them for the few that seem to work.
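To make that generate-and-filter approach concrete, here is a toy sketch (plain Python, nothing like AlphaCode's actual infrastructure): run each candidate against the example input/output pairs that ship with the problem and keep only the ones that agree.

    def passes_examples(program_src, examples):
        """Exec a candidate (expected to define `solve`) and check it
        against the example input/output pairs from the problem statement."""
        namespace = {}
        try:
            exec(program_src, namespace)       # the candidate may not even compile
            solve = namespace["solve"]
            return all(solve(inp) == out for inp, out in examples)
        except Exception:
            return False

    def filter_candidates(candidates, examples):
        """Keep only the candidates that agree with every provided example."""
        return [src for src in candidates if passes_examples(src, examples)]

    # Toy usage: two generated candidates for "return the sum of a list".
    examples = [([1, 2, 3], 6), ([10], 10)]
    candidates = [
        "def solve(xs):\n    return max(xs)",   # plausible-looking but wrong
        "def solve(xs):\n    return sum(xs)",   # survives the filter
    ]
    print(filter_candidates(candidates, examples))

Of course this only filters against the handful of examples given, which is exactly why passing them is no guarantee the submission is actually correct.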
Fun fact: dogs bark primarily to communicate with humans. Wolves (undomesticated dogs) don't really bark, and you wouldn't likely see a pack of dogs barking at each other. But humans are keenly able to tell what a dog is trying to express by the different sounds it makes. This is all a result of the coevolution of the two species.
this is cool and all but it’s not even close to AGI. it’s a sophisticated gimmick.
to go meta, I don’t believe that our way of approaching AI in general is going to get us to AGI.
Why?
Assume for a second that we could build something that could solve ANY problem.
What would happen next?
What does an intelligent human with zero emotions do? Nothing.
Fear of dying and the hardwired desire to pass on our genes (or to contribute something to society - which is just another way of shooting for immortality) is what drives us.
I predict that true AGI cannot come before a machine that has feelings, that is sentient.
We already have machines that do that. They're called humans. It's very arrogant of us to think we can recreate in silicon what nature has evolved over eons. Not saying it's not gonna happen - but not with the current approaches.
Also, a dog that speaks mediocre English? Please. This is insulting to dogs and shows our tendency to anthropomorphize everything.
I agree with your underlying premise - without stimuli such as pain and pleasure to give motivation it is difficult to foresee an AI which will think enough like humans to be useful or that we feel we can trust. But perhaps that is just chauvinism.
I think I agree with this take. We barely even know how our own brains learn and solve problems. We don't know exactly what the inner workings of AGI will look like... maybe this kind of problem solving is the emergence of that, perhaps in a different way to our own, perhaps not.
Hey it sure feels nice to know that now a "dog" is much more clever than me.
but then again, its approach is so vastly different that it's really incomparable with our typical modus operandi. It's throwing millions of monkeys at millions of typewriters and actually producing "to be or not to be" reliably.
I don't see AI writing Shakespeare in my lifetime though (I hope to die rather soon).
As I understand it, all of the progress in AI has come from taking existing theory and throwing enormous amounts of hardware at it. Which is interesting, sure, but whether it represents “progress” in the field of “artificial intelligence” is a different question.
Yep. Some efficiency gain is surely possible with "intelligent" autocomplete (i.e. codex instead of intellisense), but it must be code you wanted to write.
"If you think you're paying me for typing code, maybe you should have hired a typist and not a programmer?"
I think refactoring will be the hardest, but maintenance, structuring and domain work are all things that even the current Codex is good at.
I think with the current generation of algos we can get very good Google-style searches, but ones you can throw your own documents/code at. Later probably also DB schemas etc.
I am willing to take on your bet, as long as you agree to my condition... you can't feed it with any of the millions of lines of code previously created by the humans you aim to beat... ;-)
> Humans learn by looking at existing code and gleaning new ideas, and so does the AI, although it requires much more data.
Actually I doubt it. There are different ways to learn, the cheapest of which is indeed observation. There's also trial and error, which is how most enthusiasts start, with little more than a description of the grammar if it's primitive enough (famously so for the BASIC varieties). Of course this quickly leads to local minima and frustration (Turing tarpits, e.g.), but it also leads to independent discovery and rediscovery. At the next step of learning, I believe, it's the differentiation that makes ... all the difference, i.e. learning about the development of languages and eventually of computing machines per se. At a higher order of learning it cannot be limited to calculation without a semantic interface (and vice versa).
I mean, etymologies like ominous calx (Latin "pebble", cp. chalk, calcium) for calculation, and symbolical notation show the kind of "code" that we copy from. That said, I'd think at the current state of the art, the ai would have to look at the machine code as the representation of its output eventually (perhaps because I found myself wanting to do that), even if encoded through macros and programming language devices, which it could do entirely unsupervised.
However, understanding the task is a more general problem, isn't it?
Belter was probably joking.
A good logic programming ai could do competitive programming through purely deductive reasoning.
I just don’t see good evidence that that’ll be possible at a world-class level in 10 years.
I think we need AI to learn to simplify problems, because no human (at least not me) solves these problems as stated; you work on limited inputs, mostly via brute force, then you see some generalisation and you have your solution.
Are they already doing that? If not, expect a 2x improvement easily.
Also it would be great if the algos were aware of where in embedding space their solution sits and could then try to jump between "approaches".
Ok, I've skimmed the paper and they are clustering candidates by grouping on test results. Ok, that's a first step. Now please instead try grouping by a classification of the algorithm from another NN first.
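For anyone wondering what "clustering by test results" means mechanically, here is a toy sketch (candidates as plain Python callables and probe inputs made up by me, far simpler than the paper's setup): fingerprint each candidate by its outputs on some probe inputs, group identical fingerprints, and submit one representative from each of the largest clusters.

    from collections import defaultdict

    def behaviour_signature(solve, probe_inputs):
        """Fingerprint a candidate by what it outputs on the probe inputs."""
        sig = []
        for inp in probe_inputs:
            try:
                sig.append(repr(solve(inp)))
            except Exception:
                sig.append("<error>")
        return tuple(sig)

    def cluster_by_behaviour(candidates, probe_inputs):
        """Group candidates that behave identically; biggest clusters first."""
        clusters = defaultdict(list)
        for solve in candidates:
            clusters[behaviour_signature(solve, probe_inputs)].append(solve)
        return sorted(clusters.values(), key=len, reverse=True)

    # Toy usage: two behaviourally identical candidates and one outlier.
    cands = [lambda xs: sum(xs), lambda xs: sum(list(xs)), lambda xs: max(xs)]
    probe = [[1, 2, 3], [5, 5]]
    print([len(c) for c in cluster_by_behaviour(cands, probe)])  # [2, 1]

Grouping by behaviour rather than by source text is what lets the system avoid spending all ten submissions on near-duplicates of the same wrong idea.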
The places where AI excels still have very rigid rules with relatively low branching factors. It's also relatively easy to score both the final result and any intermediate stage. And all the training techniques rely on those factors tremendously.
My guess is that's where the difficulties will arise.
Did I read correctly that the AI generates millions of potential solutions per question and then filters these down? That's an interesting approach, and one that had occurred to me: have a critic evaluate AI-generated content, and then have it take many tries to come up with a sample that passes the evaluator.
I've come up with an algorithm (the same as the AI's) for solving that linear system using a CAS. Feels relevant because this breakthrough is heralding an era of much more powerful CASes.
We're getting close to an era where the drudgery in maths can be automated away.
I still don't believe deep learning is going to take us to AGI. The susceptibility of these systems to completely failing on weird inputs shows that it's a kind of massive memorization going on here, rather than true understanding.
Has anyone tried searching for new basic operations, below the level of neural networks? We've been using these methods for years, and I doubt the first major breakthrough in ML is the most optimal method possible.
Consider the extreme case of searching over all mathematical operations to see if something really novel can be discovered.
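In the spirit of that thought experiment, here is a deliberately tiny sketch of what "searching over operations" could look like at toy scale (the primitive set, depth, and target function are all made up): enumerate short compositions of primitive binary ops and score them against sample data.

    import itertools
    import operator

    PRIMITIVES = {"add": operator.add, "mul": operator.mul,
                  "max": max, "min": min}

    def search_pairwise_ops(samples, target, depth=2):
        """Try every length-`depth` sequence of primitives applied as
        acc = op(acc, y), starting from acc = x, and keep the best fit."""
        best = None
        for combo in itertools.product(PRIMITIVES.items(), repeat=depth):
            def candidate(x, y, combo=combo):
                acc = x
                for _name, op in combo:
                    acc = op(acc, y)
                return acc
            err = sum(abs(candidate(x, y) - target(x, y)) for x, y in samples)
            if best is None or err < best[0]:
                best = (err, [name for name, _op in combo])
        return best

    # Toy usage: rediscover that the target is (x + y) * y from three samples.
    samples = [(1, 2), (3, 4), (5, 1)]
    print(search_pairwise_ops(samples, lambda x, y: (x + y) * y))
    # -> (0, ['add', 'mul'])

This is closer to symbolic regression than to discovering genuinely new primitives, but it shows why the idea explodes combinatorially as soon as the primitive set or composition depth grows.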
> The susceptibility of these systems to completely failing on weird inputs ...
If you believe that people are "general intelligence" systems, then your comment doesn't imply that the existing artificial systems won't achieve general intelligence, because people fail massively on weird inputs.
Sometimes I think people’s self-awareness (internal world, whatever you want to call it) makes them think they’re smarter or more competent than they are. My partner has a better memory than me and she will continually point out cases where I make the exact same comment when responding to similar situations across time (driving past a mural on the road, walking past the same flower bed, etc.). I really do feel like a program responding to stimuli sometimes. And sure I have a rich internal world or whatever, but that isn’t so easy to discern from my external behavior. I think as I grow older the magic dissipates a bit and I can see how really, humans aren’t that unique and incredible. What’s incredible is that all this random faffing around at scale without much individual brilliance leads to the wonders of society. But we could divine that by looking at a termite mound.
All of which is to say that memorization as you describe it seems like a plausible path to AGI, and humans don’t deal with weird inputs well either. It’s not like we are trying to make the ultimate single intelligence that rules humanity (although that may eventually come to pass). But something of roughly human competence certainly seems achievable.
I think it depends on what’s meant by “deep learning”. If you mean the latest multi-billion parameter transformer architecture trained on narrow domain data then yeah, you’re probably right. If you mean large networks of simple computational units optimized by gradient descent on parallel hardware, why not?
> The susceptibility of these systems to completely failing on weird inputs shows that it's a kind of massive memorization going on here, rather than true understanding.
Isn’t that a truism? “Understanding” is equivalent to AGI. Nobody would argue that we have AGI yet, but the intelligence-sans-understanding is somewhat similar to animal intelligence, which was the precursor to human intelligence. What should scare us is that we know that animal to human was not a difficult step evolutionarily.
Perhaps most of us in this forum have only the most basic exposure to biology? I say this because with any serious exposure you can't help but be dumbfounded by the complexity and "creativity" of nature. Everything is amazing! The fact that the pieces fit together is just wild and scary and awesome. To think that gradient descent and backprop will give us general intelligence, one of the great mysteries of nature, is incredibly hubristic, IMO. It's statistical inference at scale, with some heuristics up/downstream. It's a cool and useful tool for sure!
>> Update: A colleague of mine points out that one million, the number of candidate programs that AlphaCode needs to generate, could be seen as roughly exponential in the number of lines of the generated programs.
To clarify, "one million" is the number of programs generated by AlphaCode on the CodeContests dataset, not the CodeForces one (although the former is a superset of the latter). The results on CodeContests are reported in table 5 (page 15 of the pdf) of the DeepMind preprint [1]. The results on CodeForces are reported in table 4 of the preprint (page 14 of the pdf).
I can't find where the paper lists the number of samples drawn for the CodeForces results, but in Section 4.4 the preprint says:
Sampling from transformer models can be easily parallelized, which allowed us to scale to millions of samples per problem (...).
Note the plural. "Millions" can mean one million, or a hundred million, but, as far as I can tell, the preprint never makes it clear which it is. Note that the results on CodeForces (table 4) are averaged over 3 evaluations, in each of which AlphaCode generated "millions" of samples and finally submitted 10. So it's 3 times "millions" of samples for the actual results in table 4 (CodeForces). I assume those were the top 3 of all evaluations.
The idea that the lines of code in a target program are related to the cardinality of the program space that must be searched before a target program can be found is not completely unfounded. For sure, the cardinality of the search space for programs is some function of the number of tokens that must be combined to form each program in that space. An _exponential_ function because we're talking about combinations of tokens (assuming only grammatical strings are generated it gets a bit better, but not by much). We can take lines-of-code as a rough proxy of number of tokens, and in any case it's clear that the cardinality of the set of one-line programs is less than the cardinality of the set of two-line programs, and so on.
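As a rough back-of-the-envelope illustration (numbers made up purely to show the shape of the growth): with a token vocabulary of size $V$ and programs of $n$ tokens there are $V^n$ possible token sequences, so at roughly $t$ tokens per line a $k$-line program space has on the order of $V^{kt}$ members. Even modest values like $V = 50$, $t = 5$, $k = 10$ already give $50^{50} \approx 10^{85}$ candidates, which is why lines of code works as a crude proxy for the exponential blow-up.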
I'm not sure if the "Update" bit above is claiming that AlphaCode must generate the _entire_ program space of _all_ k-line programs before it can find a solution to a given problem. To be honest, even I don't think that AlphaCode is _that_ bad. Having to go through the entire program space to find one program is the worst case. On the other hand, AlphaCode does perform very poorly, so who knows?
Regarding talking dogs, I have never met any, but I have heard of a horse that can do arithmetic [2].
Frankly, asking a computer to write code is like asking a dog to speak facts in straight ZFC. What else can I say; AI research has conspired to take over all other areas of computer science. I wonder who exactly started this hype. Also, what else will come after this stupid AI craze?
If we hit another AI winter the next big thing would be new/better energy sources and space travel. 3d printing will also enable a return to local scale manufacturing. Crypto will devour more industry.
We’ll have mostly automated local factories producing whatever you need and be able to sell it for untraceable crypto.
I want that to happen because it will enable resistance movements against the misaligned AI that will inevitably emerge.
I model misaligned AI as like an imperialist state. It’s impossible for a single power to conquer the world because every other state allies against it.
That will be how AI alignment is solved.