
>But it wasn't designed. It's not a computer program, where one can make confident predictions about its limitations based on the source code.

It definitely is exactly that. It's not any more special than any other program that you can write. I am not totally sure that what you describe could ever exist at all.

What makes this program "magic" compared to any other program exactly? There is no physical difference between it and a "regular" program. Both of them are a bunch of source code that gets compiled into an executable and run by the underlying OS and hardware. There is nothing physically different between it and other software.

https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-...




Another example of "evolved behavior" is here, where a robot is trained to walk, run, etc:

https://mrl.snu.ac.kr/research/ProjectAgile/Agile.html

This is done using neural networks. I believe a project like that can be done by a few researchers over months, not years?

If you do this using "regular programming" instead, you'd have to write an insanely complex application that uses inverse kinematics etc.

https://en.wikipedia.org/wiki/Inverse_kinematics

A project like that requires a large team of developers, working over many years. Boston Dynamics is one example.
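
To make "explicit implementation" concrete: even the simplest textbook case, a 2-link planar arm, already needs closed-form trigonometry like the toy Python sketch below (my own illustration, not taken from either linked project), and a walking robot adds dozens of joints, contact forces, balance control, and so on.

    import math

    def two_link_ik(x, y, l1, l2):
        """Analytic inverse kinematics for a 2-link planar arm.
        Returns the two joint angles that place the end effector at (x, y)."""
        d = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
        if abs(d) > 1.0:
            raise ValueError("target out of reach")
        theta2 = math.acos(d)  # elbow angle ("elbow down" solution)
        theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                               l1 + l2 * math.cos(theta2))
        return theta1, theta2

    print(two_link_ik(1.0, 1.0, 1.0, 1.0))  # shoulder/elbow angles in radians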


There is no such thing as non-regular programming though, that's my point.

All programs that run on the computer have the same "power" in terms of what they can do and what can be computed using them. A program that implements a neural net is not inherently any different than a silly python script. One just does a lot more stuff and is much more interesting.


Sure, if you drill down, everything is just a Turing machine.

And then we can drill down even further where everything is just physics with atoms, quantum mechanics, etc.

So you're not different from a computer. Both are just physics.

But that's not a useful world view in my opinion.

I think "regular" and "non-regular" programming is a useful distinction.

In regular programming, I have to write explicit implementations of the algorithms in the program.

In "non-regular programming" (neural networks), I just have to know how to set up and train neural networks.

Once I do that, the neural networks can be trained to evolve algorithms that I myself don't know how to implement.

Don't you see the big difference between "I have to code the algorithms" and "the computer does it for me"?
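
To make it concrete with a deliberately trivial toy example of mine (Python, assuming the usual gradient-descent setup): in the "regular" version I write the function myself; in the "non-regular" version I only write a training loop, and the training finds the coefficients from examples.

    import numpy as np

    # "Regular" programming: I implement the algorithm explicitly.
    def f_explicit(x):
        return 2.0 * x + 1.0

    # "Non-regular" programming: I only set up a model and a training loop;
    # gradient descent finds the parameters from example data.
    rng = np.random.default_rng(0)
    xs = rng.uniform(-1, 1, size=100)
    ys = 2.0 * xs + 1.0                      # training examples

    w, b = 0.0, 0.0                          # model: y = w*x + b
    lr = 0.1
    for _ in range(1000):
        err = (w * xs + b) - ys
        w -= lr * np.mean(err * xs)          # gradient step on squared error
        b -= lr * np.mean(err)

    print(w, b)   # ends up near 2.0 and 1.0 without me ever coding "2x + 1"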


>So you're not different from a computer. Both are just physics.

Well I do actually believe this! To me it's the only logical thing. The laws of physics apply equally to a brain and a computer, one is just a lot more fancy than the other one.

>Don't you see the big difference between "I have to code the algorithms" and "the computer does it for me"?

I do see the difference and understand what you are getting at. I agree that it's useful to distinguish them in general.

It's also useful to realize that it is just a regular program at the end of the day too, just a really complicated one that does some neat stuff. Believing that AI is "magic" is pretty dangerous I think.


> Well I do actually believe this! To me it's the only logical thing. The laws of physics apply equally to a brain and a computer, one is just a lot more fancy than the other one.

I agree 100% with this statement taken in isolation. I don't believe there's more to a human brain than physics.

There is an interesting theory about the brain using quantum mechanics (https://en.wikipedia.org/wiki/Quantum_mind), but that still puts it firmly in the realm of physics even if it is true.

My point about "it's all just atoms" was about the fact that we need mental models to discuss things. The models will never be perfectly accurate. Just like software frameworks, they're leaky abstractions. Sure, some models are just plain wrong and should be discarded, but in general we can't reason without them.

And it looks like you agree with that (?):

> I do see the difference and understand what you are getting at. I agree that it's useful to distinguish them in general.

Thanks!

So, on to the core of the discussion:

> It's also useful to realize that it is just a regular program at the end of the day too, just a really complicated one that does some neat stuff.

Sure. If you look at GPT-4 as a whole, it's just a regular program that executes like any other program. It has instructions that use internal data to process inputs from the user and responds with an output to the user.

Nothing new here. Any Turing machine can do this, given enough time and memory. Heck, I saw a video of an 8-bit AVR booting Linux using a simple ARM instruction simulator. It only took 3.5 hours to get to the login prompt :)

> Believing that AI is "magic" is pretty dangerous I think.

Not sure what you mean exactly with the "magic" part? Is this a point about something other people think that is inaccurate? Or did I write something that you don't agree with?

To restate my position: I currently believe the neural networks inside the LLMs used to be "stochastic parrots", but that we saw a step-change in performance 1-2 years ago.

We reached a new level of model size (>100B parameters), training data (trillions of tokens), and training time (>1M GPU hours). Somehow the backpropagation training of the neural networks changed the network parameters so that algorithmic processing capabilities emerged.

This isn't fundamentally different from neural networks evolving algorithms to perform OCR, FFTs, balancing an inverted pendulum, playing Go, etc.

Here, the LLMs evolved language processing algorithms. Not only that, they started evolving algorithms for reasoning, abstraction, logic, planning, and problem-solving. Together with that they also formed models about the world to help with the reasoning.

This was driven by the training which seeks to optimize the accuracy of the next word prediction. Lookup tables only get you so far here. At some point you need to understand the context to accurately predict the next word.
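
In code, that training objective is basically just cross-entropy on the next token. A minimal PyTorch sketch of mine, with made-up shapes, not how any particular LLM is actually wired:

    import torch
    import torch.nn.functional as F

    vocab_size, batch, seq = 50_000, 2, 16

    # Pretend these are the model's outputs: one score per vocabulary word,
    # for every position in every sequence of the batch.
    logits = torch.randn(batch, seq, vocab_size, requires_grad=True)

    # The training target at each position is simply the *next* token
    # from the training text.
    next_tokens = torch.randint(0, vocab_size, (batch, seq))

    # The whole objective: put high probability on the token that came next.
    loss = F.cross_entropy(logits.reshape(-1, vocab_size),
                           next_tokens.reshape(-1))
    loss.backward()   # backpropagation nudges parameters to reduce this loss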

For example, in French and German, there are multiple variants of the word "it". To translate the English phrases: "The box wouldn't fit in the suitcase because it was too large" and "The box wouldn't fit in the suitcase because it was too small", you need to understand if "it" refers to the box or the suitcase.

There are about 10^80 atoms in the universe. Even if you assume a tiny vocabulary of 100 words, you get 10^80 possible combinations (100^40) from stringing just 40 words together. And even if you had unlimited storage, there aren't 10^80 tokens to train with. And even with unlimited storage and examples, we don't have unlimited CPU cycles for the training.
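
(If you want to check that arithmetic, it's a two-liner in Python:)

    vocab, length = 100, 40
    print(vocab ** length == 10 ** 80)   # True: 100^40 = (10^2)^40 = 10^80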

So it's clear to me that a "stochastic parrot" (or Chinese room) will be very simplistic even in 1000 years, no matter how much computers progress in that time. And therefore, the latest LLMs must have evolved algorithms for reasoning, abstraction, logic, planning, and problem-solving.

I don't know if that is what you mean by "magic"?

To me, that's not magic. It's just an algorithm (neural network training) creating algorithms and data structures. Impressive as heck for sure, but not magic. I could be wrong, and am more than happy to consider alternatives if you have any?

We have recent examples of similar emergent behavior from big neural networks where they evolve algorithms far beyond what a human programmer can create. For example, AlphaGo, which beat the human Go champion. The AlphaGo programmers could never beat him, but they managed to evolve a program that was "smarter" than them (no, Go-playing is not general intelligence).

Now, I could be wrong about the level of intelligence of the latest LLMs like GPT-4. Maybe they're a lot dumber than they appear. But in that case I'm in good company. From what I can tell, the major AI researchers agree with me that GPT-4 possesses some form of intelligence. It's not a stochastic parrot.

And to end with something I agree with: You wrote that whatever happens because of LLMs in the near future, it's because of human actions. I agree. The LLMs have no agency in themselves. It's humans that use and misuse them.


Wow this is a really good reply. Thanks for taking the time to write all of this!

I think I agree with pretty much all that you have said here actually, this is one of the better and more accurate descriptions of the current state of things that I have read in general!

As far as the magic thing goes, I was replying to this specifically, and to similar statements made in other parts of the thread, in the original post itself (the blog post or whatever you call it), and even more so in the media:

>But it wasn't designed. It's not a computer program, where one can make confident predictions about its limitations based on the source code.

There have been media headlines about the potential for modern AIs to turn evil and destroy the human race, sci-fi movie style. I think people who believe this do believe that current AI is "magic" in some sense, but I'm not totally sure how to pin that down exactly.


Thanks, I appreciate it!

> There have been media headlines about the potential for modern AIs to turn evil and destroy the human race sci-fi movie style.

Yeah, the public debate isn't very balanced in either direction.


No, machine learning models are not programs and they are not compiled from source code. They are the output of non-deterministic matrix multiplication operations which take encoded data as the input. They can then be used as a black box by an actual program to calculate useful outputs.

The program which takes your text and runs a final calculation on it against the machine learning model to get an output is a program. But that program is not doing anything interesting. All the interesting work was done when the model was cooked up in a black-box non-deterministic process by some other GPUs somewhere else well before it ever came near the inference program.
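
To make the "not doing anything interesting" part concrete, the inference side is roughly just this (a numpy cartoon with made-up shapes, not any real model's architecture):

    import numpy as np

    # Weights produced by training, loaded from disk as plain numbers.
    rng = np.random.default_rng(0)
    W = rng.normal(size=(256, 10_000))      # made-up shapes
    b = np.zeros(10_000)

    def next_token(hidden_state):
        """One "final calculation": matrix multiply, then pick the
        highest-scoring vocabulary entry."""
        logits = hidden_state @ W + b
        return int(np.argmax(logits))

    print(next_token(rng.normal(size=256)))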


> "They are the output of non-deterministic matrix multiplication operations"

Just a nit-pick: Aren't neural networks and LLMs perfectly deterministic?

I think you can reproduce GPT-4 perfectly if you have access to the same source code, training data, and the seeds for the random number generators that they used?
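
At least at the level of the code you write yourself, that's what seeding is for. Something like this (standard calls; by itself it doesn't cover hardware-level sources of non-determinism):

    import random
    import numpy as np
    import torch

    def seed_everything(seed: int) -> None:
        """Seed the common random number generators so a run can be repeated."""
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)   # also seeds the CUDA RNGs on current versions

    seed_everything(42)
    print(torch.rand(3))   # the same numbers every time the script is run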

As a side note, I think it'd be theoretically possible to do this on a small 8-bit microcontroller given enough time and external storage. That's the beauty of Turing machines.

This would not be practical in the least. But it sure was cool seeing a guy boot Linux in just 3.5 hours on a small 8-bit AVR microcontroller.

https://dmitry.gr/?r=05.Projects&proj=07.%20Linux%20on%208bi...


Multi-core math on GPU and CPU is non-deterministic for performance and scheduling reasons.

The errors are small rounding errors that maybe don't have any serious implications right now. But the larger models get, and the more operations and cores it takes to train them, the more the rounding errors creep up.


> Multi-core math on GPU and CPU is non-deterministic for performance and scheduling reasons.

Ok, I see what you mean.

I can see how that could be the case. It depends on how the software is designed.

Now that I looked it up, I was surprised to see that PyTorch may generate non-reproducible results: https://pytorch.org/docs/stable/notes/randomness.html

But it looks like the sources of non-determinism in PyTorch are known, and can be avoided with a lot of work and loss of performance?
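
From skimming that page, the knobs seem to be roughly these (my summary; some of them cost performance or raise errors for ops that have no deterministic implementation):

    import os
    import torch

    # Must be set before CUDA kernels run, per the PyTorch notes on cuBLAS.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

    torch.manual_seed(0)
    torch.use_deterministic_algorithms(True)   # error out on non-deterministic ops
    torch.backends.cudnn.deterministic = True  # pick deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False     # no auto-tuning (it varies per run)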

And for the general case, I don't think it's impossible to write deterministic code for multi-core processors?

> The errors are small rounding errors

But rounding errors don't imply non-deterministic answers, right? Just that the answer is different from the true answer?

Calculating the square root of 2 will have a rounding error with 32-bit floating point, but are you saying that you'll get different bit patterns in your FP32 due to rounding errors?
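
A small numpy illustration of the distinction I mean (my own example): the rounding error in sqrt(2) is perfectly deterministic; what can change between runs is the order of operations, because floating-point addition isn't associative.

    import numpy as np

    # Rounding error, but deterministic: the same bits every run.
    print(np.float32(2.0) ** np.float32(0.5))

    # Non-associativity: summing the same float32 values in a different
    # order can give a different result, which is what varying thread
    # scheduling on a GPU/CPU exposes.
    xs = np.random.default_rng(0).random(100_000).astype(np.float32)
    forward = np.float32(0.0)
    for v in xs:
        forward += v
    backward = np.float32(0.0)
    for v in xs[::-1]:
        backward += v
    print(forward == backward)   # quite possibly False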


Thanks for saving me the time to write the same reply :)

To expand a bit:

I can write simple image processing code that will find lines in an image.

But I can't write the code to perform OCR (optical character recognition).

However, in the early 90's, I wrote a simple C program that trained a neural network to perform OCR. It was a toy project that took a weekend.

There are many things where I could train a neural network to do something, but couldn't write explicit source code to perform the same task.
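
For scale: these days the equivalent of that weekend C project is a handful of lines with off-the-shelf tools. A rough sketch using scikit-learn's small bundled digits set (obviously not my original code, just an illustration):

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    # 8x8 grayscale digit images; a toy stand-in for "real" OCR.
    digits = load_digits()
    X_train, X_test, y_train, y_test = train_test_split(
        digits.data, digits.target, test_size=0.2, random_state=0)

    # A small multilayer perceptron trained by backpropagation.
    net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
    net.fit(X_train, y_train)

    # I never wrote a character-recognition algorithm; the training found one.
    print(net.score(X_test, y_test))   # typically well above 0.9 on this set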

If you (chlorion) look up "genetic algorithms", you'll find many clear examples where very impressive algorithms were evolved using a simple training program.


So I reread here and I think I misunderstood what you meant.

I meant that the process of generating the models, and otherwise interacting with them are regular programs. The model itself is I guess more like a database or something, but it too is just regular data.

The original thing I was replying to was claiming that the process in general was "not a program", as if there was some magic thing going on that made the model different from output of other programs, or the training was somehow magical. (that is how I read it at least)


If they aren't programs, how do they run on computers?

CPUs and GPUs physically cannot do anything other than execute programs encoded as machine instructions.

What you are describing is that the language model is "magic" and breaks the laws of physics. I don't believe in magic personally though.



