
Hackers by Steven Levy is an incredible story of the industry's early years (the '60s through the '80s) and of the characters who were in it for the "love of the game" versus what is more common now ("status and money"). It has a lot of heroes like Woz, but also ones who are less well known in this day and age (Gosper and Greenblatt!). If you are familiar with, and a fan of, Dealers of Lightning or Dream Machine, check out Hackers! (This is not a paid endorsement.)

"The Last Lone Inventor: A Tale of Genius, Deceit, and the Birth of Television" is a great book detailing the Farnsworth journey.

If you have any details written up on your kit (in particular what solar setup you used), I'd appreciate a link. I'm looking to do something similar.


https://traquito.github.io/

https://www.zachtek.com/product-page/wspr-tx-pico-transmitte...

The solar cells are polycrystalline. You can buy them on AliExpress (they're very brittle; it takes practice to solder them).

I didn't use the Traquito solar truss; it was too hard to solder. I used a foam dinner plate and solar tabbing wire.


We do the token counting on our end, literally just by running tiktoken on the content chunks (although I think it's usually one token per chunk). It's a bit annoying, and I too expected they'd include the usage block, but it's one line of code if you already have tiktoken available. I've found that the accounting on my side lines up well with what we see on our usage dashboard.
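
For anyone curious, a minimal sketch of that approach (the encoding name here is an assumption; pick the one matching your model):

    import tiktoken

    # Assumed encoding; tiktoken.encoding_for_model("gpt-4") also works
    enc = tiktoken.get_encoding("cl100k_base")

    def count_tokens(chunks):
        # Sum token counts across the content chunks
        return sum(len(enc.encode(chunk)) for chunk in chunks)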


As an FYI, this is fine for rough usage, but it's not accurate. The OpenAI APIs inject various tokens you are unaware of into the input for things like function calling.


I struggled to get an intuition for this, but on another HN thread earlier this year I saw a recommendation for Sebastian Raschka's series, starting with this video: https://www.youtube.com/watch?v=mDZil99CtSU (and maybe the next three or four). It was really helpful for getting a sense of the original 2014 concept of attention, which is easier to understand but less powerful (https://arxiv.org/abs/1409.0473), and then seeing how it becomes powerful with the more modern notion of attention. So if you have a reasonable intuition for "regular" ANNs, I think this is a great place to start.


+1, you beat me to the punch! I think it's helpful to start with simple RL and ignore the "deep" part to get the basics; the first several lectures in this series do that well. It helped me build a simple "cat and mouse" RL simulation (https://github.com/gtoubassi/SimpleReinforcementLearning) and ultimately a reproduction of the DQN Atari game-playing agent (https://github.com/gtoubassi/dqn-atari). A sketch of what I mean by "simple RL" is below.
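
Here is a minimal sketch of tabular Q-learning on a toy chain world (the environment is made up purely for illustration; it's not from the lectures):

    import random

    n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
    alpha, gamma, eps = 0.1, 0.9, 0.1   # learning rate, discount, exploration
    Q = [[0.0] * n_actions for _ in range(n_states)]

    def step(s, a):
        # Move along the chain; reaching the last state gives reward 1
        s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
        return s2, (1.0 if s2 == n_states - 1 else 0.0)

    for _ in range(1000):
        s = 0
        while s != n_states - 1:
            if random.random() < eps:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2, r = step(s, a)
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2

    print(Q)  # learned values should favor moving right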


Token counting is important when you are injecting fetched data into the prompt, to make sure you don't overflow the context window (e.g. in retrieval-augmented generation). You want to give the LLM as many facts as will fit in the prompt to improve the quality of its response. So even with billions of dollars... token counting is a thing.
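
As a sketch, the bookkeeping can be as simple as greedily packing retrieved chunks until a budget is hit (the budget number and encoding are assumptions):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # assumed encoding

    def pack_chunks(chunks, budget=3000):
        # Keep retrieved chunks, most relevant first, until the
        # token budget for the prompt is exhausted
        kept, used = [], 0
        for chunk in chunks:
            n = len(enc.encode(chunk))
            if used + n > budget:
                break
            kept.append(chunk)
            used += n
        return kept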


+1, the book is entertaining (especially for engineers). It was also released under the title "The Newtonian Casino".




And I, too, can write a fictional story about how I am evil.

I think people are reading too much intention into the output.


I agree this is an incredibly interesting paper. I am not a practitioner, but I interpreted the Gradient article differently. They didn't directly find 64 nodes (activations) that represent the board state, as I think you imply. They trained "64 independent two-layer MLP classifiers to classify each of the 64 tiles". I interpret this to mean the activations are fed into a two-layer MLP whose goal is to predict the state of a single tile (white, black, or empty), and that this is done 64 times, once for each tile (64 separately trained networks).
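
In code, my reading of a single probe would look something like this (a sketch only; the dimensions and training details are assumptions, not from the paper):

    import torch
    import torch.nn as nn

    d_model, d_hidden = 512, 128  # assumed activation / hidden sizes

    # One two-layer MLP probe for one tile; repeat 64 times
    probe = nn.Sequential(
        nn.Linear(d_model, d_hidden),
        nn.ReLU(),
        nn.Linear(d_hidden, 3),   # logits for white / black / empty
    )

    acts = torch.randn(1024, d_model)        # stand-in activations
    labels = torch.randint(0, 3, (1024,))    # stand-in tile states
    loss = nn.functional.cross_entropy(probe(acts), labels)
    loss.backward()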

As much as I want to be enthusiastic about this, it's not entirely clear to me that such a feat is surprising. For example, it may be possible to train a two-layer MLP to predict the state of a tile directly from the inputs. It may be that the most influential activations are closer to the inputs than to the outputs, implying that Othello-GPT itself doesn't have a world model, and instead showing only that you can predict board colors from the transcript. Again, not a practitioner, but once you are indirecting internal state through a two-layer MLP, it gets less obvious to me that the world model is really there. I think it would be more impressive if they took only "later" activations (further from the input) and used a linear classifier, to ensure the world model isn't in the tile predictor instead of in Othello-GPT. I would appreciate it if somebody could illuminate this or set my admittedly naive intuitions straight!

That said, I am reminded of another OpenAI paper [1] from way back in 2017 that blew my mind. They did unsupervised "predict the next character" training on 82 million Amazon reviews, then used the activations to train a linear classifier to predict sentiment. And it turns out they found a single neuron activation responsible for the bulk of the sentiment!

[1] https://openai.com/blog/unsupervised-sentiment-neuron/
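
The probing setup there is remarkably simple, roughly like this sketch (stand-in data here; with the real activations, a single coordinate of the learned weights dominates):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    acts = np.random.randn(2000, 256)        # hidden states per review (stand-in)
    labels = np.random.randint(0, 2, 2000)   # positive / negative sentiment

    clf = LogisticRegression(max_iter=1000).fit(acts, labels)
    # With real data, inspecting the weights reveals the "sentiment neuron"
    top_unit = int(np.argmax(np.abs(clf.coef_[0])))
    print(top_unit, clf.coef_[0][top_unit])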


Right, so the 64 probes are able to look at Othello-GPT's internals and are trained using the known board-state-to-Othello-GPT-internals data. The article says:

"It turns out that the error rates of these probes are reduced from 26.2% on a randomly-initialized Othello-GPT to only 1.7% on a trained Othello-GPT. This suggests that there exists a world model in the internal representation of a trained Othello-GPT."

I take that to mean that the 64 trained probes are then shown other Othello-GPT internals and can tell us what the state of their particular square is 98.3% of the time. (We know what the board would look like, but the probes don't.)

As you say "Again, not a practitioner but once you are indirecting internal state through a 2 layer MLP it gets less obvious to me that the world model is really there."

But then they go back and actually mess around with Othello-GPT's internal state (using the probes to work out how), changing black counters to white and so on, and this directly affects the next move Othello-GPT makes. They even do this for impossible board states (e.g. two unlinked sets of discs) and Othello-GPT still comes up with correct next moves.

So surely this proves that the probes were actually pointing to an internal model? Because when you mess with the model in a way that should affect the next move, it changes Othello-GPT's behaviour in the expected way?


What is an MLP?


https://en.m.wikipedia.org/wiki/Multilayer_perceptron

A "classic" neural network, where every node in layer i is connected to every node in layer i+1.


Multi-layer perceptron, a synonym for neural network, but perhaps with the additional implication that it is fully connected.


It's not a synonym for NNs. It's one specific NN architecture, consisting of an input layer, an output layer, and a number of hidden layers in between. It's feed-forward and fully-connected, as you said.
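
Concretely, in e.g. PyTorch that architecture is just stacked fully-connected layers (the sizes here are arbitrary):

    import torch.nn as nn

    mlp = nn.Sequential(
        nn.Linear(784, 256),  # input layer -> hidden layer 1
        nn.ReLU(),
        nn.Linear(256, 64),   # hidden layer 1 -> hidden layer 2
        nn.ReLU(),
        nn.Linear(64, 10),    # hidden layer 2 -> output layer
    )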


Multilayer Perceptron

