celltalk's comments | Hacker News

This is the Library of Alexandria, but in digital format. Amazing work!


All of these smaller-model results suggest that we need to incorporate pruning into model training. NEAT was one of my favorite algorithms of all time. Same thing with the BitNet models, which keep showing that the information a neural network actually needs is not that much. And again, it is the same with us: we use much less energy than a regular network, so there seems to be an immense waste of energy in training these models.
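Something like the hand-wavy sketch below is what I mean by folding pruning into training: every so often, zero out the smallest-magnitude weights so the network is forced to concentrate information in a shrinking subnetwork. Toy PyTorch, with random data standing in for a real loader; the schedule and sizes are made up, not taken from any paper.

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    # Toy model and fake data; placeholders only.
    model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    loader = DataLoader(TensorDataset(torch.randn(1024, 784),
                                      torch.randint(0, 10, (1024,))), batch_size=64)

    def prune_smallest(model, fraction):
        """Zero the `fraction` of weight-matrix entries with the smallest magnitude."""
        all_w = torch.cat([p.detach().abs().flatten()
                           for p in model.parameters() if p.dim() > 1])
        threshold = torch.quantile(all_w, fraction)
        with torch.no_grad():
            for p in model.parameters():
                if p.dim() > 1:
                    p.mul_((p.abs() > threshold).float())

    for epoch in range(10):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        # Ramp target sparsity from 5% to 50%. A real method would also keep a
        # persistent mask so pruned weights stay pruned between epochs.
        prune_smallest(model, fraction=min(0.05 * (epoch + 1), 0.5))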

My intuition tells me the pre-training paradigm will shift immensely in the near future, because we have started to understand that we don't need all these parameters: the subnetworks seem to be very robust at preserving information in high dimensions. We keep saying "curse of dimensionality", but it is more like a bliss of dimensionality that we keep seeing. Network redundancy still seems to be very high, given that BitNet is more or less comparable to other LLMs.

This basically shows that over 50% of the neural net is gibberish! The reason is that the objective function simply does not include it.

Again, my intuition tells me that the neural scaling laws are incomplete as they stand, because they lack an efficiency parameter that needs to be taken into account (or one that was simply left out due to corporate greed).

And this is what we are seeing as “the wall”.

I am no expert in neural network theory or in math, but I would assume the laws should be something in the vicinity of this formulation/simulation:

https://colab.research.google.com/drive/1xkTMU2v1I-EHFAjoS86...

and should encapsulate Shannon's channel capacity. I call them generalized scaling laws, since they include what should have been included in the first place: entropy.
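I can't reproduce the notebook here, but the shape I have in mind is roughly a Chinchilla-style power law whose irreducible term is the entropy of the data source, so the loss can never drop below what the channel allows. The constants below are placeholders in the rough range of the published Chinchilla fits, not anything I have actually fit:

    # Toy sketch, not the notebook's actual code: a power law in parameters N and
    # tokens D, floored by the entropy H of the data source, i.e. Shannon's limit
    # is the irreducible loss. A, B, alpha, beta are placeholder constants.
    def generalized_scaling_law(N, D, H, A=406.4, B=410.7, alpha=0.34, beta=0.28):
        """Expected loss in nats/token; never drops below the source entropy H."""
        return H + A / N**alpha + B / D**beta

    # e.g. a 1B-parameter model on 100B tokens of a source with ~1.7 nats/token entropy
    print(generalized_scaling_law(N=1e9, D=1e11, H=1.7))   # about 2.4 with these constants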


I seem to recall that there was a recent theory paper that got a best paper award, but I can't find it.

If I remember correctly, their counter-intuitive result was that big overparameterized models could learn more efficiently, and were less likely to get trapped in poor regions of the optimization space.

[This is also similar to how introducing multimodal training gives an escape hatch to get out of tricky regions.]

So with this hand-wavy argument, it might be the case that two-phase training is needed: a large, overcomplete pretraining phase focused on assimilating all the knowledge, and a second phase that makes it compact. Or, alternatively, there is a hyperparameter that controls overcompleteness vs. compactness and you adjust it over training.
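A minimal sketch of the single-knob version, assuming the knob is a target sparsity that stays at zero during the overcomplete phase and then ramps up; the cubic ramp is borrowed from gradual-magnitude-pruning work, and the names and numbers are made up:

    def target_sparsity(step, total_steps, dense_fraction=0.5, final_sparsity=0.9):
        """Fraction of weights to prune at `step`: zero during the overcomplete
        (knowledge-assimilation) phase, then a cubic ramp up to `final_sparsity`."""
        ramp_start = int(dense_fraction * total_steps)
        if step < ramp_start:
            return 0.0
        progress = (step - ramp_start) / max(1, total_steps - ramp_start)
        return final_sparsity * (1 - (1 - progress) ** 3)

    # With the defaults, about 79% of the weights are already pruned at 75% of training.
    print(target_sparsity(step=75_000, total_steps=100_000))   # 0.7875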


I don't see that as counter-intuitive at all. If you have a barrier in your cost function in a 1-D model, you have to cross over it no matter what. In 2-D it could be just a mound that you can go around. More dimensions mean more ways to go around.


This is also how the human brain works. A young baby will have something more similar to a fully connected network, whereas a Biden-type elderly brain will be more of a sparse, minimally connected feed-forward net. The question is: (1) can this be adjusted dynamically in silico, and (2) if we succeed in that, does fine-tuning still work?


You don't have to compare to old age. Even a 10-year-old child's brain has been pruned immensely compared to its baby self.


The lottery ticket hypothesis paper from 2018?


Seems that way. Gigantic model, hit the jackpot, prune the nonsense. It doesn't seem like smaller models hold enough tickets.
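Roughly the recipe from that paper, as I remember it: train the big network, keep only the largest weights, rewind the survivors to their original initialization, and retrain. Sketch only (train_fn is a placeholder, and a faithful run would re-apply the mask every step and usually prune iteratively):

    import copy
    import torch

    def find_winning_ticket(model, train_fn, prune_fraction=0.8):
        """Lottery-ticket-style sketch: train big, prune small weights, rewind, retrain."""
        init_state = copy.deepcopy(model.state_dict())     # remember the "ticket numbers"
        train_fn(model)                                     # 1. train the over-parameterized net
        masks = {}
        with torch.no_grad():
            for name, p in model.named_parameters():
                if p.dim() > 1:                             # prune weight matrices only
                    threshold = torch.quantile(p.abs().flatten(), prune_fraction)
                    masks[name] = (p.abs() > threshold).float()
        model.load_state_dict(init_state)                   # 2. rewind to initialization
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name in masks:
                    p.mul_(masks[name])                     # 3. keep only the winning ticket
        train_fn(model)                                     # 4. retrain the sparse subnetwork
        return model, masks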


I guess we can think of it like one giant funnel; it gets narrower as it goes down.

Versus trying to fill something through just a narrow tube: you spill most of what you put in.


"Train large, then compress"


> This basically shows that over 50% of the neural net is gibberish! The reason is that the objective function simply does not include it.

This is a mischaracterization of sparsity. Performance did drop, so the weights are not gibberish. And on training vs. pruning: you can't train your way directly into the final sparse state, you can only prune your way there.


The fact that you can prune a model will not make it smarter; the wall still stands. I think what explains the wall is that we can't scale organic data exponentially, and we have already covered the most useful types.

Going forward we will accumulate truly useful data at a linearly growing rate. This fundamentally breaks the scaling game: if your model and compute expand exponentially but your training data grows only linearly, the efficiency won't be the same.
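Back-of-the-envelope version of that mismatch, with every number invented for illustration (the ~20 tokens per parameter rule of thumb is the Chinchilla one):

    # Exponential demand vs. linear supply: compute-optimal training wants roughly
    # ~20 tokens per parameter, while genuinely new organic text grows by a fixed
    # amount per year. All figures are made up.
    def tokens_available(year):
        return 15e12 + 1e12 * year          # say, +1T truly new tokens per year

    def tokens_needed(params):
        return 20 * params                  # Chinchilla-style rule of thumb

    for year, params in enumerate([70e9, 200e9, 600e9, 1.8e12]):   # params ~3x per year
        print(f"year {year}: need/have = {tokens_needed(params) / tokens_available(year):.2f}")
    # The need/have ratio goes 0.09, 0.25, 0.71, 2.00: it crosses 1 within a few 3x jumps.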

Synthetic data might help us pad out the training sets, but the most promising avenue, I think, is to use user-LLM chat logs. Those logs contain real-world grounding and a human in the loop: millions of humans doing novel tasks. But that, too, only scales linearly with time.

No way around it: the whole internet only enters the training set for the first time once. After that it's linear time.


Don't we still have a lot of video and other non-text real-world data to go through? Feels like a possible break could come from there.


Generally speaking, text-only models manage to learn a huge amount about the visual world, so when you then train the model on video it might have less to learn. Video is also less abstract than text, generally. But I am sure we can still extract useful learning from video; it's probably expensive, but we'll have to do it at some point.


Given how much of the web is AI-generated slop now, I think going forward it's even worse than you suggest.

I have a copy of RefinedWeb locally, so I have a billion pre-ChatGPT documents for my long-term use.


In mice, ~30% of synapses are silent [1]. The Neuralink team is finding that most neurons are silent where they probe [2]:

> Also, most of them are silent. They don’t really do much. Or their activities are… You have to hit it with just the right set of stimulus.

> ... When you place these electrodes, again, within this hundred micron volume, you have 40 or so neurons. Why do you not see 40 neurons? Why do you see only a handful? What is happening there?

(Yes, I understand LLMs aren't brains.)

[1] https://news.mit.edu/2022/silent-synapses-brain-1130

[2] https://youtube.com/watch?v=Kbk9BiPhm7o&t=7056


That's why the survey exists. It could mean something totally different to you, and if you would like to share that, it is just there.


Isn't there? I think the current meta is promising AGI and waiting to see what happens.


Talking heads are inevitably talking about AGI, but there's no public expectation, at least not from those who understand the difference between AI and AGI. We are definitely in an AI bubble, though.

A bubble is characterized by blind, excessive expectations of an existing thing, like what the dotcom era expected from the internet. It's a snake-oil-craze type of situation. No one buys into a bubble around something that doesn't exist yet, even if the talk gets talked.


Well, this was a great answer to the survey!


I’ve created an interactive site showcasing the results of an online survey about AI. My hope is that this site could serve as a snapshot of public sentiment about AI, a sort of historical artifact for our time.

So far, only 15 random people have responded, but I thought it might be interesting to hear perspectives from the HN community as well! If you have a moment, I’d love for you to check it out and share your thoughts.

I’m also open to suggestions on how to improve the site or what else I could add to make it more engaging or insightful.


I haven't read the paper yet, but the news article seemed a bit meh.

BCL-2 inhibitors, mainly venetoclax, are used in cancer therapies quite often; they also trigger apoptosis and are very effective. Venetoclax was originally designed to target B-cell-related cancers, but it was found to be so effective that the FDA approved it for first-line treatment of acute myeloid leukemia. So, killing cancer by triggering apoptosis is very well known. I think the novel part might be the two proteins, so it is probably more targeted toward metabolic activity… but yeah, I haven't read the paper yet.

Anyway, a major side effect could be tumor lysis syndrome (TLS). Basically, if you push the cancer cells into apoptosis super fast, the contents of those cells spill everywhere and become toxic for the patient. This is at least the case for venetoclax.


How similar are cancerous cells to one another, such that we know how to target them and deliver a payload?

I guess some payload-delivering mechanisms expect very 'standard' features from cancer cells?


Cancerous cells are fairly diverse across individuals, or even within a single individual, and many biological treatments require precise sequencing of that individual patient's tumor DNA in order to be tailored and to work. In some cancers, there is a nasty "Russian roulette" effect in play, where a certain treatment may be extremely effective (in practice a cure, even though oncologists avoid that word) in people with a certain mutation and totally useless in others, even though from the macroscopic point of view their tumors look the same.


So basically, for each cancer, the cancer cells should be sequenced, and then, based on the cell type and the DNA sequencing, we have a list of "tools" to deliver a payload to those very cells (without delivering such a payload to healthy cells, of course)?


That would be the ideal scenario, yes.

In practice, we can only make use of some known mutations. Not just for delivering chemicals, but also for "teaching" the immune system to attack such cells, which, once it is able to recognize them, it will do vigorously.

Let's hope that this catalogue will grow until it covers at least all the typical cases.


I think this is the feeling you get when you realize what you're doing does not matter. No big purpose, just a mere mortal. I think you need a bigger purpose, bro, like really. Reach out.


How much does each hourly update cost? OpenAI's pricing table says $15.000/M characters.
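For a rough sense of scale, reading "$15.000/M" as $15 per 1M characters, and with a purely hypothetical script length per update:

    # Back-of-the-envelope only. The per-update character count is a guess, not
    # something from the site; the $15 per 1M characters rate is the one quoted above.
    price_per_char = 15.0 / 1_000_000
    chars_per_update = 5_000                 # hypothetical script length

    cost_per_update = price_per_char * chars_per_update
    print(f"per update: ${cost_per_update:.3f}")        # $0.075
    print(f"per day:    ${cost_per_update * 24:.2f}")   # $1.80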


Also added a UI for ease of use. You can use it in RStudio with the Viewer pane.

