
There are multiple ways to train in parallel, and that's one of them:

https://pytorch.org/tutorials//distributed/home.html


I haven't touched this in a while, but you can train NNs in a distributed fashion, and what GP described is roughly the most basic version of data parallelism: there is a copy of the model on each node, each node receives a different batch of data, and the gradients get synchronized (averaged) after each batch, so the copies again start the next step from the same point, like you mention.
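To make that concrete, here's a toy sketch of that gradient-averaging loop in plain Python, with no framework. All the function names are illustrative, not any real library's API; the "nodes" are just entries in a list, and the all-reduce is a plain average:

```python
# Each "node" holds an identical copy of the weights, computes gradients on its
# own data shard, and the gradients are averaged (an all-reduce) so every copy
# takes the exact same update step.

def local_gradient(weights, batch):
    # Gradient of mean squared error for a 1-parameter model y = w * x.
    w = weights[0]
    return [sum(2 * (w * x - y) * x for x, y in batch) / len(batch)]

def all_reduce_mean(grads_per_node):
    n = len(grads_per_node)
    return [sum(g[i] for g in grads_per_node) / n
            for i in range(len(grads_per_node[0]))]

def step(weights, node_batches, lr=0.1):
    grads = [local_gradient(weights, b) for b in node_batches]  # parallel in reality
    avg = all_reduce_mean(grads)                                # the sync point
    return [w - lr * g for w, g in zip(weights, avg)]

weights = [0.0]
# Two nodes, each with its own shard of data drawn from y = 3x.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
for _ in range(200):
    weights = step(weights, shards)
print(round(weights[0], 3))  # → 3.0
```

The `all_reduce_mean` call is the part that becomes the bottleneck at scale: in a real setup every parameter's gradient crosses the interconnect at that point, every step.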

Most modern large models cannot be trained on a single device (GPU, accelerator, whatever), so there is no alternative to distributed training. They wouldn't even fit in the memory of one GPU/accelerator, which is why there are even more complex schemes that split the model itself across devices.


And their bottleneck is what? Data transfer. The state is gigantic and needs to be synchronized frequently. That's why it only works with sophisticated, ultra-high-bandwidth, specialized interconnects. They employ tricks here and there, but those don't scale that well: with MoE, for example, you get roughly a factor-of-8 improvement, and it comes at the cost of a lower number of active parameters. They of course parallelize as much as they can at the model/data/pipeline levels, but it's a struggle even with the fastest interconnects on the planet. Those techniques don't transfer to the networks normal people use; applying the word "distributed" to both conflates two settings with dramatically different properties. It's a bit like saying you could make the CPU's L1 or L2 cache bigger by connecting multiple CPUs with a network cable. It doesn't work like that.

You can't scale averaging parallel runs very far. You need to churn through evolutions/iterations fast.

You can't, e.g., start from a random state, schedule parallel training runs, average them all out, and expect to end up with a well-trained network in one step.

Every step invalidates the input state for everything, and that state is gigantic.

It's dominated by huge transfers at high frequency.

You can't, for example, connect 2 GPUs with a network cable and expect a speedup. You need to put them on the same motherboard to see any gains.

SETI, for example, is unlike that. It distributes easily: a partial read-only snapshot, intense local computation, a thin result submission.


Not disputing all of that, but telling the GP flat out "no" is incorrect, especially when distributed training and inference are the only way to run modern massive models.

Inference - you can distribute much better than training. You don't need specialized interconnects for inference.

The question was:

> > There is probably a simple answer to this question, but why isn't it possible to use a decentralized architecture like in crypto mining to train models?

> Can you copy a neural network, train each copy on a different part of the dataset, and merge them back together somehow?

The answer is flat out no.

It doesn't mean parallel computation doesn't happen. Everything, including single gpu, is massively parallel computation.

Does copying happen? Yes, but it's short-lived and it dominates, i.e. data transfer is the bottleneck, and they go out of their way to avoid it.

Distributing training in a decentralized-architecture fashion is not possible.


How is Germany similar to the US? Perhaps in the enshittification of society, but certainly not in the forces that drive innovation.

Germany could be that innovation hub in Europe, yet it seems incredibly resistant to becoming one. And all of the EU is trailing behind, unfortunately.


This is what I meant by timing: Germany's innovative younger people are held captive by a large, conservative older generation that pushes for security, which leads to overarching bureaucracy. This is especially bad in Germany because the government tries to regulate every tiny detail. Due to missing digitalization, they are failing on a grand scale.

On the other hand, this doesn't mean, as you say, that Germany has no potential to be an innovation hub. It just cannot use that potential currently. If you live here, you will see many good signs: open source is strong, Chaos Computer Club mentality is spreading, and people are fighting back against governmental digital oppression.


> you will see many good signs: open source is strong, Chaos Computer Club mentality is spreading, and people are fighting back against governmental digital oppression.

Sure, but those things alone don't build future product companies and market leaders worth trillions of dollars.


I also don't get it. I worked there for 8 years. I'm southern European, and there was always a "half joke" about our laziness. But we worked the hardest. My local colleagues were incredibly entitled and produced very little value.

In my opinion, Germany was at some point at the top of the world in most respects, and its people became complacent over time. Which leads us to where we are today. Enshittification is palpable in every single aspect of society. No one cares about doing a good job anymore. Getting something done takes months where it should take days. And then the thing you got done breaks a few weeks later.

I'm sorry for the rant. I got pretty burned out of living there.


> I'm southern European, and there was always a "half joke" about our laziness.

I think the trope comes from the hotter weather. Tourists visiting in the summer see outdoor workers hardly working because of the heat.


Off topic: was Kunming's slogan still "civilized Kunming"? I found it pretty funny when I was there some years ago.


文明昆明 (wenming Kunming) is such an amazing rhyming couplet


In which European country is triphasic power for houses becoming normal?


In Denmark, the only places without triphasic power are some very old apartments in inner Copenhagen. To be fair, some of those apartments predate US independence by a hundred years. Those buildings do receive three phases, but with a silly scheme where each apartment got two random phases.

All houses have triphasic power (usually 35A per phase, sometimes 63A), and all apartment buildings with electrics from the last 2-3 decades provide triphasic power to each apartment as well.
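For a sense of scale, a quick calculation of the available power on such a connection. I'm assuming the standard European 400 V line-to-line voltage; the 35 A figure is from above:

```python
import math

# Apparent power of a balanced three-phase connection: P = sqrt(3) * V_LL * I.
v_line = 400   # line-to-line voltage (volts), standard in Europe
i_phase = 35   # amps per phase
p_watts = math.sqrt(3) * v_line * i_phase
print(round(p_watts / 1000, 1))  # → 24.2 (kW)
```

That ~24 kW is why a triphasic house connection comfortably runs an EV charger, a heat pump, and an oven at the same time; the 63 A variant gives roughly 43 kW.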

Our ovens and cooktops expect triphasic power, with a two-phase downgraded configuration for backup.

Same for Sweden I believe.


In the Netherlands, pretty much every house has it. Even if it's not connected yet, the lines will already have been buried.


Where I am (Eastern Europe) the cost difference is insignificant. There is no reason to not get triphasic for new detached/semi-detached houses.


Apartment blocks always do.


As someone with 0 idea about semiconductors development: How does making your own chip compare to using FPGAs when you need low quantities? What things can you do with your own IC that you cannot do with FPGAs?


It's more of a principle achievement, like, "hey Intel, come cause a shooting spree at my local grocery store while I'm there, because, anybody who did similar DIY things in the automobile-hydrogen space met a similar fate from that industry!" [1]

1. https://www.politifact.com/factchecks/2022/jun/02/facebook-p...


The point of the conversation is how people express these relationships in their day-to-day so they can be encoded in software.

Would your grandparents' contact be saved on your phone as "Mom's mom" or as "grandma"? Probably the latter, which is indistinguishable from "grandma" meaning "Dad's mom".

In Norwegian, people would naturally call these "mormor" and "farmor" and they would expect that relationship to be correctly labeled in their localized app.
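One way such localized labels could be encoded is to store the relationship as a path through the family graph and let each locale supply the most specific label it has. This is a hypothetical sketch; none of these names come from a real contacts API:

```python
# Relationship paths are tuples read left to right: ("mother", "mother")
# means "my mother's mother". Locales with dedicated kinship terms (like
# Norwegian) override the generic English fallback.
LABELS = {
    "nb": {
        ("mother", "mother"): "mormor",
        ("father", "mother"): "farmor",
        ("mother", "father"): "morfar",
        ("father", "father"): "farfar",
    },
    "en": {},  # no distinct everyday words, so everything falls through
}
FALLBACK = {
    ("mother", "mother"): "grandmother",
    ("father", "mother"): "grandmother",
    ("mother", "father"): "grandfather",
    ("father", "father"): "grandfather",
}

def label(path, locale):
    return LABELS.get(locale, {}).get(path) or FALLBACK[path]

print(label(("mother", "mother"), "nb"))  # → mormor
print(label(("mother", "mother"), "en"))  # → grandmother
```

The design point is that the stored data ("mother's mother") is unambiguous in every language; only the rendered label collapses distinctions, and only in locales that collapse them.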


At least in Swedish, I don't think there is even a generic word for "grandmother"; you literally always specify which one it is.


I am fully aware of what the topic is about. I'm just pointing out that the English language and native English speakers definitely use the concepts of mom's mom and dad's mom without needing "official" words like "momdad" and "dadmom", because the person I responded to said

> I am fairly sure English doesn't have (or at least does not use) separate everyday words for farmor/farfar.

They then said you would need "academic" language to describe mom's mom and dad's mom. That's why I said I could not tell if they were serious. Anyway, I think you would be surprised if you asked English speakers what they call their grandparents. I personally used memere and grandma to distinguish between my mom's mom and my dad's mom. The point I'm making is that not having specific words for these relationships does not make English speakers unaware of the difference.


For day-to-day familiar conversation we generally use nicknames for grandparents in the US and that's what is in our contact list.

There are probably hundreds or thousands of nickname words for grandma based on a variety of cultural backgrounds, family tradition, and mispronunciations by grandchildren.

The language we use really depends on setting. In a more formal setting we might say paternal grandmother/grandparent. Speaking to a friend we might use the nickname, or we might say the ambiguous 'grandma' or we might say 'grandmother on my dad's side' or 'dad's mom'.

It really depends on the situation and familiarity and formality.


It's a station from DLR in Bishkek, Kyrgyzstan: https://www.dlr.de/eoc/en/desktopdefault.aspx/tabid-5287/102...



Any recs on security follows on Mastodon?

