Hacker News | logicchains's comments

It can't be successful at that any more than 1+1 can equal 3. Fundamentally, if every token wants to be able to look at every previous token without loss of information, it must be O(n^2); N tokens looking at N tokens is quadratic. Any sub-quadratic attention must hence necessarily lose some information and be unable to support perfect recall on longer sequences.

> N tokens looking at N tokens is quadratic

Convolving two arrays can be done perfectly accurately in O(n log n), despite every element being combined with every other element.
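As a rough illustration of the convolution point, here is a minimal pure-Python sketch of FFT-based convolution next to the direct quadratic version. The function names are hypothetical, and the FFT path uses floating point, so it is exact only after rounding for modestly sized integer inputs; a number-theoretic transform would be needed for truly exact results on large values.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def convolve_fft(a, b):
    """O(n log n) convolution of two integer sequences."""
    size = len(a) + len(b) - 1
    n = 1
    while n < size:
        n *= 2
    # Zero-pad both inputs to the same power-of-two length.
    fa = fft([complex(x) for x in a] + [0j] * (n - len(a)))
    fb = fft([complex(x) for x in b] + [0j] * (n - len(b)))
    product = [x * y for x, y in zip(fa, fb)]
    result = fft(product, invert=True)
    # The inverse transform above is unnormalized, so divide by n here.
    return [round((v / n).real) for v in result[:size]]


def convolve_direct(a, b):
    """O(n^2) reference: every element of a meets every element of b."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out
```

Both functions should agree on small integer inputs, even though only one of them visits every pair explicitly.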

Or consider the even more basic sum of products a[i] * b[j] for all possible i, j:

    total = 0
    for i in range(len(a)):
        for j in range(len(b)):
            total += a[i] * b[j]

This can be computed in linear time as sum(a) * sum(b).
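A small self-contained check of that identity, with hypothetical function names and integer inputs so the comparison is exact:

```python
def pairwise_sum_quadratic(a, b):
    """Sum a[i] * b[j] over all pairs with an explicit double loop."""
    total = 0
    for x in a:
        for y in b:
            total += x * y
    return total


def pairwise_sum_linear(a, b):
    """Same quantity via distributivity: (sum of a) * (sum of b)."""
    return sum(a) * sum(b)
```

The linear version works because the double sum factors into a product of two independent single sums; nothing about the pairs is discarded.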

Your logic that 'the result contains terms of all pairs, therefore the algorithm must be quadratic' simply doesn't hold.


One of my favorite bits of my PhD dissertation was factoring an intractable 3-dimensional integral

\iiint f(x, y, z) dx dy dz = \int [\int g(x, y) dx]*[\int h(y, z) dz] dy

which greatly accelerated numerical integration (O(n^2) rather than O(n^3)).
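A minimal sketch of the same trick on a discrete grid, assuming the integrand factors as g(x, y) * h(y, z) as in the equation above. The particular g and h here are made-up placeholders, not the functions from the dissertation, and a plain Riemann sum stands in for whatever quadrature was actually used.

```python
import math


def g(x, y):
    # Hypothetical placeholder for the x-y factor of the integrand.
    return math.exp(-x * y)


def h(y, z):
    # Hypothetical placeholder for the y-z factor of the integrand.
    return 1.0 / (1.0 + y * z)


def triple_sum_naive(points, step):
    """O(n^3): evaluate the full integrand at every (x, y, z) grid point."""
    total = 0.0
    for y in points:
        for x in points:
            for z in points:
                total += g(x, y) * h(y, z)
    return total * step ** 3


def triple_sum_factored(points, step):
    """O(n^2): for each y, do the x and z sums separately, then multiply."""
    total = 0.0
    for y in points:
        inner_x = sum(g(x, y) for x in points)
        inner_z = sum(h(y, z) for z in points)
        total += inner_x * inner_z
    return total * step ** 3
```

The two results should match up to floating-point rounding, since the factored form only reorders the additions and multiplications.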

My advisor was not particularly impressed and objectively I could have skipped it and let the simulations take a bit longer (quite a bit longer--this integration was done millions of times for different function parameters in an inner loop). But it was clever and all mine and I was proud of it.


This brings me back to DSP class. Man, learning about the FFT was eye-opening.

Convolution is a local operation.

Attention is a global operation.


That's like saying sorting can be done in O(n) because radix sort exists. If you assume some structure, you lose generality, i.e. there'll be some problems it's no longer able to solve. It can no longer approximate any arbitrary function that needs perfect memory over the sequence.

I'm not saying whether the paper is correct or not (since I can't tell), but I don't think your argument really holds. Consider applying it to multiplication:

Fundamentally, multiplication needs to look at every pair of digits from the two input numbers. It must be O(n^2); N digits looking at N other digits is quadratic. Any sub-quadratic multiplication must hence necessarily lose some information.


Integer multiplication x * y can be trivially done in O(k): k = log₂(min(x, y)). This is because we can do addition in constant time, adding all bits in parallel.

By combining many more adding units, we can do (fixed-size) multiplication in constant time, too: https://en.wikipedia.org/wiki/Dadda_multiplier


Multiplication can be sub-quadratic using Karatsuba's algorithm.
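For reference, a minimal sketch of Karatsuba for non-negative Python integers. It splits on decimal digits for readability (real implementations split on machine words), and the function name is just for illustration. It uses three recursive multiplications instead of four, which gives roughly O(n^1.585) digit operations.

```python
def karatsuba(x, y):
    """Multiply two non-negative integers with Karatsuba's algorithm."""
    # Base case: a single-digit operand is multiplied directly.
    if x < 10 or y < 10:
        return x * y

    # Split both numbers around the middle of the longer one.
    half = max(len(str(x)), len(str(y))) // 2
    base = 10 ** half
    x_high, x_low = divmod(x, base)
    y_high, y_low = divmod(y, base)

    # Three recursive products instead of the schoolbook four.
    low = karatsuba(x_low, y_low)
    high = karatsuba(x_high, y_high)
    cross = karatsuba(x_low + x_high, y_low + y_high) - high - low

    return high * base * base + cross * base + low
```

The result is exact: the cross term recovers x_high * y_low + x_low * y_high by subtraction, so no information is lost despite skipping one product.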

Doesn't that have to do with how many bits you allow in the actual calculation in physical reality?

Well, for multiplication, complexity is defined directly in terms of the number of digits/bits. For attention, complexity is defined in terms of the number of input vectors, which are all at fixed precision. I don't understand what happens to the method proposed in the paper at higher precision (since I don't understand the paper), but in reality it doesn't matter, since there is no value in anything over float16 for machine learning.

Multiplication has some properties like being cumulative. If we assume the sequence has any specific properties then we no longer have a general sequence model.

I think you meant commutative.

Attention also has some specific properties.

And sometimes results are just unexpected. Did you know that anything a Turing machine can do in t time steps, a different Turing machine can do in O(sqrt(t log t)) memory cells? https://news.ycombinator.com/item?id=44055347


Your argument just assumes there is no latent structure that can be exploited. That's a big assumption.

It's a necessary assumption for the universal approximation property; if you assume some structure then your LLM can no longer solve problems that don't fit into that structure as effectively.

Neural nets are structured as matrix multiplication, yet, they are universal approximators.

You're missing the non-linear activations.

But language does have structure, as does logic and reasoning. Universal approximation is great when you don't know the structure and want to brute force search to find an approximate solution. That's not optimal by any stretch of the imagination though.

That argument could also be used to say that the FFT's time complexity of O(n log n) should be impossible.

You don't see a huge difference between abusing a child (and recording it) vs drawing/creating an image of a child in a sexual situation? Do you believe they should have the same legal treatment? In Japan for instance the latter is legal.

He made no judgement in his comment; he just observed the fact that the term CSAM, in at least the specified jurisdiction, applies to generated pictures of teenagers, whether or not real people were subjected to harm.

I suspect none of us are lawyers with enough legal knowledge of the French law to know the specifics of this case


This comment is a part of the chain that starts with a very judgemental comment and is an answer to a response challenging that starting one. You don't need legal knowledge of the French law to want to distinguish real child abuse from imaginary. One can give arguments why the latter is also bad, but this is not an automatic judgment, should not depend on the laws of a particular country and I, for one, am deeply shocked that some could think it's the same crime of the same severity.

The point of banning real CSAM is to stop the production of it, because the production is inherently harmful. The production of AI or human generated CSAM-like images does not inherently require the harm of children, so it's fundamentally a different consideration. That's why some countries, notably Japan, allow the production of hand-drawn material that in the US would be considered CSAM.

If libeling real people is a harm to those people, then altering photos of real children is certainly also a harm to those children.

I'm strongly against CSAM but I will say this analogy doesn't quite hold (though the values behind it do).

Libel must be an assertion that is not true. Photoshopping or AIing someone isn't an assertion of something untrue. It's more the equivalent of saying "What if this is true?", which is perfectly legal.


“298 (1) A defamatory libel is matter published, without lawful justification or excuse, that is likely to injure the reputation of any person by exposing him to hatred, contempt or ridicule, or that is designed to insult the person of or concerning whom it is published.

    Marginal note:Mode of expression

    (2) A defamatory libel may be expressed directly or by insinuation or irony

        (a) in words legibly marked on any substance; or

        (b) by any object signifying a defamatory libel otherwise than by words.”

It doesn't have to be an assertion, or even a written statement.

You're quoting Canadian law.

In the US it varies by state but generally requires:

A false statement of fact (not opinion, hyperbole, or pure insinuation without a provably false factual core).

Publication to a third party.

Fault

Harm to reputation

----

In the US it is required that it is written (or in a fixed form). If it's not written (fixed), it's slander, not libel.


Pictures are statements of fact: what is depicted exists. Naked pictures cause harm to reputation.

The relevant jurisdiction isn't the US either.

> The point of banning real CSAM is to stop the production of it, because the production is inherently harmful. The production of AI or human generated CSAM-like images does not inherently require the harm of children, so it's fundamentally a different consideration.

Quite.

> That's why some countries, notably Japan, allow the production of hand-drawn material that in the US would be considered CSAM.

Really? By what US definition of CSAM?

https://rainn.org/get-the-facts-about-csam-child-sexual-abus...

"Child sexual abuse material (CSAM) is not “child pornography.” It’s evidence of child sexual abuse—and it’s a crime to create, distribute, or possess. "


That's not what we are discussing here. Even less when a lot of the material here is edits of real pictures.

>but I do really like a heterogenous cultural situation, so I think it's interesting and probably to the overall good to have a country pushing on these matters very hard

Censorship increases homogeneity, because it reduces the amount of ideas and opinions that are allowed to be expressed. The only resilience that comes from restricting people's speech is resilience of the people in power.


You were downvoted -- a theme in this thread -- but I like what you're saying. I disagree, though, on a global scale. By resilience, I mean to reference something like a monoculture plantation vs a jungle. The monoculture plantation is vulnerable to anything that figures out how to attack it. In a jungle, a single plant or set might be vulnerable, but something that can attack all the plants is much harder to come by.

Humanity itself is trending more toward monoculture socially; I like a lot of things (and hate some) about the cultural trend. But what I like isn't very important, because I might be totally wrong in my likes; if only my likes dominated, the world would be a much less resilient place -- vulnerable to the weaknesses of whatever it is I like.

So, again, I propose for the race as a whole, broad cultural diversity is really critical, and worth protecting. Even if we really hate some of the forms it takes.


They were downvoted for completely misunderstanding the comment they replied to.

I really don't see reasonable enforcement of CSAM laws as a restriction on "diversity of thought".

This is precisely the point of the comment you are replying to: a balance has to be found and enforced.

They were drafted by Epstein on behalf of Bill's (former) doctor; there's no knowing whether the doctor actually sent it or not.

CSP in Golang makes concurrency in it look pleasant compared to the async monstrosities I've seen in C#.

I heartily recommend reading the link in the parent comment on structured concurrency. The alternatives here are not merely goroutines vs. async.

>Laos communism and Vietnam communism are very similar

No they're not; Vietnam scores much higher than Laos on any measure of economic freedom/property rights.


They are culturally very close, and have the same theoreticians. That's enough for 'very similar', just like the US and UK have a very similar brand of capitalism, despite the UK having a better Gini coefficient, a way better press freedom index and higher life expectancy.

Because they created the production; if they couldn't control it then they'd have no incentive to create it and there'd be no non-state-owned businesses, exactly as happened when China was fully communist and still happens in North Korea today. Capital doesn't grow out of thin air just from "working"; the only people who think it does are those who've never tried to build a successful business.

Was it a Perl exam?

>the writing is on the wall that social media's days are numbered, well at least in its current form

There's enough of us devs that absolutely fucking hate the idea of governments controlling how people communicate that the next stage of social media will probably be a decentralised system that's extremely difficult to shut down. Unless every government devolves into full on China-style authoritarianism with deep packet inspection, a national firewall and ubiquitous surveillance, there's no way to stop a well designed distributed social media platform. There just hasn't been enough incentive yet for people to build one.

