
The machine learning community has a toxicity problem - zekrioca
https://www.reddit.com/r/MachineLearning/comments/hiv3vf/d_the_machine_learning_community_has_a_toxicity/
======
bioinformatics
The ____ (insert scientific community/field here) has a toxicity problem.

It can be said of every single field in science (I am in the natural/health
sciences):

- peer review is broken - check

- reproducibility problem - check (try to reproduce any miracle cancer cure
paper)

- worshipping problem - check, there are kings and no one can take them down.

- diversity problem - check. In my department, 80% of the professors, PhD
students, and postdocs are women.

- morals and ethics are set arbitrarily - check. Morals? In genetics? Give me
a break.

- there is a cut-throat publish-or-perish mentality - check, tell me
something new

- discussions have become disrespectful - check. When I saw Pavel Pevzner
give a keynote lecture at ISMB and, instead of presenting his own work, spend
70% of the time dressing down other people and trashing their work, Science
died for me. And this was the early 2000s.

Machine learning won’t be the first field to notice it, and won’t be the last.
Science is not scientific anymore.

~~~
tsimionescu
Is this a new phenomenon?

Max Planck had this famous quote:

> A new scientific truth does not triumph by convincing its opponents and
> making them see the light, but rather because its opponents eventually die
> and a new generation grows up that is familiar with it. . . . An important
> scientific innovation rarely makes its way by gradually winning over and
> converting its opponents: it rarely happens that Saul becomes Paul. What
> does happen is that its opponents gradually die out, and that the growing
> generation is familiarized with the ideas from the beginning: another
> instance of the fact that the future lies with the youth.

~~~
ativzzz
I think this goes beyond science. People don't just change their minds. The
only consistent way to get people to change is to wait until they die.

------
tomp
_> At our CS faculty, only 30% of undergrads and 15% of the professors are
women. Going on parental leave during a PhD or post-doc usually means the end
of an academic career. However, this lack of diversity is often abused as an
excuse to shield certain people from any form of criticism. Reducing every
negative comment in a scientific discussion to race and gender creates a toxic
environment. People are becoming afraid to engage in fear of being called a
racist or sexist, which in turn reinforces the diversity problem._

This was an unexpected twist! It's rare to read such an honest, unbiased
opinion on this issue.

~~~
raxxorrax
I would go further and state that there is no "diversity" problem, whatever
that is supposed to mean. There is an a priori assumption that the split
should be 50/50 and that only then is a field just. Why CS, though? There are
countless other occupations where that is not the case. The problem of
pregnancy during a PhD or postdoc is a university problem. So I still think
it is heavily biased.

~~~
kevingadd
"Why CS though?" is a weird statement here.

Why should the vast majority, or even a sizable fraction, of fields not have a
gender or race balance even remotely approximating the balance of the general
population? Is it not a 'should' but merely 'it's okay if they do'? If the
balance was in the other direction and men or white people were being
intentionally denied jobs, would that be okay?

Should it be OK for other things to be unbalanced? Should it be OK if 60% of
black people can't get houses but only 10% of white people have trouble
getting homes? Should it be OK if 90% of Asians are turned away from emergency
rooms but only 20% of Mexicans are?

You can argue that race/gender genetic differences play a role but it's kind
of hard to explain away the widespread imbalances here just based off DNA.

~~~
devalgo
~80% of the nurses in the US are female; this is no doubt due to
discrimination and anti-male bias according to your worldview, correct? 87% of
garbage men in the US are men; are you fighting for better female
representation in the garbage trade?

[https://www.beckershospitalreview.com/hr/gender-ratio-of-nur...](https://www.beckershospitalreview.com/hr/gender-ratio-of-nurses-across-50-states.html)

~~~
cycomanic
Yes, and what gender are the doctors, even though medical students are 60%
female? That's the issue: the prestigious, more powerful jobs are largely male
dominated, while the "rank and file" who take orders (or do the dirty work)
are largely female. Show me one field (one that relies on a person's ability,
not looks) where there are more women in the higher-ranked positions and more
men in the lower-ranked positions.

~~~
devalgo
You ignored my question.

>Show me one field (that does rely on the person's ability not looks) where
there are more women in the higher ranked positions and more men in the lower
ranked positions.

Nursing

~~~
cycomanic
The field is medicine and doctors are predominately male.

------
Topolomancer
The main take-away and problem for me is the 'arXiv dilemma', combined with
shoddy scholarship: newcomers to the field regularly try to drink from the
arXiv firehose and take every paper they find there as gospel—even though
it is not peer-reviewed (let's set aside the issues of peer review for that
one).

The quick publication cycle creates an environment that is always just about
'beating' the state of the art, but if you look closely at the reported
values, you will often find a lot of questionable experimental choices. In one
of my main application areas, viz. graph classification, almost _none_ of the
papers holds up (with respect to the reported performance gains) if
subjected to a thorough experimental setup.

This creates a dangerous environment; in the worst case, we might _miss_ some
interesting contributions because they are drowned by the noise of reviewers
(here we go again!) claiming that 'It does not beat the state of the art, so
it must be crap'.

~~~
YeGoblynQueenne
>> In one of my main application areas, viz. graph classification, almost none
of the papers holds up (with respected to the reported performance gains) if
subjected to a thorough experimental setup.

Could you give an example? Just curious :)

~~~
Topolomancer
Sure thing!

For the GIN-ε
([https://arxiv.org/pdf/1810.00826.pdf](https://arxiv.org/pdf/1810.00826.pdf)),
for example, the authors report a classification accuracy of 75.9±3.8 on the
PROTEINS data set (a classic graph benchmark). If you run it with a
cross-validation setup that is _repeated_ to account for effects of chance,
performance drops to 73.1±0.7.

Notice the drop—the second accuracy value is at least within the standard
deviation of the first one, but you can see that a different experimental
setup shrinks the gains quite a lot...

Same goes for different data sets. Since the gains are not super large for
most papers, these changes matter a lot. But of course, the paper is now
published, so no one is going to go back and change it.
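
To make "repeated" concrete, here is a minimal sketch of the kind of setup I
mean; synthetic data and an off-the-shelf classifier stand in for the actual
GIN pipeline and the PROTEINS data set, so the numbers are placeholders only:

    # Sketch only: synthetic data and a stock classifier stand in for the
    # actual GIN-eps pipeline and the PROTEINS benchmark.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

    X, y = make_classification(n_samples=1113, n_features=32, random_state=0)
    clf = RandomForestClassifier(random_state=0)

    # A single 10-fold split: one estimate, at the mercy of the chosen partition.
    single = cross_val_score(clf, X, y, cv=10)
    print(f"single 10-fold CV:  {single.mean():.3f} +/- {single.std():.3f}")

    # Repeating the 10-fold split with fresh shuffles averages out the effect
    # of any one lucky (or unlucky) partition.
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
    repeated = cross_val_score(clf, X, y, cv=cv)
    print(f"repeated 10x10 CV:  {repeated.mean():.3f} +/- {repeated.std():.3f}")

A single split can hand you a flattering partition; averaging over many
shuffled splits tells you how the model actually behaves on average.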

FWIW: I like the GIN paper and think the authors did a good job. It's just
that their experimental setup is insufficiently thorough for the data sets
they are considering, thus leading to overoptimistic estimates. This is a
problem because the next 'state of the art' paper has to find a way to get a
slightly higher mean accuracy, at the expense of an even larger standard
deviation, etc.

~~~
YeGoblynQueenne
Thanks - it will take a while to read through the paper. I'm very surprised to
see actual theoretical contributions in a (recent) neural networks paper.
_Pleasantly_ surprised.

I agree the trend you point out is worrying. I suppose if this continues, at
some point the benchmarks will all be beaten, but that will still tell us
nothing about the true abilities of the tested systems or algorithms.

~~~
Topolomancer
You are welcome! If you want to go further down the rabbit hole, feel free to
ping me via another communication channel; I have some interesting findings to
share but they are unfortunately not ready yet for public consumption.

(this sounds more ominous than I intended; the reason is simply that the
publication is still under review and we have no preprint)

~~~
YeGoblynQueenne
Haha, don't worry, I thought you meant it's a draft or under review :)

I have a research interest in GNNs and the data sets used in papers like the
one you linked, but I have so far only dipped a toe in, because I have other
priorities right now. But more once I ping you :)

------
twsttest
"the way Yann LeCun talked about biases and fairness topics was insensitive"

Insensitive according to whom? The most sensitive 5% of people? All statements
will be deemed insensitive by at least one person somewhere. It's silly to
allow the most extremely (often unreasonably) sensitive people to set the
threshold for what is sensitive or insensitive speech.

~~~
chomp
Insensitive to anyone who has a moderate amount of understanding of machine
learning and social empathy.

You can't plug your ears and say "it's just your training set" as a response
to unfairness in ML algorithms. Real life is biased. Any real life data in our
world is going to be biased. If you train algorithms on this data, they will
cement any existing divides in society. So, with the understanding that
researchers need to be more circumspect about ML algorithms than worrying
about just the training data, consider that the upsampling algorithm in
question only worked for white people because they fed it a huge amount of
white faces. Claiming "it's just the training data" is one of those "well yes,
but actually no" situations where ML researchers tend to miss the broader
picture of how ML algorithms are used in real life, and it just makes Yann
look ignorant.

~~~
devalgo
The real argument they were making against LeCun is whether a mathematical
function can be biased. Care to explain how a gradient is racist?

~~~
chomp
>Care to explain how a gradient is racist?

Sure. Your comment's language equivalent is something along the lines of "Care
to explain how words are racist?" And yes, words are just words: they possess
no consciousness and cannot be racist by themselves.

Similarly, a gradient is just a collection of vectors. It's just numbers.
However, like language, it's what they represent that matters.

For example, I can create a machine learning algorithm to determine who should
get a home loan. I create a gradient to optimize the algorithm to deny loans
to people whom I deem unqualified.

The gradient can easily be racist if it optimizes heavily on something like
race. Minorities tend to be lower income and so can be seen as less qualified
than higher-income individuals. However, that's the easy argument, and also
quite illegal. If you exclude race, there are second-degree variables that are
proxies for race: things like zip codes, job titles, whether they rent or buy.
These are not explicitly illegal to filter on, though the end result is
illegal if they exclude certain protected statuses. It can even be no fault of
the researchers who implement the algorithm, because controlling for bias
using real-world data is extremely difficult. But we must do it, since it is
the ethical thing to do.

And so, it's easy to see that one can optimize ML algorithms to exclude
certain protected statuses, which is what can make the algorithms racist.
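
To illustrate the proxy point with a toy example (every number below is
invented for illustration), here is a sketch in which the model is never shown
race, yet recovers it almost perfectly from a correlated zip-code feature:

    # Hypothetical synthetic data: race is withheld from the model, but a
    # segregated zip code acts as a near-perfect proxy for it.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 10_000
    race = rng.integers(0, 2, n)                # protected attribute, never a feature
    zipcode = np.where(rng.random(n) < 0.9, race, 1 - race)  # 90% aligned with race
    income = rng.normal(50 + 15 * race, 10, n)  # an assumed historical income gap

    X = np.column_stack([zipcode, income])      # note: race itself is excluded
    probe = LogisticRegression().fit(X, race)   # try to recover race anyway
    print("race recovered from proxies:", probe.score(X, race))  # roughly 0.9

Exclude the protected attribute and the same information walks back in through
the side door, which is why "we never used race" is not, on its own, a defense.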

~~~
devalgo
You failed the test. The gradient is not biased, the data is. This was of
course LeCun's point... This is pure foolishness.

~~~
chomp
Maybe I'm not explaining it very well. Look, things have meaning deeper than
their face value. To use a really basic example, the number 14 means nothing;
it's a number. The number 88 means nothing. Put together, in a certain
context, they mean something not good.

There are English words that, as pieces, don't mean anything beyond their face
value. Yet I can string words together to mean bad things that are harmful to
real humans.

Gradients are not racist by themselves; they're just math. It's like saying
multiplication is racist.

But I can use multiplication as a tool in a chain to create weighted averages
for a naive Bayesian classifier that rejects people for home loans.

And so too can I misapply gradient descent as part of a larger ML model that
is racially biased. For instance, I could choose a loss function that, when
minimized, gives biased output despite less biased input. Or I could
accidentally settle on a local minimum of the gradient in my model. There are
many naive implementations of an algorithm that will just be biased no matter
how unbiased the inputs.

So in summary, a gradient is just math and is not racist by itself. But it is
being used in algorithmic tool chains, which researchers frequently build in
ways that produce biased output no matter the inputs (and more often than not
with biased input too).
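
As a deliberately artificial illustration of that last point (all numbers made
up): fit one model, minimizing one pooled loss, over two groups whose true
decision boundaries differ, and the smaller group quietly absorbs the error:

    # Toy sketch: a single pooled loss fits the 90% majority group well and
    # leaves the 10% minority group with a much higher error rate.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n_major, n_minor = 9_000, 1_000
    # The two groups have different true boundaries: x > 0 vs. x > 1.
    x_major = rng.normal(0, 1, n_major)
    y_major = (x_major > 0).astype(int)
    x_minor = rng.normal(1, 1, n_minor)
    y_minor = (x_minor > 1).astype(int)

    X = np.concatenate([x_major, x_minor]).reshape(-1, 1)
    y = np.concatenate([y_major, y_minor])
    group = np.concatenate([np.zeros(n_major), np.ones(n_minor)])

    model = LogisticRegression().fit(X, y)   # one loss, pooled over everyone
    pred = model.predict(X)
    for g, name in ((0, "majority"), (1, "minority")):
        mask = group == g
        print(name, "accuracy:", round(float((pred[mask] == y[mask]).mean()), 3))

Nothing in that pipeline is malicious, and neither group's data is "wrong";
the disparity falls out of the choice to optimize a single aggregate
objective.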

------
mellosouls
This is very refreshing to read. Looking from the outside in, there has been
a pretty clear problem with (at least the representation of) ML research along
the lines highlighted here: ridiculous hype, papers linked again and again
from the same researchers and institutions, indicating _potential_
cliquishness, nepotism, and celebrity worship.

All very dispiriting, indicating real problems making progress in AGI beyond
the clear lack of ideas.

Of course, just because these toxicity claims are being made, it doesn't mean
they are accurate, but they certainly ring true.

If they are, it is good to know they are being talked about in some quarters.

~~~
zippy5
I'm super curious: why were you optimistic about AGI in the first place?

It seems to me that a majority of the performance gains in ML are a result of
using better hardware to run brute-force statistics with larger, more complex
models, while the algorithms themselves have been improving at a nominal rate.

~~~
mellosouls
I'm optimistic about AGI because I see no reason for it not to be implemented
(though the time-frame is a different matter).

Going by the hype articles (which may be unrepresentative), we just seem to be
moving faster and faster on an impressively powerful, but AGI-irrelevant train
along a machine "learning" railway track, and although I suspect plenty of
people on the train would like to get off, the drivers and momentum are making
that very difficult, as indicated in the OP article.

I'm completely optimistic about AGI; I just think we are allowing the
excitement around the advances in Artificial Unintelligence over the last few
years to erroneously dominate our thinking about it, at least in the sort of
papers that turn up in tech-related feeds. Again, this may be unrepresentative
of the top thinkers in computer science (machine learning/whatever).

My own (layman!) opinion is that the good ideas have come, and will continue
to come, from external (or intersecting) fields such as philosophy and
neuroscience, not from computer scientists raving about the power of
DeepWhatever on cloud-enabled networks.

~~~
zippy5
Thanks for sharing! I totally agree with you that we seem to focus a little
too narrowly. If you haven't read it already, you might enjoy the book Range,
an awesome look at the impact of interdisciplinary innovation.

~~~
mellosouls
I assume you mean:

Range: How Generalists Triumph in a Specialized World by David Epstein.

I'll check it out - thank you.

------
duaoebg
My group of hard-core ML friends went fully private. I don't even pay
attention to the public ML discourse (except on HN). I think a big part of it
is the low barrier to entry into an international culture and academia, which
can be toxic in their own ways.

~~~
asdff
Why go private?

------
chillee
Also see: r/machinelearning has a toxicity problem

[https://www.reddit.com/r/MachineLearning/comments/hkc697/d_r...](https://www.reddit.com/r/MachineLearning/comments/hkc697/d_rmachinelearning_not_just_twitter_has_a)

------
Antoninus
It is unfortunate that culture politics have taken over HN.

------
mrkeen
> everybody is under attack, but nothing is improved.

------
SmokeyHamster
>Fifthly, machine learning, and computer science in general, have a huge
diversity problem. At our CS faculty, only 30% of undergrads and 15% of the
professors are women.

Yes, and the deep-sea fishing, oil drilling, and logging industries also
aren't super diverse and have a severe lack of women. I don't see anyone
complaining about that. Different groups have different preferences. I've met
several women who went into science and IT only to find it immensely
unfulfilling, as it's often a very socially isolating job by nature, and left
to switch careers.

If there's a rule or law preventing women from getting into the industry, then
let us know and we'll change that. But don't criticize an entire industry
because women on average chose to pursue other passions.

>At this very moment, thousands of Uyghurs are put into concentration camps
based on computer vision algorithms invented by this community, and nobody
seems even remotely to care.

What does he propose be done about this? Tell Chinese government bureaucrats
to stop "stealing" publicly accessible research papers and code to implement
tools that help commit genocide? Sue the Chinese government for violating
licensing agreements that require "no violation of human rights"? Not
everything should, or can be, about global politics. We should let people
researching machine learning worry about machine learning, and leave the
broader socio-political effects to political pundits and sociologists.

------
mendelmaleh
The west has a fragility problem.

~~~
moccajoghurt
We cannot handle the freedom we have had since the '90s. We abolished most
rules that a conservative/Christian society had. As it turns out, we actually
prefer having strict rules. These new rules are now about diversity,
discrimination, and racism. They fulfill the same role as the rules of the
conservative/Christian society: if you abide by the rules, you are a good
human being. If you are ever in doubt about yourself, just stick to the rules
and you will be fine.

I personally don't really like this trend but I think our society is not ready
to handle freedom yet.

------
cheesecracker
I think as long as no government funding is involved, people should be free
to worship whomever they want, read whatever they want, and flock around
posters of whatever topic they want.

If government money is being distributed, things become more interesting. But
I don't think everybody is entitled to a career funded by taxpayer money. So
it should probably remain "cut-throat" to get a good job based on government
money. Whether governments have the best criteria for handing out money is
another question (number of papers published might not be the best metric).

------
craftinator
I would argue that anything that becomes popular enough to be lucrative
beyond some arbitrary threshold becomes toxic. Welcome to the cycle of
capitalism!

~~~
godelzilla
Progress < profits and control

~~~
DudeInBasement
Just a series of weighted if statements anyway...

------
baylearn
Agree with this point:

Sixthly, moral and ethics are set arbitrarily. The U.S. domestic politics
dominate every discussion. At this very moment, thousands of Uyghurs are put
into concentration camps based on computer vision algorithms invented by this
community, and nobody seems even remotely to care. Adding a "broader impact"
section at the end of every paper will not make this stop. There are huge
shitstorms because a researcher wasn't mentioned in an article. Meanwhile, the
1-billion+ people continent of Africa is virtually excluded from any
meaningful ML discussion (besides a few Indaba workshops).

~~~
LatteLazy
Maybe I am misunderstanding but... Why does a community need a moral stance on
external issues? If everyone is there to do/learn/advance machine learning,
isn't that the end of it? It isn't their purpose to engage with Chinese or
other issues.

Edit: I think it makes sense to have a stance on Machine Learning moral issues
in a machine learning group. I just don't think you need to have a stance on
every issue ever.

I say this as someone who is always a bit frustrated that HN, Reddit, etc. are
very US-centric.

~~~
andyjohnson0
> Maybe I am misunderstanding but... Why does a community need a moral stance
> on external issues? If everyone is there to do/learn/advance machine
> learning, isn't that the end of it? It isn't their purpose to engage with
> Chinese or other issues.

Because it's wrong to write off the mistreatment of other human beings as an
external issue. It never is: we're all human, and we all share the same planet.

~~~
LatteLazy
So any group anywhere needs to have a stance on every injustice everywhere?
Isn't that exhausting?

~~~
cycloptic
Of course it's exhausting. It would be less exhausting if more people paid
attention to these things instead of viewing it as someone else's problem.

~~~
LatteLazy
That's not true though, is it? The more people get involved, the more opinions
you need to weigh and the more consensus you need to reach even one stance.

Getting 10 people to agree on a stance on the 10 issues facing you is much,
much easier than getting 7 billion people to agree on the billions of possible
issues in the known universe.

That's sort of my whole point: either limit the issues you have to take a
stance on, or watch the amount of time taken up with stances grow
exponentially.

~~~
cycloptic
It is true because there is no alternative. Dismissing the exponential growth
and required time commitment won't make it go away; if anything, it amplifies
the problem and increases the number of discussions that need to happen in the
end. If you have no consensus, then the stance will just be taken for you by
whoever screams the loudest; in modern times, that is social media and
advertising.

~~~
LatteLazy
So you accept it's impossible to do mathematically, but insist we still need
to do it because we have no alternative? Doesn't this mean that no community
can ever actually function beyond very small sizes? They will just get stuck
answering infinite moral quandaries?

~~~
cycloptic
I can't speak for what the best method of organizing for the ML community
should be, but society at large is already stuck answering these infinite
moral quandaries.

~~~
LatteLazy
So why are you insisting we do the impossible at either level?

------
eNTi
The problem is not toxicity itself. That's but a symptom. It's:

1. People getting into positions for the wrong reasons (diversity hires),

2. No one having any kind of backbone any more... victim culture and
mollycoddling showing their ugly face,

3. Really smart people who don't want to contend with stupidity getting
rightfully angry with their subpar colleagues who are desperately trying to be
somewhat relevant.

Let's face it... scientists are, most of the time, not people persons. Most
smart people don't care about your snowflake persona getting hurt by the
truth. It's basically impossible to do anything in an environment that treats
microaggressions as anything other than the completely infantile bullshit that
they are.

GROW A FUCKING SPINE AND GROW THE FUCK UP, WHINERS.

