
Why can't you guys comment your fucking code? - clamsprez
https://www.reddit.com/r/MachineLearning/comments/6l2esd/d_why_cant_you_guys_comment_your_fucking_code/
======
marcus_holmes
I have to comment my code so future-me will understand it. I'm often grateful
to 2-weeks-ago-me for them. But this awesome rant isn't actually about
comments, it's about academic writing in general.

I've always thought that academic writing is designed to impress rather than
educate. I haven't read much academic code, but it seems the trait has carried
over.

I really can't stand code that is designed to make the programmer look smart.
I stopped messing about with CodeWars for this reason. All this dick-measuring
about how few instructions you can use to implement a problem _eyeroll_. It
misses the important thing: when some poor sap comes back to maintain the code
in 2 years' time, can they?

~~~
throwawayjava
Believe me, very few grad students are writing code to "look smart".

First of all, coding isn't exactly a high class activity in most of academia.
Any implementation that's not an uncommented mathematica notebook or matlab
mess is in the top 1% of academic code.

Second of all, no one reads the code. So while there are incentives to
intelli-obfuscate papers to trick naive reviewers, those incentives don't
exist with code. (Oh, and it doesn't really work out well in papers either
most of the time.)

~~~
pmiller2
Don't forget about the third type of shitty academic code: the gigantic pile
of undocumented C++ that doesn't even build anymore.

------
Cieplak
I hate to disagree but that's fairly clean code. It's not doing anything
tricky or exceedingly clever, so the code speaks for itself. They're even
polite enough to separate their implementation from their interface, which is
rare for projects in dynamic languages. It reads like idiomatic, OO, pep8'd
python. Python is basically executable pseudocode as it is, provided the
method names and parameters convey the semantics of the code.

https://github.com/facebookresearch/end-to-end-negotiator/blob/master/src/agent.py

~~~
stubish
The code speaks for itself, so there is no way to confirm it works as
intended. Because the intention is undocumented.

The code tells what the program does. The comments should state what it is
supposed to be doing and why it should do it. When they don't match you have
found a bug.
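
A minimal sketch of that mismatch in Python. The functions `last_n_buggy` and `last_n` are hypothetical, invented for this illustration, and are not from the code under discussion. The comment records the intent, and comparing it against the behaviour exposes the bug:

```python
def last_n_buggy(items, n):
    # Intent: return the last n items; an empty list when n == 0.
    # The code disagrees with the comment: items[-0:] is items[0:],
    # which is the whole list, so n == 0 returns everything.
    return items[-n:]


def last_n(items, n):
    # Intent: return the last n items; an empty list when n == 0.
    # Slicing from an explicit start index avoids the -0 trap, and
    # max(..., 0) keeps n > len(items) from wrapping around.
    return items[max(len(items) - n, 0):]
```

Without the stated intent, nobody can say whether returning the whole list for `n == 0` is a bug or a feature.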

~~~
DigitalJack
“Code never lies. Comments sometimes do.”

I forget who that quote is attributed to.

------
Ace17
> When naming things, are you charged by the character? Do you get a bonus for
> acronyms?

Funny you're saying this, because it took me a while to understand what you
meant by "DL" (which is required to understand what code you're complaining
about).

~~~
vegiraghav
Kind of a hypocritical mentality. Uses abbreviations in a forum where others
will read too. But a coder can't use abbreviations. Cool.

------
chrisbennet
I'm not sure that (gasp!) comments are a good thing these days.

See "Comments are Lies":
http://www.codingblocks.net/podcast/clean-code-comments-are-lies/

------
WheelsAtLarge
I'm a programmer that hates to comment his code so I don't do it. Most
programmers feel the same way. Basically the reason we don't do it is because
we don't have to. I suspect the only way it will happen is if our code needs
it to properly compile.

~~~
copperx
Strangely, I really like to comment my code, although it frustrates me that
even the best IDEs have zero support for keeping comments up to date with the
code.

Changing, say, a function should prompt the IDE to somehow "tag" the
associated comment so that you remember to check its accuracy, at the bare
minimum. Other nice-to-haves would be having references to code, e.g., if you
talk about variable x in your comment and then rename it, the comment
reference should be updated.

You can feel the disdain for code commenting in the industry by looking at the
anemic tool support for comments.

~~~
oldandtired
Agreed. One of my projects is about updating a specific code editor to manage
comments as a "meta-data" of the code itself. Otherwise, maintaining comments
becomes a nightmare in the maintenance regime.

------
arvinsim
Because comments can get outdated and developers don't bother to sync them
with their code.

The best compromise is to make your code readable and only comment about the
"why", not the "how".

~~~
sclangdon
That isn't a compromise, it's the goal. Your comments _should_ be about the
"why", and not the "how". We already know how, that's what code is -
instructions about how something works.

What we don't know is _why_ it was done in the first place, or _why_ it was
done a particular way, or _why_ I shouldn't change that line that looks wrong
but is really correct.
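
A small illustration, as a hypothetical Python sketch. The function names and the billing scenario are invented for this example. Both functions run identical code; only the comment differs:

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def total_with_how_comment(prices):
    # "How": add up the prices and round to two decimal places.
    # This only restates what the line below already says.
    return sum(prices, Decimal("0")).quantize(CENT, rounding=ROUND_HALF_UP)


def total_with_why_comment(prices):
    # "Why": round once, on the sum, not per item. Rounding each price
    # first can push the total off by a cent (0.005 + 0.005 would come
    # out as 0.02 instead of 0.01). Do not "simplify" this to per-item
    # rounding. (Assumed requirement, made up for this sketch.)
    return sum(prices, Decimal("0")).quantize(CENT, rounding=ROUND_HALF_UP)
```

The second comment is the one that stops a maintainer from "fixing" a line that looks wrong but is really correct.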

------
lojack
This is less a problem with the code being well commented and more a problem
with it being difficult to understand. Comments can help clarify a difficult
to understand block of code, but they don't make that code more readable,
which doesn't really solve anything when you go to maintain it.

The code is unreadable because it's written by an ML expert. I'd say it's
actually higher quality than you'd usually get from academia, but nowhere
near what you'd see outside of that. Academic code serves a very different
purpose than production code. It's meant to prove a point and is usually not
used after. There simply isn't an incentive to write well-tested,
maintainable code.

------
Radim
We run an Incubator for university students, helping them bridge the gap
between "academic" and "useable" code, so I have to deal with this topic all
the time.

Why is code coming out of research labs/universities so bad?

1. DON'T WANT TO WRITE CLEAR CODE

Different incentives between _academic research_ (publications count, citation
count, fear of being scooped if research too clear...) and _industry_ (code
maintainability, clarity, robustness, handling corner cases, performance...).

2. DON'T SEE WHY CLEAR CODE MATTERS

Academic projects are one-offs, not grounded in a wider context. Even if the
researcher would _like to_ build something long-term useful and robust, they
don't have the _requisite domain knowledge_ to go that deep. This external
knowledge, the subject matter expertise, is hard to come by.

In lieu of that, researchers start solving _artificial problems_ on
_artificial datasets_. The proper project cycle is broken, there's no feedback
from other people using the fruits of your labour.

3. DON'T KNOW HOW TO WRITE CLEAR CODE

Lack of programming experience. Choosing the right abstraction boundaries and
_expressing them clearly and succinctly in code_ is HARD. Naming things is
hard, structuring things is hard -- clarity in your head first, the code
follows naturally.

But it's a skill like any other. When we put students to work on real ML
projects, they are shocked. Many have never seen a properly designed piece of
code in their life, don't know any tools, how to share or collaborate (git,
SSH...). It's not like they'd sprinkle some extra comments here and there and
be done -- the whole _code structure_ and how they attain clarity of thought
is different.

The GOOD NEWS is, _it doesn't cost any more time to write good code than bad
code_.

So once they learn, everyone wins. That's the main purpose of our Incubator
programme. It's not like researchers write crap code on purpose, no need to
assume bad faith. Seeing the fruits of your labour useful and used by others
is immensely rewarding!

------
wcummings
Wow, the reddit mobile site is totally unusable. Can someone paste the
content?

------
bryanrasmussen
As a general rule I comment code that I don't find self-explanatory, and that
I can work up a meaningful comment for.

The first condition is of course a problem that one has to deal with if the
person writing code finds things self-explanatory that to you are highly
complicated.

The second condition is of course a problem for those people who are not able
to explain things well in human language.

edited: for clarity (the prior version was actually my really clever example
of people who are not able to explain things well in human language)

------
bdibs
The majority of my work is written clearly and named well, so it doesn’t (in
my opinion) merit comment.

That being said, if I’m doing some obscure calculations or something
ambiguous, I’ll usually comment on it (for future me to understand the
context).

~~~
oldandtired
By whose standard is your work written clearly and named well? All code merits
comment, even if only to say why it exists at all.

------
honestoHeminway
My rant-to-patch transpiler threw an error at line 42.

Error: Not enough emotional investment to fix a single line, but enough to
write several paragraphs of rant.

Nobody gets a free lunch, not even those trying to get a free lunch from free
academic software.

------
throwawayjava
This comes off as extraordinarily entitled.

Academics are not rewarded for good software engineering. The incentives are
not properly aligned. And moreover, that's not necessarily a bad thing!
Research is research, not engineering.

Stop expecting production quality code out of research groups. _The purpose of
research is to explore ideas, not implement some epsilon of your product._

If you want a perfect exposition in addition to a high quality implementation,
there are ways to get it. Pay the consulting fees. Or pay the tuition. Or hire
the students. Or, if you can't afford that, wait for the thesis/book.

 _> Is pseudo-code a fucking premium?_

In a paper? Well... yes. Categorically, emphatically, yes. If you want to read
an exposition where space does not come at an extreme premium, you'll have to
wait for the thesis/journal article.

Space comes at a premium for stupid reasons (artificial page limits) but also
for good reasons (expositions cost time, which costs money and grad student
time, and fuck if I'm doing a 10 year Ph.D. at $7/hr because you're irritated
at some python code).

 _> Can you at least try to give some intuition before showering the reader
with equations?_

Either 1) no because you're not the intended audience, in which case wait for
the thesis/TR/dissemination blog post where I'll work through every gory detail; or 2) no
because the equations are hard for all of us to understand and no one has
solved the exposition problem yet.

 _> How the fuck do you dare to release a paper without source code?_

So all the products you work on are totally open source, eh? Either fully
commit to Stallmanism while living off less than minimum hourly wage or get
off your high horse.

 _> Why the fuck do you never ever add comments to your code? When naming
things, are you charged by the character? Do you get a bonus for acronyms? Do
you realize that OpenAI having needed to release a "baseline" TRPO
implementation is a fucking disgrace to your profession? Jesus christ, who
decided to name a tensor concatenation function cat?_

Because they're being paid 30k a year at best for 80 hour weeks under intense
pressure to implement new ideas and the prop firms and Googles are knocking
with 200k comp packages and honest to god 5 day work weeks. Time is (SERIOUS)
money when you're slogging through grad school. Again, walk a mile.

And again, really, the point is to test ideas. For every publicly released
line of code coming out of a research group there's 5x or 10x unreleased code
implementing dead ends.

If you want all this, stop bitching at overworked and absurdly under-paid
grad students, and start calling your congressman to support higher NSF
budgets so research groups can hire more permanent research support staff.

~~~
cjhanks
You might be conflating software graduate students with research developers. I
have translated a lot of research code into production code; whether it be
sourced in Matlab, incomprehensible C, confused C++, or convoluted Numpy.

Okay - so grad students have very little incentive to write readable code...
fine, understandable. They are usually lucky if they can write an algorithm
which isn't finely tuned to a very narrow dataset. In that sense, most Ph.D
papers are outright fabrications lying about capabilities written by authors
seeking paychecks. If people were able to investigate the underlying source
code, that would be fairly apparent.

That said, why do so many professional research developers also suck at
writing comprehensible code? They have fat paychecks, what's the problem now?
There absolutely exist research developers who write very high quality
software. They may still use very terse nomenclature (due to high cognitive
load), but their code can be logically comprehended by those who do not
understand the field. There are a lot of reasons (I think) for this; no
incentive to improve, poor understanding of their own discipline, no
acknowledgment of the problem, expectation that such work is for "engineers".

In reality - a lot of research developers are simply incompetent, but they
find a way to regularly convince others they are not, often through
obfuscation. They're stringing together implementations from various
publications (which themselves might be totally fabricated lies) with
relatively little comprehension.

That's okay, a lot of software engineers are incompetent too.

Own it.

~~~
throwawayjava
_> You might be conflating software graduate students with research
developers_

I'm not conflating. The vast majority of research software is written by grad
students.

 _> why do so many professional research developers also suck at writing
comprehensible code?_

Because incentives aren't aligned. And it's not even clear they _should_ be
aligned.

 _> There absolutely exist research developers who write very high quality
software... In reality - a lot of research developers are simply incompetent_

There absolutely exist software developers who write very high quality
software... In reality - a lot of software developers are simply incompetent.

 _> If people were able to investigate the underlying source code, that would
be fairly apparent... Own it_

1. A huge amount of research software _is_ open-sourced and _does_ work as
advertised.

2. You don't get it both ways. Either drop the open source demands, or drop
the expectation that grad students should "own their work". You can't tell
them to own it and then demand that they _literally_ don't own their work.

------
peapicker
Rant on, but I tuned you out.

