
The most copied StackOverflow snippet of all time is flawed - chris_wot
https://programming.guide/worlds-most-copied-so-snippet.html
======
ben509
Suppose we had an index of snippets, meaning you've parsed them and are able
to search isomorphically. So, e.g. variable names are not significant. Some
techniques discussed[1].

If we then ran that against source repos, we could get update notifications for
copypasta'd code.

"In file F at line L, it looks like you used some code from SO at revision R.
In revision R', it's been corrected."
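To make "search isomorphically" a bit more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not how Hoogle or any real snippet index works. It parses a snippet with the standard `ast` module and renames every identifier to a placeholder in order of first appearance, so two snippets that differ only in naming produce the same string. The class `_Canonicalize` and the helper `fingerprint` are hypothetical names.

```python
import ast


class _Canonicalize(ast.NodeTransformer):
    """Rename identifiers to v0, v1, ... in order of first appearance."""

    def __init__(self):
        self.names = {}

    def _canon(self, name):
        if name not in self.names:
            self.names[name] = f"v{len(self.names)}"
        return self.names[name]

    def visit_FunctionDef(self, node):
        # Function names are treated like any other identifier.
        node.name = self._canon(node.name)
        self.generic_visit(node)
        return node

    def visit_Name(self, node):
        node.id = self._canon(node.id)
        return node

    def visit_arg(self, node):
        node.arg = self._canon(node.arg)
        return node


def fingerprint(source):
    """Hypothetical helper: a string that ignores identifier names."""
    tree = _Canonicalize().visit(ast.parse(source))
    # ast.dump omits line and column numbers by default.
    return ast.dump(tree)
```

With this, `def scale(x, k): return x * k + x` and `def f(value, factor): return value * factor + value` get the same fingerprint, while a snippet that uses the variables differently does not. A real index would also have to cope with reordered statements, partial copies, and other languages.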

[1]:
[https://wiki.haskell.org/Hoogle#Theoretical_Foundations](https://wiki.haskell.org/Hoogle#Theoretical_Foundations)

~~~
eterm
We essentially have that, they're stored in NPM, and it's horrible.

It turns out when you can package snippets you use so many you can't possibly
keep track and audit them all.

Just look at the Left-pad thing, or the event-stream thing.

~~~
spuz
What do you mean by this? As far as I understand, NPM provides access to
packages, not snippets, and as far as I know it doesn't provide a way to search
the code in those packages, let alone isomorphically.

~~~
eterm
A lot of npm packages aren't longer than a typical stackoverflow answer, and
they get used everywhere, to the point where installing a dozen packages can
lead to tens of thousands of sub-packages being installed.

At that point, the packages are essentially "indexed snippets" of code.

------
loldot_
The simple, readable, loop-based snippet mentioned in the article also works
for the edge cases. Sometimes it's better to not be so clever.
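For reference, a loop-based formatter can look roughly like the Python sketch below. This is my own illustration, not the snippet from the article. It assumes SI units (powers of 1000) and one decimal place, and the name `human_readable_si` is made up. It rounds in integer arithmetic first and only then decides whether the value still fits the current unit, which is the step that matters for inputs like 999999.

```python
UNITS = ["kB", "MB", "GB", "TB", "PB", "EB"]


def human_readable_si(n):
    """Format a non-negative byte count with SI units and one decimal."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative sizes")
    if n < 1000:
        return f"{n} B"
    divisor = 1000
    for unit in UNITS:
        # Value in tenths of the current unit, rounded half up.
        tenths = (n * 10 + divisor // 2) // divisor
        # Move to the next unit if rounding pushed us to 1000.0 or above.
        if tenths < 10000 or unit == UNITS[-1]:
            return f"{tenths // 10}.{tenths % 10} {unit}"
        divisor *= 1000
```

Here `human_readable_si(999_949)` gives `999.9 kB`, and `human_readable_si(999_999)` gives `1.0 MB` rather than `1000.0 kB`.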

~~~
Ragib_Zaman
And the pow and log methods they call run loops anyway. What is wrong with
loops anyway?

~~~
Recursing
> the pow and log methods they call run loops anyway.

That's actually not true (as somebody mentioned on reddit)
[https://github.com/openjdk-mirror/jdk/blob/jdk8u/jdk8u/maste...](https://github.com/openjdk-mirror/jdk/blob/jdk8u/jdk8u/master/src/share/native/java/lang/fdlibm/src/e_log.c)

It is probably slower anyway, but I was surprised to see no loops

~~~
rand_r
They kind of cheated because they used a hard-coded polynomial approximation
of log(x). So there is an implicit unrolled loop going over A_i*x^i.
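A small Python sketch of what "implicit unrolled loop" means here. The coefficients are illustrative only (the first terms of the series for log(1+x)/x), not the constants fdlibm uses, and the function names are made up. Both functions perform the same multiply-adds in the same order; one spells them out, the other loops.

```python
# Illustrative coefficients only: log(1 + x) / x = 1 - x/2 + x^2/3 - x^3/4 + ...
COEFFS = [1.0, -0.5, 1 / 3, -0.25]


def poly_loop(x):
    """Evaluate sum(COEFFS[i] * x**i) with a Horner loop."""
    acc = 0.0
    for c in reversed(COEFFS):
        acc = acc * x + c
    return acc


def poly_unrolled(x):
    """The same evaluation with the loop written out by hand."""
    return ((-0.25 * x + 1 / 3) * x + -0.5) * x + 1.0
```

For small x, `x * poly_loop(x)` is a rough approximation of log(1 + x). A real libm first reduces the argument so the polynomial only has to be accurate on a narrow interval.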

------
billpg
Is using Math.pow and Math.log (twice) faster than a tight loop that won't run
more than six times?

~~~
PaulHoule
No.

You run into the same problem if you are writing something like the C 'itoa'
function (integer to ascii); if you want to write the digits out front to back
you need to know what divisor to use for the leading digit so you need to
either look it up in a table or take the log.

Taking the log is a _lot_ slower than the table lookup, I found that out the
hard way.

People convert so many integers to ascii and it is shocking how slow ascii <->
binary numeric conversions are compared to binary numeric operations, so it's
not a matter of "premature optimizations".

Now you can write an itoa which generates the digits from back to front and
not have to worry about copying the results because you return a pointer to
the middle of the result buffer but then memory management gets more
complex...
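A rough Python sketch of the front-to-back variant with a table lookup, just to show the shape of it. The names `POWERS_OF_TEN`, `leading_divisor`, and `itoa` are mine, the range limit is an arbitrary assumption, and Python will of course not show the performance difference being discussed.

```python
POWERS_OF_TEN = [10**i for i in range(19)]  # 1, 10, ..., 10**18


def leading_divisor(n):
    """Largest power of ten that is <= n, found by scanning the table."""
    divisor = 1
    for power in POWERS_OF_TEN:
        if power > n:
            break
        divisor = power
    return divisor


def itoa(n):
    """Emit decimal digits front to back, starting from the leading divisor."""
    if not 0 <= n < 10**19:
        raise ValueError("this sketch only covers 0 <= n < 10**19")
    digits = []
    divisor = leading_divisor(n)
    while divisor > 0:
        digits.append(chr(ord("0") + n // divisor))
        n %= divisor
        divisor //= 10
    return "".join(digits)
```

The table scan replaces the call to log; everything after it is plain integer division and remainder.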

~~~
mark-r
You can use a binary search to make the lookup even faster. There's no need to
iterate over all possible lengths.
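A minimal sketch of the binary-search lookup in Python, using the standard `bisect` module. The table and the name `digit_count` are assumptions for illustration, and this says nothing about whether it beats a linear scan on real hardware.

```python
import bisect

POWERS_OF_TEN = [10**i for i in range(1, 19)]  # 10, 100, ..., 10**18


def digit_count(n):
    """Number of decimal digits in n, via binary search over the table."""
    if not 0 <= n < 10**19:
        raise ValueError("this sketch only covers 0 <= n < 10**19")
    # bisect_right counts how many table entries are <= n.
    return bisect.bisect_right(POWERS_OF_TEN, n) + 1
```

For example, `digit_count(999)` is 3 and `digit_count(1000)` is 4.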

~~~
PaulHoule
I'm not sure that you come out ahead with binary search over n=10 given branch
prediction issues. You'd have to test it to really know.

------
janpot
I feel like this would be a good argument in favour of small, narrowly scoped
packages like the ones we sometimes see on npm. Often enough, a seemingly
trivial code snippet like this turns out to be not so trivial after all.

edit:

The point being that you lose all connection with a snippet after you
copy+paste it. I can clearly see benefits when you centralize its development,
make use of the collective mind to harden it, and get notified about possible
updates whenever an edge-case is found.

~~~
Freak_NL
It's a shame that including unit tests quickly makes the snippet non-trivial
to copy.

Rust has a nice solution for this though: tests can also be embedded in
documentation comments:

    
    
        /// Adds one to the number given.
        ///
        /// # Examples
        ///
        /// ```
        /// let arg = 5;
        /// let answer = my_crate::add_one(arg);
        ///
        /// assert_eq!(6, answer);
        /// ```
        pub fn add_one(x: i32) -> i32 {
            x + 1
        }
    

From: [https://doc.rust-lang.org/book/ch14-02-publishing-to-crates-...](https://doc.rust-lang.org/book/ch14-02-publishing-to-crates-io.html)

More languages could adopt that idea, and a good StackOverflow answer would
include those tests in the snippet. StackOverflow might even automatically run
the tests and add a passing/failing badge!

~~~
gimboland
In Python since 1999:
[https://groups.google.com/forum/#!msg/comp.lang.python/DfzH5...](https://groups.google.com/forum/#!msg/comp.lang.python/DfzH5Nrt05E/Yyd3s7fPVxwJ)
:-)

(Though not in the stdlib til v2.1, April 2001)
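For comparison with the Rust example upthread, the same function with Python's `doctest` looks roughly like this. The function is just a port of the Rust `add_one`; the interactive-session lines in the docstring are what gets executed and checked.

```python
import doctest


def add_one(x):
    """Add one to the number given.

    >>> add_one(5)
    6
    """
    return x + 1


if __name__ == "__main__":
    # Runs every example found in this module's docstrings.
    doctest.testmod()
```

Running the file directly stays silent when all examples pass; `python -m doctest -v file.py` prints each example as it runs.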

------
calibas
I forget that everything I copy from SO, and everything I post there, is under
a CC BY-SA license. That SA is "share-alike" and I don't think people really
understand what that means. From Wikipedia's article on it:

"These licences have been described pejoratively as viral licences, because
the inclusion of copyleft material in a larger work typically requires the
entire work to be made copyleft."

Now how much code uses something copied from SO? And I wonder how copyright
even applies to "code snippets"?

~~~
skymt
> And I wonder how copyright even applies to "code snippets"?

The Software Freedom Law Center has a substantial article on copyrightability
of code: [https://softwarefreedom.org/resources/2007/originality-requi...](https://softwarefreedom.org/resources/2007/originality-requirements.html)

The important point here is that there's no minimum length for code to be
copyrightable. It simply needs to be original and at least minimally creative.
Since at least thousands of other developers have found the snippet to be
useful enough to directly borrow rather than writing an equivalent, it sure
looks copyrightable to me.

~~~
calibas
Thanks for that link! I found this part especially interesting:

"In particular, the laws stress that it is a programmer’s expression of some
functionality that may be protected by copyright, and not the functionality
itself. If code embodies the only way (or one of very few ways) to express its
underlying functionality, that code will be considered unoriginal because the
expression is inseparable from the functionality. Similarly, if a program’s
expression is dictated entirely by practical or technical considerations, or
other external constraints, it will also be considered unoriginal."

Sounds like a case that at least some snippets aren't copyrightable.

~~~
perl4ever
I don't understand how the above principle can distinguish anything, at all.

You could reasonably argue that every piece of code is completely and only
expressing functionality, because it's all inherently directing the computer
to do stuff. So only comments would be protected.

On the other hand, you could instead argue that every piece of code can be
translated into another language, and in fact is, whether interpreted or
compiled, so the source code is exclusively expression only as the
functionality is never tied to it.

But it doesn't seem to me to make any sense to say that some part or aspect is
expression and another is functionality. It's all or nothing.

~~~
calibas
I assume it's like plagiarism, and if you rewrite the code "in your own words"
you've copied the functionality but not the individual expression.

Also seems that copyright regarding code and the whole world of copyleft is
still a grey area in the courts.

~~~
perl4ever
That's interesting, comparing it to plagiarism, reminds me of when I was
shamed when I was like 8 for rewriting a paragraph from a book in my own words
for an essay. At least when I was that age, that was totally considered
plagiarism (at least by my parents). It was crushing to find that even though
I'd worked _really_ hard on paraphrasing each sentence, it didn't count and
I'd missed the whole point.

I wonder what standards colleges and research journals have now.

------
mfer
> user contributions licensed under cc by-sa 4.0 with attribution required.

I was just thinking about how everyone ignores the license for the code on SO.
Both the code and the way it's used are flawed.

~~~
nkrisc
For what it's worth, in the common user journey through SO (Google SRP to SO
question page, scroll to answers), where does SO ever tell you about the
license or the requirements of complying? If they simply rely on people
knowing that generally there are licenses and such and they should look into
it, then it's hardly a surprise almost no one complies. I've worked with many
people, developers and not, who probably couldn't even rattle off a few common
licenses.

I'm not a professional developer, but I've copied snippets from SO before.
I've always included the answer URL in a comment next to it though. But mostly
because if I ever had issues with it, I wanted to know where I got it and on
the off chance anyone else looks at my code, I wouldn't want them thinking I
wrote code I didn't write.

~~~
umanwizard
Common misconception: that licenses create requirements and reduce rights.

In legal reality if no license is attached to code you have almost zero rights
to copy, use, or distribute it.

So if a casual browser of SO doesn’t see any license terms, they should assume
that doing almost anything with the code is illegal.

~~~
Kuinox
That would mean the user who asked the question cannot use the code in the answers.

~~~
zAy0LfpBZLC8mAC
No, it doesn't, because it is licenced under CC BY-SA.

------
zaroth
One of my favorite functions like this that I found years ago on SO is an
extremely efficient bit of code to convert a hex string to byte[] in C#.

It would be fun for someone to make a standard library that consisted of all
the highest voted SO utility functions.

~~~
cm2187
Would be even better for Microsoft to add the most frequent snippets of code
to the CLR!

------
zelly
I wonder what percentage of bugs in general are from people not understanding
how floating points work. I think property based testing (QuickCheck) should
be used whenever floats are involved. Nobody ever seems to get them right.

~~~
deogeo
Would QuickCheck know to try 999999 as input? Most of the possible inputs
would give correct results, and those that don't, aren't very 'special', such
as 0,1,-1,MAX_INT, and so on.

~~~
mrgriffin
Don't many QuickCheck-inspired libraries have special cases to ensure they
generate common numbers like those? I could be misremembering, but I would
have sworn I read that in the documentation of the last library I looked at
(whose name escapes me).

~~~
deogeo
Sorry, I phrased myself poorly - in this case, the inputs that give incorrect
output aren't very special, _as opposed to_ 0,1,INT_MAX,...

Although including numbers such as 10^N - 1 isn't out of the question either.
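A hand-rolled sketch of that idea in Python, using only the standard library rather than a real QuickCheck port. `naive_format` is a hypothetical stand-in that picks the unit with log() before rounding; it is not the actual SO snippet. The property checked is simply "the printed number never reaches 1000".

```python
import math
import random

UNITS = "kMGTPE"


def naive_format(n):
    """Hypothetical stand-in: picks the unit with log() before rounding."""
    if n < 1000:
        return f"{n} B"
    exp = min(int(math.log(n) / math.log(1000)), len(UNITS))
    return f"{n / 1000 ** exp:.1f} {UNITS[exp - 1]}B"


def mantissa_below_1000(text):
    """Property: the printed number should never reach 1000."""
    return float(text.split()[0]) < 1000


def counterexamples(fmt, inputs):
    """Return every input whose formatted output violates the property."""
    return [n for n in inputs if not mantissa_below_1000(fmt(n))]


# Uniformly random inputs rarely land just below a unit boundary...
rng = random.Random(0)
random_inputs = [rng.randrange(10**18) for _ in range(1000)]

# ...so the generator is also seeded with values of the form 10**k - 1.
boundary_inputs = [10**k - 1 for k in range(1, 19)]
```

`counterexamples(naive_format, boundary_inputs)` includes 999999, which formats as `1000.0 kB`. Whether the random inputs alone find anything depends on the seed and sample size, which is the concern raised above.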

------
jancsika
What's the cost of reading, correcting, and creating pull requests for 6943
bugs vs `apt-get update && apt-get upgrade`?

~~~
sebazzz
Your Github account.

You'd be violating the ToS.

~~~
thekyle
What part of the Github ToS would that violate?

------
mkesper
To be human comparable, I'd prefer everything in MBs instead.

    
    
        ls -l --block-size=M

~~~
andreareina
A lot of the gnu coreutils have a -h flag for "human-readable" sizes. Works
with sort too, so it'll put 990M before 1.2G

------
codeulike
It's not that flawed though. Showing 1000kB instead of 1MB could be interpreted
as a deliberate decision.

~~~
alpaca128
To me 1000kB would imply that the result was calculated with 1024 as base
instead of 1000, which is a wrong assumption.

It should always use the highest relevant prefix to avoid confusion imho.

~~~
mark-r
The best way to avoid confusion is for _everybody_ to stop using powers of 2
instead of powers of 10 for K, M, etc. The meanings of Kilo, Mega, etc. were
well defined before computers were invented. It was a mistake to steal those
terms and use them for something different.

------
Dowwie
this was as of 2010?

------
nottorp
Actually, SO's obsession that the answers should contain code ready to
copy/paste is flawed. They'd rather hand people a fish than teach them
how to fish.

~~~
the8472
I don't see that obsession, I have given plenty of prose-only answers that are
the most popular one for the given question.

~~~
mark-r
I've even gotten compliments for giving a prose-only answer instead of code.

~~~
nottorp
But those were the users, not the moderators :)

~~~
mark-r
On StackOverflow there's not much difference.

