
Drop millions of allocations by using a linked list - jcn
https://github.com/rubygems/rubygems/pull/1188
======
bcg1
I'm not a Ruby dev, so maybe my perspective on this particular issue isn't
worth much... but hats off to the dev with the fix. Indeed, this is how free
software collaboration is supposed to work, in my opinion.

Even the dev with the fix wasn't rude about the original problem, he seemed
pretty humble about it actually.

If you think the Ruby guys are such shitty programmers you should be able to
dive into their codebases and find the plethora of problems to show them
what's up... so either give them a pull request or STFU ;)

~~~
michaelfeathers
> Even the dev with the fix wasn't rude about the original problem, he seemed
> pretty humble about it actually.

The Ruby community is very good interpersonally from my experience. It's a
culture that I think comes from this:

[http://en.wikipedia.org/wiki/MINASWAN](http://en.wikipedia.org/wiki/MINASWAN)

~~~
nevinera
Unfortunately, the _rails_ community are the visible minority, and they follow
DHH's example more than Matz's.

~~~
thinkbohemian
Tenderlove, the super nice guy who submitted that PR, is on Rails core. Too bad
to hear he's part of the evil visible minority that you just made up in your
head.

~~~
nevinera
Right, because I claimed that there are no nice people working on rails at
some point..?

We were talking about the overall culture of the Rails community. Nor did I
intend to imply that DHH was evil, just abrasive.

"nice" != "good"

------
jph
Great pull request. Ruby makes it easy to duplicate data by calling `.dup` or
`+`, and this does help with state isolation. But duplication is an expensive
operation.

Ruby's standard libraries don't have much support for immutability, or deep
cloning, or copy on write, or linking concatenation. There's no standard
library way to ask for a snapshot of an object.

So in the early days of Ruby, an idiom was: if you're writing a method that
takes a list, and you need to be sure your list doesn't change out from under
you, then duplicate it, get it working, and if it becomes a bottleneck, then
optimize it.
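The behavior jph describes fits in a few lines of plain Ruby (core `Array` only; the variable names are just illustrative): `dup` and `+` each return a brand-new array, `<<` mutates in place, and `dup` is only a shallow copy, so nested objects are still shared.

```ruby
original = [1, 2, 3]

copy   = original.dup   # new Array object holding the same elements
joined = original + [4] # `+` also builds a brand-new Array
original << 5           # `<<` appends in place; no new Array is created

puts copy.equal?(original) # false: two distinct objects
p copy                     # [1, 2, 3], isolated from the later <<
p joined                   # [1, 2, 3, 4]
p original                 # [1, 2, 3, 5]

# `dup` is shallow: the inner array is shared, not cloned.
nested  = [[1]]
shallow = nested.dup
shallow[0] << 2
p nested                   # [[1, 2]]
```

Each of those new arrays is an allocation, which is cheap once and expensive when it happens on every step of a deep recursion.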

------
rfrey
I'm really surprised by the amount of smugness in the comments here. A bit of
good-natured teasing, followed by a wheelbarrow full of "ruby-devs" this and
"web-devs" that.

Take off your Hats of Superior Coding. Any one of us, regardless of honorific
titles, could have made this mistake, and you know it. Being steeped in CS
Fundamentals does not immunize you against bugs.

Congratulations to tenderlove for finding the bug. Remember the details -
it'll be a great war story in a few years.

~~~
agentultra
Smugness: the dark side of hubris, one of the three virtues [0].

I like to remember the past of computer programming as though it was once
friendly and receptive to people of all skill levels. I owe quite a lot to the
geeks who came before me and answered my stupid questions, gave me powerful
tools to learn with, and accepted my contributions, flawed as they were.
Without making a few mistakes along the way I wouldn't be where I am today.

I don't know whether I would have given up if the people I met had been more
smug and mean-spirited, but my progress might have been slowed by it. Life is too
short to waste bothering with people who are miserly with their good fortunes.
It doesn't cost you anything to be nice and share your knowledge and wisdom.
It may pay off when the person you're sharing with rises to stand upon your
shoulders one day and pay homage to you.

[0] [http://threevirtues.com/](http://threevirtues.com/)

~~~
duaneb
You can be proud of your achievements without being smug.

Also, hubris being a virtue is crap. It may be decent for your self esteem but
it makes you miserable to be around. Case in point: Larry Wall.

~~~
bigtunacan
Agreed, hubris is not a virtue.

Merriam-Webster - hubris - "a great or foolish amount of pride or confidence"

It's just a synonym used for arrogance and smugness. Humbleness is a virtue.

~~~
wutbrodo
From the way you phrased this, I think you may have missed the context link a
couple of comments above. Everyone is aware of what hubris means and that it's
not a virtue per se. The point of the linked page is that it's (somewhat
tongue-in-cheek) taking three vices and positioning them as virtues in a
narrow context. It's just semantics; the saying could easily have used
synonyms that have positive connotations, but using negative words instead is
part of the joke.

~~~
bigtunacan
Yes; apparently I had. Thanks.

------
mbrock
Semi-related: does anyone know why installing gems is so ridiculously slow?
What is the thing doing? Downloading tarballs, yes, but then? It's a dynamic
language; there is no compilation or verification! Why can I install Ruby
packages using apt almost immediately, when gem/bundle install takes half a
coffee break?

I'm growing more impatient with the years. I have measured out my life with
slow software. We talk about saving developer time with dynamic languages,
but, as Flight of the Conchords sang, _the sneakers don't seem to get much
cheaper; what are your overheads?_

~~~
steveklabnik
This may sound a bit glib, but the reason it's so slow is because basically
every Rubyist goes "Does anyone know why installing gems is so slow? What is
this thing doing?" And then _takes a coffee break instead of figuring it out_.

There are very, very, very few people who actually do any work on core
infrastructure projects. I don't blame them. The codebase of rubygems is...
not exactly welcoming. I myself did some work a while back, got frustrated,
and quit. But if you want the real reason, there it is.

Oh, and when you _do_ overcome all these barriers and then actually make an
improvement, people will say "lol rubyists learning data structures" instead
of saying "thank you for saving a bit of the most precious resource I have
every single day, time."

~~~
moe
_The codebase of rubygems is... not exactly welcoming._

What a very polite way to put it.

IMHO that whole mess (rubygems + bundler) would ideally be replaced from
scratch, removing the need for bundler in the process.

If any generous sponsor wants to improve Ruby as a whole, that's where their
money should go. Imagine the productivity gains if everyone's test cycle were
suddenly more than 10% faster, and nobody had to waste energy on
bundler/rbenv/rvm issues anymore.

Perhaps we could even fix the deployment nightmare in the process, with e.g.
jar-style packaging, but now I'm really dreaming...

~~~
steveklabnik
I may or may not have threatened to do this after having had one too many
drinks, but when I sobered up, I... sobered up. ;)

While I love a good 'burn the world down re-write,' it's a _lot_ of work and
isn't guaranteed to succeed. It's been tried before, and in other languages
too: check out wheels in Python.

That said, Rubygems did replace what came before it, and Gemcutter replaced
what came before it... so it could be done. It's just non-trivial.

~~~
moe
_I... sobered up._

Same here, I still have my napkin notes. It's actually one of my bucket-list
projects, if it weren't for the dreadful food-on-table constraint...

_check out wheels in Python._

Valid point. Python is indeed an example of an even worse situation. In
fairness, most languages are still worse off than Ruby even now. I'd still
take bundler, with all its warts, over maven, CPAN, etc.

However, there are also languages pulling ahead. npm seems to be slowly getting
there (after a rough start) and the Go experience (godep), while still in flux
and not directly comparable, is also something to draw lessons from.

------
kilotaras
I'm torn on this one. It's a great performance improvement, but on the other
hand I would have expected it much sooner than after almost 4 years of usage.

~~~
vidarh
It's because the main bottleneck for most apps that use lots of gems is
elsewhere (the load path grows with each extra gem, meaning a simple 'require'
gets more and more expensive the more gems your app uses).
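A deliberately simplified model of the effect vidarh describes. This is not Ruby's real `require` (which also consults `$LOADED_FEATURES` and tries native-extension suffixes); the `lookup` helper and the `/gems/...` paths are hypothetical. If a lookup probes load-path directories in order, a file living in the last gem's directory costs one probe per installed gem.

```ruby
require "set"

# Hypothetical model: probe each load-path directory in order and stop at
# the first one containing "<feature>.rb". Returns [path_or_nil, probes].
def lookup(feature, load_path, files)
  probes = 0
  load_path.each do |dir|
    probes += 1
    candidate = File.join(dir, "#{feature}.rb")
    return [candidate, probes] if files.include?(candidate)
  end
  [nil, probes]
end

# One lib/ directory per installed gem (hypothetical paths).
few   = (1..3).map { |i| "/gems/gem#{i}/lib" }
many  = (1..100).map { |i| "/gems/gem#{i}/lib" }
files = Set["/gems/gem3/lib/target.rb", "/gems/gem100/lib/late.rb"]

p lookup("target", few, files)   # ["/gems/gem3/lib/target.rb", 3]
p lookup("late", many, files)    # ["/gems/gem100/lib/late.rb", 100]
p lookup("missing", many, files) # [nil, 100]
```

In this model a miss is the worst case: it probes every entry, so each extra gem adds a little to every such lookup.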

------
mjs
Are there any profiling tools that would have found this? Flame graphs showing
the time spent in traverse, perhaps?

It seems like this should have been trivially detectable, since the difference
is so dramatic.
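A dedicated profiler is not strictly required to notice this class of problem; counting allocations around a suspect block already shows the difference. A rough sketch, assuming Ruby 2.2 or later (where `GC.stat` exposes `:total_allocated_objects`); the `allocations` helper and the `TrailLink` struct are hypothetical names, not part of rubygems.

```ruby
# Number of objects allocated while the block runs. The counter is
# cumulative, so a GC run in between does not skew the difference.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Hypothetical node type: one object per step, pointing at its parent.
TrailLink = Struct.new(:value, :parent)

depth = 1_000

copying = allocations do
  trail = []
  # Each step allocates the literal [i] plus the new array built by `+`.
  depth.times { |i| trail = trail + [i] }
end

linking = allocations do
  trail = nil
  # Each step allocates exactly one node that reuses the previous trail.
  depth.times { |i| trail = TrailLink.new(i, trail) }
end

puts "copying: #{copying} objects, linking: #{linking} objects"
```

Object counts understate the copying version's cost: each `+` also copies every element already in the trail, so a trail built to depth n copies n(n+1)/2 element slots in total (500,500 for n = 1,000), while the linked version copies none.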

------
noir_lord
It's always nice when a small change is a big win.

This reminds me of the gc_disable() PR that reduced composer install times by
half a few months back.

~~~
Someone1234
Here's an article on the composer change:
[http://blog.ircmaxell.com/2014/12/what-about-garbage.html](http://blog.ircmaxell.com/2014/12/what-about-garbage.html)

------
cranium
All your tests passed, nothing broken, a small bit of code for tremendous
optimization,... I can feel the satisfaction!

------
kendallpark
"We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil. Yet we should not pass up our
opportunities in that critical 3%"

\--Donald Knuth

------
icebraining
I'm not very familiar with Ruby; where exactly did the duplication occur in
the original code?

~~~
barrkel
I believe it's in the Gem::Specification.traverse method here:

[http://ruby-doc.org/stdlib-1.9.3/libdoc/rubygems/rdoc/Gem/Sp...](http://ruby-doc.org/stdlib-1.9.3/libdoc/rubygems/rdoc/Gem/Specification.html#method-i-traverse)

The 'trail' parameter, an array, was implicitly duplicated by applying the '+'
operator on each recursion through a dependency.

~~~
weavie

        trail = trail + [self]
    

That + operator looks so innocent, so seductively simple...
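A sketch of the general idea behind the fix, not the code from the PR: replace the array with an immutable singly linked list, so extending the trail allocates one node that points at the existing trail instead of copying it. `TrailNode`, `paths_with_array`, `paths_with_list`, and the toy dependency graph are all hypothetical.

```ruby
# Hypothetical immutable singly linked list: each node points at its parent,
# so a longer trail shares every node of the shorter trail it extends.
class TrailNode
  include Enumerable

  attr_reader :value, :parent

  def initialize(value, parent = nil)
    @value = value
    @parent = parent
  end

  # Yields values newest-first, walking back toward the root.
  def each
    return enum_for(:each) unless block_given?

    node = self
    while node
      yield node.value
      node = node.parent
    end
    self
  end
end

# Array version: `trail + [name]` copies the whole trail at every level.
def paths_with_array(graph, name, trail = [], acc = [])
  trail = trail + [name]
  acc << trail
  graph.fetch(name, []).each { |dep| paths_with_array(graph, dep, trail, acc) }
  acc
end

# List version: extending the trail allocates exactly one node.
def paths_with_list(graph, name, trail = nil, acc = [])
  trail = TrailNode.new(name, trail)
  acc << trail
  graph.fetch(name, []).each { |dep| paths_with_list(graph, dep, trail, acc) }
  acc
end

graph = { "app" => ["rack", "json"], "rack" => ["json"], "json" => [] }

p paths_with_array(graph, "app")
# [["app"], ["app", "rack"], ["app", "rack", "json"], ["app", "json"]]
p paths_with_list(graph, "app").map { |t| t.to_a.reverse }
# same paths, built only at the point where an array is actually needed
```

The list version never copies: each node is created once and shared by every deeper trail, and turning a trail back into a root-first array is deferred until something needs it.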

------
airblade
So many armchair critics!

It's very easy to pontificate in hindsight when somebody else has done the
hard work of actually finding something that can be improved.

------
steipete
Ruby people discovering algorithms _ducks_

~~~
perdunov
Really. Every time I whine publicly how web programming people aren't familiar
with even basic CS, I get a slap.

But really, I should move to web programming. I'll be an expert computer
scientist there, probably.

~~~
aidos
Annnndddd.... what reaction do you expect? _" Oh, please come and do web
development so we can bask in the glow of your self-righteousness and infinite
knowledge of computers."_

/snark (apologies for being offensive, but good lord, what a silly statement -
unless I missed the joke)

I stand before you as a counterpoint to your foolish generalisation, and guess
what, I know plenty of other people that don't fit your stereotype either.

I'm not saying there's not an element of truth to your statement. See, here's
the thing. People throughout the software (or any) industry have different
collections of knowledge. There are an endless number of things to learn and
each one of us is different and brings a different set of skills to the table.

To be great at web development you spend _years_ learning the subtleties of
developing for a vast domain of different platforms. There are decades-old bugs
in those platforms that I know the intimate details of, and I have workarounds
for them especially constructed to fit in with the other bugs in the other
platforms we deal with. And that's a tiny facet of what you need to know.

You know what would really happen if you came over to web development? You'd
find that a lot of your skills as a computer scientist aren't altogether
useful.

You wouldn't be an expert computer scientist. You'd be a junior developer,
probably.

~~~
Swizec
I keep repeating myself on Hacker News, but once more, we've found the
difference between Software Engineer and Computer Scientist. One makes things
work, the other is a mathematician.

Why do we keep conflating the two?

~~~
robmccoll
You should be careful with the term engineer. By definition, engineering is
the application of scientific and mathematical knowledge to solving practical
problems. Without knowing and understanding the science and math behind
computing and software, one can hardly claim to be a software engineer.

~~~
Dewie
So many programmers have this weird inferiority complex when it comes to the
term "engineer". Not you, but those who think that most programming can never
be called "engineering" because people don't die if you introduce a software
bug[1] (as if the only kinds of modern "engineers" have to do with immediately
safety-critical things). I prefer the plain "programmer" myself, but I don't
see the big deal unless "engineer" is a protected title wherever that person
lives.

[1] Note that I said "most programming".

~~~
agentultra
I'm a little sheepish about using it around the engineers in my life because I
know I'm not legally liable and held accountable to the same standards they
are when I make a mistake in the software I ship.

I like to think I take a certain amount of rigor in the choices of tools and
processes and design philosophy that reduces the amount and impact of bugs...
but if we get a customer complaint about our product we don't generally issue
a recall and lose millions of dollars.

It's not that I spend any less time learning theory and application and it's
certainly no less challenging in some cases than even mechanical engineering
but... it's a liability thing.

Also, I don't write software for aerospace control systems.

I've seen companies advertise "software engineer" positions whose primary
responsibilities included running a fleet of Wordpress blogs.

Just a matter of perspective I guess.

~~~
Dewie
Like I said: inferiority complex.

------
shiggerino
Nobody show this to Bjarne Stroustrup:
[https://www.youtube.com/watch?v=YQs6IC-vgmo](https://www.youtube.com/watch?v=YQs6IC-vgmo)

~~~
kaeluka
AFAICT, this performance bug is not at all related to the linked-list vs.
vector issue.

~~~
rakoo
Yes, it is: vectors are good for random access, linked lists are good for
doing stuff in the front/back of the list. The performance bug we have here is
solved by finding a way to insert stuff at the front/back (and also going
through each item in the list); there is no need for random access.

~~~
kaeluka
If I remember Bjarne's talk correctly, vectors (in C++) are even fast at
inserting because they have densely packed representation which rhymes well
with modern computer architecture. Inserting in a linked list is slow, as
walking the list to find the element at which to insert will already incur
O(N) cache misses, whereas in vectors it's only O(1) cache misses. Moving the
elements in the vector one to the right is fast due to computer architecture
dealing well with predictable patterns.

The allocations here (ruby) are reduced because the implementation of
appending is horribly slow in the first place, using defensive cloning (I'm
taking jph's word here).

~~~
herewego
Yes, but FWIW most linked list implementations keep a reference or pointer to
the tail, making appends O(1) rather than O(n). However, there is a threshold,
depending on the use case, where a small vector being resized to several times
its original size will still be faster than many linked list appends. Point
being, either can excel depending on the use case.
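A minimal Ruby sketch of the tail-reference linked list described above (the `TailList` class is hypothetical, purely for illustration): keeping both head and tail references makes appends and prepends O(1), though every element is still a separate heap object, which is the cache-locality cost raised earlier in this subthread.

```ruby
# Hypothetical singly linked list with head and tail references.
class TailList
  include Enumerable

  Node = Struct.new(:value, :next_node)

  attr_reader :size

  def initialize
    @head = nil
    @tail = nil
    @size = 0
  end

  # O(1) append: link the new node after the current tail.
  def push(value)
    node = Node.new(value, nil)
    if @tail
      @tail.next_node = node
    else
      @head = node
    end
    @tail = node
    @size += 1
    self
  end
  alias_method :<<, :push

  # O(1) prepend: the new node becomes the head.
  def unshift(value)
    @head = Node.new(value, @head)
    @tail ||= @head
    @size += 1
    self
  end

  # Walks from head to tail; Enumerable builds to_a, map, etc. on top of this.
  def each
    return enum_for(:each) unless block_given?

    node = @head
    while node
      yield node.value
      node = node.next_node
    end
    self
  end
end

list = TailList.new
list << 2 << 3
list.unshift(1)
p list.to_a # [1, 2, 3]
```

Whether this beats a plain Ruby `Array` is exactly the threshold question: `Array#push` is already amortized O(1), so a list mainly pays off for cheap prepends or for sharing structure between lists.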

------
chatman
Just goes to show how careless the Ruby guys were while building this.

~~~
eddd
1\. Make it work 2\. Optimize

~~~
rosspanda
From working with Ruby guys, it's normally 1. Make it work 2. Spin up more AWS
boxes

~~~
pothibo
Nothing's wrong with spinning up more AWS boxes. If it costs $300 annually to
solve a problem that would cost $5k in development to fix, I believe it's a
wise choice.

Yeah, down the line you will eventually have to do optimization, but you will
prioritize.

~~~
rosspanda
I agree somewhat, but I've seen 10-box systems that could run on a Raspberry Pi
with good code.

~~~
lmm
I've seen them. I've worked on them. But again, where's the cost/benefit?

~~~
nirvdrum
I'm hardly what most environmentalists would call an "environmentalist", but
one cost here is the increase in carbon footprint. Of course, to the company
the cost/benefit analysis errs on the side of just spinning up more boxes. But
from a larger perspective, taking some extra time to make more efficient use
of machines could have a drastic impact. Many optimizations don't require
months to implement. Many of those are even avoidable with a bit of foresight.

------
jokoon
I still can't see the real usefulness of linked lists; the idea of having a
data container that doesn't have a transparent indexing algorithm sounds
ill-advised.

Linked lists should be named "linked graphs" instead.

There is much more relevant science to learn about CPU caches than there is
about using a container based on nested pointer indirections.

~~~
jdmichal
If you're going to change the name, "unary trees" makes a lot more sense.
"Linked graph" does not imply the 1-child-per-node linearity requirement of a
linked list.

------
voidhorse
Good job, tenderlove, that's a nice performance boost.

Personally, I really dislike Ruby's syntax, though I haven't spent a huge
amount of time with it (because I dislike the syntax). The use of braces and
other lexical markers makes code a lot clearer and faster to decipher, imo,
than a bunch of defs and ends. (I know that () are optional in Ruby; can you
also use {} if you desire? Again, I'm not 100% familiar with the language
features. I just know some standard Rails implementations of the language.)

Maybe it just hasn't 'clicked' with me yet, but bleh. The dynamic typing
doesn't help its case in my book either. That's my personal preference, and
why I try to avoid using Ruby, even for web backends built with Rails, despite
its popularity. Then again, if you need to get a web project up and running
quickly, Rails is never a bad choice (in my experience).

~~~
swah
Related:
[https://twitter.com/tenderlove/status/576389996019462144](https://twitter.com/tenderlove/status/576389996019462144)

