Since Ruby 3, the automatic coercion of keywords to a hash—the second example underneath "Passing Hash to Functions" in this post—is considered a legacy style and is generally frowned upon in new code. That is to say, code like the second call to `foo` here:
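def foo(kwargs = {})
  kwargs
end

foo({k: 1}) # explicit braces: an ordinary positional hash
foo(k: 1)   # braceless: the keywords get coerced into the hash parameter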
One of the strongest arguments for avoiding this sugar is that it makes the code more brittle in the face of future changes. In particular, in the example above, if we add a new keyword argument to `foo`, then any call which omitted the curly braces will break, while calls which used them will keep working fine:
# added a new keyword arg here
def foo(kwargs = {}, frob: false)
  kwargs
end

foo({k: 1}) # still ok: `frob` defaults to false
foo(k: 1)   # ArgumentError: unknown keyword: :k
That's kind of the point of default values, though. If you want a change to the method to break loudly and alert you to every call site, you can add a keyword without a default value.
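For example, making `frob` mandatory turns every stale call site into a loud failure:

def foo(kwargs = {}, frob:)
  kwargs
end

foo({k: 1}) # ArgumentError: missing keyword: :frob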
And that first k[v] (unlike k.key?(v)) will trigger the Hash.new block again, so it'll recurse until it runs out of stack. But neither check is necessary, because the Hash.new block will only ever get called if k.key?(v) is false.
If you want a more compact version, you could do:
fib = Hash.new do |k, v|
  next 1 if v == 0 || v == 1
  k[v] = k[v-1] + k[v-2]
end
That was the first thing I tried, but it blew the stack :) You can still blow the stack if you pick a number that's too high, like fib[20000] or something. But if you pick a lower number first & cache that, then you can (eventually) call a higher number without blowing the stack. This recursive approach is a horribly inefficient algorithm anyway, so I don't think it's worth optimizing :)
I just did it for fun. This particular recursive approach is super slow for numbers of nontrivial size, so I was just curious if I could even make the caching work in the block. It's not worth optimizing a suboptimal approach when a more efficient option is available anyway.
However, the practical utility of a self-populating, lazily evaluated lookup structure goes further. A hash with a default block works great as a simple caching wrapper for all manner of results, for example when talking to slow or fine-grained APIs.
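For instance, a minimal sketch (the endpoint is made up):

require "net/http"

# Each missing key triggers exactly one fetch; the result is cached for reuse.
users = Hash.new do |cache, id|
  cache[id] = Net::HTTP.get(URI("https://api.example.com/users/#{id}"))
end

users[42] # hits the API the first time
users[42] # served from the hash thereafter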
It is just a hash map with a few common functions defined; hash maps are occasionally useful, but what is all the praise about?
The mentioned "simple" bit is arguable: as a language's building block, a hash map is relatively complex and specific, since those can be built out of lists (or trees, though lists/arrays may be preferable for efficiency), which can be built out of pairs (product types, cons cells, tuples; unless going for that efficiency, though it can still be pretty efficient with trees). Maybe it is one of those "simple versus easy" mix-ups.
Well yeah, it's "simple" (or easy) to use, definitely not simple to implement. But having an easy-to-use hashmap is par for the course for most newer languages - not only Ruby, but also PHP (associative arrays), JS (objects), Go (built-in map type) etc.
This is a lovely overview. Hash is a great example of how delightful it can be to program in Ruby.
One more technique worth noting is the chained functional style of using Hash, which works in Ruby because Hash includes Enumerable. If you're prototyping a script to do some data cleaning, this makes it easy to build up your pipeline and iterate on it. For example:
foobar = { ...your data here... }

foobar.map do |k, v|
  # ...
  # do some transformation here
  # ...
  # and then return the replacement key/value for this entry
  [k, new_value]
end.select do |k, v|
  # do some filtering here, e.g.:
  is_foobarable?(k, v)
end.map do |k, v|
  # ...
  # do some expensive transformation here on the smaller data set
  # ...
  [k, newer_value]
end.to_h
(Note that you have to call #to_h at the end, since the Enumerable methods turn the hash into an array of [key, value] pairs.)
Now your code literally shows the pipeline that your data is falling through — and each of these steps can be side-effect-free, with no mutations to the original foobar structure.
Unless things have changed and Ruby has stream fusion now, this is bad advice at scale. You are iterating over a fat object multiple times. Even if it's uglier, it's much better in this case to create an empty array/hash, iterate once with #each, and append each result (#<< for an array, #[]= for a hash).
I worked at the largest Rails shop in the world and this would be rejected in code review.
Edited to add more detail: the only method you need to write to implement Enumerable is #each. Every step of your pipeline here is _another_ call to #each. Just do it once.
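In other words, something like this (the helpers are made up to stand in for the pipeline's blocks):

result = {}
foobar.each do |k, v|
  next unless is_foobarable?(k, v)   # filter
  result[k] = expensive_transform(v) # transform; one pass, one allocation
end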
> I worked at the largest Rails shop in the world and this would be rejected in code review.
Not sure if this means GitHub or Shopify. Until earlier this year I worked at GitHub for a decade, leaving as a principal engineer, primarily writing Ruby.
This would not be rejected at code review there unless the Hash had e.g. millions of values and, even then, it might not be a meaningful performance problem in context.
If the Hash is very small and will always be: readability trumps Big-O "performance" when n is very small.
Am I being pedantic and appealing to authority? Yup but, well, you started it and I hate to see helpful "Ruby is nice" comments like the grandparent get crapped on for no good reason.
There's #lazy to turn things into a lazy enumerator, to be iterated over when you so desire with e.g. #force.
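For example, a sketch with made-up data and blocks:

doubled = foobar.lazy.map { |k, v| [k, v * 2] }.select { |_, v| v > 10 }
doubled.force.to_h # one pass through foobar, then a single hash construction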
If you're going to iterate while filling an accumulator variable, use the for keyword instead of each; it's faster.
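E.g. a minimal sketch (the filter is made up):

acc = {}
for k, v in foobar
  acc[k] = v * 2 if v > 0 # `for` reuses the enclosing scope instead of creating a block
end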
Alternatively, one can use .reduce({}) { |h, (k, v)| ... h } or .each.with_object({}) { |(k, v), h| ... }, which keeps the block from closing over an external variable and makes the assignment to that variable "atomic" with respect to the hash construction: the variable only ever contains the final result (if a final variable is needed at all, which it may not be, given the implicit return of the last value).
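For example (the transformation is made up):

result = foobar.each_with_object({}) do |(k, v), h|
  h[k] = v.to_s.upcase
end

# reduce works too, as long as the block returns the accumulator:
result = foobar.reduce({}) do |h, (k, v)|
  h[k] = v.to_s.upcase
  h
end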
I wish it was typed though. So many times I've seen a function that takes a hash of "options" or "config" and had no idea what it actually contains. Even for official Rails methods it's often hard to know what the possible options are. Some of them seem almost internal, given how obscure they are.
RBS + Steep to the rescue! We typed our configuration this way for ddtrace Ruby. On the external (set) side it makes it very easy to explore the configuration in an editor; on the internal (get) side it helps ensure we don't make mistakes.
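A hypothetical sketch of what the RBS side can look like (not ddtrace's actual signatures):

# settings.rbs
class Settings
  attr_accessor service_name: String
  attr_accessor sample_rate: Float?
  attr_accessor tags: Hash[String, String]
end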
I've been in Python/Django for about a year now and I really miss Ruby's hash vs the dict.
`.dig(:key, :key, :etc)` is so nice to find deeply nested data without blowing up.
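For example:

data = { user: { address: { city: "Oslo" } } }
data.dig(:user, :address, :city) # => "Oslo"
data.dig(:user, :phone, :area)   # => nil rather than a NoMethodError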
One thing I don't miss is having to know whether a hash's keys are strings or symbols. While it's easily solvable, I've definitely lost time before smacking myself: I needed to use a string key but kept feeding in a symbol, all the while swearing it should have been a symbol-keyed hash.
I don't know Ruby and whether it's the same, but in Python, if you're not using a library like funcy or toolz for a nested-get helper, you can do `dict.get('key', {}).get('key2', {}).get('key3')`. Not the prettiest, but it'll do in a pinch.
It should be noted that the ** operator works like .merge and also accounts for the order of key definition within a hash literal: whatever is declared earlier gets overwritten if a key with the same name appears later in the same literal.
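For example:

defaults = { retries: 1, timeout: 5 }
{ **defaults, timeout: 10 } # => {retries: 1, timeout: 10}; the later key wins
defaults.merge(timeout: 10) # => same result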
Hash is so powerful in Ruby that people often overuse them.
One of the most common issues I've found in Ruby codebases is people not creating classes to represent their domain, and simply using hashes everywhere.
The downside is that a hash has no shape. It can (and will) be anything you want it to be, often causing havoc once the system grows.
Checks for keys everywhere. Almost every statement uses the safe-navigation operator because you never know what shape you're dealing with. Multiple places performing the same map/reduce/filter/etc. All because people stick with hashes a bit too long.
Amen. This is an issue at the company I work at. A common typo when looking up hash keys will return nil; this has a tendency to silently keep working and then blow up with a runtime error further down the chain. I am trying to insist on using .fetch to force an exception.
When the company switches to 3.2 I will insist on everyone using the new Data class for value objects rather than hashes.
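For example:

Point = Data.define(:x, :y)
point = Point.new(x: 1, y: 2)
point.x # => 1
point.z # NoMethodError, instead of a hash silently returning nil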
This is perhaps the biggest killer feature of typescript: that your object literals (which are basically hashes) can have types. Or possibly interfaces.
> as a programmer building run-of-the-mill CRUD web applications, having so many choices can be really daunting and confusing. When do you choose which type?
I feel similarly after getting back into Java after years of JavaScript. Java has a ridiculous number of array and list types: Object[], List, Iterator, Iterable, Stream, and I think I'm forgetting a few. Some functions want or return one, some want or return another. I'm constantly converting these from one to the other, and it's never pretty. (What the hell is a Spliterator?!)
Meanwhile, in JavaScript, there's just the Array. (It's not even really an array, but a hashmap that only takes integers as indexes, but nobody cares; it works.)
And, like Ruby, JavaScript also has easy object literals that are basically the exact same as the hash in Ruby. If there are any meaningful differences, I'd love for someone to enlighten me. The only difference I'm aware of is that the preferred key type in Ruby is the Symbol, which is a very lightweight string, whereas in JS it's just a regular string.
But, like the author, I strongly prefer to work in a language that just has a couple of sensible data types instead of making me choose from thousands that have no meaningful differences beyond their incompatibility.
For me the most useful bit about #fetch is not that it throws; it's that it allows setting a default value while distinguishing between a missing key and a key with nil as its value. If I want to handle key presence without a default, I just use #key?, as exceptions come with an additional performance cost (building the stack trace).
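For example:

h = { a: nil }
h[:a]          # => nil
h[:b]          # => nil; indistinguishable from the line above
h.fetch(:a, 0) # => nil; the key exists, so the default is ignored
h.fetch(:b, 0) # => 0; the key is missing, so the default is used
h.key?(:b)     # => false; a cheap presence check with no exception machinery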
OpenStructs are ungodly slow anyway. If you want such accessors you probably want a plain Struct, and not to mix hash-ness and method accessors. You can even define additional methods on structs:
Foo = Struct.new(:a, :b, keyword_init: true) do
  def frobz
    # ...
  end
end
Structs are quicker to instantiate than classes but slower to call methods on, so they're great if you instantiate a lot but seldom call methods. Otherwise a class is better.
In my opinion Lua's equivalent, tables, are more of a Swiss army knife.
It's been a while since I've used them, but AFAIR they combine hash/dict, array, and objects in one data structure.
They're hashes as usual. With a special constructor or int keys they turn into arrays; last I looked, the Lua interpreter even optimizes contiguous ranges of keys as arrays. And by adding metatables and metamethods they also emulate JS-style objects.
My feeling with Lua is that it's lacking arrays due to minimalism, as in "we don't need arrays because tables will do". Probably the same reason why there is no "continue" in a for-loop.
In comparison, Ruby has arrays and hashes, and gives me the feeling "yes these do exactly what you expected, but wait there is more!" In that sense it does feel more like a Swiss army knife.
While I share the praise for Ruby's hash, I find it worrying to suggest that picking between a `SortedList` and a `Dictionary` might be daunting for a developer (of any persuasion).
Yes, hashes in Ruby, associative arrays in PHP, maps in Go[1], dictionaries in Python[2] and C#[3] represent the same concept, a collection of key-value pairs.
The interesting thing is that it definitely feels different to me.
While the basic data structure is obviously identical to its counterparts, the way it interacts with the ecosystem as a whole makes it feel way more powerful. The interaction of hashes, symbols, and blocks leads to a language which feels like it was designed with DSLs as first-class citizens. This leads to things like Ruby on Rails, which at first glance feels like pure magic.
I get a somewhat similar feeling about lists in Python. Because of its very extensive list comprehension you start thinking in lists, leading to some very clean code which would be extremely awkward to implement in many other languages.
Yes. The OP's post is basically "baby's first API." Hashes are a very handy API, but relying on them at scale is much less efficient than structs or arrays. The post doesn't even go into the different performance characteristics of these APIs, which can be a major foot gun.
I've noticed a lot of novice-level posts on here lately. Although I admit the post was very well written.
My experience is that Struct only performs better than Hash in a pretty narrow set of circumstances, rarely enough to be a deciding factor unless profiling suggests it to be.
Selling it as an associative array is nice. (And I know the docs do that, too.)
It's a doubly linked list, array, hashmap all in one, with some weird magic when handling numeric indices.
There's often a time, when using another language, where I've made a bad choice by picking an array or hash or list, and later need one of the other properties PHP would give me by default. The extra cost for most uses is negligible.
> It's a doubly linked list, array, hashmap all in one, with some weird magic when handling numeric indices.
In what way, exactly, is PHP's associative array (a hash table of course, like all these things) a doubly linked list?
Sure, in a language which isn't very fast anyway, I can believe it's reasonable to use your hash table as an array. O(1)~ or O(1)* (expected or amortized constant time) probably feels enough like 1 (a single operation) in such a language that you shouldn't worry about it. Explaining to beginners why they can't write digits["four"] is more trouble than it's worth, so how about just making it work instead.
It is a hashmap, so that makes sense. These are different words for essentially the same type.
Oh I see. Yes, this is a very common way for C programmers in particular to make a hash table. It's the design that C++ enshrines as std::unordered_map, and it's the data structure you'd have been taught in a university data structures course in the 1980s, and maybe even the 1990s (depending on who was teaching and where).
While internally this is indeed using linked lists, I don't think you can make use of that from PHP, which is what I'd imagined was meant. Suppose I had a million integers that I found somewhere, and then I want to splice in a different million integers I have, so that in total there are two million integers. With linked lists that's very cheap, basically instant, but with most data structures, including a hash table, not so much.
Linked lists are very rarely what you need, even though when they are what you need, they're invaluable. They are really easy to implement in C, so it's sort of an "I've got a hammer, now all I see are nails" situation in C.
Well, the hash table uses lists for handling collisions, but it also uses one to doubly link all the elements.
And it's true, on modern hardware linked lists are often bad: poor data locality makes iteration slow. But that data structure was designed before that was a concern (and, as said, it has been revamped since).
But it's quite fancy. If you stick to "proper" sequential integer keys, the data is packed into an array (or vector, if you prefer C++ terms), though most of what's stored is just pointers, which takes away some of the benefit of the packing. It keeps the doubly-linked-list pointers all the while, and once you have holes in it, or string keys, or something like that, it plays hash and gives away some of the ordered storage.
In the end that extra bookkeeping everywhere makes it slow and bloated for everything, but from a usage perspective it's quite nice.