> Can someone who doesn't like symbols help me understand the downsides of them?
I wish I had been clearer in my talk but I only had 30 minutes and wanted to cover other topics. Here is a more comprehensive argument against symbols in Ruby:
In every instance where you use a literal symbol in your Ruby source code, you could replace it with the equivalent string (i.e. the result of calling Symbol#to_s on it) without changing the semantics of your program. Symbols exist purely as a performance optimization. Specifically, the optimization is: instead of allocating new memory every time a literal string is used, look up that symbol in a hash table, which can be done in constant time. There is also a memory savings from not having to re-allocate memory for existing symbols. As of Ruby 2.1.0, both of these benefits are redundant. You can get the same performance benefits by using frozen strings instead of symbols.
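To make the interning concrete, here is a quick illustration (object identity is an implementation detail, but it shows what the optimization buys you):

```ruby
:foo.equal?(:foo)                  # => true  -- one interned symbol
"foo".equal?("foo")                # => false -- two separate allocations
"foo".freeze.equal?("foo".freeze)  # => true on Ruby 2.1+ -- frozen string
                                   #    literals are deduplicated
```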
Since this is now true, symbols have become a vestigial type. Their main function is maintaining backward compatibility with existing code. Here is a short benchmark:
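Roughly, it compares a symbol hash lookup, a frozen-string hash lookup, and plain string allocation; a sketch along these lines (the labels, hashes, and iteration count here are illustrative) captures the comparison:

```ruby
require 'benchmark'

N = 1_000_000

SYMBOL_KEYED = { foo: 1 }
STRING_KEYED = { 'foo'.freeze => 1 }

Benchmark.bmbm do |x|
  x.report('symbol lookup')        { N.times { SYMBOL_KEYED[:foo] } }
  x.report('frozen string lookup') { N.times { STRING_KEYED['foo'.freeze] } }
  x.report('string allocation')    { N.times { 'foo' } }
end
```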
There are a few things to take away from this benchmark:
1. Symbols and frozen strings offer identical performance, as I claim above.
2. Allocating a million strings takes about twice as long as allocating one string, putting it into a hash table, and looking it up a million times.
3. You can allocate a million strings on your 2015 computer in about a tenth of a second.
If you’ve optimized your code to the point where string allocation is your bottleneck and you still need it to run faster, you probably shouldn’t be using Ruby.
With respect to memory consumption, at the time when Matz began working on Ruby, most laptops had 8 megabytes of memory. Today, I am typing this on a laptop with 8 gigabytes. Servers have terabytes. I’m not arguing that we shouldn’t be worried about memory consumption. I’m just pointing out that it is literally 1,000 times less important than it was when Ruby was designed.
Ruby was designed to be a high-level language, meaning that the programmer should be able to think about the program in human terms and not have to think about low-level computer concerns, like managing memory. This is why Ruby has a garbage collector. It trades off some memory efficiency and performance to make it easier for the programmer. New programmers don’t need to understand or perform memory management. They don’t need to know what memory is. They don’t even need to know that the garbage collector exists (let alone what it does or how it does it). This makes the language much easier to learn and allows programmers to be more productive, faster.
Symbols require the programmer to understand and think about memory all the time. This adds conceptual overhead, making the language harder to learn, and forcing programmers to make the following decision over and over again: Should I use a symbol or a string? The answer to this question is almost certainly inconsequential but, in the aggregate, it has consumed hours upon hours of my (and your) valuable time.
This has culminated in libraries and classes like Hashie, ActiveSupport’s HashWithIndifferentAccess, and extlib’s Mash, which exist to abstract away the difference between symbols and strings. If you search GitHub for "def stringify_keys" or "def symbolize_keys", you will find over 15,000 Ruby implementations (or copies) of these methods to convert back and forth between symbols and strings. Why? Because the vast majority of the time it doesn’t matter. Programmers just want to consistently use one or the other.
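The boilerplate those searches turn up is almost always a variation on the following (a representative sketch, not any particular library’s implementation):

```ruby
def stringify_keys(hash)
  hash.each_with_object({}) { |(key, value), out| out[key.to_s] = value }
end

def symbolize_keys(hash)
  hash.each_with_object({}) { |(key, value), out| out[key.to_sym] = value }
end

symbolize_keys('name' => 'Ruby')   # => {:name=>"Ruby"}
stringify_keys(name: 'Ruby')       # => {"name"=>"Ruby"}
```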
Beyond questions of language design, symbols aren’t merely a harmless, vestigial appendage to Ruby. They have been a denial of service attack vector (e.g. CVE-2014-0082), since they weren’t garbage collected until Ruby 2.2. Now that they are garbage collected, their behavior is even closer to that of a frozen string. So, tell me: Why do we need symbols, again?
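The mechanics of that attack were simple: intern attacker-controlled input, and on any Ruby before 2.2 every resulting symbol sticks around forever. A sketch (illustrative only, not the actual Rails code path from the CVE):

```ruby
require 'securerandom'

before = Symbol.all_symbols.size
100_000.times { SecureRandom.hex(16).to_sym }  # stand-in for crafted input
after = Symbol.all_symbols.size

# On Ruby < 2.2 the symbol table grows by ~100,000 entries and never shrinks;
# on 2.2+ these dynamically created symbols can be garbage collected.
puts after - before
```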
I should mention, I’d be okay with :foo being syntactic sugar for a frozen string, as long as :foo == "foo" is true. This would go a long way toward making existing code backward compatible (of course, this would cause some other code to break, so—like everything—it’s a tradeoff).
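Concretely, under that proposal:

```ruby
# Today:
:foo == "foo"        # => false
# With :foo as sugar for a frozen string:
# :foo == "foo"      # => true
# Code that relies on the distinction -- say, a hash holding both :foo and
# "foo" as separate keys -- would break, which is the tradeoff above.
```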