A lot of people are talking about the problem itself - which is interesting - but very often you run into real problems which cannot be simplified so easily (NP hard problems, many different other cases). In addition, when you need to do important calculations on a server for your product and you're using Ruby, you may very well end up requiring 10x as many servers. This is pretty massive when you consider an AWS server can be $350/month with Go (1 large instance), and then $3,500/month with Ruby - if you can even parallelize your problem that well - generally you can't. That's a lot of startup runway being eaten for not that much benefit.
Of course, most products don't really need to do calculations outside of a database and this is why Ruby has taken off so well - but it's still important to realize that choosing Ruby over PyPy/Go/Whatever really can be a very expensive choice in the long run when your product suddenly relies on some unique math solutions, and having half your product in Ruby and half in C is awful to have to deal with.
The solution to this is a project like PyPy for Ruby, but it doesn't seem to be coming...
> having half your product in Ruby and half in C is awful to have to deal with.
Why? In many cases, only a tiny fraction of your code needs to be fast. In my experience, it is very reasonable to code that part in some fast language while delegating the bigger (and often more complex) part in a higher level language such as Ruby.
Everyone always agrees with this and, in my experience, has never really tried it. It works better in theory than in practice (or, at a minimum, it works only on a small subset of the types of problems that a naive view might otherwise lead you to believe). The problem is the data isn't organized in a way to be computation efficient, and all the C code in the world isn't going to make chasing pointers to pointers to pointers to ints fast.
A ton of work has to go into computation-friendly data organization (see all of numpy, scipy, etc) before you can ever actually optimize effectively.
EDIT: since all of my responses are of the form: "oh no I did it once with great results", I'll retract the part where I said "have never really tried it". I'm not arguing it's never worked. I'm arguing that the set of likely performance problems and the set of easy-to-optimize-with-C problems don't overlap as much as people would like to believe.
I have tried it. My MSc dissertation required a lot of computation (weeks of runtime after I'd rewritten critical portions in C++), and I'd do it that way again.
For many types of problems, putting the data into an efficient format before passing it to your C/C++ code is trivial in terms of both difficulty and time spent.
I prototyped everything in Ruby, profiled, and then rewrote a handful of the critical parts in C and C++.
I've done it too, and I think most of the responses to the GP are missing the point. Can it be done? Sure. But it's a pain in the ass. It's not nearly as trivial as people suggest. It's basically only worthwhile when you know you have the resources to build an infrastructure of re-usuable fast modules that get coupled with a scripting language interface.
Even when you give up on writing true "hybrid" code (i.e. "just write modules for the slow parts in C++", which is a gigantic pain), and instead try to write driver code that calls standalone compiled applications, you find yourself having to write a lot of duplicate code to parse/validate I/O on both sides of the language boundary. It sucks, and there's no way to make it not suck.
I'd go so far as to say that for problems where processing speed is a known requirement from the start, you should just give up on interpreted languages. Fast interpreted languages have been right around the corner for long as I've been writing code, but they sure seem to be taking their time.
I would venture to say the triviality of writing hybrid code is a function of how often you have done something similar, and how well you understand the underlying principles. This is really the case with any software problem.
Ruby provides some very useful features, but a lot of those come at rather high costs. Those costs are exacerbated if you do not understand how the internal components of the language are laid out, and the purpose behind this layout.
Ruby tries to be a lot of things to a lot of people, so then when people learn how to use it for one task they automatically assume that their skills will carry over. This sort of approach might work reasonably well with a straight-forward compiled language, but this it simply can't be that easy for an interpreted language like ruby, with it's reams of special context sensitive behaviour.
For example, consider the "pointers to pointers to pointers" complaint. Nothing is stopping you from having a low level C struct for all your performance sensitive data. Granted, you would have to write a ruby wrapper for this data for when you need to export it back out to ruby, but wrapper generation could be automated.
Sure, you could just say, "You know what. I'm just going to use C" but what if your project could really use some of those ruby features outside of the performance sensitive parts of the code? It's always a tradeoff.
I didn't find it hard at all, nor a pain in the ass. I wrote my code. Profiled it. Found the few, small bits that needed to be fast and treated the Ruby code as an executable spec for the C, and rewrote it very quickly.
There is SWIG support for Ruby, as well as the Ruby FFI implementation and RubyInline, offering various degrees of simplicity, but none of them are hard. The Ruby extension interface is more work, but even that is fairly simple to work with. I've taken C libraries and wrapped them in Ruby in minutes using SWIG for example.
Overall I often these days choose to write in Ruby first and replace even for applications that I intend to write the finished version of entirely in C. Prototyping in Ruby and rewriting in C once I have a working version is faster to me than iterating over a C version.
"I wrote my code. Profiled it. Found the few, small bits that needed to be fast and treated the Ruby code as an executable spec for the C, and rewrote it very quickly."
The fact that the speed-critical parts of your problem could be reduced to a "few, small bits" indicates that we're talking about entirely different things.
I was in a problem domain where writing the time-critical pieces in a different language meant ensuring that huge amounts of working-set data were available to that optimized code. Efficient access to that data meant storage in native format, which meant that supporting two languages would require a nasty I/O layer for the interpreted language. Not worth it.
And that's just for the problems where it was possible to consider using Perl/Python at all. For some problems, the memory overhead of Perl/Python data structures ruled out use on all but the smallest problems. Again, not worth it.
It's nice that the speed-critical parts of your problem were reducible to a few lines of code, but that's not often the case. For most real problems, I/O is a huge concern, and can't be dismissed just by pre-processing the input.
Also, it's a weird wonderland in which you know exactly which parts need to be fast, reimplement those, and then never need to touch those bits again. In my experience, that's not the case - you're going to want to tune and alter your algorithm, and if it's API needs to go through several languages, it's going to be a pain. Every time you decide you need a slightly different input format or data store or set of hyperparameters, you'll likely be changing many more files than you would in a one-language solution, and in a way that's hard to unit-test and hard to statically type.
It's certainly doable, and definitely the right way to go if you want number crunching in a high-level language, but it's not a free lunch. It's certainly not a quick one-time conversion and then you can forget about it.
We don't call 'scripting languages' 'glue code' for nothing.
If more people thought of FFI/RPC/etc. as just another pipe-and-filter framework, they'd be less reluctant to write properly-defined, performance critical, pieces in a more performant environment.
Then again, your use of a profiler and FFI/pipe/whatever is a secret weapon that seems to be one of the things that separate 1x from 10x engineers. :-)
After profiling a backfill script which was looking like it was going to take about two weeks to run, I found an area where certain bit-twiddling was being done which was extremely slow in Ruby. I rewrote this section in C, and was able to make this section 1000x faster then the ruby equivalent (this was MRI 1.8.5 several years ago). The total runtime for the script was brought down to about 8 hours (from about 165 TPS to around 7,000 TPS).
Fortunately, I've found that Ruby makes it exceptionally easy to write C extensions, and I often think of reusing C libraries in cases where I'm worried about performance.
I didn't mean to imply it wasn't ever possible. Just that the subset of things that it works great for is smaller than people wish, or believed. That "I'll write in Ruby and optimize in C when I need to" is a myth on the order of the "sufficiently smart compiler". There are things for which it just plain doesn't work. Things that are quite common.
Yes, you can make image transformations for your webapp fast by piping in a C-extension, because the C code can control the i/o and data organization as well as the algorithms. The integration surface is tiny. You can't, however, make the language's memory usage and garbage collection faster. At least not as easily. You won't be "dropping into C" to fix your garbage collection problems, at least not in the way people dream of when they think of "performance problems" at the outset.
If your memory usage and gc speed are of critical concern, then you should really know that ruby is just not going to provide the tools you need to handle that. That's like using a hammer when you need a screwdriver. Even the most modifiable hammer is still meant for pounding, not screwing.
Whenever I see "I'll write in Ruby and optimize in C when I need to" I assume the context of "my program is well suited to be written in Ruby." For instance, I certainly wouldn't write embedded code in ruby. However, I would venture to say that these situations are the exception rather than the rule.
These days most software has access to fairly reasonable computational resources. If you're writting microcode for some industrial control system, or calculating stock price variations with micro-second resolution then certainly stick to C or ASM or what have you. However, in an age when even a fridge will have a few hundred MB of memory, and when a lot of code is meant to be just "good enough" I think even ruby will suffice.
In a way you actually do fix the garbage collection problems by dropping down to C. One of the main performance hit in tight loops in Ruby is usually object allocation and GC. If you drop down to C you can handle the memory manually and avoid most of the allocations.
Now, imagine if you hadn't had to profile at all, or even drop to C, because you'd written algorithmically efficient code, and could rely on the runtime executing it well?
I'll interject one more 'done it' anecdote. The document processing system I've been working on for several years had several libraries in C. Very few issues compared to the general logical bugs. Integrating the Judy arrays library was a huge performance benefit in a number of areas.
Depends on how you go about it. I personally haven't encountered any cases where it would make sense to write a few functions in C and link them into a Python/Ruby program. But I have encountered a number of cases where what works well is to write the main program, the part that does the heavy crunching, in C++, and various ancillary scripts for preprocessing, postprocessing etc. in Python.
> it is very reasonable to code that part in some fast language while delegating the bigger (and often more complex) part in a higher level language such as Ruby.
Agreed. Contrary to what lbrandy said, I've tried it, and it works very well. Writing Ruby extensions in C, or wrapping C libraries for Ruby, is really easy. Wrapping a PNG library for Ruby allowed us to draw about a trillion pixels' worth of images based on the human genome across a modest cluster of ~100 cores in a few days [1,2,3]. For reference, a terapixel image is about ~4TB of uncompressed data and ~800GB compressed, with Microsoft putting up comparable numbers for a project they ran on a fancier HPC cluster in 2010 [4]. We were heavily compressing it into PNG tiles on the fly so ours came down to ~60GB.
Regarding pointers to pointers to ints--if you keep all the data within the heap of the Ruby extension, as we did with the wrapped PNG library, then it can be stored in whatever simple data structure you want.
When generating huge numbers of png images, it's worth it to tweak the compression settings. Run png optimizers like optipng/pngcrush on your output, and use the settings it finds.
LodePNG is easy to use, but doesn't let you tweak as many options as libpng.
The tiles I've inspected are 15-30% larger than necessary-- besides using sub-optimal compression heuristics, they include unnecessary metadata (creation/modification time, aspect ratio).
Yeah, I did consider optipng/pngcrush. In the end it was a tradeoff between running time and space. I could afford the space but not the time. When I added in pngcrush, because it brute forces a whole bunch of compression routines on each tile, running times went up dramatically and I preferred having the job complete in days as opposed to weeks. It might be something to consider if we generate more tiles on a faster cluster.
-Write application in Ruby
-Profile Ruby application to identify code to be rewritten in C
-Learn C if you don't already know it
-Rewrite various parts of your application in C
-Run Valgrind to search for memleaks
-Ensure that your target systems all have required libs installed
Sure it does, right up until you also need libraries that Ruby has but Go doesn't. Also you're trading away a few language features to get Go, so between the two you're tossing a lot of programmer time out the door to avoid writing in C. Maybe that's still a worthwhile trade, but it's not really a simple one.
>Sure it does, right up until you also need libraries that Ruby has but Go doesn't.
Now replace go with ruby and ruby with perl. Go has libraries for 99% of what people are doing with ruby. If you are in that 1%, then you need to evaluate whether or not it is worth writing the library you need or if you should use another language.
A quick check led to me not finding a package manager for Go similar to RubyGems. Does it have one? Manually installing dependencies is a colossal waste of time.
It is just go. Do a "go build foo" or "go install foo" and it will handle any dependencies for foo on its own. Do a "go get bar" and it will download and install bar and anything it depends on.
Where are these packages published? Can I install things from this page[1] in that way?
At least from 1000 feet away, it seems like Ruby has this problem solved better than Go does. Maybe it's just that the nature of the solution is more apparent.
>Can I install things from this page[1] in that way?
Sure. Just do a "go install github.com/foo/bar". Bitbucket, github, google code, and launchpad are all in the defaults, but you can configure your import path to add whatever locations you like, and then "go list" will show packages available there, and the other go commands will be able to build/install/etc those. The same syntax is used for import statements, so your code can depend on any code any where:
import "mycoolwebsite.com/~bob/myproject.git/foo"
>Maybe it's just that the nature of the solution is more apparent.
I think they make it pretty clear: http://golang.org/cmd/go/ gives you pretty much everything about packages and so does "go help". Go pretty much does everything out of the box with the included toolset, from installing dependencies to formatting code to refactoring.
Because I am being realistic. Ruby, C, Java etc have had decades to build a comprehensive set of developer libraries. Go hasn't.
It's a "cute" platform that I am sure will beat Ruby in a few benchmarks. But I don't see any reason why anyone would seriously consider it for a real world app.
Ruby has a sprawling ecosystem of libraries largely drawn into being by Rails (itself a sprawl) and by design decisions in Ruby's core and stdlib that complicate things and need further code to work around. Threads and fibers and events, oh my. Each workaround has its own ecosystem. You could even consider things like rack and unicorn workarounds for the weak implementations in stdlib.
I don't think Ruby's libraries honestly provide a whole lot better coverage than Go. The stdlib is very well considered. It's fast, it scales, it's thread safe in the appropriate places. It's not a toy implementation. What it doesn't provide, it provides hooks for. Third party libs cover all the major bases. And unlike Ruby, writing interfacing code in Go is a snip. I don't see libraries as a downside.
(With one exception. Why oh why did they not include a bigdecimal? Argh, a ratio type is not an adequate replacement. Try rounding a ratio some time. Blarg.)
Well for one Go beats Ruby in every benchmark that I am aware of, and generally by quite some distance.
Also realistically 60-70% of all libraries created for other languages are bitrotted to junk or pointlessness by this point.
If you can't see why an app that fits within the worked problemspace of Go (and even with the current libraries plenty of things are wholly doable in Go) and will require much less server resources might be a good fit, then you shouldn't be making those decisions, because that is the thought pattern of a zealot.
Look Ruby is well supported, established, and proven, but it is also slow. For a lot of things that is fine, but if you are dealing with something that requires a lot of serverside work on low margins it will kill you dead. Horses for courses.
The point is if you need the performance, C is the more natural choice for a lot of it at this point. I looked through one of the Go tutorials a while back, and it was WTF after WTF. If it works for you, fine, but Go has a long way to go to be interesting for a lot of us.
Because in reality, people don't make decisions based on how many duplicate, unmaintained, broken libraries exist for language X. They make them based on "does language X have the libraries I actually need". Oh, json, DB_of_choice, and web framework exist? Gee, I guess 90%+ of people are covered right there. Virtually every language, even as seldom used as say, ocaml, have the libraries that 90% of people actually use. Libraries aren't an issue for the majority of tasks people are doing.
The bottom line is: the time necessary to optimize Ruby to run below 8 minutes was bigger than he could afford, and apparently Go, once written (which apparently gave the OP some issues) needn't be optimized.
That said, Go wasn't necessarily the better choice, because of the subtlety that made the Go program overflow silently. Writing the solution in Scala or even in JS would have probably given him less of an issue.
I recently wrote a image searching program (find a set of images as sub-images in another set of images) and I wrote most of it in Racket (Scheme dialect). I wrote the inner loop to calculate the correlation matrix in C and made an FFI call.
I'm not familiar with Ruby's FFI, but I found making FFI calls dead simple in Racket.
It's pretty much `gem install ffi` and off you go. I've used a Ruby gem [1] that talks to a hashtable storage library via FFI and found it perfectly stable.
Indeed. And there is more than one way to do it. The most obvious thing is to write a Ruby C extension. But you could also write a totally separate worker process in the language of your choice and glue it to your main Ruby app with a message queue. In most real-world cases, either solution will get you the performance you need to avoid 10xing your servers.
A really simple solution in that case is to run JRuby and then write computationally intensive code in a JVM compatible lang like Scala or Clojure. This route seems popular since you can easily give JRuby access to your Scala classes.
Right, but if already writing in Scala, why ever write some parts of code in Ruby? Scala gets you expressivity of Ruby with additional benefit of static typesafety.
Well a pretty typical scenario is you are writing a ruby app and run into an area that needs additional horsepower (say a gaming API or something). Being on JRuby would give you more options at that point.
Because it is a huge pain in the ass, and very rarely does any good. As soon as you are writing C, you are dealing with all the headaches that go with it for both development and deployment. You've lost a big part of the benefit of using scripting language X. And this part:
>In many cases, only a tiny fraction of your code needs to be fast
Isn't true. People repeat it a lot, but I've seen no evidence to support it. Most people are using scripting languages for web apps which really are just generally slow all over. They would need to write 80% of it in C to get good performance. Seems like using a language that is as productive as scripting languages, but is also fast, makes a lot more sense.
Not to thread-crash, but in case you missed it, we posted the results of Round 3 of our web frameworks and platforms benchmarking last week [1]. Seems relevant given your comment about hosting costs.
This round includes many more community contributions including more Scala, Erlang, Haskell, Python, Java, PHP, etc. And Go 1.1.
Topaz is also a really early experiment that supports a pretty small subset of ruby. I really hope it gets developed into a real, useful ruby vm but at the moment it is not even worth considering.
There's always Rubinius but it seems that the path to alternative implementations of the Ruby language is paved with hoards of hurdles so yeah, it'll take time.
Do you think we as a community should be pushing for a potential rewrite of MRI to get rid of the GIL? I feel as if the GIL is what is limiting true ruby performance.
It also seems like Matz and ruby core team don't really give a shit about performance.
It is very possible to achieve a high level of performance on an interpreted language!!
I don't think that it's true that the core developers don't care about performance at all.
There's an argument that removing the GIL would require a lot of effort for little gain (and in some cases decreased performance). It's just not a priority for them.
Of course, every use case is different (sometimes very different), but I haven't ever hit a problem that has caused me to curse the GIL.
Sure, I knew Ruby wasn’t going to be zomg fast, but I always assumed that if I chose the right solution and wrote in an efficient manner (memoizing, storing values/lookups that would be used later, limiting search spaces, etc), my ability to write code quickly and concisely mattered more than sheer processing speed.
I was wrong.
Sure, the author is wrong, but not because Ruby is slow (it is, though). He's wrong, because he did not find the right solution for the problem. Instead, he wrote a brute-force solution with a simple optimization, and did not realize that this is the real cause of his code under-performing, blaming the Ruby's slowness for it, when he really should have blamed the Dunning-Kruger effect.
One of the points of scripting languages is that you can solve the problem quicker, in terms of developer time, by not having to think about it as much. If you have to think about it more, and come up with clever solutions while a faster language just lets you brute-force it, then they haven't really let you solve the problem more quickly.
(That said, programming competitions are a pretty unrealistic subset of what actual software engineering is like. C++ is a fine language for programming competitions, because it's fast, excels at the sort of numeric manipulation that's common there, has a "good enough" standard library with the STL for common tasks featured there, and the common pitfalls of C++ development - memory management and slow build times, for example - won't bite you in programs of competition-length. That doesn't mean the equation doesn't change if you're, say, building a web app or parsing a simple file format.)
To me, the obvious solution is to generate the palindromes, not iterate through every number and check if it is a palindrome. Unless generating palindromes is a very hard problem, iterating over them should be the obvious solution, as it obviously will mean a substantially smaller set of iterations.
As it happens, generating a series of palindromes is easy, and is obviously massively faster, since as a first crude approximation of the difference you can take his example program, change it to step through the space with "step(10)", convert to string, replace the last digit with the first and check that the resulting number still falls within the range, and then check if it's a palindrome. Just this crude change cuts the number of iterations required to 1/10th, and it hints strongly at the effects of recursively applying the same method.
Having to handle varying length palindromes slightly complicates matters, but not much.
Even if he did brute force it, he did the same thing with Go and it was much faster. So by comparison Ruby is slow and his conclusion is sound. Yes he could have optimized it, but that just means he could of optimized the Go solution too.
ruby is slow: sound
ruby's slowness is his main problem: not-sound
He's not going to win programming contests with brute-force solutions. Even his brute-force method makes unnecessary computations - why keep running after n^2 is out of range?
> memoizing, storing values/lookups that would be used later, limiting search spaces, etc
He did none of that. Kudos to OP for benchmarking & profiling though -- the speed of Go is impressive.
In fact I don't think his conclusion is sound. If you're into algorithmic competitions you most likely don't need to change the language. If you devise a solution with a good enough complexity, the competition (if well designed) should allow enough time for the execution of that (near-)optiumum solution, while not awarding points for solutions which are asymptotically worse. The time taken by algorithms with different complexities vary with the size of the input, as a nonconstant function, while implementing the same algorithm in different languages will give you a difference of a constant factor.
What I'm trying to say, (stating the obvious):
When the input is large enough, it doesn't matter what language you're programming in. And programming competitions should (and generally do) only focus on that.
But I'll admit that I'm a huge fan of optimizing my algorithms with bit-level operators, extreme care of memory allocation and other tools C offers.
> when he really should have blamed the Dunning-Kruger effect.
Going a layer deeper, blame the competitions themselves. They very rarely have good data sets to actually weed out algorithmically poor solutions, so you can often get away with brute force or mostly brute force solutions -- if you use a low-level programming language.
For instance many times a O(N^2/100) will pass the test data just because it's written in C++ instead of something slower. That's a failure of the test data.
Not sure I agree, although I understand your point. Most algos have performance profiles that are significantly different when run on best-case and worst-case data sets. An algo implementation that works correctly for the data set of a particular problem is not invalidated b/c it does poorly when implemented in another language. If it works well only in C++ in this competition, and the developer chose C++ then it seems to me he has successfully answered the call.
Ruby is great in terms of allowing people to experiment with different algorithms pretty quickly. It's easy getting the code written and tested because the code tends to be so succinct, the standard library already includes a lot of shortcuts, it's synchronous, there's a lot of metaprogramming features readily available and the interpretation allows for quicker testing when you can get the basic syntax right after some experience without requiring the help of an IDE as most Ruby code is written in text editors.
Languages that demand IDEs tend to be more complicated in many ways. They can be faster, but also might demand more code obfuscation through more code that need to be written like types and longer names and importing of libraries and so on. My pet peeve is that I absolutely love Ruby, but it indeed is not meant for ultimate performance. On the other hand, languages like Dart might allow for some compromise if you can get stuff done with less power at your hands like less metaprogramming and fewer shortcuts... Except that Dart is trying to tackle asynchronous programming with the standard libraries which is itself quite complicated (Future what?)
Go and Dart are not themselves too tuned for performance yet, even though they tend to be very quick when compared to Ruby. They tend to solve different problems though.
Ruby has a couple of architects, Matz and SASADA for the VM. Go has a few. Dart has many, with some tending to the standard libraries.
Programming is unfortunately way too complicated. In my opinion, languages like Ruby are wonderful when you don't have a need to hide the source-code. On the other hand, languages like Go and Dart might be good when you do have a need to hide the source-code when deployment time comes.
I also come to the conclusion that Ruby-like languages are just not the right tool for this kind of problem, but it should be pointed out that you would have been able to solve the 10^14 data set easily, even with Ruby.
I was solving that problem with PHP and got it running reasonably fast (1s per range) for ranges up to around 10^60. PHP is probably faster than Ruby, but it's still in the same area.
The first trick you can use is that you don't have to go through all numbers between start and end, rather you can directly only traverse the palindromes (i.e. you will only have to increment one half of the integer).
The second trick is that only numbers consisting of 0s and 1s (and a 2 at the start/end or the middle) can satisfy the condition (exception: the number "3"). This further massively reduces the search space.
There are more criteria that you can use to narrow down the numbers to consider (e.g. the number of non-zero digits can't be greater than 9). But in the end, cracking the 10^100 range is probably not possible in a language like Ruby or PHP. Using something like C++ on the other hand it becomes a triviality.
I learned python about a month ago and was able to solve all three parts of this problem correctly (part C was the most difficult). My code was poorly optimized, but with some careful forethought, python's bottlenecks were not too problematic.
The first two parts were very easy with only brute force by working thru the square root of the full range once, caching the numbers, and using that cache. I don't see why people (including the auhor) recalculated the numbers for each goven range, especially in part B which required 10k such ranges.
That said, my python code, as a relative newbie, took 10s to cache all fair and square numbers from 1 to 10^14. I ran thru from 1 to 10^7, checked if palindromic using strings, and if good checked the square. (there were only 39 such numbers in total). I then ran all 10k answers in seconds.
Part C was MUCH more tricky, until I discovered the same trick most others found, that only some of the square root palindromes are valid with digits 0, 1, and 2. That said, as a python n00b, my code took 40 mins to run the range from 1 to 10^100. I asked the google staff if pre-caching is allowed, and they said yes, as long as my code to generate the pre-cache is provided wih my submission.
Properly optimizing my python would probably give similar performance. But the point is that with a bit of forethought, even poorly-optimized python code by a newbie was sufficient to solve all three parts of this question.
I also solved in python, however, my code to pre-generate all "fair and square" numbers in the range 1->10^100 only took 6s to run. Didn't even need optimization. I simply constructed the numbers outright, ie, there was no iteration in my code and trying to see if something fit. I just figured out a rule that would generate all those palindromes with no checking. I think that was what they were looking for in that submission.
If anyone is curious, the rule is as follows. Any palindrome where the sum of the squares of the digits add up to strictly less than 10(9 or less) will be the square root of a "fair and square number". From there, I constructed them using a recursive method which finds the first half of numbers fitting that description and mirrors them to get the palindrome.
I ended up getting an incorrect solution on C-large-2 but that was because I failed to realize that I would lose precision when finding the square roots of the bounds of the range. While integers in python can grow indefinitely math.sqrt() will give you a 32bit floating point number which will represent square roots of 10^14 numbers accurately but not square roots of 10^100 numbers. The diff which turns my program into one that gives correct answers has literally 3 changes lines(to use decimal.Decimal instead of the math library for doing square roots).
You don't need to use Decimals. When you find all square roots of fair and square numbers, don't cache them but their squares (ie, the fair and square numbers themselves).
Then for each of the 1000 inputs, you run through the set of 46k fair and square numbers, counting how many are between A and B. no square roots required.
Yes, I could've done that, but my code from C-large-1 already use sqrt() on the bounds of the range and I never thought to change that. Since I was generating the roots instead of the full numbers it seemed natural to iterate over those. If I had thought there might be a problem with sqrt() I might've done that. However, it just never occurred to me. I just ran my C-large-2 solution against C-large-1 and got a correct answer so I assumed I hadn't made mistakes. I only found out about this after the competition was over. Oh well, this was just a qualification round anyway :)
Yep, in a competition you want a language that had as little magic as possible. Languages like Ruby (not necessarily dynamic languages!) are just not really suitable for it
This seems like such a vague, insubstantial piece of advice. I've seen it before, but never in a more concrete form. I don't see any "magic" in Ruby. How do I determine if a language is magic? Usually, the best I can figure is that they're either confusing syntax with semantics or they're actually thinking of some library like ActiveRecord rather than Ruby.
Really, for pretty much anything I could conceivably think of to call "magic" in Ruby (e.g. method_missing), there's equivalent "magic" in Python (e.g. __get_attr__). And yet PyPy is pretty darned fast.
I think people are referring to language features that are very powerful and concise but have horrible performance when you get large data sets like this.
Something like the inject method, at least in my limited use of ruby with these kind of Google Code questions, seemed problematic.
I've been taking part in GCG since ~2008 now. I have only once encountered a problem where my solution in python was too slow but an equivalent one in C would've been fast enough. Though just barely(my python solution ran in ~10min, the C one I wrote later ran in 5min, the time limit was 8min). In that particular case, there was a more efficient algorithm which we were expected to find to solve the problem(ie, my brute force-ish solution was not anticipated and it would've been "cheating" if it had actually worked) and I didn't find it.
In general, I'd much rather be able to implement my solutions as quickly as possible as the challenge is usually in 1) finding the correct algorithm and 2) implementing it correctly and fast. If you can do that, choice of language usually doesn't matter unless in some of the problems you might get in the finals(I've never gotten that far, I usually rank somewhere in the top 2000). Also, for me, the ease with which I get things like arbitrary-size integers, sets and dictionaries far outweighs any speed gains from something like C++.
In these competitions, it's about implementing a solution as fast as possible, while still get it working under the time limit (5/8 minutes). It's mostly about having the right idea and implement it.
In very many cases, you would like to be able to manipulate and play with algorithms and have efficient mutable data structures. For this, Java/C++ (Essentially C with data structures) is very suitable.
However, in some cases, you'd like to perform advanced simulations, mathematical calculations or manipulations where immutable data structures are more suitable. In those cases, you would LIKE to have magic: Arbitrary numbers, ratios, (efficient) immutable data structures and a more functional style would not only eliminate work, but also reduce the amount of errors you're likely to do.
The important part is that the implementation uses the right idea, not that the implementation is fast. Usually it's handy with a lower-level language to implement the algorithm right, but at times, a higher level (functional) language is better when it comes to implementation speed.
Wouldn't you want to use a language that is familiar in a competition? If Ruby is your forte, surely you should use it?
I fail to see how "magic" properties of a language matter if you've got reasonable algorithmic chops and focus on solving the problems rather than trying to be clever.
If the goal of the competition is to slice apples, then a knife is pretty good compared to a gun. Or if the competition takes place under water. Or if you're Steven Seagal and the ship where you work is taken over by a splendidly over-acting Tommy Lee Jones and his gun-wielding minions.
Execution speed is nice, I don't disagree. But it's not always the be-all and end-all deciding factor for what tool to use to solve a problem in a specific context.
Competitions can be stressful, and choosing a tool that feels like an extension of yourself (be that Ruby, COBOL or whatever) is, to me, the better route. Compared to something that executes faster on your computer but not in your head.
I don't think the parent post (which I totally agree with) was saying that you must always decide by picking an inherently fast language, but rather if you decide to pick one whose execution time is not inherently fast to enter a competition where execution time is factored you will start with a major disadvantage.
That doesn't mean you'll lose (ala your Steven Seagal analogy), maybe you're so much better than everyone else that you can overcome the disadvantage, but you'll surely be starting in a bad position compared to someone holding a 'gun' even if he or she is somewhat less skilled at using the gun than you are at using the knife.
My experience with GCJ has been that raw execution speed almost never matters as long as you have the right algorithm.
In the case of the OP, for example a trivial optimization(the kind you do without even thinking in these competitions) would've brought his execution time down 10000x. Basically, you are asked about the number of "fair and square" numbers in a certain range, over and over again, in 10000 different cases. He could have just taken his solution, pre-generated all those numbers once and then ran all those test cases by counting in that pregenerated list, instead of generating it for every single test case(all 10^5 of them). It turns out, that in the whole span of 1->10^14 there is only 39 such numbers and it would've taken his program probably less than a minute to generate all of them(I'm being generous here, since it ran 10^5 test cases in 53min).
In fact, if you read the problem analysis later provided by google you will see that in fact you were expected to make this optimization to solve the problem. The fact that C-small had 100 test cases, C-large-1 had 10^5 and C-large-2 had 100 should've been a strong hint that C-large-1 required some pre-caching across test cases.
Well sure, adding "where execution time is factored" makes it hard to disagree with you. (And ultimately, it usually becomes a factor sooner or later.)
That wasn't the context in which I was writing my original reply, however. I was responding to a post that mentioned "magic" as a reason to disregard Ruby for competitions, which I didn't agree with. (The Ruby solution in the submission was slow even using a while loop, which is far from magic.)
If you're able to solve problems in the same amount of time using a fast and a not so fast language, of course you pick the faster one in a competition. I'm not arguing against that.
I disagree with the premise in a similarly nonsensical fashion. (I mentioned neither execution performance nor weapons in my reply.)
The ultimate goal in a competition is to solve the problem within a fixed amount of time. This requires you to actually implement the solution to be able to run it. If you are able to do that more easily with a language that is inferior in execution speed, that will probably be the best strategy. At least compared to using a really fast language that you are not as comfortable with.
It's why IOI (http://www.ioi2012.org/competition/rules/) doesn't allow interpreted languages, because they are (or at least when the competition was first held were) just too slow.
If you have only 1-5 seconds of computations time you're going to choose for a compiled language every time
Well it's just that to support another language they need to make solutions for every problem in that language to test that it's actually possible and they don't want to do that.
and on that page it says:
Each submitted source program must be written in C, C++ or Pascal, it must be smaller than 100 KB, the evaluation server must be able to compile it in less than 10 seconds and at most 256 MB of memory. No other extensions different than the ones listed above will be run.
They put Python and Ruby on the machines so you can write a script to generate testcases, you can't submit programs written in those languages
When you say "magic", what you really mean is "abstraction", which is a more suitable and neutral word to describe the pros and cons of using a higher level language.
Not that I disagree with your premise - if you need to write highly performant code, lower level languages will pretty much always be faster. But abstraction is extremely beneficial in many other circumstances.
On the contrary, the biggest downside of magic comes in maintainability. So languages with more magic are more appropriate for competitions than for regular code.
For comparison, I translated it pretty directly to Python:
import sys
from math import sqrt
def string_palindrome(num):
s = str(num)
return s == s[::-1]
sys.stdin.readline() # throw away first line (number of cases)
for count, line in enumerate(sys.stdin):
found = 0
start, finish = [int(num) for num in line.split(" ")]
sqrt_start = int(sqrt(start ))
sqrt_finish = int(sqrt(finish))
for x in xrange(sqrt_start, sqrt_finish+1):
if string_palindrome(x):
square = x * x
if string_palindrome(square) and start <= square <= finish:
found += 1
print("Case #%d: %d" % (count, found))
For the first 15 test cases I get the following run times:
The integer version for checking palindromes was also much slower in python.
Edit: Even though the interger version of _palindrome is slower in pypy, there's another optimization that works there, but is slow in cpython. It brings the run time down to 0.6 seconds!
def half_string_palindrome(num):
num_str = str(num)
num_len = len(num_str)
for i in xrange(0, num_len // 2):
if num_str[i] != num_str[num_len - i - 1]:
return False
return True
I don't think you can just pass '(i) to palindrome? here - doesn't the number need to be converted to a list of digits first? I got #t for everything using your code.
I took some code off SO and it bumped up the runtime substantially:
#lang racket
(require srfi/1 srfi/26)
(define (digits->list num (base 10))
(unfold-right zero? (cut remainder <> base) (cut quotient <> base) num))
(define (palindrome? lst)
(equal? lst (foldl cons empty lst)))
(define ITERATIONS 10000000)
(for ([i (in-range ITERATIONS)])
(palindrome? (digits->list i)))
racket palindrome.rkt 11.66s user 0.55s system 99% cpu 12.293 total
I'm a racket beginner so I'd be interested in better answers to this.
There are actually several people who solved this problem (including both large inputs) using Ruby.
And for those asking about Python, most Code Jam problems are apparently calibrated to be solvable using Python. Keep in mind that it's one of the languages Google uses internally and Guido used to work there, so they probably want to support it. From https://code.google.com/codejam/problem-preparation.html:
"Where possible, if you allow multiple programming languages, make it so that all languages are fast enough to solve the problem. Occasionally that's impossible; in Code Jam we generally calibrate by making sure a somewhat optimized Python program can solve a problem. Sometimes, though, there's just no way to have Python succeed with the best algorithm and C++ fail with a worse one. In those cases we'll typically decide that sometimes in life, you just need to use a faster programming language. We try to avoid making that decision if we can."
First of all, there are plenty of people who solved both large inputs with Python/Ruby (with and without pre-computing all the fair and square numbers).[0][1]
As others have stated, the contests are designed such that language shouldn't matter and problem sets are deemed solvable with Python as a baseline. For this particular question, the official contest analysis goes into detail on how to reduce the solution space:
Now the issue becomes the trade off between development speed vs runtime performance.
Is it possible that you might solve the problem with a worse solution in a better performing language? Yes.
Is it possible you might finish a solution within the time constraints in a more expressive language? Yes.
If something is CPU bound in production, then going with a better performing language is a viable consideration.[2] However, my goal in programming contests is often to become a better programmer in the process. As a result, I choose to focus more on improving algorithmic complexity which is language agnostic.
Wow, 5 minutes in Ruby, and less than a second in Go? I know scripting languages are slow, but it's easy to forget just how much slower. (EDIT: Corrected misread number thanks to dljsjr).
I don't want to detract from the article's main point with a cliche "algorithmic optimization beats fine tuning", but I think it's worth mentioning. Unless I'm mistaken, this particular problem can be solved about 10,000,000x more efficiently.
> Given two numbers X and Y, return how many numbers in that range
> (inclusive) are palindromes, and also the square of a palindrome.
I would just count the digits of X and Y, then ranging between those counts, apply the following algorithm: For N digits, enumerate all numbers of N/2 digits satisfying the bounds X and Y. If N is even then simply mirror the first N/2 digits. If N is odd, mirror the first N/2-1. Finally, check the bounds of the resulting number against X and Y to filter out edge cases.
If the original algorithm runs in 10^N time, this would be 10^(N/2). For N=14 as mentioned, that's roughly a speedup of 10,000,000.
Edit: It seems I misinterpreted the question (which sounds like you're supposed to enumerate all palindrome numbers X>=N>=Y, and also print out the squares of those numbers). This is actually asking for a particular type of palindromes -- those which are palindromic squares of palindromes. As macobo helpfully points out, you can drastically optimize mathematically from this additional filter.
This problem has a quite elegant solution - Suppose you know the fair palindromes (is a palindrome and a root of a palindrome) of length 2*k. Then you can add either a single digit or double that to the center, and test if it's fair. This algorithm generates all the fair palindromes. https://gist.github.com/macobo/5430510
By memory, there were under 50000 such palindromes in range 0..10^100.
I'm not a rubyist but it appears the OP was doing something like this:
for each range (start to sqr(end))
for each number in the range
check if the number is fair and square
If you do this you're checking the same numbers many times (total count of numbers checked is over 3 hundred million calls...)
The right approach is to first compute all fair and square numbers between 0-10^14, store the result, and then simply check how many are found in each range.
It turns out there are only 39 fair and square numbers between 0 and 10^14.
I participated in Code Jam 2013 and passed the qualification round using Python; after precomputing the fair and square numbers, checking ranges was pretty much instantaneous (for large input 1).
A slightly more interesting approach: find out all non-overlapping ranges you need to test, and then run the fair-and-square generator, testing each to check in which of the ranges it should be put, finally combining the sub-ranges to make the ranges asked for. My (not particularly optimized) Go solution ran through the C-large-1.in set in 3 odd seconds.
The Go solution is incorrect. It's not using int64, and most of the finishes are bigger than a 32bit int.
If you run it on the following input:
"""
2
8 10000200002
1 500
"""
You will see that the loop that runs from start to finish is not running on the first case, because it sees the finish as 0.
Haven't tested it using int64, but I would imagine it would be much slower(Still a decent bit faster than Ruby presumably), because any heavy computation for this problem was removed.
I noticed that as well [1], and rewriting the code to use int64 (and thus cover the entire range) makes the Go code run in 2m40s. Still faster than Ruby, but not as incredible as the author thought it was.
These sort of contests are not tests of speed of computation, they are tests of algorithmic knowledge. The final data sets will (usually) all be designed to make even the most optimized implementations of the wrong algorithm fail.
I'm sorry, but is the Go program correct? I ran it on my machine, and for Case #1, it reports a solution of 1, while there are actually 19.
EDIT: I modified the Go program to use int64 instead of int (the upper bound of the first case is not a valid 32-bit integer), and the execution time is now much higher: 2 minutes 41 seconds on my laptop.
If your system is running a 64-bit OS on an amd64/x64 chip, set your GOARCH to amd64 prior to compiling the test program (depending upon how you installed Go, this may require a recompile of the Go tools).
Presumably that was the target GOARCH the OP was aiming for since int on GOARCH=amd64 is 64-bit (int mirrors the native bit-ness of the architecture in Go instead of virtually always being 32-bit like it is in C/C++).
Using int64s on a 32-bit GOARCH (which I assume is your current GOARCH if int is overflowing after 32-bits) will certainly result in a lot of slowness compared to using int64 or int (which will be the same as int64) on a 64-bit GOARCH.
Also as far as speed of execution goes, it'll likely depend greatly on the version of Go you're using. The RC releases of Go 1.1 are much, much faster than Go 1.0.3 for a lot of code.
> (int mirrors the native bit-ness of the architecture in Go instead of virtually always being 32-bit like it is in C/C++).
Seriously, in a language designed ~2010 a common "int" variable could be 32-bit? 64-bit should be the minimum because 32-bit is too small to express common values these days. It's 2013, if you actually want a 32-bit integer use int32. Better yet make "int" always be a particular size and use "register_t" or "int_fast_t" similar as a native int.
You can't even assign a smaller type to a larger one in Go without a cast, but you can recompile it on a different arch and, like this program, have it completely fail because the size of the variables changed. And baked-in coroutines so that you won't run out of address space on 32-bit? Really?
Go is so archaic in many ways like this. It's like somebody from the 70s tried to come up with a better C than K&R.
I have mixed feelings about the way 'int' works in Go.
I share your annoyance to some degree and think it gets even worse when you mix CGO into the equation since people writing CGO will often leave C ints as Go ints when porting structs and function signatures, which is a terrible idea (because on a 64-bit system with a 64-bit C/C++ compiler and a 64-bit Go compiler, int will be 32-bit in C but 64-bit in Go).
OTOH, having int be variable-sized allows you to easily have maps or other int-indexed structures that naturally grow in capacity with the current architecture and I'm glad they didn't just peg everything to 64-bit as a 'modern' shortcut because I do a lot of Go programming that targets 32-bit ARM CPUs and the performance would be shit if the Go runtime just assumed a 64-bit baseline.
In any case, well written Go code will generally use the specific int type that makes sense for the data at hand, int16, int32, int64, etc, and not simply int. A programmer declaring a lot of un-qualified 'int' variables (for things other than loop indexes and such) in his or her Go code is (in most cases) doing Go wrong.
"It's like somebody from the 70s tried to come up with a better C than K&R."
That's basically what Go was meant to be and I suspect the Go team would consider your description of Go to be more flattering than you perhaps meant it to be.
> having int be variable-sized allows you to easily have maps or other int-indexed structures that naturally grow in capacity with the current architecture ... Go programming that targets 32-bit ARM CPUs and the performance would be shit if the Go runtime just assumed a 64-bit baseline.
That's why you have a -fint-is-32-bit flag or some way to bend the standard to make it run faster on those system when needed. There's no way Go with it's shared memory between threads is going to survive any future shift to 128-bit computing anyway, so just making int always 64-bit is the best choice; if your maps or indexed structures are so vast then you'll need better abstractions than coroutines.
> [better C than K&R is] basically what Go was meant to be and I suspect the Go team would consider your description of Go to be more flattering than you perhaps meant it to be.
Well it certainly was not intended to be flattering, but I suspect you are correct.
Welp, I had no idea about the int64 business. Probing on this, but it does seem like it changes things quiet a bit.
UPDATE: After updating go from 1.0.2 to 1.1, this matches my results - it now takes about 2 minutes, 47 seconds to run through the input (and Code Jam confirms the output is correct). This seems like a significant detail to me, so I'll update the post.
Modifying your Go code to use int64's the program (compiled with Go v.1.0.3) takes about 20 minutes for the L1 problem on my Thinkpad x200 (core duo 2.4GHz).
More importantly, the argument that Ruby is "too slow" seems to hedge on the L1 input for Fair and Square problem being solvable with "fast" languages using same unoptimized algorithm as can be used with the S small input. If you'll look at the Codejam results this is simply false. In fact, a large percentage of Ruby users passed Fair and Square L1 problem, higher than many "fast" languages:
Language % of S input solvers successful on F&S L1 input
I think this means you need to rethink your conclusions.
In fact algorithm contests are usually designed so that the small inputs are solvable by brute force algorithms, but you need to optimize them to solve larger inputs, harder problems. It's not just people using Ruby and other "slow" languages who have to rethink their approach.
I'm still on Go 1.0.2. However, I strongly doubt that having 1.1 yields the correct result in the amount of time that the article boasts; I rewrote the program in C (direct translation), and with both clang and gcc, I get timings of around 2m05s. It's hard to believe that for a CPU-bound task, a Go program would be 200x faster than a C program.
I'd agree it's dubious, but I was mostly commenting on the "correctness" of the code. It'd work on Go 1.1, so it's technically "correct". Sometimes. :)
That is not quite right, before 1.1 int was 32-bit on all platforms and post 1.1 an int is 32-bit on 32-bit platforms and 64-bit on 64-bit platforms.
You are supposed to use int when you don't care about its size (int = native word size). If you are writing a program where it matters you should explicitly use int32 or int64.
It is true that Ruby tends to be slow for programming competitions, especially if the computation is done serverside such as Codeforces.com.
Notice that your solution runs in O(sqrt(N)) time. If N were up to 10^14, with 10^3 test cases, then your solution could have to deal with up to 10^10 iterations... which is dangerous even for a fast language (since your CPU can only do about 10^9 things per second). As a rule of thumb, one should only ever deal with at most 10^8 things. Even just incrementing an integer 10^10 times can take some time (fortunately, not every one of the test cases were 10^14 so your Go solution took under a second).
However, Problem C of Google Code Jam qualification 2013 can be easily solved, even with a slow language, by making some observations:
1) Instead of iterating through numbers and checking if they are palindromic, you can just iterate through the numbers with half the number of digits and generate the palindrome by mirroring the half. A naive solution that uses this would run in O(N^(1/4) log(N)^1.6) instead of your O(N^(1/2) log(N)^1.6) solution. (The log(N)^1.6 is due to the time taken for multiplication, assuming the Karatsuba method).
2) With a small amount of math you can figure out that for any fair and square number N = MM, then for any digit s in M, we must have s(sum of all digits in M) < 10. Thus solutions are of the form:
case 1) Contains up to nine digits 1. (e.g. 101, 111111111)
case 2) Contains up one digit 2 and up to four digits 1. (e.g. 1002001, 2)
case 3) Contains up to two digits 2 and up to one digit 1. (e.g. 200010002, 202)
case 4) Contains one three. (i.e., just 3)
A naive solution using observation 2 may run in O(log(N)^5.6) time.
3) Use combinatorics to figure out count the ways to arrange the digits in the 4 cases shown above, rather than iterating through them. As it turns out, it is unnecessary to do this since even using an O(log(N)^6) solution I passed the 10^100 input. Since there are only about 40,000 fair and square numbers from 1 to 10^100, you can easily precompute them and find them by binary search.
This is a problem which is CPU-bound rather than IO-bound. Ruby, Python and the like won't typically show a great difference in speed when the majority of time is spent in IO operations (like web crawling, or text processing, running web apps, querying databases, etc), because the bottleneck on performance comes mostly from the slow reading of files and network sockets. In this case, though, the operations are mostly dependent on just doing computation, so a high-level interpreted language like Ruby really shows its limitations.
This is one of the (many) reasons why although languages like Ruby and Python are great for rapid development, on-the-fly updates and other reasons, it can pay dividends to be fluent in a high-performance language. It wouldn't even need to be C (although this wouldn't be hard to do in C) -- a language like D would give you excellent performance and nearly as easy development as Ruby.
Hey everyone, thanks for all the input on this. As I mentioned in the post, programming competitions aren't my forte, and I have a lot to learn in that arena. All the feedback on how this should have been done more efficiently is great.
In the future, I obviously need to work on coming up with better algorithms for solving these kinds of problems, but I'll probably continue to try and use Go because it's nice to have that fallback to sheer speed when my algorithm isn't great.
This is definitely a good idea. As the responses here demonstrate, there are many different and independent optimizations available for solving this problem, so it can be useful to fall back to a faster language if you are having a hard time improving your solution.
Just a suggestion, but I would also give some thought to efficient coding style, not just algorithms. For example, you could have speeded up your I/O dramatically.
I benchmarked the I/O improvement I was thinking about. It actually only resulted in an improvement of about 30%. Worthwhile, probably, but not as "dramatic" as I thought it would be. Mea culpa.
I think that 30% improvement you're referring to must be for just the i/o itself. Given that the i/o for the problem amounts to less than 1/1000th of the processing time, a 30% improvement is basically irrelevant. It doesn't help much to speed up something that's not even close to being a bottleneck. I think many of the best algorithm contest coders just use whatever i/o method uses the least code, keeps their answers simple and clean.
It was only an EXAMPLE. It might be irrelevant to this particular issue, but look at the context of my comment: it isn't irrelevant to the subject of learning to code more efficiently.
No problem. When I read your post I was confused whether you were suggesting that he could have gotten a 30% speed up in overall time by making i/o more efficient. Now I understand you weren't.
But I would still suggest that, within the context of programming competitions, a certain kind of "programming efficiency" is close to irrelevant. A sloppily coded solution that nevertheless implements the proper algorithm can be many orders of magnitude faster than an efficiently coded implementation of a slow algorithm. Programming contests are more about having the insight to implement an algorithm having low big O, not so much about efficient coding practices. This isn't to say that efficient coding practices aren't important generally, just that there are much more important things to focus on in programming competitions.
It is, in a sense. The later rounds only give you 2.5 hours to implement and execute your solution to the problems. Thing is, you are competing against folks who can write complex C++ algos in under 10 minutes!
I participated in this contest, and solved this problem. The problem with the post is that it's basically saying "This correct ruby program is way slower than this incorrect go program, therefore ruby is bad for programming competitions". Obviously not a valid argument.
I am risking myself getting insanely downvoted for writing this. But I hope it at least gets some of the attention.
I really hope the Ruby Communities do actually admit Ruby is Slow. Because until you actually admit there is a problem, there will never be any notion of wanting to fix it.
Most of the time, they will come up with Rails Examples or serving dynamic web pages, and the bottleneck being in Database. And the higher amount of request you get, you can solve it with caching.
This is, in my view a very limited scope of Ruby usage.
They deny any wrong doing of Ruby, suggesting it is a dynamic languages, and it's wrong comparing to a compiled languages, or the algorithm are done wrong.
They keep suggesting under xxx usage, Ruby is fast enough to get the job done.
Then there is the argument of throwing more money and hardware to grow and scale. But the truth is most of time Ruby will require more then a few dedicated servers to keep going. While it is not as drastic as the iron.io 28 to 1, but for startup lowering running cost is important too.
Then there is this arguments comes up which i hated most. You are not going to worry about the need of that many servers, or scaling that needs to change to another languages unless you are AirBnb, Twitter etc..... I mean what? I am sorry i may not have worded it correctly but most of the time those read like you are never going to be successfully that you are very unlikely to have to worry about.
Performance is important!. And for most this isn't about pushing it to Javascript V8 or insane 1000x programmer Mike did with LuaJIT. It is just Ruby should have a more respectable performance VM. While Ruby is designed to make programmers happy, I am pretty sure there are many aren't happy with its basic performance.
PHP and PYTHON are slow too. Like them Ruby can take advantage ) of C extensions. You cant write everything in ruby like PHP developpers use native drivers for database communication , compression , image manipulation ,etc ... that's normal.
Scripted languages should be used for what they are for , a thin layer on the top of other more performant languages.
Like Twitter , Ruby/Rails allowed them to bootstrap their idea ,"probe" a market, make it work , then they thought about performance and scaling. That's what scripted languages are for.
Yes, but I think what ksec was driving at is there isn't an intrinsic reason what you said should be true.
Why should Ruby (and scripting languages) be used for RAD and not performance? That's because implementations range from really slow to somewhat slow. Self, LuaJIT, V8 and IonMonkey, and increasingly PyPy are ample evidence that this need not be the case, and that's just listing the dynamically-typed languages.
The bar has been raised the past few years. Increasingly we can pick three of: productivity, libraries, performance. Ruby lacks the last, which makes it a less interesting prospect for future greenfield projects compared to its competition.
The problem was PHP and Python are slow too, and yet they are still faster then Ruby. And not to mention there are many other implementation of the two that gets them further along. So if Ruby was 1x, and PHP / Python are 1.5x, there are things that can get the last two to 3 - 4x. While Ruby is still just there, 1x. ( Numbers are for explanation only ). I really wish more resources were pour into Rubinius.
Even using the same algorithm implementation, there is a huge speed processing difference between languages.
For example the implementation of the Mandelbrot algorithm in RUBY takes 47 MINUTES to complete, while it takes LESS THAN 30 SECONDS when doing the same computation using much faster languages such as JAVA, SCALA, C or FORTRAN. According to this performance test, those languages are more than 100 times faster than Ruby, Python or Perl.
Source: Computer Language Benchmarks http://benchmarksgame.alioth.debian.org/u32q/performance.php...
Other benchmark speed comparisons between programming languages, showing similar results:
http://benchmarksgame.alioth.debian.org/
This whole, "Ruby is slow," diatribe is just ridic.
Everyone here that has done any C/C++ or Ruby development would probably agree that if you're looking for number smashing, Ruby is just not cutting it unless you're into rockin' C extensions and getting what you want... for now.
Ruby, within a few years, will probably have some VM action, whether the JVM with JRuby or Rubinius wins out, who cares.
In any case, in like 5 or 6 years when we all have 50/100 core machines, Ruby's gonna be just fine for almost everything. Maybe by then we will be done arguing and just agree that Ruby is the nicest language to look at and develop in, even if it has subpar performance. Because if you're arguing something different, you've probably never gone from Ruby to C or Java or even Python... just so you all know, it blows.
Tim Sweeney predicted that we'd all have 20 core machines with 80 hardware threads by 2009. Not really what happened.
The whole point is that while Ruby is nice, it's just too slow. Someone is better off investing time learning a faster compiled/JIT language.
Too slow for what? Certainly not all classes of problems. Heavy mathematical work (every number in Ruby is an object! That's going to be much, much slower than a C primitive!), sure, but there are lots of folks using it quite happily to solve a wide range of problems.
Many people seem to be asserting just because the OP's algorithm was sub-optimal, that it means his Ruby code running slow vs Go is a non-issue. The problem is even well written Ruby vs Go has the same kind of performance divide:
Ruby is said to be useful for text processing and such. In Japan they had a need for the language to deal with text files specified in a Japanese codeset. Some of the flexibility to support different codesets cost Ruby a little. For Ruby 1.9 and 2.0, Matz took a while in the transition to "Unicode" support in order to keep some flexibility, even though many people preferred the old way of UTF8 and so on.
Ruby is best when it's used for the things it was meant for, but that hasn't kept people from trying to use it for more stuff. That's how Ruby found a good use in servicing web apps, first with CGI and then with dedicated instances in Rails and the like.
Now with Dart I understand that people demand languages be architected in different ways to extract more performance from them. Dart doesn't have "eval", for example. Whereas Ruby and Javascript do.
The question though is one of a divide, between the people who would never use Ruby or JavaScript and those who do use and love those languages that come with a lot of flexibility from the core. Python is like Ruby and JavaScript, even if the Python syntax could be borrowed for different languages that are more restricted in what they offer.
With eval language designers might have a delayed execution that could make compile-time optimizations less straightforward. Part of the toolset could have to be present during deployment time to be able to handle eval which happens at runtime. And yes, security concerns could also play a hand.
In Dart the code declaration is taken very seriously in many ways, and avoiding eval buys them more compile-time, loading-time and perhaps runtime optimization opportunities. I find that more than the language designers themselves, it's the end-users or developers that ask for more "bondage and discipline" from their tools. People who come from Java and C++ backgrounds, for example. They just can't have enough compile-time stuff aided by an IDE.
There is a huge difference between Ruby and Go, in terms of performance. I wonder if anyone has any experience with Python, considering that both are interpreted programming languages.
I've has decent experiences with Python in Google Code Jam, with two exceptions. (1) Python can't handle large numbers, so the mere existence of huge integer inputs for the large prime number problem blew up my code since Python could not convert numbers that large. (2) Python is fine with math, but at a certain point of data accumulation, performance just plummets, no matter how good your algorithm is. Overall, though, I haven't had the experience the parent comment did. Either Python works beautifully or it literally just doesn't work, rather than take an hour.
I treat it as a challenge, though, as it forces me to be smarter about my implementation.
> Python can't handle large numbers, so the mere existence of huge integer inputs for the large prime number problem blew up my code since Python could not convert numbers that large.
Beg pardon?
>>> 2**2**2**2**2
[snipped because HN does not allow 20k comments)
In my experience the speed decrease isn't worth it during competitions.
I heard some talk about allowing Python at the International Olympiad in Informatics http://www.ioinformatics.org/index.shtml (in addition to Pascal and C(++)) but that's probably far off
Actually, Python is compiled (just like Java, C#, Ruby) to bytecode which is executed in a virtual machine. It is the dynamic nature of certain languages that can make them slower than languages that are statically typed.
Ruby is too slow for brute force solutions for programming competitions. If you look at the third set of cases, the inputs could have a range 0..10^100. Even with the square root, any solution that is going to check 10^50 numbers/strings is going to take a very long time in any language.
He did everything right. Tried it in ruby, if it would have been fast enough this article wouldn't exist. Instead he applied good software engineering and found the right tool for the job.
This doesnt say "don't use ruby it's too slow". It's says prototype in your comfort zone and optimize. Isn't this what we all agree on?
I have noticed a similar thing recently. A programming challenge with a 10s max time restriction, and just reading the input and storing it in an array takes over 7 seconds.
I allocate the array ahead of time, as soon as I know the size.
One line at a time. Is that significantly slower for stdin? I would assume that this wouldn't make a difference, since we're talking about simple memory operations, and 20k method invocations shouldn't be that hard on a modern system.
One thing that Code Jam does differently (and imho better) is that they give you an input set and you have 8 minutes (or so) to upload the solution. This is expected to give more than enough space for language speed variance.
Instead of execution time, coding time is much more precious in such competitions, and
Micro-optimizing has it's place, but it's usually not in a programming competition. that's what using ruby instead of C gives you leverage for that, as you can express more complex ideas and algorithms more easily (which should be the focus point).
This doesn't mean you shouldn't know the cost of operations.
Coding time was actually not precious here. They gave about a day to do the qualification round but only eight minutes to run your program once you downloaded the input data, so the round was optimized for writing highly performant code at your leisure and then downloading the data when you are all set to go.
That's because most contests (but not all) are limited to C, C++ and Java and they've practiced hundreds if not thousands of hours on solving those kinds of problems.
Of course they will be more efficient in c++ than a person who is spending most of their time wondering how costly string operations are.
That doesn't mean that there aren't also people who have done the same and still prefer python (or some other language) not for execution, but development speed (and bignum, larger standard library...)
So you wrote a brute-force algorithm in a slow, interpreted implementation of a dynamic language, and you were surprised that your solution took a long time to run?
Using a faster language implementation could have allowed you to use the same brute-force algorithm and come in under the time limit, but your real problem is your algorithm, not your language implementation.
You can write fast code in a fast language and get very fast running time.
You can also write slow code in a fast language, or fast code in a slow language, and still have acceptably fast running time.
But you wrote slow code in a slow language, the worst of both worlds.
Maybe the result would vary with JRuby or Rubinius (I'd be interested in benchmarks) because not only is Ruby itself not the fastest language, but its main implementation (MRI) is also not the best for speed.
This is a good point that I didn't consider. Running the 'just iterate over the range version' of this code with jruby:
[master] clifff@fair_and_square: ruby -v
jruby 1.7.3 (1.9.3p385) 2013-02-21 dac429b on Java HotSpot(TM) 64-Bit Server VM 1.6.0_43-b01-447-11M4203 [darwin-x86_64]
[master] clifff@fair_and_square: time ruby fair_and_square.rb C-large-1.in.bak
real 6m39.105s
user 6m37.762s
sys 0m19.009s
The article doesn't actually say that performance was one of the judging criteria, and I'm unable to find anything in the code jam rules about this. Can anyone here clarify?
Performance is not technically a criteria but you have eight minutes from the time you download the input data until you have to be done uploading the output data. So if you spend 50 minutes incrementing integers, then you'll get 0 points no matter what :)
Performance isn't a judging criteria, other than you need to be able to provide the proper output from a given input within a certain number of minutes. For this problem, it had to be within eight minutes.
Not for the qualification rounds, but in later rounds, the winner is the person who gets the most number of points first. It won't come to any surprise that the finals tend to be dominated by C++.
Once you request to download the input data there is only a short (8 minutes IIRC) window of time during which you can upload your output data. Therefore your program must run within that time period.
If you precompute the palindromes in the range, store a table of prefix sums, and calculate each of the 10,000 cases in constant time it will probably pass (in ruby).
Edit: as electrograv pointed out, another optimization is to generate palindromes from sqrt(start) to sqrt(end) rather than testing each number in the range for palindrome-ness. This saves you a factor of 10^3.5.
From the edits: "If you are like me and don’t always come up with the best solutions, you may want to consider other languages too."
--> "You may always want to consider other languages too."
No one should ever agree with an ultimatum on language choice: if you need hardcore speed, use a lower-level language, not an interpreted garbage-collected scripting language. Duh.
Also, if you ever think you need built-in support for large numbers in your programming language during one of these contests, then you're using the wrong algorithm. These contests are typically designed to be doable with very, very simple code using primitive data types.
Isn't it obvious that there must be a better solution to the problem then your solution? You shouldn't try to find the palindromes, but rather generate them.
You're right that EventMachine helps when you're dealing with IO-limitations (like those commonly found in web applications), but for CPU-limited functions like the ones that you see in programming competitions, it's not going to provide much of a benefit.
Great, so stop dicking around with some zero-sum game ( http://en.wikipedia.org/wiki/Zero%E2%80%93sum_game - a competition may only have one winner ) and go build a business, where no one cares what language you use as long as the solution works for your customers - and there can be more than one winner in many cases.
Or even if you don't have a side project or startup or something, go build something cool and open source it and do that, rather than working on some artificial problem.
You can click the little down arrow all you want, but it won't keep me from thinking you're mostly wasting your time with these competitions, and it won't put any money in your pocket.
Curious, would you call University education dicking around as well?
Seems to me challenging the brain is never a waste of time. We would not have some of the medical technologies we have today if it were not for massively parallel GPUs, which of course were developed for the sole purpose of playing games.
> would you call University education dicking around as well?
That's a big debate! For a lot of people, I would, though, yes. There are serious economists who posit that universities are mostly about signalling, rather than actually learning much. I don't think I agree with that entirely, and believe it depends a lot on what's studied. It also depends on the opportunity costs, which may not be that high for many people that age.
> GPUs, which of course were developed for the sole purpose of playing games.
I don't believe the creation of games is a zero-sum game at all; there can be many winners, and, as you say, there are other side effects that are beneficial. The creation of games also falls into one of business, side-project or open source project as mentioned in my other post, no?
Artificial coding competitions aren't really about creating much of anything though, from what I can tell. A bunch of people solve the same thing and then throw away what they made.
I agree that challenging your brain isn't a waste, but it seems that in a world that is desperate for coders, there are more productive ways of challenging your brain in a way that's perhaps even more satisfying, because of the positive feeling of having helped to create something useful.
> That's a big debate! For a lot of people, I would, though, yes. There are serious economists who posit that universities are mostly about signalling, rather than actually learning much. I don't think I agree with that entirely, and believe it depends a lot on what's studied. It also depends on the opportunity costs, which may not be that high for many people that age.
Fair enough. For myself, it was what lead me to my career. I found a large number exams followed a similar format to this very coding competition. Many times on the job I have thought back on exam questions for inspiration for the problems I was solving. In that sense, I most definitely do not classify my education as "dicking around".
It brings be to my original point; when video games were being developed, these admitted benefits were certainly not obvious at the time. We simply wanted to render things faster, and at higher resolutions. Now other applications of the same technology is ubiquitous in many regards.
As a final tangential point, I don't think the code competition is wasteful because it does not have any immediate intrinsic value. See exams example above, and really, the discussion here. Also, perhaps this was his so called "downtime". Many people would argue that not all facets of our life should be strictly geared towards productivity. He may already run a business or hold a day job (maybe his blog gives clues otherwise, I don't know).
There are plenty of things that aren't "productive", but strike me as still being more creative/interesting/useful than churning out the same answer as everyone else to a made-up problem. Making your own programming language or OS is not 'productive', but a great learning experience, creative, and lots of fun. And who knows, maybe someone will get some use out of it. I once made a fairly simple programming language, and it did end up seeing some niche adoption and use in industry, which was cool.
I think it adds a lot to the discussion to point out that games are just games and that in the real world, the speed of a language's implementation may not be what actually matters, at all.
Of course, most products don't really need to do calculations outside of a database and this is why Ruby has taken off so well - but it's still important to realize that choosing Ruby over PyPy/Go/Whatever really can be a very expensive choice in the long run when your product suddenly relies on some unique math solutions, and having half your product in Ruby and half in C is awful to have to deal with.
The solution to this is a project like PyPy for Ruby, but it doesn't seem to be coming...