Announcing Topaz: A New Ruby (topazruby.com)
481 points by jnoller 1804 days ago | 182 comments

Completely unscientific, but if these outputs are any indication, this is going to be great news for Ruby users in the future...

  $ time ruby -e "puts 'hello world'"                                                                                                                           
  hello world

  real    0m0.184s
  user    0m0.079s
  sys     0m0.092s

  $ time ~/Downloads/topaz/bin/topaz -e "puts 'hello world'"                                                                                                    
  hello world

  real    0m0.007s
  user    0m0.002s
  sys     0m0.004s

There is a neural net example benchmark in the topaz git repo. Don't know how representative that example is, but at least startup time shouldn't be dominating the results...

  $ ruby -v
  ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin11.3.0]
  $ ruby bench_neural_net.rb
  ruby bench_neural_net.rb  17,74s user 0,02s system 99% cpu 17,771 total

  $ bin/topaz bench_neural_net.rb
  bin/topaz bench_neural_net.rb  3,43s user 0,03s system 99% cpu 3,466 total

You should be able to run any number of the Ruby examples from here: http://benchmarksgame.alioth.debian.org/u32/ruby.php

You should check out a tool I heard about recently called "ministat". An example of how to use it to benchmark multiple runs and then compute statistics across the runs is available here:


Example output:

  |                   +                            x                             |
  |                   +                            x                             |
  |           +       +                            x x            x              |
  | +    ++   +++     +                            x xxx xx       x              |
  |++ ++++++++++++++++++                   x  x    x xxx xxx  xxxxx              |
  |++ ++++++++++++++++++ +++ +       ++   xxxxxxxxxx xxxxxxxxxxxxxxxx  xx x    xx|
  |     |______MA______|                        |________A________|              |
      N           Min           Max        Median           Avg        Stddev
  x  57      45.62364     46.437353      45.93506     45.951554    0.19060973
  +  57     44.785579     45.534727     45.042576     45.056702    0.16634531
  Difference at 95.0% confidence
    -0.894852 +/- 0.0656777
    -1.94738% +/- 0.142928%
    (Student's t, pooled s = 0.178889)
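
If ministat isn't handy, the core of what it reports can be sketched in a few lines of Ruby (illustrative only; real ministat also does the pooled Student's t comparison between the two data sets):

```ruby
# Rough sketch of the per-column statistics ministat prints:
# N, min, max, median, mean, and sample standard deviation.
def stats(samples)
  sorted = samples.sort
  n      = samples.size.to_f
  mean   = samples.reduce(:+) / n
  var    = samples.map { |x| (x - mean)**2 }.reduce(:+) / (n - 1)
  { n: samples.size, min: sorted.first, max: sorted.last,
    median: sorted[samples.size / 2],   # upper middle for even-sized sets
    mean: mean, stddev: Math.sqrt(var) }
end

puts stats([45.62, 46.43, 45.93, 45.95, 45.80]).inspect
```

Feed it one array of wall-clock times per implementation and compare the means and spreads.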

In order to eliminate the potential bias of startup time, I ran a similar test over many iterations. Here's what I got:

    $ time ruby -e "10000.times { puts 'hello world' }" > /dev/null
    real    0m0.102s
    user    0m0.096s
    sys     0m0.005s

    $ time ./topaz -e "10000.times { puts 'hello world' }" > /dev/null
    real    0m0.098s
    user    0m0.071s
    sys     0m0.026s
Any idea why I don't see such a big difference?

Topaz probably has pretty slow IO (for bad reasons; it's an RPython problem)

Because the majority of the cycles are probably spent in 'puts', which is implemented in C, if I had to guess.

The overhead of starting the interpreter must be longer than executing the loop.

Because a puts loop is I/O bound, not CPU bound.
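
To compare interpreters rather than the write path, a CPU-bound kernel works better; e.g. a naive Fibonacci (illustrative, not a serious benchmark):

```ruby
# Naive Fibonacci: nearly all time goes to method dispatch and integer
# arithmetic, so it measures the interpreter rather than stdout.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

t = Time.now
fib(25)
puts "fib(25): #{Time.now - t}s"
```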

what is your 'ruby'?

Good question. How many shells is rbenv/rvm executing? Do you get similar results with an absolute path?

ruby that comes with Debian Squeeze.

    $ ruby --version
    ruby 1.8.7 (2010-08-16 patchlevel 302) [i486-linux]

To scratch a quick interesting thought that came into my head I just had a look at Cardinal, which is a Ruby implementation running on the Parrot VM - https://github.com/parrot/cardinal

So I downloaded & built Cardinal (which went seamlessly, though I did have Parrot already installed), then ran the same benchmarks alongside ruby1.8 here:

  $ time ruby -e "puts 'hello world'"
  hello world
  real	0m0.130s
  user	0m0.049s
  sys	0m0.071s

  $ time parrot-cardinal -e "puts 'hello world'"
  hello world

  real	0m0.057s
  user	0m0.037s
  sys	0m0.019s
Very interesting because I thought Cardinal was supposed to be slow!

I think more diverse benchmarks are required. And, time permitting, I might add Topaz & ruby1.9 into the mix.

Whenever I see frontpages for these kinds of projects like "a faster X" or "X written in Blub", the first thing I want to see on the frontpage is how this new project compares to X in terms of quality and performance. Even specious benchmarks would help more than zero benchmarks.

I wish more frontpages for these kinds of projects would do that.

If they put that on their frontpage, there would be at least 20 posts on here bashing them for it because they didn't get it right (or just accusing them of outright lying/incompetence).

"Even specious benchmarks would help more than zero benchmarks."

I disagree. Zero benchmarks is definitely better than specious benchmarks.

To clarify, I was trying to use "specious" as a synonym of "flawed." I thought this was the common usage, but apparently not.

As to your point, obviously no one should be making any decisions off of flawed benchmarks, but flawed benchmarks (not so far as outright lies, just flawed) at least give me an objective justification to investigate further.

Even some flawed benchmarks could help turn the initial tide of responses like "this is X written in Blub, it's bound to be better!" or "this is a faster X! Now everything will be twice as fast!" They're silly examples, but it seems like every time a new technology comes out, these are the kinds of knee-jerk, overly-optimistic reactions people tend to have.

" To clarify, I was trying to use "specious" as a synonym of "flawed." I thought this was the common usage, but apparently not."

"Specious" does mean, more or less, flawed.

Zero benchmarks are better than flawed benchmarks.

All benchmarks are flawed in one way or another.

spe·cious /ˈspēSHəs/ Adjective Superficially plausible, but actually wrong: "a specious argument". Misleading in appearance, esp. misleadingly attractive: "a specious appearance of novelty".

So you are saying you would prefer wrong information?


1 obsolete : showy 2: having deceptive attraction or allure 3: having a false look of truth or genuineness : sophistic <specious reasoning>

So instead of taking the common meaning of specious, a deceptively attractive benchmark, we have to go to a less common use of specious in order to construct a specious argument about the proper use of specious? Never mind that it is distracting from the main point of the discussion about benchmarks in the context of Topaz.

For what it's worth, amalog's definition of "specious" is the one I'm familiar with. My girlfriend recently quizzed me on her GRE words with that one, and my definition was the given one, too.

So I'd argue that amalog's definition IS the common one.

It depends where you look for the common meaning. To native English speakers in the UK, Australia, New Zealand (ie outside North America, coincidentally including the country where the English language developed), specious has one very clear meaning.

I would also direct you to the usage examples in your link, all of which use specious in a context that implies deception or outright falsity. I have never ever seen specious used in that obsolete "showy" sense.

I read that as "showy (obsolete)". Either way, amalag's says "wrong", the other says "deceptive" and "false", I'm not sure what anyone's arguing about. Specious data has no value.

I disagree with your argument! If inventing the language meant you got to choose all the definitions we wouldn't have English in the first place. Shift happens. I don't think this particular usage is common, though.

I was using "specious" as a synonym of "flawed." Perhaps that was the incorrect usage.

But in the case of advertising a new library/project, which is arguably one of the main functions of the frontpage, flawed benchmarks (though not so far as outright lies) at least give me an objective reason to investigate further.

With no benchmarks, generally I'll open the page, mutter "that's nice" and move on with my business. I'd imagine I'm far from the only person who does that. Young projects don't help themselves when they don't effectively advertise themselves.

Expanding on the concept of entirely unscientific benchmarks, I benched some simple prime number math with Topaz, Ruby, JRuby and RBX.

Looks promising for Topaz! https://gist.github.com/havenwood/4724778


The idea was to implement enough of the "hard stuff", so it should not change. But hello world is not a good benchmark (this one also might actually change due to library loading, but please don't benchmark it like that).

Makes sense. I wasn't claiming any "absolute" benchmark, of course, just pointing out that real benchmarks should wait until the implementation is more complete.

In other words: if you can run it today, you can believe the numbers. If you cannot run it, then, well, you cannot.

> the lack of many core features probably helps out a lot atm.

I don't understand why the lack of features would affect the time of "hello world."

Loading the standard library takes time. With a smaller (underimplemented) standard library, you can get to the user's code much more quickly.
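
That cost is easy to observe per library; for example (illustrative, any stdlib piece will do):

```ruby
require 'benchmark'

# Time how long one stdlib library takes to load. Summing these across
# everything pulled in at boot approximates the stdlib's share of startup.
cost = Benchmark.realtime { require 'yaml' }
puts format('require "yaml" took %.4fs', cost)
```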

Perl 5 had an in-memory footprint of about 768k. VisualWorks Smalltalk had a standard one of 12MB, yet it loaded and started faster than Perl did with 768k. If just your standard library takes long at all to load, something needs attention.

Yeah, a lot of language runtimes seem to not have discovered mmap.


  $ time echo "hello world"
  hello world

  real    0m0.000s
  user    0m0.000s
  sys     0m0.000s

  $ type /bin/echo
  /bin/echo is /bin/echo
  $ time /bin/echo "hello world"
  hello world

  real	0m0.009s
  user	0m0.002s
  sys	0m0.004s

  $ type echo
  echo is a shell builtin
  $ time echo "hello world"
  hello world

  real	0m0.000s
  user	0m0.000s
  sys	0m0.000s

Try comparing with Rubinius

Impressive, how does it compare to JRuby?

That's probably not a fair test to use on JRuby, as the JVM is notoriously slow to start.

That doesn't make it unfair to JRuby, it just means JRuby will probably lose that benchmark. :)

If the goal is to benchmark Ruby execution time, it's unfair. The slow startup time is a valid concern when considering short-lived sessions, but it's kind of meaningless when trying to benchmark a Ruby implementation.

There are no 'fair' benchmarks; all benchmarks should be biased toward the problem you're actually solving, running your actual workload. If you can't replay your workload at multiples of real volume, then you should probably work on doing that before benchmarking, as it helps you with the real problem of verifying your infrastructure.

In general, a benchmark is probably the worst metric you could ever use for deciding on an implementation. Unless the profit margin of your business is razor thin and dependent on eking out every last drop of performance; and even then, most of those gains will come from extremely small sections of code that are probably best written in assembler by a programming God, and you should investigate FPGAs, ASICs, and other high-performance solutions.

If your benchmark (infrastructure) involves a database (or anything that uses disks) that's probably going to be the problem long before the speed of your language / language implementation.

I don't even know where to begin with this.

In all comparisons, you should remove confounding variables. Yes, you should benchmark something you actually care about, otherwise what's the point? That doesn't mean all other variables are immediately null and void. That's why I said if your goal is to measure Ruby execution time, you should remove startup time.

As for the practice of benchmarking in general, you're partially right. Micro-benchmarks are usually useless because they don't map to real work load. But profiling and speeding up small portions that are used heavily can have drastic improvements that in isolation seem small -- the death by a thousand cuts problem. Not all improvements come from isolated instances with very slow performance profiles.

This fallacy about DB access and not needing to optimize really needs to go away though. Even if 50% of your app is spent hitting the DB, you have the opportunity to speed up the other 50%, and it's likely far easier. Ruby in particular is ripe for improvements on the CPU side. I managed to reduce my entire test suite time by 30% by speeding up psych. I managed to cut the number of servers I need in EC2 in half by switching from MRI, Passenger, and resque to JRuby, TorqueBox, and Sidekiq. And I've managed to speed up my page rendering time anywhere from 8 - 40x by switching from haml to slim. None of these changes required modifications to my DB, none required me to write assembly, none required me to switch to custom-built hardware, and each helped reduce the expenses for my bootstrapped startup, while improving the overall experience for my customers.

How much does one EC2 server net you in revenue?

What is the percentage increase in profitability yielded from these optimizations?

What was the percentage increase in profitability yielded from the last A/B test of your homepage CTA copy?

An EC2 server doesn't net me anything in revenue. It costs us money and falls into the category of expenses. Reducing expenses makes us more profitable. Reducing our expenses by 50% made us roughly 50% more profitable.

Better than that, this savings isn't one-time. It's recurring, as EC2 is recurring. But we've also reduced the expense growth curve (the savings wasn't linear), so we can continue to add customers for cheaper.

The A/B testing thing is a complete non sequitur. a) there's no reason you can't do both. b) most A/B testing yields modest improvements.

I find it interesting that reducing your EC2 usage by 50% decreased your expenses by 50%, it means one of two things, you aren't paying your employees and don't have any overhead, or the cost of EC2 dwarfs the cost of your employees and overhead.

If it's the latter, I'd seriously consider colo as you can probably reduce costs by another 80%.

Obviously the discussion was scoped around non-personnel expenses. I'm not going to dump an entire P&L here. And this is now wildly off-tangent.

I was illustrating that there is real world gain to be had by doing something as simple as switching to a new Ruby or spending some time with a profiler. These weren't drastic code rewrites. They didn't require layers of caching or sharding of my database. I fail to see what's even contentious about this.

As long as it can reasonably be expected to be mostly bug free and support everything you need with few changes to the app. It shouldn't require too much time playing around with it before the EC2 savings are eaten up by the wage costs of spending time on it. (Depending on how many servers you are running, of course.)

This isn't a theoretical argument. I actually did this and it didn't take all that long and some of it was even fun. An added benefit is my specs run faster, too. So developer time is saved on every spec run now. You also hit that intangible of improved developer happiness.

Additionally, the X time exceeds Y cost argument really only works when people are optimally efficient. Clearly those of us posting HN comments have holes in our schedules that might be able to be filled with something else.

Amdahl's law disagrees -- the value to improving the non-DB part is limited.

I'm not trying to be snarky here, but that would advocate for no improvements anywhere else. Why did Ruby 1.9 bother with a new VM? Why try to improve GC? Why bother with invokedynamic? Why speed up JSON parsing? Why bother with speeding up YAML? Yet there's obviously value in improving all these areas and they speed up almost every Ruby app.

It's overly simplistic to say the only option is to cache everything. Or that your DB is going to be your ultimate bottleneck, so none of the other N - 1 items are worth investigating.

And even in the link you supplied, the illustrative example is getting a 20 hour process down to 1 hour without speeding up the single task that takes 1 hour. It suggests there's an upper limit, not that because there is an upper limit you can't possibly do better than the status quo.
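
For concreteness, Amdahl's law gives overall speedup as 1/((1-p) + p/s) when a fraction p of runtime is sped up by a factor s; a quick sketch:

```ruby
# Amdahl's law: overall speedup when a fraction p of runtime is made
# s times faster. Even as s -> infinity, speedup is capped at 1/(1-p).
def amdahl(p, s)
  1.0 / ((1.0 - p) + p / s)
end

# If the DB is 50% of runtime and you make the other 50% 8x faster:
puts amdahl(0.5, 8.0)   # ~1.78x overall: a real win despite the DB cap
```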

" that would advocate for no improvements anywhere else"

Amdahl's law advocates starting from the part that takes the most time. In a database application, it can be interpreted as either A) improving the connector or B) reducing the application's demand for database resources.

"Why did Ruby 1.9 bother with a new VM? Why try to improve GC? Why bother with invokedynamic? Why speed up JSON parsing? Why bother with speeding up YAML? Yet there's obviously value in improving all these areas and they speed up almost every Ruby app."

JSON parsing improves those applications that use JSON parsing, and in many applications JSON parsing is the main operation. There are many other applications for which garbage collection is the limiting factor. You are taking my comment, which was addressing the parent comment's remark that "This fallacy about DB access and not needing to optimize really needs to go away though.", way out of context. It's not a fallacy -- you need to know what is dominating execution time and how to improve that aspect.

Take it to the logical extreme -- you could just write in x86 assembly directly. The program would be faster than ruby, but the development time would not make assembly a worthwhile target.

"And even in the link you supplied"

What link did I supply? I recommend the Hennessy and Patterson "Computer Architecture" book :)

Sorry about the link comment. I'm so used to Wikipedia links being passed around I must have instinctively looked there.

In any event, we probably agree on more than we disagree. I never disagreed with working on the DB if that's truly the bulk of your cost. But, you do actually need to measure that. It seems quite common nowadays to say "if you use a DB, that's where your cost is". And I routinely see this as an argument to justify practices that are almost certainly going to cause performance issues.

Put another way, I routinely see the argument put forth that the DB access is going to be the slowest part, so there's little need to reduce the other hotspots because you're just going to hit that wall anyway. And then the next logical argument is all you need is caching. The number of Rubyists I've encountered that know how to profile an app or have ever done so is alarmingly small. Which is fine, but you can't really argue about performance otherwise.

The issue is that, to some, "ruby execution time" may include the startup time.

But if it's not executing ruby, then it can't be ruby execution time... That's the point I'm making. By all means, if you want to measure start-up time, that's valid as well, just not for "how fast does this execute Ruby."

You may have a finer-grain definition of "execute Ruby" than someone else.

It's not meaningless if you're trying to make a command line app and not a Rails app. Startup time does matter for command line apps (a lot!).

Agreed. But then you're not benchmarking Ruby execution speed, which really makes comparing the two not very worthwhile. By the same notion, then when you compare on something like JRuby, you really should be comparing JDK 6, 7, and 8 builds, along with various startup flags, both JVM and JRuby. E.g., short runs may benefit greatly from turning off JRuby JIT, JAR verification, and using tiered compilation modes. May as well add Nailgun and drip in there as well, since both can speed up startup time on subsequent runs. Which one of these is going to be the JRuby you use in your comparisons? Or are you really going to show 20 different configurations?

You can run into the same problem with MRI and its GC settings. If too low for your test, you're going to hit GC hard. It's best to normalize that out so you have an even comparison. Confounding variables and all that.
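
For MRI that means pinning the GC environment variables so both runs see the same heap, e.g. (1.9.x-era variable names; the values are illustrative, not recommendations):

```shell
# Give MRI a larger initial heap and malloc limit so the benchmark
# isn't dominated by heap growth and GC churn (tune for your workload).
export RUBY_HEAP_MIN_SLOTS=1000000
export RUBY_GC_MALLOC_LIMIT=60000000
export RUBY_FREE_MIN=200000
```

Then run the benchmark under those settings on every implementation that honors them.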

There's a lot you can do outside the Ruby container to influence startup time. When comparing two implementations, the defaults are certainly something to consider, but not when trying to see which actually executes Ruby faster. They are two different metrics of performance and should be compared in isolation.

Well startup times of the JVM aren't terribly relevant if, for example, your app doesn't need to start the JVM every time it's used.

In production server apps the JVM is always running anyway, and under some degree of HotSpot optimization, so for a JRuby benchmark to be informative and worth anything to you, you'll want to account for that.


Topaz doesn't run rails yet (as far as I know, I didn't even dare to try!), so I doubt you'll find any benchmarks ;) There is one benchmark in the bench/ directory of the repository you can try though!

Instead of asking, you could read the linked article, which itself says it isn't complete enough to run Rails yet.

How the hell is your Ruby that slow to start?

    $ time ruby -e "puts 'hello world'"
    hello world
    real    0m0.011s
    user    0m0.008s
    sys     0m0.003s

The first time I ran:

   time ruby -e "puts 'hello world'"
   hello world

   real	0m0.221s
   user	0m0.005s
   sys	0m0.006s
subsequent times:

   time ruby -e "puts 'hello world'"
   hello world

   real	0m0.008s
   user	0m0.005s
   sys	0m0.003s
So, I guess he ran ruby first, followed by topaz, and ended up with those results.

    $ time ruby -e "puts 'hello world'"
    The program 'ruby' can be found in the following packages:
     * ruby1.8
     * ruby1.9.1
    Ask your administrator to install one of them

    real	0m0.060s
    user	0m0.040s
    sys	0m0.016s

Don't know if you are trolling. But if this is genuine: this is rbenv telling you that you have multiple rubies installed.

1. Pick one (just for this session): $ rbenv shell ruby1.9.1

2. And then run the example.

By the way 1.9.1 is really old already, 1.9.3 has a lot more bug fixes.

ruby1.9.1 on Debian isn't ruby 1.9.1.

The Ruby language changed between 1.9 and 1.9.1, so a new package name had to be created.

If 1.9.1 was just called "1.9" it would break all of the packages in Debian that depend on whatever language features were different between 1.9 and 1.9.1.

"ruby1.9.1" in Debian 7.0 provides version

The OP's output looked more like rbenv's output though, right? How did you figure that to be debian's message?

I'm all for new Ruby implementations, but be very careful about judging performance of an incomplete implementation.

It's "trivial" to make a fast language that looks quite a bit like Ruby. It is a lot harder to make a language that remains fast in the face of handling all the quirks of the full Ruby semantics, though, such as selectively handling the risk of someone going bananas with monkey-patching core classes, which could happen at any "eval()" point.

What you just wrote is exactly what I've been telling people about every new "Python" implementation for the last 3 years. Believe me, I understand this argument completely, that's why I made sure we had all the hard bits (monkey patching core classes, eval, etc.) before I released this.


no call/cc yet, I do have a branch with fibers, but I'm waiting for an RPython branch to land for that first.

1) Note that the post doesn't sell its speed; all it says is that they are interested in a high-performance Ruby

2) Python is a very similar language to Ruby, and Pypy already runs python very rapidly. This doesn't guarantee that the Ruby interpreter will be anywhere near as fast, of course, but it does give evidence that it's possible.

3) As kingkilr notes below, they've taken into account your argument and they believe they've implemented enough of the language to be confident that they can run it rapidly. No reason you need to believe him, but it's worth listening to.

The third paragraph reads "Out of the box Topaz is extremely fast." This is most definitely selling its speed.

Fair enough, I missed that. Thanks.

For those with long memories, there was an (in)famous Topaz project in the Perl world: an attempt to implement perl5 in C++. It was abandoned; however, it did help kick-start Perl6 & Parrot.


- http://topaz.sourceforge.net

- Topaz: Perl for the 22nd Century http://www.perl.com/pub/1999/09/topaz.html

- Historical Implementations/Perl6 http://www.perlfoundation.org/perl6/index.cgi?historical_imp...

I will be the first to ask: why did Alex / the pypy devs start this project? I thought the three community-funded ideas were great (STM, Python 3, and Numpy). What prompted this new project? Is it meant to attract new developers to pypy as a platform for implementing languages? Is it a side project? Is it a long-term project meant to become the premier implementation of Ruby, as pypy is becoming for Python? Etc.

Not criticizing at all by the way, just curious about the motivation / background / context for the project, which is missing from the docs.

Hi, to be clear: this is a project undertaken by me (Alex) independently; it wasn't funded out of the PyPy funds or anything like that. There were 3 primary motivations:

a) Because it's fun

b) To prove RPython is a great platform

c) To mess with people's heads, it's crazy!

Hey Alex, let me tell you I completely understand points a) b) c) and more. It seems we started our competing Ruby on PyPy implementations at about the same time [0]! My goal was to build a robust and documented Ruby parser, and hope PyPy would make it magically fast. To that end I embedded PLY and modified it to not require kwargs. Then a thunderstorm of work fell from the skies, and I was just recently unearthing PyPy 1.9 to put the thing back on track.

Congratulations for the release :-)


    $ ls -l playground/pypy
    total 29160
    drwxr-xr-x  12 lloeki  staff       408 18 Feb  2011 ply-3.4
    drwxr-xr-x  15 lloeki  staff       510  3 Apr  2012 pyby
    drwxr-xr-x  23 lloeki  staff       782 27 Mar  2012 pypy-pypy-2346207d9946
    drwxr-xr-x  18 lloeki  staff       612 27 Mar  2012 pypy-tutorial
    -rw-r--r--   1 lloeki  staff  14927806 27 Mar  2012 release-1.8.tar.bz2
    drwxr-xr-x  10 lloeki  staff       340  3 Apr  2012 unholy

Holy snakes[0]. It seems you pursued the exact same approach[1] as me :-)

[0]: https://github.com/topazproject/topaz/blob/master/topaz/lexe...

[1]: https://github.com/alex/rply

It would be nice to see RPython/PyPy turned into a project not unlike WebKit, where many different projects benefit from the same open-source core. If PyPy is flexible enough to implement all of Ruby and all of Python, has a reasonable amount of maturity, and is already rather fast, it seems like a good candidate for such a thing.

Quick how-to:

1. (optional) install PyPy "JIT compiler" binaries [0]

2. get and extract the PyPy source[1] from bitbucket release packages, which includes the RPython translator, written itself in Python and, although working on CPython, best run under PyPy as installed on step 1.

3. write a (R)Python module including a def target returning the entry point function[2], and call the translator upon your module, like so:

    python ./pypy/pypy/translator/goal/translate.py example2.py
I wish getting the translator were more 'packaged' and did not involve getting the whole PyPy source, but I hear this is in the works, and it is reasonably easy already. The hardest part is actually figuring out that the translator is not part of the PyPy binary release, and that it's available straight from bitbucket.

[0]: http://pypy.org/download.html

[1]: https://bitbucket.org/pypy/pypy/get/release-1.9.tar.bz2

[2]: http://morepypy.blogspot.fr/2011/04/tutorial-writing-interpr...
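
For step 3, the target module can be as small as this (a sketch along the lines of the tutorial in [2]; the module name, message, and return shape are illustrative):

```python
import os

# example2.py -- minimal RPython translation target (sketch).
# translate.py imports this module, calls target(), and translates the
# returned entry_point function into a standalone binary.

def entry_point(argv):
    # os.write instead of print: RPython-friendly and unambiguous
    os.write(1, b"hello from translated RPython\n")
    return 0

def target(driver, args):
    # translate.py expects (entry_point, argument_types); None = default
    return entry_point, None
```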

yeah, working on it, sorry :/

http://tratt.net/laurie/tech_articles/articles/fast_enough_v... is probably the best example, because — as far as I know — outside of having reimplemented his Converge in RPython Laurence Tratt is not affiliated with the pypy project in any way, shape or form.

Amazing link, very clear and well-written description of how PyPy works and great background for understanding this announcement, thanks!

Forgive me for asking what may be a very stupid question: Can someone explain to me the need for a Ruby interpreter built on top of Python? He mentions performance in the announcement, but is this really going to be faster than, say, MRI? Thanks in advance.

Think of PyPy as somewhat like LLVM, except the language it's built on is a subset of Python, called RPython.

Although the most well-known language implemented in the pypy "vm" is python, the toolchain is completely language agnostic.

So, this is more akin to Apple writing a C interpreter on top of llvm than it is to building a ruby interpreter on top of python.

PyPy is totally not like LLVM in many regards. It targets a very different demographic. Some differences:

* RPython comes with a good garbage collector

* the language where you specify what's going on is RPython, in which you write an interpreter. Then you get a JIT using a few hints. This differs from the "interpreter in C + compiler to LLVM" scenario by quite a bit.

* RPython comes with a set of higher-level data structures (lists, dicts, etc.) and is a GCed language. The JIT is also well aware of those.

Completely true, I was just trying to convey how different "building ruby on RPython/Pypy" is from "building ruby on top of python".

Do you know what an analogy is?

I'm actually not a native speaker, so indeed, I might not know. I checked in the dictionary though. This is an analogy, but not a very good one, because of the reasons I pointed out. Are you disagreeing with any particular reasoning there? Or should I just say "they're similar, but"?

An analogy does not require that the items involved actually be similar, only that the relationships in each are the same. For example, "Gasoline is to cars, as sunlight is to trees" is a perfectly good analogy, even though gasoline is nothing like sunlight, and cars are nothing like trees.

In much the same way, comparing "Apple writing a C interpreter on top of llvm" to "Alex Gaynor writing a Ruby interpreter on top of RPython" makes a very good analogy, even though Apple is not much like Alex Gaynor, Ruby is not much like C, and PyPy is not much like LLVM.

A good explanation, and further, many times a good analogy relies on the things being very different in every way except for the one way it intends to emphasize. An analogy is often a way of saying: "Let's focus on this (often very abstract) aspect of interesting similarities. Yes, you can easily point out many other dissimilarities. That ease means I'm not talking about those aspects, so if we understand each other those don't even have to be mentioned."

"They're similar, but" is much better than "is totally not like." The point of the analogy is to give a high-level view, not to achieve 100% accuracy at the lowest levels of detail.

Well, so you're right and you're wrong.

You're right that the analogy doesn't quite capture everything here. On the other hand the analogy gets across most of the point.

I would have said "That's mostly right, but the interesting thing about sharing code on a VM is access to VM features and code shared on the platform, not just that it's possible to decouple a front end from a VM."

or something to that effect.

There's no need to be a jerk.

Yeah, although there may be other ways to phrase your admonishment which people will be more receptive to.

"C'mon dude, don't let flaming beget flaming."

I do, but I also appreciate further distinction between pypy and llvm in this context.

Not really. The LLVM analogy does capture the toolchain aspect; the RPython toolchain generates a JITted VM from an interpreter description.

Thanks for that simplified explanation. As someone not familiar with Python and Pypy I was having a tough time getting past the "Ruby on Python is efficient?" block; the analogy to LLVM is helpful indeed.

RPython is a subset of Python that is used to build PyPy. If you want more information on it, I would do some reading on the structure of PyPy.

To be clear, this isn't "Ruby running on top of Python." This is "Ruby specified in RPython."

So, let's talk about JRuby. JRuby is a Ruby implementation on the JVM, and its really awesome power is that you can wrap and interface with Java libraries while still writing all of your app code in Ruby.

If Topaz was able to give me the ability to access NLTK, but still write in Ruby? I'd be overjoyed.

> If Topaz was able to give me the ability to access NLTK

I doubt that's going to happen. Topaz is not a Ruby running on the CPython interpreter, the Topaz VM is developed in RPython. As far as I know, RPython doesn't provide that capability either (between VMs coded in RPython), which fijal seems to confirm.

So that would be a very cool project to merge python <-> ruby (or any other interpreter written in RPython). It requires quite a bit of work to pull it off the ground though. I would be happy to provide guidance on that, but I definitely won't do it myself (just yet).

It's also worth noting that this is (mostly) possible with RubyPython running in Ruby against a CPython2 dynamic library.

Zach Raines did 99% of the heavy lifting (I played a very small part in the whole project, and have no time to work on it right now), and it works pretty nicely overall.

Yes, because it's actually implemented on top of PyPy, which is a very modern run-time with a state-of-the-art JIT and good garbage collectors.

It's not vanilla Python but RPython, and presumably to take advantage of the PyPy toolchain to get similar results in Ruby as PyPy has in Python.

I'm curious to see how this compares to MRI/Rubinius/JRuby once it's feature-complete; the PyPy vs. CPython benchmarks these days are pretty convincing.

One of the major launch criteria was having "enough of Ruby that the performance won't change". What does that mean in practice? We have exceptions, we have bindings, and we have all manner of other obscure corner cases. Charles Nutter (JRuby dev) writes about this: http://blog.headius.com/2012/10/so-you-want-to-optimize-ruby...

Does Topaz benefit at all from the work that's been done already on Rubinius? I seem to recall that Rubinius implements as much of Ruby in Ruby as possible. It seems like this would help Topaz as well in implementing the standard library. Or are the implementations different enough that this doesn't work?

One of the things to come out of the Rubinius project is RubySpec, which is pretty much a language spec in unit-test form, and which will be very useful for testing Topaz.

AFAIK much of the Rubinius library is independent of VM features (since it was only in C for historical speed reasons), so there should be opportunity for reuse.

I'm probably in the minority, but I would love to see a dialect of Ruby that has significant whitespace. Death to unnecessary 'end's.

> Some observers objected to Go's C-like block structure with braces, preferring the use of spaces for indentation, such as used in Python or Haskell [braces are optional in haskell]. However, we have had extensive experience tracking down build and test failures caused by cross-language builds where a Python snippet embedded in another language, for instance through a SWIG invocation, is subtly and invisibly broken by a change in the indentation of the surrounding code. Our position is therefore that, although spaces for indentation is nice for small programs, it doesn't scale well, and the bigger and more heterogeneous the code base, the more trouble it can cause. It is better to forgo convenience for safety and dependability, so Go has brace-bounded blocks.
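The failure mode the Go team describes can be sketched in a few lines of Python (a hypothetical illustration; the function names and data are invented): shift one statement's indentation and the program still parses and runs, but computes something different.

```python
def sum_small(xs, limit=10):
    # Intended: add only values under the limit, and count how many matched.
    total, n = 0, 0
    for x in xs:
        if x < limit:
            total += x
            n += 1
    return total, n

def sum_small_shifted(xs, limit=10):
    # Same statements, but `n += 1` has lost one level of indentation,
    # as can happen when a snippet is embedded in surrounding code with
    # a different indent. It still parses and runs, silently counting
    # every element instead of only the small ones.
    total, n = 0, 0
    for x in xs:
        if x < limit:
            total += x
        n += 1
    return total, n

print(sum_small([1, 20, 2]))          # (3, 2): only 1 and 2 counted
print(sum_small_shifted([1, 20, 2]))  # (3, 3): every element counted
```

With explicit block delimiters, the same slip is either harmless or a syntax error, not a silent behavior change.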



Especially when writing a code generator. Targeting a significant whitespace language is potentially more tricky than one with explicit delimiters. I love the Haskell approach to this.

(For those who don't know, Haskell has an optional whitespace-significant syntax (which most humans use) as well as a brace-delimited syntax, and these are exactly equivalent.)

Significant whitespace in Python was a significant factor in my decision to get proficient in Ruby over Python.

Yes, I hate it that much, and I know quite a few Ruby devs with similar views...

Here's how I look at it: I indent my code properly and my team does the same. If we're already doing the right thing where whitespace is concerned, why not reap some benefits from it?

Of course, I also use Coffeescript and Haml...

> Of course, I also use Coffeescript and Haml...

I'm curious: why not use Python then as well?

He's saying that he and his team indent their code correctly anyway, so it might as well be significant.

I'm asking if someone wants a language like Ruby but with significant white-space why not use Python?

I'm pretty sure his comment is indicating he _does_ use Python and that is his take on the "significant whitespace" issue.

Funny, I posted the following to HN this morning: Acme::Pythonic - Python whitespace conventions for Perl - http://news.ycombinator.com/item?id=5175619

Sweet, I'm very interested in this. I assume this implementation has a GIL? Any word on how threads will be implemented?

No GIL. Also no threads yet :) We're going to wait to see how Armin's work on STM support pans out.

Implementing other languages on top of the PyPy toolchain is an interesting concept, and I'd like to see it happen more often. But calling this "Ruby in Python" is sure to invite flamewars from all sides. The actual Topaz site seems to be careful not to do this, but if this thread is any indication, a fair number of people are already mistakenly calling it that.

There are already a few other language implementations on PyPy.

For example:

- Javascript | https://bitbucket.org/pypy/lang-js

- Smalltalk | https://bitbucket.org/pypy/lang-smalltalk

- Scheme | https://bitbucket.org/pypy/lang-scheme

- Io | https://bitbucket.org/pypy/lang-io

Topaz does not, and that's not its point, but technically, since RPython is still Python, Topaz is indeed a Ruby interpreter in Python, although a very slow one. Hence the final two lines of the readme:

> To run Topaz directly on top of Python you can do:

> $ python -m topaz /path/to/file.rb


At least once a year there's a maelstrom of posts about a new Ruby implementation with stellar numbers. These numbers are usually based on very early experimental code, and they are rarely accompanied by information on compatibility. And of course we love to see crazy performance numbers, so many of us eat this stuff up.

Given that the original article thanks the author of that post (Charles Nutter) by name, I think it's likely that Alex is aware of it.

If it doesn't have all of Ruby's weird features, claims of speed don't matter, because it's those weird features that make Ruby slow and hard to optimize.

Could this lead to a runtime in which I can use really cool ruby modules from python and really cool python modules from ruby? I mean, seriously, these two languages both have some pretty awesome stuff, and regularly I say about one, "man I sure wish I could use $module from the other".

Any reason for not targeting Ruby 2.0?

Excellent news. I wish this project the best of luck. More competition in the Ruby implementation field is always a good thing for end users, it fosters innovation and provides choice and a good kick in the ass to everyone.

I've been wondering why the tools the PyPy people have been building are not used to implement more languages. It's nice to see this happening. And it will be interesting to watch where this leads.

Doesn't mention what the current status of Topaz is with respect to http://rubyspec.org/

I mean, it's cool that people want to expose Python's version of Rubinius as a building block for yet another Ruby interpreter, but, come on. Why is this a thing?! Why not spend time improving the JVM, contributing to Rubinius, patching MRI, or a million other open source projects already tackling Ruby being "slow"? By the way, has anyone else realized that, while Ruby is slow, computers are getting faster for us too!

Sounds really interesting, but we already have a dozen incomplete alternative Ruby implementations...

Just curious, for this project did you set out to write it in idiomatic ruby or idiomatic (R)Python?

So Ruby on top of RPython on top of PyPy? Why not, but how does it compare to JRuby and Rubinius?

> So Ruby on top of RPython on top of PyPy?

No. It's Ruby in RPython, where PyPy is Python in RPython. The pypy project is currently in the process of splitting "RPython" (the VM-development framework) from PyPy (the Python VM) to make that clearer.

RPython is a general-purpose, language-agnostic (ish?) core for implementing JITted, garbage-collected languages; it's similar in spirit to LLVM being a framework for implementing compilers.

(Now, because RPython is a proper subset of Python, you can run your RPython VM on a Python interpreter without translation, but that doesn't give you any specific Python bridge, in the same way that coding your VM in C doesn't give you a C FFI for free.)

RPython is generally language-agnostic, although dynamic languages are much better suited to this sort of transformation. So it makes sense for any kind of dynamic language (even Prolog), but makes less sense for, say, Java or C.

But RPython used to be the name of the reduced version of the Python language. What name do you give to the JIT-compiler generator?

The same, as far as I've seen. RPython is both the VM-framework and the Python subset which exists within (and for) that framework (RPython is not considered and developed as a general purpose language, but as a language for building VMs within the RPython system/framework/toolset).

The important distinction really is between RPython (a language and framework for building virtual machines) and PyPy (a virtual machine implemented in/on RPython)

Not really. RPython is a statically-typed language which is used to implement PyPy's virtual machine framework. PyPy is itself a Python VM built on top of that framework, but the framework is designed to be reusable. So Topaz is a Ruby VM using that same virtual machine framework.
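To make the "interpreter description" idea concrete, here is a toy interpreter in the RPython style (a sketch, not Topaz code: the opcodes and the fallback `JitDriver` stub are invented for illustration, while `rpython.rlib.jit.JitDriver` is the toolchain's real hint class). Translated, the toolchain generates a JITted VM from a description like this; run untranslated on plain Python it still works, just slowly, which is exactly the `python -m topaz` situation mentioned elsewhere in the thread.

```python
try:
    from rpython.rlib.jit import JitDriver  # real class when translating with RPython
except ImportError:
    class JitDriver(object):  # no-op stand-in so this file runs on plain Python
        def __init__(self, greens=None, reds=None):
            pass
        def jit_merge_point(self, **kwargs):
            pass

# JIT hint: `pc` and `program` identify a position in the user's
# program (green variables), `acc` is run-time state (red).
driver = JitDriver(greens=["pc", "program"], reds=["acc"])

def interpret(program):
    """Toy bytecode: 'i' increments the accumulator, 'd' decrements it."""
    pc = 0
    acc = 0
    while pc < len(program):
        driver.jit_merge_point(pc=pc, program=program, acc=acc)
        op = program[pc]
        if op == "i":
            acc += 1
        elif op == "d":
            acc -= 1
        pc += 1
    return acc

print(interpret("iiid"))  # 2
```

The translator sees the hints, traces hot loops in the interpreted program, and emits machine code for them; the interpreter author never writes a JIT by hand.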

So... GIL?

RPython is the new C?

Absolutely not. RPython is a language for writing dynamic-language virtual machines. That's a much narrower scope than C aims at (and achieves).

In addition to fijal's comment: I also heard somewhere that RPython is a) not strictly specified and b) may change when needed without notice. So I would stay clear of it. :)

We removed that from the docs; it's kinda stable by now (but it's still an atrocious language to use, it just beats C/C++ for that particular task by a lot).

Out of curiosity, whats so atrocious about it? The slow translation step or language idiosyncrasies?

Slow compilation, undocumented semantics, and most importantly: atrocious error messages.


It's Ruby in RPython, the implementation language of the PyPy VM toolchain. It has nothing to do with integrating with Python. It's an alternative Ruby VM using a proven JIT generator technology that happens to be built with a Python-like language. Hope that helps.

> Ruby running on Python

That isn't what it is. Either you don't have the technical chops to understand, or you didn't read, or both.

Topaz as a name? Seriously? (http://topaz.sourceforge.net/)

I find it odd that you link to a project that hasn't seen an update in the last 12 years. How old does a project have to be before it is considered "dead"?

The project is certainly dead, but it's also historically significant.

It's impossible and immoral to have two projects with the same name. How dare they.

Well maybe, but from the outside they look like really similar projects (just different languages).

Last modified in April of 2000. Y'know, back when we used Perl willingly.

Looks like topaz.sf.net was a 3-month, early alpha post-perl-5 experiment twelve years ago.

The fact that it used the name "Topaz" should have absolutely no bearing on any current project.

The link goes to a project last committed to in 2000. You surely can't be objecting for that reason.

Names are not unique.

Time to use UUIDs for project names :)

Why not implement it on Erlang/OTP? Ruby and Python share the same scaling issues, so I don't see a benefit in combining them or migrating from one to the other.

I also dislike the "high performance" claims without showing a single performance comparison. Not to mention the state of implementation completeness of the language.

I think you're missing the point. It's not written in Python. It's written in RPython, which is vastly different. The technology behind this and PyPy is the same, but there is no Python usage here.

Additionally, Erlang has avoided scaling issues by being functional. This might be good or bad, depending on your viewpoint, but just implementing a Ruby interpreter in Erlang won't cut it, because you're not answering any of the hard questions about state. You might answer "meh, Ruby is just a broken language", but that has nothing to do with this article.

"Topaz is written in Python on top of the RPython translation toolchain (the same one that powers PyPy)." [1]

"Topaz - An implementation of the Ruby programming language, in Python, using the RPython VM toolchain." [2]

[1] http://docs.topazruby.com/en/latest/blog/announcing-topaz/ [2] https://github.com/topazproject/topaz/blob/master/README.rst

As RPython is a subset of Python, that statement is technically correct but misleading. Stating that Topaz is "Ruby written in Python" leads to the idea that it's written in / runs on CPython, which is really not the case.

Well, it can be run on CPython if you wanted (that's how we run many of our tests). It's just crazy slow.

The Erlang runtime imposes a different set of constraints than Python that would impact the design of any language built on top of it. That said, someone has already come as close as you can to porting Ruby to the Erlang VM; it's called Elixir. http://elixir-lang.org/

It's implemented on top of RPython, which is something completely different from Python the runtime.

No it's not. Technically speaking, yes, but without Python there would be no RPython. In other words, you're betting on three language/VM communities: Ruby, Python (RPython!), and LLVM (IIRC).

I don't believe introducing a lot more dependencies and complexity will cure any problem. Sure, it's a nice project and the work shows the developer's great skill. But it's not practical to use this for anything beyond non-critical fun projects.

Pretty sure LLVM has nothing to do with this, it's not currently used in PyPy.

(Also, LLVM is not a language...)

Yes, but betting on Python is the whole point of PyPy. The idea is to build a toolchain using a relatively high-level programming language like RPython to build new languages.

As far as I understand, this allows low-level concepts to be changed more easily, while low-level implementations always suffer from their early basic decisions (e.g. you will never get reference counting out of CPython, while you might be free to change that in an RPython-implemented language).

So, the entire point of PyPy is being able to implement other high-level languages in RPython, so an attempt to implement Ruby is definitely very interesting! Also, the resulting binary is not depending on Python whatsoever.

That's not really rebutting the GP's point, though, which is that despite using a Python-derived language, we're not running Topaz on top of a Python runtime.

Build an Erlang backend for PyPy. Should be doable, but what benefits would it bring?

It's not quite Ruby on top of Erlang, but you might be interested in Elixir (http://elixir-lang.org/). Elixir has some of the syntax of Ruby but is built on the Erlang VM.

What does that have to do with the article?
