SciRuby (sciruby.com)
225 points by jergason 1905 days ago | 61 comments

I'm glad to see this project is coming back.

I'm in the middle of writing a Stats library for Ruby[0]. Maybe we can join forces?

[0] https://github.com/davejacobs/stats

I scanned your project and it looks great. This is one of the major things we are trying to accomplish with SciRuby - simplified access to things like GSL for basic/essential science and stats. Seems like you and Claudio Bustos should get together (Claudio is working on distribution[0] and is very involved in sciruby). We should join forces.

[0] https://github.com/clbustos/distribution

I'll be in touch this week. This could be exciting. More than anything, I think this will give statistics a home on the Web and will give a new face to statistics for non-engineers. Maybe we can do for statistics what Rails did for Web development.

I noticed this is licensed as GPLv3 as opposed to a BSD license for SciPy. Isn't this going to be a huge barrier to adoption, or am I missing something?

Could you go into more detail? Since we haven't released yet, we might still switch. But I haven't heard a lot of compelling reasons for doing BSD instead.

As a matter of course, MIT and BSD licenses are non-threatening to businesses and organizations who do not necessarily have software as their primary focus.

Non-software shops that might want to use a piece of software may balk at GPLv3 because they don't know the legal ramifications of failing to comply, or because they lack the knowledge, wherewithal, or processes to release the software they're using.

Talking to people about BSD/MIT is really easy: "You can do whatever you want with it as long as you retain the license and copyright".

At the risk of getting into a FOSS license debate, I'd like to think that FOSS contributors do it out of a motivation other than license restriction (and hell, a lot of people still rip libs off, even when they are GPL'd!).

I am sure that most FOSS contributors do it out of a motivation other than license restriction, but I am also certain that many potential FOSS contributors work for organizations that may have questions about contributing to a GPL project (not particularly valid concerns/questions, but the less you have to deal with the legal department the easier it is to get sign-off on contributing to a project...) When you are talking about semi-specialized software like numerical analysis tools it is quite possible that there is a latent pool of potential contributors in industry who would find it easier to contribute to a BSD/MIT project than a GPL one.

It is also the case that this is basically a library, and many who would have no problems using/contributing on a GPL application will balk when it comes to a library or framework.

The resurrection of SciRuby is great news, but the use of the GPL was one of the first things I noticed, and I strongly recommend using a more permissive license. The MIT license would be a good choice. In addition to avoiding the fears associated with the viral nature of the GPL, using the MIT license would be more in line with the culture of the Ruby community. Most Ruby software I know of (though not, strangely enough, Ruby itself) is MIT-licensed, and people will be more comfortable with SciRuby if it conforms to this expectation.

The main reason we initially chose GPL, I think, was because while we had strong respect and appreciation for the MIT-license-like Ruby culture, we were more concerned about the science culture.

In other words, what could we do to provide another arm-twisting mechanism to force publishing authors to release their source code? We wanted to facilitate openness among people who might join the Ruby community by way of SciRuby, as opposed to those who might join the SciRuby community by way of Ruby.

Whether or not we can actually enforce release of source code is an open question. Certainly many journals and funding agencies have enormous problems here.

What about a joint license? For example, is the following practical? "If you publish your work in an academic context, SciRuby is GPLv3 for you. If you do not publish your work in an academic context, SciRuby is MIT."

Dual-licensing under MIT and GPLv3 would probably do the trick.

Sure. My understanding is that you can't include GPL'd code in a project that is not itself licensed under the GPL. This may be an oversimplification, since you could potentially use the tools without actually including them in a project, but this is sort of a legal gray area that I don't fully understand (e.g. what constitutes inclusion of GPL'd code in a project?)

So this is one problem with using an MIT license for SciRuby.

Our distribution gem -- well, technically, Claudio's distribution gem, but used by SciRuby -- has some Ruby code in it derived from C code in the GNU Scientific Library. Being a GNU library, GSL is licensed under the GPL.

Interestingly, the GSL-derived code is only utilized if the user does not have libgsl installed. And my understanding is that code which uses libgsl is not technically a derivative work, and therefore not required to be GPL'd itself.

I suppose one possibility is to abstract the GPL'd code into yet another gem (distribution-gsl?), which is itself licensed under the GPL.
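The "use libgsl if it's installed, otherwise run the pure-Ruby port" arrangement described above is a common pattern for optional native dependencies. A minimal sketch, assuming a hypothetical binding named 'hypothetical_gsl_binding' that exposes HypotheticalGSL.normal_cdf; neither is a real gem or API, and this is not how the distribution gem is actually structured:

```ruby
# Sketch of "prefer the native library if it loads, else fall back to pure Ruby".
# 'hypothetical_gsl_binding' / HypotheticalGSL are placeholders, not real gems.
module NormalDistribution
  begin
    require 'hypothetical_gsl_binding'
    NATIVE = true
  rescue LoadError
    NATIVE = false
  end

  # Standard normal cumulative distribution function, Phi(x).
  def self.cdf(x)
    if NATIVE
      HypotheticalGSL.normal_cdf(x) # would delegate to the C library
    else
      # Pure-Ruby fallback via the identity Phi(x) = 0.5 * erfc(-x / sqrt(2)).
      0.5 * Math.erfc(-x / Math.sqrt(2))
    end
  end
end

NormalDistribution.cdf(0.0) # => 0.5
```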


The two technical justifications (objects all the way down, and enumeration) in the FLOSS interview [0] are both arguable, and not nearly convincing enough to justify further fragmentation of the open-source science ecosystem. If I want cleaner semantics, s/python/ruby is at best moving sideways. For the sake of a few keystrokes? Ruby is slower, both in the interpreter itself and in its lack of f2py, Cython, Numexpr, PyCUDA, and weave (even Theano, sometimes).

If I need a real change, I'll use OCaml or Clojure, and gain speed from the change.

It seems like a waste to discard (or attempt to replicate) the 10s-100s of person years represented by SciPy and the ecosystem including f2py, Cython, MayaVi, IPython (not just a REPL), Pandas, Chaco, PyCUDA/OpenCL, and SAGE - to name a few.

Are there any other, better reasons to want to build an ecosystem from scratch?

[0] http://www.floss4science.com/interview-sciruby-team/

The same arguments could have been used when scipy was starting up:

* Why did folks build scipy when there was PDL? Just to avoid a few sigils?
* Why build scipy when R had been released just before?
* Why didn't they just support matlab/octave? (scipy and matlab are so close in syntax anyway)
* If syntactical differences are not enough to justify a new scientific library, why not just stick to Fortran, or C/C++? One can argue that an enormous amount of time has been wasted on writing python wrappers. For what? Just to save a few keystrokes?

However, most would agree that the world is a much better place now that scipy and related tools are in it.

Some arguments:

* Ruby is expressive and flexible in ways that python is not. For example, Rubyvis is flexible enough (or at least similar enough to javascript) to essentially accept protovis (javascript) code directly. I don't think python can do this.
* Ruby uses blocks/enumerators instead of 'for' loops. How much programming involves enumeration of one kind or another?
* len(array) vs. array.length

Coding is not just getting the computer to do what you want, it is also how you think about it and the form that it takes. 'len(array)' vs. 'array.length' may not matter to most, but it matters to some of us.
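To make the enumeration and 'len(array) vs. array.length' points concrete, here is a small illustrative comparison in Ruby (the readings are made up):

```ruby
readings = [2.5, 3.0, 4.5]

# Index-based loop: the index bookkeeping is the programmer's job.
total_loop = 0.0
for i in 0...readings.length
  total_loop += readings[i]
end

# Block/enumerator style: no index variable to get off by one.
total_block = readings.inject(0.0) { |sum, r| sum + r }
doubled     = readings.map { |r| r * 2 }
labelled    = readings.each_with_index.map { |r, idx| "#{idx}: #{r}" }

# Length is a method on the object rather than a global function.
count = readings.length
```

Both totals come out to 10.0; the difference is in how the iteration reads, which is the commenter's point.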

The great news is that we can still use scipy when we need to. I'm betting there is room for both projects, especially considering how small sciruby is at the moment.

OCaml and Clojure are great, but have a steeper learning curve; getting non-programming scientists to contribute is far easier in a language like ruby or python.

A beginning-programmer scientist who wants to start writing code to solve problems currently has to 1) code in python or 2) learn ruby and python/scipy or matlab or R in order to do some scientific computing. SciRuby means (eventually) that for most things, novices only have to learn ruby. It is hard to overstate the importance to new programmers of being able to use just one language (at least to start with).

If you like python over ruby, this is easy. If you like ruby over python, it gets old piping all your data over to a python script to use the basic features of scipy.

SciPy is not currently starting up. There were other open-source alternatives to Matlab in 1995, but certainly none with the scope and community of SciPy in 2011. That full-spectrum alternative is not Octave, now or then. R is a very nice statistics DSL, prevalent in publications and thus a trove of code (arguments that were much weaker at that time), but there is no PyMol or MayaVi in R, for good reasons.

For the record, NumPy predates PDL by a smidge, and the existence of PDL is more of an argument against SciRuby, considering the cultural similarity and continued strength of BioPerl. As for C/C++/Fortran wrapping, there's a bit more to it than syntax efficiency; plus, wrapping is often semi-automatic and leverages multi-language, science-ambivalent toolkits (e.g. SWIG or SIP). One nice consequence is that, via buffer wrapping, the NumPy array has become an efficient common currency for a huge number of legacy libraries.

IMHO, the advantages you have cited pale in comparison with the task of reimplementing 15 years worth of work for what is essentially unity gain in code style (+/- 2% depending on your flavor preference). Regarding that code style, you may be missing the forest for the trees: the inflexibility of Python is a small price to pay for community cohesiveness, and the resulting multiplier effect is non-trivial. Put another way, the time I've spent attempting to read Perl code-golf leaves me very leery of Ruby.

My basic argument though is not that Python is superior or coexistence impossible, it's that every person-year spent reinventing a mature system that is far beyond 'good enough' is a person-year that could be spent advancing the state of the art in scientific computing by building on Theano or improving SAGE - or preferably, doing real science.

Re-implementing is much easier than implementing. Scipy source code is all available, right? Besides, we aren't trying to duplicate the entire scipy stack and ecosystem. We are just trying to make it easier to do common scientific computing in Ruby. Also, trying to do it in new ways.

You seem to be saying that ruby is python, just with perl's inconsistency and unreadableness. The syntax differences between python and ruby (more than 2% IMHO) amount to very large differences in code organization. The ruby community places a high premium on brevity and clarity, and ruby's flexible syntax facilitates this. Part of the reason monolithic code bases are more rare in ruby is because we tend to do more with less code. We are talking about two very different forests.

We weren't going to be working on Theano or SAGE anyway, just doing basic science computation to solve real problems in our fields. That's really the problem, we find ourselves quite productive doing science in ruby at the moment, despite its relative immaturity in this area. It's hard not to imagine what we could do with a few more foundational tools. It really is a small cost to enable us to be able to do science in ruby. Plus, building scientific computing libraries is good fun. Every community should have the chance. As Abe Lincoln suggested, 'Let not him who is houseless pull down the house of another; but let him labor diligently and build one for himself.'

Is it so terrible to want to do science in the language you most enjoy and are most comfortable with? To continue with the maladroit quoting of Abe Lincoln, "it's best to not swap horses when crossing streams."

I made the point further down, but will restate it: if ruby hadn't persisted in existing (the nerve!), would django exist today? I find it hard to believe that this is a zero-sum game, and I wish python and scipy every success. Perhaps the most valuable scipy contribution we could make will come by making sciruby something worth borrowing ideas from?

I hope they don't set up SciPy as the project to emulate and improve upon. Deep knowledge of R, Fortran, and Matlab would better inform this project of what scientists need.

SciPy is great, but it's clearly best for programmers that have a slight scientific bent and can't stomach learning the existing scientific tools (which are admittedly a bit difficult to combine with modern software engineering). There are some great ideas in SciPy, but a broader set of influences is essential to making a great scientific toolkit.

Matlab and Fortran are influences on SciPy; in fact, there is Fortran code inside SciPy.

I'd love to hear more details about the deficiencies, or how it might be more influenced by those.

I agree that the ideas of statistical processing in R are absent from Numpy, but Pandas is attempting to remedy that.

Plenty of real science happens in SciPy. http://scholar.google.com/scholar?cites=2086009121748039507&... is the list of papers on Google Scholar that cite the SciPy website. Biology, machine learning, physics, and more.

I'm a physicist by trade and every colleague I've introduced to SciPy vastly prefers it to Fortran, R, and Matlab.

What do you find missing in SciPy?

> can't stomach learning the existing scientific tools [I assume you mean R, Fortran, Matlab]

The sum of anecdotes is not data, but

It might be the opposite: people who know the pain of working with these tools move to Python for complex projects if they can.

>> Sometimes when a solution of sugar and water becomes super-saturated, from it precipitates a pure, delicious, and diabetes-inducing crystal of sweetness, induced by no more than the tap of a finger. So it is, we believe, with the need for numeric and visualization libraries in Ruby.

Great point about sugar water. Totally on point.

Actually, I have experience with both SciPy/Numpy and Matlab and I must say, I vastly prefer Python.

I want to interoperate with C code without disgust. I actually want to write my computation kernel in C, inline with my general high-level code. Numpy does that and I LOVE it for that.

I want to be able to use libraries that are not provided by Mathworks. Serious GUI programming using PyQt, for example. Or, say, a decent XML parser. Or maybe some JSON importing.

I want to be able to run my scientific programs without having to wait for a huge Matlab installation to start up and I want to be able to use my terminal properly.

I want to not have to run X11 on OS X, for Christ's sake. (Though apparently this has been alleviated to some extent in the latest version. Anyone know first-hand?)

Oh, and I don't want to fork over 3k bucks just to write some simple signal processing stuff. (And I don't want to pay upgrade fees every year.)

But then, I don't have an infinite budget, I am mostly interested in signal processing for audio signals and I certainly have more of a programming background than a science background (though my formal education would have me believe otherwise). Also, I am not much interested in Simulink and my latest version of Matlab is of 2007 vintage.

What do you mean exactly? Most of the SciPy API maps pretty directly to MATLAB.

It's hard to argue with SciPy's success--it is well engineered, relatively easy to use, highly useful, and well documented. Though SciRuby should be able to do most of what SciPy is capable of, we don't plan on SciRuby being a SciPy rewrite. We all pretty much use matlab, R (and scipy), as well as ruby, so these are all influences.

This is a sincere question, if you're doing science, why would you want to use ruby over python?

You could have asked this question of anybody who tried to write SciPy or NumPy before they existed: "If you're doing science, why would you want to use python over fortran?" Or, before RoR existed: "If you're doing web development, why would you want to use ruby over java/perl?" Your question basically asks why anybody would ever try something outside an established ecosystem.

The reason is, they like ruby so much that they want to use it for number crunching too, rather than having to use a less appealing language. They want to build an ecosystem so other people can join, contribute, and grow together, and maybe outgrow python. They want ruby to win so much that they are willing to duplicate a framework that exists somewhere else.

My answer to this question is transparency. At my company Brighter Planet we write all of our scientific/methodological code in Ruby so that people with basic technical skill can understand what's going on. The ability to create expressive DSLs is really crucial.
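As an illustration of the kind of internal DSL Ruby makes easy, here is a sketch using blocks plus instance_eval so that a calculation reads as a list of named formulas. This is not Brighter Planet's actual code; the Methodology class, the step names, and the numbers are all hypothetical.

```ruby
# Hypothetical mini-DSL: a methodology is a list of named steps, each a block
# that computes a value from the inputs and earlier results.
class Methodology
  def self.define(&block)
    methodology = new
    # instance_eval makes the new object self, so `step` needs no receiver.
    methodology.instance_eval(&block)
    methodology
  end

  def initialize
    @steps = []
  end

  def step(name, &formula)
    @steps << [name, formula]
  end

  # Runs the steps in order, accumulating results in a copy of the inputs.
  def run(inputs)
    @steps.each_with_object(inputs.dup) do |(name, formula), values|
      values[name] = formula.call(values)
    end
  end
end

# Illustrative numbers only.
flight = Methodology.define do
  step(:fuel_kg)      { |v| v[:distance_km] * v[:fuel_per_km] }
  step(:emissions_kg) { |v| v[:fuel_kg] * v[:co2_per_kg_fuel] }
end

result = flight.run(distance_km: 1000, fuel_per_km: 4, co2_per_kg_fuel: 3.0)
# result[:emissions_kg] => 12000.0
```

The design choice here is that the methodology is plain data (an ordered list of blocks), so a non-specialist can read the formulas top to bottom without following control flow.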

I have a similar answer - I'm a data analyst at a company full of rubyists. I can (and do) use R for most of my analyses, but there's a cost in transparency -- I can't realistically ask someone to review my R code when they don't know the language, and the bus factor is dangerously low if anything I write in R is at all important.

That's why I'm trying to use Ruby where possible, even at the cost of a small productivity hit. The benefits of others being able to read my code far outweigh the few extra minutes it takes for me to do something (and in many cases, the sheer brevity of Ruby as a language means it's faster, simply because it's less typing).

I'd love to see SciRuby become a more useful project, and I'd love to contribute. Unfortunately, they don't make it especially easy to get involved -- the mailing list points people to the roadmap, but it's not at the level of detail where someone could jump in (and the component gems don't seem much better), so it's a bit hard to know where help would actually be useful.

Noah -- I'd love to get your thoughts on how to make it easier to include people in our project. Would you be willing to send me an email? john dot woods at marcottelab dot org.

That's very interesting. What sorts of DSLs are you creating, or can you say?

In addition to the point muuh-gnu makes, we wonder if ruby will be able to add something fresh to scientific computing. In a recent interview[0], I outline some thoughts on this:

1. Because of its consistent object oriented design and because everything returns a value, chaining is natural in Ruby.

2. Avoid index errors and for loops with powerful block Enumerators.

3. Scientific data and services are moving to the web, and Ruby is a great web language (although it is incorrect to call it 'just' a web language).

4. The Ruby community is highly innovative and dynamic, so we can collectively generate solutions quickly.

(see the interview for more explanation) Some of these points are more vision than reality at this point, but we think ruby has great potential as a science language. The other reason this makes sense is that folks are already doing science in ruby (and have been for some time) and many tools already exist--we are building on a good foundation and just hoping to extend it to make things better/simpler.

[0] http://www.floss4science.com/interview-sciruby-team/
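A short illustration of points 1 and 2 (chaining and enumerators), using made-up measurements:

```ruby
readings = [4.1, 3.9, 5.2, 4.8, 10.0, 4.4]

# 1. Chaining: each call returns a new value, so the steps read top to bottom.
#    Drop an obvious outlier (threshold chosen arbitrarily), then sort.
cleaned = readings
  .reject { |x| x > 9 }
  .sort

mean = cleaned.sum / cleaned.size

# 2. Enumerators: walk consecutive pairs without any index arithmetic.
gaps = cleaned.each_cons(2).map { |a, b| (b - a).round(2) }
```

Here cleaned is [3.9, 4.1, 4.4, 4.8, 5.2] and the mean is about 4.48. Array#sum needs Ruby 2.4 or later; on older versions, inject(:+) does the same job.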

1. In Python, too, every function returns a value; it just happens to be None if you don't explicitly return something. You could probably implement, pretty trivially, a class that wraps an object and returns the object itself whenever one of its methods returns None.

2. Python has list comprehensions, iterators, and generators, which also let you loop without for loops or explicit indexing.

3. Python also has lots of web frameworks/libraries.

4. Are you saying the Python community is not (as) innovative?

I'm not saying this project is completely without merit. I know that many people prefer Ruby's syntax and preferences to those of Python. But I think it is disingenuous (and somewhat insulting to Python developers) to claim that this project has any potential benefit beyond allowing Rubyists to do scientific computation in their preferred language.

I'm not arguing that everyone using python for science should come running in hordes to sciruby. What I'm saying is that if you prefer ruby to python that this is helping to make it possible to do a lot of science all in ruby. Also, some of the nuances of ruby could make for some interesting science code.

1. (reply) I know it is possible, however, in practice it is easier and done more often in ruby. The idea permeates ruby thinking much more than in python. Even 'if' statements return a value, and rubyists often use that.

2. (reply) Yeah, list comprehensions are cool; I wish ruby had 'em. The original point was directed more at R and matlab code. Still, enumeration in ruby looks and feels quite different from python, even when the same effect is achieved, e.g. [1,2,3].each_cons(2).map { |a, b| a + b }, which returns [3, 5].

3. (reply) True, but if ruby hadn't persisted alongside python, would django exist? sciruby may do things differently than scipy, and that just might make python/scipy better down the road.

4. (reply) To be clear, I have the utmost respect for the python community. How about saying that they innovate in different ways? Python's emphasis on having one preferable way of doing things tends to yield well engineered systems, and innovation occurs easily in layers as a result. Because of ruby's flexibility, Rubyists are more prone to rebuild core functionality in different ways. Depending on how much you care about syntax, that's either a colossal waste of time or quite useful.
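To illustrate reply 1's point that even 'if' statements return a value, here is a small sketch. The method names and the sample-size buckets are made up for the example.

```ruby
# `if` is an expression in Ruby, so its value can be assigned directly.
def verdict(p_value, alpha = 0.05)
  result = if p_value < alpha
             :reject_null
           else
             :fail_to_reject
           end
  result # a method returns the value of its last expression
end

# `case` works the same way; the bucket boundaries here are arbitrary.
def sample_size_label(n)
  case n
  when 0...30 then :small
  when 30...1000 then :medium
  else :large
  end
end
```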

Finally, if you don't care about the syntax of scientific computation, then yes, this only allows rubyists to do computation in ruby. But, despite their many similarities, ruby and python are fundamentally different in several ways. To suggest that we might be able to do scientific computing in somewhat new and interesting ways is not meant to be insulting to python developers, and we make the claim with the deepest sincerity, and humility. Like python, ruby possesses unique strengths, and we hope to bring these to bear on scientific computing as best we are able.

As for #1, I've just implemented it. https://github.com/zhemao/chainsmoker

From the website:

    However, due to the special nature of python's builtin datatypes (str, list, int, float, tuple, dict, etc.), you will not be able to use chained_class on those classes.
This helps to make my point about ruby's object model. Of course, ruby takes a speed hit for it, but it is more consistent in this regard.
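For comparison, Ruby's core library already has something similar in spirit to chainsmoker: Object#tap yields the receiver to a block and then returns the receiver, whatever the block returns. Because built-in classes such as Array and Integer inherit from Object, it works on them too. A small sketch (the logging is only for illustration):

```ruby
log = []

# tap runs the block for its side effect and passes the receiver along,
# so logging can be spliced into a chain on a built-in Array.
result = [3, 1, 2]
  .tap { |a| log << "raw: #{a.inspect}" }
  .sort
  .tap { |a| log << "sorted: #{a.inspect}" }
  .map { |x| x * 10 }

# Integers are objects too, so the same method is available on them.
n = 42.tap { |x| log << "answer: #{x}" }
```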

Wow. This is very much needed. That being said, what are they aiming for?

I think this was posted in response to this article (http://news.ycombinator.com/item?id=3179370) rather than because of some recent update.

That is correct. I am somewhat connected to a few guys working on this, and I saw the article and thought I would share.

I WAS just gonna ping that guy and let him know about it haha

> This is very much needed.

Why? (just curious) SciPy is mature, popular, and has already been heavily peer-reviewed. Then there are R and Matlab and ...

What does SciRuby bring to the table that makes it stand out from the rest?

Some labs do lots of work in Ruby. It would be nice to stick with Ruby for analysis of your data as well. Kind of like the case for server-side JavaScript: less friction between code on the server and in the browser.

in addition to jergason's response, sciruby allows/facilitates:

1. better chaining of commands
2. blocks and enumerators
3. integration with rails and other web services
4. a dynamic community

see http://www.floss4science.com/interview-sciruby-team/

I'm really glad they're implementing Protovis instead of trying to copy Matlab's plotting facilities like Matplotlib. Matlab plotting is pretty terrible.

matplotlib is very flexible and has a wide array of plot types. It's great for quickly making plots of some data in Python. But IMO the output is not really nice to look at, it takes a lot of customization to make the plots publication-quality, and it feels kind of clunky for making interactive plots (however, some animation support was added recently; I haven't looked at that in detail yet).

The screenshots of protovis/d3 look very promising, I'll have a look at it. The last time I needed a JS charting library I went with Highcharts, as it had somewhat better support for the run-of-the-mill chart types I was using in my project.

Matplotlib is ok for a first plot during exploratory analysis, but is a far, far cry from the facilities available in R through base graphics, grid graphics, trellis graphics, or ggplot2.

Protovis/d3 take a different approach: they also build on a Grammar of Graphics similar to ggplot's, but are concerned primarily with the tooling rather than the application.

Tooling level libraries are nice because they tend to be flexible enough for high data ink ratios, unlike highcharts, which turns me away with every example.

Your choice of terms is very interesting! I would say that Protovis and d3 do not take a Grammar of Graphics approach per se; rather, they tackle mostly just the lowest level of the grammar, namely aesthetic composition and some transforms. ggplot is very nice for the kinds of datasets that people use R for, but it's only one part of the story.

For large dataset and interactive visualization in Python, take a look at Chaco: http://code.enthought.com/chaco

You're right to point that out. I abused the term. The real thing I think ggplot and protovis/d3 share, and the fundamental thing that makes them powerful and interesting, is the declarative nature of those DSLs. GoG is quite a bit more on top of declarative syntax, but, while I really enjoy Hadley's work, I don't think it's the end all be all in declarative graphing DSLs.

Great idea and something that will definitely be highly used (I'm downloading it now!), but I really would be cautious attempting to provide what R provides, 'but better'.

R's power comes from the fact that hundreds of scientists have written packages for it when they have a new method - you won't be able to get that overnight. Also, R has strong links with other languages like C.

Finally, while I agree that R's syntax can sometimes be slightly obfuscated, I don't really think the examples on their site are fair... You can 'plot(y~x)' guys :p

Interesting stuff, but given the sheer brainmass that SciPy has attracted, it's going to be hard to draw users away from that crowd. I think the only appeal of this project, at least for the time being, is to labs that are already entrenched in Ruby.

hopefully they make the DSL similar to matlab. Having worked in matlab & numpy, I can definitely say that having numpy's syntax semi-close to matlab helped a lot when we do ports. It (porting) happens more often than you'd think.

Looks good. I'm a big fan of NArray for getting decent performance on array operations; it will be interesting to see how the rewrite of it performs. NArray is pretty damn close to realtime compared to native ruby implementations.

realtime what? Realtime like C? Realtime like ATLAS or MKL?

realtime like it's fast fortran matrix stuff. compared to slow-as-molasses ruby native Array#mean (which has to be implemented in ruby).
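For reference, core Ruby's Array has no mean method, so a pure-Ruby helper looks something like the sketch below (PureRubyStats is a hypothetical name). The block passed to inject is called once per element in the interpreter, which is the kind of overhead being contrasted with NArray's compiled code. Ruby 2.4 later added a C-implemented Array#sum, which narrows the gap for this particular case.

```ruby
# Hypothetical helper; core Array has no #mean method.
module PureRubyStats
  def self.mean(values)
    raise ArgumentError, "mean of an empty array" if values.empty?
    # inject calls the block once per element, in interpreted Ruby.
    values.inject(0.0) { |acc, x| acc + x } / values.size
  end
end

PureRubyStats.mean([1, 2, 3, 4]) # => 2.5
```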

This is absolutely needed. Glad the project has come back to life. Big up to the developers.

God forbid you use existing mature tools appropriate to your task.

And while we're at it, may he forbid the co-existence of emacs, vim, and textmate; java, C++, C#; javascript and dart; KDE, gnome, enlightenment, fluxbox, fvwm, icewm, metacity, compiz, wmii, monad; ubuntu, debian, gentoo, redhat, and arch; bash, csh, and zsh ... nothing good ever comes out of approaching a problem from a new angle.

This made my day.

This is terrific. This is newsworthy. Hope it gets legs.

Python's concurrent programming is not that good. How is the situation in Ruby?

It's the same. The Ruby 1.9.x (YARV) interpreter uses a GIL as well, although like Python, there's a way for C extensions to release the GIL.
