

SciRuby - jergason
http://sciruby.com/

======
djacobs
I'm glad to see this project is coming back.

I'm in the middle of writing a Stats library for Ruby[0]. Maybe we can join
forces?

[0] <https://github.com/davejacobs/stats>

~~~
jtprince
I scanned your project and it looks great. This is one of the major things we
are trying to accomplish with SciRuby - simplified access to things like GSL
for basic/essential science and stats. Seems like you and Claudio Bustos
should get together (Claudio is working on distribution[0] and is very
involved in sciruby). We should join forces.

[0] <https://github.com/clbustos/distribution>

~~~
djacobs
I'll be in touch this week. This could be exciting. More than anything, I
think this will give statistics a home on the Web and will give a new face to
statistics for non-engineers. Maybe we can do for statistics what Rails did
for Web development.

------
gphil
I noticed this is licensed as GPLv3 as opposed to a BSD license for SciPy.
Isn't this going to be a huge barrier to adoption, or am I missing something?

~~~
mohawkjohn
Could you go into more detail? Since we haven't released yet, we might still
switch. But I haven't heard a lot of compelling reasons for doing BSD instead.

~~~
knowtheory
As a matter of course, MIT and BSD licenses are non-threatening to businesses
and organizations who do not necessarily have software as their primary focus.

Non-software shops who may be interested in using a piece of software may balk
at using GPL3'd software because they don't know what the legal ramifications
are of failing to comply, or lack the knowledge/wherewithal/processes to
release the software that they're using.

Talking to people about BSD/MIT is really easy: "You can do whatever you want
with it as long as you retain the license and copyright".

At the risk of getting into a FOSS license debate, I'd like to think that FOSS
contributors do it out of a motivation other than license restriction (and
hell a lot of people still rip libs off, even when they are GPL'd!).

~~~
evgen
I am sure that most FOSS contributors do it out of a motivation other than
license restriction, but I am also certain that many potential FOSS
contributors work for organizations that may have questions about contributing
to a GPL project (not particularly valid concerns/questions, but the less you
have to deal with the legal department the easier it is to get sign-off on
contributing to a project...) When you are talking about semi-specialized
software like numerical analysis tools it is quite possible that there is a
latent pool of potential contributors in industry who would find it easier to
contribute to a BSD/MIT project than a GPL one.

It is also the case that this is basically a library, and many who would have
no problems using/contributing on a GPL application will balk when it comes to
a library or framework.

------
bugsbunnyak
Why?

The two technical justifications (objects all the way, and enumeration) in the
FLOSS interview [0] are both arguable and not nearly convincing enough to
justify further fragmentation of the open-source science ecosystem. If I want
cleaner semantics, s/python/ruby is at best moving sideways - for the sake of
a few keystrokes? Ruby is slower both in the interpreter itself and in the
lack of f2py, Cython, Numexpr, PyCUDA, weave (even Theano sometimes).

If I need a real change, I'll use Ocaml or Clojure, and gain speed from the
change.

It seems like a waste to discard (or attempt to replicate) the 10s-100s of
person years represented by SciPy and the ecosystem including f2py, Cython,
MayaVi, IPython (not just a REPL), Pandas, Chaco, PyCUDA/OpenCL, and SAGE - to
name a few.

Are there any other, better reasons to want to build an ecosystem from
scratch?

[0] <http://www.floss4science.com/interview-sciruby-team/>

~~~
jtprince
The same arguments could have been used when scipy was starting up:

* Why did folks build scipy when there was PDL? Just to avoid a few sigils?
* Why build scipy when R had been released just before?
* Why didn't they just support matlab/octave? (scipy and matlab are so close
  in syntax anyway)
* If syntactical differences are not enough to justify a new scientific
  library, why not just stick to Fortran, or C/C++? One can argue that an
  enormous amount of time has been wasted on writing python wrappers. For
  what? Just to save a few keystrokes?

However, most would agree that the world is a much better place now that
scipy and related tools are in it.

Some arguments:

* Ruby is expressive and flexible in ways that python is not. For example,
  Rubyvis is flexible enough (or similar enough to javascript at least) to
  essentially accept protovis (javascript) code directly. I don't think python
  can do this.
* Ruby uses blocks/enumerators instead of 'for' loops. How much programming
  involves enumeration of one kind or another?
* len(array) vs. array.length

Coding is not just getting the computer to do what you want, it is also how
you think about it and the form that it takes. 'len(array)' vs. 'array.length'
may not matter to most, but it matters to some of us.
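As a rough illustration of the block/enumerator style mentioned above (a minimal sketch, not code from SciRuby; the method names and sample values are made up for the example), the same sum can be written with an explicit loop or with a block, and the size is asked of the array itself:

```ruby
# Hypothetical helpers, invented only for this illustration.

# Explicit 'for' loop with a manual accumulator.
def loop_sum(values)
  total = 0.0
  for v in values
    total += v
  end
  total
end

# The same sum expressed as a block passed to an enumerator method.
def block_sum(values)
  values.inject(0.0) { |sum, v| sum + v }
end

# Chained enumerators: keep values above a threshold, then square them.
def squares_above(values, threshold)
  values.select { |v| v > threshold }.map { |v| v * v }
end

readings = [1.5, 2.0, 3.5, 4.0]
puts loop_sum(readings)
puts block_sum(readings)
puts squares_above(readings, 1.9).inspect
puts readings.length # size is a method on the object, not a free function
```

Both sums give the same result; the difference is only in how the iteration is expressed.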

The great news is that we can still use scipy when we need to. I'm betting
there is room for both projects, especially considering how small sciruby is
at the moment.

Ocaml and clojure are great, but have a steeper learning curve; getting non-
programming scientists to contribute is far easier in a language like ruby or
python.

A beginning-programmer scientist who wants to start writing code to solve
problems currently has to 1) code in python or 2) learn ruby _and_
python/scipy or matlab or R in order to do some scientific computing. SciRuby
means (eventually) that for _most_ things, novices only have to learn ruby. It
is hard to overstate the importance to new programmers of being able to use
just one language (at least to start with).

If you like python over ruby, this is easy. If you like ruby over python, it
gets old piping all your data over to a python script to use the basic
features of scipy.

~~~
bugsbunnyak
SciPy is not currently starting up. There were other open-source alternatives
to Matlab in 1995, but certainly none with the scope and community of SciPy in
2011. That full-spectrum alternative is not Octave, not now or then. R is a
very nice statistics dsl with prevalence in publications and thus a trove of
code (arguments much weaker at that time) - but there is no PyMol or MayaVi in
R for good reasons.

For the record, NumPy predates PDL by a smidge, and the existence of PDL is
more of an argument against SciRuby, considering the cultural similarity and
continued strength of BioPerl. As for C/C++/Fortran wrapping - there's a bit
more to it than syntax efficiency, plus wrapping is often semi-automatic and
leverages multi-language, science-ambivalent toolkits (i.e. SWIG or SIP).
However, one nice consequence is the fact that via buffer wrapping, the NumPy
array has become an efficient common currency for a huge number of legacy
libraries.

IMHO, the advantages you have cited pale in comparison with the task of
reimplementing 15 years worth of work for what is essentially unity gain in
code style (+/- 2% depending on your flavor preference). Regarding that code
style, you may be missing the forest for the trees: the inflexibility of
Python is a small price to pay for community cohesiveness, and the resulting
multiplier effect is non-trivial. Put another way, the time I've spent
attempting to read Perl code-golf leaves me very leery of Ruby.

My basic argument though is not that Python is superior or coexistence
impossible, it's that every person-year spent reinventing a mature system that
is far beyond 'good enough' is a person-year that could be spent advancing the
state of the art in scientific computing by building on Theano or improving
SAGE - or preferably, doing real science.

~~~
jtprince
Re-implementing is much easier than implementing. Scipy source code is all
available, right? Besides, we aren't trying to duplicate the entire scipy
stack and ecosystem. We are just trying to make it easier to do common
scientific computing in Ruby. Also, trying to do it in new ways.

You seem to be saying that ruby is python, just with perl's inconsistency and
unreadableness. The syntax differences between python and ruby (more than 2%
IMHO) amount to very large differences in code organization. The ruby
community places a high premium on brevity and clarity, and ruby's flexible
syntax _facilitates_ this. Part of the reason monolithic code bases are more
rare in ruby is because we tend to do more with less code. We are talking
about two very different forests.

We weren't going to be working on Theano or SAGE anyway, just doing basic
science computation to solve real problems in our fields. That's really the
problem, we find ourselves quite productive doing science in ruby at the
moment, despite its relative immaturity in this area. It's hard not to imagine
what we could do with a few more foundational tools. It really is a small cost
to enable us to be able to do science in ruby. Plus, building scientific
computing libraries is good fun. Every community should have the chance. As
Abe Lincoln suggested, 'Let not him who is houseless pull down the house of
another; but let him labor diligently and build one for himself.'

Is it so terrible to want to do science in the language you most enjoy and are
most comfortable with? To continue with the maladroit quoting of Abe Lincoln,
"it's best to not swap horses when crossing streams."

I made the point further down, but will restate it: if ruby hadn't persisted
in existing (the nerve!), would django exist today? I find it hard to believe
that this is a zero-sum game, and I wish python and scipy every success.
Perhaps the most valuable scipy contribution we could make will come by making
sciruby something worth borrowing ideas from?

------
epistasis
I hope they don't set up SciPy as the project to emulate and improve upon.
Deep knowledge of R, Fortran, and Matlab would better inform this project of
what scientists need.

SciPy is great, but it's clearly best for programmers that have a slight
scientific bent and can't stomach learning the existing scientific tools
(which are admittedly a bit difficult to combine with modern software
engineering). There are some great ideas in SciPy, but a broader set of
influences is essential to making a great scientific toolkit.

~~~
d0mine
> can't stomach learning the existing scientific tools [I assume you mean R,
> Fortran, Matlab]

The sum of anecdotes is not data, but it might be the opposite: people who
know the pain of working with these tools move to Python for complex projects
if they can.

~~~
jwallaceparker
>> Sometimes when a solution of sugar and water becomes super-saturated, from
it precipitates a pure, delicious, and diabetes-inducing crystal of sweetness,
induced by no more than the tap of a finger. So it is, we believe, with the
need for numeric and visualization libraries in Ruby.

Great point about sugar water. Totally on point.

------
hogu
This is a sincere question, if you're doing science, why would you want to use
ruby over python?

~~~
briteside
My answer to this question is transparency. At my company Brighter Planet we
write all of our scientific/methodological code in Ruby so that people with
basic technical skill can understand what's going on. The ability to create
expressive DSLs is really crucial.
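For readers unfamiliar with what a Ruby DSL looks like, here is a minimal sketch of the general technique (blocks evaluated with `instance_eval`). It is not Brighter Planet's code; the `Calculation` class, the `calculation` helper, and the step names are all hypothetical.

```ruby
# Hypothetical mini-DSL: every name here is invented for illustration.
class Calculation
  def initialize
    @steps = []
  end

  # Register a named step; its block receives the running value.
  def step(name, &block)
    @steps << [name, block]
  end

  # Apply every step in order, starting from the input.
  def run(input)
    @steps.inject(input) { |value, (_name, block)| block.call(value) }
  end

  # Names of the registered steps, in order.
  def step_names
    @steps.map(&:first)
  end
end

# Build a Calculation by evaluating the block in the new object's context,
# so bare `step` calls inside the block land on that object.
def calculation(&definition)
  calc = Calculation.new
  calc.instance_eval(&definition)
  calc
end

EXAMPLE = calculation do
  step(:scale)  { |x| x * 2 }
  step(:offset) { |x| x + 1 }
end

puts EXAMPLE.step_names.inspect
puts EXAMPLE.run(10)
```

The definition reads as a list of named steps, which is the sort of readability the comment is pointing at.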

~~~
noahnoahnoah
I have a similar answer - I'm a data analyst at a company full of rubyists. I
can (and do) use R for most of my analyses, but there's a cost in transparency
-- I can't realistically ask someone to review my R code when they don't know
the language, and the bus-factor risk is high if anything I write in R is at all
important.

That's why I'm trying to use Ruby where possible, even at the cost of a small
productivity hit. The benefits of others being able to read my code far
outweigh the few extra minutes it takes for me to do something (and in many
cases, the sheer brevity of Ruby as a language means it's faster, simply
because it's less typing).

I'd love to see SciRuby become a more useful project, and I'd love to
contribute. Unfortunately, they don't make it especially easy to get involved
-- the mailing list points people to the roadmap, but it's not at the level of
detail where someone could jump in (and the component gems don't seem much
better), so it's a bit hard to know where help would actually be useful.

~~~
mohawkjohn
Noah -- I'd love to get your thoughts on how to make it easier to include
people in our project. Would you be willing to send me an email? john dot
woods at marcottelab dot org.

------
jcarden
Wow. This is very much needed. That being said, what are they aiming for?

~~~
pdenya
I think this was posted in response to this article
(<http://news.ycombinator.com/item?id=3179370>) rather than because of some
recent update.

~~~
jergason
That is correct. I am somewhat connected to a few guys working on this, and I
saw the article and thought I would share.

------
tel
I'm really glad they're implementing Protovis instead of trying to copy
Matlab's plotting facilities like Matplotlib. Matlab plotting is pretty
terrible.

~~~
wladimir
matplotlib is very flexible and has a wide array of plot types. It's great for
quickly making plots of some data in Python. But IMO the output is not really
nice to look at, it takes a lot of customization to make the plots
publication-quality, and it feels kind of clunky for making interactive plots
(however, some animation support was added recently; I haven't looked at that
in detail yet).

The screenshots of protovis/d3 look very promising, I'll have a look at it.
The last time I needed a JS charting library I went with Highcharts, as it had
somewhat better support for the run-of-the-mill chart types I was using in my
project.

~~~
tel
Matplotlib is ok for a first plot during exploratory analysis, but is a far,
far cry from the facilities available in R through base graphics, grid
graphics, trellis graphics, or ggplot2.

Protovis/d3 take a different approach, also focused on a similar Grammar of
Graphics like ggplot but primarily concerned about the tooling, instead of the
application.

Tooling level libraries are nice because they tend to be flexible enough for
high data ink ratios, unlike highcharts, which turns me away with every
example.

~~~
pwang
Your choice of terms is very interesting! I would say that Protovis and d3 do
_not_ take a Grammar of Graphics approach per se; rather, they tackle mostly
just the lowest level of the grammar, namely aesthetic composition and some
transforms. ggplot is very nice for the kinds of datasets that people use R
for, but it's only one part of the story.

For large datasets and interactive visualization in Python, take a look at
Chaco: <http://code.enthought.com/chaco>

~~~
tel
You're right to point that out. I abused the term. The real thing I think
ggplot and protovis/d3 share, and the fundamental thing that makes them
powerful and interesting, is the declarative nature of those DSLs. GoG is
quite a bit more on top of declarative syntax, but, while I really enjoy
Hadley's work, I don't think it's the end all be all in declarative graphing
DSLs.

------
willpearse
Great idea and something that will definitely be highly used (I'm downloading
it now!), but I really would be cautious attempting to provide what R
provides, 'but better'.

R's power comes from the fact that hundreds of scientists have written
packages for it when they have a new method - you won't be able to get that
overnight. Also, R has strong links with other languages like C.

Finally, while I agree that sometimes R's syntax can be slightly obfuscated,
I don't really think the examples on their site are fair... You can
'plot(y~x)' guys :p

------
freyrs3
Interesting stuff, but given the sheer brainmass that SciPy has attracted it's
going to be hard to draw users away from that crowd. I think the only appeal
of this project, at least for the time being, is labs that are already
entrenched in Ruby.

------
catch23
hopefully they make the DSL similar to matlab. Having worked in matlab &
numpy, I can definitely say that having numpy's syntax semi-close to matlab
helped a lot when we do ports. It (porting) happens more often than you'd
think.

------
sunkencity
Looks good. I'm a big fan of narray to get decent performance for array
operations, interesting to see how the rewrite of that will perform. Narray is
pretty damn close to realtime compared to native ruby implementations.

~~~
pwang
realtime what? Realtime like C? Realtime like ATLAS or MKL?

~~~
sunkencity
realtime like it's fast fortran matrix stuff. compared to slow-as-molasses
ruby native Array#mean (which has to be implemented in ruby).
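For context, a pure-Ruby mean might look like the sketch below. The `mean` helper is illustrative only (Ruby's core Array does not ship one). Every element goes through an interpreted block call, which is the per-element overhead a compiled array library avoids.

```ruby
# Illustrative helper; not part of Ruby's core Array API.
def mean(values)
  raise ArgumentError, "empty input" if values.empty?
  # Start from 0.0 so integer inputs still produce a float result.
  values.inject(0.0) { |sum, v| sum + v } / values.length
end

puts mean([1, 2, 3, 4])
```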

------
georgeg
This is absolutely needed. Glad that the project has come back to life. Big up
to the developers.

------
piccadilly
God forbid you use existing mature tools appropriate to your task.

~~~
jtprince
And while we're at it, may he forbid the co-existence of emacs, vim, and
textmate; java, C++, C#; javascript and dart; KDE, gnome, enlightenment,
fluxbox, fvwm, icewm, metacity, compiz, wmii, monad; ubuntu, debian, gentoo,
redhat, and arch; bash, csh, and zsh ... nothing good ever comes out of
approaching a problem from a new angle.

~~~
mohawkjohn
This made my day.

------
jwallaceparker
This is terrific. This is newsworthy. Hope it gets legs.

------
timwang
python's concurrent programming is not that good. how is the situation in
ruby?

~~~
zhemao
It's the same. The Ruby 1.9.x (YARV) interpreter uses a GIL as well, although
like Python, there's a way for C extensions to release the GIL.
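A minimal sketch of what that means in practice, assuming the standard 1.9-style interpreter described above; `sum_of_squares` and `threaded_sums` are hypothetical names. The threads below produce correct results, but because only one thread runs Ruby code at a time under the GIL, pure-Ruby CPU-bound work like this should not be expected to speed up.

```ruby
# Hypothetical CPU-bound task: sum of squares from 1 to n.
def sum_of_squares(n)
  (1..n).inject(0) { |acc, i| acc + i * i }
end

# Run the task for each input on its own thread and collect the results.
# Thread#value waits for the thread to finish and returns its block's result.
def threaded_sums(inputs)
  inputs.map { |n| Thread.new { sum_of_squares(n) } }.map(&:value)
end

puts threaded_sums([3, 10]).inspect
```

Threads remain useful when the work is blocking I/O, or when it happens inside a C extension that releases the lock.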

