
Introducing the Mozilla Science Lab - Lightning
https://blog.mozilla.org/blog/2013/06/14/5992/
======
glesica
Soooo... what exactly do they plan to DO? The entire article is filler,
there's no actual information. All I can gather is that they might tell
scientists about the "open web", whatever that is. Couldn't they just have
done that in the blog post and called it a day?

~~~
toufka
It would be really nice to have an open, human-readable scientific data
format. Basically a CSV+ with a single extra piece of metadata: mandatory,
uniform SI units.

Being able to read/write/import/export that between desktop machines and the
web would aid science enormously.
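A rough sketch of what that could look like (entirely hypothetical, not an existing standard): a units row right under the header, and a reader small enough to live anywhere:

```python
import csv
import io

# Hypothetical "CSV+" sample: the second row is the single piece of
# mandatory metadata, an SI unit for each column.
sample = """temperature,pressure,volume
K,Pa,m^3
273.15,101325,0.0224
373.15,101325,0.0306
"""

def read_csv_plus(text):
    """Parse a CSV whose second row carries mandatory units.

    Returns (units, rows): units maps column name -> unit string,
    rows is a list of {column: float} dicts.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    units = dict(zip(header, next(reader)))
    rows = [dict(zip(header, map(float, row))) for row in reader if row]
    return units, rows

units, rows = read_csv_plus(sample)
```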

AND it seems like a perfect job for the Mozilla foundation. I have no idea
from their (uninformative) article what they're actually doing...

~~~
jpallen
I don't really know why human-readable is important here, but open data
formats already exist with good support for the things that matter: large
data sets, compression, and multiple platforms and languages. I'm not deeply
familiar with any of them, but some I'm aware of:

[http://www.unidata.ucar.edu/software/netcdf/](http://www.unidata.ucar.edu/software/netcdf/)
[http://cdf.gsfc.nasa.gov/](http://cdf.gsfc.nasa.gov/)

~~~
toufka
Entirely different concept. I want _simple_ data to be easily distributable
and interpretable. Make a format that lets any tool read and write the data
behind these very basic graphs:

[http://download.cell.com/images/journalimages/2211-1247/PIIS...](http://download.cell.com/images/journalimages/2211-1247/PIIS2211124713001964.gr3.lrg.jpg)
[http://download.cell.com/images/journalimages/2211-1247/PIIS...](http://download.cell.com/images/journalimages/2211-1247/PIIS2211124713001721.gr2.lrg.jpg)
[http://download.cell.com/images/journalimages/2211-1247/PIIS...](http://download.cell.com/images/journalimages/2211-1247/PIIS2211124712004299.gr4.lrg.jpg)

Most science (the kind that really _needs_ hackers' help) is not the big sexy
datasets, but the really basic graphs, viewable in HTML5/CSS, that are
otherwise done in Excel. But it must be done in a scientifically rigorous way:
retain units and labels, allow those units to be manipulated, and allow data
from different datasets to be shown and hidden for comparison.

Think of a way to download and view a melting point curve dataset without ANY
user configuration. Or an enzyme activity curve. Or a drug dosage response
curve. VERY basic data, but it _must_ maintain and interpret a label and a
unit in addition to the number itself.

~~~
jpallen
I don't necessarily disagree with you, I just always prefer to sit more on the
side of 'don't reinvent the wheel if you don't have to'. I don't think
creating an n-th competing format is the answer here.

In the examples in your final paragraph, surely what is needed there is a new
tool, not a new data format?

I think the larger 'problem' (if it really is a problem, and not just hackers
wanting to believe they know the One True Way) is that there is no such thing
as the typical 'scientist'. Every small sub-discipline has its own needs, its
own standards that it has settled on and its own toolchain. There is
definitely room for improvement within every one of these small areas, but
there's no global problem that needs solving.

~~~
toufka
Part of this is very true. Part of it is that science has already done this -
200 years ago with the development of scientific units. We already have a
great universal format - it is just not digital in nature and doesn't exist as
a 'file format'.

------
galapago
I can't find the concrete idea behind this project.

~~~
Zikes
Many areas of science can be and currently are improved through computing.
Folding@Home is a great example of this, as are various projects that aim to
use the web to detect planets by having people analyze data for transit
anomalies.

Unfortunately, many areas of science aren't funded well enough to support a
team of CS majors to work with the scientists, and indeed many scientists
might not even be able to identify what aspects of their work could be
improved through computing.

This project seems to be aimed at educating scientists to overcome those
hurdles.

~~~
toufka
Educating us is not the problem. We know. Just not much we can do about it.

We don't have funding to hire coders/developers/designers. And even if we did,
we'd likely spend it on that new microscope instead.

Much of what we need doesn't require computational power, but requires a good
input parser, a reasonable UI, and some basic (though absolutely solid) math.
And a way to change a whole host of variables used in the computations without
too much effort. I have a set of a few projects that a mildly competent coder
could bang out in a relatively short time, and that would undoubtedly advance
the whole field they were aimed at. But that field is pretty much 20 people.
20 grad students, and their bosses don't care whether they count the data by
hand over a week or have a computer do it in 2 minutes. It's not worth
anyone's (unpayable) time. So I code it myself. It's crappy, it's rough, but
it gets the job done. And no one will ever be able to reuse that code.

You hackers want to do something good for the world? Set up a hackathon for
your local biology/chemistry grad students. In a weekend's time you could
write a few open-source apps that would rapidly and noticeably affect the
pace of cancer research and other basic research.

~~~
jk4930
> biology/chemistry

What are the low-hanging fruits I can aim for to get used to the problem
domain? What are the relevant pain points that justify some hard thinking and
coding?

In the end what I have in mind is basically this:
[http://biomind.com/AI_Against_Aging.pdf](http://biomind.com/AI_Against_Aging.pdf)

~~~
simonster
As far as neuroscience goes, we need the following:

\- An open source GUI for sorting spikes. When you put an electrode into the
brain, you often end up recording the "sounds" of multiple neurons firing. To
make sense of what's going on, you have to identify which spikes came from
which neurons. While there are fully automated algorithms for spike sorting,
they aren't widely used because people don't trust them to work on real
recorded data. Even if you use them, you probably want to check their results
later, and you need a good GUI to do so. Right now, the "state of the art" is
Plexon's Offline Sorter ([http://www.plexon.com/products/offline-
sorter%E2%84%A2](http://www.plexon.com/products/offline-sorter%E2%84%A2)),
which is slow, occasionally crashes, and costs thousands of dollars per
license. It's mostly a GUI, and replicating it would probably not require much
domain-specific expertise. The fact that it's closed source actively prevents
the development of better partially supervised spike sorting algorithms.

\- Better fMRI data analysis tools. See
[https://news.ycombinator.com/item?id=5239530](https://news.ycombinator.com/item?id=5239530)
for some previous discussion.

\- An open source replacement for MATLAB that is familiar to users of that
language, encourages modular design, and has a JIT that can achieve C-level
performance. I am betting that this is going to be Julia
([http://julialang.org/](http://julialang.org/)), but there's still plenty of
work to be done before Julia is ready for people without substantial
programming experience.

\- Better tools for visualizing high-dimensional data sets. I'm not actually
very familiar with what's been done in this domain, but I feel it's something
that isn't often attempted because there's little overlap between experienced
graphics programmers and scientists. A good way to visualize your data can
save you days spent performing statistics.
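On the spike-sorting point at the top of the list: the very first stage, threshold detection on the raw voltage trace, is simple enough to sketch. (Parameter values here are made up; a real sorter would then extract waveform snippets around each crossing and cluster them.)

```python
def detect_spikes(trace, threshold, refractory=30):
    """Return sample indices where the trace crosses below threshold.

    After each detection, skip `refractory` samples so one spike's
    waveform isn't counted more than once.
    """
    spikes, i = [], 0
    while i < len(trace):
        if trace[i] < threshold:
            spikes.append(i)
            i += refractory
        else:
            i += 1
    return spikes

# Synthetic trace: two well-separated spikes, plus one inside the
# refractory window of the first, which should be ignored.
trace = [0.0] * 100
trace[10] = -5.0
trace[15] = -5.0
trace[50] = -5.0
found = detect_spikes(trace, threshold=-4.0)
```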

~~~
jk4930
Looking at Plexon's GUI, this doesn't seem like something that could be done
in a hackathon; it would require the active participation of neuroscientists
as domain experts and end users. And they usually don't have the time (or
willingness) for that.

Regarding fMRI, I take your posting as something to talk about with the neuro
people.

My main takeaway from your posting is that better GUIs and visualizations are
needed. That matches what I found before (workflow, UI), and your posting is
another confirming data point. Thanks for the links.

Any hints on how to talk to biologists / neuroscientists? I still need to
learn more about their incentives (what they have to accomplish, and how to
help them do it), their constraints (what they can't do, or what's
unimportant to them), and their culture. Publications and funding, as usual,
but what is seen as high status and what is frowned upon?

~~~
pseut
Talk to grad students; more time, fewer commitments, still very knowledgeable,
but with less "turf" to defend.

------
ajays
Mozilla seems to have lost focus. Firefox still has massive memory leaks; it's
not uncommon for me to come home at the end of the day to find FF using 3.8GB
out of the 4GB I have, and my Linux box thrashing like crazy. Typically the
culprit is Facebook's homepage. But I'd love to see Mozilla spend their effort
on first making Firefox great. Memory leaks in FF aren't a new thing.

~~~
Zikes
Since this is a project of the Mozilla Foundation, not the Mozilla
Corporation, it does not necessarily "steal resources" from the Firefox
development team. What the Foundation chooses to do in its non-profit
capacity leaves the Corporation no more or less free to improve the browser,
except where the Foundation's projects, like this one, improve the concepts
and ideas of the internet and computing as a whole.

