
Scientists make huge dataset of nearby stars available to public - daegloe
http://news.mit.edu/2017/dataset-nearby-stars-available-public-exoplanets-0213
======
aphykit
For those looking for a larger dataset (~500GB), the European Space Agency
made a similar dataset available last September, with 1B star positions and
magnitudes, including 2M with velocity information.[1] One nice result is an
impressive sky map generated simply from a density plot of the object
positions.[2]

[1]
[https://www.cosmos.esa.int/web/gaia/dr1](https://www.cosmos.esa.int/web/gaia/dr1)
[2]
[http://sci.esa.int/science-e-media/img/61/Gaia_GDR1_Sky_Map_...](http://sci.esa.int/science-e-media/img/61/Gaia_GDR1_Sky_Map_HD.png)
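
A "density plot of the object positions" amounts to binning each (RA, Dec) pair into a grid and counting. Below is a minimal stdlib-only sketch of that idea, assuming positions in degrees; the function name `density_map`, the grid size, and the synthetic positions are all illustrative assumptions and have nothing to do with the actual Gaia pipeline. Equal-sized RA/Dec cells also do not cover equal areas of sky, which a real map would handle with a proper projection.

```python
import random

def density_map(positions, n_ra=72, n_dec=36):
    """Count objects per grid cell; positions are (ra_deg, dec_deg) pairs."""
    grid = [[0] * n_ra for _ in range(n_dec)]
    for ra, dec in positions:
        # Clamp so that ra == 360 or dec == +90 lands in the last cell.
        col = min(int((ra % 360.0) / 360.0 * n_ra), n_ra - 1)
        row = min(int((dec + 90.0) / 180.0 * n_dec), n_dec - 1)
        grid[row][col] += 1
    return grid

# Synthetic stand-in for a catalogue (NOT real data): uniform RA,
# Dec clustered around 0 degrees to mimic a dense band.
random.seed(0)
fake_positions = [
    (random.uniform(0.0, 360.0), max(-90.0, min(90.0, random.gauss(0.0, 20.0))))
    for _ in range(100_000)
]
grid = density_map(fake_positions)

# Crude text rendering: denser cells get darker characters.
shades = " .:-=+*#%@"
peak = max(max(row) for row in grid)
for row in reversed(grid):  # highest declination at the top
    print("".join(shades[count * (len(shades) - 1) // peak] for count in row))
```

With a billion real positions the same counting step applies, just with a much finer grid and an image library instead of characters.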

------
M_Grey
This is science at its very best; open, a rising tide that raises all boats.
Most of all though, this kind of thing is going to be very exciting for a
particular kind of young person who should be encouraged at every turn.

~~~
billforsternz
Agreed. Also, it's pretty exciting to 56-year-old me (I happen to be looking
for something to sink my teeth into at the moment).

------
mcbits
If anyone's worried that this "huge dataset" might be too much to handle on a
home computer, it looks like it's around 5 MB if I clicked the right links.

~~~
dghughes
That's like the Large Synoptic Survey Telescope (LSST): it takes pictures of
the entire southern sky every few nights using a 3 gigapixel camera. It will
do that for ten years and, after all that, will have gathered only 15TB of
data. If you do the math it's more like 46TB, but even so that seems low.

~~~
mcbits
I assumed this was going to be thousands of images and 10s-100s of GB, so I
started the download just to see how much it would be. Was pleasantly
surprised when it was done as soon as it started.

------
bpodgursky
This is awesome. I've been using an older dataset for a visualization of
nearby stars + exoplanets
([http://uncharted.bpodgursky.com/](http://uncharted.bpodgursky.com/))

Going to look into using this though.

~~~
arunix
That is very cool, but why do G type stars render as red?

------
pbhjpbhj
There was a recent "The Sky at Night" episode
[http://www.bbc.co.uk/programmes/b088d1pv](http://www.bbc.co.uk/programmes/b088d1pv)
which shows, to the casual observer, how far we've come in cataloguing stars,
with a particular focus on the Milky Way.

------
SloopJon
I followed the links to this page:

[http://home.dtm.ciw.edu/ebps/data/](http://home.dtm.ciw.edu/ebps/data/)

When I try to download the TAR files, I'm sent to a Google Drive page where
the download button gets a single 5 MB file.

~~~
mattkrause
That seems about right.

You're getting exactly what it says on the tin: 64,480 velocity measurements,
collected from a total of 1,699 stars. They've thrown in the observation date,
a measure of uncertainty, and a few other values.

It's not "Big Data" in the sense of terabytes of ad clicking data. It is a
"big data set" if you consider that each and every entry in the data set took
a fair amount of time to acquire and process, not to mention the ungodly-
expensive equipment and non-trivial work to set it up correctly.

~~~
SloopJon
Ah, the downloaded file didn't have a .tar extension, and looking at the file
with the head utility made it seem like it was just a single TSV file. It is
in fact a TAR file that expands into 1,699 .vels files.

Thanks for the explanation.
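
For anyone else caught out by the missing extension: Python's `tarfile` module looks at the file contents, not the name, so it can confirm what the download really is. A minimal sketch, assuming the archive holds `.vels` members as described above; the helper name `list_vels_members` is made up for illustration.

```python
import tarfile

def list_vels_members(archive_path):
    """Return the sorted names of the .vels files inside a tar archive.

    The check is based on file contents, so a missing .tar extension
    does not matter.
    """
    if not tarfile.is_tarfile(str(archive_path)):
        raise ValueError(f"{archive_path} does not look like a tar archive")
    # "r:*" lets tarfile detect compression (none, gzip, bz2, xz) by itself.
    with tarfile.open(str(archive_path), mode="r:*") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.endswith(".vels")
        )
```

Calling `len(list_vels_members("download"))` on the saved file (the filename here is hypothetical) should give the number of per-star files, and `archive.extractall()` would unpack them.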

------
hoziyw
Kind of funny they say they're making the data public... 61,000 raw Keck
observations should be... ~600 GB?

I kind of wonder if they were forced. Most proprietary observations with Keck
or Hubble or whatever come with a little caveat that you only get the data for
1 or 2 years before it gets publicized. I'm guessing that running their code
on their released data set _after_ they've published is gonna come up with a
big fat nothing new. But maybe mixing with dupe detections from RAVE / LAMOST
/ SDSS / APOGEE (massive spectroscopic surveys free to the public after
proprietary times) and running their code might turn up one... highly doubtful
though.

Anyways, there's loads of huge data sets out there that are free. TGAS-Gaia
was free to the public as soon as it was made free to the community, so that's
where all the big boys are playing right now (The _really_ big fish are
fighting it out with the Gaia-raw, minus the Tycho supplement that makes
TGAS). This summer Gaia2 is coming out to the public the same time as the
community again -- people are already setting up war rooms around the world to
hack out papers in week-long sprints the day after release.

If you're super into planets for some reason, Kepler's been free for years.
Google "NASA MAST" for all NASA data. Google "Vizier CDS" for any European
catalog. (Gaia has light curves, I guess, but I'm pretty sure they're not good
enough yet.) If there's a specific telescope you like... like CFHT, for
example, just google it and they've got free data.

~~~
coderdude
Of all the things the world has to offer to be cynical about... this is your
beef.

~~~
shepardrtc
Behind the cynicism seems to be useful information. While negative attitudes
are not helpful, they can often motivate people to speak up about things that
normally would not be widely known. For instance, the gist of what this person
is saying is that the released information probably has already been combed
over. And that if you want to spend your time actually looking for something,
there are better sets out there. But also be prepared to get behind entire
teams doing the same thing. Not that you wouldn't find anything, but otherwise
you get the impression that there's only a handful of researchers looking
through this stuff. Therefore, I actually found the comment useful.

------
cr0sh
I'm sitting here wondering if any deep learning methods could be used on this
(and similar) datasets? I don't know enough about what exactly this data is or
what kind of questions could be "asked" of it to build a model for such a
purpose, but maybe someone else would...?

------
castis
Small nitpick: if anyone who can make a change to that page ever sees this,
please remove the scroll hijacking on
[http://home.dtm.ciw.edu/ebps/](http://home.dtm.ciw.edu/ebps/).

------
xchip
If only Kepler had had this!!

------
IndianAstronaut
I am curious as to what sorts of image processing and algorithms are used to
analyze this data set. Anomaly detection in images? Classification of images
based on known extrasolar planets?

~~~
welterde
So first off, these are not images of stars, but spectra. From these spectra
you can extract quite a number of observables, such as radial velocity (how
fast the star is coming towards/moving away from us) in this case.

Since planets orbiting stars are not massless, both the star and the planets
orbit a common center of mass, causing the star to periodically move
toward/away from us (much more complicated of course with multiple planets,
plane of the planets, oscillations in the star itself, etc.) - which we can
measure.
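
For a single planet, the size of that wobble has a standard closed form: the star's radial-velocity semi-amplitude is K = (2πG/P)^(1/3) · m_p sin i / ((M_star + m_p)^(2/3) · sqrt(1 − e²)). Here is a rough sketch of it, just to give a feel for the numbers. The function name and the rounded Sun/Jupiter/Earth constants are illustrative assumptions, and none of this comes from the released dataset or its analysis code.

```python
import math

G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30       # kg (rounded)
M_JUPITER = 1.898e27   # kg (rounded)
M_EARTH = 5.972e24     # kg (rounded)
YEAR = 365.25 * 86400  # seconds in a Julian year

def rv_semi_amplitude(period_s, m_planet, m_star, inclination_deg=90.0, ecc=0.0):
    """Radial-velocity semi-amplitude (m/s) of a star with one orbiting planet.

    inclination_deg = 90 means the orbit is seen edge-on (largest signal).
    """
    sin_i = math.sin(math.radians(inclination_deg))
    return (
        (2 * math.pi * G / period_s) ** (1 / 3)
        * m_planet * sin_i
        / (m_star + m_planet) ** (2 / 3)
        / math.sqrt(1 - ecc ** 2)
    )

# Solar-system analogues on circular, edge-on orbits.
k_jupiter = rv_semi_amplitude(11.86 * YEAR, M_JUPITER, M_SUN)
k_earth = rv_semi_amplitude(1.0 * YEAR, M_EARTH, M_SUN)
print(f"Jupiter analogue: {k_jupiter:.1f} m/s")
print(f"Earth analogue:   {k_earth:.2f} m/s")
```

A Jupiter analogue comes out at roughly 12 m/s, while an Earth analogue is below 0.1 m/s, which is why instrument precision and long-term stability matter so much for low-mass planets.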

To find exoplanets using this method you need spectrographs that are very
stable in the long term (especially if you are interested in lower-mass
planets) and can measure radial velocities in the range of 1 m/s, such as
HARPS [1].

[1]
[http://www.eso.org/sci/facilities/lasilla/instruments/harps/...](http://www.eso.org/sci/facilities/lasilla/instruments/harps/inst/description.html)

------
coldcode
When I was little I dreamed of visiting stars. But having more data about them
will have to do for now.

