
Odo: Shapeshifting for your data - jonbaer
http://odo.readthedocs.io/en/latest/
======
danso
Very nice. Having a one-line wrapper for the fast, native ways to do bulk data
import for databases is fantastic on its own:
[http://odo.readthedocs.io/en/latest/perf.html](http://odo.readthedocs.io/en/latest/perf.html)

I ended up learning how to do this from the various SQL shells, but that was
a bit of a cognitive load, especially when the CSV files were complicated.
AFAIK, for example, SQLite won't properly ignore commas in quoted fields, so
you have to throw in an extra utility (like csvkit) to change the delimiter
before importing into SQLite.
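
For illustration, one way to sidestep the delimiter question is to let
Python's standard `csv` module handle the quoting and hand the parsed rows to
`sqlite3`. This is only a sketch, not what Odo does internally: the table
name, the sample data and the `load_csv` helper are all made up for the
example, and every column is stored as untyped text.

```python
import csv
import io
import sqlite3

# Hypothetical sample data: some fields contain commas inside quotes.
SAMPLE_CSV = 'id,name,city\n1,"Smith, Jane",Boston\n2,"Doe, John","Portland, OR"\n'


def load_csv(conn, table, text):
    """Parse CSV text with the csv module and bulk-insert it into SQLite."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Columns are created without a declared type, so values stay as text.
    columns = ", ".join('"{}"'.format(name) for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
    # executemany consumes the reader row by row.
    conn.executemany(
        'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), reader
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_csv(conn, "people", SAMPLE_CSV)
rows = conn.execute("SELECT id, name, city FROM people ORDER BY id").fetchall()
```

The quoted commas survive as part of the field values, at the cost of going
through Python rather than the database's native bulk loader.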

Sidenote: through Odo's homepage, I discovered this amazing library for
generating network graph diagrams, NetworkX:
[http://networkx.github.io/](http://networkx.github.io/)

~~~
flashman
NetworkX is very powerful, however for more hands-on network graph diagrams
(perhaps more art than science) I rely heavily on
[http://gephi.org](http://gephi.org).

(Here's my Gephi map of NSFW subreddits according to their links to each
other:
[http://electronsoup.net/nsfw_subreddits/](http://electronsoup.net/nsfw_subreddits/))

~~~
danso
Nice! Have definitely heard of Gephi but haven't made an effort to use it, out
of reluctance to learn a new GUI/system and because I rarely try to solve
problems that require graph analysis. For that subreddit visualization, how
much data prep/wrangling did you do (after making the API requests, of course)
before you worked with the data in Gephi?

~~~
flashman
Not too bad. I just had to get it in the format of one line per source-target
pair. The data came from /u/uglyasblasphemy in SQL format, though apparently
he's removed the link. Most of the fun was in arranging nodes with the move
tool. With the layout I used (Fruchterman-Reingold) the nodes form a circle
and you can pick up clusters and move them where you want them. Useful to make
things more meaningful to the eye, if probably less mathematically correct.
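
A minimal sketch of that "one line per source-target pair" reshaping, using
only the standard library. The `links` data and both helper names are
hypothetical, and the `Source`/`Target` header is an assumption about what a
spreadsheet-style edge import expects, so check it against your Gephi version.

```python
import csv
import io

# Hypothetical link data: each node mapped to the nodes it links to.
links = {"a": ["b", "c"], "b": ["c"], "c": []}


def to_edge_rows(adjacency):
    """Flatten an adjacency mapping into (source, target) pairs."""
    rows = []
    for source in sorted(adjacency):
        for target in adjacency[source]:
            rows.append((source, target))
    return rows


def write_edges(rows, handle):
    """Write one source-target pair per line, preceded by a header row."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(rows)


buf = io.StringIO()
write_edges(to_edge_rows(links), buf)
edge_csv = buf.getvalue()
```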

------
marchenko
What a great tool. I'm going to bookmark odo's homepage so that I don't have
to wait three seasons to find it again.

~~~
skyrw
Bigger nerd? You for making that joke, me for getting it.

~~~
GrinningFool
Statistically it was likely that at least one person in this audience would
get it. Therefore I think that OP wins this nerd throwdown.

Full disclosure: I still don't.

~~~
cpr
Throw us a clue?

~~~
danso
Odo is the name of a shapeshifter on Star Trek: Deep Space 9, who doesn't know
his origins or homeworld until around season 3. I guess the homeworld isn't
revisited as a major plot point until season 6? I don't remember...seasons 6
and 7 were honestly kind of a blur for me.

[http://memory-alpha.wikia.com/wiki/Odo](http://memory-alpha.wikia.com/wiki/Odo)

~~~
goatlover
I think prior to season 6, the Klingons and Romulans tried to destroy the
Dominion by taking out Odo's homeworld in a sneak attack. Odo's people turned
out to be the rulers of a large empire that sought to conquer solids
(non-shapeshifters).

~~~
moosingin3space
Cardassians, not Klingons.

------
RileyKyeden
Does it have to travel to its homeworld to learn the full range of its
shapeshifting abilities?

~~~
cholantesh
Yes, and it also has to be electrocuted repeatedly before it runs for the
first time.

------
ElComradio
Also today is the 23rd anniversary of the premiere of DS9.

~~~
oneplane
I love how more of the DS9 community got sparked to life after Netflix decided
to put it up.

~~~
TeMPOraL
TNG and DS9 being on Netflix is the very reason I signed up and stayed.

------
makmanalp
I've loved the idea of Odo since I first saw it, but I've always been wary:
the devil is in the details. I'm curious when and how information gets lost
during each transfer because of the peculiarities of each format, or how those
decisions are made and exposed.

Min/max limits, truncation, nulls, floating point precision, encodings,
picking CHAR vs VARCHAR or string vs categorical, metadata like indices, etc.
are some of the hard problems behind bulk loading.
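
A few of these losses are easy to reproduce with the standard library alone.
The sketch below says nothing about how Odo itself behaves: the three helpers
are hypothetical stand-ins for a narrower float column, a CSV hop, and a
fixed-width text column that silently truncates.

```python
import csv
import io
import struct


def through_float32(value):
    """Round-trip a Python float (64-bit) through a 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def through_csv(row):
    """Write one row to CSV text and read it back."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    buf.seek(0)
    return next(csv.reader(buf))


def through_char(text, width):
    """Mimic a hypothetical CHAR(width) column that silently truncates."""
    return text[:width]


# 0.1 cannot be represented identically in 32 and 64 bits.
precision_lost = through_float32(0.1) != 0.1
# None and "" both come back as "", and the integer comes back as a string.
null_lost = through_csv([None, "", 1])
# Anything past the column width is dropped.
truncated = through_char("shapeshifter", 5)
```

Each hop is individually reasonable, which is what makes the loss hard to
notice once several hops are chained.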

------
misterdata
Very nice! I have been working on a similar library in Swift [1] which does
this, but also has a nice user interface on Mac and (soon) iOS [2].
Coincidentally also uses a Star Trek themed name :-)

[1] [http://github.com/pixelspark/warp](http://github.com/pixelspark/warp) [2]
[https://warp.one](https://warp.one)

~~~
panic
Neat! Unless I'm misunderstanding, I think Odo is more general, though, in
that you don't need to pass data through any sort of unified Dataset protocol:
there's a graph of direct translators between formats.
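
As a rough sketch of the "graph of direct translators" idea: treat each format
as a node, each direct converter as an edge, and search for a chain between
two formats. The format names and edges below are invented for illustration,
and plain breadth-first search (fewest hops) is a simplification, since a real
implementation may weight edges by conversion cost.

```python
from collections import deque

# Hypothetical edges: each key converts directly to the listed formats.
CONVERTERS = {
    "csv": ["dataframe", "sql"],
    "json": ["dataframe"],
    "dataframe": ["csv", "sql", "hdf5"],
    "sql": ["csv", "dataframe"],
    "hdf5": ["dataframe"],
}


def conversion_path(graph, source, target):
    """Breadth-first search for the fewest-hop chain of converters."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of converters reaches the target


path = conversion_path(CONVERTERS, "json", "hdf5")
```

With this layout, adding a new format only requires one converter to any
existing node, rather than an adapter to a shared intermediate protocol.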

------
merqurio
I use odo a lot to take any data source into pandas or blaze and it's been
great so far. For me it's the lazy and easy way of moving data around.

------
eximius
I wish it supported large CSV to partitioned Parquet. THAT is something I need
a good solution for.

~~~
quasiben
Are you familiar with fastparquet
([https://github.com/dask/fastparquet](https://github.com/dask/fastparquet))
and pyarrow
([https://pyarrow.readthedocs.io/en/latest](https://pyarrow.readthedocs.io/en/latest))
?

~~~
ColanR
Your link is malformed.
[https://github.com/dask/fastparquet](https://github.com/dask/fastparquet)

~~~
ColanR
...and it's fixed.

------
kevinwang
Oh my god it looks beautiful

