

Little Data: How do we query personal data? - zrail
https://www.petekeen.net/little-data

======
iambateman
Imagine if you had a heart-rate monitor and a sleep monitor.

The heart rate monitor would know when you spiked yourself with caffeine.
Combine that with a sleep monitor that knows you've only slept 4 hours the
past 3 nights.

An alert pops up on your computer: "hey, you need rest. Stop drinking so much
caffeine. Eat an apple and take a 20 minute nap, you'll feel better."

Those are just two data sources.

Harvest (time tracking) + RescueTime (productivity) + Runkeeper would be super
valuable too. You'd know that you're 10% more productive on days that you run.
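
A minimal sketch of how that comparison could be computed, assuming you already have per-day exports from both services. The data shapes, the numbers, and the `run_day_lift` name are made up for illustration; none of these services' real APIs are used here.

```python
from datetime import date
from statistics import mean

# Hypothetical daily exports: productive hours per day (say, from RescueTime)
# and the set of days with a logged run (say, from Runkeeper).
productive_hours = {
    date(2013, 3, 4): 5.5,
    date(2013, 3, 5): 5.0,
    date(2013, 3, 6): 5.5,
    date(2013, 3, 7): 5.0,
}
run_days = {date(2013, 3, 4), date(2013, 3, 6)}


def run_day_lift(hours_by_day, run_days):
    """Relative difference in mean productive hours, run days vs. other days."""
    on_run = [h for d, h in hours_by_day.items() if d in run_days]
    off_run = [h for d, h in hours_by_day.items() if d not in run_days]
    if not on_run or not off_run:
        raise ValueError("need at least one run day and one non-run day")
    return mean(on_run) / mean(off_run) - 1.0


lift = run_day_lift(productive_hours, run_days)
print(f"{lift:+.0%} productive hours on run days")
```

This only shows a correlation between the two sources; it can't tell you whether running causes the difference.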

~~~
jacalata
Better to have a fridge monitor that pops up when you take the Red Bull out of
the fridge, before you drink it. (Or a coffee machine... or a phone with GPS
that recognises that you just entered Starbucks... the trick is that it should
pre-empt the caffeine to affect your behaviour in the moment.)

------
thejash
I'm working on some software to record as much data as possible, from as many
sources as possible, into one unified form so that you can actually analyze
it.

I want the data sources to include far more than just health data--GPS, title
of active computer window, SMS history, file modification times, device and
sensor readings, snapshots from your webcam, etc.--really anything that you
can record.

If anyone is interested, feel free to email me!

~~~
icebraining
What's the unified form?

~~~
thejash
I thought about it for a while, and this is the best I could come up with:

- All data is just a stream of "events" from different "sources" for any
given "data type".

- Sources are things like my desktop, my laptop, my phone, my camera, etc.

- Data types are things like global position, CPU usage, webcam snapshots,
webpage text, etc. Anything really.

- Events are just a timestamp + zero or more other "lightly typed" "fields".

- By "lightly typed" I mean the field is either binary data, searchable
text, a float, or an integer.

- Fields give the actual data for the event. If there are no fields, it's a
countable event (e.g., imagine recording every heartbeat: all you need is a
timestamp). For something like global position, you would have latitude and
longitude fields. For another data type called "semantic location" you have
just a searchable text string with things like "car", "work", "home" or
whatever other labels you have. Fields can also be binary data (like photos,
videos, etc). I'll probably try to make lots of data types, each with very few
fields, to make it easier to work with the data (i.e., it's nice if there is
only one numeric field in any given data type, because then it's easy to graph
over time).
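
The model in the list above could be sketched roughly like this. It is only an illustration: the `Event` class, its attribute names, and the sample values are assumptions, not the actual software being described.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Union

# "Lightly typed" field values: binary data, searchable text, float, or integer.
FieldValue = Union[bytes, str, float, int]


@dataclass(frozen=True)
class Event:
    source: str          # e.g. "phone", "laptop"
    data_type: str       # e.g. "global position", "heartbeat"
    timestamp: datetime
    fields: Dict[str, FieldValue] = field(default_factory=dict)

    def __post_init__(self):
        for name, value in self.fields.items():
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(
                value, (bytes, str, float, int)
            ):
                raise TypeError(
                    f"field {name!r} has unsupported type {type(value).__name__}"
                )

    @property
    def is_countable(self) -> bool:
        """An event with no fields is just a timestamp you can count."""
        return not self.fields


now = datetime(2013, 3, 1, 12, 0, tzinfo=timezone.utc)
heartbeat = Event("chest strap", "heartbeat", now)
position = Event("phone", "global position", now,
                 {"latitude": 42.28, "longitude": -83.74})  # made-up coordinates
place = Event("phone", "semantic location", now, {"label": "work"})
```

Keeping each data type down to one or two fields, as suggested above, means a plot over time only needs `timestamp` and a single numeric value.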

I'm currently collecting all of the events via some scripts that are specific
to each data type, and then emailing those events (serialized as json) to my
own server, which just inserts the events into a sqlite database specific to
that source/data type pair.

The nice thing about this approach is that anyone can easily send an email
from any language, and if you don't want to use my backend, you can just send
the events to your own email server and then do whatever you want with them
once they're there.
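
For a sense of what the email-to-SQLite round trip might look like, here is a rough sketch using only the Python standard library. The function names, the JSON layout, the table schema, and the file naming are all assumptions for illustration; actually sending the message (for example with `smtplib`) is left out.

```python
import json
import re
import sqlite3
import tempfile
from email import message_from_string, policy
from email.message import EmailMessage
from pathlib import Path


def event_to_email(source, data_type, timestamp, fields, to_addr):
    """Collector side: wrap one event as a plain-text email with a JSON body."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = f"event {source}/{data_type}"
    msg.set_content(json.dumps({
        "source": source,
        "data_type": data_type,
        "timestamp": timestamp,  # ISO 8601 string
        "fields": fields,
    }))
    return msg


def _safe(name):
    """Make a source or data type name usable in a file name."""
    return re.sub(r"[^A-Za-z0-9_-]", "_", name)


def store_email(root, raw_message):
    """Server side: parse the email and append the event to the SQLite file
    for its source/data type pair. Returns the path of that file."""
    msg = message_from_string(raw_message, policy=policy.default)
    event = json.loads(msg.get_content())
    path = Path(root) / f"{_safe(event['source'])}__{_safe(event['data_type'])}.sqlite"
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS events "
                "(timestamp TEXT NOT NULL, fields TEXT NOT NULL)"
            )
            conn.execute(
                "INSERT INTO events VALUES (?, ?)",
                (event["timestamp"], json.dumps(event["fields"])),
            )
    finally:
        conn.close()
    return path


with tempfile.TemporaryDirectory() as root:
    raw = event_to_email(
        "phone", "global position", "2013-03-01T12:00:00+00:00",
        {"latitude": 42.28, "longitude": -83.74},  # made-up coordinates
        "events@example.com",
    ).as_string()
    print(store_email(root, raw).name)
```

Binary fields don't fit in JSON directly; one option (again an assumption) would be to base64-encode them or send them as attachments.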

Another benefit is that you and I don't have to agree about exactly what
fields and data types there are. If you call your gps data "gps" and I call
mine "global position", well, who cares? It's our own personal data. If this
takes off, I'll make guidelines about what to call what information and how to
format it for easier interoperability.

------
andreipop
We are working on solving some of this, specifically for health data, at
[http://humanapi.co](http://humanapi.co) - you can connect your various data
sources, and we then give you a central personalized API you can query.

(full disclosure, I am the founder of this company)

~~~
iambateman
Cool! Is this going to be available to consumers in general?

edit: fwiw, I understand if it won't be.

~~~
bmelton
If not, you can (eventually) use Personable.me, but I'm still in the process
of integrating enough sources that one could realistically use for health
analysis.

~~~
iambateman
That's cool. Are you building it?

~~~
bmelton
Sorry, yes, I should have qualified that. Yes, I'm building that. It's only
about a week's worth of development thus far, but I've got a Fitbit and a
Withings scale on the way to start integrating those into the API as well.

------
hauk1
My startup is working on this: analysing and making sense of personal data.
Check it out: [http://sympho.me](http://sympho.me)

It works by connecting to services you might already use.

(full disclosure, I am the co-founder of this company)

~~~
kelvinn
Your product looks like it will be a neat service. Do you intend to also have
an API to contribute data (in addition to your "import" tool)?

~~~
hauk1
Thank you!

We have that planned down the pipeline :D

------
jbu
[http://chai.it.usyd.edu.au/Projects/Personis](http://chai.it.usyd.edu.au/Projects/Personis)

Project Description

Personis supports an accretion/resolution approach to reasoning about people,
places and devices. It was designed to support user control of both the
information held about them and the way that it is used.

also [https://github.com/jbu/personis](https://github.com/jbu/personis)

------
icebraining
_In computing, linked data (often capitalized as Linked Data) describes a
method of publishing structured data so that it can be interlinked and become
more useful._

[http://en.wikipedia.org/wiki/Linked_data](http://en.wikipedia.org/wiki/Linked_data)

~~~
zrail
Could you elaborate a little? Maybe include some links?

~~~
icebraining
It's the concrete implementation of the Semantic Web - using a common model
(RDF[1]) to link different data sources / pools and query them in a standard
way, using SPARQL and similar technologies, and even perform automated
reasoning to discover new facts about them.

[1] [http://www.rdfabout.com/intro/](http://www.rdfabout.com/intro/)

~~~
zrail
Sounds pretty cool. Most of the data sources that I'm interested in are not
interested in publishing their data using RDF, but maybe it would be useful to
have RDF be a result of an ETL process.
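
As a rough idea of what that ETL step could look like, here is a sketch that turns rows from a CSV export into N-Triples lines, which RDF stores can generally load. The `example.org` namespace, the column names, and the sample numbers are invented for illustration, and column names are assumed to be safe to use inside a URI.

```python
import csv
import io

BASE = "http://example.org/little-data/"  # hypothetical namespace
XSD = "http://www.w3.org/2001/XMLSchema#"


def literal(value):
    """Render a Python value as an N-Triples literal."""
    if isinstance(value, int):
        return f'"{value}"^^<{XSD}integer>'
    if isinstance(value, float):
        return f'"{value!r}"^^<{XSD}double>'
    escaped = (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def rows_to_ntriples(rows, kind):
    """Yield one triple per (row, column): subject is the row, predicate the column."""
    for i, row in enumerate(rows):
        subject = f"<{BASE}{kind}/{i}>"
        for column, value in row.items():
            yield f"{subject} <{BASE}schema#{column}> {literal(value)} ."


# Made-up export, standing in for something like a power company download.
export = "date,kwh\n2013-02-01,31.5\n2013-02-02,28.0\n"
rows = [
    {"date": r["date"], "kwh": float(r["kwh"])}
    for r in csv.DictReader(io.StringIO(export))
]
triples = list(rows_to_ntriples(rows, "power-usage"))
for line in triples:
    print(line)
```

A real pipeline would probably reuse an existing vocabulary for predicates instead of minting its own, which is where the interoperability benefit of linked data comes from.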

~~~
icebraining
Yeah, there are many tools already that can help you import different data
formats into an RDF datastore, such as Virtuoso:
[http://www.openlinksw.com/dataspace/doc/dav/wiki/Main/](http://www.openlinksw.com/dataspace/doc/dav/wiki/Main/)

------
po84
IPython Notebook is my go-to environment for munging small to medium data.
Admittedly, it's not for non-programmers, but for personal hacks and hobby
projects, it's great having Pandas, SciPy, scikit-learn, etc. at my
fingertips in the browser.

------
boddah7
The problem is that companies will always have more data and can do more with
that data than an individual.

------
skizm
How do I collect all my personal data? (without it being a major pain in the
ass)

~~~
zrail
That's one of the big barriers to this actually becoming more than just a
thought experiment. Lots of interesting data sources are locked up with either
no export capability at all or no way to automate exports. For the power
company example, the only way to get that data out is to log in to their
website and follow a convoluted path to a "download" button which you have to
click twice for some reason.

~~~
a3n
While full automation is cool, it doesn't seem like having to download data
once/month would be all too terrible, and it _certainly_ shouldn't stop you
from the larger project. You could have the system prompt you to do it.

Like balancing your checkbook once/month. (You see, we used to have these
little books called "registers," and ...)
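
A minimal sketch of that kind of monthly prompt, assuming the system keeps a record of when each manual export was last downloaded. The source names, dates, and the 31-day threshold are made up.

```python
from datetime import date, timedelta


def overdue_exports(last_download, today, max_age=timedelta(days=31)):
    """Return the names of sources whose last manual export is too old."""
    return sorted(
        name for name, when in last_download.items()
        if today - when > max_age
    )


# Hypothetical bookkeeping: when each manual export was last downloaded.
last_download = {
    "power company": date(2013, 1, 15),
    "bank statement": date(2013, 2, 20),
}

for name in overdue_exports(last_download, today=date(2013, 3, 1)):
    print(f"Time to download a fresh export from: {name}")
```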

------
rpedela
Why not use a backend as a service company for your "data soup"?

~~~
zrail
I'd like to self-host it if possible. Letting a 3rd party possess all of that
information at the same time seems risky to me.

~~~
nekgrim
Yahoo Pipes can be used for normalisation, but I don't know of any data mining
program for little data...

------
bmelton
Warning: Shameless plug for project that isn't finished

This, among other things, is the reason I started building
[http://personable.me](http://personable.me).

Ultimately, I see it as being the receptacle for Withings data, Fitbit data,
etc., and I'm working on those now. I just finished up (but haven't deployed)
Webhooks, with the idea that input + webhooks for output should allow a
personal API to interact with other systems when API data is updated.
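
To make the webhook idea concrete, here is a generic sketch of the pattern: subscribers register a URL, and each update is POSTed to them as JSON. This is not Personable's actual code or API; the `WebhookRegistry` class, the payload shape, and the URL are assumptions for illustration.

```python
import json
import urllib.request


class WebhookRegistry:
    """Keeps subscriber URLs and POSTs a JSON payload to each on update."""

    def __init__(self, send=urllib.request.urlopen):
        # `send` is injectable so the sketch can be exercised without a network.
        self._urls = []
        self._send = send

    def subscribe(self, url):
        self._urls.append(url)

    def notify(self, resource, data):
        body = json.dumps({"resource": resource, "data": data}).encode("utf-8")
        for url in self._urls:
            request = urllib.request.Request(
                url,
                data=body,
                headers={"Content-Type": "application/json"},
                method="POST",
            )
            self._send(request)


# Dry run: collect the requests instead of sending them.
sent = []
hooks = WebhookRegistry(send=sent.append)
hooks.subscribe("https://example.com/hooks/weight")  # hypothetical subscriber
hooks.notify("weight", {"kg": 80.5})                 # made-up reading
print(sent[0].full_url, sent[0].get_method())
```

With the default `urlopen` sender, one failing subscriber would raise and stop delivery to the rest, so a real version would want per-subscriber error handling and retries.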

~~~
Spearchucker
Nice idea. What worries me is having to log into what looks like a cloudy
service which I imagine stores my personal data, accessed using other cloud
credentials. I get that cloud is where the burgers are good at the moment, but
my personal preference is for something I can put onto my own server. At home.

To that end I've been working on my own analysis tool for personal data. Most
of the work has been on a Windows desktop app, and data import is limited to
CSV. But data is synchronised to my phone.

[https://www.wittenburg.co.uk/Entry.aspx?id=0a505400-5bf6-4a6...](https://www.wittenburg.co.uk/Entry.aspx?id=0a505400-5bf6-4a6d-b107-6b4b797f33ae)

~~~
bmelton
So, the source is open, if that helps... or, at least was, and will be again.
I closed it down temporarily while I separate the 'Personable.com' source from
the 'Personable' source, but soon you'll be able to clone the repo and stand
up your own instance of it and host the data yourself, if you preferred.

Interact looks neat, but I suppose that this has a slightly different
objective. To me, for Personable, the aim is that the API interacts with other
services you use. Perhaps for the purpose of making your life easier (e.g.,
enter a personal API endpoint instead of having to fill out a profile, and let
the website read in the data you let them access), perhaps for making the
lives of others easier (e.g., your friends want to know where you're at, but
don't want to check Facebook, Foursquare and Twitter to see which checkin is
the most current), but all of that interaction depends on at least some of
that data being net-accessible.

I haven't gotten into what should be public vs. what should be private yet,
so, thus far, the advice for Personable is to just treat it like you should
treat every other cloud service, and not put anything there you wouldn't want
the world to read, but privacy options are forthcoming as well.

~~~
Spearchucker
That's fair enough and adds good perspective. Will keep an eye on your
progress. Interestingly, open-sourcing Interact is still doing my head in. If
I ever finish it I'd like to make some money from it. But at the same time I
understand that I get zero security cred for closed source. Happily I can
delay that decision for a while...

~~~
bmelton
If all goes well, I see Personable as being a hosted API provider for those
who don't want to bother with hosting it themselves, with the source code
otherwise available for those who don't mind, or don't want to relinquish
control -- similar to how WordPress operates currently.

There's no inherent reason you can't profit from an open source service,
especially if you're able to provide value to those willing to pay for the
convenience of not having to run the service themselves, and/or those unable
to.

Good luck either way. Closed source may not be the death knell you expect.

------
m3h
I like the way this guy thinks :)

