Imagine if you had a heart-rate monitor and a sleep monitor.
The heart-rate monitor would know when you spiked yourself with caffeine, and the sleep monitor would know you've only slept 4 hours each of the past 3 nights.
An alert pops up on your computer: "hey, you need rest. Stop drinking so much caffeine. Eat an apple and take a 20 minute nap, you'll feel better."
Those are just two data sources.
Harvest (time tracking) + RescueTime (productivity) + Runkeeper would be super valuable too. You'd know that you're 10% more productive on days that you run.
Better to have a fridge monitor that pops up when you take the Red Bull out of the fridge and before you drink it. (Or coffee machine...or phone with GPS that recognises that you just entered starbucks...the trick is it should pre-empt the caffeine to affect your behaviour in the moment).
Warning: Shameless plug for project that isn't finished
This, among other things, is the reason I started building http://personable.me.
Ultimately, I see it as being the receptacle for Withings data, Fitbit data, etc., and I'm working on those now. I just finished up (but haven't deployed) Webhooks, with the idea that input plus webhooks for output should allow a personal API to interact with other systems whenever the API's data is updated.
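To make the webhook idea concrete, here's a minimal sketch of the dispatch side, assuming subscribers register a callback URL and get a JSON POST whenever a data point changes (the subscriber list and payload shape are hypothetical, not Personable's actual API):

    import json
    import requests  # pip install requests

    # Hypothetical in-memory subscriber list; a real service would persist this.
    subscribers = ["https://example.com/hooks/weight-updated"]

    def notify_subscribers(event):
        """POST the updated data point to every registered webhook URL."""
        payload = json.dumps(event)
        for url in subscribers:
            try:
                requests.post(url, data=payload,
                              headers={"Content-Type": "application/json"},
                              timeout=5)
            except requests.RequestException:
                pass  # a real service would retry with backoff

    # Fired whenever a source (Fitbit, Withings, ...) pushes new data:
    notify_subscribers({"type": "weight", "value": 72.5, "unit": "kg",
                        "timestamp": "2013-07-01T08:30:00Z"})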
Nice idea. What worries me is having to log into what looks like a cloudy service which I imagine stores my personal data, accessed using other cloud credentials. I get that cloud is where the burgers are good at the moment, but my personal preference is for something I can put onto my own server. At home.
To that end I've been working on my own analysis tool for personal data. Most of the work has been on a Windows desktop app, and data import is limited to CSV. But data is synchronised to my phone.
So, the source is open, if that helps... or, at least was, and will be again. I closed it down temporarily while I separate the 'Personable.com' source from the 'Personable' source, but soon, you'd be able to clone the repo and stand up your own instance of it and host the data yourself, if you preferred.
Interact looks neat, but I suppose this has a slightly different objective. For Personable, the aim is that the API interacts with other services you use. Perhaps that makes your own life easier (e.g., enter a personal API endpoint instead of having to fill out a profile, and let the website read in the data you let it access), or perhaps it makes the lives of others easier (e.g., your friends want to know where you are, but don't want to check Facebook, Foursquare and Twitter to see which check-in is the most current). Either way, all of that interaction depends on at least some of the data being net-accessible.
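For the profile-autofill case, the consuming site would just fetch whatever fields you've chosen to expose; a sketch, with a made-up endpoint and field names:

    import requests

    # Hypothetical personal API endpoint the user pastes into a signup form.
    PROFILE_URL = "https://personable.me/api/v1/someuser/profile"

    resp = requests.get(PROFILE_URL, timeout=5)
    resp.raise_for_status()
    profile = resp.json()

    # The site reads only what the user has exposed.
    print(profile.get("name"), profile.get("location"))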
I haven't gotten into what should be public vs. what should be private yet. Thus far, the advice for Personable is to treat it like any other cloud service and not put anything there you wouldn't want the world to read; privacy options are forthcoming as well.
That's fair enough and adds good perspective. Will keep an eye on your progress. Interestingly, open-sourcing Interact is still doing my head in. If I ever finish it I'd like to make some money from it. But at the same time I understand that I get zero security cred for closed source. Happily I can delay that decision for a while...
If all goes well, I see Personable being a hosted API provider for those who don't want to bother with hosting it themselves, with the source code available for those who don't mind hosting it, or who don't want to relinquish control -- similar to how WordPress operates currently.
There's no inherent reason you can't profit from an open source service, especially if you're able to provide value to those willing to pay for the convenience of not having to run the service themselves, and/or those unable to.
Good luck either way. Closed source may not be the death knell you expect.
I'm working on some software to record as much data as possible, from as many sources as possible, into one unified form so that you can actually analyze it.
I want the data sources to include far more than just health data--GPS, title of active computer window, sms history, file modification times, device and sensor readings, snapshots from your webcam, etc--really anything that you can record.
I thought about it for a while, and this is the best I could come up with:
- All data is just a stream of "events" from different "sources" for any given "data type"
- Sources are things like my desktop, my laptop, my phone, my camera, etc
- Data Types are things like global position, cpu usage, webcam snapshots, webpage text, etc. Anything really.
- Events are just a timestamp plus zero or more "lightly typed" "fields".
- By "lightly typed" I mean the field is either: binary data, searchable text, a float, or an integer
- Fields give the actual data for the event. If there are no fields, it's a countable event (e.g., recording every heartbeat: all you need is a timestamp). For something like global position, you would have latitude and longitude fields. For another data type called "semantic location", you'd have just a searchable text string with labels like "car", "work", "home" or whatever other labels you have. Fields can also be binary data (photos, videos, etc.). I'll probably try to make lots of data types, each with very few fields, to make it easier to work with the data (i.e., it's nice if there's only one numeric field in any given data type, since then it's easy to graph over time). There's a rough sketch of the model below.
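A minimal sketch of that model in code, assuming the four "light" field types described above (the names are illustrative, not a finished schema):

    import time
    from dataclasses import dataclass, field
    from typing import Dict, Union

    # The four "lightly typed" field kinds: binary, searchable text, float, int.
    FieldValue = Union[bytes, str, float, int]

    @dataclass
    class Event:
        """One event: a timestamp plus zero or more named fields."""
        timestamp: float = field(default_factory=time.time)
        fields: Dict[str, FieldValue] = field(default_factory=dict)

    # A countable event: just a timestamp (e.g. one heartbeat).
    heartbeat = Event()

    # A "global position" event from the "phone" source.
    position = Event(fields={"latitude": 52.52, "longitude": 13.40})

    # A "semantic location" event: one searchable text field.
    place = Event(fields={"label": "work"})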
I'm currently collecting all of the events via scripts specific to each data type, and then emailing those events (serialized as JSON) to my own server, which just inserts them into a SQLite database specific to that source/data type pair.
The nice thing about this approach is that anyone can easily send an email from any language, and if you don't want to use my backend, you can just send the events to your own email server and then do whatever you want with them once they're there.
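The storage side of that pipeline is simple enough to sketch; this assumes one SQLite file per source/data-type pair and the light typing above (the table layout is my guess, not the author's actual schema):

    import json
    import sqlite3

    def store_events(source, data_type, raw_json):
        """Insert a batch of JSON-serialized events into the DB for this pair."""
        con = sqlite3.connect(f"{source}_{data_type}.db")
        con.execute("""CREATE TABLE IF NOT EXISTS events
                       (timestamp REAL, fields TEXT)""")
        for event in json.loads(raw_json):
            con.execute("INSERT INTO events VALUES (?, ?)",
                        (event["timestamp"], json.dumps(event.get("fields", {}))))
        con.commit()
        con.close()

    # e.g. the body of an email sent by the phone's GPS script:
    store_events("phone", "global_position",
                 '[{"timestamp": 1372665000, "fields": {"latitude": 52.52, "longitude": 13.40}}]')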
Another benefit is that you and I don't have to agree about exactly what fields and data types there are. If you call your gps data "gps" and I call mine "global position", well, who cares? It's our own personal data. If this takes off, I'll make guidelines about what to call what information and how to format it for easier interoperability.
We are working on solving some of this, specifically for health data, at http://humanapi.co - you can connect your various data sources, and we then give you a central personalized API you can query.
(full disclosure, I am the founder of this company)
It is free for end users to sign up and use as a personal API (we already have around 50 services). What we charge developers for is building multi-user applications on top of it.
If not, you can (eventually) use Personable.me, but I'm still in the process of integrating enough sources that one could realistically use it for health analysis.
Sorry, yes, I should have qualified that. Yes, I'm building that. It's only about a week's worth of development thus far, but I've got a Fitbit and a Withings scale on the way to start integrating those into the API as well.
The problem with that approach is that it's not sustainable. It might work if you limit yourself to health data, but then you're losing a lot of possibilities, and if you try to expand, you won't be able to add every single API out there and write intelligent algorithms for every possible view of the data.
We need to standardize on common formats so that anyone can publish in them (either original data or a bridge from a custom API) and anyone can write code to analyze and augment it.
Some fields - particularly in the "sciences" - are already linking massive datasets together, enabling new applications they had never even considered, but here in the common "web app" world we're still stuck with non-standard APIs and custom, non-extensible JSON formats.
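As a toy example of what such a bridge might look like, mapping one vendor's custom JSON into a shared event shape (both formats here are invented for illustration):

    # Hypothetical custom payload from some fitness API.
    vendor_record = {"hr_bpm": 62, "measured_at": "2013-07-01T08:30:00Z"}

    def to_common_format(record):
        """Bridge a vendor-specific record into a shared, extensible shape."""
        return {
            "type": "heart_rate",
            "timestamp": record["measured_at"],
            "fields": {"bpm": record["hr_bpm"]},
        }

    print(to_common_format(vendor_record))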
Personis supports an accretion/resolution approach to reasoning about people, places and devices. It was designed to support user control of both the information held about them and the way that it is used.
In computing, linked data (often capitalized as Linked Data) describes a method of publishing structured data so that it can be interlinked and become more useful.
It's the concrete implementation of the Semantic Web: using a common model (RDF[1]) to link different data sources/pools, query them in a standard way using SPARQL and similar technologies, and even perform automated reasoning to discover new facts about the data.
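For a taste of what that looks like in practice, here's a minimal rdflib query over a small inline graph (FOAF is a real RDF vocabulary; the data itself is made up):

    from rdflib import Graph  # pip install rdflib

    TURTLE = """
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    <http://example.org/me>  foaf:name "Alice" ;
                             foaf:knows <http://example.org/bob> .
    <http://example.org/bob> foaf:name "Bob" .
    """

    g = Graph()
    g.parse(data=TURTLE, format="turtle")

    # SPARQL: who does each person know, by name?
    results = g.query("""
        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        SELECT ?name ?friend_name WHERE {
            ?p foaf:name ?name ;
               foaf:knows ?f .
            ?f foaf:name ?friend_name .
        }
    """)
    for name, friend in results:
        print(name, "knows", friend)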
Sounds pretty cool. Most of the data sources I'm interested in aren't interested in publishing their data as RDF, but maybe it would be useful to have RDF be the output of an ETL process.
IPython Notebook is my go-to environment for munging small to medium data. Admittedly, it's not for non-programmers, but for personal hacks and hobby projects, it's great having Pandas, SciPy, scikit-learn, etc. at my fingertips in the browser.
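For example, the run-vs-productivity question from upthread is a few lines of Pandas once the exports are in CSV (the file names and columns here are whatever your exports happen to use, not a fixed format):

    import pandas as pd

    # Hypothetical CSV exports: one row per day.
    runs = pd.read_csv("runkeeper.csv", parse_dates=["date"])    # date, distance_km
    work = pd.read_csv("rescuetime.csv", parse_dates=["date"])   # date, productive_hours

    df = work.merge(runs, on="date", how="left")
    df["ran"] = df["distance_km"].fillna(0) > 0

    # Mean productive hours on run days vs. rest days.
    print(df.groupby("ran")["productive_hours"].mean())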
That's one of the big barriers to this actually becoming more than just a thought experiment. Lots of interesting data sources are locked up with either no export capability at all or no way to automate exports. For the power company example, the only way to get that data out is to login to their website and follow a convoluted path to a "download" button which you have to click twice for some reason.
While full automation is cool, having to download data once a month doesn't seem all that terrible, and it certainly shouldn't stop you from the larger project. You could have the system prompt you to do it.
Like balancing your checkbook once/month. (You see, we used to have these little books called "registers," and ...)
I have been pondering the OP's question for over two years, and this is the route I will probably take. Some websites like TicTrac do a reasonably good job of aggregating your stats from the web but, like the OP's electricity example, will obviously never be able to query every service.
Much of my quantifiable data is sent to Google Fusion Tables, but I do not feel this is a good long-term solution.
My intention is to define a base set of criteria for how such a database needs to be formatted, similar to another comment here, and then let anybody make their tools available for either "collection" or "visualisation/analysis". As long as some core fields are standardised, e.g. "quantity" and "date", the data can be easily analysed. Each individual would control their own database, either on their own host or as a DBaaS, but tools to collect and visualise data could be shared.
I am leaning towards a document store (e.g. CouchDB or Cloudant), as that would allow any tool to push data in without knowledge of the schema. One of the nice things about some DBaaS offerings is that you can easily create individual username/password pairs or API keys with specific permissions, so a third-party tool could write records but not necessarily read any of your data. A standardised database would also benefit from having other tools able to utilise it, unlike something like Google's Cloud Datastore (which I do like!). In particular, I am thinking about the CouchDB and ElasticSearch integration.
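Pushing a standardised record into CouchDB is just an authenticated HTTP POST; a sketch against a hypothetical per-user database, assuming the core "quantity"/"date" fields mentioned above (the URL and credentials are placeholders):

    import requests  # CouchDB speaks plain HTTP + JSON

    # Hypothetical per-user database; the tool gets write-only credentials.
    DB_URL = "https://username.cloudant.com/little_data"
    AUTH = ("collector-tool-key", "collector-tool-secret")

    record = {
        "type": "electricity",
        "quantity": 12.4,   # standardised core field
        "unit": "kWh",
        "date": "2013-07-01",
        "source": "power-company-csv",
    }

    resp = requests.post(DB_URL, json=record, auth=AUTH)
    resp.raise_for_status()
    print(resp.json())  # CouchDB returns {"ok": true, "id": ..., "rev": ...}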
So, why not just wait for apps like TicTrac or Saga to support every service? I have two reasons. Firstly, many of the other tools to aggregate data seem to have gone out of business. Secondly, there are some services I do not like to give third parties access to; email is one example, as is the ability to log keystrokes on my computer. However, I would like to see the summarised data from these services recorded with the rest of my Little Data.
Another option would be to dump everything to text files and upload them to Google's BigQuery, but I am leaning towards a shared tools / individual database model, as it would probably encourage better collaboration with other people.
Very interesting. I am building a service called Datalanche (~2 weeks till release) which sounds a lot like what you describe. I would love to further discuss this with you offline if you have time.
If so, please contact me: rpedela [at] datalanche [dot] com