I enjoyed reading this author's essay on why he doesn't think loading everything into relational databases is the best approach: https://beepb00p.xyz/unnecessary-db.html
I think I have workarounds for most of the issues he describes there. One example: I mostly create my base table schema automatically from the shape of the incoming JSON, and automatically add new columns when that JSON has keys I haven't seen before.
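A minimal sketch of that schema-from-JSON idea, using only the stdlib `sqlite3` (this is not the commenter's actual code; the table name, keys, and type mapping are illustrative):

```python
import sqlite3

# Map Python types of incoming JSON values to SQLite column types.
TYPE_MAP = {str: "TEXT", int: "INTEGER", float: "REAL", bool: "INTEGER"}

def upsert_schema(conn, table, record):
    """Create the table from the record's shape, adding columns for unseen keys."""
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (pk INTEGER PRIMARY KEY)")
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    for name, value in record.items():
        if name not in existing:
            ctype = TYPE_MAP.get(type(value), "TEXT")
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {ctype}")

def insert(conn, table, record):
    upsert_schema(conn, table, record)
    keys = list(record)
    placeholders = ", ".join("?" for _ in keys)
    conn.execute(
        f"INSERT INTO {table} ({', '.join(keys)}) VALUES ({placeholders})",
        [record[k] for k in keys],
    )

conn = sqlite3.connect(":memory:")
insert(conn, "tweets", {"id": 1, "text": "hello"})
insert(conn, "tweets", {"id": 2, "text": "again", "likes": 5})  # new key -> new column
rows = conn.execute("SELECT id, text, likes FROM tweets ORDER BY id").fetchall()
# earlier rows get NULL for the later-added 'likes' column
```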
Very thought-provoking writing (and software) here.
Ah I see, so your approach is using databases as an 'intermediate' layer, rather than the main source of 'truth'.
My main objection to databases is the friction of maintenance and the speed of prototyping, but if it's automatic and used as a cache, I agree it's reasonable. I'd still be worried about normalising something incorrectly, but that's my own sort of anxiety :)
I've got something similar: if you have a sequence of NamedTuples/dataclasses, you can get a database 'for free'.
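That "database for free" pattern could look roughly like this (a sketch with illustrative types; a real implementation would also handle dataclasses, dates, optional fields, etc.):

```python
import sqlite3
from typing import NamedTuple

class Visit(NamedTuple):
    url: str
    duration: int

# Derive the column list straight from the NamedTuple's type annotations.
SQL_TYPES = {str: "TEXT", int: "INTEGER", float: "REAL"}

def save(items, conn, table):
    tp = type(items[0])
    cols = ", ".join(f"{f} {SQL_TYPES[t]}" for f, t in tp.__annotations__.items())
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")
    placeholders = ", ".join("?" for _ in tp._fields)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", items)

conn = sqlite3.connect(":memory:")
save([Visit("https://news.ycombinator.com", 120), Visit("https://example.com", 5)],
     conn, "visits")
total = conn.execute("SELECT SUM(duration) FROM visits").fetchone()[0]
```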
It downloads your data from social media (or other online sites) into a single sqlite database. It can even draw relations between various accounts/people. I use it to back up my Google Photos.
I've focused on Google Takeout exports so far. Only gotten Hangouts and Voice working at this point.
I feel like all of us need to be working on a common codebase for all these parsers. I just found out a week ago that the Hangouts format changed significantly and totally broke my parser (which I actually adapted from https://bitbucket.org/dotcs/hangouts-log-reader/)
Yep, agree about the common codebase, that's why I'm trying to keep everything modular and thinking a lot about it. I describe this bit of my philosophy/design here.
I've only written parsers/exporters if I haven't found any (or any that wouldn't be a complete pain to adapt) so far. Google Takeout processing is a part of HPI, but I was going to extract it in a separate repository, so perhaps we both could benefit from it.
I suppose one thing that could be universally useful is common documentation about what we've discovered in reverse-engineering the formats. I often have extensive notes like this: https://github.com/NickSto/life-browser/blob/master/drivers/...
I remember finding a wiki about this a while back but I can't find it right now..
(Shakes fist at the RTM API, which would return one child task and many child tasks in different shapes, so it took me an extra half-hour to write a convoluted jq query.)
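The general fix for that one-vs-many quirk is a small normalizer; here's a sketch in Python rather than jq (the payload shapes below are made up to mimic the problem):

```python
def as_list(value):
    """Normalize an API field that is sometimes a single object, sometimes a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

# Hypothetical payloads mimicking the one-vs-many quirk:
single = {"taskseries": {"task": {"id": "1"}}}
many = {"taskseries": {"task": [{"id": "1"}, {"id": "2"}]}}

ids = [t["id"]
       for payload in (single, many)
       for t in as_list(payload["taskseries"]["task"])]
```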
I want to have an all-encompassing personal history for the same reasons I want good access and error logs. I want to see when something wrong happens, and I want the means to diagnose the problem and fix the damage.
A few good examples:
- Did I forget to write down any tax-deductible expenses?
- What is my net worth? Is my current lifestyle sustainable?
- Which parts of my city have I yet to explore?
- Can I search all my conversations with John across 3 messaging apps, email, SMS and phone?
I took a few jabs at these problems. Invariably, the services I consumed data from shut down their API or hid my data better. In some cases, I wanted to use a different service, but didn't feel like updating my scripts.
I'm currently looking into this again, but this time it's also a matter of privacy. I want to own all that data I create. I'd rather integrate tools under my control than services who actively try to hold my data hostage.
> Can I search all my conversation with John across 3 messaging apps, email, SMS and phone?
These are totally pain points for me / questions I'd love to have some answers for... Anyone tried to get the historic iOS location data to answer the first question?
Google Takeout gives me my location history, but getting it is a manual process.
Banks must offer an API. I just haven't looked into it yet.
YNAB's method is excellent for this. TLDR: you record your expenses for some time, and then you begin planning them ahead for each month semi-automatically, putting in money for costly purchases in advance and leaving some for when shit happens.
I've been kinda looking for an OSS/self-hosted implementation of the method, but none seem inviting so far.
(G2A seems to have a Steam key for the old version of YNAB, under ‘You Need A Budget’—which version could sync via Dropbox, without its own cloud service. But the key costs quite a bit more than what I've paid for the app back then.)
> Which parts of my city have I yet to explore?
I'm using desktop QGIS to mark my trips, while OsmAnd on the phone shows the exported GPX on the go (among other pleasant features). In theory I can use OsmAnd itself to record the trips with GPS, or any of the other dozen apps for that, but personally I don't believe that Google won't suck the location up regardless of my settings.
Hm. I actually don't agree with that sentiment as written. It's not time wasted if you enjoy it and have something to show for it? That's not it either.
Maybe this. I view the work we're discussing like a programmer's equivalent of woodworking in your garage. Sure, it's not necessary. But it's creative, fulfilling, and enjoyable. There's something immensely satisfying about using something you made with your own two hands, even if it's not perfect. As long as all these things remain true, I wouldn't call it time wasted.
To me your criticism reads something like, "if you wanted to canoe, why did you spend a year making your own? you could have bought one and spent that year doing what you really wanted to do!" Well, I wanted to do that, but I also enjoyed making something to do it with. I feel my life is better for having made it.
Yep, agree! I really enjoy it, my only problem is that there are too few hours in the day :)
I guess it depends on the goals though -- it's perfectly fine to build something just for the sake of it, as long as you have fun. One of my goals, though, is to stop/pause the active phases of building and spend more time using it. Partly because of the lack of time, partly because it means iterating and reflecting on whatever I've already built. So ultimately I agree that it's important to improve your life instead of building something that may improve it one day.
I mean, do I want my time back? Yes, sure, like with any learning, in hindsight it always feels that I spent more time than necessary on this.
But the thing is, I did improve my life while building this! I've learnt so much, from obvious things like building better and more resilient software to my preferences in terms of tools and services. I've started the blog to share this system and related things, which made me appreciate writing, and I feel like I'm getting better at it. The whole quantified self thing was super useful: it motivated me to think and learn about how my body works, to eat healthy, etc. However stupid it is, self-tracking gives me extra motivation for regular exercise and trying to push my limits.
Every new bit of data I'm adding is easier and smoother, so I can easily imagine myself using this system for years (and fixing minor bits that break once in a while, just like people fix things in their homes). Sharing it also means I can potentially improve others' lives, which makes me regret the time I spent building and researching much less.
> "if you wanted to canoe, why did you spend a year making your own"
I feel it's more like "you can rent/borrow a canoe, but you only have a spoon instead of a paddle which you can't switch. Oh and it can also collapse anytime. Why would you spend a year making your own?"
Today abstraction is no longer that of the map,
the double, the mirror, or the concept. Simulation
is no longer that of a territory, a referential being
or substance. It is the generation by models of a real
without origin or reality: A hyperreal. The territory
no longer precedes the map, nor does it survive it.
*It is nevertheless the map that precedes the territory—
precession of simulacra—that engenders the territory.*
The best way I can describe this type of thinking is a very strong, overriding compulsion to think and live in an augmented, external, hivemind-like capacity. It's beautiful, but it certainly isn't normal.
It makes one wonder what the etiology of such thinking is. Were I to guess, based on personal experience: hypomania, ADHD, obsessive tendencies, indignation at tech platforms sucking so bad, technology fracturing our fucking minds. Any combination thereof.
>I'm not willing to wait till some vaporwave project reinvents the whole computing model from scratch
Sorry, turns out it's kind of hard! :)
First off, there is something deeply satisfying about seeing your stuff properly categorized and easily accessible when needed so I definitely understand the impulse. And the result looks like it does exactly that. Kudos.
There is a part of me that dislikes this level of insight someone can have into my life and habits (yes, it is intended for owner use, but the data is there, neatly organized for someone to access).
It is a weird comment for me, because I absolutely see the benefit of this project.
- One is along the lines of post-privacy:
Lots of the stuff that I'm collecting one can find online anyway if they deliberately search for it. Tweets/reddit comments/instagram photos, etc.
Some of it isn't public, let's say, Amazon purchases. But would I mind sharing it?
Maybe not, who cares what I buy? Some ML algorithm that would show me ads? Whatever, I don't click on them anyway.
The more of my stuff I don't mind being public in the first place, the smaller the chance someone can use it against me.
But I understand it's ultimately something very personal.
- Second is that I agree that the security professionals would handle the data much better than me.
I don't even mind Google keeping my data, if only it was easier for me to access it when I need it!
So generally I would happily pay professionals for a service to keep my data safe, being able to access it and integrate together, and for the infrastructure around it.
Same way I'm using a bank to keep my money safe.
But such infrastructure for data simply doesn't exist at the moment, because there is no demand for it from people.
And silos only benefit the companies that keep this data, so the change isn't going to come from them.
E.g. if you can easily access your tweets through a nice and fast local app, why would you even visit bloated twitter.com, which is trying to 'entertain' and 'engage' you?
I'm hoping that with projects like this, I can inspire people to think differently about their services and tools, and to demand better means of using their own data (from professionals, i.e. engineers/designers/product managers).
Well, this is already happening. Google and Facebook, for example, earn basically all of their money from this. They replaced advertising middlemen because they can track and quantify human behaviours and identities. They do it because they are able to aggregate the data and provide APIs for marketing people.
This is a tool that brings it closer to you as the creator of the data. It doesn't do anything new, it just exposes what's going on.
If you browse from the EU, check out the data processing partners listed in the GDPR consent prompts you have to accept. The lists on every professionally built service are long. There are literally thousands of companies accessing, aggregating, processing and reselling the data and their analyses.
The tools you are afraid of already exist and are commercially available.
Wondering if anyone else does this?
I like doing it because it helps clear my head, even though the files or data are digital they take up some space in my head. I do it every few months, just have a fresh slate. I wonder if that says something about me - I'm not really sure. It's probably just another form of procrastination.
I once read Stephen Wolfram's post about how he has a key logger that backs up every keystroke he types, all indexed, with a little front end over his enormous amount of data, and it made me feel anxious just reading it.
I think most people would want amnesia for certain memories like traumatic events. E.g. a romantic interest rejected you, or a football quarterback throws an interception and needs to forget it to fully concentrate on the next play.
For most things, I wish I had more digital documentation of my past life. There were places I visited once that I wish I had photos of. There were people who were important in my life, but I forgot their names and never wrote them down. I've kept a daily journal for the last 10 years and it's been so useful that I wish I had been doing it since I was a child. If I had a git repo for every single piece of code I've written since I was a kid, that would be cool to revisit.
Back to your point, another type of purge that's useful is open projects that just nag you. For example, I had a "home automation" project on my todo list for years. In addition to taking up space in my brain, there were half-assembled open boxes of electronics on the shelf for years as physical reminders of that unfinished task. I finally decided to be realistic about my enthusiasm for that project and just abandon it. I sold the electronics and I feel better now. If I later decide to tackle it again, there will be newer and better technology anyway.
I also had a bookshelf of foreign language books for years that I thought I'd use to learn languages for foreign travel. After the COVID-19 crisis, I decided to give them away, since I won't be traveling overseas for years. Now their absence no longer reminds me of things I never got around to.
Purging bad emotions and unrealistic projects -- yes. Purging life data -- generally no.
I never really look back through any of it but when I do it’s quite nostalgic.
I think there is a happy medium and you can achieve it by keeping your inbox inside your trashcan.
This is not my idea and I have heard of it, in various forms, from different corners.
The idea is that all of your "important" papers get placed, as you receive them, into your larger (deeper) than normal trash can. I use a deep tray for this. It is basically a trash can that takes 6 or 8 months to fill.
When you fill the trashcan, you remove the bottom 1/4 of it and actually throw it away - since it turns out it wasn't important or necessary or actionable.
You can create (and I have created) digital metaphors for this:
For instance, I dump my profiles directory, with all of my browsing history and bookmarks and so on, every month or so and then reset my browser. Then I delete those tarballs later when it's clear they were not important.
The key is this:
You can retain maximal optionality while still minimizing processing and sorting time. Just chuck it all into the "trash" bin, and if it's really important, you always know where to look.
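A rough digital version of that scheme, along the lines of the profile-dump idea above (paths and the retention window are illustrative assumptions):

```python
import tarfile
import time
from pathlib import Path

ARCHIVE = Path("trash-archive")   # the "deep trash can" directory (illustrative)
KEEP_DAYS = 240                   # roughly the 8 months it takes to "fill"

def dump(profile_dir):
    """Tar up the browser profile into the trash can with a timestamped name."""
    ARCHIVE.mkdir(exist_ok=True)
    name = ARCHIVE / time.strftime("profile-%Y%m%d.tar.gz")
    with tarfile.open(name, "w:gz") as tar:
        tar.add(profile_dir, arcname=Path(profile_dir).name)
    return name

def empty_bottom():
    """Actually throw away tarballs older than the retention window."""
    cutoff = time.time() - KEEP_DAYS * 86400
    for tb in ARCHIVE.glob("profile-*.tar.gz"):
        if tb.stat().st_mtime < cutoff:
            tb.unlink()
```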
So I chuck it in an archive. Gradually expanding storage at low prices means never having to think about deletion.
I don't use any of the systems for sharing tabs or history across browsers, because each has a different set of almost abandoned tabs. Sometimes "tab bankruptcy" is forced on me by software failure, and it's frustrating.
Notes, not really, there's a ton of them in Keep so I look up some stuff from time to time.
I have a massive collection in OneNote notebooks, and once a year it seems I check them and remember some good stuff, but mostly just how dumb I was before :D
I have a similar project, DL, that's unfinished. Mine revolves around using a custom API in both Rust and REST to aggregate all my digital life events using ActivityStreams 2.0 and extensions to that, in a manner that is decentralized and ranked/categorized through machine learning. I am still working on it and releasing it Open Source is one of this year's goals.
My motivation is that the amount of information I receive from Twitter, Mastodon, Facebook, Reddit, HN, various Stack Exchanges, blog postings, etc. has gotten to the point where it's too easy to miss things.
Jeremie Miller, one of the creators of XMPP, had something similar revolving around the Telehash protocol. As far as I can tell, that effort is discontinued, or at least no longer Open Source.
Then I have different APIs that my IoT devices call to fetch data from The Archive, trigger events, and generate dashboards.
This might be a great way to optimize it and save me some development time on things I haven't yet implemented. (Twitter archive, messages, etc.)
Some examples of things already implemented:
- Raspberry Pi Touchscreen on my desk with dashboards
-> Financial Dashboard (Net Worth, Stocks, Expenses)
-> Reminders (ongoing to do list, train delays etc)
-> Dashboard with my indoor air quality/temp and plant
- Self made night light + alarm clock
-> Calculates time I need to go to bed to get 8 hours of sleep and indicates it with a red light.
-> In the morning triggers alarm and reads highlight, light shows weather forecast
I've been working on something quite similar to this project (in Rust), and I had the same need.
I started tinkering with a library + DSL that describes data fetching and transformation, where all data is transformed into schema.org types.
The DSL can then be either interpreted or client code generated for various languages.
This would potentially make such an API collection library much more feasible.
Sadly such a DSL turns out to be more of a full programming language if all requirements are covered.
That would be so useful, but also kinda dangerous. It would also be a damn amount of work just to manage it. Maybe build some general framework and API and let people build it decentrally?
Intake is another package that might help here. It organizes a set of data sources into:
(1) plugins that actually connect to the data source and map the data to standard Python data structures like Data Frames
(2) catalogs that reference the plugins you want to use alongside project specific metadata like usernames/passwords/source URIs
(3) convenience functions for persistence, concatenation, etc.
(4) a GUI for browsing data sources
Looks like OP has already written a lot of data access logic... I might fork them into Intake plugins for my personal use.
Really cool interface there: https://hyfen.net/memex/
My approach is basically to view myself as a distributed system that receives various inputs (jira tickets, PR comments, slack messages, emails) and performs work that produces outputs (PRs, emails, PR reviews, deploys). I'm trying to model workflows for all types of work I do, and write plugins that process various inputs and outputs so tasks can enter and progress through these workflows.
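That inputs/workflows model could be sketched like this (all names are hypothetical; a real system would presumably pull tasks from the actual services):

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    source: str           # e.g. "jira", "slack", "email"
    description: str
    stage: str = "inbox"  # inbox -> in-progress -> done

@dataclass
class Workflow:
    tasks: list = field(default_factory=list)

    def ingest(self, source, description):
        """An input (ticket, message, PR comment) enters the workflow."""
        self.tasks.append(Task(source, description))

    def advance(self, task, stage):
        """Work performed moves a task through its stages."""
        task.stage = stage

    def pending(self):
        return [t for t in self.tasks if t.stage != "done"]

wf = Workflow()
wf.ingest("jira", "Fix flaky test")
wf.ingest("slack", "Review deploy plan")
wf.advance(wf.tasks[0], "done")
# one open item remains in the pipeline
```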
The idea of an exocortex seems to be something that many people want, and almost universally re-invent, because the way in which structuring makes sense is different for everyone. I like the approach of making it modular so people can pick and choose parts and do their own integration.
What I learned from thinking about this problem is that this kind of integration is really hard and almost impossible to tackle without an evolutionary approach, and it made me go meta and focus on incremental development of the integration layer first. My interpretation of the idea of exocortex might be a bit different in that the focus is not so much about information per se, but more about systems that interact with the physical world. Here's a writeup in case anyone is interested in where it took me: https://github.com/zwizwa/erl_tools/blob/master/doc/exo.org
I'm interested in a my.spotify and a my.film.traktv
In the long run, I'm not sure how sustainable it is, since there are many different data sources and ways of representing them.
This is also particularly difficult when you don't use the service, e.g. as a maintainer I wouldn't have any means of testing Traktv.
So far I've kept all HPI modules close, because a monolith is easier for prototyping and refactoring. But my fear is the fate of oh-my-zsh, or spacemacs, which are overwhelmed by the pull requests.
Ideally I think it should be a simple core, only containing the common utility functions, extraction helpers, error handling, caching, logging, that sort of thing; and make the rest separate, pluggable modules.
It's possible to achieve this in Python thanks to namespace packages. The only problem I see is managing these small individual packages and declaring dependencies between them. This is possible with pip and setuptools, but there is certain overhead involved; I feel this step ought to be simpler, especially for people who don't want to fully dive into Python.
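For reference, a hypothetical namespace-package layout (all names made up) that makes such small packages installable independently while sharing one import namespace:

```
# Two separately installable packages sharing the 'my' namespace
# (note: no __init__.py at the namespace level, per PEP 420).
my-twitter/
    my/twitter/__init__.py
    setup.py            # name='my-twitter', packages=['my.twitter']
my-reddit/
    my/reddit/__init__.py
    setup.py            # name='my-reddit', packages=['my.reddit']

# After `pip install ./my-twitter ./my-reddit`, both import under one namespace:
#   import my.twitter
#   import my.reddit
```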
It's the Python plugin system that was spun out of pytest. I'm really impressed by it - it's a very clean design, and integrates great with Python packaging.
I'm using it extensively for Datasette, which means myself or others can add new features to the core software without needing to ask permission and in a way which supports trying out crackpot new ideas without sullying the design of the core software.
I've been thinking about using it for my Dogsheep personal analytics suite too, which is currently split up into a bunch of separate tools in separate repos (since the only unifying interface is that they all spit out SQLite databases).
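For readers unfamiliar with pluggy, here's a stdlib-only sketch of the hook pattern it formalizes (this is not pluggy's actual API, which uses hookspec/hookimpl markers and a `PluginManager` class; the hook and plugin names below are made up):

```python
class PluginManager:
    """Core registers plugins; calling a hook fans out to every implementation."""

    def __init__(self):
        self._impls = {}

    def register(self, plugin):
        # Any method named hook_* is treated as a hook implementation.
        for name in dir(plugin):
            if name.startswith("hook_"):
                self._impls.setdefault(name, []).append(getattr(plugin, name))

    def call(self, name, **kwargs):
        # Collect results from all registered implementations of the hook.
        return [impl(**kwargs) for impl in self._impls.get(name, [])]

class TwitterPlugin:
    def hook_sources(self):
        return "twitter.sqlite"

class RedditPlugin:
    def hook_sources(self):
        return "reddit.sqlite"

pm = PluginManager()
pm.register(TwitterPlugin())
pm.register(RedditPlugin())
sources = pm.call("hook_sources")
```

New features get added by registering another plugin, without touching the core's code.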
I've been working on something quite similar for a while (in Rust).
I decided to normalize data to schema.org types where available and store JSON-LD documents in a key/value store. While the schema.org types are far from perfect, it at least morphs data into a standardized format.
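A small sketch of that normalization step (the raw input shape and storage key are assumptions; `SocialMediaPosting` and its properties are real schema.org vocabulary):

```python
import json

def to_schema_org(tweet):
    """Map a hypothetical raw tweet into a schema.org SocialMediaPosting document."""
    return {
        "@context": "https://schema.org",
        "@type": "SocialMediaPosting",
        "headline": tweet["text"][:50],
        "datePublished": tweet["created_at"],
        "author": {"@type": "Person", "name": tweet["user"]},
    }

raw = {"text": "hello world", "created_at": "2020-05-01T12:00:00Z", "user": "alice"}
doc = to_schema_org(raw)

# Store the JSON-LD document under a stable key in any key/value store:
store = {}
store["tweet:12345"] = json.dumps(doc)
```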
Offtopic: how did you create the blog? Did you use WordPress? If so, which theme? If not, what UI library did you use?
I ask because I have a blog myself which looks like crap haha.
Probably in combination with org-mode http://wikemacs.org/wiki/Blogging
Hakyll in the HTML sources is actually just an artifact left in my template from the times when I did use Hakyll. I should probably remove it.
I found it a bit overkill for my purposes and often overly restricting me, so I've switched to a Python script to generate everything: https://github.com/karlicoss/beepb00p/blob/master/src/build....
It's a bit ad-hoc, but ended up the same length as my old Hakyll code, and allows me to experiment much faster.
Your work is a great example of that, and I like how you display it.
I am putting as much as i can into my TODO+archive database (see https://github.com/andrey-utkin/taskdb/wiki/Live-demo#workou... ), and it is pretty neat already for analysis with querying and visualization. But your stuff is orders of magnitude bigger. Possibly I will set up HPI for myself some day.
- social networks: posts, comments, favorites
- reading: e-books and pdfs
- annotations: highlights and comments
- todos and notes
- health data: sleep, exercise, weight, heart rate, and other body metrics
- photos & videos
- browser history
- instant messaging
I've been studying the concept of "life management software" for decades so thanks for sharing your project and I enjoyed reading your thought process.
The concepts I always come back to, which the idealized software would manage:
+ timeline: [past] --> [present] --> [future]
+ The Big 2: time & money
+ reflection vs growth : mining the past data for patterns and metrics -- vs -- managing and prioritizing a future wishlist to deliberately design a future life
I've come to the conclusion that it's very hard to come up with a good schema that unifies all aspects of life that's important to me. For example, looking at your list above, it is heavily populated by digital artifacts (things that happened in the past). One of the exceptions in your list that is "future" oriented would be "todos" and possibly "notes".
I'm also interested in life "situational awareness".
E.g. the other aspect of life data is money which means financial budgeting/planning. Another aspect of life data is time so a digital calendar of future events -- and goals -- is essential. Yes, I keep a list of books I've read (the "quantified self") -- but also books I plan to read in the future (designing a future version of myself a.k.a personal "growth").
There's nothing wrong with your project. I'm just explaining my observations after seeing various attempts at this over the decades. This includes 1980s software like Borland Sidekick, 1990s PIMs (Personal Information Managers) like Lotus Agenda, late 1990s PDAs (Personal Digital Assistants) like the Palm Pilot, 2000s software like Evernote, and hundreds of 2010s SaaS "todo/calendar/notes" websites, or uber-geek tools like Emacs Org-mode.
None of the above really do what I want so for now, I just split my digital life across various tools and files. My daily notes in "journal.txt". My financial planning (and bank & credit card data downloads) in "budget.xls". Saved webpages in ".mhtml" files. Planned programming projects in "projects.xls". Etc.
Yep, totally agree it's hard to incorporate the future, especially when it's in free-form like org-mode notes. Past data is much easier because it's at least somewhat structured and easy for the computer.
I guess my take on this is that subjective metrics, like the ones people use to define 'success', 'growth', etc., are orders of magnitude harder to grasp than objective metrics (e.g. hard data, like sleep or exercise logs). Yet we don't even have good means of incorporating the hard data in our lives, so I chose to start with the easier problem.
Yeah you're right, my future planning is all in my org-mode notes, I reflect on it with my brain, but so far not sure how I can use software (apart from organizer software) to aid me with it. It would certainly be an interesting area to explore.
Note that it's meant to be running on your own computer, and it's using the filesystem to access the data; no network interactions are involved. Ultimately it's about how you're protecting the data on your disk and whether you trust yourself with it. Of course, my code has to be trusted too, but it's at least possible to run it in a sandbox/use Docker, etc.
The modules run untrusted Python code, so if you keep your token on disk, they can potentially steal the token too, unless you use some elaborate authentication system.
> It's not vaporwave
I look forward to the day we replace vaporware with vaporwave, ala segue>segway
If you can add some social scoring, it would be great too. Maybe if you have a way of certifying it too, it would be great indicator to put on resumes to show how much better you are than those other candidates.
What about the next step? Have you considered just doing screen recording (+ a body cam if you want to extend it to all of life monitoring, not just digital) and storing all the footage? Then it's just video indexing. You can always post-process the data as new deep learning algorithms become available.
This way you don't have to mess up with specific api for each service. You handle the general case and you are done. (That's what smart people do https://writings.stephenwolfram.com/2019/02/seeking-the-prod... )
Have you thought about plugging an optimizer into it so we could make the most of our lives? I have got a hedonic treadmill to climb.
Next step: yep, I thought about it a bit; agree that it would be great as a service-agnostic way of accessing the data, the same way we can see it with our eyes.
But figured I should start with strictly easier problems for now, ones that I can solve at least to some extent. For now my goal is simply enjoying using it, making the setup simpler (to make it accessible to more people), making sure it's robust to the changes, flexible enough, etc.
I want to inspire people to own their data. In many ways, I'm advocating an approach to owning it and working with it, rather than a specific set of tools, formats, etc. It would be great to have more people experimenting in the same area and trying to loosely integrate with existing efforts.
Optimizer: to some extent, yes! Not sure about 'general happiness', but I'd happily let the computer dictate what I eat, how I exercise and how I sleep. Maintenance is probably one of the most annoying things in my life.
In the current state, people don't know how to handle their data correctly. You are just making it easier for some people to siphon other people's data even more. And having their data played against them.
Most people are not quants. And becoming one is not trivial. Having people play unguided with their own data is like giving uranium to little kids. They will have great fun!!!
From the post:
> Everything is local first, the input data is on your filesystem. If you're truly paranoid, you can even wrap it in a Docker container.
> There is still a question of whether you trust yourself at even keeping all the data on your disk, but it is out of the scope of this post.
> If you'd rather keep some code private too, it's also trivial to achieve with a private subpackage.
This seems as safe, if not safer, than the average service where a rogue employee can read/write to your data with little to no visibility.
> You are just making it easier for some people to siphon other people's data even more.
The data is already being siphoned off of them -- usually through vague legal agreements and opaque security infrastructure they can't audit.
At least with this model, a user can see the whole scope of data a service is collecting and act accordingly: whether by limiting the data collection as needed or by switching to another service.
I guess you mean someone could easily steal your data, etc? I agree, it's a tradeoff, would be interesting to explore how to make it more secure.
> Having people play unguided with their own data is like giving uranium to little kids
Well, handling Uranium is a bit different. Can you elaborate on some specific examples?
I've seen people misusing statistics, and drawing misleading conclusions, sure. But even more people are using anecdata and broscience, which I think is worse.
If your top ten "most interesting Slate Star Codex" contains certain keywords, they will flag you in their database.
>Can you elaborate on some specific examples?
-People getting sports injuries from those sport performance training apps. Knowing their data made them push themselves past their limits.
-Little kids getting bad marks and assuming they are just bad, so not persevering. Knowing made things worse.
-Hypochondriacs becoming ill from the stress of all the illnesses they think they have.
-People doing weight loss badly, focusing too much on the scale.
-Musk tweets :)
-People censoring themselves, knowing that having an unpopular position will cost them karma points.
As soon as you make a system self-reflective, you impact it in profound ways. You generate some reinforcing feedback loops. They tend either to break the stability, resulting in chaos, or to lock it into a specific behavior from which it's hard to escape.
I guess it comes down to a personal level of paranoia. And to the old question "Is ignorance bliss?", for which people had very different answers long before the software.
I'm sure achieving whatever your sufficient level of "quant" is isn't trivial (congratulations, presumably), but a command-line tool shared with the Hacker News community is fairly self-selecting.
I think this is a super interesting project, looking forward to seeing how this gets developed!