More about that here: https://dogsheep.github.io/
The tools I've built so far are under https://github.com/dogsheep
Another thing that annoys me right now (and this is a problem of my own making...) is that earlier this year I started taking notes with an app on my iPad called Notability. It works great with the Logitech Crayon stylus/pencil and is useful for jotting down notes when doing online courses etc.
Except I've shot myself in the foot a bit, because those notes are now bound to the app. Yes, there is the Notability app for OS X, and yes, I should have anticipated this problem sooner, but that's beside the point: my notes are locked into the Notability ecosystem. They support a half-assed solution of exporting them as RTF files or PDFs, but you lose stuff like handwriting recognition.
One project on my TODO list is to see if I can reverse-engineer the proprietary Notability file format, which includes the text recognition and everything needed to render the lines that make up your notes. I know there have been attempts to do this, e.g. https://jvns.ca/blog/2018/03/31/reverse-engineering-notabili... I just need to set the time aside to make it work.
I know I can use Concur and ten thousand different apps but FFS I don't need to.
But Evernote gradually turned to trash by neglecting basic functionality like text editing and adding bugs with each release.
Someone raised this on here a few weeks ago and it was a gobsmacking moment - my hand flew to my forehead and I realised yes - that would be so useful but the big tech firms find it more profitable to have that data on their hard drives not mine.
You might find that (both the page and the project) useful too: https://github.com/pirate/ArchiveBox/wiki/Web-Archiving-Comm...
sqlite3 ~/.mozilla/firefox/profile/places.sqlite "SELECT title, url FROM moz_places WHERE url LIKE '%mozilla.com%'"
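The same lookup from Python, as a rough sketch: it assumes only the `moz_places` table and its `title`/`url` columns used in the one-liner above, and a profile path you substitute yourself. Firefox may keep `places.sqlite` locked while it is running, so querying a copy of the file is the safer habit.

```python
import sqlite3
from typing import List, Tuple


def search_history(db_path: str, pattern: str) -> List[Tuple[str, str]]:
    """Return (title, url) rows from Firefox's moz_places table whose URL matches a LIKE pattern."""
    # Open read-only via a URI so the profile database is never modified.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        # A bound parameter sidesteps the single/double quote pitfalls of the shell one-liner.
        return conn.execute(
            "SELECT title, url FROM moz_places WHERE url LIKE ? ORDER BY url",
            (pattern,),
        ).fetchall()
    finally:
        conn.close()
```

Usage would be something like `search_history("/path/to/copy/places.sqlite", "%mozilla.com%")`; the trailing `%` matters, because stored URLs usually end in a path or `/`.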
Like Firefox's Awesome Bar?
I believe that's the exciting idea the GP is talking about.
It's a great app, on a great device.
But I don't use it for anything serious any more because I can't get the notes out into other tools.
Quite a shame really.
edit: also meant to say that exporting the drawings only gives you raster images.
As the article says:
> Best case scenario is if the service is local-first in the first place. However, this may be a long way ahead and there are certain technical difficulties associated with such designs.
> I'm suggesting a data mirror app, that merely runs in background on client side and continuously/regularly sucks in and synchronizes backend data to the latest state.
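A minimal sketch of the "data mirror" idea in the quote, assuming a hypothetical `fetch_since(cursor)` function that returns items changed after a timestamp (real service APIs all differ). Each run asks only for what changed since the newest item already mirrored and upserts it into a local SQLite table (`ON CONFLICT ... DO UPDATE` needs SQLite 3.24 or later).

```python
import sqlite3
from typing import Any, Callable, Dict, Iterable

# Hypothetical: returns items with "id", "updated" (ISO-8601 string, one consistent format)
# and "body" for everything changed after `cursor`. Not any real service's API.
FetchFn = Callable[[str], Iterable[Dict[str, Any]]]


def sync(conn: sqlite3.Connection, fetch_since: FetchFn) -> int:
    """Pull items changed since the newest mirrored one; return how many were written."""
    conn.execute("CREATE TABLE IF NOT EXISTS items (id TEXT PRIMARY KEY, updated TEXT, body TEXT)")
    cursor = conn.execute("SELECT MAX(updated) FROM items").fetchone()[0] or ""
    count = 0
    for item in fetch_since(cursor):
        # Upsert: new ids are inserted, edited ones overwrite the local copy.
        conn.execute(
            "INSERT INTO items (id, updated, body) VALUES (?, ?, ?) "
            "ON CONFLICT(id) DO UPDATE SET updated = excluded.updated, body = excluded.body",
            (item["id"], item["updated"], item["body"]),
        )
        count += 1
    conn.commit()
    return count
```

Run on a timer, this keeps the local copy close to the backend state; deletions on the service side are not handled and would need a separate reconciliation pass.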
Here are a few premises:
1. It's a fact that only a small portion of users care heavily about centralizing and truly owning their data.
2. As such, it's reasonable for companies to not focus on exporting data. That's not how they get value out of their data.
3. That being said, companies should at least not punish that small group of users for taking matters into their own hands.
Scraping is our solution to this problem, and the least companies can do is allow well-behaved (rate-limited) scraping.
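What "well-behaved (rate-limited)" can mean in practice, as a sketch: a token bucket that allows a steady average request rate with small bursts. The rate and capacity values are assumptions you would tune per service; the clock is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow on average `rate` requests per second, with bursts of up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full so the first few requests go out at once
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        # Tokens accrue continuously with elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Take one token if available; never blocks."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def acquire(self) -> None:
        """Block (using real time) until a token is available."""
        while not self.try_acquire():
            time.sleep((1 - self.tokens) / self.rate)
```

A scraper would call `bucket.acquire()` before every request, e.g. with `TokenBucket(rate=0.5, capacity=3)` for at most one request every two seconds on average.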
> The Solid Project (which, AFAIK, is led by Tim Berners-Lee) is made to tackle this exact problem.
> It is about defining a standard/protocol to store personal data in a 'pod' and give minimal/granular access to my pod to web apps.
> Apps are separated from pods, and it allows the decoupling of data & functionality.
> You can switch (upgrade) from text messages to instant messengers without losing any chat history, for example.
> It also has an advantage that it prevents lock-in, since one can move their data around trivially.
> Looks like the OP would greatly benefit from this.
> https://solidproject.org
What I picture is a program that you use to store your own microblogs, blogs, contacts, comments, etc. and then you publish to whoever from that app via their API or crawling.
Imagine you just created a new microblog entry. You can now post it to your Twitter, Mastodon, etc. accounts with the click of a button. You would have to poll for replies, though, and it would be up to you to store them if you wished (you probably want to if you are storing your replies). As an added benefit, you could see the replies in one place instead of bouncing between two sites.
The point is, when you create the data, it's yours first. Then, if you want to, you can post it to other places. Tools like this are abundant for businesses, but we don't seem to build tools for actual people anymore.
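A sketch of the "yours first, then post elsewhere" flow: the local `Post` is the canonical copy, and each service is a hypothetical `Publisher` adapter whose `publish` returns the URL of the copy (real Twitter/Mastodon clients would sit behind that interface). Services that fail are simply retried on the next run.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class Post:
    id: str
    text: str
    # Service name -> URL of the syndicated copy, filled in as publishing succeeds.
    copies: Dict[str, str] = field(default_factory=dict)


class Publisher(Protocol):
    """Hypothetical adapter interface; one implementation per service."""

    name: str

    def publish(self, text: str) -> str: ...  # returns the URL of the posted copy


def syndicate(post: Post, publishers: List[Publisher]) -> Post:
    """Push a locally stored post to every service it isn't on yet."""
    for pub in publishers:
        if pub.name in post.copies:
            continue  # already syndicated there; don't double-post
        try:
            post.copies[pub.name] = pub.publish(post.text)
        except Exception as exc:  # one failing service shouldn't block the others
            print(f"{pub.name}: failed ({exc}), will retry next run")
    return post
```

Recording the copy URLs is what later makes it possible to poll each service for replies and store them next to the original.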
That's how I do it: see the original https://jeena.net/photos/524 and its Mastodon copy https://toot.jeena.net/@jeena/103214370709720207 and Twitter copy https://twitter.com/jeena/status/1199954031134887936
Facebook on the other hand removed that API so I stopped crossposting there: https://github.com/snarfed/bridgy/issues/817
Both Google and Facebook have tools that allow you to export your data in both (usable) JSON and HTML.
As in, you can do a GDPR export or Google Takeout now and then, but each time you have to request it and enter your password, etc., and a few days later you get a link to the archive. You can set a reminder to do it periodically, but in between your export goes stale; it's just so frustrating. It's almost OK as a means of backup, but it's hard to use this data in a meaningful way.
It made me think about those hoarder reality TV shows and how there is a digital equivalent of that. In some ways I'm glad all of my digital data isn't amassing and following me around. I'm not sure I really care about what I posted to MySpace pages back in 2003 or Usenet in 1998.
This is really a matter of ambivalence to me. On the one hand I would love detailed data about my history, even down to the exact GPS coordinates of my location every moment of my life. On the other, I am not sure this data would truly improve my life in any way. It may be that access to such data could make my life worse.
One of my back-of-the-mind ideas is to combine GPS data like this with some kind of basic activity-type id. Like "working", "practicing guitar", "socializing with friends", etc. Then I could get reports on how I spend my time.
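One rough way to get such reports, assuming you keep a hand-made list of known places and the activity you usually do there (the places, the 100 m radius, and the "gap belongs to the earlier fix" rule are all assumptions): label each GPS fix by the nearest known place and sum the time between consecutive fixes.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Point = Tuple[datetime, float, float]       # (timestamp, latitude, longitude)
Places = Dict[Tuple[float, float], str]     # (lat, lon) of a known place -> activity label


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def label(point: Point, places: Places, radius_m: float = 100.0) -> str:
    """Activity of the first known place within radius_m of the fix, else 'unknown'."""
    _, lat, lon = point
    for (plat, plon), activity in places.items():
        if haversine_m(lat, lon, plat, plon) <= radius_m:
            return activity
    return "unknown"


def time_report(points: List[Point], places: Places, radius_m: float = 100.0) -> Dict[str, timedelta]:
    """Attribute the gap between consecutive fixes to the activity at the earlier fix."""
    totals: Dict[str, timedelta] = defaultdict(timedelta)
    pts = sorted(points)
    for cur, nxt in zip(pts, pts[1:]):
        totals[label(cur, places, radius_m)] += nxt[0] - cur[0]
    return dict(totals)
```

Location alone can't tell "working" from "socializing" at the same place, so a real version would likely need manual corrections or other signals on top.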
As a third generation hoarder this is how I tried to tame it - I started hoarding data instead of things.
After a few major losses (a social network local to my country shut down without providing any option to export the data; I accidentally started formatting my backup drive right after formatting my main drive), I developed a healthier relationship with my data.
I'm still reluctant to delete things, but now I can at least bring myself to do it.
The problem, though, is the same as with all self-hosting: maintaining a server. It looks like you're just on your own if you want to use a Solid app.
Technologists seem preoccupied with creating perfect recollection, seemingly without realizing that people who have this ability innately often find it burdensome.
Given my druthers, I wouldn't retrieve my data from a service, I'd purge it.
The problem is exactly the lack of such control. Frankly, if you let the data out, there's no guarantee a copy of them does not linger somewhere.
Giving companies an incentive, for profit reasons, to get rid of your data after some time might be helpful. For instance, work email is usually purged promptly after the mandatory retention period expires, so that it can't be used in litigation.
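Mechanically, a retention policy like the work-email one is a scheduled delete. A sketch, assuming a hypothetical `messages` table whose `created_at` column stores UTC ISO-8601 strings in one consistent format (so string order matches time order):

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional


def purge_expired(conn: sqlite3.Connection, retention_days: int, now: Optional[datetime] = None) -> int:
    """Delete rows older than the retention period and return how many were removed."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    # Relies on every created_at using the same isoformat() layout as the cutoff.
    cur = conn.execute("DELETE FROM messages WHERE created_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount
```

The hard part the comment points at is not this code but trust: nothing here proves that backups or replicas elsewhere were purged too.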
It's shocking to me how few people I've met take their personal data storage seriously. Most folks I know treat Dropbox/Drive like a landfill.
There has been piecemeal progress in swinging the pendulum back from cloud-everything to easier-to-use edge computing. The Helm email server is one example. The slightly more plug-and-play approach of modern NASs is another. And there are others. But you can tell that the bulk of R&D spending is not going here yet. I do think investors will eventually wake up and realize that user demand for data control means better edge devices and avoiding reliance on the centralized cloud.
What I have envisioned for PAO is federated encrypted backup. I would like to see NASs allow me to basically allocate a percentage of my capacity to various peers to store encrypted-at-rest duplicates of their data. And vice versa. Basically a federated mesh. No need for blockchain or other crypto-hype nonsense. Just straight authenticated and encrypted file storage.
My opinion is that cloud dominance really traces back to the advent and self-reinforcing power of asymmetric Internet connectivity. When Internet connectivity was often symmetric (think the very early days of DSL), peer-to-peer networking remained very common. As more users moved to asymmetric connections, the pattern was reinforced as more services centralized data and content. Peer-to-peer is effectively a relic of the past now. Only today have we started to see some resurgence of symmetric connectivity (e.g., 1Gbps symmetric fiber). I believe deployment of symmetric connectivity will be a decentralizing force as more people realize it's possible to just access your file system and data directly between devices rather than use an intermediary, and as vendors realize this is an opportunity space to offer interesting technology (e.g., the likes of ZeroTier) to consumers.
Hear hear! We need to not forget the core lesson of the internet, which is centralization is a weakness. I have not been happy with the increasing trend towards centralization in tech, and I agree that people taking more ownership is going to mean more edge devices.
The hardware is there, it's the software that needs to catch up.
This is something I've been thinking about, too. We have so many Internet-connected devices with increasingly cheap storage. Some universal protocol for distributing data across this network would be really cool. (I understand this could sound like blockchain. I have no horse in that race.)
There's no setup on the user's part. For the developer, it allows you to abstract the datastore and give ownership back to the user - the developer doesn't have access to the private databases of its users. It's possible to build completely client side apps with syncing between devices without ever being exposed to the user's content.
Looks like CloudKit JS also exists for the web. I'm not sure if it allows the user to export directly, but that would help guarantee users weren't trapped.
All that said, it's tied to the Apple ecosystem. An independent service with similar features and a large enough community would be interesting.
Now, it's much more likely that you'll get a "You have violated our Terms of Service and are banned from using <the service>", pointing you to a line that looks like: "The company may, at its sole discretion, decide what constitutes a violation of the Terms of Service, and terminate services as a result." It happens!
We need something like a geographically distributed storage coalition that lets you provide storage space for others in exchange for storing your own data remotely. Then data can be replicated to multiple locations and be roughly secured in a mutually-assured-destruction sense (if you take down the replica of my data, you lose your replica on my site).
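A toy model of that reciprocal arrangement, purely as an illustration: peers swap equal amounts of storage, and dropping someone's replica triggers the retaliation rule. It deliberately leaves out the hard parts (encryption, and proving a peer really still holds your data), and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Peer:
    name: str
    capacity: int                                          # bytes offered to other peers
    hosted: Dict[str, int] = field(default_factory=dict)   # owner name -> bytes stored for them
    replicas: Set[str] = field(default_factory=set)        # peers holding a copy of our data

    def free(self) -> int:
        return self.capacity - sum(self.hosted.values())


def exchange(a: Peer, b: Peer, size: int) -> bool:
    """Symmetric deal: a stores `size` bytes for b, and b stores `size` bytes for a."""
    if a.free() < size or b.free() < size:
        return False
    a.hosted[b.name] = a.hosted.get(b.name, 0) + size
    b.hosted[a.name] = b.hosted.get(a.name, 0) + size
    a.replicas.add(b.name)
    b.replicas.add(a.name)
    return True


def drop_replica(host: Peer, owner: Peer) -> None:
    """host stops storing owner's data; owner retaliates by dropping host's data in turn."""
    host.hosted.pop(owner.name, None)
    owner.replicas.discard(host.name)
    if owner.hosted.pop(host.name, None) is not None:
        host.replicas.discard(owner.name)
```

Keeping each deal symmetric is what makes the deterrent work: a peer can't free up space by dropping your data without losing an equal amount of its own redundancy.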
-some open pit data mining/management protocol exists and scrapes data out of your own personal forward proxy / metal that lives on the edge of fat bandwidth; you can do whatever you want with it, autogenerate bookmarks and forum interaction tags if you want (hadn't thought of that) .. including not storing it, because the software that stores it is part of a personally owned open source platform that is also providing all the cloud services that you normally go to third parties to obtain
-the basics are baked in: it's got your social media/self-promotional pages that are interoperable with others, an online store, search/index peers are essentially friends on social networks.. it gets foggy: how much granularity? what sort of resource commitment to the forward cache? anonymization routines? regional compliance issues? capability to sell datasets(?) like, who would actually use it?
.. etc ..
-if something like this were actually to organize, I think it would be best visualized as some sort of platform in support of some server-farm co-op. Also, I keep thinking of OpenStack being overkill, somehow, and am likely wrong.
-for a personal user on a watered down feature-set that isn't supporting a large organization and still elects to own their own bare metal, it would be like .. two netbooks in a post-office-box housing place that has fiber ..
I'm partially being snarky, by offering opt-in backwards compatibility to ye olde establishment. I'm also intending to offer a path forward for businesses already relying on the business model, and for users expecting products that rely on it. I know I'm playing with a pipe-dream-sci-fi-quantum leap in the relationship between the end-user and the internet ..
... so I'm trying to be snarky and also fair and thus hopefully incentivize existing entities to implement the protocol and use it in order to take bites out of big data in manageable bits without setting everything on immediate fire. The folks who write apps with a bent on data-mining may be open to something more provider independent in order to draw in users.
also .. half-baked early adopter:
.. you are a streaming content author or have an online shop and want your content to be redistributed, and are willing to make some metadata deals in order to do so. You are peered with dozens of indexes and some of them require different participation levels; maybe you have shipping partnerships, you work with some online labels or other profit-sharing outlets, and the useful metadata associated with the traffic that content has generated in your PO box is requested by these partners. So, the parts of your "forward-proxy-cache" that were relevant in these transactions would want appropriate taxonomy in order to facilitate ongoing partnership. I see users on the internet who like targeted ads, I know people in reality who like shopping.. I dream of a world where they all get better hobbies but I'm not trying to judge. ;)
personal forward-proxy.. a reckless way of putting it, also s/sold/shared/, where users are hopefully suddenly tuned into the reality that once something is copied out into the public domain..
My current approach is to have a nightly job which pulls my Dropbox and other cloud storage into my local storage, but I'm planning to look into an S3-compatible service to see if that integrates well enough for my needs.
The author asks why he needs some startup to take his data and then trust them to do things. It’s because that startup spent a lot of money to build the software to run on their servers and now they don’t want to give it away for free.
In my opinion, this is the root cause of most of the issues with our data and control on the web in general. There was a time when the web itself disrupted the centralized networks of the day, like America Online, MSN, and CompuServe. The centralized services we use today, such as Facebook, Google, Amazon, Twitter and others, could never have been built unless a permissionless platform like the web existed and disrupted those centralized platforms. They had invested a lot in the infrastructure that the web later replaced. They were extracting rents and controlling the platform.
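A sketch of such a nightly job's local half, assuming the cloud storage is already synced to a local folder (by the vendor's client or a tool like rclone); the folder paths are yours to choose. It copies only new or changed files and deliberately never deletes, so a bad upstream sync can't wipe the backup.

```python
import shutil
from pathlib import Path


def mirror(src: Path, dst: Path) -> int:
    """One-way copy of new or changed files from src into dst; returns the number copied."""
    copied = 0
    for path in src.rglob("*"):
        if not path.is_file():
            continue
        target = dst / path.relative_to(src)
        st = path.stat()
        if target.exists():
            tst = target.stat()
            # Same size and not older than the source: treat as unchanged.
            if tst.st_size == st.st_size and tst.st_mtime >= st.st_mtime:
                continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 keeps the mtime, so the check above works next run
        copied += 1
    return copied
```

Size plus mtime is a cheap heuristic; comparing content hashes would be more robust at the cost of reading every file each night.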
Today we need something like that, but the infrastructure we need to disrupt was built by Facebook, Google, etc. WordPress and OpenStreetMap are just two examples.
That’s what I have been putting my money into for 8 years and open sourcing:
Take it, use it, it’s free. Build whatever solutions you want on it, to manipulate whatever data you want. And perhaps more importantly, host software for entire communities of people who want to collaborate and connect with each other, not just manage their personal data.
Feedback welcome on how to make it better. We are planning to officially launch later next year to be like a WordPress of 2020.
These are infrastructure problems, they should be treated as such, i.e., maintained by tax dollars.
The failure of the semantic web and the sad state of personal data are primarily failures of the free market to solve these problems, imho.
- People are moving from Microsoft Office to Google Docs
- Designers are moving from Sketch/Photoshop/Illustrator to Figma
- People are moving from Evernote/Text Files/Whatever to Notion
- People are moving from HTML files to Webflow
- People have already moved from native mail clients to Gmail
There's a problem here, but it isn't that the free market hasn't solved this problem, it's that people are choosing other features (mainly collaboration) as more important than data ownership.
Note that this is a transition mainly driven by tech people; none of these products have gone mainstream yet (excluding Gmail, of course), and the products that offer data ownership are still far more popular overall. But if the mainstream follows tech people's lead, that won't be true for long.
If it were treated like infrastructure, we could have both.
Pointing to a product that got replaced by another product doesn't inherently prove anything.
And seriously, who uses Gmail voluntarily? The complete non-existence of a suitable SMTP implementation is a pretty good example of a _clear_ failure of the market to properly allocate resources.
edit - thanks for pointing out Figma though, might have to check that out
This is a pretty moot point since if we didn't have computers we wouldn't have things recording a lot of the data in that article to begin with. For example, with location, you could write down where you are every minute of the day, but that isn't very practical. Luckily, we have computers to automate that. Does that mean you should have to give that data away to a third party?
> how many people could manage a library / filing system rich enough to catalog the level of information we're expecting to keep here
Nobody could do that manually. That's the job of computers. I suppose you could keep a journal and store boxes of pictures and a lot of paper. It would be a pain, take up a lot of space, and take forever to search through.
Luckily we do have computers and they happen to be really good at searching. So you should be able to just store this stuff on your computer and have it assist you with owning that data.
Instead, what we have is a world where you upload everything to the cloud so someone else owns the data and you have no idea what's happening with it. They also get to choose how the data is presented to you. Since your data is spread among so many companies, it's hard to get the aggregations mentioned in the article. Usually the only way that happens is companies agreeing to share your data with each other. Most people are okay with this since it's "free".
Hell I don't always know where I'm going to put all the groceries I take home.
EDIT - to add - too much data isn't really useful. When we are talking about personal data collection, it's basically a librarian's job, which is non-trivial.
I totally agree with the general thrust of this argument. I'd like to hear more about use cases for this kind of personally-owned, aggregated data store. Once this article started talking about searching over, say, notes and highlights from articles and blog posts, I started to see specific use cases that seem totally compelling. However, it's not clear to me how this part of the data ownership conversation matches up with the seemingly more principles-driven data ownership conversation.
Theoretically there's nothing stopping you from building some of these more specific implementations (which the author has done; btw, those projects look really cool).
Just don't buy so many groceries. Jk. Actually, I have started working on a project that would help solve this problem (in combination with solving others). It's just nowhere near ready and won't be for a while.
> too much data isn't really useful
That's not what all the companies building huge data centers are saying.
> When we are talking personal data collection it's basically a librarian's job
I'm not really sure I understand this point. What do librarians have to do with this?
It translates well to a digital medium. The general idea is a collection of granular information (notes) interconnected in a non-hierarchical way using tags.
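A minimal sketch of the notes-and-tags structure just described: an inverted index from tag to note ids, which supports "notes with all of these tags" and "notes related to this one by shared tags" without any folder hierarchy. The class and method names are hypothetical, and re-adding an existing note id doesn't clean up its old tags.

```python
from collections import defaultdict
from typing import Dict, List, Set


class NoteIndex:
    def __init__(self) -> None:
        self.notes: Dict[str, str] = {}
        self.tags: Dict[str, Set[str]] = defaultdict(set)   # tag -> ids of notes carrying it
        self.note_tags: Dict[str, Set[str]] = {}            # note id -> its tags

    def add(self, note_id: str, text: str, tags: Set[str]) -> None:
        self.notes[note_id] = text
        self.note_tags[note_id] = set(tags)
        for t in tags:
            self.tags[t].add(note_id)

    def with_all(self, *tags: str) -> Set[str]:
        """Ids of notes carrying every given tag."""
        sets = [self.tags.get(t, set()) for t in tags]
        return set.intersection(*sets) if sets else set()

    def related(self, note_id: str) -> List[str]:
        """Other notes, ranked by how many tags they share with note_id (ties by id)."""
        shared: Dict[str, int] = defaultdict(int)
        for t in self.note_tags[note_id]:
            for other in self.tags[t]:
                if other != note_id:
                    shared[other] += 1
        return sorted(shared, key=lambda n: (-shared[n], n))
```

The connections live entirely in the tags, so a note can belong to as many "places" as it has tags.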
I believe I have solved some of that problem. Hode (the Higher Order Data Editor), a kind of generalization of graph databases, lets a user enter data in a manner which I believe is as similar as possible to natural language. To encode the fact "cats kill birds" as a "kill" relationship between "cats" and "birds", you just write "cats #kill birds". Relationships involving other relationships can be represented with similar ease, as can relationships involving any number of elements. The query language is not much more complicated: for instance, everything that kills birds would be "/e /it #kill birds".
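To illustrate what "relationships involving other relationships" means as a data structure, here is a generic sketch in which a relation's members may themselves be relations, plus a single-wildcard query. This is not Hode's data model, syntax, or implementation; `Rel`, `IT` and `query` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple, Union

Expr = Union[str, "Rel"]


@dataclass(frozen=True)
class Rel:
    label: str                  # e.g. "kill" for "cats #kill birds"
    members: Tuple[Expr, ...]   # words or other relations, any number of them


IT = object()  # wildcard marking the member being asked about


def query(facts: List[Rel], label: str, pattern: Sequence[object]) -> List[Optional[Expr]]:
    """Members found in the IT slot of every fact matching label and pattern."""
    out: List[Optional[Expr]] = []
    for f in facts:
        if f.label != label or len(f.members) != len(pattern):
            continue
        hole: Optional[Expr] = None
        ok = True
        for m, p in zip(f.members, pattern):
            if p is IT:
                hole = m
            elif m != p:
                ok = False
                break
        if ok:
            out.append(hole)
    return out
```

Because `Rel` is frozen, relations compare by value, so a stored claim can be matched inside a higher-order fact like "Alice believes (cats kill birds)".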
I'm starting with photos & videos because my own collection is trapped in Flickr, and it may not be long for this world. I intend to build out such a system and publish instructional videos on how to do it as I go along.
Imagine a world in which more or less everyone can do this. It's not such a long shot, don't you think?
I’ve seen a bit of how Boeing tracks parts, and doing something simpler but similar for data might be tractable now, except that I think it would take APIs that work substantially differently from conventional code, except perhaps in Ruby and Node (thinking in particular about html_safe tagging in Rails).
It seems like a lot of the things and products mentioned here, if released by an independent dev or small team, could similarly be overlooked. I can't imagine most of the engineers I know (and I suppose especially not rich megacorps) ever really considering the .01% of people (the kind of demographic you'd find on HN, I guess) who want to save ALL their browsing history across browsers according to some universal standard, or their LinkedIn statistics, or their YouTube text history.
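A small illustration of the tagging idea, in the spirit of Rails' `html_safe` marking rather than a port of it: a hypothetical `Tagged` string that remembers which sources its contents came from and carries that along through concatenation.

```python
from typing import FrozenSet, Iterable


class Tagged(str):
    """A string that remembers which sources its contents came from (hypothetical design)."""

    sources: FrozenSet[str]

    def __new__(cls, value: str, sources: Iterable[str] = ()) -> "Tagged":
        obj = super().__new__(cls, value)
        obj.sources = frozenset(sources)
        return obj

    @staticmethod
    def _sources_of(x: object) -> FrozenSet[str]:
        return x.sources if isinstance(x, Tagged) else frozenset()

    def __add__(self, other: str) -> "Tagged":
        # tagged + anything: keep the union of both provenance sets
        return Tagged(str.__add__(self, other), self.sources | Tagged._sources_of(other))

    def __radd__(self, other: str) -> "Tagged":
        # plain str + tagged: Python calls this first because Tagged subclasses str
        return Tagged(str.__add__(other, self), Tagged._sources_of(other) | self.sources)
```

It also shows why this needs different APIs: every other string operation here (slicing, `.upper()`, formatting) silently returns a plain `str` and loses the tags unless it is overridden too.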
I can see how this would be relevant for most people living well enough, say, like middle-upper class America, who can afford these technologies and to care about the multitude of examples presented, but is there a conversation about how much data we should or need to be collecting (to say nothing of handing off to 3rd parties) at all in the first place?
I've always felt that relying and interacting with less technology (or at least making efforts within reason to) was better for my own quality of life (less tracking, less worrying about posting regrettable stuff, sticking to basic principles like "move more, eat less" instead of obsessively counting stuff on my old MyFitnessPal and Fitbit) - surely I'm not alone?
This is a valid point; I admit these are sort of first-world problems. But my main motivation for raising these issues in the first place is to learn better, process information more efficiently, and have better memory, and this is something I wish to use to work, learn and reason about things that really matter, like climate change or solving poverty.
Regarding tracking less -- I guess people are different, I do know people who are happy to just stick to 'move more, eat less'. For me personally such maintenance is boring, and looking at stuff like workout/sleep data etc really motivates me to learn more about it and keep going. I really hate going for another run but at least I'll have a datapoint after it!
I feel like the actually stressful bit is having to think about tracking. If it was done automatically and you didn't have to think about it, then why not? You'd always find people who are obsessed about doing (or not doing) things even without counting I guess.
Micropayments would go a long way toward shifting providers away from advertising and towards pay-per-use. Many users would not object to paying a fraction of a cent to read an article, especially if it let the provider drop the tracking and privacy invasion it currently relies on to make a buck.
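The arithmetic side of "a fraction of a cent" is easy to get wrong with floats. A sketch, assuming amounts are tracked as integer millionths of a dollar and that charges are batched per publisher until a settlement threshold (the threshold and names are hypothetical), since per-transfer fees would swamp individual sub-cent payments:

```python
from collections import defaultdict
from typing import Dict

MICROS_PER_CENT = 10_000  # 1 cent = $0.01 = 10,000 millionths of a dollar


class Tab:
    """Accumulate sub-cent charges per publisher; settle in whole cents past a threshold."""

    def __init__(self, settle_at_cents: int = 500) -> None:
        self.settle_at = settle_at_cents * MICROS_PER_CENT
        self.owed: Dict[str, int] = defaultdict(int)  # publisher -> micro-dollars owed

    def charge(self, publisher: str, micros: int) -> None:
        self.owed[publisher] += micros

    def due(self) -> Dict[str, int]:
        """Whole cents to pay each publisher over the threshold; the remainder carries over."""
        out: Dict[str, int] = {}
        for pub, micros in self.owed.items():
            if micros >= self.settle_at:
                cents, rest = divmod(micros, MICROS_PER_CENT)
                out[pub] = cents
                self.owed[pub] = rest  # updating an existing key during iteration is safe
        return out
```

Integer units keep the totals exact: 301 articles at 3,333 micro-dollars (about a third of a cent) each come to exactly 1,003,233, i.e. 100 cents paid and 3,233 carried forward.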
Is now the time to restart this?
Totally happy to pay for someone to maintain connections between bank APIs and a Google Spreadsheet. Curious how long they'll last.
> Why can't I search across watched youtube videos even though most of them have subtitles hence allow for full text search?
This blows my mind every time I'm on youtube... so much potential, and yet.
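Once subtitles are on disk (fetched some other way, as timestamped cues), the search itself is small. A sketch with a word-level inverted index that returns (video, timestamp) hits, so results can deep-link into the video; tokenization here is a naive lowercase regex, and all names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Cue = Tuple[float, str]  # (start time in seconds, caption text)


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


class SubtitleIndex:
    def __init__(self) -> None:
        # word -> set of (video_id, cue start time) where it occurs
        self.postings: Dict[str, Set[Tuple[str, float]]] = defaultdict(set)

    def add_video(self, video_id: str, cues: List[Cue]) -> None:
        for start, text in cues:
            for word in tokenize(text):
                self.postings[word].add((video_id, start))

    def search(self, query: str) -> List[Tuple[str, float]]:
        """Sorted (video_id, timestamp) of every cue containing all query words."""
        words = tokenize(query)
        if not words:
            return []
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(hits)
```

Matching within a single cue is the simplest choice; phrases that span two captions would need a sliding window over neighbouring cues.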
> Often, a friend recommends you a book so you want it to add to your reading list.
Yep. For a while I was collecting them in a spreadsheet. After about two years I've realized it's actually a lot more about the context of why/where/when I added a book rather than it simply existing in a long list. Even though I had "Source" and "Date Added" as columns, I (still) have no way of grouping them by topic cross-referenced with my notes.
Also, the conversation in which I received the recommendation likely has valuable context I haven't included, and good luck deep-linking to a message. (Telegram handles this OK, but Gmail? Or god forbid iMessage).
> Why can't I see what was my heart rate (i.e. excitement) and speed side by side with the video I recorded on GoPro while skiing?
Another angle I've considered: the past four (text/email/GPS) interactions with (some person) has resulted in higher stress levels ... this is an insight I typically extract from writing about my day. Would be interesting to have it suggested to me. Yes, lots of privacy implications here.
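On the heart-rate/GoPro question quoted above: assuming both devices' clocks are reasonably in sync and you know the video's start time (both assumptions, and usually the fiddly part), lining the data up is interpolation of timestamped samples at video offsets.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (unix time in seconds, heart rate in bpm), sorted by time


def hr_at(samples: List[Sample], t: float) -> Optional[float]:
    """Linearly interpolate heart rate at time t; None outside the recorded range."""
    if not samples or t < samples[0][0] or t > samples[-1][0]:
        return None
    times = [s[0] for s in samples]
    i = bisect_left(times, t)
    if times[i] == t:
        return samples[i][1]
    (t0, h0), (t1, h1) = samples[i - 1], samples[i]
    return h0 + (h1 - h0) * (t - t0) / (t1 - t0)


def overlay(samples: List[Sample], video_start: float, duration: float,
            step: float = 1.0) -> List[Tuple[float, Optional[float]]]:
    """(offset into video, heart rate) every `step` seconds, ready to draw over the footage."""
    n = int(duration // step) + 1
    return [(k * step, hr_at(samples, video_start + k * step)) for k in range(n)]
```

Rebuilding `times` on every call keeps the sketch short; for long recordings you would compute it once. Speed from GPS would be aligned the same way.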
> It's just a matter of regularly fetching new stories/comments by a person and showing new items, right?
Not sure if you've seen fraidycat [0] and the discussion [1]. Basically, a fetch-and-consume model for blogs, Twitter, etc. with frequency/priority levels.
> Why am I forced to manually copy transactions from different banking apps into a spreadsheet?
Plaid [2] looks promising, but I haven't built anything noteworthy with it yet.
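Without an aggregator, the manual route is usually CSV exports. A sketch of normalizing them into one spreadsheet-ready CSV; the bank names and column layouts in `BANK_FORMATS` are made up, since every bank's export differs and changes over time.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal
from typing import Dict, List

# Hypothetical per-bank column mappings.
BANK_FORMATS = {
    "bank_a": {"date": "Date", "date_fmt": "%d/%m/%Y", "amount": "Amount", "desc": "Description"},
    "bank_b": {"date": "Transaction Date", "date_fmt": "%Y-%m-%d", "amount": "Value", "desc": "Payee"},
}


def normalize(bank: str, csv_text: str) -> List[Dict[str, str]]:
    """Map one bank's export onto a common date/amount/description/source schema."""
    fmt = BANK_FORMATS[bank]
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "date": datetime.strptime(rec[fmt["date"]], fmt["date_fmt"]).date().isoformat(),
            "amount": str(Decimal(rec[fmt["amount"]])),  # Decimal, not float, for money
            "description": rec[fmt["desc"]].strip(),
            "source": bank,
        })
    return rows


def merge(*batches: List[Dict[str, str]]) -> str:
    """Combine normalized rows, sorted by date, into one CSV ready for a spreadsheet."""
    all_rows = sorted((r for b in batches for r in b), key=lambda r: (r["date"], r["source"]))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["date", "amount", "description", "source"])
    writer.writeheader()
    writer.writerows(all_rows)
    return out.getvalue()
```

Adding a bank then means adding one mapping entry rather than another round of copy-pasting.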
> Why can't I easily share my web or book highlights with a friend?
This is ridiculously hard. I think my favorite solution to date is to copy-paste the whole article into a Google Drive doc and annotate it. Not a good solution, I know.
> I wonder what computing pioneers like Douglas Engelbart (e.g. see Augmenting Human Intellect) or Alan Kay thought/think about it and if they'd share my disappointment.
I imagine they would be/are very upset.
0 - https://github.com/kickscondor/fraidycat/issues
1 - https://news.ycombinator.com/item?id=21802952
2 - https://plaid.com/
> Monzo API only allows to fetch all of your transactions within 5 minutes of authentication.
The referenced link states:
> After a user has authenticated, your client can fetch all of their transactions, and after 5 minutes, it can only sync the last 90 days of transactions. If you need the user’s entire transaction history, you should consider fetching and storing it right after authentication.
So I would think the sentence would be:
"Monzo API only allows to fetch the last 90 days of your transactions after 5 minutes of authentication."
...which actually seems worse.
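Given the documented behaviour quoted above (full history only shortly after authentication, the last 90 days afterwards), the workable pattern is to grab everything once, immediately, and then keep a local copy topped up. A sketch, with `list_transactions(since)` as a hypothetical stand-in for the real HTTP call:

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, List, Optional

# Hypothetical client call: returns dicts with at least "id" and "created" (aware datetime);
# since=None means "everything the API currently allows".
ListFn = Callable[[Optional[datetime]], List[dict]]


class TransactionStore:
    """Local copy keyed by transaction id, so overlapping fetches never duplicate rows."""

    def __init__(self) -> None:
        self.by_id: Dict[str, dict] = {}

    def save(self, txs: List[dict]) -> int:
        new = 0
        for tx in txs:
            if tx["id"] not in self.by_id:
                new += 1
            self.by_id[tx["id"]] = tx  # overwrite: a pending transaction may have been updated
        return new


def initial_import(store: TransactionStore, list_transactions: ListFn) -> int:
    """Run right after authentication, while the full history is still available."""
    return store.save(list_transactions(None))


def incremental_sync(store: TransactionStore, list_transactions: ListFn,
                     now: Optional[datetime] = None, window_days: int = 90) -> int:
    """Later runs: re-fetch a window kept just inside the 90-day limit and merge by id."""
    now = now or datetime.now(timezone.utc)
    since = now - timedelta(days=window_days - 1)  # one day of margin inside the limit
    return store.save(list_transactions(since))
```

As long as the sync runs at least once every 90 days, the local store ends up with the complete history even though the API no longer serves it.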