This got me thinking about personal data storage. I think step 1 in owning our data is having a place to store it. That storage should be abstracted from the actual provider(s) so we can migrate and/or replicate our data. It should also be available from multiple devices. A personal data warehouse like this should be easy to create, a la 'deploy to heroku'.
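For what it's worth, here is a minimal sketch of what "abstracted from the actual provider(s)" could look like in code. Every name in it (`BlobStore`, `MemoryStore`, `ReplicatedStore`, `migrate`) is made up for illustration; a real backend would wrap a NAS path or a cloud API rather than a dict.

```python
from abc import ABC, abstractmethod


class BlobStore(ABC):
    """Hypothetical provider-neutral interface: a key/value store for bytes."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def keys(self) -> list: ...


class MemoryStore(BlobStore):
    """Stand-in backend used only so the sketch runs on its own."""

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]  # raises KeyError if the key is missing

    def keys(self):
        return sorted(self._blobs)


class ReplicatedStore(BlobStore):
    """Writes go to every backend; reads come from the first one holding the key."""

    def __init__(self, backends) -> None:
        if not backends:
            raise ValueError("need at least one backend")
        self.backends = list(backends)

    def put(self, key, data):
        for backend in self.backends:
            backend.put(key, data)

    def get(self, key):
        for backend in self.backends:
            try:
                return backend.get(key)
            except KeyError:
                continue
        raise KeyError(key)

    def keys(self):
        return sorted({k for backend in self.backends for k in backend.keys()})


def migrate(src: BlobStore, dst: BlobStore) -> int:
    """Copy every blob from src to dst and return how many were copied."""
    count = 0
    for key in src.keys():
        dst.put(key, src.get(key))
        count += 1
    return count
```

Because applications only ever talk to `BlobStore`, switching providers is a `migrate` call and replication is just a wrapper around several backends.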
It's shocking to me how few people I've met take their personal data storage seriously. Most folks I know treat Dropbox/Drive like a landfill.
Agreed. I feel that, as much as we've seen NASs grow into devices that are vaguely approachable by enthusiasts, there is still a huge amount of ground to cover before they would be the sort of thing that one could easily deploy in their parents' house.
There has been piecemeal progress in swinging the pendulum back from cloud-everything to easier-to-use edge computing. The Helm email server is one example. The slightly more plug-and-play approach to modern NASs is another. And there are others. But you can tell that the bulk of R&D spending is not going here yet. I do think investors will eventually wake up and realize that user demand for data control means better edge devices and avoiding reliance on the centralized cloud.
What I have envisioned for PAO [1] is federated encrypted backup. I would like to see NASs allow me to basically allocate a percentage of my capacity to various peers to store encrypted-at-rest duplicates of their data. And vice-versa. Basically a federated mesh. No need for blockchain or other crypto-hype nonsense. Just straight authenticated and encrypted file storage.
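A rough sketch of the host side of that idea, under some loud assumptions: the peer encrypts its data before upload (Python's standard library has no cipher, so the host here only ever sees opaque ciphertext), each peer shares an HMAC key with the host for authentication, and the names `PeerVault`, `sign`, and `QuotaExceeded` are hypothetical.

```python
import hashlib
import hmac


class QuotaExceeded(Exception):
    """Raised when a peer tries to store more than its allocated share."""


def sign(key: bytes, blob_id: str, ciphertext: bytes) -> bytes:
    """HMAC-SHA256 tag over the blob id and the (already encrypted) payload."""
    return hmac.new(key, blob_id.encode() + b"\0" + ciphertext, hashlib.sha256).digest()


class PeerVault:
    """Hypothetical host-side store that lends out percentages of its capacity."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self._percent = {}  # peer -> percent of capacity allocated
        self._keys = {}     # peer -> shared authentication key
        self._blobs = {}    # peer -> {blob_id: ciphertext}

    def add_peer(self, peer: str, percent: int, auth_key: bytes) -> None:
        others = sum(p for name, p in self._percent.items() if name != peer)
        if percent <= 0 or others + percent > 100:
            raise ValueError("allocations must be positive and total at most 100%")
        self._percent[peer] = percent
        self._keys[peer] = auth_key
        self._blobs.setdefault(peer, {})

    def quota(self, peer: str) -> int:
        return self.capacity_bytes * self._percent[peer] // 100

    def used(self, peer: str) -> int:
        return sum(len(blob) for blob in self._blobs[peer].values())

    def store(self, peer: str, blob_id: str, ciphertext: bytes, tag: bytes) -> None:
        # Authenticate first: only the peer holding the shared key can write.
        if not hmac.compare_digest(sign(self._keys[peer], blob_id, ciphertext), tag):
            raise PermissionError("bad authentication tag")
        # Overwriting an existing blob frees its old size before the quota check.
        old_size = len(self._blobs[peer].get(blob_id, b""))
        if self.used(peer) - old_size + len(ciphertext) > self.quota(peer):
            raise QuotaExceeded(peer)
        self._blobs[peer][blob_id] = ciphertext

    def fetch(self, peer: str, blob_id: str) -> bytes:
        return self._blobs[peer][blob_id]
```

The host never needs a decryption key; it only checks who is writing and how much room they have left.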
My opinion is that cloud dominance really traces back to the advent and self-reinforcing power of asymmetric Internet connectivity. When Internet connectivity was often symmetric (think the very early days of DSL), peer-to-peer networking remained very common. As the number of users on asymmetric connections grew, the asymmetry was reinforced as more services centralized data and content. Peer-to-peer is effectively a relic of the past now. Only recently have we started to see some resurgence of symmetric connectivity (e.g., 1Gbps symmetric fiber). I believe deployment of symmetric connectivity will be a decentralizing force as more people realize it's possible to just access your file system and data directly between devices rather than use an intermediary, and as vendors realize this is an opportunity space to offer interesting technology (e.g., the likes of ZeroTier) to consumers.
> I do think investors will eventually wake up and realize that user demand for data control means better edge devices and avoiding reliance on the centralized cloud.
Hear hear! We need to not forget the core lesson of the internet, which is that centralization is a weakness. I have not been happy with the increasing trend towards centralization in tech, and I agree that people taking more ownership is going to mean more edge devices.
The hardware is there, it's the software that needs to catch up.
> I would like to see NASs allow me to basically allocate a percentage of my capacity to various peers to store encrypted-at-rest duplicates of their data
This is something I've been thinking about, too. We have so many Internet-connected devices with increasingly cheap storage. Some universal protocol for distributing data across this network would be really cool. (I understand this could sound like blockchain. I have no horse in that race.)
I would worry about the liability implications of such an approach. If one of those peers is storing (say) child porn, and some of that porn ends up on my hardware via automatic duplication, does that implicate me legally in their crime? If it does, does it matter whether it's on my hardware with or without my knowledge? If the police identify my hardware as part of that peer's storage network, is that hardware at risk of being searched and/or seized? That sort of thing.
There's no setup on the user's part. For the developer, it allows you to abstract the datastore and give ownership back to the user - the developer doesn't have access to the private databases of its users. It's possible to build completely client side apps with syncing between devices without ever being exposed to the user's content.
Looks like CloudKit JS also exists for the web. I'm not sure if it allows the user to export directly, but that would help guarantee users weren't trapped.
All that said, it's tied to the Apple ecosystem. An independent service with similar features and a large enough community would be interesting.
It is definitely concerning. 10 years ago I wasn't worried about putting my data in things like Dropbox because the general fear was "what if they go out of business?" to which I felt reasonably secure that I'd have time to get my data out of there before they shut down for good.
Now, it's much more likely that you'll get a "You have violated our Terms of Service and are banned from using <the service>", pointing you to a line that looks like: "The company may, at its sole discretion, decide what constitutes a violation of the Terms of Service, and terminate services as a result." It happens!
Additionally, depending on any small set of cloud providers makes you vulnerable to the powers that be if they ever decide that your account will be shut down without opportunity for appeal. How far up a creek would the average user be if their Dropbox, Google Drive or Amazon storage disappeared without an opportunity to fetch it first?
We need something like a geographically distributed coalition of storage that lets you provide storage space for others in exchange for remote storage of your own data. Then data can be replicated into multiple locations and roughly be secured in a mutually-assured-destruction sense (if you take down the replica of my data, you lose your replica on my site).
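To make the incentive concrete, here is a toy model of that exchange. It is only a sketch: `Coalition` and its methods are invented names, sizes are plain byte counts, and real retaliation would need some way to detect that a replica was actually dropped.

```python
from collections import defaultdict


class Coalition:
    """Toy ledger of reciprocal storage agreements between sites."""

    def __init__(self) -> None:
        # hosted[host][owner] = number of bytes of owner's data held by host
        self.hosted = defaultdict(dict)

    def exchange(self, a: str, b: str, size: int) -> None:
        """a and b each agree to hold `size` bytes for the other."""
        self.hosted[a][b] = size
        self.hosted[b][a] = size

    def drop(self, host: str, owner: str) -> None:
        """host deletes owner's replica, so owner deletes host's in return."""
        self.hosted[host].pop(owner, None)
        self.hosted[owner].pop(host, None)

    def replicas(self, owner: str) -> list:
        """Sites that currently hold a copy of owner's data."""
        return sorted(h for h, held in self.hosted.items() if owner in held)


coalition = Coalition()
coalition.exchange("alice", "bob", 500)
coalition.exchange("alice", "carol", 500)
coalition.drop("bob", "alice")  # bob stops hosting alice and loses his own replica
```

After the drop, alice still has a replica at carol, while bob has none anywhere, which is the "you lose your replication on my site" part.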
sort of been on my mind lately because like you say, should be easy ;) snapshot of current thinking:
-some open pit data mining/management protocol exists and scrapes data out of your own personal forward proxy / metal that lives on the edge of fat bandwidth, you can do whatever you want with it, autogenerate bookmarks and forum interaction tags if you want (hadn't thought of that,) .. including not store it, because the software that stores it is part of a personally owned open source platform that is also providing all the cloud services that you normally go to third parties to obtain
..so..
-the basics are baked in, it's got your social media/self-promotional pages that are interoperable with others, an online store, search/index peers are essentially friends on social networks.. it gets foggy, how much granularity? what sort of resource commit to the forward cache? anonymization routines? regional compliance issues? capability to sell dataset(?) like, who would actually use it?
.. etc ..
-if something like this were actually to organize i think it would be best visualized as some sort of platform in support of some server-farm co-op. also i keep thinking of openstack being overkill, somehow, and am likely wrong.
-for a personal user on a watered down feature-set that isn't supporting a large organization and still elects to own their own bare metal, it would be like .. two netbooks in a post-office-box housing place that has fiber ..
rather than restricting the software from commercialization by license it explicitly commercializes everything on behalf of the user ..
I'm partially being snarky. By offering opt-in backwards-compatibility with 'ye olde establishment'. Also intending to offer a path forward for businesses already relying on the business model, for users expecting products relying on the business model. I know I'm playing with a pipe-dream-sci-fi-quantum leap in the relationship between the end-user and the internet ..
... so I'm trying to be snarky and also fair and thus hopefully incentivize existing entities to implement the protocol and use it in order to take bites out of big data in manageable bits without setting everything on immediate fire. The folks who write apps with a bent on data-mining may be open to something more provider independent in order to draw in users.
also .. half-baked early adopter:
.. you are a streaming content author or have an online shop and want your content to be redistributed, and are willing to make some metadata deals in order to do so. you are peered with dozens of indexes and some of them require different participation levels, maybe you have shipping partnerships, you work with some online labels or other profit-sharing outlets and this useful metadata associated with traffic that content has generated in your PO box is requested by these partners. So, the parts of your "forward-proxy-cache" that were relevant in these transactions would want appropriate taxonomy in order to facilitate ongoing partnership. I see users on the internet who like targeted ads, I know people in reality who like shopping.. I dream of a world where they all get better hobbies but I'm not trying to judge. ;)
personal forward-proxy.. a reckless way of putting it, also s'/sold/shared/' where users are hopefully suddenly tuned into the reality that once something is copied out into the public domain..
Even as someone who has several terabytes sitting next to me in a homelab, I still leverage Dropbox. You need something with strong integrations into other services to treat like a kind of internet RAM for short term storage.
My current approach is to have a nightly job which pulls my Dropbox and other cloud storage into my local storage, but I'm planning to look into an S3-compatible service to see if that integrates well enough for my needs.
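Not the actual job described above, just a minimal sketch of what such a pull could look like, assuming the cloud storage is already visible as a local folder (for example a sync client's directory). The function name `pull` and both paths are hypothetical.

```python
import shutil
from pathlib import Path


def pull(source: Path, dest: Path) -> list:
    """One-way copy of new or changed files from source into dest.

    Nothing is ever deleted from dest, so a file removed from the cloud
    side survives locally. Returns the relative paths that were copied.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = dest / rel
        if dst.exists():
            s, d = src.stat(), dst.stat()
            # Skip files whose size matches and whose local copy is not older.
            if s.st_size == d.st_size and int(s.st_mtime) <= int(d.st_mtime):
                continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(rel.as_posix())
    return copied
```

Run from cron or a systemd timer, something like `pull(Path.home() / "Dropbox", Path("/mnt/nas/dropbox-mirror"))` would do the nightly part; both paths are placeholders.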