Yup. Calling it. This is the future. It may not seem like it to everyone, but this is genuinely new and extremely useful, passively scalable technology. Imagine (good) unkillable zombie databases: so long as the name of a piece of data is known (its hash), someone, somewhere, might make it possible for you to answer your query, without you even needing to set up a server.. that's _it_! Not to mention it might be faster than the original source.
Pretty damn powerful. The next level above this is a search index that lets users generate their own results, using their own machine learning algorithm or ranking weights tuned to their own preferences, because they would have direct access to the DB index and features. People could write anti-ad plugins. There could be FOSS upgrades all the time. Nobody would have to spend a crazy amount of money on storage if everyone just cached the bits they needed themselves. Quite remarkable imo!! Whatever ends Google search's reign will probably be user-owned in a way that looks a lot like this.
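To make the "own ranking weights" idea concrete, here is a minimal sketch assuming the client already has a few per-result features pulled from the local index; every field name and weight below is hypothetical.

    // Toy client-side re-ranking: the user supplies the weights, no server involved.
    interface IndexedResult {
      title: string;
      seeders: number;    // hypothetical feature from the local index
      ageDays: number;    // hypothetical feature
      textScore: number;  // hypothetical relevance score
    }

    type Weights = { relevance: number; seeders: number; freshness: number };

    function score(r: IndexedResult, w: Weights): number {
      return w.relevance * r.textScore
           + w.seeders * Math.log1p(r.seeders)
           + w.freshness / (1 + r.ageDays);
    }

    function rank(results: IndexedResult[], w: Weights): IndexedResult[] {
      return [...results].sort((a, b) => score(b, w) - score(a, w));
    }

    // Each user picks their own weights; swapping them out is just a config change.
    const myWeights: Weights = { relevance: 1.0, seeders: 0.5, freshness: 0.2 };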
Unfortunately this is still killable. Someone needs to periodically publish up-to-date databases using some outside protocol. Also, if you participate in the public DHT, then your participation is public and can have consequences for your real-life legal entity.
If you want to use TOR and onion sites, I don't think this really adds anything to those.
I think this just helps you optionally crowdsource bandwidth.
With a simple protocol on top it can be quite resilient. The database is updated only by a group of 3 admins, who perform the update once a week. When one admin initiates an update, the other two ask him for the passphrase. If the answer is "tomato" (meaning "I'm compromised"), the update is rejected and they find another admin. Regular users are generally safe: the laws don't punish downloaders, only uploaders. You can expand the admin team to 5 members, so the RIAA (or whoever stalks torrent sites) would have to compromise 2 admins simultaneously. Don't forget that the admins can be in different countries, making their legal pursuit a nightmare.
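A minimal sketch of that duress-word check, assuming some out-of-band channel for the challenge; the Admin interface, the channel, and the word itself are all placeholders.

    // The other admins challenge the initiator; any duress answer kills the update.
    interface Admin {
      name: string;
      challengeInitiator(): Promise<string>; // asks the initiator for the passphrase out of band
    }

    const DURESS_WORD = "tomato"; // means "I'm compromised"

    async function approveUpdate(initiatorName: string, verifiers: Admin[]): Promise<boolean> {
      for (const verifier of verifiers) {
        const answer = await verifier.challengeInitiator();
        if (answer === DURESS_WORD) {
          console.log(`${initiatorName} signalled compromise; rejecting update, rotate admins`);
          return false;
        }
      }
      return true; // all challenges answered normally: publish the weekly update
    }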
That's mostly important due to social factors. Technically you'll find that most torrents are seeded by people who seed a lot, while most users just drive-by leech it.
What makes torrents special is that all leechers CAN be seeders; they don't have to be, though.
At least in the Czech Republic, there is no penalty for downloading copyrighted content. You are, by law, entitled to it, as you may make yourself a copy of what you already own. Bought a film on DVD? You can legally download it from wherever.
This all ends once you start uploading it, though. You can't upload it, period. The problem with torrents is there's no checkbox to signal "strictly no upload"; you can only set the upload limit to a few kB/s to minimize the issue, but technically even a few bits can still get you prosecuted.
(edit: the following is wrong, perhaps Germany is the same as the Czech Republic, see the child comment) In Germany, it's a different story. You can't even download, not by torrent or even from sites. I know of people receiving fines in their mailboxes with an IP/date/site/material list, fining each piece at ~200€.
Your information about Germany is incorrect. The legal situation here is the same as what you described for the Czech Republic. You can download anything anywhere. What's illegal is making stuff publicly available, and BitTorrent uploading falls under that.
Source: was caught and got counsel from a copyright lawyer.
Pretty sure you can’t download either. But the fines would be very low, so instead they go after uploaders, convince the court it’s "commercial distribution", values skyrocket and for a small 1000€ fee you can make the problem go away.
ime correct: individual downloaders will not be prosecuted unless somewhat high-profile, institutional/commercial users (vps) will only get a cease and desist. uploaders are fair game.
It also helps that it's pretty hard to figure out who downloads. With torrents, you'll always upload as well (outside of using specific clients); with websites, they have no data.
Thanks for the correction! Looks like the letters sent were torrent-related, then.
I also knew that most of the sites to download from were being blocked at every possible level (ISPs, landlord...), so I connected those two in the wrong way.
> At least in Czech republic, there is no penalty for downloading copyrighted content
This is not correct - you still make a copy, and the copyright holder could still sue you over it; it's just not worth doing. It's not legal to do so - the Czech Republic is a signatory to the Berne Convention - it's just not a criminal offense. Which, to be honest, is the situation in virtually every country in the world.
I have another reply in this same thread describing something I'm working on, so if you're curious you can find more details there. The way to solve this problem is through indirection.
You don't expose your data layer directly to the consumer; you expose an API that will resolve to one, two, or several databases from one or many peers. The indirection allows you to define your rules and use your data layer in a way that fits your application goals as well as possible.
So what's immutable is your application, API, and initial data, which you can mutate at later stages through other torrents or by consuming other APIs from other peers and mutating your initial database state.
The problem is that the current browser is not meant for this, and JavaScript is not meant for bigger, more complex applications (of course you can do it, but...).
This idea touches on (or rather, even expands on) an idea I've had brewing for a while. For those unfamiliar, there is a programming language called Unison [1] that has a nice feature where each function is identified by a hash of its AST. I often thought a cluster environment, maybe something BEAM-ish, would be an interesting idea where functions are retrievable based on a pair of (`func-name`, `hash`). You could have some kind of execution environment (maybe a lightweight process) that receives a hash identifying the function to execute and some pointer to the data (probably a URL) to act as input.
Given tech like that described in the linked article, you could reasonably look up such a function from a peer-to-peer network.
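A rough sketch of that lookup-then-execute step; fetchFromSwarm stands in for whatever peer-to-peer retrieval layer you have (say, a DHT keyed by content hash), and none of this is a real library API.

    import { createHash } from "node:crypto";

    // A function reference is its name plus a hash of its code
    // (Unison hashes the AST; hashing the source text here is a simplification).
    type FuncRef = { name: string; hash: string };

    declare function fetchFromSwarm(key: string): Promise<string>; // hypothetical p2p lookup

    async function execute(ref: FuncRef, inputUrl: string): Promise<unknown> {
      const source = await fetchFromSwarm(ref.hash);
      // Verify the peers actually returned the function we asked for.
      const actual = createHash("sha256").update(source).digest("hex");
      if (actual !== ref.hash) throw new Error(`hash mismatch for ${ref.name}`);
      // Illustration only: a real system would compile and sandbox this, not eval it.
      const fn = new Function("inputUrl", `return (${source})(inputUrl);`);
      return fn(inputUrl);
    }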
This approach to mobile code in functional languages has been tried repeatedly since the 1990s.
> each function is identified by a hash of its AST.
That encoding for mobile code only works if the expression has no free variables (i.e. is closed).
It turns out that it is surprisingly difficult to write code which you can be sure has no free variables at particular points (the "send this code to another machine for execution" points). Especially in functional languages. If you write a higher-order function, you know nothing about the closedness of its argument. If you require that all function arguments be closed, it breaks all the useful functional programming techniques. The only way to make it workable is to check closedness only on those values which need to be mobile.
If you do the closedness-check at runtime, toy examples will work fine and the technique looks quite powerful, but once your codebase gets to a meaningful level of complexity it turns into a game of whack-a-mole with closedness heisenbugs. You quickly discover that closedness is data-dependent.
In order to check closedness statically you need quite a sophisticated system of modal types. MetaOCaml was the most usable result of all this research:
The important upshot here is that this isn't just some check you can slap onto the compiler. The programmer has to think about these types, and craft them carefully, as an integral part of the programming process. Basically you aren't just writing a program, you're writing a program plus a proof that it won't try to mobilize open code. Writing formal machine-checked proofs is not something that most programmers are good at.
I agree a complete system capable of something as magical sounding as I described would be both extremely complex to implement and probably even a nightmare to program within.
As to the specific discussion of free variables and open/closed arguments, I have to admit I have no background or education in the theory behind programming languages. Other than going through about 50% of SICP about a decade ago, I also have very little experience with functional languages.
What has been fuelling this interest lately is learning a bit more about stack based languages like Forth. My extremely primitive understanding of such programming models suggests that a function/word in that kind of programming context is closed over some defined portion of the stack. So my naive mind considers that one could grab as much stack as necessary along with the word to be executed and just pipe that over to some other execution context. Of course, details matter and there are probably several important ones that I haven't even considered that would make this naive assumption border on impossible. However, it at least seems more reasonable to attempt than crawling through a heap trying to gather everything.
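For what it's worth, here is a toy illustration of that naive "ship the word plus the stack slice it consumes" idea; a real Forth is nothing like this simple, and every name below is made up.

    type Stack = number[];
    type Word = { name: string; arity: number; body: (args: Stack) => Stack };

    const add: Word = { name: "+", arity: 2, body: ([a, b]) => [a + b] };

    // Grab exactly the part of the stack the word needs and serialize both together.
    function packageForRemote(word: Word, stack: Stack) {
      return { word: word.name, stack: stack.slice(stack.length - word.arity) };
    }

    // On the other side, look the word up in the local dictionary and run it on the shipped slice.
    function runRemote(payload: { word: string; stack: Stack }, dictionary: Map<string, Word>): Stack {
      const word = dictionary.get(payload.word);
      if (!word) throw new Error(`unknown word ${payload.word}`);
      return word.body(payload.stack);
    }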
In pure functional programs it's possible to copy the state (monad/environment/free variables), but it's not always efficient, depending on what those are.
In general programs, doesn't have to be functional, the environment is stateful and often has abstract, black box processes. This can be transferred as well, by copying some things, transforming others, and in general where necessary using a two-way pipe of some kind. By analogy, imagine creating a WebSocket at the same time as making a JSON-RPC call.
Except for the black boxes, this kind of transfer can be taken further to make a general distributed system. Think things like Paxos, Raft, CRDTs. If the data is also mobile to where it is used, this performs well for many tasks. When that is done, in some ways everything comes back to messages and state again and doesn't need specific pipes. That's more robust than the simple form of transfer.
The resulting architecture is often much faster than the non-distributed version for many tasks, and robust, but it takes a lot of tricky parts to make a general purpose architecture with those qualities and stay high performance.
You can copy them in a pure functional program if the function AST you're transporting is sufficiently expressive, because the values of the unbound free variables in pure FP are available at the point where the function is defined, so can be inlined into it, and included in its hash.
But when it's not pure FP, then indeed you can't copy the values into the function AST, because the environment at the point of each call to the function is what matters.
(The pipe stuff is just for making it more efficient, and handling black boxes which sit outside the paradigm of concrete values, things like I/O monads and devices).
How do you copy a variable? You can only copy values. A free variable has no value. You're confusing capture with closedness.
Here's a trivial example:
\f -> \y -> (runRemotely (\x -> f y x))
The expression passed to runRemotely has a free variable "y". How are you going to serialize (\x -> f y x) in order to send it across the network? When you hit the "y", what are you going to do?
For this trivial minimalist example you might cook up a one-off hack like lambda-abstracting the free variables in the runRemotely expression and then reapplying to the returned value, but there are much more complicated and insidious examples where these workarounds don't work.
The media control barons have spent the past decade building legal strategies for taking down promising services. Personally I also feel like this is the future, but only in a future where we dismantle their stranglehold on culture.
Anyone who shares content (whether a file or a byte) exposes themselves in one way or another and can be targeted. Anyone else remember the “good old days” when torrent sites stood up in the face of the law and yelled “we host nothing illegal because we host nothing!” and they were right. Then the law firms started joining the swarms to collect IP addresses of everyone that shared with them.
Now they rarely bother trying to sue IP addresses. The legal ability is still there, it just turns out it’s not worth the cost to sometimes collect a few hundred dollars from someone who can’t afford it.
When we talk about transformative tech, we have to immediately consider the “standard scenarios” because being blind to them is how most technology trips and falls. Pirating (especially live PPV events), illegal porn, utterly violent videos, information on making weapons, letting children access same and/or mixing them in with adults in unsafe and unsupervised spaces, openly anti-government statements in countries where they make people disappear, and so-on.
These things happen. And they happen faster and more intensely than most programmers can handle. The only way to keep tools like this viable is to build the tech with those in mind.
> so long as the name of a piece of data is known (its hash)
But that's also the fatal flaw. It's pretty definitionally true of stuff on the web that the person who owns / controls it wants to be able to change / update it. That isn't really supported at all at present for this distribution method, and it's hard to see how it could be in the future without introducing a single choke point.
Case in point: the vast majority of torrent traffic is for new torrents, because people want to download the stuff that's just been released, not the stuff that they either already have or could have had months ago. You lose 100% of this traffic with this "p2psearch" method, because the database can't be updated. Or if it can be updated (stick it on a traditional website for people to mirror), you rely on everyone updating to the newest version of the database.
It's also incredibly slow compared to traditional sites. It took longer than it took me to type this comment to see a single result for the popular title I tried to search for.
despite the downvotes, there is some substance to this. mutability is useful.
however, as implemented, all that is needed is for the user of p2psearch to refresh and the browser to pick up the latest database. i imagine most users are not keeping torrent search open 24/7, so this doesn't seem onerous.
it's probably a bit of a process for the host of the frontend to update the database, prepare a new torrent, update the code [0], and then rebuild the bundle regularly, but this could be automated.
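a rough sketch of that automation, assuming a node host and the webtorrent package; the file paths and the way the frontend bundle picks up the new magnet link are my assumptions, not how p2psearch actually does it.

    import fs from "node:fs";
    import WebTorrent from "webtorrent";

    const client = new WebTorrent();

    // Seed the freshly built database dump and capture its magnet URI.
    client.seed("./build/index.sqlite3", (torrent) => {
      console.log("seeding new database:", torrent.infoHash);
      // Drop the magnet link where the frontend build step can embed it (hypothetical path).
      fs.writeFileSync("./src/db-magnet.json", JSON.stringify({ magnet: torrent.magnetURI }));
      // ...then rebuild and redeploy the static bundle (left out here).
    });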
regardless, it doesn't seem so unreasonable from an end user perspective, and i personally don't mind if my torrent search index is behind by a few days.
> it's probably a bit of a process for the host of the frontend to update the database
Right, I suppose I didn't put my point very well, but as I see it there are only two options:
* Everyone gets their code (with the built-in torrent for the database) from the same source, presumably the source that created / seeds the database. In this case the host is arguably just as vulnerable to the authorities as The Pirate Bay. If p2psearch doesn't make torrenting more resilient than TPB, what purpose does it serve?
* One brave citizen creates and hosts a single copy of the database, and it goes viral. Lots of people host it. It never gets updated, because the whole point of using IPFS / torrents is that everything can be addressed statically. In this case the advantage of p2psearch is obvious - it functions as a DHT with a built-in search that is more or less impossible to take down! On the other hand, the lack of mutability greatly reduces the value, since as I said in my OP most torrenting is focused on new releases.
It worked instantly for me. As good as anything else.
You get other features with it for free. Popular chunks will be easier to obtain as many have them and preservation starts with rare pieces. You'd want common use to be fast and uncommon to complete eventually.
If a new torrent uses the same folder name, the new files will appear next to the old ones; if so desired, the old files may be duplicated into the new torrent.
It's not IPFS, but it works.
If you can get a person or organization to sign off on the data, I'd say it's a feature rather than a bug?
Let the professor publish his data set and let interested parties store it without much effort.
The current internet is terrible at archiving anything it creates. Dead links, impossible to find videos if they get deleted for some reason, news articles that get memory-holed, easier to find archives of paper newspapers than of online newspapers... That's why so many people tweet screenshots of other tweets rather than linking them directly, the network relationship may break at any moment, so better just to post an image rather than link information.
In the Napster/Torrent era once I found a video, document or audio I was interested in, it was very easy to find again without having to worry about archiving it myself.
This isn't a static site. It is dynamic, requiring a lot of frameworks. I can think of a lot of better ways to accomplish something similar, especially with IPFS.
First, CAS (content-addressable storage) is nothing new. Torrents aren't new and are basically a distributed CAS database. With a magnet link (basically a checksum) you can find those files hosted by any torrent client on the planet.
So sure, things are "unkillable", but they depend on someone/somewhere deciding it's worth storing that particular chunk. Seems strange to pair "unkillable" with something that depends on someone, somewhere, who "might" do something.
Much like how IPFS was easily oversold: great, you can find things hosted anywhere on the planet, but if you publish 1M files and expect to magically see them hosted elsewhere a year later, you are likely to be disappointed.
I do wonder if a better approach would be to replace Filecoin or similar complex trust relationships with a simple peer-to-peer trading program. Something along the lines of "Let's trade 128MB", trust but verify, then watch uptime/availability. For clients up less than a month, give them the free 128MB and watch; 1-3 months, trust them enough to store data with 20x replication; 3-12 months, 10x replication; over a year, 5x; whitelisted peers of friends/family, 3x.
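In code that tiering is just a lookup from observed uptime to replication factor; a minimal sketch using the thresholds from the comment (the Peer shape is hypothetical).

    interface Peer { id: string; uptimeMonths: number; whitelisted: boolean }

    // Less trust means more copies elsewhere; whitelisted friends/family need the fewest.
    function replicationFactor(peer: Peer): number {
      if (peer.whitelisted) return 3;
      if (peer.uptimeMonths >= 12) return 5;
      if (peer.uptimeMonths >= 3) return 10;
      if (peer.uptimeMonths >= 1) return 20;
      return 0; // still in the free-128MB probation window: just watch, don't rely on it
    }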
I've built something akin to this, but with the idea of applications distributing APIs, where the databases are also torrented but sit behind the APIs, so that developers can build basically anything.
In my case I've implemented a new "browser" based on Chrome that allows this to work without having to resort to browser-only infrastructure (for instance, applications can dodge JavaScript and also call RPC APIs from other applications directly).
The applications and the application data are distributed over torrent and managed to work together in the same environment as a flock, where one app can consume its own APIs and also the APIs of others.
    service Search {
      rpc doQuery(string query) => (array<string> result) // the access to the sqlite db from torrents is encapsulated here
    }
The beauty of this design is that requests can also be re-scheduled, with the same request routed to other peers.
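From the consumer side that rescheduling can be as simple as trying peers in turn; a hedged sketch where Peer, callRpc, and the Search service wiring are placeholders, not the project's actual API.

    interface Peer {
      id: string;
      callRpc(service: string, method: string, args: unknown[]): Promise<unknown>;
    }

    async function doQuery(peers: Peer[], query: string): Promise<string[]> {
      for (const peer of peers) {
        try {
          // Any peer hosting the Search service (and its torrented sqlite db) can answer.
          return (await peer.callRpc("Search", "doQuery", [query])) as string[];
        } catch {
          continue; // peer unreachable or overloaded: re-schedule to the next one
        }
      }
      throw new Error("no peer could answer the query");
    }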
I know everyone is fed up with the cliche of web3, but having done some side projects with web3 (as in npm i web3) and ipfs, the posted article here is very exciting.
This posted link could be another piece of the jigsaw (Solidity, IPFS, IPNS, ...) that I think will come together to make interesting apps in the future. Solidity doesn't mean necessarily spending $100 to make a function call - there are other chains, off chain stuff being developed, and you could host a private chain for your app.
While none of this stuff can do anything new that you can't do with Postgres, you can create more open and perhaps 'honest' applications where everyone can see the data engine and understand it. So it is more of a cultural shift. For example, if you make a Twitter this way, you don't need to rely on an API. The data is there for everyone, all of it.
For example, take Uniswap. It's not a company like Facebook, it's an open protocol. Swapping tokens is now functionally open source because of that. There is no "Facebook of swapping tokens". And this can be used as a building block for other apps. Not necessarily just "gambling" or "trading" either.
Even as someone who profited well from Uniswap's initial token offering I still have to say that "swapping tokens" is an entirely useless functionality. Gambling, trading, scamming, and laundering are the only realized uses of crypto I have seen since I became involved with the ecosystem a decade ago. The fatal flaw of the decentralization argument is that humans are involved. People want to be able to negotiate chargebacks, even if it is a hassle. People are willing to surrender anonymity in exchange for forms of credit. All organizations require trusted parties. DAOs in a real world would still require forms of centralized trust to perform (AWS, Cloudflare, ...) and little stops one member from going rogue, stealing IP, and starting his own (legally recognized) business. This is before you even start opening an Econ 101 textbook to page 1.
While I generally agree with the principle that there is a conflict between things people say they want in terms of anonymity etc and what they do want (security for chargebacks, credit etc etc), there are still increasing trade use cases, especially around sex, that are heavily discriminated against.
What you're describing is so 101-level. I don't mean that in an insulting way, but in bewilderment at the typical criticisms from within tech, from which you'd think what you're describing is impossible mythology or completely off the mark.
Transparency in business is a hugely interesting area; look at efforts around equal exchange and related schemes for honest products that get workers paid. New financial instruments for ensuring this transparently to funders (whether a commercial transaction/purchaser, public goods funding, or other sources) are potentially transformative on a political scale, or at least more scalable than other efforts, with less dilution by the practicalities of traditional commerce. Maybe I'll have more words to share when I can express it through my current project
(which I must add because of concerningly wide quickness to attack, is not eth or ecologically impactful tech)
Yes. The web3 name is being used by crypto-shysters right now, so unfortunately they have the lion's share of web3 implementation, as they call it. But this (OP) is real web3!
No. This thing puts the user first (forgive my corpspeak): it gives us a powerful tool for free. Web3, on the other hand, puts the advertisers first. If this was web3, it would have some useless microtransactions attached to it. I bet that someone behind web3 is thinking hard right now how to stop this torrent thing.
I don't think so. This still relies on non-distributed infrastructure to host the code and provide the database torrent. Those are not "unkillable". And an inefficient completely immutable database where even individual queries are not private does not sound like it has a lot of uses.
Unkillable zombie databases would be an amazing asset for fanfiction & art. Not only does it become slightly more C&D-resistant than normal hosting, it also protects against the far more common scenario: artists and authors blowing their lid and ragequitting by deleting all their work.
My biggest issue with storing data inside the browser is the lack of tools to set up filter rules for clearing or explicitly not clearing website-specific application data.
Looking at Chrome's Inspector, Application Storage has "Local Storage", "Session Storage", "IndexedDB", "WebSQL" and "Cookies".
If I decide to clear all this data stored for all pages with one button press in the settings, how can I be sure I'm not deleting something important? Or if I install a new extension which helps me manage privacy, how can I be sure that it won't delete all this data because I forgot to add a site to its whitelist?
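One partial mitigation, as a sketch rather than a full answer: the standard StorageManager API lets a page ask the browser to mark its storage as persistent, which exempts it from automatic eviction under storage pressure (a user or an extension can of course still clear it explicitly).

    // Ask the browser to treat this origin's storage (IndexedDB, Cache, etc.) as persistent.
    async function requestPersistence(): Promise<boolean> {
      if (!navigator.storage?.persist) return false; // API not available in this browser
      const already = await navigator.storage.persisted();
      return already || (await navigator.storage.persist()); // browser may still deny
    }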
I don't think the average user ever deletes any of their browser data. In this case, however, the whole idea is that there is redundancy and it is OK for you to clear your cache and storage. As for the host, they would probably use a Node.js-type server to host their files instead of relying on having a browser tab open.
However, I do agree that relying on regular web browsers is not a good idea. A browser update could limit resources of background tabs even more than they normally do and soon your application is down.
One of the flaws that traditional torrents suffer from is that if all the downloaders want 1 file out of 10 and thus seed 1/10, the torrent can never again be fully downloaded once the last seeder with 100% drops off.
I suspect this is not an issue for the demo since the DB seeding is probably being monitored, but as a technique? I could definitely see chunks of a DB not being available due to no one seeding 100%
Not in this case. Visitors to this site are only pulling data from a torrent, not hosting any data for other visitors.
If you want to help host the torrent then you'd be seeding the whole database by default. Torrents are split into blocks and there is no easy way to choose to only download/seed specific blocks of a single file.
Inspired by this, can anyone explain why distributed protocols more often opt for centralized consensus algorithms like Raft, instead of decentralized schemes like Chord or Kademlia? In all cases, the underlying data structure is a shared key/value store. Intuitively, the p2p approach feels more robust, since each node only needs to worry about itself, and every node is the same. So why add the coordinator node? Is it still the right choice in $current_year, even after so many hours of development invested into strong p2p consensus protocols like libp2p powering Ethereum (currently ~11k nodes btw, not actually that big – and many operated by a small number of entities)?
Obviously you don't need consensus protocols if you are not trying to build consensus...
It's like asking why we need filesystems and network cards if all we're trying to do is show lights on screens. Obviously not all patterns of lights are as easy to produce on computers.
I'm not talking about the consensus protocol of the blockchain itself, but of the p2p algorithms underlying it, e.g. using Kademlia for service discovery and message routing. I'm asking why a distributed system would choose something like Consul (which uses Raft, and requires a coordinator node) instead of running a decentralized protocol like Kademlia (which has no coordinator nodes) within their distributed single-tenant environment.
I did a bit more research last night, and discovered that Bitfinex actually does something like this internally (anyone know if this is up to date?) [0] — they built a service discovery mesh by storing arbitrary data on a DHT implementing BEP44 (using webtorrent/bittorrent-dht [1]).
This seems pretty cool to me, and IMO any modern distributed system should consider running decentralized protocols to benefit from their robustness properties. Deploying a node to a decentralized protocol requires no coordination or orchestration, aside from it simply joining the network. Scaling a service is as simple as joining a node to the network and announcing its availability as an implementation of that service.
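For reference, the plain announce/lookup flavor of this (BEP5 rather than BEP44 mutable items) is only a few lines with the bittorrent-dht package; hashing a service name into a 20-byte key is my own assumption here, not Bitfinex's actual scheme.

    import DHT from "bittorrent-dht";
    import { createHash } from "node:crypto";

    const dht = new DHT();
    const serviceKey = createHash("sha1").update("users-api-v1").digest(); // 20-byte key for the service

    dht.listen(20001, () => {
      // A node offering the service announces itself (with its port) under the service key.
      dht.announce(serviceKey, 8080, (err) => {
        if (err) console.error("announce failed", err);
      });
    });

    // A consumer looks up the key and collects instances as they are discovered.
    dht.on("peer", (peer, _infoHash) => {
      console.log("found service instance at", `${peer.host}:${peer.port}`);
    });
    dht.lookup(serviceKey);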
At first glance, this looks like a competitive advantage, because it decouples the operational and maintenance costs of the network from its size.
So I'm wondering if there is a consistent tradeoff in exchange for this robustness — are decentralized applications more complex to implement but simpler to operate? Is latency of decentralized protocols (e.g. average number of hops to look up an item in a DHT) untenably higher than that of distributed protocols (e.g. one hop to get instructions from the coordinator, then one hop to look up the item in the distributed KV)? Does a central coordinator eliminate some kind of principal-agent problem, resulting in e.g. a more balanced usage of the hashing keyspace?
Decentralization emerged because distributed solutions fail in untrusted environments — but this doesn't mean that decentralized solutions fail in trusted environments. So why not consider more decentralized protocols to scale internal systems?
Raft is a consensus protocol. You don't need it if you don't need consensus.
You can do service discovery etc with gossip protocols. You are right that you don't need consensus to have systems publish their own keys on a network.
Systems that use Raft or equivalent do have a need for consensus, for example CassandraDB/CockroachDB (when you do `UPDATE account SET balance = balance - 10 WHERE user='chatmasta';`, you need that transaction to go through only once globally, and you need the whole system to agree on whether it did), Kubernetes (when you ask for a database to be served on some hostname, you need a single instance to go up, and every load balancer to route to that same instance), etc.
If you have examples of systems that use strong consensus when it's not required, point them out. Stating "why do some systems use strong consensus when other systems (doing something completely different) get by without consensus" is a bit strange.
A decade ago, I had the idea of a distributed, internet-wide filesystem where the chunks could be duplicated over and over again across the internet. Something survivable and loosely/eventually consistent when updated.
Someone appears to have built at least part of it.
Since IPFS is hash-addressable, is there a decentralized way to point to the latest hash of the content? I can point a domain to it, but the domain can be taken down.
Yes. There are some alternatives and frameworks like Fleek that abstract this away from you but the most straightforward way is to update the CID with each update.
Webtorrent support has been added to Vuze and libtorrent (backs clients like deluge and qBittorrent), so there is some ability for mainstream swarms to interact with webtorrent peers.
All of this seems very familiar; I recall similar claims being made about torrent-paradise.ml (domain now dead); it was a static site, unkillable, db distributed on IPFS, etc.
Is there any relationship between these two projects, or are the similarities only incidental?
The site and search eventually work, but the torrents themselves do not download and do not appear to have all necessary magnet information in the links. So the downloads just hang there in the torrent client.