Being "distributed" over p2p/federated architecture opposes the end-user convenience of search/discovery/ranking/recommendations because of speed-of-light limitations. I wrote a previous comment about this: https://news.ycombinator.com/item?id=17578332
Also, the earlier reply by posting2fast proposing a partially centralized server doesn't really replace Youtube because his proposed idea creates new problems of spam videos and untrusted/fake videos. E.g. the central index of metadata says "www.johndoehomeserver.com" has a tutorial video for Algebra but when you actually stream the video from "johndoehomeserver.com", you get a spam video for Viagra instead of math instruction. Therefore, users will naturally gravitate toward the centralized servers that have both the metadata and the actual video content. This emergent group behavior of preferences would end up recreating another "Youtube"-like clone.
p2p architecture and torrents work well for things like pirated Photoshop or ripped Marvel Avengers movies because the users already have the content's title _preloaded_ in their brain and therefore a centralized index for discovery/serendipity of unknown content isn't necessary.
Anyone then could build a search index and build a good search experience.
To combat spam, instances should reveal up/downvotes to indicate quality. I guess your fake math video would not get much love from the community.
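To make the up/downvote idea concrete, here is a minimal sketch of one common way to turn raw votes into a ranking signal: the lower bound of the Wilson score interval. This is an assumption about how an index could score videos, not something PeerTube is known to do; the function name and the vote counts are made up for illustration.

```python
import math

def wilson_lower_bound(upvotes: int, downvotes: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the upvote fraction.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    n = upvotes + downvotes
    if n == 0:
        return 0.0  # no votes yet: rank at the bottom
    p = upvotes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom

# Hypothetical vote counts (upvotes, downvotes), for illustration only.
videos = {
    "real algebra tutorial": (120, 10),
    "spam mislabeled as algebra": (3, 90),
    "brand-new upload": (1, 0),
}

# Highest score first.
ranked = sorted(
    videos,
    key=lambda title: wilson_lower_bound(*videos[title]),
    reverse=True,
)
```

A heavily downvoted video sinks below even a brand-new upload with a single vote. The sketch says nothing about who is allowed to vote, so fake votes would still need a separate defense.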
Please take extra care to correctly parse what I actually wrote in response to the gp. Yes, speed-of-light is still a limitation based on the gp's constraint of "search/discovery in a _distributed_ way" which means the search algorithm avoids central servers and loops through a bunch of remote p2p nodes to parse a bunch of exposed JSON manifest files.
If instead, the search algorithm loops through data in a cached index server, that's no longer "search in a distributed way" that the gp was originally wondering about. That's the particular point I was responding to.
>Anyone then could build a search index and build a good search experience.
Now, as to the issue with that "cache index server" that pre-parses the JSON files...
The cache server that also contains the actual video data will naturally attract the most users because when they hit the "play" button on their smartphone, the video starts immediately instead of waiting or suffering stuttering from somebody's flakey home video server.
So, the index server with the "good experience" as perceived by users will be the one that also includes the actual videos -- basically acts as a CDN -- and this emergent behavior of user preferences defeats the decentralized ideals of p2p video.
We see that p2p of things like illegal software already works and is proven. However, p2p of mainstream videos has massive technical hurdles that oppose how typical users like to discover content and play it with immediate gratification.
So DNS isn't distributed because my computer caches queries?
I think this is arguing semantics rather than practicalities.
Centralization isn't binary -- it's a continuum, and we care about it because of the benefits it provides, not because we think it's an end in and of itself. What we care about is the ability to aggregate search results from multiple places, to bypass search if we have a specific video URL that's being shared, and to build our own search engines without running into copyright problems.
If all of those goals can be accomplished with a caching server, then does anyone actually care if it's technically decentralized?
> So, the index server with the "good experience" as perceived by users will be the one that also includes the actual videos -- basically acts as a CDN -- and this emergent behavior of user preferences defeats the decentralized ideals of p2p video.
My reading of this argument is I might as well just host my blog on Medium, because Google search is just another point of centralization. And after all, for speed reasons users will prefer to use a search engine that hosts both the blog and the search results -- so eventually Google search is definitely going to lose to Medium anyway.
But of course Medium isn't going to unseat Google, because in the real world speed improvements are relative, and at a certain point users stop caring, or at least other concerns like range of accessible content and network effects begin to matter a lot more.
It's both, I would argue.
Distributed systems professor here. My lab has been working on an "academically pure" distributed Youtube for 14 years and 7 months now. That means no central servers, no web portals, and no discovery website. Pure Peer-to-Peer and lawyer-proof hopefully. Distributing everything usually means developer productivity drops by roughly 95%. Plus half of our master-level students are not capable of significantly contributing. Decentralised==hard. This is something the "Distributed Apps" generation is re-discovering after the Napster-age Devs got kids/s
> All there needs to be done is to expose a static, daily generated JSON file that contains all videos on the instance.
Or simply make it real-time gossip.
Disclaimer: promoting our work here. We implemented a semantic clustered overlay back in 2014 for decentralised video search, that could make it just as fast as Google Servers. This year we finished implementing a real-time channel feed of Magnet links protocol + deployment to our users. Our 51k concurrent users ensure that we can simply re-seed a new Bittorrent hash with 1 million hashes, then everybody updates. Complete research portfolio, including our decentralised trust function.
> does anyone actually care if it's technically decentralized?
That is an interesting question. Our goal is real Internet freedom. In our case, logically decentralisation is a hard requirement. Our users often don't care. Caching servers quickly introduce brittleness into your architecture and legal issues.
Again, I'm not talking about a technical engineering component. I'm talking about users aggregate behaviors.
Please see my other reply of how we seem to be talking at different abstraction levels.
>Centralization isn't binary -- it's a continuum, and we care about it because of the benefits it provides, not because we think it's an end in and of itself.
Right, but that's not what I'm arguing. I'm talking about centralization as an emergent phenomenon that bypasses the ideals of decentralized protocols in a way the protocol's designers didn't intend.
>If all of those goals can be accomplished with a caching server, then does anyone actually care if it's technically decentralized?
I guess I don't understand the premise then because if that were true, why would the adjective "distributed" even be mentioned in the question "search/discovery in a _distributed_ way?" To me, something about distributed/decentralized as a characteristic in the technical implementation is very important to the person asking the question.
EDIT: here's another example of that type of "search without central indexing server" question: https://news.ycombinator.com/item?id=20282397
So am I.
For example, Github currently hosts the majority of Git repositories online, and I've heard people argue that this means Git isn't really decentralized, because the user behavior is to stick everything into a central repository on a central server. But when Microsoft bought Github, lots of people migrated to Gitlab, and (issues notwithstanding) it was easy for them to do so because of Git's distributed architecture. Git was decentralized enough that pivoting from a bad event was still way easier than it would have been with a different architecture.
When I talk about decentralization as a practical concern, I'm not worried about users aggregating around good services. I'm worried about whether the architecture supports moving away from or augmenting those services if something goes wrong in the future.
And what I mean when I talk about centralization as a continuum is that the social aggregated behaviors you're worried about are still strictly better under a PeerTube system than they are under a Youtube system -- so there's no point in bashing PeerTube just because it doesn't solve literally every problem.
If I'm removed from a centralized PeerTube indexing service, my video is still online under the same URL, and I can still point users at a different indexing service. If censorship becomes problematic or widespread, users will move to different indexes because the network lock-in of an indexer is less than the lock-in of a social platform. As far as speed concerns go, users can fall back on slower indexers only when fast ones fail. All of this is workable.
But if I'm removed from Youtube, I have to start over from scratch with a new URL on a different site with different features that doesn't play nicely with any of the existing tools or infrastructure.
> I'm talking about centralization as a emergent phenomenon
The emergent phenomenon you're talking about is that sometimes better, faster services have more users than bad services. That's not a problem with decentralization, and that's not a problem decentralization is trying to solve. Decentralization is only trying to mitigate the harmful effects of that phenomenon.
It is not a desirable goal of decentralization to make every node in a graph have the same traffic levels -- and I mean that both on a technical and on a cultural level.
I understand your point here but this sounds more like a technical detail and not about social power structure. To your point, I'd also say the combination of DNS and http protocols already allow for people to move their content around the internet (keep the same url) and yet people do care about aggregation around platforms because they don't like concentration of power. So even though you state you don't worry about it, others do. I believe reducing platform power is part of the motivation for p2p video.
>And what I mean when I talk about centralization as a continuum is that the social aggregated behaviors you're worried about are still strictly better under a PeerTube system than they are under a Youtube system -- so there's no point in bashing PeerTube just because it doesn't solve literally every problem.
Btw, I'm not "bashing" Peertube. Instead, I'm trying to emphasize that it would be a mistaken belief to think that a p2p video protocol can stop de facto centralization. (E.g. see history of http protocol on why that doesn't happen.) Instead of thinking about what's technically possible with cache index servers, we should think about what humans typically do that inadvertently recreates centralization that nobody seems to want. A quality cache index server can create a feedback loop that attracts both users and video uploaders which weakens decentralized p2p nodes. If that particular cache server's popularity doesn't really matter because p2p nodes will always be able to independently exist, then that means today we can also say that Youtube doesn't matter because you can already serve videos (AWS, Azure, home server) independently outside of Youtube.
>If I'm removed from a centralized PeerTube indexing service, my video is still online under the same URL, and I can still point users at a different indexing service. If censorship becomes problematic or widespread, users will move to different indexes because the network lock-in of an indexer is less than the lock-in of a social platform.
But people can make the same argument about Google's index search results. E.g. it doesn't matter if your blog or niche pet store got removed from page 1 of the search results because you can theoretically point users to a different indexing service (Bing, or roll-your-own index ranking algorithm with Common Crawl dataset, etc). The content at the url domain you already own is still at that url. But we both know that answer (while true in a sense) does not satisfy people. Website owners get very upset when they lose ranking or get removed (censorship) from search results altogether. Even though there are technical solutions for people to not use "google.com", it's irrelevant when their mental framework is "power & influence" of Google.
>The emergent phenomenon you're talking about is that sometimes better, faster services have more users than bad services. That's not a problem with decentralization, and that's not a problem decentralization is trying to solve. Decentralization is only trying to mitigate the harmful effects of that phenomenon.
I think I disagree with that but let me expand. If the goal of decentralization is some diversity (e.g. some niche content has a place to serve video outside of Youtube) then your paragraph makes sense. However, if it's the more ambitious idea of "replace Youtube", then yes, it's a huge problem of decentralization that it can't be as fast/convenient/quality as centralized services for normal users. If most mainstream users are avoiding decentralized services because it "didn't solve problems it doesn't claim to solve" -- does it mean decentralization "succeeded"? I guess there's semantic wiggle room there.
>It is not a desirable goal of decentralization to make every node in a graph have the same traffic levels
I never claimed equal traffic was desirable and that seems to be an uncharitable reading of my points.
The comment you link to above makes a technical argument. It asserts what you believe is and isn't technically possible. In that sense I feel like you are moving the goal posts.
That's all an end-user cares about.
Indexing videos once a day (or once an hour or whatever) would be very feasible. Indeed, different servers could create their own indexes, and some might be better at sorting for relevance than others.
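A rough sketch of what such an index could look like, assuming each instance publishes a manifest of its videos. The manifest layout, field names (`instance`, `videos`, `title`, `url`), and example URLs are all hypothetical; nothing here is PeerTube's actual format, and a real indexer would fetch the manifests over HTTP instead of using in-memory data.

```python
from collections import defaultdict

# Hypothetical manifests, as if fetched once a day from each instance.
MANIFESTS = [
    {"instance": "videos.example.org", "videos": [
        {"title": "Algebra tutorial: solving linear equations",
         "url": "https://videos.example.org/v/1"},
        {"title": "Bicycle chain maintenance",
         "url": "https://videos.example.org/v/2"},
    ]},
    {"instance": "tube.example.net", "videos": [
        {"title": "Linear algebra lecture 1",
         "url": "https://tube.example.net/v/a"},
    ]},
]

def tokenize(text):
    """Lowercase the text and split it on anything that is not alphanumeric."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()

def build_index(manifests):
    """Build an inverted index: word -> set of video URLs whose title has it."""
    index = defaultdict(set)
    for manifest in manifests:
        for video in manifest["videos"]:
            for word in tokenize(video["title"]):
                index[word].add(video["url"])
    return index

def search(index, query):
    """Return URLs whose titles contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(MANIFESTS)
hits = search(index, "linear algebra")
```

Since the inputs are just published files, any server could run the same loop and differ only in how it ranks the hits.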
I imagined gp (mikece) as a HN techie (not an oblivious end-user) and thought he was wondering about how to use programming technology to avoid central servers ... and therefore, me interpreting "search/discovery in a distributed way" in a very literal manner was the appropriate level of abstraction to mikece. Avoiding central servers (if possible) is an interesting goal to discuss because they have a tendency to attract disproportionate users which defeats the goals of decentralization.
>Indexing videos once a day (or once an hour or whatever) would be very feasible.
And here, you're interpreting what's feasible only at the level of the technical stack instead of considering several chess moves ahead to the emergent group behaviors which render a metadata-only type of index not end-user friendly.
>, and some might be better at sorting for relevance than others.
And that's the server that would end up becoming a de facto "centralized" server that people were trying to avoid. This is especially true if that superior server also includes the video data.
Consider that the http protocol itself is already decentralized. If that's true why do people perceive Youtube and Facebook as centralized when they're only nodes on a http network? Because decentralized protocols don't stop emergent group behavior towards centralization.
I fail to see how a search index would be a bad user experience. Compare that to the current situation of 61 isolated, unsearchable PeerTube instances.
You/SamBam/danShumway are at the abstraction level of technical protocols, parsing JSON files, index servers, etc.
I'm at the abstraction level of psychology and emergent group behavior that overrides those ideal technical structures.
>Distributed services still benefit the traffic from Google,
This sentence is a perfect example of how we're focusing on different things.
Your interpretation: Google is an index, and it links out to distributed servers. Ergo, an analogous Peertube-Index metadata server that lists PeerTube p2p nodes can be technically accomplished to do the same thing. What's the problem?!?
My interpretation of mikece: Google's index/algorithm/ranking/censorship has "too much power" over the web ecosystem and this is a common complaint about its centralized authority over urls. Who gave Google all that power? Us websurfers did! How did it get that power even though it just has links to distributed http nodes instead of serving up the data (NYTimes article, etc) itself? [Excluding Google Amp in this example.]
To me, mikece is asking how to avoid another Google/Youtube type of de facto centralization of power which means we avoid central servers from existing to accumulate that power in the first place. To me this means p2p clients all querying each other and mikece is wondering if this is technically possible. That's what my speed-of-light answer is about.
Therefore, discussing what's "technically feasible" with indexing p2p video nodes seems to be missing the point if the abstraction level is emergent group behavior.
I see you don't like this, but slicing up a service to isolated islands won't help much. It's a good step forward, but search is essential and in this case takes very little effort.
Furthermore PeerTube instances are centralized services too, if one gets very popular, then it will thrive / suffer the same way YouTube did.
Apathy or indifference is also a valid position. However, I was addressing the many who do think there's "bad" in that emergent behavior.
What does it mean in practice? Some believe Google's search index, Youtube's video service, Facebook, etc have too much power over the internet. Therefore, lecturing them that "the http protocol itself is already decentralized so what does it matter that one http node spelled "youtube" is more popular?" -- is not a satisfactory explanation. They want to change that power imbalance.
Therefore, I believe the social ideals for p2p video would be to take away power from Youtube and have it more widely dispersed. Ideally, nobody would be big enough to "dominate" in the web video ecosystem. There wouldn't be a Power Law of popular cache index servers with one eventually dominating.
I'm saying that p2p video really can't prevent that from happening if a bunch of users voluntarily gravitate towards index servers which are centralized -- which negates the power-dissipating intentions of p2p. Also consider that many video content creators would voluntarily upload their videos to those index cache servers which further solidifies the centralization of power. Humans keep being humans and will subvert the (global) goals of decentralization and (local individual actions) aggregated together inadvertently recreate centralized platforms!
If you don't care about that, that's valid but a lot of others do based on common complaints of Youtube wielding too much influence.
Speed of light is not the bottleneck in reaching 1000ms search response time anywhere on earth. Calling it a speed-of-light limitation does a disservice to your point, which really is that querying many peers for search results is slow, for reasons that have nothing to do with the speed of light.
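A back-of-the-envelope check of that claim, as a sketch. It assumes an ideal great-circle fiber path to the farthest point on Earth and light in fiber travelling at roughly 2/3 of its vacuum speed; real routes are longer and add switching and processing delays.

```python
EARTH_CIRCUMFERENCE_KM = 40_075        # equatorial circumference
LIGHT_SPEED_VACUUM_KM_S = 299_792      # speed of light in vacuum
FIBER_FACTOR = 2 / 3                   # assumption: light in fiber is ~2/3 c

# The farthest possible peer sits half a circumference away.
farthest_km = EARTH_CIRCUMFERENCE_KM / 2

one_way_ms = farthest_km / (LIGHT_SPEED_VACUUM_KM_S * FIBER_FACTOR) * 1000
round_trip_ms = 2 * one_way_ms

# How many strictly sequential worst-case round trips fit in a 1000 ms budget.
sequential_hops_in_budget = int(1000 // round_trip_ms)

print(f"one way: {one_way_ms:.0f} ms, round trip: {round_trip_ms:.0f} ms")
print(f"sequential worst-case round trips in 1000 ms: {sequential_hops_in_budget}")
```

Under these assumptions a single worst-case round trip is about 200 ms, so propagation delay only dominates if a query has to make several such trips one after another rather than in parallel.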
> E.g. the central index of metadata says "www.johndoehomeserver.com" has a tutorial video for Algebra but when you actually stream the video from "johndoehomeserver.com", you get a spam video for Viagra instead of math instruction.
That some video content may not reflect its supposed category or title is not a new problem, is it?
Discover heavily sponsored content from content farms. As an experiment, even with an old account, start just browsing the content Youtube highlights. You will soon end with a recommendation page full of shit with 500K+ views using the same template.
I have a very old account and browse suggestions using my brain, not randomly. While not perfect, almost every recommendation right now looks like something I could watch.
Bicycling, civil engineering, cat toys, and weird metal music mashups, which are all similar to things I intentionally watched, but haven't watched.
LBRY is similar to PeerTube in openness and decentralization, but all content metadata is written to a blockchain, which means everyone/anyone can access the index. This blockchain can then be searched (https://github.com/lbryio/lighthouse) or extracted to SQL (https://github.com/lbryio/chainquery).
Maybe they could implement a naive "view tracking" by having the client do small proof of work when they interact with content?
Similar to voting in NotaBug? https://github.com/notabugio/notabug
So the view count publicly available and something like the Google search analytics already available? Personalized data offers a huge competitive advantage.
I think if the legislation ball ever gets rolling two things we're likely to see, because they're low-hanging fruit, are the end of mass tracking on the internet and a meaningful shift in who controls the data gathered.
I can imagine a platform akin to internet banking where you manage your data and its usage.
Something I'd love to see is a "publication" of big-data algorithms. A private entity designs the algorithm for profit and leases it and you run it in your (trusted) environment, owning both the input and output. Nothing leaks.
It's "this person watched this, so they would also be interested in this video and this ad." You can't make this anonymous and near as useful, and it is currently YouTube and Google's premium money maker.
Most other data is already available with a little work, providing the data you describe doesn't help competition that much.
That's what I meant by cross-platform linking.
> You can't make this anonymous and near as useful
I argue that you can. Anonymity is about not linking you, the physical person, to your online presence. An online presence can be tracked and profiled, without the invasion of privacy. It all depends on what data is collectable and who has access to the data. An algo provider doesn't need to also control the data it is used on, it just needs access to training datasets. There are technical solutions to all these problems, but it's a political solution that's lacking.
> Most other data is already available with a little work, providing the data you describe doesn't help competition that much.
Well if it doesn't help competition, how useful can it be?
Everything there is taking place on YouTube. Ads can maybe be cross-platform, but even that isn't necessary.
>I argue that you can
It can't. Knowing what videos I have watched in the past is very useful. This can't truly be anonymous and shared.
>Well if it doesn't help competition, how useful can it be?
You are the one arguing that releasing this data solves a problem.
These aren't big problems, I think. The problem is that the people who hold the monopolies on data at the moment are also the people who are extremely powerful lobbyists.
Ideally we’d see blogs return as the curated content mediators.