Every time I see social media recreated in a decentralised fashion I realise why the successful ones are centralised. The loading times of this are dreadful and I don't see how you'll be able to solve that problem when you're relying on P2P.
In any case, it sounds like we'll be getting many new Reddit clones, so I may as well plug what I've built recently: a free Reddit alternative API. Check it out if you're a dev who's being caught out by the Reddit pricing changes[1].
Basically the difference between connecting to datacenter-residing apps that internally share data with each other and connecting to ten random guys' basement PCs.
Everyone who has played a P2P multiplayer shooter knows P2P sucks... on the other hand, dedicated-server-based shooters and MMOs (generally) work great. You don't have to be an IT professional to know this. Similar concepts (speed of light and the multiplicative effects of multiple slow connections) apply here.
P2P is only great for uses where latency doesn't matter. See: Torrents.
That's too broad a generalization. p2p is AWESOME for latency, since every server added adds latency.
What you're talking about is probably p2p networks with many users (based on your shooter reference), and the problem there is home connections and bandwidth: when every user needs to connect to every other user, it doesn't scale.
So for your example:
One-on-ones over p2p are optimal, but p2p matches with more than 8 players are suboptimal (I pulled that number out of my ass; it depends on the protocol and use case).
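To put some rough numbers on the scaling problem: in a full-mesh p2p match, every peer pushes state to every other peer, so link count grows quadratically and each peer's upload grows linearly with player count. A back-of-the-envelope sketch (the tick rate and packet size are made-up illustrative numbers, not from any real game):

```python
# Why full-mesh p2p stops scaling: every peer must push its state to
# every other peer on every tick. tick_hz and packet_bytes are invented
# for illustration only.

def full_mesh_links(n):
    """Total connections in an n-peer full mesh: n choose 2."""
    return n * (n - 1) // 2

def upload_kbps_per_peer(n, tick_hz=30, packet_bytes=100):
    """Each peer sends one packet per tick to each of the n-1 others."""
    return (n - 1) * tick_hz * packet_bytes * 8 / 1000

for n in (2, 8, 32):
    print(n, full_mesh_links(n), upload_kbps_per_peer(n))
```

At 2 players that's one link and 24 kbps of upload per peer; at 32 players it's 496 links and 744 kbps per peer, which is where home upstreams start to choke. A dedicated server turns the n-1 fan-out into a single connection per client.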
It is both decentralized and distributed. It's not P2P though, it's a hub-and-spokes architecture with several interconnected (federated) hubs.
Hubs can have decent uptime and thick internet connections. Hubs are relatively few so they replicate state quickly between themselves, and then more locally between spokes when they come online.
Most natural structures are more tree-like, like that.
That just depends on how you define "distributed". If it means "running on multiple machines", then even centralized protocols are distributed, since a part of the computation is running on your computer.
In the context of protocols like Mastodon, if the end-user devices aren't the primary holders of the data, then I don't call it distributed. It's just decentralized. I guess that way, "distributed" necessarily implies peer-to-peer.
Mastodon page: requires unblocking JavaScript, takes one second just to load the basic layout (without post contents), then takes four to ELEVEN seconds to fully load
Weird. Anecdotal I think. Maybe something with your instance. Mastodon UI is quick and snappy for me. I migrated away from mastodon.social when the influx from Twitter made that server very slow (it has since scaled up significantly).
With an empty cache, five seconds for the full layout, one further second for images. With a populated cache, four seconds to load everything at once. And also requires unblocking JavaScript for no good reason. Sure, not as bad as twitter.com, but that’s damning with faint praise.
I thought Twitter was horribly slow just because of the insane amount of JS and HTML it uses for displaying some simple text. I've always avoided clicking on Twitter links because it's so ridiculously slow.
Why not a bittorrent style peering mechanism? Popular content will always have pools of people who are consuming it.
I've had an idea in the back of my head for what would make a decentralized video service work, and it plays with the idea of peer driven networks.
So, there are a few kinds of people on the network: advertisers (hang in there, I'll explain why), leechers (or consumers), content producers, and seeders.
The prime relationship is between a consumer and a content producer. Every other part of the network is designed to support and benefit their interaction.
First, a content producer creates a piece of content (or steals it, let's be realistic) and publishes it on the network. Attached to the content is a pay-per-view requirement; the producer has elected to require payment to view this content. This money can come from one of two places: first, an advertiser can bid on slots the producer has chosen; second, the consumer can pay the producer directly. The producer doesn't have a choice about who pays, only that payment is required.
Requiring payment for the content is optional, but advisable. You see, the producer has a second order relationship with the seeders. The producer can't possibly feed a million viewers in the first hour. That would be a very difficult task. But what a producer can do is elect to provide a percentage of their required income to seeders. Now there's a financial benefit to seeding a popular video/song. The producer can set their payments to whatever they want. The market may equalize itself.
Finally, we have advertisers, which are the monetary fuel for the fire. Most viewers will tolerate ads and won't want to pay money; so instead they pay time. An advertiser can bid on slots in as precise or general a way as they want. If they want to bid on a specific creator's video slots, they can. If they want to bid on a broad category of creators or videos, they can. Etc.
This won't ever be truly decentralized, and there is a whole ecosystem of services external to the network that would be needed for this to function well (payment processors; content collation and moderation, kind of like how Mastodon works; a bidding marketplace for advertisers to interface with the content network; a front end for seeders to select and re-seed; and so on).
I don't have the technical chutzpah to build out something like this. I wish I did. Is there someone out there who wants to be a technical mentor? I'd be game.
--
[edit] There should be more and less granularity in how different parties engage. Some people want to pay 10 bucks a month (Amazon Prime) and forget about the rest. Some people want to pay per view. Advertisers might want to target specific creator's channels or whole segments of created content. Creators might want to be able to never even think about ads and just check a box that says "put an ad in front of every video I upload" and then forget it. Or a creator might want to put ads in videos longer than a certain length. And so on. What I've specified above is loosely conceived and would need a lot of thought to turn it into something useful.
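To make the core settlement concrete, here's a minimal sketch of the per-view payout I have in mind. Everything here is hypothetical — the function name, the price, and the seeder cut are my own illustrative choices, not part of any real protocol:

```python
# Hypothetical per-view settlement: a producer sets a required price per
# view and elects a fraction of it for seeders; the payment itself can
# come from the viewer directly or from a winning ad bid.

def settle_view(price, seeder_cut, seeders_served):
    """Split one view's payment between the producer and the seeders
    that actually delivered the bytes. seeder_cut is a fraction 0..1.
    If no seeders served the view, the producer keeps everything."""
    if not seeders_served:
        return price, {}
    seeder_pool = price * seeder_cut
    per_seeder = seeder_pool / len(seeders_served)
    producer_share = price - seeder_pool
    return producer_share, {s: per_seeder for s in seeders_served}

# One $0.05 view delivered by two seeders, with a 40% seeder cut:
producer, payouts = settle_view(price=0.05, seeder_cut=0.4,
                                seeders_served=["seed_a", "seed_b"])
```

The interesting knob is `seeder_cut`: set it high and a popular video becomes profitable to re-seed, which is exactly the market-equalizing incentive described above.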
Decentralization is a meme, the great majority of consumers don't care. People used Morpheus because it was an easy way to watch (free) movies, not because it was decentralized. Now they use Netflix because it's an easy (and relatively cheap) way to watch movies, not because it's centralized.
Besides, Matrix already has p2p chat, IRC has always been a thing, and so on, heck even BitTorrent is technically still alive. (De)centralization is not a feature, nor is it a bug, it's simply irrelevant.
Hopefully, for the consumer it would be no different than YouTube or Netflix. Unless you don't want ads, then you pay your dues.
I have no idea how the experience would vary for the advertisers.
For the producer it should offer more flexibility in how they get paid. This is the major selling point I think. The ability to put out a high effort video -- say you made a full length movie -- and be able to charge theater prices (17.50USD, or close to it) and make sure the ads hit at the appropriate times? It might be worth it for larger creators to swap over.
Wow, as a consumer I am really sold on this model! You tell me I will pay theater prices and get ad interruptions throughout the show? This is fantastic, sign me up. /s
I apologize. I guess I miscommunicated something. I mean that a creator can opt to charge what they want and also turn off ads if they choose. Or front load the ads, like theaters do. Really, it is up to the creator how they want to format their content.
How does your non official API work? Is it using the official API and caching, redirecting to the website, and transforming it into json? I'm just curious.
That doesn't sound like it's going to be a stable API? I'm not sure there is much you can do though.
Edit: yea after looking at this, it definitely appears to violate the TOS. Also this seems like a huge red flag just begging reddit to shut it down: "Unlike with the Reddit API you do not need to authenticate using OAuth."
IANAL but my understanding is that violating the ToS is only ever a problem if your crawler decides to "sign-in", as that constitutes agreeing to the terms of the ToS.
Further, old.reddit.com doesn't gate any content (even NSFW stuff) behind a sign-in page (at least for now)
> IANAL but my understanding is that violating the ToS is only ever a problem if your crawler decides to "sign-in"
Where on earth would you get that impression?
From the reddit ToS: "To use certain features of our Services, you may be required to create a Reddit account (an “Account”) and provide us with a username, password, and certain other information about yourself as set forth in the Privacy Policy."
NB: "certain features" clearly implies that some features are not gated behind sign-in, and that the ToS applies to them as well.
Also from the ToS: "
Except and solely to the extent such a restriction is impermissible under applicable law, you may not, without our written agreement:
license, sell, transfer, assign, distribute, host, or otherwise commercially exploit the Services or Content;
modify, prepare derivative works of, disassemble, decompile, or reverse engineer any part of the Services or Content; or
access the Services or Content in order to build a similar or competitive website, product, or service, except as permitted under the Reddit API Terms of Use."
And later: "Access, search, or collect data from the Services by any means (automated or otherwise) except as permitted in these Terms or in a separate agreement with Reddit (we conditionally grant permission to crawl the Services in accordance with the parameters set forth in our robots.txt file, but scraping the Services without Reddit’s prior consent is prohibited)"
My point wasn't that scraping reddit is not a violation of the ToS, it was that you're not able to legally enforce the terms of the ToS (that you have quoted in your reply) on people who haven't agreed/consented to them (which they do by logging in).
Even if I were to agree with your interpretation (and I absolutely do not), this is still plain jane mass copyright infringement. The submitters have given reddit a sublicense to publish their content, not random 3rd parties.
> The submitters have given reddit a sublicense to publish their content, not random 3rd parties.
This is tangential no? Third party reddit apps _already_ republish end user content.
I'm not saying third party apps powered by scraping are not illegal, I'm saying they're _no more_ illegal than those powered by the official reddit API.
Question: if someone, using the internet, gains access to your personal computer, but doesn't add anything or delete anything, have they committed a crime?
Interesting. I just did a quick read. The CFAA protects computer data and is limited to data in which the federal government has a legal interest; data of financial institutions; and perhaps some additional specifically enumerated parties. Its reach is further limited to the theft and subsequent use of data that causes specified types of harm.
So, while it almost certainly would not apply to our personal computers, I think it would probably apply to most commercial companies (provided they were large enough to be engaged in interstate commerce, and provided the data was used in a manner explicitly enumerated by the statute).
Impressive feat! Does reddit have rate limiting, or other hurdles in place similar to the hoops youtube-dl has to jump through? Curious what your thoughts are about maintaining a project like that.
As history has shown, you can only do so much to stop this. If you perfectly mimic Googlebot and use Google IP ranges by hosting on Google Cloud, they either take an SEO hit or let you bot them at the end of the day. Googlebot looks like a DDoS attack a lot of the time, too.
You can also go the route of looking like a pool of users; then it's just a game of cat and mouse, and one providers don't really have time to play.
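For context on why hosting on Google Cloud doesn't fully cover you: Google's documented way for sites to verify real Googlebot is a reverse-then-forward DNS check, and Cloud IPs resolve to googleusercontent.com, not googlebot.com. A sketch of that check, with the DNS lookups injectable so the logic can be shown without hitting the network:

```python
import socket

# Google's documented verification: reverse-resolve the visitor's IP,
# confirm the hostname is under googlebot.com or google.com, then
# forward-resolve that hostname and confirm it maps back to the same IP.

def hostname_is_google(hostname):
    return hostname.endswith(".googlebot.com") or hostname.endswith(".google.com")

def verify_googlebot(ip, reverse=None, forward=None):
    """reverse/forward are injectable for testing; they default to real
    DNS lookups via the socket module."""
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or socket.gethostbyname
    try:
        host = reverse(ip)
    except OSError:
        return False
    return hostname_is_google(host) and forward(host) == ip
```

So a scraper on Google Cloud fails this check; sites that rely on user-agent sniffing alone, though, are exactly the ones that end up letting you bot them.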
Not at the minute. I might open source it in the future, especially if I get a cease and desist. Though hopefully even if it comes to that someone like the EFF will help me fight it.
When visiting the link, Chrome just suggested I might have meant to visit "reddit" — I've never seen that before. I have Safe Browsing disabled, so I'm at a loss as to where this comes from.
https://i.imgur.com/l57osTi.png
For the GET /r/:subreddit/comments/:article endpoint, is there a limit to the amount of comments returned? If yes, how is it determined which get returned? Sorted by top? Best?
Yeah, currently the limit is 50 comments. I plan to implement the ability to choose a sort and limit, assuming the API actually gets some usage. Right now the sort is whatever the new.reddit.com default is.
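To sketch what a call would look like once that lands: the base URL is from the footnote, the path shape is the endpoint above, and the `sort`/`limit` query parameters are hypothetical since they aren't implemented yet. This only builds the request; no network call is made:

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "https://api.reddiw.com"  # from the footnote link

def comments_request(subreddit, article, sort=None, limit=None):
    """Build a GET request for /r/:subreddit/comments/:article.
    sort and limit are hypothetical future parameters, included in the
    query string only when set."""
    params = {k: v for k, v in {"sort": sort, "limit": limit}.items()
              if v is not None}
    url = f"{BASE}/r/{subreddit}/comments/{article}"
    if params:
        url += "?" + urlencode(params)
    return Request(url, method="GET")

# e.g. top 50 comments on a hypothetical article id:
req = comments_request("programming", "abc123", sort="top", limit=50)
```

Passing the request to `urllib.request.urlopen` would then fetch the JSON, same as hitting the endpoint with curl.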
Reddit may not be a public website for long. They push their app and ask for logging in so often, it wouldn’t surprise me if they shut off parts of the site for not logged in users.
They can still ban your service, e.g. block your IPs. Sure, you could then play a cat and mouse game with them, but your API clone going down every few days will make it unusable for anyone with a real purpose. The companies that reddit (officially) wants to target, e.g. OpenAI et al., have no problem scraping.
1 - https://api.reddiw.com