
Every time I see social media recreated in a decentralised fashion I realise why the successful ones are centralised. The loading times of this are dreadful and I don't see how you'll be able to solve that problem when you're relying on P2P.

In any case, it sounds like we'll be getting many new Reddit clones, so I may as well plug what I've built recently: a free alternative Reddit API. Check it out if you're a dev who's been caught out by the Reddit pricing changes [1].

1 - https://api.reddiw.com




I don't really find Mastodon that slow.

I do think it's got a lot of sharp edges that need smoothing out before it's really viable. But speed isn't a problem, and it's decentralised.


Mastodon is not P2P/decentralized, it's federated.


Basically the difference between connecting to datacenter-residing apps that internally share data with each other and connecting to 10 random guys' basement PCs.

Everyone who has played a P2P multiplayer shooter knows P2P sucks... on the other hand, dedicated-server shooters and MMOs (generally) work great. You don't have to be an IT professional to know this. Similar factors (the speed of light and the multiplicative effect of multiple slow connections) apply here.

P2P is only great for uses where latency doesn't matter. See: Torrents.


That's too broad a generalization. P2P is AWESOME for latency, since every server added adds latency. What you're probably thinking of (based on your shooter reference) is P2P networks with multiple users, and the problem there is home connections and bandwidth: a mesh where every user connects to every other user doesn't scale.

So for your example: one-on-ones over P2P are optimal, but P2P matches with more than 8 players are suboptimal (I pulled the number out of my ass; it depends on the protocol and use case).
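
A minimal sketch of why the full mesh blows up (the 8-player cutoff above is illustrative):

  # Links in a full P2P mesh grow quadratically: n*(n-1)/2.
  # Each peer also uploads its state to n-1 others, so a home
  # connection's upstream bandwidth gets divided n-1 ways.
  for n in (2, 8, 16, 64):
      links = n * (n - 1) // 2
      print(f"{n:>3} players -> {links:>4} links, {n - 1} upload streams per peer")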


> Mastodon is not P2P/decentralized

Mastodon is decentralized, because there is no central server. However, it is not distributed.


It is both decentralized and distributed. It's not P2P though; it's a hub-and-spoke architecture with several interconnected (federated) hubs.

Hubs can have decent uptime and thick internet connections. Hubs are relatively few so they replicate state quickly between themselves, and then more locally between spokes when they come online.

Most natural structures are tree-like in that way.


That just depends on how you define "distributed". If it means "running on multiple machines", then even centralized protocols are distributed, since a part of the computation is running on your computer.

In the context of protocols like Mastodon, if the end-user devices aren't the primary holders of the data, then I don't call it distributed. It's just decentralized. I guess that way, "distributed" necessarily implies peer-to-peer.


Postgres in leader/follower config = centralised
Cassandra = distributed
Bitcoin = decentralised

The crux comes down to the consensus model for state


Nitter page: the full page layout with all text loads in about half a second; images load in the next three.

https://nitter.it/rauschma

Mastodon page: requires unblocking JavaScript, takes one second just to load the basic layout (without post contents), then takes four to ELEVEN seconds to fully load

https://fosstodon.org/@rauschma

I expected better.


> https://fosstodon.org/@rauschma

Took between 1.2 and 1.8s for a full fresh load, for me.


https://mastodon.social loads in under a second, for me.


Now use nitter and create your account, log in, and tweet something. If you can't, you didn't compare.


Weird. Anecdotal, I think. Maybe something with your instance. The Mastodon UI is quick and snappy for me. I migrated away from mastodon.social when the influx from Twitter made that server very slow (it has since scaled up significantly).


Hmm, compare with https://elk.zone , is it still slow? For me it is almost instant.


With an empty cache, five seconds for the full layout, one further second for images. With a populated cache, four seconds to load everything at once. And also requires unblocking JavaScript for no good reason. Sure, not as bad as twitter.com, but that’s damning with faint praise.


Both of these load instantly for me.


> I don't really find Mastodon that slow.

Not generally, but many servers are. Which results in me avoiding clicking on any Mastodon links, because I expect them to be extremely slow.


Which I find frankly ironic because I dread clicking on Twitter links for that same reason.

It takes 10 seconds to load a tweet, and I have to click through the "view in app" nag, to boot.

Using Mastodon servers has been a breath of fresh air in comparison.


I thought Twitter was horribly slow just because of the insane amount of JS and HTML it uses for displaying some simple text. I've always avoided clicking on Twitter links because it's so ridiculously slow.


I use nitter, the right servers are fast ;)


Too much content is blocked for nitter these days. At least in my experience.


Huh? Like what? Never had that happen.


Some servers/instances are maybe less reliable than others. Same for teddit and invidious.


Ah, well, luckily it’s easy to switch servers. But that’s the issue with Mastodon, you have to view the content on the instance it’s posted on.


> The loading times of this are dreadful and I don't see how you'll be able to solve that problem when you're relying on P2P.

It's easy enough to centralize a caching layer, perhaps through a popular CDN.
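
A minimal sketch of the idea, assuming plain HTTP caching semantics: publicly shareable pages (sub feeds, post pages) get marked cacheable so a CDN absorbs most reads, while personalized responses opt out. Flask here is just for illustration; the helper is a hypothetical stand-in.

  from flask import Flask, jsonify

  app = Flask(__name__)

  def fetch_feed(subreddit):  # hypothetical stand-in for the real backend
      return {"subreddit": subreddit, "posts": []}

  @app.get("/r/<subreddit>/hot")
  def sub_feed(subreddit):
      resp = jsonify(fetch_feed(subreddit))
      # Shared caches (the CDN) may keep this for 60s, browsers for 10s.
      resp.headers["Cache-Control"] = "public, s-maxage=60, max-age=10"
      return resp

  @app.get("/me/feed")
  def personal_feed():
      resp = jsonify({"posts": []})  # would be personalized in reality
      # Personalized responses must never be shared between users.
      resp.headers["Cache-Control"] = "private, no-store"
      return resp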


Caching is not easy for something like Reddit, where everyone has a personalized feed.


Couldn't you say the same of Mastodon?


Mastodon is not P2P


Why not just cache sub feeds?


Why not a bittorrent style peering mechanism? Popular content will always have pools of people who are consuming it.

I've had an idea in the back of my head for what would make a decentralized video service work, and it plays with the idea of peer driven networks.

So, there are a few kinds of people on the network: advertisers (hang in there, I'll explain why), leechers (or consumers), content producers, and seeders.

The prime relationship is between a consumer and a content producer. Every other part of the network is designed to support and benefit their interaction.

First, a content producer creates a piece of content (or steals it, let's be realistic) and publishes it on the network. Attached to the content is a pay-per-view requirement; the producer has elected to require payment to view this content. This money can come from one of two places: first, an advertiser can bid on slots the producer has chosen; second, the consumer can pay the producer directly. The producer doesn't get to choose which of the two the consumer picks, only that payment is required.

Requiring payment for the content is optional, but advisable. You see, the producer has a second-order relationship with the seeders. The producer can't possibly feed a million viewers in the first hour; that would be a very difficult task. But what a producer can do is elect to provide a percentage of their required income to seeders. Now there's a financial benefit to seeding a popular video/song. The producer can set their payments to whatever they want. The market may equalize itself.

Finally, we have advertisers, who are the monetary fuel for the fire. Most viewers will tolerate ads and won't want to pay money; so instead they pay with time. An advertiser can bid on slots in as precise or general a way as they want. If they want to bid on a specific creator's video slots, they can. If they want to bid on a broad category of creators or videos, they can. Etc.
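
To make the flow concrete, a rough sketch of the split; every name and number here is hypothetical:

  from dataclasses import dataclass

  @dataclass
  class ContentTerms:
      price: float         # required payment per view
      seeder_share: float  # fraction of income routed to seeders

  def settle_view(terms: ContentTerms, paid_by_ad: bool) -> dict:
      # Whether an advertiser or the viewer covers it, the required
      # price is met; seeders take their cut, the producer keeps the rest.
      to_seeders = terms.price * terms.seeder_share
      return {
          "source": "advertiser" if paid_by_ad else "viewer",
          "seeders": round(to_seeders, 4),
          "producer": round(terms.price - to_seeders, 4),
      }

  print(settle_view(ContentTerms(price=0.05, seeder_share=0.3), paid_by_ad=True))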

This won't ever be truly decentralized, and there is a whole ecosystem of services external to the network that would be needed for this to function well (payment processors, content collation and moderation -- kind of like how mastodon works, a bidding marketplace for advertisers to interface with the content network, a front end for seeders to select and re-seed, and so on).

I don't have the technical chutzpah to build out something like this. I wish I did. Is there someone out there who wants to be a technical mentor? I'd be game.

--

[edit] There should be more and less granularity in how different parties engage. Some people want to pay 10 bucks a month (Amazon Prime) and forget about the rest. Some people want to pay per view. Advertisers might want to target specific creators' channels or whole segments of created content. Creators might want to be able to never even think about ads and just check a box that says "put an ad in front of every video I upload" and then forget it. Or a creator might want to put ads in videos longer than a certain length. And so on. What I've specified above is loosely conceived and would need a lot of thought to turn it into something useful.


You're just describing YouTube with extra steps.

Decentralization is a meme, the great majority of consumers don't care. People used Morpheus because it was an easy way to watch (free) movies, not because it was decentralized. Now they use Netflix because it's an easy (and relatively cheap) way to watch movies, not because it's centralized.

Besides, Matrix already has P2P chat, IRC has always been a thing, and so on; heck, even BitTorrent is technically still alive. (De)centralization is not a feature, nor is it a bug; it's simply irrelevant.


Hopefully, for the consumer it would be no different than YouTube or Netflix. Unless you don't want ads, then you pay your dues.

I have no idea how the experience would vary for the advertisers.

For the producer it should offer more flexibility in how they get paid. This is the major selling point, I think. The ability to put out a high-effort video -- say you made a full-length movie -- and be able to charge theater prices (USD 17.50, or close to it) and make sure the ads hit at the appropriate times? It might be worth it for larger creators to swap over.


Wow, as a consumer I am really sold on this model! You tell me I will pay theater prices and get ad interruptions throughout the show? This is fantastic, sign me up. /s


I apologize. I guess I miscommunicated something. I mean that a creator can opt to charge what they want and also turn off ads if they choose. Or front load the ads, like theaters do. Really, it is up to the creator how they want to format their content.


Haha, this is what I'm thinking: where is the incentive to actually use it?


This may be a very, very naive question but: is it possible to have a decentralized reddit that has centralized servers that cache the data?


How does your unofficial API work? Is it using the official API and caching, or redirecting to the website and transforming it into JSON? I'm just curious.


It basically scrapes reddit.com's HTML. Doesn't use the official API at all (that would defeat the purpose).
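
Presumably something along these lines (a sketch, not the actual reddiw code; the selectors are assumptions about old.reddit.com's markup and will break whenever it changes):

  import requests
  from bs4 import BeautifulSoup

  def hot_posts(subreddit):
      url = f"https://old.reddit.com/r/{subreddit}/"
      html = requests.get(url, headers={"User-Agent": "demo-scraper"}, timeout=10).text
      soup = BeautifulSoup(html, "html.parser")
      posts = []
      # old.reddit.com wraps each post in a div with class "thing".
      for thing in soup.select("div.thing"):
          title = thing.select_one("a.title")
          if title is not None:
              posts.append({"title": title.get_text(), "url": title.get("href")})
      return posts

  print(hot_posts("programming")[:3])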


That doesn't sound like it's going to be a stable API. I'm not sure there is much you can do, though.

Edit: yea after looking at this, it definitely appears to violate the TOS. Also this seems like a huge red flag just begging reddit to shut it down: "Unlike with the Reddit API you do not need to authenticate using OAuth."


IANAL, but my understanding is that violating the ToS is only ever a problem if your crawler decides to "sign in", as that constitutes agreeing to the ToS.

Further, old.reddit.com doesn't gate any content (even NSFW stuff) behind a sign-in page (at least for now)


> IANAL but my understanding is that violating the ToS is only ever a problem if your crawler decides to "sign-in"

Where on earth would you get that impression?

From the reddit ToS: "To use certain features of our Services, you may be required to create a Reddit account (an “Account”) and provide us with a username, password, and certain other information about yourself as set forth in the Privacy Policy."

NB: "certain features"; that clearly means some features are not gated behind sign-in, and that the ToS applies to them as well.

Also from the ToS: "Except and solely to the extent such a restriction is impermissible under applicable law, you may not, without our written agreement:

license, sell, transfer, assign, distribute, host, or otherwise commercially exploit the Services or Content; modify, prepare derivative works of, disassemble, decompile, or reverse engineer any part of the Services or Content; or access the Services or Content in order to build a similar or competitive website, product, or service, except as permitted under the Reddit API Terms of Use."

And later: "Access, search, or collect data from the Services by any means (automated or otherwise) except as permitted in these Terms or in a separate agreement with Reddit (we conditionally grant permission to crawl the Services in accordance with the parameters set forth in our robots.txt file, but scraping the Services without Reddit’s prior consent is prohibited)"
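
For what it's worth, the robots.txt carve-out they mention is mechanically checkable; a quick sketch with Python's standard library (the user-agent string is an arbitrary example):

  from urllib import robotparser

  rp = robotparser.RobotFileParser()
  rp.set_url("https://www.reddit.com/robots.txt")
  rp.read()
  # True only if the current robots.txt rules permit this fetch.
  print(rp.can_fetch("my-crawler", "https://www.reddit.com/r/programming/"))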


My point wasn't that scraping reddit is not a violation of the ToS, it was that you're not able to legally enforce the terms of the ToS (that you have quoted in your reply) on people who haven't agreed/consented to them (which they do by logging in).


Even if I were to agree with your interpretation (and I absolutely do not), this is still plain jane mass copyright infringement. The submitters have given reddit a sublicense to publish their content, not random 3rd parties.


> The submitters have given reddit a sublicense to publish their content, not random 3rd parties.

This is tangential no? Third party reddit apps _already_ republish end user content.

I'm not saying third party apps powered by scraping are not illegal, I'm saying they're _no more_ illegal than those powered by the official reddit API.


Question: if someone, using the internet, gains access to your personal computer, but doesn't add anything or delete anything, have they committed a crime?


Yes. CFAA.


Interesting. I just did a quick read. The CFAA protects computer data and is limited to data in which the federal government has a legal interest; data of financial institutions; and perhaps some additional specifically enumerated parties. Its reach is further limited to the theft and subsequent use of data that causes specified types of harm.

So, while it almost certainly would not apply to our personal computers, I think it would probably apply to most commercial companies (provided they were engaged in interstate commerce and the data was used in a manner explicitly enumerated by the statute).

Edit: it's to its


The interstate commerce clause strikes again. The CFAA has been interpreted in court to apply to all internet connected computers.


Impressive feat! Does reddit have rate limiting, or other hurdles in place similar to the hoops youtube-dl has to jump through? Curious what your thoughts are about maintaining a project like that.


As history has shown, you can only do so much to stop this. If you perfectly mimic GoogleBot and use Google IP ranges by hosting on Google Cloud, they either take an SEO hit or let you bot them at the end of the day. GoogleBot looks like a DDoS attack a lot of the time, too.

You can also go the route of looking like a pool of users; then it's just a game of cat and mouse, and one providers don't really have time to play.


> As history has shown, you can only do so much to stop this.

History has shown you can stop this well enough. Try accessing e.g. Instagram; Bibliogram attempted this, and the project is now discontinued.


This is true in the case of content that doesn't care about SEO. Reddit cares very very much about SEO, so it can never truly block bots.


The Google scraper IPs are very different, no?

> If you perfectly mimic the GoogleBot and use google IP ranges by hosting on google cloud


You are right, apparently they do publish the ranges: https://developers.google.com/search/docs/crawling-indexing/...
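
Worth noting: Google also documents verifying Googlebot via reverse-then-forward DNS, which defeats the "host on GCP and claim to be GoogleBot" trick. A rough sketch (error handling omitted):

  import socket

  def is_real_googlebot(ip):
      # Reverse lookup must land in googlebot.com/google.com...
      host = socket.gethostbyaddr(ip)[0]
      if not host.endswith((".googlebot.com", ".google.com")):
          return False
      # ...and the forward lookup must resolve back to the same IP.
      return ip in socket.gethostbyname_ex(host)[2]

  print(is_real_googlebot("66.249.66.1"))  # an IP in Google's published crawler range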


I've only done some rudimentary rate limiting checks and it doesn't seem like they do. Though I haven't pushed it far (~1000 rpm).

In any case, my plan is to deal with it if it becomes a problem.
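
If it does become a problem, a client-side token bucket is the usual polite fix; a generic sketch (not reddiw's actual code), tuned to roughly the 1000 rpm mentioned above:

  import time

  class TokenBucket:
      """Allow `rate` requests/second with bursts up to `capacity`."""
      def __init__(self, rate, capacity):
          self.rate, self.capacity = rate, capacity
          self.tokens, self.last = capacity, time.monotonic()

      def acquire(self):
          now = time.monotonic()
          self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
          self.last = now
          if self.tokens < 1:  # out of budget: sleep until a token accrues
              time.sleep((1 - self.tokens) / self.rate)
              self.tokens = 1
          self.tokens -= 1

  bucket = TokenBucket(rate=16, capacity=16)  # ~960 requests/minute
  # call bucket.acquire() before each upstream request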


Is the reddiw code open source? I imagine if your project sees any substantial usage, it won't be too long before you receive a cease and desist.


Not at the minute. I might open source it in the future, especially if I get a cease and desist. Though hopefully even if it comes to that someone like the EFF will help me fight it.


Not a lawyer, but it's probably a bad idea to open-source something _after_ getting a C&D for it


Yeah, true. I'd definitely ask for legal advice before doing that.


When visiting the link, Chrome just suggested I might have meant to visit "reddit"; never seen that before. I have Safe Browsing disabled, so I'm at a loss as to where this comes from. https://i.imgur.com/l57osTi.png


Clearly an anti-phishing filter.


So, like libreddit (sp?) and teddit, it's another read-only scraper; the real API supports write access.


Yep, very similar. My guess is that ~80% of API usage is read-only, so covering this should reduce API costs significantly for developers.


For the GET /r/:subreddit/comments/:article endpoint, is there a limit to the number of comments returned? If yes, how is it determined which ones get returned? Sorted by top? Best?


Yeah, currently the limit is 50 comments. I plan to implement the ability to choose a sort and limit, assuming the API actually gets some usage. Right now the sort is whatever the new.reddit.com default is.
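
For the curious, usage would presumably look like this; the endpoint path comes from the question above, but the response shape is my assumption:

  import requests

  resp = requests.get(
      "https://api.reddiw.com/r/programming/comments/abc123",  # "abc123" is a made-up article id
      timeout=10,
  )
  resp.raise_for_status()
  data = resp.json()  # assumed JSON body; capped at 50 comments per the author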


How will you deal with the huge onslaught of abuse?


Seems like scraping and providing an alternative API would violate the ToS and will receive a takedown notice in due time.


You don't need to agree to the ToS to scrape a public website; what they write in it is moot.


Reddit may not be a public website for long. They push their app and ask for logging in so often, it wouldn’t surprise me if they shut off parts of the site for not logged in users.


True, though that will seriously hurt their SEO.


They can still ban your service, e.g. block your IPs. Sure, you could then play a cat and mouse game with them, but your API clone going down every few days will make it unusable for anyone with a real purpose. The companies that reddit (officially) wants to target, e.g. OpenAI et al., have no problem scraping.


You can just host on GCloud and report as GoogleBot; good luck with that. There are also plenty of proxy services.

> The companies that reddit (officially) wants to target, e.g. OpenAI et al., have no problem scraping.

Indeed, in their case they just need the datasets anyway.


Wasn't there a recent SCOTUS case that upheld the need to follow TOS?


Is this open source? I’d love to help out


github.com/plebbit

Lots of repos, everything is open-source


So are you scraping reddit behind this?


yep



