I was confused about the memcached problem after moving to the cloud. I understand why network latency may have gone from submillisecond to milliseconds, but how could you improve latency by batching requests? Shouldn't that improve efficiency, not latency, at the possible expense of latency (since some requests will wait on the client as they get batched)? And while maybe efficiency is valuable, why would that be an improvement for a problem they didn't have before?
Sorry that wasn't clear. The per-call latency didn't get better. What happened is that instead of having to make a lot of calls to memcache, it was just one (well, just a few), so while each call took longer, the total time was much less.
As long as nothing's blocked, latency could go up 'a lot' (sub-ms -> ms, maybe 1ms->2ms with batching) without meaningfully impacting overall throughput.
I can definitely see millions of networked memcache calls being a bottleneck, and if the batching adds another ms per req on average, but removes the bottleneck, then they can serve a lot more users at a cost of 1ms per req.
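To put rough numbers on the batching argument above, here is a minimal back-of-the-envelope sketch. The key count, round-trip times, and batch size are illustrative assumptions, not figures from the talk, and the function names are hypothetical.

```python
def sequential_time_ms(num_keys, round_trip_ms):
    """Total wait when every key needs its own network round trip."""
    return num_keys * round_trip_ms


def batched_time_ms(num_keys, round_trip_ms, batch_size, per_key_overhead_ms=0.0):
    """Total wait when keys are fetched batch_size at a time (multi-get style)."""
    num_batches = -(-num_keys // batch_size)  # ceiling division
    return num_batches * round_trip_ms + num_keys * per_key_overhead_ms


# Assumed numbers: 200 cache lookups to render one page.
keys = 200
print(sequential_time_ms(keys, 1.0))      # 200 calls at 1 ms each -> 200.0
print(batched_time_ms(keys, 2.0, 100))    # 2 batches at 2 ms each -> 4.0
```

Even if each batched call is twice as slow as a single lookup, the page as a whole waits far less, which is the distinction between per-call latency and total time made above.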
Is there anything in TFA that would support my theory? I don't know. I don't care enough to endure InfoQ. (I did for a Rich Hickey talk once, lo these various months, and yea it were a minor inconvenience).
Disappointment in the HN community has reached a new high today.
So, instead of discussing the topics in the video, the majority of commenters here are discussing the flaws of the website it's hosted on or debating whether or not reddit is profitable. Neither of which has ANYTHING to do with scaling.
The dominance of non-relevance is interesting for HN. Was writing exactly the same thing, and just saw yours.
In this and some other technical topics, people end up discussing their personal taste in the web site's design, their individual UI frustrations with some button on the web site, the font, the color, and other random non-relevant topics, like the profitability now.
It's no excuse, but I see a reason for this: most people feel they understand these irrelevant topics better than they understand the scalability features of the infrastructure Reddit has built. People talk about their comfort zone.
What a horrible website. Made an account to get the mp3, just to discover that the mp3 download link doesn't work even then. Thank god I used a throwaway email to sign up. Otherwise I would be even more pissed.
I wouldn't call that simpler. You have to register using a fake email, possibly actually go check the inbox and click a link, and possibly be required to log in again now that the account is registered. With bugmenot, all you do is copy and paste a valid username and password.
I think his point is that he doesn't understand accounting and how complicated it can be so he thinks that if you were deeply involved in the business you should know every facet of every debit and credit.
If he's right I guess I wasted about 300 hours studying for Financial Reporting and Analysis section in the Chartered Financial Analyst (CFA) program.
Large banner and tiny video frame, site rendering completely garbled for 8 seconds until fully loaded, signup required to view the slides (after realizing that the video doesn't cut to slides at all). The talks are interesting, and the interview transcription is nice, though there are UX issues there as well: I need to click to view each response, and despite the entire question being hyperlinked, clicking it actually does nothing. Show all works, thankfully (ah, but the frame breaks my scroll).
I'll probably be back to check out more of the videos, but definitely not because of the site. If the editing is good, YouTube is just fine, otherwise SlideShare plus an audio file is just perfect.
[F] First Byte Time
[C] Keep-alive Enabled
[C] Compress Transfer
[A] Compress Images
[A] Progressive JPEGs
[F] Cache static content
[ ] Effective use of CDN
What is it about video + slides that appeals so little to people?
A full transcript with interleaved slide images takes a few minutes to read at most, and lets you control the pace of information absorption. A video with slides, especially when you cannot 2x the talking speed, is a painfully slow data transfer method.
Video + Slides = analog modem
Transcript w/ embedded slides = Google fiber :-)
Of course, if you're into audio books instead of reading, maybe you consider that a feature.
// I push video for a living. It's great for visual explanation like DIY instruction (e.g. woodworking, swapping RAM on a Mac Mini), emotional content, personal story telling, etc. Systems architecture is generally not in one of those categories.
> What is it about video + slides that appeals so little to people
I don't have a solid block of 40 undisturbed minutes to listen to a talk. Give me a transcript and I can read a paragraph here and there as I do other things at my own pace. I might have ten minutes here, ten minutes there. I don't want to be constantly pausing/unpausing the video, or worse - switching between the video and my music.
Plus, if I concentrate, I could read a 40 minute talk in 20 minutes or less.
Basically, when I'm reading, I control the pace. I rarely watch videos that are longer than about 5 minutes (that aren't entertainment, which is entirely different).
Site kinda sucks and is annoying to use for those of us that prefer textual resources and the ability to flip through slides ourselves without clicking around teensy tiny 1-px gaps between slides.
I can read a slide deck at a rate of 5s/slide (or faster), pausing to concentrate on the ones that interest me. I don't have the spare time to watch a video of someone, just in case the topic is interesting.
Great presentation man. Do you think Pylons will serve Reddit for years and years to come? Is there any need for you to switch to Pyramid or another framework? I made the choice to use Pylons for a recent project, and it just feels kind of odd using an old framework which is now in "maintenance only mode", but I truly did not like Pyramid... much less Django.
Does anyone know how the different storage systems are utilized, and why each system is utilized for that purpose? The presenter mentions using memcached, Cassandra, and PostgreSQL, and mentions the same type of data when discussing each (votes, for instance). I would definitely benefit from a more in-depth understanding of how each system is utilized, and why.
Each tool has a different use case. Votes is a great example.
Memcache has no guarantees about durability, but is very fast, so the vote data is stored there to make rendering of pages as quick as possible.
Cassandra is durable and fast, and gives fast negative lookups because of its bloom filter, so it was good for storing a durable copy of the votes for when the data isn't in memcache.
Postgres is rock solid and relational, so it was a good place to store votes as a backup for Cassandra (we could regenerate all the data in Cassandra from Postgres if necessary) and also for doing batch processing, which sometimes needed the relational capabilities.
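For anyone who wants the shape of that lookup chain in code, here is a minimal sketch. Plain dicts stand in for memcached, Cassandra, and Postgres, and the class name, method names, and write order are all assumptions made for illustration; this is not reddit's actual code.

```python
class TieredVoteStore:
    """Toy model of a cache -> durable store -> source-of-truth chain."""

    def __init__(self):
        self.cache = {}    # stands in for memcached: fast, no durability
        self.durable = {}  # stands in for Cassandra: durable, fast lookups
        self.source = {}   # stands in for Postgres: backup and batch jobs

    def record_vote(self, user, link, direction):
        key = (user, link)
        # Assumed write order: source of truth first, cache last.
        self.source[key] = direction
        self.durable[key] = direction
        self.cache[key] = direction

    def get_vote(self, user, link):
        key = (user, link)
        if key in self.cache:
            return self.cache[key]
        # Cache miss: fall back to the slower, more durable tiers.
        for store in (self.durable, self.source):
            if key in store:
                self.cache[key] = store[key]  # repopulate the cache
                return store[key]
        return None  # no vote recorded anywhere

    def rebuild_durable(self):
        """Regenerate the middle tier from the source of truth."""
        self.durable = dict(self.source)


store = TieredVoteStore()
store.record_vote("alice", "link1", 1)
store.cache.clear()                      # simulate a cache eviction or restart
print(store.get_vote("alice", "link1"))  # still answered from the durable tier
```

The point of the sketch is only the fallback order on reads and the ability to regenerate one tier from the one behind it.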
That makes a lot of sense. Were the majority of your systems using this "durability chain" so to speak -- memcached -> Cassandra -> Postgres? Additionally, in retrospect do you find this type of chain to work well, and would you use it again (perhaps you already are over at Netflix)?
No. That's generally because once something gets popular and jumps to the front page, it gets a huge boost in visibility, especially from people who weren't looking at the niche subreddit it comes from.
A lot of those people aren't interested in that content, so it will suddenly get an influx of downvotes.
Thanks for the insight. I have not had much to do with Go, but I hear about a lot of positive benchmarks here on HN. Reddit is kind of an example social web app, and I read a lot about its architectural changes and scaling efforts. So for social Web 2.0 apps, which are getting older now, I'm curious how much difference Go would make. Mainly for Google applications, apparently Go brings a lot of speed and performance on the same server.
It doesn't really matter what the genre of an application is. What matters is the runtime fundamentals. How much time is spent computing vs waiting for I/O? Whichever one is slower is the current bottleneck and is what you should fret about. Go becomes something to consider if the bottleneck is computation time. It's tangential otherwise.
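One rough way to check which side of that line a handler falls on is to compare wall-clock time with CPU time. The sketch below uses Python's standard `time.perf_counter` and `time.process_time`; the two handlers are made-up stand-ins, and the measurement is approximate (it assumes a single-threaded process).

```python
import time


def wall_and_cpu_seconds(fn):
    """Run fn once; return (wall-clock seconds, CPU seconds) for this process."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def waiting_share(fn):
    """Rough fraction of wall time not spent on the CPU (I/O, sleeping, locks)."""
    wall, cpu = wall_and_cpu_seconds(fn)
    if wall <= 0:
        return 0.0
    return max(0.0, 1.0 - cpu / wall)


def fake_io_request():
    time.sleep(0.05)  # stands in for a database or cache round trip


def fake_compute_request():
    sum(i * i for i in range(200_000))  # stands in for CPU-heavy rendering


print(f"I/O-style handler waits {waiting_share(fake_io_request):.0%} of the time")
print(f"CPU-style handler waits {waiting_share(fake_compute_request):.0%} of the time")
```

If the waiting share is high, a faster language runtime mostly speeds up the part that was not the bottleneck.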
Some time ago the HN hivemind went through a period of blind Node.js love. Now it's going through blind Go love.
Regardless of what you think of either language, "rewrite in X" is not a magic incantation that will spontaneously solve all your architectural issues. Designing a good architecture involves balancing many components, of which your primary implementation language is an important, but not exclusive, element. There are also the organisational issues -- hiring, spending time not adding new features, etc.
Perhaps so, but then it would fit into the category of 'unaware of being a parody of itself' statements/questions that are also (unintentionally) humorous such as the classic "I can tell that site was built in rails from the design".
Profitability is not that big a deal with something of the size and importance of Reddit though.
Firstly, as with Wikipedia, if Reddit were forced to close because of money issues, Reddit could simply post a 'donate now or reddit shuts down' post and they would likely be rolling in millions of dollars.
Second, simply because reddit itself is not profitable does not mean people are not making a lot of money off reddit. The moderator system lends itself very well to a kind of 'corporate capture' of communities where moderators can be (and are) bought off for very tidy sums.
> Firstly, as with Wikipedia, if Reddit were forced to close because of money issues, Reddit could simply post a 'donate now or reddit shuts down' post and they would likely be rolling in millions of dollars.
From what I remember, this is kind of why they started Reddit Gold.
There is an issue of who calls the shots -- if you solicit donations from your users, that's who you are beholden to and need to serve to get money. If you are soliciting third party advertisements, that's who you are beholden to (and if you are using a third-party ad placement service, you are beholden to them as well as, perhaps more than, the actual advertisers.)
I really wish wikipedia would just put small unobtrusive text adverts on each page rather than the massive intrusive banners begging for money.
Hi, welcome to your first day on the Internet. Since you're new, let me tell you how things work around here.
There are probably dozens of web sites similar to Wikipedia. But Wikipedia is on the first page of search engine results for just about anything you search for. Why is that? Because people have learned that they can trust them over the last 12.5 years.
When you go to Wikipedia, you know that when you're looking for information on the Battle of Hastings that you aren't going to see ads for anatomy enlargement pills. You won't see any advertising at all in fact. You know that the community at large does a decent job at removing biased information. You know that a company can't buy their way into hiding negative information or promoting positive information.
This level of trust is what causes people to link to Wikipedia thousands of times per day.
So let's say Wikipedia takes your advice. They put a small unobtrusive text advert on each page. Suddenly you're searching for information on acne and an ad for "Acbegone" pops up that promises to cure your problem for 3 easy payments of $19.95. Acbegone ends up becoming a huge advertiser with Wikipedia - spending $1 million per month on advertising. Suddenly Wikipedia gets The Phone Call. "Hi, this is Acbegone. We'd love to continue advertising on your site but your article on acne mentions 10 other products. Get rid of those and we'll double our ad spend with you. Don't get rid of them and we'll be forced to stop advertising." Wikipedia can't make do without the income they've become accustomed to so they make editorial decisions to not mention any product - but still there's that ad from Acbegone. Suddenly Wikipedia seems like one huge cheesy ad. People stop trusting it. People stop linking to it. It stops coming up in search engine results.
Your hypothesis about an advertiser asking wikipedia to alter content surely applies to google search results.
Google indexes other people's content. All Google has to say is, "Sorry, we're not in control of the content others make, our automated systems follow an algorithm we're unable to make one-off tweaks to." It could conceivably cost Google $1MM to make a one-off tweak to their algorithm in terms of programming and testing time.
Wikipedia on the other hand is all content. They have no plausible response other than, "Yeah, it would take 5 minutes to update that but we won't do that for you." Hell, all they'd really have to do is let the advertiser update it as they want and then instruct editors to do nothing.
It really is just different for this and a number of other reasons.
The problem is, if wikipedia did alter articles based on advertisers' demands (which seems pretty far-fetched to me), the public would just alter them back. Or see the edits wikipedia is making and put 2 and 2 together.
>and then instruct editors to do nothing.
Yeah good luck with getting wikipedia editors to comply with that request!
A site like wikipedia would likely have thousands upon thousands of advertisers. They wouldn't be dependent on a few big advertisers. If an advertiser came to wikipedia and asked them to change a page, wikipedia would just say "no", publish the details to make the advertiser look like a douche (cue internet witch hunt, boycott, naming, shaming etc), and not care about the 0.000% temporary drop in revenue.
Without cash flow, how far can the site go? Someone needs to pay the bandwidth/server bill. Not to mention it costs time to run a site. If you are working full-time at another job (because you aren't bringing any money in), you won't have any time to work on it.
Reddit succeeded largely because the company that bought them is making money elsewhere.
AllAdvantage was great for free money as a teenager. It only took a few minutes to slap together a VB application to move the mouse a few pixels every minute. I made a few hundred dollars from them while I slept.
That was more of a general statement than a comment specifically on reddit's position.
I can make millions of dollars selling condom wrappers, but just because I have made millions of dollars does not mean that I have done something important. I may catch quite a bit of hate for this, but a large portion of HN's content is on things that make money, but are not truly important.
I think the mantra from the startup community is often:
1. Make money doing whatever it takes. eg come up with some crappy website, sell it to google, then shut it down.
2. The money problem is solved!
3. Spend money solving world hunger, diseases, philanthropy.
I would find it incredibly strange if reddit is never able to make money. I don't think they are really trying at the moment.
Reddit is an incredibly valuable service. Maybe a lot of people on Hacker News don't see this, but reddit has basically become the Geocities of online discussion communities. The subreddit system has eliminated the "eternal september" problem, since all non-casual users will trickle into the communities that match their interests. Even if reddit loses 90% of its users, it will still be a highly relevant online community. I am certain that they can turn a (modest) profit if they really try.
Reddit will probably never become a massive money machine. But regardless, it is a very influential community. Even community is arguably an understatement at this point, it is really closer to infrastructure. As I have said here before, I would be willing to bet that it is still around in 10 years, with a significant number (millions) of users.
> I don't think they are really trying at the moment.
I can - sort of - confirm they are not trying hard. Last time I tried to advertise on Reddit, I failed because they could not accept CC payments from mainland Europe ... Just think of all the ad revenue they are losing.
Advertising on Reddit is not the same as advertising in general.
If you advertise on Reddit, you're advertising to a violently anti-corporate anti-advertising audience, who may love you, but very well may hate you. You could be subject to a witch hunt at the drop of a hat.
I very much doubt advertisers would be lining up to advertise to that crowd. They're hardly big spenders either.
This is an unsubstantiated claim. Some of reddit's communities are like this, but most are not. If you're just viewing the front page, you are viewing the lowest common denominator, which could give you this impression. But reddit is a very heterogeneous community.
We operate a dozen of our own colos, with a virtual colo on AWS for insta-scalable multi-region redundancy, and an Amazon "colo" costs the same as about eight of our own when spun up and serving at least a gigabit of traffic.
However, the difference is less if you're going from zero sys admins to 24/7 sys admins. I'd SWAG the crossover is once your AWS budget exceeds 4 full time sys admins willing to do shift work.
/r/all is the internet’s largest right-wing community, on any manner of subjects from race relations in America, to multiculturalism in Europe, to feminism and women’s rights anywhere. Last time I visited was around the Zimmerman verdict, and I couldn’t decide whether the conversation on reddit more closely resembled Free Republic or Stormfront—the major difference being that neither of those other right-wing communities can match redditors in their hatred and fear of women.