

The Costs of Bookmarking - jambo
http://blog.pinboard.in/2011/09/the_costs_of_bookmarking/

======
mmaunder
This seems too expensive and I'd like to try and help, so I'm going to give
you an idea of my hosting bill and why it's low and then suggest something for
you:

I pay around $3k per month. I own my own servers and lease a full rack and I
serve roughly 1 billion page impressions per month. My bandwidth consumption
is measured in Mbps rather than amount of data transferred because I get
billed using 95th percentile billing. I average around 130 megabits per second
of transfer, constantly, peaking at 150 Mbps. I'm transferring roughly 40
terabytes of data per month. 95th percentile billing and owning your own
servers are the key here.
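
To make the billing model concrete, here is a minimal Python sketch of the two calculations above. It assumes the common convention that the provider samples throughput at regular intervals, discards the top 5% of samples, and bills on the highest remaining one. The function names and the sample data are illustrative, not taken from any provider's tooling.

```python
def monthly_transfer_tb(avg_mbps: float, days: int = 30) -> float:
    """Convert an average rate in megabits/s into terabytes moved over `days` days."""
    bytes_per_second = avg_mbps * 1_000_000 / 8
    return bytes_per_second * days * 86_400 / 1e12


def percentile_95(samples_mbps: list[float]) -> float:
    """Billable rate: sort the samples, drop the top 5%, return the highest one left."""
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    # Integer ceiling of 95% of the sample count, converted to a zero-based index.
    index = -(-95 * len(ordered) // 100) - 1
    return ordered[index]


# Hypothetical month: 95% of samples at 130 Mbps, 5% bursting to 150 Mbps.
samples = [130.0] * 95 + [150.0] * 5
billable_mbps = percentile_95(samples)
transfer_tb = monthly_transfer_tb(130)
```

At a steady 130 Mbps the first function gives about 42 TB for a 30-day month, in line with the "roughly 40 terabytes" quoted above. Bursts to 150 Mbps do not raise the bill as long as they stay within the discarded 5% of samples.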

To give you an idea, for one month of your hosting bill you can buy 1,
possibly 2 servers from Dell and put them in a half rack that will cost you
around $800/month including power, secure access, bandwidth, etc. Those
servers will last around 5 years with a possible drive replacement or two
during that time for a few bucks.

But I think you have another problem that's making things worse. With 15,000
active users you should be able to support them on one or two small Linode
servers using round robin DNS. That's a relatively small user base and the
number of requests can't be anything more than 10 per second? So
I'm guessing something about your basic app architecture is off. It could be
that you're not using Nginx to reverse proxy to Apache and you think you need
more apache children/processes, and therefore more memory, than you actually
do. You could have a DB that doesn't have indexes in the right places and so
you're IO bound.
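
As a rough illustration of the missing-index point, the sketch below uses Python's built-in sqlite3 module as a stand-in for whatever database the site actually runs (the thread only guesses at MySQL). The `bookmarks` table, its columns, and the index name are hypothetical. It asks the database for its query plan before and after adding an index on the lookup column.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per saved bookmark.
conn.execute(
    "CREATE TABLE bookmarks (id INTEGER PRIMARY KEY, user_id INTEGER, url TEXT)"
)

sql = "SELECT url FROM bookmarks WHERE user_id = ?"

# Without an index the planner has to scan the whole table.
plan_before = query_plan(conn, sql, (42,))

conn.execute("CREATE INDEX idx_bookmarks_user ON bookmarks (user_id)")

# With the index the planner can seek directly to the matching rows.
plan_after = query_plan(conn, sql, (42,))
```

The exact plan wording varies between SQLite versions, but the first plan should report a full scan and the second should mention `idx_bookmarks_user`. On a large table that difference is what separates a disk-bound query from a cheap one.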

I would suggest first looking at your app and seeing where the bottlenecks are
in performance. Fix that first, then look at hosting.

Questions:

-How many servers do you currently have? Give us a rough idea of the config.

-How many app requests per second do you get at peak?

-What's your peak bandwidth throughput in Mbps?

-On your servers, is lack of CPU or lack of disk throughput the bottleneck?

-Have you had problems running out of memory that caused you to buy more servers? Which app ran out of memory?

-Give us an idea of your server config. e.g. nginx -> Apache -> MySQL & Redis. Do the servers talk to each other and if so what do they do?

~~~
patio11
This is a generous offering of expertise, but supposing you were to help him
out and he were to take on implementation costs, wouldn't "Here's what I'd do
to get you to a billion impressions a month of _customer demand for your
product_ " be a higher-priority task than saving a few hundred bucks?

The most common failure mode in scaling for startups is _to have no scaling
problem at all_.

~~~
wheels
Costs that are that out of whack with expectations would usually indicate some
very low hanging fruit. If you can save $800/month by adding a couple of
database indexes and tweaking your config file, then yeah, it's worth it.

That said, from what Maciej has written in the past, he sounds competent in
these things (i.e. has his databases set up for remote replication and
failover), so it would seem that the culprit is more likely over-engineering
than under.

------
StavrosK
Hmm. For another data point, <http://historio.us>, which has about 3k active
users, costs $30/mo to run. I'm sure costs wouldn't scale completely linearly
with users, but there you go.

~~~
ThaddeusQuay2
HELLO! I recently signed up for your service, tested it out a bit, and really
wanted to give you $20 for a year's plan, but was dissuaded by the fact that
there are no blog entries since November 25th, and by the fact that indexing
of PDFs, a feature request which was made about a year ago in one of your
polls, is still not done. Can you tell me the status of the PDF indexing,
and/or about the general maintenance and updates that are going on behind the
scenes? I like your service a bit more than Pinboard, despite their additional
features, and I would still like to commit to yours, in the form of a paying
customer, but I first want to know that it hasn't been abandoned, or at least
that it's not totally on autopilot. Thanks in advance for a detailed reply.

P.S. I really hate to point this out, but $30 per month does not inspire
confidence. How do you handle backups, and switching over to another machine
if the main one should fail (as well as other such items, which normally incur
an extra cost in this kind of setup)?

~~~
StavrosK
Hello! The service hasn't been abandoned, but feature development has slowed
down a bit, sadly, hence PDF indexing taking so long. Apart from that, though,
everything is working perfectly, and we do change some small things from time
to time or upgrade/maintain components. Uptime should be excellent, though:
we've had minimal downtime for the past year.

Right now, everything is served from a single server, which is why we get hit
by some datacenter maintenance from time to time. Backups are made daily as
well as multiple times a week, so that shouldn't be a concern... Let me know
if you have any other questions!

~~~
ThaddeusQuay2
I like your interface a bit better than Pinboard's, because yours is more like
Google's. I like some other things about yours, although I don't recall them
at the moment. However, I do remember that I was able to bookmark PDFs, even
if they aren't searchable. So, I can at least add tags.

I see that your Twitter is rather current, but it wouldn't hurt to do a
monthly blog update, even something minor, because I assume that there are
other people, than myself, who look at these details when it comes to deciding
whether to commit to a service. Said commitment is really less about money,
and more about the perceived reliability, stability, and long-term viability
of the service.

I suggest that, in the export function, you include everything about each
bookmark, such as tags, dates, and whatnot, because it will likely make the
customer feel better to know that they can get a complete snapshot of their
efforts, whether they are leaving the service, or whether they are attempting
to occasionally perform some external operation on their data. Giving
customers complete ownership of their information is a good differentiator,
because most websites don't provide such a thing.

My last question: How do you handle (D)DOS attacks on public bookmarks? I can
see this as being a potential problem.

Thank you for providing an alternative to Pinboard!

~~~
StavrosK
Hmm, you have a good point about blog updates, it's just that we use Twitter
for minor announcements. I'll have to change that, though.

Export already includes all the data except the actual page text, so by
downloading that file once in a while you can reconstruct your bookmarks
almost perfectly (or just import it into your browser).

There haven't been any DoS attacks yet, but varnish is a champ, so that would
be pretty easy to mitigate, depending on the scale...

~~~
eropple
Varnish is nice for load spikes, but how do you cope with, say, a SYN flood?
(Honest question, I'm curious.)

~~~
StavrosK
We don't, we haven't needed to yet.

------
code_devil
$2k for 15000 users. <wrong>Each user is worth $7.50/month to you. But, they
only pay once to sign up.</wrong> Each user is worth 13 cents.

So, it seems that the users paying for the archival service ($25/yr) are the
ones keeping the lights on at your bookmarking service. 1000 such users would
bring in $25K. Basically you need 7% of your users (15000) to break even today.

Each user is worth $2000/15000 ~= 13 cents
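
The break-even arithmetic can be sketched in a few lines of Python. The inputs are the figures quoted in this thread ($2000/month hosting, 15,000 active users, $25/year archival plan); the function names are made up for illustration, and the sketch ignores sign-up fees and any cost other than hosting.

```python
MONTHLY_HOSTING_USD = 2000       # hosting bill quoted in the thread
ACTIVE_USERS = 15_000            # active users quoted in the thread
ARCHIVE_PRICE_PER_YEAR_USD = 25  # archival plan price quoted in the thread


def cost_per_user_per_month(hosting_usd: float, users: int) -> float:
    """Hosting cost attributable to one active user per month."""
    return hosting_usd / users


def break_even_subscribers(hosting_monthly_usd: int, price_per_year_usd: int) -> int:
    """Smallest number of yearly subscribers whose fees cover a year of hosting."""
    yearly_hosting = hosting_monthly_usd * 12
    return -(-yearly_hosting // price_per_year_usd)  # integer ceiling division


per_user = cost_per_user_per_month(MONTHLY_HOSTING_USD, ACTIVE_USERS)
needed = break_even_subscribers(MONTHLY_HOSTING_USD, ARCHIVE_PRICE_PER_YEAR_USD)
share_of_users = needed / ACTIVE_USERS
```

With these inputs the cost works out to a little over 13 cents per user per month, and roughly 960 archival subscribers (between 6% and 7% of active users) would cover the yearly hosting bill.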

My Question:

1. Is it easy to get recurring paying users?

2. In the long run (for any paying web service) what % of the user base do you
think will be such users? [Is 20% the max? or 30%?]

EDIT: Fixed my cost per user

~~~
alexallain
Did you mean to compute the cost/user? If so, that comes out to a little over
13c/month. Still, that comes to $1.60/year, meaning that a user would start to
cost money right around the end of the 6th year of usage (assuming hosting
costs don't go down).
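
A quick sketch of that estimate in Python. The sign-up fee is not stated in this thread, so the $9.60 used below is a hypothetical value back-solved from the six-year figure; swap in the real fee to redo the estimate. Hosting costs are assumed to stay flat.

```python
def years_covered_by_fee(signup_fee_usd: float,
                         hosting_monthly_usd: float,
                         users: int) -> float:
    """Years of hosting one user's one-time fee pays for, at flat hosting cost."""
    yearly_cost_per_user = hosting_monthly_usd * 12 / users
    return signup_fee_usd / yearly_cost_per_user


# Hypothetical one-time fee; $2000/month and 15,000 users are from the thread.
years = years_covered_by_fee(9.60, 2000, 15_000)
```

At $1.60 per user per year, a fee in that range is used up after about six years.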

But if they're not paying for archival, are they really the folks who are
causing the high hosting costs? As others have speculated, it would seem that
the archival service might be the major reason that hosting is so expensive.

~~~
code_devil
Yes. I computed it reverse. Thanks.

------
latitude
Since we seem to be sharing the numbers now, let me add my 2c. I ran the
Hamachi service [1] with 3 mil registered accounts off 4 co-located servers at
a cost of about $700/mo in total. The servers were mid-range 1U Dell boxes,
each costing around $1K. That was back in '04, so I'm sure prices have gone
down quite a bit since then.

[1] <http://en.wikipedia.org/wiki/Hamachi_(software)>

------
joshu
FYI, Maciej is banned from HN - his comments are autodead, so don't expect
responses.

~~~
1336
Why?

~~~
prawn
<http://news.ycombinator.com/item?id=2985137>

------
teoruiz
It would be enlightening to know a bit more about the actual server
architecture of the site. I understand that the storage footprint per user for
a bookmarking site is quite big[1], but I still think $2k for 15k users is a
fairly high hosting bill.

[1] Since they offer full archiving of bookmarked websites to their premium
users.

~~~
haraball
There's more information about this in an earlier blog post, e.g.:
<http://blog.pinboard.in/2011/08/a_short_rant_about_hosting/>

~~~
sparky
As noted in that post, Pinboard's hosting costs are so high because it runs on
(multiple?) beefy dedicated servers (8-16 cores, 24-48GB RAM) costing
$500-$1000/month each.

Presumably he has gone this route in order to keep a large database mostly in-
memory, but I'd be interested in hearing more about the application
constraints that led to this architecture. Some applications certainly need
beefy servers, but they have to provide a lot of bang for your buck to compete
with the 10-50 VPSs you could rent for the same cost.

~~~
sciurus
When you need 48GB of RAM, having 50 VPSs with 1GB of RAM doesn't do you much
good, not to mention the i/o issues he describes.

~~~
sparky
I agree completely. It's the genesis of "needing 48GB of RAM" that I'm curious
about. The I/O issue with shared hosting, as I understood it, was that it took
a long time to rebuild a busted RAID volume. This, to me, seems like a
different symptom with the same underlying cause: there is a big server in the
middle whose operation is critical to the site (otherwise, having 1 of N
replicas take a long time to rebuild wouldn't be a big deal). For some sites,
this big-server-in-the-middle is fundamentally necessary to do what the site
does. I would like to know what it is about Pinboard that puts them in this
boat. This is an honest question, not a sneaky way of saying "surely they
don't need a server that big."

------
aquark
Thanks for sharing the data.

Can you break this down into how many actual servers you have? Is providing a
high level of redundancy the source of the high costs?

Why do the S3 numbers fluctuate so much? That implies a lot of transient
data. At $100/month you are storing ~50MB per active user?
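
The ~50MB estimate can be reproduced with a short Python sketch. It assumes S3 storage cost about $0.14 per GB-month at the time and that the whole bill is storage (no request or transfer charges); both are assumptions, not figures from the post, and the function name is illustrative.

```python
def stored_mb_per_user(monthly_bill_usd: float,
                       users: int,
                       usd_per_gb_month: float = 0.14) -> float:
    """Average megabytes stored per user if the whole bill is storage."""
    gigabytes = monthly_bill_usd / usd_per_gb_month
    return gigabytes * 1000 / users


# $100/month and 15,000 active users are the figures used in this thread.
mb_per_user = stored_mb_per_user(100, 15_000)
```

Under those assumptions $100/month buys roughly 700 GB, which comes to a bit under 50 MB per active user.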

------
mibbit
That's an amazing amount of money to be spending on hosting.

Should be spending nearer $200/mo total for hosting a service like this with
that number of users IMHO.

~~~
sramov
Why do you think so? Not all sites are created equal. And Pinboard is
definitely on the lean side.

Try hosting a cluster of Magento 'daily deal' sites on Amazon infrastructure,
with shared storage and isolated RDS instances for each region. You end up
with $15-20k per month for god damned bloated Magento installs.

~~~
mibbit
...then don't use magento.

What I said was that for the functionality pinboard provides, and the number
of users using it, the hosting costs are extremely high.

If the hosting costs are high because of inefficient software, or bad
architecture decisions, then those should be changed.

~~~
patio11
_If the hosting costs are high because of inefficient software, or bad
architecture decisions, then those should be changed._

If anyone here believes this, make your best estimate as to how many man-
months you need for a re-architecture and how many _hundreds_ of dollars
you'll save in hosting costs, then do the division to get your effective
hourly rate. I'll pay you that plus 50% for contract programming work.

~~~
mibbit
In the example, the current hosting is $24k/yr. If that can be slashed to
$2.4k/yr, you've saved $21.6k a year in hosting.

For $21.6k/yr I'd say it's worth a week or two re-architecting.

Yes, it's a different game if you're profitable and $21.6k is negligible, but
if you're a startup you should be spending time to optimize things.

The other point is one of scaling. If you're paying $2k/mo to support 15k
users, when you scale to 15m users, you could be paying $2m/mo unless you fix
things early on.
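
For anyone who wants to plug in their own numbers, here is a small Python sketch of the arithmetic in this exchange: the yearly saving, the effective hourly rate of the re-architecture work, and the naive linear projection of the bill. The 40-hour week and the function names are assumptions for illustration, and real hosting costs rarely scale perfectly linearly.

```python
def annual_saving(current_monthly_usd: float, target_monthly_usd: float) -> float:
    """Yearly saving from cutting the monthly hosting bill."""
    return (current_monthly_usd - target_monthly_usd) * 12


def effective_hourly_rate(saving_usd: float, weeks: float,
                          hours_per_week: float = 40) -> float:
    """First-year saving divided by the hours spent re-architecting."""
    return saving_usd / (weeks * hours_per_week)


def linear_cost(monthly_usd: float, users_now: int, users_later: int) -> float:
    """Monthly bill if cost per user stays constant as the user base grows."""
    return monthly_usd * users_later / users_now


saving = annual_saving(2000, 200)              # $24k/yr cut to $2.4k/yr
rate = effective_hourly_rate(saving, weeks=2)  # "a week or two" of work
projected = linear_cost(2000, 15_000, 15_000_000)
```

With the thread's figures the saving is $21.6k a year, two weeks of work comes out at $270 per hour against the first year's saving, and the linear projection to 15m users is $2m a month. The result is very sensitive to the time estimate: the same saving spread over six months of work is only a few tens of dollars per hour.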

~~~
adambyrtek
What makes you think that in "a week or two" you are able to cut costs by a
factor of ten without sacrificing performance and reliability?

~~~
mattmanser
Well, there's a lot of people here expressing surprise at the cost of hosting
a small user base of simple data in a not complicated problem domain.

It looks an order of magnitude too expensive to us, so there's probably some
simple thing wrong in the architecture.

Maybe we're all missing a key complexity of the service. The Archival service
might be it.

~~~
jarek
I'm imagining the full text search of the archive to be a factor, but then
again I don't know much about search.

------
huhtenberg
Maciej, can you share that Google doc in a way that lets people view it
without needing to create an account with Google?

~~~
icebraining
Saved as HTML and reuploaded: <http://pastehtml.com/view/b6z5s6bc5.html>

~~~
huhtenberg
Thanks

------
brianbreslin
What % of the users are paying? What is the average rate they each pay (since
I know you had the "each additional person pays .01 more" thing for a while)?

What was your rationale behind these ISPs vs say pure cloud
(EC2/rackspace/stormondemand etc)? Or what made you pick each option?

~~~
qxb
The answer to your second question is covered here:
<http://blog.pinboard.in/2011/08/a_short_rant_about_hosting/> ("Why not go
with Linode/AWS/[other virtualized hosting]?")

tl;dr - I/O performance

