
How We Built A Data Center With Commodity Hardware And FOSS - khadim
http://www.searchenabler.com/blog/build-your-own-data-center/
======
gouranga
Looks a bit shonky to me, but if it works, good for you.

I worked for a design agency back in about 2004 which had the same sort of
infrastructure internally. They had a cheap desktop Exchange 2003 box, cheap
desktop Fileserver, cheap desktop SQL + App server, cheap desktop mp3 server,
big old DLT drive, SDSL router, switch, a KVM, cranky old monitor, keyboard,
mouse, mounds of cable, a well underpowered UPS and a garfield all wedged
under a couple of tables.

Some muppet knocked over a coffee cup and it spilled through the gap between
the tables into the monitor causing a cascade failure and a small fire which
melted pretty much everything under the table. It took the company out for 2
weeks until Dell could ship new kit in.

There is a reason to do it properly IMHO (and experience!)

~~~
khadim
We will avoid coffee at our garage ;)

We tried to control costs by assembling things ourselves, using commodity
hardware, etc., but did not use low-quality parts. Our UPS is from Eaton, the
server components are from standard brands like Seagate and Intel, and the
routers/switches are from Cisco and D-Link.

~~~
gouranga
In that case I withdraw my comment :)

Best of luck - sounds good.

------
ErrantX
Commodity hardware _seems_ a good/cheap option - and I don't know, this might
be fine for them.

I built a couple of racks for clustered decryption using commodity hardware;
after a few months' "burn in", the failure rate proved to be way higher than
expected. Which, when you think about it, is logical -- they are running 24/7
(compared to if you were using the hardware "normally").

After having to replace three servers inside a month we started to cycle into
server hardware. This was a _lot_ more expensive up front, but is reliable and
the failure rate is almost non-existent in comparison.

So; buyer beware.

Even so; it is good to see someone else hacking together their infrastructure
:)

~~~
jeffffff
what components were you having issues with the most?

~~~
khadim
SMPS units and UPS batteries. As pointed out in another comment, I think we
need temperature & humidity control.

------
robomartin
The very first thing you should do if you want to bolt-together your own
hardware out of commodity devices is invest in static control technology. This
is often not understood by those without experience in electronics
manufacturing. Static electricity can cause damage to components that will not
surface for months. Because of the nature of the mechanism, it is almost
impossible to know that static was the culprit when a motherboard, memory
module, or drive fails.

In our case, we, among other things, do a lot of hardware manufacturing (as in
buy the chips and build our own boards). Because of this we invested in custom
conductive flooring and everything else above the floor is also static-aware.
For example, all rolling carts and shelves are wired with redundant contacts
that touch the floor, thereby making a connection.

The above is an extreme for the occasional builder. At a minimum you should
set up a working environment that has a conductive surface like a
static-dissipative mat, and wear a conductive wrist strap connected to the mat
as well. There are also ionizers and other tools that would be good
investments.

Handling components around server racks should never happen without regards
for static electricity safety. The rack should be grounded and you should wear
a conductive wrist strap connected to the rack.

I'll bet that a lot of failures in consumer/commodity builds are due to this
very issue and not necessarily connected to bad or questionable components. If
you buy components built by reputable manufacturers they will have been
manufactured in tightly controlled environments already. Don't break the
reliability chain by handling these components in uncontrolled environments.

~~~
khadim
Thanks Martin for the inputs. We got connected to a hardware expert in
Bangalore who will be visiting our premises and helping us sort out some of
the stuff you mentioned. Sharing really helps. If we had not put together this
article, we would not have received so many inputs, nor would we have had
people voluntarily approaching us to help :)

~~~
robomartin
You need to think like an electron. Create conductive paths for static
electricity everywhere you can. Tables, shelves, floors, racks, desks. The
floor we have is specifically designed for clean rooms. As I understand it, it
is epoxy mixed in with carbon to make it somewhat conductive. In order to
install it, they first clean and acid-wash the existing floor. Then they
scrape it to make it rough (to improve adhesion). Then they apply flat copper
conductors crossing the room in various directions and generally ending near
grounded wall outlets. Finally, they apply multiple coats of the conductive
epoxy on top of the whole thing. The copper foil is longer than the width or
length of the room. The excess is connected via a thick wire to the grounding
circuit. The floor becomes the foundation upon which all else (in terms of
static control) is built.

------
xd
Reminds me of a "datacenter" some house-mates and I built in a basement when
in uni.

We stuck the servers on top of a few pallets so they wouldn't be submerged in
water when it rained. The power was fed from the living room above and the
network cables pushed through gaps in the floor boards to all the bedrooms.
The adsl router was nailed to the wall on the top floor above the beer fridge
and ran smoothwall I think it was.

We even tried running a serial link for a monitoring terminal in the living
room over standard audio cable .. it kinda worked .. and was called teethanet
for some reason.

Overall it worked like a charm and I don't remember there ever being downtime
.. even when there was a foot of water surrounding the servers.

~~~
khadim
Thanks for sharing, xd. We had our share of problems initially, but things are
now pretty stable, with hardly any downtime.

~~~
xd
What happens with your network when the power goes down in your area, do you
lose connectivity to the outside world?

Also, I just realised that I may have come across as a bit patronising! That
wasn't the intention, I was just relating to what can be done with a lack of
resources and a bit of ingenuity :)

~~~
khadim
We have an Eaton online UPS, which as of now can provide 4 hours of power
backup. We've also got a 3 kVA fuel-powered generator, used only when a power
failure lasts longer and the UPS batteries need to be recharged. Pretty common
here.

------
illumin8
The problems you will face are not obvious and immediate problems. For
example, a year from now, a server might fail, and you find out that you can
no longer obtain that specific model of motherboard or RAID card that was used
in that server, so you have to replace the entire thing (purchase all new).
Or, you find that a drive in your RAID array fails and you can no longer get a
drive with the exact same model and disk geometry as the other disks in the
RAID, so you are unable to rebuild the RAID and have to basically start from
scratch.
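
The disk-geometry trap can be reduced to a simple check: a replacement disk
must have at least as many sectors as the smallest surviving member, or mdadm
(and most RAID controllers) will refuse the rebuild. A rough sketch -- the
helper function and the sector counts below are illustrative, not from any
particular tool:

```python
# Sketch (hypothetical helper): a replacement disk can only rebuild into a
# RAID set if it is at least as large as the smallest existing member.
def can_rebuild(existing_sectors, replacement_sectors):
    """existing_sectors: sector counts of the surviving array members."""
    return replacement_sectors >= min(existing_sectors)

# A "500 GB" drive from another vendor can come up a few thousand sectors
# short of the originals (counts here are made up for illustration):
print(can_rebuild([976773168, 976773168, 976773168], 976771055))  # False
print(can_rebuild([976773168, 976773168, 976773168], 976773168))  # True
```

One common defense is to partition each member slightly smaller than the raw
disk, leaving headroom so any same-class replacement fits.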

The main reason why you should go with real server type components is the
support lifetime. You can be fairly confident that you can get replacement
parts be it CPU, memory, raid controller, motherboard, or disk drive, for at
least 5 years after purchase. Otherwise you will find that when your
components die, you will be completely unable to repair them due to lack of
available parts.

Good luck to you, but I've been down this road before and if you cut corners
in the beginning, it usually costs you more money to do it right later on.

~~~
khadim
Our assumption is that we can recover the cost of the equipment within the
warranty period (most components have a 3-year warranty). Also, most of the
data redundancy is handled at the software level; I feel that's the best use
of FOSS like Hadoop/Cassandra. We can avoid expensive high-end hardware, build
the system with commodity components, and rely more on cloud technology. You
might still be right, but I hope things work out.
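
The availability math behind that kind of software-level redundancy is simple
majority arithmetic; a sketch (these functions are illustrative, not the
Cassandra API):

```python
# Cassandra-style quorum: a read/write succeeds if a majority of the
# replicas for a key respond.
def quorum(rf):
    """Replicas that must answer, given replication factor rf."""
    return rf // 2 + 1

def tolerated_failures(rf):
    """Nodes that can be down while quorum operations still succeed."""
    return rf - quorum(rf)

for rf in (1, 2, 3, 5):
    print(rf, quorum(rf), tolerated_failures(rf))
# RF=3 tolerates 1 replica down; RF=5 tolerates 2.
```

This is why RF=3 is the usual floor for commodity clusters: any single cheap
node can die without the service noticing.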

------
EchoAbstract
My questions about the data center are what kind of climate controls they're
using (not listed in the article) and how they're making sure that the
generator and UPS are always in good condition. At my previous job, we were
able to correlate many component failures with poor climate conditions (too
hot, too cold, too little humidity, too much humidity).

I also recall setting up a 110-node Linux cluster back in college. Before they
upgraded the HVAC system in the server room, the cluster generated so much
heat that the power cables started to show signs of heat damage (we had to
shut the cluster down until the HVAC was beefed up).

I've also had a bunch of UPSes where the batteries failed and needed to be
replaced periodically. I also wonder how that figures into the cost of their
data center (it's an issue at big data centers as well).

~~~
khadim
Ya, we faced battery failure problems. We are using standard cooling via air
conditioners. I think we will face more problems in this aspect as we scale;
we need to work that out.

~~~
genwin
If you had wind and enough control of the building, you could funnel wind over
water to cool, with no power required. That was a method for making ice a
thousand years ago.

I love what you've done for so little cost.

~~~
khadim
We don't have sufficient control, as we are in a rented premises. Superlike
for your suggestion :)

------
RealGeek
Most parts of India suffer power outages of more than 12 hours a day. I
understand that you have a UPS and power generators, but do they provide
enough reliability to run your servers?

I have a UPS and generator at home; the cost of running a power generator
exceeds $1,000 a month, but it is still not reliable.

Wouldn't servers at Hetzner be cheaper and more reliable than hosting in your
garage in India? You can get an i7 server with 16 GB RAM for $60 and one with
32 GB for $72, with no hardware setup & maintenance cost.

[http://www.hetzner.de/en/hosting/produktmatrix/rootserver-pr...](http://www.hetzner.de/en/hosting/produktmatrix/rootserver-produktmatrix-ex)

~~~
khadim
Ravish, we are based out of Bangalore, and so far we have never had a
12-hour-a-day power outage. The max we have faced is 4-5 hours, on a couple of
occasions in summer. In general, power outages total no more than an hour or
two a week, which is manageable with the UPS & generator combo.

From the Hetzner pricing page, adding the 1 Gbit port cost of $50 a month,
data transfer charges, etc. would make it much more expensive. Considering our
usage, we felt private infra would be a better choice.

~~~
RealGeek
Do you have 1 Gbit leased line in Bangalore? How much is the data transfer cap
and who is your ISP?

~~~
khadim
There is no data transfer cap for ILLs. We use You & Tulip. I mentioned the
1 Gb port because we need high-speed communication across nodes. We use
Cassandra and Hadoop, so the internal data transfer rate is very high. 100
Mbps creates a bottleneck for internal communication, but is definitely
sufficient for external communication / bandwidth.

I commented based on what I got from the pricing page. Apart from the basic
server cost, there's always associated extension hardware/data/software cost,
which generally makes it much more expensive.

If you have more details on a cloud setup using Hetzner and the associated
costs, we would be happy to explore.

~~~
moonboots
Hetzner can set up a private gigabit vlan upon request[1]. They charge a one
time setup cost for the switch (30 euros) and 15 euros/month/server. Internal
and incoming traffic is free. With this setup, internal data transfer is at 1
Gbps while external transfer is at 100Mbps.


[1] http://devblog.supportbee.com/2012/06/04/building-a-dependable-hosting-stack-using-hetzners-servers/
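
As a rough sketch of what that pricing works out to (figures taken from the
comment above; the function itself is just illustrative arithmetic):

```python
# Private-vLAN cost per the comment: EUR 30 one-time switch setup,
# plus EUR 15 per server per month.
def vlan_cost(servers, months):
    return 30 + 15 * servers * months

print(vlan_cost(servers=4, months=12))  # 750 EUR for a 4-node cluster's first year
```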

------
endeavor
Interesting write-up, but I expect you will have some problems over time with
this.

In my experience, the computer failures are relatively easy to manage. Power
will be your primary issue and, to a lesser degree, internet. When we ran a
lot of servers on UPSs, the UPSs started to become a major pain. UPSs fail
much more often than hard drives, and when a battery dies it cuts off power to
the computer. You're using low-end desktop machines, so you probably don't
have dual PSUs in each machine (if you did, you could run each PSU to a
different UPS). UPSs are also much more likely to fail when the grid power
fails. So the power goes out, your UPSs kick in, and then 30 seconds later one
or TWO fail. Can your power distribution handle that? I'm no UPS expert, but
this seemed pretty common across a variety of UPS products.
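
One mitigation is to watch the UPS continuously so a dying battery is caught
before an outage exposes it. A sketch assuming NUT-style `key: value` output
(as `upsc` prints); the thresholds and sample values are made up:

```python
# Parse NUT-style UPS status lines into a dict and flag a risky state.
def parse_ups_status(text):
    out = {}
    for line in text.splitlines():
        if ": " in line:
            key, _, val = line.partition(": ")
            out[key.strip()] = val.strip()
    return out

# Illustrative sample: on battery ("OB") and charge already below 40%.
sample = "battery.charge: 38\nbattery.runtime: 410\nups.status: OB DISCHRG"
status = parse_ups_status(sample)
on_battery = "OB" in status["ups.status"].split()
low = int(status["battery.charge"]) < 40
print(on_battery and low)  # True -> time to shut down cleanly
```

Cron-running something like this (and a periodic UPS self-test) is cheap
insurance compared to discovering a dead battery mid-outage.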

As for internet, multiple links is a good start. Load-balancing outbound
traffic is fairly straight-forward. Inbound traffic is much harder. Really you
need BGP. A BGP-based set of links isn't cheap, and doing the configuration
yourself is hard if you don't have experience with it. The alternatives that
aren't IP-routing based are usually based on DNS, which is better than
nothing. But its effectiveness depends on how your users'
servers/clients/browsers implement DNS caching. It will work OK for some
users, but some will cache the stale link IP for 24 hours -- not cool.
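
The DNS-based approach boils down to publishing only healthy addresses with a
low TTL; a sketch (the health check is a stand-in, and the IPs are from
documentation ranges):

```python
# Publish only link IPs that pass a health check; never publish an empty
# record set, or resolvers would get NXDOMAIN-like failures.
def live_addresses(addresses, is_healthy):
    healthy = [ip for ip in addresses if is_healthy(ip)]
    return healthy or list(addresses)

links = ["203.0.113.10", "198.51.100.20"]  # two upstream links
print(live_addresses(links, lambda ip: ip != "203.0.113.10"))
# -> ["198.51.100.20"]; clients that cached the dead IP still fail until TTL expires
```

The caching caveat in the comment is exactly the weak point: even with a
60-second TTL, some resolvers ignore it and pin the stale answer.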

You certainly can work this way, and probably get three nines of uptime. But I
can say now that we switched to a decent colo I sleep a lot better. We can
focus more on building product, adding value for our customers, than screwing
around with the utilities.

FWIW I'm basing this off of my experience in the United States.

------
vamega
Could Co-Location/Dedicated servers not give the same benefits with regard to
costs? The cloud was not designed for continuous predictable workloads
(although people use it for that purpose).

------
mamcx
Perhaps it would be a good idea to use something like
[http://www.amazon.com/HP-658553-001-ProLiant-Server-System/d...](http://www.amazon.com/HP-658553-001-ProLiant-Server-System/dp/B005KKJPCO/ref=sr_1_1?ie=UTF8&qid=1341589314&sr=8-1&keywords=hp+mini+server)?

~~~
miahi
It's small, it's a server, it's cheap, but it's also terribly underpowered.
Any desktop i3 CPU is three times as fast. It doesn't support more than 8 GB
of RAM or more than 4 (LFF) HDDs, so you cannot use it as a storage device or
an in-RAM database.

------
asto
How does this compare with colo? Did you compare costs and benefits?

Edit: Never mind, I just realised colo won't work for you because they'd
probably expect servers that fit correctly in their racks and your computer
cabinets probably won't.

------
PanMan
I wonder if this is more efficient. We rent hardware, including power and
connectivity, for about the same as they pay monthly. But we had no upfront
investment and no hardware to mess around with (which saves time).

------
zerop
Good example of "Jugaad" :)

~~~
Samuel_Michon
Interesting! I wasn't aware of that concept, but Wikipedia has a page on it:

<http://en.wikipedia.org/wiki/Jugaad>

I only know the word from Sikh prayers in the sacred language of Gurmukhi. For
instance, it's part of the Mul Mantra, in which it means "throughout the
ages". As I'm not well versed in Punjabi or Hindi, I didn't know it had other
meanings.

~~~
zerop
The Hindi word 'Jugaad' means finding a workaround and a less costly way of
doing things. It mostly comes in the form of sharing/saving/using a resource
wisely. E.g., in India a missed call costs the caller nothing, so it is often
used as a way to indicate something or signal some event -- for example,
asking a friend to give you a missed call when they reach home. Some pics
about Jugaad here:
[http://www.quora.com/India/What-are-your-thoughts-on-how-the...](http://www.quora.com/India/What-are-your-thoughts-on-how-the-concept-of-Jugaad-is-practiced-by-todays-youth-of-Indian-metro-cities)

