

Why isn't there a Linux distro out there that is made for huge web 2.0 infrastructures? - scumola
http://badcheese.com/?q=node/74

======
whalesalad
I think as a web professional I would want anything but this. I even tend to
dislike using apt-get for every package on my server. Some things need a
tender loving hand with careful consideration during configuration (I'm a poet
and I didn't even know it).

You can't really build a one-size-fits-all distribution for everyone. Sure you
can get a basic LAMP stack setup, but 1) is that what you really want in a PHP
situation and 2) not everyone is rolling a simple PHP website these days.

If what you're trying to solve is the problem with "scalability" then a Distro
isn't gonna help. A carefully constructed architecture and implementation is
what will. I personally tend to shy away from a big cluster of apt-get
installations in favor of installing certain things straight from source. You
can bet that all of the huge sites out there are rolling their own too.
Facebook has their own version of Apache and so do companies like Yahoo.

I think something like this might be good for simple in-house development and
testing, but again, not everyone's needs are the same. I totally love server
configuration, and oftentimes it gives me a real nice break from coding and
designing. There is nothing more fun and satisfying than getting a new server
up in just a few hours configured to the best of my abilities.

Whateva hopefully these comments make sense. Been up all night and running on
adderall and mcdonalds.

~~~
olefoo
You can build your own packages, with whatever tweaks you want built in. The
thing is once you get past a certain size, you've got too many servers to be
treating them as individuals.

And if you're past 4 machines you really want to start looking at cfengine.

------
jlouis
Here is what you do:

; You acknowledge that system administration takes time and effort. If you
have more than some 10-15 machines, you ought to automate the work of keeping
them in sync. You want to minimize effort of administration.

; You install something along the lines of puppet from
<http://reductivelabs.com> and use that to manage your server infrastructure.

; You install a monitoring suite for the infrastructure. Ganglia is popular
when the servers are forming a cluster.

With the above, any reasonable operating system will work, be it .deb/.rpm
based linux distros, *BSD, or OSX (I honestly don't know how to do this
properly in Windows).
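
To make the "keep them in sync" idea concrete, here is a minimal sketch of the declare-then-converge loop that tools in this family are built around. It is not puppet's or cfengine's actual API; the function names `plan` and `apply_plan`, the `"present"`/`"absent"` vocabulary, and the package names are all assumptions made up for illustration.

```python
from typing import Dict, List, Set, Tuple

Action = Tuple[str, str]  # (verb, package), e.g. ("install", "lighttpd")


def plan(desired: Dict[str, str], installed: Set[str]) -> List[Action]:
    """Compare declared state with actual state and list the changes needed.

    Only packages named in `desired` are touched; anything else on the
    machine is left alone.
    """
    actions: List[Action] = []
    for name in sorted(desired):
        want = desired[name]
        if want == "present" and name not in installed:
            actions.append(("install", name))
        elif want == "absent" and name in installed:
            actions.append(("remove", name))
    return actions


def apply_plan(actions: List[Action], installed: Set[str]) -> Set[str]:
    """Simulate applying the actions; a real tool would call the package manager."""
    result = set(installed)
    for verb, name in actions:
        if verb == "install":
            result.add(name)
        else:
            result.discard(name)
    return result


if __name__ == "__main__":
    desired = {"lighttpd": "present", "memcached": "present", "apache2": "absent"}
    print(plan(desired, {"apache2", "memcached"}))
```

The useful property is that running it twice is harmless: once a machine matches the declaration, `plan` returns an empty list, so the same declaration can be pushed to every box regardless of what state each one is in.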

------
kaens
I'm actually working on something like this in my free time for Ubuntu
flavors. I've been trying to figure out how to get some funding for it so I
can focus on it, but I'm not sure if it's a service I should try to charge
users for (automated customization of Ubuntu Linux flavor live/install CDs).

I figure that once I have a proof-of-concept I can show it to people who might
be willing to invest in it so I could afford to host it somewhere and so on.

Right now I'm focusing on package installation / removal, but I plan on adding
the ability to put some stuff in your home folder for first user creation, and
upload specific configuration files.

------
hugs
Instead of creating a completely new distro, I'd like to see effort put into
".deb"s or ".rpm"s for common configurations. Now _that_ would be awesome.

~~~
jrockway

        apt-get install lighttpd memcached ...

~~~
mechanical_fish
Can a distro really hope to do much more than this _apt-get_ one-liner can?

Yeah, you could install an improved collection of prewired config files for
everything so that all the tools are better integrated out of the box. But
given that each web 2.0 app probably has a unique configuration, which
requires a sysadmin to hand-edit all of those config files anyway, is it
really going to save that much time? Configuration space is kind of large. The
odds that the prewired configuration is a close match to the one you want may
be fairly low.

And if your code _doesn't_ need a custom configuration of servers, but is
designed to run on some kind of standard server-farm-in-a-box configuration
(with, at most, minor tweaks of a couple of config files), why are you even
installing your own distros? Aren't there hosting companies that run farms of
standardized boxes that your app can be designed to, and that will handle the
provision and administration of those boxes for a monthly fee? Kind of like
the Google App Engine business model? I haven't done business with such a
company, but I've been presuming they exist. Isn't this how (e.g.) Engine Yard
works?

The potential problem with the distro idea is that it sits between these two
business models (bespoke setups by your own sysadmin on the one hand, one-stop
shopping for standardized architectures, standardized server farms, and
standardized sysadmins on the other). Is there much daylight between the two?

------
shawndrost
Deprec (<http://www.deprec.org/>) takes you from a clean ubuntu install to a
live rails site, and claims to support heartbeat/linux-ha and multiple
servers, as well as some misc handy stuff like ntp. (I've only used it to
manage a simple single server.)

------
ojbyrne
Because the vast majority of web sites aren't huge. It's a limited market.

~~~
icco
"It's a limited market." Fantastic! That's the perfect reason to do something
like this. Sadly I don't think anyone will be doing it in one night, but I believe
that this is a realistic request. The whole point of Linux (and Unix before
it) is to have a group of tools, a toolbox if you will, that can be easily
refactored, reorganized and redeployed to fit the problems you face. What he
is asking for is a toolbox that comes with the tools he wants from the start,
and he can't find a Sears that carries it. The only issue I have with the
whole thing is I wish he would do it, not ask others to solve his problems for
him.

~~~
aneesh
> "The only issue I have with the whole thing is I wish he would do it, not
> ask others to solve his problems for him"

Why do you have an issue with this? Isn't the whole point of a company to
solve its customers' problems for them??

~~~
mileszs
Which company? One of the companies that already make a distribution of Linux?
I'm sorry, but I don't understand whose responsibility you believe it to be to
create this distribution.

As the author of the article alluded to, if it were to get made, it would
probably get made by hobbyists.

------
olefoo
Have you had a look at Eucalyptus?
<http://eucalyptus.cs.ucsb.edu/wiki/EucalyptusAdministratorGuide>

Of course the first thing that is going to happen is that you are going to
have a batch of arguments:

Xen vs. VMWare vs. $virtualiser

cfengine vs. bcfg2 vs.
$(cool_haskell_config_that_will_be_cool_as_soon_as_it_is_done)

dpkg vs. yum vs. rpm vs. $packagemanagerthatdoesnotsuck

Ubuntu vs. Fedora vs. $(my_favorite_distro) vs. BSD

emacs vs. vi #this one may be orthogonal

It would be nice if we had something that just routed around all of those
arguments and helped to set up a straightforward way to manage multiple images
as part of one security domain, with monitoring and rekeying baked in.

~~~
shawndrost
I don't understand: eucalyptus looks like an open-source implementation of
EC2, so isn't the end-result a bunch of blank virtual boxes? Its server
management system might make it easy to load new slices with your custom disk
image, but if you have that image, why not just put it on EC2 and save
yourself the hassle of running a utility company in your spare time?

(Loading pre-built images to ec2 seems like a good answer to the op's
question; I know I've seen some built-to-purpose images around.)

~~~
olefoo
eucalyptus is only one part of what Steve seems to be asking for.

Ideally you'd have one location where you define the services you're running
and the variables you're monitoring and you'd be able to do things like rekey
the entire cluster in one operation, set ACLs for resources that exist on
multiple locations, schedule batch jobs by priority and deadline; and do all
this while totally abstracting away CPUs, filesystems, networks and all of the
messy and bothersome failure-prone hardware.
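
As a rough sketch of what "one location" could look like, the snippet below keeps the whole cluster in a single definition and treats rekeying as one operation over it. Nothing here comes from Eucalyptus or any real tool; `Host`, `rekey`, `hosts_running`, the host names and the key ids are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Host:
    services: Tuple[str, ...]   # what this machine runs
    monitored: Tuple[str, ...]  # variables to watch on it
    key_id: str                 # identifier of the key it currently trusts


def rekey(cluster: Dict[str, Host], new_key_id: str) -> Dict[str, Host]:
    """Return a new cluster definition with every host moved to the new key."""
    return {name: replace(host, key_id=new_key_id) for name, host in cluster.items()}


def hosts_running(cluster: Dict[str, Host], service: str) -> List[str]:
    """Answer "where does this service live?" from the single definition."""
    return sorted(name for name, host in cluster.items() if service in host.services)


if __name__ == "__main__":
    cluster = {
        "web1": Host(("lighttpd",), ("load",), "key-1"),
        "cache1": Host(("memcached",), ("memory",), "key-1"),
    }
    print(hosts_running(rekey(cluster, "key-2"), "memcached"))
```

The hard part, of course, is everything this leaves out: actually pushing the new definition to real machines and coping with the ones that fail halfway.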

~~~
shawndrost
Wow! That'd be really handy. Does eucalyptus have that? If not, what part of
the problem does eucalyptus solve?

------
centuren
I don't understand exactly which version of Linux he's been deploying that
makes this a time saver. You can install Linux with 95% of what you need (and
nothing more) with plenty of distros, and one command gets you the rest of the
way.

The rest of the work is custom (user access, software specific requirements,
etc). I'm also the lead sys-admin for a dot-com company, and I think I can
safely say I've NEVER had a Linux install that wasn't, in his words, something
that I "can just deploy, configure and turn off what I don't need?"

That's the reason we use it.

------
froo
I just had either a brilliant or a dumb (take your pick) idea that has
sprouted from this thought.

Wouldn't it be cool if there was a site where you could just click the options
you wanted and it would create a customised distro for you, without all the
added extras?

I'm no linux guru and I dread the thought of having to set up a new system
under linux. If there was a site where I could just pay a couple bucks for a
customised distro for what I wanted (as it's not always the same thing) that
just worked, I would absolutely pay it.

I think this kind of approach may also help to increase the adoption rate of
linux amongst less technically inclined people too.

EDIT - Sorry about the stream of consciousness, just noticed someone supplied a
link for SUSE, is there anything similar for Debian-based systems? Ubuntu
undoubtedly is getting the "lion's share" of attention for Linux systems, so
is there anything like this for it?

~~~
SwellJoe
<http://www.rpath.com/corp/>

Their "conary" package management system (developed by at least one of the
same folks that were involved in RPM, presumably with a reasonable amount of
confidence that RPM didn't really get it right) is designed for the
construction of specific purpose distributions. It, obviously, hasn't taken
the world by storm, so they seem to have evolved in a different direction of
late, and also changed their name I'm pretty sure--though I can't remember
what they used to be called (I could only remember the name "conary"). Anyway,
technologically speaking conary is awe-inspiring. But, I'm not using it...so
obviously, it's not providing value that I can figure out a use for.

------
wheels
Well, for our web-services server, stock Ubuntu Server and a 50 line script
does the trick. I have one tarball that I scp over, and it handles setting up
accounts, installing the right packages, applying a config-files diff, setting
up rc.local and starting the web-services.

I'm planning on setting it up eventually so that it'll automatically clone
from the most recent backup of our databases.

I looked for tools to manage such a roll-out and wish some existed, but
really, it was only a few hours of work to get everything set up. VMWare is
your friend there for testing the script, fixing, rolling back to the fresh-
installed point, wash, rinse, repeat.
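
For anyone who wants a starting point, here is a minimal sketch of that kind of roll-out written as a dry-run plan. It is not the actual script described above; the user, package, patch-file and service names are placeholders, the init-script path is an assumption about the target system, and the rc.local step is left out.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_steps(
    users: Sequence[str],
    packages: Sequence[str],
    config_patch: str,
    services: Sequence[str],
) -> List[List[str]]:
    """Turn a description of the server into an ordered list of commands."""
    steps: List[List[str]] = []
    for user in users:
        steps.append(["useradd", "--create-home", user])
    if packages:
        steps.append(["apt-get", "install", "--yes", *packages])
    # Apply the config-files diff relative to the filesystem root.
    steps.append(["patch", "-p1", "-d", "/", "-i", config_patch])
    for service in services:
        # Assumes SysV-style init scripts live under /etc/init.d.
        steps.append([f"/etc/init.d/{service}", "start"])
    return steps


def run(steps: Sequence[Sequence[str]], dry_run: bool = True) -> List[str]:
    """Render each step as a shell line; execute it only when dry_run is False."""
    rendered: List[str] = []
    for argv in steps:
        rendered.append(shlex.join(argv))
        if not dry_run:
            subprocess.run(list(argv), check=True)  # stop at the first failure
    return rendered


if __name__ == "__main__":
    plan = build_steps(["deploy"], ["lighttpd", "memcached"], "config.diff", ["lighttpd"])
    for line in run(plan):
        print(line)
```

Keeping `dry_run=True` as the default fits the VMWare loop: print the plan, read it, run it for real in the VM, roll back to the snapshot, repeat.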

------
timtrueman
I think that what software you use has little to do with scaling a huge site.
The biggest obstacle is how your data is organized and accessed. This is
different for every application. If you can't get the data schema right then
what software you have doesn't matter. That's assuming you're huge. If you're
not huge then just build something and don't lose focus trying to build
something that can handle 100 million users when you have 100.

------
sadiq
Maybe something like <http://studio.suse.com/> would be useful?

It lets you create customised OpenSuSE-based images for
deployment/installation. It would be fairly easy to construct the distro
you've got in mind, you'll just get the added benefit of having everything
kept up to date for you.

------
keefe
Will every startup have the same requirements? What if I want nginx instead of
apache and so forth? You are always going to want some customization. If you
are on EC2 then you can just create an AMI, and I think there is some way to
save these distributions elsewhere too.

------
jotto
You can have this if you pay for Joyent's Accelerator (OpenSolaris with
everything installed).

------
daveyd
Great idea!

