I think this only solves the cost problem. VMware is damn expensive for what it is.

I'm not really a fan of virtualization myself. We canned it at our organisation a couple of years ago. We had 30 ESX/vSphere managed hosts across 3 data centers which hosted app servers, web front-end servers, virtual load balancers and some other minor infrastructure. The final straw was the upgrade costs. Not only that, we had problems with volume size limits on our SAN which would require more expense and hackery to work around. Also, we couldn't host our database servers on top of it due to performance problems, so we had to have dedicated machines there anyway.

The whole thing, at the end of the day, just added complexity and expense, and didn't improve reliability, security or load distribution, as our architecture was already sound on that front.

We've gone back to the original virtualization system: processes and sensible distribution of them across multiple machines. Performance, cost and sanity have improved.

I can understand that virtualization is useful when your resource requirement is less than one machine, but above that I doubt there are any real benefits. It's snake oil.




>It's snake oil.

As opposed to a miracle elixir that cures all ills?

It's a tool.

Like most tools, it's best used when you thoroughly understand the challenge and how/why to apply that tool to it.

The problems you've noted here seem more indicative of poor planning and understanding of the solution than any intrinsic deficiency of virtualization.


It's a shit tool that creates issues and costs fuckwads of cash. That is it.

We understand it. The numbers sold are not the numbers gained.

The entire infrastructure deployment was planned around virtualization, DC failover and resilience using VMware's guidelines and solutions (VMware were even paid to consult on this). It never delivered and doubled our administrative and licensing overhead.


VMware sales, from what I have heard, has a history of over-selling, over-promising and ultimately getting people to spend a crap-ton of money on their products.

Honestly, I have to say it sounds a bit like you guys got taken for a ride. I'd suggest sitting back, relaxing and taking a look at what other VM options are out there, some of which don't have license fees attached to them.

When I first deployed virtualization there were really only two 'enterprise-ready' solutions out there: essentially VMware or XenServer. Knowing about VMware and their 'history', I chose XenServer. I haven't regretted the choice, but it wouldn't be a fit for everyone.

These days there are any number of solutions out there to choose from, of which OpenStack may or may not be one for you... and which aren't going to whisper in your ear about how great their product is and how much money it will save you.


I see people being WAY oversold on the SAN more than on the software.


I echo this observation.

By now, I'd actually be more surprised not to find a VMAX or Shark in an "Enterprise" datacenter.


>It's a shit tool that creates issues and costs fuckwads of cash. That is it.

Calm down please.

>The entire infrastructure deployment was planned around virtualization, DC failover and resilience using VMware's guidelines and solutions (VMware were even paid to consult on this). It never delivered and doubled our administrative and licensing overhead.

That might say something about VMware's specific solution and you being taken for a ride, but it's nothing intrinsic to virtualization as a tool.


Exactly. Virtualization/Cloud are only good for scaling down, not scaling up. The 20-30% hit to IO, additional latency of SAN vs local storage, management overhead, and added complexity to infrastructure more than negate any potential benefits of improved load distribution when you're dealing with applications that use anywhere near the capacity of physical servers.


A 20-30% hit in IO? It hasn't been at that level for a long time. With newer KVM versions and good Intel processors the performance hit can be as low as 5% these days. That's not necessarily to say your particular workload will see that, but 10% is generally an 'at worst' level at this point.


It's definitely 20-30% for realistic workloads running VM-based tech on top, i.e. the CLR/JVM or a database engine. This is on top of VMware; I can't speak for Xen.

The outcome is pretty grim.


Exactly. I often see claims of 5-10%, but I've yet to see any reliable set of benchmarks with those results. Too often, people are using dd and testing throughput instead of actual IOPS. Even the benchmarks that show 20%+ tend to be skewed in favour of virtualization, as they tend to be run with a single VM instead of multiple VMs.

Even if there were a 0% performance penalty from virtualization, you'd still see suboptimal allocation of hardware resources just from taking an abstracted view of the hardware. Different applications have different performance profiles. You either end up with overbuilt hardware to support the virtualization environment and the different performance profiles of the different applications, or with multiple VMs for the same application on the same hardware, which is totally unnecessary overhead. Virtualization is just not meant for large scale.


Here [1] is a great paper about nested virtualization for KVM. It combines hardware capabilities with software tricks to allow running multiple levels of VMMs. It may not have intense IOPS testing, but it's got a couple of benchmarks that would be representative of real-world workloads. Keep in mind that this paper was published in 2010 and virtualization performance has risen dramatically over the last several years.

Jump to the results section. The more relevant bullets here are 'single guest' (either virtio or using direct mapping).

Highlights (or lowlights, depending on your perspective):

  kernbench - 9.5% overhead
  SPECjbb - 7.6% overhead

I don't agree with your point about suboptimal allocation of hardware resources. Virtualization does not require you to divide a machine differently than processes do (you could easily have one VM consume nearly all CPU cycles, one consume nearly all I/O capacity, etc.). IMO, the key difference is that virtualization lets you more easily establish hard, enforceable limits and concrete policies around resource usage (not to mention the ability to account for usage across all kinds of different applications and users). And it lets you do that for arbitrary applications on arbitrary operating systems, so users don't have to write to one particular framework/language/runtime/OS. That's all pretty important for large scale.

[1] http://static.usenix.org/event/osdi10/tech/slides/ben-yehuda...
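
To make the "hard, enforceable limits" point a bit more concrete, here's a minimal sketch using the libvirt Python bindings against a KVM host. The connection URI, guest name and exact tunables are assumptions (what's available depends on the hypervisor driver and version); the point is just that the cap sits outside the guest, so it holds for any OS or runtime running inside it.

    import libvirt

    conn = libvirt.open("qemu:///system")   # local KVM host
    dom = conn.lookupByName("app-vm-01")    # hypothetical guest name

    # Hard-cap CPU: each vCPU gets at most 50ms of CPU time per 100ms period
    # (values are microseconds), no matter what the processes inside the guest
    # try to do. "cpu_shares" would set a relative weight instead of a cap.
    dom.setSchedulerParameters({"vcpu_period": 100000, "vcpu_quota": 50000})

    print(dom.schedulerParameters())        # read back what's actually enforced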


What distinction does KVM or the kernel make for a single guest?

Is there a system that would allow for mapping part of an IO device (such as a block range or a LUN) to a guest when multiple guests are running with the same level of overhead?


I'm not sure I follow your question 100%, but I'm gonna take a stab...

The distinction being made here isn't between a single guest and multiple guests; it's between a single guest OS and nested guests (i.e. a VM running another VM). To expose the hardware virtualization extensions to the guest VMM, they must be emulated by the privileged domain (host). There are software tricks that allow this emulation to happen pretty efficiently (and map an arbitrary number of guest levels onto the single level provided by the actual hardware). It's not a common use case, but for a few very specific things it's very useful.
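
As an aside, on a KVM host you can check whether the kvm module is even willing to expose those extensions to guests. A tiny sketch, assuming an Intel box with the kvm_intel module loaded (AMD has an equivalent kvm_amd parameter); depending on kernel version the file reports Y/N or 1/0:

    # Nested guests are only possible if the host's kvm module advertises
    # the virtualization extensions (VMX here) to its guests.
    with open("/sys/module/kvm_intel/parameters/nested") as f:
        print("nested virtualization enabled:", f.read().strip() in ("Y", "1"))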

There are a few different ways to map I/O devices directly into domains, and some definitely allow for part of an I/O device. For example, many new network devices support SR-IOV, which effectively allows you to poke the device and create new virtual devices (which may be constrained in some way) that can be mapped directly into guests.
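
As a rough illustration of the SR-IOV case (the interface name and VF count here are made up; the sysfs knob is the standard Linux one, but whether it works depends on the NIC and driver):

    # Ask the physical NIC's driver to spawn 4 virtual functions. Each VF shows
    # up as its own PCI device and can be passed through to a guest, so the
    # guest gets near-native I/O on a slice of the card rather than the whole thing.
    with open("/sys/class/net/eth0/device/sriov_numvfs", "w") as f:
        f.write("4")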


Ah, VMware is the problem; that explains it.

Parties that care that much about fine performance margins apparently need to be using Xen or KVM or Illumos then.


Can't speak for the parent poster's company but the numbers don't match my experience with VMware many years back. It's possible they've had a sharp regression but we were maxing out gigabit ethernet and local RAID arrays in 2006.


Well, don't confuse I/O with throughput here. You can look at performance numbers for just about anything and tweak one direction or the other.

For instance, it's easy to make a benchmark showing huge throughput on any given storage solution (and many NAS providers sell on this basis), but your I/O might be terrible because to get that throughput you're maxing the CPU (etc.). Likewise, you can change your benchmark and show high I/O while the throughput is 'terrible'.
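
The distinction matters because the two numbers come from completely different access patterns. A crude sketch of the difference; the path and sizes are made up, and a real benchmark would use something like fio with O_DIRECT, realistic queue depths and several VMs at once (without O_DIRECT both figures are flattered by the page cache, which is exactly what dd-style tests miss):

    import os, random, time

    PATH = "/mnt/testfile"              # hypothetical pre-created multi-GB file
    SIZE = os.path.getsize(PATH)
    fd = os.open(PATH, os.O_RDONLY)

    # "IOPS" view: small random reads, count completed operations per second.
    ops, start = 0, time.time()
    while time.time() - start < 10:
        os.pread(fd, 4096, random.randrange(SIZE // 4096) * 4096)
        ops += 1
    print("random 4KiB reads/sec:", ops / 10)

    # "Throughput" view: large sequential reads, count bytes per second.
    done, start = 0, time.time()
    while time.time() - start < 10:
        done += len(os.pread(fd, 1 << 20, done % SIZE))
    print("sequential MB/sec:", done / 10 / 1e6)

    os.close(fd)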


The parent very clearly specified I/O, which is what I was commenting on.


Virtualization that cares about IOPS needs SSDs. Hard drives can't be sanely virtualized.


> The 20-30% hit to IO

If you're seeing that, something is configured wrong. VMware was at 95+% of native disk and gigabit ethernet performance 5 years ago.


Unless you are doing really heavy IO or CPU across multiple VMs on the same host, which is likely if you have any load worth mentioning. There is a 20% to 30% difference between running one process per VM and running 8 processes on bare metal on an 8-core machine. We benchmarked it on known-good configs optimised to bits.

Either the hypervisor scheduler is shit or the abstraction is costly. I reckon it's down to the reduction in CPU cache availability and the IO mux.

This was HP kit end to end, VMware certified, Debian stable on and off ESX 4.


Honest question here. First: what year exactly was this deployment put in place? At least for Intel, CPU performance for VMs has increased SUBSTANTIALLY even in the last couple of years. I remember taking some VM hosts from some of their first (or maybe second; can't remember for sure) gen processors to the 5500 series. It was like night and day. It is even more so for the E5 series.

Second: why did you implement a large-scale VMware install without either having a testing period before sinking contract dollars and license costs into it, or at least having contract terms to opt out if their claims didn't match reality?


2008. TBH I'm not sure what CPUs we had in there; the kit has been scrapped now. I wouldn't bother going through it again. We now have a standard half and full rack which we can purchase and deploy quickly for an installation, so there are no plans to piss any more time and cost up the wall.

We did have a testing period, which was unfortunately run by a muppet who decided the loss was acceptable. The muppet no longer works for the organisation.


> Unless you are doing really heavy IO or CPU across multiple VMs on the same host, which is likely if you have any load worth mentioning. There is a 20% to 30% difference between running one process per VM and running 8 processes on bare metal on an 8-core machine. We benchmarked it on known-good configs optimised to bits.

> Either the hypervisor scheduler is shit or the abstraction is costly. I reckon it's down to the reduction in CPU cache availability and the IO mux.

We were running heavy loads and there was nothing like a 20-30% hit. I'm not saying you didn't see one, but this isn't magic or a black box: we had a few spots where we needed to tune (e.g. configuring our virtual networking to distribute traffic across all of the host NICs), but it performed very similarly to the bare-metal benchmarks in equivalent configs.

What precisely was slower, anyway - network, local disk, SAN? Each of those has significant potential confounds for benchmarking.


> I can understand that virtualization is useful when your resource requirement is less than one machine, but above that I doubt there are any real benefits. It's snake oil.

Tell that to Netflix - or any of AWS' customers, really.


AWS is a prime example. EC2 is unreliable, doesn't perform well and is expensive (and fairly weather-dependent by the looks of it, too). If you want anything that can actually shift anything, dedicated servers are cheaper (but you might actually have to commit - oh dear, if your margin is that tight, you don't have a product worth it).

The only use case I can really see for it is rapid scaling, but that seems only viable for content delivery, as your data back-end architecture is way harder to scale than clicking the magic cloud button. Back in the old days we used to do content via a CDN (Akamai etc.), which actually works out cheaper per GiB.

Then we come to things like S3, which is terribly unreliable (some of our clients use it for storage as it's cheaper than a SAN, but they suffer for that): dropouts, unreliable network requests, latency, rubbish throughput and a buggy-as-fuck SDK.

It's like consuming value-line products from a supermarket. Sure there's quantity, but it's lacking in quality.


If you set up your stack wrong then some of your points are valid. But many Amazon services offer you redundant infrastructure at a reasonable cost.

But the key advantage over a traditional hosted solution is that you can run reproducible test and dev stacks for 8-12 hours a day without having to pay for hardware that sits there doing nothing while you sleep. Sure, cloud computing has its faults, but the ability to spin up an HA test stack in 3-4 datacenters in minutes, and pay for just the time you are testing it, enables people to move forward and develop more and more impressive technology.

OTOH, I'm not sure why you would run a cloud on your own hardware, as you're already paying for the HW. I suppose it simplifies the management significantly for PayPal, or they wouldn't be doing it.


Take a deep breath, relax, and realise that plenty of people have use cases that are different to yours.


If you have a way to build a disk image containing your application from scratch, virtualization buys you easily rebuilt application stacks and extremely clean systems. That is also a very good first step towards gaining the ability to scale out and up (i.e. run your app in 20 places around the world).


Volume size limits on your SAN... Is 16TB too restrictive for your volume size limit?


2TiB per LUN on ESX until recently actually. We have a single filesystem that is 19TiB.


So... 19TiB and you're only using iSCSI instead of NFS and you're complaining about volume size restrictions?


We have to use iSCSI. Not all our kit talks NFS, plus we use SAN block replication between DCs, which really doesn't play nice with NFS (we did test it and discovered that not all NFS implementations are good).

This whole thing has to play nice with a windows DFS cluster as well.


You said you're using ESXi? The Windows virtual servers won't care whether they are sitting on a LUN or NFS. Why do you have to use iSCSI? Or SAN block replication for that matter? If you think you need to use iSCSI, why don't you use it for some of your storage and NFS for parts that require storage beyond 2TB?


What was your SAN?


Virtualization is a requirement for any kind of serious dynamic infrastructure.


Processes are a form of virtualisation, are they not?

That's enough abstraction.


Originally, yes. Unfortunately, Unix/Windows has accumulated a ton of global state (libfoo 3.7.1 not parallel-installable with libfoo 3.6.4), and enterprise software is now so fragile that any alternative to VM sprawl is unthinkable.


We run our software (and other "enterprise" software) with multiple instances per node and they are fully isolated. Most of it runs in JVMs, but we have C++ stuff with no problems too.


Virtualization is a form of process, so yes.


Except that Google, Facebook, etc. successfully use bare-metal deployments without a VM in sight...



