
> We couldn't even run our basic Logstash offerings on our OpenStack cluster without them having bizarre performance issues.

I'd really be interested in post-mortems. As long as you're not using SDN/overlay networks/weird plugins for Cinder instead of plain NFS, many components of OpenStack are nothing more than a config generator and deployer for core Linux iptables/bridges/KVM virtualization.

> It could never run anything reliably and they scrapped the entire project and re-purposed all the servers for DC/OS, which was super nice and reliable and every team migrated hundreds of services onto.

We're moving our stuff away from DC/OS as we're sick of the instabilities and especially the UI and configuration changing every release. It's been two and a half years of banana-ware for us.

What version of DC/OS were you using? There have been a lot of improvements recently. (I work at Mesosphere)

We started with 1.8 and finally upgraded to 1.12 at the end of last year.

Our biggest pain point, next to runaway deployments filling disks with useless logs (which once left us with a totally corrupted master after a weekend), was and is the "official" Jenkins package. It is the ultimate PITA to upgrade, it lags massively behind despite security issues (current: 2.150.3, mesosphere/jenkins: 2.150.1!), and you can't even run Jenkins outside of DC/OS because it needs the Marathon shared library to work.

Another thing that we dearly missed was the ability to "drain" a node: for example, if I want to perform maintenance on a node but cannot shut it down right now because a service on it is in use, I'd like to at least prevent new jobs from being spawned on that node. System upgrades were painful too: stopping the resolvconf generator does not restore the original resolv.conf, which leaves DNS broken. And when NTP servers were specified by name, they could not be resolved at boot time (as the resolv.conf still referred to the weird DC/OS round-robin DNS), so DC/OS refused to start because the clock was out of sync...

What are you using instead of DC/OS?

k8s - first in our OpenStack environment (once I figure out how to get external connectivity) and then once all the deployment jobs written for DC/OS are migrated to k8s, the DC/OS nodes will be reprovisioned with k8s on bare metal.

No cloud for that project, contractual prohibition - everything must be kept in-house.
