Anyway, long story short, most of these people don't really understand why they need all this rocket science to manage < 500 internal users. One of the new buzzwords I'm hearing these days mostly relates to big data and machine learning. One of my managers came to me and asked why we don't integrate our product with Hadoop: it will solve the performance problems since it can handle a lot of data.
I am frustrated by the industry as a whole. I feel the industry is simply following marketing trends. Imagine the number of man-hours put into investigating technologies, and the projects dropped midway upon realizing the technology stack is still immature or not suitable at all.
People want their apps to be made with Visual Studio (BTW, FoxPro was part of the package).
So they ask: "What is the app made in?"
"In Visual, Sir."
Done. End of story (most of the time anyway; obviously sometimes people are more dangerous and press the point ;) ).
The point is not to focus on the exact word but on what people believe the word will give them.
So, for example, "Big Data". What it means to us matters zero. What it means to some customer is that he has a largish Excel file that, with his current methods and tools, takes too long to produce results.
So. Do you use "Big Data Tools"?
And what about using Hadoop?
"We use the parts of big data tech necessary to solve this, and whether we need Hadoop or other similar tools that fit better with your industry and use the same principles will depend on our evaluation. Don't worry, we know this."
Or something like that ;). Knowing the worry behind people's words has helped me a lot, even with people with WORSE tech skills (damn, I have built apps for nearly illiterate people with big pockets whose only tech reference was their cellphones!)
And the anecdote about the largish Excel file that was too big and took too long? Yep, true. And it was for one of the largest companies in my country ;)
That's called practicing conservatively, minimizing chances of bad outcomes. It's a matter of astute clinical judgement to glean optimum risk/benefit ratio in a particular case. Since no two cases are ever exactly the same, good judgement is a constant necessity.
I see that the process of developing software has many parallels, and it's not surprising that everyone experiences so much brokenness. When people complain to me about some mysterious program misbehavior (stuff I had nothing to do with), I empathize with them and try to help them think logically about the problem they're having.
Only rarely can I offer any real insight, but given the insane proliferation of the alphabet soup of identifiers attached to all the "new things" out there, no one I know in the industry feels they have a handle on what's happening.
Seems like the pace of "innovations" will lead to even greater levels of incomplete and dysfunctional systems and, sooner or later, to truly catastrophic failures.
I am very skeptical of people who are "BizDev" or "Project Managers" or "Managers" or "Scrum Masters"; they generally don't know what they're talking about and rely on buzzwords.
For example, if a DBA and a JS developer say "We need to use a scalable database", they probably don't have the same thing in mind about what "scalable" or "database" exactly means; however, both are concerned with serving data performantly.
So, if a naive web developer wants "a scalable document store!", you can just give them Postgres and presto! ::troll:: ;)
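The troll has a kernel of truth: a relational database can serve as a document store. A minimal sketch of the idea, using SQLite's built-in JSON functions as a stand-in for Postgres's jsonb (the table and field names are made up for illustration):

```python
import json
import sqlite3

# Store JSON documents in an ordinary relational table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
con.execute("INSERT INTO docs (body) VALUES (?)",
            (json.dumps({"user": "alice", "plan": "pro"}),))
con.execute("INSERT INTO docs (body) VALUES (?)",
            (json.dumps({"user": "bob", "plan": "free"}),))

# Query inside the documents with plain SQL; no separate document DB needed.
row = con.execute(
    "SELECT json_extract(body, '$.user') FROM docs "
    "WHERE json_extract(body, '$.plan') = 'pro'").fetchone()
print(row[0])  # alice
```

Postgres goes further (jsonb indexing, containment operators), but the point stands: "document store" is often satisfied by a column and two functions.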
Also, Project Manager and Scrum Master are just positions that describe roles and responsibilities in an organization / on a team. The people filling those roles needn't be clueless.
The positions I mentioned above are usually the ones that people who failed to pick up any valuable skill seem to resort to.
If they don't accept your answer and ask a follow-up, then they're probably a person worth actually having a conversation about the pros and cons with.
My work lands me in a number of different conferences in non-software industries. This is true for all industries. It's just that ours has a faster revolving door. That, in addition to a low barrier to entry (anyone can claim they're a web developer), leads to a higher degree of this madness. It's just part of human behavior to seek out, and parrot, social signals that let others know you, too, are an insider.
Personally, I have to avoid a great number of those gatherings, since the lot of them are just a circlejerk of low-density information. If I pay too much attention to those events, I catch myself looking down my nose, and since that isn't productive/healthy behavior, I avoid landing myself in a place where guys with Buddy Holly glasses and <obscure-craft-beer> argue about which Wordpress plugin is the best.
My remark was to highlight that buzzwords are often used for "me-too" ankle-deep conversations/articles. Whether someone calls it Devops, or Systems Engineering, makes no difference to me. However, I favor pragmatic conversations about the topic, rather than buzzword bingo.
Examples include: "MongoDB sucks.", "Everyone should use Docker", and "What? You mean you're not using Kubernetes for your CRUD app?"
Basically, blanket statements that accomplish nothing more than to send social signals.
So it's healthy to embrace it as counter-balance to the constant hype.
(Besides, whether something is "real" or "solid" I think can mostly be answered in hindsight -- when it's mature enough and tested enough. In which case calling only things in the past solid is prudent).
Unfortunately, I have to agree as a developer. My job is to make a fast, reliable, stable product, but at the same time I'm questioned about the tools I use by people who don't have any knowledge but have heard the latest trend.
But sometimes it's also very easy to please people.
Big data: just insert 10M records into a database and suddenly everyone is happy because they now have big data :|
Since when is 10M records considered big data?
My go-to gauge for big data is that it can't fit in memory on a single machine. And since that means multiple TB these days, most people don't really have big data.
Heck, you can even rent ~2TB for $14/hour! https://aws.amazon.com/ec2/instance-types/x1/
Almost all of that is overall poor architecture, and most companies don't hire particularly good developers or DBAs (and most web developers aren't actually very good at manipulating data, relational or not), but it's the state of the union. That's "enterprise IT". That's why consultancies make billions fighting fires and fixing things that shouldn't be problems in the first place.
A Lucene index can be much larger than your current RAM. It can be 100x that. The data will still be queryable. Lucene reads into memory the data it needs in order to produce a sane result. Lucene is pretty close to being the industry standard for information retrieval.
My definition is instead "when your data is not queryable using standard measures".
I unsubscribed from that (non-tech) podcast.
As a grumpy SA, I see way too many people push for new tools because they "seem cool", instead of asking "Do they solve a problem we have?"
Why should you need to fight for things we consider industry standard, though? An example I can think of: dependency injection. Ideally you can test your software better and release more reliable builds. Believe it or not, I do come across companies that still are not aware of these concepts. Introducing it would be possible without breaking anything, because you can continue instantiating services the old-fashioned way.
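A minimal sketch of how dependency injection can be introduced without breaking existing callers (all class names here are hypothetical): the default argument keeps the old-fashioned instantiation working, while tests inject a fake.

```python
class SmtpMailer:
    """The 'real' collaborator; talking to it in a unit test is undesirable."""
    def send(self, to, msg):
        raise RuntimeError("would talk to a real SMTP server")

class SignupService:
    def __init__(self, mailer=None):
        # Existing code can keep calling SignupService() exactly as before.
        self.mailer = mailer or SmtpMailer()

    def register(self, email):
        self.mailer.send(email, "welcome!")
        return True

# Tests inject a fake, making the service testable in isolation.
class FakeMailer:
    def __init__(self):
        self.sent = []
    def send(self, to, msg):
        self.sent.append((to, msg))

fake = FakeMailer()
svc = SignupService(mailer=fake)
assert svc.register("a@example.com")
print(fake.sent)  # [('a@example.com', 'welcome!')]
```

This is the incremental path: no framework, no container, just a constructor parameter, which is why it can be adopted one class at a time.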
With newish stuff that's still changing, if it won't impact production (i.e., tooling) I'm up for adopting it earlier than usual.
I've been around government contracting, and when you see problems come up a lot that we have industry-standard solutions to, it's hard not to feel frustrated. I get where you're coming from though, just sharing my experience :)
More programmers need to embrace the suck.
I'd argue the opposite. Instead of spending time reflecting on how cool and useful their code is, or hardening it up, devs spend too much time reinventing the wheel. All this work to learn the next new fad is killing productivity.
> devs spend too much time reinventing the wheel
I'd argue the opposite. They spend too much time not reinventing the wheel. They strap factory made bicycle wheels onto a car and are surprised when the wheels break. They could benefit from spending more time trying to make a better wheel.
Do you have any suggestions for which 'better wheels' people should be looking at?
I generally like reading on anything Lisp-related, as this family of languages is still pretty much direct heritage of the golden ages.
The stuff done by Alan Kay, et al. over at PARC is also quite insightful.
Sometimes it can make you more productive. Or, even though your site is still responding to current customer demands in a timely fashion, you know that the mobile experience could be significantly improved now that browsing via cell phone is on the rise.
Another thing to consider is employability both from a company and individual perspective. If you can keep up with moderately current (not the latest and greatest) trends, you'll attract people who want to grow in their careers. I wouldn't want to work on C# 2.0 using Visual Source Safe. It's hard to convince a company that you can learn git on the job.
In general I like to move without introducing breaking changes. I'm not a cowboy coder, it's really exhausting working with one. I do think there's merit in realizing when it's time to change though.
Row count is a bad measure of "big" when it comes to data. A measurement in bytes (and probably more specifically bytes per field and how many fields the records have) gives a better indication of how the data will be written and potentially searched.
10 million rows of 5 integer values is a pittance for any relational database worth using in production. 10 million rows of 250 text columns would be horrendous for a relational database.
But many times this happens because of wasted or bloated indexes that aren't useful, or because data types are picked incorrectly.
For example, I once worked on a database where the original developer used DECIMAL(23, 0) as a primary key. This was on MySQL, and that ended up taking 11 bytes per row, versus a BIGINT, which would have been just 8. In one table, maybe not so bad, but when you start putting those primary keys into foreign key relationships... we ended up with a 1-billion-row table in MySQL that had 4 of these columns in it. That might make it "big data" by that definition, but it's also just bad design.
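A back-of-the-envelope check of that key-sizing story (using the row and column counts from the anecdote, and ignoring index overhead): DECIMAL(23,0) costs 11 bytes per value in MySQL, BIGINT costs 8.

```python
# Per-value storage cost in bytes.
DECIMAL_23_0_BYTES = 11
BIGINT_BYTES = 8

rows = 1_000_000_000   # the 1-billion-row table from the anecdote
key_columns = 4        # four such key columns per row

# Extra bytes spent purely on the oversized key type.
wasted = rows * key_columns * (DECIMAL_23_0_BYTES - BIGINT_BYTES)
print(f"{wasted / 10**9:.0f} GB wasted")  # 12 GB wasted
```

And that 12 GB is just the base table; each secondary index carrying those keys multiplies the waste again.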
Another example in that same database was using TEXT fields in MySQL for storing JSON. Since large TEXT fields in MySQL are stored off-page, separately from the row data, every table that had one (and we had several tables that housed multiple) ran into large IO and disk-access issues.
"Big data" is probably a bad term to use these days, because of how easy it is to accidentally create a large volume of data without actually needing a big-data solution: it's not the business that needs one, it's the poorly implemented system that does.
But the real reason we talk about fitting in memory comes from the core of the issue: IO. Even a dataset that fits in memory could end up being slow if it's Postgres with a single-threaded reader scanning a 500 GB index. AWS offers up to 60 GB/s of memory bandwidth, and we'd need all of it for this index, since even then it would take almost 10 seconds just to warm up the index in the first place.
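Sanity-checking the numbers above: even at the full quoted memory bandwidth, a single pass over a 500 GB index takes a while.

```python
index_gb = 500            # the hypothetical index size from the comment above
bandwidth_gb_per_s = 60   # the quoted AWS memory bandwidth

# Best-case time for one sequential pass over the index, assuming the
# single reader could saturate the full bandwidth (it usually can't).
seconds = index_gb / bandwidth_gb_per_s
print(f"{seconds:.1f} s")  # 8.3 s
```

A real single-threaded scan would be slower still, since one core can't drive anywhere near 60 GB/s on its own.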
Bwuh? Over in MS SQL you just go for an NVARCHAR and forget about it. What is the right way to store this data (if you really do need to store the JSON rather than just serializing it again when you get it out of the DB)?
It stores text fields as blobs.
I suppose now the right way would be the JSON data type. It didn't exist when I was working with these servers, though (or they were on a much older version of MySQL).
"SQL doesn't scale." "It needs to be in Mongo or whatever NoSQL database is in right now." I have heard all sorts of nonsense regarding "big data" in the last few years.
Take your .war file and drop it onto JBoss. It deploys across the cluster with zero downtime, isolates configuration, and provides consistent log structure, cert management, and deployment. You can deploy dozens of small .war files to the same server and they can talk to each other. Load balance across the cluster automatically based on actual load. Run scheduled jobs and load balance the scheduled jobs themselves. Allow them to be isolated and unique within the cluster.
I may not like Java as a language, but from an infrastructure standpoint Java was basically Heroku long before Heroku was Heroku. The infrastructure is just...solid. The downside was that the XML config stuff was just messy.
I have come to the point where I only look at other languages once in a while and it serves me well.
A few years ago when I was still in farming we had the ostrich craze: ostriches were crazy profitable (or so the ostrich sellers said) and every farm needed to consider it.
Eggs were $300 apiece, etc., etc.
Of course, the first to get one made great money by selling eggs, chicks, and consulting hours to all the rest.
The rest were not so lucky, and today I don't know of a single ostrich farm.
Same goes for latest tech: if you want to you can try to be first and make a good living on the hype stream.
As is, believe it or not, Java EE.
I mean, it's great to have all this new tech, but when you're trying to build something to last for years, sometimes it's hard to filter the crap from all the buzzwords. It just reinforces the thought that smart people should just leave this field entirely, or search for other fields of knowledge (or business) where our knowledge of programming can be put to use.
I'm 35 now, but I'm starting to realize that I will not have the patience to keep up with all the crap just to be employable. There are some areas where being old and experienced is valuable; philosophy, science, psychology, teaching, etc. are maybe some of them, but this industry is definitely not one of those areas. It makes me think that what I'm building now will some day be completely wiped out of existence.
"All my work will be obsolete by 2005"
If you aren't willing to accept that obsolescence is part of life, then you are either building something you aren't passionate about or confused about the cruelty of time.
The quote has a bounded context. And in that context, it seems generally valid and applicable.
I'm basically saying that the high churn we have now does not give you enough time to build significant experience that you can use later in life, and, as such, it's the opposite of a good investment in the future. It is almost as if we are living only for our present status, forgetting that in the future we will have less patience and energy to "re-learn" almost the same things.
If you look at what technology was popular 10-15 years ago then that's what will be in use in Enterprises now. Java web services is currently the big thing at my company.
All the late-90's business apps which were in Visual Basic, Oracle Forms, and Access are being rewritten as Java web services at the moment by an army of contractors. In another 10-15 years they will be rewritten again in the language du jour of today, probably Go. It's an endless cycle.
"We could store gigabytes of data on the clients without having to pay for servers"
Yep, same experience here with both "Big Data" and the ML space. The decision makers need to see the sheer amount of Java, Scala and/or Python code you need to actually implement to do anything useful.
Unlike the natives, however, who simply wasted some time building extraneous fake runways, in the Valley people are royally screwing up their own core architecture.
I'm old enough to find this more humorous than frustrating.
The Valley is ripe for disruption. ;)
So far I've seen micro services repeat this trend almost exactly.
Yanking out major chunks of independent functionality into separately deployable services makes sense at a large enough scale and for large enough, independent enough components. But you would only do so out of necessity, not as an initial architecture.
And yet here we are.
Fashion signals, well, virtually everything about social interactions. A tremendously complex world. Including, for that matter, whether or not you care about fashion trends, and quite possibly, why you might or might not (you're not in the game, you've quit the game, you're so fabulously successful you don't need to play the game, you couldn't play the game if you wanted to, ...)
In IT, TLAs, ETLAs, buzzwords, slogans, brands, companies, tool names, etc., all speak to what you know, or very often, don't know. It's not possible to transmit deep understanding instantaneously, so we're left with other means of trying to impart significance.
Crucially, the fact that clothing and IT fashion are so superficial (of necessity) means they can be gamed, and that those who are good at following just the surface messages can dive in. Some quite effectively. But they're not communicating the originally intended meaning.
Have you looked at React Native at all?
Big data and machine learning are also hot buzzwords. But they are clearly modern engineering. Consultants exist to explain the best way to achieve modern best practices to people without the appropriate background. If someone asks "Why no Hadoopz plx?", either explain the other technology used instead (maybe Spark, or Storm?) or explain that the scale is small enough for Access to handle. That's a consultant's job.
'twas ever thus.
Computer science is not a real field.
But I think Alan Kay has been "exposed" to computer science, and I follow his logic, based on my limited scope of knowledge.
Most of the times that I bring up the concept of virtue to my peers in age they seem either confused with the concept or contemptuous of it. They behave like virtue is a purely religious thing, yet caution in the face of possible danger is a very basic survival skill.
That seems excessive.
A hundred times yes. We tried to split our monolithic Rails app into microservices built in Go. Two years and many fires later, we decided to abandon the project. It was mostly because the monitoring and alerting were now split into many different pieces. Also, the team spent too much time debating standards, etc. I think microservices can be valuable, but we definitely didn't do it right, and I think a lot of companies get it wrong. Any positive experiences with microservices here?
A small team starting a new project should not waste a single second considering microservices unless there's something that is so completely obviously decoupled that not splitting it into a microservice will lead to extra work. It's also way easier to split into microservices after the fact than when you're developing a new app and you don't have a clue what it will look like or what the overall structure of the app will be in a year (the most common case for startups).
A thousand times yes. Distributed systems are hard.
> Debugging is more difficult since you now can no longer step through your program in a debugger but rather have an opaque network request that you can't step into.
Yes. Folks underestimate how difficult this can be.
In theory it should be possible to have tooling to fix this, but I've not seen it in practice.
> You can no longer use editor/IDE features like go to definition.
Not a problem with a good editor.
> Version control becomes harder if the different services are in different repositories.
No organisation should have more than one regular-use repo (special-use repos, of course, are special). Multiple repos are a smell.
I would modify this slightly. Larger organizations with independent teams may want to run on per-team repos. Conway's law is an observation about code structure but it sometimes also makes good practice for code organization. And of course, sometimes the smell is "this company is organized pathologically".
Another problem is that large monolithic repositories can be difficult to manage with currently available software. Git is no panacea and Perforce isn't either.
Flat out wrong for any organization with multiple products. Which, let's be honest, is most of them.
My personal take on it, at this point, is that much of our knowledge of how to manage projects (things like individual project repos, semantic versioning, et cetera) is centered on the open-source world of a million mostly-independent programmers. Things change when you work in larger organizations with multiple projects. You even start to revisit basic ideas like semantic versioning in favor of other techniques like using CI across your entire codebase.
Monorepos come with their own challenges. For example, if any of your code is open source (which means it must be hosted separately, e.g. on Github), you have to sync the open-source version with your private monorepo version.
Monorepos are large. Having to pull and rebase against unrelated changes on every sync puts an onerous burden on devs. When you're remote and on the road, bandwidth constraints can block your ability to even pull.
And if you're going to do it like Google, you'll vendor everything -- absolutely everything (Go packages, Java libraries, NPM modules, C++ libraries) -- which requires a whole tool chain to be built to handle syncing with upstream, as well as a rigid workflow to prevent your private, vendored fork from drifting away from upstream.
There are benefits to both approaches. There is no "one right way".
I love Git, and I used submodules for years in personal projects. It started with a few support libraries shared between projects, or common scripts for deployment, but it quickly ballooned into a mess. I'm in the process of moving related personal projects to a monolithic repository, and in the process I'm giving up the ability to tag versions of individual projects or provide simple GitHub links to share my code.
Based on these experiences, I honestly think that the only major problem with monolithic repositories is that the software isn't good at handling it, and this problem could be solved with better software. If the problem is solved at some point in the future, I don't think the answer will look much like any of the existing VCSs.
Based on experiences in industry, my observation is that the choice of monolithic repository versus separate repository is highly specific to the organization.
Mind elaborating on this?
What editor are you thinking of that can jump from HTTP client API calls to the corresponding handler on the server?
Totally agree with everything else, but gotta completely disagree on this last point. Monorepos are a huge smell. If there's multiple parts of a repo that are deployed independently, they should be isolated from each other.
Why? Because you're fighting human nature, otherwise. It's totally reasonable to think that once you excise some code from a repo that it's no longer there, but when you have multiple projects all in one repo, different services will be on different versions of that repo, and your change may have changed semantics enough that interaction bugs across systems may occur.
You may think that you caught all of the services using the code you refactored in that shared library, but perhaps an intermediate dependency switched from using that shared library to not using it, and the service using that intermediate library hasn't been upgraded, yet?
When separately-deployable components are in separate repositories, and libraries are actual versioned libraries in separate repositories these relationships are explicit instead of implicit. Explicit can be `grep`ed, implicit cannot, so with the multi-repo approach you can write tools to verify that all services currently in production are no longer using an older, insecure shared library, or find out exactly which services are talking to which services by the IDLs they list as dependencies.
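A sketch of the kind of tooling that explicit, versioned dependencies allow (the service names, library name, and manifest shape here are all hypothetical): scan each service's dependency manifest for a known-insecure library version.

```python
# Stand-ins for one parsed dependency manifest per service repo.
manifests = {
    "checkout-service":  {"libcrypto-wrapper": "1.2.0", "httpkit": "3.1.0"},
    "inventory-service": {"libcrypto-wrapper": "1.4.1"},
    "email-service":     {"httpkit": "2.9.0"},
}

# A hypothetical security advisory: this exact library version is insecure.
INSECURE = ("libcrypto-wrapper", "1.2.0")

def affected_services(manifests, insecure):
    """Return the services still pinned to the insecure version."""
    name, version = insecure
    return sorted(svc for svc, deps in manifests.items()
                  if deps.get(name) == version)

print(affected_services(manifests, INSECURE))  # ['checkout-service']
```

Because every dependency edge is written down, the audit is a dictionary lookup; with implicit in-repo sharing you would instead be grepping source for import statements and hoping.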
While with the monorepo approach you can get "fun" things like service A inspecting the source code of service B to determine if cache should be rebuilt (because who would forget to deploy service A and service B at the same time, anyways...), as an example I have personally experienced.
My personal belief is that the monorepo approach was a solution back when DVCSs were all terrible and most people were still on centralized VCSs like Subversion that couldn't deal with branches and cross-repo dependencies well, and that's just what you had to do, while Git and Mercurial, along with the nice language-level package managers, make this a non-issue.
Finally, there's an institutional bias to not rock the boat (which I totally agree with) and change things that are already working fine, along with a "nobody got fired buying IBM" kind of thing with Google and Facebook being two prominent companies using monorepos (which they can get away with by having over a thousand engineers each to manage the infrastructure and build/rebuild their own VCSs to deal with the problems inherent to monorepos that most companies don't have the resources and/or skills to replicate).
EDIT: Oh, I forgot, I'm not advocating a service-oriented architecture as the only way to do things, I'm just advocating that whatever your architecture, you should isolate the deployables from each other and make all dependencies between them explicit, so you can more easily write tooling to automatically catch bad deploy states, and more easily train new hires on what talks to/uses what, since it's explicitly (and required to be) documented.
If that still means a monorepo for your company's single service and a couple of tiny repos for small libraries you open source, that's fine. If it means 1000 repos for each microservice you deploy multiple times a day, that's also fine (good luck!).
Most likely it means something like 3-10 repos for most companies, which seems like the right range for Miller's Law (https://en.wikipedia.org/wiki/The_Magical_Number_Seven,_Plus...) and therefore good for organizing code for human consumption.
But having multiple repos doesn't prevent the equivalent situation from happening (and, I think, actually makes it much likelier): no matter what, you have to have the right processes in place to catch that sort of issue.
> You may think that you caught all of the services using the code you refactored in that shared library, but perhaps an intermediate dependency switched from using that shared library to not using it, and the service using that intermediate library hasn't been upgraded, yet?
That's the sort of problem which happens with multiple repos, but not (as often) with a single repo.
> Explicit can be `grep`ed, implicit cannot, so with the multi-repo approach you can write tools to verify that all services currently in production are no longer using an older, insecure shared library, or find out exactly which services are talking to which services by the IDLs they list as dependencies.
A monorepo is explicit, too, even more explicit than multiple repos: WYSIWYG. And you can always see if your services are using the same API by compiling them (with a statically-typed language, anyway).
The beautiful thing about a monorepo is that it forces one to confront incompatibilities when they happen, not at some unknown point down the road, when no one knows what changed and why.
If you expect to need to step into a function call when debugging, then it's too tightly coupled to spin out. You should be able to look at the arguments to the call and the response and determine if it's correct (and if not, now you have isolated a test case to take to the other service and continue debugging there).
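A sketch of that debugging style (every name below is hypothetical): instead of stepping into the remote service, capture the request and response at the boundary and turn a suspicious pair into an isolated, portable test case.

```python
# A request/response pair captured at the service boundary, e.g. from logs.
recorded_call = {
    "request":  {"op": "price_quote", "sku": "A-100", "qty": 3},
    "response": {"total_cents": 2970},
}

def looks_correct(call, unit_price_cents=990):
    """The contract we expect the remote pricing service to honor."""
    expected = call["request"]["qty"] * unit_price_cents
    return call["response"]["total_cents"] == expected

# If this check fails, the bug is on the other side of the boundary, and
# this exact request is a ready-made test case for the owning team.
print(looks_correct(recorded_call))  # True
```

No debugger needs to cross the network; the boundary itself becomes the test fixture.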
If the interface will change so often that you expect it will be a problem that it's in a separate repository, if you expect that you will always need to deploy in tandem, then it's too tightly coupled to spin out.
The advantage of micro services is the separation in fact of things that are separate in logic. The complexity of systems grows super-linearly, so it's easier to reason about and test several smaller systems with clear (narrow) interfaces between them than one big. It's easier to isolate faults. It's harder to accidentally introduce bugs in a different part of the system when the system doesn't have a different part. If done right, scaling can be made easier. But these are hard architectural questions, there's no clear-cut rule for when you should spin off a new service and when you should keep things together.
Someone else mentioned separating the shopping app from the payment system for an ecommerce business, which even has security benefits. I think that's an excellent example.
As for advantages, microservices tend to keep code relatively simple and free from complex inheritance schemes. There's rarely a massive tangled-up engine full of special cases in the mix, as there often is in monolithic apps. This substantially decreases technical debt and learning curve, and can make it simple to understand the function an isolated microservice performs.
There is the obvious advantage that if you have disparate applications executing nearly-identical logic to read or write data to the same location, and the application platforms can't execute the same library code, you can centralize that logic into an HTTP API, which reduces maintenance burden and prevents potentially major bugs.
My opinion is that adopting microservices as a paradigm leads to a slow, difficult-to-debug application, primarily because people take the "micro" in microservices too seriously. One shouldn't be afraid to split functionality out into an ordinary service after it's been shown to be reasonable to do so.
With microservices, the production version of their service would conceivably be stable. It moves the contract from the repo to the state of production services.
With a monolithic repo done right, the other teams broke their build of their branch, and it's up to them to resolve it. You, meanwhile, are perfectly happy working on your branch. When their changes are mergeable into trunk, then they may merge them, not before — and likewise for you.
With multiple repos, they break your build, but don't know it. You don't know it either, until you update your copies of their repos — and now you have to figure out what they did, and why, and how to update your logic to handle their new control flow, and then you update again and get to do it again, until finally you ragequit and go live in a log cabin with neither electricity nor running water.
I don't see how this is a problem if you are pushing frequently and have a CI system. You know within minutes if the build is broken. If it broke, don't pull the project with the breaking changes.
My point is, I don't think one approach is inherently better than the other. Both require effort on the part of the teams to manage changes (or a CM team), and both require defined processes.
I agree with the overall sentiment of your comment, but the quoted part is where I've seen trouble brew. The tendency is to be conservative about pulling updates to dependencies, which can easily get you into a very awkward state when a critical update eventually sits on top of a bunch of updates you didn't take because they broke you. It is usually better to be forced to handle the breakage immediately, one way or another.
Yes, that's the contract that you need to have with other teams. And it's the contract that is automatically enforced with microservices.
You don't debug distributed systems by tracing into remote calls and jumping into remote code. You debug them by comparing requests and responses (you use discrete operations, right?) with the specified requests and responses, and then opening the code that has a problem¹.
It calls for completely different tooling, not for a "better debugger".
1 - Or the specs, because yes, now that your system is distributed you also have to debug the specs. Why would somebody decide to do that for no reason at all? Yet lots of people do.
Having multiple platforms is not a problem and is generally a good thing, as long as it's not excessive. You don't want to be in a case where you have the same number of different platforms as developers, or anything like that. I'm guessing there is a rule of thumb here, but I'm not sure what it would be. Max 1 different platform per 5 developers? Something like that.
I do wish people would stop conflating "running in a different service" and "loose coupling". They are completely orthogonal.
I've worked on some horrendously tightly coupled microservices.
Unless you can coax dOSGi into working (which is tons of fun), then you can have services tightly coupled to other services running on entirely different machines causing frequent (and hilarious) cascades of bundle failures whenever the network hiccups.
OSGi is a trigger word for me now. I've worked on two large OSGi projects (previous job and current job) and it's always the same. Sh*t is always broken (and my lead still insists that OSGi is the one true way to modular bliss). And the OSGi fanboys always say "Your team is using it wrong!" Which very well might be true, but I no longer care. Apparently it's just too damn hard to get a team of code monkeys to respect service boundaries when OSGi makes it so damn easy to ignore them.
If I'm ever in a position of getting to design a new software architecture (hasn't happened in 10 years, but hey I can dream), I'll punch anyone who suggests "OSGi" to me right in the face.
That's a good point. I think this thought extrapolates to other parts of software engineering as well. Sometimes writing very modular and decoupled software from the beginning is very hard for a small team, and we can't see well if this is the best approach since it's also hard to grasp the big picture.
I'm currently facing this issue. I'm trying to write very modular and reusable applications, but now I'm paralyzed trying to picture the best patterns to use, where should I use a facade, a decorator, etc. I think I'll adopt this strategy for myself--only focus on modularizing from the beginning if it'd lead to extra work otherwise.
Microservices also make it much harder to refactor the code which you often need to do in the early stage of a project.
The thing is, you need a massive investment in infrastructure to make it happen. But once you do, it's great. You can create and deploy a new service in a few seconds. You can rewrite any individual service to be latest and greatest in an afternoon. Different teams don't have to agree on coding standards (so you don't argue about it).
But the infrastructure cost is really high, a big chunk of what you save in development you pay in devops, and it's harder to be "eventually consistent" (eg: an upgrade of your stack across the board can take 10x longer, because there's no big push that HAS to happen for a tiny piece to get the benefits).
Monolithic apps have their advantages too, and many forget it: less devops cost, easier refactoring (especially in statically typed languages: a right click -> rename will propagate through the entire app), and while it's harder to upgrade the stack, once it's done, your entire stack is up to date, not just scattered parts of it. Code reuse is significantly easier, too.
Anything you're not running locally just hits the shared infra.
Unsure if sarcastic.
Maybe Swift? Scala Native in a year or two? I've done a little Erlang before, so maybe Elixir?
>The thing is, you need a massive investment in infrastructure to make it happen.
I thought that one of the selling points of microservice architectures was the minimal infrastructure. I am really struggling to see an advantage in this way of doing things. You are just pushing the complexity to a devops layer rather than the application layer - even further from the data.
Monoliths invariably tend to become spaghetti over time, and completely resistant to any non-trivial refactoring. With microservices, interfaces between modules are stable and spaghetti is localized.
Because individuals may be jumping through dozens of services a day, moving, refactoring, deploying, reverting (when something goes wrong), etc. It has to be friction-free, else you're just wasting your time.
eg: a CLI to create the initial boilerplate, a system that automatically builds a deployable on commit, and something to deploy said deployable nearly instantly (if tests passed). The services are small, so build/tests should be very quick (if you push above 1-5 minutes for an average service, it's too slow to be productive).
Anyone should be able to run your service locally by just cloning the repo and running a command standard across all services. Else having to learn something every time you need to change something will slow you down.
That infrastructure is expensive to build and have it all working together.
The most dramatic effect was on a particular set of endpoints with relatively high traffic (peaking at 1000 req/s) that was killing the app, upsetting our relational database (with frequent deadlocks), and driving our Elasticsearch cluster crazy.
We did more than just split the endpoints into microservices. We also designed the new system to be more resilient. We changed our persistence strategy to make it better suited to our traffic, using a distributed key-value database and designing documents accordingly.
The result was very dramatic, like entering a loud club and suddenly everything going silent. No more outages, very consistent response times, instances that scaled very smoothly with traffic increases, and overall a more robust system.
The moral of this experience (at least for me) is that breaking a monolithic app into pieces has to have a purpose, and it implies more than just moving the code to several services while keeping the same strategy (which is actually slower, more time-consuming, and harder to monitor).
I can't get my head around how people introduce changes to their system if they have to update 12 different microservices at once? It must be horrible.
Often you hear stories how people are converting monolithic app to microservices - but this is easy. Rewriting code is easy and it's fair to say it always yields better code (with or without splitting into microservices - it doesn't matter).
What I'd like to hear is something about companies doing active development in microservice world. How do they handle things like schema changes in postgres where 7 microservices are backed by the same db? What are the benefits compared to monolithic app in those cases?
It seems to me that microservices can easily violate DRY because they "materialise" communication interfaces and changes need to be propagated at every api "barrier", no?
As I said in another thread, the separation in different components was key for resiliency. That allowed independence between the higher volume update and the business critical user facing component.
>I can't get my head around how people introduce changes to their system if they have to update 12 different microservices at once? It must be horrible.
The thing is, if you design the microservices properly it is very rare to introduce a change across so many deployments at once. Most of the time it's just 1 or 2 services at a time.
>What I'd like to hear is something about companies doing active development in microservice world. How do they handle things like schema changes in postgres where 7 microservices are backed by the same db? What are the benefits compared to monolithic app in those cases?
We don't introduce new features in our monolith service anymore. So, from that perspective we do all active development in microservices.
>"How do they handle things like schema changes in postgres where 7 microservices are backed by the same db?
The trick is, you want to avoid sharing relational data between microservices. I don't know if it is just us, but we have been able to split our data model so far, and in most cases we don't even need a relational database anymore, so having a schemaless key/value store makes things easy too.
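A hedged sketch of the "no shared relational data" idea: each service owns its own keyspace in a schemaless store and holds whole documents instead of rows joined across services. The service names, keys, and the dict-as-store are all invented for illustration (in practice this would be Redis, DynamoDB, or similar):

```python
import json

kv = {}  # stands in for a real key/value store

def put_doc(service: str, key: str, doc: dict) -> None:
    # Prefixing keys with the owning service keeps ownership boundaries explicit.
    kv[f"{service}:{key}"] = json.dumps(doc)

def get_doc(service: str, key: str) -> dict:
    return json.loads(kv[f"{service}:{key}"])

# The same user exists in two services, but as two independent documents --
# no schema shared, no cross-service joins, no coordinated migrations.
put_doc("billing", "user-42", {"plan": "pro", "balance": 10})
put_doc("profile", "user-42", {"name": "Ada"})

print(get_doc("billing", "user-42")["plan"])  # -> pro
```

The cost is accepting duplication and eventual consistency between the documents; the benefit is that a schema change in one service never forces a redeploy of another.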
> What are the benefits compared to monolithic app in those cases?"
There are several advantages, but the critical one for me is being able to have a resilient platform that can still operate even if a subsystem is down. With our monolithic app it's an all-or-nothing thing. Another advantage is splitting the risk of new releases.
>It seems to me that microservices can easily violate DRY because they "materialise" communication interfaces and changes need to be propagated at every api "barrier", no?
Not necessarily. YMMV but you can have separation of concerns and avoid sharing data models. When you do have shared dependencies (like logging strategy or data connections) you can always have modules/libraries.
One key factor was decoupling the high volume updates from the users requests so one didn't affect the other one.
In my experience, any monolith that can be broken up into a queue-based system will benefit enormously. This cleans up the pipelines, and adds monitoring and scaling points (the queues). Queues remove run-time dependencies on the other services. It requires that these services are _actually_ independent, of course.
I do, however, avoid RPC based micro-services like the plague. RPC adds run-time dependencies to services. If possible, I limit RPC to other (micro) services to launch/startup/initialization/bootstrap, not run-time. In many cases, though, the RPC can be avoided entirely.
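The difference between the two styles can be sketched in a few lines. With a queue, the producer never calls the consumer directly, so the consumer being down doesn't block the producer. This is an illustrative sketch with an in-process queue standing in for a real broker; the function names are invented:

```python
import queue

order_queue = queue.Queue()

def enqueue_order(order: dict) -> None:
    # The producer only touches the queue -- no run-time RPC dependency
    # on the consuming service, which may be down or slow right now.
    order_queue.put(order)

def process_orders(handled: list) -> None:
    # A worker drains the queue whenever it happens to be running.
    while True:
        try:
            order = order_queue.get_nowait()
        except queue.Empty:
            break
        handled.append(order["id"])

enqueue_order({"id": 1})
enqueue_order({"id": 2})
done = []
process_orders(done)
print(done)  # -> [1, 2]
```

With RPC, `enqueue_order` would instead be a blocking call into the other service, and its availability would become your availability. The queue is also a natural monitoring point: queue depth tells you when to scale the workers.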
Yep. We already had a feature flag system, a minimal monitoring system, and a robust alerting system in place. Microservices make our deployments much more granular. No longer do we have to roll back perfectly good changes because of bugs in unrelated parts of the codebase. Before, we had to have involved conversations about deployments, and there were many things we just didn't do because the change was too big.
We can now incrementally upgrade library versions, upgrade language versions, and even change languages now, which is a huge win from the cleaning up technical debt perspective.
To be honest, we still have a monolithic application at the heart of our system that we've been slow to decompose, though we're working on it. We deploy it on a regular cadence and use feature flags heavily to make it play nice with everything else.
Git doesn't really help with that. More granular deployments do, and if microservices help with more granular deployments, go for it.
That's your problem right here
It makes sense for some things. We run a webshop, but have a separate service that handles everything regarding payments. It has worked out really well, because it allows us to fiddle around with pretty much everything else and not worry about breaking the payment part.
It helps that it's a system where we can have just one test deployment and everyone just uses that during testing of other systems.
I've also worked at a company where we had to run 12 different systems in their own VMs to have a full development environment. That sucked beyond belief.
The idea of micro-services is enticing, but if you need to spin up and configure more than a couple to do your work, it starts hurting productivity.
Is the payments service a single service that manages the whole transaction, or did you go for multiple services handling each part? If so, how did you manage failures with a distributed transaction?
We had almost the same story with payments. Except we first jumped to a payment-processing SaaS, got dissatisfied (all the SaaSes I saw don't work with PayPal EC without so-called "reference transactions" enabled), decided that wasn't a good idea, and had to jump back to an in-house implementation.
I didn't want to re-integrate the payments code back into the monolith - I thought it would take me more time and make the code messier. So I wrote a service (it's small, but to heck with the "micro" prefix) that resembled that SaaS's API (the parts we'd used). It has surely evolved and isn't compatible anymore, but it doesn't matter as we're not going back anyway.
Works nicely, and now I feel more relaxed - touching the monolith won't break payments.
On the other hand, I see how too many services may easily lead to fatigue. Automated management tooling (stuff like docker-compose) may remedy this, but may also bring its own headaches.
We have specific services that process different types of documents, or communicate and package data from different third parties, or process certain types of business rules, that multiple apps hook into, but it's literally like 20 services total for our department, some that are used in some apps and not others.
When I hear 'micro-services' I'm picturing something more akin to like node modules, where everything is broken up to the point where they do only one tiny thing and that's it. Like your payment service would be broken into 20 or 30 services.
But maybe I'm mistaken in my terms. I haven't done too much with containers professionally, so I'm not too hip with "the future".
The thing is though, the Elixir feed checker has its own database table that tracks whether it's seen an episode in a feed. And when there's a new episode it sends an API call to WP to insert the new post. The problem is that sometimes the API calls fail! Now what? I'll need to build logging, re-try etc. So I'm thinking of making the feed checker 'stateless' and only using WP with a lot of query caching as the holder of 'state' information about whether an episode has been seen before.
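The retry problem described above has a standard shape: retrying a failed insert is only safe if the insert is idempotent, and the episode GUID is a natural idempotency key. A minimal sketch, where `wp_create_post` is a stand-in for the real WP API call (here wired to fail twice, simulating transient errors):

```python
posted = set()                 # stands in for "posts WP already has"
failures_left = {"count": 2}   # simulate two transient API failures

def wp_create_post(guid: str) -> None:
    if failures_left["count"] > 0:
        failures_left["count"] -= 1
        raise ConnectionError("API call failed")
    posted.add(guid)

def create_with_retries(guid: str, attempts: int = 5) -> bool:
    if guid in posted:         # idempotency check: episode already seen
        return True
    for _ in range(attempts):
        try:
            wp_create_post(guid)
            return True
        except ConnectionError:
            continue           # real code would back off and log here
    return False

print(create_with_retries("episode-123"))  # -> True, after two retries
```

Calling `create_with_retries("episode-123")` again is a no-op, which is exactly what makes the feed checker safe to re-run when an earlier attempt died partway through.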
To sum up my experience so far, there's something nice about being able to use the right tech for each task, and separating resources for each service, but the complexity--keeping track of whether a task completed properly--definitely increases.
One hard tech limit is that with 50k podcasts and 4 million+ episodes, search definitely doesn't work well. Not just WP, but SQL itself. Hence Elasticsearch. I also plan to work on recommendations, etc., so will probably need to export SQL data into other systems anyway for making the "people who liked this also liked this" kind of things.
Also I kinda lied about using the WP API--that's how I built the system initially (and will switch to it moving forward), but to import the first few million posts from the content of the feeds, I just used wp_insert_post against the DB of new entries that Elixir fetched (I posted the code I used here: http://wordpress.stackexchange.com/a/233786/30906).
I also plan to write the whole front-end in React (including server side rendering) so will have to figure out how to get that done. Would probably use the WP-API with a Node.js app in front of it, will look into hypernova from AirBNB. So probably more usage of WP API accessed by another service...
It doesn't sound like microservices are needed, just adding in the appropriate tech for the job.
Once these are doing anything other than rotating log files, can the system really be considered monolithic?
The advantage though is that APIs (system boundaries) are usually better defined.
Perhaps one should use the best of both worlds, and run microservices on a common database, and somehow allow to pass transactions between services (so multiple services can act within the same transaction).
A shared database is an anti-pattern in distributed systems.
Similarly, distributed transactions (ala. DTC) is an anti-pattern.
Distributed systems aren't hard. They're just different.
Then again, sometimes it's advantageous to identify parts of your system where aspects of state can be safely decoupled. And in which having them reside in disparate systems (and yes, sometimes be inconsistent or differently available) might actually be a better overall fit.
You completely lose the concept of transactional integrity, so you will have to work around that from the start.
Then again, sometimes your state changes not only don't need to be transactional; it can be disadvantageous to think of them that way.
Depends, depends, depends.
I'm curious; in what kinds of situation would this apply?
> Depends, depends, depends.
Flexibility is usually an important requirement. Often you cannot freeze your architecture and be done with it. I think a transactional approach could fit this better.
Any situation where the business value of having your state be 100% consistent does not outweigh the performance or implementation cost of making it so.
The non-web world has been doing this with message queueing for about 15 years. Maybe more.
I mean, the infamous "UNIX way" of "do one thing, do it well" (something we nearly lost with the popularity of the "do everything in a manner incompatible with how others do it" approach in too many modern systems), where complex behavior was frequently achieved through the modularity of smaller programs communicating through well-defined interfaces.
Heck, microkernels are all about this, and their ideas didn't grow out of nowhere. And HURD (even though it was never finished) is a quarter of a century old already.
That said, in places where it doesn't make sense we didn't try to force it. Our main game API is somewhat monolithic, but behind it we have almost 10 other services. Here's a quick breakdown:
- Turn based API service (largest, "monolithic")
- Real-time API service (about 50% the size of turn-based)
- config service (serves configuration settings to clients for game balancing)
- ad waterfall service (dynamic waterfall, no actual ads)
- push notification service
- analytics collection service (mostly a fast collector that dumps into Big Query)
- Open graph service (for rich sharing)
- push maintenance service (executes token management based on GCM/APNS feedback)
- help desk form service (simple front-end to help desk)
- service update service (monitors CI for new binaries, updates services on the fly - made easy by Go binary deployment from CI to S3)
- service ping service (monitors all service health, responds to ELB pings)
- Facebook web front-end service (just serves WebGL version of our game binary for play on Facebook)
- NATS.io for all IPC between services
But don't get too caught up on the "micro" part. Split services where domain lines naturally form, and don't constrain service size by arbitrary definitions. You know, right tool for the job and whatnot.
I wouldn't, however, just "do microservices" from day one on a young app. But usually that young app has no idea what the true business value is, i.e., you have no idea what down time of certain parts of your services really means to the business. That's the #1 pain point we're solving: having mission critical things up 100%, and then rapidly iterating on new, less stable feature designs in separate services.
You should, however, keep an eye on how "splittable" everything is, i.e., does everything need to be in the same DB schema? Most languages have package concepts, which typically align (somehow) with "service" concepts. Do you know their dependencies? That sort of thing. Then, the later process of "refactor -> split out service" is pretty straightforward and easy to plan.
I don't really like that model applied to everything, but eh, now you are kind of forced into a hybrid approach - say, your macro vertical plus whatever payment gateway service, intercom or equivalent customer interaction services, metrics services, retargeting services; there are a lot of heterogeneous pieces going into your average startup.
but back on topic, what Docker really needs now is a whack on the head for whoever thought up swarms/overlays, and a proper, sane way to handle discovery and fail-over. Instead we got yet another key-value service to deploy, which cannot be both inside Docker and highly available unless you like infinite recursion.
I'm currently working on a large refactoring effort along these lines. The end goal is to create a modular, potentially distributed system that can be deployed in a variety of configurations, updated piecemeal for different customers, and integrated by our customers with the third-party or in-house code of their choice using defined APIs. We aren't typical of the other examples, though, in that we do literally ship our software to our customers and they run it on their own clusters.
a good example of this that I've used in production at my current $dayjob: dynamic PDF generation. user makes request from our website, request data is used to fill out a pdf template context which is then sent over to our PDFgen microservice which does its thing and streams a response back to the user.
All of that and much more needs to be replicated for each microservice, right?
Why not just have a module in your monolithic app that does it. The logic will still be separate. In most languages/frameworks you can spawn pdf generation task. Any changes are easier to introduce as well. There's no artificially materialised interface. Updates are naturally introduced. All auth logic is there already, you don't need to worry about deploying yet another service, same with logging etc.
the template has values that are related to database models. the main app (still mostly monolithic) fills out the template context. the context itself is what's passed to the microservice. the microservice does not connect to a database at all.
> Does it keep connection pool of let's say 5 connection always open (as libraries like to do)?
no. the service probably handles a few hundred requests per day; it is not in constant use. communication is over HTTPS. it opens a new connection on each request. this does impact throughput, but it's a low-throughput use case, and pdf rendering itself is much slower, so that time totally dominates the overhead of opening and closing connections anyway.
> Does it have authentication?
yes, it auths with a bearer token that is borne only by our own internal server. this is backend technology so we don't have to auth an arbitrary user. we know in advance which users are authorized.
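That kind of internal bearer-token check can be sketched in a few lines: a single pre-shared token, compared in constant time. This is an illustrative sketch, not the actual service's code; the token value is a made-up placeholder (in practice it would be injected via an env var or secret store):

```python
import hmac

SERVICE_TOKEN = "s3cr3t-internal-token"  # placeholder; never hardcode in real code

def is_authorized(auth_header: str) -> bool:
    scheme, _, token = auth_header.partition(" ")
    if scheme != "Bearer":
        return False
    # hmac.compare_digest avoids leaking token prefixes via timing differences.
    return hmac.compare_digest(token, SERVICE_TOKEN)

print(is_authorized("Bearer s3cr3t-internal-token"))  # -> True
print(is_authorized("Bearer wrong"))                  # -> False
```

Because only one internal caller ever holds the token, there's no per-user auth, session handling, or permission model to maintain, which is a big part of what keeps the service small.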
> Is it public or private API?
> Who is managing security?
we are, with a lot of assistance from the built-in security model of AWS.
> Is it running behind it's own nginx or other proxy?
the main app is behind nginx. the microservice is running in a docker container that exposes itself over a dedicated port. there's no proxy for the microservice, again, because of the low throughput/low load on the service. no need to have a load balancer for this so the most obvious benefit of a proxy wasn't applicable.
> Does it have DoS protection (PDF generation can be CPU intense)?
yes, it's an internal service and our entire infrastructure is deployed behind a gatekeeper server and firewall. the service is inaccessible to outside requests. the internal requests are queued up and processed 1 at a time.
> What about the schema for request?
request payload validation is handled on both ends. the user input is validated by the main app to form a valid template context. the pdf generator also validates the template context before attempting to generate one. it's possible to have a valid schema with data that can't be handled correctly, though; errors are just returned as a 500 response. happens infrequently.
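"Validate on both ends" just means the same context check runs in the caller before sending and in the PDF service before rendering. A minimal sketch (the field names and schema are invented, not the actual app's):

```python
# Toy schema: required field -> expected type.
REQUIRED = {"customer_name": str, "invoice_number": str, "total": float}

def validate_context(ctx: dict) -> list:
    """Return a list of validation errors; empty means the context is valid."""
    errors = []
    for field, typ in REQUIRED.items():
        if field not in ctx:
            errors.append(f"missing: {field}")
        elif not isinstance(ctx[field], typ):
            errors.append(f"bad type for {field}")
    return errors

# Caller side: refuse to send an invalid context.
ctx = {"customer_name": "ACME", "invoice_number": "INV-7", "total": 12.5}
assert validate_context(ctx) == []

# Service side: the same check runs again before rendering, so a buggy or
# out-of-date caller produces a clear error instead of a corrupt PDF.
print(validate_context({"total": "12.5"}))
```

Sharing the schema definition between both sides (as a library or a schema file) keeps the two checks from drifting apart.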
> They need to be deployed together with changes in other services, right?
nope. the microservice is fully stand alone.
> What about changes to database schema - you need to remember to update that service as well and redeploy it at the right time as well - just after successful db migrations - which live in another project.
the microservice doesn't interact with a database at all. schema changes in the main app database could potentially influence the pdf template context generation, but there are unit tests for that, so if it does happen we'll get visibility in a test failure and update the template context generation code as needed. none of this impacts the microservice itself though. it is fully stand alone. that's the point.
> All of that and much more needs to be replicated for each microservice, right?
in principle yes, and these are good guidelines for determining what is or is not suitable to be a microservice. if it would need to auth an arbitrary user, or have direct database access, or be exposed to public requests, it might not be a good candidate for a microservice. things that can stand alone and have limited functional dependencies are much better candidates.
> Why not just have a module in your monolithic app that does it.
because the monolithic app is Python/django and the PDF generation tool is Java. one of the main advantages of microservices architecture is much greater flexibility in technology selection. A previous solution used Python subprocesses to call out to PDF generation software. It's actually easier and cleaner for us to use a microservice instead.
Ah yes, the 'let's have decentralised microservices with centralised standards!' anti-pattern. It results in lots of full-fledged, heavyweight, slow-to-update services, which also have all the problems of a distributed system. It's the worst of both worlds.
Although I personally had to deal with some monolithic monsters that I wished were split into smaller services.
IMHO. You need a lead with a clear vision that drives the effort. Too many leads will create chaos.
Well, there's your problem - you need a monitoring microservice and an alerting microservice! Well, those may be too coarse by themselves, but once you break them down into 5 or 6 microservices each, you'll be ready for production.
To answer some questions: yes this is obviously poking fun at Docker, but I also do really believe in Docker. See the follow-up for more on that: https://circleci.com/blog/it-really-is-the-future/
In a self-indulgent moment I made a "making of" podcast about this blog post, which is kinda interesting (more about business than tech): http://www.heavybit.com/library/podcasts/to-be-continuous/ep...
And if you like this post you'll probably like the rest of the podcast: http://www.heavybit.com/library/podcasts/to-be-continuous/
> -It means they’re shit. Like Mongo.
> I thought Mongo was web scale?
> -No one else did.
It's so incredibly true, and I laugh (and cry, b/c we use Mongo) at this section each time I read it. Also, this gets me every time:
> And he wrote that Katy Perry song?
- So shared webhosting is dead, apparently Heroku is the future?
- Why Ruby, why not just PHP?
- Wait, what's Rails? Is that different from Ruby?
- What's MVC, why do I need that for my simple website?
- Ok, so I need to install RubyGems? What's a Gemfile.lock? None of these commands work on Windows.
- I don't like this new text editor. Why can't I just use Dreamweaver?
- You keep talking about Git. Do I need that even if I'm working alone?
- I have to use command line to update my site? Why can't I just use FTP?
- So Github is separate from Git? And my code is stored on Github, not Heroku?
- Wait, I need to install both PGSql and SQLite? Why is this better than MySQL?
- Migrations? Huh?
Frameworks, orchestrations, even just new technologies -- these are great if they actually make your job easier or if they make your product better. Unfortunately, they often do exactly the opposite.
> using a VCS for personal code can be overkill
I've been burned before, have you? If you're using something like Google Drive, you should use DropBox instead, since it seems less likely to lose your work.
Nooooooooooooooooo. Every time someone says "service discovery" a kitten dies (except for consul, that's the biz).
As such, I maintain SOAP should be gone for the good of the running system.
It's literally billed as "A fast and modern Python SOAP client". Python 2 and 3 compatible. Last commit was two weeks ago.
And going by the bugtracker, it's running into quite a few problems with almost-but-not-quite compliant servers/WSDL files, which is a real issue when you're trying to interface with ass-old legacy APIs (we're talking "not upgraded since 2006"-old) made by $BigEnterprise. Maybe this time the project won't die before they work out all the little kinks.
I really don't have any idea why people are so excited about "docker all the things".
I don't know if you understand what Docker really is when you say something like "Run only one process in one brand new kernel". The kernel is shared between containers; that's the whole idea. You package the things your application needs and be done with it.
The current problem with containerization is that there are no really good or well-understood best practices; people are still experimenting, and that's why it's a big moving target and, consequently, a pain in the ass if you need to support a more enterprise-y environment. You will need to be able to change and re-architect things if the state of the art changes tomorrow.
I agree with your sentiment about going overboard on "docker all the things"; that's dumb, and some people do it more because of the hype than from understanding their needs and picking a good solution for them. But I think you are criticising something you don't really grasp; take these two statements:
> "Run only one process in one brand new kernel"
> you have a kernel in your hand, why the hell you will run only one process on it?
I'm not trying to be snarky; I really recommend doing a bit more research on Docker to understand how it works. Also, Docker doesn't make it a pain in the ass to upgrade apps - quite the contrary, if you do it in some proper ways.
Except now Go and Rust make it very easy to compile static Linux binaries that don't depend on glibc, and even cross-compile them easily.
Hell I think it's actually not even that hard to do with C/C++: https://www.musl-libc.org/how.html
If I have a binary built by Go, what problems does Docker solve that just copying that binary to a normal machine doesn't?
You expose what are the network APIs of your apps (e.g open ports), filesystem mounts, variables (12 factors), etc.
Your application becomes a block that you can assemble for a particular deployment; add some environment variables, connect a volume with a particular driver to a different storage backend, connect with an overlay to be able to talk to other containers privately across different servers or even DCs, etc.
It's really all about layers of abstraction for operating an application and deploying it to different environments.
With the latest container orchestration tools, you can have a catalog of application templates defined simply in Yaml and it's very easy to make it run anywhere. Add some autoscaling and rolling upgrades and it becomes magic for ops (not perfect yet, but checkout latest Kubernetes to see new advancements in this space).
With the proper tools and processes, this removes a lot of complexity.
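The YAML application templates mentioned above look roughly like this docker-compose-style sketch (the service name, image tag, and values are all invented for illustration):

```yaml
version: "3"
services:
  web:
    image: example/webapp:1.4            # hypothetical image
    ports:
      - "8080:80"                        # exposed network API
    environment:
      - DATABASE_URL=postgres://db/app   # 12-factor style config
    volumes:
      - appdata:/var/lib/app             # storage backend swapped per environment
volumes:
  appdata:
```

The app is declared as a block with its ports, env vars, and volumes; what those bind to (local disk vs. EBS, loopback vs. overlay network) is decided per deployment, not baked into the app.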
But environment variables already exist without docker.
Volumes already exist, aka partitions.
"Overlay networks" already exist, aka unix sockets or plain TCP/UDP/etc over the loopback interface.
I'm not trying to be a dick here; it's just that the points you brought up don't really bring anything new to the table. How is this different from just having a couple of bare-metal or virtual machines behind a proxy?
There are some aspects of containerization that are very feasible, but only at certain scales, and the points you brought up make me question whether you perhaps might be over-engineering things a bit.
For example, volumes: With Kubernetes (on Docker), the lifetime of the volume mount is handled for you. No other containers have access to the mount. Container dies, mount dies. Whereas on plain Linux, mounts stay. You need cleanup, or you need to statically bind apps to their machines, which will seriously limit your ability to launch new machines -- there will be a lot of state associated with the bootstrapping of each node. Statefulness is the enemy of deployment, so really what you want is some networked block storage (EBS on AWS, for example) plus an automatic mount/unmount controller, thereby decoupling the app from the machine and allowing the app to run anywhere.
Environment vars are inherited and follow the process tree, so those are solved by Linux itself.
Process trees also handle "nesting": Parent dies, children die. But you will end up in a situation where a child process might spawn a child process that detaches. This is particularly hard to fix when a parent terminates, because the child doesn't want to be killed. Now you have orphaned process trees. The Linux solution is called cgroups, which allows you to associate process trees with groups, which children cannot escape from. So you use cgroups, and write state management code to clean up an app's processes.
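The escape-and-cleanup problem can be demonstrated with plain sessions and process groups (the pre-cgroups mechanism); a sketch assuming the util-linux `setsid` tool is installed:

```shell
#!/bin/sh
# setsid(1) starts a command in a NEW session and process group -- this is
# exactly how a child "detaches" and escapes its parent's process tree.
if ! command -v setsid >/dev/null 2>&1; then
  echo "setsid not available; skipping demo"
  survived=skipped; cleaned=skipped
else
  setsid sh -c 'sleep 60' &
  escaped=$!
  sleep 1  # let it settle into its new session

  survived=no
  kill -0 "$escaped" 2>/dev/null && survived=yes
  echo "detached child survived: $survived"

  # Pre-cgroups cleanup: signal the whole process group it now leads.
  # A cgroup does the same job robustly, since children cannot escape it.
  kill -- -"$escaped" 2>/dev/null
  wait "$escaped" 2>/dev/null

  cleaned=no
  kill -0 "$escaped" 2>/dev/null || cleaned=yes
  echo "process group cleaned up: $cleaned"
fi
```

The detached child outlives anything short of a group-wide signal, which is exactly the cruft a container runtime (via cgroups) cleans up for you automatically.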
I could go on, but in short: You want the things that containerization gives you. It might not be Docker, although any attempt to fulfill the principles of containerization will eventually resemble Docker.
You now have generic interfaces (Dockerfile, docker-compose, Kubernetes/Rancher templates, etc.) to define your app and how to tie it together with the infrastructure.
Having these declarative definitions makes it easy to link your app with different SDN or SDS solutions.
For example, RexRay for the storage backend abstraction of your container:
You can have the same app connected to either ScaleIO in your enterprise or EBS in the cloud as its storage backend.
We are closer than ever to true hybrid-cloud apps, and it's now much easier to streamline the development process from your workstation to production.
I think it's pretty exciting :)
This sounds exactly like the "It's the future!" guy in the original post...
Something like kubernetes also lets you abstract away the lock-in of your cloud infrastructure, so whilst it adds another layer and a bit of complexity, it again is arguably worth the effort if you're worried about needing to migrate away from your current target for some reason in the future.
As a framework it abstracts apps from infrastructure quite well. It's super easy for me to replace my log shipping container in kubernetes and have most things continue to work, as all the apps have a uniform interface.
Nobody's saying you can't build these things without Kubernetes, but it definitely gives me more of them than configuration management systems currently do. Personally, I'd rather aim at the framework that handles more of what I need it to do.
Finally, bootstrapping a kubernetes cluster is actually quite trivial and you can get one off the shelf in GKE, so I'm not really sure why I'd personally want to go another route.
This is not revolutionary in itself, but having the creation and deployment of a server be 100% reproducible (+ fast and easy!) across dev, preproduction, and production environments, all managed with my usual versioning tool, is something I appreciate very much.
Sure, there are other tools to do the same, but docker does the job just fine.
The problem of ensuring that upstream dependencies can be reproducibly installed and/or built is, of course, left as an exercise for the reader.
What networking problems does Docker solve?
Your program doesn't see what else is running on the system. It also means there are no possible conflicts over shared libraries and other system-wide dependencies.
This kind of isolation is not only good for app bundling as a developer, but even more important as an operator in a multi-tenant scenario. You throw in containers and they don't step on each other's toes. Plus, the system stays clean and it's easy to move things around.
Network namespace as in linux network namespace (http://man7.org/linux/man-pages/man8/ip-netns.8.html).
Each container has its own IP stack.
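You can poke at the same mechanism without any container runtime; a sketch that, on a modern kernel, works unprivileged via a user namespace (`unshare -r`), assuming `unshare` and `ip` from util-linux/iproute2 are present:

```shell
#!/bin/sh
# Each network namespace gets its own, empty IP stack.
if links=$(unshare -r -n ip -o link show 2>/dev/null); then
  status=ok
  # Inside the fresh namespace only a (down) loopback device exists,
  # no matter how many interfaces the host actually has.
  echo "interfaces inside the new netns:"
  echo "$links"
else
  status=unavailable
  echo "unprivileged user namespaces not available here"
fi
```

A container's "own IP stack" is exactly such a namespace, with the runtime then wiring veth pairs, bridges, or overlays into it.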
Containers provide proper abstractions so you can then assemble all of this, pretty much like you use pipes in a unix shell.
Deployments, installations, etc. were already pretty easy; that's not something containers are particularly good at solving. At best you containerize the configuration management itself, which simply makes it harder to work with.
Nowadays all I do is set up a barebones CoreOS instance and fire containers at it, be it with Kubernetes (and then my config management is a bit more robust, so as to set up k8s on CoreOS) or just CoreOS's own fleet if it suffices.
Then I get the goodies of containerization such as process isolation, resource-quotas, etc.
Like I said: it isn't painless, sometimes much the opposite, but it's worked much better for the lifecycle of most of the products and services I've been working on the past couple years.
Even before, with automated deployments, it wasn't so easy once configuration started to get hairy. And yes, you can argue that this might be a smell of something else, but that's what I've seen happening over and over.
One process per container is perfectly fine. In fact, that's the common use case. There is absolutely nothing wrong with it, and there is practically zero overhead in doing it.
What you gain is isolation. I can bring up a container and know that when it dies, it leaves no cruft behind. I can start a temporary Ubuntu container, install stuff in it, compile code in it, export the compilation outputs, terminate the container and know that everything is gone. We do this with Drone, a CI/build system that launches temporary containers to build code. This way, we avoid putting compilers in the final container images; only the compiled program ends up there.
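That "no compilers in the final image" workflow can nowadays also be expressed directly as a multi-stage Dockerfile; a sketch where the base images, paths, and build command are all illustrative:

```dockerfile
# Stage 1: a throwaway build container with the full toolchain.
FROM golang:1.22 AS builder
WORKDIR /src
COPY . .
RUN go build -o /out/app .

# Stage 2: the final image gets only the compiled binary.
FROM debian:stable-slim
COPY --from=builder /out/app /usr/local/bin/app
CMD ["app"]
```

Only the second stage is shipped; the builder stage, toolchain and all, is discarded after the build.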
Similarly, Drone allows us to start temporary "sidecar" containers while running tests. For example, if the app's test suite needs PostgreSQL and Memcached and Elasticsearch, our Drone config starts those three for the duration of the test run. When the test completes, they're gone.
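The same sidecar idea works outside Drone too; a docker-compose-style sketch, with all service names, images, and the test command being illustrative:

```yaml
# docker-compose.test.yml -- throwaway backing services for one test run
services:
  tests:
    build: .
    command: ./run-tests.sh          # hypothetical test entrypoint
    depends_on: [postgres, memcached, elasticsearch]
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: test        # throwaway credentials
  memcached:
    image: memcached:1.6
  elasticsearch:
    image: elasticsearch:8.13.4
    environment:
      discovery.type: single-node
```

Something like `docker compose -f docker-compose.test.yml run --rm tests` followed by `docker compose -f docker-compose.test.yml down -v` brings the three services up for the run and leaves nothing behind afterwards.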
This encapsulation concept changes how you think about deployment and about hardware. Apps become redundant, expendable, ephemeral things. Hardware, now, is just a substrate that an app lives on, temporarily. We shuffle things around, and apps are scheduled on the hardware that has enough space. No need to name your boxes (they're all interchangeable and differ only in specs and location), and there's no longer any fixed relationship between app and machine, or even between app and routing. For example, I can start another copy of my app from an experimental branch, that runs concurrently with the current version. All the visitors are routed to the current version, and I can privately test my experimental version without impacting the production setup. I can even route some of the public traffic to the new version, to see that it holds up. When I am ready to put my new version into production, I deploy it properly, and the system will start routing traffic to it.
Yes, it very much is the future.