Go + Services = One Goliath Project (khanacademy.org)
418 points by yawn 32 days ago | 434 comments

We turned our monolith into a bunch of microservices almost 6 years ago to the day. For a long time I was very happy with the new pattern, but over the years the weight of keeping everything updated, along with the inevitable corners that fall behind and have... questionable... security due to how long they sit neglected, has really left me wondering if I am happy with it after all.

I would love to hear some thoughts from others that made the move, especially anyone that decided to move back to a monolith repo.

A company I am affiliated with made a decision to rewrite their code in microservices-oriented architecture thinking it would only take one year. Now we're 7 years into the transition and starting to come up against some hard deadlines that threaten revenue streams. It seems obvious to everyone except the leadership and the architects that this has been an unmitigated disaster. Other comments on this thread seem to indicate that many have had similar experiences.

For those who are curious, here is a classic article on why rewriting code from scratch is a bad idea: https://www.joelonsoftware.com/2000/04/06/things-you-should-....

For a more in-depth analysis on the unforeseen challenges of microservices in particular, I would encourage a lot of careful research into how other companies have tried and failed at this. In particular, I might look at Uber's ongoing difficulties.

All I have to say to the Khan Academy engineers is to buckle up because frankly, moving from Python 2->3 is not that hard and you have no idea what you are getting yourself into.

Yes, the micro services + Golang vanity project because you think you’re Google. I really don’t think people understand error states in distributed systems very well, putting possible network partitions everywhere is not a great idea. I would strongly suggest trying a Golang monolith first and seeing if there are one or two heavily used services that need splitting off. Also monorepo. Always.

The really fascinating thing about this tendency is that Google itself never completed nor even really started the wholesale transition of programs and services to Golang/microservices. Google does have services that are micro with respect to the overall codebase. But they aren't what most people out there in the wider world would think of as micro. And Golang remains a niche language at Google, perhaps more popular than server python, but far smaller in usage than Java or C++.

Microservices have always seemed dangerous to me. "Bugs thrive in the seams between libraries, so let's put more and deeper seams in!"

Microservices have an immense cost, and you have to make sure they're worth it. Many teams years ago found it a nice pattern and implemented it because why not, and now we're at the "oops this isn't actually amazing" part of the cycle.

If you work at Google, where the standard is higher, fine, but at 99% of places people end up making very buggy stuff that interrelates in weird ways!

Do you have a reference for the uber microservice difficulties?

this. 100% this.

In my experience, the biggest benefit of microservices is decoupling teams.

Developer productivity is very hard to maintain in a monolithic app as the number of developers increases and the legacy code piles up. Breaking up services and giving each dev team control over their own codebases enables them to develop their own products at their own pace.

If you only have one dev team, microservices are a lot less attractive. However, there are still some benefits, such as being able to refactor parts of your codebase in isolation (including perhaps rewriting them in different languages), and the ability to individually adjust the runtime scale of different parts of your codebase.

I was already achieving that around 2008 by having each team responsible for their modules, delivered over Maven, or in the late '90s by having each team responsible for their COM modules.

No need to over-engineer modularity by throwing distributed systems algorithms into the mix.

This. The problem with decoupling services is usually there end up being a couple services that are critical but not sexy.

No one wants to touch them so they sit around unmaintained until an unrelated change or unpatched security issue comes around. Suddenly you've got a big problem with a mystery codebase.

That sounds very familiar, but I'm not sure this is something that can be blamed on decoupling itself. An unpopular module is going to need as much attention as a microservice from a code point of view. For upgrades / patching, there would be a company-wide process around it that doesn't care that much how the code is organised.

'company wide' process is either, 'squeaky wheel gets the grease,' or, 'no one even knows this exists,' 100% of the time in my experience. This is from going from 10k+ to 150 to 5k+ to 50 people.

I wrote company-wide, but it's not the case everywhere. At some scale you'll want department, or even project-wide process. But the policy should be fairly common - who owns it, what's the response time, how to escalate urgent things, etc.

I mean, I can write policy documents all day. Most of them would have had a better life as toilet paper.

Indeed, microservices is mostly about scaling development.

Microservices is changing dev complexity to ops.

That's why most companies are promoting devs to do ops.

So then you have devops

I think in some ways this is true.

Containerization, autoscaling, service discovery, tracing, metrics and monitoring et al - a lot of it is required to do larger scale, distributed systems. Even if you do not call them microservices.

This is nonsense. You can already do that via libraries. The choice of RPC vs local procedure calls has no effect on scaling development.

IMO the only reason to use micro-services is the one they mentioned - you can have different parts of your system running on different machines so they can be spun up independently. But I think most people aren't "web scale" enough to need that anyway.

Modular programming as well.

I've found there's a happy middleground. You need medium-sized-services that still share code libraries. For many companies, this is often 7 or 8. The key is to combine like business units/features, not necessarily fragmenting at every visible code boundary. A deployed "service" can really just be several HTTP paths and/or gRPC services in one repo. You still get to keep decent separation of work, deployment, versioning, dev focus, etc with these medium sized services without sacrificing the benefits of larger, more centralized/shared reuse.

I’ve accidentally landed on an architecture like this and I’m actually pretty happy about it. It was driven by a desire to kill a large monolith slowly, by extracting key features into separate services. Sold to the customer as microservices because sexy fad of the moment. Our real motivation was we had way too much trouble recruiting in the monolith stack (.NET) and had a surplus of embedded C++, python, and JS engineers. Anyway, turns out our teams naturally self-organize around four or five domain+language clusters that effectively form separate services that are too large to really be micro, but too dissimilar to play nicely together in a monolith. Eg, python data science module wrapped in a Flask API providing physics calcs, legacy C# service providing simple data models via REST, JS react front end served from a separate node service, a weird C++/python hybrid used for an embedded device sim service, etc. It’s not what I planned as the tech lead/architect, but I think we organically reached what is really the best approach for our team. Definitely an element of Conway’s law in action, but in a good way. We are making the most of our organizational structure rather than fighting against it. Would this scale to google levels? No, probably not. But we don’t need it to, and it’s incredibly unlikely we ever would given our specific business.

Start with a monolith that has clear internal APIs that are designed so they can later be made into network APIs. This gives you the development speed of a monolith while maintaining options for the future. When you do break things out into separate services: try to make as few of them as possible and maintain the ability to build as a monolith.
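A minimal sketch of that idea in Go, under some assumptions: `Greeter`, `localGreeter`, `remoteGreeter`, and the `/?name=` URL shape are all hypothetical names invented for illustration. The calling code depends only on an interface, so the implementation can start in-process and later be swapped for one that goes over HTTP.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// Greeter is a hypothetical internal API. Callers depend only on this
// interface, never on a concrete implementation.
type Greeter interface {
	Greet(name string) (string, error)
}

// localGreeter is the in-process implementation used while everything
// still ships as one binary.
type localGreeter struct{}

func (localGreeter) Greet(name string) (string, error) {
	return "hello, " + name, nil
}

// greeterHandler exposes any Greeter over HTTP once it is split out.
func greeterHandler(g Greeter) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		msg, err := g.Greet(r.URL.Query().Get("name"))
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		json.NewEncoder(w).Encode(map[string]string{"message": msg})
	})
}

// remoteGreeter satisfies the same interface by calling the service
// over the network.
type remoteGreeter struct {
	baseURL string
	client  *http.Client
}

func (g remoteGreeter) Greet(name string) (string, error) {
	resp, err := g.client.Get(g.baseURL + "/?name=" + url.QueryEscape(name))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("greeter service: unexpected status %d", resp.StatusCode)
	}
	var body map[string]string
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return "", err
	}
	return body["message"], nil
}

// welcome is calling code; it does not change when the implementation moves.
func welcome(g Greeter, name string) (string, error) {
	return g.Greet(name)
}

// viaNetwork runs the same call through a throwaway local HTTP server.
func viaNetwork(name string) (string, error) {
	srv := httptest.NewServer(greeterHandler(localGreeter{}))
	defer srv.Close()
	return welcome(remoteGreeter{baseURL: srv.URL, client: srv.Client()}, name)
}

func main() {
	local, _ := welcome(localGreeter{}, "khan")
	remote, err := viaNetwork("khan")
	fmt.Println(local, "|", remote, "|", err)
}
```

The network version gains failure modes the local one never had (timeouts, bad status codes), which is why the interface returns an error from day one.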

Forget everything you have heard about micro services. Most of it is bullshit from people who don’t actually think for themselves.

This. If you can't design a well segmented monolith, you can't design a well segmented system of microservices either. The microservices will just be buggier and much harder to fix after the fact.

Design is best evolved. If you can get a lot of that done while you still are able to run a well structured system as a monolith you can save a lot of time. It is cheaper to change an interface in Java/Golang than a REST API.

Could not agree more. I recently led a refactor of a monolithic .NET MVC app and took this exact approach. We made all of the controllers thin, with almost no logic at all beyond specifying the route and dependency injection. Then redirected the request to a “service”, which originally was just a reworked combination of the old controller/model logic hidden behind a common service interface. Then, slowly we replaced the C# services with microservices. So we went from ball of spaghetti to monolithic service oriented architecture lite to actual microservices with the monolith converted into an API gateway. If we didn’t have independent motives for going to microservices, sticking to the clean and well organized internal APIs of the refactored monolith would have been totally fine.

I moved back to monolith and am very happy. I think of the monolith now as a collection of modules. The rule is now, one should be able to drag any of the modules to the top-level of our monorepo and create a new microservice pretty easily when the time comes. I think the microservices book (that came from that Uber engineer...?) suggests a rule of 5 engineers per service.

How are you preventing transaction couplings? For example, module A and B are called by C. C starts a transaction that wraps A and B. If you move B up as a network feature, you lose the transactionality.

Good question, and this rule is meant to be bent in those scenarios. I try to avoid these dependencies if at all possible, but if not possible, the "writes" for those modules all belong to a single service, and any other service, depending on how "pure" I need to be in the project, will make network calls to the other service, or just grab that data directly from the database.

For a more concrete example, I recently built a service ("scraper") that scraped data and upserted a large tree of structured data to postgres in a transaction. Writes were only allowed from scraper, but "api" could SELECT data for reporting to the frontend "web" as much as it wanted. In the future, "api" might be refactored to make internal HTTP requests to "scraper," so they could have totally separate databases.

Usually the answer to this is "we pray to god it doesn't fail".

Our approach to services at Khan Academy is likely a bit different from most. We're sticking with a monorepo (the code for all services lives in one repository). We have a single go.mod file at the top of the repo, so all services use the _same versions_ of dependencies.

We're still building out our deployment system to better support multiple services, but we're planning to redeploy all of the services when library code changes (which is something we're trying to minimize).

All of this ensures that we don't have trouble with services lagging behind on critical updates.

I don't really understand moving to microservices if you aren't going to give the service teams the autonomy to make their own decisions and move at their own pace. Microservices always seemed to me to be more of an organizational strategy than a technical one - if you have services, but they can't operate independently, it feels a bit like you are just creating a distributed monolith.

We're making a certain set of tradeoffs. For example, we're not adopting the "write code in whatever language you want" form of microservices that some folks adopt, because we don't feel like we're large enough to support that.

Like I said, though, we do want to minimize the library footprint. The vast majority of deploys in this new world will be single service deploys, with the benefits that come with that. We already deploy our monolith several times a day. These services will speed that further.

It'll be interesting to see how it works. I would probably fight to not have shared dependencies because the relatively small benefit does not seem to merit the increased coupling between teams - if all teams need to agree before a library can be upgraded, I can't imagine it will be very easy to keep libraries up-to-date. To a certain extent it depends on how large your engineering org is, which I don't know.

I've been in companies that moved from monolith to microservices and I saw it work well except when teams had such tight cross-service dependencies that they had to get other teams' ok before making changes to internal details of their service. Then developer velocity was slower than before because it took time to make the cross-team discussion happen and political capital to make the other team care when they have other priorities.

We'll see how it plays out, but the situation that I've described isn't really different from the one we have today (because we have a monolith). Hopefully, it will be better because of how the Go project is trying to get library maintainers to follow semver and avoid breaking changes. When we need to upgrade a dependency, we can do so in one diff, catching the errors with the compiler and test runs. If the upgrade seems risky, we'll watch that deploy carefully, and we already have a process for "risky" deploys. Plus, these are likely some of the easiest changes to rollback if need be, because they are unlikely to change persisted data.

Ultimately, though, if we find that this plan reduces velocity, it won't be that hard to change later.

Didn't you guys recently decide to write some services in Kotlin? Are you rewriting those in Go now since everything is going to be one language?

We have some dataflow jobs written in Kotlin (we blogged about that in June 2018[1]). We also have an internal service written in Kotlin.

We ideally want one language. But Apache Beam (which is behind Google Dataflow) doesn't yet have production support for Go. More importantly, though, we have no time pressure on switching the Kotlin code over, so that's a long way out.

[1]: https://engineering.khanacademy.org/posts/kotlin-adoption.ht...

Thanks for answering! That was the specific article I remember reading.

Choosing the dependencies' version is not all there is to decide. Teams have a lot of control of basically everything else.

So, no choice of language and no choice of the libraries used. What’s “everything else” exactly? Sounds like missing out on the more interesting parts of a micro service architecture.

Yeah, I'll never work for a company[1] where service teams are free to choose any language; 2 or 3 options at most is fine, but more than that is a hard no. I'll have to read/work on that code sooner or later, and I have no time to be dealing with a hodgepodge of languages.

1. In the 10-5000 employee range: tech giants are a different beast when it comes to team accountability.

> I'll have to read/work on that code sooner or later

In the microservices organizations I've seen, this isn't true. The other service teams provide an API and, like any other SaaS you use, you do not need to be able to read the implementation. You would only work on that code if you switch service teams.

Sounds similar to some work I’ve been doing, so thanks for unknowingly validating my design!

At my employer, I’m spearheading a wholesale reimplementation of outdated process automation software, turning everything into Django web apps.

I’ve been working with a monorepo and monolithic deployments to maintain development velocity but recently started transitioning the CI/CD pipeline to deploy each application/service in the monorepo independently. The pipeline packages common assets (including, e.g., manage.py, common HTML templates, and the dependency spec...all housed in the same monorepo) into each app directory before the deploy stage.

Meanwhile, local developers clone the entire monorepo, and when they launch localhost, all of the services come online simultaneously. (That’s the goal, at least!)

I was already excited to see my work come to fruition, and now I’ll be keeping an eye on Khan Academy, too!

That does sound pretty similar (though our services all have the luxury of serving nothing other than GraphQL!).

Our current plan for local development is to continue cloning the monorepo and firing up all of the services. Go services don't take a whole lot of resources, so we think this plan will work fine for quite a while.

I'm in an org that runs multiple services in Go. The dependency on library stuff has been a very minimal need. I think you are optimizing for an imaginary problem. With microservices, a team should have non-breaking API versions running and work with teams to transition to a new API version when needed. If the underlying uuid lib or Kafka lib changes, teams may or may not need to update, but they can do so on their own time.

IMHO a microservice should follow the unix philosophy of doing one thing well. But interfacing with other services is not as simple as unix pipes. In particular it is more of a request/response communication. Consequently, interfacing needs to be thought through carefully and made as simple as possible. They should evolve much more slowly than individual service code. While you may use (g)rpc or http at a lower level, your specific protocols will have many more constraints. Ideally you have codified assertions about their behavior in tests early on.

Note that they are not all going to start/stop at exactly the same time, and during development they may even crash, so each microservice should survive such transitions. You may have two different versions of service code, or even completely different implementations, or two different versions of the API in use at the same time. This can happen as different services evolve at a different rate. And you may want to transition to a new version in a piecemeal manner so as to not bring the whole service down. All these considerations complicate things. So ideally you have factored out these common tasks into shared libraries/packages. And ideally you write your code such that, if necessary, more than one service can be compiled into the same binary for performance reasons.

In a monolith some things become easier since everything dies at once! But some things become more complicated, such as supporting evolving code. And monoliths require more discipline to keep things modular. Over time this gets harder and harder. Lack of modularity means you have to understand a lot more code, and when you evolve things, more code will have to change and there may be unforeseen side-effects. And scaling can become harder.

I think the fact that you have much more visibility into which of the services are behind and have lower security practices is one of the things I love about our microservices. We recently (a year ago) broke up our monolithic codebase into a service oriented architecture (I would hesitate to call our services micro personally), and I was astounded at all of the hidden security issues and random code in the far-reaching corners of the monolith that hadn't been touched or thought about in years.

It is much easier (imo) to pop into a repo of one of our services and look through the code in its entirety and see when things were last touched and where the issues are. I would make the argument that the "inevitable corners that fall behind and have questionable security" is something that is inevitable in any codebase that grows to a certain complexity, and microservices (or SOA in general) make it much easier to see those things as they are decomposed.

Well, I can only agree with all that... But the move to services adds a lot of surface that needs its own security and architecture maintenance.

The more you break down your code (the smaller the size of the services), the more maintenance need is created from the division, and the easier it is to fall behind on it.

A recent project I was exposed to has been struggling with Microservices and a multi-repo setup. Even with CI/CD and a lot of good tooling around their setup.

The overhead introduced with having such a setup in a corporate environment that has not-so-well-thought-out requirements and design is ridiculous. Keeping track of dependencies, arcane knowledge of inter-service dependency quirks being siloed and hidden, keeping individual services up to date, dealing with older services and their interaction with newer services till they "migrate" to newer tooling/common code, etc.

Everyone then skirts around the fact that the problem could potentially be Microservices or a micro-repo setup. Instead, they throw process, sign-off, complicated promotion pipelines and just plain warm-bodies at the problem in an attempt to mitigate it. But the damage is done, velocity has slowed to a crawl and everyone is miserable, especially when having to explain the whole thing to newcomers.

The best ideas can be implemented poorly. Software design principle that works for nimble Silicon Valley startups doesn’t work in your big corporate environment? Big surprise.

When I worked at a pretty large software co, a team there were always adopting the latest techniques and tools but their deployment pipeline was an over architected disaster that nobody could reliably deploy. Reason? Their CI/CD system consisted of a person manually clicking around to build Jenkins Jobs. There will be other such silly nonsense (hopefully less extreme) in other corporate environments too.

Like many things in life, there are no absolutes. So rather than going to opposite ends of the spectrum, check out modular monoliths.

This approach is a nice balance (IMHO), since you start off with monolith but it is broken down into cleanly separated modules. Each module can potentially become its own micro-service if or when the time comes.

In terms of implementation, this can be easily done. For example, we do it on JVM/Kotlin, where each microservice is its own project that produces a binary/jar. All the projects are part of a multi-project build. Lastly, we have a common project for shared code, utils, types, enums, interfaces, etc. and an application project that loads/sets up all the microservices from each project. Works great so far. And when you do have to break 1 project out into its own service, the effort is fairly manageable.

Do you also break up the data for each service up front or do you pull that out later?

Same. I tried microservices before and while it cleaned up the code base, I didn't quite like the results down the road. Some things go stale, you multiply ops * number of microservices, a change in one can mean a change in multiple others. I'm not against services in general, but not a fan of so called microservices.

I’ve been through a transition with about 60 devs that went DDD plus microservices. A few monoliths ended up as a couple of hundred services, and looking back I feel we got basically all positives.

What other people say about scaling teams is true, but I have a few other points as well:

- personally, I spent 6 months writing tooling for service lifecycle management and setting strict conventions. This was before the microservices decision and I was recruited to do “devops” which gave me a lot of head room. :p

- when talks about microservices surfaced I had to fight for several months to get the lead devs on-board with tooling and conventions. From the infra/cm/systems management side we’re used to managing many thousands of “configuration items” - devs are usually not, and many underestimate the value of proper automated lifecycle management.

- once everyone was on-board we all used a common language. Big win.

- I could form a “devops” team to help develop the tooling further and the infra platform as well.

- almost all teams worked in mobs - amongst other things it really helped with the ownership part, something that’s absolutely crucial. Well defined domains and accountable mob teams, just awesome!

- quality rose by a mile. Small commits and somewhat robust tooling under the eyes of a mob.

- graph the pains. If outdated versions are a problem - put it on a graph, green, yellow, red. Show services in relation to other services - the application is the sum of all connected services.
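The "graph the pains" point can be sketched as a tiny roll-up: colour each service by how long it has gone without an update, then report the worst colour across everything it is connected to. The thresholds (30 and 90 days), the service names, and the `classify`/`rollup` helpers below are all made up for illustration.

```go
package main

import "fmt"

// Status is a traffic-light colour for a service.
type Status string

const (
	Green  Status = "green"
	Yellow Status = "yellow"
	Red    Status = "red"
)

// classify maps days since the last dependency update to a colour.
// The 30/90 day thresholds are arbitrary assumptions.
func classify(daysSinceUpdate int) Status {
	switch {
	case daysSinceUpdate <= 30:
		return Green
	case daysSinceUpdate <= 90:
		return Yellow
	default:
		return Red
	}
}

// severity orders colours so they can be compared.
func severity(s Status) int {
	switch s {
	case Red:
		return 2
	case Yellow:
		return 1
	}
	return 0
}

// rollup returns the worst colour among a service and everything it
// reaches through deps. seen guards against dependency cycles.
func rollup(name string, age map[string]int, deps map[string][]string, seen map[string]bool) Status {
	if seen[name] {
		return Green // already counted elsewhere in this walk
	}
	seen[name] = true
	worst := classify(age[name])
	for _, d := range deps[name] {
		if s := rollup(d, age, deps, seen); severity(s) > severity(worst) {
			worst = s
		}
	}
	return worst
}

func main() {
	age := map[string]int{"web": 5, "api": 40, "auth": 200}
	deps := map[string][]string{"web": {"api"}, "api": {"auth"}}
	fmt.Println("web alone:", classify(age["web"]))
	fmt.Println("web with dependencies:", rollup("web", age, deps, map[string]bool{}))
}
```

Here `web` is green on its own but red once `auth` is taken into account, which is the "sum of all connected services" view.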

I could go on, but the post is getting long! :)

Can you explain what you mean with service lifecycle management in regards to micro services please, or do you have a book on it? I'm currently studying SE and it's the first time it has come up. Thank you :)

Well - I’ve got a “service management” background, so it’s completely natural to talk in these terms. :)

A service has, in a way, two interfaces - one business (the work it’s doing), and one technical (how it does it; a port publishing an endpoint or whatever).

No business process to support, no service to manage.

The lifecycle of the technical service will consist of a bunch of actions that will be taken, perpetually, until the sunsetting/decommissioning of the business process. Actions: init, code pushed, deploy, monitor, trace, update, decommissioning etc.

You take these actions and put them in a lifecycle circle and you have a nice powerpoint!

As much as possible, preferably everything, in this cycle has to be governed by conventions and automations.

- Automatic follow-up if a service has no upstream or downstream services, for example. Why do we have a dangling service?!

- Or, you have a key service tied to an SLA, but upstreams are not matching this?

- you have services that have not been touched in a timely manner.

- etc... just drop the relevant team an automated slack message with the option to initiate whatever is required to keep the lifecycle churning.

With many thousands of assets/ci (configuration items) almost everything has to be automated or you will grind to a stop eventually.

If you can couple business process to automated technical service management - big wins!

Never heard about this concept, I really like it.

To me, it’s kind of what ”DevOps” should be about, from a technical perspective:

Take the best/reasonable parts from ITIL (concepts/principles), mix it with principles from the agile manifesto and the 12-factor app. Automate the lot of it.

Doing this in practice gives you dev & ops.

It’s quite a journey that is more difficult the bigger you are. It scales though, so start small, prove the concepts, and grow organically.

Thanks for sharing, that's pretty exciting. I think most people don't realize the work they have to put in to make this work (and reap the benefits).

Thanks for reading.

The more I write and talk about it, the more I realize that it is about ”externalities”, so to speak.

A microservice is just code - one small piece that does one, tightly defined thing. We’ve always been doing code, and smaller pieces of it are easier to deal with and reason about.

The structure around keeping 100s or 1000s of moving pieces in concert is where a lot of the work is shifted. It takes teamwork as well as a common vision and language. That last sentence touches on “culture”.

> weight of keeping everything updated...monolith repo

You need CD or this is an accident waiting to happen.

> I would love to hear some thoughts from others that made the move, especially anyone that decided to move back to a monolith repo.

We had a big monolith where I work.

We’ve been slowly, but surely isolating parts of the monolith as separate deliverables, extracted into their own repos. But only when appropriate, and not as a forced exercise.

The remaining “monolith” is still pretty big, but it does (mostly) represent one logical deliverable, so effort to split it up has somewhat stalled.

There’s been small points of friction, but nothing near as painful as we used to have it. No way we’re going back.

So everything in moderation. Micro services architecture is a tool. Use it when it’s the right one.

I way prefer monolith compared to dabbling in microservices here: https://natalian.org/2019/05/16/Microservices_pitfalls/

Go is a very weird language :-(.

It's very limited when you start to do complex things. For example, let's say you are building a websocket server. You will have a hard time writing a type-safe websocket handler to process the payloads from the client for all the events...

I started to do Rust/Crystal and both of them are better than Go (performance, type system).
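For what it's worth, one possible shape for typed event handlers is sketched below. It assumes Go 1.18 or later for generics, and it assumes a wire format of `{"event": "...", "data": {...}}`; the event names, payload structs, and the `typed` helper are all invented for illustration. No websocket library is involved: `dispatch` just takes the raw bytes of a frame, however they were read.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envelope is the assumed wire format for every frame.
type envelope struct {
	Event string          `json:"event"`
	Data  json.RawMessage `json:"data"`
}

// Hypothetical payload types, one per event.
type ChatMessage struct {
	Room string `json:"room"`
	Text string `json:"text"`
}

type Typing struct {
	Room string `json:"room"`
}

// handler is the untyped form stored in the dispatch table.
type handler func(raw json.RawMessage) (string, error)

// typed adapts a strongly typed function into a handler by decoding
// the raw payload into T before calling it.
func typed[T any](fn func(T) string) handler {
	return func(raw json.RawMessage) (string, error) {
		var payload T
		if err := json.Unmarshal(raw, &payload); err != nil {
			return "", fmt.Errorf("bad payload: %w", err)
		}
		return fn(payload), nil
	}
}

// handlers maps event names to handlers; each function body only ever
// sees its own concrete payload type.
var handlers = map[string]handler{
	"chat":   typed(func(m ChatMessage) string { return m.Room + ": " + m.Text }),
	"typing": typed(func(t Typing) string { return "someone is typing in " + t.Room }),
}

// dispatch decodes the envelope and routes the payload by event name.
func dispatch(frame []byte) (string, error) {
	var env envelope
	if err := json.Unmarshal(frame, &env); err != nil {
		return "", fmt.Errorf("bad frame: %w", err)
	}
	h, ok := handlers[env.Event]
	if !ok {
		return "", fmt.Errorf("unknown event %q", env.Event)
	}
	return h(env.Data)
}

func main() {
	out, err := dispatch([]byte(`{"event":"chat","data":{"room":"go","text":"hi"}}`))
	fmt.Println(out, err)
}
```

The event name to payload type pairing is still checked at runtime rather than by the compiler, so this narrows the complaint rather than removing it.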

Yet, whenever I build something for work, I come back to Go :-(. I told myself to use Rust or Crystal.

Then I realized that Go is a practical language. It compiles fast, which makes testing easier. The cross compiler makes it so easy to build binaries that run on everything. And the limitations of Go make it very consistent in how you do things. This makes working with Go faster overall, even though it slows you down in other parts.

So I think Go is a language that people easily fall into because it has the development speed of an interpreted language like Ruby/Python (or even faster) and has a better performance/type-safety story.

Java and C# provide the benefits you mentioned, while being more expressive languages and having better runtimes compared to golang.

Hardly. OP mentioned fast compilation and limited ways to code the same solution.

C# and Java are slower to compile and offer way more options to do the same thing.

Here's Uncle Bob's take on testing with Go: https://m.youtube.com/watch?v=2dKZ-dWaCiU&t=36m40s

> C# and Java are slower to compile

Not for any meaningful work in my experience. As a matter of fact, I found the change/compile/run loop in golang to be slower on projects I've been working on due to the fact that it doesn't support incremental compilation, so any change I make ends up recompiling the entire program and writing out a 100+MB binary anyway. Compared to a Scala project I worked on before (and Scala is notorious for slow compiles), after the first compilation, all modifications happen very quickly as only the respective classes are re-compiled.

> Here's Uncle Bob's take on testing with Go

Again, this doesn't apply for any non-trivial/large project. On a project I'm working on, it literally takes 7-8 minutes to do a clean build + run all unit tests in golang.

Go absolutely does incremental builds by default and has been like that since I can remember. Packages are only rebuilt when their source or their dependencies change.

Same for tests which are cached by default so during typical development only a subset of tests are executed and compilation time can be a big part. Leave full tests for CI.

An anecdote I found from 5 years ago:

On my 1.7GHz processor it takes 10 seconds to build the whole standard library from scratch (300k lines of code).

It's fast AF.

just trying to understand - you guys think moving a Python2 monolith to Python 3 is too painful, and so you are going to port all the code from Python2 to a completely new language (Go), change the architecture (monolith -> microservices) and move the HTTP API to React + GraphQL, all in one year?

2020 is going to be an interesting year at Khan Academy ;-)

The move from Python 2 to 3 would likely have also involved changing the architecture so they could migrate components incrementally. Since they were going to do that regardless, and if they already wanted to change the interfaces from HTTP to GraphQL, this is a natural time to do it. Though, this migration has nothing to do with React--they were already and will continue to be using it.

That isn't what they said. They carefully explained that they could migrate to Python3 but that the benefit of doing so was small so they looked at the performance benefits of using other languages and decided that the performance benefit was large. The performance benefits of using Go (or Kotlin) were the deciding factor.

My guess is that the framework for them thinking about this is that they have already been thinking about migrating languages to get better performance and the Python3 migration seems like completely wasted effort if you then throw it all away to go to another language shortly after.

2020 is absolutely going to be an interesting year.

One thing that might not have come across clearly in the blog post: we're already well on the path to using React everywhere (we started using React a week after it became public… 6 years ago?). We made the decision to move to GraphQL in 2017, so we've already got a lot in our GraphQL schema. Finishing those switchovers will make our move to Go happen more quickly.

At least the article makes it seem it's not a decision taken on a whim and they did some kind of POC and planned the transition.

Of course the obvious thing missing in the article is how they expect to deliver new business features while recoding everything in a new language.

It's fun to read this and the Etsy thread currently on the frontpage as well.

> Of course the obvious thing missing in the article is how they expect to deliver new business features while recoding everything in a new language.

For new features, the new parts of the GraphQL schema for those features will be written in Go as part of the new services. Our frontend is already in large part a single page app in React which requests data via GraphQL, so the frontend for the features will look just the same as it would on our monolith.

> If we moved from Python to a language that is an order of magnitude faster, we can both improve how responsive our site is and decrease our server costs dramatically.

I see people say things like this a lot but my experience is that while other languages are 10x or more faster than python in some benchmarks it's very rare that computation time dominates server latency or that servers are running at 60%+ cpu across all cores.

If 90% of your service latency is not directly on the cpu and/or you haven't profiled to see that the performance bottleneck is evenly distributed across all tasks, then it's super dangerous to migrate to a new language thinking that will fix it.
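To put a rough number on that, here is a back-of-the-envelope sketch using Amdahl's law. The 10% CPU share and the 10x figure are just the illustrative numbers from this thread, and `overall_speedup` is a made-up helper name, not anything from the blog post.

```python
def overall_speedup(cpu_fraction: float, cpu_speedup: float) -> float:
    """Amdahl's law: speedup of a whole request when only the CPU-bound
    fraction gets faster and the rest (I/O, waiting) stays the same."""
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    if cpu_speedup <= 0:
        raise ValueError("cpu_speedup must be positive")
    return 1.0 / ((1.0 - cpu_fraction) + cpu_fraction / cpu_speedup)


# Hypothetical request where 10% of the latency is CPU time.
print(round(overall_speedup(0.10, 10.0), 3))  # 1.099
# Same language change when 90% of the latency is CPU time.
print(round(overall_speedup(0.90, 10.0), 3))  # 5.263
```

This only models request latency. Cost on a platform billed by CPU used is a separate calculation, which is why profiling first matters.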

I hope people inside Khan Academy know this and it's just a clickbait blog. If they really think "go is 10x faster than python so we'll only need 1 server for every 10 when we migrate" then I think they'll be disappointed.

* It’s not just that Go is faster to run but also faster to iterate on. If Python can be neither, it’s offering little benefit.

* They moved from a monolith to a microservices architecture; the concern that any service in the request path could add latency just because the overall runtime is slow is a legitimate one.

* Their primary deployment method is Google App Engine, where you are billed by CPU used. Any change that consumes less CPU has a tangible effect on their costs.

The claim that Go is faster to iterate on is absolutely false. Where do you get this idea from?

Having worked in larger async twisted Python and Go, the experience of our teams is that Go is faster to iterate on by a long shot. We've replaced most of the old Python. We just brought on some new devs on my team. They were able to make new contributions to the Go stuff in short order. The Twisted Python, not so much.

I can easily ask the vice versa of such a bland question. Answer that first if you want a real answer instead of trying to flame.

It’s not bland, it’s direct. What you’re doing is conflating personal preference with actual language features. Most developers who aren’t us would say Python, being a higher-level, dynamic scripting language, is easier to iterate in than Golang, which is lower level and extremely explicit about the data your program is using. Iteration in Go is simply harder, as you have to be more explicit rather than sketching something out. Changing that more explicit stuff is harder if you got it wrong while iterating. Why do you think Golang is better at iteration?

Add in extremely poor error messages, the lack of generics, having to generate loads of code for various things, the ability to crash whole services if your program does something incorrect, and excruciating error handling, and all of it will slow you down.

It’s funny that none of the things you mention in your last paragraph are handled any better in Python. Without type safety, you instead have to deal with bugs caused by inane mistakes. If we’re talking about generics, the conversation has shifted from “writing scripts that do x” to “maintaining a production system”, and for the latter, Golang’s type safety, excellent out-of-the-box tooling and easily grokkable concurrency primitives make it far more maintainable than Python.

Things are also easier to change in Go because of the type system and interfaces. The former catches most of the obvious incompatibilities; the latter ensures that abstractions don’t leak across different system boundaries, whereas in Python there is a tendency to pass do-everything objects across the system.

Error checking has improved substantially with error wrapping in go 1.13. Not only can you locate precisely where your system failed; you have to be explicit about handling errors. I do concede that pre error wrapping the error handling was garbage.

Clearly I don’t agree, but it’s an interesting and detailed answer. I do agree that type systems help with refactoring, but not with iteration. There’s a fine line there. Personally I think Golang’s type system isn’t as good as it could have been... I do like the idea that Golang purports to provide (simplicity above everything); if they added proper macros to replace code generation and to satisfy the desire I have for generics, it would be a much more useful language for my needs.

Dynamic languages generally enable faster iteration in the early stages of a project, but once you have a large, mature codebase, static types allow you to work faster and produce fewer bugs. A more performant language will also run your test suite faster which can have a big impact as a project gets very large.

Also, while I agree Go’s error handling isn’t very elegant, it does force you to explicitly consider every potential error, which in my experience makes uncaught errors far less likely than a language with bubbling exceptions.

IMHO the affirmation of

> Go is faster to iterate

holds true considering the total lifespan of the project. Golang is more explicit, and so requires more time to define every type, but I've never refactored a codebase so quickly and safely. In Python, the fact that it is dynamic makes it more difficult to iterate safely (this is really static typing vs. dynamic typing). As for error handling, it's not perfect, but the code is readable, easy to follow and easy to reason about.

> Iteration in go is simply harder as you have to be more explicit rather than sketching something out.

I will completely agree that prototyping in Python is way faster. Python is my preferred language for throwaway/prototype code.

> Where do you get this idea from?

It's based solely on my experience. I hope it contributes to the conversation.

Well Brad, that's very dependent on what exactly the service is. Moreover, oftentimes low CPU utilization is actually a limitation of the implementation in a slow language (e.g., Python technically does have async webservers, but they add a lot of idle overhead).

Indeed, there are many benefits these more performant languages have over Python aside from raw single-core performance. For starters, more efficient concurrency and parallelism can help reduce average latency when combined with a quality async webserver. Then there's gains due to shared memory across threads.

So in many cases-- absolutely, you can only need 1 server vs. 10 when you migrate. It's thus not fair to say that these gains are "very rare".

I inherited a few python/django servers. One of these has workers that grow to about 1Gb over time, even though they retain absolutely no data in core (or shouldn't). The same server is used to collect data, convert things and analyze the data. Especially the latter can take a bit of time, which means that there is a problem when more than two people try it at the same time, since it severely hinders the other tasks.

I'm now converting one server to Go (although not the heavy one), and it really runs fast and uses much, much less memory. It also starts in less than 1s, whereas the django application takes 5 minutes, because of some stupid problem in static file collection.

Python is fine as a teaching tool, to prototype in, and to use in notebooks as a wrapper around numpy, scipy, etc., but not to run in production.

> while other languages are 10x or more faster than python in some benchmarks it's very rare that computation time dominates server latency

Most applications spend most of their time waiting for the database or network. I suspect the fastest programming languages are those that have the lowest thread/process overhead. If most apps spend their time waiting, then a language with 10X lower process overhead can handle 10X more processes.
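A minimal sketch of that point, assuming each request is dominated by a wait (simulated here with `asyncio.sleep`; the handler name and the 50 ms figure are made up for illustration). One thread overlaps all the waits instead of paying for a process per request.

```python
import asyncio
import time


async def handle_request(request_id: int) -> int:
    # Stand-in for waiting on a database or network call.
    await asyncio.sleep(0.05)
    return request_id


async def serve(n: int) -> list:
    # All n handlers wait concurrently on a single event loop.
    return await asyncio.gather(*(handle_request(i) for i in range(n)))


start = time.perf_counter()
results = asyncio.run(serve(100))
elapsed = time.perf_counter() - start

# Run one after another, the 100 waits alone would take about 5 seconds.
print(len(results), f"{elapsed:.2f}s")
```

The same idea is what goroutines, Node's event loop, or reactive JVM frameworks provide; the sketch says nothing about CPU-bound work.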

You can also avoid using a separate process for each client (NodeJS).

> You can also avoid using a separate process for each client (NodeJS).

Yes, NodeJS solves one problem by having low process overhead, but it also fails to take advantage of parallelism in modern processors. Ideally, I'd like to see a system with both.

> Ideally, I'd like to see a system with both.

Java, or Kotlin, using one of the reactive frameworks.

You can always use the 'cluster' module or worker threads.

Sorry if this wasn't clear: Go is 10x faster than Python, yes, but we know that we're not going to reduce our server count by 90%. 50% is quite possible, though, given Go's superior threading and its good resource use. Moving away from the monolith should also give us new optimization possibilities.

Migration from python 2 to 3 is easy and fast. I've migrated multiple large apps and it took about a day each. Most libraries that matter have been migrated. Some don't even support python 2 anymore. It's practically 2020. This should not even be a consideration. After 2 to 3 is done they should consider again whether they want to redo the stack, but first I'd focus on this small maintenance task.

Hahaha, sorry but this is a very cute thing to say, in my view. At our company we just barely finished migrating our software with nearly a million lines of legacy Python 2 code to Python 3. This took over a year of nearly exclusive migration effort, just making our code work with both. The entire migration project started way before I joined the company several years ago.

So, no, things are not as simple if you're not dealing with toy projects. And no, you can't assume that it's the same for everyone if you're not in their shoes.

Your comment is pretty much the equivalent of "I don't see a bug. Works for me."

It very much depends on how good the codebase is. I also spent a year on and off porting a large codebase from 2 to 3, and it would have gone an order of magnitude faster if the codebase were in better shape.

I agree that code quality is a big factor. But what is good code quality in Python? In our case, the oldest code is the most "pythonic" and is at the same time the worst to maintain. The better code mitigates the drawbacks of dynamic typing and by that moves away from the pythonic standard you see in many libraries.

But even if you nail the types to the board (e.g. assert isinstance(...)), use (the somewhat weak) Mypy wherever you can, and have good test coverage, you still have to grep your code base for usage of, e.g., .keys(), eyeball hundreds of modules for subtle Unicode madness or hunt for the odd division, replace every sort() that doesn't use key= yet, etc. The todos add up and someone has to go into the code and change those lines.
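For anyone who hasn't hit "the odd division": a small sketch of why it can't be fixed blindly. The function names are hypothetical; the behaviour of `/` and `//` is standard Python 3.

```python
def pages_needed_py2_meaning(items: int, per_page: int) -> int:
    # In Python 2, `items / per_page` on two ints floored the result.
    # To keep that meaning in Python 3 it has to be spelled `//`.
    return items // per_page


def ratio_py3_meaning(items: int, per_page: int) -> float:
    # In Python 3, `/` is always true division, even on two ints.
    return items / per_page


print(pages_needed_py2_meaning(7, 2))  # 3
print(ratio_py3_meaning(7, 2))         # 3.5
```

Which of the two the original author meant is exactly what someone has to decide at each call site.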

What specific “pythonic” habits have you found are more difficult to maintain? Asking out of genuine curiosity, not to challenge the premise. I work with a lot of data science people that really emphasize being pythonic, but coming from the software/static typing side of the house I always find their code style and architecture a little concerning, and I’m not sure if I’m just not getting it or if they really are writing spaghetti.

One of the biggest footguns is method naming. Most Python libraries will gladly use generic method names like "add()" or "getName()". The moment you need to rename the method or change the signature, you will have a hard time telling it apart from all the other method calls by the same name. No type inference will save you here because type inference is incomplete and will never let you find all the callers.

What you should do is use unique names. But that will give you ugly code like myFoobar.foobar_addBar(). The kind of code that makes the pythonic crowd cringe.

Another problem is making code too generic with regards to what types it consumes, instead of nailing it down to the few types you're ever going to use here. This makes it hard to reason about your code months and years down the line. How is this method used in the rest of the code base? Do all callers expect an int? What if my method now happens to return a float?

And there's also abuse of duck typing. Throw around a lot of objects, sprinkling methods and other members on to them as you go. Then when you consume the object, just look if it has the method you want to call. This makes any kind of type checking and static type inference useless.
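A small sketch of the pattern being described, next to one possible alternative. All class and function names here are invented for illustration, and `typing.Protocol` is only one way to pin the interface down.

```python
from typing import Protocol


class Report:
    """A plain object with no export method of its own."""


def export_duck(obj) -> str:
    # The pattern described above: probe for the method at the point of use.
    if hasattr(obj, "to_csv"):
        return obj.to_csv()
    return "unsupported"


class SupportsCsv(Protocol):
    # Hypothetical interface naming the one method callers rely on.
    def to_csv(self) -> str: ...


class CsvReport:
    def to_csv(self) -> str:
        return "a,b\n1,2"


def export_typed(obj: SupportsCsv) -> str:
    # The annotation gives a static checker something to verify per call site.
    return obj.to_csv()


patched = Report()
patched.to_csv = lambda: "x,y\n3,4"  # method sprinkled on after construction

print(export_duck(Report()))                    # unsupported
print(export_duck(patched) == "x,y\n3,4")       # True
print(export_typed(CsvReport()) == "a,b\n1,2")  # True
```

At runtime both versions work; the difference is that the second gives tools and readers a declared interface to check against.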

And then there's a whole lot of Python 2 libraries where you get the feeling that the authors didn't give too much thought about whether they are dealing with str or unicode. The method might just call .encode(...) on one of its arguments without being too sure what it is.
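To illustrate in Python 3 terms: `bytes` no longer has an `.encode()` method, so code that calls it without knowing what it holds fails with `AttributeError`. A minimal sketch of being explicit at the boundary (`to_wire` is a hypothetical helper, not from any library):

```python
def to_wire(value, encoding: str = "utf-8") -> bytes:
    """Return bytes for either text or already-encoded input."""
    if isinstance(value, bytes):
        return value  # already encoded, pass through untouched
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


print(to_wire("é"))              # b'\xc3\xa9'
print(to_wire(b"\xc3\xa9"))      # b'\xc3\xa9'
print(hasattr(b"", "encode"))    # False
```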

And every one of the mistakes that result from the above practices might only pop up when your code has already been shipped to the customer site.

Of the things you mentioned, only Unicode has been a problem, and that's exactly what 3 is fixing, so that's to be expected. The rest was automatically handled by 2to3 with only a cursory review.

We tried 2to3 and it gave us poor results. But probably because we deviated too far from being pythonic.

The str/unicode misery is one of the biggest gripes I have with Python. I'm glad this unpleasant knot has been mostly untied in Python 3. I came to the conclusion that the transition would have been much easier if Python 3 just concentrated on the separation of bytes and (unicode) strings. The other features could have been in Python 4.

It's a bit like IPv6. If it just solved the address space problem, most would have moved to it already. Instead it comes with a lot more baggage. And each additional feature has its own uphill battle for acceptance. So nearly everyone is dragging their feet, citing their pet peeve with the technology.

I'm not sure that's true because, as I said, most of it is automatic and worked well with 2to3, leaving us to deal pretty much only with Unicode. I'd certainly prefer to only have to do this upgrade once.

How do you automatically go from sort( ... some elaborate compare function ...) to sort(key=some completely different function). Yes, there's a wrapper, but it makes the code more convoluted instead of transforming it to the key-paradigm. And if you want to sort by several keys, now you will have to call sort several times.

How do you automatically infer the intention of somedict.keys()? Is it going to be used as a list or as an iterator?

Those are just off the top of my head. I don't remember all the cases where 2to3 tripped over and produced garbage. But there were too many cases to put actual faith in automatic conversion.

It might work if your code is fairly new and homogeneous. But looking at how much trouble Dropbox had, even with all the tooling and Guidos they could muster, I have the feeling that your positive experience with 2to3 might be the exception rather than the rule for old and big code bases.

> How do you automatically go from sort( ... some elaborate compare function ...) to sort(key=some completely different function). Yes, there's a wrapper

Yes, there's a wrapper. You use it, add a comment "this is wrapped in the migration to 3" and move on.
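For reference, the wrapper in question is `functools.cmp_to_key` from the standard library. A sketch with a made-up comparison function, showing the wrapped form, a hand-written `key=` equivalent, and a tuple key for sorting on several fields in one pass:

```python
from functools import cmp_to_key


def compare_versions(a: str, b: str) -> int:
    """Old-style comparison: negative, zero, or positive."""
    pa = [int(part) for part in a.split(".")]
    pb = [int(part) for part in b.split(".")]
    return (pa > pb) - (pa < pb)


versions = ["1.10", "1.2", "1.9"]

# Wrapped as-is during the migration to 3.
print(sorted(versions, key=cmp_to_key(compare_versions)))
# ['1.2', '1.9', '1.10']

# The same ordering rewritten in the key paradigm.
print(sorted(versions, key=lambda v: [int(part) for part in v.split(".")]))
# ['1.2', '1.9', '1.10']

# Several keys at once: sort by age, then by name.
people = [("bob", 30), ("alice", 30), ("carol", 25)]
print(sorted(people, key=lambda p: (p[1], p[0])))
# [('carol', 25), ('alice', 30), ('bob', 30)]
```

A tuple key covers the common multi-key case; mixed ascending and descending orders on non-numeric fields can still need multiple passes or the wrapper.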

> How do you automatically infer the intention of somedict.keys()? Is it going to be used as a list or as an iterator?

If it's being iterated on first thing, it's an iterator. If list methods are called on it, it's a list. This just hasn't been a problem for us. Sure, it took some looking at, but it wasn't more than 30 seconds per case.
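A quick sketch of the two cases in Python 3, using a throwaway dict. Iteration works unchanged; list-style use needs an explicit `list()` because `keys()` now returns a live view.

```python
inventory = {"apples": 3, "pears": 5}
keys = inventory.keys()  # Python 3: a view object, not a list

# Case 1: iterated on first thing. No change needed.
names_seen = [name for name in keys]

# Case 2: list methods or indexing. The view does not support them.
try:
    keys[0]
    indexable = True
except TypeError:
    indexable = False

as_list = list(keys)    # explicit snapshot, safe to index or mutate
inventory["plums"] = 1  # the view follows the dict, the snapshot does not

print(indexable)   # False
print(as_list)     # ['apples', 'pears']
print(list(keys))  # ['apples', 'pears', 'plums']
```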

> I have the feeling that your positive experience with 2to3 might rather be the exception than the rule for old and big code bases.

Maybe so, but the codebase was ten years old and hundreds of thousands of lines.

The two projects I migrated were good-quality code at the scale of hundreds of thousands of lines, though I find that metric a bit inappropriate. I agree the statement was a bit generalising and not appropriate for every project, and million-line projects should have been excluded. To be honest, I find a year-long dedicated migration effort a bit excessive. But who am I to judge? My experience was smooth, with just a few hurdles around byte/string issues, and that was it.

FWIW, our codebase is of similar size to that and we estimated it would take around a year to migrate, which is why spending a bit more time and ending up with a Go-based system on the other end was appealing.

After all it will still be Python.

Khan Academy's rationale might be wrong, as the transition path from Python 2 to Python 3 might be easy.

But in the end they would have the same stack as before, and that's clearly what they are trying to avoid. Given that, it makes sense to transition from a dynamically typed language to a statically typed one that offers more compiler feedback.

Why do they call them "micro services" and not distributed systems? Oh right, it's because distributed systems are obviously really hard to create correctly and no sane person would ever agree to pay for that.

Nice: re-branding. I can't wait for the, maybe "consolidated computing" manifesto (aka turning micro services back into monoliths).

What if the services you are writing are independent in that they solve separate business problems, are built by separate teams, have little to no data coupling (e.g. Only basic auth), have different scalability profiles, etc? Separate services are really effective for these cases. Neither micro services nor monoliths are silver bullets. Instead it's possible for each approach to be the best approach in a particular business context.

In several decades of IT experience, I've not known or heard of a nontrivial system like the one you describe in the first part of your message.

In the latter part, that must be a disingenuous dichotomy. You don't really believe that just because an avenue exists we should include it in an evaluation?

What about writing modular libraries then?

People are willing to pay for it when you have 50+ engineers trying to push code through one deployment pipeline. There's an inflection point somewhere at which the cost of sharing deployments is no longer worth it.

I actually think this is where good software design and bounded contexts come in. You can perfectly well run a monolith with hundreds of developers if each of the sections is very well contained. There is no need to add network partitions everywhere to enforce this!

This. Usage of multiple nodes, networking, redundancy, etc. is because of operational concerns. It shouldn't be over "development concerns", which is exactly the wrong reason to embark upon such a long-winded journey!

I'm trying to imagine that (real or imagined) benefit outweighing the cost of solving distributed transactions.

Why are you getting hung up on nomenclature? The point of the article is clear. If you feel that using 'microservices' is too trite or "buzzword-y" then that's about you and not the article.

I guess you're agreeing with me sarcastically? Very funny.

I think some of the misunderstanding in these comments comes from not fully appreciating the perspective of not-for-profit organisations. While I can't speak for Khan Academy, I know that in every NFP organisation I have worked for there is an acute awareness that funding could dry up one day and the prime directive is to ensure that in a scenario like that, the work of the organisation can continue.

In this case, it leads to a higher concern about minimising the cost of the operational services than you might have in a for-profit organisation. In all the strategic planning I have been involved in with NFP, we always have the "what if worst case scenario arises" plan and in that plan the ability to scale down to bare minimum operational cost is key. It may not be conscious but I suspect that may be part of the reason the performance savings from moving to Go are so attractive in this case, where most profit-making companies just ask the question of whether they can afford to pay for the servers with their current margin or not and if they can they have more important things to worry about.

> Go, however, used a lot less memory, which means that it can scale down to smaller instances.

I could be mistaken, but this sounds like they went ahead with the default JVM settings, where it tends to use as much memory as it is allowed to (which makes sense from a utilization and efficiency perspective). If memory usage is a concern, the JVM can be tuned for that.

The JVM hasn't yet become fully container-friendly because it bases its calculation on the host OS figures, not the container figures.

You can use a calculator to get precise, proven settings for any supported JVM: https://github.com/cloudfoundry/java-buildpack-memory-calcul...

I thought this was a solved issue since JDK 10: https://www.docker.com/blog/improved-docker-container-integr...

In my understanding, no. It's improved but not fully "fixed". Unfortunately that's as far as my understanding extends.

So, some potential pitfalls:

- The decision seems to be primarily a software architecture one, without much mention of all the other architects whose input will shape how the finished product is run and supported. In a modern software development environment, all the other parts of the org should be consulted on greenfield work to "Shift Left" anything that may need to change down the pike. Design in a silo leads to ineffective products.

- They're going from "hmm we need to upgrade from Python 2 to Python 3", to "we need to redesign everything in a new language with a radically different software architecture". This is definitely the second system effect. It's going to take years to make this thing reliable and sunset the old product.

- They're porting over the logic? Even if this is actually the right move, wouldn't a clean-room implementation potentially give better outcomes?

- Why are they continuing to use App Engine if the writing's on the wall for 2024?

I don't disagree with your pitfalls, but I do think we're working to avoid them.

To your first point, "The decision seems to be primarily a software architecture one...", this project has had involvement of the whole engineering team since the beginning. The whole org is on board with this change. It's definitely not happening in a silo.

> This is definitely the second system effect. It's going to take years to make this thing reliable and sunset the old product.

I hope not, but obviously we're not done yet, so I can't say how long it will end up taking to completely decommission the Python 2 app. What I can say is this: there are aspects to this project that are _simplifying_ our system and, for what's left moving from Python to Go, our intention is to port the business logic as close to a straight up port as we can get.

> - They're porting over the logic? Even if this is actually the right move, wouldn't a clean-room implementation potentially give better outcomes?

_That's_ second system effect, to me. We can't change everything and fix every problem now, so we're focusing on the changes that will help us move from Python to Go faster.

> - Why are they continuing to use App Engine if the writing's on the wall for 2024?

I don't think Google Cloud is disappearing in 2024, for one. Beyond that, again, we're not changing everything about our architecture. The way our data is stored is staying the same.

> The whole org is on board with this change. It's definitely not happening in a silo.

Given the seemingly strong chorus of voices responding with cautionary tales about why you might want to rethink this plan, and the number of engineers in your organization, it seems more likely that you have some dissenting voices who have either been too scared to speak up or have already been shot down.

What I meant by "the whole org is on board" isn't that there weren't other opinions. There have been multiple opinions on almost every decision we make (and we have an open process and document our decisions in a style very much like this one[1]). In the end, it's not about "shooting down" alternatives, since that's loaded language. It's about making what we think is the best choice we can with the information available to us.

Even in this thread, there's a chorus of voices sounding caution based on their limited information of what our situation looks like, but there are others who see why we're doing this, based on the same limited information.

We absolutely do know the risks of this project, which is why we're doing this as incrementally as possible.

[1]: http://thinkrelevance.com/blog/2011/11/15/documenting-archit...

Go + AppEngine is the most unstable combination I have ever seen. While we tried to deliver a project over the course of a year, it was almost fully rewritten a couple of times because of new Go or AppEngine APIs. We've had far fewer problems with NodeJS. And AppEngine has a huge price tag.

GAE v2 lets you use docker containers and ime it has been pretty stable and fantastic.

App engine isn't going anywhere. Don't be so dramatic.

Looking at this case, where Khan Academy is migrating their server only after about 10 years, I realize that I don't have to worry that much about being locked into certain technologies (unless it's clearly untransferable, e.g. storing part of customer data on a 3rd-party server), because after all, we might keep it only for about 10-20 years, and the thing I'm working on almost certainly will only last < 2-3 years.

> We’ll only generate web pages via React server side rendering, eliminating the Jinja server-side templating we’ve been using

I’ve been down this road. Deep down this road. Let me just give you a heads up on something I didn’t consider at the time: Most template languages do not parse every single node, one by one. In a sense they are just doing string concatenation. Not so with server side rendering and React. I’m not saying it can’t be done but just realize it is going to take a lot more compute power. Caching is great of course but won’t help you if you plan to customize user content during the server side rendering as well. My recommendation is that you don’t do any user authenticated stuff during SSR.

Also consider how you are going to handle cookies if you do plan to make authenticated requests to server side rendering. Also solvable but for some reason people had the hardest time understanding why we had to forward cookies to the domains we controlled in an API request and definitely not to any other servers.

I’m not sure I would pick React for an SEO driven website. It is hard to get a competitive “time to first byte”. Unless of course you can pre warm a cache of every one of your pages.

Lastly, you’re going to need Node for the SSR. I’m sure you know this but that might take you out of app engine and into cloud compute. Not a big deal but thought I’d mention.

Good luck! It is doable. If you ever want to chat about how we solved some of these problems I’d love to save you some time if I can. Hit me up in my profile email.

Thanks for the suggestions and the offer to chat!

We've been doing SSR for quite a while now and are improving our CDN use as we go along. We already took steps to ensure that there's no user-specific information showing up in our server-side react rendering which would damage cacheability.

Our frontend infrastructure team essentially owns the React render server. I'll let them know you offered to chat.

I've been in a similar boat. We've been splitting up or converting large Python 2.6/2.7 applications into Go services (and doing the same to large Perl applications) for a long time now.

Go has consistently been 10-20x more performant (allowing for dramatically reduced hardware needs), easier to maintain, and more productive to write code in than our previous Python (Twisted) and Perl (AnyEvent).

Hopefully KhanAcademy has solid telemetry data in both legacy and new code so they can quantify benefits. They will also have a learning curve for managing multiple micro services vs monoliths. Accessing shared data will be a problem they will likely have to solve. We've opted for each service controlling its own data - no reaching into another service's data behind its back. Everything through APIs. This gives the microservice the ability to alter its datastore as it needs to and not be blocked by other teams' need to update how they access the data.

Debugging a distributed solution is much harder than a single service. Distributed tracing, consistent structured logging with log aggregators that let you do fancy searches (like Splunk), and application telemetry and metrics will be even more important than before.
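As a rough sketch of what "consistent structured logging" can look like with only the standard library: one JSON object per line, carrying a trace ID so an aggregator can stitch a request together across services. The field names, logger name and trace ID below are assumptions for illustration, not any particular vendor's schema.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Fields passed via `extra=`; absent ones are logged as null.
            "service": getattr(record, "service", None),
            "trace_id": getattr(record, "trace_id", None),
        }
        return json.dumps(payload, sort_keys=True)


stream = io.StringIO()  # stands in for stdout or a log shipper
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("payments-example")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info("charge accepted", extra={"service": "payments", "trace_id": "abc123"})
line = stream.getvalue().strip()
print(line)
# {"level": "INFO", "message": "charge accepted", "service": "payments", "trace_id": "abc123"}
```

In a real system the trace ID would come from the incoming request rather than a literal, and every service would agree on the same field names.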

Does sound similar!

We have established the rule that each service owns its own data.

We've already got Stackdriver set up to give us distributed tracing and have set up standards around logging.

The article says this: "Moving from Python 2 to 3 is not an easy task."

I disagree with this. It's a Python project's dependencies that make it hard to move from 2 to 3, and most libraries have been updated.

Of course, you could argue that it isn't easy to migrate a codebase from one major version of a language (or framework, or database) to another, but when you eliminate easy from your vocabulary it becomes harder to describe different levels of difficulty.

That’s precisely why it’s not an easy task!!

migrating from python 2 to 3 is such a large task that migrating to any other language is a comparable effort. this is not just a library problem; the language itself changed significantly

source: no python services at my company are going to be migrated to python 3; it’s all moving to a JVM

> migrating from python 2 to 3 is such a large task that migrating to any other language is a comparable effort

I'm going to call BS on that one.

If you're having issues with Python 2, then it might make more sense to switch to another language instead of upgrade to Python 3. But going from Python 2 to 3 is much easier than switching languages completely.

Python is not a perfect language. There is no perfect language. It sounds like your company just had a reason for switching to a JVM language and the Python 2 EOL was a justification to start.

I agree - the statement that migrating from Python 2 to Python 3 is a comparable effort to migrating from Python 2 to Go feels grossly exaggerated.

At least it is more exciting for a developer to migrate from Python 2 to Go than to Python 3. :D

I'm skeptical too but if they're doing a lot of communication with other services and they relied on bytes and ASCII just working and the code isn't backed up by tests then I can see them having a very bad time going from Py2 to Py3.

N=1 and all that, but I ported 500k lines of a Python 2 monorepo to Python 3 this year and it took like two weeks, including a week spent reading Eevee’s post on the subject half a dozen times and playing with six and futurize.

Migrating from 2 to 3 can be a large task in some very rare cases possibly, but for the most part it is practically effortless if you don't have to support both simultaneously. The biggest hurdle might be the "fear" of the unicode change, but that can be dealt with.

Source: All python services at my current workplace are in the process of being migrated to 3.*, and I'm doing one of the main ones at the moment and it's a breeze, including compiled c-extensions.

For curiosity, a good list of actual python3 syntax changes: https://docs.python.org/release/3.0.1/whatsnew/3.0.html#over...

What? I've done it on some sufficiently large code bases, and small ones, and it was done way faster than a rewrite. With tools like 2to3 you can assign it to an intern and have it done pretty quickly.

It's not quite that easy.

Numerous large projects and companies have publicly stated that they are stuck on Python 2 and it's easier to migrate languages, even to ones that they have to invent (Go) than to migrate to python 3. At least one of these companies had Guido on their staffs for years. Another, also with Guido on the staff, needed over three years to migrate from 2 to 3. The overwhelming body of evidence shows that migrating a large project from 2 to 3 borders on impossible, but there's always someone willing to pop up on HN to say how easy it is.

wait - what?

Is Django a big enough project for you?

Did you know that Django was not only successfully migrated from Python 2 to Python 3, it was ported in such a way that for many years it used the same codebase in both languages ...

Perhaps that's the biggest advantage of porting from 2 to 3. A lot of the code could run in both languages.

The corollary of my complaint is there will always be someone who pops up on HN with no idea how many lines of code are in a "large project" like Dropbox or YouTube.

This is kind of a cop-out, how would anyone know?

Is Dropbox open source? Is Dropbox even a typical Python application representative of the challenges of porting from 2 to 3?

My hunch is that the challenges of porting Dropbox to any other language have less to do with Python and more to do with the need to deal with a filesystem at a lower and more granular level than what typical programming languages offer. Thus everything needs to be rewritten in bazillions of ways to handle the bazillion corner cases.

Which companies are those? Because neither Google nor Dropbox have claimed the things you're implying they did.

I heard about this project from a friend who works at KA. I am concerned about the strategy, and I think the following approach would yield better results:

1. Write in Go an exact reimplementation of the current Python codebase. Use the same database schema, front-end HTML/JS, test suite, and so on. To whatever extent possible, use the same names for classes and functions. Check the reimplementation correctness by using a comparison tool that calls both the Python and Go version of a page/function/search and making sure that they produce the same results.

2. Change the production code over to the Go version, perhaps using a ramping strategy where X% of servers are running the Go code, and you gradually increase X, while monitoring vital statistics like server load and response time.

3. Now that the production site is running Go, incrementally split off components into their own services.

This approach leads you to the same destination, but with a lot less risk. It is very unhealthy to have a situation where the production site is running one codebase but all the developers are working on another codebase. Note that you will realize the benefits of Go (performance, type safety) after step 2, which is much sooner than OP's plan.
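The comparison tool from step 1 and the ramp from step 2 could look roughly like this. It is only a sketch: `compare` and `in_rollout` are hypothetical names, the two implementations are stood in for by plain callables, and the rollout key is assumed to be something stable such as a hostname.

```python
import hashlib
from typing import Any, Callable, Iterable, List, Tuple


def compare(old: Callable[[Any], Any],
            new: Callable[[Any], Any],
            inputs: Iterable[Any]) -> List[Tuple[Any, Any, Any]]:
    """Run both implementations on each input and collect disagreements."""
    mismatches = []
    for item in inputs:
        expected = old(item)
        actual = new(item)
        if actual != expected:
            mismatches.append((item, expected, actual))
    return mismatches


def in_rollout(key: str, percent: int) -> bool:
    """Deterministically map `key` to a bucket in 0..99.

    The key gets the new code when its bucket is below `percent`, so
    raising the percentage only ever adds keys, never removes them.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent
```

Hashing rather than picking at random keeps a given server on the same side of the split between deploys, which makes the load and latency graphs easier to read.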

Joel Spolsky's classic essay about how you should never do full codebase rewrites is worth reviewing:


100% agree. We've just finished rolling out our implementation in Go, migrating a subsystem from PHP that receives around 150 req/second and demultiplexes those requests into 1500-2000 req/second to legacy backends.

The key to the success of the project was that the API was an exact match, and we could compare both implementations for exact requests. The deploy strategy of the new version:

- Replay the real traffic to the new Go service, comparing the results with the old one
- Then implement a toggle feature that enabled different traffic sources to use one backend or the other
- Keep switching backends over to the new system and ensure that metrics were unaffected

Having e2e and integration tests for the Golang project was a huge help, since we could fix all differences using TDD.

Although we changed some of the implementations to take advantage of Go constructs, just a 1-to-1 replacement would have had a huge performance impact.
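The per-source toggle from the deploy strategy above can be very small. A sketch, with hypothetical names (`BackendToggle`, the `source` strings) and both backends reduced to callables that take and return a dict:

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


class BackendToggle:
    """Send each traffic source to the old or the new backend."""

    def __init__(self, old: Handler, new: Handler) -> None:
        self._old = old
        self._new = new
        self._use_new: Dict[str, bool] = {}

    def enable_new(self, source: str) -> None:
        self._use_new[source] = True

    def disable_new(self, source: str) -> None:
        # Rolling back is the same cheap operation as rolling forward.
        self._use_new[source] = False

    def handle(self, source: str, request: dict) -> dict:
        # Sources that were never toggled stay on the old backend.
        handler = self._new if self._use_new.get(source, False) else self._old
        return handler(request)
```

This only works because the API is an exact match: the caller cannot tell which backend answered.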

Having done this kind of migration/rewrite multiple times, the way you succeed is starting with acceptance tests that verify functionality from the API layer that is implementation agnostic.

After the tests are in place, you break off small portions into microservices and ensure tests pass.

You pass a small percentage of traffic through the new arch, fixing bugs and leveraging telemetry.

You eventually slide all traffic to the new architecture.

You want to get something receiving traffic ASAP so as to start getting feedback. Often this means taking something smaller or simpler out of the legacy codebase first.

Doing a complete, side by side replica is a recipe for disaster. Think MVP. We've even introduced traffic routing based on feature sets, so you can route users who don't use edge-case features to the new arch, which has yet to incorporate those features, while keeping others on the old arch.
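Feature-set routing of that kind boils down to a subset check. A minimal sketch, where the feature names and `NEW_ARCH_FEATURES` are invented for illustration:

```python
from typing import FrozenSet, Set

# Hypothetical: the features the new architecture already implements.
NEW_ARCH_FEATURES: FrozenSet[str] = frozenset({"search", "profile"})


def choose_arch(features_used: Set[str],
                supported: FrozenSet[str] = NEW_ARCH_FEATURES) -> str:
    """Route to the new arch only if it covers everything this user uses."""
    return "new" if features_used <= supported else "old"
```

As the new arch gains features, the supported set grows and more users qualify without any change to the routing code.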

It looks like I need to write a followup with more details about how our migration looks!

Since we're using GraphQL federation, we have a gateway which serves up our complete GraphQL schema, pulled together from a collection of services behind the gateway. We can move individual properties over from our Python monolith to new Go services and the clients will never know. Plus, we can do side-by-side testing in the gateway by making a request to the monolith and to the new service and comparing the results (something we don't have yet, but plan to).

This is definitely not a big bang rewrite. It's about as incremental as it can be, because we can move individual GraphQL properties over and the gateway stitches the result together.
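To make the "move individual properties" idea concrete, here is a toy model of the stitching. It is not how GraphQL federation is actually implemented (a real gateway plans queries against a schema); it only assumes each field has one owning backend, with anything unlisted still owned by the monolith. All names are hypothetical.

```python
from typing import Callable, Dict, List

Backend = Callable[[List[str]], Dict[str, object]]


def stitch(fields: List[str],
           owners: Dict[str, str],
           backends: Dict[str, Backend],
           default: str = "monolith") -> Dict[str, object]:
    """Ask each backend for the fields it owns and merge the answers."""
    by_backend: Dict[str, List[str]] = {}
    for field in fields:
        by_backend.setdefault(owners.get(field, default), []).append(field)

    result: Dict[str, object] = {}
    for name, wanted in by_backend.items():
        result.update(backends[name](wanted))
    return result


# Stand-ins for the Python monolith and one new service.
def monolith(fields: List[str]) -> Dict[str, object]:
    return {f: "py:" + f for f in fields}


def progress_service(fields: List[str]) -> Dict[str, object]:
    return {f: "go:" + f for f in fields}
```

Moving a property is then a one-line change to `owners`, and the client sees the same response shape before and after.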

Is moving directly to #3 not an option?

I’m thinking that defining parts that can be moved to separate services, and start consuming these could be a way to organically transition to a new architecture.

Sounds nice in theory, but won't you always be playing catch-up? Or will development(/bugfixes) halt on the existing python version? Would it ever be acceptable to the business to commence such a rewrite without any show of value until after completion?

(Speaking as someone working at a firm that did choose (before my time) to start a ground-up rewrite that ran about 3 years over estimate).

It is crazy to see they are still using python 2. Seeing how slow the conversions to python 3 have been, was creating python 3 a good decision for python community? Can it be argued that developing python 2 further in a backward compatible way would have been better for the community? I know that evaluating this kind of thing is hard as metrics are bound to be subjective and speculative. But I am curious if there was any serious attempt to figure it out.

> was creating python 3 a good decision for python community? ... I know that evaluating this kind of thing is hard

I don't think there is any question here: Python 3 is a complete disaster. Years and years of engineering effort wasted on changing string libraries. Sadly, the Python leadership refuses to acknowledge the failing, perhaps because such an acknowledgement would challenge their omnipotence ... it would, and it should.

I remember a long time ago, I used the "six" library (Python (2 * 3) ) to code in Python 2 with an API based on Python 3.

I'm not entirely sure how well the line of code: "import six" would fix any compatibility issues nowadays.

> Now, in 2019, Python 3 versions are dominant and the Python Software Foundation has said that Python 2 reaches its official end-of-life on January 1, 2020, so that they can focus their limited time fully on the future. Undoubtedly, there are still millions of lines of Python 2 out there, but the truth is undeniable: Python 2 is on its way out.

The Python 2/3 split is by far the most annoying thing about Python. I don't develop software in Python but about half the time I've had to use a Python library or program the problem of 2/3 incompatibility has cropped up. Some projects don't make it clear whether one or the other is required, leading to further confusion.

If anything the Python 2 EOL could make a bad situation worse. Like Khan Academy, each Python 2 package maintainer will be forced to make a decision: move to Python 3 or abandon and maybe move to an entirely new language. I think many will choose to abandon, leaving these packages to rot.

Second on the list are the multiple package managers (or things looking like package managers).

Third on the list of annoyances are native extensions, driven by the poor performance of Python itself. These extensions make it difficult to use certain libraries across operating systems.

So as a non-Python developer I don't look forward to the occasions when I must use a Python-based piece of software.

If you're installing a package via the package manager, it will very quickly tell you if you can install it on your specific version of python. Unless you're downloading some rather obscure and un-loved library where the author didn't explicitly state which versions of python they support.

Multiple package managers: There have only been 2 big ones in my long-term general usage of Python: easy_install and pip, the former of which is falling out of favor but still semi-supported. Pretty much everything runs off of pip. What may be confusing you, and does confuse me at times as well, is their naming and the installation instructions provided by library authors. E.g. some say "python setup.py install", others just tell you to "pip install" it. Some would say use "setuptools", etc. Others would tell you to use things such as "conda" or "anaconda", pipx, and to create virtualenvs. Those are all secondary and ideally should not distract you from just plain using pip.

3. This has also been getting a whole lot better in the last 5 or so years. Microsoft has been funding dev-time to make the ecosystem for python (including extension compilation) much more pleasant in the Windows space. Also, the package managers and library authors are doing a whole lot better in that binary distributions are much more prominent so the compilation of the extensions never has to happen on your machine.

> Unless you're downloading some rather obscure and un-loved library where the author didn't explicitly state which versions of python they support.

I take it that you've never tried the obscure and un-loved pwntools package :)

>The Python 2/3 split is by far the most annoying thing about Python.

It's _literally_ the reason why I went with Ruby instead of Python all those years ago.

On a personal level, I have the same problems with using Python.

I've become enamored with package management in Go — not perfect yet but efficient, simple, to the point. The backwards compatibility enforced at the version level is also great — you often find Go code years old that keeps running just fine. I like things that you set up once and may just forget, that's where real productivity is found imho — it doesn't matter that I can do X in 1h if I have to do it every other day, I'd rather spend a full week or even ten, and solve it forever.

I think Js is comparably simple but I've heard so many horror stories about dependency management that I just don't know — I've yet to use Js in prod myself at work and I don't look forward to this day.

Here's the thing: it does not matter how great a language may be while I'm writing it, because that's 10-25% of my time; what matters is that everything around, from setting up dev environments to shipping passing by devops, especially as a one-man/small team, can be done "simply enough". And that, IMHO, is where Go is miles ahead of most other languages from a philosophy standpoint.

I tend to feel very positively about Rust, for it looks to be an extraordinarily intelligently led project, with comparably 'real' benefits that extend beyond the code page (but I've yet to use it myself to confirm first-hand).

We see the importance of these topics so clearly with Py2/3: none of the problems between these two have anything to do with what's in the code, with programming; all of it has to do with the ecosystem, with the real and much larger task of maintaining codebases, managing teams and deploying 'stuff' in ways that work with the environment (whether tech, people, knowledge, politics, what have you).

The move to 3 by Python has been a failure in that regard, and IMHO it rests on the shoulders of an entire community who chose to stick to 2 now regardless of what happened then. Well then is now and the result is chaotic.

I'm not worried about Python itself: the language is incredibly popular, especially in academia, and we need to double the programmer population on Earth each year, so that's a sustainable amount of new projects written in Python 3 every day. Old 2 projects will be but a drop 10 years from now simply because of the numbers effect of tech growing at such an insane rate (that's been about true since the late 1940s; Uncle Bob has a great take on it in his latest appearance on The Changelog podcast).

Anyway. I can't wait for py2 to die and py3 to become the only Python.

> I like things that you set up once and may just forget, that's where real productivity is found imho

Very productive, until an unpatched security issue in a dependency from 6 years ago bites you in the ass.

How does Khan academy make money to sustain the website? Is it all donations?

Non-profit, with several $1m+ donations: https://www.khanacademy.org/about/our-supporters

To be more specific (for anyone else who hasn't checked the link yet), the "Lifetime giving" section has: 9 donations >10m, 4 donations between 5-10m, 20 donations between 1-5m

So looks like there has been at least 120m in donations!
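Taking the floor of each bracket quoted above (and assuming the figures are in millions of USD, as the "$1m+" upthread suggests), the guaranteed minimum actually comes out a little higher:

```python
# (number of donations, bracket floor in millions of USD)
brackets = [(9, 10), (4, 5), (20, 1)]


def lower_bound_millions(rows):
    """Sum count * floor over all brackets."""
    return sum(count * floor for count, floor in rows)


print(lower_bound_millions(brackets))  # 9*10 + 4*5 + 20*1 = 130
```

That is only a floor; the top bracket is open-ended, so the real total could be considerably more.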


From what I understand it’s mostly donations but I think they receive some grants too.

As much as I personally don’t enjoy writing Go I really can’t fault them.

I still find it interesting that for a relatively obvious feature set of fast compiles, fast startup and fast runtime there really isn’t anything mainstream out there to compete with Go.

I really hope something like Kotlin, Swift, ReasonML or even AOT JVM/.NET brings something to the table soon. Or perhaps I’ll just have to wait for WASM to really take off server side.

> fast compiles, fast startup and fast runtime

All I can think of is D, but it’s not quite mainstream. Are there any other less popular languages that meet all 3 conditions?

Delphi, Ada, .NET Native, OCaml.

Still snake oil.

Site looks pretty legit, can you explain why it's snake oil?

Because it literally uses string manipulation to generate c code from source. There is no concept of an AST or anything - just string bashing.

Looks super lit with everything baked in: faster compilation, small binaries, and performance... Still scared to jump in


in production is fast startup really such a boon outside of serverless? especially if you're already doing blue/green deployments, doesn't seem like it'll have much impact.

(depends what "fast" vs "slow" means - are we talking about milliseconds vs a second or two, or startup times so horrendous they cripple your devs' ability to iterate and test?)

We essentially run in a serverless environment (App Engine), so fast startup does matter to avoid some unlucky users hitting the cold start.

fair enough. you say in the blog App Engine has worked well for you and you're sticking with it, so i'm assuming you considered moving to traditional servers but found it unappealing?

Yes. Google Cloud now has multiple options for autoscaling servers (App Engine Standard, App Engine Flex, and Cloud Run) with the biggest differences being how they're deployed and specifics around the scaling.

We _could_ manage our own Kubernetes clusters and such, but Cloud Run is pretty similar to that and takes away all of the management headache. There is essentially zero code difference, should we decide to change our deployment strategy later.

We're using Google Cloud Datastore for persistence, and that automatically scales in both servers and storage, so it has worked out nicely for us as well.

Local iteration is most important to me, but serverless is a great example too, as are CLIs (slow CLIs have recently become my pet peeve).

The nice thing about Go is it performs well in each of these use cases by ensuring nothing in its tool chain is slow, or produces slow code.

Slow is in the eye of the beholder I suppose, but I guess I’m using it here to mean within an order of magnitude of its peers.

GraalVM looks to bring fast launch, at the expense of long-run performance optimisations from JITting. For FaaS-y purposes that will be a sane tradeoff, for long-running services the startup overhead is amortised over requests.

How does WASM fill this gap?

> fast compiles, fast startup and fast runtime ... mainstream

C is this but it's harder to write secure/correct C, the standard library is smaller, and there's no canonical toolchain in the same way as Go.

I guess I’m jumping to conclusions but it seems like a lot of lessons have been learnt since JVMs and .NET came onto the scene, and that WASM runtimes and the future languages that target them will prioritise speed at every stage.

Go already cross-compiles to WASM, so if desired, Go code can be run via WASM. But on the server, you probably rather want to run the Go code natively. For the client, this should be quite interesting.

One doesn't just write WASM though.. WASM is like the JVM. You still need to write code in some other language.

Exactly. Hence posing the question (rhetorically, I suppose).

Seems like such a waste. Is switching to python 3 really that hard? Is hardware that expensive? If this is indeed the right call it doesn't bode well for traditional scripting languages as the web scales to fewer high traffic apps. We might start to see more jvm, go (apparently) or even rust and c(++), rather than speed of development languages like Python or Ruby. Trend seems to be the reverse though, with python the second and most rapidly growing language.

Everyone’s project/code base is different but in my experience there’s been a critical mass of libraries for a few years. I presume the “it’s hard to move to 3” is dev teams wanting a new toy as much as “the rewrite is too complex”. Library use, size of code base etc are all big factors but at the end of the day, I think team motivation is really the deciding factor.

That mirrors my experience as well. Someone with influence is bored or wants to level up, so they'll drag the entire company into a long, expensive quagmire.

Unless the existing codebase is mired in technical debt and completely unsalvageable or cannot scale further, this seems like a very radical move.

Replacing an old unholy mess with a new unholy mess is usually a bad plan. It's called the Second System Effect.

Which is precisely why most of the Web Dev should not be called Engineers.

I worked on a fairly large codebase that needed to be rewritten from scratch when migrating from 2 to 3, primarily because all the tests were written using a test framework that was no longer maintained. So given that you might need to start over anyway, I think it's reasonable to consider other options. That said, yeah it's difficult to understand how KA's web server costs aren't already basically zero, and how their endpoints aren't already basically instantaneous.

> That said, yeah it's difficult to understand how KA's web server costs aren't already basically zero, and how their endpoints aren't already basically instantaneous.

I find that this is the outsider's view of a great many products. Things always seem a lot simpler on the outside.

In Khan Academy's case, I think a lot of folks just think of our site as being a collection of more-or-less static pages with videos on them. There's a lot more going on than that, though. We've got a CMS that supports articles with math and interactive elements, in addition to the videos... and many, many exercises with hints. All translated into dozens of languages.

We have to remember every exercise people have done so that we know which ones to present to them next, and we need to display that progress when they look at topic pages. Oh yeah, and if they're in a classroom, we need to present that progress to teachers (or coaches/parents, outside of the classroom). Teachers can also assign content.

Now, we're also offering features for school districts: https://www.khanacademy.org/district

Plus, there's the official SAT prep, which connects to the College Board directly to provide personalized guidance about what to work on... and that's only one of the test preparation areas of our site.

And, as you can imagine, there are a bunch of other features and aspects of the features above that I'm not mentioning. It adds up.

Fair, but what percentage of spend is actually on web servers as opposed to database, data transfer, static asset storage, caches, CDN, etc? The features you listed are kind of what I expected, but I still wouldn't expect the web servers to be more than 15% or so of your hosting costs. I know you guys get a ton of traffic, but on most web sites at least 90% of traffic is logged out and doesn't even need to hit the web servers in the first place.

I don't have recent numbers in front of me, but I believe our web servers are more like 40% of our hosting costs today.

Over the past year, we've started leveraging our CDN (Fastly, who have been great) a lot more. That said, for us a lot of logged out traffic still carries the weight of logged in traffic. A logged out user can start doing math exercises and we'll keep track of what they've done. If they then create an account or log in, that activity is associated with their account.

Khan Academy may look like a content site, but in many ways it's more like a "learning app".

According to their 2018 accounts, 'information technology' costs were $5m. Salaries were listed separately at $29m so I'm guessing the $5m was mostly servers.

This is my experience too. Someone or a couple someones on the team decide they want to try out some new tech or expand their resume. Then it becomes a quest to justify the switch rather than a quest to make the best business decision.

What's really the ergonomic difference between "traditional scripting languages" and Go? I came up writing professional C code, and spent most of the last 15 years writing "traditional scripting language" code, and Go feels a lot closer to scripting than to C to me, despite compiling down to machine code.

Go has static types, and that distinguishes it from Python, Ruby, and Perl. But the trend now seems to be for languages to move towards static typing anyways; 2005-era Python was wrong about that.

> Go has static types, and that distinguishes it from Python, Ruby, and Perl.

It also has interface{} and that's a non-trivial commonality with Python, Ruby and Perl.

It’s usually the DSL argument. In Go you can establish calling patterns for errors, but you’re really limited in terms of providing libraries with nice APIs that prevent you from making mistakes.

In Python you have a lot of ways to make sure someone does a thing. Exceptions are good for making sure an error is handled. Context managers make sure a resource is cleaned up properly.
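A small sketch of both mechanisms together, using only the standard library. The `managed` helper and the log messages are made up; the point is that the cleanup in `finally` runs even when the body raises, and the exception still reaches whoever is responsible for handling it.

```python
from contextlib import contextmanager
from typing import Iterator, List


@contextmanager
def managed(name: str, log: List[str]) -> Iterator[str]:
    """Open a pretend resource and guarantee it is closed afterwards."""
    log.append("open " + name)
    try:
        yield name
    finally:
        # Runs on normal exit and when the with-body raises.
        log.append("close " + name)


def run(log: List[str]) -> None:
    try:
        with managed("conn", log):
            raise RuntimeError("query failed")
    except RuntimeError as exc:
        # The error cannot be silently dropped; it propagates until caught.
        log.append("handled: " + str(exc))


if __name__ == "__main__":
    events: List[str] = []
    run(events)
    print(events)  # ['open conn', 'close conn', 'handled: query failed']
```

The library author writes `managed` once, and callers get correct cleanup without having to remember anything.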

I might be wrong but I feel like writing something like jquery (with its fluent API) would be really tough in Go.

> 2005-era Python was wrong about that.

Given Python's incredible rise since 2005, and even in the past few years, I'm not sure we can say they were "wrong." It serves a purpose.

That rise has been entirely accounted for by machine learning and data science. Python as a language for actual software engineering has been slowly dying for a while now.

I've worked with or known about too many places that have jumped on the microservices bandwagon and the only thing I've encountered is a lack of maturity (mine included) and a knee-jerk reaction against monolithic code, when the problem isn't the size of the codebase but its organic growth over time. Go is an excellent language for it but distributing your business logic and functionality over a network fundamentally changes the behaviour of your app and how you have to think about it; you can't just tear out bits of the monolith and make it an API.

I've come to some fairly comfortable conclusions:

1. If your team is small; don't do it. The mental overhead of that architecture will bring your team's ability to deliver to its knees.

2. If you've got a long-lived app with lots of legacy code; don't do it. You will have to maintain and add new features to the old codebase because it's easier, which means you have more things to rewrite.

3. If you're small-ish/mid scale; don't do it. Kubernetes and similar are tools for people handling Facebook/Google/Netflix kind of loads.

4. If your organisational structure doesn't fit (i.e. you don't have enough people to split into smaller teams); don't do it.

Scaling is considered a good problem to have, right? That means that your current, ugly monolith is actually successful, and somehow the first thought is to replace it all with a complete—but trendy—unknown?

If the engineering team is so unhappy about the architecture then Go is the wrong choice. Maybe they could consider service oriented architecture and some refactoring/tech debt time so they feel happier about that codebase. Pull some of the code out into modules and then start figuring out where bits of the codebase would actually belong, while still being one deployable.

And then after that, if you really want to, distribute it over the network, and then start thinking about porting it.

Otherwise, throw away your first successful prototype at the first instance and go all in on distributed architecture and microservices. At least then you have the luxury of figuring it out from scratch.

> I've worked with or known about too many places that have jumped on the microservices bandwagon and the only thing I've encountered is a lack of maturity (mine included) and a knee-jerk reaction against monolithic code, when the problem isn't the size of the codebase but its organic growth over time. Go is an excellent language for it but distributing your business logic and functionality over a network fundamentally changes the behaviour of your app and how you have to think about it; you can't just tear out bits of the monolith and make it an API.

This seems like a response to the blog post as opposed to the parent comment.

We absolutely recognize how added network boundaries change the app in big ways.

One thing I wanted to mention: we're _not_ going the Kubernetes and service mesh sort of route because our experiences thus far show us that there's still a lot of rough edges. We're sticking with App Engine because it generally just works. Scales down essentially to zero and scales up well with the traffic. So our services are all going to individually be running on App Engine.

Plus we're not going "micro" with our services. They're each fairly decent size, own specific parts of our data, and are owned by specific teams.

>knee-jerk reaction against monolithic code

It's almost religious. The reaction some people have when you suggest monoliths is completely baffling. It's apparently just "known" that microservices are the correct approach to all problems, so, y'know, it's embarrassing for you to have even suggested otherwise.

The best systems I've worked on have all been well architected monoliths. The code is ugly as fuck in some places but that's what tech debt is.

If you release a bug to prod in your monolith, it is exactly the same as a dependent microservice releasing the same and bringing the cluster down. You get a crash either way, and at least with a monolith you're not spreading your call-stack over the network; it's all in memory.

A micro service can have non critical dependencies.

In terms of general compute speed Go is many times faster than Python - approximately on a par with Java speed-wise but with much less memory use. That means your server costs are many times cheaper and your page latencies are much faster than Python. It's significant.

I thought Go's GC was more wasteful of memory than Java's b/c of the focus on reduced pause times.

No, not at all. Go uses the stack for most variables so it's a lot less heavy on GC than Java.

In the past companies would just compile python down to c to get the memory and perf they need. Probably would be the right answer here too, but that would not look as cool on the resume.

By compiling you mean rewriting it in C or Cython. Which is only a good idea for certain applications.

yeah, I was thinking of Cython getting used. Any particular reason it would not work for this kind of use case?

Moving compute intensive tasks to C would speed up many programs. That is the reason that many dynamic languages have overall good performance - they rely on C libraries for the computational heavy lifting.

This has one big drawback, however. Not only do you need to write in two different languages, but the rewrite in C requires a lot of care, as the language protects you much less than the high-level language you are implementing for.

One big attraction of Go is that it is high-level and productive enough to be the main implementation language, and very efficient for time-critical stuff. So you don't have to cross language barriers to implement speed-critical code, and you get full type and memory safety across the whole stack.

Interestingly, one can write Python extensions in Go, so in most cases that would be my choice these days for speeding up critical code paths in Python.

For any web-based app, it really makes sense to take the approach of using a rapid development language first, then as you need to scale, convert to something that’s compiled and focuses on speed. It’s not one or the other kind of thing — they both have a role (at least until we reach the holy grail where fast to develop is also fast to run).

Like one of the former engineers at Twitter said about their early issues with stability and when someone blamed Ruby for the performance issues - short version it was a bad architecture not the language.

Stateless web servers are one of those things which are ridiculously and easily parallelizable, you can scale a web server horizontally easily. The ROI of rewriting everything in another language as opposed to just adding more web servers would probably take years unless you’re running at a ridiculously large scale. For the cost of one developer’s fully allocated salary, you can throw a lot of hardware at performance issues with web apps.

I prefer statically typed languages, but performance isn’t one of the reasons. Besides, how much processing is a typical web app doing?

What's a "rapid development" language? I know about RAD, but that was a buzzword that was only ever used to push terrible languages like Visual Basic, and somewhat less terrible ones like Embarcadero Delphi.

Dynamic languages (so you can hack together stuff and quickly bypass any roadblocks), with REPLs (quick feedback and avoiding writing tests), and low cognitive overhead (so new folks can ramp up quickly).

Some example languages that come to mind here: Python, Ruby, JavaScript, Clojure, Groovy.

Go actually comes close here, even though it doesn't have a REPL and isn't very dynamic, because of its focus on minimizing cognitive overhead and getting the job done with minimal fuss.

Other languages carry a lot of community baggage. Java is one of the worst IMO...if you try to hire Java programmers, it's going to take a lot of effort and risk to find and reject applicants who've read too many design pattern books, are architecture astronauts, or come from an enterprise-y background. The signal-to-noise ratio is just really poor.

Ruby devs said the same thing about Java when Rails first came out.

Now look at Ruby, its stack is gigantic and cumbersome.

I guess Go will try to resist that, but over time Go will become burdened with a huge stack of outmoded software.

I _like_ Python. Even released a Python web framework. I think there are many projects it's a good fit for, which is why it's continuing to be quite successful, despite the pain of Python 3.

But that doesn't mean it's a great fit for all projects. Personally, I've come to find that code in statically typed languages is easier to maintain over time, especially from a big team. I guess a lot of Python folks agree, which is why Python 3 allows static typing as well.

At a certain point, server costs _do_ add up to real money and some applications are not purely database-bound. Go's tooling makes it almost as fast to work with as a scripting language, but with much better performance. The language itself is certainly not as succinct as Python, but I think it has made reasonable tradeoffs.

Also: there's already a lot of JVM on the web.

Finally, I'll just note that _not all_ Python 3 migrations are that hard. It depends a lot on the libraries used.

I don't think upgrading to python 3 will be as hard as rewriting the entire thing, but rewriting does come with the benefit that you are not stuck with the problems that come with dynamic typing in a huge codebase, and i'm assuming this is why sticking with python is hard.

They also said a faster language will improve their server.

Instead of rewriting with static types, they could just gradually add them – Python 3 supports static typing[1] with the actual type-checking done by external[2] tools.

[1] https://docs.python.org/3/library/typing.html

[2] http://www.mypy-lang.org/
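
As a rough illustration of what "gradually" can look like (the function names here are made up for the example, not taken from any real codebase): annotate one function at a time and leave the rest untouched. By default, mypy does not check the bodies of unannotated functions, so the old code keeps working as before.

```python
from typing import Dict, Optional


def legacy_lookup(users, name):
    # Not annotated yet: a type checker treats the arguments as Any.
    return users.get(name)


def typed_lookup(users: Dict[str, int], name: str) -> Optional[int]:
    # Annotated: a checker can flag callers that pass the wrong types
    # or forget to handle the None case.
    return users.get(name)


ages = {"ada": 36}
print(legacy_lookup(ages, "ada"))
print(typed_lookup(ages, "bob"))
```

Both functions behave identically at runtime; only the second one gives an external checker something to work with.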

This is perhaps a very big misconception with python usage. Just because it has "dynamic" or "duck" typing, doesn't mean that you have to consider chaotic and unpredictable data running through your code paths. It's just not the case.

In a typical codebase, it's probably 98% very specific and known data types linked to the variables in your code. With the remaining 2% being things that are just "easier" to solve with dynamic typing rather than coming up with complicated interface/inheritance hierarchies that you typically find in compiled languages.

And with type-hinting now being there in python, you have a very good way of "codifying" that dynamic or duck-typing. Such that you can expect almost 100% knowledge of all the data types coming in/out of your classes/functions. At this point, I'd argue it's got one of the most robust "type" systems out there, if one can call it that at all. Just don't use text editor + MyPy, or VSCode for your python development, and you'll be in good hands. I.e. Use PyCharm.

Re: your last sentence, why is text editor + mypy so bad? How does PyCharm warrant such definitive advice?

Honestly, at this point mypy is inferior to the built-in PyCharm when it comes to speed, integration and in some cases the type-inference as well. I've also compared it to the VS python language server and it too doesn't stack up.

The other metric would be what a person coming from established, full-fledged IDEs such as VS would expect. With PyCharm, you get intellisense almost on-par with what you get from VS for C#/VB, assuming you use type-hints that is.

Not trying to be difficult, but I'm being honest about them providing a really good python experience that is for the most part free. There is no need to putz-around with VScode, json configs, plugins, mypy, etc and still end up getting a relatively inferior experience. Doubly so for new-developers.

They actually added a daemon recently that should help to speed things up (https://mypy.readthedocs.io/en/latest/mypy_daemon.html) . Haven't had a chance to try it yet though.

> Is hardware that expensive?

Hardware is cheap. Hardware is on the "accessible to a 3rd world middle class person" level of cheap.

But with enough scale, it adds up, while the costs of a rewrite don't. And the difference between a language like Go and one like Python is on the order of hundreds of times.

FWIW, our tests showed Go as realistically being about 10x faster than Python for our tasks.

> Is switching to python 3 really that hard?

I think it's more that given their specific codebase it's similarly difficult to switch to python 3 as it is to switch to a number of other, entirely different programming languages. Once you recognize that rough equivalency, then it's worth considering the stability, compile times, necessary production resources, etc of those other programming languages.

If your project's specific dependencies are so intrinsically stuck on a python 2.x implementation you might be caught between having to redesign that dependency in-house or switching to a language where you wouldn't need to do that in-house work.

> it's similarly difficult to switch to python 3

But that's almost certainly not the case. Even if their existing codebase relies very heavily on the small subset of Python2 features that require manual porting, switching to Python3 will be much less work than rewriting everything in a new language.

The Python 2 to 3 conversion was only part of it. The libraries were a huge difference for us.

I will agree that porting to Python 3 would be less work than porting to Go, but we believe that the difference is less than people would suspect.

Never announce a plan to do something. Announce the results after doing it.

If you announce a plan, you're making a bet (sometimes with your reputation) that it's going to be successful, and telling the world about it.

If, however, you didn't announce it, you could still get the benefit of success, or make it look like you were learning the lessons of a failed experiment.

We have the good fortune of not having to be super secretive about our tech. We _want_ to talk about this project as we go along and share what we learn. We've already got more interesting stuff to talk about, and it'd be a lot less interesting, I think, without this context for the overall project.

As mentioned in the post, a small piece of our GraphQL schema is already in Go running in production. This blog post isn't just "we're thinking of doing this thing". It's "we've already done a bunch of research, thinking, _and_ built some of it."

Just because you can do something doesn't mean you should have done it. And sometimes that's only found out in hindsight.

Indeed, that's true. I just think we have more to gain by talking about what we're doing and seeing what the community thinks about some of our approaches rather than keeping it all private for the next year.

In my experience it's more annoying than you think, especially if you want to prevent small sneaking regressions.

Python not being statically typed also means that the breaking type changes they made, such as strings now being bytes, cause crashes that may not be revealed until production if your unit tests didn't get that oneeeee edge case right.

If I'm going to do a new project in the future, I will demand it will be a statically typed language. I'm sick of dynamic languages.

Switching to python3 is hard, especially if you have a massive python codebase interacting with other systems. It's not as easy as importing unicode_literals. Unicode breaks in very subtle ways.

and rewriting it in another language will magically fix this and not introduce new bugs? besides, in my experience Python 3's clear separation between bytes and str makes these breakages much less subtle than it silently going wrong in Python 2.
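
To make the bytes/str point concrete, here is a minimal sketch (the payload and function are invented for illustration). In Python 3, mixing the two raises right away instead of passing silently and failing later on non-ASCII input:

```python
# bytes, as they might arrive from a socket or a file opened in binary mode
payload = "caf\u00e9".encode("utf-8")


def greet(data):
    # Python 3 refuses to concatenate str and bytes implicitly.
    return "hello " + data


try:
    greet(payload)
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Decoding explicitly at the boundary makes the intent visible.
greeting = greet(payload.decode("utf-8"))
print(mixed_ok, greeting)
```

The failure is loud and happens at the exact line that mixes the types, which is the "much less subtle" part.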

i wish them all the best, but would've been much more impressed if they'd done it, not simply announced that they would.

> in my experience Python 3's clear separation between bytes and str makes these breakages much less subtle than it silently going wrong in Python 2.

No question Python 3 is better than 2, but it's not better enough to justify the move. People will only move when they absolutely have to. That isn't progress, it's inefficiency.

The question isn't, is porting Py2 to Py3 easier than porting Py2 to Go. The question is, if you spend the same effort on porting Py2 to Go that you would have spent porting to Py3, and that only gets you 60% of the way there, but the ROI on that work is much higher, are you better off?

Python is growing only because of tensorflow, pytorch, numpy, pandas and friends. Web dev is moving away from dynamic typing and that’s a good thing.

Yikes! It's a lot of effort to reduce memory use. They might be better off creating a new Go entrypoint/server that can call into CPython to reuse all their existing/tested modules (treat their Python as a microservice called by Go). They could then use Go to create/call new microservices or replace various routes on a selective basis.

I think the real problem is that they didn't properly maintain their code. Rewriting it in Go won't prevent them from dealing with this in a few years for when this Go version reaches end of life. I would have liked to see an article on "introducing process" side of programming.

Thanks for the comment! A couple of things about this...

1. The Go team is working very hard to ensure that there are no such compatibility issues. Code written for Go 1.0 should still compile with Go 1.14 beta today.

2. It's possible there's more we could have done along the way, and I tend to think that statically typed languages make it easier to safely refactor more ruthlessly. But I do think we've actually done quite a bit of change incrementally along the way. Our moves to React on the frontend and GraphQL on the backend have been good examples of that. Plus, we did a huge refactoring a couple of years ago to draw better boundaries in our monolith, and that has made a move to services possible.

Unpopular opinion, writing Go is faster than Python. With the compiler, strong typing, and no versioning hell, I'm much more productive in Go.

Whenever I use python I run into problems with versions and dependencies. And the whole community just tells me to use pyenv or virtualenv and it will "fix all my issues". Only it doesn't.

Just as a counterpoint, I'd say that using Python can IMHO be about as productive as Go.

Regarding the dependencies, you have tools on top of virtualenv, such as pipenv/poetry, which handle dependencies and are easy to use. The biggest issue that I've encountered would probably be when two or more dependencies require the same package, with no intersection between supported versions. I don't think Go handles this any better, though.

Type hints (and mypy for static type checking) are a must, and coupled with a good IDE, they really improve productivity. I'd say that mypy's type system is more advanced than Go's, but it is strongly lacking in type safety (because the majority of libraries are not taking advantage of it yet, Python is still a dynamic language by nature, and there is no runtime type checking).
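
A tiny sketch of the "no runtime type checking" caveat (the function is hypothetical): the annotation documents the intent and lets a checker complain, but the interpreter itself happily runs the call.

```python
def double(n: int) -> int:
    # The annotation says int, but nothing enforces it at runtime.
    return n * 2


# A checker such as mypy would reject this call; CPython just runs it
# and repeats the string instead of doubling a number.
result = double("ab")
print(repr(result))
```

So the safety only exists if the checker is actually run, for example in CI or in the editor.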

I doubt that's unpopular. It's the same reason I significantly prefer Go over Ruby.

All these great things that Java fossils like myself have been telling Node and Ruby hipsters about. Of course they won't be caught dead writing in a language their parents use.

Same. Moved from Python to Go, don’t write python much anymore.

It’s ridiculously easy to build things in go. The default tooling works just great. It’s a nice fit with docker for building tiny containers.

Perhaps the nicest thing though is how easy it is to write fast http servers. The default server is pretty good, but there are also so many choices for faster http server frameworks. Middleware is easy to write and share. I can’t say that I truly understood how http servers worked until I started using go.

For larger stuff, yes. For small stuff, no. If you just need to get something small done quick and dirty, python will be easier and faster.

Yes. My current estimate of the cutover, for myself, is about three weeks of solid, 40hour/week development. After that, my Python (or other dynamic language) starts the process of seizing up, where instead of rewriting some module I just put a little hack in there to make it backwards compatible with other code, since I haven't got a great way of being quite sure what's calling this code, so I use a __setitem__ or have a function that takes "a thing or an array of that thing", and I find myself increasingly reluctant to refactor the Python.

YMMV on the exact number, but that's been my experience several times now.

I know it can be done; I've seen it done, I've done it myself. But refactoring without even the rudimentary static type system Go has just becomes an increasing nightmare at scale.

And I use unit testing in Python, etc.

But, flipside, yes, Go isn't a great language for just bashing a script together in. Maybe not the worst, with a bit of library work, but not a great language.
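
For anyone who hasn't run into the "a thing or an array of that thing" hack, it tends to look something like this (the names are invented for illustration, and this is only a sketch of the pattern, not code from a real project):

```python
from typing import List, Union


def notify(recipients: Union[str, List[str]]) -> List[str]:
    # Backwards-compatibility hack: old callers pass one address,
    # newer callers pass a list, so normalize instead of fixing callers.
    if isinstance(recipients, str):
        recipients = [recipients]
    return ["sent to " + r for r in recipients]


print(notify("a@example.com"))
print(notify(["a@example.com", "b@example.com"]))
```

Each one of these is harmless on its own; the seizing up comes from accumulating dozens of them because nobody is sure who calls what.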

The recommendation should be to use pyenv and virtualenv: pyenv for installing the Python versions you need for different projects and virtualenv for creating an isolated environment for each project. Using this setup, I almost never run into dependency issues.

I don't know much about Go. How does it avoid dependency conflicts?

I just cannot understand how rational engineers would choose untyped languages like Python or Ruby for large scale systems. Humans make mistakes - even very smart humans make frequent mistakes - and blast radius grows with the size of the system.
