Ask HN: Best-architected open-source business applications worth studying?
296 points by ghosthamlet on July 24, 2017 | 88 comments
I'm looking for applications that not only have good code of the kind described in books like Code Complete, but also have a great architecture as a whole. They should be open-source business applications: there are many great libraries, frameworks, and generic applications such as Yii, Redis, Lua, and Linux, but open-source business applications with great architecture are not easy to find.

Nginx and Git.

Nginx has a lot of respect on the market for handling high concurrency as well as exhibiting high performance and efficiency.

I don't even have to speak about the Git architecture. It speaks plainly for itself.

There's a series of books called The Architecture of Open Source Applications that does justice to this topic


I love git, but the learning curve seems very high for the uninitiated. Git seems to work best as the assembly language of source control for a lot of people. It's okay as long as you put a nice GUI on top of it.

Even though I'm pretty comfortable with git thanks to attending a two-day workshop conducted by a guy from GitHub, I still like using TortoiseGit.

It's sad so many people use GUIs on top of git - you really end up not understanding what git is doing and it can lead to problems. Command line is supreme for the same reason that a majority of books are better than their movie counterparts. GUIs are easier, though, but in the case of git that isn't a good thing.

I agree, but this is a symptom of a larger problem.

Really I'm disappointed that people don't build their computers from scratch these days. You don't know what's going on when you have so many components joined together with random blobs of firmware.

Give me a bare Z80, a bunch of RAM, and anybody could understand it. Plus if you're writing your code in assembly you can optimize better than any compiler - they wouldn't know enough to use your opcodes as constants when you need to.


Honestly that just sounds way too complicated... I do it all in my head.

To think I've been writing all my code out on paper. I need to get on your level. /s

I'm skeptical that you actually learn more by using the command line vs. a GUI. Yes, if you need to do something subtle, you might have to learn something, and you might need to use the command line to do it. But you learn more because you need to, so you go out and learn it, not because the command line teaches you something.

To me knowing the CLI is a little like knowing how to read sheet music. It makes analysis and collaboration a lot easier.

You can put down a sequence of git invocations in a text file or script, stare at them, reason about them, tweak them as appropriate for your use case, copy and paste them to another developer, write about them in a blog post, include them in a README.

It's harder to do this with a sequence of gui actions.
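To make that concrete, here is a minimal sketch in Python of a git workflow kept as plain data and rendered into a shell script you could paste into a README. The branch names and the particular sequence of steps are made up for illustration; only `shlex.join` and the git subcommands themselves are real.

```python
import shlex

# Hypothetical workflow: update a feature branch on top of the latest main.
# The branch and remote names are assumptions, not taken from the thread.
STEPS = [
    ["git", "fetch", "origin"],
    ["git", "checkout", "feature/my-change"],
    ["git", "rebase", "origin/main"],
    ["git", "push", "--force-with-lease", "origin", "feature/my-change"],
]


def render_script(steps):
    """Render the steps as a POSIX shell script that stops at the first failure."""
    lines = ["#!/bin/sh", "set -e"]
    # shlex.join quotes any argument that needs it (e.g. commit messages with spaces).
    lines += [shlex.join(step) for step in steps]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_script(STEPS), end="")
```

Because the steps are just a list, you can diff them, review them, or tweak one argument before sharing, which is the part that is hard to do with a sequence of clicks.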

And how often in a normal dev workflow do you need to do that?

Most developers pull, push, merge, rebase, and fix merge conflicts.

Yes, I agree, but installing TortoiseGit and then looking at what commands are available on the menus is a great way to focus learning when you don't know what to learn yet.

I think it really depends on the use cases. For example, if you are in a team of 5 engineers, all working on mostly unrelated areas, then no problem. In this case, you might go a long way without having an actual need to learn something more advanced.

However, it's quite important to recognize when your basic git skills are not enough anymore. That often happens when you join a bigger team or company.

"Learning something more advanced" is usually knowing how to fix conflicts, merge, and rebase. That can all be handled with the GUI. Even when you screw up, learning how to use the reflog can be done from the GUI.

What can't be done from the GUI? The only thing I've had to do from the command line is using git subtrees.

From a GUI or any GUI? There are many, many GUIs implementing git and not all of them do it right or even well. A GUI is an abstraction and implementers are free to do it however they see fit. Not to mention having to deal with bugs in two separate systems.

>There are many, many GUIs implementing git and not all of them do it right or even well.

Speaking of that, which git GUI is good, in your opinion? I tend to use Git from the command line, but had tried out a few GUIs for a project earlier, but not enough to decide. GitEye was one, another might have been SourceTree (IIRC it is from Atlassian).

I completely agree with you, but to be fair, Magit is really convenient to use. But then again, it doesn't really try to "hide" git's workflow as much as other tools do.

I use a combination of command line and GUI. From the command line, using git for commit, add, status, push, pull, fetch, branch, add remote and rm is just clearer, quicker and more straightforward in many cases.

From any of the IntelliJ derivatives I'll tend to do add, commit, push and diff just because it's a clean interface where I'm already working. I've never really liked any of the standalone 3rd party tools though. They just never felt like they gave me anything all that useful.

I'd love to find a better third-party UI for git too. Or, I've found SourceTree or the GitHub app (not the latest Electron release, albeit) hard to work with when doing a lot of stuff outside the typical add/commit/push, so I've slowly ended up learning the command line over the years.

I've seen others get in situations with fussy merge conflicts or a detached HEAD and be clueless why the UI isn't working properly. I can't blame them though, since git can be complicated and a tool like SourceTree has to make things nicely-abstracted to be helpful/productive. It's hard to know what details to make explicit/simplify, and I think with git there's a lot of work we could still do.

Git really needs something on par with Mercurial's TortoiseHg - it actually manages to express all the edge cases in a natural fashion, and lets you jump from cmd line to gui and back with ease (it's more of a cmdline tool that happens to invoke gui dialogs).

I've been looking for years to find something like it for git. Nearest I've seen was GitKraken, but its UI is still horribly opaque, can't invoke from cmdline in nearly the same way, and requires me to remotely authenticate (!?) to use it :|

GitKraken is my favorite UI, and they publish a number of videos to try to break down how features might be best used, supports Git Flow natively, etc.

Yeah -- I use sourcetree almost exclusively. Why would I want to type 'git status' and then 'git diff long/path/to/my/file.txt' when I could just click on the file in the changes list?

To see a diff of that file I'd just type gst, or git diff l and hit tab in zsh until I get to the path I want. Takes me a short time, I don't have to move my hand away from the keyboard and I can use my aliases if I want to. I feel like for my day-to-day work, the CLI works faster and better than all of the GUI tools I see. Also, it's available on every machine I touch (colleagues and juniors when help is needed). I just find it more valuable in my day-to-day workflow compared to GUI stuff.

Because it's possible to be much faster on the command line than you'll ever be clicking around on a GUI.

if you are already using a GUI for your IDE, is it really faster typing on the command line than right clicking on your git root folder in explorer -- that in my case is usually on a different monitor -- and clicking on git commit where you can see all of your changes, and then double clicking on a file to bring up your diffs?

Don't get me wrong, knowing the command line is invaluable when it comes to corner cases that just don't work well in a GUI and automating tasks.

Yep, it's absolutely faster - a command line is simply more responsive than moving a mouse and clicking dialogs if you know exactly what you need to do. I've always used an IDE but I exclusively use the command line for VCS. Setting up a commit is much easier to do if you know all the commands over the tedious drag/right click/choose dialog.

As to your use case, git diff has never failed me.

This is all aside from the fact that the CLI is ubiquitous. One day you'll be in a situation where it's all you have access to, and you'll thank yourself for having learned it (or kick yourself for not).

When would I only have access to the command line as a Windows developer? The most I would have to do from a command line in my usual day to day work is use git pull to automate my deployments. But in that case, I also need to know the ins and outs of msbuild. That doesn't mean I'm going to use a regular old text editor and build with msbuild in my daily workflow.

Maybe in the general case, but not in the scenario I outlined above. What if there are 20 files I touched? I'll need to type each of the 20 paths to view each individual diff. And that's if I view each diff only once, which I don't typically do.

I understand the value of being familiar with your tools, but sometimes the CLI just isn't the right one.

EDIT: btw, I use the vim plugins in all of my work (VS, VSCode), so I'm not shy when it comes to learning curves.

>git diff

Shows all of your unstaged changes (working tree vs. the index) in one go; git diff HEAD compares against your HEAD instead. No need to type 20 individual paths.

Good to know, although I'm still not impressed with its presentation when doing larger diffs.

Then use a different `diff` app, such as the wonderful graphical kdiff if you're on KDE. You can find the differ that you like and then set `git` to use it as a default.

Then we're back to using a GUI, which is what I was arguing for in the first place lol

True, but a graphical differ I'll concede could be easier to use than a CLI differ for some people. But a full graphical Git environment is horrible. I've seen my Windows-using colleagues use them. Shudder.

By the way, you might want to look at the `--word-diff` option.

No, it's faster using the keyboard to operate a GUI.

Git is one of my favorite tools of all time.

Not only has it been a time and life saver in my coding projects, it has also helped me out in other aspects - keeping versions of documents.

Once, at work, I created a script that would take regular git 'snapshots' of all the procedures defined on a SQL Server. O'course, there could have been better code management and deployment practices in the first place, but this script let me introduce some sanity without affecting the DBAs' workflows.
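A script along those lines could look roughly like the sketch below. This is not the original script: how the definitions are pulled out of SQL Server is left out entirely (the `procedures` dict is a stand-in for whatever query you use), the function names are made up, and it assumes `repo_dir` is already an initialized git repository with `git` on the PATH.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def write_procedures(procedures, repo_dir):
    """Write each definition to <repo_dir>/<name>.sql and return the paths written.

    `procedures` maps procedure name -> SQL text; fetching it from the
    server is outside this sketch.
    """
    repo = Path(repo_dir)
    repo.mkdir(parents=True, exist_ok=True)
    written = []
    for name, definition in sorted(procedures.items()):
        path = repo / f"{name}.sql"
        path.write_text(definition, encoding="utf-8")
        written.append(path)
    return written


def commit_snapshot(repo_dir, run=subprocess.run):
    """Stage everything and commit only if something changed.

    Returns True if a commit was made. `run` is injectable so the git
    calls can be replaced in tests.
    """
    run(["git", "add", "-A"], cwd=repo_dir, check=True)
    status = run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    if not status.stdout.strip():
        return False  # nothing changed since the last snapshot
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%SZ")
    run(
        ["git", "commit", "-m", f"Snapshot of stored procedures at {stamp}"],
        cwd=repo_dir, check=True,
    )
    return True
```

Run from a scheduler, each run becomes one commit when anything changed and a no-op otherwise. One gap worth noting: this version never deletes the file of a procedure that was dropped on the server.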

Of late, I have been working on a pure Python implementation of Git (calling it Pigit [1]). While I am attempting to clone git's behavior in Python, I believe there are important lessons to be learned about the different components involved. I am also attempting to see if I can make it modular enough to have a changeable 'reference' or 'object' store (instead of the file system). One use case I can think of, off the top of my head, is having an entirely in-memory repository for testing scripts or purely for pedagogical reasons.

Strangely, I never thought of looking into the AOSA Book for Git! Thanks for the reference :)

[1] https://github.com/rajatkhanduja/pigit
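For anyone curious what an in-memory object store might look like, here is a minimal sketch. It is not taken from Pigit, and the class and method names are invented for illustration. The one thing it borrows from git itself is the object ID scheme: SHA-1 over a `"<type> <size>\0"` header followed by the content, with the result stored zlib-compressed.

```python
import hashlib
import zlib


class MemoryObjectStore:
    """Content-addressable store that keeps git-style objects in a dict."""

    def __init__(self):
        self._objects = {}  # object id (hex SHA-1) -> zlib-compressed bytes

    def put(self, kind, data):
        """Store `data` (bytes) as an object of type `kind`; return its id."""
        header = f"{kind} {len(data)}".encode("ascii") + b"\x00"
        raw = header + data
        oid = hashlib.sha1(raw).hexdigest()
        self._objects[oid] = zlib.compress(raw)
        return oid

    def get(self, oid):
        """Return (kind, data) for a stored object id."""
        raw = zlib.decompress(self._objects[oid])
        header, _, data = raw.partition(b"\x00")
        kind, size = header.decode("ascii").split(" ")
        if int(size) != len(data):
            raise ValueError(f"corrupt object {oid}")
        return kind, data


if __name__ == "__main__":
    store = MemoryObjectStore()
    oid = store.put("blob", b"hello world\n")
    print(oid, store.get(oid))
```

Swapping the dict for files under `.git/objects` (or anything else with put/get) is the kind of pluggability described above; the rest of the code only ever sees object IDs.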

Nginx was the first C code base I've read that was well written besides being well architected. Now I would never recommend anyone to write C, but if you really want to know what it's supposed to look like take a look at Nginx.

> Nginx was the first C code base I've read that was well written besides being well architected.

PostgreSQL is also extremely nice.

Admittedly, SQLite has to do a lot less than Postgres, but I'll throw their hat in the ring if only for the rigor with which they test their code-base[1]. My visceral, completely irrational original gut reaction to that page was "meh, they're likely just generating the set of all permutations on likely-injectable strings-- no formal verification, no care".

Thankfully, I went for a coffee, came back and the tab was still open for a more complete read. I've worked on tons of projects in my career -- from healthcare to finance, large insurance companies and defense. I really think SQLite is the definitive case study on how to test your code-base in a "meaningful" (where "meaningless" would be generating the aforementioned permutations) manner. As new methods of testing emerge, they expand the scope of their coverage as well (i.e., AFL and other fuzzing techniques which emerged circa 2014).

Hats off to their team for the discipline. If I ever need someone to manage taking a pacemaker to production, I'd hire whoever was in charge of their testing, give them access to Fidelity Payroll, hand them the p-card, and tell them "whatever you need, it's yours, buddy".

[1] https://sqlite.org/testing.html

Are you aware that your comment contains a bit of bragging that might distract people from focusing on the content?

To me, being generous with the interpretation, it read as the author providing some context into their experience with what many think should be well tested systems, not bragging.

That is a great link! Thanks!

Meta: So far (23 top level answers) we have: Nginx, Git, Guava, Photoshop (?!), Discourse, OpenBSD (and: other BSDs, Plan9, BSD tools, Linux, LLVM/Clang, WebKit, Chrome, Firefox, Quake 1-3, Doom 3, CPython, TensorFlow), Hashicorp's tools, Redis, Mysql, Postgresql, Apache HTTP server.

Wasn't the question about business applications?

Redeeming answers: ERPNext, Odoo, OpenERP, OpenEMR

I have worked with Odoo. Although it is redeeming, I don't think it's a well-architected app. The system is extremely confusing to customize past the online tutorials.

"Wasn't the question about business applications?"

Define "business application".

To me, it would be defined as "application used in a business" (EDIT: or more precisely: "application upon which a business is built"). Web servers, databases, and (maybe) operating systems definitely qualify. So do web browsers.

Your definition is not the common one, I don't think.

Probably not. Nonetheless, "business application" is still pretty vague, IMO.


GitHub the application is not open source.


There's been a good deal of academic work on architectural differences between open source and closed source applications (basically resulting from the differences in the organizational structures that designed/built/grew them ala Conway's Law). Observations for example include reports that closed source applications tend to have more large scale API classes/layers, because there is a management structure in the designing organization that can herd them into existence, while open source projects of the same size and complexity tend to have a less centralized architecture, again reflecting the organizing characteristics of the developers involved[0].

None of this is arguing that one or the other style of architecture is "better" per se, but rather the architectures are different because they were in the end optimized for different kinds of development organizations.

Most business applications remain fundamentally a three-tiered architecture, with the interesting stuff today tending to happen in how you slice that up into microservices, how you manage the front end views (PHP and static web apps are pretty different evolutionary branches), and critically how you orchestrate the release and synchronization/discovery of all those microservices.

(None of which is directly an answer to your question, but is more meant to say that lots of the most interesting stuff is getting harder to spot in a conventional github repository because much of it is moving much closer to the ops side of devOps)

[0] http://www.hbs.edu/faculty/Publication%20Files/08-039_1861e5...

I think reddit on GitHub is close to what you describe; many DevOps scripts are also in its repo: https://github.com/reddit/reddit

Some open source software is mostly developed by a single organization and may look more like closed source software.

I'm thinking of Odoo for example, almost exclusively developed by Odoo S.A. (not saying it's a particularly good codebase to look at though).

Do you have the name or a link to one of these studies?

I've updated the parent comment with a citation.

I'll also mention a somewhat related article here, not directly on topic, but likely interesting to those reading about Conway's law and architectures: Microsoft Research did some very interesting work on the interplay of code quality and organizational metrics (e.g. how high in the org chart do you have to go to get everyone who committed code to a specific DLL, or what fraction of the developers under that lead engineer committed code to that DLL, etc.). Their conclusion, simply put, was that organizational metrics appeared to better model actual end user experienced shipped code quality than more traditional test metrics[0].

[0] https://www.microsoft.com/en-us/research/publication/the-inf...

Check out ERPNext (https://GitHub.com/frappe/erpnext). It is based on a metadata framework (Frappe) that lets you build by configuration, so complexity can be handled much better.

Frappe also lets you build extensions (apps), add hooks to standard events, has a built-in REST API and more. Here is a quick overview https://www.slideshare.net/mobile/rushabh_mehta/frapp-framew...

Disclaimer: see my bio

Spree is an open source e-commerce solution. IMHO has good architecture for learning.

Spree has a clean API, clear models, front end and back end, extensions, and command line tools.


Especially take a look at the models:


a great app, thanks

http://www.aosabook.org/ - I actually did not get much out of this book, I felt my time was more efficiently spent studying languages and databases.

But this chapter is great: http://www.aosabook.org/en/500L/an-archaeology-inspired-data...

By business app, I'm interpreting this as something that might be a basis for writing an enterprise application, or an application that might be used by an enterprise, and not the infrastructure-type stuff I see posted below like NGINX, Git, etc...

Something that's expandable by multiple departments, expandable business-specific logic, modular, plug-in infrastructure, the ability to work with multiple authentication schemes, etc....

Take a look at Liferay Portal: https://github.com/liferay/liferay-portal/

Edit: fixed all my typos.

For an OSS business application, Rundeck (http://rundeck.org/) is very polished and has a clean architecture. The concepts for setting up jobs, schedules, ACLs, etc, is clearly thought out and flexible.

I love discourse and learnt a hell of a lot from it! It did however lead me to using ember in my open source project ( https://github.com/etewiah/property_web_builder ) which I rather regret.....

There is a nice site about this subject: The Architecture of Open Source Applications http://aosabook.org/en/index.html

Two I am familiar with are OpenERP and OpenEMR.

OpenERP, now Odoo, is written in Python.

OpenEMR is written in PHP. It dates from a while ago, but has been mostly updated to the latest PSR standards.

Might also try OrangeHRM, but not sure what those guys are doing these days.

I'm not exactly sure what is meant by business. Commercially successful?

Anyway, here are some projects which I can recommend for their source code:

* OpenBSD. Also the other BSDs. Plan9. And the BSD tools. Linux is a bit bloated but maybe it has to be. I don't recommend the GNU tools.

* LLVM/Clang.

* WebKit. Also Chrome. Firefox not so much, although maybe it improved.

* Quake 1-3, as well as other earlier id games. Really elegant and clean. Also not that big in total. Doom 3 has become much bigger in comparison but again maybe it has to be.

* CPython. Anyway interesting also for educational purpose.

* TensorFlow. Very much not Theano.

I really enjoy reading the source code of most projects which I used at some point. Some code is nicer, some not so nice, mostly judged by how easy it is to understand and how elegant it seems to be. In any case it really is rewarding to look at it as you will gain a much better understanding of the software and often you will also learn something new.

My understanding of "business application", in short: software with a specific business purpose.

I think business software here means software like: https://github.com/reddit/reddit, https://github.com/circleci/frontend, ERP, e-commerce and so on, as opposed to general-purpose software like libraries, frameworks, language compilers, databases, and OSes.

Amongst the other great suggestions you could also have a look at Redis (https://redis.io)

One of my favorite "open source architecture" essays is on Graphite:


It's part of the book, "Architecture of Open Source Applications", which has many such essays. This one is freely available -- and quite good.

Graphite is used for the business purpose of simple & fast real-time analytics for custom metrics inside an organization. It was built inside Orbitz and is now widely used at many startups, including my own.

Graphite is now a vibrant open source project with a community around it here:


Graphite has long ago hit the end of its useful life. Its limitations especially in the Whisper/RRD storage system are well known and lamented by those who still need to run it. A strictly-superior system (and a great OSS project to learn from, to boot) is Prometheus, https://prometheus.io.

Re: "This one is freely available -- and quite good."

All of the chapters/essays in the "Architecture of Open Source Applications" volumes 1 and 2 are freely available, in case that's not clear on the AOSA website: http://www.aosabook.org/en/

Applications don't really need to be well architected until they are hitting scale. Then the parts of their system that need to relieve pressure will need to be re-architected. This is almost like a case study, and there are a lot of good talks on YouTube from places like Dropbox and Facebook that explain the problem and solution. Example: https://www.youtube.com/watch?v=PE4gwstWhmc

If you don't want to do YouTube case studies, there are also books to read about distributed systems. Reading about cloud architecture can also help.

> Applications don't really need to be well architected until they are hitting scale.

Very true: a system that is "well architected" before hitting scale is considered over-engineering.

> Then the parts of their system that need to relieve pressure will need to be re-architected.

> This is almost like a case study and there are a lot of good talks on youtube from places like dropbox and facebook that explain the problem and solution. Example: https://www.youtube.com/watch?v=PE4gwstWhmc

As far as I know, SQLite has the reputation of being great (mostly for the test coverage and sheer amount of unit tests).

From a usability and installation experience, Hashicorp's tools. One very small executable for each of their products that works as the client or server, a simple command to join them in a cluster, and reasonable defaults. The ones I've used work well together.

The learning curve to go from I've never heard of them to reading about them, to installing them and using them was very small at least for Consul, Nomad, and Vault.

Check out ERPNext, written in Python: https://erpnext.com

For iOS engineers, I'd recommend reading over the Kickstarter iOS application (https://github.com/kickstarter/ios-oss).

They use a lot of interesting stuff, like FRP, lenses, etc.

Artsy has a bunch of Open Source applications that are interesting to check out, especially for those interested in mobile apps https://github.com/artsy

I find Apache Spark (written in Scala) to be exceptionally well written and easy to read. https://github.com/apache/spark

Photoshop: http://www.computerhistory.org/atchm/adobe-photoshop-source-... Not open source but one of the most commercially successful and one of the best architected. Original source code now available via Computer History Museum.

Airbnb Superset. It's not mature yet, but it's enterprisey enough and the code is clean.

If libraries count, Google Guava has some of the most impressive code quality I've ever seen

PostgreSQL and Apache HTTP server.


I would say:



Has anyone mentioned SQLite?

Shameless plug, but Bahmni and the Go CD open source projects.


"On January 16, 2008, MySQL AB announced that it had agreed to be acquired by Sun Microsystems for approximately $1 billion"


Edit: sorry, missed the question entirely. I thought OP said "open-source businesses worth studying"
