OpenBSD Crossed 400k Commits (marc.info)
154 points by fcambus 6 days ago | 44 comments





Anyone know of projects with >1M commits? >10M? >100M?

KDE reached 1 million commits ~10 years ago

https://dot.kde.org/2009/07/20/kde-reaches-1000000-commits-i...


KDE used to be a monorepo, right?

Yes. It used to be, until they started moving to git at around the same time as the news article above.

Well, with Subversion you can sort of have different repositories in one repository... I think just about nobody had a checkout of the whole monorepository (it was too large), so changes in several parts of the stack (e.g. to a public API) were not perfectly synchronized: such changes were made as one check-in per sub-repository rather than as a single atomic commit.

The Linux kernel is quickly approaching the 1M mark, currently sitting at 871k+[1]

[1] https://github.com/torvalds/linux


Note that this number doesn't include any pre-git history -- all of those commits were made after April 2005 (v2.6.12-rc2 is the first commit within git).

The initial commit the parent comment is referring to:

https://github.com/torvalds/linux/commit/1da177e4c3f41524e88...


It also has 622 releases, so on average about 1,400 commits make it into a single release. And those releases include RCs, not just version bumps; a single version bump has well over 10,000 commits on average.
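The per-release average above can be checked directly with `git rev-list --count` between two release tags (in a kernel checkout, something like `git rev-list --count v5.3..v5.4`). A minimal sketch, using a throwaway repo with illustrative tag names rather than a real kernel tree:

```shell
#!/bin/sh
# Demo: count commits between two "releases" with git rev-list --count.
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q
git config user.email demo@example.com
git config user.name demo
# Three commits, then tag the first "release"
for i in 1 2 3; do echo "$i" > file; git add file; git commit -qm "commit $i"; done
git tag v1.0
# Two more commits, then tag the next "release"
for i in 4 5; do echo "$i" > file; git add file; git commit -qm "commit $i"; done
git tag v1.1
# Commits that landed between the two releases:
git rev-list --count v1.0..v1.1   # prints 2
```

Summing that count over consecutive tags and dividing by the number of tags reproduces the crude per-release average the comment describes.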

Hasn't the rate of development grown too over the years? It's not going to be an even distribution.

Greg KH gives a pretty good talk about the rate of kernel development fairly regularly[1]. But yes, it's been consistently getting faster every kernel release.

[1]: https://www.youtube.com/watch?v=vyenmLqJQjs


Absolutely. It's a crude estimate. If anything, recent releases have even more commits each. The point is that it's impressive that so many changes go into a single release of something as critical as the Linux kernel.

The number of commits depends heavily on the people in a project and the team's workflow. I commit every 5 to 10 minutes, so I make 30-70 commits per day[1], while some people on another team make one commit per day. Linux on GitHub is close to 900k[2].

[1] https://imgur.com/a/nCuOJHU

[2] https://github.com/torvalds/linux
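Per-author, per-day counts like the ones the comment cites can be pulled from `git log` with `--author` and `--since`. A sketch in a throwaway repo (the email address and commit count are illustrative; in a real repo you would use your own identity):

```shell
#!/bin/sh
# Demo: count today's commits by one author.
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q
git config user.email me@example.com
git config user.name me
# Make four commits "today"
for i in 1 2 3 4; do echo "$i" > f; git add f; git commit -qm "c$i"; done
# Commits by this author since midnight (counts 4 here)
git log --author=me@example.com --since=midnight --oneline | wc -l
```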


On our team it's even lower. We rebase and squash to one commit before merging, so a single commit can represent a couple of days or even a week of work.
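The squash-before-merge workflow described above can be sketched with `git merge --squash` (branch names here are illustrative): several WIP commits on a feature branch collapse into a single commit on the base branch.

```shell
#!/bin/sh
# Demo: squash a feature branch into one commit on the base branch.
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q
git config user.email demo@example.com
git config user.name demo
echo base > file; git add file; git commit -qm "base"
base=$(git symbolic-ref --short HEAD)   # master or main, depending on git version
git checkout -qb feature
for i in 1 2 3; do echo "$i" >> file; git add file; git commit -qm "wip $i"; done
git checkout -q "$base"
git merge --squash -q feature           # stages the combined diff, no commit yet
git commit -qm "feature, squashed into one commit"
git rev-list --count HEAD               # prints 2 (base + one squashed commit)
```

Many teams get the same effect server-side with a "squash and merge" button, but this is what it amounts to locally.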

FreeBSD is getting close: 513k ports commits, 353k src commits, 53k doc commits. I'm guessing another year to get from 919k up to 1M?

Chromium recently crossed a million bugs. And if you count Chromium and Blink together (they've been in one repository for quite some time now), it's probably over a million commits too.

https://www.openhub.net/p/chrome says 810K commits, and Blink is 182K commits (https://www.openhub.net/p/chromium-blink), so 993K total. But those counts are two months old, so they are probably over 1M now.

The Google monorepo had 35M in 2015: https://softwareengineering.stackexchange.com/questions/4143...

Not sure what Microsoft's monorepo for Windows is like but it might be near 100M.

Beyond that, I would guess some aerospace or DoD projects have larger codebases, but they probably don't use version control consistently.


I would be surprised if they had code bases that large.

https://www.f35.com/about/life-cycle/software

The F-35 has 8 million lines of code. An equivalent C++ project would be Qt (~8 million LOC and >100k commits).



The Apache svn repository, which includes all the various Apache projects, is approaching 1.9 million commits: http://svn.apache.org/viewvc

It’d be beautiful to see a plot of how that number of commits has accumulated over time

Two years ago I did something similar, plotting the surviving lines of code in the OpenBSD code base across commits:

https://twitter.com/mulander/status/809120593606049792


Nice plot. Interestingly, it seems that, relatively speaking, by far the most removed code dates from 1999. What did they add in that year that got removed later (around 2014)?

Not OP, and not 100% authoritative, but I can think of some things:

- adding/refactoring locking for improved SMP support

- dropping older architectures (VAX, etc.)

- dropping older protocols/servers (e.g. ISDN, DECnet sorts of things, obsoleted proto-IPv6 versions)

- dropping/refactoring systrace

- rewriting or dropping various network routing daemons (Apache httpd 1.3 was removed from the tree around this time)

- the LibreSSL replacement of OpenSSL around this time

see also: https://en.wikipedia.org/wiki/OpenBSD_version_history

Relatedly, in the OpenBSD world, Ted Unangst is well known for auditing and removing old/unused code, so much so that there's a slang verb 'tedu' (his handle; usage e.g. "it got tedu'd"), which basically means zapping old stuff. See the first comment in the Twitter thread.


Y2K mitigation code

Monotonic functions are boring to look at. I'd rather see its derivative, probably smoothed out a bit.

Casts exponential function.

you can use Gource to get a graph like this: https://youtu.be/iZjvVxbM3kY

400k CVS commits

It's also probably one of the oldest open source repositories. OpenBSD pretty much pioneered the concept of making their VCS open to the public over the Internet (hence the name).

Yes, OpenBSD invented anonymous CVS, which was the first way to access a version control system without prior authorization: http://www.openbsd.org/papers/anoncvs-paper.pdf

It was already ubiquitous in 1997 when I got started working on open source software, so I took it for granted. I was surprised to find out 20 years later how new anoncvs had been and how fast it spread to other projects like FreeBSD and Apache httpd.


Why is 400K special? Why not 524288, considering we're coders after all?

Because that's the number of commits they recently passed. What about being coders makes it more interesting for us to wait a few more years than to talk about it now?

;)


262144 then?

You downvoters are obviously not qualified to be hackers ... who uses decimal in a place called Hacker News?


[flagged]


I read it more like a quantum quandary: you never know exactly how many commits there are, but we celebrate one possible measure because it comes close and that's a landmark. One would think his statement also applies to himself:

>If you think you've got a great way of measuring, don't be so sure of yourself -- you may have overcounted or undercounted.

By his own admission, his own counting method is probably flawed.


> .. because yes the code quality is mostly very high

Because he is, actually, smart? I'm in a totally different field, devops (don't laugh :)), but I religiously follow their approaches to security and design in general.


Yes, he is. I complimented the code quality, and that doesn't come about by itself.

But "being smart" does not require you to be arrogant and condescending. Theo is, and seems to need to say "I AM SMART" all the time. Like "Hey, I found this thing where the manpage says X but look the implementation is actually Y. POSIX says X, so probably Y is a bug". Answer: "I AM SMART. I will fix this". Ooooo-kay. I didn't say you weren't smart.


That seems to me a healthy approach

He also said:

> That's a lot of commits by a lot of amazing people.


I didn't see that in his message. Why are you so bitter about one developer's perceived ego?

Imagine you're part of a development team where the tech lead in every thread says "I am the tech lead because I'm the smartest".

It's made me contribute less to OpenBSD than I otherwise would have (luckily other people are more welcoming, and Theo isn't a bottleneck on everything), and it's not just me. Other potentially good contributors have stayed away. Of course, some bad contributors have stayed away too.

The other bad aspect to arrogance is that it misses out on research the rest of the world has done, because you think nobody else can think. OpenBSD got W^X years and years after Linux (though OpenBSD got it by default first), because (in their own words) they don't look at what Linux is doing. OpenBSD missed out on W^X for years, and it took one more release for x86 to get it, because they said it couldn't be done (even though it worked just fine on Linux).

It looks like it was the same with the Intel branch-prediction bugs. They say they did a huge amount of research over weeks or months, and then just ended up with kernel memory maps containing... exactly what Linux chose. Why did they do this from scratch?

I wouldn't say I'm bitter, but resigned to just accept that OpenBSD's way of doing things misses out on exactly what they want to achieve because of this attitude.



