Veracity: The next step in DVCS (ericsink.com)
84 points by io on July 14, 2010 | 42 comments

Some of this is really interesting, other parts not so much.

Versioning directories, pluggable storage layers, decentralized databases...these are interesting, but they have been tried and have not so far proved compelling. I have not seen projects move to Bazaar or Monotone because they were fed up with Git and Mercurial not versioning directories, nor have I heard of people picking Mercurial or Subversion because of their pluggable storage systems, nor do I see people choosing Fossil and Monotone because they have distributed databases. It's not that these aren't good features--some of them, like properly handling directories and renames, are (IMVHO) definite improvements over Git and Mercurial--but rather, they are not by themselves enough to make people switch from what they're using.

The real kicker for me is simply the license. All of the major DVCS players right now are GPL licensed. While Git's been pretty accepting of non-GPL compatible implementations (e.g., JGit, Dulwich), the Mercurial team has indicated that it would view any such project in a very dim light, and I can't conceive of anyone wanting to tempt the FSF by trying that kind of thing with Bazaar. A well-implemented, closed-source-friendly DVCS could very easily see sharp and immediate uptake among tool companies. That could completely change the game for corporate shops.

GPL covers code. If you take code from Git to write an alternative implementation, yours has to be GPL.

However, if all you do is take the file formats and the over-the-wire protocol, and write original code, it can be under whatever license you want. That seems to be what most of the alternative Git implementations have done.

The Git repository format is remarkably simple. I'm confident that if for some reason all copies of Git disappeared (binaries and source) and I needed to keep working with my existing Git repositories, I could cobble together something that would get me basic DVCS functionality in a couple of weeks.

By basic functionality, I mean the ability to check out revisions, view history, make changes to the working copy and commit them, and to tag, branch, and merge. (Heck, it took Linus only a couple of weeks or so to write the thing once he decided to go for it, and he didn't have the benefit of a well-designed and documented repository format and wire protocol handed to him.)

Any decent programmer could do the same or better. Thus, I don't think Git being GPL is any significant barrier to adoption by people who need a good DVCS in a closed source project.
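To make the "remarkably simple" point concrete, here is a minimal Python sketch of how Git names and stores a single loose object: the object ID is the SHA-1 of a "<type> <size>\0" header followed by the raw content, and the loose file is that same byte string zlib-compressed under .git/objects/<first 2 hex chars>/<remaining 38>. It only covers loose objects; trees, commits, packfiles, and the wire protocol are left out, and the function names are made up for illustration.

```python
import hashlib
import zlib


def _object_bytes(content: bytes, obj_type: str) -> bytes:
    # Git prefixes every object with "<type> <size>" and a NUL byte.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return header + content


def git_object_id(content: bytes, obj_type: str = "blob") -> str:
    """Return the SHA-1 name Git gives an object (hash of header + content)."""
    return hashlib.sha1(_object_bytes(content, obj_type)).hexdigest()


def loose_object_bytes(content: bytes, obj_type: str = "blob") -> bytes:
    """Return the zlib-compressed bytes Git writes for a loose object."""
    return zlib.compress(_object_bytes(content, obj_type))


def loose_object_path(object_id: str) -> str:
    """Loose objects are fanned out by the first two hex digits of their ID."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


if __name__ == "__main__":
    oid = git_object_id(b"hello world\n")
    print(oid)
    print(loose_object_path(oid))
```

The printed ID should match what `echo 'hello world' | git hash-object --stdin` reports.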

While I am keenly aware that the GPL only covers code, some major contributors to the Mercurial team have indicated that they do not believe it would be possible to make a compatible implementation without consulting Mercurial's source code. Whether you agree or not is immaterial at that point; why bother doing so when it will at best alienate the community, and at worst end in a messy lawsuit?

I'd be curious why they think consulting the Mercurial source code would make the compatible implementation have to be GPL.

Of course, rather than just go ahead and look and then write a compatible implementation from that knowledge, if I needed such a thing in a proprietary product, I'd pay a third party to read the Mercurial source and publish a specification of the file format, protocols, and rules for manipulating the repository (e.g., rules for locking things, and such). Then I'd implement from that specification.

That should avoid any confrontation.

There seem to be quite a few open source or free software projects where the developers think the license has more power than it really does. The reality is that all open source or free software licenses, even the more restrictive ones like GPL, will in fact let people do some things with one's software that one might not like. That's because these licenses are all based on relaxing the exclusive rights of the copyright holder. Unlike the typical proprietary software license, they aren't a mix of relaxing rights of the copyright holder and restricting rights of the user.

Hence, if someone, including proprietary software companies, wishes to do something with the code or the knowledge embedded therein, and what they want to do is something that doesn't require permission under copyright law, then they are free to do it.

Personally, this is one of the reasons I have used the two-clause BSD license for my recent works. People are going to find ways to legitimately do things I might not like. Instead of going with a restrictive free license like the GPL, fantasizing that nothing bad can be done with my code, and then being disappointed and pissed off when someone does it anyway, better to just accept that the price of giving users freedom is that they might do things I don't like, and not worry about it. Hence, BSD--the release-and-forget-about-it license.

What I find kind of funny (hilarious, actually) is that the kind of protection that some developers wish to achieve IS actually available--via patents! The Mercurial developers can get what they want out of GPL by patenting their repository format and their protocols, and then making a free patent license available to implementations that use GPL.

> nor do I see people choosing Fossil and Monotone because they have distributed databases

Monotone (and I believe Fossil) only version files/directories, and just happen to use SQLite as a local data store instead of rolling their own disk format. Veracity on the other hand, sounds like it allows simple databases as versioned objects in addition to directory trees. That's completely different and sounds very cool.

I'm also not sure how necessary it is. For example monotone allows you to configure custom merger programs (intended for if you don't like KDiff3/Meld/etc) and to set a file attribute that prevents it from trying to use the internal line merger, so in theory you could just commit your database file (or disk image containing database files, or ...) and then write a custom merge program for them. But maybe what Veracity has will delta-compress better, and of course you'd get a merger that's already written.

I don't know what qualifies as a "major DVCS player", but fossil is BSD licensed: http://methodlogic.net/BSDFossil.html

I'm actually far more interested in Fossil since it became BSD-licensed, but it's not nearly as widely used as the other three. The biggest projects I know of that use Fossil are SQLite, Lamson, Librelist, and Fossil (natch). Git's major projects are things like the Linux kernel, X, and Rails; Mercurial's include Firefox, OpenJDK, and Xen; and Bazaar's users include Emacs and MySQL. All three of those also have large commercial hosting options that are pretty heavily used (e.g., GitHub, Gitorious, Google Code, Bitbucket, Kiln, Launchpad), whereas Fossil currently lacks anything similar.

I'm not saying Fossil is bad for not being widely used; just that it's definitely not yet as popular as Git, Hg, and Bzr.

Emacs is de jure using Bazaar, but is it using it de facto? I read somewhere (I think it was here on HN) that the Emacs developers were leaning toward Git, but RMS decided on Bazaar because that's the official GNU DVCS. The comment or article I read said that someone made a good Bazaar/Git bridge, and most of the active Emacs developers are actually using Git, and bridging to Bazaar when they want to push to or pull from the official repository.

That's true to a large extent as far as I can tell. I've seen reports on emacs-devel of commits taking a couple of minutes, and of having merge issues you'd never encounter in Git.

Bazaar wasn't chosen for Emacs for technical reasons, and they appear to be suffering the consequences of that.

No question it's not as popular. A few folks (myself included) are working on using it for the NetBSD src, though, fwiw. If we get that up and running and blessed (by NetBSD, who are searching for CVS alternatives), it'd be a nice feather in the cap for fossil.

A DVCS that can handle enterprise requirements: very interesting. But let's see how it compares to git/hg/bzr in a week :) The pluggable storage layers sound like something I would use in my company.

Well, I guess it depends on how they implement the "database versioning", but if I had a DVCS that would version a sqlite database inside a file (merge support, all the trimmings) in addition to the code surrounding that file, I'd switch away from git/svn in an instant.

I bet pluggable storage layers would be really useful for building something like GitHub. Start with one of the builtin storage layers, and if scale puts strain on that, write a better storage layer.

GitHub more-or-less did that when they switched off of Engine Yard (where they were using GFS), but they had to roll their own with RPC calls. That also meant that they had to change Grit, their Ruby git library, to split stuff that had to get executed on the fileservers (where the actual repos are stored) and stuff that can be executed on the server that wanted the information. With pluggable storage layers, they could have changed how they stored the repos without changing their bindings at all.
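A rough idea of what a "pluggable storage layer" means in practice: the rest of the VCS talks to one small interface, and backends can be swapped or migrated behind it. The Python sketch below is hypothetical throughout: ObjectStore, MemoryStore, DirectoryStore, and copy_missing are invented names, the SHA-256 content addressing is an assumption, and none of it reflects Veracity's actual storage API or GitHub's setup.

```python
import hashlib
import os
import tempfile
from abc import ABC, abstractmethod


class ObjectStore(ABC):
    """Hypothetical storage-layer interface; callers never see the backend."""

    @abstractmethod
    def put(self, data: bytes) -> str:
        """Store data and return its content address."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Return the bytes previously stored under key."""

    @abstractmethod
    def has(self, key: str) -> bool:
        """Report whether key is present."""


def content_key(data: bytes) -> str:
    # Identical bytes get the same key in every backend.
    return hashlib.sha256(data).hexdigest()


class MemoryStore(ObjectStore):
    """Keeps objects in a dict; handy for tests."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = content_key(data)
        self._objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def has(self, key: str) -> bool:
        return key in self._objects


class DirectoryStore(ObjectStore):
    """One file per object under root, fanned out by the first two key chars."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _path(self, key: str) -> str:
        return os.path.join(self.root, key[:2], key[2:])

    def put(self, data: bytes) -> str:
        key = content_key(data)
        path = self._path(key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
        return key

    def get(self, key: str) -> bytes:
        with open(self._path(key), "rb") as f:
            return f.read()

    def has(self, key: str) -> bool:
        return os.path.exists(self._path(key))


def copy_missing(src: ObjectStore, dst: ObjectStore, keys: list[str]) -> int:
    """Copy the objects dst lacks (e.g. when migrating backends); return the count."""
    copied = 0
    for key in keys:
        if not dst.has(key):
            dst.put(src.get(key))
            copied += 1
    return copied


if __name__ == "__main__":
    mem = MemoryStore()
    k = mem.put(b"some file contents")
    with tempfile.TemporaryDirectory() as tmp:
        disk = DirectoryStore(tmp)
        print(copy_missing(mem, disk, [k]))
        print(disk.get(k) == mem.get(k))
```

Because callers only depend on ObjectStore, moving repositories to a different backend is a copy_missing call rather than a change to every library that reads them, which is roughly the benefit the comment above describes.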

Agreed. From the description, it sounds like it would be best marketed to enterprise and bigger businesses. Kind of like how they encourage SourceSafe users to migrate to Vault, they could work on migrating TFS/ClearCase users to Veracity.

I think if they aimed it at a git/hg/bzr replacement for OSS projects, they'd fail mightily.

Either way, they have their work cut out for them.

"The core of Veracity will be open source, but we do plan to sell add-on [proprietary] products built on the core."

When companies do this, doesn't the open source product run the risk of becoming a second class citizen, or even a crippled one with features being implemented in the proprietary product and later migrated downstream or not at all? Isn't this a recipe for conflict of interest?

That depends on whether the products they build are "add-ons" or complementary products.

If they're building add-ons (e.g., apart from authenticated users, premium members can make groups or something like that), then yes, you're getting a conflict of interest. If they build a commercial GUI application, or an integrated graphical differ, then it will only help develop the DVCS, because the ecosystem around it will be larger.

It's looking similar to fossil (http://www.fossil-scm.org) to me... Though the description isn't very deep.

If it is fossil with an easy to use interface that would be a definite value-add!

Eric is a very smart guy, so I'll be interested to see this released. The pluggable storage layers seem like a feature that might make it a lot slower than git, since I believe git uses a lot of file-level I/O tricks for speed.

I'm sure it'll be slower than Git--literally every DVCS I know of is, because Linus, as a kernel hacker, built Git around things he knew would be very fast--but precisely for that reason, you can be tremendously slower than Git and still be extremely fast. Mercurial falls into that category: merely starting Mercurial can take longer than some Git operations take to complete, but in practice that means those operations take a whopping 150ms on the Mercurial side. Given that people generally perceive operations that take less than 250ms as instantaneous, that pretty much literally doesn't matter. While I have no idea whether Veracity will get that close to Git's speed, there's no reason it can't if they've written it well.

Seems to me stronger user accounts could be bolted onto an existing DVCS quite easily. The hard part would be working out the PKI, but adding digital signatures to commit objects would be easy.

PKI probably would need to be pluggable to meet the variety of project needs out there. I'd imagine enterprise projects would use a corporate CA, some small startups or open source might be comfortable with a quick-start "ring-of-trust" distributed scheme, etc. Github and other hosting providers could offer CA services. Interesting way to prove code ownership in any case.

This is actually one of the two things about this product I find most interesting. Having a user-friendly public key system is really tough to get right, even when you're targeting programmers. Look at Monotone, for example: I'd argue they've basically got it right now, but it took them a long time to come up with sane ways for handling things like a user losing their keyring. Veracity will have to get this right from the get-go if they want to nail the enterprise market.

I think "bolted on" would precisely be the issue. That tends to translate into "bypassable" and "brittle".

That's great. At my last job I used Vault as a replacement for VSS (crap) and found it to be a polished product. I didn't like the dependency on SQL Server, but it is robust. I am glad that they are open sourcing this, as more options for the enterprise and business in general are good.

I'm using git now and it is leaps and bounds better than anything else I have used. (I like Mercurial too but the branching as a clone is somewhat of a deal breaker for me. I know that there are extensions to do this but I like the out of the box philosophy of git.)

> I like Mercurial too but the branching as a clone is somewhat of a deal breaker for me.

Not sure what you mean: hg branch Branchname creates a branch in the repo.

Ah yes, you are completely right about the ability. Back when I was looking into Mercurial vs. Git, some of the documentation I found made it seem like cloning to create a branch was the preferred way to do things in HG land. Not sure if that was the case, but it was my impression at the time at least. Fuzzy recollection mixed with not using it in a while led to that statement. Cheers.

The launch of this product makes a lot of sense considering Eric Sink's earlier writings on the topic of DVCSs. For example, in one article, he talks about the diverging trends of Enterprise and Open Source, leading to Enterprise finding it hard to accept a DVCS. Well worth a read: http://www.ericsink.com/articles/vcs_trends.html

Looks fascinating. One thing that jumps out at me as a git user is the directory/rename tracking.

Directory tracking seems like a great idea.

Rename tracking on the other hand seems like a fool's errand since there is no sane universal definition of what constitutes a rename; I think what git does with superficial heuristic hinting at the UI level is really the best thing there.
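For context on the "heuristic" part: Git does not record renames at all; it infers them after the fact by pairing deleted and added files whose contents are similar enough (50% by default for git diff -M). The Python sketch below only illustrates that idea. It is not Git's algorithm: difflib's SequenceMatcher ratio stands in for Git's own similarity score, and detect_renames is a hypothetical helper.

```python
from difflib import SequenceMatcher


def detect_renames(
    deleted: dict[str, str], added: dict[str, str], threshold: float = 0.5
) -> dict[str, str]:
    """Map old path -> new path for pairs whose contents are similar enough.

    deleted and added map path -> file contents. Each path is used in at
    most one pair, and the most similar pairs are claimed first.
    """
    candidates = []
    for old_path, old_text in deleted.items():
        for new_path, new_text in added.items():
            score = SequenceMatcher(None, old_text, new_text).ratio()
            if score >= threshold:
                candidates.append((score, old_path, new_path))
    candidates.sort(reverse=True)  # best-scoring pairs first

    renames: dict[str, str] = {}
    used_new: set[str] = set()
    for _score, old_path, new_path in candidates:
        if old_path not in renames and new_path not in used_new:
            renames[old_path] = new_path
            used_new.add(new_path)
    return renames


if __name__ == "__main__":
    deleted = {"src/util.py": "def add(a, b):\n    return a + b\n"}
    added = {"lib/util.py": "def add(a, b):\n    return a + b\n\n", "README": "hello"}
    print(detect_renames(deleted, added))
```

Whether a given pair counts as a rename depends entirely on the threshold argument, which is the commenter's point: the answer is a tuning choice rather than a fact stored in the history.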

I could come at that from the opposite direction: maybe getting renames perfect is impossible, but the incredible slowness of asking Git for a file's history makes it utterly unusable in some contexts, whereas Mercurial/Subversion/Bazaar's rename tracking makes the same operation lightning quick. Perfection may be out of reach, but if rename tracking can get good enough, the reward is tangible.

It's possible, but I've seen rename tracking turn into a debacle.

Personally I don't see the use for it. If you want to retain history, then merge the file(s) to a different directory/name. More often than not that works well enough, whereas proper rename tracking can become a rabbit hole of edge cases, and if you miss even a few you can hose your users.

Does anybody know what the performance will be like w.r.t. large binary assets? Git and Mercurial are notoriously handicapped there.

Ooh! Versioning the data in databases!

That is a feature I'm looking forward to. Rails migrations alleviate some of the traditional pain here, but don't go very far on actual data. Having a database you can check out with real versioning could be a really, really compelling feature.

Unfortunately, that isn't what Veracity is going to do; see the comments from Eric Sink himself elsewhere in this discussion.

This sounds a bit like fossil, what with storing stuff in a DB and syncing between databases.

I've been waiting for someone to step into the database schema versioning space for years now. It would be very interesting to see how this works in Veracity after rolling and maintaining my own solution for years.

And who's this "Vera"? (-:

I may be mistaken, but I don't think that he's talking about database schema migration, but that the VCS itself has a versioned database for any extra meta data that isn't really code, but should go along with it. Mercurial 1.6 added something very similar in a key-value store, although I haven't tried it out yet.

"Veracity goes beyond versioning of directories and files to provide management of records and fields, with full support for pushing, pulling and merging database changesets, just like source tree changesets."

Okay, I re-read that three times, and I still parse that as some form of database versioning (vs. a database where versioning data is stored)... I mean it talks about records and fields, etc. I guess we'll have to wait a bit to see what's supported.

On the other hand, tghw's interpretation could be very on the mark as well (e.g. simple meta data that goes alongside the sourcecode).

tghw's interpretation is closer to being correct.

A Veracity repo can have lots of DAGs (directed acyclic graphs).

A DAG in a repo can either be a "tree dag" or a "database dag".
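Merging in any of those DAGs, tree or database, needs a common ancestor to compare both sides against. The Python sketch below shows one standard way to find it, the "best common ancestor" definition that git merge-base documents: a common ancestor that is not itself an ancestor of another common ancestor. Nothing in this thread says how Veracity computes this; parents is a hypothetical dict mapping each node ID to its parent IDs.

```python
def ancestors(parents: dict[str, list[str]], node: str) -> set[str]:
    """All nodes reachable from node via parent links, including node itself."""
    seen: set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def merge_bases(parents: dict[str, list[str]], a: str, b: str) -> set[str]:
    """Common ancestors of a and b that no other common ancestor descends from."""
    common = ancestors(parents, a) & ancestors(parents, b)
    anc = {c: ancestors(parents, c) for c in common}
    return {
        c for c in common
        if not any(other != c and c in anc[other] for other in common)
    }


if __name__ == "__main__":
    # A <- B, then two branches C and D off B.
    history = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}
    print(merge_bases(history, "C", "D"))
```

A criss-cross history can legitimately produce more than one merge base, which is why the function returns a set.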

Veracity's notion of a "database" is a model that is unique to Veracity, although its concepts are quite common. Records. Fields. Multiple record types. Simple constraints. Links between records. Simple queries. It's not SQL.

Nothing here implies that Veracity will help you, as a database developer on SQL Server or Oracle, manage the versioning of your schema and sprocs and so on. Those are interesting features, but they're distinct from (and a lot higher level than) the stuff I was describing.
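As a rough illustration of what "merging database changesets" over records and fields could involve, here is a field-by-field three-way merge of a single record in Python. This is a hypothetical sketch, not Veracity's algorithm: merge_record and the dict-of-fields record shape are assumptions, and a real implementation would also have to handle links between records and the constraints Eric mentions.

```python
from typing import Any, Optional

Record = dict[str, Any]


def merge_record(
    base: Optional[Record], ours: Record, theirs: Record
) -> tuple[Record, list[str]]:
    """Three-way merge of one record, field by field.

    A field changed on only one side takes that side's value; the same change
    on both sides is accepted; different changes on both sides are reported
    as conflicts, with our value kept as a placeholder.
    """
    base = base or {}
    missing = object()  # sentinel: field absent (distinct from a stored None)
    merged: Record = {}
    conflicts: list[str] = []
    for field in sorted(set(base) | set(ours) | set(theirs)):
        b = base.get(field, missing)
        o = ours.get(field, missing)
        t = theirs.get(field, missing)
        if o == t:
            value = o  # both sides agree (or neither changed it)
        elif o == b:
            value = t  # only theirs changed
        elif t == b:
            value = o  # only ours changed
        else:
            conflicts.append(field)
            value = o
        if value is not missing:  # a one-sided deletion drops the field
            merged[field] = value
    return merged, conflicts


if __name__ == "__main__":
    base = {"title": "Fix login", "status": "open", "owner": "amy"}
    ours = {"title": "Fix login bug", "status": "open", "owner": "amy"}
    theirs = {"title": "Fix login", "status": "closed", "owner": "bob"}
    print(merge_record(base, ours, theirs))
```

Working at field granularity means two people editing different fields of the same record never conflict, which is the main advantage over treating a database file as one opaque blob.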

I am very interested in that "database dag". If it works the way I think it does, I think it will be far more significant than the source control part of the product. For example, I'd like to be able to put together a decentralized knowledge base. Who knows, we may see http://redmonk.com/sogrady/2010/05/04/open-data-github/ sooner rather than later.

Hi Eric, love your ambitions. Why did you feel the need to put something so database-y right into the repo and go beyond file/directory versioning and key,value tags?

Thanks for clarifying!

There was one key phrase there that makes me think it will be worth looking at: "We are dogfooding Veracity here at SourceGear"
