This is completely untrue. There is no way you could make a BK clone by telnetting to a BK server and running commands. Those commands don't tell you the network protocol; they show you the results of that protocol and give zero insight into the protocol itself.
Tridge neglected to tell people that he was snooping the network while Linus ran BK commands during a visit to his house. THAT is how he did the clone.
The fact that you all believe Tridge is disappointing; you should be better than that.
The fact that Tridge lied is disappointing but I've learned that open source people are willing to ignore morals if it gets them what they want. I love open source, don't love the ethics. It's not just Tridge.
> There is no way that you could make a BK clone by telneting to a BK and running commands. Those commands don't tell you the network protocol
The network protocol, according to multiple sources and the talk presented at LCA, was "send text to the port that's visible in the URL, get text back". The data received was SCCS, which was an understood format with existing tools. And the tool Tridge wrote, sourcepuller, didn't clone all of BitKeeper; it cloned enough to fetch sources, which meant "connect, send command, get back SCCS".
Anything more than that is hearsay that's entirely inconsistent with the demonstrated evidence. Do you have any references supporting either that the protocol was more complicated than he demonstrated on stage at LCA, or that Tridge committed the network surveillance you're claiming?
And to be clear, beyond that, there's absolutely nothing immoral with more extensively reverse-engineering a proprietary tool to write a compatible Open Source equivalent. (If, as you claim, he also logged a friend's network traffic without their express knowledge and consent, that is problematic, but again, the necessity of doing that seems completely inconsistent with the evidence from many sources. If that did happen, I would be mildly disappointed in that alone, but would still appreciate the net resulting contribution to the world.)
I appreciate that you were incensed by Tridge's work at the time, and may well still be now, but that doesn't make it wrong. Those of us who don't use proprietary software appreciate the net increase in available capabilities, just like we appreciate the ability to interoperate with SMB using Samba no matter how inconvenient that was for Microsoft.
Fascinating, I was unaware of that link (and don't systematically check people's HN profiles before replying). Thank you for the reference; I've edited my comment to take that into account.
> The data received was SCCS, which was an understood format with existing tools.
You'd be surprised. SCCS is not broadly understood. And BK is not exactly SCCS.
I read the SourcePuller code when it was published (sp-01). It's pretty easy reading; I give Tridge credit for that. I wrote a little test and got it to check out the wrong data with no errors reported. The issue was still there in sp-02.
Rick saying "I worked on BK" is the understatement of the century. He showed up and looked at my code; I had done things in a way that let you walk the weave and extract any number of versions at the same time. He was really impressed with that. I split apart stuff that Rick had not seen before.
Then he proceeded to fix my code over and over again. I had a basic understanding of SCCS but Rick understood the details.
Rick knows more about SCM than any guy I know.
And he is right, SCCS is not well understood and BK even less so.
Come on, man, you should be better than this. With so many years of hindsight surely you realize by now that reverse engineering is not some moral failing? How much intellectual and cultural wealth is attributable to it? And with Google v. Oracle we've finally settled even in the eyes of the law that the externally visible APIs and behavior of an implementation are not considered intellectual property.
Tridge reverse engineering bk and kicking off a series of events that led to git is probably one of the most positively impactful things anyone has done for the software industry, ever. He does not deserve the flak he got for it, either then or today. I'm grateful to him, as we all should be. I know that it stings for you, but I hope that with all of this hindsight you're someday able to integrate the experience and move on with a positive view of this history -- because even though it didn't play out the way you would have liked, your own impact on this story is ultimately very positive and meaningful and you should take pride in it without demeaning others.
I don't like cheaters. If Tridge had done what he said he did, go him, I'm all for people being smart and figuring stuff out. But that is not what he did and it disgusts me that he pretends it is.
There is absolutely zero chance he figured out the pull protocol via telnet. I will happily pay $10,000 to anyone who could do that with zero access to BK. Can't be done. If I'm wrong, I'll pay up. But I'll have a lot of questions that can't be answered.
So he cheated, he got Linus to run BK commands at his house and he snooped the network. He had no legal access to those bytes. Without those snoops, no chance he reverse engineered it.
As I have seen over and over, when the open source people want something, they will twist themselves in knots to justify getting it, legality be damned.
How about you be better than this and admit that open source is not without its skeletons?
>So he cheated, he got Linus to run BK commands at his house and he snooped the network. He had no legal access to those bytes. Without those snoops, no chance he reverse engineered it.
Snooping the network is a common and entirely legal means of reverse engineering.
>There is absolutely zero chance he figured out the pull protocol via telnet. I will happily pay $10,000 to anyone could do that with zero access to BK. Can't be done. If I'm wrong, I'll pay up. But I'll have a lot of questions that can't be answered.
I just tried this myself. Here's the telnet session:
I confess that I had to look up the name of the BK_REMOTE_PROTOCOL environment variable after a few false starts to put the pieces together, but it would be relatively easy to guess.
I also looked over Tridge's original sourcepuller code and didn't really see anything that you couldn't infer from this telnet session about how bk works.
- no weave. Without going into a lot of detail, suppose someone adds N bytes on a branch and then that branch is merged. The N bytes are copied into the merge node (yeah, I know, git looks for that and dedups it but that is a slow bandaid on the problem).
- annotations are wrong, if I added the N bytes on the branch and you merged it, it will (unless this is somehow fixed now) show you as the author of the N bytes in the merge node.
- only one graph for the whole repository. This causes multiple problems:
A) the GCA is the repository GCA; it can be miles away from the file GCA you would get with a graph per file, like BitKeeper has.
B) Debugging is upside down, you start at the changeset and drill down. In BitKeeper, because there is a graph per file, let's say I had an assert() pop. You run bk revtool on that file, find the assert and look around to see what has changed before that assert. Hover over a line, it will show you the commit comments to the file and then the changeset. You find the likely line, double click on it, now you are looking at the changeset. We were a tiny company, we never hit the claimed 25 people, and we supported tons of users. This form of debugging was a huge, HUGE, part of why we could support so many people.
C) commit comments are per changeset, not per file. We had a graphical check-in tool that walked you through the list of files, showed you the diffs for each file, and asked you to comment. When you got to the ChangeSet file, it asked you for what Git asks for in a commit comment, but the "diffs" were the file names followed by the comments you had just written. It made people sort of uplevel their commit comments. We had big customers that insisted the engineers use that tool rather than a command line that checked in everything with the same comment.
- submodules turned Git into CVS. Maybe that's been redone, but the last time I looked at it, you couldn't do sideways pulls if you had submodules. BK got this MUCH closer to correct: the repository produced identical results to a mono repository if all the modules were present (and identical, less whatever isn't populated, in the sparse case). All with exactly the same semantics and functionality, mono or many repos.
- Performance. Git gets really slow in large repositories, we put a ton of work into that in BitKeeper and we were orders of magnitude faster for things like annotate.
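The repository-GCA point above can be made concrete with a toy sketch. With one graph for the whole repository, the merge base is where the repositories diverged, even if a particular file was last touched long before that; a per-file graph gives a much closer file-level GCA. This is a rough illustration with made-up node names, not how either tool actually stores its graphs:

```python
def ancestors(parents, node):
    """All ancestors of `node` (inclusive) in a parent-pointer DAG."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents.get(n, []))
    return seen

def merge_base(parents, order, a, b):
    """Most recent common ancestor of a and b; `order` is topological."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return max(common, key=order.index)

# Repository-level changeset graph (names are made up):
#   c0 - c1 - c2 - c3 - c4   (one branch)
#                  \
#                   c5       (the other branch)
repo = {"c1": ["c0"], "c2": ["c1"], "c3": ["c2"],
        "c4": ["c3"], "c5": ["c3"]}
print(merge_base(repo, ["c0", "c1", "c2", "c3", "c4", "c5"], "c4", "c5"))
# -> c3, the repository GCA

# Per-file graph for a file last touched in c1: both branches carry the
# same file node, so its file GCA is f1 -- far earlier than the repo GCA.
foo = {"f1": ["f0"]}
print(merge_base(foo, ["f0", "f1"], "f1", "f1"))
# -> f1
```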
In summary, Git isn't really a version control system, and Linus admitted as much to me years ago. A version control system needs to faithfully record everything that happened, no more, no less. Git doesn't record renames, and it passes content across branches by value, not by reference. To me, it feels like a giant step backwards.
Here's another thing. We made a bk fast-export and a bk fast-import that are compatible with Git. You can have a tree in BK, have it updated constantly, and no matter where in the history you run bk fast-export, you will get the same repository. Our fast-export is idempotent. Git can't do that, it doesn't send the rename info because it doesn't record that. That means we have to make it up when doing a bk fast-import which means Git -> BK is not idempotent.
I don't expect to convince anyone of anything at this point, someone nudged, I tried. I don't read hackernews any more so don't expect me to defend what I said, I really don't care at this point. I'm happier away from tech, I just go fish on the ocean and don't think about this stuff.
Git doesn't track changes, yes; it tracks states. It has tools to compare those states, but that doesn't mean it needs to track additional data to help those tools.
I'm unconvinced that tracking renames is really helpful, as that is only the simplest case of many possible state modifications. What if you split file A into files B and C? You'd need to be able to track that too. Same for merging one file into another. And many, many more possible modifications. It makes sense to instead focus on the states and then improve the tools to compare them.
Tracking all kinds of changes also requires all development tools to be aware of your version control. You can no longer use standard tools to do mass renames; you'd instead have to build them on top of your VCS so it can track the operations. That's a huge tradeoff that tracking repository states doesn't have.
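The state-based approach can be sketched: given two snapshots, a tool can infer renames after the fact by comparing contents, without the VCS ever recording the operation. This mirrors the idea behind git's rename detection (`git diff -M`), though the threshold and pairing logic here are illustrative, not git's actual heuristic:

```python
import difflib

def infer_renames(old, new, threshold=0.6):
    """Guess renames by content similarity between two snapshots.

    old/new map path -> file contents. A path that vanished and a path
    that appeared are paired when their contents are similar enough.
    """
    deleted = {p: c for p, c in old.items() if p not in new}
    added = {p: c for p, c in new.items() if p not in old}
    renames = []
    for dpath, dtext in deleted.items():
        best = None
        for apath, atext in added.items():
            score = difflib.SequenceMatcher(None, dtext, atext).ratio()
            if score >= threshold and (best is None or score > best[1]):
                best = (apath, score)
        if best:
            renames.append((dpath, best[0]))
    return renames

old = {"util.py": "def helper():\n    return 1\n"}
new = {"helpers.py": "def helper():\n    return 1\n"}
print(infer_renames(old, new))  # [('util.py', 'helpers.py')]
```

A split of file A into B and C would need a fancier pairing (one deleted path matching two added ones), which is exactly why the parent argues for better comparison tools rather than more recorded operations.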
> submodules
I agree, neither submodules nor subtrees are ideal solutions.
> What if you split a file A into files B and C? You'd need to be able to track that too. Same for merging one file into another. And many many many more possible modifications.
I suppose BitKeeper can meaningfully deal with that, since their data model drills down into the file contents.
> You run bk revtool on that file, find the assert and look around to see what has changed before that assert. Hover over a line, it will show you the commit comments to the file and then the changeset. You find the likely line, double click on it, now you are looking at the changeset.
I still have fond memories of bk revtool. I haven't found anything since that's been as intuitive and useful.
I hadn't heard of the per-file graph concept, and I can see how that would be really useful. But I have to agree that going for a fish sounds marvellous.
I fished today, 3 halibut. Fish tacos for the win! If you cook halibut, be warned that you must take it off at 125 degrees, let it get above that and it turns to shoe leather.
It might not have happened if it wasn't for me and avb. I need to write that up, but the short story is that FDDI was the path to 100Mbit, I wanted 100Mbit Ethernet, and the Sun hardware engineers thought I wanted to signal over copper the same way that 10Mbit did.
I didn't care about that, I cared about what someone in this thread said: it's amazing that you can plug a 10Mbit hub in and have it work with 100Mbit, Gbit, etc.
In my mind, it was all about the packet format, if they are the same then we get cheap switches. And that is what we have today, I saw this coming around 1990.
And for you guys hating on the 1500 bytes, the SGI memory interconnect taught me that bigger is not better. When you are doing cache misses remotely, big is not good. I'm doing a shit job explaining but there is some value in smaller.
Tim was cool, he said add "The Perl logo is a trademark of O'Reilly Media and is used with permission" and we're good. Now I have to remember how to get into that VM :-)
I sort of agree with this. I'm not a lisp fan, I don't think lisp is bad, it's just not how my brain works so I struggle. But I understand lisp enough to agree with you and I think that is what John was trying to do.
I wrote most of my first source management system (NSElite, mentioned elsewhere in this thread) in perl4. I was learning perl at the time and my first and second efforts were awful. Perl really lets you get sloppy and create unmaintainable code.
My 3rd rewrite was very stylized and, I felt, maintainable. Which proved to be true as I had to fix bugs in it.
I did weird stuff like using $whatever as the index into the @whatever array.
But I digress. On the <>, Little has argv so you can do
int
main(string argv[])
{
    int    i;
    string buf;
    FILE   f;

    if (defined(argv[1]) && streq(argv[1], "-") && !defined(argv[2])) {
        while (buf = <STDIN>) bputs(buf);
    } else {
        for (i = 1; defined(argv[i]); i++) {
            if (defined(f = fopen(argv[i], "r"))) {
                while (buf = <f>) puts(buf);
                fclose(f);
            } else {
                fprintf(stderr, "unable to open '%s'\n", argv[i]);
            }
        }
    }
    return (0);
}
but why would you want to when all of that is
int
main(string argv[])
{
    string buf;

    while (buf = <>) puts(buf);
    return (0);
}
I mean, come on, that's cat(1) in 8 lines of code.
edit: I need to learn hacker markup. My code looks like crap.
You are kind of making my point. In Tcl, they'd just give you "", but you can't tell the difference between "past the end of the array" and an element where you said
set foo(i) ""
In Little you can tell: we return undef (your clear "error", though in these languages it is a supported feature, not an error). So we support the auto-expanding array but give you that extra bit of info that you are past the end.
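The distinction can be mimicked in Python, where a missing key returns None, which is distinguishable from a stored empty string. This is only a rough analogy to Little's undef, not its implementation:

```python
arr = {}              # auto-expanding "array" keyed by index

arr[0] = "hello"
arr[1] = ""           # an element explicitly set to the empty string

print(arr.get(1))     # "" -- a real element that happens to be empty
print(arr.get(2))     # None -- past the end, like Little's undef

# Tcl-style semantics would hand back "" in both cases, losing the
# distinction between "empty element" and "no element at all".
print(arr.get(1) is None)   # False
print(arr.get(2) is None)   # True
```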
That man page isn't the clearest. upvar gives you a way to modify a variable in the calling context (and it can go up more than one stack frame).
It's a way to pass by reference rather than the default pass by value (how tcl passes is more complicated than that for performance but the default semantics are pass by value).
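Roughly, in Python terms (an analogy only; Tcl's upvar actually aliases the caller's variable by name, while this sketch fakes the effect with a mutable container):

```python
def double_by_value(x):
    x = x * 2               # rebinds a local; the caller is unaffected,
    return x                # like Tcl's default pass-by-value

def double_by_reference(cell):
    cell[0] = cell[0] * 2   # mutates shared state the caller sees,
                            # the effect upvar gives you in Tcl

n = 5
double_by_value(n)
print(n)        # 5 -- caller's variable untouched

box = [5]
double_by_reference(box)
print(box[0])   # 10 -- caller's variable modified
```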
I would think so, but I haven't done it. People embed Tcl all the time; Perl/Tk is Perl with a Tcl interpreter embedded just so they can get at the Tk part (GUI stuff).