A rift in the NTP world (lwn.net)
173 points by Tomte 219 days ago | 163 comments



I see the strapline of NTPsec on Github is: "ntpsec/ntpsec: The NTP reference implementation, refactored".

So is it or isn't it?

A few quotes from ESR, from http://esr.ibiblio.org/?p=6881. Interesting:

> tossing out as many superannuated features as I could

> full of port shims for big-iron Unixes from the Late Cretaceous

> I do have an advantage because I’m very bright and can hold more complex state in my head than most people [speaks to attitude :)]

> This differs dramatically from the traditional Unix policy of leaving all porting shims back to the year zero in place because you never know when somebody might want to build your code on some remnant dinosaur workstation or minicomputer from the 1980s.

> Yet another important thing to do on an expedition like this is to get permission – or give yourself permission, or fscking take permission – to remove obsolete features in order to reduce code volume, complexity, and attack surface.

> Then ntpdc was deprecated, but not removed – the NTP Classic team had developed a culture of never breaking backward compatibility with anything.

> I shot ntpdc through the head

I don't know the facts in the case at all, but I can imagine if you were the existing maintainer, you'd find this language and attitude incredibly difficult to take. And after all if you're not being paid to work on it, why should you swallow your pride?


> I see the strapline of NTPsec on Github is: "ntpsec/ntpsec: The NTP reference implementation, refactored".

> So is it or isn't it?

It's technically correct since they started from the reference implementation then "refactored" it.

They're skirting real close to claiming they are the reference NTP implementation but not quite doing that, just leaving the door open for misunderstandings.


The concept that your protocol is defined by a "reference implementation" and not a well-written specification is moronic. Someone should be able to articulate unambiguously (with words) what the NTP protocol is and what it does.


Yes and no.

(As an aside, have you ever tried to articulate unambiguously, with words, a complex technical topic, such as the design for a network protocol?)

On the one hand, NTP is defined by an IETF RFC (https://www.ietf.org/rfc/rfc5905.txt, version 4), using words, diagrams, and some code. The traditional RFC standards process requires two interoperating implementations for standards-track work. A reference implementation usually aims to be one of them, implementing the whole protocol, including features that are not needed in all use-cases. The reason for this is that it is easy to specify something that cannot be implemented, well or at all, and for the spec to be ambiguous so that different implementations do not work together.

The reference implementation also acts as a pseudo-formal-specification, in that it is more formal than the English language version and its behavior can be compared with what the spec writers intended.


Where did you get the idea that NTP is defined by a 'reference implementation'? Going back to 1988, Dave Mills has been documenting the NTP protocol. https://tools.ietf.org/html/rfc1059

(Personally I don't think that's the best way to do time synchronisation, but it is fully documented, measured, etc.)


>Where did you get the idea that NTP is defined by a 'reference implementation'?

The article and many comments here indicate that they believe NTPd is a "reference implementation". What does that mean other than "specifies behavior"?

If it instead means "Here's an example, it might do what the specification says or not" then it's not really a "reference" - the specification is.


The reference implementation serves 3 purposes:

1) It implements the whole thing, so it can be used as-is.

2) It serves as a reference for any questions that are unanswered by the spec. Yes, specs should ideally answer all questions themselves, but there has not been a spec written yet that does not have at least one unanswered question somewhere.

3) It serves as a way to test new implementations, because you can see if your new implementation is compatible with the reference implementation. If not, then you probably screwed something up somewhere.
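
To make point 3 a bit more concrete, here is a rough Python sketch of a packet-level check you could run against replies from any server, reference or not. It assumes the 48-byte NTP header layout from RFC 5905; the names `build_client_request`, `parse_header` and `check_reply` are made up for this example and are not part of any NTP implementation, and it checks only two of the many things a real interoperability test would cover.

```python
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_OFFSET = 2208988800

# 48-byte header: LI/VN/Mode, stratum, poll, precision, root delay,
# root dispersion, reference ID, then four 64-bit timestamps
# (reference, origin, receive, transmit). All fields are big-endian.
HEADER = struct.Struct("!BBbb3I4Q")


def build_client_request(unix_time: float) -> bytes:
    """Build a version 4, mode 3 (client) packet (hypothetical helper)."""
    first = (0 << 6) | (4 << 3) | 3  # LI=0, VN=4, Mode=3
    seconds = int(unix_time) + NTP_UNIX_OFFSET
    fraction = int((unix_time - int(unix_time)) * 2**32)
    transmit = (seconds << 32) | fraction
    return HEADER.pack(first, 0, 0, 0, 0, 0, 0, 0, 0, 0, transmit)


def parse_header(data: bytes) -> dict:
    """Split the first 48 bytes of a packet into named fields."""
    (first, stratum, poll, precision, root_delay, root_disp,
     ref_id, reference, origin, receive, transmit) = HEADER.unpack_from(data)
    return {
        "leap": first >> 6,
        "version": (first >> 3) & 0x7,
        "mode": first & 0x7,
        "stratum": stratum,
        "origin": origin,
        "receive": receive,
        "transmit": transmit,
    }


def check_reply(request: bytes, reply: bytes) -> list:
    """Return a list of problems found in a reply; empty means none found."""
    if len(reply) < HEADER.size:
        return ["reply shorter than 48 bytes"]
    req, rep = parse_header(request), parse_header(reply)
    problems = []
    if rep["mode"] != 4:  # mode 4 is "server"
        problems.append("reply mode is not 4 (server)")
    if rep["origin"] != req["transmit"]:
        # The server copies the client's transmit timestamp into origin.
        problems.append("origin timestamp does not echo request transmit")
    return problems
```

Feeding the same request/reply pairs through a new implementation and the reference one, and comparing what comes back, is the "compatibility" check in miniature. It catches disagreements; it does not prove either side matches the spec.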


>What does that mean other than "specifies behavior"?

That it's the implementation that a) everyone uses, and b) actually supports the whole spec.


That's why most attempts to give a formal semantics to various standards or protocols find errors in them. This has happened repeatedly going back decades. You'd think more parties would take note to do at least a minimal, formal spec. The tooling is better than ever now.


Agreed. Notice how another child comment was down-voted and there's whining in the responses about how hard it would be to write an unambiguous specification. The idea that you could write a "reference implementation" that "proves your implementation meets the specification if it interoperates with the reference" is silly.

Unfortunately most people find it "exciting fun" to "just write code" and "boring work" to carefully consider exactly the behavior of their software in all conditions.


That's how the majority of software on the internet is written, including Python and Ruby for much of their time.


>That's how the majority of software on the internet is written

This is true but serves as an indictment of that software, not the concept of clear specification.


Would people claim that Python and Ruby have "specifications", though?


Ruby at least has an ISO standard; it's not exactly current with today's Ruby, though.


Refactoring shouldn't change functionality.


"today neither NTP nor NTPsec shares code or patches with the other"

So I guess it's not the "reference implementation".


You don't think removing dead code and needless complexity is a good thing? It seems to me like he's doing a very good and important job, there's no need whatsoever to retain support for systems many decades old at this point.


If those systems are still in use, then surely they need an NTP implementation.


I need a house and a car and food, can I rely on NTP developers for those? Just because someone needs an NTP implementation doesn't mean that the NTP developers need to provide it, especially if providing it is to the detriment of everyone else's security. What do any of us who don't have to run 80's mainframes get out of tethering ourselves to the past?


Why does retaining compatibility with systems that you've always had compatibility with mean reduced security? That seems like a pretty big assumption without any supporting evidence. Normally retaining compatibility with old systems is a pain just because it's extra work, rather than actually hurting the security of the software.


If that compatibility is implemented by large quantities of otherwise unnecessary code, then having that code be there increases the potential attack surface and reduces security. The same goes for obscure, rarely used features: removing them would improve security.

I don't know much about NTP, but if I take their claims at face value - "reduction in code of over 2/3 (from 227kLOC to 74kLOC)" is a valuable gain even if it means removing compatibility with some less used systems; and "NTPsec was immune to over 50% of NTP Classic vulns BEFORE discovery in the last year" is a realistic consequence of a large reduction in code size/attack surface.
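
For what it's worth, the "over 2/3" figure is easy to check from the two quoted numbers. A quick sketch, taking the 227 kLOC and 74 kLOC values as given (I haven't measured either code base myself):

```python
# Line counts as quoted in the claim above, in thousands of lines.
before_kloc = 227
after_kloc = 74

# Fraction of the original code that was removed.
reduction = (before_kloc - after_kloc) / before_kloc

print(f"removed {before_kloc - after_kloc} kLOC, {reduction:.1%} of the original")
print("more than two thirds:", reduction > 2 / 3)
```

So the quoted reduction is about 67.4%, just over the two-thirds mark.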


If NTPsec is immune to a bunch of vulnerabilities because they literally deleted all of the relevant code, that's disingenuous. It's not that they're immune, it's that they chose to literally throw away support for a whole bunch of functionality that NTP retained. You could argue that maybe people shouldn't be using that functionality anymore (though I don't know if that's a good argument, since I don't know what the actual functionality is or why NTPsec threw it away), but since people are using it, and since NTP Classic is the reference implementation for NTP (and therefore must implement all of NTP), I reject the notion that NTP Classic is automatically in the wrong for retaining this functionality.


If product A implements a small subset of the functionality of product B, but that small subset is enough for your needs, then from a security perspective (other things being roughly equal) you should never ever use product B. It brings in additional risks for no good reason. By default, extra functionality is a cost and a drawback - if it's useful it may justify that cost, but otherwise removing that functionality is a very useful thing to do.

If half of people need the 'deleted' functionality (and it seems that the proportion is much smaller, but let's assume half) then obviously they should use NTP Classic, but the other half definitely should not.


Depending on what the deleted functionality is, vulnerabilities in that functionality may not affect you anyway even if you're using NTP Classic. If you're not using that functionality to begin with, then it's hard to be affected by vulnerabilities in it. I say "depending on what it is" because of course you can have functionality that, even though you don't care about it, you still can't actually turn off (e.g. a protocol server that implements commands you don't care about, but can't disable).

It's not clear to me offhand whether the NTP Classic functionality that NTPsec deleted is the type that doesn't affect you if you don't use it, or if it's the type that can't be disabled.


[flagged]


I'm not sure why you're being downvoted, it seems a fairly straightforward statement to make about ESR. When he pops up on a project mailing list, he tends to cause a lot of noise before (in every case I know of) gently fading out again when he doesn't get his own way. It's especially fun (and painful) to watch strongly consensus-driven projects try to cope with his presence, for example when he turned up on the Subversion dev list.

He popped up on python-dev a few days back with one of his archetypical shots across the bow: https://mail.python.org/pipermail/python-dev/2017-February/1...

Literally nothing in that message except ego, and what amounts to a promise to ignite some flame wars in the near future, as if any of that information was worth wasting a few hundred people's time to read.


It's basically pure insult. It has no semantic content beyond "I wouldn't want to work with ESR, he has a massive ego and he supports gun rights".

(Adding a throwaway compliment does not significantly affect this.)

The gun rights thing in particular is... if you want to say "I don't like guns and I don't want to work with anyone who supports gun rights", that's up to you. If you want to say "anyone who supports gun rights is probably a bad person to work with", you need to back that up. "I don't want to analyze most of his views(1) and how they reflect his personality traits" is the second one, and does not back it up.

(It's hiding behind some ambiguity and hinting, but I think the broad intent is quite clear. Possibly it was intended as closer to: "ESR's position on guns is crazier than simply supporting gun rights, and this makes him probably a bad person to work with". In that case, there are two things that need to be backed up: what specifically does ESR believe that is crazy? There are a lot of links on that page. And why does that make him probably a bad person to work with?)


I can't speak for the original commenter, but I'd be inclined to take the last interpretation there. My objection isn't about gun rights at all, but the way in which he argues for gun rights as if you're not worth paying attention to if you disagree.

See the reductio linked at the bottom: http://www.catb.org/~esr/guns/reductio.html

I could try to explain why that's wrong as an argument, and I'm sure you could too (the difference between banning guns in a home and banning them in a state, the existence of armed law enforcement, etc. etc.), but that's hardly the point. The point is that if you don't agree with the reductio, you are apparently "intelligence-impaired" and should "give up on public policy". There's no room for intelligent people who disagree.

And he does the exact same thing in technical contexts, which should be completely unsurprising. See http://esr.ibiblio.org/?p=7294 on the specific topic of an implementation language for NTPsec:

"And the core docs don’t presently offer me anything like the survey of options you just laid out.

"I trust you understand why this is a problem without my having to spell it out."

There are good reasons why the Rust team specifically and consciously doesn't include this information in the core docs, but tries to make sure that it's extremely easily discoverable. I'm happy to debate the merits of standard libraries vs. well-trusted third party libraries with people who are interested in debating that. But, apparently, there's no room for intelligent people who disagree with his opinion here, either.

This is poisoning the well. This is not simply arguing a technical point forcefully, but arguing that anyone who disagrees with the technical point is not worth arguing with. This kills projects and communities. I've worked with maybe two or three people who think this way, and it's still too many to want to work with another.


I have no particular reply to this, but I would like to say: yes, this is the kind of comment that I was trying to push towards. Thank you.


He doesn't just support gun rights, he tries to inject arguments about guns in inappropriate places.

Why do I know his stance on guns if the thing he does that I have interest in is creating software? It's totally irrelevant.

It's a bit like one person just writing software and talking about it and someone else barging in and going off about women's rights.

Eric S Raymond is a smart person that we owe a lot to but he doesn't come across to me as a nice person to be around.


> Why do I know his stance on guns if the thing he does that I have interest in is creating software?

Well, I don't know. But if I had to guess, I'd say the three most obvious avenues would be that you looked at his personal site, where he quite reasonably talks about his interests other than software; or that you looked at his personal blog, which mentions four topics in the tagline and only one of them is software; or that someone else brought them up in a context where they weren't relevant. Two of these are appropriate places for him to talk about guns, and one is no fault of his.

I'm not saying he doesn't inject arguments about guns in inappropriate places. But the fact that you know his stance about guns doesn't mean he does.


I took the parent comment to be from someone familiar with ESR's personality. The reference to gun rights was unfortunate, though the parent comment probably did intend to create that association. However, the parent comment certainly didn't suggest gun rights were the exclusive reason not to want to work with ESR.


There are reams of examples of ESR's crazy beliefs on the internet. Some of these one could pass off as personal eccentricities e.g. the belief you're a god-like superprogrammer or that Iranian agents are out to harm you. The overt racism, sexism and homophobia, less so. He's not just some dude with some slightly zany ideas.


acqq didn't say anything about racism, sexism or homophobia. They mentioned guns.

I feel like you've taken the conversation as: acqq anti-ESR, myself pro-ESR; and now you're jumping in on the anti-ESR side again, in accordance with how these things always happen.

But no, I'm not defending ESR. I'm saying that acqq's attack on ESR was uncool for reasons that have nothing to do with ESR. Launching a totally unrelated attack on ESR is beside the point.


I actually followed the gun link and read the first paragraph or so, and regardless of what acqq was trying to accomplish or imply with that link, I think those first few paragraphs are informative in a way that his specific views on firearm ownership are not.

Specifically, his choice to purposefully refer to himself as a "gun nut" because he thinks people see he's not "crazy" and it thus "discredits the idiots" flies in the face of the many people here saying he does seem a bit eccentric, if not "crazy". I'm left believing he's exceptionally poor at interpreting people's opinions and motivations (which might well explain some of the behavior referenced).

So, regardless of the reason for the inclusion of that link, I did find it somewhat illuminating.


So, because he expresses very little patience for repeated arguments about a public policy debate that's been going on for decades before he was born, your conclusion is that he must have some sort of defect in understanding other people? Like, some sort of autism spectrum disorder?

Perhaps he has no problem interpreting other people's opinions. Maybe he's just tired of dealing with polite sounding condescension from people who don't agree with him. Holding views that other people dismiss out of hand with implied "you'd agree with me if you didn't have aspergers" style ad hominem arguments for years tends to make them bitter.


> So, because he expresses very little patience for repeated arguments

Little patience is not the same thing as leaning into misconceptions.

> Perhaps he has no problem interpreting other people's opinions. Maybe he's just tired of dealing with polite sounding condescension from people who don't agree with him.

That's entirely possible, but it then means his proffered explanation is a post-hoc justification for trying to push people's buttons and creating discord rather than understanding. That's his prerogative, but I have little patience for people that create misunderstandings and chaos on purpose, regardless of whether they feel justified.


No, my point was it's very, very easy to come away with the impression that ESR is a complete nutjob and he's developed a well-deserved reputation for being one. Maybe you think his views on guns are perfectly sane but that's not even a tenth of it. The record is extensive so whenever you run across someone who thinks ESR is bananas, I think it's pretty safe to assume they're familiar with it and are simply picking some issue they feel is particularly striking to them. 'I feel this person is an extremist so I wouldn't want to work with them' is not an unreasonable or 'uncool' position.


> Maybe you think his views on guns are perfectly sane

It's not that. I think his views on guns are not relevant; or if someone wants to claim that they're relevant, they need to defend that claim.

If someone has other objections to ESR, which are relevant, then they're free to bring one or more of them up. If there are a lot of relevant objections, that should be easy to do.

I'm not interested in saying: that particular objection was irrelevant, but there are so many relevant objections that it doesn't matter.


> I think his views on guns are not relevant; or if someone wants to claim that they're relevant, they need to defend that claim.

I'm quite sure they don't. They have some opinion which you apparently don't share. Fair enough. But that's basically it, they don't really owe you a defense of that opinion to some standard you define.


[flagged]


> Is there any place where I claim that "his views on guns" are relevant?

You linked to his page on guns, with no further commentary, and then suggested that people inform themselves. If you thought the relevant thing was something other than his views on guns, then okay: instead of defending a claim you don't believe, you need to make clear what you are claiming.

I claim that you were either communicating poorly, or deliberately hiding behind ambiguity.

> I see all your responses as your own confusion, probably having root in your own worries (and I'm also not surprised that it's hard to admit that).

I'm not interested in being psychoanalysed.


Obsessed with guns too much, and with drawing false conclusions? To the point of cognitive dissonance? Like, how dare I link to the page "on guns" (as you name it) without explaining to you that it's not something you should be "insulted" about! The last sentence in the post above, moreover, has certain similarities with one of the first statements on the page that initiated your response ("I am in fact..."). Funny. Note how you ascribed to me motivations I haven't had, and now you write: "There's a big difference between analysing the meaning of some text (which is necessary for communication) and guessing at someone's motives." You actually did the latter, accusing me as if I wrote "he supports gun rights", which I absolutely never wrote. Also: "It's hiding behind some ambiguity and hinting, but I think the broad intent is quite clear." Apparently it wasn't clear to you; that much you were able to admit. But you continue: guns, guns. It's your invention, not my subject. My subject was the personality traits, all the time.


Reply to your PPS:

Yes I was previously familiar with ESR. No I don't have strong opinions about gun control, I'm not interested in being psychoanalysed partly because you are just plain wrong about my motivations, and yes I would have replied the same way if it was about free speech.

I try to be charitable, but that doesn't mean leaving myself vulnerable to vague hinting intended to retain plausible deniability, as I interpreted your initial post; and it doesn't mean tolerating bullshit accusations against me as you're now throwing around.

There's a big difference between analysing the meaning of some text (which is necessary for communication) and guessing at someone's motives and emotional reactions (which is usually rude and unnecessary).


> Adding a throwaway compliment does not significantly affect this

I linked to the text I think he did right, simply because I remember it still. I do like it. I still don't claim that knowing he wrote it would make him easier to work with. My link to his own gun page (after citing his statement "I shot ntpdc through the head") is not related to the problems he represents in the teams; it's there just to present one more aspect of that complex personality. Your construction of what my linking to his writing is supposed to mean remains purely yours and shows how the interpretation of any post is very subjective. You don't quote me, you just invent some sentences to make your argument. I still stand by my claims, and I don't think I've insulted him.


From the article:

Most of them, [Sons] said, "are older than my father.... [and] are not always up to date on the latest techniques and security issues." [...] Sons suggested that they "should be retired."

That is incredibly asinine and discriminatory to boot.

From the NTPsec project manager's comment:

The main point of contention that caused the fork was BitKeeper vs Git.

I can't believe that arguing over VC tooling is what caused the fork. Why not just compromise on this, use BitKeeper for long enough to get on the maintainer's good side, and helpfully offer to convert to Git later? This is how ESR got GNU Emacs development migrated from Bazaar to Git, and it seemed to have gone pretty well for all involved.


That's simply not what I said; I was misquoted. Please watch the original video.

https://www.oreilly.com/ideas/the-internet-is-going-to-fall-...

It's amazing* how many people here are willing to roast me over a third-hand account of my opinions, when I've already offered to answer questions directly.

* Not actually amazing, fairly typical of internet commentary, really.


To save effort on finding the relevant segment of a 17+ minute interview, I have attempted to transcribe a portion. See also https://www.oreilly.com/ideas/susan-sons-on-maintaining-and-... with some portions transcribed (inexactly); add ~20s to the times below to match the podcast timing.

(5:26) [O'Reilly interviewer] Mac Slocum: Related question on this: how can the Internet's infrastructure remain up to date and secure, particularly when it's distributed like this?

(5:33) Susan Sons: So the really terrifying thing about infrastructure software in particular is when you pay your ISP bill, that pays for all the cabling that runs to your home or business. That pays for the people that work at the ISP. That pays for their routing equipment and their power and their billing systems and their marketing and all of these wonderful things. It doesn't pay for the software that makes the Internet work. (5:54) That is maintained almost entirely by volunteers. And those volunteers are aging. [Um.] Most of them are older than my father. And [um,] we're not seeing a new cadre of people stepping up and taking over their projects, (6:10) so what we're seeing is ones and twos of volunteers who are hanging on and either burning out while trying to do this in addition to a full-time job, or are doing it instead of a full-time job, or should be retired, or are retired. [Um.] And it's just not giving the care it needs. (6:27) And in addition to this, these people aren't always up to date on the latest [um] techniques and security concerns of the day. And the next generation isn't coming up. I recently started a mentoring group called the #newguard that takes early and mid-career technologists and we cross-mentor and then we match them up with the old guard who are maintaining and who built this software to try to help solve that problem. But in the meantime there's still not enough funding going in this direction. And there's not enough churning happening. [Um.] And it's a really tough thing because there's a certain amount of what I call "functional arrogance" involved. [Um.] I don't have a certificate of "Susan is good enough to save the Internet" anywhere. I don't know who hands those out.

(7:08) Slocum: Sure.


I found your remarks about succession planning at around the 5:50 mark of the linked video:

[The software that makes the Internet work] is maintained almost entirely by volunteers, and those volunteers are aging. Most of them are older than my father, and we're not seeing a new cadre of people stepping up and taking over their projects, so what we're seeing is ones and twos of volunteers who are hanging on and either burning out while trying to do this in addition to a full-time job, or are doing it instead of a full-time job, or should be retired, or are retired, and it's just not getting the care it needs. And in addition to this, these people aren't always up to date on the latest techniques and security concerns of the day, and the next generation isn't coming up.

In context "should be retired" sounds awfully prescriptive, but I can see how that could mean something like "these volunteers want to retire but feel obligated to continue their maintenance duties".

Then at the 7:00 mark, you say:

It's a really tough thing because there's a certain amount of what I call functional arrogance involved... There's a certain point where you just have to say, "I'm going to decide that I'm in charge of this"...

I dunno. I can see where that's going to rub people the wrong way while at the same time seeing the value in having some moxie. I get the impression, though, that Stenn wasn't too happy with this approach.


2:15 The entire build system depended on one build server located in Harlan Stenn's home. "But Harlan no longer had the root password to this system, couldn't update it, didn't know what scripts were running on it, and no one in the world could build NTP without this server continuing to function."

3:25 It was death by a thousand cuts. "And I was seeing things that were not yet C99 compliant in 2015. The status of the code was over 16 years out of date in terms of C coding standards which means that you can't use modern tools for static analysis..."

4:30 "And in the meantime, security patches were being circulated secretly and then being leaked, and the leaked patches were being turned into exploits which we were seeing in the wild very quickly, when the security patches weren't being seen in the wild for a long time."

6:00 "...but it doesn't pay for the software that makes the Internet run. That is maintained almost entirely by volunteers, and the volunteers are ageing, most of them are older than my father. And we're not seeing new [?] people stepping up..."



I am too lazy to look it up on the mailing lists but a while back I recall ESR contacted certain BSD projects trying to convince them to change to Git. I mistook it for the occasional incoherent spam that slips through the filters.


> I am too lazy to look it up on the mailing lists but a while back I recall ESR contacted certain BSD projects trying to convince them to change to Git.

They're mostly still on CVS these days, right? Regardless of one's personal feelings about esr, is there any good reason to stay on CVS in this, the second decade of the twenty-first century?


As loeg mentioned, FreeBSD uses SVN. OpenBSD still uses CVS.

Why? Basically because they are the ones using it and it continues to work just fine for their needs.

Is there any good reason for them to migrate everything to a new VCS, be forced to learn new tools, etc., when what they are using now meets their needs just fine?

This is a perfect example of change for change's sake.


FreeBSD's been on SVN, not CVS, since 2008. Many developers use an official git mirror[0] to stage their work before committing.

[0]: https://github.com/freebsd/freebsd


a reason (although not necessarily good enough) is that git requires a lot of resources for large repos

there were other reasons but the discussion hasn't re-happened in the last 18-24 months


People live forever, why should anyone try to take up the reins from the older generation? I think it's clear, even if you don't like some of Sons's attitude, that this was a sound-bity quote likely taken a little out of context.

And why do people fork a project when they want to fix things? I'm sure there were dozens of problems in getting started on fixing NTP, BitKeeper was just the first problem to deal with. What can you do when the current maintainer is so contrary that they have already lost access to free funding to modernize a dying code base? We don't need NTP backported to unix workstations that EOL'd over a decade ago, we need an NTP that doesn't cause more problems than it solves.


I'm struggling to imagine a context that excuses the attitude of "that developer should be forced to retire so we can take over their project".


Most of them, [Sons] said, "are older than my father....

Eric S. Raymond, who seems to be behind the "young" fork, is 59.


To be fair, that is technically younger than my father. By 3 years.


Searching for additional funding, Stenn contacted the Internet Civil Engineering Institute (ICEI) and began working with two of its representatives, Eric S. Raymond and Susan Sons.

Stenn said in a phone interview, "Then all of a sudden I heard they have this great plan to rescue NTP. I wasn't happy with their attitude and approach, because there's a difference between rescuing and offering assistance. [Their plan was] to rescue something, quote unquote, fix it up, and turn it over to a maintenance team."

Most of them, she said, "are older than my father.... [and] are not always up to date on the latest techniques and security issues." Many are burning out from trying to maintain critical code while working full time jobs, and Sons suggested that they "should be retired."

Wow. Keep away from the ICEI I suppose.


I highly recommend that you watch my interview with Mac Slocum (video here: https://www.oreilly.com/ideas/the-internet-is-going-to-fall-... ) to find out what I actually said, rather than listen to a reporter saying what someone else said I said. I was misquoted.


    Several years ago, the project's inadequate funding became known in the media
    and Stenn received partial funding from the Linux Foundation's Core
    Infrastructure Initiative, which was started after the discovery of how the
    minimal resources of the OpenSSL project left systems vulnerable to the
    Heartbleed vulnerability.

No no no no. Yes more funding and resources are good for these things but Heartbleed did not come about because of that. It came about due to broken development practices and the developers focusing on adding more features rather than working through the issues people had reported in their bug tracker.

The work done in NTPsec echoes this in what seems to be a repeat of OpenSSL/LibreSSL with NTP/NTPsec.

Yes forking is the "easy way out" in these circumstances and it's a shame to see efforts split in such projects but in reality it's often what's needed to get things moving in the right direction.


> broken development practices and the developers focusing on adding more features rather than working through the issues

Both of those problems are easily attributed to a lack of funding. Fulfilling feature requests generates money, fixing bugs does not. When you can only afford to spend 10 hours a week on a project, you're going to end up cutting corners to work around those time limits.


In the case of Network Time Foundation, the nonprofit wrapper for NTP classic, the problem was largely mismanagement of funding. NTF funded Harlan Stenn's work on NTP, but funded no other developers. It did, however, fund a fundraiser and two part-time administrative staff. The two administrative staff came to one FTE (full-time-equivalent) total, and I forget how much the fundraiser was working. In any case, the org had funded more FTE in overhead than in technical staff.

By contrast, the temporary rescue team funded a project-manager-slash-information-security-officer, three developers, and a bright intern who did a great deal of documentation work. Granted, these were not all full-time positions, and I was able to lean on my parent organization for administrative support, so while paperwork was minimal I didn't have to fund it directly.

NTPSec's staff has varied over time, but at the time I stepped down to hand the ISO role off to my successor, they had the following funded positions: a project manager, an ISO, two developers, and a sysadmin-slash-developer. (Note: not all of these were full-time positions, and some were funded by third parties rather than by NTPSec directly.)

Open source infrastructure software projects, when well run, do not spend over half their resources on non-development work. It's just not responsible. It's how you lose donor confidence, and how you fail to maintain good software engineering practice even when you have the resources to do better.

This has been a running theme in open source failures in the last few years: "it will be better if we throw more money at it!". Sometimes this is true: there are developers out there simply burning out for lack of resources and splitting their attention between OSS work and making a living. However, often, there is mismanagement at work, or the project doesn't have a good enough talent pool to pull from to use funds effectively when they do get funds. I love when the problems are purely technical, because it's a clean fix and everyone thanks me and I walk away quietly. When the underlying problems are social, they tend to fester, because nobody really wants to be in the hot seat over the disagreements that happened.


That's quite the stretch, to claim that two part-time administrators are anywhere near half the value of a full-time software engineer.

If you take average wages, two part time administrative employees would cost you somewhere around $40,000 a year (2 * 12 * 30 * 52, rounded up). One software developer's value is conservatively $120,000 a year (likely more; $80,000 salary, $40,000 payroll taxes & benefits).

By those estimated numbers (I'm not in a position to look up the actual numbers right now, but they're probably available), that's only around 25% for administrative overhead, for a three person team. Not that bad for a foundation. And certainly nowhere near "half their resources" as you claim.
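
To make the arithmetic above easy to check, here is a minimal sketch in Python. It assumes the factors in "2 * 12 * 30 * 52" mean two staff at $12 an hour for 30 hours a week over 52 weeks, which is one reading of the estimate rather than something stated in it. The function name is hypothetical and the dollar figures are the guesses from this comment, not actual NTF numbers.

```python
# Rough check of the overhead estimate; all figures are the guesses from
# the comment above, not actual NTF budget numbers.

def overhead_share(admin_cost: float, dev_cost: float) -> float:
    """Return administrative cost as a fraction of total staff cost."""
    return admin_cost / (admin_cost + dev_cost)

# Assumed reading of "2 * 12 * 30 * 52": staff * $/hour * hours/week * weeks.
admin_exact = 2 * 12 * 30 * 52      # 37440
admin_rounded = 40_000              # "rounded up" in the estimate
developer = 80_000 + 40_000         # salary plus payroll taxes and benefits

share = overhead_share(admin_rounded, developer)
print(f"admin (exact): ${admin_exact:,}")
print(f"overhead share: {share:.0%}")
```

Note that this measures overhead in dollars. The parent comment counted FTEs, which is a different yardstick, so the two figures are not directly comparable.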

I'm glad that NTPSec appears to be relatively well funded, but knocking on NTF for being less well funded and having administrative work that they don't want to pawn off on an already overworked developer is pretty nasty business.


This is pretty much what happened. We spent a few months working with Mr. Stenn, and ultimately he did not agree to pursue strategies to correct the underlying problems that caused NTP's security and stability issues. Simply patching known vulns and moving on would have been a temporary solution: more vulns were lurking. NTPSec was born to give the code base another chance, to evolve with a different strategy. In the end, I tend to feel that this is a strength of OSS: different groups are free to do things different ways, and if people are paying attention, software quality should win out.

Since Eric and the rest of my team started working on the NTP code base in early 2015, we've eliminated over 50% of its vulnerabilities before they were disclosed simply by applying good software engineering practice where it hadn't been. In the year before my O'Reilly presentation, it was more like 80 or 85 percent. Everything we hadn't eliminated by disclosure or discovery time was fixed promptly.

There are other NTP protocol implementations besides NTP classic or NTPSec that are worth considering for some users. However, we felt that refactoring the reference implementation was necessary due to its use in many less-mainstream, but often highly-critical (in a life-critical or economically-critical or critical-to-scientific-research sense) applications. The non-NTP-related implementations don't always do what high speed trading houses need, or scientific installations built on aging but extremely precise equipment need, or controls system interfaces need, and on and on and on. We just didn't have a drop-in replacement available for all of the things that weren't web servers, workstations, and other commodity applications.

The "rift" article is now subscriber-only, so I can't respond there to its many inaccuracies (I was passed a PDF by someone who cached it, this is the only way I was able to read it). I was never contacted about it by the author, and I don't feel it was a fair treatment of the subject. That's okay. I learned a long time ago that fixing a mess will make some people thank you and some people angry with you. It wouldn't have become a mess by the time I found it if there weren't a cost to fixing it. People who fear controversy will have a hard time making a difference in the world.

I'm at work, but I'll do my best to answer any questions fired at me today on this thread. If there's something you want to know, ask!


NTPsec advocates keep saying "eliminated 50% of vulnerabilities <<<before they were disclosed>>>", as if there were another meaningful way to eliminate vulnerabilities from a codebase.

Can you provide a breakdown of the vulnerabilities NTPsec HAS and HAS NOT been vulnerable to, along with their severity (low: degrades time service, medium: provides a practical vector for corrupting integrity of time service, high: compromises integrity of the server itself) and whether they're exposed (a) in the default configuration, (b) in a configuration run widely on the Internet, or (c) in no configuration actually known to the project maintainers?

You clearly have the list somewhere, because everyone involved in the project has this statistic ready to quote.

If you don't have the severity and exposure breakdowns, that's OK. Post the list anyways. Maybe it'll be obvious what the severity and exposure is.

This business of counting vulnerabilities and claiming victories has been a problem for software security for two decades now. Ops people don't care about the vulnerability count, if the vulnerabilities left exposed in the codebase are the ones that get their servers popped.


I'm sorry if I wasn't clear, I meant "before they were disclosed to NTP classic or NTPSec". In other words, by simply improving on the software engineering practice, we eliminated classes of vulnerabilities without having to track them down individually. This is pretty common with ailing code bases, though often overlooked. I'm at work right now, so I don't have a comprehensive list handy. Going through NTP classic vulns and seeing how many never impacted NTPsec would recreate such a list.

The severity varies (many weren't that big, some were)... the point of claiming the victory is to demonstrate that I'm not just having a fuss about testing code, using static analysis tools, using an accessible code repository, refactoring for lower attack surface and better separation of concerns because they are beautiful in abstract. I like results. NTPSec, and before it the temporary "rescue" team, have been slowly chipping away at the big picture mess, making the code safer and more maintainable, because it's likely to remain in service for another decade or two.

Every time 14 vulns are disclosed and we are already immune to half of them, we get to put twice the effort on the half we do need to deal with, if we even need that much. We aren't just firefighting; NTPSec can develop proactively. That means something for our users.

lots of personnel overlap here...the main difference being pre- and post- fork and where the funding came from, probably not interesting to most people.


No, I understood your meaning. I'm saying: that's what every code refactoring does. I'm saying that since you can't claim credit for eliminating vulnerabilities that are already disclosed, the emphasis you place on precluding vulnerabilities is strange.

Can you provide that list of vulnerabilities now? You're obviously keeping track of them, that being part of the premise of the project. I know you don't have them broken down, but we can help with that.


How about this: before I put the effort in to generating the list myself, can you at least promise to confirm that I have the complete and accurate list once I do, and to fill in any gaps?


As a side note, I'd like to add a point that I highlighted in my O'Reilly Security Conference talk but previously forgot to mention here...

One of the coolest after-effects of this whole thing was that, after the fork, when NTP classic began feeling the pressure of competition, their speed in addressing security vulnerabilities increased incredibly. While I was sorry that it didn't happen on its own, I was pleased and impressed to discover what Mr. Stenn was capable of once his competitive hackles were raised.

Many people experience hurt feelings during a fork, and a fork represents a frustrating duplication of effort that I'd usually rather avoid. However, forking is a central tenet of the open source ethos for a reason. Competition can do incredible things. <3


If a primary purpose of forking ntpd was to give the original project a kick in the ass about fixing vulnerabilities, could it not be argued that your project has now served its purpose, and dollars could be better spent on building from the success of "NTP Classic" --- which, after all, is the version of NTP most likely to be deployed?


I would agree with you if NTP classic had fixed the total of its social and technological problems. Unfortunately, this is not the case. "Patching faster" is one small victory.


What percentage of "NTP Classic"'s problems are managerial/social and what percentage are raw technical?


> In the podcast, Sons depicted NTP as a faltering project run by out-of-touch developers. According to Sons, the build system was on one server whose root password had been lost.

> Stenn denied many of Sons's statements outright. For example, asked about Sons's story about losing the root password, he dismissed it as "a complete fabrication."

Unless either you or Stenn is outright lying (neither of which seems likely, on priors), this seems like a strange misunderstanding to crop up. Do you know what's going on with this?


I know what his side of the story is on that specific password. I don't think it's adequate, but... I also don't know that it's helpful to keep arguing this two years later. Casual contributors couldn't build the latest dev version of NTP due to repository access and build system problems, and the lead (effectively only active, at that point) maintainer couldn't or wouldn't fix the situation.

While the password problem made a good rhetorical flourish--it illustrated how the scaffolding supporting NTP development had been allowed to rot--the fact is that the server was in Mr. Stenn's control and he could have rebooted it to rescue media at any time, fixing the problem in a few minutes. Yet, the server was never properly brought up to good maintenance practice. I suspect that the majority of people reading this know how to reset a root password, so the password doesn't really matter that much in the grand scheme. The server was just another thing being neglected.

As I described in my O'Reilly talk, technical problems of this magnitude stem from social problems. The project didn't have a culture of sound engineering practice. I did what I could to work with Mr. Stenn to offer support and resources to bring that practice to his project. I didn't want to lose the years of institutional knowledge he'd acquired working on NTP. That's costly to replace. However, I wasn't going to forgo sound engineering practice to keep him on board: over time, smart people could learn the ins and outs of even the most tangled code base. The costs of bad engineering practice just keep coming, and I cannot force people to do the right thing, only lay out the costs and benefits then see what they choose.

That, and throw a little storytelling prowess at the problem now and again, in the hope of motivating people.


From the article (and the end result) it seems the "strategies" you talk about that were rejected include a total rewrite and an abandonment of features and platform support.

You can eliminate vulns and improve stability a lot of ways. Total rewrite is definitely not the best way. Even if you're the best programmer in the world, rewrites often run into old bugs as well as new ones, and require a lot of testing and a lot of repeated effort.

And I can't speak for the other infosec nerds, but for me, name-dropping ESR does the opposite of inspiring confidence in a security-focused project. I wouldn't trust him to secure my shoelaces.

If the old codebase was really bad, perhaps eventual rewrite would have been useful. But what would help existing users more is fixing the existing product so they can upgrade in place and be more secure, and not forcing them to go through a whole product migration cycle just for better security.


The specific measures that were refused, from the rescue plan that Mr. Stenn rejected and my notes from that meeting:

* Moving NTP development from a private Bitkeeper repo which requires all people accessing it (10 at most without private license purchase, given that Network Time Foundation has only 10 licenses) to agree to a restrictive license that may interfere with their other development work, to a public git repository which is accessible by the public as a whole. Stenn felt that tarball releases were sufficient, and did not agree that giving the public an opportunity to see code prior to release was important.

* Releasing patches to NTP vulnerabilities to everyone at the same time. NTP had a practice (for which Mr. Stenn never explained to me the reason) of releasing vulnerability patches to a closed group months or more ahead of the public release. These patches were typically leaked fairly rapidly and turned into exploits which were then used against NTP deployments in the wild.

There were other disagreements, but these were the big two technical disagreements upon which Stenn walked away. They were not points upon which I was willing to compromise, especially given that neither I nor other people in a position to help NTP could possibly have signed Bitkeeper licenses while maintaining our primary employment. This was a massive roadblock for increasing contribution to NTP, from us or anyone else.

If you look at the slides from my O'Reilly presentation here: http://slides.com/hedgemage/savingtime you will see that even when the rescue proceeded without Stenn, we did not do a major refactor! Slide #20 outlines the original rescue, which had 4 points:

* migration to git

* replacing the build system (when Stenn had been on board, we'd intended to repair the build system in-place, but without the mystery scripts residing on his build box, we decided that a from-scratch replacement was more reliable and efficient than to reverse-engineer and repair)

* updating documentation so that new developers could be onboarded

* fixing what vulns we could given limited resources

That is it. Refactors came later when, after this "rescue" work, Mr. Stenn declined to use these work products and the NTPSec fork was born.

We did make every effort to avoid a fork, but in the end, I could only offer help, I could not force anyone to take it. Forking is, in the end, the OSS community's last protection from failing projects.


So why not maintain a patchset to be applied to the original and maintain it on your own repo? There's nothing you can do if someone is holding security patches from you, but you could certainly release your own to the public.

Honestly, both of those sound like very common issues which do not result in whole new product forks. Large projects maintain patchsets and private security lists all the time. To me the ends don't justify the divergence.


ESR seems to have a long track record of being involved in these types of events. There are obviously a number of possible conclusions to draw:

  - ESR is so productive and involved in so many things that this is actually a normal drama ratio
  - ESR only involves himself in critical issues with high potential for drama
  - ESR brings the drama

I can't rule out either of the first two, but years of data points lead me to believe it's the third.


I agree that it is very likely to be the third. Just have a read of some of ESR's articles. E.g.:

http://esr.ibiblio.org/?p=129

" American blacks average a standard deviation lower in IQ than American whites at about 85. And it gets worse: the average IQ of African blacks is lower still, not far above what is considered the threshold of mental retardation in the U.S. And yes, it’s genetic; g seems to be about 85% heritable, and recent studies of effects like regression towards the mean suggest strongly that most of the heritability is DNA rather than nurturance effects. "

ESR's famous melodramatic response to having his CML2 ("Eric's configuration markup language for kernel building") patch rejected by Linus is also quite telling about his self-perception, his perception of other people, and his interaction with anyone who holds an opposing viewpoint.


ESR's politics are repugnant, but there's more than enough evidence of job-related narcissism to make an argument about his personality (if that's needed) without having to drag the thread through his profuse bigotry. That's what Twitter's for.

I'm not sure how much more needs to be said beyond the fact that ESR has repeatedly claimed, including recently, that the Internet might fail if people don't continue to donate to his Patreon --- a Patreon which, he has previously stated, stands as direct evidence of his fame as a programmer, a fame rivaling that of Donald Knuth or Linus Torvalds.


How does the quote show that he "brings the drama"? I mean, that's on his own blog. It's not like he's exporting that. It's not like he's going up to black people and saying "you're almost retarded."

I think the CML incident and the time where he yelled at some Debian developer about "our tribe" are better examples, because at least there he's going into other people's spaces and actually confronting them.


Because what good can come from discussing that topic? The only reasons to broach it are to a.) troll and/or b.) agitate for some horrific policy proposals.

He's not an academic researcher. Don't you think it's odd for some old white guy to get worked up over minority test scores? Why write about this rather than basically anything else? Either he's a troll or profoundly unaware about how bringing this line of argument up may make people feel threatened. There are a lot of things that simply don't need to be discussed. It's like graphically describing your partner's episiotomy over dinner. Some things just inspire a visceral reaction.


It's odd, sure, but it's his blog. People blog about whatever they're interested in. He could have all kinds of reasons for being interested in that which aren't trolling or policy.

Here's an unflattering guess why he might be interested in this: he has cerebral palsy. Despite that, he still is a fairly smart (even if crazy and/or racist) guy. He might be extremely interested in differences in human intelligence for that reason: to figure out why he still has above average intelligence while most other people with his birth defect don't. That could explain a lot of his crazy beliefs and his narcissistic "I'm a super unix hacker" routine.

Why does he have to be an "academic researcher" to have opinions about race that he wants to talk about on his blog? Or for that matter, any topic? Do you have an anthropology degree to assert your opinion that his blog may make people feel threatened, or are you just another guy on the internet with an opinion?


>Why write about this rather than basically anything else?

Why not? If it's true, and you offered absolutely no refutation to his claim whatsoever, why shouldn't we discuss facts? Do they stop being true if we stop talking about them? Do their ramifications stop existing if we just refuse to accept them?


> It's not like he's going up to black people and saying "you're almost retarded."

Er, why do we think there are no black people reading his blog?


I don't think that there are no black people reading his blog. I have no idea if black people read his blog or not.

Even then, that quote isn't an individual indictment of stupidity, something which there are many examples of him doing, more aggressively, in other contexts.


Or his public arguments with Larry Wall.


Eek. That post contains quite a lot of claims that are not backed by a single reference to a primary source.


I was under the impression that he wrote fetchmail, but according to Wikipedia he took over a previous package and renamed it fetchmail. So AFAIK, he's notable because he said so.

Yes, I know I'm nobody but that doesn't mean I can't call bullshit when I smell it.


There is little doubt that ESR is notable. He is best known as the writer of essays about Open-Source such as The Cathedral and the Bazaar and The Magic Cauldron as well as others on many topics including gun rights. He suffers from cerebral palsy but has not let that hold him back. He is also known for being contentious. Arguably, his philosophical contributions are more valuable than his code contributions.


He is also the incarnation of the god Pan [1] and a sexual tyrannosaurus (and relationship advice columnist)[2].

As far as I remember, his philosophical contributions primarily involve being in the right place at the right time, when people were searching for the reason that Linux was taking over the world, and having something to do with convincing Netscape to go open source.

[1] http://www.catb.org/~esr/writings/dancing.html

[2] http://www.catb.org/esr/writings/sextips/


>There is little doubt that ESR is notable. He is best known as the writer of essays about Open-Source such as The Cathedral and the Bazaar and The Magic Cauldron

Yes, but one could hardly call those great accomplishments that stood the test of time.

It was a puff piece based on the popular-at-the-time, starry-eyed look at OSS development that caught on (then) because it fit with the overblown hype for community "bazaar" style software development prevalent in the late 90s.


Nonsense. Those are still seminal documents of the Open-Source movement.


Because modern open source developers care for those notions? At best there's only historical value in them. Contrast with something like the Mythical Man-Month or SICP that new generations discover again and again.

Plus, ESR's documents have been debunked in their claimed benefits of the bazaar style time and again (plus the whole end of "linux on the desktop" dream -- again, meant by those believing in it as Linux-based distros being a major desktop OS player to overtake MS, not in the "works for me and I've installed it to my parents too" sense).


I don't know why this is being downvoted. Whether you like ESR or not, he is notable for all of these things. Plus founding the Open Source Initiative. And he maintains GPSD.

One can debate whether these things are all valuable, and how any value they may have may fit into how to judge him as a person along with his flaws, but they are notable.


>Plus founding the Open Source Initiative.

Which is notable itself (OSI) because?


Because they basically invented the term "open source." Prior to them the biggest thing in popularizing what we now call "open source" was RMS and the Free Software Foundation, which lots of people were turned off by.

https://en.m.wikipedia.org/wiki/Open_Source_Initiative


No, the biggest thing that popularized what we now call "open source" was (obviously) Linux --- which was the wave OSI caught.


I meant the literal words "open source" as the term to refer to the practices that created software like Linux.


If you're looking for a simple security-oriented ntpd, I can't recommend OpenNTPd enough. It's cross platform and available for most Linux distros in packages.

http://www.openntpd.org

I have a much higher trust level for OpenBSD on security issues than I do with either of these projects.


Just remember not to use OpenBSD or OpenNTPD for time servers; they can only be used as clients.

(This is because neither supports leap seconds, so using them as servers will cause clients to desynchronize. Their belief that we don't need leap seconds is quite irrelevant: Right now we have them.)


What do you mean when you say "we have them"? Because they are in UTC?

RFC 5905 (NTPv4) notes:

    The goal of the NTP algorithms is to minimize
    both the time difference and frequency difference between UTC and the
    system clock.  When these differences have been reduced below nominal
    tolerances, the system clock is said to be synchronized to UTC.

Since nominal tolerances are not defined in the standard, a server would still be NTP conforming if it smeared the leap second.
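
For readers unfamiliar with smearing, a minimal sketch in Python follows. It assumes a linear smear spread over a 24-hour window that ends at the leap. The window length, the function name, and the parameters are assumptions for illustration; they are not taken from RFC 5905 or from any particular server.

```python
# Hypothetical linear leap smear: instead of inserting one extra second at
# once, the server spreads it evenly over a window before the leap.

SMEAR_WINDOW = 86_400.0  # assumed window length in seconds (24 hours)
LEAP = 1.0               # one inserted leap second

def smear_offset(elapsed: float, window: float = SMEAR_WINDOW) -> float:
    """Seconds of the leap already absorbed `elapsed` seconds into the window."""
    if elapsed <= 0:
        return 0.0
    if elapsed >= window:
        return LEAP
    return LEAP * elapsed / window

# Halfway through the window, a smearing server differs by half a second
# from a server that has not yet applied the leap.
print(smear_offset(43_200))                  # 0.5
# Slew rate implied by the smear, in parts per million.
print(round(LEAP / SMEAR_WINDOW * 1e6, 2))   # 11.57
```

Under these assumptions, a client polling both a smearing server and a non-smearing one would see them disagree by up to nearly a second late in the window.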


> What do you mean when you say "we have them"? Because they are in UTC?

They are part of the time standard used by the entire world. BSD doesn't like them, fine, but pretending (in their code) that they don't exist is not the correct approach.

It works more or less fine for clients, but not for servers.

> a server would still be NTP conforming if it smeared the leap second

Not really. Consider the situation if a client is speaking to multiple servers, some normal servers, some buggy BSD servers.

Consider a client that manages to do a mixture of correctly applying a leap second, while also doing time smearing because an upstream source is doing time smearing.

BSD is simply doing the wrong thing here. (My understanding is that BSD time servers are not permitted in the ntp pool because of the problems they cause.)


Of course you shouldn't mix servers that handle and ignore the leap second in one pool, so OpenNTPD servers aren't accepted in the ntp.org pool. Doesn't mean you can't run OpenNTPD as an internal server for your machines :)

http://marc.info/?l=openbsd-misc&m=143544318718489&w=2


I tried using OpenNTPd as the ntp client on a slow virtual server last year.

It seemed like perfectly decent software except for the fact that it failed to keep the time in sync.

I switched to the usual (ntp.org) ntpd, with the same pool servers, and it worked fine.


Did you try to diagnose the problem or did you just switch? I can't take such a vague bug report seriously.


I didn't try to diagnose the problem. I just switched.

I do not claim that OpenNTPD has a bug. Maybe it has requirements on the quality of the local clock that the virtual machine didn't meet. Maybe it has requirements on the remote time sources or the network that the setup didn't meet.

What I do know is that over the years I've deployed ntp-over-the-Internet in a dozen or two different setups (including some user-mode-Linux ones with really poor local clocks), and ntp.org's ntpd has always managed to sync the clock.

The one time I gave OpenNTPD a go it didn't. Looking at the log, I was running it for several months. After a while I got tired of fixing it manually when the clock was a few seconds out.


And the winner of the battle between NTP (classic) and NTPsec is... chrony

https://chrony.tuxfamily.org/comparison.html


There's lots of good things chrony does that NTP doesn't and vice versa, but since chrony doesn't support Windows (roughly a third of the server market), I wouldn't claim that it's the clear winner.


Doesn't Microsoft have their own NTP implementations for use in the Windows world? IIRC, it's called Windows Time Service and it's included with Active Directory.

At a previous job, we pointed our in-office Linux dev systems at our Windows domain controller and never had any issues with time synchronization. Since this was back in the embrace-and-extend days, I was pleasantly surprised by how interoperable MS was when it came to NTP.


I support scientific computing, so depending on the application I will replace w32time with ntpd, which offers better accuracy and built-in support for reference clocks.

That said, I didn't realize that the w32time service got a _lot_ better in Windows Server 2016:

https://technet.microsoft.com/en-us/windows-server-docs/iden...

It looks like Microsoft added support for NTPv4, and w32time now boasts 1-ms accuracy, which is at least a factor of 10 better than the previous Windows Server release.


[...]began working with two of its representatives, Eric S. Raymond[...]

Usually signals the start of much drama.


The presentation mentioned was where I learned of the project. It sounded very very strange to me. Script kiddies who don't know what NTP does, only that it's good for DDoSing? What was that about..?

I wasn't surprised to learn that the project one year later was caught up evaluating Rust vs Go. That's great and all, but it's not saving the Internet from "meltdown".

Anyone who needs a modern ntpd and doesn't need the refclock stuff should probably just go with chrony.


> Script kiddies who don't know what NTP does, only that it's good for DDoSing? What was that about..?

There were several large reflection issues in ntpd in the past few years which led it to become a popular DDoS method. Reflection is the big winner lately.

No need for a botnet, just scan UDP services, find ones which reply with large packets or a series of packets and then spoof your IP (easy from many VPS and dedicated server providers) and flood out requests for those large packets.

You can get amplification of 100x as much traffic with only a small request and it's quite challenging to trace back to the original source due to the spoofed packets.
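
The amplification figure is just a ratio of bytes out to bytes in. A minimal sketch in Python, where the function name is hypothetical and the packet sizes are made-up placeholders rather than measurements of any real ntpd response:

```python
def amplification_factor(request_bytes: int, response_bytes: int) -> float:
    """Bytes delivered to the spoofed address per byte the sender emits."""
    if request_bytes <= 0:
        raise ValueError("request_bytes must be positive")
    return response_bytes / request_bytes

# Hypothetical sizes chosen to reproduce the 100x figure mentioned above.
print(amplification_factor(50, 5_000))  # 100.0
```

A service operator can run the same calculation over their own request and response sizes to see which queries are worth restricting or rate-limiting.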


I'll bite. Which large VPS / dedicated server providers don't implement BCP38?


I'd rather not name and shame, but suffice to say a quick test with hping shows this is currently working on a number of my servers, some on larger, some on smaller providers. And all it takes is 1.

You definitely have to hunt around if this is something you want, and the number does seem smaller than I remember, so many more are probably implementing it since I last checked, but still not all.


That much is well known. It's the other part that's questionable.


What's the status of phk's ntimed? The planned release dates on http://nwtime.org/projects/ntimed/ were quietly shifted from 2016 to 2017 a few weeks ago, and I'm not seeing much happening at http://phk.freebsd.dk/time/ .


I expected it would turn out like that after reading how PHK wrote about it.

Whoever comes to some big project sees only the small part of it that he immediately understands, and it appears to him that he can do it "much simpler." Yes he can, if he just does that small part. The problem is, the big projects actually do more. If you don't need the big project, you can use the small simple project too. Everybody has fun making something small. Maintaining something big -- that's the hard part, and we see from the article we are commenting on that it's again what the "saviors" want to avoid:

"[Their plan was] to rescue something, quote unquote, fix it up, and turn it over to a maintenance team."


From what I understand of ESR's claims, much of the removed functionality is support for older platforms. Unlike NTP Classic, he asserts POSIX and C99. Apparently, this has drastically reduced complexity/size.

Of course, I haven't worked on the NTP code. But given the described circumstances, I'm willing to entertain the argument that older features now cost more to maintain than they are worth.


Sorry, you fully missed the topic here. There was another "liberator" who did his own "NTP (minus most of it) from scratch" effort -- PHK, not ESR. Please see the posts in the thread, specifically the one by gpvos for the link.

The problem is, none of these "liberators" actually improves NTP, and none can (given their goals) provide a "better" NTP, only a small subset that is typically even less tested. And every one of them gets some funding and a lot of attention, instead of the real maintainers of the darned NTP.

Doing big, widely present and very compatible and long-lived projects is hard. Very hard. And the maintainers should be helped, not blamed.


As an outside observer - and ignoring the whole drama aspect of ESR - you say "very compatible".

Does "very compatible" mean "wider compatibility than POSIX and C99"? If so, that's a very ambitious goal, but is it worth it in this day and age?


To be very clear: I don't consider it a positive contribution when somebody is unwilling to commit to supporting the big project, promotes his quasi-solution by discarding the existing feature set and the existing compatibility, and then runs away.

If somebody complains "the source is not up to the current security standards" that doesn't mean that anything has to be discarded to work towards that goal.

And if somebody wants to make the source compile only with the most recent compilers and only for a small subset of the previous platforms, he obviously has another agenda than improving the security, or actually helping the project. It's just imposing his taste on people who never asked for that.

From my personal experience, at my previous job I was able to "just use" NTP, and I know that on these platforms I still can't use any of the alternatives. I admit I surely have another perspective than the average Linux user who doesn't really care what keeps his time in sync as long as it "looks right."


Sometimes dropping older platforms is a sane approach. In this case, however, I would expect even MIPS R4000 running IRIX and PPC 603 AIX boxes to still be able to keep their internal clocks in sync the same way my Xeon machines do.


Also worth considering "NTPsec has removed lots of stuff that has zero reported bugs in them, like sntp, the ntpsnmd code, and various refclocks."

And reading this comment:

https://lwn.net/Articles/714279/

"in general all the other NTP implementations likewise lack broad support for all the various reference clock hardware and drivers. This is why ntpd is still used so heavily as stratum 1 servers."

And, additionally.. are the servers running Windows a small and obscure base?


> Unlike NTP Classic, he asserts POSIX and C99.

Aka not supporting Windows.


Ugh. Why can't people just act civilized to each other? I can't believe this type of empire building goes on even for projects as altruistic as NTP.


There is a large percentage of people for whom power games are more important than getting things done. See narcissism, cluster B, etc.

Those people are toxic, and removing them from your community makes everything better. If you can't do that, ensure that the only topics of discussion are reality based, and focussed on getting things done.

I'm not saying the people here meet those criteria. But it does answer your question.


I wonder how much the funding of NTPsec plays into this story? Funding is tied to the need for a project. If a project can claim an exaggerated need, then they can get more funding. So you end up with a conflict of interest when you hear one side's version of reality.


Par for the course for Raymond: make a big noise, little motion, claim credit.

Stenn just gets things done.

disclaimer: have known Stenn since the 90s, Raymond casually since the 80s.


Why not get rid of NTP altogether? Servers ought to be able to accept hardware time and frequency inputs. Typically this would mean a 1 Pulse-Per-Second (PPS) signal and a 10 MHz signal. You might also want some kind of serial signal that describes the date and time to label the last second pulse.

Why don't any professional servers come with this kind of connection built-in? Why can't I synchronize my Xeon chip's clock to my own reference frequency?


1-second-per-second is a difficult engineering challenge:

https://rachelbythebay.com/w/2014/06/14/time/ (associated HN discussion - https://news.ycombinator.com/item?id=8066915)

Here's a choice quote from the NTP FAQ (http://www.ntp.org/ntpfaq/NTP-s-sw-clocks-quality.htm):

Unfortunately all the common clock hardware is not very accurate. This is simply because the frequency that makes time increase is never exactly right. Even an error of only 0.001% would make a clock be off by almost one second per day. This is also a reason why discussing clock problems uses very fine measures: One PPM (Part Per Million) is 0.0001% (1E-6).

And if you want to delve into the physics, you'll enjoy this tutorial by John Vig (U.S. Army Communications-Electronics Command):

http://www.am1.us/Local_Papers/U11625%20VIG-TUTORIAL.pdf
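The figures in the FAQ quote are easy to sanity-check: a fractional frequency error multiplied by 86400 seconds gives the drift per day. A small sketch of that conversion (the function names are mine, and it assumes the error stays constant over the day):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def percent_to_ppm(percent: float) -> float:
    """1% is 10,000 parts per million."""
    return percent * 1e4


def drift_seconds_per_day(frequency_error_ppm: float) -> float:
    """Seconds gained or lost per day for a constant frequency error in ppm."""
    return frequency_error_ppm * 1e-6 * SECONDS_PER_DAY


if __name__ == "__main__":
    for pct in (0.001, 0.0001):
        ppm = percent_to_ppm(pct)
        print(f"{pct}% = {ppm:g} ppm -> {drift_seconds_per_day(ppm):.4f} s/day")
```

So 0.001% is 10 ppm, or 0.864 seconds per day, which matches the "almost one second per day" in the FAQ.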


That might have been true a while ago, but today you can get GPS-disciplined oscillators that can produce 1 PPS accurate down to tens of nanoseconds. By accurate, I mean "replicates what the NIST or USNO atomic clocks indicate."


Also your "rachelbythebay" post is discussing exactly the problem I'm trying to solve with hardware - using software. You don't need to run NTPd at all if your computer hardware and infrastructure has solved the problem of "keep my clock accurate down to tens of nanoseconds."

Also, there's another unrelated problem which is "how accurately does a user process running in a Linux kernel get the current time?". That's a problem no matter what kind of hardware the kernel is running on (although hardware might assist with that as well).
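One rough way to look at the user-process side is to ask the runtime what resolution it reports for the wall clock and to measure the smallest step actually observed between back-to-back reads. A sketch using Python's standard `time` module; note this measures granularity as seen from one process, not accuracy against a reference, and the sample count is an arbitrary choice:

```python
import time


def reported_resolution_s() -> float:
    """Resolution the runtime reports for the wall clock, in seconds."""
    return time.get_clock_info("time").resolution


def smallest_observed_step_ns(samples: int = 100_000):
    """Smallest non-zero difference between consecutive wall-clock reads, or None."""
    smallest = None
    prev = time.time_ns()
    for _ in range(samples):
        now = time.time_ns()
        step = now - prev
        if step > 0 and (smallest is None or step < smallest):
            smallest = step
        prev = now
    return smallest


if __name__ == "__main__":
    print("reported resolution (s):", reported_resolution_s())
    print("smallest observed step (ns):", smallest_observed_step_ns())
```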


1. If I have a fleet of servers, it's a lot more important to maintain accurate time between the servers than accurate time in an absolute sense. If your hardware time cable can't span two racks, I don't want two separate clocks on each rack from a different hardware source.

2. For applications where you need extremely accurate time, there already are other solutions like PTP. NTP is mainly useful for wide-area connections. I want to keep my laptop's time accurate, so that if I modify a file here and rsync it elsewhere, make doesn't get confused about the time my laptop sent. I'm not plugging in a hardware time server to my laptop.


>If your hardware time cable can't span two racks, I don't want two separate clocks on each rack from a different hardware source.

It absolutely would span racks. You must account for the additional speed-of-light delay the cable itself would add though.
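For a sense of scale, the cable delay is length divided by the propagation speed in the cable. A quick sketch, assuming a velocity factor of 0.66 (a typical figure for solid-polyethylene coax; check the datasheet for the actual cable):

```python
C_VACUUM_M_PER_S = 299_792_458  # speed of light in vacuum, exact by SI definition


def cable_delay_ns(length_m: float, velocity_factor: float = 0.66) -> float:
    """One-way propagation delay in nanoseconds for a cable of the given length."""
    if not 0 < velocity_factor <= 1:
        raise ValueError("velocity_factor must be in (0, 1]")
    return length_m / (velocity_factor * C_VACUUM_M_PER_S) * 1e9


if __name__ == "__main__":
    for length in (2, 10, 50):  # hypothetical cable runs in metres
        print(f"{length:>3} m -> {cable_delay_ns(length):.1f} ns")
```

With that assumption it works out to roughly 5 ns per metre, so a fixed per-cable correction is enough as long as the lengths are known.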


PTP is more accurate than NTP, but not as accurate as a 1PPS line. However, PTP has the advantage that it is bidirectional, so it can solve the (non-trivial) problem of correcting for propagation delays.
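The bidirectional trick is the standard two-way time transfer calculation: with four timestamps you can separate clock offset from path delay, provided the path is symmetric. A minimal sketch of the arithmetic (the function and variable names are mine; this is not a PTP implementation, and an asymmetric path shows up directly as offset error):

```python
def two_way_offset_and_delay(t1, t2, t3, t4):
    """
    t1: master sends sync        (master clock)
    t2: slave receives sync      (slave clock)
    t3: slave sends delay request (slave clock)
    t4: master receives request  (master clock)
    Returns (slave offset from master, one-way delay), assuming a symmetric path.
    """
    forward = t2 - t1  # delay + offset
    reverse = t4 - t3  # delay - offset
    offset = (forward - reverse) / 2
    delay = (forward + reverse) / 2
    return offset, delay


if __name__ == "__main__":
    # Made-up example in nanoseconds: slave runs 500 ns ahead, cable adds 200 ns each way.
    offset, delay = two_way_offset_and_delay(t1=0, t2=700, t3=1000, t4=700)
    print(f"offset = {offset} ns, delay = {delay} ns")
```

A one-way 1PPS line cannot do this by itself, which is why the cable delay there has to be measured or calculated separately.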


> Servers ought to be able to accept hardware time and frequency inputs. Typically this would mean a 1 Pulse-Per-Second (PPS) signal and a 10 MHz signal. You might also want some kind of serial signal that describes the date and time to label the last second pulse.

Because large numbers of servers in a mass-scale deployment frequently have only two things: Ethernet connections and power. Introducing coaxial cable so you can feed a 10MHz reference signal (such as you feed a reference clock signal to certain RF Rx and Tx equipment) is a non-starter. Introducing a serial cable to every server into an environment that is purely IP/Ethernet is a non-starter.


So people would be willing to spend millions of dollars on computers and would also be unwilling to spend a few thousand more for cabling?


As someone who spends millions of dollars on computers - yes, absolutely.

You severely underestimate the cost of rolling out another cable type one per server. In fact there are entire billion-dollar segments of the industry that specifically exist to remove said cables.

Anything more than power and a few ethernet connected to a rack full of 40+ servers gets real messy real fast, and starts to actually impact cooling and thus power usage in a very measurable way.

Even things like screws in server rails vs. tool-less impact the bottom line in most large scale server ops.

For the 3 total servers that actually need very precise timekeeping we effectively do this. For the tens of thousands of others, that's what IP and network protocols were made for.


I cannot even imagine trying to deploy coaxial cables or serial throughout a datacenter via overhead cabling trays/ladder racks to 120 individual 52U racks each full of 10kW of servers in hot/cold separation systems. It would be a complete nightmare. Cable management is already to the point where we are specifically using 1.6mm diameter uniboot fiber patch cables because the traditional yellow fat stuff starts becoming unwieldy.


> traditional yellow fat stuff starts becoming unwieldy

I'm not sure what you're referring to here. The traditional 10BASE5 cable hasn't been used in literally decades. There's none of it deployed in modern datacenters. https://en.wikipedia.org/wiki/10BASE5 AKA 0.375 inch diameter "frozen yellow garden hose".

So standard fiber optic patch cable is the new "traditional"?


no, I'm talking about the different types of short distance 9/125 fiber patch cables used between servers and top of rack switches, or in large end of row aggregation switch environments and with large-scale patch panels (for 144/288 count cables):

http://tiffanycommunication.altervista.org/wp-content/upload...

and another example of duplex uniboot inside one cable jacket:

http://www.fibertronics-store.com/images/1378852143657-21093...

versus zipcord/duplex fiber which has a figure-8 shape, and the total jacket diameter is fatter in the range of 2.2 to 3mm:

http://www.pctstore.com/v/vspfiles/photos/SCASCADP30S10-5.jp...

ignore the SC or LC connectors, the difference is the cable diameter itself. Figure-8 shaped duplex two-strand fiber cables are fatter and occupy more space than a single circular jacket that contains two strands and a single boot connector.

This matters for cable management when you're dealing with a switch that might have 4 x 48-port 10GbE SFP+ linecards in it, all heavily populated. All of the cables come off the switch in a neatly organized way and go vertically up some fiber cable management to a patch panel.


> that's what IP and network protocols were made for.

No, it's not and that's the problem. IP was meant for wide area internetworking, not local low-latency time distribution.

With enough engineering one could potentially include the time and frequency reference signals directly onto existing cabling - for example they could be put onto a fiber at a different wavelength from the ethernet traffic. But it would take quite a bit of money to convince a NIC manufacturer to do this.

Some NICs already come with 1pps reference: http://www.solarflare.com/ptp-adapters


What scenarios do you see where PTP (which gets improvements from hardware-support in widely available network cards and switches, but doesn't require extra cabling etc) isn't good enough? On a geeky level syncing everything to a central clock is fun, but I don't see many practical reasons to actually build complex hardware for it given that we have working alternatives on top of our existing infrastructure.


>What scenarios do you see

One scenario which we certainly don't have today: access to a tick-accurate global clock across a large number of computers. It would be interesting to see how distributed scheduling and communications between computers would be coordinated if each computer had access to an internal clock timestamp that exactly matched all of its peers.


You underestimate the complication of cabling and managing that cabling when building a facility that is on the scale of eight thousand or more 1U size servers (or the proprietary open compute/blade type equivalent of the same in 1U).

Then the RF and/or serial cabling from each server has to go somewhere, to some equipment (what, a new special 10 MHz reference 1U top of rack device that is a bespoke creation?), which consumes both rack space and electricity. And is managed over some sort of management network.


>what, a new special 10 MHz reference 1U top of rack device that is a bespoke creation?

It's not a "bespoke creation", such things have been available for decades:

https://www.microsemi.com/products/timing-synchronization-sy...


And each one of those 1U units costs how much, and can feed reference signals to how many individual end-point servers? It doesn't scale to a massive hosting/hypervisor operation.

Products like the one linked above are priced and intended for providing timing to crucial core network infrastructure, such as an ISP POP that has a 1+1 pair of Juniper MX960 at the main IX point in a major city. The ISP would then have its own set of NTP servers serving as internally-accessible-only stratum 2 to all of their other network equipment that is topologically nearby within the same AS.


>and each one of those 1U units costs how much

It depends on what capabilities you want. A good oscillator with GPS discipline capabilities would cost between $100-$10000 depending on specifications. In particular, if you want it to have high stability with time and temperature when GPS fails you must spend much more money.

>and can feed reference signals to how many individual end-point servers?

In theory one of these units could feed a whole datacenter (thousands of endpoints), if you also purchased enough cabling and distribution amplifiers.


>You underestimate the complication

I'm not underestimating anything. If people want to have highly accurate time available at a hardware level, they can pay for it. I'm complaining that it isn't even an option from server vendors, who are quite capable of implementing it.


You can. The workstation in front of me is directly connected (via serial) to a GPS providing a 1PPS signal.

Everything else in my house uses NTP and, in some cases, PTP, however.


Yes but the oscillator in your computer is almost certainly not referenced to a frequency reference. Also you are not able to correct that oscillator's frequency with your hardware (yes, you can estimate your frequency error from external PTP measurements but you cannot physically correct it - nor prevent drift).


I wonder how that would work with virtual servers?


Does the hardware clock that runs the physical chip also provide a clock for the underlying virtual cpu?


I read through some of the blog posts related to language choice for NTPSec (helpfully provided in LWN comments).

https://blog.ntpsec.org/

I can't help but wonder if there are other options, most notably C++. Don't get me wrong, I have a well developed loathing of C++ based on past experience, but it seems like a category fit and people I respect have said good things about the latest versions. Nim would also seem like a pretty good fit technically, though I could well understand if its relative immaturity and lack of developers (another recent story here on HN) cause it to be deemed inappropriate for this situation.


"[Sons] has since become president of ICEI; she described herself in the presentation as having "moved on" and is no longer involved with NTPsec on a daily basis."

"Already, where once only Stenn was looking for support, now Raymond is in a somewhat similar position, as NTPsec has lost its Core Infrastructure Initiative funding as of September 2016."

So, a drive-by fork.

BTW, has anyone used any project that ESR is heavily involved in? The only things I know about are NTPsec, fetchmail and his attempt to hack the Linux kernel build system.


According to ESR, if you use a mobile phone, you're using his code (I assume this to refer to gpsd).


"GPSD is everywhere in mobile embedded systems. It underlies the map service on Android [4.0 and after?] phones. It's ubiquitous in drones, robot submarines, and driverless cars. It's increasingly common in recent generations of manned aircraft, marine navigation systems, and military vehicles."

Interesting, thanks.


why was the build system changed to waf? ugghh. If ntpsec wants to lower barriers to entry, just use/update autotools.


I have no experience with waf, but I have some with autotools, and 'barrier to entry' is a good description of autotools. Also 'barrier to progress,' 'barrier to happiness' and 'barrier to sanity.'


CMake is the usual choice of developers who don't want to deal with autotools these days. Making people learn yet another build system is not going to be popular.



