Hacker News | luu's comments

Right now? I'm pretty unhappy. I don't think I want to talk about exactly why publicly while I'm still in this job, but to give you an idea of how bad the situation is, literally all of my friends have been trying to get me to quit for months, with the exception of people who have given up because they think I must be insane.

On the other hand, I have a decade of full-time experience and I've been happy for about seven out of ten years. All things considered, that's not too bad. The other way to look at it is that I've had maybe five roles at one company, two at another, and one at a third, and I'd say four of those have been good. That's only 4/8, but it's possible to bail on bad roles and stay in good ones, which is how it's worked out to being good 70% of the time. Considering how other folks I know feel about their jobs, I can't complain about being happy 70% of the time.

In retrospect, some of my decisions have been really bad. If I could do it over again, I'd bail more quickly on bad roles and stay in good ones for longer.

My dumbest mistake was the time I was in an amazing position (great manager & team, really interesting & impactful work), except for two problems: an incredibly arrogant and disruptive person, whose net productivity was close to zero, who would derail all meetings, and weird political shenanigans way above my pay grade. When I transferred, management offered to transfer the guy to another team so I'd stay, and I declined because I felt bad about the idea of kicking someone off the team.

From what I've heard, the problematic dude ended up leaving the team later anyway, so not having him kicked off didn't make any difference, and the political stuff resolved itself around the same time. The next role I ended up in was the worst job I've ever had. And the one after that is my current job, which is, well, at least it's not the worst job I've ever had. Prior to leaving the amazing job, I thought that it was really easy to find great jobs, so it wasn't a big deal to just go find another one. Turns out it's not always so easy :-). If I hadn't bailed on that and just fixed it, I'd be 4/6 and I could say I was happy with my job 80% of the time. Oh well, lesson learned. Looking back, I was incredibly lucky to get the roles that I did, but that same luck blinded me to the fact that it was luck and that there are some really bad jobs out there.

4/6 is 66%, and yes there are some really bad environments to work in, mostly related to people.

There is no chicken and egg problem for most workloads. Processors are quite good at handling correctly predicted branches, and overflow checks will be correctly predicted for basically all reasonable code. In the case where the branch is incorrectly predicted (because of an overflow), you likely don't care about performance anyway.

See http://danluu.com/integer-overflow/ for a quick and dirty benchmark (which shows a penalty of less than 1% for a randomly selected integer heavy workload, when using proper compiler support -- unfortunately, most people implement this incorrectly), or try it yourself.

People often overestimate the cost of overflow checking by running a microbenchmark that consists of a loop over some additions. You'll see a noticeable slowdown in that case, but it turns out there aren't many real workloads that closely resemble doing nothing but looping over addition, and the workloads with similar characteristics are mostly in code where people don't care about security anyway.
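For what it's worth, the check itself is tiny. Here's a minimal Rust sketch (my own illustration, not the benchmark linked above) of a checked addition loop; in correct code the overflow branch is essentially never taken, which is why branch prediction keeps its cost near zero on most real workloads:

```rust
// Checked summation: checked_add returns None on overflow.
// The None branch is cold and will be correctly predicted
// not-taken for basically all reasonable inputs.
fn checked_sum(xs: &[i32]) -> Option<i32> {
    let mut total: i32 = 0;
    for &x in xs {
        total = total.checked_add(x)?; // overflow branch, almost never taken
    }
    Some(total)
}

fn main() {
    assert_eq!(checked_sum(&[1, 2, 3]), Some(6));
    assert_eq!(checked_sum(&[i32::MAX, 1]), None); // overflow detected
}
```

Note that this is also exactly the shape of the misleading microbenchmark: a tight loop of nothing but additions is the worst case for the check, and most real code doesn't look like this.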


People who are actually implementing new languages disagree. Look at the hoops Rust is jumping through (partially) because they don't feel comfortable with the performance penalty of default integer overflow checks: https://github.com/rust-lang/rfcs/pull/146


That proposal for Rust was ultimately not accepted, here's what replaced it: https://github.com/nikomatsakis/rfcs/blob/integer-overflow/t...

TL;DR: There exists a compiler flag that controls whether or not arithmetic operations are dynamically checked, and if this flag is present then overflow will result in a panic. This flag is typically present in "debug mode" binaries and typically absent in "release mode" binaries. In the absence of this flag overflow is defined to wrap (there exist types that are guaranteed to wrap regardless of whether this compiler flag is set), and the language spec reserves the right to make arithmetic operations unconditionally checked in the future if the performance cost can be ameliorated.
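As a concrete sketch of the guaranteed-wrapping types mentioned above (using the standard `wrapping_add` method and the `std::num::Wrapping` newtype):

```rust
use std::num::Wrapping;

fn main() {
    // Wrapping behavior that is guaranteed regardless of build mode:
    let x: u8 = 255;
    assert_eq!(x.wrapping_add(1), 0); // explicit wrapping method

    // The Wrapping<T> newtype makes the ordinary `+` operator wrap.
    let w = Wrapping(255u8) + Wrapping(1u8);
    assert_eq!(w.0, 0);

    // Plain `x + 1` is the mode-dependent case: it panics in a build
    // with debug assertions enabled and wraps in a typical release build.
}
```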


Yeah, I think Rust has probably made the right decision here, but it's frustratingly imperfect. This introduces extra divergence in behavior between debug and release mode, which is never good.

Note that there's even pushback in this thread about enabling overflow checks in debug mode due to performance concerns...


I'm hopeful that as an industry we're making baby steps forward. Rust clearly wants to use checked arithmetic in the future; Swift uses checked arithmetic by default; C++ should have better support for checked arithmetic in the next language revision. All of these languages make heavy use of LLVM so at the very least we should see effort on behalf of the backend to reduce the cost of checked arithmetic in the future, which should hopefully provide additional momentum even in the potential absence of dedicated hardware support.


If you read the thread, you'll see that the person who actually benchmarked things agrees: someone implemented integer overflow checks and found that the performance penalty was low, except for microbenchmarks.

If you click through to the RISC-V mailing list linked to elsewhere in this discussion, you'll see that the C++17 standard library is planning on doing checked integer operations by default. If that's not a "performance focused language", I don't know what is.


  > the C++17 standard library is planning on doing checked 
  > integer operations by default
In C++, wrapping due to overflow can trivially cause memory-unsafe behavior, so it's a pragmatic decision to trade off runtime performance for improved security. However, Rust already has enough safety mechanisms in place that integer overflow isn't a memory safety concern, so the tradeoff is less clear-cut.

Note that the Rust developers want arithmetic to be checked, they're just waiting for hardware to catch up to their liking. The Rust "specification" at the moment reserves the right to dynamically check for overflow in lieu of wrapping (Rust has long since provided types that are guaranteed to wrap for those occasions where you need that behavior).

  > someone implemented integer overflow checks and found 
  > that the performance penalty was low, except for 
  > microbenchmarks.
I was part of that conversation back then, and the results that I saw showed the opposite: the overhead was only something like 1% in microbenchmarks, but around 10% in larger programs. (I don't have a link on hand, you'll have to take this as hearsay for the moment.)


The benchmark I see says up to 5% in non-microbenchmarks. A 5% performance penalty is not low enough to be acceptable as the default for a performance-focused language. If you could make your processor 5% faster with a simple change, why wouldn't you do it?

Even if the performance penalty were nonexistent in reality, the fact is that people are making decisions which are bad for security because they perceive a performance problem, and adding integer overflow traps would fix that.


As someone who's spent the majority of their working life designing CPUs (and the rest designing hardware accelerators for applications where CPUs and GPUs aren't fast enough), I find that when people say something like "If you could make your processor 5% faster with a simple change, why wouldn't you do it?", what's really meant is "if, on certain 90%-ile or 99%-ile best case real-world workloads, you could get a 5% performance improvement for a significant expenditure of effort and your choice of a legacy penalty in the ISA for eternity or a fragmented ISA, why wouldn't you do it?"

And the answer is that there's a tradeoff. All of the no-brainer tradeoffs were picked clean decades ago, so all we're left with are the ones that aren't obvious wins. In general, if you look at a field and wonder why almost no one has done this super obvious thing for decades, maybe consider that it might be not so obvious after all. As zurn mentioned, there are actually a lot of places where you could get 5% and it doesn't seem worth it. I've worked at two software companies that are large enough to politely ask Intel for new features and instructions; checked overflow isn't even in the top 10 list of priorities, and possibly not even in the top 100.

In the thread you linked to, the penalty is observed to be between 1% and 5%, and even on integer heavy workloads, the penalty can be less than 1%, as demonstrated by the benchmark linked to above. Somehow, this has resulted in the question "If you could make your processor 5% faster ...". But you're not making your processor 5% faster across the board! That's a completely different question, even if you totally ignore the cost of adding the check, which you are.

To turn the question around: if people aren't willing to pay between 0% and 5% for the extra security provided, why should hardware manufacturers implement the feature? When I look at most code, there's not just a 5% penalty, but a one to two order of magnitude penalty over what could be done in the limit with proper optimization. People pay those penalties all the time because they think it's worth the tradeoff. And here, we're talking about a penalty that might be 1% or 2% on average (keep in mind that many workloads aren't integer heavy) that you don't think is worth paying. What makes you think that people who don't care enough about security to pay that kind of performance penalty would pay extra for a microprocessor that has this fancy feature you want?


> people aren't willing to pay between 0% and 5% for the extra security provided

This is not true. One problem is that language implementations are imperfect and may have much higher overhead than necessary. An even bigger problem is that defaults matter. Most users of a language don't consider integer overflow at all. They trust the language designers to make the default decision for them. I believe that most people would certainly choose overflow checks if they had a perfect implementation available, and perfect knowledge of the security and reliability implications (i.e. knowledge of all the future bugs that would result from overflow in their code), and carefully considered it and weighed all the options, but they don't even think about it. And they shouldn't have to!

For a language designer, considerations are different. Default integer overflow checks will hurt their benchmark scores (especially early in development when these things are set in stone while the implementation is still unoptimized), and benchmarks influence language adoption. So they choose the fast way. Similarly with hardware designers like you. Everyone is locally making decisions which are good for them, but the overall outcome is bad.


  > if people aren't willing to pay between 0% and 5% for 
  > the extra security provided
In the context of Rust, integer overflow checks provide much less utility because Rust already has to perform static and dynamic checks to ensure that integers are used properly, regardless of whether they've ever overflowed (e.g. indexing into an array is a checked operation in Rust). So as you say, there's a tradeoff. :) And as I say elsewhere in here, the Rust devs are eagerly waiting for checked overflow in hardware to prove itself so that they can make it the default and do away with the current compromise solution (which is checked ops in debug builds, unchecked ops in release builds).
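For example, the bounds check on indexing is always on in safe Rust, regardless of build mode. A small illustrative sketch:

```rust
fn main() {
    let v = vec![10, 20, 30];

    // Indexing is bounds-checked in safe Rust: an out-of-range index
    // panics instead of reading arbitrary memory.
    assert_eq!(v[2], 30);
    assert!(std::panic::catch_unwind(|| v[3]).is_err());

    // get() is the non-panicking alternative, returning an Option.
    assert_eq!(v.get(3), None);
}
```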




There are areas where you could make a typical current processor "up to 5%" faster in exchange for dumping various determinism features provided in hardware that are conducive to software robustness in the same way as checked arithmetic. For example, the Alpha had imprecise exceptions and weak memory ordering. The consensus seems to be against this kind of tradeoff.


Is there a writeup somewhere on why the Rust people nevertheless decided against checked arithmetic?

The current RFC seems to be https://github.com/rust-lang/rfcs/blob/master/text/0560-inte... which seems to avoid taking a stand on the performance issue.


This RFC was the result of a long discussion that took place in many forums over the course of several years, so it's tricky to summarize. Here's my attempt:

1. Memory safety is Rust's number one priority, and if this were a memory safety concern then Rust's hands would be tied and it would be forced to use checked arithmetic just as it is forced to use checked indexing. However, due to a combination of all of Rust's other safety mechanisms, integer overflow can't result in memory unsafety (because if it could, then that would mean that there exists some integer value that can be used directly to cause memory unsafety, and that would be considered a bug that needs to be fixed anyway).

2. However, integer overflow is still obviously a significant cause of semantic errors, so checked ops are desirable due to helping assure the correctness of your programs. All else equal, having checked ops by default would be a good idea.

3. However however, performance is Rust's next highest priority after safety, and the results of using checked operations by default are maddeningly inconclusive. For some workloads they are no more than timing noise; for other workloads they can effectively halve performance due to causing cascading optimization failures in the backend. Accusations of faulty methodology are thrown around and the phrase "unrepresentative workload" has its day in the sun.

4. So ultimately a compromise is required, a new knob to fiddle with, as is so often the case with systems programming languages where there's nobody left to pass the buck to (and you at last empathize with how C++ got to be the way it is today). And there's a million different ways to design the knob (check only within this scope, check only when using this operator, check only when using this type, check only when using this compiler flag). In Rust's case, it already had a feature called "debug assertions" which are special assertions that can be toggled on and off with a compiler flag (and typically only enabled while debugging), so in lieu of adding any new features to the language it simply made arithmetic ops use debug assertions to check for overflow.

So in today's Rust, if you compile using Cargo, by default you will build a "debug" binary which enables checked arithmetic. If you pass Cargo the `--release` flag, in addition to turning on optimizations it will disable debug assertions and hence disable checked arithmetic. (Though as I say repeatedly elsewhere, Rust reserves the right to make arithmetic unconditionally checked in the future if someone can convincingly prove that their performance impact is small enough to tolerate.)
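A small sketch of how this looks from inside a program, using the standard `overflowing_add` method and the `debug_assertions` cfg flag (which tracks the same compiler knob that controls default overflow checking):

```rust
fn main() {
    // Build-mode-independent observation: overflowing_add always wraps
    // and reports whether the wrapped result overflowed.
    let (v, overflowed) = i8::MAX.overflowing_add(1);
    assert_eq!((v, overflowed), (i8::MIN, true));

    // debug_assertions is set by `cargo build` and cleared by
    // `cargo build --release` (unless overridden in the Cargo profile).
    if cfg!(debug_assertions) {
        println!("debug build: `i8::MAX + 1` would panic");
    } else {
        println!("release build: `i8::MAX + 1` would wrap to {}", i8::MIN);
    }
}
```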


> by default you will build a "debug" binary which enables checked arithmetic

The check failures trigger a panic?

Is there any work to enable an ASan-like feature for unsafe blocks BTW?


Yes, they trigger a panic. No, there's no ASan.

There isn't as strong a need for ASan in Rust because so little code is unsafe. Most of the time, the only reason you drop down to unsafe code is because you're trying to do something compilers are bad at tracking (or that is a pain in the neck to encode to a compiler). It's usually quite well-contained, as well.

You can work with uninitialized memory, allocate and free memory, and index into arrays in Safe Rust without concern already (with everything but indexing statically validated).

IMHO the kind of stuff `unsafe` is used for is very conducive to aggressive automated testing.


This is timely, as I’ve just gone through the worst onboarding I've ever experienced. It took two very long days to install an OS on a machine and get it to boot (I had to assemble my computer and then use the corporate installer, which failed multiple times with errors like “an unknown error occurred”). It then took me about a week to run the local project’s “hello world” (I was initially pointed to documentation that had been deprecated for months, and then when I found the right documentation it was incomplete and out of date). The actual process to get there was hilariously baroque; for example (if my email records are correct and I didn’t get duplicate emails from the system), I must have clicked through 18 EULAs to get a subset of the permissions I need. I promise, that’s worse than it sounds, since the form to do so took more time to click through than you can imagine. It was a month before I could effectively do any work at all.

I originally thought that I was just unlucky, but when I asked around I found that I had an above average experience, at least among people near me who started recently. Unlike many other folks, I was actually able to get a username/alias, a computer, and an office. I don’t know what folks without a username do since they can’t even start clicking through the necessary EULAs to get permissions to do stuff, let alone do any actual work.

I’m not sure how it is in other parts of the company, but if you imagine that it’s similar and do some back of the envelope math on how many people get onboarded and how much losing a month (or more) of time costs, it comes out to N million dollars for a non-trivial N. And that’s just the direct cost of the really trivial stuff. It’s hard to calculate the cost of the higher level stuff that’s mentioned in the article, but I suspect that’s even more expensive.


So you're all individually developing locally on your machines instead of on a server that you all use? I think that's your problem right there.


I'm curious, what subset of the software development world are you working in? "Developing locally on your machine" is and has always been standard practice in every part of the industry I have encountered.


In every project I worked on at BT (15 years), and in all the webdev work I've done, it makes no sense to have 20 different PCs all developing and trying to merge the code at the end.

You remove the risk of subtle differences between developers' PCs.

If you're only developing standalone apps I could see it, but Microsoft doesn't develop Windows or Office that way.


Perhaps I just don't understand what you're describing, because in my experience it is completely normal to have developers working on 20 different PCs and all merging code as they go - that's what version control systems are for! I can't really imagine how else you would do it. Do all the devs really mount a shared filesystem with a single copy of the code!?

Or do you mean that the devs all still have individual copies of the source tree, but they do all their work by remoting in to some big monster server and building/testing there? That sounds.... um.... ungodly slow. Though I suppose if you are talking about webdev, perhaps there is no build step, so it might not suck quite as much...?

It sounds like a very different world, and I'm struggling to understand what you mean.

I never worked on Windows or Office, but I did work in devdiv for a while, and at that time Microsoft certainly did develop Visual Studio in the traditional way. There were build farms, to be sure, but they were used for occasional checkpoint builds and for testing. For daily work, we ran the ol' edit/compile/link/run/crash cycle locally, on our own dev machines.


You can have your own source control at the server level, and everyone logs into that server; just run an X server on your PC. And yes, you should be developing on exactly the same hardware it's going to run on.

All you need to set up a new user is a bog-standard PC and a new account on the server.

It's a lot easier to keep dev, test, and live identical than 20 or so developers' PCs.

And even when we did local development (oracle forms) all the code was checked in to a central server which was used to produce a daily build.

How do you think IBM mainframe development is done?


Thanks for explaining. That's certainly a very different way of working from anything I've ever encountered. I don't know enough about mainframes to even wonder how things are done, it simply isn't part of my world at all.

I come from a personal computing background, where it really doesn't make sense to talk about "developing on exactly the same hardware it's going to run on" because you cannot know in advance what that is going to be. The kind of thing you are talking about makes even less sense in the embedded world, where cross-compilation is the rule and the target machines are likely not capable of self-hosting a toolchain at all.


I actually think those subtle differences are more likely to bring problems to the front earlier. If you have code that works on 20 slightly different setups, then it is likely to be more robust than code that has only ever seen one setup.

Obviously this is quite subjective, and depends a lot on what type of development you are doing.


What do you mean when you say develop on a server? Do you actually remote in and do development via commandline? Do you edit the files locally and upload it back? Do you run a remote desktop with gui?

If one person accidentally saves a file that has a syntax error or something, does it crash the server for everyone?

What tools do you use for development?


SCCS or PVCS, normally.


Who handles support for this? Is it Google, or one of the carriers that Google's using? Who do you talk to if the handoff between carriers doesn't quite work? I've skimmed the FAQ and I didn't see that listed in there.

For context, I use Google Voice, and while it's really nice in a lot of ways, it has some unfortunate bugs that have persisted for years that it seems impossible to get support for. I no longer unconditionally recommend it to people because while being able to take calls from my computer and get transcripts of voicemails is great, it also pseudo-randomly decides to only direct some calls to devices where I'm not present with no notification on devices that I'm using and also drops a small percentage of texts (my guess is around 1%). Most people I talk to are horrified by the idea of texts getting dropped, and I don't recommend GV to those folks, although it's fine if that's a tradeoff you're ok with.

I understand that it's a beta and that bugs are expected, but it's a problem when you run into a bug that interferes with your ability to use your phone and there's no way to get support for it.

Phone carriers aren't exactly known for their stellar customer service, but at least with them, if you call back enough times or escalate to higher level support, you'll eventually get a resolution. As far as I can tell, that's not true of Google's phone related products.


The typical MVNO model would be for Google to handle all the support. I agree that end-user support does not seem to be one of Google's strengths.


Their support for the Nexus line of phones has been outstanding. Anyone can phone and speak to real humans with good tech knowledge, arrange a free replacement for (first time) cracked screens, and get next day replacements.

I have no doubt that the support for this will be as good.

Now, their automated side of advertising is crappy unless you're a large spender, but then that's just economies of scale.


> Their support for the Nexus line of phones has been outstanding

That hasn't been my experience, particularly with this issue:


Basically you have to roll the dice on whether you'll have sound in voice calls on a Nexus 4 running Lollipop. 500 comments, 1200 stars, and the issue is simply closed as "wrong forum" with no other comment than "contact customer support".

I mean, I'm glad that Google supports so many camera features, as that's clearly the most important function of a Nexus 4, but at the end of the day I still expect my phone to be able to, you know, make voice calls.

Related HN discussion: https://news.ycombinator.com/item?id=8898669


Android support != Nexus hardware support. It seems like you're much better off support-wise if your phone's screen is shattered than if you are having any sort of issue with the software...


Ironic given that Google is generally believed to be a software company.


It's been a while, but even spending five figures a month, I found it difficult to get quality support from AdWords.


Maybe that's okay -- end-user support isn't exactly one of the cellular industry's strengths either :P


I dunno, I'd rather have "bad support" over "support that literally doesn't exist".


I'd defer to those who currently have Google Fiber. Is support any good?


Google Fiber customer here: it's not the best support I've ever interacted with, but other telcoms have set the bar so ridiculously low that it's comparatively great.


Google owns that network, though.


So what's your point?


If the past is any indication, Google will have really good support at first for this, similar to how it worked for Glass. Then once the product has been received well by early adopters (and received good publicity), they will layoff/downsize until everyone is unhappy.


Obviously I understand your trepidation regarding Google's support, and the proof will be in the proverbial pudding, but they at least claim they'll be providing better support than for most of their other products:

LIVE SUPPORT AVAILABLE AROUND THE CLOCK If you need help, our support team is in the US and available all day, every day. Give us a call, and don't be surprised when you connect right away to a Fi Expert without pressing 0.


This is really neat and I love playing with stuff like this. But, fundamentally, it shows that stars aren't a good measure of anything besides stars, at least outside of the top scorers for commonly used languages. For those, it's not a bad proxy for fame.

If you look at the top javascript or ruby developers, yep, those are all pretty famous javascript and ruby developers. But if you look at the #6 matlab developer, well, turns out that's me. I've probably used matlab for less than 40 hours total, lifetime. And most of that was in grad school, a decade ago. Most of my stars come from a tutorial. Not a tutorial I created -- a tutorial I worked through, that thousands of people have probably done. Ok, so, not many people put matlab code on github, so that data is messy. What about popular languages?

Turns out I'm also the 240th most starred scala developer worldwide. I once used scala for two months and created some projects to help me learn that aren't even close to being polished enough to be useful to anyone. Like most code written by someone who's learning a language, it's not any good. But that somehow puts me at 240? Even in a pretty popular language, by the time you get into the hundreds worldwide (or the top few in most cities), it's people who just threw up some toy projects.

I wonder if this explains why I've been getting recruiters contacting me "because they saw my scala code on github". I doubt anyone who's actually seen my scala code on github would contact me for a scala position, but someone who uses a tool that counts stars might think that I actually know scala and contact me for a scala position. This particular tool is too new to be the source of that, but the page the source data comes from (github archive) shows how easy it is to make BigQuery queries to return results like this.

For Julia, I'm also presently ranked above all of the co-creators of Julia, despite having spent a total of perhaps 20 hours ever using the language (I'm ranked 72nd, compared to the co-creators, who are ranked 113th, 143rd, and unranked).

BTW, in languages I've actually worked in professionally, I'm 98,582/244,375 in a language I used for years before it became trendy, 1,100/1,835 in a language I've used a lot recently, and 75,998/161,465 in a language I've used some recently. In the language I'm most proficient in, the language I'm most likely to reach for if I just want to get things done, I'm 14,800/25,094.

P.S. If the developer is reading this and wants bugreports, your service returns a "503 Service Unavailable" if you click the "top foo github developers in your city" for developers that don't have an associated city.


As another data point - in the Clojure space, I'm ranked right alongside Rich Hickey down in the 30s.

For those of you who aren't aware, Rich is the inventor and maintainer of the language.


That might just say there is very little matlab code on github. Or matlab users don't use github stars much?

Perhaps a better label than 'top developer' would be 'developer of most popular repos' or something.


Or one person made 30 repos and starred them all himself once... giving the account 30 stars in some obscure language.


Am I the only one who thinks that this kind of design question is the new "how many gas stations are there in Manhattan?"

In both cases, the point isn't to get the right answer, but (allegedly) to see how the person thinks. With the estimation question, the trick is to make up some plausible-ish numbers and then multiply and/or add them to get a plausible-ish result. If someone's seen one of those before, they'll nail it for sure. If not, it's a crapshoot. In theory, you're measuring whether or not someone's able to reason well enough to spontaneously estimate something off the top of their head, but if you compare the number of people who've seen those questions before with the number of people who can answer that type of question without ever having seen one, what you're really filtering for is people who have heard of or seen Fermi questions.
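To make the estimation game concrete, here's a sketch of the gas-station question; every input below is a made-up, plausible-ish guess rather than real data, which is the whole point of the exercise:

```rust
// Fermi-style estimate: invent plausible-ish inputs, multiply,
// get a plausible-ish output. All numbers are illustrative guesses.
fn main() {
    let population = 1_600_000.0;            // rough Manhattan population
    let cars_per_person = 0.2;               // guess: low car ownership
    let fillups_per_car_per_week = 1.0;      // guess
    let fillups_per_station_per_week = 7.0 * 12.0 * 10.0; // ~10/hr, 12 hr/day

    let stations = population * cars_per_person * fillups_per_car_per_week
        / fillups_per_station_per_week;
    println!("estimate: ~{:.0} gas stations", stations); // a few hundred
}
```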

A while back, someone's interviewing experience got posted to HN and they mentioned that they got asked to design a URL shortening service maybe five times. They failed it the first couple of times and got progressively better each time. I doubt the interviewers meant to measure whether or not this person had done enough interviews to catch on to the style of questions that are currently trendy, but that's what they actually measured.

I think the most common objection to this is that the point of these questions is to drill down and figure out how the person really thinks, but in empirical studies on interviewing, they find the actual evaluation of chatty questions like this is heavily influenced by all kinds of biases. Techniques that have more clear-cut evaluation criteria, like work sample tests, and even completely non-technical interviews like behavioral interviews end up being better filters. Even then, the filtering isn't "good" (IIRC, the last time I read one of those studies, work sample tests scored the best, and had a correlation of around .5 with an ideal filter), but it's better.

For these kinds of questions, even if you haven't seen the specific question before, there's all sorts of interview gamesmanship that helps tremendously. One thing in the blog post, not spending an hour on requirements gathering, is an example of that. In real life, if you're going to write an application from scratch, spending more than an hour on requirements gathering is perfectly reasonable for a lot of problem domains. But if you're playing the interview game, you know that you have, at most, an hour to sketch out the entire problem, so you have to cut the requirements gathering phase short (but not too short). The post mentions that this gets at real skills. That's true. It does. The problem is that everyone who's seen this kind of question five times is going to be good enough at the interview gamesmanship that they don't need to have good real skills to breeze through the "don't spend too long talking about requirements" sub-filter.


The paradox is that the best programmers tend to be the worst at interviewing (because they usually get hired quickly) and the worst programmers gradually get better at interviewing (after doing lots of them).


The difference here is that the Monopoly question is asking the candidate to reason about how to solve the problem with a computer program. The author admits that there can be biases, but isn't that the case even with work sample tests? Someone could have a novel solution to a problem, and the reviewer could say "oh, that's too clever to maintain."


I think it's a fine question. You're asked to design a program to do something sufficiently complex and ill-specified, which is something you will encounter on the job.

I am not saying that you shouldn't test the coding abilities of your candidate, but this looks to me like a good way to see how he/she reasons about the various components of the system and asks for requirements.


The claim:

“Most notably, it does appear that strong typing is modestly better than weak typing, and among functional languages, static typing is also somewhat better than dynamic typing. We also find that functional languages are somewhat better than procedural languages.”

But how did they determine that?

The authors looked at the 50 most starred repos on github, for each of the 20 most popular languages plus typescript (minus CSS, shell, and vim). For each of these projects, they looked at the languages used (e.g., projects that aren’t primarily javascript often have some javascript).

They then looked at commit/PR logs to figure out how many bugs there were for each language used. As far as I can tell, open issues with no associated fix don’t count towards the bug count. Only commits that were detected by their keyword search technique were counted.

After determining the number of bugs, the authors ran a regression, controlling for project age, number of developers, number of commits, and lines of code.
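The bug-counting step is easy to picture. Here's a minimal sketch of a keyword-based commit classifier in the spirit of what the paper describes (the keyword list is my guess at the kind of list they'd use, not their actual one):

```python
import re

# Hypothetical keyword list; the paper's actual list differs.
BUG_KEYWORDS = ("bug", "fix", "error", "fail", "defect", "patch")

def is_bug_fix(commit_message):
    """Treat a commit as a bug fix if any keyword appears at a word start."""
    msg = commit_message.lower()
    return any(re.search(r"\b" + kw, msg) for kw in BUG_KEYWORDS)

commits = [
    "Fix null pointer dereference in parser",  # counted
    "Add dark mode to the settings page",      # not counted
    "bugfix: off-by-one in pagination",        # counted
]
bug_count = sum(is_bug_fix(m) for m in commits)
```

Note how coarse this is: a commit that fixes ten bugs counts once, a reverted fix still counts, and a message like "prepare patch release" is classified as a bug fix, which is part of why the culture-of-finding-defects confound discussed below matters.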

That gives them a table (covered in RQ1) that correlates language to defect rate. There are a number of logical leaps here that I’m somewhat skeptical of. I might be willing to take those leaps if the results were plausible, but a number of the results in their table are odd.

The table “shows” that Perl and Ruby are as reliable as each other and significantly more reliable than Erlang and Java (which are also equally reliable), which are significantly more reliable than Python, PHP, and C (which are similarly reliable), and that typescript is the safest language surveyed.

They then aggregate all of that data to get to their conclusion.

I find the data pretty interesting. There are lots of curious questions here, like why are there more defects in Erlang and Java than Perl and Ruby? The interpretation they seem to come to from their abstract and conclusion is that this intermediate data says something about the languages themselves and their properties. It strikes me as more likely that this data says something about community norms (or that it's just noise), but they don’t really dig into that.

For example, if you applied this methodology to the hardware companies I’m familiar with, you’d find that Verilog is basically the worst language ever (perhaps true, but not for this reason). I remember hitting bug (and fix) #10k on a project. Was that because we had sloppy coders or a terrible language that caused a ton of bugs? No, we were just obsessive about finding bugs and documenting every fix. We had more verification people than designers (and unlike at a lot of software companies, test and verification folks were first-class citizens), and the machines in our server farm spent the majority of their time generating and running tests (1000 machines at a 100-person company). You’ll find a lot of bugs if you run test software that’s more sophisticated than QuickCheck on 1000 machines for years on end.

If I had to guess, I would bet that Erlang is “more defect prone” than Perl and Ruby not because the language is defect prone, but because the culture is prone to finding defects. That’s something that would be super interesting to try to tease out of the data, but I don't think that can be done just from github data.


You've done a good job of pointing out weaknesses in the paper, but I have a question about your defect rate argument: Would you hold the same position if the study had said the opposite? In other words, if it claimed that Ruby and Perl had more bug fixes (and thus defects) than Erlang and Java, would you claim the study was flawed due to a culture of meticulous bug-finding in Perl and Ruby?

From Conservation of Expected Evidence[1]:

If you try to weaken the counterevidence of a possible "abnormal" observation, you can only do it by weakening the support of a "normal" observation, to a precisely equal and opposite degree.

It really seems like a stretch to say that higher bug fix counts aren't due to higher defect rates, or that higher defect rates are a sign of a better language. Language communities have a ton of overlap, so it seems unlikely that language-specific cultures can diverge enough to drastically affect their propensity to find bugs.

1. http://lesswrong.com/lw/ii/conservation_of_expected_evidence...


I'd love to see a methodology similar to the one here analyzing concurrency bugs: http://www.cs.columbia.edu/~junfeng/09fa-e6998/papers/concur... . Akin to what's done in social sciences, they applied simple labels to bugs in the bug repos -- a grad student and some undergrads can label a lot in a couple weeks -- and regress on that.


Personally, I read only the abstract, and was pleased to see it confirmed my own preferences: static typing is better, strong typing is better. But then I saw this disclaimer:

> It is worth noting that these modest effects arising from language design are overwhelmingly dominated by the process factors such as project size, team size, and commit size.

So I dismissed the paper as "not conclusive", at least for the time being. I wouldn't be surprised if their findings were mostly noise, or reflected a confounding factor they missed.

By the way, I recall some other paper saying that code size is the most significant factor in predicting everything else. Which means that more concise and expressive languages, which yield smaller programs, will also reduce the time to completion, as well as the bug rate. But if their study corrects for project size while ignoring the problems being solved, then it overlooks one of the most important effects of a programming language on a project: its size.


What's striking is the comment that the "mysql" project has the highest bug fix density of the C programs. That seems unexpected, because MySQL is very heavily used and reasonably stable.

It may be simply that MySQL bugs actually get fixed. The work behind this paper counts bug fixes, not bug reports. Unfixed bugs are not counted.


Having used MySQL in the past and vowed never to use it again, I would not be especially surprised if the MySQL code base was unusually buggy. On the other hand, high use rates could well lead to high bug discovery rates.


Just the initial data set alone is going to be incredibly unrepresentative of software in general.

* Most-starred projects mean these are all successful projects. Much software is not successful.

* A successful project means there are likely more-experienced-than-average engineers coding.

* Most-starred projects will be older, more stable code-bases than most software.

* Open Source development is a small slice of software development.

* Github is a sub-set of Open Source development.

And I expect there are a myriad of holes to poke elsewhere. In general I distrust any research that surveys GitHub and tries to make claims about software development in general. It is lazy.


>> why are there more defects in Erlang and Java than Perl and Ruby?

I have no experience with Erlang, but one reason I'd expect Java to have more defects than Ruby and Perl is that Java is more verbose, i.e. it takes more code to get something done. One would naively expect to find an association between the size of commits and their propensity to contain errors.


The nice thing about the paper is that they try to put data where others put guesses, rants, expectations, and beliefs. Arguing with data is actually difficult (and rare) in the software engineering field.

The practical result is that language choice doesn't matter since the effects are very small.


> There are lots of curious questions here, like why are there more defects in Erlang and Java than Perl and Ruby?

Maybe because Erlang and Java are used for projects of higher complexity (larger scope, more interacting components, etc.)? Did the authors try to address this issue at all?


The standout result for me that led me to believe this was very likely the case was that Erlang scored _horribly_ on concurrency bugs. To me, that makes sense: you're seeing all the bugs that come from trying to tackle tricky high concurrency situations, which is why people picked Erlang in the first place. If people tried to tackle those exact same problems in other languages, we'd probably see them doing worse at concurrency.


Not to mention the Erlang mantra of 'let it crash'. They're tracking bugs, not severity. Someone may file an issue "hey, in this instance, this thing goes wrong", but because of the supervision process, it doesn't actually cause anything to break (just a logged error message and a bit of system churn). The language actively encourages you to code the happy path and address failure conditions only as necessary, relying on the supervision tree to handle errors that stem from deviating off the happy path.


This is one of those things that's well known by people who spend a lot of time doing low-level nonsense that really ought to get wider press. You really don't want to use x86 hardware transcendental instructions.

I've seen people try to measure the effectiveness of their approximation by comparing against built-in hardware, but their approximation is probably better than the hardware! If you're not writing your own assembly, you'll be ok on most modern platforms if you just link with "-lm" and use whatever math library happens to be lying around as the default, but it's still possible to get caught by this. On obscure platforms, it's basically a coin flip.
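Here's a quick way to see the failure mode, sketched in Python (assuming your platform libm does proper Payne–Hanek argument reduction, which glibc does): reducing the argument with a double-precision 2π (roughly analogous to fsin's too-short 66-bit internal π) is fine for small inputs but drifts badly for large ones.

```python
import math

TWO_PI = 2.0 * math.pi  # off from the true 2*pi by roughly 2.4e-16

def naive_sin(x):
    """sin() with naive range reduction modulo a double-precision 2*pi."""
    return math.sin(math.fmod(x, TWO_PI))

# Small arguments: the reduction is exact, so the two agree.
small_err = abs(naive_sin(1.0) - math.sin(1.0))

# Large arguments: the quotient is ~1.6e17, so the tiny error in TWO_PI
# gets amplified into tens of radians of reduction error.
large_err = abs(naive_sin(1e18) - math.sin(1e18))
```

This also shows why "compare against the hardware instruction" is a bad way to validate an approximation: the reference itself can be wrong by exactly this mechanism.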

I used to work for another CPU vendor, and when we implemented more accurate sin/cos functions, some benchmarks would fail us for getting the wrong result.

Turns out those benchmarks hardcode a check for a couple results, and that check is based on what Intel has been doing for ages. My recollection is that we had a switch to enable the extra accuracy, but that it wasn't enabled by default because it was too much of a risk to break compatibility with Intel.

If that sounds too risk averse, there's a lot of code out there that depends on your processor precisely matching Intel's behavior on undefined flags and other things that code has no business depending on. It's basically the same thing Raymond Chen is always talking about, but down one level of abstraction. I've seen what happens if you just implement something that matches the "spec" (the Intel manual) and do whatever you want for "undefined" cases. You get a chip that's basically useless because existing software will crash on it.


> You really don't want to use x86 hardware transcendental instructions

...unless you're doing size-optimised graphics intros, where tremendous accuracy is not essential and the stack-based nature of x87 makes for some extremely compact code:



I think the main reason why this issue hasn't been so significant is that the majority of code that uses those instructions isn't dependent on extreme accuracy - code that does would likely use their own routines and/or even arbitrary-precision arithmetic.


The problem is not so much "Intel stuff is old and broken" as it is "Intel wrote the standard and there's no process to move it forward"?


OpenGL is a great example of how painful it is to write apps that are capable of not breaking when the standard it relies on changes. You basically end up with capability checks or version checks which cause your code to become complicated.

I suppose there could've been a function that returned the implementation version of fsin/fcos/etc, which was incremented whenever Intel changed the implementation. That way benchmarks could test against that particular version. But it'd be hard to come up with a precise, consistent way to implement that. For example, do you increment the version whenever any transcendental function changes? If you do that, then you get benchmark code riddled with special cases for very specific versions, which someone has to constantly look up in documentation somewhere. You'd basically have to memorize the effect of every version increment. On the other hand, you could implement a more elaborate scheme, like having a version number for each of the transcendental functions, but then you'd either need to hardcode the number of version numbers returned by the API, or expose a way to add to the total number of version numbers in case someone wanted to add another function, which is extra API complexity.

I'm not necessarily arguing against having some kind of process by which apps could have the implementation of sin/cos/etc changed, just explaining how it gets complicated pretty quickly.


One thought on version number is that the version number could be the product of prime powers.

If function 1 is on version 5, function 2 on version 3, function 3 on version 7, and function 4 on version 1, we can encode this as 2^5 * 3^3 * 5^7 * 7^1 = 472500000. This lets us unambiguously increment orthogonal values while still keeping them represented as a single number. We could even easily add a function 5 by multiplying the number by 11.

One problem is that it's not super dense (storing those 4 values takes ~29 bits), but a 128-bit or 256-bit number would store a larger value than is likely needed by most applications.

One benefit is that software can check for the functions that it's interested in (as long as we can agree on what prime corresponds to what function - a relatively mild standard, if we just use order of introduction) while ignoring any portion of the result it doesn't understand.
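For the curious, here's a hedged Python sketch of this encoding (the function names and API are mine, just for illustration):

```python
def first_primes(n):
    """First n primes by trial division; fine for small n."""
    found = []
    candidate = 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found

def encode(versions):
    """versions[i] is function i's version; encode as a product of prime powers."""
    code = 1
    for prime, version in zip(first_primes(len(versions)), versions):
        code *= prime ** version
    return code

def decode(code, nfuncs):
    """Recover each function's version by repeated division."""
    versions = []
    for prime in first_primes(nfuncs):
        v = 0
        while code % prime == 0:
            code //= prime
            v += 1
        versions.append(v)
    return versions

# The example above: versions 5, 3, 7, 1 -> 2^5 * 3^3 * 5^7 * 7^1
assert encode([5, 3, 7, 1]) == 472500000
assert decode(472500000, 4) == [5, 3, 7, 1]
```

Decoding works even if the reader knows about fewer functions than the producer: leftover factors from higher primes are simply ignored, which is the "ignore any portion of the result it doesn't understand" property.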


> Storing those 4 values takes ~29 bits.

Just storing those 4 values in 4 bits is more efficient.

Checking that a bit is set is also a simple AND + CMP. Which beats out a DIV + CMP.

Sorry, I just hadn't considered that this isn't common knowledge outside C or C++ land.


> Just storing those 4 values in 4 bits is more efficient.

That doesn't really work for version numbers > 1. They're not just flags.


How do you store a version number in a single bit?


Well, bit n represents version n: if that bit is set to 1, the version is supported; otherwise it's not.


A few bits per value (depending how big you predict they will get). One bit per value means you can only increment version of each function once :)


That's a crazy scheme on a computer.

Just store, say, a byte per value, and concat them into your 128 bit number.
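A sketch of that byte-per-function layout (my naming; nothing standard here) shows both the packing and the cheap shift-and-mask lookup that the prime scheme lacks:

```python
def pack(versions):
    """One byte per function; function 0 lives in the low byte."""
    packed = 0
    for i, v in enumerate(versions):
        if not 0 <= v < 256:
            raise ValueError("version must fit in a byte")
        packed |= v << (8 * i)
    return packed

def version_of(packed, i):
    """Read function i's version with a shift and a mask (no division)."""
    return (packed >> (8 * i)) & 0xFF

# The running example: versions 5, 3, 7, 1 take 32 bits here versus ~29
# bits for the prime encoding, but lookups are trivial.
packed = pack([5, 3, 7, 1])
```

An unknown function index just reads as version 0, so readers can safely ignore functions they don't know about.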


It's easy to come up with much more compact schemes that are still completely generic. For example, to represent a variable length number use xxx0, 0xxxxxx1, 0xxxxxx1xxxxxx1, etc. Normally you need a pair of these to specify (version,function). But if we assume functions are packed, you can have a run-length scheme (run-length,starting-function-number).

So "function 1 is on version 5, function 2 on version 3, function 3 on version 7, and function 4 on version 1" is four functions starting with 1 = (1000,0010),1010,0110,1110,0010 = 24 bits.

It gets better quick with larger numbers of functions.

BTW, the size of the prime scheme is log2(nthprime(function))*version bits for each function. If you don't know ahead of time which functions there might be, then you have to do a prime factorization, which is a hard problem. I guess if you used really large numbers you could have a cryptographically secure way of indicating which versions of which functions are in your library.


This is called Godel numbering (http://en.wikipedia.org/wiki/Godel_numbering).


One problem is that it's not super dense...

If you start with version 0 (and why not?), I think the Fundamental Theorem of Arithmetic implies maximum possible density?


You have maximum density for version 0, but when versions get sufficiently high relative to the number of functions, there are sequences of bits with no valid interpretation, because they represent primes (or multiples of primes) greater than the number of functions. Also, increasing the versions of functions represented by larger primes requires more bits than increasing the versions of functions represented by smaller ones.

I think I would just represent the version of each function with a byte.


D'oh! Yes, any possible library contains only a finite number of functions to version. Even if we order the functions in decreasing order of update frequency, this scheme will overflow quickly if more than one or two functions update regularly.


Never mind; that's space efficient for 0s, but only for 0 values. You're probably better off with something easy to decode, i.e., 2 bytes per function = 50,000 functions in 100 KB, which is hardly an issue. And if you really need more than 65,000 versions of the same function, you can always deprecate it. And if you're really only tracking 10 functions, then that's still only 20 bytes, which is hardly an issue.

55th prime > 256. So incrementing the version number adds more than one full byte to your number.


I think the suggestion was to multiply the powers of primes. e.g.

fn 2 @ v1, fn 3 @ v1 = 2^1 * 3^1 = 6

fn 2 @ v2, fn 3 @ v1, fn 5 @ v3 = 2^2 * 3^1 * 5^3 = 2 * 2 * 3 * 5 * 5 * 5 = 1500


I guess that might be useful if the vast majority of things are on version 0. But it quickly gets out of hand.

EX: Multiplying by 2 adds a full bit to your number every time you increment it. And it just gets worse from there.

If you really want to store mostly zeros, you're probably better off just using a count of the functions > 0 and then either a bit for each function to show it's > 0, or an index if things are sparse.


It's certainly true that it has a nasty growth curve (exponential in version number; probably worse in number of active terms).

I just think it's fun for low-value things, and interesting if you're in the situation where you have a huge feature set relative to the number of active features, since it incurs no penalty for fields that are 0. Any time you have ~1% of features present, it can make a difference. Example: if you have about 5 of 500 fields with 16 possible values represented, storing all the 0s is expensive. Using a naive 4-bits-per-field approach, you end up with 2000 bits, whereas, using my method, you only use ~620. Even using a 500-bit number to flag which fields are present and then listing just those fields in order only saves you ~100 bits over my method.

Plus, I manage to spark off a discussion about math and the problem, rather than just whining about it being hard, which I consider to be a good thing.


What's wrong with indexes? 500 fields needs a 10-bit index (2^10 = 1024); 10 bits * 5 = 50 bits, then 4 bits * 5 = 20 bits of values, for 70 bits total.

Or just a bit flag for active (500bits) then 4 bits * 5 for your count = 520 bits.

And if you really want to compress things start with a count so the 500 bit flag is only used if your count is over 50.

PS: in your case you also need an index of the number of bits/bytes used.

Edit: If there's a reasonable chance your counts are greater than 1, just appending one bit per count vs. using higher exponents saves space, i.e., for 3 numbers, append 100101 = first 1, then 3, then 2.


I love that this has a big fat "reproduce the results" button with detailed build instructions.

Why isn't reproducibility required to publish in CS? Unlike in fields like psychology or chemistry, reproducing results should be trivial if the authors provide instructions on how to do it.


In the natural sciences, reproducibility means that a separate team run an experiment on independent apparatus, configured from written information (e.g., the paper). Not that you re-run the experiment on the original apparatus set up by and made available by the original authors. Clicking a button is not "reproducibility" in that sense, though it is better than not being able to do even that. Shipping code/VMs/etc. and having a 2nd team just re-run it has too much of the original lab in it to be real replication; it's more like just inspection of the 1st setup. Which is better than no inspection, but worse than independent reproduction of the results.

Applied to CS, there's really no way around it: reproduction requires that researchers claiming to replicate a result implement it independently. If there are not two independently produced implementations that both confirm the result, the research hasn't been reproduced. The process of doing so helps to discover cases where the original results were due to idiosyncrasies of the original implementation, test setup, etc. It'd even be ideal if replication were done in as dissimilar a setup as possible, to find cases where the results unexpectedly depend on details of the original setup not thought to be important.

If anything I think there are quite some dangers that reproduction will be decreased by the current trend towards what's questionably called "reproducible research" as a euphemism for "code reuse". If people reuse code rather than doing their own independent reimplementation of methods as stated in papers, erroneous results can lurk for years and infect other research as well. (I think code reuse is good for engineering practicality, and making permissively licensed code available is also a good way of getting researchy methods out of academia into the real world. But I think it is quite wrong to call reusing the original researcher's setup, rather than independently producing your own, "reproduction" in the scientific sense.)


The necessity of truly independent reproduction diminishes when the researchers/engineers can supply an implementation that does real work in real situations. A very high bar of self-evidence, if you will.

True reproduction remains necessary to confirm that the stated underlying science is the cause, and not some other variable that might be legitimate and remarkable but undocumented.


Providing an implementation of a network protocol is hardly doing "real work in real situations". I mean, it ships the packets from here to there but so does every other implementation of every other protocol.

The thing that is interesting or novel about this protocol as opposed to the thousands of other proposals is that it's supposed to be faster than the other proposals for a broad range of applications with a broad range of link topologies and across a broad range of software/hardware platforms.

Of course, to establish that claim replication is completely critical.


> Providing an implementation of a network protocol is hardly doing "real work in real situations".

In what sense is a functional implementation not able to do real work?


Completely agree - any work on 3D algorithms seems particularly prone to this (presumably due to funding sources?)

I once went so far as to email the author of a paper to ask for their implementation of a Minkowski 3D hull algorithm, and they point blank refused. Now I don't believe I'm entitled to this of course, it's just strange you can claim something in a paper and for it to not be reproducible.

Oddly I think it's the triviality of sharing implementations that make them more guarded.


Don't discount embarrassment at the state of research-quality code!




Cutting costs isn't IBM's strategy to get out of the recession. Cutting costs is IBM's strategy. I interned there in 2003, and it was already apparent that they preferred to hire people overseas and let attrition reduce the ranks of folks locally in order to reduce costs. From all the complaints, it was clear at the time that was causing serious problems with R&D.

The joke around the office was that we'd replace one person locally with three people overseas, losing the output of two people, since the three overseas people were so clueless they'd need a local person holding their hands full-time. I was doing microprocessor design at the time, and the community is small enough that everyone knew that Intel was getting good folks overseas. But even at the reduced wages outside the U.S., IBM was cutting corners and not hiring the best people.

Even locally, they try to reduce costs. A friend of mine who stuck around long enough to make it into management told me that they try to keep salaries at about the 40%-ile to save costs. On finding that out (as well as a few other gems), he left. Until I heard that, I couldn't figure out why brilliant friends of mine often got raises that didn't even cover inflation. Don't they know that people are going to leave because of that? They know, and that's their strategy.

The thing about Bernanke comes out of left field. IBM didn't use low interest rates to invest, therefore "the companies that were expected to spend us back to better economic health didn’t do so", therefore low interest rates didn't make the recession less severe than it otherwise would have been? No comment on whether or not those last two statements are actually true (I'm not an economist and haven't studied the issue), but Cringely certainly doesn't make a case for them.


My wife worked at IBM for many years and every year, like clockwork, right before the quarter was up, they'd shut down the power to entire IBM sites for a day or two to save money. Everyone would be told to work from home. That's how important cost-cutting is to IBM.


Pound-wise, penny-foolish, no? (Or is it penny-wise, pound-foolish?)

If productivity doesn't drop, why not telecommute 50%?


I interned there too in 2011 and can testify that employee morale is ridiculously low. No one loves the company, the only people who were working hard were the lifers who'd joined after college and were still there at 40 or 50.


This is why I left IBM (SWG) a few months ago. When I turned in my notice after 1.5 years at IBM, people, including very senior people and managers, cheered me on saying things like "I wish I could leave" or "I put my notice in 6 months ago but they gave me a huge raise...".

When the financial gimmicks run out, IBM is absolutely screwed because they've been cannibalizing their most valuable asset for short term shareholder value.


I know a bunch of people only hanging out because they think they'll get laid off anyways and IBM severance + unemployment isn't so bad.


Isn't it easier to look for a job when you're not competing with 8,000 other laid off people?


IBM is a slow trickle kind of lay-off place. So there are always lay-offs happening.


Perhaps, but if you've been at IBM for 10 years, 6 months of doing nothing but collecting your pay check probably sounds really good.


A good talk on one of IBM's big problems was submitted this morning but gained no traction. I wish HN'ers would learn to identify big names in finance so this stuff wouldn't be missed.



It's a good point. But he's making the same point as Stockman (see below), and Cringely's story links to that clip as well. https://news.ycombinator.com/item?id=8085628


Yes, but Druckenmiller "broke the Bank of England"


When he speaks, he's probably worth listening to.


Point taken. Thanks for the link.


I was a manager at IBM at the beginning of off-shoring. I too left because I disagreed with the model, in principle. However, the statistics did show that the company could save ~60% for the same quality of service, immediately. What was unclear and highly debated at the time was how sustainable that saving was and how developing economies would impact those decisions.


>IBM didn't use low interest rates to invest therefore "the companies that were expected to spend us back to better economic health didn’t do so" therefore low interest rates didn't make the recession less severe than it otherwise would have been?

Investing in your own shares doesn't lead to growth like capital investment does; it's just more money at rest. Low interest rates eased the recession by propping up the stock market, i.e., through 'trickle-down'. That's why the stock market can be raging while the larger economy is in the crapper.

So low interest rates can simultaneously make the recession less severe than it otherwise would have been, yet make the growth potential of the economy worse.


