Right now? I'm pretty unhappy. I don't think I want to talk about exactly why publicly while I'm still in this job, but to give you an idea of how bad the situation is, literally all of my friends have been trying to get me to quit for months, with the exception of people who have given up because they think I must be insane.
On the other hand, I have a decade of full-time experience and I've been happy for about seven out of ten years. All things considered, that's not too bad. The other way to look at it is that I've had maybe five roles at one company, two at another, and one at a third, and I'd say four of those have been good. That's only 4/8, but it's possible to bail on bad roles and stay in good ones, which is how it's worked out to being good 70% of the time. Considering how other folks I know feel about their jobs, I can't complain about being happy 70% of the time.
In retrospect, some of my decisions have been really bad. If I could do it over again, I'd bail more quickly on bad roles and stay in good ones for longer.
My dumbest mistake was the time I was in an amazing position (great manager & team, really interesting & impactful work), except for two problems: an incredibly arrogant and disruptive person whose net productivity was close to zero and who would derail all meetings, and weird political shenanigans way above my pay grade. When I asked to transfer, management offered to move the guy to another team so I'd stay, and I declined because I felt bad about the idea of kicking someone off the team.
From what I've heard, the problematic dude ended up leaving the team later anyway, so not having him kicked off didn't make any difference, and the political stuff resolved itself around the same time. The next role I ended up in was the worst job I've ever had. And the one after that is my current job, which is, well, at least it's not the worst job I've ever had. Prior to leaving the amazing job, I thought that it was really easy to find great jobs, so it wasn't a big deal to just go find another one. Turns out it's not always so easy :-). If I hadn't bailed on that and just fixed it, I'd be 4/6 and I could say I was happy with my job 80% of the time. Oh well, lesson learned. Looking back, I was incredibly lucky to get the roles that I did, but that same luck blinded me to the fact that it was luck and that there are some really bad jobs out there.
There is no chicken and egg problem for most workloads. Processors are quite good at handling correctly predicted branches, and overflow checks will be correctly predicted for basically all reasonable code. In the case where the branch is incorrectly predicted (because of an overflow), you likely don't care about performance anyway.
See http://danluu.com/integer-overflow/ for a quick and dirty benchmark (which shows a penalty of less than 1% for a randomly selected integer heavy workload, when using proper compiler support -- unfortunately, most people implement this incorrectly), or try it yourself.
People often overestimate the cost of overflow checking by running a microbenchmark that consists of a loop over some additions. You'll see a noticeable slowdown in that case, but it turns out there aren't many real workloads that closely resemble doing nothing but looping over addition, and the workloads with similar characteristics are mostly in code where people don't care about security anyway.
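For illustration, here's a minimal sketch (in Rust, using the standard library's explicit `checked_add`) of the sort of add-in-a-loop microbenchmark being described — a workload that is nothing but additions, so the per-add overflow branch is maximally visible:

```rust
fn main() {
    // Nothing-but-additions loop: the worst case for checked arithmetic,
    // since every iteration pays for an overflow branch (albeit a
    // perfectly predicted one on non-overflowing data).
    let mut sum: u64 = 0;
    for i in 0..1_000_000u64 {
        sum = sum.checked_add(i).expect("overflow");
    }
    // Sum of 0..1_000_000 (exclusive) is 999_999 * 1_000_000 / 2.
    assert_eq!(sum, 499_999_500_000);
    println!("{}", sum);
}
```

Real workloads interleave additions with loads, stores, and other work, which is why the measured penalty there tends toward the noise floor rather than the microbenchmark number.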
People who are actually implementing new languages disagree. Look at the hoops Rust is jumping through (partially) because they don't feel comfortable with the performance penalty of default integer overflow checks: https://github.com/rust-lang/rfcs/pull/146
TL;DR: There exists a compiler flag that controls whether or not arithmetic operations are dynamically checked, and if this flag is present then overflow will result in a panic. This flag is typically present in "debug mode" binaries and typically absent in "release mode" binaries. In the absence of this flag overflow is defined to wrap (there exist types that are guaranteed to wrap regardless of whether this compiler flag is set), and the language spec reserves the right to make arithmetic operations unconditionally checked in the future if the performance cost can be ameliorated.
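To make the summary above concrete, here's a small Rust sketch of the three behaviors it describes: guaranteed-wrapping operations, explicitly checked operations, and the mode-dependent behavior of plain `+` (all of these are standard library methods):

```rust
use std::num::Wrapping;

fn main() {
    let x: u8 = 255;
    // Guaranteed to wrap regardless of compiler flags:
    assert_eq!(x.wrapping_add(1), 0);
    assert_eq!((Wrapping(x) + Wrapping(1u8)).0, 0);
    // Explicitly checked: overflow is reported, not silent:
    assert_eq!(x.checked_add(1), None);
    assert_eq!(x.checked_add(0), Some(255));
    // Plain `x + 1` is the mode-dependent case: it panics when debug
    // assertions are on (the debug-build default) and wraps otherwise.
}
```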
I'm hopeful that as an industry we're making baby steps forward. Rust clearly wants to use checked arithmetic in the future; Swift uses checked arithmetic by default; C++ should have better support for checked arithmetic in the next language revision. All of these languages make heavy use of LLVM so at the very least we should see effort on behalf of the backend to reduce the cost of checked arithmetic in the future, which should hopefully provide additional momentum even in the potential absence of dedicated hardware support.
If you read the thread, you'll see that the person who actually benchmarked things agrees: someone implemented integer overflow checks and found that the performance penalty was low, except for microbenchmarks.
If you click through to the RISC-V mailing list linked to elsewhere in this discussion, you'll see that the C++17 standard library is planning on doing checked integer operations by default. If that's not a "performance-focused language", I don't know what is.
> the C++17 standard library is planning on doing checked
> integer operations by default
In C++, wrapping due to overflow can trivially cause memory-unsafe behavior, so it's a pragmatic decision to trade off runtime performance for improved security. However, Rust already has enough safety mechanisms in place that integer overflow isn't a memory safety concern, so the tradeoff is less clear-cut.
Note that the Rust developers want arithmetic to be checked, they're just waiting for hardware to catch up to their liking. The Rust "specification" at the moment reserves the right to dynamically check for overflow in lieu of wrapping (Rust has long since provided types that are guaranteed to wrap for those occasions where you need that behavior).
> someone implemented integer overflow checks and found
> that the performance penalty was low, except for
I was part of that conversation back then, and the results that I saw showed the opposite: the overhead was only something like 1% in microbenchmarks, but around 10% in larger programs. (I don't have a link on hand; you'll have to take this as hearsay for the moment.)
The benchmark I see says up to 5% in non-microbenchmarks. A 5% performance penalty is not low enough to be acceptable as the default for a performance-focused language. If you could make your processor 5% faster with a simple change, why wouldn't you do it?
Even if the performance penalty were nonexistent in reality, the fact is that people are making decisions which are bad for security because they perceive a problem, and adding integer overflow traps would fix that.
As someone who's spent the majority of their working life designing CPUs (and the rest designing hardware accelerators for applications where CPUs and GPUs aren't fast enough), I find that when people say something like "If you could make your processor 5% faster with a simple change, why wouldn't you do it?", what's really meant is "if, on certain 90%-ile or 99%-ile best case real-world workloads, you could get a 5% performance improvement for a significant expenditure of effort and your choice of a legacy penalty in the ISA for eternity or a fragmented ISA, why wouldn't you do it?"
And the answer is that there's a tradeoff. All of the no-brainer tradeoffs were picked clean decades ago, so all we're left with are the ones that aren't obvious wins. In general, if you look at a field and wonder why almost no one has done this super obvious thing for decades, maybe consider that it might not be so obvious after all. As zurn mentioned, there are actually a lot of places where you could get 5% and it doesn't seem worth it. I've worked at two software companies that are large enough to politely ask Intel for new features and instructions; checked overflow isn't even in the top 10 list of priorities, and possibly not even in the top 100.
In the thread you linked to, the penalty is observed to be between 1% and 5%, and even on integer heavy workloads, the penalty can be less than 1%, as demonstrated by the benchmark linked to above. Somehow, this has resulted in the question "If you could make your processor 5% faster ...". But you're not making your processor 5% faster across the board! That's a completely different question, even if you totally ignore the cost of adding the check, which you are.
To turn the question around: if people aren't willing to pay between 0% and 5% for the extra security provided, why should hardware manufacturers implement the feature? When I look at most code, there's not just a 5% penalty, but a one to two order of magnitude penalty over what could be done in the limit with proper optimization. People pay those penalties all the time because they think the tradeoff is worth it. And here, we're talking about a penalty that might be 1% or 2% on average (keep in mind that many workloads aren't integer heavy) that you don't think is worth paying. What makes you think that people who don't care enough about security to pay that kind of performance penalty would pay extra for a microprocessor that has this fancy feature you want?
> people aren't willing to pay between 0% and 5% for the extra security provided
This is not true. One problem is that language implementations are imperfect and may have much higher overhead than necessary. An even bigger problem is that defaults matter. Most users of a language don't consider integer overflow at all. They trust the language designers to make the default decision for them. I believe that most people would certainly choose overflow checks if they had a perfect implementation available, and perfect knowledge of the security and reliability implications (i.e. knowledge of all the future bugs that would result from overflow in their code), and carefully considered it and weighed all the options, but they don't even think about it. And they shouldn't have to!
For a language designer, considerations are different. Default integer overflow checks will hurt their benchmark scores (especially early in development when these things are set in stone while the implementation is still unoptimized), and benchmarks influence language adoption. So they choose the fast way. Similarly with hardware designers like you. Everyone is locally making decisions which are good for them, but the overall outcome is bad.
> if people aren't willing to pay between 0% and 5% for
> the extra security provided
In the context of Rust, integer overflow checks provide much less utility because Rust already has to perform static and dynamic checks to ensure that integers are used properly, regardless of whether they've ever overflowed (e.g. indexing into an array is a checked operation in Rust). So as you say, there's a tradeoff. :) And as I say elsewhere in here, the Rust devs are eagerly waiting for checked overflow in hardware to prove itself so that they can make it the default and do away with the current compromise solution (which is checked ops in debug builds, unchecked ops in release builds).
There are areas where you could make a typical current processor "up to 5%" faster in exchange for dumping various determinism features provided in hardware that are conducive to software robustness in the same way as checked arithmetic. For example, the Alpha had imprecise exceptions and weak memory ordering. The consensus seems to be against this kind of tradeoff.
This RFC was the result of a long discussion that took place in many forums over the course of several years, so it's tricky to summarize. Here's my attempt:
1. Memory safety is Rust's number one priority, and if this were a memory safety concern then Rust's hands would be tied and it would be forced to use checked arithmetic just as it is forced to use checked indexing. However, due to a combination of all of Rust's other safety mechanisms, integer overflow can't result in memory unsafety (because if it could, then that would mean that there exists some integer value that can be used directly to cause memory unsafety, and that would be considered a bug that needs to be fixed anyway).
2. However, integer overflow is still obviously a significant cause of semantic errors, so checked ops are desirable due to helping assure the correctness of your programs. All else equal, having checked ops by default would be a good idea.
3. However however, performance is Rust's next highest priority after safety, and the results of using checked operations by default are maddeningly inconclusive. For some workloads they are no more than timing noise; for other workloads they can effectively halve performance due to causing cascading optimization failures in the backend. Accusations of faulty methodology are thrown around and the phrase "unrepresentative workload" has its day in the sun.
4. So ultimately a compromise is required, a new knob to fiddle with, as is so often the case with systems programming languages where there's nobody left to pass the buck to (and you at last empathize with how C++ got to be the way it is today). And there's a million different ways to design the knob (check only within this scope, check only when using this operator, check only when using this type, check only when using this compiler flag). In Rust's case, it already had a feature called "debug assertions" which are special assertions that can be toggled on and off with a compiler flag (and typically only enabled while debugging), so in lieu of adding any new features to the language it simply made arithmetic ops use debug assertions to check for overflow.
So in today's Rust, if you compile using Cargo, by default you will build a "debug" binary which enables checked arithmetic. If you pass Cargo the `--release` flag, in addition to turning on optimizations it will disable debug assertions and hence disable checked arithmetic. (Though as I say repeatedly elsewhere, Rust reserves the right to make arithmetic unconditionally checked in the future if someone can convincingly prove that their performance impact is small enough to tolerate.)
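A program can observe which default it got, since checked arithmetic rides on the same debug-assertions switch. Here's a small sketch using `cfg!(debug_assertions)`, plus `overflowing_add`, which behaves identically in both build modes:

```rust
fn main() {
    // overflowing_add always returns the wrapped value plus an overflow
    // flag, independent of build mode:
    assert_eq!(255u8.overflowing_add(1), (0, true));

    // cfg!(debug_assertions) reports which default plain `+` got:
    if cfg!(debug_assertions) {
        println!("debug build: 255u8 + 1 would panic");   // `cargo build`
    } else {
        println!("release build: 255u8 + 1 wraps to 0");  // `cargo build --release`
    }
}
```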
There isn't as strong a need for ASan in Rust because so little code is unsafe. Most of the time, the only reason you drop down to unsafe code is because you're trying to do something compilers are bad at tracking (or that is a pain in the neck to encode to a compiler). It's usually quite well-contained, as well.
You can already work with uninitialized memory, allocate and free memory, and index into arrays in Safe Rust without concern (with everything but indexing statically validated).
IMHO the kind of stuff `unsafe` is used for is very conducive to aggressive automated testing.
This is timely, as I’ve just gone through the worst onboarding I've ever experienced. It took two very long days to install an OS on a machine and get it to boot (I had to assemble my computer and then use the corporate installer, which failed multiple times with errors like “an unknown error occurred”). It then took me about a week to run the local project’s “hello world” (I was initially pointed to documentation that had been deprecated for months, and then when I found the right documentation it was incomplete and out of date). The actual process to get there was hilariously baroque; for example (if my email records are correct and I didn’t get duplicate emails from the system), I must have clicked through 18 EULAs to get a subset of the permissions I need. I promise that’s worse than it sounds, since each form took more time to click through than you can imagine. It was a month before I could effectively do any work at all.
I originally thought that I was just unlucky, but when I asked around I found that I had an above average experience, at least among people near me who started recently. Unlike many other folks, I was actually able to get a username/alias, a computer, and an office. I don’t know what folks without a username do since they can’t even start clicking through the necessary EULAs to get permissions to do stuff, let alone do any actual work.
I’m not sure how it is in other parts of the company, but if you imagine that it’s similar and do some back of the envelope math on how many people get onboarded and how much losing a month (or more) of time costs, it comes out to N million dollars for a non-trivial N. And that’s just the direct cost of the really trivial stuff. It’s hard to calculate the cost of the higher level stuff that’s mentioned in the article, but I suspect that’s even more expensive.
I'm curious, what subset of the software development world are you working in? "Developing locally on your machine" is and has always been standard practice in every part of the industry I have encountered.
Perhaps I just don't understand what you're describing, because in my experience it is completely normal to have developers working on 20 different PCs and all merging code as they go - that's what version control systems are for! I can't really imagine how else you would do it. Do all the devs really mount a shared filesystem with a single copy of the code!?
Or do you mean that the devs all still have individual copies of the source tree, but they do all their work by remoting in to some big monster server and building/testing there? That sounds.... um.... ungodly slow. Though I suppose if you are talking about webdev, perhaps there is no build step, so it might not suck quite as much...?
It sounds like a very different world, and I'm struggling to understand what you mean.
I never worked on Windows or Office, but I did work in devdiv for a while, and at that time Microsoft certainly did develop Visual Studio in the traditional way. There were build farms, to be sure, but they were used for occasional checkpoint builds and for testing. For daily work, we ran the ol' edit/compile/link/run/crash cycle locally, on our own dev machines.
You can have your source control at the server level, with everyone logging into that server and just running an X server on their PC. And yes, you should be developing on exactly the same hardware it's going to run on.
All you need to set up a new user is a bog-standard PC and a new account on the server.
It's a lot easier to keep dev, test, and live identical than 20 or so developers' PCs.
And even when we did local development (Oracle Forms), all the code was checked in to a central server which was used to produce a daily build.
How do you think IBM mainframe development is done?
Thanks for explaining. That's certainly a very different way of working from anything I've ever encountered. I don't know enough about mainframes to even wonder how things are done, it simply isn't part of my world at all.
I come from a personal computing background, where it really doesn't make sense to talk about "developing on exactly the same hardware it's going to run on" because you cannot know in advance what that is going to be. The kind of thing you are talking about makes even less sense in the embedded world, where cross-compilation is the rule and the target machines are likely not capable of self-hosting a toolchain at all.
I actually think those subtle differences are more likely to bring problems to the front earlier. If you have code that works on 20 slightly different setups, then it is likely to be more robust than code that has only ever seen one setup.
Obviously this is quite subjective, and depends a lot on what type of development you are doing.
Who handles support for this? Is it Google, or one of the carriers that Google's using? Who do you talk to if the handoff between carriers doesn't quite work? I've skimmed the FAQ and I didn't see that listed in there.
For context, I use Google Voice, and while it's really nice in a lot of ways, it has some unfortunate bugs that have persisted for years that it seems impossible to get support for. I no longer unconditionally recommend it to people because while being able to take calls from my computer and get transcripts of voicemails is great, it also pseudo-randomly decides to only direct some calls to devices where I'm not present with no notification on devices that I'm using and also drops a small percentage of texts (my guess is around 1%). Most people I talk to are horrified by the idea of texts getting dropped, and I don't recommend GV to those folks, although it's fine if that's a tradeoff you're ok with.
I understand that it's a beta and that bugs are expected, but it's a problem when you run into a bug that interferes with your ability to use your phone and there's no way to get support for it.
Phone carriers aren't exactly known for their stellar customer service, but at least with them, if you call back enough times or escalate to higher level support, you'll eventually get a resolution. As far as I can tell, that's not true of Google's phone related products.
Their support for the Nexus line of phones has been outstanding. Anyone can phone and speak to real humans with good tech knowledge, arrange a free replacement for (first time) cracked screens, and get next day replacements.
I have no doubt that the support for this will be as good.
Now, their automated side of advertising is crappy unless you're a large spender, but then that's just economies of scale.
Basically you have to roll the dice on whether you'll have sound in voice calls on a Nexus 4 running Lollipop. 500 comments, 1200 stars, and the issue is simply closed as "wrong forum" with no other comment than "contact customer support".
I mean, I'm glad that Google supports so many camera features, as that's clearly the most important function of a Nexus 4, but at the end of the day I still expect my phone to be able to, you know, make voice calls.
If the past is any indication, Google will have really good support at first for this, similar to how it worked for Glass. Then once the product has been received well by early adopters (and received good publicity), they will layoff/downsize until everyone is unhappy.
Obviously I understand your trepidation regarding Google's support, and the proof will be in the proverbial pudding, but they at least claim they'll be providing better support than for most of their other products:
> LIVE SUPPORT AVAILABLE AROUND THE CLOCK
> If you need help, our support team is in the US and available all day, every day. Give us a call, and don't be surprised when you connect right away to a Fi Expert without pressing 0.
This is really neat and I love playing with stuff like this. But, fundamentally, it shows that stars aren't a good measure of anything besides stars, at least outside of the top scorers for commonly used languages. For those, it's not a bad proxy for fame.
Turns out I'm also the 240th most starred scala developer worldwide. I once used scala for two months and created some projects to help me learn that aren't even close to being polished enough to be useful to anyone. Like most code written by someone who's learning a language, it's not any good. But that somehow puts me at 240? Even in a pretty popular language, by the time you get into the hundreds worldwide (or the top few in most cities), it's people who just threw up some toy projects.
I wonder if this explains why I've been getting recruiters contacting me "because they saw my scala code on github". I doubt anyone who's actually seen my scala code on github would contact me for a scala position, but someone who uses a tool that counts stars might think that I actually know scala and contact me for a scala position. This particular tool is too new to be the source of that, but the page the source data comes from (github archive) shows how easy it is to make BigQuery queries to return results like this.
For Julia, I'm also presently ranked above all of the co-creators of Julia, despite having spent a total of perhaps 20 hours ever using the language (I'm 72, compared to the co-creators, who are 113, 143, and unranked).
BTW, in languages I've actually worked in professionally, I'm 98,582/244,375 in a language I used for years before it became trendy, 1,100/1,835 in a language I've used a lot recently, and 75,998/161,465 in a language I've used some recently. In the language I'm most proficient in, the language I'm most likely to reach for if I just want to get things done, I'm 14,800/25,094.
P.S. If the developer is reading this and wants bugreports, your service returns a "503 Service Unavailable" if you click the "top foo github developers in your city" for developers that don't have an associated city.
Am I the only one who thinks that this kind of design question is the new "how many gas stations are there in Manhattan?"
In both cases, the point isn't to get the right answer, but (allegedly) to see how the person thinks. With the estimation question, the trick is to make up some plausible-ish numbers and then multiply and/or add them to get a plausible-ish result. If someone's seen one of those before, they'll nail it for sure. If not, it's a crapshoot. In theory, you're measuring whether someone can reason well enough to spontaneously estimate something off the top of their head, but given how many more people have seen those questions before than can answer that type of question cold, what you're really filtering for is people who have heard of or seen Fermi questions.
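The trick is so mechanical it fits in a few lines. Every input below is made up on the spot — that's the whole point of the exercise — so treat this as a sketch of the technique, not a real answer:

```rust
fn main() {
    // Plausible-ish made-up numbers, multiplied together:
    let manhattan_population = 1_600_000.0_f64;
    let cars_per_person = 0.2;                 // guess: most residents don't drive
    let fills_per_car_per_week = 1.0;          // guess
    let fills_per_station_per_week = 1_500.0;  // guess: roughly 9 per hour

    let stations = manhattan_population * cars_per_person
        * fills_per_car_per_week
        / fills_per_station_per_week;

    println!("~{:.0} gas stations", stations);
    // A Fermi estimate only promises the right order of magnitude:
    assert!(stations > 20.0 && stations < 2_000.0);
}
```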
A while back, someone's interviewing experience got posted to HN and they mentioned that they got asked to design a URL shortening service maybe five times. They failed it the first couple of times and got progressively better each time. I doubt the interviewers meant to measure whether or not this person had done enough interviews to catch on to the style of questions that are currently trendy, but that's what they actually measured.
I think the most common objection to this is that the point of these questions is to drill down and figure out how the person really thinks, but empirical studies on interviewing find that the actual evaluation of chatty questions like this is heavily influenced by all kinds of biases. Techniques that have more clear-cut evaluation criteria, like work sample tests, and even completely non-technical interviews like behavioral interviews end up being better filters. Even then, the filtering isn't "good" (IIRC, the last time I read one of those studies, work sample tests scored the best, and had a correlation of around .5 with an ideal filter), but it's better.
For these kinds of questions, even if you haven't seen the specific question before, there's all sorts of interview gamesmanship that helps tremendously. One thing in the blog post, not spending an hour on requirements gathering, is an example of that. In real life, if you're going to write an application from scratch, spending more than an hour on requirements gathering is perfectly reasonable for a lot of problem domains. But if you're playing the interview game, you know that you have, at most, an hour to sketch out the entire problem, so you have to cut the requirements gathering phase short (but not too short). The post mentions that this gets at real skills. That's true. It does. The problem is that everyone who's seen this kind of question five times is going to be good enough at the interview gamesmanship that they don't need to have good real skills to breeze through the "don't spend too long talking about requirements" sub-filter.
The paradox is that the best programmers tend to be the worst at interviewing (because they usually get hired quickly) and the worst programmers gradually get better at interviewing (after doing lots of them).
The difference here is that the Monopoly question is asking the candidate to reason about how to solve the problem with a computer program. The author admits that there can be biases, but isn't that the case even with work sample tests? Someone could have a novel solution to a problem, and the reviewer could say "oh, that's too clever to maintain."
I think it's a fine question. You are asked to design a program that would do something sufficiently complex and ill specified, which is something you will encounter on the job.
I am not saying that you shouldn't test the coding abilities of your candidate, but it looks to me like a good way to see how he/she reasons about the various components of the system and asks for requirements.
“Most notably, it does appear that strong typing is modestly better than weak typing, and among functional languages, static typing is also somewhat better than dynamic typing. We also find that functional languages are somewhat better than procedural languages.”
But how did they determine that?
They then looked at commit/PR logs to figure out how many bugs there were for each language used. As far as I can tell, open issues with no associated fix don’t count towards the bug count. Only commits that are detected by their keyword search technique were counted.
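As a sketch of what such a keyword filter looks like (the keyword list here is my hypothetical example, not necessarily the paper's), and of why it counts labeled fixes rather than defects:

```rust
// Hypothetical bug-fix detector in the spirit of a commit-log keyword
// search; the keyword list is my guess, not the paper's actual list.
fn looks_like_bugfix(commit_message: &str) -> bool {
    const KEYWORDS: [&str; 5] = ["fix", "bug", "error", "fault", "defect"];
    let msg = commit_message.to_lowercase();
    KEYWORDS.iter().any(|k| msg.contains(k))
}

fn main() {
    assert!(looks_like_bugfix("Fix off-by-one in parser"));
    assert!(!looks_like_bugfix("Add README"));
    // The obvious failure mode: a fix that isn't labeled as one is
    // invisible, so a project that rarely labels its fixes (or rarely
    // finds its bugs) looks artificially "reliable".
    assert!(!looks_like_bugfix("Handle negative inputs correctly"));
}
```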
After determining the number of bugs, the authors ran a regression, controlling for project age, number of developers, number of commits, and lines of code.
That gives them a table (covered in RQ1) that correlates language to defect rate. There are a number of logical leaps here that I’m somewhat skeptical of. I might believe them if the result is plausible, but a number of the results in their table are odd.
The table “shows” that Perl and Ruby are as reliable as each other and significantly more reliable than Erlang and Java (which are also equally reliable), which are significantly more reliable than Python, PHP, and C (which are similarly reliable), and that TypeScript is the safest language surveyed.
They then aggregate all of that data to get to their conclusion.
I find the data pretty interesting. There are lots of curious questions here, like why are there more defects in Erlang and Java than Perl and Ruby? The interpretation they seem to come to from their abstract and conclusion is that this intermediate data says something about the languages themselves and their properties. It strikes me as more likely that this data says something about community norms (or that it's just noise), but they don’t really dig into that.
For example, if you applied this methodology to the hardware companies I’m familiar with, you’d find that Verilog is basically the worst language ever (perhaps true, but not for this reason). I remember hitting bug (and fix) #10k on a project. Was that because we had sloppy coders or a terrible language that caused a ton of bugs? No, we were just obsessive about finding bugs and documenting every fix. We had more verification people than designers (and unlike at a lot of software companies, test and verification folks are first class citizens), and the machines in our server farm spent the majority of their time generating and running tests (1000 machines at a 100-person company). You’ll find a lot of bugs if you run test software that’s more sophisticated than Quickcheck on 1000 machines for years on end.
If I had to guess, I would bet that Erlang is “more defect prone” than Perl and Ruby not because the language is defect prone, but because the culture is prone to finding defects. That’s something that would be super interesting to try to tease out of the data, but I don't think that can be done just from github data.
You've done a good job of pointing out weaknesses in the paper, but I have a question about your defect rate argument: Would you hold the same position if the study had said the opposite? In other words, if it claimed that Ruby and Perl had more bug fixes (and thus defects) than Erlang and Java, would you claim the study was flawed due to a culture of meticulous bug-finding in Perl and Ruby?
From Conservation of Expected Evidence:
> If you try to weaken the counterevidence of a possible "abnormal" observation, you can only do it by weakening the support of a "normal" observation, to a precisely equal and opposite degree.
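That principle can be written as a one-line identity (my sketch of the standard statement): the prior must equal the expectation of the posterior over the possible observations,

```latex
P(H) \;=\; \sum_{e} P(e)\, P(H \mid e)
```

so if some observation $e$ would raise $P(H \mid e)$ above the prior, the remaining observations must lower it by a compensating amount.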
It really seems like a stretch to say that higher bug fix counts aren't due to higher defect rates, or that higher defect rates are a sign of a better language. Language communities have a ton of overlap, so it seems unlikely that language-specific cultures can diverge enough to drastically affect their propensity to find bugs.
I'd love to see a methodology similar to the one here analyzing concurrency bugs: http://www.cs.columbia.edu/~junfeng/09fa-e6998/papers/concur... . Akin to what's done in the social sciences, they applied simple labels to bugs in the bug repos -- a grad student and some undergrads can label a lot in a couple of weeks -- and regressed on that.
Personally, I read only the abstract at first, and was pleased to see that it confirmed my own preferences: static typing is better, strong typing is better. But then I saw this disclaimer:
> It is worth noting that these modest effects arising from language design are overwhelmingly dominated by the process factors such as project size, team size, and commit size.
So I dismissed the paper as "not conclusive", at least for the time being. I wouldn't be surprised if their findings were mostly noise, or driven by a confounding factor they missed.
By the way, I recall another paper saying that code size is the single most significant factor for predicting nearly everything else. That would mean that more concise and expressive languages, which yield smaller programs, also reduce time to completion as well as the bug rate. But if their study corrects for project size while ignoring the problems being solved, then it overlooks one of the most important effects a programming language has on a project: its size.
Having used MySQL in the past and vowed never to use it again, I would not be especially surprised if the MySQL code base was unusually buggy. On the other hand, high use rates could well lead to high bug discovery rates.
>> why are there more defects in Erlang and Java than Perl and Ruby?
I have no experience with Erlang, but one reason I'd expect Java to have more defects than Ruby and Perl is that Java is more verbose, i.e. it takes more code to get something done. One would naively expect to find an association between the size of commits and their propensity to contain errors.
The nice thing about the paper is that they try to put data where others put guesses, rants, expectations, and beliefs. Arguing with data is actually difficult (and rare) in the software engineering field.
The practical result is that language choice doesn't matter much, since the effects are very small.
The standout result for me, the one that led me to believe this was very likely the case, was that Erlang scored _horribly_ on concurrency bugs. To me, that makes sense: you're seeing all the bugs that come from trying to tackle tricky high-concurrency situations, which is why people picked Erlang in the first place. If people tried to tackle those exact same problems in other languages, we'd probably see them doing worse at concurrency.
Not to mention the Erlang mantra of 'let it crash'. They're tracking bugs, not severity. Someone may file an issue saying "hey, in this instance, this thing goes wrong", but because of the supervision process, it doesn't actually cause anything to break (just a logged error message and a bit of system churn). The language actively encourages you to code the happy path and address failure conditions only as necessary, relying on the supervision tree to handle errors that stem from deviating off the happy path.
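For anyone unfamiliar with the pattern: a supervisor's job is roughly "run the child, and if it crashes, restart it rather than handle every failure inline". A toy sketch of that shape in Python (this is not Erlang/OTP semantics, just an illustration; all names here are mine):

```python
def supervise(child, max_restarts=3):
    """Run child(); if it crashes, restart it instead of handling the
    failure inline ('let it crash'). After too many restarts, give up
    and re-raise, escalating to whatever supervises us."""
    restarts = 0
    while True:
        try:
            return child()
        except Exception as exc:
            restarts += 1
            if restarts > max_restarts:
                raise  # escalate to the parent supervisor
            print(f"child crashed ({exc!r}); restarting")

# Example: a child that fails twice, then succeeds.
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("boom")
    return "ok"

print(supervise(flaky))  # restarts twice, then returns "ok"
```

Under this model, each crash is just a logged line and a restart, exactly the "a bit of system churn" described above, even though every crash could show up as a filed bug.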
This is one of those things that's well known by people who spend a lot of time doing low-level nonsense that really ought to get wider press. You really don't want to use x86 hardware transcendental instructions.
I've seen people try to measure the effectiveness of their approximation by comparing against built-in hardware, but their approximation is probably better than the hardware! If you're not writing your own assembly, you'll be ok on most modern platforms if you just link with "-lm" and use whatever math library happens to be lying around as the default, but it's still possible to get caught by this. On obscure platforms, it's basically a coin flip.
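To make the failure mode concrete, here's a small Python demonstration of why limited-precision range reduction (the root of the x87 fsin problem) goes wrong for large arguments. It assumes the host libm does accurate argument reduction, which glibc and macOS do:

```python
import math

x = 1e22  # a huge but exactly-representable double

# A good libm reduces x modulo pi to full precision (Payne-Hanek style),
# so this is the correct answer: sin(1e22) ~ -0.8522008497671888
accurate = math.sin(x)

# Naive reduction with double-precision 2*pi: the rounding error in the
# constant 2*pi (a few parts in 10^16) gets multiplied by the enormous
# quotient x / (2*pi), so the reduced argument is off by hundreds of
# thousands of radians and the result is essentially garbage. The x87
# fsin instruction's 66-bit internal pi approximation fails the same
# way, just at a smaller argument threshold.
naive = math.sin(math.fmod(x, 2 * math.pi))

print(accurate, naive)
```

This is why an approximation "validated" against the hardware instruction can look worse than it really is: the reference itself is wrong.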
I used to work for another CPU vendor, and when we implemented more accurate sin/cos functions, some benchmarks would fail us for getting the wrong result.
Turns out those benchmarks hardcode a check for a couple results, and that check is based on what Intel has been doing for ages. My recollection is that we had a switch to enable the extra accuracy, but that it wasn't enabled by default because it was too much of a risk to break compatibility with Intel.
If that sounds too risk averse, there's a lot of code out there that depends on your processor precisely matching Intel's behavior on undefined flags and other things that code has no business depending on. It's basically the same thing Raymond Chen is always talking about, but down one level of abstraction. I've seen what happens if you just implement something that matches the "spec" (the Intel manual) and do whatever you want for "undefined" cases. You get a chip that's basically useless because existing software will crash on it.
I think the main reason this issue hasn't been more significant is that the majority of code using those instructions doesn't depend on extreme accuracy; code that does would likely use its own routines or even arbitrary-precision arithmetic.
OpenGL is a great example of how painful it is to write apps that keep working when the standard they rely on changes. You basically end up with capability checks or version checks, which complicate your code.
I suppose there could've been a function that returned the implementation version of fsin/fcos/etc, which was incremented whenever Intel changed the implementation. That way benchmarks could test against that particular version. But it'd be hard to come up with a precise, consistent way to implement that. For example, do you increment the version whenever any transcendental function changes? If you do that, then you get benchmark code riddled with special cases for very specific versions, which someone has to constantly look up in documentation somewhere. You'd basically have to memorize the effect of every version increment. On the other hand, you could implement a more elaborate scheme, like having a version number for each of the transcendental functions, but then you'd either need to hardcode the number of version numbers returned by the API, or expose a way to add to the total number of version numbers in case someone wanted to add another function, which is extra API complexity.
I'm not necessarily arguing against having some kind of process by which apps could have the implementation of sin/cos/etc changed, just explaining how it gets complicated pretty quickly.
One thought on the version number: it could be the product of prime powers.
If function 1 is on version 5, function 2 on version 3, function 3 on version 7, and function 4 on version 1, we can encode this as 2^5 * 3^3 * 5^7 * 7^1 = 472500000. This lets us unambiguously increment orthogonal values while still keeping them represented as a single number. We could even easily add a function 5 by multiplying the number by 11.
One problem is that it's not super dense (storing those 4 values takes ~29 bits), but a 128bit or 256bit number would store a larger value than is likely needed by most applications.
One benefit is that software can check for the functions that it's interested in (as long as we can agree on what prime corresponds to what function - a relatively mild standard, if we just use order of introduction) while ignoring any portion of the result it doesn't understand.
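As a sketch, the encoding and its decoding (by trial division, fine at these sizes) are only a few lines of Python, using the prime assignment described above:

```python
def primes(n):
    """First n primes by trial division (fine for small n)."""
    found = []
    candidate = 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found

def encode(versions):
    """versions: dict {function number (1-based): version}.
    Function k's version becomes the exponent of the k-th prime."""
    ps = primes(max(versions))
    n = 1
    for func, ver in versions.items():
        n *= ps[func - 1] ** ver
    return n

def decode(n, num_functions):
    """Recover {function: version} by factoring out each prime."""
    out = {}
    for func, p in enumerate(primes(num_functions), start=1):
        ver = 0
        while n % p == 0:
            n //= p
            ver += 1
        if ver:
            out[func] = ver
    return out

# The example from above: functions 1..4 at versions 5, 3, 7, 1.
n = encode({1: 5, 2: 3, 3: 7, 4: 1})
print(n)             # 472500000
print(decode(n, 4))  # {1: 5, 2: 3, 3: 7, 4: 1}
```

Note that `decode` is cheap only because a consumer checks specific known primes; factoring an arbitrary product without knowing the prime assignment is a different story, as discussed below.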
It's easy to come up with much more compact schemes that are still completely generic. For example, to represent a variable length number use xxx0, 0xxxxxx1, 0xxxxxx1xxxxxx1, etc. Normally you need a pair of these to specify (version,function). But if we assume functions are packed, you can have a run-length scheme (run-length,starting-function-number).
So "function 1 is on version 5, function 2 on version 3, function 3 on version 7, and function 4 on version 1" is four functions starting with 1 = (1000,0010),1010,0110,1110,0010 = 24 bits.
It gets better quick with larger numbers of functions.
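A sketch of that packed encoding in Python. I'm guessing at the exact chunking rule, each nibble carrying 3 payload bits plus a trailing continuation bit (0 = last chunk), which reproduces the 24-bit example above; the longer chunk formats in the parent comment may differ in detail:

```python
def encode_value(v):
    """Variable-length encoding: nibbles of 3 payload bits plus a
    continuation bit (1 = more nibbles follow, 0 = last nibble)."""
    groups = []
    while True:
        groups.append(v & 0b111)
        v >>= 3
        if v == 0:
            break
    groups.reverse()  # most significant group first
    return "".join(
        format(g, "03b") + ("1" if i < len(groups) - 1 else "0")
        for i, g in enumerate(groups)
    )

def encode_versions(start_function, versions):
    """Run-length scheme: (run length, starting function number),
    then each version in order."""
    return (encode_value(len(versions))
            + encode_value(start_function)
            + "".join(encode_value(v) for v in versions))

# "function 1 is on version 5, function 2 on version 3,
#  function 3 on version 7, and function 4 on version 1":
bits = encode_versions(1, [5, 3, 7, 1])
print(bits, len(bits))  # 100000101010011011100010 24
```

Versions 0-7 cost one nibble each; larger versions grow by a nibble per 3 bits, which is where this scheme pulls ahead of the prime-power one.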
BTW, the size of the prime scheme is log2(nthprime(function))*version bits for each function. If you don't know ahead of time which functions there might be, then you have to do a prime factorization, which is a hard problem. I guess if you used really large numbers you could have a cryptographically secure way of indicating which versions of which functions are in your library.
You have maximum density at version 0, but once versions get sufficiently high there are bit sequences with no valid interpretation, because they represent primes (or multiples of primes) beyond the number of functions. Also, incrementing the version of a function assigned to a larger prime costs more bits than incrementing one assigned to a smaller prime.
I think I would just represent the version of each function with a byte.
D'oh! Yes, any possible library contains only a finite number of functions to version. Even if we order the functions in decreasing order of update frequency, this scheme will overflow quickly if more than one or two functions update regularly.
NM, that's space-efficient for 0s, but only for 0 values. You're probably better off with something easy to decode, i.e., 2 bytes per function = 50,000 functions in 100 KB, which is hardly an issue. And if you really need more than 65,000 versions of the same function, you can always deprecate it. And if you're really only tracking 10 functions, then that's still only 20 bytes, which is hardly an issue.
55th prime > 256. So incrementing the version number adds more than one full byte to your number.
It's certainly true that it has a nasty growth curve (exponential in version number; probably worse in number of active terms).
I just think it's fun for low-value things, and interesting if you're in a situation where you have a huge feature set relative to the number of active features, since it incurs no penalty for fields that are 0. Any time you have roughly 1% of features present, it can make a difference. Example: if you have about 5 of 500 fields with 16 possible values represented, storing all the 0s is expensive. Using naive 4-bits-per-field, you end up with 2000 bits, whereas my method only uses ~620. Even using a 500-bit number to flag which fields are present and then listing just those fields in order only saves you about ~100 bits over my method.
Plus, I managed to spark a discussion about the math of the problem, rather than just whining about it being hard, which I consider a good thing.
In the natural sciences, reproducibility means that a separate team runs an experiment on independent apparatus, configured from written information (e.g., the paper) -- not that you re-run the experiment on the original apparatus, set up by and made available by the original authors. Clicking a button is not "reproducibility" in that sense, though it is better than not being able to do even that. Shipping code/VMs/etc. and having a second team just re-run it has too much of the original lab in it to be real replication; it's more like an inspection of the first setup. Which is better than no inspection, but worse than independent reproduction of the results.
Applied to CS, there's really no way around it: reproduction requires that researchers claiming to replicate a result implement it independently. If there are not two independently produced implementations that both confirm the result, the research hasn't been reproduced. The process of doing so helps to discover cases where the original results were due to idiosyncrasies of the original implementation, test setup, etc. It'd even be ideal if replication were done in as dissimilar a setup as possible, to find cases where the results unexpectedly depend on details of the original setup not thought to be important.
If anything, I think there's a real danger that reproduction will decrease with the current trend toward what's questionably called "reproducible research", a euphemism for "code reuse". If people reuse code rather than independently reimplementing methods as stated in papers, erroneous results can lurk for years and infect other research as well. (I think code reuse is good for engineering practicality, and making permissively licensed code available is also a good way of getting research methods out of academia into the real world. But I think it is quite wrong to call reusing the original researchers' setup, rather than independently producing your own, "reproduction" in the scientific sense.)
The necessity of truly independent reproduction diminishes when the researchers/engineers can supply an implementation that does real work in real situations. A very high bar of self-evidence, if you will.
True reproduction remains necessary to confirm that the stated underlying science is the cause, and not some other variable that might be legitimate and remarkable but undocumented.
Providing an implementation of a network protocol is hardly doing "real work in real situations". I mean, it ships the packets from here to there but so does every other implementation of every other protocol.
The thing that is interesting or novel about this protocol as opposed to the thousands of other proposals is that it's supposed to be faster than the other proposals for a broad range of applications with a broad range of link topologies and across a broad range of software/hardware platforms.
Of course, to establish that claim replication is completely critical.
Completely agree - any work on 3D algorithms seems particularly prone to this (presumably due to funding sources?)
I once went so far as to email the author of a paper to ask for their implementation of a Minkowski 3D hull algorithm, and they point-blank refused. I don't believe I'm entitled to it, of course; it's just strange that you can claim something in a paper without it being reproducible.
Oddly, I think it's the very triviality of sharing implementations that makes people guard them more.
Cutting costs isn't IBM's strategy to get out of the recession. Cutting costs is IBM's strategy. I interned there in 2003, and it was already apparent that they preferred to hire people overseas and let attrition reduce the ranks of folks locally in order to reduce costs. From all the complaints, it was clear at the time that was causing serious problems with R&D.
The joke around the office was that we'd replace one person locally with three people overseas, losing the output of two people, since the three overseas people were so clueless they'd need a local person holding their hands full-time. I was doing microprocessor design at the time, and the community is small enough that everyone knew Intel was getting good folks overseas. But even at the reduced wages outside the U.S., IBM was cutting corners and not hiring the best people.
Even locally, they try to reduce costs. A friend of mine who stuck around long enough to make it into management told me that they try to keep salaries at about the 40th percentile to save money. On finding that out (along with a few other gems), he left. Until I heard that, I couldn't figure out why brilliant friends of mine often got raises that didn't even cover inflation. Don't they know people will leave because of that? They know, and that's their strategy.
The thing about Bernanke comes out of left field. IBM didn't use low interest rates to invest, therefore "the companies that were expected to spend us back to better economic health didn't do so", therefore low interest rates didn't make the recession less severe than it otherwise would have been? No comment on whether those last two statements are actually true (I'm not an economist and haven't studied the issue), but Cringely certainly doesn't make a case for them.
My wife worked at IBM for many years and every year, like clockwork, right before the quarter was up, they'd shut down the power to entire IBM sites for a day or two to save money. Everyone would be told to work from home. That's how important cost-cutting is to IBM.
I interned there too, in 2011, and can testify that employee morale is ridiculously low. No one loves the company; the only people working hard were the lifers who'd joined after college and were still there at 40 or 50.
This is why I left IBM (SWG) a few months ago. When I turned in my notice after 1.5 years at IBM, people, including very senior people and managers, cheered me on saying things like "I wish I could leave" or "I put my notice in 6 months ago but they gave me a huge raise...".
When the financial gimmicks run out, IBM is absolutely screwed because they've been cannibalizing their most valuable asset for short term shareholder value.
I was a manager at IBM at the beginning of off-shoring. I too left, because I disagreed with the model in principle. However, the statistics did show that the company could save ~60% for the same quality of service, immediately. What was unclear, and highly debated at the time, was how sustainable that saving was and how developing economies would affect those decisions.
>IBM didn't use low interest rates to invest therefore "the companies that were expected to spend us back to better economic health didn’t do so" therefore low interest rates didn't make the recession less severe than it otherwise would have been?
Investing in your own shares doesn't lead to growth the way capital investment does; it's just more money at rest. Low interest rates eased the recession by boosting the stock market - through 'trickle-down'. That's why the stock market can be raging while the larger economy is in the crapper.
So low interest rates can simultaneously make the recession less severe than it otherwise would have been, yet make the growth potential of the economy worse.