What is a coder's worst nightmare? (2014) (quora.com)
476 points by oskarth 793 days ago | 224 comments

"Best Fit" memory management. Malloc three buffers, in order: 64K, 1K, 64K. Now free all three. Next allocate 1K. Guess where it resides in the heap? Yep: the middle! After a while, memory was fragmented with 1K detritus. Even though you were supposed to have 500K available, there wasn't room for a 64K buffer anymore. This was the default for Microsoft's C compiler, in 1990. How did I find out?
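The failure mode described above can be sketched in a toy Python simulation (not the original C runtime; this assumes a simple `(start, length)` hole list and no coalescing). Best fit always picks the smallest hole that satisfies a request, so a 1K allocation lands squarely in the 1K hole between the two 64K holes, keeping the heap permanently split:

```python
# Toy free-list simulation of best-fit allocation (not a real allocator).
# Holes are (start, length) pairs in KB; best fit returns the index of
# the smallest hole that can satisfy the request.

def best_fit(holes, size):
    """Return the index of the smallest hole >= size, or None."""
    candidates = [(length, i)
                  for i, (start, length) in enumerate(holes)
                  if length >= size]
    if not candidates:
        return None
    return min(candidates)[1]

# Heap after freeing 64K, 1K, 64K without coalescing: three holes.
holes = [(0, 64), (64, 1), (65, 64)]
i = best_fit(holes, 1)
print(holes[i])  # (64, 1) -- the 1K hole in the middle gets reused
```

First fit would have grabbed the leftmost 64K hole instead, leaving larger contiguous regions behind; best fit preserves the big holes short-term but litters the heap with tiny remainders that can never be merged.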

There was a project that was crashing into a blue screen of death after a few hours. The crashing library had #defines that went 15 levels deep. The stack dump wasn't very instructive. Even though the library ISV was based in Boston, I had to fly out to Santa Cruz to talk to the developer directly. He was somewhat belligerent: "You should have figured it out yourself!" 15 levels of #defines. He was the perfect example of the other side of Dunning-Kruger, thinking his arcane shit was obvious.

But his fix didn't solve the problem. So, I flew back to Boston. I started to trace everything with -ahem- #defines of the main APIs. It was turning into a Zen koan: the sound of one app crashing. Everything worked with the Borland compiler. It was just the MSC compiler that failed. The company was one of the top 3 ISVs back then. Meetings about what to do would last for hours. There were technical politics: "Can we use Borland?" No; the company had standardized on MS. I could talk to their engineers directly, though. But they didn't have a clue. So, I read through the docs to see what compiler switches were available. And there it was: best-fit. Fixed!

So, I wrote an analysis about why best-fit was absolutely the worst possible memory management and sent it to Microsoft. It took 8 years for MS to change the default. So, the reason why software would blow up? Best fit. Why would early versions of Windows go into the blue screen of death? Best fit. How many developers had to deal with this? Worst nightmare in aggregate.

You just brought back old memories for me. I wrote the heap manager for Borland C. I chose the next-fit algorithm for its balance of speed and reasonable fragmentation under common use cases. Best-fit performs exactly as you described.

I pulled out my old Minix (1987) book, which I keep for nostalgia. Tanenbaum on p. 202 writes:

> The simplest algorithm is first fit... A minor variation of first fit is next fit. ... Simulations by Bays (1977) show that next fit gives slightly worse performance than first fit. ... Another well-known algorithm is best fit. ... Best fit is slower than first fit.... Somewhat surprisingly, it also results in more wasted memory than first fit or next fit because it tends to fill up memory with tiny, useless holes. First fit generates larger holes on the average. ... one could think about worst fit... Simulation has shown that worst fit is not a very good idea. ...

> quick fit ... has the same disadvantage as all schemes that sort by hole size, namely, when a process terminates or is swapped out, finding its neighbors to see if a merge is possible is expensive. If merging is not done, memory will quickly fragment into a large number of small, useless holes.

Man, oh man, BC was the bomb! In a good way; maybe I should exclaim that it was the anti-bomb!

Wait, shouldn't the three buffers be merged together upon free(), if they're consecutive in memory?

What did I miss?

Typical memory managers put blocks onto a free list during free() calls in order to reuse them later. I wrote a commercial memory allocator library in the '90s, and I always coalesced consecutive blocks in free(), which limits fragmentation and speeds up allocation by avoiding long list searches. But even then, many libraries took the easy route of plain free lists.
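That coalescing idea can be sketched in a few lines of Python (a hypothetical structure for illustration, not the commercial allocator): keep the holes sorted by address, and when a block is freed, merge it with any neighbor it touches.

```python
# Sketch of coalescing on free(): holes are (start, length) pairs kept
# sorted by address. A newly freed block is merged with its predecessor
# and/or successor when they are adjacent in memory.
import bisect

def free_block(holes, start, length):
    """Insert a freed block into the hole list, merging adjacent holes."""
    i = bisect.bisect(holes, (start, length))
    # Merge with the predecessor if it ends exactly where we begin.
    if i > 0 and holes[i - 1][0] + holes[i - 1][1] == start:
        start, length = holes[i - 1][0], holes[i - 1][1] + length
        i -= 1
        holes.pop(i)
    holes.insert(i, (start, length))
    # Merge with the successor if we end exactly where it begins.
    if i + 1 < len(holes) and start + length == holes[i + 1][0]:
        holes[i] = (start, length + holes[i + 1][1])
        holes.pop(i + 1)
    return holes

holes = []
free_block(holes, 0, 64)   # free the first 64K buffer
free_block(holes, 65, 64)  # free the second 64K buffer (1K still live between)
free_block(holes, 64, 1)   # free the 1K buffer: all three merge
print(holes)  # [(0, 129)] -- one contiguous 129K hole
```

With coalescing, the 64K/1K/64K scenario from the top comment collapses back into a single large hole, and the fragmentation nightmare never materializes.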

@coldcode, was that SmartHeap? The 90's was the era of 3rd party tools for memory pooling and GuardMalloc for debugging.

No, I competed with them and basically pushed them out of the Mac OS market. But then OS X came along and mine was no longer necessary.

That would indeed be a poor memory allocator implementation. But contrary to what the parent said, it's definitely not a failure of the best-fit algorithm itself.

> the company had standardized on MS

This is a whole topic in itself. There has to be some good medium between consistency and best tool for the job.

For example, IBM determined that, despite Windows dominating Desktop/laptop market share ( https://en.wikipedia.org/wiki/Usage_share_of_operating_syste... ), Macs were cheaper to use ( http://www.businessinsider.com/apple-says-ibm-saves-270-for-... ). Now you could go further into that and ask whether OS X is the right tool for everyone's job, as I'd argue in some cases it isn't. However, I think that it shouldn't be that difficult for a company to be open to letting their employees choose the hardware, software, and other tools to get the job done. If the budget is there, then why not have the discussion, and if it isn't, allow the employee to make the case for an exception.

The same could be said for priorities and tasks. Allow your employees to make the case for what they should be working on, and allow group discussion if appropriate. You may be surprised at what good could come of it.

Not having a job is my worst nightmare. Realizing your skills are being undercut by a fresh college grad who can get enough pats on the back while skidding along with code that works. Realizing the language you cradled has become outdated, and that the engineers who have picked up the next nine years of hotness are too good for you to compete with. Having to interview for a new job and realizing that you can't pass the whiteboard questions because you have fallen behind the expectations of the young blood. That's my nightmare: becoming irrelevant.

As much as I hate to say it: it's a healthy nightmare. I have it too, though I came to development late in my career (My career was more managing technologists in IT depts, but I turned out to be good with Oracle and PostgreSQL and other development tasks and now get more requests for that... which pays as well and is more fun to boot).

When I was young, I would sometimes go into interviews where the other candidates were there in the lobby, too. Some were older and had clearly been in technology a long time... but, as you made small talk, it was plain they had never advanced past their "prime." They weren't going to get the job, nor should they have. They had allowed themselves to become as relevant as token ring.

As technologists we have to keep ourselves current. That keeping up with the kids is really part of our profession: the technology evolves quickly and so must we. But this is the part I like about technology work. There's always something next to keep things fresh. A new technology, a new architectural style, new needs, unsolvable problems that become solvable. Our profession doesn't tolerate coasting. Good.

Other professions have this same problem, just not at our speed. The proverbial buggy whip makers had to evolve with changing realities... they just had more time to see the inevitable.

(As an aside, I play poker with a dedicated COBOL developer in his mid-50s... so sometimes you can make old school pay, too :-) ).

It's nightmarish because you can't control it - software is driven by fashion. You can carefully choose a good technology, patiently master it, then find there's no demand anymore because a technology that's worse by every objective measure is getting all the blog posts and retweets this year, and all the people into it are reinventing the wheel you knew 20 or 30 years ago, all of them claiming to be innovating and inventing new stuff. Like the people who knew Smalltalk in the 90s and were poised to take software engineering to the next level but got stomped on by Java, say. Or the Perl people who were finally getting their act together with Modern Perl before they were stabbed in the back by the Perl 6 camp. There are dozens of examples. Sometimes it comes back around, but if you knew ML in the 70s there was an awfully long wait for F# to go semi-mainstream...

> Like the people who knew Smalltalk in the 90s and were poised to take software engineering to the next level but got stomped on by Java, say.

Side-tidbit: considering the way you said that it sounds like you might already know this, but some guys working on a variant of Smalltalk called Strongtalk were temporarily whisked off of that project and commissioned by Sun to write Java.

I understand that full compiler and VM source are now under the BSD license. It was last poked in '06. It probably needs cleaning up.

The code is ancient and crufty and only runs on Windows. It also only supports IA32, so no IA64 or ARM. There's been some motion towards building it on Linux; I don't know whether this got anywhere.

Strongtalk's a JIT, so it's going to be painful to port, unfortunately. And JITs have moved on a long way since then. I'd really like it to be resurrected and brought into the modern age, but in all honesty I think it's just dead.

That's really sad.

Thinking about what you said gave me an idea.

Real-world native languages fall out of use forever on a daily basis. Linguistic historians want to preserve the unique attributes of these languages, because they each tell a story and add to the character of humanity's history.

Computer languages share some of these attributes in a broad sense, but these must also be offset with the technical merit(s) of the language in question: if a language's foundational concepts have become outdated, continuing to use that language outside of a job description is a liability.

However, if a language's fundamental design is still interesting, but the low-level technologies used to implement it have fallen out of favor (for example, targeting IA32, or only functioning on an obsolete processor, etc) that language will also still become effectively "dead," and arguably unfairly so.

It could be said that there are language designers and language implementers. The designers have sufficient scope in their mental model to sweep across the domain of language design and implementation and distill that into a functioning codebase. Implementors, on the other hand, just want to fulfil a spec.

I wonder what would happen if there were a website out there listing all "dead but worthy" languages along with instructions on how to get them running? Does anything like this exist?

I'm envisaging students, language enthusiasts, etc., modernizing these old language specifications - which would allow the community to readily reassess the technical merits of the language in question. It might ultimately be demoted again, but we'd have a reasonable, modern justification of why.

I would claim the antidote to this is specializing in a domain and not in a language.

Absolutely. This is kind of what I was getting at in my comment elsewhere on this post about forms and reports. If you know the data needed by a given industry, how it originates, how it is used, who all the people/roles are and what they need and how they interact, then you can safely ride each technological fad as it happens.

I'm a grad getting into webdev, and I find this fascinating - is it really that bad? Every company I've interviewed at so far has been lenient as long as you know how to solve a problem in a somewhat relevant lang/tech.

Yes, without qualification. I've been part of the hiring process several times. Unless you're exceptional, hiring is brutal for older devs.

I would worry about it and I'm only close to 30 but I'm self employed now.

I am 44 years old. I go on an interview binge every couple of years or so, because I normally work remotely from home and I get bored with not having anyone to talk to (professionally). The last time I went on a series of interviews (in Dublin at the end of June 2015) I got about four NOs and six YESes. I think I'm a very good programmer but not an exceptional one :)

I think it's more like Uncle Bob says: the number of programmers has doubled every five years. A consequence of that is that those of us who have been in it for a long time (30 years in my case) appear to be surrounded by a sea of youngsters with no clue. Personally, I realized recently that not only do I not fear losing my job - I'm starting to find it strange when others tell me about it. (I'm re-reading Clarke Ching's Rolling Rocks Downhill and I find it harder to empathize with the protagonist than it was a few years ago.)

How do you handle the clueless? How do you handle the director of development who wrote the system the company is based on yet refuses to learn anything about relational dbs or why storing megabytes of data in the session makes it impossible to scale?

I leave. This is related to the "not being afraid of not finding a new job" :)

I have done it twice in the last five years - in both cases because I wasn't allowed to improve the quality of the code. In the first case, it was because "the client isn't paying for that, they're paying for features". In the second case, because "that code was written by the NY office and they're going to be upset if we change it".

Consider that an industry with a high churn is always looking for fresh meat. The analogy to fashion is no coincidence.

I've found a trick that works for me: if tech is less than 5 years old and not some industry standard I can safely ignore it. Once it passes the 5 year mark I'll spend time on it.

I don't find it an issue at all, but I also do follow the current technology trends, at least to the level where I can talk about them intelligently.

I have been learning new technologies pretty much continuously. It's not impossible, especially if you follow sites like Hacker News, to keep a finger on the direction of the industry, and then try to stay on top of the next new hot technology of the year.

But I hear you on the "worse" technology sometimes winning. You mentioned Java; it was worse than just about all other major contenders, and is only finally losing popularity.

On a current technology fad: React seems to be designed to ignore 40 years of accumulated software best practices. [1] Separation of concerns? Who needs that any more? And the rationale for it is that it allows teams of 100 developers to work together on an app. Open standards? Nah, how about lock-in to custom language extensions that will prevent you from migrating your code to the next web standard! Much better.

And how many app teams have 100 or more active developers? Probably fewer than a dozen, and I submit that none of them probably should. Certainly not the Facebook app: It has a lot of features, but not that many features, and yet it has a 150 MB footprint. When I hear things like that, I can't help but fill in "junior" or "mediocre" in front of "developers." React helps to prevent people from breaking each other's code when you have bloated development teams filled with junior developers. React has some cool ideas, but all told I think it's a step backward for software engineering, and certainly isn't as much of a help for small teams, especially if you want to have a CSS/SCSS/LESS expert styling your product without having to dig through JSX files, for instance.

The Java rationale was similar, IMO: You can use the Java abstractions to allow lots of mediocre developers to make contributions to a product without breaking each others' code. At least not as frequently as when they can poke right into random data structures and change things arbitrarily. If it weren't for Google's decision to require Java for Android, I think Java would be relegated to big company web backend development.

I do like React's idea of the Virtual DOM for optimization, but you can get that without using React. [2] React Native is great for using native components and driving them from JavaScript, but it's also not the only game in town. [3]
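The virtual-DOM idea itself fits in a few lines. Here is a minimal Python sketch (a hypothetical `(tag, props, children)` node format, nothing like React's actual internals): diff two trees and emit patch operations, so only the changed nodes ever touch the real DOM.

```python
# Minimal virtual-DOM diff sketch. A node is (tag, props, children);
# diffing an old and new tree yields a list of patch operations keyed
# by the child-index path, so unchanged subtrees produce no work.

def diff(old, new, path=()):
    if old == new:
        return []                          # identical subtree: nothing to do
    if old is None:
        return [("create", path, new)]
    if new is None:
        return [("remove", path)]
    if old[0] != new[0]:                   # different tag: replace subtree
        return [("replace", path, new)]
    patches = []
    if old[1] != new[1]:                   # props changed on this node
        patches.append(("set_props", path, new[1]))
    old_kids, new_kids = old[2], new[2]
    for i in range(max(len(old_kids), len(new_kids))):
        o = old_kids[i] if i < len(old_kids) else None
        n = new_kids[i] if i < len(new_kids) else None
        patches.extend(diff(o, n, path + (i,)))
    return patches

a = ("div", {}, [("span", {"class": "old"}, []), ("p", {}, [])])
b = ("div", {}, [("span", {"class": "new"}, []), ("p", {}, [])])
print(diff(a, b))  # [('set_props', (0,), {'class': 'new'})]
```

The `<p>` child compares equal and generates no patch; only the changed `<span>` produces DOM work. That is the whole optimization, and as noted above, nothing about it requires React specifically.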

Back to the original point, though: You can stay on top of the Hot New Technologies, but when there are good technical reasons to use alternate technologies, stay on top of those as well. And then explain clearly to your clients (or employers) why the current fad is a fad, and how to get the key benefits of that stack without its drawbacks. Oh, and choose clients (or employers) who will listen to strong technical arguments. :)

[1] https://www.pandastrike.com/posts/20150311-react-bad-idea

[2] https://github.com/Matt-Esch/virtual-dom

[3] https://www.nativescript.org/

> On a current technology fad: React seems to be designed to ignore 40 years of accumulated software best practices. [1] Separation of concerns? Who needs that any more? And the rationale for it is that it allows teams of 100 developers to work together on an app. Open standards? Nah, how about lock-in to custom language extensions that will prevent you from migrating your code to the next web standard! Much better.

That's quite a strawman right there. First, JSX is optional. Second, JSX is open. You may disagree about whether it's a standard or not, but you know what? If tomorrow you end up with a large codebase where you want to get rid of JSX, you just apply a JSX transpiler to your existing codebase. Problem solved. As for separation of concerns, you have a little more of a case. React does allow you to put business logic in your views... just like 99% of templating languages out there. But React does not force you to do that. You can have only the minimum amount of logic you need, and put most of your frontend business logic in your store.

It's not just about the JSX part. That's a piece of the larger problem: The fact that the HTML & CSS are being built up in JavaScript at all. This troubles me at a deep level, and no matter the approach you use, it won't likely be portable to another framework.

Templates using a "handlebars" syntax are pretty portable with minor tweaks, in general.

This happened because CSS and HTML are a totally broken model, useless for writing larger applications. They don't provide any sort of modularity / encapsulation - everything is in the open, present inside one big giant namespace with class names colliding with other class names and styles cascading through / being inherited by everything you want or don't want.

Web Components / HTML Imports kind of solve this though, but they're still not there. http://caniuse.com/#search=components

JavaScript has lexical scope, and that pretty much solves everything. CSS styles are now local variables and need to be passed around. Components are not just a dump of HTML, but opaque functions - their inner structure cannot be (accidentally) accessed by outside components. Ah - finally, a sensible programming model.

Just imagine what would be possible if CSS class names were lexical, not global strings - and you could import them into HTML (perhaps with renaming). How big of a change that would be in terms of making things easier to separate.

Well... React users got tired of imagining things :) http://glenmaddern.com/articles/css-modules

Of course Web Components solve it by reinventing all the possible wheels. Tim forbid that we get standard lexical scope and parameter passing.

Well, you're only building raw HTML for your lower-level components (in JSX or in JavaScript). The difference from standard templating is that this is considerably more composable. In this regard, it's not very different from custom directives in Angular (from what I can see; I don't do Angular). And from my point of view, having this mix is much preferable to the traditional HTML templating + jQuery soup.

You can't escape the fact that any non-trivial rendering needs some logic. If you don't let yourself get carried away, you get the exact amount of logic you need for your component in a reusable package. As someone currently porting a codebase from the dark ages of the web to React, I have difficulty expressing how nice that feels.

For me, new has to be new. I learnt Tcl/Tk back in the 90s because I could rapidly develop a GUI on X11, but then started to use Tcl as a general-purpose scripting language. Then I look at Ruby and Lua and Node.js and I think, what can any of these do that Tcl doesn't, just as well and usually better? E.g. all the async stuff in Node is old news in the Tcl world. Then I look at people who always flock to the latest thing and I think: do I want to work in a world where every project is someone's first in an immature technology, or maintenance in a dead-end technology? That doesn't sound very satisfying to me. I want to get the foundations right and build on top of them, not build on shifting sands.

Separation of concerns can be interpreted in multiple ways. You could say that the concern of one React component is to render one particular part of a view based on data. Or: "convert data to a view".

In Angular, for instance, you have the template separated from the controller or directive. But in what way does that really help you? The controller and the template are inseparable. The variables inside the template correspond directly to the variables in the scope of the controller. You cannot see what the template is doing without having the controller next to it. In React you really see how the view is built up, without any magic.

The points you make on React make me really wonder if you did your research.

Yes I've been "doing my research." I've done everything but write an entire app in React, but I only have so much time.

I don't find Angular great either, for what it's worth, so please don't think that I'm holding Angular up as a "better" solution. Both are heavy in the wrong ways. I got roped into using Angular because of its popularity, but I'm actively looking for a more solid and optimized replacement. React should be exciting to me as an option, but the more I read about it, the more wrong it feels. I just listened to a couple of the React creators on a podcast, for instance, and what they were saying about cross-platform development is just demonstrably wrong. Source: I've been writing cross-platform apps for 20 years (mobile for the last 8 years or so).

As for Angular 2.0: I'm reserving judgment on that until it stabilizes, though it looks better.

I'm personally at least partly on the fence about whether the HTML template Really Should Be Separate, to be honest. My first exposure to Polymer was at 0.5, and the way that Polymer did things seemed profoundly wrong as well. I haven't tried again with Polymer 1.0, but I feel like using Polymer and React with all of their standard idioms will make an app less flexible rather than more. And React performance on mobile simply isn't good enough for complex apps [1]; for sufficiently simple apps, why should I even care what framework I use? Even back when Zuckerberg was claiming that HTML5 apps couldn't perform at native speed, Sencha showed them that, yes, they can, if they're done right. [2]

Where I still feel uncomfortable about spreading HTML into all the components, embedded in the JavaScript that instantiates them, is that it feels entirely too much like what people did with OOP when taking it to extremes. You end up with objects calling objects calling objects calling virtual functions calling whatever...and unless you've got a debugger open and you're stepping through the code as it runs, it's almost impossible to actually reason about the code, or even to understand how it all hooks together, unless you understand the entire execution path.

With HTML templates, you can look and see the overall app layout -- what goes where, how the hierarchy is shaped, what components are being instantiated, etc. -- without having to run the app and call up a DOM tree to see what really happened. When you embed the HTML in components, and those components rely on other components, you pretty much need to run the app and step through everything to see what gets instantiated where, and then you need to dig through the code to understand why.

To me it resonates as a "bad code smell." [3] I've been cranking out code for thirty years now, so I've got a certain amount of experience with code. And I'm still willing to admit that my gut reaction may be wrong, but I may have to write an app in that style to see if it confirms my bias, or if it actually feels OK in practice. It wouldn't be the first time I'd changed my opinion on a topic (see: OOP above, which I was one of the worst abusers of at one time).

I think that there is a place for isolated web components, but that the place is for things like the Google Map Component, or any component with nontrivial complexity like that, that really needs to be developed in isolation and/or is intended to be used in unrelated projects, and isn't expecting to be styled based on a global stylesheet. For self-contained complexity in components, the component idea is great, and for that why not use the web standard?

CSS being done in React-JavaScript for basically all components strikes me as a misfeature. First, it breaks the syntax for anyone who's done styling before (in a way that, for instance, SCSS doesn't). Second, it limits you in what styling tools you can use: There are tons of tools like Compass, Bourbon, and Susy that add useful styling functionality to your apps, and using React you've cut yourself off from such support (or waiting for React plug-ins to be created). Third, when styling is isolated to a particular group of files, you can drop in a different group of files and restyle your app.

P.S. I'm creating these various Walls Of Text in part because I'm trying to solidify my own thoughts on the matter. Feel free to ignore if what I've said doesn't hold any value for you, but writing it does hold value for me. Thanks for your comments.

[1] https://aerotwist.com/blog/react-plus-performance-equals-wha...

[2] https://www.sencha.com/blog/the-making-of-fastbook-an-html5-...

[3] https://en.wikipedia.org/wiki/Code_smell

While separation of concerns is certainly good to have, have we traditionally been applying it in the right way? I think not.

In particular, separation should be enforced among components, which is what React encourages, not between facets of the same entity, as we have practiced since the beginning. Coupling of styling and behavior with the associated markup is pretty much inevitable for web applications, and decoupling them just reduces cohesion.

I have posted a more elaborate answer in my blog [1], but finally I do appreciate your suggestion to rationally vet new technologies before jumping onto the bandwagon.

[1] http://lorefnon.me/2015/11/22/an-answer-to-react-is-the-new-...

I agree that it sucks when the "worse" technology wins, but your points on why React is an example of one of these "worse" technologies are seriously misinformed regarding its shortcomings and have largely glossed over some of its most significant merits over existing technology.

> React seems to be designed to ignore 40 years of accumulated software best practices.

Best practices in software are not set in stone. Sometimes you need to break from the best practices of the past in order to arrive at truly powerful new approaches to solving problems - approaches that may not seem intuitive at first, but eventually manage to redefine best practices through their technical merits.

The Virtual DOM and re-rendering on every change is one such approach popularized by React, but it's definitely not the only one, nor is it even the most significant one, in my humble opinion.

Before React, mutable models were the only game in town. Angular imposed mutable models, Ember imposed mutable models, Backbone imposed mutable models, and your average custom ad-hoc jQuery framework probably also imposed mutable models. Everyone blindly followed this "best practice" simply because it had been the status quo since the earliest days of UI development.

React is the first JS UI framework that does not impose a mutable model. And as a result, the ClojureScript community was able to build on top of it and show the world that change detection can be implemented as a constant-time reference comparison when immutable models are used [1]. The immutable models approach was highly unintuitive (who'd have thought using immutable data structures in your model could make your app faster than using mutable data structures, which are intrinsically faster), but its results are clearly evident, and enables nothing short of a quantum leap in UI performance.

[1] http://swannodette.github.io/2013/12/17/the-future-of-javasc...
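The constant-time trick is easy to demonstrate. Here is a Python sketch (hypothetical state shape; real implementations use persistent data structures rather than dict copying): if state is never mutated in place, an unchanged subtree is literally the same object, so change detection is a single identity check instead of a deep comparison.

```python
# Sketch of immutable-model change detection: updates build a new state
# object, but untouched subtrees are shared (same object), so "did this
# subtree change?" is an O(1) reference comparison.

def update_user(state, name):
    """Return a new state dict; only the user subtree is rebuilt."""
    return {**state, "user": {**state["user"], "name": name}}

def changed(old, new):
    return old is not new  # identity check, never a deep walk

state = {"user": {"name": "ada"}, "posts": [1, 2, 3]}
new_state = update_user(state, "grace")

print(changed(state["user"], new_state["user"]))    # True: re-render this subtree
print(changed(state["posts"], new_state["posts"]))  # False: skip it entirely
```

The counterintuitive part is that `posts` never has to be inspected at all: because nothing rebuilt it, it is the same list object in both states, and the renderer can skip that entire subtree on the strength of one pointer comparison.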

> Separation of concerns? Who needs that any more?

Your definition of separation of concerns seems to be the need to keep templates, styling and logic in separate files, which seems like a rather superficial distinction to me, to be honest. In any case, as mercurial already mentioned, JSX is completely optional, and React easily allows for you to keep your templates in a separate file from your logic and your styles.

Ironically enough, considering your rather petty criticism, React and Flux-inspired architectures like Redux have in fact popularized a much more important separation of concerns: the separation of application state from operations on that state.

This new separation of concerns, along with Flux's insistence on unidirectional data flow, has removed much incidental complexity from app state management and enabled a whole new generation of developer tooling, like component-localized hot reloading and time-traveling debugging [2].

[2] https://www.youtube.com/watch?v=xsSnOQynTHs

And this is the essence of why React has won out over the likes of Angular, Ember, and Backbone. Developers have always been able to build great applications, even in the frameworks that came before React. But React and Flux-like architectures allow developers to manage complexity significantly better than the frameworks that came before.

This is why "it allows teams of 100 developers to work together on an app". And as a front-end developer who has never worked on such 100-developer teams - only on solo projects and on small 2-10 dev teams - I can state with confidence that I reap much of the same benefit from this reduced complexity. In fact, I probably benefit even more, because as a solo/small-team developer my complexity budget is absolutely tiny compared to what bigger teams can afford.

> Open standards? Nah, how about lock-in to custom language extensions that will prevent you from migrating your code to the next web standard!

The standardization process on the open web is painstakingly slow compared to the rate at which new technology is generally adopted and refined. This slow, methodical approach gives standardization committees plenty of time and input to think about every standard being proposed, but it's also one of the main reasons why very few libraries, even those with standardization as the end goal, begin as some kind of standard proposal.

It is much easier to gain traction as a standard if you already have an established implementation in a library/framework that is mature and well-adopted, and can demonstrate the merits of your approach. This is the approach taken by projects like TypeScript, and it's probably safe to assume that many aspects of React will be integrated into various standard tracks in the not too distant future.

>Ironically enough considering your rather petty criticism,

A lot of your response seems to be emotionally laden. Bad form.


I've been reading about Redux, and as soon as another app project comes along for me to try it out, I'm planning to give it a try.

As I've said more than once in this thread, to a large degree my own opinion is still forming on this topic. React does things that feel to me like "bad code smells," but it's possible that I need to adapt to the New Way of Thinking.

>And this is the essence of why React has won over the likes of Angular, Ember, and Backbone. Developers have always been able to build great applications, even in the frameworks that came before React. But React and Flux-like architectures allow developers to manage complexity significantly better than the frameworks that came before them.

I've been writing software -- mostly games -- since I had to use assembly language for everything. Having a language (Java!) or framework (in this case React) explicitly protect me from myself almost always ends up slowing me down and slowing down the resulting app (yes, even React [1]).

Despite this I'm probably going to give React a real try at some point, even if it's in a toy project. I've been coding for 35 years, and it doesn't take me long to get the flavor of a new technology and its limitations when I finally stick my teeth in. It's all about finding the time...

[1] https://aerotwist.com/blog/react-plus-performance-equals-wha...

> A lot of your response seems to be emotionally laden. Bad form.

You're definitely right, I apologize for that. I cringed at parts of the post myself when I went back and read it again, but by then it was too late to edit.

> I've been writing software -- mostly games -- since I had to use assembly language for everything. Having a language (Java!) or framework (in this case React) explicitly protect me from myself almost always ends up slowing me down and slowing down the resulting app (yes, even React [1]).

Yes, micro-optimized, low-level code will always have an edge in terms of absolute raw performance, but there's a huge cognitive overhead involved with working with code like that, and you simply can't afford to do it on your entire codebase if you want to build new features and make changes quickly. What frameworks like React offer is a way to architect your application so that it is amenable to global optimizations that obsolete entire classes of performance tweaks you'd otherwise have to make by hand, case by case. This gives you more time to actually work on features, and more time to thoroughly profile your code and micro-optimize the parts that actually lie on critical paths in your app.

Regarding your linked article:

If you take a look at the Vanilla benchmark, you can see the rendering time out-pacing JS compute time as we approach the 1200 mark, whereas for the React benchmark, the rendering time essentially stays constant. This is one example of a global optimization at work: the Virtual DOM spares us from having to micro-optimize DOM rendering for all of our components.

Regarding the JS performance scaling characteristics, I believe it probably has something to do with this:

> Did you set shouldComponentUpdate to false? Yeah that would definitely improve matters here, unless I need to update something like a “last updated” message on a per-photo basis. It seems to then become a case of planning your components for React, which is fair enough, but then I feel like that wouldn’t be any different if you were planning for Vanilla DOM updates, either. It’s the same thing: you always need to plan for your architecture.

This brings me to another example for a global optimization that React enables (this one doesn't come by default in React, but it's the first JS framework that made it possible): the use of immutable data in change detection. This allows you to implement shouldComponentUpdate as a single reference check for every component across your app. The change detection for the state array in the example would then become a constant time operation rather than the much more complex deep object comparison that React had to perform in the example, which is probably the root cause of the poor JS compute performance scaling as the number of photos increased. I strongly recommend taking a look at the first link in my original post if you're interested in more details.
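The reference-check idea can be sketched in a few lines of plain JavaScript — no React involved, and names like `updatePhoto` are invented purely for illustration:

```javascript
// Reference-equality change detection over immutable updates. With immutable
// data, "did anything change?" collapses to `prev !== next`, which is exactly
// what a shouldComponentUpdate override can return in constant time.

// Immutable update: produce a new array, leaving the old one untouched.
function updatePhoto(photos, index, changes) {
  return photos.map((photo, i) =>
    i === index ? Object.assign({}, photo, changes) : photo
  );
}

// The constant-time check a component can use instead of a deep comparison.
function shouldUpdate(prevProps, nextProps) {
  return prevProps.photos !== nextProps.photos;
}

const photos = [{ id: 1, likes: 0 }, { id: 2, likes: 5 }];
const next = updatePhoto(photos, 0, { likes: 1 });

console.log(shouldUpdate({ photos }, { photos }));        // false: same reference
console.log(shouldUpdate({ photos }, { photos: next }));  // true: new reference
console.log(photos[1] === next[1]);                       // true: untouched items are shared
```

Note the last line: untouched elements are shared between old and new arrays, which is what keeps immutable updates cheap.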

>Yes, micro-optimized, low-level code will always have an edge in terms of absolute raw performance, but there's a huge cognitive overhead involved with working with code like that

I disagree. Code written to be optimized for a particular use case may itself be challenging to follow, but using it, if it's just a component and has a well documented API, doesn't have to be difficult at all. The Vanilla code in that benchmark article wasn't particularly hard to understand, for instance, and it could be wrapped in a Web Component to isolate the complexity.

Think about OpenGL/WebGL and the complexity and math and parallelization and optimization tricks that API conceals. At this point writing pixel shaders is almost easy, and yet they enable insane levels of parallel math with very little cognitive load.

I've written game SDKs that concealed a lot of complexity and yet were easy to use [1], so my gut reaction is to want to start with generic components that don't restrict what I can do, and then build up a DSL that is very easy to reason about that the app is assembled from. Based on my development history, I'm also likely to be making apps that are more than simple CRUD (with real-time interactive features, including games), so my personal requirements and point of view may be a bit different than the typical front-end developer.

>I strongly recommend taking a look at the first link in my original post if you're interested in more details.

OK, I'll take a look.

[1] Look at the listings for "Playground Game Engine"; the list is incomplete, but it gives you an idea: http://www.mobygames.com/developer/sheet/view/developerId,13...

> I disagree. Code written to be optimized for a particular use case may itself be challenging to follow, but using it, if it's just a component and has a well documented API, doesn't have to be difficult at all. The Vanilla code in that benchmark article wasn't particularly hard to understand, for instance, and it could be wrapped in a Web Component to isolate the complexity.

I don't think we actually disagree. =)

By "working with code like that", I meant actually writing, understanding and changing micro-optimized, low-level code. Your example of building on top of a micro-optimized, lower-level SDK is the perfect example of a global optimization that alleviates some of the need for tedious, case-by-case micro-optimization from the code that uses it.

I'm just saying that micro-optimizing every single piece of code case-by-case is not the best use of our time, that we should opt for global optimizations wherever possible, and that React and Flux-inspired architectures like Redux can enable some very practical global optimizations.

Alternative links:

[2] http://lhorie.github.io/mithril/ (using its own virtual Dom); I think there are other implementations around, too.

[3] http://cycle.js.org/ (have a native driver)


I've heard good things about Mithril performance. I'll need to take a look.

Virtual DOM as a panacea may be overrated, though. Maybe React's just isn't fast enough, but if you use another way to keep track of what parts of the DOM to change, simply pushing things into the DOM can be WAY faster than using a Virtual DOM. [1]

Cycle.JS still hits my "HTML in code BAD" reaction. An example from the front page:

      .map(name =>
        h('div', [
          h('label', 'Name:'),
          h('input.field', {attributes: {type: 'text'}}),
          h('h1', 'Hello ' + name)
        ])
      )

So we're again mixing HTML and JavaScript, and in this case we're also reinventing the syntax. If you're going to mix HTML with JavaScript, at least using standard HTML syntax seems like a better solution.

A lot of popular frameworks are using this generated HTML approach, though (including apparently Mithril). I'm still trying to figure out if my negative reaction is justified.

[1] https://aerotwist.com/blog/react-plus-performance-equals-wha...

This is such a superficial aspect of the framework that it's really not worth discussing. You need to go deeper to learn it properly, and only then give a verdict. In ClojureScript (Om, Reagent, etc.) you have the same HTML-building pieces in code. In Elm, too, you use normal Elm functions, not an HTML-ish DSL. Also, nowadays in Cycle.js we use the friendlier syntax:

      .map(name =>
        div([
          input('.field', {attributes: {type: 'text'}}),
          h1('Hello ' + name)
        ])
      )

Which was inspired by elm-html: https://github.com/evancz/elm-html

> "HTML in code BAD" reaction

That's what it is, just an automatic reaction without any real practical argument. Do yourself a favour and "unlearn" that. Open your mind. You're doing yourself a huge disservice by turning away from new ideas with these knee-jerk reactions. Working with views as functions of data gives you much more than it takes away.

I'm considering doing exactly that (unlearning the reaction). Too many people are agreeing with you for me to ignore it.

On the other hand, every time I turn around I think of more things that it "takes away." This time it's an entire category of tool: The visual layout editor. I like it when my code can be data-driven, and data-driven using a standard format that can be saved using a (visual) tool.

I especially like it when my artists can tweak the page layout for me, but I don't like the idea of giving the artists files full of JavaScript or JSX to tweak.

Are there artist-friendly tools that can edit the layout of a React site/app, where the JSX can be modified in-place using the tool? It doesn't strike me as an impossible-to-solve problem, but it could be tricky to do well.

It’s no more "HTML" in your JavaScript than if you do `document.createElement('div')`.

`h` is a method that returns a JavaScript object, not HTML.
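A toy version makes the point concrete — this `h` is a deliberately minimal stand-in, not any framework's actual implementation:

```javascript
// Minimal hyperscript: `h` builds a plain JavaScript object (a virtual node).
// No markup strings are involved anywhere.
function h(tag, ...children) {
  return { tag, children };
}

const vnode = h('div', h('label', 'Name:'), h('h1', 'Hello world'));

console.log(typeof vnode);  // "object"
console.log(vnode.tag);     // "div"
console.log(vnode.children[1]);  // a nested object describing the h1, not markup
```

The "HTML-ness" only appears at the very end, when some renderer walks this object tree and touches the DOM.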

I don't use document.createElement('div') in any but very rare situations either.

I used to describe myself as a web standards evangelist ten years ago and my first instinct when I saw React (both because of JSX and because it was from Facebook) was to laugh and shake my head (until I decided to "give it five minutes" and found out I was wrong) so I hope my perspective could make some sense to you.


Just to clarify: React is not about killing the separation of concerns. React is about one thing: converting application state to a component tree and rendering that component tree to the DOM via highly optimised diffing.

At the base level, React turns state into component trees. Similarly, Redux is only about managing state and changes to that state. These are fairly straightforward ideas but having them as clearly defined building blocks with an extremely straightforward API (Redux moreso than React but in the trivial case a React component as of 0.14 can be a plain old pure function) radically simplifies application development.
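As a rough sketch of that "plain old pure function" component shape — the returned object here is a stand-in for a React element so the example runs without React itself:

```javascript
// A component as a plain pure function: props in, description of UI out.
// This is the shape React 0.14 accepts as a stateless function component;
// the return value is a simple object so the sketch needs no framework.
const Greeting = (props) => ({
  tag: 'h1',
  children: ['Hello ' + props.name],
});

console.log(Greeting({ name: 'Ada' }).children[0]); // "Hello Ada"
```

Same props in, same tree out, every time — which is precisely what makes these components trivial to test and reason about.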

I'll say it again: application development. I haven't said anything about HTML and CSS yet. This is as relevant to the web as it is to native application development and it's not an accident that React has been decoupled from ReactDOM with the latter now being merely the implementation detail of going the last mile and rendering things to the DOM (or to HTML strings) and maintaining DOM bindings and diffs.


A quick interlude: JSX is not HTML. It looks like XML and in many examples it has HTML tag names in it, but it's not HTML -- nor is it XML for that matter. It's just (entirely optional but very useful once you get over the initial visceral discomfort) syntactic sugar for defining component trees (not DOM subtrees). The syntax is obviously based on XML but it is a lot simpler and the familiarity obviously helps.


One thing most people don't understand about React (not least because React isn't vocal enough about it -- just like Flux wasn't vocal enough about certain concepts until Redux came along and showcased why they're important and useful) is that you can and should distinguish between presentational components and application logic.

If you use Redux (or Flux -- but with Flux there's the problem that most things called Flux aren't actually Flux because nobody really understood it) most of your components will be entirely presentational and the logic will live outside the components except for a few so-called "containers" which are just extremely thin wrappers.

So with React+Redux you then have your component structure living in React components and your logic living in a few functions that describe transitions of the immutable application state plus a few "containers" that just describe how the state is applied to the API of the few "smart" components that actually need interaction.
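A minimal plain-JavaScript sketch of that split might look like this (all names are invented; no Redux or React imports, just the shape of the idea):

```javascript
// 1. Logic: a pure reducer describing transitions of immutable state.
function reducer(state, action) {
  switch (action.type) {
    case 'ADD_TODO':
      return { todos: state.todos.concat(action.text) };
    default:
      return state;
  }
}

// 2. Presentational: knows only how to turn data into output.
const TodoList = ({ todos }) => todos.map((t) => '* ' + t).join('\n');

// 3. Container: a thin wrapper that selects state for the view.
const VisibleTodoList = (state) => TodoList({ todos: state.todos });

let state = { todos: [] };
state = reducer(state, { type: 'ADD_TODO', text: 'write docs' });
state = reducer(state, { type: 'ADD_TODO', text: 'ship it' });

console.log(VisibleTodoList(state));
// * write docs
// * ship it
```

The point of the split: almost everything above is a pure function, and only the container knows both sides exist.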

Sure, HTML strings intuitively feel more appropriate for describing these components and Web Components definitely look more "HTML-like" in that regard, but at the end of the day you're still writing something XML-like with made up elements (whether they're Web Components or React "components") that needs to be processed by JavaScript before being turned into the actual DOM.


One major change React brings to the table that isn't spoken nearly as much of as it should be is that React can be used to render applications to HTML, without the DOM, on the server. This usually gets mentioned in the context of "load times" or "SEO friendliness" but it's a pretty significant property of React.

Not only can React on the client "seamlessly" re-use the server-rendered DOM and attach itself like a jQuery plugin would but being able to render the application (and using Redux: render it with an arbitrary state) means you can truly embrace the idea of progressive enhancement without giving up the comfort of web application tooling.
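To illustrate why no DOM is needed on the server, here's a hand-rolled toy renderer — not ReactDOMServer's actual implementation, which also handles escaping, attributes, and client re-attachment markers:

```javascript
// Toy server-side render: walk a virtual tree and emit an HTML string.
// No document, no DOM APIs -- just string concatenation over plain objects.
function renderToString(node) {
  if (typeof node === 'string') return node;
  const children = (node.children || []).map(renderToString).join('');
  return '<' + node.tag + '>' + children + '</' + node.tag + '>';
}

const tree = { tag: 'div', children: [{ tag: 'h1', children: ['Hello'] }] };
console.log(renderToString(tree)); // "<div><h1>Hello</h1></div>"
```

Since the component tree is just data, the same tree can be rendered to a string on the server or diffed against the DOM on the client.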

Some years ago there was a lot of hype (well, not as much by a long stretch but some hype nevertheless) around the idea of making web apps work without JS (YUI[0] allowing server-side rendering of JS web apps and PJAX[1] enabling web sites to behave more like apps). Instead of mucking around with client-side templates you would render pages on the server and then intercept internal links to fetch and inject only the bits that changed between the current page and the next.

React+Redux is basically PJAX, but it's also the polar opposite. Instead of rendering everything using server-side technologies, you render everything using client-side technologies. And instead of offloading re-renders to the server you keep them in the client when possible (where they can be further optimised thanks to DOM diffing).

Let me restate that: React+Redux allows you to build "Rich Internet Applications" in a way that makes it possible to support JavaScript-free fallbacks out of the box. There's nothing stopping developers from making the HTML output of their React apps richly semantic and accessible either (even though that represents the Eternal War of the web).

The idea is not new[2] but with React it's not only achievable but easily achievable. And unlike Web Components it only needs existing technologies that are widely supported (it even works in IE8).


The PandaStrike article you linked is not a rebuttal of React. It's a rebuttal of traditional client-side web applications. It doesn't matter whether they are built with Angular or Backbone or React -- the arguments are mostly universal. The difference is that React is the only option today that (despite not having HTML "templates") allows developers to do something smarter.

Of course React is a land grab. Every open source project is. But it's not an attack on the Open Web anymore than jQuery was an attack on the Open Web before querySelectorAll or XMLHttpRequest landed in a spec. That you're no longer writing straight-up HTML in a text file doesn't mean you're no better off than if you were using GWT[3].

And who says Web Components are the Right Choice just because they're becoming part of the native platform? Sometimes the thing we think we want[4] isn't what we really need[5], even if it's on track to become a standard.


[0] http://ajaxian.com/archives/server-side-rendering-with-yui-o...

[1] http://pjax.herokuapp.com/

[2] http://www.stevesouders.com/blog/2010/09/30/render-first-js-...

[3] http://www.gwtproject.org/gettingstarted.html

[4] http://readwrite.com/2014/07/24/object-observe-javascript-ap...

[5] http://www.infoq.com/news/2015/11/object-observe-withdrawn

This is a beautiful post, so poignant and so true.

Even if it makes sense for a single company not to hire an old developer, we are losing so much value as an industry if we invest all this money into teaching valuable lessons to devs, then throw them away and let newer kids make the same mistakes.

Yes and no. A guy/gal who spent a decade or two dealing with RPG on a 400 will have judgments contextually relevant to that technology environment. His experience and judgments will have limited applicability to say microservice based architectures on AWS. Yes there will be some generalities that will apply everywhere, but probably not enough to make up for the deficiencies of limited experiences outside of that 400 comfort zone. In fact, some of that prior experience may well prove harmful rather than beneficial in a different environment with different strengths and weaknesses.

Unfortunately, technology is unlike professions such as warehouse management or construction, where techniques evolve with time but there are more commonalities from one decade to the next. Technology sees highly disruptive change, so I would still argue that without keeping your knowledge current, your experience has a very short half-life.

No it isn't, it really isn't. Take any website you like, it's a form for you to enter data to be stored into a database, and reports you can run to see this data. Facebook, Amazon, you name it. So it displays in a web browser rather than a 3270 terminal, that's the only difference, and it's a trivial one.

Incidentally, what you call "microservices on AWS" is architecturally identical to how mainframes worked in the 1970s, you would do it with CICS. Much of the computer science, algorithms and datastructures, in every day use, were discovered in the 70s if not before. So either you have very limited experience of technology, or this is just ageism.

If you want to take generalities sure, nothing ever changes. We store data, we use electricity, etc. I can generalize away any detail and make myself look clever, too, but I also know the operating realities of those technologies of which I spoke involve details which are not so trivially disposed of and in practice they matter. Are there analogs between these technologies? Can we see echos of the green screen paradigm when we load a web page? Sure. But that's a long stretch from saying that implementation details are close enough to even say a majority of the information relevant from the old 400 days.

As for me, I've run AS/400 shops (back when it was the AS400, but we did use newfangled 5250 emulators), been responsible for development in companies implementing highly distributed client/server based products, and most recently I've been involved in developing back end systems for web and iPhone applications (and, yes, on AWS in microservicey ways). I've been around technology plenty. I can tell you, too, that my early mistakes with AWS were precisely because I allowed my prior monolithic systems experience take too much precedence in my judgments when I first approached it. It was not the same. I needed to learn the new lessons; perhaps my only advantage was I recognized it quickly and recovered.

It's not ageism, however, but a realism that recognizes that not all experience is the same or applicable to every situation: it takes continuous professional development to make that experience worth a shit. I can understand why you'd be a little defensive in your response, though, being a developer of a certain age. Don't worry. I'm about the same age as you, as are many of my colleagues, and I don't think we're too worried about finding work.

If you can persuade your customers every few years that they need to reinvent the wheel and call the same things by new buzzwords, then you are all set for a lucrative career, you are correct. But sooner or later the wheels will come off this gravy train. I'll be fine - will you?

You didn't address any of your parent's points there, and instead made vague "you're doomed" threats.

Lot of reading comprehension problems here.

1) That's not a threat, if you think it is then call the Internet Police - they have a hashtag #hnwatch just for that

2) He is simply wrong that experience doesn't carry forwards. To use my Amazon example, the value in that business is understanding how people buy books and how a logistics operation to deliver them works. I don't know what technology Amazon actually uses in their warehouses but it's nothing that couldn't be done with terminals and line printers, I'll bet dollars to donuts, and nothing that would be unfamiliar to anyone working in the logistics field in the 70s.

Well said. I stumbled across virtualization in the early 2000s and rediscovered what had been invented in the '60s. In the '80s I created a compression algorithm only to find out it had been invented in the '60s. When I was under 30 I assumed everyone over 30 was brilliant, and they often were. People under 30 today assume everyone over 30 is an idiot.

I've found the solution (for me at least), while I love spending time on new tech, is focusing on being a "solution provider" rather than focusing on specific technology. The moment you stop including the name of specific tech on your resume and start describing how you helped solve X problem that saved/made the company Y amount of money instead, you open the door to positions that may be as hands on (or not, depending on what you want), but that pay better and where the interviews are not focused on tech trivia at all.

Same here, I think that will come soon when we realise most of our jobs run on credit and not profit and the "demand" for engineering is essentially based on crap shoots. I'm genuinely scared and haven't been able to shake this feeling. It keeps me bothered at night.

> Having to interview for a new job and realizing that you can't pass the white board questions because you have fallen behind with the expectations of young blood. That's my nightmare, becoming irrelevant.

I'd like to think that interviewing for positions that aren't entirely focused on new-hotness still leans on heavily trod areas of CS. Data structures, sorting and searching.

A large portion of modern dev jobs are either CRUD apps or data munging. Knowledge of algorithms is not necessary for those domains.

It always amuses me when I'm interviewing for some vanilla web app job and they start pitching maths questions. It's filtering on the wrong criteria.

No, they are filtering on the criteria of "recently revised this stuff for an exam" i.e. "we want cheap and easy to exploit 23-year-olds but we can't say that in front of HR".

This. So much. And if by any chance you are stuck in some kind of "corporate only" domain (e.g. telecom, insurance) with a "system/non-web competency" like C/C++/Linux, your job market narrows down further. It's not impossible to find a job, but your market becomes very niche. When people say just read up on tech, they forget that most employers are looking for professional experience in that area. I doubt contributing to OSS or side projects can convince employers otherwise!

Do you love coding?

If you don't, just find something else.

On top of what has been said, old devs should have two important advantages over young people. The first is long experience in computer science: you should have a better understanding of your stack, you have mastered the basics of CS across several paradigms and programming languages, SQL/NoSQL databases, etc., and you may have some knowledge around the edges of your stack and beyond.

The second advantage depends on what you want to do. If you want to stay a dev and only code, then the best thing is to master your stack and learn complex things. By doing that you differentiate yourself and raise the barrier to entry. Learn, for instance, C++, AWS, machine learning, the Hadoop ecosystem, WebGL... depending on your field and what the next big hard thing in it might be.

Otherwise, the other path is to go up the hierarchy to team leader and maybe higher. First, you have to learn how to manage people. You should know how to lead your team when things are going well, but also when they are at their worst. Sometimes deadlines are hard to meet. Sometimes stress is at its maximum, the company is not doing well financially, or goals and ideas diverge... It's all experience you need in order to handle these situations and become a great leader.

You may also want to master git as a team leader, to be able to fix problems and manage the repo for code reviews. Another point is to improve your communication skills every day: by talking with your team, and by taking care to write well-crafted emails and speeches. This book [1] is a good start. A last point I think is important is to pick up financial skills: financial management, accounting, and if possible being able to do simple DCFs...

[1] http://www.amazon.com/Elements-Style-4th-William-Strunk/dp/0...



Legacy aint gonna maintain itself ;)

This is funny, but it's actually more like a programmer's second worst nightmare -- "It doesn't work but it should".

The true worst nightmare is, "it works, but it shouldn't", because that means your whole model of the domain was wrong, rather than some isolated, fixable component.

1. That can't happen

2. That doesn't happen on my machine.

3. That shouldn't happen.

4. Why does that happen?

5. Oh, I see.

6. How did that ever work?

Stage six is the worst. Especially when I wrote the original code.


Yep, that's a good one! My first time seeing it was on http://bash.org/?950581

It comes with a nice follow up after the first 6 rules:

< MatthewWilkes> 7. svn blame

< miniwark> 8. one day we will write tests

> < MatthewWilkes> 7. svn blame

Am I the only one who runs git blame (for the commit, not the author) 20 minutes into investigating a bug, and git bisect an hour in? They are excellent tools for finding when and how the bug first occurred.

In the "Beautiful Code" book, there is a "Beautiful Debugging" chapter. The author describes how things like `git bisect` work. I had some success convincing people to use bisect at work by pointing them to this chapter.
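For anyone who hasn't tried it, here's a self-contained demo you can paste into a shell: it fabricates a throwaway repo with a regression introduced at commit 5, then lets `git bisect run` binary-search for it (the `flag` file stands in for a real test suite, and all the names here are made up for the demo):

```shell
# Build a throwaway repo with a regression introduced midway.
repo=$(mktemp -d) && cd "$repo"
git init -q . && git config user.email demo@example.com && git config user.name demo

for i in 1 2 3 4 5 6 7 8; do
  # The "bug" appears at commit 5: the flag file flips from good to bad.
  if [ "$i" -ge 5 ]; then echo "bad $i" > flag; else echo "good $i" > flag; fi
  git add flag && git commit -qm "commit $i"
done

# HEAD is known bad, 7 commits back was known good; let bisect drive the
# search with a test command (exit 0 = good, non-zero = bad).
git bisect start HEAD HEAD~7
git bisect run sh -c 'grep -q good flag'
git bisect reset
```

Instead of checking eight commits one by one, bisect needs only about three test runs to point at "commit 5" as the first bad one.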

I do this as well. Unfortunately we're far and few between.

I find that software developers often try their best to stay at stage 1 by dismissing "weird" issues reported by users if they are inconvenient.

"Can't reproduce" = "Works fine on my machine. No problem!"

Yes. And usually discovering the answer to 6 leads to finding another bug that would bite you later. So you often spend twice as long finding the answer to 6 as you spent on steps 1-5.

I remember having one of those moments with a function I hadn't implemented yet and being incredibly confused as to how it could possibly work in spite of there being no code yet.

Yes! I once started testing a class I had written, only to discover methods I hadn't finished writing yet were working perfectly!

It turns out I was accidentally reimplementing a library function, and I had chosen the exact same names and method signatures for the (admittedly small and fairly obvious) class.

Hehe, I had more or less the opposite of this once. I spent way too much time tracing a function in ever more detail going on the assumption that it must have been the source of the problem. Only... it was never called in the first place, it was a library module that took the call and it was misbehaving...

Just for the record, my test was a bit off and wasn't hitting the non-existent code at all.

The worst problems ( and the best pranks ) are the ones that make you question your own sanity.

Agreed. There's nothing worse than working on a problem for hours only to have it disappear for no discernible reason.

It's not a nightmare, but it's incomparably frustrating. I'm more motivated to figure out things that work when they shouldn't than the other way around.

I'm not a programmer but a Sysadmin. I ran into an issue a few months ago that nearly drove me crazy. I was custom building a new server for a customer, very nice build, Xeon E5 10 Core, 64GB DDR ECC RAM, Windows Server 2012 R2, SSDs, etc...

Everything is going well: update and configure the BIOS, install Windows, install drivers and software. Then start configuring the server; the server needs to reboot, and it reboots to a blue screen of death. Can't get it to boot up normally. OK, must be bad software, a driver, etc... Time for a clean install. Everything is going fine and then reboot -> blue screen. Look up the bug check code, no help. Another clean install, this time no extra software, no drivers, same problem after a reboot. Finally figure out that it is only happening after I make the computer a domain controller. After Googling with this new information, I find one thread on some forum where people are having the same problem. Turns out it was the solid state drive: if you use a Samsung 850 Evo SSD with Windows Server 2012, it will blue screen when you make it a domain controller. I never thought a problem like this was possible. Sure enough, changed the installation drive and no more problems. Nearly drove me crazy, took me two days of troubleshooting.

Haha, these multi-hundred-thousand-line-source-code SSD firmwares need to do something. In particular they detect popular filesystems and try to optimize some operations (specifically if filesystem is recoverable by check disk after crash, it's considered to be acceptable optimization; i.e. it's possible to delay actual commit of free block info for example). Seems like detection glitched on the blocks that contained data related to being domain controller.

Thinking about SSD and HDD firmware can make a guy reach for tape drives as a last ditch hope that data stays just data.

At least with those (and to some degree optical media, though the write process there is more laborious) the storage and the R/W logic are separate pieces.

That is a whole 'nother level of Nope.

The closest I've come to that is tinkering around with random old hardware: a specific ISA (PnP) sound card and a specific 4GB IDE HDD I have here hate each other and if both are in the same system, when Linux sends the 'IDENTIFY' IDE command, it times out and never gets a response. This "works" across two different motherboards.

Thankfully I was just messing around, I knew the system I was using had used up 7 or 8 of its 9 lives and I was being careful, and I only changed one or two pieces of system state at a time so I could easily rollback (as nonsensical as "okay let's try removing the soundcard" is :P).

If anyone wants to know, model info shouldn't be too hard to locate, and pix would be fun to grab, I'd just have to go find all the parts.

Years ago (maybe 10) I decided to update Ubuntu on my dev machine. Everything went well, I boot into the new system and start playing around. But after some time the internet connection drops in the middle of apt-get. I am furious, start cursing at my ISP. I reboot the system, and thankfully, the internet is back on. Start installing new packages, and after some time the connection drops again. I was like "seriously???", then reboot again and net was on again.

Took me the whole day to figure out that the particular kernel version I updated to made my particular brand of router drop the connection to the ISP after ~15min. Had to downgrade the kernel to make it work again...

Fun times :)

10 years ago would have been 5.04 to 5.10. You did very well to only have that one issue to sort out! Things are so much better one decade later.

Another story about ubuntu; and this time I'm the idiot;

When you're a sysadmin who's only dealt with one type of Linux for a long time you forget what the different distros do; there really isn't that much difference, naming conventions/subtle commands etc.

But, apt's packages have the nature of 'always run the latest service if upgraded', a notion that does not exist in centos/rhel/fedora/bsd etc;

So, apt-get upgrade on our production database server means the server goes away for a while; not 'the packages are available to be used when you restart the service at midnight'.

:p oops.

Ehh, not quite.

I was working on a dev server (which was pretty identical to the prod servers), vmhost/kvm/ubuntu 10.04 LTS.

One day I'm staging my updates, to make sure they work when I go to prod.. I do my apt-get and reboot into the new kernel.. but- nothing can be mounted. No developers can work either, they depend on that machine..

I'm looking at it and I can mount the drives from busybox after the kernel panics.. No idea what's going on.

I'm asking around in #ubuntu on freenode, since googling is getting me nowhere. 'Just reinstall it' they say.

'Sometimes it's better not to know why these things happen'


I found the issue: some person was smart enough to think lvm2 was called 'lvm', and the upgrade procedure then regenerated the initramfs for all my previous kernels, so I couldn't boot into the old kernels either.

Lost a day of dev time (although it was only 3 developers and me)

Given that we're talking about 10.04 LTS, I imagine this was a little while ago and you don't have your devs relying on a single server nowadays?

Was a very small company, they do everything with VPS's now. No more physical hardware to maintain.

Many of you might have faced this issue, but it was pretty weird facing it the first time.

Problem was with the Realtek sound chip on my card. Headphones stop working on Windows (even though it detects them), but still work on Ubuntu. Try all the usual stuff (remove, update, reinstall drivers, etc.), nothing works. One fine day, I find it starts working again. Then another day it stops working. Realise that sound stops working after I switch from Ubuntu to Windows. Weird. How does Ubuntu have an effect on sound in Windows? Google the issue and find that it's some strange Ubuntu driver issue that puts the audio driver in a weird state. Workaround is to shut down the laptop, remove the battery, and hold down the power button for 5 seconds to discharge the capacitors, then boot into Windows.

That's great!

We had an HP server that was _randomly_ losing network connection with the virtual machines hosted on it!

Couldn't ping them, nothing, but you could log on through the hypervisor. If you rebooted the VM it'd be fine for a week to two weeks.

This server was pretty new from HP.

After some research, calling HP and MS, it was determined that it was a setting on the NIC that was causing the problem!

There's a KB on it now on MS's site.

That is insane. And also one of the reasons I try to avoid being a system integrator whenever possible. It just takes up too much time - so much simpler having my friendly Supermicro distributor build my server for me!

I almost lost faith in my abilities through the ordeal. I love building, picking every single component, I love having the latest hardware.

This particular build was set up to be primarily a Hyper-V host; for the VM drive I chose an Intel 750 PCIe SSD. The drive is amazing, super fast, makes traditional SSDs look like spinning disk. Chose the EVO to save a few bucks for the customer, as the server OS wasn't going to be doing any heavy lifting. I never imagined that a hard drive could be incompatible with an OS.

I would have specced the Pro version for a server. It's not much more expensive than the EVO but it's more reliable and faster.

Say No to Samsung for production systems TBH.

Other coders' nightmares:

* Security : customers' accounts leaked, identities stolen, bank accounts depleted

* Undetected data corruption : customers' data is corrupted and it has seeped into the backup stream, even the off-site one, and that is corrupted as well now.

* Rarely occurring, seemingly random concurrency-related bugs that manifest themselves once a month or less often.

* Time : making assumptions about it, dealing with it in distributed systems, etc. Nobody thinks about it until it stops working or the system's time gets snapped back somehow.

* Works on my machine: You can't repro a customer bug in-house. Something about your internal setup is making a difference. Even worse: You work for a big-corp that is a platform for said customers, and so god knows what another team has running in your environment in the name of "dogfood all the things"

Years ago, my wife had a bug that only happened in a non-debug build and only on one specific system. It was a recursive tree traversal algorithm, written in C, with an OB1 error in some pointer arithmetic. If memory was allocated in a particular order, the function would read off the end of level X's data and find level X+1's data, which was structured exactly as expected, so it would end up processing certain nodes twice. In the debug build there was a debug variable being initialized to null, which was in between the two blocks of data and therefore got interpreted as a terminator.

What a horrible coincidence. I'm surprised they found it.

I saw one once where the cell carrier actually made the difference. As in I had to fly to a foreign country before I was able to reproduce the bug.

Something like this happened a few months ago, only this time I was the user and I drove the person writing the system up the wall because initially he couldn't reproduce it.

After writing a test script and letting it sit in a loop, the bug became reproducible: after 1152 runs it got a replication.


What amazes me about CS - and this almost literally leaves me speechless - is how little interest there is in classifying and avoiding bugs, and working towards a General Theory of Why Stuff Might Not Work. Even a set of basic widely accepted heuristics would be good.

Instead of real observation, there's a lot of leaning on type theory, OOP, immutability, and other ideas that scratch some sort of neatness itch and/or advance the careers of opinionated language creators, but have little/no empirical support to prove they actually produce more reliable code.

Meanwhile formal techniques and languages that do have empirical support are sidelined and ignored.

Bugs like yours are not uncommon. If anything, because of Darwinian debugging - the obvious ones die first - you'd expect a good proportion of hard-to-reproduce bugs in any code base.

So they shouldn't be considered weird aberrations. I'm not sure why CS theory doesn't have a better handle on them.

Yes, this amazes me as well. Shallow bugs are many and are fixed quickly and easily so in the end the ones that remain are nasty and hard to fix.

Every anti-bug measure tends to work like yet another brush aimed at a wall. But most brushes overlap partially or even wholly with areas of the wall already painted with previous brushes. The only strategy that would really work would cover all of the wall in a very systematic fashion.

> Time : making assumptions about it, dealing with it distributed systems etc. Nobody thinks about it until it stops working or systems' time gets snapped back somehow.

I once wrote an industrial control program, and used a time function that rolled over at midnight. To make matters worse, we never bothered to set the clocks on the computers, so it seemed like a completely random glitch, and was a devil to debug.

The one I'm dealing with currently. Sometimes half an array in perl is missing its data, but the program should terminate early if that data is missing. So it gets past one check and then disappears sometimes. Never locally in dev. Only in prod. And not always reproducible. Customer data is getting messed up sometimes, and the code paths don't support this possibility. frustration.

Kinda sounds like you've got a race. Are you using threads at all?

AnyEvent for concurrency; no threads aside from the fork that causes child workers to pick up a message. The validation is in the parent; the child uses the data. The race would have to be something mutating the array, but I can only find one place where mutation happens, and that part is a tight foreach loop that appends source data to the target array. I will likely have to add logging, but the issue is so infrequent that the extra logging will be a mess due to the volume of data this service handles (around a billion requests per day, of which maybe zero to 200 requests hit this issue). Something will work out eventually; the bug must be exterminated ;)

Oh man, I used to work on a large service that had to aggregate data from a bunch of downstream services and submit it upstream. At the end of the month, billing would run off the data and send out invoices -- tracking down subtle data corruption was nightmarish and cost me my youth.

I have one of those random concurrency bugs in my code right now - it takes a month to show up and so far not in the debug build. The code doesn't crash, the program just stops running as though manually paused.

Is this on a unix machine?

If so, crash the program with a signal that dumps core (note that kill -9 / SIGKILL won't write a core; use kill -ABRT or kill -QUIT, with ulimit -c set to allow core files), then run gdb on the executable and load that core file. Then run 'where'. This will tell you where you're hanging and how you got there, as well as all the parameters to the functions on the way down.

A ps -aux beforehand can give you another hint, look at the status letter.

Good luck!

I have a Linux version, but unfortunately the version that stops is the Windows version. At this point if I solve this it is going to be by chance.

Ai, in that case I have no bag-of-tricks for you, my last windows programming was about a decade ago and I don't miss it in the least. Much good luck, if you ever do figure out what caused it you should definitely do a write-up, there is probably a more general lesson to be learned from nailing that one down.

ps aux (the dash is for sysv syntax: ps -eif)

Aargh. Sorry about that. It's so easy to mess that one up. Gets me with tar all the time as well. Thanks for the correction.

This reminds me of the buggy PIC C compiler standard library I was using. Some function calls with particular parameters (maybe the printf family) would halt the program for no apparent reason.

Currently dealing with an intermittent concurrency related bug. It's horrible.

Have one that took six months to find.

The database person spent a month trying to figure out why there were entries with odd dates. Like one or two a month out of a few hundred thousand. Eventually figured out it wasn't his fault. His end was getting an occasional packet with a good crc and odd data. Most of the time the parser would kick those out. But a few were well formed enough to go through.

I chased it down eventually to a documentation error in the datasheet for the radio transceiver we were using. The datasheet very clearly stated that when a packet came in with a bad crc... nothing would happen. No end of packet interrupt. So I had to use a timer to detect a bad crc. (Okay whatever, I mean I hate those guys, but I've seen worse)

Turns out when the radio received a packet with a wrong/bad crc, the radio silently reset the receiver and started waiting for a new packet. If the radio received a packet before the timeout hammer came down, the driver would think the previously received packet was 'good'


Was interesting because the symptom appeared at the farthest end of the system (the tool graphing data for the customer) and was caused by something at the other end. The error could have been literally anywhere in our system.

This is why I want to avoid mutable data as much as possible.

Anything involving interrupts and timing. Especially when the timing is so critical that it is time to break out the logic analyzer. Bugs like that can keep you busy for weeks.

Let me add

* gcc dies with internal compiler error when you try to compile with debug symbols.

Oh, boy.

By this time I've killed gcc, javac, ghc, and the .Net compiler. Worse, the ghc bug was the only one I could track.

I'm so thankful that I work client-side and not server-side sometimes.

That seems more like a fictionalization of "Reflections on Trusting Trust" than a true story.

Mick (OP) responds to this assumption in the Quora comments:

I just read that. This guy certainly wasn't Ken Thompson. But this happened to me in 1991, seven years after the date of publication you posted, and the "login" portion did the exact same thing Ken was describing as far as the secret password. This was probably Thompson's genius idea implemented by the grad student. The mechanism was very different though in the compiler, but the outcome was still a poisoned compiler.

I had heard of the self-modifying compiler before, too, or I probably wouldn't have thought to look there myself. I'm not surprised to see Thompson behind the original idea.

I don't doubt the story's plausibility, but I'm left wondering how the student could have replaced the compiler with his own. Usually you need root in order to mess around with the system like that.

I thought of the 'trusting trust' paper just as I reached the paragraph about the compiler.

As an ex-PHB rather than a programmer, I think perhaps that Dr Phelps could have taken a bit more interest in the project &c. Perhaps had a look at the code now and again and simply asked "why does this look so unlike other people's code?". Might have sent a shot across the bows of the bad apple.

This. I wonder what made the graduate so angry to go to such extreme lengths to protect his code. And why nobody noticed it - did they even have the right to fix the code?

Either that, or said student had read Reflections on Trusting Trust. It was published in 1984. The 3B2s and Tymnet were current around then, and the author's bio indicates he's been programming long enough for the timeline to fit too. It's not that unlikely that someone, somewhere read the paper and decided to have some fun at some point.

The thing that makes me tend to agree with you is that he hasn't mentioned any "finally, years later, I found out where that guy got his idea" moment in it.

He did sorta mention it, but not by name:

"I suddenly realize it's in the compiler. It was the compiler. And every time you compile the original code and run it puts in the subliminal message code into the source code. I'd heard of this before."

Ah, yes, I missed that... Thanks.

There is an "I'd heard of this before" in there that I do attribute to the paper.

Or for those who like explanations from secondary sources: http://c2.com/cgi/wiki?TheKenThompsonHack

I assumed for the first half it was going to turn out to be a psych experiment (and there were several other compsci students getting similar "contracts") -- especially as the hurdles he was jumping were clearly intentionally set up.

Or, perhaps, the grad student antagonist got the idea from the essay.

That's not a worst nightmare for someone who knows how to read assembly language and use a debugger. I would've narrowed that down to the compiler the first time and probably debugged into it too (providing whoever did this very clever hack didn't also do something to the debugger... but if the system is behaving this oddly, it's certainly better to bring one's own tools.)

Seeing the CPU do something it shouldn't, being traced by a hardware ICE, now that is a worst nightmare - and one I've actually experienced.

That reminds me of my 'worst' computer bug. I was writing some networking code for Mac OS 8, which is bad in ways that are hard to describe. E.g. OS 8 sort of has interrupts, but you can't, say, allocate memory during them.

Anyway, after getting some really terrible error messages for 2 very long days, and digging way too deep, I finally say: "This can't happen; the RAM on this machine is clearly crap."

Swap out the ram and everything is fine after that. !@#$

PS: Not that I had a great understanding of OS 8 coding. I remember printing a help page for some function which said, "This is the documentation for _. Followed by a copyright notice." and that's it.

Story time? (Please!)

It was a long time ago, but I remember we were working on an embedded system controlling some industrial equipment, and it randomly crashed; the time between crashes was long enough that it'd take several days before it happened, so even getting a trace of the crash was an exercise in patience. Eventually we did get a trace, and it turned out the CPU would suddenly start fetching and executing instructions from a completely unexpected address, despite no interrupts or other things that might cause it.

We collected several more traces (took around a month, because of its rarity) and the addresses at which it occurred, and the address it jumped to, were different every time. Replacing the CPU with a new one didn't fix it, and looking at the bus signals with an oscilloscope showed nothing unusual - everything was within spec. We asked the manufacturer and they were just as mystified and said it couldn't happen, so we resorted to implementing a workaround that involved resetting the system daily.

Around a year after that, the CPU manufacturer released a new revision, and one of the errata it fixed was something like "certain sequences of instructions may rarely result in sudden arbitrary control transfer" - so we replaced the CPU with the new revision, and the problem disappeared. We never did find out what exactly was wrong with the first revision, other than the fact that it was a silicon bug.

> the time between crashes was long

I know that pain. I was working on the NDIS driver (WinNT 3.51, later a 4.0 beta) for our HIPPI[1] PCI cards. The hardware was based on our very-reliable SBus cards, so when the PCI device started crashing, we assumed it must be a software error.

I probably spent 2+ months trying to find the cause of the crash. Trying to decide if your change had any effect at all when you have to wait anywhere from 5 minutes to >10 hours for the crash to happen will drive you insane. You have to fight the urge to "read the tea leaves"; you will see what you want to see if you aren't careful.

While I never did find the problem, I did discover that MSVC was dropping "while (1) {...}" loops when they were in a macro, but compiled them correctly when the macros were changed to "for (;;) {...}".

Later, a hardware engineer took the time to try to randomly capture the entire PCI interface in the logic analyzer, hoping to randomly capture what happened before the crash. After another month+ of testing, it worked. He discovered that the PCI interface chip we were using (AMCC) had a bug. If the PCI bus master deasserted the GNT# pin in exactly the same clock cycle that the card asserted the REQ# pin, the chip wouldn't notice that it lost the bus. The card would continue to drive the bus instead of switching to tri-state, and everything crashed.

Every read or write to the card was rolling 33MHz dice. Collisions were unlikely, but with enough tries the crash was inevitable.

[1] https://en.wikipedia.org/wiki/HIPPI

Ok, that one should get the prize. The chances of spotting that are insanely small, kudos on making progress at all, more kudos for eventually tracing it down to the root. I really hope I'll never have anything that nasty on my path.

Most of the credit goes to the hardware guys that were able to finally isolate the problem.

We found the bug, but the months of delay (and a few other problems like losing a big contract[1]) killed the startup a few months later. While I'm annoyed the SCSI3-over-{HIPPI,ATM,FDDI} switch I got to work on was never finished, the next job doing embedded Z80 stuff was a lot more fun... and a LOT easier to debug.

Incidentally, I found a picture[2] of the NIC. Note that HIPPI is simplex - you needed two cards and two 50-pin cables. This made the drivers extra "fun".

[1] "no, I'm not going to smuggle a bag of EEPROMs with the latest firmware through Malaysian customs in my carry-on" (still hadn't found the bug at the time)

[2] https://hsi.web.cern.ch/HSI/hippi/procintf/pcihippi/pcihipso...

That's a beautiful board. I remember the FAMP made by Herzberger & Co at NIKHEF-H, I used to hang around their hardware labs when those were built. Similar hairy debug sessions in progress. Those worked well in the end iirc and ended up at CERN.

"A technical explanation appears in the demo as a 10-minute scroller, but the same text is provided below, for your convenience."

Hehe :) That's a really neat one: if you've fixed your machine you can read it, otherwise you can't...

Awesome sleuthing, and the take-home is: if you get down deep enough, digital is analog again.

Sounds similar to the Dragonfly BSD/Matt Dillon episode[0][1], with AMD except that Matt figured it out after more than a year(!!!).

[0] http://article.gmane.org/gmane.os.dragonfly-bsd.kernel/14518

[1] http://thread.gmane.org/gmane.os.dragonfly-bsd.kernel/14471

That sounds like a story worth telling.

I was hired by a psychologist

My money was on the psychologist the whole time. I still kind of think it was Dr. Phelps and maybe Mark the admin and the AT&T tech are grad students in disguise. So I guess my worst nightmare is finding myself in a similar situation and later finding out my boss set the whole thing up for funzies.

I was hired by a bookie ;)

I have a somewhat related story. Circa 1995 or so, I was working on software compiled on a 486DX computer. Back then the computers had L2 cache in SRAM chips you plugged into the motherboard.

I was moonlighting with a few guys to develop software, and each night myself and two friends met up in an office we were renting. There, on one of those fold-out tables, we toiled away each night writing code.

The code we worked on was for offshore gambling (aka a bookie). Back in the 90s you could walk into a convenience store and pick up a magazine, usually an Auto Trader magazine. On the back was an 800 number you could call to place bets on football, hockey, or baseball. (Called HOB bets in the industry.)

Anyway, because of the nature of the software and the people we dealt with, things were a bit hairy at times. The bookie we worked for paid us generously but, at the same time, expected perfection (i.e. software that just worked).

One evening, as usual, the three of us met to work toward the next release of the software, which was to include a feature for boxing bets. Mike Tyson was about to be released from jail, and the bookie was anticipating various future business in taking boxing bets.

So this one evening we were doing a software build to release as beta software. I did the software build and ran a battery of tests we normally run and all worked fine. To transfer the software to the offshore network we used a dial up connection.

I pushed the software out which my bookie contact would run and make sure all the logic was correct.

A few days go by and we get a phone call that the software is not functioning. Sometimes it crashes (which was rare) or sometimes just strange things would happen such as strange blinking characters on the screen.

I collected enough information to try and replicate the steps necessary. You can imagine my customer wasn't too happy. So the three of us basically stopped working on the code to track down the bug. We spent a week looking for the bug we were sure was in the code somewhere. Back then we didn't really use a code repository but we did have a diff of the beta release and previous release.

Poring over the code changes, we just could not find the problem. It even got to the point where I would recompile the beta version, run it, and could actually get the code to crash; the earlier release would not crash.

It was by chance that my fellow coder, sitting right next to me, started running similar tests on his machine while I was off making a food run. It was sort of normal for us to group up around my workstation when something needed a collective look.

Long story short it turned out that one of the L2 cache chips was causing my compiles to become corrupt in just the right way to cause the code to be not as we typed it in.

Anyway, thought I'd share. Although not a malicious act, it was one of the worst things to track down. Fast forward to my current life... I work in the embedded field and recently solved a bit-flip problem in one of the products my company produces that relies on NAND memory. That little incident from my past certainly has served me.... ;)

I'm curious, how did you manage to pin it down to an L2 cache in particular? (As opposed to general memory corruption, for example).

Ah, good question. It was very, very difficult to pin down. We knew our compiler was the same on both computers and the compiler flags were all the same. The two machines were purchased at the same time and we knew everything about them was the same.

We used something called a Pharlap DOS extender so our software could use memory beyond the 1MB boundary. It was in fiddling with that memory extender that I began to suspect a memory failure. Changing its parameters eventually got me a fairly repeatable way to get the issue to show up, and not only in my app; 3rd-party software began failing too. Also, while we mostly did our work from a DOS prompt and a DOS editor (called qc), we would sometimes run Windows (Windows 3.11), which had its own way of accessing expanded vs. extended memory through a DOS driver. So by swapping between the Pharlap driver and the Microsoft config.sys driver, I began to suspect memory failure; in other words, the machine became noticeably more unstable.

I don't recall the exact name, but I began running something like memcheck on the computers, basically a memory checker, and although it did not reveal the problem in its entirety, the memchecker would crash on my machine and not on any of the others. These were computers running just DOS with Novell Netware network drivers for DOS/Netware only.

I reasoned that memory was failing badly enough that the memory walker/checker itself was getting corrupted. When I swapped main memory with another computer and the problem didn't follow the memory, my only other option was to pull out the L2 cache chips. When I put them in the computer next to mine and finally saw the problem show up there, I knew.

My friend owned the computer company that sold us the computers and he later was able to test and validate my findings.

That kind of thing is often exploited in a security context.

See: http://dinaburg.org/bitsquatting.html

That's an ugly one. And one that could easily happen today.

Psychological experiment was my first guess as well. (In fact, here in Germany, there is a TV show which does a similar kind of thing for the lulz, though not yet with developers.)

The year this happened probably disproves this theory as it sounds extremely intricate to prepare. Nowadays though I guess you could pull it off quite easily by making the "ex grad's machine" a VM...

Having been playing Fallout 4, I couldn't help but think this was some wacked-out psychological experiment.

Aside from the issues where all your assumptions about your toolchains, OS, and computational reality are melting around you, this seems closely related to a lack of control and information. You have to keep calling people from vendors and other departments because your talent, knowledge, and skills hit a brick wall in the face of bureaucracy / compartmentalization. You don't know what to trust anymore, you have to be resourceful with time, and you have limited access to the information / control needed.

This is how I explain why my technical job is difficult in a typical enterprise organization and why you need to be far better than average to produce merely mediocre results.

Here's a coder's worst nightmare: "I managed to get reflection working* in C++, by registering class functions in static variables. This lets you configure routes like in Ruby on Rails. Since C++ is so much faster, can you port the rest of Rails over as well? You'll just be doing this for the next 4-5 months."

* https://news.ycombinator.com/item?id=10607029

I worked on software that encrypted entries with a custom C++ plugin to SQL Server. Nobody knew how it worked, it was implemented long ago and considered magic. Some entries on some installations sometimes would become totally garbled, which was a seriously dangerous issue.

This was a pure Java shop and I was almost the only engineer who even knew C++ since the reorganization that happened before I joined.

Turns out the affected installations all used custom compiled DLLs with hard coded specific crypto keys made up by someone long ago. We didn't really know how to deploy new versions of this.

But I found some version of the code at least, on some old file server, so I could sort of debug it. It took me a while to figure out but after working on it on and off for a few weeks I found a crucial race condition in the code path that looked up the crypto key.

Turns out if you were unlucky after a SQL Server reboot, a few entries could get encrypted with a "default" key. So I was able to decrypt the garbled entries on affected customers at least.

We decided to change the whole method of encryption since it was obviously hacked together by some cowboy coder a decade ago.

But in the meantime, the easiest way to fix the issue for the affected customers was a method I taught to one of the tech support guys.

I told him to RDP into the affected server, open the crypto DLL in Notepad++ and do a string substitution on a particular ASCII sequence that would change a certain character into a tilde. And then reload the DLL with a SQL Server command.

Because through disassembling the binary I found that this would sufficiently increase a certain value and fix the bogus comparison that caused the race condition.

The nightmare here was mostly because of custom DLLs running semi-unknown code on remote servers.

As a contractor, the fear is more on the client side than with the code. Coding problems can be resolved or worked around eventually, but the client can be a wildcard. In developer meet-ups I usually hear about coding issues like they are a puzzle, whose solution is being shared with the group. But the scary stories are usually centered around clients, since problems can arise there even if you provide perfect code.

When I read the topic my first thought was "customers!"

Make a technology, platform, architecture or framework choice. After a significant amount of effort realize you made the wrong choice.

Worse when you make the right choice but everyone else flocks to the crappy choice.

Management that doesn't care about regressions then later also sticks you with placating the customers who get screwed by those regressions.

Management that does care about regressions and you both still get stuck with placating the customer, because the customer always wanted features instead of dealing with debt.

Here's another story: I was optimizing the performance of a small, in-house x86_64 OS on a specific benchmark. I spent many months finding and removing/improving bottlenecks.

Once the performance was similar to competing OSes, the measurements from my performance experiments were varying by 30%! This was surprising because the benchmark was simple and the performance of competing OSes was completely stable.

Well, after three or four days of frustration, I discovered that the problem was our test hardware: for whatever reason, any memory loads executed in the first 10 seconds after BIOS POST were about 30% slower than loads executed after that. Initially I was shocked, but it may make sense: it seems plausible that hardware manufacturers run initialization code in the background to get quicker boot times.

Anyway, after we figured that out we simply waited 10 seconds after POST before we ran any experiments. I'm curious how common this slow-for-the-first-10-seconds thing is with other x86_64 machines.
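A cheap way to catch this kind of instability is to flag benchmark runs whose timing deviates far from the steady-state median. This is a hypothetical harness with a stand-in workload, not the one from the story:

```python
import statistics
import time

def bench() -> float:
    """Time one run of a stand-in workload (replace with the real benchmark)."""
    t0 = time.perf_counter()
    sum(range(100_000))
    return time.perf_counter() - t0

# Collect repeated samples and flag anything more than 30% off the
# median -- the size of the swing described above.
samples = [bench() for _ in range(20)]
median = statistics.median(samples)
outliers = [s for s in samples if abs(s - median) / median > 0.30]
print(f"{len(outliers)} of {len(samples)} runs deviate more than 30% from the median")
```

Had something like this run right after boot, the bimodal timings would have shown up on day one instead of day four.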

Edited for clarity.

Apart from the obvious concurrency based or non-easily reproducible issues, some of my personal nightmare scenarios:

1. Uncommented code, but especially uncommented regular expressions
2. Unnecessary abstraction that runs too deep and so is hard to comprehend
3. Take any tractable problem and add timezones to it. Now you have an intractable problem!
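Point 3 is easy to demonstrate. A minimal Python sketch (using the stdlib `zoneinfo` module) of the classic DST trap: adding "one day" moves the wall clock 24 hours, but only 23 real hours elapse:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # stdlib since Python 3.9

tz = ZoneInfo("America/New_York")
before = datetime(2014, 3, 8, 12, 0, tzinfo=tz)  # noon, the day before DST starts
after = before + timedelta(days=1)               # wall-clock arithmetic: "same time tomorrow"

# The UTC offset silently changed underneath us (EST -> EDT)...
assert before.utcoffset() != after.utcoffset()

# ...so "one day later" is only 23 real hours away:
elapsed = after.astimezone(timezone.utc) - before.astimezone(timezone.utc)
print(elapsed)  # 23:00:00
```

Whether that 23-hour answer or the 24-hour one is "correct" depends entirely on what the user meant, which is exactly why timezones make tractable problems intractable.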

My team and I are in the middle of a bad one right now. Our site is allocating incredible amounts of memory per page request, and it took us forever to figure out we didn't have a memory leak, at least in the traditional sense. We're running .NET, and its GC has three generations: 0, 1, and 2. Generations 0 and 1 hold short-lived objects, but once an object has been around long enough the GC assumes it will probably be around forever, and the gen-2 collector gets almost no time to run. We proved this by forcing gen-2 collections to run longer, which brought memory under control. So we don't have a "leak" per se, but we still can't figure out why these ephemeral objects stick around long enough to get promoted.
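Python's collector is generational too (generations 0, 1, and 2, loosely analogous to .NET's), so the promotion behaviour described above can be sketched in a few lines. This is an illustrative analogue, not .NET code:

```python
import gc

class Payload:
    """Stand-in for the 'ephemeral' objects that keep getting promoted."""
    pass

# A live reference keeps these alive across collections.
survivors = [Payload() for _ in range(1000)]

# Each collection of a young generation promotes its survivors to the
# next one, so objects that merely *look* long-lived drift upward...
gc.collect(0)
gc.collect(1)

# ...into generation 2, which is only examined by a full collection.
# Forcing one is the equivalent of the "force GC-2" experiment above:
unreachable = gc.collect(2)
print(f"full collection found {unreachable} unreachable objects")
```

The diagnostic lesson carries over directly: if memory only comes down when you force the oldest generation to be collected, the objects are being kept reachable just long enough to be promoted, and the hunt is for whatever holds those references.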

Increase the size of your Eden space. You probably have something allocating very large amounts of temp objects, which is exhausting Eden. You should optimize that, whatever it is, but bumping up Eden will give you some breathing room.

Would it be too insane of an idea to go do a local full rebuild of .NET, now that it's open source? If Microsoft can get "[Build|Passing]", surely you can too... :P

Build .NET, prove the issue still exists with the local build, then add new instrumentation to the GC routines inside .NET's runtime.

Instrumentation never killed anybody... and if this issue persists too much longer...

I can't deny that it's a crazy idea though.

Leading a project where, just to keep from falling behind on maintenance and user support, you need 2 people, and 2 people is all you're gonna get for the foreseeable future, yet management demands forward progress out of you. Fucking demoralizing and degrading.

Nightmare? This is the status quo in the majority of small-but-not-tiny organizations.

Doesn't make it any less of a nightmare. I wonder what percentage of effort is wasted in the industry this way.

this seems like one of those stories that got slightly more intricate every time it was told. and it's been told a lot.

Internet Explorer. A custom CMS. And sometimes some of the POSTed data goes missing... on a single POS. And because the data at the end of the form is optional, the system can't even tell that something went wrong; it just silently discards it.

Never figured out what problem that particular IE had, but the workaround was simple: I added an extra hidden input field just to check whether everything was sent. The theory was that at least the operator could repost the form if it went wrong; however, it never went wrong again. Thank you, IE, for your obscure bugs and nonstandard behaviour. RIP. </rant>
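The sentinel trick generalizes: put a hidden field at the very end of the form and treat its absence as a truncated submission. A minimal server-side sketch (the field name and function are made up for illustration):

```python
# Name of the hidden <input> placed at the very end of the form.
SENTINEL = "form_complete"

def post_fully_received(post_data: dict) -> bool:
    """If the body was truncated, the trailing hidden field never arrives."""
    return SENTINEL in post_data

# A complete submission includes the sentinel; a truncated one does not.
print(post_fully_received({"name": "x", "notes": "", SENTINEL: "1"}))  # True
print(post_fully_received({"name": "x"}))                              # False
```

Because the sentinel sits after every optional field, its absence distinguishes "the user left the optional fields blank" from "the browser never sent the end of the form".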

My "favorite" bug was debugging a crash in Macromedia's Flash Player for Windows Pocket PC. It was an ActiveX control running in the Pocket IE browser. The crash was intermittent and only happened after running a 15-minute automated test suite on the Pocket PC, crashing deep in Pocket IE code. After staring at code print outs for a couple weeks, I found the bug: a BSTR returned from a COM interface was freed using CoTaskMemFree() instead of SysFreeString(). ARGH! A one-line fix.

Stories like these make one wonder if "mere mortals" should just turn things off, feed them into an industrial chipper, and go back to pen and paper.

You know, I know everything posted on the internet is fact /s... but I really hope this story is true, because it's just crazy.

I mean, who wants to think that the main thing that lets you create code is the thing that's infected? And it seems like there weren't any other language options (which hopefully weren't compromised as well), so there would be no real way around it.

I mean, who wants to think that the main thing that lets you create code is the thing that's infected?

I'd wager that people who work with esoteric embedded toolchains develop an instinctive distrust in that layer.

That's a pretty messed-up story, though I'm not sure it'd be a coder's worst nightmare. I'd think that'd be if some major project you worked on got hacked... while you were enjoying the weekend off and had no access to your work computer.

Or just if your project developed a show-stopping bug at that point in general, since you'd get the feeling of "oh god, we've been owned for two days" or "we've been serving malware for two days".

Or maybe if you'd found a breaking bug that needed fixing, yet no answers were available online, no one in the room could help, and the site was due to go live that night. All that time worrying, refreshing your Stack Overflow question, and trying in vain to find a solution would be torture.

The Knight Capital bug is probably most people's worst nightmare: http://pythonsweetness.tumblr.com/post/64740079543/how-to-lo...

That's not a nightmare. Somebody else caused it, and you aren't to blame.

It's not the code that causes the nightmare, it's the result of the code. Say, a live production system, where something went wrong due to your coding, causing a great irreversible loss. You realize it the moment it happened, and you can't do anything about it.

Closest I got: data inconsistencies in a 5+ year-old live core production database of a large company, caused by a coding error. You don't know where it's wrong, you don't have the means to fix it (it was wrong from the start), but you do know how much was lost because of it. The only "good" thing was that it wasn't my fault.

"You don't have to pay me anymore, Dr. Phelps, I just want lab time." This is nerd war.


This also came to mind while reading:


Once you start using version control systems, this shouldn't be a problem.

A real nightmare for me was when I inherited a bunch of code full of uncommented regular expressions.
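One cheap defence against that particular nightmare: most regex engines have a verbose/extended mode that allows whitespace and comments inside the pattern. A small Python illustration (the pattern itself is just an example):

```python
import re

# The inherited style: one opaque line.
opaque = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")

# The same pattern with re.VERBOSE: whitespace inside the pattern is
# ignored and comments are allowed, so the intent survives a handover.
dotted_quad = re.compile(r"""
    ^
    \d{1,3}               # first octet (no 0-255 range check)
    (?: \. \d{1,3} ){3}   # three more dot-separated octets
    $
""", re.VERBOSE)

print(bool(dotted_quad.match("192.168.0.1")))  # True
print(bool(dotted_quad.match("not.an.ip")))    # False
```

The two patterns match exactly the same strings; only one of them is still readable five years later.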

How does a version control system address a malicious compiler?

Unless you're suggesting the entire system be under VCS?

About 2 paragraphs in, I was thinking, "Tell me this is not a story about someone who spent weeks figuring out that his compiler had been compromised."

Ah, well. The legends of self-modifying compilers are old at this point: http://www.ece.cmu.edu/~ganger/712.fall02/papers/p761-thomps...

Yes, why did s/he not isolate and compile on another machine in step 1?

These days you can just run strace and see all the OS calls.

Unicode and timezones.

The OP is an interesting story. See Ken Thompson's Turing Award lecture[1] from about that time frame; it is strangely similar.

[1] Reflections on Trusting Trust, Ken Thompson, Aug 1984. https://www.ece.cmu.edu/~ganger/712.fall02/papers/p761-thomp...

Reverse scheduling. That's the worst one by far.

Yet another example of why all programmers should be familiar with "Reflections on Trusting Trust." (http://www.ece.cmu.edu/~ganger/712.fall02/papers/p761-thomps...). Clearly the malware author was.

This is pretty unbelievable.

Yeah...I would have compiled the code on a different machine then wiped the original machine if it had clearly been hacked.

This was a university network in the 80s, most likely, judging by the machine model. Assuming the story is true, and not, as mentioned elsewhere, a fictionalization of Reflections on Trusting Trust, the person in question most likely didn't have a standalone system he could wipe, and didn't have copies of the OS media to reinstall from. Note the detail of an AT&T tech coming in to fix the compiler install; that's quite believable for a university installation like the one in the story.

When I went to university in '94, if we'd had to deal with the same thing, I'd have been in largely the same situation: Sun and SGI workstations that largely booted via NFS (some of the SGIs were exceptions, I think, but most parts of the system were mounted via NFS there too), where anyone trying to re-install the compiler etc. without the computing department's approval would be in big trouble (and would have had to hack the system, though the protections were relatively weak).

And while we certainly could install compilers etc. locally in our accounts, getting hold of a cleanly compiled compiler for some of the systems we had would not have been all that trivial, even though the systems we had were all more mainstream than the 3B2 mentioned in the article.

I wouldn't be surprised about an outside contractor having a hard time getting the computing department to believe that this problem was a hacked system. I'd expect the reaction would be to assume that he was just trying to make excuses for why it was taking him so long to find the actual problem.

I kept wondering that as well. I'm wondering if this was a different time, or if it's just a fun story

3B2, so probably like early-mid-80's. Likely didn't have many options.

I've heard another anecdote from university where somebody changed some late digit of PI in the compiler. Very hard to detect...

Being so sleep-deprived on a new gig that you mistake myvar: for :myvar.

Coder's worst nightmare? Being the last line before something gets pushed into production, and the fear that it will fail IN production... have I missed something?

That's so messed up. I'm so glad I started my development career in an age where one could spin up a VM in minutes to isolate an issue like this.

Forgive me for saying so; I'm not a coder, I'm a sysadmin.

I'm scared of having developers with a little ops knowledge take over my job.


You probably should be afraid. Those developers are going to replace your tools with better tools that they know, and you will no longer have years of arcane experience to fall back on.

Developers spend a lot more time thinking about abstractions, where to place things, and creating and controlling n things. If you cannot do these things well, you need to learn them.

The best way to do that will be to ride the bandwagon, work together with devs and learn from each other.

Time zones

Software that crashes while presenting it to the CEO or on investors day.

How did I just know what video this was before I clicked it..

Bottom line: insist on an hourly rate.

I'm in this camp, but I keep hearing stories about the pot o' gold. "Charge what it's worth to the client." This happens?!

being held responsible (on-call) for how it works in production

COM. 'nuff said.


Users with ideas

For the love of Zeus... it's called git, use it!

The 3B2 computer mentioned in the story was around in the mid-1980s. Git's initial release was in 2005.

Not to mention the malicious code was in the compiler, not the source code.

After you invent time travel.

At the time of this story, based on the details (3B2 computers and Tymnet), RCS was likely still current and CVS likely didn't exist. There were other alternatives like SCCS, but speaking as someone who wrote code back then: it wasn't until the late 90s that it became weird not to see source control used, even in a corporate setting. The last time I worked in a setting where source control was not universally applied was as late as '99.

As a point of reference, the Joel Test was written in 2000 and even then "Do you use source control" was something you had to actually ask.

You'd be surprised. One of the questions I systematically ask when interviewing developers is whether they're experienced using version control systems. Not all are. The same goes for automated testing, following coding standards, or (of all things) using a bug tracker.
