We called it RAID because it kills bugs dead (microsoft.com)
399 points by soheilpro on March 18, 2020 | 122 comments



I left MSFT and went to Viair (with folks from the MSN Mobile and Outlook Express team), and we didn't have a bug tracking system... so we had someone send us a few screenshots of RAID and built a VisualJ clone one weekend. Fun side project, didn't really want to be in the business of bug tracking, was slightly before the web app era... We called our bug tracking system "BlackFlag" because it was a competing brand of bug spray. (And obviously we used the logo from the punk band...)


I feel like I'm watching an episode of Halt and Catch Fire. Thanks for sharing!


Such a great show. I wish more people knew about it.


Never heard of it, but as a tech nerd from the 80's it sounds awesome. Will definitely check it out.


1st season is very good. Rest are crap.


A bit unfair, first season is amazing, the rest are merely very good - it’s a brilliant show overall though.


Really depends on what you want in a show. If you came in expecting a show about people designing computers and solving hard and interesting technical problems along the way, you will become more and more disappointed as the show goes on and focuses more on the characters personal problems and whether or not they get married/divorced/sleep with each other.


>focuses more on the characters personal problems and whether or not they get married/divorced/sleep with each other.

This is an unfair summary. There might be a divorce (does no one in the tech world get divorced?), and a relationship (that starts in Season 1, Episode 1 and is a driver of the plot), but it is nowhere near about "who sleeps with who". The show has good character development; would you rather the people not do anything in life, over the 15 or so years the show covers?

Halt and Catch Fire is great television that loosely follows the technology arc/personal computing age from the early 1980s through the 2000s. It has great acting and characters, including two strong female leads. If you go in thinking it's going to be about people sitting around "solving hard problems on computers", yes, you will be disappointed. There are coding screencasts for that.

It's a shame to discourage someone from watching wonderful television about a subject we're all here for.


Yes, it's very similar to Mad Men in that respect.


There is no comparison to Mad Men. MM was good all the way to the final season, with both professional view of advertising world and personal growth of characters. Was perfect balance. HaCF only had that in 1st season, in rest it lost its balance becoming a soap opera, which I can't stand.


Fun fact, one of Sweden’s most influential music- and culture journalists was being interviewed on national television about a weird documentary wearing a pink “Pink Flag” t-shirt with a modified Black Flag logo. Beyond meta.


Is that a clever way of referring to Black Flag at the same time as Wire's Pink Flag?


That’s how I interpret it. I actually asked him on Twitter an hour ago - I’ll keep you posted :)


The t-shirt was all pink of course.


Lokko or Strage?


Strage!


Everything about this story is awesome, especially the fact that you used the same logo from the punk band.


The bug puns were everywhere!

When I started working at Industrial Light and Magic in 1998, there was no bug tracker in regular use. I ended up building a bug tracker in Python called "Roundup" (inspired by a program by Alan Trombla, who came up with the name). To create the thinnest veneer of plausible deniability in case the bug spray lawyers came after us, the icon was a little spinning GIF animation of a cowboy on a horse, lasso curled in the air.

I only stayed at ILM for two years, but Roundup remained in use as the ILM software department's main issue tracking system long after I left, at least 6 years later. It was one of the early issue tracking systems that made it easy to subscribe to any issue, so you'd get e-mail notifications whenever the issue was updated.

The issue identifiers were numbers prefixed with a code to indicate the affected software project, like "REN003" for a bug in the rendering system. Three digits, of course, because I could hardly imagine that a single project could ever have more than a thousand bugs!

I remember that along with statuses like "in progress" and "done", we had a status I've never seen anywhere else: "cbb". As in "done, but could be better." What optimistic software engineers we were back then!


CBB reminds me of the pieces of hardware I've labeled as "maybe bad".

Only to run across 2 years later, and find myself wondering "who was this 'future me' that I envisioned might be interested in an ethernet card that exhibits flaky behavior under high load when bonded?"


Been there and learned that lesson! Let's waste future me's time by having them figure out in what way the hardware might be bad.


Well, if it just corrupts packets and doesn't (make the (driver|kernel) crash|bring the link down completely|confuse PCIe so badly the PCH hard-resets) you could use it to build a resilient UDP (or even TCP) packet retransmission algorithm.

Not kidding, the results would probably be drop-in usable for cellular/satellite/microwave/etc links, and spotty WAN in general.


That sounds like an interesting historical error, because Roundup is a herbicide, not insecticide.


Don't say it too loud, but I hear it also kills humans.


I also sometimes work in film production. We use the status 'good enough'


It's also called "can't be bothered"


I think I used that. Was it open source? There was a short time when there were almost no open source bug trackers


http://www.roundup-tracker.org/

I'm sure bugzilla is older though.


Fun fact: Apple's internal bug tracking tool, Radar, traces its origins to a pen-and-paper tracking system from the very early days.

In a similar fashion to RAID from the article, its use expanded way beyond the original use case and it is now used by nearly every team at Apple, from hardware, to test, to marketing, retail, and everything in between.

It's often the butt of a lot of jokes as most people hate its dated and confusing UI :).

edit: Though to be fair, the team maintaining it has made massive strides recently to completely revamp the UX which is no small feat given such a wide range of business critical use cases.


As far as I can tell, most teams think Radar is the best bug tracking tool ever. Externally, it's extremely frustrating to interact with, even more so through Feedback Assistant.


Actually you bring up a good point - the reactions are perhaps more bimodal than anything. If you're on one of the teams it was designed for, it is indeed near perfect. If not, it's far from it.


The UI was old and crufty and it was a pain in many ways to work with. But I really miss it because it’s still better than any other issue tracking system I’ve ever used.


Can you elaborate on what made/makes it better?


Another request, what made it great?


It was torture. Bimodal is right.


> As far as I can tell, most teams think Radar is the best bug tracking tool ever.

To judge from the outside, one of its features is that it makes it super easy to ignore years-old and highly multiply submitted bugs.


I've heard using it externally described as "throwing a Radar over the wall"


I see you've stumbled across P3 - Important.


They've probably yet to use Bloomberg's DRQS. It's a dream to use. If I understand correctly, Bloomberg had been migrating to Jira. What a shame.


The problem isn’t the bug tracking system, it’s that Apple rarely responds in a timely manner and when it does, it’s often not that useful.

The frustration stems from the human side, not the bits.


Yeah, the new feedback assistant UI was pretty buggy and unpleasant to use for a few months, but it's now perfectly fine and I have no complaints with the tool itself. The problem is that reporting bugs to Apple is a giant waste of time, not that the tool for reporting bugs to them is bad.


The majority of the bugs filed are machine-filed or filed internally. It's also used a lot for project-management type stuff. I don't think radar is really very intended for external stuff, though maybe it should be.


It even has a cute avatar, Annika the Anteater, that a bunch of people have plushes of


Tried to find a pic... Does Apple secrecy really extend that far?


you mean the first image to come up in a google image search for [Annika the Anteater apple radar]?

https://twitter.com/macshome/status/1028996446903783426


Most Apple employees don’t have extensive public social media and those that do are generally pretty careful about posting photos of anything inside buildings. There isn’t specifically any secrecy around radar the plush toy, but most tend to be a lot more private social media-wise than most people.



That link breaks the back button, which looks like the URL shortener's fault :/


What do you mean? You want to leave a google property?

I can't believe getting out of an amp page takes a copy paste on iOS.


I believe Annika is canonically an aardvark.


And shirts!


>>> they might have been too scared to write it. When looking back on the origin of RAID, one of the original developers confessed, “It really wasn’t made to last that long. Sorry!”

This speaks to me - too many times we think we need clever architecture and well thought out methodologies and ... the most useful tools are a mix of inspiration and timing and luck.

Had the developers stopped to plan they would have built something with over 16 bits of primary key - and also made a hundred other "improvements" that would have destroyed all the value MS got out of it.

There is a point in this - but I am not sure I know what it is.


> Had the developers stopped to plan they would have built something with over 16 bits of primary key

While true, this was in the 16-bit, Windows 1.x era. Also the ~640 KB of RAM era (disks weren't that big either; DOS v3.x around that time only supported 32 MB partitions). Not sure which version of DOS Windows 1.0 ran on; I didn't use Windows until v3.11 for Workgroups. A 32-bit primary key would likely have been seen as wasteful (not that I'm excusing it) and impractical, because you could likely never practically store more than 64k records on a PC of the day (and even if you could, you wouldn't have any disk left to do anything else).


He didn’t say which platform it ran on. Was Microsoft still using Xenix internally at the time?


Yes, for a time Xenix was used for some proportion of internal development, IIRC for DOS.

I think I read that on here somewhere. I unfortunately don't have a citation. Need to get a time machine to take me to whatever point in the future I sort out my bookmarks ._.


On the other hand, temporary fixes are often the most permanent kind.


Google's counterpart is called Buganizer. I've seen bugs in it dating back a decade or more. I don't know what, if anything, was in place before Buganizer. They might be exposing some version of it externally as well now: https://issuetracker.google.com/, although the UI bears little resemblance to the internal version I used years ago, so it's probably a separate product.

What Raymond's article does not mention (or at least I don't think he does, I have only skimmed the article) is that at MS each team had _its own_ bug database. As a result, even filing a bug in another product was an ordeal, let alone fixing bugs across products. What this friction means in practice is nobody would bother unless they really couldn't live without a fix. I think you can easily see the results of that in Microsoft products today.

At Google, _everyone_ uses the same bug DB. If a BigTable bug affects something in Ads, you can create a bug for BigTable and reference it in a bug under Ads, which is, IMO, the way it should be. Moreover, you don't need to get bug database permissions for BigTable, because you already have them.


> What Raymond's article does not mention (or at least I don't think he does, I have only skimmed the article) is that at MS each team had _its own_ bug database. As a result, even filing a bug in another product was an ordeal, let alone fixing bugs across products. What this friction means in practice is nobody would bother unless they really couldn't live without a fix. I think you can easily see the results of that in Microsoft products today.

Filing a bug in another Microsoft product has been technically easy for a long time. It's true that each product team has their own Product Studio database or Azure DevOps organization, but Product Studio gives you a listing of all PS databases company-wide, and I think you can get the same thing with AzDO. And in both, you can open a bug in any work item tracking database you have write access to.

To the extent that you cannot/could not touch bugs or sources for other Microsoft products than the one you work/worked on, that was more a feature of Microsoft's confidentiality policies and corporate culture than of our tools. That is, people in the Windows organizations had access to Office's Product Studio DB or master source repos only if they were specifically delivering features for Office, and vice versa.

That attitude is mostly gone now. Since 2013, everyone working on product R&D has had access to most other products' resources, and (IMO) almost no one tries to jealously guard their code against "outsiders". There are still different bug databases for each Microsoft product, but you have permissions to read all of them by default.

Exceptions to this general openness and transparency are largely limited to consumer hardware products (Surface, Xbox) which remain very tightly controlled against leaks.

(Above is based on my personal experience working for Microsoft in the start and end of the last decade, with a gap in the middle.)


I remember when I first started at Microsoft, I found some bug in some product and thought "I should file a bug with them"

Turns out there was no easy way to do that. I ended up emailing some person from the team I found in the GAL (I think that's what they called it?) and they weren't helpful or appreciative.

Never did that again.


I don't know how it is now, but you've just described every single well-meaning, starry-eyed fresh Microsoft employee in the 00s and likely before. The fact that management did not see how fucked up this was is a condemnation of the Microsoft company culture of that period. Again, don't know how it is now. I turned in my blue badge well over a decade ago.


I think the Material UI is new, it used to look a lot different when I used it externally (back in 2017).



Buganizer was preceded by BugsDB if I recall correctly


I'll take any opportunity to mention John Browne's "The Bug Count Also Rises"[1], which contains the line "The RAID is huge with bugs", referring to Microsoft's bug tracker. It was previously submitted to HN[2].

[1]: http://www.workpump.com/bugcount/bugcount.html

[2]: https://news.ycombinator.com/item?id=2740452


What a poetic (and a little depressing-in-a-good-way) comment. I remember various times of my career being a glorified bug tracker. It's a bad mode to get trapped in with no forward progress.

https://news.ycombinator.com/item?id=2740626


Oh lord, product studio. Back in 2013 the preferred method of filing workitems for a given "wave" was to load this spreadsheet, do weiiird formatting and indentation, type up all the workitems you foresee for the next sprint, and then press a magic "send to product studio" button. God help you if you messed up some of the nesting or wanted to reorder anything - honestly I would just wipe it clean and start from scratch.

Ironically my team just finished a product called RAID that was designed to kill bugs in product studio and move them to DevOps. There are still pockets of teams that use Product Studio within Microsoft - the sufficiency gap is real - but they're the overwhelming minority now.


I had absolutely zero problems with product studio when I used it from 2008-2011. It was a very 1990s style MDI application, somewhat representing the peak of win32 ux design, short and to the point, and as long as the servers were happy it "just worked".


Sounds like the product could have been named FLIT, given the finicky nature.

[1]https://en.wikipedia.org/wiki/Flit_gun


> Mind you, that doesn’t mean that things have been stable, because the name of the service changed from Visual Studio Online to Visual Studio Team Services, and then again to Azure DevOps Services.

That's the most Microsofty sentence ever. It summarizes their allegiance to ever changing marketing goals over simplicity and consistency. Even for something like a bug tracker.


Before it was called Visual Studio Online it was called Team Foundation Server.

To make things even more confusing, there is now another product called Visual Studio Online, which is a completely different thing (it's a cloud IDE) https://visualstudio.microsoft.com/services/visual-studio-on....


The one thing all long-lived systems have in common: nobody ever intended them to be long-lived systems.


The quick and dirty Perl deployment system I built at reddit lasted nine years. Five years after I left.

I definitely never intended it to last that long.


Even better is when a long-dead system is suddenly resurrected and repurposed for another project.

I had a friend at a company I left tell me they were un-mothballing a cancelled platform (killed at the time by an unfortunate acquisition) and deployment system I built several years ago because it was needed again. The whole thing was pretty janky because it was barely past prototype but fairly functional.

I wished him good luck and told him to find a new job, because there be dragons. He took the advice eventually, but I did hear they got it up and running again, with some pain. Apparently it is out there chugging along doing its thing once again.


My father wrote a dBase application forty years ago that, to his amazement, is still in use in some places today.

Inertia is a hell of a force.


It probably has more digits for the primary key than the MS bug tracker originally mentioned. Sigh.


so reddit used a perl script for deployment until 2017?


There's nothing more permanent than a temporary solution.


As an independent dev team working with Microsoft Games in 2002-2003, we used RAID to ship Rise of Nations. It was great! Rockin' fast compared with any web tool now and batch operations were near instant. Sadly it was utterly insecure for remote use and it went away after that game.


I loved Rise of Nations! Mind sharing which bits you worked on?


Single player campaigns


I started at Microsoft in 1991, and remember RAID fondly. My favorite part back then (probably running on Windows for Workgroups): when you minimized the program back to its desktop icon, the icon animated for a few seconds, showing a can of bug spray firing on a small flying bug, which dropped to the ground.


For green-field projects (new projects/apps), you always had bug #1: "[Insert project codename] doesn't work. Severity: 1. Priority: 1."

Always interesting to read some of the bug reports; some of which could meander off-topic in interesting ways.


Sometimes I think bug tracking doesn't get enough attention. It's gotten a lot better these days but compared to source control, bug tracking seems to remain the red-headed step-child.

Bugs can be just as important as source code - if a line of code is changed and all the comment says is "bug 1234", then hopefully bug 1234 has context about the scenario that was broken, a history of a conversation between QA and development, and signoff from QA with whatever was verified.

Sure, this might fall into "write a good commit message", but obviously that doesn't always happen, and having the historic knowledge of the bug tracking database at your fingertips is important.


That is because Industry Best Practice these days is to use JIRA, which lumps bugs in with stories and spikes and other agileshit. They must be planned, prioritized, and pointed alongside all the other work. Ceremonies must be done. The idea that they are defects, and that tracking them and their fixes specially is important, tends to be lost.


Fossil (https://www.fossil-scm.org ) has an interesting approach to this problem - integrate bug and work item tracking, project wikis, and source control, but make them all distributed, so that there is not a single master copy of the bug database or the docs wiki.

It attempts to solve both the cultural challenge (bugbase written by a different company/team from the source control, imposing 2 different styles on you) and the technical one (bugbase and source control don't talk to each other, and so bugs and checkins are not linked).


If only Fossil weren't so opinionated... I love the idea of bundling everything in the repo, and not averse to SQLite database as the storage format. But there's no way to edit history, even for commits that don't exist anywhere other than your own local repo - because the author considers rebase an anti-pattern.


Doesn’t Gerrit (https://www.gerritcodereview.com/) do the same? At least for changes. Not sure about reporting issues.


Personally, I like to mention the ticket number(s) and a description of what changed and why. I wish git would reject commit messages like "fixed".


Train a spam filter and use a commit hook, in Perl.
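No spam filter here, and Python rather than Perl, but a minimal sketch of the commit-hook idea might look like the following. The policy is an assumption for illustration: the ticket pattern ("bug 1234" or "#1234"), the list of lazy subjects, and the minimum word count are all hypothetical, not anything git or the commenters prescribe.

```python
import re

# Hypothetical policy: a message must reference a ticket such as
# "bug 1234" or "#1234" and must say more than a bare "fixed".
TICKET_RE = re.compile(r"(?:bug\s+|#)\d+", re.IGNORECASE)
LAZY_SUBJECTS = {"fixed", "fix", "wip", "update", "changes"}
MIN_WORDS = 4  # arbitrary threshold, chosen only for this sketch


def check_commit_message(message: str) -> list[str]:
    """Return a list of problems; an empty list means the message passes."""
    problems = []
    lines = message.strip().splitlines()
    subject = lines[0].strip() if lines else ""
    if subject.lower().rstrip(".!") in LAZY_SUBJECTS:
        problems.append("subject is too vague")
    if not TICKET_RE.search(message):
        problems.append("no ticket number referenced")
    if len(message.split()) < MIN_WORDS:
        problems.append("message too short to say what changed and why")
    return problems


def run_hook(message_path: str) -> int:
    """Exit status for a commit-msg hook: 0 accepts, non-zero rejects."""
    with open(message_path, encoding="utf-8") as handle:
        problems = check_commit_message(handle.read())
    for problem in problems:
        print(f"commit rejected: {problem}")
    return 1 if problems else 0
```

Git runs a `commit-msg` hook with the path of the message file as its first argument and aborts the commit if the hook exits non-zero, so a small wrapper that calls `sys.exit(run_hook(sys.argv[1]))` would be enough to wire this in.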


The late Joe Armstrong described the similarly minimal Erlang issue tracking system in this excellent post about MVPs

https://joearms.github.io/published/2014-06-25-minimal-viabl...


Borland's bug tracking tool for Delphi was also called RAID.

It used the disconnected data set and resync technology that was part of Delphi itself, which was handy for remote working before always-on internet.


I wonder if Borland's RAID was inspired by Microsoft's, or just the same pun.


> Office cloned the source and built their own custom version

I found this bit amusing considering that (IIRC) the Office team also had their own C compiler :-P


It's often the simple things, not meant to last long, but work well that live the longest.


Who wants to celebrate the 18th anniversary of the oldest open Jira ticket with me? https://jira.atlassian.com/browse/JRASERVER-161


And if you have disks in RAID a bug in it kills your data instead. ;-)


I was startled that they used the name RAID given the Berkeley RAID paper... but on checking the dates, they would have needed a time machine to realize that this term was about to become heavily overloaded.


I was wondering about that myself.


Isn't RAID a way of storing all of the zero bits on one drive, and all the one bits on the other drive?

That makes replacement of a bad drive much easier.


It's a weird, nice feeling watching your software in use well beyond its planned - or more frequently, assumed - obsolescence.


What amuses me is looking at some of the hard-coded "end of time" dates at previous employers. Instead of using null as an unbounded upper bound (because nulls are hard /s), there are hard-coded "end dates". One was 2032-12-31, another 2037-12-31, and usually 9999-12-31. Obviously, these all have problems, especially when used in finance. 2037 makes sense, because it is the last full year before 32-bit Linux time overflows in January 2038. 9999 also makes sense. At least none of us will be around, so it's some future generation(s) problem. The 2032 one made no sense to me, and I was never able to get a justification.
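For anyone who wants to check the 2037 reasoning: a signed 32-bit count of seconds since the Unix epoch runs out in January 2038, so 2037-12-31 is the last year-end that still fits. A small sketch, with helper names made up for illustration:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MIN_INT32 = -(2**31)
MAX_INT32 = 2**31 - 1


def last_representable_instant() -> datetime:
    """Latest UTC time a signed 32-bit seconds-since-epoch counter can hold."""
    return EPOCH + timedelta(seconds=MAX_INT32)


def fits_in_int32(moment: datetime) -> bool:
    """True if the timestamp survives storage in a signed 32-bit time value."""
    seconds = int((moment - EPOCH).total_seconds())
    return MIN_INT32 <= seconds <= MAX_INT32


if __name__ == "__main__":
    print(last_representable_instant().isoformat())
    for year in (2032, 2037, 2038, 9999):
        end_date = datetime(year, 12, 31, tzinfo=timezone.utc)
        print(year, fits_in_int32(end_date))
```

Under this check 2032-12-31 and 2037-12-31 fit, while 9999-12-31 does not, which is one more reason a sentinel "end date" is a fragile stand-in for null.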


Now that it's 2020, I've been looking for a new year to think of as "the future". 2040 seems too obvious and maybe too far away to really imagine. What, just add 20 years? Boring. I think I'll take 2032.


2049, for the year the blade runner sequel is set in.


The year 2525—if man is still alive.


I'm sure RAID was much more usable than some of the ITIL-inspired monstrosities some of us have to deal with.


> Unfortunately, the archive project renumbers the work items. Fortunately, the original work item is remembered in the title, so you can do a search for originalid:3141 to find the old work item known as number 3141.

I wonder what happened if a bug was carried over for 2 or more databases.


Presumably the archive projects are read-only and therefore once a bug goes into the archive project it never moves again.


I assume it'd be a chain e.g.

db 1 - issue 3141

db 2 - issue 12 (originalid:3141)

db 3 - issue 4 (originalid:12)

etc.
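If it does work as a chain, finding the first incarnation of a bug means following one `originalid` hop per archive. A rough sketch of that walk, assuming the chain guessed at above; the data layout and names are hypothetical and have nothing to do with how Azure DevOps actually stores work items:

```python
from typing import Optional

# Hypothetical archives, oldest first. Each maps an issue number to the
# number it had in the previous database (None for the first database).
ORDER = ["db1", "db2", "db3"]
ARCHIVES: dict[str, dict[int, Optional[int]]] = {
    "db1": {3141: None},
    "db2": {12: 3141},
    "db3": {4: 12},
}


def trace_to_origin(db: str, issue_id: int) -> list[tuple[str, int]]:
    """Follow originalid links back through older databases."""
    chain = []
    index = ORDER.index(db)
    current = issue_id
    while True:
        chain.append((ORDER[index], current))
        original = ARCHIVES[ORDER[index]][current]
        if original is None or index == 0:
            break
        # Step one database back and continue with the older number.
        index -= 1
        current = original
    return chain


if __name__ == "__main__":
    for db_name, number in trace_to_origin("db3", 4):
        print(db_name, number)
```

Each renumbering adds a hop, so a long-lived bug gets slower to trace the more often it is carried over.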


I remember using RAID when I worked at Microsoft from 1993-1996


Product Studio, RAID's replacement was great. I really didn't like the transition to TFS.


> The database was written back in the days of 16-bit computing, so naturally it had a limit of 32,767 bugs

I know it’s a nitpick; but no, that is not natural to me at all. You can’t have a negative bug, nor would you store it in a list with negative indices. So using signed makes no sense to me.

Sometimes I get really confused why signed shows up in places it doesn’t belong.
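To put numbers on the nitpick: the same 16 bits hold up to 32,767 as a signed value and up to 65,535 as an unsigned one. A quick sketch using Python's `struct` module to mimic fixed-width fields (the `fits` helper is just for illustration):

```python
import struct

SIGNED_16_MAX = 2**15 - 1    # 32767, the limit quoted from the article
UNSIGNED_16_MAX = 2**16 - 1  # 65535, what an unsigned field would allow


def fits(fmt: str, value: int) -> bool:
    """True if value can be packed into the given fixed-width struct format."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False


if __name__ == "__main__":
    # "<h" is a little-endian signed 16-bit integer, "<H" the unsigned one.
    print(fits("<h", SIGNED_16_MAX), fits("<h", SIGNED_16_MAX + 1))
    print(fits("<H", UNSIGNED_16_MAX), fits("<H", UNSIGNED_16_MAX + 1))
```

So going unsigned would only have doubled the headroom, not removed the ceiling.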


> Sometimes I get really confused why signed shows up in places it doesn’t belong.

Because in the absence of convenient sum types, you end up using a simple type with sentinel/special values so often that anything where the main value is logically non-negative (or strictly positive) but you don't have a clear reason to never need special values is done with a signed type where negative (and possibly zero) values are reserved or assigned to special uses.
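A small illustration of that trade-off, with made-up function names: the first lookup reserves -1 as a "not found" sentinel in an otherwise non-negative result, the second returns an optional value, which is the closest everyday Python gets to a sum type.

```python
from typing import Optional

NOT_FOUND = -1  # sentinel: a reserved negative value in a non-negative domain


def find_bug_sentinel(bug_ids: list[int], wanted: int) -> int:
    """Return the position of wanted, or the NOT_FOUND sentinel."""
    for position, bug_id in enumerate(bug_ids):
        if bug_id == wanted:
            return position
    return NOT_FOUND


def find_bug_optional(bug_ids: list[int], wanted: int) -> Optional[int]:
    """Return the position of wanted, or None when it is absent."""
    for position, bug_id in enumerate(bug_ids):
        if bug_id == wanted:
            return position
    return None


if __name__ == "__main__":
    bugs = [10, 20, 30]
    # Forgetting to check the sentinel fails silently: -1 is a valid index.
    print(bugs[find_bug_sentinel(bugs, 99)])
    # The optional version forces the caller to handle the missing case.
    print(find_bug_optional(bugs, 99))
```

The sentinel is cheap and needs no extra type, but nothing stops a caller from using it as if it were a real value.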


It is the same reason the floating point standard loves NaN, exceptional "invalid" values allow you to encode recognizable extra information for error codes or error handling. It is bug prone, but also quite lightweight and flexible.


I loved RAID. Lightning fast, so fast that speed was its own quality.


Of all my comments, this is the oddest thing to downvote! Is RAID bad now?


seek not to comprehend the downvoters, for it is written: it is simpler to predict each gust of wind in a storm, than to explain even a single downvote

or something


Thanks marvy, this made me smile.


:)


it killed them dead by means of tracking and keeping them in an inventory...


im a CMVC man myself


Is it just me, or does this seem terrible? Lack of a stable business key & limiting key size to 4.5 digits just seems terribly lacking in any competent database practice.

I was building ISAM database applications for small business in the early nineties, and we used 6 digit keys.. and an index to access records by.

I'm sure Raymond Chen is a nice guy, but this just makes me think of a retard stabbing his cheek when trying to eat with a fork. (Dirty Rotten Scoundrels movie.)


Yikes, please don't post like that last bit here. I get that you probably meant it lightheartedly, but it does not come across well.

https://news.ycombinator.com/newsguidelines.html


No you did not.



