
The Black Hole of Software Engineering Research - azhenley
http://blogs.uw.edu/ajko/2015/10/05/the-black-hole-of-software-engineering-research/
======
kragen
The main problem is that "software engineering" is a code name for "software
management".

If you major in "mechanical engineering", you will take courses in calculus,
linear algebra, probability, computer programming, physics (thermodynamics,
mechanics, electrodynamics, strength of materials, maybe aerodynamics and
viscous flow), control theory (what used to be called "cybernetics"),
manufacturing techniques, CAD, and maybe a management course or two.

The corresponding curriculum for software involves things like programming
languages, algorithms, data structures, networking, operating systems,
software architecture, and formal methods, and it is called "computer
science". What is currently called "software engineering" is the software
equivalent of "art appreciation". It's a desperate attempt by the cult of
managerialism that dominated the 20th century to figure out how to run
software projects without understanding software itself; it's an attempt to
substitute credentials and command structures for intellectual mastery of the
problems to be solved and understanding of the techniques for solving them.

The consequences, generally, are what you'd expect from a book publisher run
by illiterates or a record label run by Deaf people. Experian and CGI Federal
wasted hundreds of millions of dollars building a "healthcare.gov" site that didn't work;
SAIC wasted hundreds of millions of dollars on building a Virtual Case File
system that was ultimately scrapped. These are the companies that represent
the approach known as "software engineering", that get CMM certifications from
the SEI.

By contrast, the companies that make software you actually use — companies
like Canonical, Google, Facebook, and Apple — put a great deal of emphasis on
_programming, motherfucker_, and very little on things like SDLC,
requirements engineering, and "formal" specifications.

This is not to say that the issues that "software engineering" attempts to
tackle — like testing, requirements analysis, and software architecture — are
unimportant. They are very important. But the field that identifies itself as
"software engineering" is in fact failing to tackle those issues. It is not
where the insights are coming from. They're coming from _programmers_ and
_computer scientists_.

~~~
seiji
All good points. "software engineering" is how we take raw code and turn it
into reliable systems.

The worst code you'll ever see written is by computer science PhDs who just
have to _get something done_ for their research paper experiments. It'll be
6,000 lines of Matlab with no loops, no functions, and entire 30-line sections
copy/pasted whenever they need to repeat an action.

That's code without software engineering. That's "coding" and not
"developing."

In a way, software engineering is a moving goalpost in the same way as "if a
computer can do it now, it's not AI." Many of our standard practices these
days used to be "must be considered separately" software engineering practices
10 years ago (revision control, proper documentation, creating libraries,
etc).

Open Source projects run by communities with a focus on testing,
documentation, and maintainability are the best thing to happen to software
engineering. But there are still crazy people running around writing code
just for themselves and refusing to accept that software is more than _code
code code_ these days.

We pretty much need two divisions here: "software engineering (social)" and
"software engineering (technical)". Social software engineering is defined by
well maintained, long-lived, multi-contributor software projects. Technical
software engineering is the implementation of the testing systems and
contribution systems behind those long lived sustainable projects (e.g. are
you calculating the branch coverage of your unit tests? do you even have unit
tests? (no, testing your live servers isn't "unit" testing; those are
integration tests) are you testing all combinations of acceptable and invalid
inputs (null, proper value, invalid value, out of range value positive, out of
range value negative, ...)? are your systems even _testable_ and not bound to
side-effect laden global states? do you have proper mock objects to compensate
for your side-effect driven development? do you allow outside contributions so
you don't have to maintain all 60 of these detailed testing and failure
scenarios yourself?).
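
A rough sketch of a couple of those checks in Python (the function names and
the thermostat scenario are hypothetical, purely for illustration): testing
proper, invalid, and out-of-range inputs to a pure function, and using a mock
so side-effecting code stays testable instead of being bound to global state:

    import unittest
    from unittest.mock import Mock

    def clamp(value, low, high):
        """Pure function: trivially unit-testable, no global state involved."""
        if value is None:
            raise ValueError("value must not be None")
        return max(low, min(high, value))

    def control_heater(current_temp, target_temp, heater):
        """Logic under test: side effects go through an injected dependency."""
        if current_temp < target_temp:
            heater.turn_on()
        else:
            heater.turn_off()

    class ClampTests(unittest.TestCase):
        def test_acceptable_and_invalid_inputs(self):
            self.assertEqual(clamp(5, 0, 10), 5)     # proper value
            self.assertEqual(clamp(99, 0, 10), 10)   # out of range, positive
            self.assertEqual(clamp(-99, 0, 10), 0)   # out of range, negative
            with self.assertRaises(ValueError):      # invalid value
                clamp(None, 0, 10)

    class HeaterTests(unittest.TestCase):
        def test_heater_turned_on_when_cold(self):
            heater = Mock()  # stands in for the side-effecting hardware
            control_heater(current_temp=15, target_temp=20, heater=heater)
            heater.turn_on.assert_called_once()

    if __name__ == "__main__":
        unittest.main()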

(there's also the entire other branch of "how to be a better software writer
through requirements gathering and agile/lean/XP/monkeypants practices," but
that's out of scope here.)

~~~
kragen
Revision control, proper documentation, creating libraries, and use of
high-level languages were common practice at Bell Labs ("programming,
motherfucker") in
the 1970s, and alien to "software engineering"-driven groups like Computer
Sciences Corporation (CMM Level 5!) and Harlan Mills's group at IBM. What we
usually call "unit tests" these days, and certainly "mock objects", came from
Extreme Programming, which is a _reaction against_ the "software engineering"
managerialism cult.

I agree with you that those things are how you take raw code and turn it into
reliable systems, and that you don't get there by copying and pasting code and
writing 6000-line functions. But this article is not promoting those things.
It's promoting formalized process engineering.

(However, in my book, using very-high-level languages like Matlab (or, better,
Octave or NumPy), and high-level operations like matrix-multiply instead of
explicit loops, are best practices.)
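
For concreteness, a small NumPy sketch of what that looks like (the data and
shapes here are made up): the explicit triple loop and the single matrix
multiply compute the same result, but the high-level version is the practice
being recommended:

    import numpy as np

    readings = np.random.rand(1000, 50)  # e.g. 1000 trials x 50 sensors
    weights = np.random.rand(50, 3)      # 50 sensors -> 3 derived features

    # Explicit-loop version (the style being discouraged):
    features_loop = np.zeros((1000, 3))
    for i in range(1000):
        for j in range(3):
            for k in range(50):
                features_loop[i, j] += readings[i, k] * weights[k, j]

    # High-level version: one matrix multiply expresses the same computation.
    features = readings @ weights

    assert np.allclose(features, features_loop)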

I do agree that social practices around effectively collaborating with other
people are very important and require a lot of effort. You can have the
world's most effective hackers on your project, but if they're wasting their
time because you don't have a budget for a test server or because they're not
allowed to fix the bugs they find, your productivity is still going to be low.

~~~
nickpsecurity
I totally agree about the process junkies ignoring good stuff and pushing
their religion under the banner of software engineering. They're the posers
who weakened the term. The Bell Labs people were real software engineers,
both experimenting and building on proven methods. Also...

"What we usually call "unit tests" these days, and certainly "mock objects""

That's not true. Like other assurance activities, Extreme Programming just did
what was already done, gave it a new name, mixed it with bullshit (esp pair
programming), and got it some needed attention. The Orange Book testing
criteria, in 1983, required as much testing as resources allowed of
primitives, their compositions, all interfaces, and common failure conditions.
That's on top of specifications, covert channel analysis, pen testing, using
only proven components, build system automation, etc. that far exceed XP's
quality regimen.

Funny you mention Cleanroom because it was the first thing I thought of when
I heard about the "new" Agile approach. Iterative development, code reviews,
usage-based testing for acceptance, verifying each module... Cleanroom had all
that. At least they eventually figured it out, and it seems competitors in
Agile identified XP's problems as well. Although the concepts are old, I'm
glad they're mainstreaming because it has given us something important: solid,
well-maintained, constantly-updated TOOLS to support the efforts. :)

~~~
kragen
You're missing the point. XP testing isn't an assurance activity.

XP "unit testing" isn't what's called "unit testing" in the software
engineering literature, and it isn't designed to reduce bugs; it's an
executable functional spec, and the reason you write the tests first is so
that you don't end up designing some monstrosity of an interface whose
clumsiness becomes contagious to everything that calls it. "Mock objects" are
a particular way of writing those tests in which your testing stub objects
actively validate that they're being invoked the way you expect, rather than
just passively simulating the subsystem being stubbed out.
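
A minimal Python sketch of that distinction (the mailer example is invented,
purely illustrative): a passive stub merely absorbs the call, while a mock
object asserts that the collaborator was invoked the way the executable spec
expects:

    from unittest.mock import Mock

    def notify_owner(order, mailer):
        # Hypothetical code under test: must email the owner once per order.
        mailer.send(to=order["owner"],
                    subject="Order %s shipped" % order["id"])

    # Passive stub: absorbs the call; the test only checks nothing blows up.
    stub_mailer = Mock()
    notify_owner({"id": 7, "owner": "alice@example.com"}, stub_mailer)

    # Mock-object style: the test states how the collaborator must be used,
    # and the mock actively verifies that expectation.
    mock_mailer = Mock()
    notify_owner({"id": 7, "owner": "alice@example.com"}, mock_mailer)
    mock_mailer.send.assert_called_once_with(
        to="alice@example.com", subject="Order 7 shipped")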

Orange Book testing, by contrast, is aimed at reducing bugs.

XP-style test-first unit testing is not the best way to develop all software,
but it works pretty well for some things. Similarly for pair programming: it
can help a lot with intra-team communication, code reviewing, and not getting
stuck, but I'm pretty sure there are projects where working alone and doing
the code review later is a better practice.

Yes, Cleanroom has a lot in common with XP.

I think that a lot of the "competitors in Agile" actually suffer from the same
disease I'm blasting here: they try to solve technical problems with
management practices, and consequently you have a lot of Scrum teams with
really shitty code who can't make much progress because they're beginning
programmers, or experienced programmers who have fallen into bad practices.

~~~
nickpsecurity
Another one I agree with almost entirely. Hell yeah!

Yes, I've read about unit testing and XP's other goal for it. Most of what
the crowd pushes it for is the benefit of catching defects, usually around
requirements, dependencies, or breakage from later changes. Old projects used
testing for those reasons, too, although not as consistently.

I'd say the real difference is the greater focus on mocks and on
documentation, as you said. Other than that, the techniques are similar, just
more ad hoc than before, and getting more attention and use. Still a Good
Thing.

"I think that a lot of the "competitors in Agile" actually suffer from the
same disease I'm blasting here: they try to solve technical problems with
management practices, "

Totally agree. This is a bad sign any time we see it. The focus should always
be on getting the right people in there, making sure they have good
tools/methods, and letting them get shit done. That's development or
engineering. Documentation, etc really just becomes evidence of what was done
and an aid for further development/maintenance.

I'll add that I think there's a zealot kind of thing going on. They blasted
me years ago with all of this stuff as if it were obviously better, without
providing any empirical evidence that it was. Some evidence appeared around
testing, etc., and Spiral had already shown an incremental approach would
help. Much of it had nothing but some people's word to back it. Pair
programming even ran counter to years of evidence about the effects of
interruptions on flow.

So, I think it would help if they or the next group focused on controlled
studies pitting methods against each other on similar projects, with objective
and subjective accounts of the results. Then we'll have a better handle on
what works and where. Right now, I have to rely on what little empirical data
there is and mostly just take the word of the experienced people delivering
the best results in given domains. Gotta get programming a bit more scientific
in industry.

------
stillsut
I was always interested in Econ / Applied-Math research as an undergrad and
would build toy models in those areas. When I graduated, I went to work at a
financial company before coming back to study theory again 3 years later.

The immersion in industry was absolutely critical for me to see (1) what was
actually needed in the real world, and (2) what types of assumptions can be
made about the predictability of large economic systems.

If I had gone to grad school right away, I think I would have followed several
ill-advised models down way too deep. So, I'd say at least one year in a real
role in the industry you hope to contribute to is really essential for someone
planning to do applied/translational research.

------
rch
> working as a [professional] for the past three years

That doesn't seem like a great deal of experience to me.

Off the top of my head I can only think of one or two professors with
distinguished professional careers who are principally focused on engineering,
rather than computer science. Maybe there's a list somewhere?

------
uxcn
One of the first things I noticed out of school was that most projects prefer
not to use formal methodologies for building software. A lot of the projects
that do tend to focus more on applying the methodology rigorously than on the
actual goal of the software. I've seen problems with both.

I think what some people miss about having a formal methodology is that,
without one, a lot of time can end up being wasted addressing the same types
of issues (design problems, bugs, re-implementing things, repeated
requirements changes, etc.). The same can actually be true when applying
methodologies too rigorously, though.

The most effective software development I've seen tends to be where the
culture values both quality and time spent. If something can improve quality
and reduce the amount of time spent on boring tasks, it generally doesn't take
much convincing there.

I think in a lot of ways, a lack of focus on software engineering isn't
necessarily an academic problem. It can lead to some fairly bad results,
though, like the Knight Capital incident
([https://en.wikipedia.org/wiki/Knight_Capital_Group#2012_stoc...](https://en.wikipedia.org/wiki/Knight_Capital_Group#2012_stock_trading_disruption)),
so obviously there can be a very tangible impact.

------
jdp23
One thing I wish he'd stressed more: there's almost always a big gap between
the research that's published (and any software used to produce it) and what's
useful in a team's real-world environment. For software engineering tools, for
example, researchers tend not to put a lot of time into installability,
integration into teams' toolchains and processes, usability, or clarity of
error messages ... to say nothing of performance and scalability -- you don't
need any of those things to publish the paper.

Of course there are exceptions, and I don't mean this as a criticism of the
researchers. It's just that there's a lot of work involved in "tech transfer",
and it's often hard to justify the investment -- especially for a relatively
small team. Microsoft and Google and Facebook all have plenty of people doing
various forms of "applied research" to bridge that gap, but they've got the
resources to do it.

------
michaelfeathers
Sometimes I feel that software engineering is really three separate
disciplines: ergonomics, type theory, and sociology.

~~~
AnimalMuppet
And complexity management.

------
PaulHoule
The very fact that computer science is called a "science" is a reason why
software engineering is not taken seriously.

~~~
sliverstorm
To my knowledge, true computer science is essentially a very narrow area of
higher mathematics.

~~~
nextos
Not so narrow. I'd say it may contain all mathematics that is computable,
which is a rather big subset. Essentially all constructive mathematics.

That's a lot of stuff. For example, probabilistic programming is concerned
with all computable distributions. This is the most general class of
sampleable distributions.

~~~
TeMPOraL
Narrow in the sense that it's one specific angle from which to view math
problems. But the range of problems is indeed great; hell, parts of computer
science have, or may have, a direct relation to how we view reality itself.

~~~
sgeisenh
It really isn't narrow. Mathematics, philosophy and computer science are just
three different ways of viewing the same problems.

------
lordnacho
I'm not sure I agree with what he's saying. I've been looking over my
brother's shoulder during his CS degree at Columbia, and although there's a
lot of theory, there's also a huge amount of practice.

They've been introduced to a huge variety of tools: vim, Git, valgrind, tmux,
and so on. A variety of languages: HTML/CSS/JS, C, Java, OCaml. Platforms:
Linux, Windows, Android.

A lot of the stuff seems incidental. If you're going to mess with the Linux
kernel, you need to understand C, you need some Linux CLI commands, and you'll
end up learning a bunch of other things as well. You end up spending a lot of
time debugging, profiling, etc, basically what anyone does when they're
programming.

They also seem to learn at least the basics of how to write programs as part
of a team. Code merges, planning things together, bits of scrum.

~~~
kragen
I don't think the article is complaining that programmers don't know how to
program or use version control; it's complaining that they don't know
"software engineering", which is being used as a code word for "software
management by managerialism".

------
saintx
What we call "Computer Science" is the discrete analogue of differential
equations. It has absolutely nothing to do with software engineering. CS
programs don't teach software engineering, which can be thought of as "how to
design, build, grow, and maintain an effective software system in a team
environment." Since most Universities prioritize research faculty over
teaching faculty, software engineering skills are generally undeveloped in
fresh college graduates, and tend to be passed along by mentors and senior
engineers in software teams to junior engineers.

Shy of Universities hiring prominent open source software developers to serve
as part-time teaching faculty, I don't know how we could push the transfer of
these skills into the university setting and expose students to this knowledge
at an earlier age.

------
Laaw
Can someone suggest a path for keeping up with software engineering research
specifically? As a guy whose title is literally "software engineer", I'd like
to know what the PhDs in my field are up to.

~~~
azhenley
One suggestion would be to check what papers come out of the top-tier SE
conferences each year (e.g., ICSE and FSE). However, this is time-consuming
since the topics range widely and each paper takes some effort to get into.

~~~
gcb0
I easily read papers from journals in other fields. There is nothing for SE
that is open and broad.

------
nickpsecurity
Glad ajko is at least attempting something. This is one of the memes I've
been posting on for years. I think the problem is different, though. So much
wisdom and so many capabilities are in papers, and sometimes software, created
in academia. However, nobody knows they exist. Coursework isn't the reason.
The reason seems to be two-fold: (a) the main sources for all of it are the
paywalled ACM and IEEE libraries, which professionals never use; (b) the
non-paywalled links are scattered across the web on university sites instead
of one or more central, organized sources for such information. These problems
_must_ be solved somehow because the opportunity cost is gigantic.

I try to get this info to people by pulling batches of papers, skimming them,
finding the worthwhile stuff, and mentioning it in online forums or to
specific people in projects. I have over 10,000 papers in my collection from
ACM/IEEE that were organized at some point (sighs). Anyway, when a topic comes
up, I can just search them and (surprise!) many modern problems have already
been solved. People just aren't applying the solutions when they know about
them and more often just don't know about them.

So, the second one was my idea. A resource, crowd- or taxpayer-funded, that's
essentially an organized encyclopedia of papers and software developed in
academia pertinent to aspects of software development. It would start as a
manual effort that pulls in the links and papers that represent the best of
most topics. As it got buy-in, the professors might have their students
categorize and submit their own stuff. I thought an invite-only forum for
serendipitous discussion among people that know their shit (esp. researchers,
professors, industry vets) could also get ideas across. There might be a fee
for all of this, charged to users, universities, or someone else, that's way
cheaper than ACM/IEEE. Basically, enough to cover the servers, bandwidth, and
at least one employee.

Ran it by Epstein at NSF. He thought it was a nice idea but wisely warned it
might not take off at all without buy-in ahead of time. The buy-in would have
to come from a ton of places, too. So, bootstrapping that might be tricky.
Even worse, many people I talked to in academia indicated that the people and
organizations on that side don't care about practical things at all. The ones
that do are the exception, and the norm actually filters out that stuff in
favor of publishing/citations.

Well, we could hit it from the other side. The closest thing I've seen from
industry was Dr. Dobb's, with articles on Cleanroom, Design by Contract, etc.
Many professionals read that and could get exposed to the cutting edge via
practical treatments from a trusted source. Additionally, the likes of
Microsoft, IBM, Google, and Facebook do cutting-edge academic and practical
work simultaneously, publishing some of it and deploying the rest. This brings
attention to it. Some results at these companies got widely deployed and are
standard fare today.

So, I'd say the easiest route is a hybrid between companies and the nonprofit
concept. Anyone wanting to try it can look at the kind of stuff companies are
working on. Package up the cutting-edge stuff for those things to make it the
first deliverable of the site. Add in plenty of IT and INFOSEC stuff on top of
that which has immediate value for them. Get those companies as members to
sponsor the initial development. Simultaneously, try to get University groups
on the bandwagon who operate in key areas of IT or INFOSEC where the research
results need attention. For big ones, maybe let them pay to have a dedicated
person doing nothing but pulling in and curating their research as a brand
boost for them. The site/service should grow in both usage and material as
businesses pull on it for info while universities push theirs into it.

Outside an effort like this, I'm not sure how else we're going to bridge the
gap between academia and industry in a major way.

~~~
hga
_(a) the main sources for all of it are pay-walled ACM and IEEE that
professionals never use_

Balking at the $200/year ACM cost wouldn't signal to me that you're a
professional (in the First World; then again, they have Developing World
discounts). IEEE, though, is $40/month for 25 articles, with a max of 10
unused ones carrying over to the next month, or $17/month for a "Basic" level
of only 3. Plus membership; that's prohibitive.

~~~
lfowles
I was looking at renewing my IEEE membership yesterday to get article
access... Naively I assumed the THREE articles per month level was included in
regular membership dues. What a racket :/

~~~
nickpsecurity
I didn't even know about the limit until he said something. My university had
no limit on its proxy: I pulled over 10,000 papers from it, sometimes 50-100
a night. :O Wonder what they pay.

Anyway, might help to get affiliated with a University or enlist the help of a
student. ;)

------
TerryADavis
Every genius makes it more complicated.

It takes a super-genius to make it simpler.

~~~
nickpsecurity
What's complicated? There's all these problems in industry that allegedly
need solving, improving, etc. There's all kinds of academic work that solved a
lot of that, sometimes with industry case studies. One doesn't make it to the
other, and often vice versa. Why is that? How to solve it?

Sounds like an inherently complicated situation given all the parties,
differing incentives, and legalities involved. I made an attempt in another
post but not sure what will work.

------
ThomaszKrueger
Software Engineering is an oxymoron. If it weren't, professionals in the field
would be held accountable when projects go belly up, and they would be
required to be licensed to practice. Since there is no real understanding of
what that "engineering" is, there can't be a body of standards against which
to qualify and license. Software engineering is the only profession I know of
that requires no formal training, no license, and carries pretty much zero
liability.

~~~
tjr
There are areas of software development that do operate more like traditional
engineering, such as in avionics software development.

For example, software developed for commercial avionics systems in the United
States follows the guidelines in DO-178C:

[https://en.wikipedia.org/wiki/DO-178C](https://en.wikipedia.org/wiki/DO-178C)

Every stage of development, from writing requirements to coding to
verification to the selection and usage of tools (including compilers and
operating systems), is reviewed through a mixture of internal reviews and
external audits. There's an extensive paper trail, with each subsystem (i.e.,
software + hardware unit) being tested and undergoing a certification process
before being released to the public.

So while the _engineers_ on these projects aren't licensed, the development
processes and the resulting software are examined and signed off on.

Does that count as software engineering? I think it does. The next
interesting question, in my opinion, is, if that counts as software
engineering, then what aspect could we take away or relax, and still have
software engineering? Is there a graduated spectrum of software engineering?
Or is there a hard cut-off point, where we can say, "yes, this is engineering,
but that isn't"?

~~~
ThomaszKrueger
That seems to be the exception. Practically everywhere else you look, it is,
as someone else pointed out, "mf* programming".

~~~
tjr
It is indeed an exception. But it demonstrates that software engineering does
exist, and it suggests that more software could be "engineered", if we chose
to do so.

I suspect the reason that such an engineering process isn't applied to more
software is that we don't really care that much. Are we willing to accept a
10x, or 100x, increase in development time and cost in order to have a more
robust iPhone game? Word processor? To-do list?

Which is why I'm interested in the question, what can we take away or relax
from avionics-style software development to bring better "engineering"
practices to other areas of software? Maybe a 2x increase in time and cost
would be acceptable...

