
Are natural language specifications useful? - ingve
https://alastairreid.github.io/natural-specs/
======
gerbilly
I've never found natural language interfaces to be useful in programming.

The hard part of programming is specifying the problem in an unambiguous way.

If anything, natural languages can make this harder to accomplish.

~~~
mrec
BDD test definitions are the ones that really annoy me. As far as I can tell,
they're written in "natural language" (i.e. a blessed set of specific
parameterizable phrases) purely so that they can be written by PMs rather than
devs. If that has ever actually happened in the entire history of anything, I
haven't witnessed it.

~~~
lotyrin
I find utility in them being understood by PMs, even if not written.

Also, the overly-parameterized kinds of reusable statements like "Click on
'whatever' then fill in 'name' with 'stuff'" are an anti-pattern (they
describe what, not why, and they fail to provide actual reuse like a page
object would, so you get costly maintenance). They should read "When I
log in" or "When I register as a user", they should interact with high-level
objects provided in your test suite, they should be purely whole-system
user-interface integration tests, and you should have very few of them: just
enough to answer "At the human level, what does this thing do for users, and why?"
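
The contrast might look like this (step names invented for illustration):

```gherkin
# Anti-pattern: scripted UI steps that say what, not why
Scenario: Registration
  When I click on "Sign up"
  And I fill in "name" with "stuff"
  And I click on "Submit"

# High-level, whole-system version
Scenario: A visitor becomes a user
  When I register as a user
  Then I should see my dashboard
```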

Any tool can be misapplied, poison is in the dosage, etc.

~~~
crdoconnor
>They should read "When I log in" or "When I register as a user"

I _hate_ those kinds of stories. Half of the relevant scenario information is
concealed in a mess of Turing-complete code. _Which_ user logged in? _How_ did
you register as a user?

I've read tons of stories like this and they're often next to useless. You
might as well just write the test in regular code and put a comment at the top
of the file and get the PM to read that instead.

If it's something the _user_ does, I don't want the executable spec to bury
those details in Turing-complete code because writing it in the higher-level
language is "hard". I want them surfaced in the readable scenario.
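
One way to surface those details while staying terse is a parameterized
scenario with an examples table (a sketch; the fields are invented):

```gherkin
Scenario Outline: Registered users can log in
  Given "<user>" registered via <channel>
  When "<user>" logs in with "<password>"
  Then they should see their dashboard

  Examples:
    | user  | channel          | password |
    | alice | the signup form  | hunter2  |
    | bob   | a Google account | (none)   |
```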

The fact that Cucumber makes it difficult to write readable, terse,
deduplicated, parameterized user stories is, IMHO, a problem with Cucumber,
not a problem with the people who write it.

~~~
lotyrin
If it matters what the user's username was, or what button they clicked, or
what form they're on --- to the business --- then sure, describe those in
natural language specs. If they only matter to the implementation, they should
be in the implementation's integration spec suite (which, being technical
detail, is not likely to be consumed or produced by the business, and so has
nothing to gain from being written in natural language; it can simply have
natural-language comments, as you mention).

You shouldn't, e.g., try to reach some coverage level with only an NL story
suite; that is a gross misapplication.

An NL story suite has a very simple utility: the customer can read the stories and say
"yeah, this is an example of what should happen" or go "no, that's not quite
what I meant" early on, before implementation. Being able to automate testing
of that document as validated by the customer is just a bonus. 90% of the
value happens before any code is written. If you are writing an executable
natural language story and have no customers to read it, or if you are
covering details they will glaze over while reading because they can't imagine
or validate what is described, then you're using the wrong tool.

edit: Even a suite in code, for technical users, should strive to have more
reuse than having each test describe exactly what actions are taken, since
that's too hard to maintain. If your tests aren't DRY, you spend too much time
updating the 50 places your login form selectors are reflected in the suite
and the test suite acts as scar tissue that prevents changes instead of a
protective but flexible skin and skeleton that helps the program adapt.
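
The page-object idea, as a minimal Python sketch (the class, selectors, and
the stand-in "driver" are all invented for illustration, not a real WebDriver
API):

```python
# Page object: selectors live in exactly one place, so a markup change
# is a one-line fix instead of 50 edits across the suite.

class LoginPage:
    USERNAME = "#username"
    PASSWORD = "#password"
    SUBMIT = "#log-in"

    def __init__(self, driver):
        # The "driver" here is just a dict recording interactions.
        self.driver = driver

    def log_in(self, username, password):
        # Every test that needs to log in goes through this one method.
        self.driver[self.USERNAME] = username
        self.driver[self.PASSWORD] = password
        self.driver["clicked"] = self.SUBMIT

driver = {}
LoginPage(driver).log_in("alice", "hunter2")
```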

Over-using story tests (and eating the maintenance burden and poor
abstractions that follow), overly repetitive integration suites that make
changes hard, and inappropriate use of unit testing that makes refactoring
difficult are all really common traps when people don't get the big picture
of test automation,
but they are entirely avoidable. A test suite is like any other piece of
software and has to be designed (and factored) for the reality that its
behavior will change.

~~~
crdoconnor
>If it matters what the user's username was or what button they clicked or
what form they're on --- to the business --- then sure, describe those in
natural language specs

IME it's usually left up to the test programmer what to put in them, and they
often put vague stuff in there that looks _exactly_ like what you just wrote,
which becomes useless for PMs and non-PMs alike reading the natural language
suite, because it leaves out business-critical details.

It's also a hack often done in order to keep the story suite short because (as
I mention below) Cucumber has no inheritance. There's a reason why the whole
world has not yet flocked to BDD, and I don't think it's an issue with BDD
itself (I'm a big fan).

>NL Story suite has a very simple utility: the customer can read them and say
"yeah, this is an example of what should happen" or go "no, that's not quite
what I meant" early on

I think the idea that it has to be natural language so that a customer can
read them is bullshit. This is the exact same mistake the creators of COBOL
made: thinking that natural language naturally elucidates things. It doesn't.
It parses very ambiguously. That's a feature if you're flirting with a girl,
perhaps, but a bug if you're trying to write a precise executable
specification.

I feel very strongly that the story suite should be written in a language that
is easy to parse and can handle parameterization and inheritance, but _isn't_
Turing-complete. It ought to be readable for PMs and still maintainable as
part of an integration test suite.

>If you are writing an executable natural language story and have no customers
to read it, or if you are covering details they will glaze over while reading
because they can't imagine or validate what is described, then you're using
the wrong tool.

If they glaze over, that might just be because they're a bad PM. I've had PMs
who glazed over when I tried to figure out business-critical edge cases with
them because they liked to think of themselves as "big picture guys". That's
fine; they just shouldn't be PMs.

I think that the divide shouldn't be between "important to business" and "not
important to business" but simply "test implementation" vs "specification".

>edit: Even a suite in code, for technical users, should strive to have more
reuse than having each test describe exactly what actions are taken, since
that's too hard to maintain. If your tests aren't DRY, you spend too much time
updating the 50 places your login form selectors are reflected in the suite
and the test suite acts as scar tissue that prevents changes instead of a
protective but flexible skin and skeleton that helps the program adapt.

This is actually the main reason why I think PMs typically shouldn't write
executable specs. Inexperienced _programmers_ often don't yet have the DRY
instinct and the ability to keep a strict separation between implementation
details and specification. It's a rare PM that has those skills.

Then again, maybe they just need to be trained. I've also had the problem of
massive, headache-inducing repetition and a blurred distinction between
implementation detail and specification in huge Word document specs.

>Over-using story tests and dealing with the maintenance burden / poor
abstractions, overly-repetitive integration suites that make changes hard and
inappropriate use of unit testing that makes refactoring difficult are all
really common traps when people don't get the big picture of test automation

Yeah, well, repetitive code and poor abstractions are basically a problem when
writing code in any language. Poor tools make it worse, however.

It's one of the reasons why Cucumber is a pile of shit: no inheritance. _Most_
stories in a business app are actually forks off existing stories. It's
utterly inexcusable that it doesn't have this feature.

Integration test suites with high coverage and readable stories do not have to
be repetitive. Mine aren't.

~~~
lotyrin
I think we're in agreement then.

It's indeed nice to have a PM who can produce details that can be
synthesized into a technical spec. It's nice when the PM can work with the
customer to actually understand those details. It's nice when PMs actually
care about their products and work with devs to weigh options. I don't
usually have those PMs. Lots of organizations have JIRA babysitters who show
up at your desk whenever the political climate changes to let you know that
you should drop the urgent thing they asked you to do yesterday, because
there's an urgent thing they've got for you today.

The customer, though, always wants what they want, and you can carrot instead
of stick them, and I have found some cases where I can sit with the customer
and do BA/PM with them despite the people with those roles, explain the idea
of test automation, show them how it can drive the browser and ask them to
describe the application (at a very high level) in natural language in terms
of examples, which I translate into Given/When/Then as we are working. I can
then explain in status updates "so, the 'simple' scenario for 'user submits an
order' is implemented <point to CI output> but we don't yet pass the story for
when a coupon code is used. Should we prioritize coupon code scenario or a
different feature?" or "So, we have implemented this story according to the
examples we were given in May, the behavior you described on this morning's
call would read "Given..." instead of "Given...". I can update the story and
prioritize that ahead of <whatever>, or should we ship with the original
behavior and proceed with <whatever>?"

Is it my job or the customer's job to have to do that? Probably not. But it's
a way to build a bridge from concrete stupid machines land into fuzzy people
land where everything is negotiable, in the absence of the roles or skills to
do so without such a tool.

Cucumber is indeed poor; the rigid 1:1 mapping of step-definition strings to
their functions, and of scenario blocks to test executions, is unfortunate.
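
That 1:1 mapping can be sketched in a few lines of plain Python (names
invented; this is not Cucumber's or behave's actual API):

```python
import re

# Each step string (a regex) is bound to exactly one function: the
# rigid 1:1 mapping being criticized above.
STEPS = []

def step(pattern):
    """Register one function for one step pattern."""
    def register(fn):
        STEPS.append((re.compile(pattern), fn))
        return fn
    return register

@step(r'I log in as "(\w+)"')
def log_in(ctx, name):
    ctx["user"] = name

@step(r'I should see the dashboard')
def see_dashboard(ctx):
    ctx["page"] = "dashboard" if "user" in ctx else "login"

def run(ctx, line):
    # Every scenario line must match a registered pattern, or it fails.
    for pattern, fn in STEPS:
        m = pattern.fullmatch(line)
        if m:
            return fn(ctx, *m.groups())
    raise LookupError(f"no step definition matches {line!r}")

ctx = {}
run(ctx, 'I log in as "alice"')
run(ctx, 'I should see the dashboard')
```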

I'd like to see a natural language tool where I am specifying invariants to a
property-based testing tool. "Given a user": what does a user mean? Well, it
has a Unicode string 'name', etc., and those map to the database model as
follows. "When I am on the login page": here are the five pages that have
login forms. "When I enter my username": find a field 'username' and fill it
with the generated user name. ... "Then I should be logged in": logged-in
users see log out and their username in the corner, and have access to their
models; logged-out users do not, etc. Now this blows up into all the
combinatorial options, and I find out "hey, if a user has emoji in their
password, several invariants are violated."
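
A toy sketch of that idea in plain Python, checking one invariant ("a user
who enters their own password is logged in") across generated inputs. The
buggy `login()` is invented purely to show how the harness surfaces the
emoji-password case:

```python
import random
import string

# Generated inputs include characters outside latin-1 on purpose.
ALPHABET = string.ascii_letters + string.digits + "éü🙂🔑"

def make_user(rng):
    """'Given a user' = a name and a password drawn from ALPHABET."""
    return {
        "name": "".join(rng.choice(string.ascii_lowercase) for _ in range(8)),
        "password": "".join(rng.choice(ALPHABET) for _ in range(10)),
    }

def login(user, attempt):
    try:
        # Hypothetical bug: the stored password is latin-1 encoded, so
        # anything outside latin-1 can never round-trip correctly.
        stored = user["password"].encode("latin-1")
    except UnicodeEncodeError:
        return False
    return attempt == stored.decode("latin-1")

def check_invariant(trials=200, seed=0):
    """Return the passwords for which the login invariant is violated."""
    rng = random.Random(seed)
    return [u["password"] for u in (make_user(rng) for _ in range(trials))
            if not login(u, u["password"])]

violations = check_invariant()
# Every violation contains a character outside latin-1 (the emoji).
```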

Generally I do not use NL tools or recommend they be used; I'd definitely
prefer the environment where they are superfluous because there is a QA
automation engineer working with a PM who is willing and able to elicit the
necessary details from the customer. Worse yet, all the current
implementations seem to be sorely lacking. I just don't think all hypothetical
NL tools are categorically useless.

------
tluyben2
We started with everything in natural language on our products, but rapidly I
found myself wanting something more formal.

I worked on formal specs for our products in Coq in the past, but it took me
too much time, and outside pure software dev it got me stuck.

So lately I have been using TLA+ (I used it many years ago but, in my then
naive age and experience, did not find it formal enough) and I must say it is
great.

The learning curve is quite steep, but not as steep as Idris's or Coq's (TLA+
is also less formal), and it is far more practical.

I think the author could have used TLA+, although I did not get a full
appreciation of his executable specs from that article.

~~~
balfirevic
Can you share more about what kind of programs you're specifying with TLA+?
Business applications, particular algorithms, or something else entirely?

~~~
tluyben2
Our firmware, encryption algorithms, app/web, and deployments. I have finished
the firmware and deployments and am working on the rest now. The firmware
helped a lot, as we cannot change it in the field, so we need an unambiguous
way of communicating its specs.

------
Ace17
Answering the main point of the author:

Architectural intent can be expressed in a formal way, but it requires a
formal language that allows you to define new abstractions.

And designing such a language is way harder than defining a small DSL with
just enough features to formally express your specification.

~~~
adreid
Yes, that is a large part of what I was saying.

Also, some things are so hard to specify formally that we still don't have
any kind of formal spec for them. Memory concurrency semantics is an example.
It is only in the last couple of years that we got a good spec of fixed-size
memory accesses. Then Peter Sewell's group drops the bombshell that if you
have mixed-size memory accesses then you can't make programs sequentially
consistent even if you add a memory barrier after every single memory access.
But we still don't know how to formally specify the memory orderings
associated with atomic accesses, instruction fetches, page table walks or
device accesses. So, until then, we use the best natural language definition
we can and hope we will be able to formalise it soon.

Also, there are parts of the natural language spec that I had not seen any
value in... until I started worrying about whether the spec itself was correct
or could possibly be shown to be correct. And now that I do worry about that,
I am seeing new value in those parts.

