Attempto Controlled English (wikipedia.org)
100 points by sublinear 11 months ago | 69 comments


Attempto Controlled English - https://news.ycombinator.com/item?id=20950126 - Sept 2019 (60 comments)

AceWiki: an open-source wiki that can understand English (almost) - https://news.ycombinator.com/item?id=1563384 - July 2010 (1 comment)

I like ACE; it's a very interesting application of Prolog. While it's focused on writing logical definitions, I think there's an interesting angle here on how we try to express programming-language syntax to newbies in classrooms. I personally think it's possible to find a way of speaking about code that sounds a lot like English. Maybe that's because my English is not adequate.

I have some super early experiments here: https://kant2002.github.io/EngLang/

My memory of Robert Lacey's biography of Ford is that he also introduced a simplified English for his migrant workers, as well as teaching them English in factory schools.

I'm pretty sure South African miners used simplified English to get around tribal language barriers.

(These are distinct from the emergence of pidgins; they're a directed intent to create synthetic languages for specific outcomes.)

While ACE goes far beyond just style, I found that the guidelines set out in https://www.plainenglish.co.uk/how-to-write-in-plain-english... (and similar attempts) are quite useful and happen to have a lot in common with ACE.

Something like this would make for a great universal human language. In addition to this simplified grammar, it would also be great to limit the vocabulary so you'd only need a small dictionary, hopefully without any homonyms, homophones, or homographs.

The best example of this might actually be Wikipedia. There is a separate set of articles in a simpler and controlled form of English: https://simple.wikipedia.org/wiki/Main_Page , and details about the rules: https://simple.wikipedia.org/wiki/Basic_English

Simple English Wikipedia is not limited to Basic English.

I doubt that English is a good basis for a universal anything, due to the extreme inconsistency of spelling and pronunciation. There are some other disadvantages, as there are in any language, but having to learn how a word is spelled and pronounced entirely independently of each other, with no correlation, is a big drain on time.

I imagine something like Esperanto would be much more suitable for the role.

The downside of Esperanto is that very few people speak it, and it's harder to learn in some ways as there's far less media in the language – you've got that 1960s William Shatner film that's entirely in Esperanto, and maybe 2 or 3 more, and ... that's it.

Meanwhile you can learn English from watching Star Trek, Time Team, SpongeBob, reading Harry Potter, The Hobbit, shitposting on Reddit, etc.

For better or worse, English already is the de-facto universal language, or the closest we have to it. We tried the Esperanto thing back in the 20s, and it didn't really work out as there was very little interest outside of a fairly sizeable but ultimately limited group of enthusiasts. I don't see how Esperanto or any other constructed language will take off in the foreseeable future.

Continued evolution of English pronunciation and spelling seems like a more realistic and better option – the language will change anyway, as all languages do, so might as well ensure it changes in a reasonable direction. Time spent advocating Esperanto is probably better spent on advocating more consistent English.

Sure, but my point still stands, it's unfortunate that English is the one that spread. Spanish would have been a better choice.

> I doubt that English is a good basis for a universal anything, due to the extreme inconsistency of spelling and pronunciation. There are some other disadvantages, as there are in any language, but having to learn how a word is spelled and pronounced entirely independently of each other, with no correlation, is a big drain on time. I imagine something like Esperanto would be much more suitable for the role.

Isn't that the same for many things [how something is vs learning something about it that isn't how it is]? Roman numerals still catch me out; I cannot work out some programming code; a highlighted corner in a spreadsheet hides a comment; I can speak German phonetically (easy pronunciation rules - wysiwys) without understanding a word.

Yes, but I've had to learn both languages as an adult, and English was much harder.

Why is that? I'm not criticising, just curious. What makes it a better language?

It's much more phonetic, which removes a huge amount of work in learning to spell.

Spanish is too verbose and the gender feature is a pointless waste of time. Informal/formal also pointless.

Formality used to be very important in language in less egalitarian societies, so it's really an artefact. I have seen it argued that (in German at least) gender can help native speakers with word recognition - i.e. it's easier to differentiate die Brücke and der Bruder with the articles than without.

But it's interesting to speculate on what English does which could be considered pointless:

* Gendered pronouns (he/she/it) - some languages, e.g. Persian/Farsi, do with a single pronoun for both.
* Number agreement for 3rd-person verbs - i.e. he goes but they go.
* The continuous aspect. In many languages the expressions 'I run' and 'I am running' are identical.
* Articles. 'A cat sat on the mat.' Many languages do without these words and rely on context.
* Required tense marking. 'I speak', 'I will speak', 'I have spoken'. Some languages make the distinction optional - i.e. 'I speak' can mean 'Today I speak', 'Tomorrow I speak', 'Yesterday I speak'.

Or things which could be useful to introduce to English:

* Animate/inanimate pronouns - i.e. a formal distinction between 'it' (used for objects) and singular 'they' (used for humans and similar).
* An actual second-person plural - like y'all, yous, etc., rather than 'you' functioning as both singular and plural.
* A distinction between we (including the person to whom you are speaking) and we (excluding that person). 'We are going to the zoo tomorrow' - does that mean me and you, or me and my family?
* A grammatically distinct future form - 'I walk' becomes 'I walked' in the past tense by changing the verb, but 'I will walk' in the future uses the same verb form as the present.

Same as having to always mention the gender in English, ie "he said", whereas in Spanish it's just "said".

"They said" is good English, whereas in Spanish word endings change due to gender.

What is "he said" in Spanish, and what is "she said"?

My point is that, in English, it's possible to leave someone's gender entirely unstated, whereas in Spanish someone is either "el amigo" or "la amiga" with no other options.

Also, if you don't know Spanish, I'm not going to teach it to you.

It's only possible if you use neologisms like "they said", whereas in Spanish you can go entire sentences without referring to the gender.

Singular they goes back to the 1300s. Whatever it is, it is not neologistic.


This is one of those things that's both true and false at the same time.

It's true it goes back to the 1300s, but it had also fallen out of fashion and was considered "wrong" later on. Languages change, and don't do so in a linear, straightforward way. From our perspective, it's very much a neologism (although it's been a few decades, and it has arguably already passed the neologism stage).

I don't know, the sentence "by 2020 most style guides accepted the singular they as a personal pronoun" doesn't scream "it's been used like this for centuries" to me.

Does Spanish assign gender to inanimate objects?

It assigns linguistic gender, which isn't the same as people's gender.

Yes, I thought so. Isn't that unnecessary overhead? I think this disqualifies Spanish and other languages like it.

To me, it doesn't matter much, because there isn't much "extra" to learn. The gender is derived from the word suffix, and there are three or four rules to that, so it's both extremely easy to learn and to apply to unknown words.

This is unlike, say, German, where each word has a random gender and you need to learn it along with the word. There, I agree, that's unnecessary overhead.

Lojban[1] is a constructed language which is (or tries to be) syntactically unambiguous. Of course even fewer people speak Lojban than Esperanto, but there's still a sizable community.

[1]: https://mw.lojban.org/papri/Lojban

Esperanto is very Eurocentric, in grammar and phonology, not to mention being under-specified and idiosyncratic in ways that make it hard to learn. It's also sexist, completely apart from the fact that it's gendered, and some of its vocabulary is ambiguous.

Out of curiosity, how is it sexist?

There are languages like Finnish and Hungarian in which gender is almost entirely absent, and there are languages like French in which a binary masculine-feminine distinction permeates everything. Gender in Esperanto is similar to gender in English, but slightly more problematic:

1. There are different pronouns ("li" and "ŝi") for male and female persons, just like in English. There is also a non-traditional gender-neutral pronoun ("ri") that a large proportion of Esperanto speakers have probably heard of by now but is not yet used very much.

2. There are about 20 word pairs where the female form is derived from the male form with the suffix -in, for example, "patro" father, "patrino" mother, so the sexes are treated asymmetrically. There are multiple proposals to fix this but, as you can imagine, it takes a while to reach a consensus on which proposal to adopt, if any.

Both of the above make it difficult to talk about someone in traditional Esperanto without knowing or revealing their sex, and also make it difficult to talk about non-binary people. But in principle the problem is not hard to fix and so it probably will get fixed once enough people feel strongly enough about it.

The proper use of the Latin alphabet, or any other alphabet, should be top priority. Phonetic spelling: if you see it, you can pronounce it; if you hear it, you can write it.

Spelling bees make no sense in a rational language.

Languages drift. What was straight spelling 10-20 centuries ago is now mismatch, because people now say these words differently. (Case in point: Italian vs classical Latin.)

To keep spelling in sync with pronunciation the spelling must be centrally controlled, like German, or Russian, or Spanish (the latter has the least-surprising spelling of all Latin-based modern languages I'm aware of, unless you're in Argentina).

Centralized control over English spelling is impossible due to a bunch of obvious reasons.

> What was straight spelling 10-20 centuries ago is now mismatch

I'm afraid you don't know how things work when people are used to phonetic correspondence: pronunciation changes, and spelling follows naturally.

As an example, consider the name of Florence. Originally it was "Florentia", today it is "Firenze": the name itself changed, and the spelling followed suit. That's because people in Italy do not ever ask themselves: "How do I write this?". The alphabet is there to express how you pronounce it, and not to support a parallel written language that does (not) evolve by itself!

Spelling bees are for those poor souls who saw their language drastically changed when being invaded by the kings of silent letters and wasted ink.

It's actually worse than this; you'd need centralised control over pronunciation as well. Otherwise, would a speaker of local Dublin English spell tree and three the same? And should a speaker of South-Eastern Hiberno-English spell three and tree differently, even though the difference in their t and th is indistinguishable to most speakers of British English?

The lack of exact correspondence between spelling and pronunciation is a feature, not a bug.

> you'd need centralised control over pronunciation as well.

No, not at all. Several Italian words have regional variants, and they are simply spelled differently.

When you are used to a phonetic alphabet, you never wonder how to write what you say.

The differences in regional accents between, say, local Dublin English and the supra-regional Dublin accent (i.e. without even leaving the city) are reasonably large. I dunno what Italian dialectal differences are like, but differences between Englishes are often very large, even within a single city.

I'm not saying it's impossible; Trainspotting, for example, is mostly written in phonetic Scottish English. Many English speakers find it difficult to read, though, for precisely that reason…

I would say that an exact, unambiguous phonetic writing system (like in Spanish, Korean, Japanese kana, Mongolian written in Cyrillic, etc.) would help flatten the pronunciation differences.

Languages drift, true. That's why you need spelling to follow, or else it only gets worse until all benefits of using an alphabet are lost.

Just do the occasional orthography cleanup every couple of decades and keep the old spelling around as deprecated until it's gone. If shifts in regional accents aren't perfectly consistent, it's no big loss; their evolution will still be less inconsistent running on top of a non-fossilized standard orthography than without those cleanup efforts.

What obvious reasons do you see for English? Russian might be a language with a central authority claiming total control over all speakers, but both German and Spanish are 1:n in the relationship of language : state.

Phonetic spelling implies that one pronunciation is correct over all other alternatives; that certainly would not work for anything close to current-day English.

Phonetic spelling implies that there's at least one pronunciation that correlates 1:1 to spelling, but it certainly does not rule out different pronunciations. Other pronunciations can also be 1:1 if they shift in parallel ways, which actually isn't that unlikely; or they can do their own thing, and the existence of a 1:1 version is still helpful even there. German-speaking Swiss share their written language with Germans (with some trivial exceptions like having cleared out the ß), but they are perfectly capable of talking in a way that is 100% indecipherable to Germans who haven't spent years learning (and even when Germans have lived half their lives there, trying to adopt the local pronunciation mapping is frowned upon).

> German-speaking Swiss share their written language with Germans ... but they are perfectly capable of talking in a way that is 100% indecipherable for Germans

You are talking there of a proper language on one side, and a bunch of non-codified dialects that never got any written form on the other.

Yet those dialects tend to deviate from the standard language in very regular ways, even if they are perfectly incomprehensible to the uninitiated. When some new term appears in the standard language, speakers sharing the same dialect will independently shift it in the same way.

> Phonetic spelling implies that there's at least one pronunciation that correlates 1:1 to spelling

No! Phonetic spelling implies that each pronunciation correlates 1:1 with one spelling.

You seem to read "phonetic spelling" as using an alphabet that can encode any utterance in a reproducible way, like the International Phonetic Alphabet. I'd call that phonetic encoding, not phonetic spelling. One is explicitly language-neutral; the other can be wildly language-specific but still very regular within the language-specific ruleset. And those language-specific rulesets usually come with a wide variety of "unofficial" variants used by regional dialects, most of which are still far more regular and internally consistent than the mess that is English.

English has conflicting tendencies. The spelling is effective in distinguishing written forms of homophones, a feature that is papered over in many sound-based spelling reforms. But OTOH the spelling is often disturbingly awful - and obnoxiously misleading. A controlled English should (IMO) strike a balance between these two tendencies while opening the door to spelling reforms, which could be easily managed in a controlled language environment.

I'm curious if a limited English dictionary would eventually start picking up loan words for new concepts, gradually bloating back into the modern (or future) English vocabulary.

Out of interest, I'd point at XKCD #1133, "US Space Team's Up-Goer Five" [1] and Thing Explainer [2][3] for examples of what gets lost when vocabulary is cut down.

[1] https://m.xkcd.com/1133/ [2] https://xkcd.com/thing-explainer/ [3] https://en.m.wikipedia.org/wiki/Thing_Explainer

Not an expert at all, but I am curious whether training an LLM purely on ACE input would result in a smaller but still capable model. I guess a small regularly-trained model could then translate ACE to a natural language.

Narrowing the task space results in better performance in the short run / with small cheap models, but worse performance in the long run. The model overspecializes and loses any generalizable capabilities because they are not necessary - like the old joke about how professors are like sea squirts, as soon as they get tenure they digest their brains as unnecessary. If you provide the model with adequate scale, the more diverse and challenging the problems, the better (https://gwern.net/scaling-hypothesis#blessings-of-scale).

So, if you trained an LLM on ACE only (and you somehow had enough of it to begin with), then it would fail to learn so many things. For example, it wouldn't learn about prose style, or other languages, or different speakers and identities.

Why waste time say lot word when few word do trick?

In all seriousness, these are good restrictions to follow when writing user-facing error messages. Developers tend to complicate the language, leading to more confusion.

ACE isn't for error messages or prose; it's for computers to understand natural language without needing an LLM (my own description). The idea is that ACE defines a strict syntax for English whereby computers can parse it reliably and humans can read and write it reliably - just like SQL, but English.
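To make that concrete, here's a toy sketch of the idea (my own invention, not the real Attempto Parsing Engine, whose grammar is far richer and outputs discourse representation structures): a fixed English subset whose sentence templates map deterministically onto a logic-style form, the way SQL maps a fixed grammar onto relational operations. The templates and tuple format here are made up for illustration:

```python
import re

# Toy controlled-English parser: a couple of fixed sentence templates
# mapped deterministically to a logic-style tuple form. Anything outside
# the subset is rejected rather than guessed at - that refusal to guess
# is what makes a controlled language machine-reliable.
def parse(sentence: str):
    s = sentence.strip().rstrip(".")
    # Template 1: "Every NOUN is a/an PRED" -> universal quantification
    m = re.fullmatch(r"Every (\w+) is an? (\w+)", s)
    if m:
        noun, pred = m.groups()
        return ("forall", "X", ("implies", (noun, "X"), (pred, "X")))
    # Template 2: "A/An SUBJ VERBs a/an OBJ" -> existential quantification
    m = re.fullmatch(r"An? (\w+) (\w+)s an? (\w+)", s)
    if m:
        subj, verb, obj = m.groups()
        return ("exists", ("X", "Y"),
                ("and", (subj, "X"), (obj, "Y"), (verb, "X", "Y")))
    raise ValueError(f"not in the controlled subset: {sentence!r}")

print(parse("Every man is a human."))
# ('forall', 'X', ('implies', ('man', 'X'), ('human', 'X')))
print(parse("A dog sees a cat."))
# ('exists', ('X', 'Y'), ('and', ('dog', 'X'), ('cat', 'Y'), ('see', 'X', 'Y')))
```

Two templates instead of a real grammar, but the SQL analogy holds: every accepted sentence has exactly one parse and one meaning.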

I eagerly await EQL.

This is weird. I write a lot of technical documentation, and it often comes out reading like this. It's almost like I learned this from somewhere, but didn't realize it.

Convergent evolution, perhaps

While not strictly a formal language, Standard Marine Communication Phrases (SMCP, https://wwwcdn.imo.org/localresources/en/OurWork/Safety/Docu...) is also a controlled form of English used in international maritime contexts to minimize ambiguity.

That's close to signal flag language.[1] Which makes sense. It's used to convey the same messages, things ships need to communicate to nearby ships.

[1] https://en.wikipedia.org/wiki/International_Code_of_Signals

Does there exist a workplace where product owners have a background in technical writing and they're solely responsible for business logic instead of sloppy requirements?

Most high-integrity software development includes roles for systems engineers who do this (avionics, medical, nuclear, rail, automotive). See standards like ARP4754A, the INCOSE working group on requirements engineering for leads, or, if you are really into it, applications of formal methods like DO-333.

I tried to use the demo on the GitHub page, but it doesn't work. It seems the page has been stale since 2013.

I wonder how this parses the classic joke:

A programmer's spouse sends them to the shops. "Buy a pint of milk. If they have eggs, buy a dozen." The programmer returns with 12 pints of milk.
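The ambiguity is in what "a dozen" refers to. Written out as code (names and cart structure invented for illustration), the two readings are:

```python
# The joke's two readings of "If they have eggs, buy a dozen".
# English leaves the referent of "a dozen" implicit; the programmer
# binds it to the most recently mentioned purchase, the milk.
def shop_programmer(they_have_eggs: bool) -> dict:
    cart = {"milk_pints": 1}
    if they_have_eggs:
        cart["milk_pints"] = 12  # "a dozen" bound to the milk
    return cart

def shop_intended(they_have_eggs: bool) -> dict:
    cart = {"milk_pints": 1}
    if they_have_eggs:
        cart["eggs"] = 12        # "a dozen" bound to the eggs
    return cart

print(shop_programmer(True))  # {'milk_pints': 12}
print(shop_intended(True))    # {'milk_pints': 1, 'eggs': 12}
```

A controlled language like ACE resolves such anaphora by fixed rules, so only one reading can be written down in the first place.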

In the age of LLMs this seems like a waste of time to even think about constraining and formalizing languages. Why should human users restrict themselves in their expression if LLM systems can easily point out any remaining ambiguities?

What they seem to be doing at the moment is riffing on the ambiguities. If they point them out, then the outcome will be .. less ambiguous language which will tend to be like ACE.

"Gimme the ball" is idiomatic. How do you know it means "give me", and not "furble the ball"? We're taught "give me"; we say "gimme". Both have a role.

It could also mean "please give me the management role overseeing the event known as the ball".

I'm interested to see an LLM do that. Technically, nearly every sentence is ambiguous, but that clearly doesn't bother us. An automated system that can point out sentences humans are likely to interpret in multiple ways would be interesting. However, no such thing seems to exist.

I recently made an LLM-based tool to identify inconsistencies with the INCOSE guide to writing requirements (identifying vague language, for example). Also have had some success translating those outputs to more structured language like LTL. I think it is doable

I am curious why you think this. A basic prompt of pointing out ambiguities should get you quite far.

I am pretty sure LLMs would be great at turning regular English into ACE, including resolving detected ambiguities.

When working as a tech writer, I was mildly obsessed with this. Nothing drives users away from your documentation like a lack of clarity that signals that (1) you're on your own now, pal, and (2) the writer had zero clue of the potential for chaos in interpreting what he is writing.

You make a valid point; LLMs intuitively change the need for such formalized languages. It may not be a total waste of time, but I fail to see how this slight exaggeration justifies how your comment is currently stamped into the ground, likely by the livid rage currently directed at LLMs in the software developer community.

I get it, people don't like to have their lives disrupted, and I sympathize, just as I sympathized when secretaries, bookstore owners and others were forced to find other ways to make a living. At least in our case we still have (or can find) a job, it has just changed, to some extent even considerably.
