In the age of AI, this kind of human-centric process (thinking about the problem and designing an explainable algorithm to solve it) deserves more praise. The handwritten output gives it an extra nice human touch.
Thanks! I'm the artist and it's nice to see this recognised. One worry I've had with this project is that people might think I should just use an LLM, but I think there is value in how controllable and explainable this is.
Inform looks very interesting, though I will apparently have to do a lot more reading before I can totally grok what it is and why.
I've been working on an idea I am calling the Story Empathizer. The struggle I'm having is that my idea is so abstract, it's hard to write an instance of it.
story = meaning + expression
In the world of computing, context is liminal. We write explicit answers to "what?" and "how?", and call them "data" and "algorithms". Those answers are entirely meaningful: defined. Context, on the other hand, is ethereal: untouchable. Context is the answer to "why?". It exists in our code like the waters of a river exist in the walls of the Grand Canyon: only present at time of writing. We encode our intentions into software just like a river carves time into earth. There is no inverse function: no intention decoder.
When we communicate with natural language, we start without shared context. If I want to be understood, I must share my context. I do that by writing a second story: a backstory. The reader uses that backstory to define the meaning of my expression.
With shared context, we aren't limited to reading. We have all the parts of the equation, and can use them to read, write, and empathize.
read: meaning = story - expression
write: expression = story - meaning
So what if we could move this process out of our brains and into a computer? Well, that's the dream, anyway...
It could be one of the generated sentences, but I composed it via wetware, in order to get a more-meaningful-than-average utterance:
Meaninglessness (because it's the product of a random process)
is liminal (because it's betwixt and between, neither meaningful[0] nor meaningless[2])
because the context is shared (the vocabulary for the generator —the context shared between us meaningful creatures and the stochastic parrot program— is a set of constraints, specially chosen by the artist to be evocative no matter how they are combined)
[0] compare SLN:
Just like the white winged dove
Sings a song
Sounds like[1] she's singing
Ooh ooh ooh
[1] there are many references to animal sounds sounding like human speech; here are two:
Ajax in the Little Iliad —having turned Athena's grey eyes green by managing to tank without asking her for buffs— is driven mad[2] by her shortly after he loses a prize to Odysseus: he hears the Greek cows as saying not "moo" but "μου" (mine), each claiming the prize for herself, not for him ... and so he kills them all.
In a less sanguine vein, Pechkin the Postman also has a nervous breakdown after thinking Khvatayka the Jackdaw is continually asking him "kto tam?" (both a fairly corvidesque sound and "who's there?" in Russian), despite his efforts to explain that he's the postman bringing a magazine.
[2] in The Eden Express (1975), Mark Vonnegut points out that maybe he could've figured out he was going schizophrenic, if only he'd stopped to notice that everything had suddenly become meaningful.
I'm sure you've heard this many times, but thank you for sharing these blog posts. You're doing some _really cool_ stuff, and it's a treat to be able to read about the depth to which you've thought of these problems, and how creatively you've been solving them.
This is great, both the text generation and the diagrams.
Have you considered using WordNet to explore the space of related words? IIRC it encodes all sorts of relationships, from the obvious, like synonyms and antonyms, to less common ones like meronyms.
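Something like this would let the generator walk those relations (an untested sketch; assumes NLTK is installed and its wordnet corpus downloaded):

```python
# Minimal sketch of exploring WordNet relations with NLTK.
# Assumes: pip install nltk, then nltk.download("wordnet").
from nltk.corpus import wordnet as wn

def related_words(word):
    """Collect a few of the relation types WordNet encodes."""
    relations = {"synonyms": set(), "antonyms": set(),
                 "hypernyms": set(), "meronyms": set()}
    for synset in wn.synsets(word):
        for lemma in synset.lemmas():
            relations["synonyms"].add(lemma.name())
            for ant in lemma.antonyms():
                relations["antonyms"].add(ant.name())
        for hyper in synset.hypernyms():
            relations["hypernyms"].update(l.name() for l in hyper.lemmas())
        for mero in synset.part_meronyms():
            relations["meronyms"].update(l.name() for l in mero.lemmas())
    return relations

print(related_words("tree"))  # meronyms include e.g. "trunk" and "crown"
```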
This kind of thing is why I read Hacker News and other sites that regularly throw up thought-provoking, interesting and captivating things. This kind of human ingenuity, playfulness and exploration is worth trawling through all the I-used-an-AI-to-make-an-app posts.
I used a similar but less sophisticated type of process to generate random space station announcement messages for the hackerspace, such as "Problem detected in hyperspace bay three." Mostly, making sensible messages comes down to using word lists that make sense in the context. "caterpillar" is a perfectly valid noun, but it doesn't fit on a space station: "Problem detected in beautiful caterpillar omega" just doesn't have the same ring to it.
I implement phrases at the top level, with rules like: "sentence: $foreign_object_noun detected in $spaceship_part_noun"
Articles and plurals are problems. I may have to add some actual code to deal with them, as the OP did, rather than just blind string replacement with a rules table. It's a shame that "noun: $adjective $noun" doesn't cut it. Stupid English.
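Not my actual code, but a minimal Python sketch of the idea (a rules table, recursive expansion, and a naive article fix-up pass; all names and word lists here are invented):

```python
import random
import re

# Hypothetical rules table: $placeholders expand recursively,
# so an entry can itself contain further placeholders.
RULES = {
    "sentence": ["$foreign_object_noun detected in $spaceship_part_noun"],
    "foreign_object_noun": ["a $adjective anomaly", "a hull breach"],
    "spaceship_part_noun": ["hyperspace bay three", "airlock omega"],
    "adjective": ["unscheduled", "ominous"],
}

def expand(symbol):
    template = random.choice(RULES[symbol])
    # Replace each $placeholder with its own recursive expansion.
    return re.sub(r"\$(\w+)", lambda m: expand(m.group(1)), template)

def fix_articles(text):
    # Post-pass: "a" -> "an" before a vowel letter. Words like
    # "universal" (vowel letter, consonant sound) still need an
    # exception list or pronunciation data.
    return re.sub(r"\ba(?= [aeiou])", "an", text)

print(fix_articles(expand("sentence")))
# e.g. "an ominous anomaly detected in airlock omega"
```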
Caterpillar is also a pretty well-known brand of construction equipment. I could imagine the tools of the future being named after all sorts of animals.
There could even be some funny environmental storytelling there: administrators could call things by formal names, workers who grew up on Earth could call them by similar animals, and workers who grew up on the space station might have become familiar with the equipment before knowing about the animals, so their associations might be backwards.
But it feels like it'd all have to be done intentionally; generating these associations procedurally seems like a total mess.
Each substitution referenced a file and entries in those files could reference other files (or themselves).
Reflecting the zeitgeist, in the "CELEB" file I find: "Billy Corgan, frontman of %ADVERB% %ADJEC% band The Smashing Pumpkins"
If I was going to do it again today I'd combine tenses into a single file and have the parser track the tense that should be used based on previously selected words. That's probably a lot harder than I think it would be.
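Roughly what I have in mind, as a toy sketch (the entry format is invented, and the tense here is fixed up front rather than inferred from previously selected words, which is the hard part):

```python
import random
import re

# Each verb entry keeps all its tenses in one line of the file.
VERBS = ["sing|sang|sung", "write|wrote|written"]
TENSE_INDEX = {"present": 0, "past": 1, "participle": 2}

def fill(template, tense):
    idx = TENSE_INDEX[tense]
    # Substitute every %VERB% with a random verb in the chosen tense.
    return re.sub(r"%VERB%",
                  lambda m: random.choice(VERBS).split("|")[idx],
                  template)

print(fill("Billy Corgan has %VERB% a song", tense="participle"))
# e.g. "Billy Corgan has sung a song"
```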
> In general, there is perhaps a meaning-nonsense continuum that sentences can be placed along.
Some philosophers and linguists have argued that grammatically valid sentences can still be nonsensical. Chomsky's example was "Colorless green ideas sleep furiously."
But I think the sentence only seems nonsensical because it presupposes that ideas can be green, and presuppositions are invariant under negation: "Green ideas do not sleep furiously" still presupposes that ideas can be green. Since the presupposition is false (presupposition failure), the sentence itself is neither true nor false, just like a meaningless sentence with ill-formed grammar. But sentences with presupposition failure can be perfectly meaningful. E.g. "Mr Smith put his children to bed early": true or false? Neither, if Mr Smith doesn't have children.
In the case of grammatical "nonsensical" sentences without presupposition failure, the issue seems to be that they are necessarily false, e.g. "Ideas are green" or "Gravity learns about regret". But that's not much different from saying "Bachelors are married", which is necessarily false, or "Bachelors are unmarried", which is necessarily true. They don't seem particularly nonsensical.
I think the case of semi-nonsensical sentences like "Vibrations become chances" can be explained by their ambiguity: it isn't clear whether there is some metaphorical interpretation of the sentence which could be true.
I think the former are sleeping furiously, while the latter are probably not so furious for the most part, some ideas don’t mind having a little longer to ripen.
It's just a Markov chain built by tokenizing a corpus of English words into syllables and analyzing the probability of transitioning from one syllable to another.
I think the library I used to tokenize the words did not do a perfect job and the corpus was suboptimal too, but it works well enough for a bit of nonsense :)
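For the curious, the core is tiny once you assume the syllable splitting is already done. A minimal sketch (not my actual code):

```python
import random
from collections import defaultdict

# Toy corpus, pre-split into syllables (the tokenizer is the hard part).
corpus = [
    ["won", "der", "ful"],
    ["ter", "ri", "ble"],
    ["ri", "val"],
]

# Record observed transitions; None marks word start and end.
# Sampling from the raw lists weights choices by observed frequency.
transitions = defaultdict(list)
for word in corpus:
    for a, b in zip([None] + word, word + [None]):
        transitions[a].append(b)

def make_word():
    syllable, out = random.choice(transitions[None]), []
    while syllable is not None:
        out.append(syllable)
        syllable = random.choice(transitions[syllable])
    return "".join(out)

print(make_word())  # real words like "rival", or nonsense like "terrival"
```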
> for example ‘universal’ which starts with a vowel but is pronounced with a y sound and therefore would go with ‘a’ not ‘an’
I spent a good minute with this sentence. For me, "universal" is pronounced with a j sound, not a y sound; a y is pronounced with a w sound, a letter which happens to be pronounced with a d sound; indeed thanks, English.
(The word "yes", to me, is indeed pronounced with a j sound, the j being pronounced with a d sound.)
Explanation for other confused people: the pronunciations of these words and letters, expressed in IPA (or perhaps an orthography like Spanish’s) begin with odd other sounds: Universal /ˌjunɪˈvɝsl̩/, y /ˈwaɪ/, w /ˈdʌbl̩.juː/, j /dʒeɪ/
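(For anyone whose generator needs to get this right: one option is to decide from pronunciation rather than spelling. A sketch using NLTK's CMU Pronouncing Dictionary; assumes the cmudict corpus has been downloaded.)

```python
# Choose "a"/"an" from pronunciation rather than spelling,
# via NLTK's CMU Pronouncing Dictionary (nltk.download("cmudict")).
from nltk.corpus import cmudict

PRON = cmudict.dict()

def article(word):
    phones = PRON.get(word.lower())
    if phones:
        # ARPABET vowel phones end in a stress digit, e.g. "AH0".
        return "an" if phones[0][0][-1].isdigit() else "a"
    # Fall back to the naive spelling rule for unknown words.
    return "an" if word[0].lower() in "aeiou" else "a"

print(article("universal"), "universal")  # a universal  (/j/ sound)
print(article("hour"), "hour")            # an hour      (silent h)
```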
I've got a similar thing which generates output for several bots. You can get surprisingly good output just from simple placeholder replacement and a bit of recursion.
Back in the day, I used the same "NLP meets madlib" approach in my silly little chat bots before I was introduced to the concept of Markov chains which completely blew my teenage brain.
This shows you don't need a very complicated algorithm to produce passable sentences. I suspect it's similar to autonomous car algorithms, which can start out as simple lane-following feedback loops.
The question is how performance scales as the controller complexity scales.
This is so great--both in terms of the project & the write-up. Thanks for sharing your work! :)
A "quick" dump of some thoughts it provoked:
(1) Really like the "meaning-nonsense continuum" concept--and that neither extreme is explicitly labelled in the image immediately following where the term is introduced. :)
(And, yes, "Gravity learns about regret." is indeed kind of beautiful.)
(Aside: Adding alt text to the generated images would be beneficial both for copy/pasting & accessibility reasons. :) )
(2) This statement made me smile: "If you don’t enjoy reading and contemplating these sentences like this, we are simply very different people." Because while I very much fall into the category of "enjoy reading and contemplating these sentences" I can also imagine people who definitely don't. :D
(3) Part of the reason for (2) was that I'd already encountered a couple of the "don't work" examples where my immediate thought was "wait, how about in this situation though?", e.g. "Aligning with compass."
Which then got me thinking about stylistic, metaphor or time-period related aspects of grammar, e.g. pirate treasure map instructions written in cryptic sentence fragments.
(4) Which leads into the whole "procedurally generated diagrams/documents" aspect of the project that I also think is particularly cool. It immediately got me thinking of being in a game and walking into an inventor character's lab or a magician's library and finding all manner of technical or arcane drawings generated for either environment or game mechanic purposes (e.g. in a detective/mystery game like "Shadows of Doubt").
Then again, since I was a kid I've always had an interest in browsing technical component and, err, office stationery catalogues, so maybe it's just me. :D
It also made me think back to "Interminal", an entry in the 2020 ProcJam procedural generation game jam, which generated an airport terminal including duty-free stores with procedurally generated perfumes (including "name, visual identity and smell"): https://nothke.itch.io/interminal
(5) The "physicalization" aspect of the project to render it in a tangible form is also cool--particularly because (at least to me but maybe also to all those vinyl-owning kids with no turntable :D ) a physical form seems to the amplify the "meaningfulness" of a message...
The underlying "digitalized cursive handwriting" project was also of interest--though I only skimmed its write-up in an attempt to avoid yet another rabbit hole. :)
The handwriting system seemed like it might also be amenable to animated use--which then made me think, "Wait! I've seen something about that recently...", had no idea what, but then some background retrieval process produced the answer as I was writing: it was Noclip's recent "The Making of Pentiment" documentary: https://www.youtube.com/embed/ffIdgOBYwbc
(6) With regard to the technical execution/implementation side of things: I've observed that a project's need for text generation often raises a question of what implementation approach to use, in terms of a DIY vs pre-existing solution.
One issue which affects text generation specifically is that "general solutions" often seem to tend toward specialization over time (e.g. "text generation for interactive fiction", "text generation for branching narrative-driven games", tools such as Yarn Spinner[0] & Ink[1]), which ends up making them less suitable for simpler/different use cases & steepens the learning curve.
This was something I ran into during a small sub-project[2] last year where the text generation was somewhat "incidental" to the overall project goal. I started out with a "quick DIY" solution but still ended up spending enough time on that aspect that I started to wonder if I'd be better off not entirely re-inventing the wheel.
Around this time I ran into the "Blur Markup Language"[3][4][5] project which has the tagline "write text that changes" and--while I haven't yet used it--seems like it might be a promising "mid-level abstraction" solution for text generation, so thought I'd mention it as a potential option for others with text generation needs.
(7) In terms of other helpful text generation related resources, I've found various "word lists" to be of use, so thought I'd mention this "weasel words" list as a starting point: https://github.com/words/weasels/blob/main/data.txt#L14
(The repo README also links to other word lists under the same org including word categories such as "buzzwords", "filler", "hedges" & words listed in order of associated positive/negative sentiment.)
Thanks again for sharing your work & I look forward to seeing where your projects go, should you share more in future. :)
[2] I wanted to generate "scripted dialogue" samples[2a] to demonstrate the 900+ individual Text-To-Speech speaker voices in the Piper TTS[2b] LibriTTS voice model[2c], in a form that is useful for evaluating the voices, not incredibly tedious to listen to, and makes it possible to identify which speaker you are currently hearing.
Thank you for this interesting "dump" of thoughts! I have a few replies which for some reason came out backwards.
(6) I too started this incidental to another project (the diagrams) and now have ended up in this rabbit hole where it's become its own project.
(4) Creating this kind of "narrative" for the diagrams is definitely something I have considered. e.g. there could be a character who had written them all and we found them in a book in the character's lab. I have this recurring thought about them being found on a CRT monitor in a forest, half buried in the ground and covered in moss. But I haven't been able to get a CRT aesthetic working well with them and for now I'm sticking with the hand drawn look.
(1) Yeah, I did consider labelling, but I am not the arbiter of what makes sense haha. Also, I think there are probably different axes along which you could create that kind of scale, like sense versus truth for example. There is probably a whole article to write just around that.
Thanks again, I definitely will continue to share my work :)
Bullshit generators are fun to write, and every seasoned programmer has probably written one, or a similar app, at one point. It's nice to see someone new discovering the joy they can produce.
Why in this world would you buy clay, turn it on a wheel, fire and glaze it, just to make a coffee-mug?
Yeah, it'd be easier to go to a shop and buy a mug, maybe with something banal like "World's Greatest Dad" or Darth Vader printed on it.
But making your own, you learn something. You apply effort. You apply intelligence. You apply your sense of aesthetics. You work through trial and error. You refine your technique. You spend time.
In the end, you've constructed something that is meaningful, and not disposable. Something that you understand, and something that reflects yourself. How much more value would that be as a gift? Or even just as a "because I could"?
This project is that. I'd hire that artist over you in a heartbeat because the artist understands something, worked through it, and applied themselves.