What does the ??!??! operator do in C? (stackoverflow.com)
636 points by isomorph on Oct 5, 2022 | 166 comments



I learnt C, more than 20 years ago, from the book The C Programming Language written by Brian W. Kernighan and Dennis M. Ritchie, also known as K&R. I read the book almost cover to cover all the way from the preface at the beginning to its three appendices at the end while solving all the exercises that each chapter presented. As someone who knew very little about programming languages back then, this book was formative in my journey of becoming a programmer.

Appendix A (Reference Manual) of the book broadened my outlook on programming languages by providing me a glimpse of what goes into formally specifying a programming language. Section A.12 (Preprocessing) of this appendix specifies trigraph sequences. Quoting from the section:

> Preprocessing itself takes place in several logically successive phases that may, in a particular implementation, be condensed.

> 1. First, trigraph sequences as described in Par.A.12.1 are replaced by their equivalents. Should the operating system environment require it, newline characters are introduced between the lines of the source file.

Section A.12.1 (Trigraph Sequences) then describes trigraph sequences in more detail. Quoting this section below:

> The character set of C source programs is contained within seven-bit ASCII, but is a superset of the ISO 646-1983 Invariant Code Set. In order to enable programs to be represented in the reduced set, all occurrences of the following trigraph sequences are replaced by the corresponding single character. This replacement occurs before any other processing.

  ??=  #
  ??/  \
  ??'  ^
  ??(  [
  ??)  ]
  ??!  |
  ??<  {
  ??>  }
  ??-  ~
> No other such replacements occur.

> Trigraph sequences are new with the ANSI standard.
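
To make the quoted table concrete, here is a minimal sketch of that first translation phase as an ordinary C function. The names `trigraph_equivalent` and `replace_trigraphs` are made up for illustration, and this only mimics what a compiler does internally under the rules quoted above; it is not how any particular compiler implements it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given the third character of a candidate trigraph,
   return the single character it stands for, or 0 if there is no match. */
static char trigraph_equivalent(char third)
{
    switch (third) {
    case '=':  return '#';
    case '/':  return '\\';
    case '\'': return '^';
    case '(':  return '[';
    case ')':  return ']';
    case '!':  return '|';
    case '<':  return '{';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Copy src into dst, replacing every trigraph with its equivalent.
   dst must have room for at least as many bytes as src, including the
   terminating null, because the output never grows. */
void replace_trigraphs(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        char repl = 0;
        /* Reading src[1] and src[2] is safe: each is reached only when
           the previous character was a question mark, not the terminator. */
        if (src[0] == '?' && src[1] == '?')
            repl = trigraph_equivalent(src[2]);
        if (repl != 0) {
            *dst++ = repl;
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Calling `replace_trigraphs("a ?\?!?\?! b", out)` should leave `a || b` in `out`. The `\?` escapes are there so that the test string is not itself rewritten by a compiler that has trigraphs switched on.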


To be fair, and definitely a part of its appeal, the K&R is only 312 pages long. It covers the language and most of the standard library you’ll need.

As opposed to say, “Learn You a Haskell for Great Good! A Beginner's Guide” which is 881 pages and doesn’t even moderately cover the prelude.

Anyway, C is an amazing language and I keep a K&R on my phone as a pdf


My copy of Learn You a Haskell ends on page 360 and the index is another 16 pages. Is this some weirdly shaped PDF of it with tiny phone-sized pages?


According to its listing on the No Starch website[0], the PDF is (currently) 400 pages. I'm not a fan of that particular Haskell book but it's in no way even close to 881 pages.

[0]: https://nostarch.com/lyah.htm


Why don't you like that book and which Haskell book would you recommend then?


That's 800 pages double sided! Simply because the back side is blank, is no reason to nitpick.


I don't get it. Wouldn't 800 pages double-sided require a PDF containing 1600 pages?


400 single sided printed pages. If you are trying to inflate the page count that is 800 pages, but every other page is blank.


Sorry, been looking at new cars, and salespeople have been explaining new-car extended warranties to me.


Could be, now that you mention it. Which, now, makes me wonder if my K&R is even shorter.


There's one thing that's usually true about every well used copy of K&R, and that's that it always opens up easily and by default to the page with the huge operator precedence order and associativity table, because that's the page everyone needs to refer to the most often, which is usually bookmarked, but in an old copy doesn't even need to be.

That says something about the C programming language design, that I as a deeply stack based FORTH programmer and explicitly parenthetical LISP programmer find horrible.


Just use parentheses if you're doing many operations on one line? This is common sense in any programming language.

I've been programming in C for years and never had an issue with operator precedence.


There's a lot of exaggeration that goes on with certain fans of postfix systems when they talk about infix systems.

For example: I really like HP calculators, so I'm in several facebook groups for fans of RPN/RPL and HP specifically. Sometimes a few of them go way too far out of their way to try to demonstrate how inferior algebraic systems must be.

For the record, my copy of K&R wants to open into either section 7.6 or appendix B. No idea what this says about me, though.


Indeed, different strokes for different uses. I "grokked" RPN pretty early in my career and use it for all my calculator stuff, and only for that. Infix for programming languages is equally comfortable for me. I still have to think and study quite hard to read the prefix LISP notation. I get it, it's just not internalized like the other two.


I don't know what it is with C programmers and programming "terseness". It hasn't been the 60s since the 60s, and you have gobs of memory available, your source code can have syntactic sugar for the purpose of readability and the world won't end.


Can you give an example of syntactic sugar in a modern language that makes order of operations a non-issue?


These aren't syntactic sugar, but formatters. I rather like Ormolu for Haskell and elm-format for Elm. I occasionally type a bunch of parentheses so that I'll for sure have the right order of operations. Then the format on save removes the redundant ones. It's delightful. The typecheckers and tendency to wrap primitives in a semantically significant constructor help with that.


Parentheses


Yeah, that's why I said

> Just use parentheses

Not sure what that has to do with memory usage..?


Just to clarify, I'm not the original person you replied to, and we all agree about using parentheses.

mrguyorama just implied that the only reason you would check the operator precedence chart would be to shave a few bytes off the size of your source code, which has not been a reasonable reason to do anything for many decades, and yet C programmers seem to like to do it anyway.


I agree, but I'm still cracking open that page when I'm reading someone else's code. I guess you work solo most of the time?


Nope, I work on a team. I guess we all have similar instincts about what is readable.


It's not that YOU have any issues with it -- because you're perfect.

https://www.youtube.com/watch?v=fKHaNIEa6kA

It's about the poor people who read your code that relies on both you and them having perfectly memorized every single little detail of operator precedence and associativity, instead of simply and consistently using parentheses.

Quick without looking: can you tell me what the precedence and associativity of the ternary ?: operator is?

The designer of PHP got it wrong (which isn't surprising given his proudly self proclaimed contempt towards computer science and incompetence at parser writing), but then millions of PHP programmers also learned it the wrong way.

https://en.wikiquote.org/wiki/Rasmus_Lerdorf
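
For anyone who did have to look: in C the conditional operator groups right to left. A small sketch (function names invented for this example) that contrasts C's grouping with the left-to-right grouping the comment above attributes to PHP:

```c
#include <assert.h>

/* C parses a ? 1 : c ? 2 : 3 as a ? 1 : (c ? 2 : 3). */
int pick(int a, int c)
{
    return a ? 1 : c ? 2 : 3;
}

/* The same expression with C's grouping spelled out. */
int pick_explicit(int a, int c)
{
    return a ? 1 : (c ? 2 : 3);
}

/* The other possible grouping, written with explicit parentheses. */
int pick_left_grouped(int a, int c)
{
    return (a ? 1 : c) ? 2 : 3;
}

/* Shows the difference with assertions rather than printed output. */
void check_ternary_grouping(void)
{
    assert(pick(1, 1) == pick_explicit(1, 1));
    assert(pick(1, 1) == 1);
    assert(pick_left_grouped(1, 1) == 2);
}
```

The two groupings disagree whenever the first condition is true, which is exactly the case a reader with the other rule in their head would get wrong.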

Do you really want any of those people who were corrupted by PHP messing around with your code, if you relied on it being one way, and they assume it works the other way?

It's not that you can't tell what it actually does, it's that you can't tell what the person who wrote it actually meant, which is more important than what it actually does, especially when it has bugs.

Don't do many operations on one line, AND do use parentheses, AND do use indentation, with no exceptions except for very simple expressions. Take every opportunity to use line breaks and vertical alignment to make symmetry and repetition and nesting visually obvious, like:

  float distance =
    sqrt(
      (x * x) +
      (y * y));
Redundant parens, plus breaking expressions into multiple lines and indenting according to depth, unambiguously express programmer INTENT, so the reader doesn't need to wonder if the person who wrote it had a clue or was just showboating.

Just use parentheses, and put a comment on it, sailor.

https://wellcomecollection.org/works/m33njwx3/items

My copy of The Little Schemer won't open to page 13 because of the jelly stains.

https://vpb.smallyu.net/[Type]%20books/The%20Little%20Scheme...


What is with the gay sailor condom ad? You're being completely ridiculous.

I'm glad you enjoy Forth so much, I guess. I'm sure postfix will catch on any day now.


Isn't that what brackets are for?

Well that's what I do. If you're having to look it up to write it, you're going to have to look it up to read it again down the line.


Can I up vote for first paragraph, and down vote the second? :-)


Just checked my first edition K&R (copyright date 1978): the last page number is 228 (end of index), after which there is a single tear-out page for other "High Quality C and Unix system titles" from Prentice Hall. There's also a front section that has about 10 pages in roman numerals (2 pages of preface starting with 'ix'), so about 240 pages total.

Page 1 starts: "Chapter 0: Introduction".


Page 1? What, it's not zero indexed?


It looks like it's 1-indexed, but it core dumps when you get to the last page.


My Second Edition K&R (purchased in the mid-90s) is only 272 pages, including the index.


Which is why most workloads bring POSIX along for the ride as a means to make anything actually useful.


POSIX is an OS API. You're complaining that C interacts with the operating system to do useful work? What language do you use that can do useful work without interacting with the OS?


POSIX is the part of the C standard library in UNIX, that should have been part of ISO C as well.

It wasn't, so any C application that is more than a toy hello world with stdio, pings back into POSIX for any kind of meaningful work, that wants to stay cross platform.

Basically it is the C runtime library that wasn't part of ISO.

I use JVM, .NET, Web and C++, not caring if the runtimes are bare metal or running on top of an OS, type 1 hypervisor, or whatever.


>I use JVM, .NET, Web and C++, not caring if the runtimes are bare metal or running on top of an OS, type 1 hypervisor, or whatever.

If you're downloading a JVM binary, you're missing out on the build step. It's C dependent, friend. How do you think that VM interfaces with the OS? Go on. Try it. ldd the java executable.

It's libc all the way down. C itself is a sort of "VM" specification utilized to create the tools to run the tools to build the tools that make other high level languages possible.

Unless you create something entirely custom in platform specific assembly, you're running on C at some level.


> POSIX is the part of the C standard library in UNIX, that should have been part of ISO C as well.

I'm not sure what you're trying to say. The Portable Operating System Interface (POSIX) is specified in an ISO standard, and basically specifies what a UNIX operating system's programmable interfaces are.

https://en.wikipedia.org/wiki/POSIX

POSIX also specifies stuff like "awk must be made available". Is that what you think the C programming language specifies?


I don't think you really understand what POSIX is.

POSIX is an IEEE standard (example [1]). POSIX defines the Operating System API. You can see the C implementation of this API here[2].

> so any C application that is more than a toy hello world with stdio, pings back into POSIX for any kind of meaningful work

Simply calling printf relies on writing to a file descriptor. A "Hello world" application on linux uses posix. ANY hello world application uses posix. Even your Java Hello world App will call into the posix APIs. `System.out.println` isn't magic. It calls into the C posix implementation.

If you want to do anything in any language (write to files, create threads, allocate memory, network communication), you need to go through the OS. POSIX is what defines that OS interface.
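
As a rough illustration of the layer being described, here is what skipping stdio and calling the POSIX `write()` function directly looks like. This assumes a POSIX system; `write_message` is a name made up for the example.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>   /* POSIX: write(), STDOUT_FILENO */

/* Write msg to the file descriptor fd with a single POSIX write() call.
   Returns the number of bytes written, or -1 on error. */
ssize_t write_message(int fd, const char *msg)
{
    assert(msg != NULL);
    return write(fd, msg, strlen(msg));
}
```

`write_message(STDOUT_FILENO, "hello, world\n")` would put the text on standard output without going through `printf` or its buffering.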

> I use JVM, .NET, Web and C++, not caring if the runtimes are bare metal or running on top of an OS, type 1 hypervisor, or whatever.

So you use POSIX, you just don't think about it.

[1] https://standards.ieee.org/ieee/1003.1/7700/

[2] https://en.wikipedia.org/wiki/C_POSIX_library


Comparing C to Haskell is like comparing razor to laser scalpel.


And not just any razor, a really useful one!


One of those that always cuts the user no matter how carefully they try to get hold of it.


No it's a tool.

I suppose you could compare it to a table saw. C is one without a guard or any other safety measures, so you need to be careful not to cut your fingers off. More modern languages have the guard and brake etc.

For general use you probably do want all the safety bits, but occasionally it is useful to be able to take it off to do a weird cut on a weird bit of wood.

None of that necessarily means you will cut your fingers off though.


I learnt it a few years before that, and I remember how reading K&R once I got it felt like having someone turn UP the lights, open the blinds, wash the windows and basically TURN UP THE SUN compared to things I read before. So much clarity.

Number of times I've seen trigraphs in "real code": still zero. I hope it's the same for you.


I read it at a similar time, and I remember that feeling well. However, if you revisit it with a critical eye, you find a hundred places where a result isn’t checked, bounds aren’t checked, memory is leaked and so on.

All of this was pretty much fine in the context in which it was written, but these days bullet-proofing things is pretty much mandatory and K&R’s elegance disappears in the face of such challenges.


Well yeah, but there is a difference between teaching the mechanics of a language, and teaching how to write safe, correct and secure code in that language.

Perhaps that mindset is part of what made C survive for so long and in such diverse roles.


In 1990 IBM donated a 9370 computer to our university. The default code page for German EBCDIC did not support square brackets.

I don't remember whether trigraphs were not supported by the compiler at the time or whether we just wanted to avoid completely unreadable code. Being inexperienced in VM/370 administration, we spent weeks modifying the system to use some international EBCDIC codepage.

The system never saw much use, everybody preferred Unix workstations where programming in C was a natural thing.


I, too, learned C by reading K&R cover to cover and solving all the exercises (in front of a Sun 3/160 running SunOS 3.5-ish). Even back in those ancient days, it was obvious trigraphs were evil and should have been banished to a special place in hell.


This is because IBM 029 card punches don't support these characters right?


I thought it was because of international character sets that lacked the punctuation of EBCDIC CP37 or ASCII.

[edit]

For example ISO/IEC 646 is ascii with punctuation replaced by other characters.


EBCDIC is very much IBM - on our old Burroughs machine we used to have to use cent signs (and something else that I forget) for square brackets and a 3-hole multipunch for ';'


That still leaves the issue of how to print out something that doesn't exist as a physical character on an old-style typeface (way pre-dot-matrix / laser printer stuff). Substituting 3 characters is way more informative than a blank space.


My (possibly wrong) understanding is that trigraphs were a late addition to the ANSI standard, which would place it in late 80s, well into the CRT terminal and dot-matrix era.


I think the ISO/IEC 646 invariant character set is more the issue.


In the "Swedish 7-bit ASCII", the C code "a || b" would look like "a öö b". The same character mappings were used in Finland, and that's why IRC counts the characters {|}[\] as letters (that would typically have been displayed as "äöåÄÖÅ").

On the Compis II computer (a CP/M machine built on the 80186 CPU), there were places for {|}[\] in the character set, but they were in the top half of the 8-bit characters and not generally useful for programming.


Or an ASR33


Fantastic book. I used Learn C in 21 days and that is what started everything for me. I had a second book on Linux administration and installed Slackware from 1.44mb disks, ultimately setting up pppd and using Mosaic.

Great memories!


Trigraphs make this obfuscated C submission possible: (https://gist.github.com/Property404/e31b99deb3527159e183)

I've pasted it here for convenience (formatting fixed, thanks child comment!):

   //  Are you there god??/
   ??=define _(please, help)
   ??=define _____(i,m, v,e,r,y) r%:%:m
   ??=define ____ _____(a,f,r,a,i,d)
   main(__)<%____(!_(-~-??-((-~-??-!__<<-
   ??-!!__)<<-??-(!!__<<!!__))+-~-~-??--~-~
   -~-~-~-~-??-(-~-~-~-~-??-!!__<<-~!!__),-
   ??-!__))<%??>%>_(__,___)??<____
   (printf("please let me die??/r%d bottle%s"
   " of bee%s""""??/n",(!(___
   %-~-~!!___))?--__+!___++:__+!___++,!(__-!!___)
   &&___%-~-~!!___??!??!!(___%-~-~!!___??!??!__
   -(-~!!___))?"":"s",___%-~-??-!!___<-??-!!___?
   "r on the wall":"eeeeeeer! Take one down,pass ??/
   it around")&&__&&_(__,___),"mercy I'm in pain")??<??>??>


Roughly the only good use of trigraphs these days is for obfuscated code, for example here: https://www.ioccc.org/years.html#1990_scjones

But trigraphs have gotten old even for IOCCC. In the guidelines for recent years, they specifically mention "We tend to dislike programs that ... obfuscate by excessive use of ANSI tri-graphs": https://www.ioccc.org/2020/guidelines.txt


How to format text on HN: https://news.ycombinator.com/formatdoc

  For code blocks, prefix each line with two or more spaces.


Thanks (I haven't seen this despite lurking on HN for 'a long time' and interacting with it recently). However, you clearly didn't quote the doc, which says: "Text after a blank line that is indented by two or more spaces is reproduced verbatim. (This is intended for code.)"

Small nitpick, however I am happy you linked the page.


Guess with tri-graph elimination & awk getting unicode support will have to gawk C with cpp using pipology theory.

But think the cpp has to go away first, after enough sed.

https://grayson.sh/blogs/using-piphilology-to-hide-strings

https://www.gnu.org/software/gawk/manual/gawk.html#Signature...


Note that this uses not only trigraphs but also digraphs (here `<%`, `%>` and `%:`), which are similar to trigraphs in intended usage but behave quite differently: a digraph is a proper token, not a preprocessor substitution pattern. With trigraphs enabled, `printf("??(foo??)<:bar:>%c", "quux"<:1:>)` prints `[foo]<:bar:>u`, for example. Therefore digraphs are deemed less dangerous (however obscure) than trigraphs and do not require any compiler options.
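
A small sketch of the token-versus-substitution point, using only digraphs so it does not depend on trigraph support. The function names are invented for the example, and it assumes a compiler mode that accepts digraphs (C95 or later).

```c
#include <assert.h>
#include <string.h>

/* Written with digraphs: <% and %> stand in for the braces,
   <: and :> stand in for the square brackets. */
char second_char(const char *s)
<%
    assert(s != NULL && strlen(s) >= 2);
    return s<:1:>;
%>

/* Inside a string literal the same characters are ordinary text,
   because digraphs are recognized as tokens, not substituted in place. */
const char *literal_with_digraph_chars(void)
{
    return "<:bar:>";
}
```

So `second_char("quux")` indexes the string as usual, while the literal keeps its seven characters exactly as typed.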


Bjarne Stroustrup proposed Generalized Overloading for C++2000, which not only lets you override all kinds of white space, like between two symbols separated left to right by a space (i.e. "a b" to add a to b), or two symbols separated top to bottom by a newline (i.e. "a \n b" to divide a by b, like a fraction), or even by tabs, or either kind of comment, but it also lets you override writing two symbols next to each other without any separation (i.e. "ab" to multiply a by b, which mathematicians love)!

Of course they also had to limit the number of characters per symbol to 1 in order to unambiguously support the "ab" syntax for multiplying a and b (or however you wanted to overload the "absence of white space" operator), but fortunately they mitigated that little problem by making C++ fully support Unicode, so you had thousands of single character Unicode variable names to choose from. His prophetic intuition was spot-on, now that there are so many expressive and inclusive Emoji characters to use for single character variable names!

https://www.stroustrup.com/whitespace98.pdf

I really appreciate Bjarne Stroustrup's clean simple design and coherent long term vision for C++2000, and I'm looking forward to using three dimensional white space overloading in C++3D.


I am almost unhappy to learn that this was a joke. Would have been nice to put a final (personal) nail in the C++ coffin with this insanity. However, I guess it says enough that I did have to dig quite far into the paper to realize whether Bjarne was joking or not.


We do not yet have three dimensional white space overloading unfortunately, will have to wait for the next release train, targeting C++26.

In the meantime, you can use multidimensional analog literals: http://www.eelis.net/C++/analogliterals.xhtml


See also: "What is the "-->" operator in C++?"

https://stackoverflow.com/q/1642028


And of course, its cousin the slides-to operator, described in the answer https://stackoverflow.com/a/8909176/493553 with the following example:

    while (x --\
                \
                 \
                  \
                   > 0)
         printf("%d ", x);


Whenever somebody complains about how Python uses indentation instead of { } or BEGIN END, you can prove to them that it actually does support that and more, by demonstrating that they simply need to prefix their favorite brackets or keywords with Python's unary "#" operator, like:

    for i in range(10): #{
        print(i)
    #}
or:

    for i in range(10): #BEGIN
        print(i)
    #END
You can even mix-and-match them, like:

    for i in range(10): #BEGIN
        print(i)
    #}
or:

    for i in range(10): #{
        print(i)
    #END
or turn them inside-out, like:

    for i in range(10): #}
        print(i)
    #{
or:

    for i in range(10): #END
        print(i)
    #BEGIN
Python is extremely flexible that way, and can easily strangle and eat all other languages.


It swallows them whole, one might be tempted to say.


Ha, I chuckled


I mean that's fine for code that you write, but most python code is not like that at all


And sort of the opposite of that, I once had someone say they wanted to contribute to the C++ portion of our codebase, but the only problem was they didn't know how to make the "->" character, and did they need to get a special keyboard?


Is it possible that their editor provided ligatures but they didn't know about those and so assumed it was actually a character in the source?


No, it was much too long ago for that. They were just very new to programming and had interpreted it wrong the first time they saw it (most likely in a book, that's how we used to get our first introduction to a language).


Semi-related: One of the people on my team uses a font that displays things like ">=" as "≥". I was a bit confused the first time I saw it.


Yeah, I thought this was going to involve the ternary operator. TIL about trigraphs.


Yeah, C# has some shortcuts these days of the form x <symbol>= y that compile as x = x <symbol> y. They are an actual advantage as x only needs to be stated once and it is absolutely clear that it is assignment into the value: more information readily communicated with fewer characters. It also has the null coalescing operator ??. Put those together and you can have x ??= y (if (x == null) x = y;), which is useful for lazy initialization and can be returned, making lazy initialization getters much clearer. This looked awfully similar, I was trying to figure out how you could negate null coalescing.


These days, like since C# 1.0.


It's interesting to see the history of any given piece of syntax. This specific one is called augmented assignment or compound assignment: https://en.wikipedia.org/wiki/Augmented_assignment

+:= in Algol68

=+ in B

+= in C


Still can, you just have to add precedence-changing characters ( )


From the ASCII Wikipedia page (https://en.wikipedia.org/wiki/ASCII#7-bit_codes):

> Almost every country needed an adapted version of ASCII, since ASCII suited the needs of only the US and a few other countries. For example, Canada had its own version that supported French characters.

> Many other countries developed variants of ASCII to include non-English letters (e.g. é, ñ, ß, Ł), currency symbols (e.g. £, ¥), etc. See also YUSCII (Yugoslavia).

> It would share most characters in common, but assign other locally useful characters to several code points reserved for "national use". […]

> Because the bracket and brace characters of ASCII were assigned to "national use" code points that were used for accented letters in other national variants of ISO/IEC 646, a German, French, or Swedish, etc. programmer using their national variant of ISO/IEC 646, rather than ASCII, had to write, and, thus, read, something such as

  ä aÄiÜ = 'Ön'; ü
instead of

  { a[i] = '\n'; }
> C trigraphs were created to solve this problem for ANSI C, although their late introduction and inconsistent implementation in compilers limited their use. Many programmers kept their computers on US-ASCII, so plain-text in Swedish, German etc. (for example, in e-mail or Usenet) contained "{, }" and similar variants in the middle of words, something those programmers got used to. For example, a Swedish programmer mailing another programmer asking if they should go for lunch, could get "N{ jag har sm|rg}sar" as the answer, which should be "Nä jag har smörgåsar" meaning "No I've got sandwiches".


One of the challenges of | is that it was never entirely clear whether the ASCII | should be equivalent to EBCDIC’s | or ¦. As I recall, Waterloo C wanted ¦ as its vertical bar character, although I could be wrong. On the IBM system that I used back in the 80s, we had ASCII terminals which were run through a muxer to the actual system (which was part of the magic that allowed it to have thousands of concurrent users all getting real-time access—a lot of UI was offloaded to these systems which were essentially minicomputers on their own).


Great article (that appeared on HN somewhat recently) from Ken Shirriff on the history of display terminals, and a great photo of the IBM 2848 Display Controller.

http://www.righto.com/2019/11/ibm-sonic-delay-lines-and-hist...

The next-gen was far more common: the IBM 3270 terminal hooked to a local controller that talked to the mainframe. You could also hook a printer to the controller, so you could print screens and simple forms independently from the mainframe.

You know all this, but I've always thought it was cool, and try to refresh my understanding of the setup. I no doubt have many details wrong.


There's also iso646.h which allows you to do some particularly Python-looking stuff:

  #include <iso646.h>
  #include <stdbool.h>
  #include <stdio.h>
  #define is ==
  
  bool is_whitespace(int c) {
    if (c is ' ' or c is '\n' or c is '\t') {
      return true;
    }
    return false;
  }
  
  int main(void) {
    int current;
    /* Start as if whitespace preceded the input, so previous is never read uninitialized. */
    int previous = ' ';
  
    while ((current = getchar()) not_eq EOF) {
      /* The first whitespace character after a word ends the line. */
      if (is_whitespace(current) and not is_whitespace(previous)) {
        putchar('\n');
      } else {
        putchar(current);
      }
      previous = current;
    }
  
    return 0;
  }


Of course when you are willing to use preprocessor, you can do things like Bournegol: http://oldhome.schmorp.de/marc/bournegol.html


Or give C C++ functionality: https://libcello.org/




In C++ these are genuine operators and do not require the macros from iso646.

I quite like them, but then again, I have been writing way too much python lately.


Wow, and I thought I knew C pretty well. Great post.

edited to add: I really like "Modern C" and just re-checked -- no mention of the preprocessor feature!

https://hal.inria.fr/hal-02383654/file/ModernC.pdf


I think the only remaining purpose for trigraphs is when you are at the very end of a C interview, and your amazing candidate has answered every question perfectly, and you just have to find something they might not know about--only then do you reach for the trigraphs.


No, that’s the next-to-last question. If they know that, you ask about digraphs

  <: and :> are [ and ]
  <% and %> are { and }
  %: is #
(since C95, and expanded a bit later than trigraphs)

‘Unfortunately’, none of the characters used here can be coded using trigraphs, so you can’t use trigraphs to generate digraphs in source.


A very long time ago I had to write some throwaway code on a laptop with a European keyboard with {} in very inconvenient positions (requiring pressing the alt key). I resorted to digraphs and I don't regret it.


Gasp! You mean they've heard the volatile question before?


The register keyword is far more interesting.


How about the restrict keyword.


Wow, I know all these. I only recently discovered bit addressing in C though.


Why would you do this? Some weird insecurity?


Haha no! When a candidate is that awesome, I sometimes get morbidly curious about whether there is actually an end to that depth of knowledge or if it just goes on forever. At that point, they already have the "HIRE" classification and I'm pretty much in awe!

I love it when a candidate blows through my easy, medium and hard questions and leaves me scrambling.


Salary negotiating leverage? The worse they feel, the more likely they are to accept being lowballed.


I think C also has the elusive "down to" operator.

https://stackoverflow.com/a/1642035


"-->" is not an operator in the C language, it's just a way of writing the unary operator "--" and comparison operator ">" together without any whitespace between them, since whitespace is ignored by the lexer.


Have any of you downvoters read the C grammar? There is no --> operator in C. I'll never understand some people.
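
For anyone following along, a quick sketch of how the "arrow" loop is actually tokenized. The function names and the `visited` array are made up for this example.

```c
#include <assert.h>

/* "x --> 0" is lexed as "x-- > 0": the old value of x is compared with 0,
   then x is decremented before the loop body runs. */
int countdown_arrow(int x, int *visited, int capacity)
{
    int n = 0;
    while (x --> 0) {
        assert(n < capacity);
        visited[n++] = x;   /* x has already been decremented here */
    }
    return n;
}

/* The same loop with conventional spacing and explicit parentheses. */
int countdown_plain(int x, int *visited, int capacity)
{
    int n = 0;
    while ((x--) > 0) {
        assert(n < capacity);
        visited[n++] = x;
    }
    return n;
}
```

Starting from 3, both versions record 2, 1, 0 and then stop, which is all the "down to" reading really amounts to.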


I didn't downvote you but I presume others did because the StackOverflow post I linked to essentially says the same thing.


Honestly, I thought this was about a programming language called C? rather than C.


In the spirit of C++ and C#, there could be a C?'1':'0'


"There's a problem. Some machines don't have some braces and vertical bars and such. We'll have to add keywords like OR and BEGIN and END."

"Are question marks fine?"

"Yes."

"I'll come up with something."



This reminds me of a comment on a Python discussion >2 years ago, of which I think often:

"Whether it's computer languages or human ones, as soon as you get into a discussion about the correct parsing of a statement, you've lost and need to rewrite in a way that's unambiguous. Too many people pride themselves on knowing more or less obscure rules and, honestly, no one else cares."

https://news.ycombinator.com/item?id=23051202


Completely agree with that. In fact, it's the first thing I thought of when I saw the code snippet in question. Even if you replace the trigraph with the regular || operator, it's still hard to read that piece of code. Syntactic sugars and short circuits are cool and all but most of the time they have no place in production code that's meant to be read by other developers.
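
To illustrate the readability point, here is a sketch comparing a statement of the shape `!check() || handle();` with the plain `if` it is equivalent to. Everything here (`error_has_occurred`, `handle_error`, the flag and the counter) is a hypothetical stand-in, not code from the linked question.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for an error check and an error handler. */
static bool error_flag = false;
static int  handled_count = 0;

static bool error_has_occurred(void) { return error_flag; }
static int  handle_error(void)       { handled_count++; return 1; }

/* Short-circuit form: the right side runs only when the left side is false,
   that is, only when an error has occurred. */
void check_terse(void)
{
    (void)(!error_has_occurred() || handle_error());
}

/* The same control flow written as an ordinary if statement. */
void check_explicit(void)
{
    if (error_has_occurred()) {
        handle_error();
    }
}

/* Walks both forms through the no-error and error cases. */
void demo_short_circuit(void)
{
    error_flag = false;
    handled_count = 0;
    check_terse();
    check_explicit();
    assert(handled_count == 0);   /* no error: handler never runs */

    error_flag = true;
    check_terse();
    assert(handled_count == 1);
    check_explicit();
    assert(handled_count == 2);
}
```

Both functions behave identically; the second one just does not ask the reader to recall short-circuit evaluation rules.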


I'd say, "Congratulations! You're one of today's lucky 10,000!", but trigraphs aren't really much fun. Just another reminder that C is old, and computing is even older.

I've used uppercase-only terminals, and I've used ancient C, but not at the same time.


Ancient C didn't have trigraphs. My copy of K+R (1978) doesn't mention them.


No, they were a design error introduced by the ANSI committee.


I thought ISO added them when C went from ANSI C (1989) to ISO C (1990), along with wchar.h and such. I might misremember, though, it's been a long time since I did anything serious with C.

Come to think of it, didn't they remove trigraphs in one of the more recent iterations of the standard?


They did remove trigraphs in one of the more recent iterations. It's possible that it was as you say, but I have this vague memory that it was ANSI who added them. I think maybe what the ISO added were https://en.wikipedia.org/wiki/C_alternative_tokens.



I've never seen them used anywhere.


They were meant for coding C on machines that had even less than ASCII as available text encoding really. So no wonder you never see them.


They were meant, mostly, for punch-card machines.

So if you started programming anywhere after the point in time when you needed to hand off your code to a punch card operator, you're unlikely to have seen them.


They were meant to support EBCDIC.


They're good for obfuscating source code but AFAICT that's about it on modern machines


Less obscure if physical print type head doesn't have the corresponding trigraph representation.

Unicode is a worthy successor to trigraphs -- no need for pre-processing!

Guess with tri-graph elimination & awk getting unicode support will have to gawk C with cpp using pipology theory.

But think the cpp has to go away first, after enough sed.

https://grayson.sh/blogs/using-piphilology-to-hide-strings

https://www.gnu.org/software/gawk/manual/gawk.html#Signature...


They are still around in C though.


They are being removed in C23.


I see, thanks.


gone in C23


... but if someone sed it back in .....


I see, thanks.


Huh, I never realized that C++ standards were removing C features. Time to be more careful about using g++ for everything.


C++ has never been a strict superset of C. The most obvious examples are the "class" and "new" keywords, which can be used as identifiers in C but not in C++. There are more subtle differences as well, such as character literals having type int in C and char in C++.
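Both differences are easy to check with a small sketch. The function names below are made up for illustration, and the code is only meaningful when compiled as C (a C++ compiler rejects the second function):

```c
#include <stddef.h>

/* In C a character literal such as 'a' has type int, so this returns
   sizeof(int). In C++ the literal has type char and the size is 1. */
size_t char_literal_size(void)
{
    return sizeof('a');
}

/* "class" and "new" are ordinary identifiers in C. */
int keywords_as_identifiers(void)
{
    int class = 1;
    int new = 2;
    return class + new;
}
```

Comparing char_literal_size() against sizeof(int) is a quick way to tell which language a file was compiled as.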


Another really common one is that converting from void * to any other object pointer type doesn't require a cast in C, but it does in C++:

  #include <stdlib.h>

  int main()
  {
    int *foo = malloc(sizeof(int));
    return 0;
  }
That works in C, but not in C++.

There's actually another subtle difference in there: main() means "unspecified arguments" in C and "no arguments" in C++. ("No arguments" in C would be main(void).) However, unspecified-argument declarations are no longer commonly used in C, whereas implicit conversions from void * to other pointer types are very common in C.


The `func()` vs `func(void)` difference has been deprecated for a while, and is removed in C23.


Using unions for type punning is legal C, but the exact same code has UB in C++

The modern C++ way to do this ~safely isn't legal C, and yet the type pun isn't safe in C++. I believe using memcpy() to launder the bits is legal in both languages and in some cases your compiler can figure out what you're doing and not actually emit the unnecessary copy.
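For reference, the memcpy() approach might look like the sketch below. The function name float_bits is made up for illustration, and the code assumes float is 32 bits wide (checked at compile time with a negative-array-size trick that is accepted by both C and C++ compilers):

```c
#include <stdint.h>
#include <string.h>

/* Compile-time check: the array size is -1, and compilation fails,
   unless float and uint32_t have the same size. */
typedef char float_is_32_bits[(sizeof(float) == sizeof(uint32_t)) ? 1 : -1];

/* Copy the object representation of a float into a uint32_t.
   No pointer of the wrong type is ever dereferenced. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

On a platform with IEEE 754 floats, float_bits(1.0f) gives 0x3F800000. Optimizing compilers commonly turn the memcpy() call into a plain register move.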


I used a few different C compilers in one project. I ended up at memcpy and byteswapping to get data between different instances of the code correctly (some ARM, MIPS, and x86, and each of those can set the byte order). Using a union is possible if the compiler supports packing, the bytes happen to be in the same order, and the compiler keeps the struct in the same order. I found that is not true of all compilers by default. I was massively annoyed at having to rewrite about 50 file writes/reads, which had been nice and simple, as massive memcpy cascades. Inside the same code on the same compiler you can get away with a lot of things. But port to another arch, or try to get binary data out of your program into another, and good luck.

These days there are realistically 4 compilers people use and they tend to behave mostly the same, and there are also nice libs that do most of this for you. That was the same project where I learned that not all printfs are created equal. Different CRTs do very different things even in the same compiler family. There is a reason everyone decided to use JSON and XML to transport data: that mess.
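One common way to sidestep both byte order and struct layout is to read and write each field byte by byte in a fixed order. A minimal sketch, with made-up helper names and big-endian chosen arbitrarily as the on-disk order:

```c
#include <stdint.h>

/* Store v as four bytes, most significant byte first, regardless of
   the byte order of the machine running the code. */
void put_u32_be(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Rebuild the value from four bytes written by put_u32_be(). */
uint32_t get_u32_be(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Because only shifts and masks on values are involved, the result does not depend on packing pragmas or on how the compiler lays out structs.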


Ah, what an elegant example, haha.


Using g++ for C code is a recipe to get badly burnt - for unrelated reasons. Trigraphs are disabled in gcc by default anyway.


That's true for any C++ compiler, really. Although C++ tries to retain some element of compatibility with C, there have always been differences (you can name a variable `class` in C but not in C++).


By default, GCC ignores trigraphs in C code too.

You have to explicitly pass -std=c17 (or whatever) to get standard-conforming behavior including trigraphs.



Years ago I wrote a perfectly reasonable comment like /* WTF??!?!!?!???? */ and the old C compiler complained about "invalid trigraph". A syntax error in the middle of a comment!

Took me a while to figure out that "trigraph" was referring to some part of "??!?!!?!????" and not "WTF".


That's a bug; there is no such thing as an invalid trigraph. ?? followed by any character other than =, /, ', (, ), !, <, >, or - is not a valid trigraph, but that doesn't make it an invalid trigraph. It just makes it not a trigraph. It's perfectly valid to have ??? in a comment or in a string literal.
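The rule is simple enough to sketch in a few lines of C. This is not how any particular compiler implements it, just an illustration of the replacement step; both function names are made up:

```c
/* Return the replacement for the sequence "??c", or 0 if c does not
   complete a trigraph. */
static char trigraph_char(char c)
{
    switch (c) {
    case '=':  return '#';
    case '/':  return '\\';
    case '\'': return '^';
    case '(':  return '[';
    case ')':  return ']';
    case '!':  return '|';
    case '<':  return '{';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Rewrite s in place, replacing each trigraph with its single
   character. Any other run of question marks is copied unchanged. */
void replace_trigraphs(char *s)
{
    char *out = s;
    while (*s != '\0') {
        char r;
        if (s[0] == '?' && s[1] == '?' && (r = trigraph_char(s[2])) != 0) {
            *out++ = r;
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}
```

There is no error path at all: a ?? that is not followed by one of the nine characters is simply copied through.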


Are you telling me that C compilers in the early 90's had bugs and confusing error messages??!!?!??? WTF?!??!?!?


Oh so that's why they're called trigraphs, because there are 3 valid states?

Valid Invalid ??? (Exercise for the reader to decide if this is a trigraph or not)


Every time I hear about trigraphs I think of this horror:

http://stackoverflow.com/questions/53315710/ddg#53315821


There are two aspects to this, the trigraph, and using the short circuiting behaviour of the binary logic operator for control flow.

The latter is a very common idiom in Julia code, which I found obscure and puerile at first (“look how smart I am”), but have come to appreciate as concise and natural by now.

For example:

  function fact(n::Int)
     n >= 0 || error("n must be non-negative")
     n == 0 && return 1
     n * fact(n-1)
  end
https://docs.julialang.org/en/v1/manual/control-flow/#Short-...
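The same short-circuit idiom works in C. Here is a rough sketch with made-up names (handle_error, check, errors_handled). Note that the handler has to return a value, because both operands of || must have scalar type in C:

```c
static int errors_handled = 0;   /* counter used only for illustration */

/* Hypothetical handler. It returns int so it can be an operand of ||;
   a function returning void could not be used this way. */
static int handle_error(void)
{
    errors_handled++;
    return 1;
}

/* Calls handle_error() only when ok is zero, equivalent to
   "if (!ok) handle_error();". The cast to void discards the result. */
void check(int ok)
{
    (void)(ok || handle_error());
}
```

When ok is non-zero the right-hand side is never evaluated, which is the whole trick.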


In addition to trigraphs, there is apparently a set of C alternative tokens, defined as macros in <iso646.h> as follows:

  #define and &&
  #define and_eq &=
  #define bitand &
  #define bitor |
  #define compl ~
  #define not !
  #define not_eq !=
  #define or ||
  #define or_eq |=
  #define xor ^
  #define xor_eq ^=
I suppose that allows for code like this:

  if (x or not y or not z) {
      return 1;
  }
https://en.wikipedia.org/wiki/C_alternative_tokens
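In C the spellings only exist after including <iso646.h>, so a complete version of that snippet would look roughly like this (the function name is made up):

```c
#include <iso646.h>

/* Same condition as "x || !y || !z", spelled with the <iso646.h>
   macros. */
int any_condition(int x, int y, int z)
{
    if (x or not y or not z) {
        return 1;
    }
    return 0;
}
```

In C++ the same words are built into the language, so the include is not needed there.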


Makes for great obfuscated C++.

    #include <iostream>

    template <typename T>
    void print(T const bitand foo) {
        std::cout << foo << std::endl;
    }


    void print(auto const bitand foo) {
        std::cout << foo << std::endl;
    }
Since C++20.


The instructor at the branch college where I learned C++ in the late 90's taught us that those were the preferred operators and that the old operators belonged in the wastebasket of history along with printf and str* functions.

It made for some amusing group projects when I got to university, when classmates had never seen those operators and were trying to figure out where they were coming from and why I would write such silly things. I trolled them by replacing all my brackets with `begin` and `end` in the next assignment before moving to the standard use of C operators for the rest of the class.


Anecdote: An online judge website (which is pretty well known in Korea) has an easy problem[0] asking to write a program which adds "??!" to input. A lot of beginners' C/C++ submissions got "Wrong Answer" verdict because of trigraphs.

[0]: https://www.acmicpc.net/problem/10926
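A sketch of how to dodge that in C: escape one of the question marks with \? so the three characters never appear next to each other in the source. The function name append_surprise is made up for illustration:

```c
#include <stdio.h>

/* Write name followed by the three characters ? ? ! into dst.
   The second question mark is written as \? so that the source never
   contains the trigraph for | inside the string literal. */
int append_surprise(char *dst, size_t cap, const char *name)
{
    return snprintf(dst, cap, "%s?\?!", name);
}
```

Splitting the literal as "?" "?!" should also work, since trigraph replacement happens before adjacent string literals are concatenated.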


Reminds me of the "goes to" operator [1]

[1] https://stackoverflow.com/questions/1642028/what-is-the-oper...


This sort of practice goes back to BCPL, which Wikipedia says is the first braced programming language. Because { and } weren't universally available, compilers also supported the sequences $( and $) to represent these, which were typeable and printable on just about anything.

https://en.wikipedia.org/wiki/BCPL

This is the earliest example of this sort of thing I'm aware of. Is there an earlier example?

Also, BCPL supported // for comments, again, probably the first use of this sequence.


> Has Microsoft Windows finally been open-sourced or where did this come from?

This comment on the SO post made my day. :D


In gcc I got:

    1.c:1:11: warning: trigraph ??< ignored, use -trigraphs to enable [-Wtrigraphs]
Out of curiosity, is there a preprocessor directive to enable support?


from [1], trigraphs or not:

  int main() {
     [](){}();
  }
is still weird.

Wonder if there will be a request for an emacs macro to handle the replaced cpp trigraphs? [2]

[1] https://zygoloid.github.io/cppcontest2018.html [2] https://www.emacswiki.org/emacs/CppTemplate


Good news, in C++20 you can add <> there somewhere, although probably it can't be empty.

Anyway, probably obscure enough:

  int main() {
      []<class=void>(){}();
  }


And in C++23 you can drop the argument parentheses, so this is also a valid lambda call :)

  int main() {
      []{}();
  }


[]{}() was always valid, but you can drop the arguments in more cases in C++23.


If we deprecated trigraphs and removed that step from the compiler would it speed compilation up much? I’m going to guess maybe by milliseconds?


They are already deprecated and removed in C23


Probably not by any measurable amount


I imagine microseconds or less


0 if sed used to expand the trigraphs before passing output to cpp/compiler.


Always zero if you make it someone else’s problem :-)


microseconds, not milliseconds


C++17 removed trigraphs. Sadly will no longer work.


s/Sadly/Gladly/


Oh trigraphs may you never die



