What I mean is that if the language has multiple ways to express statements that are semantically absolutely identical but differ in just the 'sugar', then it would be useful to have a way to automatically 'normalize' the code to always use a single idiomatic way of doing it (which in many languages is the 'sugar' version).
It's just like with indentation and brace styles - the original authors may each have their own preferences, but when reading code, it's more efficient to view the code in a single style according to my preferences, and that can be done automatically by showing me the code as reindented, with my preferred brace positioning, my habitual syntax highlighting color scheme ... and also with a standartized level of sugaring.
Come on, you're intentionally and misleadingly nitpicking if you're claiming that the halting problem is a restriction for this. You can trivially detect a large set of terms that are semantically identical; and all the rest - possibly identical and possibly not identical - can be left alone. "i = i+1;" <=> "i++;" and such.
In particular, "syntactic sugar" explicitly refers to substitutions that can be (and are) made by considering only the local scope, and the rest of the program can't change the validity of this substitution; in the cases for syntactic [de]sugaring all the changes can be performed safely by looking at a statement or two (or their AST representations) in complete isolation.
If you have two functions that happen to be semantically equal but differ in other things than pure sugar - well, then they won't be affected by desugaring or sugaring, obiously, and we don't need to detect if they're actually equal or not.
Translating from source code to AST is clear and unambiguous and doesn't require anything related to the halting problem.
Translating from AST to some canonical representation of source code can also be done - it's not trivial, but if you have the original code, then you can get a better result than the current decompilers do.
This will result in cases where transforming back and forth will cause
A => A' => A
and
B => A' => A
which is the intended result.
That's not how I read gp, exactly.
What he is saying sounds more like taking the AST of a program, desugaring at much as possible, and then re-sugaring it in such a way as to produce a consistently styled codebase, sort of like gofmt but more opinionated and linter-like.
What he wrote is "statements that are semantically absolutely identical but differ in just the 'sugar'". I was just quoting him back. Of course, one can always construct various hacky approximations to this, and maybe that's what he intended, even if it isn't what he wrote; I couldn't possibly tell.
I think that this is (quite) a bit like claiming (verified) compiler optimisations are impossible because they, too, have to re-write any given expression only to a semantically identical one. One wants to detect only pairs of statements that are semantically identical; this does not require detecting all such pairs.
Look, the OP could have been precise. We have terminology that makes all these things very exact. That's why terms like "soundness", "completeness", etc. were invented.
The OP wrote something that is ambiguous. I was pointing out why it's tricky. You're welcome to interpret it in a way that makes it less tricky.
However, what you're asking for still requires a pretty strong decision procedure if you want something that is actually of use on non-trivial examples. If you have expressions that are nested, then you need to be able to reason about all possible contexts; if you have state, you need to reason about all possible states. Maybe you have lots of experience with this kind of reasoning and have gotten used to it, but I find the necessary underlying proofs (whether human or automated) pretty heavy going.
The 'absolutely' should have hinted that 'semantically' wasn't the full extent of what he meant. Focus your quote onto "differ in just the 'sugar'" and you see exactly what he meant. He didn't misspeak. Sugar is a very specific term.
I'm not actually unclear on the meaning of these words. There happen to be multiple possible interpretations of what the OP wrote, some demanding problematic things and others impossible ones. As I said in my reply to @JadeNB, we have unambiguous terminology for all this. Would have helped to have used it.
As for the meaning of "sugar", I _think_ I have some inkling about what it means. We're in a discussion thread about a document on desugaring. I happen to be its author.
Glad you have a very solid understanding of sugar. But that means you're being intentionally obtuse when you invoke the halting problem and discuss the entire set of programs that produce the same results.
I think from https://news.ycombinator.com/item?id=8294806 that the core of your argument is about it being difficult to tease out sufficiently complex sugar, but to be honest I've almost never seen complex sugar. Most of this stuff, especially when PeterisP talks about being idiomatic, is simple transformations that you can do easily in either direction.
You don't need a magical sugaring oracle, you need a language-specific sugaring oracle. And if you have a language where the sugar isn't decidable, I'd argue something has gone wrong.
1. Then you haven't seen very much sugar. Take a look at the transformations used in Racket, for instance. Entire parts of the language that would be built into other languages are seamlessly expressed through extremely sophisticated sugar. You can "argue" all you want, but it's a perfectly sensible language definition approach, and it works superbly, without any clear indication of anything having "gone wrong".
2. I don't know what it means for "sugar" to be or not be "decidable": that sentence isn't even type-correct. But even guessing at what you might mean, making something language-specific isn't going to help you, assuming a typical Turing-complete language, if the underlying problem is not decidable. (I don't even know what a language-independent "magical sugaring oracle" might be, because any "sugaring" procedure would have to be defined relative to some language that it works on. Except, maybe, in the "magical" case, about which I don't know very much, I'm afraid.)
Of course you can approximate it – you can always approximate anything you want – but you're then solving a different problem. It's fine perfectly to define that as the problem you want solved. But it helps to be clear about what one wants, and the OP was not.
By 'decidable' I meant that you can write an always-correct, always-halting algorithm for taking a desugared program in the language and transforming it into an 'idiomatic' sugared form. Then you have a nice bidirectional idempotent transform.
(The 'gone wrong' was more in the definition of 'sugar' but I don't want to argue that, just ignore that line.)
Most languages have a fixed set of simple sugar. There is no approximation needed; you can make a 100% perfect sugar/desugar mechanism. I'm glad you agree that it should be language-specific. That way we can discuss the typical language where the language itself is Turing-complete but the sugar-related transformations are not even close to Turing-complete.
The original statement discussed having an ability to transform code into an idiomatic form as a 'language feature'. This means it's intertwined with the design of the sugar itself, and it's up to the language designers to make it feasible. When you discuss all the ways a program could arrive at the same result, you are going far beyond the realm of the finite sugars that apply to a specific language. The Halting Problem is not relevant.
Just because you can algorithmically desugar doesn't mean you can invert that transformation. The property you're looking for is invertibility, not idempotence (which neither follows nor is relevant), and it doesn't follow.
As for the power of the desugaring system, you apparently decided to just ignore my first paragraph as an inconvenient truth. I could repeat it. But I can give you even simpler examples that are well-known as canonical examples of the genre: C++ templates, Scheme macros. Scheme's macros are about as simple a desugaring system as you could want. And yet even its _simplest_ version — too weak to express many of the language's constructs, way weaker than Racket's — is Turing-complete. I don't even know what you're talking about with "finite" sugars (all programs are finite by definition, but that doesn't imply anything about their dynamic properties).
I'm guessing haven't really thought about this problem much and are just speculating casually. You might want to read work like
which shows some of the complexity involved. (Just to make you happy, the desugaring in this paper uses about as "simple" a desugaring mechanism as you could want. And it is about an extended version of precisely the problem of undoing desugaring, hence the name "resugaring".) Once you've understood this material we'd be in a position to actually have a meaningful conversation.
I'm not looking for the ability to strictly invert, I'm looking for the ability to make a canonical sugaring. desugar+resugar may or may not give me the same as the original, but it will be idempotent.
Scheme macros? Oh wow we really are talking about completely different things. I'm talking about fixed transformations that are part of the language (just like the original comment seems to be, as far as I can tell), not arbitrary user-supplied transformations.
And I admit that I'm coming at this rather casually, but I'm considering languages I've used that have very little sugar and writing this sort of mechanism would be easy.
I looked at that paper a bit, they're tackling a very hard problem by taking languages with complex sugar and making a program that automatically processes sugar into an algorithm for resugaring. That's much harder than codesigning language and sugar to be amenable to resugaring.
You don't need a perfect Desugaring Oracle, just a Good Enough, 80% solution. You could also get a lot of mileage out of non-turing-complete subsets of the language.
It doesn't matter what you _need_: my point is you can't _have_ one in the general case.
As for your 80% and non-TC solutions, see my other response (https://news.ycombinator.com/item?id=8294806). Maybe you've built such engines, in which case I'd love to hear more (because we're actively working on related problems). It's pretty damn hard because of the nature of reasoning you need for non-trivial examples.
For more detail, I suggest reading chapters 35, 36, and 37 of the first edition of this book (PLAI), which provide even more detail. See http://www.plai.org/.
What I mean is that if the language has multiple ways to express statements that are semantically absolutely identical but differ in just the 'sugar', then it would be useful to have a way to automatically 'normalize' the code to always use a single idiomatic way of doing it (which in many languages is the 'sugar' version).
It's just like with indentation and brace styles - the original authors may each have their own preferences, but when reading code, it's more efficient to view the code in a single style according to my preferences, and that can be done automatically by showing me the code as reindented, with my preferred brace positioning, my habitual syntax highlighting color scheme ... and also with a standartized level of sugaring.