
GCC tiny - ingve
http://thinkingeek.com/gcc-tiny/
======
modulan5
I appreciate the series of articles on Tiny and think they are fine, but I
have a few minor points:

1\. Missing “string” types in Tiny?:

In Part 8 a “bool” and “string” type were both first introduced. Tiny’s type
system was extended with “bool” in Part 9.

But the “string” type (not to be confused with a string literal) was never
added to Tiny’s type system.

I assume an array of “char” would cover that “string” type if “char” is added
as an extension of the type system.

2\. Expressions in Tiny:

The rule "unary-op" shows that unary "plus", unary "minus" and "not" are all
unary operators in Tiny. In Part 4, the rule is confirmed with a statement
that "not" is a unary operator.

But in the table of operator priorities in Part 1, "not" is grouped with "and"
and "or" instead of with unary "plus" and unary "minus", so it has the
incorrect priority there.

3\. In the GitHub file for Tiny’s grammar, the rule for expression is:

expression -> primary | unop op expression | expression binop expression

I believe this should be changed to:

expression -> primary | unop expression | expression binop expression
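
To make points 2 and 3 concrete, here is a minimal sketch of a
precedence-climbing parser for the corrected rule. The binding powers, the
token pattern and the function names are my own assumptions for illustration,
not taken from the Tiny sources; the only point is that "not" is handled with
the other unary operators and so binds tighter than "and" and "or".

```python
import re

# Hypothetical binding powers, chosen for illustration only.
BINARY_PRECEDENCE = {
    "or": 1, "and": 1,
    "=": 2, "!=": 2, "<": 2, "<=": 2, ">": 2, ">=": 2,
    "+": 3, "-": 3,
    "*": 4, "/": 4, "%": 4,
}
UNARY_OPS = {"+", "-", "not"}
UNARY_PRECEDENCE = 5  # higher than every binary operator above

TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|!=|<=|>=|[-+*/%=<>()])")


def tokenize(text):
    """Split the input into numbers, names, operators and parentheses."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_operand(tokens, pos):
    """primary | unop expression (the operand of a unary op binds tightly)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok in UNARY_OPS:
        operand, pos = parse_expression(tokens, pos + 1, UNARY_PRECEDENCE)
        return (tok, operand), pos
    if tok == "(":
        inner, pos = parse_expression(tokens, pos + 1, 0)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return inner, pos + 1
    if tok == ")" or tok in BINARY_PRECEDENCE:
        raise SyntaxError(f"unexpected token {tok!r}")
    return tok, pos + 1


def parse_expression(tokens, pos, min_prec):
    """expression binop expression, resolved by precedence climbing."""
    left, pos = parse_operand(tokens, pos)
    while (pos < len(tokens)
           and tokens[pos] in BINARY_PRECEDENCE
           and BINARY_PRECEDENCE[tokens[pos]] >= min_prec):
        op = tokens[pos]
        # Parsing the right side one level higher makes operators left-associative.
        right, pos = parse_expression(tokens, pos + 1, BINARY_PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left, pos


def parse(text):
    """Parse a whole expression into nested tuples."""
    tokens = tokenize(text)
    tree, pos = parse_expression(tokens, 0, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

With these assumed priorities, `parse("not a and b")` groups as
`("and", ("not", "a"), "b")` rather than negating the whole conjunction.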

~~~
brudgers
I'm not really sure what a string type is because, mathematically, a string
consists of letters which are part of an alphabet which are accepted or
rejected by a state machine. The entire alphabet might consist of 0 and 1 (and
in computing _practice_ that's what everything boils down to eventually). In a
language like C, a char is just a convenient abstraction over a byte and a
byte is just an abstraction over bits...sometimes 8, sometimes not.

A difficulty arises when, as is often the case, "string" is used to denote
human readable text. That this happens is understandable due to the use of
char as an abstraction for byte and consequently arrays of char being able to
represent human readable text in the several spoken languages common among
programming communities of yore.

The high level abstraction missing from most languages is a text type that
expresses the idea of human readability and is free from the ambiguity of
string types. Or to put it another way, Unicode strings will always be
problematic because the organization of characters of human alphabets is not
always entirely logical.

~~~
fao_
"I'm not really sure what a string type is because, mathematically, a string
consists of letters which are part of an alphabet which are accepted or
rejected by a state machine."

A C string is a chunk of memory containing either bytes or words, that is
terminated with a null (byte|word) at the end.

A UTF-8 string uses the same transport, but it is blocked into "Codepoints"
(Groups of 1 - 4 bytes, dictated by _n_ high bits being set to a specific
configuration).

A UTF-8 "Grapheme" (The actual character that gets printed) is a number of
Codepoints that are interpreted as being grouped together, as per the NBNF
given in the Unicode spec.

All of this takes place in a block of memory of a number of words or bytes.

That's all.

A string type that conforms to C Strings is basically just a block of memory
of a certain size that happens to have things in it that conform to what we
would consider graphemes to be. Some String types have the block of memory in
a struct with a number and sans the last null byte, some string types use
blocks of lists, or a list of characters. But the most prevalent is the C
String.
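
A minimal sketch of the byte and code point layers described above, in
Python. The helper names and the sample text are made up for illustration;
grapheme grouping is left out because the Python standard library does not
provide it.

```python
def c_strlen(buffer: bytes) -> int:
    """Number of bytes before the first NUL byte, in the spirit of C's strlen."""
    return buffer.index(0)


def sequence_length(lead: int) -> int:
    """Length in bytes of a UTF-8 sequence, read from the lead byte's high bits."""
    if lead < 0x80:            # 0xxxxxxx
        return 1
    if lead >> 5 == 0b110:     # 110xxxxx
        return 2
    if lead >> 4 == 0b1110:    # 1110xxxx
        return 3
    if lead >> 3 == 0b11110:   # 11110xxx
        return 4
    raise ValueError("not a valid UTF-8 lead byte")


def count_code_points(utf8: bytes) -> int:
    """Count every byte that is not a continuation byte (10xxxxxx)."""
    return sum(1 for byte in utf8 if (byte & 0xC0) != 0x80)


# 'e' + COMBINING ACUTE ACCENT, EURO SIGN, GRINNING FACE
sample = "e\u0301\u20ac\U0001F600"
encoded = sample.encode("utf-8")

# The two representations mentioned above: NUL-terminated, and a length
# stored next to the bytes without a terminator.
c_string = encoded + b"\x00"
counted_string = (len(encoded), encoded)
```

Here the same text is 10 bytes and 4 code points, and the first two code
points ('e' plus the combining accent) would normally be displayed as a single
character.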

~~~
brudgers
I agree and apologize for not being clearer.

What I was getting at is that as a Type within a type system and independent
of a particular language, it is not clear what the words "string type" mean.
As you point out, in C, the string type is an abstraction over a contiguous
memory block and allows addressing by bytes -- Addressing blocks of memory by
bytes is often convenient when processing the values stored in the block with
a state machine.

UTF-8 strings exist because the string type as an abstraction over bytes fed
into a state machine became conflated with the idea that strings had an
intrinsic relationship to human readable text.

Erlang is a programming language that does not conflate strings with human
readable text. Perl6 is a language that handles human readable text better
than average.

~~~
Koshkin
Well, if you want a completely general definition of what a string type is,
you could get one by following an axiomatic approach and considering the set
of (abstract) operations, or functions, that are allowed to be done on
strings: concatenation (a binary operation with the neutral element called
"the empty string"), taking a substring, having a string-to-integer map
"length", etc. From this general perspective, it would be incorrect to say,
for example, that a string consists of "individual characters"; instead you
might say that the smallest substrings of any (non-empty) string all have the
length 1; in this sense the notion of a "character" does not even appear in
the abstract definition of the string...
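
As a rough sketch of that axiomatic view, the check below tests the laws on a
few sample values in Python. The function name is made up, and checking
samples is of course not a proof; it only shows that anything with a suitable
`+`, `len` and slicing passes, whether or not it has "characters".

```python
def satisfies_string_axioms(a, b, c, empty):
    """Check the abstract string laws on three samples and a candidate empty string."""
    associative = (a + b) + c == a + (b + c)
    neutral = empty + a == a and a + empty == a
    length_additive = len(a + b) == len(a) + len(b)
    empty_has_length_zero = len(empty) == 0
    # The smallest non-empty substrings are themselves strings of length 1.
    unit_substrings = all(
        type(a[i:i + 1]) is type(a) and len(a[i:i + 1]) == 1
        for i in range(len(a))
    )
    return all([
        associative,
        neutral,
        length_additive,
        empty_has_length_zero,
        unit_substrings,
    ])
```

For example, `satisfies_string_axioms("ab", "cd", "e", "")` holds, and so does
the same check on `bytes` values or on tuples of integers with `()` as the
empty string.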

~~~
brudgers
I don't disagree, and I should have said 'symbols.'

After I wrote what I wrote I went for a run and I suppose another way of
making the distinction I am making between strings and text is that a machine
can necessarily decide what is and is not a valid string and a machine cannot
decide what is and is not a valid text (unless we conflate strings and text).

Or to put it another way the recursively enumerable languages are closed under
union, intersection, concatenation and Kleene star. That's the realm of
strings. Text is not closed under those properties, e.g. 'hello*' is not
guaranteed to produce a valid text.
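
A small sketch of the string side of that distinction, assuming 'hello*' is
read as a regular expression ("hell" followed by zero or more "o"). The names
are made up for illustration.

```python
import re

# Assumption: 'hello*' is interpreted as a regular expression.
HELLO_STAR = re.compile(r"hello*")


def in_language(candidate: str) -> bool:
    """Decide membership mechanically: the whole string must match."""
    return HELLO_STAR.fullmatch(candidate) is not None


samples = ["hell", "hello", "hellooooo", "help"]
accepted = [s for s in samples if in_language(s)]
```

The machine accepts "hellooooo" without hesitation; whether that counts as a
valid text is a question it has no way to answer.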

~~~
Koshkin
You are right: a text is not a string - because it is subject to completely
different axioms (the "grammar"). I guess, this is similar to, say, groups, in
mathematics, and sets being different _categories_ , and even though the usual
naive definition of groups is based on sets, this is only possible because of
the existence of the "forgetful functor"... In fact, in computing, a string is
not considered a (valid representation of a) text until it has been
successfully _parsed_ and, often, translated into its "true" representation
which is not a string at all!

------
astrobe_
Does it provide interesting advantages over "transpiling" to C?

~~~
le-mark
This is my question as well; it seems like a boatload more work than writing C
to a file and calling gcc on it.

One thing I can think of: if the language you're implementing has exceptions,
you'd have to implement them the naive way with setjmp/longjmp if you target
C. Supposedly, if you used GCC the way this article shows, you could have GCC
generate "real" exceptions.

------
aleden
This is so ugly.

> The next step is telling GCC configure that we are going to build GCC with
> tiny support. This will fail. Do not worry, this is expected.

Back in the 2.9 - 3.8 LLVM days there was a lot of API breakage between
releases. A lot of people I knew who worked with LLVM detested this, but would
you rather deal with this kind of cruft?

~~~
jordigh
Wait, so has it stopped breaking? Because their breakage essentially killed
the LLVM-based JIT compiler project for GNU Octave. Someone wrote it based on
the C++ "API" and we spent time trying to keep up with it and then we gave up.
Nobody really understood it well enough to rewrite it for the C API or knew if
it was even possible, so we abandoned it.

~~~
aleden
The changes in the LLVM JIT were definitely a disappointment. IIRC it used to
be a lot more lightweight and the API was simpler to use.

But in hindsight it's hard to argue that the LLVM devs shouldn't have messed
with it. Before, you couldn't get symbolized stacktraces or use C++ exceptions.
LLVM's primary goal has always been to produce the best code, and not
necessarily try to be the fastest at doing it [1].

In the 3.x days, it was a fast evolving project. I remember projects I managed
that used LLVM would no longer compile between releases because e.g. they
would rename a header file.

"Just to rename a header file??" one might say; it's a subjective question
whether this philosophy hurt or helped the project in the end. It might take
years until that answer is forthcoming.

I'm sorry to hear about what happened with GNU Octave. Maybe once LLVM proves
to be stable, someone new will write up the code :)

[1] Tangentially related: [https://bellard.org/tcc/](https://bellard.org/tcc/)

    Compiler        Time(s)  lines/second  MBytes/second
    TinyCC 0.9.22   2.27     859000        29.6
    GCC 3.2 -O0     20.0     98000         3.4

~~~
jordigh
> But in hindsight it's hard to argue that the LLVM devs shouldn't have messed
> with it.

Not that hard, software stability is a very desirable property:

[http://stevelosh.com/blog/2012/04/volatile-software/](http://stevelosh.com/blog/2012/04/volatile-software/)

