

Zinc: a low level language between assembler, C and C++ with Ruby-like syntax - tianyicui
http://tibleiz.net/zinc/

======
psnj
I was surprised by my almost-panicky reaction to seeing:

    
    
      Identifiers Can Have Blanks
    
      open_window_with_attributes(...)
      becomes:
      open window with attributes (...)
    

I think I actually felt that wrongness in my stomach. Like a more intense
version of seeing our corporate network shared drive's files with spaces and
parens in them.

I guess I'm old.

~~~
scott_s
I had a similar reaction, and I'm not sure that it's a "damn kids, get off my
lawn" reaction. Specifying an unambiguous grammar may be difficult - which
implies parsing may become a problem.

An implementation exists, so the author has something working, but I'm
wondering how robust the parsing is. I haven't seen many code examples (only
short fragments on the page), so I don't know what potential issues, if any,
there are. But, this is the sort of thing that could significantly complicate
adding new language features that require additional syntax.

edit: I'm perusing the source for the compiler, which is of course written in
Zinc. This code from the main driver of the compiler perhaps gives a better
feel for how it may look in practice:

    
    
      while i < argc
        def arg = argv[i]
    
        if is equal (arg, "-debug")
          debug = true
    
        elsif is equal (arg, "-v")
          version = true
    
        elsif is equal (arg, "-u")
          unicode = true
    
        elsif is equal (arg, "-o") && i < argc-1
          out filename = new string (bundle, argv[++i])
          to OS name (out filename)
    
        elsif is equal (arg, "-I") && i < argc-1
          append (include path, new string (bundle, argv[++i]))
    
        else
          filename = new string (bundle, arg)
    
        end
    
        ++i
      end
    

From an aesthetic point of view, it doesn't look that bad. In this example, I
think "is equal", "out filename", "to OS name" and "include path" are all
identifiers. But I'm still wondering what kind of parsing and lexing issues
may arise.

~~~
nene
I already have a hard time parsing this code. The main problem I see is that to
read the code I have to know every single keyword in the language.

For example I was wondering if "new" is a keyword. If it is, then "new string
()" might be something interesting, otherwise it's just a function call.

Similarly this raises a question of whether I can write the following code:

    
    
      if end of line (str)
    

This might or might not be permitted because "end" is a keyword. If it is
permitted, then the result looks pretty damn ambiguous to me. If it's not then
I have to name my identifier differently, like so:

    
    
      if end_of_line (str)
    

But then I'm screwing up the style of my code...
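
To make the ambiguity concrete, here is a minimal C sketch of a lexer that merges adjacent words into one identifier and lets any keyword end the identifier. The keyword list, `lex_line`, and the bracketed output format are all made up for illustration; I have not checked how the real Zinc compiler handles this.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical keyword list for illustration; not taken from the Zinc docs. */
static const char *const KEYWORDS[] = { "if", "elsif", "else", "end", "while", "def" };

static int is_keyword(const char *word, size_t len) {
    for (size_t k = 0; k < sizeof KEYWORDS / sizeof KEYWORDS[0]; ++k)
        if (strlen(KEYWORDS[k]) == len && memcmp(KEYWORDS[k], word, len) == 0)
            return 1;
    return 0;
}

/* Append len bytes of s to out, silently dropping them if they would not fit. */
static void emit(char *out, size_t cap, size_t *n, const char *s, size_t len) {
    if (*n + len < cap) {           /* keep room for the terminating NUL */
        memcpy(out + *n, s, len);
        *n += len;
        out[*n] = '\0';
    }
}

/*
 * Write a token dump of src into out: <kw> for keywords, [name with blanks]
 * for identifiers, any other non-blank character verbatim.
 * Assumed rule: adjacent words merge into one identifier, and a keyword
 * always ends the identifier being built.
 */
static void lex_line(const char *src, char *out, size_t cap) {
    assert(src != NULL && out != NULL && cap > 0);
    size_t n = 0, i = 0;
    int in_ident = 0;               /* are we inside a multi-word identifier? */
    out[0] = '\0';
    while (src[i] != '\0') {
        unsigned char c = (unsigned char)src[i];
        if (isspace(c)) { ++i; continue; }
        if (isalpha(c) || c == '_') {
            size_t start = i;
            while (isalnum((unsigned char)src[i]) || src[i] == '_') ++i;
            size_t len = i - start;
            if (is_keyword(src + start, len)) {
                if (in_ident) { emit(out, cap, &n, "]", 1); in_ident = 0; }
                emit(out, cap, &n, "<", 1);
                emit(out, cap, &n, src + start, len);
                emit(out, cap, &n, ">", 1);
            } else {
                /* a plain word either opens an identifier or extends it */
                emit(out, cap, &n, in_ident ? " " : "[", 1);
                emit(out, cap, &n, src + start, len);
                in_ident = 1;
            }
        } else {
            /* punctuation closes any open identifier */
            if (in_ident) { emit(out, cap, &n, "]", 1); in_ident = 0; }
            emit(out, cap, &n, src + i, 1);
            ++i;
        }
    }
    if (in_ident) emit(out, cap, &n, "]", 1);
}
```

Under that assumed rule, `if end of line (str)` comes out as `<if><end>[of line]([str])`, so `end` is swallowed as a keyword, while `open window with attributes (x)` becomes the single identifier `[open window with attributes]` followed by `([x])`. A different rule (say, keywords only count at the start of a statement) would give a different answer, which is exactly the part a reader can't tell from the page.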

~~~
scott_s
I thought of the keyword-recognition issue, but then I dismissed it: syntax
highlighting makes it a non-issue. The ambiguity with using keywords in
identifiers is valid, though.

~~~
kranner
Syntax highlighting is not available everywhere, e.g. in black-and-white
print.

~~~
tianyicui
Usually black-and-white print uses bold to indicate the keywords.

------
thesz
Another Zinc:
[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.43.6...](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.43.6772)

That one is more popular, it eventually became OCaml.

~~~
mahmud
The Zinc paper is one of my all-time favorite implementation papers, right up there
with Dybvig's thesis, Rabbit and Orbit papers on Scheme, Reppy's thesis on
Concurrent ML, and SPJ's Tagless paper.

The Zinc Experiment is Leroy at his best; compiler hacking lore meets
programming language research (no hand-waving past performance issues, with a
critical eye towards foundations.)

------
paperclip
I will try to ignore the shallow (but horrifying) issue of identifiers
including spaces.

The real question to be asked here is what is wrong with the current portable
assembler (C) ? C has occupied this niche for a long time and quite
successfully - I believe all current mainstream kernels are written in C (or
possibly a limited subset of C++).

If you want a 'portable assembler', a modern C compiler is in my opinion, a
good choice:

    
    
      - a solid specification: detailing the behaviour of operations, and what is defined, implementation-defined, or undefined behaviour.
    
      - access to platform specific features through builtins and intrinsics
    
      - ability to use inline asm if you really want to (or need to)
    
      - easy integration with existing libraries
    
      - minimal dependencies on a runtime library (pretty much none in freestanding implementations)
    
      - most compilers have ways to give good control of both what code is generated and structure layout.
    

The modern C ecosystem provides (mostly good) tools for:

    
    
      - tracking memory leaks/invalid memory accesses (valgrind)
    
      - static analysis (clang static analyser, sparse, coverity, ...)
    
      - debuggers (gdb ...)
    
      - solid optimizing compilers (icc, gcc, llvm)
    
      - profilers (oprofile, perf, vtune, ...)
    

Admittedly, most of these tools don't depend on the code being written in C,
but I suspect any new language would take a while to get properly integrated.
If you want to use a low level language, you really want to have access to
these tools or equivalent.

A new language trying to compete in this space would have to offer something
fairly substantial to get me to switch - and a strange syntax like zinc is not
going to help. From the documentation at least, zinc seems to currently be
missing: an equivalent to volatile; asm; any way to access a CAS-like
instruction; 64-bit types; floats; a way to interface to C code; clear
documentation about behaviour in corner cases (what happens if you left
shift a 32-bit value by 40?). The only thing it seems to bring to the table to
compensate is the ability to inherit structures.
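
For comparison, this is how that shift corner case looks in C: shifting by a count greater than or equal to the width of the promoted left operand is undefined behaviour, so code that needs a fixed answer has to pick one itself. A minimal sketch, where `shl32` and its "counts of 32 or more give 0" policy are my own choices and say nothing about what Zinc does:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Left shift with a defined result for every count.
 * Policy chosen for this sketch (an assumption, not Zinc's behaviour):
 * bits shifted past bit 31 are dropped, and counts >= 32 yield 0.
 */
static uint32_t shl32(uint32_t value, unsigned count) {
    if (count >= 32u)
        return 0u;                      /* value << count would be undefined here */
    return (uint32_t)(value << count);  /* well defined for counts 0..31 */
}
```

With this helper `shl32(1u, 40u)` is 0 on every platform, whereas a bare `1u << 40` on a 32-bit `unsigned` leaves the result up to the compiler and the CPU. A language spec that wants to be a portable assembler would need to say which of these it means.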

~~~
haberman
I agree with you. I just wanted to list the one complaint I do have about C:
missed optimization opportunities due to lax aliasing rules.

Consider the following C translation unit:

    
    
        void foo(const int *i);
        void bar();
    
        int baz() {
          int i = 1;
          foo(&i);
          return i + 1;
        }
    
        int quux() {
          int i;
          foo(&i);
          i = 1;
          bar();
          return i + 1;
        }
    

You'd like to think that both baz() and quux() could compile the return
statements to a constant "return 2." After all, foo() is taking a pointer to a
CONST int. But alas, this is not the case, because foo() could cast away the
const. So in truth, both functions are forced to reload the integer from the
stack, add 1 to it, and then return that! You can't use any values you had
loaded in registers (or in this case, you can't evaluate the expression at
compile time).

My example is contrived, but you can easily construct examples that fit the
same pattern and are real.

I've heard that Fortran still beats C in optimization in some cases; I would
expect that the above is one major reason why. C99's "restrict" addresses some
of the difference but cannot help you with the above.
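
To show why the compiler can't fold the return value, here is a sketch with made-up bodies for `foo()` and `bar()`. In the example above they live in some other translation unit and the compiler sees only the prototypes; the definitions and the `saved` pointer below are hypothetical, just one legal way they could be written:

```c
#include <assert.h>
#include <stddef.h>

static int *saved = NULL;  /* pointer that foo() keeps around for bar() */

/* Hypothetical body for foo(): casts away const and writes through the
   pointer. This is legal because the pointed-to object is not const. */
static void foo(const int *i) {
    saved = (int *)i;
    *saved = 41;
}

/* Hypothetical body for bar(): writes through the pointer foo() saved. */
static void bar(void) {
    assert(saved != NULL);
    *saved = 41;
}

static int baz(void) {
    int i = 1;
    foo(&i);
    return i + 1;   /* 42 with the foo() above, not 2 */
}

static int quux(void) {
    int i;
    foo(&i);
    i = 1;
    bar();
    return i + 1;   /* 42 as well: bar() overwrote i after the assignment */
}
```

Both functions return 42 here, and nothing in this code is undefined behaviour. That is the whole problem: since a conforming `foo()` like this can exist, a compiler that only sees the prototype has to assume it does.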

------
wbhart
What is wrong with 64 bit integers? Maybe they've been indicted on war crimes
or something. The number of languages that appear and don't support them....
And what about interfacing with C? I can count the languages on one hand that
have a simple and efficient C interface! (I have a list of other things almost
always ignored by languages for no good reason... efficiency, friendly
license, lack of macros or ability to extend the language...)

------
wildmXranat
Interesting find. It seems to have been left to collect dust. Last changes are
about 3 years ago.

------
humbledrone
I guarantee that I would confuse the types "byte" (uint8_t) and "octet"
(int8_t). The typical distinction between a byte and an octet has to do with
the number of bits in the representation (a byte usually has 8, an octet
always has 8). I don't know of any convention for bytes being unsigned and
octets being signed.

~~~
joubert
You're right that with "byte" there isn't an official size specification,
although the de facto size is 8 bits, unlike with "octet", which was
specifically defined as 8 bits (for interoperability between different
systems).

Regarding the question of signed/unsigned - I'll try to explain:

 _byte - unsigned_

On page 37 of the C99 standard: "A byte contains CHAR_BIT bits, and the values
of type unsigned char range from 0 to 2^CHAR_BIT - 1."

i.e. according to the C99 standard, a byte is unsigned.

 _octet - signed_

Think of an octet in two ways: the concept of something that is exactly 8-bits
on the one hand, and on the other hand, the technical representation of this
concept.

When you read the literature you'll notice that an octet refers simply to the
size of something (8 bits) and not its signedness. For example, octets arguably
arose in the networking world, and the NDR (Network Data Representation)
refers to octet in a sign-neutral way.

On page 256 of the C99 standard: "The typedef name intN_t designates a
signed integer type with width N, no padding bits, and a two’s-complement
representation. Thus, int8_t denotes a signed integer type with a width of
exactly 8 bits."

Now, how would you go about representing the concept of an "octet" (which is
sign-neutral)? If you used an unsigned 8 bit integer, you can't represent the
sign of the (conceptual) octet, while a signed 8 bit type can.
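
In C terms the two types hold the same 8 bits and differ only in how those bits are read. A small sketch, assuming the mapping given upthread ("byte" as uint8_t, "octet" as int8_t); the typedef names `zinc_byte` and `zinc_octet` and the helper `as_octet` are hypothetical:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t zinc_byte;   /* hypothetical C spelling of Zinc's "byte"  */
typedef int8_t  zinc_octet;  /* hypothetical C spelling of Zinc's "octet" */

/* Reinterpret the same 8 bits as a signed value without changing them. */
static zinc_octet as_octet(zinc_byte bits) {
    zinc_octet value;
    assert(CHAR_BIT == 8 && sizeof value == sizeof bits);
    memcpy(&value, &bits, sizeof value);  /* copy the bit pattern as-is */
    return value;
}
```

Since int8_t is required to be two's complement, the pattern 0xFF reads as 255 through `zinc_byte` and as -1 through `zinc_octet`, and 0x80 reads as 128 and -128 respectively.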

------
timrobinson
This reminds me of a slightly higher-level High Level Assembly:
<http://en.wikipedia.org/wiki/High_Level_Assembly>

Edit: the HLA web site always used to be a decent place to learn assembly
language. I don't remember it being so mauve though:
[http://homepage.mac.com/randyhyde/webster.cs.ucr.edu/index.h...](http://homepage.mac.com/randyhyde/webster.cs.ucr.edu/index.html)

------
gasull
Related: assembly programming with Python syntax.

<http://www.corepy.org/>

------
gcv
From the description: "The goal is to have a portable assembler." Why not just
use Fortran?

------
yawniek
looks interesting but i can't get it to work on os x or linux.

~~~
sanxiyn
Works for me. Here is how to do minimal 3-stage bootstrap.

    
    
      gcc bootstrap/io.c bootstrap/zc.c -o zc1
      ./zc1 -I lib -I lib/platform/default -I src src/main.zc -o zc2.c
      gcc lib/libc/io.c zc2.c -o zc2
      ./zc2 -I lib -I lib/platform/default -I src src/main.zc -o zc3.c
      cmp zc2.c zc3.c # should be identical

