Nim compiler — Pascal source code

fm77 · on Nov 29, 2020

from nim-lang/nim/nim/scanner.pas

  // This scanner is handwritten for efficiency. I used an elegant buffering
  // scheme which I have not seen anywhere else:
  // We guarantee that a whole line is in the buffer. Thus only when scanning
  // the \n or \r character we have to check whether we need to read in the next
  // chunk.

Turbo Pascal uses the same buffering scheme, see

https://turbopascal.org/processing-source-line-characters

Turbo Pascal, though, needs the CR/LF within the first 128 bytes; otherwise the compiler quits with the error "Line too long".
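To make the buffering idea concrete, here is a minimal Nim sketch, assuming LF line endings and a source that contains no NUL bytes. The names `Scanner`, `refill`, `initScanner` and `countLines` are hypothetical and not taken from scanner.pas. Each chunk is extended to the end of a line and terminated with a '\0' sentinel, so the hot loop only asks whether to load the next chunk right after a newline.

```nim
type
  Scanner = object     # hypothetical sketch, not the real Nim/Turbo Pascal code
    src: string        # stands in for the source file
    srcPos: int        # how much of `src` has been loaded so far
    buf: string        # current chunk: whole lines plus a '\0' sentinel
    pos: int           # read position inside `buf`
    chunkSize: int     # must be >= 1

proc refill(s: var Scanner) =
  ## Loads about `chunkSize` bytes, then extends the chunk to the end of the
  ## current line, so a line is never split across two chunks.
  var stop = min(s.srcPos + s.chunkSize, s.src.len)
  while stop < s.src.len and s.src[stop - 1] != '\n':
    inc stop
  s.buf = s.src[s.srcPos ..< stop] & '\0'
  s.srcPos = stop
  s.pos = 0

proc initScanner(src: string; chunkSize = 8): Scanner =
  result = Scanner(src: src, chunkSize: chunkSize)
  result.refill()

proc countLines(s: var Scanner): int =
  ## Ordinary characters are consumed without an end-of-buffer test. A chunk
  ## can only run out right after a '\n' (or at the very end of the input,
  ## where the '\0' sentinel stops the loop).
  while true:
    case s.buf[s.pos]
    of '\n':
      inc result
      inc s.pos
      if s.buf[s.pos] == '\0' and s.srcPos < s.src.len:
        s.refill()
    of '\0':
      return
    else:
      inc s.pos
```

Turbo Pascal's variant, as described above, caps a line at 128 bytes and reports "Line too long" instead of growing the chunk to the next line end.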

pjmlp · on Nov 29, 2020

No, it doesn't. That isn't the official source code of Turbo Pascal.

"Because of my interest in Pascal programming language and compilers I created a Turbo Pascal compiler written in Turbo Pascal - TPC16. This is a compiler compatible with the original Borland Turbo Pascal 7 command line compiler tpc.exe. Later I modified this compiler to compile under Delphi - TPC32. On the foundations of this compiler I later created Turbo51 - Pascal compiler for 8051 microcontrollers. Then I decided to publish the secrets of the first compiler I have created and this website was born. Here you can find the internals, algorithms and data structures of the Turbo Pascal 7 command-line compiler. You can also download the executable file of the TPC16 compiler.

I created the compiler from scratch, but credits for the beauty of the language and for the exceptional elegance of the compiler go to Niklaus Wirth, Anders Hejlsberg and Borland."

https://direct.turbopascal.org/about

fm77 · on Nov 29, 2020

This may sound like a contradiction: you are right that this is not the official source code of Turbo Pascal, but it is the source code of Turbo Pascal.... :-)

Barry also stated this 5 years ago, see

https://news.ycombinator.com/item?id=10202563

but that is not true.

How can I be so sure? I reverse-engineered pretty much all Turbo Pascal versions starting with v4.0 all the way up to v8.0 (Delphi v1.0). I even disassembled the beta versions of Delphi, v7.9h and v7.9k, to see and study the transition from Turbo Pascal to Delphi.

That being said, what you are looking at on turbopascal.org is an absolutely mind-blowing reverse-engineering job. One hint: look, for example, at

  Procedure AddReferenceRecordForTypedConstant (UnitSegment, BlockRecord: Word; ReferenceFlags: TReferenceFlagSet; DX, TypedConstantOffset: Word);

https://turbopascal.org/processing-typed-constants

at the bottom of the page, where the author couldn't come up with a proper name for the value passed in the DX register, so he left that identifier unchanged. If you disassemble Turbo Pascal's "tpc.exe" and look for that subroutine, you'll see that Anders Hejlsberg really used the DX register.... (I know, not really convincing...)

geofft · on Nov 29, 2020

If I understood right - this is a "white box" reimplementation of the real Turbo Pascal, which uses the same algorithms and structures and control flow as the real one (and therefore should be bug-for-bug compatible) but is written in Pascal instead of assembly? That is, the "DX" argument there isn't actually passed in register DX, but it represents a function call in real Turbo Pascal that passed the same semantic contents in register DX?

fm77 · on Nov 29, 2020

I don't know what a "white box" reimplementation is (white-box cryptography?), but I guess you meant a clean room implementation (https://en.wikipedia.org/wiki/Clean_room_design), which I believe is not the case here (I am wildly guessing, though). For a clean room implementation you need at least two people: a reverse engineer who writes down his findings, and someone who takes that information and reimplements the same functionality based on what the reverse engineer provides.

But you are right in that it uses the same algorithms, structures and control flow as the real Turbo Pascal and therefore should be bug-for-bug compatible but is written in Pascal instead of assembly.

geofft · on Dec 4, 2020

Probably a poor choice of words, but I mean the opposite of a clean room reimplementation: one where the same person is looking at the disassembly and writing the new implementation (i.e., the thing being reimplemented is a white box, not a black box).

sim_card_map · on Nov 29, 2020

Interesting!

Isn't the original TP7 source code available?

https://github.com/shidel/DustyTP7

elvis70 · on Nov 29, 2020

You have linked a source code repository of programs and units written for TP7.

The sources of TP6 leaked a few years ago. Search for exmortis on HN.

fm77 · on Nov 29, 2020

The "sources of TP6" you are referring to are not a leak but also a fantastic reverse-engineering job, probably done in 1993/94, which appeared on the web 20 years ago...

elvis70 · on Nov 30, 2020

In this context, it's really impressive work! Can you tell us more?

elvis70 · on Nov 29, 2020

Also, the default buffer size of the Text and untyped file types is 128 bytes as well, for the same historical reason: CP/M, where files are divided into records of 128 bytes.

eterps · on Nov 29, 2020

The source for scanner.pas looks recognizably like an implementation inspired by Niklaus Wirth's compiler books.

zyxzevn · on Nov 29, 2020

Now we need Nim with the super easy Lazarus user interface.

squarefoot · on Nov 29, 2020

That one would be a game changer. Lazarus has been ported everywhere, from x86 PCs to ARM64 embedded boards.

k__ · on Nov 29, 2020

Nice.

I'm still dreaming of a world where Rust had Nim syntax.

arc619 · on Nov 29, 2020

Why not just use Nim? The language is rapidly converging on an ownership model similar to Rust's anyway, and has better metaprogramming, more compile targets and a high development velocity (plus faster compile times).

Rust has a bigger ecosystem of course, but Nim's is pretty decent and you can use C, C++ and JS libs natively for anything else.
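As a rough illustration of that C interop, here is a sketch that binds two libc functions directly with Nim's `importc` and `header` pragmas. It only applies to the C backend, and the Nim-side names `cAbs` and `cStrlen` are made up for this example.

```nim
# `importc` names the C symbol to call; `header` makes the generated C code
# #include the header that declares it. No wrapper library is needed.
proc cAbs(x: cint): cint {.importc: "abs", header: "<stdlib.h>".}

# Nim's `cstring` maps to C's `char*`, and `csize_t` to C's `size_t`.
proc cStrlen(s: cstring): csize_t {.importc: "strlen", header: "<string.h>".}

echo cAbs(-5)          # calls C's abs directly
echo cStrlen("hello")  # a Nim string literal converts to cstring implicitly
```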

fwsgonzo · on Nov 29, 2020

We really need a simpler language that has the same ownership model. I like Nim because I can compile it without threading support, and for RISC-V (albeit only 64-bit; I wish it had 32-bit support too).

Overall, I like it, but I could not justify using it in an enterprise setting without knowing that the direction of the language is going towards what is now the established minimum safety, which is borrow checking and no UB.

gameswithgo · on Nov 29, 2020

If you have borrow checking, you are going to inherit a lot of complexity in order to deal with the limitations it imposes. You could be simpler than Rust, for sure, but not as simple as, say, C or Go.

Zig might be a language you are interested in. It is simple like C, with sensible ways of dealing with UB and bounds checking, as well as optionals instead of nulls.

planetis · on Nov 29, 2020

Zig is interesting but still very immature. Its memory model is manual management, so it has no advantages over Rust or Nim there. It's marketed as a better C (and nothing more than that).

pizza234 · on Nov 29, 2020

> We really need a simpler language that has the same ownership model

Can you mention which part of the Rust language you found complicated (in the real world), excluding the ownership/memory management area (which includes the borrow checker, lifetimes, smart pointers, and synchronization-related traits)?

I read a lot of comments about Rust's complexity, and while nobody doubts the complexity of the ownership/memory management, there's a very common lack of details when it comes to the rest of the language.

pjmlp · on Nov 29, 2020

The rest of the language is just fine if you are used to the ML language family, maybe that is the issue for those coming from more mainstream languages.

zinclozenge · on Nov 29, 2020

No algebraic datatypes + pattern matching, yes I'm aware of libraries that implement it via metaprogramming, but that's not good enough for me.

arc619 · on Nov 29, 2020

> No algebraic datatypes

Not sure what you mean there; there are product types with tuples and objects, and sum types as object variants.

> Pattern matching

Eh, I mean it's just sugar over a case statement.

  let
    num = 5
    str = case num
      of 1: "One!"
      of 2, 3, 5, 7, 11: "This is a prime"
      of 13..19: "A teen"
      else: "Unknown"  # Compile error if all cases aren't covered.

As you say, there are libraries that let you deconstruct and partially match more complex types if you need that. Being able to create things like pattern matching, async, and novel multithreading runtimes as a library in Nim shows how powerful (and useful) the metaprogramming is.

Enums in Rust are more interesting, but again nothing you can't do with an object variant.
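For comparison with Rust enums, here is a minimal sketch of a sum type written as a Nim object variant; `Shape`, `ShapeKind` and `area` are hypothetical names. The `kind` discriminator decides which fields exist, and a `case` over it must cover every enum value.

```nim
import std/math

type
  ShapeKind = enum
    skCircle, skRect
  Shape = object            # roughly what a Rust enum with data would express
    case kind: ShapeKind    # discriminator: selects which branch's fields exist
    of skCircle:
      radius: float
    of skRect:
      width, height: float

proc area(s: Shape): float =
  # A `case` over an enum must be exhaustive, so a missing branch is a compile error.
  case s.kind
  of skCircle: PI * s.radius * s.radius
  of skRect: s.width * s.height

echo Shape(kind: skRect, width: 2.0, height: 3.0).area
```

Reading a field of the wrong branch (say `radius` on a rectangle) is caught at runtime when field checks are enabled, as they are in default debug builds.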

kevin_thibedeau · on Nov 29, 2020

Try using Rust without any macros.

simias · on Nov 29, 2020

Nim is nice but "better metaprogramming" is going to be very contentious I think. "Different metaprogramming" for sure, but the rest is a matter of preference frankly.

Nim macros are a lot nicer than Rust's, but Rust's trait system is very powerful and expressive. Sure, you can do something like that with Nim's macros, but you can't expect that all third party code is going to play nice.

That's the problem with the macro approach: it's great because it's ultra-flexible, but it also sucks because it's ultra-flexible. Everybody ends up doing their own little DSL that fits their use case or their personal taste, and it adds friction at the interfaces.

nikki93 · on Nov 29, 2020

Is this based on having used Nim in practice in projects and observing the code of many Nim projects and collecting data / insights; or without having used Nim and on speculation / general thoughts about macros / DSLs (or somewhere along the spectrum)? Just so it's clear how to contextualize.

Because having tried to explore these questions by studying Nim projects, going through a book and working on a game engine in it the past months; I've found that ufcs+overloading covers the static traits scenario, and it feels like the runtime traits object scenario is better self-rolled (I don't really want the compiler impl'ing vtables underneath me, save deciding to reuse a static dispatch feature for a dynamic one; if I want a struct of func ptrs I will make a struct of func ptrs.) at least for my usage. Various Nim code I come across is usually pretty readable, and I don't need to invent macros willy nilly.

ufcs feels more elegant than needing to decide whether to namespace a function inside a struct or not (or which one), which is what most langs (including Rust) seem to do.
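A small sketch of what "UFCS plus overloading covers the static traits scenario" can look like, with hypothetical `Circle`/`Square` types. There is no interface declaration: `totalArea` compiles for any element type that has a matching `area` overload, and `x.area` is just UFCS for `area(x)`, resolved at compile time for each instantiation.

```nim
import std/math

type
  Circle = object
    r: float
  Square = object
    side: float

# Plain overloaded procs; no trait/impl block is needed to "implement" area.
proc area(c: Circle): float = PI * c.r * c.r
proc area(s: Square): float = s.side * s.side

# Generic over T: it works for any T for which `area(T)` resolves.
proc totalArea[T](shapes: openArray[T]): float =
  for x in shapes:
    result += x.area   # UFCS: identical to area(x)

echo totalArea([Square(side: 2.0), Square(side: 3.0)])
```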

planetis · on Nov 29, 2020

There is no "friction" with macros. Sure, lots of packages implement their own utility macros, but most can be hidden under ordinary procs. Macros CAN be called like ORDINARY procs (check the sugar module [1]), though not all macros are designed this way. Unless we are talking about public DSLs, but I don't really understand what the problem with them is.

[1] https://nim-lang.github.io/Nim/sugar.html

planetis · on Nov 29, 2020

Also, macros do not capture their caller's AST! They operate on the code passed to them. That is another important point that removes any friction: nesting macros works with no problem, since they're expanded in order. (I can think of even more reasons why your argument is unfounded, like "hygienic" identifiers and operating at the AST level instead of ugly string concatenation, but whatever.)
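A toy macro to illustrate the point (the name `twice` is made up for this example): it receives only the AST of the block passed to it and returns a new statement list. A nested call is simply copied by the outer macro and expanded afterwards.

```nim
import std/macros

macro twice(body: untyped): untyped =
  ## Emits its argument twice. It only sees and rewrites the AST of `body`,
  ## never the code surrounding the call site.
  result = newStmtList()
  result.add copyNimTree(body)
  result.add copyNimTree(body)

var counter = 0
twice:
  inc counter       # expands to two `inc counter` statements

proc nested(): int =
  # The outer `twice` copies the inner `twice` call; each copy then
  # expands on its own, giving four increments in total.
  var n = 0
  twice:
    twice:
      inc n
  n

echo counter, " ", nested()
```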

Xevi · on Nov 29, 2020

I don't know much about Rust, and even less about Nim, but does this mean that Nim will have the same safety guarantees that Rust has? Also, can you really just add something like that down the line and ensure that the whole ecosystem works correctly? I was under the impression that Rust's strength was that it's been designed from the start to use an ownership model.

beagle3 · on Nov 29, 2020

Nim's arc/orc mode (introduced in 1.4 and likely to become the default in 2.0) does borrow checking but does not enforce it the way Rust does: instead, you'll get a compiler warning "I can't just borrow here, so I am making a copy", which you can prevent by making your type uncopyable (which turns it into an error). The semantics are not equivalent to Rust's, but they are similar.

Nim stops race conditions by controlling data passage among threads (disallowing shared data between threads, in general) and always has.

I am not familiar enough with Rust to say that everything rust can do and assure is indeed covered in Nim; however, plain Nim without using the escape hatches (which like Rust unsafe code, exist for those times you must use them) is safe.
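A minimal sketch of the "make the type uncopyable" approach, assuming ARC/ORC (`--gc:orc` in Nim 1.4, the default memory manager in Nim 2.x). `UniqueBuffer` and `sumAfterMove` are hypothetical names. Marking the `=copy` hook with `{.error.}` turns every copy the compiler cannot avoid into a compile-time error, while moves still work.

```nim
type
  UniqueBuffer = object   # hypothetical type, for illustration only
    data: seq[int]

# Forbid copies: an assignment that cannot be turned into a move
# becomes a compile-time error instead of a silent copy.
proc `=copy`(dest: var UniqueBuffer; source: UniqueBuffer) {.error.}

proc sumAfterMove(): int =
  var a = UniqueBuffer(data: @[1, 2, 3])
  let b = move(a)   # explicit move: ownership transfers, `a` is left empty
  # let c = b       # uncommenting this fails to compile: `b` is read again
  #                 # below, so this assignment would have to be a copy
  for x in b.data:
    result += x

echo sumAfterMove()
```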

Xevi · on Nov 29, 2020

That's awesome. I'm currently studying Rust, but I'm looking forward to giving Nim a try in a year or two. I really like the syntax, and if it can offer the same safety and performance as Rust, then I see a bright future for it.

k__ · on Nov 29, 2020

"The language is rapidly converging on an ownership model similar to rust "

Nice.

Do you have more resources on that?

Also, does Nim have a WASM target?

sp33der89 · on Nov 29, 2020

Here are some handy links (for me) that might help you out too:

https://nim-lang.org/blog/2020/10/15/introduction-to-arc-orc...

https://www.youtube.com/watch?v=aUJcYTnPWCg

https://github.com/stisa/nwasm

You can also go an Emscripten route: https://hookrace.net/blog/porting-nes-go-nim/

planetis · on Nov 29, 2020

Latest presentation is https://www.youtube.com/watch?v=zPlBZkWvuug It has updated information.

nikki93 · on Nov 29, 2020

Re: WASM -- https://github.com/nikki93/ng-public -- Public repo for a game side project I've been working on that does Nim -> C/C++ -> WASM (native desktop also works, iOS probably works (have used the same build system for C++ iOS before) and Android can be made to work). One of the modules involved wraps entt which is a template-heavy C++ library and not many languages can actually wrap with just a cdecl-y ffi layer.

Re: borrow checking: https://nim-lang.org/docs/manual_experimental.html#view-type... It's still pretty early, and IME the mix of arc (which is deterministic) with manually managed memory (esp. through some data structure that manages entity data, like entt) has been pretty nice.

atombender · on Nov 29, 2020

Nim is apparently getting support for owned pointers [1]. There's a proposal [2]. The proposal PR has since been closed, but it's apparently still being worked on.

[1] https://nim-lang.org/araq/ownedrefs.html

[2] https://github.com/nim-lang/RFCs/issues/144

planetis · on Nov 29, 2020

I had played with "owned" and really liked it. However, there were design issues with it, and that feature was abandoned.

beagle3 · on Nov 29, 2020

Most of those semantics are now tracked automatically without needing annotation in arc/orc.

gameswithgo · on Nov 29, 2020

If you are hoping for the lack of curly braces in particular, there is F#, which is sort of Rust with the minor matter of having a garbage collector, and inefficient iterators.

tonetheman · on Nov 29, 2020

I have on and off played with Nim and independent of this article and this code had thought that it reminded me greatly of Pascal. With some Python thrown in there too.

auxym · on Nov 29, 2020

The syntax makes it look like Python at a first glance. When you write it however, the type system makes it feel much more like Pascal.

skulk · on Nov 29, 2020

If you squint really hard, it looks like Nim! (this is coming from someone who never learned Pascal)

TurboHaskal · on Nov 29, 2020

And if you ever programmed in Pascal, you can tell its influence on Nim just by reading Nim source code. The idiomatic way to write enums made me chuckle.

keyle · on Nov 29, 2020

I'm not sure what the OP was trying to show with this.

It's certainly cool and interesting to see that Nim(rod) was written in Pascal 11 years ago?

I love Pascal, I just give up on it too fast. But man it's fast, compiles fast, "easy" to use, until you start doing real world stuff.

pjmlp · on Nov 29, 2020

Having used all Borland compilers since Turbo Pascal 3 all the way up to around the first set of Delphi versions, and a couple of UNIX Pascal compilers as well, I wonder what is so hard for real world stuff.

ksec · on Nov 29, 2020

I continue to wonder if we have even exceeded Delphi in terms of RAD today.

There are so many things that were better in their field, domain or niche but simply got washed out of history. Only the other day someone commented that Rails, despite its similarity in concept and execution, is still not as good as the good old Enterprise Objects Framework from NeXT.

pjmlp · on Nov 29, 2020

Depends on how you look at it, I certainly feel like using Delphi when doing .NET and Java development, which are also the development stacks that took the spotlight away from Delphi (alongside Borland's mismanagement).

What they miss in regard to Delphi is proper AOT to native code capabilities, something that they should have supported (from my point of view) since version 1.0. Java had it via commercial JDKs (which only enterprises cared about), .NET via NGEN, mono aot and some research stuff like CosmOS, Singularity or Midori.

However, now, 20 years later, they are finally realising that it must be an option, but that is mostly thanks to the competition from Go, Rust, Swift and friends more than anything else.

Interesting that you mention EOF, it eventually became WebObjects (in Java) and it was due to NeXT / Sun collaboration that J2EE was born. Many Java bashers don't realise how much influence Objective-C had in the Java ecosystem, in language features, language runtime, and yes J2EE.

Regarding things being washed out of history, check out Burroughs, VMS, Solo Pascal, Mesa, Multics and PL.8 for examples of systems programming before C was at all relevant outside the walls of Bell Labs; the Xerox PARC workstations, namely Interlisp-D, Mesa, Mesa/Cedar and Smalltalk; or Wirth's lineage of Modula-2 and Oberon-derived work based on the Xerox PARC work.

Unfortunately, computer history isn't something that most computing degrees spend time on.

ksec · on Nov 29, 2020

On EOF, it was actually an answer on Quora [1] and a comment on HN [2]. I didn't have the luxury of playing with WebObjects back then (I remember it being quite expensive), but I have only heard good things about it.

>What they miss in regard to Delphi is proper AOT to native code capabilities

Maybe I am just a fan of Pascal, which for some reason most developers hate, opting for PLs with more C-like syntax.

>Unfortunately, computer history isn't something that most computing degrees spend time on.

Yes. We end up not knowing how and why things were like that in the first place, and end up with people reinventing everything.

[1] https://www.quora.com/What-was-it-like-to-be-a-software-engi...

[2] https://news.ycombinator.com/item?id=25199154

turminal · on Nov 29, 2020

Sorry if this is obvious, I have no experience with Delphi, but what is AOT?

pjmlp · on Nov 29, 2020

Compilation to native code Ahead Of Time.

WalterGR · on Nov 29, 2020

(The term is general - not specific to Delphi.)

jpcooper · on Nov 29, 2020

What was so good about Enterprise Objects Framework? I don’t know much about Rails either, by the way.

pjmlp · on Nov 29, 2020

It was a proper object-relational mapping framework for relational databases, already available in 1990s NeXTSTEP.

http://www.kevra.org/TheBestOfNext/page570/page571/page573/p...

jpcooper · on Nov 29, 2020

Thanks. Will have a read.

HacklesRaised · on Nov 29, 2020

I read "I couldn't find libraries that did most of what I needed and I couldn't be bothered to contribute said libraries"

pjmlp · on Nov 29, 2020

Not even written in C? Because language interop is a thing, unless one is religiously following the mantra of being an X-lang developer instead of embracing polyglot skills.

cy_hauser · on Nov 29, 2020

Or unless you're coding simply because you enjoy the experience and don't care about interop or making a widely used language at that point.

pjmlp · on Nov 29, 2020

When I find an issue that would be more practical to solve in C++ than in Java or .NET, instead of throwing my productivity out of the window and rewriting everything in something else, I just write a tiny wrapper for what is missing and write everything else in Java or .NET languages.

Just like I used to call Windows DLLs or COM libraries from TPW and Delphi.

Which, incidentally, is how Google pushes everyone to write native code on Android: as little as possible, just tiny native libraries. Even NativeActivity is just a regular Java Activity implementation with predefined native method declarations that the shared object is expected to fill in, so that it can be properly called from the framework.