Nim compiler — Pascal source code (github.com/nim-lang)
123 points by _zhqs on Nov 29, 2020 | hide | past | favorite | 58 comments



from nim-lang/nim/nim/scanner.pas

  // This scanner is handwritten for efficiency. I used an elegant buffering
  // scheme which I have not seen anywhere else:
  // We guarantee that a whole line is in the buffer. Thus only when scanning
  // the \n or \r character we have to check wether we need to read in the next 
  // chunk.
Turbo Pascal uses the same buffering scheme, see

https://turbopascal.org/processing-source-line-characters

Turbo Pascal, though, needs the CRLF within the first 128 bytes; otherwise the compiler quits with the error "Line too long".
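
For anyone who hasn't seen the scheme, here is a rough sketch in Nim. It is not the code from scanner.pas or from Turbo Pascal: `LineBuf`, `fill`, `nextWord` and the chunk sizes are made-up names and values. It also simplifies by reading ahead to the next line break, where a real implementation would more likely read fixed-size chunks and carry the partial last line over.

```nim
import std/[streams, strutils]

type LineBuf = object
  src: Stream   # input being scanned
  buf: string   # invariant: ends with '\n', or with a '\0' sentinel at end of input
  pos: int      # index of the next unread character

proc fill(lb: var LineBuf; chunkSize = 16) =
  ## Read one chunk, then keep reading until the current line is complete.
  lb.buf = lb.src.readStr(chunkSize)
  while not lb.src.atEnd and not lb.buf.endsWith('\n'):
    lb.buf.add lb.src.readChar()
  if not lb.buf.endsWith('\n'):
    lb.buf.add '\0'   # end of input reached mid-line (or nothing left to read)
  lb.pos = 0

proc nextWord(lb: var LineBuf): string =
  ## Returns the next run of ASCII letters, or "" at end of input.
  while true:
    case lb.buf[lb.pos]
    of 'a'..'z', 'A'..'Z':
      # Hot loop: no bounds or refill check; the line break (or sentinel) stops it.
      while lb.buf[lb.pos] in {'a'..'z', 'A'..'Z'}:
        result.add lb.buf[lb.pos]
        inc lb.pos
      return
    of '\n':
      inc lb.pos
      if lb.pos == lb.buf.len: lb.fill()   # the only place a refill can be needed
    of '\0':
      return ""
    else:
      inc lb.pos

var lb = LineBuf(src: newStringStream("alpha beta\ngamma"))
lb.fill(4)   # tiny chunk, to show that a line is never split across refills
var w = lb.nextWord()
while w.len > 0:
  echo w
  w = lb.nextWord()
```

The point is that the inner loop never asks "is the buffer empty?"; that question only comes up after a line break has been consumed.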


No it doesn't. That isn't the official source code of Turbo Pascal.

"Because of my interest in Pascal programming language and compilers I created a Turbo Pascal compiler written in Turbo Pascal - TPC16. This is a compiler compatible with the original Borland Turbo Pascal 7 command line compiler tpc.exe. Later I modified this compiler to compile under Delphi - TPC32. On the foundations of this compiler I later created Turbo51 - Pascal compiler for 8051 microcontrollers. Then I decided to publish the secrets of the first compiler I have created and this website was born. Here you can find the internals, algorithms and data structures of the Turbo Pascal 7 command-line compiler. You can also download the executable file of the TPC16 compiler.

I created the compiler from scratch, but credits for the beauty of the language and for the exceptional elegance of the compiler go to Niklaus Wirth, Anders Hejlsberg and Borland."

https://direct.turbopascal.org/about


This may sound like a contradiction to you because you are right, this is not the official source code of Turbo Pascal, but it is the source code of Turbo Pascal.... :-)

Barry also stated this 5 years ago, see

https://news.ycombinator.com/item?id=10202563

but that is not true.

How can I be so sure? I reverse-engineered pretty much all Turbo Pascal versions starting with v4.0 all the way up to v8.0 (Delphi v1.0). I even disassembled the beta versions of Delphi, v7.9h and v7.9k, to see and study the transition from Turbo Pascal to Delphi.

That being said, what you are looking at on turbopascal.org is an absolutely mind-blowing reverse-engineering job. One hint is when you look, for example, at

Procedure AddReferenceRecordForTypedConstant (UnitSegment, BlockRecord: Word; ReferenceFlags: TReferenceFlagSet; DX, TypedConstantOffset: Word);

https://turbopascal.org/processing-typed-constants

at the bottom of the page - where the author couldn't come up with a proper name for the DX register, so he left that identifier unchanged. If you disassemble Turbo Pascal "tpc.exe" and look for that subroutine you'll see that Anders Hejlsberg really used the DX register.... (I know, not really convincing...)


If I understood right - this is a "white box" reimplementation of the real Turbo Pascal, which uses the same algorithms and structures and control flow as the real one (and therefore should be bug-for-bug compatible) but is written in Pascal instead of assembly? That is, the "DX" argument there isn't actually passed in register DX, but it represents a function call in real Turbo Pascal that passed the same semantic contents in register DX?


I don't know what a "white box" reimplementation is (white-box cryptography?) but I guess you meant a clean room implementation (https://en.wikipedia.org/wiki/Clean_room_design), which I believe is not the case here (I am guessing wildly, though). For a clean room implementation you need at least two people: a reverse engineer who writes down his findings, and someone who takes that information and reimplements the same functionality according to what the reverse engineer provides.

But you are right in that it uses the same algorithms, structures and control flow as the real Turbo Pascal and therefore should be bug-for-bug compatible but is written in Pascal instead of assembly.


Probably a poor choice of words, but I mean the opposite of a clean room reimplementation - it's one where the same person is looking at disassembly and writing the new implementation (i.e., the thing being reimplemented is a white box, not a black box).


Interesting!

Isn't original TP7 source code available?

https://github.com/shidel/DustyTP7


You have linked a source code repository of programs and units written for TP7.

The sources of TP6 leaked a few years ago. Search for exmortis on HN.


Those "sources of TP6" you are referring to are not a leak either, but another fantastic reverse-engineering job, probably done in 1993/94, that appeared on the web 20 years ago...


In this context, it's a really impressive work! Can you tell us more?


Also, the default buffer size of the Text type and of untyped files (file without a component type) is 128 bytes as well, for the same historical reason: CP/M, where files are divided into records of 128 bytes.


The source for scanner.pas looks recognizably like an implementation inspired by Niklaus Wirth's compiler books.


Now we need Nim with the super easy Lazarus user interface.


That one would be a game changer. Lazarus has been ported everywhere, from x86 PCs to ARM64 embedded boards.


Nice.

I'm still dreaming of a world where Rust had Nim syntax.


Why not just use Nim? The language is rapidly converging on an ownership model similar to Rust's anyway and has better metaprogramming, more compile targets and a high development velocity (plus faster compile times).

Rust has a bigger ecosystem of course, but Nim's is pretty decent and you can use C, C++ and JS libs natively for anything else.
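
As a small illustration of the C side, here is a sketch that binds `strlen` from the C standard library (chosen only because it is available everywhere; this applies to the C backend):

```nim
# Declare the C function; Nim emits a direct call to it in the generated C code.
proc strlen(s: cstring): csize_t {.importc, header: "<string.h>".}

echo strlen("hello")   # the string literal converts to cstring implicitly
```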


We really need a simpler language that has the same ownership model. I like Nim because I can compile it without threading support, and for RISC-V (albeit 64-bit, wish it had 32-bit support too..).

Overall, I like it, but I could not justify using it in an enterprise setting without knowing that the direction of the language is going towards what is now the established minimum safety, which is borrow checking and no UB.


If you have borrow checking, you are going to inherit a lot of complexity in order to deal with the limitations it imposes. You could be simpler than Rust, for sure, but not as simple as, say, C or Go.

Zig might be a language you are interested in. Simple like C with sensible ways of dealing with UB and bounds checking, as well as options instead of Nulls.


Zig is interesting but still very immature. Its memory model is manual management, so it has no advantage over Rust or Nim there. That is why it's marketed as a better C (and not as anything more than that).


> We really need a simpler language that has the same ownership model

Can you mention which part of the Rust language you found complicated (in the real world), excluding the ownership/memory management area (which includes: borrow checker, lifetimes, smart pointers, traits related to synchronization)?

I read a lot of comments about Rust's complexity, and while nobody doubts the complexity of the ownership/memory management, there's a very common lack of details when it comes to the rest of the language.


The rest of the language is just fine if you are used to the ML language family, maybe that is the issue for those coming from more mainstream languages.


No algebraic datatypes + pattern matching, yes I'm aware of libraries that implement it via metaprogramming, but that's not good enough for me.


> No algebraic datatypes

Not sure what you mean there; there are product types (tuples and objects) and sum types (object variants).

> Pattern matching

Eh, I mean it's just sugar over a case statement.

  let
    num = 5
    str = case num
      of 1: "One!"
      of 2, 3, 5, 7, 11: "This is a prime"
      of 13..19: "A teen"
      else: "Unknown"  # Compile error if not all cases are covered.

As you say, there are libraries that let you deconstruct and partially match more complex types if you need that. Being able to create things like pattern matching, async, and novel multithreading runtimes as a library in Nim shows how powerful (and useful) the metaprogramming is.

Enums in Rust are more interesting, but again nothing you can't do with an object variant.
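
For anyone who hasn't seen an object variant, a minimal sketch (the `Shape` type and its fields are made up for illustration):

```nim
import std/math

type
  ShapeKind = enum
    skCircle, skRect
  Shape = object
    case kind: ShapeKind        # the discriminator selects the active branch
    of skCircle:
      radius: float
    of skRect:
      width, height: float

proc area(s: Shape): float =
  case s.kind                   # has to cover every ShapeKind value
  of skCircle: PI * s.radius * s.radius
  of skRect: s.width * s.height

echo area(Shape(kind: skRect, width: 2.0, height: 3.0))
```

Unlike a Rust `match`, the `case` here only dispatches on the discriminator; the fields are then read through normal field access.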


Try using Rust without any macros.


Nim is nice but "better metaprogramming" is going to be very contentious I think. "Different metaprogramming" for sure, but the rest is a matter of preference frankly.

Nim macros are a lot nicer than Rust's, but Rust's trait system is very powerful and expressive. Sure, you can do something like that with Nim's macros, but you can't expect that all third party code is going to play nice.

That's the problem with the macro approach: it's great because it's ultra-flexible, but it also sucks because it's ultra-flexible. Everybody ends up doing their own little DSL that fits their use case or their personal taste, and it adds friction at the interfaces.


Is this based on having used Nim in practice in projects and observing the code of many Nim projects and collecting data / insights; or without having used Nim and on speculation / general thoughts about macros / DSLs (or somewhere along the spectrum)? Just so it's clear how to contextualize.

Because having tried to explore these questions by studying Nim projects, going through a book and working on a game engine in it the past months; I've found that ufcs+overloading covers the static traits scenario, and it feels like the runtime traits object scenario is better self-rolled (I don't really want the compiler impl'ing vtables underneath me, save deciding to reuse a static dispatch feature for a dynamic one; if I want a struct of func ptrs I will make a struct of func ptrs.) at least for my usage. Various Nim code I come across is usually pretty readable, and I don't need to invent macros willy nilly.

ufcs feels more elegant than needing to decide whether to namespace a function inside a struct or not (or which one), which is what most langs (including Rust) seem to do.
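
A minimal sketch of both halves of that, with made-up types (`Dog`, `Cat`, `Animal`) and procs:

```nim
type
  Dog = object
  Cat = object

# "Static traits": plain overloads, picked at compile time.
proc speak(d: Dog): string = "Woof"
proc speak(c: Cat): string = "Meow"

proc greet[T](x: T): string =
  # Compiles for any T that has a matching `speak` overload in scope.
  "It says " & x.speak()        # UFCS: same as speak(x)

# "Runtime traits": a hand-rolled struct of proc pointers.
type Animal = object
  speakFn: proc (): string

let animals = [
  Animal(speakFn: proc (): string = Dog().speak()),
  Animal(speakFn: proc (): string = Cat().speak())
]

echo Dog().greet()
echo greet(Cat())               # the same call without UFCS
for a in animals:
  echo a.speakFn()
```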


There is no "friction" with macros. Sure, lots of packages implement their own utility macros, but most can be hidden under ordinary procs. Macros CAN be called like ORDINARY procs (check the syntactic sugar module [1]), though not all macros are designed this way. Unless we are talking about public DSLs. But I don't really understand what the problem with them is.

[1] https://nim-lang.github.io/Nim/sugar.html
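
A short sketch using three macros from that module; the variable names are made up:

```nim
import std/sugar

let double = (x: int) => x * 2     # `=>` is a macro that builds an anonymous proc
let squares = collect(newSeq):     # `collect` is a macro wrapping the loop below
  for i in 1..5:
    i * i

echo double(21)
echo squares
dump(squares.len)                  # `dump` prints the expression and its value
```

At the call site none of these look different from a proc or operator call.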


Also, macros do not capture their caller's AST! They operate on the code passed to them. That is another important point that removes any friction. This way nesting macros works with no problem; they're expanded in order. (I can think of even more reasons why your argument is baseless, like "hygienic" identifiers, operating on the AST level versus ugly string concats, etc., but whatever.)


I don't know much about Rust, and even less about Nim, but does this mean that Nim will have the same safety guarantees that Rust has? Also, can you really just add something like that down the line and ensure that the whole ecosystem works correctly? I was under the impression that Rust's strength was that it's been designed from the start to use an ownership model.


Nim’s arc/orc mode (introduced in 1.4 and likely to become the default in 2.0) does borrow checking but does not enforce it in the same way Rust does: instead, you’ll get a compiler warning “I can’t just borrow here, so I am making a copy”, which you can prevent by making your type uncopyable (which will make this an error). The semantics are not equivalent to Rust's, but they are similar.

Nim stops race conditions by controlling data passage among threads (disallowing shared data between threads, in general) and always has.

I am not familiar enough with Rust to say that everything Rust can do and assure is indeed covered in Nim; however, plain Nim without the escape hatches (which, like Rust's unsafe code, exist for those times you must use them) is safe.


That's awesome. I'm currently studying Rust, but I'm looking forward to giving Nim a try in a year or two. I really like the syntax, and if it can offer the same safety and performance as Rust, then I see a bright future for it.


"The language is rapidly converging on an ownership model similar to rust "

Nice.

Do you have more resources on that?

Also, does Nim have a WASM target?



Latest presentation is https://www.youtube.com/watch?v=zPlBZkWvuug It has updated information.


Re: WASM -- https://github.com/nikki93/ng-public -- Public repo for a game side project I've been working on that does Nim -> C/C++ -> WASM (native desktop also works, iOS probably works (have used the same build system for C++ iOS before) and Android can be made to work). One of the modules involved wraps entt which is a template-heavy C++ library and not many languages can actually wrap with just a cdecl-y ffi layer.

Re: borrow checking: https://nim-lang.org/docs/manual_experimental.html#view-type... It's still pretty early, and also IME the mix of arc (which is deterministic) with manually managed memory (esp. through some data structure that manages entity data, like entt) has been pretty nice.


Nim is apparently getting support for owned pointers [1]. There's a proposal [2]. The proposal PR has since been closed, but it's apparently still being worked on.

[1] https://nim-lang.org/araq/ownedrefs.html

[2] https://github.com/nim-lang/RFCs/issues/144


I had played with and really liked "owned". However, there were design issues with it and that feature was forgotten.


Most of those semantics are now tracked automatically without needing annotation in arc/orc.


If you are hoping for the lack of curly braces in particular, there is F#, which is sort of Rust with the minor matter of having a garbage collector, and inefficient iterators.


I have on and off played with Nim and independent of this article and this code had thought that it reminded me greatly of Pascal. With some Python thrown in there too.


The syntax makes it look like Python at a first glance. When you write it however, the type system makes it feel much more like Pascal.


If you squint really hard, it looks like Nim! (this is coming from someone who never learned Pascal)


And if you ever programmed in Pascal, you can tell its influence on Nim just by reading Nim source code. The idiomatic way to write enums made me chuckle.


I'm not sure what the OP was trying to show with this.

It's certainly cool and interesting to see that nim(rod) was written in pascal 11 years ago?

I love Pascal, I just give up on it too fast. But man it's fast, compiles fast, "easy" to use, until you start doing real world stuff.


Having used all Borland compilers since Turbo Pascal 3 all the way up to around the first set of Delphi versions, and a couple of UNIX Pascal compilers as well, I wonder what is so hard for real world stuff.


I continue to wonder if we have even exceeded Delphi in terms of RAD today.

There are so many things that were better in their field, domain or niche but simply got washed out of history. It was only the other day that someone commented that Rails, despite its similarity in concept and execution, is still not as good as the good old Enterprise Objects Framework from NeXT.


Depends on how you look at it, I certainly feel like using Delphi when doing .NET and Java development, which are also the development stacks that took the spotlight away from Delphi (alongside Borland's mismanagement).

What they miss in regard to Delphi is proper AOT to native code capabilities, something that they should have supported (from my point of view) since version 1.0. Java had it via commercial JDKs (which only enterprises cared about), .NET via NGEN, mono aot and some research stuff like CosmOS, Singularity or Midori.

However, now, 20 years later, they are finally realising that it must be an option, but that is mostly thanks to the competition from Go, Rust, Swift and friends rather than anything else.

Interesting that you mention EOF, it eventually became WebObjects (in Java) and it was due to NeXT / Sun collaboration that J2EE was born. Many Java bashers don't realise how much influence Objective-C had in the Java ecosystem, in language features, language runtime, and yes J2EE.

Regarding things being washed out of history, check out Burroughs, VMS, Solo Pascal, Mesa, Multics and PL.8 for examples of systems programming before C was at all relevant outside Bell Labs' walls; the Xerox PARC workstations, namely Interlisp-D, Mesa, Mesa/Cedar and Smalltalk; or Wirth's lineage of Modula-2 and Oberon derived work, based on Xerox PARC work.

Unfortunately, computer history isn't something that most computing degrees spend time on.


On EOF, it was actually an answer from Quora [1] and a comment on HN [2]. I didn't have the luxury of playing with WebObjects then; I remember it was quite expensive, but I have only heard good things about it.

>What they miss in regard to Delphi is proper AOT to native code capabilities

Maybe I am just a fan of Pascal, which for some reason most developers hate, opting for languages with a more C-like syntax.

>Unfortunely computer history isn't something that most computing degrees spend time on.

Yes. We end up not knowing how and why things came to be that way in the first place. And we end up with people reinventing everything.

[1] https://www.quora.com/What-was-it-like-to-be-a-software-engi...

[2] https://news.ycombinator.com/item?id=25199154


Sorry if this is obvious, I have no experience with Delphi, but what is AOT?


Compilation to native code Ahead Of Time.


(The term is general - not specific to Delphi.)


What was so good about Enterprise Objects Framework? I don’t know much about Rails either, by the way.


It was a proper object-relational mapping framework for relational databases, already available in 1990s NeXTSTEP.

http://www.kevra.org/TheBestOfNext/page570/page571/page573/p...


Thanks. Will have a read.


I read "I couldn't find libraries that did most of what I needed and I couldn't be bothered to contribute said libraries"


Not even written in C? Because language interop is a thing, unless one is religiously following the mantra of being an X-lang developer instead of embracing polyglot skills.


Or unless you're coding simply because you enjoy the experience and don't care about interop, or about making a widely used language, at that point.


When I find an issue that would be more practical to handle in C++ than in Java or .NET, instead of throwing my productivity out of the window and rewriting everything in something else, I just write a tiny wrapper for what is missing and write everything else in Java or .NET languages.

Just like I used to call Windows DLLs or COM libraries from TPW and Delphi.

Which incidentally is how Google pushes everyone to write native code on Android: as little as possible, just tiny native libraries. Even the NativeActivity is just a regular Java Activity implementation with predefined native method declarations that the shared object is expected to fill in, to be properly called from the framework.



