He's wrong of course about being the only user.
I love this software, along with k/q. I admire the work Mr. Shields has put into this project. I especially like the use of musl and provision of static binaries.
I guess I am stubborn and stupid: I like assembly languages, SPITBOL, k/q, and stuff written by djb. Keep it terse. Ignore the critics.
Yet this is now on the front page of HN. Maybe because it is the weekend? I really doubt that the software I like will ever become popular. But who knows? Maybe 10 years from now I will look at this post and marvel at how things turned out.
There is no "structured programming" with spitbol. No curly braces. Gotos get the job done. Personally, I do not mind gotos. It feels closer to the reality of how a computer operates.
Would be nice if spitbol was ported to BSD in addition to Linux and OSX. As with k/q I settle for Linux emulation under BSD.
Thanks for that typo, I laughed immediately after drinking some tea and now it's all over my monitor :(
In an alternate reality, high-level languages would be wired directly into our "hardware", via microcode or FPGA's or what have you. Software systems would be designed first, then the circuitry. In this alternate reality, Intel did not monopolize decades doubling down on clock speed so that we wouldn't have time to notice the von Neumann bottleneck. Apologies to Alan Kay. 
We should look at the "bloat" needed to implement higher-level languages as a downside of the architecture, not of the languages. The model of computing that we've inherited is just one model, and while it may be conceptually close to an abstract Turing machine, it's very far from most things that we actually do. We should not romanticize instruction sets; they are an implementation detail.
I'm with you in the spirit of minimalism. But that's the point: if hardware vendors were not so monomaniacally focused on their way of doing things, we might not need so many adapter layers, and the pain that goes with them.
> Another study was started to see if a new RISC architecture could be defined that could directly support the VMS operating system. The new design used most of the basic PRISM concepts, but was re-tuned to allow VMS and VMS programs to run at reasonable speed with no conversion at all.
That sounds like designing the software system first, then the circuitry.
Further, I remember reading an article about how the Alpha was also tuned to make C (or was it C++?) code faster, using a large, existing code base.
It's not on-the-fly optimization, via microcode or FPGA, but it is a 'or what have you', no?
There are also a large number of Java processors, listed at https://en.wikipedia.org/wiki/Java_processor . https://en.wikipedia.org/wiki/Java_Optimized_Processor is one which works on an FPGA.
In general, and I know little about hardware design, isn't your proposed method worse than software/hardware codesign, which has been around for decades? That is, a feature of a high-level language might be very expensive to implement in hardware, while a slightly different language, with equal expressive power, might be much easier. Using your method, there's no way for that feedback to influence the high-level design.
Basically (again, knowing nothing about this), I assume that there's a better balance to be struck between the things that hardware vendors have already mastered (viz, pipelines and caches) and the things that compilers and runtimes work strenuously to simulate on those platforms (garbage collection, abstractions of any kind, etc).
My naive take is that this whole "pivot" from clock speed to more cores is just a way of buying time. This quad-core laptop rarely uses more than one core. It's very noticeable when a program is actually parallelized (because I track the CPU usage obsessively). So there's obviously a huge gap between the concurrency primitives afforded by the hardware and those used by the software. Still, I think that they will meet in the middle, and it'll be something less "incremental" than multicore, which is just more-of-the-same.
That's because it is closer, as I'm sure you know, since you stated your fondness for assembly languages. I even like them for specific, limited tasks (advanced loop control). That said, I think preferring them over more "modern" constructs such as if/while/for is sort of like disparaging all those new gas powered carriages, because you can get around just fine with your horse to power your carriage, thankyouverymuch. There are very good reasons to approach most uses of goto with skepticism.
Goto is essential, it's the glue that holds the instruction set together. That said, we must not fetishize it, just as we must not fetishize items of the past that are largely superseded by what they helped create. To do so slows us down, and we fail to achieve what we otherwise could. We must not forget them either, they have their places, and to do so would also slow us down.
I'd argue that e.g. an x86 LOOP instruction is far closer to a do/while loop than to a goto. Most of the jump instructions I see in my disassembly aren't unconditional like goto is - if anything, car engines are closer to horses in what they accomplish than, say, jnz is to goto! Even jmp certainly doesn't use named labels, as any goto worth its salt would - instead you'll see absolute or relative address offsets.
>> Personally, I do not mind gotos. It feels closer to the reality of how a computer operates.
There's a time and place to get close to the hardware, but I've never felt that goto got me meaningfully closer. Of course, my first and primary exposure to GOTO was in BASIC - where it was interpreted.
You want to get close to the hardware? Play with intrinsics. Open up the disassembly and poke at it with a profiler. Find out if your algorithm manages to execute in mostly L1 cache, or if it's spilling all the way out into RAM fetches. Figure out where you need to prefetch, where you're stalling, where your cycles are being spent. Diagnose and fix some false sharing. Write your own (dis)assembler - did you know there's typically no actual nop instruction? You simply emit e.g. xchg eax, eax, which happens to do nothing of note, and label it "nop" for clarity.
IMO, you'll have more time to do these things when embracing the advantages that structured programming can provide. Of course, I may be speaking to the choir, at least on that last point.
As for JNE not being a GOTO, it most certainly is. It just so happens to only happen under certain circumstances (along with the other conditional jumps, and yes, that's how they are described). Compare:
IF X <= 45 GOTO THERE
And let me assure you, when writing assembly, you almost always use labels. A disassembly will show you the absolute/relative address because that's all it has to go by.
Oh, and my rant wasn't aimed at you, per se, but at the statement about goto, which I expanded in isolation into a fictional point of view. That point of view may or may not have any relation to how you feel about programming and goto, I have no idea.
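In fact you can watch structured control flow compile down to conditional jumps in any language. A small sketch using Python's dis module (the exact opcode names vary by interpreter version):

```python
import dis

def check(x):
    # Structured if/else...
    if x <= 45:
        return "THERE"
    return "HERE"

# ...compiles down to a conditional jump -- effectively
# "IF X <= 45 GOTO THERE" at the bytecode level.
jumps = [ins.opname for ins in dis.Bytecode(check) if "JUMP" in ins.opname]
print(jumps)  # e.g. ['POP_JUMP_IF_FALSE'], depending on Python version
```

The compiler generates the conditional jumps and address offsets so you don't have to; whether that's a loss of closeness to the machine or a welcome abstraction is exactly the disagreement in this thread.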
Goto is definitely still out there.
But one day, as I waited for a keypunch to make some changes to some program or other I was writing, my eye fell upon a copy of the Green Book left behind by some other programmer. I started reading it, and my little mind was completely blown.
SNOBOL was something else, it forced me to think about programs in a completely different way. It wasn’t about specifying steps to be taken one by one, it was about designing a way to pattern match.
And more than in the obvious, RegExp way: you could do things as the pattern was matching, and thus you were writing a kind of program where the control flow was determined by backtracking and the success or failure of matches.
To this day, my programming is highly influenced by one feature of SNOBOL, I guess it “imprinted” on me: Patterns are first-class values (like regular expressions in other languages), but they are also composable, there is an arithmetic of patterns. To this day I favour designs where things can be composed and decomposed at will rather than used as opaque, monolithic entities.
I’m not saying SNOBOL was better than the FP and OOP and multi-paradigm languages that dominate today, but the experience of learning a new way to think was intoxicating, and once a year or so I re-read the Green Book and think about thinking differently.
If you are interested, I highly recommend you read the whole book. If you see “pattern matching” and think “Regular Expressions,” you will miss the forest for the trees. I’m not sure that anybody needs to know SNOBOL (or its descendants), but I think that it’s a valuable exercise to learn it once.
"A language that doesn't affect the way you think about programming, is not worth knowing.”
In the late 70s, I took two compiler courses with RBK Dewar, one of the creators of SPITBOL. Those courses were wonderful. He mentioned SPITBOL occasionally, and I remember one story in particular. The implementation was done in assembler (if I'm remembering correctly), and it took 20 runs to get a working system (I guess that means a basic suite of tests running successfully). That style of working is completely alien today, and arguably less effective.
Dewar also spent some time talking about his work on the SETL language (for processing sets). Flow analysis for global optimization could be expressed extremely concisely, and was of course applied to SETL itself.
On one or two occasions I asked Don Woods to clarify some feature of the language that was incompletely described in the original INTERCAL document, and he dug out the original SPITBOL code in order to answer my question.
At the time I did not realize that Brian Kernighan created Ratfor. Ratfor changed my thinking about program structure and coding style more than any other single event in my professional life.
> SPITBOL (Speedy Implementation of SNOBOL) is a compiled implementation of the SNOBOL4 language.
The Wikipedia page on SNOBOL also shows some examples:
And C2 talks about it as well:
In Python, you would probably use regex for the pattern matching. In SPITBOL, you can accomplish the task at the language level. I doubt the pattern matching is as capable as regex but that's a useful feature to have (edit: based on braythwayt's comment, it sounds like the pattern matching is more capable than regex). It might be better suited to NLP tasks. According to the developer, "SPITBOL is unique in the power it provides to manipulate strings and text. I’ve yet to see anything else come close."
See Zuider's comment for more information.
I hope all-caps-keywords is optional.
"SNOBOL4 patterns subsume BNF grammars, which are equivalent to context-free grammars and more powerful than regular expressions." - https://en.wikipedia.org/wiki/SNOBOL
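A quick way to see the gap between regular and context-free power is nested balanced parentheses, which classic regular expressions can't match at arbitrary depth but a recursive pattern handles naturally. A minimal Python sketch (the `balanced` helper is purely illustrative, not from any library):

```python
def balanced(text, pos=0):
    """Return the position just past one balanced ( ... ) group,
    or None if the group is missing or unbalanced."""
    if pos >= len(text) or text[pos] != "(":
        return None
    pos += 1
    while pos < len(text) and text[pos] != ")":
        if text[pos] == "(":
            pos = balanced(text, pos)   # recurse into a nested group
            if pos is None:
                return None
        else:
            pos += 1
    return pos + 1 if pos < len(text) else None

print(balanced("((a)(b))"))  # 8 -- matched the whole nested group
print(balanced("((a)"))      # None -- unbalanced
```

The recursion is what regular expressions lack; SNOBOL4 patterns can refer to themselves, which is what lets them cover BNF-style grammars.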
As opposed to the 50's software technology of garbage collection?
What an off-putting remark to include.
It's strange how anyone would look back and think the language ideas from back then are all useless. Or that there's significantly new stuff going on now in mainstream implementations.
Don Syme, creator of the F# language, wrote to an OCaml mailing list about how hard it was to get generics into the CLR. How MSCorp viewed it as "academic" and "experimental".
Don Syme's blog has some more details on how deeply MSR got involved to ship generics. MS Corp simply wasn't capable or interested. This gets revised when talking to folks today - claims that oh yeah, Delphi++ was always gonna do generics right. But in truth, it seems they weren't going to make it without MSR's (F# team overlaps with the generics team) help.
It's incredibly frustrating to see, even as an outsider. Having the patience, tact, and political savvy to pull this off? It's pretty impressive. But did MS learn? Nope. F# is still a second-class citizen, with only token support. F# still isn't marketed as "generally better than C#", but still aimed at "niche" users. It's sad.
MS language tech is pretty much stagnated for 8 years now. Feels like IE all over again. They trounced their major competitor (Java), so now they can kick back and add minor stuff here and there because there's no pressure.
From the blog:
> Generics for .NET and C# in their current form almost didn't happen: it was a very close call, and the feature almost didn't make the cut for Whidbey (Visual Studio 2005). Features such as running CLR code on the database were given higher priority.
> Ultimately, an erasure model of generics would have been adopted, as for Java, since the CLR team would never have pursued a in-the-VM generics design without external help.
I have no particular reason to doubt his perspective - and I certainly don't have any inside knowledge - but a feature not making the cut for version N doesn't always imply that a shitty version would have been implemented in version N+1.
> The world is as it is today because we act when we have the chance to act, and because we invest in foundations that last the test of time.
Could this be a case of the victors getting to write the canonical history?
MS language tech is pretty much stagnated for 8 years now. Feels like IE all over again.
Out of curiosity, are there language features that you think should be added to C#? For C# to be like IE, the world would need to have moved on and C# would have to be behind in this new world. It doesn't seem like that's the case. And compared to Java, C# moves at a pace that's absolutely breakneck. What has Java gotten in the past 8 years? Crappy not-closures?
They trounced their major competitor (Java)
I've not got much inside knowledge. I was an MVP 2003-2005 on C# then CLR and Security. I wasn't very knowledgeable back then, but I don't recall any push for FP style, at all. I think the fact that C# originally didn't have lambdas, then added them with an 8-character keyword says enough.
Implementing better generics later? I doubt it. There have been no CLR changes since v2, as far as the type system or IL goes. So that's 10 years, no additions, just tweaks here and there. Hell, even now, .NET Native relies on source-level transformations, instead of being implemented at the IL level.
They've been hyping Roslyn. Great. One famous problem with C#'s compiler is that its design made it very hard to add type inference consistently to the language. They rewrote the compiler, did they fix that? Nope. Even worse: my watch says it's 2015, but VS/C# still doesn't ship a REPL. Come on. (Yeah, maybe "C# Interactive" will show up some day now that 2015 has RTM'd. But not today.)
The core of my complaint is that they have a mindset of implementing things as hard-coded scenarios, rather than general-purpose features. Async. Dynamic. Duck typing. Operators. Even the lambda syntax being ambiguous between expressions and code. Why? Because they chose an end-user scenario, e.g. LINQ, then implemented just what they needed to get that scenario done. That lacks elegance. It adds conceptual overhead.
Java has more popularity because MS decided to shun non-Windows. Essentially no one prefers Java-the-language over C#, but Windows-only is a nonstarter in many cases. My IE comment is saying that, like IE, MS has removed resources and the drive to seriously improve its language tech, as there are no real competitors in their space, language-wise.
P.S. I still think C# is a good language in relative terms and they have brilliant people doing great work on it. And the polish of the tooling - wow, yeah it's amazing. I'm just disappointed that MS doesn't seem to be interested in really upping-the-ante and being a leader here. F# is basically a "best-of-breed" language that'd put them solidly ahead, yet they neglect it.
Edit: This is probably too negative of a comment. It's just years of frustration with MS coming out, that's all.
You haven't had to convince other engineers in your company to adopt a new practice or management to build a new structure?
The difference is that I don't refer to my employer in the third person. I thought the phrasing was strange. It confused me why he needed to convince Microsoft ("MSCorp") since he is a part of Microsoft.
Reading the blog post linked in the comment I was replying to brought the needed clarity: he was talking about convincing Microsoft-not-Microsoft-Research ("MSCorp") while he was at Microsoft Research.
At the very least, Objective-C, Swift and PHP still use reference counting.
Reference counting is better than mark-and-sweep GC for several use cases:
* Real time code where you don't ever want a GC to steal cycles. I know that a lot of research has been done to decrease the amount of time stolen, but it's always non-zero.
* Immediate, deterministic clean-up of resources as soon as they are no longer referenced: If you have a limited number of file handles, for instance, you want them closed ASAP, and not when some GC decides it's time.
* No performance penalty for having weak references. I use this in asset management: A map of asset handles to weak references to currently loaded assets. If an asset is no longer used, it's deallocated immediately. Having weak references in a GC system can increase processing complexity.
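The asset-map idea in that last point can be sketched with Python's weakref module; under CPython's reference counting, the cache entry vanishes the moment the last strong reference is dropped, with no GC pass involved (the asset names here are just illustrative):

```python
import weakref

class Asset:
    def __init__(self, name):
        self.name = name

# A map of asset names to weakly-referenced loaded assets. The cache
# never keeps an asset alive on its own.
cache = weakref.WeakValueDictionary()

texture = Asset("grass.png")
cache["grass.png"] = texture
print("grass.png" in cache)  # True while a strong reference exists

del texture  # last strong reference dropped; CPython frees it immediately
print("grass.png" in cache)  # False -- the entry disappeared with it
```

Note this immediacy is a property of reference counting specifically; under a tracing GC the entry would linger until the next collection.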
Real time code shouldn't allocate/deallocate memory, much less from a GC'able pool. With that constraint, it's possible to have real time code that coexists with a GC, such as RTSJ's NoHeapRealtimeThread or Eventrons, with an effective cost of zero cycles taken by GC from the realtime code.
In C++ you can also replace the allocator to pull items from a pool, so that the "allocation" is "grab the first item from the pool linked list" and "deallocation" is "link this item back into the pool." The first case costs two pointer reads and one write, the second case costs two pointer writes and one read.
This lets you use objects as if they're dynamically allocated while keeping allocation/deallocation costs very low.
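The same free-list idea can be sketched outside C++ too; here is a minimal, hypothetical Pool class in Python where acquire and release are both O(1) list operations:

```python
class Pool:
    """A simple free-list object pool: 'allocation' pops an item off the
    free list, 'deallocation' pushes it back -- no real allocator calls
    after the pool is built."""
    def __init__(self, factory, size):
        self._free = [factory() for _ in range(size)]  # preallocate

    def acquire(self):
        return self._free.pop()       # grab a free item

    def release(self, item):
        self._free.append(item)       # link the item back into the pool

pool = Pool(dict, 8)    # a pool of 8 reusable dicts
obj = pool.acquire()
obj["hp"] = 100
obj.clear()             # reset before returning it
pool.release(obj)
```

In C++ the pool would hand out raw storage via a custom allocator; the free-list bookkeeping is the same either way.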
Critical sections are really designed to wrap only a few lines of code. Basically nothing nontrivial should be done within a critical section, IMO.
If you're dealing with multithreading, the only safe thing to do with references is to put them in a list of "things to release later." And then do that from the main thread.
GC does make this easier, sure. But creating a "release list" is not hard. Making a GC not stall the program at awkward times is actually a lot harder.
If you have a limited number of file handles, you may want them closed ASAP, and not when some reference-counting mechanism or GC decides. Reference counting is not ASAP. Typically, you have some smart pointers which will drop the reference in relation to some lexical scope. That could be too late: the file handle object could be in scope over some long running function, and so the file is held open. The fix is to call the close method on the object, and then let refcounting reap the object later. (Woe to you if you decide to manually take over reference counting and start hand-coding acquire and release calls. Been there, debugged that.)
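A small Python sketch of that "close explicitly, let refcounting reap the object later" advice (the file contents and names are illustrative):

```python
import os
import tempfile

def summarize(path):
    f = open(path)
    header = f.readline()
    # Release the scarce resource (the OS file descriptor) explicitly,
    # as soon as it's no longer needed. The Python object f stays
    # referenced until the function returns, but that's now harmless.
    f.close()
    # ...imagine a long-running computation here, with f still in scope...
    return header

fd, path = tempfile.mkstemp()
os.write(fd, b"first line\nsecond line\n")
os.close(fd)
header = summarize(path)
os.unlink(path)
print(header)  # "first line\n"
```

Relying on scope-driven refcounting alone would have held the descriptor open for the whole function; explicit close decouples resource lifetime from object lifetime.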
I implemented weak hash tables in a garbage collector; it's neither complicated nor difficult. Under GC, we use weak referencing for essential uses that require the semantics, not as a crutch for breaking circular references.
I worked on the now-deprecated GC for Cocoa frameworks, and we made heavy use of weak references for out-of-line storage. This put us at risk for cycles: if A references B through a weak-to-strong global map, and B in turn references A, we have an uncollectible cycle even under GC. This represented a large class of bugs we encountered under GC.
So both GC and RC have their classes of cycles and unpredictable behavior. I've come to believe that these techniques are more related than we'd like to admit, and the real difference is in their second order effects. For example, GC enables lockless hash tables, which require hazard pointers or other awkward techniques under RC. On the other hand, RC enables cheap copy-on-write, which is how Swift's collections can be value types.
An atomic increment/decrement takes so little time as to make this irrelevant. If you're in such a tight loop that you care about a single increment when calling a function (to pass a parameter in), you should have inlined that function and preallocated the memory you're dealing with.
I'm talking about general use of smart pointers, which means that there's a function call involved with the smart pointer value copy, and throwing an increment in is trivial by comparison.
>whichever module happens to drop the last reference to an object which is the last gateway to a large graph of objects
When writing games, I don't think I ever had a "large graph of objects" get dropped at some random time. Typically when you drop a "large graph" it's because you're clearing an entire game level, for instance. Glitches aren't as important when the user is just watching a progress bar.
And you can still apply "ownership semantics" on graphs like that, so that the world graph logically "owns" the objects, and when the world graph releases the object, it does so by placing it on the "to be cleared" list instead of just nulling the reference.
Then in the rare case where something is holding a reference to the object, it won't just crash when it tries to do something with it. In this rare case a release could trigger a surprise extra deallocation chain, as you've suggested.
If that's ever determined to be an issue (via profiling!) you can ensure other objects hold weak references to each other (which is safer anyway), in which case only the main graph is ever in danger of releasing objects -- and it can queue up the releases and time-box how many it does per frame.
Honestly having objects reference each other isn't typically the best answer anyway; having object listeners and broadcast channels and similar is much better, in which case you define the semantics of a "listener" to always use a weak reference, and every time you broadcast on that channel you cull any dead listeners.
Aside from all of that, if you're using object pools, you'd need to deallocate thousands, maybe tens of thousands, of objects in order for it to take enough time to glitch a frame. Meaning that in typical game usage you pretty much never see that. A huge streaming world might hit those thresholds, but a huge streaming world has a whole lot of interesting challenges to be overcome -- and would likely thrash a GC-based system pretty badly.
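The listener/broadcast-channel design mentioned above can be sketched with weak references; this is a minimal, hypothetical Python version where dead listeners are culled on each broadcast:

```python
import weakref

class Channel:
    """Broadcast channel that holds only weak references to listeners,
    so subscribing never keeps a listener alive."""
    def __init__(self):
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(weakref.ref(listener))

    def broadcast(self, message):
        live = []
        for ref in self._listeners:
            listener = ref()
            if listener is not None:        # cull dead listeners
                listener.on_message(message)
                live.append(ref)
        self._listeners = live

class Listener:
    def __init__(self):
        self.seen = []
    def on_message(self, msg):
        self.seen.append(msg)

chan = Channel()
a, b = Listener(), Listener()
chan.subscribe(a)
chan.subscribe(b)
chan.broadcast("spawn")
del b                    # the channel's weak ref won't keep b alive
chan.broadcast("tick")
print(a.seen)            # ['spawn', 'tick']
```

Because the channel defines listener references as weak by convention, no listener ever needs to remember to unsubscribe, and no dangling strong reference can leak a dead object.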
For example, with the Qt library you can pass objects around by value, yet behind the scenes everything is reference counted with automatic copy-on-write. It's the best of all worlds. You get easy, value-based coding (no pointers), speed (because deep down everything is a reference), and deterministic destruction (no GC). http://doc.qt.io/qt-5/implicit-sharing.html
I'm curious if any languages have adopted a Qt-style approach natively.
PHP does exactly this, but for arrays and strings only (objects are implicitly passed by reference). So you can pass arrays by value with no performance penalty, as they are actually passed by reference. A COW mechanism ensures you end up with a local copy only if you write to an argument; such mechanism is disabled when passing arguments byref.
You can easily create a cycle of objects not reachable by any of your roots in your object graph. The ref counts won't ever reach 0 so to collect it you still need a gc pass periodically, imposing stop times. For the same reason you must never rely on ref counting to clean up file objects etc.
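For what it's worth, this hybrid is exactly how CPython works: reference counting as the primary mechanism, plus a periodic cycle collector as backup. A small demonstration:

```python
import gc

class Node:
    def __init__(self):
        self.other = None

a, b = Node(), Node()
a.other, b.other = b, a   # a <-> b form a reference cycle

del a, b                  # refcounts never reach zero...
collected = gc.collect()  # ...so the cycle collector must reclaim them
print(collected)          # at least 2 unreachable objects found
```

In a pure refcounting scheme (or with the collector disabled), that pair would simply leak unless the programmer broke the cycle, e.g. with a weak back-reference.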
So writing refcounting code simply means being aware of this when designing the more complicated data structures in your code to use weak backreferences.
File objects are not secretly stashed in complicated graphs to prevent their destruction and you very much can rely on their behavior. GC passes to clean up cycles is something you got confused about: that's what GC does (because unrooted cycles are very much an issue there too!), not refcounting where you always have to break the cycles yourself, manually, preferably when designing the data structures.
That's like saying memory leaks don't magically appear, you write them. In real code, ref cycles are everywhere and it is not trivial to know beforehand what code will generate cycles. And don't give me the spiel about how that's only something that affects bad programmers.
(I was lucky enough to study compilers under Prof. Dewar when I was a grad student at NYU - I still have my notes on SPITBOL’s architecture, somewhere…)
> We were talking about students' tendency to let the compiler substitute for thinking
This is actually why I use OCaml. Not going to comment on whether useful error messages are a detriment to pedagogy, but offloading thinking to the compiler is a lifesaver.
Though OCaml's errors are not always the most explicit or nicely written.
It awakened me to just how different and amazing a programming language could be, and bent my mind around something very different from what I'd been doing with Fortran. It was an entirely new way to think about designing solutions.
Years later, when I was introduced to Prolog, everything felt very much at home...Prolog's backtracking algorithm being very much like SNOBOL's pattern matching system.
Of all the languages I've worked with over the years, SNOBOL and FORTH are in a class by themselves for how they informed my thinking about problem solving...lessons I carried with me in work done in many other languages.
It's a shame that both languages have passed into history...they each had subtle things to teach a developer just learning their craft...
Can anyone elaborate on this?
'SNOBOL4 stands apart from most programming languages by having patterns as a first-class data type (i.e. a data type whose values can be manipulated in all ways permitted to any other data type in the programming language) and by providing operators for pattern concatenation and alternation. Strings generated during execution can be treated as programs and executed.'
By contrast, most other languages favor the use of regular expressions.
I am not so sure about the claim to uniqueness. Ralph Griswold went on to develop the Icon language, which included this feature. The Unicon language, a superset of Icon, also has this distinction.
FULLNAME = FIRSTNAME LASTNAME
Maybe you want to handle old-school people like raganwald:
OLDSCHOOLNAME = LASTNAME ‘, ‘ FIRSTNAME
ANYNAME = FULLNAME | OLDSCHOOLNAME
Regular expressions work the same way, but in most languages, the regular expression language is really a DSL embedded in the syntax for a regular expression literal. Whereas in SNOBOL, all those operators are ordinary language operators and you can use them anywhere.
So you can manipulate patterns programatically.
full_name = first_name * last_name
old_school_name = last_name * P', ' * first_name
any_name = full_name + old_school_name
(* is sequence, + is or, P is a function that converts a string to a pattern)
That's the kind of way that patterns are more deeply baked into these languages.
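To make the composability concrete, here is a toy sketch in Python of patterns as ordinary first-class values. The lit, seq, and alt combinators are hypothetical (not a real library), standing in for SNOBOL's concatenation and alternation; each pattern is just a function from (text, pos) to a new position, or None on failure:

```python
def lit(s):
    """Pattern matching the literal string s."""
    def p(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return p

def seq(*ps):
    """Concatenation: match each sub-pattern in order."""
    def p(text, pos):
        for q in ps:
            pos = q(text, pos)
            if pos is None:
                return None
        return pos
    return p

def alt(*ps):
    """Alternation: first sub-pattern that matches wins."""
    def p(text, pos):
        for q in ps:
            r = q(text, pos)
            if r is not None:
                return r
        return None
    return p

first_name, last_name = lit("Reginald"), lit("Braithwaite")
full_name = seq(first_name, lit(" "), last_name)
old_school_name = seq(last_name, lit(", "), first_name)
any_name = alt(full_name, old_school_name)

print(any_name("Braithwaite, Reginald", 0))  # 21 -- matched the whole string
print(any_name("nope", 0))                   # None
```

Because patterns are ordinary values built with ordinary operators, you can store them, pass them around, and build bigger patterns from smaller ones anywhere in the program, which is the point being made about SNOBOL.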
Perl 6 appears to have assimilated that idea:
IIUC grammars are also first class in Perl 6, but it isolates their DSL to grammar blocks. I'm not sure of the specifics of each to note whether one is capable of easily doing something the other can't or has a hard time with, but it looks to boil down to SPITBOL's implementation being slightly easier to access as there's no grammar block required, and Perl 6's being slightly more clear and self-documenting, due to that same requirement.
Note: I've yet to use either, so someone with more experience, possibly you, might be able to correct my misunderstandings.
I don't know SNOBOL though, so I'm having a hard time picturing what the actual implications of that are, or exactly how it works. But I'm intrigued enough now that I want to go read this "Green Book" and see what it's all about.
.rule name value pattern ; Define a syntax rule
.insn pattern ; Define an instruction
... ; Macro expanded on pattern match
"pattern" contains literal characters, whitespace and
references to other rules with "<rule-name>" or <expr>
for a math expression.
"value" is a comma separated list of expressions which
can contain "argN" to reference the Nth value from the
pattern (as returned by embedded rules).
For example, this is how you could construct the
instructions "lda <expr>", "lda #<expr>", "ldb <expr>",
and "ldb #<expr>":
.rule dual 0x01 lda
.rule dual 0x02 ldb
.rule mode 0xf8,arg1 <expr>
.rule mode 0xfa,arg1 #<expr>
.insn <dual> <mode>
.byte arg1|arg2 ; Emit op-code
.word arg3 ; Emit argument
SNOBOL (StriNg Oriented and symBOlic Language) is a series of computer programming languages developed between 1962 and 1967 at AT&T Bell Laboratories by David J. Farber, Ralph E. Griswold and Ivan P. Polonsky, culminating in SNOBOL4.
TkS*LIDE is a Tcl/Tk based IDE for SPITBOL (along with SNOBOL4). Binaries for SNOBOL4 and SPITBOL are included with the IDE, along with a tutorial and sample programs.
SNOBOL is also distinguished by being described in Guy Steele & Richard Gabriel's "50 in 50" speech as one of the three languages worth knowing:
> SPITBOL is unique in the power it provides to manipulate strings and text. I’ve yet to see anything else come close.
I would be interested to know more about what features SPITBOL offers for string processing. I'm going to take a look at the "Green Book" Dave mentions, but if anyone else has relevant, focused resources on that topic I'd love to give them a look.
The string processing is fantastic -- extremely powerful. I think regexes have a lot of the same power, but I always found SNOBOL4 more readable after the fact, when I had to go back and read and fix my own code. But that's not it either.
I think the main reason I liked SNOBOL4 so much was that it was the first dynamic language I used. Values have types, variables do not. That was a big revelation. I don't think I actually exploited it very often, but it was a cool new idea. And the absence of type declarations also contributed to the sense that the code looked clean. Automatic memory management was also very nice. I had spent a lot of time dealing with memory management in C. I must have in Pascal also. And I really don't remember what Algol-W did for memory management -- a free or delete statement maybe? And of course in FORTRAN, COBOL, and BASIC, there was no dynamically allocated memory at all, so you had to guess high, keep track, etc. Not having to worry about tracking memory was a nice change.
Would be interested to know what built-in types are currently available. I wonder also whether this language would also be a good fit for test case writing.
The author of HolyC and TempleOS, Terry Davis, might be in a similar position:
Though I imagine with the exposure it's gotten over the years, Terry might not be the sole user (probably sole regular user).