PeCaN's comments (Hacker News)

If you watch the video it looks like their proposed syntax is not APL-like but closer to mainstream languages.

I'm honestly not sure if this is a good thing or not. You said "easier" syntax than APL, but APL is honestly a very easy syntax for working with arrays. That's a significant part of APL's advantage: it makes it very easy to come up with, talk about, and maintain array algorithms.

Matlab, Julia, and other languages aimed at scientific computing have some array-language-like traits but lack many of the functions that make APL more generally applicable. And .map is all wrong: it's extra noise, and it doesn't generalize down to scalars or up to matrices; the defining feature of array languages is that operations are implicitly polymorphic over the rank of the input.


I understand that. I still don't want to spend the effort to learn APL.

It's like when digital cameras came along. Many users knew how to use film cameras, so digital cameras were made to work mostly like film cameras, even though the digital medium would have enabled a very different, much better camera right out of the box. The market had invested so much time in learning how to work with film that you had to do it that way. Path dependency is not just about rigid thinking; it's about using what you have, because that saves a lot of resources.

Regarding .map being all wrong: in Ruby it's not a property of arrays, it's a method on Enumerable. Array is one kind of enumerable, but it also works with hashmaps etc. https://ruby-doc.org/core-2.6.5/Enumerable.html So it's not that non-general. It is noisy (and weird with the pipes) because it's general.


To be honest, I don't really see people who don't want to learn APL being that interested in putting in the effort to completely upend how they think about programming and algorithms in order to use other array languages, regardless of syntax. (After all, this is by far the hardest part of learning APL; the symbols are easy enough, and easy to look up anyway.)

map is general in kind of the wrong way. You could, after all, add a #map method to Object for scalars, make a Matrix class that also implements it, and then just call map everywhere. But you still run into the problem, mentioned in the video, that it doesn't easily generalize to x + y where both x and y are arrays; you have to use zip or map2 or something (and you still have to figure out how to do vector + matrix). Yes, you can kind of do explicit "array programming" in Ruby if for some reason you're really compelled to, but it will look awful. And that's just what array languages do for you implicitly. As a paradigm there's a bit more to it than "just call map everywhere"; there are still all the functions for expressing algorithms as computations on arrays.
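For anyone who hasn't used an array language: NumPy borrows exactly this rank polymorphism from APL, so it makes a handy (if approximate) illustration of what "implicitly polymorphic over rank" buys you compared to chaining map/zip:

```python
import numpy as np

# One "+" works across ranks with no map, zip, or map2 in sight —
# the broadcasting rules handle scalar+vector, vector+vector, and
# matrix+vector uniformly.
scalar = 2
vector = np.array([1, 2, 3])
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(vector + scalar)   # scalar broadcast over a vector: [3 4 5]
print(vector + vector)   # elementwise: [2 4 6]
print(matrix + vector)   # vector broadcast over each matrix row
```

The last line is the vector + matrix case that explicit map/zip chains make awkward: NumPy stretches the vector across each row of the matrix automatically.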


I've been working on something like this on and off for the past 4 years or so, although with something more like generators than streams.

I think it's a very, very promising idea (I admit to being heavily biased toward anything APL-influenced), although it's surprisingly difficult to get right. Gilad Bracha is obviously way smarter than me, so I'm definitely curious where he goes with this.

One additional idea that I keep trying to make work is integrating variations of (constraint) logic programming and treating solutions to a predicate as a generator or stream that operations can be lifted to rank-polymorphically. As a simple example a range function could be defined and used like (imaginary illustrative syntax)

    range(n,m,x) :- n <= x, x <= m
    
    primesUpto(n) = range(2,n,r),               # create a generator containing all solutions wrt r
      mask(not(contains(outerProduct(*, r, r), r)), r)  # as in the video
    
I've never really gotten this to work nicely and it always feels like there's a sort of separation between the logic world and the array world. However this feels incredibly powerful, especially as part of some sort of database, so I keep returning to it even though I'm not really sure it goes anywhere super useful in the end.
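For illustration only, here's a rough NumPy translation of that imaginary snippet, with the solutions to the range predicate materialized as an array up front (NumPy and the name primes_upto are my choices for the sketch, not anything from the video):

```python
import numpy as np

def primes_upto(n):
    r = np.arange(2, n + 1)             # all solutions of range(2, n, x)
    products = np.multiply.outer(r, r)  # outerProduct(*, r, r)
    # mask(not(contains(products, r)), r): keep the r that never
    # appear as a product of two values >= 2, i.e. the primes.
    return r[~np.isin(r, products)]

print(primes_upto(20))  # [ 2  3  5  7 11 13 17 19]
```

What this loses, of course, is exactly the interesting part: the predicate is forced into an eager array immediately, rather than staying a logic-world object that array operations are lifted over.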


Have you seen Halide? https://halide-lang.org/

It's a bit special-cased (afaict), but it sounds like it makes at least a solid step in this sort of direction.


What does ":-" mean?


Probably "such that" (it's borrowed from Prolog, where :- reads as "if"/"is implied by")


it's not so much that gcc does anything specific as that LLVM is just really, really inefficient: they don't track compilation time at all, so it's easy for releases to regress; half the stuff in there is academics implementing their PhD theses, aiming for algorithmic accuracy with little regard for efficiency; and LLVM's design itself is somewhat inefficient (multiple IRs, lots and lots of pointers in the IR representation, etc.)

that said, this makes it an excellent testbed, but compilation time will keep getting slower every release until they start caring about it


The Rust compiler team now tracks LLVM performance, see [0]. They are able to spot regressions as soon as they get merged, and petition to get the change reverted or the performance fixed when the impact is large. This has been fairly successful so far, but it obviously needs sustained effort if we are ever to get halfway decent compilation times.

[0]: https://nikic.github.io/2020/05/10/Make-LLVM-fast-again.html


that's just because gcc has certain optimization passes that can't be disabled

(that said gcc -O0 is still absolutely nothing like what a human would write)


- it takes some die space, sure, but no x86 processors are actually limited by instruction decoding (except, iirc, the first-generation xeon phi in some cases)

- huge pages don't exactly require weird OS hoops, although i agree the 4KB→2MB→1GB page sizes are inconvenient


All the Intel CPUs with RDRAND, although I guess that's not exactly hidden anymore.


Do you have any documentation that a back door has been found? I'm aware Ted Ts'o and others were worried of the possibility of backdoors in RDRAND, but I wasn't aware of any being confirmed. Also, the Wikipedia article for RDRAND really needs to be updated if a back door has been confirmed.


I sort of wonder if this approach to JITing is worth it over just writing a faster interpreter. This is basically like what V8's baseline JIT used to be and they switched to an interpreter without that much of a performance hit (and there's still a lot of potential optimizations for their interpreter). LuaJIT 1's compiler was similar, although somewhat more elaborate, and yet still routinely beaten by LuaJIT 2's interpreter (to be fair LuaJIT 2's interpreter is an insane feat of engineering).


They detail why they chose the current path rather than continuing improvements of the interpreter in the previous post [1]. See the last few paragraphs for a summary.

[1] http://blog.erlang.org/a-closer-look-at-the-interpreter/


>[She] complains Risc-V need 4 instructions to do what x86_64 and arm does in two, but... it says Risc-V.

So… what, it should take 5 instructions?

Executing more instructions for a (really) common operation doesn't mean an ISA is somehow better designed or "more RISC", it means it executes more instructions.

>And x86_64 CISC instructions devolve to a pile of microcode anyway.

Some people seem to have the impression that every x86 instruction is implemented in microcode (very, very few of them are), and even charitably interpreting that as "decodes to multiple uops" (which is completely different) is still not right: the mov in the example is 1 uop.


> Executing more instructions for a (really) common operation doesn't mean an ISA is somehow better designed or "more RISC", it means it executes more instructions.

True. But as bonzini points out (or rather, hints at) in https://news.ycombinator.com/item?id=24958644, the really common operation for array indexing is inside a counted loop, and there the compiler will optimize the address computation and not shift-and-add on every iteration.

See https://gcc.godbolt.org/z/x5Mr66 for an example:

    for (int i = 0; i < n; i++) {
        sum += p[i];
    }
compiles to a four-instruction loop on x86-64 (if you convince GCC not to unroll the loop):

    .L3:
        addsd   xmm0, QWORD PTR [rdi]
        add     rdi, 8
        cmp     rax, rdi
        jne     .L3
and also to a four-instruction loop on RISC-V:

    .L3:
        fld     fa5,0(a0)
        addi    a0,a0,8
        fadd.d  fa0,fa0,fa5
        bne     a5,a0,.L3
This isn't a complete refutation of the author's point, but it does mitigate the impact somewhat.


That's fair. It's definitely not a killer (or even, in my opinion, the worst thing about RISC-V), just another of those random little annoyances where I'm not really sure why RISC-V leaves it out.


One common use of array indexing walks the array sequentially.

But hash tables are used here and there, also in loops.

Some people know them as "dictionaries" or "key/value stores".


I'm not sure about this "RISC way" stuff. From a uarch standpoint the RISC vs CISC distinction is moot and from an ISA standpoint the only real quantifiable difference seems to be being a load-store architecture.

ISAs without conditional moves tend to have predicated instructions, which are functionally the same thing. I'm not actually aware of any traditionally RISC architectures that have neither conditional moves nor predicated instructions. While ARMv7 removed predicated instructions as a general feature, ARMv8 gained a few "conditional data processing" instructions (e.g. CSEL is basically cmov), so clearly at least ARM thinks there's a benefit even with modern branch predictors.

Conditional instructions are really, really handy when you need them. It's an escape hatch for when you have an unbiased branch and need to turn control flow into data flow.
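The "control flow into data flow" idea can be sketched in plain Python with the classic branchless-select bit hack (a software stand-in for what cmov does in a single instruction; select is just an illustrative name):

```python
# cond must be 0 or 1. -cond is then an all-zeros or all-ones mask,
# so the selection happens with pure arithmetic: no branch, nothing
# for a branch predictor to mispredict on an unbiased condition.
def select(cond, a, b):
    mask = -cond                # 1 -> -1 (all ones), 0 -> 0
    return b ^ ((a ^ b) & mask) # mask on: b^(a^b) = a; mask off: b
```

When cond is unbiased (taken ~50% of the time), a real branch mispredicts constantly; the data-flow form costs the same every iteration, which is exactly the niche cmov fills.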


We were talking ISAs so let's focus on that.

The quantifiability comes from measuring results when you give compilers new instructions, vs paying implementation complexity (time, money and future baggage to support the insn forever). The upsides and downsides here come in different units so it's still tricky.

Lots of instructions can be proposed with impressive qualitative speeches convincing you how dandy they are, but in the end it's down to the real world speedup yield vs the price you pay in complexity and resulting second order effects.

(In rarer cases the instructions might be added not for performance reasons but to ease complexity and cost, that's where qualitative arguments still have a place when arguing for adding instructions).

It's fine if we don't have the evidence in this thread - I was just asking on the off chance that someone can point to a reference.


It's not like someone is proposing some crazy new instruction to do vector math on binary coded decimals while also calculating CRC32 values as a byproduct. It's conditional move. Every ISA I can think of has that.


This prompted me to look through some RISC ISAs (+x86), there may be errors since I made just a cursory pass.

Seems the following have conditional moves: MIPS since IV, Alpha, x86 since PPRo, SPARC since SPARCv9

The following seem to omit conditional moves: AVR, PowerPC, Hitachi SH, MIPS I-III, x86 up to Pentium, SPARC up to SPARCv8, ARM, PA-RISC (?)

PA-RISC, PowerPC, and ARM at least do a lot of predication and make a high investment in conditional operations (by dedicating a lot of bits in the instruction layout to it), but they also end up using it a lot more often than conditional move tends to be used.


ARMv7's Thumb2 has general predication of hammocks via "if-then", and ARM itself had general predication. ARMv8 has conditional select, which is quite a bit richer than conditional move. POWER has "isel". Seeing an ISA evolve a conditional move later in life is pretty strong evidence that it was useful enough to include. So I would modify your list to be:

ISAs that evolved conditional move:

  - MIPS
  - SPARC
  - x86
  - POWER (isel)
ISAs that started life with it:

  - ARM (via general predication)
  - Alpha
  - IA64 (via general predication)


Good list.

Observation on ISAs that evolved conditional move vs. those that omit it: MIPS, POWER, x86, and SPARC all targeted high-power "fat core" applications by the point it got added. AVR, Hitachi SH, and PowerPC didn't add it, being driven more by low-power/embedded applications. And many ISAs continued to see wide use in their pre-cmov versions (e.g. MIPS) in the embedded space after the additions. (PowerPC even removed it when being modeled after POWER.)


To be clear, for anyone not so up to speed on this: what AArch64 has (conditional select) is strictly less expressive than what AArch32 had (general predication).

The takeaway is that general predication was found to be overly complex, while the vast (vast!) majority of the benefit can be modelled with conditional select.


It's less than general predication, but a little more than cmov/csel: the second argument can optionally be incremented, complemented, or negated. Combined with the dedicated zero register, you can do all sorts of interesting things to turn condition-generating instructions into data. A few interesting ones include:

   y = cond ? 0 : -1;
   y = cond ? x : -x;
   x = cond ? 0 : x+1;  //< look ma, circular addressing!
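A rough Python emulation of that AArch64 conditional-select family (CSEL/CSINC/CSINV/CSNEG are the real instructions; the functions below are just sketches of their semantics), checking the identities above:

```python
# Each variant picks the first operand when the condition holds,
# otherwise a tweaked second operand.
def csel(cond, a, b):  return a if cond else b        # plain select
def csinc(cond, a, b): return a if cond else b + 1    # ... or increment
def csinv(cond, a, b): return a if cond else ~b       # ... or complement
def csneg(cond, a, b): return a if cond else -b       # ... or negate

ZR = 0  # the dedicated zero register

for cond in (0, 1):
    x = 3
    assert csinv(cond, ZR, ZR) == (0 if cond else -1)     # cond ? 0 : -1
    assert csneg(cond, x, x)   == (x if cond else -x)     # cond ? x : -x
    assert csinc(cond, ZR, x)  == (0 if cond else x + 1)  # circular counter
```

The circular-addressing case is csinc against the zero register: the counter increments until the comparison fires, then snaps back to 0, all without a branch.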


popcount is extremely useful in a lot of algorithms, from RSA to chess engines to sparse arrays and tries

it's honestly pretty baffling that RISC-V doesn't have it (perils of design-by-academia)
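As a concrete example of the sparse-array/trie use: HAMT-style nodes use popcount to turn a presence bitmap into an index into a packed child array. A small Python sketch (the helper name slot is mine, not from any particular library):

```python
# A bitmap marks which slots of a node are occupied; children are
# stored densely. popcount of the bits *below* an index gives that
# entry's position in the packed array, in O(1) with a hardware
# popcount. (Python 3.10+ also has int.bit_count for this.)
def slot(bitmap, index):
    if not (bitmap >> index) & 1:
        return None                                   # slot absent
    return bin(bitmap & ((1 << index) - 1)).count("1")

bitmap = 0b10110          # slots 1, 2, and 4 are occupied
print(slot(bitmap, 4))    # 2 -> third entry in the packed array
print(slot(bitmap, 3))    # None
```

Without a single-instruction popcount, that one line in the hot path of every trie lookup becomes a loop or a table lookup.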


The RISC way is to prove the exception is worth it. There are a lot of possible specialized instructions that are useful for some workloads.


ARM has it. DEC Alpha had it (before x86, even).

I get that there are a lot of narrow-use instructions but popcount is a pretty well-known and common operation.


In the Alpha case it was a very late addition, in the last widely shipping version of the chip and IIRC was speculated to be part of some supercomputer/classified use case. ARM has a history of having quirky un-RISCy instructions.

(edit: also it seems that ARM has just cnt.v8 for counting 8-bit lanes in NEON and no 64-bit scalar instruction version, interesting. Being part of NEON also means it's an optional part on ARM)


Late addition is more indicative of value than appearing in first releases. People guess about the base instruction set, but additions happen only in response to high demand.

