Hacker News | BiteCode_dev's comments

Always fun when geeks discover basic philosophical concepts like they're new things and not something the Greeks nailed 2000 years ago.


But it's on Substack... so it's way different.

And.

Its worded,

Like This.

#Deep


You don't really often need an array language, just like you don't really often need regexes.

But when you have a problem that perfectly fits the bill, they are very good at it.

The problem is they are terrible at everything else: I/O, data validation, string manipulation, parsing, complex logic trees...

So I feel like, just like regexes, there should be an array language parser embedded in most languages, one that you opt into locally for just this little nudge.

In Python, it would be nice to be able to "import j" like you "import re" from the stdlib.

The entire J code base, including utility scripts, a console, a stdlib and a regex engine, is 3 MB.


>... there should be an array language parser embedded in most languages, that you opt in locally for just this little nudge.

April is this for Common Lisp: https://github.com/phantomics/april


I don't know if you're aware that there's a formal analogy between matrix operations and regex operations:

  Matrix        vs  Regex
  -----------------------
  A + B             A|B

  A * B             AB

  (1 - A)^{-1}      A*
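For the last row, the correspondence comes from expanding the inverse as a geometric series (formally, term by term over the regex semiring):

  (1 - A)^{-1} = 1 + A + A^2 + A^3 + ...

  A*           = ε | A | AA  | AAA | ...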
To make the analogy between array programming and regex even more precise: I think you might even be able to make a regex engine that uses one boolean matrix for each character. For example, if you use the ASCII character set, you'd use 128 of these boolean matrices. The matrices should encode transitions between NFA states. The set of entry states should be indicated by an additional boolean vector, and the accepting states by one more boolean vector. The regex operations would take 1 or 2 NFAs as input, and output a new NFA.


Didn't know that, but I assume you can share most of the engine's logic anyway. Those kinds of generalisations tend to break down once you get to practical implementations.


The following is a Python prototype:

    import numpy as np
    from scipy.sparse import csr_matrix, bmat

    class NFA:
        def __init__(self, T, S, E):
            self.T = T; self.S = S; self.E = E
        @property
        def null(self): # Nullable?
            return (self.S.T @ self.E)[0,0]

    # --- 1. The Core Algebra ---

    def disjoint(fa1, fa2):
        """ Places fa1 and fa2 into a shared, non-overlapping state space. """
        n1, n2 = fa1.S.shape[0], fa2.S.shape[0]
        z = lambda r, c: csr_matrix((r, c), dtype=bool)
        
        # Block Diag Transitions
        chars = set(fa1.T) | set(fa2.T)
        T_new = {}
        for c in chars:
            m1 = fa1.T.get(c, z(n1, n1))
            m2 = fa2.T.get(c, z(n2, n2))
            T_new[c] = bmat([[m1, None], [None, m2]], format='csr')

        # Stack Vectors
        S1 = bmat([[fa1.S], [z(n2,1)]], format='csr')
        S2 = bmat([[z(n1,1)], [fa2.S]], format='csr')
        E1 = bmat([[fa1.E], [z(n2,1)]], format='csr')
        E2 = bmat([[z(n1,1)], [fa2.E]], format='csr')
        
        return NFA(T_new, S1, E1), NFA(T_new, S2, E2)

    def fork(fa):
        """ Returns two references to the exact same machine. """
        return fa, fa

    def connect(fa1, fa2):
        """ 
        The General "Sequence" Op.
        Wires fa1.End -> fa2.Start, and updates Start/End vectors.
        """
        # 1. Transitions: T_new = T_combined + (Bridge @ T_combined)
        # Bridge = Start_2 * End_1^T
        Bridge = fa2.S @ fa1.E.T
        
        chars = set(fa1.T) | set(fa2.T)
        T_new = {}
        for c in chars:
            # If fa1==fa2 (fork), this just gets fa1.T[c]
            # If fa1!=fa2 (disjoint), this adds the non-overlapping blocks
            m_comb = fa1.T.get(c, _z(fa1)) + fa2.T.get(c, _z(fa2))
            
            # Apply the feedback/feedforward
            T_new[c] = m_comb + (Bridge @ m_comb)

        # 2. States: Standard Concatenation Logic
        # S_new = S1 + (S2 if N1)
        # E_new = E2 + (E1 if N2)
        # Note: If fa1==fa2, this correctly computes S + (S if N) = S
        S_new = fa1.S + (fa2.S if fa1.null else _z(fa1, 1))
        E_new = fa2.E + (fa1.E if fa2.null else _z(fa1, 1))

        return NFA(T_new, S_new, E_new)

    # --- 2. The Operations (Now Trivial) ---

    def cat(fa1, fa2):
        return connect(*disjoint(fa1, fa2))

    def leastonce(fa):
        return connect(*fork(fa))

    def union(fa1, fa2):
        d1, d2 = disjoint(fa1, fa2)
        chars = set(d1.T) | set(d2.T)
        T_sum = {c: d1.T.get(c, _z(d1)) + d2.T.get(c, _z(d2)) for c in chars}
        return NFA(T_sum, d1.S + d2.S, d1.E + d2.E)

    def star(fa):
        return union(one(), leastonce(fa))

    # --- Helpers ---
    def lit(char):
        T = {char: csr_matrix(([True], ([1], [0])), shape=(2,2), dtype=bool)}
        return NFA(T, _v(1,0), _v(0,1))
    def one(): return NFA({}, _v(1), _v(1)) # Epsilon
    def _z(fa, c=None): return csr_matrix((fa.S.shape[0], c if c else fa.S.shape[0]), dtype=bool)
    def _v(*args): return csr_matrix(np.array(args, dtype=bool)[:, None])

    # --- Execution ---
    def run(fa, string):
        curr = fa.S
        for char in string:
            if char not in fa.T: curr = _z(fa, 1)
            else: curr = fa.T[char] @ curr
        return (curr.T @ fa.E)[0,0]

    if __name__ == "__main__":
        # Test: (a|b)+ c
        # Logic: cat( leastonce( union(a,b) ), c )
        
        a, b, c = lit('a'), lit('b'), lit('c')
        regex = cat(leastonce(union(a, b)), c)
        
        print(f"abac: {run(regex, 'abac')}") # True
        print(f"c:    {run(regex, 'c')}")    # False (Needs at least one a/b)


Pastebin mate.



AFAIK, APL was used to verify the design of the IBM 360, finding some flaws. I wrote my first parser generator in J. I think these both contradict your opinion.

I think Iverson had a good idea in growing a language from math notation. Math operations often use one or a few letters: log, sin, square, the sign of a sum or an integral. Math is pretty generic, and I believe APL is generic as well.


People wrote video games in Excel.


I suspect the same regarding the analogy with regex, but I still haven't finished learning an array language. Do you know what you'd use an array language for?


Personally, to defer importing numpy until I can't anymore.

Sometimes you just need a little matrix shenanigans, and it's a shame to have to bring in a bazooka to get decent ergonomics and performance.
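The "defer numpy" pattern in a minimal sketch (the function and its purpose are made up for illustration; only the deferred-import trick is the point):

```python
def rotate_2d(points, angle):
    """Rotate (x, y) pairs around the origin.

    numpy is only imported on the first call, so scripts that never
    hit this code path don't pay the import cost at startup.
    """
    import math
    import numpy as np  # deferred: keeps module import cheap

    c, s = math.cos(angle), math.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    # Row vectors times the transposed rotation matrix.
    return (np.asarray(points) @ rot.T).tolist()
```

The import cost is paid once, on first use, and cached in sys.modules afterwards.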


pyrsistent is super slow, though. Just ran a quick benchmark:

  - Creation - 8-12x slower  
  - Lookup - 22-27x slower  
  - Contains check - 30-34x slower  
  - Iteration - 5-14x slower  
  - Merge - 32-158x slower  
 
Except at 10k+ items, batch updates on 100K+ items, or inserting 100+ keys at once.

This is rarely the case in practice: most dictionaries and dict operations are small. If you have a huge dict, you probably should be chunking your load or delegating that to infra.

Not to mention pyrsistent's API is incompatible with dicts, so you can't pass it to external code without conversion.

You'd better have an incredible ROI to justify that.
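The benchmark code wasn't shared; here is a rough sketch of the methodology, using a stdlib copy-the-whole-dict update as a stand-in for a persistent map (pyrsistent's structural sharing will give different ratios, but the comparison shape is the same):

```python
import timeit

base = {i: i for i in range(100)}

# In-place mutation: a single O(1) store.
t_mut = timeit.timeit(lambda: base.__setitem__(0, -1), number=10_000)

# Naive "persistent" update: build a fresh dict every time, O(n).
t_cow = timeit.timeit(lambda: {**base, 0: -1}, number=10_000)

print(f"mutation: {t_mut:.4f}s  copy-on-write: {t_cow:.4f}s "
      f"(~{t_cow / t_mut:.0f}x slower)")
```

Per the point above, the gap only stops mattering when dict sizes or update batches are large enough to amortize the copying.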


> pyrsistent is super slow, though

Since when is Python about speed?

> Just ran a quick benchmark

Where's the code? Have you observed the bottleneck call?

> Except at 10k+ items, batchup dates on 100K+ items or inserting 100 keys.

> This is rarely the case in practice

Where's the stats on the actual practice?

> You'd better have an incredible ROI to justify that.

The ROI being: fearless API design where 1) multiple instances of high level components are truly independent and could easily parallelize, 2) calling sites know that they keep the original data intact and that callees behave within the immutability constraints, 3) default func inputs and global scope objects are immutable without having to implement another PEP, 4) collections are hashable in general.
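Point 4 in stdlib terms (a sketch with made-up keys; persistent maps extend the same property to dict-shaped data): immutable collections are hashable, so they can be dict keys and set members, while their mutable equivalents can't.

```python
# Immutable collections can serve as compound cache keys.
cache = {}
key = (frozenset({"read", "write"}), ("user", 42))
cache[key] = "allowed"

# The mutable equivalent is rejected at the point of use:
try:
    cache[({"read", "write"}, ("user", 42))] = "boom"  # set in a tuple
except TypeError as exc:
    print("unhashable:", exc)
```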


Clearly the ROI is perfect for you.

I won't waste more of your time.


It's perfect for most Python developers actually, not just for myself, contrary to your "in practice" claim.


Ordering is very useful for testing.

This morning for example, I tested an object serialized through a JSON API. My test data would never match from one run to the next.

After a while, I realized one of the objects was using a set of objects, which the API turned into a JSON array, but the order of said array would change depending on the initial Python VM state.

3 days ago, I used itertools.groupby to group a bunch of things. But itertools.groupby only works on iterables that are sorted by the grouping key.

Now granted, none of those recent examples are related to dicts, but dict is not a special case, and it's iterated over regularly.
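The groupby pitfall above, on toy data: without sorting first, equal keys that aren't adjacent land in separate groups.

```python
from itertools import groupby

data = [("b", 2), ("a", 1), ("b", 3)]
first = lambda pair: pair[0]

# Unsorted input: "b" shows up in two separate groups.
unsorted = [(k, [v for _, v in grp]) for k, grp in groupby(data, key=first)]
# [('b', [2]), ('a', [1]), ('b', [3])]

# Sorted input: one group per key, as intended.
grouped = [(k, [v for _, v in grp]) for k, grp in groupby(sorted(data), key=first)]
# [('a', [1]), ('b', [2, 3])]
```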


Also, this is why we still gravitate toward FOSS communities. It's the last vestige of a dying era: a circle where people like that have a chance to hang out together and keep the warm feeling of being human.


FOSS is a bit like blogging in that a lot of it seems to be motivated by a desire to win an argument you lost once already.

I’m a maintainer on one library in small part because of an argument I had with a maintainer of a similar library years ago. And nearly a maintainer on another one. I voted with my feet and made improvements to DX and/or performance, because I can’t pull down a wrongheaded project but I can pull up a better one.

(Incidentally I looked at his issue log the other day and it’s 95% an enumeration of the feature list of the one I’m helping out on. Ha!)


I've never thought about it this way, but now that you mention it, both blogging and FOSS, once stripped of substance, seem like l'esprit de l'escalier externalized.

Do I go soul searching now or start a blog?


Never put it this way before, but it's exactly why I started blogging. I was fed up with how bad Python content was online.


Anybody using it for something serious? I can't see a use case beyond "I need a quick script running that is not worth setting up a VPS for."


So close to understanding the concept of taxes.


I think a lesson in smuggling will sink in before a lesson on tax bureaucracy does. The $50m worth smuggled through Canada would have been an easy $12.5m for the US government.


- C takes a lot more context than a high-level language

- a lot of C code out there is not safe, so the LLM outputs that

- C encodes way less of the programmer's intention, and way more implementation details. So unless the author is extremely good at naming, encapsulating and commenting, the LLM just has less to work with. Not every C codebase is SQLite/Redis/FFmpeg quality.

- the feedback loop is slower, so the LLM has less chance to brute force a decent answer

- there is no npm/PyPI equivalent for C on which to train the LLM, so the training pool is less diverse

- the training pool is vastly Linux-oriented, with the Linux kernel and distro system libs being very prominent in the training data, because C programs on Windows are often proprietary. But most vibe coders are not on Linux, nor into system programming.

Sure, you can vibe code in C. Antirez famously states he gets superb ROI out of it.

But it's likely you'll get even better results with other languages.


Microsoft has this problem with most of its products.

It's not just AI, it's a market fit and quality problem.

They don't need to solve it, however.

Their strategy has been quite clear: make it barely usable so that it passes muster with auditors, integrate it with systems that corporations need, and sell them on the integrations.

Teams and Azure suck?

So what?

Big companies will pay for that, because it's integrated with their LDAP, has an audit trail, gives them the ISO-whatever stamp, and lets them worry about something else.

That the users are miserable is almost never a question for the ones signing the checks.

In a world where box-checking is paramount, this approach is a winning strategy.


Tell Claude to put the screenshot as a centered image, with the body having the starry background on repeat. Then define the links as boxes over each icon with a good old tech trick called an image map.

It was common at the time, before Flash took over.

