Hacker News | thesz's comments

Alphabet Inc., as YouTube's owner, faces a class action lawsuit [1] which alleges that the platform enables bad behavior and promotes behavior leading to mental health problems.

[1] https://www.motleyrice.com/social-media-lawsuits/youtube

In my not so humble opinion, what AI companies enable (and what this particular bot demonstrated) is bad behavior that can lead to mental health problems for software maintainers, particularly because of the sheer amount of work needed to read excessively lengthy documentation and review the often huge amounts of generated code. Never mind the attempted smear we discuss here.


So I went to check whether LLM addiction is a thing, because that was the pole around which the grandparent's comment revolved.

It appears that LLM addiction is real and it is in the same room as we are: https://www.mdpi.com/1999-4893/18/12/789

I would like to add that sugar consumption is a risk factor for many dependencies, including, but not limited to, opioids [1]. And LLM addiction can be seen as a fallout of sugar overconsumption in general.

[1] https://news.uoguelph.ca/2017/10/sugar-in-the-diet-may-incre...

Yet, LLM addiction is being investigated in medical circles.


I definitely don't deny that LLM addiction exists, but attempting to paint literally everyone that uses LLMs and thinks they are useful, interesting, or effective as addicted or falling for confidence or persuasion tricks is what I take issue with.

Did he do so? I read his comment as a sad take on the situation when one realizes that one is talking to a machine instead of (directly) to another person.

In my opinion, participating in a discussion through an LLM is a sign of excessive LLM use. Which can be a sign of LLM addiction.


Interesting how you've painted everyone who uses LLMs and LLM addicts with the same brush to shore up your argument.

Fortran is all about symbolic programming. There is no probability in the internal workings of a Fortran compiler. Almost anyone can learn its rules and count on them.

LLMs are all about probabilistic programming. While they are harnessed by a lot of symbolic processing (tokens, as a simple example), the core is probabilistic. No hard rules can be learned.
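That probabilistic core can be sketched in a few lines. This is an illustration only: the logits, token names, and temperature value are made up, not taken from any real model.

```python
import math
import random

# Sketch of the probabilistic core: the model emits scores (logits) for
# candidate tokens and the next token is *sampled* from a distribution,
# not derived by a fixed rule. All numbers here are illustrative.
def sample_token(logits, temperature=1.0, seed=None):
    rnd = random.Random(seed)
    tokens = list(logits)
    weights = [math.exp(v / temperature) for v in logits.values()]
    return rnd.choices(tokens, weights=weights)[0]

logits = {"a": 2.0, "b": 1.5, "c": 0.1}
samples = [sample_token(logits, seed=s) for s in range(50)]
print(samples[:10])  # different seeds pick different tokens
```

The same input yields different outputs across samples, which is exactly the "no hard rules" property the comment describes.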

And, for what it's worth, "Real Programmers Don't Use Pascal" [1] was not written about assembler programmers; it was written about Fortran programmers, a new Priesthood.

[1] https://web.archive.org/web/20120206010243/http://www.ee.rye...

Thus, what I expect is for a new Priesthood to emerge: prompt-writing specialists. And this is what we actually see.


Fortran is about numerical programming without having to deal with explicit addresses. Symbolic programming is something like Lisp.

LLMs do not manipulate symbols according to rules; they predict tokens (arbitrary glyphs, in human parlance) based on statistical rules.


A Fortran compiler internally transforms expressions to assembly using symbolic computation: a+0 -> a, a+b -> b+a.

LLMs are very much supported by symbolic computation. These "arbitrary glyphs" are computed by symbolic computation. The output of LLMs is constrained by programming language grammar in code generation tasks. LLMs are often run with beam search, which is symbolic computation.
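The deterministic rewriting mentioned above (a+0 -> a) can be sketched as a toy rule applier. This is not actual compiler internals, just an illustration of rule-based symbolic computation: the rules always fire the same way, with no probability involved.

```python
# Expressions are nested tuples ("+", left, right) or leaves (names, numbers).
def simplify(expr):
    """Recursively apply the rewrite rules a+0 -> a and 0+a -> a."""
    if not isinstance(expr, tuple):
        return expr
    op, a, b = expr
    a, b = simplify(a), simplify(b)
    if op == "+" and b == 0:   # a + 0 -> a
        return a
    if op == "+" and a == 0:   # 0 + a -> a
        return b
    return (op, a, b)

print(simplify(("+", ("+", "x", 0), 0)))  # -> x
```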


You are right that "LLMs improving themselves" is impossible. In fact, LLMs make themselves worse; it is called knowledge collapse [1].

[1] https://arxiv.org/abs/2404.03502


That paper again.

LLMs have been trained on synthetic outputs for quite a while since then and they do get better.

Turns out there's more to it than that.


These diminishing returns are exponentially diminishing returns [1].

[1] https://arxiv.org/abs/2404.04125

" We consistently find that, far from exhibiting "zero-shot" generalization, multimodal models require exponentially more data to achieve linear improvements in downstream "zero-shot" performance, following a sample inefficient log-linear scaling trend. This trend persists even when controlling for sample-level similarity between pretraining and downstream datasets, and testing on purely synthetic data distributions. Furthermore, upon benchmarking models on long-tailed data sampled based on our analysis, we demonstrate that multimodal models across the board perform poorly."
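The quoted log-linear trend can be made concrete with made-up coefficients (purely illustrative, not fitted to the paper's data): if downstream performance grows linearly in log(samples), then every fixed gain in performance costs a multiplicative increase in data.

```python
import math

# Hypothetical log-linear scaling curve: perf = a + b * log10(samples).
# The coefficients are made up to illustrate the shape of the trend.
a, b = 10.0, 5.0

def perf(samples):
    return a + b * math.log10(samples)

# Linear improvements require exponentially more data:
print(perf(1e6) - perf(1e5))   # 10x more data -> +5 points
print(perf(1e8) - perf(1e6))   # 100x more data -> +10 points
```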


  > But we shouldn't pretend that you will be able to do that professionally for much longer.
Who are "we" and why do "we" "pretend"?

  > Will they ever carve furniture by hand for a business? Probably not.
This is definitely a stretch.

Catastrophic forgetting is overfitting.

No, it’s actually the math of overwriting. Imagine you hiked down into a valley (task A) and settled there. Then you decide to climb a new mountain to find a different valley (task B). You successfully move to the new valley, but in doing so you destroy the path back to the first one. You are now stuck in the new valley and have completely 'forgotten' how to get back to the first one.
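A minimal numerical caricature of that overwriting, with a single shared parameter (nothing like a real LLM, but it shows the mechanism): fitting the parameter to task B by gradient descent erases what it had learned for task A.

```python
import numpy as np

# One shared parameter w, fit sequentially to task A (y = 2x),
# then to task B (y = -x). Losses are mean squared error.
rng = np.random.default_rng(0)
x = rng.normal(size=100)

def loss(w, target_w):
    return float(np.mean((w * x - target_w * x) ** 2))

w = 0.0
for _ in range(200):                         # gradient descent on task A
    w -= 0.1 * np.mean(2 * (w * x - 2 * x) * x)
loss_a_before = loss(w, 2)                   # near zero: task A learned

for _ in range(200):                         # now descend on task B only
    w -= 0.1 * np.mean(2 * (w * x + x) * x)
loss_a_after = loss(w, 2)                    # large: task A overwritten

print(loss_a_before, loss_a_after)
```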

Not exactly; not at all, even, in terms of the way LLMs are trained.

In RL it can be that you are no longer getting meaningful data because you are 'too good' and don't get the "this is a bad answer" signal anymore, so you can't estimate the gradient.


  > the huge gains in coding performance in the past year have come from RL, not from new sources of training data.
This one was on HN recently: https://spectrum.ieee.org/ai-coding-degrades

The author attributes the past year's degradation of code generation by LLMs to excessive use of a new source of training data, namely users' code generation conversations.


Yeah, this is a bullshit article. There is no such degradation, and it’s absurd to say so on the basis of a single problem which the author describes as technically impossible. It is a very contrived under-specified prompt.

And their “explanation” blaming the training data is just a guess on their part, one that I suspect is wrong. There is no argument given that that’s the actual cause of the observed phenomenon. It’s a just-so story: something that sounds like it could explain it but there’s no evidence it actually does.

My evidence that RL is more relevant is that that's what every single researcher and frontier-lab employee I've heard speak about LLMs in the past year has said. I have never once heard any of them mention new sources of pretraining data, except maybe synthetic data they generate and verify themselves, which contradicts the author's story because it's not shitty code grabbed off the internet.


  > Yeah, this is a bullshit article. There is no such degradation, and it’s absurd to say so on the basis of a single problem which the author describes as technically impossible. It is a very contrived under-specified prompt.
I see a "No True Scotsman" argument above.

  > My evidence is that RL is more relevant is that that’s what every single researcher and frontier lab employee I’ve heard speak about LLMs in the past year has said.
Reinforcement learning reinforces what is already in the LM: it narrows the search over possible correct answers, while the wider search in non-RL-tuned base models results in more correct answers [1].

[1] https://openreview.net/forum?id=4OsgYD7em5
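The narrowing effect can be illustrated with toy numbers (not taken from the cited paper): suppose an RL-tuned model answers 60% of problems deterministically and the rest never, while the base model has a 30% chance per sample on every problem.

```python
# pass@k for the base model: probability that at least one of k
# independent samples is correct. All numbers are made up.
def base_pass_at_k(p, k):
    return 1 - (1 - p) ** k

rl_pass_at_k = 0.6   # deterministic answers: extra samples don't help

for k in (1, 10, 100):
    print(k, round(base_pass_at_k(0.3, k), 3), rl_pass_at_k)
```

At k=1 the RL-tuned model wins; with enough samples the wider base model overtakes it, which is roughly the pattern the linked review describes.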

  > I have never once heard any of them mention new sources of pretraining data, except maybe synthetic data they generate and verify themselves, which contradicts the author’s story because it’s not shitty code grabbed off the internet.
The sources of training data have already been grounds for allegations, even leading to lawsuits. So I would suspect that no engineer from any LLM company would disclose anything about their sources of training data besides the innocent-sounding "synthetic data verified by ourselves."

From the days I worked on blockchains, I am very skeptical about any company riding any hype. They face enormous competition and they will buy, borrow or steal their way to try not to go down even a little. So, until Anthropic opens up the way they train their model so that we can reproduce their results, I will suspect they leaked the test set into it and used users' code generation conversations as a new source of training data.


That is not what No True Scotsman is. I’m pointing out a bad argument with weak evidence.

  >>> It is a very contrived under-specified prompt.
No True Prompt can be so contrived and underspecified.

The article about degradation is a case study (a single prompt), the weakest kind of study in the hierarchy of evidence. Case studies are the basis for further, more rigorous studies. And the author took the time to test his assumptions and presented quite clear evidence that such degradation might be present and that we should investigate.


We have investigated. Millions of people are investigating all the time and finding that the coding capacity has improved dramatically over that time. A variety of very different benchmarks say the same. This one random guy’s stupid prompt says otherwise. Come on.

As far as I remember, the article stated that the author found the same problematic behavior for many prompts, issued by him and his colleagues. The "stupid prompt" in the article is for demonstration purposes.

But that’s not an argument, that’s just assertion, and it’s directly contradicted by all the more rigorous attempts to do the same thing through benchmarks (public and private).

We are talking about a compiler here, and the "performance" referred to above is the performance of the generated code.

When you are optimizing a program, you have a specific part of the code to improve. That part can be found with a profiler.

When you are optimizing compiler-generated code, you have many similar parts of code in many programs, and a not-so-specific part of the compiler that can be improved.


Yes, performance of the generated code. You have some benchmark using a handful of common programs going through common workflows, and you measure the performance of the generated code. As tweaks are made, you see how the different performance experiments affect the overall performance. Some strategies are always a win, but things like how you lay out different files and functions in memory have different trade-offs and are hard to know up front without doing actual real-world testing.

  > As tweaks are made...
  > ...how you layout different files and functions in memory have different trade offs and are hard to know up front without doing actual real world testing.
These are definitely not algorithmic optimizations like privatization [1].

https://en.wikipedia.org/wiki/Privatization_(computer_progra...

To correctly apply privatization one has to have correct dependence analysis. This analysis uses the results of many other analyses, for example, value range analysis, something like the Fourier-Motzkin algorithm, etc.
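As a sketch of the flavor of reasoning involved, here is one Fourier-Motzkin elimination step over constraints of the form a*x + b*y <= c. This is a toy version, not a production dependence analyzer; dependence analysis builds on this kind of machinery to decide whether two loop iterations can touch the same array element.

```python
# Eliminate x from a system of constraints a*x + b*y <= c,
# leaving constraints on y alone.
def eliminate_x(constraints):
    upper = [(b / a, c / a) for a, b, c in constraints if a > 0]  # x <= cu - bu*y
    lower = [(b / a, c / a) for a, b, c in constraints if a < 0]  # x >= cl - bl*y
    kept = [(b, c) for a, b, c in constraints if a == 0]
    for bl, cl in lower:         # every lower bound must not exceed
        for bu, cu in upper:     # every upper bound: cl - bl*y <= cu - bu*y
            kept.append((bu - bl, cu - cl))   # => (bu - bl)*y <= cu - cl
    return kept

# x <= 3 and x >= 1 eliminate to the trivially satisfiable 0*y <= 2:
print(eliminate_x([(1, 0, 3), (-1, 0, -1)]))
```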

So this agentically optimized compiler has a program where privatization is not applied; what tweaks should agents apply?


  > This test sorta definitely proves that AI is legit.
This is an "in distribution" test. There are a lot of C compilers out there, including ones with git history, implemented from scratch. "In distribution" tests do not test generalization.

The "out of distribution" test would be like "implement (self-bootstrapping, Linux kernel compatible) C compiler in J." J is different enough from C and I know of no such compiler.


> This is an "in distribution" test. There are a lot of C compilers out there, including ones with git history, implemented from scratch. "In distribution" tests do not test generalization.

It's still really, really impressive though.

Like, economics aside this is amazing progress. I remember GPT3 not being able to hold context for more than a paragraph, we've come a long way since then.

Hell, I remember bag of words being state of the art when I started my career. We have come a really, really, really long way since then.


  > It's still really, really impressive though.
Do we know how many attempts to create such a compiler were made during previous tests? Would Anthropic report on the failed attempts? Can this "really, really impressive" thing be the result of luck?

Much like quoting Quake code almost verbatim not so long ago.


> Do we know how many attempts were done to create such compiler before during previous tests? Would Anthropic report on the failed attempt? Can this "really, really impressive" thing be a result of a luck?

No, we don't, and yeah, we would expect them to only report positive results (this is both marketing and investigation). That being said, they provide all the code for people to review.

I do agree that an out-of-distribution test would be super helpful, but given that it will almost certainly fail (given what we know about LLMs), I'm not too pushed about it.

Look, I'm pretty sceptical about AI boosting, but this is a much better attempt than the Windsurf browser thing from a few months back, and it's interesting to know that one can get this to work.

I do note that the article doesn't talk much about all the harnesses needed to make this work, which, assuming this approach is plausible, is the kind of thing that will be needed to make demonstrations like this more useful.


  > No we don't and yeah we would expect them to only report positive results (this is both marketing and investigation).
This is a matter of methodology. If they train models on that task, or somehow score/select models on their progress on that task, then we have test set leakage [1].

[1] https://en.wikipedia.org/wiki/Leakage_(machine_learning)

This question is extremely important because test set leakage leads to impressive-looking results that do not generalize at all.
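A quick numeric illustration of why selection on the test set is a form of leakage (random data only, no real models): picking the best of many random "models" on the test set inflates the reported score even though every model is guessing.

```python
import numpy as np

rng = np.random.default_rng(0)
y_test = rng.integers(0, 2, size=100)       # 100 binary test labels

scores = []
for _ in range(1000):                        # 1000 random guessers
    preds = rng.integers(0, 2, size=100)
    scores.append(float(np.mean(preds == y_test)))

print(np.mean(scores))   # ~0.5: honest expected accuracy of a guesser
print(np.max(scores))    # well above 0.5: the "selected" model's score
```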


> This is matter of methodology. If they train models on that task or somewhat score/select models on their progress on that task, then we have test set leakage [1].

I am quite familiar with leakage, having been building statistical models for maybe 15+ years at this point.

However, that's not really relevant in this particular case, given that LLMs are trained on approximately the entire internet, so leakage is not really a concern (there is no test set, apart from the tasks they get asked to do in post-training).

I think it's impressive that this even works at all, even if it's just predicting tokens (which is basically what they're trained to do), as this is a pointer towards potentially more useful tasks (convert this COBOL code base to Java, for instance).

I think the missing bit here is that this only works for cases where there's a really large test set (the HTML spec, the Linux kernel). I'm not convinced that the models would be able to maintain coherence without this, so maybe that's what we need to figure out how to build to make this actually work.


  > I think the missing bit here is that this only works for cases where there's a really large test set (the HTML spec, the Linux kernel). I'm not convinced that the models would be able to maintain coherence without this, so maybe that's what we need to figure out how to build to make this actually work.
Take any language with a compiler and several thousand users and you have plenty of tests that approximate the spec inward and outward.

Here's, for example, the VHDL test suite for GHDL, an open source VHDL compiler and simulator: https://github.com/ghdl/ghdl/tree/master/testsuite

The GHDL test suite is, to my knowledge, sufficient and general enough to develop a pretty capable clone. Also to my knowledge, it is the only open source VHDL compiler, and it is written in Ada. And, again, the expertise to implement another one from scratch to train an LLM on is very, very scarce: VHDL, being a highly parallel variant of Ada, is quirky as hell.

So someone could test your hypothesis on VHDL: agent-code a VHDL compiler and simulator in Rust so that it passes the GHDL test suite. Would it take two weeks and $20,000, as with C? I don't know, but I really doubt it.


There are two compilers that can handle the Linux kernel. GCC and LLVM. Both are written in C, not Rust. It's "in distribution" only if you really stretch the meaning of the term. A generic C compiler isn't going to be anywhere near the level of rigour of this one.

There is tinycc, which makes it three compilers.

There is a C compiler implemented in Rust from scratch: https://github.com/PhilippRados/wrecc/commits/master/?after=... (the very beginning of commit history)

There are several C compilers written in Rust from scratch of comparable quality.

We do not know whether Anthropic has a closed source C compiler written in Rust in their training data. We also do not know whether Anthropic validated their models on their ability to implement a C compiler from scratch before releasing this experiment.

The language J I proposed does not have any C compiler implemented in it at all. Idiomatic J expertise is scarce and expensive, so it would be a significant expense for Anthropic to have a C compiler in J in their training data. Being Turing-complete, J can express all the typical compiler tips and tricks from compiler books, albeit in an unusual way.


TinyCC can't compile a modern Linux kernel. It doesn't support a ton of the extensions the kernel uses. That Rust compiler similarly can't do it.
