It definitely is related to prediction by partial match (PPM).
BWT sorts the rotations of the data, and what this achieves is that identical suffixes group together:
...
"Bzip2 and Bzip3 are simply combining more"
"Bzip3 are simply combining moreBzip2 and "
The character preceding the suffix goes to the end of the rotation and is what gets output. This is much like PPM running backward. There is a PPM* algorithm (unbounded context length) whose authors considered reconstructing contexts from the data itself, using something like an LZSS search. The same idea is in BWT - the context is reconstructed from the data.
BWT also turns dependencies in the data into nearby ones, which is why move-to-front with Huffman or arithmetic encoding works well there.
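A minimal sketch of both steps in Python (naive rotation sort plus move-to-front; real bzip2 builds the transform very differently and stores an end-of-block index, so treat this purely as an illustration):

    # Naive BWT: sort all rotations of the input, emit the last column.
    def bwt(s):
        rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
        return "".join(rot[-1] for rot in rotations)

    # Move-to-front: clustered symbols in the BWT output become runs of
    # small indices, which Huffman or arithmetic coding then squeezes well.
    def mtf(data):
        alphabet = sorted(set(data))
        out = []
        for ch in data:
            i = alphabet.index(ch)
            out.append(i)
            alphabet.insert(0, alphabet.pop(i))
        return out

    text = "Bzip2 and Bzip3 are simply combining more"
    print(bwt(text))
    print(mtf(bwt(text)))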
"It's pattern matching, plain and simple, an area where pattern matching algorithms excel. Pattern matching driven decomp absolutely leads"
Decompilation is a dependence-graph problem: one can formulate decompilation as a graph transformation/rewrite. Neural networks are notoriously bad at graphs.
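To make the graph framing concrete, here is a toy sketch (the three-address listing and the single rewrite rule are invented for illustration, not any real decompiler IR): build a def-use dependence graph, then rewrite it back into an expression.

    # Toy def-use dependence graph over a straight-line three-address listing.
    # Each instruction is (dest, op, operands).
    listing = [
        ("t1", "mul", ("a", "b")),
        ("t2", "add", ("t1", "c")),
        ("t3", "shl", ("t2", "1")),   # compilers lower "* 2" to a shift
    ]

    defs = {}     # register -> index of the instruction that defines it
    edges = []    # (producer, consumer) dependence edges
    for i, (dest, op, args) in enumerate(listing):
        for arg in args:
            if arg in defs:
                edges.append((defs[arg], i))
        defs[dest] = i

    # "Decompiling" = rewriting the graph: fold producers into consumers and
    # apply rewrite rules such as  x << 1  ->  x * 2.
    def expr(i):
        dest, op, args = listing[i]
        parts = [expr(defs[a]) if a in defs else a for a in args]
        if op == "shl" and parts[1] == "1":
            return "({} * 2)".format(parts[0])
        return "({} {} {})".format(parts[0], {"mul": "*", "add": "+"}[op], parts[1])

    print(edges)                     # [(0, 1), (1, 2)]
    print(expr(len(listing) - 1))    # (((a * b) + c) * 2)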
AlphaFold is based on graph neural networks. The biggest issue is that we still do not know how to best encode graph problems in ways neural networks can exploit. Current graph neural network techniques exploit certain invariants but cannot distinguish between various similar graphs. And yet, they're still generating meaningful insights.
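A concrete instance of that expressiveness limit, sketched in Python: standard message-passing GNNs are known to be at most as discriminative as the 1-Weisfeiler-Lehman colour-refinement test, and 1-WL cannot tell a 6-cycle from two disjoint triangles.

    from collections import Counter

    # 1-WL colour refinement: repeatedly recolour each node by its own colour
    # plus the multiset of its neighbours' colours, then compare histograms.
    def wl_histogram(adj, rounds=3):
        colours = {v: 0 for v in adj}
        for _ in range(rounds):
            colours = {v: hash((colours[v], tuple(sorted(colours[u] for u in adj[v]))))
                       for v in adj}
        return Counter(colours.values())

    # Two non-isomorphic 2-regular graphs on 6 vertices.
    hexagon   = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
    triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}

    # 1-WL (and hence a vanilla message-passing GNN) sees them as identical.
    print(wl_histogram(hexagon) == wl_histogram(triangles))   # True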
Absolutely. LLMs use methods that do not provably solve the given problem; they just happen to work in most relevant cases. That's a heuristic.
> Our methodology leverages state-of-the-art natural language processing techniques to systematically evaluate the evolution of research approaches in computer vision. The results reveal significant trends in the adoption of general-purpose learning algorithms and the utilization of increased computational resources. We discuss the implications of these findings for the future direction of computer vision research and its potential impact on broader artificial intelligence development.
I know that these algorithms are the best that exist for many tasks, but any non-deterministic, non-provable algorithm is still technically a heuristic. Also, currently, the bitter lesson seems to be that more of the same runs into diminishing returns, contrary to "The Bitter Lesson"(tm). There will probably be a better AI architecture at some point, but the "lol just add more GPUs / data / RAM" times are kind of over for now.
I don't think that I misunderstood - it says that more computing power beats "hand-engineering". But more computing power doesn't seem to work that well anymore at improving AI performance - and there is not much more computing power coming in the next couple of years, not orders of magnitude more for free like there used to be. Case in point: the somewhat disappointing new Nvidia generation.
> Give time to researchers to gather enough mappings between source code and machine code, get used to training large predictive models, and you shall see top notch decompilers that beat all engineered methods.
Decompilation is about dependencies, which makes it a graph problem.
One such problem is Boolean satisfiability, and this particular kind of problem is extremely important. It is also very easy to gather mappings between CNF formulas and their solutions. Actually, randomization of standard benchmarks is now part of SAT competitions, AFAIK.
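For a sense of how cheap such (formula, solution) pairs are to produce, here is an illustrative sketch with a brute-force checker (tiny instances only; real data sets would come from SAT-competition generators and a proper solver such as PySAT):

    import itertools, random

    # Random 3-CNF: a list of clauses, each a tuple of signed literals
    # (+i / -i for variable i).
    def random_3cnf(n_vars, n_clauses, rng):
        return [tuple(rng.choice([-1, 1]) * v
                      for v in rng.sample(range(1, n_vars + 1), 3))
                for _ in range(n_clauses)]

    def satisfies(formula, assignment):   # assignment: var -> bool
        return all(any(assignment[abs(l)] == (l > 0) for l in clause)
                   for clause in formula)

    def solve(formula, n_vars):           # brute force, fine for tiny n
        for bits in itertools.product([False, True], repeat=n_vars):
            assignment = dict(enumerate(bits, start=1))
            if satisfies(formula, assignment):
                return assignment
        return None

    rng = random.Random(0)
    pairs = []                            # (formula, solution) training pairs
    while len(pairs) < 100:
        f = random_3cnf(10, 40, rng)
        sol = solve(f, 10)
        if sol is not None:
            pairs.append((f, sol))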
Have you seen any advances there using large predictive models?
Proper decompilation is even harder; it is closer to the halting problem than to SAT. Imagine that there is a function that gets inlined and therefore specialized. One definitely wants the source for the original function and the calls to it, not a listing of all its specializations.
This moves us into the space of "inverse guaranteed optimization", and as such it requires approximating a solution to the halting problem.
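An entirely hypothetical sketch of the inlining scenario above, written as Python for concreteness rather than machine code:

    # What the compiler saw: one general function plus two call sites.
    def scale(x, k):
        return x * k

    def area(w, h):
        return scale(w, 2) + scale(h, 10)

    # What the binary effectively contains after inlining and constant folding;
    # 'scale' may no longer exist as a separate symbol at all.
    def area_as_compiled(w, h):
        return w * 2 + h * 10

    # A faithful decompiler should recover 'scale' and its two calls, not just
    # emit the specialized body - which means arguing about what the optimizer
    # was guaranteed to do, i.e. inverting a guaranteed optimization.
    assert area(3, 4) == area_as_compiled(3, 4)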
And the author loves shiny new things [1]: "...use 10% of your time writing code (or telling AI to write code)..."
[1] https://creston.blog/stop-writing-code/
What about old, not shiny things?
For example, SQL is Turing complete and can express a simulation of the dataflow paradigm [2]. It is effectively executable on current massively parallel hardware and even scales horizontally, enabling safe transactional "microservices." Why should we embrace WASM, but not SQL?
[2] https://en.wikipedia.org/wiki/Dataflow_architecture
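As a small illustration of the point, here is a sketch using Python's built-in sqlite3 (a recursive CTE is the usual way to show that plain SQL can express general iteration):

    import sqlite3

    # A recursive CTE computing the first 10 Fibonacci numbers - ordinary SQL,
    # no stored procedures, running on an engine that ships with Python itself.
    query = """
    WITH RECURSIVE fib(n, a, b) AS (
        SELECT 1, 0, 1
        UNION ALL
        SELECT n + 1, b, a + b FROM fib WHERE n < 10
    )
    SELECT n, a FROM fib;
    """

    con = sqlite3.connect(":memory:")
    print(con.execute(query).fetchall())
    # [(1, 0), (2, 1), (3, 1), ..., (9, 21), (10, 34)]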