robertelder · on Aug 13, 2024

This is my favorite observation.

robertelder · on Aug 13, 2024

This is my favourite comment.

robertelder · on March 21, 2024

This is my favourite comment.

robertelder · on June 12, 2023

That's a very impressive story! I was hoping this post would get a bit more attention so I could read of more tales like this, but I guess this will have to do.

I assume that a robot capable of handling 400 lb must have a pretty beefy power supply. You can probably get an intuitive sense of how much damage a robot can do to you by imagining what would happen if all of its power output were directed at your body (in the form of heat, electrical, or kinetic energy). A tiny 5V motor on a breadboard might be enough to warm your hand up or fling a small pebble toward you, but a motor that requires one of those bigger 400V 3-phase 60A supplies can probably pull enough energy to melt iron or generate the same kinetic energy as some military ordnance.
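To put rough numbers on that, here is a back-of-the-envelope sketch. The 0.2 A draw for the small motor is my own assumption, and I'm also assuming that 400V is the line-to-line voltage and that the power factor is 1, so treat the result as an upper-bound estimate rather than a measurement:

```python
import math

def dc_power_watts(volts, amps):
    """Power drawn by a simple DC load: P = V * I."""
    return volts * amps

def three_phase_power_watts(line_voltage, line_current, power_factor=1.0):
    """Balanced three-phase power: P = sqrt(3) * V_line * I_line * pf."""
    return math.sqrt(3) * line_voltage * line_current * power_factor

# Assumed current for a hobby motor on a breadboard (not from the post).
small = dc_power_watts(5, 0.2)
# The supply mentioned above, assuming 400 V line-to-line at unity power factor.
big = three_phase_power_watts(400, 60)

print(f"small motor: {small:.1f} W")
print(f"big supply:  {big / 1000:.1f} kW")
print(f"ratio:       roughly {big / small:,.0f}x")
```

Under those assumptions the big supply can deliver tens of kilowatts, which means tens of kilojoules every second it runs, versus about one joule per second for the breadboard motor.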

robertelder · on May 4, 2023

The title of this post stood out to me because it's a fairly niche topic that I've been meaning to write a 'rant' post about myself. I think in a very 'pure' parsing theory sense, there is no real advantage to separating the lexer and the parser. I claim that any counter-argument is due to the limitations of individual software tools, and their inability to let us specify the grammar that we actually want.

If I recall correctly, the original historical reason for separating them was due to memory limitations on early mainframe computers. Actual source code would be fed into the 'lexer' and a stream of 'tokens' would come out the other end and then be saved to a file, which would then be fed into the next stage.

Having said this, you can ask "In practice, is there currently an advantage to separating the lexer and the parser with the tools we use now?" The answer is 'yes', but I claim that this is just a limitation of the tools we have available to us today. Tokens are usually described using a simple regular expression, whereas the parsing rules are 'context free', so the worst-case complexity of handling the two is not the same. If you pull tokens into the parser and just parse them as naive 'single-character parser items', then you end up doing more work than you would have otherwise, since your parser is going to try all kinds of unlikely and impossible chopped-up token combinations. The other big issue is memory savings. Turning every token into a (small) parse tree is going to increase memory usage 10-20x (depending on the length of your tokens).

Personally, I think there is a substantial need for more advanced ways of describing language grammars. The ideal model would be to have various kinds of annotations or modifiers that you could attach to grammar rules, and then the parser generator would do all sorts of optimizations based on the constraints that these annotations imply. Technically, we already do this. It's called putting the lexer in one file, and the parser in another file. There is no good reason why we can't specify both using a more unified grammar model and let the parser generator figure out what to do about each rule to be as efficient as possible.

The computer science theory behind grammars seems to have peaked in the 1980s, and there don't seem to have been many innovations that have made it into daily programming life since then.

robertelder · on April 11, 2023

I can't wait to watch the FitMC video on this.

robertelder · on Jan 30, 2023

I spent a fair amount of time building my own from-scratch C compiler that I no longer work on:

https://recc.robertelder.org/

One of the reasons that I stopped working on it was because of how slow it became, so I might be able to contribute to answering your question.

Initially, when the compiler was simpler, it was actually much faster. I was able to do some meaningful proof-of-concept demos with it, like compiling a small microkernel and compiling most of its own source code. Of course, the natural thing to do is to make it so that it could cross-compile itself and run in the browser, and that's where it became terribly slow, which required more code to optimize, and the new code that was added to make it faster in the long term made it much slower in the short term.

To start with, if you think of a simple piece of code like this:

    if ( 1 ) { putc('a'); }

This is only a 23-byte program, so why should it be slow to compile? Well, the first stage of parsing this program involves tokenization. In this short program, I count 17 different 'tokens' (including the whitespace). If you want to have even the simplest data structure to describe one of your 'tokens', one that only contains a single pointer to an offset in the program, then you will need to consume 17 pointers just for the tokens. On a 64-bit machine, you'll have 8-byte pointers, and 17 * 8 = 136 bytes, just for the pointers into the byte array of the program! And we haven't even started talking about the memory overhead of all the other things you'll need to describe about these tokens in your token object.
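If you want to check this sort of arithmetic yourself, here is a minimal sketch in Python. The regular expression is a toy I made up that only covers the characters in this one example (it is not how my compiler tokenizes), and the exact token count depends on how your tokenizer splits the input:

```python
import re

SOURCE = "if ( 1 ) { putc('a'); }"
POINTER_SIZE = 8  # bytes per pointer on a typical 64-bit machine

# Toy token grammar: just enough to cover the example above.
TOKEN_RE = re.compile(r"""
    \s+             # a run of whitespace
  | [A-Za-z_]\w*    # identifiers and keywords
  | \d+             # integer literals
  | '[^']'          # single-character literals (escapes not handled)
  | [(){};]         # punctuation
""", re.VERBOSE)

def tokenize(src):
    """Return a list of (offset, text) pairs covering the whole input."""
    tokens = []
    pos = 0
    while pos < len(src):
        match = TOKEN_RE.match(src, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}: {src[pos]!r}")
        tokens.append((pos, match.group()))
        pos = match.end()
    return tokens

tokens = tokenize(SOURCE)
pointer_bytes = len(tokens) * POINTER_SIZE

print(f"source size:    {len(SOURCE)} bytes")
print(f"token count:    {len(tokens)}")
print(f"pointer bytes:  {pointer_bytes}")
print(f"overhead ratio: {pointer_bytes / len(SOURCE):.1f}x")
```

Even with only one offset stored per token, the bookkeeping is already several times larger than the source text it describes.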

So, now we already have a memory overhead that is more than 5 times as big as the program, but we also have to build the parse tree, control flow graphs, linker objects etc. and you also have to pull in a mess of header files, bloated libraries etc. If you're wasteful with memory in the compiler, you can easily run out of memory from compiling a few megabytes of source code. Being more intelligent with memory management requires copying memory around a lot, which also adds to the latency.

So, now you need to think about optimizing your memory use, and do 'smarter' things that trade memory usage for CPU. Plus, you're likely to start needing to free/delete a lot of heap memory, which can end up in a system call and is therefore slower than an ordinary call within your program. By the time you implement all this 'optimization', your compiler has become an incredibly complicated and bloated system that requires even more code to optimize all the opportunities for improvement.

robertelder · on Dec 2, 2022

Since we're on the topic of snap updates:

A couple weeks ago I was working away in the terminal when all of a sudden, my USB camera turned on and its light started flashing at me indicating something had just started interacting with my webcam. I immediately assumed "Oh, that's probably just some hackers watching me through my web-cam.", so I looked through /var/log a bit and noticed that it had just re-detected all USB devices and two new users had just been added to my system:

    snapd-range-12345-root:x:12345:12345::/nonexistent:/usr/bin/false

    snap_daemon:x:12345:12345::/nonexistent:/usr/bin/false

Does anyone know what these new users are for, and why they were added just now instead of at install time? I googled a bit, but couldn't find any recent news about it.

numeromancer · on Dec 2, 2022

It was the hacker known as "Canonical".

JonChesterfield · on Dec 3, 2022

Seems totally legitimate for a proprietary package manager to take control over your webcam.

forgotpwd16 · on Dec 3, 2022

>proprietary package manager

The client is FOSS. The store is proprietary. The store isn't required to install and/or distribute snaps.

fulafel · on Dec 3, 2022

Ubuntu system user accounts use the <1000 range of user ID numbers, which you can see by looking at /etc/passwd. The 12345 uid listed above falls outside that range.

On the other hand this username is mentioned in a snap dev forum: https://forum.snapcraft.io/t/system-usernames/13386 - but there it says it should be using the 524288-589823 uid range...
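As a quick sanity check, here is a small sketch that classifies a uid from a passwd-style line against the two ranges mentioned in this comment. The helper names are made up for illustration, and the ranges are just the ones quoted here, not an authoritative list:

```python
SYSTEM_UID_MAX = 999            # "<1000" system range mentioned above
SNAP_UID_RANGE = (524288, 589823)  # range quoted from the snapcraft forum thread

def parse_passwd_line(line):
    """Split a passwd entry (name:passwd:uid:gid:gecos:home:shell) into name and uid."""
    name, _passwd, uid, _gid, _gecos, _home, _shell = line.strip().split(":")
    return name, int(uid)

def classify_uid(uid):
    """Say which of the two ranges discussed above a uid falls into, if any."""
    if uid <= SYSTEM_UID_MAX:
        return "system range"
    if SNAP_UID_RANGE[0] <= uid <= SNAP_UID_RANGE[1]:
        return "snap range"
    return "outside both ranges"

line = "snap_daemon:x:12345:12345::/nonexistent:/usr/bin/false"
name, uid = parse_passwd_line(line)
print(name, uid, classify_uid(uid))
```

For the entries posted above, the uid matches neither range, which is what makes them look odd.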

robertelder · on Dec 10, 2021

I've been thinking about this since I saw it here on HN yesterday, and I can't help but entertain the idea that this might end up being 'the worst software security flaw ever'.

brabel · on Dec 10, 2021

We tested this on several JVM versions and found you needed to go really far back, to around Java 8u121 I think, to see the specific exploit using LDAP+HTTP class loading work because they changed the value of the JVM property that allows loading a class file from a remote codebase... however, as this article points out, quite mind blowingly, early JDK11 releases also seem to have been vulnerable (I believe at least JDK 11.0.2 is not vulnerable anymore, but can't confirm right now).

We also found that other similar exploits based on JNDI can work even if the one based on LDAP redirecting class loading to a malicious HTTP server doesn't (I won't mention it here because it makes it much easier to exploit, so disabling log4j's evaluation of jndi patterns or migrating to the patched version is absolutely necessary, still).

robertelder · on Dec 10, 2021

Interesting, thank you for that analysis. From what I understand, the RCE exploit really needs two things to work: 1) The interpretation of the JNDI reference by log4j, and 2) The 'auto-execute loaded classes' (which I don't quite understand).

Is there any kind of low-level flag you can pass to Java or your environment to completely disable JNDI? I recall that there is a flag you can pass to log4j, but I can't see any reason why I would ever use JNDI anywhere in Java.

Also, do you have any additional insights on how exactly the mechanism for 2) works? From what I understand, this is a feature of Java itself?

brabel · on Dec 11, 2021

If you're using the Java Module System and deploying via jlink, you can make sure to not include these modules and JNDI won't be available at all:

    java.naming@version
    jdk.naming.dns@version
    jdk.naming.ldap@version
    jdk.naming.rmi@version

To list the modules your JDK has, use `java --list-modules`.

If you're not using the module system, you can't completely disable JNDI, but you can tell the JVM not to load classes from a remote host by setting the system property "com.sun.jndi.ldap.object.trustURLCodebase" to "false". This has been the default in most JDKs for several years, but apparently some folks still somehow fell victim to this. There are other configuration properties you can adjust listed in the javadocs for javax.naming.Context at https://docs.oracle.com/javase/8/docs/api/index.html?javax/n....

The LDAP/JNDI exploit works because when JNDI performs a lookup (and in this case, simply logging a message with `${jndi:...}` on log4j would trigger that), it might connect to a remote host that's under the attacker's control... the LDAP response from whatever LDAP server got contacted may contain all sorts of instructions for the JVM to load classes remotely, from an HTTP server anywhere on the internet, for example. The attack I've seen used the LDAP ObjectFactory that lets the LDAP response tell where to get the bytecode of another ObjectFactory via any URL. If the JVM "com.sun.jndi.ldap.object.trustURLCodebase" property were set to false, this would've been blocked, but otherwise, the attacker's class would be loaded and could immediately run (via a static block, for example) any Java code at all on your server. Notice that this is a feature of LDAP, not a bug, but it should never have been possible for untrusted input to be used in a JNDI lookup, for obvious reasons. There are other ways to "bypass" this flag by using other LDAP features that load remote code (won't list them here, but they're easy to find if you know LDAP and JNDI) or using another JNDI provider (RMI, CORBA) in case the libraries you have in the classpath include another ObjectFactory that loads remote code (e.g. many JDBC Drivers, Apache Tomcat etc.) - it's impossible to tell how many similar attacks become possible once you have JNDI opened up to untrusted input.

This attack has been known for several years... if you look hard enough you'll find whole toolkits showing how to perform these attacks dating back at least 6 years, from what I found.

Here's a detailed writeup from 2016: https://www.blackhat.com/docs/us-16/materials/us-16-Munoz-A-...

keyle · on Dec 10, 2021

We need proactive logging to see when something dodgy is going on!

    *boned by proactive logging*

blibble · on Dec 10, 2021

absolutely, most vulnerabilities are stopped by the frontend

this one gets all the way through and hits the backend

better hope your backend is on a separate LAN with no internet access..!

tetha · on Dec 10, 2021

This one could possibly reach past the backend and hit tools like Sentry, a log aggregator and such, through an unaffected backend.

dylan604 · on Dec 10, 2021

nah, it's just a logging package that not everyone uses. it would be much worse if it was in an OS of some sort.

anyfoo · on Dec 10, 2021

Would it? It's a very common logging package, and Java is cross-platform. I also think OSes tend to be updated more often than JDKs (but I'm not sure).

nonameiguess · on Dec 10, 2021

It's used by Elasticsearch, so possible you could exploit the log aggregation service even if the app-level logging library isn't vulnerable, but you'd need a way to make sure the first-level logging doesn't interpret the format string.

jpeter · on Dec 11, 2021

Everyone uses it: https://github.com/YfryTchsGD/Log4jAttackSurface

robertelder · on May 15, 2021

Neat. I wrote a tool that shows interactive examples of how you can actually do the calculations for multiplications with Fourier transforms:

https://blog.robertelder.org/fast-multiplication-using-fouri...

The trick is really just about realizing that you can re-write a number like 1234 as a polynomial like 1 * 10^3 + 2 * 10^2 + 3 * 10^1 + 4 * 10^0.
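Here is a minimal sketch of that idea in plain Python, assuming non-negative integers. It is a textbook recursive radix-2 FFT written for readability, not the code behind the tool linked above, and floating-point rounding means it is only trustworthy for modestly sized inputs:

```python
import cmath

def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = -1 if invert else 1
    out = [0] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + twiddle * odd[k]
        out[k + n // 2] = even[k] - twiddle * odd[k]
    return out

def multiply(x, y):
    """Multiply two non-negative integers by multiplying their digit polynomials."""
    # 1234 -> [4, 3, 2, 1]: coefficients of 10^0, 10^1, 10^2, 10^3.
    a = [int(d) for d in reversed(str(x))]
    b = [int(d) for d in reversed(str(y))]
    # Pad to a power of two large enough to hold the product polynomial.
    n = 1
    while n < len(a) + len(b):
        n *= 2
    a += [0] * (n - len(a))
    b += [0] * (n - len(b))
    # Pointwise product in the frequency domain is convolution of the digits.
    product = [p * q for p, q in zip(fft(a), fft(b))]
    coeffs = [round(v.real / n) for v in fft(product, invert=True)]
    # Evaluate the product polynomial at 10, which also resolves the carries.
    return sum(c * 10**i for i, c in enumerate(coeffs))

print(multiply(1234, 5678))
```

The transform itself does not know anything about base 10. The digits are only treated as polynomial coefficients, and the carries are sorted out at the very end when the result is evaluated at 10.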

nceqs3 · on May 15, 2021

That is really cool and useful. Visualization of the FFT can be hard sometimes. Thanks for making that.