rep_lodsb · 2026-05-16T17:55:44 1778954144

Something like uMatrix should be built right into the browser, and the fact that this isn't the case really says it all about how it's not the "user agent" anymore. It's the one extension that's absolutely essential IMO -- no third-party connections at all by default, yes it breaks a lot of sites, but then you should ask yourself if the content was really worth reading in the first place!

Besides the blocking, being able to see at the click of a button what kind of crap most sites want to load is really eye opening. And they would do so completely silently if you're using a "normie" browser created or financially supported by the largest advertising company in the world.

Instead the mainstream gets "security features" like Safe Browsing, where it connects to a Google server every day without most people's consent or even knowledge, downloading a list of hashes of "bad stuff" to block. Like open source software to download videos from YouTube (yt-dlp), which it flags as malware. Of course the tinfoil hat conspiracy theory that it's also sending every URL you visit to their server isn't true -- only the ones that match a hash, "to check for false positives". It's easy to see how this mechanism could be abused to log who is visiting particular URLs of interest, without alerting the user to it happening. As far as I see it, you would just have to trust them when they super-double-pinky-swear they would never do this. And of course the TLAs wouldn't allow them to disclose it if something like this happened on their orders.

tredre3 · 2026-05-16T18:29:22 1778956162

> [...] the fact that this isn't the case really says it all about how it's not the "user agent" anymore. [...] yes it breaks a lot of sites, but then you should ask yourself if the content was really worth reading in the first place!

Users want websites to work. The agent excludes a feature that, you admit, breaks most websites.

Yet you find it puzzling and anti-user behavior? Can you elaborate?

Would you be okay if it was built-in but disabled by default and hidden behind a setting or a flag?

rep_lodsb · 2026-05-16T20:59:56 1778965196

Maybe websites should work without loading megabytes of scripts from third-party servers? I think that should be disabled unless you opt-in.

Also, browsers by default use a blocklist from some company, show a giant scary warning, and contact that company's server when the user deliberately navigates to a URL that is on that list. That should be opt-in as well, rather than something that just happens and is considered acceptable.

rep_lodsb · 2026-05-16T11:33:57 1778931237

Slight correction, the correct offset is 7, and DAA only adds 6. But the trick is also adding the carry bit. This works on the 6502 in decimal mode too, e.g. https://news.ycombinator.com/item?id=6342286

On the Z80 and 8086, the code can be made one byte shorter by taking advantage of adjust-after-subtraction, which the 8080 didn't have (and on 6502 worked differently):

    CP   10      / CMP  AL,10      ;set carry if valid decimal digit
    SBC  A,69H   / SBB  AL,69H     ;0..9 => 96h..9Fh (auxC=1), 10..15 => A1h..A6h (auxC=0)
    DAA          / DAS             ;subtract 66h if auxC set, 60h if clear
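
For anyone who wants to check the 8086 column without an emulator, here is a rough Python sketch of the three instructions. The function name `nibble_to_hex_ascii` is made up for illustration, and the flag handling is my reading of the documented CMP/SBB/DAS behaviour, modelling only what this sequence needs.

```python
def nibble_to_hex_ascii(n: int) -> int:
    """Emulate CMP AL,10 / SBB AL,69h / DAS for a 4-bit input (hypothetical helper)."""
    assert 0 <= n <= 15
    al = n

    # CMP AL,10: carry (borrow) is set when AL < 10
    cf = 1 if al < 10 else 0

    # SBB AL,69h: AL = AL - 69h - CF, tracking half-borrow (AF) and borrow (CF)
    af = 1 if (al & 0x0F) < (0x09 + cf) else 0
    result = al - 0x69 - cf
    cf = 1 if result < 0 else 0
    al = result & 0xFF

    # DAS: subtract 6 if the low nibble needs adjusting, 60h if the high one does
    old_al, old_cf = al, cf
    if (al & 0x0F) > 9 or af:
        al = (al - 6) & 0xFF
    if old_al > 0x99 or old_cf:
        al = (al - 0x60) & 0xFF
    return al


if __name__ == "__main__":
    print("".join(chr(nibble_to_hex_ascii(n)) for n in range(16)))
```

Running it over all sixteen inputs should give the ASCII hex digits in order, which matches the intermediate values in the comments above.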

gdevic · 2026-05-17T12:52:34 1779022354

That. I blatantly "stole" those from Z80 since they are elegant and effective. I have BF (flag) that gets set in ALU when the result is > 9, then DAA/DAS that add 6 or 10 (the latter wraps around as -6 since registers are 4 bits wide).

     12'b0000_0000_001?: begin : instr_daas    // DAA, DAS
       if (flags[BF_BIT])
         rx[0] <= rx[0] + (op_is_daa ? 4'd6 : 4'd10);
       flags[CF_BIT] <= flags[BF_BIT];
       state <= FETCH;
     end : instr_daas

rep_lodsb · 2026-05-04T13:12:05 1777900325

I know this comment will get ignored by the true believers, and likely pasted directly into Claude by the author in order to "further improve" the code, but here are some small excerpts from the terminal emulator (glass.asm, 19360 lines, 555 KiB):

    cmp dword [rax], 'XAUT'
    jne .rxa_next
    cmp dword [rax+4], 'HORI'
    jne .rxa_next
    cmp word [rax+8], 'TY'
    jne .rxa_next
    cmp byte [rax+10], '='
    jne .rxa_next
    ; Found XAUTHORITY=path

Okay, this is setup code that only runs once at startup - but that would be a reason to optimize it for size and/or readability! REPE CMPSB exists, and while it may not be the fastest, it is certainly the most compact and idiomatic way to compare strings. Or write a subroutine to do it!

This pattern is used everywhere for copying or comparing strings, this was just one example of it.

There's a state variable that's used to keep track of whether the input is text to be displayed or part of a control sequence. It's a full 64 bits, probably not because we need 18 quintillion states? Here's how it is evaluated:

    ; Dispatch based on state
    mov rcx, [vt_state]
    cmp rcx, VT_ESC
    je .vtp_esc
    cmp rcx, VT_CSI
    je .vtp_csi
    cmp rcx, VT_CSI_PARAM
    ...

In total, there are 7 compares + conditional jumps, one after another. Compilers would generate a jump table for this, and a better option in assembly might be to make vt_state a pointer to the label we want to go to. Branch predictors nowadays can handle indirect jumps, and may actually have more trouble with such tightly clustered conditionals as seen in this code.
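
To make the "state is a pointer to the handler" idea concrete, here is a minimal sketch in Python rather than assembly. The class `Parser` and its handler names are hypothetical and have nothing to do with glass.asm; it only recognizes plain text and simple CSI sequences.

```python
class Parser:
    """Toy escape-sequence parser whose state variable is the handler itself."""

    def __init__(self):
        self.state = self.normal   # analogous to storing a label address in vt_state
        self.params = ""
        self.out = []

    def feed(self, data: str) -> None:
        for ch in data:
            self.state(ch)         # one indirect call instead of a chain of compares

    def normal(self, ch: str) -> None:
        if ch == "\x1b":
            self.state = self.esc
        else:
            self.out.append(("text", ch))

    def esc(self, ch: str) -> None:
        if ch == "[":
            self.params = ""
            self.state = self.csi
        else:
            self.state = self.normal   # other escapes are ignored in this sketch

    def csi(self, ch: str) -> None:
        if ch.isdigit() or ch == ";":
            self.params += ch
        else:
            self.out.append(("csi", self.params, ch))
            self.state = self.normal


if __name__ == "__main__":
    p = Parser()
    p.feed("a\x1b[1;31mb")
    print(p.out)
```

In assembly the equivalent would be something like a `jmp [vt_state]`, with each handler storing the address of the next one. Adding a state then costs nothing on the dispatch path.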

This code is on the "slow" path, there's a faster one for 7-bit ASCII outside of control sequences, with a lengthy comment by Claude at the top on how it optimized this. Even this one starts with a bunch of conditionals though:

    cmp qword [vt_state], 0            ; VT_NORMAL == 0
    jne .vtp_loop_slow
    cmp dword [utf8_remaining], 0
    jne .vtp_loop_slow
    cmp byte [pending_wrap], 0
    jne .vtp_loop_slow

These could likely all be condensed into a single test or indirect jump via the state variable, by introducing just a few more states for UTF-8 decoding and wrap. Following this, here's a "useless use of TEST" (the subtraction already set the flags):

    mov rbx, [grid_cols]
    sub rbx, [cursor_col]              ; rbx = cells left on this row
    test rbx, rbx
    jle .vtp_loop_slow                 ; no room (or already past)

This also again shows the compulsive use of 64-bit registers and variables for values that should never be this big. It's not the "natural" data size on x86-64 at all: every such instruction requires an extra (REX) prefix byte.

I freely confess that I'm a "Luddite", and was explicitly looking for bad (and obviously so) code, but this took me just a few minutes of scrolling through the nearly 20K lines in this file, so it should be somewhat representative of the whole.

geir_isene · 2026-05-04T18:11:39 1777918299

Thanks for the improvement. Highly appreciated.

rep_lodsb · 2026-05-02T11:35:49 1777721749

Fun fact: "/dev/nul" (with only one L) would have worked, even if there is no directory with that name.

That's been a feature since DOS 2.0. There was even an undocumented option, AVAILDEV, to make the prefix mandatory instead of having device names present everywhere. But it broke the common trick used to detect if a directory exists ("if exist c:\some\path\nul").

rep_lodsb · 2026-04-30T15:01:44 1777561304

Last I checked, it's still mostly unusable, especially on real hardware (both modern and Win2k-era). Not saying this because I have anything against it, that's just a fact.

Of course one could say that Windows 11 provides negative value, in which case running DOS 1.00 would also be better ;)

rep_lodsb · 2026-04-30T09:56:55 1777543015

The original code using ROL should have been correct? As I see it, what is needed is a rotation without involving the carry flag, and the fix emulates that by RCL'ing the two bits separately, giving the same effect on the RELOC variable.

This may actually have been a bug not in the code, but in the assembler used to build it. The 8080 had mnemonics RAL and RAR that rotated through carry, and RLC/RRC (standing for "rotate circular", not "through carry"). Opposite meaning of the 8086 mnemonics! So I suspect this may have been switched up in the assembler, especially if it was running on an 8080 machine and developed by someone more familiar with its instruction set.

The STORE bug would have prevented using files over the size of 512 bytes, not just 64K. It's dividing by the sector size, and if DX was greater than that, it would have caused a "divide overflow" exception, since the result wouldn't fit in 16 bits.

(Also, by the Laws of Robotics you have to tell me if you're an LLM, or used one to generate this comment.)

EDIT: not an assembler bug, it seems. The printed listing shows that it produced the correct opcode for ROL (RCL would be D1 D1):

    0A28 8A 0E D4 1B           1353  MOV CL,[RELOC]
    0A2C D1 C1                 1354  ROL CX
    0A2E D1 C1                 1355  ROL CX
    0A30 88 0E D4 1B           1356  MOV [RELOC],CL

So I don't know why this version of the code wouldn't have worked. Maybe the penciled-in "fix" was to free up CL for some other purpose?

bananaboy · 2026-04-30T12:45:26 1777553126

I think that marlburrow account is probably an LLM or someone using an LLM to write their comments. Looking at their github account, the issues in their Kinbot repo all look like a bunch of LLMs talking to each other!

rep_lodsb · 2026-04-26T17:49:36 1777225776

Division by zero is handled the same on x86, and also changed from "trap" to "fault" after the original 8086/8088. And despite what is often said about its variable-length instruction format, skipping over opcodes is pretty much trivial compared to VAX.

Early versions of Microsoft Flight Simulator included a handler for the divide exception, which adjusted the result to +/- "infinity" (for a 16-bit signed integer, 32767 or -32768). The rest of the code relied on this in order to work correctly, and it was more efficient to take advantage of the processor's microcode doing this check rather than coding it explicitly before every division.

So even if it doesn't make mathematical sense, being able to continue after this type of exception is a useful feature to have.
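
A rough sketch of what such a handler amounts to, in Python. This is not Flight Simulator's actual code: the names `idiv16` and `idiv16_saturating` are hypothetical, the choice of sign for a zero divisor is my assumption, and the exact quotient range that traps differs slightly between processor generations.

```python
INT16_MAX, INT16_MIN = 32767, -32768


def idiv16(dividend: int, divisor: int) -> int:
    """Signed 32/16-bit divide that raises where the CPU would take a divide exception."""
    if divisor == 0:
        raise ZeroDivisionError("divide error")
    q = abs(dividend) // abs(divisor)          # truncate toward zero, as IDIV does
    if (dividend < 0) != (divisor < 0):
        q = -q
    if not INT16_MIN <= q <= INT16_MAX:
        raise OverflowError("divide error")    # quotient does not fit in 16 bits
    return q


def idiv16_saturating(dividend: int, divisor: int) -> int:
    """What the exception handler effectively provides: clamp to +/- 'infinity'."""
    try:
        return idiv16(dividend, divisor)
    except ArithmeticError:
        # Assumption: the sign follows the operands; a zero divisor counts as positive.
        negative = (dividend < 0) != (divisor < 0)
        return INT16_MIN if negative else INT16_MAX


if __name__ == "__main__":
    print(idiv16_saturating(100, 7), idiv16_saturating(1000, 0))
```

The point is that the common case pays nothing: the range check is done by the divide instruction itself, and only the rare failing division goes through the handler.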

>(and if you ignore SIGSEGV, it is considered perfectly acceptable that your program spins in a SIGSEGV loop until you kill it.)

This, on the other hand, shouldn't ever be acceptable. If a fatal-by-default signal is just ignored, it should always terminate the process.

rep_lodsb · 2026-04-22T20:58:04 1776891484

Most of mul/div was implemented in hardware since the 80186 (and the more or less compatible NEC V30 too). The microcode only loaded the operands into internal ALU registers, and did some final adjustment at the end. But it was still done as a sequence of single bit shifts with add/sub, taking one clock cycle per bit.
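
As an illustration of "single bit shifts with add/sub", here is the textbook bit-serial scheme in Python. It is a generic sketch, not the actual 80186 or V30 circuit; `mul16` and `div16` are made-up names, and each loop iteration stands in for one clock cycle.

```python
def mul16(a: int, b: int) -> int:
    """Unsigned 16x16 -> 32-bit multiply: one conditional add and one shift per bit."""
    acc = 0
    for _ in range(16):
        if b & 1:
            acc += a
        a <<= 1
        b >>= 1
    return acc & 0xFFFFFFFF


def div16(n: int, d: int):
    """Unsigned 32/16-bit restoring division: one shift and trial subtract per bit."""
    # Same precondition as DIV: the quotient must fit in 16 bits.
    assert d != 0 and (n >> 16) < d
    rem, quo = n >> 16, n & 0xFFFF
    for _ in range(16):
        rem = (rem << 1) | (quo >> 15)   # shift the next dividend bit into the remainder
        quo = (quo << 1) & 0xFFFF
        if rem >= d:
            rem -= d
            quo |= 1
    return quo, rem


if __name__ == "__main__":
    print(mul16(1234, 5678), div16(100000, 7))
```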

rep_lodsb · 2026-04-22T18:55:04 1776884104

That's greatly oversimplified, or less generously, just flat out wrong. Win32 programs have always had their own isolated address space. That infamous BSOD is the result of memory protection hardware catching an access to something outside of that address space. When you open a DOS box, it uses the paging and V86 hardware mechanisms to create a new virtual machine, even though it shares some memory with the instance of DOS from which Windows was booted.

What Windows 9x didn't have was security. A program could interfere with these mechanisms, but usually only if it was designed to do that, not as a result of a random bug (if the entire machine crashed, it was usually because of a buggy driver).

roytam87 · 2026-04-24T09:33:53 1777023233

Win32 programs in Win32s share the same address space.

rep_lodsb · 2026-04-22T16:02:39 1776873759

These two steps usually run in parallel though, with transistors to enable them depending on what operation should be performed.