More

hxtk · 2026-05-07T15:19:23 1778167163

When I read some written content, before AI, I learned a few different things in order. First, just by its mere existence, I learned that someone had found an idea worth expending some effort to express. Next, I would learn the words of the content. Next, I would usually acquire some kind of knowledge that I was able to synthesize or extract from the content. That last step isn't a given, but it's very likely to happen given the pre-filter implied by the first bit of information I learned.

There's no pre-filter anymore. It's exceedingly hard for me to quickly determine how important a person thinks an idea is or how much thought they've put into it in the age of AI, and so there's no guarantee that if I invest the time to read the content then there will be a proportional amount of meaning available for me to extract. This risk always existed even with works written by humans, but now it's overwhelming and has decreased my overall of exposure to new ideas that I didn't explicitly go looking for because I have a much higher expectation that information placed in front of me unsolicited will just be a waste of my time.

hxtk · 2026-05-06T12:07:14 1778069234

The missing status page [1] treats it as downtime any time any component of the system is down, and calculates the overall uptime based on the time that doesn't overlap with any individual category outages, and the overall downtime as any time overlapping with at least one individual category outage to avoid double-counting They show 24h of minor outage on that date.

I'm guessing that this site is taking the downtime in a given day across all services and adding it up, which would mean the worst possible day has 10 days of downtime (a day of downtime for each major category).

1: https://mrshu.github.io/github-statuses/

hxtk · 2026-04-29T16:06:37 1777478797

This analysis yields very different results under utilitarianism vs rule utilitarianism.

Under the former, you could argue, "What I'm doing is a science or useful art, so if copyright exists to advance those things then taking a more permissive interpretation of copyright to allow my efforts to succeed is in the spirit of the law."

Under the latter, you could argue, "Works get published because as a rule, researchers and artists know they have lawful recourse through copyright if the work gets used without their consent. The absence of that rule incentivizes safeguarding works by treating them as secret and each disclosure as a matter of personal trust, so the existence of that rule promotes the sciences and useful arts."

hxtk · 2026-04-09T18:52:32 1775760752

Cloud is more cost effective the less of it you have because it doesn’t cost 3x more to maintain a kubernetes cluster with thrice the nodes, but it does cost 3x more to rent one. This is even more true for serverless.

I can imagine a lot of small apps buy into serverless at a time where it’s legitimately the most cost-effective solution and then they’re stuck because serverless platforms are easy to lock yourself into.

hxtk · 2026-02-16T18:37:15 1771267035

The Nightmare Course [1], so named because someone with that skillset (developing zero-days) is a nightmare for security, not because the course itself is a nightmare, and Roppers Academy [2] are both good for learning how to reverse engineer software and look for vulnerabilities.

The nightmare course explicitly talks about how to use Ghidra.

1: https://guyinatuxedo.github.io 2: https://www.roppers.org

decidu0us9034 · 2026-02-16T22:49:19 1771282159

The first is certainly interesting, but it won't help you develop 0day. I would think of it like more of a collection of fun puzzles and esoterica. For example all the heap unliking/metadata attacks and House of X stuff is pretty antiquated. These will help you win ctfs but are certainly not a prerequisite or even all that relevant to contemporary vuln research. Most of the public research I see is probably at least a year behind the current meta (and I expect the public internet will only grow more quiet over time)

pertique · 2026-02-17T03:55:59 1771300559

What resources would you recommend now? Even if they're a year behind the meta, it's a lot easier to start a year behind than 6+

hxtk · 2026-02-14T13:26:34 1771075594

It’s surprising to me how much people seem to want async in low level languages. Async is very nice in Go, but the reason I reach for a language like Zig is to explicitly control those things. I’m happily writing a Zig project right now using libxev as my io_uring abstraction.

pjmlp · 2026-02-14T18:22:28 1771093348

Using async in low level languages goes all the way back to the 1960's, became common in systems languages like Solo Pascal, Modula-2, with Dr.Dobbs and The C/C++ User's Journal having plenty of articles regarding C extensions for similar purposes.

Hardly anything radical.

hxtk · 2026-02-14T19:01:04 1771095664

When I look at historical cases, it seems different from a case today. If I’m a programmer in the 60s wanting async in my “low level language,” what I actually want is to make some of the highest level languages available at the time even more high level in their IO abstractions. As I understand it, C was a high-level language when it was invented, as opposed to assembly with macros. People wanting to add async were extending the state of the art for high level abstraction.

A language doing it today is doing it in the context of an ecosystem where even higher level languages exist and they have made the choice to target a lower level of abstraction.

pjmlp · 2026-02-16T07:27:40 1771226860

People are putting C++20 co-routines to good use in embedded systems, in scenarios where the language runtime with some additional glue layers, is the OS for all practical purposes.

And GPGPUs as well, like senders/receivers whose main sponsor is NVidia.

melodyogonna · 2026-02-14T14:28:12 1771079292

But Zig's async is being designed to enable this low-level control.

audunw · 2026-02-15T13:58:09 1771163889

There’s two things I think a low level language should have:

1. Standardise on a sync/async agnostic IO interface (or something similar) so you don’t get fragmentation in the ecosystem.

2. Stackless coroutines. Should give the most efficient async io code, and efficient code is one of the reasons I want to use low level language

xnoreq · 2026-02-17T06:36:41 1771310201

Huh? Golang doesn't have async?!

hxtk · 2026-02-13T14:51:58 1770994318

Apparently there are lots of people who signed up just to check it out but never actually added a mechanism to get paid, signaling no intent to actually be "hired" on the service.

hxtk · 2026-02-12T04:57:14 1770872234

I've actually been experimenting with that lately. I did a really naive version that tokenizes the input, feeds the max context window up to the token being encoded into an LLM, and uses that to produce a distribution of likely next tokens, then encodes the actual token with Huffman Coding with the LLM's estimated distribution. I could get better results with arithmetic encoding almost certainly.

It outperforms zstd by a long shot (I haven't dedicated the compute horsepower to figuring out what "a long shot" means quantitatively with reasonably small confidence intervals) on natural language, like wikipedia articles or markdown documents, but (using GPT-2) it's about as good as zstd or worse than zstd on things like files in the Kubernetes source repository.

You already get a significant amount of compression just out of the tokenization in some cases ("The quick red fox jumps over the lazy brown dog." encodes to one token per word plus one token for the '.' for the GPT-2 tokenizer), where as with code a lot of your tokens will just represent a single character so the entropy coding is doing all the work, which means your compression is only as good as the accuracy of your LLM, plus the efficiency of your entropy coding.

I would need to be encoding multiple tokens per "word" with Huffman Coding to hit the entropy bounds, since it has a minimum of one bit per character, so if tokens are mostly just one byte then I can't do better than a 12.5% compression ratio with one token per word. And doing otherwise gets computationally infeasible very fast. Arithmetic coding would do much better especially on code because it can encode a word with fractional bits.

I used Huffman coding for my first attempt because it's easier to implement and most libraries don't support dynamically updating the distribution throughout the process.

hxtk · 2026-02-11T22:04:35 1770847475

Even that functions as a sort of proof of work, requiring a commitment of compute resources that is table stakes for individual users but multiplies the cost of making millions of requests.

hxtk · 2026-02-09T02:50:33 1770605433

It’s gotten easier of late because Bazel modules are nice and Gazelle has started support plugins so it can do build file generation for other languages.

I don’t like generative AI for rote tasks like this, but I’ve had good luck using generative AI to write deterministic code generators that I can commit to a project and reuse.