> Thus, learning Mandarin by reading requires first memorizing hundreds or thousands of words, before you can even know where one word ends and the next word begins.
That's not true at all; you can go a long way just by clicking on characters in Pleco, and Pleco's segmentation algorithm is awful. (Specifically, it's greedy "find the longest substring starting at the selected character for which a dictionary entry exists".)
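The greedy rule is easy to sketch. Here is a toy reconstruction in C (the dictionary, example, and fallback handling are mine; it reproduces the behavior described above, not Pleco's actual code):

#include <stdio.h>
#include <string.h>

/* Toy reconstruction of the greedy longest-match rule described
 * above; dictionary and example are mine, not Pleco's code. */
static const char *dict[] = { "研究", "研究生", "生命", "命", 0 };

/* Byte length of the longest dictionary entry that is a prefix of s,
 * or 0 if none - no backtracking, no frequency weighting. */
static size_t longest_match(const char *s)
{
    size_t best = 0, n;
    int i;

    for (i = 0; dict[i]; i++) {
        n = strlen(dict[i]);
        if (n > best && strncmp(s, dict[i], n) == 0)
            best = n;
    }
    return best;
}

int main(void)
{
    const char *s = "研究生命";  /* intended reading: 研究 + 生命 */
    size_t n;

    while (*s) {
        n = longest_match(s);
        if (n == 0) {            /* no entry: emit one UTF-8 char */
            n = 1;
            while ((s[n] & 0xC0) == 0x80)
                n++;
        }
        printf("%.*s\n", (int)n, s);
        s += n;
    }
    return 0;
}

This prints 研究生 ("graduate student") and then 命, where the intended reading is 研究 + 生命 ("to research life"). Greedy never reconsiders an earlier match, which is exactly the failure mode that makes the segmentation awful.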
Apple is a hypocrite. They were already a huge partner in the NSA's PRISM and China's surveillance programs. Their privacy marketing exists solely because they could not tolerate the profits Google and Meta were making. Now they also want to become an ad company.
Even if you ignore the above points, Apple's software is closed source. You cannot change the OS or install any unapproved app on your own phone. Apple phones are an Orwellian wet dream. If people still trust big tech, then society is doomed.
That hasn’t aged well, because the microkernels of the day, like Mach, failed to keep their promises. Newer ones like the L4 family were designed specifically for performance, but they have not been deployed as the base of a full-featured OS the way Mach was for macOS and OSF/1. In those systems, IPC was too slow, so the OS server was glommed onto the microkernel, making an even ungainlier monolith. Just another illustration of academic theory vs. industrial practice.
Luckily, society broadly creates more than enough wealth and resources to provide for you, even when you are disabled or elderly.
In modern society, if you are a “sink,” it is solely because the system is designed to make you one: perhaps the wealthiest among us pay a bit less in taxes in exchange for your caretaker not getting monetary support. I reckon gutting Social Security would be very profitable to some, for example.
The government’s job is to be useful, and supporting those who can’t support themselves – like the elderly and disabled – is very literally the most noble and useful thing it can do.
I had luck with it. I was hired from the February 2022 HN post and enjoyed working there for three years before we were acquired. I posted last week in February 2025, hoping for similar results.
Hey folks, I made a tool to handle EB-1A judging—finding hackathons, applying, and even drafting outreach emails in one spot. Already hit 100+ signups and 1,000+ page views. Check it out: https://judging.exalien.club/. Would love any feedback if you’ve tried something similar!
I was curious whether "approximate was even faster than exact" back in 1975-78. So I learned a bit about getting v7 booting on a SIMH PDP-11 simulator. I actually got an ordered merge against the delta-against-the-prefix encoded format @Paul_Houle mentioned to run faster than the v7 spell. Here is a little transcript:
$ time zk
Total: 500400000
real 2.0 user 2.2 sys 0.1
$ echo suw = 1st 5500 sorted-unique words of Tale Of Two Cities
$ wc < suw
1531 1531 11043
$ time comm -23 - sowpodsAIO < suw >/n
real 3:04.0 user 2:28.2 sys 35.4
$ time ni0 sowpodsAIO.pd < suw.pd > /n
real 2:01.0 user 1:48.5 sys 11.4
$ time ni1 sowpodsAIO.pd < suw > /n
real 1:41.0 user 1:29.1 sys 11.3
$ time ni2 sowpodsAIO.pd < suw > /n
real 1:26.0 user 1:14.1 sys 11.5
$ time ni3 sowpodsAIO.pd < suw > /n
real 1:04.0 user 53.2 sys 10.3
$ time ni4 sowpodsAIO.pd < suw > /n
real 1:04.0 user 53.3 sys 10.1
$ time ni5 sowpodsAIO.pd < suw >/n
real 45.0 user 34.4 sys 10.1
$ time spell < suw > /n
/bin/spell: /usr/dict/spellhist: cannot create
real 46.0 user 1.4 sys 1.4
$ wc answer.spell
78 78 604 answer.spell
$ wc answer.ni2
35 35 225 answer.ni2
zk is Bob Supnik's classic PDP-11 performance test. `comm -23` is the slow part of the v6 Unix spell.

`ni0.c` is the same as the above `notIn.c`. `ni1.c` specializes to stdin being non-prefix-delta while the dictionary stays prefix-delta. `ni2.c` adds the two simple (also not English-specific) optimizations of not recomparing prefix characters already known to match and of not comparing words within an "increasing run of longer words" like "abbreviate abbreviated abbreviates abbreviating abbreviation abbreviations". `ni3.c` ditches stdio buffering for the dictionary in favor of its own, to play with the buffer size, since v7 stdio could only do BUFSIZ=512 bytes. `ni4.c` does the same for the stdin stream and also eliminates the need for a follow-on strlen (like the modern getline/getdelim). Finally, `ni5.c` manually inlines the character-reading loop into the macro that gets the next dictionary word.

In real life, there would probably be 2-4 seconds of IO time added for all but the `comm` variant (which needs 2.7 MB of IO, not 870 kB).
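For reference, the .pd format here is, per word: one header byte whose high nibble is the count of leading characters shared with the previous word and whose low nibble is the suffix length, followed by the suffix bytes. A minimal encoder sketch in modern C (written to match the DGET decoder below; not necessarily the exact program used to build sowpodsAIO.pd):

#include <stdio.h>
#include <string.h>

/* Sketch: encode sorted unique words (one per line on stdin) into
 * the prefix-delta stream that the DGET macro below decodes. */
int main(void)
{
    char prev[64] = "", word[64];
    size_t n, p, s;

    while (fgets(word, sizeof word, stdin)) {
        n = strcspn(word, "\n");
        word[n] = '\0';
        for (p = 0; p < n && word[p] == prev[p]; p++)
            ;
        if (p > 15)
            p = 15;             /* prefix nibble caps at 15 */
        s = n - p;
        if (s > 15 || n > 16) { /* suffix nibble caps at 15, and the
                                 * decoder's buffer is only wD[16];
                                 * n <= 16 also keeps the header byte
                                 * off 0xFF, which would collide with
                                 * chGet's -1 EOF on signed chars. */
            fprintf(stderr, "word too long: %s\n", word);
            return 1;
        }
        putchar((int)((p << 4) | s)); /* header byte */
        fwrite(word + p, 1, s, stdout);
        strcpy(prev, word);
    }
    return 0;
}

This is what gets sowpodsAIO from 2.7 MB down to about 870 kB.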
For posterity, the ni5.c that outperforms v7 spell (not counting IO), in 1979 v7 C, is this:
#include <stdio.h> /* Emit stdin words not in /t/dict.pd */
extern int errno;  /* ni dict.pd < sort-uWds */
#ifndef Z
#define Z 2048
#endif
char buf[Z]; char *b = buf; int n = 0;

fillbuf(fd) int fd; {
    int m = read(fd, buf, Z);
    if (m > 0) { b = buf + 1; n = m - 1; return buf[0]; }
    return -1;
}
#define chGet(fd) (n-- > 0 ? *b++ : fillbuf(fd))

char bufI[Z]; char *bI = bufI; int In = 0;

filbufI() { /* symbols must be <= 7 chars long */
    int m = read(0, bufI, Z);
    if (m > 0) { bI = bufI + 1; In = m - 1; return bufI[0]; }
    return -1;
}
#define chGetI (In-- > 0 ? *bI++ : filbufI())

readw(wI, max) char *wI; int max; {
    register int i;
    register char c;
    for (i = 0; i < max; i++) {
        c = chGetI;
        if (c != '\n' && c != -1)
            wI[i] = c;
        else
            break;
    }
    return i;
}

main(ac, av) int ac; char **av; {
    int fD = open(av[1], 0);
    char wI[64], wD[16], nPS;   /* Word buffers, lens, then.. */
    int nI=0, nD=0, oI=1, oD=1; /* ..a flag sez read was Ok. */
    int m, nS, nP, i;           /* Running num. matching prefix chars */
    if (fD < 0) { fprintf(stderr, "%s: %d\n", av[1], errno); return 1; }
#define DGET /* Update [wno]D with its next word */ \
    if ((oD = oD && (nPS = chGet(fD)) != -1)) { \
        /* pcc needs & 255 here| , but runs ~8% slower */ \
        nP = ((unsigned)nPS) >> 4; nS = ((unsigned)nPS) & 15; \
        nD = nP + nS; \
        for (i = nP; i < nD; i++) wD[i] = chGet(fD); }
#define IGET /* Update [wno]I with its next word */ \
    m = 0; /* New inp word => reset cmp to start */ \
    oI = oI && (nI = readw(wI, sizeof(wI)));
#define PUT { fwrite(wI, 1, nI, stdout); putchar('\n'); IGET }
    IGET DGET /* Set Theory "not in": ordered merge impl */
    while (oI && oD) {
        register int c = 0, lim = nI < nD ? nI : nD;
        if (nP < m) m = nP;
        for (/**/; c == 0 && m < lim; m++) c = wI[m] - wD[m];
        m--;
        if (c == 0)
            c = nI - nD;
        if (c < 0) PUT                 /* I<D: Emit&AdvanceI */
        else if (c == 0) { IGET DGET } /* I==D: advance both */
        else {                         /* I>D: advance D to..*/
            if (nD < nI) {             /*..a maybe <= word. */
                DGET                   /* Loop & compare more */
            } else { /* Cannot possibly be <= while nP > m */
                do { DGET } while (oD && nP > m);
            } /* The above trickery elides like 95% memcmp */
        }
    }
    while (oI) PUT /* flush tail out */
    return 0;
}
It's actually not so bad of a C program, especially if you compare it to Doug McIlroy's A) much larger, B) approximate, and C) very English-specific program. While I'm sure there is a little pre-processing time in v7 spell that could be elided, I feel I've confirmed the idea (mentioned in McIlroy 1982, actually, at the end of HASHING and before SUPERIMPOSED CODES) that an exact approach could have been fine. McIlroy '82 seemed aware of this, mentioning the Morris-Thompson work [11], but I'm not sure he was aware of the pretty severe trade-off between more aggressively compacting data compression ideas and how fast they are to decode. As you can perhaps see from this program, the prefix-delta fits very naturally into the actual data flow of the algorithm and even instructs you how to intelligently skip a lot of data comparison. It might be possible to do that even better - I didn't think too long about that.
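To make that concrete, the "abbreviate" run from earlier encodes roughly as follows (header byte rendered as prefix/suffix nibbles; the first word's prefix count depends on whatever precedes it in the dictionary, so call it 0 here):

  abbreviate    0/10  "abbreviate"
  abbreviated   10/1  "d"
  abbreviates   10/1  "s"
  abbreviating  9/3   "ing"
  abbreviation  10/2  "on"
  abbreviations 12/1  "s"

Once the input word is known to be greater than the current dictionary word because they differ at character m, any following entry with nP > m has the same character at position m and so is still smaller than the input; the do { DGET } while (oD && nP > m) loop steps over all of those without comparing a single character.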
Of course, at some dictionary scale and some input scale, the loop-over-words approach will beat whole-document-at-once, but 5500 words (chosen to match a statement in McIlroy82) is not that big; this post alone is over 1000. And 270e3 words is actually a large unabridged dictionary, bigger in fact than the 100-200K words McIlroy speculated upon at the time.
People don't like repetitive topics - that's a much simpler explanation, and things get flagged for repetitiveness all the time. The source doesn't really matter: after the Nth DOGE article, if Kabosu herself descended from doge heaven and wrote one, it would probably get a lot of flags.
I think it's only necessary because extensions may need to run client side. Not sure if that means the extension needs to be published on their marketplace site.
I searched and couldn't find it. What are you talking about?