
DB 'expert' here - agree completely. It's about profiling, then banging your head against the problem and learning more from books/the docs/the web. That's really it.


I don't want to diss this; it's useful to some, but to say "Your database knowledge is outdated" is presumptuous to the point of insulting. It's basic-to-intermediate-level stuff, like using CTEs to break up queries (and you don't discuss how this can overload the optimiser) and doing multiple aggregates in one SELECT.
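
To spell out what I mean by "multiple aggregates in one SELECT", here's a throwaway sqlite3 sketch in Python (the orders/status/amount names are invented, not from the article):

    # Throwaway sketch: several aggregates computed in a single table scan,
    # rather than one SELECT per aggregate. Table/column names are made up.
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (id INTEGER, status TEXT, amount REAL)")
    con.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 'paid', 10.0), (2, 'refunded', 5.0), (3, 'paid', 7.5)],
    )

    # One scan produces all three numbers; CASE works on any engine
    # (newer engines also offer FILTER (WHERE ...) for the same thing).
    row = con.execute("""
        SELECT count(*)                                           AS total_orders,
               sum(CASE WHEN status = 'paid'     THEN amount END) AS paid_amount,
               sum(CASE WHEN status = 'refunded' THEN amount END) AS refunded_amount
        FROM orders
    """).fetchone()
    print(row)  # (3, 17.5, 5.0)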

I also can't C&P from your pages as they're images, but 'Ghost Conditions Against Unindexed Columns' has '... AND type = in (3, 6, 11)' - is that right, or did you mean 'type IN (...)'? It also talks about multi-column indexes being more useful in some cases, which is true, but very elementary.
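
And to be concrete about the predicate and the multi-column index point (again a made-up sqlite3 sketch; 'events', 'user_id' and 'type' are my invented names, not yours):

    # The IN predicate as presumably intended, plus a multi-column index
    # covering both columns the WHERE clause touches. All names invented.
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (id INTEGER, user_id INTEGER, type INTEGER)")
    con.execute("CREATE INDEX idx_events_user_type ON events (user_id, type)")

    # 'type = in (3, 6, 11)' is a syntax error; 'type IN (3, 6, 11)' is the usual form.
    plan = con.execute("""
        EXPLAIN QUERY PLAN
        SELECT id FROM events
        WHERE user_id = 42 AND type IN (3, 6, 11)
    """).fetchall()
    print(plan)  # should report a SEARCH using idx_events_user_type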

This is good and well done, but please don't oversell it.


It was exceedingly narrow, to the point that the first cavers chosen for it were small women, IIRC.

Pic of them halfway down https://www.theatlantic.com/science/archive/2015/09/homo-nal... plus article.


Things have a purpose. You might use your oven 2% of the time; you don't turn it on and use it (or turn it on and leave it empty) 100% of the time just cos it's there. When the purpose is done, you stop.


Then may I recommend, in addition, Hacker's Delight. The 1st ed is the most fun; the 2nd ed has a lot more depth in certain areas (division, IIRC) which you might choose to skip.


A good non-dev can be hugely helpful to devs and can greatly increase their productivity. To repeat: a good one.


Do you (or anyone) have any idea why anyone could possibly have thought 16 bits would be enough? Many decisions are bad in hindsight, but surely no hindsight was needed for that.


Nope. But on reflection, I can’t really tell if it was that dumb an idea.

If you look at the text following the “Yes” quote, you’ll find that “all characters” is carefully defined to mean “all characters in current use from commercially non-negligible scripts”. Compared to the current definition of “all characters we have reasonable evidence have ever been used for natural-language interchange”, it doesn’t sound as noble, but would also exclude a number of large-repertoire sets (Tangut and pre-modern Han ideograms, Yi syllables, hieroglyphs, cuneiform). Remove the requirement for 1:1 code point mapping with legacy sets, and you could conceivably throw out precomposed Hangul as well. (Precomposed European scripts too, if you want, but that wouldn’t net you eleven thousand codepoints.)

At that point the question seems to come down to Han characters: the union of all government-mandated education standards (unified) would come in well below ten thousand characters, but how well does that number correspond to the number of characters people actually need? One potential killer is uncommon characters people really, really want (proper names), but overall, I don’t know, you’d probably need a CJKV expert to tell. To me, neither answer seems completely implausible.

On the other hand, it’s also unclear that a constant-width encoding would really be all that valuable. Most of the time, you are either traversing all code points in sequence or working with larger units such as combining-character sequences or graphemes, so aside from buffer truncation issues constant width does not really help all that much. But that’s an observation that took more than a decade of Unicode implementations to crystallize.

It is certainly annoying how large and sparse the lookup tables needed to implement a current version of Unicode are—enough that you need three levels in your radix tree and not two—but if you aren’t doing locales it’s still a question of at most several dozens of kilobytes, not really a deal breaker these days. Perhaps that’s not too much of a cost for not marginalizing users of obscure languages and keeping digitized historical text representable in the common format.
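
For anyone curious what "three levels" looks like, here's a toy Python sketch of a shared-block (trie-style) property table; the 5+8+8 bit split and the property values are arbitrary choices of mine, not how any particular library lays it out:

    # Toy three-stage table for one per-code-point property (default 0).
    # Code points run 0..0x10FFFF (21 bits); split the index 5 + 8 + 8 bits,
    # and share identical blocks so sparse data stays small.
    def build(prop):
        stage1, stage2, stage3 = [], [], []
        seen2, seen3 = {}, {}
        for top in range(0x110000 >> 16):              # 17 top-level chunks
            mid_ids = []
            for mid in range(256):
                leaf = tuple(prop.get((top << 16) | (mid << 8) | lo, 0)
                             for lo in range(256))
                if leaf not in seen3:
                    seen3[leaf] = len(stage3)
                    stage3.append(leaf)
                mid_ids.append(seen3[leaf])
            mid_ids = tuple(mid_ids)
            if mid_ids not in seen2:
                seen2[mid_ids] = len(stage2)
                stage2.append(mid_ids)
            stage1.append(seen2[mid_ids])
        return stage1, stage2, stage3

    def lookup(cp, tables):
        stage1, stage2, stage3 = tables
        return stage3[stage2[stage1[cp >> 16]][(cp >> 8) & 0xFF]][cp & 0xFF]

    tables = build({0x0041: 1, 0x00E9: 1, 0x1F600: 2})  # 'A', 'e-acute', an emoji
    print(lookup(0x0041, tables), lookup(0x1F600, tables), lookup(0x20, tables))  # 1 2 0

With mostly-default data nearly every block dedupes away, which is why the real tables stay in the tens-of-kilobytes range.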


Speaking as a Brit: the Opium Wars were an abomination, but China does like (or finds it convenient) to play the victim.


I see nothing in your link to justify what you say. Perhaps you can elaborate.


The way to run conjunctive SQL queries forward and backward is described in this paper, https://www.cambridge.org/core/journals/journal-of-functiona... (also available on the arXiv), where they are referred to as query 'evaluation' and 'co-evaluation', respectively. We never would have been able to discover co-evaluation if not for category theory! The previous link includes this paper and many others.


So what is coevaluation and why is it useful? Please don't just point at the paper again.


Bi-directional data exchange has many uses. For example, given a set of conjunctive queries Q, because coeval_Q is left adjoint to eval_Q, the composition coeval_Q o eval_Q forms a monad, whose unit can be used to quantify the extent to which the original query Q is "information preserving" on a particular source (a measure of query/data quality). As another example, we use the technique to load data into OWL ontologies from SQL sources, by specifying an OWL-to-SQL projection query (which tends to be easy) and then running it in reverse (which tends to be hard). But no doubt more applications await!


Can you point to some examples of OWL/SQL transforms being flipped? I have trouble believing that an invertible transformation is hard (presumably each step is invertible, right?), and certainly "never would have been able to discover" seems inconceivable to me.

Looking at the paper, it is very dense and abstract, and also 50 pages long.

Edit: on reflection, I am doing a bit of sealioning, which was not my intention, but it does look that way. I'll try to read your paper, but if you assure me category theory really allowed you to do the things you claim, I'll take you at your word.


You might try pages 8-16 of this presentation: https://www.categoricaldata.net/cql/lambdaconf.pdf . The examples are relational-to-relational and simplistic, but they do illustrate running the same transformation both forward and backward, as well as showing the "unit" of such a "monad". We implemented everything in public software, so hopefully the software is even better than my word! As for loading SQL to RDF specifically, I'd be happy to share that technique, but it isn't public yet; please ping me at ryan@conexus.com.


Could this possibly be explained to the average programmer, who doesn't have the foggiest notion what conjunctive queries, coevaluation, or monads are?


At my university I asked about these flashy new machines, which I'd never have a chance to use, and was told they were slow. That was way back then.


They were the slowest SGI boxes, but a MIPS R3000 at 12.5 to 36MHz would have been faster than the 33MHz 386DX in a high-end PC in 1998.

Edit: Things were moving pretty fast at the time, so I suppose if you bought one in 1998, it would have looked anemic when the 486 came out. Basically, if you bought anything and held onto it for 3-5 years, it ended up much further behind than it would in the same situation today.


>> the 33Mhz 386DX in a high end PC in 1998.

1998's high-end PC would have been a Pentium II at 450MHz with 128MB of RAM, a 10GB HD, and an NVIDIA card.

The 386/33 was high-end in 1990.


Ah, yes, typo on my part... it was 1988, not '98.

