
‘The Age of the «Lumières» [plural] … The term «Lumières» was consecrated by usage to bring together the diverse manifestations of this set of objects [superstition, intolerance, the abuses of the Churches and of the States].’ (fr.wikipedia.org, translated) — Except for Postmodernism, an era does not usually give itself a name. The term was coined later on (around the acme of the Revolution’s Terror, say, when a cult of such lights/gleams/luminaries was contrived). And where German reads ‘Aufklärung’, French uses ‘éclaircissement’ instead.

First words of the Bible: בְּרֵאשִׁית, בָּרָא אֱלֹהִים, אֵת הַשָּׁמַיִם, וְאֵת הָאָרֶץ. ‘In the beginning God created the heaven and the earth.’ (Gn 1,1) — Okay, the third verse reads, in French (Bible de Sacy translation): ‘Or Dieu dit: Que la lumière soit faite. Et la lumière fut faite.’ (‘Now God said: Let light be made. And light was made.’) — Not a plural.

The angel of light = Lucifer (< Lat. ‘light-bearer’); cf. 2 Cor. 11:14: ‘And no wonder, for Satan himself masquerades as an angel of light.’ — I don’t believe Kant could ever have imagined himself being associated with Satanism…


Probably not, but he is German, not French. French philosophy and ideology were assimilated elsewhere.


alphabetical sort != numerical sort

Developers, take note: the Unicode Collation Algorithm (UCA)…

As discussed on https://forum.glyphsapp.com/t/unicode-collation-to-alphabeti...
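
A minimal sketch of the difference, assuming Python with PyICU installed (the word list is arbitrary):

    import icu  # PyICU: Python bindings for ICU, which implements the UCA

    words = ["Zebra", "apple", "Éclair", "banana"]

    # Naive codepoint sort: capitals before lowercase, accented letters last.
    print(sorted(words))
    # -> ['Zebra', 'apple', 'banana', 'Éclair']

    # UCA-based collation (English tailoring of the root order):
    # proper alphabetical order.
    collator = icu.Collator.createInstance(icu.Locale("en"))
    print(sorted(words, key=collator.getSortKey))
    # -> ['apple', 'banana', 'Éclair', 'Zebra']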


Great work! I’d love to see a stub added for graphics, image processing, and computer vision!


With over 600 “fidäls” (graphemes), it is a fascinating syllabary for sure! While the Ge’ez language today is indeed used in liturgy only, the script is still used by dozens of East African languages (with up to about 60 million potential users). Astonishingly, there is only a handful of digital fonts available. More interestingly, typesetting classical Ge’ez poses challenges which are not easily solved by current software.

I have attempted a modern type design based on the earliest printed Ge’ez and hacked together custom line-breaking to typeset a piece of the Ge’ez Bible (which to date does not exist as electronic plain text).

http://dodecaglotta.com/figures/Ethiopic-incunabula-3000px.p...


> More interestingly, typesetting classical Ge’ez poses challenges which are not easily solved by current software.

Quite interesting! Could you give some examples/pointers? I did a bit of sleuthing, but as I don't understand Ge'ez, reading (for example) the Unicode Consortium proposal on typesetting Ge'ez didn't highlight what is different, merely what needs to happen. Which is of course quite appropriate for a technical memo, but it didn't help me find what I was looking for.


Punctuation is a challenge. To type modern Ethiopic languages (like Amharic) in Ge’ez script electronically, the regular (Western) word space (U+0020) is used as a word boundary. Properly typesetting classical Ge’ez, however, requires the colon-like Ethiopic Wordspace (፡ U+1361). But how to do that, and how to encode the text, is unclear.

First, I simply substituted all word spaces (and equivalent white-space characters) with U+1361. Obviously this breaks things. For one, every text editor (unaware as it is of ፡’s existence as an alternative word-divider character) treats the incoming byte stream as one very long string in which there is not a single word boundary, and thus no opportunity to break lines. Combined with the fact that no hyphenation-pattern dictionaries exist for classical Ge’ez, text encoded like this basically cannot even be rendered on screen, except as a single, indefinitely overflowing line.
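
You can see why off-the-shelf software balks: in the Unicode character database, ፡ is classified as punctuation rather than as a space separator. A quick check in Python:

    import unicodedata

    for ch in ("\u0020", "\u1361"):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.category(ch)}")

    # U+0020 SPACE: Zs               (space separator: a break opportunity)
    # U+1361 ETHIOPIC WORDSPACE: Po  (punctuation: just another character)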

Next, convinced that Unicode’s U+1361 was impractical to use as a _character_ (at the level of the text encoding), I implemented it as an alternate _glyph_ for the regular word space, using OpenType glyph substitution (thus at the level of the font and text shaping). This worked out beautifully, because now I could cheat typesetting engines, taking advantage of common line-breaking algorithms (which not only use word dividers as line-breaking opportunities, but also stretch/shrink them to justify lines). Unfortunately, as word spaces are stretched or shrunk, the OpenType shaping engine, while drawing the colon-like ፡ Ethiopic word divider, is not aware of the available space, so the divider gets placed unsatisfactorily: either too near to the preceding word or, worse, even overlapping the preceding character.
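
For illustration, such a substitution can be expressed in OpenType feature syntax and compiled into the font with fontTools’ feaLib. This is only a sketch; the file and glyph names (`GeezFont.ttf`, `wordspace.eth`) are placeholders, and the font is assumed to already contain both glyphs:

    from fontTools.ttLib import TTFont
    from fontTools.feaLib.builder import addOpenTypeFeaturesFromString

    font = TTFont("GeezFont.ttf")  # placeholder font file

    # Contextual-alternates feature: render the Ethiopic word-divider glyph
    # wherever the encoded text contains a plain word space.
    addOpenTypeFeaturesFromString(font, """
        feature calt {
            sub space by wordspace.eth;
        } calt;
    """)
    font.save("GeezFont-calt.ttf")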

Eventually, I went for a hybrid approach, using the combination U+0020 + U+1361 + U+0020 (i.e. surrounding the fixed-width Ethiopic word divider with regular, flexible word spaces). While this is an ugly hack (certainly from a purist’s text-encoding perspective), it practically solves the issue, with nicely spaced-out ፡s between words.
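
In code, the hybrid encoding is essentially a one-line substitution (a sketch; the function name is mine, and the input is assumed to use ordinary spaces as word boundaries):

    import re

    def encode_classical_geez(text: str) -> str:
        """Surround the fixed-width Ethiopic wordspace (U+1361) with regular,
        flexible word spaces (U+0020), so justification engines can still
        stretch and shrink every word boundary."""
        return re.sub(r"\s+", "\u0020\u1361\u0020", text.strip())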

Another, related issue concerns the lack of hyphens. Since word boundaries in classical Ethiopic are unambiguously marked with the explicitly drawn ፡, there’s no need to indicate when a word is broken between syllables at the end of a line. If a line ends with ፡, the reader will know that the next couple of glyphs on the following line do not belong to the preceding word, but form another one. Otherwise, it must be assumed that the syllables following the last ፡ on the line form a word together with the syllables on the following line, up to the next ፡. But as no current typesetting software supports this convention, one again needs a hackish work-around. I did so at the level of the font, by putting an empty, zero-width glyph at the U+002D code point (hyphen-minus)…
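
A sketch of that font-level hack with fontTools (file and glyph names are placeholders; in practice one would do this in the font editor at design time):

    from fontTools.ttLib import TTFont
    from fontTools.ttLib.tables._g_l_y_f import Glyph

    font = TTFont("GeezFont.ttf")  # placeholder font file

    # Add an empty, zero-width glyph...
    name = "hyphen.blank"
    font.setGlyphOrder(font.getGlyphOrder() + [name])
    font["glyf"].glyphs[name] = Glyph()    # no outlines at all
    font["hmtx"].metrics[name] = (0, 0)    # zero advance width, zero side bearing

    # ...and map U+002D (HYPHEN-MINUS) onto it in every Unicode cmap subtable,
    # so any hyphen a justifying typesetter inserts renders as nothing.
    for subtable in font["cmap"].tables:
        if subtable.isUnicode():
            subtable.cmap[0x002D] = name

    font.save("GeezFont-nohyphen.ttf")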

There are some more issues involved with typesetting classical Ethiopic Ge’ez, but word dividers, hyphenation and line-breaking are the toughest.

If you’d like to know more, the W3C has an Editor’s Draft on ‘Ethiopic Layout Requirements’ [1], but many of the issues raised remain as yet unresolved, pending user feedback. I also found an Individual Contribution (for consideration by the Unicode Technical Committee), “Proposal to Reclassify Ethiopic Wordspace as a Space Separator (Zs) Symbol” [2], on the Unicode.org website; it is very illustrative and offers thorough suggestions for implementation details.

I’d love to discuss these and other scholarly typesetting issues with anyone interested. Do check out my Dodecaglotta side project [3] and get in touch!

[1] https://w3c.github.io/elreq/
[2] http://unicode.org/L2/L2015/15148-ethiopic-wordspace.pdf
[3] http://dodecaglotta.com/#type-design


Brilliant work, this is one of those things that is going to give me hours of entertainment .. off I go to learn Ge'ez ..



It wasn't an intentional reference to Evola, if that's what you mean - I was unaware of the association until you pointed it out.


The patent holder appears to be a pseudonym for an unidentified party: https://www.quora.com/Who-is-Hendricus-G-Loos

> “All devices are used for Mind Control projects run by CIA or other intelligence agencies. A group of researchers (under the name Dr H Loos) were actually a group of hired professionals for researching and inventing such devices which could be developed and used for mass mind control, PSYOPS, behaviour modification later by CIA.”


I'm curious why they patented their research, rather than keeping it hush-hush.


Possibly because research elsewhere is approaching their work independently.


The US patent system accommodates classified patents: https://www.uspto.gov/web/offices/pac/mpep/s120.html


How can that work? You need a security clearance to read them? What if someone else invents the same thing before the patent expires?


> What if someone else invents the same thing before the patent expires?

The government will offer you a nice sum of "keep quiet" money and an NDA; you get hit with NSLs; your patent is outright seized by the state; you get threats, e.g. of tax or immigration checks... the ways the government can intimidate you are endless.

For the obvious, legal way, see https://en.wikipedia.org/wiki/Invention_Secrecy_Act


So basically you're saying that any independent invention of the same thing is also classified.


As soon as you try to patent it, USPTO will look up if there's a similar thing - and if there is and it is "red-flagged", you're screwed...


But suppose you DON'T try to patent it. You just invent it and try to use it or sell it. What then?


> “monospace fonts tend to be wider, so the default font size (medium) is scaled so that they have similar widths”

> “the base size depends on the font family and the language … Default system fonts are often really ugly for non-Latin-using scripts.”

‘often’, ‘tend to be’: I am worried. I think it’s a really bad idea to skew default behavior based on such assumptions, certainly when the deviations are a blind process triggered by proxies (like language tags and some vaguely statistical rules of thumb for dealing with generic font-family names). That is: without even looking at the actual design and metrics of the actual font involved.¹

What happens if the monospaced font in question has a normal x-height and/or an advance width equal to that of its serif counterpart? What if the CJK and Devanagari fonts have characters already drawn ‘big on the body’?² Then such hard-coded default moonshot fixes, which try to cater for the lowest common denominator, will make things needlessly hard to debug and will still force the designer to size-adjust ad hoc, font per font, but now also fixing the browser’s ‘fixes’. (Too bad: no `normalize.css` will help…³)

And yet, all of the needed data is available in the font file. There’s even a dedicated CSS property for dealing with fonts’ varying metrics: `font-size-adjust`.⁴ Not that browser makers care to implement it⁵, but since the OP’s post concerns Firefox (which does support `font-size-adjust`, though the article does not discuss it) I wonder: is it a matter of performance that retrieving the actual font metadata and metrics is left out of the equation? Surely, the fact that local font files, base64 data-URI-embedded ones, and externally hosted ones can all be used makes implementation anything but trivial…

At Textus.io⁶ we’re going to great lengths to solve typographic issues such as these. Case in point: for each font we read out the `xHeight` value, then calculate the actual font size relative to the font’s UPM (unitsPerEm), so we get consistent apparent font sizes, i.e. consistent aspect ratios.
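
A minimal sketch of that calculation, using Python with fontTools (the function name is mine; `sxHeight` requires an OS/2 table of version 2 or later):

    from fontTools.ttLib import TTFont

    def aspect_value(path: str) -> float:
        """x-height as a fraction of the em size: what CSS font-size-adjust
        calls the font's 'aspect value'."""
        font = TTFont(path)
        x_height = font["OS/2"].sxHeight   # e.g. 500
        upm = font["head"].unitsPerEm      # typically 1000 or 2048
        return x_height / upm

    # To give a fallback font the same apparent size as the first choice:
    # adjusted_size = size * aspect_value(first_choice) / aspect_value(fallback)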

I think it all boils down to a separation of concerns: proportions and interrelated sizes (ascender, caps, descender, x-height, stem width, etc.) are up to the discretion of the font designer; overall aspect size is the business of the typesetter (the CSS stylesheet author); and the browser ought always to draw consistently, regardless of generic font-family name, language and/or Unicode code range.

¹ https://opentype.js.org/font-inspector.html
² https://medium.com/@xtianmiller/your-body-text-is-too-small-...
³ https://github.com/h5bp/html5-boilerplate/issues/724
⁴ https://developer.mozilla.org/en/docs/Web/CSS/font-size-adju...
⁵ http://caniuse.com/#feat=font-size-adjust
⁶ http://www.textus.io


> I am worried ... really bad idea ... will make things needlessly hard to debug

It has worked this way since the days of Netscape Navigator[1], though, so it's not a change or a new idea. It's just keeping things the way they've always been. (Nowadays, in Firefox, the corresponding settings are under Preferences -> Content -> Fonts & Colors -> Advanced.)

[1] http://www.alanwood.net/unicode/mac_net6.gif


That Netscape screenshot only shows that users have always been able to specify their own font preferences for three generic font families (serif, sans-serif, monospace), along with a respective font size. I’m not objecting to that: in fact, it’s a Good Thing that users are able to take control.

But taking control also means dealing with the inconsistencies myself. If I were so bold as to set Impact as my default serif font, then I would also set a considerably smaller font size to make up for its huge x-height. The thing is: if it bothered me, I, as an end user, could do something about it, thanks to the browser’s interface exposing the needed controls.

It’s a whole different thing when browsers start changing default behavior based on ‘common’ properties of ‘most’ fonts, without inspecting the real features of the actual fonts they render. Different, because if it bothered me as a web developer, I could do nothing about it: the browser, while deviating from the standard defaults, would not offer me the required controls, i.e. it would render my CSS rules differently based on opinionated variables I cannot know.


That screenshot is from Netscape 6, a Gecko-based Netscape release without “Navigator” in the name. The mechanism mostly survives in Gecko today, except for the Latin unification (previously split into Western, Central European, etc.).


Oops! It was definitely there in Netscape 4, but I was a bit sloppy with the image search. Here is the Netscape 4 screenshot:

http://www.alanwood.net/unicode/mac_net472.gif


> And yet, all of the needed data is available in the font file. There’s even a dedicated CSS property for dealing with fonts’ varying metrics: `font-size-adjust`

These are web features from before CSS existed.

The whole font-sizing thing comes from how `<font size=x>` used to behave. When CSS came along, it only complicated matters. font-size-adjust is pretty new.

And once the web starts behaving a certain way it's kind of hard to "fix".

(Also, according to a comment on /r/rust, this seems to be something that is an intrinsic problem with monospace fonts, https://www.reddit.com/r/rust/comments/6swzl5/fontsize_an_un...)

(Also, just to mention, all of this complexity would still have to exist even if the monospace thing wasn't a problem, to support user configured font sizes)

> I wonder: is it a matter of performance that retrieving the actual font metadata and metrics is left out of the equation?

I think there is a nontrivial perf impact, yes. We've had to do some locking to make font metrics (for ex and ch) work in Stylo, for example, and while we've optimized it it still makes us lose out on parallelism a bit.


I believe I’ve come across allegations that the metrics encoded in the font (xHeight and unitsPerEm, by the sound of it) are fairly hit-and-miss. Any comments on that?

Bear in mind non-Latin scripts also.


Sure. Badly produced fonts (the kind you find on the cheap) tend to have faulty metrics: the designer did not change the defaults of the font-design app they were using (Glyphs, FontLab, FontForge, etc.) to match the actual outlines of the glyphs they drew.

E.g. the font’s metrics say xHeight=500 and UPM=1024, so we may assume the aspect ratio is about 0.49. But then we look at the outlines and see the designer drew the topmost node of the x at the 600 coordinate on the Y-axis. So, in fact, the actual aspect ratio is about 0.59, and the stated metrics are indeed useless…

But again: should we then _assume_ we are always dealing with badly produced fonts? Or could we just expect font metrics to tell the truth? For if we don’t, then we’re totally lost, and we will make things even worse for professionally produced fonts, which do honor the specs.

(As for non-Latin scripts: it’s indeed up to the discretion of the font designer how to draw the height of, say, CJK characters or the size of the teeth in Arabic, but always _relative to the xHeight_ as stated in the font’s metadata, so the font behaves consistently with the specs.)

At Textus.io, we therefore rely on the font’s built-in metrics metadata, for now. But since we are obsessed with fool-proof typography, we are indeed considering looking at the actual coordinates of the outlines instead…
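
A sketch of that outline-based measurement, again with fontTools (the function name is mine, and whether ‘x’ is the right reference glyph for a given script is of course another question):

    from fontTools.ttLib import TTFont
    from fontTools.pens.boundsPen import BoundsPen

    def measured_aspect(path: str) -> float:
        """Derive the x-height from the drawn outline of 'x',
        ignoring whatever the font's metadata claims."""
        font = TTFont(path)
        glyph_set = font.getGlyphSet()
        glyph_name = font.getBestCmap()[ord("x")]
        pen = BoundsPen(glyph_set)
        glyph_set[glyph_name].draw(pen)
        x_min, y_min, x_max, y_max = pen.bounds
        return y_max / font["head"].unitsPerEm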

I do not expect browsers to deal with inconsistent font metrics, though. As a developer, I want to rely on browsers behaving consistently, assured that they won’t ‘fix’ things I have already fixed, for then the outcome would be totally unpredictable. That’s why we have standards and specs after all, don’t we?


> But again: should we then _assume_ to be always dealing with badly produced fonts?

A major theme of the web platform is to assume the worst, because the worst is actually quite common.

Another major theme is backwards compatibility. You've mentioned a fix that would break all previous sites relying on this behavior. Fixes that require a clean-slate redo of the web just won't work.


Note that these are mostly user-configurable settings (although it's only the default font size that can be configured, not the other absolute and relative values).


Thanks! Liked your Open-Publisher, too!

Did you subscribe to our waiting list (Textus.io)? If so, I could give you beta access right away.


Today, 18 July 2017, is the bicentenary of Jane Austen’s death.

Just the perfect day to showcase our product: Textus, a typesetting engine in the cloud.

Turning all of Jane Austen’s Collected Works from scratch (i.e. plain-text markdown) into beautiful online ebooks and PDFs takes a matter of seconds, instead of weeks…


Nope, it’s everyone! Well, perfect timing: it went down just six seconds after we submitted our latest repo to Show HN: https://news.ycombinator.com/item?id=14800791 :-/

Anyway, the output of the source on Github is here: http://janeausten2017.com/

