Shapecatcher: Draw the Unicode character you want (shapecatcher.com)
155 points by Tomte on Jan 13, 2023 | 37 comments



Seems to be matching against one particular typeface, and is very sensitive to how closely what you write resembles that typeface. The first thing I tried was a lowercase "a," which, in my handwriting, is a one-story "a," and it had no idea what it was. Likewise, it guessed poorly at my lowercase "f" because the top of my "f" curves back down and its version doesn't, etc. Seems like it would benefit from a dataset with more variations on how various characters are written in practice.

Neat concept though!


I tried multiple times to match an ampersand, and thought I did a fairly accurate job.

I eventually managed to get it as a third match, but this seems to be what it matches against [0], which isn't the ampersand I'm used to (nor the one you get in a Google image search).

[0] https://shapecatcher.com/unicode_img/38.png


It's now useful for arrows, mathematics etc.


Only halfway related, but a long time ago I had an idea for a character encoding that was stroke based.

In other words, based on minimal stroke primitives (line, arc, circle, etc.) that were placed not with any exact coordinates, but simply in relation to each other conceptually and with crude size/position categories. E.g. "downward stroke, top to bottom" is a capital "I" while "downward stroke, middle to bottom, dot closely above first stroke" would be a lowercase "i". And then for matching forms with different meanings, there would be a final "family" selector, e.g. to distinguish an em-dash from the Chinese character for "one", or an en-dash ("punctuation" family) from a minus sign ("math symbol" family).

And then a suitably compressed bit encoding for the instructions. So in the end, something like "I" might just be 3-4 bits long, while a complex Chinese glyph might be 60 bits.

But the main feature is that a font renderer could always draw a primitive version of any glyph, even if you don't have it in a single font anywhere, because the character code itself encodes it. And then character codes wouldn't be just something totally arbitrary invented by Unicode, but inherently meaningful, and anyone would be free to invent any character they wanted, which would always be drawn by any software, no Unicode gatekeepers needed.

Obviously it's not terribly practical, for a whole host of reasons. But I still sometimes think about how elegant it would be to have a "geometric" self-describing character encoding, and to get away from all of the political decisions around language scripts and where they get put in Unicode and in which version.
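
For what it's worth, here is a minimal Python sketch of what such an encoding could look like. Everything in it is made up for illustration: the `Shape`, `Span` and `Family` categories, the 4-bits-per-stroke layout and the `encode` helper are assumptions, not part of any real standard, and the fixed-width packing is much cruder than the compressed encoding described above.

```python
from dataclasses import dataclass
from enum import IntEnum


class Shape(IntEnum):
    """Hypothetical stroke primitives (2 bits)."""
    LINE_DOWN = 0
    LINE_ACROSS = 1
    ARC = 2
    DOT = 3


class Span(IntEnum):
    """Hypothetical crude size/position categories (2 bits)."""
    TOP_TO_BOTTOM = 0
    MIDDLE_TO_BOTTOM = 1
    ABOVE_PREVIOUS = 2
    MIDDLE = 3


class Family(IntEnum):
    """Hypothetical final selector for glyphs with identical strokes (2 bits)."""
    LETTER = 0
    PUNCTUATION = 1
    MATH = 2
    CJK = 3


@dataclass(frozen=True)
class Stroke:
    shape: Shape
    span: Span


def encode(strokes, family):
    """Pack each stroke into 4 bits, then append the 2-bit family selector.

    Returns (value, bit_length). The bit length is returned separately
    because this toy format is not self-delimiting.
    """
    value = 0
    for s in strokes:
        value = (value << 4) | (s.shape << 2) | s.span
    value = (value << 2) | family
    return value, 4 * len(strokes) + 2


capital_i = encode([Stroke(Shape.LINE_DOWN, Span.TOP_TO_BOTTOM)], Family.LETTER)
lower_i = encode(
    [Stroke(Shape.LINE_DOWN, Span.MIDDLE_TO_BOTTOM),
     Stroke(Shape.DOT, Span.ABOVE_PREVIOUS)],
    Family.LETTER,
)
dash = [Stroke(Shape.LINE_ACROSS, Span.MIDDLE)]
en_dash = encode(dash, Family.PUNCTUATION)
minus = encode(dash, Family.MATH)

print(capital_i, lower_i, en_dash, minus)
```

In this toy layout the capital "I" takes 6 bits and the lowercase "i" takes 10, while the en-dash and the minus sign share the same stroke bits and differ only in the trailing family selector.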


Problem is, how do you define these representations in the first place? For one thing, some characters have multiple different forms… like a/ɑ, or g/ɡ [0]. Then you have characters which differ only minutely — like Thai ช/ซ, or Ethiopic ሀ/ህ, or Cherokee Ꭺ/Ꭿ. And, worse, those characters can also look entirely different between fonts (see e.g. [1] for Thai). So by the time you’ve finished working through all those choices, and created a format which can distinguish them all, you’ve effectively created yet another font — just in a very lossy vectorised format.

Or if you go the opposite direction, and expect each individual character to make its own individual choices in this regard, then that basically has the same problems as PDFs: it may look good, but it’s totally impossible to process programmatically, since it overspecifies the visual details at the expense of semantics. And although that might be fine and even desirable for certain use cases, it does limit the places where this format could be used.

That being said, I can certainly see possibilities and places where this could be useful to me. Perhaps I’ll have a go at implementing this someday.

[0] In case those look the same in your font, here they are again:

    a/ɑ  g/ɡ

And in case those look the same too, then… well, have a look at the codepoints in the Unicode reference charts, I guess!

[1] https://wrdingham.co.uk/thai/tellthai_preface.htm
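
If the pairs in footnote [0] look identical in your font, a short Python sketch can print the code points and names instead. It assumes the two variant characters are U+0251 and U+0261, which is what they appear to be here; `describe_char` is just a helper name chosen for this example.

```python
import unicodedata


def describe_char(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Assumed characters: plain a/g next to their single-story variants.
for ch in ("a", "\u0251", "g", "\u0261"):
    print(describe_char(ch))
```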


Yup, I didn't say it would be easy. :) But initial thoughts are simply that characters would be defined "canonically" (for programmability) with fonts free to vary stylistically as desired just as they are now (for the classic single- and double-story 'a' and 'g' you mention, for instance). Minute differences within any language, meanwhile, should be handled by the strokes themselves, or else by the final "family" selector I mentioned when strokes are identical (is it an en-dash or a minus sign?).

Again, more of a thought experiment of how character encoding might have gone a different way in the past. And because canonical forms would be required for interoperability it would still need a coordinating body (like Unicode) to standardize them, but people would still be free to encode their own meaningful characters outside of the standards (such as rare/ancient family name characters in Chinese that aren't in Unicode).

What's really fun to think about is how rendering libraries might even use machine learning to draw glyphs when no font glyph is available, in the style of an existing font, whether a garamond serif, or brush Chinese.


You will appreciate RFC 5242 [1].

[1] https://datatracker.ietf.org/doc/html/rfc5242


Very interesting, thank you! I always thought I couldn't be the only one with this basic idea; fascinating to see a related proposal for it.


Please note the publish date. ;-) (But the proposal itself is detailed enough that you will still appreciate it.)


It's an interesting idea; my first thought is how you'd make it machine-readable. If you're writing a browser, how do you take something that "looks" like google.com and know you need to go to google.com and not googIe.com?

Maybe you don't bother (i.e., don't try to parse bytes in this encoding as plain text) but that has a bunch of consequences too.
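
To make the lookalike problem concrete, here is a small Python sketch. It assumes the second domain above is spelled with a capital "I" (U+0049) where the real one has a lowercase "l" (U+006C); `first_difference` is a helper invented for this example. With today's code-point-based text, the two strings are trivially different even though they can render almost identically.

```python
def first_difference(a, b):
    """Return the index of the first differing character, or None if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One string may be a prefix of the other.
    return None if len(a) == len(b) else min(len(a), len(b))


real = "google.com"
lookalike = "goog\u0049e.com"  # capital I standing in for lowercase l

i = first_difference(real, lookalike)
print(real == lookalike)
print(i, f"U+{ord(real[i]):04X}", f"U+{ord(lookalike[i]):04X}")
```

A purely shape-based encoding would have to answer the same question from stroke descriptions alone, which is the hard part.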


This is basically the point (or one of the points) I was trying to make in my sibling comment; thank you for expressing it much more clearly than I did!


Sounds like a glyph encoding.


Reminds me of Detexify, a similar tool for learning the LaTeX codes for a symbol.

http://detexify.kirelabs.org/classify.html


I’ve been using this site sporadically for nearly a decade. It’s been a handy resource.


This reminds me of qhanzi, which I found to be super useful for studying Chinese characters.

https://www.qhanzi.com/


Pleco does this well too.


I can't get it to work for any Chinese characters.


Hmm, it doesn't seem to recognize "Egyptian Hieroglyph D053", no matter how accurately I draw it!


Drew an h-bar. It found Cyrillic tshe (ћ), as well as h with a stroke (ħ), even Planck's constant (h),

... but not ℏ.


This doesn't appear to even try to match to Chinese characters? I drew the ones for convex and concave (凸 and 凹, respectively), but only got back dominoes and APL symbols as matches. Drawing a box returns the Japanese katakana ロ, but not the Chinese hanzi 口.


Yeah, the text to the right of the input is explicit about this:

> Currently, there are 11817 unicode character glyphs in the database. Japanese, Korean and Chinese characters are currently not supported.


Tried it with β and with ξ but did not get a match. (Disclaimer, I did eventually get the β working, but first versions only gave other results such as P, ρ or Բ).


I cannot for the life of me get it to recognize a snowman.


After failing too many times, I looked up how it is supposed to look <https://img.shapecatcher.com/svg/9731.svg>, then got it on the second attempt <https://cdn.imgpaste.net/2023/01/14/K6IkIp.png>.


Try an 8 with the cross-bar erased, and two eyes - nothing else. That gives me U+26C4. I can't get U+2603, though.
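
For anyone mixing up the two snowmen: the numbers in the Shapecatcher URLs quoted in this thread (9731, 38) appear to be plain decimal code points, so they can be converted to the usual U+ notation. A minimal Python sketch, where `from_decimal` is a helper name made up for this example:

```python
import unicodedata


def from_decimal(code_point):
    """Format a numeric code point as U+XXXX plus its Unicode name."""
    return f"U+{code_point:04X} {unicodedata.name(chr(code_point))}"


print(from_decimal(9731))    # number taken from the snowman image URL
print(from_decimal(0x26C4))  # the one matched by the "8 with two eyes"
print(from_decimal(38))      # number taken from the ampersand image URL
```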


It would be great if it could find all variants of:

   ‾⎻⎼⎽ lines

   _ light lines

   | bar, ⎸left bar, right bar⎹

different angles of /, \

etc...

I use these for documentation. Monodraw is only useful up to a point, and references to these characters are scattered across different pages with no grouping for drawing purposes.

I tried drawing a \, but Shapecatcher only shows "\", and only if it's semi-close to 45°.

Edit: Thanks @mapierce2, Detexify seems to work better for this purpose, but the results seem to be images, not text.


I looked up the snowman in several fonts and copied as best I could. Utter failure. If it can't find the snowman....


Snowman didn't work for me, but the other characters I tried did. I had some trouble with "therefore" because there were a lot of virtually identical symbols, but I can't blame it for that.

It mentions not all characters are in the database.

ETA: it's got a pretty weird snowman. http://shapecatcher.com/unicode/info/9731

By imitating this rendition I was able to bring it up.


Awesome, I got it to work with all the characters I tried.

It would be nice if the output also gave the alt code for each character,

like é = [alt 1 3 0]


That's just the decimal representation of the code point, isn't it?
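
A quick Python check suggests it is not quite the same number. The sketch assumes the classic Windows behavior, where an alt code typed without a leading zero is looked up in the OEM code page (CP437 on US systems) and one typed with a leading zero is looked up in Windows-1252; `alt_code_char` is a helper name invented here.

```python
def alt_code_char(number, code_page="cp437"):
    """Decode a single-byte alt code using the given code page."""
    return bytes([number]).decode(code_page)


e_acute = "\u00e9"  # é

print(alt_code_char(130) == e_acute)            # Alt 130 under CP437
print(ord(e_acute))                             # decimal Unicode code point
print(alt_code_char(233, "cp1252") == e_acute)  # Alt 0233 under Windows-1252
```

Under those assumptions, 130 is the character's position in CP437, while its Unicode code point is 233 (U+00E9).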


I am rather impressed with this and will definitely make good use of it.


Funny, I just found this on Google yesterday when looking for some glyphs.


Are Egyptian hieroglyphs not included? Because it didn't recognize even one of my impeccable drawings.


U+23FB ⏻ POWER SYMBOL

It doesn’t recognize it.


dammit, we all know what most of you tried first.


Glagolitic capital letter dobro?




