

Bacon Ipsum, because Unicode is hard - relaxnow
http://www.geertvanderploeg.com/unicode-gen/

======
jhawthorn
I've been using ｆｕｌｌｗｉｄｔｈ ｃｈａｒａｃｔｅｒｓ because they are simple to convert to
and to read. This page, of course, has the advantage of more variety in
characters.

Fullwidth conversion code in Ruby:

    # Map space to U+3000 and '!'..'~' to the fullwidth forms U+FF01..U+FF5E.
    "string".tr(' !-~', "\u3000" + (0xFF01..0xFF5E).to_a.pack('U*'))

~~~
geon
I wondered what was up with the weird kerning.

    
    
        Nａｍ
        Nam
    

(By the way, they seem to break tab-indentation for preformatted text in the
markdown.)

~~~
Danieru
Full-width characters have a fun habit of breaking most things, even more so
than Unicode in general.

Since they are only used in CJK languages, the majority of programmers are
unaware of them. Because half-width and full-width forms have separate code
points, you need a decent Unicode library to treat them as equivalent.
Otherwise a user could spoof another user.
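
A minimal sketch of the spoofing concern, assuming usernames are compared after NFKC normalization. `canonical_username` is a hypothetical helper name. `String#unicode_normalize` is in Ruby's standard library (2.2 and later). NFKC folds fullwidth forms to ASCII, but it does not catch every look-alike (Cyrillic letters, for example), so treat this as one layer rather than a full defense.

```ruby
# Sketch: fold fullwidth forms to ASCII before comparing usernames.
# `canonical_username` is a hypothetical helper, not part of any library.
def canonical_username(name)
  # NFKC applies compatibility mappings, so U+FF41 (fullwidth "a") becomes "a".
  name.unicode_normalize(:nfkc).downcase
end

spoof = "\uFF41\uFF44\uFF4D\uFF49\uFF4E"   # fullwidth "admin"

puts spoof == "admin"                                           # false
puts canonical_username(spoof) == canonical_username("admin")   # true
```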

One cool feature is that they let you write monospaced text even if you do not
control the font, since there are full-width code points for most programming
symbols.

------
kenrikm
I use <http://www.fillerati.com/> because it's nice to use real sentences
instead of ipsum.

~~~
viggity
Isn't the whole point of lorem ipsum that it doesn't make sense to the person
reviewing the document? That way someone proofing a design will focus on the
page elements, white space and text flow instead of getting distracted by the
text.

~~~
sjs382
It's also valuable to test readability with something you can actually read.

------
babuskov
Not very usable. If you really need to support Unicode on your website, you
need to make sure that the proper characters are shown. It does generate filler
text, but would you know that the character set and HTML code page are set up
correctly if you don't know which glyphs have to be shown to the user? To give
an example, this tool might generate the character č, but if the wrong code
page is used (say ISO_8859-1 instead of UTF-8), you would see something like ł
or ć in the browser. And you wouldn't even notice the difference. In some
languages, such a subtle change in one character can turn text into swearing or
just gibberish.
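
A rough way to see what a wrong code page does to one character, sketched in Ruby. `misread` is a hypothetical helper that encodes text one way and decodes the same bytes another way. Which wrong glyph shows up depends on the pair of encodings: UTF-8 bytes read as ISO-8859-1 give two characters, while a mix-up between two single-byte code pages gives the quieter one-for-one swap.

```ruby
# Hypothetical helper: encode `text` in `actual`, then decode the same
# bytes as if they were `assumed`, returning the result as UTF-8.
def misread(text, actual, assumed)
  bytes = text.encode(actual).bytes
  bytes.pack("C*").force_encoding(assumed).encode("UTF-8")
end

c_caron = "\u010D"  # č

p misread(c_caron, "UTF-8", "ISO-8859-1")       # two characters: U+00C4, U+008D
p misread(c_caron, "ISO-8859-2", "ISO-8859-1")  # one character: U+00E8 (è)
```

Comparing the output against a known-good fixed string is what makes the second kind of error detectable at all.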

I'd rather see a website with _fixed_ text that shows some often-used Unicode
characters, one you can validate against (i.e., the text shows exactly the same
on your website). Adding a JPG or PNG picture of what it should look like would
be a plus. By "often used" I mean: Latin with accents, Cyrillic, Greek, Hebrew,
Arabic, Chinese, Japanese, etc. I'm sure a nice "Bacon Ipsum" with one
paragraph from each of these would be quite usable. Some are written
right-to-left, though, so maybe two different sets would be better: one for RTL
and the other for LTR languages.

~~~
Shorel
The idea is that you should only use UTF-8; otherwise, why bother?

------
jrabone
Would be useful to have control over the output encoding (i.e., generate an
octet stream in UTF-8 or UTF-16BE/LE). Generation of surrogate pairs would also
be needed for encodings that support them, since many applications get this
wrong (even in Java, the number of characters (code points) in a
java.lang.String is not necessarily the same as its length(), which counts
UTF-16 code units).
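
To make the surrogate-pair point concrete, here is a small Ruby sketch that encodes one string three ways and compares code points with UTF-16 code units. `utf16_code_units` is a hypothetical helper, and U+1F600 is just a convenient code point above U+FFFF.

```ruby
# U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 needs a
# surrogate pair (D83D DE00) to represent it.
text = "a\u{1F600}"

%w[UTF-8 UTF-16BE UTF-16LE].each do |enc|
  hex = text.encode(enc).bytes.map { |b| format("%02X", b) }.join(" ")
  puts "#{enc}: #{hex}"
end

# Hypothetical helper: count UTF-16 code units, which is what Java's
# String.length() reports.
def utf16_code_units(str)
  str.encode("UTF-16BE").bytesize / 2
end

puts text.length             # 2 code points
puts utf16_code_units(text)  # 3 code units
```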

You could even add options for "badly behaved" UTF-8 (e.g., overlong encodings,
deliberate sync errors, etc.) to stress-test the other end.
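
A sketch of what such malformed input could look like, in Ruby. Both helper names are hypothetical. The first builds an overlong two-byte form of an ASCII character, and the second cuts a multi-byte sequence short. A strict decoder should reject both.

```ruby
# Hypothetical helper: encode an ASCII code point as an overlong two-byte
# sequence (110000xx 10xxxxxx), which a strict decoder must reject.
def overlong_two_byte(codepoint)
  raise ArgumentError, "expects an ASCII code point" unless codepoint < 0x80
  [0xC0 | (codepoint >> 6), 0x80 | (codepoint & 0x3F)]
    .pack("C*").force_encoding("UTF-8")
end

# Hypothetical helper: drop the final byte so a multi-byte sequence is cut short.
def truncated(str)
  str.byteslice(0, str.bytesize - 1)
end

slash = overlong_two_byte("/".ord)   # bytes C0 AF
cut   = truncated("\u00E9")          # lone lead byte C3

puts slash.valid_encoding?   # false
puts cut.valid_encoding?     # false
```

Feeding strings like these to the receiving end shows whether it rejects them, replaces them, or silently decodes the overlong form back to "/".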

------
mhansen
Feature request: RTL. :)

~~~
gioele
Plus RTL inside LTR and LTR inside RTL. Have a look at Wikipedia's pages on
Arabic dishes for good examples.

------
rthprog
Link to the GitHub repo:
[https://github.com/gvanderploeg/unicode-gen/tree/master/src/...](https://github.com/gvanderploeg/unicode-gen/tree/master/src/main/webapp)

------
Groxx
Do random accents help verify that you're handling Unicode correctly? You
don't know what it's supposed to look like, so how do you know you're
displaying it correctly?

~~~
rogerbinns
Did you try it? For example "Internationalization" becomes
"Iｎｔèｒｎáｔìｏｎàｌïｚâｔｉòｎ". If your code did anything bad to the Unicode (e.g.,
assumed every code point is one byte, or stripped the high bit) then it would
be very obvious.
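
Both failure modes are easy to reproduce, which is roughly why the damage is so visible. A minimal Ruby sketch, with `strip_high_bit` as a hypothetical helper standing in for a 7-bit-only code path:

```ruby
# Hypothetical helper: clear the top bit of every byte, as a 7-bit-only
# channel would.
def strip_high_bit(str)
  str.bytes.map { |b| b & 0x7F }.pack("C*").force_encoding("US-ASCII")
end

word = "caf\u00E9"   # "café"; é is the two bytes C3 A9 in UTF-8

puts word.length           # 4 code points
puts word.bytesize         # 5 bytes
puts strip_high_bit(word)  # cafC)

# "One byte per character" truncation: cutting at byte 4 splits é in half.
puts word.byteslice(0, 4).valid_encoding?   # false
```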

For my testing I also go to the Wikipedia home page and copy the text in the
middle of the page that lists how many articles there are in various
languages. This is great because it uses a wide variety of code points,
including ones greater than 0xffff.

~~~
Groxx
I also see that "Iｎｔèｒｎáｔìｏｎàｌïｚâｔｉòｎ" breaks in the middle, instead of
considering it a word. Is that correct? It may be important for checking your
layout. There are also no Asian scripts, nor taller / shorter characters that
might overlap if your line spacing is too small.

------
nilved
Is anybody else as tired of seeing Bootstrap as I am? I digress, though; it's
a great project. :)

~~~
lewispb
It does allow for variety but typically developers just go for the defaults. I
like <http://bootswatch.com/> though.

~~~
ihsw
It's really depressing how many websites are still stuck on Bootstrap 1.4.

