Hacker News
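Smjörið er brætt og hveitið smátt og smátt hrært út í það, þangað til það er gengið upp í smjörið. [The butter is melted and the flour is stirred into it little by little, until it has blended into the butter.]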
125 points by pg on Feb 7, 2008 | 154 comments
Thanks to a fix by Patrick Collison, utf-8 now seems to work right.

So. A note to all the "unicode makes this unusable" people -

Apparently, while you were complaining, someone else was solving.

Thanks, grandma.

OK. Now how about database access (with support for prepared statements), regular expressions, and networking?

Super-duper. I'm not going to use Arc for anything serious until essential libraries are in place.

Different people have different ideas of serious. To me, exploratory programming is fairly serious, because that's the kind of programming that generates ideas.

Arc is already capable of supporting some subset of applications that are serious in your sense. News.YC is at least moderately serious in that sense.

You said News.YC uses some kind of persistent hash structure for storing everything. This seems to me like Greenspun's Tenth Law except with Berkeley DB instead of Common Lisp; I wouldn't want to write the logic to do what BDB already does much better and faster (I don't want to implement ACID transactions myself if I decide I need them).

news.arc stores all information in flat files as lists.
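For anyone wondering what "flat files as lists" looks like in practice, here is a rough Python analogue. It assumes one file per item holding a single list literal; `save_item` and `load_item` are hypothetical helpers, and `ast.literal_eval` stands in for Lisp's `read`. This is not news.arc's actual on-disk format.

```python
import ast
import os
import tempfile

# Hypothetical helpers: one flat file per item, each holding one list literal.
def save_item(directory: str, item_id: int, fields: list) -> None:
    path = os.path.join(directory, str(item_id))
    with open(path, "w", encoding="utf-8") as f:
        f.write(repr(fields))  # like Lisp's `write`: the data is its own syntax

def load_item(directory: str, item_id: int) -> list:
    path = os.path.join(directory, str(item_id))
    with open(path, encoding="utf-8") as f:
        return ast.literal_eval(f.read())  # like Lisp's `read`, minus code execution

with tempfile.TemporaryDirectory() as store:
    save_item(store, 1, [["by", "pg"], ["score", 125], ["title", "Smjörið er brætt"]])
    item = load_item(store, 1)  # the same nested list comes back
```

The appeal is that there is no schema or query layer to write. The cost is the one raised above: nothing here provides transactions or crash safety, so a crash in the middle of `f.write` can leave a truncated file.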


At least you were kind enough to let us know.

Well you'll be late to the table for Web 3.0 then... ;-)

clarification: Arc should probably have some kind of database hook at some point, not much point in reinventing that wheel when it works so well for many problems. But that's not the point, as the author has repeatedly pointed out.

Which of those are you planning to contribute?

I'd love to see regular expressions as s-expressions in arc - not pasted on as interpreted strings.

Actually I think I have seen a library for MzScheme that does it this way.

Also curious how this would look

It could look like this: (* (+ "a" (* "b"))) for: "(a+b*)*", i.e. (a|b*)* in Perl syntax [Perhaps you prefer not to re-use + and * for this.]

Sure, it's somewhat more verbose for this toy example, but every intermediate expression (and the whole thing) is a Lisp object in its own right.

You would not even need macros.

Edit: There is a place for macros here to make things less verbose. Just write a macro that 'compiles': (* (+ a (* b))) into the form above.

For convenience you can offer a function that builds RegExps out of strings in the usual way.
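Here is a minimal Python sketch of the regexes-as-data idea. It assumes a tuple-based representation where a plain string is a literal, `("*", r)` is Kleene star, `("+", r1, r2, ...)` is alternation (the formal-language meaning of `+`), and a hypothetical `"seq"` operator concatenates. `to_pattern` is an illustrative name, not an Arc or MzScheme function. It plays the role of the "compiling" step, and since the expressions are plain data, an ordinary function is enough.

```python
import re

# Hypothetical regex-as-data representation (not an Arc or MzScheme API):
#   "text"              a literal string
#   ("*", r)            zero or more repetitions of r (Kleene star)
#   ("+", r1, r2, ...)  alternation: r1 or r2 or ... (formal-language "+")
#   ("seq", r1, ...)    concatenation: r1 followed by the rest
def to_pattern(rx) -> str:
    """Compile a nested regex expression into an ordinary `re` pattern string."""
    if isinstance(rx, str):
        return re.escape(rx)
    op, *args = rx
    if op == "*":
        (inner,) = args
        return "(?:" + to_pattern(inner) + ")*"
    if op == "+":
        return "(?:" + "|".join(to_pattern(a) for a in args) + ")"
    if op == "seq":
        return "".join(to_pattern(a) for a in args)
    raise ValueError(f"unknown operator: {op!r}")

# (* (+ "a" (* "b"))) from the comment, written as nested tuples.
expr = ("*", ("+", "a", ("*", "b")))
pattern = re.compile(to_pattern(expr))

print(to_pattern(expr))                  # (?:(?:a|(?:b)*))*
print(bool(pattern.fullmatch("abba")))   # True
print(bool(pattern.fullmatch("abc")))    # False
```

Because each sub-expression is plain data, `expr[1]` can be inspected, reused, or compiled on its own. Under this reading the example denotes `(a|b*)*` in Perl syntax.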

There is a public git repo anyone can push to (the so-called git "wiki", or anarki). It contains regular expressions. You could wrap some DB bindings for mzscheme with relative ease.

I think the complaints were more about how pg was originally saying that he intended to never support Unicode. That said, people should realize that UTF-8 encoding/decoding is the zeroth step to internationalization with Unicode.

pg didn't say that. people just made it up.

He did however say:

"It's not for everyone. In fact, Arc embodies just about every form of political incorrectness possible in a programming language. It doesn't have strong typing, or even type declarations; it uses overlays on hash tables instead of conventional objects; its macros are unhygienic; it doesn't distinguish between falsity and the empty list, or between form and content in web pages; it doesn't have modules or any predefined form of encapsulation except closures; it doesn't support any character sets except ascii. Such things may have their uses, but there's also a place for a language that skips them, just as there is a place in architecture for markers as well as laser printers." (http://arclanguage.com/)

I certainly interpreted this as ASCII-only (along with presentational markup) was an explicit design decision, and I don't think this is a far-fetched interpretation. Luckily PG has clarified (http://news.ycombinator.com/item?id=111189) that this is not really what he meant, and now everyone is happy and has regained trust in Arc!

What kind of site does it make YC if this is how to get karma?

there is a very different pattern of karma for comments in "big" threads.

On reflection: your reply is ♫♪♫ to my ears. I ♥ good points...

Edit: heh, "thanks grandma" has even more than that.

Edit: btw it's funny that i'm still being upmodded

so someone downmodded all of:

- a simple, popular comment

- a comment suggesting maybe the first one shouldn't have been upmodded so much

- a comment backing off

Why would you downmod all of them? If you dislike the first comment, you should like the second, and vice versa. And if you're unhappy with me in general for the first two, you should like the third one.

1) How do you know the same person downmodded the three?

2) Simplest explanation for that case would be that he's downmodding the latter two as noise. You can agree to a comment and still believe it doesn't add anything to the topic in which it appears.

I downmodded because the discussion was getting too meta :P

all 3 downmods appeared between quick refreshes.

♫♪♫ to my ears. I ♥ unicode! To ∞ and beyond, ☺

How do I make the infinity? Actually where do I get all those symbols?

You have to buy a unicode keyboard. They have over 95000 keys and take up about nine square meters. Hope you have a big desk.

(Sorry, I couldn't resist)

You know, that would actually be an amusing hardware-hacking project.

If you made a keyboard that had every character in every language spoken in the EU, you could even file to make it a standard with whatever earnest standards body is in charge of such things. No linguistic minority should have to use control keys! It would be like giving peanut butter to a dog.

A smart bureaucrat might mandate dynamically rewritable keys and get the Optimus http://www.artlebedev.com/everything/optimus/

But smart people don't endeavor to regulate minutiae.

I think you just implied that all politicians are dumb. I agree.

I used Windows XP's Character Map accessory for the ♫♪ and ♥. I think I copied and pasted the ∞ and ☺ from http://www.bigbaer.com/sidebars/entities/ .

http://www.unicode.org/charts/ (characters)

http://www.unicode.org/charts/symbols.html (geometrical shapes etc)

As I understand only UTF-8 is supported, so forget characters for UTF-16 and UTF-32 (but you rarely need these anyway)

UTF-8 can represent any Unicode character.

Yeah, I confused that with ISO 10646 (which has three implementation levels, and not all levels support the same characters) and with UCS-2, which is essentially UTF-16 limited to the Basic Multilingual Plane. So what I meant is that if he used any characters encoded in UTF-16, he'd essentially have a problem, although certain octet values never appear in UTF-8. UTF-8 and UTF-16 were not yet part of the standard before UCS version 2.0. It's been a while since I got updated, I guess.
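A quick Python sketch to check the claim that UTF-8 can represent any Unicode character, along with the UCS-2 vs UTF-16 distinction. Every code point up to U+10FFFF round-trips through UTF-8 in one to four bytes. UTF-16 needs a surrogate pair above U+FFFF, which is exactly what UCS-2 lacks.

```python
# Code points from 1-byte ASCII up to the last code point Unicode allows.
samples = [0x41, 0xE9, 0x221E, 0x1F600, 0x10FFFF]

for cp in samples:
    ch = chr(cp)
    encoded = ch.encode("utf-8")
    assert encoded.decode("utf-8") == ch  # lossless round trip
    print(f"U+{cp:04X}: {len(encoded)} UTF-8 byte(s), {encoded.hex()}")

# UTF-16 covers the same range but needs a surrogate pair above U+FFFF;
# UCS-2 (a fixed 2 bytes per character) cannot express U+1F600 at all.
emoji_utf16 = chr(0x1F600).encode("utf-16-be")
print(emoji_utf16.hex())  # d83dde00
```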

OK, so I see infinity is U+221E (http://www.fileformat.info/info/unicode/char/221e/index.htm). Now how do I go from that to posting it on a forum like this?

Well, I went ahead and made this tool (http://utilitymill.com/utility/Display_Unicode_Char_From_Hex)

to answer my own question.

If you have a Mac, Cmd-Opt-T.
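Here is a small Python sketch for going from `U+221E` notation to something you can paste. It produces the character itself, an HTML numeric character reference, and its UTF-8 bytes. `char_from_notation` and `html_reference` are illustrative helper names, not library functions.

```python
def char_from_notation(notation: str) -> str:
    """Turn 'U+221E'-style notation into the character it names."""
    if not notation.upper().startswith("U+"):
        raise ValueError(f"expected U+XXXX notation, got {notation!r}")
    return chr(int(notation[2:], 16))

def html_reference(ch: str) -> str:
    """Numeric character reference that an HTML page can embed."""
    return f"&#x{ord(ch):X};"

inf = char_from_notation("U+221E")
print(html_reference(inf))        # &#x221E;
print(inf.encode("utf-8").hex())  # e2889e
```

The character can then be copied to the clipboard and pasted, which only works once the receiving site handles UTF-8 correctly.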

Do I understand correctly that Arc strings are sequences of octets?

If so: I really don't want to be a negativity guy but it seems like every language that has made an 8-bit string the default string type has regretted it later because it is so painful to change it without breaking code. Okay, Paul says that he won't mind breaking code. Maybe he means it, but it doesn't make any sense to me to knowingly and consciously repeat a design mistake that dozens of other people have made and regretted.

It really just takes one day to get this right. You need to distinguish between the raw bytes read from a device and the true string type (which needs to be 21 bits or wider). You need a trivial converter from one to the other (which you can presumably steal from MzScheme) and back.

That's it. You get this right at the beginning and you never have to backtrack or break code.

My apologies in advance if this post is based on incorrect premises. I'm trying to help.
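The split this comment describes matches what Python 3 does with `bytes` and `str`: raw bytes at the device boundary, a code-point string type everywhere else, and converters in between. Here is a minimal sketch in which an in-memory `io.BytesIO` stands in for the device.

```python
import io

# The highest code point, U+10FFFF, needs 21 bits: hence "21 bits or wider".
MAX_CODE_POINT = 0x10FFFF
print(MAX_CODE_POINT.bit_length())  # 21

raw = "Smjörið".encode("utf-8")  # octets, as a file or socket delivers them

# Decode once, at the I/O boundary...
text = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8").read()

# ...then work in code points, and encode again only on the way out.
print(len(raw), len(text))  # 9 7
assert text.encode("utf-8") == raw
```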

Arc snarfs the string implementation from MzScheme, which supports Unicode in The Right Way, as code points rather than octets.

So should I infer that the only reason UTF-8 is mentioned is that the reader APIs do not let you select the codec? Or is even that provided in which case it is accurate to say that Arc supports Unicode-in-general?

Arc uses MzScheme's reader (it modifies the readtable slightly to support []-syntax). AFAIK you cannot access the reader API from inside Arc. The reason UTF-8 is mentioned is that it is the default encoding when MzScheme reads or writes files or streams.

I don't think anyone at this point would claim that Arc supports unicode-in-general.

Could you offer a better solution? What would your solution offer that octets do not? Random character access? No, because no Unicode encoding offers easy random access to characters (a character may consist of several code points, and in some encodings each code point may in turn take more than one basic code unit). Glyph, word and sentence segmentation? I guess not.
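The comment has a point about what "character" means. Even a code-point string does not give random access to user-perceived characters. A short Python sketch using the standard `unicodedata` module shows this:

```python
import unicodedata

decomposed = "e\u0301"   # 'e' + COMBINING ACUTE ACCENT, displayed as one é
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, also é

# Same visible character, different code point and byte counts.
print(len(decomposed), len(precomposed))                                   # 2 1
print(len(decomposed.encode("utf-8")), len(precomposed.encode("utf-8")))  # 3 2

# Indexing by code point can land on a bare combining mark.
print(unicodedata.name(decomposed[1]))  # COMBINING ACUTE ACCENT

# Normalization (here NFC) makes the two spellings compare equal.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

Segmenting text into user-perceived characters (grapheme clusters) is a separate problem, covered by Unicode Standard Annex #29.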

Abstraction. Having Unicode strings (i.e. strings that are a sequence of Unicode code points rather than octets) allows you to work with strings without worrying about encoding (except when doing IO, which is where encoding matters).

If you treat strings as octets OTOH even simple operations like concatenating two strings might lead to headache if the strings are in two different encodings. And how do you keep track of the encodings of individual strings? Madness lies down that road.
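Here is a small Python sketch of the concatenation problem. Octets from two different encodings glued together either fail to decode or decode silently into mojibake. Decoding each piece at its own boundary works.

```python
utf8_part = "Smjör".encode("utf-8")    # b'Smj\xc3\xb6r'
latin1_part = "ið".encode("latin-1")   # b'i\xf0'

mixed = utf8_part + latin1_part         # byte concatenation "works"...

try:
    mixed.decode("utf-8")               # ...but the trailing 0xF0 is invalid here
except UnicodeDecodeError as err:
    print("utf-8:", err.reason)

print("latin-1:", mixed.decode("latin-1"))  # no error, wrong text: SmjÃ¶rið

# Decoding each piece with its own codec first gives the intended string.
joined = utf8_part.decode("utf-8") + latin1_part.decode("latin-1")
print(joined)  # Smjörið
```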

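.(; sɹǝpuoʍ sǝop ǝɹnssǝɹd ɔı1qnd ɟo ʇıq ǝ1ʇʇı1 ɐ 'ǝǝs ¡ʍou ǝɯosǝʍɐ sı ɔɹɐ uı ʇɹoddns ǝpoɔıun ¡ɥɐɥ [Upside down: "hah! unicode support in arc is awesome now! see, a little bit of public pressure does wonders ;)"]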

Yes, because writing upside down is so incredibly useful! How did we ever live without it?


Nâh, dâh zèn we maui klâh mei...

    (define Y
      (λ (m)
        ((λ (f) (m (λ (a) ((f f) a))))
         (λ (f) (m (λ (a) ((f f) a)))))))

Great... but that's Scheme, not Arc.

oh yes.

Make λ an alias of fn, and have it replace automatically in whatever editor you use?

fn is fast to write, but λ is much more readable, 'cos it stands out.
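For anyone who doesn't read Scheme, the Y combinator above transliterates directly into Python lambdas. Strictly speaking it is the applicative-order variant, sometimes called Z, because the extra `(λ (a) ((f f) a))` wrapper delays self-application.

```python
# Direct transliteration of the Scheme definition of Y above.
Y = lambda m: (lambda f: m(lambda a: f(f)(a)))(lambda f: m(lambda a: f(f)(a)))

# m receives "the function being defined" as rec and returns one step of it.
fact = Y(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))

print(fact(5))  # 120
```

Without the `lambda a:` wrapper, an eagerly evaluated language like Python or Scheme would recurse forever while computing `f(f)`.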

Wikipedia to the rescue: "The butter is melted and the flour stirred into it (slowly but surely), until it is has blended with the butter." (http://en.wikipedia.org/wiki/User_talk:S.Örvarr.S)

It's an Icelandic recipe for roux?

What language is that? I'm guessing Icelandic; it's a little too unicodey to be Danish or Norwegian, but the words look similar.

Good guess.

As an Icelander, I must surely ask what pushed you to use Icelandic as an example. :)

That it would look foreign to the maximum number of people.


People love thorn. Anglos used to have it and now that it's gone, they miss it.

I don't þink ðere's any reason we can't have ðem back -- þorn and eð, I mean.

Does Icelandic have the 'th' sound? I've heard that English is the only European language with it, but if Icelandic has the written thorn, maybe you have that sound too?

It does, that's precisely what þ and ð represent (the unvoiced and voiced variants respectively, which got folded into the same "th" in English). Also, þorn is the best letter name ever :)

Spanish seems to have 'th' as well

农历新年 [Lunar New Year] Happy (Chinese) New Year!

I'll never understand why Asians type all in question marks. It must be some kind of unary system.


If you are wondering: On Linux/X11, there's Ctrl+Shift+[Unicode number in hexadecimal], gnome-character-map, umap or KCharMap (ت)

And now for the less serious part:

ሞሡሢ Am I the only one whom these Ethiopic characters remind of Tengwar? BTW, are there Unicode chars for Tengwar? I think there should be! (But not for Klingon, because it sucks.) I have fun writing this on my ⌨, but ℐ∫ ᚾℍℹ⑀ not pointless? Who cares? Anyway, now we can use distinct characters for Roman numerals: Ⅰ,Ⅱ,Ⅲ,Ⅳ,Ⅴ,Ⅵ,Ⅶ,Ⅷ,Ⅹ,Ⅻ,Ⅽ,Ⅿ! Ye darn kids! Everything we had was 7-bit ASCII, without parity, and we were damn grateful for it! You think you had it bad? I had to use Morse code for browsing porn, back in my day! And I had to etch my public key into the wall of a rotten ol' cave! We did not have this fancy-shmancy routed network, I had to remember the way from here to there all by myself!

--- this post was presented to you by Too Much Coffee.

Freude, schöner Götterfunken! Tochter aus Elysium! [Joy, beautiful spark of the gods, daughter of Elysium!]

if you search for "Smjörið er brætt og hveitið smátt og smátt hrært út í það, þangað til það er gengið upp í smjörið." on Google this thread is the fourth result.

Damn fast...

இது தமிழ [This is Tamil.]

Tamil++; // (இது C) [the comment reads "this is C"]



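किसी वस्तु, व्यक्ति, स्थान, या भावना का नाम बताने वाले शब्द को संज्ञा कहते हैं। जैसे - गोविन्द, हिमालय, वाराणसी, त्याग आदि संज्ञा में तीन शब्द-रूप हो सकते हैं -- प्रत्यक्ष रूप, अप्रत्यक्ष रूप और संबोधन रूप । [A word that names a thing, person, place, or feeling is called a noun. For example: Govind, Himalaya, Varanasi, tyag (renunciation), etc. A noun can take three forms: the direct form, the oblique form, and the vocative form.]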


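В самом деле браво! [Bravo, indeed!]

Чему вы рады? :) [What are you so happy about? :)]

Да просто... [Oh, just because...]

Делать нечево... [Nothing better to do...]

Мы все-таки на этом веб-сайте находимся, а не работаем... :~) [After all, we're hanging out on this website instead of working... :~)]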

Happy Chinese New Year. 白人看不懂 [White people can't read this.]


И цан’т белиеве ит! [English "I can't believe it!" transliterated into Cyrillic]

И тоо! [Me too!]

[that 'ts' should be a 'k']

Røv og nøgler! [Danish; literally "Butt and keys!"] PG succumbs to the demands of political correctness! Will we soon see mandatory static type declarations and CSS in Arc?

Well, not quite. I gave Patrick an early version of the code, a couple weeks before Arc was released, and he immediately sent me this fix. I just didn't get around to incorporating it till now.

There's a difference between things I don't care about, and things I'm actively against. I don't care about character sets and css, so those things will no doubt gradually get better.

Classic static typing, however, I think is actually a bad idea in a general-purpose language. It makes languages weaker. So it's never likely to happen in Arc itself. However, one of the explicit goals of Arc is to be a good language for writing other languages on top of, and I can imagine plenty of languages for specific types of problems (e.g. circuit design) in which static typing would be a good idea.

It's not true that static typing always makes languages weaker. It makes map more powerful, for example: the desired type of output sequence can be inferred rather than having to supply a first argument of the same type like in Arc.

I used to agree with you, by the way -- static typing in most languages feels like a straitjacket. ML wasn't enough to change my mind. It took Haskell.

Interestingly, heterogeneous lists are the only example I ever hear cited for how ML-family type systems can cramp your style. It leads me to wonder if the situation is not unlike Fibonacci sequences and naive recursion.

Anyway, I find that usually when I want a heterogeneous list in Lisp, all I really need is a tuple. I want an ad-hoc way to group some values together (i.e., I don't want to bother creating a named structure), but I generally know the type I want in each position.

In the rare situations where I really do want a heterogeneous list, Haskell does make it possible. The standard library has a Dynamic type that stores an arbitrary object along with a first-class manifest type identifier. These type identifiers have to be generated at compile time, but GHC has built-in syntax for this, and if it didn't it could still be implemented as a Template Haskell macro, or failing even that, just done by hand once for each user-defined type. That's all the support that's necessary from the core language -- the rest of the dynamic typing system is just an ordinary library.

Now, granted, if you wanted to use manifest typing for everything in Haskell, it would be ridiculously cumbersome and you'd be much better off just using a dynamic language[1]. But if you use it only where it's needed, then the dynamic casts will bloat your program by a couple symbols per thousand lines, and in return you get programs that damn near always work the first time they compile, along with a few other niceties like the one I mentioned above with map.

[1] There are plenty of cases where the converse is true. To name an obvious one, you could write a set of Lisp macros to implement lazy evaluation. But if you wanted to use them everywhere, you'd be much better off in Haskell.
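Haskell's `Data.Dynamic` exposes this as `toDyn` and `fromDynamic :: Typeable a => Dynamic -> Maybe a`. The Python sketch below only mimics the shape of that interface: a value paired with a type tag, plus a checked cast that returns nothing on a mismatch. `Dynamic`, `to_dyn`, and `from_dyn` here are hypothetical names. Python keeps runtime types anyway, so this says nothing about how GHC implements the real thing.

```python
from dataclasses import dataclass
from typing import Any, Optional, Type, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class Dynamic:
    """A value packed together with a first-class tag naming its type."""
    value: Any
    tag: type

def to_dyn(x: Any) -> Dynamic:
    # Record the exact runtime type alongside the value.
    return Dynamic(x, type(x))

def from_dyn(d: Dynamic, t: Type[T]) -> Optional[T]:
    # A checked cast: succeed only when the stored tag matches exactly.
    return d.value if d.tag is t else None

# A "heterogeneous" list is just a homogeneous list of Dynamic values.
stuff = [to_dyn(5), to_dyn("five"), to_dyn(5.0)]
ints = [n for n in (from_dyn(d, int) for d in stuff) if n is not None]
print(ints)  # [5]
```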

I rarely need to calculate Fibonacci numbers, but I use lists of varied types of objects constantly.

So do I -- when I'm working in Lisp. Lisp gives you a Swiss army knife and lets you build specialized tools when you want them. Haskell gives you specialized tools and lets you build a Swiss army knife if you want it.

To be clear, this is what ML's variant types are all about. You can easily create a list that contains e.g. both ints and chars:

  let mylist = [`Int 5; `Char '5']

Technically the elements have the same compile-time type, but the question is, what practical difference does that make? In what cases are variants an inadequate solution?

I don't understand why CSS or HTML are being mentioned during the design of Arc. These seem like library issues and your announcement of Arc was spoiled IMHO by the "rant" about HTML and tables. This is only made worse by the Arc Challenge which seems to be more about the design of libraries for HTML/HTTP etc. than the language.

What am I missing?

If your language doesn't support anything but toy apps it quickly evolves to be optimized for building toys.

If the first Arc apps had not been full-featured Web apps, but had instead looked like examples from SICP, everyone would be complaining that the language was only good for computing Fibonacci sequences and writing interpreters for itself.

OTOH, you can't expect a new language to immediately offer the library resources of, say, Perl.

So the plan for Arc's early days seems to be similar to what the Pragmatic Programmer guys called the "tracer bullet" approach:

Tracer code is not disposable: you write it for keeps. It contains all the error checking, structuring, documentation, and self-checking that any piece of production code has. It simply is not fully functional. However, once you have achieved an end-to-end connection among the components of your system, you can check how close to the target you are, adjusting if necessary. Once you're on target, adding functionality is easy.

On day zero, Arc let you construct and deploy every aspect of a useful software system (a web app)... but it took a very narrow and direct path to that goal: emphasis on tables, no Unicode support, borrowing some functionality from an existing Scheme environment, etc., etc. That is what PG was trying to convey in his announcement: the strategic plan for Arc's early days is to work on designing a complete skeleton, but not add a lot of flesh.

I could tell from all the people already dissing Arc before it was released that whatever I released was going to be attacked on any possible pretense. So, like someone bracing himself to be hit, that was what I was thinking about as I was about to release it: what are people going to seize upon as a way of attacking it? Which meant that was what much of the initial announcement ended up being about.

It was a pretty odd situation to be in. If I'd been releasing Arc into a neutral environment, I probably would have said what I wrote in http://paulgraham.com/core.html. But maybe it's just as well I gave all the flames something to expend themselves on before talking about subtler questions.

I thought you handled it pretty well. Basically, you wrote a big sign saying "here is the bike shed", to make sure bike-shed commenters had something to occupy them. :)

Actually, it's probably beneficial to encourage flames. Your users are hackers, and flamewars are the only form of public dialogue among hackers. Ergo...

Fair enough, and after all: "Real programmers ship"

I doubt anyone would have commented on the tables thing if pg hadn't made a big deal out of it.

people did view source on hacker news, saw tables, and brought it up. pg didn't make a big deal out of it first.

Bringing it up about the website is a separate issue from bringing it up about the language.

By "Classic static typing", do you mean C++/Java-style static typing, or does it include Haskell/ML-style type inference as well?

I mean the kind that will not let you create a list whose elements could be of any type.

Java will happily let you fill a list or hash with objects that could be of any type yet it is generally the quintessential blub language.

No, it doesn't. Elements of a Java collection must all be of the same type. The elements may be implicitly coerced to a common supertype, but if you want to get the original types back you have to downcast--which is basically explicit dynamic typing.

You can always downcast in the presence of an instanceof case statement:

   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;
   // Bar and the do*Thing methods are placeholders.
   List<Object> myList = new ArrayList<Object>(Arrays.asList("foo", 42, new Bar()));
   for (Object elem : myList) {
      if (elem instanceof String) doStringThing((String) elem);
      else if (elem instanceof Number) doNumberThing((Number) elem);
      else if (elem instanceof Bar) doBarThing((Bar) elem);
      else doObjectThing(elem);
   }

Or you could keep elements as Objects until you needed to perform a specific operation on them, then cast at the site and perform the operation, letting the ClassCastException propagate if you're wrong. This is basically what Arc does.

Have you looked into optional static typing (e.g. in the style of ECMAScript 4, "ES4")? The default Array type in ES4 accepts elements of any type as well. It is a satisfying middle ground.

Thanks for the serious reply. I'm not sure I deserved it :) I appreciate your clarification of the distinction which is not clear from the "manifesto" (http://arclanguage.com/).

I suspected you deliberately mentioned ASCII-only and presentational markup because you knew it would tick off (and hopefully scare away) a certain type of perfectionist which you consider non-productive for explorative hacking.

Ååååh another Dane on the ropes :-)

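Öt szép szűzlány őrült írót nyúz. Egy hűtlen vejét fülöncsípő, dühös mexikói úr Wesselényinél mázol Quitóban. [Five pretty maidens flay a mad writer. An angry Mexican gentleman, grabbing his unfaithful son-in-law by the ear, is painting at Wesselényi's in Quito.]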

Bork bork bork!

Η ευχαρίστηση στην εργασία βάζει την τελειότητα στην εργασία ~ kudos to 'Patrick Collison'


Indeed. Aristotle in fact ... translates roughly as "... Pleasure in the job puts perfection in the work ..."

What needed to be changed? I am no character encoding guru but I thought that treating strings as opaque octet sequences was good enough to "support" UTF-8. i.e. Unless you actively break it, it should work by default.
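The thread doesn't say what Patrick Collison's fix changed. As one illustration of how treating UTF-8 as opaque octets can still break, consider truncating text to a byte limit, as a site might do for long titles (a hypothetical scenario). A naive cut can split a multi-byte character. `truncate_utf8`, a hypothetical helper sketched here in Python, avoids that by backing up to a character boundary.

```python
def truncate_utf8(raw: bytes, limit: int) -> bytes:
    """Cut UTF-8 bytes to at most `limit` bytes without splitting a character."""
    if limit >= len(raw):
        return raw
    cut = limit
    # Continuation bytes look like 0b10xxxxxx; back up until raw[cut] starts a character.
    while cut > 0 and (raw[cut] & 0xC0) == 0x80:
        cut -= 1
    return raw[:cut]

raw = "Smjörið".encode("utf-8")   # b'Smj\xc3\xb6ri\xc3\xb0'
naive = raw[:4]                   # splits the two-byte 'ö'
print(naive.decode("utf-8", errors="replace"))  # 'Smj' plus U+FFFD
print(truncate_utf8(raw, 4).decode("utf-8"))    # Smj
print(truncate_utf8(raw, 5).decode("utf-8"))    # Smjö
```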





geeks. you guys forget about us sometimes. what is this?

Hacker News now supports "utf-8", a way of storing text that supports characters from languages other than English.

The excitement comes from the fact that Hacker News is programmed in Arc, and the change to Hacker News implies that Arc will soon support utf-8 too.

thank you sir





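To je výborný! [That's excellent!]

Zdravím do Čech :) [Greetings to Bohemia :)]

Byl jsem narozeny cech, spravne prazak, ale ted jsem american. Prominte, muj pocitac nema hacky a carky, a moje cesina je detska. [I was born Czech, a proper Praguer, but now I'm American. Sorry, my computer has no háčeks or čárkas (Czech accent marks), and my Czech is a child's.]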



testing some pitfalls... U+005C \ U+FF3C ＼ U+FFE5 ￥

U+007E ~ U+301C 〜 U+FF5E ～
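Those code points are classic pitfalls because they look alike and some software folds them together. The Python sketch below uses the standard `unicodedata` module to show what NFKC normalization, which some systems apply to input, does to each one. The fullwidth forms collapse onto their ASCII or Latin-1 counterparts, while WAVE DASH has no compatibility mapping and survives.

```python
import unicodedata

pitfalls = ["\u005c", "\uff3c", "\uffe5", "\u007e", "\u301c", "\uff5e"]

for ch in pitfalls:
    folded = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch):<26} -> U+{ord(folded):04X}")
```

Another reason these are pitfalls: in Japanese Shift_JIS contexts, byte 0x5C is often displayed as a yen sign, so backslash and ¥ are easily confused.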


heh fantastic. I take back all that I said. Nice one ☺

someone from India --> संगणक प्रणाली - आर्क [computer system - Arc]

Хубаво. Фонта само е някакъв преебан. [Nice. The font is just kind of screwed up, though.]

það er gaman að lesa. Takk fyrir! [It's fun to read. Thanks!]

Bona novaĵo! [Good news!]


يعيش اليونيكود, شكراً بول. [Long live Unicode, thank you Paul.]

ನಮಸ್ಕಾರ. ಸವಿನುಡಿ ಕನ್ನಡ :) [Greetings. Kannada, the sweet tongue :)]

namaskAra, savinudi kannaDa :)

łąś żółć - testing

unicode and arc are ♫♪♫ to my ears too.. a fun way to play with arc and unicode might be at http://twext.cc/go/arc

Լիսպը փայլուն է։ [Lisp is brilliant.]

but ixnay on the igpay atinlay upportsay

pg, you are my hero.

My name is Daniël. Not sure though if writing that wasn't possible before...

މީމަގޭ މާދަރީ ބަސް. ދިވެހި ބަހަކީ ރީތިބަހެކެވެ. [This is my mother tongue. Dhivehi is a beautiful language.]

This is my language - Dhivehi. Written right to left.

Paul Graham rocks.

ПГ рулз. [PG rules.]
