
Neither the Wikimedia Foundation nor the Wikipedia editors participating in the linked discussion seem to think that the choice of domain name supplier is a community matter.

As far as I can tell, this was a decision of the Wikimedia Foundation, and the resulting action will be taken by the Wikimedia Foundation.

-----


Smalltalk requires explicit labeling of all but the first argument (though they are ordered) for keyword messages. Good naming conventions can make this enjoyable (at least subjectively).

In a hashmap based lisp, I'd expect defun to look something like

(define-function: (foo-with-a: a b: b c: c) with-body: ...)

Of course, since you're using an unordered representation, one could instead say (with-body: ... define-function: (c: c b: b foo-with-a: a)), which is significantly more confusing. This also means you probably can't do implicit progn.
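A minimal Python sketch of the idea (all names here are hypothetical, not part of any real Lisp): when a call is represented as an unordered keyword mapping, argument order stops mattering.

```python
# Sketch: a "keyword message" call represented as an unordered mapping,
# analogous to a hashmap-based Lisp form. All names are invented.

def define_function(signature, body):
    """Build a callable from a set of keyword names and a body function."""
    expected = frozenset(signature)

    def call(**kwargs):
        if frozenset(kwargs) != expected:
            raise TypeError("keywords do not match signature")
        return body(**kwargs)

    return call

# Roughly: (define-function: (foo-with-a: a b: b c: c) with-body: ...)
foo = define_function(("a", "b", "c"), lambda a, b, c: (a, b, c))

# Because the representation is unordered, these two calls are identical:
assert foo(a=1, b=2, c=3) == foo(c=3, b=2, a=1)
```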

progn in general will probably be unpleasant, unfortunately, unless one includes syntactic sugar for ordered sequences, which might defeat the purpose of an associative-array-based Lisp.

The first thing I've noticed about a vector-based Lisp is that you actually need primitive integers so that you can do indexing, which isn't the case with cons-cell based Lisps.

-----


It's non-terminating execution during compilation, similar to the example elsewhere in the comments using nonterminating macro definitions in Common Lisp.

A compilation process that involves executing and waiting for the termination of a program that doesn't terminate can't itself terminate.

-----


It actually says "What you won't learn: Java."

-----


In my defense, this is what shows up on my screen:

  What you will learn
  What you won't learn
  Java.
Weird!

-----


Your defense is that you badly misquoted what shows up on your screen?

That is weird...

-----


Honest question: Is that really what you thought? It seems obvious to me that I was implying a defense that the phrase I quoted also appears on the page, directly next to the phrase ekiru quoted, which makes the mixup a little more understandable, because it means I didn't see something that simply didn't exist. Did you get that and post your reply anyway, or did I just express myself poorly?

-----


> For example, in The Myth of the Rational Voter[1], Caplan describes how it's actually rational behavior for voters to vote irrationally: the benefit of the degree they can sway the election is far smaller than the cost of acquiring sufficient knowledge to determine what candidate would most benefit them.

This is a minor detail, given that that was just an example you were giving; but in case it piqued anyone's interest, I'll correct it. That's not actually the explanation Caplan gives for voters acting irrationally (indeed, there's nothing that even appears irrational about that; it's obviously rational behavior). What Caplan argues is that voters rationally vote irrationally because they have preferences over beliefs, so voting in accordance with false but preferred beliefs carries psychological benefits; if those psychological benefits outweigh the negative effects of the irrational vote, discounted by the low probability of deciding the election, it is rational to vote irrationally.

-----


Almost every Perl 6 implementation implements its parser using Perl 6 regexes (partly because it's much more difficult to properly support the extensible grammar otherwise), which are essentially PEGs.

-----


Alternately, you can implement such context-sensitive features in another pass after the parse.

-----


Linus suggested in the OP that automated reference counting (where the language implementation handles reference counting for the programmer) is preferable to GC.

Python uses such reference counting (although it also has a GC fallback to ensure cyclic structures are collected).

-----


And it is one of the reasons why Python is slow and difficult to scale on multiple cores (the main difficulty by far of removing the GIL is reference counting).

-----


Scaling Python on multiple cores is easy: you just run multiple Python processes, one per core, each with its own GIL.

You are correct, though, that ref-counting overhead is one of the main reasons Python is slow.
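A minimal sketch of that shared-nothing approach using the standard multiprocessing module (the workload function here is made up for illustration):

```python
# One Python process per worker, each with its own interpreter and GIL,
# coordinated through the standard multiprocessing module.
from multiprocessing import Pool

def work(n):
    # A stand-in CPU-bound task.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # Inputs are distributed across the worker processes; no state
        # is shared, so the GIL never becomes a bottleneck.
        results = pool.map(work, [10, 100, 1000])
    print(results)
```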

-----


Is the ref counting slow because it has to be a protected operation, or is it just the sheer number of operations? If it's due to locking, there are lock-free ref counting systems (as described here: http://www.google.com/url?sa=t&source=web&cd=1&v...) that essentially allow refcount operations to accumulate in per-thread counters that are only reconciled occasionally. Assuming you design your data structures correctly (no false sharing), you can get very high write performance, because all your counters are cached and there is no cache invalidation due to competing writes on other processors. Of course, it takes more memory, so it is only suitable for highly shared objects, but it can make a significant difference in a reference counting system. (The article points to scalability issues in the Linux ref counting system and how to resolve them.)
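A toy Python sketch of that per-thread counter idea (the class name and batch size are invented for illustration; real implementations do this in C, without a GIL, precisely to avoid contended writes):

```python
# Sketch of a "sloppy" counter: each thread bumps a private counter on an
# uncontended cache line, and private counts are merged into the shared
# total only occasionally, in batches.
import threading

class SloppyCounter:
    def __init__(self):
        self.global_count = 0
        self.lock = threading.Lock()
        self.local = threading.local()   # per-thread private storage

    def incr(self):
        # Fast path: no lock, no shared write, no cache invalidation.
        self.local.count = getattr(self.local, "count", 0) + 1
        if self.local.count >= 64:       # reconcile in batches of 64
            self.flush()

    def flush(self):
        # Slow path: fold this thread's private count into the total.
        with self.lock:
            self.global_count += getattr(self.local, "count", 0)
        self.local.count = 0

c = SloppyCounter()
for _ in range(200):
    c.incr()
c.flush()                                # drain the remainder
assert c.global_count == 200
```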

-----


That's awesome! I knew there were faster approaches to ref-counting, but I didn't know about that one, "sloppy counters".

The most common approach to speeding up ref-counting is to ref-count bigger objects — modules rather than individual variables, say.

The simplest way to speed up ref-counting transparently is to statically analyze the code and remove redundant increment and decrement operations. This can be tricky in practice, and I haven't heard of anyone actually doing it.

-----


Just for the record: Pramod Joisha did static analysis of redundant reference counting operations in 2006 in the Bartok C# research compiler (http://www.hpl.hp.com/personal/Pramod_Joisha/Publications/is...). IIRC, doing it improves performance significantly.

-----


It seems to me that the best way to do this would be to put the analysis into a tracing JIT. Since the compiler knows exactly what will happen in a long code sequence, removing redundant incs/decs would be fairly trivial.

-----


The problem with most methods of improving ref counting speed is that they generally break existing C extensions. For that reason alone, I don't expect CPython to significantly change its way of doing things in that area for a long time.

-----


The drawback with multiple processes is that they each have all the compiled bytecode on their heap. When I "import nltk" my heap goes from ~12 to ~36 MB. Add a few more dependencies and you end up wasting a non-trivial amount of RAM on python heaps.

-----


You could do the import and then fork the child processes. That should share all the memory used by the nltk module between all the children (for as long as the memory is not modified by anyone, which triggers a copy-on-write allocation of the affected pages).
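A minimal sketch of that import-then-fork pattern on a POSIX system (json stands in here for a heavy dependency like nltk):

```python
# Load heavy modules once in the parent, then fork; the children inherit
# those pages and share them copy-on-write instead of re-importing.
import os
import json   # stand-in for a large dependency such as nltk

pid = os.fork()
if pid == 0:
    # Child process: json's module objects were inherited from the
    # parent's address space, not imported again.
    ok = json.loads('{"ok": true}')["ok"]
    os._exit(0 if ok else 1)
else:
    # Parent: wait for the child and check that it succeeded.
    _, status = os.waitpid(pid, 0)
    assert os.WEXITSTATUS(status) == 0
```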

-----


Most forks do copy-on-write, and every access to a Python object meddles with the reference count, which is - you guessed it - a write.

-----


The context of the copy-on-write win was the bytecode for modules. I don't know that you'd have anybody meddling with the reference counts for that...but I haven't really looked.

-----


Bytecode is stored in function objects, which IIRC are reference-counted.

-----


Aside from ad-hoc classes (defined within the scope of a function, say), does it ever make sense to garbage-collect a module or a class?

Say you do "import smtplib" in your main file. Now that module is imported -- forever. I don't know the internals of Python well enough, but I bet that the module reference has strong references to its contents, so that even if nobody is actually calling anything in smtplib, it will be there in case someone does. The same should be true about modules importing other modules; they stay visible at the scope-level, so they are permanently loaded.

So for those cases it would make sense to keep them separate from global garbage collection. I'm pretty sure that the method tables of all the classes in the system take up considerable space, probably on the order of megabytes for many apps.
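This is easy to observe with the standard library: imported modules live in sys.modules, which keeps a strong reference to each module object, which in turn strongly references its contents.

```python
import sys
import smtplib

# Once imported, a module is cached in sys.modules, so its reference
# count never drops to zero; it is effectively immortal.
assert "smtplib" in sys.modules
assert sys.modules["smtplib"] is smtplib

# The module object holds strong references to everything it defines,
# so its classes and their method tables stay live too:
assert smtplib.SMTP is sys.modules["smtplib"].SMTP
```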

-----


That's a valid point, but you can do things to mitigate this, like splitting up your program into processes that have different dependencies. For instance, if you have a UI process and a core logic process, you won't need to import your giant UI library twice.

-----


If you split the code horizontally instead of vertically you have to deal with IPC, which could be problematic (and slow). I would do that only if there was already a clearly defined interface between the modules, not just for the sake of concurrency. Moreover, taking into account growing number of cores in modern machines such strategy doesn't seem very future-proof.

-----


Or you could just use a language whose implementation doesn't blow.

-----


And suffer from 5x slower development time. Yeah your deadline now has to be extended from 2012 to 2015 but at least the language is FAST and is technically excellent, right?

-----


False dichotomy...

-----


Thank you for making this point. To my great dismay, I see more and more people who think that you need concurrent threads in one process in order to scale. Many programmers seem to take this as scripture and subsequently have difficulty thinking about distributed systems (not just multi-core, but multi-host).

-----


This is a bit like saying, "Of course C has full type safety. Just use a Haskell implementation written in C." Both your statement and his are technically accurate, but yours is on a subtly different topic.

-----


How is it a bit like that? He complained that Python didn't scale to lots of cores because of the GIL; I said his Python didn't scale because of threading, which interacts badly with the GIL. If you use a shared-nothing approach to parallelizing your Python code, you don't run into that problem. It's not as if shared-state threading happens magically — you have to rewrite your code to use it, too.

-----


You are playing with words here: of course you can run multiple Python instances to scale your application across multiple cores; that's a trivial statement. But I was talking about Python the interpreter (more exactly, CPython). There are legitimate cases where multi-threading is the natural, elegant solution, and CPython, mostly because of reference counting, prevents that.

-----


CPython prevents you from using shared-state multithreading to scale your application on multiple cores, but it doesn't prevent you from scaling your application on multiple cores. That's not "playing with words".

It is indeed unfortunate when the limitations of our platforms force us to contort our code to improve performance, but that is just as true of multithreading as of multi-process programming. The difference between the complexity of the two is small.

-----


Python doesn't do code analysis to determine when it's effectively doing obj.refs++; obj.refs-- repeatedly.

This sort of analysis is useful, and if the interpreter had been designed to do any optimization along with jitting, would probably come nearly for free. Reference counting could be far far cheaper than it is in python. (How much cheaper? I don't know - it'd need work to figure it out)

-----


Recently, there has been some work on removing redundant reference count operations in the Python interpreter. The following paper describes how it can be done: http://portal.acm.org/citation.cfm?id=1869631.1869633.

Regarding the performance impact of reference counting, the following facts are important:

- Switching from immediate reference counting to deferred reference counting (L.P. Deutsch and D.G. Bobrow, 1976 [1]) eliminates about 90% of all reference count operations in Smalltalk (Berkeley Smalltalk '82, that is) [2]

- A very good account of reference counting can be found in Dave Ungar's excellent PhD thesis [3] and in Dave Ungar and Dave Patterson's in-depth analysis of Smalltalk performance [4].

[1] An efficient, incremental, automatic garbage collector (http://www.cs.umass.edu/~emery/classes/cmpsci691s-fall2004/p...)

[2] High performance storage reclamation in an object-based memory system (http://techreports.lib.berkeley.edu/accessPages/CSD-84-167.h...)

[3] The Design and Evaluation of A High Performance Smalltalk System (http://www.eecs.berkeley.edu/Pubs/TechRpts/1986/5376.html)

[4] Berkeley Smalltalk: Who knows where the time goes? (Chapter 11 of http://www.iam.unibe.ch/~ducasse/FreeBooks/BitsOfHistory/)

-----


This sort of analysis does not come easily. Say you have:

    def bar():
        return some_constructor()

    def foo():
        b = bar()

Here the decrement is in bar() and the increment is in foo(). You have no way to elide the pair without doing inter-procedural analysis, which is hard.

-----


You don't need to catch all the cases to see a big improvement. Just the most common ones.

-----


The problem is that in a language like Python, where everything is broken up into little functions, you're not going to catch the most common cases without doing inter-procedural analysis.

-----


C and Java apps with high contention don't scale well onto multiple cores, either.

The key to speed is to not share state, which Python can do fine. It's called fork.

-----


Sometimes you cannot easily avoid sharing state. The fact that you can use multiple processes to scale is not an interesting statement when comparing Python to other languages, because it is true for every language out there.

I think the GIL has made people in the Python community too defensive: the GIL does not prevent building scalable architectures in many cases, but it still sucks, and we would be better off without it. It's a limitation (and a tradeoff, because it made development and integration with C easier). And there are scalable architectures based on threads (example: http://www.mailinator.com/tymaPaulMultithreaded.pdf) - "threads suck" has become a meme, which slightly bothers me in general. Not a panacea, but a good solution when applicable.

-----


I thought the key was not sharing mutable state. Sharing lots of immutable data can probably save a lot of overhead compared to IPC.

-----


> Sure Harvard may employ people who are titans in their field, Nobel Prize winners and the like, but unless you are going to Grad school there it won’t affect you any. They won’t be teaching you, TAs will.

Is this actually true? I don't attend Harvard, but I do attend a private university in the top 10 of the US News and World Report rankings, and I don't believe there is a single CS course taught by a grad student (although some are taught by "lecturers"). Some of the classes from the core curriculum are taught by grad students, but that's not necessarily a bad thing. Both of the grad students who have taught courses I've taken have been excellent teachers, which can be much more important for a teacher of introductory calculus or such.

-----


> I don't believe there is a single CS course taught by a grad student

Actually, this is incorrect. Although I'm not aware of any courses taught by grad students this year, I now recall that one of the lecturers also taught while a Ph.D. candidate. However, only a small minority of CS courses here seem to be taught by grad students.

-----


The real issue is that Python's for construct doesn't create a new lexical scope for the body.

In either Perl 5 or Perl 6, both of which use mutable variables, the corresponding code does the right thing (although Perl 5 only does so if you actually use a newly defined lexical: as in "for my $m ('do', 're', 'mi') {...}").

Unfortunately, Python's implicit declaration of lexicals means that it would be difficult to write a for loop with an iteration variable that isn't lexical to the loop body if that were the default (where would you put the "nonlocal" statement?).
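The symptom is easy to demonstrate, along with the usual default-argument workaround:

```python
# Python's for body is not a new scope, so every closure created in the
# loop closes over the single shared variable m:
fns = [lambda: m for m in ("do", "re", "mi")]
assert [f() for f in fns] == ["mi", "mi", "mi"]

# A common workaround binds the current value as a default argument,
# which is evaluated once per lambda creation:
fns = [lambda m=m: m for m in ("do", "re", "mi")]
assert [f() for f in fns] == ["do", "re", "mi"]
```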

-----


> The real issue is that Python's for construct doesn't create a new lexical scope for the body.

Yes and no, that's what leads the author to notice the "issue", but it does not actually solve the problem (if closing over a mutable environment is to be seen as a problem), it just pushes it back.

It can be argued that loops are the most common case where this is a problem (by quite a long shot), and working around that issue (by creating statement-level scopes or by using internal iterators [0]) would mean the vast majority of users rarely encounter it. But still...

[0] Internal iterators being the reason why Smalltalkers never encounter that issue, and Rubyists and modern JS developers (using Array.prototype.forEach or equivalent library-specific features) rarely do.
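For illustration, here's how the internal-iterator (higher-order function) route sidesteps the shared loop variable in Python:

```python
# With an internal iterator, each element arrives as a fresh function
# parameter, so every closure captures its own binding rather than one
# loop variable shared across iterations:
fns = list(map(lambda m: (lambda: m), ("do", "re", "mi")))
assert [f() for f in fns] == ["do", "re", "mi"]
```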

-----


There are certainly other potential issues with closing over mutable variables. However, if Python's variables were immutable, it would have to create a new scope for the for body (either with statement-level scopes or by using internal iterators/higher-order functions instead of for statements).

-----


> There are certainly other potential issues with closing over mutable variables.

I don't really like your use of the word variable here. Do you mean objects or bindings?

> However, if Python's variables were immutable, it would have to create a new scope for the for body

(I'm guessing bindings) Sure, but it's not like immutable bindings are required for that, just go the internal iterator route and you're set for most cases.

-----


I use "variable" to mean "binding" exclusively.

> Sure, but it's not like immutable bindings are required for that, just go the internal iterator route and you're set for most cases.

Right. That's why in my first comment, I said that not creating a new lexical scope for the loop body is the real issue here. I didn't make it especially clear that it doesn't matter whether you do that using an internal iterator or by giving for statements a statement-level scope, but of the two examples I gave of languages with mutable bindings and the "correct" (IMO) behavior here, Perl 5 uses the statement-level scope approach, while Perl 6 uses the internal iteration approach (for is very light syntactic sugar for map).

I believe I misread your previous comment as suggesting that mutability was the key issue here. In response to that, I wanted to point out that even in a situation with immutable bindings, one still needs to have each iteration's binding be a different binding to deal with this case.

-----


Why is this a problem at all? Isn't closing over a set of mutable variables the standard way OOP is shown to reduce to higher-order functions?

I have a great degree of respect for Manuel, and I read his comment, but I still don't see how sticking with only copying free bindings is a problem. That's how function calls in Python/C#/whatever work anyway -- when you pass a parameter in, it's just the reference that's copied, so programmers are familiar with it already.

-----
