
You can't "technically" do that. Which "rule" (from which rulebook) do you think allows you to arbitrarily repeat structures?

This is called center embedding, and it's actually the subject of quite a lot of research. Some of this research has led to a distinction between grammatical (which a sentence like "The cat the rat the dog bit chased escaped" may be) and acceptable (which such a sentence is not). Other research has deemed it ungrammatical and unacceptable.

Here's a helpful presentation summarizing some of this research: https://depts.washington.edu/lingconf/slides/Bader&Haeussler...

The sentence the comment the user replied to included parsed with difficulty.

There's no rulebook when it comes to syntax (grammar), unless you're a prescriptivist. There is plenty of precedent for embedding subclauses (e.g., "Here is a sentence, that I just made up, which has an embedded subclause."), but -- beyond anecdotal data -- there is no hard cut-off point for how many times you are permitted to embed. Empirically, most people don't like more than one level, but plenty of people allow it; thus it becomes more a question of cognitive load than of permitted structures.

If you're arguing over technicalities, you are inherently adopting a prescriptivist approach.

There are two possible approaches to your sentence, and neither renders it grammatically correct. A prescriptivist would dismiss it as not being allowed by any rules, while a descriptivist would point out that it defies both common usage and intelligibility.

I disagree, and FWIW, I find your tone hostile.

A prescriptivist would say it follows allowed grammar rules but is still unintelligible.

A descriptivist would say it follows common usage patterns but takes it to such an extreme as to be unintelligible.

You see this pattern more often with proper names:

- That's the house Jack built.

- That's the woman the house Jack built fell on.

- That's the lawyer the woman the house Jack built fell on hired.

- That's the briefcase the lawyer the woman the house Jack built fell on hired carried.

At what point does it become "incorrect"? When it becomes unintelligible to most native speakers?

It's also worth noting these sentences seem much stranger in writing than they do when heard. In writing, most sane people will use punctuation in the examples above. But in speaking, our tone of voice conveys enough context to use more streamlined structures.
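
To make the recursion concrete: here is a throwaway Go sketch (my own illustration, not from any grammar reference) that generates the ladder above mechanically. Each level just wraps the previous phrase in one more relative clause:

    package main

    import "fmt"

    // phrase(0) is the innermost clause; phrase(n) wraps phrase(n-1)
    // in one more relative clause: "the <noun> <inner phrase> <verb>".
    func phrase(n int, nouns, verbs []string) string {
        if n == 0 {
            return "the house Jack built"
        }
        return "the " + nouns[n] + " " + phrase(n-1, nouns, verbs) + " " + verbs[n]
    }

    func main() {
        nouns := []string{"", "woman", "lawyer", "briefcase"}
        verbs := []string{"", "fell on", "hired", "carried"}
        for depth := 0; depth <= 3; depth++ {
            fmt.Println("That's " + phrase(depth, nouns, verbs) + ".")
        }
    }

Nothing in the recursion caps the depth, which is the point: any cut-off has to come from the reader, not from the rule.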

> At what point does it become "incorrect"? When it becomes unintelligible to most native speakers?

Beyond 1 level of nesting. There's no rule allowing arbitrary nesting, nor do people commonly nest beyond 1 level.

I'm not sure where you saw hostility in my comment.

By and large, the rules I learned in various English and grammar classes, beyond the initial sentence structure, were prohibitive rather than permissive; there's no rule explicitly _allowing_ arbitrary nesting, because why would you need a rule to specify that, but there's also no rule that I'm aware of that _prohibits_ arbitrary nesting.

In fact, the "rule" that matters here is general sentence structure. It's fine to embed a single secondary clause inside a primary clause:

  That's the woman [the house Jack built] fell on.

The structure of the inner clause is no more restricted than the structure of the outer clause. Thus, the entire sentence above is perfectly valid as an inner clause of a new outer clause:

   That's the lawyer [the woman <the house Jack built> fell on] hired.

There's nothing in any grammar book that I'm aware of explicitly prohibiting infinite nesting of this sort; it's just silly and quickly becomes unintelligible. Though I definitely take Tenhundfeld's point that this sort of structure is a lot more common in spoken language, and in fact it's a lot more easily comprehensible with the aid of vocal cues and tones.

Can you point me to the rule disallowing arbitrary nesting? I'm not aware of a rule existing in either direction, besides vague guidelines to write clearly.

Also, which example is 1 level of nesting? I think "That's the woman the house Jack built fell on." is 2 levels of nesting, and I frequently hear constructs like that in speaking. I agree it's rarer in writing, though.

For example: What server is that? That's the server the consultant Jack brought in set up.

I'd never write that, because it's confusing. But I will naturally say things like that, without any problem in clarity, because you have more granular control over grouping and pausing, etc. If I said a sentence like that, you might hear it more as "That's the server the-consultant-Jack-brought-in set up."

Anyway, sorry if I misread your tone. It just came across very much as "I know what's right, and you're wrong." That might have been me projecting though. :)

There's empirical work on this [1] (alas, paywalled): multiple center embedding in speech is rare but exists. It is more common in writing.

There are examples of 2 levels of center-embedding which are clearly grammatical and not hard to understand:

> Anyone [who saw the woman [who committed the crime] at any point] should be questioned.

Others are very hard to understand:

> Anyone [who the man [who the woman shot] killed] should be questioned.

There are many factors that make a sentence hard to understand, not just degree of embedding. Some degree 2 sentences can be easy, and some degree 1 sentences can be hard (e.g. "the horse [raced past the barn] fell", which has only 1 level of embedding).

So if you want to describe English with the rule "no embedding beyond 1 level" then you're going to miss a lot of valid sentences while also failing to rule out lots of invalid ones.

Contrast this with a valid descriptive rule of English, such as "determiners such as 'the' always come before the nouns they modify". I can say "the man died", but "man the died" is never comprehensible under any circumstances and cannot be made better.

No one would say that a sentence with 10 levels of embedding is perfectly fine English. But descriptively, you have the problem that there is no non-arbitrary upper bound on embedding degree: it's not clear when one more level of embedding takes you from grammatical to ungrammatical, and furthermore it depends on the contents of the embedded clauses.

There is also some work I'm aware of, currently in prep for publication, showing that the extent to which people can understand multiply-embedded sentences strongly correlates with IQ (measured by Raven's matrices).

All this suggests that the simplest descriptive rule is to say that English allows center-embedding recursively, but that there are human processing limitations on sentences that create very deep stacks. The "human processing limitations" part is basically the whole field of psycholinguistics: many details have been worked out, but many details are still sketchy.

[1] Karlsson (2007): http://journals.cambridge.org/action/displayAbstract?fromPag...

Thanks for sharing the research, it's fascinating that there has actually been serious investigation of this.

That being said, I still think the rule of 1 level makes sense (not that all 1-level embeds are valid or easy, but that all embeds beyond 1 level are invalid).

> Anyone [who saw [the woman who committed the crime] at any point] should be questioned.

That sentence has two complementisers, which the OP's sentence specifically did not have. Introducing complementisers completely changes the rules and, I think, makes embedding much more possible.

> Anyone [who the man [who the woman shot] killed] should be questioned.

Even with the brackets, I don't understand this one. Anyone killed by the man should be questioned?

(Also, shouldn't they both be 'whom'?)

Saying it stops at level 1 implies that the valid rule of embedding is itself a special case of an invalid general rule, rather than a general rule in its own right. If the special case is the most common case, does that really imply only the special case is valid? Why?

I would argue there's a large grey zone. It isn't binary; I'd say it is more of a spectrum between prescriptivism and descriptivism. After all, grammar and syntax aren't fully random, even if largely arbitrary. We make certain choices prescriptively (even if we aren't aware of it), which we try to teach each other (even if we aren't aware of it), and the mixing and noise create a set of choices in the language actively in use that simply needs to be described for what it is. There are recurring patterns from various sources.

And if those choices allow one sentence to be understood by members of the intended audience who know the common rules, context, and usage well, then that's as valid as it gets. Even if only very few can understand it, the fact that it is possible is enough.

A bit like HTML in browsers - there's the written rules and there's what browsers do anyway, and what's valid in real life is whatever gets the point across.

> If you're arguing over technicalities, you are inherently adopting a prescriptivist approach.

How so? Surely there is room for debate within descriptivism as to what constitutes evidence of something being used?

> There's no rulebook when it comes to syntax (grammar), unless you're a prescriptivist.

And no two prescriptivists can agree on which rulebook to use!

Well, you can go from

"The cat meowed"

to

"The cat the mouse feared meowed"

If your claim is that nesting to arbitrary depths is not allowed, what is the maximum depth that is "legal"?

My guess is when the parse tree exhausts short-term memory. I have no problem reading the sentence with a single nested clause ("the cat the mouse..."), but the one with a doubly nested clause is unreadable ("the dog the cat the mouse..."). Greater nesting levels might be grammatical in an abstract sense but are extremely difficult for people to understand, much like it's possible to write legal C code that results in indeterminate behavior.

Probably neither the Chicago Manual of Style nor my Sunday school grammar book would condone such writing.

Google is replacing PageRank with RankBrain, and when this is complete they won't know why certain pages are offered as the best results.

Google is well aware of the problems of machine learning:


What rule allowed him to use the word "goer"?


Here's the thing: the "rules" of grammar only really exist to facilitate effective communication.

Every so often, a skilled communicator chooses to bend a rule in order to make a point. If the point is well made, it sticks, and people start imitating it.

Next thing you know, the newly-coined word becomes an "actual" word with real currency. (I put "actual" in quotes, because if you think about it, all words are made up. All "rules" are really just general guidelines.)


All the rules are made up too! Like fashion, language is one of those things where popularity rules.


> (I usually count all forms of a word, like “kick” and “kicked,” together as one word, although there are a few special cases where I don’t.)



So on a whim, I put some system documentation that I wrote into the simple word writer linked on that page (xkcd.com/simplewriter). So many red words... perhaps that's a challenge suited for another day.


I'm not sure, but I think we can all agree "US Space Team" is an infinitely better name than NASA!


But I'm pretty sure "US" is only allowed because it matches the common word "us", rather than as an acronym for a nation.


Presumably proper names are exempt from the count.

Of course now that you've raised the point, I'll probably have to waste some time today working out a simple word name for myself.


It would seem that he did not allow himself proper nouns:

> I found myself wishing that Munroe had allowed himself a few more terms—“Mars” instead of “red world,” or “helium” instead of “funny voice air.”


I guess nounification of verbs is allowed.


Maybe he can mentor us on how to do that. /rimshot


The rule that adding "-er" to a verb gives you the agent.




I would guess he concerned himself more with stems than surface form.


Except "fill" is allowed, but "filler" is not. I think he cheated.


If he uses regular rules for allowable modifiers (e.g., "-er" can be added to a verb that is in the base list, to form a noun that refers to the agent of the action described by the verb), then "filler" would be allowed if "fill" but not "filler" was on the list -- but only to refer to the actor who causes something to be filled, rather than the thing with which something is filled.

I certainly think it's clear (and even implicit in his own descriptions) that the exact methodology is not disclosed, but I don't think it's fair to say simply "he cheated".
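
For what it's worth, the rule being debated is easy to state as code. Here's a minimal sketch (entirely my own guess at a methodology; Munroe never published his checker) of a stem-based allow-list:

    package main

    import (
        "fmt"
        "strings"
    )

    // allowed reports whether a surface form passes: either the word is in
    // the base list itself, or stripping the regular "-er" suffix yields a
    // word that is.
    func allowed(word string, base map[string]bool) bool {
        if base[word] {
            return true
        }
        stem := strings.TrimSuffix(word, "er")
        return stem != word && base[stem]
    }

    func main() {
        base := map[string]bool{"go": true, "fill": true}
        fmt.Println(allowed("goer", base))   // true: "go" is a base word
        fmt.Println(allowed("filler", base)) // also true under this rule, which
                                             // is why banning it looks like cheating
    }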

Depending on context, filler is a much more complicated concept. You could be referring to stuff they put in food that is not food, or stuff they put in cracks to fill them in, or a nozzle at a gas station.

By contrast, "go" is such a foundational word in English that it appears in reading primers for toddlers, and a goer goes.


grow, grower.

The point is, he's not simply stemming. He cheats.


Beyond what the other comments said, that was the title of the very first diagram he made like this, and he was probably just trying to pick a punchy title.


Doers gonna do.



1. How often do you insert into a slice? You know that's an O(n) operation, right?

2. nil is typed, I'll grant that it's not the most obvious part of Go's type system.

3. Meh.

4. This is type covariance. It's complicated and hard to get right. The Java Generics FAQ looks like this: http://www.angelikalanger.com/GenericsFAQ/JavaGenericsFAQ.ht...

5. I like by-value loops. Everything else in Go is by-value.

6. Yeah, fuck everything we know about parsing, let's make whitespace significant, then we can declare slices like:

    x := []int{1, -2, +3}
    len(x) == 3
    y := []int{1 -2 +3}
    len(y) == 1

7. Meh


That's not type covariance. Variance only comes up when you have generics, which Go doesn't, and subtyping, which Go doesn't really have. It's the lack of a coercion/conversion from slice-of-T to slice-of-interface.

The correct language-level solution to #4 is to introduce generics and to allow generic type parameters to have interface bounds. That would solve the problem without having to introduce variance.
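
For anyone unfamiliar with #4, a minimal sketch of the limitation being discussed (the names are mine):

    package main

    import "fmt"

    func printAll(vals []interface{}) {
        for _, v := range vals {
            fmt.Println(v)
        }
    }

    func main() {
        names := []string{"alice", "bob"}
        // printAll(names) // compile error: cannot use names
        //                 // (type []string) as type []interface{}
        vals := make([]interface{}, len(names))
        for i, n := range names {
            vals[i] = n // the conversion must copy, element by element
        }
        printAll(vals)
    }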


I don't see how it's not type covariance.

Go's slices are generic to the extent needed, and the subtyping of Go's interfaces fits the requirements for ordering of types well enough.


In Go, if T implements interface I, T is not a subtype of the interface type I. T can be converted to the interface type I, but it's not a subtype.


What's the difference, though? Is this true of interfaces in Java, for instance, or does it have to do with the specific implementation of Go?


Isn't it possible to talk about variance as soon as you have a type constructor (function from types to types)? In this case [].

Edit: I don't know Go, but if slices give write access, I think the automatic conversion the author wants would be unsound.


Yes, a coercion would be unsound (at least, unless the coercion copied the backing store of the slice, which would be very unintuitive). It shouldn't coerce. I'm describing generic type parameter bounds, which are not the same thing.

(For what it's worth, I'm not sure I would bother solving this problem if I were suddenly put in charge of Go's design, given Go's extreme aversion to type system features. I'm just describing what the solution to this problem typically is.)
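
To spell out the unsoundness with a hypothetical (none of this compiles in real Go, which is exactly the point):

    names := []string{"alice", "bob"}
    // Suppose Go allowed this coercion without copying the backing array:
    // var vals []interface{} = names // a compile error today, for good reason:
    // vals[0] = 42                   // this would write an int into []string storage,
    // _ = names[0]                   // and reading it back as a string would be unsound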


I don't know if you need to make whitespace any more significant in Go than it already is to implement 6. As you may know, Go inserts semicolons at the end of a line in the lexing phase if a token of the right type is the last one on the line (https://golang.org/ref/spec#Semicolons):

> • an identifier

> • an integer, floating-point, imaginary, rune, or string literal

> • one of the keywords break, continue, fallthrough, or return

> • one of the operators and delimiters ++, --, ), ], or }

The things you can put as elements of a slice, values of a map, or values in a struct literal are Expression nonterminals (https://golang.org/ref/spec#Expression). As far as I can tell, the tokens that can end an Expression are a subset of those after which semicolons can be automatically inserted.
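
A concrete consequence of that insertion rule, in standard Go: multi-line composite literals need a trailing comma, precisely because a semicolon would otherwise be inserted after the last element:

    xs := []int{
        1,
        2, // drop this comma and a semicolon is inserted after "2",
    }      // which is a syntax error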


This sums up my opinion on this pretty well.

On 1 I'll just say it'd be nice if append (and other varargs functions) could take a mix of individually listed and slice-expanded args at the same time. As in: `append(a[:2], 3, a[2:]...)`. I already expect append to be O(n), and this would be some very nice sugar.
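
For reference, the usual workaround today is nesting two appends; the inner append copies the tail into a fresh slice before the outer one runs, so the splice is safe despite the shared backing array:

    a := []int{1, 2, 4, 5}
    a = append(a[:2], append([]int{3}, a[2:]...)...)
    // a is now [1 2 3 4 5] -- still O(n), but at least it's one line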


Yes, the syntax here just looks so unnecessarily ugly. For all the talk of Pythonistas moving to Go, I can't see myself switching over to a language with so many ugly, hard-to-remember warts.


Why do the standard paternity tests not notice when someone is effectively the child's uncle? The genes must match to a far greater extent than an "unrelated" person. Is that information lost between the actual test and the "yes/no/probability" result?


Brothers and sisters only share 1/4 of their DNA. Brother A gets 50% of dad's genes. Brother B also gets 50%, but not (necessarily) the same 50%. Same process on the mother's side.


They share ½. Half-siblings share ¼ of their DNA, because of exactly what you said. Full siblings share half of the half they got from their father, plus half of the half they got from their mother, for a total of ½.

An uncle and nephew, as in the article, would share ¼.


So it's 1/2 shared DNA for siblings, not 1/4. The only caveat (I think) is that it's 1/2 on average, instead of exactly 1/2 as in the case of parents and children.


Ah, I did the table wrong in my head. You're right. 2 genes, one from the father and one from the mother; sibling A along the top, sibling B down the side.

       MM MF FM FF
    MM  1 .5 .5  0
    MF .5  1  0 .5
    FM .5  0  1 .5
    FF  0 .5 .5  1

(4 × 1 + 8 × .5 + 4 × 0)/16 = 8/16 = 1/2
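
If you'd rather simulate than count table cells, a quick Monte Carlo sketch of the same two-allele model lands on the same answer:

    package main

    import (
        "fmt"
        "math/rand"
    )

    func main() {
        const trials = 1000000
        shared := 0.0
        for i := 0; i < trials; i++ {
            // Each sibling independently inherits one of the mother's two
            // alleles and one of the father's two alleles.
            aM, aF := rand.Intn(2), rand.Intn(2)
            bM, bF := rand.Intn(2), rand.Intn(2)
            if aM == bM {
                shared += 0.5
            }
            if aF == bF {
                shared += 0.5
            }
        }
        fmt.Println(shared / trials) // prints roughly 0.5
    }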


You've been misinformed. The 2009 version of Effective Go stated, as it does now:

"This approach can be taken too far. Reference counts may be best done by putting a mutex around an integer variable, for instance."



See the very first example of "channels" in that 2009 version:

"In the previous section we launched a sort in the background. A channel can allow the launching goroutine to wait for the sort to complete." ...

    c <- 1;  // Send a signal; value does not matter.
That's using a channel as a lock on shared data. Not seeing that in the new book. This is a step forward.

(What Go really needs is Rust's borrow checker and move semantics, so that when you communicate on a channel, the compiler checks that you're not sharing too much.)
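
For reference, the 2009 pattern in full looks something like this (my reconstruction around the quoted line, not the original listing):

    package main

    import (
        "fmt"
        "sort"
    )

    func main() {
        xs := []int{3, 1, 2} // stand-in for the data being sorted
        c := make(chan int)
        go func() {
            sort.Ints(xs) // the sort launched in the background
            c <- 1        // send a signal; value does not matter
        }()
        <-c // wait for the sort to complete
        fmt.Println(xs)
    }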


Of course, if you care about speed at all, you will stay away from Go's mutexes. Time your own code; it is shocking how slow they are. I haven't published my own test times, but with a quick search, here's an example of a loop that should take 10ms taking 2s with mutexes. http://www.arkxu.com/post/58998283664/performance-benchmark-...


A mutex held for up to 10ms on each iteration, over 500 iterations, averaging out to 2s doesn't seem surprising at all. In fact, this is quite expected. It would take considerably longer if the random sleep hit 10ms every time.

If you modify the code to release the lock before sleeping...

    func (c *Counter) add(ch chan int) {
            c.mu.Lock() // assuming a sync.Mutex field on Counter, as in the linked benchmark
            tmp := c.Num
            tmp += 1
            c.mu.Unlock() // release the lock before sleeping...
            time.Sleep(time.Duration(rand.Intn(10)) * time.Millisecond)
            c.mu.Lock() // ...and reacquire it after
            c.Num = tmp
            c.mu.Unlock()
            ch <- 1
    }
...and reacquiring it after, it still finishes in 10ms on my machine, as you'd expect. As you can see, the problem isn't so much that locks are inherently slow in Go; rather, the mutex, as the acquisition function name implies, is doing what it is supposed to do: lock. What is true is that you have to be careful to use them properly.


33 comments from 1 month ago: https://news.ycombinator.com/item?id=10237902


Thanks, good catch!


Typically, if you wish to limit the number of goroutines, you would spawn N workers and have them read from a single channel. If 20k of your incoming connections want to do something, they send on the channel without spawning a goroutine themselves.

Did you try something like that?
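
Concretely, the shape is something like this (a sketch; Request, handle, and the pool size are placeholders):

    package main

    import (
        "fmt"
        "sync"
    )

    // Request stands in for whatever the 20k connections produce.
    type Request struct{ id int }

    func handle(r Request) { fmt.Println("handled", r.id) }

    func main() {
        jobs := make(chan Request, 100)
        var wg sync.WaitGroup

        // Fixed pool: N workers, all reading from one channel.
        for i := 0; i < 4; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for req := range jobs {
                    handle(req)
                }
            }()
        }

        // Each "connection" just sends; no per-connection goroutine.
        for i := 0; i < 20; i++ {
            jobs <- Request{id: i}
        }
        close(jobs)
        wg.Wait()
    }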


Yep, this is what I meant by 'goroutine pools'. The select statements were on the sending side, to ensure that if the feed channel was full we wouldn't retain too much additional state. It works, but at that point it's starting to look like an async event-loop with a thread-pool...


How do these 20k connections feed the channel without being themselves managed by goroutines?

One thing I wish was possible in go is being able to use the `select` keyword with both channels and IO.


Not if it's a cube:

Volume = (1e-4 m)^3 = 1e-12 m^3
Assume density 1 kg/l (1000 kg/m^3)
Mass = 1e-9 kg
Velocity = 8000 m/s
KE = 0.5 × m × v^2 = 0.032 J

Mass of bullet (https://en.wikipedia.org/wiki/Physics_of_firearms)

A .44 Remington Magnum with a 240-grain (0.016 kg) ... (360 m/s)

Which has 1036.8 J of kinetic energy.

So the ratio is actually about 32400 in favor of the bullet. But note that this will change with the cube of the size mentioned, so even though this is exaggerated, a fleck of paint 3mm across would make the quote reasonably accurate.
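
The arithmetic above, as a quick check (plain Go, numbers straight from the comment):

    package main

    import "fmt"

    func main() {
        size := 1e-4                        // 0.1 mm cube edge, in metres
        mass := 1000.0 * size * size * size // density 1000 kg/m^3 gives 1e-9 kg
        keFleck := 0.5 * mass * 8000 * 8000 // 0.032 J

        keBullet := 0.5 * 0.016 * 360 * 360 // 1036.8 J
        fmt.Println(keBullet / keFleck)     // ~32400
    }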


Timothy Gowers has a few thoughts on this:



