
LINQ Ruined My Favorite Interview Question - scottcha
http://scottchamberlin.tumblr.com/post/55152416452/linqinterview
======
llambda
I hate to be the bearer of bad news, but I think there may be even simpler
solutions to this problem:

    
    
        (take 10 (reverse (sort-by (comp first rest) (frequencies (string/split ... #"\s+")))))
    
    

The above is a Clojure one-liner example that I believe satisfies the original
problem. So while LINQ may have simplified from the C-language family
solutions he had seen, it's clearly possible to take it one step further with
the expressivity of modern languages like Clojure...

Edit: remember to sort! (Forgot my coffee this morning...)

Edit again: and aphyr's solution is even more concise and idiomatic, where `s`
is the first paragraph of the blog post:

    
    
        => (->> (string/split s #"\s+") frequencies (sort-by val) reverse (take 10))
        (["I" 7] ["to" 6] ["the" 5] ["a" 5] ["of" 4] ["candidates" 3] ["is" 3] ["question" 3] ["in" 3] ["their" 2])

~~~
shill
Oh, we can use our favorite language in our new job? Here is a Python
solution.

    
    
        from collections import Counter
        Counter(s1.split(' ')).most_common(10)

~~~
untog
Well, sure, once you're allowed to use external libraries anything is a one
line solution. In JS:

    
    
       doStuff = require("doStuff");
       var result = doStuff(theString);
    

isn't JS so efficient _?!?_

~~~
shill
Without imports:

    
    
        d = {}
        for word in s1.split(' '):
        	try:
        		d[word] += 1
        	except KeyError:
        		d[word] = 1
        print [(x, d[x]) for x in sorted(d, key=d.get, reverse=True)][:10]

~~~
spenuke
I'm a newbie, but the question seemed approachable so I went for it. This is
what I came up with. (Python, btw.)

    
    
      def top_ten(s):
        words = s.split(' ')
        word_list = set(words)
        return sorted(word_list, key=lambda x: words.count(x), reverse=True)[:10]
    

The question didn't ask for word counts, so I didn't see the need for a
dictionary. I'd appreciate any advice on my solution. I'd be thrilled if I'm
not too far off from being capable of starting to apply for jobs.

~~~
matchu
The need for the dictionary shows up when your text gets significantly large.
Each call to `words.count` is going to re-examine every word of the text to
count 'em up. So if n is the number of words in the text, and m is the number
of distinct words in the text, then this solution is at least O(mn + n log(n))
whereas the dictionary-based solution is O(n log(n)). That is, we're
re-reading the word list over and over, whereas the dict-based solution only
reads it once, so the dict-based solution is likely to be more efficient.

But I like the readability of this solution and there's a strong argument to
be made for it on that basis, especially if the string is short. If this were
a job interview, this would be a totally acceptable solution, though it'd be
important to be able to discuss why other solutions might be faster and why
you prefer this one anyway.

~~~
kyllo
Isn't the difference between O(mn + n log(n)) and O(n log(n)) running time
going to get _less_ significant as the value of n gets larger?

I thought the whole point of Big O / asymptotic analysis is that you can
ignore lower-order terms and constant factors because they are insignificant
for any appreciably large input size. And also because the lower order terms
and constant factors vary too much depending on the programming language, the
compiler or VM, the hardware, etc.

~~~
spenuke
I'd never heard of Big O until this thread, but from what I can tell it'll
only be when a solution is O(log n) that "running time will get less
significant as n gets larger". That's simply how you describe logarithmic
growth, so maybe that's where you got confused. Also, wouldn't it be fair to
say that a logarithm (n log n) is a lower-order term than mn? In which case
your definition stands.

At any rate, I wanted to test this out, so I made a naive benchmark for
running these functions. The dict solution was ten times faster (0.0011s vs
0.015s) than the list version with ~1350 words. The dict solution ran in 0.13s
at ~162,000 words, while I waited a couple minutes before killing the list
version on that input.

~~~
kyllo
Oh, you're right, m is not a constant but also varies somewhat independently
of n, so you can't exclude it from the big O notation. And whether mn or nlogn
is the lower order of the two terms depends highly on the value of m, which is
the number of unique words in the text. A really long text that's just the
same word over and over will have a big n but a small m.

~~~
matchu
Worst-case big-O runtime is therefore O(n^2), and best case is O(n log(n)) :)

------
overgard
LINQ tends to get thought of as "database syntax sugar", but it's way more
than that. It's C#'s version of the lazy collection operations you find in
most functional languages, just given friendlier SQL-ish names (i.e., "Select"
instead of "map" and "Where" instead of "filter").

I almost feel a bit gross when I have to write a "foreach" loop at this point,
because there's almost always an equivalent way to do it in LINQ (although
it's a tradeoff: the one downside of LINQ is that, given its deferred nature,
it's harder to debug).

~~~
untog
_I almost feel a bit gross when I have to write a "foreach" loop at this
point_

Agreed. I've transitioned to spending most of my time in JS, and wherever I
can I use .map(), but the chaining isn't quite the same as LINQ. Someday I
intend to write a library of Array addons to provide GroupBy and so on, but I
can't imagine it'll be super efficient.

~~~
aaronm67
In Underscore, you can do something like:

    
    
        _.chain([ ... ]).map( ... ).filter( ... ).value()
    

It's not exactly extending the native array...but there are far fewer side
effects to doing it this way.

~~~
WickyNilliams
The problem with this is that it's doing multiple iterations. LINQ on the
other hand defers execution, so the whole pipeline runs as a single pass when
`ToList()` (or some other method that materializes the results) is eventually
called.
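
Deferred execution in one tiny example (a sketch; nothing runs until ToList()
is called):

    
    
        var numbers = new List<int> { 1, 2, 3, 4 };
        var query = numbers.Where(n => n % 2 == 0).Select(n => n * 10); // nothing executes yet
        numbers.Add(6);
        var result = query.ToList(); // one pass over the *current* list: 20, 40, 60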

------
joshuaellinger
I am a big fan of LINQ but it gives and takes away complexity at the same
time.

It makes a 'whole class of things that you would have to do with loops' go
away. Once you are comfortable with the syntax, it makes code a lot more
readable.

But you lose track of when and where things are getting executed.

For example, it is easy to make something that you intended to execute inside
SQL Server run inside your C# code instead. And then, all of a sudden, string
comparison is case-sensitive.

You wind up having to context-switch between procedural and set-based
mentalities without the same kind of visual cues you used to get.
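
A minimal sketch of that trap (db.People is a hypothetical IQueryable source,
e.g. an ORM table):

    
    
        // Looks identical, behaves differently: the first filter is translated to SQL
        // (case-insensitive under a typical collation); AsEnumerable() pulls the second
        // into C#, where string.Contains is case-sensitive.
        var inSql = db.People.Where(p => p.Name.Contains("smith"));
        var inMemory = db.People.AsEnumerable().Where(p => p.Name.Contains("smith"));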

~~~
kryten
It's also really easy to shoot yourself in the foot with, unless you
understand everything and think each problem through.

Four killer issues I've seen so far:

We had a major production performance issue which turned out to be a stray
ToList that was causing massive memory ballooning. It didn't get noticed in
test, as the test cases passed, but then it hit prod, 2000 users bashed it and
tried to allocate 20MB each, and our cluster shat a brick.

Null reference exceptions! There are so many dereferencing operations in an
average LINQ expression that you really have no idea which one is blowing up
when it goes pop in production in release config.

People using .Single(...) and getting more or less than one result back. So
frustrating.

If you push an IEnumerable<T> over an interface boundary, the performance and
memory semantics are not preserved and you end up with a leaky abstraction.
These are shits to resolve. Example: queries executing inside the view, which
is outside the transaction scope.

We've had to ban it in some circumstances.
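
The stray-ToList failure mode, sketched (db.Orders is hypothetical; the point
is where materialization happens):

    
    
        IEnumerable<Order> lazy = db.Orders.Where(o => o.IsOpen);      // deferred, streams rows
        List<Order> eager = db.Orders.Where(o => o.IsOpen).ToList();  // allocates the whole result set, per request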

~~~
usea
I agree with your pain points.

We try to use ICollection in our APIs instead of IEnumerable, since the latter
can have surprising semantics, like being a wrapper for some operation which
may no longer be valid, or being slower than you'd expect for things like
.Count(). IMO it's really not best for transporting across interface
boundaries in the common case; only when you're specifically trying to avoid
having the whole collection in memory or something like that.
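
A sketch of what that looks like in practice (hypothetical names):
materialize at the boundary so execution stays inside the method.

    
    
        public ICollection<Customer> GetActiveCustomers()
        {
            // ToList() executes the query here, inside the unit of work,
            // instead of whenever a caller happens to enumerate it.
            return _db.Customers.Where(c => c.IsActive).ToList();
        }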

Another thing that can help is this wonderful May<T> library[1]. It's a great
option type[2] for .NET. It helps make operations more composable.

[1] [https://github.com/Strilanc/May](https://github.com/Strilanc/May)

[2]
[http://en.wikipedia.org/wiki/Option_type](http://en.wikipedia.org/wiki/Option_type)

------
moomin
I hate to be the arrogant know-it-all on Hacker News, but seriously, if you're
writing C# and not using LINQ all the time, you need to catch up.

I'm tired of seeing people answer interview questions with anything _other_
than LINQ.

~~~
corresation
_but seriously, if you're writing C# and not using LINQ all the time_

You don't sound arrogant, but rather sound naive. I agree that someone who
recruits surely should have known about and have experienced LINQ
significantly by now, but the notion that you should be using it "all the
time" is absolute nonsense.

I avoid LINQ. I encourage others to avoid LINQ. It is almost always a sign of
bad code.

LINQ is syntactic sugar over basic set operations. It is perfectly fine if
you're doing naive activities, such as the example given -- a contrived example
of brute forcing a problem, where two approaches to the same very basic need
unsurprisingly yield the same complexity -- but it _falls apart_ in real-world
persistent code with considered algorithms and storage. In real long-term
code, it is usually the canary in the mineshaft telling you that the
developers aren't using proper algorithms or storage.

In many large-scale projects it invariably turns into the performance
_nightmare_ that ends up causing whole rewrites.

Again, not because of LINQ itself, which of course can do basic operations
like grouping and sorting as quickly as you could do "by hand", but that it
makes it _so easy_ to do those things that developers start to resort to that
as a catch-all magical solution that is costless because how much could a line
or two of code cost? -- a master List<stuff> that they just sort and group by
and select from all over the code (O(n) * O(n) * O(n)...why not?). The end
result is that the data structures and encapsulation that should have happened
never did, so while it might seem more concise and obvious on a one-to-one
comparison with a loop perspective, _neither_ case should ever have happened.

LINQ is the gun with which a lot of terrible programmers repeatedly shoot
themselves in the foot, all while gloating about their concise code.

~~~
Strilanc
(Note: When I say LINQ I am referring to the functional style it encourages,
not the query syntax. The query syntax is nice, but it's just a trivial
syntactic transformation.)

Correct me if I'm wrong, but the world is moving _towards_ functional
programming (i.e. LINQ) not away from it. Personally, I find LINQ far, far
easier to read, write, and analyze. (On the other hand, I understand the
deferred semantics and watch for warning signs like enumerating a sequence
more than once.)

Honestly, a C# company _avoiding_ LINQ sounds to me like the canary in the
mineshaft telling you the company has programmers falling behind the times and
doing things the hard way.

~~~
moomin
I'm definitely in agreement with the last point. However, I originally said
that good C# code contains a lot of LINQ, not that C# code with a lot of LINQ
is necessarily good. Deferred execution is something that confuses people, but
frankly, it's a concept they need to learn. It's been in the language since
yield return got added in 2.0.

LINQ's deferred execution was the right choice for performance, but it's the
hardest one. You've got to know when to call .ToList(). I'm not denying that
I've seen people evaluate the same expensive list 100 times, use a join when
precomputing a Dictionary would have been much faster, or close a connection
before the result is actually evaluated. But I've never seen C++ programmers
say you can make mistakes with pointers, so don't use them.
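
Two of those traps, sketched (Source, Slow, and Id are hypothetical):

    
    
        var expensive = Source().Where(Slow).ToList();   // evaluate once, then re-enumerate freely
        var byId = expensive.ToDictionary(x => x.Id);    // O(1) lookups instead of a repeated join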

Clojure's lazy sequences are guaranteed to evaluate once, but that comes at
the expense of storage. In particular, a query like the one in the original
blog post will evaluate multiple intermediate lists that then need to be
thrown away. And indeed, they've introduced reducers to address this, which
behave more like LINQ.

~~~
gngeal
_In particular, a query like the one in the original blog post will evaluate
multiple intermediate lists that then need to be thrown away._

Why not fuse the operations, if the values are immutable?

~~~
moomin
Well, there are two ways to do that in Clojure: write your own list processing
(usually regarded as bad style) or use reducers. Both are appropriate in
specific instances, but the fact remains that the lazy-seq code above is the
idiomatic way of doing it in Clojure. So the _default_ way of doing it is
slower than LINQ's default way of doing things.

It wouldn't be hard to make LINQ behave like Clojure, either.

------
kephra
/me wonders

are candidates allowed to choose their favorite language?

    
    
        man bash | tr '[:upper:] ' '[:lower:]\n' | sed '/^$/d' | sort | uniq -c | sort -rn | head | awk '{ print $2 }' | fmt

or do you only hire windows coders?

~~~
epistasis
This is how I do this sort of thing all the time. But, the sort is O(n log n),
so it's asymptotically less satisfying.

~~~
kephra
True, a bag of words would perform better.

    
    
        man bash | tr '[:upper:] ' '[:lower:]\n' | awk '/./ { bag[$1]++ } END { for(word in bag) { print bag[word], word } }' | sort -rn | awk '{ print } NR>=10 { exit(0) }'

But it would require more typing and thinking.

 _Sure_, one could do this in awk completely,

    
    
      man bash | awk '
      /./ { 
        for (i = 1; i<=NF; i++) { 
          bag[tolower($i)]++
        } 
      } 
      END {
        for (i = 1; i<=10; i++) {
          score=0;
          for(word in bag) {
            if (bag[word] > score) {
              score=bag[word];
              best=word
            }
          } 
          printf "%s ", best
          delete bag[best]
        } 
        printf "\n" 
      }'
    

if the requirement is: _please_ choose one language and not the complete Unix
babylon.

------
azurelogic
I had a class where we built a DB engine from scratch, and I ported my code to
C# just for LINQ and proper list handling. While it's not the most efficient,
it reduced many of the complex sections down to a few lines, like reordering
the values being inserted to match the order necessary for a record by
comparing the schema to the parameter list in the insert statement.

If you like LINQ and wish it were available in JS, look at Underscore or
Lo-Dash.

------
mythz
LINQ does make C# more readable, but it doesn't add much value over normal
functional collections, which usually end up more concise, simpler, and easier
to reason about -- visible in my Dart port of C#'s 101 LINQ samples (which are
also lazy):
[https://github.com/dartist/101LinqSamples](https://github.com/dartist/101LinqSamples)

The performance of LINQ to Objects is not that great either, and it doesn't
add much value readability-wise over rewriting the same task in other dynamic
languages:
[https://github.com/dartist/sudoku_solver](https://github.com/dartist/sudoku_solver)

------
tolmasky
My main problem with LINQ is that it seems to perform terribly on mobile. I
was looking for map/reduce/etc type functions for C# in Unity, and thought I
found it with LINQ. To my dismay, LINQ creates so many crazy intermediate
objects to pull off its "laziness" that our GC high water mark was being
crossed all the time. I went and just reimplemented everything from
underscore.js in C# and got way better performance. I imagine this is probably
not an issue on desktops/servers.
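
The kind of non-lazy replacement he describes might look like this (a sketch,
not Unity's or underscore's actual API): one pass, one preallocated list, no
iterator state machines for the GC to chew on.

    
    
        static List<TOut> Map<TIn, TOut>(List<TIn> items, Func<TIn, TOut> f)
        {
            var result = new List<TOut>(items.Count); // capacity known up front
            for (int i = 0; i < items.Count; i++)
                result.Add(f(items[i]));
            return result;
        }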

~~~
gecko
It's not mobile; it's old versions of Mono having a completely shit GC. Do you
know whether Unity has upgraded to SGEN yet?

~~~
tolmasky
I don't believe so, I know we had to use the same pool workarounds that
everyone seems to need due to this issue

------
T-zex
The interviewer doesn't see the difference between LINQ and extension methods,
and he hasn't provided a single line of LINQ in his post.

~~~
jmcqk6
You're confusing the syntax with the functionality.

------
Shish2k
In the vein of [http://xkcd.com/353/](http://xkcd.com/353/) :

    
    
        >>> from collections import Counter
        >>> Counter("here are some words here are".split()).most_common(3)
        [('are', 2), ('here', 2), ('words', 1)]

------
gboudrias
What is it with people's obsession over lines of code?

As someone who doesn't do C# or LINQ, that second solution seems to me like
someone really wanted to have as few lines of code as possible.

I don't claim to have an impressive programming pedigree, but while I take
simplicity and performance into account, I never take "conciseness" into
account. Conciseness usually means "this is opaque as shit but at least it's
short". And who really cares about short? What's the purpose of "short"? None
that I can find, other than impressing interviewers. Anyone who believes
otherwise should probably be writing in Clojure or Haskell (and probably is),
but I personally just don't see the point.

But that's just my opinion. My favorite language is Python.

~~~
kragen
Less code means fewer places to insert bugs and less to read. The majority of
time spent "writing" software is actually spent reading the existing code, so
"less to read" is really important. "Concise" does not mean "short"; it means
"short and clear". Obviously if your short code is opaque or bug-prone then
you're defeating the purpose.

------
jastr
The O(n) solutions sound way more interesting!

1. Iterate through all key,value pairs once, keeping track of the 10 most
common (see the sketch below).

2. Run quickselect 10 times.

3. Coolest (and an interview question in its own right): modify quickselect to
return the top 10!
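
A sketch of option 1 in C#, assuming .NET 6's PriorityQueue and a counts
sequence of (word, count) pairs: keep a min-heap of size 10 while scanning
once; whatever survives is the top 10.

    
    
        var heap = new PriorityQueue<string, int>();  // min-heap keyed on count
        foreach (var (word, count) in counts)
        {
            heap.Enqueue(word, count);
            if (heap.Count > 10) heap.Dequeue();      // evict the current minimum
        }
        // heap now holds the 10 most frequent words (dequeue to list them, least first)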

------
ExpiredLink
> _“Return the top 10 most frequently occurring words in a string.”_

...

> var words = s1.Split(' ');

Wrong. Yet another example where an interviewer cannot correctly solve his own
questions.

~~~
danabramov
Care to elaborate? It probably lacks punctuation handling, case-insensitivity,
and the StringSplitOptions.RemoveEmptyEntries option, but is there anything
else missing?

~~~
jameshart
中国四分之一地区六亿人受雾霾影响 ("a quarter of China's territory and 600
million people are affected by smog") contains more than one word.

~~~
cema
Depends on the definition of a word. I18n is genuinely hard, and would likely
make for an exciting discussion rather than just a tech interview.

------
A1kmm
I'm not sure that language features making code more succinct really ruin the
question.

The C# isn't even that succinct compared to doing a similar thing in other
popular languages. For example, in Haskell:

    
    
      topTenWords :: String -> [String]
      topTenWords = take 10 . map fst . sortBy (flip (comparing snd)) . map (\l -> (head l, length l)) . group . sort . words

~~~
tome
flip (comparing whatever) is a neat trick! I'll have to remember that.

For \l -> (head l, length l) I tend to use head &&& length (from
Control.Arrow).

------
lubomir
The LINQ really is much more succinct, but the original code did not set the
bar very high. Why should one write a 12-line comparing function when
'b.Value - a.Value' would work pretty much the same (unless C# really requires
the comparison to return -1/1 instead of any negative/positive integer, which
could be fixed with a sign function)?

~~~
ajanuary
Or do:

    
    
      private static int CompareKVPByCount(KeyValuePair<string, int> a, KeyValuePair<string, int> b)
      {
          return a.Value.CompareTo(b.Value);
      }

Or even:

    
    
      kvpList.Sort((a, b) => a.Value.CompareTo(b.Value));
    

But that would probably be straying into the author's "list of language
features I've completely ignored for the last 5 years".

------
dbaupp
I think the analysis may've been skewed by the outlying points. They are
almost certainly "high leverage points"[1] and so possibly exert an undue
influence on the final trend lines.

[1]:
[http://en.wikipedia.org/wiki/Partial_leverage](http://en.wikipedia.org/wiki/Partial_leverage)

------
tel
Here's the Haskell golf

    
    
        import qualified Data.Map as M
        import Data.List
        import Data.Ord
    
        countWords :: String -> [String]
        countWords = map fst . take 10
                   . sortBy (flip (comparing snd))
                   . M.toList . M.fromListWith (+) . map (\w -> (w, 1))
                   . words

------
joshka
The biggest benefit I find from using Linq is readability. It allows the code
to express what it is doing rather than how it is doing it. Compare:

    
    
        foreach (var item in list)
            if (SomeCondition(item)) return item;
        return null;
    

vs.

    
    
        list.FirstOrDefault(item => SomeCondition(item));

------
seivan
I would have used a weighted set.

But it's funny: the first time I read the paragraph, I assumed the words were
not separated by spaces, and you had to find occurrences of combinations
instead.

I instinctively made the test much harder than it needed to be. I'm damaged.

------
zwieback
Good read. It also shows that interviewing can be a great learning tool. I've
experienced that myself: while interviewing takes a lot of time investment on
the part of the interviewer, it's also often a source of new insights.
------
lelf
Well, welcome to high-level programming

    
    
      count = take 10 . map head . reverse . sortBy (comparing length) . group . sort . words
    

That's ignoring Unicode rules for word splitting of course

------
mrcozz
Everyone knows "Hadoop is a distributed system for counting words." ;-)

[https://github.com/twitter/scalding](https://github.com/twitter/scalding)

------
aaronbrethorst
That was a fun exercise :)

    
    
        input_string.split(/\W+/).inject(Hash.new(0)) {|acc, w| acc[w] += 1; acc}.sort {|a,b| b.last <=> a.last }[0,10]

------
superfx
Since we're comparing notes, here's the Mathematica version:

    
    
        Reverse[SortBy[Tally[StringSplit[#]], #[[2]] &]][[;; 10, 1]] &

------
ajanuary
The first projection isn't needed, which would eliminate the creation of 10
objects.

------
coderguy123
    
    
        text.Split(' ').Where(x => !string.IsNullOrWhiteSpace(x)).GroupBy(x => x)
            .Select(x => new { word = x.Key, count = x.Count() })
            .OrderByDescending(x => x.count).Select(x => x.word).Take(10).ToArray();

------
prakashk
Perl 6:

    
    
        .say for (bag($text.words) ==> sort {-*.value})[^10]

------
seoguru
here's a verbose ruby version:

    
    
       def topx(str, x)
         c = Hash.new(0)
         str.split(/\s+/).each { |s| c[s] += 1 }
         c.sort_by { |k, v| -v }.take(x)
       end

------
coderguy123
Isn't .ToList() part of LINQ too? Technically the first solution is also using
LINQ. It would be interesting to make them implement the sort algorithm too.

------
pawrvx
LINQ is my favorite API of all times. It rocks.

~~~
olmobrutall
I totally agree. Sure, there were similar things in the functional world
before, but LINQ had some important pros:

- deferred execution by default, saving memory and time

- step-by-step syntax: each new operation goes at the end, not the beginning

- excellent type inference and IntelliSense -- JS? Ruby?...

- it works with the same syntax on the database!!! Haskell?

- map and filter were there, but GroupBy and Join were not so common in
previous query comprehension APIs

- the most important: it's actually usable in jobs you get paid for, not
experiments you make at home or at university

There are, however, two things that keep it from being 100% perfect:

- expression-tree lambdas are syntactically identical to non-expression ones,
making it hard for developers to know whether a given step is going to be
translated or executed. I would have chosen => for non-expression and -> for
expression lambdas, for example, or something like that.

- having two syntaxes, method chains and query comprehensions, produces a
frequent annoying back-and-forth, since some operators are better written in
one (let, join, group by) while others are only available in method chains
(Take, ToDictionary...)
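
His first gripe, illustrated (requires System.Linq.Expressions): the lambda
text is identical, but one compiles to a delegate that executes in C# while
the other becomes an expression tree that a query provider can translate to
SQL.

    
    
        Func<int, bool> asDelegate = x => x > 5;          // compiled IL, runs locally
        Expression<Func<int, bool>> asTree = x => x > 5;  // data structure, translatable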

------
tel
Why does this "ruin" a question?

~~~
freework
Because it spoils the interviewer's ability to feel smug.

------
kragen
So, aside from the Clojure, Mathematica, Python, Ruby, Bourne Shell, Haskell,
and Scala solutions posted in the other comments, all of which are simpler
than the C++, C#, and JS solutions, presented here with some minor cleanups:

    
    
        (take 10 (reverse (sort-by (comp first rest) (frequencies (string/split ... #"\s+"))))) ; llambda Clojure
    
        // haakon Scala
        s.split(' ').groupBy(identity).mapValues(_.size).toList.sortBy(-_._2).take(10).map(_._1)
    
        (->> (string/split s #"\s+") frequencies (sort-by val) reverse (take 10)) ; aphyr Clojure
    
        var top = (from w in text.Split(' ')  // louthy C# LINQ
                   group w by w into g 
                   orderby g.Count() descending 
                   select g.Key).Take(10);
    
        collections.Counter(s1.split()).most_common(10) # shill Python
    
        d = {}  # shill Python without collections library
        for word in s1.split(): d[word] = d.get(word, 0) + 1
        print [(x, d[x]) for x in sorted(d, key=d.get, reverse=True)][:10]
    
        words = s.split()    # spenuke and abecedarius probably O(N²) Python
        sorted(set(words), key=words.count, reverse=True)[:10]
    
        d3.entries((s.split(" ").reduce(function(p, v){  // 1wheel JS with d3
            v in p ? p[v]++ : p[v] = 1;
            return p;}, {})))
          .sort(function(a, b){ return a.value - b.value; })
          .map(function(d){ return d.key;})
          .slice(-10)
    
        # kenuke O(N²) Ruby:
        str.split.sort_by{|word| str.split.count(word)}.uniq.reverse.take(10)
    
        counts = Hash.new { 0 } # my Ruby
        str.split.each { |w| counts[w] += 1; }
        counts.keys.sort_by { |w| -counts[w] }.take 10
    
        # aaronbrethorst ruby
        str.split(/\W+/).inject(Hash.new(0)) {|acc, w| acc[w] += 1; acc}.sort {|a,b| b.last <=> a.last }[0,10]
    
        Commonest[StringSplit[string], 10]  # carlob Mathematica
    
        Reverse[SortBy[Tally[StringSplit[#]], #[[2]] &]][[;; 10, 1]] &  # superfx old Mathematica
    
        $a = array_count_values(preg_split('/\b\s+/', $s)); arsort($a); array_slice($a, 0, 10) // Myrth PHP
    
        tr -cs a-zA-Z '\n' | sort | uniq -c | sort -nr | head  # mzs and me sh
    
        -- lelf in Haskell
        take 10 . map head . reverse . sortBy (comparing length) . group . sort . words
    
        # prakashk Perl6
        .say for (bag($text.words) ==> sort {-*.value})[^10]
    
        # navinp1912 C++
         string s,f;
         map<string,int> M;
         set<pair<int,string> > S;
         while(cin >> s) {
                 M[s]++;
                 int x=M[s];
                 if(x>1) S.erase(make_pair(x-1,s));
                 S.insert(make_pair(x,s));
         }
         set<pair<int,string> >::reverse_iterator it=S.rbegin();
         int topK=10;
         while(topK-- && (it!=S.rend())) {
                 cout << it->second<<" "<<it->first<<endl;
                 it++;
         }
    
    
    

I thought I'd maybe take a look at Afterquery:
[http://afterquery.appspot.com/help](http://afterquery.appspot.com/help)

Although I haven't tested it, I think the Afterquery program to solve this,
assuming you first had something to tokenize your text into one word per row,
would be something like

    
    
        &group=word;count(*)
        &order=-count(*)
        &limit=10
    

which, though perhaps less readable, is simpler still, except for Mathematica.
More details at
[http://apenwarr.ca/log/?m=201212](http://apenwarr.ca/log/?m=201212).

Perl 5, perhaps surprisingly, is not simpler:

    
    
        perl -wle 'local $/; $_ = <>; $, = " "; $w{$_}++ for split; print @{[sort {$w{$b} <=> $w{$a}} keys %w]}[0..9]'
    

And neither is this, although it uses less code and less RAM:

    
    
        perl -wlne '$w{$_}++ for split; END { $, = " "; print @{[sort {$w{$b} <=> $w{$a}} keys %w]}[0..9]}'
    

I was surprised, attempting to solve this in Common Lisp, that there's no
equivalent of string/split in ANSI Common Lisp, and although SPLIT-SEQUENCE is
standardized, it's not included in SBCL's default install, at least on Debian;
and counting the duplicate words involves an explicit loop. So basically in
unvarnished CL you end up doing more or less what you'd do in C, but without
writing your own hash table. Lua and Scheme too, I think, except that in
Scheme you don't even have hash tables.

~~~
pjmlp
Small C++11 improvement:

    
    
         string s,f;
         map<string,int> M;
         set<pair<int,string>> S;
         while(cin >> s) {
                 M[s]++;
                 int x=M[s];
                 if(x>1) S.erase(make_pair(x-1,s));
                 S.insert(make_pair(x,s));
         }
         auto it=S.rbegin();
         int topK=10;
         while(topK-- && (it!=S.rend())) {
                 cout << it->second<<" "<<it->first<<endl;
                 it++;
         }
    
    

Surely it could be improved even further with help from lambdas and
algorithms.

------
jahabrewer
(his name is Jon Skeet)

(sorry)

~~~
navinp1912

         string s,f;
         map<string,int> M;
         set<pair<int,string> > S;
         while(cin >> s) {
                 M[s]++;
                 int x=M[s];
                 if(x>1) S.erase(make_pair(x-1,s));
                 S.insert(make_pair(x,s));
         }
         set<pair<int,string> >::reverse_iterator it=S.rbegin();
         int topK=10;
         while(topK-- && (it!=S.rend())) {
                 cout << it->second<<" "<<it->first<<endl;
                 it++;
         }

~~~
kragen
Not bad. I think it would be a little simpler and faster with:

    
    
        while (cin >> s) M[s]++;
        for (map<string,int>::iterator i = M.begin(); i != M.end(); i++) {
                 S.insert(make_pair(i->second, i->first));
        }
    

But maybe there's a downside to that approach that isn't obvious to me?

~~~
navinp1912
If you move the while (topK--) loop into the map loop, it becomes an online
algorithm for top-K, whereas what you wrote is offline. If you want offline,
then pushing into a priority_queue and popping out the top would be much
faster.

------
inzax
He should try adding AsParallel() to the expression. I bet there would be a
drastic speed-up, as long as he has a multi-core machine.
