A Pythonic way to do MapReduce using hadoop (dynamicguy.com)
7 points by ferdous on Feb 20, 2011 | 3 comments


Great overview, from top to bottom.

Just wondering, is it better performance-wise to explicitly check for key membership than to rely on exceptions?

Existing:

  for line in sys.stdin:
      name, marks = line.rstrip().split('\t')
      try:
          if agregatedmarks.get(name):
              agregatedmarks.get(name).append(marks)
          else:
              agregatedmarks[name] = [marks]
      except ValueError:
          pass
Or

  for line in sys.stdin:
      name, marks = line.rstrip().split('\t')
      if name in aggregatedmarks:
          aggregatedmarks[name].append(marks)
      else:
          aggregatedmarks[name] = [marks]
This is a common idiom I find myself using (appending to an existing list, or creating a new one, inside a larger dict).
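
One way to answer the performance question is to measure it. Below is a minimal sketch in Python 3 that times both styles with `timeit`. The function names and the sample data are made up for illustration and are not from the article, and the numbers will depend on how often keys repeat in your input, so treat it as a starting point rather than a verdict.

```python
import timeit

# Hypothetical sample data: (name, marks) pairs with many repeated names.
PAIRS = [("student%d" % (i % 100), str(i)) for i in range(10000)]


def group_with_membership(pairs):
    """Check for the key first, then append or create."""
    grouped = {}
    for name, marks in pairs:
        if name in grouped:
            grouped[name].append(marks)
        else:
            grouped[name] = [marks]
    return grouped


def group_with_exception(pairs):
    """Try the append and create the list only when the key is missing."""
    grouped = {}
    for name, marks in pairs:
        try:
            grouped[name].append(marks)
        except KeyError:
            grouped[name] = [marks]
    return grouped


if __name__ == "__main__":
    # Time each variant on the same data; absolute numbers vary by machine.
    for func in (group_with_membership, group_with_exception):
        seconds = timeit.timeit(lambda: func(PAIRS), number=20)
        print("%s: %.4f s" % (func.__name__, seconds))
```

Both functions build the same dict, so any difference in the timings comes from the lookup style alone.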


aggregatedmarks.setdefault(name,[]).append(marks)

See http://docs.python.org/release/2.5.2/lib/typesmapping.html
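
For context, here is a small sketch of that one-liner inside the reducer loop, next to `collections.defaultdict`, which is a commonly used alternative for the same append-or-create pattern. The `lines` list is hypothetical sample input standing in for `sys.stdin`, and the variable names are mine, not the article's.

```python
from collections import defaultdict

# Hypothetical tab-separated input lines, standing in for sys.stdin.
lines = ["alice\t90\n", "bob\t75\n", "alice\t85\n"]

# dict.setdefault returns the existing list, or stores and returns a new one.
with_setdefault = {}
for line in lines:
    name, marks = line.rstrip().split("\t")
    with_setdefault.setdefault(name, []).append(marks)

# defaultdict(list) creates the empty list on first access to a missing key.
with_defaultdict = defaultdict(list)
for line in lines:
    name, marks = line.rstrip().split("\t")
    with_defaultdict[name].append(marks)

print(with_setdefault)
print(dict(with_defaultdict))
```

Note that `setdefault(name, [])` builds a fresh empty list on every call, even when the key already exists, whereas `defaultdict(list)` only calls `list` for missing keys.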


Neat and minimal! I will update the code. Thanks for your comments and for stopping by :)



