
Playing with Go: Embarrassingly Parallel Scripts - jameskilton
http://collectiveidea.com/blog/archives/2012/12/03/playing-with-go-embarrassingly-parallel-scripts/
======
jgrahamc
Yes, Go does make writing stuff like that nice, but I think the actual code
given is pretty heavyweight. This does the same thing:

    
    
      package main
    
      import (
        "fmt"
        "net"
        "io/ioutil"
        "strings"
      )
    
      func main() {
        file_in, _ := ioutil.ReadFile("domains.txt")
        domain_list := string(file_in)
    
        done := make(chan bool)
        count := 0
    
        for _, domain := range strings.Split(strings.TrimSpace(domain_list), "\n") {
          go func(d string) {
            ipAddresses, _ := net.LookupIP(d)
    
            ip := ""
            if len(ipAddresses) > 0 {
              ip = ipAddresses[0].String()
            }
    
            fmt.Println("Mapping: ", d, "->", ip)
            done <- true
          }(domain)
    
          count++
        }
    
        // Wait for every goroutine to report completion.
        for i := 0; i < count; i++ {
          <-done
        }
      }

~~~
dadkins
There's a library for that pattern: <http://golang.org/pkg/sync/#WaitGroup>

Instead of counting and using a done channel,

    
    
        import "sync"
        ...
        var wg sync.WaitGroup
        for ... {
            wg.Add(1)
            go func() {
                defer wg.Done() // without Done, wg.Wait() never returns
                dowork()
            }()
        }
        wg.Wait()

~~~
jameskilton
Nice. Yeah, I'm very new to Go, so I know almost nothing about the standard
library. Thanks for the pointer.

~~~
darkhelmetlive
Tooting my own horn here, but I'm working on a book covering the Go Standard
Library. Still in progress, and not at the sync package yet, but it's coming
along. Check it out if you feel inclined.

<http://thestandardlibrary.com/go.html>

~~~
lsb
You know what'd be cool? To take arbitrary code in a language, pattern match
on the implementation in that language of each std lib function, and actively
recommend substitutes for duplicated code.

~~~
jlgreco
That would be very cool. I wonder how hard it would be, though. At least in Go,
I know that the standard lib contains lots of duplicated code, primarily so
that things that should be small don't require larger things as dependencies.
I think the time package's String() functions reimplement some fmt package
functionality, for example, since fmt is a much larger dependency than time
should have.

~~~
skybrian
Yes, but outside the Go standard libraries, adding a dependency on a standard
library isn't a big deal and won't add a cycle.

~~~
jlgreco
Yes. What I mean though is that if you are reimplementing, say, _fmt.Printf_ ,
such a suggestion system might correctly suggest you use _fmt.Printf_ instead,
but also suggest you can use _func (m Month) String() string_ from _time_ , or
something equally silly.

Since the standard libs in Go duplicate code, you would have to be careful
that your suggestion system isn't picking up false positives. I think the idea
has a lot of promise though.

------
hannibalhorn
Go is great for stuff like this, especially when part of an actual system.

That said, if this were just an ad hoc job (to figure out which domains point
to a specific IP address), you could just use "xargs -P" or GNU parallel and it
becomes a pretty basic shell script, along the lines of:

    cat domains.txt | xargs -P 1000 -n 1 host

~~~
skeltoac
Until last week I didn't know that xargs could invoke commands in parallel.

    
    
      xargs -n1 -P8 dig A <hosts.txt | grep -v ';' | grep $TARGET_IP | sed 's/\.\s.*//' >hosts-matched.txt

~~~
scurry
So what's the difference between xargs and parallel? I thought the point of
parallel was that it was xargs with the addition of running things in
parallel. But if xargs can do that already, is there any reason to use one
over the other?

~~~
eggoa
It's mostly about how they handle special characters.

<http://www.gnu.org/software/parallel/man.html#differences_between_xargs_and_gnu_parallel>

------
danneu
My attempt with Ruby and Celluloid.

<https://gist.github.com/a803d86234e8d1fc5496>

I also include a list of 100 domains in a domains.txt if anyone wants to try
for themselves.

    
    
        require "socket"
        require "celluloid"
    
        class IPGetter
          include Celluloid
    
          def get(url)
            Socket.getaddrinfo(url, "http")[0][2]
          end
        end
    
        pool = IPGetter.pool(size: 100)
        ips = {}
    
        File.open("domains.txt").each_line do |line|
          line.chomp!
          ips[line] = pool.future.get(line)
        end
    
        ips.each do |url, ip_future|
          puts "#{url} => #{ip_future.value}"
        end

~~~
ericmoritz
Or Erlang

<https://gist.github.com/bb01a85404e6b445dcb3#file_resolve_domains_plist.erl>

    
    
        % -*- erlang -*-
        %%! -smp enable
    
        % Resolve one hostname and print "host => ip".
        worker(Hostname) ->
            {ok, IP} = inet:getaddr(Hostname, inet),
            io:format(
              "~s => ~s~n",
              [Hostname, ip_to_string(IP)]
             ).
    
        ip_to_string({N1,N2,N3,N4}) ->
            io_lib:format(
              "~w.~w.~w.~w",
              [N1,N2,N3,N4]
             ).
    
        main([DomainFile]) ->
            {ok, Bin} = file:read_file(DomainFile),
            String = binary_to_list(Bin),
            Domains = string:tokens(String, "\n"),
            plists:foreach(
              fun(Domain) -> worker(Domain) end,
              Domains
            ).
    
    

This uses <https://github.com/eveel/plists/>

------
dkhenry
Scala is really great for this too. The downside is the spin-up time for the
JVM, but the upside is that if you use SBT's script launcher, it will compile
and cache the script transparently for you, you can still pull in _any_ JVM
dependency, and you can run it just like a shell script. I needed to test for a
port being open in parallel and it was a cakewalk to use NIO's Selector to do
it reactively.

------
j2labs
This kind of thing is also very easy in Python if you use Gevent.

I have used Gevent a lot over the last couple years and see a lot of
similarities in Go's concurrency, which I think is great.

Concurrency doesn't have to be about insane looking code!

~~~
d0mine
With concurrent.futures
<http://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor>

it could look like:

    
    
      from concurrent.futures import ThreadPoolExecutor as Pool
      from socket import getaddrinfo
    
      def lookup(domain):
          # Resolve one domain and report the result or the error.
          try:
              result = getaddrinfo(domain, 80)
          except Exception as e:
              print("error %s -> %s" % (domain, e))
          else:
              print("done  %s -> %s" % (domain, result))
    
      nconcurrent = 20
      # Leaving the with block waits for all submitted lookups to finish.
      with open('domains.txt') as file, Pool(nconcurrent) as pool:
          for domain in (line.strip() for line in file):
              pool.submit(lookup, domain)
    

To run multiple processes instead of threads, change the import to
ProcessPoolExecutor.

To support multiprocessing.Pool (for Python 2 where concurrent.futures is not
in stdlib), replace pool.submit() with pool.apply_async() and use
contextlib.closing() around the Pool().

------
geoka9
To be fair to Ruby, it releases the GVL when a thread is blocked on IO; so for
this kind of task it wouldn't be an issue, probably.

------
lazyjones
To be fair to languages without such great parallelism support: you can do
this using asynchronous/event-loop-based code because the parallelism will be
limited by the nameserver anyway (the calling code does almost nothing, it
mostly waits for the net / the nameserver).

~~~
chimeracoder
> To be fair to languages without such great parallelism support: you can do
> this using asynchronous/event-loop-based code

Well you can _do_ with callbacks anything that you can do with channels and
goroutines. Go's primary appeal is that it makes concurrent[1] code easy to
reason about, not that it enables you to do anything that you "couldn't do"
otherwise.

Continuations are just GOTOs, and just like GOTOs, some people love them and
some people hate them, but even people who like them can find them difficult
in large doses. Goroutines and channels are nice, because they fit the
structure of imperative code, whereas callbacks sort of resemble imperative
code but "inside out".

[1] Note that I didn't say parallel!
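
To make the "inside out" point concrete, here is a rough sketch of the same two-step task written both ways in Go. Everything here is hypothetical: `lookupAsync` stands in for a callback-style API, `lookup` for a blocking one, and both return a fake string instead of doing real DNS.

```go
package main

import "fmt"

// lookupAsync is a hypothetical callback-style API: it returns immediately
// and invokes cb with the result later.
func lookupAsync(name string, cb func(addr string)) {
	go func() { cb("addr-of-" + name) }()
}

// lookup is the blocking equivalent; callers that want concurrency
// simply run it inside a goroutine.
func lookup(name string) string {
	return "addr-of-" + name
}

// chainWithCallbacks resolves a, then b. The second step has to be nested
// inside the first callback, so control flow reads inside out.
func chainWithCallbacks(a, b string, done func(result string)) {
	lookupAsync(a, func(addrA string) {
		lookupAsync(b, func(addrB string) {
			done(addrA + "," + addrB)
		})
	})
}

// chainWithGoroutine does the same work as straight-line code in a
// goroutine and reports the result through a channel.
func chainWithGoroutine(a, b string) <-chan string {
	out := make(chan string, 1) // buffered so the goroutine never blocks on send
	go func() {
		addrA := lookup(a)
		addrB := lookup(b)
		out <- addrA + "," + addrB
	}()
	return out
}

func main() {
	done := make(chan string)
	chainWithCallbacks("x", "y", func(r string) { done <- r })
	fmt.Println("callbacks:", <-done)

	fmt.Println("goroutine:", <-chainWithGoroutine("x", "y"))
}
```

Both versions produce the same result; the difference is only in how the sequencing is expressed, which gets more noticeable as steps, error handling, and loops are added.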

------
jnazario
in the DNS case asynchronous event handling would be super easy to do. in
python asyncore with something like dpkt to construct and read DNS lookups
works like a champ, as does twisted. i did a simple async DNS resolver in pure
python (asyncore, dpkt) and can sustain thousands of lookups a second. GNU
adns also has bindings in various languages.

you can get Go's parallelisms via CSP (e.g. python-csp, ruby-csp) and replace
a lot of fragile threading/parallel code with it. i've been doing that in lieu
of learning Go (i know i know .. i'm lazy) and been very pleased.

anyhow, many ways to skin cats. those are just two or three.

~~~
tptacek
You wouldn't do DNS lookups asynchronously in Go to begin with. Modeling
concurrency of any sort in Go the way you would with an event loop is usually
a code smell.

~~~
blablabla123
Not sure if I understand your point, can you explain a bit more please? Are
you saying event loops with select {} are considered code smell?

~~~
tptacek
If I had to simultaneously generalize the idea and make it specific enough to
explain it further, I'd say fiddly callback state machines are a code smell in
Go.

