
Show HN: Pings 8000 servers in 11 seconds: Parallel HTTP/SSH/TCP/Ping library - jeffpeiyt
https://github.com/eBay/parallec#demos
======
jeffpeiyt
Hi Uberneo, yes, we use it exactly for HTTP, among other protocols. Here is a
~20-line example that extracts data from many HTTP responses into Elasticsearch:
[https://github.com/eBay/parallec-samples/blob/master/sample-...](https://github.com/eBay/parallec-samples/blob/master/sample-apps/src/main/java/io/parallec/sample/app/http/Http3WebAgrregateToElasticSearchMinApp.java)

Parallec has a very convenient response context that lets you pass any object
in and out while handling responses. You can make scalable API calls, then
send the aggregated data anywhere: Elasticsearch, Kafka, MongoDB, Graphite,
Memcached, etc.
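A minimal sketch of the response-context pattern, following the style of the Parallec README (the host names and the map key/value choices here are made up for illustration):

```java
import io.parallec.core.ParallecResponseHandler;
import io.parallec.core.ParallelClient;
import io.parallec.core.ResponseOnSingleTask;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ResponseContextSketch {
    public static void main(String[] args) {
        ParallelClient pc = new ParallelClient();

        // Any object can be passed in via the response context; each
        // handler invocation can read and write it.
        Map<String, Object> results = new ConcurrentHashMap<>();

        pc.prepareHttpGet("/")
          .setConcurrency(1000)
          .setTargetHostsFromString("host1.example.com host2.example.com")
          .setResponseContext(results) // hand the map to every handler call
          .execute(new ParallecResponseHandler() {
              @Override
              public void onCompleted(ResponseOnSingleTask res,
                      Map<String, Object> responseContext) {
                  // Aggregate per-host results; afterwards the map can be
                  // pushed to Elasticsearch, Kafka, MongoDB, etc.
                  responseContext.put(res.getHost(), res.getStatusCode());
              }
          });

        pc.releaseExternalResources();
        System.out.println(results);
    }
}
```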

Python has a global interpreter lock, so if the work is computationally
expensive you have to use multiple processes to use more than one core.
Parallec lets you run your onComplete() handler either in the workers before
aggregation (in parallel) or in the manager after aggregation.
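Choosing where the handler runs is a one-line configuration switch. A sketch under stated assumptions: the `setResponseHandlerExecutionLocation` / `HandlerExecutionLocation` names follow the Parallec docs, but the exact import path and the hosts here are assumptions, so treat this as a configuration fragment rather than a definitive implementation:

```java
// Assumed import path; check the Parallec javadoc for your version.
import io.parallec.core.task.ParallelTaskBuilder.HandlerExecutionLocation;

// WORKER:  handler runs in each worker, in parallel, before aggregation.
// MANAGER: handler runs in the manager, after aggregation.
pc.prepareHttpGet("/status")
  .setTargetHostsFromString("host1.example.com host2.example.com")
  .setResponseHandlerExecutionLocation(HandlerExecutionLocation.MANAGER)
  .execute(handler);
```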

~~~
jstoiko
Looks really cool. Thanks for sharing!

I think what Uberneo is asking is whether Parallec would handle the HTML
parsing like Scrapy does. I believe the answer is no. You wouldn't want to
slow down Parallec with parsing, though; you would rather send the HTML output
to some other process for that, right?

~~~
jeffpeiyt
Thanks jstoiko! You are right. Parallec is not specifically built to crawl or
parse website pages recursively (though you could build such a crawler on top
of it). We mostly use it to manage (HTTP/S) agents on every production machine
in the cloud for software deployment, remediation, asset discovery, etc.
(Parallec is like a Kubernetes master managing all the kubelets/agents.)

Yes, we may or may not want to slow it down. Sometimes, if it is just a regex
or other simple parsing, we put the parser inside the worker. We can also send
the results out to Kafka etc. so that some other process/machine can handle
them.
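A sketch of cheap parsing inside the worker handler, assuming a single regex is enough; the hosts, the `<title>` pattern, and the print-instead-of-Kafka step are illustrative choices, not part of Parallec itself:

```java
import io.parallec.core.ParallecResponseHandler;
import io.parallec.core.ParallelClient;
import io.parallec.core.ResponseOnSingleTask;

import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParseInWorkerSketch {
    // Cheap parsing (a single regex) can live inside the worker handler.
    static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>");

    public static void main(String[] args) {
        ParallelClient pc = new ParallelClient();
        pc.prepareHttpGet("/")
          .setTargetHostsFromString("www.example.com www.example.org")
          .execute(new ParallecResponseHandler() {
              @Override
              public void onCompleted(ResponseOnSingleTask res,
                      Map<String, Object> responseContext) {
                  Matcher m = TITLE.matcher(res.getResponseContent());
                  if (m.find()) {
                      // This is where you would hand the parsed result to a
                      // Kafka producer (or any other sink) instead of printing.
                      System.out.println(res.getHost() + " -> " + m.group(1));
                  }
              }
          });
        pc.releaseExternalResources();
    }
}
```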

[http://www.parallec.io/docs/submit-task/#apis-on-response-ha...](http://www.parallec.io/docs/submit-task/#apis-on-response-handling)

------
uberneo
Can this be used to scrape data from websites in parallel, like Scrapy?

