
Ask HN: What would you compute on 2000 badly behaved worker nodes? - pedrombafonso
I work for CrowdProcess. We built a distributed computing platform that runs on Web Workers (badly behaved, i.e. volatile connections with some latency) to monetize websites as a replacement for advertising.

We already have more than 2000 nodes and have been getting pretty good speedups on Monte Carlo-based algorithms. We also have the simplest API ever for a distributed computing platform.

What else would you build using it?
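The post doesn't say how CrowdProcess splits Monte Carlo jobs. Here is a minimal sketch that assumes each work unit is an independent, seeded batch of samples. Under that assumption, a unit lost to a dropped connection can simply be re-sent and will reproduce the same result. The function names are hypothetical, and the units run sequentially here rather than on remote workers.

```python
import random


def pi_chunk(seed: int, samples: int) -> int:
    """One independent work unit: count random points that land in the quarter circle."""
    rng = random.Random(seed)  # a per-task seed makes the unit reproducible if it must be re-run
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(num_tasks: int, samples_per_task: int) -> float:
    # Each pi_chunk call could go to a different node; the partial counts are just summed.
    total_hits = sum(pi_chunk(seed, samples_per_task) for seed in range(num_tasks))
    return 4.0 * total_hits / (num_tasks * samples_per_task)
```

Because the only thing sent back is one integer per task, the scheme tolerates high latency well.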
======
patio11
At a previous day job we used n-queens to benchmark distributed computing, as
it is disgustingly parallelizable and produces nice visual results.
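n-queens splits cleanly because fixing the queen in the first row divides the search into n independent subtrees. The sketch below uses hypothetical function names and counts the subtrees one after another. A real deployment would hand each `work_unit` call to a different worker.

```python
from typing import FrozenSet


def count_from(n: int, row: int, cols: FrozenSet[int],
               diag1: FrozenSet[int], diag2: FrozenSet[int]) -> int:
    """Count completions, given queens already placed in rows 0..row-1."""
    if row == n:
        return 1
    total = 0
    for c in range(n):
        # Skip columns and diagonals already attacked by an earlier queen.
        if c in cols or (row - c) in diag1 or (row + c) in diag2:
            continue
        total += count_from(n, row + 1, cols | {c}, diag1 | {row - c}, diag2 | {row + c})
    return total


def work_unit(n: int, first_col: int) -> int:
    """One independent task: the queen in row 0 is fixed at first_col."""
    return count_from(n, 1, frozenset({first_col}), frozenset({-first_col}),
                      frozenset({first_col}))


def count_n_queens(n: int) -> int:
    # Each work_unit result is a single integer, so the merge step is just a sum.
    return sum(work_unit(n, c) for c in range(n))


print(count_n_queens(8))  # 92
```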

~~~
pedrombafonso
Thanks for your answer! Do you know of any real-life application for that
problem?

~~~
patio11
If you're freezing to death inside of a computer cluster n-queens will save
your life.

~~~
lucastx
laughing out loud, here.

Took me a few seconds to get this subtle one.

------
chris_va
Sigh. This doesn't seem even slightly immoral to you guys?

This is a fairly common idea, and it usually gets shot down. I am surprised
you guys made it this far into the process. Unless maybe there is some user
opt-in model?

For example, do you know how much more expensive this is (e.g. Wh/Tflop) than
traditional datacenter grid computing? Or, how you are essentially charging
users without their knowledge? I'm sure the legal system will love that one.

~~~
joaojeronimo
There is an opt-in:
[http://cdn.crowdprocess.com/opt-in.html](http://cdn.crowdprocess.com/opt-in.html)
(only one website has requested it, so it isn't in English yet).

It's as immoral as advertising, maybe even less so. With advertising, you
arrive at a web page and see lots of things you didn't want to see and that
didn't bring you to that page; they sometimes pull your focus away and annoy
you. It's the same with CrowdProcess, except that instead of annoying you, we
annoy one CPU core. We believe that, while it is more expensive than
traditional datacenter grid computing, it only has to outperform ads. We don't
compute on all the CPU cores, of course, only on one.

We actually ask websites to tell their visitors that they're part of this, but
we cannot control what they do, because they can simply use display:none.

~~~
patmcc
>>We actually ask websites to tell they're a part of this, but we cannot
control what they do because they can simply display:none.

You could certainly just check that they're using it properly. Do a screen
scrape, or even have someone hit the page every month or two, and ban anyone
abusing it by not notifying users.
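An automated version of that check could start as a simple scraper. The sketch below uses only the standard-library `html.parser`. Two things in it are assumptions:

* The notice element id `crowdprocess-notice` is a hypothetical convention that participating sites would have to adopt.
* The default script host `ss.crowdprocess.com` is the domain given elsewhere in this thread.

The checker only sees inline styles and the `hidden` attribute, so a notice hidden by an external stylesheet would slip through.

```python
from html.parser import HTMLParser
from typing import Dict, Optional


class DisclosureChecker(HTMLParser):
    """Scans a page for the worker script and for a visible disclosure notice."""

    def __init__(self, script_host: str, notice_id: str) -> None:
        super().__init__()
        self.script_host = script_host
        self.notice_id = notice_id
        self.has_script = False
        self.notice_visible = False

    def handle_starttag(self, tag, attrs) -> None:
        a: Dict[str, Optional[str]] = dict(attrs)
        if tag == "script" and self.script_host in (a.get("src") or ""):
            self.has_script = True
        if a.get("id") == self.notice_id:
            # Only inline styles are inspected; rules from stylesheets are not.
            style = (a.get("style") or "").replace(" ", "").lower()
            if "display:none" not in style and "hidden" not in a:
                self.notice_visible = True


def page_is_compliant(html: str,
                      script_host: str = "ss.crowdprocess.com",
                      notice_id: str = "crowdprocess-notice") -> bool:
    """True if the page either doesn't load the worker script or shows the notice."""
    checker = DisclosureChecker(script_host, notice_id)
    checker.feed(html)
    checker.close()
    return (not checker.has_script) or checker.notice_visible
```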

~~~
joaojeronimo
That may be harder to do efficiently than building the entire platform, and
we're an extremely small team. I'm sure we'll do it some day, but we can't
prioritize it right now.

~~~
patmcc
Really? How many websites are signed up / do you expect to sign up? Can you
not spare 5 minutes per site per month to make sure there's a notification
and/or opt-in? Or come up with an automated way to check it. Or use Mechanical
Turk and pay somebody $0.50 per site to check for you.

If you can't prioritize something as important as running an ethical (and
law-abiding; take a very close look at the ramifications of unauthorized
computer usage, which I think it could be argued you're doing with this
platform) business, then you really shouldn't be in business.

~~~
pgfonseca
It isn't so much a matter of being able to verify as of being able to
enforce. Currently the platform is supplied by quite a few websites (over
100), and the best way to get them to communicate this adequately is through
proper incentives.

The incentives must be: if you do not comply, your content won't be monetized
(as would happen with ads).

------
ris
Could you give me a list of all your CDN domains so I can blacklist you?

~~~
joaojeronimo
Sure! You'll want to block ss.crowdprocess.com

An opt-out is under way. One website requested an opt-in, but we haven't
translated it yet:
[http://cdn.crowdprocess.com/opt-in.html](http://cdn.crowdprocess.com/opt-in.html)

------
pokoleo
What you're saying is that you have a botnet.

Look at botnet owners.

~~~
joaojeronimo
That's actually pretty good advice, except it may be hard to find legal and
moral things to run. We wanted to find the cure for cancer and not produce
rainbow tables or do DDoS attacks.

~~~
glimcat
It's also hard to find legal and moral justification for constructing a botnet
in the first place.

~~~
joaojeronimo
Well, we noticed that Flash ads sometimes take up even more than 100% of a CPU
(meaning they can spawn threads and use multiple cores), and video ads perhaps
even more, since they may be GPU-accelerated, as CSS3 animations can be. We
figured that if people are spending this many CPU cycles on advertising, then
why not clean up all the advertising and use those cycles for protein folding
and finding a cure for cancer, making website owners, visitors and a group of
researchers happy?

------
bdcs
If you can use GPUs then I think scrypt-based coin mining will be the most
profitable thing to do. If not, then you need to find problems that are
relatively fast on CPUs compared to GPUs, parallelizable, and low bandwidth.
It will be a small intersection, but there will likely be something.

~~~
pedrombafonso
We're not on GPUs yet. We're definitely searching for that small intersection
you mentioned.

Do you have any clues on how to find it?

------
tlarkworthy
CPLEX and Gurobi cost a lot but are used by big companies to solve
mixed-integer linear programming problems. You can exploit parallelism in the
MIP part.

Operations use MIPs a lot, usually via an Excel spreadsheet :s However, people
expect a good interface to these problems; it's not trivial.
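Parallelism in the MIP part usually means branch and bound. Fixing some integer variables splits the search tree into subproblems that can be solved separately.

Below is a toy sketch on a 0/1 knapsack, which is a small integer program. It is not how CPLEX or Gurobi actually distribute work. Each subtree here starts without knowing the other subtrees' best solution. A real solver would share that incumbent between workers so it could prune more. Item weights are assumed to be positive.

```python
from itertools import product
from typing import List, Sequence, Tuple

Item = Tuple[int, int]  # (value, weight); weights assumed > 0


def fractional_bound(items: Sequence[Item], i: int, capacity: int, value: int) -> float:
    """LP-relaxation bound: fill remaining capacity greedily, the last item fractionally."""
    bound = float(value)
    for v, w in items[i:]:
        if w <= capacity:
            capacity -= w
            bound += v
        else:
            return bound + v * capacity / w
    return bound


def solve_subtree(items: Sequence[Item], i: int, capacity: int, value: int, best: int) -> int:
    """Depth-first branch and bound over items[i:]; returns the best value found."""
    if i == len(items):
        return max(best, value)
    if fractional_bound(items, i, capacity, value) <= best:
        return best  # prune: this branch cannot beat the incumbent
    v, w = items[i]
    if w <= capacity:
        best = solve_subtree(items, i + 1, capacity - w, value + v, best)  # take item i
    return solve_subtree(items, i + 1, capacity, value, best)  # skip item i


def knapsack_parallel(items: List[Item], capacity: int, split_depth: int) -> int:
    # Sort by value density so the fractional bound is valid.
    items = sorted(items, key=lambda it: it[0] / it[1], reverse=True)
    split_depth = min(split_depth, len(items))
    results = []
    # Each fixed take/skip prefix is an independent subproblem (one per worker).
    for choice in product((1, 0), repeat=split_depth):
        cap, val = capacity, 0
        for take, (v, w) in zip(choice, items):
            if take:
                cap -= w
                val += v
        if cap < 0:
            continue  # infeasible prefix, nothing to send out
        results.append(solve_subtree(items, split_depth, cap, val, 0))
    return max(results, default=0)


print(knapsack_parallel([(60, 10), (100, 20), (120, 30)], 50, split_depth=2))  # 220
```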

------
thaumaturgy
Hmm. It's not clear from your documentation: is it possible to use
XMLHttpRequest from the web workers and get the response?

Because having thousands of systems as distributed web crawlers would be
really really cool.

~~~
joaojeronimo
It's not possible, sorry. It's possible if you ask us to allow a specific
address, but general access to the outside world isn't allowed. It would be
pretty cool to have distributed web crawlers, but it would also be extremely
dangerous if someone decided to use CrowdProcess for a DDoS.

~~~
thaumaturgy
Seems like that could be handled by your API, if you throttle requests by
request domain. But, no worries, just curious.
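Per-domain throttling could look something like the following: a token bucket per target host, which a crawler API might consult before letting a worker fetch a URL. This is a hypothetical sketch; `DomainThrottle` and its parameters are made up.

* It keys on the exact hostname, so each subdomain gets its own budget.
* Its state lives in one process. A real service would need the limits shared across servers.

```python
import time
from typing import Callable, Dict
from urllib.parse import urlsplit


class DomainThrottle:
    """Token bucket per domain: refills `rate` requests per second, holds at most `burst`."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens: Dict[str, float] = {}
        self.last: Dict[str, float] = {}

    def allow(self, url: str) -> bool:
        domain = urlsplit(url).hostname or ""
        now = self.clock()
        # Refill according to the time elapsed since this domain was last seen.
        elapsed = now - self.last.get(domain, now)
        tokens = min(float(self.burst), self.tokens.get(domain, float(self.burst)) + elapsed * self.rate)
        self.last[domain] = now
        if tokens >= 1.0:
            self.tokens[domain] = tokens - 1.0
            return True
        self.tokens[domain] = tokens
        return False
```

Injecting the clock keeps the limiter deterministic to test; in production the default `time.monotonic` is used.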

------
Everlag
_coin_ miners, where _coin_ is any coin using scrypt?

~~~
joaojeronimo
That's a pretty good one, especially since mining Litecoins is still
profitable on EC2 (at least it was the last time I checked). I was planning to
write something like that over the weekend but couldn't; I'll have to do it
pretty soon...
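Here is a rough sketch of how scrypt mining might be split into work units: each worker gets a nonce range and scans it. It assumes Litecoin-style parameters:

* N=1024, r=1, p=1, with a 32-byte output.
* The 80-byte block header serves as both password and salt.
* The nonce is the header's last 4 bytes, little-endian.

The target in the usage example is a toy value. Talking to a real pool or node is out of scope. So is the question of whether any of this pays on browser CPUs. `hashlib.scrypt` requires a Python built against OpenSSL 1.1 or newer.

```python
import hashlib
import struct
from typing import Optional


def scrypt_pow_hash(header: bytes) -> bytes:
    # Litecoin-style proof of work: scrypt(header, salt=header, N=1024, r=1, p=1, 32 bytes).
    return hashlib.scrypt(header, salt=header, n=1024, r=1, p=1, dklen=32)


def search_nonces(header_prefix: bytes, start: int, end: int, target: int) -> Optional[int]:
    """One work unit: scan nonces in [start, end) and return the first whose hash is below target.

    header_prefix is the 76-byte header without its trailing 4-byte nonce.
    """
    for nonce in range(start, end):
        header = header_prefix + struct.pack("<I", nonce)
        # The hash is compared as a little-endian 256-bit integer, as with Bitcoin-style targets.
        if int.from_bytes(scrypt_pow_hash(header), "little") < target:
            return nonce
    return None
```

Each worker only needs the 76-byte prefix, a range and the target, and returns at most one integer. That matches the low-bandwidth profile suggested earlier in the thread.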

------
apw
I checked the FAQ, but didn't see an answer -- how do you prevent malicious
actors from returning bogus data?

~~~
joaojeronimo
We've thought about sending the workers puzzles that would genuinely have to
be computed by the VM, and that would either take long enough to fake or would
have changed by the time a human could decipher them and return the expected
result. So far, though, we're only ignoring bad actors and sometimes comparing
results from different workers until a quorum is reached.
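The quorum approach can be sketched as follows: send the same task to workers one at a time and accept a result once `quorum` of them agree. The names are hypothetical. The sketch assumes results are hashable and deterministic for a given task. `quorum` should be a strict majority of the replicas you are willing to spend, so two different answers can't both pass.

```python
from collections import Counter
from typing import Callable, Hashable, List, Optional


def quorum_result(results: List[Hashable], quorum: int) -> Optional[Hashable]:
    """Return the most common reported value if at least `quorum` workers agree on it."""
    if not results:
        return None
    value, votes = Counter(results).most_common(1)[0]
    return value if votes >= quorum else None


def run_with_quorum(task: int, workers: List[Callable[[int], Hashable]],
                    quorum: int) -> Optional[Hashable]:
    """Hand the same task to workers one by one until `quorum` of them agree."""
    results: List[Hashable] = []
    for worker in workers:
        results.append(worker(task))
        agreed = quorum_result(results, quorum)
        if agreed is not None:
            return agreed
    return None  # not enough agreement; the task would have to be re-queued


def honest(x: int) -> int:
    return x * x


def cheater(x: int) -> int:
    return 0  # returns bogus data without computing anything


print(run_with_quorum(7, [cheater, honest, honest], quorum=2))  # 49
```

Redundancy multiplies the cost of every task by the number of replicas. That is presumably why the reply above mentions doing it only sometimes.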

