Cool! Like you, I wish people would make productive use of spare cycles.
Can I suggest you add a note comparing this to BOINC/gridcoin as well? I think your marketplace is a better idea, but because of the security implications dgacmu pointed out downthread, it shouldn’t be treated lightly.
Also, I really like your white paper (https://vectordash.com/primer) as it’s nicely formatted and has pretty icons. I think you should update the comparisons to the cloud providers though. Comparing a Pascal-series (1080 Ti) part to a Kepler part (K80) isn’t particularly fair, as NVIDIA charges about the same for that part as the later P100s and V100s. That is, people should be comparing what the best price/performance ratio is, and all three providers have better options. There’s a secondary question about whether the right comparison is to AWS Spot / Preemptible VMs / Low Priority VMs, but I’m not sure from a quick skim what the likely preemption rate will be with your setup.
Finally, let me echo dgacmu in saying it’s too bad that NVIDIA forces all providers to only provide the expensive parts. The best price/performance is certainly the consumer parts (that 1080 Ti compares favorably to a P100 at a fraction of the cost), but the EULA, among other things, prevents it. You’re not running a direct service / providing things in a data center, which is a cool workaround! But you should definitely be clearer about the risks of letting untrusted third-party code execute against your GPU.
Again, great project!
That said, I'd much prefer to see GPU cycles (and power usage) going towards scientific endeavors rather than the current crypto-mining boom.
Cloud computing is so ridiculously expensive for GPUs.
But there's a big problem of trust with this for ML.
How do I know you actually ran what I paid you for and not just generated random data that looks right in the shape I wanted it?
You could farm it out to two people and if the results disagree, then payment is decided by a third. But then you've just doubled the (paid) workload, and you've not really solved collusion by a significant portion of the workers.
I was doing a lot of fun deep learning projects on the side & would often Venmo my friend who was mining Ethereum to use his GPU to train my models. He made more and I paid less than AWS spot instances or Paperspace.
This is just a fun side project hoping to let people who want to train their deep learning models do it cheaply (on other people's computers!)
I had the same idea a few days ago - but in my head, the process would be wrapped up as a "cryptocurrency" where the AI researchers pay real money and the "proof of work" is useful/"real" work. I ran into 2 issues regarding trust: the first is how do you verify that the hardware owner is running the real job and not NOOP'ing and sending false results? The second issue is how do you protect the hardware from malicious jobs? GPUs have DMA access - how do you stop task submitters from rooting your box and recruiting it into an AI botnet (for free)? I ended up dismissing the idea, but if you could work out these 2 issues, there's money to be made...
Consensus. Have _n_ nodes perform the same work (if it’s deterministic), and only accept (and pay) if all the nodes match - or at least pay only the nodes that were part of the majority.
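A minimal sketch of that majority rule, assuming each node returns its output as bytes and the job is deterministic, so honest outputs are byte-identical. The names `digest` and `settle` are made up for illustration and are not taken from any real marketplace. The rule only helps while most of the n identities belong to independent operators.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Optional, Tuple


def digest(result: bytes) -> str:
    """Fingerprint a node's output so results can be compared cheaply."""
    return hashlib.sha256(result).hexdigest()


def settle(results: Dict[str, bytes]) -> Tuple[Optional[str], List[str]]:
    """Return (winning digest, nodes to pay).

    Hypothetical rule: accept only when a strict majority of nodes
    agree, and pay only the nodes that were in that majority.
    """
    digests = {node: digest(output) for node, output in results.items()}
    if not digests:
        return None, []
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(digests):
        # No strict majority: reject the job and pay nobody.
        return None, []
    paid = sorted(node for node, d in digests.items() if d == winner)
    return winner, paid


# Three nodes, one of which returns a different answer.
winner, paid = settle({"a": b"42", "b": b"42", "c": b"13"})
print(paid)
```

With two nodes that disagree there is no majority, so `settle` returns `(None, [])`, which is the point where a third run would have to decide.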
I don’t think this would be considerably different from SETI or folding@home, which have been going on for around twenty years.
For my senior project in college one of our ideas was a distributed render farm that operated like what we’re talking about. There were some additional issues there (transferring assets, preventing node owners from extracting the output [say a studio was “letting” fans donate compute time to render a feature film], etc).
Sounds vulnerable to Sybil attacks.
The real problem here, I believe, and I've seen this idea pop up several times on Hacker News, is that almost no machine learning tasks are progress-free.
If the cryptocurrency is just paid out to the person who solves the task first in a non-progress-free problem, then the person with the fastest GPU would mine all the coins and nobody else can participate. One of the key ideas behind proof of work is that if two people have the same compute, and person A has a head start, then as long as person A has not succeeded by the time person B starts, they'll have the same probability of mining a block.
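The progress-free property can be checked with a little arithmetic. The sketch below assumes each mining attempt is an independent trial that succeeds with probability `p`. The function names and the numbers are illustrative only. Exact fractions are used so the equality is not blurred by floating-point rounding.

```python
from fractions import Fraction


def p_win_within(p, k):
    """Chance of at least one success in k independent tries."""
    return 1 - (1 - p) ** k


def p_win_within_after_failures(p, k, m):
    """Same chance, conditioned on m earlier tries having all failed."""
    p_m_failures = (1 - p) ** m
    p_m_failures_then_win = p_m_failures * p_win_within(p, k)
    return p_m_failures_then_win / p_m_failures


def training_steps_left(total_steps, steps_done):
    """A training run is not progress-free: finished steps shorten the rest."""
    return max(total_steps - steps_done, 0)


p = Fraction(1, 1000)
# Person A has already burned 500 failed tries; person B starts fresh.
# Over the next 50 tries their chances are identical.
print(p_win_within_after_failures(p, 50, 500) == p_win_within(p, 50))
# A head start of 400 steps on a 1000-step training job is never lost.
print(training_steps_left(1000, 400), training_steps_left(1000, 0))
```

In the hashing case past failures buy nothing, while in the training case whoever is ahead stays ahead. That is why the fastest GPU would win every time.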
People seem to be just jumping on the crypto bandwagon and trying to come up with "useful" proof of work, but it's a pretty difficult task.
My assumption is that you would have to count on a low number of untrusted nodes in order for that to end up cheaper.
The cases of folding and SETI are particularly different because there are institutions which have an interest in funding these programs in part due to their goal being a public cause. The same clearly doesn’t apply to micro tasks if you will.
But I can imagine cases in which you can accept bad actors giving bunk results for some percentage of the calculations you run. As long as you’re rotating nodes often enough (provided that they’re from distinct actors) I imagine it could work out to be economically more feasible to spend the time to work around that bad data than it would be to directly hire fully trustable compute power.
Sounds vulnerable to Sybil attacks.
Only real way to do this is run the job in a VM with a GPU and CPU+motherboard that support passthrough (read: not consumer NVIDIA GPUs, your CPU+board must support an IOMMU and your card cannot freak out when being reset after initialization).
Don't get me wrong, I think it's a great idea. I just don't see why it needs a blockchain and all the associated trustless infrastructure. Even nicehash doesn't bother with all that.
What you see as “trying to force what should probably just be a centralized service,” I see as “innovating a new approach to powering decentralized architecture.”
I’m not saying you’re wrong. It would be easier to solve the problem using existing tool sets and more mature protocols. Yet, I’m pretty sure that the Golem team is doing something right. So there’s that. Maybe this isn’t a zero-sum thing.
I'm not at all convinced that the golem team have any particular insight to solve this obvious and common problem that everyone else doesn't have. And frankly I think that the overhead of running unnecessary infrastructure will render them price-uncompetitive to any reasonable centralised provider. In short, I predict they will fail.
But eh, they raised USD$8m and I didn't, so what do I know.
I guess that's why we're sitting in different camps.
One advantage that the Golem team has over a centralized, proprietary solution is the open nature of the project:
The Golem team doesn't necessarily need to solve every problem. Being built on top of the Ethereum Network is advantageous. If they make an appealing, open platform with potential, maybe other developers will pick up the ball and run with it to power their own ends.
In short, I predict that they will succeed.
Indeed, doesn't mean I don't want to hear the other side's point of view though!
Open source is not going to save them. They have one main problem - how to tell if people did the work they claim they did? If centralised, they can "test" new users or perhaps periodically check up on long term users by secretly allocating duplicate work and verifying its content. How can you do that in public? The blockchain is actually working against them.
And who really needs a cryptographically secure attestation that on March 25th, 2018, user XYZ completed ML shard 456.7? This is a level of audit logging appropriate for a bank and basically nothing else. All you need is availability accounting of some sort. It's not rocket science. I couldn't write the client for this app but I sure as hell could write the back end and I wouldn't even think of using a blockchain. Make no mistake, their choice of technologies is for buzzword compliance, not technical necessity - a very bad sign.
There is also no need for the GNT. It solves no problem and users could just as easily be compensated in ETH or anything else. Sure, it's a funding mechanism, fine. We still haven't figured out how ICOs should even work.
Despite all the rigmarole, they have a product they need to sell like any other startup: rent us your GPU/CPU for $x/hr. Because of their overhead, I predict they will easily be outcompeted by centralised providers. People are not going to use golem over another, better-paying alternative just from the goodness of their hearts. And I cannot see any way how golem can be structurally more efficient than a centralised solution.
All said, I'm not as optimistic as you. Not like I want them to fail though, good luck to them!
Just a couple more responses:
1. The use of blockchain is not for buzzword compliance. Julian (CEO) is a longtime Ethereum supporter/developer. This project has really been in development since 2014 or so, long before blockchain was "buzzy." So the use of blockchain here is not for grabbing cash. They actually think it's a better (perhaps harder) way forward.
2. Not only does the token allow investors to directly invest in the project, it also allows developers to "print" tokens that can be locked behind smart contracts. That way developers can be rewarded for reaching project goals with bowls of their own dog food. Not bad to eat when it's pretty much "real" money.
3. The decentralized and distributed nature of the project will allow the Golem Network to achieve goals and execute code that no centralized competitor could achieve/run. I'll leave it as a thought exercise for you to speculate what those goals/codes might look like.
Thanks for the engagement. It's great to test my beliefs through debate. Time will be the true arbitrator here though. Best wishes.
As somebody who's spent a silly amount of money on EC2 spot instances to train models, I would certainly overlook the odd dodgy result for access to those GPUs at those prices.
I just hope you find a way so that the ingenious but disreputable people that seem to come when money's to be gained don't ruin it for everyone. However, I wish you every success.
I imagine you could do some kind of hardware fingerprinting, but there's nothing stopping a really bad actor from modifying the kernel to pretend to have a GPU and NaN on allocation. I suppose I'm descending into absurd levels of distrusting trust that may never happen.
I also foresee annoying customers who say they only get back NaNs but this is down to instabilities in their training and they flood any reporting of bad actors that you have.
I don't believe either are actually terminal with the right incentives.
One question, I noticed you only pay in crypto right now, do you plan to offer USD or other fiat currencies in the future? Crypto isn't a problem for me (I don't mine crypto myself but I wouldn't be opposed to carrying a passively obtained ∗coin balance and watching it appreciate over time), just curious.
Anyway, I think you have the makings of a nifty project here. Good luck!
While something like Stripe Connect may be useful, the fees are unreasonable for smaller transactions. A quick hack to cash out to crypto is to use your Coinbase wallet address as the payout address, and just sell off the crypto the moment it hits your wallet.
But seriously, it does have the lowest fees of the four supported by Coinbase, and always should (since that’s its whole raison d’être!).
Cool idea, well done!
If enough people are interested in it, I can add support since it's already supported by Coinbase/GDAX.
That would be the naive/bruteforce way to verify trust.
Just like you can find a substring faster via Boyer–Moore than a char-by-char match, there are more efficient ways to verify trust in a distributed environment like this. There's a few white papers on the topic.
Alternatively you could give X% of tasks to multiple workers for cross checking. In your example X% is 100%, but it does not have to be that high.
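A rough sketch of the trade-off behind that X%, assuming each task is independently picked for cross-checking with probability `check_rate` and that a duplicated honest run always exposes a faked result. Both function names are hypothetical.

```python
def detection_probability(check_rate, faked_tasks):
    """Chance that at least one of a worker's faked tasks gets cross-checked."""
    return 1 - (1 - check_rate) ** faked_tasks


def workload_multiplier(check_rate, replicas=2):
    """Paid work relative to running every task exactly once.

    Checked tasks run `replicas` times, unchecked tasks run once.
    """
    return 1 + check_rate * (replicas - 1)


for rate in (1.0, 0.1):
    print(
        f"check {rate:.0%} of tasks: "
        f"cost x{workload_multiplier(rate):.2f}, "
        f"catch a 50-task cheater with p={detection_probability(rate, 50):.3f}"
    )
```

Under these assumptions, checking every task doubles the bill, while checking one task in ten adds about 10% and still catches a worker who fakes many tasks with high probability. A worker who fakes only once is caught with probability X, which is where the size of the penalty starts to matter.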
The problem is you can't really issue substantial fines in this instance. I suppose you could pay less for the first few runs where verification is more likely.
I really want this to succeed, and I think these problems can be overcome.
P.S. I'm @samin100 on Twitter if you enjoy tweets about GPUs!
I ended up not pursuing it to work on a different idea, feeling like Golem would probably eat that lunch. I love the simplicity of your approach - very in the spirit of an mvp. Let me know if you'd like to see the note (teaser: an ad that says "Because not everyone has a 500 GPU cluster", but "not" is crossed out and replaced with "now", over a background image of a Go board).
Anyways, best of luck, I'll be excited to see how it turns out!
You might look into the Win 10 Linux support, Ubuntu is one of the supported distros. Not sure if it would have full access to resources, have just used it a bit at work and setup was super simple.
He said it was on their list of most requested features, but refused to give me a date on when it would be ready :/
That seems misleading ^^.
Depending on your threat model, this may be borderline OK, or may be insanely high risk. We've seen driver exploits in the recent past:
that can vector through the GPU to gain access to arbitrary host memory (aka, you're in deep trouble). The only "right" way to give access to your GPUs requires IOMMU support and running in a true guest VM, which Nvidia's consumer cards and drivers are explicitly prevented from doing (because they'd like you to buy the Tesla versions, thanks).
This may be a totally acceptable risk on an isolated mining rig, but people should be aware of the heightened risks they're exposing themselves to. Breaking your machine is well within the capabilities of state-sponsored actors on this one, and from time to time (just after vulnerabilities get announced), could be decently within advanced-script-kiddie zone.
Thank you for pointing this out.
Also, one of my friends working on this project is a sophomore in the CS dept at CMU, and given your interest in distributed systems & DL, would it be possible to meet up for a couple of minutes and discuss security a bit more in depth? (if yes, I'll shoot you an email)
But unfortunately this won't replace mining: the large scale mining farms have high end GPUs in their rigs, but the rest of the HW is very low end, because that works perfectly for mining and they want their payback time as short as possible.
I have a 6 GTX 1070 GPU rig, which would be decent GPUs for AI/ML, but the rest of the rig has a Celeron CPU, 4 GB RAM, and a 5400 RPM HDD. Oh, and to be able to see all the GPUs, I had to downgrade all the PCIe slots to 1x.
I am curious what kind of ML tasks would be able to fully utilize these GPUs, on such a low end hardware.
Stoked to see distributed-compute as a paying service making another try. One of these days, it is going to fly.
By the way, Vectordash is the most interesting idea & service I have seen in a while :)
Just replace your miner earnings with the earnings listed on http://vectordash.com/hosting
The seller faking the computation and returning a bogus result is one thing.
But even with no malicious intent, the computation can still end up bogus because the hardware malfunctions.
Commodity hardware isn't that reliable, and there are many commodity GPUs in the wild that look like they work but return incorrect results.
The key to the trust issue is not trusting the remote computer, because you can’t. You have to validate somehow.
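One way to "validate somehow" for a trained model, sketched under assumptions: the buyer keeps a small labeled holdout set private and re-scores the returned model locally. `holdout_accuracy`, `accept_result` and the 0.05 tolerance are hypothetical choices. This checks the quality of what came back, not that the seller ran the exact job.

```python
def holdout_accuracy(predict, holdout):
    """Fraction of private (input, label) pairs the returned model gets right."""
    correct = sum(1 for x, y in holdout if predict(x) == y)
    return correct / len(holdout)


def accept_result(predict, holdout, claimed_accuracy, tolerance=0.05):
    """Accept only if the locally measured accuracy backs up the seller's claim."""
    return holdout_accuracy(predict, holdout) >= claimed_accuracy - tolerance


# Toy task: label is True when the input is non-negative.
holdout = [(x, x >= 0) for x in range(-5, 5)]


def honest_model(x):
    return x >= 0


def bogus_model(x):
    return True  # right-shaped output, no actual learning


print(accept_result(honest_model, holdout, claimed_accuracy=0.95))
print(accept_result(bogus_model, holdout, claimed_accuracy=0.95))
```

The check costs the buyer a handful of local inferences rather than a second full training run, and it catches flaky hardware as well as deliberate fakes.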
It's kind of overblown. If you run well within the temperature and power limits they are still usually good for years at 24/7. Usually they run a bit hotter or slower because something has evaporated or slightly worn out, or maybe a fan died that needs to be replaced, but otherwise are fine.
If you are lending out your own GPUs presumably you can set a reasonable power and temperature limit.
I would love to let my 1080 Ti be put to use when I'm not playing games on it. Cryptocurrency mining was fairly profitable until a month or so ago, and now the power usage (even at $0.10/kWh) isn't covered by the returns after paying the tax bill. (I could always file the revenue under my LLC and deduct the electricity expenses and even depreciate the hardware cost, but without a dedicated rig and power monitoring it's a tax audit nightmare waiting to happen.)
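The break-even arithmetic looks roughly like this. Only the $0.10/kWh price comes from the comment. The 250 W draw, the $0.80/day gross and the 30% tax rate are placeholder assumptions, and tax is applied to gross revenue because no expenses are being deducted.

```python
def electricity_cost_per_day(watts, usd_per_kwh, hours=24.0):
    """Cost of running a card at a constant power draw for `hours`."""
    return watts / 1000.0 * hours * usd_per_kwh


def net_per_day(gross_usd, watts, usd_per_kwh, tax_rate):
    """Daily take-home: revenue taxed on gross, then electricity paid."""
    after_tax = gross_usd * (1 - tax_rate)
    return after_tax - electricity_cost_per_day(watts, usd_per_kwh)


# Assumed numbers, for illustration only.
power = electricity_cost_per_day(watts=250, usd_per_kwh=0.10)
net = net_per_day(gross_usd=0.80, watts=250, usd_per_kwh=0.10, tax_rate=0.30)
print(f"electricity: ${power:.2f}/day, net: ${net:.2f}/day")
```

With those placeholder inputs the card costs about $0.60 a day to run and ends up slightly in the red. Any rental income would have to clear the same bar.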
The fans are in a bit worse shape, but those tend to be easily replaceable if they fail.
Side note: AI research is more beneficial to society than running a few instances of Crysis 9 at 4K 120 Hz.