Launch HN: FloydHub (YC W17) – Heroku for Deep Learning
178 points by saip on Feb 16, 2017 | 85 comments
Hi HN! I’m Sai, one of the cofounders at FloydHub (https://www.floydhub.com). We're building FloydHub to be a “Heroku for deep learning”. We are in the current batch (W17) at YC. But I still like to think of FloydHub as being an HN incubated startup.

10 months ago, I was working at Microsoft and doing a lot of deep learning (DL) there. While the DL community is terrific, I was often frustrated by how difficult it was to get started and build upon others’ work. For example, running any popular Github project often started with an exercise in dependency hell. As I untangled these for myself, I wrote up some notes on setting up popular DL frameworks, which unexpectedly started trending on HN after someone posted it there (https://news.ycombinator.com/item?id=11697571). That's when I realized that engineering was a huge bottleneck in deep learning and a problem worth solving after all.

I’ve since quit my job and have been working fulltime for the last 9 months on building FloydHub to make deep learning easier. Our goal is to let the data scientists focus on the science, while we handle the engineering grunt work (provisioning and scaling infra, running reproducible experiments, enabling sharing and collaboration, supporting DL frameworks with zero setup, shipping trained models to production easily, etc.) Lots of interesting challenges - happy to talk about them!

We have a lot of work ahead, but we’re excited to share with you what we have so far! Looking forward to your feedback.

I was one of their earliest beta customers. Despite the initial quirks, the experience has been nothing but magical. It allowed me to go from code to training/model generation in one command, without any of the devops nonsense to deal with. While digging into deep learning, I honestly felt that the devops stuff was actually more complicated than the math/backprop/neural nets stuff.

It felt like a Heroku moment for me. They have the potential to do to Tensorflow what Heroku did to Rails. Super simple deploy!

Obviously their vision is much broader (with an entire eco-system/'hub', reproducibility, etc.), but to me at least the first part is super useful and exciting!

One advantage they have is that GPUs are INSANELY expensive on the cloud - they can actually make it cheaper for everyone with clever binpacking and proper termination.

My advice is that in the initial stage, they should partner with all the MOOCs to ensure that all deep learning students are using Floyd. It's cheaper, faster and the students can focus on the science. And they provide 100 free hours!

Disclosure: I've known the guys for quite a while.

I think partnering with MOOCs is a great idea. I've been taking the Deep Learning Foundations Nanodegree from Udacity and thought it would be awesome if we got some sort of discount to try this out. Course developers have been trying to get us AWS credits for a while now. What would be the best way to contact the guys?

Hey! Glad you brought this up. We've gotten quite a few requests from students enrolled in Udacity courses and have been helping get them set up & run their class projects on Floyd.

Here are our instructions for the Self Driving Car Engineer nanodegree program: https://github.com/floydhub/CarND-Term1-Starter-Kit. Happy to do the same for your class as well! How can I reach you? Feel free to mail us directly: founders@floydhub.com.

We've also reached out to folks at Udacity to see if we can offer any official support for the courses.

Thanks for the reply! I just sent you guys an email. Will take a look at what you did for the Self Driving Car Nanodegree.

Sidenote: We always say the hardest things in software are cache invalidation and naming things, but I have also found dependency tracking and reproducibility to be a problem. It's just so hard to get started with ANY non-trivial software project, and your os/tools/libs expire way before you finish the project. Software should be cheap and repeatable, but for some reason it takes active maintenance and is therefore very expensive. If a superhuman intelligence looked at us from afar, we would probably look like how ants look to us: Millions of small workers with very inefficient probabilistic behavior. Sure, we get the job done, but very slowly with a lot of waste. That said, ants have been around forever, so maybe it is the right thing to do ;)

Either way, I often find myself choosing backward compatibility and stability over innovation and polish, and choosing to learn vim and bash instead of replacing the silver bullet every year.

Shameless plug, I am working on a platform similar to FloydHub, but for frontend engineers [0]. The problem is a real one.

[0] https://pipez.io

I can definitely relate to this. Deep learning is at such an early stage; the frameworks and tooling are still maturing and evolving rapidly. This makes it really hard to reproduce others' work. Maybe there will be one winner in the frameworks war (Tensorflow?) and things will be better.

Pipez sounds really useful, good luck!

I’m a Data Scientist and I work in a large-ish corp. We have a dedicated engineering team who take care of all of our infrastructure needs - we mostly focus only on data science. I would expect most medium and large enterprises to have the same setup. Why would they use FloydHub?

I sense that you are not currently their target market the way that Heroku was not originally intended for the enterprise but have since expanded to it. There's nothing wrong with that, of course.

My first time deploying to Heroku in the summer of 2011 was a magical moment after spending a rather frustrating day futzing with a bunch of more barebones providers like Linode, AWS, Engine Yard, and a few others I can't even remember at this point. I cannot stress enough how magical it was -- I like to think I'm a competent enough developer but I wasted a long time gluing things together and never quite getting it right for some reason. Before calling it quits for the day I decided to try that Heroku thing I had been reading about and within 20 minutes I was up and running. That was impressive.

To note, I've been a happy paying customer of theirs since then, so they've made a decent chunk of change off my business over the last 6 years. If you can capture companies and projects in the infant stages, you can grow with them for quite a long time before it, if ever, becomes economical to roll your own infrastructure.

I think you said it yourself, "We have a dedicated engineering team who take care of this". I assume the goal of Floyd is to eliminate the need to build such teams and reduce cost of this type of effort. This indeed seems to be the goal of most SaaS/PaaS startups.

Smaller companies and startups, or medium sized companies that just start with ml/dl will find it very useful.

Currently working in a small startup and our data scientists (just a team of 6) are frequently fighting over the gpus in the office for time and online services atm are overpriced for our small budget.

Agreed, that is the target audience we are going after with FloydHub. If you have some time to chat about your startup, please send me an email naren[AT]floydhub.com. I would love to get your feedback and see what would make Floyd useful for companies like yours.

I mean with DLaaS then you don't need an entire dedicated engineering team... Same reason you might use AWS.

I have been using your docker container for 6 months or so now, thanks for putting it together :)

The jupyter jobs look neat, but I assume they are charged continuous time? Would be cool if somehow that only ended up charged for compute time, but I understand that would be difficult.

Are these instances guaranteed to be in a given region, for if I wanted to route more complex debug output / intermediate files to S3?

@Jupyter NB, we charge continuously right now. Charging for compute time only is possible, but an interesting engineering challenge (sandboxing, scheduling, etc.) - We’ll take this as a feature request! :) We’re all in the Oregon data center (us-west-2) now.

Thanks - glad you found it useful! The attention and feedback that I got from building dl-docker has been terrific. Definitely one of the reasons we started working on this seriously :)

Little off topic, does DL-docker work in Ubuntu on Windows for the GPU?

You mean bash on Windows? The CPU version of dl-docker will work on Windows! But unfortunately, the GPU version does not. You would need GPU passthrough, which is currently not supported. See https://github.com/NVIDIA/nvidia-docker/issues/197

Hi Naren, really glad to see this launch, your "Deploy your Trained Models" feature is awesome... what about external dataset access? I would like to access the Google YouTube ML dataset [0], how could that be done within FloydHub without uploading? Also, perhaps related, have you thought of teaming up with Kaggle [1]?

[0] https://research.googleblog.com/2016/09/announcing-youtube-8... [1] https://www.kaggle.com/c/youtube8m

Hi, Thanks a lot! We are always adding to our public datasets (with appropriate licences). So we would be very happy to set it up for you on Floyd. Agreed that uploading large datasets is not quick. Instead we recommend you download them directly into Floyd. See: http://docs.floydhub.com/home/managing_output/ for an example.

Re: Kaggle, we haven’t had a chance yet but that sounds like a great idea.

I was able to download a kaggle dataset using kaggle-cli.

Looks really amazing! Reproducibility of models and experiments is huge. It should be almost a requirement if one is going to publish results claiming SotA, etc. Seems like you could become the GitHub for deep learning in addition to the Heroku for deep learning.

Thanks, that's an awesome comment! Agreed. IMHO, end-to-end reproducibility is important from multiple angles - provenance for research, enabling collaboration, driving down costs by eliminating redundant runs of the same jobs, etc.

We were initially calling ourselves Heroku + Github for DL, but realized that was too confusing, haha.

This looks neat! We have been doing a lot of deep learning for NLP at our startup recently. Several “engineering bottlenecks” in the process are definitely worth solving: (1) managing multiple jobs (git for deep learning would be neat) and (2) collaborating, which is a pain when the team is remote. I guess this ties to (1) too.

And oh, about the time I forgot to turn off our GPU instance for a couple of weeks… racked up a nice bill...

We’ve definitely felt the bane of forgetting to turn off some really expensive GPU instances. Efficient scheduling and spinning instances up/down as required is one of the first things we built to cut down on the costs.

Git is an apt analogy. The search space of hyperparameters is usually fairly large for most DL algorithms, so a good amount of experimentation is required to tune them. Things can start to get haywire without end-to-end version control of code, data, parameters, results, environments, etc. Definitely one of the core problems we solve.
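
To make the "end-to-end version control" point concrete, here is a minimal Python sketch, not FloydHub's actual implementation. It enumerates a small hyperparameter grid and derives a deterministic ID for each run from everything that defines it. The `run_id` and `grid` helpers, the field names, and the example version strings are all hypothetical.

```python
import hashlib
import itertools
import json


def run_id(code_version, data_version, environment, params):
    """Return a short deterministic ID for one experiment run.

    All argument names are illustrative assumptions, not a real FloydHub API.
    """
    record = {
        "code": code_version,
        "data": data_version,
        "env": environment,
        "params": params,
    }
    # sort_keys makes the serialization independent of dict insertion order.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def grid(search_space):
    """Yield every combination of hyperparameters in the search space."""
    names = sorted(search_space)
    for values in itertools.product(*(search_space[n] for n in names)):
        yield dict(zip(names, values))


# Hypothetical search space and version labels, for illustration only.
search_space = {"learning_rate": [0.1, 0.01, 0.001], "batch_size": [32, 64]}
runs = {
    run_id("commit-abc123", "dataset-v1", "tensorflow-1.0", p): p
    for p in grid(search_space)
}
print(len(runs))
```

Because the ID depends on code, data, environment and parameters together, changing any one of them yields a new ID, while re-submitting an identical run maps back to the same one.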

This sounds interesting and useful - I hope you guys make it!

A couple of years ago I worked for a local cloud server provider, as a backend developer. Some of the work I did moved the company into deploying VPS instances using OpenStack. Our backend code was mainly PHP-based; so we used OpenCloud for the purpose - extending it where needed (when we started it didn't support all we needed; I extended things in such a way so that when we did need to upgrade OpenCloud, it would gracefully work without breaking anything - it was a gamble that I didn't know if it would really work - just had a hunch - 9 months in we upgraded, and it all worked perfectly).

Anyhow - at that time, seeing what we had available for servers and such (we were competing somewhat with DO) - I suggested we add support for GPU instances and maybe pivot toward an ML offering of some sort. Not gut our bread-n-butter, but offer up some kind of ML package for those that needed or wanted it.

I was shot down by management as it being too "pie in the sky" - not even demand or something like that. To be honest, I'm not even sure they understood what I was trying to convey, so maybe part of the problem was mine as well.

The company was eventually sold and I moved on, but seeing now how these kinds of services are in demand, I sometimes wonder on what "could've been". Ever since taking my first MOOC in ML (Ng's ML Class in 2011) - I've tried to interest employers in applying what (little) I know on the subject. I'm not an expert, but I'd love to apply my learning (on top of the 25+ years of software dev experience I already have). Today, I'm in the middle of the Udacity Nanodegree MOOC - I doubt much of that will transfer for my current employer, but maybe the general knowledge I'm getting of TensorFlow and Keras, among other bits, might help in the future.

I think, though, that FloydHub might be fun to play around with for future personal ML/DL projects as time goes on; I look forward to trying it out someday soon!

That’s awesome! ML to some extent, and DL to a large extent are still not considered mainstream enough by many. The big players (Google, Facebook, Baidu) seem to recognize this and invest appropriately, but the others not so much yet. That’s hopefully a sign of a burgeoning market!

Good luck with your Udacity class! If you want to play around, take a look at our guide for Neural Style Transfer: http://docs.floydhub.com/guides/style_transfer/

Hi! I'm Naren, the other co-founder of FloydHub. I'll be happy to answer any questions and really appreciate any feedback you can provide. Thanks!

Features page says "Develop interactively on the cloud using Jupyter Notebook. Your code, results and outputs are always preserved."

But http://docs.floydhub.com/guides/jupyter/ says "IMPORTANT: Floyd does not save your Jupyter notebooks after you stop the floyd job. So you need to download any relevant notebooks by selecting File > Download As menu from the Jupyter notebook."

Which is it?

The warning in the documentation page is no longer valid. We DO save the notebook files and keep them after the session is terminated. They will be part of the run output.

The docs have been updated to reflect this. Thanks for pointing this out.

The landing-page responsiveness on mobile breaks the layout at several points. Otherwise good content on there so far!

Thanks for the catch. My web development skills are definitely sub-par. Fixed the landing page now, to some extent. Hopefully it's not rendering it unreadable.

The first link to the pricing page on the FAQ points to localhost.

Everything else looks really slick!

Ah! Late night coding error :) It is fixed now. Thanks for pointing it out.

What tech did you use to build the homepage?

React + Redux for frontend. react-bootstrap + Bootswatch Paper theme for the UI. This was my first foray into anything web related, so it's good to see positive feedback.

Curious about the name - why Floyd?

When we started out, we were just building a bot (droid) to manage our deep learning workflows (scheduling jobs, logging results, etc.)

So, Flow + Droid => Floyd. Also, a hat tip to one of my favorite bands :)

So I guess we're competitors in a sense, but congrats nonetheless! We hope to launch at least a private beta of NeuralObjects like "Real Soon Now"™.

It'll be interesting to see where we decide to go down different paths, or how we take different approaches to things.

Thanks for the comment! There's lots of challenges to be solved in this space, and I'm sure there's room for all of us. Excited to see what you guys are up to. I will look forward to your beta release "real soon" :)

What was it Steve Blank said? "Startups don't die from competition with other startups, they die because they built a product nobody wants" (or something like that).

As you say, there's plenty of room out there. And we're all competing with Amazon, Google, Microsoft, etc. anyway. :-)

Running DL and sharing results is a huge pain. I feel versioning is a big challenge, also experiments with multiple architectures are a pain since large parts of the same calculations are repeated. Do you also solve this problem? (Feature request if not)

Yes, our Enterprise offering allows for building more complex workflows. Everything (code, parameters, environment and data) is versioned for reproducibility. One of the big value-adds is efficient caching of parts of the pipeline to avoid repetitions and save resources. We have noticed that this usually results in 10x increase in the number of experiments run by teams.
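
As a rough illustration of what caching parts of a pipeline can look like, here is a small Python sketch. It assumes each step is a pure function of JSON-serializable inputs, and it is not a description of how Floyd does it internally. `StepCache`, `preprocess` and `train` are hypothetical names.

```python
import hashlib
import json


class StepCache:
    """Memoize pipeline steps by a hash of the step name and its inputs."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def run(self, name, func, **inputs):
        key_src = json.dumps({"step": name, "inputs": inputs}, sort_keys=True)
        key = hashlib.sha256(key_src.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = func(**inputs)
        return self._store[key]


def preprocess(vocab_size):
    # Stand-in for an expensive data preparation step.
    return list(range(vocab_size))


def train(features, hidden_units):
    # Stand-in for training; returns a fake "score".
    return len(features) * hidden_units


cache = StepCache()
scores = []
for hidden_units in (64, 128, 256):
    # The preprocessing inputs never change, so only the first call does work.
    features = cache.run("preprocess", preprocess, vocab_size=1000)
    scores.append(
        cache.run("train", train, features=features, hidden_units=hidden_units)
    )

print(cache.hits, cache.misses)
```

Trying three architectures here runs the shared preprocessing step once instead of three times, which is the kind of repeated calculation the question is about.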

You said you worked at MS, so you're presumably pretty familiar with Azure's offering. I'm no expert (at all) but I played with it briefly and it was all pretty slick and easy to get up and running. How does FloydHub compare to that?

I assume you're talking about AzureML Studio. It's a pretty neat UI-centric tool for building machine learning workflows! It's great if you're starting out with ML, but offers little in terms of customizability. For example, it only supports R and Python, has no GPUs, no CLI, no container support for managing reproducible environments, etc. I think these are kind of deal breakers for doing deep learning :)

FWIW, I worked as a data scientist in Bing for 6 years and haven't seen/heard any other data scientist use it internally. We ended up building our own GPU clusters and going through the regular drill.

Curious how Microsoft's approach compares to what you have heard about Google's approach to ML as a service. My impression is that it looks kinda/sorta like its internal approach/infrastructure...if you squint.

Which also makes me curious about FloydHub's infrastructure. Any gory details?

Floyd’s Infrastructure runs entirely on Docker. That makes it backend agnostic (our cloud offering currently runs on AWS). Floydhub uses nvidia-docker for the deep learning jobs that require GPU. We also version the entire pipeline (code, data, params and environment) for exact reproducibility.

GPU instances are really expensive. One of the biggest challenges at the moment is reducing this cost, e.g. with Spot Instances and Spot Blocks. Still some challenges to be solved there.

We also want Floyd to be an end-to-end solution for building, training and deploying deep learning models. In that vein, we are also investing in adding support for Tensorflow serving but it has been a rough ride so far. Getting a generic solution that can host any Tensorflow model has not been straightforward.

Try deepdetect, it's easier than serving, though probably still less flexible when it comes to the TF backend.

ML-as-a-service offerings from many companies (Microsoft Cognitive Services, Google Cloud Prediction, IBM Watson, etc.) are fairly similar. They're great out-of-the-box for some domains, say English speech recognition. For others (text/image), they’re fairly easy to get started with (don’t need much training data, no managing infra, etc.) However, they are mostly black boxes and set a slightly low bar in terms of quality. Anyone doing serious AI will hit the limits of what they offer fairly quickly.

The DL community is awesome in its openness and contributions. Our goal with FloydHub, in contrast to the ML APIs, is to provide the tools for data scientists to effectively leverage this. We want to solve the engineering hurdles that come in the way of doing some cool science.

I like the ease of use to get up and running with Floydhub. What internal tools did you have to solve this at Microsoft? Were they any good? I heard Facebook has their own FBLearner Flow internally for managing their ML workflows and it's pretty neat.

I’ve heard FBLearner Flow is pretty cool for running/managing/sharing ML pipelines inside Facebook. Never seen or used it myself, but Microsoft had a similar internal tool called AEther that was very cool too. We’ve definitely taken inspiration from AEther in building Floyd.

Here’s an anecdotal story about how awesome AEther was (been a long time, so a little fuzzy on details): In 2011, Harry Shum was the VP of the Bing division at Microsoft. It was the early days of Bing (~10% market share, ~$2bn annual loss, etc.) - we had good talent, but were lagging behind Google in tech. In one of our all-hands meetings, Harry jokingly announced that if we beat Google in our core relevance metric (called NDCG), he’d take the entire Bing team, approx. 300 people strong, for a fully paid trip to Las Vegas.

Sure enough, a year later, Bing did beat Google in our core relevance metric (http://www.insideris.com/microsoft-bing-beats-google-in-the-...) and all 300 of us went to Vegas for a weekend as promised. (Spoiler: Google did eventually beat Bing back later)

The success and rapid acceleration in relevance gains was attributed in large part to the introduction of a new tool called AEther (in addition to improving ML tech and hiring top talent). AEther was an experimentation platform for building and running data workflows. It allowed data scientists to build complex workflows and experiment in a massively parallel fashion, while abstracting away all the engineering concerns. I used it a ton on a daily basis and loved it. The AEther team claimed that it increased the experimentation productivity of researchers and engineers by almost 100X. Even now, when I ask ex-Bing data scientists working at other companies about what they miss the most from their time at Microsoft, AEther is almost always in the top 3 answers.

Having seen how awesome AEther was from the inside, one of our goals is to bring its benefits to the rest of the world as well. However, having talked to a few individual data scientists and researchers over the last month, their preference seems to be CLI over GUI (while bigger companies like the GUI much better). Maybe it's one of those things you have to get used to, or maybe our implementation is clunky. So we’re making the GUI an enterprise only feature for now, while we continue to help individual data scientists through our CLI.
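
On the NDCG metric mentioned in the story above: it stands for normalized discounted cumulative gain. It scores a ranked list by summing graded relevance labels with a logarithmic position discount, then dividing by the score of the ideal ordering. Below is a minimal Python sketch of one common formulation. The exact variant Bing used is not stated in this thread, and the function names are illustrative.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with the common (2^rel - 1) gain."""
    return sum(
        (2 ** rel - 1) / math.log2(position + 2)
        for position, rel in enumerate(relevances)
    )


def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


print(ndcg([3, 2, 1, 0]))  # labels already in the ideal order
print(ndcg([0, 1, 2, 3]))  # same labels, worst possible order
```

A perfectly ordered result list scores 1.0, and any other ordering of the same labels scores lower, which is what makes it usable as a single relevance number to compare two search engines on.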

It seems like everyone and their dog wants to solve this problem; why is it going to be you?

Haha, there’s no magic and it’s difficult to say with any certainty that we’re going to make it. When we started out, we were just scratching our own itch. The AI community is amazingly open and fast paced. Maybe because of that, the tooling around it isn’t as mature as for, say, software development. More companies, IMHO, tend to be focused on building the next big thing _with_ AI, rather than the next big thing _for_ AI. With the increasing popularity of the space and the relative infancy of the state of tooling, we believe now might be a good time to tackle this problem.

That said, it’s hard to say how things are going to turn out. This is my first startup after working in the corporate world for 6 years and it’s been amazing so far. Learning a lot, and excited for what’s ahead! :)

YC Magic?

YC definitely helps in giving us a lot of credibility when talking to customers, especially enterprises. We have also been learning a ton about sales during the last couple of months - for a couple of engineers / data scientists, it has been a humbling experience. Other than that there is not really any magic here!

Skimming through the site this looks very polished for a startup offering.

Thanks! We've been iterating on it for a while. My sister is a UX designer and helped a ton. I still think there's room for improvement, but it's great to hear a positive comment about it!

Who did the artwork for your landing page? Looks great.

Nothing fancy - bought some cheap stock art and my sister, who's a UX designer, modified and brushed them up :)

Nice! There is a real need for this in the market right now--especially among students, who struggle to set up working environments. I really benefited from your early Github project aimed to make setting up DL machines easier. Curious how defensible your product will be in the event Heroku/AWS come in with a competitor?

Defending against big players with almost infinite resources is always an interesting problem. Kind of have to hope they don’t come at you head on :) That said, there might be other aspects that come into play. For example, the market is fragmented with big providers each having their own frameworks and infra that they prioritize (Google - Tensorflow/GCP, Facebook - Torch, Microsoft - CNTK/Azure, Amazon - MXNet/AWS, Baidu - PaddlePaddle, etc.) Since there is yet no clear winner, which will likely be the case for the near future, vendor lock-in is not really desirable. We intend to be backend / framework / language agnostic.

Thanks Sai, this looks great. I found myself dreading the clunky workflow with AWS GPUs, even using CLI, so this is nice to see a Heroku-like product that's a bit easier to work with.

Also, awesome 100 hours offering -- Looking forward to using FloydHub for deep learning!

Sounds good! Give it a spin and let us know what you think. If you want a reference - here's a guide we wrote to run Neural Style Transfer (http://docs.floydhub.com/guides/style_transfer/)!

You guys seem to make setting up Deep Learning infrastructure easy for data scientists which is awesome! What about situations where the end user/business isn't sure exactly how to use Deep Learning (which algorithm, how to partition the data into training and result sets, stability of results etc.) for the problem/data set at hand? Would it be possible to use FloydHub as a marketplace of sorts where I could hire a deep learning enthusiast to appropriately construct the experiment for me and then explain the results?

Yes, we are planning to build a FloydHub marketplace for talented data scientists to find gigs. We believe Floydhub can showcase their work and expertise easily (similar to StackOverflow) and help find suitable partners to work with. A wide range of industries like Medical, Oil and Finance have been collecting huge amounts of data and now with Deep Learning, they can effectively make use of that. So I definitely see a rise in demand for deep learning practitioners and we want to support them on Floydhub.

Congrats on your launch! Seems like a really slick product.

Have you taken a look at https://cloud.google.com/ml/? I assumed you would have but didn't see anyone mention it in the comments.

Do you see your product as complementary to a service like that? e.g. could you use the Google service as the execution engine for yours? or do you see that as competition?

Is this currently running on one of the available GPU cloud services (i.e. AWS, Azure, Nimbix, etc.) or on some self-hosted hardware?

Nevermind, another comment mentions AWS. You seem to pretty much run at cost then (a little less), if your GPU hours are $0.432 @ 2 cores, 32G, and a k80. Then packing a p2 instance would run you $7.2 (or $14.4, etc.) and back-of-the-napkin math says you can pack about 15 gpu jobs in a p2.8x. So your hourly cost on a job is $0.48.

Just an observation :).

Yes, we currently run on AWS! The p2.8xlarge has 8 GPUs, not 16 :) So, it would still boil down to $0.9/hr/GPU. Also, utilizing 8 GPUs concurrently is pretty difficult/inefficient for most jobs since the benefits from parallelization taper off quickly.

We are only using p2.xlarge (1 GPU) for now. Driving down the cost is really important to us. We use reserved instances, spot fleets, etc. to be at <50% of AWS pricing. Lots of interesting challenges to be solved there wrt. effective scheduling and fully utilizing resources.

We’ve been thinking about our own infrastructure. It would really drive down the cost, but obviously, comes with its own challenges :)
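
The arithmetic in this exchange can be checked with a few lines of Python, using only the figures quoted above ($7.2/hr for a p2.8xlarge, 8 GPUs per instance, $0.432 per Floyd GPU hour). The constant names are made up for illustration, and the prices are the ones stated in the thread, not current AWS pricing.

```python
P2_8XLARGE_HOURLY_USD = 7.2   # instance price quoted in the thread
P2_8XLARGE_GPUS = 8           # GPU count given in the reply
FLOYD_GPU_HOURLY_USD = 0.432  # FloydHub GPU price quoted in the thread

# Cost of one GPU for one hour if you rent the whole instance from AWS.
aws_per_gpu_hour = P2_8XLARGE_HOURLY_USD / P2_8XLARGE_GPUS

# Floyd's price as a fraction of that per-GPU cost.
ratio = FLOYD_GPU_HOURLY_USD / aws_per_gpu_hour

print(f"AWS per GPU-hour: ${aws_per_gpu_hour:.2f}")
print(f"Floyd price as a share of AWS: {ratio:.0%}")
```

With 8 GPUs the per-GPU cost comes to $0.90/hr, so $0.432 is 48% of it, which lines up with the "<50% of AWS pricing" figure rather than with running at cost.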

I guess the profit margin comes from the $99/user fee

ahhh, that's true.

Our Individual plan has no monthly fees, and is purely pay-per-use @ <50% of AWS pricing.

Have been using this for a couple of weeks and it looks really promising! It definitely feels like Heroku for deep learning!

The github page for floydhub has a lot of open source projects. What parts of floyd are not open sourced?

Deep Learning is a very open source friendly community - most of the popular frameworks/algorithms are in fact open source. In fact, Floydhub is built on top of a lot of open source projects. We definitely considered open sourcing the core components of Floyd but the Rethinkdb fiasco made us "rethink" that. Instead, we are supporting the open source community by hosting Deep Learning Docker images, popular datasets, and projects. In the future, we are planning to open source more parts of Floyd but not all (similar to Github).

It feels like a good product. I will try it soon!

One small piece of feedback. If you open your landing page from mobile, e.g. my iPhone 7, the header height is changing all the time due to the text changing dynamically. It gets hard to scroll past it and keep reading.


Thanks - fixed now! :)

Per second pricing and jupyter mode is pretty nice. Like heroku you're trying to resell AWS and stay competitive which is a challenge. Anyway, this is indeed a common problem and this looks like a solid approach.

Fully agree! In the longer run, I believe there might be some great opportunities wrt infrastructure. GPU instances are super expensive (now combine that with long runtimes). Self hosting infra at scale can drastically drive down prices 10x+ (we're already 2x cheaper than AWS). Till we get there, there are a few lower-hanging fruits to pick, like streamlining development processes and making things easier. Glad you like the Jupyter Notebooks!

This seems pretty awesome, but I can't get it working with my code. Do you have any plans to release extra documentation, or examples of how to adjust existing models to run on FloydHub?

Sorry to hear that, we are constantly improving the docs. You can find the latest version here: http://docs.floydhub.com/

If something is not clear, contact us in the communication app on the website. We will be happy to assist you in any way possible.

I was able to get a different model working, and FloydHub is great - I'm really finding it useful. I'm looking forward to seeing where you take it!

This looks really cool. I'm slightly confused about why the instance type for CPU-intensive workloads only has a single core, though?

We picked this as a starting point to assess demands, e.g. CPU vs. GPU, before scaling out. We'll be adding more tiers very soon. What kind of instance would suit your needs?

Congrats with the launch!

Congrats on the launch!
