Hacker News
Show HN: My ML applications book, which HN helped me write (sample chapter) [pdf] (mlpowered.com)
272 points by e_ameisen 22 days ago | 37 comments

I recently published my first book! Writing a book has been a personal goal for the longest time, and Hacker News is the main reason I got to do so.

Reading interesting posts on HN daily eventually inspired me to write my own, and posting my own writing to HN showed me there was an audience for the topics I wanted to write about.

I started blogging about Machine Learning in 2017, and posted some of my writing to HN. Many posts did not do well, but some made it to the front page, sometimes even to the top spot (see https://news.ycombinator.com/item?id=18147710 , https://news.ycombinator.com/item?id=16224346 and https://news.ycombinator.com/item?id=17257143).

As a consequence, my posts ended up getting over half a million reads, which was enough to put me on O'Reilly's radar. They reached out and asked me to write a book for them. I decided to write something that follows the theme of my blog posts, and focus on practical Machine Learning advice that isn't often covered in ML classes. The book took 18 months to write, and is now out. I couldn't have done it without HN.

You can find a longer description of the contents at https://www.mlpowered.com/book/.

Thank you HN!

Congrats! Did O'Reilly give you valuable technical feedback while you were writing the book? I was in talks with publishers, but decided to self-publish (https://leanpub.com/beautiful-spark/) because I didn't get the impression that the book publishers would be able to help at all with deeply technical issues.

O'Reilly did organize the tech review process, where they found multiple reviewers to give detailed feedback on every part of the book. The O'Reilly staff themselves helped immensely with the editing process, as well as decisions about which topics would be the most worthwhile to include.

That’s great! I don’t believe it’s ever occurred to me that a publisher might reach out to a blogger.

One good use for web analytics eh.

I have a text file I’ve kept for 16 years that contains quotes from many sources and some of my own musings. I dream to one day use it to inspire myself to spout something.

Perhaps publish it on Github? Maybe someone will find a gem in there.

I have some musings [1] on Github too. Thoughts that I want to get out there but are too incoherent to become blog posts ...

[1] https://github.com/Rainymood/musings

Great riddles.

It's a more common origin story than you might think, and not just in tech publishing. I have a friend who wrote a comedy article as a paid guest post. The thing went wildly viral and the publishers came a-knocking to bookify his piece.

I got approached by \newline based on a Github repo.

Took me 3-4 months to convert it into a book and I made ~10k with it.

I got approached by Packt Publishing based on a GitHub repo. Thought it was cool at the time, but later I felt it was maybe a bit like recruiter spam on LinkedIn: I just got hit by some keyword filter, as my repo wasn't particularly exciting.

Congrats on writing the book! As I'm a student at Georgia Tech, I have access to O'Reilly's Safari. I just found the book over there and am looking forward to reading it.

GATech student here! I've added it to my playlist. I've got a couple more shiny reads if you want to check them out.


Another GATech student here. I'm taking the ML class this semester... thanks for this!

Do all Georgia Tech students have access to the Safari? (coming from a fellow GT student)

Somewhat related: a poorly known benefit of an ACM subscription [1] is that it gives you access to the entire O'Reilly's Safari library - plus ACM Digital Library.

Best $99 I spend every year.

[1] https://www.acm.org/

First of all, congratulations on finishing your book!

I've read the TOC and it looks really interesting. Is the content accessible for your average software engineer who has never touched ML in their career? (like me)

It should be! The book mostly focuses on how to apply ML in practical contexts, and less on the internal workings of ML methods.

I tried to explain the vast majority of ML concepts that I mention in the book. If you'd like to dive deeper into ML theory, I have a list of recommended books in the preface (included in the PDF preview).

Congratulations on the release.

I am a little bit skeptical about your running example, the "ML Editor": a model that helps you ask "good" questions, e.g. on StackOverflow.

Isn't that like an extremely complicated problem, I would even say AI hard? How do you want to evaluate if a question is "good" (and sure thing, it's not the number of upvotes it gets)? Is there a working example of such an editor in action, because I highly doubt that this is currently possible.

I answered the next comment down the chain, but I agree with you. As framed, there is no satisfactory solution to making a question "good".

Improving question quality could however be a product goal for StackOverflow, or for a company focusing on writing tools (Grammarly, Textio,...). The book describes a process for turning that vague product goal into a more tangible set of metrics, which lead to choosing an ML approach, and iterating on it.

Eventually there is a finished prototype, for which you can find a GitHub link in the PDF preview. It has definitely not solved how to evaluate if a question is "good", but aims to provide a narrower set of recommendations (the first chapter actually dives into an approach for this).

Didn't read the book yet, but from your description absolutely agree on the trickiness of the problem - it almost requires solving the "proper artificial intelligence" problem first, like you say.

However, I imagine it could be applied to refine an existing question? There certainly exist "obviously poor" questions on SO, and it's a good first step to make an otherwise poor question "look" like a good question by fixing trivial things like formatting and misuse of the language. It won't capture the other, high-level attributes of a genuinely good question, but some poor questions are poor in just that: formatting and language, the "requires editing" queue.

Regarding "intrinsically poor" questions, on the other hand, if everyone used the described model, readers would now have an increased cognitive load to distinguish between good and poor questions. Over time, the described model would drop in performance, as the "typical good question attributes" are used in poor questions which wouldn't have those otherwise.

(Forgive me for trivialising the concept of the quality of a question)

It's still a very interesting problem for a book. It's just as suitable for demonstrating the model development process, and it's likely very relevant to the vast majority of the readers (I imagine).

Yes, this is the approach in the book. The concept of question quality is nuanced, and does not have a clear definition. It can be easy to feel like you've solved the problem by just throwing in ML and calling it a day, but producing something useful is a real challenge.

The book covers multiple aspects of that process, from choosing an ML approach that isn't too simple or too ambitious, to iterating on a model within the context of its final use case (i.e., rather than only optimizing for a metric, testing how the model helps with its end goal).

In my experience, I've found that it is often those challenges that make or break the quality of an ML product, so the book focuses on tools to make complex problems more tractable, and less risky.

I think it's better to believe that the problem isn't hard and see how far you get than to assume it's AGI complete without being precise about why. And we only really get a sense of why we ought to strongly believe it's AGI complete if we try to make progress on the problem.

From what I’ve been reading recently, there will be some big consolidation in the sector this decade, with a handful of industries faring much better than others because of diminishing returns on investment, incoming regulations and marginal tech improvements?

I found installing everything to get up and running with this to be effectively impossible; I think using tensorflow 1.15 (and therefore requiring everyone to download Python <= 3.6) was a mistake. I've tried quite hard to get the environment running, but eventually it always fails. A shame, this looked quite interesting.

Sorry to hear that! Has tensorflow been your main issue? It is only used in one of the example notebooks, so you can skip that requirement without too much of an impact.

If I can help with troubleshooting, send me an email at mlpoweredapplications@gmail.com

In the meantime I'll bump up the version in requirements.txt to solve the conflict.

Yes, it has - I didn't have any other problems, but I did notice you recommending the usage of `pip install -r requirements.txt`. I am by no means an expert in Python, but it is my understanding that using pip this way isn't recommended and you should instead invoke `python -m pip install -r requirements.txt`, where python could be python36, python38, or just python. The same goes for virtualenv: `python -m venv ml_editor` appears to be the new way of doing this.

Interesting. I wasn't aware of that issue. It seems like after running activate, `pip` does point to the right location (`which pip` points to the virtualenv pip), but I'll look into it.
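
For anyone following along, here is a minimal sketch of the two invocations discussed above, built from Python itself. It is not the book's setup script: the helper names `venv_command` and `pip_install_command` are hypothetical, while `ml_editor` and `requirements.txt` are taken from the comments above. Running pip as `<interpreter> -m pip` ties it to that exact interpreter, which sidesteps a `pip` on PATH that belongs to a different Python.

```python
import sys


def venv_command(env_name="ml_editor", python=sys.executable):
    """Build the command that creates a virtual environment with `python`."""
    return [python, "-m", "venv", env_name]


def pip_install_command(requirements="requirements.txt", python=sys.executable):
    """Build the command that installs requirements with the pip tied to `python`."""
    return [python, "-m", "pip", "install", "-r", requirements]


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing gets installed.
    print(" ".join(venv_command()))
    print(" ".join(pip_install_command()))
```

Either list can be passed to `subprocess.run(..., check=True)` to actually execute it; after activating the environment, `python -m pip --version` reports which interpreter pip is bound to.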

Unfortunately the changes you made didn't fix my problem; pip insisted again that a correct tensorflow version wasn't available. I was, however, able to download tb-nightly, so I do at least have a version. Quite shocked tensorflow thrashes so intensely on installation - is that normal?

I'm glad you have found a workaround. Again, this is for an example script outside of the main narrative of the book, so the impact should be minimal.

If pip struggles to find a specific version, I'll usually remove the specific version requirement, and give whichever version pip finds a try by running `pip install tensorflow`
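
The "remove the version requirement" step above can be sketched as a small helper. This is only an illustration under stated assumptions: `unpin` and `unpin_all` are hypothetical names, the version numbers in the usage lines are made up, and the regex handles plain specifiers only (no environment markers or URLs).

```python
import re

# Matches the first version specifier operator in a requirement line.
_SPECIFIER = re.compile(r"\s*(===|==|~=|!=|<=|>=|<|>)")


def unpin(requirement_line, package="tensorflow"):
    """Drop the version constraint for `package`; leave other lines unchanged."""
    stripped = requirement_line.strip()
    # Everything before the first specifier operator is the package name.
    name = _SPECIFIER.split(stripped, maxsplit=1)[0]
    if name.lower() == package.lower():
        return name
    return stripped


def unpin_all(lines, package="tensorflow"):
    """Apply `unpin` to every line of a requirements file."""
    return [unpin(line, package) for line in lines]


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    example = ["numpy==1.17.4", "tensorflow==1.15.0"]
    print(unpin_all(example))
```

Writing the result back to `requirements.txt` lets pip pick whichever tensorflow build it can find for the current interpreter, at the cost of no longer reproducing the exact version the notebook was written against.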

Congratulations on sticking with it and getting it done!

I am going to purchase the book because I have never done any ML and I started working at a company that I think may benefit from applying ML to some of its products. I hope to use the book as a starting point for diving deeper into specifics of ML.

Does it matter to you where the book is purchased?

Not at all, choose whichever platform and medium will give you the best reading experience.

I hope you find the book helpful, please reach out if you have any questions or feedback.


Do you have a slack or some other place to group together readers of this book to discuss? I'd love to poke other readers' (and yours) brains as I go through this.

I do not have a slack room, but that does sound like a good idea.

For now I do have an email address provided for any questions in the book, and am reachable on Twitter (mlpowered) and GitHub.

I read this book on the O’Reilly website using their trial period and have also pre-ordered a physical copy. It is excellent and I recommend it to anyone into practical ML.

I'm so glad to hear you have been enjoying it!

I am glad you wrote it! ;)

Looks great, I think I will pick this up. Can I ask a strange question, what font is used for the text in the book? It's very pleasant to look at!

That’s a great question, and I’m not sure. Googling around led me to an HN thread where a commenter claims that it is TheSans (lucasfonts.com/fonts/the-sans) but I do not have a good enough eye to confirm!
