
Show HN: Self-Published Book – “Data Science in Production” - bweber
Hi HN,

Over the past 6 months I've been working on a technical book focused on helping aspiring data scientists get hands-on experience with cloud computing environments using the Python ecosystem. The book is targeted at readers already familiar with libraries such as Pandas and scikit-learn who are looking to build out a portfolio of applied projects.

To author the book, I used the Leanpub platform to provide drafts of the text as I completed each chapter. To typeset the book, I used the R bookdown package by Yihui Xie to translate my markdown into PDF format. I also used Google Docs to edit drafts and check for typos. One of the reasons I wanted to self-publish the book was to explore the different marketing platforms available for promoting texts and to get hands-on with some of the user acquisition tools that are commonly used in the mobile gaming industry.

Here are links to the book, with sample chapters and code listings:

- Paperback: [https://www.amazon.com/dp/165206463X](https://www.amazon.com/dp/165206463X)
- Digital (PDF): [https://leanpub.com/ProductionDataScience](https://leanpub.com/ProductionDataScience)
- Notebooks and Code: [https://github.com/bgweber/DS_Production](https://github.com/bgweber/DS_Production)
- Sample Chapters: [https://github.com/bgweber/DS_Production/raw/master/book_sample.pdf](https://github.com/bgweber/DS_Production/raw/master/book_sample.pdf)
- Chapter Excerpts: [https://medium.com/@bgweber/book-launch-data-science-in-production-54b325c03818](https://medium.com/@bgweber/book-launch-data-science-in-production-54b325c03818)

Please feel free to ask any questions or provide feedback.
======
dvt
I'm noticing no mention of virtual environments/venv. This is something I
notice a lot of (Python) junior data scientists and data engineers struggle
with. It's very important to set up environments properly (and following best
practices) to avoid version collisions, global scope pollution, etc.
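As a minimal sketch of the isolation dvt describes, assuming only the standard-library `venv` module (the same module behind `python -m venv`), the snippet below creates an environment and checks that its interpreter really runs isolated from the base installation. The helper names `create_isolated_env` and `is_isolated` are hypothetical, and `with_pip=False` is chosen only to keep the example fast.

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def create_isolated_env(env_dir: Path) -> Path:
    """Hypothetical helper: create a venv in env_dir and return its interpreter path."""
    # Symlink the interpreter on POSIX (what `python -m venv` does by default);
    # with_pip=False skips bootstrapping pip to keep the sketch fast.
    venv.create(env_dir, clear=True, symlinks=(sys.platform != "win32"), with_pip=False)
    # The interpreter lives in Scripts\ on Windows and bin/ elsewhere.
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def is_isolated(python: Path) -> bool:
    """Hypothetical helper: True if the given interpreter runs inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while sys.base_prefix
    # still points at the installation the environment was created from.
    result = subprocess.run(
        [str(python), "-c", "import sys; print(sys.prefix != sys.base_prefix)"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        python = create_isolated_env(Path(tmp) / "myenv")
        print(is_isolated(python))  # → True
```

From a shell, the usual equivalent is `python -m venv myenv` followed by `source myenv/bin/activate` on macOS/Linux; packages installed after activation land in `myenv` rather than the global site-packages.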

Great work, though! I'm also using bookdown (I instantly recognized the
template) for a book I've been working on and it's a pleasure to use. Would
love to see a blog post on how you marketed the book and how your sales are
doing once the ball gets rolling!

~~~
ShorsHammer
Given how fragmented virtual environments are in Python, which is the most
popular currently? Last I looked there were venv, pyenv, pipenv, pipx, poetry
and pipsi.

I haven't done any Python work in a while but am curious to know what people
use. Starting right now, I'd probably just stick with venv, though I'm guessing
the others do offer some extra benefits.

~~~
k4ch0w
I see conda a lot more for new Python coders in the machine learning space;
having a GUI helps newer programmers. Check out Anaconda/Miniconda. It's as
simple as:

    conda create -n myenv python=3.8
    conda activate myenv

Do all your work, then export the environment:

    conda env export > myenv.yml

On your coworker's machine:

    conda env create -f myenv.yml
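For readers not using conda, a rough standard-library analogue of `conda env export` is to pin every installed distribution with `importlib.metadata`. This is a swapped-in technique, not what conda does: it only sees Python packages (not non-Python dependencies that conda can manage), and its output only approximates `pip freeze`. The names `freeze_environment` and `export_requirements` are hypothetical helpers.

```python
from __future__ import annotations

from importlib import metadata
from pathlib import Path


def freeze_environment() -> list[str]:
    """Hypothetical helper: every installed distribution as a 'name==version' line."""
    pins = set()
    for dist in metadata.distributions():
        # .get() returns None instead of failing when a distribution's metadata is broken.
        name = dist.metadata.get("Name")
        if name:
            pins.add(f"{name}=={dist.version}")
    # Case-insensitive sort keeps the output stable between runs.
    return sorted(pins, key=str.lower)


def export_requirements(path: Path) -> None:
    """Hypothetical helper: write the pins to a requirements-style file pip can read."""
    lines = freeze_environment()
    path.write_text("".join(line + "\n" for line in lines))
```

Calling `export_requirements(Path("requirements.txt"))` writes the file; on a coworker's machine, `pip install -r requirements.txt` inside a fresh venv plays the role of `conda env create -f myenv.yml`.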

~~~
CoolGuySteve
conda can also manage packages that aren’t part of the Python ecosystem, such
as R, which is why I mainly use it.

------
lifeisstillgood
Fantastic - the end result pdf looks great.

Please blog about the details of the production process (scripts you wrote,
problems with paper sizes, or Amazon's paperback printing process). I am 30,000
words into a book and would love to hear more.

Good luck!

~~~
bweber
Sounds like there are a few requests for this, so I'll look into writing a post
on it, and also discuss my motivation for going the self-publishing route.

You can follow me on Medium for this update:
[https://medium.com/@bgweber](https://medium.com/@bgweber)

~~~
lifeisstillgood
Thank you, I look forward to it.

------
holocen
I really enjoy your articles on Towards Data Science, and this seems to pull a
lot from them. I bought the PDF copy. I have a full-stack background and really
like seeing it all from the data engineering perspective.

Thanks!

~~~
bweber
Thanks. I originally planned on covering more topics related to DevOps, such
as CI/CD for model deployment, but felt that this might be a bit of a stretch
for some readers, and it's an area where I have less experience. Glad to hear
it's useful from a full-stack perspective.

------
freediver
Way to go! Would you consider a blog post about the self-publishing
experience?

~~~
btbytes
As user dvt mentioned, this is built using bookdown[1], an R library (with the
help of Pandoc). You can see that the example chapter of this book looks
exactly like the bookdown output[2]. The bookdown PDF explains in detail how
to use Rmd+RStudio+R+Pandoc+Markdown to publish this.

[1]:
[https://bookdown.org/yihui/bookdown/](https://bookdown.org/yihui/bookdown/)
[2]:
[https://bookdown.org/yihui/bookdown/bookdown.pdf](https://bookdown.org/yihui/bookdown/bookdown.pdf)

~~~
bweber
Here's the complete source for the last text I authored using this pipeline:
[https://github.com/bgweber/StartupDataScience/tree/master/bo...](https://github.com/bgweber/StartupDataScience/tree/master/book)

You can use the same tooling to create an epub output, but the formatting will
be substantially different.

------
the_resistence
Well done. Just bought.

~~~
bweber
Please leave a review if you purchased on Amazon.

