
Snakemake is a beautiful project, and it evolves and improves so fast. Years ago I realized I needed to up my game from the usual Bash-based NGS data processing pipelines I was writing. Based on several recommendations I chose Snakemake. I have never regretted it. It worked perfectly on our PBS cluster, then on our Slurm cluster. I made some steps to make it run on K8s, which it supports, and most recently, I'm still/again happy with my choice for Snakemake because it (together with Nextflow) seems to be the chosen framework for GA4GH's cloud work stream's "products" like WES and TES [0]. This seems to be the tech stack that Amazon Omics and Microsoft Genomics focus on [1]. It enables many cool things, like "data visiting": just submit your Snakefile (the definition of a workflow, a DAG basically) to a WES API in the same data center where your data lives, and data analysis starts, near the data. Brilliant.
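For anyone unfamiliar with the "a DAG basically" part, here is a minimal sketch in plain Python of what resolving such a workflow amounts to. This is not Snakemake's API or syntax; the file names and the `execution_order` helper are made up for illustration, and the only assumption is that each rule declares which inputs its output needs.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical rules: each output file maps to the inputs it depends on.
rules = {
    "sample.bam": ["sample.fastq", "ref.fa"],
    "sample.vcf": ["sample.bam", "ref.fa"],
    "report.html": ["sample.vcf"],
}

def execution_order(rules):
    """Return rule outputs in an order that respects their dependencies."""
    # TopologicalSorter takes a mapping of node -> predecessors and
    # yields every node after all of its predecessors.
    order = TopologicalSorter(rules).static_order()
    # Keep only files produced by a rule; raw inputs already exist.
    return [target for target in order if target in rules]

order = execution_order(rules)
print(order)
```

A real workflow engine does far more (wildcards, timestamps, cluster submission), but the dependency ordering is the core idea.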

I owe a lot to Snakemake and Johannes Köster. I hope some day I can repay him and his project.

[0] https://www.ga4gh.org/work_stream/cloud/

[1] https://github.com/Microsoft/tes-azure




I too owe a lot of my PhD and postdoc productivity to Snakemake. It's my bioinformatics super-power, allowing me to run a complex analysis, including downloading containers (Singularity/Apptainer) and other dependencies (conda), with one command.

Great for reproducibility. Great for development. Great for scaling analyses.

Snakemake is vital infrastructure for my work.


It's fantastic, but it doesn't scale laterally particularly well compared to just Make.


What dimension are you referring to?


Large-scale reproducibility was a problem a few years back, for one. Conda and containers were a constant problem for us back then, especially if you had multiple NGS tools running in different environments. This has probably been solved by now, but we went with another workflow system.


Agreed, Conda has always been a nightmare to maintain and redeploy, whatever you put it in.


Nextflow seems to scale very well


We went with Nextflow and Galaxy


You can't go wrong with Nextflow. I've heard a lot of scientists complain that it's too hard to understand, but honestly the DSL and the scheduling model (flow-based) are just great.


100% agree, and it's wonderful to see Snakemake at the top of HN.

Snakemake is an invaluable tool in bioinformatics analysis. It's a testament to Johannes' talent and dedication that, even with the relatively limited resources of an academic developer, Snakemake has remained broadly useful and popular.

Super nice guy too, he's always been remarkably responsive and helpful. I saw him present on Snakemake back when he was a postdoc, and it really changed my approach to pipeline development.



