
Start with downloading SRA toolkit: https://github.com/ncbi/sra-tools/wiki/02.-Installing-SRA-To...

Find some data of interest: https://www.ncbi.nlm.nih.gov/sra?term=(%22Homo%20sapiens%22[... (This searches SRA for human genome sequences from Illumina instruments with fastq files available)

Run fasterq-dump on the SRR (listed as "Runs" in the SRA page of your choice): fasterq-dump SRR21812682

Download a microbial genome of interest, here is the link for common yeast: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/045/GCF...

Install an alignment tool like bwa: https://github.com/lh3/bwa

Unzip the genome file and create a bwa index: gunzip GCF_000146045.2_R64_genomic.fna.gz && bwa index GCF_000146045.2_R64_genomic.fna

Align: bwa mem GCF_000146045.2_R64_genomic.fna SRR21812682.fastq (or whatever the fastq files are named)
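For anyone who would rather script the steps above than type them, here is a minimal sketch that only assembles the command lines and prints them. It assumes `bwa mem` as the alignment subcommand and a single-end run that produces one `<run>.fastq` file (paired runs usually produce `_1`/`_2` files instead). The helper name `build_commands` is made up for illustration.

```python
import shlex


def build_commands(run_id: str, genome_gz: str) -> list[list[str]]:
    """Return the command lines for the workflow steps, in order."""
    genome = genome_gz.removesuffix(".gz")  # file name left behind by gunzip
    reads = f"{run_id}.fastq"  # assumption: single-end run, one fastq file
    return [
        ["fasterq-dump", run_id],       # download reads for the SRA run
        ["gunzip", genome_gz],          # decompress the reference genome
        ["bwa", "index", genome],       # build the bwa index
        ["bwa", "mem", genome, reads],  # align; SAM output goes to stdout
    ]


if __name__ == "__main__":
    for cmd in build_commands("SRR21812682", "GCF_000146045.2_R64_genomic.fna.gz"):
        print(shlex.join(cmd))
```

Nothing is executed here; each list could be passed to `subprocess.run` once the tools are installed.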

If you get any alignment results, you've "found" fungal DNA in a human sample. This is a highly simplified workflow, but it covers the basic ideas. One of the papers is free, and its methods section covers their workflow (it is much more complicated):

https://www.cell.com/cell/fulltext/S0092-8674(22)01127-8

Useful resources: https://www.biostars.org/ https://rosalind.info/problems/list-view/?location=bioinform... https://www.cancer.gov/about-nci/organization/ccg/research/s... (source data for this paper, cancer specific sequencing data)
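To make "any alignment results" a bit more concrete, here is a rough sketch that counts mapped reads in the aligner's SAM text output. It assumes the alignment was saved as plain-text SAM and relies on the SAM FLAG field: bit 0x4 marks an unmapped read, and bits 0x100/0x800 mark secondary/supplementary records. The function name `count_mapped` and the example records are invented for illustration.

```python
from typing import Iterable


def count_mapped(sam_lines: Iterable[str]) -> tuple[int, int]:
    """Return (mapped, total) over primary alignment records in SAM text."""
    mapped = total = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & 0x900:  # skip secondary (0x100) and supplementary (0x800) records
            continue
        total += 1
        if not flag & 0x4:  # 0x4 set means the read did not align
            mapped += 1
    return mapped, total


# Made-up records: one header line, one mapped read, one unmapped read.
example = [
    "@SQ\tSN:ref1\tLN:1000",
    "read1\t0\tref1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(count_mapped(example))  # (1, 2)
```

On a real file you would pass an open file handle instead of the list, e.g. `count_mapped(open("your_output.sam"))`, where the file name is whatever you redirected the aligner output to.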




If anyone is thinking of doing this, it is both this easy and way harder.

Yes, this is an approximate workflow. It doesn’t take that much specialized knowledge to get it running.

However, step 2/3 (find/download a dataset of interest) is harder. Finding whole genome sequencing data for a cancer that you can download without being part of a research institution is difficult. There are a lot of controls over who can access raw DNA sequences from patients. RNA data are much more readily available as they are less identifiable.

Specifically, here is the type of access you need:

https://gdc.cancer.gov/access-data/obtaining-access-controll...

The reasons for this are good and I’m not trying to say otherwise. Just that from a practical perspective, being able to technically perform the analysis is doable for many non-biomedical people here. However, accessing the raw data is much more difficult.


I am reaching for something here: software is now enabling a minimal level of access to amazing corners of science and technology. We can pluck satellite photos from the ether and have whole DNA sequences on our desktops.

I feel there is a "basic bootcamp for the 21st century" that I missed.



