
Show HN: Visualization of Longevity and Mortality - subcosmos
https://www.infino.me/mortality/usmap
======
xaa
Very cool. I think if you added a few more ways of slicing and dicing this
data, you would have a decent chance at getting it accepted to a
bioinformatics journal. For example, Oxford Journals Bioinformatics has an
"Application Note" type of submission for useful programs that aren't quite
original research. I work with one of the editors and he was impressed with
the interface and thought it would have a good shot at acceptance.

In particular, I think there should be an option to normalize the y-axis of
the graph, going beyond raw counts to the percentage of deaths within each age
group, to control for the fact that the number of deaths per bin is not the same.
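
A minimal sketch of that normalization in Python, assuming the data can be
reduced to (age group, cause, count) rows. The row layout, the function name
`percent_by_age_group`, and all numbers are made up for illustration and are
not taken from the visualization itself.

```python
from collections import defaultdict

# Hypothetical rows: (age_group, cause, death_count). Counts are invented.
records = [
    ("45-54", "heart disease", 30),
    ("45-54", "cancer", 50),
    ("45-54", "other", 20),
    ("75-84", "heart disease", 400),
    ("75-84", "cancer", 350),
    ("75-84", "other", 250),
]


def percent_by_age_group(rows):
    """Convert raw counts to the percentage of deaths within each age group."""
    totals = defaultdict(int)
    for age, _cause, count in rows:
        totals[age] += count
    # Each count is divided by the total for its own age bin, so bins of very
    # different sizes become directly comparable.
    return {(age, cause): 100.0 * count / totals[age] for age, cause, count in rows}


shares = percent_by_age_group(records)
for (age, cause), pct in sorted(shares.items()):
    print(f"{age:>6}  {cause:<14} {pct:5.1f}%")
```

With raw counts the older bin dwarfs the younger one; after normalization the
share of each cause within an age group can be read off directly.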

It would also be nice to add a little more categorization to the causes of
death, in a tree-like structure. For example, all vascular disease, with
cerebrovascular disease, CVD, etc, as subtypes.
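
One way such a tree could be represented is a nested dictionary with counts
rolled up from the leaves. This is only a sketch: the category names follow the
example above (reading "CVD" as cardiovascular disease, which is an
assumption), and `rolled_up_count` and the counts are hypothetical.

```python
# Hypothetical category tree: each key is a cause, each value its subtypes.
cause_tree = {
    "vascular disease": {
        "cerebrovascular disease": {},
        "cardiovascular disease": {},
    },
}

# Invented counts recorded at the most specific level available.
leaf_counts = {"cerebrovascular disease": 120, "cardiovascular disease": 480}


def rolled_up_count(name, subtree, counts):
    """Deaths for a category: its own count plus the counts of all descendants."""
    total = counts.get(name, 0)
    for child, grandchildren in subtree.items():
        total += rolled_up_count(child, grandchildren, counts)
    return total


vascular_total = rolled_up_count(
    "vascular disease", cause_tree["vascular disease"], leaf_counts
)
print(vascular_total)
```

Selecting a parent node would then filter on the rolled-up total, while
selecting a leaf keeps the finer-grained count.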

Also it would be nice to be able to ask questions like, "which states have the
largest (or smallest) fraction of deaths by, e.g., CVD"? Do some states have smaller
or larger gender gaps in particular diseases?
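
The state-level question could be answered with a small ranking helper along
these lines. It is a sketch under assumptions: the per-state dictionary layout,
the placeholder state names, the counts, and the function
`rank_states_by_fraction` are all hypothetical.

```python
def rank_states_by_fraction(deaths_by_state, cause):
    """Return (state, fraction) pairs sorted from largest to smallest fraction."""
    fractions = {}
    for state, causes in deaths_by_state.items():
        total = sum(causes.values())
        if total > 0:  # skip states with no recorded deaths
            fractions[state] = causes.get(cause, 0) / total
    return sorted(fractions.items(), key=lambda kv: kv[1], reverse=True)


# Invented counts for three placeholder states.
deaths_by_state = {
    "State A": {"CVD": 300, "cancer": 500, "other": 200},
    "State B": {"CVD": 250, "cancer": 150, "other": 100},
    "State C": {"CVD": 100, "cancer": 300, "other": 600},
}

ranking = rank_states_by_fraction(deaths_by_state, "CVD")
for state, fraction in ranking:
    print(f"{state}: {fraction:.0%}")
```

The same helper applied separately to male and female counts would give the
per-state gender gap as the difference between the two fractions.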

~~~
subcosmos
I DO have a much larger viz in the works! It lets you cut down by race, year
(1999-2013), and a few other metrics. I wanted to put this out first to see
how it performs on various devices. Stay tuned!

I come from an academic background. Made this in my PhD: amass-db.org

But my core passion these days is the project that this viz is hosted at
(www.infino.me). I think I can make a much bigger impact with more consumer-
facing nonprofit apps than in publishing academic articles.

~~~
xaa
Cool, I am looking forward to an improved version.

I would be interested in your rationale behind how the public would/could use
this kind of data. I think infino.me/health seems to be an example,
pointing out major risk factors behind cancer and CVD. But don't you think
this is already common knowledge?

Or is the primary goal to get people to voluntarily share their health and
genotype data to get a big dataset for analysis, and maybe eventually provide
a sort of personalized risk assessment? I wonder how the FDA views that sort
of thing.

~~~
subcosmos
In short, I built a search engine for my genome. I'm letting the rest of the
world use it ;)

I did my PhD work in diabetes genetics. It runs in my family, and I'm kinda
pissed that it's killing off a large fraction of us. The eventual goal is to
make this into some kind of communal science effort where people can
contribute open source analysis pipelines. I need the right kind of
organizational structure however to keep the data safe and centralized but
still allow open source research.

So, some kind of platform where algorithms can get in, results can get out,
but raw data stays locked up. I want a world where this kind of research
happens in the open, and not privately in biomedical corporations.

~~~
xaa
It is a good idea in general, but the idea of infosec for genetic data is a
minefield. I deal with similar limitations daily -- one of my areas is large-
scale meta-analysis of expression data, which was dandy when that data was
collected using microarrays. Now, it's RNA-seq, so a lot of that data indeed
stays locked up and you have to apply for special permission to access
individual datasets from dbGaP, making large-scale studies
difficult/impossible.

But although "algorithms in/results out" sounds good in principle, I think it
will be hard to implement in practice. You would have to make algorithms run
without network access to prevent a bulk_send_data_to_ip() type of function
from being written, but that would hamper complex programs requiring external
data.

In general I think the only realistic way forward is to take the 1000 genomes
approach of finding people who are willing to take the privacy risks of truly
open-sourcing their data. But it sounds like an interesting idea and I hope
I'm wrong and your approach turns out to be workable.

~~~
subcosmos
I like your thoughts here! Indeed I have been hoping to mostly attract people
who are willing to be fully open with their data. If I ever get big enough to
implement this 'algorithms in/results out' approach I intend to re-engage the
whole userbase and have people opt-in to crowdsourced scientific analysis. To
prevent data leaking I figured we would start with full code review, and
indeed air-gapped analysis.

It's a continually evolving thing. I imagine it would be years before I get to
that stage. Depends on if I find funding or university help.

Hit me up at info@infino.me if you'd like to chat more.

------
subcosmos
As lifespan in the modern age continues to increase, it is interesting to dig
into the leading causes of death for perspectives on where to put our research
efforts.

Made using the dc.js library. It's interactive! You can click on any of the
plots to refilter the data.

~~~
adrianN
I think it's worthwhile not to put all our efforts into curing a particular
cause of death. First, because this doesn't necessarily increase quality of
life for the elderly, which I think is hugely important. Second, because
after a certain age all kinds of things break down, and the current leading
cause of death (e.g. cancer, heart disease, Alzheimer's) is likely just an
effect of general accumulation of damage in the body.

~~~
subcosmos
Agreed completely. My deepest passion is aging biology. What I learned,
however, in studying type II diabetes and obesity is that the molecular
mechanisms are hugely aligned. It's all the same core metabolic genes that the aging
biologists focus on in worms and flies. Those are the genes that seem to be
related to insulin signaling and diabetes.

My core vision of this project is to better understand this overlap.

------
kbenson
The age of death graph cuts off the values of the y axis on the left, so all I
can see is a bunch of "00,000" values. I can only really make relative
assessments to the age groups. :/

~~~
subcosmos
I'm tempted just to turn the axis off and make a box showing total counts.
Getting things to line up is always a pain. Yuck!

Thanks for reporting the bug.

------
varelse
Not to sound snarky, but uploading my Fitbit data IMO just solved the mystery
of obesity (at least for me).

I burn ~3200 calories a day and I walk >10 miles a day. This is the 100th
percentile of your data and that floors me.

So I have to say that step 1 is getting people to get off their asses and
move. It'd be nice if they cut down on red meat and sugar intake while they
were at it, but small steps, no?

~~~
subcosmos
Well, so this project of mine is in beta, which makes you the 22nd Fitbit
user. Not the best sample size :) I'm on the higher end too. Not sure how we
stand up against the 20 million other Fitbit users out there.

~~~
varelse
Cool, one thing though, you're not getting my genome. That's like giving you
my fingerprints and we've just barely met. My Fitbit data has probably already
outed me to you given what I said previously. I'm OK with that because there's
nothing but good news there. My genome? Well, it's a mixed bag like anyone
else's, hence the 20K+ steps per day as an ongoing service patch.

I do wish there were a way to anonymously calculate the very comparative
statistics you're generating, but alas I don't see one. Am I missing
something?

~~~
subcosmos
No problemo!

Really what we need is some kind of homomorphic encryption process that would
enable large scale Genome Wide Association Studies (GWAS) to be performed
without actually divulging the underlying raw data. Until that happens though,
scientists like myself have to contend with the privacy concerns of many.
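
To make the idea concrete, here is a toy sketch of additively homomorphic
encryption using the Paillier scheme, which lets a server add up encrypted
values without seeing them. It is an illustration only: the primes are far too
small to be secure, the step counts are invented, the function names are
hypothetical, and summing a few numbers is nowhere near a full GWAS. It
assumes Python 3.8+ for `pow(x, -1, n)`.

```python
import math
import random


def keygen(p, q):
    """Build a toy Paillier key pair from two primes, using g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse of lam; valid because g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under public key n."""
    if not 0 <= m < n:
        raise ValueError("message out of range")
    n2 = n * n
    while True:
        r = random.randrange(1, n)  # fresh randomness for every ciphertext
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


def decrypt(n, private_key, c):
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


# Toy primes for readability only; real keys use primes of 1024+ bits.
n, private_key = keygen(1009, 1013)

# Invented daily step counts from three users, encrypted on their own devices.
daily_steps = [22000, 8000, 12000]
ciphertexts = [encrypt(n, steps) for steps in daily_steps]

# The server combines ciphertexts without ever seeing an individual value.
encrypted_total = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_total = add_encrypted(n, encrypted_total, c)

total = decrypt(n, private_key, encrypted_total)
print(total, total / len(daily_steps))
```

Only the holder of the private key learns the total, and nobody sees any single
user's count. Percentiles and association statistics need richer operations
than addition, which is where fully homomorphic schemes would come in.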

