
The parent is right: saying that a database about proteins 'looks like one narrow niche knowledge base' is just nonsense in this context.



In which context? And why should your bold statements be trusted blindly?


TL;DR: I explain what proteins are and why they're important, show some biological "flowcharts", and end with one "function definition" from the "source code" that makes you you, one that has to do with you eating and breathing.

Long:

This is the bit that makes biology awesome to me, so please excuse the small essay ;-)

Proteins are basically extremely advanced nanomachines that work together in larger systems to ultimately form a cell. Having a listing of all the proteins in a cell (the sum of the parts) is insufficient to grok the whole, but it's pretty darn important. Their abilities and limitations determine and constrain what a cell can do, and ultimately influence what organisms and ecosystems are and are not capable of. That is a big chunk of the science of biology.

Uniprot lists many/all of the genes/proteins that have been decoded so far. It's a bit odd to call that a "niche" in the context of the field of biology.

I'm not going to discourage you though: Proteins and protein systems are pretty darn awesome!

Example:

(using KEGG rather than uniprot, since it has graphical maps, which are handy for building intuition)

For instance, if you want to know why you need to eat and why you need to breathe (flowchart for what your cells do with starch and oxygen):

* https://www.genome.jp/pathway/map00500 start by finding starch on this map (it gets split into glucose)

* https://www.genome.jp/pathway/map00010 which gets broken down into 2 pyruvate (glycolysis)

* https://www.genome.jp/pathway/map00020 which gets processed further in the citrate (TCA) cycle

* https://www.genome.jp/pathway/map00190 and ultimately "burned" with oxygen (oxidative phosphorylation).
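To make the "flowchart" idea concrete, here is a minimal sketch that treats the chain above as a tiny directed graph and traces a route from starch to ATP with breadth-first search. The node names and the collapsing of each map into a single edge are my own heavy simplification (a hypothetical model, not KEGG data); only the map IDs come from the links above.

```python
from collections import deque

# Hypothetical, heavily simplified summary of the four KEGG maps linked above.
# Each edge reads: substrate -> (product, KEGG map where that conversion is drawn).
# The real maps contain many intermediates, each step catalyzed by its own enzyme(s).
PATHWAY = {
    "starch": [("glucose", "map00500")],
    "glucose": [("pyruvate", "map00010")],  # glycolysis: 1 glucose -> 2 pyruvate
    "pyruvate": [("NADH", "map00020")],     # TCA cycle: carbon leaves as CO2, electrons are loaded onto NADH (and FADH2)
    "NADH": [("ATP", "map00190")],          # oxidative phosphorylation: O2 is the final electron acceptor
}


def trace(start, goal, graph=PATHWAY):
    """Breadth-first search from start to goal.

    Returns a list of (substrate, product, kegg_map) steps, or None if goal is unreachable.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, steps = queue.popleft()
        if node == goal:
            return steps
        for product, kegg_map in graph.get(node, []):
            if product not in seen:
                seen.add(product)
                queue.append((product, steps + [(node, product, kegg_map)]))
    return None


for substrate, product, kegg_map in trace("starch", "ATP"):
    print(f"{substrate} -> {product}  (https://www.genome.jp/pathway/{kegg_map})")
```

BFS is overkill for a straight chain, but the same function keeps working if you add the branches and side routes the real maps are full of.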

Each step (the rectangles) is a 'chemical reaction catalyzed by proteins'[1]. You can dig deeper to find your actual source code. Say we click on a random step (in this case, near the top of glycolysis on map00010):

* https://www.genome.jp/entry/K01810+K06859+K13810+K15916+5.3....

At the bottom you can find the gene listed for Homo sapiens (hsa):

* https://www.genome.jp/entry/hsa:2821

This lists the amino-acid (AA) sequence of the protein and the nucleotide (NT) sequence of the gene as found in humans. Since this functionality is highly conserved, that is probably (almost) exactly the source code you have in each of your cells.
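The AA and NT sequences on such an entry are two views of the same "code": the protein is what you get by reading the coding nucleotide sequence three bases (one codon) at a time through the genetic code. A minimal sketch of that translation step, assuming the input is a coding sequence that starts in frame at its first base and using the standard genetic code (NCBI translation table 1); the toy sequence below is made up, not taken from any KEGG entry.

```python
# Standard genetic code (NCBI translation table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nt_seq, stop_at_stop=True):
    """Translate a coding nucleotide sequence into a one-letter amino-acid string.

    Assumes nt_seq is a coding sequence read in frame from its first base.
    '*' marks a stop codon; 'X' marks a codon containing non-ACGT characters.
    A trailing partial codon is ignored.
    """
    nt_seq = nt_seq.upper().replace("U", "T")  # accept RNA as well as DNA
    protein = []
    for pos in range(0, len(nt_seq) - 2, 3):
        aa = CODON_TABLE.get(nt_seq[pos:pos + 3], "X")
        if aa == "*" and stop_at_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCTTGGTAA"))  # hypothetical toy sequence: ATG GCT TGG TAA
```

Real genes add complications this ignores (introns, alternative start codons, alternative genetic codes in mitochondria), so treat it as an illustration of the idea rather than a gene annotator.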

KEGG's maps are a nice way to get an overview of some of the pathways that are fully understood.

[1] calling it a "chemical reaction" is sort of underselling many proteins. Proteins can have moving parts and can work together. I prefer to think of them as sophisticated nanomachines.


> Uniprot lists many/all of the genes/proteins that have been decoded so far. It's a bit odd to call that a "niche" in the context of the field of biology.

I actually checked uniprot. Yes, it lists proteins (probably most of them), but its ontology is rather narrow: it has a few dozen properties. You can't, for example, query that DB with the question "give me diseases which can be attributed to broken pathways synthesizing protein X"; you would need to do a lot of manual work and check external databases of uncertain quality.

Another question is the quality of that dataset: why is it so obvious that all those millions of pathways for hundreds of thousands of proteins have been researched and described with 100% accuracy?
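The "manual work" described above is essentially a join across resources: protein to pathways in one place, pathway to diseases in another. A minimal sketch of that two-hop join, using hypothetical in-memory tables with placeholder IDs (not real UniProt or KEGG data, and not any database's actual API):

```python
# Hypothetical toy tables; in practice each would come from a different database,
# each with its own identifiers and its own quality guarantees.
protein_to_pathways = {
    "protein_X": ["pathway_A", "pathway_B"],
    "protein_Y": ["pathway_B"],
}
pathway_to_diseases = {
    "pathway_A": ["disease_1"],
    "pathway_B": ["disease_2", "disease_3"],
}


def diseases_for_protein(protein_id):
    """Join protein -> pathway -> disease; returns a sorted list of disease IDs."""
    found = set()
    for pathway in protein_to_pathways.get(protein_id, []):
        found.update(pathway_to_diseases.get(pathway, []))
    return sorted(found)


print(diseases_for_protein("protein_X"))
```

The code is trivial; the hard part in real life is getting the two tables to agree on identifiers and trusting what is in them.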


> you can't for example query that DB with question: give me diseases which can be attributed to broken pathways synthesizing protein X, you would need to do a lot of manual work and check external databases of uncertain quality.

nod

Uniprot is more useful if you're looking for the actual "bare metal" NT and AA sequences. Which is rather important in its own right, obviously: sooner or later you DO need the actual sequences if you're going to do something with them in real life.

But uniprot doesn't -itself- give you an understanding of what that code is then doing.
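When you do pull those "bare metal" sequences, sequence databases (UniProt among them) commonly hand them out as FASTA text: a `>` header line followed by one or more sequence lines. A minimal parser sketch, assuming the header text is unique per record; the sample record in the usage line is made up.

```python
def parse_fasta(text):
    """Return {header: sequence} for FASTA-formatted text.

    The header is everything after '>' on its line; the following sequence
    lines are concatenated. Blank lines are skipped. Assumes unique headers.
    """
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


print(parse_fasta(">toy_protein hypothetical\nMKT\nAYI\n"))
```

For anything beyond a quick look, an established library such as Biopython's `SeqIO` handles the format's edge cases more thoroughly.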



