Run the UniProt SPARQL service on it ;) I was lucky enough to test the YarcData implementation of SPARQL on an XMT machine around 2012: super UI, but single user.
It doesn't. Most people complaining about the "ugliness" of Rust are either too lazy to read the documentation (which is great, btw) or just don't like the syntax because it doesn't resemble their favorite language[s]. Some have valid criticism, but they usually lay it out straight, sometimes in the form of a blog post, instead of vague "Rust is bad" statements.
I like Rust a great deal and really enjoy using it, but I also think it's kind of ugly. Like I'm talking surface level aesthetics, purely subjective. I can't quite put my finger on why. I think it's to do with the heavy use of special characters?
I mean, don't act like it's impossible for someone to have a well-informed opinion that Rust is bad. It is an opinion, and for some people, Rust is bad! I've given it multiple shakes and can't see myself using it. Don't like the values, the syntax, or the design.
It's just a fact that I can't imagine a situation where I'd use Rust over Haskell. I'd sooner generate a C program via a Haskell eDSL than use Rust if I were to do embedded work, for instance.
And aside from embedded, Haskell clears Rust kind of comically if I lay out a matrix of what I care about in a PL.
You don't have to exhaustively defend your opinion though. "I think X is bad/ugly/etc. $HIGH_LEVEL_COLOR" is plenty imo. You don't have to prove your opinions.
You're jumping a few steps ahead, and that's not what we're asking for.
> you can't even properly use its functions to build capital-A abstractions because it's compiler is too dumb to optimize them properly
This line in particular is what (multiple) people are indicating makes no sense. You don't have to exhaustively defend your opinion, but you could write a more insightful opinion. There's a difference at play here.
Capital-A Abstraction means lambda calculus. "Compiler too dumb" means you can't just program with functions and write said Abstractions (which map cleanly to proofs via the Curry-Howard correspondence) in Rust, because it does not handle them well.
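To make the "programming with functions" point concrete, here is a tiny Rust sketch (purely illustrative, not from this thread or any library) of lambda-calculus-style composition. Generic composition like this does compile and monomorphize, but the closure types leak into every signature, and the usual escape hatch, Box<dyn Fn>, trades that verbosity for dynamic dispatch; that trade-off is one reasonable reading of the "compiler too dumb to optimize" complaint:

    // Illustrative only: composing two functions the lambda-calculus way.
    fn compose<A, B, C, F, G>(f: F, g: G) -> impl Fn(A) -> C
    where
        F: Fn(A) -> B,
        G: Fn(B) -> C,
    {
        move |x| g(f(x))
    }

    fn main() {
        let add_one = |x: i32| x + 1;
        let double = |x: i32| x * 2;
        let add_then_double = compose(add_one, double);
        // double(add_one(3)) == 8
        println!("{}", add_then_double(3));
    }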
I think that GO with GO-CAM is definitely going that way. Basic GO is rather simple and can't infer that much (as in, GO by itself has little classification or inference logic built in). Uberon, for anatomy, does use a lot of OWL power and shows that logic-based inference can help a lot.
Reactome is a graph, because that is the domain, but technically it does little with that fact (in my disappointed opinion).
Given that GO and Reactome are also relatively small academic efforts in general...
Just want to note that there is a lot of non-GPL software in RHEL, e.g. Apache-licensed or MIT, for which they have no obligation to provide the sources.
I have already pointed that out multiple times to Rocky Linux staff. They have no answer to this. The whole concept of Rocky is a big bet that Red Hat won't pull the sources of non-GPL binaries from cloud instances and other publicly accessible places, which is where Rocky currently fetches its sources.
If you choose not to participate in this gamble, then this distro may not be suitable for you.
Nice classification of graph dataset sizes. I just think that XXL is the wrong label for a 1 TB dataset. Datasets like Wikidata, UniProt, the OMA browser, etc., as graphs, are really not XXL when decently funded (i.e. if they could afford the Memgraph cloud pricing). I would have shifted the scale so that medium becomes XS, and then continued up to 1 trillion edges+nodes for the XXL size.
Yeah, size classification is always tricky since the reference point always moves.
If you deal more with large datasets, the spectrum would ideally shift to the right for you, as you described, since at that scale you probably need a trillion.
In 2014 I was at Oracle OpenWorld. A third-party hardware vendor was selling (and had customers for) Hadoop "clusters" with 8 CPU cores. Basically their pitch was that Oracle hardware (ex-Sun) started at a dense full rack of about 1 million USD or so, but with the third party you could have a Hadoop "cluster" in 2U for 20K. The Oracle offering was actually quite price-competitive at the time, if you needed Hadoop. The third-party thing was overpriced for what it was.
Yet I am sure that third-party hardware vendor made out like bandits.
I just want to note that reliability becomes an issue at scale as well. C can be the first to crash, making its "faster speed" useless. An example: there was a benchmark game for DNA GC counting. C was the fastest in the beginning (a vectorized Rust version has since taken over). However, the C version in the lead at that time was the only version that could not compute the GC ratio of human chromosome X, due to a segfault/integer overflow. So in practical terms, the C version was really infinitely slow, even as the top ranker in the benchmark.
On a personal note: I am dealing with a C segfault that ruined 261 hours of compute, and which will probably take about 1000 or so more to debug :(
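For illustration only (this is not the benchmark-game code): a minimal, non-vectorized Rust sketch of GC-ratio counting, assuming the input is a raw ASCII sequence with any FASTA headers already stripped. Wide (u64) counters and explicit byte matching avoid the overflow/out-of-bounds class of failures described above:

    // Hypothetical sketch: count G/C bases over a raw ASCII sequence.
    fn gc_ratio(seq: &[u8]) -> f64 {
        let mut gc: u64 = 0;     // u64 counters, so chromosome- or
        let mut total: u64 = 0;  // genome-scale totals cannot overflow
        for &b in seq {
            match b {
                b'G' | b'C' | b'g' | b'c' => {
                    gc += 1;
                    total += 1;
                }
                b'A' | b'T' | b'a' | b't' => total += 1,
                _ => {} // newlines, N, and anything else: ignored
            }
        }
        if total == 0 { 0.0 } else { gc as f64 / total as f64 }
    }

    fn main() {
        let seq = b"ACGTGGCCAA\n";
        println!("GC ratio: {:.3}", gc_ratio(seq));
    }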
Then I can say that the C version for that game was badly designed from the start. You should not make that kind of mistake when laying down your foundation.
I understand your pain. I have written a material simulator in C++ that didn't use any safety barriers after the startup process, which verified that it was safe to enter the hot loop. On the other hand, my code never had a memory leak and never crashed, because I knew what I was getting into and designed it appropriately. Most importantly, that thing is fast: millions of iterations per core per second fast.
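As a generic sketch of that "check everything at startup, then run the hot loop without barriers" design (in Rust to match the other examples here, with hypothetical names; this is not the simulator described above):

    // Startup-time check: right length and every index in range,
    // so the hot loop below can skip bounds checks entirely.
    fn validate(neighbors: &[usize], n_cells: usize) -> bool {
        neighbors.len() == n_cells && neighbors.iter().all(|&i| i < n_cells)
    }

    fn hot_loop(state: &mut [f64], neighbors: &[usize], steps: usize) {
        for _ in 0..steps {
            for cell in 0..state.len() {
                // SAFETY: the startup `validate` call established that
                // `cell` and `neighbors[cell]` are both in range.
                let nb = unsafe { *neighbors.get_unchecked(cell) };
                let v = unsafe { *state.get_unchecked(nb) };
                state[cell] = 0.5 * (state[cell] + v);
            }
        }
    }

    fn main() {
        let mut state = vec![1.0, 2.0, 3.0, 4.0];
        let neighbors = vec![1, 2, 3, 0]; // ring topology, purely illustrative
        assert!(validate(&neighbors, state.len()));
        hot_loop(&mut state, &neighbors, 1_000);
        println!("{:?}", state);
    }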
> and often don't even capture the molecule pose as it would appear in biology.
Mapping between ligands in the PDB and cognate ligands as annotated in UniProt is improving :) My UniProt curator colleagues are working hard on this, though a lot of it was made possible by re-annotating all cognate ligands with ChEBI.
Thank you for your work :). It's been a long time since I used the PDB directly but I remember being frustrated about how essential and sparse it could be.
I'm curious about your toolchain. Is it just a community going through and manually annotating, or do you have something that helps pick out obvious things that can be fixed using something computationally predictive? If you have links on the UniProt website I can also just read those. Thanks!