Ask HN: Can use the #1 supercomputer for any project I want. What should it be?
107 points by Xcelerate 1745 days ago | 91 comments
I'm taking a very interesting class this semester called Computational Physics. The goal in the course is to select a research project that requires the use of parallel computing. We can use our university cluster or, because of the close association of the university with Oak Ridge National Labs, we have the opportunity to use the world's fastest supercomputer, Titan (17-27 pFLOPS). The usage of the entire machine at once is reserved for very special projects, but I'm sure I'll still be able to use a large number of nodes because projects that can be done on smaller supercomputers aren't allowed on Titan.

The professor recommended choosing something that relates to published research in physics or our own research field (mine is chemical engineering -> molecular dynamics). This freedom to choose whatever is really exciting, and I've got some interesting ideas, but I imagine there are a lot of experts in their field who post on HN, and so someone may have a good idea for a new and exciting project.

My initial idea is to contribute to the effort of porting Quantum Monte Carlo code to GPUs. Titan's processing power is unique among supercomputers in that most of it comes entirely from nVidia Tesla K20X GPUs. QMC is among the most accurate methods that exist in predicting physical phenomena; the problem is that the methods are incredibly computationally demanding, something which highly parallelized GPUs are well-suited to handling. But I don't know. Maybe that's too much to do in a semester.

Determine the resistance of hashed InChI keys to brute-force attacks.

The IUPAC InChI keys are a nomenclature to turn chemical structures, like morphine, into a unique linear representation, like InChI=1S/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3/t10-,11+,13-,16-,17-/m0/s1 .

This string is too long, so there is a way to use SHA-256 to convert the above into a hashed string BQJCRHHNABKAKU-KBQPJGBKSA-N, where the first 14 characters are the basic topology, and the successive 9 characters contain the other information.

Some people believe that this can be used to query the web for "secret" information. That is, you work at a pharma and want to know if others know about compound X. If they don't know about X then you don't want to tell them about it. Otherwise it reveals information about new compounds that you are working on.

You instead search for hash(X). If others have hash(X) then you're revealing less information, since this is likely a publicly known structure. If others don't have hash(X) then you might conclude that you have a proprietary structure and haven't revealed enough information for others to know what you are working on.

I don't believe that the hash key is appropriate for this. I think it is open to a brute force attack.

While 26**14 is very large, most people work inside the "chemical space" of reasonable drug-like compounds. The Reymond group has enumerated, in GDB-13, all drug-like compounds containing up to 13 atoms of C, N, O, S, and Cl, which is 977 million compounds. If you convert these to InChI hashes, then you might be able to guess the core scaffold given an unknown InChI hash.
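To put the two numbers in the paragraph above side by side, here is a quick back-of-the-envelope calculation. The variable names are mine, and it assumes the first block really can take any of the 14-letter values, which is only an approximation.

```python
# Size of the 14-letter first block versus the GDB-13 enumeration.
ALPHABET = 26                   # uppercase letters A-Z
BLOCK_LEN = 14                  # characters in the first (topology) block
GDB13_COMPOUNDS = 977_000_000   # figure quoted in the comment above

keyspace = ALPHABET ** BLOCK_LEN
fraction = GDB13_COMPOUNDS / keyspace

print(f"possible first blocks: {keyspace:.3e}")
print(f"GDB-13 covers at most {fraction:.1e} of them")
```

The point is that an attacker never has to search the hash space itself. They only have to hash the much smaller list of plausible structures.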

Once you have the core structure, you might then be able to brute force the bond and hydrogen assignment.

This can be tested by taking GDB-13 and finding the InChI hashes. You may need to enumerate over a range of hydrogens for each one, giving some several billion keys. Then take, say, the ChEMBL data set (for N<=13 heavy atoms) as the source for the "secret" keys, and generate their hashes. Are there matches? What percentage of topologies can be found this way?
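The test described above is a dictionary attack, and a toy version fits in a few lines. Loud caveat: `toy_key` below is NOT the InChIKey algorithm, just a SHA-256-based stand-in I made up so the example runs without a chemistry toolkit, and the structure strings are placeholder data rather than anything from GDB-13 or ChEMBL.

```python
import hashlib

def toy_key(structure: str) -> str:
    """Stand-in for an InChIKey first block. NOT the real InChIKey algorithm.

    Hashes the string with SHA-256 and maps the first 14 digest bytes
    to uppercase letters, just to get a fixed-length one-way key.
    """
    digest = hashlib.sha256(structure.encode("utf-8")).digest()
    return "".join(chr(ord("A") + b % 26) for b in digest[:14])

def build_table(candidates):
    """Precompute key -> structure for every enumerated candidate."""
    return {toy_key(s): s for s in candidates}

def attack(secret_keys, table):
    """Return the secret keys whose structure the table reveals."""
    return {k: table[k] for k in secret_keys if k in table}

# Hypothetical strings standing in for enumerated structures (the GDB-13 role).
enumerated = ["CCO", "CC(=O)O", "c1ccccc1", "CCN"]
# "Secret" keys (the ChEMBL role); the second structure is outside the enumeration.
secrets = [toy_key("c1ccccc1"), toy_key("CC(C)Cc1ccccc1")]

table = build_table(enumerated)
recovered = attack(secrets, table)
print(f"recovered {len(recovered)} of {len(secrets)} keys")
```

The fraction recovered is the "what percentage of topologies can be found" number. The expensive part in the real experiment is producing the candidate strings, not the lookup.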

If you can find the topologies, can you then as phase 2 deduce the overall structure, and if so, what percentage?

As a quick estimate, phase one (enumerating GDB-13 InChI keys) would take over 10 years on my desktop, and 5 TB of disk. That's about perfect for a supercomputer job, and you can select subsets as appropriate based on the available time. (Eg, pick only the C, N, O, S subset of GDB-13 and ChEMBL and you've reduced your space by a lot.)

I don't know what's needed for phase 2 and can provide no estimate.


This is an embarrassingly parallel problem. The supercomputer hardware (interconnect) would be simply wasted. If you provide C++ source code that can be compiled with Native Client, and a list of tasks to run (binary command lines), Exacycle (http://googleresearch.blogspot.com/2012/12/millions-of-core-...) is more appropriate. Based on our publicized results this wouldn't take very long (10 years on a desktop = less than an hour on Exacycle; you'd spend more time writing the program and the data analysis than the actual runtime).

however, I don't really see this problem as a high priority. pharma has bigger problems than to probe remote websites to find out what their competitors might be interested in.

Absolutely embarrassingly parallel. Thing is, I don't know what kinds of problems Xcelerate is interested in working on, I don't know what projects are given preference on Titan, and I don't even know if I'm in the right order of magnitude in my estimate.

There are easier ways to make it more complex. For example, enumerate the first 15 atoms instead of the first 13. I believe this requires coordination between the enumeration nodes in order to reduce duplicate generation, but I don't know that subfield and labeled graph enumeration gets further away from hands-on chemical work. And stage 2 requires a combinatorial approach taking maybe 1,000x more time than the first stage. (That's a jazz hands level of waviness - I really don't know.)

I once did work in molecular dynamics. In fact, I was one of the co-authors of NAMD. That's a rather intently explored field, and progress in it can feel rather incremental. What I proposed is one that no one has explored, to my knowledge. There's maybe 50 people in the world who would be able to do this work now, if they had time and hardware access (which they don't), and perhaps ~5,000 people who would be affected by knowing the result; mostly to know if they could do certain searches on the public web.

So it's one with little competition and where the results would be immediately publishable. Not bad for a semester project, I think! I think it would be a good Master's thesis. And a lot different from the usual set of MD, docking/screening, folding projects that have consumed massive amounts of supercomputer time for the last three decades.

Given that Xcelerate was talking about porting some code to the GPU, I suspect that that person already expects to "spend more time writing the program and the data analysis than the actual runtime." When I was doing MD work, I expected to spend about 6 months in simulation time and about 2 years in analysis for a PhD. The previous generation to me in the group had to build their own hardware before doing simulations, so I thought that it's usually the case that the non-simulation time is longer than the simulation time. :)

You said: "I don't really see this problem as a high priority. pharma has bigger problems than to probe remote websites to find out what their competitors might be interested in."

So they should only work on the big problems that everyone else is doing? What about smaller problems which might help solve big problems?

For example, 10 hours ago here on HN, mtgx posted a link titled "Patents Are Making Us Lose The Race Against Antibiotic-Resistant Bacteria" (to http://www.techdirt.com/articles/20130110/09590621628/world-... ), which reports that the "World Economic Forum's 8th Global Risks Report" suggests that "Rather than today's monopolistic hoarding [of data], what we need is more sharing of [pharmaceutical] knowledge."

Let us suppose that they are correct, and we need to have more sharing of some knowledge between different companies, perhaps with "public or philanthropic funding to incentivize academic collaboration."

One of the questions you could ask, if you had more information about what goes on inside of the companies, is: what's the overlap of the chemical space being tested by the different companies? Are they too similar? A recent paper on the topic - and one of the few such papers - is the recent "Big pharma screening collections: more of the same or unique libraries? The AstraZeneca–Bayer Pharma AG case". (See http://pipeline.corante.com/archives/2012/12/06/four_million... for a summary.)

In it, they conclude that there is a "low overlap between both collections in terms of compound identity and similarity."

They did it by using 2D ECFP4 fingerprints. A fingerprint is very much like a Bloom filter, where bits are set based on chemical features. It's one way to get information about identity and similarity. It's not strictly reversible, but it's leaky. For example, fragments can have certain characteristic patterns in the ECFP4 fingerprint, which can be used to infer some of the original structure.
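For anyone who hasn't seen one, here is a toy hashed fingerprint to make the Bloom filter comparison concrete. This is not ECFP4: the feature labels are made up, the 64-bit length is far shorter than anything used in practice, and Tanimoto is simply the similarity measure I picked because it is the common one for bit fingerprints.

```python
import hashlib

N_BITS = 64  # toy size; real fingerprints are typically much longer

def fingerprint(features, n_bits=N_BITS):
    """Set one bit per feature, Bloom-filter style (collisions are possible)."""
    bits = 0
    for feat in features:
        digest = hashlib.sha256(feat.encode("utf-8")).digest()
        bits |= 1 << (int.from_bytes(digest[:8], "big") % n_bits)
    return bits

def tanimoto(a, b):
    """Shared on-bits divided by the on-bits set in either fingerprint."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0

# Hypothetical feature labels; a real ECFP4 derives them from atom environments.
mol_a = ["aromatic_ring", "hydroxyl", "tertiary_amine", "ether"]
mol_b = ["aromatic_ring", "hydroxyl", "carboxylic_acid"]

fa, fb = fingerprint(mol_a), fingerprint(mol_b)
print(f"similarity: {tanimoto(fa, fb):.2f}")

# The leak: test whether a known fragment's bit is set in someone else's fingerprint.
probe = fingerprint(["tertiary_amine"])
print("mol_a may contain a tertiary amine:", fa & probe == probe)
```

As with a Bloom filter, a set bit only says the feature may be present, since two features can land on the same bit. That is why the leak is partial rather than a full reversal.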

The InChI hashes are another way to test if two data sets contain identical structures. Two pharmas may want to compare them before or during a pre-competitive collaboration, in order to determine the overlap between their two collections. What is their risk model? Can they outsource the evaluation knowing that no information can be leaked? Or is there an unexpectedly high leak rate which means that the hashes must never leave the internal network, and should be restricted to only a few trusted people?

Right now, we don't know that answer. We believe that it reveals little information. I'm not so sure about that.

If they don't reveal information, then certain new types of discussions can occur, which might (if the World Economic Forum is correct) help lead to the development of new drugs.

Or, if it does reveal information, then it might help characterize how leaky that information actually is. Eg, it takes 1 month on an Exacycle machine to identify 1% of the structures with fewer than 15 atoms, and 10 years (estimated) to identify 1% of the structures with 17 atoms. That gives a usable risk model that doesn't currently exist.

With regards to the people who decide what runs on supercomputers: the simple rule is they designed the system for low-latency high-throughput calculations, and don't want embarrassingly parallel stuff running there. They spent half the budget on interconnect, and don't want it wasted. It's entirely possible the OP's project allows them to run embarrassingly parallel codes, but frankly, that's just a waste of energy given what the hardware is capable of. I had this argument when I worked at the supercomputing centers, over and over (I told them to build loosely coupled commodity clusters with great storage IO throughput) but they never saw my point of view, so instead I built Exacycle and work to make it available to scientists who can keep it busy.

With regards to sharing scientific information related to pharma, I (and the rest of the Exacycle team) believe strongly that the results the Exacycle Visiting Faculty achieve must be made public- ideally all the raw data in addition to the published analysis. If you look at the blog post I referenced, you'll notice that three of the faculty are working on protein/drug related problems, and we'll be sharing the bulk of the results with the scientific community.

The work we've done has moved protein folding and drug design far beyond incrementalism. The scale we make available also turns things that were 1-10 year projects and makes them an overnight job. I also spent a year running MD simulations on a supercomputer, so I can appreciate the higher level of scientific productivity that a system like Exacycle provides.

Some of the questions you raise about cloaked IP sharing are interesting, mainly from a theoretical perspective, but I think pharma would be more likely to use existing IP protection mechanisms, or they would simply not collaborate at all, before adopting stuff as "complicated" as this.

Thanks for the update. My own supercomputer experience is nearly 20 years out of date. I demoed our MD work in the HP booth back in SC'94, and IBM for SC '95. Even then it was apparent that MD required good interconnect to match the increases in CPU.

Beyond that though, I don't know where this conversation should go. I could be cynical and point out that the proposals you all accepted, while worthy, look similar to the proposals from 20 years ago but with "SP-2" or "cluster" scratched out and replaced with "Exacycle." Even some of the names (Russ Altman, GROMOS/GROMACS) have little changed in the last 15 years.

I could observe, as you did in passing, that this project does not turn "1-10 year projects" into "an overnight job" because writing the software and understanding the results dominates the overall time - otherwise they would be done, no? And realistically, few people embark on 10 year projects; they estimate 10 years then work on another project which is more tractable, with the hope that in 10 years it will become a 2 year project.

I could go the Greg Wilson route, and argue that the "Real Bottleneck in Scientific Computing" is not the CPU cycles but the need for better training of computational scientists in how to write and organize software, and a cultural change in how journals treat software development and testing. Yet supercomputers get a large amount of funding compared to training.

Or I could quizzically wonder why it is that "interesting", "theoretical" work isn't exactly the sort of work that researchers should be doing. For example, "InChIKey collision resistance: an experimental testing", doi:10.1186/1758-2946-4-39 shows that related work is easily publishable.

And I further wonder how my conversations with people at pharmas, about how they might use the InChIKey, is "complicated." It was someone at AstraZeneca who told me they were evaluating the use of InChI keys in order to link internal databases to external web search engines and data sets without triggering the usual reactions against revealing internal structure information. They thought to generate the hash for each compound, make a hyperlink to the external site using that hash, and they are done. That seemed reasonable to me then. Over time I've wondered just how safe it actually is. Can pharmas actually use InChI keys to do things that they can't do with "existing IP protection mechanisms"? Are they leaking data already, without realizing it? If so, what is the level of leakage?

But realistically, or perhaps back to cynically, justifications like "could cure cancer" or the newer "could help identify new antibiotics", or like "help understand disease" or "reveal insights into fundamental biochemistry" are ever so much easier to justify funding than the mundane "train computational chemists in how to program" or "see if there are new ways to exchange proprietary data between distrusting organizations".

What, me bitter? Perhaps slightly, but I've decided that there's much more interesting diversity in the sorts of problems that one person or a small group can work on, without having to justify its practicality or how it fits into some long-term Grand Challenge vision. It just means that there are some interesting projects which need high-end compute power that I just can't work on. Yet.

Why is Google insisting on the use of NativeClient?

Not insinuating anything, but a NaCl-compiled distributed task would be an excellent fit for farming out to unsuspecting browser clients. (Reminds me about that sneaky javascript-based bitcoin miner that was injected on a few web sites some time ago)

We looked at several security and sandbox technologies, and NaCl was closest to our requirements: near-native performance, easy access to the NaCl software development team and supported C and C++. Strictly speaking, we use it as a "Native Server", rather than a "Native Client" (it runs on server-class machines, not desktop browsers).

We've talked with Vijay Pande (of Folding@Home fame) about making "Folding@Chrome", but that would be opt-in (IE, you'd install a Chrome App).

If you have an alternative sandbox and security framework, I'd be happy to hear about it.

That's cool, I never really thought of using NaCl on the server side like that.

So you are saying it is a better solution than any combination of Java, Ruby, SELinux, any other regular OS ACL, Xen, or any other virtualization platform?

At least in my experience, most of the codes we expected to run were written in C or C++. We would like to support Java, but it's not an option right now (Native Client doesn't support Java, and I'd need to research the Java sandboxing technologies). Another issue is that we limit memory usage, and Java memory usage tends to be higher than C++ programs; ditto for virtualization. Note that virtualization isn't really a security solution on its own. I can't really see Ruby being the right approach for CPU-intensive computing (we're talking hundreds of thousands of cores, so it's worth investing in making the code efficient).

BTW, I definitely think seccomp (http://en.wikipedia.org/wiki/Seccomp) and Linux containers (http://lxc.sourceforge.net/), or perhaps OpenVZ (http://wiki.openvz.org/Main_Page), could be engineered into a secure sandbox, without having to resort to virtualization (which I'd like to emphasize isn't really a security solution). This is called "container virtualization" (as compared to hardware/CPU-based), and has a number of very nice properties.

Because NaCl is a fully sandboxed way to run full-performance native code.

You might be interested in this: http://zerovm.org/ (No association but it seems highly relevant.)

It takes your computer 323 milliseconds to convert an InChI key to a hash? You need to improve the implementation a bit there, I would expect a sha-256 operation, even with a bit of string parsing first, to take _at most_ single digit milliseconds. Add to that a small EC2 cluster of multicore machines, and you should be able to do this in no time - no need for those fancy super-high-speed interconnects that the 'real' supercomputers have.

I think I was too loose (read "wrong") in my nomenclature. I wrote "The IUPAC InChI keys are.." when I should have written "The IUPAC InChI strings are.." There's the long InChI string, and the shorter InChI key. The key is the hash form of the string.

Generating the hash is, as you say, trivial given the InChI string. Computing the InChI string in the first place is not. It requires molecular perception of the input graph. For example, which of the hydrogen positions are mobile (eg, can be identical under tautomerization) and which are fixed? It also requires generating the canonical graph representation, which is done via a variation of the nauty algorithm. Usually it's quite fast. Some structures can take longer. By default there's a 60 second timeout.

I haven't worked with the InChI code for a while, so I don't know what the performance numbers are. An output log from 2009 shows "Finished processing 2186 structures: 0 errors, processing time 0:00:05.93." or 369/second = 2.7 ms. Perhaps it's under 1.0 ms now?

You estimated 323 milliseconds/record based on the numbers I gave. However, I did not describe the step which turns 977 million compounds into the actual number to evaluate. I would also include the combinations replacing Cl with F and with Br, and I would replace at most two of the nitrogens with a phosphorus. I may also have to try different hydrogen counts, depending on the types of structures in GDB-13. I guessed that this would require about 100x as much compute time as the base 977 million structures.
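Spelling out the arithmetic behind those figures, using only numbers already quoted in this thread (the 100x multiplier is the rough guess from the paragraph above, and the variable names are just for illustration):

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

desktop_years = 10               # "over 10 years on my desktop"
base_structures = 977_000_000    # GDB-13 size
expansion = 100                  # guessed multiplier for element swaps and hydrogen counts

total_seconds = desktop_years * SECONDS_PER_YEAR

# Reading the 10 years as covering only the base set gives the 323 ms figure.
ms_naive = 1000 * total_seconds / base_structures

# Spreading the same time over the expanded set gives a few ms per record.
ms_expanded = 1000 * total_seconds / (base_structures * expansion)

# The 2009 log: 2186 structures in 5.93 seconds.
ms_logged = 1000 * 5.93 / 2186

print(f"naive: {ms_naive:.0f} ms, expanded: {ms_expanded:.2f} ms, 2009 log: {ms_logged:.2f} ms")
```

So the 10-year guess and the 2009 log agree to within the (large) error bars of the estimate.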

This is all meant as very rough estimates. It's only been a background thought until now, and I hadn't written it up before. In any case, the parameters are all quite tunable, in that I could crank up or down the problem size to match the available CPU time and researcher patience.

My background: I am an HPC sysadmin, and I administer a cluster of about 240 nodes. I have a PhD in astrophysics, and have done research in computational medical physics. The users I currently support are mostly people who run molecular dynamics codes.

Molecular dynamics could be an ideal match for the hardware and your own knowledge: the standard MD codes (NAMD, VASP, Wien2k, Quantum Espresso) in the field are all well-tested and debugged. Some of them (NAMD at least) can also take advantage of GPUs. These codes represent hundreds of person-hours of R&D. I advise against reinventing the wheel. In fact, in almost all established computational science endeavours, it is safe to say that you should not start from scratch. And definitely not if you have just a single semester.

I don't know much about existing quantum monte carlo codes, nor how mature the application of GPUs is to QMC. I assume you will do a brief literature review. (Brief because one semester means that the scope of your project has to be fairly well-limited.) Communicate with QMC practitioners via mailing lists etc, too. There are bound to be some problems people have that they just never got round to figuring out due to other priorities. Parallel programming is hard, GPU programming is hard, and transforming serial algorithms into GPU-efficient algorithms is hard. But that's half the fun. :)

Yeah, I definitely couldn't and wouldn't reinvent the wheel in one semester :) I have, in fact, been using LAMMPS on Kraken (the other supercomputer at Oak Ridge) for my research, but I decided for this project I would do something a little different so I can learn something new.

The literature I have read thus far on GPU QMC seems to indicate that it's a fairly recent endeavor and has only been tested on small clusters. Of course, I need to do a deeper search before I begin working on anything, but if the field could use help scaling it up to something the size of Titan, then that could provide an order of magnitude increase in the kind of processing available and significantly help research that relies on QMC. If I could even make a small dent in the progress and add a new simulation technique to my knowledge-base, I think that would be a successful project!

Don't want to discourage you, but the problem is that super-large runs generally require super-well-debugged code to scale to such large core numbers. Hence it's pretty difficult to do such a run without gradually stepping up with progressively larger runs, ironing out the bugs, running a bigger run, etc. This is likely to be extra hard for hybrid GPU/distributed applications, and the cycle time for these very large runs becomes long so the process is slow.

Our research group was determined to run the largest hydrodynamic cosmological simulation ever done, using a new code. Over a year ago the code was "basically ready" and had been used on "small" runs of ~hundreds of k CPU hours. They still haven't completed a big run...

I'm really curious what sorts of bugs you are referring to. Could you elaborate on what the hurdles are going from a small run to a large one?

Classically speaking, it's hard to get linear scaling for a large number of cores. Eg, on a small, 128 core machine, one algorithm I did recently gives maximum throughput with 15 cores, and that only gives a 10x speedup. To make effective use of all 128 cores I would have to pay a lot more attention to how the data flow maps to the architecture. That's hard. For example, one of the older pieces of hardware used a ring-based topology, which was more appropriate for a force decomposition, while another network architecture made a spatial decomposition easier. And on that old ring-based topology system, the previous developers could program the network chip to handle I/O while the CPU was doing the dynamics, in order to eke out even more performance.

In general, if you scale by more than one or two orders of magnitude then you're going to run into new, unexpected problems.

In one project I worked on, the code was fast on all of the test sets, but when scaled up to the entire data set it was unexpectedly slow. We traced that down to a logger which we hadn't realized was enabled. In your code you might find that a minor piece of code used order n-squared time but because of data partitioning, it has minimal impact.

Or you might find that you have a barrier, where everything is supposed to be synchronized. (Eg, once every 1,000 time steps you rebalance the system load.) A few stragglers might not affect small system sizes, but cause 50% slowdowns with large numbers of CPUs. You might even have to rewrite the code to remove the barrier and come up with another way to do dynamic load balancing.
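A crude model shows why stragglers at a barrier only start to hurt at scale. The assumptions are mine and deliberately simple: each worker independently straggles at a given barrier with a small probability, and a single straggler holds everyone up for a fixed extra delay. The probability and delay values are made-up illustrations.

```python
def expected_slowdown(n_workers, p_straggle, delay):
    """Expected extra time per synchronized step, as a fraction of a normal step.

    Assumes each worker independently straggles with probability p_straggle,
    and that one straggler holds everyone at the barrier for `delay`.
    """
    p_any = 1.0 - (1.0 - p_straggle) ** n_workers
    return delay * p_any

# Hypothetical numbers: 1-in-10,000 chance per worker, half a step of extra wait.
for n in (16, 1_000, 100_000):
    print(n, f"{expected_slowdown(n, p_straggle=1e-4, delay=0.5):.2%}")
```

The per-worker behaviour never changes. Only the chance that somebody is late does, and it approaches certainty as the worker count grows.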

Bram Cohen, in his Stanford EE380/2005 talk on BitTorrent (linked from http://bramcohen.livejournal.com/11025.html?nojs=1 , at 8:45) mentioned how TCP's checksum isn't good enough once you start swapping terabytes of data over the Internet, so BitTorrent has to work around that.

On the topic of flakiness, Google's own Big Table/MapReduce, etc. is designed to be able to handle machine faults, which can happen with large numbers of cores. How well does your code handle those sorts of rare failures?

In another project I worked on, there was a bug in the code. It didn't conserve energy as well as we thought it should. That was eventually tracked down to a trigonometry error where we didn't compute one of the energy terms correctly when the angle was exactly 0.0. This doesn't happen that frequently, which is why we only saw it with large simulations run for a long time.

dalke mentioned some good examples.

In our case, the most annoying problem we had was that the entire run would sometimes just hang. I'm not aware that the problem was tracked down, but the author suspected some race condition in the MPI library that caused some message to get lost and thus some task to wait indefinitely.

Model the expansion of the universe assuming a foam like structure and a non-zero cosmological constant at different values. Then model the path of light rays traveling large distances over long periods of time. The patterns of hydrogen absorption lines should be pretty distinctive and could be used to test various cosmological possibilities.

This was my 1995 final year Physics project with inadequate hardware and poor language options.

Of course back then the assumptions were considered a bit radical but now much more accepted as reflecting something approaching reality.

I think it would be an interesting project to do it again with modern tools and some real computing horsepower.

Guessing you saw the article on the lack of quantisation jitter on quasar light, then? :)

Right. Which is interesting on a whole host of levels. Though, this is more of a macro experiment. You're really measuring distribution of "walls" and "filaments" of matter in the universe by measuring the hydrogen absorption lines ... As the lines get shifted and new ones occur you get a sort of barcode pattern, each possibility distinctive of certain histories of expansion and build up of structure. There should be clear signatures of, for example, inflection points where an expansion switches to contraction or a slowing expansion flips to an accelerating one.

You may find http://www.astro.auth.gr/~tsagas/Publications/Journals/MNRAS... interesting if you haven't seen it already, suggesting FLRW could still be consistent with observations. But yeah, totally agree that this would be a neat simulation to run!

That is an interesting paper. I'm gonna read it later. The dipole observations related to galactic spin was a fascinating discovery a while back. More evidence for an "axis" presents some interesting cosmological possibilities. Thanks for this evening's reading assignment :-)

No worries. Nights are best spent trawling through papers, gently blowing one's mind. Always a pleasure finding someone else passionate about cosmology!

Everything else is just stamp collecting ;-)

How about calculating the entry conditions of asteroid Apophis (or something along the lines of computational fluid dynamics)? That would be something important, something worth the power and time of Titan, and pretty much a big deal, if you pull it off! The latest data for it would be available soon, given you convince Goldstone of the nature of your request.

http://en.wikipedia.org/wiki/Computational_fluid_dynamics http://en.wikipedia.org/wiki/99942_Apophis http://en.wikipedia.org/wiki/Goldstone_Deep_Space_Communicat...

Bitcoin. Pay off your student loans one Merkle tree at a time.

Serious question. How much Bitcoin could you mine on this supercomputer? Given that you don't have to pay for hardware or electricity, I imagine it would be competitive with the fastest GPU, FPGA and ASIC miners.

Ok well blocks generate at an average of 1 / 15 minutes, so about 96 blocks a day.

Each block generates a 25 BTC reward. At ~14.00 USD/BTC on MtGox right now, that's about $34k.

So the maximum reward in USD if you could totally control the block chain per day is $34,000. Compare hashing rate of your super computer to total mining hash rate of the btc mining swarm and that is your fraction of 34,000.
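The same estimate as a couple of functions, so the inputs are easy to change. The defaults are the figures from the comments above (one block per 15 minutes, 25 BTC per block, $14 per BTC), and the function names are mine.

```python
def max_daily_reward_usd(minutes_per_block=15, reward_btc=25, usd_per_btc=14.0):
    """Upper bound: every block mined that day goes to a single miner."""
    blocks_per_day = 24 * 60 / minutes_per_block
    return blocks_per_day * reward_btc * usd_per_btc

def expected_daily_usd(my_hashrate, total_hashrate, **kwargs):
    """Expected take for a miner holding my_hashrate out of total_hashrate.

    Both rates must be in the same unit, and total_hashrate includes the miner's own.
    """
    return (my_hashrate / total_hashrate) * max_daily_reward_usd(**kwargs)

print(max_daily_reward_usd())
print(expected_daily_usd(1.0, 10.0))
```

Shortening `minutes_per_block` raises the ceiling proportionally, so the result is quite sensitive to that one assumption.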

If the fraction is > 0.5, wouldn't he be able to spoof the entire chain and collect ALL the money?

In practice you would need far more than 50% of the mining power to take complete control of the blockchain.

The greater the hashrate of your miners, the greater probability that your miners will find a valid nonce that satisfies the difficulty equation; it doesn't magically give you the ability to find a valid nonce before any other party. As long as there is one other party mining, there is a non-zero chance that their block will be accepted before yours.

What ratio of malicious/non-useful blocks vs valid, legitimate BTC transaction containing blocks would be required before the economy fails, that is an open question.

I think what you mean to say is that you need to match the performance of the current network. But this would put you at 50% of the global mining speed. For example the current network mines at 20 Thash/s, so you need to bring another 20 Thash/s for a majority attack, in which, an attacker can do the following: https://en.bitcoin.it/wiki/Weaknesses#Attacker_has_a_lot_of_...

Anyway, Titan is not powerful enough to perform a majority attack. First of all, there is no Bitcoin miner optimized for the latest GK104/GK110 Tesla and GeForce GPUs. With current miners, people report speeds of barely 110 Mhash/s on the GeForce GTX 680, which should translate to ~140 Mhash/s on the Tesla K20X (which is only 30% faster in terms of 32-bit integer ops per second.) Titan has 18688 K20X, so that's only 2.6 Thash/s total. Far from the 20 Thash/s required. It would still bring 400 coins/day, or $5600/day :)
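Reproducing those numbers, for anyone who wants to plug in a different per-GPU rate. Two assumptions that are not stated in the comment above: I use Bitcoin's 10-minute block target (144 blocks/day), and I count Titan's own hashrate as part of the network total when working out its share. The $14 price is the MtGox figure quoted earlier in the thread.

```python
GPUS = 18_688            # K20X count quoted above
MHASH_PER_GPU = 140      # estimated K20X rate from the comment above
NETWORK_THASH = 20.0     # existing network hashrate
BLOCKS_PER_DAY = 24 * 6  # assumption: 10-minute block target
REWARD_BTC = 25
USD_PER_BTC = 14.0

titan_thash = GPUS * MHASH_PER_GPU / 1e6           # Mhash/s -> Thash/s
share = titan_thash / (titan_thash + NETWORK_THASH)
coins_per_day = share * BLOCKS_PER_DAY * REWARD_BTC
usd_per_day = coins_per_day * USD_PER_BTC

# How much faster each GPU would need to be to match the existing network.
speedup_needed = NETWORK_THASH / titan_thash

print(f"Titan: {titan_thash:.2f} Thash/s, {share:.1%} of the combined network")
print(f"~{coins_per_day:.0f} BTC/day, ~${usd_per_day:.0f}/day")
print(f"speedup needed to match the network: {speedup_needed:.1f}x")
```

This lands close to the rounded 400 coins/day and $5600/day figures.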

I theorize here [1] that based on the number of ALUs and clock frequency, that a properly optimized miner should be in theory 4x-6x faster than this on GK104/GK110, bringing Titan to 10-16 Thash/s. (Nobody bothered optimizing Nvidia because everybody is mining on more efficient Radeon GPUs or FPGAs.) Still not enough to match 20 Thash/s. Disclaimer: I have only programmed AMD GPUs, never Nvidia. Maybe there is a reason this 4x-6x theoretical perf gain has never been realized, such as the inability to execute 1 32-bit integer instruction per ALU per clock due to instruction latencies greater than 1 clock, or throughput being less than 1 instruction per clock... I don't know. Theoretical FLOPS numbers published by Nvidia indicate that floating point instructions, as opposed to integer instns, can run at 1 instruction per ALU per clock. Which makes it even weirder that there would be such a huge perf gap between int and fp.

[1] https://bitcointalk.org/index.php?topic=129292.msg1381510#ms...

That's for one attempt, right? How often are the nonces found?

The bitcoin algorithm adjusts the difficulty so 2016 blocks are generated in 2 weeks, so a nonce is found every 10 minutes or so.
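A minimal sketch of that adjustment, in case it helps. The 2016-block / 2-week target is from the comment; the factor-of-4 clamp reflects the retargeting rule as I understand it, and the function names are made up for illustration.

```python
TARGET_BLOCKS = 2016
TARGET_TIMESPAN = 14 * 24 * 60 * 60   # two weeks, in seconds


def block_interval_minutes() -> float:
    """Average time per block implied by 2016 blocks in two weeks."""
    return TARGET_TIMESPAN / TARGET_BLOCKS / 60


def retarget(old_difficulty: float, actual_timespan: int) -> float:
    """Scale difficulty so the next 2016 blocks take about two weeks.

    The measured timespan is clamped to [1/4, 4x] of the target, so a single
    adjustment never changes difficulty by more than a factor of 4.
    """
    clamped = min(max(actual_timespan, TARGET_TIMESPAN // 4), TARGET_TIMESPAN * 4)
    return old_difficulty * TARGET_TIMESPAN / clamped


print(block_interval_minutes())                  # minutes per block
print(retarget(100.0, TARGET_TIMESPAN // 2))     # blocks came twice as fast
```

So if a Titan-sized miner joined and blocks started arriving faster, the difficulty would simply rise at the next adjustment.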

So every 10 minutes you have a 50% chance of taking over the chain? If so, that sounds like a guaranteed win if you have some time to spend.

Sure, but that just lets you set the next block. You can't use it to generate false transactions because no other client would find those transactions valid.

At best you can drop all transactions that are broadcast to the network and generate blank blocks, thus freezing all future transactions.

Oh, I thought with > 50% you would represent enough "clients" to "prove" the false transactions and earn the trust of the remaining network.

It wouldn't be as much as you'd think. For one thing, Titan uses ECC nVidia GPUs, which (if I remember correctly), are NOT optimal for producing bitcoin; AMD GPUs are better. I know you mentioned not paying for electricity, but the power bill for Titan would far exceed any bitcoin produced.

Efficiency is mostly about the cost of the hardware (purchase and running costs), power usage and performance. Purchase, running cost and power usage are not part of your efficiency calculation.

ATI cards have traditionally had better hashing performance (in general: rainbow table generation, JtR, BTC) because they have a larger number of execution units per core, although clocked slower, than Nvidia cards. A higher number of EUs allows better exploitation of the parallelism that matters for hashing performance.

This is the fastest damn computer; it's not a brand-loyalty GM vs Ford, Coke vs Pepsi, Android vs iOS duality. Oh, it's got Nvidia, not optimized for hashing. It's going to kick the arse of any consumer or professional grade GPU on the market...

No, the point was integer-ops vs floating-point. AMD kills NVIDIA on integer-op performance, which SHA256 uses.

Not much. There was a supercomputer that was just recently dismantled in Arizona and someone asked the same thing. I calculated a gross return of approx. $200/day. That's not counting electrical costs, which will probably put you in the red.

~20Pflops = 3 terahashes/sec, which will earn you about $274/hour. Probably not the best use of supercomputer resources. :)

How did you calculate that? Those units don't quite work out. It's a totally different architecture to your standard i7/radeon home mining rig.

I don't think it's actually that different, though not as easy to calculate as he did.

Quote, Wikipedia: "Titan has 18,688 nodes (4 nodes per blade, 24 blades per cabinet), each containing a 16-core AMD Opteron 6274 CPU with 32 GB of DDR3 ECC memory and an Nvidia Tesla K20X GPU with 6 GB GDDR5 ECC memory."

So it's pretty much standard hardware just a lot of it. Except that the graphics card is basically a consumer tesla with some extra juice, but it's still just a card built with what's possible today. Nothing extra fancy.

Points for mentioning the merkle tree. Merklelicious!

OK, I'll give it a shot. I am no expert in any of these fields but I know that this could be valuable to society. Perhaps some work in the field already exists. If so, I am sure HN readers will point that out quickly.

The problem: The treatment or removal of tumors.

Issue: You can damage surrounding tissue, nerves and structures in the process.

Current methods:

Mechanical extraction. Just cut them out. Potential for damage to surrounding structures can be significant.

Gamma "knife": Focused multi-dosage radiation used to "kill" the tumor. Does not really remove a tumor so much as it stops growth without having to make an incision. Common for cranial tumors.

Other radiation: Can be devastating.

Desirable future solution:

Engineer molecules that are precisely designed to affect the tumor and spare non-afflicted structures. Perhaps this takes the form of a molecule that simply becomes a marker of sorts that some other technology (gamma knife?) can use to guide the "attack" with great accuracy.

In an ideal sci-fi world one would be able to administer a substance containing molecules that can bind to the tumor and actually dissolve it.

Take the example of something like an acoustic neuroma (my wife is a doctor, she suggested this example). This is a tumor that grows within the cranial cavity and around the acoustic and facial nerves. Mechanical removal usually means certain damage to the auditory nerve (causing deafness on that side) and potential damage to the facial nerve (causing that half of the face to droop and have no muscle tone). This type of tumor is usually either cut out or stopped in its tracks using gamma knife. In most cases the person ends up deaf on one side. This means the loss of the ability to acoustically locate sound, the loss of brain-computed noise reduction, and even balance problems.

It'd be interesting to see a molecular "attack" type solution that could dissolve the tumor while staying away from the brain and not impacting the nerves at all. The treatment could also take the form of a substance that is actually injected into the tumor (rather than the entire bloodstream).

Don't know if this is the kind of problem that could be tackled with molecular dynamics on a supercomputer. Even if you could, there might not be enough data available or commonality in tumorous cells to develop "key" molecules that could target them.

Rainbow Tables. (or the new ones but I can't remember their name).

Sorry if it's not as sexy as you thought, but I think this could make a pretty big impact in the security community.

That's an interesting idea. Although, ideally, I'll want a project that requires a supercomputer. Rainbow tables can be created through distributed computing.

(I also had the evil idea of seeing how many bitcoin I could generate on Titan, but I don't think the ORNL staff would even find that funny.)

Say this to the super computer administrators: "im doing pure compsci research; im attempting to find alternative implementations of one-way trapdoor functions to optimize proof of work validations" then spend the next time tuning your mining bot and pocket the 14.17 USD/BTC you generate. For research!

This is already being done (albeit not with the aforementioned computer) with the same tech that the Folding@Home people use.


Real time zooming within the Mandelbrot set. Can't be done on anything less than a super computer---don't know how many nodes would be needed; the answer to that would be interesting in and of itself. Different strokes etc., but that is most assuredly what I'd do.

It can be done on a moderately powerful GPU these days, but only to about 10x zoom (or maybe less, I don't remember) before it starts to look weird. The main problem is floating point precision (or lack thereof) causing distortion :(

Don't you need increasingly high numbers of iterations at higher zooms as well? I don't know if this is an actual property of the render, but I recall it being an issue the last time I programmed it, even with just double precision floats.
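Both effects are easy to see in a few lines. This is only a sketch of the standard escape-time iteration using Python's double-precision complex numbers; the sample points and view widths are arbitrary choices of mine, not anything from a real renderer.

```python
def escape_time(c: complex, max_iter: int) -> int:
    """Number of iterations of z -> z*z + c before |z| exceeds 2 (capped)."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def pixels_distinct(center: float, view_width: float, pixels_across: int) -> bool:
    """True if two neighbouring pixels still map to different doubles."""
    step = view_width / pixels_across
    return center + step != center


# Points closer to the boundary of the set take longer to escape, so deeper
# zooms near the boundary need a larger iteration cap.
print(escape_time(-0.75 + 0.1j, 10_000))
print(escape_time(-0.75 + 0.01j, 10_000))

# With 64-bit floats, neighbouring pixels collapse to the same value once the
# pixel spacing drops below the precision available at that coordinate.
print(pixels_distinct(-0.75, 1e-3, 1000))
print(pixels_distinct(-0.75, 1e-15, 1000))
```

Single-precision GPU code hits the second limit much earlier, which would explain the distortion at shallow zoom levels; going deeper needs arbitrary-precision arithmetic, and that is where the cost really climbs.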

How far?

Real-time zooming on Mandelbrot was already done 20 years ago.

Sure it was using a "trick" in that at each frame only a few vertical and horizontal lines were recomputed, and the others simply re-used from the previous frame but still... (and at every frame each line was guaranteed to be "at most" x frames old).

It looked really nice and it was cool to see a real-time zoom on a Mandelbrot... In the nineties!

It was going for quite a while too...

Natural convection in a system heated from below - try and detect when the heat transfer switches from conduction to convection - maybe compare Nusselt numbers computed versus what's predicted by Navier-Stokes (no inter molecular interactions) ?

Edit: Another one - model the human circulatory system - trying to find optimal postures for maximum heart efficiency. Also simulate blood-flow keeping track of anti-coagulant factor concentration to detect high-risk areas for blood-clot formation.

Indeed, HPC applications that relate to life sciences, would be good candidates for world's top systems. I recommend you shop around from the two references here, to find your take: https://hpcbios.readthedocs.org/en/latest/HPCBIOS_2012-93.ht... # Both schools of the Atlantic are well represented.

Now, if I was you, there is something very specific I would be doing: scalability plots of HPC applications from life sciences, for 4^n processors each time (n=0..10+). I would strive for brief benchmarking runs but, even getting a correct run at high scale is already an accomplishment (remember that application lead-in/out time may be serious).
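To make the scalability-plot idea concrete, here is a small sketch of the bookkeeping: take wall-clock times measured at 4^n processors and turn them into speedup and parallel efficiency. The timings below are placeholder numbers purely to exercise the function, not measurements from any real run, and `scaling_table` is a name I made up.

```python
def scaling_table(timings: dict[int, float]) -> list[tuple[int, float, float]]:
    """Return (processors, speedup, efficiency) rows from {processors: seconds}.

    Speedup is relative to the smallest processor count that was measured;
    efficiency is speedup divided by the increase in processor count.
    """
    base_p = min(timings)
    base_t = timings[base_p]
    rows = []
    for p in sorted(timings):
        speedup = base_t / timings[p]
        efficiency = speedup * base_p / p
        rows.append((p, speedup, efficiency))
    return rows


# Processor counts for n = 0..10, as suggested above.
processor_counts = [4 ** n for n in range(11)]

# Placeholder timings (seconds) for the first three counts only.
example_timings = {1: 100.0, 4: 25.0, 16: 12.5}

for p, speedup, efficiency in scaling_table(example_timings):
    print(f"{p:>8} procs  speedup {speedup:6.2f}  efficiency {efficiency:5.2f}")
```

Plotting speedup against processor count on log-log axes then gives the usual strong-scaling curve.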

If you fancy, combine it with a tool like this: http://hpcugent.github.com/easybuild/ and you may automate metrics collection against multiple 1) compilers 2) MPI stacks 3) dependent libs 4) cpu/gpus/both

Man, the best thing in this is you need not be the most expert scientist to get some basic (application-provided-tests-for-validation-purposes) runs but the result can be important reference work for future optimization acts. Just make sure you don't repeat others & document precisely your setup so that future comparisons are well-grounded. This could be a decent Technical Report for others to use!

btw. you may shape even more the idea by looking at this: http://www.nvidia.com/object/bio_info_life_sciences.html

Good luck!

I assume you don't have your pet project then. Otherwise, you wouldn't ask :-).

Normally the most difficult part is adapting your code (if exists) to the supercomputer anyway.

What about cross-referencing non-coding parts of all existing genomes? With RNA secondary structures? I bet there would be a big benefit, if only in finding lots of non-coding RNA that does something, since it is known that this part of genomes does most of the job.

There are a huge number of genes which are known to be transcribed (as mRNA) and turned into proteins, but we have no real idea as to what these proteins actually look like, or what their function may be, for example [1].

If we knew their final protein structure, it would be much easier to work out what biological processes they may be involved in.

Also, taking known protein structures and learning more about how they may interact with other proteins would be extremely useful. For example, a large number of molecular pathways in cancer are fairly well described in a general sense, but lack specific information on the actual reactions.

Finally, it would be valuable to put it all together in the context of mutated genes (which we can now screen for fairly easily) and to determine what impact a mutation may have on the protein (can it still be formed, how will it interact in a pathway?). For example, some mutations break a protein completely, while others may impact "switch" genes, leaving them in a permanently "on" state and preventing them from reacting to external signals [2].

This kind of thing is really important in cancer research (and one day treatment), where the function of specific proteins may be the difference between life and death.

[1] http://www.genecards.org/cgi-bin/carddisp.pl?gene=C1orf186 [2] http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1891745/

Computational folding is nice, but it's not the only computationally-intensive activity in proteomics.

For some proteins with post-transcriptional (ex. alternative splicing) or post-translational modifications (ex. attachment of functional groups, formation of disulfide bridges, proteolytic cleavage) it is difficult or impossible to arrive at the correct folded structures from computation on nucleic acid sequences alone. For such proteins, the structures can be determined experimentally. That means coaxing pure samples of protein to form single crystals and then subjecting the crystals to X-ray diffraction (XRD). Backing out the structural information from the XRD patterns takes serious horsepower (though admittedly not necessarily power with Titan-class interconnects). XRD does not give information about the location of hydrogen atoms in the structure though. For that information you need to do neutron diffraction (the de Broglie wavelength of the neutrons is short). ORNL is actually one of the few places that do neutron diffraction. The OP could connect with some of the neutron folks there and assist them with their research. If not for proteins, maybe for superconductors.

I had a chance to run some Monte Carlo code on some of the machines at ORNL as part of a supercomputing workshop there during the reign of Jaguar. It was an amazing experience. Enjoy it, OP! I hope you get to take a trip there and see all the racks too.

If you need an introduction to the neutron folks at Oak Ridge, let me know.

I'll actually be visiting there early February for an experiment. I believe I'm supposed to do some training first. Do you work there?

I don't; I'm working on a startup instead. Training is standard for those visiting, and can range from 15 min to 3 hr depending on what you will be doing. If you haven't, make sure to leave enough time to complete it before the training office closes.

I just read the copyright thread, and someone mentioned how the Coca-Cola recipe is still secret, not copyrighted or trademarked. I kept thinking that with enough computational power, someone could eventually reverse engineer it from some mass spectrometer analysis (or maybe some molecular technique). But you'd need a good DB of organic molecules and their origin to start with.

The recipe is published at the back of the book For God, Country and Coca-Cola. Now that you have the recipe, what could you actually do with it? Who would actually sell the product? How would you market it? http://en.wikipedia.org/wiki/Coca-Cola_formula

Once you have your recipe you can go to a company like Cott who will manufacture it for you: http://en.wikipedia.org/wiki/Cott and then you have the problem of selling it. In many stores it is actually Coca Cola employees who deliver the product and put it up on the shelves. All the store does is ring the drinks up at the till. And the supplier often has to purchase the shelf space http://en.wikipedia.org/wiki/Slotting_fee

There are "open source" cola recipes: http://en.wikipedia.org/wiki/Open_source_cola http://en.wikipedia.org/wiki/OpenCola_(drink)

'Secret recipes' for major commercial food and beverage products are really a marketing myth - dominant brands want the public to think their product is so good that all their competitors would copy it, except they don't know the recipe.

The reality is that a broad range of techniques (e.g. GC-MS) could reliably identify the chemical composition of products if competitors really wanted to know, and creating a recipe from the chemical composition isn't that hard (the chemical composition of many flavouring herbs and spices used commercially is widely available). Assuming no chemical transformations during processing, solving for the levels of ingredients from the levels of compounds is simple linear algebra (solve Transpose(A)Y = Transpose(A)AX for X, where Y is a column vector giving the concentrations of the dominant species, A is a matrix giving the concentrations of each compound in each plant, and X is the unknown column vector giving the amount of each plant to use). Assuming thousands of species and ingredient possibilities, this could realistically be solved in less than a second on my home computer - no need for the supercomputer.
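Here is a toy version of that normal-equations solve in plain Python, just to show the shape of the computation. The matrix and concentrations are invented for illustration (3 compounds, 2 plants), the helper names are mine, and a real analysis would presumably also want a non-negativity constraint on X, which this sketch does not enforce.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(m, rhs):
    """Solve m x = rhs by Gaussian elimination with partial pivoting."""
    n = len(m)
    aug = [list(m[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def least_squares(a, y):
    """Solve Transpose(A) Y = Transpose(A) A X for X."""
    at = transpose(a)
    ata = matmul(at, a)
    aty = [sum(r * v for r, v in zip(row, y)) for row in at]
    return solve(ata, aty)


# Rows: compounds; columns: plants. Entry = concentration of compound in plant.
A = [[1.0, 0.0],
     [0.0, 2.0],
     [1.0, 1.0]]
Y = [2.0, 6.0, 5.0]   # measured concentration of each compound in the product

print(least_squares(A, Y))   # estimated amount of each plant
```

For thousands of compounds and ingredients you would hand this to a linear algebra library rather than hand-rolled elimination, but the point stands that it is a small problem.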

The real reason that big companies like Pepsi don't make an exact copy of Coca-Cola is not that they don't know what is in Coca-Cola, it is that it would be bad business - it is far better to make a product with its own distinctive taste for customers to like that is nevertheless close enough to be a substitute.

I'd be interested in just knowing the techniques used to go about this rather than the answer itself.

Obligatory xkcd: http://xkcd.com/683/

If you're genuinely interested, here are some nice analytical chemistry books:

Quantitative Chemical Analysis (Daniel C. Harris) Principles of Instrumental Analysis (Skoog, Holler, Nieman)

The following one, which you should read first because it's free, is also good:


Just ask questions if you're interested.

It might be interesting to estimate if or when a (parallel) implementation of Strassen's algorithm tuned to the Titan architecture might exceed the performance of some comparable (parallel) direct method for matrix multiplication. It is not your own field per se but since the computation underlies many things you would like to do it could still be interesting and approachable.
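For reference, a bare-bones serial sketch of Strassen's recursion, assuming square matrices whose size is a power of two. The `cutoff` parameter (switch to the direct method below a given size) is exactly the sort of knob that would need tuning per architecture; none of this is GPU or Titan specific.

```python
def add(a, b):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]


def sub(a, b):
    return [[x - y for x, y in zip(r, s)] for r, s in zip(a, b)]


def naive(a, b):
    """Direct O(n^3) multiplication of two n x n matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def split(m):
    """Split an n x n matrix (n even) into four quadrants."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def strassen(a, b, cutoff=2):
    """Strassen multiplication; n must be a power of two."""
    n = len(a)
    if n <= cutoff:
        return naive(a, b)
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    # Seven recursive products instead of eight.
    m1 = strassen(add(a11, a22), add(b11, b22), cutoff)
    m2 = strassen(add(a21, a22), b11, cutoff)
    m3 = strassen(a11, sub(b12, b22), cutoff)
    m4 = strassen(a22, sub(b21, b11), cutoff)
    m5 = strassen(add(a11, a12), b22, cutoff)
    m6 = strassen(sub(a21, a11), add(b11, b12), cutoff)
    m7 = strassen(sub(a12, a22), add(b21, b22), cutoff)
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    top = [r + s for r, s in zip(c11, c12)]
    bottom = [r + s for r, s in zip(c21, c22)]
    return top + bottom


a = [[(i * 4 + j) % 7 for j in range(4)] for i in range(4)]
b = [[(i + 2 * j) % 5 for j in range(4)] for i in range(4)]
print(strassen(a, b, cutoff=1) == naive(a, b))
```

The interesting question from the comment is where the crossover sits once the extra additions, memory traffic and communication are counted, which this sketch says nothing about.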

I have no idea what this machine can do, but while dreaming of having endless computation power I'd do something for number theory and start thinking about how to prove that PI is normal. If that happened and only a few suggest PI is not normal, probably the inner nerd takes over and searches for the position of all current top forty songs within PI and spreads this info. Then I'd buy some popcorn and follow future attempts at justifying copyrights on vectors into PI in front of a court, while a direct master cut vinyl playing live recorded free jazz provides chilly background atmosphere.

How good would this be for the matrix step of GNFS attacks on RSA? I guess you would use lots of discrete machines for the first step, but doing the matrix reduction on this might be good?

Calculate the trajectory of every photon hitting the planet Earth.

Erwin Schrodinger would like a word with you.

I suspect Heisenberg would like in on that conversation.

Maybe, maybe not.

Erwin Schrodinger would not like a word with you.

Univ Illinois has something on QMC already. http://code.google.com/p/qmcpack/

Do this: randomly generate texts to see if, and how long it takes, to get to:

- Shakespeare's literature
- a C/C++ compiler
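A quick estimate of how long that would take, as a rough sketch. The assumptions are all mine: a 27-symbol alphabet (26 letters plus space), independent uniformly random strings of exactly the target length, and a very generous 2e16 guesses per second (pretending each FLOP at ~20 PFLOPS produces one complete guess).

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def expected_attempts(alphabet_size: int, length: int) -> int:
    """Expected number of uniform random strings tried before one matches."""
    return alphabet_size ** length


def expected_years(alphabet_size: int, length: int, guesses_per_second: float) -> float:
    return expected_attempts(alphabet_size, length) / guesses_per_second / SECONDS_PER_YEAR


phrase = "to be or not to be"
print(len(phrase), "characters:",
      f"{expected_years(27, len(phrase), 2e16):.3g} years")
print("40 characters:",
      f"{expected_years(27, 40, 2e16):.3g} years")
```

Each extra character multiplies the wait by 27, so anything the length of a play or a compiler is hopeless by brute force.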

Crack some SSL CA private keys.

You're going to need a bigger computer.

Maybe some kind of interesting comparison between all the different algorithms there are for password encryption.

Bring a 4K monitor and you've got yourself the ultimate Crysis rig.

Try something with Cellular Automata or something akin to Conway's Game of Life. I don't think there are any hard scientific applications for CA, but you could make some pretty nice wall art, and you could probably finish it in a semester.
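If you go this route, one generation of Conway's Game of Life fits in a few lines. This is a sketch on an unbounded grid stored as a set of live cells; a version for a machine like Titan would instead split a large dense grid across nodes and exchange boundary rows each step.

```python
from collections import Counter


def step(live: set[tuple[int, int]]) -> set[tuple[int, int]]:
    """Advance a set of live (x, y) cells by one generation."""
    # Count live neighbours for every cell adjacent to at least one live cell.
    counts = Counter(
        (x + dx, y + dy)
        for x, y in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}


blinker = {(0, 1), (1, 1), (2, 1)}
print(sorted(step(blinker)))
print(step(step(blinker)) == blinker)
```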

Try brute-forcing the human genome with the aim of finding fractal patterns and their Hausdorff dimension. You could try to find out if the extracted fractal patterns can be used to speed up protein folding.

That doesn't even make sense.

"person of interest"?

I actually thought the same thing. Let's say that we have the legal means to do this (ie. We won't get ultra sued or killed). How would one begin this daunting task?

Install Windows ME.

Have one of the computers run a 3d simulation of a body in a somewhat interesting environment that has windows pointing to cameras + screens (+mics +loudspeakers) in several public spots. It needs to have eyes connecting to nerves, ears doing a fourier analysis on sounds, the results going in to the network as pulses (do 50 fourier analyses on slightly phase shifted sound channels, coming from 2 points, like real ears do (real ears do about 20000 shifts each, but let's get real, even with a supercomputer)).

Have the rest of the computers run a pulse-based neural network controlling that body, communicate the pulses over udp/the network and actually let it run for 2-3 months.

Make sure that that body can see (and be seen and heard preferably, interaction would be great) from almost all locations in the environment.

We know that in the very short term such a network should try to imitate whatever it sees. It should start moving and try to duplicate whatever it sees through the screens (it may take a while though). Verify that this does indeed happen with a big, complex network (preferably with the same architecture: a big, central nerve connecting to the whole body, with offshoots at its end going nowhere, only used for processing). See if you can get it to interact with people, and generally see how far you can drive this.

Model universal expansion with VSL, and solve the horizon problem once and for all :)
