Muyuan Chen is one of the primary developers (maybe the primary developer?) of the sub-tomogram averaging portion of the EMAN2 software package (linked below in another comment). Typically what you do is take a 3D tomogram (think of it like a 3D scan) using a microscope, but it's extremely noisy. Then you go through the tomogram and extract all the particles that are identical but in different orientations. So if the same protein is there multiple times, you can align the copies to each other and average them together to increase the signal. Then you clone the higher-signal averaged volume back in at the position and orientation where you originally found each particle.
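To make the averaging step concrete, here's a toy numpy/scipy sketch (not EMAN2's actual code, and using a simplified rotation convention in place of proper ZYZ Euler handling): rotate each extracted particle back into a common frame using its solved orientation, then average.

    import numpy as np
    from scipy.ndimage import rotate

    def average_subtomograms(subvols, eulers):
        # subvols: list of (N, N, N) particle volumes cut from the tomogram
        # eulers: per-particle (phi, theta, psi) orientation angles in degrees
        acc = np.zeros_like(subvols[0], dtype=np.float64)
        for vol, (phi, theta, psi) in zip(subvols, eulers):
            # Undo the particle's orientation so all copies share one frame
            v = rotate(vol, -psi, axes=(1, 2), reshape=False, order=1)
            v = rotate(v, -theta, axes=(0, 2), reshape=False, order=1)
            v = rotate(v, -phi, axes=(1, 2), reshape=False, order=1)
            acc += v
        # The average has much higher SNR than any single particle
        return acc / len(subvols)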
The one-line command to go from EMAN2 coordinates to Unreal Engine 5 is kind of crazy.
As usual on these (rare) threads, I'm happy to answer any questions about structural biology or cryo-EM.
Do you know what the author means by "Current visualization software, such as UCSF ChimeraX [6], can only render one or a few protein structures at the atomic level."
I haven't used VMD for about 30 years, but even in the 1990s I was using it to visualize the full poliovirus structure (4 proteins in 2PLV * 60 copies, as I recall).
It took about 6-10 seconds per update on our SGI Onyx, but again, that was 25 years ago.
I can only guess, but I believe that ChimeraX's rendering pipeline is single-threaded (just an empirical guess based on my CPU usage when using it). Additionally, loading that many atom positions requires a huge amount of memory (I routinely use > 32 GB of memory just loading a few proteins), and things start to slow down quite a bit.
Loading a 60-fold icosahedral virus has used > 100 GB memory on my workstation, and resulted in a 0fps experience. It might render OK from the command line, but now imagine a few dozen of those, plus a cell, plus all the proteins in the cell...
Odd. I can't see why. I think we had 128 MB on that IRIX box, and I know I loaded a 1-million-atom structure with copies of 2PLV (full capsid plus a bit more to get to a million).
Each atom record has ~60 bytes (x, y, z, occupancy, bond list, resid, segid, atom name, plus higher-level structure information about secondary structure, connected fragments, etc.) We had our own display list, so another (x, y, z, r, color-index) per atom, giving 20 more bytes. We probably used a GL/OpenGL display list for the sphere, and immediate mode to render that display list for each point, so all-in-all about 100 bytes per atom, which just barely fits in 128 MB.
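Sanity-checking that arithmetic in a few lines (the byte counts are the rough figures from above, not exact struct sizes):

    atom_record = 60       # x, y, z, occupancy, bonds, resid, segid, name, ...
    display_list = 20      # (x, y, z, r, color-index) at 4 bytes each
    overhead = 20          # rough allowance for GL display lists, bookkeeping
    bytes_per_atom = atom_record + display_list + overhead   # ~100 bytes
    atoms = 1_000_000
    print(f"{atoms * bytes_per_atom / 2**20:.0f} MiB")       # ~95 MiB < 128 MB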
That was also all single-threaded, with a ~0.1 Hz frame rate. Again, in the 1990s.
I wanted to see what more recent projects have done. Google Scholar found "cellVIEW: a Tool for Illustrative and Multi-Scale Rendering of Large Biomolecular Datasets" (2017) at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5747374/ which says
> The most widely known visualization softwares are: VMD [HDS96], Chimera [PGH04], Pymol [DeL02], PMV [S99], ePMV [JAG11]. These tools, however, are not designed to render a large number of atoms at interactive frame-rates and with full-atomic details (Van der Waals or CPK spherical representation). Megamol [GKM15] is a state-of-the-art prototyping and visualization framework designed for particle-based data and which currently outperforms any other molecular visualisation software or generic visualization frameworks such as VTK/Paraview [SLM04]. The system is able to render up to 100 million atoms at 10 fps on commodity hardware, which represents, in terms of size, a large virus or a small bacterium.
Following that is a section on Related Work:
> With their new improvement they managed to obtain 3.6 fps in full HD resolution for 25 billion atoms on a NVidia GTX 580, while Lindow et al. managed to get around 3 fps for 10 billion atoms in HD resolution on a NVIDIA GTX 285. Le Muzic et al. [LMPSV14] introduced another technique for fast rendering of large particle-based datasets using the GPU rasterization pipeline instead. They were able to render up to 30 billion atoms at 10 fps in full HD resolution on a NVidia GTX Titan.
Checking up on VMD, in "Atomic detail visualization of photosynthetic membranes with GPU-accelerated ray tracing" from 2016:
> VMD has achieved direct-to-HMD rendering rates limited by the HMD display hardware (75 frames per second on Oculus Rift DK2) for moderate complexity scenes containing on the order of one million atoms, with direct lighting and a small number of ambient occlusion lighting samples.
Those citations are from 6-7 years ago, which makes me scratch my head wondering why ChimeraX can't handle a picornavirus.
The author of EMAN2 is incorrect; I don't know why they claimed that. ChimeraX is probably like Chimera and targets 30 fps, but can drop well below that depending on dataset size and rendering quality. It should be using OpenGL with display lists (or some more modern variant on that). The main loop is likely in Python, but if you're just moving a molecule around, the rendering should touch very little Python. On a modern machine with an NVIDIA gaming card it should be fine.
For example, in this case I loaded 2PLV with "open 2PLV", and on the right side there's an option to select one of the mmCIF assemblies, with assembly 1 being "complete icosahedral assembly, 60 copies of chains 1-4". With the default ribbon rendering, rotating is completely smooth; with all atoms displayed (wireframe or sphere), it's still smooth. Computing a surface for the entire capsid takes well under a second(!) and still renders smoothly. Rotating shows my GPU (NVIDIA RTX 3080 Ti) at about 50% utilization, and if I exit ChimeraX, my GPU releases ~200 MB of memory.
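For anyone who wants to reproduce this, the ChimeraX command sequence is roughly the following (from memory, so double-check against `help sym`):

    open 2plv
    sym #1 assembly 1
    show cartoons
    surface

`sym #1 assembly 1` builds the complete 60-copy icosahedral assembly from the asymmetric unit, and `surface` computes the capsid surface.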
Chimera was never intended to do high quality rendering of cellular environments with many hundreds of proteins. It was intended for a combination of nice rendering and scripting directly in Python. VMD definitely handled some extremely large scenarios faster. A small dedicated C++ program using modern OpenGL would be able to do far, far more than Chimera when it comes to simple rendering without any scripting control.
Opening the bio-assembly for 3J31 ate about 8 GB of my VRAM, and 32 GB of my system RAM, in ChimeraX. Which is actually less than I remember a few years ago. I wonder if the render pipeline has changed a bit. That said, it's still very significant if you have 10-20 viruses attached to a cell, for instance.
EDIT - that's also just for atoms. Going to 3D maps is significantly more computationally intensive. A typical sub-tomogram average or annotation will be in MRC file format, which gets horrendously slow with a box size > 1024 pixels or so.
There's something wrong with your numbers. Opening 3j31 and turning on the complete assembly uses <<1 GB extra system RAM and <1 GB extra GPU RAM on my machine.
I've worked with 3D maps, and the same thing has been true for over 20 years: if you want to work with extremely large systems, you need an expensive graphics card. (I was the first person to port Chimera to Linux some ~20 years ago, and we always compared the performance of my gaming cards against the faster professional cards; for volumes like 3D maps, performance on the largest volumes was always quite close between the gaming cards and the largest professional cards.)
"Current visualization software, such as UCSF ChimeraX6, can only render one or a few protein structures at the atomic level."
Lots of current visualization software is focused on visualizing a single protein structure (for example, ChimeraX). New visualization and modeling systems are being developed to scale up to cellular scenes and even whole cells. For example, systems like Le Muzic et al.'s cellVIEW (2015) [1] are capable of rendering atomic-resolution whole-cell datasets like this one in real time: https://ccsb.scripps.edu/gallery/mycoplasma_model/
I still think "few" is the wrong word. I usually think of "few" as meaning up to around 6, while Chimera and VMD can easily handle hundreds of proteins at the atomic level.
Sounds pretty cool. How does EMAN2 deal with dynamic structures? I assume you'll get garbage if sufficiently different conformations get averaged together. Is there some kind of clustering to find similar conformations, as is sometimes done in cryo-EM?
Yes, at multiple levels. You can do a heterogeneous refinement that takes the structures and solves for n averages, using something like PCA to maximize the distances between the averages. Particles get sorted into the average they contribute to most constructively. That works well for compositional heterogeneity or large conformational differences.
For minor differences, there's something called the Gaussian mixture model (lots of software packages have something similar, but the GMM is EMAN2's version).
What you can get out of the other end is a volume series that shows reconstructed 3D volumes along various axes of conformational variability. This quickly turns into a multi-dimensional problem, but it has been very successful in, for instance, seeing all the different states of an active ribosome.
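As a toy illustration of that kind of sorting (emphatically not EMAN2's actual GMM, which is learned on the particle data itself), here's how you might cluster per-particle latent coordinates from some dimensionality reduction into conformational states:

    from sklearn.mixture import GaussianMixture

    def sort_conformations(embeddings, n_states=3):
        # embeddings: (n_particles, d) latent coordinates per particle,
        # e.g. from a PCA-style reduction of the aligned particles
        gmm = GaussianMixture(n_components=n_states, covariance_type="full")
        labels = gmm.fit_predict(embeddings)
        # Reconstruct one volume per label to get the volume series
        return labels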
Not an expert in that field, so I can only speculate wildly, but I'm not super optimistic. I can say I've seen the narrative around the field start to give way slightly - shifting from "genome is all you need" to "genome is not enough".
The problem is that many of the interesting or urgent pathologies have no obvious (or only weak) associations. Or maybe the noise level is too high. So there's got to be a piece of the puzzle we're missing, or something is getting lost in the noise. Whether a neural network can pull something out of the noise remains to be seen, but if it can't, I'm not super optimistic about our chances. Overall I'd say trying to tackle the problem at the outcome level is probably more promising right now. Even if we can find good associations, we're still lacking therapies.
Which was 12 years ago! After watching that video, I had a much greater appreciation for how our bodies are made up of trillions of tiny protein machines. Fascinating stuff!!
My friend in grad school was in Ron's group. He built a microscope that visualized individual kinesin molecules and measured their speed using fluorescent labelling. The whole thing was held together with a bunch of scripts written in LabVIEW. Ron had oodles of money and was able to support long-term development of open source software like MicroManager, which gives a common interface to a wide range of microscopy hardware.
The systems he studies are literally little motors that can attach to biological surfaces and drive around in specific directions, pick up payloads, and then drive to other places. They work in a very different way from how humans engineer tiny motors, and understanding/engineering their behavior was a major focus in the early 2000s.
> My friend in grad school was in Ron's group. He built a microscope that visualized individual kinesin molecules and measured their speed using fluorescent labelling.
Yeah, exactly that, but with kinesin instead of dynein (everybody started with myosin, but lost interest, and moved to kinesin and then dynein) and about 10 years earlier.
Those little blobs moving along the filaments are ~10-100 nanometers; you wouldn't normally be able to see them, but they managed to tether fluorescent (glowing) molecules to them, and those act like point sources of light. That allows for precise localization, because the PSF of a point source is approximately Gaussian, and finding the centroid of a Gaussian is trivial.
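A minimal sketch of that localization step (hypothetical helper names, assuming an isotropic Gaussian PSF and scipy):

    import numpy as np
    from scipy.optimize import curve_fit

    def gauss2d(xy, x0, y0, sigma, amp, bg):
        x, y = xy
        r2 = (x - x0)**2 + (y - y0)**2
        return (amp * np.exp(-r2 / (2 * sigma**2)) + bg).ravel()

    def localize(spot):
        # spot: small 2D image patch containing one fluorophore's PSF
        y, x = np.mgrid[:spot.shape[0], :spot.shape[1]]
        p0 = (spot.shape[1] / 2, spot.shape[0] / 2, 2.0, spot.max(), spot.min())
        popt, _ = curve_fit(gauss2d, (x, y), spot.ravel(), p0=p0)
        return popt[0], popt[1]  # sub-pixel centroid (x0, y0)

The fitted centroid is far more precise than the pixel size, which is what makes tracking individual motors possible.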
Appreciate the share. I've seen a compilation of these clips out of context and loved them, but never figured out where they came from. They really are amazing in striking the balance between organic and mechanistic. The kinesins in particular are cute.
Yes... though regrettably, that render is profoundly misleading. The payload is actually flailing around violently, dancing in the molecular nanoscale moshpit from hell. Between each glacial step, the payload basically explores its entire tethered configuration space. Picture a balloon in a hurricane, tied to a mouse clinging to a wire, rather than a donkey towing a barge. The render optimizes for art over education, for pretty over accurate, and it engenders misconceptions.
Consider filming a runner, but only showing frames where the arms and legs are in the same unmoving positions, so the rigid person quietly floats along over the ground. Or a soccer game rendered as floating statues. Not without value, but profoundly misleading. Especially for the poor alien student, deeply unfamiliar with animal life and planetary surfaces.
One nice aspect of TFA is that it renders a moment frozen in time. Rendering non-bogus dynamics remains a hard open problem.
So it's using ChimeraX to turn a PDB file (protein or DNA structure) into an isosurface, then triangulating the surface into a mesh, which is then rendered in Unreal.
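I don't know the authors' exact pipeline, but you can get surprisingly far with stock ChimeraX commands; something like this (syntax from memory, and the resolution and output format are just illustrative):

    open 2plv
    molmap #1 8
    save capsid.glb

`molmap` simulates a density map from the atoms (here at 8 Å resolution), which ChimeraX renders as an isosurface, and the glTF (.glb) export gives you a triangle mesh that a game engine can import.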
I just released a biology education app very much like the preprint, for the Vision Pro launch (and soon for iPad/iPhone). I worked with David Goodsell's group to integrate their whole-cell bacteria model, and David wrote the content. It looks like this: https://twitter.com/timd_ca/status/1753250624677007492 Our first bit of content is a tour through a 300-million-atom bacterial cell for Apple Vision Pro (>60 fps, stereoscopic, atomic resolution).
The linked preprint is beautiful, and I love the pipeline. I wonder if it's possible to export to other tools like Blender? The linked preprint is part of a pretty cool field of research into mesoscale modeling and visualization. For me these are a few of the standout papers, projects and works in the area (and there are many more):
- le Muzic et al. "Multi-Scale Rendering of Large Biomolecular Datasets" 2015 [1]
- - Ivan Viola's group helped pioneer large scale molecular visualization. This reference should be in the preprint, IMO.
- Maritan et al. "3D Whole Cell Model of a Mycoplasma Bacterium" [2]
- - This is out of David Goodsell's lab and the model I'm using.
- Stevens et al. "Molecular dynamics simulation of an entire cell" [3]
- Brady Johnston's Molecular Nodes addon for Blender [4]
I recently bought his book primarily for its illustrations and was painfully disappointed by the print quality. It was so bad I thought Amazon had sold me a counterfeit, so I returned it and ordered directly from the publisher instead, at significantly higher cost, only to receive the same.
I'd happily pay more for a high-quality print but don't see anywhere to do so.
If you get a kick out of 3D renderings of cells and molecules, you're gonna have a field day with the work done at https://random42.com/. Full disclosure: I started working there as a 3D artist but now lead the interactive department. You'd be surprised at how much good art direction really makes a difference in scientific visualization. Real-time graphics have advanced considerably in the last couple of years, but it's always a challenge to transport that nice, smooth pre-rendered look over to mobile devices and the web at 60 frames per second (90 on virtual reality headsets, to boot...)
Wow - these are stunning! I am curious whether you have any "realistic time" animations, e.g., where blood circulates at a speed close to that in the human body.
Nice videos but I'm always reminded when watching this type of molecular biology video that it's missing all the water molecules. These proteins and things aren't floating around in empty space.
This is an abbreviated version of the same, yes. I'm surprised, though, that the Vimeo link isn't working without login; I don't have an account with them and was able to play it without issues.
I would like to ask a question, prefaced by saying that I have no intent to judge, discredit, or diminish the value of this. It's merely that I really don't understand and would like to gain insight.
The question is: How is this a scientific contribution?
Or, to ask it differently: What makes this a scientific contribution?
It's more of an engineering accomplishment. I could see this being useful for exploratory data analysis of large protein (mesoscale) complexes. A surprising amount of science starts with a grad student staring at a really complex plot for a really long time, then suddenly going "oh shit, that's weird". That kind of realization is terribly difficult if your visualization tools are fighting you the whole time.
I understand that they get the proteins from the PDB/ChimeraX, but how much manual work is involved in mapping and placing the individual proteins? The paper says it gets the protein locations from cryo-ET tomograms, but I'd be surprised if these let you automatically identify which proteins are where and exactly how to place them, and even less so how the oligomers bind together to form larger structures - for example, in the video the membrane surfaces are very smooth and look almost textbook picture perfect, which suggests they come from a hardcoded model or are smoothed in some way. One part of the paper mentions sub-tomogram averaging from the tomogram, but another mentions:
> From a tomogram (Figure 3a), we select the particles and determine the orientation of the crown of the spike, as well as the stalk that connects the spike to the membrane
Is this a manual process, where the researcher uses their mental model of how the proteins fit and places/rotates individual proteins? Or do the tools they developed let this be automated? Both are impressive! If the former, I'm blown away by the effort it must take to make these kinds of videos.
It's a guided, but automated process. EMAN2 (the software used/partially written by the author), for instance, has a convolutional neural network particle picker. So you can pick a few particles by hand, pick some noise, pick some bad particles, train a neural network, and then run inference with that network to pick the remaining particles throughout the tomogram.
There are a variety of other methods too. There is simple six-dimensional real-space cross-correlation. You can place points by hand, or according to a model. For instance, if you are trying to identify viral spike proteins, and the virus is spherical, and the spike proteins are always on the surface of the sphere, you have a great starting point. So you can say "place points at n intervals along the surface of this sphere" and then oversample the spherical surface. You then take a reference volume (which can be generated a number of ways) and check each point you placed to see how well it matches the reference volume. You can allow for rotations and translations of the reference volume at each point, and if you find points that are too close together, you can merge them automatically.
If you have a high-contrast, relatively static protein (such as a ribosome), you can do 2D template matching in the tomogram, where you use a central slice (or maybe a collapsed projection) to do cross-correlation in three dimensions instead of six (X translation, Y translation, Z rotation). Or you can beef that up with more neural network/YOLO type stuff.
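A rough sketch of the cross-correlation idea behind those reference-based approaches (single orientation only, and without the per-window normalization a real implementation would do):

    import numpy as np
    from scipy.signal import fftconvolve

    def template_match(tomogram, template, keep=0.8):
        t = template - template.mean()
        t /= (t.std() * t.size)
        # Flipping the template makes the convolution act as correlation
        cc = fftconvolve(tomogram - tomogram.mean(),
                         t[::-1, ::-1, ::-1], mode="same")
        # Candidate particle coordinates: voxels near the correlation maximum
        return np.argwhere(cc > keep * cc.max())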
EDIT: To expand on this, continuous density like membranes can be roughly modeled just with techniques like thresholding, watershedding, etc. There are some neural nets such as Membrain [0] and tomoseg [1] (also by the author of this paper), but membranes certainly are trickier. I typically segment membranes by hand (and do so rarely).
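For the thresholding/watershedding route, a rough scikit-image sketch (the threshold and marker parameters are arbitrary placeholders):

    from scipy import ndimage as ndi
    from skimage.filters import threshold_otsu
    from skimage.segmentation import watershed

    def segment_density(volume):
        mask = volume > threshold_otsu(volume)        # global threshold
        dist = ndi.distance_transform_edt(mask)       # distance to background
        markers, _ = ndi.label(dist > 0.5 * dist.max())
        return watershed(-dist, markers, mask=mask)   # labeled regions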
ChimeraX has VR functionality. Certain modeling programs still support Nvidia's Stereo3D (Coot, PyMOL, Chimera, ChimeraX, and more), which I still use for modeling.
That relies on X11 unfortunately, so I'm looking for a new way to do 3D viewing.