
Assuming this is actually a genuine question:

Here are the per country ratios of different VOC/VOI: https://covariants.org/per-country

It took a while until it became dominant in India, and then a similar pattern repeats in each new country.


According to this thread from another virus expert [1], you can integrate pretty much every class of virus into the system the authors of this paper used.

[1] https://twitter.com/ArisKatzourakis/status/13904196191696732...


I implore everyone to read the full article and not only the headline. The paper being reported on was widely criticized in its scientific community for its weak evidence. It should probably be seen (if you are being very charitable) only as a starting point for other virus experts to jump into the discussion and run more experiments, not as something for general consumption.


Yes, because poor science is never jumped on by the media for general consumption.


I would advise interpreting the IFR reported by the Ioannidis paper with an extreme amount of caution. One of the authors criticized (to put it mildly) by Ioannidis posted a detailed rebuttal in [1]. A second thread [2] also gives a very detailed analysis of issues with the paper.

[1] https://twitter.com/GidMK/status/1376304539897237508

[2] https://twitter.com/AtomsksSanakan/status/137593538213983437...


I'd personally be extremely cautious of a rebuttal in a twitter thread rather than a professional rebuttal, which is how these sorts of things should be sorted out in the scientific world. A twitter thread to rebut a peer-reviewed paper actually makes me sad that this is where scientists go, during a period where we need their input more than ever.

Also, fwiw, I followed antibody testing studies for the better part of a year and anything over a 0.004 IFR was rare. I was surprised to see it as low as 0.0014 in this review (I expected closer to 0.002-0.003), but 0.0068 is incredibly high.


That's a valid take, but still a weird one to me. Academic publishing takes a very long time. Every paper I was involved in took many months from submission to eventual publication (completely ignoring the time it takes to prepare a submission).

Ioannidis published something extremely controversial (if not outright flawed), and one of the main authors he attacked responded with a lengthy explanation so that this manuscript would not remain unchallenged. I found that aspect way more important than the venue of the response. Would you prefer to leave Ioannidis' work unchallenged for potentially months instead?


I've seen multiple (not peer-reviewed) rebuttals published in academic venues within a week or two during this pandemic. Not everything published needs to be peer reviewed before going before a wider audience - twitter simply isn't the platform for generating useful dialog - maybe not in any context, but certainly not in an academic one.

edit: do you recall the paper that made waves in the U.S. claiming covid causes heart damage? That paper was published and edited multiple times prior to peer review, based on criticism, within a week or two IIRC - the appropriate way to handle conflict in science isn't by trying to get twitter to chime in and make things personal. Ioannidis' "attacks" were unprofessional; in my mind, though, they were absolutely not on the level of calling out an author on twitter.


If you can recompile, but the code was written to target only AVX-512, you can use https://github.com/simd-everywhere/simde to near-effortlessly map the intrinsics to AVX2 (or lower).
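
For illustration, SIMDe can be dropped in front of existing intrinsics code like this (a sketch; the header path and behavior here are based on my reading of the SIMDe docs and may vary by version):

    /* With SIMDE_ENABLE_NATIVE_ALIASES, existing _mm512_* code
       compiles unchanged; on non-AVX-512 hardware SIMDe emulates
       the intrinsics via AVX2/SSE/NEON or plain scalar code. */
    #define SIMDE_ENABLE_NATIVE_ALIASES
    #include <simde/x86/avx512.h>

    void add16(const float *a, const float *b, float *out) {
        __m512 va = _mm512_loadu_ps(a);   /* load 16 floats */
        __m512 vb = _mm512_loadu_ps(b);
        _mm512_storeu_ps(out, _mm512_add_ps(va, vb));
    }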


This will be quite bad for reproducible science. Publishing bioinformatics tools as containers was becoming quite popular. Many of these tools have a tiny niche audience, and when a scientist wants to reproduce results from a paper published years ago with a specific version of a tool, they might be out of luck.


Simplest answer is to release the code with a Dockerfile. Anyone can then inspect build steps, build the resulting image and run the experiments for themselves.

Two major issues I can see are old dependencies (pin your versions!) and out-of-support or no-longer-available binaries etc.

In which case, welcome to the world of long term support. It's a PITA.
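
As a sketch of what "pin your versions" can look like in practice (the digest and the gcc version string below are placeholders for illustration, not values to copy):

    # Pin the base image by immutable digest instead of a moving tag
    FROM debian:buster-slim@sha256:<digest-you-tested-against>

    # Pin exact package versions instead of whatever apt resolves today
    RUN apt-get update && \
        apt-get install -y --no-install-recommends gcc=4:8.3.0-1 && \
        rm -rf /var/lib/apt/lists/*

This doesn't make the build fully reproducible (apt archives also age out), but it fails loudly instead of silently drifting.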



I would recommend running a registry mirror as it's fairly straightforward.

https://docs.docker.com/registry/recipes/mirror/
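
Based on that recipe, the setup is roughly (a sketch; host/port are illustrative):

    # Run the stock registry image as a pull-through cache for Docker Hub
    docker run -d -p 5000:5000 \
      -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
      --name hub-mirror registry:2

    # Then point the daemon at it in /etc/docker/daemon.json and restart:
    #   { "registry-mirrors": ["http://localhost:5000"] }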


That's still more effort than pushing a tar file to a free public gh repo.


There is a bit of upfront work, but backups are thereafter automated.


True. I was thinking more for archiving science. Most people in that category would probably rather push to gh or upload to Dropbox than set up a docker registry.


That doesn't help against expiring base images though... :/


Yeah. That’ll be a mess. The way I try to do it is to build an image for a project’s build environment and then use that to build the project. The build env image never changes and stays around forever or as long as is needed. So when you have to patch something that hasn’t been touched for 5 years you can build with the old image instead of doing a big update to the build config of the project.

Many Docker based builds are not reproducible. Even something as simple as apt-get update failing with a zero exit code (it does this) adds complexity and most people don’t bother doing a deep dive.

Personally I use Sonatype Nexus and keep everything important in my own registry. I don’t trust any free offerings unless they’re self hosted.


There needs to be a way to create a combined image of all dependencies to distribute with a Dockerfile and code. That way people could still modify the code and dockerfile.


https://docs.docker.com/engine/reference/commandline/save/

You already have the source, now you have a dist.
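
Concretely, that looks like (image name here is hypothetical):

    # Serialize an image (all layers + metadata) to a tarball
    docker save -o mytool-1.0.tar mytool:1.0

    # Anyone can restore it later, no registry involved
    docker load -i mytool-1.0.tar

The tarball can then live anywhere you'd put any other artifact: GitHub releases, Zenodo, institutional storage.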


Tern is designed to help with this sort of thing: https://github.com/tern-tools/tern#dockerfile-lock

It can take a Dockerfile and generate a 'locked' version, with dependencies frozen, so you at least get some reproducibility.

Disclaimer: I work for VMware, but on a different team.


The Dockerfile should always be published, but it does not enable reproducible builds unless the author is very careful, and even then there's no built-in support. It would be cool if you could embed hashes into each layer of the Dockerfile, but in practice that's very hard to achieve.
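
There is at least partial support today: FROM accepts a digest, which freezes the base layer. A sketch of recording one (the digest printed is whatever you actually tested against):

    # Record the digest of the base image you built against
    docker pull debian:buster-slim
    docker images --digests debian:buster-slim

    # Then reference it immutably in the Dockerfile:
    #   FROM debian@sha256:<digest printed above>

Everything apt or pip fetches afterwards is still unpinned, though, which is where tools like Tern or Nix come in.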


My field is doing something similar.

Reproducible science is definitely a good goal, but reproducible doesn't mean maintainable. Really scientists should be getting in the habit of versioning their code and datasets. Of course a docker container is better than nothing, but I would much rather have a tagged repository and a pointer to an operating system where it compiles.

It's true that many scientists tend to build their results on an ill-defined dumpster fire of a software stack, but the fact that docker lets us preserve these workflows doesn't solve the underlying problem.


FYI, and for anyone else still learning how to version and cite code: Zenodo + GitHub is the most feature rich and user-friendly combination I've found.

https://guides.github.com/activities/citable-code/


Thank you for mentioning Zenodo. I really liked how EU funding agencies push for reproducibility/citability of data and code when you submit proposals to them.

I haven't filed any NSF stuff (yet), but I didn't come across any such hard requirements where you had to commit to something like Zenodo (or similar) to archive the results of your research work for archiving/citation purposes.


I <3 Zenodo. My societies don't require open data, but that's a generational shift.

Also, if you do bio-type research, you can use Data Dryad too!


Zenodo is great! In theory you could also upload a docker image to Zenodo and give it a DOI, but it doesn't seem to have an especially elegant way to pull this image after the fact.


It seems you simply have to pull it every 5.99 months to avoid removal. So add all your images to a bash script, pull them every couple of weeks via crontab, and you're fine.

On the other hand, I see the need for making money, and storage/services cannot be free (someone always pays for it somewhere), but 6 months is not much for certain usages.
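
The "keepalive" approach could look something like this (image names are illustrative; substitute your own):

    #!/usr/bin/env bash
    # keepalive.sh -- re-pull images so they count as recently used.
    set -euo pipefail
    images=(
      "myteam/analysis-tool:1.2"
      "myteam/pipeline:0.9"
    )
    for img in "${images[@]}"; do
      docker pull "$img"
    done

    # crontab entry, e.g. weekly on Sundays at 03:00:
    #   0 3 * * 0  /path/to/keepalive.sh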


"Pulling docker images every 5 months as a service"


Hey, you could distribute that as a container on Docker hub...


Which would mean people will regularly pull it and thus prevent it from being deleted. I call that a self-sustainable business model.


This is both amusing and actually feasible


Finally a good use for that raspberry pi idling in the corner


How hard is it to pull non-ARM images from a pi?


You can pull another platform's image. If the image is only available for one platform, you can just pull it. If it is available for multiple platforms, docker will pull the one for your platform by default; you need to explicitly specify the digest to get another platform's:

For example:

    docker pull python@sha256:2d29705d82489bf999b57f9773282eecbcd54873d308a7e0da3adb8a2a6976af

Pulls the latest Python image for Linux Alpine running on IBM System/z.

Not what you were asking, but if you have qemu-user-static installed, you can even run Docker images for other platforms. For example:

    docker run -it python@sha256:2d29705d82489bf999b57f9773282eecbcd54873d308a7e0da3adb8a2a6976af

That's Python for Linux for IBM System/z. Check the CPU architecture:

    import os; os.uname().machine

Running other platform's images under QEMU is going to be quite slow on a Raspberry Pi, I imagine (I haven't tried it). But of course for this case you don't have to run the image, you just want to pull it, so that doesn't matter.


Just ensure you've busted the cache, otherwise you're only pulling a joke.


I'm sure you've cited research older than 5.99 months right?

I wish they would grandfather in images uploaded before this new ToS so they don't get wiped. Future images could then go to more stable and accepting platforms, while images on Docker Hub from pre-ToS research would survive.


well, it sounds like someone's gotta pony up the bucks for their own image repo, rather than freeload off someone else's storage infra.


Full circle achieved.

START: Run your own stuff on stuff you own. Then run your stuff on other people's stuff you rent. Then "this is too expensive to maintain at your rent, pay us more." Back to running your own stuff on stuff you own. ...and so on, and so forth.

And this, ladies and gentlemen, is why anything worth doing is worth actually doing yourself. Nothing is worse than building something whose feasibility is conditional on someone else, only to have the rug pulled out from under you by a sudden business pivot.

But that's the nature of the beast I suppose. I've certainly not found a great way to do it any other way.


Science Docker Repo as a Service, backed by Amazon Glacier plus an index, with a one-time fee for access?


Oh yes I did, some probably as old as myself. Things just don't change that much in certain areas.


Why? It'll force a shift to a more elegant and general model of specifying software environments. We shouldn't be relying on specific images but on specific reproducible instructions for building images. Relying on a specific Docker image for reproducible science is like relying on a hunk of platinum and iridium to know how big a kilogram is: artifact-based science is totally obsolete.


Hmmm, what if the instructions say to get a binary that was deprecated 5 years ago?

What if it uses a patched version of an obscure library?

Software preservation is a huge topic, and it is not done based on instructions.


The FreeBSD Ports tree specifies package building via reproducible instructions, and handles things like running extra patches for compatibility and security on source distributions. FreeBSD binary packages are simply packaged ports.


Include the patch in the build instructions


There will always be these cases. The issue is that in many fields it is the norm rather than an exception.


I couldn't agree more. The defense of images over instructions to build them has often been "scientists don't work this way", but to me that's either overly cynical or an indication that something is rotting in academic incentive structures.


You could say the same about distributing docker images for deploying code for non-scientific software as well (and honestly, it may very well be true).

But that doesn't change the fact that it's just way easier to skim a paper and pull a docker image than follow every paper's custom build instructions and software stack.


Why would build instructions have to be custom? Making a reproducible image should be as easy as getting a docker image.


> rotting

I would not say rotting. From my perspective, the academic community has always lagged behind engineering best practices (except in their specific fields).


These reproducible instructions you speak of are already present in Dockerfiles.

It seems like you're arguing against using docker images, when docker builds solve the very issue you speak of.

Correct me if I'm wrong...?


A Dockerfile is not a reproducible set of build instructions in most cases. I'd guess that the vast majority of Dockerfiles are not reproducible.

Let's look at an example dockerfile for redis (based on [0])

    FROM debian:buster-slim
    RUN apt-get update; apt-get install -y --no-install-recommends gcc
    RUN wget http://download.redis.io/releases/redis-6.0.6.tar.gz && tar xvf redis* && cd redis-6.0.6 && make install
(Note, modified from upstream for this example; won't actually build)

The unreproducible bits are the following:

1. FROM debian:buster-slim -- unreproducible, the base image may change

2. apt-get update && apt-get install -- unreproducible, will give a different version of gcc and other apt packages at different times

Those two bits of unreproducibility are so core to the image that they result in every other step not being reproducible either.

As a result, when you 'docker build' that over time, it's very unlikely you'll get a bit-for-bit identical redis binary at the other end. Even a minor gcc version change will likely result in a different binary.

As a contrast to this, let's look at a reproducible build of redis using nix. In nixpkgs, it looks like so [1].

If I want a reproducible shell environment, I simply have to pin down its dependencies, which can be done by the following:

    let
      pkgs = import (builtins.fetchTarball {
        url = "https://github.com/NixOS/nixpkgs/archive/48dfc9fa97d762bce28cc8372a2dd3805d14c633.tar.gz";
        sha256 = "0mqq9hchd8mi1qpd23lwnwa88s67ac257k60hsv795446y7dlld2";
      }) {};
    in pkgs.mkShell {
      buildInputs = [ pkgs.redis];
    }
If I distribute that nix expression, and say "I ran it with nix version 2.3", that is sufficient for anyone to get a bit-for-bit identical redis binary. Even if the binary cache (which lets me not compile it) were to go away, that nixpkgs revision expresses the build instructions, including the exact version of gcc. Sure, if the binary cache were deleted, it would take multiple hours for everything to compile, but I'd still end up with a bit-for-bit identical copy of redis.

This is true of the majority of nix packages. All commands are run in a sandbox with no access to most of the filesystem or network, encouraging reproducibility. Network access is mediated by special functions (like fetchTarball and fetchGit) which require including a sha256.

All network access going through those specially denoted means of network IO means it's very easy to back up all dependencies (i.e. the redis source code referenced in [1]), and the sha256 means it's easy to use mirrors without having to trust them to be unmodified.

It's possible to make an unreproducible nix package, but it requires going out of your way to do so, and rarely happens in practice. Conversely, it's possible to make a reproducible dockerfile, but it requires going out of your way to do so, and rarely happens in practice.

Oh, and for bonus points, you can build reproducible docker images using nix. This post has a good intro to how to play with that [2].

[0]: https://github.com/docker-library/redis/blob/bfd904a808cf68d...

[1]: https://github.com/NixOS/nixpkgs/blob/a7832c42da266857e98516...

[2]: https://christine.website/blog/i-was-wrong-about-nix-2020-02...


Unless something changed in the months since I last used Nix, this will not get you bit-for-bit reproducible builds. Nix builds its hash tree from the source files of your package and the hashes of its dependencies. The build output is not considered at any step of the process.

I was under the impression that Nix also wants to provide bit-for-bit reproducible builds, but that that is a much longer term goal. The immediate value proposition of Nix is ensuring that your source and your dependencies' source are the same.


You're right that nixos / all nix packages isn't/aren't perfectly reproducible.

In practice, most of the packages in the nixos base system seem to be reproducible, as tested here: https://r13y.com/

Naturally, that doesn't prove they are perfectly reproducible, merely that we don't observe unreproducibility.

Nix has tooling, like `nix-build --check`, the sandbox, etc which make it much easier to make things likely to be reproducible.

I'm actually fairly confident that the redis package is reproducible (having run `nix-build --check` on it, and seen it have identical outputs across machines), which is part of why I picked it as my example above.

However, I think my point stands. Dockerfiles make no real attempt to enforce reproducibility, and rarely are reproducible.

Nix packages push you in the right direction, and from practical observation, usually are reproducible.


This is true, but the Nix sandbox does make it a little easier. If you're going for bit-for-bit reproducibility, it has some nice features that help, like normalizing the date, hostname, and so on. And optionally you can use a fixed output derivation where you lock the output to a specific hash.


The focus of nix in the build process is the ideal that if you have the same three build inputs - bash 4, gcc 4.8.<patch>, libc <whatever version> - and the same source (hash-wise), the output is very much (in most cases) going to be the same. Nix itself (even on non-NixOS) uses very little of the system: it won't use the system libc, gcc, bash, ncurses, etc., but its own copies, locked to a version down to the hash. It follows a target (with exact spec) -> output model, whereas a Dockerfile is more output-first. This is also why Nix has its own CI/CD system, Hydra, to verify on a daily or even hourly basis that builds stay reproducible.


Exactly. Basically, if your product needs network access during build, you don't have a reproducible build, and if you don't have a reproducible build, it's only a matter of time before something goes horribly wrong.


Maybe they should switch to Github. https://github.com/features/packages


Or store the containers in the Internet Archive alongside the paper. They’re just tarballs. Lots of options as long as you're comfortable with object storage.


This still means that tools published in the last few years up until now might just be gone soon. The people who uploaded the images might have graduated or moved on, and no one will be there to save the work.


Sounds like a job for the Archive Team, as long as there's some way to identify the images worth saving.


Yep, just mentioned it to the Archive Team IRC. We're probably going to selectively archive particular Docker images, although that's a lot of manual labor.

If you have any ideas wrt selecting important images, that'd be great.


Rough idea: maintain an Awesome List of images worth saving, take submissions from public, use that list to automate what to pull?


Yeah, good idea — I’m not in these fields so it’s difficult for me to judge. Also, it sounds like we should be prioritizing niche images that only a handful of papers use rather than images that people rely upon regularly.


Couldn't you bootstrap a list by searching/parsing the Archive dataset itself? Searching for:

A) "docker pull" commands, parsing the text that follows based on the command's syntax[1] to extract instructional references to images, such as "docker pull ubuntu:latest", and

B) links/text beginning with "https://hub.docker.com/_/" to identify informational references to image base pages, such as https://hub.docker.com/_/ubuntu

[1] https://docs.docker.com/engine/reference/commandline/pull/
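
A rough sketch of that extraction (the regexes are simplified approximations of the image-reference syntax, not the full grammar):

```python
import re

# Image references after "docker pull": NAME[:TAG|@DIGEST] (simplified)
PULL_RE = re.compile(r"docker\s+pull\s+([\w.\-/]+(?::[\w.\-]+|@sha256:[0-9a-f]{64})?)")

# Docker Hub official-image pages, e.g. https://hub.docker.com/_/ubuntu
HUB_RE = re.compile(r"https://hub\.docker\.com/_/([\w.\-]+)")

def extract_images(text):
    """Return image names referenced via `docker pull` or Hub links."""
    images = set(PULL_RE.findall(text))
    images |= set(HUB_RE.findall(text))
    return sorted(images)

sample = """
To reproduce, run: docker pull ubuntu:latest
See https://hub.docker.com/_/redis for details.
Also: docker pull biocontainers/samtools@sha256:""" + "0" * 64 + """
"""
print(extract_images(sample))
```

Run over crawled paper text or READMEs, the output would seed a candidate list that humans could then prune.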


Good idea! The base images are probably not in danger of being deleted though.

The other issue is that (to my knowledge) the number of papers on IA isn't terribly impressive. I think indexing and going through SciHub may be better, since some of these fields slap paywalls in front of their papers.

However, that's a pretty large task as well. The other thing is that papers rarely say "to reproduce my work do . . .". Usually the best we've got is a link to a GitHub repo (if that). I'm not sure how effective that strategy will be, since it's guaranteed to undercount the docker images we'd need to archive. Perhaps in conjunction with archiving all images that fall under particular search queries, we'd get the best of both worlds.

If you've got ideas, feel free to hop onto efnet (#archiveteam and #archiveteam-bs) (also on hackint) to share your thoughts.


Since images tend to be based on each other I wonder if someone's analyzed the corresponding dependency graph yet. In theory you should get quite far if you isolate the most commonly used base images.


Are those not the images that are basically guaranteed to stay in Dockerhub?


“Guaranteed” is a strong word.


quay is another alternative.


Publishing containers to GitHub might be free, but you have to log in to GitHub to download containers from free accounts, significantly hampering end-user usability compared to Docker Hub, particularly if 2FA is enabled on the GitHub account. As mentioned elsewhere, Quay.io might be another alternative.


We (the GitHub Packages team where I work) are working on a fix for this and a number of issues with the current docker service. You can join the beta too, details here https://github.com/containerd/containerd/issues/3291#issueco...


You don't need to register an SSH key to download a public repo, I thought.


Not an SSH key, but you do need an access token:

> You need an access token to publish, install, and delete packages in GitHub Packages.

https://docs.github.com/en/packages/using-github-packages-wi...


...but not to download. You can clone a repo and download release artifacts without a PAT. That's only necessary for interacting with the API for actions that need authentication, which would be anything involving mutating a repository.


Yes, you need one to download. Note that you'll get an auth prompt for this Registry API URL:

https://docker.pkg.github.com/v2/test/test/test/tags/list

> "code": "UNAUTHORIZED", "message": "GitHub Docker Registry needs login"


Using the GitHub Docker Registry requires auth, even just for downloads.

https://docs.github.com/en/packages/using-github-packages-wi...

GitHub Packages is different from GitHub Releases (and their artifacts) or cloning repos.


> You need an access token to publish, install, and delete packages in GitHub Packages.

Yes, you do.


GitHub access tokens are a bit of a nightmare, since you can't limit the permissions of a token. The only workaround I've found is to create another GitHub user for the access token and restrict that user's access.


GitHub storage for docker images is very expensive relative to free: I don’t think it’s a viable solution in this case.


They should be using Nix or similar then. The typical Dockerfile is not reproducible.


As long as the Dockerfile is released alongside, this should not be an issue.

I don't see any valid reason why anyone would upload and share a public docker image but not its Dockerfile, so I never pull anything from Dockerhub that doesn't also have the Dockerfile on its Dockerhub page.


Dockerfiles are not guaranteed to be reproducible. They can run arbitrary logic which can have arbitrary side-effects. A classic is `wget https://example.com/some-dependency/download/latest.tgz`.
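
One mitigation for that particular failure mode (URL and hash below are placeholders for illustration): fetch a pinned release and verify its bytes, so the build fails loudly if the upstream artifact ever changes:

    # Pin an exact release, never "latest"
    wget -O dep-1.2.3.tgz https://example.com/some-dependency/download/v1.2.3.tgz
    # Fail the build if the bytes differ from what was tested
    echo "<expected-sha256>  dep-1.2.3.tgz" | sha256sum -c -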


What about when the image that it is based on goes out of date and is pruned too?


This is part of why I tend to only use images that build from a small set of well-established base images like scratch, alpine, debian and occasionally ubuntu. Those base images can also be handled in the same way. For any exception, you can always do the same.

A bonus to this is that you no longer have the risks of systems breaking because of Dockerhub or quay.io (which I haven't seen mentioned here yet, btw) being offline.


Couldn't journals host the images? Or some university affiliated service, let us call it "dockXiv"?

Having the images on dockerhub is more convenient, but as long as the paper says where to find the image this does not seem that bad.


My grandmother had nine children. Seven of those survived and two did not. This was very common in the area she lived in then. To this day, though only in private moments, she talks about the two that did not make it. The loss of the two children many decades ago still brings her a lot of pain. She gave the name of one of the children that did not survive to a later child, but that never erased the suffering.

I think we should be more kind to our ancestors. Just because they lived in fucked up times, compared to our current standards, does not mean they experienced a different quality of suffering.


> Just because they lived in fucked up times, compared to our current standards, does not mean they experienced a different quality of suffering.

Perhaps it's more about "you can get used to anything." Happiness set-point theory, etc.

If you're always experiencing some low level of pain, then the pain soon stops being distracting. It still has pain qualia, if you focus on it; but those qualia no longer impact your life. You learn to function around/through that pain. It stops having relevance to your brain's decision-making process. It stops being processed consciously.

And someone who has learned to do that, if they experience a set, larger amount of pain (a tooth being extracted, say), will experience less subjective pain relative to someone not already experiencing that low constant underlying pain, because the relative amount of pain they experienced—the pain they haven't learned to ignore, the pain that leaps to conscious attention—will be less for them, than it is for the person who normally experiences no pain at all.

Now take that concept and apply it to mental anguish or guilt/shame. I would expect that it would imply that people who lived in times where everyone had all sorts of reasons to be anguished, and thus were low-level constantly anguished—would end up less likely to get PTSD, simply because there are fewer things in their lives that can truly "pin the needle" of anguish enough to cause PTSD, when the tare on their "enough anguish to rise to conscious attention" scale has been reset higher.


I think you might be onto something. Today's folks, myself included, take suffering, be it physical or psychological, as something exceptional, the worst of the worst situations in life, and will do just about anything to get rid of it. And in many cases we have some ways to help with it.

Compare that to times when you simply had to endure suffering (and everybody suffered somehow): a headache, a badly healed fracture, being gang raped during some war or bandit raid, seeing your wife or child die during childbirth, and so on. And nowhere to escape to, apart from alcohol or natural drugs, which produced many addicts; drunks have been part of most societies forever, and the story was often the same as today.


The message from the community was to ensure mask supply for medical staff first. They are both more at risk themselves and a risk for others due to the large numbers of contacts.

Once supply is secured everyone should be educated about correct use and be encouraged to wear them.

Most of Asia already has a culture of wearing masks and existing supply chains. Europe/US did not and it takes a while to establish.


It might not be surprising, but it is an extremely important point to keep reiterating.

People keep spreading the malthusian myth of overpopulation and shifting blame to the poorest. Blaming the poor while giving the rich a free pass is not only unfair, it’s cruel.

Once climate change denialism is not tenable anymore, climate change will be instrumentalized for atrocities.


I'm afraid that you are onto something here. People misattributing climate change to the poor masses is increasingly common, and overpopulation in poor countries is often cited as a reason for rich nations not to change their behavior at all.


I was looking into adding at least one Threadripper server to our local HPC system. I only found one ASRock motherboard for TR4, but none yet for sTRX4. I hope we will see 1U rackmounted sTRX4 systems soon.

Our new HPC system will mostly consist of Epyc 7742, but having one node with super high single-thread performance would be nice for less well parallelized applications.

