
ACM costs vs. arxiv.org costs - luu
https://twitter.com/jeremyphoward/status/1219365213201264640
======
jph00
I wrote this tweet. Wasn't sure anyone would read it - glad it's got noticed,
because I do think it's an important issue!

I'd love to know where that money at IEEE and ACM is going. The annual reports
don't make it at all clear, unfortunately. Obviously, there are for-profit
publishers where the money is simply going to huge profit margins. But that's
not the case for professional societies.

One thing I noticed when I had to sign up to ACM for a conference a few years
ago was that I got harangued by sales-people from ACM for months afterwards,
trying to get me on a call to have me buy more expensive memberships. It
wasn't an automated system - it was an actual person, trying to get me onto an
actual phone call with them. It occurred to me at the time that that must be
very expensive, yet it's still a profitable thing for them to do - so there's
clearly a lot of money changing hands...

I don't think this is a good sign. Perhaps the professional societies can
openly publish a full breakdown of what they're spending money on?

~~~
cs702
Thank you for tweeting this.

I'm making my way through the IEEE annual report to which you linked[a], and
am dumbfounded too. Not only does the IEEE spend $193.4M/year operating
"periodicals and media," they also spend $103.5M/year on "membership and
public imperatives" (!?) and $38.5M/year on "standards" (!?). Note that these
figures do _not_ include the amounts spent on conferences, which are at least
tangible.

Moreover, it looks like the organization has $523.8M in investments at fair
value (p. 43) -- an endowment greater than that of the vast majority of US
colleges and universities. Am I imagining things, or does the IEEE look a lot
like "a small hedge fund attached to a professional organization?"

To put that last figure in perspective, if income and gains from the IEEE's
investment portfolio are, say, at least 5%/year, they could fund all of
arXiv.org's annual expenses _20 times over_ in perpetuity without dipping into
capital.
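
A quick sanity check of the "20 times over" figure, as a rough sketch. The $523.8M and the 5% return come from the comment above; the $1.3M/year arXiv budget is the figure quoted elsewhere in this thread and is treated here as an assumption.

```python
# Back-of-the-envelope check; every input is a figure quoted in this thread.
ieee_investments = 523.8e6   # USD, investments at fair value (annual report, p. 43)
assumed_return = 0.05        # assumed 5%/year income and gains
arxiv_annual_cost = 1.3e6    # USD/year, quoted elsewhere in the thread (assumption)

annual_income = ieee_investments * assumed_return
coverage = annual_income / arxiv_annual_cost

print(f"annual income: ${annual_income / 1e6:.1f}M")
print(f"covers arXiv's budget {coverage:.1f} times")
```

Under those assumptions the income is roughly $26M/year, or about 20 times the arXiv figure.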

Something smells rotten indeed.

[a] [https://www.ieee.org/content/dam/ieee-org/ieee/web/org/corpo...](https://www.ieee.org/content/dam/ieee-org/ieee/web/org/corporate-communications/annual-report/2018-ieee-annual-report-final.pdf)

~~~
jph00
Another asset they have (and monetize) is a vast library of old papers. For
instance, if you want to read Kildall's 1973 classic "A unified approach to
global program optimization" then ACM will gladly let you do so... after you
pay them $15.00 for the 8 page paper.
[https://dl.acm.org/doi/10.1145/512927.512945](https://dl.acm.org/doi/10.1145/512927.512945)

It's hard to see how this is compatible with their claim to be "Advancing
Computing as a Science & Profession".

~~~
mhh__
I just clicked it and downloaded the paper?

(I agree with you just possibly a poor example)

~~~
aspaceman
Clicked at home and got the paywall. Pretty sure you're on a network with
access.

~~~
mhh__
Spooky! That makes sense.

Wouldn't have known because I've been (ummm...) "Acquiring" (libgen) them
elsewhere

------
elehack
There's a lot to critique in publishing and associated costs, but this tweet
is unfortunately factually wrong.

From the linked article, ACM's publication costs are $10.9M, not $33.7M.

One of the ACM's major publication initiatives over the last 3-5 years has
been an overhaul of their publication templates and publication workflow, to
ensure greater consistency in publication formatting, improve accessibility,
and archive publications in more future-proof formats. There are also the
ongoing costs of creating and indexing metadata (ACM tracks more metadata than
arXiv, including resolved citations), preservation (ACM buys failsafe
perpetual access services from Portico, arXiv has mirrors at other university
libraries).

Should it cost $10.9M? I am not sure. Does it cost a lot more than what arXiv
does? Yes.

For a costing exercise: the service ACM buys from Portico is archival and
republication. If ACM goes insolvent, Portico flips on their archive and the
content remains available. How would you price this service, knowing that when
it is actually needed, it's because your customer can no longer pay bills, and
you now need to take up their hosting (and all related costs) for
approximately forever with no further revenue? I think a network of university
libraries would be a more cost-effective way to provide this service, but it's
the kind of thing that people working on publication and archival
professionally think about, and that factors into the cost of professional
archival-level publication.

(I cannot speak to IEEE.)

~~~
chrisseaton
> their publication templates and publication workflow, to ensure greater
> consistency in publication formatting, improve accessibility, and archive
> publications in more future-proof formats

Publication workflow, formatting and accessibility? For every paper I’ve done
I just send the ACM a final PDF produced myself from a LaTeX template that
hasn’t changed in years. What’s the workflow for taking an already final PDF
from authors and uploading it to a file server?

~~~
elehack
That workflow has changed in the last few years.

\- Brand new templates (introduced about 5 years ago, the LaTeX template has
had multiple updates per year since then)

\- Workflow that makes use of the source (or possibly codes the source embeds
in the PDF, but you have to provide LaTeX source to ACM these days)

\- Papers now render in both PDF and HTML (and the HTML looks quite good),
this started showing up within the last 1-2 years

\- Papers are archived in an XML-based format (something called JATS, I do not
know details) to facilitate rendering to PDF, HTML, ePub, and other formats
not yet devised

~~~
elcritch
That doesn't seem too impressive. It's essentially a workflow that a few
universities could band together and replicate via an open source project
relatively easily IMHO.

As an example, Pandoc can already handle 90% of this type of workflow by
itself (converting LaTeX to various XML formats). It could be an open source
project shared among a few universities, or developed by a single body like the
ACM and used across dozens of publications and fields. Even two or three
full-time people working on this would cost much less than $1M per year.

------
andrew_n
One great thing about a $99/year ACM membership is that it includes full
access to the O’Reilly service formerly known as Safari Books, normally
$499/year. I have no idea what the volume pricing on Safari subscriptions is,
so can only say there’s a possibility that ACM article sales subsidize Safari
books; I do know that, as an individual who has to learn new things and look
old things up constantly, the inclusion of O’Reilly/Safari makes an individual
ACM subscription a fantastic deal.

Additionally, having seen someone organize an academic conference once, I do
know that IEEE provides a conference with things like bank accounts,
insurance, and purchasing departments that can meet the creditworthiness
requirements of major hotels. It also ends up covering the shortfall if the
conference winds up losing money.

I’m all for improving efficiencies where possible, and there are definitely
some problems with these organizations, don’t get me wrong; but I did want to
emphasize that both organizations are definitely providing real value to parts
of the computing community.

Disclaimer: I haven’t read the link as twitter is (intentionally) inaccessible
from my machine.

~~~
exikyut
I'm curious why twitter is inaccessible from your machine.

~~~
Mathnerd314
From "intentionally" I would guess a social network DNS blocklist that he
self-configured, although it could also be a workplace policy.

~~~
andrew_n
Yeah, self-configured DNS blocklist. Some of the reasons I’m not a fan of
twitter apply to hacker news as well; for example, both can be distracting and
contribute to procrastination.

Compared to hacker news, my main issues with twitter are a much lower signal-
to-noise ratio, lack of prioritization over time windows—if I take a
day/week/month away from hacker news, it’s easy to catch up on the top things
I missed at [https://hn.algolia.com](https://hn.algolia.com) —lack of depth in
content, and pervasive tracking across other sites including through t.co.

------
Rochus
Isn't this a typical case of Parkinson's law?

See e.g.
[https://en.wikipedia.org/wiki/Parkinson%27s_law](https://en.wikipedia.org/wiki/Parkinson%27s_law)
(didn't find a free version of the book unfortunately).

There is even a fitting example in the book where Parkinson describes how the
administration of the British Navy became bigger and bigger although there
were fewer and fewer ships to manage. In the present case, one would have to
assume analogously that with the introduction of the Internet and the
advancing automation, considerable costs were eliminated.

So the question is: have ACM and the other mentioned organizations already
reached the tertiary and last stage of INJELITITIS?

~~~
cowsandmilk
> have ACM and the other mentioned organizations already reached the tertiary
> and last stage of INJELITITIS?

Frankly, ACM membership and publication is a good deal compared to many other
societies. For $200 annual dues, you get unlimited access to all journals.

Contrast to the American Chemical Society. For $175, you get:

1) Access to 50 articles for 48 hours

2) The right to purchase access to additional articles at $12 a pop for 48
hour access.

You want to look at the article again after 48 hours? Pay up again.

[0] [https://www.acs.org/content/acs/en/membership/member-benefit...](https://www.acs.org/content/acs/en/membership/member-benefits/publications-discounts.html)

~~~
Rochus
Well, that was not the argument. I agree, there are much greedier
organizations around. However, this does not answer the question why the
operation of these organisations is so expensive compared to their services.
Parkinson has a good explanation though ;-)

------
drallison
This post makes no sense. ArXiv is a site to which papers are posted. ACM and
IEEE are technical societies with a range of publications professionally
managed, peer reviewed, and edited. They serve different needs and have--
surprise surprise--different costs.

~~~
jph00
They don't pay for peer review. I'm not sure what you mean by "professionally
managed" or "edited" exactly - or why that would cost over $100m.

(Disclaimer: I wrote the tweet. Although I didn't expect it to appear on
HN...)

~~~
denzil_correa
Thanks Jeremy for highlighting this issue.

A lot of scientific publishers have hijacked "Open Access" to charge high fees
for the same publication as before and pocket more money. For example,
"Springer Blood Cancer Journal" charges $ 4,580 as OA fees. I can't imagine
how one can rationalize that cost.

~~~
Someone
PLOS ONE’s fee is $1,595. A factor of 3 doesn’t seem inexplainable to me.
That’s “Springer _Nature_ Blood Cancer Journal”, so factors of 1½ in “better
editing”, “older, less efficient systems” and “fewer publications/year, so
lower efficiency of scale” could already do it.

Also, is Nature working on digitizing old content? That can be costly (I
remember reading somewhere that it could involve a) finding a library that has
a copy and b) flying there to photograph it) and that’s a cost PLOS won’t ever
have.

~~~
denzil_correa
> PLOS ONE’s fee is $1,595. A factor of 3 doesn’t seem inexplainable to me.

You need to first explain why PLOS ONE is an appropriate baseline. A world
class Open Access journal costs roughly $10 per submission [0]. Most arguments
I have seen using PLOS ONE as a baseline talk about the "non profit" part of
PLOS. It has to be stressed here that "non profit" doesn't mean PLOS works
on a "non profit" business model. It just means that they generate profit AND
the profit isn't distributed to its members, directors or officers [1].

[0] [https://gowers.wordpress.com/2016/03/01/discrete-analysis-la...](https://gowers.wordpress.com/2016/03/01/discrete-analysis-launched/)

[1] [https://www.law.cornell.edu/wex/non-profit_organizations](https://www.law.cornell.edu/wex/non-profit_organizations)

------
musicale
Usually when the budget is opaque there's a reason for it.

Money really seems to be wasted for the most part, and the digital library has
always been an embarrassment. Years later, it still has no API, and last I
checked the terms of service forbid you from accessing it with software!
Completely braindead.

The only reason to join ACM besides accessing the crappy digital library is to
save money on conference registration fees, which are outrageous because of
what the venues usually charge and get away with.

Usenix seems (to me at least) to be better run and had open access policies
years before ACM. But the IEEE is largely a cesspool as well, charging absurd
fees for digital library access, and sponsoring a number of spamferences.

~~~
musicale
Actually I might stand corrected somewhat; if you want a Safari/O'Reilly
subscription, ACM might be the way to do it?

However... I'd probably be in favor of unbundling this and other commercial
services from ACM memberships.

------
liamdiprose
If it's hosting costs the ACM is worried about, they could offer torrents as a
free download option to lighten the load.

~~~
covertlibrarian
I imagine the vast majority of the digital library is included here:

[http://libgen.is/scimag/repository_torrent/](http://libgen.is/scimag/repository_torrent/)

80,700,000 articles as of today. These torrents contain every single article
that's ever been accessed through sci-hub. Database dumps at
[http://gen.lib.rus.ec/dbdumps/](http://gen.lib.rus.ec/dbdumps/) provide an
index.

------
redis_mlc
For those who are trying to follow the money, often organizations like these
have expensive old-school benefits like defined-benefit pension plans.

~~~
jph00
Following the money is an excellent plan!

I haven't heard anything about defined-benefit pension plans, or their use in
professional societies. Could you say a bit more? Or provide a link to learn
more?

------
getpolarized
What are the costs here... 25M downloads per month doesn't seem like it should
cost $100k per month. Must be total archive size I imagine?

~~~
zachruss92
This isn't insanely high given its traffic and that it's attached to an
educational (not for profit) institution. They need developers,
infrastructure, support, people to process the papers, etc...

------
zozbot234
Is arxiv.org mirrored on archive.org? The latter has its Wayback Machine of
course, but that might not necessarily follow .ps and .pdf links.

~~~
kragen
ArXiv has historically been extremely hostile to crawlers, which is one of the
reasons for its low costs.

~~~
Dylan16807
Bah. If they average thousands of downloads per file, there's plenty of room
in there for some crawlers.

And the total data set is less than a terabyte; seed a torrent somewhere for
$20.

The user-pays S3 bucket also exists as a good thing but S3 is much more
expensive than data needs to be.

~~~
kragen
They started in 1991, when a terabyte was an unimaginably huge quantity of
data and it was common for anonymous FTP servers like xxx.lanl.gov to request
that you not connect until after business hours to avoid interfering with the
main purpose of the machines. When I joined the internet in 1992, our 7.5-MHz
VAX had a 56-kbps frame-relay link to New Mexico Technet (TECNET on our
DECNET), which I think may also have provided LANL's rather beefier internet
connection. They started providing WWW access in 1993, before Apache added
preforking to NCSA HTTPD, and in fact I think before NCSA HTTPD itself. This
means that initially every new HTTP request involved forking a new child
process from the HTTP server, which took a few hundred milliseconds. This is
the context in which the arXiv's hostile stance toward spidering was
established.

I agree that it would be an extremely valuable course of action to seed a
_series_ of torrents, since a single torrent wouldn't work; it would have to
be replaced every time a new paper was uploaded, fragmenting the swarm enough
to render it useless. Also, they could surely use Fastly and permit spidering.

~~~
Dylan16807
> They started in 1991, when a terabyte was an unimaginably huge quantity of
> data

The collection was a lot smaller then. The whole thing has fit on one hard
drive for a long time. And as far as interpreting my post as criticism goes,
apply it to the last ten years only.

> This is the context in which the arXiv's hostile stance toward spidering was
> established.

They should have reconsidered it at some point.

> series of torrents

Sure. A torrent for each 500MB chunk they already collate, or yearly, or both.

~~~
murkt
arXiv has bulk dumps that anyone can fetch from S3. They aren't updated every
day, not even every month, but still they have it. I used it myself.

~~~
Dylan16807
[https://news.ycombinator.com/item?id=22115894](https://news.ycombinator.com/item?id=22115894)

------
dependenttypes
Something seems weird. 25000000 downloads per month: let's assume that the
average PDF size is 1MB; that would be only 776 GB of data uploaded per day. I
do not think that this can justify the $1.3m/year figure.
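
The arithmetic can be reproduced roughly as below. This is only a sketch: the 1 MB average PDF size is the assumption stated above, while the 30-day month and the binary "GB" (2^30 bytes) are guesses at how the 776 figure was reached.

```python
# Rough reproduction of the bandwidth estimate; inputs are quoted or assumed.
downloads_per_month = 25_000_000
avg_pdf_bytes = 1_000_000     # assumed average PDF size of 1 MB (10^6 bytes)
days_per_month = 30           # assumption
annual_cost = 1.3e6           # USD/year, the figure being questioned

bytes_per_day = downloads_per_month * avg_pdf_bytes / days_per_month
gib_per_day = bytes_per_day / 2**30                       # binary gigabytes
cost_per_download = annual_cost / (downloads_per_month * 12)

print(f"{gib_per_day:.0f} GiB/day")
print(f"${cost_per_download:.4f} per download")
```

With decimal gigabytes the same inputs give about 833 GB/day, so the conclusion does not depend much on the unit convention. Spread over downloads, the budget works out to well under one cent per download.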

~~~
aneutron
For the sake of the argument, let's assume the following:

\- If you factor all the man-hours required to keep the site up and running
(ignoring the setup costs), then you'd have about one engineer for a whole
year.

\- If we go for a simple S3 based download and accompanying infrastructure,
it would be safe to assume that it would cost AT LEAST about $700k per year
(AWS is not cheap).

I'd say it's about right.

------
EamonnMR
Can anyone speak to if ACM and/or IEEE memberships are worth it for a
professional developer? I loved reading stuff like the History of Programming
Languages papers back when I had student access in college.

------
Causality1
Taking someone's money to fund research and then not giving them access to
that research is theft. If that money is tax money, I'd call it treason.

~~~
throwawayjava
The big bucks are in patents, not in The Proceedings of the Fifth Annual ACM
Intergalactic Conference on Subsubsubsfied.

Doing something about
[https://en.wikipedia.org/wiki/Bayh%E2%80%93Dole_Act](https://en.wikipedia.org/wiki/Bayh%E2%80%93Dole_Act)
seems way more important than fixing publishing...

------
fock
and except for simple formalities, Arxiv isn't reviewed, typeset or anything
else, which a normal journal usually entails (and typesetting costs... hands
up, who is doing all the integral d-s in mathrm...). And yet people cite Arxiv
often blindly...

~~~
thanatropism
\renewcommand{\d}{\mathrm d}

~~~
mkl
I prefer \newcommand{\intd}{\,\mathrm{d}} to separate it a little.
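
A minimal sketch of how that macro might be used in practice. The document skeleton and the integrand are purely illustrative.

```latex
\documentclass{article}
% Upright differential "d" preceded by a thin space.
\newcommand{\intd}{\,\mathrm{d}}
\begin{document}
\[ \int_0^1 x^2 \intd x = \frac{1}{3} \]
\end{document}
```

Defining a new name avoids redefining `\d`, which LaTeX already uses for the dot-under accent in text mode.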

------
dvanduzer
(arxiv.org is not archive.org if someone wants to fix the title. both are
absurd creations that can't possibly exist, though.)

~~~
AceJohnny2
> _both are absurd creations that can't possibly exist, though._

How so?

~~~
bordercases
It was an endearing comment.

