Why is a browser extension required to access research papers?
67 points by bsenftner 5 days ago | 21 comments
There are a series of subreddits concerned with machine learning topics. In those subreddits there is an organization named CatalyzeX that is constantly promoting their browser plugin to access the latest ML papers, code, etc.

Here is one such post of theirs: https://old.reddit.com/r/LatestInML/comments/s2l10i/given_a_single_video_of_a_human_performing_an/

My question: why a browser extension? This really smells fishy, because a browser extension bypasses any browser security one may have, and has access to one's entire computer. What valid reasoning could there be to justify a browser extension for this purpose?

You don't need the plugin to view the paper.

CatalyzeX appear to be just posting (or spamming, depending on your point of view) interesting papers/videos to AI/ML subreddits to publicise their Chrome plugin (that "finds and shows implementation code for any... research papers").

The link to the paper is at the top of the post, under "paper link"; from there you can click through to the paper on arxiv.org.

The reason it's a browser extension is given on the plugin page: https://chrome.google.com/webstore/detail/aiml-papers-with-c...

▶ Browse the web as usual and you'll start seeing [CODE] buttons next to papers everywhere.

Nice explanation, nice purpose, but IMHO every browser extension should be open source unless it implements some particularly valuable proprietary logic the authors understandably want to keep secret.

Chrome browser extensions aren't really compiled, though I suppose they can be obfuscated. If you install this extension you ought to be able to find it in your Chrome profile directory and look at the manifest and JS source.

(former longtime chrome engineer here)

Or install an extension that lets you view the code of other extensions before installing them, directly on Chrome Web Store pages.
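To follow the suggestion of reading the manifest from your Chrome profile directory, here is a minimal Python sketch that lists what each installed extension asks for. It assumes Chrome's usual on-disk layout, `<Extensions dir>/<extension id>/<version>/manifest.json`; the function name is hypothetical, and the keys it reads (`permissions`, `host_permissions`, `content_scripts[].matches`) are the standard manifest fields.

```python
import json
from pathlib import Path


def summarize_extensions(extensions_dir: Path) -> list[dict]:
    """List what each installed extension asks for, based on its manifest.json.

    Assumes the layout <extensions_dir>/<extension id>/<version>/manifest.json.
    """
    summaries = []
    for manifest_path in sorted(extensions_dir.glob("*/*/manifest.json")):
        try:
            # "utf-8-sig" tolerates a leading byte-order mark.
            manifest = json.loads(manifest_path.read_text(encoding="utf-8-sig"))
        except (OSError, json.JSONDecodeError):
            continue  # skip anything this simple parser can't read
        # Pages whose DOM the extension's content scripts get injected into.
        content_matches = [
            pattern
            for script in manifest.get("content_scripts", [])
            for pattern in script.get("matches", [])
        ]
        summaries.append({
            "id": manifest_path.parent.parent.name,
            "version_dir": manifest_path.parent.name,
            # May be a "__MSG_...__" placeholder for localized extensions.
            "name": manifest.get("name", "?"),
            # Manifest V2 lists host patterns under "permissions";
            # Manifest V3 moves them to "host_permissions".
            "permissions": manifest.get("permissions", []),
            "host_permissions": manifest.get("host_permissions", []),
            "content_script_matches": content_matches,
        })
    return summaries
```

Typical default-profile locations to pass in are `~/.config/google-chrome/Default/Extensions` on Linux, `~/Library/Application Support/Google/Chrome/Default/Extensions` on macOS and `%LOCALAPPDATA%\Google\Chrome\User Data\Default\Extensions` on Windows (remember `Path(...).expanduser()` for the `~` forms). The JS files the `content_scripts` entries reference sit in the same version directory.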


> because a browser extension bypasses any browser security one may have, and has access to one's entire computer.

This isn't true. They don't have any access outside of the browser with that extension. Further, the list of sites that they've asked for permissions to seems vaguely reasonable. I have not actually looked at the extension code to see what they're doing, but they have only asked for permissions to the following:

catalyzex.com domains
arxiv.org
Google Scholar
Twitter
www.google.com

And unfortunately the extension security model isn't all that granular, so "Read and change your data" really just means they wanted to touch the DOM somewhere on those sites. They could be recording data, but it's more likely they wanted to add a button/link somewhere.

"Can read and change your data on google.com" is the opposite of reassuring.

It means they can view/modify the DOM of those sites.

Lots of harmless reasons to want to do that (ex: adding a link/button) but yes, it also means they can grab the text content of the site (read your data) and change the site DOM or your text nodes (change your data).

Just because they can doesn't mean they do, though (but it also doesn't mean they don't... shrug).

Again - the problem here is there's absolutely no way to tell the browser "Hey, I don't want to see text nodes, just put this button here in the DOM".

I should add - they appear to have only requested www.google.com, which is usually just search. Gmail/Docs/Sheets/Suite/Other are usually under separate subdomains - ex: gmail is mail.google.com.
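To make the subdomain point concrete: Chrome decides where an extension may run by comparing URLs against match patterns of the form `<scheme>://<host><path>`. Below is a simplified Python sketch of that comparison, not Chrome's actual implementation (ports, `file://` URLs and other edge cases are ignored, and `matches` is a hypothetical helper). The example patterns are illustrative, not copied from the extension's manifest.

```python
import re
from urllib.parse import urlsplit


def matches(pattern: str, url: str) -> bool:
    """Simplified check of a Chrome-style match pattern against a URL."""
    parts = urlsplit(url)
    if pattern == "<all_urls>":
        # Simplification: treated here as "any http(s) URL".
        return parts.scheme in ("http", "https")
    scheme, _, rest = pattern.partition("://")
    host, slash, path = rest.partition("/")
    path = slash + path  # keep the leading '/'

    # Scheme: '*' means http or https; anything else must match exactly.
    if scheme == "*":
        if parts.scheme not in ("http", "https"):
            return False
    elif scheme != parts.scheme:
        return False

    # Host: '*' is any host, '*.example.com' is example.com plus its subdomains,
    # anything else must match the URL's host exactly.
    url_host = parts.hostname or ""
    if host.startswith("*."):
        base = host[2:]
        if url_host != base and not url_host.endswith("." + base):
            return False
    elif host != "*" and host != url_host:
        return False

    # Path: '*' matches any run of characters; this sketch also includes the query.
    url_path = parts.path or "/"
    if parts.query:
        url_path += "?" + parts.query
    regex = ".*".join(re.escape(chunk) for chunk in path.split("*"))
    return re.fullmatch(regex, url_path) is not None
```

With this sketch, `matches("https://www.google.com/*", "https://www.google.com/search?q=arxiv")` is true, while the same pattern does not match `https://mail.google.com/mail/u/0/` or `https://accounts.google.com/signin`; only a wildcard host such as `https://*.google.com/*` would cover those.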

> It means they can view/modify the DOM of those sites.

Not really into this topic (extensions), but could this capability be used by an extension to create fake (invisible) login forms, grab your login data through auto-complete and send it home?

I mean - yes, although there's no real reason to even bother with the fake login form. They could just monitor the real login form the next time you login.

That said - at least for google, login is on accounts.google.com - so they aren't asking for that here.

Additionally, these extensions do go through review by google - something as blatant as a content_script that's phoning home with login details would ideally be caught (I develop extensions for work, but haven't tried submitting something malicious for review - so I can't really comment on whether they DO actually catch it)

You can use sci-hub.se to access virtually any academic paper (which has been digitized) you want. No browser extension required.

Unfortunately due to an ongoing court case, Sci-Hub have paused adding any new papers

[ https://old.reddit.com/r/scihub/comments/lofj0r/announcement... ]

Further details and suggested alternatives at https://old.reddit.com/r/scihub/wiki/index

The case is ongoing but the injunction was temporary and has now expired. See https://news.ycombinator.com/item?id=28421477

The real problem is that the LatestInML subreddit is effectively an ad for the extension.

Solution: don't subscribe to that subreddit.

Hi, co-author of CatalyzeX/the extension here! Most of the comments here seem to have been addressed, just re-iterating the following:

- The extension lets you easily jump to the code for papers (the papers are all open-access on the web).
- The extension code is not obfuscated, so it's easy to check out the underlying code for yourself (just search online for how to view extension source code). We just haven't officially open-sourced it as of now, as it still needs a fair bit of cleaning up and optimizing.
- What we're using the browser permissions for is to add code buttons in-line on the webpage(s) you're on. Permissions offered by Chromium browsers are unfortunately not more granular than what we're using to simply update the DOM.

Feedback is always welcome!

FWIW, in the latest Chrome, if an extension wants access to all websites, you can toggle it to be "click to enable", or whitelist only a few websites that are authorized to run it. Go to the extensions page -> choose the extension and modify where it can run.

Very nice. Firefox needs this too.

https://bugzilla.mozilla.org/show_bug.cgi?id=1737161 seems to be where they are working on it.

Edit: At least... I'm guessing. Regrettably, the bug's details link back to some internal Atlassian instance that Mozilla seems to be keeping. Same with related bugs.

This is a little surprising to me, since all of Mozilla's development tracking used to be on Bugzilla. I hope this isn't a move to something more like Safari or Microsoft, where issues vanish into an internal system and external updates are ignored.

> This really smells fishy, because a browser extension bypasses any browser security one may have, and has access to one's entire computer.

IIRC, that is no longer true since the advent of sandboxing (and the end of native-code extensions via NPAPI and friends); all permissions have to be granted explicitly by the user.

> What valid reasoning could there be to justify a browser extension for this purpose?

Given the permissions it asks for (access catalyzex.com, arxiv, google scholar, twitter and google) and the description, I'd guess that whenever you search for some research paper it will forward the search to catalyzex and annotate search results.
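Purely as an illustration of that guess (nothing here is taken from the CatalyzeX extension): to annotate results, an extension first has to recognize which links on a page point to papers. A minimal Python sketch, assuming only new-style arXiv abs/pdf links matter, that pulls arXiv IDs out of a page's HTML so they could be looked up in some code index; the regex and `find_arxiv_ids` are hypothetical.

```python
import re

# New-style arXiv identifiers (YYMM.NNNN or YYMM.NNNNN, optional version suffix)
# as they appear in abs/ and pdf/ links. Old-style IDs such as hep-th/9901001
# are deliberately not handled in this sketch.
ARXIV_LINK = re.compile(
    r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?", re.IGNORECASE
)


def find_arxiv_ids(html: str) -> list[str]:
    """Return the distinct arXiv IDs linked from a page, in first-seen order."""
    seen: dict[str, None] = {}  # dicts keep insertion order, so this dedupes stably
    for match in ARXIV_LINK.finditer(html):
        seen.setdefault(match.group(1), None)
    return list(seen)
```

The version suffix is dropped so that `abs/XXXX.XXXXXv2` and `pdf/XXXX.XXXXX` links to the same paper collapse into one ID.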

Unfortunately, the access to Google and Twitter can also be used to exfiltrate your credentials or to commit actions on your behalf there, so I'd be very careful. Too bad the Chrome extension store (unlike the Firefox extension store) does not allow you to directly download the extension to examine it, or to prevent automatic updates.

This is correct! :) Also, the extension's source code is not obfuscated so it's possible to download, extract, and go through the readable code. If anyone's keen, just Google or Bing search to see how to do so (or lmk!). We haven't officially open-sourced as of now as the code still needs cleaning up, refactoring, etc :)
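For the download-and-extract route: a Chrome extension package (.crx) is a ZIP archive with a signed header prepended. A minimal Python sketch, assuming you already have the .crx file saved locally, that skips the CRX2 or CRX3 header and opens the ZIP inside so the manifest and JS can be read; `crx_to_zip` is a hypothetical name.

```python
import io
import struct
import zipfile


def crx_to_zip(crx_bytes: bytes) -> zipfile.ZipFile:
    """Strip the CRX header from an extension package and open the ZIP inside.

    Both CRX2 and CRX3 start with the magic bytes b"Cr24" followed by a
    little-endian uint32 format version.
    """
    if crx_bytes[:4] != b"Cr24":
        raise ValueError("not a CRX file (missing Cr24 magic)")
    (version,) = struct.unpack("<I", crx_bytes[4:8])
    if version == 3:
        # CRX3: uint32 header length, then a protobuf header of that length.
        (header_len,) = struct.unpack("<I", crx_bytes[8:12])
        zip_start = 12 + header_len
    elif version == 2:
        # CRX2: uint32 public-key length and uint32 signature length,
        # followed by the key and the signature themselves.
        key_len, sig_len = struct.unpack("<II", crx_bytes[8:16])
        zip_start = 16 + key_len + sig_len
    else:
        raise ValueError(f"unsupported CRX version {version}")
    return zipfile.ZipFile(io.BytesIO(crx_bytes[zip_start:]))
```

Usage, with a hypothetical file name: read `extension.crx` in binary mode, pass the bytes to `crx_to_zip`, then call `namelist()` on the result to list the files or `extractall("extension_src")` to unpack them for reading.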

Other issues aside, CatalyzeX has annoyed me for a long time. The creators of the extension used to spam almost all ML/DL-related social media. Maybe they still do; I don't know, because I've since stopped following all of those platforms.

Why did it annoy me so much? They would share the popular, latest papers that most people in the ML sphere would come across anyway if they follow academic Twitter or the ML subreddit. That is fine in itself, but those posts would hijack the original source (mostly arXiv or the respective project page) by taking you to the CatalyzeX website, which is mainly designed to drive traffic to itself and has all sorts of irritating design patterns. Mostly, it just felt dishonest to me, a blatant shadowing of the original authors' hard work.

It's a way of ensuring you're on a supported platform with a supported browser and, most likely, that you're only logging in once (or in only one place at a time) with any given set of credentials.

Since it's an authenticated (from their perspective) add-on, they also might do some "clever" things with the materials being offered (animations vs flat PDFs, for example).

It may also handle/verify some aspect of their support model.
