rehaanahmad's comments

rehaanahmad · 2024-09-08T19:55:45.000000Z

Email me at contact@alphaxiv.org and I'll add it ASAP!

john-titor · 2024-09-08T21:01:57.000000Z

Thanks a lot, will do!

rehaanahmad · 2024-09-08T17:35:54.000000Z

Great idea, we'll look into making the home page the trending page soon.

Regarding HTML, our original site actually only supported HTML (because it was easier to build an annotator for an HTML page). The issue is that a good ~25% of these papers don't render properly, which pisses off a lot of academics. Academics spend a lot of time making their papers look nice as PDFs, so when someone comes along and refactors their entire paper in HTML, not everyone is a fan.

That being said, I do think HTML makes a lot of sense for papers in the long term. It allows researchers to embed videos and other content (think robotics papers!). At some point we do want to incorporate HTML papers back into the site (perhaps as a toggle).

DoctorOetker · 2024-09-08T20:33:01.000000Z

I apologize for changing the topic here:

Did you bulk download the arXiv metadata, PDF, and/or LaTeX files?

I am trying to figure out how much space is required for just the most recent version of each PDF.

I can find mentions of the total size of their S3 bucket, but it's unclear whether that also includes older versions of the PDFs.

I also wonder whether the Kaggle dataset is kept up to date, since it lists merely 1.7M articles instead of the 2.4M I read about elsewhere.

Edit: I just found the answers to my question here: https://info.arxiv.org/help/bulk_data_s3.html

rehaanahmad · 2024-09-08T17:35:13.000000Z

I'm one of the co-creators of this site. I'm reading a lot of great suggestions so far, and many of them are already in the works (zooming in/out, infra issues causing slow loading times on some papers, claiming papers via Google Scholar).

For some more context, we are a group of 3 students with a background in AI research, and this site was initially built as an internal tool to discuss AI papers at Stanford. We've been dealing with a lot of growing pains/infra issues over the past month that we are in the process of hashing out. From there we would love to make a more concerted effort to share this in areas outside of AI. Happy to hear your thoughts here, or more formally via contact@alphaxiv.org.

I do want to highlight that our site has a team of reviewers/moderators, and having folks from different subject areas is critical to making sure the site doesn't end up a cesspool. Apply here: https://docs.google.com/forms/d/11ve-4cL0axTDcqnHF66zX6greFV....

musicale · 2024-09-09T02:24:35.000000Z

Moderation is typically the thing that doesn't scale. I am not sure it's a solvable problem (see Reddit, Stack Overflow, YouTube, Quora, etc. for negative examples and anti-patterns). Often sites start out great and then degrade when they become popular.

My main recommendation was going to be organizational: to cooperate and work with arXiv itself, rather than risking a potentially adversarial or competitive relationship.

Now that I think about it, however, I am convinced by a peer comment that was basically "leave arXiv the way it is and don't mess it up." So carry on then.

rehaanahmad · 2024-09-08T17:30:32.000000Z

We have a team of enthusiastic reviewers/moderators in a couple of sub-categories. We plan on growing this team as the site continues to grow. If you'd like to be a reviewer: https://docs.google.com/forms/d/11ve-4cL0axTDcqnHF66zX6greFV...

rehaanahmad · 2024-09-08T17:29:34.000000Z

Zoom is in the works! We are adding this in the coming week!

rehaanahmad · 2024-09-08T17:28:40.000000Z

Thanks for reaching out, I am one of the students working on this. We are adding Google Scholar support soon. If your paper isn't on Scholar or ORCID, you will need to submit a claim that our team reviews. There isn't really any other option: arXiv doesn't allow us to view the author's submission email automatically (although we are in the process of becoming an arXiv Labs project).