
> though it's hard to get noticed.

I'd suggest starting with at least a brief paragraph on what thenose is, what its goals are, etc. I read your post and found myself reading about the technical workings of something I knew nothing about.

Oh, thank you. Basically, AI training datasets have recently been knocked offline by DMCA takedowns, and the goal is to bring them back online in a place that can't be knocked offline. The most popular training dataset was The Pile, hosted by The Eye: https://pile.eleuther.ai/

Notice that the links there now return 404. We tried to make a drop-in replacement for those links: all anyone has to do is change the-eye.eu to thenose.cc in the URLs.
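For anyone who wants to script the swap, here is a minimal sketch of the hostname change. The function name `to_mirror_url` and the example path are made up for illustration, and it assumes the mirror keeps the same paths as the original host, which is what "drop-in replacement" is meant to imply. It only touches URLs whose host is exactly the-eye.eu and leaves everything else alone.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "the-eye.eu"   # original host named in the comment above
NEW_HOST = "thenose.cc"   # mirror host named in the comment above


def to_mirror_url(url: str) -> str:
    """Return url with the host swapped to the mirror, if it points at OLD_HOST.

    Hypothetical helper: assumes the mirror serves identical paths.
    URLs on any other host (including subdomains) are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname != OLD_HOST:
        return url
    # Replace only the network location; scheme, path, query, and fragment
    # are kept as-is. Any port or userinfo in the original URL is dropped.
    return urlunsplit(parts._replace(netloc=NEW_HOST))


if __name__ == "__main__":
    # Placeholder path, not a real file listing.
    print(to_mirror_url("https://the-eye.eu/path/to/file.jsonl.zst"))
```

A plain find-and-replace over the text would work too; parsing the URL just avoids rewriting a string that happens to contain the-eye.eu somewhere other than the host.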

Unfortunately, there aren't many ways to get their attention and let them know this exists now. I'll try emailing the contact address, but I imagine they receive lots of spam, so I was hoping to get noticed by people like yourself first. Maybe a direct email is still the best way, but there's no guarantee they'll even be willing to change the URLs, given the legal risks. For all they know, I could be logging the IP address of everyone who downloads the data and forwarding it to the authorities. I'm not, but it's a frustrating problem to try to solve. I just want to help AI flourish.

This also serves as a template for someone else to do the same thing, so at least there can be multiple mirrors.

Thank you again. The fact that you even took the time to look it over means a lot. If you have any other ideas, I'd be interested to hear them.

