Show HN: An open-source media hosting site that's anonymous and fast (mediacru.sh)
95 points by jdiez17 on Aug 10, 2013 | 57 comments



Stellar work. Your openness about everything is incredibly refreshing, and integration with RES was an awesome move. It's an easy choice to make the switch!

I am seriously impressed. I loaded up your README on GitHub and had your app running locally within 10 minutes, including `gif` uploads. That's really nice craftsmanship, the kind that is usually missing in fresh projects. Giant kudos to you guys.

(There were a couple trivial steps I had to do that weren't documented. I submitted a pull request. [1])

[1] - https://github.com/MediaCrush/MediaCrush/pull/108


Thank you so much! It feels awesome to know you've managed to run it smoothly. Really makes all the effort we've put into it worth it!

PS: Merged your pull request, thanks!


Thanks! Much appreciated.


"The only thing we store about you is your hashed IP address"

The IPv4 address namespace is so small that brute-forcing hashes would be trivial, so this isn't an effective approach.
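To make the concern concrete, suppose the stored value were a single fast, unsalted hash. This is a hypothetical naive scheme chosen for illustration. Recovering the address is then just a loop over candidates. The whole IPv4 space holds only 2^32 addresses, which is a small search for a fast hash like SHA-256. The demo searches a single /16 so it finishes in moments:

```python
import hashlib
import ipaddress
from typing import Optional


def naive_ip_hash(ip: str) -> str:
    # Hypothetical naive scheme: one unsalted SHA-256 over the dotted-quad string.
    return hashlib.sha256(ip.encode("ascii")).hexdigest()


def recover_ip(target_hash: str, search_space: str = "0.0.0.0/0") -> Optional[str]:
    # Hash every address in the network until one matches the leaked hash.
    for addr in ipaddress.ip_network(search_space):
        candidate = str(addr)
        if naive_ip_hash(candidate) == target_hash:
            return candidate
    return None


# The whole IPv4 space holds only 2**32 = 4,294,967,296 addresses.
print(f"{2 ** 32:,} candidate addresses")

# Demo over a single /16 (65,536 addresses) so it finishes quickly.
leaked = naive_ip_hash("10.20.30.40")
print(recover_ip(leaked, "10.20.0.0/16"))  # 10.20.30.40
```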


Update: looks like it _will_ take a lot of time to brute-force a single IP:

   >>> timeit.timeit('bcrypt.generate_password_hash("127.0.0.1")', "from app import bcrypt", number=10)
   3.342690944671631
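Here is a back-of-the-envelope estimate from that timing. bcrypt salts each hash individually, so an attacker has to repeat the search for every stored hash. The figure below assumes a single core exhausting the entire address space. An attacker who parallelizes across many cores or skips unallocated ranges would need far less time, so treat this as a rough upper bound for one hash, not a security guarantee:

```python
# Figures from the timeit run above: 10 bcrypt hashes took about 3.34 seconds.
measured_seconds = 3.342690944671631
hashes_timed = 10

seconds_per_hash = measured_seconds / hashes_timed
ipv4_space = 2 ** 32  # every possible IPv4 address

# Worst case for one stored hash: try every address on a single core.
total_seconds = seconds_per_hash * ipv4_space
years = total_seconds / (365 * 24 * 3600)

print(f"{seconds_per_hash:.3f} s per hash")
print(f"{years:.1f} single-core years to exhaust the IPv4 space")
```

That works out to roughly 45 years on one core at that speed.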


True. We use bcrypt with a cost factor of 12 (2^12 iterations), though, so the hashes should be at least moderately secure. It's the best solution we've found so far... we'd be happy to hear alternatives that let us generate secure hashes for the IPs while still allowing us to ban people by IP if necessary.

And yes, I know that a bcrypt cost of 12 doesn't mean much. It will slow down brute-force attacks, though.


If the only purpose is IP blacklisting (bans, DoS protection, etc.), then doing 12 rounds of bcrypt would be counterproductive. It'd be a lot of CPU usage on your end, especially if it's done on every request.

A better approach would be something quicker to compute but equally easy to destroy. Generate a random token that cycles every X minutes/hours, then HMAC the remote IP with this secret. Use the result for bans/rolling rate limiting. If you keep the token only in memory, you don't have to worry about the IP lists being leaked, as they won't be recoverable.
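This is a minimal sketch of that rotating-secret idea using Python's standard `hmac` module. The class name, the 32-byte key, and the one-hour default are illustrative assumptions. The key lives only in process memory. Once it rotates, tokens computed under the old key can no longer be tied back to any address:

```python
import hashlib
import hmac
import secrets
import time


class RotatingIPTokenizer:
    """Hypothetical helper: maps IPs to opaque tokens under a key that rotates."""

    def __init__(self, rotate_every_seconds: float = 3600.0):
        self.rotate_every = rotate_every_seconds
        self._key = secrets.token_bytes(32)  # never written to disk
        self._created = time.monotonic()

    def _maybe_rotate(self) -> None:
        # Replace the key once it is older than the rotation period.
        if time.monotonic() - self._created >= self.rotate_every:
            self._key = secrets.token_bytes(32)
            self._created = time.monotonic()

    def token(self, ip: str) -> str:
        self._maybe_rotate()
        return hmac.new(self._key, ip.encode("ascii"), hashlib.sha256).hexdigest()


tokenizer = RotatingIPTokenizer(rotate_every_seconds=3600)
banned = {tokenizer.token("203.0.113.7")}

print(tokenizer.token("203.0.113.7") in banned)   # True
print(tokenizer.token("198.51.100.1") in banned)  # False
```

The design choice is what makes the bans "rolling": after a rotation, old tokens stop matching, so a ban lasts at most one rotation period unless it is re-applied.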


An even better approach would be a Bloom filter.
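The commenter doesn't spell out the design, so here is one reading: a fixed-size bit array that answers "probably banned" or "definitely not banned" without storing the addresses themselves. The sizing defaults are assumptions. There are two caveats. First, anyone holding the bit array can still test all 2^32 IPv4 addresses. At a 1% false-positive rate that test can return up to roughly 43 million addresses, which blurs the real entries but doesn't hide them, so feeding it keyed tokens rather than raw IPs is safer. Second, a plain Bloom filter can't delete entries, so lifting one ban means rebuilding the filter:

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: may report false positives, never false negatives."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
        self.size = math.ceil(-expected_items * math.log(false_positive_rate) / math.log(2) ** 2)
        self.num_hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


banned = BloomFilter(expected_items=10_000)
banned.add("203.0.113.7")
print("203.0.113.7" in banned)  # True
```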


I've created an issue [1] to discuss this on GitHub. Feel free to explain your ideas there and we'll figure out what would work best.

[1] https://github.com/MediaCrush/MediaCrush/issues/116


You could keep only the last three bytes of the IP address (e.g. x.1.2.3), and ban only for a short time (for example 1-2 hours) when required.
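This is a minimal sketch of both halves of that suggestion, assuming IPv4 only and a two-hour default. The helper names are hypothetical. Each stored value x.b.c.d stands for 256 possible addresses, so on its own this is light anonymization, and a ban also catches the other 255. The short expiry limits both how long the data is kept and how long collateral bans last. The more common convention, used for example by Google Analytics' IP anonymization, zeroes the last octet instead:

```python
import ipaddress
import time
from typing import Dict


def keep_last_three_bytes(ip: str) -> str:
    """Drop the first octet of an IPv4 address, e.g. 203.0.113.7 -> x.0.113.7."""
    _, b, c, d = ipaddress.IPv4Address(ip).packed  # raises ValueError on malformed input
    return f"x.{b}.{c}.{d}"


class TemporaryBans:
    """Bans that lapse on their own after a fixed duration (default: 2 hours)."""

    def __init__(self, duration_seconds: float = 2 * 3600):
        self.duration = duration_seconds
        self._expires: Dict[str, float] = {}

    def ban(self, key: str) -> None:
        self._expires[key] = time.monotonic() + self.duration

    def is_banned(self, key: str) -> bool:
        expiry = self._expires.get(key)
        if expiry is None:
            return False
        if time.monotonic() >= expiry:
            del self._expires[key]  # forget expired entries entirely
            return False
        return True


bans = TemporaryBans()
bans.ban(keep_last_three_bytes("203.0.113.7"))
print(bans.is_banned(keep_last_three_bytes("203.0.113.7")))  # True
```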


Hm, not a bad idea. Thanks!


Ban all known .gov and .mil IPs. Ban all IPs owned by the MIC (KBB, BAA, etc.).
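Mechanically, a range ban like this is a CIDR membership check. The sketch is hypothetical. It uses the RFC 5737 documentation blocks as stand-ins, because real allocations would have to come from a registry or a maintained list, and it says nothing about how complete such a list could be. The check has to run on the raw address at request time, before any hashing:

```python
import ipaddress

# Placeholder ranges only: these are the RFC 5737 documentation blocks,
# NOT real .gov/.mil or contractor allocations.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(ip: str) -> bool:
    # Works for IPv4 and IPv6 input; a version mismatch simply never matches.
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


print(is_blocked("198.51.100.25"))  # True
print(is_blocked("203.0.113.7"))    # False
```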


What makes you think these people aren't capable of purchasing or routing through commercially available address space?


They are certainly capable of paying for it... the facts show, though, that they are typically not good at shielding their source IP: the Wikipedia edit from the Senate IP, the tracking of the SAIC IP to the NSA, etc.

It's not a 100% guarantee, but it's a start.


Hi, I made this site with jdiez17 over the past few months. We'd love to hear your thoughts on our work.


This is awesome, and I really appreciate the effort toward privacy/transparency. Along those lines, supporting Do Not Track is great, but why use GA at all? Just ease of implementation? Is this something you plan to move away from?


Thanks!

Well, GA is quite convenient - we get pretty graphs, realtime analytics and so on. It's not something we have considered moving away from, since it's trivial to disable it entirely. And it's not significantly worse than any other tracking tool.


I'd say it's substantially different from hosting your own Piwik, OWA, or even something like Snowplow, where you could elect to avoid storing IPs.

That said, those all entail a lot of work and/or additional cost. You're also absolutely right that allowing users to disable it (and ads) is an amazing feature.


I've found Piwik to be unusable for large datasets.


How large?

Piwik is really ancillary to the discussion at hand, but I often see the claim that Piwik can't handle busy sites, and it's important to quantify the claim.

I've had success (and others report similar behavior) with 500,000+ hits per day. http://piwik.org/docs/optimize/ reports adequate support at higher levels. It's quite easy to set this up with EC2 + RDS, and using autoscaling gets you a very resilient solution that can easily handle those numbers. Also, in the case of mediacru.sh, many of the optimizations have little impact, since they optimize reporting on already-gathered analytics. With only two analytics viewers/users, this is not much of an issue.

If you're doing more than 1 million hits per day, then I think something like Snowplow, a commercial solution, or a fully custom solution is appropriate anyway.


You might want to consider self hosted analytics. I've heard a lot of talk about Piwik, though I've not done much with it myself.


In addition to what jdiez had to say about GA - we're trying to understand our audience a little better. MediaCrush uses tons of new web tech that won't work on outdated browsers, and GA helps us get an easy look at support for things like that. Also tells us what kind of media is most popular, and who's sending us traffic, which is just kind of nice to know.


I want to like you, but for me using GA nullifies every nice thing you say in https://mediacru.sh/serious

There is no excuse for not running your own: http://demo.piwik.org/

Also worth reading: http://manurevah.com/blah/en/blog/Like-this-if-You-are-Again...


I'd volunteer to set it up for you on a virtual machine for free.


Okay, I understand that using Google Analytics when we're so pro-privacy is a bit of a weird choice.

We've realised that self-hosting our analytics might be a better choice. I've created an issue[1] to discuss this matter. Ideally what we'd want is something as close to GA as possible - real time analytics being reasonably important.

Note: we were aware of the implications of using GA on the site, but since we offer the ability to disable them very easily we didn't think it was a big deal. That's a mistake on our part, so let's discuss how to fix it.

[1] https://github.com/MediaCrush/MediaCrush/issues/117


LOL! Okay, we don't like Piwik because it's written in PHP, so fsck privacy, google can haz it. This is truly sad.


The demo page [1] uses a GIF made from the anime Chuunibyou demo Koi ga Shitai! [2]. Was it a totally random choice, or did the MediaCrush crew pick it on purpose (aside from showing off the big difference)? Apparently many geeks are into anime, but sadly the overall plot quality seems to deteriorate a little almost every year. That said, Chu2koi was actually one of the better series of 2012 Q4 (with a nice visual side, too).

To be less off-topic: good job on building it, on the openness, and on finally shipping it.

  [1] https://mediacru.sh/demo
  [2] http://en.wikipedia.org/wiki/Love,_Chunibyo_%26_Other_Delusions


Heh, it's certainly not random choice. Sir_Cmpwn is a massive anime geek. Personally, I'd have used a cat gif.


I thought 2011 and 2012 were pretty good.

2011 had Fate/Zero, Madoka, Steins;Gate, Hunter x Hunter, Mawaru Penguindrum

2012 had From the New World, Jojo's Bizarre Adventure, Humanity has Declined, Mirai Nikki, Hyouka

2013 has... uh... Attack on Titan, I guess?


This is great!

Any plans to have an API that other products/services can use? Is it against the TOS to post to /upload from a different domain?

I'll be working on a site soon that might allow some type of media upload. Would it be okay to use mediacru.sh for something like that?

I could see "free hosting" getting really expensive for you guys though, especially if you allowed hotlinking, etc.

Anyways, best of luck to you. Hopefully this becomes profitable for you.


You're free to use our API. There's some shitty "docs" here: https://github.com/MediaCrush/MediaCrush/issues/50


Awesome. Thanks again!


Hmm, weird. I tried this with two of the same MP3 files, and neither the file size nor the quality changed. :/ Original: http://151.236.11.202/mp3/2.mp3 Converted: http://151.236.11.202/mp3/1Vzigd0EXPo1.mp3 Any suggestions?


Actually, we don't do any processing on mp3 files yet. The following files are compressed to the best of our ability: GIF, JPG, PNG, MP4, OGV, SVG.

Got an idea for a good way to compress mp3 files?


To be fully honest, I have no clue, lol. I thought you had some type of method, and that's why I was like, "Oh damn, this will be amazing for my new project" :-) Best of luck; I hope to see MP3 compression soon.


Don't.


Well, go figure: it can be done with the MP4s, so why not give it a try with MP3s?


I'm a stickler for quality. I may be an outlier, but the future shouldn't be in 320 kbit/s.


We aren't planning on using lossy compression. We're sticklers for quality, too.


It would be nice if you could take videos from YouTube and convert them into GIFs.


From what I've seen, this is all free, which is of course great, and open source too. But I wonder if you guys have a business model. I mean, somebody has to pay for those servers, right?


We don't have a business model right now. At this stage we're just trying to build a service with the best possible experience for the users. Monetization will come later, we can afford to pay server and bandwidth bills for now.

Our only sources of income are donations and the advertisements that we show exclusively on the home page. You can check all of our finances at https://mediacru.sh/transparency, by the way.


Brilliant. I wish you guys the best of luck. I hope all of this pays off in the end and you guys can make this awesome product profitable, because I hope to be using this for a long time :)


Maybe I'm blind, but what's the file size limit?


25MB. That's not actually stated on the website (good catch) but you can see it here, on the nginx config: https://github.com/MediaCrush/MediaCrush/blob/master/config/...
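For anyone hosting their own copy, the nginx directive that enforces an upload cap like this is `client_max_body_size`, and requests over the limit get a 413 response. Here is a minimal sketch, assuming a hypothetical server block (the actual MediaCrush config is only linked, not reproduced, here):

```nginx
server {
    listen 80;
    server_name example.com;  # hypothetical host name

    # Reject request bodies larger than 25 MB with "413 Request Entity Too Large".
    client_max_body_size 25m;
}
```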


Those rewrites you're using

https://github.com/MediaCrush/MediaCrush/blob/master/config/... (BTW, how do you highlight two separate lines?)

are considered taxing

http://wiki.nginx.org/Pitfalls#Taxing_Rewrites

and you could replace them with just a "return."
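The linked Pitfalls page illustrates the point with a redirect like the one below. The exact rewrites in the MediaCrush config aren't reproduced in this thread, so `example.com` and the redirect itself are placeholders:

```nginx
# Taxing: a regex captures the whole URI only to paste it back unchanged.
rewrite ^/(.*)$ https://example.com/$1 permanent;

# Cheaper: no regex at all; $request_uri already holds the original path and query string.
return 301 https://example.com$request_uri;
```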

I love the idea, the fact that you provide all the configs, that it's written in Python... it's a really great project.


Oh, didn't know about that. Well, that's the good thing about being open source - you have some knowledge we didn't have, so submit a pull request fixing it, I'll merge it, and we'll all be a little bit happier.

Thanks for the heads up!


Oh, thanks. I think this is a promising project but I'd like to point out a couple of things.

I tried to upload 3 different files: two PNG files (under 1MB in total) and a GIF file (~5MB). I was only able to upload one of them (one of the PNG files). I tried to upload the same files in 2 different browsers (Chrome 29.0.1547.49 and Firefox 23), in both normal and private mode, but the result was the same. Maybe you're dealing with heavy traffic right now, or maybe the problem was on my side. I hope I don't sound like I'm criticizing you (not that there's anything wrong with that); I'm merely pointing out my experience in the first 5 minutes. I'll keep using the service to see if everything works out, though, because I'm currently looking for a service like yours.

Another point I want to make is similar to the one I've already made: you need to provide more information about the service. Clearly, since this is an anonymous service, users won't be able to sign up for an account to manage their files, but what happens when I upload my files? Are they going to be indexed by search engines? How long are you going to keep my files online?


Can you link me to your failed images? I'll see what's up. Load isn't too terrible at the moment.

As for information, I'll make sure it's clearer on the site, but I'll answer you directly for now: when you upload your files, they disappear into our servers and can only be accessed through their URL. If you lose it, just upload the file again; we hash it on the client before the upload actually happens, so you'll get the same URL back. As for indexing, view pages are not shown in search engines. Your files stay there forever.
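Re-uploading works because the URL is derived from the file's content rather than assigned at random. Here is a minimal sketch of that content-addressing idea. It is written in Python for consistency, though in the browser this would be JavaScript. The hash function, the base64 encoding, and the 12-character length are assumptions, not MediaCrush's documented scheme. Because the client can compute the ID before sending anything, it can also ask the server whether that ID already exists and skip the upload:

```python
import base64
import hashlib
from typing import Dict


def content_id(data: bytes, length: int = 12) -> str:
    """Hypothetical scheme: derive a short, URL-safe ID from a file's bytes."""
    digest = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]


# Stand-in for server storage, keyed by content ID.
store: Dict[str, bytes] = {}


def upload(data: bytes) -> str:
    file_id = content_id(data)
    if file_id not in store:  # identical bytes already stored: nothing to do
        store[file_id] = data
    return file_id


first = upload(b"example GIF bytes")
again = upload(b"example GIF bytes")
print(first == again, len(store))  # True 1
```

Twelve base64 characters carry 72 bits, so the length trades URL brevity against collision risk.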


I don't have any links because the uploads weren't completed. I was able to upload the PNG file that I couldn't upload before, but I still can't upload any GIF files. All of the GIF files that I tried to upload got stuck at exactly this point: http://alog.lu/mcrush.png. I tried a smaller GIF file (~2MB), and it looked like it was going to upload successfully, but the upload started all over again and then got stuck at the exact same point. It's really strange. I'm pretty sure it's not due to my connection, because I can use any other website perfectly fine, and I did try to upload different GIFs from different browsers, but I got the same result. Maybe it's an OS-related issue? I'll try again tomorrow morning on Xubuntu to see if the problem persists. Thanks for your interest and your answers on the other issues.


Firefox is being a bitch today, it seems. I'll be looking into it.


It's not only Firefox though, as I mentioned. I also tried it on Chrome (both on normal and private modes).


Fixed.


Yep, it works now. Thanks.


Yeah, I can reproduce it sometimes now, too. Investigating.


Any plans to feature a site search?


Nope, kind of goes against the idea - there's no way for you to "discover" uploads to mediacrush. We don't even allow crawlers to index media pages.
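One common way to keep crawlers from indexing pages is an `X-Robots-Tag: noindex` response header. Unlike a robots.txt `Disallow` rule, it asks search engines to keep the page out of results even if they discover a link to it, although the crawler has to be allowed to fetch the page to see the header. Here is a hypothetical nginx sketch. The location pattern assumes 12-character media IDs and is not MediaCrush's actual routing:

```nginx
# Hypothetical: assumes media view pages live at /<12-character ID>.
# The regex is quoted because it contains curly braces.
location ~ "^/[A-Za-z0-9_-]{12}$" {
    add_header X-Robots-Tag "noindex";
    # ...proxy_pass to the application as usual...
}
```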



