MediaCrush – A website for serving media super fast

graue · on Nov 21, 2013

Hmm... did this site get served a secret warrant last week? Or did they just forget to update their warrant canary?

https://mediacru.sh/transparency/warrant-canary.txt

(Note: the date 08/11 is written European style meaning November 8th, as you can see if you go up a directory.)
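A reader-side check for exactly this situation could look like the sketch below. It reads the canary's date in European DD/MM order and flags it as stale past an update interval. The seven-day threshold and the function names are hypothetical choices for illustration, not anything MediaCrush publishes.

```python
from datetime import date, datetime

def parse_canary_date(day_month: str, year: int) -> date:
    """Parse a European-style DD/MM date, so "08/11" means 8 November."""
    parsed = datetime.strptime(f"{day_month}/{year}", "%d/%m/%Y")
    return parsed.date()

def canary_is_stale(signed_on: date, today: date, max_age_days: int = 7) -> bool:
    """True if the canary is older than the (hypothetical) update interval."""
    return (today - signed_on).days > max_age_days

signed_on = parse_canary_date("08/11", 2013)
print(signed_on)                                       # 2013-11-08
print(canary_is_stale(signed_on, date(2013, 11, 21)))  # True (13 days old)
```

Parsing the same string with `"%m/%d/%Y"` would give 11 August instead, which is why the day/month order has to be fixed before any staleness check means anything.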

jdiez17 · on Nov 21, 2013

    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA1

    This is a reply to a comment posted by graue on 2013/11/21 at 06:00h GMT.

    > Hmm... did this site get served a secret warrant last week? Or did they just forget to update their warrant canary? 
    > https://mediacru.sh/transparency/warrant-canary.txt
    > (Note: the date 08/11 is written European style meaning November 8th, as you can see if you go up a directory.)

    Hi, just wanted to let you know that we haven't, in fact, been served a warrant.

    The failure to update the canary was due to my own mistake, and I'm terribly sorry about that.
    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v1.4.14 (GNU/Linux)

    iQIcBAEBAgAGBQJSjdtLAAoJELJF4ERQRPL8VuMP/j8bqNm/uAMzq1n+ebf90RRq
    cDQUsjCbENoR3/1VF4GR0iQhzxDQ28C2Wcc/rjgPjNkL5fLL9QQNb5hUZ38a+ray
    r3fBE4ZQZ5XSriq9iOGy2RoXKhwM/1QuJ9qaOOYmJwkc+/Re+1WbAtAbKnBoPkOy
    z5xMkSnr7b1jI/sUHHmlU6s5wvchXKKLmniCKjtaLp2WLVv95FoxrzRoNu/gHVv2
    LXnjKTllzfcPm9thvCRoikv/N3PKuDBvCIbGm6yhYsNo8a1croAlnChEf0rDWk1B
    8IFM5SXcsuVOSymHJ18VVp2s7xGi1RRcTpUyDt/s74kUuLx7Wpd27YWf5Yko8O+m
    BWfLXbAUamxwRyCmNN219xnhdAb0paaiddbvQX+PHUMMM2+UwdWSSgWnyloFVhGs
    bqZ/vQO6FSP4CVCZvvxyFm493MWSBTvZ2bpWWgdVdIBAg/qSv+D0I6XGyAhUdCqh
    5j38U7nMaHFROr+lCISXdtMxUvPBzNFxKV+3ZTxm/L3hWU75pT9XWsJOxejiIdFe
    7IMgKpbwsWDUg5Mat5muhn13vBH9B5qfa1smhO1eiP/29XLogLj3B2gZ0nnEIO0q
    1o+j/G5crxxhqW01nGBzJq3IaP3+dsCP9Eiwom3cO0EsulZUL9TRsAPjT5IhXJNz
    uPzRgvYpICnrL2qqyGfP
    =uKWu
    -----END PGP SIGNATURE-----

jqueryin · on Nov 21, 2013

Speaking of warrant canaries, has anybody open-sourced a tool that produces something similar, with PGP and news?

http://www.rsync.net/resources/notices/canary.txt

http://en.wikipedia.org/wiki/Warrant_canary

ddevault · on Nov 21, 2013

Ack! My other half is responsible for the warrant canary. He keeps forgetting it. We may as well not even have it. It doesn't mean much without a signature, but I assure you that we have never been served a warrant.

laureny · on Nov 21, 2013

Nice try, NSA.

ddevault · on Nov 21, 2013

Well, even if we had been served a warrant, I dunno what we'd give them. We don't store anything about our users. https://blog.mediacru.sh/2013/07/19/MediaCrush-for-nerds.htm...

on Nov 21, 2013

[deleted]

ddevault · on Nov 21, 2013

How do I word it?

As of November 11th, 2013, neither MediaCrush nor its admins have ever received any sort of warrant, or any other kind of notice or request, from the government of any country.

Additionally, we do not store anything about a user who visits our site. Here's an example from the HTTP log:

[21/Nov/2013:06:59:36 +0000] "GET /static/favicon.ico HTTP/1.1" 200 16958 "-" 0.000

When you upload a file, your IP address is run through bcrypt (12 rounds) and saved with the file information in redis. Reversing bcrypt is infeasible with modern technology (and probably will be for many years to come). We store nothing else about you.
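The scheme described above (a salted, deliberately slow hash of the uploader's IP stored next to the file record) might be sketched as follows. bcrypt isn't in Python's standard library, so this sketch swaps in PBKDF2-HMAC-SHA256 as the slow hash; bcrypt embeds its salt in its output string, whereas here the salt is stored explicitly. The iteration count, field names, and the dict standing in for redis are all assumptions.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumption: PBKDF2-HMAC-SHA256 stands in for bcrypt as the salted, slow hash.
# The iteration count is illustrative, not a tuned production value.
ITERATIONS = 100_000

def hash_ip(ip: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for an IP; a fresh random salt is drawn by default."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", ip.encode("ascii"), salt, ITERATIONS)
    return salt, digest

def ip_matches(ip: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    _, candidate = hash_ip(ip, salt)
    return hmac.compare_digest(candidate, digest)

# Per-upload record: hypothetical field names, with a plain dict standing in
# for the redis hash that holds the file information.
salt, digest = hash_ip("203.0.113.7")
record = {"ip_salt": salt.hex(), "ip_hash": digest.hex()}
```

Because each upload gets its own random salt, hashing the same IP twice yields different digests, so stored hashes can't be compared against each other or against a precomputed table.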

gunn · on Nov 21, 2013

Is there a reason a complete IP address rainbow-table wouldn't defeat this?

ddevault · on Nov 21, 2013

bcrypt is designed to thwart rainbow-table attacks. It salts the hashes and it takes a while (1/3 of a second on my machine) to compute a single hash.

https://en.wikipedia.org/wiki/Bcrypt

ddevault · on Nov 21, 2013

I'm going to respond to all of you at once by saying this: bcrypt is the best possible solution that we are aware of. It's infeasible for anyone but the most resourceful adversaries to brute force your hashed IP, and even then it's still expensive.

However, that's part of why we're open source. You can't trust us when we say that we aren't storing your IP; we could be doing it and you'd have no way to tell. If you're concerned about this, run a private instance of MediaCrush. There are instructions in the README, and it's pretty easy to set up.

birken · on Nov 21, 2013

I don't know exactly what situation you are trying to avoid, but with the standard bcrypt, if somebody has the IP hash and a candidate's specific IP, they can positively match the two (something you specifically mention on your privacy page).

One possible tweak is to continue using bcrypt and a salt, but shorten the hash output to something like 24 bits. This way it still cannot be easily reversed or rainbow-tabled, and collisions still shouldn't be an active problem. However, it won't be possible to positively match a given IP to a hash, since multiple IPs will likely hash to a given output. Granted, if you have a candidate IP and it matches the output hash, there is a very high probability that it was the source IP, but it wouldn't be 100%.
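The truncation idea can be sketched like this, again with PBKDF2 swapped in for bcrypt because it ships with Python. The fixed example salt and the function name are hypothetical. Keeping only the top 24 bits means that, on average, 2^32 / 2^24 = 256 IPv4 addresses share each stored value.

```python
import hashlib

BITS = 24  # birken's suggested output size

def truncated_ip_hash(ip: str, salt: bytes) -> int:
    """Salted slow hash of an IP (PBKDF2 standing in for bcrypt), top BITS bits only."""
    digest = hashlib.pbkdf2_hmac("sha256", ip.encode("ascii"), salt, 100_000)
    # The digest is 256 bits; shift away everything but the top BITS bits.
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - BITS)

# Average number of IPv4 addresses mapping to each truncated value.
ips_per_bucket = 2**32 // 2**BITS

# Chance that an arbitrary other IP matches a stored value (e.g. on a delete check).
false_match_probability = 1 / 2**BITS
```

The trade-off is visible in the two constants: a match narrows the source down to roughly 256 candidates rather than one, while an unrelated IP still only has about a one-in-16-million chance of passing a check against the stored value.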

wgd · on Nov 21, 2013

At 1/3 of a second to compute a single hash, brute forcing the entire space of possible addresses takes around 45 CPU-years. But computing the hash of every single IP address is ridiculously parallel, so it's trivial to spin up 2k machines on EC2 and brute force the entire thing in a week. Total cost, somewhere under $8k if you don't want to bother owning real machines, less for any organization that happens to need to do similar things on a regular or semi-regular basis.

It's not a trivial investment since that effort only gets you a single IP address, but it's easily within the reach of a vast number of organizations if they have real motivation (read: not a fishing expedition) to reverse it.
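The arithmetic behind these figures can be reproduced as below. It assumes the full 2^32 IPv4 space (an upper bound, since reserved ranges could be skipped), one core per machine, and the 1/3-second bcrypt time quoted earlier; the break-even price is simply the budget divided by the total machine-hours.

```python
SECONDS_PER_HASH = 1 / 3                # ddevault's measured bcrypt time
IPV4_ADDRESSES = 2 ** 32                # full address space; an upper bound
SECONDS_PER_YEAR = 365.25 * 24 * 3600
MACHINES = 2000                         # wgd's EC2 fleet, assumed one core each
BUDGET_USD = 8000                       # wgd's cost ceiling

cpu_seconds = IPV4_ADDRESSES * SECONDS_PER_HASH
cpu_years = cpu_seconds / SECONDS_PER_YEAR
wall_clock_days = cpu_seconds / MACHINES / 86400
machine_hours = cpu_seconds / 3600
# Hourly price per machine at which the whole job fits within the budget.
break_even_price = BUDGET_USD / machine_hours

print(f"{cpu_years:.1f} CPU-years")             # 45.4 CPU-years
print(f"{wall_clock_days:.1f} days")            # 8.3 days
print(f"${break_even_price:.3f}/machine-hour")  # $0.020/machine-hour
```

Because each upload has its own salt, this cost is paid again for every individual hash an attacker wants to reverse.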

bmelton · on Nov 22, 2013

And of course, they wouldn't have to test every IP address in the world, they'd only have to test the IP addresses that appeared in the webserver logs at some point, substantially reducing the time requirement.

ddevault · on Nov 23, 2013

For what it's worth, we don't keep IPs in the http log.

gunn · on Nov 21, 2013

Good answer... It looks like it would take on the order of 7 CPU years to create a table of every used address, much less time to target an area or individual. I don't think this is an issue at all though, I was just curious.

ddevault · on Nov 21, 2013

Actually, the salting is an important detail. Those 7 CPU years would only create a table for one hash. That's how long it takes to brute force a single hash, not the entire space.

vinceguidry · on Nov 21, 2013

I'm pretty sure he was joking.

duiker101 · on Nov 21, 2013

Can't you make it automatic, pulling the news via RSS, and stop it if you get a warrant?

Wilya · on Nov 21, 2013

That defeats the purpose.

I think the whole point of a warrant canary is that you have to do something every week or month to confirm that you have never had to obey a warrant. And (presumably; I don't think it has been tested in court yet) a gag order can prevent you from speaking about a warrant, but it can't force you to do anything, including actively saying that you didn't receive one.

If it's automated, the gag order prevents you from stopping it, so it might as well not be there.

ddevault · on Nov 21, 2013

Nope. If an adversary seizes our servers, we couldn't stop it from falsely reporting that all is well.

atlbeer · on Nov 21, 2013

If you sign the request with a key that isn't present on your servers, then it should be impossible for that to occur.

Unless they seize the computer with the key as well.

duiker101 · on Nov 21, 2013

If an adversary seizes your servers, wouldn't you have bigger worries, like getting them back or complying with the warrant?

mike-cardwell · on Nov 21, 2013

  mike@glue:~$ wget -qO - https://mediacru.sh/transparency/warrant-canary.txt|gpg --verify
  gpg: Signature made Fri 08 Nov 2013 11:48:13 GMT using RSA key ID 5044F2FC
  gpg: BAD signature from "MediaCrush Administrators <admin@mediacru.sh>"
  mike@glue:~$

jdiez17 · on Nov 21, 2013

That's not the signed warrant canary - the PGP signed message lives at https://mediacru.sh/transparency/warrant-canary.signed.txt.

    josemanueldiez@InfiniteImprobabilityDrive:~$ wget -qO - https://mediacru.sh/transparency/warrant-canary.signed.txt|gpg --verify
    gpg: Signature made Fri Nov  8 12:48:13 2013 CET using RSA key ID 5044F2FC
    gpg: Good signature from "MediaCrush Administrators <admin@mediacru.sh>"

markatto · on Nov 21, 2013

Your site says that you "losslessly compress images, video, and audio". I poked around in the code a little and didn't find the compression code, but it doesn't make sense to me that you can do that. There is some optimization you can do on PNGs, but for most media encoded with a lossy method you shouldn't be able to achieve a smaller file size without re-encoding with another lossy method (which by definition loses more information, although it might not significantly decrease the perceived quality).

ddevault · on Nov 21, 2013

Actually, we just took some of that out to reduce processing times for users. We're going to overhaul the backend processing system so that we can process some things asynchronously, and then we'll put all that code back.

However, we do losslessly compress some things. We run PNG files through optipng and JPGs through jhead to strip out EXIF data, but most interestingly, we run GIF files through ffmpeg and serve them up with HTML5 video [1]. We usually get between 500 and 2000% faster for GIFs.

However, I agree that it's a little misleading, since we don't do it for every kind of media, and an ideally compressed file cannot be compressed further. I've been considering rewording it.

[1] https://mediacru.sh/UZHD8_afZDqz
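A per-format dispatch like the one described can be sketched as a function that builds the external commands for an upload without running them. This is a hypothetical pipeline, not MediaCrush's actual code: the function name and the exact arguments are assumptions. `optipng -o5` is the invocation suggested elsewhere in the thread, `jhead -purejpg` strips JPEG metadata sections not needed for rendering, and the ffmpeg calls produce H.264/MP4 and VP8/WebM versions of a GIF.

```python
from pathlib import Path
from typing import List

def processing_commands(path: str) -> List[List[str]]:
    """Build (but don't run) hypothetical optimization commands for an upload."""
    src = Path(path)
    ext = src.suffix.lower()
    if ext == ".png":
        # Lossless PNG recompression.
        return [["optipng", "-o5", str(src)]]
    if ext in (".jpg", ".jpeg"):
        # Remove EXIF and other non-rendering metadata in place.
        return [["jhead", "-purejpg", str(src)]]
    if ext == ".gif":
        stem = src.with_suffix("")
        return [
            # yuv420p H.264 needs even dimensions, so round width/height down.
            ["ffmpeg", "-i", str(src), "-vf", "scale=trunc(iw/2)*2:trunc(ih/2)*2",
             "-c:v", "libx264", "-pix_fmt", "yuv420p", "-movflags", "+faststart",
             f"{stem}.mp4"],
            # VP8 in WebM for browsers without H.264 support.
            ["ffmpeg", "-i", str(src), "-c:v", "libvpx", "-b:v", "1M", f"{stem}.webm"],
        ]
    return []  # other formats are served as uploaded
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. Keeping command construction separate from execution makes the dispatch easy to test without the tools installed.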

Daiz · on Nov 21, 2013

As someone who does media compression pretty much daily, I found your marketing spiel off-putting for the same reasons. While you can optimize JPGs and PNGs losslessly, you really can't do the same for audio and video. You might be able to make them smaller without losing (much) perceived quality (i.e., transparent compression), but you're still doing lossy compression. The same goes for converting GIFs to video: while the biggest loss certainly happens when the original GIF is made, and while you can get a humongous gain in compression efficiency, converting it to VP8 is still lossy.

And speaking of which, saying that you can get "1000-3000% faster for some files" is also pretty dishonest. You can claim those kinds of numbers for only one of the many formats you support (and one where you're not doing lossless compression), but the way you word it makes it sound like you could get them for potentially anything.
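One way to make such claims checkable is to translate them into file-size ratios. The sketch below reads "N% faster" as transfer time and assumes transfer time scales linearly with file size, which ignores latency and decoding. The function names and byte counts are hypothetical.

```python
def percent_faster(original_bytes: int, served_bytes: int) -> float:
    """Speedup in percent, assuming load time is proportional to file size."""
    return (original_bytes / served_bytes - 1) * 100

def size_ratio_for(percent: float) -> float:
    """How many times smaller the served file must be to justify the claim."""
    return 1 + percent / 100

# A 6 MB GIF served as a 1 MB video would be "500% faster";
# "2000% faster" needs a 21x reduction.
print(percent_faster(6_000_000, 1_000_000))  # 500.0
print(size_ratio_for(2000))                  # 21.0
```

Under this reading, a "1000-3000%" figure requires an 11x to 31x size reduction, which GIF-to-video conversion can plausibly reach but already-compressed formats generally cannot.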

All in all, I'd really suggest you rewrite the description to be more honest.

Two other things I noticed: I can't seem to select text on the homepage (on the latest Chrome). Your icon also looks rather similar to that of Miro: http://www.getmiro.com/

ddevault · on Nov 21, 2013

I agree. We originally started to simply deal with GIF compression (MediaCrush was previously known as gifquick) and we have less ground to stand on with respect to spectacular compression with support for more formats. I'll reprioritize the "rewrite the spiel" task thanks to feedback from HN.

As for selecting text on the home page, not much we can do about it. We force your focus into a contenteditable div to allow you to paste images/URLs directly into the page.

Regarding the icon, it just so happens that I had the sneaking suspicion that I had seen it before when the designer presented it to me. So I ran a reverse image search and looked at about a thousand similar icons. I didn't find Miro - if I had, I probably would have asked for it to be different. In any case, I think it's different enough that I'm not worried.

eonil · on Nov 21, 2013

We all know how H.264 is great. You don't need to explain it again.

And you should fix it right now. I already got a super-bad first impression. Currently your product is just marketing junk shit to me. And you have no way to fix my impression, because I won't review your product ever again.

ddevault · on Nov 21, 2013

> We all know how H.264 is great. You don't need to explain it again.

You know that, but our intended audience isn't the HN crowd.

I'm sorry to hear that you got a bad impression of the site, but even without any fancy speed tricks (and there are fancy speed tricks), it's still an open-source, privacy-centric media hosting site that I think is pretty damn great.

eonil · on Nov 21, 2013

It could be great, but that exaggerated text voids everything. That's why I told you to fix it ASAP, before that text makes you lose any more people.

jdiez17 · on Nov 21, 2013

A pull request rewording that text would be very welcome.

girvo · on Nov 21, 2013

"Fix it right now"

That came across to me as super snarky and entitled, despite you having a point. Rewrite it right now.

(See? Now, I actually don't think you meant it like that, but if I was the OP I would've taken that badly. Food for thought!)

eonil · on Nov 21, 2013

OK. I realized this is not a product yet. I have to admit that I missed that. I apologize for my bad attitude.

dalore · on Nov 21, 2013

It needs more information in the README on what it does, not just installation instructions.

I'm guessing, based on the libraries used, that it tries to make static media content smaller, like smaller PNGs, etc. But that's just a guess.

colmvp · on Nov 21, 2013

MediaCrush is awesome. We use it all the time on the NHL subreddit for quick replays of things that happen in hockey games. Because of the media controller, you can pause or quickly toggle to a key point in a play.

dublinben · on Nov 21, 2013

Is there a reason why you've written your own license, instead of using an established free software license like the GPL?

ddevault · on Nov 21, 2013

What? The MIT license is OSI approved.

dublinben · on Nov 21, 2013

You're right, it just wasn't clear you were using the MIT License.

belorn · on Nov 22, 2013

The MIT license is one of those licenses whose text doesn't refer to itself by name.

A bit odd, but the requirements to use the software are understood to be simple.

mankash666 · on Nov 21, 2013

Wonderful work. How do you plan to sustain the site when it gets too big to fund privately? Is this a for-profit venture?

ddevault · on Nov 21, 2013

This isn't for-profit. It's actually not even technically a startup. I'm paying for the servers out of my own pocket. We've got a big pipe and lots of storage, so we'll last a while.

Ideally, we'll eventually be in a position to run ads internally instead of relying on AdSense, and then we can hopefully make a little better money off of that. We also get a surprising number of donations [1] that help keep us going.

By the way, does anyone know anything about marketing? That's one thing we aren't good at. We've gotten very little press coverage. We tell people "if you like it, tell your friends" and that's gotten us this far.

[1] https://mediacru.sh/donate

mankash666 · on Nov 21, 2013

I'm no expert, but here are my two cents:

1. Spread the word manually on HN, 4chan and the like.
2. Add a basic social media presence (Twitter, FB, ...). This will let you communicate with your fans/early adopters, and let them talk to you too.
3. When the audience in #2 is big enough, launch on Kickstarter et al.
4. If you have a successful Kickstarter campaign, the media will cover you, given the big pain points you solve.

What I don't know is how you will sustain it when you get BIG. The "donate" method works well for someone like Wikipedia, BUT your traffic/costs are generally higher, given that most of what you serve is media, as opposed to text for Wikipedia.

ddevault · on Nov 21, 2013

Regarding 2, we have @mediacru_sh [1] and a subreddit [2]. Not many folks follow those, though. Facebook is a problem, however: considering that all the devs are very pro-privacy [3], none of us has an account!

As for Kickstarter, I'm not sure what we'd be raising money for or giving to contributors as a thank you. We are pretty featureful as it stands now, and I'm very against the idea of making people pay for certain features.

[1] https://twitter.com/mediacru_sh

[2] https://pay.reddit.com/r/mediacrush

[3] https://blog.mediacru.sh/2013/07/19/MediaCrush-for-nerds.htm...

nthnclrk · on Nov 21, 2013

Love the idea. Potentially you could fund via a paid tier for business/enterprise use?

Rather than offering more features, you could offer some sort of guarantee on the length of hosting, i.e., pay $299 a year and get x storage for x years? (Choosing suitably generous figures.)

Then update your disclaimer on the about page to be free forever for personal use?

I don't know, but personally, this would be brilliant for the business I work at, but we could never use it for fear of hosting being unavailable if you go out of "business". I actually think you're doing a disservice to many organisations out there for not offering that — with the handy upside of having secured incoming funds.

...just a thought.

ddevault · on Nov 21, 2013

If you're worried about it going away, you can always run your own instance! It is open-source, and the instructions are pretty straightforward if you have a look through the readme. You can always donate [1] to help ensure continued development. Even if it does go under, too, we'll keep the GitHub up and maintain it in our spare time, since we love the project and use it so much for our own needs.

You do have a good point, though. I don't want to force businesses to pay for hosting with us, but I am open to considering other means of monetization. Maybe we could set up and host private instances for people, plus support, for a fee?

[1] https://mediacru.sh/donate

nthnclrk · on Nov 21, 2013

Great points. I'm certainly aware of the open-source nature and the ability to clone and run an instance, but for some organisations that's not really an option. They need a turnkey solution with a predictable cost and the ability to outsource all of the administration rather than bringing another project in-house. Or they don't have the technical staff to support even rudimentary development work (there are TONS of small-to-medium enterprises that fit this bill).

I think your idea of private instances might be something worth exploring. It allows you to stick to your word, and the essence of the product while still finding another way to monetise.

Of course, if none of that seems interesting and you'd rather leave that problem for someone else to address, then that's totally fine too. I only suggest it because you're already looking at the solution from a technical standpoint and have validated the need (based on traffic and use), which indicates there is a market there to be served.

jjoe · on Nov 21, 2013

Can you elaborate on how you're gauging success for your project?

ddevault · on Nov 21, 2013

Well, as of earlier today, there are 8,947 media blobs on the site. Additionally, we recently hit 100k uniques [1] over the lifetime of the site, and a few thousand today alone. We maintain this level of traffic fairly consistently. Of course, we're still getting started. The site is only a few months old.

[1] We publish reports, including analytics, here: https://mediacru.sh/transparency/

EDIT: Original post title: "We open-sourced our successful startup"

snikch · on Nov 21, 2013

I think the operating profit for Oct[1] should have a negative in front of it.

[1] https://mediacru.sh/transparency/2013-11-01-Financial-summar...

ddevault · on Nov 21, 2013

Thanks for pointing that out. I'll fix it up shortly.

jjoe · on Nov 21, 2013

Awesome, thanks for sharing! Cheers

airkraft · on Nov 21, 2013

So you've built a poorly designed website that is nothing but yet another uploader with file compression (using third-party tools) and some falsely claimed features (claiming to care about privacy while using Google Analytics, AdSense, and Disqus; I didn't dig much further). It's an okay idea, but seriously, this doesn't take more than a few days to develop and shouldn't get much attention, unless there are surprising features I'm not aware of.

jdiez17 · on Nov 21, 2013

Disabling Analytics and AdSense is very easy - just enable DNT on your browser. We haven't found any acceptable alternatives, although building our own analytics software[1] is on the roadmap.

[1] https://github.com/MediaCrush/MediaCrush/issues/118

pomfpomfpomf3 · on Nov 21, 2013

Have you considered Piwik?

http://piwik.org/

jdiez17 · on Nov 21, 2013

Yes, but after evaluating it we decided not to use it. Also, we have a strict no-PHP policy.

pomfpomfpomf3 · on Nov 21, 2013

Have you decided to not use it only because it's written in PHP or are there any other reasons?

jdiez17 · on Nov 21, 2013

There are other reasons, but PHP is a pretty big one. If we're going to use anything other than Google Analytics, we want to integrate it pretty deeply with our frontend. We were thinking of making a Flask extension that would capture all non-personal information without having to serve analytics code to the client. Also, as far as I know, Piwik does not support DNT.
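The server-side approach described here (capturing non-personal information without serving analytics code to the client, and honoring DNT) could be sketched as plain WSGI middleware. This is a hypothetical illustration, not the planned Flask extension: it records only the request path and response status, never the IP or any other header, and records nothing when the client sends `DNT: 1` (which WSGI exposes as `HTTP_DNT`).

```python
from typing import Callable, List, Tuple

class DNTAwareAnalytics:
    """Hypothetical WSGI middleware that logs (path, status) pairs only."""

    def __init__(self, app: Callable):
        self.app = app
        self.hits: List[Tuple[str, str]] = []

    def __call__(self, environ, start_response):
        tracked = environ.get("HTTP_DNT") != "1"

        def recording_start_response(status, headers, exc_info=None):
            if tracked:
                # No REMOTE_ADDR, user agent, or cookies are recorded.
                self.hits.append((environ.get("PATH_INFO", "/"), status))
            return start_response(status, headers, exc_info)

        return self.app(environ, recording_start_response)

# Usage with a trivial WSGI app standing in for a Flask application.
def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]

analytics = DNTAwareAnalytics(demo_app)
```

With Flask, the same wrapper would be installed with `app.wsgi_app = DNTAwareAnalytics(app.wsgi_app)`. Aggregating `hits` into page counts is left out here.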

pomfpomfpomf3 · on Nov 21, 2013

Piwik actually does support DNT and also can process server logs.

http://piwik.org/privacy/

http://piwik.org/log-analytics/

talles · on Nov 21, 2013

"you'll be able to host your media losslessly and easily on MediaCrush until the end of time"

"What's the upload limit? A very generous 25 MB!"

For real? Nice.

ddevault · on Nov 21, 2013

For really reals. We're considering bumping it up to 50 MB, too.

vinceguidry · on Nov 21, 2013

You'd need at least 500 MB before anybody would bother.

ddevault · on Nov 21, 2013

We compete with sites like imgur, and their limit is 10 MB.

nacs · on Nov 22, 2013

That upload limit is per file, not per account. Most people's JPGs, GIFs, and PNGs don't approach 50 MB, let alone 500 MB.

vinceguidry · on Nov 22, 2013

Ah, for some reason I was thinking the site was for video and not images. Probably because that's what I think of when I see the term 'media'.

gsharma · on Nov 21, 2013

This is something we can use. I just tried an image and MediaCrush gave me a URL for the hosted image. I compared the size of the original image with the one on MediaCrush, and they were the same (130 KB). Am I missing something, or is there nothing to optimize in the image I used?

Edit: Just realized the original image is optimized by Cloudflare Pro account. I guess they work well.

ddevault · on Nov 21, 2013

We recently disabled some optimizations to reduce processing time while we overhaul the backend processing system to support async processing. Usually, we run PNG files through optipng and JPGs through jhead (the latter isn't very effective).

Response to your edit: heh, I guess it wasn't that easy to optimize in the first place. If you want to see how well we usually treat it, run `optipng -o5 foobar.png`

gsharma · on Nov 21, 2013

Thanks, will check out other images.

ibsathish · on Nov 21, 2013

Awesome initiative. I sincerely hope this triggers an avalanche for more to follow for the sheer betterment of the hacker community worldwide.

Thanks.

ddevault · on Nov 21, 2013

Sure thing. I love open source and we had no reason to make it closed source. We've been open source from day one.

maho · on Nov 21, 2013

I love how fast your website loads! It feels much more responsive than other, comparably minimalistic sites. How did you do it?

ddevault · on Nov 21, 2013

Nothing special, as far as I know. Our nginx config is public [1]. We serve static files directly through nginx and proxy to a gunicorn server for dynamic content. We're a single virtual private server on AWS (soon to be a single dedicated server on Voxility).

[1] https://github.com/MediaCrush/MediaCrush/blob/master/config/...
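Because gunicorn config files are plain Python, the gunicorn half of this setup can be sketched as a `gunicorn.conf.py`. Every value here is an assumption for illustration, not MediaCrush's configuration (their real config is in the repo linked above). The access-log format deliberately omits gunicorn's `%(h)s` remote-address atom, in the spirit of the IP-free log line shown earlier in the thread.

```python
# gunicorn.conf.py: hypothetical settings for an nginx -> gunicorn setup.
import multiprocessing

# Listen on loopback only; nginx serves static files and proxies the rest here.
bind = "127.0.0.1:8000"

# Rule of thumb from the gunicorn docs: (2 x CPU cores) + 1 workers.
workers = multiprocessing.cpu_count() * 2 + 1

# Restart workers that are silent for longer than this many seconds.
timeout = 30

# Log to stdout: timestamp, request line, status, bytes, referer. No client IP.
accesslog = "-"
access_log_format = '%(t)s "%(r)s" %(s)s %(b)s "%(f)s"'
```

nginx would then `proxy_pass` dynamic requests to `127.0.0.1:8000` while serving `/static/` from disk itself.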

toomuchtodo · on Nov 21, 2013

Have you thought about putting Cloudflare in front of your site to reduce bandwidth consumption?

ddevault · on Nov 21, 2013

We have CloudFlare set up, but we only plan to turn it on to mitigate DoS attacks and help out if we run into non-malicious severe load.

toomuchtodo · on Nov 21, 2013

Why not use their CDN to offload? (No snark! As an infrastructure guy, I'm always curious why people decide to do certain things certain ways)

ddevault · on Nov 21, 2013

Well, we make a big deal about user privacy, and I don't want to give a third party access to that kind of information.

toppy · on Nov 21, 2013

Anybody can delete my file by adding 'delete' to public URL of uploaded file, am I right? Please don't do this.

ddevault · on Nov 21, 2013

You are not correct. Only you can delete your uploaded file.

That reminds me of a cool thing [1] one of our users built on top of MediaCrush, though, where others can delete your files.

[1] https://github.com/blha303/SnapCrush

toppy · on Nov 21, 2013

By "only you can delete" you mean "only user with your IP can delete"?

1. Open FF / upload some file, get https://mediacru.sh/<someid>

2. Open Chrome in private mode

3. Go https://mediacru.sh/api/<someid>/delete

4. File is gone!

ddevault · on Nov 21, 2013

You == your IP, in this case. We check the deleter's IP against the bcrypted one we store with the file before allowing them to delete it. There's an open GitHub issue discussing alternative methods [1] if you'd like to read some more about it.

[1] https://github.com/MediaCrush/MediaCrush/issues/311
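The deletion check described above can be sketched as follows. This is a minimal illustration under stated assumptions: PBKDF2 stands in for bcrypt, an in-memory dict stands in for redis, and `record_upload`/`try_delete` are hypothetical names. It also shows why toppy's private-window test succeeded: the check is tied to the IP, not to the browser session.

```python
import hashlib
import hmac
import os
from typing import Dict

ITERATIONS = 100_000
# In-memory stand-in for the redis store: file id -> {salt, ip_hash}.
files: Dict[str, Dict[str, bytes]] = {}

def _ip_digest(ip: str, salt: bytes) -> bytes:
    # PBKDF2-HMAC-SHA256 as a standard-library stand-in for bcrypt.
    return hashlib.pbkdf2_hmac("sha256", ip.encode("ascii"), salt, ITERATIONS)

def record_upload(file_id: str, uploader_ip: str) -> None:
    """Store only a salted hash of the uploader's IP alongside the file."""
    salt = os.urandom(16)
    files[file_id] = {"salt": salt, "ip_hash": _ip_digest(uploader_ip, salt)}

def try_delete(file_id: str, requester_ip: str) -> bool:
    """Delete only if the requester's IP hashes to the stored value."""
    record = files.get(file_id)
    if record is None:
        return False
    candidate = _ip_digest(requester_ip, record["salt"])
    if not hmac.compare_digest(candidate, record["ip_hash"]):
        return False  # different IP: refused
    # Same IP, even from another browser or a private window: allowed.
    del files[file_id]
    return True
```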

toppy · on Nov 21, 2013

It's clear now. Nice job!

paf31 · on Nov 21, 2013

This is really excellent, thank you.

a_c · on Nov 21, 2013

Would love to hear what your next plan is, if you don't mind.

ddevault · on Nov 21, 2013

What's next for MediaCrush? We have a public issue tracker on that same GitHub repo [1], which gives a good idea of our near-term plans.

[1] https://github.com/MediaCrush/MediaCrush/issues

gwu78 · on Nov 21, 2013

jhead.

I haven't heard anything about that program for years.

I always considered it one of those programs designed by and for people who can appreciate the command line.

The world needs more programs like jhead.

jdiez17 · on Nov 21, 2013

I know what you mean - and we do appreciate the command line in MC, but we also understand that it's not for everyone, so we try to make things like jhead accessible to everyone.

BorisMelnik · on Nov 21, 2013

don't see a way to (easily) copy the url to clipboard?

ddevault · on Nov 21, 2013

Right click -> copy? Maybe we should include a copy button.

codygman · on Nov 21, 2013

Perhaps he's referring to one of those little link icons you can click to copy the page's URL.

crorella · on Nov 21, 2013

ddevault · on Nov 21, 2013

No reason not to.

spindritf · on Nov 21, 2013

What happened to the language bar on GitHub?