-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
This is a reply to a comment posted by graue on 2013/11/21 at 06:00h GMT.
> Hmm... did this site get served a secret warrant last week? Or did they just forget to update their warrant canary?
> https://mediacru.sh/transparency/warrant-canary.txt
> (Note: the date 08/11 is written European style meaning November 8th, as you can see if you go up a directory.)
Hi, just wanted to let you know that we haven't, in fact, been served a warrant.
The failure to update the canary was due to my own mistake, and I'm terribly sorry about that.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAEBAgAGBQJSjdtLAAoJELJF4ERQRPL8VuMP/j8bqNm/uAMzq1n+ebf90RRq
cDQUsjCbENoR3/1VF4GR0iQhzxDQ28C2Wcc/rjgPjNkL5fLL9QQNb5hUZ38a+ray
r3fBE4ZQZ5XSriq9iOGy2RoXKhwM/1QuJ9qaOOYmJwkc+/Re+1WbAtAbKnBoPkOy
z5xMkSnr7b1jI/sUHHmlU6s5wvchXKKLmniCKjtaLp2WLVv95FoxrzRoNu/gHVv2
LXnjKTllzfcPm9thvCRoikv/N3PKuDBvCIbGm6yhYsNo8a1croAlnChEf0rDWk1B
8IFM5SXcsuVOSymHJ18VVp2s7xGi1RRcTpUyDt/s74kUuLx7Wpd27YWf5Yko8O+m
BWfLXbAUamxwRyCmNN219xnhdAb0paaiddbvQX+PHUMMM2+UwdWSSgWnyloFVhGs
bqZ/vQO6FSP4CVCZvvxyFm493MWSBTvZ2bpWWgdVdIBAg/qSv+D0I6XGyAhUdCqh
5j38U7nMaHFROr+lCISXdtMxUvPBzNFxKV+3ZTxm/L3hWU75pT9XWsJOxejiIdFe
7IMgKpbwsWDUg5Mat5muhn13vBH9B5qfa1smhO1eiP/29XLogLj3B2gZ0nnEIO0q
1o+j/G5crxxhqW01nGBzJq3IaP3+dsCP9Eiwom3cO0EsulZUL9TRsAPjT5IhXJNz
uPzRgvYpICnrL2qqyGfP
=uKWu
-----END PGP SIGNATURE-----
Ack! My other half is responsible for the warrant canary. He keeps forgetting it. We may as well not even have it. It doesn't mean much without a signature, but I assure you that we have never been served a warrant.
As of November 11th, 2013, neither MediaCrush nor its admins have ever received any sort of warrant, or any other kind of notice or request, from the government of any country.
Additionally, we do not store anything about a user who visits our site. Here's an example from the HTTP log:
When you upload a file, your IP address is run through bcrypt (12 rounds) and saved with the file information in redis. Reversing bcrypt is infeasible with modern technology (and probably for many years to come). We store nothing else about you.
I'm going to respond to all of you at once by saying this: bcrypt is the best possible solution that we are aware of. It's infeasible for anyone but the most resourceful adversaries to brute force your hashed IP, and even then it's still expensive.
However, that's part of why we're open source. You can't trust us when we say that we aren't storing your IP. We could be doing it and you'd have no way of being able to tell. If you're concerned about this, run a private instance of MediaCrush. There are instructions in the README, it's pretty easy to set up.
I don't know exactly what situation you are trying to avoid, but with the standard bcrypt, if somebody has the IP hash and a candidate's specific IP, they can positively match the two (something you specifically mention on your privacy page).
One possible tweak is to continue using bcrypt and a salt, but shorten the hash output to something like 24 bits. This way it still cannot be so easily reversed or rainbow-tabled, and collisions still shouldn't be an active problem. However, it won't be possible to positively match a given IP to a hash, since multiple IPs will likely hash to a given output. Granted, if you have a candidate IP and it matches the output hash, there is a very high probability that it was the source IP, but it wouldn't be 100%.
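A minimal sketch of the truncation idea, for illustration only. bcrypt is not in the Python standard library, so PBKDF2 is swapped in here as a stand-in slow hash; the function name, the iteration count, and the demo salt are all assumptions and not anything from the MediaCrush code.

```python
import hashlib
import ipaddress

TRUNCATED_BITS = 24  # output size suggested in the comment above


def truncated_ip_hash(ip: str, salt: bytes, bits: int = TRUNCATED_BITS) -> int:
    """Slow salted hash of an IPv4 address, keeping only the top `bits` bits."""
    packed = ipaddress.IPv4Address(ip).packed
    # PBKDF2-HMAC-SHA256 is a stand-in for bcrypt (assumption, see lead-in).
    digest = hashlib.pbkdf2_hmac("sha256", packed, salt, 10_000)
    # Drop everything except the most significant `bits` bits of the digest.
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - bits)


# Average number of IPv4 addresses that map to each truncated value.
addresses_per_bucket = 2 ** 32 // 2 ** TRUNCATED_BITS

demo_salt = b"hypothetical-salt"
demo_hash = truncated_ip_hash("203.0.113.7", demo_salt)
print(demo_hash, addresses_per_bucket)
```

With 24 bits of output, each stored value is shared by roughly 256 addresses on average, which is why a match is strong evidence but not proof.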
At 1/3 of a second to compute a single hash, brute forcing the entire space of possible addresses takes around 45 CPU-years. But computing the hash of every single IP address is ridiculously parallel, so it's trivial to spin up 2k machines on EC2 and brute force the entire thing in a week. Total cost, somewhere under $8k if you don't want to bother owning real machines, less for any organization that happens to need to do similar things on a regular or semi-regular basis.
It's not a trivial investment since that effort only gets you a single IP address, but it's easily within the reach of a vast number of organizations if they have real motivation (read: not a fishing expedition) to reverse it.
And of course, they wouldn't have to test every IP address in the world, they'd only have to test the IP addresses that appeared in the webserver logs at some point, substantially reducing the time requirement.
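The arithmetic in the comment above can be checked with a few lines of Python. The 1/3 second per hash and the 2k machines come from the comment; the 100,000-address candidate list is a hypothetical figure chosen only to show how much a shorter list helps.

```python
SECONDS_PER_HASH = 1 / 3               # figure quoted in the comment
IPV4_ADDRESSES = 2 ** 32               # whole IPv4 space, nothing excluded
SECONDS_PER_YEAR = 365.25 * 24 * 3600
MACHINES = 2000                        # "2k machines" from the comment
CANDIDATES_FROM_LOGS = 100_000         # hypothetical size of a log-derived list


def cpu_years(candidates: int) -> float:
    """CPU time, in years, to hash every candidate once."""
    return candidates * SECONDS_PER_HASH / SECONDS_PER_YEAR


def wall_clock_days(candidates: int, machines: int) -> float:
    """Elapsed days when the work is split evenly across `machines`."""
    return candidates * SECONDS_PER_HASH / machines / 86_400


full_space_years = cpu_years(IPV4_ADDRESSES)
full_space_days = wall_clock_days(IPV4_ADDRESSES, MACHINES)
log_list_hours = CANDIDATES_FROM_LOGS * SECONDS_PER_HASH / 3600

print(f"{full_space_years:.1f} CPU-years for the full space")
print(f"{full_space_days:.1f} days on {MACHINES} machines")
print(f"{log_list_hours:.1f} CPU-hours for the candidate list")
```

The full-space numbers land close to the "45 CPU-years" and "a week" quoted above, while a candidate list drawn from logs drops the cost to hours on a single core.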
Good answer... It looks like it would take on the order of 7 CPU years to create a table of every used address, much less time to target an area or individual.
I don't think this is an issue at all though, I was just curious.
Actually, the salting is an important detail. Those 7 CPU years would only create a table for one hash. That's how long it takes to brute force a single hash, not the entire space.
I think the whole point of a warrant canary is that you have to do something, every week/month, to confirm that you never had to obey a warrant. And (presumably; I don't think it has been tried in court yet) a gag order can prevent you from speaking about a warrant, but it can't force you to do anything, including actively saying that you didn't receive anything.
If it's automated, the gag order prevents you from stopping it, so it might as well not be there.
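From the reader's side, a canary only means something if its date is recent. A rough sketch of that check follows; the seven-day window is an assumption (the thread only says "every week/month"), and both function names are made up for illustration.

```python
from datetime import date, timedelta


def parse_canary_date(text: str, year: int) -> date:
    """Parse a day/month date such as "08/11" (European style: 8 November)."""
    day, month = (int(part) for part in text.split("/"))
    return date(year, month, day)


def canary_is_stale(signed_on: date, today: date, max_age: timedelta) -> bool:
    """True when the last manual update is older than the allowed window."""
    return today - signed_on > max_age


signed_on = parse_canary_date("08/11", 2013)   # date shown on the canary
checked_on = date(2013, 11, 21)                # date of the comment that raised it
stale = canary_is_stale(signed_on, checked_on, timedelta(days=7))
print(signed_on.isoformat(), stale)
```

A stale result cannot distinguish a forgotten update from a served warrant, which is the ambiguity this thread started with.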
mike@glue:~$ wget -qO - https://mediacru.sh/transparency/warrant-canary.txt|gpg --verify
gpg: Signature made Fri 08 Nov 2013 11:48:13 GMT using RSA key ID 5044F2FC
gpg: BAD signature from "MediaCrush Administrators <admin@mediacru.sh>"
mike@glue:~$
josemanueldiez@InfiniteImprobabilityDrive:~$ wget -qO - https://mediacru.sh/transparency/warrant-canary.signed.txt|gpg --verify
gpg: Signature made Fri Nov 8 12:48:13 2013 CET using RSA key ID 5044F2FC
gpg: Good signature from "MediaCrush Administrators <admin@mediacru.sh>"
Your site says that you "losslessly compress images, video, and audio" - I poked around in the code a little and didn't find the compression stuff, but it doesn't make sense to me that you can do that. There is some optimization that you can do on PNGs, but for most media encoded with a lossy method you shouldn't be able to achieve a smaller filesize without re-encoding with another lossy method (and by definition losing more information, although it might not significantly decrease the perceived quality).
Actually, we just took some of that out to reduce processing times for users. We're going to overhaul the backend processing system so that we can process some things asynchronously, and then we'll put all that code back.
However, we do losslessly compress some things. We run PNG files through optipng and JPGs through jhead to strip out EXIF data, but most interestingly, we run GIF files through ffmpeg and serve them up with HTML5 video [1]. We usually get between 500 and 2000% faster for GIFs.
However, I agree that it's a little misleading, since we don't do it for every kind of media, and an ideally compressed file cannot be compressed further. I've been considering rewording it.
As someone who does media compression pretty much daily, your marketing spiel really came across badly to me for the same reasons. While you can optimize JPGs and PNGs in a lossless fashion, you really can't do the same for audio and video. You might be able to make them smaller without losing (much) perceived quality (aka do transparent compression), but you're still doing lossy compression. Same with converting GIFs to videos - while the biggest loss certainly happens in the making of the original GIF and while you can get a humongous increase in compression quality, converting it to VP8 is still lossy.
And speaking of which, saying that you can get "1000-3000% faster for some files" is also pretty dishonest. You can pretty much claim those kinds of numbers with only one of the many formats you support (and one where you're not doing lossless compression), but the way you word it makes it sound like you could get it for potentially anything.
All in all, I'd really suggest you rewrite the description to be more honest.
Two other things I noticed: I can't seem to select text on the homepage (on latest Chrome). Your icon also looks rather similar to that of Miro: http://www.getmiro.com/
I agree. We originally started to simply deal with GIF compression (MediaCrush was previously known as gifquick) and we have less ground to stand on with respect to spectacular compression with support for more formats. I'll reprioritize the "rewrite the spiel" task thanks to feedback from HN.
As for selecting text on the home page, not much we can do about it. We force your focus into a contenteditable div to allow you to paste images/URLs directly into the page.
Regarding the icon, it just so happens that I had the sneaking suspicion that I had seen it before when the designer presented it to me. So I ran a reverse image search and looked at about a thousand similar icons. I didn't find Miro - if I had, I probably would have asked for it to be different. In any case, I think it's different enough that I'm not worried.
We all know how H.264 is great. You don't need to explain it again.
And you should fix it right now. I already got a super-bad first impression. Currently your product is just marketing junk shit to me. And you have no way to fix my impression because I won't review your product ever again.
> We all know how H.264 is great. You don't need to explain it again.
You know that, but our intended audience isn't the HN crowd.
I'm sorry to hear that you got a bad impression of the site, but even without any fancy speed tricks (and there are fancy speed tricks), it's still an open-source, privacy-centric media hosting site that I think is pretty damn great.
It could be great. And that exaggerated text voids everything. That's why I told you to fix it ASAP, before that text makes you lose any more people.
Needs to put more information in the README on what it does, not just installation.
I'm guessing based on the libraries used that it tries to make the sizes of static media content smaller, like smaller png's etc. But that's just a guess.
MediaCrush is awesome. We use it all the time on the NHL subreddit for quick replays of things that happen in hockey games. Because of the media controller, you can pause or quickly toggle to a key point in a play.
This isn't for-profit. It's actually not even technically a startup. I'm paying for the servers out of my own pocket. We've got a big pipe and lots of storage, so we'll last a while.
Ideally, we'll eventually be in a position to run ads internally instead of relying on adsense, and then we can hopefully make a little better money off of that. We also get a surprising number of donations [1] that help keep us going.
By the way, does anyone know anything about marketing? That's one thing we aren't good at. We've gotten very little press coverage. We tell people "if you like it, tell your friends" and that's gotten us this far.
I'm no expert, but here are my two cents:
1> Spread the word manually on HN, 4chan and the like
2> Add basic social media presence (twitter, FB ..). This will let you communicate with your fans/early adopters, and let them talk to you too.
3> When the audience in #2 is big enough, launch on kickstarter et al.
4> IF you have a successful kickstarter campaign, the media will cover you, given the big pain points you solve
What I do not know is how you will sustain when you get BIG. The "donate" method works well for someone like Wikipedia, BUT your traffic/costs are generally higher given that most of it pertains to media, as against text for wikipedia
Regarding 2, we have @mediacru_sh [1] and a subreddit [2]. Not many folk follow that, though. We have a problem with Facebook, though. Considering that all the devs are very pro-privacy [3], none of us have one!
As for Kickstarter, I'm not sure what we'd be raising money for or giving to contributors as a thank you. We are pretty featureful as it stands now, and I'm very against the idea of making people pay for certain features.
Love the idea. Potentially you could fund via a paid tier for business/enterprise use?
Rather than offering more features, you could have some sort of guarantee of length of hosting ie. Pay $299 a year and get x storage for x years? (Choosing suitably generous figures)
Then update your disclaimer on the about page to be free forever for personal use?
I don't know, but personally, this would be brilliant for the business I work at, but we could never use it for fear of hosting being unavailable if you go out of "business". I actually think you're doing a disservice to many organisations out there for not offering that — with the handy upside of having secured incoming funds.
If you're worried about it going away, you can always run your own instance! It is open-source, and the instructions are pretty straightforward if you have a look through the readme. You can always donate [1] to help ensure continued development. Even if it does go under, too, we'll keep the GitHub up and maintain it in our spare time, since we love the project and use it so much for our own needs.
You do have a good point, though. I don't want to force businesses to pay for hosting with us, but I am open to considering other means of monetization. Maybe we could set up and host private instances for people, plus support, for a fee?
Great points. Certainly aware of the open source nature, and ability to clone and run an instance, but for some organisations that's not really an option — they need a turnkey solution that has a somewhat inelastic dollar value and the ability to outsource all of the administration rather than bringing another project in-house. That or they don't even have the technical human capital to support even rudimentary development work (there are TONS of Small to Medium Enterprises that would fit this bill).
I think your idea of private instances might be something worth exploring. It allows you to stick to your word, and the essence of the product while still finding another way to monetise.
Of course, if none of that seems interesting and you'd rather leave that problem for someone else to address, then that's totally fine too. I only suggest it to you as you're already looking at the solution from a technical standpoint, and validated the need (based on traffic and use), indicating there is a market there to be serviced.
Well, as of earlier today, there are 8,947 media blobs on the site. Additionally, we recently hit 100k uniques [1] over the lifetime of the site, and a few thousand today alone. We maintain this level of traffic fairly consistently. Of course, we're still getting started. The site is only a few months old.
So you have done a poorly designed website which is nothing but yet another uploader with file compression (using third party tools) and some other falsely claimed features (caring about privacy while using Google Analytics, AdSense, Disqus and did not dig much more). This is an okay idea, but being serious here, this does not take more than a few days to develop and should not have much attention, except if there are surprising features I am not aware of.
Disabling Analytics and AdSense is very easy - just enable DNT on your browser. We haven't found any acceptable alternatives, although building our own analytics software[1] is on the roadmap.
There are other reasons, but PHP is a pretty big one. If we're going to use anything other than Google Analytics, we want to integrate it pretty deeply with our frontend. We were thinking of making a Flask extension that would capture all non-personal information without having to serve analytics code to the client. Also, as far as I know, Piwik does not support DNT.
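As a rough illustration of the server-side idea, the sketch below decides from request headers alone whether to record anything, with no analytics code sent to the client. It is plain Python and not a real Flask extension; the function names and the recorded fields are hypothetical. The only external fact relied on is that browsers with Do Not Track enabled send the header `DNT: 1`.

```python
from typing import Mapping, Optional


def tracking_allowed(headers: Mapping[str, str]) -> bool:
    """Return False when the request carries a Do Not Track header set to "1"."""
    # Header names are case-insensitive, so normalize before looking up DNT.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") != "1"


def analytics_record(method: str, path: str,
                     headers: Mapping[str, str]) -> Optional[dict]:
    """Build a record with no IP or cookie data, or None if DNT is set."""
    if not tracking_allowed(headers):
        return None
    return {"method": method, "path": path}


opted_out = analytics_record("GET", "/about", {"DNT": "1"})
counted = analytics_record("GET", "/about", {"Accept": "text/html"})
print(opted_out, counted)
```

In a Flask app this kind of check would presumably hang off a request hook, but that wiring is left out here since the thread does not describe it.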
This is something we can use. I just tried an image and MediaCrush gave me a URL of the hosted image. I compared the size of the original image vs the one on MediaCrush. They were both the same (130 KB). Am I missing something? Or is there nothing to optimize in the image that I used?
Edit: Just realized the original image is optimized by Cloudflare Pro account. I guess they work well.
We recently disabled some optimizations to reduce processing time while we overhaul the backend processing system to support async processing. Usually, we run PNG files through optipng and JPGs through jhead (the latter isn't very effective).
Response to your edit: heh, I guess it wasn't that easy to optimize in the first place. If you want to see how well we usually treat it, run `optipng -o5 foobar.png`
Nothing special, as far as I know. Our nginx config is public [1]. We serve static files directly through nginx and proxy to a gunicorn server for dynamic content. We're a single virtual private server on AWS (soon to be a single dedicated server on Voxility).
You == your IP, in this case. We check the deleter's IP against the bcrypted one we store with the file before allowing them to delete it. There's an open GitHub issue discussing alternative methods [1] if you'd like to read some more about it.
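A sketch of what such a check might look like, under loud assumptions: PBKDF2 stands in for bcrypt (which is not in the standard library), the record is a plain dict and not a redis entry, and every name below is hypothetical. It also shows the salting point made earlier in the thread: the same IP stored with two different salts yields two unrelated hashes.

```python
import hashlib
import hmac
import os

ITERATIONS = 10_000  # arbitrary work factor for the stand-in hash


def hash_ip(ip: str, salt: bytes) -> bytes:
    """Slow salted hash of an IP string (PBKDF2 used in place of bcrypt)."""
    return hashlib.pbkdf2_hmac("sha256", ip.encode("ascii"), salt, ITERATIONS)


def new_upload_record(uploader_ip: str) -> dict:
    """Store only a fresh random salt and the salted hash, never the raw IP."""
    salt = os.urandom(16)
    return {"salt": salt, "ip_hash": hash_ip(uploader_ip, salt)}


def may_delete(record: dict, requester_ip: str) -> bool:
    """Re-hash the requester's IP with the stored salt and compare."""
    candidate = hash_ip(requester_ip, record["salt"])
    # compare_digest avoids leaking how many leading bytes matched.
    return hmac.compare_digest(candidate, record["ip_hash"])


record = new_upload_record("198.51.100.20")
same_ip_other_file = new_upload_record("198.51.100.20")
print(may_delete(record, "198.51.100.20"), may_delete(record, "198.51.100.21"))
```

One consequence of keying deletion to the IP is that an uploader whose address changes can no longer delete their own file, which may be part of what the linked issue discusses.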
I know what you mean - and we do appreciate the command line in MC, but we also understand that it's not for everyone, so we try to make things like jhead accessible to everyone.