Date: 2017-01-05
Every year we sync all our family photos to have redundant backups. When she went to get the three months of backups from Flickr she got "download error" after error. She sent me this link and hypothesized that the bulk-download feature is no longer working because of the need to now first decompress the files before transmitting them.
Luckily, she was able to get the 25 gigs of family photos down using a third-party application, but it's another reminder to never wholly trust the "cloud."
The concept makes sense but I'd never thought of that before.
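Arithmetic coding can effectively spend a fractional number of bits per symbol, so it tracks the probability distribution more closely than Huffman coding, which needs a whole number of bits per symbol. It's usually at least 10% more efficient than Huffman coding.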
It's other parts of JPEG compression which lead to quality loss, and those parts aren't recompressed.
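On the Huffman vs. arithmetic coding point, here is a rough sketch of where the gain comes from. It compares the average Huffman code length with the Shannon entropy (the bound an arithmetic coder approaches) for a made-up, heavily skewed two-symbol distribution. The distribution is purely hypothetical and much more extreme than real JPEG coefficient statistics, so the gap shown is far larger than what you'd see in practice.

```python
import heapq
import math


def entropy_bits(probs):
    """Shannon entropy in bits per symbol: the bound an arithmetic coder approaches."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def huffman_avg_bits(probs):
    """Average code length (bits per symbol) of a Huffman code for probs."""
    if len(probs) == 1:
        return 1.0
    # Each merge adds one bit to every symbol under the merged node, so the
    # expected code length is the sum of all merged probabilities.
    heap = [(p, i) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    tie_breaker = len(probs)
    total = 0.0
    while len(heap) > 1:
        p1, _ = heapq.heappop(heap)
        p2, _ = heapq.heappop(heap)
        total += p1 + p2
        heapq.heappush(heap, (p1 + p2, tie_breaker))
        tie_breaker += 1
    return total


skewed = [0.9, 0.1]  # hypothetical distribution, not measured from any JPEG
huff = huffman_avg_bits(skewed)
ent = entropy_bits(skewed)
print(f"Huffman: {huff:.3f} bits/symbol, entropy bound: {ent:.3f} bits/symbol")
```

Huffman can't go below one bit per symbol here, while the entropy is under half a bit. For a distribution whose probabilities are all powers of 1/2 the two numbers coincide, which is why the saving depends so much on the data.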
What they're saying is that you can take a JPEG compressed image, decompress to raw pixels, and then recompress with JPEG more efficiently (if you're careful, no specifics on how this is done), and you save space.
That's why they mention that they're doing it very carefully, because you've got to make sure that when you decompress the new optimized image that it pixel for pixel matches the original decompressed image.
As far as I can tell, 'raw pixels' is not quite accurate. Both PackJPG and Lepton seem to decompress as far as frequency-domain coefficients, but no further. That means they don't repeat the lossy stages in JPEG encoding - the transformation of pixel data to the frequency domain, and the discarding of high frequency information. It looks like PackJPG and Lepton do some interesting tricks with the coefficients to essentially make them more amenable to the following lossless compression. Both PackJPG and Lepton have their own file format, so they're definitely not outputting standard JPEG images.
Here's the Lepton page:
https://blogs.dropbox.com/tech/2016/07/lepton-image-compress...
And for more detail, here's the paper that PackJPG is based on:
http://www.elektronik.htw-aalen.de/packjpg/_notes/PCS2007_PJ...
Besides removing information from the file that doesn't affect the rendered image (like EXIF data), lossless recompressors typically replace the Huffman coding of DCT coefficients with a more efficient arithmetic coder. So you don't start over from raw pixels, but you replace the type of compression used with a more modern and efficient algorithm. That means ordinary software can't read the recompressed file (since you've essentially created a new format), but you can just decompress into standard JPEG whenever someone wants to look at the image.
You can do this if the goal is pixel perfect accuracy, but Flickr can’t do this since they have “a long-standing commitment to keeping uploaded images byte-for-byte intact”…
Lepton (one of the examples they mentioned) losslessly compresses a JPEG to Lepton format, then losslessly decompresses it back to JPEG. The pixels are never decompressed and in fact a JPEG decompressed from Lepton is bit-exact the same as the original. I've tested and verified this on several million images.
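For anyone wanting to run the same kind of bit-exact check, a minimal sketch follows. It assumes you can wrap your recompressor as a pair of compress/decompress callables; `zlib` is used here only as a stand-in (it is not Lepton and is not how Lepton is invoked), and the input bytes are fake rather than a real JPEG.

```python
import hashlib
import zlib


def sha256_hex(data: bytes) -> str:
    """Hex digest used to compare the original and restored bytes."""
    return hashlib.sha256(data).hexdigest()


def roundtrip_is_byte_exact(original: bytes, compress, decompress) -> bool:
    """True if decompress(compress(original)) reproduces the original bytes exactly."""
    restored = decompress(compress(original))
    return sha256_hex(restored) == sha256_hex(original)


# Stand-in bytes; a real check would read actual JPEG files from disk.
fake_jpeg = b"\xff\xd8" + bytes(range(256)) * 64 + b"\xff\xd9"

# zlib is a placeholder codec so the sketch runs without extra tools.
ok = roundtrip_is_byte_exact(fake_jpeg, zlib.compress, zlib.decompress)
print(ok)
```

Comparing hashes of the whole file, rather than decoded pixels, is what distinguishes "byte-for-byte intact" from merely "pixel-identical".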
I suspect that Flickr's past the explosive growth stage so an optimization like this is timely.
When you have a database which keeps redundant master slave replicas and mutation logs, a backup system which keeps many previous backups on-site and offshore on tapes, a storage system that has many RAID mirrors per host, a distributed filesystem which stores replicated chunks of data across an entire datacenter to handle host failure, and you keep copies of the entire lot in multiple datacenters for accessibility during planned downtime and natural disasters.
Make it hard to navigate by hiding everything behind hashes, to prevent fair use downloads. Keep tags in beta for 15+ years.
Of course, when usage goes down, that helps with the problem quite a bit. A poor experience, even for viewing content, lessens engagement and leads to lower usage and fewer uploads.
Sadly, I'm afraid a much more extreme data storage reduction approach awaits faithful users of Flickr.
When Yahoo! bought a large photo blogging site in Taiwan, it simply shut it down with about six months notice, deleting everything as it did.
Do you have an example of this? Seems like a bad way to run search.
> instead let them flood the results because they used keyword tag spam
Tag spam largely doesn't work but it's not impossible. Flagging these results makes them go away and brings them to our attention.
> Make it hard to navigate by hiding everything behind hashes, to prevent fair use downloads.
Huh?
> Keep tags in beta for 15+ years.
Double-plus huh? Flickr is only 13ish years old. It's not ready to get a drivers license yet.
> Sadly, I'm afraid a much more extreme data storage reduction approach awaits faithful users of Flickr.
Nope! Unless you know something I don't?
> When Yahoo! bought a large photo blogging site in Taiwan, it simply shut it down with about six months notice, deleting everything as it did.
That sucks. Which one?
I have a flickr pro account from 6 or so years ago with hundreds of photos on it. I've tried over 10 times over a year to contact their support and get turned over to Tech Support in India that won't even read into your case!
Of course, the original email address I used for my flickr was deleted, so none of the avenues on Yahoo Help (which is where they redirect you) work. Not to mention the password may be reset after all the leaks Yahoo had.
So when I see these people on @FlickrHelp on Twitter (No replies) and Flickr having office parties, it really makes me feel quite disappointed! Yeah sure, real human touch! Former paying customer who just wants to login his account with tons of priceless photos. And they have a thread of like thousands of people who can't get into their accounts [1]
At least the employees are having fun with data compression. Sad I can't talk to an actual human to get access to my account!
[1] https://www.flickr.com/help/forum/en-us/72157668446997150/
Agreed, their support is terrible, and they don't seem to distinguish between free or paying users (and in terms of account access issues, at least, there isn't any dedicated Flickr support, just horrible Yahoo support).
I had access to the email address registered with my account, so at least that wasn't a drama. But still, took about 3 months of assertively-worded support emails before they got me back in and I could upload Flickr photos again.
I think Flickr is a great service, and I've been faithful to it for many years. But that episode jaded my opinion significantly. And with Yahoo now going under, I'm not expecting much re: the future of Flickr. I'm considering moving my primary online photo storage to S3, Github Pages, or some other alternative.
Also I have found an issue where they replaced the Flash based profile avatar chooser by HTML5 but forgot to test on computers without Flash installed so it was still not shown. Apparently nobody noticed for months. To their credit, I spoke to one of their engineers and it was solved very quickly.
Fortunately I can log in if I remember exactly which email I used for my Yahoo! account. My browser fortunately remembers, and the credentials in Lightroom still allow me to put stuff up there.
As noted here - http://stackoverflow.com/questions/30013032/prevent-user-to-... you can get it using your browser's web inspector dev tool.
I am currently managing a Postgres cluster with a petabyte of data in it. We found ZFS to be a great way to reduce overall storage costs. We just switched our machines to machines running ZFS, and we were suddenly using 1/3rd the amount of disk space. Although it took us a while to learn all of the gotchas of ZFS, it wound up saving us a huge amount of $$$.
(As I understand it, ZFS would not have helped in Flickr's case. Since JPEGs are already compressed, ZFS would not have provided any benefit. Flickr was able to save storage by using an ad hoc compression algorithm.)
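A quick way to see why filesystem-level compression does little for JPEGs: general-purpose compressors gain almost nothing on data that already looks random. The sketch below uses `zlib` as a stand-in for whatever compressor the filesystem is configured with, and random bytes as a stand-in for entropy-coded JPEG data; both are assumptions for illustration, not a measurement of ZFS itself.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(data, 9)) / len(data)


# Repetitive, database-dump-like text compresses very well.
text_like = b"INSERT INTO photos VALUES (1, 'sunset.jpg');\n" * 2000
# Random bytes as a stand-in for already-compressed image data.
jpeg_like = os.urandom(len(text_like))

text_ratio = compression_ratio(text_like)
jpeg_ratio = compression_ratio(jpeg_like)
print(f"repetitive text: {text_ratio:.3f}, high-entropy bytes: {jpeg_ratio:.3f}")
```

The high-entropy input comes out at roughly its original size, which is why a format-aware recompressor is needed to squeeze JPEGs further.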
Did using ZFS incur a noticeable processing cost? Any noticeable increase in latency?
I investigated ZFS for my home server and it was recommended to have 1GB of RAM for every TB of storage, and much more if you enabled deduplication/compression (I forget which).
I would never use deduplication on ZFS, it is very slow even if you have enough RAM. And in most cases, savings are less than 10%.
How exactly is the Flickr "look" defined?
Looks like someone's hoping to get hired
[1]: https://yahooeng.tumblr.com/post/116391291701/yahoo-cloud-ob...
- They say cost is 0.03/GB? Doesn't backblaze b2 charge 0.005/GB? Why isn't B2 a real option?
For example, if a piece of client side malware, for every Vimeo video a user views, loads in the background a random rarely viewed video, it would probably lead to days of downtime.
The image would have been clearer if it used more than purple and what looks like the most minor variation on light blue.
>increasing camera resolution, burst mode and the addition of short animations (Live Photos) have increased bytes-per-image rapidly enough
>Users only rarely delete or change images once uploaded.
I'm very curious, how much of all this tr.. sorry, sweet memories are never ever viewed after, say one week from upload.
Store images on tape? What about degradation of the tape over time? Certainly seems to be a factor compared to hard drive degradation.
With tapes, even though they might last longer, you typically wouldn't scan and check the data as regularly. That means, if it does go unavailable, there is a larger window of time for other replicas to fail in before re-replication completes.
After a certain point, datacenter growth (both physically and logically) gets so brutal that you need to consider running things more efficiently.
I'd say if they reduced their datacenter annual cost by at least $5m it was money well spent. From the sound of it they actually saved a lot more than that. And by increasing their overall storage efficiency it's a feature that pays dividends even when they do ultimately add more storage in the future.
Storing original uncompressed images eats into the 15GB free storage budget, and uses paid storage upgrades after that.
Unless you have a Pixel phone (free unlimited original uncompressed image storage).
[0] https://support.google.com/photos/answer/6220791
YouTube does that, so that when a new video compression format comes along, they can transcode into it.
https://www.minio.io/
Erasure coding isn't something new, unknown or unique to Minio, it's built into Ceph, GlusterFS and Openstack Swift, the largest distributed object stores.
You may be interested in Tahoe-LAFS though (https://tahoe-lafs.org/trac/tahoe-lafs). It has many good things in it, one of them is that all files get erasure-encoded so that k nodes out of n are needed to restore the file. When you set a node to be a storage provider (such as S3, GCS, ...), then you effectively have erasure encoding over providers: If S3 is down, you can still retrieve your data from the rest of the providers.
Some people have actually tried to do it, as described in the first section in (https://tahoe-lafs.org/trac/tahoe-lafs/wiki/TipsTricks).
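To make the "k nodes out of n" idea concrete, here is a toy 2-of-3 scheme using XOR parity: the data is split into two halves plus a parity share, and any two of the three shares rebuild the original. This is only an illustration of the principle; real systems use more general codes that support arbitrary k and n, and the function names below are made up for this sketch.

```python
from typing import List, Optional


def encode_2_of_3(data: bytes) -> List[bytes]:
    """Split data into two halves plus an XOR parity share."""
    if len(data) % 2:
        data += b"\x00"  # pad to even length; the caller keeps the true length
    half = len(data) // 2
    a, b = data[:half], data[half:]
    parity = bytes(x ^ y for x, y in zip(a, b))
    return [a, b, parity]


def decode_2_of_3(shares: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the data from any two shares; a lost share is passed as None."""
    a, b, parity = shares
    if a is None:
        a = bytes(x ^ y for x, y in zip(b, parity))
    elif b is None:
        b = bytes(x ^ y for x, y in zip(a, parity))
    return (a + b)[:length]


message = b"family photos"
shares = encode_2_of_3(message)

# Simulate each of the three "providers" being down in turn.
recovered = []
for lost in range(3):
    damaged = list(shares)
    damaged[lost] = None
    recovered.append(decode_2_of_3(damaged, len(message)))
print(recovered)
```

The storage overhead here is 1.5x, compared with 2x for keeping two full copies that also survive one loss.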
Plus, they mine the shit out of that data. Even if they're not earning ad revenue on it, they're tracking location, usage statistics, I wouldn't be surprised if their user agreement includes the ability to have AI "watch" the video and try to mine data about what's happening in the video the same way they mine your email to build smarter and more effective consumer profiles about you. So even with nobody watching your video, I wouldn't put it past the "G-men" to find a way to eke some profit off it.
Our automated systems analyze your content (including emails) to provide you personally relevant product features, such as customized search results, tailored advertising, and spam and malware detection. This analysis occurs as the content is sent, received, and when it is stored. [1]
[1] https://www.google.com/policies/terms/
We may have petabytes per rack, but 4K videos are coming.
wiki.c2.com/?ParkinsonsLaw
They aren't the norm, but I imagine YouTube sees a fair number of 4K uploads.
And the trend is not improving much (or at all). By comparing video uploads to YouTube from last two years, the trend is roughly the same: only 1.1% of videos are in 4k [620M videos, 7M in 4k for 2016; 322M videos, 3.6M in 4k].
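Checking the quoted figures: the snippet below just divides the numbers given in the comment (in millions of videos). The second pair has no year attached in the comment, so it is labeled generically here.

```python
# (total uploads, 4K uploads) in millions, as quoted in the comment above
samples = {
    "2016 figures": (620, 7),
    "second pair": (322, 3.6),
}

# Percentage of uploads that are 4K for each pair
shares_4k = {
    label: four_k / total * 100
    for label, (total, four_k) in samples.items()
}

for label, pct in shares_4k.items():
    print(f"{label}: {pct:.2f}% in 4K")
```

Both pairs come out at a little over 1.1%, which matches the claim that the share is flat.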
Yes, but screens increase in resolution very slowly, and you're physically limited in how much resolution you can get on a small sensor.
And video compression is improving quite impressively.
So in the end, storage is growing a lot faster than the bitrate of videos.
I'm sure there will be plenty of demand, it comes through changes in the way we use things.
Is that true? I've been pricing hard drives recently, and they seem to have stagnated for the last few years, staying at about the same price, with the best value per GB at around 4TB.
I thought h265 took about half the space of h264 for the same quality, but a 4k screen is actually 4x the pixels of 1080p...
So I'm guessing 4k h265 would be about double the bitrate of 1080p h264.
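The back-of-envelope estimate above, spelled out. It assumes bitrate scales linearly with pixel count and takes the "h265 needs about half the bits" figure from the comment at face value; both are simplifications.

```python
pixels_1080p = 1920 * 1080
pixels_4k = 3840 * 2160
pixel_ratio = pixels_4k / pixels_1080p  # 4K UHD has four times the pixels

# Assumption from the comment: h265 needs about half the bits of h264
h265_vs_h264 = 0.5

# Assumption: bitrate grows in proportion to pixel count
relative_bitrate = pixel_ratio * h265_vs_h264
print(f"4K h265 is about {relative_bitrate:.1f}x the bitrate of 1080p h264")
```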
More importantly, don't just look at a year or two. 1080 availability is significantly more than a decade old. In fact, you should probably be comparing against 1080 mpeg-2. That gets you nearly the same bitrate between then and now. In that time hard drives have gone from ~1GB per dollar to 35GB per dollar.
With 10TB drives that could be 6PB or more in a single rack.
For original photo storage, which is primarily write-once with rare (if ever) deletes and infrequent reads, you could potentially get by with SMR drives, further reducing the cost.
I do believe that YouTube is not hyper excited about these videos, but they do have some optimization techniques in place. For instance, they don't automatically generate all possible video profiles until they are requested. By default they produce around 6, while very popular videos may have up to 40.
We detached this subthread from https://news.ycombinator.com/item?id=13386349 and marked it off-topic.
Of course I'm mad too when I have to deal with a clueless call center rep, especially when they don't move you up to higher tier support even if they have no idea how to handle the issue. But as long as they do that, I'm fine with the concept.
How do you know they are in India? Accent? Asking because India offshore support peeps used by Dell, Walmart etc are all given white Christian names - like Mary, John, Adam etc - and also undergo 3 months of rigorous 'American Accent' training. I know because 2 of my Indian cousins (I am American-Indian, born and raised here) work at such call centers in Mumbai and Chennai respectively.
So it's quite difficult to discern that they are Indian cos the Companies that hire them spend millions of $ trying to disguise their voice and tone to make them sound like they are local / American.
The American/English sounding names they introduce themselves with are all the more jarring because of that. I've got no problem talking to Prakash or Dharmesh (or whoever from wherever), but comically-obviously-not-Steve instantly gets me annoyed (it doesn't help that so far, 100% of these occurrences have been spam calls).
Where did I say anything about an accent? Merely pointed how that "Steve" would have looked.
edit: I should mention they use "American" accents for Australian customers too.
Yeah, like that would really help with 20+ years of speaking an accent.
3 months of training is obviously not close to sufficient to mask that accent...
When I get them on a tech support call it just makes me automatically angry because I know how much more work I'll have to do just to communicate.
If I can get to someone in a US support center, I've just found that they tend to be better enabled to actually address my problem--even if they have a southern accent :-)
I'm in Australia and I've noticed more of them being Filipino which I find just slightly less difficult to understand than Indian, but still difficult and irritating.
But I agree, it's such a relief if you win the lottery and get a native English speaker, things always seem to move along so much quicker.
Or is it, rather, that Sam and YC {and venture capital in general}, by proxy through you, are annoyed at decreasing access to low-wage workers from a developing country notorious for unintelligible accents?
From a moderation point of view, it's not a question of disagreeing. We can agree with someone and still ask them to stop. In fact that's common. In this case it was about asking people not to go on offtopically about peeves, because that's not interesting, and also about HN being an international community. Respect across national divides is important here.
All I said was that I had trouble understanding heavily accented English. I never said I don't like the accent. It's irrelevant if I like it or not. It's that I can't understand. Being able to understand is IMO a basic requirement for effective telephone support.
I have no problem with the people themselves. I actually don't care what their nationality is or where they are physically located. But they need to be able to do their job. The system is broken if I can't understand the person who is trying to support me over the phone.
The only reason I'm writing this is in case some future CEO is reading these comments, hopefully they won't be blinded by the cost savings and fall down the same trap as so many companies who roll out such ineffective and irritating phone "support".
And it's a matter of far more than just tone use and accent, though, IME, plenty of big companies are not succeeding in doing much about those two elements in Indian call-center staff serving US customers, even if they are spending lots of money on it -- and I'm not talking about disguising the fact they are Indian, but even getting them to the point where their accent and tone use isn't an impediment to communication with Americans.
But then, plenty of companies with US-based call center staff aren't doing much to select people that are great at clear oral communication with other Americans, either.
The whole "Hire Americans!!1!" kind of viewpoint smells like Trump and honestly: The world deserves better.
Also, your assumption is wrong. 95%+ of Americans speak English while only 12.5% of Indians speak it. Even with India's much larger population that still amounts to roughly a 3rd as many compared to the U.S. There is also a knock-on effect. When only 12% of the population speak a language they tend to avoid using it as a primary conversational language and thus only get practice in professional settings, which lowers their fluency in it. Add to that all of the regional oddities and it's quite different from American English. Consider for example phrases like "Kindly do the needful" or "out of station". Phrases they say daily that are completely foreign to American ears.
When you next have a chance, compare someone in the Philippines (92% English) speaking English vs someone from anywhere in India speaking English, and it'll become very apparent to you that the percentage of the population that speaks the language makes a huge difference in the intelligibility of said speaker.
https://www.statista.com/statistics/580586/states-with-the-h...
Willingness to work saves the company money. It doesn't help the user. What helps the user are things like clear speaking and good phone connections. It's possible to do that in any country, but the relative cost varies, and outsourcing implies cutting corners.
While I agree that 3 months of training isn't going to cut it, it isn't necessary to have spoken in that language your whole life if you are serious about your accent.
Oh wait, the Politically Correct liberals are crying "Racist!". Never-mind.
I'm a native English speaker and I find it super difficult to understand English with an Indian accent.
Photographers still use and make projects or series on stuff like Flickr or Behance.
Instagram and Facebook are more "blog-like" and aimed at snapshots of daily life, holidays etc to share with friends. It's normal imho that these are much higher in volume.
I really dread that some day Yahoo will manage to casually destroy it.