The Short Answers To Every Matt Cutts Video (theshortcutts.com)
307 points by ASquare on April 8, 2014 | hide | past | favorite | 71 comments


"Does Google use EXIF data from pictures as a ranking factor? Potentially, yes."

This is a simple, but brilliant optimization.

With the Manufacturer & Model info from the tag you could make an educated guess about professional vs. non-professional photography and draw conclusions about a website's intent.

With the date & time you've got relevancy of time beyond just search results. E.g., if you're optimizing for the freshest / most relevant content, a "new" article that contains a recent picture with positive PR signals could outrank another "new" article that is using old pictures.

With geo-data you've got an obviously powerful signal.

And on and on.

It is kind of crazy to think about the sheer # of breadcrumbs we leave online for GOOG (and, ahem, other organizations) to learn about us with.

PS - This is a hugely useful resource, thank you for submitting!


Most professionals will scrub the metadata from images as part of the image optimization process - smaller images, faster sites.


Really? I've seen metadata scrubbed for privacy reasons, but never for size reduction. And realistically how much is saved in proportion to the image size?


Imgur and Craigslist specifically scrub EXIF metadata (EDIT thanks to /u/user24) for privacy.


- for privacy


Thank you. Edit made. Need moar coffee.


Which, sadly, also reduces the chances of finding who actually produced a given photograph. TinEye and Google Images can help, but it can be surprisingly difficult to locate the actual origin of a given photo.

(Let alone Tumblr and their infinite vortex of "reblog", which leads to more than a few dead ends)


It can make a huge difference.

    convert from_camera.jpg -resize 200x200 resized.jpg
    exiftool -all= resized.jpg -o resized-exif-stripped.jpg

    from_camera.jpg - 2.4M
    resized.jpg - 33K
    resized-exif-stripped.jpg - 16K

In this simple example, the EXIF data doubles the file size of a 200x200px thumbnail.
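For the curious, the stripping step can be sketched in pure Python without exiftool. EXIF data lives in a JPEG's APP1 segment, so a minimal (and deliberately naive) segment walker can drop it. This is an illustrative sketch, not a replacement for exiftool; it assumes a well-formed file with only length-prefixed segments before the start-of-scan marker.

```python
def strip_exif(jpeg: bytes) -> bytes:
    """Remove APP1 (EXIF) segments from a JPEG byte string."""
    assert jpeg[:2] == b"\xff\xd8", "not a JPEG (missing SOI marker)"
    out = bytearray(b"\xff\xd8")
    i = 2
    while i + 4 <= len(jpeg):
        if jpeg[i] != 0xFF:
            break                    # malformed; keep the remainder as-is
        marker = jpeg[i + 1]
        if marker == 0xDA:           # SOS: compressed scan data follows
            out += jpeg[i:]
            return bytes(out)
        # Segment length is big-endian and includes the two length bytes
        length = int.from_bytes(jpeg[i + 2:i + 4], "big")
        segment = jpeg[i:i + 2 + length]
        if marker != 0xE1:           # 0xFFE1 = APP1, where EXIF lives
            out += segment
        i += 2 + length
    out += jpeg[i:]
    return bytes(out)
```

In practice you would still reach for exiftool or an image library, since real files can contain restart markers and multiple metadata segments that this sketch ignores.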


Is that number from ls(1) or from du(1)?


ls -h


This is one of the optimizations mod_pagespeed makes by default: https://developers.google.com/speed/pagespeed/module/filter-...

There's not usually much EXIF text but it isn't gzipped so it's more expensive than need be. This is the kind of tweak that's not really worth doing on its own but when you're already optimizing the image you might as well do this too.
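The gzip point can be seen with a toy measurement: textual metadata is highly redundant and would compress well, but because it travels inside the JPEG container it is sent over the wire uncompressed. The field values below are invented for illustration.

```python
import gzip

# Hypothetical EXIF-style text fields (values made up for illustration);
# real EXIF blocks contain many similarly redundant ASCII strings.
exif_text = (
    b"Make=Canon\x00Model=Canon EOS 5D Mark III\x00"
    b"Software=Adobe Photoshop Lightroom\x00"
    b"DateTimeOriginal=2014:04:08 10:32:17\x00"
) * 4

compressed = gzip.compress(exif_text)
print(len(exif_text), "bytes raw vs", len(compressed), "bytes gzipped")
```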


It's not about the text, but usually the EXIF also contains a thumbnail (for faster display on the camera) that can take tens of KB.


And some images also contain a color profile as metadata -- which can be quite large.


Every little helps - Photoshop's Save for Web can give you some quite useful reductions. I shaved 1 second off some key pages on Kelley search just from optimizing a single image.


"Every little helps"

but, does it? i mean it sounds like we're talking about taking a few bytes off a probably Xmb or XXmb image.


One megabyte is huge for a web image. That's essentially a full-screen 4K/UHD version of a highly-detailed image with no detectable compression artifacts (where "detectable" means "image subtraction doesn't count; zoom in tight and use your eyes") when run through a good optimizing utility. That's good enough for a better-than-acceptable (not optimal, but from a pace or two away most people wouldn't notice) tabloid/A3 print without uprezzing. And there is enough data to support substantial uprezzing before the image goes wonky for those inclined to do so.

Now, yer average pro doesn't tend to indulge in the pixel-peeping game (except in private, with the lights down low and the door locked, and always washes his/her hands afterwards), and tends to be reluctant to give away images that can still sell - or at least would prefer it if people kept their own Photoshop/Gimp chocolate out of their personal peanut butter. Even when the images are offered as freebies, there is no reason to force a viewer to download more than they need to make decisions; the hi-rez image will sit behind a link, to be opened at the viewer's discretion. The back button hurts everybody.


who said anything about reducing the filesize by a megabyte? i assumed the EXIF data was maybe a few kb.

not sure why i'm being downvoted. i get it, "every little bit" adds up. but here in the real world removing the EXIF data to save bandwidth is a micro optimization 99% of the time.


The grandparent meant that you wouldn't be removing a few kbs from "a probably Xmb or XXmb image", since web images aren't that large.

You're right that it doesn't make sense to downvote you for that; it was an interesting question.


Depends on the app. Multiply larger size x1000s of images, x1000s of visitors, x days. For offsite backups, 2x the storage cost. My general rule is, any image over 40k is too big for a web page, unless your website is about photography/high-resolution imagery. And I hate jaggies and compression artifacts as much as anybody ;-)
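To put rough numbers on that multiplication (the 17 KB figure mirrors the 33K -> 16K thumbnail example elsewhere in the thread; the traffic figures are invented for illustration):

```python
# Back-of-envelope bandwidth math for the "multiply it out" argument.
saved_kb_per_image = 17       # from the 33K -> 16K thumbnail example
images_per_page = 10          # hypothetical page with several thumbnails
views_per_day = 10_000        # hypothetical traffic
days = 30

total_kb = saved_kb_per_image * images_per_page * views_per_day * days
print(total_kb / 1_000_000, "GB of transfer saved per month")
```

A few KB per image is indeed a micro-optimization for one visitor, but it stops being micro once it is multiplied across a whole site's traffic.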


I'd expect most content management systems will do this, but there's tons of exif data hanging out on the Internet.


I thought professionals would be more likely to add in exif data for copyright information.


The copyright and contact fields are usually the only ones not empty. The rest are pretty much irrelevant except to the magical thinking crowd. (Those are the people who believe that a particular camera, lens, or aperture/shutter/ISO combination will result in the "same" picture or the "same" image quality. For the most part, the fact that the photographer happened to be holding a D4, a 1DX or an 80-MP Phase One at the time doesn't mean the shot would have been substantially different - at web sizes, at least - from what an entry-level DSLR/MILC or even a decent point-and-shoot would have rendered.) Where provenance matters (particularly in things like the WPPA "Photojournalism" competition division) only the raw file counts. When the piece is part of a tutorial or "how I shot it", the actually relevant and helpful data appears separately as body text with an explanation.


The rest are pretty much irrelevant except to the magical thinking crowd. (Those are the people who believe that a particular camera, lens, or aperture/shutter/ISO combination will result in the "same" picture or the "same" image quality.

I wouldn't go quite that far: some people see a cool shot and wonder how it was created and how the techniques used to create it might be applied elsewhere, much as someone might see a nifty piece of software and want to see the source code.


They usually don't look at the important things (most of which is never recorded in EXIF), like where the light is coming from, whether or not reflectors or external (usually non-TTL) flash were used, and so forth. Beyond "wide" or "narrow" (for the sensor format and focal length), the aperture is rarely critical; the same applies to "slow" and "fast" shutter speeds. Both of those can be derived by eye, and if they add up to "not enough light", then there must have been a higher ISO in use. And I'm rather hoping that some day people will figure out that distance controls perspective and focal length merely changes the framing. (Fisheyes being an exception here, but they're sort of easy to spot. And Scheimpflug corrections are not EXIF-compatible.) The actual numbers don't particularly matter, and are highly situational.


> Does Google use EXIF data

Yes, they already use EXIF data. Google is using it in Google Plus search. I proved this a while back by putting a unique string of characters in the EXIF data; if you search for that string of characters in Google Plus, it brings up the photo/image.


> With the Manufacturer & Model info from the tag you could make an educated guess about professional vs. non-professional photography and draw conclusions about website's intent.

That is really bad IMO. I don't want my images judged on which camera I used. I want them judged on their content. I don't decide which book to read based on what program was used to edit the book. I don't watch a movie based on what software was used to edit it. Why would I judge a picture based on what camera was used?

If sites start judging content by camera then there will be an incentive to edit the EXIF info to make it appear the best camera was used.


>I don't decide which book to read based on what program was used to edit the book.

Yes you do. You read books published by major publishers that adhere to a certain quality or standard. You don't read books that were hand-written on a banana leaf.

>I don't watch a movie based on what software was used to edit it.

Ever seen Battlefield Earth? Didn't think so.

>Why would I judge a picture based on what camera was used?

Usually one is oblivious to the camera used for a picture. However, the camera used does indeed correlate with the quality of the image. The better the camera, the higher quality the picture. This gives information about the site owner's intent for content.


Back in the days when I was running a pretty large SEO company (at its peak around 500 small businesses were using us), a few guys in this field chimed in and we got some heavy testing done. This was around the Penguin or Panda update (not sure which). We got 200 domains with no SEO done to them before, set up some Wordpress blogs, and started testing to see how accurate Matt Cutts's suggestions were. Unfortunately the results were not only mostly random, but also pretty much the opposite of what we had learned from Matt. Websites that were spammed (but in a smart way) ranked pretty well, while white-hat sites barely made the top 20 for targeted keywords. You can see tons of examples like that around the marketing forums. I wonder whether Google has outgrown itself algorithm-wise and whether they still know what's going on in there.



This is a great website that should be bookmarked by everyone who cares about search traffic.

It is curious, though, why this submission got upvoted to the homepage when the previous two times it was submitted it never went anywhere... makes me wonder if there is lots of great content that never gets upvoted on HN.

Here is a submission from over a year ago: https://news.ycombinator.com/item?id=5405485


It is curious, though, why this submission got upvoted to the homepage when the previous two times it was submitted it never went anywhere

This is easily explicable: it's the luck of the draw. There is a lot of randomness in which stories get promoted from /newest vs. fall through the cracks.

We're working on an idea for reducing this randomness.


HN very much has a hive mind. HN used to have great articles all the time. Now it feels like reddit. Every day tons of great articles slip into obscurity while HN upvotes the meme of the day (Snowden, NSA, 2048, etc).


Seems to me that's not evidence for or against a hive mind; it's evidence that, whether or not it has a hive mind at all, the aggregated preferences of the current HN user base don't align as well with your preferences as those of the user base did at one time in the past.

But having a user base whose aggregated preferences disagree with yours more often than they used to is no more evidence of a hive mind than having what you perceive to be "great articles all the time" is.


There was a submission of this two months ago as well that only got two upvotes... Do you think the community changed that much in two months?

Maybe HN should take articles that get more than one upvote but die down, and resurface them in /newest several times, to see whether they slipped through the cracks or really have no value to the community.


> There was a submission of this two months ago as well that only got two upvotes... Do you think the community changed that much in two months?

There's probably pretty big effects from time of day and other things on whether or not things get upvotes. That's not a "hive mind", that's just the fact that upvotes come from people coming here as they feel like doing so, not from a bunch of 24/7 staff that are paid to apply objective, consistent criteria (which would be a kind of hive mind.)


Every day tons of great articles slip into obscurity

Can you provide examples? This is the problem we're going to work on next, after we've made some progress on comment incivility.

(I'm going to demote this subthread as off-topic so it doesn't interfere with the real discussion.)


> "Should I focus on clarity or jargon when writing content?"

> "Clarity, but also include jargon."

Well, that pretty much summarizes my writing strategy.


You should be able to express your ideas -- with jargon or nerdspeak or technobabble -- in such a way that people can understand you even if they don't understand those words.


Thank you for that detailed advice.


Site is down for me ("Oops! Google Chrome could not find www.theshortcutts.com"), anybody else having problems? Tried my ISP's DNS as well as Google's DNS.


It's been down for a few hours. I keep checking when I have a free minute, but it doesn't seem to be coming back up.


I think HN broke the site. You can always find the original videos at http://www.youtube.com/user/GoogleWebmasterHelp

Power-user tip: Google does regular live hangouts for people who have questions about search and SEO. Here's the schedule of upcoming hangouts: https://sites.google.com/site/webmasterhelpforum/en/office-h...


Loving that you can filter the videos based on what colour t-shirt he's wearing.


It's amazing how addictive a nonsensical navigation system can be.

Also: has he been working out? His chin seems to shrink a bit with every new video.


Looks like he's been getting the kind of exercise I need to make time for:

http://www.mattcutts.com/blog/a-big-challenge-running-a-50-m...


We made our April Fools video with the changing shirt color specifically to play with that: http://www.youtube.com/watch?v=1Cjz_kJGtS8


The perfect wearables strategy.


The Pinterest layout is only really good for Pinterest (or very similar content). This forces my eyes all over the place when I just want to read down the list.


That's also similar to the default G+ layout, which I similarly detest.


Switch javascript off, makes it a lot more readable.


The third Q&A "How can I tell google that multiple domains are related?" - "Use Hreflang" doesn't make any sense. This question does not reflect the question asked in the video, and the answer is incomplete. The question is about translated versions of international sites, that's not simply "related". And sitemaps work as well.
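For context, "use hreflang" refers to annotations like these in each page's head (the domains here are illustrative); as noted, the same alternates can instead be declared in an XML sitemap:

```html
<!-- In the <head> of the English page; URLs are illustrative -->
<link rel="alternate" hreflang="en" href="https://example.com/en/" />
<link rel="alternate" hreflang="es" href="https://example.es/" />
<link rel="alternate" hreflang="x-default" href="https://example.com/" />
```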

I think the site is a good idea, but if one of the first three results is flawed I'm not too sure about the overall quality.


I mean... the videos are sortable by shirt color. I'm not sure quality is truly the goal here.


Having gone through the trouble of listening to so many videos, I'd suggest putting a 1-2 paragraph answer under each video instead of 1-2 words.

Lots of important details are lost together with the fluff.

Although I always appreciate such work - summarizing videos into get-to-the-point text is a great service.

PS: I'd ignore SEO videos more than 1-2 years old - unless tracking Matt's t-shirt color is someone's hidden fetish...


I was expecting a one-line answer to all questions: "well, it depends, but think about the user, not Google"


This site has actually been around for a while. Great Coles Notes for SEO. But if you have the time, watch the full videos from Matt. His advice is required listening.


The domain name is pretty funny. Props.


And you can sort talks by t-shirt color.


I wonder if there are any correlations between shirt colour and content - like he [unconsciously] wears a red shirt when he's being less open with his responses.


I switch shirts every 5-6 videos, and we go down the list of videos in order of votes, so shirt color is mostly uncorrelated with the content. If I'm doing a longer or more in-depth video, then I might pick a specific shirt that goes with the topic.


I don't get the inconsistent answer to site load time:

If you were an SEO of a large company, what would you include in your 2011 strategy? Optimise site speed, control of CMS, education program, internal linking, social media

Should I be obsessing about load times? Slightly

Do site load times have an impact on Google rankings? No


Page speed does have an impact on Google rankings. You should care about the speed of your site not just because of your Google rankings, but because it makes a huge difference to your user experience.

That's the hazard of this site. In summarizing to 1-2 words, some of the nuance is lost.


Great website and great idea in general! I love being able to read a concise answer rather than watch a video when time is a constraint.


Maybe make it: theshirtcutts.com ?


Excellent effort.


Agreed, that's a very large number of videos.

The lead-in copy on the landing page might need an iteration or two, "to help struggling site owners understand their site in search" read very strangely to me. I'm not a native speaker, though.


You're correct. It's not great copy. Also, 'struggling site owners' will not understand these videos. Cut 'struggling' and 'their site' and it becomes:

"...to help site owners understand how Google search works..." or "...to help site owners understand how their decisions affect their Google search results..."


Nothing changes the fact that Google is your biggest competitor, and they control the ranking, the ads, and the display. Google's ad clicks keep going up each quarter while your traffic goes down. (No, online traffic and search volume aren't increasing by that much. Not even close.)

Matt does a great job for Google, buying them time, but for webmasters he's useless.


It's as if Robin Van Persie and Will McKenzie (Simon Bird) had a baby.


Still better than when someone said I looked like Rick Moranis, so I'll take it.


I wouldn't worry about that too much.


Instant search works great and is functional.



