"Does Google use EXIF data from pictures as a ranking factor? Potentially, yes."
This is a simple, but brilliant optimization.
With the Manufacturer & Model info from the tag you could make an educated guess about professional vs. non-professional photography and draw conclusions about a website's intent.
With the date & time you've got time relevancy beyond just search results. E.g., if you're optimizing for the freshest, most relevant content, a "new" article that contains a recent picture with positive PR signals could outrank another "new" article that is using old pictures.
With geo-data you've got an obviously powerful signal.
And on and on.
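The tags listed above are trivial to pull out programmatically, which is part of why they make plausible signals. A minimal sketch using Pillow (my assumption; any EXIF library would do):

```python
from PIL import Image
from PIL.ExifTags import TAGS

def read_exif(fp):
    """Return EXIF data as a dict keyed by human-readable tag names."""
    exif = Image.open(fp).getexif()
    return {TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}

# Tags relevant to the signals above:
#   "Make" / "Model" -> manufacturer & model (pro vs. consumer gear)
#   "DateTime"       -> capture timestamp (freshness)
#   GPS coordinates live in a sub-IFD (see PIL.ExifTags.GPSTAGS)
```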
It is kind of crazy to think about the sheer # of breadcrumbs we leave online for GOOG (and, ahem, other organizations) to learn about us with.
PS - This is a hugely useful resource, thank you for submitting!
Really? I've seen metadata scrubbed for privacy reasons, but never for size reduction. And realistically how much is saved in proportion to the image size?
Which, sadly, also reduces the chances of finding who actually produced a given photograph. TinEye and Google Images can help, but, it can be surprisingly difficult to locate the actual origin of a given photo.
(Let alone Tumblr and their infinite vortex of "reblog", which leads to more than a few dead ends)
There's not usually much EXIF text, but it isn't gzipped, so it's more expensive than it needs to be. This is the kind of tweak that's not really worth doing on its own, but when you're already optimizing the image you might as well do this too.
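For what it's worth, dropping the metadata is a few lines in most image libraries. A sketch with Pillow (an assumption on my part; tools like `jpegtran -copy none` or `exiftool -all=` can strip it losslessly without re-encoding):

```python
from PIL import Image

def strip_exif(src, dst, quality=85):
    """Re-save an image without its metadata.

    Note: this re-encodes the JPEG (lossy). For lossless stripping use
    a tool that rewrites only the file structure, e.g. jpegtran.
    """
    img = Image.open(src)
    clean = Image.new(img.mode, img.size)
    clean.putdata(list(img.getdata()))  # copy pixels only, no metadata
    clean.save(dst, format="JPEG", quality=quality)
```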
Every little helps - Photoshop's Save for Web can give you some quite useful reductions. I shaved 1 second off some key pages on kelley search just from optimizing a single image.
One megabyte is huge for a web image. That's essentially a full-screen 4K/UHD version of a highly-detailed image with no detectable compression artifacts (where "detectable" means "image subtraction doesn't count; zoom in tight and use your eyes") when run through a good optimizing utility. That's good enough for a better-than-acceptable (not optimal, but from a pace or two away most people wouldn't notice) tabloid/A3 print without uprezzing. And there is enough data to support substantial uprezzing before the image goes wonky for those inclined to do so.
Now, yer average pro doesn't tend to indulge in the pixel-peeping game (except in private, with the lights down low and the door locked, and always washes his/her hands afterwards), and tends to be reluctant to give away images that can still sell - or at least would prefer it if people kept their own Photoshop/Gimp chocolate out of their personal peanut butter. Even when the images are offered as freebies, there is no reason to force a viewer to download more than they need to make decisions; the hi-rez image will sit behind a link, to be opened at the viewer's discretion. The back button hurts everybody.
Who said anything about reducing the file size by a megabyte? I assumed the EXIF data was maybe a few KB.
Not sure why I'm being downvoted. I get it, "every little bit" adds up. But here in the real world, removing EXIF data to save bandwidth is a micro-optimization 99% of the time.
Depends on the app. Multiply larger size x1000s of images, x1000s of visitors, x days. For offsite backups, 2x the storage cost. My general rule is, any image over 40k is too big for a web page, unless your website is about photography/high-resolution imagery. And I hate jaggies and compression artifacts as much as anybody ;-)
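The multiplication in the parent is easy to make concrete. A back-of-envelope calculation, with figures that are purely hypothetical (adjust to your own traffic):

```python
# Hypothetical figures -- swap in your own measurements.
exif_kb = 20            # EXIF + embedded thumbnail overhead per image, KB
images_per_page = 30
daily_visitors = 10_000

wasted_gb_per_day = exif_kb * images_per_page * daily_visitors / 1_000_000
print(f"{wasted_gb_per_day:.1f} GB/day of avoidable transfer")
```

Under those assumptions it's 6 GB a day of transfer carrying no pixels, before you even count backup storage.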
The copyright and contact fields are usually the only ones not empty. The rest are pretty much irrelevant except to the magical thinking crowd. (Those are the people who believe that a particular camera, lens, or aperture/shutter/ISO combination will result in the "same" picture or the "same" image quality. For the most part, the fact that the photographer happened to be holding a D4, a 1DX or an 80-MP Phase One at the time doesn't mean the shot would have been substantially different - at web sizes, at least - from what an entry-level DSLR/MILC or even a decent point-and-shoot would have rendered.) Where provenance matters (particularly in things like the WPPA "Photojournalism" competition division) only the raw file counts. When the piece is part of a tutorial or "how I shot it", the actually relevant and helpful data appears separately as body text with an explanation.
The rest are pretty much irrelevant except to the magical thinking crowd. (Those are the people who believe that a particular camera, lens, or aperture/shutter/ISO combination will result in the "same" picture or the "same" image quality.
I wouldn't go quite that far: some people see a cool shot and wonder how it was created and how the techniques used to create it might be applied elsewhere, much as someone might see a nifty piece of software and want to see the source code.
They usually don't look at the important things (most of which is never recorded in EXIF), like where the light is coming from, whether or not reflectors or external (usually non-TTL) flash were used, and so forth. Beyond "wide" or "narrow" (for the sensor format and focal length), the aperture is rarely critical; the same applies to "slow" and "fast" shutter speeds. Both of those can be derived by eye, and if they add up to "not enough light", then there must have been a higher ISO in use. And I'm rather hoping that some day people will figure out that distance controls perspective and focal length merely changes the framing. (Fisheyes being an exception here, but they're sort of easy to spot. And Scheimpflug corrections are not EXIF-compatible.) The actual numbers don't particularly matter, and are highly situational.
>> Does Google use EXIF data
Yes, they already use EXIF data. Google is using it in Google Plus search. I proved this a while back by putting a unique string of characters in the EXIF data and if you search for that string of characters in Google Plus it brings up the photo/image.
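The experiment described here is easy to reproduce: write a unique marker into an EXIF text field, publish the image, and later search for the marker. A sketch with Pillow (my assumption; `exiftool` works just as well), using the ImageDescription field:

```python
import uuid
from PIL import Image

def embed_marker(src, dst):
    """Write a unique, searchable string into EXIF ImageDescription
    (tag 0x010E) and return it. Assumes a JPEG-compatible source."""
    marker = f"exif-probe-{uuid.uuid4().hex}"
    img = Image.open(src)
    exif = img.getexif()
    exif[0x010E] = marker  # ImageDescription
    img.save(dst, format="JPEG", exif=exif)
    return marker
```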
> With the Manufacturer & Model info from the tag you could make an educated guess about professional vs. non-professional photography and draw conclusions about website's intent.
That is really bad IMO. I don't want my images judged on which camera I used. I want them judged on their content. I don't decide which book to read based on what program was used to edit the book. I don't watch a movie based on what software was used to edit it. Why would I judge a picture based on what camera was used?
If sites start judging content by camera then there will be an incentive to edit the EXIF info to make it appear the best camera was used.
>I don't decide which book to read based on what program was used to edit the book.
Yes you do. You read books published by major publishers that adhere to a certain quality or standard. You don't read books that were hand-written on a banana leaf.
>I don't watch a movie based on what software was used to edit it.
Ever seen Battlefield Earth? Didn't think so.
>Why would I judge a picture based on what camera was used?
Usually one is oblivious to the camera used for a picture. However, the camera used does indeed correlate with the quality of the image. The better the camera, the higher quality the picture. This gives information about the site owner's intent for content.
Back in the days when I was running a pretty large SEO company (at its peak around 500 small businesses were using us), a few guys in this field chimed in and we got some heavy testing done. This was around the Penguin or Panda update (not sure which). We got 200 domains with no SEO done to them before, set up some Wordpress blogs, and started testing to see how accurate Matt Cutts's suggestions were. Unfortunately, the results were not only mostly random, but also pretty much the opposite of what we had learned from Matt. Websites that were spammed (but in a smart way) ranked pretty well, while WH sites barely made it into the top 20 for targeted keywords.
You can see tons of examples like that around the marketing forums. I wonder whether Google has outgrown itself algorithm-wise and whether they still know what's going on in there.
This is a great website that should be bookmarked by everyone who cares about search traffic.
It is curious, though, why this submission got upvoted to the homepage when the previous two times it was submitted it never went anywhere... makes me wonder if there is lots of great content that never gets upvoted on HN.
It is curious though why this submission got up voted to the homepage and the previous two times it was submitted that it never went anywhere
This is easily explicable: it's the luck of the draw. There is a lot of randomness in which stories get promoted from /newest vs. fall through the cracks.
We're working on an idea for reducing this randomness.
HN very much has a hive mind. HN used to have great articles all the time. Now it feels like reddit. Every day tons of great articles slip into obscurity while HN upvotes the meme of the day (Snowden, NSA, 2048, etc).
Seems to me that's not evidence for or against a hive mind; it's evidence that, whether or not it has a hive mind at all, the aggregated preferences of the current HN user base don't align as well with your preferences as those of the user base did at one time in the past.
But having a user base whose aggregated preferences disagree with yours more often than they used to is no more evidence of a hive mind than having what you perceive to be "great articles all the time" is.
There was a submission of this two months ago as well that only got two upvotes... Do you think the community changed that much in two months?
Maybe HN should take articles that get more than one upvote but die down and resurface them in /new several times, to see whether they slipped through the cracks or really have no value to the community.
> There was a submission of this two months ago as well that only got two upvotes... Do you think the community changed that much in two months?
There's probably pretty big effects from time of day and other things on whether or not things get upvotes. That's not a "hive mind", that's just the fact that upvotes come from people coming here as they feel like doing so, not from a bunch of 24/7 staff that are paid to apply objective, consistent criteria (which would be a kind of hive mind.)
You should be able to express your ideas -- with jargon or nerdspeak or technobabble -- in such a way that people can understand you even if they don't understand those words.
Site is down for me ("Oops! Google Chrome could not find www.theshortcutts.com"), anybody else having problems? Tried my ISPs DNS as well as the Google DNS
The pinterest layout is only really good for pinterest (or very similar content). This forces my eyes all over the place when I just want to read down the list.
The third Q&A "How can I tell google that multiple domains are related?" - "Use Hreflang" doesn't make any sense. This question does not reflect the question asked in the video, and the answer is incomplete. The question is about translated versions of international sites, that's not simply "related". And sitemaps work as well.
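For reference, the hreflang annotation the summary alludes to looks something like this (the URLs are purely illustrative); the same annotations can alternatively live in the XML sitemap, as the parent notes:

```html
<!-- In the <head> of each language version (example.com is illustrative) -->
<link rel="alternate" hreflang="en" href="https://example.com/en/" />
<link rel="alternate" hreflang="de" href="https://example.com/de/" />
<link rel="alternate" hreflang="x-default" href="https://example.com/" />
```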
I think the site is a good idea, but if one of the first three results is flawed I'm not too sure about the overall quality.
This site has actually been around for a while. Great Coles Notes for SEO. But if you have the time, watch the full videos from Matt. His advice is required listening.
I wonder if there are any correlations between shirt colour and content - like he [unconsciously] wears a red shirt when he's being less open with his responses.
I switch shirts every 5-6 videos, and we go down the list of videos in order of votes, so shirt color is mostly uncorrelated with the content. If I'm doing a longer or more in-depth video, then I might pick a specific shirt that goes with the topic.
I don't get the inconsistent answer to site load time:
If you were an SEO of a large company, what would you include in your 2011 strategy?
Optimise site speed, control of CMS, education program, internal linking, social media
Should I be obsessing about load times?
Slightly
Do site load times have an impact on Google rankings?
No
Page speed does have an impact on Google rankings. You should care about the speed of your site not just because of your Google rankings, but because it makes a huge difference to your user experience.
That's the hazard of this site. In summarizing to 1-2 words, some of the nuance is lost.
The lead-in copy on the landing page might need an iteration or two, "to help struggling site owners understand their site in search" read very strangely to me. I'm not a native speaker, though.
You're correct. It's not great copy. Also, 'struggling site owners' will not understand these videos. Cut 'struggling' and 'their site' and it becomes:
"...to help site owners understand how Google search works..." or "...to help site owners understand how their decisions affect their Google search results..."
Nothing changes the fact that Google is your biggest competitor, and they control the rankings, the ads, and the display. Google's ad clicks keep going up each quarter while your traffic goes down. (No, online traffic and search volume aren't increasing by that much. Not even close.)
Matt does a great job for Google, buying them time, but for webmasters he's useless.