Google Broke Image Search for Creative Commons (cogdogblog.com)
368 points by colinprince on Sept 28, 2022 | 151 comments



I read comments and articles all the time about the quality of Google's search dropping. I haven't noticed it much in practice, but I'm persistent, use copious ad blocking, and my Google-fu is strong, so I usually find what I'm looking for.

Image search is another story. I've been less and less satisfied with Google Image search results, and visual search has been totally neutered. It only returns low res results that rarely match the original as well as it used to. I used to be able to plug in a 400x400 image and find a dozen copies of it at a usable size. No more. Too many copyright complaints, I assume. I've started using Bing for image and visual search now. It's not as good as old Google Images, but marginally better in some cases than the current iteration.


I've stopped being able to reliably find an animated gif and copy and paste it.

One used to be able to just right click on an image and copy it. Now, I only get a still. I try clicking into the website that hosts the image and it's click, click, click just to get anywhere close to the size I want, and oftentimes it's not even an animated gif anyway because they do some sort of media query and serve me up an uncopyable movie instead.

The web sucks now. People work around it with bots and the like on Reddit, but I feel like the economics have been figured out and it's not fun anymore.

Instead, we use /giphy in our Slack and hope the algorithm finds something that kinda-sorta was what we were thinking.


I use Kagi frequently, but just for giggles, give Yandex image search a try: you can actually specify dimensions (like you used to on Google).


Another recommendation to use Yandex Image search as an option. They search differently and also try to find similar images.


i see putin has his troll army everywhere!!!!

(yes, im kidding)


With /giphy I think half the fun is when it falls flat on its face. The original command is always visible and most people just read that for the sentiment and then enjoy when the gif it found misses the mark by a mile.


The problem is not just google's algorithm but also the fact that people are not sharing useful information the same way they used to. These days when I'm trying to hack smarthome stuff or looking for advice on 3D printing something or software limitation workarounds the best I can hope for is a subreddit, otherwise the knowledge is hidden on a discord server after I "join the community" for the 98343789th time. I can't just read a 10 year old forum thread where people talked each other through solving it, I have to join the discord and figure out which channel to ask my question in and do some dance with frog memes and people react to my question with an emoji of a toothless man laughing before we can talk shop.


I'm sympathetic, but this is turning into a get-off-my-lawn rant. Things weren't easy back then either, depending on what it was. Forums and irc were just like discords in that you had to deal with insular culture and often serious verbal abuse for being stupid enough to ask for help in a forum meant to get help in.

That said, I do see the sentiment. It would be nice to have the old convenience of being able to look up old forum posts (especially with summaries in the OP via edit). Stackoverflow often fits that role now, although I dread the answers even more than old forum posts. I guess what I want is my cake and the ability to eat it too. I don't know why we can't just have the ability to search forums and have the generally kinder attitude that modern media tend to have; they shouldn't be mutually exclusive.


> serious verbal abuse for being stupid enough to ask for help in a forum meant to get help in.

in all honesty, isn't this a bit of "beauty is in the eye of the beholder", "sticks and stones" etc?

yes, you might be called various words, told to RTFM. but seriously here, is it really so bad? do you really want to drag down actual verbal abuse to something so absolutely trivial?


Yes, forums and chat channels have always been cesspools but at least forums (and IRC logs) were _searchable_ cesspools


This really depends on the forum. Yes, most had some eccentric people, but this was a small price to pay. More strictly moderated platforms or "modern media" tend to show you more ads and questionable content that is overall more harmful than someone being a bit too blunt. Also, there were extremely helpful forums too that had a much, much better atmosphere than your standard discord channel or other large social media platforms without it even being a competition.

I don't want the internet to be kind everywhere, some artificial culture. If I don't like the tone of the place, I can go elsewhere. Strong moderation comes with too high a price in my opinion.


But you could search them. And still can. If it's in discord, it's a black hole.


Hiding documentation in some Discord is infuriating.

It's Yahoo Groups all over again, except that open groups could have their messages indexed, unlike on Discord.


Even popular things, such as the more popular Path of Exile league starter builds, are harder to find now than in the past. At this point, I'm often just searching the PoE forums or subreddit to get an idea for a build.


> my Google-fu is strong

The thing is, this doesn't matter anymore. You have very little control as Google tries to be smart. It's very hard if not impossible to find something older, obscure, things from other regions, languages, etc...


I think "Google-fu" just refers to being able to bend the search engine to your will. In the early days it was with operators and special keywords (inurl, etc) but today most of those are not as useful or actively harmful and so "Google-fu" has progressed. It's knowing which terms to drop from the error you are searching for, it's knowing how to phrase things correctly, it's knowing how to skim the results and separate the wheat from the chaff. Or at least that's what it means to me and how I use it.


I don't think they misunderstood and I think their point still stands. Google's search results are becoming ever more unyielding to the user's intent. The portion of results that are SEO spam for every query is increasing. The amount of your query being dropped and ignored in your search is increasing. Google results are becoming increasingly irrelevant and are on a trajectory to a point of completely ignoring your query, where the results are strictly a combination of spam and a random pick of websites.

"Google-fu" is not progressing. It's hanging on by a decreasing number of threads before it becomes utterly ineffective.


Google-fu is prompt engineering.

Where we used to say “rome fall why”, you’d now write “why did the roman empire fall”. Because the AI likes that phrasing and produces better results.

Soon you’ll write a 300 word description of what exactly you’re looking for, like you would when asking a trusted expert, and Google will figure something out. The days of keyword searching are long gone.


Not to pick on your quick example, but actually testing it, these two terms [1] [2] have nearly identical results. Top result in both cases is history.com, followed by wikipedia, and the next 4-6 results are the same but in slightly different order.

I think this is actually an example of the benefit of their AI. Despite the big difference in the "style" of phrasing (simple english vs more formally naming the subject noun), both seem to map to a very similar representation in their embedding space. I've run into frustrations with this myself, but for basic questions like this it seems like the search works Pretty Good.

[1] https://www.google.com/search?q=why+rome+fall&oq=why+rome+fa...

[2] https://www.google.com/search?q=why+did+the+roman+empire+fal...


Consider that "Why did the Roman Empire fall?" consists of 1 word that describes the type of question being asked ("why"), 2 useless junk words ("did the"), and 3 "key words" ("Roman Empire Fall"), of which 2 should really be treated as a single word referring to a single concept/entity ("Roman Empire", for which "Rome" is a synonym in some cases).

Humans instinctively know this, so we are able to construct queries like "why rome fall".

But that "why rome fall" query, which we think of as a purely mechanical keyword search, already requires quite a bit of sophisticated processing in the search engine. The system has to recognize that "fall" is synonymous with "collapse" or "wane in power" and not synonymous with "autumn". It also has to recognize that "rome" means "the (Western) Roman Empire" and not the modern city of Rome in Italy or "the Holy Roman Empire" or the city of Rome, NY, USA. It furthermore needs to interpret "why" in such a way that it emphasizes results with "reasons" or "explanations", rather than something like a "timeline" or "summary".
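
As a rough illustration of the decomposition described above, here is a minimal Python sketch. The word lists and the alias table are hand-picked assumptions for this one example, not anything a real search engine is known to use; the point is only that both phrasings can reduce to the same question type and key terms.

```python
import re

# Hypothetical, hand-picked tables for illustration only.
QUESTION_WORDS = {"why", "when", "who", "how", "what", "where"}
STOP_WORDS = {"did", "the", "a", "an", "of"}
MULTIWORD_ENTITIES = ["roman empire"]       # treated as a single term
ENTITY_ALIASES = {"rome": "roman empire"}   # "Rome" standing in for the empire


def parse_query(query: str) -> dict:
    """Split a query into a question type and normalized key terms."""
    text = re.sub(r"[^a-z\s]", "", query.lower())
    # Glue multi-word entities together so they survive tokenizing.
    for entity in MULTIWORD_ENTITIES:
        text = text.replace(entity, entity.replace(" ", "_"))
    question = None
    terms = []
    for token in text.split():
        if token in QUESTION_WORDS:
            question = token
        elif token in STOP_WORDS:
            continue
        else:
            token = token.replace("_", " ")
            terms.append(ENTITY_ALIASES.get(token, token))
    return {"question": question, "terms": sorted(terms)}


long_form = parse_query("Why did the Roman Empire fall?")
short_form = parse_query("why rome fall")
print(long_form)                 # {'question': 'why', 'terms': ['fall', 'roman empire']}
print(long_form == short_form)   # True
```

Note that this sketch does nothing about the hard part (deciding that "fall" means "collapse" and not "autumn"); that would need context the lookup tables do not have.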

Personally I find it really weird that Google is interested in pushing users more to interact with its digital librarian / AI assistant, instead of continuing to improve keyword search.

I have a few guesses as to why they are going this way:

1. It makes the user interface simpler from an engineering perspective (fewer user-facing buttons and options to implement and test).

2. There is strategic benefit to making search more of a black box. Maybe they are specifically trying to "educate" users to expect and be comfortable with such black boxes. Maybe the plan is to get people so accustomed to "AI assistant" search that they see keyword search as outdated, and thereby secure a competitive advantage for the next several years over other search engines, by having the biggest and best AI models.

3. They are trying to increase the amount of rich "natural language" user search inputs in their data. Making keyword search worse will encourage people to use queries that more closely resemble natural language. I assume that this has strategic benefit related to Guess 2 above.


My own google-fu tells me that both of those searches are likely to be pretty poor - using the word "fall" instead of "collapse" is likely to snap up quite a few weird results about autumn tourism in italy and including "empire" in the second query feels likely to get you a batch of other poor results (like, for instance, the collapse of Russia commonly known as the third roman empire).

Personally I'd suggest "collapse of rome" which does deliver you a rich embedded result specific to the fall of rome.

I agree that Google's search parsing peaked a while back though, it seems to be getting weaker and weaker and now partially relies on the fact that search term autocompletion on mobile devices will supplement it by helping present an array of options near what you might want.


That may work for common topics, but it does not appear to work for niche ones, at least in my experience.

If I want to find a particular user-run forum on some obscure bit of some hobby, "<hobby name> <forum topic>" brings it up. But if I type out "Forum for <hobbyists> discussing <topic>" I get... a random selection of popular fora where someone has mentioned <topic>, often in passing or with minimal information.


Except that this isn't objectively an improvement, even in a perfect world where AI is substantially better than it currently is.

> you’d now write “why did the roman empire fall”. Because the AI likes that phrasing and produces better results.

"the AI likes that phrasing" is exactly the problem here. How is anyone supposed to know what the AI "likes", other than painstaking trial-and-error in the unbounded and arbitrarily high-dimensional search space of human language?

Even the people who built the model probably don't know. Language models (and deep NNs in general) are extraordinarily complicated things, and there are problems with pretty much every technique that purports to provide visibility into their inner workings. There are just too many parameters and too many "information paths" in such a thing for regular people to wrap their heads around it. The ability to incorporate a high amount of complexity is a big part of why those models are so effective to begin with, but it also makes them really hard to reason about.

"AI" is currently in a weird spot where it's starting to kinda-sorta behave like an intelligent human in some limited settings, but in general is nowhere near as smart as a human. Most models still have a very shallow conceptual understanding of anything, even if they're becoming uncanny in their ability to match sophisticated patterns. It might not even be possible to teach some concepts to language models as they currently exist today, if only because there is only limited conceptual understanding available to be learned from corpora of text and images, even huge ones. Humans are still tremendously more effective than our best language models at understanding meaning and intent. Can an AI ever learn about love, regret, fear, or bliss, by reading millions of news articles and books and looking at millions of images?

Thus AI right now is in a kind of "worst of both worlds" situation, where it is complicated enough to be hard to reason about precisely, but still mostly unsophisticated and therefore highly sensitive to how inputs are crafted. Therefore it's hard to formulate inputs that provide useful outputs. It's still alpha-level technology at best, and there might be one or several conceptual innovations remaining between what we have today and something resembling general intelligence.

Consider also that "AI assistance" is complementary to keyword search, not a replacement for it. Google search AI is becoming something like a "digital librarian", a creature that can understand your queries and guide you to a starting place in the relevant literature. But much like in a real library, the digital librarian is going to be most useful as a starting point. At some point, if you already know what you're looking for, you still are going to want to search on "structured" criteria, as well as, yes, keywords embedded in text.

And finally, do you really want to type a 300-word description in order to get good search results? I was already getting good results with 3 keywords. I have already done the sophisticated pattern-matching and concept-graphing in my own brain, and now I know exactly what terms I want to look for. Why should I be forced to coach an AI on how to redo all that work for itself, instead of just letting me do a damn keyword search? Not to mention wasting my time and giving me carpal tunnel typing it all out.


You touched on the fact that the AI right now is primitive and unable to parse regular english well - I agree that it still has a ways to go in this regard but, while all of us complaining here might prefer the old google, we are the "in" crowd that actually put in the time to learn the old arbitrary rules. It isn't great to keep around arbitrary rules purely for the sake of consistency if those rules are bad. I think the fact that the search results are often unable to clearly distinguish different questions (and may return some autumn related results for "fall of rome") is a clearly bad thing - but the old format we're used to required a lot of learning and adaptation (the aforementioned "Google-fu") that shouldn't be a necessary skill for future generations.


I had a good laugh yesterday. I was setting up a password vault for my mother and had to get the login pages for various services and a few did like Etsy does - search for "Etsy Login" and you'll notice their SEO has managed to put /search?q=login above /signin in google's results. It amuses me whenever SEO is taken to such an extreme that it actually makes the results from your company less useful for people actually looking for your company.


It's different for me, but the second result is quite funny: https://i.ibb.co/fqSF7rd/Screenshot-20220929-093644-Chrome.p...


It's the opposite now - you increasingly have to use "Google-fu" to make sure that your query is not creatively reinterpreted in ways that are virtually guaranteed to yield irrelevant results (but more of them) - e.g. substituting words with "synonyms" (which aren't), or removing the most important keyword from the query altogether.


These days you need -youtube to get rid of the video results.


I made my career using google search and it has changed pretty significantly from the early 2010s. There was a point where it would seemingly scour the entire web for whatever string of text I entered, but now it tries so hard to give me what it thinks I'm looking for that I very often get results that are completely irrelevant. I'm still usually able to find what I'm looking for eventually, but it takes a lot more work on my part.


It's why I used DDG first, because it does this less in my opinion.


DDG is just Bing though.


With a proxy!


Wait how did you make your career using Google search?


Google search is how I was able to learn how to do the jobs I've been doing for the past few years. Fixing one problem at a time by googling it.


Oh, okay. Thought this was some kind of SEO wizardry or YouTuber or something. Thanks!


> I read comments and articles all the time about the quality of Google's search dropping. I haven't noticed it much in practice, but I'm persistent, use copious ad blocking, and my Google-fu is strong, so I usually find what I'm looking for.

The results quality has gone down and it's noticeable only after you switch to another independent search engine, like Brave Search.

For example, search for the term: "javascript undefined vs null" on Google and Brave Search. Brave Search gives way more information in the sidebar and Google doesn't at all.

The discussions feature on Brave Search is great, you don't even need to append queries like 'stackoverflow' or 'reddit' for searching discussions.

On top of that, let's say you're trying to search for an npm library like 'react-select', if you search that term on Brave Search, it gives you a button to copy `npm install react-select` right below the npmjs.com link.

It's crazy how good Brave Search is compared to Google sometimes, haven't used Google Search in a long time because of it.


I thought we were against Google et al. implementing features that keep people on their site instead of directing traffic to other parts of the web?


There's also something to be said about how Google should be returning information, not answers. Skip to the 1:00 mark in this Technology Connections video [1], where he demonstrates how, if you ask Google when the touch lamp was invented, it'll pop up with a giant, confident answer of 1984, despite the fact that if you do the research yourself you can find the first patent for it was filed in 1954.

[1] https://youtu.be/TbHBHhZOglw


I'm against Google doing it because Google is huge. I support Brave doing it because it's a useful feature and Brave is inconsequentially small.

Big things are not the same as small things, and should not be treated the same.


Should Brave's useful features be removed at a certain threshold if they see significant user growth?


As far as the thought experiment goes for making up rules, you could probably have a threshold where advertising ability gets cut off and it would do enough.


That's how small things become big, and then the cycle repeats.

So I guess you're advocating that this dynamic equilibrium and "circle of life" is just panglossian optimal?


The optimal is that nobody ever gets more than 25 (or whatever) percent of the market.


It's certainly controversial so all I can give you is a subjective viewpoint but yes, objectively these things make search engines more convenient but cut the traffic to the original websites by a small margin.

So far, I've noticed Brave Search only shows sidebar results for Stackoverflow and a few other popular forums that do not advertise on their pages directly, nothing else. So they're not really taking any revenue away. As for npmjs thing, I'm not sure if their revenue is hurt in any way because to view the package documentation you still need to open the link. Brave Search just provides you a copy command button extra for convenience.

As for discussions, they do not give full context so you always need to click the link so that's great for website owners as they get more exposure and traffic too.

At this point, Brave Search features are more on the UX side of things than anti-competitive so I personally will hold off the tinfoil hat for now.


There is no we on stuff like that, just vocal minorities and an apathetic majority.


Who's "we"? HN is not a person - not that an individual necessarily has a single, self-consistent set of beliefs themselves...


I was thinking the same thing. We can't complain about Google doing it but then give other search engines a pass.


I certainly am. Brave search putting answers in the sidebar is not a point in its favor, IMO.


Brave Search is pretty good for basic queries but it doesn't support most operators, even basic ones like quotes, making it less suitable for people who use their google-fu a lot. When I use Brave as my default, I have to fall back on DDG or Google much more frequently than I was falling back on Google with DDG as my default, and it's usually because I need to use an operator.

More on topic, they still use Bing's near-useless image search, so even if it's getting worse Google Images is seemingly the only decent option in that category.


I just noticed more and more copy and pasted content from stackoverflow intruding on my searches. Word for word the same content. There are so many people out there just creating blogs and stealing content and SEOing their way into traffic, it's not even funny anymore.


On some of them I’ve noticed the text will overall have the same idea but a lot of the words will be different. I think some of these sites will translate it to Chinese and then translate it back to English to give it a lower similarity score. It is truly amazing the amount of effort some people will expend to avoid adding anything useful to the world.
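
To make the "lower similarity score" idea concrete, here is a minimal Python sketch of one common way to score near-duplicate text: Jaccard similarity over word shingles. This is an assumption about what such a score could look like, not a claim about what any search engine or plagiarism checker actually runs, and the sample sentences are made up.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word windows ("shingles") in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two texts too short to shingle are treated as identical
    return len(sa & sb) / len(sa | sb)


original = "use null when a value is intentionally empty"
reworded = "employ null whenever some value is deliberately blank"

print(jaccard(original, original))  # 1.0
print(jaccard(original, reworded))  # 0.0
```

A word-for-word copy scores 1.0, while a reworded version with the same meaning can score 0.0, which is why swapping words (by round-trip translation or otherwise) would defeat a purely lexical check like this one.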


Not only to avoid adding anything useful, but to actively make it worse.


And they probably don’t even make any meaningful amount of money off it. But with enough scammers trying this sort of thing, at least for a while, the web gets polluted even if any given individual has already moved on to their next scam.


And because something might be answered either on stackoverflow or one of the many stackexchange spinoffs, restricting your search to either will remove results from the other.


If you're interested in a comparison, try Yandex image search. Someone here mentioned it a while ago and I now use it as my go to.


Yandex is good. I use presearch, which searches independently, and provides links to other search engines on the side.

https://presearch.com/


I hadn't realized that Bing image search still lets you link directly to the Image URL, not just the page it's hosted on. Something Google stopped doing a few years ago in response to content platform complaints.

OK, I'm definitely switching to Bing Image Search for images.

Per OP... they do seem to have a public domain/CC search limit feature too. I don't know how well it works. (I'm not sure how either it or Google identify CC/public domain content. Is there an opengraph tag?)

https://www.bing.com/images/search?q=dogs&qft=+filterui:lice...


I used Tineye.com's reverse image search extensively before google added the feature and baked it into chrome. Google's results have consistently worsened to the point that I've switched back to tineye and installed the chrome extension.


Sorry if this is off-topic but does anyone know if "tineye" is a reference to Brandon Sanderson novels? Specifically the Mistborn series?

I couldn't find any reference on tineye.com but it seems like it has to be.



Try Yandex. I've found it to work better when trying to find a different, higher res version of a certain image, and Google to work better when trying to find a related image.


I've definitely noticed worse results from Google search.

I just get page after page after page of "content" that appears to be either GPT written or written by somebody who has no idea about the topic.

They all seem to follow a pattern: they have a table of contents, and they take sentences from real sites, regurgitate them, and put them together into semi-random paragraphs.

If you know nothing about the topic it appears on the surface to be legitimate. And I bet to any quality engineers it all seems totally legitimate, because they're not experts in these fields.



Yandex image search is fairly decent.


It's actually crazy good and does borderline questionable things, like facial recognition. You can upload a social media picture of someone and Yandex will find that person elsewhere on the Internet.


I just use startpage.com as it doesn't use google's "personalized search" algorithm(s) and well because it's more private.


Just a couple days ago I was searching for a funny video I had seen. I tried searching on YouTube and Google - typing in descriptions, what I remembered of the title, the excerpts of dialog I could recall, what was going on in the video - couldn't find anything even close to it. Searched on TikTok and got it in the first result.

Google search quality is in severe decline in my opinion. I have many experiences where I am searching for stuff that I know exists and that I know Google of yesteryear would have found, and Google comes back with garbage results and spam. Personally, I am hopeful that this means a Google-killer will be coming along soon.


https://youtu.be/bWbytHBp0zI

Google search is severely broken.


I was feeling this as well, but didn't pay attention _how_ bad it is. I mean it really is... Is there ANY search engine left that returns more than 500 results for any search? Are there any community driven search engines? I mean at this point it can't be hard to build a better alternative...


Kagi is great so far!


Tried it, but returning lots of results doesn't seem to be their focus.

Now I've installed OpenSearchServer to see how far a local index gets me. The simple query "a" gets me at most 394 results on Google; after letting OpenSearchServer crawl/index for just 2h I already get over 900 results for "a". Well... I guess it really is not hard to beat that meager 394 results.


> my Google-fu is strong

Initially read this as, well, "FU Google", and thought "yep, FU to them too". I guess I will acknowledge I have a bias. Then curious about this term, I googled-on-bing "FU Google". Top result was Google-fu, not the expletive.

I have no Google-fu.


Google fu? You are actually trying to hack Google in an unauthorised way.

'More than 1 result is a bug, citizen.'

https://www.youtube.com/watch?v=XeIIpLqsOe4


Google Image Search has been deteriorating in certain areas for years. Namely:

* Reverse Image Search (sometimes no matches although the image is for sure out there). I wonder if Reverse Image Search is sometimes broken because of copyright?

* NSFW images (I'm really old enough and I don't need to be protected by Google or anyone else, I think it's a kind of censorship)

* And now also Creative Commons as pointed out by Op

The best alternative (tested them all) is in my opinion

https://yandex.com/images/

Chrome extension for reverse image search supporting Yandex

https://chrome.google.com/webstore/detail/fast-image-researc...

Android app supporting Yandex for Reverse Image Search

https://play.google.com/store/apps/details?id=com.thinkfree....

P.S.: I know Yandex is based in Russia. I only use it for specific image searches, and I'm happy we have a good alternative based outside of the USA... I wish there were more in the world.


It's amazing how they have destroyed reverse image search over the last 3-4 years. It's been that way since the switch to ML and keyword identification; they don't seem to have image hash search anymore.
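
For anyone unfamiliar with what "image hash search" could mean, here is a minimal Python sketch of an average hash, one simple kind of perceptual hash. It is only an illustration of the general technique; nothing here is known to be what Google used. The input is assumed to be a grayscale image already downscaled to a tiny grid (2x2 here for readability, 8x8 is more typical).

```python
def average_hash(pixels) -> int:
    """Average hash: one bit per pixel, set when the pixel is above the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


image = [[10, 200], [220, 30]]
brighter = [[30, 220], [240, 50]]   # same picture, uniformly brightened
inverted = [[245, 55], [35, 225]]   # a different picture

print(hamming(average_hash(image), average_hash(brighter)))  # 0
print(hamming(average_hash(image), average_hash(inverted)))  # 4
```

Copies of the same image at different sizes or brightness land on the same or nearby hashes, so lookup becomes a search for small Hamming distances rather than a guess at what the picture depicts.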


Yandex is the best image search. It even has something better than duplicate search - I believe they do embedding based similarity. It works like lexica.art, a recent diffusion image search engine.


For image search Bing is actually decent, and doesn't serve you webp.


Bing is second best after Yandex and better than Google sometimes. But Yandex still outperforms in my subjective view especially on Reverse Image search.


Hrmm, I tried reverse-searching for the last photo I took - of the USNS John Glenn at the port of Oakland - and both Yandex and Google return similar results, but the Google result returns instantly and Yandex took nearly a minute. What's an example of a search where Yandex does much better?


https://addons.mozilla.org/de/firefox/addon/reveye-ris/

This one is for Firefox with Google, TinEye, Bing and Yandex.


The key point here is "... and hardly anyone noticed."

In theory, site reliability engineers and software engineers work together at Google to maintain existing functionality and check for regressions. In practice, regression checks can only cover what they've been told to cover, and if breaking a feature doesn't cause a metric to crash, Site Reliability doesn't have the information to know something is wrong.

The result is that generally speaking, Google prioritizes existing features by how popular they are (i.e. "Maintenance by Popularity") modulo how much of a stink someone influential can make if they screw it up (i.e. "Maintenance by Twitter."). If almost nobody uses the CC filter (and I bet, in the grand scheme of things, they don't), Google may not have enough signal to know they broke the filter unless someone had the foresight to add a metric to check search results on that query separate from the rest of the search result data (and at Google's scale, you can't just add a metric for free; in addition to eng-hours to build and tune it, the data has to be stored somewhere and teams have finite budgets for space that can only be grown by negotiation with the relevant teams managing the monitoring services or the project as a whole).


...or they screw it up, make it worse and worse, and less and less people use it, making it less popular, less chance someone fixes it, less people use it, less chance, less people, less, less, google graveyard.

Both google image search and "normal" google search are becoming more and more a pain to use, where you have to use quotemarks on pretty much everything plus a few excludes to find anything at all.


Yep. This is definitely the flip-side of Google's process.

It makes some sense; take a step back, and the story is "If Google software engineers don't care enough to make it good, and Google users don't care enough to keep using it in spite of its flaws, why is Google throwing money at it at all?"

Google is a weird company because so many priorities are set by software engineers, not managers; some projects die because they literally run out of passionate engineers to work on them and management isn't incentivized to force engineers to work on projects they hate, instead asking the question "If nobody wants to work on this, is it worth it to keep doing it?"


I mean... on the other hand, managers like to bring out some shiny new projects... no one gets praise for "we continued supporting this project..." at quarterly and yearly meetings.

At least I think so... what else can explain so many chat platforms that came out of google and died soon after?


At some point you reach a complexity level where even the engineers don't know how it's supposed to work.

I mean, someone types something into google and the result is not what they expected: how do you troubleshoot that, or know that it's a regression?


I am using `kagi` as my search engine. While reading the article I searched `sur:fmc` [0]. The first result was a site that gives information about search URLs [1] and correctly describes what `sur:fmc` does. I couldn't find this page in the first 5 pages on google [2] (and didn't look further). Not to mention google trying to correct my query text.

[0] https://kagi.com/search?q=sur%3Afmc

[1] https://sites.google.com/a/arps.org/esresearch/images

[2] https://www.google.com/search?q=sur%3Afmc


Kagi search? After I exceeded my daily search limit in the trial, it said something like: "Go search somewhere else". I mean, who are these snotty kids, that have the audacity to talk down to people? Not gonna spend my money there for sure.

Kagi is good, but get a proper PR person and don't let imbeciles ruin the experience.


This is because it is a paid search engine with a limited free trial. The trial is limited because every search query costs them money and they don't have VC money to throw at growth. Nor do they plan to follow that growth path. They focus on quality for their target market. And to be honest, they have succeeded at that so far.


I think the point is that there are much better ways to communicate this.


This is because they are mainly a paid search engine with a limited free trial.


Guessing "ur" in sur means Usage Rights, and M means Modification, C means Commercial, F means Free (to reuse)?
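Written down as a lookup, that guess would look roughly like the sketch below. The letter meanings are only the guess from this comment, not confirmed Google behavior, and `decode_usage_rights` is a made-up helper name.

```python
from typing import List

# Letter meanings as guessed in the comment above (unconfirmed assumptions).
GUESSED_FLAGS = {
    "f": "free to reuse",
    "m": "modification allowed",
    "c": "commercial use allowed",
}


def decode_usage_rights(token: str) -> List[str]:
    """Split a token like 'sur:fmc' into the guessed meaning of each letter."""
    prefix, _, flags = token.partition(":")
    if prefix != "sur":
        raise ValueError("expected a token starting with 'sur:'")
    # Unknown letters are reported rather than silently dropped.
    return [GUESSED_FLAGS.get(ch, "unknown flag " + repr(ch)) for ch in flags]


print(decode_usage_rights("sur:fmc"))
```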


$10/mo is a really hard sell for something I can get for free, but if it's actually useful I would be very tempted


IMO, it's worth it. People spend that much on Starbucks every day. The key feature I like about Kagi is that being able to downrank or outright block certain domains is built right in. I also like that it doesn't throw a bunch of crap on to the page above the fold and it doesn't pretend that it found results when it finds no matches.


Kagi is great! I have to use Google for only about 5% of searches with Kagi, vs about 40% with DDG. I was genuinely surprised


I've noticed recently almost all broad searches on image search return watermarked stock photos.

It's terrible. The entire stock photo industry is so bad at creativity you can basically instantly tell if a photo is a stock photo, making anyone using them look like a complete fool.

Anyway, they either have some serious legal issues with image search or they are becoming precautionary, but it's becoming almost impossible to find decent images.


>> "The entire stock photo industry is so bad at creativity you can basically instantly tell if a photo is a stock photo, making anyone using them look like a complete fool."

Only the bad, outdated stock photos. There is still a whole market for the "obviously corporate corporate website" corporate website, but that's falling out of fashion.

What you're thinking of are the bland, white backgrounded photos and photos shot in stale, generic office settings so they could be worked into any design. That's not really how it's done anymore. Modern stuff doesn't look posed and staged, and often isn't.


In stock photography, if there are people in it, they can only make money on that image if there are signed model releases. If you have signed model releases, it is staged and posed.

Sure, someone could take a candid image and then after the fact attempt to gain releases. However, that's not a workflow with a high success rate. At that point, the "model" has all of the power. Also, crowd shots in public streets blah blah.


Wait, why? At least in the US I'm generally free to photograph and video anyone I'd like, own the rights to said media, and do whatever I'd like with it outside of a few explicitly carved out scenarios like fraud or using images of somebody whose career is also their image. Are stock photos one of those carveouts?


After they were sued a few years ago, Google seems to have neglected this product. Bing is way better.


It's a common pattern... No good engineer wants to work on a product with their feet stuck in legal quicksand. So all the good engineers leave, and the product stagnates with no direction.

Even if the lawsuit is won, the product is still doomed.


> you can basically instantly tell if a photo is a stock photo, making anyone using them look like a complete fool

I find this to be a highly interesting take...

The whole point of stock photography is that you can buy images to use in various projects, quite often one-off or short-term ones, because for such a project, taking your own photos or paying a photographer to take them would be too expensive or time-consuming relative to the estimated value and return of the project.

I feel like once someone has made the decision to buy or use a stock photo, they have already decided where they want to stand on the scale of authenticity and originality. Deliberately seeking out "stock photos that don't look like stock photos" just sounds too much like trying to be something you're not.



With any luck DALL-E will undo this.

What if one day every Google search has an "auto-generated" panel of images as well?


IMO, DALL-E suffers from the same problem because it's been trained on the very same boring stock photos.


Unfortunately you can't really be sure that an image with metadata claiming it is CC licensed is actually CC licensed or that the website offering it has the permission of the author. I have been burned by this.


You can say the same about anything. I am guessing you don't use anything that is open source in any capacity and don't buy any proprietary software and libraries in case they are lying and they can't actually distribute them?


Sure. But this actually happened. I've been twice bitten by using images that claimed to be CC and then an apparent copyright owner appeared and said otherwise. I've never had that happen with open source software.

I think copyright trolling is more prevalent with images, and I think it's generally easier to determine the canonical origin of software. But yes, it's absolutely a risk and a reason why many companies have a legal review process before any new libraries can be used.


True. Although an additional wrinkle with Creative Commons is that, depending upon how conservative you want to be and how the copyright owner interprets terms like non-commercial and what constitutes appropriate attribution, there are all sorts of variations that may or may not be suitable for a given use.

Of course, for many casual purposes it's widely ignored and for photos of people used for advertising and marketing, you need a model release anyway.


To add to what others are saying: I CC-BY all my photos, but just because a photo is CC-BY doesn't mean it's safe to use. I don't know all the other "rights", but for example a photo I took of Mickey Mouse, or a movie poster, or a photo I took of some art in a museum may have additional rights issues. Even pictures of buildings can.

https://helpx.adobe.com/stock/contributor/help/known-image-r...

Note: I get that adobe might be wrong here. Whether they are right or wrong on particulars is beside the point. I just linked there because it was clearer than most other search results I found.

Here's another

https://mymodernmet.com/eiffel-tower-copyright-law/


100% this. I run a decent sized publisher, and have to make sure images are licensed properly, and even with proper training, we still get robo-lawyer shakedowns at least once a quarter. It's between $400-$1000 per "settlement", so still less than Getty licensing costs. Cost of doing business :shrug:


I can't prove it, but my theory for one of the images I got bitten for is that either the photographer or a coconspirator posted the photo to WikiCommons as CC licensed, then later the photographer sends a takedown saying it was posted by an imposter and isn't authorized. It's deleted from Wiki but then they get to hunt down everyone who copied that image and send them a bill of $1000. Quite a scam.


There was an article that trended a few months back too about CC "Attribution trolls". (I can't find it in a quick search, sorry, but I can paraphrase.) There's a legal "bug" in the Attribution clauses of early CC licenses that basically says that the copyright owner gets to dictate how the Attribution must read down to detailed specifics in wording and formatting. They post to WikiCommons as CC licensed under specific old versions of CC and rely on the fact that most people don't copy and paste the attribution strings verbatim to troll for licensing fees.

(So, watch out for CC licenses older than 4.0 for that.)


I go to Yandex for image search first, these days, especially if I want to do an image-based search. Whatever you may think of it, Yandex uses both facial ID and object recognition, and will return side-results that are both visually and semantically related to the uploaded image.

Google Images, on the other hand, seems to look for the nearest monetizable domain cued by the uploaded image, and gives you adjacent results about that. It certainly does no facial recognition, etc. (well, none that it will feed back to you in results, anyway).


Google image search has been broken for at least two years now - looks like Google does not give a damn. I remember when it could find 5 to 20 times more pictures than it does now, and you could actually trace images to their sources - this has become impossible. Why? I've no idea.

Then they disabled the Google web cache, which was hugely useful since it let you open dead websites/webpages or browse something when your government restricts access to it. I guess the copyright lobby and China forced the removal of this feature.

Google search is still unmatched in terms of being able to find text but other features have seen a huge cut. :-(

And don't remind me about iGoogle. I loved it. https://web.archive.org/web/20160314122329/http://linuxfonts...


I'm with Google Search. I tweeted to the author of the post, but will also share -- this looks to be a bug that we're tracking down. Definitely not what we'd expect or want to happen for queries of this nature.


Sorry, but it's more than a bug. It's blatant manipulation. You even acknowledge as much; you just claim it is to fight extremists, but the truth is you neutered the platform in 2016.


I made canweimage.com. It can't replace all the features of Google, but it can fit the bill if you just need a basic search of Creative Commons.


Thanks! But just to be clear, it isn’t somehow searching all things under a CC license, but just things within wikimedia right?


Last time I checked, Creative Commons had their own search for things under their licenses. But now apparently that project got transferred to WordPress and is now named Openverse: https://wordpress.org/openverse/?referrer=creativecommons.or...

Anyways, I'd argue that's the most comprehensive database of CC-licensed works.


Thanks for this.

Also, I just checked that flickr.com still allows you to filter their searches by license.


This looks great. Care to share the API call used for searching?



Amazing!

I notice a slight difference between

https://commons.wikimedia.org/w/api.php?action=query&generat...

and searching for 'cat' on canweimage.com

Is there some query processing needed?


I use these additional parameters: `&gsrlimit=25&prop=imageinfo|pageimages&iiprop=url|size`. I think it just changes how much and what type of data is returned, but maybe that could be the difference?
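For anyone wanting to try this, here is a minimal sketch that assembles such a request URL. Only `gsrlimit`, `prop` and `iiprop` come from the comment above; `generator=search`, `gsrsearch`, `gsrnamespace=6` (the File namespace) and `format=json` are my assumptions about the rest of the call, since the URL quoted earlier is truncated. It only builds the URL and makes no network request.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def build_commons_search_url(term: str, limit: int = 25) -> str:
    """Build a Wikimedia Commons API search URL for the given term."""
    params = {
        "action": "query",
        "format": "json",        # assumption: JSON output
        "generator": "search",   # assumption: the truncated URL uses the search generator
        "gsrsearch": term,       # assumption: search term parameter
        "gsrnamespace": 6,       # assumption: restrict to the File namespace
        "gsrlimit": limit,       # from the comment above
        "prop": "imageinfo|pageimages",  # from the comment above
        "iiprop": "url|size",            # from the comment above
    }
    # urlencode percent-encodes the "|" separators as %7C.
    return API_ENDPOINT + "?" + urlencode(params)


print(build_commons_search_url("cat"))
```

If the results still differ from the site, comparing the two URLs parameter by parameter would be the first thing to check.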


I stopped using Image Search like a year ago, simply because it is useless.

You cannot find anything.

Personal anecdote - I was at a specialist doctor to discuss the results of my tests. I was shocked when he typed a health issue into Google Image search, clicked on (what seemed to me) a random table matching his search terms and compared it with my results. He then said that everything is fine, the parameters are normal.

That got me scared: what if someone intentionally put up a doctored table and used SEO to promote it to the top, misleading doctors and causing bad outcomes for patients?


Thanks for the HN treatment. As a follow-up: Google's response is that this is "a bug".

In the follow-up story, my return rate for CC licensed images of "dog" went from 3 to 13. More than that, license info is not displayed (only linked), is frequently wrong, and photo credits are often given to the site, not the creator of the image.

https://cogdogblog.com/2022/10/google-cc-image-search-better...

Try Openverse for much more accurate and plentiful results https://wordpress.org/openverse/


One thing with any query parameter API like this is that there's no guaranteed signal when the API has breaking changes.

I'm going to assume that there are hundreds or thousands of products, tools, hobby projects, etc., that direct to Google searches, none of which have any mechanism to detect API changes and break gracefully. Furthermore, Google is under no obligation to coordinate with anyone who just arbitrarily sends queries their way. (I've had a few hobby projects use Google Queries.)

Seems like the most we could really ask is to put some kind of version stamp into the query parameter; and Google could optionally support old parameters or simply return an error. Otherwise, we have to accept that sending browsers to other pages via query parameters is inherently fragile and has a high probability of breaking at any time.
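A rough sketch of what that version stamp could look like on the receiving side. Everything here is hypothetical: the `v` parameter name, the set of supported versions and the `handle_query` function are made up for illustration and are not anything Google offers.

```python
from typing import Tuple
from urllib.parse import parse_qs

SUPPORTED_VERSIONS = {"1", "2"}  # hypothetical: versions the server still understands
CURRENT_VERSION = "2"            # hypothetical: used when no stamp is sent


def handle_query(query_string: str) -> Tuple[int, str]:
    """Return (status code, message) for a query string with an optional 'v' stamp."""
    params = parse_qs(query_string)
    version = params.get("v", [CURRENT_VERSION])[0]
    if version not in SUPPORTED_VERSIONS:
        # An explicit error beats silently returning different results.
        return 400, "unsupported query version: " + version
    terms = params.get("q", [""])[0]
    return 200, "results for %r (query format v%s)" % (terms, version)


print(handle_query("q=dog&v=1"))
print(handle_query("q=dog&v=0"))
```

The point of the design is that a caller built against an old parameter format gets an explicit error it can detect, instead of quietly different results.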


I have noticed multiple problems in Google search results. But I don't give them any feedback because I don't feel helping Google become better, and therefore more dominant, is the right thing to do long-term.


Does anyone know how Google Image Search (or Bing Image Search[1]) identifies images as creative commons or public domain in the first place?

If I'm a content provider that wants to maximize the chances that an image search will flag my images as CC or public domain, what should I do? Are there open graph or other meta tags? Or what?

[1]: https://www.bing.com/images/search?q=dogs&qft=+filterui:lice...
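One option a content provider can try is schema.org structured data: `ImageObject` has a `license` property that can point at the license URL. Whether and how a given search engine uses it for its CC filter is not established in this thread, so treat the following as a sketch under that assumption. The helper name and the example.com URLs are placeholders.

```python
import json


def image_license_jsonld(content_url: str, license_url: str, creator_name: str) -> str:
    """Render a JSON-LD <script> tag describing one image and its license."""
    data = {
        "@context": "https://schema.org/",
        "@type": "ImageObject",
        "contentUrl": content_url,
        "license": license_url,  # schema.org property pointing at the license text
        "creator": {"@type": "Person", "name": creator_name},
    }
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


snippet = image_license_jsonld(
    "https://example.com/photos/dog.jpg",            # placeholder image URL
    "https://creativecommons.org/licenses/by/4.0/",  # CC BY 4.0 license URL
    "Jane Photographer",                             # placeholder creator
)
print(snippet)
```

The resulting tag would go in the HTML of the page that hosts the image.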


Probably they have a license classifier trained on thousands of pages that have been manually checked.


I fear this may be a feature no one at Google is actively caring about anymore.

Here's Matt Cutts' tweet about it in 2014: https://twitter.com/mattcutts/status/422944316458168320

But I think it dates back to 2009? https://googleblog.blogspot.com/2009/07/find-creative-common...


I noticed. I use this tool all the time. Now Google just made it useless.


I encourage anyone looking for freely-reusable images to try the new MediaSearch feature on Wikimedia Commons. Also works for other types of media files. All images on Commons are free to reuse but there is also a license filter if you are looking for more specific permissions in terms of attribution, etc.

https://commons.wikimedia.org/wiki/Special:MediaSearch


I don't think there are only three dog pictures because Google broke CC. I think this is more a case of Google/ML/etc. "over-helping".

If I click on the options in the bar of other dog picture types ("german shepherd, puppy, baby, rottweiler, police", etc., right below the options to select size, color, type, time, licenses), additional CC images of the selected sub-type show up. The selections act as additive (AND) filters.

It is more that Google is deciding what I really want.


I'm a little confused by part of this article. The author states part of the evidence they used for deciding something was wrong is that a search not restricted to Creative Commons licensed images included many images with open source licenses and some in the public domain. If you restrict your search to Creative Commons licensed images why would you expect images under other open source licenses or in the public domain to be returned?


It's not just Creative Commons images. In case you haven't noticed, those billions of search results have maxed out somewhere between page 20 and 40 for a while now.



Google's ad delivery platform is becoming a less and less reliable search engine as its sideline.

Support alternate search engines, we used to have a bunch of viable options. Let's get back to that.


Yandex is better for image search, especially reverse image search.


> Who Locked the CC Dogs Out?

Says the person who puts their stuff into Flickr, which requires an account.


People still use Google Image search?


I've been somewhat thoroughly testing Google text/image search by coincidence. I'll share my experience about some oddities here.

I do somewhat unusual things, like search for parts of a joke or a string of semi-randomly generated words for part of an AI type thing I'm working on creating to see if these are either unique or original and how they may appear in the context of the internet. Often times, if anything, there's something like a banned twitter (bot?) account only available through a cached backup that said it once in some bizarre context.

I've noticed that search results can change significantly depending on whether you're logged in, or on the country you're searching from (via a VPN). Different countries have different levels of success for different types of searches, but I don't have any sort of solid guide to map this out in any shareable way.

Boot up a virtual machine on a VPN if you're curious to take a look yourself. You may need to manipulate it so you don't bring in any suspicious cookies or other identifying information to show your actual country. Some VPN IPs are well known, and your results may be manipulated anyway.

Some results literally will never show up if you're searching from the USA. If you switch to Hungary for example, suddenly things could start appearing. Even if the matching result is a Chinese site that should have relatively equal relevance to both countries.

Sometimes I use Bing. It's not exactly better, but it's also not really worse. It's just different. In my non-scientific opinion after seeing so many of these differences, it's because it feels like they just forgot to enable (or haven't yet gotten to enabling) the kind of filtering Google has.

If something is no longer showing up in Google search, using Bing feels like going back 1 relative year in time before Google nerfed your active search. Sometimes I suspect Google is breaking down searches to parsable keywords and then sometimes adding those to a blacklist.

DuckDuckGo seems to sometimes filter things too, and some of the other commonly recommended alternative search engines. I don't know if they're actively doing this, or if it's a byproduct of forking off some other engine. I don't have much information here because I've largely given up bothering with these.

There are some other large engines not widely discussed in the US that I am currently looking at as well. I don't have enough experience to form a solid opinion yet, and I have suspicions about their privacy, so I don't want to be loosely associated with recommending them until I know more.

Miscellaneous other thoughts around this topic:

- Google has been heavily pushing some results more than others obviously. Pinterest and Quora are always at the top of searches now. I think this is pretty common knowledge.

- Chrome has a right click -> Search with Google Lens button now. Are they working on AI object detection of images more so than a visual match now? Could this factor into image matches?

- TinEye - When looking into this question myself, a lot of people recommend TinEye. I've literally never had TinEye actually match a picture by the way. Am I using it wrong?


Google Lens is just terrible. Really terrible. I don't know what the hell they are doing there :D.

Tineye is also useless. They either have a very outdated library or are not crawling the places they should be crawling.


did not know this was a feature and i use the advanced filters a lot


Is there a better creative commons search engine that will only return free media?


Wikimedia Commons recently introduced a new MediaSearch feature (everything there is free by definition): https://commons.wikimedia.org/wiki/Special:MediaSearch


The article and comments promote this alternative: https://wordpress.org/openverse/


Creative Commons always felt broken to me. Is anyone able to explain why the following is flawed or not possible:

A content thief steals content anonymously, posts it as Creative Commons content anonymously using free hosting, archives it on the Wayback Machine, and then uses the content, claiming it is Creative Commons if anyone asks?

Basically it is content laundering - and something that is impossible for Creative Commons to address as is.

(If reasoning is not flawed, also interested in possible solution to the issue.)


Your scheme would apply for any sort of licensing, permissive or not, and is certainly not constrained to using creative commons.

It's also flawed in the same, albeit weak, manner: the source it was stolen from predates the stolen copies, and so can show its true provenance. Of course, if it wasn't published or otherwise registered, no one would know.


This is no different from stealing something physical in the real world, and then claiming it's yours. Sure, some people will believe you and be impressed, but as soon as you attract the attention of the original owner, they will use the law to prove it is not yours but theirs. This is applicable to anything that can be 'owned'. Licensing only dictates how something can be used lawfully; the law protects the license from being abused.


> Basically it is content laundering - and something that impossible for Creative Commons to address as is.

CC is just a usage license that authors can apply to their works in order to share them freely with the world. It's strange to me that you might think it is the job of CC to police how people (mis)use it?



