
Show HN: I made a site to catalogue 10,000 CC0-licensed stock photos - davidbarker
http://finda.photo/#v2
======
liamca
[Full disclosure, I work on a service called Azure Search]

Very nice site! Since your site is built so much around search, I thought I
would pass on a few suggestions based on what I saw. If you happen to be using
a search engine for your content such as Elasticsearch, Solr or maybe
Azure Search :-), there are a few simple things you could add to make the
experience a little smoother. Suggestions in the search box are a nice way to
let people see results as they type. You could even add thumbnails of
the images in the typeahead, as you can with the Twitter Typeahead
library
([http://twitter.github.io/typeahead.js/](http://twitter.github.io/typeahead.js/)).
I also noticed that your search does not handle spelling mistakes or phonetic
search (matching words that sound similar). Finally, through
stemming, search engines can often help you find additional relevant content.
For example, if a person searches for "mice" but your content has the word
"mouse" in it, stemming will still bring back a match. Since you don't have a
lot of content, this can really help people find relevant results.

Hope that helps.
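
A rough sketch of the spelling-tolerance idea, using only Python's standard-library `difflib` rather than any particular search engine's fuzzy matching. The tag list and the `suggest` helper are hypothetical names for illustration, and phonetic matching is not covered here:

```python
import difflib

# Hypothetical tag vocabulary; a real site would load this from its index.
TAGS = ["beach", "bridge", "ferret", "mountain", "mouse", "wolf"]


def suggest(query, tags=TAGS, cutoff=0.75):
    """Return up to three known tags that closely resemble the query."""
    return difflib.get_close_matches(query.lower(), tags, n=3, cutoff=cutoff)


print(suggest("moutain"))
print(suggest("brigde"))
```

The `cutoff` value is a guess; lowering it tolerates more typos at the cost of more unrelated suggestions.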

~~~
chdir
> using the Twitter Typeahead library

Unfortunately, it's no longer maintained [0] and has plenty of unfixed issues.
You could try a recent fork [1]:

[0]
[https://github.com/twitter/typeahead.js/issues/1424](https://github.com/twitter/typeahead.js/issues/1424)

[1]
[https://github.com/corejavascript/typeahead.js](https://github.com/corejavascript/typeahead.js).

------
oliwarner
I like to think I am very conscious of copyright. I might not always adhere to
it in my personal life (who can claim they do these days?!) but professionally,
everything is done strictly legitimately. With that in mind... Am I the only
person who is slightly uncomfortable with the phrasing around PD and CC0? With
other copyright licenses there is _somebody_ there saying _they_ own
something.

I'm particularly uncomfortable with Flickr's "no _known_ copyright
restrictions". What if people infer PD from that and upload it somewhere else
under CC0? Then it gets sucked into this finda.photo? Yuck.

As for finda.photo, why are you truncating the source down to just a domain
name?! Many of the sources include proper uploader details so why aren't you
copying those over and displaying them?

I know you're not required to, but attribution isn't a bad thing if you can
give it. I for one would be much happier using a photo if I knew exactly where
it came from.

~~~
ghaff
One of the challenges I have with attribution generally--and, to be clear, I
try to be very careful with attribution on any CC, etc. photos that I use--is
that the attribution is usually detached from the photo. (It may be stored in
the metadata--or not.) So, even though I make a point of cutting & pasting the
flickr links when I'm putting together a presentation, it's very easy for the
attribution text and the photo to become separated on subsequent use.

There are potential ways that you could fix this from a technology
perspective, e.g. have a process to create a new JPEG with the credit below
the original photo. But anything like this is going to be a bit clunky and
potentially ugly graphically.

~~~
snowwrestler
There are fields within the JPEG file itself for this information, called IPTC
fields. I know they can be read with photo-specific software like Photo
Mechanic or Photoshop, but they seem incredibly under-used by the Internet in
general. They're perfect for a use case like persistent attribution, but few
image software services seem to know about or expose them.

~~~
tombrossman
I agree these are under-used but they face the same issues as the other
methods like putting the data in text near the image on a web page or
presentation slide. These fields, same as all Exif metadata, can be
overwritten or removed by anyone with access to the file. The data can be
faked by someone intending to deceive, or it could disappear when posted to
big sites like Facebook or Twitter who routinely remove Exif data by default
(presumably to protect the majority who don't understand how GPS tagging works
on mobile phone photos, etc).

Once the photo metadata is gone, it is too easy for others to claim it is an
'orphan work' and avoid liability under copyright law. At the opposite end of
the spectrum, people like me who release most images as CC0 are annoyed that
that license tag was stripped from the metadata, preventing others from freely
reusing them. I use and rely on Exif tags a lot but they are fragile and you
cannot rely on them staying embedded with your images once they hit the web.

~~~
snowwrestler
The data _can_ be faked or deleted within IPTC fields, of course.

What IPTC fields have over the typical ways of handling attribution is that
they are not left behind when the image file is copied--so they should be more
resistant to _accidental_ removal of attribution metadata.

On most websites, the attribution is a line of text that is displayed next to
the image. Anyone copying the image, who wishes to preserve attribution, must
also separately copy the attribution text. Then they need a way to store that
text, and keep it associated with the image. Not easy, actually!

> Once the photo metadata is gone, it is too easy for others to claim it is an
> 'orphan work' and avoid liability under copyright law.

You cannot avoid liability this way. Under the law, it is the responsibility
of the person using an image to know that they have the right to use it. Just
claiming "I thought it was orphaned" does not work if you are being sued by
the actual image rightsholder.

> I use and rely on Exif tags a lot but they are fragile and you cannot rely
> on them staying embedded with your images once they hit the web.

Yes, this is my point! They're fragile because web services don't preserve
them--but theoretically they could.

The cynical side of me thinks that a lot of web services don't _want_ to know
all the rights data for the media they carry. Ignoring rights gets them more
traffic and engagement, and under the relevant law (the DMCA), they are
allowed to. All they have to do is remove infringing images when the rights
holder requests it.

------
BrunoJo
I always use [https://pexels.com](https://pexels.com). They also have only CC0
images.

~~~
imrehg
Strange. I was curious what kind of images are there, so I did a search for
"Taiwan", and the result is literally 8 pictures with a "Shutterstock"
watermark and that's all. Is that supposed to be CC0? Even if it is, would
watermarked images like that be useful at all?

~~~
yitchelle
Just tried the same thing. It looks like the Shutterstock watermarked photos
were an advert sponsored by Shutterstock themselves. There were no actual CC0
photos from the site itself.

------
brandonheato
Why not just use flickr? A search for images with "No known copyright
restrictions" returned 663,502 results.
[https://www.flickr.com/search/?license=7%2C9%2C10&text=&adva...](https://www.flickr.com/search/?license=7%2C9%2C10&text=&advanced=1&view_all=1)

~~~
ryanlol
I'm not sure if "No known copyright restrictions" means what you think it
does.

------
vortico
I like the domain, the design is usable, and the database is great. This has
it all.

------
m-i-l
Looks good. Feedback from a designer I showed this to: it would be useful to
search based on aspect ratio (landscape vs portrait at minimum).

~~~
davidbarker
Thanks! That's actually already possible. There's a list of all the attributes
you can search by here:
[http://finda.photo/search/tips](http://finda.photo/search/tips)

For example,
[http://finda.photo/search/?q=--aspectratio+%3C+1](http://finda.photo/search/?q=--aspectratio+%3C+1)
would give you portrait images.
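
As a small illustration of that filter, here is a sketch that assumes "aspect ratio" means width divided by height (which is what `< 1` returning portrait images suggests). The helper names are made up; only the URL and the query syntax come from the links above:

```python
from urllib.parse import quote_plus

SEARCH_URL = "http://finda.photo/search/?q="


def orientation(width, height):
    """Classify an image, assuming aspect ratio means width / height."""
    ratio = width / height
    if ratio < 1:
        return "portrait"
    if ratio > 1:
        return "landscape"
    return "square"


def search_url(query):
    """URL-encode a query string the way the example link above is encoded."""
    return SEARCH_URL + quote_plus(query)


print(orientation(600, 800))
print(search_url("--aspectratio < 1"))
```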

------
lucaspiller
Very nice! What are you using to search the photos by colour and feature?

~~~
fratlas
Probably ran a dominant colour analysis on each image (with something like
color-thief), then sorted by similarity of the ranked colours, measuring
closeness in a colour space like LAB? No idea about the feature search, though.

~~~
elorant
Is there an algorithm for what you just described? I'm currently researching
color classification and it seems like quite a complicated problem.

~~~
kamy22
Hi, if you want, you can contact me. I created a powerful algorithm to detect
dominant colors in an image using K-means clustering and the Lab color space.

Examples:
[https://twitter.com/kamy22/status/479040852028051456](https://twitter.com/kamy22/status/479040852028051456)
[https://twitter.com/kamy22/status/472517258418606080](https://twitter.com/kamy22/status/472517258418606080)

Have a good day!
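
For readers wondering what k-means in Lab space looks like in practice, here is a minimal pure-Python sketch. It is not the algorithm from the tweets above, just the general technique; the function names, the D65 conversion constants and the farthest-point seeding are assumptions made for this example, and `pixels` is assumed to be a non-empty list of 8-bit sRGB triples:

```python
import math


def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIE L*a*b* (D65 white point)."""
    def linear(channel):
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linear(c) for c in rgb)
    # Linear RGB -> XYZ, normalised by the D65 reference white.
    x = (0.4124 * r + 0.3576 * g + 0.1805 * b) / 0.95047
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = (0.0193 * r + 0.1192 * g + 0.9505 * b) / 1.08883

    def f(t):
        return t ** (1 / 3) if t > 0.008856 else 7.787 * t + 16 / 116

    fx, fy, fz = f(x), f(y), f(z)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def dominant_colours(pixels, k=3, iterations=10):
    """Return [(rgb, share)] for up to k clusters, largest cluster first."""
    points = [(srgb_to_lab(p), p) for p in pixels]
    # Deterministic farthest-point seeding (an assumption; random or
    # k-means++ seeding are common alternatives).
    centres = [points[0][0]]
    while len(centres) < k:
        far = max(points, key=lambda pt: min(math.dist(pt[0], c) for c in centres))
        centres.append(far[0])

    for _ in range(iterations):
        # Assignment step: attach each pixel to its nearest centre in Lab.
        clusters = [[] for _ in centres]
        for lab, rgb in points:
            nearest = min(range(len(centres)), key=lambda i: math.dist(lab, centres[i]))
            clusters[nearest].append((lab, rgb))
        # Update step: move each centre to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster ends up empty
                centres[i] = tuple(
                    sum(m[0][d] for m in members) / len(members) for d in range(3)
                )

    result = []
    for centre, members in zip(centres, clusters):
        if members:
            # Report the actual pixel colour closest to the cluster centre.
            representative = min(members, key=lambda m: math.dist(m[0], centre))[1]
            result.append((representative, len(members) / len(points)))
    return sorted(result, key=lambda item: item[1], reverse=True)
```

On a real photo you would probably downsample the image first, since this loop visits every pixel on every iteration.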

~~~
criddell
What about feature? If I want a dog, how do you determine which images contain
a dog?

~~~
kamy22
You have to do a lot of research to solve this type of problem. I think that
neural networks and machine learning are the best way... but it's a complex
problem.

Here you can find awesome publications
([http://rodrigob.github.io/are_we_there_yet/build/classificat...](http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html#4d4e495354)).
It's something like a bible of neural networks :P

~~~
criddell
Considering that companies like Google are so good at this, why build your own
photo site? Why not upload all the CC0 images to a public Google Photos
library?

~~~
kamy22
I like your question. OK, you can use 500px, Flickr, Google and other sites...
But... in my personal opinion, a developer should be curious. I'm a dev and,
for this reason, I like to give myself a challenge. It's a good way to learn a
lot, to discover new solutions, to meet new people, to improve my skills, to
create something new. So... you can use a Google Photos library or you can
consider creating something different (because your solution will definitely
be different from the others). It's a choice ;)

------
petecooper
Adding Alana to the list of CC0-only stock photos.

[1] [http://alana.io/](http://alana.io/)

[2] [http://alana.io/about-us/](http://alana.io/about-us/)

~~~
scope
Adding to the list: Pexels

[3] [https://www.pexels.com](https://www.pexels.com)

[4] [https://www.pexels.com/photo-license/](https://www.pexels.com/photo-license/)

They currently have over 5000 photos (~600 new images are added every month)

------
Flimm
One of the about pages says that the photos are in a GitHub repo, which
sounds really cool, until you follow the link and find the repo hasn't been
shared yet. Hopefully it's just a matter of time before it is shared.

[http://finda.photo/search/tips#contributing](http://finda.photo/search/tips#contributing)

------
j_lev
Hi - for some reason the search bar keeps changing my search terms, e.g.
Australia --> Australium

~~~
davidbarker
Sorry — that'll be the search trying to get a singular term from a plural.

I've tagged the images with singular terms to make them easier to search, so
it will change terms like "bridges" to "bridge", or "men" to "man", unless I
override each term. If anyone can suggest a better way, I'd be very grateful.

I'll add "Australia" as an override, for now. Thanks for pointing it out.
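
One possible alternative to per-term overrides, sketched under assumptions: only accept a singular form when it is already a known tag, and otherwise leave the term as typed. The vocabulary, the irregular-plural table and the suffix rules below are all hypothetical, and tags are assumed to be stored in lowercase:

```python
# Hypothetical tag vocabulary and irregular plurals, for illustration only.
KNOWN_TAGS = {"australia", "bridge", "bus", "city", "man", "mouse"}
IRREGULAR = {"men": "man", "mice": "mouse"}
# Naive suffix rules, tried in order: (plural suffix, singular replacement).
SUFFIX_RULES = [("ies", "y"), ("es", ""), ("s", ""), ("a", "um")]


def candidates(word):
    """Yield the word itself, then possible singular forms."""
    yield word
    if word in IRREGULAR:
        yield IRREGULAR[word]
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            yield word[: -len(suffix)] + replacement


def normalise(term, known_tags=KNOWN_TAGS):
    """Map a search term to a known tag, or leave it unchanged."""
    word = term.strip().lower()
    for candidate in candidates(word):
        if candidate in known_tags:
            return candidate
    return word


for term in ["bridges", "men", "Australia", "buses"]:
    print(term, "->", normalise(term))
```

Because "australium" is not a tag, the `a` to `um` rule never fires for "Australia", even if "australia" itself is missing from the vocabulary.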

~~~
mintplant
Have you looked into using stemming?

[https://en.wikipedia.org/wiki/Stemming](https://en.wikipedia.org/wiki/Stemming)

Most major languages should have a library available to handle this for you.

------
frantzmiccoli
It seems that you do have a valid SSL certificate but
[https://finda.photo/search?q=test](https://finda.photo/search?q=test) is not
working properly.

------
franciscop
Also check [http://pixabay.com/](http://pixabay.com/) for Public Domain
pictures. I've found many awesome gems there.

------
trtmrt
Firstly, it is slow... Secondly, I typed "wolf" and got: 3 foxes, 1 lion,
2 monkeys, 1 snow house and 2 wolves that do not look like wolves!?

~~~
davidbarker
Apologies for the slowness. I suspect it was struggling under the traffic. It
seems to be running pretty quickly now, though.

Most of the images are automatically tagged, and can sometimes be incorrectly
labelled. I'm aiming to work through and manually check them all. In the
meantime, I might add the ability to flag incorrect keywords.

------
chrxr
[http://finda.photo/image/14847](http://finda.photo/image/14847) - Tags are
weird. This is not a dog, mouse, canine or feline. It's not sitting. It has
'eyes' but I think that might be irrelevant. Although I would agree that
ferrets (not an included tag) are cute, I'm not sure I'd describe them as
domestic. Otherwise, great!

~~~
chrxr
I stand corrected: "The ferret (Mustela putorius furo) is the domesticated
form of the European polecat"
[https://en.wikipedia.org/wiki/Ferret](https://en.wikipedia.org/wiki/Ferret)

~~~
pbhjpbhj
Ferrets are relatively common domestic pets in the UK. Perhaps because the
sport of ferreting was historically more prevalent here?

~~~
chrxr
Yes, I had to check as after posting the initial message I remembered seeing a
child walking a ferret on a lead in Penryn. I've never seen one wandering the
streets of Oxford though...

They don't appear to make it into the top 10 pets of 2014:
[http://www.pfma.org.uk/pet-population-2014](http://www.pfma.org.uk/pet-population-2014)

Perhaps a ferret adoption drive is necessary!
[http://ferretshelters.org/](http://ferretshelters.org/)

------
hantusk
An idea: You could use this pretrained machine learning library to classify
your images/improve search even more:
[https://www.reddit.com/r/MachineLearning/comments/3yt4o5/dee...](https://www.reddit.com/r/MachineLearning/comments/3yt4o5/deep_learning_in_a_single_file_for_smart_devices/)

------
andreash
What is the diff between this and pixabay or pexels.com? Wish there was one
meta-search engine to cover them all :)

~~~
rogeryu
Even if there is no "diff", this is an extra backup. Any of those sites can go
down I guess, then where are all those pictures?

------
uvesten
I really like both the selection and the color chooser! Did you do any manual
selection of the photos?

~~~
davidbarker
Thanks!

The selection is mostly based on the source sites I chose, like Unsplash,
which all have only good-quality photos. The aim was to show all of the images
from those sites.

The actual download and analysis of the images is done on my local machine,
and each image has its own JSON file. These are then used to populate/modify
the database, so I can track any changes to each image's data (if I add/remove
keywords, for example) using Git.
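
A minimal sketch of what a one-JSON-file-per-image layout might look like. The field names and helper functions are hypothetical, not the site's actual schema; the only point is that stable key order and indentation keep each keyword change to a small, readable Git diff:

```python
import json
from pathlib import Path


def save_record(directory, image_id, record):
    """Write one image's metadata to <directory>/<image_id>.json."""
    path = Path(directory) / f"{image_id}.json"
    # Sorted keys and fixed indentation keep Git diffs small and stable.
    text = json.dumps(record, indent=2, sort_keys=True, ensure_ascii=False)
    path.write_text(text + "\n", encoding="utf-8")
    return path


def load_records(directory):
    """Read every per-image JSON file, keyed by the file's stem."""
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(Path(directory).glob("*.json"))
    }
```

The dictionary returned by `load_records` could then drive whatever script populates or updates the database.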

~~~
pigscantfly
Have you thought about automating keyword discovery or do you plan to stick
with a manually-auditable dataset?

[http://arxiv.org/pdf/1412.2306.pdf](http://arxiv.org/pdf/1412.2306.pdf)

------
quaffapint
Might be a good place for an infinite scroll when going through pages of image
results - one less click for users to make.

------
_spoonman
Just a really great job on this. Love it.

------
edpichler
Just curious: where did the owner find all these photos to fill up the
database?

------
fareesh
I ran into a Laravel error on the homepage due to the server running out of
memory.

------
jlis
nice one!

------
juiced
My first search on "new york" returned nothing...

------
thecodemonkey
If you're still a little concerned with licensing and copyrights, I would
recommend taking a look at www.graphicstock.com - you just pay a flat monthly
or yearly fee and you can download as much as you want.

Disclaimer: I work for the company behind GraphicStock. Oh, and we're hiring!

