Hacker News
Photo Metadata and Search on MacOS (28mm.github.io)
90 points by jjwiseman on Feb 1, 2018 | 18 comments



Your script has what looks to be a pretty big inefficiency here[1] that's slowing it down. Looping over a set kills the constant-time presence-checking that you get from using that data structure; it will likely be much faster to do something like this:

    tags = list(set(descr.split()) & categories)

The following would also work:

    tags = [x for x in descr.split() if x in categories]

[1]https://gist.github.com/28mm/9820bd8b6eb27555efe9d6f46dd95a8...
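
The two one-liners above are not quite interchangeable, which a small sketch can show. The `categories` set and the `descr` string here are made-up sample values, not data from the script: the set intersection drops duplicates and word order, while the list comprehension keeps both.

```python
# Hypothetical sample data; the real script gets these from elsewhere.
categories = {"dog", "beach", "sunset"}
descr = "dog on a beach with another dog"

# Set intersection: each matching word appears once, in no guaranteed order,
# so sort the result to get a stable list.
tags_from_set = sorted(set(descr.split()) & categories)

# List comprehension: keeps description order and repeated words.
# Each `in categories` test is a constant-time set lookup on average.
tags_in_order = [word for word in descr.split() if word in categories]

print(tags_from_set)   # ['beach', 'dog']
print(tags_in_order)   # ['dog', 'beach', 'dog']
```

Which one fits depends on whether duplicates or ordering matter downstream.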


Ah, interesting observation. I’ll look at changing it to something more like what you’ve suggested.

If memory serves, the reason it doesn’t first split the description on whitespace is that some categories contain whitespace, and would never match a single word.

Thanks!


If you need to check against multiword tags, I'd suggest a utility function that expands a list of words into every contiguous phrase of one or more words. It should still be substantially faster than the current state, and you can improve it even more by limiting it to phrases no longer than the longest tag.

    def get_all_phrases(descr):
        # Return every single word plus every contiguous multiword phrase.
        words = descr.split()
        if len(words) == 1:
            return words
        phrases = []
        for i in range(2, len(words) + 1):
            phrases += get_phrases_of_len(i, words)
        return words + phrases

    def get_phrases_of_len(length, words):
        # Slide a window of `length` words across the list and join each window.
        return [' '.join(words[i:i+length]) for i in range((len(words) - length) + 1)]
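
The length limit mentioned above could be sketched roughly like this. The names `phrases_up_to` and `match_tags` and the sample categories are hypothetical, chosen only for illustration; the sketch assumes tags are matched exactly against whitespace-joined phrases.

```python
def phrases_up_to(words, max_len):
    """Return every contiguous run of 1..max_len words, joined by single spaces."""
    phrases = []
    for length in range(1, min(max_len, len(words)) + 1):
        for start in range(len(words) - length + 1):
            phrases.append(" ".join(words[start:start + length]))
    return phrases


def match_tags(descr, categories):
    """Return the phrases in descr that are known categories (hypothetical helper)."""
    if not categories:
        return []
    # No phrase longer than the longest category can ever match.
    max_len = max(len(category.split()) for category in categories)
    return [p for p in phrases_up_to(descr.split(), max_len) if p in categories]


# Hypothetical sample data.
categories = {"golden retriever", "beach", "hot air balloon"}
tags = match_tags("golden retriever on the beach", categories)
print(tags)  # ['beach', 'golden retriever']
```

Shorter phrases are generated first, so single-word matches come before multiword ones in the result.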


> Undoubtedly, launchd can be told to disable photolibraryd, but the approved mechanism wasn’t immediately obvious to me.

You want "launchctl unload /System/Library/LaunchAgents/com.apple.photolibraryd.plist".


Author here. That is vastly better, thanks for pointing it out.


Why not just copy the database (and its WAL file) to a temporary directory before accessing it?
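
A minimal sketch of that idea, assuming the database uses SQLite's usual naming where the write-ahead log sits next to the database file with a `-wal` suffix. The function name `open_snapshot` is made up for illustration, and copying a file that another process is actively writing is not atomic, so treat the snapshot as best-effort.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def open_snapshot(db_path):
    """Copy a SQLite database and its WAL file to a temp dir, then open the copy."""
    src = Path(db_path)
    tmp_dir = Path(tempfile.mkdtemp(prefix="db-snapshot-"))
    # Copy the main file, plus the write-ahead log if one exists.
    for suffix in ("", "-wal"):
        candidate = Path(str(src) + suffix)
        if candidate.exists():
            shutil.copy2(candidate, tmp_dir / candidate.name)
    # Queries run against the copy, so the original is never locked by us.
    return sqlite3.connect(str(tmp_dir / src.name))
```

Usage would be something like `conn = open_snapshot(photodb)`, where `photodb` stands in for the path to the library's database file.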


Lack of familiarity with SQLite, and maybe some mistaken assumptions about its behavior :)


I often copy first because I'm afraid that my locking the database is going to interfere with the app. Although, for my Apple Notes extraction, I am hitting the database directly.


This should also do the trick:

    launchctl stop com.apple.photolibraryd

(And replace “stop” with “start” to get it going again.)


It's also absolute madness that the native Photos app on iOS will not search the metadata of the photos, only the Apple-restricted "scene recognition" objects and dates.

Add a description to a photo, search for that description in Photos for iOS, and there will be no results.


Apple’s Photos team seems to be actively working against metadata support. It used to respect EXIF/IPTC information on import, but High Sierra for example completely ignores all metadata dates and sets the time to file modified time. ಠ_ಠ


I wish there was an open source Photos/Picasa type of app for power users. Something that could create and export indexes in many formats.


There are, but they don't work well. Like darktable.

Out of all the photo library management apps, Apple Photos is by far the most performant.


This is a major shortcoming of MacOS to those of us who use it heavily for photography. Or even those with large cameraphone libraries.

The data is there. It should be available to all applications!


“Instead, I opted for an egregious hack. In order to lock its database, photolibraryd needs to be able to write to it. I simply removed write permissions. […] Now, it’s possible to open $photodb, and poke around.”

If photolibraryd has that file open permanently for exclusive access, I think that shouldn’t work. Either photolibraryd keeps write access and nobody else can open the file while it has it open, or you run the risk of database corruption (probably recoverable) when photolibraryd can only write half of what it wants to write. (Which of the two happens is, I think, implementation-defined because of the existence of NFS.)

I also think it is unlikely that photolibraryd has code that checks whether the file has become read-only, so apparently photolibraryd periodically reopens that file for writing. If so, one should be able to race it to open the file exclusively.

Also, I’m curious. Photos.app must have a way to open that database for writing. How does it do that? Does it stop photolibraryd? Send it a signal?


Doesn't it seem more likely Photos.app simply communicates with photolibraryd?


This is a clever idea. I had just assumed that the photoanalysisd tags were indexed by Spotlight already!


I hope he files a radar about this :)



