Hacker News
Show HN: I made a Mac app to search my images and videos locally with ML (desktopdocs.com)
213 points by correa_brian 2 days ago | 167 comments
Desktop Docs is a Mac app that lets you search all your photos and videos in seconds with AI.

Once you find the file you're looking for you can resize it, export it to Adobe Premiere Pro, or drag and drop it into another app.

I built Desktop Docs because I keep tons of media files on my computer and I can never remember where I save stuff (lots of screenshots, memes, and downloads). The Apple Photos app also only supports photos in your iCloud.

Desktop Docs supports adding folders or individual files to an AI Library where you can search by the contents of your files, not just file titles.

You can search by objects ("cardboard box"), actions ("man smiling", "car driving"), by emotion ("surprised woman", "sad cowboy"), or the text in the frame (great for screenshots or memes).

It's also 100% private. Make any media searchable without it ever leaving your computer.

How I built it:
- 100% JavaScript (I'm using Electron JS and React JS).
- Embedding generation (CLIP from OpenAI is used to compute the image embeddings and text embeddings for user queries).
- Redis (storing and doing KNN search on the embeddings with this DB).
- Image/video editing (the app ships with FFmpeg binaries to explode videos into individual frames and scale images).
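For readers curious what "KNN search on the embeddings" boils down to, here is a minimal brute-force sketch in Python. It assumes embeddings are plain lists of floats (CLIP's are 512-dimensional; the test uses toy 3-dimensional vectors) and ranks by cosine similarity. It is not Redis's API and not necessarily how the app scores results; Redis's vector search can also use an approximate (HNSW) index instead of scanning every vector.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def knn(query, index, k=3):
    """Brute-force k-nearest-neighbour search by cosine similarity.

    query: embedding of the search text (CLIP's text encoder, in the app's case).
    index: {file_path: embedding} for indexed images/frames (CLIP's image encoder).
    Returns up to k (similarity, file_path) pairs, best match first.
    """
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(emb))), path)
        for path, emb in index.items()
    )
    return heapq.nlargest(k, scored)
```

Because text and images land in the same CLIP embedding space, the query "cardboard box" is just another vector, and the nearest image vectors are the results.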

Demo: https://www.youtube.com/watch?v=EIUgPNHOKKc

If there are any features you'd like to see in Desktop Docs or want to learn more about how I built it, drop me a comment below. Happy to share more.

Some feedback - for me (and probably for the HN crowd) saying it's powered by "AI" takes away credibility from what otherwise seems like a reasonable project.

My first impression was that you'd "just" upload my pictures to OpenAI with a prompt and call it a day.

Maybe highlight that it uses ML running locally? (I see that it's in the FAQ, but not in the title.)

Thanks for the feedback and for taking a look. I think that's a fair point.

One other bit of feedback on the website - consider moving the Buy button to a persistent menu bar at the top, so you don't have to intersperse it throughout the product info flow. The first time I came to the buy button, I clicked it, not being ready to buy, but not realizing there was still a ton more product info further down the page.

Apple is a good example: https://www.apple.com/macbook-pro/

Thanks. This is great feedback. Appreciate you taking the time to offer thoughtful advice.

> highlight that it uses ML running locally

I'd love to know the specifics. If there's an installable, reproducible local build with regular model updates and maintenance that I could subscribe to, I'd be interested in a tool like that.

Oh my god! A locally run, one-time-purchase application!

I'm so glad someone remembers how software is supposed to work.


Someone's gotta do it :)

This is cool, although FYI, Finder & Spotlight do have some support for this already since Ventura:


FTA: "macOS Ventura allows Spotlight to find images in your iCloud Drive, Photos, Messages, Notes and Finder, making it very easy to find the media that you are looking for. Apple says that it can even detect images based on the content like 'dog in car'."

I haven't been able to get this to work in Finder; can you please share which search attribute you're using?

In Finder, there is the search box in upper right. Type something like "Blue shirt", and one of the options that pops below the box will be: "Content: Contains 'Blue shirt'"

It should do a visual search for some common things, as well as OCR in image text.

Thank you for your reply. My initial testing yielded no results, but simplifying the search term to single words (like "mountain") worked to turn up a few (among many that should have appeared). Seems like there may be room for Desktop Docs, at least until Apple's own ML search improves.

Nice. I didn't know Spotlight could do that. Does it also work on your local photos and videos that are not in iCloud?

Yes, it has local support. I never enable iCloud images.

Not sure how much low-level filesystem access you'd need, but it would be cool to support adding metadata via tags, a la


Adding on to this, it would be cool if it operated directly on the filesystem rather than having to first add everything to an intermediate "library". It's kind of a minor thing but definitely a pet peeve of mine when apps graft their own concept of a "library" onto my own perfectly working filesystem.

Demo looks neat though. I wonder if it can tell me which of my video files are SD, HD, and 4K. I've ripped so much media that I've lost track and did not name my files in such a way that it's obvious what resolution each one is. Something like that probably doesn't even need AI, just a peek at the existing metadata.

Indexing file metadata is on the roadmap. Right now Desktop Docs will render file metadata (resolution, file size, duration), but planning on making this a searchable field.

For checking the video resolutions, ffprobe (a tool that’s part of FFmpeg) should do what you want. Combined with the find command, you should be able to do it in one line.
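A minimal Python sketch of the find + ffprobe idea, assuming ffprobe is on your PATH. The extension list and the SD/HD/4K width thresholds are my own assumptions; width is used rather than height because cropped widescreen rips (e.g. 1920x800) keep their full width.

```python
import json
import subprocess
from pathlib import Path

# Extensions treated as video (assumption; extend as needed).
VIDEO_EXTS = {".mp4", ".mkv", ".mov", ".avi", ".m4v"}


def ffprobe_cmd(path):
    """ffprobe argv that prints the first video stream's width and height as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=width,height",
        "-of", "json",
        str(path),
    ]


def parse_width(ffprobe_json):
    """Pull the width out of ffprobe's JSON output; None if there is no video stream."""
    streams = json.loads(ffprobe_json).get("streams", [])
    return streams[0].get("width") if streams else None


def classify(width):
    """Bucket a video by frame width. The thresholds are assumptions, not a standard."""
    if width is None:
        return "unknown"
    if width >= 3840:
        return "4K"
    if width >= 1280:
        return "HD"
    return "SD"


def scan(root):
    """Walk root recursively (like `find`) and yield (path, label) for each video file."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in VIDEO_EXTS:
            result = subprocess.run(ffprobe_cmd(path), capture_output=True, text=True)
            width = parse_width(result.stdout) if result.returncode == 0 else None
            yield path, classify(width)
```

Usage would be something like `for path, label in scan("/Volumes/Media"): print(label, path)`, where the folder is a placeholder for wherever your rips live.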

100%. Desktop Docs uses ffprobe under the hood to get file metadata.

This is why I still use Bridge over Lightroom for a lot of my daily interacting with files. I don't always create an LR project. I just want to edit one or two files. So I love this feature as well.

I use the get-video-properties Python library for metadata harvesting. Works great.

There's support for adding your own tags to files if you'd like. The tags aren't auto-generated but I've considered adding some classification features to support something like "suggested tags."

On macOS you can just attach metadata to files directly so Spotlight can index it.
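As a rough illustration of attaching metadata directly: Finder stores user tags in the `com.apple.metadata:_kMDItemUserTags` extended attribute as a binary property list holding an array of strings, and Spotlight exposes them as `kMDItemUserTags`. This is how current macOS versions behave rather than a documented public API, so treat the sketch below as an assumption. It builds the plist with Python's `plistlib` and produces a command for macOS's `xattr` tool (`-w` writes, `-x` means the value is given in hex).

```python
import plistlib
import shlex

# Extended attribute where Finder keeps user tags (undocumented; assumption).
TAG_XATTR = "com.apple.metadata:_kMDItemUserTags"


def tags_payload(tags):
    """Encode tags as a binary property list containing an array of strings."""
    return plistlib.dumps(list(tags), fmt=plistlib.FMT_BINARY)


def xattr_command(path, tags):
    """Shell command that writes the tags as a hex-encoded xattr value.

    Note: this replaces any tags already on the file rather than merging.
    """
    hex_value = tags_payload(tags).hex()
    return f"xattr -wx {TAG_XATTR} {hex_value} {shlex.quote(str(path))}"
```

Running the printed command in Terminal on a file should make the tags appear in Finder and be searchable with `tag:` in Spotlight, assuming Spotlight has re-indexed the file.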

I have a large on-disk photos library (in the Apple Photos app). I have a lot of semi-duplicate photos. The main deduplicator software I tried can detect near-duplicates well (e.g. visually similar photos taken a fraction of a second apart), but it doesn't choose the best one to keep: it will suggest keeping a version where people have their eyes closed and deleting one with eyes open, for example. Could AI make a better choice there?

A reliably good app that helps me delete duplicates and shrink the photo library size would be amazing. Could this do something along those lines?

Similarly, I want to delete low quality images: blurred, etc. I don't know of an app that does that.
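For the blurry-photo case specifically, a common heuristic is the variance of the Laplacian: sharp images have strong, varied edge responses, blurry ones score low. It is usually computed with OpenCV (`cv2.Laplacian(img, cv2.CV_64F).var()`); below is a dependency-free sketch on a grayscale image given as a 2-D list. The threshold is arbitrary and would need tuning on your own photos, and sharpness says nothing about closed eyes, so it only covers part of the "pick the best duplicate" problem.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels of a grayscale image.

    gray: 2-D list (rows) of brightness values, at least 3x3.
    """
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def sharpest(group):
    """From {name: grayscale image} for a set of near-duplicates, pick the sharpest one."""
    return max(group, key=lambda name: laplacian_variance(group[name]))


def is_blurry(gray, threshold=100.0):
    """Flag an image as blurry below an arbitrary threshold (assumption; tune per library)."""
    return laplacian_variance(gray) < threshold
```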

I don’t know if it’s quite what you’re looking for but there are plenty of AI photo culling software choices for photographers.

Do you have a recommendation? I'm using Photosweeper currently.

Lots of articles online that look like semi-blogspam, and I'll go through each product in them... but if someone has a specific human-powered recommendation I'll happily listen :)

I think Aftershoot is the most popular, but I’d search “culling” in r/weddingphotography to get a good overview.

I don’t know if any run locally though so pricing might not be great for someone doing loads of images…that said a typical wedding for me had at least 5k raws to go through so it might not be that bad.

One other note - I saw another comment that your AI model runs on CPU. Does it run well or better on the M1 or newer? That is, can you market it as an Apple Silicon-only app?

It runs on both Intel and Apple silicon chips. I do think it's a bit faster on the Apple silicon Macs.

Just a heads up: I'm in the US, and when I click the link to buy, the price is $24.99.

I'd personally love it if this also indexed text files and pdfs.

The search bar in Finder does a decent job searching for text (recursively) in files and PDFs.

This app focuses on semantic search rather than full-text search, if I understand correctly.

Correct. No text search, but it's something I'd consider adding if enough people asked for it.

I'm thinking of adding support for text files. Only thing holding me back is that it would make the app much larger.

I'm also considering making the models more à la carte so you can manage which ones you want to use depending on your use case. That way you could use it for text, image/video, or both.

How so? Indexing text seems fairly straightforward?

Adding more capability means adding another set of model weights.

Right but indexing text is, like, you already have the text…

For the best performance you'd probably want to use something other than CLIP's text encoder, but yeah you already have the text and could index it.

If enough people asked for it, I would consider adding a different embeddings model.


Yeah, I'd love an app on Windows that can index text files, images, and pdfs recursively through a directory.

Same - I kicked the tires with "Image Scan OCR" from the Windows app store, I think. It can do a whole folder (batch process) of pics and extract the text into the sidebar, which is nice, but it would be nicer if the text was exported into something searchable and linked to the pic/filename it came from.

Of course I want to search for data in my photo downloads / screenshots folders and be able to pull / easily grab / copy the pic(s) to another folder or upload to a site and then paste the text.

I guess chaining this into Notion / Obsidian or similar would be beneficial as well.

I'm working on a Windows version! I'll let you know when it's ready.

Would you consider a trial that can ingest 100 images? I’d like to try it before buying.

Appreciate you checking it out! I'm not offering a free trial at the moment, but I'll definitely let you know if that changes.

Very interested in an app of this kind, but I wouldn't buy it without trialing it on my data. Too many content-specific gotchas could render it ineffective, amazing, or anywhere in between.

A time-based trial could be good in that respect, and, as a benefit for the developer, it gets a satisfied user hooked in and not wanting to lose access.

Understood. Thanks for checking it out. I'm not offering a free trial, but I can let you know if that changes.

Yeah, I need to be able to try before buying. Vanilla CLIP isn't as great as advertised.

Sorry, what does CLIP stand for?

CLIP (Contrastive Language-Image Pre-training) is the vision model used for indexing media: https://openai.com/index/clip/

same here

Just a legal question. Can it really be non-refundable? Aren’t there markets where regulators would require you to refund?

Is OP in those markets?

I think practically it doesn’t matter. This is two people selling something for $20 (or maybe $50, I can’t tell). They tell you beforehand, so you know. It’s unlikely they operate in jurisdictions that force software to be refundable. So I guess you can sue them to get your money back. Or chargeback through your credit card.

This is why we can’t have nice things. I miss the old internet where it was just people sending small amounts of money to other people for cool things (people mailed me checks in 1996). And users didn’t have the expectation of legal expenses to account for unlikely edge cases.

Whether it's legal or not is really a moot point. They're using Stripe to take payments, which will close a merchant's account if they get too many chargebacks.

IMHO, it would be better to eat a few refunds than risk changing my payment provider.

I'd love it if someone paid me with a check.

Just trying to build cool stuff and put it out there into the world.

Just as a data point, I've sold one-time-purchase desktop software for a few years and refunds have been pretty few and far between.

I understand the fear that a bunch of people will buy it and immediately request a refund (because what's to stop them?), but that hasn't been my experience, and I think having a generous refund policy engenders some goodwill and leads to more sales.

(of course if you've actually had problems with this, disregard! :)

To address the GP's comment though - I don't know the legalities behind this, but I remember buying physical software, in boxes, from stores, where the policy was "once you open the box/break the plastic seal you can't return it" and in the digital realm it seems like downloading would be the closest proxy to that. I just think concerns about this sort of dishonesty are pretty overblown, especially on the scale of indie software.

This is actually part of the EU's returns law: digital media with the seal opened aren't required to be refundable.

Funnily enough, there's no exception for physical books, so there are some people who basically treat bookstores as libraries.

Interesting. How long does it take to build the index of a movie like Spider-Man (on whatever hardware you have at hand)? And how big will its index be?

It should only take a few minutes.

The images all get scaled down to 256x256 before the embeddings are generated to optimize for space. I don't have an exact number on how large the index will be, but I'm happy to run some tests and get back to you. The embeddings are stored as float32 arrays of length 512.
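A back-of-envelope answer follows from the numbers in this thread: 512 float32 values is 512 × 4 = 2,048 bytes per sampled frame, and the author mentions further down that videos are sampled once per second. For a hypothetical two-hour film that is 7,200 frames × 2,048 bytes = 14,745,600 bytes, roughly 14.7 MB of raw vectors. The sketch below only counts the vectors themselves; key names, Redis's own per-entry and index overhead, and any stored thumbnails are ignored.

```python
BYTES_PER_FLOAT32 = 4
EMBEDDING_DIM = 512  # embedding length quoted in the comment above


def index_size_bytes(duration_s, frames_per_second=1, dim=EMBEDDING_DIM):
    """Raw embedding storage for one video: sampled frames x dim x 4 bytes.

    Ignores database overhead and thumbnails, so real on-disk size will be larger.
    """
    frames = int(duration_s * frames_per_second)
    return frames * dim * BYTES_PER_FLOAT32
```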

Are the images being compressed losslessly, or do they lose quality? I've played around with image compression in Electron JS and haven't found a good solution, so any resources are greatly appreciated.

The resizing process doesn't introduce any compression artifacts but there is some loss when the images are processed by CLIP, depending on the dimensions.

CLIP expects square inputs, so that's a limitation on the model side.

In terms of general compression are you familiar with FFmpeg? It has support for lossless compression into a bunch of different image formats.
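On the "CLIP likes perfect squares" point: OpenAI's reference CLIP preprocessing resizes the shorter side to the model's input size and then center-crops a square, so the left and right edges of a widescreen frame are never seen by the model. Whether Desktop Docs crops or simply rescales to 256x256 isn't stated, so this is only a sketch of the center-crop step. The returned tuple follows the (left, upper, right, lower) box order used by Pillow's `Image.crop`.

```python
def center_crop_box(width, height):
    """(left, top, right, bottom) of the largest centred square inside a width x height image."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side
```

For a 1920x1080 frame this keeps the middle 1080x1080 region, which is then scaled down without any stretching, at the cost of discarding the 420-pixel strips on each side.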

Oh ok, makes sense. I'll look more into FFmpeg, thanks.

Can this recognize individuals? I really need something that can index images by which ones have prepend vs prepend’s kid or whatever.

Apple does a really good job of this with their Photos app, I regularly search for photos of people via their name. Surely there's got to be some OSS solution for this!

It will recognize famous people, but you can't tag people yet. That's something on my roadmap. Is that something you're interested in?

Definitely. It’s the only reason I’d buy. I’ve got thousands of photos and I need to search by individual or by combinations (e.g., “what’s that picture with grandma, grandpa, and uncle”).
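Person tagging doesn't exist in the app yet, so this is purely hypothetical: once each photo carries a set of person tags (from manual labels or a face-recognition step), a query like "grandma, grandpa, and uncle" becomes a set intersection over an inverted index. The file names and tags below are made up for illustration.

```python
from collections import defaultdict


def build_person_index(photo_people):
    """Invert {photo: {people in it}} into {person: {photos they appear in}}."""
    index = defaultdict(set)
    for photo, people in photo_people.items():
        for person in people:
            index[person].add(photo)
    return index


def photos_with_all(index, *people):
    """Photos tagged with every requested person (set intersection)."""
    sets = [index.get(person, set()) for person in people]
    return set.intersection(*sets) if sets else set()
```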

Awesome. I'll move this up on my list of things to do. I think it would be a cool feature.

I would love this feature too.

Immich does this well enough for my needs! Really cool OSS project

Agreed, Immich is awesome and great. It can also do searches like "woman smiling", because all images are scanned by an ML agent as well.

Never heard of Immich. I'll check it out!

I don't quite understand how it makes video searchable... I guess each frame is extracted from a video and some sort of model is applied to it? And each frame is stored as an embedding? I imagine the storage will become huge if every frame from a video is stored and indexed!

Correct. Videos are broken down into frames and each is indexed with a vision model. The images get scaled down to optimize for space. The more media you index, obviously that will increase the size of the index, so we're always looking for ways to improve the process for scale, like compressing the vectors or partitioning the embeddings so they can be stored elsewhere.

I'm still learning a lot about performance for this type of operation, so it's a work in progress.
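One common way to do the "compressing the vectors" step mentioned above is scalar quantization: store each embedding dimension as a signed byte plus one float scale per vector, cutting a 512-dimensional float32 vector from 2,048 bytes to 512 bytes plus the scale. This is a generic technique, not what the app is known to do, and it slightly perturbs similarity scores, so search quality should be re-checked after switching.

```python
import struct


def quantize(vec):
    """Symmetric int8 scalar quantization of one embedding.

    Returns (scale, codes) where codes holds one signed byte per dimension.
    """
    scale = max(abs(x) for x in vec) / 127 or 1.0  # fall back to 1.0 for all-zero vectors
    codes = struct.pack(f"{len(vec)}b", *(round(x / scale) for x in vec))
    return scale, codes


def dequantize(scale, codes):
    """Approximate reconstruction; each value is off by at most scale / 2."""
    return [c * scale for c in struct.unpack(f"{len(codes)}b", codes)]
```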

Looks like image search using CLIP embeddings with frames sampled from video at a low frame rate.
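A sketch of that sampling step with FFmpeg, using the numbers the author gives in this thread (one frame per second, scaled to 256x256). The exact filter chain and output naming are assumptions, not the app's actual command. The `fps` filter does the sampling, `scale=256:256` resizes (and stretches non-square frames), and the numbered output pattern lets a search hit be mapped back to an approximate timestamp in the source video.

```python
from pathlib import Path


def extract_frames_cmd(video, out_dir, fps=1, size=256):
    """FFmpeg argv: sample `fps` frames per second and write size x size JPEGs.

    scale=W:H ignores aspect ratio; crop first if distortion matters.
    out_dir must already exist.
    """
    pattern = str(Path(out_dir) / "frame_%06d.jpg")
    return ["ffmpeg", "-i", str(video), "-vf", f"fps={fps},scale={size}:{size}", pattern]


def frame_timestamp(frame_number, fps=1):
    """Approximate position (seconds) in the source of the 1-based numbered output frame."""
    return (frame_number - 1) / fps
```

Running it is then `subprocess.run(extract_frames_cmd("movie.mp4", "frames"), check=True)`, with placeholder file and folder names.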


Just purchased and testing it out now alongside Kino AI (https://trykino.com), which is very similar but a bit more Video Editing focused.

Great thus far, but one question -- is there a good way to remove a directory from DD's database? I accidentally added my top level movie folder, instead of the subfolder I meant to -- couldn't figure out an obvious way.

Thank you!

Thanks for buying Desktop Docs! I'm actually working on this right now. Releasing an update shortly.

This feature is out!

You should see a "Delete" button on the "Logs" page on your next update.

Excellent turnaround, thanks so much! As soon as the update drops I’ll dig back in — but it’s impressive thus far, can’t wait to see how it develops.

No pressure also, but FCPX support in addition to Adobe would be amazing.

or could just use Screenie (100% free & local): https://apps.apple.com/us/app/screenie-screenshot-manager/id...

That looks like a very different app; from the description:

> "Screenie is a revolutionary screenshot manager designed specifically for macOS Catalina. With Screenie, you can drag screenshots from your menubar,preview and drag images from the Screenie Panel, and even search the text inside your images!"

The last update was two years ago (which added support for macOS Monterey), and, apparently unlike Desktop Docs, it collects usage data and diagnostics.

Yeah, it doesn't look like Screenie handles videos. And, correct, your images/videos never leave your computer.

Does Screenie search frames inside of videos?

Looks cool. Never heard of it. I'll check it out.

>> It's also 100% private. Make any media searchable without it ever leaving your computer.

It seems you are using the CLIP model, which you can run on the CPU. Would you have any estimates of how long the indexing would take? Also curious how often you sample videos (every minute? every 10sec?)

One of the troubles I have is the large library of family videos, which become especially difficult to index. I've been thinking of everything from sample-indexing to scene detection etc.

All the images get scaled down before the app computes the embeddings, so it should only take a few minutes, but definitely depends on how much you're trying to index at once.

I'm making improvements to make indexing more resilient and faster. The videos are sampled once every second. That's something I'm tweaking for better performance as well. Was considering letting the user adjust this too.

How have you tried indexing your videos so far?
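For large family-video libraries, one middle ground between fixed sampling and full scene detection is to sample at a fixed rate but only keep a frame when its embedding differs enough from the last one kept, so a long static shot produces one index entry instead of hundreds. The sketch below assumes (timestamp, embedding) pairs in time order; the 0.95 cosine threshold is an arbitrary starting point, not a value from the app.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def keep_scene_changes(frames, threshold=0.95):
    """frames: (timestamp_s, embedding) pairs in time order, e.g. sampled at 1 fps.

    Keeps the first frame plus any frame whose embedding is less than `threshold`
    similar to the most recently kept frame.
    """
    kept = []
    for ts, emb in frames:
        if not kept or cosine(emb, kept[-1][1]) < threshold:
            kept.append((ts, emb))
    return kept
```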

Yeah, buy a 3090, put it in an Ubuntu closet box, set it up as a server, and let it rip. It’s really not hard or expensive.

I’m tired of people trying to force a square peg (MacBooks) into a round hole (ML compute). Apple needs to get on board with making compute easier and more cost-effective to access. Developers are jumping through hoops to accommodate Apple's business goal of locking you into their ecosystem.

Sounds cool! I haven't explored that but there are lots of interesting ways to approach this type of thing. Tons of creative ways to use these models.

Very nice, thanks for sharing. I have two questions relevant to me:

- Will the purchase include upgrades?
- I have most of my media on a NAS. Will this app be useful in my case? Would I be indexing a network drive?


1) Yes the purchase includes upgrades. After you install those happen automatically.

2) This doesn't support NAS yet, but it's on the roadmap. If enough people are interested in that I'd bump it up in priority.

I'm definitely interested in that.

The upgrade is particularly relevant for the Apple ecosystem where updates and API changes regularly weed out unmaintained apps.

Upgrades are included in the purchase! Once you install Desktop Docs you'll get the new versions automatically.

You did not build a Mac app, you built a webpage in a shell of a Mac app…

This is an unkind nitpick of something someone has made.

It’s an app that runs on a Mac. What framework it’s running on doesn’t make it less of an app.

I do think people will have certain expectations when they hear "Mac app", though. At the same time, Electron is a bad word on HN. The ideal situation for marketing would be adding support for Windows/Linux so that it can be called a cross-platform app.

Guidelines don't dictate what an app is, they're just fucking GUIDELINES

Agreed, thanks.

This might be a good thing, since it means it could be more easily ported to Windows and Linux as well

This is why I'm using Electron. Lower effort to distribute for other operating systems.

Looking forward to seeing it on Windows and Linux then! I don't have a mac at the moment myself.

I actually had a similar idea to yours using python but ended up fizzling out on the idea after a bit as other projects came up.

Is it an app? Does it run on a Mac? Then it is a Mac app

So by your standard if I embed a windows emulator that runs a windows app I have created a Mac app?

Yes - the mac app is not the windows app though, it is the entire package of emulator + windows + windows app.

If it's not a mac app then what exactly is executing on the mac?

Yes, if you can embed Windows in a Mac app, you have created a Mac app. Parallels is a Mac app, too. If it looks like a duck and quacks like a duck, it's a duck.

That’s the thing. Electron apps and co do not quack like a duck. They feel terrible and out of place in the macOS ecosystem. They are not Mac apps.

And the same is true of some apps made with Apple's own frameworks too! Catalyst apps are (almost always) terrible, including the ones made by Apple. The only exception I’m aware of is Messages, and even that one is not perfect.

They are apps that happen to run on a Mac. Not Mac apps.

Appreciate you checking it out.

One could say the same of Slack, WhatsApp, Zoom, etc.

And I do say it.

Can it detect and classify near-duplicates? I always wanted to build something similar, but only to detect similar photos and remove duplicates, since multiple backups and groupings aren't manageable without blindly trusting a cloud service. Imagine I have 4-5 albums of 20-year-old events and don't know which one was the original vs. the color-corrected copy.

It doesn't specifically detect duplicates, but it's something I've considered adding. Would have to think through how to implement it, but seems doable.

Is this something you'd be interested in?
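One classic way to find near-duplicates is a perceptual "difference hash" (dHash): shrink each image to a 9x8 grayscale grid (with Pillow, something like `img.convert("L").resize((9, 8))`), record whether each pixel is brighter than its right-hand neighbour, and compare the resulting 64-bit hashes by Hamming distance. Uniform brightness shifts leave the hash unchanged, though heavier color grading or crops will flip more bits. This is a generic sketch, not how DupeGuru or Desktop Docs work, and the 5-bit grouping threshold is an assumption to tune.

```python
def dhash(gray):
    """Difference hash of a grayscale image already resized to 8 rows x 9 columns.

    Each of the 64 bits records whether a pixel is brighter than its right neighbour.
    """
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (left > right)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def group_near_duplicates(hashes, max_distance=5):
    """Greedy grouping: each file joins the first group whose first member is within max_distance bits."""
    groups = []
    for name, h in hashes.items():
        for group in groups:
            if hamming(group[0][1], h) <= max_distance:
                group.append((name, h))
                break
        else:
            groups.append([(name, h)])
    return [[name for name, _ in group] for group in groups]
```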

DupeGuru is quite good at finding near duplicates, you should check it out

Love the name. I'll take a look too.

Hi, nice idea but very basic landing page. Show more photos or add a video in the landing page showing exactly what to do. We don't understand much

Thank you for the feedback. Appreciate you taking a look. Right now there is a demo on YouTube: https://www.youtube.com/watch?v=EIUgPNHOKKc

Planning on adding more photos/videos to the site soon.

I have been looking for something like this for ages. Sadly I am usually on windows.

How does it perform with many files?

I'm working on a Windows version! Will let you know when it's ready!

Apple Photos supports local AI indexing without iCloud.

That's cool. I'm not a big Photos user, but I've heard some people really like it. Thanks for sharing!

This is really cool

Is facial recognition (of known people) on the roadmap / a possibility?

It would be handy to be able to say photos of "mom" and it display all photos with mom in them. Probably not simple to build =p

I've given this some thought and it's definitely on the roadmap. A few other people have asked for this in the comments. Tagging people is very doable. Right now the model does a decent job with famous people, but not custom faces.

Seems like it's worth bumping this up in priority.

Apple Photos already does this but you have to use icloud

I don’t think you do have to use iCloud, I refuse to send my photos to iCloud and I can do it

Relevant xkcd: https://xkcd.com/1425/

This was from before modern AI. It's not relevant any more. In fact it's remarkable for just quite how irrelevant it has become in such a short time.

I first read about this technology in https://news.ycombinator.com/item?id=40015953 and https://news.ycombinator.com/item?id=39392582. Curious if OP also got his inspiration from those two - pretty cool!

I hadn't seen those! That's awesome, thanks for posting those. Love how many use cases there are for these kinds of models.

This is cool. I hope there will be a Linux version some day

Working on it! I'll let you know when it's ready.

These are the kind of Show HNs I’ve missed. Thanks for sharing this.

Thanks for checking it out! Appreciate the positive feedback.

This seems like a really neat tool, nice work! I wish I had a use case for it. Any plans for PDF or text file support? I’d love a local assistant that allowed for parsing my PDFs and text files the same way.

This is what Spotlight on macOS is doing for all files by default, just in case you didn't know?

Does it? I thought spotlight was basically a text search. I didn’t think you could search for concepts.

That's cool. I didn't know spotlight did that. Does it work for files that aren't in iCloud?

It works on local files yes

It will only find exact matches though. No synonyms etc.

That's right. Though, I expect that this may change with the next macOS version this year. They (probably) have the knowledge how to apply a good embedding model on all text files, and their hardware is capable of doing it in an efficient way.

Thanks! I've thought about it. The only thing stopping me is the size of the app. Adding a model to handle text would make the app much bigger.

Long-term I'd like to support this, maybe by letting users add/delete models as needed.

Can you design it in such a way that it only downloads the model if the user specifically enables that feature? Add a note along the lines of "A one-time additional download is required to enable this feature" or some such, and I think it would allow you to expand the features if you wanted while keeping the initial download smaller. I might be weird in thinking this would be acceptable. Also, I'd appreciate not downloading something I never use.
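A minimal sketch of that download-on-enable idea: optional weights are fetched the first time a feature asks for them, written under a temporary name so an interrupted download is never mistaken for a finished one, and can be deleted again to reclaim space. The model names, URLs, cache folder, and `.bin` file naming are hypothetical placeholders, and the real app is written in JavaScript, so this only illustrates the design.

```python
import urllib.request
from pathlib import Path


class ModelStore:
    """Optional model weights downloaded only when a feature is first enabled.

    catalog: {hypothetical model name: download URL}.
    fetch(url, dest): injectable download function (defaults to urlretrieve).
    """

    def __init__(self, cache_dir, catalog, fetch=None):
        self.cache_dir = Path(cache_dir)
        self.catalog = catalog
        self.fetch = fetch or (lambda url, dest: urllib.request.urlretrieve(url, dest))

    def path_for(self, name):
        return self.cache_dir / f"{name}.bin"

    def is_installed(self, name):
        return self.path_for(name).exists()

    def ensure(self, name):
        """Return the local path to a model's weights, downloading them once if missing."""
        dest = self.path_for(name)
        if not dest.exists():
            self.cache_dir.mkdir(parents=True, exist_ok=True)
            partial = dest.with_suffix(".part")
            self.fetch(self.catalog[name], partial)
            partial.replace(dest)  # only a completed download gets the final name
        return dest

    def remove(self, name):
        """Delete a model's weights to reclaim disk space."""
        self.path_for(name).unlink(missing_ok=True)
```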

Yeah, I think that could be a good way to handle it. Thanks for the suggestion!

If this is an offline app - or at least that your images don't leave your computer, that's pretty cool. How big is that model provided by OpenAI? Doesn't the app itself become huge?

It's completely offline and your images never leave the computer. Unfortunately the models are a bit big, so the app is large.

In the future I'd like to add functionality to add/delete models as needed so you can manage how much space the app takes on your computer. That would also make it more feasible to support other models for text docs.

How fast is this and how big does the index get? I noticed that the demo used only 100 files, which made for accurate and snappy responses.

Cool project btw.

Thanks! The indexing is pretty fast. Working on making some optimizations to make it faster and more resilient. As far as size, the images all get scaled down to 256x256 to optimize for space. The embeddings are stored in float arrays of length 512.

Can I ask it to show me all the photos taken with a specific camera model (e.g., my iPhone 6 vs my iPhone 13)?

Not in its current state. Thinking out loud, you could likely do this with the metadata stored on the photos. Desktop Docs does some metadata extraction and lets you add custom tags, but this is a good idea. Is that something you'd be interested in?
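Until something like this lands in the app, macOS Spotlight may already answer the camera-model question, assuming it has indexed the photos and filled in the `kMDItemAcquisitionModel` attribute from the EXIF camera model. You can check what a given file reports with `mdls -name kMDItemAcquisitionModel photo.jpg`. The helper below just builds the `mdfind` command; the folder path in the test is a placeholder, and this is Spotlight's mechanism rather than Desktop Docs' own metadata extraction.

```python
def camera_model_query(model, folder=None):
    """argv for macOS `mdfind` matching files whose acquisition model equals `model`.

    If folder is given, the search is restricted to it with -onlyin.
    """
    if '"' in model:
        raise ValueError("model name must not contain double quotes")
    query = f'kMDItemAcquisitionModel == "{model}"'
    return ["mdfind", "-onlyin", str(folder), query] if folder else ["mdfind", query]
```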

Can you label people manually or do face recognition?

Not currently, but that is on the roadmap to add soon. Right now you can search for famous people, but there's no custom tagging. I can let you know when this is supported.

This honestly sounds great and the price looks great, too.

Only things stopping me are

1. I'm on regular Linux.
2. It's not clear if it sends absolutely nothing out over the network.

1. Might be a heavy lift depending on your UI library, or a trivial thing if you can expose everything as CLI executables.
2. Sounds like that's the intent, I just want to be really sure.

I'm working on a Windows and Linux version. I can let you know when that's ready.

And correct, your data doesn't leave your computer. 100% local.

Same here, keep me posted about a windows version!

If this can be made into an Umbrel app that'd be sick!

For more context, I like the private nature of this. None of Umbrel apps have object recognition so it requires you to tediously add metadata tags, which becomes more of a chore and something you don't actually do, leaving your photos to "rot".

I haven't heard of Umbrel, but I'll check it out!

Would love a windows version :)

Working on it! I'll let you know when it's ready.

This sounds great. Would you consider making this a plugin for owncloud/nextcloud?

If enough people asked for it, sure!

The one guy here who thought it was something cool built with the ML programming language.

The author lists which programming languages were used, ML wasn't included. I'm sure ML refers to machine learning approach.

It does. But the reason I clicked the link is I thought it was about something written with the ML programming language. I had a compiler class in college that was taught in ML, and it is a language that has influenced others. I have not used it since. This is second or third time I’ve clicked a HN link thinking that.

You are correct, ML now almost always stands for machine learning, and that is what it stands for in this case. I was just wondering if anyone else out there did the same thing. Looks like a hard, “No.” :)

I didn't realize there was an ML programming language! I also didn't know OCaml was part of the ML family.

Is there a trial version for download somewhere? $75 AUD seems like a heck of a lot of money.

No trial version, but check the pricing again. There was an old price on there.

What's the new price?


