Show HN: Rem: Remember Everything (open source) (github.com/jasonjmcghee)
555 points by jasonjmcghee 9 months ago | 196 comments
An open source approach to locally record everything you view on your Apple Silicon computer.

Note: Relies on Apple Silicon, and configured to only produce Apple Silicon builds.

I think the idea of recording everything you see has the potential to change how we interact with our computers, and believe it should be open source.

Also, from a privacy / security perspective, this is like... pretty scary stuff, and I want the code open so we know for certain that nothing is leaving your laptop. Even logging to Sentry has the potential to leak private info.




This does look cool. It reminds me of a recent discovery I made. The other day, while trying to recover some disk space, I found a giant file on my hard disk. It turned out to be a nine-hour screen recording from almost a year ago. I had no idea it existed, so I must've accidentally left the screen recording on. Scrubbing through it at high speed, watching the whole thing in a couple of minutes, was fascinating; it was like a window into my thought process at that time. You could see how I was researching something online. It was almost like a play-by-play, akin to re-watching a sports performance - very instructive and surprisingly useful.

Also, the sense of being back in that time, seeing details that I otherwise probably would've forgotten, was transformative.

In a similar vein to what you’ve done, but focusing specifically on web browsing, I’ve created a tool called ‘DownloadNet.’ It archives for offline use and fully indexes every page you visit. Additionally, it can be configured to archive only the pages you bookmark, offering another mode of operation. It’s an open-source tool, so feel free to check it out: https://github.com/dosyago/DownloadNet


This sounds a bit obvious to me after I write it down: I think there’s some value in the fact you were unaware and it was a random time.

If you take your work very seriously, I can see it being valuable to record it like athletes do. It would be tempting to use this on the “most important” days or when you’re “really ready”. At the very least, there’s a burden of choice and memory. I don’t know about security implications, but it seems valuable to randomly record a day per month and send it to yourself a week later. Or in the case of this tool, select some period for extra review.


There's a Windows tool called TimeSnapper that takes a screenshot every few seconds and lets you replay and navigate.

After reviewing a few days I learned to start focusing on one thing at a time.

It was cringeworthy to see how ineffective multitasking by switching between a few tasks was.


Absolutely. Watching your playback in TimeSnapper gives a lot of insights into the way a small distraction can derail you for hours (or it does for me, I mean)


It's amazing to me the kind of vulnerable personal responsibility and insight that can occur, prompted simply by seeing yourself and how you act, clearly. I heartily concur with the above comments and am super happy to see other people having this same experience.

It suggests these kind of "mirroring" self-training practices and feedback might be useful across a whole range of endeavors, which sounds awesome. A super easy way to improve -- akin to people checking their reflection in a mirror -- that a bit of technology could really help with :)


Looks like it's available for macOS as well: https://timesnapper.com/


> It archives for offline use and fully indexes every page you visit.

Oh, I also made a tool to do this! Never open-sourced, since it’s an utter pain to set up and the UX is terrible, but amazingly useful all the same.

Incidentally: how does DownloadNet work? My tool uses a browser extension to send the full-text of each webpage to a server, but yours doesn’t seem to have a corresponding extension, so I can’t see how it would retrieve the text.


Ah, good, let me introduce you to the wonderful world of the Chrome Devtools Protocol! (fka Chrome Remote Debugging Protocol)

I love this API for almost everything browser related. I built my RBI product atop this (BrowserBox: https://dosyago.com), and I think it's a drastically underrated API.

Also, it works out of the box in Edge, Brave, Chromium, and many parts of CRDP are supported by Firefox and Safari^1

1: See for example: https://github.com/WebKit/webkit/tree/main/Source/JavaScript...
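
To make this concrete, here's a rough sketch of the no-extension flow in Swift (to match this thread's codebase; this isn't DownloadNet's actual code, and it assumes Chrome was launched with --remote-debugging-port=9222):

  import Foundation

  // CDP lists open tabs over HTTP; each tab is then scriptable over a WebSocket.
  struct Target: Decodable { let webSocketDebuggerUrl: String? }

  func fetchFrontTabText() async throws -> String? {
      // 1. Enumerate debuggable targets.
      let (data, _) = try await URLSession.shared.data(from: URL(string: "http://localhost:9222/json")!)
      guard let ws = try JSONDecoder().decode([Target].self, from: data)
          .compactMap(\.webSocketDebuggerUrl).first else { return nil }

      // 2. Pull the page's full text via Runtime.evaluate.
      let socket = URLSession.shared.webSocketTask(with: URL(string: ws)!)
      socket.resume()
      let command = #"{"id":1,"method":"Runtime.evaluate","params":{"expression":"document.documentElement.innerText","returnByValue":true}}"#
      try await socket.send(.string(command))
      if case let .string(reply) = try await socket.receive() { return reply }
      return nil
  }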


I opened the page of BrowserBox but didn’t understand what it does. Can you provide an example of a real-world use case?


thanks for checking it out, sure: some of the ways people are using it follow below.

- reverse-proxy to protect proprietary code on your website from being inspected

- content accelerator (similar to mightyapp's original idea) where it's faster to render pages on a cloud vps with thick bandwidth than it is on a local device (in some cases at least!), and depending on the usage profile, it's even cheaper to serve that bandwidth, especially if you use additional video codecs.

- a framework to deliver web data collection and automation, agent authoring and intervention tooling on any device with no download

- cors-proxy to include and access content across domains for building design and test tools saas

- co-browsing for customer training and demonstration

and then there's the many cybersecurity and privacy ways including:

- standard remote browser isolation, to isolate your device from zero-day threats (an extra couple of layers of protection, requiring an even longer exploit chain, at least)

- to aid compliance and privacy by preventing insider data exfiltration in both directions when dealing with sensitive data (by blocking file transfer, copy paste, etc)

admittedly it's diverse, and hard to make generalizations about customers.

one way i think is cool that i haven't seen yet (but want to get around to doing myself!) is a way to deliver "browser extensions" without needing any of: 1) a compatible browser on your device, 2) the extension to come from a store, 3) any local download. In some sense it's safer, as the extension does not run locally, but in other ways it's more dangerous, as there's no central store. But it's very cool to explore, and what we really need for that is a great "developer API" that can expose a "browser extensions"-like layer.

One cool thing is that ad-blocking extensions built on BrowserBox will not be limited by the current restrictions that extensions developers face on existing platforms. The aim is to provide a powerful instrumentation api as simply as possible.

thank you for your question :)

btw - 42matters looks great! love your site design, really fantastic look. analytics is surely lucrative. i'm sure i've come across a similarly focused company also bearing the name 42 somewhere before (but surely it had a different origin!). is 42 indicative of something special in analytics?


Very interesting, thanks! I’d better add this to my list of things to look into…


Those big lists! When I saw the post today about "most favorited submissions"^0, I reflected that many of them qualify as exactly that: things people want to look into and learn about :) haha

0: https://news.ycombinator.com/item?id=38809642


When allowed I use a tool called Manic Time that (in the paid version) does this.

It used to be "local by default" but now I think that might be changing to "local if you want".

They have also, in the past at least, been a model creator of commercial software as far as I know:

- generous free edition

- paid versions keep working forever with their current feature set

I typically set it to auto delete after 14 days and disallow screenshots from my ordinary browser (because meetings and passwords), Slack and Teams (meetings) etc.


DownloadNet reminds me of how I really got started with Perl programming over 20 years ago. Since I was using my parents' land line with a dial-up modem (which cost cents/minute), I wanted to speed up the process of looking for a job via the government's official job search site.

Turns out, on my slow computer it was faster to clean up a megabyte of HTML with regular expressions before giving it to Firefox than just rendering it as-is - by about 30 seconds per search result page.

Perhaps it's possible to sanitize often visited websites with DownloadNet? (currently getting aggravated by reddit hiding images via JS code to prevent download / viewing in another tab...)


> Perhaps it's possible to sanitize often visited websites with DownloadNet? (currently getting aggravated by reddit hiding images via JS code to prevent download / viewing in another tab...)

Many years ago, I remember using a utility called: privoxy, on Linux/Unix, for that very purpose. No idea if it’s still viable, but thought I’d mention it, in case you’re serious?!


That's a fascinating idea. I like the idea of "custom user script extensions" that folks can plug in, author and share.

If you're passionate enough you could contribute a write-up, some code sketch or even a full PR of how this works. I'm sure you're probably too busy for that, or just not interested, and that's OK. I really appreciate the contribution you've already made with this idea.

I think allowing folks to filter, or sanitize (for whatever purpose really, sanity, focus, etc), sounds very useful.

Thanks! :)


I installed this, and nothing happened. No start menu entry, no taskbar icon, nothing. According to PowerShell, it put stuff in %userprofile%\.config, but that's it.

Where did everything else go? This ran and disappeared like a miner.

And localhost:22120 didn't load anything.


ArchiveBox and its companion browser plugin can also archive everything you visit, and may be of interest: https://archivebox.io/


Did you find significant changes and blindspots in your past behavior? I recently had a similar feeling when coming across old journals unexpectedly.


A long time ago, I did something similar, i.e. made a screenshot every few seconds, with the purpose of automatically extracting information from it, e.g. how long I was using some app.

I wrote a PNG DB to split PNG images into many blocks and have each block stored in a DB. If there are several equal blocks, each is only stored once. Via a hash table, the lookup for such blocks is fast. With this PNG DB, I get a compression ratio of about 4-5x. https://github.com/albertz/png-db
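
The core of the idea is small. A simplified sketch in Swift (flat byte-range blocks and SHA-256 keys here, as assumptions for illustration; the real png-db splits images into 2D pixel blocks):

  import Foundation
  import CryptoKit

  // Block-dedup store: equal blocks are stored only once, keyed by their hash.
  struct BlockStore {
      private var blocks: [String: Data] = [:]

      mutating func store(_ image: Data, blockSize: Int = 4096) -> [String] {
          var keys: [String] = []
          var offset = 0
          while offset < image.count {
              let block = image.subdata(in: offset ..< min(offset + blockSize, image.count))
              let key = SHA256.hash(data: block).map { String(format: "%02x", $0) }.joined()
              if blocks[key] == nil { blocks[key] = block }  // the dedup step
              keys.append(key)  // an image is just an ordered list of block keys
              offset += blockSize
          }
          return keys
      }
  }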

Some of the scripts I used to analyze the screenshots are here, but in the end, it was not really so successful and reliable: https://github.com/albertz/screenshooting

In the end, that led to another project, where I just stored that information more directly, i.e. what application was in the foreground, what file was open. https://github.com/albertz/timecapture


As much as I dislike the current AI hype, a local on-machine AI model that can read/interpret videos/thousands of images (basically a recording of screen time combined with video/audio/handwriting of my everyday life), store it in an indexed format, and project it back to me in an easy to understand/quickly digestible format would be a godsend I'd invest a lot of money into (provided false positives were close to zero)


Absolutely. Combine it with real-time analysis of your current screen, and you've got a computer that knows the complete history of what you're doing and why. That kind of global analysis could be really useful.


To me, it seems obvious that Apple will eventually build this into macOS ("it's a feature, not a product"). This is like the local apps or native OS features that would index your drive contents and provide a frontend to query, but on steroids. This also gets us closer to transparent computing.


Rewind claims to do this, but you'll have to trust them on the local claims, it's not open source: https://www.rewind.ai/


I would love this project to serve that need, and personally want it to.


On Windows I use a small program that grabs a frame every second through the desktop API as a DirectX texture, and compresses that straight on the GPU to h265 using AMF. I'll upload the source in case it's interesting for anyone else.


I would love this!



Thanks for sharing. It's a very cool idea. I briefly tried using it but I don't have an AMD card (just Intel and NVIDIA). How difficult do you think it might be for me to implement support for something like NVENC instead? I only skimmed the code and I'm not sure yet exactly what the AMF code is doing.


Thanks for open sourcing it so fast


Thanks, I am giving it a try - any dependency required for Windows 10? It compiles fine, but I get an error about AVIFileInit - maybe to do with <vfw.h>?


AMF is AMD's equivalent to NVENC. I think if you aren't running an AMD card, you won't have the necessary libraries?


Yea, I haven't really needed an NVENC version yet. Will probably add it... eventually... for my laptop. Shouldn't be much more difficult than just swapping out the right parts, though.


Ah yes, that would be it. Thanks! I threw a quick C++ program together to do something similar using FFmpeg.


why does CPP code always look so messy and unintelligible?

like if I see C# or Python it makes sense to me at least in some way

whereas CPP code always looks like it's powering some rocket engine?

Also thanks for sharing!


> like if I see C# or Python it makes sense to me at least in some way

Could it be that you're just more used to looking at C#/Python than other things, so other things are more foreign and therefore look messy?

As another anecdote, I cannot stand browsing/looking through C# code, as it tends to be filled with various classes just to write very basic programs. The amount of over-engineering I've seen in C# surpasses everything else I've looked at. Not to mention how people seem to arbitrarily choose between private/public with no real consensus on when to use what; everything seems to be encapsulated in the wrong way. And don't get me started on the infrastructure around it: csproj vs sln and dealing with dependencies/assemblies.

But then I mostly write Clojure code day-to-day, and I realize that my trouble dealing with C# is mostly because of what I'm used to, not because the language itself is inherently shit. I only have myself to blame for this. I'm sure people who write C# day-to-day have the same feelings about Clojure as I have about C#.


Well first of all, C++ is the language you'd be using to power a rocket engine. And second, that code is a terrible example because most of it isn't C++. Large parts of that are very C like or directly C because it's using the Windows API.


Largely because it's a melting pot of ancient and modern coding standards. You've got the C Win32 API alongside COM style, and then whatever AMF is doing. Makes things very verbose and explicit.

I've seen worse Python.

Personally, I think it's charming. :)


+1


See sibling comment response. :)


I used ffmpeg to try to do smart compression for me (diffing etc.), but ran OCR first. Also did a poor man's text merging to try to make use of the overlap from scrolling.


What OCR did you use?

Tesseract?

What was the performance (of the OCR) like?


Copying text from the saved footage is wild!

I had a poor man's version of this with TimeSnapper Classic, a free Windows utility that takes a screenshot every n seconds then lets you view a timelapse at the end of the day to show how you spent your time.

After a few weeks my disk was starting to fill up with screenshots. I browsed the folders and noticed that most of the screenshots looked almost identical. "I should come up with some kind of image codec optimized for image sequences, that diffs against the previous image to save space." Then realized I had basically reinvented GIF / video codec haha. So I wrote a script to shove the timestamp (filename) into the image itself (with ImageMagick), and convert them to video with ffmpeg. 99.9% size reduction!

This looks a lot more useful though.


I forgot to explain the most valuable thing about watching a timelapse of your own day. It puts you "outside" time so you can view it from "above", essentially see the whole thing at once. (Not quite, since that would be a (still image) timeline, but the effect is very similar if the timelapse is short enough.)

Really puts things into perspective.


Reading this really reflects how I think about rewinding my day. Scary stuff to me. I would prefer a light version of that - maybe aggregated data, but more than Screen Time from Apple, though.

Is there a tool that records your browsing behavior? That would be something I'd try for starters :D


I really, really want something like this that is truly multi-platform and local. Linux and Windows are a must. Must be 100% offline so that it is usable without Internet. I'd gladly pay $60 per major version per year. Add a permissive open source license and you have me as a customer for life. Maybe I should just build it myself if others are interested?


I've been looking for just a tool like this since Rewind was first introduced. Count me very interested!


I'd be definitely interested.


About Remember Everything

I use SingleFile (browser extension) - it saves a copy of every webpage I view in Chrome and Firefox. I use a program, Automatic Screenshotter, to record my screen activity, to capture other, non-browser activity. This enables me to work out what I was doing on my PC at any past date. All files are saved in a Year/month/day dir structure. Finding stuff - I use Windows search at present.

I also use Ditto to save all copy-and-pastes in a MySQL DB.

I've been doing this since before 2010 (the dir structure). The extensions and screen grabs I only started about 3-4 years ago.

I've often wondered if forensic PC investigation tools could also be used (maybe with some mods) to help produce a timeline of my PC activity.


I'm curious how much data is produced and saved every day with such a setup. If I had to guess I'd say multiple gigabytes, but that doesn't sound sustainable on any reasonably sized hard drive.


> Note: Relies on Apple Silicon, and configured to only produce Apple Silicon builds.

Just curious, what is relying on Apple Silicon?


This is also what I was wondering. The demo is showing recording a web-browser, and I'm wondering if that is all it is doing. If so, wouldn't that mean creating a browser plug-in would make this possible on any platform?

I also don't understand the chatGPT component, and what it is trying to tell him. Though I'm sure if you just threw the URL and the screenshot to chatGPT, you could ask it questions about that source.

I'm not sure how useful this is tbh, or how I would use it. I'm not saying it isn't useful, just that I'm not sure how I would use it, or why it is useful.


> The demo is showing recording a web-browser

He said it's not recording but taking a screenshot every 2 seconds, and I assume it's not just for a browser but all text on the desktop.

> I also don't understand the chatGPT component

You give it context from the "recording" and it answers questions you give it with that context info.


Full disclosure: I haven't tested it on Intel, but I don't think it will be able to keep up with taking screenshots, generating ffmpeg videos, and doing OCR that often, and it will drain your battery very quickly.

But if you / someone can get it to be efficient enough, awesome!


I think you underestimate computers. Taking 2fps screen recordings is a trivial task. Doing OCR may be slightly more work, but at 2fps I doubt it is an issue. Worst case, you could tune the OCR frequency based on the computer's abilities.


You're confusing 2fps with 1-screenshot-every-2-seconds (or 0.5fps, which is what the README actually says).

I wouldn't be surprised if the battery issue is problematic; it will likely result in at least some kind of battery life reduction, but perhaps not 30 or 50% at 0.5fps.

I haven't looked into the code, but if you're running ffmpeg, then battery life will likely take a hit depending on what exactly you're doing. Video encoding _can be_ heavy on the CPU/GPU.


That makes it even less work. Running ffmpeg is just video encoding, I don't think a 0.5fps video would be a huge issue.

Lots of people work plugged in most of the time. I don't see why one would want to gatekeep to keep them from using it.


What gate keeping? I just see a valid correction to your misstatement and your reaction reads like a defensive Karen wrote it.


Not supporting a platform just because it may cause battery drain, which may not even matter to plugged-in users, seems like gatekeeping.


it's literally an open source MIT licensed hobby project. fork it and improve it and share here. complaining about it is kinda rude.


I don't have an Intel Mac to test on, but you can absolutely just clone it and swap the config to Intel.


I have to agree. If you're interested in supporting Intel(x86/64), it's open source, and you sound like you have the hardware to add support for and test on Intel.


Not supporting? The commenter simply said it may cause battery drain. It is a discussion on the topic (both sides based purely on conjecture), and a relevant one. You disagreeing does not mean others are "gate keeping". Stop trying to weaponize trendy language and white knight this thread.


The original README was claiming that it relies on Apple Silicon and that they have configured builds to exclude other Apple platforms. I see it has been greatly softened now to "Only tested on Apple Silicon, and the release is Apple Silicon", which I think is quite reasonable.

I have no problem with not supporting a platform because you have no interest or any other reason, but previously it was quite proud to not support it which is different.


Ridiculous. You are working very hard to be offended.


I haven't looked at this codebase yet, but a screenshot every few seconds isn't a noticeable slowdown on most machines.

At such slow rates you don't need to create video - you just keep the individual images.

OCR doesn't need to be real-time, but can be done in batch mode or when the machine is idle.


I had been doing that with open source Linksys IP cameras since 2010, and they only have like 180MHz and 32MB RAM. What are you thinking about?


Congrats on getting this off the ground, and thank you for putting it out there for us to learn from!

I've been curious how Rewind worked under the hood because I've been playing with an idea in my head: an AI assistant that helps you protect your attention.

You would describe the kind of content that you consider a distraction, and any other constraints you have (e.g. "Don't let me watch cat videos unless I'm on a break").

And whenever it sees you watching anything that fits your prompt, it'll pop up on the screen and start a conversation with you to try and understand whether you actually need to consume the content you're looking at.

An AI that intervenes when you're going off track (based purely on how YOU define going off track). Current website blocking approaches aren't useful because they're all-or-nothing. I don't ever want to block entire sites because often there's useful content there relevant for my work. I want to block content on a much more granular level.

And I'd love for an "attention audit" at the end of each day. Attention is our most valuable asset, and I believe protecting it is a worthwhile endeavor... I'd just like some help doing so :).


I encourage you to fork this repo and build it.

Might be worth checking out Ollama and bakllava. https://ollama.ai/library/bakllava

Maybe the model is a bit too slow, but I'm sure smaller ones will come out soon. You can likely fine tune to do exactly what you need.


Thanks for the share! Will check it out.


Oh this seems like a wonderful idea. Loads of invasive privacy issues if you’re not doing the detection locally but I’d absolutely use something like this


Thanks! I agree that everything needs to happen locally, and I believe it's possible.

I'd love to better understand the problems you're facing that makes you want to use a tool like this.

Couldn't find your email, but if you're interested in chatting, you can find mine in my bio. Would appreciate it!


Really like this. I might use it as a way to keep myself accountable.

I wonder if the screenshots can easily be categorized as "time wasting" vs "productive" (possibly via ML model?). Could optionally gamify statistics. Example last hour: 78% productive, 12% hacker news, 10% inactive. You could go for your own high score (e.g. 3 x 100% hours in a day would probably be a great day for me!).

PS: love the video demo. I figured out what this does in < 30 seconds. Thank you!

PPS: (very tangential) Video Speed Controller (browser addon) now works with Loom videos - a few months ago that wasn't the case.


https://www.rescuetime.com/ already does what you describe very well, without ML. I've used it for years now, for personal accountability.

It even already does the "high score" thing you are talking about, LOL


You can list windows and detect the front window using macOS APIs instead of taking a screenshot and running OCR/detection.
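
A minimal sketch of that approach (note: on recent macOS, reading other apps' window titles via kCGWindowName needs the Screen Recording permission anyway):

  import AppKit

  // Frontmost app, no screenshot or OCR needed.
  if let front = NSWorkspace.shared.frontmostApplication {
      print("Front app:", front.localizedName ?? "unknown")
  }

  // All on-screen windows with owner and title.
  let info = CGWindowListCopyWindowInfo([.optionOnScreenOnly, .excludeDesktopElements], kCGNullWindowID)
  for window in (info as? [[String: Any]]) ?? [] {
      let owner = window[kCGWindowOwnerName as String] as? String ?? "?"
      let title = window[kCGWindowName as String] as? String ?? ""
      print(owner, "-", title)
  }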


Somebody else pointed out RescueTime, but if keeping it local is a priority, I recommend Qbserve, which I've been using (mostly passively in the background) for a few years now.

[0] https://qotoqot.com/qbserve/


This category of software was actually really useful when I wanted accountability during R&D tasks on engagements. I used Timing, and it would parse the active window titles and create a timeline. Then the creator wanted to charge $80/year and I ended up dropping it completely. I also kinda realized that this sort of software isn't that different from a RAT, and an attacker could target these sorts of things. I also figured Apple would've opened up their Screen Time API by now and this class of software would become redundant.


Don't show this to the VCs that invested $27.9M into https://rewind.ai

They will be very upset


Rewind.ai looks a lot more full featured (unless their site is complete BS and none of it works yet). Doesn't matter though because Apple will rebuild this themselves in 2-5 years with an on-device LLM chip that you will have to buy new hardware to get and it will be way more efficient and with way better privacy.


From the repo, OP did this in a couple of days with no experience in Swift. So getting to Rewind's stage is not that hard, it seems.


This has all the makings of the original “Dropbox is just rsync” comment.


Yes but in this case both apps are relatively new and not established ones if I’m not mistaken


Similar to the “Loom is just OBS with Dropbox on top”


for now it's easy to catch up. But after a few months they will be so far in the sky from all the vc money that it will be tremendously harder. Like the M1 chip for example


Works well. Been using it since beta. I've got a memory like a goldfish and this comes in handy.


a16z deploys capital fast into AI companies. They've already funded several companies running off the shelf open source models.

Find the latest flashy thing on Twitter / GitHub, spin it up with a waitlist, then send a16z your deck.


Guess it’s an improvement over deploying it to sociopathic felons which was their last claim to fame.


I feel like recording everything is like recording nothing in practical terms


Think of it as closed-circuit TV for your computer. You don't need to watch 24/7, but you can go back for specific incidents/information.


Yeah, I understand that. It seems that it tries to classify activity in order to help find relevant stuff, seeing:

  // OCR the captured frame with VisionKit's ImageAnalyzer.
  let configuration = ImageAnalyzer.Configuration([.text])
  let nsImage = NSImage(cgImage: image, size: NSSize(width: image.width, height: image.height))
  let analysis = try await ImageAnalyzer().analyze(nsImage, orientation: CGImagePropertyOrientation.up, configuration: configuration)
  // Keep the recognized text, plus any new clipboard contents, to index alongside the frame.
  let textToAssociate = analysis.transcript
  let newClipboardText = ClipboardManager.shared.getClipboardIfChanged() ?? ""


It lets you query any data once you realize what is important (which might vary depending on the question you're trying to answer).

It's like law enforcement tracking everything we say. They aren't catching many people right now, but wait until the future when they start working backwards with logs.


And when things are illegal which weren’t illegal when they were said.


i record every command in .zsh_history (like everybody else does by default, but mine is configured to not have a size limit)

i often do things like

history | rg ..

it helps when you roughly know what you want to find, but want to check some detail you forgot


For those unaware: CTRL+R in terminal will also change your prompt to search your command history. After typing, CTRL+R again to cycle through matches.


...and if you're like me - I live on the terminal - tools like atuin[^1] are very handy.

[^1]: https://github.com/atuinsh/atuin


what you really want though is fzf with C-r


Would agree 100%

fzf supercharges your shell history. I can't imagine my life without it, since I spend most of my day in the terminal.


Okay that's interesting, thanks


Loving this concept. Perhaps really useful for my work laptop (as in, my own one, but only used for work stuff), as quite often you just want to quickly backtrack and find that piece of info you looked at earlier, rather than navigate to it again. I'd imagine something like a physical wheel on your desk to wind back would be amazing. I have a useless Bose one that never gets used; can imagine it would feel very "Black Mirror" to use that to rewind.


I love it. The touchpad feels pretty good, but a wheel would be incredible.

I debounce the livetext analysis on history so you should be able to spin fast without issue


Sweet. Fast spin for the win!


A lot of custom keyboards have wheels (search "rotary encoder"), common enough for qmk to support them (https://docs.qmk.fm/#/feature_encoders).



Rewind has search - it just works better than rewinding manually.


How similar is this to rewind.ai (https://www.rewind.ai)?


I only used Rewind at alpha, so not sure how much they've added, but rem has the value I got out of it, and doesn't limit your searches arbitrarily.

- takes screenshots every two seconds

- records all the text via OCR

- builds full text search with SQLite

- allows you to go back in time however far and select/copy text from there

No meeting recording / audio recognition. Kinda irks me. Easy to add though.
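
For the curious: the search side can be as small as a single FTS5 virtual table. A minimal sketch of the idea (not the actual rem schema):

  import SQLite3

  // One row of OCR'd text per captured frame, indexed for full text search.
  var db: OpaquePointer?
  sqlite3_open("rem.sqlite3", &db)
  sqlite3_exec(db, "CREATE VIRTUAL TABLE IF NOT EXISTS frames USING fts5(captured_at, app_name, ocr_text)", nil, nil, nil)
  sqlite3_exec(db, "INSERT INTO frames VALUES ('2024-01-01T12:00:02Z', 'Safari', 'some recognized screen text')", nil, nil, nil)
  // Query later with: SELECT captured_at FROM frames WHERE frames MATCH 'invoice' ORDER BY rank
  sqlite3_close(db)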


Rewind relies on GPT-4 for the useful parts. I assume Rem will support local LLMs?

https://help.rewind.ai/en/articles/7791703-ask-rewind-s-priv...


That's the plan. Very open to ideas on the best way to do it. Seems like either Stdin/Stdout or API call via localhost.


I never heard of this until now but this looks amazing

Would be even more amazing with a locally running LLM


That’s a core purpose of the project!


definitely potential for nightmare scenarios - employers would love using this type of thing to fully surveil staff. Plug it into AI and you have real-time monitoring of everything everyone is doing, with alerts.


Not quite the same thing (no screen grab) but there is a non-visual cousin, http://arbtt.nomeata.de that records X11 data and provides a query language to produce summaries of the data.


This is a lot like RescueTime which I use for personal accountability as to how I am actually spending my time

https://www.rescuetime.com/


This is very cool, I am building a tool [1] to record 1H of screen at a time (to help developers debug errors while doing exploratory testing) and I always thought that I could add a layer to turn my 1-hours-brain-recording into a baby Rewind.

I have tried Rewind in alpha/beta; it was cool, but it was never something I felt like I needed. That being said, things change, and maybe I'll change my mind when it's part of the OS in a seamless way. But it's sketchy for as long as it's not offline, to say nothing of the privacy consequences of running Rewind ;)

[1] https://dashcam.io


It will be interesting to see if or how these technologies will be used in ten years, or even five. To me, it seems curious that we possess the most powerful memory ever created, and we're constantly trying not to use it.

On a more serious note, I wonder if such tools hinder creativity. By not remembering things directly, one could build the habit of relying on such tools for everything. Given creativity is the ability to recombine past memories into future ones...


“pretty scary stuff” indeed!

This would inevitably end up ingesting secrets, right? Like say from my password manager? Or API keys in my terminal?

Lots of ways for this to go sideways even if the data stays local.

What’s the plan there?


Come together as a community and help build the right thing. This isn't the first implementation, and I don't have a fiduciary duty to create value for investors.


> Lots of ways for this to go sideways even if the data stays local.

Could you name some?


The impression I was left with is that this tool would write things to disk. It would be helpful to know how that data is stored. I wouldn’t want my password manager OCR’d and then sitting in plain text on disk for example.


> Like say from my password manager? Or API keys in my terminal?


That's not describing a bad outcome, it's describing how the tool works.


Oh, well I think what he meant is that some malicious program could read and transmit this unencrypted recorded data which is normally stored in an encrypted form


Thanks, I think so too, but the threat model is a bit odd. On a Mac, potentially malicious programs do not normally have access to files in every location (e.g. the prompts to allow a process to access your Documents dir); there is hardware-backed crypto available for further protections; full disk encryption; and so on. It's unclear to me how to evaluate the severity of the risk.

Every security decision is a risk-reward tradeoff, and the reward of a complete memory of computing tasks seems pretty huge.


Genuinely so pleased with this! I'm a big fan of Rewind as a concept because, in general, I write down everything I do every day (helps with my poor memory, ADHD, and in general), and a tool that can record my online actions and facilitate search too is a game-changer for me (especially when it doesn't come with the hefty price tag).

Couple things -- 1) Do you ever plan on introducing shortcuts to toggle rem as opposed to using the menu-bar option?

2) What's storage like? Will rem's size keep getting larger and larger until I purge all? Or will I eventually get to select how long I want to 'remember' something (without manually having to 'start' and 'stop')?

3) One bug I've caught a few times is - once I toggle timeline and just slide back to view previous 'memories', >50% of the time my device won't let me exit out of the 'timeline' and I'll have to exit out of rem (cmd+opt+esc) and re-open it.

But all in all, very happy with this!


Very cool demo OP. Not sure why it's only for Apple Silicon - is it because of its superior ML support compared to Windows? Side observation: Ollama is not available for Windows. Sadly I won't be able to test this out since I don't own an Apple Silicon notebook; I only have an Intel Mac and a beefy Windows machine.

I don't know if I am just a basic programmer, or if I lack the idea of how folks go and build something like this from scratch with no Swift programming experience. If I was OP, I would first do a bunch of Swift tutorials.

This will be wishful thinking, but it is a legitimate side project to make a clone of this that works on Linux or Windows in the programming languages I am most comfortable in, Java and C#. I have zero background in building anything in ML, and am not at all familiar with the DirectX API or the Linux desktop APIs.

The point I am trying to make is that there is a crap ton of APIs and tools to be familiar with before even taking the step to code.

How did OP crack this with no experience in Swift? Is it simpler to build projects on Apple Silicon, and should I get one?

Mind you, I have 4 YOE and code in Java and C#, doing vanilla web APIs and a bit of WinForms/DevExpress work.


I have been in software for a while now. I just love learning and doing cool stuff.

As for being able to hack in Swift specifically... I'm comfortable in 5-6 languages, and play with many more. The language itself feels like a mix of C# and Kotlin to me. I had no familiarity with macOS APIs or SwiftUI / MVVM etc., but there's lots of docs, despite them having effectively no examples.

The repo has a lot of room to grow in terms of quality, which is perfectly ok in my book.

Re: Apple Silicon - I should have just said "I built it on my laptop which is an M1 Air and it's managing to keep up with the screenshot -> OCR -> ffmpeg rendering pipeline and not completely drain the battery and have a strong suspicion it will require a lot more work to get it to perform the same on Intel computers"

As for clone it and build it in Linux / Windows - do it. And there are other comments here suggesting others want to do the same.

I personally want this thing, if I can impact speed of it happening positively, awesome.

I want to record everything and have this nice big dataset which is what I've experienced (on my laptop) the last X amount of time, and be able to do stuff with it - whether it's chat with a local llm or have a really good way to search back in time. I _constantly_ need things from the past.


I suspect it's Apple Silicon only because there are simple APIs that Apple has provided to take screenshots of the desktop and OCR text from images. I don't think OP necessarily built any low-level code from scratch here (not that their provided code isn't useful).

Someone else has previously looked into how Rewind.ai may be doing its thing under the hood and there are more details about it here: https://kevinchen.co/blog/rewind-ai-app-teardown/ OP may have used some of the info there.
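
For reference, the built-in OCR route is tiny. A sketch using the Vision framework (rem's snippet elsewhere in the thread uses VisionKit's ImageAnalyzer instead; error handling omitted):

  import Vision

  // OCR a CGImage (e.g. from a periodic screen capture) with the system recognizer.
  func recognizeText(in image: CGImage) throws -> [String] {
      let request = VNRecognizeTextRequest()
      request.recognitionLevel = .accurate
      try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])
      return (request.results ?? []).compactMap { $0.topCandidates(1).first?.string }
  }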


I think it's Apple only because OP made this on an Apple M1 and that's all he intended to support.

As for OCR on Windows, there seems to be that functionality available on an OS level: https://medium.com/dataseries/using-windows-10-built-in-ocr-...

There's a Power Toy that uses it, which is where I found out about it. I have no idea how it compares to Live Text or how good it is.


You could set up screen sharing from any system that supports it, receive the stream on another system, and use LLMs to run object ("cat, piano") and text detection on the images and sound. Apple Silicon can do that, but it's not as efficient or cost-effective as a dedicated GPU. This approach would be system-neutral, since models work on (increasingly) any GPU, including Apple's. On your workstation, you'd just have the overhead (and concerns) of the video stream.


Cool stuff. Interesting to see how these ideas evolve, now with LLMs. I made a similar thing some time ago (>2 yrs): https://shkur.ski/chronocatch/ for Mac/Win (Intel, H264 for interframe compression, and BM25-ranked search). Then the war started, and I regret not sharing it back then "as is" when I could.


Really cool idea; I think this could open up a lot of possibilities. I'm just not sure what yet...

As for storage, what is the plan/expectation for preventing the DB from blowing up? After running the app for 30 minutes the DB was 32MB in size. Not huge, but over a day's worth (16hrs) or so of solid use it begins to creep up (~1GB):

  (32MB * 2) * 16 = ~1GB/day

Not sure how this would be feasible over the course of a year or even several months.


Surprisingly, this is something that I have been thinking about for the whole of 2023.

I myself am really bad at documenting findings while doing research or bugfixing, so I started recording all my daily activities, both for replaying research sessions and for my future self in case something is not clear in the docs.

Then I learned about Rewind and was happy to know that I am not alone. This rem is confirmation that this definitely has great use cases :)

I'd prefer the recording phase to be as lightweight as possible, so I am recording the full mp4 video and plan to re-encode at a lower rate at night. But there is a compromise between recording quality and file size; I do not want to end up with several petabytes of videos.

What codec do you recommend for this use case? Lossy video codecs are usually very efficient for real images (much like the jpg/png comparison), and I am sure a video format that is PNG-based would be more space-efficient for screen content while preserving text quality.

I am very interested in reading your thoughts about this.


I used h264_videotoolbox, which is supposed to be efficient on Apple hardware. I'd like to get hevc_videotoolbox working.

rem does OCR in memory before streaming to ffmpeg. But it works on the screen grabs of the video anyway.

Yeah, it's a pretty different use case than other video. Curious too if there are "screen recording optimized" codecs.

Like non-contiguous diffing: instead of "diff from last frame", "diff from frame X" - and/or some sort of quad-tree hash lookup.
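
For what it's worth, the non-ffmpeg route on macOS would be AVAssetWriter, which uses the same VideoToolbox hardware encoders. A rough sketch, not what rem currently does (the dimensions and 0.5fps timing are placeholders):

  import AVFoundation

  // Write sparse screenshots straight into an HEVC .mp4.
  let writer = try AVAssetWriter(outputURL: URL(fileURLWithPath: "/tmp/rem-chunk.mp4"), fileType: .mp4)
  let input = AVAssetWriterInput(mediaType: .video, outputSettings: [
      AVVideoCodecKey: AVVideoCodecType.hevc,  // hardware-encoded on Apple Silicon
      AVVideoWidthKey: 2880, AVVideoHeightKey: 1800,
  ])
  let adaptor = AVAssetWriterInputPixelBufferAdaptor(assetWriterInput: input, sourcePixelBufferAttributes: nil)
  writer.add(input)
  writer.startWriting()
  writer.startSession(atSourceTime: .zero)

  // Stamp each captured frame 2 seconds after the previous one (0.5fps).
  func append(_ frame: CVPixelBuffer, index: Int64) {
      if input.isReadyForMoreMediaData {
          adaptor.append(frame, withPresentationTime: CMTime(value: index * 2, timescale: 1))
      }
  }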


Comments:

- Insanely useful with some changes.

- Needs local llama support for privacy.

- Needs pause/record functionality, ideally w/ preset exclusions, again privacy.

- If this could evaluate in real time at some point and start intelligently adding value, it has the chance to change things.

My guess is that in 10 years this will seem absolutely archaic. Now, it feels a bit like magic.


Thanks for the feedback! You can start / stop remembering whenever you want.

As far as real time stuff and local llama- absolutely, on the roadmap.

I’ve been exploring / experimenting with embedding spaces and local models a lot.


Can anyone suggest a linux equivalent of this project? X or Wayland - doesn't matter to me?


I have been recording what I type or copy, and the window titles of applications I interact with, for the past 15 years. And it has helped recover stuff that wouldn't have been possible without this system.

I recently switched to macOS, and I'm missing this very much.


I am building this exact app you're using on Windows, but for macOS - I love hearing that you're also a fan of screen recording!


I am basically running a keylogger. Not screen recording per se.


Mind sharing the name of the app you use for recording on Windows?


https://github.com/XEonAX/Kiilogger - I am the author. BTW it might get detected nowadays; I had to add an exception to Windows Defender a few years back.


Serious question: I have a serious case of OCD where I keep trying to remember things verbatim (the verbatim part is the OCD). Naturally there is a bunch of checking and repeating involved in trying to do so.

I have been considering the idea of using a similar app to this (or rewind.ai), but I have the concern that it might aggravate my situation. Just imagining my checking self watching 12 hours of video footage already gave me chills.

I would appreciate if anyone with a related or similar situation can share their experience using those apps. Since this is fairly sensitive, my email is also in the profile if anyone want to contact me directly.


There are folks on their Slack community who mention that Rewind.ai has been invaluable for managing their ADHD. Perhaps if you join their Slack [1], you might be able to meet people in a similar situation to yours (with your type of OCD)?

1: https://rewind.ai/community


You’ll end up with inception levels of watching yourself watching yourself…


I wonder if there's a way to leverage this application to create a user profile while keeping the data locally (storing, processing, etc.), just for the user to know _what_ social media companies know (or think they know) about the user.

If this application monitors, stores, and analyses social media presence, email, etc., could it present to the user a profile similar to what Google has for the user?

For example, would be interesting to know how Spotify or Netflix sees me in technical and/or social terms.

This idea for such application comes from Yuval Harari.


Pretty interesting stuff.

I'm just wondering how you manage the limitation of context length.


For the "copy recent context"?

The last 15 frames.

It's a terrible approach! But I had to start somewhere. Actively experimenting with properly leveraging embedding search.

But I've had a hard time finding CPU + RAM efficient vector indexing + search that meets my expectations. Been doing a lot of personal experimentation and research in this space.

Is there a known approach to be able to maintain a large embedding space that you can insert into efficiently and search accurately without needing to load / maintain the entire thing into memory?


Have you tried using the Accessibility API instead of (or alongside) taking screenshots? It won't work with all apps, but you can fall back to OCR when it doesn't, and best of all you can monitor the "DOM" for changes.
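
The basic shape looks something like this (a sketch: it requires the user to grant the Accessibility permission, and attribute support varies a lot between apps):

  import AppKit
  import ApplicationServices

  // Read the focused window's title of the frontmost app via the AX tree.
  func focusedWindowTitle() -> String? {
      guard let app = NSWorkspace.shared.frontmostApplication else { return nil }
      let axApp = AXUIElementCreateApplication(app.processIdentifier)
      var window: CFTypeRef?
      guard AXUIElementCopyAttributeValue(axApp, kAXFocusedWindowAttribute as CFString, &window) == .success
      else { return nil }
      var title: CFTypeRef?
      AXUIElementCopyAttributeValue(window as! AXUIElement, kAXTitleAttribute as CFString, &title)
      return title as? String
  }
  // For the "monitor for changes" part, AXObserverCreate can subscribe to
  // notifications like kAXValueChangedNotification instead of polling.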


Candidly, I don't know how to do this effectively, especially with browsers. I looked into this approach using the notification pattern, but I just couldn't see a good way to do it. I'm no expert in Mac APIs and would love to learn and / or see any specific approaches you have in mind!


Used to do this several years back, but on a Windows machine and without any of the AI stuff, obviously. One use case I found is tracking down unpredictable and seemingly randomly occurring bugs, since you can rewatch the events leading up to the bug and form better hypotheses about what might reproduce it.

Eventually I had to stop because the fan was going crazy, plus I couldn't bear seeing how slow and error-prone I was at typing and at generally operating the computer (it never felt that way when I was using the computer, but watching myself use it is a different story).


I am building this exact software for exploratory bug-testing. What have you been replacing it with since you last used it on Windows? I think I tweaked the recording aspect to be super clean, and CPU/memory impact is minimal now (1%).


Haven't found any replacement, but dashcam.io looks very promising for that use case, will definitely be checking it out!


Another obvious option is to just access the browser's History file and request and store the contents of each visited page. This saves you from needing to do OCR, and the result is more highly compressible. Or do your method, but throw away the screenshots after AI analyzes and OCRs them. BTW, Mistral 7B is good enough! We don't need to rely on GPT-4, IMO, and copy-pasting context is a bit sloppy.


Yeah that works well for browser stuff, but this works with IDEs etc too

and totally. Haven’t added direct local interaction yet, but on the roadmap.


I wanted to build a similar tool that just relied on browser history, but I couldn't figure out any way to do it (especially not through browser extensions).

If anyone has any suggestions, I'd be more than grateful.


Interesting concept; however, I don't get what information is pasted into the context. Also, ChatGPT's context is kinda limited. I can probably remember the recent context myself; what I have a problem with is context from, let's say, a week ago, which would probably be way over the LLM's context window.


Admittedly, it might have been a mistake as a demo / feature, but haven’t built embedding support yet. Working on it!


You should also post this to https://www.reddit.com/r/LocalLLaMA/, since it may be useful with local LLMs.


Feel free! I don't want to spam / market too much. A single post on HN and putting on my personal twitter seems like a good amount.

Maybe once we actually add support for interacting with local llms directly.


> add support for interacting with local llms directly.

If that’s a WIP or even just on the ToDo list, you could post a request to that subreddit asking if anyone is interested in helping implement it. That’s not too spammy, and you’ll get reply notifications you wouldn’t see if I posted it.


I use the Shortcuts app on the iPhone to create a widget that, on a button press, records a dictation, transcribes it, and appends it to a note together with a timestamp.


Anyone else mentally associate REM with QBASIC?

https://www.qbasic.net/en/reference/qb11/Statement/REM.htm


I associate it with Commodore 64 BASIC V2


bat/cmd scripts for me

https://ss64.com/nt/rem.html


I only half remembered it and had a vague feeling it was familiar. It's been a while. Realizing now, BASIC is pretty weird.


This looks super-interesting! I haven’t seen the questions yet scrolling through a number of comments, so:

- how much disk storage does this use, say per hour of typical computer use

- how much CPU/battery life impact does it have


- Disk storage: depends what you're doing, but seems to be about 150MB / hour. This is WAY too much in my book. I think we could get this way down. I'd love help here too :D

- Battery: candidly, haven't done extensive testing in that department, but someone posted an article in a sibling comment (https://kevinchen.co/blog/rewind-ai-app-teardown/) and it said "Overall, running Rewind reduces my battery life by about 20-40 percent." That's way too much imo, and I assume `rem` is in that ballpark as, based on this article, it's doing pretty similar stuff - though with way fewer accesses to disk, since I don't write screenshots to disk, just stream directly to mp4.

But - I built it in a few days and haven't spent time on optimization. This can be clearly seen in the binary size too. First build I released was 165MB, and after I took a step back, fixed the build process, and built a custom build of ffmpeg, it's now 3MB. Seeing as Rewind is a 211MB dmg, they aren't doing a ton of optimization either, so I think there's tons of room for improvement.


Cool! I would be interested to hear what Apple Silicon specific features this uses. Is there some sort of image processing feature that Apple CPUs offer that is being leveraged?


I'm curious as to why you chose to turn the screenshots into a video. What are the benefits of storing them like that instead of as image files?


Dramatically smaller size on disk. Video codecs represent frames as diffs against previous frames. Think about a 2-minute video of someone reading an article online. Then think of 60 screenshots of someone reading that article over 2 minutes. The 60 screenshots are likely ~15-30MB. The video is probably 3MB or less, and that's without doing much of anything. Any time the user is idle is essentially free in a video; in individual images, it wouldn't be.


How much more complex is it to go through the images as frames in a video? Is all the processing (like OCR) done before it's added to the video or is that something you can do on each frame?


As this seems to generate quite a lot of positive feedback, what would be use cases for something like this? Asking not only OP.


At one point I was considering building something similar for myself. Basic idea was something like: Take one screenshot every second, caption the image somehow and keep both things around forever. Add in some adapters that can extract more information (if the browser was active last minute, gather all URLs from that minute and categorize, and so on with different things) and put everything into one location.

Purpose for doing this would be to get a database I can search/query when I kind of know what I'm looking for but cannot remember exactly what it was. Being able to query "show me all websites I've never visited before, but visited for the first time in week 35" would help me a lot to find those more easily.

Also just having a recorded log of everything I'm doing would be helpful to see where I'm spending my time the most.


An easy one for me is programming.

This kind of approach is the only way I know of to go back in time and recognize / resurrect your thought process.

But there are little thorns it solves all over. Ever experienced knowing you did something X days ago, but it's in the past and there's literally no way to go back and look at it? Ideally, it solves that.

Version control / history is great if the app supports it, but depending on how it works, "a month ago" might not be available.


Does it do inter-frame compression at all?

Also, integrating with Ollama.ai or some other local LLM with an API server would be fantastic.


I'd love your opinion on the right way to do this! Being able to call APIs means network permissions, which I was trying to avoid. Maybe via Stdin / Stdout?
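
To sketch the Stdin/Stdout option (the binary path and prompt protocol here are placeholders, not a real integration):

  import Foundation

  // Spawn a local LLM binary and talk to it over pipes - no network entitlement needed.
  let process = Process()
  process.executableURL = URL(fileURLWithPath: "/usr/local/bin/local-llm")  // hypothetical binary
  let stdinPipe = Pipe(), stdoutPipe = Pipe()
  process.standardInput = stdinPipe
  process.standardOutput = stdoutPipe
  try process.run()

  stdinPipe.fileHandleForWriting.write(Data("Summarize my last hour of screen text:\n".utf8))
  stdinPipe.fileHandleForWriting.closeFile()
  let reply = String(data: stdoutPipe.fileHandleForReading.readDataToEndOfFile(), encoding: .utf8)
  print(reply ?? "")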


Jason - great work here. Your Swift code looks like mine :) On this: some folks in the UK have created Crux, an interesting abstraction layer for mobile apps using Rust. Might provide some ideas for optimisation/IPC. https://github.com/redbadger/crux


Were you trying to avoid network permissions (I'm guessing) because this is Docker? (That's the only reason off the top of my head for wanting to avoid network access... in a non-Docker context, localhost is of course easy to hit up, but Docker and networks are a PITA.)


Are there instructions on how to launch the app? I’m able to clone the repo but a bit lost on next steps


Open it in Xcode and create an archive, or run it directly.

Or you can use the release I uploaded.

I added instructions for how to use it once it’s open in readme.

Apologies for anything unclear!


Imagine raising a $1B + valuation just to have some random guy on HN make an open source version of your company...

VC economics are going to need to change with AI and I think many haven't got the memo.


How is the latency, impact on battery?

Would an iPhone version be possible?


Maybe in 5 years Apple will release a native version of this.


It's not going to take 5 years… seems like a natural addition to the existing Time Machine feature.


It's interesting that, as someone who seems to care about privacy and security, you would use a closed source web browser (Arc Browser).


Now I no longer need to wonder who Rem is.


Can also see this being used by scammers/malware. Not saying it shouldn’t exist. It’s really cool. Just scary. Great job.


Fwiw, it requires you to explicitly give it permission to record your screen. It would also require you to explicitly give it permission to use the network if it needed to make any requests.

I’m super glad about this personally.


I think op is referring to a similar attack vector used in the recently presented "triangulation exploit", wherein attackers used iOS's stored data from its own local machine learning engine (which classifies photos using object recognition and stores text from images with OCR) to prioritise which photos from a victim's phone had content of interest to them.

Seems a legitimate concern; unsure why op is receiving negative attention for saying so.


Precisely, although I'll come clean: I didn't know about that exact triangulation exploit, just the fact that nicely organised historical data of all actions is there somewhere to be found. Especially if this type of software starts getting all the modern AI magic on top, it could be used by everyone as standard tooling, and perhaps become a target.


cool concept, love the idea. Might be fun to integrate with local llama to get the most privacy


100%, and local embeddings. This is the area I want to explore next.

The demo I showed with ChatGPT works just as well with openhermes2.5-mistral, but is instant with ChatGPT instead of taking ~20s.


(mac only)


This is really awesome


Thank you! I hope it can become more awesome and be useful to people.


I think my employer does it for me anyway lol



