I think it's better to have the AI write scripts that extract the required data from logs, rather than shoving the entire log content directly into the AI.
For example: I had Claude analyze the hourly precipitation forecasts for an entire year across various cities. Claude saved the API results to .csv files, then wrote a (Python?) script to analyze the data and output only the 60-80% expected values. This avoided putting every hourly data point (8,700+ hours in a year) into the context.
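As a rough sketch of what such a script might look like (the column name `precip_mm` and the nearest-rank percentile method are my assumptions, not the actual script):

```python
# Hypothetical sketch: summarize an hourly precipitation CSV so only a few
# percentile values reach the model's context instead of 8,700+ raw rows.
import csv
import statistics


def percentile(values, p):
    # Simple nearest-rank percentile; good enough for a summary script.
    values = sorted(values)
    k = max(0, min(len(values) - 1, round(p / 100 * (len(values) - 1))))
    return values[k]


def summarize(path):
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    # "precip_mm" is an assumed column name for illustration.
    precip = [float(r["precip_mm"]) for r in rows]
    return {
        "hours": len(precip),
        "p60": percentile(precip, 60),
        "p80": percentile(precip, 80),
        "mean": statistics.mean(precip),
    }
```

The AI only ever sees the small dict that `summarize` returns, not the raw hourly data.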
Another example: at first, Claude struggled to extract a very long AI chat session to MD, so it only returned summaries of the chats. Later, after I installed the context mode MCP[1], Claude was able to extract the entire chat session verbatim, including all tool calls.
I recall I enjoyed Hoplite, Data Wing, Mini Metro, Super Mario Run, and a few others.
---
You probably already know Apple Arcade curates a set of games. Many of the "plus" versions of games have the ad/loot-box features stripped out or set to "free."
A note on Super Mario Run: when it first came out, I tried to play it on an airplane and it didn't work. There was some sort of phone-home check on launch to make sure it was a legitimate copy. When it couldn't perform this check, the game wouldn't load.
Things could have changed since then, as this was many years ago, but something to look out for and check if this is a concern.
I think it's possible to architect around this. For example, here is one idea:
- make the game as functional as possible: the game state is stored in a serializable format, and new game states are generated by combining the current game state with events (player input, clock ticks, etc.)
- the serialized game state is much more accessible to the AI because it is in the same language AI speaks: text. AI can also simulate the game by sending synthetic events (player inputs, clock ticks, etc)
- the functional serialized game architecture is also great for unit testing: a text-based game state + text-based synthetic events results in another text-based game state. Exactly what you want for unit tests. (Don't even need any mocks or harnesses!)
- the final step is rendering this game state. The part that AI has trouble with is saved for the very end. You probably want to verify the rendering and play-testing manually, but AI has been getting pretty decent at analyzing images (screenshots/renders).
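The steps above can be sketched in a few lines. This is a minimal illustration of the idea, not a real engine; the event types and state fields are made up:

```python
# Minimal sketch of the "functional game state" idea: state is a plain dict
# (trivially serializable to JSON), and a pure reducer combines the current
# state with an event to produce the next state.
import json


def reduce_state(state, event):
    # Pure function: never mutates the input state.
    new = dict(state)
    if event["type"] == "move":
        new["x"] = state["x"] + event["dx"]
    elif event["type"] == "tick":
        new["clock"] = state["clock"] + 1
    return new


initial = {"x": 0, "clock": 0}
events = [{"type": "move", "dx": 3}, {"type": "tick"}]

final = initial
for e in events:
    final = reduce_state(final, e)

# Text in, text out: ideal for unit tests, and an AI can "play" the game
# by feeding synthetic events and reading the serialized result.
serialized = json.dumps(final, sort_keys=True)
```

Because the state and events are both plain text, a unit test is just: this state plus these events equals that state. No mocks needed.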
That's true - state serialization can definitely help.
> AI has been getting pretty decent at analyzing images (screenshots/renders).
I've found AI to be hit-or-miss on this, especially if the image is busy with lots of elements. They're really good at ad-hoc OCR, but struggle more with 3D visualizations in a game that might be using WebGL.
For example, setting up the light sources (directional, lightmaps, etc) in my 3D chess game to ensure everything looked well-lit while also minimizing harsh specular reflections was something VLMs (tested with Claude and Gemini) failed pretty miserably at.
Seems like this one is Windows-only (even though it's Tauri?)
And it's not local (uses a cloud-based transcription API)
It doesn't seem to be real-time streaming, either. For the most connected typing experience, try showing results within a second of the first word spoken (not only after the utterance is complete).
Streaming transcription is something I’m working on. The main challenge so far has been accuracy. Streaming models, especially cloud ones, often drop enough quality that the tradeoff isn’t always worth it. Local models look more promising, so streaming will likely land there first.
On multimodal input, the UX you’re prototyping where you switch between dictating and typing while composing is interesting. I haven’t really seen that approach before.
The direction I took is a bit different. Instead of mixing modalities mid-composition, dictation becomes context-aware during post-processing. Selected/Copied text or surrounding field content can be inserted into the post-processing prompt so the spoken input is interpreted relative to what’s already on screen.
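A rough sketch of what I mean by context-aware post-processing (the prompt wording and function name are illustrative, not my actual implementation):

```python
# Hypothetical sketch: splice selected text or surrounding field content into
# the post-processing prompt, so the model interprets the dictated input
# relative to what's already on screen.
def build_postprocess_prompt(transcript, surrounding_text=None, selection=None):
    parts = ["Clean up this dictated text. Fix punctuation and obvious mis-hearings."]
    if selection:
        parts.append(f"The user currently has this text selected:\n{selection}")
    if surrounding_text:
        parts.append(f"Text already in the field:\n{surrounding_text}")
    parts.append(f"Dictated input:\n{transcript}")
    return "\n\n".join(parts)
```

The on-screen context always precedes the dictated input in the prompt, so the model reads the dictation in light of it.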
I tested something similar, and continuous re-transcription was the only way I could get close to batch-level accuracy.
In my current implementation I’m fairly aggressive with it. I don’t rely much on streaming word confidence. Instead I continuously reprocess audio using a sliding window. As new audio comes in, it’s retranscribed together with the previous segment so the model always sees a longer context.
That recovers a lot of the accuracy lost with streaming, but the amount of retranscription makes it hard to justify economically with cloud APIs. That’s why I’m focusing on a local-first approach for now.
So validating your idea before building is better, but there is an even more "backwards" way:
You're still assuming people will be interested in one of your ideas. There is far from 100% chance of that.
To increase this chance closer to 100%: ask people what they are interested in. "Extract" the #1 problem shared by at least 10 people/businesses (that would be worth paying at least $50/month to fix). Then offer a solution to this problem.
> There are three types of problems: 1. hair-on-fire problem, 2. 2nd biggest problem, 3. everything else
Yes exactly. I read OP's post and thought, why would I use this? If I really cared I'd just copy paste links to my projects or even build a small website. It doesn't even solve any problem I have, I never struggled to share projects.