Cheap and DIY solution for log analysis
6 points by unrequited 6 days ago | 2 comments
I have ~5TB of logs. I’m open to ideas for analyzing them with any of the available AI models, mainly for learning purposes. I’m on a budget but have time to work on something DIY. Please suggest ideas for anomaly detection or similar things to try on these logs. Thanks.





5TB of log data is a lot of input for any model, even if you cleaned it up first, which would also take time.

I think it's probably more feasible to sort by type or category. Maybe use something like Kibana or Graylog so you can better visualize the logs and narrow down what's an IOC and what might just be a random error message. This also lets you look at the types of logs over a time period.
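
Before setting up a full dashboard, the "sort by type and look over a time period" idea can be tried with a few lines of Python. This is only a sketch: it assumes a hypothetical line format of `<ISO timestamp> <LEVEL> <message>`, which your logs may not follow, and the function name is made up for illustration.

```python
from collections import Counter

def count_by_hour_and_level(lines):
    """Count lines per (hour, level) bucket.

    Assumes a hypothetical format: '<ISO timestamp> <LEVEL> <message>'.
    Lines that do not fit are counted under ('unparsed', 'unparsed').
    """
    counts = Counter()
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) < 2:
            counts[("unparsed", "unparsed")] += 1
            continue
        timestamp, level = parts[0], parts[1]
        hour = timestamp[:13]  # 'YYYY-MM-DDTHH'
        counts[(hour, level)] += 1
    return counts

# Made-up sample lines, for illustration only.
sample = [
    "2024-01-01T10:05:00 INFO service started",
    "2024-01-01T10:17:42 ERROR connection refused",
    "2024-01-01T11:01:09 ERROR connection refused",
    "garbage",
]
hourly = count_by_hour_and_level(sample)
for (hour, level), n in sorted(hourly.items()):
    print(hour, level, n)
```

Since it only needs one pass and a small counter, the same function can be fed a file object instead of a list, so the data never has to fit in memory.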

Any ML or AI model would be computationally expensive, and if you don't have the hardware to self-host, you would also need to upload 5TB of logs.


A few random thoughts since no one replied:

(rip)grep. Use AI to suggest what to look for if you want to use AI. Maybe do it in reverse, so you filter out the logs you aren't interested in.
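
The "in reverse" variant is the same idea as `grep -v`: list what you consider boring and keep everything else. A minimal Python sketch, where the patterns and sample lines are hypothetical and would have to come from looking at your own logs:

```python
import re

# Hypothetical "boring" patterns, for illustration only.
BORING = [
    re.compile(p)
    for p in (r"\bDEBUG\b", r"health[-_ ]?check", r"GET /favicon\.ico")
]

def keep_interesting(lines, boring=BORING):
    """Yield only the lines that match none of the boring patterns."""
    for line in lines:
        if not any(p.search(line) for p in boring):
            yield line

# Made-up sample lines.
sample = [
    "DEBUG cache warmed",
    "INFO GET /favicon.ico 404",
    "INFO healthcheck ok",
    "ERROR disk /dev/sda1 is 98% full",
]
interesting = list(keep_interesting(sample))
print(interesting)
```

Because it is a generator, it can stream over a file line by line. You grow the pattern list as you go until what remains is small enough to read.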

Look at the similarity of each line. Working on raw UTF-8 or ASCII may not be good enough, though it can quickly highlight some interesting lines. Perhaps a good tokenizer can help, or even language model embeddings. You can play with classic text similarity algorithms, cosine similarity, and the like.
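
To make the cosine similarity suggestion concrete, here is a small sketch over bag-of-words token counts. The tokenizer is a deliberately naive assumption (lowercase words and digit runs); a real tokenizer or embeddings would replace it.

```python
import math
import re
from collections import Counter

def tokenize(line):
    """Naive tokenizer (an assumption): lowercase words and runs of digits."""
    return re.findall(r"[a-z]+|\d+", line.lower())

def cosine_similarity(a, b):
    """Cosine similarity between the token-count vectors of two lines."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one line has no tokens
    return dot / (norm_a * norm_b)

# Made-up lines: the first two differ only in the user id.
close = cosine_similarity("user 42 logged in", "user 97 logged in")
far = cosine_similarity("user 42 logged in", "disk quota exceeded")
print(close, far)
```

Lines that score low against everything else you have seen are candidates for a closer look.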

Play with dimensionality reduction and clustering algorithms like UMAP and HDBSCAN (or whatever the state of the art is, I haven't looked at the field recently).

Feeding a chat/instruct LLM 5TB of logs is technically possible, but that would be a huge waste of resources IMHO. Is it worth it? You could feed it only the lines that look unusual after filtering.
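
One cheap way to pick "unusual" lines before involving an LLM is to collapse variable parts (numbers, hex values) into placeholders and keep only lines whose resulting template is rare. This is a sketch of that idea, not an established tool; the masking rules and the `max_count` threshold are assumptions to tune.

```python
import re
from collections import Counter

def template(line):
    """Collapse variable parts so similar lines share one template."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
    return re.sub(r"\d+", "<NUM>", line)

def unusual_lines(lines, max_count=1):
    """Return lines whose template occurs at most max_count times.

    Takes a list because it makes two passes over the data.
    """
    counts = Counter(template(line) for line in lines)
    return [line for line in lines if counts[template(line)] <= max_count]

# Made-up sample lines.
sample = [
    "request 1 served in 12 ms",
    "request 2 served in 15 ms",
    "request 3 served in 9 ms",
    "kernel panic at 0xdeadbeef",
]
rare = unusual_lines(sample)
print(rare)
```

For data that does not fit in memory, the first pass can stream over the files to build the counter and a second streaming pass can emit the rare lines.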

Let's say you have hardware and an LLM that can process 100 tokens/s: assuming roughly 4 bytes per token, 5TB is about 400 years of compute.
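
The back-of-the-envelope estimate can be reproduced as below. The 4 bytes per token figure is an assumption (a common rule of thumb for English-like text); log lines may tokenize quite differently.

```python
total_bytes = 5 * 10**12       # 5 TB, decimal
bytes_per_token = 4            # assumption: rough average, varies by tokenizer
tokens_per_second = 100        # throughput from the comment above
seconds_per_year = 365.25 * 24 * 3600

tokens = total_bytes / bytes_per_token
years = tokens / tokens_per_second / seconds_per_year
print(f"{tokens:.3g} tokens, about {years:.0f} years")
```

Under these assumptions the result lands just under 400 years, which is why filtering first matters far more than the choice of model.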



