My immediate reaction to today's news that Splunk was being acquired was to comment in the HN discussion for that story:
"I hated Splunk so much that I spent a couple days a few months ago writing a single 1200-line Python script that does absolutely everything I need in terms of automatic log collection, ingestion, and analysis from a fleet of cloud instances. It pulls in all the log lines, enriches them with useful metadata like the IP address of the instance, the machine name, the log source, the datetime, etc. and stores it all in SQLite, which it then exposes to a very convenient web interface using Datasette.
I put it in a cronjob and it's infinitely better (at least for my purposes) than Splunk, which is just a total nightmare to use, and can be customized super easily and quickly. My coworkers all prefer it to Splunk as well. And oh yeah, it's totally free instead of costing my company thousands of dollars a year! If I owned CSCO stock I would sell it-- this deal shows incredibly bad judgment."
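To make the enrich-and-store step described in that comment concrete, here is a minimal sketch. It is not the code from the actual script: the function names, the `log_entries` table, and the assumption that every line starts with an ISO 8601 timestamp are all made up for illustration.

```python
import sqlite3
from datetime import datetime


def enrich_line(raw_line, instance_ip, machine_name, log_source):
    """Attach host metadata and a parsed timestamp to one raw log line.

    Assumption for this sketch: the line starts with an ISO 8601
    timestamp, followed by a space and then the message.
    """
    timestamp_text, _, message = raw_line.strip().partition(" ")
    return {
        "logged_at": datetime.fromisoformat(timestamp_text).isoformat(),
        "instance_ip": instance_ip,
        "machine_name": machine_name,
        "log_source": log_source,
        "message": message,
    }


def store_entries(conn, entries):
    """Create the (hypothetical) log_entries table if needed and insert rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS log_entries (
               logged_at TEXT, instance_ip TEXT, machine_name TEXT,
               log_source TEXT, message TEXT)"""
    )
    conn.executemany(
        """INSERT INTO log_entries
           VALUES (:logged_at, :instance_ip, :machine_name,
                   :log_source, :message)""",
        entries,
    )
    conn.commit()
```

If the rows were written to a file such as `logs.db` instead of an in-memory database, running `datasette logs.db` should be enough to browse and query them from a web interface.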
I had been meaning to clean it up a bit and open-source it but never got around to it. However, someone replied to my comment today asking whether I had released it, so I figured now would be a good time to go through it, clean it up, move the constants to a .env file, and create a README.
This code is obviously tailored to the requirements of my own project, but if you know Python, it's straightforward to customize it for your own logs. Some of the log sources are generic anyway, like systemd logs and the output of netstat/ss/lsof, which it combines to get a table of open connections by process over time for each machine (extremely useful for finding code that is leaking connections!). I also included the actual sample log files from my project that correspond to the parsing functions in the code, so you can easily reason by analogy to adapt it to your own log files.
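As a rough illustration of the connections-by-process idea, the sketch below counts sockets per process from the text output of `ss -tnp`. It is an assumption-laden simplification, not the script's actual parser: it looks only at `ss` (the real script combines netstat, ss, and lsof), the function name is hypothetical, and it captures only the first process listed for each socket.

```python
import re
from collections import Counter

# Matches the process annotation that `ss -tnp` appends to a socket line,
# e.g. users:(("sshd",pid=812,fd=4)). Only the first process is captured.
PROCESS_PATTERN = re.compile(r'users:\(\("([^"]+)",pid=(\d+)')


def count_connections_by_process(ss_output):
    """Return a Counter mapping (process_name, pid) to open socket count."""
    counts = Counter()
    for line in ss_output.splitlines():
        match = PROCESS_PATTERN.search(line)
        if match:  # header lines and sockets without process info are skipped
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts
```

Storing one such snapshot per cron run, tagged with the machine name and the collection time, would give a table of connections by process over time. A count that keeps climbing for the same process is the kind of pattern that points to a connection leak.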
As many people pointed out in responses to my comment, this is obviously not a real replacement for Splunk for enterprise users who are ingesting terabytes a day from thousands of machines and hundreds of sources. If it were, hopefully someone would be paying me $28 billion for it instead of me giving it away for free! But if you don't have a huge number of machines and really hate using Splunk while wasting thousands of dollars, this might be for you.
I encourage everyone to share their "Splunk in 1kloc of Python" projects! Some of my own:
- https://github.com/rollcat/judo is Ansible without Python or YAML
- https://github.com/rollcat/zfs-autosnap manages rolling ZFS snapshots