Ask HN: How do you log application events?
221 points by GaiusCoffee on Apr 27, 2015 | 110 comments
We are currently inserting our logs in an sql database, with timestamp, logType, userId, userAgent and description columns. It makes it trivial for us to debug any event by just querying the db. However, after three and a half years of continued use, the table is now way too large.

How do you guys log application events in such a way that extracting information from it is easy, but still keep the size of the logs manageable?

Ehm, the contrast between my answer and everyone else's here makes me feel surprisingly greybearded, but...

Application logging has been a solved problem for decades now. syslog or direct-to-disk in a reasonable format, let logrotate do the job it's faithfully done for years and let the gzipped old files get picked up by the offsite backups that you're surely running, and use the standard collection of tools for mining text files: grep, cut, tail, etc.

I'm a little weirded out that "my logs are too big" is still a thing, and that the most common answer to this is "glue even more complexity together".

100% agree. Don't try to build your own when there are some excellent free (and commercial) ones that are battle-tested.

grep, cut, tail, etc. work quite well if you're working on a single machine or small number of machines.

The "ELK stack" (ElasticSearch, Logstash, Kibana) is a step up in complexity but gives you much more power than command line tools.

There are also some great commercial solutions that abstract away some of that complexity if you don't feel like rolling your own (Scalyr, Splunk, SumoLogic, etc.).

But regardless of the path you take, don't reinvent the wheel!

(Disclosure - I work for a company that provides one of the commercial solutions above - https://scalyr.com)

What? You just said you agree and then said that complex solutions are worth it because they're more powerful, which is the opposite.

Unstructured logging, while easy for application developers and ops people, is a major source of headache for data scientists/engineers.

I am a big proponent of JSON-based, semi-structured logging. Most of log data today can be parsed with reasonable rigor at source, and doing this before shipping data to the backend saves so much future agony.
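To make the JSON-at-source idea concrete, here is a minimal sketch using Python's standard `logging` module. The field names (`ts`, `level`, `fields`) and the `make_record` helper are my own assumptions for illustration, not anything from the linked post.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: structured fields travel on record.fields.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def make_record(message, fields):
    """Build a LogRecord by hand so the formatter can be shown in isolation."""
    record = logging.LogRecord("app", logging.INFO, "example.py", 0, message, None, None)
    record.fields = fields
    return record


line = JsonFormatter().format(make_record("login ok", {"userId": 42}))
print(line)
```

In a real app you would attach the formatter to a handler and pass the fields via `extra={"fields": {...}}`; the point is only that the parsing happens once, at the source.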

I recently blogged about this here: http://radar.oreilly.com/2015/04/the-log-the-lifeblood-of-yo...

That was a good read (I wish there was more like it posted to HN). I think there's something to be said though for not adding complexity before it's needed, which isn't really covered in your article. You assume you're writing for an audience that needs to process transactions and events for thousands of servers, but I'd bet that's not the case for most of the people reading your article.

That kind of fractal complexity -- complexity added to every single layer of software architecture -- is a real pain in smaller environments where there's just one guy responsible for figuring out what the heck just went pear-shaped in the application. (Although, the real world problem I've had more often, as that guy, is nonexistent logs...)

Ideally, I'd like to see both purposes handled -- applications logging semi-structured JSON with a utility that's real-time translating the output to a human readable format, or, preferably, vice-versa, with applications logging unstructured data to a file that's then being munged into JSON (or whatever) for machine consumption.

I'm not such a fan of tools that are intended for humans to mine structured logs as though they were text files, because it's too easy to miss important data.

That's great if you have only one server, not so much if you want to search across many apps and servers at once. We use ELK, works fairly nicely.

Log rotation to a centralized log server solves that problem, although maybe at the loss of easy real-time access.

Fair enough, I haven't had to deal with complex multi-server application logging.

Your way still works. Syslog can ship over the network since forever.

It's a security pattern to have the logs shipped to another host in production anyway.

I did seriously consider suggesting that (it would be my default approach) but since I don't actually have any experience with big complex multi-server applications, I decided to keep my mouth shut.

There might be a good reason why they don't do it that way and I wouldn't have a clue.

Second that. I tried to use the ELK stack for application logs but it felt too windows-ish. It took a lot of mouse-clicking to find what I needed. I was happy at my previous job when we had a single log server which we ssh-ed to and got access to logs from all machines via NFS. It wasn't the fastest way to examine logs but it was very comfortable to use tools like grep, awk, sed and cut.

Log analytics is a big topic so I'll hit the main points. The approach you take to logging depends on the analysis you want to do after the log event has been recorded. The value of logs diminishes rapidly as the events get older. Most places want to keep the logs hot for a period ranging from a day to a week. After that, the logs are compressed using gzip or Google's Snappy compression. Even though they are in a compressed form they should still be searchable.

The most common logging formats I've come across in production environments are:

1. log4j (Java) or NLog (.NET)



Tools that I've used to search, visualize and analyse log data have been:

1. Elasticsearch, Logstash and Kibana (ELK) stack

2. Splunk (commercial)

3. Logscape (commercial)

Changes to the fields representing your data are expensive with the database approach because you are locked in by the schema. The database schema will never fully represent your understanding of the data. With the tools I've mentioned above you have the option to extract ad-hoc fields at runtime.

Hope this helps.

We're currently evaluating options, but for .NET Serilog is shaping up extremely nicely, and Seq/Logg.ly as log sinks are nice...

Seq is great because you can set up your own instance very near to your servers for low-latency/high-bandwidth logging, which really changes the game in terms of what you can feasibly (perf/financially) log. It also has some decent visualization options, and it's got some great integrations, with a decent plugin architecture to create your own real-time log processing code.

Logg.ly has some amazing GUI/search options.

We've been using Serilog/Seq and we're extremely happy with it. I'm a little surprised that you didn't mention the buzzword "Structured Logging", which is the special sauce that makes Serilog stand out. Instead of concatenating strings with values, you assign keywords to values which you can later search on. For example,

Log.Info("Customer# {customerNumber} completed transaction {transactionId}", customerNumber, transactionId);

Then using the Seq log viewer you can simply click on "transactionId" in the log line and filter by "transactionId = 456" or whatever. It's one of the most exciting advancements I've seen in the .Net logging world.

EDIT: I realized I didn't really answer OP's question regarding space. If you use Serilog, you can set up different sinks to export to, with different options. For example, you could send all your logs to mongodb, and just the most recent week of rolling logs to the Seq server.

When I've heard "structured logging" used, it has been in the context of explicit key-value pairs rather than just keywords next to values, e.g.

    Log.Info("customerNum={customerNumber} transactionId={transactionId} state=completed", cN, tID)
or the ever popular logstash-y format:

    Log.Info(LogState.Add("state", "completed").Add("customerId", customerId).Add("transactionId", transactionId));
where `LogState` would build up a key-value dict and its `ToString` would emit the logstash JSON format.

I guess the version that works best depends on the tool that is consuming the log text.
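For what it's worth, the key=value style is easy to emit and parse back with nothing but the standard library. This is a rough Python sketch; `kv_line` and `parse_kv` are hypothetical helper names, and the quoting rules are borrowed from `shlex` rather than from any particular log tool.

```python
import shlex


def kv_line(**fields):
    """Format fields as space-separated key=value pairs, quoting where needed."""
    return " ".join(f"{key}={shlex.quote(str(value))}" for key, value in fields.items())


def parse_kv(line):
    """Parse a line produced by kv_line back into a dict of strings."""
    return dict(token.split("=", 1) for token in shlex.split(line))


line = kv_line(customerNum=123, transactionId=456, state="completed")
print(line)
print(parse_kv(line))
```

Values come back as strings, so anything numeric has to be converted by whatever consumes the log.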

In the end, serilog, depending on the sink, makes your log look like the template, and attaches the meta data of your template variable names and replacement values to the message itself.

Are you able to use Serilog for metrics in addition to application events? I'm thinking something like average time for a method to execute, things like that. And if so, what tools do you use to comb through that data (to determine average execution times, for example).

Right now at work everyone just logs to a single CSV with an inconsistent format and it makes me cringe every time I look at it. It's also really difficult to parse.

I recently used the SerilogMetrics [1] NuGet package to determine the elapsed time between method calls. Worked great, although I couldn't figure out how to use my standard logging config that is carried in the static logging object and had to redefine the seq server I wanted those lines logged to in the class itself. This may have just been unfamiliarity on my part.

Your current way does sound like a headache. If your logging lines are in the standard NLog format, you should be able to drop in serilog without many changes.

[1] https://github.com/serilog-metrics/serilog-metrics

We've evaluated loggly, logentries, splunk, and Seq. The first 3 are fine depending on your logging needs. Seq can handle a TON of events thrown at it, and the latest stuff (~1 day old or so) is extremely accessible. The older stuff takes a little longer to search through though.

We're currently using Splunk (and may move to the ELK stack) for logging, but some types of "application events" are really more useful as metrics. We're using Ganglia for those metrics and limiting application logs to actions that are needed for audit purposes and for warning and error-level application problems.

Using a system like Ganglia (or the Etsy inspired statsd) is an important idea since the OP's original question included how to limit the size of logged data. These systems provide a natural way to aggregate data.

I'd recommend looking at Graylog. It uses Elasticsearch under the hood, surrounds it with an application that focuses on log management specifically. https://www.graylog.org/

Graylog is absolutely brilliant. Storing several TB of data in it, using it for alerting/monitoring (you can configure streams, which you can think of as constrained views of logging data), etc. Highly recommend it.

Is graylog free?


Do you really need to debug events from 3 and half years ago? Full logs only really need to stick around as long as you're likely to want to debug them. Log rotation is a must (I've seen debug logs nobody reads sitting in the gigabytes ...) Past that, you can cherry pick and store metadata about the events (e.g. X hits from userAgent Y on this day) with enough information you'll need to do trend analysis, although it's generally a good idea to keep backups of old full logs in case you need to reload the logs to find out that one thing you forgot to add to your metadata ... If you do genuinely need all of the data back that far, you should look at partitioning the data so you're not indexing over millions of rows - how you do that depends how you're intending on using the data.
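As a sketch of the "X hits from userAgent Y on this day" kind of metadata, something like the following Python would do the roll-up before the full rows are archived. The event tuples and the function name are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def daily_user_agent_hits(events):
    """Count events per (day, userAgent) pair."""
    counts = Counter()
    for timestamp, user_agent in events:
        counts[(timestamp.date().isoformat(), user_agent)] += 1
    return counts


# Hypothetical rows pulled from the log table: (timestamp, userAgent).
events = [
    (datetime(2015, 4, 26, 9, 0), "Firefox"),
    (datetime(2015, 4, 26, 17, 30), "Firefox"),
    (datetime(2015, 4, 26, 18, 0), "Chrome"),
    (datetime(2015, 4, 27, 8, 15), "Firefox"),
]
summary = daily_user_agent_hits(events)
for (day, agent), hits in sorted(summary.items()):
    print(day, agent, hits)
```

The summary is tiny compared with the raw rows, which is what makes it cheap to keep around for trend analysis.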

If the table is too large, make it smaller by moving some of the data to a colder, slower, store. Whether that's json documents or text file gzips on 6x USB drives, data retention is mostly cheap.

The question is, what information do you want available within the second about logs from a year ago? Aggregate that, and move the rest.

Elasticsearch is amazing. It lives up to the hype. It's perfect for rolling over logs, and they have lots of documentation on how to make it work just right.

Just as an example of how awesome Elasticsearch is, you can trivially segment your storage tiers (say, SSD versus HDD) and then easily move older data to other storage, with a single command.

They have a log-specific handler called Logstash, and a dashboard system called Kibana (which is sorta neat but the UI seems a bit laggy in my brief experience). Apparently some folks use Logstash/Elasticsearch to record millions and millions of events per day and ES does a great job.

If you want hosted, check out Stackify. I'm totally blown away with the product (no affiliation other than being a new user). You can send log info to them and they'll sort it all out, similar to Splunk, but not ridiculously priced and no dealing with terrible sales teams. But it gets better - they offer all sorts of ways to define app-specific data and metrics, so you can get KPIs and dashboards just adding a line or two of code here and there. It's a lot easier than running your own system, and it looks like it can make ops a ton easier.

Another hosted service is SumoLogic. I only used them for logging, but it seemed to work well enough.

Wow. Stackify looks really good.

At my current employer, we are currently implementing Splunk. It's taking forever and they charge an arm and a leg for their offering. I don't mind if a good product costs money but you shouldn't need a consultant on premise just to configure your logging solution.

I manage a 2.5 TB/day Splunk cluster at my current employer and can offer a few tips for making Splunk less painful to manage:

- Make frequent visits to answers.splunk.com. It has a very active community, and I've frequently been able to type "how do I do X in Splunk" into Google and found multiple answers on Splunk Answers.

- Deployment Server. Make friends with it. In a perfect world, it should hold your configurations for all Indexers, Heavy Forwarders, and Forwarders. If you're having to populate $SPLUNK_HOME/etc/system/local/ yourself, you're doing it wrong.

- Make friends with the "splunk btool config_file_name list --debug" command. That makes it dead simple to know which configuration options a Splunk install is running. Append "| grep -v system/default" on the end of that command to filter out the defaults and you'll more easily see which of your options are being used.

- If you have the cash, attend Splunk Conf and load your schedule up with presentations. It's worth every penny.

Hope that helps.

Second that.

We use Splunk for logging, and I wrote a custom app for fraud detection in the financial security field, with custom event correlation and alerting.

It's a very good tool for enterprise data analytics as well as for any custom dashboarding and event processing.

Installing and configuring Splunk to ingest and index data is close to being dead-simple.

A Splunk consultant may be needed if you have a complex enterprise deployment scenario or wish to develop really advanced apps - but configuring a logging solution?

It's a point and click exercise.

Yeah I don't get it either. Splunk is still losing money after many years. Their model seems to be "spend $2 to make $1". Compared with how arrogant the sales people are - one told me they really just don't care as there's so much demand for the product - something doesn't make sense.

It was cool software, a bit slow (this was several years back). But with things like Elasticsearch catching up release over release (even if you don't use it directly, other platforms will build on it), Splunk is no longer the totally unique thing they used to be. I can't figure out their $8bn+ market cap with revenue under 500M and costs increasing as they grow.

Site wouldn't load for me :-(

Is there a pricing page for Stackify? I've rooted round their site and cannot track one down. Just like Splunk...

I always have to Google it. http://stackify.com/errors-log-management/ Dunno why it's not more discoverable. Maybe they're doing tests, because it's underpriced IMO. I've not found any service like them - maybe New Relic but that's 10x the price, last I checked. Azure has something they're pushing in their UI, too, but a brief look made it seem nontrivial to implement.

Apart from that, Stackify is $15/server/mo for the server monitoring and app metrics, IIRC.


I used graphite and now I'm using influxdb for metrics, and on the other hand kibana+logstash+ES for logs.

With statsd and influxdb you can measure all the events in a database, it's pretty easy and you have statsd libraries in some languages. I measure all the events in my products, from response timings, database queries, logins, sign-ups, calls, all go to statsd.
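For anyone who hasn't seen it, the statsd wire format is just a short text line sent over UDP, which is why client libraries are so small. A minimal Python sketch, assuming a statsd daemon on the usual port 8125; the metric names and helper functions are hypothetical.

```python
import socket


def statsd_line(name, value, metric_type):
    """Build a statsd datagram: c = counter, ms = timing, g = gauge."""
    return f"{name}:{value}|{metric_type}"


def send_metric(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; assumes a statsd daemon listens on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


login_counter = statsd_line("app.logins", 1, "c")
query_timing = statsd_line("app.db.query", 37, "ms")
print(login_counter)
print(query_timing)
```

Because it is UDP, the app never blocks on the metrics backend, at the cost of possibly dropping a datagram now and then.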

Logs are good for debugging, but if you want to measure all events in your platform, statsd+influxdb+grafana are your best friends, and your managers will be happy with that ;-)

A few weeks ago I gave a talk about this, you can see the slides here + a few examples+ deploy in docker:


Regards ;-)

Do you use Logstash to push to statsd for software that doesn't support statsd?

I have a HAProxy instance I want to log into something like influxdb.

You can quite easily do that with Mozilla Heka [1] and a bit of Lua scripting. Recently a haproxy log decoder was posted here - https://gist.github.com/victorcoder/4b9bea9ade7671fcea75

So in Heka set up an input (LogStreamer for example), and use that haproxy decoder script (as SandboxDecoder) with it. After that pass it through any filters you'd like (like StatFilter) and output collected stats with HTTP output + influxdb encoder.

(I just built a log parsing and stats collecting pipeline for our Heroku apps with haproxy + heka + influxdb + grafana. So far happy with the result.)

[1] https://hekad.readthedocs.org/


No :-(, all the software that I use is python, and Kamailio servers. So I didn't have this problem. I can't help in this case.


At the place where I work we use a couple of different tools for logging events:

Logstash + graylog / elasticsearch - mostly for monitoring application error logs and easy ad hoc querying and debugging.

statsd+graphite+ nagios/pagerduty - general monitoring/alerting and performance stats

zeromq (in the process of changing now to kafka) + storm and redis for real time events analytics dashboards. We are also writing it to hdfs and running batch jobs over the data for more in depth processing.

We also have a legacy sql server in which we save events / logs which is still maintained so maybe this could help you. Just FYI we analyse more than 500 million records / day and we had to do some optimisations there:

- If the database allows, then partition the table by date.
- Create different tables for different applications and/or different events.
- 1 table/day, which at the start of the new day gets merged into a different monthly table in a separate read-only database.
- Create daily summary tables which are used for analytics.
- If you actually need to query all the data, then use union on the monthly tables or the summary tables.
- I want to also say this, I know it's a given, but if you have large amounts of data, batch and then use bulk inserts.
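The batch-then-bulk-insert point can be sketched like this in Python. I'm using an in-memory SQLite database purely so the example runs anywhere; the `BatchingLogWriter` class, table layout and batch size are assumptions, not the setup described above.

```python
import sqlite3


class BatchingLogWriter:
    """Buffer log rows in memory and write them in one transaction per batch."""

    def __init__(self, conn, batch_size=500):
        self.conn = conn
        self.batch_size = batch_size
        self.pending = []

    def log(self, ts, log_type, description):
        self.pending.append((ts, log_type, description))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        with self.conn:  # commits once for the whole batch
            self.conn.executemany(
                "INSERT INTO logs (ts, log_type, description) VALUES (?, ?, ?)",
                self.pending,
            )
        self.pending.clear()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (ts TEXT, log_type TEXT, description TEXT)")
writer = BatchingLogWriter(conn, batch_size=3)
for i in range(7):
    writer.log(f"2015-04-27T00:00:0{i}", "info", f"event {i}")
rows_before_flush = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
writer.flush()  # write the leftover partial batch, e.g. on shutdown
rows_after_flush = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
print(rows_before_flush, rows_after_flush)
```

The trade-off is that rows still in the buffer are lost if the process dies, so a timed flush is usually worth adding as well.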

I suggest you take a couple of steps back and think hard about exactly how you want to access and query the data and think what the best tool for you in the long run is.

Why do you feel the log is way too large?

If log entries take up too much disk space, switching to a different system will not help; you will have to do something with the data. You can either archive old years (export in some way, compress, put in cold storage) or throw them away, either partially or fully (do you need to keep debug logging around forever?). Using partitions can help here, as it makes it faster to drop older data (http://www.postgresql.org/docs/current/interactive/ddl-parti...)

You also may consider compressing some fields inside the database (did you normalize logType and userAgent or are they strings? Can your database compress descriptions?), but that may affect logging performance (that's a _may_. There's extra work to do, but less data to write)

If, on the other hand, indexes take up too much space or querying gets too slow, consider using a partial index (http://en.m.wikipedia.org/wiki/Partial_index). You won't be able to efficiently query older data, but if you do that only rarely, that may be sufficient.
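A small sketch of the partial index idea, using SQLite from Python only because it ships with the language (SQLite has supported partial indexes since 3.8.0). The table, the cutoff date and the index name are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs (id INTEGER PRIMARY KEY, ts TEXT NOT NULL, "
    "log_type TEXT NOT NULL, description TEXT)"
)
# Only rows on or after the cutoff are indexed, so the index stays small.
conn.execute(
    "CREATE INDEX recent_logs_by_type ON logs (log_type) WHERE ts >= '2015-01-01'"
)
conn.executemany(
    "INSERT INTO logs (ts, log_type, description) VALUES (?, ?, ?)",
    [
        ("2013-06-01T10:00:00", "error", "old failure"),
        ("2015-02-01T10:00:00", "error", "recent failure"),
        ("2015-03-01T10:00:00", "info", "recent login"),
        ("2015-04-01T10:00:00", "error", "latest failure"),
    ],
)
# Repeating the index predicate in the query lets the planner consider the index.
query = (
    "SELECT ts, description FROM logs "
    "WHERE log_type = 'error' AND ts >= '2015-01-01' ORDER BY ts"
)
recent_errors = conn.execute(query).fetchall()
for row in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(row)
print(recent_errors)
```

Note that the cutoff is a constant baked into the index, so it has to be recreated from time to time to keep tracking "recent".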

Here is another solution that hasn't been mentioned yet, but has by far the best price/performance if it matches your use-case. Google BigQuery isn't advertised as being for log search, but in practice it works phenomenally well. It provides exceptionally low storage costs, combined with a powerful query language and reasonable query costs. The counter-intuitive part is that the query performance, even on tens or hundreds of gigabytes of data is amazing, and better in practice than many purpose built inverted index log search systems.

If you want to use your logs for troubleshooting (e.g. ad-hoc queries to find error messages) or ad-hoc analytics it is ideal. Hundreds of gigabytes can be searched or analyzed in 5-6 seconds per query.

Fluentd can be used to collect log data and send to BigQuery.

Second the BigQuery suggestion for log collection and search. Also Apache Flume is another option to source logs from apps and send to different sinks.

ElasticSearch + Logstash + Kibana.

Custom NLog renderer which implements SysLog protocol and NLog target which pushes logs to RabbitMQ.

We use this at my startup. There are a lot of tools that support pushing to logstash; in addition, logstash can parse any on-disk log file. It's very useful, especially for clustered applications that span multiple servers (since a single user session should be evenly spread out due to load balancing).

Just wondering b/c where I work, sessions are persisted to a single host (there's load balancing but cookie-based persistence through Citrix NetScalers). Admittedly it's not great b/c if that host goes down, the user session is lost, but it does simplify matters a bit.

Do you persist sessions in some global cache (We'd use Oracle coherence at my work, but I guess you could use redis/memcache/infinispan?). Side note...does anyone have experience with Infinispan?

Making web apps stateful by "sticky" load balancing always felt like a hack. It seems much more elegant to separate the webservers from state. No need to drain machines or anything like that. And as an added benefit, separating things makes it easier to persist sessions, so you can provide a better user experience instead of balancing RAM vs "session timed out" pages.

I might be biased, though, against all the idiotic uses of session that just tend to wreck the user experience. It's not specific to any type of technology, I suppose. Maybe it's just being stateful, overall. (Looking at my bank which can't handle multiple tabs doing transactions at once, or Lenovo which will silently corrupt the displayed pricing info if you price two machines at once, or the new Azure Portal, which manages to display the wrong data even with a single tab.)

I've always been strongly in the stateful web app camp, and I don't find anything hacky about sticky sessions:) But I think the divide between the stateless and stateful camps may come down to a combination of initial technologies used, and web application complexity.

At some point, with a complex enough application, reconstituting all of the required session data and state for each request becomes a massive performance bottleneck, and stateful sessions solve that very nicely. Most app tiers support session failover pretty nicely, and many clients I've worked with have logic where if the session doesn't exist (i.e. if their session's sticky server has crashed or been removed) then reconstitute the state based on a cookie value. Either approach gives you the best of both worlds - the performance of stateful sticky sessions, and the cluster flexibility of stateless.

But with simpler applications, with less session data/state to manage then stateless is probably fine.

Sorry if I'm misunderstanding your or GP's point, but I think GP was recommending storing the app session in some 3rd party cache outside your app server tier, not having nothing stored in session at all.

But reconstructing session from cookie value would work just as well I suppose, at the tradeoff of having everything you want stored in cookie (vs storing just session ID in cookie and having that ID mapped to data in one of the aforementioned caches or other alternative)?

We use Redis as a shared session store.

How do you find logging over RabbitMQ? I have a guy at my place who's pushing to get log messages flowing over the message bus, but using Rabbit as a logging mechanism makes me feel a bit uneasy. Do you find that it pollutes your regular messages? Or do you have them on a separate control bus?

We have several teams pushing logs into a shared Logstash, over RMQ. Depending on the amount of logs you have, I would suggest you have a separate RMQ cluster for the logs. That said, RMQ is a great buffer for logs, in case there is an issue with logstash / es at the far end.

I really like Sentry (https://github.com/getsentry/sentry) for exception tracking. It's easy to set up, supports different platforms, and looks great.

And if you want to run Sentry in production, you can use Cyclops (https://github.com/heynemann/cyclops) as a proxy.

You could also take a look at Graylog (https://www.graylog.org/), it supports structured data in a variety of formats and can send alerts as well.

It's similar in spirit to elasticsearch + logstash + kibana, but more integrated.

Disclaimer: I work on it, so I'm not going to say what's better, just giving another pointer.

1) elasticsearch +kibana: https://www.elastic.co/products/kibana

2) hbase+phoenix: http://phoenix.apache.org/

3) opentsdb: http://opentsdb.net/

My experience is that:

• open source solutions require a lot of work

• commercial solutions get very expensive very quickly

If you can narrow down how many logs you want to keep, then the commercial solutions are amazing, but as you need (or think you need) to keep them longer and longer, they become prohibitively expensive.

The next time I have to tackle this issue, specifically keeping the logs forever, I will give the hadoop stores (HBase, Impala etc...) a try. Hadoop solutions work really well for very large sets of write-once data, which is what logs are.

I run services that log to plaintext files and I use logrotate to periodically gzip and rotate them out for archival.

Just use grep to query recent logs, zgrep if you have to dig a little.

zgrep works for both plain text and gzipped streams so you can do something like this (assuming you have both compressed and uncompressed files that match the glob pattern):

    zgrep somepattern /var/log/messages*
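If you ever need the same thing from inside a script, the zgrep trick is a few lines of Python: sniff the gzip magic bytes and pick the right opener. This is only a sketch; `grep_logs` and the temp-file demo are hypothetical, and it assumes the logs are UTF-8 text.

```python
import glob
import gzip
import os
import re
import tempfile


def open_maybe_gzip(path):
    """Open a log file as text, transparently handling gzip-compressed files."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"  # gzip magic number
    opener = gzip.open if is_gzip else open
    return opener(path, "rt", encoding="utf-8", errors="replace")


def grep_logs(pattern, glob_pattern):
    """Yield (path, line) for every line matching pattern in the globbed files."""
    regex = re.compile(pattern)
    for path in sorted(glob.glob(glob_pattern)):
        with open_maybe_gzip(path) as handle:
            for line in handle:
                if regex.search(line):
                    yield path, line.rstrip("\n")


# Demo with one current log and one rotated, gzipped log in a temp directory.
with tempfile.TemporaryDirectory() as log_dir:
    with open(os.path.join(log_dir, "messages"), "w", encoding="utf-8") as current:
        current.write("ok: started\nerror: disk full\n")
    with gzip.open(os.path.join(log_dir, "messages.1.gz"), "wt", encoding="utf-8") as rotated:
        rotated.write("error: timeout\nok: stopped\n")
    matches = [
        (os.path.basename(path), line)
        for path, line in grep_logs("error", os.path.join(log_dir, "messages*"))
    ]
print(matches)
```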

We stage stuff out.

After a week, it goes out of cache. After a month, we no longer keep multiple copies around. After 3 months, we gather stats from it, and push it to some tar.xz files, which we store. So it's out of the database.

We can still do processing runs over it, and do... but it is no longer indexed, so they take longer.

After 3 years, the files are deleted.

My company uses pretty basic logging functionality (no third party services yet), but one thing we've done that's helpful when reading logs is adding a context id to help us track down API calls as they travel through our system - I wrote up a quick blog post about it here: https://www.cbinsights.com/blog/error-logging-context-identi...
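I don't know how the linked post implements it, but the general context id pattern can be sketched in Python with `contextvars` and a logging filter. The logger name, the 8-character id and `handle_request` are all assumptions for illustration.

```python
import contextvars
import io
import logging
import uuid

# Holds the id of the request currently being handled ("-" outside a request).
context_id = contextvars.ContextVar("context_id", default="-")


class ContextIdFilter(logging.Filter):
    """Copy the current context id onto every record so the format can use it."""

    def filter(self, record):
        record.context_id = context_id.get()
        return True


stream = io.StringIO()  # stands in for a log file
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(context_id)s %(levelname)s %(message)s"))
handler.addFilter(ContextIdFilter())

logger = logging.getLogger("api_example")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False


def handle_request(user):
    """Tag every log line written while handling this request with one id."""
    token = context_id.set(uuid.uuid4().hex[:8])
    try:
        logger.info("request started for %s", user)
        logger.info("request finished for %s", user)
    finally:
        context_id.reset(token)


handle_request("alice")
handle_request("bob")
lines = stream.getvalue().splitlines()
print("\n".join(lines))
```

To follow a call across services you would also pass the id along, for example in a request header, and set it on the receiving side instead of generating a new one.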

https://logentries.com/ has worked out well for us, at least at the small scale we're using it now. Pricing is reasonable.

The important feature for us is S3 archiving. They'll keep your logs online for a certain period of time, and then copy the old ones to S3. You don't have to get rid of anything, and you're still able to keep costs under control.

We use elmah (https://code.google.com/p/elmah/) for logging our ASP.NET/MVC apps.

It works well for us, nice accessible UI if you need it and a solid database behind it. Also RSS/Email alerts if you need it. We've got thousands of entries in there and even on the old SQL2005 box we use, it seems to work just fine.

I'm probably the only one doing it outside a bank or hedge fund, but since kdb+ opened up their 32-bit license for free, it's been amazing to work with. Log files and splayed tables are stored neatly on disk so backing up to aws nightly is a breeze. It is a great solution for high tick rate logging of homogeneous data, especially when that data needs to be highly available in business applications.

Okay, you never put logs in a DB in the first place.

You never fill a table with uncapped, infinitely growing records (capped = collections with an upper limit).

At best you use rotating collections (like a circular ring buffer). But anyway, if you are successful, the log flow will always grow faster than your number of customers (coupling), so it grows more than linearly. So the upper limit will grow too.

Tools have algorithmic complexity in retrieving, inserting and deleting. There is no tool that can be log(n) for all cases and be ACID.

The big data fraud is about letting businesses handle growing sets of data that yield diminishing returns on OPEX.

In software theory, the more data you have, the more resources you need, and that is a growing function of the size of your data: a size that grows faster than your customer count, and linearly over time.

The more customers you have and the longer you keep them, the more they cost you. In business terms, that is stupid.

Storing ALL your logs is like being an living being that refuses to poo. It is not healthy.

The solution lies in sampling or reducing the data after some amount of time, with schemes like round-robin databases.
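The capped collection idea mentioned above is simple to picture: a fixed-size ring buffer where the oldest entry is overwritten once the limit is hit. A toy Python sketch; `CappedLog` is a hypothetical name and a real store would live on disk, not in memory.

```python
from collections import deque


class CappedLog:
    """Keep only the most recent `capacity` entries, like a ring buffer."""

    def __init__(self, capacity):
        self._entries = deque(maxlen=capacity)

    def append(self, entry):
        # When the deque is full, appending silently drops the oldest entry.
        self._entries.append(entry)

    def entries(self):
        return list(self._entries)


log = CappedLog(capacity=3)
for i in range(5):
    log.append(f"event {i}")
print(log.entries())
```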

Shit HN says candidate: "Storing ALL your logs is like being an living being that refuses to poo. It is not healthy."

That's gold.

You didn't specify your location, but in some countries like the Netherlands, it's not legal to store PI (personally identifiable) data that long. There is no reason to keep access logs for 3+ years. What are you ever going to do with that data?

Like others here said, extract what you want to keep (unique visitors per day or so) and throw the rest out after a few weeks.

I use a logging library called Winston (https://github.com/winstonjs/winston). I have it hooked up to Pushbullet with Winston-Pushbullet (https://github.com/michaelmcmillan/winston-pushbullet) so that when an unhandled exception or error is thrown I get an instant notification on my Nexus 5 and MacBook.

Winston is a node/iojs library though, but I guess you could find something equivalent in any other stack. The Pushbullet part is really useful.

Edit: I run a pretty small site however (http://littlist.no). I don't think I would enable the Pushbullet part if I had several hundred thousand visitors per day.

Edit: I should have known better than to post this to HN, some guy found a procedure which threw an uncaught exception. I was bombarded with push notifications (hahaha). At least I fixed a bug (-:

I just log to a file, rotating/deleting when/if needed

We generally run them through central syslog servers or directly to a logstash tcp or udp input. One way or another all logs from around the world end up in an elasticsearch cluster where we either query for things manually or use kibana to interact with them. Works pretty well actually.

Back in 2012, we talked about our foundation for this at Indeed:

Blog: http://engineering.indeed.com/blog/2012/11/logrepo-enabling-...

Talk: http://engineering.indeed.com/talks/logrepo-enabling-data-dr...

tl; dr: a human-readable log format that uses a sortable UID and arbitrary types/fields, captured via a log4j syslog-ng adapter, and aggregated to a central server for manual access and processing

> It makes it trivial for us to debug any event by just querying the db. However, after three and a half years of continued use, the table is now way too large.

Why are you keeping all of the logs? Are you doing anything with it?

Are the old logs relevant at all? If your program structure has changed, then anything logged before that point isn't even applicable.

My advice: If what you have is working and only fails because of the volume of data, apply a retention policy and delete data older than some point in time.

An example: Nuke all data older than 1 month for starters, and if you find that you really don't use even that much (perhaps you only need 7 days to provide customer support and debug new releases) then be more aggressive and store less.
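A minimal sketch of such a retention job, using SQLite so it runs anywhere. The `logs` table name and the ISO 8601 text timestamps are assumptions; the same `DELETE ... WHERE timestamp < cutoff` idea carries over to other databases:

```python
import sqlite3
from datetime import datetime, timedelta


def purge_old_logs(conn, now, keep_days=30):
    """Delete log rows older than `keep_days` and return how many were removed.

    Assumes a hypothetical `logs` table whose `timestamp` column holds
    ISO 8601 strings, which sort correctly as plain text.
    """
    cutoff = (now - timedelta(days=keep_days)).isoformat()
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute("DELETE FROM logs WHERE timestamp < ?", (cutoff,))
    return cursor.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logs (timestamp TEXT, description TEXT)")
    print(purge_old_logs(conn, datetime.now()), "rows removed")
```

On a table that has grown for years, the first run may be worth doing in smaller batches so one huge delete does not hold locks for long. Lowering `keep_days` later is a one-line change.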

Syslog + Logentries for raw logging (e.g. "User Alice created X"). New Relic APM for performance monitoring, New Relic Insights for statistics (e.g. tracking downloads, page views, API requests, etc).

What kind of log data do you mean exactly? E.g. what's the granularity?

We have web server logs going 30 days back, on disk, managed by logrotate. Then we have error logging in Sentry. For user level events, we track in Analytics, but we also have our own database-backed event logging for certain events. Currently this is in the same db as everything else, but we have deliberately factored the tables such that there are no key constraints/joins across these tables and the rest of the schema, which means it should be trivial to shard it out in its own db in time.

It depends on what type of data you are logging.

For performance metrics we use Graphite/StatsD. This allows us to log hits/access times for many things, all without state handling code inside the app.

This allows us to get rid of a lot of logs after only a few days, as we're not doing silly things like shipping verbose logs for processing.

However, in your use case this might not be appropriate. As other people have mentioned, truncating the tables and shipping out to cold storage is a good idea if you really need three years of full resolution data.
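To illustrate why the metrics approach needs no state in the app: a StatsD metric is a single plain-text UDP datagram of the form `name:value|type`. A minimal sketch, where the metric names, function names and the host are assumptions (8125 is StatsD's conventional port):

```python
import socket


def format_statsd_metric(name, value, metric_type):
    """Format one metric in the plain-text StatsD line protocol.

    metric_type is "c" for a counter or "ms" for a timer.
    """
    return f"{name}:{value}|{metric_type}"


def send_metric(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget one metric over UDP; no reply is expected."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("ascii"), (host, port))


if __name__ == "__main__":
    send_metric(format_statsd_metric("web.hits", 1, "c"))
    send_metric(format_statsd_metric("web.response_time", 320, "ms"))
```

Because it is UDP, the app does not block or fail if the collector is down; the aggregation happens on the StatsD side.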

Well-solved via SaaS. Logentries, Loggly, Papertrail, amongst others.

> We are currently inserting our logs in an sql database, with timestamp, logType, userId, userAgent and description columns.

That's what I would do.

> However, after three and a half years of continued use, the table is now way too large.

Yeah, that's what happens...

There are many ways to handle this issue. The simplest is to start archiving your records (i.e. dumping your old records into archival tables).

Do you have access to a DBA or a data team? They should be able to help you out with this if you have special requirements.

I am biased, but you should look into a logging system like splunk. You shouldn't be using an RDBMS for your logs. Your logs don't have a schema.

With splunk, you just output your logs in this format:

<timestamp> key1=value key2=value key3=value

Install the Splunk agent on your machines, and Splunk takes care of everything from there. You can search, filter, graph, create alerts, etc.

Splunk indexer allows you to age your logs, and keeps the newer ones in hot buckets for fast access.
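A minimal sketch of emitting lines in that `<timestamp> key=value` shape. The function name is made up, and quoting values that contain spaces is an assumption of this sketch, not something stated above:

```python
from datetime import datetime, timezone


def format_event(timestamp, **fields):
    """Render an event as '<timestamp> key1=value key2=value ...'.

    Values containing whitespace are wrapped in double quotes so the
    line can still be split back into pairs.
    """
    parts = [timestamp.isoformat()]
    for key, value in fields.items():
        text = str(value)
        if any(ch.isspace() for ch in text):
            text = '"' + text.replace('"', '\\"') + '"'
        parts.append(f"{key}={text}")
    return " ".join(parts)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(format_event(now, logType="login", userId=42, description="User Alice created X"))
```

The same lines stay grep-friendly on disk, so the format is useful even without an indexer behind it.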

Splunk is an indexing system, not a logging system.

Splunk is a log management system first and foremost. It has an indexer system to index your logs of course.

Disclaimer: I work at splunk.

Using a fast, scalable and flexible tool called Fluentd.

Here is a good presentation of Fluentd about its design and general capabilities:

Note: it's good to mention that Fluentd has more than 300 plugins to interact with different sources and outputs.

I wrote a program to help me analyse my log files - LogViewPlus [1]. It basically takes a text log file and loads it into a table so you can analyse it like a SQL table. You can also combine log files.

Using this approach, you get a lot of the same advantages of the DB approach for analysis, but with more standard file logging.

[1] http://www.logviewplus.com
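The general idea of querying a text log like a SQL table can be sketched in a few lines. This is not how LogViewPlus itself works; the `<date> <time> <level> <message>` line layout and the function name are assumptions:

```python
import sqlite3


def load_log_into_table(lines):
    """Parse '<date> <time> <level> <message>' lines into an in-memory table.

    Lines that do not have all four parts are skipped.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (timestamp TEXT, level TEXT, message TEXT)")
    for line in lines:
        parts = line.rstrip("\n").split(" ", 3)
        if len(parts) != 4:
            continue
        day, clock, level, message = parts
        conn.execute(
            "INSERT INTO log VALUES (?, ?, ?)", (f"{day} {clock}", level, message)
        )
    conn.commit()
    return conn


if __name__ == "__main__":
    sample = [
        "2015-04-27 09:00:00 INFO user alice logged in",
        "2015-04-27 09:01:00 ERROR payment failed",
    ]
    conn = load_log_into_table(sample)
    print(conn.execute("SELECT level, COUNT(*) FROM log GROUP BY level").fetchall())
```

Combining several files is then just a matter of feeding more lines into the same table.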

Specifically for mobile logging and remote debugging you might wanna check out Bugfender's remote logger: http://bugfender.com/

Disclosure: I'm one of the co-founders. We have a couple of other related tools in the pipeline, but the BF remote logger was the first we built, mostly to solve our own need at Mobile Jazz.

I've actually tried logentries and papertrail. For mobile, I highly recommend Bugfender.

ELK and others have been mentioned and are great tools, but if you want a simpler solution within the SQL realm, PostgreSQL with table partitions works well for that particular problem.

I agree with many comments that this isn't ideal, but setting up weekly/monthly partitions might buy you plenty of time to think through and implement an alternative solution.
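A minimal sketch of generating the monthly partition DDL. It assumes a current PostgreSQL (10 or later) with declarative partitioning and a parent table created with `PARTITION BY RANGE` on its timestamp column; older versions need the inheritance-based approach instead. The table and function names are hypothetical:

```python
from datetime import date


def monthly_partition_ddl(parent, year, month):
    """Return the DDL for one monthly partition of `parent`."""
    start = date(year, month, 1)
    # First day of the following month; the upper bound is exclusive.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    name = f"{parent}_{start:%Y_%m}"
    return (
        f"CREATE TABLE {name} PARTITION OF {parent} "
        f"FOR VALUES FROM ('{start}') TO ('{end}');"
    )


if __name__ == "__main__":
    for month in (4, 5, 6):
        print(monthly_partition_ddl("logs", 2015, month))
```

Expiring old data then becomes dropping a whole partition, which is far cheaper than deleting millions of rows from one big table.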

Surprised no-one has mentioned Papertrail yet - https://papertrailapp.com/

We use them for all our apps and have not seen any issues so far. It can be a bit tricky to set up, but once the logging works, it's hassle free from then on. Pricing is also very affordable.

We use papertrail and have been very pleased with it.

We've been very happy with Papertrail.

Rollbar has been pretty fantastic for us.

Also NewRelic if you want to spend the money (or get it through Amazon/Rackspace for free)

Elasticsearch, Logstash and Kibana (ELK) stack.

This is very convenient for decently complex querying and analysis at great speeds.

We push data into shared memory. Then we have clients that can read the memory and present it. This makes it possible to log millions of lines per second with a very limited cost.

This has the benefit of making logging more or less asynchronous. You still need to handle the logs coming out of this, of course.
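The shared-memory setup itself is not described in enough detail to reproduce, but the asynchronous part can be sketched with a different, simpler technique: an in-process queue drained by a background thread, using Python's standard `QueueHandler` and `QueueListener`. The function and logger names are hypothetical:

```python
import logging
import logging.handlers
import queue


def start_async_logging(target_handler):
    """Route log records through a queue so callers never wait on I/O.

    Returns (logger, listener). Call listener.stop() at shutdown to
    flush whatever is still queued.
    """
    log_queue = queue.Queue()  # unbounded; a bounded queue is another option
    logger = logging.getLogger("async_app")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    # The listener thread drains the queue and passes records to the real handler.
    listener = logging.handlers.QueueListener(log_queue, target_handler)
    listener.start()
    return logger, listener


if __name__ == "__main__":
    logger, listener = start_async_logging(logging.StreamHandler())
    logger.info("this call only enqueues the record")
    listener.stop()
```

The application thread pays only for putting a record on the queue; the slow write happens elsewhere, which is the same trade the shared-memory design makes across processes.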

If you're only looking to debug with the data, then something like Splunk ($$$) or Elasticsearch should work. However, if this is for some kind of analytical/data science-y use, then you'd be better off with a format like Avro and keeping it in Hadoop/Hive.

In one of my Java projects I use Logback with a MongoDB appender. This allows me to structure the logs for easy querying, plus I have access to all stack traces from all servers in one spot.

If you go this route, use a capped collection. I generally don't care about my old logs anyway.

I log to Redis and scrape the logs to SQL for long-term storage. Memory is fairly cheap nowadays, so it works out for my app.

If I had a lot of logging to do though, I'd use elasticsearch since that's what I run for my main DB. It handles sharding beautifully.

I track them via the analytics event tracking API. It is really useful and full of surprises when you look at the stats.

Piggy-backing on this topic: does anyone successfully use Amazon S3 as the log store for application event logging? The low cost is attractive, but at first glance it seems like the latency is too high for it to work well.

It depends on the priority and importance of events.

For instance, we use a log file for HTTP access logs, but I store all of the errors and warnings in MongoDB. However, I clean out the log storage every month.

We use NodeJS and MongoDB in www.floatalk.com

Sumo Logic (https://www.sumologic.com/)

Works in the cloud. Easy to set up and very scalable.

Free tier: 500 MB/day, 7 day retention

Disclosure: I work there.

When I'm hacking something together, I log things in... Slack.

As it grows into a seemingly useable feature, I might move it to GA or Mixpanel.

When it gets to be large and stable, then it goes into syslog

Could you expand on how you use the log data? How often do you query it, what time periods do you query, and have you considered building a data warehouse for your analytics?

Zabbix to monitor hardware, logstash/elasticsearch (kibana for UI) to monitor service logs, Sentry for application level logs

At least this is what we're moving to at work.

Exactly that, but rotating after three years. If it's from three years ago it probably doesn't matter any more.

Rotate it, but save the old ones. You never know how far into the past you have to dig one day. Be it for statistics or tracking a security issue.

Fluentd + Elasticsearch.

Very easy to set up.

We're using ELK stack - it's pretty nice.

Text file + grep + awk

Check out segment.io

