Hacker News
Show HN: Log.io (Realtime log monitoring, powered by node.js + socket.io) (logio.org)
205 points by msmathers on June 5, 2011 | 53 comments

> Log.io has no persistence layer. Harvesters are informed of file changes via inotify, and log messages hop from harvester to server to web client via socket.io.

This struck me as really odd. Why is it done that way? Most stuff is logged via syslog. Those messages are already in the buffer, and it seems the local nodes double the work with a seek+read of what was just written. Downsides:

- double disk action for every log line

- inotify overhead for the system

- logging requires local disk space

- disk io delays mean collection delays

- when the disk gets full or fails on write, you don't collect the message about that failure, even if one was written

- when the syslog daemon decides that messages are coming in too fast, it might drop them from the disk-bound buffers, so the only copy is lost (not 100% sure, but I believe syslog-ng can still send to one destination even if another is blocked)

I really can't see positive sides of this approach. Logstash seems to handle everything log.io does and much more... am I missing something?

A lot of things don’t use syslog. For example, default configurations of Apache write logs directly without syslog. The benefit of this method is that it is general enough to work for most people without configuration changes, and I assume they could always add syslog support later.

Log.io assumes multiple nodes. If you want any kind of visit statistics in that scenario, you're already collecting the logs somehow. Unless you rolled out some homemade script for that purpose, you're probably using remote syslog anyways.

Or do you store your logs on a random server that request happened to hit / session was bound to?

I know very little about system administration, but the comment I want to add to this conversation is: wow, this is a tremendously good product website.

1. Show the glamour shot (even just a screenshot of text makes it compelling).

2. Quickly explain the problem you're solving.

3. Quickly explain how you solve the problem.

4. List the most compelling RTBs ("reasons to believe").

Just a heads up: it wasn't immediately clear that I could click on the screenshot for a demo. I'd make that a huge call-to-action. The product can sell itself.

Also realize that if you are connected over WebSockets, all data will be transferred unencrypted even if you put log.io behind https. A much better fit for these cases is SSE, which works fine over https.


How have I never heard of this before? (Oh, no Firefox or IE support).

Also, I don't believe your claims about WebSockets are true. I thought wss:// would allow for SSL secured WebSocket connections...

edit: Googling indeed seems to indicate that wss is ws over TLS. See: http://websocket.org/echo.html for an example.

Modern HTTPS uses TLS too.

It's cool to see log files streamed to your browser in real time, but most of the time it scrolls too fast to get any useful information out of it (unless you use regex filters, of course). It would be cool to have graphs that show, say, browser usage (or IP geolocation, operating system, page visited, or any other metric that can be gathered from log files) being updated in real time.

This is probably what you described: https://github.com/stagas/maptail

I like this, and I'd use it in my projects if I knew it was reliable.

I've seen socket.io drop messages and I wonder if it's happening here.

Open both Chrome and Firefox at the same time and watch logio_server in each browser. After several minutes I start seeing different messages in each browser. Is socket.io not delivering every message to each socket?

[author of socket.io]

It's possible that log.io is not trying to send absolutely every message (which is the reason I introduced the volatile flag for the upcoming version: https://github.com/learnboost/socket.io-node/tree/develop)

Oh man, and I just coded essentially the same thing (with persistence though, since we need history) with CouchDB over the last 2 months. If only you'd launched a bit sooner!

ps. Maybe it's me, but 2 hours ago your site worked great in Opera, and now the design seems broken. If you changed some CSS, then that might be it :)

You know what would be cool: write your own npm module that does logging, opening a socket connection to your server and bypassing writing logs to disk just to have them watched by inotify.

Yeah that would be cool. Even better if it could be done over UDP, so the log messages are fire-and-forget from the client side. Even BETTER if there was a standardized way for all programs regardless of implementation language to write to said socket/log.

Yeah, they thought that up 30 years ago. It's called syslog.

There are some issues with syslog. Fortunately an extension is ready: http://www.graylog2.org/about/gelf - can be structured and is not limited in size anymore.

Except my Cisco switches, HP printers etc won't be talking that anytime soon.

In addition, graylog appears to be "solving" something RFC 5424 already solves with structured messages.

Tbh, I haven't seen any standard syslog daemon supporting structured messages, except for Apple's syslogd and syslog-ng. Others mostly cut the message at 1024 or so bytes, and each one handles newlines in a different way. So far it seems no one is sending messages that way and no one is expecting them. If a new protocol is the way to push structured logging forward, I'm all for it. It doesn't mean the two protocols cannot coexist, or that a typical logging pipeline cannot include a syslog->gelf converter which adds annotations / parses the message.

It also seems that structured message handling assumes the message fits in a packet and can be processed or dropped when convenient, depending on its size. That's not something I expect from a logging system. GELF adds chunking / compression just in case.

rsyslog also supports structured RFC5424 messages. Others will support it in time.

I agree it's not the most perfect format, but it's usable, and already much better than freeform text messages if you intend to do programmatic processing.
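To show what "structured" buys you over freeform text, here is a minimal RFC 5424 line formatter. The PRI/app/msgid values in the usage note and the SD-ID `exampleSDID@32473` are lifted from the RFC's own examples; `myhost` is a placeholder, and real implementations handle escaping and multiple SD elements.

```javascript
// Build an RFC 5424 syslog line with one structured-data element.
function rfc5424(pri, app, msgid, sd, msg) {
  const ts = new Date().toISOString();
  const sdStr = '[exampleSDID@32473 ' +
    Object.entries(sd).map(([k, v]) => `${k}="${v}"`).join(' ') + ']';
  // <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID SD MSG
  return `<${pri}>1 ${ts} myhost ${app} - ${msgid} ${sdStr} ${msg}`;
}
```

e.g. `rfc5424(165, 'evntslog', 'ID47', { iut: '3' }, 'An application event')` yields a line a receiver can parse key-by-key instead of regexing freeform text.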

Another interesting initiative in normalized logging is CEE (http://cee.mitre.org/). The 'lognorm' and 'libee' libraries (http://www.liblognorm.com/) are early attempts to implement this.

Note that I haven't looked at 'gelf' yet so I don't know how it compares.

This looks similar to logstash, which has been on my tools to try list for a while.

"logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use (like, for searching). Speaking of searching, logstash comes with a web interface for searching and drilling into all of your logs. It is fully free and fully open source."


very, very cool.

node.js is also my (current) tool of choice for real time logfile analysis, i use a slightly different approach.

i ssh funnel the logfiles to my machine i.e.:

   ssh server.example.com sudo tail -f /var/log/nginx/example.com_access.log > test.log
then use this (disclaimer: coded it during a hackweek) lib https://github.com/franzenzenhofer/nolog to attach events to the logfile

   nolog('./test.log').shoutIf('googlebot', /Googlebot/i ).on('googlebot', function(data) { ... });
basically you

   - watch a logfile
   - shoutIf you come across a given pattern (can be regex or function)
   - listen if a shout event occurs
i will definitely look into log.io and see if there is a way to plug nolog in somewhere.

an important addition to log.io would be log servers in php, asp(x) and all the other legacy languages. my experience is that webservices that allow and/or use modern development approaches already have suitable logfile observation, but all the millions of sites stuck with yesteryear really struggle with it.

vsConsole is a similar application that solves the same problem, but based on java technologies.

* Java Agents run on the servers

* vsConsole is a web application that runs on Tomcat

Use a browser to select the desired log file, and you can tail it. (When the browser polls the web app, the web app will contact the agent for the latest changes to the file).

View the demo at:

* http://demo.vamonossoftware.com/vsconsole/file/index#1:DEV:8...

More information at:

* http://demo.vamonossoftware.com/

vsConsole is aimed at development teams, not at production log monitoring. For example, while developers and testers are working with an application and want to see what's going on in the log, they can just click a button - no unix accounts, ssh, tailing, needing to know where the logs are, etc.

A new version which looks a lot better, and has simple messaging and application monitoring will be out soon.

I built something similar a while back during the Django Dash, but there's no server software to install because it just uses ssh and tail: https://github.com/ericflo/servertail

This looks more polished and complete though from a user interface perspective, nice job!

This is really cool; very impressed!

Cool, but it doesn't run on CentOS. The init files are coded exclusively for Debian, and I don't have time to convert them.

Couldn't get it to start on Ubuntu (couldn't find module "connect"); totally unsupported on CentOS.

did you use npm for packages? did you install connect?


This would be awesome as a plugin/add-on for Heroku - if that's not already in the works.

nice, however unless you left it out deliberately to exercise the error log, you should add a favicon to your site.

Could you explain your concern? I don't understand why a favicon is important in this case.

Look at the app; favicon 404s are the entire error log.

   ssh user@server tail -f logfile

It's even encrypted.

Try that with N servers and M log files where nodes join / leave the cluster at any time.

As long as N is not too big, remote syslog works well for this case.

Google pdsh

Very cool!

Cool, you just reinvented syslog, poorly.

Syslog is cool and all, but this is a nice UI for displaying the logs in the browser, so everyone in the company can watch them, or you can easily open it up on an iPad, and hang it on the wall.

Could I use SSH over iTerm, and display multiple windows in Screen? Sure. But this is easier/quicker.

It would be nice if it didn't need to constantly hit local disk on every machine, though.

What is the use case for everyone in the company to see raw logs? A Nagios / Zabbix dashboard - sure. Some visit analytics - maybe. General stats / resource usage - not for everyone, but ok.

But why would you ever inflict raw, unprocessed, scrolling logs on anyone other than developers working on a specific bug? (they'd expect an accessible backwards search too)

I was thinking it would be fun, in a somewhat similar vein to Google displaying Search Queries on the wall - http://www.flickr.com/photos/kchrist/21051526/

That said, I agree- Historical Search is a necessary feature for it to be very useful to debugging.

I work with Mike. Everyone in the company doesn't need to see the logs, but all the developers do and some others (like myself) do too.

This tool is most useful for watching servers after deployments, tracking a process through a bunch of coordinated services, live debugging, or keeping an eye on specific log messages across a bunch of servers.

As many folks have pointed out, there are plenty of solutions for aggregating, storing, parsing and analyzing logs. We needed a tool to make watching live logs simpler. That's what this is for.

I created vsConsole because it was a pain to get access to even the development and test server logs for the apps we were working on. Now the developers and testers have easy access to logs (not just syslogs, but our application logs wherever they may be).


So, I can easily see the use for these kinds of tools. Nagios, Splunk, et al are all too heavyweight for our kind of requirements.

Raw logs generally aren't a useful source of information for humans.

In addition to joining your previous comments on this thread in pointless meanness, this comment is also nonsensical. Who else but humans could be the intended audience for logs, raw or otherwise? They are a cast-iron bitch to do anything programmatic with; whole companies have been created just to tackle that problem.

I'd go one further, entire industries have been created to tackle this very problem! There is a lot of money in mining logs properly, it's easily a billion dollar industry.

Watch a production log file for 5 minutes and tell me if you can get anything meaningful out of it.

OP's ignorance makes him a threat to himself and others. It's this absolute refusal to learn the existing toolset that's pervading modern software development, putting projects in jeopardy. What's odd to me is that not only is the HN community complacent in this, you actually applaud it.

FWIW, this is why your sysadmin is pithy. He has seen software developers pull stunts like this for far too long, unwilling to listen to reason, unwilling to be technically cuckolded by a software package that's existed for thirty years.

Ignorance doesn't listen to reason.

"A threat to himself and others"? You're simply griefing. Please stop.

u mad

I'm sure you can come up with more constructive suggestions than just that.

There's value in brevity.

Yes; criticism delivered with no justification is often simply rude. Perhaps you have a point--it's hard to tell: you didn't make one.

I said that OP had reimplemented something that already existed, but with fewer features and tests. Or did you get something else out of those six words?

You have _now_ made that statement. Previously, you merely stated that they had done a poor job of reimplementing syslog, without explaining why; the justification is the essential component of good criticism. Implicit meaning is tantamount to no meaning at all.

