
Show HN: Herodotus – An IRC bot that logs a channel's activity to JSON - egladman
https://github.com/egladman/herodotus
======
perlgeek
Don't use integer timestamps as object keys. What if there's more than one
action in any given second?

I'd much prefer an array with objects as elements, each object storing its own
timestamp.
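
Something along these lines (field names purely illustrative), where two
events in the same millisecond no longer collide:

        [
          { "time": 1455900000123, "nick": "alice", "message": "hello" },
          { "time": 1455900000123, "nick": "bob",   "message": "hi, same millisecond" }
        ]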

~~~
egladman
I was hoping no two actions would occur in the same millisecond. Thanks for
the feedback.

~~~
TheDong
Just for you, I went through the logs for my current IRC server and checked how
often IRC messages occurred in the same millisecond. I found on the order of a
few thousand such events out of millions of lines total.

In addition, that's just based on when my own client received each message; IRC
is _NOT_ an ordered protocol, so if multiple people ran this same logger, they'd
end up with different streams and different duplicate keys (which is the same
thing as losing events, I might add).

My log also conveniently contains joins, parts, notices, CTCPs, and all that
other jazz, because I used a mature logging library.

------
koolba

        var array = [];
        
        client.addListener('message', function (from, to, text) {
          var time = Math.floor(new Date() / 1000);
          var obj = { nick: from, message: text };
          var item = '\"' + time + '\"\:' + JSON.stringify(obj);
        
          array.push(item);
        
          fs.writeFile(file, '{' + array + '}\n', function() {
            // console.log("updated");
          });
        });
    

This will buffer all the messages in memory (in the variable "array") and
rewrite the file on every message. Besides the lack of atomicity mentioned in
another comment, that's also really inefficient and will eventually run out of
memory. Plus 1000 incoming messages will lead to 1000 sequential rewrites of
the same file!

~~~
IanCal
A useful file format for this kind of problem is JSONL, which is just a list
of newline-separated JSON blobs. You can then simply append to the file rather
than having to rewrite the whole thing each time.
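
A rough sketch of that, assuming the same "client" and "file" variables as the
snippet quoted upthread (not the project's actual code):

        var fs = require('fs');

        client.addListener('message', function (from, to, text) {
          var entry = { time: Date.now(), nick: from, message: text };

          // One JSON object per line; appending never rewrites existing data
          // and nothing accumulates in memory.
          fs.appendFile(file, JSON.stringify(entry) + '\n', function (err) {
            if (err) console.error('failed to append log entry:', err);
          });
        });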

------
dikaiosune
Cool - does it also inject apocryphal stories of women growing beards when war
is nigh?

------
Houshalter
Don't use this prettified JSON format. It's awful to parse. Put each message
on its own line. I had trouble parsing someone's IRC logs in JSON because the
file was too big to load into memory. It was much easier to just read one line
at a time from the file.

------
TheDong
Please don't "Show HN" something that is neither complicated nor interesting.

The other "Show HN" on the front page right now, for comparison, is
Claudia.js. That's not something anyone on HN could trivially write in 5
minutes. It's about 5k lines of JavaScript. It's already somewhat mature.

For comparison, the project you posted is under 50 LOC, something any of us
could trivially write (in fact, piping
[http://tools.suckless.org/sic/](http://tools.suckless.org/sic/) would be
sufficient... or using ZNC with its "log" module, or using weechat/irssi's
built-in logging functions), and it doesn't really accomplish anything super
interesting to HN, imo.

I expect everyone who uses IRC in a significant capacity on HN has already
solved the most basic level of their logging problem. I solved mine by just
running a ZNC bouncer with the log module. The harder level of the logging
problem (indexing it, presenting it in a nice UI, and solving availability by
merging logs from multiple leaves during a netsplit) is not something I'd
consider commonly solved, but hey, your solution has nothing to say there
either.

Others have already mentioned that there are some issues with your code, so I
won't touch on them, but I will emphasize that you should probably work on
improving code quality and make something both more significant and more
interesting prior to doing a Show HN.

In addition, this is actually something people on HN couldn't trivially use,
because it has no configuration (hard-coded connection strings) and no docs.

~~~
tilpner
I'm curious, in what circumstances would you merge multiple log files (from
different perspectives, I assume) because of a netsplit?

That'd require one log/connection per server of the network, which isn't
something ZNC or weechat will do by default.

Have you ever needed that functionality?

~~~
TheDong
Let's say that you're an IRC network operator trying to make the channel's
entire history searchable (e.g. a botbot.me-type thing).

For absolute correctness, you'd want to record per-leaf and merge.

However, the last time I needed log-merging functionality was much more
boring: some of my log files were corrupted, and someone else had log files
that weren't as extensive, so I needed to munge the two together. The timings
were subtly different (because that's how IRC works), so I wrote custom code
to merge them, preferring his copy over the corrupted ranges.

So yes, I've needed that functionality, though it wasn't actually prompted by
a netsplit. I can foresee it being useful in the case of a netsplit if your
logging is at the ircd level or you run one bot per leaf.

~~~
tilpner
I wonder whether any large network keeps logs of channels, let alone one per
server. And if they do, they probably don't have much use for a merged log
except for taking up less space.

My own IRC log tool [GH: tilpner/ilc] can be used to merge logs, but I rarely
use that functionality.

For two log files "a" and "b", in weechat's default log format:

        ilc sort -f weechat -i <(cat a b) | ilc dedup -f weechat

I've never tested this with logs over 200MB, but sort will read the combined
log into memory, which is definitely not optimal.

------
SFjulie1
Wtf?!

[https://github.com/egladman/herodotus/blob/master/server.js#...](https://github.com/egladman/herodotus/blob/master/server.js#L37)

How can a developer be that stupid?

file.write is atomic (guaranteed to "work") for PIPE_BUF (POSIX) octets, ~16 KB
at most.

JSON is guaranteed to be unparsable if the last characters of the file are
truncated (it is not very resilient).

Hence the write may corrupt your WHOLE log if a non-recoverable failure
happens or the code is interrupted.

The code DOES NOT use fixed-size allocation ... thus it can crash randomly
with a SEGFAULT in the middle of the write. The history takes up ever-growing
cumulative space in memory.

This coding attitude raises the probability of that failure happening BY
DESIGN.

Is it that complex to FIRST write a new file, and THEN atomically rename the
new file over the old one? At worst that only loses the current session, not
the whole history.
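
For example, roughly (the helper name here is purely illustrative, not from
the project):

        var fs = require('fs');

        function saveLogAtomically(file, contents, callback) {
          var tmp = file + '.tmp';
          // Write everything to a temporary file first...
          fs.writeFile(tmp, contents, function (err) {
            if (err) return callback(err);
            // ...then atomically replace the old file. A crash mid-write leaves
            // the previous log intact; at worst the latest entries are lost.
            fs.rename(tmp, file, callback);
          });
        }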

If your logs are that precious, why would you not take extra care to protect
them?

This code makes me want to puke.

On the other hand, it is representative of the reason why I am disenchanted by
modern coding standards.

~~~
egladman
I'll see to it that this is fixed.

------
SFJulie2
And Part 2/2:

file.seek is your friend. You should not use any dynamic allocation if you
want it to be more robust (yes, preallocate a fixed memory size with "char
circular_ring[N * MAX_SIZE]", where N and MAX_SIZE are fixed).

Well. It would require a complete rewrite for this code to not create a risk
for your users. Your code is full of hidden landmines due to lack of design.

And you know what?

It used to be standard knowledge in introductory CS for scientists. You know
why? Because no computer user likes to lose precious data produced by a very
long measurement that spills out huge amounts of data and costs a lot of
resources (gold, helium, molybdenum, electricity, time, wages of qualified
operators...).

It seems like coders do not care about their users, like a builder who thinks
it is reasonable to build 50-storey buildings on quicksand because people only
judge builders by the look of their creations.

~~~
dang
Please don't do this. Instead, vouch for a dead comment if you think it
shouldn't be dead.

We banned SFjulie1:
[https://news.ycombinator.com/item?id=11141174](https://news.ycombinator.com/item?id=11141174).
If you violate the rules of this community that badly, you forfeit the right
to comment here.

We detached this comment from
[https://news.ycombinator.com/item?id=11141103](https://news.ycombinator.com/item?id=11141103)
and marked it off-topic.

~~~
SFJulie2
Roger that Dang.

How do I vouch for a dead comment?

~~~
dang
Click on the comment's timestamp to go to its page, then click 'vouch'. But
note that you need more than 30 karma before the 'vouch' and 'flag' links
appear.

