Ok so this isn't a Google product. In brief (please correct if I'm wrong), Google lets its employees work on their own side projects on company resources if they assign copyright to Google. This means that it gets published on the Google github account, but is then denoted to not be a Google product - it's someone's side project.

I do however have at least anecdotal experience with how these sorts of systems work. The idea is that as a large company, you traditionally pump all of your internet through a firewall, which scans it all online, does deep packet inspection etc to look for attackers.

Then, because it takes up a lot of space, you ditch it, and perhaps keep finer grained logfiles - perhaps just the DNS requests or headers or suspicious packets etc.

The idea here is that for many companies, this isn't helpful when you do get owned - you'll have deleted most of the relevant data (showing exactly what got exfiltrated, how it happened etc) and you might have some logfiles showing IP addresses and ports, but you know little else.

Since a company of 1000 will use no more than around 1-10TB per day for its staff, it's actually now feasible to store every packet that is sent in and out of your network - you could store for 90 days on around 0.1-1PB - which is actually fairly affordable for a company of that size.
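A quick way to sanity-check that estimate is below. It's only a sketch: the function name is made up for illustration, I'm assuming decimal units (1PB = 1000TB), and the 1-10TB per day input is the same rough guess as above.

```python
def retention_storage_pb(tb_per_day: float, days: int = 90) -> float:
    """Storage in PB needed to keep `days` of full packet capture."""
    return tb_per_day * days / 1000  # assumes decimal units: 1 PB = 1000 TB


low = retention_storage_pb(1)    # light usage: 1 TB/day
high = retention_storage_pb(10)  # heavy usage: 10 TB/day
print(f"{low:.2f} PB to {high:.2f} PB for 90 days")
```

That works out to 0.09-0.9PB, which is where the rounded 0.1-1PB figure comes from.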

Then, you either run large (more expensive than can be done in a firewall) jobs over the data offline to look for intrusions, or wait for a breach and then drill down on the data to try to learn exactly what happened.

The reason why this isn't really a tool for monitoring users is:

a) What can you do to track users that you couldn't already do with systems that don't store all the data?

b) The target seems to be corporate networks who can and should monitor what their users are doing on their network.

c) The nature of this sort of data is that because it's not really indexed any specific searches would be very expensive - perhaps requiring runthroughs of terabytes of data. So individually spying on many people isn't really doable without further processing - this is really just a big packet dumper.

If you were going to try and monitor random Joe Public, then you'd certainly be fitting a device like this to a computer their traffic would be passing through - but this isn't useful for someone who's not an ISP or nation state (and in that case, there'd probably be smarter ways of doing this (since here, you can only sniff local connections)). For Google, the most they'd be able to sniff is communications from their users to their own servers - which isn't a huge bonus for the costs.

Even for an ISP, it'd just be massively expensive and unhelpful - a UK ISP (Plusnet) I just looked up has around 800,000 ADSL users, and at peak time they see total usage of 130Gbps-ish. Even assuming average half utilisation of 65Gbps, that's still 702TB a day. That's a massive amount of data to store for any reason. The reason you (bad person) only store the metadata is because the metadata is the valuable part!
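For anyone who wants to check the 702TB figure, here is the conversion as a small sketch. The function name is hypothetical, and I'm assuming a sustained average rate and decimal units (1TB = 1000GB).

```python
def gbps_to_tb_per_day(gbps: float) -> float:
    """Convert a sustained rate in gigabits/second to terabytes/day."""
    seconds_per_day = 24 * 60 * 60       # 86,400
    gigabits = gbps * seconds_per_day    # total gigabits in one day
    gigabytes = gigabits / 8             # 8 bits per byte
    return gigabytes / 1000              # assumes decimal units: 1 TB = 1000 GB


print(gbps_to_tb_per_day(65))   # half of the ~130Gbps peak
```

At 65Gbps this gives 702TB a day, so 90 days of retention would be on the order of 63PB.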

I welcome corrections :)

No corrections necessary, you're right on the money.

This is a 20% project. While it's one we plan to use internally, it's not a "supported" Google product. It's just another open-source project along with the many others we use to keep our networks secure.

Also, it's designed specifically to do one thing (packet history) and do it well. In no way is it a complete solution; this is a building block for network detection and response.

To reiterate some of the salient points:

1) Disk is REALLY cheap these days.

2) NIDS don't store lots of history, because they're optimized for detecting patterns and signatures. So they might find something in the middle of a TCP stream and send an alert, but you don't have much context around it. This allows you to build that context by requesting all packets from that stream during a (possibly very long) time range.

3) There's a ton of reasons why this isn't used to monitor users:

* it's wrong: I'd flat-out refuse to build something designed to monitor users

* it wouldn't work #1: most interesting user traffic is encrypted on the wire

* it wouldn't work #2: our production network architecture is not good at single aggregation points

* it wouldn't work #3: there aren't enough disks in the world to handle our production network load

* it's redundant: applications can already do per-application, structured monitoring as necessary for debugging/auditing/etc.


> Then, you either run large (more expensive than can be done in a firewall) jobs over the data offline to look for intrusions, or wait for a breach and then drill down on the data to try to learn exactly what happened.

I was thinking in terms of offline jobs, and don't have a good intuition for what those rules would look like. I'm also skeptical that your average company would have the expertise to write a good set of rules. So I was interested to see that "half" of an IDS tool.

I think the real answer is that it truly is just a rolling packet dump, and it's up to you to use it however you choose.

I can think of uses outside of network security: capturing traffic from your mobile devices on your home network (maybe this is just IDS if you're watching for the contents of your address book to be exfiltrated by a malicious app), or snooping on people through an Internet cafe, library, or other (small) open network that you administer.

For these uses, just like IDS, you'd want to run offline jobs against the data. Whether that's a full scan for something interesting, or an indexing pass that extracts (portions?) into a more easily viewable form.

Offline jobs are an interesting idea, but they weren't what we were really thinking of. Instead, we use stenographer more like a database of recent traffic. Consider this as a simple use case for intrusion detection:

  set up snort and steno
  foreach snort alert
    request all packets in stream from steno: srcIP,srcPort,dstIP,dstPort match
    OR request all packets on that srcIP,dstIP, to get OTHER connections between those hosts
    store pcap to directory (or central DB, or whatever)

Then, when a human analyst wants to investigate the alert, instead of getting the very limited PCAP that comes out of snort, they get a ton of data they can use to build context, write new detection rules, etc.
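As a rough illustration of the two requests in that loop, here is a sketch that turns an alert's 4-tuple into query strings. Everything in it is an assumption made for illustration: the `Alert` class is a hypothetical, simplified stand-in for a snort alert (not snort's real output format), and the `host ... and port ...` strings assume a BPF-like query syntax that should be checked against the stenographer docs before use. Actually fetching the packets and storing the pcap is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Hypothetical, simplified alert: just the stream's 4-tuple."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def stream_query(a: Alert) -> str:
    """Narrow query: only the stream that triggered the alert."""
    return (f"host {a.src_ip} and host {a.dst_ip} "
            f"and port {a.src_port} and port {a.dst_port}")


def host_pair_query(a: Alert) -> str:
    """Wider query: all traffic between the two hosts (other connections too)."""
    return f"host {a.src_ip} and host {a.dst_ip}"


# Example addresses are from private/documentation ranges.
alert = Alert("10.0.0.5", 49152, "192.0.2.7", 443)
print(stream_query(alert))
print(host_pair_query(alert))
```

The narrow query gives the analyst the full stream around the alert; the wider one trades more data for visibility into other connections between the same hosts.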
