
Log structure is really important. From the examples provided, I would suggest the same approach can be taken with a full 'logfmt' style, so the timestamp and the event type are set as keys too, e.g.:

  ts="2019-03-18 22:48:32.990" event="Request started" http_method=POST http_path=/v1/charges request_id=req_123
The main difference is that parsing becomes easier, since many tools can parse logfmt without problems.
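To illustrate how little work that parsing is, here is a minimal sketch in Python (not any particular library's implementation; it leans on the standard shlex module to handle the quoted values):

```python
import shlex

def parse_logfmt(line):
    """Parse one logfmt line into a dict; shlex handles quoted values."""
    return dict(tok.split("=", 1) for tok in shlex.split(line))

record = parse_logfmt(
    'ts="2019-03-18 22:48:32.990" event="Request started" http_method=POST'
)
print(record["event"])  # Request started
```

Note this sketch assumes every token contains an '=' sign; a production parser would also handle bare flags and malformed input.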

One interesting use case here for me is the ability to perform queries in a schema-less fashion, and I will give a quick overview of what we are working on in Fluent Bit[0] (an open source log project): pretty much the ability to query your data while it is still in motion (stream processing on the edge[1]). Consider the following data samples in a log file:

  ts="2019-03-18 22:48:32.990" event="Request started" http_method=POST http_path=/v1/charges request_id=req_123
  ts="2019-03-18 22:48:32.991" event="User authenticated" auth_type=api_key key_id=mk_123 user_id=usr_123
  ts="2019-03-18 22:48:32.992" event="Rate limiting ran" rate_allowed=true rate_quota=100 rate_remaining=99
  ts="2019-03-18 22:48:32.998" event="Charge created" charge_id=ch_123 permissions_used=account_write team=acquiring
  ts="2019-03-18 22:48:32.999" event="Request finished" alloc_count=9123 database_queries=34 duration=0.009 http_status=200
So if I wanted to retrieve all events associated with user usr_123, I would process the file as follows:

  $ fluent-bit -R conf/parsers.conf \
               -i tail -p alias=data -p path=canonical.log -p parser=logfmt \
               -T "SELECT * FROM STREAM:data WHERE user_id='usr_123';" \
               -o null -f 1
The output is:

  [1552949312.991000, {"event"=>"User authenticated", "auth_type"=>"api_key", "key_id"=>"mk_123", "user_id"=>"usr_123"}]
The results are shown in a raw format, but they can be exported to stdout as JSON, or sent to Elasticsearch, Kafka, or any other supported output destination.
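For comparison, the same schema-less filter can be sketched in a few lines of plain Python (just an illustration of the idea, not how Fluent Bit works internally; the sample lines are inlined here instead of tailing the file):

```python
import json
import shlex

def parse_logfmt(line):
    # Split on whitespace respecting quotes, then split each token on the first '='.
    return dict(tok.split("=", 1) for tok in shlex.split(line))

lines = [
    'ts="2019-03-18 22:48:32.991" event="User authenticated" auth_type=api_key key_id=mk_123 user_id=usr_123',
    'ts="2019-03-18 22:48:32.992" event="Rate limiting ran" rate_allowed=true rate_quota=100 rate_remaining=99',
]

# Equivalent of: SELECT * FROM STREAM:data WHERE user_id='usr_123';
matches = [r for r in map(parse_logfmt, lines) if r.get("user_id") == "usr_123"]
print(json.dumps(matches[0]))
```

The difference, of course, is that Fluent Bit runs this continuously over data in motion rather than over a static list.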

One of the great things about the stream processing engine is that you can create new streams of data based on query results, and use time windows (tumbling) for aggregation queries and such.
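For example, an aggregation over the sample data above might look roughly like this (a sketch only; the STREAM name and field names come from the example, but check the stream processing docs[1] for the exact syntax):

```sql
-- Average request duration per team over 5-second tumbling windows (sketch)
SELECT team, AVG(duration) FROM STREAM:data
    WINDOW TUMBLING (5 SECOND) GROUP BY team;
```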

[0] https://fluentbit.io

[1] https://docs.fluentbit.io/stream-processing

