
Ubiq: A Scalable and Fault-Tolerant Log Processing Infrastructure (2016) - SriniK
https://ai.google/research/pubs/pub45805
======
throwawaymath
Let me take this opportunity to say I'm appalled all research at Google has
been rolled into "AI." Ubiq has nothing to do with machine learning, but you
can't access it without wading through a swath of Google-branded AI marketing
material. The domain research.google.com now redirects to ai.google. If you
want to search for _any_ research publications by Google employees, you will
be searching on a domain that first and foremost highlights the AI research
teams. In a particularly egregious example, all quantum computing research has
been filed under "Quantum AI." The "Recent Publications" section exclusively
highlights machine learning research despite there being more recent research
by other teams. I can go on.

In my opinion, this move threatens the scientific integrity of Google's
research. It's clear that researchers in distributed systems, fundamental
theory, security and privacy, networking, language development, etc. are
second class citizens. I wouldn't be surprised to learn that there's
organizational pressure to inject machine learning into as many publications
as possible, even if it dilutes the overall diversity of research.

~~~
kozikow
It's not just research. Android improves battery by 30%, no one bats an eye.
But 1/3 of that improvement is thanks to machine learning - off to the Google
IO we go!

30% and 1/3 are not the real numbers, they just illustrate the concept.

------
carapace
> It also guarantees exactly-once semantics for application pipelines to
> process logs in the form of event bundles.

Well, it's got my attention.

~~~
EdwardDiego
FWIW Kafka 0.11+ supports exactly-once also.

------
bane
huh...I interviewed with Google about 4-5 years ago and one of the
interviewers spent a bunch of time asking me questions about something just
like this, but from a "theoretical" perspective.

Pretty interesting to see this actually happen and see where Google ended up
with it.

------
equalunique
I wonder if the name is a nod to Ubik by Philip K. Dick

~~~
bripeace
That too is what I thought of. Though in the book the product is of dubious
quality and constantly changing what it promises. So it's hard to believe
someone thought it a good idea to associate their product with it.

~~~
atombender
I think you may have misinterpreted what Ubik is in the book. It's not a
product, and it seems to be the manifestation of some kind of creative force
that negates entropy. The commercials quoted in each chapter are not real
commercials in the book.

------
bzillins
Published in 2016

~~~
sctb
Thanks! Updated.

------
nwmcsween
The most scalable, fastest reaction time and simplest log processing tool is
awk.

~~~
Godel_unicode
Ok, I'll bite. Most scalable, you say. I have 250TB of Django logs. How do you
recommend I use awk to determine the 99th percentile response time from them,
and do it faster than SparkSQL would?

~~~
heavenlyblue
SparkSQL doesn’t support gz. Are your logs splittable on a file-by-file basis
or are they in gz format?

Where are your logs stored? Is that a distributed storage? Will SparkSQL not
eat all of the bandwidth of its ethernet interfaces?

Yeah; sure.

~~~
Godel_unicode
It's not so much that SparkSQL doesn't support gz as that gz is slow because
you can't parallelize the reads within a file. Regardless, my logs are in
parquet format in hdfs, so yarn can allocate containers local to the chunk to
be processed. Scales nicely.
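
To make the gz point concrete, here is a rough Python sketch (standard library
only, nothing to do with Spark internals). The log lines are made up for
illustration. It shows that a gzip stream decodes from its first byte but not
from an arbitrary offset in the middle, which is why a single .gz file ends up
being read by one worker.

```python
import gzip
import zlib

# Hypothetical log lines; any payload works for this demonstration.
lines = "".join(f"GET /item/{i} 200 {i % 97}ms\n" for i in range(5000)).encode()
blob = gzip.compress(lines, mtime=0)


def can_decode_from(offset: int) -> bool:
    """Try to start gzip decoding at the given byte offset of the blob."""
    try:
        # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header.
        zlib.decompressobj(wbits=16 + zlib.MAX_WBITS).decompress(blob[offset:])
        return True
    except zlib.error:
        return False


from_start = can_decode_from(0)
from_middle = can_decode_from(len(blob) // 2)
print(from_start, from_middle)
```

Many separate .gz files can still be processed in parallel, one reader per
file. The limitation is about splitting a single file.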

~~~
heavenlyblue
Yeah, but you are making an assumption I’ve got these 350Gb of logs in HDFS
format. Which takes time to set up.

~~~
Godel_unicode
Not really, you said SparkSQL doesn't support gz, which is incorrect and the
thrust of my comment. The anecdote about parquet is orthogonal to gz support.

pedantic sidebar: hdfs isn't a file format, it's a distributed file system
layered over a traditional on-disk filesystem. For example you might have:
json logs, in a gz-formatted file, tracked in the hdfs filesystem, stored on
disk in an ext4-formatted filesystem.
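
A minimal Python sketch of the top two layers, assuming one JSON object per
line and a hypothetical `response_ms` field (neither comes from the thread). A
local temp file stands in for hdfs and ext4, which are invisible to the code
that reads the file anyway. It also computes the 99th percentile asked about
upthread, using the nearest-rank definition.

```python
import gzip
import json
import os
import tempfile


def p99_from_gz_json(path: str, field: str = "response_ms") -> float:
    """Nearest-rank 99th percentile of `field` over JSON lines in a gzip file."""
    values = []
    with gzip.open(path, "rt", encoding="utf-8") as fh:  # compression layer
        for line in fh:
            values.append(json.loads(line)[field])  # log-format layer
    if not values:
        raise ValueError("no log records found")
    values.sort()
    rank = (99 * len(values) + 99) // 100  # ceil(0.99 * n), 1-based rank
    return values[rank - 1]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "access.json.gz")
    # Write 200 made-up records with response times of 1..200 ms.
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for ms in range(1, 201):
            fh.write(json.dumps({"path": "/", "response_ms": ms}) + "\n")
    result = p99_from_gz_json(path)

print(result)
```

This holds every value in memory and sorts it, which is fine for a sample but
not a plan for 250TB.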

