
Spiral: Self-tuning services via real-time machine learning - amzans
https://code.fb.com/data-infrastructure/spiral-self-tuning-services-via-real-time-machine-learning/
======
taeric
Using a cache as an example feels off. I mean, yes. To a large degree, machine
learning is fancy statistics done by the computer. And, to a large degree,
caching is keeping the data that is statistically most likely to be accessed
again.

However, my understanding is the classic algorithms for caching have yet to
lose to the new machine learning ones. Has that changed?

~~~
vbychkovsky
Classic caching algorithms like LRU use only last access time as a feature.
You are correct that ML based on only that one feature won't be much different
from LRU. However, most of the time, data items have metadata that can be used
as features, for example, image size and type. With this additional metadata,
ML algorithms can do a lot better than LRU, which relies only on access
patterns. Also, think about how image content (a sports car, an attractive
person, etc.) could predict whether an image is likely to be accessed in the
future.
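To make the difference concrete, here is a minimal sketch (all feature names and weights are made up for illustration, not taken from Spiral): LRU ranks items by one feature, recency, while a learned policy can combine recency with metadata like size and type into a single "keep" score.

```python
# Hypothetical sketch: scoring cache items by metadata features instead of
# recency alone. Features and weights are illustrative, not Spiral's.

def lru_score(item):
    # Classic LRU: evict the item with the oldest last-access time.
    return item["last_access"]

def feature_score(item, weights):
    # A learned policy combines many features into one "keep" score;
    # in a real system the weights would be fit from access logs.
    return sum(w * item[name] for name, w in weights.items())

cache = [
    {"id": "a", "last_access": 100, "size_kb": 900, "is_thumbnail": 0},
    {"id": "b", "last_access": 90,  "size_kb": 20,  "is_thumbnail": 1},
]

# Illustrative weights: small thumbnails tend to be re-requested often,
# so being a thumbnail strongly raises the "keep" score.
weights = {"last_access": 0.01, "size_kb": -0.001, "is_thumbnail": 2.0}

evict_lru = min(cache, key=lru_score)
evict_ml = min(cache, key=lambda item: feature_score(item, weights))
# LRU evicts the thumbnail "b" (older access), while the feature-based
# score evicts the large, rarely reused image "a" instead.
```

The point is only that extra features can reorder eviction decisions; which ordering actually wins depends on the real access distribution.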

~~~
taeric
I think some caching algorithms are a bit smarter than just LRU. Though, that
is typically as much in having multiple caches as it is anything else.

More importantly, I know this is claimed a lot. But I thought the last few
explorations of the idea I saw did not actually see fancier algorithms win.
Indeed, the best strategy from my memory was random spreading of the data with
an almost random replacement strategy. I think some of the win there was just
the low overhead of the bookkeeping, but it was still one of the better bets.

(This is all off the table, of course, if you _know_ what the access pattern
will be. Then, by all means, set things up accordingly.)

------
mohaps
One of the authors here. Happy to answer questions.

~~~
abrichr
I'm having trouble fully grasping this:

> Today, rather than specify how to compute correct responses to requests, our
> engineers encode the means of providing feedback to a self-tuning system.

"encod[ing] the means of providing feedback to a self-tuning system", got it,
very cool!

But don't they still have to "specify how to compute correct responses to
requests"?

~~~
nawgszy
Not OP / GP; from what I understand, this isn't an API generated by ML, it's a
cache manager. "Computing correct responses to requests" refers to deciding
whether a request should be served from some caching layer and whether the
result should be cached in the future, and it does this by optimizing some
parameters. The difference is that it decides whether to use the API or the
cache to load an image or some content, rather than saying "this request
should return a response with these properties". Hopefully I'm right and this
makes sense.
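The quoted sentence can be read as a feedback loop, roughly like this sketch (the class, learning rule, and names here are my own illustration, not Spiral's actual API): engineers write the feedback signal ("was the cached item reused?"), and the caching decision itself is never hand-coded; it falls out of a tuned estimate.

```python
# Rough sketch of the feedback idea: instead of hand-coding a caching
# rule, engineers supply a feedback signal and a simple online learner
# tunes the decision. Everything here is illustrative, not Spiral's API.

class SelfTuningAdmitter:
    def __init__(self, threshold=0.5, lr=0.1):
        self.p_reuse = 0.5      # running estimate of re-access probability
        self.threshold = threshold
        self.lr = lr

    def should_cache(self, request):
        # The "correct response" is never specified directly;
        # it falls out of the tuned estimate.
        return self.p_reuse >= self.threshold

    def feedback(self, was_reused):
        # This is the part engineers encode: a signal saying whether
        # the last caching decision paid off. Exponential moving average
        # stands in for whatever learner the real system uses.
        target = 1.0 if was_reused else 0.0
        self.p_reuse += self.lr * (target - self.p_reuse)

admitter = SelfTuningAdmitter()
for _ in range(10):
    admitter.feedback(False)    # cached items keep going unused...
decision = admitter.should_cache({})  # ...so the policy stops caching
```

After ten negative feedback signals the reuse estimate decays below the threshold, so `should_cache` flips to False without anyone editing the policy.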

------
dangirsh
Potentially confused with SpiralGen:
[http://www.spiralgen.com/](http://www.spiralgen.com/)

~~~
gwern
And even more easily confusable with DeepMind's Spiral, another reinforcement
learning project: [https://deepmind.com/research/publications/synthesizing-
prog...](https://deepmind.com/research/publications/synthesizing-programs-
images-using-reinforced-adversarial-learning/)

~~~
shoo
Yet less confusable with a certain French television police procedural and
legal drama series set in Paris.

------
grantlmiller
This looks really interesting. Does anyone know if there are plans to open
source the framework?

~~~
vbychkovsky
Thank you for your interest, this is very encouraging! There are no immediate
plans to open source this, but it is something we may consider in the future.

