Spiral: Self-tuning services via real-time machine learning (fb.com)
91 points by amzans on July 2, 2018 | 17 comments



Using a cache as an example feels off. I mean, yes. To a large degree, machine learning is fancy statistics done by the computer. And, to a large degree, caching is keeping the data that is statistically most likely to be accessed again.

However, my understanding is that the classic caching algorithms have yet to lose to the new machine learning ones. Has that changed?


Classic caching algorithms like LRU use only last access time as a feature. You are correct that ML based on only that one feature won't be much different from LRU. However, most of the time, data items have metadata that can be used as features, for example, image size and type. With this additional metadata, ML algorithms can do a lot better than LRU, which only uses access patterns. Also, think about how image content (a sports car, an attractive person, etc.) could predict whether the image is likely to be accessed in the future.
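
To make that concrete, here is a rough sketch of the idea (the feature names and weights are invented for illustration, not anything Spiral actually uses): score each item from its metadata as well as its recency, and evict the lowest-scoring entry instead of the least recently used one.

    # Hypothetical metadata-aware eviction instead of pure LRU.
    # Feature names and weights are invented for illustration.
    import time

    WEIGHTS = {"is_jpeg": 0.8, "small_image": 0.5, "recency": 1.0}

    def score(item):
        age = time.time() - item["last_access"]
        features = {
            "is_jpeg": 1.0 if item["type"] == "jpeg" else 0.0,
            "small_image": 1.0 if item["size"] < 100_000 else 0.0,
            "recency": 1.0 / (1.0 + age),
        }
        return sum(WEIGHTS[k] * v for k, v in features.items())

    def evict_one(cache):
        # Drop the entry scored least likely to be reused, rather than
        # simply the least recently used one.
        victim = min(cache, key=lambda key: score(cache[key]))
        del cache[victim]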


I think some caching algorithms are a bit smarter than plain LRU, though that is typically as much about having multiple caches as anything else.

More importantly, I know this is claimed a lot, but in the last few explorations of the idea I saw, the fancier algorithms did not actually win. Indeed, the best strategy, from memory, was random spreading of the data with an almost random replacement strategy. I think some of the win there was just the low overhead of the bookkeeping, but it was still one of the better bets.

(This is all off the table, of course, if you know what the access pattern will be. Then, by all means, set things up accordingly.)


To add to Vlad's comment above: the major win was being able to create a simple API that inherently computes the update -> cache invalidation probability, taking into account the semantics of the item being cached and its relevance to the query. Doing this explicitly via heuristics leads to bespoke effort for each new and modified query.

With Spiral, we were able to approach this top down as a classification problem.

e.g.

If you have a cached query for "Friends that liked my post", the Spiral classifier quickly learns, via the feedback from the caching code, that "Post Last Viewed At" or "Post Last Modified At" is not relevant to it.

Pre-Spiral, this was expressed via a curated blacklist/whitelist that had to be recreated if the query characteristics changed.
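
As a toy illustration of that classification framing (the field names and the simple counting scheme are mine, not Spiral's real code): keep per-field statistics on whether an update to that field actually changed the cached query result, and invalidate only when history says it matters.

    # Toy per-field invalidation "classifier"; purely illustrative.
    from collections import defaultdict

    stats = defaultdict(lambda: {"changed": 0, "seen": 0})

    def should_invalidate(updated_field, threshold=0.05):
        s = stats[updated_field]
        if s["seen"] < 20:
            return True  # not enough evidence yet, so play it safe
        return s["changed"] / s["seen"] > threshold

    def feedback(updated_field, result_actually_changed):
        # Called by the caching code once it knows whether the update
        # really changed the cached query result.
        s = stats[updated_field]
        s["seen"] += 1
        if result_actually_changed:
            s["changed"] += 1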


I think this makes sense. My mental model would have been to view this as a sharding-style concern, where items of different sizes and topics would go to effectively different caches, with hit rates determining the size of each cache.

That said, I think I see where I was mistaken in thinking that was an odd example. It was literally the example. Not just a random pedagogical one.

To that end, thanks for sharing! Cool stuff.


One of the authors here. Happy to answer questions.


Could you talk a bit about the challenges you faced in developing this? And the before-and-after benchmarks?

I tried something kinda similar to help with tuning data engineering jobs and pipelines for performance and cost. But it turned out to be a fruitless activity because there were too many variables affecting performance. I'd produce some models that seemed marginally effective, but after code changes, configuration changes, and changes to data input sizes, the models quickly became stale and ineffective.


We had two big challenges: (1) getting people to clearly specify the feedback mechanism, and (2) figuring out what features to use.

(Isn’t all ML about good labels and features? :-) )

Structured ML systems require you to provide (ideally) unambiguous examples of expected behavior. In the case of Spiral (or any other online learning system), such examples need to be generated automatically. In our experience, this part took a good amount of effort: distributed-systems issues (i.e., race conditions and transient bugs in remote systems) made automatic generation of "clean" examples difficult. Once the bulk of these problems were addressed, the system began to operate very smoothly; specifically, it adapted to changing conditions very well.

Spiral is designed to be a drop-in replacement for hand-coded heuristics. In other words, if you had a somewhat working tree of if-else statements that specified your image caching policy (if size<100k and type==jpeg..), you should already have an idea of what features to use. There is a bit of work involved in translating these features into a form suitable for classifiers in Spiral. For example, if a classifier is using binary features, the file size feature would need to be quantized (123kb -> "100-200kb bucket"). While this type of work requires forethought and effort, the runtime cost of running this classifier is very low.
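
For instance, turning the raw inputs of such a heuristic into binary features might look roughly like this (the bucket boundaries are made up for the example):

    # Illustrative featurization only; bucket boundaries are invented.
    SIZE_BUCKETS_KB = [(0, 100), (100, 200), (200, 1000)]

    def featurize(size_kb, mime_type):
        features = {"type_is_jpeg": mime_type == "jpeg"}
        for lo, hi in SIZE_BUCKETS_KB:
            features["size_%d-%dkb" % (lo, hi)] = lo <= size_kb < hi
        return features

    # featurize(123, "jpeg") sets "size_100-200kb" to True and the
    # other size buckets to False.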


I'm having trouble fully grasping this:

> Today, rather than specify how to compute correct responses to requests, our engineers encode the means of providing feedback to a self-tuning system.

"encod[ing] the means of providing feedback to a self-tuning system", got it, very cool!

But don't they still have to "specify how to compute correct responses to requests"?


Not OP / GP; from what I understand, this isn't an API generated by ML, it's a cache manager. "Computing correct responses to requests" refers to deciding whether a value should be read from some caching layer and whether it should be cached in the future, and it does this by optimizing some parameters. The difference is that it decides whether to hit the API or the cache to load an image or some content, rather than saying "this request should return a response with these properties". Hopefully I'm right and this makes sense.


The basic difference in approach was to switch from the earlier approach of

if (conditionA && conditionB && !conditionC) cache_it()

to

hey look, an item with featureset X is cacheable while one with featureset Y is not.

This is reflected in the API, which is just two calls: predict() and feedback().

This simplifies the integration code and keeps it easily debuggable even in the face of changes.
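
As a minimal, self-contained sketch of that two-call shape (the simple online learner inside is an assumption for illustration, not Spiral's actual model):

    # predict() gates the caching decision; feedback() labels it later.
    class CachePolicy:
        def __init__(self, learning_rate=0.1):
            self.weights = {}
            self.learning_rate = learning_rate

        def predict(self, features):
            score = sum(self.weights.get(name, 0.0) * value
                        for name, value in features.items())
            return score >= 0.0

        def feedback(self, features, label):
            # label is True if caching the item turned out to be worthwhile.
            if self.predict(features) == label:
                return
            direction = 1.0 if label else -1.0
            for name, value in features.items():
                self.weights[name] = (self.weights.get(name, 0.0)
                                      + self.learning_rate * direction * value)

    policy = CachePolicy()
    feats = {"type_is_jpeg": 1.0, "size_100-200kb": 1.0}
    if policy.predict(feats):
        pass  # the integration code would cache the item here
    policy.feedback(feats, label=True)  # later: the item was accessed again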


Potentially confused with SpiralGen: http://www.spiralgen.com/


And even more easily confusable with DeepMind's Spiral, another reinforcement learning project: https://deepmind.com/research/publications/synthesizing-prog...


Yet less confusable with a certain French television police procedural and legal drama series set in Paris.


This looks really interesting. Does anyone know if there are plans to open source the framework?


Thank you for your interest, this is very encouraging! There are no immediate plans to open source this, but it is something we may consider in the future.


Yeah, please open source this!



