
Open Sourcing Vespa, Yahoo’s Big Data Processing and Serving Engine - martinp
http://blog.vespa.ai/post/165763618906/open-sourcing-vespa-yahoos-big-data-processing
======
mkj
There's a lot in there.

Cluster file distribution with BitTorrent:
[https://github.com/vespa-engine/vespa/tree/master/filedistribution](https://github.com/vespa-engine/vespa/tree/master/filedistribution)

------
toast0
If someone was familiar with Vespa in 2011, but hasn't had access to it until
now, what's new since then?

~~~
tedd4u
At Flickr, we worked closely with the Vespa team from 2011 through 2016 on a
wide range of advancements:

    
    
       * partial document refeeding (i.e. quickly indexing a new field across 20+ billion documents without refeeding everything, while staying online and handling 100M+ free-text queries a day)
       * visual similarity search: check out the tensor ranking features [1] [2]
       * online elasticity: add or remove replicas and shards online. A must when a re-feed from scratch could take weeks or more, and non-trivial to make work smoothly at scale.
       * latency and tail latency on complex queries: p90 reduced from 3,000 ms to 30 ms.
    

This is a major gift to the open-source community: a battle-tested search
engine that works reliably, without babysitting, on very large datasets under
simultaneous high query and high feed volumes. Huge debt of gratitude to the
team in Trondheim and the Verizon/Oath/Yahoo legal and management teams for
making this happen. :+1:

[1] [http://docs.vespa.ai/documentation/tensor-intro.html](http://docs.vespa.ai/documentation/tensor-intro.html)

[2] [http://docs.vespa.ai/documentation/tensor-user-guide.html](http://docs.vespa.ai/documentation/tensor-user-guide.html)

------
groodt
Powers bits of Flickr. Interesting.

