
ArrayUDF: Breakthrough data structure - dxbydt
https://cs.lbl.gov/news-media/news/2017/berkeley-labs-arrayudf-tool-turns-large-scale-scientific-array-data-analysis-into-a-cakewalk/
======
wahern
FWIW, here's the abstract from their 2017 paper.

    
    
      User-Defined Functions (UDF) allow application 
      programmers to specify analysis operations on data, while 
      leaving the data management tasks to the system. This 
      general approach enables numerous custom analysis 
      functions and is at the heart of the modern Big Data 
      systems. Even though the UDF mechanism can theoretically 
      support arbitrary operations, a wide variety of common 
      operations -- such as computing the moving average of a 
      time series, the vorticity of a fluid flow, etc., -- are 
      hard to express and slow to execute. Since these 
      operations are traditionally performed on 
      multi-dimensional arrays, we propose to extend the 
      expressiveness of structural locality for supporting UDF 
      operations on arrays. We further propose an in situ UDF 
      mechanism, called ArrayUDF, to implement the structural 
      locality. ArrayUDF allows users to define computations on 
      adjacent array cells without the use of join operations 
      and executes the UDF directly on arrays stored in data 
      files without requiring to load their content into a data 
      management system. Additionally, we present a thorough 
      theoretical analysis of the data access cost to exploit 
      the structural locality, which enables ArrayUDF to 
      automatically select the best array partitioning strategy 
      for a given UDF operation. In a series of performance 
      evaluations on large scientific datasets, we have 
      observed that -- using the generic UDF interface -- 
      ArrayUDF consistently outperforms Spark, SciDB, and 
      RasDaMan.
    

[https://dl.acm.org/citation.cfm?id=3078599](https://dl.acm.org/citation.cfm?id=3078599)
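
To make the "computations on adjacent array cells without the use of join operations" part concrete, here's a rough sketch of the idea in plain Python. To be clear, this is not ArrayUDF's actual API. `Stencil`, `apply_udf`, and `moving_average` are names I made up for illustration, and the edge clamping is my own assumption about how to handle borders. It also ignores everything about in situ file access and partitioning that the abstract describes. The point is only that the UDF addresses its neighbours by relative offset, so a moving average becomes a one-liner.

```python
from typing import Callable, List, Sequence


class Stencil:
    """Hypothetical view of one cell plus its neighbours, addressed by relative offset."""

    def __init__(self, data: Sequence[float], index: int) -> None:
        self._data = data
        self._index = index

    def __call__(self, offset: int = 0) -> float:
        # Assumption: clamp at the edges so border cells reuse the nearest valid value.
        i = min(max(self._index + offset, 0), len(self._data) - 1)
        return self._data[i]


def apply_udf(data: Sequence[float], udf: Callable[[Stencil], float]) -> List[float]:
    """Run `udf` once per cell; the UDF reads neighbours directly, with no join step."""
    return [udf(Stencil(data, i)) for i in range(len(data))]


def moving_average(s: Stencil) -> float:
    # Three-point moving average over the cell and its two immediate neighbours.
    return (s(-1) + s(0) + s(1)) / 3.0


series = [3.0, 6.0, 9.0, 12.0, 15.0]
smoothed = apply_udf(series, moving_average)
print(smoothed)  # [4.0, 6.0, 9.0, 12.0, 14.0]
```

In a relational or generic UDF setting you'd typically have to join the series against shifted copies of itself to get at the neighbouring values, which is the "hard to express and slow to execute" problem the abstract mentions.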

