I would say TensorFlow is a hybrid of two strategies: SIMD and dataflow/DAG. (I wouldn't say fork-join and dataflow/DAG are synonymous; rather they are related but different models/APIs).
At the level of a single node, TensorFlow uses Eigen [1]. Eigen is like BLAS, but it's a C++ template library rather than Fortran. It compiles to various flavors of SIMD. Nvidia's proprietary CUDA is the SIMD flavor most commonly used by TensorFlow programs.
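As a rough illustration from the Python side (a sketch against the old 1.x-style graph API, not a claim about kernel internals), which of those backends runs a given op is largely a matter of device placement:

    import tensorflow as tf  # assumes the 1.x-style graph API

    a = tf.random_normal([1024, 1024])

    with tf.device("/cpu:0"):
        cpu_prod = tf.matmul(a, a)  # executed by Eigen-based CPU kernels (SSE/AVX, etc.)

    with tf.device("/gpu:0"):
        gpu_prod = tf.matmul(a, a)  # executed by CUDA-backed kernels, if a GPU is present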
At the level of multiple nodes, TensorFlow derives a program graph from your Python code, using high-level "ops" in the style of NumPy. Then it distributes the ops across a cluster using a scheduler:
Quote: Its dataflow scheduler, which is the component that chooses the next node to execute, uses the same basic algorithm as Dryad, Flume, CIEL, and Spark. [2]
Python is the "control plane" and not the "data plane" -- it describes the logic and dataflow of the program, but doesn't touch the actual data. When you use NumPy, the C code and BLAS code are the data plane. When you use TensorFlow, Eigen and the gRPC/protobuf distribution layer are the data plane.
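Here's a small sketch of that distinction (again assuming the 1.x graph-and-session style):

    import numpy as np
    import tensorflow as tf  # assumes the 1.x-style graph API

    # NumPy: the call itself is the computation. np.dot drops straight into
    # C/BLAS and returns a concrete array.
    x = np.random.rand(512, 512)
    y = np.dot(x, x)

    # TensorFlow: these Python lines only add nodes to a graph; no data moves yet.
    a = tf.random_normal([512, 512])
    b = tf.matmul(a, a)

    # Only session.run() hands the graph to the runtime (Eigen kernels, plus the
    # gRPC/protobuf layer if distributed), which is where the data plane lives.
    with tf.Session() as sess:
        result = sess.run(b)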
So you can have a big data dataflow system WITHOUT SIMD, like the four systems mentioned in the quote. And you can have SIMD without dataflow, e.g. if you're doing everything in pure Eigen, or in procedural/functional R/Matlab/Julia on a single machine. Languages like R and Julia may have dataflow extensions, but they're single-threaded/procedural by default as far as I know.
A mathematical way to think of the DAG model is that your program specifies a partial order on computations rather than a total order (the procedural model) -- that partial order is what gives you parallelism.
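A toy version of that partial order, in the same 1.x graph style: c and d below have no edge between them, so the scheduler may run them in either order or concurrently, while e must wait for both.

    import tensorflow as tf  # assumes the 1.x-style graph API

    a = tf.random_normal([256, 256])
    b = tf.random_normal([256, 256])

    c = tf.matmul(a, a)  # c and d are unordered relative to each other...
    d = tf.matmul(b, b)
    e = c + d            # ...but both must precede e: a partial, not total, order

    with tf.Session() as sess:
        sess.run(e)  # the scheduler is free to execute c and d in parallel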
So TensorFlow uses both SIMD and dataflow.
[1] http://eigen.tuxfamily.org/index.php?title=Main_Page
[2] http://download.tensorflow.org/paper/whitepaper2015.pdf