ML Paper Challenge Day 1 — TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
2 min read · Apr 12, 2020
Day 1: 2020.04.12
Paper: TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
Category: Tool/Framework/Software Architecture
Computation model
- A computation is described by a directed graph, in which each node represents one operation.
- Unlike MapReduce, which splits work across subsets of the data, TensorFlow's distributed model splits the computation itself into subgraphs.
- Placement algorithm (How to divide?):
“For each node that is reached in this traversal, the set of feasible devices is considered (a device may not be feasible if the device does not provide a kernel that implements the particular operation). For nodes with multiple feasible devices, the placement algorithm uses a greedy heuristic that examines the effects on the completion time of the node of placing the node on each possible device. This heuristic takes into…
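To make the quoted placement step concrete, here is a minimal Python sketch of a greedy, completion-time-based placer. It is not TensorFlow's implementation. The function name `greedy_place`, the `exec_time` lookup table, and the flat `transfer_cost` charged per cross-device input are all assumptions made for illustration, and the cost terms are a guess at what such a heuristic might weigh rather than a reconstruction of the truncated quote.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def greedy_place(ops, inputs, kernels, exec_time, transfer_cost=1.0):
    """Greedily assign each graph node to the device where it finishes soonest.

    ops:        node -> operation name
    inputs:     node -> list of nodes it depends on
    kernels:    device -> set of operation names the device can run
    exec_time:  (operation, device) -> estimated run time (hypothetical cost table)
    transfer_cost: flat cost per input living on another device (assumption)
    """
    placement = {}                            # node -> chosen device
    finish = {}                               # node -> simulated completion time
    device_free = {d: 0.0 for d in kernels}   # time at which each device is idle

    # Visit nodes so that every input is placed before its consumer.
    for node in TopologicalSorter(inputs).static_order():
        # A device is feasible only if it provides a kernel for this operation.
        feasible = [d for d in kernels if ops[node] in kernels[d]]
        if not feasible:
            raise ValueError(f"no device provides a kernel for {ops[node]!r}")

        best = None
        for dev in feasible:
            # Inputs arrive later if they must cross a device boundary.
            ready = max(
                (finish[i] + (0.0 if placement[i] == dev else transfer_cost)
                 for i in inputs.get(node, ())),
                default=0.0,
            )
            done = max(ready, device_free[dev]) + exec_time[(ops[node], dev)]
            if best is None or done < best[0]:
                best = (done, dev)

        finish[node], placement[node] = best
        device_free[placement[node]] = finish[node]

    return placement, finish


# Toy graph: two matrix multiplies feed an add, whose result is printed.
ops = {"a": "MatMul", "b": "MatMul", "c": "Add", "d": "Print"}
inputs = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
kernels = {"cpu": {"MatMul", "Add", "Print"}, "gpu": {"MatMul", "Add"}}
exec_time = {
    ("MatMul", "cpu"): 10, ("MatMul", "gpu"): 2,
    ("Add", "cpu"): 1, ("Add", "gpu"): 1,
    ("Print", "cpu"): 1,
}

placement, finish = greedy_place(ops, inputs, kernels, exec_time)
print(placement)
```

In this toy setup the `Print` node can only land on the CPU because the hypothetical GPU has no kernel for it, which is the feasibility filter from the quote. The other nodes go wherever the simulated finish time is lowest.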