Hi,
A computer cluster is composed of nodes, and each node executes jobs on the files stored locally on it (data locality optimization). Some files have more jobs assigned to them than others, so they generate more load.
Let's say we have:
- file A with load X
- file B with load X
- file C with load 2X
- two nodes in the cluster
So the best distribution is files A and B on one node and file C on the other, which gives each node a total load of 2X.
In general, how can I distribute the files across the nodes so that the load is as balanced as possible? Does a greedy algorithm solve my problem?
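
To make the question concrete, here is a minimal sketch of the kind of greedy approach I have in mind: sort the files by load in descending order and always place the next file on the node with the smallest total load so far (often called longest-processing-time-first). The function name `distribute` and the numeric loads (X = 1) are only for illustration. As far as I know this is a heuristic and is not guaranteed to give the optimal distribution in every case.

```python
import heapq


def distribute(loads, num_nodes):
    """Greedy heuristic: heaviest file first, onto the least-loaded node.

    loads: dict mapping file name -> load (hypothetical units)
    Returns a list with one list of file names per node.
    """
    # Min-heap of (total load so far, node index); the lightest node is on top.
    heap = [(0, i) for i in range(num_nodes)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_nodes)]

    # Process files from the heaviest to the lightest.
    for name, load in sorted(loads.items(), key=lambda kv: kv[1], reverse=True):
        total, node = heapq.heappop(heap)   # least-loaded node right now
        assignment[node].append(name)
        heapq.heappush(heap, (total + load, node))

    return assignment


# The example from above, with X = 1.
files = {"A": 1, "B": 1, "C": 2}
print(distribute(files, 2))
```

On the example above this puts file C alone on one node and files A and B together on the other, which matches the distribution I expect.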