Wednesday, December 14, 2011

Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks

Dryad can be seen as an extension of MapReduce. It models a computation as a directed acyclic graph (DAG) whose vertices are sequential programs and whose edges carry data between them. The authors show that MapReduce is just one possible DAG and that the framework generalizes to many more. Dryad also offers more ways to communicate than MapReduce, which relies on files to pass the output of one stage into the next. In Dryad, vertices may communicate using files, sockets, pipes, and even shared memory.
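To make the DAG model concrete, here is a minimal single-machine sketch in Python (3.9+ for `graphlib`). It does not reproduce Dryad's actual API. The `Graph` class and the `mapper`, `reducer`, and `mapreduce_graph` functions are hypothetical names chosen for illustration. Vertices are ordinary sequential functions, edges are in-memory lists, and the graph runs once in topological order. Wiring every map vertex to every reduce vertex with hash partitioning recovers word-count MapReduce as one particular DAG.

```python
import zlib
from collections import defaultdict
from graphlib import TopologicalSorter


class Graph:
    """Hypothetical toy job graph: vertices are sequential functions, edges are in-memory channels."""

    def __init__(self):
        self.fns = {}    # vertex name -> fn(inputs, out_names) -> {dest_name or None: records}
        self.preds = {}  # vertex name -> set of upstream vertex names
        self.succs = {}  # vertex name -> list of downstream vertex names

    def add_vertex(self, name, fn):
        self.fns[name] = fn
        self.preds.setdefault(name, set())
        self.succs.setdefault(name, [])

    def add_edge(self, src, dst):
        # Both endpoints must already have been added with add_vertex.
        self.preds[dst].add(src)
        self.succs[src].append(dst)

    def run(self, initial):
        """Run every vertex once, in dependency order; return the outputs of sink vertices."""
        channels = defaultdict(list)  # (src, dst) -> records sent along that edge
        results = {}
        for name in TopologicalSorter(self.preds).static_order():
            inputs = list(initial.get(name, []))
            for src in sorted(self.preds[name]):
                inputs.extend(channels[(src, name)])
            routed = self.fns[name](inputs, self.succs[name])
            for dst, records in routed.items():
                if dst is None:
                    results[name] = records  # a None key marks final output
                else:
                    channels[(name, dst)].extend(records)
        return results


def mapper(lines, outs):
    routed = {dst: [] for dst in outs}
    for line in lines:
        for word in line.split():
            # Hash-partition on the key so every copy of a word reaches the same reducer.
            dst = outs[zlib.crc32(word.encode()) % len(outs)]
            routed[dst].append((word, 1))
    return routed


def reducer(pairs, outs):
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return {None: sorted(counts.items())}


def mapreduce_graph(num_maps, num_reduces):
    """Build word-count MapReduce as a DAG: every map vertex feeds every reduce vertex."""
    g = Graph()
    maps = [f"map{i}" for i in range(num_maps)]
    reduces = [f"reduce{j}" for j in range(num_reduces)]
    for m in maps:
        g.add_vertex(m, mapper)
    for r in reduces:
        g.add_vertex(r, reducer)
    for m in maps:
        for r in reduces:
            g.add_edge(m, r)
    return g


g = mapreduce_graph(num_maps=2, num_reduces=2)
out = g.run({"map0": ["a b a"], "map1": ["b c"]})
counts = dict(pair for records in out.values() for pair in records)
print(sorted(counts.items()))  # [('a', 2), ('b', 2), ('c', 1)]
```

Because `run` only requires the graph to be acyclic, the same toy executor would also accept shapes that a single MapReduce job cannot express, such as reducers feeding a further round of vertices.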

Dryad does solve some of the problems I had with MapReduce. It allows the output of one MapReduce-style job to be piped efficiently into another without going to disk each time. However, this comes at the cost of complexity: exposing all these knobs to the programmer may leave them feeling overwhelmed. Overall, I agree with the sentiment that expert programmers should use Dryad to build simpler platforms that other programmers can use.
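A rough sketch of the disk-versus-memory trade-off, in Python. `FileChannel` and `MemoryChannel` are hypothetical stand-ins for a file-backed channel and a shared-memory channel, and the two-stage pipeline runs sequentially in one process, so it ignores the producer/consumer concurrency a real shared-memory FIFO involves. The second job's result is the same either way; only the file-backed channel writes the intermediate data to disk.

```python
import json
import os
import tempfile
from collections import defaultdict


class FileChannel:
    """Hypothetical channel that materializes records to a temp file, as between MapReduce jobs."""

    def __init__(self):
        fd, self.path = tempfile.mkstemp(suffix=".jsonl")
        os.close(fd)

    def write(self, records):
        with open(self.path, "w") as f:
            for r in records:
                f.write(json.dumps(r) + "\n")

    def read(self):
        with open(self.path) as f:
            records = [tuple(json.loads(line)) for line in f]
        os.remove(self.path)  # clean up the intermediate file after the single read
        return records


class MemoryChannel:
    """Hypothetical channel that hands records straight to the consumer without touching disk."""

    def __init__(self):
        self._records = []

    def write(self, records):
        self._records = list(records)

    def read(self):
        return self._records


def word_count(lines):
    counts = defaultdict(int)
    for line in lines:
        for word in line.split():
            counts[word] += 1
    return sorted(counts.items())


def group_by_count(pairs):
    groups = defaultdict(list)
    for word, n in pairs:
        groups[n].append(word)
    return {n: sorted(words) for n, words in sorted(groups.items())}


def pipeline(lines, channel):
    # Job 1 writes its output into the channel; job 2 reads it back as input.
    channel.write(word_count(lines))
    return group_by_count(channel.read())


lines = ["a b a", "b c"]
print(pipeline(lines, MemoryChannel()))  # {1: ['c'], 2: ['a', 'b']}
```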

Dryad has not had as much success as MapReduce. I believe this is because of the extra complexity it brings to the table. MapReduce remains simpler in people's minds, and so it is the go-to solution for many of the problems they face. Perhaps Dryad would find more success by presenting itself as a MapReduce platform, with extra knobs that expert programmers can use to build interesting platforms.
