Friday, January 18, 2013

Hadoop RPC is not RMI

I had never looked closely at the IPC protocols in Hadoop. I just ran into this today and, luckily, found the explanation from the inventor:


Why use Hadoop IPC over RMI or java.io.Serialization? Here's what Doug has to say:
Why didn't I use Serialization when we first started Hadoop? Because it looked big-and-hairy and I thought we needed something lean-and-mean, where we had precise control over exactly how objects are written and read, since that is central to Hadoop. With Serialization you can get some control, but you have to fight for it.

The logic for not using RMI was similar. Effective, high-performance inter-process communications are critical to Hadoop. I felt like we'd need to precisely control how things like connections, timeouts and buffers are handled, and RMI gives you little control over those.
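To make the "precise control over exactly how objects are written and read" point concrete, here is a small sketch of my own. It is not Hadoop's actual code: `SimpleWritable` and `BlockInfo` are hypothetical names I made up, loosely following the idea of explicit write/read methods, and only the JDK's `java.io` classes are used. It writes the same object once field by field and once through `ObjectOutputStream`, so the two byte sizes can be compared.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class HandRolledSerializationDemo {

    // Hypothetical interface: the object itself decides its wire format.
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical record; also Serializable so both encodings can be compared.
    static class BlockInfo implements SimpleWritable, Serializable {
        private static final long serialVersionUID = 1L;
        long blockId;
        int length;

        BlockInfo() {}

        BlockInfo(long blockId, int length) {
            this.blockId = blockId;
            this.length = length;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(blockId); // 8 bytes
            out.writeInt(length);   // 4 bytes
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Fields must be read in exactly the order they were written.
            blockId = in.readLong();
            length = in.readInt();
        }
    }

    // Encode using the hand-written write() method: no class metadata on the wire.
    static byte[] toBytesByHand(SimpleWritable w) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Encode using built-in Java serialization, which adds a stream header
    // and a class descriptor in front of the field values.
    static byte[] toBytesWithJavaSerialization(Serializable s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Decode the hand-written format back into a BlockInfo.
    static BlockInfo fromBytesByHand(byte[] bytes) {
        BlockInfo info = new BlockInfo();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            info.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return info;
    }

    public static void main(String[] args) {
        BlockInfo original = new BlockInfo(42L, 1024);
        byte[] byHand = toBytesByHand(original);
        byte[] viaJava = toBytesWithJavaSerialization(original);
        BlockInfo copy = fromBytesByHand(byHand);

        // The hand-written form is always 12 bytes: one long plus one int.
        System.out.println("hand-rolled bytes: " + byHand.length);
        System.out.println("Java serialization bytes: " + viaJava.length);
        System.out.println("round trip ok: "
                + (copy.blockId == original.blockId && copy.length == original.length));
    }
}
```

The trade-off in this sketch is that the hand-written format carries no type or version information, so reader and writer have to agree on the field order themselves. That is the price of the "lean-and-mean" approach described in the quote.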

I am going to brew my own RPC protocol in C++ as well!
