Friday, January 18, 2013

Hadoop RPC is not RMI

I had never looked closely at the IPC protocols in Hadoop. I just ran into this today and, luckily, found the explanation from the inventor:


Why use Hadoop IPC over RMI or java.io.Serialization? Here's what Doug has to say:
Why didn't I use Serialization when we first started Hadoop? Because it looked big-and-hairy and I thought we needed something lean-and-mean, where we had precise control over exactly how objects are written and read, since that is central to Hadoop. With Serialization you can get some control, but you have to fight for it.

The logic for not using RMI was similar. Effective, high-performance inter-process communications are critical to Hadoop. I felt like we'd need to precisely control how things like connections, timeouts and buffers are handled, and RMI gives you little control over those.

I am going to brew my own RPC protocol in C++ as well!

Tuesday, January 8, 2013

C++ segmentation fault and undefined behaviour causes

1. I ran into an issue where calling a member function on an object threw a segmentation fault. It turned out that I hadn't made a deep copy of the dynamically allocated elements when copying the object: the default copy constructor (and/or operator=) copies only the pointers, so it may not handle that correctly.

Details are explained well in this post:

http://www.cplusplus.com/forum/general/28420/
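
To make the failure mode concrete, here is a minimal sketch (the Buffer class is made up for illustration, not taken from my code). The compiler-generated copy constructor would copy only the pointer, so two objects would delete[] the same array; the explicit deep copies below avoid that:

#include <algorithm>
#include <cstddef>

class Buffer {
    std::size_t _size;
    int* _data;   // dynamically allocated, owned by this object

public:
    explicit Buffer(std::size_t size) : _size(size), _data(new int[size]()) {}

    // Deep copy; the default copy constructor would copy the pointer itself.
    Buffer(const Buffer& other) : _size(other._size), _data(new int[other._size]) {
        std::copy(other._data, other._data + other._size, _data);
    }

    // operator= needs the same treatment (the "Rule of Three").
    Buffer& operator=(const Buffer& other) {
        if (this != &other) {
            int* fresh = new int[other._size];
            std::copy(other._data, other._data + other._size, fresh);
            delete[] _data;
            _data = fresh;
            _size = other._size;
        }
        return *this;
    }

    ~Buffer() { delete[] _data; }
};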

2. When you reference an item, but that item has already been deallocated. For example (a cleaned-up version of the code, assuming Boost.Asio):

#include <boost/asio.hpp>

using boost::asio::ip::tcp;

class A {
    tcp::socket* _sock;   // held by pointer so it can be created inside connect()

public:
    void connect() {
        boost::asio::io_service _io_service;  // local variable!
        _sock = new tcp::socket(_io_service); // the socket stores a reference to _io_service
    }   // _io_service is destructed here
    // ...
};

When connect() is called, _sock is assigned the connected socket. But _io_service is destructed when connect() returns, so the socket is left holding a reference to a dead io_service; any later use of _sock is undefined behaviour and often a segfault.
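
One way to fix it (a sketch, assuming Boost.Asio) is to give the io_service the same lifetime as the socket by making it a member. Members are destructed in reverse declaration order, so declaring _io_service before _sock guarantees it outlives the socket:

#include <boost/asio.hpp>

using boost::asio::ip::tcp;

class A {
    boost::asio::io_service _io_service; // declared first, destructed last
    tcp::socket _sock;

public:
    A() : _sock(_io_service) {}          // bind the socket to the long-lived io_service

    void connect(const tcp::endpoint& endpoint) {
        _sock.connect(endpoint);         // no temporary io_service involved
    }
};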

3. Destructor not virtual. Deleting a derived object through a base-class pointer whose destructor is not virtual is undefined behaviour: the derived destructor never runs, so anything the derived part owns may leak or be torn down incorrectly.
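
A minimal sketch (class names are made up):

#include <cstdio>

struct Base {
    ~Base() {}                      // not virtual: deleting through Base* is UB
};

struct Derived : Base {
    int* _data;
    Derived() : _data(new int[16]) {}
    ~Derived() { delete[] _data; }  // never runs in the delete below
};

int main() {
    Base* p = new Derived;
    delete p;                       // undefined behaviour: ~Derived() is skipped, _data leaks
    return 0;
}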