In HHH (default) mode, we evaluate RDMA-enhanced MapReduce over RDMA-enhanced HDFS with heterogeneous storage support, and compare our package with default Hadoop MapReduce running over default HDFS with heterogeneous storage. In this mode, the heterogeneous storage consists of RAM disks, SSDs, and HDDs. For detailed configuration and setup, please refer to our user guide.
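
As an illustration of how such a heterogeneous layout is typically expressed, standard Apache HDFS tags each DataNode data directory with its storage type in hdfs-site.xml; the sketch below follows that convention with hypothetical mount points, while the exact settings for our package are described in the user guide.

    <!-- Sketch only: standard HDFS storage-type tags; paths are placeholders. -->
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>[RAM_DISK]/mnt/ramdisk/hdfs,[SSD]/mnt/ssd/hdfs,[DISK]/data/hdfs</value>
    </property>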

TestDFSIO Latency and Throughput

TestDFSIO Latency

[Figure: TestDFSIO latency on OSU-RI]

TestDFSIO Throughput

[Figure: TestDFSIO throughput on OSU-RI]

Experimental Testbed: Each node in OSU-RI has two 4-core 2.53 GHz Intel Xeon E5630 (Westmere) processors and 24 GB of main memory. The nodes have PCI Express Gen2 x16 interfaces and are equipped with Mellanox ConnectX QDR HCAs. The operating system used was Red Hat Enterprise Linux Server release 6.4 (Santiago).

These experiments are performed on 8 DataNodes with a total of 32 maps. Each DataNode has a single 1 TB HDD, a single 300 GB SSD, and a 12 GB RAM disk. The HDFS block size is set to 256 MB. Each NodeManager is configured to run 6 concurrent containers, with a minimum of 1.5 GB of memory per container. The NameNode runs on a separate node of the Hadoop cluster, and the benchmark is launched from the NameNode.
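
The block-size and container settings above map to standard Hadoop 2.x configuration keys; a minimal sketch is given below, where the per-NodeManager memory limit of 9216 MB is our reading of 6 containers x 1.5 GB rather than a value stated in the text.

    <!-- Sketch only: standard Hadoop 2.x keys; values mirror the setup described above. -->
    <property>
      <name>dfs.blocksize</name>
      <value>268435456</value>  <!-- 256 MB -->
    </property>
    <property>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>1536</value>  <!-- 1.5 GB minimum per container -->
    </property>
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>9216</value>  <!-- assumed: 6 containers x 1.5 GB per NodeManager -->
    </property>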

The RDMA-IB design improves the job execution time of TestDFSIO by up to 3.16x compared to IPoIB (32Gbps); compared to 10GigE, the improvement is 3x. For the TestDFSIO throughput experiment, the RDMA-IB design shows an improvement of 2.4x - 3.2x and 2.5x - 3.1x over IPoIB (32Gbps) and 10GigE, respectively.
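
For reference, TestDFSIO is typically driven from the stock jobclient tests jar as shown below; the file count and file size are illustrative values, not the exact parameters of the runs above.

    # Illustrative invocation (Hadoop 2.x layout); -nrFiles and -size are example values.
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
        TestDFSIO -write -nrFiles 32 -size 1GB
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
        TestDFSIO -read -nrFiles 32 -size 1GB
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
        TestDFSIO -clean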


RandomWriter and Sort

RandomWriter Execution Time

[Figure: RandomWriter execution time on OSU-RI]

Sort Execution Time

[Figure: Sort execution time on OSU-RI]

Experimental Testbed: Each node in OSU-RI has two 4-core 2.53 GHz Intel Xeon E5630 (Westmere) processors and 24 GB of main memory. The nodes have PCI Express Gen2 x16 interfaces and are equipped with Mellanox ConnectX QDR HCAs. The operating system used was Red Hat Enterprise Linux Server release 6.4 (Santiago).

These experiments are performed on 8 DataNodes with a total of 32 maps. For Sort, 14 reduces are launched. Each DataNode has a single 1 TB HDD, a single 300 GB SSD, and a 12 GB RAM disk. The HDFS block size is set to 256 MB. Each NodeManager is configured to run 6 concurrent containers, with a minimum of 1.5 GB of memory per container. The NameNode runs on a separate node of the Hadoop cluster, and the benchmark is launched from the NameNode.

The RDMA-IB design improves the job execution time of RandomWriter by 28% - 39% and of Sort by 33% - 42% compared to IPoIB (32Gbps). Compared to 10GigE, the improvement is 26% - 38% for RandomWriter and 44% - 70% for Sort.
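
For reference, both benchmarks are available in the stock examples jar; a typical invocation is sketched below, where the HDFS paths are placeholders and -r 14 matches the reduce count used above.

    # Illustrative invocation (Hadoop 2.x examples jar); HDFS paths are placeholders.
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
        randomwriter /benchmarks/rw-out
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
        sort -r 14 /benchmarks/rw-out /benchmarks/sort-out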


TeraGen and TeraSort

TeraGen Execution Time

[Figure: TeraGen execution time on OSU-RI]

TeraSort Execution Time

[Figure: TeraSort execution time on OSU-RI]

Experimental Testbed: Each node in OSU-RI has two 4-core 2.53 GHz Intel Xeon E5630 (Westmere) processors and 24 GB of main memory. The nodes have PCI Express Gen2 x16 interfaces and are equipped with Mellanox ConnectX QDR HCAs. The operating system used was Red Hat Enterprise Linux Server release 6.4 (Santiago).

These experiments are performed on 8 DataNodes with a total of 32 maps and 16 reduces. Each DataNode has a single 1 TB HDD, a single 300 GB SSD, and a 12 GB RAM disk. The HDFS block size is set to 256 MB. Each NodeManager is configured to run 6 concurrent containers, with a minimum of 1.5 GB of memory per container. The NameNode runs on a separate node of the Hadoop cluster, and the benchmark is launched from the NameNode.

The RDMA-IB design improves the job execution time of TeraGen by a maximum of 55% and of TeraSort by a maximum of 42% compared to IPoIB (32Gbps). Compared to 10GigE, the maximum benefits observed are 58% for TeraGen and 50% for TeraSort.
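
For reference, TeraGen and TeraSort are typically driven from the stock examples jar as shown below; the row count (100-byte rows) and HDFS paths are illustrative rather than the exact data size of the runs above, and the -D overrides mirror the 32-map / 16-reduce setup described earlier.

    # Illustrative invocation (Hadoop 2.x examples jar); row count and paths are examples.
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
        teragen -Dmapreduce.job.maps=32 1000000000 /benchmarks/tera-in
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
        terasort -Dmapreduce.job.reduces=16 /benchmarks/tera-in /benchmarks/tera-out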