We compare our package with default Hadoop MapReduce running over default HDFS. For detailed configuration and setup, please refer to our userguide.

TestDFSIO

TestDFSIO Throughput

dfsio ri2 thruput

Experimental Testbed: Each node in OpenPOWER Cluster has 20 cores (each core has 8 SMT threads) POWER8 8335-GTA processors at 3491 MHz and 256GB RAM. The nodes are equipped with Mellanox ConnectX-4 EDR HCAs. The operating system used is Red Hat Enterprise Linux Server release 7.2.

These experiments are performed in 5 DataNodes with a total of 60 maps. HDFS block size is kept to 256MB. Each NodeManager is configured to run with 12 concurrent containers assigning a minimum of 4GB memory per container. The NameNode runs in a different node of the Hadoop cluster..

For TestDFSIO throughput experiment, RDMA-IB design has an improvement of 2.18x - 2.26x compared to IPoIB (100Gbps).


Sort

Sort Execution Time

sort ri2 time

Experimental Testbed: Each node in OpenPOWER Cluster has 20 cores (each core has 8 SMT threads) POWER8 8335-GTA processors at 3491 MHz and 256GB RAM. The nodes are equipped with Mellanox ConnectX-4 EDR HCAs. The operating system used is Red Hat Enterprise Linux Server release 7.2.

These experiments are performed in 5 DataNodes with a total of 60 maps. HDFS block size is kept to 256MB. Each NodeManager is configured to run with 12 concurrent containers assigning a minimum of 4GB memory per container. The NameNode runs in a different node of the Hadoop cluster..

The RDMA-IB design improves the job execution time of Sort by a maximum of 55% compared to IPoIB (100Gbps).


TeraSort

TeraSort Execution Time

terasort ri2 time

Experimental Testbed: Each node in OpenPOWER Cluster has 20 cores (each core has 8 SMT threads) POWER8 8335-GTA processors at 3491 MHz and 256GB RAM. The nodes are equipped with Mellanox ConnectX-4 EDR HCAs. The operating system used is Red Hat Enterprise Linux Server release 7.2.

These experiments are performed in 5 DataNodes with a total of 60 maps. HDFS block size is kept to 256MB. Each NodeManager is configured to run with 12 concurrent containers assigning a minimum of 4GB memory per container. The NameNode runs in a different node of the Hadoop cluster..

The RDMA-IB design improves the job execution time of TeraSort by a maximum of 21% compared to IPoIB (100Gbps).