We compare our package with default Spark. For detailed configuration and setup, please refer to our userguide.

SortBy

SortBy Execution Time

sortby ri2

Experimental Testbed: Each node in OpenPOWER Cluster 1 has 20 cores (each core has 8 SMT threads) POWER8 8335-GTA processors at 3491 MHz and 256GB RAM. The nodes are equipped with Mellanox ConnectX-4 EDR HCAs. The operating system used is Red Hat Enterprise Linux Server release 7.2.

These experiments are performed in 5 DataNodes with a total of 60 maps. HDFS block size is kept to 256MB. Each NodeManager is configured to run with 12 concurrent containers assigning a minimum of 4GB memory per container. The NameNode runs in a different node of the Hadoop cluster..

For SortBy experiment, RDMA-IB design has a maximum improvement of 18% compared to IPoIB (100Gbps).


GroupBy

Groupby Execution Time

grouby ri2 time

Experimental Testbed: Each node in OpenPOWER Cluster 2 has 20 cores (each core has 8 SMT threads) POWER8 8335-GTA processors at 3491 MHz and 256GB RAM. The nodes are equipped with Mellanox ConnectX-4 EDR HCAs. The operating system used is Red Hat Enterprise Linux Server release 7.2.

These experiments are performed in 5 DataNodes with a total of 60 maps. HDFS block size is kept to 256MB. Each NodeManager is configured to run with 12 concurrent containers assigning a minimum of 4GB memory per container. The NameNode runs in a different node of the Hadoop cluster..

The RDMA-IB design improves the job execution time of GroupBy by a maximum of 11% compared to IPoIB (100Gbps).

TeraSort

TeraSort Execution Time

terasort ri2 time

Experimental Testbed: Each node in OpenPOWER Cluster 2 has 20 cores (each core has 8 SMT threads) POWER8NVL processors at 2 GHz and 64GB RAM. The nodes are equipped with Mellanox ConnectX-4 EDR HCAs. The operating system used is Red Hat Enterprise Linux Server release 7.0.

These experiments are performed in 5 DataNodes with a total of 60 maps. HDFS block size is kept to 256MB. Each NodeManager is configured to run with 12 concurrent containers assigning a minimum of 4GB memory per container. The NameNode runs in a different node of the Hadoop cluster..

The RDMA-IB design improves the job execution time of TeraSort by a maximum of 35% compared to IPoIB (100Gbps).


Sort

Sort Execution Time

ort ri2 time

Experimental Testbed: Each node in OpenPOWER Cluster 2 has 20 cores (each core has 8 SMT threads) POWER8NVL processors at 2 GHz and 64GB RAM. The nodes are equipped with Mellanox ConnectX-4 EDR HCAs. The operating system used is Red Hat Enterprise Linux Server release 7.0.

These experiments are performed in 5 DataNodes with a total of 60 maps. HDFS block size is kept to 256MB. Each NodeManager is configured to run with 12 concurrent containers assigning a minimum of 4GB memory per container. The NameNode runs in a different node of the Hadoop cluster..

The RDMA-IB design improves the job execution time of Sort by a maximum of 25% compared to IPoIB (100Gbps).