HiBench PageRank Execution Time on 32 Worker Nodes / 768 Cores

sort ri hdd

HiBench PageRank Execution Time on 64 Worker Nodes / 1536 Cores

terasort ri hdd

Experimental Testbed: Each compute node in this cluster has two twelve-core Intel Xeon E5-2680v3 processors, 128GB DDR4 DRAM, and 320GB of local SSD with CentOS operating system. The network topology in this cluster is 56Gbps FDR InfiniBand with rack-level full bisection bandwidth and 4:1 oversubscription cross-rack bandwidth.

These experiments are performed with full subscription utilization of Cores so that on 32 Worker Nodes jobs run with total of 768 maps and 768 reduces and on 64 Worker Nodes jobs run with total of 1536 maps and 1536 reduces. Spark is run in Standalone mode. Configuration used is spark_worker_memory=96GB, spark_worker_cores=24, spark_executor_instances=1. SSD is used for Spark Local and Work data.

The RDMA-IB design improves the average job execution time of HiBench PageRank on 32 Worker Nodes by 39% with 42% highest improvement and on 64 Worker Nodes by 40% with 46% highest improvement compared to IPoIB (56Gbps). The highest improvement is seen for BigData datasize for both 32 and 64 nodes.