Sort Execution Time on HDD

[Figure: Sort execution time on HDD (RI)]

TeraSort Execution Time on HDD

[Figure: TeraSort execution time on HDD (RI)]

Experimental Testbed: Each node of our testbed has two 4-core 2.53 GHz Intel Xeon E5630 (Westmere) processors and 24 GB of main memory. The nodes support 16x PCI Express Gen2 interfaces and are equipped with Mellanox ConnectX QDR HCAs with PCI Express Gen2 interfaces. The operating system is Red Hat Enterprise Linux Server release 6.4 (Santiago).

These experiments are performed on 8 DataNodes with a total of 32 maps. TeraSort runs with 16 reducers and Sort with 14 reducers. Each DataNode has a single 1 TB HDD. The HDFS block size is set to 256 MB. Each NodeManager is configured to run 6 concurrent containers, with a minimum of 1.5 GB of memory assigned per container. The NameNode runs on a separate node of the Hadoop cluster, and the benchmark is launched from the NameNode.
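The configuration above implies some simple per-node figures. The following is a minimal sketch of that arithmetic, assuming maps are spread evenly across DataNodes and that each container receives only its minimum allocation; neither assumption is stated in the setup, and the function name `per_node_layout` is hypothetical.

```python
def per_node_layout(datanodes, total_maps, containers_per_node,
                    min_container_gb, node_memory_gb):
    """Derive per-node figures from the cluster-wide configuration.

    Assumes an even spread of maps over DataNodes and that every
    container is granted exactly the minimum memory allocation.
    """
    maps_per_node = total_maps / datanodes
    min_committed_gb = containers_per_node * min_container_gb
    return {
        "maps_per_node": maps_per_node,
        "min_committed_gb": min_committed_gb,
        "uncommitted_gb": node_memory_gb - min_committed_gb,
    }

# Values taken from the HDD setup: 8 DataNodes, 32 maps,
# 6 containers of at least 1.5 GB each, 24 GB of memory per node.
hdd_layout = per_node_layout(8, 32, 6, 1.5, 24)
print(hdd_layout)
```

Under these assumptions the HDD setup works out to 4 maps per DataNode and at least 9 GB of the 24 GB per node committed to containers.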

The RDMA-IB design reduces the job execution time of Sort by 31% to 34% and of TeraSort by 11% to 19% compared to IPoIB (32 Gbps). Compared to 10GigE, the reduction is 27% to 35% for Sort and 21% for TeraSort.
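The percentages quoted here are relative changes in job execution time. A minimal sketch of one common way to compute such a figure follows, assuming the improvement is defined as the reduction in time relative to the baseline (the page does not state the exact formula); the function name `improvement_pct` and the timings are hypothetical and are not taken from the measurements.

```python
def improvement_pct(baseline_s: float, rdma_s: float) -> float:
    """Percentage reduction in execution time relative to the baseline run."""
    if baseline_s <= 0:
        raise ValueError("baseline time must be positive")
    return (baseline_s - rdma_s) / baseline_s * 100.0

# Hypothetical timings in seconds, for illustration only.
example = improvement_pct(baseline_s=200.0, rdma_s=150.0)
print(f"{example:.1f}%")  # prints "25.0%"
```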


Sort Execution Time on SSD

[Figure: Sort execution time on SSD (RI)]

TeraSort Execution Time on SSD

[Figure: TeraSort execution time on SSD (RI)]

Experimental Testbed: Each node of our testbed has two 4-core 2.53 GHz Intel Xeon E5630 (Westmere) processors and 24 GB of main memory. The nodes support 16x PCI Express Gen2 interfaces and are equipped with Mellanox ConnectX QDR HCAs with PCI Express Gen2 interfaces. The operating system is Red Hat Enterprise Linux Server release 6.4 (Santiago).

These experiments are performed on 4 DataNodes with a total of 16 maps. Sort runs with 7 reducers and TeraSort with 8 reducers. Each DataNode has a single 300 GB OCZ VeloDrive PCIe SSD. The HDFS block size is set to 256 MB. Each NodeManager is configured to run 6 concurrent containers, with a minimum of 1.5 GB of memory assigned per container. The NameNode runs on a separate node of the Hadoop cluster, and the benchmark is launched from the NameNode.

The RDMA-IB design reduces the job execution time of Sort by 21% to 28% and of TeraSort by 8% to 37% compared to IPoIB (32 Gbps).

Sort Execution Time

[Figure: Sort execution time (Stampede)]

TeraSort Execution Time

[Figure: TeraSort execution time (Stampede)]

Experimental Testbed: Each node of our testbed is a dual-socket system with two octa-core Intel Sandy Bridge (E5-2680) processors running at 2.70 GHz. Each node has 32 GB of main memory, an SE10P (B0-KNC) co-processor, and a Mellanox IB FDR MT4099 HCA. The host processors run CentOS release 6.3 (Final).

The Sort experiments are performed on 32 DataNodes with a total of 128 maps and 57 reducers. Each DataNode has a single 80 GB HDD. The HDFS block size is set to 256 MB. Each NodeManager is configured to run 6 concurrent containers, with a minimum of 1.5 GB of memory assigned per container. The NameNode runs on a separate node of the Hadoop cluster, and the benchmark is launched from the NameNode. The RDMA-IB design reduces the job execution time of Sort by 20% to 46% compared to IPoIB (56 Gbps).

The TeraSort experiments are performed on 16 DataNodes with a total of 64 maps and 32 reducers. All other configuration settings are the same as for Sort. The RDMA-IB design reduces the job execution time of TeraSort by 29% to 48% compared to IPoIB (56 Gbps).