OHB Set Micro-benchmark

[Figure: Set latency on OSU RI]

OHB Get Micro-benchmark

[Figure: Get latency on OSU RI]

OHB Mix Micro-benchmark

[Figure: Mix (90/10) latency on OSU RI]

Experimental Testbed (OSU - RI): Each node of our testbed has two 4-core 2.53 GHz Intel Xeon E5630 (Westmere) processors and 24 GB of main memory. The nodes provide x16 PCI Express Gen2 interfaces and are equipped with Mellanox ConnectX QDR HCAs (PCI Express Gen2). The operating system used was Red Hat Enterprise Linux Server release 6.4 (Santiago).

These experiments are performed with a single Memcached server running with 1GB of memory and a single Memcached client node.

In the OHB Set Micro-benchmark, the Memcached client repeatedly sets an item of a particular size on the Memcached server. The RDMA-enhanced Memcached design (both In-memory and SSD-based Hybrid) improves the latency of set operations by up to 81% over default Memcached running over IPoIB (32Gbps).
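For reference, the set-latency loop can be reproduced with the standard libmemcached client API. The sketch below is a minimal version of such a loop; the server address, item size, key naming, and iteration count are illustrative choices, not values taken from the OHB benchmark.

```c
/* Minimal sketch of a set-latency loop against a single Memcached server.
 * Build (assuming libmemcached is installed): gcc set_lat.c -lmemcached -o set_lat
 * Host/port, value size, and iteration count are illustrative only. */
#include <libmemcached/memcached.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void)
{
    const size_t value_size = 4096;   /* "item of a particular size" */
    const int iterations = 10000;

    memcached_st *memc = memcached_create(NULL);
    memcached_server_add(memc, "127.0.0.1", 11211);

    char *value = malloc(value_size);
    memset(value, 'x', value_size);

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    for (int i = 0; i < iterations; i++) {
        char key[64];
        int key_len = snprintf(key, sizeof(key), "ohb:set:%d", i);
        memcached_return_t rc = memcached_set(memc, key, (size_t)key_len,
                                              value, value_size,
                                              (time_t)0, (uint32_t)0);
        if (rc != MEMCACHED_SUCCESS)
            fprintf(stderr, "set failed: %s\n", memcached_strerror(memc, rc));
    }

    clock_gettime(CLOCK_MONOTONIC, &end);
    double elapsed_us = (end.tv_sec - start.tv_sec) * 1e6 +
                        (end.tv_nsec - start.tv_nsec) / 1e3;
    printf("average set latency: %.2f us\n", elapsed_us / iterations);

    free(value);
    memcached_free(memc);
    return 0;
}
```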

In the OHB Get Micro-benchmark, the Memcached client repeatedly gets an item of a particular size from the Memcached server. Compared to IPoIB (32Gbps), the RDMA-enhanced Memcached design (both In-memory and SSD-based Hybrid) improves the latency of get operations by up to 83%.
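The corresponding get loop only differs in the call being timed. The helper below is a sketch that assumes an already-connected handle and keys populated by a set phase like the one above; key naming and iteration count are again illustrative.

```c
/* Sketch of a get-latency measurement over pre-populated keys
 * ("ohb:set:0" .. "ohb:set:iterations-1"), given a connected handle. */
#include <libmemcached/memcached.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

double measure_get_latency_us(memcached_st *memc, int iterations)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    for (int i = 0; i < iterations; i++) {
        char key[64];
        int key_len = snprintf(key, sizeof(key), "ohb:set:%d", i);

        size_t value_len = 0;
        uint32_t flags = 0;
        memcached_return_t rc;
        char *value = memcached_get(memc, key, (size_t)key_len,
                                    &value_len, &flags, &rc);
        if (rc != MEMCACHED_SUCCESS)
            fprintf(stderr, "get failed: %s\n", memcached_strerror(memc, rc));
        free(value);   /* libmemcached returns a malloc'ed copy of the value */
    }

    clock_gettime(CLOCK_MONOTONIC, &end);
    double elapsed_us = (end.tv_sec - start.tv_sec) * 1e6 +
                        (end.tv_nsec - start.tv_nsec) / 1e3;
    return elapsed_us / iterations;
}
```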

In the OHB Mix Micro-benchmark, the Memcached client repeatedly sets and gets an item of a particular size on the Memcached server, with a read/write mix of 90/10. We can observe that the RDMA-enhanced Memcached design (both In-memory and SSD-based Hybrid) improves the average operation latency by up to 82% as compared to default Memcached over IPoIB (32Gbps).
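The 90/10 mix can be sketched by choosing between the two paths on every iteration. The helper below assumes a connected handle, a pre-populated key range, and a prepared value buffer; all of these are illustrative assumptions rather than details taken from the benchmark.

```c
/* Sketch of a 90/10 read/write mix over num_keys pre-populated keys.
 * 90% of iterations issue a get, the remaining 10% issue a set. */
#include <libmemcached/memcached.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void run_mix(memcached_st *memc, size_t num_keys,
             const char *value, size_t value_size, int iterations)
{
    for (int i = 0; i < iterations; i++) {
        char key[64];
        size_t idx = (size_t)rand() % num_keys;
        int key_len = snprintf(key, sizeof(key), "ohb:set:%zu", idx);

        if (rand() % 100 < 90) {
            /* Read path: 90% of the operations are gets. */
            size_t len = 0;
            uint32_t flags = 0;
            memcached_return_t rc;
            char *v = memcached_get(memc, key, (size_t)key_len,
                                    &len, &flags, &rc);
            free(v);
        } else {
            /* Write path: the remaining 10% are sets. */
            memcached_set(memc, key, (size_t)key_len,
                          value, value_size, (time_t)0, (uint32_t)0);
        }
    }
}
```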


OHB Hybrid Micro-benchmark with Normal Pattern

[Figure: Hybrid get latency with normal access pattern on OSU RI]

OHB Hybrid Micro-benchmark with Uniform Pattern

[Figure: Hybrid get latency with uniform access pattern on OSU RI]

OHB Hybrid Micro-benchmark Success Rate

[Figure: Success rate on OSU RI]

Experimental Testbed (OSU - RI): Each node of our testbed has two 4-core 2.53 GHz Intel Xeon E5630 (Westmere) processors and 24 GB of main memory. The nodes provide x16 PCI Express Gen2 interfaces and are equipped with Mellanox ConnectX QDR HCAs (PCI Express Gen2). The operating system used was Red Hat Enterprise Linux Server release 6.4 (Santiago).

These experiments are performed with a single Memcached server and a single Memcached client node.

In the OHB Hybrid Micro-benchmark with Normal Pattern, the Memcached client attempts to overflow the Memcached server for a particular message size (spill factor of 1.5, i.e., workload size = 1.5 * Memcached server memory), and then reads the key/value pairs at random, with some keys being queried more frequently than others. The RDMA-enhanced SSD-based Hybrid Memcached design improves get latency by 83-96% over pure In-memory RDMA-enhanced Memcached, with a Memcached server miss penalty as low as 1.5 ms.

In the OHB Hybrid Micro-benchmark with Uniform Pattern, the Memcached client attempts to overflow the Memcached server for a particular message size (spill factor of 1.5), and then reads the key/value pairs uniformly at random. The RDMA-enhanced SSD-based Hybrid Memcached design improves get latency by more than 81% over pure In-memory RDMA-enhanced Memcached, with a miss penalty of 1.5 ms.
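Both overflow experiments can be approximated with the standard blocking API: a write phase that inserts roughly spill_factor times the server's configured memory limit, followed by a timed read phase. The sketch below uses the uniform pattern; the normal pattern of the previous paragraph would only change how the key index is drawn. The server address, item size, and memory limit are illustrative assumptions.

```c
/* Sketch of the overflow experiment with a uniform read pattern.
 * Phase 1 writes ~spill_factor x the server memory limit; phase 2 times
 * gets over uniformly chosen keys. All sizes and counts are illustrative. */
#include <libmemcached/memcached.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void)
{
    const size_t value_size   = 4096;          /* item size                  */
    const size_t server_mem   = 1UL << 30;     /* assumed 1 GB memory limit  */
    const double spill_factor = 1.5;           /* workload = 1.5 x memory    */
    const size_t num_keys     = (size_t)(spill_factor * server_mem) / value_size;
    const int    get_samples  = 100000;

    memcached_st *memc = memcached_create(NULL);
    memcached_server_add(memc, "127.0.0.1", 11211);

    char *value = malloc(value_size);
    memset(value, 'x', value_size);

    /* Phase 1: overflow the server so part of the data spills out of memory. */
    for (size_t i = 0; i < num_keys; i++) {
        char key[64];
        int key_len = snprintf(key, sizeof(key), "ohb:hyb:%zu", i);
        memcached_set(memc, key, (size_t)key_len, value, value_size,
                      (time_t)0, (uint32_t)0);
    }

    /* Phase 2: read keys uniformly at random and time the gets. */
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < get_samples; i++) {
        char key[64];
        size_t idx = (size_t)rand() % num_keys;
        int key_len = snprintf(key, sizeof(key), "ohb:hyb:%zu", idx);

        size_t len = 0;
        uint32_t flags = 0;
        memcached_return_t rc;
        char *v = memcached_get(memc, key, (size_t)key_len, &len, &flags, &rc);
        free(v);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double us = (end.tv_sec - start.tv_sec) * 1e6 +
                (end.tv_nsec - start.tv_nsec) / 1e3;
    printf("average get latency: %.2f us\n", us / get_samples);

    free(value);
    memcached_free(memc);
    return 0;
}
```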

In the OHB Hybrid Micro-benchmark Success Rate test, the Memcached client attempts to overflow the Memcached server for a particular message size and measures the success rate as the number of key/value pairs successfully fetched versus the number of get attempts made, for different spill factors. Since the RDMA-enhanced SSD-based Hybrid Memcached design can hold more key/value pairs than pure In-memory RDMA-enhanced Memcached, the success rate with the hybrid design remains constant at 100% while that of the pure In-memory design degrades.
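The success rate is then simply the fraction of get attempts that fetch a value after the overflow phase. A minimal helper is sketched below; key naming and sampling are illustrative.

```c
/* Sketch of the success-rate computation after an overflow phase:
 * hits / attempts, where a hit is a get that returns a key/value pair. */
#include <libmemcached/memcached.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

double success_rate(memcached_st *memc, size_t num_keys, int attempts)
{
    int hits = 0;
    for (int i = 0; i < attempts; i++) {
        char key[64];
        size_t idx = (size_t)rand() % num_keys;
        int key_len = snprintf(key, sizeof(key), "ohb:hyb:%zu", idx);

        size_t len = 0;
        uint32_t flags = 0;
        memcached_return_t rc;
        char *v = memcached_get(memc, key, (size_t)key_len, &len, &flags, &rc);
        if (rc == MEMCACHED_SUCCESS && v != NULL)
            hits++;        /* key/value pair successfully fetched */
        free(v);           /* free(NULL) is a no-op on a miss */
    }
    return (double)hits / attempts;   /* 1.0 corresponds to 100% */
}
```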


OHB Non-Blocking Set Micro-benchmark

[Figure: Non-blocking set latency on OSU RI]

OHB Non-Blocking Get Micro-benchmark

[Figure: Non-blocking get latency on OSU RI]

Experimental Testbed (OSU - RI): Each node of our testbed has two 4-core 2.53 GHz Intel Xeon E5630 (Westmere) processors and 24 GB of main memory. The nodes provide x16 PCI Express Gen2 interfaces and are equipped with Mellanox ConnectX QDR HCAs (PCI Express Gen2). The operating system used was Red Hat Enterprise Linux Server release 6.4 (Santiago).

These experiments are performed with a single Memcached server running with 4GB of memory and a single Memcached client node.

In the OHB Non-Blocking Set Micro-benchmark, the Memcached client issues set requests using memcached_iset and memcached_bset APIs and monitors them asynchronously using memcached_test or memcached_wait progress APIs with a request threshold of 32 ongoing requests. For varying key/value pair sizes, the iset/bset RDMA-enabled non-blocking APIs can improve overall set latency by over 57% over the default blocking set API while using the RDMA-enhanced Memcached design.

In the OHB Non-Blocking Get Micro-benchmark, the Memcached client issues get requests using memcached_iget and memcached_bget APIs and monitors them asynchronously using memcached_test or memcached_wait progress APIs with a request threshold of 32 ongoing requests. For varying key/value pair sizes, the iget/bget RDMA-enabled non-blocking APIs can improve overall get latency by over 69% over the default blocking get API while using the RDMA-enhanced Memcached design.
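The non-blocking calls are extensions provided by the RDMA-enhanced libmemcached shipped with RDMA-Memcached, and their exact prototypes are not reproduced here. The sketch below therefore only illustrates the issue-then-progress pattern with a 32-request window; the signatures of memcached_iset, memcached_test, and memcached_wait and the request-handle type are assumptions for illustration, and the real declarations should be taken from the library's headers. The get side (memcached_iget/memcached_bget) follows the same pattern.

```c
/* Illustrative sketch of the non-blocking issue/progress pattern.
 * NOTE: the request-handle type and the prototypes of memcached_iset,
 * memcached_test, and memcached_wait below are ASSUMED for illustration;
 * consult the RDMA-Memcached libmemcached headers for the real API. */
#include <libmemcached/memcached.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define REQUEST_THRESHOLD 32   /* at most 32 ongoing requests */

/* Hypothetical handle type and prototypes (placeholders, not the real API). */
typedef struct memcached_request_st memcached_request_st;
memcached_return_t memcached_iset(memcached_st *memc,
                                  const char *key, size_t key_length,
                                  const char *value, size_t value_length,
                                  time_t expiration, uint32_t flags,
                                  memcached_request_st **req);
memcached_return_t memcached_test(memcached_st *memc,
                                  memcached_request_st *req, int *done);
memcached_return_t memcached_wait(memcached_st *memc,
                                  memcached_request_st *req);

void nonblocking_set_phase(memcached_st *memc, const char *value,
                           size_t value_size, int total_requests)
{
    memcached_request_st *inflight[REQUEST_THRESHOLD];
    int outstanding = 0;

    for (int i = 0; i < total_requests; i++) {
        char key[64];
        int key_len = snprintf(key, sizeof(key), "ohb:nb:%d", i);

        /* Issue the set without blocking for the response; the payload
         * buffer is assumed to stay valid until the request completes. */
        memcached_iset(memc, key, (size_t)key_len, value, value_size,
                       (time_t)0, (uint32_t)0, &inflight[outstanding]);
        outstanding++;

        /* Once the window of ongoing requests is full, make progress
         * (memcached_test could poll instead of blocking in memcached_wait). */
        if (outstanding == REQUEST_THRESHOLD) {
            for (int r = 0; r < outstanding; r++)
                memcached_wait(memc, inflight[r]);
            outstanding = 0;
        }
    }

    /* Drain any remaining requests before stopping the timer. */
    for (int r = 0; r < outstanding; r++)
        memcached_wait(memc, inflight[r]);
}
```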

OHB Set Micro-benchmark

[Figure: Set latency on OSU RI2]

OHB Get Micro-benchmark

[Figure: Get latency on OSU RI2]

OHB Mix Micro-benchmark

[Figure: Mix (90/10) latency on OSU RI2]

Experimental Testbed (OSU - RI2 Cluster): Each storage node is provisioned with two fourteen-core Intel Broadwell (E5-2680 v4) processors, 512 GB of memory, and a Mellanox IB EDR HCA.

These experiments are performed with a single Memcached server running with 1GB of memory and a single Memcached client node.

In the OHB Set Micro-benchmark, the Memcached client repeatedly sets an item of a particular size on the Memcached server. The RDMA-enhanced Memcached design (both In-memory and SSD-based Hybrid) improves the latency of set operations by up to 5.3X over default Memcached running over IPoIB (100Gbps).

In the OHB Get Micro-benchmark, the Memcached client repeatedly gets an item of a particular size from the Memcached server. Compared to IPoIB (100Gbps), the RDMA-enhanced Memcached design (both In-memory and SSD-based Hybrid) improves the latency of get operations by more than 5.4X.

In the OHB Mix Micro-benchmark, the Memcached client repeatedly sets and gets an item of a particular size on the Memcached server, with a read/write mix of 90/10. We can observe that the RDMA-enhanced Memcached design (both In-memory and SSD-based Hybrid) improves the average operation latency by more than 5.3X as compared to default Memcached over IPoIB (100Gbps).


OHB Hybrid Micro-benchmark with Uniform Pattern

[Figure: Hybrid get latency with uniform access pattern on OSU RI2]

OHB Non-Blocking Set Micro-benchmark

[Figure: Non-blocking set latency on OSU RI2]

OHB Non-Blocking Get Micro-benchmark

[Figure: Non-blocking get latency on OSU RI2]

Experimental Testbed (OSU - RI2 Cluster): Each storage node is provisioned with two fourteen-core Intel Broadwell (E5-2680 v4) processors, 512 GB of memory, and a Mellanox IB EDR HCA.

In the OHB Hybrid Micro-benchmark with Uniform Pattern, the Memcached client attempts to overflow the Memcached server for a particular message size (spill factor of 1.5), and then reads the key/value pairs uniformly at random. This experiment is performed with a single Memcached server running with 10GB of memory and a single Memcached client node. The RDMA-enhanced SSD-based Hybrid Memcached design improves get latency by up to 93% over pure In-memory RDMA-enhanced Memcached, with a miss penalty of 1.5 ms.

In the OHB Non-Blocking Set Micro-benchmark, the Memcached client issues set requests using the memcached_iset and memcached_bset APIs and monitors them asynchronously using the memcached_test or memcached_wait progress APIs with a request threshold of 32 ongoing requests. This experiment uses a single Memcached server running with 20GB of memory and a single Memcached client node. For varying key/value pair sizes, the iset/bset RDMA-enabled non-blocking APIs can improve overall set latency by over 59% over the default blocking set API while using the RDMA-enhanced Memcached design.

In the OHB Non-Blocking Get Micro-benchmark, the Memcached client issues get requests using the memcached_iget and memcached_bget APIs and monitors them asynchronously using the memcached_test or memcached_wait progress APIs with a request threshold of 32 ongoing requests. This experiment uses a single Memcached server running with 20GB of memory and a single Memcached client node. For varying key/value pair sizes, the iget/bget RDMA-enabled non-blocking APIs can improve overall get latency by over 73% over the default blocking get API while using the RDMA-enhanced Memcached design.

OHB Set Micro-benchmark

[Figure: Set latency on SDSC Comet]

OHB Get Micro-benchmark

[Figure: Get latency on SDSC Comet]

OHB Mix Micro-benchmark

[Figure: Mix (90/10) latency on SDSC Comet]

Experimental Testbed (SDSC - Comet): Each compute node in this cluster has two twelve-core Intel Xeon E5-2680 v3 (Haswell) processors, 128GB of DDR4 DRAM, and a 320GB local SATA SSD, and runs the CentOS operating system. The cluster interconnect is 56Gbps FDR InfiniBand with rack-level full bisection bandwidth and 4:1 over-subscription of cross-rack bandwidth.

These experiments are performed with a single Memcached server running with 1GB of memory and a single Memcached client node.

In the OHB Set Micro-benchmark, the Memcached client repeatedly sets an item of a particular size on the Memcached server. The RDMA-enhanced Memcached design (both In-memory and SSD-based Hybrid) improves the latency of set operations by up to 70% over default Memcached running over IPoIB (56Gbps).

In the OHB Get Micro-benchmark, the Memcached client repeatedly gets an item of a particular size from the Memcached server. Compared to IPoIB (56Gbps), the RDMA-enhanced Memcached design (both In-memory and SSD-based Hybrid) improves the latency of get operations by up to 69%.

In the OHB Mix Micro-benchmark, the Memcached client repeatedly sets and gets an item of a particular size on the Memcached server, with a read/write mix of 90/10. We can observe that the RDMA-enhanced Memcached design (both In-memory and SSD-based Hybrid) improves the average operation latency by up to 71% as compared to default Memcached over IPoIB (56Gbps).


OHB Set Micro-benchmark

[Figure: Set latency on TACC Stampede]

OHB Get Micro-benchmark

[Figure: Get latency on TACC Stampede]

OHB Mix Micro-benchmark

[Figure: Mix (90/10) latency on TACC Stampede]

Experimental Testbed (TACC - Stampede): Each node of our testbed is a dual-socket system with two octa-core Intel Sandy Bridge (E5-2680) processors running at 2.70 GHz. Each node has 32GB of main memory, an SE10P (B0-KNC) co-processor, and a Mellanox IB FDR MT4099 HCA. The host processors run CentOS release 6.3 (Final).

These experiments are performed with a single Memcached server running with 1GB of memory and a single Memcached client node. Since the nodes on this cluster are not equipped with SSDs, we run RDMA-enhanced Memcached in In-memory mode only.

In the OHB Set Micro-benchmark, the Memcached client repeatedly sets an item of a particular size on the Memcached server. The RDMA-enhanced Memcached design improves the latency of set operations by up to 86% over default Memcached running over IPoIB (56Gbps).

In the OHB Get Micro-benchmark, the Memcached client repeatedly gets an item of a particular size from the Memcached server. Compared to IPoIB (56Gbps), the RDMA-enhanced Memcached design improves the latency of get operations by up to 87%.

In the OHB Mix Micro-benchmark, the Memcached client repeatedly sets and gets an item of a particular size on the Memcached server, with a read/write mix of 90/10. We can observe that the RDMA-enhanced Memcached design improves the average operation latency by up to 87% as compared to default Memcached over IPoIB (56Gbps).