Design and Optimization of GPU-Aware MPI Allreduce Using Direct Sendrecv Communication. C. Chen, J. Yao, H. Subramoni, D. Panda. 54th International Conference on Parallel Processing, Sep 2025.