### Question

1. Is there a test one can run to compare the device-API allreduce (e.g. example 06) against the host-API allreduce?
2. Is there an example kernel that uses a device-API collective (e.g. allreduce + GEMM)?