Commit ff7effb

arttianezhu authored and meta-codesync[bot] committed
error handling on mcclComm::init
Summary: Arguably, the capture scope is very wide. The reason is that some of the errors should be crashing quickly, while others should be retried. I found a really small number of cases, where `std::runtime_error` is used in place of something more suitable, such as an `invalid_argument`. We will follow up to clean these up, so that non-recoverable items would fail fast and recoverable ones are handled by Mccl. But these findings are harmless for initial enablement.

Reviewed By: dboyda

Differential Revision: D85824307

fbshipit-source-id: 1c5b4e647ccb3ee7f2eaad927071dd8f40411b89
1 parent 9917f03 commit ff7effb
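
The fail-fast versus retry split described in the summary can be illustrated with a minimal sketch. This is not Mccl code: `initWithRetry` and its parameters are hypothetical names chosen for illustration, and the sketch assumes that recoverable failures surface as `std::runtime_error`. Because `std::invalid_argument` derives from `std::logic_error` rather than `std::runtime_error`, it is not caught by the retry handler and propagates on the first attempt.

```cpp
#include <functional>
#include <stdexcept>

// Hypothetical helper, not part of Mccl. Retries fn when it throws
// std::runtime_error, up to maxAttempts times. Any other exception
// (for example std::invalid_argument, which derives from
// std::logic_error) is not caught here and propagates immediately.
// Returns the 1-based attempt number on which fn succeeded.
inline int initWithRetry(const std::function<void()>& fn, int maxAttempts) {
  for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
    try {
      fn();
      return attempt;
    } catch (const std::runtime_error&) {
      if (attempt == maxAttempts) {
        throw;  // retries exhausted: rethrow the last error
      }
    }
  }
  // Reached only when maxAttempts <= 0, so the loop body never ran.
  throw std::invalid_argument("maxAttempts must be positive");
}
```

Under these assumptions, replacing a misused `std::runtime_error` with `std::invalid_argument` at the throw site would be enough to move that case from the retried path to the fail-fast path.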

2 files changed: +11 -1 lines changed

comms/ctran/mapper/tests/CtranMapperTcpdmUT.cc

Lines changed: 2 additions & 0 deletions
@@ -35,6 +35,8 @@ class CtranMapperTcpdmTest : public ::testing::Test {
       commRAII.reset();
     } catch (const std::runtime_error& e) {
       GTEST_SKIP() << "TCPDM backend not enabled. Skip test";
+    } catch (const ctran::utils::Exception& e) {
+      GTEST_SKIP() << "TCPDM backend not enabled. Skip test";
     }
   }
   void TearDown() override {

comms/ctran/mapper/tests/CtranMapperUT.cc

Lines changed: 9 additions & 1 deletion
@@ -112,7 +112,15 @@ TEST(CtranMapperUT, EnableBackendWithMCCLBackendOverride) {
 TEST(CtranMapperUT, EnableBackendThroughCVARsWithTCPandIB) {
   setenv("NCCL_CTRAN_BACKENDS", "nvl, ib, socket, tcpdm", 1);
   ncclCvarInit();
-  EXPECT_THROW(createDummyCtranComm(), std::runtime_error);
+  std::optional<std::exception> ex;
+  try {
+    createDummyCtranComm();
+  } catch (const std::runtime_error& e) {
+    ex = e;
+  } catch (const ctran::utils::Exception& e) {
+    ex = e;
+  }
+  ASSERT_TRUE(ex.has_value());
 }
 
 TEST(CtranMapperUT, BackendEnum) {
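
The "either exception type is acceptable" check in the hunk above can also be written without copying the exception object. The following is a minimal sketch, not code from this commit: `sketch::Exception` is a hypothetical stand-in for `ctran::utils::Exception` (whose definition is not shown here), and `classifyThrow` and `Thrown` are illustrative names. It records which handler fired instead of storing a `std::exception` copy, which avoids slicing the derived exception.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical stand-in for ctran::utils::Exception; the real class is
// not shown in this commit, so its base class and interface are assumed.
class Exception : public std::exception {
 public:
  explicit Exception(std::string msg) : msg_(std::move(msg)) {}
  const char* what() const noexcept override { return msg_.c_str(); }

 private:
  std::string msg_;
};

// Which of the two accepted exception types, if any, was thrown.
enum class Thrown { None, RuntimeError, CtranException };

// Runs fn and reports which accepted exception type it threw.
// Exceptions of any other type are not caught and propagate.
inline Thrown classifyThrow(const std::function<void()>& fn) {
  try {
    fn();
  } catch (const std::runtime_error&) {
    return Thrown::RuntimeError;
  } catch (const Exception&) {
    return Thrown::CtranException;
  }
  return Thrown::None;
}

}  // namespace sketch
```

In a test like the one above, the final assertion would then compare the result against `Thrown::None`, for example `ASSERT_NE(sketch::classifyThrow([] { createDummyCtranComm(); }), sketch::Thrown::None);`, assuming `createDummyCtranComm` is callable from the test body as in the original.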