Dear author, thank you for your great work!
I'm interested in the method and have a couple of questions.
In the framework figure (Figure 2), the point encoder backbone consists of PointBert and Point MAE modules, which group the points into patches. However, semantic and instance segmentation require point-wise predictions. How are the point-wise predictions obtained from the patch-level output?
My second question: when performing contrastive learning, how are the depth image features aligned with the point features?
Thank you very much.