This Viam module provides a vision service model built on TorchVision's new multi-weight support API.
For a given model architecture (e.g. ResNet50), several sets of pretrained weights may be available, and each set comes with its own preprocessing and label metadata.
First, create a machine in Viam.
Then navigate to the CONFIGURE tab of your machine in the Viam app, add a module from the Viam Registry following the Viam instructions for registry modules, and select the `viam:vision:torchvision` model from the `torchvision` module.
Finally, add `vision / torchvision` to your machine.
Depending on the type of model configured, the module implements:

- For detectors: `GetDetections()` and `GetDetectionsFromCamera()`
- For classifiers: `GetClassifications()` and `GetClassificationsFromCamera()`
To configure the `torchvision` model, use the following template:

```json
{
  "model_name": <string>,
  "labels_confidences": {
    <label1>: <float>,
    <label2>: <float>
  },
  "default_minimum_confidence": <float>
}
```

The only required attribute to configure your `torchvision` vision service is `model_name`:
| Name | Type | Inclusion | Default | Description |
|---|---|---|---|---|
| `model_name` | string | Required | | Vision model name, as expected by `get_model()` in the torchvision multi-weight API. |
The following attributes are optional:

| Name | Type | Inclusion | Default | Description |
|---|---|---|---|---|
| `weights` | string | Optional | `DEFAULT` | Weights name, as expected by `get_model()` in the torchvision multi-weight API. |
| `default_minimum_confidence` | float | Optional | | Default minimum confidence used to filter all labels that are not specified in `labels_confidences`. |
| `labels_confidences` | dict[str, float] | Optional | | Minimum confidence thresholds for specific labels, for example `{"grasshopper": 0.5, "cricket": 0.45}`. If a label's threshold is lower than `default_minimum_confidence`, it overwrites the default for that label. If `labels_confidences` is left blank, no filtering on labels is applied. |
| `use_weight_transform` | bool | Optional | `True` | Loads the preprocessing transform from the weights metadata. |
| `input_size` | List[int] | Optional | None | Size to resize the image to. Overrides the resize from the weights metadata. |
| `mean_rgb` | [float, float, float] | Optional | `[0, 0, 0]` | Mean values for normalization, in RGB order. |
| `std_rgb` | [float, float, float] | Optional | `[1, 1, 1]` | Standard deviation values for normalization, in RGB order. |
| `swap_r_and_b` | bool | Optional | `False` | If `True`, swaps the R and B channels of the input image. Use this if the images passed to the model are in OpenCV (BGR) format. |
| `channel_last` | bool | Optional | `False` | If `True`, converts the image tensor to channel-last format. |
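The interplay between `labels_confidences` and `default_minimum_confidence` is easiest to see on concrete scores. The sketch below is one plausible reading of the table, not the module's actual code: a label listed in `labels_confidences` is checked against its own threshold, any other label against `default_minimum_confidence` if one is set, and a prediction is kept when its score is at least the threshold (the `>=` comparison is an assumption). `Prediction` and `filter_predictions` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Prediction:
    """Hypothetical stand-in for one detection or classification result."""
    label: str
    confidence: float


def filter_predictions(
    predictions: List[Prediction],
    labels_confidences: Optional[Dict[str, float]] = None,
    default_minimum_confidence: Optional[float] = None,
) -> List[Prediction]:
    """Keep predictions whose confidence meets the threshold for their label (assumed semantics)."""
    labels_confidences = labels_confidences or {}
    kept = []
    for pred in predictions:
        if pred.label in labels_confidences:
            # A per-label threshold takes precedence over the default, even when lower.
            threshold = labels_confidences[pred.label]
        elif default_minimum_confidence is not None:
            threshold = default_minimum_confidence
        else:
            # No threshold applies to this label, so it is never filtered out.
            threshold = 0.0
        if pred.confidence >= threshold:
            kept.append(pred)
    return kept


preds = [
    Prediction("grasshopper", 0.55),
    Prediction("cricket", 0.40),
    Prediction("beetle", 0.35),
    Prediction("beetle", 0.20),
]
# Thresholds match the example config: cricket needs 0.45, unlisted labels need 0.3.
kept = filter_predictions(preds, {"grasshopper": 0.5, "cricket": 0.45}, 0.3)
print([(p.label, p.confidence) for p in kept])
```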
The preprocessing pipeline is built from these attributes:

- If the weights metadata contains a transform and `use_weight_transform` is `True`, `weights_transform` is added to the pipeline.
- If `input_size` is provided, the image is resized to that size using `v2.Resize()`.
- If both mean and standard deviation values are provided (`mean_rgb` and `std_rgb`), the image is normalized using `v2.Normalize()` with those values.
- If `swap_r_and_b` is `True`, the first and last channels are swapped.
- If `channel_last` is `True`, the channel dimension is moved to the last position: (C, H, W) -> (H, W, C).
The following example JSON config includes these resources:

- the TorchVision module
- a modular resource (the TorchVision vision service)
- a webcam camera
- a transform camera
```json
{
  "modules": [
    {
      "executable_path": "/path/to/run.sh",
      "name": "mytorchvisionmodule",
      "type": "local"
    }
  ],
  "services": [
    {
      "attributes": {
        "model_name": "fasterrcnn_mobilenet_v3_large_320_fpn",
        "labels_confidences": {
          "grasshopper": 0.5,
          "cricket": 0.45
        },
        "default_minimum_confidence": 0.3
      },
      "name": "detector-module",
      "type": "vision",
      "namespace": "rdk",
      "model": "viam:vision:torchvision"
    }
  ],
  "components": [
    {
      "namespace": "rdk",
      "attributes": {
        "video_path": "video0"
      },
      "depends_on": [],
      "name": "cam",
      "model": "webcam",
      "type": "camera"
    },
    {
      "model": "transform",
      "type": "camera",
      "namespace": "rdk",
      "attributes": {
        "source": "cam",
        "pipeline": [
          {
            "attributes": {
              "detector_name": "detector-module",
              "confidence_threshold": 0.5
            },
            "type": "detections"
          }
        ]
      },
      "depends_on": [],
      "name": "detections"
    }
  ]
}
```
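Because the transform camera refers to the vision service by name, a typo in `detector_name` or a missing `model_name` only surfaces once the machine runs. The hypothetical `check_config` helper below, which is not part of this module or of Viam, parses a config like the one above with Python's standard `json` module and reports such problems. Treating thresholds outside [0, 1] as errors is an assumption based on confidences being probabilities.

```python
import json
from typing import List


def check_config(config: dict) -> List[str]:
    """Hypothetical sanity check for torchvision vision services and detection transform cameras."""
    problems: List[str] = []
    services = {svc["name"]: svc for svc in config.get("services", [])}

    for name, svc in services.items():
        if svc.get("model") != "viam:vision:torchvision":
            continue
        attrs = svc.get("attributes", {})
        if "model_name" not in attrs:
            problems.append(f"{name}: missing required attribute model_name")
        thresholds = dict(attrs.get("labels_confidences", {}))
        if "default_minimum_confidence" in attrs:
            thresholds["default_minimum_confidence"] = attrs["default_minimum_confidence"]
        for key, value in thresholds.items():
            if not 0.0 <= value <= 1.0:
                problems.append(f"{name}: threshold for {key!r} is outside [0, 1]")

    for comp in config.get("components", []):
        if comp.get("model") != "transform":
            continue
        for step in comp.get("attributes", {}).get("pipeline", []):
            detector = step.get("attributes", {}).get("detector_name")
            if step.get("type") == "detections" and detector not in services:
                problems.append(f"{comp['name']}: unknown detector_name {detector!r}")
    return problems


example = json.loads("""
{
  "services": [{"name": "detector-module", "type": "vision", "model": "viam:vision:torchvision",
                "attributes": {"model_name": "fasterrcnn_mobilenet_v3_large_320_fpn",
                               "labels_confidences": {"grasshopper": 0.5, "cricket": 0.45},
                               "default_minimum_confidence": 0.3}}],
  "components": [{"name": "detections", "type": "camera", "model": "transform",
                  "attributes": {"source": "cam", "pipeline": [
                    {"type": "detections",
                     "attributes": {"detector_name": "detector-module", "confidence_threshold": 0.5}}]}}]
}
""")
print(check_config(example) or "no problems found")
```

In this example the transform camera also sets `confidence_threshold` to 0.5. If, as the name suggests, it drops detections scored below that value, `cricket` detections scored between 0.45 and 0.5 would pass the vision service's filter but still not be drawn by the camera.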