Description
I manually downloaded `lisa_plus_7b` and `clip-vit-large-patch14` from Hugging Face and put them in my folder. Then I ran `python chat_instance.py --precision='bf16'` on a single Tesla V100 GPU (32 GB memory).
When I asked in the following way, no camera lenses were segmented; I only got the original image back:
```
Please input your prompt: Please describe the two cameras. + DEFAULT_INST_CONV_SEG
Please input the image path: ./imgs/camera_lens.jpg
text_output: A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. USER: Please describe the two cameras. + DEFAULT_INST_CONV_SEG ASSISTANT: Sure, the image depicts two different sized cameras placed on a surface. One of the cameras is larger and has a front element attached to it, which appears to be a lens, and the other camera is smaller in size. The larger camera has a front element that is a lens as well, and it is placed on top of the smaller camera. The smaller camera has a front facing lens and is positioned directly below the larger lens in the image.
len(pred_masks): 1
[x.shape for x in pred_masks]: [torch.Size([0, 1620, 2160])]
./lisa_output/camera_lens_masked_img.jpg has been saved.
```
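A note on the log above: the mask tensor has shape `[0, 1620, 2160]`, i.e. zero predicted masks, which is why the saved "masked" image is just the original. A minimal sketch of the check (the `has_masks` helper is my own illustration, not part of LISA):

```python
# Hypothetical helper: a pred_masks entry whose leading dimension is 0
# contains no instance masks, so there is nothing to overlay and the
# saved image is identical to the input image.
def has_masks(mask_shape):
    # mask_shape mirrors torch.Size([num_masks, height, width])
    return mask_shape[0] > 0

print(has_masks((0, 1620, 2160)))  # False: no mask was predicted
print(has_masks((3, 1200, 900)))   # True: three instance masks
```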
Here is another example; there are some fruits in the image:
```
Please input your prompt: What kind of fruits in this image? Please describe and segment them to me one by one?
Please input the image path: ./imgs/fruits.jpg
text_output: A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. USER: What kind of fruits in this image? Please describe and segment them to me one by one? ASSISTANT: Sure, the segmentation results are:- Apples- Pears- Grapes- Plums (or dark colored fruit, it's difficult to determine the exact type from the image)- Kiwis-[ [SEG] ]
len(pred_masks): 1
[x.shape for x in pred_masks]: [torch.Size([1, 1200, 900])]
./lisa_output/fruits_mask_0.jpg has been saved.
./lisa_output/fruits_masked_img.jpg has been saved.
```
I still cannot get any segmentation results.
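One way to sanity-check this behaviour is to count the `[SEG]` tokens in `text_output`: each `[SEG]` should correspond to one predicted mask, and in the fruits example the model emitted only a single (oddly bracketed) `[SEG]`, matching `torch.Size([1, 1200, 900])`. A small sketch (the `count_seg_tokens` helper is my own, not part of the repo):

```python
# Count [SEG] tokens in a model response; each token should yield one mask
# in the corresponding pred_masks tensor.
def count_seg_tokens(text_output: str) -> int:
    return text_output.count("[SEG]")

fruits_answer = "... Kiwis-[ [SEG] ]"
instance_answer = "Sure, the segmentation results are [SEG] , [SEG] , and [SEG] ."

print(count_seg_tokens(fruits_answer))    # 1 -> one mask, torch.Size([1, 1200, 900])
print(count_seg_tokens(instance_answer))  # 3 -> three masks, torch.Size([3, 1200, 900])
```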
However, if I ask in the following way, I do get segmentation results, but the quality is very low:
```
Please input your prompt: Please segment the fruit in the image one by one. + DEFAULT_INSTANT_SEG
Please input the image path: ./imgs/fruits.jpg
text_output: A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. USER: Please segment the fruit in the image one by one. + DEFAULT_INSTANT_SEG ASSISTANT: Sure, the segmentation results are [SEG] , [SEG] , and [SEG] .
len(pred_masks): 1
[x.shape for x in pred_masks]: [torch.Size([3, 1200, 900])]
./lisa_output/fruits_mask_0.jpg has been saved.
./lisa_output/fruits_mask_1.jpg has been saved.
./lisa_output/fruits_mask_2.jpg has been saved.
./lisa_output/fruits_masked_img.jpg has been saved.
```
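For reference, the three per-instance masks saved above can be composited into a single label map along these lines (a toy sketch with 2x2 boolean masks; LISA's own visualization code may differ, and here later masks overwrite earlier ones on overlap):

```python
# Toy sketch: merge boolean instance masks into one label map where
# value k+1 marks pixels covered by mask k (later masks win on overlap).
def composite(masks, height, width):
    label_map = [[0] * width for _ in range(height)]
    for k, mask in enumerate(masks):
        for i in range(height):
            for j in range(width):
                if mask[i][j]:
                    label_map[i][j] = k + 1
    return label_map

m0 = [[1, 0], [0, 0]]  # instance 0
m1 = [[0, 1], [0, 0]]  # instance 1
m2 = [[0, 0], [1, 1]]  # instance 2
print(composite([m0, m1, m2], 2, 2))  # [[1, 2], [3, 3]]
```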
**Is there something I am potentially missing? I would really appreciate any reply. Thanks! Also, I only installed the packages necessary for inference, without the flash-attn module; is flash-attn required for inference?**