@BritishWerewolf
Contributor

Prerequisites

These PRs are required because the processors lean heavily on the new features.


Add support for the U-2-Net architecture, a mask generation model that is useful for background removal.
Both U-2-Net and ISNet can be used with these processors.

Below is an example of what the model will achieve when using the BritishWerewolf/U-2-Netp model.

| Model | Original | Masked |
| --- | --- | --- |
| U-2-Netp | A photo of a Loungefly backpack with a Moana design on it. The background is a single shade of white. | The same photo of the Loungefly backpack, however this time the white background has been removed and become transparent. The masking isn't perfect, and there is still a white halo around the image. |
| IS-Net | A photo of a Loungefly backpack with a Moana design on it. The background is a single shade of white. | The same photo of the Loungefly backpack, however this time the white background has been removed and become transparent. The masking isn't perfect, though better than U-2-Netp, and there is still a white halo around the image. |

If you would like to run this code, you can do so with any model from the BritishWerewolf/U-2-Net collection.

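import { AutoModel, AutoProcessor, RawImage } from '@huggingface/transformers';

// The model used for the example above; other models from the collection should also work.
const modelName = 'BritishWerewolf/U-2-Netp';

// Create the processor.
const processor = await AutoProcessor.from_pretrained(modelName);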

// Create the model.
const model = await AutoModel.from_pretrained(modelName, {
	dtype: 'fp32',
});

// Process the image.
const image = await RawImage.read('https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png');
const processed_image = await processor(image);

// Run the model.
const output = await model({
	input: processed_image.pixel_values,
});

// Retrieve the mask, and scale it up to the size of the original image.
const mask = await RawImage.fromTensor(output.mask)
	.resize(image.width, image.height);

// Apply the mask to the image as its alpha channel, then wait for the file to be written.
const maskedImage = await image.putAlpha(mask);
await maskedImage.save('masked_image.png');
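
For readers wondering what the final masking step amounts to, here is a minimal sketch on raw pixel buffers. It is a conceptual illustration only, not how `putAlpha` is implemented in the library; the function name `applyAlphaMask` and the buffer layout (interleaved RGBA, one mask value per pixel in the range 0 to 255) are assumptions made for this example.

```javascript
// Conceptual sketch only: NOT the RawImage.putAlpha implementation.
// rgba: Uint8ClampedArray of length width * height * 4 (interleaved RGBA).
// mask: array-like of length width * height, values 0 (transparent) to 255 (opaque).
function applyAlphaMask(rgba, mask) {
  if (rgba.length !== mask.length * 4) {
    throw new Error('mask must have exactly one value per pixel');
  }
  // Copy the input so the original pixels are left untouched.
  const out = new Uint8ClampedArray(rgba);
  for (let i = 0; i < mask.length; i++) {
    // Overwrite only the alpha byte of each pixel; the colour bytes stay as they were.
    out[i * 4 + 3] = mask[i];
  }
  return out;
}

// Two pixels: the first is kept opaque, the second becomes fully transparent.
const pixels = new Uint8ClampedArray([255, 0, 0, 255, 0, 255, 0, 255]);
const masked = applyAlphaMask(pixels, new Uint8Array([255, 0]));
console.log(Array.from(masked)); // [255, 0, 0, 255, 0, 255, 0, 0]
```

Because only the alpha byte changes, partially transparent edge pixels keep their original colour, which is one plausible reason a white halo remains around the subject in the examples above.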

…gmentation.

This supports both the U-2-Net and ISNet models.
@xenova
Collaborator

xenova commented Apr 30, 2025

Hi @BritishWerewolf 👋 Thanks for the PR!

I would imagine that the additional pre-processing needed introduces quite a large latency to model inference, especially because the necessary operations are implemented in JS as opposed to WASM/WebGPU. Is that correct?

Do you see any benefits of a model like this over a model listed in the recent background-removal pipeline update? #1216
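
One way to put numbers on the latency question would be to time the pre-processing and inference steps separately. The helper below is a minimal sketch for that purpose; `timeStep` is a hypothetical name introduced here, not part of Transformers.js, and it relies only on the standard `performance.now()` timer.

```javascript
// Hypothetical helper (not part of any library): runs an async step,
// logs how long it took, and returns both the result and the elapsed time.
async function timeStep(label, fn) {
  const start = performance.now();
  const result = await fn();
  const ms = performance.now() - start;
  console.log(`${label}: ${ms.toFixed(1)} ms`);
  return { result, ms };
}

// Possible usage with the snippet from the PR description
// (processor, model, and image are the variables defined there):
//
//   const { result: processed_image } =
//     await timeStep('preprocess', () => processor(image));
//   const { result: output } =
//     await timeStep('inference', () => model({ input: processed_image.pixel_values }));
```

The first call typically includes one-off warm-up costs, so averaging over several runs would give a fairer comparison.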

@BritishWerewolf
Contributor Author

Hey @xenova, I started work on this before the pipeline, but kept this PR as draft because I hoped to fix it down the line.
My current laptop became too cumbersome to work with: each iteration started to take a few minutes, and it added up.

If you’re happy to leave this open as draft, then I will have a look into it when I buy a new laptop; but if you want to clean up the PRs and close this I completely understand (and this may be the better approach, I can reopen in future if needed).

Thanks
