Why does "film_clap_cond1" CLAP use "embed_mode": "text" instead of "audio"?

Dear Author, thank you for your significant contribution to audio generation. I have the following questions and would appreciate your answers.
In your source code, "get_basic_config()" in utils.py defines:
"film_clap_cond1": {
    "cond_stage_key": "text",
    "conditioning_key": "film",
    "target": "audioldm2.latent_diffusion.modules.encoders.modules.CLAPAudioEmbeddingClassifierFreev2",
    "params": {
        "sampling_rate": 48000,
        "embed_mode": "text",
        "amodel": "HTSAT-base",
    },
},
**HTSAT-base is an audio encoder, so why is "embed_mode" set to "text" instead of "audio"? Can I replace it with CLAP's audio encoder, as follows?**
"film_clap_cond1": {
    "cond_stage_key": "audio",
    "conditioning_key": "film",
    "target": "audioldm2.latent_diffusion.modules.encoders.modules.CLAPAudioEmbeddingClassifierFreev2",
    "params": {
        "sampling_rate": 48000,
        "embed_mode": "audio",
        "amodel": "HTSAT-base",
    },
},
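
To make the proposed change concrete, here is a minimal sketch of the edit I have in mind. It assumes the conditioning entry is a plain nested dict, as quoted above. `to_audio_conditioning` is a hypothetical helper of my own, not part of audioldm2, and whether the model actually works with the modified entry is exactly what I am asking.

```python
import copy

# Conditioning entry as quoted from get_basic_config() above.
film_clap_cond1 = {
    "cond_stage_key": "text",
    "conditioning_key": "film",
    "target": "audioldm2.latent_diffusion.modules.encoders.modules.CLAPAudioEmbeddingClassifierFreev2",
    "params": {
        "sampling_rate": 48000,
        "embed_mode": "text",
        "amodel": "HTSAT-base",
    },
}


def to_audio_conditioning(cond_cfg):
    """Hypothetical helper: return a copy of the entry switched to audio conditioning.

    Only the two fields discussed in this issue are changed; everything
    else (target class, sampling rate, amodel) is left untouched.
    """
    new_cfg = copy.deepcopy(cond_cfg)  # do not mutate the original config
    new_cfg["cond_stage_key"] = "audio"
    new_cfg["params"]["embed_mode"] = "audio"
    return new_cfg


audio_cfg = to_audio_conditioning(film_clap_cond1)
print(audio_cfg["cond_stage_key"], audio_cfg["params"]["embed_mode"])
```

Both "cond_stage_key" and "embed_mode" are switched together here, on the assumption that the key selecting the batch input has to match the encoder branch being used.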
