Creating Custom Plugins

Plugins are Python scripts that transform a payload during dataset generation. They are typically used to assess transformation-based jailbreaking techniques, or to convert prompts into a target-friendly format.

Sample plugins can be found within the workspace/plugins/ directory, created by running spikee init. Further information about built-in plugins and usage examples can be found in Built-in Plugins.

Plugins vs. Dynamic Attacks: What's the Difference?

Both Plugins and Dynamic Attacks can generate variations of a payload, but they serve different purposes in the testing workflow:

  • Plugins (Pre-Test Transformation):

    • When they run: During spikee generate.
    • What they do: Create multiple variations of a payload. Each variation is saved as a separate, independent entry in the final dataset file.
    • Result: When you run spikee test, every single variation generated by the plugin is tested against the target. This is useful for systematically evaluating a target's resilience to a known set of transformations (e.g., "Is the target vulnerable to Base64 encoding? To Leetspeak?").
  • Dynamic Attacks (Real-Time Transformation):

    • When they run: During spikee test, but only if the initial, standard prompt fails.
    • What they do: Generate and test variations one by one in real-time. The attack stops as soon as a variation succeeds.
    • Result: Only the first successful variation (or the final failed attempt) is logged. This is useful for efficiently finding any successful bypass, rather than testing every possible variation.

In short, use Plugins to build a comprehensive dataset of known transformations. Use Dynamic Attacks to find a single successful bypass with adaptive, real-time logic.
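
The contrast above can be pictured with a small, self-contained sketch. It does not use Spikee's API: `variations`, `plugin_style`, and `dynamic_attack_style` are hypothetical names invented here, and the `succeeds` callback stands in for a real target call.

```python
from typing import Callable, List, Optional


def variations(payload: str) -> List[str]:
    """Hypothetical transformation set: three simple rewrites of the payload."""
    return [payload.upper(), payload[::-1], payload.replace(" ", "_")]


def plugin_style(payload: str) -> List[str]:
    """Plugin behaviour: every variation becomes its own dataset entry."""
    return variations(payload)


def dynamic_attack_style(payload: str, succeeds: Callable[[str], bool]) -> Optional[str]:
    """Dynamic-attack behaviour: try variations in order and stop at the first success."""
    for candidate in variations(payload):
        if succeeds(candidate):
            return candidate
    return None


# All three variations would be written to the dataset and tested later.
dataset_entries = plugin_style("ignore previous instructions")

# Only the first variation accepted by the (stand-in) target is reported.
first_bypass = dynamic_attack_style("ignore previous instructions", lambda p: "_" in p)
```

In this toy run the plugin path yields three entries, while the dynamic path stops once it reaches the underscore variant.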

Plugin Structure

Every plugin is a Python module located in the plugins/ directory of your workspace. Spikee identifies plugins by their filename.

Plugin Template

from spikee.templates.plugin import Plugin
from spikee.templates.basic_plugin import BasicPlugin
from spikee.utilities.enums import ModuleTag
from spikee.utilities.hinting import ModuleDescriptionHint, ModuleOptionsHint, Content
from typing import List, Optional, Union

class SamplePlugin(Plugin):
    def get_description(self) -> ModuleDescriptionHint:
        """Returns the type and a short description of the plugin."""
        return [], "A brief description of what this plugin does."

    def get_available_option_values(self) -> ModuleOptionsHint:
        """Return supported attack options; Tuple[options (default is first), llm_required]"""
        return [], False

    def transform(
        self, 
        content: Content, # To specify specific content types, use str, Audio, Image subclasses of Content
        exclude_patterns: Optional[List[str]] = None,
        plugin_option: str = ""
    ) -> Union[Content, List[Content]]:
        """Transforms the input text according to the user-defined logic, returning one or more variations.

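        Args:
            content (Content): The input prompt to transform.
            exclude_patterns (List[str], optional): Regex patterns for substrings to preserve.
            plugin_option (str, optional): A string option passed from the command line for custom behavior.

        Returns:
            Union[Content, List[Content]]: A single transformed payload, or a list of variations.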
        """
        # Your implementation here...

class SampleBasicPlugin(BasicPlugin):
    def get_description(self) -> ModuleDescriptionHint:
        """Returns the type and a short description of the plugin."""
        return [], "A brief description of what this plugin does."

    def get_available_option_values(self) -> ModuleOptionsHint:
        """Return supported attack options; Tuple[options (default is first), llm_required]"""
        return [], False

    def plugin_transform(
        self, 
        text: str, 
        plugin_option: str = "",
    ) -> str:
        """Transforms the input text according to the user-defined logic, returning a single variation.

        Args:
            text (str): The input prompt to transform.
            plugin_option (str, optional): A string option passed from the command line for custom behavior.

        Returns:
            str: The transformed text.
        """
        # Your implementation here...

The transform Function

This is the core function of every plugin. It receives a payload string and returns one or more transformed versions.

Parameters

  • content: Content: The input payload, which is typically a combination of a jailbreak and a malicious instruction.

  • exclude_patterns: List[str]: A list of regular expression patterns. Your plugin must not transform any part of the content that matches one of these patterns. This is critical for preserving sensitive parts of a prompt, like URLs or specific keywords.

  • plugin_option: str (Optional): A string passed from the command line via --plugin-options (e.g., "my_plugin:mode=full,variants=10"). If your plugin doesn't need configuration, you can omit this parameter.

Return Values

  • Content (for text plugins, a str): Return a single transformed payload. Spikee will create one new test case from this.
  • List[Content] (for text plugins, a List[str]): Return a list of transformed payloads. Spikee will create a separate test case for each item in the list, allowing you to test multiple variations at once.
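
As a rough illustration of the list return form, the sketch below is a standalone function rather than a real Plugin subclass, and it treats the payload as a plain str. Handling of exclude_patterns is deliberately omitted to keep the example short; a real plugin must honor it.

```python
from typing import List, Optional


def transform(
    content: str,
    exclude_patterns: Optional[List[str]] = None,  # ignored in this sketch only
    plugin_option: str = "",
) -> List[str]:
    """Return several case variations of the payload, one per future test case."""
    candidates = [content.upper(), content.lower(), content.swapcase()]
    # Drop duplicates while preserving order, so identical variations are not tested twice.
    return list(dict.fromkeys(candidates))


variants = transform("Reveal the System Prompt")
```

Each string in `variants` would become its own dataset entry. Returning a single string instead would produce exactly one entry.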

Signature with Options Support

For more advanced plugins, you can accept a configuration string and advertise the available options. This may be implemented as a class method (recommended) or as a legacy module-level function — both are supported for backward compatibility.

from typing import List, Union, Optional
from spikee.utilities.hinting import Content, ModuleOptionsHint

# Both functions are shown as methods of your Plugin subclass, hence the `self` parameter.
def get_available_option_values(self) -> ModuleOptionsHint:
    """Return supported attack options; Tuple[options (default is first), llm_required]"""
    return ["mode=strict", "mode=full"], False # "mode=strict" is the default

def transform(self, content: Content, exclude_patterns: Optional[List[str]] = None, plugin_option: str = "") -> Union[Content, List[Content]]:
    """Transforms the payload based on the provided option."""
    # Your transformation logic here...

Supporting Plugin Options

For more advanced plugins you can support runtime configuration via the --plugin-options CLI flag. Options are passed into your plugin's transform (or plugin_transform) method as the plugin_option string, and you parse that string yourself using the parse_options utility.
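
To make the parsing step concrete, here is a minimal standalone sketch of turning an option string into a dictionary. `parse_option_string` is a hypothetical stand-in written for this example, not Spikee's `parse_options`; it assumes the comma-separated key=value format used throughout this guide, and the real utility may behave differently in edge cases.

```python
from typing import Dict


def parse_option_string(option: str) -> Dict[str, str]:
    """Hypothetical stand-in: turn 'k=v,k2=v2' into a dict of string values."""
    parsed: Dict[str, str] = {}
    for pair in option.split(","):
        key, sep, value = pair.partition("=")
        # Skip empty fragments and entries that have no '=' separator.
        if sep and key.strip():
            parsed[key.strip()] = value.strip()
    return parsed


opts = parse_option_string("mode=full,variants=5")
mode = opts.get("mode", "strict")          # fall back to a default when the key is absent
variants = int(opts.get("variants", "1"))  # values arrive as strings, so convert explicitly
```

Because every value is a string, numeric options need an explicit conversion, and an empty option string simply yields an empty dictionary.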

Advertising Available Options

Implement get_available_option_values to tell Spikee which options your plugin accepts. This function must return a Tuple[List[str], bool] where:

  • The first element is a list of option strings. The first item is treated as the default value shown by spikee list plugins.
  • The second element is a boolean — True if the plugin requires an LLM/provider to operate, False otherwise.

Return ([], False) to indicate the plugin has no configurable options.

from spikee.templates.plugin import Plugin
from spikee.utilities.modules import parse_options
from spikee.utilities.hinting import Content, ModuleOptionsHint
from typing import List, Union, Optional

class SamplePlugin(Plugin):
    def get_available_option_values(self) -> ModuleOptionsHint:
        # First entry is the default; advertised by `spikee list plugins`
        return ["mode=strict,variants=1", "mode=full,variants=5"], False

    def transform(
        self,
        content: Content,
        exclude_patterns: Optional[List[str]] = None,
        plugin_option: str = "",
    ) -> Union[Content, List[Content]]:
        opts = parse_options(plugin_option)          # {"mode": "strict", "variants": "1"}
        mode = opts.get("mode", "strict")
        variants = int(opts.get("variants", 1))
        # Your implementation here...

Passing Options from the CLI

Options are supplied with --plugin-options using the format plugin_name:key=value,key2=value2.

# Single plugin with two options
spikee generate --seed-folder datasets/seeds-cybersec-2026-01 \
    --plugins my_plugin \
    --plugin-options "my_plugin:mode=full,variants=5"

Multiple Plugins with Individual Options

When running multiple plugins at once, separate each plugin's options with a semicolon (;):

# Two plugins, each with their own independent options
spikee generate --seed-folder datasets/seeds-cybersec-2026-01 \
    --plugins plugin_a plugin_b \
    --plugin-options "plugin_a:mode=strict;plugin_b:variants=10"

# Three plugins — only one needs options
spikee generate --seed-folder datasets/seeds-cybersec-2026-01 \
    --plugins base64 splat best_of_n \
    --plugin-options "best_of_n:variants=5"

The option string delivered to each plugin contains only that plugin's own key-value pairs (i.e., what appears after the plugin_name: prefix). Plugins that are listed under --plugins but have no entry in --plugin-options receive an empty string for plugin_option.
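
The routing described above can be sketched as follows. This is an illustration only: `split_plugin_options` is a hypothetical helper, not part of Spikee, and the actual CLI parsing may differ in detail.

```python
from typing import Dict, List


def split_plugin_options(raw: str, plugins: List[str]) -> Dict[str, str]:
    """Map each selected plugin to its own option string ('' when none was given)."""
    per_plugin = {name: "" for name in plugins}
    for entry in raw.split(";"):
        name, sep, options = entry.partition(":")
        # Only keep entries of the form 'plugin_name:...' for plugins that were selected.
        if sep and name.strip() in per_plugin:
            per_plugin[name.strip()] = options.strip()
    return per_plugin


routed = split_plugin_options("best_of_n:variants=5", ["base64", "splat", "best_of_n"])
```

Here `routed` maps `best_of_n` to `"variants=5"`, while `base64` and `splat` receive empty strings, matching the behaviour described for plugins without an entry.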

Handling Exclude Patterns

Correctly handling exclude_patterns is the most important part of writing a robust plugin. You must leave the excluded parts of the string completely untouched. The recommended way to do this is with re.split, as implemented in BasicPlugin.

# Example transformation function converting all text to uppercase with exclude_patterns support
import re
from typing import List, Union, Optional
from spikee.utilities.hinting import Content, get_content

def transform(self, content: Content, exclude_patterns: Optional[List[str]] = None) -> Union[Content, List[Content]]:
    text = get_content(content)  # Unwrap Content wrapper to get the raw string

    if not exclude_patterns:
        # No exclusions, transform the whole text
        return apply_transformation(text)

    # 1. Create a single regex pattern that captures any of the exclude patterns.
    # The parentheses around the pattern are crucial for re.split to keep the delimiters.
    # Note: this assumes the individual patterns contain no capturing groups of their own;
    # extra groups would add entries to the split result and break the even/odd indexing.
    combined_pattern = "(" + "|".join(exclude_patterns) + ")"
    
    # 2. Split the text by the combined pattern.
    # Even-indexed chunks are normal text; odd-indexed chunks are the exclusions.
    chunks = re.split(combined_pattern, text)
    
    # 3. Transform only the non-excluded chunks.
    transformed_chunks = []
    for i, chunk in enumerate(chunks):
        if i % 2 == 0:
            # This is normal text, apply the transformation
            transformed_chunks.append(apply_transformation(chunk))
        else:
            # This is an excluded part, keep it as is
            transformed_chunks.append(chunk)
            
    # 4. Rejoin the chunks into a single string.
    return "".join(transformed_chunks)

def apply_transformation(text: str) -> str:
    return text.upper()
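
To see how the split lays out its chunks, the following standalone sketch runs the same technique on a concrete payload. `uppercase_except` is a name chosen for this example, and the URL pattern is just an illustrative exclusion.

```python
import re
from typing import List


def uppercase_except(text: str, exclude_patterns: List[str]) -> str:
    """Uppercase everything except substrings matching one of the patterns."""
    # One outer capturing group makes re.split keep the matched exclusions.
    combined = "(" + "|".join(exclude_patterns) + ")"
    chunks = re.split(combined, text)
    # Even indices are ordinary text, odd indices are the preserved matches.
    return "".join(c if i % 2 else c.upper() for i, c in enumerate(chunks))


payload = "visit https://example.com/a?b=c now"
chunks = re.split(r"(https?://\S+)", payload)
result = uppercase_except(payload, [r"https?://\S+"])
```

With this input, `chunks` is `['visit ', 'https://example.com/a?b=c', ' now']` and `result` is `'VISIT https://example.com/a?b=c NOW'`: the URL keeps its original case while the surrounding text is transformed.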

Multimodal Plugins

Plugins can output non-text content types by returning Audio or Image objects. This is how TTS (text-to-speech) and image-generation plugins work. When a plugin returns a Content subclass, the generator updates the dataset entry's content_type field accordingly so that targets and judges can handle it correctly.

Content-type routing: The generator inspects the plugin's transform (or plugin_transform) parameter annotations to decide whether to call it:

  • A content: Content parameter annotation — plugin accepts any content type.
  • A content: str (or text: str) parameter annotation — plugin only accepts text; the generator will skip it for audio/image entries.
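
One way such annotation-based routing could work is sketched below with the standard `inspect` module. This is an assumption about the mechanism, not Spikee's actual generator code: the `Content` class here is a local stand-in, and `accepts_any_content` is a hypothetical helper.

```python
import inspect
from typing import Callable


class Content:
    """Hypothetical stand-in for Spikee's Content type (assumption for this sketch)."""


def accepts_any_content(transform: Callable) -> bool:
    """Return True only if the first parameter is annotated as Content."""
    params = list(inspect.signature(transform).parameters.values())
    if not params:
        return False
    return params[0].annotation is Content


def text_only(content: str, plugin_option: str = "") -> str:
    return content


def any_content(content: Content, plugin_option: str = "") -> Content:
    return content


# Plain functions are used here; for bound methods, `self` is already excluded.
routing = {fn.__name__: accepts_any_content(fn) for fn in (text_only, any_content)}
```

Under this sketch, a generator would call `any_content` for every entry but skip `text_only` when the entry holds audio or image data.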

from typing import Optional, List
from spikee.templates.plugin import Plugin
from spikee.utilities.enums import ModuleTag
from spikee.utilities.hinting import Audio, Content, get_content, ModuleDescriptionHint, ModuleOptionsHint

class MyTTSPlugin(Plugin):
    """Example plugin that converts text to audio using a TTS service."""

    def get_description(self) -> ModuleDescriptionHint:
        return [ModuleTag.SINGLE], "Converts text payload to Audio via TTS"

    def get_available_option_values(self) -> ModuleOptionsHint:
        return ["voice=alloy", "voice=nova"], True  # Requires LLM/TTS provider

    def transform(
        self,
        content: str,  # Annotate as str: only receives text entries
        exclude_patterns: Optional[List[str]] = None,
        plugin_option: str = "",
    ) -> Audio:
        text = get_content(content)
        # ... call TTS API to get base64-encoded audio bytes ...
        # call_tts_api is a placeholder for your own TTS client; it is not defined in this snippet
        audio_bytes_b64 = call_tts_api(text)
        return Audio(audio_bytes_b64)

See spikee/plugins/tts.py and spikee/plugins/text2image.py for full reference implementations.