Add WordPress NLWeb Plugin #404

Open
rvguha wants to merge 2 commits into main from wordpress-plugin

Conversation

rvguha (Collaborator) commented Feb 25, 2026

Summary

This PR adds a WordPress plugin implementation of the NLWeb v0.55 protocol, enabling WordPress sites to provide conversational search powered by LLMs.

Features

Core Functionality

  • NLWeb v0.55 Protocol: Full support for the NLWeb protocol specification (query, context, prefer, meta sections)
  • Query Fanout: Automatically rewrites complex natural language queries into 1-5 simpler keyword queries for better retrieval
  • Decontextualization: Handles multi-turn conversations by contextualizing queries based on conversation history
  • LLM-based Ranking: Scores search results 0-100 for relevance with AI-generated descriptions
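
As a rough illustration of how 0-100 relevance scores and a ranking threshold might interact, here is a minimal sketch. The function name `nlweb_example_filter_ranked`, the `score` key, and the threshold value are assumptions for illustration and are not taken from the plugin source.

```php
<?php
// Hypothetical helper (not from the plugin): keep results whose
// LLM-assigned score meets a threshold, ordered best-first.
function nlweb_example_filter_ranked( array $results, int $min_score ): array {
    $kept = array_filter(
        $results,
        static function ( array $r ) use ( $min_score ): bool {
            // Treat a missing or non-numeric score as 0.
            $score = isset( $r['score'] ) && is_numeric( $r['score'] ) ? (int) $r['score'] : 0;
            return $score >= $min_score;
        }
    );

    // Sort descending by score; usort also reindexes the array from 0.
    usort(
        $kept,
        static function ( array $a, array $b ): int {
            return (int) ( $b['score'] ?? 0 ) <=> (int) ( $a['score'] ?? 0 );
        }
    );

    return $kept;
}
```

Filtering before sorting keeps the sort cheap when most candidates fall below the threshold.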

Integrations

  • Multiple LLM Providers: OpenRouter, Anthropic, and Gemini support
  • MCP Apps: Sandboxed iframe rendering with postMessage bridge for interactive applications
  • WordPress Integration: Uses native WordPress search (WP_Query) with Yoast SEO schema.org metadata

User Interface

  • ChatGPT-style Interface: Modern conversational UI with message bubbles, avatars, typing indicators
  • Debug Panel: Collapsible panel showing pipeline execution (decontextualization, query fanout, retrieval, ranking)
  • Responsive Design: Mobile-friendly with customizable styling

Plugin Structure

code/wordpress/nlweb/
├── nlweb.php                          # Main plugin file, REST endpoints
├── assets/
│   ├── nlweb-frontend.css             # ChatGPT-style interface styles
│   ├── nlweb-frontend.js              # Frontend JavaScript
│   └── nlweb-mcp-renderer.js          # MCP Apps renderer
└── includes/
    ├── class-nlweb-protocol.php       # NLWeb v0.55 protocol handler
    ├── class-nlweb-handler.php        # Main query orchestrator
    ├── class-nlweb-decontextualizer.php
    ├── class-nlweb-query-rewriter.php # Query fanout implementation
    ├── class-nlweb-retriever.php      # WordPress search integration
    ├── class-nlweb-ranker.php         # LLM-based ranking
    ├── class-nlweb-llm.php            # Multi-provider LLM wrapper
    ├── class-nlweb-settings.php       # Admin settings UI
    └── class-nlweb-frontend.php       # Shortcode renderer

Configuration

  • No hardcoded API keys (stored securely in WordPress options)
  • Admin UI for configuring LLM providers and models
  • Configurable settings:
    • LLM provider (OpenRouter, Anthropic, Gemini)
    • Models for high/low priority tasks
    • Query fanout enable/disable
    • Post types to search
    • Ranking thresholds

Usage

  1. Install plugin in WordPress wp-content/plugins/ directory
  2. Activate plugin in WordPress admin
  3. Navigate to Settings → NLWeb to configure API keys and models
  4. Add [nlweb_search] shortcode to any page
  5. Users can now search with natural language queries

Testing

Tested with:

  • WordPress 6.7.2
  • 3,220 sci-fi movies with schema.org metadata
  • OpenRouter with gpt-4o and gpt-4o-mini models
  • Query fanout working correctly with debug output
  • ChatGPT-style interface rendering properly

Files Not Included

This PR intentionally excludes:

  • code/wordpress/docker-compose.yml (test environment setup)
  • code/wordpress/setup.sh (test environment setup)
  • API keys (configured via admin UI)

Related Documentation

This commit adds a WordPress plugin implementation of NLWeb v0.55 protocol.

Features:
- NLWeb v0.55 protocol support (query, context, prefer, meta structure)
- Query fanout: Rewrites complex queries into 1-5 simpler keyword queries
- Decontextualization: Handles multi-turn conversations
- LLM integration: Support for OpenRouter, Anthropic, and Gemini providers
- Ranking: LLM-based relevance scoring (0-100) with descriptions
- MCP Apps: Sandboxed iframe rendering with postMessage bridge
- ChatGPT-style UI: Modern conversational interface with message bubbles
- Debug panel: Collapsible debug information showing pipeline execution

Plugin Structure:
- nlweb.php: Main plugin file, REST endpoints (/ask, /mcp)
- includes/: PHP classes for protocol, handlers, LLM calls, etc.
- assets/: Frontend JavaScript and CSS, MCP renderer

Configuration:
- No hardcoded API keys (stored in WordPress options)
- Admin UI for configuring LLM providers and models
- Settings for query fanout, ranking thresholds, post types

Usage:
1. Install plugin in WordPress wp-content/plugins/ directory
2. Activate plugin
3. Configure API keys and models in admin settings
4. Add [nlweb_search] shortcode to any page
Copilot AI review requested due to automatic review settings February 25, 2026 21:54

Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces a comprehensive WordPress plugin that implements the NLWeb v0.55 protocol for conversational search. The plugin integrates multiple LLM providers (OpenRouter, Anthropic, Gemini) to provide AI-powered search capabilities with features like query decontextualization, query fanout, and LLM-based ranking.

Changes:

  • Adds complete WordPress plugin with admin settings, REST API endpoints, and conversational UI
  • Implements NLWeb v0.55 protocol with support for multi-turn conversations and query processing pipeline
  • Integrates with WordPress search, Yoast SEO, and optional WooCommerce for schema.org metadata extraction

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 19 comments.

| File | Description |
| --- | --- |
| nlweb.php | Main plugin file with REST API endpoints for /ask and MCP |
| class-nlweb-settings.php | Admin configuration UI for LLM providers and plugin settings |
| class-nlweb-protocol.php | NLWeb v0.55 protocol request/response handler |
| class-nlweb-llm.php | Multi-provider LLM wrapper with parallel request support |
| class-nlweb-retriever.php | WordPress search integration with schema.org extraction |
| class-nlweb-ranker.php | LLM-based relevance scoring for search results |
| class-nlweb-query-rewriter.php | Query fanout implementation for complex queries |
| class-nlweb-handler.php | Main orchestrator for the search pipeline |
| class-nlweb-frontend.php | Shortcode renderer for conversational UI |
| class-nlweb-decontextualizer.php | Multi-turn conversation context handling |
| nlweb-mcp-renderer.js | MCP Apps iframe rendering with postMessage bridge |
| nlweb-frontend.js | ChatGPT-style conversational interface |
| nlweb-frontend.css | Modern responsive styling for the UI |

'headers' => array(
'Content-Type' => 'application/json',
'x-api-key' => $api_key,
'anthropic-version' => '2023-06-01',

Copilot AI Feb 25, 2026

The 'anthropic-version' header name is spelled correctly, and '2023-06-01' is a valid version string. Hard-coding it makes later upgrades harder, so consider making the version configurable. Before changing the value, confirm the replacement against Anthropic's published API versions; the '2024-01-01' value suggested below has not been verified.

Suggested change
- 'anthropic-version' => '2023-06-01',
+ 'anthropic-version' => '2024-01-01',

Copilot uses AI. Check for mistakes.
'top_score' => ! empty( $ranked ) ? $ranked[0]['score'] : null,
);

/* ---- 4. Optionally summarize ------------------------------ */

Copilot AI Feb 25, 2026

Inconsistent comment numbering: The pipeline step is labeled as "4. Optionally summarize" but it's actually step 5 (after 1. Decontextualize, 2. Query Fanout, 3. Retrieve, 4. Rank). Update the comment to say "5. Optionally summarize" for accuracy.

Suggested change
- /* ---- 4. Optionally summarize ------------------------------ */
+ /* ---- 5. Optionally summarize ------------------------------ */

Comment on lines +127 to +142
$seen_ids = array();

foreach ( $queries as $query ) {
    $results = NLWeb_Retriever::search( $query, $max_per_query );

    foreach ( $results as $item ) {
        $id = $item['id'] ?? null;

        // Skip if we've already seen this item
        if ( $id && in_array( $id, $seen_ids, true ) ) {
            continue;
        }

        $all_results[] = $item;
        if ( $id ) {
            $seen_ids[] = $id;

Copilot AI Feb 25, 2026

The fanout_search method reads $item['id'], which doesn't exist in the retriever results. The retriever returns tuples of [url, schema_json, name, site] (line 46 of class-nlweb-retriever.php) with no 'id' field. Because the lookup uses the ?? operator, no undefined-index notice is raised; instead $id is always null, so the duplicate check never fires and every item is appended. The deduplication should use the URL as the unique identifier instead.

Suggested change
- $seen_ids = array();
- foreach ( $queries as $query ) {
-     $results = NLWeb_Retriever::search( $query, $max_per_query );
-     foreach ( $results as $item ) {
-         $id = $item['id'] ?? null;
-         // Skip if we've already seen this item
-         if ( $id && in_array( $id, $seen_ids, true ) ) {
-             continue;
-         }
-         $all_results[] = $item;
-         if ( $id ) {
-             $seen_ids[] = $id;
+ $seen_urls = array();
+ foreach ( $queries as $query ) {
+     $results = NLWeb_Retriever::search( $query, $max_per_query );
+     foreach ( $results as $item ) {
+         $url = null;
+         // Support both associative arrays and numeric tuples: [url, schema_json, name, site].
+         if ( is_array( $item ) ) {
+             if ( array_key_exists( 'url', $item ) ) {
+                 $url = $item['url'];
+             } elseif ( array_key_exists( 0, $item ) ) {
+                 $url = $item[0];
+             }
+         }
+         // Skip if we've already seen this URL
+         if ( $url && in_array( $url, $seen_urls, true ) ) {
+             continue;
+         }
+         $all_results[] = $item;
+         if ( $url ) {
+             $seen_urls[] = $url;
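
For comparison, URL-based deduplication can also be sketched as a standalone function that uses an associative array as a set, avoiding the linear in_array() scan on every item. The function name `nlweb_example_merge_by_url` is hypothetical, and the tuple shape [url, schema_json, name, site] is assumed from the comment above rather than checked against the retriever.

```php
<?php
// Hypothetical sketch (not from the plugin): merge result lists from
// several fanout queries, keeping the first occurrence of each URL.
// Assumes items are arrays keyed by 'url' or tuples with the URL at index 0.
function nlweb_example_merge_by_url( array $result_lists ): array {
    $seen   = array(); // url => true, used as a set for constant-time lookups
    $merged = array();

    foreach ( $result_lists as $results ) {
        foreach ( $results as $item ) {
            $url = is_array( $item ) ? ( $item['url'] ?? $item[0] ?? null ) : null;

            if ( ! is_string( $url ) || '' === $url ) {
                // No usable key: keep the item rather than silently dropping it.
                $merged[] = $item;
                continue;
            }
            if ( isset( $seen[ $url ] ) ) {
                continue; // duplicate returned by an earlier query
            }
            $seen[ $url ] = true;
            $merged[]     = $item;
        }
    }

    return $merged;
}
```

Keeping items that lack a URL is a design choice made here for safety; a stricter variant could discard them.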

Comment on lines +132 to +210
switch ( $provider ) {

    case 'openrouter':
        $url = $endpoint ?: 'https://openrouter.ai/api/v1/chat/completions';
        return array(
            'url'  => $url,
            'args' => array(
                'timeout' => $timeout,
                'headers' => array(
                    'Content-Type'  => 'application/json',
                    'Authorization' => "Bearer $api_key",
                    'HTTP-Referer'  => get_site_url(),
                    'X-Title'       => 'NLWeb WordPress Plugin',
                ),
                'body'    => wp_json_encode( array(
                    'model'           => $model,
                    'messages'        => array(
                        array( 'role' => 'system', 'content' => $system_msg ),
                        array( 'role' => 'user', 'content' => $prompt ),
                    ),
                    'response_format' => array( 'type' => 'json_object' ),
                    'max_tokens'      => 512,
                    'temperature'     => 0,
                ) ),
            ),
        );

    case 'anthropic':
        $url = $endpoint ?: 'https://api.anthropic.com/v1/messages';
        return array(
            'url'  => $url,
            'args' => array(
                'timeout' => $timeout,
                'headers' => array(
                    'Content-Type'      => 'application/json',
                    'x-api-key'         => $api_key,
                    'anthropic-version' => '2023-06-01',
                ),
                'body'    => wp_json_encode( array(
                    'model'      => $model,
                    'max_tokens' => 512,
                    'system'     => $system_msg,
                    'messages'   => array(
                        array( 'role' => 'user', 'content' => $prompt ),
                    ),
                ) ),
            ),
        );

    case 'gemini':
        $url = $endpoint
            ?: "https://generativelanguage.googleapis.com/v1beta/models/{$model}:generateContent?key={$api_key}";
        return array(
            'url'  => $url,
            'args' => array(
                'timeout' => $timeout,
                'headers' => array(
                    'Content-Type' => 'application/json',
                ),
                'body'    => wp_json_encode( array(
                    'contents'         => array(
                        array(
                            'parts' => array(
                                array( 'text' => $system_msg . "\n\n" . $prompt ),
                            ),
                        ),
                    ),
                    'generationConfig' => array(
                        'responseMimeType' => 'application/json',
                        'maxOutputTokens'  => 512,
                        'temperature'      => 0,
                    ),
                ) ),
            ),
        );

    default:
        return null;
}

Copilot AI Feb 25, 2026

The LLM wrapper doesn't handle the 'openai' provider case but the switch statement includes 'openrouter'. Based on the settings page (line 206 showing 'OpenAI' label for the 'openai' key), there's a mismatch. The build_request method should either support 'openai' directly or the settings should use 'openrouter' as the key. Currently, selecting OpenAI in settings will return null from build_request and fail silently.
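
One way to avoid the silent failure described above is to validate the provider key before a request is built, so an unsupported value surfaces as an explicit error. The sketch below is illustrative only: the function name `nlweb_example_resolve_provider` is hypothetical, and the supported list simply mirrors the three cases in the quoted switch statement.

```php
<?php
// Hypothetical sketch (not from the plugin): normalize a settings value
// and fail loudly when it does not match a supported provider.
function nlweb_example_resolve_provider( string $key ): string {
    // Mirrors the cases in the quoted switch statement.
    $supported = array( 'openrouter', 'anthropic', 'gemini' );

    $normalized = strtolower( trim( $key ) );
    if ( ! in_array( $normalized, $supported, true ) ) {
        throw new InvalidArgumentException(
            sprintf( 'Unsupported LLM provider "%s"; expected one of: %s', $key, implode( ', ', $supported ) )
        );
    }

    return $normalized;
}
```

A caller could catch the exception and show an admin notice instead of returning empty results.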

Comment on lines +95 to +114
private static function build_prompt( $query ) {
return <<<PROMPT
You are helping to rewrite a complex search query into simpler keyword queries for a traditional keyword-based search engine.
The search engine works best with short, focused queries containing important keywords.

Take the following query and break it down into up to 5 simpler search queries.
Each query should:
- Contain no more than 3 words
- Focus on the most important keywords and concepts
- Be diverse to cover different aspects of the original query
- Use only essential nouns, adjectives, or product terms
- Avoid common words like "for", "the", "some", "are", "that", "would", "be"

For example:
- "what are some options for plates that would be appropriate for serving vegetables" → ["vegetable plates", "serving plates", "dinner plates", "salad plates", "ceramic plates"]
- "looking for a tea pot that can brew green tea" → ["tea pot", "green tea", "teapot ceramic", "japanese teapot", "brewing pot"]
- "movies about alien invasions in the 1950s" → ["alien invasion", "1950s scifi", "alien movies", "invasion films", "scifi 1950s"]

The original query is: {$query}
PROMPT;

Copilot AI Feb 25, 2026

The prompt instructs the LLM to limit each query to "no more than 3 words", and every example complies (each is two words, such as "vegetable plates", "japanese teapot", and "alien invasion"). For more complex queries, however, a 3-word limit might be too restrictive. Consider allowing 3-5 words for more nuanced queries.
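
Whatever limit the prompt states, an LLM may not follow it exactly, so the limits could also be enforced on the returned list. The following is a minimal sketch under that assumption; `nlweb_example_clean_queries` and its default limits are hypothetical and not part of the plugin.

```php
<?php
// Hypothetical sketch (not from the plugin): enforce the query-count and
// word-count limits on the LLM's output instead of trusting the prompt.
function nlweb_example_clean_queries( array $queries, int $max_queries = 5, int $max_words = 3 ): array {
    $clean = array();

    foreach ( $queries as $q ) {
        if ( ! is_string( $q ) ) {
            continue; // ignore non-string entries
        }
        // Split on whitespace and drop empty pieces.
        $words = preg_split( '/\s+/', trim( $q ), -1, PREG_SPLIT_NO_EMPTY );
        if ( empty( $words ) ) {
            continue;
        }
        // Truncate over-long queries rather than discarding them.
        $normalized = implode( ' ', array_slice( $words, 0, $max_words ) );
        $key        = strtolower( $normalized );
        if ( ! isset( $clean[ $key ] ) ) {
            $clean[ $key ] = $normalized; // first spelling wins; key dedupes case-insensitively
        }
    }

    return array_slice( array_values( $clean ), 0, $max_queries );
}
```

Raising `$max_words` to 5 would match the relaxed limit suggested in the comment above.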

Comment on lines +50 to +65
$schema = array(
'requires_decontextualization' => 'True or False',
'decontextualized_query' => 'The rewritten query, if decontextualization is required',
);

$response = NLWeb_LLM::ask( $prompt, $schema, 'low' );

if ( empty( $response ) || ! isset( $response['requires_decontextualization'] ) ) {
// LLM failed — fall back to the original query.
return array(
'query' => $raw_query,
'requires_decontextualization' => false,
);
}

$needs = ( 'True' === $response['requires_decontextualization'] || true === $response['requires_decontextualization'] );

Copilot AI Feb 25, 2026

The schema prompt parsing expects boolean values as strings 'True' or 'False' (line 65) but also checks for actual boolean true. This dual handling is good for robustness, but the schema definition on line 51 should clarify the expected format. Consider instructing the LLM to return actual boolean values (true/false without quotes) instead of string values for better type safety.
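
The dual handling discussed here could be centralized in a small helper that accepts either real booleans or their string spellings and reports anything else as unknown. This is a sketch only; `nlweb_example_to_bool` is a hypothetical name, and returning null for unrecognized values is an assumption about the desired behavior.

```php
<?php
// Hypothetical sketch (not from the plugin): normalize an LLM-provided
// flag to true, false, or null when the value is not recognizable.
function nlweb_example_to_bool( $value ): ?bool {
    if ( is_bool( $value ) ) {
        return $value;
    }
    if ( is_string( $value ) ) {
        $v = strtolower( trim( $value ) );
        if ( 'true' === $v ) {
            return true;
        }
        if ( 'false' === $v ) {
            return false;
        }
    }
    return null; // unknown: the caller decides on a fallback
}
```

A caller that wants the existing fallback behavior can treat null the same as false and keep the original query.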

return $json_str; // not valid JSON — send as-is
}

// If it's a list, take first element.

Copilot AI Feb 25, 2026

Wording nit: the comment says "If it's a list, take first element" but would be clearer as "If it's an array, take the first element" for consistency with PHP terminology.

Suggested change
- // If it's a list, take first element.
+ // If it's an array, take the first element.
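
The behavior those two quoted comments describe (pass invalid JSON through unchanged, and unwrap a list to its first element) could look roughly like the sketch below. The function name `nlweb_example_first_schema_object` is hypothetical, and the surrounding plugin code is not shown in this thread, so this is a guess at the intent rather than the actual implementation.

```php
<?php
// Hypothetical sketch (not from the plugin): decode a schema JSON string;
// return it unchanged when invalid, and unwrap a list to its first element.
function nlweb_example_first_schema_object( string $json_str ) {
    $data = json_decode( $json_str, true );
    if ( null === $data && JSON_ERROR_NONE !== json_last_error() ) {
        return $json_str; // not valid JSON: send as-is
    }

    // A decoded JSON list has sequential integer keys starting at 0.
    $is_list = is_array( $data )
        && array() !== $data
        && array_keys( $data ) === range( 0, count( $data ) - 1 );

    return $is_list ? $data[0] : $data;
}
```

The explicit key check is used instead of array_is_list() so the sketch also runs on PHP versions before 8.1.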

Comment on lines +11 to +13
if ( ! defined( 'ABSPATH' ) ) {
exit;
}

Copilot AI Feb 25, 2026

Mixed indentation detected: this file uses tabs while all other PHP files in the plugin use spaces. Tabs are what the WordPress coding standards prefer for PHP files, so this file is the compliant one, but the mix creates inconsistency within the plugin codebase. For consistency, either convert all files to tabs (matching the WordPress standards) or convert this file to spaces to match the rest of the plugin.

Comment on lines +20 to +23
/* Hide WordPress page title on pages with NLWeb search */
.page .wp-block-post-title {
    display: none !important;
}

Copilot AI Feb 25, 2026

The CSS selector '.page .wp-block-post-title' globally hides all page titles on pages, which is too aggressive. This will hide page titles site-wide even on pages that don't use the NLWeb search. Instead, add a specific class to the shortcode container's parent and only hide titles within that context, or let users control this via their theme.

return $graph;
}
} catch ( \Exception $e ) {
// Yoast internals changed — fall through.

Copilot AI Feb 25, 2026

Missing error logging: the Yoast schema extraction is wrapped in a try-catch, but the catch block silently falls through on any exception. This swallows errors and makes debugging difficult. Consider logging the error before falling through so administrators can identify when the Yoast integration fails.

Suggested change
- // Yoast internals changed — fall through.
+ error_log(
+     sprintf(
+         'NLWeb_Retriever: Yoast schema extraction failed for post ID %d: %s',
+         isset( $post->ID ) ? $post->ID : 0,
+         $e->getMessage()
+     )
+ );
+ // Yoast internals changed or threw an error — fall through to other schema sources.

…ation in `postMessage` handler

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

rvguha (Collaborator, Author) commented Feb 26, 2026

@copilot open a new pull request to apply changes based on the comments in this thread

Copilot AI commented Feb 26, 2026

@rvguha I've opened a new pull request, #405, to work on those changes. Once the pull request is ready, I'll request review from you.


3 participants