This commit adds a WordPress plugin implementation of the NLWeb v0.55 protocol.

Features:
- NLWeb v0.55 protocol support (query, context, prefer, meta structure)
- Query fanout: Rewrites complex queries into 1-5 simpler keyword queries
- Decontextualization: Handles multi-turn conversations
- LLM integration: Support for OpenRouter, Anthropic, and Gemini providers
- Ranking: LLM-based relevance scoring (0-100) with descriptions
- MCP Apps: Sandboxed iframe rendering with postMessage bridge
- ChatGPT-style UI: Modern conversational interface with message bubbles
- Debug panel: Collapsible debug information showing pipeline execution

Plugin Structure:
- nlweb.php: Main plugin file, REST endpoints (/ask, /mcp)
- includes/: PHP classes for protocol, handlers, LLM calls, etc.
- assets/: Frontend JavaScript and CSS, MCP renderer

Configuration:
- No hardcoded API keys (stored in WordPress options)
- Admin UI for configuring LLM providers and models
- Settings for query fanout, ranking thresholds, post types

Usage:
1. Install plugin in WordPress wp-content/plugins/ directory
2. Activate plugin
3. Configure API keys and models in admin settings
4. Add [nlweb_search] shortcode to any page
Pull request overview
This PR introduces a comprehensive WordPress plugin that implements the NLWeb v0.55 protocol for conversational search. The plugin integrates multiple LLM providers (OpenRouter, Anthropic, Gemini) to provide AI-powered search capabilities with features like query decontextualization, query fanout, and LLM-based ranking.
Changes:
- Adds complete WordPress plugin with admin settings, REST API endpoints, and conversational UI
- Implements NLWeb v0.55 protocol with support for multi-turn conversations and query processing pipeline
- Integrates with WordPress search, Yoast SEO, and optional WooCommerce for schema.org metadata extraction
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 19 comments.
| File | Description |
|---|---|
| nlweb.php | Main plugin file with REST API endpoints for /ask and MCP |
| class-nlweb-settings.php | Admin configuration UI for LLM providers and plugin settings |
| class-nlweb-protocol.php | NLWeb v0.55 protocol request/response handler |
| class-nlweb-llm.php | Multi-provider LLM wrapper with parallel request support |
| class-nlweb-retriever.php | WordPress search integration with schema.org extraction |
| class-nlweb-ranker.php | LLM-based relevance scoring for search results |
| class-nlweb-query-rewriter.php | Query fanout implementation for complex queries |
| class-nlweb-handler.php | Main orchestrator for the search pipeline |
| class-nlweb-frontend.php | Shortcode renderer for conversational UI |
| class-nlweb-decontextualizer.php | Multi-turn conversation context handling |
| nlweb-mcp-renderer.js | MCP Apps iframe rendering with postMessage bridge |
| nlweb-frontend.js | ChatGPT-style conversational interface |
| nlweb-frontend.css | Modern responsive styling for the UI |
```php
'headers' => array(
    'Content-Type' => 'application/json',
    'x-api-key' => $api_key,
    'anthropic-version' => '2023-06-01',
```
The 'anthropic-version' header is correct as written: '2023-06-01' is the current API version in Anthropic's versioning documentation, and there is no '2024-01-01' version, so changing the value to that would break requests. Consider making the version configurable so it can be updated without a code change if Anthropic publishes a newer one.
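A minimal sketch of the configurable option, assuming a hypothetical helper `nlweb_anthropic_headers()` that takes an optional override (in the plugin this could be read with `get_option()`; the option name would be the plugin's choice). The header defaults to '2023-06-01', and a non-empty override replaces it.

```php
<?php
// Hypothetical helper: builds the Anthropic request headers with an
// overridable API version instead of a hardcoded string.
const NLWEB_DEFAULT_ANTHROPIC_VERSION = '2023-06-01';

function nlweb_anthropic_headers(string $api_key, ?string $version_override = null): array
{
    // Fall back to the default when no override (or an empty one) is supplied.
    $version = ($version_override !== null && $version_override !== '')
        ? $version_override
        : NLWEB_DEFAULT_ANTHROPIC_VERSION;

    return array(
        'Content-Type'      => 'application/json',
        'x-api-key'         => $api_key,
        'anthropic-version' => $version,
    );
}
```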
```php
    'top_score' => ! empty( $ranked ) ? $ranked[0]['score'] : null,
);

/* ---- 4. Optionally summarize ------------------------------ */
```
Inconsistent comment numbering: The pipeline step is labeled as "4. Optionally summarize" but it's actually step 5 (after 1. Decontextualize, 2. Query Fanout, 3. Retrieve, 4. Rank). Update the comment to say "5. Optionally summarize" for accuracy.
```diff
-/* ---- 4. Optionally summarize ------------------------------ */
+/* ---- 5. Optionally summarize ------------------------------ */
```
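The five-step order the comment refers to can be sketched as follows. This is a hypothetical outline (the function name and the `$stages` array are not from the plugin): each stage is injected as a callable so the sequence can be read and tested without the plugin's classes.

```php
<?php
// Hypothetical sketch of the handler's stage order.
function nlweb_run_pipeline(string $query, array $context, array $stages, bool $summarize = false): array
{
    // 1. Decontextualize: fold earlier conversation turns into a standalone query.
    $query = $stages['decontextualize']($query, $context);
    // 2. Query fanout: rewrite into one or more simpler keyword queries.
    $queries = $stages['fanout']($query);
    // 3. Retrieve: run every fanout query against the site search.
    $items = $stages['retrieve']($queries);
    // 4. Rank: score the retrieved items against the decontextualized query.
    $ranked = $stages['rank']($query, $items);
    // 5. Optionally summarize the ranked results.
    $summary = $summarize ? $stages['summarize']($query, $ranked) : null;

    return array('query' => $query, 'results' => $ranked, 'summary' => $summary);
}
```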
```php
$seen_ids = array();

foreach ( $queries as $query ) {
    $results = NLWeb_Retriever::search( $query, $max_per_query );

    foreach ( $results as $item ) {
        $id = $item['id'] ?? null;

        // Skip if we've already seen this item
        if ( $id && in_array( $id, $seen_ids, true ) ) {
            continue;
        }

        $all_results[] = $item;
        if ( $id ) {
            $seen_ids[] = $id;
```
The fanout_search method reads $item['id'], but retriever results have no 'id' field: the retriever returns tuples of [url, schema_json, name, site] (line 46 of class-nlweb-retriever.php). Because of the `?? null` fallback, $id is always null, so the deduplication never runs and the same post can be returned once per fanout query. The deduplication should use 'url' as the unique identifier instead.
```diff
-$seen_ids = array();
-foreach ( $queries as $query ) {
-    $results = NLWeb_Retriever::search( $query, $max_per_query );
-    foreach ( $results as $item ) {
-        $id = $item['id'] ?? null;
-        // Skip if we've already seen this item
-        if ( $id && in_array( $id, $seen_ids, true ) ) {
-            continue;
-        }
-        $all_results[] = $item;
-        if ( $id ) {
-            $seen_ids[] = $id;
+$seen_urls = array();
+foreach ( $queries as $query ) {
+    $results = NLWeb_Retriever::search( $query, $max_per_query );
+    foreach ( $results as $item ) {
+        $url = null;
+        // Support both associative arrays and numeric tuples: [url, schema_json, name, site].
+        if ( is_array( $item ) ) {
+            if ( array_key_exists( 'url', $item ) ) {
+                $url = $item['url'];
+            } elseif ( array_key_exists( 0, $item ) ) {
+                $url = $item[0];
+            }
+        }
+        // Skip if we've already seen this URL
+        if ( $url && in_array( $url, $seen_urls, true ) ) {
+            continue;
+        }
+        $all_results[] = $item;
+        if ( $url ) {
+            $seen_urls[] = $url;
```
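A minimal, self-contained sketch of the same URL-based deduplication, with the retriever passed in as a callable so it runs outside WordPress (the function name and parameters are hypothetical). It keeps the suggestion's support for both associative items and `[url, schema_json, name, site]` tuples, but records seen URLs as array keys so each lookup is an `isset()` rather than an `in_array()` scan.

```php
<?php
// Hypothetical standalone version of the URL-based deduplication.
// $search stands in for NLWeb_Retriever::search( $query, $max_per_query ).
function nlweb_fanout_dedup(array $queries, callable $search, int $max_per_query): array
{
    $all_results = array();
    $seen_urls   = array(); // URL => true; isset() avoids in_array()'s linear scan.

    foreach ($queries as $query) {
        foreach ($search($query, $max_per_query) as $item) {
            $url = null;
            if (is_array($item)) {
                // Accept both ['url' => ...] items and [url, schema_json, name, site] tuples.
                $candidate = $item['url'] ?? ($item[0] ?? null);
                $url       = (is_string($candidate) && $candidate !== '') ? $candidate : null;
            }
            if ($url !== null && isset($seen_urls[$url])) {
                continue; // already returned by an earlier fanout query
            }
            $all_results[] = $item;
            if ($url !== null) {
                $seen_urls[$url] = true;
            }
        }
    }
    return $all_results;
}
```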
```php
switch ( $provider ) {

    case 'openrouter':
        $url = $endpoint ?: 'https://openrouter.ai/api/v1/chat/completions';
        return array(
            'url' => $url,
            'args' => array(
                'timeout' => $timeout,
                'headers' => array(
                    'Content-Type' => 'application/json',
                    'Authorization' => "Bearer $api_key",
                    'HTTP-Referer' => get_site_url(),
                    'X-Title' => 'NLWeb WordPress Plugin',
                ),
                'body' => wp_json_encode( array(
                    'model' => $model,
                    'messages' => array(
                        array( 'role' => 'system', 'content' => $system_msg ),
                        array( 'role' => 'user', 'content' => $prompt ),
                    ),
                    'response_format' => array( 'type' => 'json_object' ),
                    'max_tokens' => 512,
                    'temperature' => 0,
                ) ),
            ),
        );

    case 'anthropic':
        $url = $endpoint ?: 'https://api.anthropic.com/v1/messages';
        return array(
            'url' => $url,
            'args' => array(
                'timeout' => $timeout,
                'headers' => array(
                    'Content-Type' => 'application/json',
                    'x-api-key' => $api_key,
                    'anthropic-version' => '2023-06-01',
                ),
                'body' => wp_json_encode( array(
                    'model' => $model,
                    'max_tokens' => 512,
                    'system' => $system_msg,
                    'messages' => array(
                        array( 'role' => 'user', 'content' => $prompt ),
                    ),
                ) ),
            ),
        );

    case 'gemini':
        $url = $endpoint
            ?: "https://generativelanguage.googleapis.com/v1beta/models/{$model}:generateContent?key={$api_key}";
        return array(
            'url' => $url,
            'args' => array(
                'timeout' => $timeout,
                'headers' => array(
                    'Content-Type' => 'application/json',
                ),
                'body' => wp_json_encode( array(
                    'contents' => array(
                        array(
                            'parts' => array(
                                array( 'text' => $system_msg . "\n\n" . $prompt ),
                            ),
                        ),
                    ),
                    'generationConfig' => array(
                        'responseMimeType' => 'application/json',
                        'maxOutputTokens' => 512,
                        'temperature' => 0,
                    ),
                ) ),
            ),
        );

    default:
        return null;
}
```
The LLM wrapper doesn't handle the 'openai' provider case but the switch statement includes 'openrouter'. Based on the settings page (line 206 showing 'OpenAI' label for the 'openai' key), there's a mismatch. The build_request method should either support 'openai' directly or the settings should use 'openrouter' as the key. Currently, selecting OpenAI in settings will return null from build_request and fail silently.
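One way to resolve the mismatch is to add an 'openai' branch next to 'openrouter'. The sketch below is a standalone function (the name `nlweb_build_openai_request` and its signature are hypothetical) that mirrors the OpenRouter branch's `['url' => ..., 'args' => ...]` shape and targets OpenAI's Chat Completions endpoint; it uses plain `json_encode()` so it runs outside WordPress. Some newer OpenAI models expect `max_completion_tokens` instead of `max_tokens`, so the token parameter may need adjusting per model.

```php
<?php
// Hypothetical 'openai' counterpart to the 'openrouter' branch of build_request().
function nlweb_build_openai_request(
    string $api_key,
    string $model,
    string $system_msg,
    string $prompt,
    int $timeout = 30,
    string $endpoint = ''
): array {
    return array(
        'url'  => $endpoint !== '' ? $endpoint : 'https://api.openai.com/v1/chat/completions',
        'args' => array(
            'timeout' => $timeout,
            'headers' => array(
                'Content-Type'  => 'application/json',
                'Authorization' => "Bearer $api_key",
            ),
            // json_encode() stands in for wp_json_encode() so this runs outside WordPress.
            'body' => json_encode(array(
                'model'           => $model,
                'messages'        => array(
                    array('role' => 'system', 'content' => $system_msg),
                    array('role' => 'user', 'content' => $prompt),
                ),
                'response_format' => array('type' => 'json_object'),
                'max_tokens'      => 512,
                'temperature'     => 0,
            )),
        ),
    );
}
```

The existing switch could then gain a `case 'openai':` returning this array; alternatively, if only OpenRouter should be offered, renaming the settings key to 'openrouter' avoids the new branch.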
```php
private static function build_prompt( $query ) {
    return <<<PROMPT
You are helping to rewrite a complex search query into simpler keyword queries for a traditional keyword-based search engine.
The search engine works best with short, focused queries containing important keywords.

Take the following query and break it down into up to 5 simpler search queries.
Each query should:
- Contain no more than 3 words
- Focus on the most important keywords and concepts
- Be diverse to cover different aspects of the original query
- Use only essential nouns, adjectives, or product terms
- Avoid common words like "for", "the", "some", "are", "that", "would", "be"

For example:
- "what are some options for plates that would be appropriate for serving vegetables" → ["vegetable plates", "serving plates", "dinner plates", "salad plates", "ceramic plates"]
- "looking for a tea pot that can brew green tea" → ["tea pot", "green tea", "teapot ceramic", "japanese teapot", "brewing pot"]
- "movies about alien invasions in the 1950s" → ["alien invasion", "1950s scifi", "alien movies", "invasion films", "scifi 1950s"]

The original query is: {$query}
PROMPT;
```
The prompt limits each rewritten query to "no more than 3 words". The examples all follow this rule (every example query has two words), but for more complex queries a 3-word limit may be too restrictive. Consider allowing 3-5 words for more nuanced queries.
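Whatever limit the prompt states, the model may not follow it exactly, so it can help to enforce it after parsing. A minimal sketch, assuming the LLM response has already been decoded into an array of candidate strings (the function name is hypothetical): it keeps up to `$max_queries` non-empty, case-insensitively distinct queries of at most `$max_words` words, and falls back to the original query if nothing survives. Exposing `$max_words` as a parameter also turns the 3-versus-5-word question into a setting.

```php
<?php
// Hypothetical post-processing step for the fanout queries returned by the LLM.
function nlweb_normalize_fanout(array $candidates, string $original, int $max_words = 3, int $max_queries = 5): array
{
    $queries = array();
    $seen    = array();

    foreach ($candidates as $candidate) {
        if (!is_string($candidate)) {
            continue; // ignore numbers, nested arrays, nulls
        }
        $words = preg_split('/\s+/', trim($candidate), -1, PREG_SPLIT_NO_EMPTY);
        if (!$words || count($words) > $max_words) {
            continue; // empty, or longer than the configured limit
        }
        $normalized = implode(' ', $words);
        $key        = strtolower($normalized);
        if (isset($seen[$key])) {
            continue; // case-insensitive duplicate
        }
        $seen[$key] = true;
        $queries[]  = $normalized;
        if (count($queries) >= $max_queries) {
            break;
        }
    }

    // Never return an empty fanout: search with the original query instead.
    return $queries ?: array($original);
}
```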
```php
$schema = array(
    'requires_decontextualization' => 'True or False',
    'decontextualized_query' => 'The rewritten query, if decontextualization is required',
);

$response = NLWeb_LLM::ask( $prompt, $schema, 'low' );

if ( empty( $response ) || ! isset( $response['requires_decontextualization'] ) ) {
    // LLM failed — fall back to the original query.
    return array(
        'query' => $raw_query,
        'requires_decontextualization' => false,
    );
}

$needs = ( 'True' === $response['requires_decontextualization'] || true === $response['requires_decontextualization'] );
```
The response parsing accepts both the string 'True' and the boolean true (line 65). This dual handling is good for robustness, but the schema definition (line 51) should state the expected format explicitly. Consider instructing the LLM to return JSON booleans (true/false without quotes) instead of strings, for better type safety.
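A minimal sketch of a normalizer that covers both forms, assuming a hypothetical helper name. It relies on PHP's `filter_var()` with `FILTER_VALIDATE_BOOLEAN` and `FILTER_NULL_ON_FAILURE`, so "True", "true", "1", "yes", and "on" map to true, their negatives map to false, and anything else becomes null rather than being silently treated as false.

```php
<?php
// Hypothetical helper: normalizes the LLM's 'requires_decontextualization'
// field, whether it arrives as a JSON boolean or as a string like "True".
function nlweb_to_bool($value): ?bool
{
    if (is_bool($value)) {
        return $value; // already a JSON boolean
    }
    if (!is_scalar($value)) {
        return null; // null, arrays, objects: no usable answer
    }
    // Case-insensitive "true"/"false", "1"/"0", "yes"/"no", "on"/"off";
    // any other value yields null because of FILTER_NULL_ON_FAILURE.
    return filter_var($value, FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
}
```

The check could then read `$needs = nlweb_to_bool( $response['requires_decontextualization'] ?? null ) === true;`, with the schema text changed to something like 'true or false (a JSON boolean, not a string)'.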
```php
    return $json_str; // not valid JSON — send as-is
}

// If it's a list, take first element.
```
Comment wording: "If it's a list, take first element" would be clearer as "If it's an array, take the first element", for consistency with PHP terminology.
```diff
-// If it's a list, take first element.
+// If it's an array, take the first element.
```
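For context, a minimal standalone sketch of the unwrapping step these comments sit in, assuming the function receives the raw JSON string from the LLM. The name `nlweb_unwrap_json` and its return contract are hypothetical and may differ from the plugin's actual function: invalid JSON is returned unchanged, and a top-level JSON array (a sequential PHP array) is reduced to its first element.

```php
<?php
// Hypothetical sketch: decode an LLM's JSON reply and unwrap a top-level array.
function nlweb_unwrap_json(string $json_str)
{
    $decoded = json_decode($json_str, true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        return $json_str; // not valid JSON: send as-is
    }

    // If it's an array (a sequential list, not a JSON object), take the first element.
    if (is_array($decoded) && $decoded !== array()
        && array_keys($decoded) === range(0, count($decoded) - 1)) {
        return $decoded[0];
    }
    return $decoded;
}
```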
```php
if ( ! defined( 'ABSPATH' ) ) {
    exit;
}
```
Inconsistent indentation: this file uses tabs while all other PHP files in the plugin use spaces. WordPress coding standards prefer tabs for PHP files, so this file follows the standard, but the plugin as a whole is inconsistent. For consistency, either convert all files to tabs or convert this file to spaces to match the rest of the plugin.
```css
/* Hide WordPress page title on pages with NLWeb search */
.page .wp-block-post-title {
    display: none !important;
}
```
The CSS selector '.page .wp-block-post-title' globally hides all page titles on pages, which is too aggressive. This will hide page titles site-wide even on pages that don't use the NLWeb search. Instead, add a specific class to the shortcode container's parent and only hide titles within that context, or let users control this via their theme.
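A minimal sketch of the scoped approach, assuming a hypothetical `nlweb-has-search` body class: a pure function adds the class only when the post content contains the `[nlweb_search]` shortcode. In WordPress it could be attached to the `body_class` filter (using `has_shortcode()` instead of the regex), and the stylesheet rule would become `.nlweb-has-search .wp-block-post-title { display: none; }`.

```php
<?php
// Hypothetical helper: add a marker class only on pages that embed the shortcode,
// so the title-hiding CSS can be scoped to those pages.
function nlweb_body_classes(array $classes, string $post_content): array
{
    // Match "[nlweb_search]" or "[nlweb_search attr=...]" but not "[nlweb_search_extra]".
    if (preg_match('/\[nlweb_search[\s\]\/]/', $post_content) === 1) {
        $classes[] = 'nlweb-has-search';
    }
    return $classes;
}
```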
```php
        return $graph;
    }
} catch ( \Exception $e ) {
    // Yoast internals changed — fall through.
```
Missing error handling: The Yoast schema extraction wraps the code in a try-catch but returns null on any exception. This silently swallows errors and makes debugging difficult. Consider logging the error before returning null so administrators can identify when Yoast integration fails.
```diff
-    // Yoast internals changed — fall through.
+    error_log(
+        sprintf(
+            'NLWeb_Retriever: Yoast schema extraction failed for post ID %d: %s',
+            isset( $post->ID ) ? $post->ID : 0,
+            $e->getMessage()
+        )
+    );
+    // Yoast internals changed or threw an error — fall through to other schema sources.
```
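One caveat the suggested `error_log()` call does not address: in PHP 7 and later, calling a method that Yoast has removed throws `\Error`, which `catch ( \Exception $e )` does not catch; `\Throwable` covers both. A minimal sketch with a hypothetical helper name that formats the same message and only writes it when logging is enabled (for example, gated on `WP_DEBUG`):

```php
<?php
// Hypothetical helper: formats (and optionally logs) a Yoast extraction failure.
function nlweb_log_yoast_failure(int $post_id, Throwable $e, bool $enabled): string
{
    $message = sprintf(
        'NLWeb_Retriever: Yoast schema extraction failed for post ID %d: %s',
        $post_id,
        $e->getMessage()
    );
    if ($enabled) {
        error_log($message); // gate on WP_DEBUG to avoid noisy production logs
    }
    return $message;
}

// Catching Throwable also covers \Error, e.g. calls to methods that no longer exist.
try {
    (new ArrayObject())->methodThatDoesNotExist();
} catch (Throwable $e) {
    $logged = nlweb_log_yoast_failure(42, $e, false);
}
```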
…ation in `postMessage` handler Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
@copilot open a new pull request to apply changes based on the comments in this thread
Summary
This PR adds a WordPress plugin implementation of the NLWeb v0.55 protocol, enabling WordPress sites to provide conversational search powered by LLMs.
Features
Core Functionality
Integrations
User Interface
Plugin Structure
Configuration
Usage
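1. Install plugin in WordPress `wp-content/plugins/` directory
2. Activate plugin
3. Configure API keys and models in admin settings
4. Add `[nlweb_search]` shortcode to any page

Testing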
Tested with:
Files Not Included
This PR intentionally excludes:
- `code/wordpress/docker-compose.yml` (test environment setup)
- `code/wordpress/setup.sh` (test environment setup)

Related Documentation