docs: add semantic and hybrid search specification for 2.0.0 #31

Open
adityamparikh wants to merge 70 commits into sb4 from docs/semantic-hybrid-search-spec-sb4

@adityamparikh
Owner

Summary

  • Adds design specification for semantic and hybrid search capabilities targeting the 2.0.0 release (Spring Boot 4 / Spring AI 2.0 stack)
  • Defines SolrVectorStore (Spring AI VectorStore implementation), three new MCP tools (semantic-search, hybrid-search, index-with-embeddings), and supporting components (RrfMerger, EmbeddingService, VectorStoreFactory)
  • Based on patterns from the ai-powered-search reference implementation
  • Supersedes feat: GraalVM native image support #53 (which targeted main/1.x); this PR targets sb4 for inclusion in 2.0.0
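
The spec names an RrfMerger, but its details are not shown in this thread. Assuming the name refers to Reciprocal Rank Fusion, the following is a minimal sketch of how keyword and vector result lists might be fused. The class name `RrfSketch`, the `merge` signature, and the constant `k = 60` used in the usage note are illustrative assumptions, not the spec's API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the RrfMerger defined in the spec.
public final class RrfSketch {

    private RrfSketch() {}

    /**
     * Fuses ranked lists of document ids. Each id scores the sum of
     * 1 / (k + rank) over the lists it appears in, with rank starting at 1.
     */
    public static List<String> merge(List<List<String>> rankedLists, int k) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (List<String> ranked : rankedLists) {
            for (int i = 0; i < ranked.size(); i++) {
                scores.merge(ranked.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        // Highest fused score first; List.sort is stable, so ties keep first-seen order.
        fused.sort(Comparator.comparingDouble((String id) -> scores.get(id)).reversed());
        return fused;
    }
}
```

For example, `RrfSketch.merge(List.of(keywordIds, vectorIds), 60)` would rank a document that appears near the top of both lists above one that appears in only one of them.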

Changes from #53

  • Updated target stack: Spring Boot 4.0.2, Spring AI 2.0.0-M2, SolrJ 10.0.0
  • Solr 10 is now the primary target (Solr 9.0+ still supported)
  • Updated starter naming convention: spring-ai-starter-openai (Spring AI 2.0 convention)
  • Integration tests now specify Solr 10 as the primary Testcontainers image

Test plan

  • Review spec for completeness and correctness against sb4 tech stack
  • Validate architecture against Spring AI 2.0.0-M2 APIs
  • Confirm Solr 10 KNN/DenseVectorField compatibility
  • Implementation will follow in a separate PR

🤖 Generated with Claude Code

adityamparikh and others added 30 commits February 4, 2026 13:32
…(HTTP mode)

Add comprehensive observability support for HTTP mode using OpenTelemetry
and the Grafana LGTM stack (Loki, Grafana, Tempo, Mimir).

Changes:
- Add spring-boot-starter-opentelemetry dependency
- Add OpenTelemetry logback appender for OTLP log export
- Configure OTLP endpoints in application-http.properties
- Add logback-spring.xml with profile-specific configuration
- Add OpenTelemetryAppenderInstaller component for HTTP profile
- Add lgtm service to compose.yaml for local observability stack
- Add Observability.md documentation guide
- Update README.md with observability feature and docs link

The observability stack provides:
- Distributed tracing via Tempo
- Metrics via Mimir (Prometheus-compatible)
- Log aggregation via Loki
- Grafana dashboards at http://localhost:3000

STDIO mode remains unaffected - no telemetry is enabled to prevent
stdout pollution that would interfere with MCP protocol communication.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add org.springframework.boot.ignore label to prevent conflict between
Docker Compose auto-configuration and manual OTLP endpoint settings.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove conflicting manual OTLP endpoint defaults
- Let Spring Boot Docker Compose auto-detect grafana/otel-lgtm container
- Use spring.* namespace instead of management.* for OTLP config
- Update docs to explain auto-configuration vs production setup

This matches the pattern used in github.com/danvega/ot

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fixes:
- Use management.* namespace for OTLP endpoints (per Spring Boot convention)
- Add explicit localhost defaults for OTLP endpoints
- Add org.springframework.boot.ignore label to lgtm container to prevent
  duplicate bean conflict with Docker Compose auto-configuration
- Cherry-pick security bypass feature to allow testing without OAuth2

The application now starts successfully with OpenTelemetry metrics,
traces, and logs configured for the local LGTM stack.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The spring-ai-spring-boot-docker-compose transitively brings in
spring-boot-starter-mongodb which causes MongoDB auto-configuration
to run even though no MongoDB is used in this project.

Exclude the MongoDB starter to prevent unwanted MongoDB client
creation and health check failures.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Force opentelemetry-proto to version 1.3.2-alpha which uses protobuf
3.23.4 instead of the default 1.8.0-alpha which uses protobuf 4.32.0.

The opentelemetry-proto 1.8.0-alpha has a known incompatibility with
protobuf 4.x causing NoSuchMethodError on getParentForChildren().

See: micrometer-metrics/micrometer#5658

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add spring-boot-starter-aspectj dependency for @Observed annotation
  support on service methods (required in Spring Boot 4)
- Add prometheus endpoint to actuator exposure for metrics scraping
- Enable observation annotations with management.observations.annotations.enabled
- Fix Logback pattern with default fallback to prevent empty pattern errors
- Configure stdio profile to use level=OFF instead of ERROR
- Add test profile configuration in logback-spring.xml
- Update Observability.md with LGTM stack documentation and screenshot
- Move Observability.md to dev-docs/ folder

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Spring Boot 4 uses Jackson 3 (tools.jackson) for databind/core but
retains Jackson 2 (com.fasterxml.jackson) for annotations. Update
ObjectMapper imports in CollectionService, SchemaService, JsonUtils,
and their tests to use tools.jackson.databind.ObjectMapper.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
adityamparikh and others added 26 commits March 5, 2026 22:12
On SolrJ 10 (Spring Boot 4 / Spring AI 2.0), CollectionAdminRequest
validates the collection name at construction time and throws
SolrException. Add an explicit isBlank() guard upfront so callers
receive a consistent IllegalArgumentException regardless of SolrJ
version.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
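
The blank-name guard described in the commit above could look roughly like this. It is a sketch only: the class name `CollectionNameGuard`, the method name, and the error message are assumptions, and the actual change may place the check inline in the service method.

```java
// Hypothetical helper; the names and message are assumptions, not the PR's code.
public final class CollectionNameGuard {

    private CollectionNameGuard() {}

    /**
     * Rejects null or blank names before any SolrJ request object is built,
     * so callers always see IllegalArgumentException for this case.
     */
    public static String requireValidName(String name) {
        if (name == null || name.isBlank()) {
            throw new IllegalArgumentException("Collection name must not be blank");
        }
        return name;
    }
}
```

Checking before constructing the request keeps the exception type independent of which SolrJ version happens to validate the name.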
Extract JsonResponseParser instantiation into a dedicated @Bean method
so it can be injected as a dependency into solrClient(), making the
wiring explicit and enabling overriding in tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
- Add create-collection to the "What's inside" feature list
- Split "Available MCP tools" into Search/Indexing/Collections/Schema
  sections with correct kebab-case tool names
- List all three indexing tools separately (json/csv/xml)
- Replace stale camelCase names with actual @McpTool(name=...) values

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
Add @PreAuthorize("isAuthenticated()") to createCollection(), consistent
with the existing checks on indexJsonDocuments, indexCsvDocuments,
indexXmlDocuments, and search. Enforced when the http profile is active
and spring.security.enabled=true (via MethodSecurityConfiguration).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
- Rename configuration property from `spring.security.enabled` to
  `http.security.enabled` to avoid collision with Spring Boot's own
  `spring.security.*` namespace
- Rename corresponding environment variable from `SECURITY_ENABLED`
  to `HTTP_SECURITY_ENABLED` for consistency
- Update application-http.properties, HttpSecurityConfiguration,
  MethodSecurityConfiguration, and keycloak.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
Replace the static new ObjectMapper() with Spring's auto-configured
ObjectMapper bean injected via constructor. Use MediaType.APPLICATION_JSON_VALUE
for the content type constant instead of a raw string literal.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
Replace tools.jackson.databind.ObjectMapper with the more specific
tools.jackson.databind.json.JsonMapper — the concrete JSON format mapper
provided by Spring Boot 4's Jackson auto-configuration — in JsonUtils,
CollectionService, SchemaService, and their corresponding tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
…'s ObjectMapper

Extract URL normalization tests from SolrConfigTest into a dedicated
SolrConfigUrlNormalizationTest annotated with @JsonTest, so Spring's
auto-configured ObjectMapper is injected rather than using new ObjectMapper().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
…'s JsonMapper

Extract URL normalization tests from SolrConfigTest into a dedicated
SolrConfigUrlNormalizationTest annotated with @JsonTest, so Spring's
auto-configured JsonMapper is available rather than needing the full
Testcontainers-backed SpringBootTest context.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
Commit ee13711 accidentally re-added stale files that had already been
moved to dedicated packages in prior refactoring commits:

- Delete config/McpServerConfiguration.java and config/MethodSecurityConfiguration.java
  (moved to security/ package in cd398b0; duplicates caused ConflictingBeanDefinitionException)
- Delete metadata/CollectionService.java, metadata/CollectionUtils.java, metadata/Dtos.java
  (moved to collection/ package; duplicates caused ConflictingBeanDefinitionException)
- Delete metadata/CollectionServiceTest.java, metadata/CollectionServiceIntegrationTest.java,
  metadata/CollectionUtilsTest.java (duplicates of collection/ test package)
- Update Main.java, MainTest.java, McpToolRegistrationTest.java to import from
  collection.CollectionService instead of metadata.CollectionService
- Restore http.security.enabled=${HTTP_SECURITY_ENABLED:false} in application-http.properties
  (ee13711 reverted the rename from spring.security.enabled)
- Hardcode logback-spring.xml CONSOLE appender pattern to fix PatternLayout("") error
  in Spring Boot 4.0.2 where CONSOLE_LOG_PATTERN resolves to empty string

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
… Solr 10

SolrInfoMBeanHandler (and thus the /admin/mbeans endpoint) was removed in
Solr 10. When getCacheMetrics() or getHandlerMetrics() call this endpoint on
a Solr 10 server, SolrJ throws RemoteSolrException (a RuntimeException) because
the server returns an HTML 404 page instead of JSON.

Widen the catch in both methods to include RuntimeException so the server
degrades gracefully (returning null for cache/handler stats) rather than
propagating the exception. The integration tests already handle null stats,
so all tests now pass with solr:10-slim.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
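
The widened catch described in the commit above can be illustrated without SolrJ on the classpath. In this sketch, `GracefulStats`, `StatsCall`, and `fetchOrNull` are hypothetical names, and the set of checked exceptions the real methods catch is assumed rather than taken from the source.

```java
import java.io.IOException;

// Hypothetical sketch; not the project's CollectionService code.
public final class GracefulStats {

    /** A stats lookup that may fail with a checked I/O error. */
    @FunctionalInterface
    public interface StatsCall<T> {
        T fetch() throws IOException;
    }

    private GracefulStats() {}

    /** Returns the fetched stats, or null when the lookup fails. */
    public static <T> T fetchOrNull(StatsCall<T> call) {
        try {
            return call.fetch();
        } catch (IOException | RuntimeException e) {
            // A removed endpoint surfaces as an unchecked exception; degrade to "no stats".
            return null;
        }
    }
}
```

Callers then treat a null result as "stats unavailable", which matches the null handling the commit says the integration tests already have.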
Add Solr 9.10 and 10 to the CI compatibility matrix, running integration
tests against all supported versions (8.11, 9.4, 9.9, 9.10, 10) on every
PR and push to main.

Also update AGENTS.md to document Solr 10 compatibility status: the
/admin/mbeans endpoint removal is handled gracefully, and all other
functionality is verified working with solr:10-slim.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
- Bump solr version to 10.0.0 in libs.versions.toml
- Remove Jetty BOM alignment (Solr 10 uses Jetty 12; no longer needed)
- Remove Apache HttpComponents exclusion (SolrJ 10 no longer uses it)
- Replace Http2SolrClient with HttpJdkSolrClient (new JDK HTTP client)
- Move SolrQuery import: solrj → solrj.request package
- Move ResponseParser import: solrj → solrj.response package
- Adapt ResponseParser: getContentType() → getContentTypes() returning
  Collection<String>; remove processResponse(Reader) (no longer abstract)
- Fix CoreAdminResponse.getCoreStatus(): now returns Map<String,
  SingleCoreData> instead of NamedList — update listCollections()
  and CollectionServiceTest accordingly

All unit and Testcontainers integration tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
Incorporates:
- feat(config): switch Solr wire format from JavaBin to JSON (#55)
- fix(collection): catch RuntimeException from removed /admin/mbeans in Solr 10 (#59)
- feat(ci): add Solr 9.10 and 10 compatibility testing (#59)
- feat(deps): upgrade solr-solrj from 9.9.0 to 10.0.0 (#58)

Adapted JsonResponseParser for Jackson 3 (tools.jackson.databind).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
Consolidate security-related documentation into a dedicated
security-docs directory for better organization.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
Replace the deprecated /admin/mbeans endpoint with the /admin/metrics API
(available since Solr 7.1) for cache and handler metrics collection.
This ensures compatibility with both Solr 9 and 10, since /admin/mbeans
was removed in Solr 10.

Key changes:
- CollectionService: remove all MBeans code, use Metrics API for cache
  and handler stats with proper LinkedHashMap/NamedList handling
- Add SLF4J logging to all silent catch blocks (visible only in HTTP
  profile via logback-spring.xml)
- Integration tests: strong assertions for cache and handler metrics
  against real Solr via Testcontainers
- Unit tests: rewritten with Metrics API response format mocks
- CLAUDE.md: update Solr 10 compatibility docs

Signed-off-by: Aditya Parikh <adityaparikh@users.noreply.github.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
The Apache Solr community does not use @version and @since tags in
javadocs. Remove all occurrences of @version 1.0.0 and @since 1.0.0
from 13 source files.

Signed-off-by: Aditya Parikh <adityamparikh@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
Fill the constitution with project principles and update the
Technology Stack section to reflect the sb4 branch upgrades:

- Spring Boot 4.0.2 (up from 3.5.8)
- Spring AI 2.0.0-M2 (up from 1.1.2)
- SolrJ 10.0.0 (up from 9.x)
- Testcontainers 2.0.2
- OpenTelemetry observability stack (OTel starter, Micrometer OTLP,
  Logback appender)
- Error Prone with NullAway for static analysis

Core principles remain the same as the main branch constitution:
MCP Protocol Integrity, Solr Version Compatibility, Test-First
Development, Security by Default, and Simplicity/YAGNI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
docs: initialize Spec Kit constitution for sb4 tech stack
Enable Java virtual threads globally via spring.threads.virtual.enabled=true.
Virtual threads are lightweight and improve concurrency for HTTP mode where
multiple MCP clients can connect and each request blocks on Solr I/O.

Signed-off-by: Aditya Parikh <adityamparikh@gmail.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
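
To illustrate why virtual threads suit requests that block on I/O, here is a small standalone sketch using the JDK's virtual-thread executor (Java 21 or later). It does not use Spring; `VirtualThreadSketch` and `runBlockingRequests` are hypothetical names, and `Thread.sleep` stands in for a blocking Solr call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; the project enables virtual threads via the Spring property instead.
public final class VirtualThreadSketch {

    private VirtualThreadSketch() {}

    /**
     * Runs the given number of blocking tasks, one virtual thread per task,
     * and returns how many of them ran on a virtual thread.
     */
    public static int runBlockingRequests(int requests, long blockMillis) throws Exception {
        List<Future<Boolean>> results = new ArrayList<>();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < requests; i++) {
                results.add(executor.submit(() -> {
                    Thread.sleep(blockMillis); // stands in for a blocking Solr call
                    return Thread.currentThread().isVirtual();
                }));
            }
        } // close() waits for all submitted tasks to finish
        int onVirtualThreads = 0;
        for (Future<Boolean> result : results) {
            if (result.get()) {
                onVirtualThreads++;
            }
        }
        return onVirtualThreads;
    }
}
```

Because each task parks its virtual thread while it waits, many concurrent blocking requests can be in flight without a correspondingly large pool of platform threads.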
- S1118: Add private constructor to CollectionUtils utility class
- S1488: Inline shardMatch variable in CollectionService.validateCollectionExists
- S7467: Replace unused catch variable 'e' with '_' in 14 locations

Signed-off-by: Aditya Parikh <aditya.m.parikh@gmail.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>
Adds design specification for semantic and hybrid search capabilities
targeting the 2.0.0 release on the Spring Boot 4 / Spring AI 2.0 stack.

Defines SolrVectorStore (Spring AI VectorStore implementation), three
new MCP tools (semantic-search, hybrid-search, index-with-embeddings),
and supporting components (RrfMerger, EmbeddingService, VectorStoreFactory).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: adityamparikh <aditya.m.parikh@gmail.com>

sonarqubecloud Bot commented Mar 9, 2026

adityamparikh added a commit that referenced this pull request Mar 10, 2026
…d their schema (#31)

* feat: add MCP Resources for collections and schema

Add @McpResource and @McpComplete annotations from spring-ai-mcp-annotations
library to expose Solr metadata as MCP Resources:

- solr://collections: Lists all available Solr collections
- solr://{collection}/schema: Returns schema definition for a collection

The schema resource supports autocompletion for the {collection} parameter
using @McpComplete, allowing MCP clients to discover available collections.

Also adds JsonUtils utility class for consistent JSON serialization in
MCP Resource responses.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: add MCP Inspector screenshots for resources

Add screenshots showing:
- MCP Inspector listing available resources
- Resource autocompletion for collection names

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* chore: remove useless comments

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>