For the curious¶
A handful of short notes for readers who want to know how the demo actually fits together — none are required reading, but each one is the answer to a question that comes up when you start measuring.
Why SQLSpec¶
The data layer is SQLSpec — a SQL-first data access
layer that keeps the demo close to real SQL while still handing back typed
Python objects. It is deliberately not an ORM: queries live in named
.sql files, service code asks for them by name, and rows come back as
msgspec structs.
Five things make it the right fit once Oracle 26ai’s VECTOR type and the
/explore page are on the table:
Statement-aware query handling. SQLSpec normalizes named SQL and parameter binding before it reaches the driver. That’s how
:bindparameters, dialect quirks, and rewrites likeEXPLAINwork uniformly — instead of requiring per-driver string surgery inside service code.Adapter-neutral service shape. The demo only uses Oracle, but SQLSpec keeps the named-SQL service and Litestar plugin shape separate from the driver-specific connection details.
EXPLAIN PLANfirst-class. The same statement that runs the query can produce its plan — which is what/explorerenders for every vector search, so you can see the HNSW path without leaving the app.Native vector binding. A Python
list[float]is bound straight through to Oracle’sVECTOR(3072, FLOAT32)parameter. Noarray.array("f", ...)wrapping, no manual struct packing, no driver-specific shim leaking into service code.Typed result mapping.
schema_type=ProductMatchproduces msgspec structs row-by-row from the cursor, so handlers and tools work with real types instead ofdict[str, Any]. msgspec, Pydantic, and plain dataclasses all work the same way.
Named SQL files round it out: every query lives in src/app/db/sql/*.sql
under a stable name (e.g. vector-search-products). Service code asks
for db_manager.get_sql("vector-search-products") rather than embedding
multi-line strings — the SQL stays inspectable, version-controlled, and
EXPLAIN-friendly.
Together that’s why ProductService.search_by_vector fits on one screen
and reads like a function call: SQLSpec absorbs the binding,
plan-generation, and result-shaping that an Oracle-aware service would
otherwise have to spell out. See https://sqlspec.dev for the full
feature set, including the adapter list, migration runner, and Litestar
plugin used by this app.
Schema annotations in migrations¶
Oracle AI Database 26ai schema annotations are authored directly in the
baseline migration DDL. SQLSpec loads the migration as named SQL and executes
the migration body as a script, so new Oracle DDL such as ANNOTATIONS(...)
can remain inline with the CREATE TABLE, column definition, and supported
CREATE INDEX statements.
The app does not read annotations at runtime; they are demo metadata for
database inspection and tooling. After coffee upgrade, inspect them with
USER_ANNOTATIONS_USAGE alongside the existing COMMENT ON metadata.
The HNSW neighbor graph in the SGA¶
The product index is ORGANIZATION INMEMORY NEIGHBOR GRAPH. That name is
load-bearing: the index lives in Oracle’s vector memory pool inside the
SGA as a small-world graph of vectors and their nearest neighbors. A query
walks down through the graph from a sparse top layer toward the densest
bottom layer, refining the candidate set at each step.
flowchart TD
Q[query vector] -.-> L0
subgraph L2[layer 2 · sparse]
N0((·)) --- N1((·)) --- N2((·))
end
subgraph L1[layer 1]
M0((·)) --- M1((·)) --- M2((·)) --- M3((·))
end
subgraph L0[layer 0 · dense]
P0((·)) --- P1((·)) --- P2((·)) --- P3((·)) --- P4((·)) --- P5((·))
end
L2 -.-> L1
L1 -.-> L0
L0 --> R[top-k]
Two construction parameters set the shape:
NEIGHBORS=40— out-degree per node. Higher values build a denser graph, slower to construct, faster to search.EFCONSTRUCTION=500— candidate list size during build. Higher values mean better-quality neighbor selection (and a slower build).
WITH TARGET ACCURACY 95 lets Oracle pick the search-time ef that meets
the recall target dynamically.
The pool itself is configured by vector_memory_size. 512 MB is plenty for
the demo’s 130 committed product vectors plus query embeddings saved in
embedding_cache; budget roughly
rows × dimensions × 4 bytes × 1.4 (HNSW overhead) × 2 (safety) for larger
catalogs.
Deterministic vs ADK latency¶
Product, store, availability, and unsupported order turns are routed before the
ADK workflow is built, so they avoid speculative LLM deltas entirely. Product
RAG can include one bounded Gemini structured-output call for candidate
selection, but final product text is still rendered from Oracle rows. The ADK
workflow still matters for GENERAL_CONVERSATION: inside that fallback path,
the classifier and the LLM share START with max_concurrency=2. The timings
below are illustrative; the important shape is overlap, not the exact numbers.
gantt
dateFormat X
axisFormat %s ms
title Sequential vs parallel
section Sequential
Classifier :a1, 0, 180
LLM + tool :a2, 180, 1100
section Parallel
Classifier :b1, 0, 180
LLM + tool :b2, 0, 1100
The “LLM + tool” bar dominates because it covers the fallback agent’s reasoning, any closure-bound tool call, and final-event packaging. The classifier’s latency overlaps with that fallback path instead of adding to it.
The grounded PRODUCT_RAG route is stricter: it classifies first, runs the
vector search directly, validates any structured selection against the returned
product ids, and formats the final answer from product rows.
What the live dashboard measures¶
The chat UI surfaces per-message badges; /explore shows the same data
across recent searches. The fields map back to specific call sites:
Badge / metric |
What it measures |
Source |
|---|---|---|
|
Wall time for |
|
|
Round-trip for the named SQL |
|
|
Total time inside |
|
|
Rows returned by HNSW after threshold + |
|
|
Response cache hit (model + persona + normalized query). |
|
|
Hit on the Oracle-backed |
|
|
Product RAG answer path: structured selection, template fallback, timeout, rejected output, or model error. |
|
|
Time spent in the Product RAG structured-selection step or its fallback. |
|
|
Output of the Flash-Lite classifier, with product lookup fallback normalized to |
|
|
Per-phase timing collected during retrieval, used for the colored badges in the chat bubble. |
|
If oracle_ms spikes:
Check the EXPLAIN PLAN at
/explorestill mentionsVECTOR. A full table scan means the index is unavailable.Verify
vector_memory_sizeis non-zero —SELECT name, bytes FROM v$sgainfo WHERE name = 'Vector Memory'.Refresh table/index statistics if the catalog has just grown.
If embedding_ms spikes, the embedding cache is missing or full. coffee clear-cache rebuilds it on the next request.