- 
							
					Scoped Vector Search with the MyVector Plugin for MySQL – Part II
										
 
				Subtitle: Schema design, embedding workflows, hybrid search, and performance tradeoffs explained.
Quick Recap from Part 1
In Part 1, we introduced the MyVector plugin — a native extension that brings vector embeddings and HNSW-based approximate nearest neighbor (ANN) search into MySQL. We covered how MyVector supports scoped queries (e.g., WHERE user_id = X) to ensure that semantic search remains relevant, performant, and secure in real-world multi-tenant applications.
Now in Part 2, we move from concept to implementation:
How to store and index embeddings
How to design embedding workflows
How hybrid (vector + keyword) search works
How HNSW compares to brute-force search
How to tune for performance at scale
1. Schema Design for Vector Search
The first step is designing tables that support both structured and semantic data.
A typical schema looks like:
CREATE TABLE documents (
    id BIGINT PRIMARY KEY,
    user_id INT NOT NULL,
    title TEXT,
    body TEXT,
    embedding VECTOR(384),
    INDEX(embedding) VECTOR
);
Design tips:
Use VECTOR(n) to store dense embeddings (e.g., 384-dim for MiniLM).
Always combine vector queries with SQL filtering (WHERE user_id = …, category = …) to scope the search space.
Use TEXT or JSON fields for hybrid or metadata-driven filtering.
Consider separating raw text from embedding storage for cleaner pipelines.
2. Embedding Pipelines: Where and When to Embed
MyVector doesn’t generate embeddings — it stores and indexes them. You’ll need to decide how embeddings are generated and updated:
a. Offline (batch) embedding
Run scheduled jobs (e.g., nightly) to embed new rows.
Suitable for static content (documents, articles).
Can be run using Python + HuggingFace, OpenAI, etc.
# Python example
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode(["Your text goes here"])
b. Write-time embedding
Embed text when inserted via your application.
Ensures embeddings are available immediately.
Good for chat apps, support tickets, and notes.
c. Query-time embedding
Used for user search input only.
Transforms search terms into vectors (not stored).
Passed into queries like:
ORDER BY L2_DISTANCE(embedding, '[query_vector]') ASC
3. Hybrid Search: Combine Text and Semantics
Most real-world search stacks benefit from combining keyword and vector search. MyVector enables this inside a single query:
SELECT id, title
FROM documents
WHERE MATCH(title, body) AGAINST('project deadline')
  AND user_id = 42
ORDER BY L2_DISTANCE(embedding, EMBED('deadline next week')) ASC
LIMIT 5;
This lets you:
Narrow results using lexical filters
Re-rank them semantically
All in MySQL — no sync to external vector DBs
This hybrid model is ideal for support systems, chatbots, documentation search, and QA systems.
4. Brute-Force vs. HNSW Indexing in MyVector
When it comes to similarity search, how you search impacts how fast you scale.
Brute-force search
Compares the query against every row
Guarantees exact results (100% recall)
Simple but slow for >10K rows
SELECT id
FROM documents
ORDER BY COSINE_DISTANCE(embedding, '[query_vector]') ASC
LIMIT 5;
HNSW: Hierarchical Navigable Small World
Graph-based ANN algorithm used by MyVector
Fast and memory-efficient
High recall (~90–99%) with tunable parameters (ef_search, M)
CREATE INDEX idx_vec ON documents(embedding) VECTOR
  COMMENT='{"HNSW_M": 32, "HNSW_EF_CONSTRUCTION": 200}';
Comparison
FeatureBrute ForceHNSW (MyVector)Recall 100% ~90–99%Latency (1M rows) 100–800ms+ ~5–20msIndexing None RequiredFiltering Support Yes YesIdeal Use CaseSmall datasetsProduction search
5. Scoped Search as a Security Boundary
Because MyVector supports native SQL filtering, you can enforce access boundaries without separate vector security layers.
Patterns:
WHERE user_id = ? → personal search
WHERE org_id = ? → tenant isolation
Use views or stored procedures to enforce access policies
You don’t need to bolt access control onto your search engine — MySQL already knows your users.
6. HNSW Tuning for Performance
MyVector lets you tune index behavior at build or runtime:
ParamPurposeEffectMGraph connectivityHigher = more accuracy + RAMef_searchTraversal breadth during queriesHigher = better recall, more latencyef_constructionIndex quality at build timeAffects accuracy and build cost
Example:
ALTER INDEX idx_vec SET HNSW_M = 32, HNSW_EF_SEARCH = 100;
You can also control ef_search per session or per query soon (planned feature).
TL;DR: Production Patterns with MyVector
Use VECTOR(n) columns and HNSW indexing for fast ANN search
Embed externally using HuggingFace, OpenAI, Cohere, etc.
Combine text filtering + vector ranking for hybrid search
Use SQL filtering to scope vector search for performance and privacy
Tune ef_search and M to control latency vs. accuracy
Coming Up in Part 3
In Part 3, we’ll explore real-world implementations:
Semantic search 
Real-time document recall
Chat message memory + re-ranking
Integrating MyVector into RAG and AI workflows
We’ll also show query plans and explain fallbacks when HNSW is disabled or brute-force is needed.				 
				 
						 
					- 
							
					MySQL Basics: Trust the Process—Mastering Transactions Step by Step
										
 
				Discover how to use transactions in MySQL to keep your data safe and organized—just like a diligent librarian! This beginner-friendly post explains what transactions are, why they matter, and how to use them with easy library metaphors, clear SQL examples, and essential tips for database reliability.				 
				 
						 
					- 
							
					Practical Data Masking in Percona Server for MySQL 8.4
										
 
				Data masking lets you hide sensitive fields (emails, credit-card numbers, job titles, etc.) while keeping data realistic for reporting, support, or testing. It is particularly useful when you collaborate with external entities and need to share your data for development reasons. You also need to protect your data and keep your customers’ privacy safe. Last […]				 
				 
						 
					- 
							
					MySQL 9.5 – Thank you for your contributions!
										
 
				MySQL 9.5 has been released, here is a list of the contributions we included in it. Thank you to all contributors!				 
				 
						 
					- 
							
					MySQL 9.5 – Thank you for your contributions!
										
 
				On October 10th, 2025, we released MySQL 9.5, the latest Innovation Release. As usual, we released bug fixes for 8.0 and 8.4 LTS, but this post focuses on the newest release. In this release, we can see contributions related to Connector J and Connector Net, as well as to different server categories. Connector / J […]				 
				 
						 
				 
	 |