Graph Index since v0.5.0
VectorChord's index type vchordg is a disk-based graph index. It has low memory consumption.
Let's start by creating a table named items with an embedding column of type vector(n), and then populate it with sample data.
To create a vchordg index, you can use the following SQL.
CREATE INDEX ON items USING vchordg (embedding vector_l2_ops);TIP
This feature is in preview.
Tuning
When building an index, two options usually need tuning: m and ef_construction. m is the maximum number of neighbors per vertex, and ef_construction is the size of the dynamic list containing the nearest neighbors during insertion.
In search, you need to tune ef_search. ef_search is the size of the dynamic list containing the nearest neighbors during search.
CREATE INDEX ON items USING vchordg (embedding vector_l2_ops) WITH (options = $$
m = 64
ef_construction = 128
$$);
SET vchordg.ef_search TO '128';
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 10;As a disk-based index, vchordg usually requires only the quantized vectors to be kept in the buffer pool to maintain performance. By default, vchordg quantizes a
CREATE INDEX ON items USING vchordg (embedding vector_l2_ops) WITH (options = $$
bits = 1
m = 64
ef_construction = 128
$$);The index building can be sped up by using multiple processes. Refer to PostgreSQL Tuning.
Reference
Operator Classes vchordg
The following table lists all available operator classes supported by vchordg.
| Operator Class | Description | Operator 1 | Operator 2 |
|---|---|---|---|
vector_l2_ops | index works for vector type and Euclidean distance | <->(vector,vector) | <<->>(vector,vector) |
vector_ip_ops | index works for vector type and negative inner product | <#>(vector,vector) | <<#>>(vector,vector) |
vector_cosine_ops | index works for vector type and cosine distance | <=>(vector,vector) | <<=>>(vector,vector) |
halfvec_l2_ops | index works for halfvec type and Euclidean distance | <->(halfvec,halfvec) | <<->>(halfvec,halfvec) |
halfvec_ip_ops | index works for halfvec type and negative inner product | <#>(halfvec,halfvec) | <<#>>(halfvec,halfvec) |
halfvec_cosine_ops | index works for halfvec type and cosine distance | <=>(halfvec,halfvec) | <<=>>(halfvec,halfvec) |
<<->>, <<#>>, <<=>> are operators defined by VectorChord.
For more information about <<->>, <<#>>, <<=>>, refer to Similarity Filter.
All operator classes are available since version 0.3.0.
Indexing Options vchordg
bits since v0.5.0
- Description: The ratio of bits to dimensions after RaBitQ quantization.
bits = 2provides better recall and QPS, whilebits = 1consumes less memory. - Type: integer
- Default:
2 - Example:
bits = 2means a-dimensional vector is quantized to bits. bits = 1means a-dimensional vector is quantized to bits .
m since v0.5.0
- Description: The maximum number of neighbors per vertex. The larger
mis, the better the performance, but the higher the storage requirement.mcorresponds toin HNSW and in DiskANN. - Type: integer
- Default:
32 - Example:
m = 32means that there are at mostneighbors per vertex. m = 64means that there are at mostneighbors per vertex.
ef_construction since v0.5.0
- Description: The size of the dynamic list containing the nearest neighbors during insertion. The larger
ef_constructionis, the better the performance, but the slower the insertion.ef_constructioncorresponds toin HNSW and in DiskANN. - Type: integer
- Default:
64 - Example:
ef_construction = 64means that the size of the dynamic list containing the nearest neighbors isduring insertion. ef_construction = 128means that the size of the dynamic list containing the nearest neighbors isduring insertion.
alpha since v0.5.0
- Description: The
alphavalues selected during pruning.alphacorresponds toin DiskANN. This option must be an ascending list, where the first element is 1.0and the last element is less than2.0. - Type: list of floats
- Default:
[1.0, 1.2] - Example:
alpha = [1.0, 1.2]is equivalent to settingalpha = 1.2in DiskANN.alpha = [1.0]is equivalent to the default pruning strategy in HNSW.
- Note: this option is ineffective when the distance metric is negative inner product.
beam_construction since v0.5.0
- Description: Beam width used during insertion. Beam width refers to the number of vertices accessed at once. Since the time to randomly read a small number of sectors from the SSD is almost the same as reading a single sector, a larger beam width effectively reduces the number of round trips to the SSD, resulting in better performance. Since it increases computation, it is disadvantageous for indexes that fit entirely in memory.
- Type: integer
- Default:
1 - Example:
beam_construction = 8means that the index accesses 8 vertices at once during insertion.beam_construction = 1means that the index accesses 1 vertex at once during insertion.
Search Parameters vchordg
vchordg.enable_scan since v0.5.0
- Description: Enables or disables the query planner's use of
vchordgindex scan. It's for testing purposes. - Type: boolean
- Default:
on - Example:
vchordg.enable_scan = offdisables the query planner's use ofvchordgindex scan.vchordg.enable_scan = onenables the query planner's use ofvchordgindex scan.
vchordg.ef_search since v0.5.0
- Description: The size of the dynamic list containing the nearest neighbors in search. The larger
vchordg.ef_searchis, the better the recall, but the worse the QPS.vchordg.ef_searchcorresponds toin HNSW and DiskANN. - Type: integer
- Default:
64 - Domain:
[1, 65535] - Example:
SET vchordg.ef_search = 64indicates the size of the dynamic list containing the nearest neighbors isduring search. SET vchordg.ef_search = 128indicates the size of the dynamic list containing the nearest neighbors isduring search.
vchordg.beam_search since v0.5.0
- Description: Beam widths used during search. Beam width refers to the number of vertices accessed at once. Since the time to randomly read a small number of sectors from the SSD is almost the same as reading a single sector, a larger beam width effectively reduces the number of round trips to the SSD, resulting in better performance. Since it increases computation, it is disadvantageous for indexes that fit entirely in memory.
- Type: integer
- Default:
1 - Domain:
[1, 65535] - Example:
SET vchordg.beam_search = 8indicates that the index accesses 8 vertices at once during search.SET vchordg.beam_search = 1indicates that the index accesses 1 vertex at once during search.