pg_search Full Text Search
pg_search is an extension for SynxDB that provides powerful full-text search capabilities for database tables using the BM25 algorithm. It is built on Tantivy, a search engine library implemented in Rust, and integrates with SynxDB via the pgrx framework.
The main application scenarios and benefits of pg_search include:
High-performance full-text search: Provides efficient search capabilities for text data in the database, surpassing traditional
LIKEqueries.Seamless integration with the database: Runs as an extension and integrates seamlessly with existing databases and table structures without requiring data migration.
Rich query functionality: Supports complex boolean logic queries, phrase queries, and aggregate function pushdown, which can accelerate data analysis and report generation.
Enable the pg_search extension
Follow these steps to enable the pg_search extension:
Modify the database configuration: Add
pg_searchtoshared_preload_libraries. This is a mandatory step; otherwise, creating an index will cause the database connection to crash or hang.gpconfig -c shared_preload_libraries -v pg_search
After modifying the configuration, restart the database cluster.
Enable the
pg_searchextension: Connect to the database and execute the following SQL command to enable thepg_searchextension:CREATE EXTENSION pg_search;
How to use
The following example demonstrates how to use pg_search with a product information table.
Create a sample table and insert some data:
CREATE TABLE products ( id SERIAL PRIMARY KEY, description TEXT, rating INTEGER, category TEXT, price NUMERIC, in_stock BOOLEAN ); INSERT INTO products (description, rating, category, price, in_stock) VALUES ('Laptop with fast processor', 5, 'Electronics', 999.99, true), ('Gaming laptop with RGB', 5, 'Electronics', 1299.99, true), ('Toy laptop for kids', 3, 'Toys', 499.99, false), ('Wireless keyboard and mouse', 4, 'Electronics', 79.99, true), ('Mechanical keyboard RGB', 5, 'Electronics', 149.99, true), ('Running shoes for athletes', 5, 'Sports', 89.99, true), ('Winter jacket warm', 4, 'Clothing', 129.99, true), ('Summer jacket light', 3, 'Clothing', 59.99, true);
Create a full-text search index: Use the
USING bm25clause to create apg_searchindex for the table.CREATE INDEX products_idx ON products USING bm25 (id, description, rating, category, price) WITH ( key_field='id', text_fields='{"description": {}, "category": {"fast": true}}', numeric_fields='{"rating": {"fast": true}, "price": {"fast": true}}' );
Parameters in the
WITHclause:key_field: Specifies the primary key field of the table.text_fields: Defines the text fields to be indexed for full-text search."fast": true: Marks the field as a “fast field,” which allows direct access to field values in scenarios like aggregate functions, resulting in higher performance.
numeric_fields: Defines the numeric fields to be indexed. These can also be configured as “fast fields.”
Execute queries. Use the
@@@operator to perform full-text searches. The syntax is similar to PostgreSQL’sLIKEoperator but offers better performance.Basic query. Query for products where the
descriptionfield contains “laptop”:SELECT * FROM products WHERE description @@@ 'laptop';
Use boolean logic. Query for products where
descriptioncontains “laptop” or “keyboard”, and theratingis greater than or equal to 4:SELECT * FROM products WHERE ((description @@@ 'laptop') OR (description @@@ 'keyboard')) AND (rating >= 4);
Use aggregate functions
pg_search supports pushing down aggregate functions (COUNT, SUM, AVG, MIN, MAX) to the index level for execution, which significantly improves query performance. This feature is only effective for columns configured as “fast fields.”
Example: Count the number of matching products and calculate their total price:
-- Query the count and sum of price for products where description contains 'laptop'
EXPLAIN (FORMAT TEXT, COSTS OFF, TIMING OFF, VERBOSE)
SELECT COUNT(*), SUM(price) FROM products WHERE description @@@ 'laptop';
The EXPLAIN output shows that the query plan uses a Custom Scan (ParadeDB Scan), indicating that the aggregation operation has been successfully pushed down.
Considerations and limitations
shared_preload_librariesconfiguration: For SynxDB, you must addpg_searchtoshared_preload_librariesand restart the database. This is a prerequisite for the extension to work correctly.Cloud database services: On managed cloud database services (such as Amazon RDS), you may not be able to install
pg_searchunless the provider explicitly supports the extension.Aggregate function limitations:
Aggregate pushdown only works for fields marked as
"fast": truein the index.DISTINCTaggregates are not supported. Queries containingDISTINCTwill fall back to PostgreSQL’s native execution plan and will not benefit from the performance advantages of index acceleration.
Background process:
pg_searchrelies on a background worker process to handle index writes. If this process does not start correctly, it may lead to database connection issues.Query limitations:
pg_searchcurrently only supports single-table queries and does not support multi-table joins.Partitioned tables:
pg_searchdoes not support partitioned tables.