Use AIFun

AIFun is a PostgreSQL extension that provides AI-powered functions using large language model (LLM) providers, including OpenAI, Anthropic, Google Gemini, and AWS Bedrock. It enables you to leverage advanced AI capabilities directly within your database environment.

Key features

  • Multiple AI providers: Seamlessly integrates with major AI platforms:

    • OpenAI

    • Anthropic

    • Google Gemini

    • AWS Bedrock

  • Simple setup: Designed for ease of use without complex encryption key management. You can directly manage your own API keys.

  • User isolation: Ensures security and privacy by allowing each user to manage their own API keys and provider configurations independently.

  • AI functions: Ask questions, generate embeddings, classify text, extract structured data, summarize, translate, and more.

  • Row level security: PostgreSQL RLS ensures users can only access their own configurations.

Supported providers

AIFun supports integration with the following AI providers and models, allowing you to choose the best fit for your application needs.

Provider

Type

Models

OpenAI

openai

gpt-4, gpt-3.5-turbo, text-embedding-ada-002, or compatible models

Anthropic

anthropic

claude-3-sonnet, claude-3-haiku

Google Gemini

gemini

gemini-pro, gemini-pro-vision

AWS Bedrock

aws_bedrock

Various models via AWS Bedrock

Prerequisites

Before using AIFun, ensure that the Python requests library and other necessary dependencies are installed on the database server.

Because the SynxDB environment comes with an independent Python environment, and some dependencies have strict version requirements, follow these steps to manually install the required dependencies:

  1. Log into the database server (this must be performed on all nodes, including the coordinator and segments).

  2. Load the database environment variables (to ensure the bundled Python environment is used):

    source /usr/local/synxdb/cloudberry-env.sh
    
  3. Use the python -m pip command to install the specified versions of the dependencies (recommended to avoid path errors):

    python -m pip install requests==2.32.5 boto3==1.40.72 PyPDF2==3.0.1 python-pptx==1.0.2 python-docx==1.2.0 pypdfium2==5.0.0 pillow==12.0.0
    
  4. After installation, it is recommended to restart the database to ensure the changes take effect.

Install the extension

After ensuring that the prerequisites are met, you can create the extension in the database:

CREATE EXTENSION aifun;

Get started

Configure your provider

Before using any AI functions, you must register your API key with AIFun. The aifun.add_provider function allows you to configure different providers securely.

-- Adds an OpenAI-compatible provider.
SELECT
    aifun.add_provider(
        p_id => 'local_llm',
        p_type => 'openai',
        p_api_key => 'abc', 
        p_metadata => '{
            "endpoint": "http://10.14.10.1:8800/vllm/v1"
        }'::JSONB
    );

-- Adds an OpenAI provider.
SELECT
    aifun.add_provider(
        p_id => 'my_openai',
        p_type => 'openai',
        p_api_key => 'api-key-openai'
    );

-- Adds an Anthropic provider.
SELECT
    aifun.add_provider(
        p_id => 'my_anthropic',
        p_type => 'anthropic',
        p_api_key => 'api-key-anthropic'
    );

-- Adds a Google Gemini provider.
SELECT
    aifun.add_provider(
        p_id => 'my_gemini',
        p_type => 'gemini',
        p_api_key => 'api-key-gemini'
    );

Basic usage

AIFun provides a set of scalar functions for common AI tasks.

Text generation and chat

Use these functions to ask questions or conduct conversational flows.

-- Asks a question.
SELECT
    aifun.ask(
        provider => 'local_llm',
        model => 'zhipu/glm4-9b-chat',
        prompt => 'What is PostgreSQL?'
    );

-- Asks a question with context.
SELECT
    aifun.ask(
        provider => 'local_llm',
        model => 'zhipu/glm4-9b-chat',
        prompt => 'What is the main advantage?',
        context => '{"context": "PostgreSQL is a relational database with ACID compliance and extensive extensibility."}'::jsonb
    );

-- Conducts a conversation with the AI.
SELECT
    aifun.chat(
        provider => 'local_llm',
        model => 'zhipu/glm4-9b-chat',
        messages => '[
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "What is PostgreSQL?"
            },
            {
                "role": "assistant",
                "content": "PostgreSQL is an open-source relational database management system."
            },
            {
                "role": "user",
                "content": "What are its main features?"
            }
        ]'::jsonb
    );

Text analysis and processing

Perform tasks like text classification, structured data extraction, summarization, and translation.

-- Classifies text.
SELECT
    aifun.classify(
        provider => 'local_llm',
        model => 'zhipu/glm4-9b-chat',
        text_to_classify => 'I love this product!',
        labels => ARRAY['positive', 'negative', 'neutral']
    );

-- Extracts structured data.
SELECT
    aifun.extract(
        provider => 'local_llm',
        model => 'zhipu/glm4-9b-chat',
        text_to_parse => 'John is 30 years old and works as a developer', 
        json_schema => '{
            "name": "string",
            "age": "number",
            "job": "string"
        }'::JSONB
    );

-- Summarizes text.
SELECT
    aifun.summarize(
        provider => 'local_llm',
        model => 'zhipu/glm4-9b-chat',
        text_to_summarize => 'AIFun is a PostgreSQL extension that provides AI-powered functions using various LLM providers (OpenAI, Anthropic, Google Gemini, AWS Bedrock). It allows users to ask questions, generate embeddings, classify text, extract structured data, summarize, translate, and more.'
    );

-- Translates text.
SELECT
    aifun.translate(
        provider => 'local_llm',
        model => 'zhipu/glm4-9b-chat',
        text_to_translate => 'Hello world', 
        target_language => 'Spanish'
    );

Vision and multimodal

Work with images and multimodal embeddings.

-- Performs visual Q&A.
SELECT
    aifun.visual_qa(
        provider => 'local_vlm',
        model => 'paddleocr-vl',
        image => 'data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAIAAAACUFjqAAAAE0lEQVR4nGP8z4APMOGVZRip0gBBLAETee26JgAAAABJRU5ErkJggg==',
        question => 'What is the color of the shape?'
    );

-- Generates text embeddings.
SELECT
    aifun.embed(
        provider => 'local_embedding',
        model => 'jina', 
        text_to_embed => 'PostgreSQL is awesome'
    );

-- Generates multimodal embeddings with JSON content.
SELECT
    aifun.multimodal_embed(
        provider => 'local_multimodal_embedding',
        model => 'bge', 
        content => '{"text": "A red square", "image": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAIAAAACUFjqAAAAE0lEQVR4nGP8z4APMOGVZRip0gBBLAETee26JgAAAABJRU5ErkJggg=="}'::jsonb
    );

-- Generates multimodal embeddings with text and image arguments.
SELECT
    aifun.multimodal_embed(
        provider => 'local_multimodal_embedding',
        model => 'bge',
        text => 'A red square',
        image => 'data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAIAAAACUFjqAAAAE0lEQVR4nGP8z4APMOGVZRip0gBBLAETee26JgAAAABJRU5ErkJggg=='
    );

Document parsing

Extract text from various document formats (PDF, DOCX, PPTX).

-- Parses PDF (Replace with actual base64-encoded PDF content).
SELECT aifun.parse_pdf(
    file_content_base64 => 'data:application/pdf;base64,JVBERi0xLjQKJdPr6eUQyAAAAAA=='
);

-- Parses DOCX (Replace with actual base64-encoded DOCX content).
SELECT aifun.parse_docx(
    file_content_base64 => 'data:application/vnd.openxmlformats-officedocument.wordprocessingml.document;base64,UEsDBBQACAgIAQwCAAQAAQAAAgICAgICAgAAAAAA=='
);

-- Parses PPTX (Replace with actual base64-encoded PPTX content).
SELECT aifun.parse_pptx(
    file_content_base64 => 'data:application/vnd.openxmlformats-officedocument.presentationml.presentation;base64,UEsDBBQACAgIAQwCAAQAAQAAAgICAgICAgAAAAAA=='
);

-- Parses document with specified format.
SELECT aifun.parse_document(
    file_content_base64 => 'data:application/pdf;base64,JVBERi0xLjQKJdPr6eUQyAAAAAA==',
    file_extension => 'pdf'
);

Work with tables

You can apply AI functions directly to table data, enabling bulk processing such as batch translation or generating embeddings for existing rows.

CREATE TABLE IF NOT EXISTS test (
    id SERIAL PRIMARY KEY,
    content TEXT
);

-- Inserts some test data.
INSERT INTO test (content)
VALUES ('PostgreSQL is a powerful open-source database system.'),
       ('I love using Anthropic for AI-powered chatbots.'),
       ('AIFun is a PostgreSQL extension that provides AI-powered functions using various LL M providers.');

-- Translates content in bulk.
CREATE TABLE test_translation AS
SELECT
    id,
    aifun.translate(
        provider => 'local_llm',
        model => 'zhipu/glm4-9b-chat',
        text_to_translate => content, 
        target_language => 'zh'
    ) AS translation
FROM test;

-- Generates embeddings for content.
CREATE TABLE test_embedding AS
SELECT
    id,
    aifun.embed(
        provider => 'local_llm',
        model => 'jina', 
        text_to_embed => content
    ) AS embedding
FROM test;

Work with directory tables

AIFun integrates with directory tables to process unstructured data files stored in the file system.

-- Creates a directory table.
CREATE DIRECTORY TABLE test_dirt;

-- Parses PDF files in the directory table.
SELECT
    aifun.parse_pdf(
        file_content_bytea => content
    )  
FROM directory_table('test_dirt')
WHERE relative_path LIKE '%.pdf';

-- Parses all document types automatically.
SELECT
    aifun.parse_document(
        file_content_bytea => content,
        file_extension => SPLIT_PART(relative_path, '.', -1)
    )  
FROM directory_table('test_dirt');

Manage providers

You can use the following functions to list, update, or remove your registered AI providers.

-- Lists your providers.
SELECT * FROM aifun.my_providers;

-- Updates API key.
SELECT aifun.update_api_key('my_openai', 'new-api-key');

-- Removes provider.
SELECT aifun.remove_provider('my_openai');

-- Checks if provider exists.
SELECT aifun.has_provider('my_openai');

Security model

This extension uses row level security (RLS) to ensure complete user isolation:

  • User isolation: Each user can only access their own provider configurations.

  • No shared secrets: No master encryption keys that could be compromised.

  • Direct API keys: Users manage their own API keys directly.

  • PostgreSQL security: Leverages PostgreSQL’s built-in RLS for access control.

How it works:

  1. Each provider configuration is tagged with the owner’s username.

  2. RLS policies ensure users can only see or modify their own data.

  3. API keys are stored as plain text (user’s own keys).

  4. No central encryption/decryption overhead.

List of functions

The following table lists all available functions in the aifun schema, including their return types and arguments.

Name

Return type

Arguments

_get_provider_key

text

p_provider_id text

add_provider

void

p_id text, p_type text, p_api_key text, p_metadata jsonb DEFAULT NULL::jsonb

ask

text

provider text, model text, prompt text

ask

text

provider text, model text, prompt text, context text

chat

text

provider text, model text, messages jsonb

chunk

text[]

text text, chunk_size integer DEFAULT 1000, overlap integer DEFAULT 200

classify

text

provider text, model text, text_to_classify text, labels text[]

embed

vector

provider text, model text, text_to_embed text

extract

jsonb

provider text, model text, text_to_parse text, json_schema jsonb

extract_keywords

text[]

provider text, model text, text text, num_keywords integer DEFAULT 5

fix_grammar

text

provider text, model text, text text

has_provider

boolean

p_provider_id text

help

text

p_function_name text

list_all

TABLE(function_name text, function_description text)

None

multimodal_embed

vector

provider text, model text, content jsonb

multimodal_embed

vector

provider text, model text, text text DEFAULT NULL::text, image bytea DEFAULT NULL::bytea

multimodal_embed

vector

provider text, model text, text text DEFAULT NULL::text, image text DEFAULT NULL::text

parse_document

jsonb

file_content_base64 text, file_extension text

parse_document

jsonb

file_content_bytea bytea, file_extension text

parse_docx

jsonb

file_content_base64 text

parse_docx

jsonb

file_content_bytea bytea

parse_pdf

jsonb

file_content_base64 text

parse_pdf

jsonb

file_content_bytea bytea

parse_pdf_with_vlm

jsonb

provider text, model text, file_content_base64 text, prompt text DEFAULT 'Extract all text and describe any images, charts, or visual elements in this PDF.'::text

parse_pdf_with_vlm

jsonb

provider text, model text, file_content_bytea bytea, prompt text DEFAULT 'Extract all text and describe any images, charts, or visual elements in this PDF.'::text

parse_pptx

jsonb

file_content_base64 text

parse_pptx

jsonb

file_content_bytea bytea

remove_provider

void

p_id text

rerank

text[]

provider text, model text, query text, documents text[]

similarity

double precision

provider text, model text, text1 text, text2 text

summarize

text

provider text, model text, text_to_summarize text, length integer DEFAULT 50

translate

text

provider text, model text, text_to_translate text, target_language text

update_api_key

void

p_provider_id text, p_api_key text

visual_qa

text

provider text, model text, image bytea, question text

visual_qa

text

provider text, model text, image text, question text