Just-in-Time Compilation (JIT)

This topic explains Just-in-Time (JIT) compilation and how to configure it in SynxDB.

What is JIT compilation

Just-in-Time (JIT) compilation transforms interpreted program evaluation into a native program at run time. For example, instead of using general-purpose code to evaluate arbitrary SQL expressions like WHERE a.col=3, JIT generates a function specific to that expression. The CPU executes this function natively, which speeds up execution. JIT compilation reduces the overhead of indirect jumps and branches common in generic interpreted code by generating native code with direct calls and constant folding.

SynxDB uses LLVM for JIT compilation. Unlike standard interpreted execution, JIT is an optional execution mode.

The JIT workflow is designed with fault tolerance. If the JIT library fails to load on the segments (for example, if the dependency is not installed), the execution mode automatically falls back to non-JIT interpretation without interrupting the query.

User scenarios

JIT compilation primarily benefits long-running CPU-bound queries, such as analytical queries. For short queries, the overhead of JIT compilation often exceeds the time it saves.

By generating native code specific to the query and data layout, JIT removes much of the interpretation overhead, which shortens completion time for complex workloads.
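
As a minimal sketch of how a session might keep JIT away from short queries, the statements below show two alternatives that use only the jit and jit_above_cost parameters covered in this topic. The threshold value 500000 is an illustrative assumption, not a recommended setting.

```sql
-- Alternative 1: turn JIT off for the current session,
-- for example when the session runs only short queries.
SET jit = off;

-- Alternative 2: keep JIT on, but raise the cost threshold so that
-- only expensive plans trigger it. 500000 is an illustrative value.
SET jit = on;
SET jit_above_cost = 500000;
```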

Principles of JIT compilation

The internal workflow of JIT has three stages:

Planner stage

This stage occurs in the SynxDB coordinator. The planner generates the plan tree of a query and estimates its cost.

The planner triggers JIT compilation if both of the following conditions are met:
- The server configuration parameter jit is on.
- The estimated query cost exceeds the value of jit_above_cost.

If jit_expressions is enabled, the planner suggests that the executor compile the expressions in JIT space. The planner makes further decisions based on cost and configuration:
- If the estimated cost exceeds jit_inline_above_cost, the planner requests in-line compilation of the short functions and operators used in the query.
- If the estimated cost exceeds jit_optimize_above_cost, the planner requests expensive optimizations to improve the generated code.
- If jit_tuple_deforming is enabled, the planner requests a custom function to deform the tuples of the target table.

When the plan is ready, the planner sends the plan trees and JIT flags to the executor.
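
Because the planner's decisions depend entirely on the parameters listed above, it can help to inspect their current values before reading a plan. The query below is a sketch that assumes SynxDB exposes the pg_settings view in the same way PostgreSQL does. The source column shows where each value was set.

```sql
-- List the JIT parameters that drive the planner's decisions.
SELECT name, setting, source
FROM pg_settings
WHERE name IN ('jit',
               'jit_above_cost',
               'jit_inline_above_cost',
               'jit_optimize_above_cost',
               'jit_expressions',
               'jit_tuple_deforming')
ORDER BY name;
```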

Executor initialization stage

This stage occurs in the SynxDB segments. SynxDB creates the expression evaluation steps and, if JIT is in use, rewrites them as functions in the JIT space. The planner's decisions determine whether to trigger JIT compilation and which strategies to apply. However, SynxDB uses JIT at execution time only if jit is enabled and the JIT libraries load successfully. The executor ignores cached decisions if jit or jit_expressions changes to false between the planning and execution stages, or if an error occurs.

In addition, the executor checks the developer configuration parameters for providers, bitcode dumping, profiling, and debugging support.

Executor run stage

This stage also occurs in the SynxDB segments. The segments execute the steps provided by the initialization stage. The functions in JIT space are combined as a whole before the first call.

JIT accelerated operations

Currently, the SynxDB JIT implementation accelerates expression evaluation and tuple deforming:

Expression evaluation: Evaluates WHERE clauses, target lists, aggregates, and projections. SynxDB accelerates this by generating code specific to each case.
Tuple deforming: Transforms an on-disk tuple into its in-memory representation. SynxDB accelerates this by creating a function specific to the table layout and the number of columns to extract.
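
One possible way to see what each accelerated operation contributes is to exclude one at a time and compare the EXPLAIN output. The sketch below assumes that the jit_explain_output table from the usage examples exists, and it forces JIT on so that a small test query is compiled. The Options line of the JIT summary should reflect each change.

```sql
-- Force JIT so that even a small test query is compiled.
SET jit = on;
SET jit_above_cost = 0;

-- Run with tuple deforming excluded from JIT.
SET jit_tuple_deforming = off;
EXPLAIN (ANALYZE, VERBOSE) SELECT count(*) FROM jit_explain_output;

-- Run with expression evaluation excluded instead.
RESET jit_tuple_deforming;
SET jit_expressions = off;
EXPLAIN (ANALYZE, VERBOSE) SELECT count(*) FROM jit_explain_output;

-- Return jit_expressions to its session default.
RESET jit_expressions;
```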

In-line compilation (Inlining)

SynxDB allows defining new data types, functions, operators, and other database objects. Built-in objects use similar mechanisms. This extensibility incurs overhead, for example, due to function calls. To reduce this overhead, JIT uses in-line compilation to fit the bodies of small functions into the expressions that use them. This process optimizes away a significant percentage of the overhead. SynxDB uses pre-generated bitcode files installed with the server for built-in functions and operators to facilitate this inlining.

Optimization

LLVM supports optimizing generated code. Some optimizations are cheap enough to perform whenever JIT runs, while others benefit only longer-running queries.
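
A rough way to observe the cost of inlining and expensive optimization is to compile the same query with and without them and compare the JIT timing lines. This sketch relies on the behavior described under Configuration, where a negative threshold disables a step, and it reuses the jit_explain_output table from the usage examples.

```sql
SET jit = on;
SET jit_above_cost = 0;

-- JIT without inlining or expensive optimization.
SET jit_inline_above_cost = -1;
SET jit_optimize_above_cost = -1;
EXPLAIN (ANALYZE, VERBOSE) SELECT count(*) FROM jit_explain_output;

-- JIT with both steps forced on.
SET jit_inline_above_cost = 0;
SET jit_optimize_above_cost = 0;
EXPLAIN (ANALYZE, VERBOSE) SELECT count(*) FROM jit_explain_output;
```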

How to use JIT compilation

Prerequisites

Note: To use JIT, first install the LLVM libraries on your system. SynxDB requires LLVM version 14.0.0 or lower. LLVM 14.0.0 is recommended. You can install the libraries by running the command yum install llvm-libs.

Configuration

JIT works with both GPORCA and the Postgres-based planner. Because GPORCA and the Postgres-based planner use different algorithms and calculate costs differently, tune the JIT thresholds according to your usage.

Enable JIT: Set jit to on.
Tune thresholds: Adjust jit_above_cost to determine when JIT triggers.
- Check the values of these configuration parameters for both GPORCA and the Postgres-based planner, because the meaning of cost differs.
- Because SynxDB with GPORCA might fall back to the Postgres-based planner for some operations, verify settings for both planners.
- Setting the JIT cost parameters to 0 forces JIT compilation for all queries. This is useful for testing but slows down short queries.
- Setting a JIT cost parameter to a negative value (for example, -1) disables the feature that the parameter controls.
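
Since the two planners report costs on different scales, one way to choose a threshold is to compare the estimated cost of a representative query under each planner. The sketch below uses the jit_explain_output table from the usage examples as a stand-in for such a query. The value 100000 is an illustrative assumption only.

```sql
-- Estimated cost under GPORCA: read the top-level cost=... value.
SET optimizer = on;
EXPLAIN SELECT count(*) FROM jit_explain_output;

-- Estimated cost under the Postgres-based planner for the same query.
SET optimizer = off;
EXPLAIN SELECT count(*) FROM jit_explain_output;

-- Choose jit_above_cost relative to the costs observed for the planner
-- in use. 100000 is an illustrative value, not a recommendation.
SET jit_above_cost = 100000;
RESET optimizer;
```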

Usage examples

To verify that JIT compilation is active and working correctly, or to force JIT compilation for testing purposes, you can temporarily lower the JIT cost thresholds.

Configure the session to force JIT compilation:

Setting the cost thresholds to 0 ensures that JIT compilation is triggered even for simple queries that would normally be too fast to benefit from it.

```
SET jit = on;
SET jit_above_cost = 0;
SET jit_inline_above_cost = 0;
SET jit_optimize_above_cost = 0;
```

Run a query with EXPLAIN (ANALYZE, VERBOSE):

Use EXPLAIN (ANALYZE, VERBOSE) to execute the query and display detailed execution statistics, including JIT information. Alternatively, enable the configuration parameter gp_explain_jit.

The EXPLAIN output provides specific JIT metrics, such as:
- Functions: The number of JIT functions created.
- Timing: The time spent in JIT tasks per slice, reported as a total, an average, or a maximum as shown in the examples.
- Options: Which JIT strategies (Inlining, Optimization, Expressions, Deforming) were applied.

Example with GPORCA:

EXPLAIN (ANALYZE, VERBOSE) SELECT count(*) FROM jit_explain_output;

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=0.00..431.00 rows=1 width=8) (actual time=...)
   ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..431.00 rows=1 width=8) (actual time=...)
         ->  Partial Aggregate  (cost=0.00..431.00 rows=1 width=8) (actual time=...)
               ->  Seq Scan on jit_explain_output  (cost=0.00..431.00 rows=...) (actual time=...)
 Settings: jit = 'on', jit_above_cost = '0', jit_inline_above_cost = '0', jit_optimize_above_cost = '0'
 Optimizer: GPORCA
 Planning Time: 2.125 ms
 JIT:
   Options: Inlining true, Optimization true, Expressions true, Deforming true.
   (slice0): Functions: 2.00. Timing: 1.137 ms total.
   (slice1): Functions: 1.00 avg x 3 workers. Timing: 0.830 ms avg.
 Execution Time: 15.123 ms
(12 rows)

Example with Postgres-based planner:

SET optimizer = off;
EXPLAIN (ANALYZE, VERBOSE) SELECT count(*) FROM jit_explain_output;

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=...) (actual time=...)
   ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=...) (actual time=...)
         ->  Partial Aggregate  (cost=...) (actual time=...)
               ->  Seq Scan on jit_explain_output  (cost=...) (actual time=...)
 Settings: jit = 'on', jit_above_cost = '0', jit_inline_above_cost = '0', jit_optimize_above_cost = '0', optimizer = 'off'
 Optimizer: Postgres query optimizer
 Planning Time: 0.158 ms
 JIT:
   Options: Inlining true, Optimization true, Expressions true, Deforming true.
   (slice0): Functions: 2.00. Timing: 1.381 ms total.
   (slice1): Functions: 1.00 avg x 3 workers. Timing: 0.854 ms max.
 Execution Time: 24.023 ms
(14 rows)
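
After testing, the session can be returned to its defaults so that the forced thresholds do not slow down later short queries. The statements below are a sketch that resets only the parameters changed in the examples above.

```sql
-- Undo the session-level overrides used for testing.
RESET jit;
RESET jit_above_cost;
RESET jit_inline_above_cost;
RESET jit_optimize_above_cost;
RESET optimizer;
```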