Skip to main content >
Citus
v12.0
Get Started
What is Citus?
Citus Gives You Postgres At Any Scale
When to Use Citus
Multi-Tenant SaaS Database
Real-Time Analytics
Microservices
Considerations for Use
When Citus is Inappropriate
Quick Tutorials
Multi-tenant Applications
Data model and sample data
Creating tables
Distributing tables and loading data
Running queries
Real-time Analytics
Data model and sample data
Creating tables
Distributing tables and loading data
Running queries
Microservices
Distributed schemas
Creating tables
Configure services
Running the services
Exploring the database
Install
Single-Node Citus
Docker (Mac or Linux)
Ubuntu or Debian
Fedora, CentOS, or Red Hat
Multi-Node Citus
Ubuntu or Debian
Steps to be executed on all nodes
Steps to be executed on the coordinator node
Fedora, CentOS, or Red Hat
Steps to be executed on all nodes
Steps to be executed on the coordinator node
Managed Service
Use-Case Guides
Multi-tenant Applications
Let’s Make an App – Ad Analytics
Scaling the Relational Data Model
Preparing Tables and Ingesting Data
Try it Yourself
Integrating Applications
Sharing Data Between Tenants
Online Changes to the Schema
When Data Differs Across Tenants
Scaling Hardware Resources
Dealing with Big Tenants
Where to Go From Here
Real-Time Dashboards
Data Model
Rollups
Expiring Old Data
Approximate Distinct Counts
Unstructured Data with JSONB
Timeseries Data
Scaling Timeseries Data on Citus
Automating Partition Creation
Archiving with Columnar Storage
Archiving a Row Partition to Columnar Storage
Architecture
Concepts
Nodes
Coordinator and Workers
Sharding models
Row-based sharding
Schema-based sharding
Sharding tradeoffs
Distributed Data
Table Types
Type 1: Distributed Tables
Type 2: Reference Tables
Type 3: Local Tables
Type 4: Local Managed Tables
Type 5: Schema Tables
Shards
Shard Placements
Co-Location
Parallelism
Query Execution
Develop
Determining Application Type
At a Glance
Examples and Characteristics
Choosing Distribution Column
Multi-Tenant Apps
Best Practices
Real-Time Apps
Best Practices
Timeseries Data
Best Practices
Table Co-Location
Data co-location in Citus for hash-distributed tables
A practical example of co-location
Using Regular PostgreSQL Tables
Distributing tables by ID
Distributing tables by tenant
Co-location means better feature support
Query Performance
Migrating an Existing App
Identify Distribution Strategy
Pick distribution key
Identify types of tables
Prepare Source Tables for Migration
Add distribution keys
Backfill newly created columns
Prepare Application for Citus
Set up Development Citus Cluster
Include distribution column in keys
Add distribution key to queries
Ruby on Rails
Django Multitenant
ASP.NET
Java Hibernate
Other (SQL Principles)
Enable Secure Connections
Check for cross-node traffic
Migrate Production Data
Small Database Migration
Big Database Migration
Duplicate schema
Enable logical replication
Open access for network connection
Begin Replication
Switch over to Citus and stop all connections to old database
SQL Reference
Creating and Modifying Distributed Objects (DDL)
Creating and Distributing Schemas
Creating And Distributing Tables
Reference Tables
Distributing Coordinator Data
Co-Locating Tables
Dropping Tables
Modifying Tables
Adding/Modifying Columns
Adding/Removing Constraints
Using NOT VALID Constraints
Adding/Removing Indices
Types and Functions
Manual Modification
Ingesting, Modifying Data (DML)
Inserting Data
“From Select” Clause (Distributed Rollups)
COPY Command (Bulk load)
Caching Aggregations with Rollups
Updates and Deletion
Maximizing Write Performance
Querying Distributed Tables (SQL)
Aggregate Functions
Count (Distinct) Aggregates
Estimating Top N Items
Percentile Calculations
Limit Pushdown
Views on Distributed Tables
Joins
Co-located joins
Reference table joins
Repartition joins
Query Processing
Distributed Query Planner
Distributed Query Executor
Subquery/CTE Push-Pull Execution
PostgreSQL planner and executor
Manual Query Propagation
Running on all Workers
Running on all Shards
Limitations
SQL Support and Workarounds
Limitations
General
Cross-Node SQL Queries
Schema-based Sharding SQL compatibility
Workarounds
Work around limitations using CTEs
Temp Tables: the Workaround of Last Resort
Subqueries within INSERT queries
Citus API
Citus Utility Functions
Table and Shard DDL
citus_schema_distribute
citus_schema_undistribute
create_distributed_table
truncate_local_data_after_distributing_table
undistribute_table
alter_distributed_table
alter_table_set_access_method
remove_local_tables_from_metadata
create_reference_table
citus_add_local_table_to_metadata
update_distributed_table_colocation
create_distributed_function
alter_columnar_table_set
create_time_partitions
drop_old_time_partitions
alter_old_partitions_set_access_method
Metadata / Configuration Information
citus_add_node
citus_update_node
citus_set_node_property
citus_add_inactive_node
citus_activate_node
citus_disable_node
citus_add_secondary_node
citus_remove_node
citus_get_active_worker_nodes
citus_backend_gpid
citus_check_cluster_node_health
citus_set_coordinator_host
master_get_table_metadata
get_shard_id_for_distribution_column
column_to_column_name
citus_relation_size
citus_table_size
citus_total_relation_size
citus_stat_statements_reset
Cluster Management And Repair Functions
citus_move_shard_placement
citus_rebalance_start
citus_rebalance_status
citus_rebalance_stop
citus_rebalance_wait
rebalance_table_shards
get_rebalance_table_shards_plan
get_rebalance_progress
citus_add_rebalance_strategy
citus_set_default_rebalance_strategy
citus_remote_connection_stats
citus_drain_node
isolate_tenant_to_new_shard
citus_create_restore_point
Citus Tables and Views
Coordinator Metadata
Partition table
Shard table
Shard information view
Shard placement table
Worker node table
Distributed object table
Citus schemas view
Citus tables view
Time partitions view
Co-location group table
Rebalancer strategy table
Query statistics table
Tenant-level query statistics view
Distributed Query Activity
Tables on all Nodes
Connection Credentials Table
Connection Pooling Credentials
Configuration Reference
General configuration
citus.max_background_task_executors_per_node (integer)
citus.max_worker_nodes_tracked (integer)
citus.use_secondary_nodes (enum)
citus.cluster_name (text)
citus.enable_version_checks (boolean)
citus.log_distributed_deadlock_detection (boolean)
citus.distributed_deadlock_detection_factor (floating point)
citus.node_connection_timeout (integer)
citus.node_conninfo (text)
citus.local_hostname (text)
citus.show_shards_for_app_name_prefixes (text)
citus.rebalancer_by_disk_size_base_cost (integer)
Query Statistics
citus.stat_statements_purge_interval (integer)
citus.stat_statements_max (integer)
citus.stat_statements_track (enum)
citus.stat_tenants_untracked_sample_rate (floating point)
Data Loading
citus.multi_shard_commit_protocol (enum)
citus.shard_count (integer)
citus.shard_max_size (integer)
citus.replicate_reference_tables_on_activate (boolean)
citus.metadata_sync_mode (enum)
Planner Configuration
citus.local_table_join_policy (enum)
citus.limit_clause_row_fetch_count (integer)
citus.count_distinct_error_rate (floating point)
citus.task_assignment_policy (enum)
enable_non_colocated_router_query_pushdown (boolean)
Intermediate Data Transfer
citus.binary_worker_copy_format (boolean)
citus.max_intermediate_result_size (integer)
DDL
citus.enable_ddl_propagation (boolean)
citus.enable_local_reference_table_foreign_keys (boolean)
citus.enable_change_data_capture (boolean)
citus.enable_schema_based_sharding (boolean)
Executor Configuration
General
Explain output
External Integrations
Change Data Capture (CDC)
Differences from single-node PostgreSQL
Logical replication of distributed tables to PostgreSQL tables
Logical decoding caveats
Ingesting Data from Kafka
Caveats
Ingesting Data from Spark
Business Intelligence with Tableau
Administer
Cluster Management
Choosing Cluster Size
Shard Count
Multi-Tenant SaaS Use-Case
Real-Time Analytics Use-Case
Initial Hardware Size
Multi-Tenant SaaS Use-Case
Real-Time Analytics Use-Case
Scaling the cluster
Add a worker
Rebalance Shards without Downtime
Parallel Rebalancing
How it Works
Adding a coordinator
Dealing With Node Failures
Worker Node Failures
Coordinator Node Failures
Tenant Isolation
Row-based sharding
Schema-based sharding
Make the move
Viewing Query Statistics
Tenant-level Statistics
Statistics Expiration
Resource Conservation
Limiting Long-Running Queries
Security
Connection Management
Setup Certificate Authority signed certificates
Increasing Worker Security
Row-Level Security
PostgreSQL extensions
Creating a New Database
Table Management
Determining Table and Relation Size
Vacuuming Distributed Tables
Analyzing Distributed Tables
Columnar Storage
Usage
Measuring compression
Example
Gotchas
Limitations
Upgrading Citus
Upgrading Citus Versions
Patch Version Upgrade
Major and Minor Version Upgrades
Step 1. Update Citus Package
Step 2. Apply Update in DB
Upgrading PostgreSQL version from 14 to 15
For Every Node
Troubleshoot
Query Performance Tuning
Table Distribution and Shards
PostgreSQL tuning
Scaling Out Performance
Distributed Query Performance Tuning
General
Subquery/CTE Network Overhead
Advanced
Connection Management
Task Assignment Policy
Intermediate Data Transfer Format
Binary protocol
Scaling Out Data Ingestion
Real-time Insert and Updates
Insert Throughput
Update Throughput
Insert and Update: Throughput Checklist
Insert and Update: Latency
Staging Data Temporarily
Bulk Copy (250K - 2M/s)
Useful Diagnostic Queries
Finding which shard contains data for a specific tenant
Finding which node hosts a distributed schema
Finding the distribution column for a table
Detecting locks
Querying the size of your shards
Querying the size of all distributed tables
Identifying unused indices
Monitoring client connection count
Viewing system queries
Active queries
Why are queries waiting
Index hit rate
Cache hit rate
Common Error Messages
Could not receive query results
Resolution
Canceling the transaction since it was involved in a distributed deadlock
Resolution
Could not connect to server: Cannot assign requested address
Resolution
SSL error: certificate verify failed
Resolution
Could not connect to any active placements
Resolution
Remaining connection slots are reserved for non-replication superuser connections
Resolution
PgBouncer cannot connect to server
Resolution
Relation
foo
is not distributed
Resolution
Unsupported clause type
Resolution
Cannot open new connections after the first modification command within a transaction
Resolution
Cannot create uniqueness constraint
Resolution
Function create_distributed_table does not exist
Resolution
STABLE functions used in UPDATE queries cannot be called with column references
Resolution
FAQ
Frequently Asked Questions
Can I create primary keys on distributed tables?
How do I add nodes to an existing Citus cluster?
How does Citus handle failure of a worker node?
How does Citus handle failover of the coordinator node?
Are there any PostgreSQL features not supported by Citus?
How do I choose the shard count when I hash-partition my data?
How do I change the shard count for a hash partitioned table?
How does citus support count(distinct) queries?
In which situations are uniqueness constraints supported on distributed tables?
How do I create database roles, functions, extensions etc in a Citus cluster?
What if a worker node’s address changes?
Which shard contains data for a particular tenant?
I forgot the distribution column of a table, how do I find it?
Can I distribute a table by multiple keys?
Why does pg_relation_size report zero bytes for a distributed table?
Why am I seeing an error about max_intermediate_result_size?
Can I run Citus on Microsoft Azure?
Can I shard by schema on Citus for multi-tenant applications?
How does cstore_fdw work with Citus?
What happened to pg_shard?
Articles
Related Articles
Efficient Rollup Tables with HyperLogLog in Postgres
Rollup tables without HLL—using GitHub events data as an example
Without HLL, rollup tables have a few limitations
HLL to the rescue
HLL and rollup tables in action, together
What kinds of queries can HLL answer?
A rollup table with HLL is worth a thousand rollup tables without HLL
Want to learn more about HLL in Postgres?
Distributed Distinct Count with HyperLogLog on Postgres
What does HLL do behind the curtains?
Hash all elements
Observe the data for rare patterns
Stochastic Averaging
More?
HLL in distributed systems
Hands on with HLL
Setup
Examples
Conclusion
Postgres Parallel Indexing in Citus
Real-time Event Aggregation at Scale Using Postgres with Citus
How Distributed Outer Joins on PostgreSQL with Citus Work
Distributed Outer Joins with Citus
Designing your SaaS Database for Scale with Postgres
Building a Scalable Postgres Metrics Backend using the Citus Extension
Time-Series Metrics
Events
Sharding a Multi-Tenant App with Postgres
Tenancy
Multi-tenancy and co-location, a perfect pair
In conclusion
Sharding Postgres with Semi-Structured Data and Its Performance Implications
One large table, without joins
Enter Citus
The query workload
Every distribution has its thorns
Scalable Real-time Product Search using PostgreSQL with Citus
Read the Docs
v: v12.0
Versions
latest
stable
v12.1
v12.0
v11.3
v11.2
v11.1
v11.0
v10.2
v10.1
v10.0
v9.5
v9.4
v9.3
v9.2
v9.1
v9.0
v8.3
v8.2
v8.1
v8.0
v7.5
v7.4
v7.3
v7.2
v7.1
v7.0
v6.2.2
v6.2
v6.1
v6.0
v5.2
v5.1
v5.0
a11y-2
Downloads
pdf
epub
On Read the Docs
Project Home
Builds
Citus
»
Citus API
Edit on GitHub
Citus API
Citus Utility Functions
Table and Shard DDL
Metadata / Configuration Information
Cluster Management And Repair Functions
Citus Tables and Views
Coordinator Metadata
Tables on all Nodes
Configuration Reference
General configuration
Query Statistics
Data Loading
Planner Configuration
Intermediate Data Transfer
DDL
Executor Configuration