Citus Docs
v7.4
Get Started
What is Citus?
When to Use Citus
Multi-Tenant Database
Real-Time Analytics
Considerations for Use
When Citus is Inappropriate
Concepts
Distributed Architecture
Nodes: Coordinator and Workers
Table Types
Type 1: Distributed Tables
Distribution Column
Type 2: Reference Tables
Type 3: Local Tables
Shards
Shard Placements
Co-Location
Parallelism
Quick Tutorials
Multi-tenant Applications
Data model and sample data
Creating tables
Distributing tables and loading data
Running queries
Real-time Analytics
Data model and sample data
Creating tables
Distributing tables and loading data
Running queries
Install
Single-Machine Cluster
Docker (Mac or Linux)
Ubuntu or Debian
Fedora, CentOS, or Red Hat
Multi-Machine Cluster
Ubuntu or Debian
Steps to be executed on all nodes
Steps to be executed on the coordinator node
Fedora, CentOS, or Red Hat
Steps to be executed on all nodes
Steps to be executed on the coordinator node
AWS CloudFormation
Deploy on Citus Cloud
Use-Case Guides
Multi-tenant Applications
Let’s Make an App – Ad Analytics
Scaling the Relational Data Model
Preparing Tables and Ingesting Data
Try it Yourself
Integrating Applications
Sharing Data Between Tenants
Online Changes to the Schema
When Data Differs Across Tenants
Scaling Hardware Resources
Dealing with Big Tenants
Where to Go From Here
Real-Time Dashboards
Data Model
Rollups
Expiring Old Data
Approximate Distinct Counts
Unstructured Data with JSONB
Timeseries Data
Scaling Timeseries Data on Citus
Automating Partition Creation
Develop
Determining Application Type
At a Glance
Examples and Characteristics
Choosing Distribution Column
Multi-Tenant Apps
Best Practices
Real-Time Apps
Best Practices
Timeseries Data
Best Practices
Table Co-Location
Data co-location in Citus for hash-distributed tables
A practical example of co-location
Using Regular PostgreSQL Tables
Distributing tables by ID
Distributing tables by tenant
Co-location means better feature support
Query Performance
Migrating an Existing App
Multi-Tenant Schema Migration
Backfilling Tenant ID
Multi-Tenant Query Migration
Validating Query Migration
Ruby on Rails
Preparing to scale-out a multi-tenant application
Updating the Rails Application
Example Application
Django
Preparing to scale-out a multi-tenant application
Updating the Django Application
ASP.NET
Example App
Schema
ASP.NET Core project
Adding Tenancy to App
Define the Entity Framework Core context and models
Install SaasKit
Update views
Query the database
Test the app
Data Migration
Using Citus Warp
1. Duplicate schema
2. Enable logical replication
3. Open access for network connection
4. Begin Replication
5. Switch Production System to Citus Cloud
Real-Time Analytics Apps
SQL Reference
Creating and Modifying Distributed Tables (DDL)
Creating And Distributing Tables
Reference Tables
Distributing Coordinator Data
Co-Locating Tables
Upgrading from Citus 5.x
Dropping Tables
Modifying Tables
Adding/Modifying Columns
Adding/Removing Constraints
Adding/Removing Indices
Manual Modification
Ingesting, Modifying Data (DML)
Inserting Data
Single Row Inserts
Multi-Row Inserts
Bulk Loading
Distributed Aggregations
Updates and Deletion
Maximizing Write Performance
Querying Distributed Tables (SQL)
Aggregate Functions
Count (Distinct) Aggregates
Limit Pushdown
Views on Distributed Tables
Joins
Co-located joins
Reference table joins
Repartition joins
Query Processing
Distributed Query Planner
Distributed Query Executor
Real-time Executor
Router Executor
Task Tracker Executor
Subquery/CTE Push-Pull Execution
PostgreSQL planner and executor
Manual Query Propagation
Running on all Workers
Running on all Shards
Running on all Placements
Limitations
SQL Support and Workarounds
Workarounds
JOIN a local and a distributed table
INSERT…SELECT upserts lacking distribution column
Temp Tables: the Last Resort
Citus API
Citus Utility Functions
Table and Shard DDL
create_distributed_table
create_reference_table
upgrade_to_reference_table
mark_tables_colocated
master_create_distributed_table
master_create_worker_shards
master_create_empty_shard
Table and Shard DML
master_append_table_to_shard
master_apply_delete_command
master_modify_multiple_shards
Metadata / Configuration Information
master_add_node
master_update_node
master_add_inactive_node
master_activate_node
master_disable_node
master_add_secondary_node
master_remove_node
master_get_active_worker_nodes
master_get_table_metadata
get_shard_id_for_distribution_column
column_to_column_name
citus_relation_size
citus_table_size
citus_total_relation_size
Cluster Management And Repair Functions
master_copy_shard_placement
master_move_shard_placement
rebalance_table_shards
get_rebalance_progress
replicate_table_shards
isolate_tenant_to_new_shard
citus_create_restore_point
Metadata Tables
Partition table
Shard table
Shard Storage Types
Shard placement table
Shard Placement States
Worker node table
Co-location group table
Configuration Reference
General configuration
citus.max_worker_nodes_tracked (integer)
citus.use_secondary_nodes (enum)
citus.cluster_name (text)
citus.enable_version_checks (boolean)
citus.log_distributed_deadlock_detection (boolean)
citus.distributed_deadlock_detection_factor (floating point)
Data Loading
citus.multi_shard_commit_protocol (enum)
citus.shard_replication_factor (integer)
citus.shard_count (integer)
citus.shard_max_size (integer)
Planner Configuration
citus.limit_clause_row_fetch_count (integer)
citus.count_distinct_error_rate (floating point)
citus.task_assignment_policy (enum)
Intermediate Data Transfer
citus.binary_worker_copy_format (boolean)
citus.binary_master_copy_format (boolean)
citus.max_intermediate_result_size (integer)
DDL
citus.enable_ddl_propagation (boolean)
Executor Configuration
citus.all_modifications_commutative
citus.max_task_string_size (integer)
citus.remote_task_check_interval (integer)
citus.task_executor_type (enum)
citus.multi_task_query_log_level (enum)
Real-time executor configuration
Task tracker executor configuration
Explain output
Append Distribution
Creating and Distributing Tables
Expiring Data
Deleting Data
Dropping Tables
Data Loading
Bulk load using \copy
Incremental loads by appending to existing shards
Increasing data loading performance
Scaling Data Ingestion
Coordinator Node Bulk Ingestion (100k/s-200k/s)
Worker Node Bulk Ingestion (100k/s-1M/s)
Pre-processing Data in Citus
External Integrations
Ingesting Data from Kafka
Caveats
Ingesting Data from Spark
Business Intelligence with Tableau
Citus Cloud
Get Started
Provisioning
Configuring Your Plan
Supported Regions
Other infrastructure providers
Connecting
Connecting Directly to a Worker
Manage
Scaling
Scaling Up (increasing node size)
Scaling Out (adding new nodes)
Rebalancing
Scaling Connections (pgBouncer)
Monitoring
Cloud Platform Status
Resources Usage
Amazon EBS Volume Metrics
CPU and Network
PostgreSQL Write-Ahead Log
Formation Events Feed
StatsD external reporting
Example: Datadog with statsd
VividCortex External Monitoring
Security
Connecting with SSL
Two-Factor Authentication
Users and Permissions
Granting Privileges in Bulk
Encryption at Rest
Network Perimeter Controls
Backup, Availability, and Replication
High-Availability (HA) Replication
Disaster Recovery (DR)
Comparison of HA and DR
Disaster Recovery takes a little extra work but gives greater reliability
Trade-offs between latency and reliability for backups
How High Availability and Disaster Recovery is used for crash recovery
Point-in-Time Recovery
Upgrades
Logging
What Is Logged
Recent Logs
External Log Destinations
Verified Provider Settings
Additional Features
Extensions
Forking
How to Fork a Formation
When is it Useful
How it Works Internally
Followers
Masterless Mode (beta)
Architecture
Data Access
Scaling Out a Raw Events Table
Limitations Compared to Citus
Custom PostgreSQL Configuration
Support and Billing
Support
Billing
Pricing
Payments
Administer
Cluster Management
Choosing Cluster Size
Shard Count
Multi-Tenant SaaS Use-Case
Real-Time Analytics Use-Case
Initial Hardware Size
Multi-Tenant SaaS Use-Case
Real-Time Analytics Use-Case
Scaling the cluster
Add a worker
Rebalance Shards without Downtime
How it Works
Adding a coordinator
Dealing With Node Failures
Worker Node Failures
Coordinator Node Failures
Tenant Isolation
Worker Security
PostgreSQL extensions
Checks For Updates and Cluster Statistics
What we Collect
How to Opt Out
Table Management
Determining Table and Relation Size
Vacuuming Distributed Tables
Analyzing Distributed Tables
Upgrading Citus
Upgrading Citus Versions
Patch Version Upgrade
Major and Minor Version Upgrades
Step 1. Update Citus Package
Step 2. Apply Update in DB
Upgrading PostgreSQL version from 9.6 to 10.0
On the Coordinator Node
On Worker Nodes
Troubleshoot
Query Performance Tuning
Table Distribution and Shards
PostgreSQL tuning
Scaling Out Performance
Distributed Query Performance Tuning
General
Subquery/CTE Network Overhead
Advanced
Task Assignment Policy
Intermediate Data Transfer Format
Real Time Executor
Task Tracker Executor
Scaling Out Data Ingestion
Real-time Insert and Updates
Insert Throughput
Update Throughput
Insert and Update: Throughput Checklist
Insert and Update: Latency
Bulk Copy (250K - 2M/s)
Masterless Citus (50k/s-500k/s)
Useful Diagnostic Queries
Finding which shard contains data for a specific tenant
Finding the distribution column for a table
Detecting locks
Querying the size of your shards
Querying the size of all distributed tables
Determining Replication Factor per Table
Identifying unused indices
Monitoring client connection count
Index hit rate
Common Error Messages
Failed to execute task
n
Resolution
Relation
foo
is not distributed
Resolution
Could not receive query results
Resolution
Canceling the transaction since it was involved in a distributed deadlock
Resolution
Cannot establish a new connection for placement
n
, since DML has been executed on a connection that is in use
Resolution
Could not connect to server: Cannot assign requested address
Resolution
Remaining connection slots are reserved for non-replication superuser connections
Resolution
PgBouncer cannot connect to server
Resolution
Unsupported clause type
Resolution
Cannot open new connections after the first modification command within a transaction
Resolution
ON CONFLICT is not supported via coordinator
Resolution
FAQ
Frequently Asked Questions
Can I create primary keys on distributed tables?
How do I add nodes to an existing Citus cluster?
How does Citus handle failure of a worker node?
How does Citus handle failover of the coordinator node?
How do I ingest the results of a query into a distributed table?
Can I join distributed and non-distributed tables together in the same query?
Are there any PostgreSQL features not supported by Citus?
How do I choose the shard count when I hash-partition my data?
How do I change the shard count for a hash partitioned table?
How does citus support count(distinct) queries?
In which situations are uniqueness constraints supported on distributed tables?
How do I create database roles, functions, extensions etc in a Citus cluster?
What if a worker node’s address changes?
Which shard contains data for a particular tenant?
I forgot the distribution column of a table, how do I find it?
Can I distribute a table by multiple keys?
Why does pg_relation_size report zero bytes for a distributed table?
Why am I seeing an error about max_intermediate_result_size?
Can I run Citus on Heroku?
Can I run Citus on Amazon RDS?
Can I use Citus with my existing AWS account?
Can I shard by schema on Citus for multi-tenant applications?
How does cstore_fdw work with Citus?
What happened to pg_shard?
Articles
Related Articles
Introducing Citus Add-on for Heroku: Scale out your Postgres
Continuous protection and monitoring with Citus Cloud
The insights you need
Granular access control in Citus Cloud
Scale out Postgres on Heroku today
Efficient Rollup Tables with HyperLogLog in Postgres
Rollup tables without HLL—using GitHub events data as an example
Without HLL, rollup tables have a few limitations
HLL to the rescue
HLL and rollup tables in action, together
What kinds of queries can HLL answer?
A rollup table with HLL is worth a thousand rollup tables without HLL
Want to learn more about HLL in Postgres?
Distributed Distinct Count with HyperLogLog on Postgres
What does HLL do behind the curtains?
Hash all elements
Observe the data for rare patterns
Stochastic Averaging
More?
HLL in distributed systems
Hands on with HLL
Setup
Examples
Conclusion
How to Scale PostgreSQL on AWS: Learnings from Citus Cloud
1. High Availability
2. Effortless Failovers
3. Vertical Scalability
4. Horizontal Scalability
5. Automatic Backups for Disaster Recovery
6. Monitoring, Alerts, and Logging
7. Auto-configure, patch, and upgrade
Conclusion
Postgres Parallel Indexing in Citus
Real-time Event Aggregation at Scale Using Postgres with Citus
How Distributed Outer Joins on PostgreSQL with Citus Work
Distributed Outer Joins with Citus
Designing your SaaS Database for Scale with Postgres
Building a Scalable Postgres Metrics Backend using the Citus Extension
Time-Series Metrics
Events
Sharding a Multi-Tenant App with Postgres
Tenancy
Multi-tenancy and co-location, a perfect pair
In conclusion
Sharding Postgres with Semi-Structured Data and Its Performance Implications
One large table, without joins
Enter Citus
The query workload
Every distribution has its thorns
Scalable Real-time Product Search using PostgreSQL with Citus
Citus Docs
»
Additional Features
Edit on GitHub
Additional Features
¶
Extensions
Forking
How to Fork a Formation
When is it Useful
How it Works Internally
Followers
Masterless Mode (beta)
Architecture
Data Access
Scaling Out a Raw Events Table
Limitations Compared to Citus
Custom PostgreSQL Configuration
Read the Docs
v: v7.4
Versions
latest
stable
v11.3
v11.2
v11.1
v11.0
v10.2
v10.1
v10.0
v9.5
v9.4
v9.3
v9.2
v9.1
v9.0
v8.3
v8.2
v8.1
v8.0
v7.5
v7.4
v7.3
v7.2
v7.1
v7.0
v6.2.2
v6.2
v6.1
v6.0
v5.2
v5.1
v5.0
a11y-2
Downloads
pdf
html
epub
On Read the Docs
Project Home
Builds