Logo Citus Docs
v7.3

Get Started

  • What is Citus?
    • When to Use Citus
      • Multi-Tenant Database
      • Real-Time Analytics
    • Considerations for Use
    • When Citus is Inappropriate
  • Concepts
    • Distributed Architecture
      • Nodes: Coordinator and Workers
    • Table Types
      • Type 1: Distributed Tables
        • Distribution Column
      • Type 2: Reference Tables
      • Type 3: Local Tables
    • Shards
      • Shard Placements
    • Co-Location
    • Parallelism
  • Quick Tutorials
    • Multi-tenant Applications
      • Data model and sample data
      • Creating tables
      • Distributing tables and loading data
      • Running queries
    • Real-time Analytics
      • Data model and sample data
      • Creating tables
      • Distributing tables and loading data
      • Running queries

Install

  • Single-Machine Cluster
    • Docker (Mac or Linux)
    • Ubuntu or Debian
    • Fedora, CentOS, or Red Hat
  • Multi-Machine Cluster
    • Ubuntu or Debian
      • Steps to be executed on all nodes
      • Steps to be executed on the coordinator node
    • Fedora, CentOS, or Red Hat
      • Steps to be executed on all nodes
      • Steps to be executed on the coordinator node
    • AWS CloudFormation
  • Deploy on Citus Cloud

Use-Case Guides

  • Multi-tenant Applications
    • Let’s Make an App – Ad Analytics
    • Scaling the Relational Data Model
    • Preparing Tables and Ingesting Data
      • Try it Yourself
    • Integrating Applications
    • Sharing Data Between Tenants
    • Online Changes to the Schema
    • When Data Differs Across Tenants
    • Scaling Hardware Resources
    • Dealing with Big Tenants
    • Where to Go From Here
  • Real-Time Dashboards
    • Data Model
    • Rollups
    • Expiring Old Data
    • Approximate Distinct Counts
    • Unstructured Data with JSONB
  • Timeseries Data
    • Scaling Timeseries Data on Citus
    • Automating Partition Creation

Develop

  • Determining Application Type
  • Choosing Distribution Column
    • Multi-Tenant Apps
      • Best Practices
    • Real-Time Apps
      • Best Practices
    • Table Co-Location
      • Data co-location in Citus for hash-distributed tables
      • A practical example of co-location
      • Using Regular PostgreSQL Tables
      • Distributing tables by ID
      • Distributing tables by tenant
      • Co-location means better feature support
      • Query Performance
  • Migrating an Existing App
    • Multi-Tenant Schema Migration
      • Backfilling Tenant ID
    • Multi-Tenant Query Migration
      • Validating Query Migration
    • Ruby on Rails
      • Preparing to scale-out a multi-tenant application
      • Updating the Rails Application
      • Example Application
    • Django
      • Preparing to scale-out a multi-tenant application
      • Updating the Django Application
    • Data Migration
      • Using Citus Warp
        • 1. Duplicate schema
        • 2. Enable logical replication
        • 3. Open access for network connection
        • 4. Begin Replication
        • 5. Switch Production System to Citus Cloud
    • Real-Time Analytics Apps
  • SQL Reference
    • Creating and Modifying Distributed Tables (DDL)
      • Creating And Distributing Tables
        • Reference Tables
        • Distributing Coordinator Data
      • Co-Locating Tables
        • Upgrading from Citus 5.x
      • Dropping Tables
      • Modifying Tables
        • Adding/Modifying Columns
        • Adding/Removing Constraints
        • Adding/Removing Indices
        • Manual Modification
    • Ingesting, Modifying Data (DML)
      • Inserting Data
        • Single Row Inserts
        • Multi-Row Inserts
        • Bulk Loading
        • Distributed Aggregations
      • Updates and Deletion
      • Maximizing Write Performance
    • Querying Distributed Tables (SQL)
      • Aggregate Functions
        • Count (Distinct) Aggregates
      • Limit Pushdown
      • Views on Distributed Tables
      • Joins
        • Co-located joins
        • Reference table joins
        • Repartition joins
    • Query Processing
      • Distributed Query Planner
      • Distributed Query Executor
        • Real-time Executor
        • Router Executor
        • Task Tracker Executor
        • Subquery/CTE Push-Pull Execution
      • PostgreSQL planner and executor
    • Manual Query Propagation
      • Running on all Workers
      • Running on all Shards
      • Running on all Placements
      • Limitations
    • SQL Support and Workarounds
      • Workarounds
        • JOIN a local and a distributed table
        • Temp Tables: the Last Resort
  • Citus API
    • Citus Utility Functions
      • Table and Shard DDL
        • create_distributed_table
        • create_reference_table
        • upgrade_to_reference_table
        • master_create_distributed_table
        • master_create_worker_shards
        • master_create_empty_shard
      • Table and Shard DML
        • master_append_table_to_shard
        • master_apply_delete_command
        • master_modify_multiple_shards
      • Metadata / Configuration Information
        • master_add_node
        • master_update_node
        • master_add_inactive_node
        • master_activate_node
        • master_disable_node
        • master_add_secondary_node
        • master_remove_node
        • master_get_active_worker_nodes
        • master_get_table_metadata
        • get_shard_id_for_distribution_column
        • column_to_column_name
        • citus_relation_size
        • citus_table_size
        • citus_total_relation_size
      • Cluster Management And Repair Functions
        • master_copy_shard_placement
        • master_move_shard_placement
        • rebalance_table_shards
        • replicate_table_shards
        • isolate_tenant_to_new_shard
    • Metadata Tables
      • Partition table
      • Shard table
        • Shard Storage Types
      • Shard placement table
        • Shard Placement States
      • Worker node table
      • Co-location group table
    • Configuration Reference
      • General configuration
        • citus.max_worker_nodes_tracked (integer)
        • citus.use_secondary_nodes (enum)
        • citus.cluster_name (text)
        • citus.enable_version_checks (bool)
        • citus.log_distributed_deadlock_detection (boolean)
        • citus.distributed_deadlock_detection_factor (floating point)
      • Data Loading
        • citus.multi_shard_commit_protocol (enum)
        • citus.shard_replication_factor (integer)
        • citus.shard_count (integer)
        • citus.shard_max_size (integer)
      • Planner Configuration
        • citus.limit_clause_row_fetch_count (integer)
        • citus.count_distinct_error_rate (floating point)
        • citus.task_assignment_policy (enum)
      • Intermediate Data Transfer
        • citus.binary_worker_copy_format (boolean)
        • citus.binary_master_copy_format (boolean)
        • citus.max_intermediate_result_size (integer)
      • DDL
        • citus.enable_ddl_propagation (boolean)
      • Executor Configuration
        • citus.all_modifications_commutative
        • citus.max_task_string_size (integer)
        • citus.remote_task_check_interval (integer)
        • citus.task_executor_type (enum)
        • citus.multi_task_query_log_level (enum)
        • Real-time executor configuration
        • Task tracker executor configuration
        • Explain output
    • Append Distribution
      • Creating and Distributing Tables
      • Expiring Data
      • Deleting Data
      • Dropping Tables
      • Data Loading
        • Bulk load using \copy
        • Incremental loads by appending to existing shards
        • Increasing data loading performance
      • Scaling Data Ingestion
        • Coordinator Node Bulk Ingestion (100k/s-200k/s)
        • Worker Node Bulk Ingestion (100k/s-1M/s)
        • Pre-processing Data in Citus

Administer

  • Cluster Management
    • Choosing Cluster Size
      • Shard Count
        • Multi-Tenant SaaS Use-Case
        • Real-Time Analytics Use-Case
    • Initial Hardware Size
      • Multi-Tenant SaaS Use-Case
      • Real-Time Analytics Use-Case
    • Scaling the cluster
      • Add a worker
      • Rebalance Shards without Downtime
        • How it Works
      • Adding a coordinator
    • Dealing With Node Failures
      • Worker Node Failures
      • Coordinator Node Failures
    • Tenant Isolation
    • Worker Security
    • PostgreSQL extensions
    • Checks For Updates and Cluster Statistics
      • What we Collect
      • How to Opt Out
  • Table Management
    • Determining Table and Relation Size
    • Vacuuming Distributed Tables
    • Analyzing Distributed Tables
  • Upgrading Citus
    • Upgrading Citus Versions
      • Patch Version Upgrade
      • Major and Minor Version Upgrades
        • Step 1. Update Citus Package
        • Step 2. Apply Update in DB
    • Upgrading PostgreSQL version from 9.6 to 10.0
      • On the Coordinator Node
      • On Worker Nodes

Citus Cloud

  • Get Started
    • Provisioning
      • Configuring Your Plan
      • Supported Regions
      • Other infrastructure providers
    • Connecting
      • Connecting Directly to a Worker
  • Manage
    • Scaling
      • Scaling Up (increasing node size)
      • Scaling Out (adding new nodes)
        • Rebalancing
      • Scaling Connections (pgBouncer)
    • Monitoring
      • Resources Usage
        • Amazon EBS Volume Metrics
        • CPU and Network
        • PostgreSQL Write-Ahead Log
      • Formation Events Feed
      • StatsD external reporting
        • Example: Datadog with statsd
      • Systemic Cloud Status
    • Security
      • Connecting with SSL
        • Two-Factor Authentication
      • Users and Permissions
        • Granting Privileges in Bulk
      • Encryption at Rest
      • Network Perimeter Controls
    • Backup, Availability, and Replication
      • High-Availability (HA) Replication
      • Disaster Recovery (DR)
      • Comparison of HA and DR
        • Disaster Recovery takes a little extra work but gives greater reliability
        • Trade-offs between latency and reliability for backups
        • How High Availability and Disaster Recovery is used for crash recovery
      • Point-in-Time Recovery
    • Logging
      • What Is Logged
      • Recent Logs
      • External Log Destinations
        • Verified Provider Settings
  • Additional Features
    • Extensions
    • Forking
      • How to Fork a Formation
      • When is it Useful
      • How it Works Internally
    • Followers
    • Masterless Mode (beta)
      • Architecture
      • Data Access
      • Scaling Out a Raw Events Table
      • Limitations Compared to Citus
    • Custom PostgreSQL Configuration
  • Support and Billing
    • Support
    • Billing and pricing

Troubleshoot

  • Query Performance Tuning
    • Table Distribution and Shards
    • PostgreSQL tuning
    • Scaling Out Performance
    • Distributed Query Performance Tuning
      • General
      • Subquery/CTE Network Overhead
      • Advanced
        • Task Assignment Policy
        • Intermediate Data Transfer Format
        • Real Time Executor
        • Task Tracker Executor
    • Scaling Out Data Ingestion
      • Real-time Insert and Updates
        • Insert Throughput
        • Update Throughput
        • Insert and Update: Throughput Checklist
        • Insert and Update: Latency
      • Bulk Copy (250K - 2M/s)
      • Masterless Citus (50k/s-500k/s)
  • Useful Diagnostic Queries
    • Finding which shard contains data for a specific tenant
    • Finding the distribution column for a table
    • Detecting locks
    • Querying the size of your shards
    • Querying the size of all distributed tables
    • Detetermining Replication Factor per Table
    • Identifying unused indices
    • Monitoring client connection count
    • Index hit rate
  • Common Error Messages
    • Failed to execute task n
      • Resolution
    • Relation foo is not distributed
      • Resolution
    • Could not receive query results
      • Resolution
    • Canceling the transaction since it was involved in a distributed deadlock
      • Resolution
    • Cannot establish a new connection for placement n, since DML has been executed on a connection that is in use
      • Resolution
    • Could not connect to server: Cannot assign requested address
      • Resolution
    • Remaining connection slots are reserved for non-replication superuser connections
      • Resolution
    • PgBouncer cannot connect to server
      • Resolution
    • Unsupported clause type
      • Resolution
    • Cannot open new connections after the first modification command within a transaction
      • Resolution

FAQ

  • Frequently Asked Questions
    • Can I create primary keys on distributed tables?
    • How do I add nodes to an existing Citus cluster?
    • How does Citus handle failure of a worker node?
    • How does Citus handle failover of the coordinator node?
    • How do I ingest the results of a query into a distributed table?
    • Can I join distributed and non-distributed tables together in the same query?
    • Are there any PostgreSQL features not supported by Citus?
    • How do I choose the shard count when I hash-partition my data?
    • How do I change the shard count for a hash partitioned table?
    • How does citus support count(distinct) queries?
    • In which situations are uniqueness constraints supported on distributed tables?
    • How do I create database roles, functions, extensions etc in a Citus cluster?
    • What if a worker node’s address changes?
    • Which shard contains data for a particular tenant?
    • I forgot the distribution column of a table, how do I find it?
    • Why does pg_relation_size report zero bytes for a distributed table?
    • Why am I seeing an error about max_intermediate_result_size?
    • Can I run Citus on Heroku or Amazon RDS?
    • Can I shard by schema on Citus for multi-tenant applications?
    • How does cstore_fdw work with Citus?
    • What happened to pg_shard?

Articles

  • Introducing Citus Add-on for Heroku: Scale out your Postgres
    • Continuous protection and monitoring with Citus Cloud
    • The insights you need
    • Granular access control in Citus Cloud
    • Scale out Postgres on Heroku today
  • Efficient Rollup Tables with HyperLogLog in Postgres
    • Rollup tables without HLL—using GitHub events data as an example
    • Without HLL, rollup tables have a few limitations
    • HLL to the rescue
    • HLL and rollup tables in action, together
    • What kinds of queries can HLL answer?
    • A rollup table with HLL is worth a thousand rollup tables without HLL
    • Want to learn more about HLL in Postgres?
  • Distributed Distinct Count with HyperLogLog on Postgres
    • What does HLL do behind the curtains?
      • Hash all elements
      • Observe the data for rare patterns
      • Stochastic Averaging
      • More?
    • HLL in distributed systems
    • Hands on with HLL
      • Setup
      • Examples
      • Conclusion
  • How to Scale PostgreSQL on AWS: Learnings from Citus Cloud
    • 1. High Availability
    • 2. Effortless Failovers
    • 3. Vertical Scalability
    • 4. Horizontal Scalability
    • 5. Automatic Backups for Disaster Recovery
    • 6. Monitoring, Alerts, and Logging
    • 7. Auto-configure, patch, and upgrade
    • Conclusion
  • Postgres Parallel Indexing in Citus
  • Real-time Event Aggregation at Scale Using Postgres with Citus
  • How Distributed Outer Joins on PostgreSQL with Citus Work
    • Distributed Outer Joins with Citus
  • Designing your SaaS Database for Scale with Postgres
  • Building a Scalable Postgres Metrics Backend using the Citus Extension
    • Time-Series Metrics
    • Events
  • Sharding a Multi-Tenant App with Postgres
    • Tenancy
    • Multi-tenancy and co-location, a perfect pair
    • In conclusion
  • Sharding Postgres with Semi-Structured Data and Its Performance Implications
    • One large table, without joins
    • Enter Citus
    • The query workload
    • Every distribution has its thorns
  • Scalable Real-time Product Search using PostgreSQL with Citus
Citus Docs
  • »
  • Citus Documentation
  • Edit on GitHub

Citus Documentation¶

Welcome to the documentation for Citus 7.3! Citus horizontally scales PostgreSQL across commodity servers using sharding and replication. Its query engine parallelizes incoming SQL queries across these servers to enable real-time responses on large datasets.

Getting Started

Learn the Citus architecture, install locally, and follow some ten-minute tutorials.

Use Cases

See how Citus allows multi-tenant applications to scale with minimal database changes.

Migrating to Citus

Move from plain PostgreSQL to Citus, and discover data modeling techniques for distributed systems.

Citus Cloud

Explore our secure, scalable, highly available database-as-a-service.

API / Reference

Get the most out of Citus by learning its functions and configuration.

Help and Support

See the frequently asked questions, and contact us. This is the page to get unstuck.
_images/number-one.png _images/use-cases.png _images/migrating.png _images/cloud.png _images/reference.png _images/help.png _images/logo.png
Next

© Copyright 2022, Citus Data. Revision 609f724e.

Read the Docs v: v7.3
Versions
latest
stable
v11.0-beta
v10.2
v10.1
v10.0
v9.5
v9.4
v9.3
v9.2
v9.1
v9.0
v8.3
v8.2
v8.1
v8.0
v7.5
v7.4
v7.3
v7.2
v7.1
v7.0
v6.2.2
v6.2
v6.1
v6.0
v5.2
v5.1
v5.0
a11y-2
Downloads
pdf
html
epub
On Read the Docs
Project Home
Builds