Citus Docs
v7.4

Get Started

  • What is Citus?
  • When to Use Citus
    • Multi-Tenant Database
    • Real-Time Analytics
      • Considerations for Use
      • When Citus is Inappropriate
  • Concepts
    • Distributed Architecture
      • Nodes: Coordinator and Workers
    • Table Types
      • Type 1: Distributed Tables
        • Distribution Column
      • Type 2: Reference Tables
      • Type 3: Local Tables
    • Shards
      • Shard Placements
    • Co-Location
    • Parallelism
  • Quick Tutorials
    • Multi-tenant Applications
      • Data model and sample data
      • Creating tables
      • Distributing tables and loading data
      • Running queries
    • Real-time Analytics
      • Data model and sample data
      • Creating tables
      • Distributing tables and loading data
      • Running queries

Install

  • Single-Machine Cluster
    • Docker (Mac or Linux)
    • Ubuntu or Debian
    • Fedora, CentOS, or Red Hat
  • Multi-Machine Cluster
    • Ubuntu or Debian
      • Steps to be executed on all nodes
      • Steps to be executed on the coordinator node
    • Fedora, CentOS, or Red Hat
      • Steps to be executed on all nodes
      • Steps to be executed on the coordinator node
    • AWS CloudFormation
  • Deploy on Citus Cloud

Use-Case Guides

  • Multi-tenant Applications
    • Let’s Make an App – Ad Analytics
    • Scaling the Relational Data Model
    • Preparing Tables and Ingesting Data
      • Try it Yourself
    • Integrating Applications
    • Sharing Data Between Tenants
    • Online Changes to the Schema
    • When Data Differs Across Tenants
    • Scaling Hardware Resources
    • Dealing with Big Tenants
    • Where to Go From Here
  • Real-Time Dashboards
    • Data Model
    • Rollups
    • Expiring Old Data
    • Approximate Distinct Counts
    • Unstructured Data with JSONB
  • Timeseries Data
    • Scaling Timeseries Data on Citus
    • Automating Partition Creation

Develop

  • Determining Application Type
    • At a Glance
    • Examples and Characteristics
  • Choosing Distribution Column
    • Multi-Tenant Apps
      • Best Practices
    • Real-Time Apps
      • Best Practices
    • Timeseries Data
      • Best Practices
    • Table Co-Location
      • Data co-location in Citus for hash-distributed tables
      • A practical example of co-location
      • Using Regular PostgreSQL Tables
      • Distributing tables by ID
      • Distributing tables by tenant
      • Co-location means better feature support
      • Query Performance
  • Migrating an Existing App
    • Multi-Tenant Schema Migration
      • Backfilling Tenant ID
    • Multi-Tenant Query Migration
      • Validating Query Migration
    • Ruby on Rails
      • Preparing to scale-out a multi-tenant application
      • Updating the Rails Application
      • Example Application
    • Django
      • Preparing to scale-out a multi-tenant application
      • Updating the Django Application
    • ASP.NET
      • Example App
        • Schema
        • ASP.NET Core project
      • Adding Tenancy to App
        • Define the Entity Framework Core context and models
        • Install SaasKit
        • Update views
        • Query the database
        • Test the app
    • Data Migration
      • Using Citus Warp
        • 1. Duplicate schema
        • 2. Enable logical replication
        • 3. Open access for network connection
        • 4. Begin Replication
        • 5. Switch Production System to Citus Cloud
    • Real-Time Analytics Apps
  • SQL Reference
    • Creating and Modifying Distributed Tables (DDL)
      • Creating And Distributing Tables
        • Reference Tables
        • Distributing Coordinator Data
      • Co-Locating Tables
        • Upgrading from Citus 5.x
      • Dropping Tables
      • Modifying Tables
        • Adding/Modifying Columns
        • Adding/Removing Constraints
        • Adding/Removing Indices
        • Manual Modification
    • Ingesting, Modifying Data (DML)
      • Inserting Data
        • Single Row Inserts
        • Multi-Row Inserts
        • Bulk Loading
        • Distributed Aggregations
      • Updates and Deletion
      • Maximizing Write Performance
    • Querying Distributed Tables (SQL)
      • Aggregate Functions
        • Count (Distinct) Aggregates
      • Limit Pushdown
      • Views on Distributed Tables
      • Joins
        • Co-located joins
        • Reference table joins
        • Repartition joins
    • Query Processing
      • Distributed Query Planner
      • Distributed Query Executor
        • Real-time Executor
        • Router Executor
        • Task Tracker Executor
        • Subquery/CTE Push-Pull Execution
      • PostgreSQL planner and executor
    • Manual Query Propagation
      • Running on all Workers
      • Running on all Shards
      • Running on all Placements
      • Limitations
    • SQL Support and Workarounds
      • Workarounds
        • JOIN a local and a distributed table
      • INSERT…SELECT upserts lacking distribution column
        • Temp Tables: the Last Resort
  • Citus API
    • Citus Utility Functions
      • Table and Shard DDL
        • create_distributed_table
        • create_reference_table
        • upgrade_to_reference_table
        • mark_tables_colocated
        • master_create_distributed_table
        • master_create_worker_shards
        • master_create_empty_shard
      • Table and Shard DML
        • master_append_table_to_shard
        • master_apply_delete_command
        • master_modify_multiple_shards
      • Metadata / Configuration Information
        • master_add_node
        • master_update_node
        • master_add_inactive_node
        • master_activate_node
        • master_disable_node
        • master_add_secondary_node
        • master_remove_node
        • master_get_active_worker_nodes
        • master_get_table_metadata
        • get_shard_id_for_distribution_column
        • column_to_column_name
        • citus_relation_size
        • citus_table_size
        • citus_total_relation_size
      • Cluster Management And Repair Functions
        • master_copy_shard_placement
        • master_move_shard_placement
        • rebalance_table_shards
        • get_rebalance_progress
        • replicate_table_shards
        • isolate_tenant_to_new_shard
        • citus_create_restore_point
    • Metadata Tables
      • Partition table
      • Shard table
        • Shard Storage Types
      • Shard placement table
        • Shard Placement States
      • Worker node table
      • Co-location group table
    • Configuration Reference
      • General configuration
        • citus.max_worker_nodes_tracked (integer)
        • citus.use_secondary_nodes (enum)
        • citus.cluster_name (text)
        • citus.enable_version_checks (boolean)
        • citus.log_distributed_deadlock_detection (boolean)
        • citus.distributed_deadlock_detection_factor (floating point)
      • Data Loading
        • citus.multi_shard_commit_protocol (enum)
        • citus.shard_replication_factor (integer)
        • citus.shard_count (integer)
        • citus.shard_max_size (integer)
      • Planner Configuration
        • citus.limit_clause_row_fetch_count (integer)
        • citus.count_distinct_error_rate (floating point)
        • citus.task_assignment_policy (enum)
      • Intermediate Data Transfer
        • citus.binary_worker_copy_format (boolean)
        • citus.binary_master_copy_format (boolean)
        • citus.max_intermediate_result_size (integer)
      • DDL
        • citus.enable_ddl_propagation (boolean)
      • Executor Configuration
        • citus.all_modifications_commutative
        • citus.max_task_string_size (integer)
        • citus.remote_task_check_interval (integer)
        • citus.task_executor_type (enum)
        • citus.multi_task_query_log_level (enum)
        • Real-time executor configuration
        • Task tracker executor configuration
        • Explain output
    • Append Distribution
      • Creating and Distributing Tables
      • Expiring Data
      • Deleting Data
      • Dropping Tables
      • Data Loading
        • Bulk load using \copy
        • Incremental loads by appending to existing shards
        • Increasing data loading performance
      • Scaling Data Ingestion
        • Coordinator Node Bulk Ingestion (100k/s-200k/s)
        • Worker Node Bulk Ingestion (100k/s-1M/s)
        • Pre-processing Data in Citus
  • External Integrations
    • Ingesting Data from Kafka
      • Caveats
    • Ingesting Data from Spark
    • Business Intelligence with Tableau

Citus Cloud

  • Get Started
    • Provisioning
      • Configuring Your Plan
      • Supported Regions
      • Other infrastructure providers
    • Connecting
      • Connecting Directly to a Worker
  • Manage
    • Scaling
      • Scaling Up (increasing node size)
      • Scaling Out (adding new nodes)
        • Rebalancing
      • Scaling Connections (pgBouncer)
    • Monitoring
      • Cloud Platform Status
      • Resources Usage
        • Amazon EBS Volume Metrics
        • CPU and Network
        • PostgreSQL Write-Ahead Log
      • Formation Events Feed
      • StatsD external reporting
        • Example: Datadog with statsd
      • VividCortex External Monitoring
    • Security
      • Connecting with SSL
        • Two-Factor Authentication
      • Users and Permissions
        • Granting Privileges in Bulk
      • Encryption at Rest
      • Network Perimeter Controls
    • Backup, Availability, and Replication
      • High-Availability (HA) Replication
      • Disaster Recovery (DR)
      • Comparison of HA and DR
        • Disaster Recovery takes a little extra work but gives greater reliability
        • Trade-offs between latency and reliability for backups
        • How High Availability and Disaster Recovery is used for crash recovery
      • Point-in-Time Recovery
    • Upgrades
    • Logging
      • What Is Logged
      • Recent Logs
      • External Log Destinations
        • Verified Provider Settings
  • Additional Features
    • Extensions
    • Forking
      • How to Fork a Formation
      • When is it Useful
      • How it Works Internally
    • Followers
    • Masterless Mode (beta)
      • Architecture
      • Data Access
      • Scaling Out a Raw Events Table
      • Limitations Compared to Citus
    • Custom PostgreSQL Configuration
  • Support and Billing
    • Support
    • Billing
      • Pricing
      • Payments

Administer

  • Cluster Management
    • Choosing Cluster Size
      • Shard Count
        • Multi-Tenant SaaS Use-Case
        • Real-Time Analytics Use-Case
    • Initial Hardware Size
      • Multi-Tenant SaaS Use-Case
      • Real-Time Analytics Use-Case
    • Scaling the cluster
      • Add a worker
      • Rebalance Shards without Downtime
        • How it Works
      • Adding a coordinator
    • Dealing With Node Failures
      • Worker Node Failures
      • Coordinator Node Failures
    • Tenant Isolation
    • Worker Security
    • PostgreSQL extensions
    • Checks For Updates and Cluster Statistics
      • What we Collect
      • How to Opt Out
  • Table Management
    • Determining Table and Relation Size
    • Vacuuming Distributed Tables
    • Analyzing Distributed Tables
  • Upgrading Citus
    • Upgrading Citus Versions
      • Patch Version Upgrade
      • Major and Minor Version Upgrades
        • Step 1. Update Citus Package
        • Step 2. Apply Update in DB
    • Upgrading PostgreSQL version from 9.6 to 10.0
      • On the Coordinator Node
      • On Worker Nodes

Troubleshoot

  • Query Performance Tuning
    • Table Distribution and Shards
    • PostgreSQL tuning
    • Scaling Out Performance
    • Distributed Query Performance Tuning
      • General
      • Subquery/CTE Network Overhead
      • Advanced
        • Task Assignment Policy
        • Intermediate Data Transfer Format
        • Real Time Executor
        • Task Tracker Executor
    • Scaling Out Data Ingestion
      • Real-time Insert and Updates
        • Insert Throughput
        • Update Throughput
        • Insert and Update: Throughput Checklist
        • Insert and Update: Latency
      • Bulk Copy (250K - 2M/s)
      • Masterless Citus (50k/s-500k/s)
  • Useful Diagnostic Queries
    • Finding which shard contains data for a specific tenant
    • Finding the distribution column for a table
    • Detecting locks
    • Querying the size of your shards
    • Querying the size of all distributed tables
    • Determining Replication Factor per Table
    • Identifying unused indices
    • Monitoring client connection count
    • Index hit rate
  • Common Error Messages
    • Failed to execute task n
      • Resolution
    • Relation foo is not distributed
      • Resolution
    • Could not receive query results
      • Resolution
    • Canceling the transaction since it was involved in a distributed deadlock
      • Resolution
    • Cannot establish a new connection for placement n, since DML has been executed on a connection that is in use
      • Resolution
    • Could not connect to server: Cannot assign requested address
      • Resolution
    • Remaining connection slots are reserved for non-replication superuser connections
      • Resolution
    • PgBouncer cannot connect to server
      • Resolution
    • Unsupported clause type
      • Resolution
    • Cannot open new connections after the first modification command within a transaction
      • Resolution
    • ON CONFLICT is not supported via coordinator
      • Resolution

FAQ

  • Frequently Asked Questions
    • Can I create primary keys on distributed tables?
    • How do I add nodes to an existing Citus cluster?
    • How does Citus handle failure of a worker node?
    • How does Citus handle failover of the coordinator node?
    • How do I ingest the results of a query into a distributed table?
    • Can I join distributed and non-distributed tables together in the same query?
    • Are there any PostgreSQL features not supported by Citus?
    • How do I choose the shard count when I hash-partition my data?
    • How do I change the shard count for a hash partitioned table?
    • How does citus support count(distinct) queries?
    • In which situations are uniqueness constraints supported on distributed tables?
    • How do I create database roles, functions, extensions etc in a Citus cluster?
    • What if a worker node’s address changes?
    • Which shard contains data for a particular tenant?
    • I forgot the distribution column of a table, how do I find it?
    • Can I distribute a table by multiple keys?
    • Why does pg_relation_size report zero bytes for a distributed table?
    • Why am I seeing an error about max_intermediate_result_size?
    • Can I run Citus on Heroku?
    • Can I run Citus on Amazon RDS?
    • Can I use Citus with my existing AWS account?
    • Can I shard by schema on Citus for multi-tenant applications?
    • How does cstore_fdw work with Citus?
    • What happened to pg_shard?

Articles

  • Related Articles
    • Introducing Citus Add-on for Heroku: Scale out your Postgres
      • Continuous protection and monitoring with Citus Cloud
      • The insights you need
      • Granular access control in Citus Cloud
      • Scale out Postgres on Heroku today
    • Efficient Rollup Tables with HyperLogLog in Postgres
      • Rollup tables without HLL—using GitHub events data as an example
      • Without HLL, rollup tables have a few limitations
      • HLL to the rescue
      • HLL and rollup tables in action, together
      • What kinds of queries can HLL answer?
      • A rollup table with HLL is worth a thousand rollup tables without HLL
      • Want to learn more about HLL in Postgres?
    • Distributed Distinct Count with HyperLogLog on Postgres
      • What does HLL do behind the curtains?
        • Hash all elements
        • Observe the data for rare patterns
        • Stochastic Averaging
        • More?
      • HLL in distributed systems
      • Hands on with HLL
        • Setup
        • Examples
        • Conclusion
    • How to Scale PostgreSQL on AWS: Learnings from Citus Cloud
      • 1. High Availability
      • 2. Effortless Failovers
      • 3. Vertical Scalability
      • 4. Horizontal Scalability
      • 5. Automatic Backups for Disaster Recovery
      • 6. Monitoring, Alerts, and Logging
      • 7. Auto-configure, patch, and upgrade
      • Conclusion
    • Postgres Parallel Indexing in Citus
    • Real-time Event Aggregation at Scale Using Postgres with Citus
    • How Distributed Outer Joins on PostgreSQL with Citus Work
      • Distributed Outer Joins with Citus
    • Designing your SaaS Database for Scale with Postgres
    • Building a Scalable Postgres Metrics Backend using the Citus Extension
      • Time-Series Metrics
      • Events
    • Sharding a Multi-Tenant App with Postgres
      • Tenancy
      • Multi-tenancy and co-location, a perfect pair
      • In conclusion
    • Sharding Postgres with Semi-Structured Data and Its Performance Implications
      • One large table, without joins
      • Enter Citus
      • The query workload
      • Every distribution has its thorns
    • Scalable Real-time Product Search using PostgreSQL with Citus
Citus Docs
  • »
  • Search


© Copyright 2023, Citus Data. Revision 90c7765d.