From 8a427c0b0f005e2ae709d2b61edc32850867f41d Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Sun, 28 Sep 2025 13:00:49 +0800 Subject: [PATCH 01/55] This PR adds a new guide on mastering partitioned tables in TiDB - Query optimization with partition pruning - Performance comparison: Non-Partitioned vs Local Index vs Global Index - Data cleanup efficiency: TTL vs DROP PARTITION - Partition drop performance: Local Index vs Global Index - Strategies to mitigate write hotspot issues with hash/key partitioning - Partition management challenges and best practices - Avoiding read/write hotspots on new partitions - Using PRE_SPLIT_REGIONS, SHARD_ROW_ID_BITS, and region splitting - Converting between partitioned and non-partitioned tables - Batch DML, Pipeline DML, IMPORT INTO, and Online DDL efficiency comparison --- tidb_partitioned_tables_guide.md | 690 +++++++++++++++++++++++++++++++ 1 file changed, 690 insertions(+) create mode 100644 tidb_partitioned_tables_guide.md diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md new file mode 100644 index 0000000000000..6fc54fa10a3e3 --- /dev/null +++ b/tidb_partitioned_tables_guide.md @@ -0,0 +1,690 @@ +# Mastering TiDB Partitioned Tables + +## Introduction + +Partitioned tables in TiDB offer a versatile approach to managing large datasets, improving query efficiency, facilitating bulk data deletion, and alleviating write hotspot issues. By dividing data into logical segments, TiDB can leverage **partition pruning** to skip irrelevant data during query execution, reducing resource consumption and accelerating performance—particularly in OLAP workloads with massive datasets. + +A common use case is **range partitioning combined with local indexes**, which enables efficient historical data cleanup through operations like DROP PARTITION. This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning—such as those lacking partition key filters—may experience degraded performance. In such cases, [**global indexes**](https://docs.pingcap.com/tidb/stable/partitioned-table/#global-indexes) can be introduced to mitigate the performance impact by providing a unified index structure across all partitions. + +Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on AUTO_INCREMENT-style IDs where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. + +While partitioning offers clear benefits, it also presents **common challenges**, such as **hotspots caused by newly created range partitions**. To address this, TiDB provides techniques for automatic or manual region pre-splitting, ensuring balanced data distribution and avoiding bottlenecks. + +This document examines partitioned tables in TiDB from multiple angles, including query optimization, data cleanup, write scalability, and index management. Through detailed scenarios and best practices, it aims to equip you with the knowledge to make informed decisions about when and how to adopt partitioning strategies in your TiDB environment. 
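As a minimal sketch of these ideas (the table and partition names below are hypothetical, not taken from the workloads discussed later), the following range-partitioned table lets a date filter prune down to a single partition, while expired data can be removed with DROP PARTITION:

```sql
CREATE TABLE events (
    id BIGINT NOT NULL,
    created_at DATETIME NOT NULL,
    payload VARCHAR(255) DEFAULT NULL,
    PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE COLUMNS (created_at) (
    PARTITION p202401 VALUES LESS THAN ('2024-02-01'),
    PARTITION p202402 VALUES LESS THAN ('2024-03-01'),
    PARTITION p202403 VALUES LESS THAN ('2024-04-01')
);

-- The filter on the partition key lets the optimizer read only p202402;
-- the pruned partition shows up in the access object column of the plan.
EXPLAIN SELECT * FROM events
WHERE created_at >= '2024-02-01' AND created_at < '2024-02-15';

-- Removing an expired partition is a fast, metadata-level operation.
ALTER TABLE events DROP PARTITION p202401;
```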
+ +## Agenda + +- Improving query efficiency + - Partition pruning + - Query performance comparison: Non-Partitioned Table vs. Local Index vs. Global Index +- Facilitating bulk data deletion + - Data cleanup efficiency: TTL vs. Direct Partition Drop + - Partition drop efficiency: Local Index vs Global Index +- Mitigating write hotspot issues +- Partition management challenge + - How to avoid hotspots caused by new range partitions +- Converting between partitioned and non-partitioned tables + +By understanding these aspects, you can make informed decisions on whether and how to implement partitioning in your TiDB environment. + +> **Note:** If you're new to partitioned tables in TiDB, we recommend reviewing the [Partitioned Table User Guide](https://docs.pingcap.com/tidb/stable/partitioned-table) first to better understand key concepts like partition pruning, global vs. local indexes, and partition strategies. + +## Improving Query Efficiency + +### Partition Pruning + +**Partition pruning** is an optimization technique that allows TiDB to reduce the amount of data scanned when executing queries against partitioned tables. Instead of scanning all partitions, TiDB analyzes the query's filter conditions and determines which partitions may contain relevant data, scanning only those partitions. This significantly improves query performance by reducing I/O and computation overhead. + +#### Applicable Scenarios + +Partition pruning is most beneficial in scenarios where query predicates match the partitioning strategy. Common use cases include: + +- **Time-series data queries**: When data is partitioned by time ranges (e.g., daily, monthly), queries restricted to a specific time period can quickly skip unrelated partitions. +- **Multi-tenant or category-based datasets**: Partitioning by tenant ID or category enables queries to focus on a small subset of partitions. +- **Hybrid Transactional/Analytical Processing (HTAP)**: Especially for range partitioning, TiDB can leverage partition pruning in analytical workloads on TiFlash to skip irrelevant partitions and scan only the necessary subset, preventing **full table scans** on large datasets. + +For more use cases, please refer to https://docs.pingcap.com/tidb/stable/partition-pruning/ + +### Query Performance on Secondary Index: Non-Partitioned Table vs. Local Index vs. Global Index + +In TiDB, local indexes are the default indexing strategy for partitioned tables. Each partition maintains its own set of indexes, while a Global Index refers to an index that spans all partitions in a partitioned table. Unlike Local Indexes, which are partition-specific and stored separately within each partition, a Global Index maintains a single, unified index across the entire table. This index includes references to all rows, regardless of which partition they belong to, and thus can provide global queries and operations, such as joins or lookups, with faster access. + +#### What Did We Test + +We evaluated query performance across three table configurations in TiDB: +- Non-Partitioned Table +- Partitioned Table with Global Index +- Partitioned Table with Local Index + +#### Test Setup + +- The query **accesses data via a secondary index** and uses IN conditions across multiple values. +- The **partitioned table** had **366 partitions**, defined by **range partitioning on a datetime column**. +- Each matching key could return **multiple rows**, simulating a **high-volume OLTP-style query pattern**. 
+- We also evaluated the **impact of different partition counts** to understand how partition granularity influences latency and index performance. + +#### Schema + +```sql +CREATE TABLE `fa` ( + `id` bigint NOT NULL AUTO_INCREMENT, + `account_id` bigint(20) NOT NULL, + `sid` bigint(20) DEFAULT NULL, + `user_id` bigint NOT NULL, + `yeardate` int NOT NULL, + PRIMARY KEY (`id`,`yeardate`) /*T![clustered_index] CLUSTERED */, + KEY `index_fa_on_sid` (`sid`), + KEY `index_fa_on_account_id` (`account_id`), + KEY `index_fa_on_user_id` (`user_id`) +) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin +AUTO_INCREMENT=1284046228560811404 +PARTITION BY RANGE (`yeardate`) +(PARTITION `fa_2024001` VALUES LESS THAN (2024001), +PARTITION `fa_2024002` VALUES LESS THAN (2024002), +PARTITION `fa_2024003` VALUES LESS THAN (2024003), +... +... +PARTITION `fa_2024366` VALUES LESS THAN (2024366)) +``` + +#### SQL + +```sql +SELECT `fa`.* +FROM `fa` +WHERE `fa`.`sid` IN ( + 1696271179344, + 1696317134004, + 1696181972136, + ... + 1696159221765 +); +``` + +- Query filters on secondary index, but does **not include the partition key**. +- Causes **Local Index** to scan across all partitions due to lack of pruning. +- Table lookup tasks are significantly higher for partitioned tables. + +#### Findings + +Data came from a table with **366 range partitions** (e.g., by date). +- The **Average Query Time** was obtained from the statement_summary view. +- The query used a **secondary index** and returned **400 rows**. + +Metrics collected: +- **Average Query Time**: from statement_summary +- **Cop Tasks** (Index Scan + Table Lookup): from execution plan + +#### Test Results + +| Configuration | Average Query Time | Cop task for index range scan | Cop task for table lookup | Total Cop tasks | Key Takeaways | +|---|---|---|---|---|---| +| Non-Partitioned Table | 12.6 ms | 72 | 79 | 151 | Delivering the best performance with the fewest Cop tasks — ideal for most OLTP use cases. | +| Partitioned Table with Local Index | 108 ms | 600 | 375 | 975 | When the partition key is not used in the query condition, local index queries will scan all partitions. | +| Partitioned Table with Global Index | 14.8 ms | 69 | 383 | 452 | Improving index scan efficiency, but table lookups can still be expensive if many rows match. 
| + +#### Execution Plan Examples + +**Non-partitioned table** + +```yaml +| id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | +| IndexLookUp_7 | 398.73 | 787052.13 | 400 | root | | time:11.5ms, loops:2, index_task: {total_time: 3.34ms, fetch_handle: 3.34ms, build: 600ns, wait: 2.86µs}, table_task: {total_time: 7.55ms, num: 1, concurrency: 5}, next: {wait_index: 3.49ms, wait_table_lookup_build: 492.5µs, wait_table_lookup_resp: 7.05ms} | | 706.7 KB | N/A | +| ├─IndexRangeScan_5(Build) | 398.73 | 90633.86 | 400 | cop[tikv] | table:fa, index:index_fa_on_sid(sid) | time:3.16ms, loops:3, cop_task: {num: 72, max: 780.4µs, min: 394.2µs, avg: 566.7µs, p95: 748µs, max_proc_keys: 20, p95_proc_keys: 10, tot_proc: 3.66ms, tot_wait: 18.6ms, copr_cache_hit_ratio: 0.00, build_task_duration: 94µs, max_distsql_concurrency: 15}, rpc_info:{Cop:{num_rpc:72, total_time:40.1ms}}, tikv_task:{proc max:1ms, min:0s, avg: 27.8µs, p80:0s, p95:0s, iters:72, tasks:72}, scan_detail: {total_process_keys: 400, total_process_keys_size: 22800, total_keys: 480, get_snapshot_time: 17.7ms, rocksdb: {key_skipped_count: 400, block: {cache_hit_count: 160}}}, time_detail: {total_process_time: 3.66ms, total_wait_time: 18.6ms, total_kv_read_wall_time: 2ms, tikv_wall_time: 27.4ms} | range:[1696125963161,1696125963161], [1696126443462,1696126443462], ..., keep order:false | N/A | N/A | +| └─TableRowIDScan_6(Probe) | 398.73 | 166072.78 | 400 | cop[tikv] | table:fa | time:7.01ms, loops:2, cop_task: {num: 79, max: 4.98ms, min: 0s, avg: 514.9µs, p95: 3.75ms, max_proc_keys: 10, p95_proc_keys: 5, tot_proc: 15ms, tot_wait: 21.4ms, copr_cache_hit_ratio: 0.00, build_task_duration: 341.2µs, max_distsql_concurrency: 1, max_extra_concurrency: 7, store_batch_num: 62}, rpc_info:{Cop:{num_rpc:17, total_time:40.5ms}}, tikv_task:{proc max:0s, min:0s, avg: 0s, p80:0s, p95:0s, iters:79, tasks:79}, scan_detail: {total_process_keys: 400, total_process_keys_size: 489856, total_keys: 800, get_snapshot_time: 20.8ms, rocksdb: {key_skipped_count: 400, block: {cache_hit_count: 1600}}}, time_detail: {total_process_time: 15ms, total_wait_time: 21.4ms, tikv_wall_time: 10.9ms} | keep order:false | N/A | N/A | +``` + +[Similar detailed execution plans for partitioned tables with global and local indexes would follow...] + +#### How to Create a Global Index on a Partitioned Table in TiDB + +**Option 1: Add via ALTER TABLE** + +```sql +ALTER TABLE +ADD UNIQUE INDEX (col1, col2) GLOBAL; +``` + +- Adds a global index to an existing partitioned table. +- GLOBAL must be explicitly specified. +- You can also use ADD INDEX for non-unique global indexes. + +**Option 2: Define Inline on Table Creation** + +```sql +CREATE TABLE t ( + id BIGINT NOT NULL, + col1 VARCHAR(50), + col2 VARCHAR(50), + -- other columns... + + UNIQUE GLOBAL INDEX idx_col1_col2 (col1, col2) +) +PARTITION BY RANGE (id) ( + PARTITION p0 VALUES LESS THAN (10000), + PARTITION p1 VALUES LESS THAN (20000), + PARTITION pMax VALUES LESS THAN MAXVALUE +); +``` + +#### Summary + +The performance overhead of partitioned tables in TiDB depends significantly on the number of partitions and the type of index used. + +- The more partitions you have, the more severe the potential performance degradation. +- With a smaller number of partitions, the impact may not be as noticeable, but it's still workload-dependent. 
+- For local indexes, if a query does not include effective partition pruning conditions, the number of partitions directly correlates with the number of RPCs triggered. This means more partitions will likely result in more RPCs, leading to higher latency. +- For global indexes, the number of RPCs and the degree of performance regression depend on both the number of partitions involved and how many rows need to be retrieved (i.e., the number of rows requiring table lookups). + +#### Recommendation + +- Avoid partitioned tables unless truly necessary. For most OLTP workloads, a well-indexed non-partitioned table performs better and is easier to manage. +- If you must use partitioned tables, benchmark both global Index and local Index strategies under your workload. +- Use global indexes when query performance across partitions is critical. +- Choose local indexes only if your main concern is DDL efficiency, such as fast DROP PARTITION, and the performance side effect from the partition table is acceptable. + +## Facilitating Bulk Data Deletion + +### Data Cleanup Efficiency: TTL vs. Direct Partition Drop + +In TiDB, historical data cleanup can be handled either by **TTL (Time-to-Live)** or **manual partition drop**. While both methods serve the same purpose, they differ significantly in performance. Our tests show that dropping partitions is generally faster and less resource-intensive, making it a better choice for large datasets and frequent purging needs. + +#### What's the difference? + +- **TTL**: Automatically removes data based on its age, but may be slower due to the need to scan and clean data over time. +- **Partition Drop**: Deletes an entire partition at once, making it much faster, especially when dealing with large datasets. + +#### What Did We Test + +To compare the performance of TTL and partition drop, we configured TTL to execute every 10 minutes and created a partitioned version of the same table, dropping one partition at the same interval for comparison. Both approaches were tested under background write loads of 50 and 100 concurrent threads. We measured key metrics such as execution time, system resource utilization, and the total number of rows deleted. + +#### Findings + +**TTL Performance:** +- On a high-write table, TTL runs every 10 minutes. +- With 50 threads, each TTL job took 8–10 minutes, deleting 7–11 million rows. +- With 100 threads, it handled up to 20 million rows, but execution time increased to 15–30 minutes, with greater variance. +- TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. + +**Partition Drop Performance:** +- DROP PARTITION removes an entire data segment instantly, with minimal resource usage. +- DROP PARTITION is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. + +#### How to Use TTL and Partition Drop in TiDB + +In this experiment, the table structures have been anonymized. For more detailed information on the usage of TTL (Time To Live), please refer to the official documentation at https://docs.pingcap.com/tidb/stable/time-to-live/. 
+ +**TTL schema** + +```sql +CREATE TABLE `ad_cache` ( + `session` varchar(255) NOT NULL, + `ad_id` varbinary(255) NOT NULL, + `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP, + `suffix` bigint(20) NOT NULL, + `expire_time` timestamp NULL DEFAULT NULL, + `data` mediumblob DEFAULT NULL, + `version` int(11) DEFAULT NULL, + `is_delete` tinyint(1) DEFAULT NULL, + PRIMARY KEY (`session`, `ad_id`, `create_time`, `suffix`) +) +ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin +TTL=`expire_time` + INTERVAL 0 DAY TTL_ENABLE='ON' +TTL_JOB_INTERVAL='10m' +``` + +**Drop Partition (Range INTERVAL partitioning)** + +```sql +CREATE TABLE `ad_cache` ( + `session_id` varchar(255) NOT NULL, + `external_id` varbinary(255) NOT NULL, + `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP, + `id_suffix` bigint(20) NOT NULL, + `expire_time` timestamp NULL DEFAULT NULL, + `cache_data` mediumblob DEFAULT NULL, + `data_version` int(11) DEFAULT NULL, + `is_deleted` tinyint(1) DEFAULT NULL, + PRIMARY KEY ( + `session_id`, `external_id`, + `create_time`, `id_suffix` + ) NONCLUSTERED +) +SHARD_ROW_ID_BITS=7 +PRE_SPLIT_REGIONS=2 +PARTITION BY RANGE COLUMNS (create_time) +INTERVAL (10 MINUTE) +FIRST PARTITION LESS THAN ('2025-02-19 18:00:00') +... +LAST PARTITION LESS THAN ('2025-02-19 20:00:00') +``` + +It's required to run DDL alter table partition ... to change the FIRST PARTITION and LAST PARTITION periodically. These two DDL statements can drop the old partitions and create new ones. + +```sql +ALTER TABLE ad_cache FIRST PARTITION LESS THAN ("${nextTimestamp}") +ALTER TABLE ad_cache LAST PARTITION LESS THAN ("${nextTimestamp}") +``` + +#### Recommendation + +For workloads with **large or time-based data cleanup**, prefer using **partitioned tables with DROP PARTITION**. It offers better performance, lower system impact, and simpler management. TTL is still useful for finer-grained or background cleanup but may not be optimal under high write pressure or when deleting large volumes of data quickly. + +### Partition Drop Efficiency: Local Index vs Global Index + +Partition table with Global Index requires synchronous updates to the global index, potentially increasing significant execution time for DDL operations, such as DROP PARTITION, TRUNCATE PARTITION, or REORG PARTITION. In this section, the tests show that DROP PARTITION is much slower when using a **Global Index** compared to a **Local Index**. This should be considered when designing partitioned tables. + +#### What Did We Test + +We created a table with **366 partitions** and tested the DROP PARTITION performance using both **Global Index** and **Local Index**. The total number of rows was **1 billion**. + +| Index Type | Duration (drop partition) | +|---|---| +| Global Index | 1 min 16.02 s | +| Local Index | 0.52 s | + +#### Findings + +Dropping a partition on a table with a Global Index took **76 seconds**, while the same operation with a Local Index took only **0.52 seconds**. The reason is that Global Indexes span all partitions and require more complex updates, while Local Indexes are limited to individual partitions and are easier to handle. 
+ +**Global Index** + +```sql +mysql> alter table A drop partition A_2024363; +Query OK, 0 rows affected (1 min 16.02 sec) +``` + +**Local Index** + +```sql +mysql> alter table A drop partition A_2024363; +Query OK, 0 rows affected (0.52 sec) +``` + +#### Recommendation + +When a partitioned table contains global indexes, performing certain DDL operations such as DROP PARTITION, TRUNCATE PARTITION, or REORG PARTITION requires synchronously updating the global index values. This can significantly increase the execution time of these DDL operations. + +If you need to drop partitions frequently and minimize the performance impact the the system, it's better to use **Local Indexes** for faster and more efficient operations. + +## Mitigating Write Hotspot Issues + +### Background + +In TiDB, **write hotspots** occur when incoming write traffic is unevenly distributed across Regions. + +This is common when the primary key is **monotonically increasing**—for example, an AUTO_INCREMENT primary key with AUTO_ID_CACHE=1, or secondary index on datetime column with default value set to CURRENT_TIMESTAMP—because new rows and index entries are always appended to the "rightmost" Region. Over time, this can lead to: + +- A single Region handling most of the write workload, while other Regions remain idle. +- Higher write latency and reduced throughput. +- Limited performance gains from scaling out TiKV nodes, as the bottleneck remains concentrated on one Region. + +**Partitioned tables** can help mitigate this problem. By applying **hash** or **key** partitioning on the primary key, TiDB can spread inserts across multiple partitions (and therefore multiple Regions), reducing hotspot contention. + +### How It Works + +TiDB stores table data in **Regions**, each covering a continuous range of row keys. + +When the primary key is AUTO_INCREMENT and the secondary indexes on datetime columns are monotonically increasing: + +**Without Partitioning:** +- New rows always have the highest key values and are inserted into the same "last Region." +- That Region is served by one TiKV node at a time, becoming a single write bottleneck. + +**With Hash/Key Partitioning:** +- The table and the secondary indexes are split into multiple partitions using a hash or key function on the primary key or indexed columns. +- Each partition has its own set of Regions, often distributed across different TiKV nodes. +- Inserts are spread across multiple Regions in parallel, improving load distribution and throughput. + +### Use Case + +If a table with an AUTO_INCREMENT primary key experiences heavy bulk inserts and suffers from write hotspot issues, applying **hash** or **key** partitioning on the primary key can help distribute the write load more evenly. + +```sql +CREATE TABLE server_info ( + id bigint NOT NULL AUTO_INCREMENT, + serial_no varchar(100) DEFAULT NULL, + device_name varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL, + device_type varchar(50) DEFAULT NULL, + modified_ts timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, + PRIMARY KEY (id) /*T![clustered_index] CLUSTERED */, + KEY idx_serial_no (serial_no), + KEY idx_modified_ts (modified_ts) +) /*T![auto_id_cache] AUTO_ID_CACHE=1 */ +PARTITION BY KEY (id) PARTITIONS 16; +``` + +### Pros + +- **Balanced Write Load** — Hotspots are spread across multiple partitions, reducing contention and improving insert performance. 
+- **Query Optimization via Partition Pruning** — If queries already filter by the partition key, TiDB can prune unused partitions, scanning less data and improving query speed. + +### Cons + +**Potential Query Performance Drop Without Partition Pruning** + +When converting a non-partitioned table to a partitioned table, TiDB creates a separate Region for each partition. This may significantly increase the total Region count. Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. Example: + +```sql +select * from server_info where `serial_no` = ? +``` + +**Mitigation**: Add a **global index** on the filtering columns used by these queries to reduce scanning overhead. While creating a global index can significantly slow down DROP PARTITION operations, **hash and key partitioned tables do not support DROP PARTITION**. In practice, such partitions are rarely removed, making global indexes a feasible solution in these scenarios. Example: + +```sql +ALTER TABLE server_info ADD UNIQUE INDEX(serial_no, id) GLOBAL; +``` + +## Partition Management Challenge + +### How to Avoid Hotspots Caused by New Range Partitions + +#### Overview + +New range partitions in a partitioned table can easily lead to hotspot issues in TiDB. This section outlines common scenarios and mitigation strategies to avoid read and write hotspots caused by range partitions. + +#### Common Hotspot Scenarios + +**Read Hotspot** + +When using **range-partitioned tables**, if queries do **not** filter data using the partition key, new empty partitions can easily become read hotspots. + +**Root Cause:** +By default, TiDB creates an empty region for each partition when the table is created. If no data is written for a while, multiple empty partitions' regions may be merged into a **single region**. + +**Impact:** +When a query does **not filter by partition key**, TiDB will **scan all partitions** (as seen in the execution plan partition:all). As a result, the single region holding multiple empty partitions will be scanned repeatedly, leading to a **read hotspot**. + +**Write Hotspot** + +When using a time-based field as the partition key, a write hotspot may occur when switching to a new partition: + +**Root Cause:** +In TiDB, any newly created table or partition initially contains only **one region** (data block), which is randomly placed on a single TiKV node. As data begins to be written, this region will eventually **split** into multiple regions, and PD will schedule these new regions to other TiKV nodes. + +However, if the initial write traffic to this new partition is **very high**, the TiKV node hosting that single initial region will be under heavy write pressure. In such cases, it may not have enough spare resources (I/O capacity, CPU cycles) to handle both the application writes and the scheduling of newly split regions to other TiKV nodes. This can delay region distribution, keeping most writes concentrated on the same node for longer than desired. + +**Impact:** +This imbalance can cause that TiKV node to trigger **flow control**, leading to a sharp drop in QPS, a spike in write latency, and increased CPU usage on the affected node, which in turn may impact the overall read and write performance of the cluster. + +#### Solutions + +**1. 
NONCLUSTERED Partitioned Table** + +**Pros:** +- When a new partition is created in a **NONCLUSTERED Partitioned Table** configured with SHARD_ROW_ID_BITS and [PRE_SPLIT_REGIONS](https://docs.pingcap.com/tidb/stable/sql-statement-split-region/#pre_split_regions), the regions can be **automatically pre-split**, significantly reducing manual intervention. +- Lower operational overhead. + +**Cons:** +- Queries using **Point Get** or **Table Range Scan** will require **more table lookups**, which can degrade read performance for such query types. + +**Recommendation:** +- Suitable for workloads where write scalability and operational ease are more critical than low-latency reads. + +**Best Practices** + +Create a partitioned table with SHARD_ROW_ID_BITS and PRE_SPLIT_REGIONS to pre-split table regions. The value of PRE_SPLIT_REGIONS must be less than or equal to that of SHARD_ROW_ID_BITS. The number of pre-split Regions for each partition is 2^(PRE_SPLIT_REGIONS). + +```sql +CREATE TABLE employees ( + id INT NOT NULL, + fname VARCHAR(30), + lname VARCHAR(30), + hired DATE NOT NULL DEFAULT '1970-01-01', + separated DATE DEFAULT '9999-12-31', + job_code INT, + store_id INT, + PRIMARY KEY (`id`,`hired`) NONCLUSTERED, + KEY `idx_employees_on_store_id` (`store_id`) +)SHARD_ROW_ID_BITS = 2 PRE_SPLIT_REGIONS=2 +PARTITION BY RANGE ( YEAR(hired) ) ( + PARTITION p0 VALUES LESS THAN (1991), + PARTITION p1 VALUES LESS THAN (1996), + PARTITION p2 VALUES LESS THAN (2001), + PARTITION p3 VALUES LESS THAN (2006) +); +``` + +Adding the [merge_option=deny](https://docs.pingcap.com/tidb/stable/table-attributes/#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. + +```sql +-- table +ALTER TABLE employees ATTRIBUTES 'merge_option=deny'; +-- partition +ALTER TABLE employees PARTITION `p3` ATTRIBUTES 'merge_option=deny'; +``` + +**Determining split boundaries based on existing business data** + +To avoid hotspots when a new table or partition is created, it is often beneficial to **pre-split** regions before heavy writes begin. To make pre-splitting effective, configure the **lower and upper boundaries** for region splitting based on the **actual business data distribution**. Avoid setting excessively wide boundaries, as this can result in real data not being effectively distributed across TiKV nodes, defeating the purpose of pre-splitting. + +**Identify the minimum and maximum values** from existing production data so that incoming writes are more likely to target different pre-allocated regions. Example query for existing data: + +```sql +SELECT MIN(id), MAX(id) FROM employees; +``` + +- If the table is **new** and has no historical data, estimate the min/max values based on your business logic and expected data range. +- For **composite primary keys** or **composite indexes**, only the **leftmost column** needs to be considered when deciding split boundaries. +- If the leftmost column is a **string**, take string length and distribution into account to ensure even data spread. + +**Pre-split and scatter regions** + +A common practice is to split the number of regions to **match** the number of TiKV nodes, or to be **twice** the number of TiKV nodes. This helps ensure that data is more evenly distributed across the cluster from the start. 
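To pick a concrete `REGIONS` value, you can first check how many stores the cluster has, for example with the query below (a sketch that assumes the `INFORMATION_SCHEMA.TIKV_STORE_STATUS` view is available in your TiDB version; TiFlash nodes are also listed there as stores and should be excluded from the count):

```sql
-- Count the stores registered in PD; use this value (or twice it) as the REGIONS
-- number in the SPLIT statements that follow.
SELECT COUNT(*) AS store_count
FROM INFORMATION_SCHEMA.TIKV_STORE_STATUS;
```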
+ +**Splitting regions for the primary key of all partitions** + +To split regions for the primary key of all partitions in a partitioned table, you can use a command like: + +```sql +SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "1970-01-01") AND (100000, "9999-12-31") REGIONS ; +``` + +This example will split each partition's primary key range into `` regions between the specified boundary values. + +**Splitting Regions for the secondary index of all partitions.** + +```sql +SPLIT PARTITION TABLE employees INDEX `idx_employees_on_store_id` BETWEEN (1) AND (1000) REGIONS ; +``` + +**(Optional) When adding a new partition, you MUST manually split regions for its primary key and indices.** + +```sql +ALTER TABLE employees ADD PARTITION (PARTITION p4 VALUES LESS THAN (2011)); + +SHOW TABLE employees PARTITION (p4) regions; + +SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "2006-01-01") AND (100000, "2011-01-01") REGIONS ; + +SPLIT PARTITION TABLE employees PARTITION (p4) INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; + +SHOW TABLE employees PARTITION (p4) regions; +``` + +**2. CLUSTERED Partitioned Table** + +**Pros:** +- Queries using **Point Get** or **Table Range Scan** do **not** need additional lookups, resulting in better **read performance**. + +**Cons:** +- **Manual region splitting** is required when creating new partitions, increasing operational complexity. + +**Recommendation:** +- Ideal when low-latency point queries are important and operational resources are available to manage region splitting. + +**Best Practices** + +Create a CLUSTERED partitioned table. + +```sql +CREATE TABLE employees2 ( + id INT NOT NULL, + fname VARCHAR(30), + lname VARCHAR(30), + hired DATE NOT NULL DEFAULT '1970-01-01', + separated DATE DEFAULT '9999-12-31', + job_code INT, + store_id INT, + PRIMARY KEY (`id`,`hired`) CLUSTERED, + KEY `idx_employees2_on_store_id` (`store_id`) +) +PARTITION BY RANGE ( YEAR(hired) ) ( + PARTITION p0 VALUES LESS THAN (1991), + PARTITION p1 VALUES LESS THAN (1996), + PARTITION p2 VALUES LESS THAN (2001), + PARTITION p3 VALUES LESS THAN (2006) +); +``` + +Adding the [merge_option=deny](https://docs.pingcap.com/tidb/stable/table-attributes/#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. + +```sql +ALTER TABLE employees2 ATTRIBUTES 'merge_option=deny'; +``` + +**Determining split boundaries based on existing business data** + +To avoid hotspots when a new table or partition is created, it is often beneficial to **pre-split** regions before heavy writes begin. To make pre-splitting effective, configure the **lower and upper boundaries** for region splitting based on the **actual business data distribution**. Avoid setting excessively wide boundaries, as this can result in real data not being effectively distributed across TiKV nodes, defeating the purpose of pre-splitting. + +**Identify the minimum and maximum values** from existing production data so that incoming writes are more likely to target different pre-allocated regions. Example query for existing data: + +```sql +SELECT MIN(id), MAX(id) FROM employees; +``` + +- If the table is **new** and has no historical data, estimate the min/max values based on your business logic and expected data range. 
+- For **composite primary keys** or **composite indexes**, only the **leftmost column** needs to be considered when deciding split boundaries. +- If the leftmost column is a **string**, take string length and distribution into account to ensure even data spread. + +**Pre-split and scatter regions** + +A common practice is to split the number of regions to **match** the number of TiKV nodes, or to be **twice** the number of TiKV nodes. This helps ensure that data is more evenly distributed across the cluster from the start. + +**Splitting regions for all partitions** + +```sql +SPLIT PARTITION TABLE employees2 BETWEEN (1,"1970-01-01") AND (100000,"9999-12-31") REGIONS ; +``` + +**Splitting regions for the secondary index of all partitions.** + +```sql +SPLIT PARTITION TABLE employees2 INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; +``` + +**(Optional) When adding a new partition, you MUST manually split regions for the specific partition and its indices.** + +```sql +ALTER TABLE employees2 ADD PARTITION (PARTITION p4 VALUES LESS THAN (2011)); + +show table employees2 PARTITION (p4) regions; + +SPLIT PARTITION TABLE employees2 PARTITION (p4) BETWEEN (1,"2006-01-01") AND (100000,"2011-01-01") REGIONS ; + +SPLIT PARTITION TABLE employees2 PARTITION (p4) INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; + +show table employees2 PARTITION (p4) regions; +``` + +**3. CLUSTERED Non-partitioned Table** + +**Pros:** +- **No hotspot risk from new partitions**. +- Provides **good read performance** for point and range queries. + +**Cons:** +- **Cannot use DROP PARTITION** to clean up large volumes of old data. + +**Recommendation:** +- Best suited for use cases that require stable performance and do not benefit from partition-based data management. + +### Summary Table + +| Approach | Read Hotspot Risk | Write Hotspot Risk | Operational Complexity | Query Performance | Data Cleanup | +|---|---|---|---|---|---| +| NONCLUSTERED Partitioned | Low (with merge_option=deny) | Low (auto pre-split) | Low | Moderate (extra lookups) | Fast (DROP PARTITION) | +| CLUSTERED Partitioned | Medium (manual intervention) | Medium (manual split) | High | High (direct access) | Fast (DROP PARTITION) | +| CLUSTERED Non-partitioned | None | Medium (single table) | Low | High | Slow (DELETE/TTL) | + +## Converting Between Partitioned and Non-Partitioned Tables + +When working with large tables (e.g., 120 million rows), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. TiDB supports several main approaches for such transformations: + +1. Batch DML: `INSERT INTO ... SELECT ...` +2. Pipeline DML: `INSERT INTO ... SELECT ...` +3. Import into: `IMPORT INTO ... FROM SELECT ...` +4. Online DDL: Direct schema transformation via `ALTER TABLE` + +This section compares the efficiency and implications of both methods in both directions of conversion, and provides best practice recommendations. + +### Method 1: Batch DML INSERT INTO ... SELECT ... + +**By Default** + +```sql +SET tidb_mem_quota_query = 0; +INSERT INTO fa_new SELECT * FROM fa; +-- 120 million rows copied in 1h 52m 47s +``` + +### Method 2: Pipeline DML INSERT INTO ... SELECT... + +```sql +SET tidb_dml_type = "bulk"; +SET tidb_mem_quota_query = 0; +SET tidb_enable_mutation_checker = OFF; +INSERT INTO fa_new SELECT * FROM fa; +-- 120 million rows copied in 58m 42s +``` + +### Method 3: IMPORT INTO ... FROM SELECT ... 
+ +```sql +mysql> import into fa_new from select * from fa with thread=32,disable_precheck; +Query OK, 120000000 rows affected, 1 warning (16 min 49.90 sec) +Records: 120000000, ID: c1d04eec-fb49-49bb-af92-bf3d6e2d3d87 +``` + +### Method 4: Online DDL + +**From partition table to non-partitioned table** + +```sql +SET @@global.tidb_ddl_reorg_worker_cnt = 16; +SET @@global.tidb_ddl_reorg_batch_size = 4096; + +mysql> alter table fa REMOVE PARTITIONING; +-- real 170m12.024s (≈ 2h 50m) +``` + +**From non-partition table to partitioned table** + +```sql +SET @@global.tidb_ddl_reorg_worker_cnt = 16; +SET @@global.tidb_ddl_reorg_batch_size = 4096; +ALTER TABLE fa PARTITION BY RANGE (`yearweek`) +(PARTITION `fa_2024001` VALUES LESS THAN (2024001), +PARTITION `fa_2024002` VALUES LESS THAN (2024002), +... +PARTITION `fa_2024365` VALUES LESS THAN (2024365), +PARTITION `fa_2024366` VALUES LESS THAN (2024366)); + +Query OK, 0 rows affected, 1 warning (2 hours 31 min 57.05 sec) +``` + +### Findings + +| Method | Time Taken | +|---|---| +| Method 1: Batch DML INSERT INTO ... SELECT | 1h 52m 47s | +| Method 2: Pipeline DML: INSERT INTO ... SELECT ... | 58m 42s | +| Method 3: IMPORT INTO ... FROM SELECT ... | 16m 59s | +| Method 4: Online DDL (From partition table to non-partitioned table) | 2h 50m | +| Method 4: Online DDL (From non-partition table to partitioned table) | 2h 31m | + +### Recommendation + +TiDB offers two approaches for converting tables between partitioned and non-partitioned states: + +- Choose the offline method when your system can accommodate a maintenance window, as it delivers much better performance. Use online DDL only when zero downtime is a strict requirement. \ No newline at end of file From aa9e29930f22c2a3c73e981d74ac5988d952fed3 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Sun, 28 Sep 2025 13:13:50 +0800 Subject: [PATCH 02/55] Update tidb_partitioned_tables_guide.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 6fc54fa10a3e3..88eb0d0df2d0d 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -687,4 +687,4 @@ Query OK, 0 rows affected, 1 warning (2 hours 31 min 57.05 sec) TiDB offers two approaches for converting tables between partitioned and non-partitioned states: -- Choose the offline method when your system can accommodate a maintenance window, as it delivers much better performance. Use online DDL only when zero downtime is a strict requirement. \ No newline at end of file +- Choose an offline method like `IMPORT INTO` when your system can accommodate a maintenance window, as it delivers much better performance. Use online DDL only when zero downtime is a strict requirement. 
\ No newline at end of file From d11ba34d53c8b4aa79b0df5452e3e8afb357ad69 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Sun, 28 Sep 2025 13:14:40 +0800 Subject: [PATCH 03/55] Update tidb_partitioned_tables_guide.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 88eb0d0df2d0d..b48633c114643 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -305,7 +305,7 @@ Query OK, 0 rows affected (0.52 sec) When a partitioned table contains global indexes, performing certain DDL operations such as DROP PARTITION, TRUNCATE PARTITION, or REORG PARTITION requires synchronously updating the global index values. This can significantly increase the execution time of these DDL operations. -If you need to drop partitions frequently and minimize the performance impact the the system, it's better to use **Local Indexes** for faster and more efficient operations. +If you need to drop partitions frequently and minimize the performance impact on the system, it's better to use **local indexes** for faster and more efficient operations. ## Mitigating Write Hotspot Issues From 2b37ec83fe35f137121e0fc7fe0f576f09e80f38 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Sun, 28 Sep 2025 13:14:51 +0800 Subject: [PATCH 04/55] Update tidb_partitioned_tables_guide.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index b48633c114643..55558a0b0800f 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -179,7 +179,7 @@ The performance overhead of partitioned tables in TiDB depends significantly on #### Recommendation - Avoid partitioned tables unless truly necessary. For most OLTP workloads, a well-indexed non-partitioned table performs better and is easier to manage. -- If you must use partitioned tables, benchmark both global Index and local Index strategies under your workload. +- If you must use partitioned tables, benchmark both global index and local index strategies under your workload. - Use global indexes when query performance across partitions is critical. - Choose local indexes only if your main concern is DDL efficiency, such as fast DROP PARTITION, and the performance side effect from the partition table is acceptable. 
From ae8555d8f8b049bb757a0a922fdd939ad9721b1b Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Sun, 28 Sep 2025 13:15:04 +0800 Subject: [PATCH 05/55] Update tidb_partitioned_tables_guide.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 55558a0b0800f..2e957f04b6e35 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -550,7 +550,7 @@ To avoid hotspots when a new table or partition is created, it is often benefici **Identify the minimum and maximum values** from existing production data so that incoming writes are more likely to target different pre-allocated regions. Example query for existing data: ```sql -SELECT MIN(id), MAX(id) FROM employees; +SELECT MIN(id), MAX(id) FROM employees2; ``` - If the table is **new** and has no historical data, estimate the min/max values based on your business logic and expected data range. From a1563b98e2f2db762a43a0599a3d742b9f049aea Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Sun, 28 Sep 2025 16:29:25 +0800 Subject: [PATCH 06/55] Update tidb_partitioned_tables_guide.md --- tidb_partitioned_tables_guide.md | 28 +++++++++++++++++++++++----- 1 file changed, 23 insertions(+), 5 deletions(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 6fc54fa10a3e3..61b6592e97b24 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -128,12 +128,30 @@ Metrics collected: **Non-partitioned table** ```yaml -| id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | -| IndexLookUp_7 | 398.73 | 787052.13 | 400 | root | | time:11.5ms, loops:2, index_task: {total_time: 3.34ms, fetch_handle: 3.34ms, build: 600ns, wait: 2.86µs}, table_task: {total_time: 7.55ms, num: 1, concurrency: 5}, next: {wait_index: 3.49ms, wait_table_lookup_build: 492.5µs, wait_table_lookup_resp: 7.05ms} | | 706.7 KB | N/A | -| ├─IndexRangeScan_5(Build) | 398.73 | 90633.86 | 400 | cop[tikv] | table:fa, index:index_fa_on_sid(sid) | time:3.16ms, loops:3, cop_task: {num: 72, max: 780.4µs, min: 394.2µs, avg: 566.7µs, p95: 748µs, max_proc_keys: 20, p95_proc_keys: 10, tot_proc: 3.66ms, tot_wait: 18.6ms, copr_cache_hit_ratio: 0.00, build_task_duration: 94µs, max_distsql_concurrency: 15}, rpc_info:{Cop:{num_rpc:72, total_time:40.1ms}}, tikv_task:{proc max:1ms, min:0s, avg: 27.8µs, p80:0s, p95:0s, iters:72, tasks:72}, scan_detail: {total_process_keys: 400, total_process_keys_size: 22800, total_keys: 480, get_snapshot_time: 17.7ms, rocksdb: {key_skipped_count: 400, block: {cache_hit_count: 160}}}, time_detail: {total_process_time: 3.66ms, total_wait_time: 18.6ms, total_kv_read_wall_time: 2ms, tikv_wall_time: 27.4ms} | range:[1696125963161,1696125963161], [1696126443462,1696126443462], ..., keep order:false | N/A | N/A | -| └─TableRowIDScan_6(Probe) | 398.73 | 166072.78 | 400 | cop[tikv] | table:fa | time:7.01ms, loops:2, cop_task: {num: 79, max: 4.98ms, min: 0s, avg: 514.9µs, p95: 3.75ms, max_proc_keys: 10, p95_proc_keys: 5, tot_proc: 15ms, tot_wait: 21.4ms, copr_cache_hit_ratio: 0.00, build_task_duration: 341.2µs, max_distsql_concurrency: 1, max_extra_concurrency: 7, store_batch_num: 62}, rpc_info:{Cop:{num_rpc:17, total_time:40.5ms}}, tikv_task:{proc max:0s, min:0s, avg: 0s, p80:0s, 
p95:0s, iters:79, tasks:79}, scan_detail: {total_process_keys: 400, total_process_keys_size: 489856, total_keys: 800, get_snapshot_time: 20.8ms, rocksdb: {key_skipped_count: 400, block: {cache_hit_count: 1600}}}, time_detail: {total_process_time: 15ms, total_wait_time: 21.4ms, tikv_wall_time: 10.9ms} | keep order:false | N/A | N/A | +| id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | +|---------------------------|---------|-----------|---------|-----------|--------------------------------------|----------------|---------------|----------|------| +| IndexLookUp_7 | 398.73 | 787052.13 | 400 | root | | time:11.5ms, loops:2, index_task:{total_time:3.34ms, fetch_handle:3.34ms, build:600ns, wait:2.86µs}, table_task:{total_time:7.55ms, num:1, concurrency:5}, next:{wait_index:3.49ms, wait_table_lookup_build:492.5µs, wait_table_lookup_resp:7.05ms} | | 706.7 KB | N/A | +| IndexRangeScan_5(Build) | 398.73 | 90633.86 | 400 | cop[tikv] | table:fa, index:index_fa_on_sid(sid) | time:3.16ms, loops:3, cop_task:{num:72, max:780.4µs, min:394.2µs, avg:566.7µs, p95:748µs, max_proc_keys:20, p95_proc_keys:10, tot_proc:3.66ms, tot_wait:18.6ms, copr_cache_hit_ratio:0.00, build_task_duration:94µs, max_distsql_concurrency:15}, rpc_info:{Cop:{num_rpc:72, total_time:40.1ms}}, tikv_task:{proc max:1ms, min:0s, avg:27.8µs, p80:0s, p95:0s, iters:72, tasks:72}, scan_detail:{total_process_keys:400, total_process_keys_size:22800, total_keys:480, get_snapshot_time:17.7ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:160}}}, time_detail:{total_process_time:3.66ms, total_wait_time:18.6ms, total_kv_read_wall_time:2ms, tikv_wall_time:27.4ms} | range:[1696125963161,1696125963161], …, [1696317134004,1696317134004], keep order:false | N/A | N/A | +| TableRowIDScan_6(Probe) | 398.73 | 166072.78 | 400 | cop[tikv] | table:fa | time:7.01ms, loops:2, cop_task:{num:79, max:4.98ms, min:0s, avg:514.9µs, p95:3.75ms, max_proc_keys:10, p95_proc_keys:5, tot_proc:15ms, tot_wait:21.4ms, copr_cache_hit_ratio:0.00, build_task_duration:341.2µs, max_distsql_concurrency:1, max_extra_concurrency:7, store_batch_num:62}, rpc_info:{Cop:{num_rpc:17, total_time:40.5ms}}, tikv_task:{proc max:0s, min:0s, avg:0s, p80:0s, p95:0s, iters:79, tasks:79}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:20.8ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:1600}}}, time_detail:{total_process_time:15ms, total_wait_time:21.4ms, tikv_wall_time:10.9ms} | keep order:false | N/A | N/A | ``` +**Partition table with global index** +```yaml +| id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | +|------------------------|---------|-----------|---------|-----------|-------------------------------------------------|----------------|---------------|----------|------| +| IndexLookUp_8 | 398.73 | 786959.21 | 400 | root | partition:all | time:12.8ms, loops:2, index_task:{total_time:2.71ms, fetch_handle:2.71ms, build:528ns, wait:3.23µs}, table_task:{total_time:9.03ms, num:1, concurrency:5}, next:{wait_index:3.27ms, wait_table_lookup_build:1.49ms, wait_table_lookup_resp:7.53ms} | | 693.9 KB | N/A | +| IndexRangeScan_5(Build)| 398.73 | 102593.43 | 400 | cop[tikv] | table:fa, index:index_fa_on_sid_global(sid, id)| time:2.49ms, loops:3, cop_task:{num:69, max:997µs, min:213.8µs, avg:469.8µs, p95:986.6µs, max_proc_keys:15, p95_proc_keys:10, tot_proc:13.4ms, tot_wait:1.52ms, copr_cache_hit_ratio:0.00, 
build_task_duration:498.4µs, max_distsql_concurrency:15}, rpc_info:{Cop:{num_rpc:69, total_time:31.8ms}}, tikv_task:{proc max:1ms, min:0s, avg:101.4µs, p80:0s, p95:1ms, iters:69, tasks:69}, scan_detail:{total_process_keys:400, total_process_keys_size:31200, total_keys:480, get_snapshot_time:679.9µs, rocksdb:{key_skipped_count:400, block:{cache_hit_count:189, read_count:54, read_byte:347.7 KB, read_time:6.17ms}}}, time_detail:{total_process_time:13.4ms, total_wait_time:1.52ms, total_kv_read_wall_time:7ms, tikv_wall_time:19.3ms} | range:[1696125963161,1696125963161], …, keep order:false, stats:partial[...] | N/A | N/A | +| TableRowIDScan_6(Probe)| 398.73 | 165221.64 | 400 | cop[tikv] | table:fa | time:7.47ms, loops:2, cop_task:{num:383, max:4.07ms, min:0s, avg:488.5µs, p95:2.59ms, max_proc_keys:2, p95_proc_keys:1, tot_proc:203.3ms, tot_wait:429.5ms, copr_cache_hit_ratio:0.00, build_task_duration:1.3ms, max_distsql_concurrency:1, max_extra_concurrency:31, store_batch_num:305}, rpc_info:{Cop:{num_rpc:78, total_time:186.3ms}}, tikv_task:{proc max:3ms, min:0s, avg:517µs, p80:1ms, p95:1ms, iters:383, tasks:383}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:2.99ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:1601, read_count:799, read_byte:10.1 MB, read_time:131.6ms}}}, time_detail:{total_process_time:203.3ms, total_suspend_time:6.31ms, total_wait_time:429.5ms, total_kv_read_wall_time:198ms, tikv_wall_time:163ms} | keep order:false, stats:partial[...] | N/A | N/A | +``` + +**Partition table with local index** +```yaml +| id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | +|------------------------|---------|-----------|---------|-----------|--------------------------------------|----------------|---------------|---------|-------| +| IndexLookUp_7 | 398.73 | 784450.63 | 400 | root | partition:all | time:290.8ms, loops:2, index_task:{total_time:103.6ms, fetch_handle:7.74ms, build:133.2µs, wait:95.7ms}, table_task:{total_time:551.1ms, num:217, concurrency:5}, next:{wait_index:179.6ms, wait_table_lookup_build:391µs, wait_table_lookup_resp:109.5ms} | | 4.30 MB | N/A | +| IndexRangeScan_5(Build)| 398.73 | 90633.73 | 400 | cop[tikv] | table:fa, index:index_fa_on_sid(sid) | time:10.8ms, loops:800, cop_task:{num:600, max:65.6ms, min:1.02ms, avg:22.2ms, p95:45.1ms, max_proc_keys:5, p95_proc_keys:3, tot_proc:6.81s, tot_wait:4.77s, copr_cache_hit_ratio:0.00, build_task_duration:172.8ms, max_distsql_concurrency:3}, rpc_info:{Cop:{num_rpc:600, total_time:13.3s}}, tikv_task:{proc max:54ms, min:0s, avg:13.9ms, p80:20ms, p95:30ms, iters:600, tasks:600}, scan_detail:{total_process_keys:400, total_process_keys_size:22800, total_keys:29680, get_snapshot_time:2.47s, rocksdb:{key_skipped_count:400, block:{cache_hit_count:117580, read_count:29437, read_byte:104.9 MB, read_time:3.24s}}}, time_detail:{total_process_time:6.81s, total_suspend_time:1.51s, total_wait_time:4.77s, total_kv_read_wall_time:8.31s, tikv_wall_time:13.2s}} | range:[1696125963161,...,1696317134004], keep order:false, stats:partial[...] 
| N/A | N/A | +| TableRowIDScan_6(Probe)| 398.73 | 165221.49 | 400 | cop[tikv] | table:fa | time:514ms, loops:434, cop_task:{num:375, max:31.6ms, min:0s, avg:1.33ms, p95:1.67ms, max_proc_keys:2, p95_proc_keys:2, tot_proc:220.7ms, tot_wait:242.2ms, copr_cache_hit_ratio:0.00, build_task_duration:27.8ms, max_distsql_concurrency:1, max_extra_concurrency:1, store_batch_num:69}, rpc_info:{Cop:{num_rpc:306, total_time:495.5ms}}, tikv_task:{proc max:6ms, min:0s, avg:597.3µs, p80:1ms, p95:1ms, iters:375, tasks:375}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:158.3ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:3197, read_count:803, read_byte:10.2 MB, read_time:113.5ms}}}, time_detail:{total_process_time:220.7ms, total_suspend_time:5.39ms, total_wait_time:242.2ms, total_kv_read_wall_time:224ms, tikv_wall_time:430.5ms}} | keep order:false, stats:partial[...] | N/A | N/A | +``` [Similar detailed execution plans for partitioned tables with global and local indexes would follow...] #### How to Create a Global Index on a Partitioned Table in TiDB @@ -550,7 +568,7 @@ To avoid hotspots when a new table or partition is created, it is often benefici **Identify the minimum and maximum values** from existing production data so that incoming writes are more likely to target different pre-allocated regions. Example query for existing data: ```sql -SELECT MIN(id), MAX(id) FROM employees; +SELECT MIN(id), MAX(id) FROM employees2; ``` - If the table is **new** and has no historical data, estimate the min/max values based on your business logic and expected data range. From 09c68e6cd2f74165a53fa2ce165274294e3fe1ab Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Mon, 29 Sep 2025 10:34:07 +0800 Subject: [PATCH 07/55] Update tidb_partitioned_tables_guide.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 5912f031ed923..47c8c509855d3 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -634,7 +634,7 @@ When working with large tables (e.g., 120 million rows), transforming between pa 3. Import into: `IMPORT INTO ... FROM SELECT ...` 4. Online DDL: Direct schema transformation via `ALTER TABLE` -This section compares the efficiency and implications of both methods in both directions of conversion, and provides best practice recommendations. +This section compares the efficiency and implications of these methods in both directions of conversion, and provides best practice recommendations. ### Method 1: Batch DML INSERT INTO ... SELECT ... 
From 854262b9e8416ee09861e54f7b45953f2b09c67c Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Mon, 29 Sep 2025 10:34:20 +0800 Subject: [PATCH 08/55] Update tidb_partitioned_tables_guide.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 47c8c509855d3..c50f74e21902a 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -29,7 +29,7 @@ By understanding these aspects, you can make informed decisions on whether and h > **Note:** If you're new to partitioned tables in TiDB, we recommend reviewing the [Partitioned Table User Guide](https://docs.pingcap.com/tidb/stable/partitioned-table) first to better understand key concepts like partition pruning, global vs. local indexes, and partition strategies. -## Improving Query Efficiency +## Improving query efficiency ### Partition Pruning From 4b23f1149da381052667cfc1ef6ac49c8eb20ce2 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Mon, 29 Sep 2025 10:35:44 +0800 Subject: [PATCH 09/55] Update tidb_partitioned_tables_guide.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index c50f74e21902a..459cb4283b6bb 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -646,7 +646,7 @@ INSERT INTO fa_new SELECT * FROM fa; -- 120 million rows copied in 1h 52m 47s ``` -### Method 2: Pipeline DML INSERT INTO ... SELECT... +### Method 2: Pipeline DML INSERT INTO ... SELECT ... ```sql SET tidb_dml_type = "bulk"; From cb32c4eeb8150ac70f8ed27ae16857cb0b732057 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Mon, 29 Sep 2025 10:36:22 +0800 Subject: [PATCH 10/55] Update tidb_partitioned_tables_guide.md Co-authored-by: Lilian Lee --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 459cb4283b6bb..9ca0ce9b50555 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -1,4 +1,4 @@ -# Mastering TiDB Partitioned Tables +# Best Practices for Using TiDB Partitioned Tables ## Introduction From d04daf5cf2b2d5352a2d42f11e5987561a74d652 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Mon, 29 Sep 2025 20:13:51 +0800 Subject: [PATCH 11/55] Update tidb_partitioned_tables_guide.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 9ca0ce9b50555..2b660bc005bca 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -43,7 +43,7 @@ Partition pruning is most beneficial in scenarios where query predicates match t - **Multi-tenant or category-based datasets**: Partitioning by tenant ID or category enables queries to focus on a small subset of partitions. 
- **Hybrid Transactional/Analytical Processing (HTAP)**: Especially for range partitioning, TiDB can leverage partition pruning in analytical workloads on TiFlash to skip irrelevant partitions and scan only the necessary subset, preventing **full table scans** on large datasets. -For more use cases, please refer to https://docs.pingcap.com/tidb/stable/partition-pruning/ +For more use cases, see [Partition Pruning](https://docs.pingcap.com/tidb/stable/partition-pruning/). ### Query Performance on Secondary Index: Non-Partitioned Table vs. Local Index vs. Global Index From e1c059a516483d2f5b67f7c146436bebd737726e Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Mon, 29 Sep 2025 20:14:04 +0800 Subject: [PATCH 12/55] Update tidb_partitioned_tables_guide.md Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 2b660bc005bca..3a1764541ec83 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -631,7 +631,7 @@ When working with large tables (e.g., 120 million rows), transforming between pa 1. Batch DML: `INSERT INTO ... SELECT ...` 2. Pipeline DML: `INSERT INTO ... SELECT ...` -3. Import into: `IMPORT INTO ... FROM SELECT ...` +3. `IMPORT INTO`: `IMPORT INTO ... FROM SELECT ...` 4. Online DDL: Direct schema transformation via `ALTER TABLE` This section compares the efficiency and implications of these methods in both directions of conversion, and provides best practice recommendations. From ea9c8f047bb4567091d17f2a64d487bfb77d7327 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Tue, 30 Sep 2025 11:37:20 +0800 Subject: [PATCH 13/55] Update tidb_partitioned_tables_guide.md Co-authored-by: Mattias Jonsson --- tidb_partitioned_tables_guide.md | 1 - 1 file changed, 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 3a1764541ec83..8b1b4b7cc81aa 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -77,7 +77,6 @@ CREATE TABLE `fa` ( KEY `index_fa_on_account_id` (`account_id`), KEY `index_fa_on_user_id` (`user_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin -AUTO_INCREMENT=1284046228560811404 PARTITION BY RANGE (`yeardate`) (PARTITION `fa_2024001` VALUES LESS THAN (2024001), PARTITION `fa_2024002` VALUES LESS THAN (2024002), From 84f2eafa4a1cc0ae7ded198bde3a392ff8fad6e8 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Tue, 30 Sep 2025 11:39:52 +0800 Subject: [PATCH 14/55] Update tidb_partitioned_tables_guide.md Co-authored-by: Mattias Jonsson --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 8b1b4b7cc81aa..c26fa1ad297f9 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -626,7 +626,7 @@ show table employees2 PARTITION (p4) regions; ## Converting Between Partitioned and Non-Partitioned Tables -When working with large tables (e.g., 120 million rows), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. 
TiDB supports several main approaches for such transformations: +When working with large tables (e.g. in this example 120 million rows), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. TiDB supports several main approaches for such transformations: 1. Batch DML: `INSERT INTO ... SELECT ...` 2. Pipeline DML: `INSERT INTO ... SELECT ...` From 3061517b556dd3034262a23998153e4798994268 Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Mon, 13 Oct 2025 15:55:09 +0800 Subject: [PATCH 15/55] Update tidb_partitioned_tables_guide.md --- tidb_partitioned_tables_guide.md | 54 ++++++++++++++++++++++++++++---- 1 file changed, 48 insertions(+), 6 deletions(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index c26fa1ad297f9..094a81794a6e8 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -47,7 +47,7 @@ For more use cases, see [Partition Pruning](https://docs.pingcap.com/tidb/stable ### Query Performance on Secondary Index: Non-Partitioned Table vs. Local Index vs. Global Index -In TiDB, local indexes are the default indexing strategy for partitioned tables. Each partition maintains its own set of indexes, while a Global Index refers to an index that spans all partitions in a partitioned table. Unlike Local Indexes, which are partition-specific and stored separately within each partition, a Global Index maintains a single, unified index across the entire table. This index includes references to all rows, regardless of which partition they belong to, and thus can provide global queries and operations, such as joins or lookups, with faster access. +In TiDB, local indexes are the default for partitioned tables. Each partition has its own set of indexes. A global index, on the other hand, covers the whole table in one index. This means it keeps track of all rows across all partitions. Global indexes can be faster for queries across multiple partitions because local indexes needs to do one lookup in each partition separately, while global index only needs one lookup for the whole table. #### What Did We Test @@ -71,13 +71,13 @@ CREATE TABLE `fa` ( `account_id` bigint(20) NOT NULL, `sid` bigint(20) DEFAULT NULL, `user_id` bigint NOT NULL, - `yeardate` int NOT NULL, - PRIMARY KEY (`id`,`yeardate`) /*T![clustered_index] CLUSTERED */, + `date` int NOT NULL, + PRIMARY KEY (`id`,`date`) /*T![clustered_index] CLUSTERED */, KEY `index_fa_on_sid` (`sid`), KEY `index_fa_on_account_id` (`account_id`), KEY `index_fa_on_user_id` (`user_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin -PARTITION BY RANGE (`yeardate`) +PARTITION BY RANGE (`date`) (PARTITION `fa_2024001` VALUES LESS THAN (2024001), PARTITION `fa_2024002` VALUES LESS THAN (2024002), PARTITION `fa_2024003` VALUES LESS THAN (2024003), @@ -637,7 +637,48 @@ This section compares the efficiency and implications of these methods in both d ### Method 1: Batch DML INSERT INTO ... SELECT ... 
-**By Default** +#### Table Schema: `fa` +```sql +CREATE TABLE `fa` ( + `id` bigint NOT NULL AUTO_INCREMENT, + `account_id` bigint(20) NOT NULL, + `sid` bigint(20) DEFAULT NULL, + `user_id` bigint NOT NULL, + `date` int NOT NULL, + PRIMARY KEY (`id`,`date`) /*T![clustered_index] CLUSTERED */, + KEY `index_fa_on_sid` (`sid`), + KEY `index_fa_on_account_id` (`account_id`), + KEY `index_fa_on_user_id` (`user_id`) +) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin +PARTITION BY RANGE (`date`) +(PARTITION `fa_2024001` VALUES LESS THAN (2024001), +PARTITION `fa_2024002` VALUES LESS THAN (2024002), +PARTITION `fa_2024003` VALUES LESS THAN (2024003), +... +... +PARTITION `fa_2024366` VALUES LESS THAN (2024366)) +``` + + +#### Table Schema: `fa_new` +```sql +CREATE TABLE `fa` ( + `id` bigint NOT NULL AUTO_INCREMENT, + `account_id` bigint(20) NOT NULL, + `sid` bigint(20) DEFAULT NULL, + `user_id` bigint NOT NULL, + `date` int NOT NULL, + PRIMARY KEY (`id`,`date`) /*T![clustered_index] CLUSTERED */, + KEY `index_fa_on_sid` (`sid`), + KEY `index_fa_on_account_id` (`account_id`), + KEY `index_fa_on_user_id` (`user_id`) +) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin +); +``` + +#### Description +Supports bi-directional import, suitable for very large datasets. +### Method 1: By Default ```sql SET tidb_mem_quota_query = 0; @@ -645,6 +686,7 @@ INSERT INTO fa_new SELECT * FROM fa; -- 120 million rows copied in 1h 52m 47s ``` + ### Method 2: Pipeline DML INSERT INTO ... SELECT ... ```sql @@ -680,7 +722,7 @@ mysql> alter table fa REMOVE PARTITIONING; ```sql SET @@global.tidb_ddl_reorg_worker_cnt = 16; SET @@global.tidb_ddl_reorg_batch_size = 4096; -ALTER TABLE fa PARTITION BY RANGE (`yearweek`) +ALTER TABLE fa PARTITION BY RANGE (`date`) (PARTITION `fa_2024001` VALUES LESS THAN (2024001), PARTITION `fa_2024002` VALUES LESS THAN (2024002), ... From 1b0893cbf21f963ef823860e1e3c1e237fca970f Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Mon, 13 Oct 2025 15:56:16 +0800 Subject: [PATCH 16/55] Update tidb_partitioned_tables_guide.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Daniël van Eeden --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 094a81794a6e8..37e3a0aa3a513 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -4,7 +4,7 @@ Partitioned tables in TiDB offer a versatile approach to managing large datasets, improving query efficiency, facilitating bulk data deletion, and alleviating write hotspot issues. By dividing data into logical segments, TiDB can leverage **partition pruning** to skip irrelevant data during query execution, reducing resource consumption and accelerating performance—particularly in OLAP workloads with massive datasets. -A common use case is **range partitioning combined with local indexes**, which enables efficient historical data cleanup through operations like DROP PARTITION. This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning—such as those lacking partition key filters—may experience degraded performance. 
In such cases, [**global indexes**](https://docs.pingcap.com/tidb/stable/partitioned-table/#global-indexes) can be introduced to mitigate the performance impact by providing a unified index structure across all partitions. +A common use case is **range partitioning combined with local indexes**, which enables efficient historical data cleanup through operations like [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning—such as those lacking partition key filters—may experience degraded performance. In such cases, [**global indexes**](https://docs.pingcap.com/tidb/stable/partitioned-table/#global-indexes) can be introduced to mitigate the performance impact by providing a unified index structure across all partitions. Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on AUTO_INCREMENT-style IDs where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. From b3f3cf758b0abaf2e6ec5676c71e1346d2cc9623 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Mon, 13 Oct 2025 15:57:40 +0800 Subject: [PATCH 17/55] Update tidb_partitioned_tables_guide.md Co-authored-by: Hangjie Mo --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 37e3a0aa3a513..78f444fc5ffd2 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -218,7 +218,7 @@ To compare the performance of TTL and partition drop, we configured TTL to execu #### Findings **TTL Performance:** -- On a high-write table, TTL runs every 10 minutes. +- On a write-heavy table, TTL runs every 10 minutes. - With 50 threads, each TTL job took 8–10 minutes, deleting 7–11 million rows. - With 100 threads, it handled up to 20 million rows, but execution time increased to 15–30 minutes, with greater variance. - TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. From 8a2bf1183165e99d57d03017a67669807335ebef Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Mon, 13 Oct 2025 15:57:56 +0800 Subject: [PATCH 18/55] Update tidb_partitioned_tables_guide.md Co-authored-by: Hangjie Mo --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 78f444fc5ffd2..034bcbbf4abd6 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -219,7 +219,7 @@ To compare the performance of TTL and partition drop, we configured TTL to execu **TTL Performance:** - On a write-heavy table, TTL runs every 10 minutes. -- With 50 threads, each TTL job took 8–10 minutes, deleting 7–11 million rows. +- With 50 threads, each TTL job took 8–10 minutes, deleted 7–11 million rows. 
- With 100 threads, it handled up to 20 million rows, but execution time increased to 15–30 minutes, with greater variance. - TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. From 7bf8e9d01bb94bc884bafee2d6952d977d5623d3 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Mon, 13 Oct 2025 16:50:30 +0800 Subject: [PATCH 19/55] Update tidb_partitioned_tables_guide.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Daniël van Eeden --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 034bcbbf4abd6..940e8fcdf5f06 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -746,4 +746,4 @@ Query OK, 0 rows affected, 1 warning (2 hours 31 min 57.05 sec) TiDB offers two approaches for converting tables between partitioned and non-partitioned states: -- Choose an offline method like `IMPORT INTO` when your system can accommodate a maintenance window, as it delivers much better performance. Use online DDL only when zero downtime is a strict requirement. \ No newline at end of file +In this experiment, the table structures have been anonymized. For more detailed information on the usage of [TTL (Time To Live)](/time-to-live.md). \ No newline at end of file From 80289fe5e8374066d7326b5eb1811349c2875a85 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Mon, 13 Oct 2025 16:50:51 +0800 Subject: [PATCH 20/55] Update tidb_partitioned_tables_guide.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Daniël van Eeden --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 940e8fcdf5f06..e033ede193faf 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -6,7 +6,7 @@ Partitioned tables in TiDB offer a versatile approach to managing large datasets A common use case is **range partitioning combined with local indexes**, which enables efficient historical data cleanup through operations like [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning—such as those lacking partition key filters—may experience degraded performance. In such cases, [**global indexes**](https://docs.pingcap.com/tidb/stable/partitioned-table/#global-indexes) can be introduced to mitigate the performance impact by providing a unified index structure across all partitions. -Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on AUTO_INCREMENT-style IDs where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. 
+Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [AUTO_INCREMENT-style IDs](/auto_increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. While partitioning offers clear benefits, it also presents **common challenges**, such as **hotspots caused by newly created range partitions**. To address this, TiDB provides techniques for automatic or manual region pre-splitting, ensuring balanced data distribution and avoiding bottlenecks. From 5700b84964177dd745e04a094cac194d9420ff68 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Mon, 13 Oct 2025 16:51:19 +0800 Subject: [PATCH 21/55] Update tidb_partitioned_tables_guide.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Daniël van Eeden --- tidb_partitioned_tables_guide.md | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index e033ede193faf..9f0f4fd21162b 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -307,16 +307,7 @@ Dropping a partition on a table with a Global Index took **76 seconds**, while t **Global Index** ```sql -mysql> alter table A drop partition A_2024363; -Query OK, 0 rows affected (1 min 16.02 sec) -``` - -**Local Index** - -```sql -mysql> alter table A drop partition A_2024363; -Query OK, 0 rows affected (0.52 sec) -``` +ALTER TABLE A DROP PARTITION A_2024363; #### Recommendation From 82b27d4aeacd3f440d439d4494622d7e9620bf2a Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 14:21:47 +0800 Subject: [PATCH 22/55] Update tidb_partitioned_tables_guide.md --- tidb_partitioned_tables_guide.md | 65 ++++++++++++++++---------------- 1 file changed, 32 insertions(+), 33 deletions(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 9f0f4fd21162b..e623f6dd60682 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -27,7 +27,7 @@ This document examines partitioned tables in TiDB from multiple angles, includin By understanding these aspects, you can make informed decisions on whether and how to implement partitioning in your TiDB environment. -> **Note:** If you're new to partitioned tables in TiDB, we recommend reviewing the [Partitioned Table User Guide](https://docs.pingcap.com/tidb/stable/partitioned-table) first to better understand key concepts like partition pruning, global vs. local indexes, and partition strategies. +> **Note:** If you're new to partitioned tables in TiDB, we recommend reviewing the [Partitioned Table User Guide](/partitioned-table.md) first to better understand key concepts like partition pruning, global vs. local indexes, and partition strategies. ## Improving query efficiency @@ -83,7 +83,7 @@ PARTITION `fa_2024002` VALUES LESS THAN (2024002), PARTITION `fa_2024003` VALUES LESS THAN (2024003), ... ... 
-PARTITION `fa_2024366` VALUES LESS THAN (2024366)) +PARTITION `fa_2024366` VALUES LESS THAN (2024366)); ``` #### SQL @@ -247,7 +247,7 @@ CREATE TABLE `ad_cache` ( ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin TTL=`expire_time` + INTERVAL 0 DAY TTL_ENABLE='ON' -TTL_JOB_INTERVAL='10m' +TTL_JOB_INTERVAL='10m'; ``` **Drop Partition (Range INTERVAL partitioning)** @@ -273,14 +273,14 @@ PARTITION BY RANGE COLUMNS (create_time) INTERVAL (10 MINUTE) FIRST PARTITION LESS THAN ('2025-02-19 18:00:00') ... -LAST PARTITION LESS THAN ('2025-02-19 20:00:00') +LAST PARTITION LESS THAN ('2025-02-19 20:00:00'); ``` It's required to run DDL alter table partition ... to change the FIRST PARTITION and LAST PARTITION periodically. These two DDL statements can drop the old partitions and create new ones. ```sql -ALTER TABLE ad_cache FIRST PARTITION LESS THAN ("${nextTimestamp}") -ALTER TABLE ad_cache LAST PARTITION LESS THAN ("${nextTimestamp}") +ALTER TABLE ad_cache FIRST PARTITION LESS THAN ("${nextTimestamp}"); +ALTER TABLE ad_cache LAST PARTITION LESS THAN ("${nextTimestamp}"); ``` #### Recommendation @@ -308,6 +308,7 @@ Dropping a partition on a table with a Global Index took **76 seconds**, while t ```sql ALTER TABLE A DROP PARTITION A_2024363; +``` #### Recommendation @@ -374,7 +375,7 @@ PARTITION BY KEY (id) PARTITIONS 16; When converting a non-partitioned table to a partitioned table, TiDB creates a separate Region for each partition. This may significantly increase the total Region count. Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. Example: ```sql -select * from server_info where `serial_no` = ? +select * from server_info where `serial_no` = ?; ``` **Mitigation**: Add a **global index** on the filtering columns used by these queries to reduce scanning overhead. While creating a global index can significantly slow down DROP PARTITION operations, **hash and key partitioned tables do not support DROP PARTITION**. In practice, such partitions are rarely removed, making global indexes a feasible solution in these scenarios. Example: @@ -415,6 +416,15 @@ However, if the initial write traffic to this new partition is **very high**, th **Impact:** This imbalance can cause that TiKV node to trigger **flow control**, leading to a sharp drop in QPS, a spike in write latency, and increased CPU usage on the affected node, which in turn may impact the overall read and write performance of the cluster. + +### Summary Table + +| Approach | Read Hotspot Risk | Write Hotspot Risk | Operational Complexity | Query Performance | Data Cleanup | +|---|---|---|---|---|---| +| NONCLUSTERED Partitioned | Low (with merge_option=deny) | Low (auto pre-split) | Low | Moderate (extra lookups) | Fast (DROP PARTITION) | +| CLUSTERED Partitioned | Medium (manual intervention) | Medium (manual split) | High | High (direct access) | Fast (DROP PARTITION) | +| CLUSTERED Non-partitioned | None | Medium (single table) | Low | High | Slow (DELETE/TTL) | + #### Solutions **1. 
NONCLUSTERED Partitioned Table** @@ -485,15 +495,15 @@ A common practice is to split the number of regions to **match** the number of T To split regions for the primary key of all partitions in a partitioned table, you can use a command like: ```sql -SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "1970-01-01") AND (100000, "9999-12-31") REGIONS ; +SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "1970-01-01") AND (100000, "9999-12-31") REGIONS ; ``` -This example will split each partition's primary key range into `` regions between the specified boundary values. +This example will split each partition's primary key range into `` regions between the specified boundary values. **Splitting Regions for the secondary index of all partitions.** ```sql -SPLIT PARTITION TABLE employees INDEX `idx_employees_on_store_id` BETWEEN (1) AND (1000) REGIONS ; +SPLIT PARTITION TABLE employees INDEX `idx_employees_on_store_id` BETWEEN (1) AND (1000) REGIONS ; ``` **(Optional) When adding a new partition, you MUST manually split regions for its primary key and indices.** @@ -503,9 +513,9 @@ ALTER TABLE employees ADD PARTITION (PARTITION p4 VALUES LESS THAN (2011)); SHOW TABLE employees PARTITION (p4) regions; -SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "2006-01-01") AND (100000, "2011-01-01") REGIONS ; +SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "2006-01-01") AND (100000, "2011-01-01") REGIONS ; -SPLIT PARTITION TABLE employees PARTITION (p4) INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; +SPLIT PARTITION TABLE employees PARTITION (p4) INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; SHOW TABLE employees PARTITION (p4) regions; ``` @@ -572,13 +582,13 @@ A common practice is to split the number of regions to **match** the number of T **Splitting regions for all partitions** ```sql -SPLIT PARTITION TABLE employees2 BETWEEN (1,"1970-01-01") AND (100000,"9999-12-31") REGIONS ; +SPLIT PARTITION TABLE employees2 BETWEEN (1,"1970-01-01") AND (100000,"9999-12-31") REGIONS ; ``` **Splitting regions for the secondary index of all partitions.** ```sql -SPLIT PARTITION TABLE employees2 INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; +SPLIT PARTITION TABLE employees2 INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; ``` **(Optional) When adding a new partition, you MUST manually split regions for the specific partition and its indices.** @@ -588,9 +598,9 @@ ALTER TABLE employees2 ADD PARTITION (PARTITION p4 VALUES LESS THAN (2011)); show table employees2 PARTITION (p4) regions; -SPLIT PARTITION TABLE employees2 PARTITION (p4) BETWEEN (1,"2006-01-01") AND (100000,"2011-01-01") REGIONS ; +SPLIT PARTITION TABLE employees2 PARTITION (p4) BETWEEN (1,"2006-01-01") AND (100000,"2011-01-01") REGIONS ; -SPLIT PARTITION TABLE employees2 PARTITION (p4) INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; +SPLIT PARTITION TABLE employees2 PARTITION (p4) INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; show table employees2 PARTITION (p4) regions; ``` @@ -607,13 +617,6 @@ show table employees2 PARTITION (p4) regions; **Recommendation:** - Best suited for use cases that require stable performance and do not benefit from partition-based data management. 
-### Summary Table - -| Approach | Read Hotspot Risk | Write Hotspot Risk | Operational Complexity | Query Performance | Data Cleanup | -|---|---|---|---|---|---| -| NONCLUSTERED Partitioned | Low (with merge_option=deny) | Low (auto pre-split) | Low | Moderate (extra lookups) | Fast (DROP PARTITION) | -| CLUSTERED Partitioned | Medium (manual intervention) | Medium (manual split) | High | High (direct access) | Fast (DROP PARTITION) | -| CLUSTERED Non-partitioned | None | Medium (single table) | Low | High | Slow (DELETE/TTL) | ## Converting Between Partitioned and Non-Partitioned Tables @@ -626,8 +629,6 @@ When working with large tables (e.g. in this example 120 million rows), transfor This section compares the efficiency and implications of these methods in both directions of conversion, and provides best practice recommendations. -### Method 1: Batch DML INSERT INTO ... SELECT ... - #### Table Schema: `fa` ```sql CREATE TABLE `fa` ( @@ -647,7 +648,7 @@ PARTITION `fa_2024002` VALUES LESS THAN (2024002), PARTITION `fa_2024003` VALUES LESS THAN (2024003), ... ... -PARTITION `fa_2024366` VALUES LESS THAN (2024366)) +PARTITION `fa_2024366` VALUES LESS THAN (2024366)); ``` @@ -663,13 +664,12 @@ CREATE TABLE `fa` ( KEY `index_fa_on_sid` (`sid`), KEY `index_fa_on_account_id` (`account_id`), KEY `index_fa_on_user_id` (`user_id`) -) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin -); +) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin; ``` #### Description -Supports bi-directional import, suitable for very large datasets. -### Method 1: By Default +This example shows converting a partitioned table to a non-partitioned table, but the same methods also work for converting a non-partitioned table to a partitioned table. +### Method 1: Batch DML INSERT INTO ... SELECT ... ```sql SET tidb_mem_quota_query = 0; @@ -703,8 +703,7 @@ Records: 120000000, ID: c1d04eec-fb49-49bb-af92-bf3d6e2d3d87 ```sql SET @@global.tidb_ddl_reorg_worker_cnt = 16; SET @@global.tidb_ddl_reorg_batch_size = 4096; - -mysql> alter table fa REMOVE PARTITIONING; +alter table fa REMOVE PARTITIONING; -- real 170m12.024s (≈ 2h 50m) ``` @@ -737,4 +736,4 @@ Query OK, 0 rows affected, 1 warning (2 hours 31 min 57.05 sec) TiDB offers two approaches for converting tables between partitioned and non-partitioned states: -In this experiment, the table structures have been anonymized. For more detailed information on the usage of [TTL (Time To Live)](/time-to-live.md). \ No newline at end of file +Choose an offline method like [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md) when your system can accommodate a maintenance window, as it delivers much better performance. Use online DDL only when zero downtime is a strict requirement. 
\ No newline at end of file From 97fd5c4b7aebabb49038b80e209eda6f2ce0a024 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Tue, 14 Oct 2025 14:22:41 +0800 Subject: [PATCH 23/55] Update tidb_partitioned_tables_guide.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Daniël van Eeden --- tidb_partitioned_tables_guide.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index e623f6dd60682..55cc03c0303b5 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -225,7 +225,8 @@ To compare the performance of TTL and partition drop, we configured TTL to execu **Partition Drop Performance:** - DROP PARTITION removes an entire data segment instantly, with minimal resource usage. -- DROP PARTITION is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. +- `ALTER TABLE ... DROP PARTITION` removes an entire data segment instantly, with minimal resource usage. +- `ALTER TABLE ... DROP PARTITION` is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. #### How to Use TTL and Partition Drop in TiDB From 45bfa106ffb619f8b1abd84ac9d17c7ebcc55f90 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Tue, 14 Oct 2025 14:24:16 +0800 Subject: [PATCH 24/55] Update tidb_partitioned_tables_guide.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Daniël van Eeden --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 55cc03c0303b5..8852de949d545 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -464,7 +464,7 @@ PARTITION BY RANGE ( YEAR(hired) ) ( ); ``` -Adding the [merge_option=deny](https://docs.pingcap.com/tidb/stable/table-attributes/#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. +Adding the [merge_option=deny](/table-attributes.md#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. ```sql -- table From ad70a86f43f6b46bb2a2eb73189c62f7bdb41b81 Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 14:25:54 +0800 Subject: [PATCH 25/55] Update tidb_partitioned_tables_guide.md --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index e623f6dd60682..c11941c16368f 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -375,7 +375,7 @@ PARTITION BY KEY (id) PARTITIONS 16; When converting a non-partitioned table to a partitioned table, TiDB creates a separate Region for each partition. This may significantly increase the total Region count. 
Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. Example: ```sql -select * from server_info where `serial_no` = ?; +SELECT * FROM server_info WHERE `serial_no` = ?; ``` **Mitigation**: Add a **global index** on the filtering columns used by these queries to reduce scanning overhead. While creating a global index can significantly slow down DROP PARTITION operations, **hash and key partitioned tables do not support DROP PARTITION**. In practice, such partitions are rarely removed, making global indexes a feasible solution in these scenarios. Example: From bc770ba865dced7c32cbb40dcdf956518e9c2a27 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Tue, 14 Oct 2025 14:26:17 +0800 Subject: [PATCH 26/55] Update tidb_partitioned_tables_guide.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Daniël van Eeden --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 8852de949d545..26a42c3eea601 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -323,7 +323,7 @@ If you need to drop partitions frequently and minimize the performance impact on In TiDB, **write hotspots** occur when incoming write traffic is unevenly distributed across Regions. -This is common when the primary key is **monotonically increasing**—for example, an AUTO_INCREMENT primary key with AUTO_ID_CACHE=1, or secondary index on datetime column with default value set to CURRENT_TIMESTAMP—because new rows and index entries are always appended to the "rightmost" Region. Over time, this can lead to: +This is common when the primary key is **monotonically increasing**—for example, an `AUTO_INCREMENT` primary key with `AUTO_ID_CACHE=1`, or secondary index on datetime column with default value set to `CURRENT_TIMESTAMP`—because new rows and index entries are always appended to the "rightmost" Region. Over time, this can lead to: - A single Region handling most of the write workload, while other Regions remain idle. - Higher write latency and reduced throughput. From afc19e4c3a038e192475999ab7de01dd84b547c2 Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 14:27:36 +0800 Subject: [PATCH 27/55] Update tidb_partitioned_tables_guide.md --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index c11941c16368f..75d21f50176e0 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -6,7 +6,7 @@ Partitioned tables in TiDB offer a versatile approach to managing large datasets A common use case is **range partitioning combined with local indexes**, which enables efficient historical data cleanup through operations like [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning—such as those lacking partition key filters—may experience degraded performance. 
In such cases, [**global indexes**](https://docs.pingcap.com/tidb/stable/partitioned-table/#global-indexes) can be introduced to mitigate the performance impact by providing a unified index structure across all partitions. -Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [AUTO_INCREMENT-style IDs](/auto_increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. +Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [AUTO_INCREMENT-style IDs](/auto-increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. While partitioning offers clear benefits, it also presents **common challenges**, such as **hotspots caused by newly created range partitions**. To address this, TiDB provides techniques for automatic or manual region pre-splitting, ensuring balanced data distribution and avoiding bottlenecks. From 9b33c98e418631f69b3911e060eba5fc9d5e6cab Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 14:32:15 +0800 Subject: [PATCH 28/55] Update tidb_partitioned_tables_guide.md --- tidb_partitioned_tables_guide.md | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index b27edac4ca73f..caec8b1d36904 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -6,7 +6,7 @@ Partitioned tables in TiDB offer a versatile approach to managing large datasets A common use case is **range partitioning combined with local indexes**, which enables efficient historical data cleanup through operations like [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning—such as those lacking partition key filters—may experience degraded performance. In such cases, [**global indexes**](https://docs.pingcap.com/tidb/stable/partitioned-table/#global-indexes) can be introduced to mitigate the performance impact by providing a unified index structure across all partitions. -Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [AUTO_INCREMENT-style IDs](/auto-increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. +Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [AUTO_INCREMENT-style IDs](/autoincrement.md) where sequential inserts can overload specific TiKV regions. 
Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. While partitioning offers clear benefits, it also presents **common challenges**, such as **hotspots caused by newly created range partitions**. To address this, TiDB provides techniques for automatic or manual region pre-splitting, ensuring balanced data distribution and avoiding bottlenecks. @@ -225,8 +225,7 @@ To compare the performance of TTL and partition drop, we configured TTL to execu **Partition Drop Performance:** - DROP PARTITION removes an entire data segment instantly, with minimal resource usage. -- `ALTER TABLE ... DROP PARTITION` removes an entire data segment instantly, with minimal resource usage. -- `ALTER TABLE ... DROP PARTITION` is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. +- DROP PARTITION is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. #### How to Use TTL and Partition Drop in TiDB @@ -323,7 +322,7 @@ If you need to drop partitions frequently and minimize the performance impact on In TiDB, **write hotspots** occur when incoming write traffic is unevenly distributed across Regions. -This is common when the primary key is **monotonically increasing**—for example, an `AUTO_INCREMENT` primary key with `AUTO_ID_CACHE=1`, or secondary index on datetime column with default value set to `CURRENT_TIMESTAMP`—because new rows and index entries are always appended to the "rightmost" Region. Over time, this can lead to: +This is common when the primary key is **monotonically increasing**—for example, an AUTO_INCREMENT primary key with AUTO_ID_CACHE=1, or secondary index on datetime column with default value set to CURRENT_TIMESTAMP—because new rows and index entries are always appended to the "rightmost" Region. Over time, this can lead to: - A single Region handling most of the write workload, while other Regions remain idle. - Higher write latency and reduced throughput. @@ -464,7 +463,7 @@ PARTITION BY RANGE ( YEAR(hired) ) ( ); ``` -Adding the [merge_option=deny](/table-attributes.md#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. +Adding the [merge_option=deny](https://docs.pingcap.com/tidb/stable/table-attributes/#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. ```sql -- table @@ -669,7 +668,7 @@ CREATE TABLE `fa` ( ``` #### Description -This example shows converting a partitioned table to a non-partitioned table, but the same methods also work for converting a non-partitioned table to a partitioned table. +These examples show converting a partitioned table to a non-partitioned table, but the same methods also work for converting a non-partitioned table to a partitioned table. ### Method 1: Batch DML INSERT INTO ... SELECT ... 
```sql From c16bae130bbe814b95cf8c399a933e6cb34146e4 Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 14:32:54 +0800 Subject: [PATCH 29/55] Update tidb_partitioned_tables_guide.md --- tidb_partitioned_tables_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index caec8b1d36904..7a3ac514198f8 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -6,7 +6,7 @@ Partitioned tables in TiDB offer a versatile approach to managing large datasets A common use case is **range partitioning combined with local indexes**, which enables efficient historical data cleanup through operations like [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning—such as those lacking partition key filters—may experience degraded performance. In such cases, [**global indexes**](https://docs.pingcap.com/tidb/stable/partitioned-table/#global-indexes) can be introduced to mitigate the performance impact by providing a unified index structure across all partitions. -Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [AUTO_INCREMENT-style IDs](/autoincrement.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. +Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [AUTO_INCREMENT-style IDs](/auto-increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. While partitioning offers clear benefits, it also presents **common challenges**, such as **hotspots caused by newly created range partitions**. To address this, TiDB provides techniques for automatic or manual region pre-splitting, ensuring balanced data distribution and avoiding bottlenecks. From a724b1f23317ed7756a478704a709f85b3dc0056 Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 14:55:53 +0800 Subject: [PATCH 30/55] Update tidb_partitioned_tables_guide.md --- tidb_partitioned_tables_guide.md | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 7a3ac514198f8..eeee073f90689 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -162,9 +162,14 @@ ALTER TABLE ADD UNIQUE INDEX (col1, col2) GLOBAL; ``` -- Adds a global index to an existing partitioned table. -- GLOBAL must be explicitly specified. -- You can also use ADD INDEX for non-unique global indexes. + +Adds a global index to an existing partitioned table. + +- The `GLOBAL` keyword must be explicitly specified. +- For non-unique global indexes, use `ADD INDEX` instead of `ADD UNIQUE INDEX`. 
+ - Not supported in v8.5.x + - Available starting from v9.0.0-beta.1 + - Expected to be included in the next LTS release **Option 2: Define Inline on Table Creation** From db96727a470153af5ce8f091a90fa660e3fd9059 Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Tue, 14 Oct 2025 15:10:59 +0800 Subject: [PATCH 31/55] Update tidb_partitioned_tables_guide.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Daniël van Eeden --- tidb_partitioned_tables_guide.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index eeee073f90689..c8d3d63f5f47d 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -15,8 +15,8 @@ This document examines partitioned tables in TiDB from multiple angles, includin ## Agenda - Improving query efficiency - - Partition pruning - - Query performance comparison: Non-Partitioned Table vs. Local Index vs. Global Index + - Partition pruning + - Query performance comparison: Non-Partitioned Table vs. Local Index vs. Global Index - Facilitating bulk data deletion - Data cleanup efficiency: TTL vs. Direct Partition Drop - Partition drop efficiency: Local Index vs Global Index From 515a82e5ef604c7dde0c3f528292ffd893b5437f Mon Sep 17 00:00:00 2001 From: shaoxiqian <85105033+shaoxiqian@users.noreply.github.com> Date: Tue, 14 Oct 2025 15:11:29 +0800 Subject: [PATCH 32/55] Update tidb_partitioned_tables_guide.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Daniël van Eeden --- tidb_partitioned_tables_guide.md | 1 + 1 file changed, 1 insertion(+) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index c8d3d63f5f47d..a6d90d4477ef4 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -52,6 +52,7 @@ In TiDB, local indexes are the default for partitioned tables. Each partition ha #### What Did We Test We evaluated query performance across three table configurations in TiDB: + - Non-Partitioned Table - Partitioned Table with Global Index - Partitioned Table with Local Index From 2875c0e89d85c3cc8b7f796895b69cb14525dc84 Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 15:15:35 +0800 Subject: [PATCH 33/55] Update tidb_partitioned_tables_guide.md --- tidb_partitioned_tables_guide.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index eeee073f90689..abf9dc874ed52 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -39,7 +39,7 @@ By understanding these aspects, you can make informed decisions on whether and h Partition pruning is most beneficial in scenarios where query predicates match the partitioning strategy. Common use cases include: -- **Time-series data queries**: When data is partitioned by time ranges (e.g., daily, monthly), queries restricted to a specific time period can quickly skip unrelated partitions. +- **Time-series data queries**: When data is partitioned by time ranges (for example, daily, monthly), queries restricted to a specific time period can quickly skip unrelated partitions. - **Multi-tenant or category-based datasets**: Partitioning by tenant ID or category enables queries to focus on a small subset of partitions. 
- **Hybrid Transactional/Analytical Processing (HTAP)**: Especially for range partitioning, TiDB can leverage partition pruning in analytical workloads on TiFlash to skip irrelevant partitions and scan only the necessary subset, preventing **full table scans** on large datasets. @@ -106,7 +106,7 @@ WHERE `fa`.`sid` IN ( #### Findings -Data came from a table with **366 range partitions** (e.g., by date). +Data came from a table with **366 range partitions** (for example, by date). - The **Average Query Time** was obtained from the statement_summary view. - The query used a **secondary index** and returned **400 rows**. @@ -196,7 +196,7 @@ The performance overhead of partitioned tables in TiDB depends significantly on - The more partitions you have, the more severe the potential performance degradation. - With a smaller number of partitions, the impact may not be as noticeable, but it's still workload-dependent. - For local indexes, if a query does not include effective partition pruning conditions, the number of partitions directly correlates with the number of RPCs triggered. This means more partitions will likely result in more RPCs, leading to higher latency. -- For global indexes, the number of RPCs and the degree of performance regression depend on both the number of partitions involved and how many rows need to be retrieved (i.e., the number of rows requiring table lookups). +- For global indexes, the number of RPCs and the degree of performance regression depend on both the number of partitions involved and how many rows need to be retrieved (that is, the number of rows requiring table lookups). #### Recommendation @@ -625,7 +625,7 @@ show table employees2 PARTITION (p4) regions; ## Converting Between Partitioned and Non-Partitioned Tables -When working with large tables (e.g. in this example 120 million rows), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. TiDB supports several main approaches for such transformations: +When working with large tables (for example in this example 120 million rows), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. TiDB supports several main approaches for such transformations: 1. Batch DML: `INSERT INTO ... SELECT ...` 2. Pipeline DML: `INSERT INTO ... SELECT ...` From 9fa0726678ba924349bc1f3af22408894ebde2fb Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 16:11:55 +0800 Subject: [PATCH 34/55] Update tidb_partitioned_tables_guide.md --- tidb_partitioned_tables_guide.md | 21 +++------------------ 1 file changed, 3 insertions(+), 18 deletions(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index efc844d071ac2..6f0d1cc7dc781 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -1,5 +1,5 @@ # Best Practices for Using TiDB Partitioned Tables - +This guide introduces how to use partitioned tables in TiDB to improve performance, simplify data management, and handle large-scale datasets efficiently. ## Introduction Partitioned tables in TiDB offer a versatile approach to managing large datasets, improving query efficiency, facilitating bulk data deletion, and alleviating write hotspot issues. 
By dividing data into logical segments, TiDB can leverage **partition pruning** to skip irrelevant data during query execution, reducing resource consumption and accelerating performance—particularly in OLAP workloads with massive datasets. @@ -10,24 +10,9 @@ Another frequent scenario is using **hash or key partitioning** to address write While partitioning offers clear benefits, it also presents **common challenges**, such as **hotspots caused by newly created range partitions**. To address this, TiDB provides techniques for automatic or manual region pre-splitting, ensuring balanced data distribution and avoiding bottlenecks. -This document examines partitioned tables in TiDB from multiple angles, including query optimization, data cleanup, write scalability, and index management. Through detailed scenarios and best practices, it aims to equip you with the knowledge to make informed decisions about when and how to adopt partitioning strategies in your TiDB environment. - -## Agenda - -- Improving query efficiency - - Partition pruning - - Query performance comparison: Non-Partitioned Table vs. Local Index vs. Global Index -- Facilitating bulk data deletion - - Data cleanup efficiency: TTL vs. Direct Partition Drop - - Partition drop efficiency: Local Index vs Global Index -- Mitigating write hotspot issues -- Partition management challenge - - How to avoid hotspots caused by new range partitions -- Converting between partitioned and non-partitioned tables - -By understanding these aspects, you can make informed decisions on whether and how to implement partitioning in your TiDB environment. +This document examines partitioned tables in TiDB from multiple angles, including query optimization, data cleanup, write scalability, and index management. Through detailed scenarios and best practices, it provides practical guidance on optimizing partitioned table design and performance tuning in TiDB. -> **Note:** If you're new to partitioned tables in TiDB, we recommend reviewing the [Partitioned Table User Guide](/partitioned-table.md) first to better understand key concepts like partition pruning, global vs. local indexes, and partition strategies. +> **Note:** To get started with the fundamentals, refer to the [Partitioned Table User Guide](/partitioned-table.md), which explains key concepts such as partition pruning, index types, and partitioning methods. ## Improving query efficiency From f9461289e21c95074afdafce6f8071b23675029d Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 16:27:10 +0800 Subject: [PATCH 35/55] Update tidb_partitioned_tables_guide.md --- tidb_partitioned_tables_guide.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 6f0d1cc7dc781..2be4b97881943 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -210,8 +210,8 @@ To compare the performance of TTL and partition drop, we configured TTL to execu **TTL Performance:** - On a write-heavy table, TTL runs every 10 minutes. -- With 50 threads, each TTL job took 8–10 minutes, deleted 7–11 million rows. -- With 100 threads, it handled up to 20 million rows, but execution time increased to 15–30 minutes, with greater variance. +- With 50 threads, each TTL job took 8—10 minutes, deleted 7—11 million rows. +- With 100 threads, it handled up to 20 million rows, but execution time increased to 15—30 minutes, with greater variance. 
- TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. **Partition Drop Performance:** From bc27e69ebba1f58ff39c0f156bedc4f0e6f244af Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 16:35:56 +0800 Subject: [PATCH 36/55] Update tidb_partitioned_tables_guide.md --- tidb_partitioned_tables_guide.md | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md index 2be4b97881943..0792c1d70684b 100644 --- a/tidb_partitioned_tables_guide.md +++ b/tidb_partitioned_tables_guide.md @@ -1,5 +1,7 @@ # Best Practices for Using TiDB Partitioned Tables + This guide introduces how to use partitioned tables in TiDB to improve performance, simplify data management, and handle large-scale datasets efficiently. + ## Introduction Partitioned tables in TiDB offer a versatile approach to managing large datasets, improving query efficiency, facilitating bulk data deletion, and alleviating write hotspot issues. By dividing data into logical segments, TiDB can leverage **partition pruning** to skip irrelevant data during query execution, reducing resource consumption and accelerating performance—particularly in OLAP workloads with massive datasets. @@ -121,6 +123,7 @@ Metrics collected: ``` **Partition table with global index** + ```yaml | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | |------------------------|---------|-----------|---------|-----------|-------------------------------------------------|----------------|---------------|----------|------| @@ -130,6 +133,7 @@ Metrics collected: ``` **Partition table with local index** + ```yaml | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | |------------------------|---------|-----------|---------|-----------|--------------------------------------|----------------|---------------|---------|-------| @@ -215,6 +219,7 @@ To compare the performance of TTL and partition drop, we configured TTL to execu - TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. **Partition Drop Performance:** + - DROP PARTITION removes an entire data segment instantly, with minimal resource usage. - DROP PARTITION is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. @@ -328,10 +333,12 @@ TiDB stores table data in **Regions**, each covering a continuous range of row k When the primary key is AUTO_INCREMENT and the secondary indexes on datetime columns are monotonically increasing: **Without Partitioning:** + - New rows always have the highest key values and are inserted into the same "last Region." - That Region is served by one TiKV node at a time, becoming a single write bottleneck. **With Hash/Key Partitioning:** + - The table and the secondary indexes are split into multiple partitions using a hash or key function on the primary key or indexed columns. - Each partition has its own set of Regions, often distributed across different TiKV nodes. - Inserts are spread across multiple Regions in parallel, improving load distribution and throughput. 
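To make the bullets above concrete, the following is a minimal, hypothetical sketch (the table name `orders` and the partition count of 8 are illustrative assumptions, not part of the tested schema) of spreading sequential inserts with key partitioning and then inspecting how the Regions are distributed:

```sql
-- Hypothetical example: an AUTO_INCREMENT key normally appends to the "last" Region;
-- key partitioning on the primary key spreads inserts across 8 partitions.
CREATE TABLE orders (
  id BIGINT NOT NULL AUTO_INCREMENT,
  created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  amount DECIMAL(10,2) DEFAULT NULL,
  PRIMARY KEY (id) /*T![clustered_index] CLUSTERED */
) PARTITION BY KEY (id) PARTITIONS 8;

-- Each partition owns its own Regions; check how they are spread across TiKV stores.
SHOW TABLE orders REGIONS;
```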
@@ -390,9 +397,11 @@ New range partitions in a partitioned table can easily lead to hotspot issues in When using **range-partitioned tables**, if queries do **not** filter data using the partition key, new empty partitions can easily become read hotspots. **Root Cause:** + By default, TiDB creates an empty region for each partition when the table is created. If no data is written for a while, multiple empty partitions' regions may be merged into a **single region**. **Impact:** + When a query does **not filter by partition key**, TiDB will **scan all partitions** (as seen in the execution plan partition:all). As a result, the single region holding multiple empty partitions will be scanned repeatedly, leading to a **read hotspot**. **Write Hotspot** @@ -405,6 +414,7 @@ In TiDB, any newly created table or partition initially contains only **one regi However, if the initial write traffic to this new partition is **very high**, the TiKV node hosting that single initial region will be under heavy write pressure. In such cases, it may not have enough spare resources (I/O capacity, CPU cycles) to handle both the application writes and the scheduling of newly split regions to other TiKV nodes. This can delay region distribution, keeping most writes concentrated on the same node for longer than desired. **Impact:** + This imbalance can cause that TiKV node to trigger **flow control**, leading to a sharp drop in QPS, a spike in write latency, and increased CPU usage on the affected node, which in turn may impact the overall read and write performance of the cluster. @@ -421,13 +431,16 @@ This imbalance can cause that TiKV node to trigger **flow control**, leading to **1. NONCLUSTERED Partitioned Table** **Pros:** + - When a new partition is created in a **NONCLUSTERED Partitioned Table** configured with SHARD_ROW_ID_BITS and [PRE_SPLIT_REGIONS](https://docs.pingcap.com/tidb/stable/sql-statement-split-region/#pre_split_regions), the regions can be **automatically pre-split**, significantly reducing manual intervention. - Lower operational overhead. **Cons:** + - Queries using **Point Get** or **Table Range Scan** will require **more table lookups**, which can degrade read performance for such query types. **Recommendation:** + - Suitable for workloads where write scalability and operational ease are more critical than low-latency reads. **Best Practices** @@ -514,12 +527,15 @@ SHOW TABLE employees PARTITION (p4) regions; **2. CLUSTERED Partitioned Table** **Pros:** + - Queries using **Point Get** or **Table Range Scan** do **not** need additional lookups, resulting in better **read performance**. **Cons:** + - **Manual region splitting** is required when creating new partitions, increasing operational complexity. **Recommendation:** + - Ideal when low-latency point queries are important and operational resources are available to manage region splitting. **Best Practices** @@ -599,13 +615,16 @@ show table employees2 PARTITION (p4) regions; **3. CLUSTERED Non-partitioned Table** **Pros:** + - **No hotspot risk from new partitions**. - Provides **good read performance** for point and range queries. **Cons:** + - **Cannot use DROP PARTITION** to clean up large volumes of old data. **Recommendation:** + - Best suited for use cases that require stable performance and do not benefit from partition-based data management. 
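Because DROP PARTITION is not available for this option, old data has to be cleaned up with TTL or batched deletes instead. A minimal hedged sketch (the table name `order_history`, the `created_at` column, and the 90-day retention are illustrative assumptions):

```sql
-- Hypothetical example: let TTL remove rows older than 90 days on a
-- non-partitioned table, instead of relying on DROP PARTITION.
ALTER TABLE order_history
    TTL = `created_at` + INTERVAL 90 DAY
    TTL_ENABLE = 'ON'
    TTL_JOB_INTERVAL = '1h';
```

As the TTL comparison earlier in this guide shows, this approach scans and deletes rows in the background, so it is slower and more resource-intensive than dropping a partition.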
@@ -621,6 +640,7 @@ When working with large tables (for example in this example 120 million rows), t This section compares the efficiency and implications of these methods in both directions of conversion, and provides best practice recommendations. #### Table Schema: `fa` + ```sql CREATE TABLE `fa` ( `id` bigint NOT NULL AUTO_INCREMENT, @@ -644,6 +664,7 @@ PARTITION `fa_2024366` VALUES LESS THAN (2024366)); #### Table Schema: `fa_new` + ```sql CREATE TABLE `fa` ( `id` bigint NOT NULL AUTO_INCREMENT, @@ -659,7 +680,9 @@ CREATE TABLE `fa` ( ``` #### Description + These examples show converting a partitioned table to a non-partitioned table, but the same methods also work for converting a non-partitioned table to a partitioned table. + ### Method 1: Batch DML INSERT INTO ... SELECT ... ```sql From 831b91e0f72e993280db9b6be3b4696d5a6401de Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 16:43:58 +0800 Subject: [PATCH 37/55] Create tidb-partitioned-tables-guide.md --- tidb-partitioned-tables-guide.md | 753 +++++++++++++++++++++++++++++++ 1 file changed, 753 insertions(+) create mode 100644 tidb-partitioned-tables-guide.md diff --git a/tidb-partitioned-tables-guide.md b/tidb-partitioned-tables-guide.md new file mode 100644 index 0000000000000..2abe1d6f8a32f --- /dev/null +++ b/tidb-partitioned-tables-guide.md @@ -0,0 +1,753 @@ +# Best Practices for Using TiDB Partitioned Tables + +This guide introduces how to use partitioned tables in TiDB to improve performance, simplify data management, and handle large-scale datasets efficiently. + +## Introduction + +Partitioned tables in TiDB offer a versatile approach to managing large datasets, improving query efficiency, facilitating bulk data deletion, and alleviating write hotspot issues. By dividing data into logical segments, TiDB can leverage **partition pruning** to skip irrelevant data during query execution, reducing resource consumption and accelerating performance—particularly in OLAP workloads with massive datasets. + +A common use case is **range partitioning combined with local indexes**, which enables efficient historical data cleanup through operations like [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning—such as those lacking partition key filters—may experience degraded performance. In such cases, [**global indexes**](https://docs.pingcap.com/tidb/stable/partitioned-table/#global-indexes) can be introduced to mitigate the performance impact by providing a unified index structure across all partitions. + +Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [AUTO_INCREMENT-style IDs](/auto-increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. + +While partitioning offers clear benefits, it also presents **common challenges**, such as **hotspots caused by newly created range partitions**. 
To address this, TiDB provides techniques for automatic or manual region pre-splitting, ensuring balanced data distribution and avoiding bottlenecks. + +This document examines partitioned tables in TiDB from multiple angles, including query optimization, data cleanup, write scalability, and index management. Through detailed scenarios and best practices, it provides practical guidance on optimizing partitioned table design and performance tuning in TiDB. + +> **Note:** To get started with the fundamentals, refer to the [Partitioned Table User Guide](/partitioned-table.md), which explains key concepts such as partition pruning, index types, and partitioning methods. + +## Improving query efficiency + +### Partition Pruning + +**Partition pruning** is an optimization technique that allows TiDB to reduce the amount of data scanned when executing queries against partitioned tables. Instead of scanning all partitions, TiDB analyzes the query's filter conditions and determines which partitions may contain relevant data, scanning only those partitions. This significantly improves query performance by reducing I/O and computation overhead. + +#### Applicable Scenarios + +Partition pruning is most beneficial in scenarios where query predicates match the partitioning strategy. Common use cases include: + +- **Time-series data queries**: When data is partitioned by time ranges (for example, daily, monthly), queries restricted to a specific time period can quickly skip unrelated partitions. +- **Multi-tenant or category-based datasets**: Partitioning by tenant ID or category enables queries to focus on a small subset of partitions. +- **Hybrid Transactional/Analytical Processing (HTAP)**: Especially for range partitioning, TiDB can leverage partition pruning in analytical workloads on TiFlash to skip irrelevant partitions and scan only the necessary subset, preventing **full table scans** on large datasets. + +For more use cases, see [Partition Pruning](https://docs.pingcap.com/tidb/stable/partition-pruning/). + +### Query Performance on Secondary Index: Non-Partitioned Table vs. Local Index vs. Global Index + +In TiDB, local indexes are the default for partitioned tables. Each partition has its own set of indexes. A global index, on the other hand, covers the whole table in one index. This means it keeps track of all rows across all partitions. Global indexes can be faster for queries across multiple partitions because local indexes needs to do one lookup in each partition separately, while global index only needs one lookup for the whole table. + +#### What Did We Test + +We evaluated query performance across three table configurations in TiDB: + +- Non-Partitioned Table +- Partitioned Table with Global Index +- Partitioned Table with Local Index + +#### Test Setup + +- The query **accesses data via a secondary index** and uses IN conditions across multiple values. +- The **partitioned table** had **366 partitions**, defined by **range partitioning on a datetime column**. +- Each matching key could return **multiple rows**, simulating a **high-volume OLTP-style query pattern**. +- We also evaluated the **impact of different partition counts** to understand how partition granularity influences latency and index performance. 
+ +#### Schema + +```sql +CREATE TABLE `fa` ( + `id` bigint NOT NULL AUTO_INCREMENT, + `account_id` bigint(20) NOT NULL, + `sid` bigint(20) DEFAULT NULL, + `user_id` bigint NOT NULL, + `date` int NOT NULL, + PRIMARY KEY (`id`,`date`) /*T![clustered_index] CLUSTERED */, + KEY `index_fa_on_sid` (`sid`), + KEY `index_fa_on_account_id` (`account_id`), + KEY `index_fa_on_user_id` (`user_id`) +) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin +PARTITION BY RANGE (`date`) +(PARTITION `fa_2024001` VALUES LESS THAN (2024001), +PARTITION `fa_2024002` VALUES LESS THAN (2024002), +PARTITION `fa_2024003` VALUES LESS THAN (2024003), +... +... +PARTITION `fa_2024366` VALUES LESS THAN (2024366)); +``` + +#### SQL + +```sql +SELECT `fa`.* +FROM `fa` +WHERE `fa`.`sid` IN ( + 1696271179344, + 1696317134004, + 1696181972136, + ... + 1696159221765 +); +``` + +- Query filters on secondary index, but does **not include the partition key**. +- Causes **Local Index** to scan across all partitions due to lack of pruning. +- Table lookup tasks are significantly higher for partitioned tables. + +#### Findings + +Data came from a table with **366 range partitions** (for example, by date). +- The **Average Query Time** was obtained from the statement_summary view. +- The query used a **secondary index** and returned **400 rows**. + +Metrics collected: +- **Average Query Time**: from statement_summary +- **Cop Tasks** (Index Scan + Table Lookup): from execution plan + +#### Test Results + +| Configuration | Average Query Time | Cop task for index range scan | Cop task for table lookup | Total Cop tasks | Key Takeaways | +|---|---|---|---|---|---| +| Non-Partitioned Table | 12.6 ms | 72 | 79 | 151 | Delivering the best performance with the fewest Cop tasks — ideal for most OLTP use cases. | +| Partitioned Table with Local Index | 108 ms | 600 | 375 | 975 | When the partition key is not used in the query condition, local index queries will scan all partitions. | +| Partitioned Table with Global Index | 14.8 ms | 69 | 383 | 452 | Improving index scan efficiency, but table lookups can still be expensive if many rows match. 
| + +#### Execution Plan Examples + +**Non-partitioned table** + +```yaml +| id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | +|---------------------------|---------|-----------|---------|-----------|--------------------------------------|----------------|---------------|----------|------| +| IndexLookUp_7 | 398.73 | 787052.13 | 400 | root | | time:11.5ms, loops:2, index_task:{total_time:3.34ms, fetch_handle:3.34ms, build:600ns, wait:2.86µs}, table_task:{total_time:7.55ms, num:1, concurrency:5}, next:{wait_index:3.49ms, wait_table_lookup_build:492.5µs, wait_table_lookup_resp:7.05ms} | | 706.7 KB | N/A | +| IndexRangeScan_5(Build) | 398.73 | 90633.86 | 400 | cop[tikv] | table:fa, index:index_fa_on_sid(sid) | time:3.16ms, loops:3, cop_task:{num:72, max:780.4µs, min:394.2µs, avg:566.7µs, p95:748µs, max_proc_keys:20, p95_proc_keys:10, tot_proc:3.66ms, tot_wait:18.6ms, copr_cache_hit_ratio:0.00, build_task_duration:94µs, max_distsql_concurrency:15}, rpc_info:{Cop:{num_rpc:72, total_time:40.1ms}}, tikv_task:{proc max:1ms, min:0s, avg:27.8µs, p80:0s, p95:0s, iters:72, tasks:72}, scan_detail:{total_process_keys:400, total_process_keys_size:22800, total_keys:480, get_snapshot_time:17.7ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:160}}}, time_detail:{total_process_time:3.66ms, total_wait_time:18.6ms, total_kv_read_wall_time:2ms, tikv_wall_time:27.4ms} | range:[1696125963161,1696125963161], …, [1696317134004,1696317134004], keep order:false | N/A | N/A | +| TableRowIDScan_6(Probe) | 398.73 | 166072.78 | 400 | cop[tikv] | table:fa | time:7.01ms, loops:2, cop_task:{num:79, max:4.98ms, min:0s, avg:514.9µs, p95:3.75ms, max_proc_keys:10, p95_proc_keys:5, tot_proc:15ms, tot_wait:21.4ms, copr_cache_hit_ratio:0.00, build_task_duration:341.2µs, max_distsql_concurrency:1, max_extra_concurrency:7, store_batch_num:62}, rpc_info:{Cop:{num_rpc:17, total_time:40.5ms}}, tikv_task:{proc max:0s, min:0s, avg:0s, p80:0s, p95:0s, iters:79, tasks:79}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:20.8ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:1600}}}, time_detail:{total_process_time:15ms, total_wait_time:21.4ms, tikv_wall_time:10.9ms} | keep order:false | N/A | N/A | +``` + +**Partition table with global index** + +```yaml +| id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | +|------------------------|---------|-----------|---------|-----------|-------------------------------------------------|----------------|---------------|----------|------| +| IndexLookUp_8 | 398.73 | 786959.21 | 400 | root | partition:all | time:12.8ms, loops:2, index_task:{total_time:2.71ms, fetch_handle:2.71ms, build:528ns, wait:3.23µs}, table_task:{total_time:9.03ms, num:1, concurrency:5}, next:{wait_index:3.27ms, wait_table_lookup_build:1.49ms, wait_table_lookup_resp:7.53ms} | | 693.9 KB | N/A | +| IndexRangeScan_5(Build)| 398.73 | 102593.43 | 400 | cop[tikv] | table:fa, index:index_fa_on_sid_global(sid, id)| time:2.49ms, loops:3, cop_task:{num:69, max:997µs, min:213.8µs, avg:469.8µs, p95:986.6µs, max_proc_keys:15, p95_proc_keys:10, tot_proc:13.4ms, tot_wait:1.52ms, copr_cache_hit_ratio:0.00, build_task_duration:498.4µs, max_distsql_concurrency:15}, rpc_info:{Cop:{num_rpc:69, total_time:31.8ms}}, tikv_task:{proc max:1ms, min:0s, avg:101.4µs, p80:0s, p95:1ms, iters:69, tasks:69}, scan_detail:{total_process_keys:400, total_process_keys_size:31200, 
total_keys:480, get_snapshot_time:679.9µs, rocksdb:{key_skipped_count:400, block:{cache_hit_count:189, read_count:54, read_byte:347.7 KB, read_time:6.17ms}}}, time_detail:{total_process_time:13.4ms, total_wait_time:1.52ms, total_kv_read_wall_time:7ms, tikv_wall_time:19.3ms} | range:[1696125963161,1696125963161], …, keep order:false, stats:partial[...] | N/A | N/A | +| TableRowIDScan_6(Probe)| 398.73 | 165221.64 | 400 | cop[tikv] | table:fa | time:7.47ms, loops:2, cop_task:{num:383, max:4.07ms, min:0s, avg:488.5µs, p95:2.59ms, max_proc_keys:2, p95_proc_keys:1, tot_proc:203.3ms, tot_wait:429.5ms, copr_cache_hit_ratio:0.00, build_task_duration:1.3ms, max_distsql_concurrency:1, max_extra_concurrency:31, store_batch_num:305}, rpc_info:{Cop:{num_rpc:78, total_time:186.3ms}}, tikv_task:{proc max:3ms, min:0s, avg:517µs, p80:1ms, p95:1ms, iters:383, tasks:383}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:2.99ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:1601, read_count:799, read_byte:10.1 MB, read_time:131.6ms}}}, time_detail:{total_process_time:203.3ms, total_suspend_time:6.31ms, total_wait_time:429.5ms, total_kv_read_wall_time:198ms, tikv_wall_time:163ms} | keep order:false, stats:partial[...] | N/A | N/A | +``` + +**Partition table with local index** + +```yaml +| id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | +|------------------------|---------|-----------|---------|-----------|--------------------------------------|----------------|---------------|---------|-------| +| IndexLookUp_7 | 398.73 | 784450.63 | 400 | root | partition:all | time:290.8ms, loops:2, index_task:{total_time:103.6ms, fetch_handle:7.74ms, build:133.2µs, wait:95.7ms}, table_task:{total_time:551.1ms, num:217, concurrency:5}, next:{wait_index:179.6ms, wait_table_lookup_build:391µs, wait_table_lookup_resp:109.5ms} | | 4.30 MB | N/A | +| IndexRangeScan_5(Build)| 398.73 | 90633.73 | 400 | cop[tikv] | table:fa, index:index_fa_on_sid(sid) | time:10.8ms, loops:800, cop_task:{num:600, max:65.6ms, min:1.02ms, avg:22.2ms, p95:45.1ms, max_proc_keys:5, p95_proc_keys:3, tot_proc:6.81s, tot_wait:4.77s, copr_cache_hit_ratio:0.00, build_task_duration:172.8ms, max_distsql_concurrency:3}, rpc_info:{Cop:{num_rpc:600, total_time:13.3s}}, tikv_task:{proc max:54ms, min:0s, avg:13.9ms, p80:20ms, p95:30ms, iters:600, tasks:600}, scan_detail:{total_process_keys:400, total_process_keys_size:22800, total_keys:29680, get_snapshot_time:2.47s, rocksdb:{key_skipped_count:400, block:{cache_hit_count:117580, read_count:29437, read_byte:104.9 MB, read_time:3.24s}}}, time_detail:{total_process_time:6.81s, total_suspend_time:1.51s, total_wait_time:4.77s, total_kv_read_wall_time:8.31s, tikv_wall_time:13.2s}} | range:[1696125963161,...,1696317134004], keep order:false, stats:partial[...] 
| N/A | N/A | +| TableRowIDScan_6(Probe)| 398.73 | 165221.49 | 400 | cop[tikv] | table:fa | time:514ms, loops:434, cop_task:{num:375, max:31.6ms, min:0s, avg:1.33ms, p95:1.67ms, max_proc_keys:2, p95_proc_keys:2, tot_proc:220.7ms, tot_wait:242.2ms, copr_cache_hit_ratio:0.00, build_task_duration:27.8ms, max_distsql_concurrency:1, max_extra_concurrency:1, store_batch_num:69}, rpc_info:{Cop:{num_rpc:306, total_time:495.5ms}}, tikv_task:{proc max:6ms, min:0s, avg:597.3µs, p80:1ms, p95:1ms, iters:375, tasks:375}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:158.3ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:3197, read_count:803, read_byte:10.2 MB, read_time:113.5ms}}}, time_detail:{total_process_time:220.7ms, total_suspend_time:5.39ms, total_wait_time:242.2ms, total_kv_read_wall_time:224ms, tikv_wall_time:430.5ms}} | keep order:false, stats:partial[...] | N/A | N/A | +``` +[Similar detailed execution plans for partitioned tables with global and local indexes would follow...] + +#### How to Create a Global Index on a Partitioned Table in TiDB + +**Option 1: Add via ALTER TABLE** + +```sql +ALTER TABLE +ADD UNIQUE INDEX (col1, col2) GLOBAL; +``` + + +Adds a global index to an existing partitioned table. + +- The `GLOBAL` keyword must be explicitly specified. +- For non-unique global indexes, use `ADD INDEX` instead of `ADD UNIQUE INDEX`. + - Not supported in v8.5.x + - Available starting from v9.0.0-beta.1 + - Expected to be included in the next LTS release + +**Option 2: Define Inline on Table Creation** + +```sql +CREATE TABLE t ( + id BIGINT NOT NULL, + col1 VARCHAR(50), + col2 VARCHAR(50), + -- other columns... + + UNIQUE GLOBAL INDEX idx_col1_col2 (col1, col2) +) +PARTITION BY RANGE (id) ( + PARTITION p0 VALUES LESS THAN (10000), + PARTITION p1 VALUES LESS THAN (20000), + PARTITION pMax VALUES LESS THAN MAXVALUE +); +``` + +#### Summary + +The performance overhead of partitioned tables in TiDB depends significantly on the number of partitions and the type of index used. + +- The more partitions you have, the more severe the potential performance degradation. +- With a smaller number of partitions, the impact may not be as noticeable, but it's still workload-dependent. +- For local indexes, if a query does not include effective partition pruning conditions, the number of partitions directly correlates with the number of RPCs triggered. This means more partitions will likely result in more RPCs, leading to higher latency. +- For global indexes, the number of RPCs and the degree of performance regression depend on both the number of partitions involved and how many rows need to be retrieved (that is, the number of rows requiring table lookups). + +#### Recommendation + +- Avoid partitioned tables unless truly necessary. For most OLTP workloads, a well-indexed non-partitioned table performs better and is easier to manage. +- If you must use partitioned tables, benchmark both global index and local index strategies under your workload. +- Use global indexes when query performance across partitions is critical. +- Choose local indexes only if your main concern is DDL efficiency, such as fast DROP PARTITION, and the performance side effect from the partition table is acceptable. + +## Facilitating Bulk Data Deletion + +### Data Cleanup Efficiency: TTL vs. Direct Partition Drop + +In TiDB, historical data cleanup can be handled either by **TTL (Time-to-Live)** or **manual partition drop**. 
While both methods serve the same purpose, they differ significantly in performance. Our tests show that dropping partitions is generally faster and less resource-intensive, making it a better choice for large datasets and frequent purging needs.

#### What's the difference?

- **TTL**: Automatically removes data based on its age, but may be slower due to the need to scan and clean data over time.
- **Partition Drop**: Deletes an entire partition at once, making it much faster, especially when dealing with large datasets.

#### What Did We Test

To compare the performance of TTL and partition drop, we configured TTL to execute every 10 minutes and created a partitioned version of the same table, dropping one partition at the same interval for comparison. Both approaches were tested under background write loads of 50 and 100 concurrent threads. We measured key metrics such as execution time, system resource utilization, and the total number of rows deleted.

#### Findings

**TTL Performance:**
- On a write-heavy table, TTL runs every 10 minutes.
- With 50 threads, each TTL job took 8 to 10 minutes and deleted 7 to 11 million rows.
- With 100 threads, it handled up to 20 million rows, but execution time increased to 15 to 30 minutes, with greater variance.
- TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS.

**Partition Drop Performance:**

- DROP PARTITION removes an entire data segment instantly, with minimal resource usage.
- DROP PARTITION is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data.

#### How to Use TTL and Partition Drop in TiDB

In this experiment, the table structures have been anonymized. For more detailed information on the usage of TTL (Time To Live), please refer to the official documentation at https://docs.pingcap.com/tidb/stable/time-to-live/.

**TTL schema**

```sql
CREATE TABLE `ad_cache` (
  `session` varchar(255) NOT NULL,
  `ad_id` varbinary(255) NOT NULL,
  `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `suffix` bigint(20) NOT NULL,
  `expire_time` timestamp NULL DEFAULT NULL,
  `data` mediumblob DEFAULT NULL,
  `version` int(11) DEFAULT NULL,
  `is_delete` tinyint(1) DEFAULT NULL,
  PRIMARY KEY (`session`, `ad_id`, `create_time`, `suffix`)
)
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
TTL=`expire_time` + INTERVAL 0 DAY TTL_ENABLE='ON'
TTL_JOB_INTERVAL='10m';
```

**Drop Partition (Range INTERVAL partitioning)**

```sql
CREATE TABLE `ad_cache` (
  `session_id` varchar(255) NOT NULL,
  `external_id` varbinary(255) NOT NULL,
  `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `id_suffix` bigint(20) NOT NULL,
  `expire_time` timestamp NULL DEFAULT NULL,
  `cache_data` mediumblob DEFAULT NULL,
  `data_version` int(11) DEFAULT NULL,
  `is_deleted` tinyint(1) DEFAULT NULL,
  PRIMARY KEY (
    `session_id`, `external_id`,
    `create_time`, `id_suffix`
  ) NONCLUSTERED
)
SHARD_ROW_ID_BITS=7
PRE_SPLIT_REGIONS=2
PARTITION BY RANGE COLUMNS (create_time)
INTERVAL (10 MINUTE)
FIRST PARTITION LESS THAN ('2025-02-19 18:00:00')
...
LAST PARTITION LESS THAN ('2025-02-19 20:00:00');
```

You need to periodically run `ALTER TABLE ... FIRST PARTITION` and `ALTER TABLE ... LAST PARTITION` DDL statements to advance the partition boundaries. These two DDL statements drop the old partitions and create new ones.
+ +```sql +ALTER TABLE ad_cache FIRST PARTITION LESS THAN ("${nextTimestamp}"); +ALTER TABLE ad_cache LAST PARTITION LESS THAN ("${nextTimestamp}"); +``` + +#### Recommendation + +For workloads with **large or time-based data cleanup**, prefer using **partitioned tables with DROP PARTITION**. It offers better performance, lower system impact, and simpler management. TTL is still useful for finer-grained or background cleanup but may not be optimal under high write pressure or when deleting large volumes of data quickly. + +### Partition Drop Efficiency: Local Index vs Global Index + +Partition table with Global Index requires synchronous updates to the global index, potentially increasing significant execution time for DDL operations, such as DROP PARTITION, TRUNCATE PARTITION, or REORG PARTITION. In this section, the tests show that DROP PARTITION is much slower when using a **Global Index** compared to a **Local Index**. This should be considered when designing partitioned tables. + +#### What Did We Test + +We created a table with **366 partitions** and tested the DROP PARTITION performance using both **Global Index** and **Local Index**. The total number of rows was **1 billion**. + +| Index Type | Duration (drop partition) | +|---|---| +| Global Index | 1 min 16.02 s | +| Local Index | 0.52 s | + +#### Findings + +Dropping a partition on a table with a Global Index took **76 seconds**, while the same operation with a Local Index took only **0.52 seconds**. The reason is that Global Indexes span all partitions and require more complex updates, while Local Indexes are limited to individual partitions and are easier to handle. + +**Global Index** + +```sql +ALTER TABLE A DROP PARTITION A_2024363; +``` + +#### Recommendation + +When a partitioned table contains global indexes, performing certain DDL operations such as DROP PARTITION, TRUNCATE PARTITION, or REORG PARTITION requires synchronously updating the global index values. This can significantly increase the execution time of these DDL operations. + +If you need to drop partitions frequently and minimize the performance impact on the system, it's better to use **local indexes** for faster and more efficient operations. + +## Mitigating Write Hotspot Issues + +### Background + +In TiDB, **write hotspots** occur when incoming write traffic is unevenly distributed across Regions. + +This is common when the primary key is **monotonically increasing**—for example, an AUTO_INCREMENT primary key with AUTO_ID_CACHE=1, or secondary index on datetime column with default value set to CURRENT_TIMESTAMP—because new rows and index entries are always appended to the "rightmost" Region. Over time, this can lead to: + +- A single Region handling most of the write workload, while other Regions remain idle. +- Higher write latency and reduced throughput. +- Limited performance gains from scaling out TiKV nodes, as the bottleneck remains concentrated on one Region. + +**Partitioned tables** can help mitigate this problem. By applying **hash** or **key** partitioning on the primary key, TiDB can spread inserts across multiple partitions (and therefore multiple Regions), reducing hotspot contention. + +### How It Works + +TiDB stores table data in **Regions**, each covering a continuous range of row keys. + +When the primary key is AUTO_INCREMENT and the secondary indexes on datetime columns are monotonically increasing: + +**Without Partitioning:** + +- New rows always have the highest key values and are inserted into the same "last Region." 
+- That Region is served by one TiKV node at a time, becoming a single write bottleneck. + +**With Hash/Key Partitioning:** + +- The table and the secondary indexes are split into multiple partitions using a hash or key function on the primary key or indexed columns. +- Each partition has its own set of Regions, often distributed across different TiKV nodes. +- Inserts are spread across multiple Regions in parallel, improving load distribution and throughput. + +### Use Case + +If a table with an AUTO_INCREMENT primary key experiences heavy bulk inserts and suffers from write hotspot issues, applying **hash** or **key** partitioning on the primary key can help distribute the write load more evenly. + +```sql +CREATE TABLE server_info ( + id bigint NOT NULL AUTO_INCREMENT, + serial_no varchar(100) DEFAULT NULL, + device_name varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL, + device_type varchar(50) DEFAULT NULL, + modified_ts timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, + PRIMARY KEY (id) /*T![clustered_index] CLUSTERED */, + KEY idx_serial_no (serial_no), + KEY idx_modified_ts (modified_ts) +) /*T![auto_id_cache] AUTO_ID_CACHE=1 */ +PARTITION BY KEY (id) PARTITIONS 16; +``` + +### Pros + +- **Balanced Write Load** — Hotspots are spread across multiple partitions, reducing contention and improving insert performance. +- **Query Optimization via Partition Pruning** — If queries already filter by the partition key, TiDB can prune unused partitions, scanning less data and improving query speed. + +### Cons + +**Potential Query Performance Drop Without Partition Pruning** + +When converting a non-partitioned table to a partitioned table, TiDB creates a separate Region for each partition. This may significantly increase the total Region count. Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. Example: + +```sql +SELECT * FROM server_info WHERE `serial_no` = ?; +``` + +**Mitigation**: Add a **global index** on the filtering columns used by these queries to reduce scanning overhead. While creating a global index can significantly slow down DROP PARTITION operations, **hash and key partitioned tables do not support DROP PARTITION**. In practice, such partitions are rarely removed, making global indexes a feasible solution in these scenarios. Example: + +```sql +ALTER TABLE server_info ADD UNIQUE INDEX(serial_no, id) GLOBAL; +``` + +## Partition Management Challenge + +### How to Avoid Hotspots Caused by New Range Partitions + +#### Overview + +New range partitions in a partitioned table can easily lead to hotspot issues in TiDB. This section outlines common scenarios and mitigation strategies to avoid read and write hotspots caused by range partitions. + +#### Common Hotspot Scenarios + +**Read Hotspot** + +When using **range-partitioned tables**, if queries do **not** filter data using the partition key, new empty partitions can easily become read hotspots. + +**Root Cause:** + +By default, TiDB creates an empty region for each partition when the table is created. If no data is written for a while, multiple empty partitions' regions may be merged into a **single region**. + +**Impact:** + +When a query does **not filter by partition key**, TiDB will **scan all partitions** (as seen in the execution plan partition:all). 
As a result, the single region holding multiple empty partitions will be scanned repeatedly, leading to a **read hotspot**. + +**Write Hotspot** + +When using a time-based field as the partition key, a write hotspot may occur when switching to a new partition: + +**Root Cause:** +In TiDB, any newly created table or partition initially contains only **one region** (data block), which is randomly placed on a single TiKV node. As data begins to be written, this region will eventually **split** into multiple regions, and PD will schedule these new regions to other TiKV nodes. + +However, if the initial write traffic to this new partition is **very high**, the TiKV node hosting that single initial region will be under heavy write pressure. In such cases, it may not have enough spare resources (I/O capacity, CPU cycles) to handle both the application writes and the scheduling of newly split regions to other TiKV nodes. This can delay region distribution, keeping most writes concentrated on the same node for longer than desired. + +**Impact:** + +This imbalance can cause that TiKV node to trigger **flow control**, leading to a sharp drop in QPS, a spike in write latency, and increased CPU usage on the affected node, which in turn may impact the overall read and write performance of the cluster. + + +### Summary Table + +| Approach | Read Hotspot Risk | Write Hotspot Risk | Operational Complexity | Query Performance | Data Cleanup | +|---|---|---|---|---|---| +| NONCLUSTERED Partitioned | Low (with merge_option=deny) | Low (auto pre-split) | Low | Moderate (extra lookups) | Fast (DROP PARTITION) | +| CLUSTERED Partitioned | Medium (manual intervention) | Medium (manual split) | High | High (direct access) | Fast (DROP PARTITION) | +| CLUSTERED Non-partitioned | None | Medium (single table) | Low | High | Slow (DELETE/TTL) | + +#### Solutions + +**1. NONCLUSTERED Partitioned Table** + +**Pros:** + +- When a new partition is created in a **NONCLUSTERED Partitioned Table** configured with SHARD_ROW_ID_BITS and [PRE_SPLIT_REGIONS](https://docs.pingcap.com/tidb/stable/sql-statement-split-region/#pre_split_regions), the regions can be **automatically pre-split**, significantly reducing manual intervention. +- Lower operational overhead. + +**Cons:** + +- Queries using **Point Get** or **Table Range Scan** will require **more table lookups**, which can degrade read performance for such query types. + +**Recommendation:** + +- Suitable for workloads where write scalability and operational ease are more critical than low-latency reads. + +**Best Practices** + +Create a partitioned table with SHARD_ROW_ID_BITS and PRE_SPLIT_REGIONS to pre-split table regions. The value of PRE_SPLIT_REGIONS must be less than or equal to that of SHARD_ROW_ID_BITS. The number of pre-split Regions for each partition is 2^(PRE_SPLIT_REGIONS). 
+ +```sql +CREATE TABLE employees ( + id INT NOT NULL, + fname VARCHAR(30), + lname VARCHAR(30), + hired DATE NOT NULL DEFAULT '1970-01-01', + separated DATE DEFAULT '9999-12-31', + job_code INT, + store_id INT, + PRIMARY KEY (`id`,`hired`) NONCLUSTERED, + KEY `idx_employees_on_store_id` (`store_id`) +)SHARD_ROW_ID_BITS = 2 PRE_SPLIT_REGIONS=2 +PARTITION BY RANGE ( YEAR(hired) ) ( + PARTITION p0 VALUES LESS THAN (1991), + PARTITION p1 VALUES LESS THAN (1996), + PARTITION p2 VALUES LESS THAN (2001), + PARTITION p3 VALUES LESS THAN (2006) +); +``` + +Adding the [merge_option=deny](https://docs.pingcap.com/tidb/stable/table-attributes/#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. + +```sql +-- table +ALTER TABLE employees ATTRIBUTES 'merge_option=deny'; +-- partition +ALTER TABLE employees PARTITION `p3` ATTRIBUTES 'merge_option=deny'; +``` + +**Determining split boundaries based on existing business data** + +To avoid hotspots when a new table or partition is created, it is often beneficial to **pre-split** regions before heavy writes begin. To make pre-splitting effective, configure the **lower and upper boundaries** for region splitting based on the **actual business data distribution**. Avoid setting excessively wide boundaries, as this can result in real data not being effectively distributed across TiKV nodes, defeating the purpose of pre-splitting. + +**Identify the minimum and maximum values** from existing production data so that incoming writes are more likely to target different pre-allocated regions. Example query for existing data: + +```sql +SELECT MIN(id), MAX(id) FROM employees; +``` + +- If the table is **new** and has no historical data, estimate the min/max values based on your business logic and expected data range. +- For **composite primary keys** or **composite indexes**, only the **leftmost column** needs to be considered when deciding split boundaries. +- If the leftmost column is a **string**, take string length and distribution into account to ensure even data spread. + +**Pre-split and scatter regions** + +A common practice is to split the number of regions to **match** the number of TiKV nodes, or to be **twice** the number of TiKV nodes. This helps ensure that data is more evenly distributed across the cluster from the start. + +**Splitting regions for the primary key of all partitions** + +To split regions for the primary key of all partitions in a partitioned table, you can use a command like: + +```sql +SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "1970-01-01") AND (100000, "9999-12-31") REGIONS ; +``` + +This example will split each partition's primary key range into `` regions between the specified boundary values. 
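For instance, a hedged, concrete variant of the statement above (assuming a 3-node TiKV cluster and the "twice the number of TiKV nodes" rule of thumb mentioned earlier, so 6 Regions per partition):

```sql
-- Illustrative only: pre-split the primary-key range of every partition into 6 Regions.
SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "1970-01-01") AND (100000, "9999-12-31") REGIONS 6;
```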
+ +**Splitting Regions for the secondary index of all partitions.** + +```sql +SPLIT PARTITION TABLE employees INDEX `idx_employees_on_store_id` BETWEEN (1) AND (1000) REGIONS ; +``` + +**(Optional) When adding a new partition, you MUST manually split regions for its primary key and indices.** + +```sql +ALTER TABLE employees ADD PARTITION (PARTITION p4 VALUES LESS THAN (2011)); + +SHOW TABLE employees PARTITION (p4) regions; + +SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "2006-01-01") AND (100000, "2011-01-01") REGIONS ; + +SPLIT PARTITION TABLE employees PARTITION (p4) INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; + +SHOW TABLE employees PARTITION (p4) regions; +``` + +**2. CLUSTERED Partitioned Table** + +**Pros:** + +- Queries using **Point Get** or **Table Range Scan** do **not** need additional lookups, resulting in better **read performance**. + +**Cons:** + +- **Manual region splitting** is required when creating new partitions, increasing operational complexity. + +**Recommendation:** + +- Ideal when low-latency point queries are important and operational resources are available to manage region splitting. + +**Best Practices** + +Create a CLUSTERED partitioned table. + +```sql +CREATE TABLE employees2 ( + id INT NOT NULL, + fname VARCHAR(30), + lname VARCHAR(30), + hired DATE NOT NULL DEFAULT '1970-01-01', + separated DATE DEFAULT '9999-12-31', + job_code INT, + store_id INT, + PRIMARY KEY (`id`,`hired`) CLUSTERED, + KEY `idx_employees2_on_store_id` (`store_id`) +) +PARTITION BY RANGE ( YEAR(hired) ) ( + PARTITION p0 VALUES LESS THAN (1991), + PARTITION p1 VALUES LESS THAN (1996), + PARTITION p2 VALUES LESS THAN (2001), + PARTITION p3 VALUES LESS THAN (2006) +); +``` + +Adding the [merge_option=deny](https://docs.pingcap.com/tidb/stable/table-attributes/#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. + +```sql +ALTER TABLE employees2 ATTRIBUTES 'merge_option=deny'; +``` + +**Determining split boundaries based on existing business data** + +To avoid hotspots when a new table or partition is created, it is often beneficial to **pre-split** regions before heavy writes begin. To make pre-splitting effective, configure the **lower and upper boundaries** for region splitting based on the **actual business data distribution**. Avoid setting excessively wide boundaries, as this can result in real data not being effectively distributed across TiKV nodes, defeating the purpose of pre-splitting. + +**Identify the minimum and maximum values** from existing production data so that incoming writes are more likely to target different pre-allocated regions. Example query for existing data: + +```sql +SELECT MIN(id), MAX(id) FROM employees2; +``` + +- If the table is **new** and has no historical data, estimate the min/max values based on your business logic and expected data range. +- For **composite primary keys** or **composite indexes**, only the **leftmost column** needs to be considered when deciding split boundaries. +- If the leftmost column is a **string**, take string length and distribution into account to ensure even data spread. + +**Pre-split and scatter regions** + +A common practice is to split the number of regions to **match** the number of TiKV nodes, or to be **twice** the number of TiKV nodes. 
This helps ensure that data is more evenly distributed across the cluster from the start.

**Splitting regions for all partitions**

```sql
SPLIT PARTITION TABLE employees2 BETWEEN (1,"1970-01-01") AND (100000,"9999-12-31") REGIONS ;
```

**Splitting regions for the secondary index of all partitions.**

```sql
SPLIT PARTITION TABLE employees2 INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ;
```

**When adding a new partition, you must manually split regions for the specific partition and its indices.**

```sql
ALTER TABLE employees2 ADD PARTITION (PARTITION p4 VALUES LESS THAN (2011));

SHOW TABLE employees2 PARTITION (p4) regions;

SPLIT PARTITION TABLE employees2 PARTITION (p4) BETWEEN (1,"2006-01-01") AND (100000,"2011-01-01") REGIONS ;

SPLIT PARTITION TABLE employees2 PARTITION (p4) INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ;

SHOW TABLE employees2 PARTITION (p4) regions;
```

**3. CLUSTERED Non-partitioned Table**

**Pros:**

- **No hotspot risk from new partitions**.
- Provides **good read performance** for point and range queries.

**Cons:**

- **Cannot use DROP PARTITION** to clean up large volumes of old data.

**Recommendation:**

- Best suited for use cases that require stable performance and do not benefit from partition-based data management.


## Converting Between Partitioned and Non-Partitioned Tables

When working with large tables (in this example, 120 million rows), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. TiDB supports several main approaches for such transformations:

1. Batch DML: `INSERT INTO ... SELECT ...`
2. Pipeline DML: `INSERT INTO ... SELECT ...`
3. `IMPORT INTO`: `IMPORT INTO ... FROM SELECT ...`
4. Online DDL: Direct schema transformation via `ALTER TABLE`

This section compares the efficiency and implications of these methods in both directions of conversion, and provides best practice recommendations.

#### Table Schema: `fa`

```sql
CREATE TABLE `fa` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `account_id` bigint(20) NOT NULL,
  `sid` bigint(20) DEFAULT NULL,
  `user_id` bigint NOT NULL,
  `date` int NOT NULL,
  PRIMARY KEY (`id`,`date`) /*T![clustered_index] CLUSTERED */,
  KEY `index_fa_on_sid` (`sid`),
  KEY `index_fa_on_account_id` (`account_id`),
  KEY `index_fa_on_user_id` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
PARTITION BY RANGE (`date`)
(PARTITION `fa_2024001` VALUES LESS THAN (2024001),
PARTITION `fa_2024002` VALUES LESS THAN (2024002),
PARTITION `fa_2024003` VALUES LESS THAN (2024003),
...
...
PARTITION `fa_2024366` VALUES LESS THAN (2024366));
```

#### Table Schema: `fa_new`

```sql
CREATE TABLE `fa_new` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `account_id` bigint(20) NOT NULL,
  `sid` bigint(20) DEFAULT NULL,
  `user_id` bigint NOT NULL,
  `date` int NOT NULL,
  PRIMARY KEY (`id`,`date`) /*T![clustered_index] CLUSTERED */,
  KEY `index_fa_on_sid` (`sid`),
  KEY `index_fa_on_account_id` (`account_id`),
  KEY `index_fa_on_user_id` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
```

#### Description

These examples show converting a partitioned table to a non-partitioned table, but the same methods also work for converting a non-partitioned table to a partitioned table.

### Method 1: Batch DML INSERT INTO ... SELECT ...
+ +```sql +SET tidb_mem_quota_query = 0; +INSERT INTO fa_new SELECT * FROM fa; +-- 120 million rows copied in 1h 52m 47s +``` + + +### Method 2: Pipeline DML INSERT INTO ... SELECT ... + +```sql +SET tidb_dml_type = "bulk"; +SET tidb_mem_quota_query = 0; +SET tidb_enable_mutation_checker = OFF; +INSERT INTO fa_new SELECT * FROM fa; +-- 120 million rows copied in 58m 42s +``` + +### Method 3: IMPORT INTO ... FROM SELECT ... + +```sql +mysql> import into fa_new from select * from fa with thread=32,disable_precheck; +Query OK, 120000000 rows affected, 1 warning (16 min 49.90 sec) +Records: 120000000, ID: c1d04eec-fb49-49bb-af92-bf3d6e2d3d87 +``` + +### Method 4: Online DDL + +**From partition table to non-partitioned table** + +```sql +SET @@global.tidb_ddl_reorg_worker_cnt = 16; +SET @@global.tidb_ddl_reorg_batch_size = 4096; +alter table fa REMOVE PARTITIONING; +-- real 170m12.024 s (≈ 2 h 50 m) +``` + +**From non-partition table to partitioned table** + +```sql +SET @@global.tidb_ddl_reorg_worker_cnt = 16; +SET @@global.tidb_ddl_reorg_batch_size = 4096; +ALTER TABLE fa PARTITION BY RANGE (`date`) +(PARTITION `fa_2024001` VALUES LESS THAN (2024001), +PARTITION `fa_2024002` VALUES LESS THAN (2024002), +... +PARTITION `fa_2024365` VALUES LESS THAN (2024365), +PARTITION `fa_2024366` VALUES LESS THAN (2024366)); + +Query OK, 0 rows affected, 1 warning (2 hours 31 min 57.05 sec) +``` + +### Findings + +| Method | Time Taken | +|---|---| +| Method 1: Batch DML INSERT INTO ... SELECT | 1 h 52 m 47 s | +| Method 2: Pipeline DML: INSERT INTO ... SELECT ... | 58 m 42 s | +| Method 3: IMPORT INTO ... FROM SELECT ... | 16 m 59 s | +| Method 4: Online DDL (From partition table to non-partitioned table) | 2 h 50 m | +| Method 4: Online DDL (From non-partition table to partitioned table) | 2 h 31 m | + +### Recommendation + +TiDB offers two approaches for converting tables between partitioned and non-partitioned states: + +Choose an offline method like [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md) when your system can accommodate a maintenance window, as it delivers much better performance. Use online DDL only when zero downtime is a strict requirement. \ No newline at end of file From 0dc299d9d9ca445d62390cbeb6c7f342c8711fac Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 17:10:41 +0800 Subject: [PATCH 38/55] Update tidb-partitioned-tables-guide.md --- tidb-partitioned-tables-guide.md | 1 - 1 file changed, 1 deletion(-) diff --git a/tidb-partitioned-tables-guide.md b/tidb-partitioned-tables-guide.md index 2abe1d6f8a32f..449fb120ef75e 100644 --- a/tidb-partitioned-tables-guide.md +++ b/tidb-partitioned-tables-guide.md @@ -152,7 +152,6 @@ ALTER TABLE ADD UNIQUE INDEX (col1, col2) GLOBAL; ``` - Adds a global index to an existing partitioned table. - The `GLOBAL` keyword must be explicitly specified. 
From 017a1adf7ee4d45894628481145736b160d9dc43 Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 17:22:10 +0800 Subject: [PATCH 39/55] Update tidb-partitioned-tables-guide.md --- tidb-partitioned-tables-guide.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/tidb-partitioned-tables-guide.md b/tidb-partitioned-tables-guide.md index 449fb120ef75e..6ea473e0d9490 100644 --- a/tidb-partitioned-tables-guide.md +++ b/tidb-partitioned-tables-guide.md @@ -114,7 +114,7 @@ Metrics collected: **Non-partitioned table** -```yaml +``` | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | |---------------------------|---------|-----------|---------|-----------|--------------------------------------|----------------|---------------|----------|------| | IndexLookUp_7 | 398.73 | 787052.13 | 400 | root | | time:11.5ms, loops:2, index_task:{total_time:3.34ms, fetch_handle:3.34ms, build:600ns, wait:2.86µs}, table_task:{total_time:7.55ms, num:1, concurrency:5}, next:{wait_index:3.49ms, wait_table_lookup_build:492.5µs, wait_table_lookup_resp:7.05ms} | | 706.7 KB | N/A | @@ -124,7 +124,7 @@ Metrics collected: **Partition table with global index** -```yaml +``` | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | |------------------------|---------|-----------|---------|-----------|-------------------------------------------------|----------------|---------------|----------|------| | IndexLookUp_8 | 398.73 | 786959.21 | 400 | root | partition:all | time:12.8ms, loops:2, index_task:{total_time:2.71ms, fetch_handle:2.71ms, build:528ns, wait:3.23µs}, table_task:{total_time:9.03ms, num:1, concurrency:5}, next:{wait_index:3.27ms, wait_table_lookup_build:1.49ms, wait_table_lookup_resp:7.53ms} | | 693.9 KB | N/A | @@ -134,7 +134,7 @@ Metrics collected: **Partition table with local index** -```yaml +``` | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | |------------------------|---------|-----------|---------|-----------|--------------------------------------|----------------|---------------|---------|-------| | IndexLookUp_7 | 398.73 | 784450.63 | 400 | root | partition:all | time:290.8ms, loops:2, index_task:{total_time:103.6ms, fetch_handle:7.74ms, build:133.2µs, wait:95.7ms}, table_task:{total_time:551.1ms, num:217, concurrency:5}, next:{wait_index:179.6ms, wait_table_lookup_build:391µs, wait_table_lookup_resp:109.5ms} | | 4.30 MB | N/A | @@ -152,6 +152,7 @@ ALTER TABLE ADD UNIQUE INDEX (col1, col2) GLOBAL; ``` + Adds a global index to an existing partitioned table. - The `GLOBAL` keyword must be explicitly specified. From 3517f7fb57d3fbe29d6ac9bb07bb9e7b418946f3 Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 17:34:06 +0800 Subject: [PATCH 40/55] Update tidb-partitioned-tables-guide.md --- tidb-partitioned-tables-guide.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/tidb-partitioned-tables-guide.md b/tidb-partitioned-tables-guide.md index 6ea473e0d9490..07785d723b8e8 100644 --- a/tidb-partitioned-tables-guide.md +++ b/tidb-partitioned-tables-guide.md @@ -95,11 +95,13 @@ WHERE `fa`.`sid` IN ( #### Findings Data came from a table with **366 range partitions** (for example, by date). -- The **Average Query Time** was obtained from the statement_summary view. + +- The **Average Query Time** was obtained from the `statement_summary` view. 
- The query used a **secondary index** and returned **400 rows**. Metrics collected: -- **Average Query Time**: from statement_summary + +- **Average Query Time**: from `statement_summary` - **Cop Tasks** (Index Scan + Table Lookup): from execution plan #### Test Results From 5013221dbfd85cd3553b5dd8163cc56fbbb15e3d Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 18:26:31 +0800 Subject: [PATCH 41/55] rename the file --- tidb-partitioned-tables-guide.md | 16 +- tidb_partitioned_tables_guide.md | 753 ------------------------------- 2 files changed, 8 insertions(+), 761 deletions(-) delete mode 100644 tidb_partitioned_tables_guide.md diff --git a/tidb-partitioned-tables-guide.md b/tidb-partitioned-tables-guide.md index 07785d723b8e8..e9fda6b2982a6 100644 --- a/tidb-partitioned-tables-guide.md +++ b/tidb-partitioned-tables-guide.md @@ -143,6 +143,7 @@ Metrics collected: | IndexRangeScan_5(Build)| 398.73 | 90633.73 | 400 | cop[tikv] | table:fa, index:index_fa_on_sid(sid) | time:10.8ms, loops:800, cop_task:{num:600, max:65.6ms, min:1.02ms, avg:22.2ms, p95:45.1ms, max_proc_keys:5, p95_proc_keys:3, tot_proc:6.81s, tot_wait:4.77s, copr_cache_hit_ratio:0.00, build_task_duration:172.8ms, max_distsql_concurrency:3}, rpc_info:{Cop:{num_rpc:600, total_time:13.3s}}, tikv_task:{proc max:54ms, min:0s, avg:13.9ms, p80:20ms, p95:30ms, iters:600, tasks:600}, scan_detail:{total_process_keys:400, total_process_keys_size:22800, total_keys:29680, get_snapshot_time:2.47s, rocksdb:{key_skipped_count:400, block:{cache_hit_count:117580, read_count:29437, read_byte:104.9 MB, read_time:3.24s}}}, time_detail:{total_process_time:6.81s, total_suspend_time:1.51s, total_wait_time:4.77s, total_kv_read_wall_time:8.31s, tikv_wall_time:13.2s}} | range:[1696125963161,...,1696317134004], keep order:false, stats:partial[...] | N/A | N/A | | TableRowIDScan_6(Probe)| 398.73 | 165221.49 | 400 | cop[tikv] | table:fa | time:514ms, loops:434, cop_task:{num:375, max:31.6ms, min:0s, avg:1.33ms, p95:1.67ms, max_proc_keys:2, p95_proc_keys:2, tot_proc:220.7ms, tot_wait:242.2ms, copr_cache_hit_ratio:0.00, build_task_duration:27.8ms, max_distsql_concurrency:1, max_extra_concurrency:1, store_batch_num:69}, rpc_info:{Cop:{num_rpc:306, total_time:495.5ms}}, tikv_task:{proc max:6ms, min:0s, avg:597.3µs, p80:1ms, p95:1ms, iters:375, tasks:375}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:158.3ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:3197, read_count:803, read_byte:10.2 MB, read_time:113.5ms}}}, time_detail:{total_process_time:220.7ms, total_suspend_time:5.39ms, total_wait_time:242.2ms, total_kv_read_wall_time:224ms, tikv_wall_time:430.5ms}} | keep order:false, stats:partial[...] | N/A | N/A | ``` + [Similar detailed execution plans for partitioned tables with global and local indexes would follow...] #### How to Create a Global Index on a Partitioned Table in TiDB @@ -154,14 +155,13 @@ ALTER TABLE ADD UNIQUE INDEX (col1, col2) GLOBAL; ``` - Adds a global index to an existing partitioned table. - The `GLOBAL` keyword must be explicitly specified. - For non-unique global indexes, use `ADD INDEX` instead of `ADD UNIQUE INDEX`. 
- - Not supported in v8.5.x - - Available starting from v9.0.0-beta.1 - - Expected to be included in the next LTS release + - Not supported in v8.5.x + - Available starting from v9.0.0-beta.1 + - Expected to be included in the next LTS release **Option 2: Define Inline on Table Creation** @@ -215,10 +215,10 @@ To compare the performance of TTL and partition drop, we configured TTL to execu #### Findings **TTL Performance:** -- On a write-heavy table, TTL runs every 10 minutes. -- With 50 threads, each TTL job took 8—10 minutes, deleted 7—11 million rows. -- With 100 threads, it handled up to 20 million rows, but execution time increased to 15—30 minutes, with greater variance. -- TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. + - On a write-heavy table, TTL runs every 10 minutes. + - With 50 threads, each TTL job took 8—10 minutes, deleted 7—11 million rows. + - With 100 threads, it handled up to 20 million rows, but execution time increased to 15—30 minutes, with greater variance. + - TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. **Partition Drop Performance:** diff --git a/tidb_partitioned_tables_guide.md b/tidb_partitioned_tables_guide.md deleted file mode 100644 index 0792c1d70684b..0000000000000 --- a/tidb_partitioned_tables_guide.md +++ /dev/null @@ -1,753 +0,0 @@ -# Best Practices for Using TiDB Partitioned Tables - -This guide introduces how to use partitioned tables in TiDB to improve performance, simplify data management, and handle large-scale datasets efficiently. - -## Introduction - -Partitioned tables in TiDB offer a versatile approach to managing large datasets, improving query efficiency, facilitating bulk data deletion, and alleviating write hotspot issues. By dividing data into logical segments, TiDB can leverage **partition pruning** to skip irrelevant data during query execution, reducing resource consumption and accelerating performance—particularly in OLAP workloads with massive datasets. - -A common use case is **range partitioning combined with local indexes**, which enables efficient historical data cleanup through operations like [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning—such as those lacking partition key filters—may experience degraded performance. In such cases, [**global indexes**](https://docs.pingcap.com/tidb/stable/partitioned-table/#global-indexes) can be introduced to mitigate the performance impact by providing a unified index structure across all partitions. - -Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [AUTO_INCREMENT-style IDs](/auto-increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. - -While partitioning offers clear benefits, it also presents **common challenges**, such as **hotspots caused by newly created range partitions**. 
To address this, TiDB provides techniques for automatic or manual region pre-splitting, ensuring balanced data distribution and avoiding bottlenecks. - -This document examines partitioned tables in TiDB from multiple angles, including query optimization, data cleanup, write scalability, and index management. Through detailed scenarios and best practices, it provides practical guidance on optimizing partitioned table design and performance tuning in TiDB. - -> **Note:** To get started with the fundamentals, refer to the [Partitioned Table User Guide](/partitioned-table.md), which explains key concepts such as partition pruning, index types, and partitioning methods. - -## Improving query efficiency - -### Partition Pruning - -**Partition pruning** is an optimization technique that allows TiDB to reduce the amount of data scanned when executing queries against partitioned tables. Instead of scanning all partitions, TiDB analyzes the query's filter conditions and determines which partitions may contain relevant data, scanning only those partitions. This significantly improves query performance by reducing I/O and computation overhead. - -#### Applicable Scenarios - -Partition pruning is most beneficial in scenarios where query predicates match the partitioning strategy. Common use cases include: - -- **Time-series data queries**: When data is partitioned by time ranges (for example, daily, monthly), queries restricted to a specific time period can quickly skip unrelated partitions. -- **Multi-tenant or category-based datasets**: Partitioning by tenant ID or category enables queries to focus on a small subset of partitions. -- **Hybrid Transactional/Analytical Processing (HTAP)**: Especially for range partitioning, TiDB can leverage partition pruning in analytical workloads on TiFlash to skip irrelevant partitions and scan only the necessary subset, preventing **full table scans** on large datasets. - -For more use cases, see [Partition Pruning](https://docs.pingcap.com/tidb/stable/partition-pruning/). - -### Query Performance on Secondary Index: Non-Partitioned Table vs. Local Index vs. Global Index - -In TiDB, local indexes are the default for partitioned tables. Each partition has its own set of indexes. A global index, on the other hand, covers the whole table in one index. This means it keeps track of all rows across all partitions. Global indexes can be faster for queries across multiple partitions because local indexes needs to do one lookup in each partition separately, while global index only needs one lookup for the whole table. - -#### What Did We Test - -We evaluated query performance across three table configurations in TiDB: - -- Non-Partitioned Table -- Partitioned Table with Global Index -- Partitioned Table with Local Index - -#### Test Setup - -- The query **accesses data via a secondary index** and uses IN conditions across multiple values. -- The **partitioned table** had **366 partitions**, defined by **range partitioning on a datetime column**. -- Each matching key could return **multiple rows**, simulating a **high-volume OLTP-style query pattern**. -- We also evaluated the **impact of different partition counts** to understand how partition granularity influences latency and index performance. 
- -#### Schema - -```sql -CREATE TABLE `fa` ( - `id` bigint NOT NULL AUTO_INCREMENT, - `account_id` bigint(20) NOT NULL, - `sid` bigint(20) DEFAULT NULL, - `user_id` bigint NOT NULL, - `date` int NOT NULL, - PRIMARY KEY (`id`,`date`) /*T![clustered_index] CLUSTERED */, - KEY `index_fa_on_sid` (`sid`), - KEY `index_fa_on_account_id` (`account_id`), - KEY `index_fa_on_user_id` (`user_id`) -) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin -PARTITION BY RANGE (`date`) -(PARTITION `fa_2024001` VALUES LESS THAN (2024001), -PARTITION `fa_2024002` VALUES LESS THAN (2024002), -PARTITION `fa_2024003` VALUES LESS THAN (2024003), -... -... -PARTITION `fa_2024366` VALUES LESS THAN (2024366)); -``` - -#### SQL - -```sql -SELECT `fa`.* -FROM `fa` -WHERE `fa`.`sid` IN ( - 1696271179344, - 1696317134004, - 1696181972136, - ... - 1696159221765 -); -``` - -- Query filters on secondary index, but does **not include the partition key**. -- Causes **Local Index** to scan across all partitions due to lack of pruning. -- Table lookup tasks are significantly higher for partitioned tables. - -#### Findings - -Data came from a table with **366 range partitions** (for example, by date). -- The **Average Query Time** was obtained from the statement_summary view. -- The query used a **secondary index** and returned **400 rows**. - -Metrics collected: -- **Average Query Time**: from statement_summary -- **Cop Tasks** (Index Scan + Table Lookup): from execution plan - -#### Test Results - -| Configuration | Average Query Time | Cop task for index range scan | Cop task for table lookup | Total Cop tasks | Key Takeaways | -|---|---|---|---|---|---| -| Non-Partitioned Table | 12.6 ms | 72 | 79 | 151 | Delivering the best performance with the fewest Cop tasks — ideal for most OLTP use cases. | -| Partitioned Table with Local Index | 108 ms | 600 | 375 | 975 | When the partition key is not used in the query condition, local index queries will scan all partitions. | -| Partitioned Table with Global Index | 14.8 ms | 69 | 383 | 452 | Improving index scan efficiency, but table lookups can still be expensive if many rows match. 
| - -#### Execution Plan Examples - -**Non-partitioned table** - -```yaml -| id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | -|---------------------------|---------|-----------|---------|-----------|--------------------------------------|----------------|---------------|----------|------| -| IndexLookUp_7 | 398.73 | 787052.13 | 400 | root | | time:11.5ms, loops:2, index_task:{total_time:3.34ms, fetch_handle:3.34ms, build:600ns, wait:2.86µs}, table_task:{total_time:7.55ms, num:1, concurrency:5}, next:{wait_index:3.49ms, wait_table_lookup_build:492.5µs, wait_table_lookup_resp:7.05ms} | | 706.7 KB | N/A | -| IndexRangeScan_5(Build) | 398.73 | 90633.86 | 400 | cop[tikv] | table:fa, index:index_fa_on_sid(sid) | time:3.16ms, loops:3, cop_task:{num:72, max:780.4µs, min:394.2µs, avg:566.7µs, p95:748µs, max_proc_keys:20, p95_proc_keys:10, tot_proc:3.66ms, tot_wait:18.6ms, copr_cache_hit_ratio:0.00, build_task_duration:94µs, max_distsql_concurrency:15}, rpc_info:{Cop:{num_rpc:72, total_time:40.1ms}}, tikv_task:{proc max:1ms, min:0s, avg:27.8µs, p80:0s, p95:0s, iters:72, tasks:72}, scan_detail:{total_process_keys:400, total_process_keys_size:22800, total_keys:480, get_snapshot_time:17.7ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:160}}}, time_detail:{total_process_time:3.66ms, total_wait_time:18.6ms, total_kv_read_wall_time:2ms, tikv_wall_time:27.4ms} | range:[1696125963161,1696125963161], …, [1696317134004,1696317134004], keep order:false | N/A | N/A | -| TableRowIDScan_6(Probe) | 398.73 | 166072.78 | 400 | cop[tikv] | table:fa | time:7.01ms, loops:2, cop_task:{num:79, max:4.98ms, min:0s, avg:514.9µs, p95:3.75ms, max_proc_keys:10, p95_proc_keys:5, tot_proc:15ms, tot_wait:21.4ms, copr_cache_hit_ratio:0.00, build_task_duration:341.2µs, max_distsql_concurrency:1, max_extra_concurrency:7, store_batch_num:62}, rpc_info:{Cop:{num_rpc:17, total_time:40.5ms}}, tikv_task:{proc max:0s, min:0s, avg:0s, p80:0s, p95:0s, iters:79, tasks:79}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:20.8ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:1600}}}, time_detail:{total_process_time:15ms, total_wait_time:21.4ms, tikv_wall_time:10.9ms} | keep order:false | N/A | N/A | -``` - -**Partition table with global index** - -```yaml -| id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | -|------------------------|---------|-----------|---------|-----------|-------------------------------------------------|----------------|---------------|----------|------| -| IndexLookUp_8 | 398.73 | 786959.21 | 400 | root | partition:all | time:12.8ms, loops:2, index_task:{total_time:2.71ms, fetch_handle:2.71ms, build:528ns, wait:3.23µs}, table_task:{total_time:9.03ms, num:1, concurrency:5}, next:{wait_index:3.27ms, wait_table_lookup_build:1.49ms, wait_table_lookup_resp:7.53ms} | | 693.9 KB | N/A | -| IndexRangeScan_5(Build)| 398.73 | 102593.43 | 400 | cop[tikv] | table:fa, index:index_fa_on_sid_global(sid, id)| time:2.49ms, loops:3, cop_task:{num:69, max:997µs, min:213.8µs, avg:469.8µs, p95:986.6µs, max_proc_keys:15, p95_proc_keys:10, tot_proc:13.4ms, tot_wait:1.52ms, copr_cache_hit_ratio:0.00, build_task_duration:498.4µs, max_distsql_concurrency:15}, rpc_info:{Cop:{num_rpc:69, total_time:31.8ms}}, tikv_task:{proc max:1ms, min:0s, avg:101.4µs, p80:0s, p95:1ms, iters:69, tasks:69}, scan_detail:{total_process_keys:400, total_process_keys_size:31200, 
total_keys:480, get_snapshot_time:679.9µs, rocksdb:{key_skipped_count:400, block:{cache_hit_count:189, read_count:54, read_byte:347.7 KB, read_time:6.17ms}}}, time_detail:{total_process_time:13.4ms, total_wait_time:1.52ms, total_kv_read_wall_time:7ms, tikv_wall_time:19.3ms} | range:[1696125963161,1696125963161], …, keep order:false, stats:partial[...] | N/A | N/A | -| TableRowIDScan_6(Probe)| 398.73 | 165221.64 | 400 | cop[tikv] | table:fa | time:7.47ms, loops:2, cop_task:{num:383, max:4.07ms, min:0s, avg:488.5µs, p95:2.59ms, max_proc_keys:2, p95_proc_keys:1, tot_proc:203.3ms, tot_wait:429.5ms, copr_cache_hit_ratio:0.00, build_task_duration:1.3ms, max_distsql_concurrency:1, max_extra_concurrency:31, store_batch_num:305}, rpc_info:{Cop:{num_rpc:78, total_time:186.3ms}}, tikv_task:{proc max:3ms, min:0s, avg:517µs, p80:1ms, p95:1ms, iters:383, tasks:383}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:2.99ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:1601, read_count:799, read_byte:10.1 MB, read_time:131.6ms}}}, time_detail:{total_process_time:203.3ms, total_suspend_time:6.31ms, total_wait_time:429.5ms, total_kv_read_wall_time:198ms, tikv_wall_time:163ms} | keep order:false, stats:partial[...] | N/A | N/A | -``` - -**Partition table with local index** - -```yaml -| id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | -|------------------------|---------|-----------|---------|-----------|--------------------------------------|----------------|---------------|---------|-------| -| IndexLookUp_7 | 398.73 | 784450.63 | 400 | root | partition:all | time:290.8ms, loops:2, index_task:{total_time:103.6ms, fetch_handle:7.74ms, build:133.2µs, wait:95.7ms}, table_task:{total_time:551.1ms, num:217, concurrency:5}, next:{wait_index:179.6ms, wait_table_lookup_build:391µs, wait_table_lookup_resp:109.5ms} | | 4.30 MB | N/A | -| IndexRangeScan_5(Build)| 398.73 | 90633.73 | 400 | cop[tikv] | table:fa, index:index_fa_on_sid(sid) | time:10.8ms, loops:800, cop_task:{num:600, max:65.6ms, min:1.02ms, avg:22.2ms, p95:45.1ms, max_proc_keys:5, p95_proc_keys:3, tot_proc:6.81s, tot_wait:4.77s, copr_cache_hit_ratio:0.00, build_task_duration:172.8ms, max_distsql_concurrency:3}, rpc_info:{Cop:{num_rpc:600, total_time:13.3s}}, tikv_task:{proc max:54ms, min:0s, avg:13.9ms, p80:20ms, p95:30ms, iters:600, tasks:600}, scan_detail:{total_process_keys:400, total_process_keys_size:22800, total_keys:29680, get_snapshot_time:2.47s, rocksdb:{key_skipped_count:400, block:{cache_hit_count:117580, read_count:29437, read_byte:104.9 MB, read_time:3.24s}}}, time_detail:{total_process_time:6.81s, total_suspend_time:1.51s, total_wait_time:4.77s, total_kv_read_wall_time:8.31s, tikv_wall_time:13.2s}} | range:[1696125963161,...,1696317134004], keep order:false, stats:partial[...] 
| N/A | N/A | -| TableRowIDScan_6(Probe)| 398.73 | 165221.49 | 400 | cop[tikv] | table:fa | time:514ms, loops:434, cop_task:{num:375, max:31.6ms, min:0s, avg:1.33ms, p95:1.67ms, max_proc_keys:2, p95_proc_keys:2, tot_proc:220.7ms, tot_wait:242.2ms, copr_cache_hit_ratio:0.00, build_task_duration:27.8ms, max_distsql_concurrency:1, max_extra_concurrency:1, store_batch_num:69}, rpc_info:{Cop:{num_rpc:306, total_time:495.5ms}}, tikv_task:{proc max:6ms, min:0s, avg:597.3µs, p80:1ms, p95:1ms, iters:375, tasks:375}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:158.3ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:3197, read_count:803, read_byte:10.2 MB, read_time:113.5ms}}}, time_detail:{total_process_time:220.7ms, total_suspend_time:5.39ms, total_wait_time:242.2ms, total_kv_read_wall_time:224ms, tikv_wall_time:430.5ms}} | keep order:false, stats:partial[...] | N/A | N/A | -``` -[Similar detailed execution plans for partitioned tables with global and local indexes would follow...] - -#### How to Create a Global Index on a Partitioned Table in TiDB - -**Option 1: Add via ALTER TABLE** - -```sql -ALTER TABLE -ADD UNIQUE INDEX (col1, col2) GLOBAL; -``` - - -Adds a global index to an existing partitioned table. - -- The `GLOBAL` keyword must be explicitly specified. -- For non-unique global indexes, use `ADD INDEX` instead of `ADD UNIQUE INDEX`. - - Not supported in v8.5.x - - Available starting from v9.0.0-beta.1 - - Expected to be included in the next LTS release - -**Option 2: Define Inline on Table Creation** - -```sql -CREATE TABLE t ( - id BIGINT NOT NULL, - col1 VARCHAR(50), - col2 VARCHAR(50), - -- other columns... - - UNIQUE GLOBAL INDEX idx_col1_col2 (col1, col2) -) -PARTITION BY RANGE (id) ( - PARTITION p0 VALUES LESS THAN (10000), - PARTITION p1 VALUES LESS THAN (20000), - PARTITION pMax VALUES LESS THAN MAXVALUE -); -``` - -#### Summary - -The performance overhead of partitioned tables in TiDB depends significantly on the number of partitions and the type of index used. - -- The more partitions you have, the more severe the potential performance degradation. -- With a smaller number of partitions, the impact may not be as noticeable, but it's still workload-dependent. -- For local indexes, if a query does not include effective partition pruning conditions, the number of partitions directly correlates with the number of RPCs triggered. This means more partitions will likely result in more RPCs, leading to higher latency. -- For global indexes, the number of RPCs and the degree of performance regression depend on both the number of partitions involved and how many rows need to be retrieved (that is, the number of rows requiring table lookups). - -#### Recommendation - -- Avoid partitioned tables unless truly necessary. For most OLTP workloads, a well-indexed non-partitioned table performs better and is easier to manage. -- If you must use partitioned tables, benchmark both global index and local index strategies under your workload. -- Use global indexes when query performance across partitions is critical. -- Choose local indexes only if your main concern is DDL efficiency, such as fast DROP PARTITION, and the performance side effect from the partition table is acceptable. - -## Facilitating Bulk Data Deletion - -### Data Cleanup Efficiency: TTL vs. Direct Partition Drop - -In TiDB, historical data cleanup can be handled either by **TTL (Time-to-Live)** or **manual partition drop**. 
While both methods serve the same purpose, they differ significantly in performance. Our tests show that dropping partitions is generally faster and less resource-intensive, making it a better choice for large datasets and frequent purging needs. - -#### What's the difference? - -- **TTL**: Automatically removes data based on its age, but may be slower due to the need to scan and clean data over time. -- **Partition Drop**: Deletes an entire partition at once, making it much faster, especially when dealing with large datasets. - -#### What Did We Test - -To compare the performance of TTL and partition drop, we configured TTL to execute every 10 minutes and created a partitioned version of the same table, dropping one partition at the same interval for comparison. Both approaches were tested under background write loads of 50 and 100 concurrent threads. We measured key metrics such as execution time, system resource utilization, and the total number of rows deleted. - -#### Findings - -**TTL Performance:** -- On a write-heavy table, TTL runs every 10 minutes. -- With 50 threads, each TTL job took 8—10 minutes, deleted 7—11 million rows. -- With 100 threads, it handled up to 20 million rows, but execution time increased to 15—30 minutes, with greater variance. -- TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. - -**Partition Drop Performance:** - -- DROP PARTITION removes an entire data segment instantly, with minimal resource usage. -- DROP PARTITION is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. - -#### How to Use TTL and Partition Drop in TiDB - -In this experiment, the table structures have been anonymized. For more detailed information on the usage of TTL (Time To Live), please refer to the official documentation at https://docs.pingcap.com/tidb/stable/time-to-live/. - -**TTL schema** - -```sql -CREATE TABLE `ad_cache` ( - `session` varchar(255) NOT NULL, - `ad_id` varbinary(255) NOT NULL, - `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP, - `suffix` bigint(20) NOT NULL, - `expire_time` timestamp NULL DEFAULT NULL, - `data` mediumblob DEFAULT NULL, - `version` int(11) DEFAULT NULL, - `is_delete` tinyint(1) DEFAULT NULL, - PRIMARY KEY (`session`, `ad_id`, `create_time`, `suffix`) -) -ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin -TTL=`expire_time` + INTERVAL 0 DAY TTL_ENABLE='ON' -TTL_JOB_INTERVAL='10m'; -``` - -**Drop Partition (Range INTERVAL partitioning)** - -```sql -CREATE TABLE `ad_cache` ( - `session_id` varchar(255) NOT NULL, - `external_id` varbinary(255) NOT NULL, - `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP, - `id_suffix` bigint(20) NOT NULL, - `expire_time` timestamp NULL DEFAULT NULL, - `cache_data` mediumblob DEFAULT NULL, - `data_version` int(11) DEFAULT NULL, - `is_deleted` tinyint(1) DEFAULT NULL, - PRIMARY KEY ( - `session_id`, `external_id`, - `create_time`, `id_suffix` - ) NONCLUSTERED -) -SHARD_ROW_ID_BITS=7 -PRE_SPLIT_REGIONS=2 -PARTITION BY RANGE COLUMNS (create_time) -INTERVAL (10 MINUTE) -FIRST PARTITION LESS THAN ('2025-02-19 18:00:00') -... -LAST PARTITION LESS THAN ('2025-02-19 20:00:00'); -``` - -It's required to run DDL alter table partition ... to change the FIRST PARTITION and LAST PARTITION periodically. These two DDL statements can drop the old partitions and create new ones. 
- -```sql -ALTER TABLE ad_cache FIRST PARTITION LESS THAN ("${nextTimestamp}"); -ALTER TABLE ad_cache LAST PARTITION LESS THAN ("${nextTimestamp}"); -``` - -#### Recommendation - -For workloads with **large or time-based data cleanup**, prefer using **partitioned tables with DROP PARTITION**. It offers better performance, lower system impact, and simpler management. TTL is still useful for finer-grained or background cleanup but may not be optimal under high write pressure or when deleting large volumes of data quickly. - -### Partition Drop Efficiency: Local Index vs Global Index - -Partition table with Global Index requires synchronous updates to the global index, potentially increasing significant execution time for DDL operations, such as DROP PARTITION, TRUNCATE PARTITION, or REORG PARTITION. In this section, the tests show that DROP PARTITION is much slower when using a **Global Index** compared to a **Local Index**. This should be considered when designing partitioned tables. - -#### What Did We Test - -We created a table with **366 partitions** and tested the DROP PARTITION performance using both **Global Index** and **Local Index**. The total number of rows was **1 billion**. - -| Index Type | Duration (drop partition) | -|---|---| -| Global Index | 1 min 16.02 s | -| Local Index | 0.52 s | - -#### Findings - -Dropping a partition on a table with a Global Index took **76 seconds**, while the same operation with a Local Index took only **0.52 seconds**. The reason is that Global Indexes span all partitions and require more complex updates, while Local Indexes are limited to individual partitions and are easier to handle. - -**Global Index** - -```sql -ALTER TABLE A DROP PARTITION A_2024363; -``` - -#### Recommendation - -When a partitioned table contains global indexes, performing certain DDL operations such as DROP PARTITION, TRUNCATE PARTITION, or REORG PARTITION requires synchronously updating the global index values. This can significantly increase the execution time of these DDL operations. - -If you need to drop partitions frequently and minimize the performance impact on the system, it's better to use **local indexes** for faster and more efficient operations. - -## Mitigating Write Hotspot Issues - -### Background - -In TiDB, **write hotspots** occur when incoming write traffic is unevenly distributed across Regions. - -This is common when the primary key is **monotonically increasing**—for example, an AUTO_INCREMENT primary key with AUTO_ID_CACHE=1, or secondary index on datetime column with default value set to CURRENT_TIMESTAMP—because new rows and index entries are always appended to the "rightmost" Region. Over time, this can lead to: - -- A single Region handling most of the write workload, while other Regions remain idle. -- Higher write latency and reduced throughput. -- Limited performance gains from scaling out TiKV nodes, as the bottleneck remains concentrated on one Region. - -**Partitioned tables** can help mitigate this problem. By applying **hash** or **key** partitioning on the primary key, TiDB can spread inserts across multiple partitions (and therefore multiple Regions), reducing hotspot contention. - -### How It Works - -TiDB stores table data in **Regions**, each covering a continuous range of row keys. - -When the primary key is AUTO_INCREMENT and the secondary indexes on datetime columns are monotonically increasing: - -**Without Partitioning:** - -- New rows always have the highest key values and are inserted into the same "last Region." 
-- That Region is served by one TiKV node at a time, becoming a single write bottleneck. - -**With Hash/Key Partitioning:** - -- The table and the secondary indexes are split into multiple partitions using a hash or key function on the primary key or indexed columns. -- Each partition has its own set of Regions, often distributed across different TiKV nodes. -- Inserts are spread across multiple Regions in parallel, improving load distribution and throughput. - -### Use Case - -If a table with an AUTO_INCREMENT primary key experiences heavy bulk inserts and suffers from write hotspot issues, applying **hash** or **key** partitioning on the primary key can help distribute the write load more evenly. - -```sql -CREATE TABLE server_info ( - id bigint NOT NULL AUTO_INCREMENT, - serial_no varchar(100) DEFAULT NULL, - device_name varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL, - device_type varchar(50) DEFAULT NULL, - modified_ts timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, - PRIMARY KEY (id) /*T![clustered_index] CLUSTERED */, - KEY idx_serial_no (serial_no), - KEY idx_modified_ts (modified_ts) -) /*T![auto_id_cache] AUTO_ID_CACHE=1 */ -PARTITION BY KEY (id) PARTITIONS 16; -``` - -### Pros - -- **Balanced Write Load** — Hotspots are spread across multiple partitions, reducing contention and improving insert performance. -- **Query Optimization via Partition Pruning** — If queries already filter by the partition key, TiDB can prune unused partitions, scanning less data and improving query speed. - -### Cons - -**Potential Query Performance Drop Without Partition Pruning** - -When converting a non-partitioned table to a partitioned table, TiDB creates a separate Region for each partition. This may significantly increase the total Region count. Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. Example: - -```sql -SELECT * FROM server_info WHERE `serial_no` = ?; -``` - -**Mitigation**: Add a **global index** on the filtering columns used by these queries to reduce scanning overhead. While creating a global index can significantly slow down DROP PARTITION operations, **hash and key partitioned tables do not support DROP PARTITION**. In practice, such partitions are rarely removed, making global indexes a feasible solution in these scenarios. Example: - -```sql -ALTER TABLE server_info ADD UNIQUE INDEX(serial_no, id) GLOBAL; -``` - -## Partition Management Challenge - -### How to Avoid Hotspots Caused by New Range Partitions - -#### Overview - -New range partitions in a partitioned table can easily lead to hotspot issues in TiDB. This section outlines common scenarios and mitigation strategies to avoid read and write hotspots caused by range partitions. - -#### Common Hotspot Scenarios - -**Read Hotspot** - -When using **range-partitioned tables**, if queries do **not** filter data using the partition key, new empty partitions can easily become read hotspots. - -**Root Cause:** - -By default, TiDB creates an empty region for each partition when the table is created. If no data is written for a while, multiple empty partitions' regions may be merged into a **single region**. - -**Impact:** - -When a query does **not filter by partition key**, TiDB will **scan all partitions** (as seen in the execution plan partition:all). 
As a result, the single region holding multiple empty partitions will be scanned repeatedly, leading to a **read hotspot**. - -**Write Hotspot** - -When using a time-based field as the partition key, a write hotspot may occur when switching to a new partition: - -**Root Cause:** -In TiDB, any newly created table or partition initially contains only **one region** (data block), which is randomly placed on a single TiKV node. As data begins to be written, this region will eventually **split** into multiple regions, and PD will schedule these new regions to other TiKV nodes. - -However, if the initial write traffic to this new partition is **very high**, the TiKV node hosting that single initial region will be under heavy write pressure. In such cases, it may not have enough spare resources (I/O capacity, CPU cycles) to handle both the application writes and the scheduling of newly split regions to other TiKV nodes. This can delay region distribution, keeping most writes concentrated on the same node for longer than desired. - -**Impact:** - -This imbalance can cause that TiKV node to trigger **flow control**, leading to a sharp drop in QPS, a spike in write latency, and increased CPU usage on the affected node, which in turn may impact the overall read and write performance of the cluster. - - -### Summary Table - -| Approach | Read Hotspot Risk | Write Hotspot Risk | Operational Complexity | Query Performance | Data Cleanup | -|---|---|---|---|---|---| -| NONCLUSTERED Partitioned | Low (with merge_option=deny) | Low (auto pre-split) | Low | Moderate (extra lookups) | Fast (DROP PARTITION) | -| CLUSTERED Partitioned | Medium (manual intervention) | Medium (manual split) | High | High (direct access) | Fast (DROP PARTITION) | -| CLUSTERED Non-partitioned | None | Medium (single table) | Low | High | Slow (DELETE/TTL) | - -#### Solutions - -**1. NONCLUSTERED Partitioned Table** - -**Pros:** - -- When a new partition is created in a **NONCLUSTERED Partitioned Table** configured with SHARD_ROW_ID_BITS and [PRE_SPLIT_REGIONS](https://docs.pingcap.com/tidb/stable/sql-statement-split-region/#pre_split_regions), the regions can be **automatically pre-split**, significantly reducing manual intervention. -- Lower operational overhead. - -**Cons:** - -- Queries using **Point Get** or **Table Range Scan** will require **more table lookups**, which can degrade read performance for such query types. - -**Recommendation:** - -- Suitable for workloads where write scalability and operational ease are more critical than low-latency reads. - -**Best Practices** - -Create a partitioned table with SHARD_ROW_ID_BITS and PRE_SPLIT_REGIONS to pre-split table regions. The value of PRE_SPLIT_REGIONS must be less than or equal to that of SHARD_ROW_ID_BITS. The number of pre-split Regions for each partition is 2^(PRE_SPLIT_REGIONS). 
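For example, with `SHARD_ROW_ID_BITS = 2` and `PRE_SPLIT_REGIONS = 2`, as in the table definition below, each partition starts out with 2^2 = 4 pre-split Regions.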
- -```sql -CREATE TABLE employees ( - id INT NOT NULL, - fname VARCHAR(30), - lname VARCHAR(30), - hired DATE NOT NULL DEFAULT '1970-01-01', - separated DATE DEFAULT '9999-12-31', - job_code INT, - store_id INT, - PRIMARY KEY (`id`,`hired`) NONCLUSTERED, - KEY `idx_employees_on_store_id` (`store_id`) -)SHARD_ROW_ID_BITS = 2 PRE_SPLIT_REGIONS=2 -PARTITION BY RANGE ( YEAR(hired) ) ( - PARTITION p0 VALUES LESS THAN (1991), - PARTITION p1 VALUES LESS THAN (1996), - PARTITION p2 VALUES LESS THAN (2001), - PARTITION p3 VALUES LESS THAN (2006) -); -``` - -Adding the [merge_option=deny](https://docs.pingcap.com/tidb/stable/table-attributes/#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. - -```sql --- table -ALTER TABLE employees ATTRIBUTES 'merge_option=deny'; --- partition -ALTER TABLE employees PARTITION `p3` ATTRIBUTES 'merge_option=deny'; -``` - -**Determining split boundaries based on existing business data** - -To avoid hotspots when a new table or partition is created, it is often beneficial to **pre-split** regions before heavy writes begin. To make pre-splitting effective, configure the **lower and upper boundaries** for region splitting based on the **actual business data distribution**. Avoid setting excessively wide boundaries, as this can result in real data not being effectively distributed across TiKV nodes, defeating the purpose of pre-splitting. - -**Identify the minimum and maximum values** from existing production data so that incoming writes are more likely to target different pre-allocated regions. Example query for existing data: - -```sql -SELECT MIN(id), MAX(id) FROM employees; -``` - -- If the table is **new** and has no historical data, estimate the min/max values based on your business logic and expected data range. -- For **composite primary keys** or **composite indexes**, only the **leftmost column** needs to be considered when deciding split boundaries. -- If the leftmost column is a **string**, take string length and distribution into account to ensure even data spread. - -**Pre-split and scatter regions** - -A common practice is to split the number of regions to **match** the number of TiKV nodes, or to be **twice** the number of TiKV nodes. This helps ensure that data is more evenly distributed across the cluster from the start. - -**Splitting regions for the primary key of all partitions** - -To split regions for the primary key of all partitions in a partitioned table, you can use a command like: - -```sql -SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "1970-01-01") AND (100000, "9999-12-31") REGIONS ; -``` - -This example will split each partition's primary key range into `` regions between the specified boundary values. 
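As a rough illustration of the guideline above, the following hypothetical example assumes a cluster with four TiKV nodes and pre-splits each partition's primary key range into twice that many Regions; adjust the boundary values and Region count to match your own data distribution and cluster size.

```sql
-- Hypothetical example: 4 TiKV nodes, so pre-split each partition's primary key
-- range into 2 x 4 = 8 Regions. Boundaries follow the business data range above.
SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "1970-01-01") AND (100000, "9999-12-31") REGIONS 8;
```

You can then check the resulting distribution with `SHOW TABLE employees REGIONS;` before directing heavy writes at the table.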
- -**Splitting Regions for the secondary index of all partitions.** - -```sql -SPLIT PARTITION TABLE employees INDEX `idx_employees_on_store_id` BETWEEN (1) AND (1000) REGIONS ; -``` - -**(Optional) When adding a new partition, you MUST manually split regions for its primary key and indices.** - -```sql -ALTER TABLE employees ADD PARTITION (PARTITION p4 VALUES LESS THAN (2011)); - -SHOW TABLE employees PARTITION (p4) regions; - -SPLIT PARTITION TABLE employees INDEX `PRIMARY` BETWEEN (1, "2006-01-01") AND (100000, "2011-01-01") REGIONS ; - -SPLIT PARTITION TABLE employees PARTITION (p4) INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; - -SHOW TABLE employees PARTITION (p4) regions; -``` - -**2. CLUSTERED Partitioned Table** - -**Pros:** - -- Queries using **Point Get** or **Table Range Scan** do **not** need additional lookups, resulting in better **read performance**. - -**Cons:** - -- **Manual region splitting** is required when creating new partitions, increasing operational complexity. - -**Recommendation:** - -- Ideal when low-latency point queries are important and operational resources are available to manage region splitting. - -**Best Practices** - -Create a CLUSTERED partitioned table. - -```sql -CREATE TABLE employees2 ( - id INT NOT NULL, - fname VARCHAR(30), - lname VARCHAR(30), - hired DATE NOT NULL DEFAULT '1970-01-01', - separated DATE DEFAULT '9999-12-31', - job_code INT, - store_id INT, - PRIMARY KEY (`id`,`hired`) CLUSTERED, - KEY `idx_employees2_on_store_id` (`store_id`) -) -PARTITION BY RANGE ( YEAR(hired) ) ( - PARTITION p0 VALUES LESS THAN (1991), - PARTITION p1 VALUES LESS THAN (1996), - PARTITION p2 VALUES LESS THAN (2001), - PARTITION p3 VALUES LESS THAN (2006) -); -``` - -Adding the [merge_option=deny](https://docs.pingcap.com/tidb/stable/table-attributes/#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. - -```sql -ALTER TABLE employees2 ATTRIBUTES 'merge_option=deny'; -``` - -**Determining split boundaries based on existing business data** - -To avoid hotspots when a new table or partition is created, it is often beneficial to **pre-split** regions before heavy writes begin. To make pre-splitting effective, configure the **lower and upper boundaries** for region splitting based on the **actual business data distribution**. Avoid setting excessively wide boundaries, as this can result in real data not being effectively distributed across TiKV nodes, defeating the purpose of pre-splitting. - -**Identify the minimum and maximum values** from existing production data so that incoming writes are more likely to target different pre-allocated regions. Example query for existing data: - -```sql -SELECT MIN(id), MAX(id) FROM employees2; -``` - -- If the table is **new** and has no historical data, estimate the min/max values based on your business logic and expected data range. -- For **composite primary keys** or **composite indexes**, only the **leftmost column** needs to be considered when deciding split boundaries. -- If the leftmost column is a **string**, take string length and distribution into account to ensure even data spread. - -**Pre-split and scatter regions** - -A common practice is to split the number of regions to **match** the number of TiKV nodes, or to be **twice** the number of TiKV nodes. 
This helps ensure that data is more evenly distributed across the cluster from the start. - -**Splitting regions for all partitions** - -```sql -SPLIT PARTITION TABLE employees2 BETWEEN (1,"1970-01-01") AND (100000,"9999-12-31") REGIONS ; -``` - -**Splitting regions for the secondary index of all partitions.** - -```sql -SPLIT PARTITION TABLE employees2 INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; -``` - -**(Optional) When adding a new partition, you MUST manually split regions for the specific partition and its indices.** - -```sql -ALTER TABLE employees2 ADD PARTITION (PARTITION p4 VALUES LESS THAN (2011)); - -show table employees2 PARTITION (p4) regions; - -SPLIT PARTITION TABLE employees2 PARTITION (p4) BETWEEN (1,"2006-01-01") AND (100000,"2011-01-01") REGIONS ; - -SPLIT PARTITION TABLE employees2 PARTITION (p4) INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; - -show table employees2 PARTITION (p4) regions; -``` - -**3. CLUSTERED Non-partitioned Table** - -**Pros:** - -- **No hotspot risk from new partitions**. -- Provides **good read performance** for point and range queries. - -**Cons:** - -- **Cannot use DROP PARTITION** to clean up large volumes of old data. - -**Recommendation:** - -- Best suited for use cases that require stable performance and do not benefit from partition-based data management. - - -## Converting Between Partitioned and Non-Partitioned Tables - -When working with large tables (for example in this example 120 million rows), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. TiDB supports several main approaches for such transformations: - -1. Batch DML: `INSERT INTO ... SELECT ...` -2. Pipeline DML: `INSERT INTO ... SELECT ...` -3. `IMPORT INTO`: `IMPORT INTO ... FROM SELECT ...` -4. Online DDL: Direct schema transformation via `ALTER TABLE` - -This section compares the efficiency and implications of these methods in both directions of conversion, and provides best practice recommendations. - -#### Table Schema: `fa` - -```sql -CREATE TABLE `fa` ( - `id` bigint NOT NULL AUTO_INCREMENT, - `account_id` bigint(20) NOT NULL, - `sid` bigint(20) DEFAULT NULL, - `user_id` bigint NOT NULL, - `date` int NOT NULL, - PRIMARY KEY (`id`,`date`) /*T![clustered_index] CLUSTERED */, - KEY `index_fa_on_sid` (`sid`), - KEY `index_fa_on_account_id` (`account_id`), - KEY `index_fa_on_user_id` (`user_id`) -) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin -PARTITION BY RANGE (`date`) -(PARTITION `fa_2024001` VALUES LESS THAN (2024001), -PARTITION `fa_2024002` VALUES LESS THAN (2024002), -PARTITION `fa_2024003` VALUES LESS THAN (2024003), -... -... -PARTITION `fa_2024366` VALUES LESS THAN (2024366)); -``` - - -#### Table Schema: `fa_new` - -```sql -CREATE TABLE `fa` ( - `id` bigint NOT NULL AUTO_INCREMENT, - `account_id` bigint(20) NOT NULL, - `sid` bigint(20) DEFAULT NULL, - `user_id` bigint NOT NULL, - `date` int NOT NULL, - PRIMARY KEY (`id`,`date`) /*T![clustered_index] CLUSTERED */, - KEY `index_fa_on_sid` (`sid`), - KEY `index_fa_on_account_id` (`account_id`), - KEY `index_fa_on_user_id` (`user_id`) -) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin; -``` - -#### Description - -These examples show converting a partitioned table to a non-partitioned table, but the same methods also work for converting a non-partitioned table to a partitioned table. - -### Method 1: Batch DML INSERT INTO ... SELECT ... 
- -```sql -SET tidb_mem_quota_query = 0; -INSERT INTO fa_new SELECT * FROM fa; --- 120 million rows copied in 1h 52m 47s -``` - - -### Method 2: Pipeline DML INSERT INTO ... SELECT ... - -```sql -SET tidb_dml_type = "bulk"; -SET tidb_mem_quota_query = 0; -SET tidb_enable_mutation_checker = OFF; -INSERT INTO fa_new SELECT * FROM fa; --- 120 million rows copied in 58m 42s -``` - -### Method 3: IMPORT INTO ... FROM SELECT ... - -```sql -mysql> import into fa_new from select * from fa with thread=32,disable_precheck; -Query OK, 120000000 rows affected, 1 warning (16 min 49.90 sec) -Records: 120000000, ID: c1d04eec-fb49-49bb-af92-bf3d6e2d3d87 -``` - -### Method 4: Online DDL - -**From partition table to non-partitioned table** - -```sql -SET @@global.tidb_ddl_reorg_worker_cnt = 16; -SET @@global.tidb_ddl_reorg_batch_size = 4096; -alter table fa REMOVE PARTITIONING; --- real 170m12.024s (≈ 2h 50m) -``` - -**From non-partition table to partitioned table** - -```sql -SET @@global.tidb_ddl_reorg_worker_cnt = 16; -SET @@global.tidb_ddl_reorg_batch_size = 4096; -ALTER TABLE fa PARTITION BY RANGE (`date`) -(PARTITION `fa_2024001` VALUES LESS THAN (2024001), -PARTITION `fa_2024002` VALUES LESS THAN (2024002), -... -PARTITION `fa_2024365` VALUES LESS THAN (2024365), -PARTITION `fa_2024366` VALUES LESS THAN (2024366)); - -Query OK, 0 rows affected, 1 warning (2 hours 31 min 57.05 sec) -``` - -### Findings - -| Method | Time Taken | -|---|---| -| Method 1: Batch DML INSERT INTO ... SELECT | 1h 52m 47s | -| Method 2: Pipeline DML: INSERT INTO ... SELECT ... | 58m 42s | -| Method 3: IMPORT INTO ... FROM SELECT ... | 16m 59s | -| Method 4: Online DDL (From partition table to non-partitioned table) | 2h 50m | -| Method 4: Online DDL (From non-partition table to partitioned table) | 2h 31m | - -### Recommendation - -TiDB offers two approaches for converting tables between partitioned and non-partitioned states: - -Choose an offline method like [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md) when your system can accommodate a maintenance window, as it delivers much better performance. Use online DDL only when zero downtime is a strict requirement. \ No newline at end of file From a294efbe470a322bd30a60e3434ec8b8329405de Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 18:41:53 +0800 Subject: [PATCH 42/55] Update tidb-partitioned-tables-guide.md --- tidb-partitioned-tables-guide.md | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/tidb-partitioned-tables-guide.md b/tidb-partitioned-tables-guide.md index e9fda6b2982a6..3836e11fc3ce8 100644 --- a/tidb-partitioned-tables-guide.md +++ b/tidb-partitioned-tables-guide.md @@ -215,10 +215,11 @@ To compare the performance of TTL and partition drop, we configured TTL to execu #### Findings **TTL Performance:** - - On a write-heavy table, TTL runs every 10 minutes. - - With 50 threads, each TTL job took 8—10 minutes, deleted 7—11 million rows. - - With 100 threads, it handled up to 20 million rows, but execution time increased to 15—30 minutes, with greater variance. - - TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. + + - On a write-heavy table, TTL runs every 10 minutes. + - With 50 threads, each TTL job took 8—10 minutes, deleted 7—11 million rows. + - With 100 threads, it handled up to 20 million rows, but execution time increased to 15—30 minutes, with greater variance. 
+ - TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. **Partition Drop Performance:** @@ -227,7 +228,7 @@ To compare the performance of TTL and partition drop, we configured TTL to execu #### How to Use TTL and Partition Drop in TiDB -In this experiment, the table structures have been anonymized. For more detailed information on the usage of TTL (Time To Live), please refer to the official documentation at https://docs.pingcap.com/tidb/stable/time-to-live/. +In this experiment, the table structures have been anonymized. For more detailed information on the usage of TTL (Time To Live), please refer to the official documentation at [Periodically Delete Data Using TTL (Time to Live)](/time-to-live.md) . **TTL schema** @@ -419,7 +420,6 @@ However, if the initial write traffic to this new partition is **very high**, th This imbalance can cause that TiKV node to trigger **flow control**, leading to a sharp drop in QPS, a spike in write latency, and increased CPU usage on the affected node, which in turn may impact the overall read and write performance of the cluster. - ### Summary Table | Approach | Read Hotspot Risk | Write Hotspot Risk | Operational Complexity | Query Performance | Data Cleanup | @@ -629,7 +629,6 @@ show table employees2 PARTITION (p4) regions; - Best suited for use cases that require stable performance and do not benefit from partition-based data management. - ## Converting Between Partitioned and Non-Partitioned Tables When working with large tables (for example in this example 120 million rows), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. TiDB supports several main approaches for such transformations: @@ -664,7 +663,6 @@ PARTITION `fa_2024003` VALUES LESS THAN (2024003), PARTITION `fa_2024366` VALUES LESS THAN (2024366)); ``` - #### Table Schema: `fa_new` ```sql @@ -693,7 +691,6 @@ INSERT INTO fa_new SELECT * FROM fa; -- 120 million rows copied in 1h 52m 47s ``` - ### Method 2: Pipeline DML INSERT INTO ... SELECT ... ```sql From 50ac5742efc5e510b9de0a8f3dae423c4f52e98f Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 20:10:17 +0800 Subject: [PATCH 43/55] Update tidb-partitioned-tables-guide.md --- tidb-partitioned-tables-guide.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/tidb-partitioned-tables-guide.md b/tidb-partitioned-tables-guide.md index 3836e11fc3ce8..bf25738174702 100644 --- a/tidb-partitioned-tables-guide.md +++ b/tidb-partitioned-tables-guide.md @@ -216,10 +216,10 @@ To compare the performance of TTL and partition drop, we configured TTL to execu **TTL Performance:** - - On a write-heavy table, TTL runs every 10 minutes. - - With 50 threads, each TTL job took 8—10 minutes, deleted 7—11 million rows. - - With 100 threads, it handled up to 20 million rows, but execution time increased to 15—30 minutes, with greater variance. - - TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. + - On a write-heavy table, TTL runs every 10 minutes. + - With 50 threads, each TTL job took 8—10 minutes, deleted 7—11 million rows. + - With 100 threads, it handled up to 20 million rows, but execution time increased to 15—30 minutes, with greater variance. + - TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. 
**Partition Drop Performance:** @@ -640,7 +640,7 @@ When working with large tables (for example in this example 120 million rows), t This section compares the efficiency and implications of these methods in both directions of conversion, and provides best practice recommendations. -#### Table Schema: `fa` +### Table Schema: `fa` ```sql CREATE TABLE `fa` ( @@ -663,7 +663,7 @@ PARTITION `fa_2024003` VALUES LESS THAN (2024003), PARTITION `fa_2024366` VALUES LESS THAN (2024366)); ``` -#### Table Schema: `fa_new` +### Table Schema: `fa_new` ```sql CREATE TABLE `fa` ( @@ -683,7 +683,7 @@ CREATE TABLE `fa` ( These examples show converting a partitioned table to a non-partitioned table, but the same methods also work for converting a non-partitioned table to a partitioned table. -### Method 1: Batch DML INSERT INTO ... SELECT ... +### Method 1: Batch DML INSERT INTO ... SELECT ```sql SET tidb_mem_quota_query = 0; @@ -691,7 +691,7 @@ INSERT INTO fa_new SELECT * FROM fa; -- 120 million rows copied in 1h 52m 47s ``` -### Method 2: Pipeline DML INSERT INTO ... SELECT ... +### Method 2: Pipeline DML INSERT INTO ... SELECT ```sql SET tidb_dml_type = "bulk"; @@ -701,7 +701,7 @@ INSERT INTO fa_new SELECT * FROM fa; -- 120 million rows copied in 58m 42s ``` -### Method 3: IMPORT INTO ... FROM SELECT ... +### Method 3: IMPORT INTO ... FROM SELECT ```sql mysql> import into fa_new from select * from fa with thread=32,disable_precheck; From 9f2cccfe8d19eba2a67c3dc10a8ebe327619f242 Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Tue, 14 Oct 2025 20:17:24 +0800 Subject: [PATCH 44/55] Update tidb-partitioned-tables-guide.md --- tidb-partitioned-tables-guide.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/tidb-partitioned-tables-guide.md b/tidb-partitioned-tables-guide.md index bf25738174702..62a1813011db9 100644 --- a/tidb-partitioned-tables-guide.md +++ b/tidb-partitioned-tables-guide.md @@ -216,10 +216,10 @@ To compare the performance of TTL and partition drop, we configured TTL to execu **TTL Performance:** - - On a write-heavy table, TTL runs every 10 minutes. - - With 50 threads, each TTL job took 8—10 minutes, deleted 7—11 million rows. - - With 100 threads, it handled up to 20 million rows, but execution time increased to 15—30 minutes, with greater variance. - - TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. +- On a write-heavy table, TTL runs every 10 minutes. +- With 50 threads, each TTL job took 8—10 minutes, deleted 7—11 million rows. +- With 100 threads, it handled up to 20 million rows, but execution time increased to 15—30 minutes, with greater variance. +- TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. **Partition Drop Performance:** From cd5c1d58540178b5a53705b584e86ec9ba15bc37 Mon Sep 17 00:00:00 2001 From: houfaxin Date: Wed, 15 Oct 2025 11:01:01 +0800 Subject: [PATCH 45/55] Move TiDB partitioned tables guide to best practices Renamed and relocated 'tidb-partitioned-tables-guide.md' to 'best-practices/tidb-partitioned-tables-guide.md'. Updated TOC.md to reference the new location and added front matter and editorial improvements to the guide for clarity and consistency. 
--- TOC.md | 1 + .../tidb-partitioned-tables-guide.md | 179 +++++++++--------- 2 files changed, 92 insertions(+), 88 deletions(-) rename tidb-partitioned-tables-guide.md => best-practices/tidb-partitioned-tables-guide.md (83%) diff --git a/TOC.md b/TOC.md index 3aba0440ea012..fc608950dc4d9 100644 --- a/TOC.md +++ b/TOC.md @@ -438,6 +438,7 @@ - [Optimize Multi-Column Indexes](/best-practices/multi-column-index-best-practices.md) - [Manage Indexes and Identify Unused Indexes](/best-practices/index-management-best-practices.md) - [Handle Millions of Tables in SaaS Multi-Tenant Scenarios](/best-practices/saas-best-practices.md) + - [Best Practices for Using TiDB Partitioned Tables](/best-practices/tidb-partitioned-tables-guide.md) - [Use UUIDs as Primary Keys](/best-practices/uuid.md) - [Develop Java Applications](/best-practices/java-app-best-practices.md) - [Handle High-Concurrency Writes](/best-practices/high-concurrency-best-practices.md) diff --git a/tidb-partitioned-tables-guide.md b/best-practices/tidb-partitioned-tables-guide.md similarity index 83% rename from tidb-partitioned-tables-guide.md rename to best-practices/tidb-partitioned-tables-guide.md index 62a1813011db9..a80adf0160f14 100644 --- a/tidb-partitioned-tables-guide.md +++ b/best-practices/tidb-partitioned-tables-guide.md @@ -1,9 +1,12 @@ +--- +title: Best Practices for Using TiDB Partitioned Tables +summary: Learn best practices for using TiDB partitioned tables to improve performance, simplify data management, and handle large-scale datasets efficiently. +--- + # Best Practices for Using TiDB Partitioned Tables This guide introduces how to use partitioned tables in TiDB to improve performance, simplify data management, and handle large-scale datasets efficiently. -## Introduction - Partitioned tables in TiDB offer a versatile approach to managing large datasets, improving query efficiency, facilitating bulk data deletion, and alleviating write hotspot issues. By dividing data into logical segments, TiDB can leverage **partition pruning** to skip irrelevant data during query execution, reducing resource consumption and accelerating performance—particularly in OLAP workloads with massive datasets. A common use case is **range partitioning combined with local indexes**, which enables efficient historical data cleanup through operations like [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning—such as those lacking partition key filters—may experience degraded performance. In such cases, [**global indexes**](https://docs.pingcap.com/tidb/stable/partitioned-table/#global-indexes) can be introduced to mitigate the performance impact by providing a unified index structure across all partitions. @@ -14,15 +17,17 @@ While partitioning offers clear benefits, it also presents **common challenges** This document examines partitioned tables in TiDB from multiple angles, including query optimization, data cleanup, write scalability, and index management. Through detailed scenarios and best practices, it provides practical guidance on optimizing partitioned table design and performance tuning in TiDB. 
-> **Note:** To get started with the fundamentals, refer to the [Partitioned Table User Guide](/partitioned-table.md), which explains key concepts such as partition pruning, index types, and partitioning methods. +> **Note:** +> +> To get started with the fundamentals, refer to [Partitioning](/partitioned-table.md), which explains key concepts such as partition pruning, index types, and partitioning methods. -## Improving query efficiency +## Improve query efficiency -### Partition Pruning +### Partition pruning **Partition pruning** is an optimization technique that allows TiDB to reduce the amount of data scanned when executing queries against partitioned tables. Instead of scanning all partitions, TiDB analyzes the query's filter conditions and determines which partitions may contain relevant data, scanning only those partitions. This significantly improves query performance by reducing I/O and computation overhead. -#### Applicable Scenarios +#### Applicable scenarios Partition pruning is most beneficial in scenarios where query predicates match the partitioning strategy. Common use cases include: @@ -32,19 +37,19 @@ Partition pruning is most beneficial in scenarios where query predicates match t For more use cases, see [Partition Pruning](https://docs.pingcap.com/tidb/stable/partition-pruning/). -### Query Performance on Secondary Index: Non-Partitioned Table vs. Local Index vs. Global Index +### Query performance on secondary indexes: non-partitioned tables vs. local indexes vs. global indexes In TiDB, local indexes are the default for partitioned tables. Each partition has its own set of indexes. A global index, on the other hand, covers the whole table in one index. This means it keeps track of all rows across all partitions. Global indexes can be faster for queries across multiple partitions because local indexes needs to do one lookup in each partition separately, while global index only needs one lookup for the whole table. -#### What Did We Test +#### What did we test We evaluated query performance across three table configurations in TiDB: -- Non-Partitioned Table -- Partitioned Table with Global Index -- Partitioned Table with Local Index +- Non-partitioned tables +- Partitioned tables with global indexes +- Partitioned tables with local indexes -#### Test Setup +#### Test setup - The query **accesses data via a secondary index** and uses IN conditions across multiple values. - The **partitioned table** had **366 partitions**, defined by **range partitioning on a datetime column**. @@ -94,17 +99,17 @@ WHERE `fa`.`sid` IN ( #### Findings -Data came from a table with **366 range partitions** (for example, by date). +Data comes from a table with **366 range partitions** (for example, by date). -- The **Average Query Time** was obtained from the `statement_summary` view. -- The query used a **secondary index** and returned **400 rows**. +- The **Average Query Time** is obtained from the `statement_summary` view. +- The query uses a **secondary index** and returns **400 rows**. 
Metrics collected: - **Average Query Time**: from `statement_summary` - **Cop Tasks** (Index Scan + Table Lookup): from execution plan -#### Test Results +#### Test results | Configuration | Average Query Time | Cop task for index range scan | Cop task for table lookup | Total Cop tasks | Key Takeaways | |---|---|---|---|---|---| @@ -112,7 +117,7 @@ Metrics collected: | Partitioned Table with Local Index | 108 ms | 600 | 375 | 975 | When the partition key is not used in the query condition, local index queries will scan all partitions. | | Partitioned Table with Global Index | 14.8 ms | 69 | 383 | 452 | Improving index scan efficiency, but table lookups can still be expensive if many rows match. | -#### Execution Plan Examples +#### Execution plan examples **Non-partitioned table** @@ -124,7 +129,7 @@ Metrics collected: | TableRowIDScan_6(Probe) | 398.73 | 166072.78 | 400 | cop[tikv] | table:fa | time:7.01ms, loops:2, cop_task:{num:79, max:4.98ms, min:0s, avg:514.9µs, p95:3.75ms, max_proc_keys:10, p95_proc_keys:5, tot_proc:15ms, tot_wait:21.4ms, copr_cache_hit_ratio:0.00, build_task_duration:341.2µs, max_distsql_concurrency:1, max_extra_concurrency:7, store_batch_num:62}, rpc_info:{Cop:{num_rpc:17, total_time:40.5ms}}, tikv_task:{proc max:0s, min:0s, avg:0s, p80:0s, p95:0s, iters:79, tasks:79}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:20.8ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:1600}}}, time_detail:{total_process_time:15ms, total_wait_time:21.4ms, tikv_wall_time:10.9ms} | keep order:false | N/A | N/A | ``` -**Partition table with global index** +**Partition tables with global indexes** ``` | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | @@ -134,7 +139,7 @@ Metrics collected: | TableRowIDScan_6(Probe)| 398.73 | 165221.64 | 400 | cop[tikv] | table:fa | time:7.47ms, loops:2, cop_task:{num:383, max:4.07ms, min:0s, avg:488.5µs, p95:2.59ms, max_proc_keys:2, p95_proc_keys:1, tot_proc:203.3ms, tot_wait:429.5ms, copr_cache_hit_ratio:0.00, build_task_duration:1.3ms, max_distsql_concurrency:1, max_extra_concurrency:31, store_batch_num:305}, rpc_info:{Cop:{num_rpc:78, total_time:186.3ms}}, tikv_task:{proc max:3ms, min:0s, avg:517µs, p80:1ms, p95:1ms, iters:383, tasks:383}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:2.99ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:1601, read_count:799, read_byte:10.1 MB, read_time:131.6ms}}}, time_detail:{total_process_time:203.3ms, total_suspend_time:6.31ms, total_wait_time:429.5ms, total_kv_read_wall_time:198ms, tikv_wall_time:163ms} | keep order:false, stats:partial[...] | N/A | N/A | ``` -**Partition table with local index** +**Partition tables with local indexes** ``` | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | @@ -146,9 +151,9 @@ Metrics collected: [Similar detailed execution plans for partitioned tables with global and local indexes would follow...] -#### How to Create a Global Index on a Partitioned Table in TiDB +#### How to create a global index on a partitioned table in TiDB -**Option 1: Add via ALTER TABLE** +**Option 1: add via ALTER TABLE** ```sql ALTER TABLE @@ -163,7 +168,7 @@ Adds a global index to an existing partitioned table. 
- Available starting from v9.0.0-beta.1 - Expected to be included in the next LTS release -**Option 2: Define Inline on Table Creation** +**Option 2: Define inline on table creation** ```sql CREATE TABLE t ( @@ -190,27 +195,27 @@ The performance overhead of partitioned tables in TiDB depends significantly on - For local indexes, if a query does not include effective partition pruning conditions, the number of partitions directly correlates with the number of RPCs triggered. This means more partitions will likely result in more RPCs, leading to higher latency. - For global indexes, the number of RPCs and the degree of performance regression depend on both the number of partitions involved and how many rows need to be retrieved (that is, the number of rows requiring table lookups). -#### Recommendation +#### Recommendations - Avoid partitioned tables unless truly necessary. For most OLTP workloads, a well-indexed non-partitioned table performs better and is easier to manage. - If you must use partitioned tables, benchmark both global index and local index strategies under your workload. - Use global indexes when query performance across partitions is critical. - Choose local indexes only if your main concern is DDL efficiency, such as fast DROP PARTITION, and the performance side effect from the partition table is acceptable. -## Facilitating Bulk Data Deletion +## Facilitate bulk data deletion -### Data Cleanup Efficiency: TTL vs. Direct Partition Drop +### Data cleanup efficiency: TTL vs. direct partition drop -In TiDB, historical data cleanup can be handled either by **TTL (Time-to-Live)** or **manual partition drop**. While both methods serve the same purpose, they differ significantly in performance. Our tests show that dropping partitions is generally faster and less resource-intensive, making it a better choice for large datasets and frequent purging needs. +In TiDB, you can clear up historical data either by **TTL (Time-to-Live)** or **manual partition drop**. While both methods serve the same purpose, they differ significantly in performance. Our tests show that dropping partitions is generally faster and less resource-intensive, making it a better choice for large datasets and frequent purging needs. #### What's the difference? -- **TTL**: Automatically removes data based on its age, but may be slower due to the need to scan and clean data over time. -- **Partition Drop**: Deletes an entire partition at once, making it much faster, especially when dealing with large datasets. +- **TTL**: automatically removes data based on its age, but might be slower due to the need to scan and clean data over time. +- **Partition Drop**: deletes an entire partition at once, making it much faster, especially when dealing with large datasets. -#### What Did We Test +#### What did we test -To compare the performance of TTL and partition drop, we configured TTL to execute every 10 minutes and created a partitioned version of the same table, dropping one partition at the same interval for comparison. Both approaches were tested under background write loads of 50 and 100 concurrent threads. We measured key metrics such as execution time, system resource utilization, and the total number of rows deleted. +To compare the performance of TTL and partition drop, we configure TTL to execute every 10 minutes and create a partitioned version of the same table, dropping one partition at the same interval for comparison. Both approaches are tested under background write loads of 50 and 100 concurrent threads. 
We measure key metrics such as execution time, system resource utilization, and the total number of rows deleted. #### Findings @@ -221,14 +226,14 @@ To compare the performance of TTL and partition drop, we configured TTL to execu - With 100 threads, it handled up to 20 million rows, but execution time increased to 15—30 minutes, with greater variance. - TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. -**Partition Drop Performance:** +**Partition drop performance:** - DROP PARTITION removes an entire data segment instantly, with minimal resource usage. - DROP PARTITION is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. -#### How to Use TTL and Partition Drop in TiDB +#### How to use TTL and partition drop in TiDB -In this experiment, the table structures have been anonymized. For more detailed information on the usage of TTL (Time To Live), please refer to the official documentation at [Periodically Delete Data Using TTL (Time to Live)](/time-to-live.md) . +In this experiment, the table structures have been anonymized. For more detailed information on the usage of TTL (Time To Live), see [Periodically Delete Data Using TTL (Time to Live)](/time-to-live.md) . **TTL schema** @@ -275,33 +280,37 @@ FIRST PARTITION LESS THAN ('2025-02-19 18:00:00') LAST PARTITION LESS THAN ('2025-02-19 20:00:00'); ``` -It's required to run DDL alter table partition ... to change the FIRST PARTITION and LAST PARTITION periodically. These two DDL statements can drop the old partitions and create new ones. +It is required to run `ALTER TABLE PARTITION ...` to change the `FIRST PARTITION` and `LAST PARTITION` periodically. These two DDL statements can drop the old partitions and create new ones. ```sql ALTER TABLE ad_cache FIRST PARTITION LESS THAN ("${nextTimestamp}"); ALTER TABLE ad_cache LAST PARTITION LESS THAN ("${nextTimestamp}"); ``` -#### Recommendation +#### Recommendations -For workloads with **large or time-based data cleanup**, prefer using **partitioned tables with DROP PARTITION**. It offers better performance, lower system impact, and simpler management. TTL is still useful for finer-grained or background cleanup but may not be optimal under high write pressure or when deleting large volumes of data quickly. +For workloads with **large or time-based data cleanup**, it is recommended to use **partitioned tables with DROP PARTITION**. It offers better performance, lower system impact, and simpler management. -### Partition Drop Efficiency: Local Index vs Global Index +TTL is still useful for finer-grained or background cleanup, but might not be optimal under high write pressure or when deleting large volumes of data quickly. -Partition table with Global Index requires synchronous updates to the global index, potentially increasing significant execution time for DDL operations, such as DROP PARTITION, TRUNCATE PARTITION, or REORG PARTITION. In this section, the tests show that DROP PARTITION is much slower when using a **Global Index** compared to a **Local Index**. This should be considered when designing partitioned tables. +### Partition drop efficiency: local index vs. global index -#### What Did We Test +Partition tables with global indexes require synchronous updates to the global index, potentially increasing significant execution time for DDL operations, such as `DROP PARTITION`, `TRUNCATE PARTITION`, or `REORG PARTITION`. 
-We created a table with **366 partitions** and tested the DROP PARTITION performance using both **Global Index** and **Local Index**. The total number of rows was **1 billion**.
+In this section, the tests show that `DROP PARTITION` is much slower when using a **global index** compared to a **local index**. Take this into consideration when you design partitioned tables.

-| Index Type | Duration (drop partition) |
-|---|---|
-| Global Index | 1 min 16.02 s |
-| Local Index | 0.52 s |
+#### What did we test
+
+Create a table with **366 partitions** and test the `DROP PARTITION` performance using both global indexes and local indexes. The total number of rows is **1 billion**.
+
+| Index Type | Duration (drop partition) |
+|--------------|---------------------------|
+| Global Index | 1 min 16.02 s |
+| Local Index | 0.52 s |

#### Findings

-Dropping a partition on a table with a Global Index took **76 seconds**, while the same operation with a Local Index took only **0.52 seconds**. The reason is that Global Indexes span all partitions and require more complex updates, while Local Indexes are limited to individual partitions and are easier to handle.
+Dropping a partition on a table with a global index takes **76 seconds**, while the same operation with a local index takes only **0.52 seconds**. The reason is that global indexes span all partitions and require more complex updates, while local indexes are limited to individual partitions and are easier to handle.

**Global Index**

```sql
ALTER TABLE A DROP PARTITION A_2024363;
```

-#### Recommendation
-
-When a partitioned table contains global indexes, performing certain DDL operations such as DROP PARTITION, TRUNCATE PARTITION, or REORG PARTITION requires synchronously updating the global index values. This can significantly increase the execution time of these DDL operations.
+#### Recommendations

-If you need to drop partitions frequently and minimize the performance impact on the system, it's better to use **local indexes** for faster and more efficient operations.
+When a partitioned table contains global indexes, performing certain DDL operations such as `DROP PARTITION`, `TRUNCATE PARTITION`, or `REORG PARTITION` requires synchronously updating the global index values. This can significantly increase the execution time of these DDL operations.

-## Mitigating Write Hotspot Issues
+If you need to drop partitions frequently and minimize the performance impact on the system, it is recommended to use **local indexes** for faster and more efficient operations.

-### Background
+## Mitigate write hotspot issues

In TiDB, **write hotspots** occur when incoming write traffic is unevenly distributed across Regions.

@@ -329,7 +336,7 @@ This is common when the primary key is **monotonically increasing**—for exampl

**Partitioned tables** can help mitigate this problem. By applying **hash** or **key** partitioning on the primary key, TiDB can spread inserts across multiple partitions (and therefore multiple Regions), reducing hotspot contention.

-### How It Works
+### How it works

TiDB stores table data in **Regions**, each covering a continuous range of row keys.

@@ -346,9 +353,9 @@ When the primary key is AUTO_INCREMENT and the secondary indexes on datetime col

- Each partition has its own set of Regions, often distributed across different TiKV nodes.
- Inserts are spread across multiple Regions in parallel, improving load distribution and throughput.
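The following is a minimal sketch of this behavior. The table name, columns, and partition count are illustrative only and are not part of the original test setup: with hash partitioning, rows are routed by `id MOD 8`, so consecutive `AUTO_INCREMENT` values land in different partitions, and each partition's Regions can be inspected separately.

```sql
-- Hypothetical example: hash partitioning on the AUTO_INCREMENT primary key
-- spreads sequential inserts across 8 partitions instead of one tail Region.
CREATE TABLE t_hash (
    id BIGINT NOT NULL AUTO_INCREMENT,
    payload VARCHAR(255),
    PRIMARY KEY (id)
)
PARTITION BY HASH (id) PARTITIONS 8;

-- Each partition has its own Regions; check how they are placed across TiKV stores.
SHOW TABLE t_hash PARTITION (p0) REGIONS;
SHOW TABLE t_hash REGIONS;
```

The next section shows the same idea applied with key partitioning on a production-style schema.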
-### Use Case +### Use cases -If a table with an AUTO_INCREMENT primary key experiences heavy bulk inserts and suffers from write hotspot issues, applying **hash** or **key** partitioning on the primary key can help distribute the write load more evenly. +If a table with an [`AUTO_INCREMENT`](/auto-increment.md) primary key experiences heavy bulk inserts and suffers from write hotspot issues, applying **hash** or **key** partitioning on the primary key can help distribute the write load more evenly. ```sql CREATE TABLE server_info ( @@ -379,39 +386,39 @@ When converting a non-partitioned table to a partitioned table, TiDB creates a s SELECT * FROM server_info WHERE `serial_no` = ?; ``` -**Mitigation**: Add a **global index** on the filtering columns used by these queries to reduce scanning overhead. While creating a global index can significantly slow down DROP PARTITION operations, **hash and key partitioned tables do not support DROP PARTITION**. In practice, such partitions are rarely removed, making global indexes a feasible solution in these scenarios. Example: +**Mitigation**: add a **global index** on the filtering columns used by these queries to reduce scanning overhead. While creating a global index can significantly slow down `DROP PARTITION` operations, **hash and key partitioned tables do not support DROP PARTITION**. In practice, such partitions are rarely removed, making global indexes a feasible solution in these scenarios. Example: ```sql ALTER TABLE server_info ADD UNIQUE INDEX(serial_no, id) GLOBAL; ``` -## Partition Management Challenge +## Partition management challenges -### How to Avoid Hotspots Caused by New Range Partitions +### How to avoid Hotspots caused by new range partitions #### Overview New range partitions in a partitioned table can easily lead to hotspot issues in TiDB. This section outlines common scenarios and mitigation strategies to avoid read and write hotspots caused by range partitions. -#### Common Hotspot Scenarios +#### Common hotspot scenarios -**Read Hotspot** +**Read hotspot** When using **range-partitioned tables**, if queries do **not** filter data using the partition key, new empty partitions can easily become read hotspots. -**Root Cause:** +**Root cause:** By default, TiDB creates an empty region for each partition when the table is created. If no data is written for a while, multiple empty partitions' regions may be merged into a **single region**. -**Impact:** +**impact:** When a query does **not filter by partition key**, TiDB will **scan all partitions** (as seen in the execution plan partition:all). As a result, the single region holding multiple empty partitions will be scanned repeatedly, leading to a **read hotspot**. -**Write Hotspot** +**Write hotspot** When using a time-based field as the partition key, a write hotspot may occur when switching to a new partition: -**Root Cause:** +**Root cause:** In TiDB, any newly created table or partition initially contains only **one region** (data block), which is randomly placed on a single TiKV node. As data begins to be written, this region will eventually **split** into multiple regions, and PD will schedule these new regions to other TiKV nodes. However, if the initial write traffic to this new partition is **very high**, the TiKV node hosting that single initial region will be under heavy write pressure. In such cases, it may not have enough spare resources (I/O capacity, CPU cycles) to handle both the application writes and the scheduling of newly split regions to other TiKV nodes. 
This can delay region distribution, keeping most writes concentrated on the same node for longer than desired. @@ -430,24 +437,24 @@ This imbalance can cause that TiKV node to trigger **flow control**, leading to #### Solutions -**1. NONCLUSTERED Partitioned Table** +**1. NONCLUSTERED partitioned table** **Pros:** -- When a new partition is created in a **NONCLUSTERED Partitioned Table** configured with SHARD_ROW_ID_BITS and [PRE_SPLIT_REGIONS](https://docs.pingcap.com/tidb/stable/sql-statement-split-region/#pre_split_regions), the regions can be **automatically pre-split**, significantly reducing manual intervention. +- When a new partition is created in a **NONCLUSTERED Partitioned Table** configured with `SHARD_ROW_ID_BITS` and [PRE_SPLIT_REGIONS](/sql-statements//sql-statement-split-region.md#pre_split_regions), the regions can be **automatically pre-split**, significantly reducing manual intervention. - Lower operational overhead. **Cons:** - Queries using **Point Get** or **Table Range Scan** will require **more table lookups**, which can degrade read performance for such query types. -**Recommendation:** +**Recommendation** - Suitable for workloads where write scalability and operational ease are more critical than low-latency reads. -**Best Practices** +**Best practices** -Create a partitioned table with SHARD_ROW_ID_BITS and PRE_SPLIT_REGIONS to pre-split table regions. The value of PRE_SPLIT_REGIONS must be less than or equal to that of SHARD_ROW_ID_BITS. The number of pre-split Regions for each partition is 2^(PRE_SPLIT_REGIONS). +Create a partitioned table with `SHARD_ROW_ID_BITS` and `PRE_SPLIT_REGIONS` to pre-split table regions. The value of `PRE_SPLIT_REGIONS` must be less than or equal to that of `SHARD_ROW_ID_BITS`. The number of pre-split Regions for each partition is `2^(PRE_SPLIT_REGIONS)`. ```sql CREATE TABLE employees ( @@ -469,7 +476,7 @@ PARTITION BY RANGE ( YEAR(hired) ) ( ); ``` -Adding the [merge_option=deny](https://docs.pingcap.com/tidb/stable/table-attributes/#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. +Adding the [`merge_option=deny`](/table-attributes.md#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. ```sql -- table @@ -536,11 +543,11 @@ SHOW TABLE employees PARTITION (p4) regions; - **Manual region splitting** is required when creating new partitions, increasing operational complexity. -**Recommendation:** +**Recommendation** - Ideal when low-latency point queries are important and operational resources are available to manage region splitting. -**Best Practices** +**Best practices** Create a CLUSTERED partitioned table. @@ -564,13 +571,13 @@ PARTITION BY RANGE ( YEAR(hired) ) ( ); ``` -Adding the [merge_option=deny](https://docs.pingcap.com/tidb/stable/table-attributes/#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. 
+Adding the [`merge_option=deny`](/table-attributes.md#control-the-region-merge-behavior-using-table-attributes) attribute to a table or partition can prevent the merging of empty regions. However, when a partition is dropped, the regions belonging to that partition will still be merged automatically. ```sql ALTER TABLE employees2 ATTRIBUTES 'merge_option=deny'; ``` -**Determining split boundaries based on existing business data** +**Determine split boundaries based on existing business data** To avoid hotspots when a new table or partition is created, it is often beneficial to **pre-split** regions before heavy writes begin. To make pre-splitting effective, configure the **lower and upper boundaries** for region splitting based on the **actual business data distribution**. Avoid setting excessively wide boundaries, as this can result in real data not being effectively distributed across TiKV nodes, defeating the purpose of pre-splitting. @@ -588,29 +595,25 @@ SELECT MIN(id), MAX(id) FROM employees2; A common practice is to split the number of regions to **match** the number of TiKV nodes, or to be **twice** the number of TiKV nodes. This helps ensure that data is more evenly distributed across the cluster from the start. -**Splitting regions for all partitions** +**Split Regions for all partitions.** ```sql SPLIT PARTITION TABLE employees2 BETWEEN (1,"1970-01-01") AND (100000,"9999-12-31") REGIONS ; ``` -**Splitting regions for the secondary index of all partitions.** +**Split Regions for the secondary index of all partitions.** ```sql SPLIT PARTITION TABLE employees2 INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; ``` -**(Optional) When adding a new partition, you MUST manually split regions for the specific partition and its indices.** +**(Optional) When adding a new partition, you MUST manually split Regions for the specific partition and its indexes.** ```sql ALTER TABLE employees2 ADD PARTITION (PARTITION p4 VALUES LESS THAN (2011)); - show table employees2 PARTITION (p4) regions; - SPLIT PARTITION TABLE employees2 PARTITION (p4) BETWEEN (1,"2006-01-01") AND (100000,"2011-01-01") REGIONS ; - SPLIT PARTITION TABLE employees2 PARTITION (p4) INDEX `idx_employees2_on_store_id` BETWEEN (1) AND (1000) REGIONS ; - show table employees2 PARTITION (p4) regions; ``` @@ -629,7 +632,7 @@ show table employees2 PARTITION (p4) regions; - Best suited for use cases that require stable performance and do not benefit from partition-based data management. -## Converting Between Partitioned and Non-Partitioned Tables +## Converte between partitioned and non-partitioned tables When working with large tables (for example in this example 120 million rows), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. TiDB supports several main approaches for such transformations: @@ -640,7 +643,7 @@ When working with large tables (for example in this example 120 million rows), t This section compares the efficiency and implications of these methods in both directions of conversion, and provides best practice recommendations. 
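One of the approaches compared below (Method 4) relies on online DDL. The exact statements used in the benchmark are not shown in this diff, so the following is only a hedged sketch of the partitioned-to-non-partitioned direction using TiDB's partition management DDL; treat the statement as an assumption rather than the tested command.

```sql
-- Sketch only: drop the partitioning scheme in place while keeping all data.
ALTER TABLE fa REMOVE PARTITIONING;
```

The reverse direction uses `ALTER TABLE ... PARTITION BY ...`; note that any unique key that does not include the partition key must be defined as a global index for that statement to succeed.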
-### Table Schema: `fa` +### Table schema: `fa` ```sql CREATE TABLE `fa` ( @@ -663,7 +666,7 @@ PARTITION `fa_2024003` VALUES LESS THAN (2024003), PARTITION `fa_2024366` VALUES LESS THAN (2024366)); ``` -### Table Schema: `fa_new` +### Table schema: `fa_new` ```sql CREATE TABLE `fa` ( @@ -683,7 +686,7 @@ CREATE TABLE `fa` ( These examples show converting a partitioned table to a non-partitioned table, but the same methods also work for converting a non-partitioned table to a partitioned table. -### Method 1: Batch DML INSERT INTO ... SELECT +### Method 1: Batch DML `INSERT INTO ... SELECT` ```sql SET tidb_mem_quota_query = 0; @@ -691,7 +694,7 @@ INSERT INTO fa_new SELECT * FROM fa; -- 120 million rows copied in 1h 52m 47s ``` -### Method 2: Pipeline DML INSERT INTO ... SELECT +### Method 2: Pipeline DML `INSERT INTO ... SELECT` ```sql SET tidb_dml_type = "bulk"; @@ -701,7 +704,7 @@ INSERT INTO fa_new SELECT * FROM fa; -- 120 million rows copied in 58m 42s ``` -### Method 3: IMPORT INTO ... FROM SELECT +### Method 3: `IMPORT INTO ... FROM SELECT` ```sql mysql> import into fa_new from select * from fa with thread=32,disable_precheck; @@ -745,8 +748,8 @@ Query OK, 0 rows affected, 1 warning (2 hours 31 min 57.05 sec) | Method 4: Online DDL (From partition table to non-partitioned table) | 2 h 50 m | | Method 4: Online DDL (From non-partition table to partitioned table) | 2 h 31 m | -### Recommendation +### Recommendations TiDB offers two approaches for converting tables between partitioned and non-partitioned states: -Choose an offline method like [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md) when your system can accommodate a maintenance window, as it delivers much better performance. Use online DDL only when zero downtime is a strict requirement. \ No newline at end of file +Choose an offline method such as [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md) when your system can accommodate a maintenance window, as it delivers much better performance. Use online DDL only when zero downtime is a strict requirement. From 1c926684e8343d630596e75507181b7d6b3f195a Mon Sep 17 00:00:00 2001 From: houfaxin Date: Wed, 15 Oct 2025 11:30:35 +0800 Subject: [PATCH 46/55] Update tidb-partitioned-tables-guide.md --- best-practices/tidb-partitioned-tables-guide.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/best-practices/tidb-partitioned-tables-guide.md b/best-practices/tidb-partitioned-tables-guide.md index a80adf0160f14..ed5efc57c2221 100644 --- a/best-practices/tidb-partitioned-tables-guide.md +++ b/best-practices/tidb-partitioned-tables-guide.md @@ -7,11 +7,11 @@ summary: Learn best practices for using TiDB partitioned tables to improve perfo This guide introduces how to use partitioned tables in TiDB to improve performance, simplify data management, and handle large-scale datasets efficiently. -Partitioned tables in TiDB offer a versatile approach to managing large datasets, improving query efficiency, facilitating bulk data deletion, and alleviating write hotspot issues. By dividing data into logical segments, TiDB can leverage **partition pruning** to skip irrelevant data during query execution, reducing resource consumption and accelerating performance—particularly in OLAP workloads with massive datasets. +Partitioned tables in TiDB offer a versatile approach to managing large datasets, improving query efficiency, facilitating bulk data deletion, and alleviating write hotspot issues. 
By dividing data into logical segments, TiDB can leverage **partition pruning** to skip irrelevant data during query execution, reducing resource consumption and accelerating performance—particularly in Online Analytical Processing (OLAP) workloads with massive datasets. -A common use case is **range partitioning combined with local indexes**, which enables efficient historical data cleanup through operations like [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning—such as those lacking partition key filters—may experience degraded performance. In such cases, [**global indexes**](https://docs.pingcap.com/tidb/stable/partitioned-table/#global-indexes) can be introduced to mitigate the performance impact by providing a unified index structure across all partitions. +A common use case is **range partitioning combined with local indexes**, which enables efficient historical data cleanup through operations such as [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning, such as those lacking partition key filters, might experience degraded performance. In such cases, you can use [**global indexes**](/partitioned-table.md#global-indexes) to mitigate the performance impact by providing a unified index structure across all partitions. -Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [AUTO_INCREMENT-style IDs](/auto-increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. +Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [`AUTO_INCREMENT` style IDs](/auto-increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. While partitioning offers clear benefits, it also presents **common challenges**, such as **hotspots caused by newly created range partitions**. To address this, TiDB provides techniques for automatic or manual region pre-splitting, ensuring balanced data distribution and avoiding bottlenecks. 
@@ -149,7 +149,7 @@ Metrics collected: | TableRowIDScan_6(Probe)| 398.73 | 165221.49 | 400 | cop[tikv] | table:fa | time:514ms, loops:434, cop_task:{num:375, max:31.6ms, min:0s, avg:1.33ms, p95:1.67ms, max_proc_keys:2, p95_proc_keys:2, tot_proc:220.7ms, tot_wait:242.2ms, copr_cache_hit_ratio:0.00, build_task_duration:27.8ms, max_distsql_concurrency:1, max_extra_concurrency:1, store_batch_num:69}, rpc_info:{Cop:{num_rpc:306, total_time:495.5ms}}, tikv_task:{proc max:6ms, min:0s, avg:597.3µs, p80:1ms, p95:1ms, iters:375, tasks:375}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:158.3ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:3197, read_count:803, read_byte:10.2 MB, read_time:113.5ms}}}, time_detail:{total_process_time:220.7ms, total_suspend_time:5.39ms, total_wait_time:242.2ms, total_kv_read_wall_time:224ms, tikv_wall_time:430.5ms}} | keep order:false, stats:partial[...] | N/A | N/A | ``` -[Similar detailed execution plans for partitioned tables with global and local indexes would follow...] +The following sections describe similar detailed execution plans for partitioned tables with global and local indexes. #### How to create a global index on a partitioned table in TiDB From 02ffcb97318b9ab2a1b285d5649a986063fd27fd Mon Sep 17 00:00:00 2001 From: xixirangrang Date: Wed, 15 Oct 2025 15:16:18 +0800 Subject: [PATCH 47/55] Update best-practices/tidb-partitioned-tables-guide.md --- best-practices/tidb-partitioned-tables-guide.md | 1 + 1 file changed, 1 insertion(+) diff --git a/best-practices/tidb-partitioned-tables-guide.md b/best-practices/tidb-partitioned-tables-guide.md index ed5efc57c2221..0e7150aa8b2e3 100644 --- a/best-practices/tidb-partitioned-tables-guide.md +++ b/best-practices/tidb-partitioned-tables-guide.md @@ -419,6 +419,7 @@ When a query does **not filter by partition key**, TiDB will **scan all partitio When using a time-based field as the partition key, a write hotspot may occur when switching to a new partition: **Root cause:** + In TiDB, any newly created table or partition initially contains only **one region** (data block), which is randomly placed on a single TiKV node. As data begins to be written, this region will eventually **split** into multiple regions, and PD will schedule these new regions to other TiKV nodes. However, if the initial write traffic to this new partition is **very high**, the TiKV node hosting that single initial region will be under heavy write pressure. In such cases, it may not have enough spare resources (I/O capacity, CPU cycles) to handle both the application writes and the scheduling of newly split regions to other TiKV nodes. This can delay region distribution, keeping most writes concentrated on the same node for longer than desired. 
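As a rough way to catch this situation early, you can check how many Regions the newest partition has and pre-split it before heavy writes start. The table name `t`, partition name `p_new`, split boundaries, and Region count below are placeholders; choose values that match your actual data distribution.

```sql
-- Placeholder names: `t` is a range-partitioned table, `p_new` is its newest partition.
-- A single Region in the output means all initial writes to p_new go to one TiKV node.
SHOW TABLE t PARTITION (p_new) REGIONS;

-- Pre-split the new partition's key range before traffic arrives
-- (pick boundaries that cover the data you expect to write).
SPLIT PARTITION TABLE t PARTITION (p_new) BETWEEN (1) AND (1000000) REGIONS 4;
```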
From fef2247ae94cc5c4a183cbcdb03893f8a923992f Mon Sep 17 00:00:00 2001 From: houfaxin Date: Wed, 15 Oct 2025 17:18:56 +0800 Subject: [PATCH 48/55] Update TOC.md --- TOC.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/TOC.md b/TOC.md index fc608950dc4d9..065047a1034da 100644 --- a/TOC.md +++ b/TOC.md @@ -438,7 +438,7 @@ - [Optimize Multi-Column Indexes](/best-practices/multi-column-index-best-practices.md) - [Manage Indexes and Identify Unused Indexes](/best-practices/index-management-best-practices.md) - [Handle Millions of Tables in SaaS Multi-Tenant Scenarios](/best-practices/saas-best-practices.md) - - [Best Practices for Using TiDB Partitioned Tables](/best-practices/tidb-partitioned-tables-guide.md) + - [Use TiDB Partitioned Tables](/best-practices/tidb-partitioned-tables-guide.md) - [Use UUIDs as Primary Keys](/best-practices/uuid.md) - [Develop Java Applications](/best-practices/java-app-best-practices.md) - [Handle High-Concurrency Writes](/best-practices/high-concurrency-best-practices.md) From b054380a8f71bf59182a8b83741a99c9c53ad284 Mon Sep 17 00:00:00 2001 From: houfaxin Date: Thu, 16 Oct 2025 14:31:44 +0800 Subject: [PATCH 49/55] Update tidb-partitioned-tables-guide.md --- .../tidb-partitioned-tables-guide.md | 111 +++++++++--------- 1 file changed, 58 insertions(+), 53 deletions(-) diff --git a/best-practices/tidb-partitioned-tables-guide.md b/best-practices/tidb-partitioned-tables-guide.md index 0e7150aa8b2e3..61970d4a7e4c1 100644 --- a/best-practices/tidb-partitioned-tables-guide.md +++ b/best-practices/tidb-partitioned-tables-guide.md @@ -11,7 +11,7 @@ Partitioned tables in TiDB offer a versatile approach to managing large datasets A common use case is **range partitioning combined with local indexes**, which enables efficient historical data cleanup through operations such as [`ALTER TABLE ... DROP PARTITION`](/sql-statements/sql-statement-alter-table.md). This method not only removes obsolete data almost instantly but also retains high query efficiency when filtering by the partition key. However, after migrating from non-partitioned to partitioned tables, queries that cannot benefit from partition pruning, such as those lacking partition key filters, might experience degraded performance. In such cases, you can use [**global indexes**](/partitioned-table.md#global-indexes) to mitigate the performance impact by providing a unified index structure across all partitions. -Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [`AUTO_INCREMENT` style IDs](/auto-increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions may suffer performance drawbacks—again, a situation where global indexes can help. +Another frequent scenario is using **hash or key partitioning** to address write hotspot issues, especially in workloads relying on [`AUTO_INCREMENT` style IDs](/auto-increment.md) where sequential inserts can overload specific TiKV regions. Distributing writes across partitions helps balance load, but similar to range partitioning, queries without partition-pruning conditions might suffer performance drawbacks again, a situation where global indexes can help. While partitioning offers clear benefits, it also presents **common challenges**, such as **hotspots caused by newly created range partitions**. 
To address this, TiDB provides techniques for automatic or manual region pre-splitting, ensuring balanced data distribution and avoiding bottlenecks. @@ -19,13 +19,13 @@ This document examines partitioned tables in TiDB from multiple angles, includin > **Note:** > -> To get started with the fundamentals, refer to [Partitioning](/partitioned-table.md), which explains key concepts such as partition pruning, index types, and partitioning methods. +> To get started with the fundamentals, see [Partitioning](/partitioned-table.md), which explains key concepts such as partition pruning, index types, and partitioning methods. ## Improve query efficiency ### Partition pruning -**Partition pruning** is an optimization technique that allows TiDB to reduce the amount of data scanned when executing queries against partitioned tables. Instead of scanning all partitions, TiDB analyzes the query's filter conditions and determines which partitions may contain relevant data, scanning only those partitions. This significantly improves query performance by reducing I/O and computation overhead. +**Partition pruning** is an optimization technique that allows TiDB to reduce the amount of data scanned when executing queries against partitioned tables. Instead of scanning all partitions, TiDB analyzes the query's filter conditions and determines which partitions might contain relevant data, scanning only those partitions. This significantly improves query performance by reducing I/O and computation overhead. #### Applicable scenarios @@ -41,7 +41,7 @@ For more use cases, see [Partition Pruning](https://docs.pingcap.com/tidb/stable In TiDB, local indexes are the default for partitioned tables. Each partition has its own set of indexes. A global index, on the other hand, covers the whole table in one index. This means it keeps track of all rows across all partitions. Global indexes can be faster for queries across multiple partitions because local indexes needs to do one lookup in each partition separately, while global index only needs one lookup for the whole table. -#### What did we test +#### Types of tables to be tested We evaluated query performance across three table configurations in TiDB: @@ -51,13 +51,15 @@ We evaluated query performance across three table configurations in TiDB: #### Test setup -- The query **accesses data via a secondary index** and uses IN conditions across multiple values. -- The **partitioned table** had **366 partitions**, defined by **range partitioning on a datetime column**. -- Each matching key could return **multiple rows**, simulating a **high-volume OLTP-style query pattern**. -- We also evaluated the **impact of different partition counts** to understand how partition granularity influences latency and index performance. +- The query accesses data via a secondary index and uses `IN` conditions across multiple values. +- The partitioned table contains 366 partitions, defined by range partitioning on a datetime column. +- Each matching key returns multiple rows, simulating a high-volume OLTP-style query pattern. +- The **impact of different partition counts** is also evaluated to understand how partition granularity influences latency and index performance. #### Schema +The following schema is used in the example. + ```sql CREATE TABLE `fa` ( `id` bigint NOT NULL AUTO_INCREMENT, @@ -81,6 +83,8 @@ PARTITION `fa_2024366` VALUES LESS THAN (2024366)); #### SQL +The following SQL statement is used in the example. 
+ ```sql SELECT `fa`.* FROM `fa` @@ -93,33 +97,33 @@ WHERE `fa`.`sid` IN ( ); ``` -- Query filters on secondary index, but does **not include the partition key**. -- Causes **Local Index** to scan across all partitions due to lack of pruning. +- Query filters on secondary index, but does not include the partition key. +- Causes local indexes to scan across all partitions due to lack of pruning. - Table lookup tasks are significantly higher for partitioned tables. #### Findings -Data comes from a table with **366 range partitions** (for example, by date). +Data comes from a table with 366 range partitions (for example, by date). - The **Average Query Time** is obtained from the `statement_summary` view. -- The query uses a **secondary index** and returns **400 rows**. +- The query uses a secondary index and returns 400 rows. Metrics collected: - **Average Query Time**: from `statement_summary` -- **Cop Tasks** (Index Scan + Table Lookup): from execution plan +- **Cop Tasks** (Index Scan + Table Lookup): from the execution plan #### Test results | Configuration | Average Query Time | Cop task for index range scan | Cop task for table lookup | Total Cop tasks | Key Takeaways | |---|---|---|---|---|---| -| Non-Partitioned Table | 12.6 ms | 72 | 79 | 151 | Delivering the best performance with the fewest Cop tasks — ideal for most OLTP use cases. | -| Partitioned Table with Local Index | 108 ms | 600 | 375 | 975 | When the partition key is not used in the query condition, local index queries will scan all partitions. | -| Partitioned Table with Global Index | 14.8 ms | 69 | 383 | 452 | Improving index scan efficiency, but table lookups can still be expensive if many rows match. | +| Non-partitioned table | 12.6 ms | 72 | 79 | 151 | Provides the best performance with the fewest Cop tasks, which is ideal for most OLTP use cases. | +| Partitioned table with local indexes | 108 ms | 600 | 375 | 975 | When the partition key is not used in the query condition, local index queries scan all partitions. | +| Partitioned table with global indexes | 14.8 ms | 69 | 383 | 452 | It improves index scan efficiency, but table lookups can still take long time if many rows match. 
| #### Execution plan examples -**Non-partitioned table** +##### Non-partitioned table ``` | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | @@ -129,7 +133,7 @@ Metrics collected: | TableRowIDScan_6(Probe) | 398.73 | 166072.78 | 400 | cop[tikv] | table:fa | time:7.01ms, loops:2, cop_task:{num:79, max:4.98ms, min:0s, avg:514.9µs, p95:3.75ms, max_proc_keys:10, p95_proc_keys:5, tot_proc:15ms, tot_wait:21.4ms, copr_cache_hit_ratio:0.00, build_task_duration:341.2µs, max_distsql_concurrency:1, max_extra_concurrency:7, store_batch_num:62}, rpc_info:{Cop:{num_rpc:17, total_time:40.5ms}}, tikv_task:{proc max:0s, min:0s, avg:0s, p80:0s, p95:0s, iters:79, tasks:79}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:20.8ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:1600}}}, time_detail:{total_process_time:15ms, total_wait_time:21.4ms, tikv_wall_time:10.9ms} | keep order:false | N/A | N/A | ``` -**Partition tables with global indexes** +##### Partition tables with global indexes ``` | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | @@ -139,7 +143,7 @@ Metrics collected: | TableRowIDScan_6(Probe)| 398.73 | 165221.64 | 400 | cop[tikv] | table:fa | time:7.47ms, loops:2, cop_task:{num:383, max:4.07ms, min:0s, avg:488.5µs, p95:2.59ms, max_proc_keys:2, p95_proc_keys:1, tot_proc:203.3ms, tot_wait:429.5ms, copr_cache_hit_ratio:0.00, build_task_duration:1.3ms, max_distsql_concurrency:1, max_extra_concurrency:31, store_batch_num:305}, rpc_info:{Cop:{num_rpc:78, total_time:186.3ms}}, tikv_task:{proc max:3ms, min:0s, avg:517µs, p80:1ms, p95:1ms, iters:383, tasks:383}, scan_detail:{total_process_keys:400, total_process_keys_size:489856, total_keys:800, get_snapshot_time:2.99ms, rocksdb:{key_skipped_count:400, block:{cache_hit_count:1601, read_count:799, read_byte:10.1 MB, read_time:131.6ms}}}, time_detail:{total_process_time:203.3ms, total_suspend_time:6.31ms, total_wait_time:429.5ms, total_kv_read_wall_time:198ms, tikv_wall_time:163ms} | keep order:false, stats:partial[...] | N/A | N/A | ``` -**Partition tables with local indexes** +##### Partition tables with local indexes ``` | id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk | @@ -151,24 +155,28 @@ Metrics collected: The following sections describe similar detailed execution plans for partitioned tables with global and local indexes. -#### How to create a global index on a partitioned table in TiDB +#### Create a global index on a partitioned table in TiDB + + +You can define inline when creating a table to create a global index. ```sql CREATE TABLE t ( @@ -190,52 +198,50 @@ PARTITION BY RANGE (id) ( The performance overhead of partitioned tables in TiDB depends significantly on the number of partitions and the type of index used. -- The more partitions you have, the more severe the potential performance degradation. -- With a smaller number of partitions, the impact may not be as noticeable, but it's still workload-dependent. +- The more partitions you have, the more severe the potential performance degrades. +- With a smaller number of partitions, the impact might not be as noticeable, but it is still workload-dependent. - For local indexes, if a query does not include effective partition pruning conditions, the number of partitions directly correlates with the number of RPCs triggered. 
This means more partitions will likely result in more RPCs, leading to higher latency. - For global indexes, the number of RPCs and the degree of performance regression depend on both the number of partitions involved and how many rows need to be retrieved (that is, the number of rows requiring table lookups). #### Recommendations -- Avoid partitioned tables unless truly necessary. For most OLTP workloads, a well-indexed non-partitioned table performs better and is easier to manage. +- Avoid partitioned tables unless necessary. For most OLTP workloads, a well-indexed non-partitioned table performs better and is easier to manage. - If you must use partitioned tables, benchmark both global index and local index strategies under your workload. - Use global indexes when query performance across partitions is critical. -- Choose local indexes only if your main concern is DDL efficiency, such as fast DROP PARTITION, and the performance side effect from the partition table is acceptable. +- Use local indexes only if your main concern is DDL efficiency (such as fast `DROP PARTITION`) and the performance side effect from the partition table is acceptable. ## Facilitate bulk data deletion -### Data cleanup efficiency: TTL vs. direct partition drop +In TiDB, you can clear up historical data either by TTL (Time-to-Live) or manual partition drop. While both methods serve the same purpose, they differ significantly in performance. The testcases in this section show that dropping partitions is generally faster and less resource-intensive, making it a better choice for large datasets and frequent purging needs. -In TiDB, you can clear up historical data either by **TTL (Time-to-Live)** or **manual partition drop**. While both methods serve the same purpose, they differ significantly in performance. Our tests show that dropping partitions is generally faster and less resource-intensive, making it a better choice for large datasets and frequent purging needs. - -#### What's the difference? +#### Differences between TTL and partition drop - **TTL**: automatically removes data based on its age, but might be slower due to the need to scan and clean data over time. - **Partition Drop**: deletes an entire partition at once, making it much faster, especially when dealing with large datasets. -#### What did we test +#### Test case -To compare the performance of TTL and partition drop, we configure TTL to execute every 10 minutes and create a partitioned version of the same table, dropping one partition at the same interval for comparison. Both approaches are tested under background write loads of 50 and 100 concurrent threads. We measure key metrics such as execution time, system resource utilization, and the total number of rows deleted. +To compare the performance of TTL and partition drop, the test case in this section configures TTL to execute every 10 minutes and create a partitioned version of the same table, dropping one partition at the same interval for comparison. Both approaches are tested under background write loads of 50 and 100 concurrent threads. This test case measures key metrics such as execution time, system resource utilization, and the total number of rows deleted. #### Findings **TTL Performance:** - On a write-heavy table, TTL runs every 10 minutes. -- With 50 threads, each TTL job took 8—10 minutes, deleted 7—11 million rows. -- With 100 threads, it handled up to 20 million rows, but execution time increased to 15—30 minutes, with greater variance. 
-- TTL jobs impacted system performance under high load due to extra scanning and deletion activity, reducing overall QPS. +- With 50 threads, each TTL job takes 8 to 10 minutes, deleted 7 to 11 million rows. +- With 100 threads, it handles up to 20 million rows, but the execution time increases to 15 to 30 minutes, with greater variance. +- TTL jobs impact system performance under high workloads due to extra scanning and deletion activity, reducing overall QPS. **Partition drop performance:** -- DROP PARTITION removes an entire data segment instantly, with minimal resource usage. -- DROP PARTITION is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. +- `DROP PARTITION` removes an entire data segment instantly, with minimal resource usage. +- `DROP PARTITION` is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. #### How to use TTL and partition drop in TiDB -In this experiment, the table structures have been anonymized. For more detailed information on the usage of TTL (Time To Live), see [Periodically Delete Data Using TTL (Time to Live)](/time-to-live.md) . +In this test case, the table structures have been anonymized. For more detailed information on the usage of TTL, see [Periodically Delete Data Using TTL (Time to Live)](/time-to-live.md) . -**TTL schema** +The following is the TTL schema. ```sql CREATE TABLE `ad_cache` ( @@ -254,7 +260,7 @@ TTL=`expire_time` + INTERVAL 0 DAY TTL_ENABLE='ON' TTL_JOB_INTERVAL='10m'; ``` -**Drop Partition (Range INTERVAL partitioning)** +The following is the SQL statement for dropping partitions (Range INTERVAL partitioning). ```sql CREATE TABLE `ad_cache` ( @@ -289,7 +295,7 @@ ALTER TABLE ad_cache LAST PARTITION LESS THAN ("${nextTimestamp}"); #### Recommendations -For workloads with **large or time-based data cleanup**, it is recommended to use **partitioned tables with DROP PARTITION**. It offers better performance, lower system impact, and simpler management. +For workloads with large or time-based data cleanup, it is recommended to use partitioned tables with DROP PARTITION. It offers better performance, lower system impact, and simpler management. TTL is still useful for finer-grained or background cleanup, but might not be optimal under high write pressure or when deleting large volumes of data quickly. @@ -299,9 +305,9 @@ Partition tables with global indexes require synchronous updates to the global i In this section, the tests show that `DROP PARTITION` is much slower when using a global index compared to a local index**. Take this into consideration when you design partitioned tables. -#### What did we test +#### Test case -Create a table with **366 partitions** and test the `DROP PARTITION` performance using both global indexes and local indexes. The total number of rows is **1 billion**. +This test case creates a table with 366 partitions and tests the `DROP PARTITION` performance using both global indexes and local indexes. The total number of rows is 1 billion. | Index Type | Duration (drop partition) | |--------------|---------------------------| @@ -380,7 +386,7 @@ PARTITION BY KEY (id) PARTITIONS 16; **Potential Query Performance Drop Without Partition Pruning** -When converting a non-partitioned table to a partitioned table, TiDB creates a separate Region for each partition. This may significantly increase the total Region count. 
Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. Example: +When converting a non-partitioned table to a partitioned table, TiDB creates a separate Region for each partition. This might significantly increase the total Region count. Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. Example: ```sql SELECT * FROM server_info WHERE `serial_no` = ?; @@ -408,7 +414,7 @@ When using **range-partitioned tables**, if queries do **not** filter data using **Root cause:** -By default, TiDB creates an empty region for each partition when the table is created. If no data is written for a while, multiple empty partitions' regions may be merged into a **single region**. +By default, TiDB creates an empty region for each partition when the table is created. If no data is written for a while, multiple empty partitions' regions might be merged into a **single region**. **impact:** @@ -416,17 +422,16 @@ When a query does **not filter by partition key**, TiDB will **scan all partitio **Write hotspot** -When using a time-based field as the partition key, a write hotspot may occur when switching to a new partition: +When using a time-based field as the partition key, a write hotspot might occur when switching to a new partition: **Root cause:** - In TiDB, any newly created table or partition initially contains only **one region** (data block), which is randomly placed on a single TiKV node. As data begins to be written, this region will eventually **split** into multiple regions, and PD will schedule these new regions to other TiKV nodes. -However, if the initial write traffic to this new partition is **very high**, the TiKV node hosting that single initial region will be under heavy write pressure. In such cases, it may not have enough spare resources (I/O capacity, CPU cycles) to handle both the application writes and the scheduling of newly split regions to other TiKV nodes. This can delay region distribution, keeping most writes concentrated on the same node for longer than desired. +However, if the initial write traffic to this new partition is **very high**, the TiKV node hosting that single initial region will be under heavy write pressure. In such cases, it might not have enough spare resources (I/O capacity, CPU cycles) to handle both the application writes and the scheduling of newly split regions to other TiKV nodes. This can delay region distribution, keeping most writes concentrated on the same node for longer than desired. **Impact:** -This imbalance can cause that TiKV node to trigger **flow control**, leading to a sharp drop in QPS, a spike in write latency, and increased CPU usage on the affected node, which in turn may impact the overall read and write performance of the cluster. +This imbalance can cause that TiKV node to trigger **flow control**, leading to a sharp drop in QPS, a spike in write latency, and increased CPU usage on the affected node, which in turn might impact the overall read and write performance of the cluster. 
### Summary Table From a7185a51dc7c53f5eb16a87553e6c415ee047c95 Mon Sep 17 00:00:00 2001 From: houfaxin Date: Mon, 20 Oct 2025 14:58:37 +0800 Subject: [PATCH 50/55] Update tidb-partitioned-tables-guide.md --- best-practices/tidb-partitioned-tables-guide.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/best-practices/tidb-partitioned-tables-guide.md b/best-practices/tidb-partitioned-tables-guide.md index 61970d4a7e4c1..3d93dd5880c56 100644 --- a/best-practices/tidb-partitioned-tables-guide.md +++ b/best-practices/tidb-partitioned-tables-guide.md @@ -200,7 +200,7 @@ The performance overhead of partitioned tables in TiDB depends significantly on - The more partitions you have, the more severe the potential performance degrades. - With a smaller number of partitions, the impact might not be as noticeable, but it is still workload-dependent. -- For local indexes, if a query does not include effective partition pruning conditions, the number of partitions directly correlates with the number of RPCs triggered. This means more partitions will likely result in more RPCs, leading to higher latency. +- For local indexes, if a query does not include effective partition pruning conditions, the number of partitions directly correlates with the number of RPCs (Remote Procedure Call) triggered. This means more partitions will likely result in more RPCs, leading to higher latency. - For global indexes, the number of RPCs and the degree of performance regression depend on both the number of partitions involved and how many rows need to be retrieved (that is, the number of rows requiring table lookups). #### Recommendations @@ -225,7 +225,7 @@ To compare the performance of TTL and partition drop, the test case in this sect #### Findings -**TTL Performance:** +**TTL performance:** - On a write-heavy table, TTL runs every 10 minutes. - With 50 threads, each TTL job takes 8 to 10 minutes, deleted 7 to 11 million rows. @@ -237,7 +237,7 @@ To compare the performance of TTL and partition drop, the test case in this sect - `DROP PARTITION` removes an entire data segment instantly, with minimal resource usage. - `DROP PARTITION` is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. -#### How to use TTL and partition drop in TiDB +#### Use TTL and partition drop in TiDB In this test case, the table structures have been anonymized. For more detailed information on the usage of TTL, see [Periodically Delete Data Using TTL (Time to Live)](/time-to-live.md) . From 3bfa654be2b2a5efa2fd078fbf5a43a987205f56 Mon Sep 17 00:00:00 2001 From: shaoxiqian Date: Mon, 20 Oct 2025 17:20:25 +0800 Subject: [PATCH 51/55] Update tidb-partitioned-tables-guide.md --- .../tidb-partitioned-tables-guide.md | 71 +++++++++---------- 1 file changed, 35 insertions(+), 36 deletions(-) diff --git a/best-practices/tidb-partitioned-tables-guide.md b/best-practices/tidb-partitioned-tables-guide.md index 3d93dd5880c56..d157d5598c3c7 100644 --- a/best-practices/tidb-partitioned-tables-guide.md +++ b/best-practices/tidb-partitioned-tables-guide.md @@ -39,7 +39,7 @@ For more use cases, see [Partition Pruning](https://docs.pingcap.com/tidb/stable ### Query performance on secondary indexes: non-partitioned tables vs. local indexes vs. global indexes -In TiDB, local indexes are the default for partitioned tables. Each partition has its own set of indexes. A global index, on the other hand, covers the whole table in one index. 
This means it keeps track of all rows across all partitions. Global indexes can be faster for queries across multiple partitions because local indexes needs to do one lookup in each partition separately, while global index only needs one lookup for the whole table. +In TiDB, local indexes are the default for partitioned tables. Each partition has its own set of indexes. A global index, on the other hand, covers the whole table in one index. This means it keeps track of all rows across all partitions. global indexes can be faster for queries across multiple partitions because local indexes needs to do one lookup in each partition separately, while global index only needs one lookup for the whole table. #### Types of tables to be tested @@ -51,8 +51,7 @@ We evaluated query performance across three table configurations in TiDB: #### Test setup -- The query accesses data via a secondary index and uses `IN` conditions across multiple values. -- The partitioned table contains 366 partitions, defined by range partitioning on a datetime column. +- The **partitioned table** had **365 partitions**, defined by **range partitioning on a date column**. - Each matching key returns multiple rows, simulating a high-volume OLTP-style query pattern. - The **impact of different partition counts** is also evaluated to understand how partition granularity influences latency and index performance. @@ -73,12 +72,12 @@ CREATE TABLE `fa` ( KEY `index_fa_on_user_id` (`user_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin PARTITION BY RANGE (`date`) -(PARTITION `fa_2024001` VALUES LESS THAN (2024001), -PARTITION `fa_2024002` VALUES LESS THAN (2024002), -PARTITION `fa_2024003` VALUES LESS THAN (2024003), +(PARTITION `fa_2024001` VALUES LESS THAN (2025001), +PARTITION `fa_2024002` VALUES LESS THAN (2025002), +PARTITION `fa_2024003` VALUES LESS THAN (2025003), ... ... -PARTITION `fa_2024366` VALUES LESS THAN (2024366)); +PARTITION `fa_2024365` VALUES LESS THAN (2025365)); ``` #### SQL @@ -98,12 +97,12 @@ WHERE `fa`.`sid` IN ( ``` - Query filters on secondary index, but does not include the partition key. -- Causes local indexes to scan across all partitions due to lack of pruning. +- Causes local indexes key lookup for each partition due to lack of pruning. - Table lookup tasks are significantly higher for partitioned tables. #### Findings -Data comes from a table with 366 range partitions (for example, by date). +Data comes from a table with 365 range partitions (for example, by date). - The **Average Query Time** is obtained from the `statement_summary` view. - The query uses a secondary index and returns 400 rows. @@ -157,8 +156,6 @@ The following sections describe similar detailed execution plans for partitioned #### Create a global index on a partitioned table in TiDB - + You can define inline when creating a table to create a global index. ```sql @@ -200,14 +199,14 @@ The performance overhead of partitioned tables in TiDB depends significantly on - The more partitions you have, the more severe the potential performance degrades. - With a smaller number of partitions, the impact might not be as noticeable, but it is still workload-dependent. -- For local indexes, if a query does not include effective partition pruning conditions, the number of partitions directly correlates with the number of RPCs (Remote Procedure Call) triggered. This means more partitions will likely result in more RPCs, leading to higher latency. 
-- For global indexes, the number of RPCs and the degree of performance regression depend on both the number of partitions involved and how many rows need to be retrieved (that is, the number of rows requiring table lookups).
+- For local indexes, if a query does not include effective partition pruning conditions, the number of partitions directly correlates with the number of [Remote Procedure Calls (RPCs)](https://docs.pingcap.com/tidb/stable/glossary/#remote-procedure-call-rpc) triggered. This means more partitions will likely result in more RPCs, leading to higher latency.
+- For global indexes, the number of RPCs and the degree of performance regression depend on both the number of partitions involved and how many rows need to be retrieved (that is, the number of rows requiring table lookups). Note that for very large tables where data is already distributed across many Regions, accessing data through a global index may have similar performance to a non-partitioned table, as both scenarios require multiple cross-Region RPCs.
 
 #### Recommendations
 
 - Avoid partitioned tables unless necessary. For most OLTP workloads, a well-indexed non-partitioned table performs better and is easier to manage.
-- If you must use partitioned tables, benchmark both global index and local index strategies under your workload.
-- Use global indexes when query performance across partitions is critical.
+- If you know that all critical queries benefit from effective partition pruning (matching only a few partitions), local indexes are a good choice.
+- If critical queries cannot benefit from effective partition pruning (matching many partitions), global indexes are recommended.
 - Use local indexes only if your main concern is DDL efficiency (such as fast `DROP PARTITION`) and the performance side effect from the partition table is acceptable.
 
 ## Facilitate bulk data deletion
 
@@ -286,7 +285,7 @@ FIRST PARTITION LESS THAN ('2025-02-19 18:00:00')
 LAST PARTITION LESS THAN ('2025-02-19 20:00:00');
 ```
 
-It is required to run `ALTER TABLE PARTITION ...` to change the `FIRST PARTITION` and `LAST PARTITION` periodically. These two DDL statements can drop the old partitions and create new ones.
+You are required to run DDL statements like `ALTER TABLE PARTITION ...` to change the `FIRST PARTITION` and `LAST PARTITION` periodically. These two DDL statements can drop the old partitions and create new ones.
 
 ```sql
 ALTER TABLE ad_cache FIRST PARTITION LESS THAN ("${nextTimestamp}");
@@ -301,13 +300,13 @@ TTL is still useful for finer-grained or background cleanup, but might not be op
 
 ### Partition drop efficiency: local index vs. global index
 
-Partition tables with global indexes require synchronous updates to the global index, potentially increasing significant execution time for DDL operations, such as `DROP PARTITION`, `TRUNCATE PARTITION`, or `REORG PARTITION`.
+A partitioned table with a global index requires synchronous updates to the global index, which can significantly increase the execution time for DDL operations, such as `DROP PARTITION`, `TRUNCATE PARTITION`, or `REORGANIZE PARTITION`.
 
-In this section, the tests show that `DROP PARTITION` is much slower when using a global index compared to a local index**. Take this into consideration when you design partitioned tables.
+In this section, the tests show that `DROP PARTITION` is much slower when using a **global index** compared to a **local index**. This should be considered when you design partitioned tables.
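To verify this behavior on your own cluster, you can time the partition drop statement and then inspect the most recent DDL jobs. The following is only a quick check, and the job-count argument is arbitrary.

```sql
-- After running ALTER TABLE ... DROP PARTITION, list the most recent DDL jobs
-- and compare their start and end times for local-index and global-index tables.
ADMIN SHOW DDL JOBS 10;
```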
#### Test case -This test case creates a table with 366 partitions and tests the `DROP PARTITION` performance using both global indexes and local indexes. The total number of rows is 1 billion. +This test case creates a table with 365 partitions and tests the `DROP PARTITION` performance using both global indexes and local indexes. The total number of rows is 1 billion. | Index Type | Duration (drop partition) | |--------------|---------------------------| @@ -316,7 +315,7 @@ This test case creates a table with 366 partitions and tests the `DROP PARTITION #### Findings -Dropping a partition on a table with a global index takes **76 seconds**, while the same operation with a local index takes only **0.52 seconds**. The reason is that global indexes span all partitions and require more complex updates, while local indexes are limited to individual partitions and are easier to handle. +Dropping a partition on a table with a global index takes **76 seconds**, while the same operation with a local index takes only **0.52 seconds**. The reason is that global indexes span all partitions and require more complex updates, while local indexes can just be dropped together with the partition data. **Global Index** @@ -326,7 +325,7 @@ ALTER TABLE A DROP PARTITION A_2024363; #### Recommendations -When a partitioned table contains global indexes, performing certain DDL operations such as `DROP PARTITION`, `TRUNCATE PARTITION`, or `REORG PARTITION` requires synchronously updating the global index values. This can significantly increase the execution time of these DDL operations. +When a partitioned table contains global indexes, performing certain DDL operations such as `DROP PARTITION`, `TRUNCATE PARTITION`, or `REORGANIZE PARTITION` requires synchronously updating the global index values. This can significantly increase the execution time of these DDL operations. If you need to drop partitions frequently and minimize the performance impact on the system, it is recommended to use **local indexes** for faster and more efficient operations. @@ -386,7 +385,7 @@ PARTITION BY KEY (id) PARTITIONS 16; **Potential Query Performance Drop Without Partition Pruning** -When converting a non-partitioned table to a partitioned table, TiDB creates a separate Region for each partition. This might significantly increase the total Region count. Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. Example: +When converting a non-partitioned table to a partitioned table, TiDB creates a separate Regions for each partition. This might significantly increase the total Region count. Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. 
Example:
 
 ```sql
 SELECT * FROM server_info WHERE `serial_no` = ?;
 
@@ -525,7 +524,7 @@ This example will split each partition's primary key range into `;
 ```
 
-**(Optional) When adding a new partition, you MUST manually split regions for its primary key and indices.**
+**(Optional) When adding a new partition, you should manually split regions for its primary key and indices.**
 
 ```sql
 ALTER TABLE employees ADD PARTITION (PARTITION p4 VALUES LESS THAN (2011));
@@ -664,12 +663,12 @@ CREATE TABLE `fa` (
   KEY `index_fa_on_user_id` (`user_id`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
 PARTITION BY RANGE (`date`)
-(PARTITION `fa_2024001` VALUES LESS THAN (2024001),
-PARTITION `fa_2024002` VALUES LESS THAN (2024002),
-PARTITION `fa_2024003` VALUES LESS THAN (2024003),
+(PARTITION `fa_2024001` VALUES LESS THAN (2025001),
+PARTITION `fa_2024002` VALUES LESS THAN (2025002),
+PARTITION `fa_2024003` VALUES LESS THAN (2025003),
 ...
 ...
-PARTITION `fa_2024366` VALUES LESS THAN (2024366));
+PARTITION `fa_2024365` VALUES LESS THAN (2025365));
 ```
 
 ### Table schema: `fa_new`
@@ -723,8 +722,8 @@ Records: 120000000, ID: c1d04eec-fb49-49bb-af92-bf3d6e2d3d87
 
 **From partition table to non-partitioned table**
 
 ```sql
 SET @@global.tidb_ddl_reorg_worker_cnt = 16;
 SET @@global.tidb_ddl_reorg_batch_size = 4096;
 alter table fa REMOVE PARTITIONING;
 -- real 170m12.024 s (≈ 2 h 50 m)
 ```
 
 **From non-partition table to partitioned table**
 
 ```sql
 SET @@global.tidb_ddl_reorg_worker_cnt = 16;
 SET @@global.tidb_ddl_reorg_batch_size = 4096;
 ALTER TABLE fa PARTITION BY RANGE (`date`)
-(PARTITION `fa_2024001` VALUES LESS THAN (2024001),
-PARTITION `fa_2024002` VALUES LESS THAN (2024002),
+(PARTITION `fa_2024001` VALUES LESS THAN (2025001),
+PARTITION `fa_2024002` VALUES LESS THAN (2025002),
 ...
-PARTITION `fa_2024365` VALUES LESS THAN (2024365),
-PARTITION `fa_2024366` VALUES LESS THAN (2024366));
+PARTITION `fa_2024365` VALUES LESS THAN (2025365),
+PARTITION `fa_2024365` VALUES LESS THAN (2025365ƒf));
 Query OK, 0 rows affected, 1 warning (2 hours 31 min 57.05 sec)
 ```
 
From 2fcc6d7eeeadd25ffcbd2fef545f57da80ff99f2 Mon Sep 17 00:00:00 2001
From: shaoxiqian 
Date: Tue, 21 Oct 2025 14:41:19 +0800
Subject: [PATCH 52/55] Update tidb-partitioned-tables-guide.md

---
 .../tidb-partitioned-tables-guide.md          | 38 ++++++++++---------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/best-practices/tidb-partitioned-tables-guide.md b/best-practices/tidb-partitioned-tables-guide.md
index d157d5598c3c7..37af510037928 100644
--- a/best-practices/tidb-partitioned-tables-guide.md
+++ b/best-practices/tidb-partitioned-tables-guide.md
@@ -223,6 +223,7 @@ In TiDB, you can clear up historical data either by TTL (Time-to-Live) or manual
 
 To compare the performance of TTL and partition drop, the test case in this section configures TTL to execute every 10 minutes and create a partitioned version of the same table, dropping one partition at the same interval for comparison. Both approaches are tested under background write loads of 50 and 100 concurrent threads. This test case measures key metrics such as execution time, system resource utilization, and the total number of rows deleted.
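For reference, the 10-minute TTL schedule used in this test can be expressed with TTL table options on the anonymized `ad_cache` table. The following is a minimal sketch; the 2-hour retention window is an assumed value, and only the `TTL`, `TTL_ENABLE`, and `TTL_JOB_INTERVAL` options matter here.

```sql
-- Expire rows 2 hours after creation (assumed retention) and run the
-- background TTL job every 10 minutes to match the test configuration.
ALTER TABLE ad_cache
    TTL = `create_time` + INTERVAL 2 HOUR
    TTL_ENABLE = 'ON'
    TTL_JOB_INTERVAL = '10m';
```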
#### Findings +> **Note**: The performance benefits described below apply to partitioned tables without global indexes. **TTL performance:** @@ -233,8 +234,8 @@ To compare the performance of TTL and partition drop, the test case in this sect **Partition drop performance:** -- `DROP PARTITION` removes an entire data segment instantly, with minimal resource usage. -- `DROP PARTITION` is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. +- `ALTER TABLE ... DROP PARTITION` removes an entire data segment instantly, with minimal resource usage. +- `ALTER TABLE ... DROP PARTITION` is a metadata-level operation, making it much faster and more predictable than TTL, especially when managing large volumes of historical data. #### Use TTL and partition drop in TiDB @@ -244,14 +245,14 @@ The following is the TTL schema. ```sql CREATE TABLE `ad_cache` ( - `session` varchar(255) NOT NULL, - `ad_id` varbinary(255) NOT NULL, + `session_id` varchar(255) NOT NULL, + `external_id` varbinary(255) NOT NULL, `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP, - `suffix` bigint(20) NOT NULL, + `id_suffix` bigint(20) NOT NULL, `expire_time` timestamp NULL DEFAULT NULL, - `data` mediumblob DEFAULT NULL, - `version` int(11) DEFAULT NULL, - `is_delete` tinyint(1) DEFAULT NULL, + `cache_data` mediumblob DEFAULT NULL, + `data_version` int(11) DEFAULT NULL, + `is_deleted` tinyint(1) DEFAULT NULL, PRIMARY KEY (`session`, `ad_id`, `create_time`, `suffix`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin @@ -325,25 +326,26 @@ ALTER TABLE A DROP PARTITION A_2024363; #### Recommendations -When a partitioned table contains global indexes, performing certain DDL operations such as `DROP PARTITION`, `TRUNCATE PARTITION`, or `REORGANIZE PARTITION` requires synchronously updating the global index values. This can significantly increase the execution time of these DDL operations. +When a partitioned table contains global indexes, executing certain DDL operations such as `DROP PARTITION`, `TRUNCATE PARTITION`, or `REORGANIZE PARTITION` requires updating the global index entries to reflect the changes. This update must be performed immediately to ensure consistency, which can significantly increase the execution time of these DDL operations. If you need to drop partitions frequently and minimize the performance impact on the system, it is recommended to use **local indexes** for faster and more efficient operations. ## Mitigate write hotspot issues -In TiDB, **write hotspots** occur when incoming write traffic is unevenly distributed across Regions. +In TiDB, **write hotspots** can occur when incoming write traffic is unevenly distributed across Regions. -This is common when the primary key is **monotonically increasing**—for example, an AUTO_INCREMENT primary key with AUTO_ID_CACHE=1, or secondary index on datetime column with default value set to CURRENT_TIMESTAMP—because new rows and index entries are always appended to the "rightmost" Region. Over time, this can lead to: +This is common when the primary key is **monotonically increasing**—for example, an `AUTO_INCREMENT` primary key with `AUTO_ID_CACHE=1`, or secondary index on datetime column with default value set to `CURRENT_TIMESTAMP`—because new rows and index entries are always appended to the "rightmost" Region. Over time, this can lead to: -- A single Region handling most of the write workload, while other Regions remain idle. 
+- A single [Region](https://docs.pingcap.com/tidb/stable/tidb-storage/#region) handling most of the write workload, while other Regions remain idle. - Higher write latency and reduced throughput. - Limited performance gains from scaling out TiKV nodes, as the bottleneck remains concentrated on one Region. **Partitioned tables** can help mitigate this problem. By applying **hash** or **key** partitioning on the primary key, TiDB can spread inserts across multiple partitions (and therefore multiple Regions), reducing hotspot contention. +> **Note**: This section uses partitioned tables as an example for mitigating read/write hotspots. TiDB also provides other features such as `AUTO_RANDOM` and `SHARD_ROW_ID_BITS` for hotspot mitigation. When using partitioned tables in certain scenarios, you may need to set `merge_option=deny` to maintain partition boundaries. For more details, see [issue #58128](https://github.com/pingcap/tidb/issues/58128). ### How it works -TiDB stores table data in **Regions**, each covering a continuous range of row keys. +TiDB stores table data and indexes in **Regions**, each covering a continuous range of row keys. When the primary key is AUTO_INCREMENT and the secondary indexes on datetime columns are monotonically increasing: @@ -378,20 +380,20 @@ PARTITION BY KEY (id) PARTITIONS 16; ### Pros -- **Balanced Write Load** — Hotspots are spread across multiple partitions, reducing contention and improving insert performance. +- **Balanced Write Load** — Hotspots are spread across multiple partitions, and therefore multiple **Regions**, reducing contention and improving insert performance. - **Query Optimization via Partition Pruning** — If queries already filter by the partition key, TiDB can prune unused partitions, scanning less data and improving query speed. ### Cons **Potential Query Performance Drop Without Partition Pruning** -When converting a non-partitioned table to a partitioned table, TiDB creates a separate Regions for each partition. This might significantly increase the total Region count. Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. Example: +When converting a non-partitioned table to a partitioned table, TiDB creates separate Regions for each partition. This may significantly increase the total Region count. Queries that do not filter by the partition key cannot take advantage of partition pruning, forcing TiDB to scan all partitions or do index lookups in all partitions. This increases the number of coprocessor (cop) tasks and can slow down queries. Example: ```sql SELECT * FROM server_info WHERE `serial_no` = ?; ``` -**Mitigation**: add a **global index** on the filtering columns used by these queries to reduce scanning overhead. While creating a global index can significantly slow down `DROP PARTITION` operations, **hash and key partitioned tables do not support DROP PARTITION**. In practice, such partitions are rarely removed, making global indexes a feasible solution in these scenarios. Example: +**Mitigation**: add a **global index** on the filtering columns used by these queries to reduce scanning overhead. While creating a global index can significantly slow down `DROP PARTITION` operations, **hash and key partitioned tables do not support DROP PARTITION**. In practice, such partitions are rarely truncated, making global indexes a feasible solution in these scenarios. 
Example: ```sql ALTER TABLE server_info ADD UNIQUE INDEX(serial_no, id) GLOBAL; @@ -424,7 +426,7 @@ When a query does **not filter by partition key**, TiDB will **scan all partitio When using a time-based field as the partition key, a write hotspot might occur when switching to a new partition: **Root cause:** -In TiDB, any newly created table or partition initially contains only **one region** (data block), which is randomly placed on a single TiKV node. As data begins to be written, this region will eventually **split** into multiple regions, and PD will schedule these new regions to other TiKV nodes. +In TiDB, newly created partitions initially contain only **one region** on a single TiKV node. As writes concentrate on this single region, it must **split** into multiple regions before writes can be distributed across multiple TiKV nodes. This splitting process is the main cause of the temporary write hotspot. However, if the initial write traffic to this new partition is **very high**, the TiKV node hosting that single initial region will be under heavy write pressure. In such cases, it might not have enough spare resources (I/O capacity, CPU cycles) to handle both the application writes and the scheduling of newly split regions to other TiKV nodes. This can delay region distribution, keeping most writes concentrated on the same node for longer than desired. @@ -738,7 +740,7 @@ ALTER TABLE fa PARTITION BY RANGE (`date`) PARTITION `fa_2024002` VALUES LESS THAN (2025002), ... PARTITION `fa_2024365` VALUES LESS THAN (2025365), -PARTITION `fa_2024365` VALUES LESS THAN (2025365ƒf)); +PARTITION `fa_2024365` VALUES LESS THAN (2025365)); Query OK, 0 rows affected, 1 warning (2 hours 31 min 57.05 sec) ``` From aec934310f39507a0138871882e51ad8564f4156 Mon Sep 17 00:00:00 2001 From: houfaxin Date: Wed, 22 Oct 2025 10:58:49 +0800 Subject: [PATCH 53/55] Rename partitioned tables guide and update references Renamed 'tidb-partitioned-tables-guide.md' to 'tidb-partitioned-tables-best-practices.md' for consistency. Updated TOC and internal note to reflect the new filename and clarified the scope of performance benefits for partitioned tables without global indexes. 
--- TOC.md | 2 +- ...es-guide.md => tidb-partitioned-tables-best-practices.md} | 5 ++++- 2 files changed, 5 insertions(+), 2 deletions(-) rename best-practices/{tidb-partitioned-tables-guide.md => tidb-partitioned-tables-best-practices.md} (99%) diff --git a/TOC.md b/TOC.md index 065047a1034da..731c1d7e33d28 100644 --- a/TOC.md +++ b/TOC.md @@ -438,7 +438,7 @@ - [Optimize Multi-Column Indexes](/best-practices/multi-column-index-best-practices.md) - [Manage Indexes and Identify Unused Indexes](/best-practices/index-management-best-practices.md) - [Handle Millions of Tables in SaaS Multi-Tenant Scenarios](/best-practices/saas-best-practices.md) - - [Use TiDB Partitioned Tables](/best-practices/tidb-partitioned-tables-guide.md) + - [Use TiDB Partitioned Tables](/best-practices/tidb-partitioned-tables-best-practices.md) - [Use UUIDs as Primary Keys](/best-practices/uuid.md) - [Develop Java Applications](/best-practices/java-app-best-practices.md) - [Handle High-Concurrency Writes](/best-practices/high-concurrency-best-practices.md) diff --git a/best-practices/tidb-partitioned-tables-guide.md b/best-practices/tidb-partitioned-tables-best-practices.md similarity index 99% rename from best-practices/tidb-partitioned-tables-guide.md rename to best-practices/tidb-partitioned-tables-best-practices.md index 37af510037928..89667171815f5 100644 --- a/best-practices/tidb-partitioned-tables-guide.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -223,7 +223,10 @@ In TiDB, you can clear up historical data either by TTL (Time-to-Live) or manual To compare the performance of TTL and partition drop, the test case in this section configures TTL to execute every 10 minutes and create a partitioned version of the same table, dropping one partition at the same interval for comparison. Both approaches are tested under background write loads of 50 and 100 concurrent threads. This test case measures key metrics such as execution time, system resource utilization, and the total number of rows deleted. #### Findings -> **Note**: The performance benefits described below apply to partitioned tables without global indexes. + +> **Note:** +> +> The performance benefits described in this section only apply to partitioned tables without global indexes. **TTL performance:** From 6aec89b9f12c2a1035567ccb7b4e66e8df744726 Mon Sep 17 00:00:00 2001 From: houfaxin Date: Thu, 23 Oct 2025 21:20:28 +0800 Subject: [PATCH 54/55] Update tidb-partitioned-tables-best-practices.md --- best-practices/tidb-partitioned-tables-best-practices.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md index 89667171815f5..5da892a78f0e8 100644 --- a/best-practices/tidb-partitioned-tables-best-practices.md +++ b/best-practices/tidb-partitioned-tables-best-practices.md @@ -33,7 +33,7 @@ Partition pruning is most beneficial in scenarios where query predicates match t - **Time-series data queries**: When data is partitioned by time ranges (for example, daily, monthly), queries restricted to a specific time period can quickly skip unrelated partitions. - **Multi-tenant or category-based datasets**: Partitioning by tenant ID or category enables queries to focus on a small subset of partitions. 
-- **Hybrid Transactional/Analytical Processing (HTAP)**: Especially for range partitioning, TiDB can leverage partition pruning in analytical workloads on TiFlash to skip irrelevant partitions and scan only the necessary subset, preventing **full table scans** on large datasets.
+- **Hybrid Transactional and Analytical Processing (HTAP)**: Especially for range partitioning, TiDB can leverage partition pruning in analytical workloads on TiFlash to skip irrelevant partitions and scan only the necessary subset, preventing **full table scans** on large datasets.
 
 For more use cases, see [Partition Pruning](https://docs.pingcap.com/tidb/stable/partition-pruning/).
 
From 0dd7df8646b459ec9e9666f126829670361c7eae Mon Sep 17 00:00:00 2001
From: houfaxin 
Date: Mon, 27 Oct 2025 18:53:18 +0800
Subject: [PATCH 55/55] Update tidb-partitioned-tables-best-practices.md

---
 .../tidb-partitioned-tables-best-practices.md | 23 +++++++++++--------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/best-practices/tidb-partitioned-tables-best-practices.md b/best-practices/tidb-partitioned-tables-best-practices.md
index 5da892a78f0e8..778df083dc81d 100644
--- a/best-practices/tidb-partitioned-tables-best-practices.md
+++ b/best-practices/tidb-partitioned-tables-best-practices.md
@@ -23,12 +23,15 @@ This document examines partitioned tables in TiDB from multiple angles, includin
 
 ## Improve query efficiency
 
+This section describes how to improve query efficiency by the following methods:
+
+- Partition pruning
+- Query performance on secondary indexes
+
 ### Partition pruning
 
 **Partition pruning** is an optimization technique that allows TiDB to reduce the amount of data scanned when executing queries against partitioned tables. Instead of scanning all partitions, TiDB analyzes the query's filter conditions and determines which partitions might contain relevant data, scanning only those partitions. This significantly improves query performance by reducing I/O and computation overhead.
 
-#### Applicable scenarios
-
 Partition pruning is most beneficial in scenarios where query predicates match the partitioning strategy. Common use cases include:
 
 - **Time-series data queries**: When data is partitioned by time ranges (for example, daily, monthly), queries restricted to a specific time period can quickly skip unrelated partitions.
@@ -39,7 +42,7 @@ For more use cases, see [Partition Pruning](https://docs.pingcap.com/tidb/stable
 
 ### Query performance on secondary indexes: non-partitioned tables vs. local indexes vs. global indexes
 
-In TiDB, local indexes are the default for partitioned tables. Each partition has its own set of indexes. A global index, on the other hand, covers the whole table in one index. This means it keeps track of all rows across all partitions. global indexes can be faster for queries across multiple partitions because local indexes needs to do one lookup in each partition separately, while global index only needs one lookup for the whole table.
+In TiDB, partitioned tables use local indexes by default. Each partition has its own set of indexes. A global index, on the other hand, covers the whole table in one index. This means it keeps track of all rows across all partitions. Global indexes can be faster for queries across multiple partitions because local indexes need to do one lookup in each partition separately, while a global index only needs one lookup for the whole table.
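To see this difference on a concrete query, you can check the execution plan. The following is a minimal sketch against the `fa` table used in the test below; the `user_id` literal is arbitrary. With local indexes and no filter on the partition key, the index scan typically reports `partition:all`, meaning every partition is probed, while a suitable global index needs only a single index range scan.

```sql
-- Secondary-index lookup without a partition-key filter: with local indexes,
-- expect the access object to include partition:all (one probe per partition).
EXPLAIN SELECT * FROM fa WHERE user_id = 12345;
```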
#### Types of tables to be tested @@ -166,7 +169,9 @@ You can use `ALTER TABLE` to add a global index to an existing partitioned table ALTER TABLE ADD UNIQUE INDEX (col1, col2) GLOBAL; ``` -**Note:** + +> **Note:** +> > In TiDB v8.5.x and earlier versions, global indexes can only be created on unique columns. Starting from v9.0.0 (currently in beta), global indexes on non-unique columns are supported. This limitation will be removed in the next LTS version. - The `GLOBAL` keyword must be explicitly specified. @@ -289,7 +294,7 @@ FIRST PARTITION LESS THAN ('2025-02-19 18:00:00') LAST PARTITION LESS THAN ('2025-02-19 20:00:00'); ``` -You are required to run DDL statements like `ALTER TABLE PARTITION ...` to change the `FIRST PARTITION` and `LAST PARTITION` periodically. These two DDL statements can drop the old partitions and create new ones. +You need to run DDL statements such as `ALTER TABLE PARTITION ...` to change the `FIRST PARTITION` and `LAST PARTITION` periodically. These two DDL statements can drop the old partitions and create new ones. ```sql ALTER TABLE ad_cache FIRST PARTITION LESS THAN ("${nextTimestamp}"); @@ -404,9 +409,7 @@ ALTER TABLE server_info ADD UNIQUE INDEX(serial_no, id) GLOBAL; ## Partition management challenges -### How to avoid Hotspots caused by new range partitions - -#### Overview +### How to avoid hotspots caused by new range partitions New range partitions in a partitioned table can easily lead to hotspot issues in TiDB. This section outlines common scenarios and mitigation strategies to avoid read and write hotspots caused by range partitions. @@ -437,7 +440,7 @@ However, if the initial write traffic to this new partition is **very high**, th This imbalance can cause that TiKV node to trigger **flow control**, leading to a sharp drop in QPS, a spike in write latency, and increased CPU usage on the affected node, which in turn might impact the overall read and write performance of the cluster. -### Summary Table +### Summary | Approach | Read Hotspot Risk | Write Hotspot Risk | Operational Complexity | Query Performance | Data Cleanup | |---|---|---|---|---|---| @@ -642,7 +645,7 @@ show table employees2 PARTITION (p4) regions; - Best suited for use cases that require stable performance and do not benefit from partition-based data management. -## Converte between partitioned and non-partitioned tables +## Convert between partitioned and non-partitioned tables When working with large tables (for example in this example 120 million rows), transforming between partitioned and non-partitioned schemas is sometimes required for performance tuning or schema design changes. TiDB supports several main approaches for such transformations: