# PostgreSQL 表分区

## 场景

公司的pgsql有张大表数据量日益见长, 数据量也不断增加，查询效率越来越慢。

![大表数据量](https://img.pelycloud.com/pelyblog/typora/202302281712269.png)

**对此记录下优化方法, 目前采用的方法是**`表分区`

> **ps: 数据库版本 postgres:13**

## 介绍

* **主表/父表/Master Table　该表是创建子表的模板。它是一个正常的普通表，但正常情况下它并不储存任何数据。**
* **子表/分区表/Child Table/Partition Table　这些表继承并属于一个主表。子表中存储所有的数据。主表与分区表属于一对多的关系，也就是说，一个主表包含多个分区表，而一个分区表只从属于一个主表**

## 表分区的优势

* **在特定场景下，查询性能极大提高，尤其是当大部分经常访问的数据记录在一个或少数几个分区表上时。表分区减小了索引的大小，并使得常访问的分区表的索引更容易保存于内存中。**
* **当查询或者更新访问一个或少数几个分区表中的大部分数据时，可以通过顺序扫描该分区表而非使用大表索引来提高性能。**
* **可通过添加或移除分区表来高效的批量增删数据。如可使用ALTER TABLE NO INHERIT可将特定分区从主逻辑表中移除（该表依然存在，并可单独使用，只是与主表不再有继承关系并无法再通过主表访问该分区表），或使用DROP TABLE直接将该分区表删除。这两种方式完全避免了使用DELETE时所需的VACUUM额外代价。**
* **很少使用的数据可被迁移到便宜些的慢些的存储介质中**

## 表分区

**PostgreSQL10 以后的表分区有三种**

### 1. Range范围分区

**假设目前有一个需求:**

```
需要整理商品金额, 计算哪些金额区间的商品种类的数量最多, 需要快速的查出来.
例: 
商品金额 小于 100 的数量
    商品金额 大于 101 并小于 1000 的数量
    商品金额 大于 1001 并小于 10000 的数量
    商品金额 大于 10001 的数量
```

**先创建一张商品表，然后我们根据金额分段进行分区，sql如下：**

```
CREATE TABLE tbl_orders
(
    name varchar not null,
    price numeric(8,2) not null

) PARTITION BY RANGE (price);
```

**这个语句已经指定了按**`price`字段来分区了，接着创建分区表：

```
create table tbl_orders_rang1 partition of tbl_orders for values from (MINVALUE) to (100);
create table tbl_orders_rang2 partition of tbl_orders for values from (101) to (1000);
create table tbl_orders_rang3 partition of tbl_orders for values from (1001) to (10000);
create table tbl_orders_rang4 partition of tbl_orders for values from (10001) to (MAXVALUE);
```

**执行完成后我们就能看到有四个分区:**

![`商品分区`](https://img.pelycloud.com/pelyblog/typora/202303010927681.png)

**这里创建了四张分区表，分别对应金额是0到100、101到1000、1001到10000、10001以上。**

**接着我们插入一些数据：**

```
insert into tbl_orders(price, name) VALUES (88.8, '商品A');
insert into tbl_orders(price, name) VALUES (12, '商品B');
insert into tbl_orders(price, name) VALUES (2023, '商品C');
insert into tbl_orders(price, name) VALUES (88888, '商品D');
insert into tbl_orders(price, name) VALUES (115, '商品E');
insert into tbl_orders(price, name) VALUES (333.33, '商品F');
```

**可以看到这里的表名还是**`tbl_orders`，而不是具体的分区表，`说明对于客户端是无感知的`。

**我们查询也一样的：**

```
SELECT * from tbl_orders
```

![`查询全部数据`](https://img.pelycloud.com/pelyblog/typora/202303010933913.png)

**当然我们也可以直接查询表分区, 而且分区表与主表的字段是一致的。**

**查询分区表，就只能查到那个特定分区的数据了:**

```
SELECT * from tbl_orders_rang1;
```

![查询分区1](https://img.pelycloud.com/pelyblog/typora/202303010935233.png)

### 2. List列表分区

**类似的，列表分区是按特定的值来分区，比较**`某个城市的用户数据`放在一个分区里。这里不再给出每一步的讲解，代码如下：

```
drop table if exists tbl_user_list;

-- 创建主表
create table tbl_user_list
(
    name varchar not null,
    age  int     not null,
    city varchar not null
) partition by list (city);

-- 创建分区表
CREATE TABLE tbl_user_list1 PARTITION OF tbl_user_list FOR VALUES IN ('GZ');
CREATE TABLE tbl_user_list2 PARTITION OF tbl_user_list FOR VALUES IN ('BJ');
CREATE TABLE tbl_user_list3 PARTITION OF tbl_user_list DEFAULT;

-- 插入测试数据
insert into tbl_user_list(name, age, city)
VALUES ('小敏', 21, 'GZ');
insert into tbl_user_list(name, age, city)
VALUES ('小王', 18, 'SZ');
insert into tbl_user_list(name, age, city)
VALUES ('小明', 8, 'SZ');
insert into tbl_user_list(name, age, city)
VALUES ('老王', 55, 'BJ');
insert into tbl_user_list(name, age, city)
VALUES ('晓峰', 30, 'SH');
insert into tbl_user_list(name, age, city)
VALUES ('老闵', 66, 'HK');
insert into tbl_user_list(name, age, city)
VALUES ('小三', 14, 'GZ');
```

**当我们查询第一个分区的时候，只有广州的数据：**

```
select * from tbl_user_list1;
```

![查询分区1](https://img.pelycloud.com/pelyblog/typora/202303010945819.png)

### 3. Hash哈希分区

**哈希分区是指按字段取哈希值后再分区。具体的语句如下：**

```
-- 创建主表
create table tbl_user_list
(
    name varchar not null,
    age  int     not null,
    city varchar not null
) partition by hash (city);

-- 创建分区表
create table tbl_user_list1 partition of tbl_user_list for values with (modulus 4, remainder 0);
create table tbl_user_list2 partition of tbl_user_list for values with (modulus 4, remainder 1);
create table tbl_user_list3 partition of tbl_user_list for values with (modulus 4, remainder 2);
create table tbl_user_list4 partition of tbl_user_list for values with (modulus 4, remainder 3);

**以看到创建分区表的时候，我们用了取模的方式，所以如果要创建N个分区表，就要取N取模。**

**当一个city=xxx 数据插入到数据库时，首先会计算该字符串的hashcode(id)得到一个整数。**
**随后我们队这个整数求MODULUS =20的余数，如果余数是0就进入第一个分区。以下类推，余数是19就进入第20个分区**

**随便查询一张分区表如下：**

```
select * from tbl_user_list4;
```

![查询分区4](https://img.pelycloud.com/pelyblog/typora/202303010949417.png)

**可以看到同是SZ的哈希值是一样的，肯定会分在同一个分区，而BJ的哈希值取模后也属于同一个分区。**

```
-- 验证pkslow_person_h4表分区的city值于就全部是于3
SELECT hashtext(city)%4  from tbl_user_list4;
```

![验证](https://img.pelycloud.com/pelyblog/typora/202303010952891.png)

PostgreSQL 表分区

场景

公司的pgsql有张大表数据量日益见长, 数据量也不断增加，查询效率越来越慢。

大表数据量

对此记录下优化方法, 目前采用的方法是表分区

ps: 数据库版本 postgres:13

介绍

主表/父表/Master Table　该表是创建子表的模板。它是一个正常的普通表，但正常情况下它并不储存任何数据。
子表/分区表/Child Table/Partition Table　这些表继承并属于一个主表。子表中存储所有的数据。主表与分区表属于一对多的关系，也就是说，一个主表包含多个分区表，而一个分区表只从属于一个主表

表分区的优势

在特定场景下，查询性能极大提高，尤其是当大部分经常访问的数据记录在一个或少数几个分区表上时。表分区减小了索引的大小，并使得常访问的分区表的索引更容易保存于内存中。
当查询或者更新访问一个或少数几个分区表中的大部分数据时，可以通过顺序扫描该分区表而非使用大表索引来提高性能。
可通过添加或移除分区表来高效的批量增删数据。如可使用ALTER TABLE NO INHERIT可将特定分区从主逻辑表中移除（该表依然存在，并可单独使用，只是与主表不再有继承关系并无法再通过主表访问该分区表），或使用DROP TABLE直接将该分区表删除。这两种方式完全避免了使用DELETE时所需的VACUUM额外代价。
很少使用的数据可被迁移到便宜些的慢些的存储介质中

表分区

PostgreSQL10 以后的表分区有三种

1. Range范围分区

假设目前有一个需求:

需要整理商品金额, 计算哪些金额区间的商品种类的数量最多, 需要快速的查出来.
例: 
商品金额 小于 100 的数量
    商品金额 大于 101 并小于 1000 的数量
    商品金额 大于 1001 并小于 10000 的数量
    商品金额 大于 10001 的数量

先创建一张商品表，然后我们根据金额分段进行分区，sql如下：

CREATE TABLE tbl_orders
(
    name varchar not null,
    price numeric(8,2) not null

) PARTITION BY RANGE (price);

这个语句已经指定了按price字段来分区了，接着创建分区表：

create table tbl_orders_rang1 partition of tbl_orders for values from (MINVALUE) to (100);
create table tbl_orders_rang2 partition of tbl_orders for values from (101) to (1000);
create table tbl_orders_rang3 partition of tbl_orders for values from (1001) to (10000);
create table tbl_orders_rang4 partition of tbl_orders for values from (10001) to (MAXVALUE);

执行完成后我们就能看到有四个分区:

商品分区

这里创建了四张分区表，分别对应金额是0到100、101到1000、1001到10000、10001以上。

接着我们插入一些数据：

insert into tbl_orders(price, name) VALUES (88.8, '商品A');
insert into tbl_orders(price, name) VALUES (12, '商品B');
insert into tbl_orders(price, name) VALUES (2023, '商品C');
insert into tbl_orders(price, name) VALUES (88888, '商品D');
insert into tbl_orders(price, name) VALUES (115, '商品E');
insert into tbl_orders(price, name) VALUES (333.33, '商品F');

可以看到这里的表名还是tbl_orders，而不是具体的分区表，说明对于客户端是无感知的。

我们查询也一样的：

SELECT * from tbl_orders

查询全部数据

当然我们也可以直接查询表分区, 而且分区表与主表的字段是一致的。

查询分区表，就只能查到那个特定分区的数据了:

SELECT * from tbl_orders_rang1;

查询分区1

2. List列表分区

类似的，列表分区是按特定的值来分区，比较某个城市的用户数据放在一个分区里。这里不再给出每一步的讲解，代码如下：

drop table if exists tbl_user_list;

-- 创建主表
create table tbl_user_list
(
    name varchar not null,
    age  int     not null,
    city varchar not null
) partition by list (city);

-- 创建分区表
CREATE TABLE tbl_user_list1 PARTITION OF tbl_user_list FOR VALUES IN ('GZ');
CREATE TABLE tbl_user_list2 PARTITION OF tbl_user_list FOR VALUES IN ('BJ');
CREATE TABLE tbl_user_list3 PARTITION OF tbl_user_list DEFAULT;

-- 插入测试数据
insert into tbl_user_list(name, age, city)
VALUES ('小敏', 21, 'GZ');
insert into tbl_user_list(name, age, city)
VALUES ('小王', 18, 'SZ');
insert into tbl_user_list(name, age, city)
VALUES ('小明', 8, 'SZ');
insert into tbl_user_list(name, age, city)
VALUES ('老王', 55, 'BJ');
insert into tbl_user_list(name, age, city)
VALUES ('晓峰', 30, 'SH');
insert into tbl_user_list(name, age, city)
VALUES ('老闵', 66, 'HK');
insert into tbl_user_list(name, age, city)
VALUES ('小三', 14, 'GZ');

当我们查询第一个分区的时候，只有广州的数据：

select * from tbl_user_list1;

查询分区1

3. Hash哈希分区

哈希分区是指按字段取哈希值后再分区。具体的语句如下：

-- 创建主表
create table tbl_user_list
(
    name varchar not null,
    age  int     not null,
    city varchar not null
) partition by hash (city);

-- 创建分区表
create table tbl_user_list1 partition of tbl_user_list for values with (modulus 4, remainder 0);
create table tbl_user_list2 partition of tbl_user_list for values with (modulus 4, remainder 1);
create table tbl_user_list3 partition of tbl_user_list for values with (modulus 4, remainder 2);
create table tbl_user_list4 partition of tbl_user_list for values with (modulus 4, remainder 3);

-- 插入测试数据
insert into tbl_user_list(name, age, city)
VALUES ('小敏', 21, 'GZ');
insert into tbl_user_list(name, age, city)
VALUES ('小王', 18, 'SZ');
insert into tbl_user_list(name, age, city)
VALUES ('小明', 8, 'SZ');
insert into tbl_user_list(name, age, city)
VALUES ('老王', 55, 'BJ');
insert into tbl_user_list(name, age, city)
VALUES ('晓峰', 30, 'SH');
insert into tbl_user_list(name, age, city)
VALUES ('老闵', 66, 'HK');
insert into tbl_user_list(name, age, city)
VALUES ('小三', 14, 'GZ');

以看到创建分区表的时候，我们用了取模的方式，所以如果要创建N个分区表，就要取N取模。

当一个city=xxx 数据插入到数据库时，首先会计算该字符串的hashcode(id)得到一个整数。
随后我们队这个整数求MODULUS =20的余数，如果余数是0就进入第一个分区。以下类推，余数是19就进入第20个分区

随便查询一张分区表如下：

select * from tbl_user_list4;

查询分区4

可以看到同是SZ的哈希值是一样的，肯定会分在同一个分区，而BJ的哈希值取模后也属于同一个分区。

-- 验证pkslow_person_h4表分区的city值于就全部是于3
SELECT hashtext(city)%4  from tbl_user_list4;

验证

最后修改：2023 年 03 月 01 日

如果觉得我的文章对你有用，请点个赞吧!

PostgreSQL 表分区

PostgreSQL 表分区

场景

介绍

表分区的优势

表分区

1. Range范围分区

2. List列表分区

3. Hash哈希分区

外网访问家庭网络小记 (DDNS)

推荐几款 Intellij IDEA 插件

Centos7/8 安装部署 Alist 私有云

模板方法模式

MyBatis 插件的使用及原理

Alibaba Sentinel规则持久化 - 拉模式【基于文件】

单例模式

Navicat 16 安装及破解

线程池

K8S 资源 - Label 和 Selector

PostgreSQL 表分区