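# Database Index Maintenance Strategy v1.0

> **Document version**: v1.0
> **Created**: 2026-04-07
> **Issue**: P1-009 Index maintenance strategy for high-write tables is undefined

---

## 1. Overview

This document defines the index maintenance strategy for high-write tables, including automation for `REINDEX` and `VACUUM`, to keep database performance stable.
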
### 1.1 High-Write Tables

| Table | Write frequency | Daily growth | Notes |
|------|----------|----------|------|
| supply_usage_records | Very high | ~10 million rows | Core business table |
| supply_idempotency_records | High | ~1 million rows | Idempotency checks |
| audit_events | High | ~5 million rows | Audit log |
| billing_ledger_entries | Medium | ~100,000 rows | Ledger entries |

---

## 2. VACUUM Maintenance Strategy

### 2.1 Autovacuum Configuration

PostgreSQL enables autovacuum by default, but the default scale factors (20% of the table for vacuum, 10% for analyze) trigger too late on large, high-write tables, so these tables need per-table tuning:

```sql
-- Settings for supply_usage_records
ALTER TABLE supply_usage_records SET (
    autovacuum_vacuum_threshold = 50,
    autovacuum_analyze_threshold = 50,
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_analyze_scale_factor = 0.01,
    autovacuum_vacuum_cost_delay = 2,      -- milliseconds
    autovacuum_vacuum_cost_limit = 200
);

-- Settings for supply_idempotency_records
ALTER TABLE supply_idempotency_records SET (
    autovacuum_vacuum_threshold = 100,
    autovacuum_analyze_threshold = 100,
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_analyze_scale_factor = 0.02
);
```

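For reference, PostgreSQL starts an autovacuum on a table once the number of dead tuples exceeds `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × reltuples`; the analyze trigger uses the analogous analyze parameters. The sketch below evaluates that formula for the settings above. The function name `autovacuum_trigger` and the row counts are illustrative assumptions, not measured values from this system.

```bash
#!/usr/bin/env bash
# Sketch: estimate how many dead tuples must accumulate before autovacuum
# fires, using the formula
#   trigger = threshold + scale_factor * reltuples
# The row counts passed below are hypothetical, for illustration only.

autovacuum_trigger() {
    local threshold="$1" scale_factor="$2" reltuples="$3"
    awk -v t="$threshold" -v s="$scale_factor" -v n="$reltuples" \
        'BEGIN { printf "%.0f\n", t + s * n }'
}

# supply_usage_records: threshold 50, scale factor 0.01, assumed 100 million rows
autovacuum_trigger 50 0.01 100000000     # prints 1000050

# supply_idempotency_records: threshold 100, scale factor 0.05, assumed 1 million rows
autovacuum_trigger 100 0.05 1000000      # prints 50100
```

Under these assumed table sizes, autovacuum would start after roughly 1 million dead tuples on the first table and about 50,000 on the second. With the default scale factor of 0.2, the first table would instead wait for about 20 million dead tuples.
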
### 2.2 VACUUM Strategy Matrix

| Table | autovacuum_enabled | vacuum_threshold | vacuum_scale_factor | Analyze frequency |
|------|-------------------|------------------|---------------------|----------|
| supply_usage_records | true | 50 | 0.01 (1%) | Every 1% change |
| supply_idempotency_records | true | 100 | 0.05 (5%) | Every 2% change |
| supply_orders | true | 500 | 0.05 (5%) | Weekly |
| supply_packages | true | 1000 | 0.1 (10%) | Monthly |

### 2.3 Manual VACUUM Schedule

**Daily maintenance** (off-peak window, 02:00-04:00):
```bash
# VACUUM ANALYZE a single high-write table
vacuumdb -h localhost -U postgres -d supply_db \
    --table 'supply_usage_records' \
    --analyze \
    --verbose

# VACUUM several tables in one run (repeat --table for each table;
# --all targets every database and cannot be combined with -d)
vacuumdb -h localhost -U postgres -d supply_db \
    --table 'supply_usage_records' \
    --table 'supply_idempotency_records' \
    --analyze \
    --verbose
```

**Weekly maintenance** (Sunday 03:00-05:00):
```bash
# Full VACUUM + ANALYZE of supply_db.
# Caution: --full rewrites every table under an ACCESS EXCLUSIVE lock,
# blocking reads and writes on each table while it is processed.
vacuumdb -h localhost -U postgres -d supply_db \
    --analyze \
    --full \
    --verbose
```

---

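## 3. REINDEX Maintenance Strategy

### 3.1 REINDEX Triggers

| Trigger | Description | Impact |
|----------|------|------|
| Index bloat > 20% | B-tree index bloat | Degraded performance |
| After bulk deletes | DELETE affects > 30% of all rows | Index contains many empty pages |
| After long uptime | Running > 30 days | Stale index statistics (note: these are refreshed by `ANALYZE`; `REINDEX` addresses bloat) |
| After hardware failure | System restart | Ensures index consistency |
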
### 3.2 Detecting Index Bloat

```sql
-- List indexes larger than 1 MB with their size relative to the table.
-- index_ratio is index size / table size, which is only a rough proxy for
-- bloat; a precise figure needs the pgstattuple extension (pgstatindex).
SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch,
    ROUND(
        (pg_relation_size(indexrelid)::numeric /
         NULLIF(pg_relation_size(relid), 0) * 100),  -- NULLIF avoids division by zero
        2
    ) AS index_ratio
FROM
    pg_stat_user_indexes
WHERE
    pg_relation_size(indexrelid) > 1024 * 1024  -- > 1MB
ORDER BY
    pg_relation_size(indexrelid) DESC;
```

### 3.3 REINDEX Schedule

**Monthly maintenance** (first Sunday of each month, 04:00-06:00):
```bash
# Rebuild a single bloated index
reindexdb -h localhost -U postgres -d supply_db \
    --index 'idx_supply_usage_records_request_id' \
    --verbose

# Rebuild all indexes on one table
reindexdb -h localhost -U postgres -d supply_db \
    --table 'supply_usage_records' \
    --verbose

# Rebuild every index in supply_db (use with caution: writes to each table
# are blocked while its indexes are rebuilt)
reindexdb -h localhost -U postgres -d supply_db \
    --verbose
```

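Standard cron syntax has no direct way to express "first Sunday of the month", which the monthly schedule above calls for. One common workaround, sketched here, is to schedule the job for every Sunday and exit early unless the day of the month is 7 or less. The function name `is_first_sunday` is hypothetical, and the optional date argument relies on GNU `date -d`.

```bash
#!/usr/bin/env bash
# Sketch: guard that succeeds only on the first Sunday of a month.
# Assumes GNU date (for the -d option); is_first_sunday is an illustrative name.

is_first_sunday() {
    local day="${1:-today}"          # optional YYYY-MM-DD, defaults to today
    local dow dom
    dow=$(date -d "$day" +%u)        # ISO weekday: 1 = Monday ... 7 = Sunday
    dom=$(date -d "$day" +%d)        # day of month, zero-padded (01-31)
    # 10# forces base 10 so that "08" and "09" are not parsed as octal
    [ "$dow" -eq 7 ] && [ "$((10#$dom))" -le 7 ]
}

# Example: run the monthly rebuild only when the guard passes.
if is_first_sunday; then
    echo "First Sunday of the month: run the monthly REINDEX"
else
    echo "Not the first Sunday: skip"
fi
```

A cron entry such as `0 4 * * 0` could then invoke a script that begins with `is_first_sunday || exit 0`.
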
### 3.4 Online REINDEX

For critical tables that cannot tolerate downtime, use `REINDEX CONCURRENTLY` (available in PostgreSQL 12 and later):

```bash
# Rebuild the index online; reads and writes on the table continue
reindexdb -h localhost -U postgres -d supply_db \
    --index 'idx_supply_usage_records_request_id' \
    --concurrently \
    --verbose
```

---

## 4. Automation Scripts

### 4.1 Daily Maintenance Script (daily_vacuum.sh)

```bash
#!/bin/bash
# daily_vacuum.sh - daily VACUUM maintenance
# Schedule: daily at 02:00

# pipefail makes a vacuumdb failure abort the script even though
# its output is piped through tee
set -euo pipefail

DB_HOST="localhost"
DB_PORT="5432"
DB_NAME="supply_db"
DB_USER="postgres"
LOG_FILE="/var/log/postgresql/daily_vacuum_$(date +%Y%m%d).log"

echo "=== Starting daily VACUUM maintenance: $(date) ===" | tee -a "$LOG_FILE"

# VACUUM the high-write tables first
TABLES=(
    "supply_usage_records"
    "supply_idempotency_records"
    "supply_orders"
    "supply_earnings"
)

for TABLE in "${TABLES[@]}"; do
    echo "VACUUM $TABLE ..." | tee -a "$LOG_FILE"
    vacuumdb -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -d "$DB_NAME" \
        --table "$TABLE" \
        --analyze \
        --verbose 2>&1 | tee -a "$LOG_FILE"
done

echo "=== VACUUM maintenance finished: $(date) ===" | tee -a "$LOG_FILE"
```

### 4.2 Weekly Maintenance Script (weekly_reindex.sh)

```bash
#!/bin/bash
# weekly_reindex.sh - weekly REINDEX maintenance
# Schedule: every Sunday at 03:00

set -euo pipefail

DB_HOST="localhost"
DB_PORT="5432"
DB_NAME="supply_db"
DB_USER="postgres"
LOG_FILE="/var/log/postgresql/weekly_reindex_$(date +%Y%m%d).log"

echo "=== Starting weekly REINDEX maintenance: $(date) ===" | tee -a "$LOG_FILE"

# Select indexes to rebuild: larger than 10 MB, in schema public, never scanned.
# Note: idx_scan = 0 identifies unused indexes, not bloated ones; substitute a
# real bloat estimate (e.g. from pgstattuple) if one is available.
REINDEX_CANDIDATES=$(psql -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -d "$DB_NAME" -At -c "
    SELECT indexrelname FROM pg_stat_user_indexes
    WHERE pg_relation_size(indexrelid) > 10 * 1024 * 1024
    AND idx_scan = 0
    AND schemaname = 'public';
")

for INDEX in $REINDEX_CANDIDATES; do
    echo "REINDEX INDEX $INDEX ..." | tee -a "$LOG_FILE"
    reindexdb -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -d "$DB_NAME" \
        --index "$INDEX" \
        --concurrently \
        --verbose 2>&1 | tee -a "$LOG_FILE"
done

echo "=== REINDEX maintenance finished: $(date) ===" | tee -a "$LOG_FILE"
```

### 4.3 Cron Configuration

```bash
# /etc/cron.d/postgresql_maintenance
# Run VACUUM daily at 02:00
0 2 * * * postgres /home/postgres/scripts/daily_vacuum.sh

# Run REINDEX every Sunday at 03:00
0 3 * * 0 postgres /home/postgres/scripts/weekly_reindex.sh
```

---

## 5. Monitoring

### 5.1 Key Metrics

| Metric | Alert threshold | Notes |
|------|----------|------|
| Index bloat | > 20% | Triggers REINDEX |
| dead_tuples | > 10000 | Triggers VACUUM |
| last_autovacuum | > 24h ago | Autovacuum may be malfunctioning |
| idx_scan | = 0 | Index is unused; consider dropping it |

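The thresholds above can be turned into a simple check. The sketch below shows one hypothetical way to do that: `check_thresholds` is an illustrative name, and it takes already-collected values (bloat percentage, dead tuple count, hours since the last autovacuum, and `idx_scan`) as arguments instead of querying the database.

```bash
#!/usr/bin/env bash
# Sketch: compare collected metrics against the alert thresholds in 5.1.
# check_thresholds is an illustrative helper; inputs are assumed to be
# non-negative integers gathered elsewhere (e.g. from monitoring queries).

check_thresholds() {
    local bloat_pct="$1" dead_tuples="$2" hours_since_autovacuum="$3" idx_scan="$4"
    local alerts=()

    [ "$bloat_pct" -gt 20 ]              && alerts+=("REINDEX")
    [ "$dead_tuples" -gt 10000 ]         && alerts+=("VACUUM")
    [ "$hours_since_autovacuum" -gt 24 ] && alerts+=("CHECK_AUTOVACUUM")
    [ "$idx_scan" -eq 0 ]                && alerts+=("UNUSED_INDEX")

    if [ "${#alerts[@]}" -eq 0 ]; then
        echo "OK"
    else
        # Join the alert names with commas
        local IFS=,
        echo "${alerts[*]}"
    fi
}

check_thresholds 25 500 3 120      # prints: REINDEX
check_thresholds 5 20000 30 0      # prints: VACUUM,CHECK_AUTOVACUUM,UNUSED_INDEX
check_thresholds 5 500 3 120       # prints: OK
```

Keeping the comparison separate from data collection makes the thresholds easy to adjust and to exercise without a live database.
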
### 5.2 Monitoring Queries

```sql
-- Tables that need maintenance
SELECT
    schemaname,
    relname AS table_name,
    n_dead_tup,
    n_live_tup,
    last_autovacuum,
    last_autoanalyze
FROM
    pg_stat_user_tables
WHERE
    n_dead_tup > 1000
ORDER BY
    n_dead_tup DESC;

-- Unused indexes (primary keys excluded)
SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan
FROM
    pg_stat_user_indexes
WHERE
    idx_scan = 0
    AND indexrelname NOT LIKE '%_pkey'
ORDER BY
    pg_relation_size(indexrelid) DESC;
```

---

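## 6. Best Practices

1. **Avoid maintenance during peak hours**: schedule maintenance in the off-peak window (02:00-06:00)
2. **Prefer autovacuum**: tune autovacuum parameters sensibly to reduce manual intervention
3. **Monitor index bloat**: check the bloat ratio regularly and rebuild promptly
4. **Use CONCURRENTLY**: run `REINDEX CONCURRENTLY` on critical tables to avoid blocking reads and writes
5. **Keep maintenance logs**: record every maintenance run to help diagnose problems
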
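As a sketch of the first practice, a maintenance script can refuse to run outside the off-peak window. `in_maintenance_window` is a hypothetical helper; it assumes the 02:00-06:00 window from the list above and takes an optional `HH:MM` argument (defaulting to the current local time) so it can be exercised without waiting for the window. A script could begin with `in_maintenance_window || exit 0`.

```bash
#!/usr/bin/env bash
# Sketch: succeed only between 02:00 (inclusive) and 06:00 (exclusive).
# in_maintenance_window is an illustrative name, not part of any tool.

in_maintenance_window() {
    local now="${1:-$(date +%H:%M)}"   # HH:MM, defaults to current local time
    local hour="${now%%:*}"
    hour=$((10#$hour))                 # force base 10 so "08"/"09" are not octal
    [ "$hour" -ge 2 ] && [ "$hour" -lt 6 ]
}

for t in 01:59 02:00 05:59 06:00; do
    if in_maintenance_window "$t"; then
        echo "$t inside window"
    else
        echo "$t outside window"
    fi
done
```
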
---

## 7. Estimated Maintenance Duration

| Operation | Table size | Estimated duration | Lock type |
|------|--------|----------|--------|
| VACUUM ANALYZE | 10GB | 5-10min | SHARE UPDATE EXCLUSIVE (does not block reads or writes) |
| REINDEX | 1GB | 1-2min | Blocks writes to the table* |
| REINDEX CONCURRENTLY | 1GB | 3-5min | SHARE UPDATE EXCLUSIVE (does not block reads or writes) |
| VACUUM FULL | 10GB | 15-30min | ACCESS EXCLUSIVE (blocks reads and writes) |

*Use `REINDEX CONCURRENTLY` to avoid blocking writes.

---

> **Change log**:
> - v1.0 (2026-04-07): Initial version