# Database Index Maintenance Strategy v1.0
> **Document version**: v1.0
> **Created**: 2026-04-07
> **Issue**: P1-009 No index maintenance strategy is defined for high-write tables
---
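## 1. Overview
This document defines the index maintenance strategy for high-write tables, including automated `REINDEX` and `VACUUM` procedures, to keep database performance stable.
### 1.1 High-write tables
| Table | Write frequency | Daily growth | Notes |
|------|----------|----------|------|
| supply_usage_records | Very high | ~10 million rows | Core business table |
| supply_idempotency_records | High | ~1 million rows | Idempotency checks |
| audit_events | High | ~5 million rows | Audit log |
| billing_ledger_entries | Medium | ~100,000 rows | Ledger entries |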
---
## 2. VACUUM maintenance strategy
### 2.1 Autovacuum configuration
PostgreSQL enables autovacuum by default, but it needs tuning for high-write tables:
```sql
-- Settings for supply_usage_records
ALTER TABLE supply_usage_records SET (
    autovacuum_vacuum_threshold = 50,
    autovacuum_analyze_threshold = 50,
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_analyze_scale_factor = 0.01,
    autovacuum_vacuum_cost_delay = 2,    -- milliseconds
    autovacuum_vacuum_cost_limit = 200
);
-- Settings for supply_idempotency_records
ALTER TABLE supply_idempotency_records SET (
    autovacuum_vacuum_threshold = 100,
    autovacuum_analyze_threshold = 100,
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_analyze_scale_factor = 0.02
);
```
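To make these settings concrete: autovacuum vacuums a table once its dead tuples exceed `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples`. The sketch below evaluates that formula in the shell. The helper name `autovacuum_trigger` and the table row counts are assumptions for illustration, not measurements from this system.

```bash
#!/bin/bash
# Sketch: estimate the dead-tuple count at which autovacuum starts a VACUUM.
# Formula: threshold + scale_factor * reltuples.
# autovacuum_trigger is a hypothetical helper; row counts are examples only.
autovacuum_trigger() {
  local threshold="$1" scale_factor="$2" reltuples="$3"
  awk -v t="$threshold" -v s="$scale_factor" -v n="$reltuples" \
    'BEGIN { printf "%.0f\n", t + s * n }'
}

# supply_usage_records settings (50, 0.01) on an assumed 1,000,000,000-row table
autovacuum_trigger 50 0.01 1000000000
# supply_idempotency_records settings (100, 0.05) on an assumed 100,000,000-row table
autovacuum_trigger 100 0.05 100000000
```

For comparison, with PostgreSQL's default scale factor of 0.2 the first table would accumulate roughly 200 million dead tuples before autovacuum ran, which is why the per-table override matters for large, fast-growing tables.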
### 2.2 VACUUM strategy matrix
| Table | autovacuum_enabled | vacuum_threshold | vacuum_scale_factor | Analyze frequency |
|------|-------------------|------------------|---------------------|----------|
| supply_usage_records | true | 50 | 0.01 (1%) | Every 1% change |
| supply_idempotency_records | true | 100 | 0.05 (5%) | Every 2% change |
| supply_orders | true | 500 | 0.05 (5%) | Weekly |
| supply_packages | true | 1000 | 0.1 (10%) | Monthly |
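### 2.3 Manual VACUUM schedule
**Daily maintenance** (off-peak, 02:00-04:00):
```bash
# VACUUM ANALYZE a high-write table
vacuumdb -h localhost -U postgres -d supply_db \
  --table 'supply_usage_records' \
  --analyze \
  --verbose
# VACUUM ANALYZE every table in supply_db
# (--all targets every database and cannot be combined with -d)
vacuumdb -h localhost -U postgres -d supply_db \
  --analyze \
  --verbose
```
**Weekly maintenance** (Sunday, 03:00-05:00):
```bash
# Full VACUUM + ANALYZE of supply_db
# Caution: --full rewrites each table under an ACCESS EXCLUSIVE lock,
# blocking reads and writes on that table until it finishes
vacuumdb -h localhost -U postgres -d supply_db \
  --analyze \
  --full \
  --verbose
```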
---
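## 3. REINDEX maintenance strategy
### 3.1 REINDEX triggers
| Trigger | Description | Impact |
|----------|------|------|
| Index bloat > 20% | B-tree index bloat | Degraded performance |
| After mass deletes | DELETE > 30% of total rows | Index holds many empty pages |
| After long uptime | Running > 30 days | Bloat accumulates gradually (stale statistics are refreshed by `ANALYZE`, not `REINDEX`) |
| After hardware failure | System restart | Ensures index consistency |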
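The numeric triggers in this table can be sketched as a small shell check. `needs_reindex` is a hypothetical helper, and the caller is assumed to supply the three measurements (bloat percentage, percentage of rows deleted, days since the last rebuild) as integers; how they are collected is outside this sketch.

```bash
#!/bin/bash
# Sketch: report which REINDEX trigger (if any) an index meets.
# needs_reindex is a hypothetical helper; inputs are integer measurements
# gathered elsewhere: bloat %, deleted-row %, days since last rebuild.
needs_reindex() {
  local bloat_pct="$1" deleted_pct="$2" days_since_rebuild="$3"
  if (( bloat_pct > 20 )); then
    echo "bloat"; return 0
  fi
  if (( deleted_pct > 30 )); then
    echo "mass-delete"; return 0
  fi
  if (( days_since_rebuild > 30 )); then
    echo "age"; return 0
  fi
  echo "none"; return 1
}

needs_reindex 25 0 3 || true   # bloat above the 20% threshold
needs_reindex 5 10 7 || true   # no trigger met
```

The function returns a non-zero status when no trigger is met, so it can gate a rebuild step directly in an `if` statement.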
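### 3.2 Index bloat detection
```sql
-- Approximate bloat check: index size relative to its table.
-- The ratio is only a proxy; an unusually high or growing value suggests bloat.
SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch,
    ROUND(
        pg_relation_size(indexrelid)::numeric
            / NULLIF(pg_relation_size(relid), 0) * 100,  -- avoid division by zero
        2
    ) AS index_ratio
FROM
    pg_stat_user_indexes
WHERE
    pg_relation_size(indexrelid) > 1024 * 1024  -- > 1MB
ORDER BY
    pg_relation_size(indexrelid) DESC;
```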
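### 3.3 REINDEX schedule
**Monthly maintenance** (first Sunday of each month, 04:00-06:00):
```bash
# Rebuild a single bloated index
reindexdb -h localhost -U postgres -d supply_db \
  --index 'idx_supply_usage_records_request_id' \
  --verbose
# Rebuild all indexes on one table
reindexdb -h localhost -U postgres -d supply_db \
  --table 'supply_usage_records' \
  --verbose
# Rebuild every index in supply_db (use with caution: blocks writes to each table)
# (--all targets every database and cannot be combined with -d)
reindexdb -h localhost -U postgres -d supply_db \
  --verbose
```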
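### 3.4 Online REINDEX
For critical tables that cannot take downtime, use `REINDEX CONCURRENTLY` (PostgreSQL 12 or later):
```bash
# Rebuild the index online (does not block reads or writes on the table)
reindexdb -h localhost -U postgres -d supply_db \
  --index 'idx_supply_usage_records_request_id' \
  --concurrently \
  --verbose
```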
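One caveat with the concurrent rebuild: if it fails partway (for example on a uniqueness violation), PostgreSQL leaves an invalid transient index behind, named with a `ccnew` suffix, which should be dropped before retrying. The sketch below shows one way to look for such leftovers. `find_invalid_indexes` and `is_transient_index` are hypothetical helper names, the connection settings mirror the examples in this document, and matching optional trailing digits after the suffix is an assumption.

```bash
#!/bin/bash
# Sketch: find invalid indexes left behind by a failed REINDEX CONCURRENTLY.
# find_invalid_indexes and is_transient_index are hypothetical helpers.
INVALID_INDEX_SQL="
SELECT c.relname
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE NOT i.indisvalid;"

# Lists invalid index names, one per line (requires database access)
find_invalid_indexes() {
  psql -h localhost -U postgres -d supply_db -At -c "$INVALID_INDEX_SQL"
}

# True when the name looks like a transient index from a concurrent rebuild
# (assumption: the ccnew suffix may be followed by digits)
is_transient_index() {
  [[ "$1" =~ _ccnew[0-9]*$ ]]
}

# Usage (not executed here):
#   for idx in $(find_invalid_indexes); do
#     if is_transient_index "$idx"; then
#       echo "DROP INDEX CONCURRENTLY \"$idx\";"
#     fi
#   done
```

The usage loop only prints the `DROP INDEX` statements rather than running them, so an operator can review the list first.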
---
## 4. Automation scripts
### 4.1 Daily maintenance script (daily_vacuum.sh)
```bash
#!/bin/bash
# daily_vacuum.sh - daily VACUUM maintenance
# Schedule: every day at 02:00
# pipefail makes a vacuumdb failure abort the script even though output is piped to tee
set -euo pipefail
DB_HOST="localhost"
DB_PORT="5432"
DB_NAME="supply_db"
DB_USER="postgres"
LOG_FILE="/var/log/postgresql/daily_vacuum_$(date +%Y%m%d).log"
echo "=== Starting daily VACUUM maintenance: $(date) ===" | tee -a "$LOG_FILE"
# Vacuum high-write tables first
TABLES=(
  "supply_usage_records"
  "supply_idempotency_records"
  "supply_orders"
  "supply_earnings"
)
for TABLE in "${TABLES[@]}"; do
  echo "VACUUM $TABLE ..." | tee -a "$LOG_FILE"
  vacuumdb -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -d "$DB_NAME" \
    --table "$TABLE" \
    --analyze \
    --verbose 2>&1 | tee -a "$LOG_FILE"
done
echo "=== VACUUM maintenance finished: $(date) ===" | tee -a "$LOG_FILE"
```
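### 4.2 Weekly maintenance script (weekly_reindex.sh)
```bash
#!/bin/bash
# weekly_reindex.sh - weekly REINDEX maintenance
# Schedule: every Sunday at 03:00
# pipefail makes a reindexdb failure abort the script even though output is piped to tee
set -euo pipefail
DB_HOST="localhost"
DB_PORT="5432"
DB_NAME="supply_db"
DB_USER="postgres"
LOG_FILE="/var/log/postgresql/weekly_reindex_$(date +%Y%m%d).log"
echo "=== Starting weekly REINDEX maintenance: $(date) ===" | tee -a "$LOG_FILE"
# Select rebuild candidates: indexes over 10MB in the public schema that are in use.
# Size is only a rough proxy for bloat. Never-scanned indexes (idx_scan = 0) are
# skipped because they are candidates for removal (see section 5).
# Note: shell variable names must be ASCII, and the view column is indexrelname.
BLOATED_INDEXES=$(psql -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -d "$DB_NAME" -At -c "
  SELECT indexrelname FROM pg_stat_user_indexes
  WHERE pg_relation_size(indexrelid) > 10 * 1024 * 1024
    AND idx_scan > 0
    AND schemaname = 'public';
")
for INDEX in $BLOATED_INDEXES; do
  echo "REINDEX INDEX $INDEX ..." | tee -a "$LOG_FILE"
  reindexdb -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -d "$DB_NAME" \
    --index "$INDEX" \
    --concurrently \
    --verbose 2>&1 | tee -a "$LOG_FILE"
done
echo "=== REINDEX maintenance finished: $(date) ===" | tee -a "$LOG_FILE"
```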
### 4.3 Cron configuration
```bash
# /etc/cron.d/postgresql_maintenance
# Run vacuum every day at 02:00
0 2 * * * postgres /home/postgres/scripts/daily_vacuum.sh
# Run reindex every Sunday at 03:00
0 3 * * 0 postgres /home/postgres/scripts/weekly_reindex.sh
```
---
## 5. Monitoring
### 5.1 Key metrics
| Metric | Alert threshold | Notes |
|------|----------|------|
| Index bloat ratio | > 20% | Triggers REINDEX |
| dead_tuples | > 10000 | Triggers VACUUM |
| last_autovacuum | > 24h | autovacuum may be malfunctioning |
| idx_scan | = 0 | Index is unused; consider dropping it |
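As a rough illustration of how two of these thresholds might be evaluated in a script, the sketch below checks the dead-tuple count and the age of the last autovacuum. `check_table_health` is a hypothetical helper; it assumes the caller passes `n_dead_tup` and the `last_autovacuum` time as a Unix epoch (for example via `EXTRACT(EPOCH FROM last_autovacuum)`).

```bash
#!/bin/bash
# Sketch: evaluate the dead_tuples and last_autovacuum alert thresholds.
# check_table_health is a hypothetical helper. Arguments:
#   $1 = n_dead_tup, $2 = last_autovacuum as Unix epoch seconds,
#   $3 = current time as Unix epoch seconds (defaults to now)
check_table_health() {
  local dead_tuples="$1" last_autovacuum_epoch="$2" now_epoch="${3:-$(date +%s)}"
  local alerts=""
  if (( dead_tuples > 10000 )); then
    alerts="vacuum-needed"
  fi
  if (( now_epoch - last_autovacuum_epoch > 24 * 3600 )); then
    alerts="${alerts:+$alerts,}autovacuum-stale"
  fi
  echo "${alerts:-ok}"
}

# Example: 20,000 dead tuples, last autovacuum two days before "now"
check_table_health 20000 1000000 1172800
```

Passing the current time as an optional third argument keeps the check deterministic, which makes it easier to test than calling `date` inside the comparison.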
### 5.2 Monitoring queries
```sql
-- Tables that need maintenance
SELECT
    schemaname,
    relname AS table_name,
    n_dead_tup,
    n_live_tup,
    last_autovacuum,
    last_autoanalyze
FROM
    pg_stat_user_tables
WHERE
    n_dead_tup > 1000
ORDER BY
    n_dead_tup DESC;
-- Unused indexes (pg_stat_user_indexes exposes relname and indexrelname)
SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan
FROM
    pg_stat_user_indexes
WHERE
    idx_scan = 0
    AND indexrelname NOT LIKE '%_pkey'
ORDER BY
    pg_relation_size(indexrelid) DESC;
```
---
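## 6. Best practices
1. **Avoid maintenance during peak hours**: schedule maintenance in the off-peak window (02:00-06:00)
2. **Prefer autovacuum**: tune autovacuum parameters sensibly to reduce manual intervention
3. **Monitor index bloat**: check the bloat ratio regularly and rebuild promptly
4. **Use CONCURRENTLY**: use `REINDEX CONCURRENTLY` on critical tables to avoid blocking writes
5. **Keep maintenance logs**: record every maintenance run to help diagnose problems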
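The first practice can be enforced in scripts with a small guard. The sketch below is one possible form; `in_maintenance_window` is a hypothetical helper, and the window is taken as 02:00 inclusive to 06:00 exclusive in the server's local time.

```bash
#!/bin/bash
# Sketch: refuse to run maintenance outside the 02:00-06:00 off-peak window.
# in_maintenance_window is a hypothetical helper; it takes an optional hour
# (00-23) and defaults to the current local hour.
in_maintenance_window() {
  local hour="${1:-$(date +%H)}"
  hour=$((10#$hour))   # force base 10 so "08" and "09" are not read as octal
  (( hour >= 2 && hour < 6 ))
}

if in_maintenance_window; then
  echo "Inside maintenance window, proceeding"
else
  echo "Outside maintenance window, skipping"
fi
```

Placing such a guard at the top of a maintenance script protects against a manual run during business hours; cron-triggered runs at 02:00 or 03:00 pass it unchanged.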
---
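## 7. Time estimates
| Operation | Table size | Estimated duration | Lock |
|------|--------|----------|--------|
| VACUUM ANALYZE | 10GB | 5-10min | Lightweight (`SHARE UPDATE EXCLUSIVE`; reads and writes continue) |
| REINDEX | 1GB | 1-2min | Blocks writes to the table* |
| REINDEX CONCURRENTLY | 1GB | 3-5min | Non-blocking (`SHARE UPDATE EXCLUSIVE`) |
| VACUUM FULL | 10GB | 15-30min | Table lock (`ACCESS EXCLUSIVE`) |
*Use `REINDEX CONCURRENTLY` to avoid blocking writes.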
---
> **Change log**:
> - v1.0 (2026-04-07): Initial version