# Log Analysis Runbook

## Log Locations

```bash
# Docker Compose logs (all services, follow mode)
docker compose logs -f

# Application log file
tail -f ./logs/app.log

# Path of the container's internal Docker log file
docker inspect user-management-app 2>/dev/null | jq -r '.[0].LogPath'
```

## Log Levels

| Level | Description | Examples |
|-------|-------------|----------|
| DEBUG | Debugging information | Variable values, function calls |
| INFO | General information | Request handling, service startup |
| WARN | Warnings | Missing configuration, degraded performance |
| ERROR | Errors | Database connection failure |
| FATAL | Fatal errors | Startup failure |
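
As a quick check on how these levels are distributed in a log file, the lines per level can be tallied with a small helper. This is a minimal sketch: `count_levels` is a hypothetical name, and it assumes the level appears in each line as a standalone uppercase word, which may not match your logger's output format.

```bash
# Hypothetical helper: count log lines per level.
# Assumption: the level is a standalone uppercase word in each line.
count_levels() {
  for level in DEBUG INFO WARN ERROR FATAL; do
    # -c prints the number of matching lines, -w matches whole words only
    # (so "WARN" does not count lines that only contain "WARNING").
    printf '%s %s\n' "$level" "$(grep -cw "$level" "$1")"
  done
}
```

For example, `count_levels ./logs/app.log` prints one `LEVEL count` pair per line.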

## Common Queries

### 1. View Live Logs

```bash
# Follow all logs
docker compose logs -f

# Follow only the application service
docker compose logs -f app

# Show only errors
docker compose logs -f | grep -i error
```

### 2. Search for Specific Content

```bash
# Search for errors
grep -i "error" ./logs/app.log

# Search for a specific user (-w: do not match user_id=1234)
grep -w "user_id=123" ./logs/app.log

# Search for an IP address (-F: literal dots, -w: do not match 192.168.1.10)
grep -Fw "192.168.1.1" ./logs/app.log

# Search a time range. Both timestamps must occur in the log:
# without the first nothing is printed, without the second output runs to the end.
sed -n '/2026-04-08 10:00:00/,/2026-04-08 11:00:00/p' ./logs/app.log
```
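
The `sed` range above depends on both boundary timestamps appearing literally in the file. As an alternative sketch, timestamps can be compared as strings with `awk`. The helper name `logs_between` is hypothetical, and it assumes every line starts with a `YYYY-MM-DD HH:MM:SS` timestamp; continuation lines such as stack traces would be compared by whatever text they start with.

```bash
# Hypothetical helper: print lines whose leading timestamp is in [start, end).
# Usage: logs_between LOGFILE START END
# Assumption: each line begins with "YYYY-MM-DD HH:MM:SS", so plain string
# comparison orders lines the same way as chronological order.
logs_between() {
  awk -v start="$2" -v end="$3" '
    { ts = substr($0, 1, 19) }   # first 19 characters hold the timestamp
    ts >= start && ts < end      # default action: print the line
  ' "$1"
}
```

For example: `logs_between ./logs/app.log "2026-04-08 10:00:00" "2026-04-08 11:00:00"`.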

### 3. Analyze Request Logs

```bash
# Find slow requests (> 1 s); assumes the last field is the duration, e.g. "1234ms"
awk '$NF ~ /^[0-9]+(\.[0-9]+)?ms$/ && $NF + 0 > 1000' ./logs/app.log

# Find 5xx responses; assumes the status code directly follows the protocol version
grep -E 'HTTP/[0-9.]+"? 5[0-9]{2}( |$)' ./logs/app.log

# Find failed logins
grep -i "login.*failed" ./logs/app.log
```
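
The slow-request filter above uses a fixed 1 s threshold and only inspects the last field. A slightly more flexible sketch follows, assuming the duration is logged somewhere in the line as a standalone field such as `1500ms`; the function name `slow_requests` and its arguments are hypothetical.

```bash
# Hypothetical helper: print lines containing a duration above a threshold.
# Usage: slow_requests LOGFILE [THRESHOLD_MS]   (default threshold: 1000)
# Assumption: the duration is a whitespace-separated field like "1500ms".
slow_requests() {
  awk -v limit="${2:-1000}" '
    {
      for (i = 1; i <= NF; i++) {
        # "$i + 0" converts "1500ms" to the number 1500.
        if ($i ~ /^[0-9]+(\.[0-9]+)?ms$/ && $i + 0 > limit + 0) {
          print
          next   # print each line at most once
        }
      }
    }
  ' "$1"
}
```

For example, `slow_requests ./logs/app.log 2000` lists requests slower than 2 s.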

### 4. Statistics

```bash
# Count error lines
grep -c "ERROR" ./logs/app.log

# Count errors by type (assumes the type is the 4th space-separated field)
grep "ERROR" ./logs/app.log | cut -d' ' -f4 | sort | uniq -c | sort -rn

# Top 10 client IPs (assumes the IP is the last field of each client_ip line)
grep "client_ip" ./logs/app.log | awk '{print $NF}' | sort | uniq -c | sort -rn | head -10

# Count API calls per path (assumes the path is the 6th space-separated field)
grep -Ew "GET|POST|PUT|DELETE" ./logs/app.log | cut -d' ' -f6 | sort | uniq -c | sort -rn
```

## Common Problem Analysis

### 1. Database Connection Problems

```
Error signatures:
- "database connection failed"
- "too many connections"
- "connection timeout"
```

**Troubleshooting steps:**
```bash
# 1. Check that the database file exists, plus its size and permissions
ls -la ./data/user_management.db

# 2. Check SQLite integrity (prints "ok" when no problems are found)
sqlite3 ./data/user_management.db "PRAGMA integrity_check;"

# 3. Count open handles on the database file (skip the lsof header line)
lsof ./data/user_management.db | tail -n +2 | wc -l

# 4. Restart the services
docker compose restart
```
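
To see which of the error signatures listed for this problem actually occur, and how often, the log can be scanned for each one. The following is a minimal sketch; `count_signatures` is a hypothetical helper name, and it counts matching lines rather than individual matches.

```bash
# Hypothetical helper: count lines matching each error signature.
# Usage: count_signatures LOGFILE SIGNATURE...
count_signatures() {
  logfile="$1"
  shift
  for sig in "$@"; do
    # -F: treat the signature as a literal string, -i: ignore case,
    # -c: print the number of matching lines.
    printf '%s %s\n' "$(grep -ciF -- "$sig" "$logfile")" "$sig"
  done
}
```

For example: `count_signatures ./logs/app.log "database connection failed" "too many connections" "connection timeout"`. The same helper works for the signatures of the other problem categories.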

### 2. Authentication/Authorization Problems

```
Error signatures:
- "unauthorized"
- "invalid token"
- "permission denied"
```

**Troubleshooting steps:**
```bash
# 1. Check the JWT configuration (-i: YAML keys are often lowercase)
grep -i jwt ./configs/config.yaml

# 2. Send a request with the token and inspect the response status (-i prints headers)
curl -i -H "Authorization: Bearer <token>" http://localhost:8080/api/v1/health

# 3. Check that the signing key is correct:
#    make sure the JWT_SECRET environment variable has not been changed
```
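
Before sending a token to the API, it can help to rule out a malformed token locally. A signed JWT in compact form consists of three base64url-encoded segments separated by dots, so a basic shape check needs no secret. This sketch only checks the shape; it does not verify the signature or the expiry, and the function name `jwt_shape_ok` is hypothetical.

```bash
# Hypothetical helper: succeed if the argument looks like a compact JWT
# (three non-empty base64url segments separated by dots).
jwt_shape_ok() {
  # base64url alphabet: letters, digits, "-" and "_" (JWTs omit "=" padding)
  printf '%s' "$1" | grep -Eq '^[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+$'
}
```

Usage might look like `jwt_shape_ok "$TOKEN" && echo "shape ok" || echo "malformed token"`, where `TOKEN` is a variable you set yourself.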

### 3. Performance Problems

```
Error signatures:
- Response time > 2 s
- Request timeouts
- Service unresponsive
```

**Troubleshooting steps:**
```bash
# 1. Check container resource usage (one snapshot; drop --no-stream for a live view)
docker stats --no-stream

# 2. Check host memory usage
free -h

# 3. Check disk I/O (5 samples at 1-second intervals; iostat is part of sysstat)
iostat -x 1 5

# 4. Check processes (the [brackets] keep grep from matching itself)
ps aux | grep -E "[u]ser-management|[d]ocker"

# 5. Restart the services to clear caches
docker compose restart
```
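
To confirm the "response time > 2 s" symptom from the client side, the total request time can be measured with `curl`. This is a rough sketch: the function name `check_latency` is hypothetical, the default limit of 2 seconds is taken from the symptom list, and a single measurement says little about sustained latency.

```bash
# Hypothetical helper: report whether one request exceeds a latency limit.
# Usage: check_latency URL [LIMIT_SECONDS]   (default limit: 2)
check_latency() {
  url="$1"
  limit="${2:-2}"
  # -s: silent, -o /dev/null: discard the body,
  # -w '%{time_total}': print the total request time in seconds.
  t=$(curl -s -o /dev/null -w '%{time_total}' "$url") || return 2
  # awk does the floating-point comparison that plain sh cannot do.
  if awk -v t="$t" -v limit="$limit" 'BEGIN { exit !(t + 0 > limit + 0) }'; then
    echo "SLOW ${t}s $url"
  else
    echo "OK ${t}s $url"
  fi
}
```

For example, `check_latency http://localhost:8080/api/v1/health` checks the health endpoint used elsewhere in this runbook.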

### 4. Memory Leaks

```
Error signatures:
- Memory usage keeps growing
- OOM (Out of Memory) errors
```

**Troubleshooting steps:**
```bash
# 1. Take a memory usage snapshot (repeat over time to see the trend)
docker stats --no-stream

# 2. Check the container memory limit
docker inspect user-management-app | grep -i memory

# 3. View the Go runtime memory statistics
curl -s http://localhost:8080/metrics | grep go_memstats

# 4. If usage keeps growing, a restart may be needed
docker compose restart
```
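
A single `docker stats --no-stream` call shows only one point in time, so spotting a leak needs several samples. The sketch below records timestamped samples at a fixed interval. The function name `sample_memory` and its defaults are hypothetical; the container name in the usage example is the one used in this runbook.

```bash
# Hypothetical helper: print COUNT timestamped memory samples for a container.
# Usage: sample_memory CONTAINER [COUNT] [INTERVAL_SECONDS]
sample_memory() {
  container="$1"
  count="${2:-5}"      # number of samples
  interval="${3:-60}"  # seconds between samples
  i=0
  while [ "$i" -lt "$count" ]; do
    # {{.MemUsage}} prints "used / limit" for the container.
    usage=$(docker stats --no-stream --format '{{.MemUsage}}' "$container") || return 1
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$usage"
    i=$((i + 1))
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
  done
}
```

For example, `sample_memory user-management-app 10 30 >> ./logs/memory-trend.log` collects ten samples over about five minutes; the output file name is only an illustration.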

## Log Retention

```bash
# Check the current log size
du -h ./logs/app.log

# Force a log rotation (if logrotate is configured)
logrotate -f /etc/logrotate.d/user-management

# Manually delete rotated logs older than 7 days
find ./logs -name "*.log.*" -mtime +7 -delete

# Compress rotated logs older than 3 days (skip files that are already compressed)
find ./logs -name "*.log.*" ! -name "*.gz" -mtime +3 -exec gzip {} \;
```

## Structured Log Queries (JSON Format)

If the logs are in JSON format:

```bash
# Select error entries with jq (select() filters; a bare comparison only prints true/false)
jq 'select(.level == "error")' ./logs/app.log

# Count the 10 most frequent error messages
jq -r '.error // .message' ./logs/app.log | sort | uniq -c | sort -rn | head -10

# Query by time range
jq 'select(.time > "2026-04-08T10:00:00Z" and .time < "2026-04-08T11:00:00Z")' ./logs/app.log
```

## Contacts

- Operations lead: [to be filled in]
- Development team: [to be filled in]