Go Performance Analysis: Troubleshooting 100% CPU Usage with pprof
Go Performance Analysis Tool: pprof
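1. Background
A Go program that runs at 100% CPU usage is a common performance problem. This article shows how to use pprof to track down the cause and analyze performance.
2. Tool Integration
2.1 Adding pprof support
Integrate the pprof debug endpoints into a Gin application: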
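package pprofdebug

import (
	"log"

	"github.com/gin-contrib/pprof"
	"github.com/gin-gonic/gin"
)

// Run starts a Gin server that exposes the pprof endpoints under /debug/pprof.
func Run() {
	r := gin.New()
	// Mount the pprof handlers under the given path prefix.
	pprof.Register(r, "debug/pprof")
	// r.Run blocks while serving; report the error if the listener fails.
	if err := r.Run(":3001"); err != nil {
		log.Printf("pprof debug server stopped: %v", err)
	}
}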
3. Profiling Steps
3.1 Collecting profile data
Collect a CPU profile over the HTTP endpoint:
➜ app git:(master) ✗ go tool pprof http://192.168.1.174:3001/debug/pprof/profile
Fetching profile over HTTP from http://192.168.1.174:3001/debug/pprof/profile
Saved profile in /Users/wang/pprof/pprof.MyServer.samples.cpu.001.pb.gz
File: MyServer
Type: cpu
Time: Mar 21, 2024 at 4:46pm (CST)
Duration: 30.11s, Total samples = 30.21s (100.32%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) cpu
(pprof) help
Commands:
callgrind Outputs a graph in callgrind format
comments Output all profile comments
disasm Output assembly listings annotated with samples
dot Outputs a graph in DOT format
eog Visualize graph through eog
evince Visualize graph through evince
gif Outputs a graph image in GIF format
gv Visualize graph through gv
kcachegrind Visualize report in KCachegrind
list Output annotated source for functions matching regexp
pdf Outputs a graph in PDF format
peek Output callers/callees of functions matching regexp
png Outputs a graph image in PNG format
proto Outputs the profile in compressed protobuf format
ps Outputs a graph in PS format
raw Outputs a text representation of the raw profile
svg Outputs a graph in SVG format
tags Outputs all tags in the profile
text Outputs top entries in text form
top Outputs top entries in text form
topproto Outputs top entries in compressed protobuf format
traces Outputs all profile samples in text form
tree Outputs a text rendering of call graph
web Visualize graph through web browser
weblist Display annotated source in a web browser
o/options List options and their current values
q/quit/exit/^D Exit pprof
Options:
call_tree Create a context-sensitive call tree
compact_labels Show minimal headers
divide_by Ratio to divide all samples before visualization
drop_negative Ignore negative differences
edgefraction Hide edges below <f>*total
focus Restricts to samples going through a node matching regexp
hide Skips nodes matching regexp
ignore Skips paths going through any nodes matching regexp
intel_syntax Show assembly in Intel syntax
mean Average sample value over first value (count)
nodecount Max number of nodes to show
nodefraction Hide nodes below <f>*total
noinlines Ignore inlines.
normalize Scales profile based on the base profile.
output Output filename for file-based outputs
prune_from Drops any functions below the matched frame.
relative_percentages Show percentages relative to focused subgraph
sample_index Sample value to report (0-based index or name)
show Only show nodes matching regexp
show_from Drops functions above the highest matched frame.
source_path Search path for source files
tagfocus Restricts to samples with tags in range or matched by regexp
taghide Skip tags matching this regexp
tagignore Discard samples with tags in range or matched by regexp
tagleaf Adds pseudo stack frames for labels key/value pairs at the callstack leaf.
tagroot Adds pseudo stack frames for labels key/value pairs at the callstack root.
tagshow Only consider tags matching this regexp
trim Honor nodefraction/edgefraction/nodecount defaults
trim_path Path to trim from source paths before search
unit Measurement units to display
Option groups (only set one per group):
granularity
functions Aggregate at the function level.
filefunctions Aggregate at the function level.
files Aggregate at the file level.
lines Aggregate at the source code line level.
addresses Aggregate at the address level.
sort
cum Sort entries based on cumulative weight
flat Sort entries based on own weight
: Clear focus/ignore/hide/tagfocus/tagignore
type "help <cmd|option>" for more information
(pprof) top # show the functions that use the most CPU
Showing nodes accounting for 29.84s, 98.78% of 30.21s total
Dropped 175 nodes (cum <= 0.15s)
Showing top 10 nodes out of 11
      flat  flat%   sum%        cum   cum%
     8.11s 26.85% 26.85%      8.11s 26.85%  runtime.unlock2
     7.04s 23.30% 50.15%      7.04s 23.30%  runtime.lock2
     6.39s 21.15% 71.30%     26.70s 88.38%  runtime.chanrecv
     2.41s  7.98% 79.28%      2.41s  7.98%  runtime.memclrNoHeapPointers
     2.04s  6.75% 86.03%     29.97s 99.21%  MyServer/pkg/net_check.reload.func1 # this is the culprit
     1.98s  6.55% 92.59%      4.39s 14.53%  runtime.typedmemclr
     1.23s  4.07% 96.66%     27.93s 92.45%  runtime.chanrecv2
     0.27s  0.89% 97.55%      7.31s 24.20%  runtime.lockWithRank (inline)
     0.19s  0.63% 98.18%      8.48s 28.07%  runtime.unlock (inline)
     0.18s   0.6% 98.78%      8.29s 27.44%  runtime.unlockWithRank (inline)
(pprof) %
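The article does not show the body of reload.func1, so the following is only a guess at what such a profile can look like in code. A loop dominated by runtime.chanrecv2, lock2/unlock2 and typedmemclr is consistent with a goroutine that keeps receiving from a channel that has already been closed: every receive returns immediately, so the loop never blocks. The sketch below reproduces that pattern and one possible fix; drainBuggy, drainFixed, newClosedChan and the maxSpins bound are all hypothetical names introduced for illustration.

```go
package main

import "fmt"

// drainBuggy mimics the suspected bug: after ch is closed, each receive
// returns at once with ok == false, so a loop that does not exit on !ok
// spins. maxSpins bounds the loop only so that this demo terminates.
func drainBuggy(ch <-chan int, maxSpins int) (received, wasted int) {
	for i := 0; i < maxSpins; i++ {
		_, ok := <-ch // the two-value receive is runtime.chanrecv2
		if !ok {
			wasted++ // an unbounded loop would burn CPU here forever
			continue
		}
		received++
	}
	return received, wasted
}

// drainFixed uses range, which ends the loop once ch is closed and drained.
func drainFixed(ch <-chan int) (received int) {
	for range ch {
		received++
	}
	return received
}

// newClosedChan returns a buffered channel holding values, already closed.
func newClosedChan(values ...int) <-chan int {
	ch := make(chan int, len(values))
	for _, v := range values {
		ch <- v
	}
	close(ch)
	return ch
}

func main() {
	r, w := drainBuggy(newClosedChan(1, 2, 3), 1000)
	fmt.Printf("buggy: received=%d wasted=%d\n", r, w) // buggy: received=3 wasted=997
	fmt.Printf("fixed: received=%d\n", drainFixed(newClosedChan(1, 2, 3))) // fixed: received=3
}
```

If the real code looks different, the same check still applies: find the receive inside reload.func1 (for example with `list reload` in the pprof prompt) and confirm that the loop can block or exit.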
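4. Optimization Suggestions
- Focus on the flat and cum columns
  - flat: CPU time spent in the function itself
  - cum: total CPU time spent in the function and the functions it calls
- Analyze hot functions
  - Find the functions that consume the most CPU time
  - Examine the call relationships between them
  - Prioritize functions with high cumulative time. In the example, MyServer/pkg/net_check.reload.func1 has only 6.75% flat but 99.21% cum, so nearly all samples pass through it, most of them through runtime.chanrecv2 (92.45% cum).
- Common performance problems
  - Lock contention (lock2/unlock2 in the example)
  - Frequent channel operations (chanrecv)
  - Memory operations (memclr and related functions)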
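5. Best Practices
- Profile regularly
- Establish performance baselines
- Run full performance tests in a test environment
- Save profile data so that runs can be compared
- Combine pprof with other tools (such as trace and heap profiles)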
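One way to keep profiles for later comparison is to capture them in-process with the standard runtime/pprof package and write them to named files. The sketch below is a minimal illustration under stated assumptions: captureCPUProfile, busyWork, profileFileName and the file naming scheme are hypothetical, and the 200 ms workload only stands in for real application work.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime/pprof"
	"time"
)

// captureCPUProfile profiles work() and returns the gzip-compressed
// protobuf data that go tool pprof reads.
func captureCPUProfile(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err // for example, a CPU profile is already running
	}
	work()
	pprof.StopCPUProfile() // flushes the remaining samples into buf
	return buf.Bytes(), nil
}

// profileFileName builds a file name from a service name and a Unix
// timestamp (hypothetical naming scheme, chosen so files sort by time).
func profileFileName(service string, unixSec int64) string {
	return fmt.Sprintf("%s.cpu.%d.pb.gz", service, unixSec)
}

// busyWork burns CPU for roughly ms milliseconds so the profile has samples.
func busyWork(ms int) {
	deadline := time.Now().Add(time.Duration(ms) * time.Millisecond)
	x := 0
	for time.Now().Before(deadline) {
		x++
	}
	_ = x
}

func main() {
	data, err := captureCPUProfile(func() { busyWork(200) })
	if err != nil {
		fmt.Fprintln(os.Stderr, "capture failed:", err)
		os.Exit(1)
	}
	name := profileFileName("MyServer", time.Now().Unix())
	if err := os.WriteFile(name, data, 0o644); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
		os.Exit(1)
	}
	fmt.Println("wrote", name)
}
```

Two saved files can then be opened side by side, or compared with the -base option of go tool pprof, to see which functions grew between runs.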
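6. Caveats
- Profiling itself adds overhead to the running program
- Pay attention to security when profiling in production
- Prefer running full profiling sessions in a test environment
- Protect the debug endpoints against unauthorized access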
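The last point can be sketched with the standard library alone. The example below wraps the net/http/pprof handlers in a bearer-token check; it is one possible approach, not the article's setup, and requireToken, newPprofMux, statusFor and the token value are hypothetical. In a Gin application the same idea would be applied as middleware on the route group that carries the pprof routes.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newPprofMux registers the standard pprof handlers on a private mux.
// Importing net/http/pprof also registers them on http.DefaultServeMux,
// which this sketch never serves.
func newPprofMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// requireToken rejects requests whose Authorization header is not
// exactly "Bearer <token>". The comparison is constant-time.
func requireToken(token string, next http.Handler) http.Handler {
	want := []byte("Bearer " + token)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := []byte(r.Header.Get("Authorization"))
		if subtle.ConstantTimeCompare(got, want) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends an in-memory GET for the pprof index page and
// returns the HTTP status code the handler produced.
func statusFor(h http.Handler, authHeader string) int {
	req := httptest.NewRequest(http.MethodGet, "/debug/pprof/", nil)
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := requireToken("s3cret", newPprofMux())
	fmt.Println(statusFor(h, ""))              // 401
	fmt.Println(statusFor(h, "Bearer s3cret")) // 200
}
```

Serving such a handler on a loopback or internal-only address, rather than on the public listener, reduces the exposure further.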