Go Performance Analysis: Using pprof to Troubleshoot 100% CPU Usage

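1. Background

A Go program pegging the CPU at 100% is a common performance problem. This article shows how to use pprof to track down the cause and analyze where the time goes.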

2. Tool integration

2.1 Adding pprof support

Mount the pprof debug endpoints in a Gin application:

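package pprofdebug

import (
	"log"

	"github.com/gin-contrib/pprof"
	"github.com/gin-gonic/gin"
)

// Run starts a Gin server that exposes the pprof handlers under /debug/pprof.
func Run() {
	r := gin.New()

	// Register mounts the pprof routes on r under the given prefix.
	pprof.Register(r, "debug/pprof")

	// Run blocks until the server stops; log the error instead of dropping it.
	if err := r.Run(":3001"); err != nil {
		log.Printf("pprof debug server stopped: %v", err)
	}
}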

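3. Profiling steps

3.1 Collecting profile data

Collect a CPU profile over the HTTP endpoint. The profile endpoint samples for 30 seconds by default, which matches the Duration line in the output: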

➜  app git:(master) ✗ go tool pprof http://192.168.1.174:3001/debug/pprof/profile
Fetching profile over HTTP from http://192.168.1.174:3001/debug/pprof/profile
Saved profile in /Users/wang/pprof/pprof.MyServer.samples.cpu.001.pb.gz
File: MyServer
Type: cpu
Time: Mar 21, 2024 at 4:46pm (CST)
Duration: 30.11s, Total samples = 30.21s (100.32%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) cpu
(pprof) help
  Commands:
    callgrind        Outputs a graph in callgrind format
    comments         Output all profile comments
    disasm           Output assembly listings annotated with samples
    dot              Outputs a graph in DOT format
    eog              Visualize graph through eog
    evince           Visualize graph through evince
    gif              Outputs a graph image in GIF format
    gv               Visualize graph through gv
    kcachegrind      Visualize report in KCachegrind
    list             Output annotated source for functions matching regexp
    pdf              Outputs a graph in PDF format
    peek             Output callers/callees of functions matching regexp
    png              Outputs a graph image in PNG format
    proto            Outputs the profile in compressed protobuf format
    ps               Outputs a graph in PS format
    raw              Outputs a text representation of the raw profile
    svg              Outputs a graph in SVG format
    tags             Outputs all tags in the profile
    text             Outputs top entries in text form
    top              Outputs top entries in text form
    topproto         Outputs top entries in compressed protobuf format
    traces           Outputs all profile samples in text form
    tree             Outputs a text rendering of call graph
    web              Visualize graph through web browser
    weblist          Display annotated source in a web browser
    o/options        List options and their current values
    q/quit/exit/^D   Exit pprof

  Options:
    call_tree        Create a context-sensitive call tree
    compact_labels   Show minimal headers
    divide_by        Ratio to divide all samples before visualization
    drop_negative    Ignore negative differences
    edgefraction     Hide edges below <f>*total
    focus            Restricts to samples going through a node matching regexp
    hide             Skips nodes matching regexp
    ignore           Skips paths going through any nodes matching regexp
    intel_syntax     Show assembly in Intel syntax
    mean             Average sample value over first value (count)
    nodecount        Max number of nodes to show
    nodefraction     Hide nodes below <f>*total
    noinlines        Ignore inlines.
    normalize        Scales profile based on the base profile.
    output           Output filename for file-based outputs
    prune_from       Drops any functions below the matched frame.
    relative_percentages Show percentages relative to focused subgraph
    sample_index     Sample value to report (0-based index or name)
    show             Only show nodes matching regexp
    show_from        Drops functions above the highest matched frame.
    source_path      Search path for source files
    tagfocus         Restricts to samples with tags in range or matched by regexp
    taghide          Skip tags matching this regexp
    tagignore        Discard samples with tags in range or matched by regexp
    tagleaf          Adds pseudo stack frames for labels key/value pairs at the callstack leaf.
    tagroot          Adds pseudo stack frames for labels key/value pairs at the callstack root.
    tagshow          Only consider tags matching this regexp
    trim             Honor nodefraction/edgefraction/nodecount defaults
    trim_path        Path to trim from source paths before search
    unit             Measurement units to display

  Option groups (only set one per group):
    granularity
      functions        Aggregate at the function level.
      filefunctions    Aggregate at the function level.
      files            Aggregate at the file level.
      lines            Aggregate at the source code line level.
      addresses        Aggregate at the address level.
    sort
      cum              Sort entries based on cumulative weight
      flat             Sort entries based on own weight
  :   Clear focus/ignore/hide/tagfocus/tagignore

  type "help <cmd|option>" for more information
(pprof) top # show the functions using the most CPU
Showing nodes accounting for 29.84s, 98.78% of 30.21s total
Dropped 175 nodes (cum <= 0.15s)
Showing top 10 nodes out of 11
      flat  flat%   sum%        cum   cum%
     8.11s 26.85% 26.85%      8.11s 26.85%  runtime.unlock2
     7.04s 23.30% 50.15%      7.04s 23.30%  runtime.lock2
     6.39s 21.15% 71.30%     26.70s 88.38%  runtime.chanrecv
     2.41s  7.98% 79.28%      2.41s  7.98%  runtime.memclrNoHeapPointers
     2.04s  6.75% 86.03%     29.97s 99.21%  MyServer/pkg/net_check.reload.func1 # this is the culprit
     1.98s  6.55% 92.59%      4.39s 14.53%  runtime.typedmemclr
     1.23s  4.07% 96.66%     27.93s 92.45%  runtime.chanrecv2
     0.27s  0.89% 97.55%      7.31s 24.20%  runtime.lockWithRank (inline)
     0.19s  0.63% 98.18%      8.48s 28.07%  runtime.unlock (inline)
     0.18s   0.6% 98.78%      8.29s 27.44%  runtime.unlockWithRank (inline)
(pprof) %
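
When the HTTP endpoint is not reachable, a CPU profile can also be captured in-process with the standard runtime/pprof package. The following is a minimal sketch, not part of the original setup: captureCPUProfile and spin are hypothetical helper names, and spin merely stands in for the real workload.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"time"
)

// captureCPUProfile records a CPU profile while work runs and returns the
// gzip-compressed protobuf bytes that `go tool pprof` can read.
func captureCPUProfile(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	// StopCPUProfile returns only after the profile has been fully written.
	pprof.StopCPUProfile()
	return buf.Bytes(), nil
}

// spin is a hypothetical CPU-bound workload that busy-loops for ms milliseconds.
func spin(ms int) {
	deadline := time.Now().Add(time.Duration(ms) * time.Millisecond)
	n := 0
	for time.Now().Before(deadline) {
		n++
	}
	_ = n
}

func main() {
	data, err := captureCPUProfile(func() { spin(200) })
	if err != nil {
		fmt.Println("profiling failed:", err)
		return
	}
	fmt.Printf("captured %d bytes of profile data\n", len(data))
}
```

Writing the returned bytes to a file gives the same kind of .pb.gz profile that the HTTP fetch saved above, so it can be opened with go tool pprof in the same way.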

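4. Optimization recommendations

  1. Look at both flat and cum

    • flat: CPU time spent in the function's own code
    • cum: total CPU time spent in the function plus everything it calls. In the output above, reload.func1 has a flat of only 2.04s but a cum of 29.97s; the 27.93s difference matches the cum of runtime.chanrecv2, so almost all of its time goes to channel receives.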
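  2. Analyze hot functions

    • Find the functions that consume the most CPU time
    • Trace how those functions call each other
    • Prioritize functions with a high cumulative time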
  3. Common performance problems

    • Lock contention (lock2/unlock2 in the example)
    • Frequent channel operations (chanrecv)
    • Memory operations (memclr and related functions)
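
The article does not show the source of reload.func1, so the exact bug is unknown. One pattern that is consistent with this profile (runtime.chanrecv2 is the runtime entry point for the two-value receive form v, ok := <-ch) is a loop that keeps receiving from a channel that has already been closed: each receive returns immediately with ok == false, so the loop spins without ever blocking. The sketch below assumes that pattern; drain is a hypothetical name, and the point is only that the loop must exit once ok is false.

```go
package main

import "fmt"

// drain receives from ch until it is closed and returns the number of values seen.
// It is a hypothetical stand-in for a loop like the one in reload.func1.
func drain(ch <-chan int) int {
	n := 0
	for {
		_, ok := <-ch
		if !ok {
			// The channel is closed and empty. Without this return, every
			// further receive would succeed instantly and the loop would
			// burn a full CPU core.
			return n
		}
		n++
	}
}

func main() {
	ch := make(chan int, 3)
	ch <- 1
	ch <- 2
	ch <- 3
	close(ch)
	fmt.Println(drain(ch)) // prints 3
}
```

A for v := range ch loop has the same effect, because range stops by itself when the channel is closed and drained.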

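5. Best practices

  1. Profile regularly
  2. Establish performance baselines
  3. Run full performance tests in a test environment
  4. Save profile data so that runs can be compared
  5. Combine pprof with other tools (such as trace and heap profiles)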
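
As an illustration of items 4 and 5, a heap profile can be captured with the standard runtime/pprof package and kept for later comparison. This is a minimal sketch; heapProfile is a hypothetical helper name, and a real service would normally write the bytes to a file with a timestamp in its name.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// heapProfile returns the current heap profile in the gzip-compressed
// protobuf format that `go tool pprof` reads.
func heapProfile() ([]byte, error) {
	// Run a collection first so the profile reflects up-to-date statistics.
	runtime.GC()

	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := heapProfile()
	if err != nil {
		fmt.Println("heap profile failed:", err)
		return
	}
	fmt.Printf("heap profile: %d bytes\n", len(data))
}
```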

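6. Notes

  1. Profiling itself adds overhead to the running program
  2. Pay attention to security when using pprof in production
  3. Prefer running full profiling sessions in a test environment
  4. Protect the debug endpoints against unauthorized access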
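
One way to protect the debug endpoints is to wrap them in a check that rejects requests without a shared secret. The sketch below uses the standard net/http/pprof handlers rather than the Gin wrapper shown earlier so that it stays self-contained; the X-Debug-Token header, the token value, and the helper names requireToken, newDebugMux and statusFor are all assumptions made for illustration.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newDebugMux returns a mux that serves the standard pprof handlers.
func newDebugMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// requireToken rejects any request whose X-Debug-Token header (a hypothetical
// header name) does not match token.
func requireToken(token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := r.Header.Get("X-Debug-Token")
		// Constant-time comparison avoids leaking the token through timing.
		if subtle.ConstantTimeCompare([]byte(got), []byte(token)) != 1 {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends an in-memory GET request to h and returns the status code.
func statusFor(h http.Handler, path, token string) int {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if token != "" {
		req.Header.Set("X-Debug-Token", token)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := requireToken("s3cret", newDebugMux())
	fmt.Println(statusFor(h, "/debug/pprof/", ""))       // prints 403
	fmt.Println(statusFor(h, "/debug/pprof/", "s3cret")) // prints 200
}
```

With a check like this in place, the profile has to be downloaded by a client that sends the header and then opened from the saved file with go tool pprof. Binding the debug server to a loopback or internal address is a complementary measure.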