写在前面

源码

Arthas是淘宝开源的JVM问题定位工具,本文一起来看下。

1:准备

arthas是一个可执行的jar包,可以从这里 下载,运行jar包后会列出所有的java进程,用户可以选择附加到哪个java进程,进而就可以使用arthas提供的各种命令来定位问题和辅助性能优化了。如下开始使用操作步骤:

在这里插入图片描述

在arthas的交互命令行中就可以执行其提供的命令来进行操作了,下面一起来看下。

2:命令的使用

2.1:thread

thread命令用来查看线程信息,如线程的名称,所属组,优先级,占用CPU,运行状态等,如下可能结果:

[arthas@24632]$ thread
thread
Threads Total: 34, NEW: 0, RUNNABLE: 10, BLOCKED: 0, WAITING: 4, TIMED_WAITING:
 3, TERMINATED: 0, Internal threads: 17
ID NAME                GROUP     PRIORI STATE %CPU   DELTA TIME   INTER DAEMON
12 useManyCpuThread    main      5      RUNNA 101.28 0.203 0:33.1 false false
2  Reference Handler   system    10     WAITI 0.0    0.000 0:0.00 false true
3  Finalizer           system    8      WAITI 0.0    0.000 0:0.01 false true
4  Signal Dispatcher   system    9      RUNNA 0.0    0.000 0:0.00 false true
5  Attach Listener     system    5      RUNNA 0.0    0.000 0:0.01 false true
15 arthas-timer        system    5      WAITI 0.0    0.000 0:0.00 false true
17 Keep-Alive-Timer    system    8      TIMED 0.0    0.000 0:0.00 false true
18 arthas-NettyHttpTel system    5      RUNNA 0.0    0.000 0:0.01 false true
19 arthas-NettyWebsock system    5      RUNNA 0.0    0.000 0:0.00 false true
20 arthas-NettyWebsock system    5      RUNNA 0.0    0.000 0:0.00 false true
21 arthas-shell-server system    5      TIMED 0.0    0.000 0:0.00 false true
22 arthas-session-mana system    5      TIMED 0.0    0.000 0:0.00 false true
23 arthas-UserStat     system    5      WAITI 0.0    0.000 0:0.00 false true

下面我们来模拟下不同场景下使用该命令来定位问题。

2.1.1:定位CPU彪高问题

CPU彪高一般就是有大量的计算工作在进行,而通过定位到具体的进程就能辅助找到问题代码,测试代码如下:

public class ManyCPUInThread {
    public static void main(String[] args) {
        new Thread(() -> {
            while (true) {
                Md5Utils.getMD5(UUID.randomUUID().toString().getBytes());
                Md5Utils.getMD5(UUID.randomUUID().toString().getBytes());
                Md5Utils.getMD5(UUID.randomUUID().toString().getBytes());
                Md5Utils.getMD5(UUID.randomUUID().toString().getBytes());
                Md5Utils.getMD5(UUID.randomUUID().toString().getBytes());
            }
        }, "useManyCpuThread").start();
    }
}

其中线程名称是useManyCpuThread,这将有助于我们定位导致问题的线程,根据阿里巴巴java规范手册 也应该这么做,如下图:

在这里插入图片描述

运行测试程序,突然发现CPU彪高,如下图:

在这里插入图片描述

我们通过如下步骤使用thread命令来定位问题:

  • 依附到Java进程
$ java -jar arthas-boot.jar
[INFO] arthas-boot version: 3.6.0
[INFO] Found existing java process, please choose one and input the serial number of the process, eg : 1. Then hit ENTER.
* [1]: 11104 dongshi.daddy.ManyCPUInThread
  ...
1
[INFO] arthas home: C:\Users\Administrator\.arthas\lib\3.6.0\arthas
[INFO] Try to attach process 11104
[INFO] Attach process 11104 success.
[INFO] arthas-client connect 127.0.0.1 3658
  ,---.  ,------. ,--------.,--.  ,--.  ,---.   ,---.
 /  O  \ |  .--. ''--.  .--'|  '--'  | /  O  \ '   .-'
|  .-.  ||  '--'.'   |  |   |  .--.  ||  .-.  |`.  `-.
|  | |  ||  |\  \    |  |   |  |  |  ||  | |  |.-'    |
`--' `--'`--' '--'   `--'   `--'  `--'`--' `--'`-----'

...
time       2022-04-21 13:50:48

[arthas@11104]$
  • thread
[arthas@11104]$ thread
thread
Threads Total: 33, NEW: 0, RUNNABLE: 10, BLOCKED: 0, WAITING: 4, TIMED_WAITING:
 2, TERMINATED: 0, Internal threads: 17
ID NAME                GROUP     PRIORI STATE %CPU   DELTA TIME   INTER DAEMON
12 useManyCpuThread    main      5      RUNNA 107.63 0.218 6:54.7 false false
26 arthas-command-exec system    5      RUNNA 7.69   0.015 0:0.01 false true
2  Reference Handler   system    10     WAITI 0.0    0.000 0:0.01 false true
3  Finalizer           system    8      WAITI 0.0    0.000 0:0.01 false true
4  Signal Dispatcher   system    9      RUNNA 0.0    0.000 0:0.00 false true
...

可以看到行12 useManyCpuThread main 5 RUNNA 107.63 0.218 6:54.7 false false,名称为useManyCpuThread的线程占用了107.63的CPU,然后就可以定位到具体代码,查找CPU占用多的原因了。

2.2:jvm

查看jvm相关信息,如进程号,内存大小,GC耗时,总加载类个数,总线程个数,等,如下:

jvm
 RUNTIME
-------------------------------------------------------------------------------
 MACHINE-NAME           11104@jhp
 JVM-START-TIME         2022-04-21 13:45:26
 MANAGEMENT-SPEC-VERSI  1.2
 ON
 SPEC-NAME              Java Virtual Machine Specification
 SPEC-VENDOR            Oracle Corporation
 SPEC-VERSION           1.8
 VM-NAME                Java HotSpot(TM) 64-Bit Server VM
 VM-VENDOR              Oracle Corporation

2.3:trace

在日常工作中经常会有用户反馈某个接口,但是一般情况下我们是没有在方法中添加对应的耗时日志的,此时定位到底因为什么慢就不太好办了,arthas的trace命令针对这种场景给出了解决方案,允许我们动态的追踪某个方法的执行耗时,从而辅助定位性能问题,下面我们一起来看下。

2.3.1:命令格式

trace 类名称 方法名称,如trace dongshi.daddy.MathGame run,就是追踪dongshi.daddy.MathGame类的run方法。

2.3.2:测试程序
public class MathGame {
    private static Random random = new Random();

    private int illegalArgumentCount = 0;

    public static void main(String[] args) throws InterruptedException {
        MathGame game = new MathGame();
        while (true) {
            game.run();
            TimeUnit.SECONDS.sleep(1L);
        }
    }

    public void run() throws InterruptedException {
        try {
            int number = random.nextInt() / 10000;
            List<Integer> primeFactors = primeFactors(number);
            print(number, primeFactors);
        } catch (Exception e) {
            System.out.println(String.format("illegalArgumentCount:%3d, ", new Object[]{Integer.valueOf(this.illegalArgumentCount)}) + e.getMessage());
        }
    }

    public static void print(int number, List<Integer> primeFactors) {
        StringBuffer sb = new StringBuffer(number + "=");
        for (Iterator<Integer> iterator = primeFactors.iterator(); iterator.hasNext(); ) {
            int factor = ((Integer) iterator.next()).intValue();
            sb.append(factor).append('*');
        }
        if (sb.charAt(sb.length() - 1) == '*')
            sb.deleteCharAt(sb.length() - 1);
        System.out.println(sb);
    }

    public List<Integer> primeFactors(int number) {
        if (number < 2) {
            this.illegalArgumentCount++;
            throw new IllegalArgumentException("number is: " + number + ", need >= 2");
        }
        List<Integer> result = new ArrayList<Integer>();
        int i = 2;
        while (i <= number) {
            if (number % i == 0) {
                result.add(Integer.valueOf(i));
                number /= i;
                i = 2;
                continue;
            }
            i++;
        }
        return result;
    }

    private void primeFactors111() {
        System.out.println("xxxxxxxxxxxxxxx");
    }
}

该类生成一个正整数,然后打印正整数的因子乘积的形式。运行后,继续看2.3.3:trace测试

2.3.3:trace测试
[arthas@20784]$ trace dongshi.daddy.MathGame run -n 2
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 70 ms, listenerId: 1
`---ts=2022-04-21 17:54:23;thread_name=main;id=1;is_daemon=false;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@18b4aac2
    `---[0.5033ms] dongshi.daddy.MathGame:run()
        +---[0.0348ms] dongshi.daddy.MathGame:primeFactors() #25
        `---[0.1011ms] dongshi.daddy.MathGame:print() #26

`---ts=2022-04-21 17:54:24;thread_name=main;id=1;is_daemon=false;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@18b4aac2
    `---[0.2798ms] dongshi.daddy.MathGame:run()
        +---[0.1041ms] dongshi.daddy.MathGame:primeFactors() #25
        `---[0.0942ms] dongshi.daddy.MathGame:print() #26

Command execution times exceed limit: 2, so command will exit. You can set it with -n option.

打印2次,实际情况可自行调整,默认是打印100次,以第二次输出为例---[0.2798ms] dongshi.daddy.MathGame:run(),run方法总耗时0.2798ms+---[0.0348ms] dongshi.daddy.MathGame:primeFactors() #25,primeFactors方法耗时0.0348ms,代码行数为25,---[0.0942ms] dongshi.daddy.MathGame:print() #26,print方法耗时0.0942ms,代码行数为26

总耗时大于每个方法的耗时总和,这是因为总耗时包含了arthas添加逻辑的耗时,但是对我们大概定位耗时程序影响不大。

  • 显示jdk相关方法调用
    默认是不显示jdk的方法调用信息的,可以设置--skipJDKMethod false,如下:
[arthas@20784]$ trace --skipJDKMethod false dongshi.daddy.MathGame run -n1
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 24 ms, listenerId: 3
`---ts=2022-04-21 18:18:46;thread_name=main;id=1;is_daemon=false;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@18b4aac2
    `---[0.5441ms] dongshi.daddy.MathGame:run()
        +---[0.032ms] java.util.Random:nextInt() #24
        +---[0.2704ms] dongshi.daddy.MathGame:primeFactors() #25
        `---[0.1162ms] dongshi.daddy.MathGame:print() #26

Command execution times exceed limit: 1, so command will exit. You can set it with -n option.
  • 根据耗时过滤
    如果我们只关心耗时到达一定时长的方法调用,则可以添加#cost>毫秒数,如#cost>20,过滤耗时大于20ms的方法调用,如下测试:
[arthas@19320]$ trace dongshi.daddy.MathGame run -n2 '#cost>20'
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 73 ms, listenerId: 1
`---ts=2022-04-21 18:33:53;thread_name=main;id=1;is_daemon=false;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@18b4aac2
    `---[1000.4337ms] dongshi.daddy.MathGame:run()
        +---[999.895ms] dongshi.daddy.MathGame:primeFactors() #25
        `---[0.122ms] dongshi.daddy.MathGame:print() #26

`---ts=2022-04-21 18:33:55;thread_name=main;id=1;is_daemon=false;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@18b4aac2
    `---[1015.0411ms] dongshi.daddy.MathGame:run()
        +---[1014.9108ms] dongshi.daddy.MathGame:primeFactors() #25
        `---[0.0678ms] dongshi.daddy.MathGame:print() #26

Command execution times exceed limit: 2, so command will exit. You can set it with -n option.

注意是必须其中有一个方法耗时大于20ms才会显示该次方法调用链路,是显示完整的调用链路,而不是只显示满足耗时时长的方法调用。

2.3.4:真实场景实战

现在有个接口dongshi.daddy.action.JFormAction#canMoveToApplication,执行时长超过100ms,不符合接口要求,需要排查到底哪个方法执行慢,按照如下步骤操作:

[arthas@12784]$ trace dongshi.daddy.action.JFormAction canMoveToApplication -n5
Press Q or Ctrl+C to abort.
Affect(class count: 2 , method count: 2) cost in 222 ms, listenerId: 2
`---ts=2022-04-21 18:50:17;thread_name=http-bio-446-exec-11;id=31;is_daemon=true;priority=5;TCCL=org.apache.catalina.loader.WebappClassLoader@4d62a1fd
    `---[115.7377ms] dongshi.daddy.action.JFormAction$$EnhancerByCGLIB$$4d98ee8e:canMoveToApplication()
        `---[115.5969ms] org.springframework.cglib.proxy.MethodInterceptor:intercept()
            `---[115.5403ms] dongshi.daddy.action.JFormAction:canMoveToApplication()
                +---[0.0239ms] javax.servlet.http.HttpServletRequest:getParameter() #1006
                +---[0.7267ms] org.slf4j.Logger:info() #1007
                +---[14.1981ms] dongshi.daddy.action.JFormAction:getSessionUserInfo2() #1010
                +---[0.0147ms] org.apache.commons.lang3.StringUtils:isBlank() #1018
                +---[0.0131ms] javax.servlet.http.HttpServletRequest:getParameter() #1019
                +---[0.0048ms] org.apache.commons.lang3.StringUtils:isBlank() #1020
                +---[1.0612ms] dongshi.daddy.useformextract.factory.FormExtractorFactory:produceFormExtractor() #1021
                +---[0.0531ms] dongshi.daddy.framework.web.bean.UserInfo:getDeptId() #1021
                +---[0.0266ms] dongshi.daddy.action.JFormAction$1:<init>() #1021
                +---[95.032ms] dongshi.daddy.useformextract.UseFormExtractor:extract() #1021
                +---[1.6599ms] org.slf4j.Logger:info() #1028
                +---[0.1656ms] com.alibaba.fastjson.JSON:toJSONString() #1029
                `---[2.1009ms] dongshi.daddy.action.JFormAction:renderText() #1029

通过+---[95.032ms] dongshi.daddy.useformextract.UseFormExtractor:extract() #1021可以看到,是1021行的UseFormExtractor:extract()耗时多,继续看:

[arthas@12784]$ trace dongshi.daddy.useformextract.UseFormExtractor extract
Press Q or Ctrl+C to abort.
Affect(class count: 5 , method count: 1) cost in 282 ms, listenerId: 4
`---ts=2022-04-21 18:56:54;thread_name=http-bio-446-exec-14;id=34;is_daemon=true;priority=5;TCCL=org.apache.catalina.loader.WebappClassLoader@4d62a1fd
    `---[24.1424ms] dongshi.daddy.useformextract.BaseUseFormExtractor:extract()
        +---[0.2228ms] org.slf4j.Logger:info() #21
        +---[0.0169ms] dongshi.daddy.useformextract.BaseUseFormExtractor:beforeExtract() #22
        +---[83.2874ms] dongshi.daddy.useformextract.BaseUseFormExtractor:doExtract() #23
        +---[0.2633ms] org.slf4j.Logger:info() #24
        +---[0.015ms] dongshi.daddy.useformextract.BaseUseFormExtractor:afterExtract() #25
        `---[0.2138ms] org.slf4j.Logger:info() #26

+---[83.2874ms] dongshi.daddy.useformextract.BaseUseFormExtractor:doExtract() #23可以看出是23行的BaseUseFormExtractor:doExtract(),继续看:

[arthas@12784]$ trace dongshi.daddy.useformextract.BaseUseFormExtractor doExtract -n1
Press Q or Ctrl+C to abort.
Affect(class count: 4 , method count: 4) cost in 400 ms, listenerId: 6
`---ts=2022-04-22 09:57:31;thread_name=http-bio-446-exec-19;id=39;is_daemon=true;priority=5;TCCL=org.apache.catalina.loader.WebappClassLoader@4d62a1fd
    `---[45.2156ms] dongshi.daddy.useformextract.PhoneUseFormExtractor:doExtract()
        +---[0.2457ms] org.slf4j.Logger:info() #26
        +---[56.2093ms] dongshi.daddy.useformextract.BaseUseFormExtractor:queryDesignDetailByFormId() #28
        +---[0.0828ms] org.apache.commons.lang.StringUtils:isBlank() #29
        +---[5.2509ms] dongshi.daddy.useformextract.BaseUseFormExtractor:queryFieldsInMblConfig() #31
        +---[0.046ms] dongshi.daddy.useformextract.BaseUseFormExtractor:queryTargetField() #32
        +---[4.3191ms] dongshi.daddy.useformextract.BaseUseFormExtractor:queryFieldsInConfig() #36
        +---[0.0421ms] dongshi.daddy.useformextract.BaseUseFormExtractor:queryByJfType() #39
        +---[0.2355ms] dongshi.daddy.useformextract.BaseUseFormExtractor:queryByJfType() #61
        +---[2.4586ms] dongshi.daddy.useformextract.PhoneUseFormExtractor:installJumpSetting() #62
        +---[0.6199ms] dongshi.daddy.useformextract.PhoneUseFormExtractor:installDatasource() #64
        +---[0.0262ms] dongshi.daddy.useformextract.BaseUseFormExtractor:queryByJfType() #66
        +---[0.0103ms] dongshi.daddy.useformextract.PhoneUseFormExtractor:installDatasource() #67
        +---[0.0207ms] dongshi.daddy.useformextract.BaseUseFormExtractor:queryByJfType() #69
        `---[0.7863ms] org.slf4j.Logger:info() #99

+---[56.2093ms] dongshi.daddy.useformextract.BaseUseFormExtractor:queryDesignDetailByFormId() #28处可以看出调用的是BaseUseFormExtractor:queryDesignDetailByFormId()方法,该方法源码如下:

class FakeCls {
    protected String queryDesignDetailByFormId(String formId) {
        IJFormDao jFormDao = SpringContextUtil.getBean("jFormDao");
        String designDetail = jFormDao.getDesignDetailById(formId);
        return designDetail;
    }
}

继续看jFormDao.getDesignDetailById(formId):

class FakeCls {
    @Override
    public String getDesignDetailById(String formId) {
        String sql = "select name from user where id = '"+ xxx +"' and version = '1'";
        return this.findSqlUnique(sql);
    }
}

就能知道是因为sql语句select name from user where id = '"+ xxx +"' and version = '1',接下来就是查看该sql语句为什么慢了。

2.4:sc

通过sc -d 类名可以查看类的信息,如类名称,修饰符,是否接口,是否枚举,父类等信息,如下:

[arthas@1548]$ sc -d dongshi.daddy.MathGame
 class-info        dongshi.daddy.MathGame
 code-source       /E:/workspace-idea/dongshidaddy-labs-new/myjvm/arthas/target/classes/
 name              dongshi.daddy.MathGame
 isInterface       false
 isAnnotation      false
 isEnum            false
 isAnonymousClass  false
 isArray           false
 isLocalClass      false
 isMemberClass     false
 isPrimitive       false
 isSynthetic       false
 simple-name       MathGame
 modifier          public
 annotation
 interfaces
 super-class       +-java.lang.Object
 class-loader      +-sun.misc.Launcher$AppClassLoader@18b4aac2
                     +-sun.misc.Launcher$ExtClassLoader@14ae5a5
 classLoaderHash   18b4aac2

Affect(row-cnt:1) cost in 3 ms.

查看加载类:

[arthas@1548]$ sc dongshi.*
dongshi.daddy.MathGame
Affect(row-cnt:1) cost in 6 ms.

查看类字段:

[arthas@1548]$ sc -d -f dongshi.daddy.MathGame
 class-info        dongshi.daddy.MathGame
 ...
 classLoaderHash   18b4aac2
 fields            name     random
                   type     java.util.Random
                   modifier private,static
                   value    java.util.Random@73c46669

                   name     illegalArgumentCount
                   type     int
                   modifier private

2.5:jad

直接将jvm正在运行的class的字节码byte code反编译为Java 代码,这在运行环境出现问题,而又怀疑是因为代码没有部署成功造成时很有用,在知道arthas该工具之前我的做法如下一般是首先将线上的class下载到本地,然后使用jd-gui 工具反编译查看,但是如果是使用jad就简单的多了,只需要一条命令即可:

2.5.1:查看源码
[arthas@1548]$ jad --source-only dongshi.daddy.MathGame
       /*
        * Decompiled with CFR.
        */
       package dongshi.daddy;

       import java.util.ArrayList;
       import java.util.List;
       import java.util.Random;
       import java.util.concurrent.TimeUnit;

       public class MathGame {
           private static Random random = new Random();
           private int illegalArgumentCount = 0;

           public static void main(String[] args) throws InterruptedException {
               MathGame game = new MathGame();
               while (true) {
/*17*/             game.run();
/*18*/             TimeUnit.SECONDS.sleep(1L);
               }
           }
           // 省略部分
           public List<Integer> primeFactors(int number) {
               try {
/*45*/             Thread.sleep(1000L);
               }
               // 省略部分
/*64*/         return result;
           }
       }

需要注意,反编译的代码和源码在行号上并非完全一致,但是会在前面通过/*源码行号*/来标记实际的行号。

2.5.2:反编译指定的方法
[arthas@1548]$ jad --source-only dongshi.daddy.MathGame print
       public static void print(int number, List<Integer> primeFactors) {
           StringBuffer sb = new StringBuffer(number + "=");
/*34*/     for (int factor : primeFactors) {
/*36*/         sb.append(factor).append('*');
           }
/*38*/     if (sb.charAt(sb.length() - 1) == '*') {
/*39*/         sb.deleteCharAt(sb.length() - 1);
           }
/*40*/     System.out.println(sb);
       }

2.6:watch

方法执行数据观测,让你能方便的观察到指定方法的调用情况。能观察到的范围为:返回值、抛出异常、入参,通过编写 OGNL 表达式进行对应变量的查看。

2.6.1:方法执行之前打印信息
[arthas@1548]$ watch dongshi.daddy.MathGame primeFactors -b -n1
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 16 ms, listenerId: 4
method=dongshi.daddy.MathGame.primeFactors location=AtEnter
ts=2022-04-22 13:56:46; [cost=0.0579ms] result=@ArrayList[
    @Object[][isEmpty=false;size=1],
    @MathGame[dongshi.daddy.MathGame@49c2faae],
    null,
]
-b:表示方法执行之前打印信息。
-n1:表示打印一次。
location=AtEnter:表示位置是方法进入前。
result=@ArrayList[:表示方法的返回值是ArrayList。
@Object[][isEmpty=false;size=1]:表示有1个不为空的入参。
@MathGame[dongshi.daddy.MathGame@49c2faae],:表示this对象。
null,:表示返回值,因为是方法调用前,所以是null。

看到入参和this信息展示的信息不够详细,可以通过设置-x 2来设置属性遍历深度,如下:

[arthas@1548]$ watch dongshi.daddy.MathGame primeFactors -b -n1 -x2
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 18 ms, listenerId: 5
method=dongshi.daddy.MathGame.primeFactors location=AtEnter
ts=2022-04-22 14:09:27; [cost=0.1004ms] result=@ArrayList[
    @Object[][
        @Integer[-52359],
    ],
    @MathGame[
        random=@Random[java.util.Random@73c46669],
        illegalArgumentCount=@Integer[3317],
    ],
    null,
]

可以看到入参,this展示的更加详细了,其中入参是一个Integer,this有两个属性random和illegalArgumentCount。如果是我们不关心this,即不想要返回this信息,可以增加参数"{params,returnObj}",如下:

[arthas@1548]$ watch dongshi.daddy.MathGame primeFactors "{params,returnObj}" -b -n1
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 17 ms, listenerId: 11
method=dongshi.daddy.MathGame.primeFactors location=AtEnter
ts=2022-04-22 14:53:21; [cost=0.0826ms] result=@ArrayList[
    @Object[][isEmpty=false;size=1],
    null,
]
2.6.2:方法执行之后打印信息

使用的是-f在正常返回和throw异常都会打印。

[arthas@1548]$ watch dongshi.daddy.MathGame primeFactors -f -n2 -x2
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 16 ms, listenerId: 9
method=dongshi.daddy.MathGame.primeFactors location=AtExit
ts=2022-04-22 14:36:19; [cost=4.162564302241E8ms] result=@ArrayList[
    @Object[][
        @Integer[1],
    ],
    @MathGame[
        random=@Random[java.util.Random@73c46669],
        illegalArgumentCount=@Integer[3722],
    ],
    @ArrayList[
        @Integer[2],
        @Integer[41149],
    ],
]
method=dongshi.daddy.MathGame.primeFactors location=AtExceptionExit
ts=2022-04-22 14:36:21; [cost=1002.2582ms] result=@ArrayList[
    @Object[][
        @Integer[-180175],
    ],
    @MathGame[
        random=@Random[java.util.Random@73c46669],
        illegalArgumentCount=@Integer[3723],
    ],
    null,
]

第一个输出location=AtExit代表是方法正常返回了,返回的值是[2,41149]的ArrayList(返回值都能看到,真心强大!),第二个输出location=AtExceptionExit表示发生了异常,方法是以throw异常结束的。

2.6.3:方法返回前和返回后

这种方式通过方法返回前输出可以看到具体入参的值,通过方法返回后输出可以看到具体的返回值,如下:

[arthas@1548]$ watch dongshi.daddy.MathGame primeFactors "{params[0],returnObj}" -s -b -n5 -x2
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 19 ms, listenerId: 17
method=dongshi.daddy.MathGame.primeFactors location=AtEnter
ts=2022-04-22 15:20:55; [cost=0.0547ms] result=@ArrayList[
    @Integer[194389],
    null,
]
method=dongshi.daddy.MathGame.primeFactors location=AtExit
ts=2022-04-22 15:20:56; [cost=4.189333472393E8ms] result=@ArrayList[
    @Integer[1],
    @ArrayList[
        @Integer[13],
        @Integer[19],
        @Integer[787],
    ],
]

通过@Integer[194389]可看到方法的入参值是194389,返回值是[13,19,787]

2.6.4:过滤
  • 根据参数值过滤
    过滤第一个参数小于0的方法调用:
[arthas@1548]$ watch dongshi.daddy.MathGame primeFactors "{params[0],returnObj}" "params[0]<0" -b -f -x2 -n1
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 20 ms, listenerId: 21
method=dongshi.daddy.MathGame.primeFactors location=AtEnter
ts=2022-04-22 15:43:58; [cost=0.0131ms] result=@ArrayList[
    @Integer[-50140],
    null,
]
Command execution times exceed limit: 1, so command will exit. You can set it with -n option.
  • 根据参数类型过滤

需要修改严格模式为false。

[arthas@1548]$ options strict false
 NAME    BEFORE-VALUE  AFTER-VALUE
-----------------------------------
 strict  true          false
[arthas@1548]$ watch dongshi.daddy.MathGame primeFactors "{params, params[0].class.name}" 'params[0].class.name == "java.lang.Integer"' -b -f -x2 -n2
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 19 ms, listenerId: 27
method=dongshi.daddy.MathGame.primeFactors location=AtEnter
ts=2022-04-22 15:53:58; [cost=0.0907ms] result=@ArrayList[
    @Object[][
        @Integer[-125336],
    ],
    @String[java.lang.Integer],
]
method=dongshi.daddy.MathGame.primeFactors location=AtExceptionExit
ts=2022-04-22 15:53:59; [cost=4.209165192229E8ms] result=@ArrayList[
    @Object[][
        @Integer[-125336],
    ],
    @String[java.lang.Integer],
]
Command execution times exceed limit: 2, so command will exit. You can set it with -n option.
  • 根据参数个数过滤
[arthas@1548]$ watch dongshi.daddy.MathGame primeFactors -b '{params,returnObj}' 'params.length == 1'
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 24 ms, listenerId: 36
method=dongshi.daddy.MathGame.primeFactors location=AtEnter
ts=2022-04-22 18:13:48; [cost=0.0611ms] result=@ArrayList[
    @Object[][isEmpty=false;size=1],
    null,
]
2.6.5:实际场景使用

当我在愉快的划水时,测试突然反馈一个问题,说某个接口不返回数据了,数据最终是通过如下方法返回的:

class FakeCls {
    public ResultData selectData(String id) {
        ResultData entity = formDao.queryEntryFormId(id);
        return entity;
    }
}

因为参数ID是前端传过来的,所以怀疑是这个参数的问题,便使用如下命令获取传入的参数值:

[arthas@13736]$ watch com.jh.jcs.alignForm.service.impl.FormApiService$$EnhancerByCGLIB$$5220dc21 selectFormDataByFormId '{params[0]}' -b -x2
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 93 ms, listenerId: 7
method=com.jh.jcs.alignForm.service.impl.FormApiService$$EnhancerByCGLIB$$5220dc21.selectFormDataByFormId location=AtEnter
ts=2022-04-22 18:36:18; [cost=0.0356ms] result=@ArrayList[
    @String[2c91c81a7033d2d2ddd2c91c81a70],
]

可以看到获取的参数值是2c91c81a7033d2d2ddd2c91c81a70,结果证明我的怀疑是错误的,但至少排除了一种错误的可能性,那没办法,继续找问题吧!

3:火焰图

这部分看下如何通过arthas的火焰图来定位相关问题。

3.1:火焰图定位CPU问题

如下测试代码:

public class B {
    public static void main(String[] args) {
        aaaaaaaa();
    }

    private static void aaaaaaaa() {
        bbbbbbbbbbbbb();
    }

    private static void bbbbbbbbbbbbb() {
        cccccccccccccc();
    }

    private static void cccccccccccccc() {
        try {
            while (true) {
                TimeUnit.SECONDS.sleep(2);
                System.out.println("cccccccccc");
                new Thread(() -> {
                    while (true) {
                        System.out.println((UUID.randomUUID().toString().getBytes()));
                        System.out.println((UUID.randomUUID().toString().getBytes()));
                        System.out.println((UUID.randomUUID().toString().getBytes()));
                        System.out.println((UUID.randomUUID().toString().getBytes()));
                        System.out.println((UUID.randomUUID().toString().getBytes()));
                    }
                }, "useManyCpuThread").start();

            }
        } catch (InterruptedException e) {
        }
    }
}

运行程序后使用arthas执行如下操作:

[root@localhost arthas]# java -jar arthas-boot-1.jar 
[INFO] arthas-boot version: 3.6.0
[INFO] Found existing java process, please choose one and input the serial number of the process, eg : 1. Then hit ENTER.
* [1]: 2200 dongshi/daddy/huohuo/B
1 # 1:选择对应的PID
[INFO] arthas home: /root/.arthas/lib/3.7.0/arthas
[INFO] Try to attach process 2200
[INFO] Attach process 2200 success.
[INFO] arthas-client connect 127.0.0.1 3658
  ,---.  ,------. ,--------.,--.  ,--.  ,---.   ,---.                           
 /  O  \ |  .--. ''--.  .--'|  '--'  | /  O  \ '   .-'                          
|  .-.  ||  '--'.'   |  |   |  .--.  ||  .-.  |`.  `-.                          
|  | |  ||  |\  \    |  |   |  |  |  ||  | |  |.-'    |                         
`--' `--'`--' '--'   `--'   `--'  `--'`--' `--'`-----'                          

...
[arthas@2200]$ profiler start # 2:开始采样CPU信息
Profiling started
[arthas@2200]$ profiler getSamples # 3:获取采样数
450
[arthas@2200]$ profiler getSamples
523
[arthas@2200]$ profiler stop  # 4:停止采样,会直接生一个html,即下边的结果
OK
profiler output file: /root/study/arthas/arthas-output/20230807-114216.html 

接下来我们可以打开这个html,如下图只需要看蓝色的,其他色的都不是Java代码
在这里插入图片描述

其中大红色对勾(注意是我自己打的),就是我们程序中消耗CPU的代码,为了更清楚的看到哪些操作分别消耗了多少CPU,可以单击对勾处,使其变为
100%,如下图:
在这里插入图片描述

其中sout占用了72.12%,String.getBytes占用了1.96%,UUDI.randomUUID占用了12.82%,UUID.toString占用了8.2%,对应到程序如下图:
在这里插入图片描述

3.2:火焰图定位内存问题

如下测试代码:

public class B {
    public static void main(String[] args) {
        aaaaaaaa();
    }

    private static void aaaaaaaa() {
        bbbbbbbbbbbbb();
    }

    private static void bbbbbbbbbbbbb() {
        cccccccccccccc();
    }

    private static void cccccccccccccc() {
        // 一个EmptyObject大小
        // 标记字 8字节 , class指针 4字节 ,int num 4字节,short shortNum 2字节,5个double 共40字节 此时共58字节,非8字节整数倍,alignment对齐填充6字节 共64字节
        // 每秒生成5000个对象,即每秒堆内存占用64*1000*5=320k
        List<EmptyObject> XYZZZList = new ArrayList<EmptyObject>();
        try {
            while (true) {
                TimeUnit.MILLISECONDS.sleep(1);
                System.out.println("cccccccccc");
                XYZZZList.add(new EmptyObject());
                XYZZZList.add(new EmptyObject());
                XYZZZList.add(new EmptyObject());
                XYZZZList.add(new EmptyObject());
                XYZZZList.add(new EmptyObject());
            }
        } catch (InterruptedException e) {
        }
    }
}
class EmptyObject {
    int num;
    short shortNum;
    double d1;
    double d2;
    double d3;
    double d4;
    double d5;
}

运行程序后使用arthas执行如下操作:

[root@localhost arthas]# java -jar arthas-boot-1.jar 
[INFO] arthas-boot version: 3.6.0
[INFO] Found existing java process, please choose one and input the serial number of the process, eg : 1. Then hit ENTER.
* [1]: 2019 dongshi/daddy/huohuo/B
1
[INFO] arthas home: /root/.arthas/lib/3.7.0/arthas
[INFO] Try to attach process 2019
[INFO] Attach process 2019 success.
[INFO] arthas-client connect 127.0.0.1 3658
  ,---.  ,------. ,--------.,--.  ,--.  ,---.   ,---.                           
 /  O  \ |  .--. ''--.  .--'|  '--'  | /  O  \ '   .-'                          
|  .-.  ||  '--'.'   |  |   |  .--.  ||  .-.  |`.  `-.                          
|  | |  ||  |\  \    |  |   |  |  |  ||  | |  |.-'    |                         
`--' `--'`--' '--'   `--'   `--'  `--'`--' `--'`-----'                          

wiki       https://arthas.aliyun.com/doc                                        
tutorials  https://arthas.aliyun.com/doc/arthas-tutorials.html                  
version    3.7.0                                                                
main_class                                                                      
pid        2019                                                                 
time       2023-08-08 13:29:59                                                  

[arthas@2019]$ profiler start --event=alloc 
Profiling started
[arthas@2019]$ profiler getSamples
4
[arthas@2019]$ profiler status
Profiling is running for 32 seconds
[arthas@2019]$ profiler stop
OK
profiler output file: /root/study/arthas/arthas-output/20230808-133051.html

--event=alloc设置对内存占用采样。打开生成的html报告,如下图:
在这里插入图片描述

可以清除的看到main方法调用aaaaaa方法最终调用到ccccc方法,并在ccccc方法中产生了大量的EmptyObject对象,占用了总内存的53.85%,酱!!!

写在后面

参考文章列表

Arthas trace命令

JVM性能调优实战篇(一)

Arthas sc命令

记一次Arthas火焰图(Flame Graph)性能分析实战

jvm之对象大小分析

阿里Java诊断工具 arthas - 排查线上环境内存使用过大、线程cpu使用率高问题

Logo

旨在为数千万中国开发者提供一个无缝且高效的云端环境,以支持学习、使用和贡献开源项目。

更多推荐