
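A production service suddenly pegs the CPU at 100%, API calls time out, a Full GC runs every few minutes, a deadlock freezes the service, or an exception occurs without any log output. Restarting makes the symptom disappear, but the root cause is never found and the problem comes back. The traditional tools jstack, jmap, and jhat only produce point-in-time snapshots that have to be analyzed by hand, cannot observe method arguments or hot-fix code, and are often unavailable under restricted production permissions, so they are a poor fit for incidents that must be resolved within minutes. Arthas, the Java diagnostic tool open-sourced by Alibaba, attaches to a running JVM without a restart or a code change and covers most JVM troubleshooting scenarios. This article covers advanced core usage, an end-to-end diagnosis workflow, and real production cases, so that JVM incidents move from reactive firefighting to targeted diagnosis, and from restart-and-hope to root-cause fixes.
Arthas builds on the JVM Instrumentation API. It attaches an agent to the target JVM after the process has started, then uses bytecode enhancement to modify the bytecode of already loaded classes. This enables method execution monitoring, capture of arguments and return values, and class hot swapping, all without stopping the target process or changing business code, and with very little intrusion on the application.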
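To make the attach-and-retransform mechanism concrete, here is a minimal sketch of a dynamically attached agent. It is not Arthas's implementation: the class names ObservingAgent and CountingTransformer are hypothetical, and the transformer only counts classes instead of rewriting bytecode. The agent jar would also need Agent-Class and Can-Retransform-Classes: true entries in its manifest.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical agent: observes classes under one package without changing their bytecode. */
public class ObservingAgent {

    /** Counts the classes offered to the transformer and leaves every class unchanged. */
    static final class CountingTransformer implements ClassFileTransformer {
        private final String internalPrefix; // internal name form, e.g. "com/jam/demo/"
        final AtomicInteger seen = new AtomicInteger();

        CountingTransformer(String internalPrefix) {
            this.internalPrefix = internalPrefix;
        }

        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            if (className != null && className.startsWith(internalPrefix)) {
                seen.incrementAndGet();
                // A real tool would rewrite classfileBuffer here with a bytecode library.
            }
            return null; // null tells the JVM that the class is unchanged
        }
    }

    /** Entry point the JVM calls when the agent jar is attached to a running process. */
    public static void agentmain(String args, Instrumentation inst) throws Exception {
        CountingTransformer transformer = new CountingTransformer("com/jam/demo/");
        inst.addTransformer(transformer, true); // true = takes part in retransformation
        for (Class<?> c : inst.getAllLoadedClasses()) {
            if (c.getName().startsWith("com.jam.demo.") && inst.isModifiableClass(c)) {
                inst.retransformClasses(c); // re-runs transformers on an already loaded class
            }
        }
        inst.removeTransformer(transformer);
    }
}
```

Returning a modified byte array from transform instead of null is the point at which a tool of this kind would inject its monitoring code.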

Download the official launcher jar and run it directly on the target server:
curl -O https://arthas.aliyun.com/arthas-boot.jar
java -jar arthas-boot.jar
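It lists all Java processes running on the server. Enter the number of the target process to attach and open the Arthas interactive console.
All examples in this article are built on the following environment: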
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>3.2.4</version>
<relativePath/>
</parent>
<groupId>com.jam</groupId>
<artifactId>demo</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>demo</name>
<description>Demo project for Arthas Troubleshooting</description>
<properties>
<java.version>17</java.version>
<mybatis-plus.version>3.5.6</mybatis-plus.version>
<fastjson2.version>2.0.52</fastjson2.version>
<guava.version>33.1.0-jre</guava.version>
<springdoc.version>2.5.0</springdoc.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
<groupId>com.baomidou</groupId>
<artifactId>mybatis-plus-spring-boot3-starter</artifactId>
<version>${mybatis-plus.version}</version>
</dependency>
<dependency>
<groupId>com.mysql</groupId>
<artifactId>mysql-connector-j</artifactId>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>com.alibaba.fastjson2</groupId>
<artifactId>fastjson2</artifactId>
<version>${fastjson2.version}</version>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>${guava.version}</version>
</dependency>
<dependency>
<groupId>org.springdoc</groupId>
<artifactId>springdoc-openapi-starter-webmvc-ui</artifactId>
<version>${springdoc.version}</version>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<version>1.18.32</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
<configuration>
<excludes>
<exclude>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
</exclude>
</excludes>
</configuration>
</plugin>
</plugins>
</build>
</project>
The examples use MySQL 8.0 with the following table:
CREATE TABLE `t_order` (
  `order_id` varchar(64) NOT NULL COMMENT 'Order ID',
  `user_id` varchar(64) NOT NULL COMMENT 'User ID',
  `amount` decimal(10,2) NOT NULL COMMENT 'Order amount',
  `status` tinyint NOT NULL DEFAULT '0' COMMENT 'Order status: 0 = pending payment, 1 = paid, 2 = cancelled',
  `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
  `update_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Update time',
  PRIMARY KEY (`order_id`),
  KEY `idx_user_id` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='Order table';
package com.jam.demo.entity;
import com.baomidou.mybatisplus.annotation.IdType;
import com.baomidou.mybatisplus.annotation.TableId;
import com.baomidou.mybatisplus.annotation.TableName;
import io.swagger.v3.oas.annotations.media.Schema;
import lombok.Data;
import java.math.BigDecimal;
import java.time.LocalDateTime;
/**
 * Order entity
 * @author ken
 * @date 2026-03-16
 */
@Data
@TableName("t_order")
@Schema(description = "Order entity")
public class Order {
    @TableId(type = IdType.ASSIGN_ID)
    @Schema(description = "Order ID")
    private String orderId;
    @Schema(description = "User ID")
    private String userId;
    @Schema(description = "Order amount")
    private BigDecimal amount;
    @Schema(description = "Order status")
    private Integer status;
    @Schema(description = "Creation time")
    private LocalDateTime createTime;
    @Schema(description = "Update time")
    private LocalDateTime updateTime;
}
package com.jam.demo.mapper;
import com.baomidou.mybatisplus.core.mapper.BaseMapper;
import com.jam.demo.entity.Order;
import org.apache.ibatis.annotations.Mapper;
/**
 * Order mapper interface
 * @author ken
 * @date 2026-03-16
 */
@Mapper
public interface OrderMapper extends BaseMapper<Order> {
}
package com.jam.demo.service;
import com.baomidou.mybatisplus.extension.service.IService;
import com.jam.demo.entity.Order;
/**
 * Order service interface
 * @author ken
 * @date 2026-03-16
 */
public interface OrderService extends IService<Order> {
    /**
     * Look up an order by its ID
     * @param orderId order ID
     * @return the order entity
     */
    Order getOrderById(String orderId);
}
package com.jam.demo.service.impl;
import com.baomidou.mybatisplus.core.conditions.query.LambdaQueryWrapper;
import com.baomidou.mybatisplus.extension.service.impl.ServiceImpl;
import com.jam.demo.entity.Order;
import com.jam.demo.mapper.OrderMapper;
import com.jam.demo.service.OrderService;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
/**
 * Order service implementation
 * @author ken
 * @date 2026-03-16
 */
@Slf4j
@Service
public class OrderServiceImpl extends ServiceImpl<OrderMapper, Order> implements OrderService {
    @Override
    public Order getOrderById(String orderId) {
        LambdaQueryWrapper<Order> queryWrapper = new LambdaQueryWrapper<>();
        queryWrapper.eq(Order::getOrderId, orderId);
        return this.getOne(queryWrapper);
    }
}
package com.jam.demo.controller;
import com.jam.demo.entity.Order;
import com.jam.demo.service.OrderService;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.tags.Tag;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.util.StringUtils;
import org.springframework.web.bind.annotation.*;
/**
 * Order controller
 * @author ken
 * @date 2026-03-16
 */
@Slf4j
@RestController
@RequestMapping("/order")
@Tag(name = "Order management", description = "Order-related endpoints")
public class OrderController {
    @Autowired
    private OrderService orderService;

    /**
     * Query order details
     * @param orderId order ID
     * @return order details
     */
    @GetMapping("/detail")
    @Operation(summary = "Query order details", description = "Query order details by order ID")
    public Order getOrderDetail(@RequestParam String orderId) {
        log.info("Start querying order details, order ID: {}", orderId);
        Order order = orderService.getOrderById(orderId);
        log.info("Finished querying order details, order ID: {}", orderId);
        return order;
    }
}
package com.jam.demo.controller;
import io.swagger.v3.oas.annotations.Operation;
import io.swagger.v3.oas.annotations.tags.Tag;
import lombok.extern.slf4j.Slf4j;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
/**
 * Controller that reproduces a deadlock
 * @author ken
 * @date 2026-03-16
 */
@Slf4j
@RestController
@RequestMapping("/deadlock")
@Tag(name = "Deadlock reproduction", description = "Endpoint that reproduces a deadlock")
public class DeadlockController {
    private static final Object LOCK_A = new Object();
    private static final Object LOCK_B = new Object();

    /**
     * Trigger a deadlock
     */
    @GetMapping("/trigger")
    @Operation(summary = "Trigger a deadlock", description = "Triggers a thread deadlock for the Arthas deadlock detection demo")
    public String triggerDeadlock() {
        log.info("Triggering deadlock");
        // Thread-0 takes LOCK_A first, then waits for LOCK_B
        Thread thread1 = new Thread(() -> {
            synchronized (LOCK_A) {
                log.info("Thread 1 holds LOCK_A");
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    log.error("Thread 1 interrupted", e);
                }
                synchronized (LOCK_B) {
                    log.info("Thread 1 holds LOCK_B");
                }
            }
        }, "Thread-0");
        // Thread-1 takes the locks in the opposite order, which closes the cycle
        Thread thread2 = new Thread(() -> {
            synchronized (LOCK_B) {
                log.info("Thread 2 holds LOCK_B");
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    log.error("Thread 2 interrupted", e);
                }
                synchronized (LOCK_A) {
                    log.info("Thread 2 holds LOCK_A");
                }
            }
        }, "Thread-1");
        thread1.start();
        thread2.start();
        log.info("Deadlock threads started");
        return "Deadlock triggered";
    }
}
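Hot swapping is one of Arthas's core capabilities: it replaces the method bodies of already loaded classes online, without a restart, so a production bug can be patched within minutes. The change lives only in the running JVM and is lost on restart, and a hot-swapped class may not add or remove fields or methods. Two core commands are easy to confuse here, redefine and retransform. The walkthrough below uses retransform, which the Arthas documentation recommends because a redefine can be reset when enhancement commands such as watch or trace run afterwards.
The getOrderDetail method of the production OrderController has a bug: an empty orderId triggers an exception. To fix it without a restart, decompile the class with jad, add the validation to the source, look up the class loader hash with sc, compile with mc, and load the new class with retransform: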
jad --source-only com.jam.demo.controller.OrderController > /tmp/OrderController.java
@GetMapping("/detail")
@Operation(summary = "Query order details", description = "Query order details by order ID")
public Order getOrderDetail(@RequestParam String orderId) {
    log.info("Start querying order details, order ID: {}", orderId);
    // New guard: reject empty or blank order IDs before hitting the service
    if (!StringUtils.hasText(orderId)) {
        log.error("Order ID must not be empty");
        throw new IllegalArgumentException("Order ID must not be empty");
    }
    Order order = orderService.getOrderById(orderId);
    log.info("Finished querying order details, order ID: {}", orderId);
    return order;
}
sc -d com.jam.demo.controller.OrderController | grep classLoaderHash
mc -c 1be6f5c0 /tmp/OrderController.java -d /tmp
retransform /tmp/com/jam/demo/controller/OrderController.class
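For API timeouts, slow calls, and intermittent exceptions, Arthas provides three core commands, trace, watch, and tt, which expose the execution context of a method without adding any log statements.
The trace command follows the call path inside a method and reports the time spent in each node, which pinpoints the slow call down to the source line. Advanced usage: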
trace -j com.jam.demo.controller.OrderController getOrderDetail
After running the command, call the endpoint; the full call path and time distribution are printed:
---ts=2026-03-16 10:00:00;thread_name=http-nio-8080-exec-1;id=23;is_daemon=true;priority=5;TCCL=org.springframework.boot.web.embedded.tomcat.TomcatEmbeddedWebappClassLoader@1be6f5c0
---[1200ms] com.jam.demo.controller.OrderController:getOrderDetail()
+---[0.23ms] org.slf4j.Logger:info() #24
+---[0.05ms] org.springframework.util.StringUtils:hasText() #25
+---[1195ms] com.jam.demo.service.impl.OrderServiceImpl:getOrderById() #29
+---[0.18ms] org.slf4j.Logger:info() #31
---[0.02ms] return #32
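# only print calls that take longer than 1000 ms
trace -j com.jam.demo.controller.OrderController getOrderDetail '#cost > 1000'
# match several classes and methods with a regular expression
trace -j -E com.jam.demo.controller.OrderController|com.jam.demo.service.impl.OrderServiceImpl getOrderDetail|getOrderById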
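The watch command observes a method's arguments, return value, thrown exception, and the target object (target), so the execution context is available without adding log statements. It cannot see local variables inside the method body:
# observe at method entry, on exception, on normal return, and on finish
watch com.jam.demo.controller.OrderController getOrderDetail "{params,returnObj,throwExp}" -b -e -s -f
# only print calls whose first argument equals 123456
watch com.jam.demo.controller.OrderController getOrderDetail "{params,returnObj}" "params[0].equals('123456')" -s
# inspect the service instance with its arguments and return value, expanding nested objects two levels
watch com.jam.demo.service.impl.OrderServiceImpl getOrderById "{target,params,returnObj}" -s -x 2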
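The tt (TimeTunnel) command records the full context of every call to a method, including arguments, return value, exception, elapsed time, and thread information. Recorded calls can be inspected later and replayed, which helps with intermittent production problems that cannot be reproduced on demand. A replay executes the method again, so avoid it for operations that are not idempotent:
# record every call to the method
tt -t com.jam.demo.controller.OrderController getOrderDetail
# list the recorded calls
tt -l
# show the details of the call with index 1001
tt -i 1001
# replay that call with the recorded arguments
tt -i 1001 -p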
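Arthas's thread command is built on the JVM's ThreadMXBean. It is more convenient than jstack: it sorts threads by live CPU usage, filters by thread state, and finds blocking threads, so stacks do not have to be analyzed by hand.
When production CPU usage reaches 100%, start by listing the busiest threads, which usually points at the culprit within seconds:
# show the 3 threads using the most CPU, with their stacks
thread -n 3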
"http-nio-8080-exec-2" Id=24 cpuUsage=80.2% RUNNABLE
at com.jam.demo.service.impl.OrderServiceImpl.calculateOrderAmount(OrderServiceImpl.java:88)
at com.jam.demo.service.impl.OrderServiceImpl.getOrderById(OrderServiceImpl.java:30)
at com.jam.demo.controller.OrderController.getOrderDetail(OrderController.java:29)
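The idea behind a busiest-threads listing can be approximated with the standard ThreadMXBean API. The sketch below is an illustration, not Arthas's code: the class BusiestThreads and its Sample record are hypothetical names, and it ranks threads by the CPU time they consumed during a short sampling window.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BusiestThreads {

    /** One sampled thread: name plus CPU nanoseconds consumed during the sampling window. */
    public record Sample(long id, String name, long cpuNanos) {}

    /** Returns up to n threads ordered by CPU time used within the window, busiest first. */
    public static List<Sample> top(int n, long windowMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Long, Long> before = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            before.put(id, mx.getThreadCpuTime(id)); // -1 if the thread died or timing is disabled
        }
        try {
            Thread.sleep(windowMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        List<Sample> samples = new ArrayList<>();
        for (long id : mx.getAllThreadIds()) {
            long now = mx.getThreadCpuTime(id);
            ThreadInfo info = mx.getThreadInfo(id);
            if (now < 0 || info == null) {
                continue; // thread ended during the window
            }
            long start = Math.max(before.getOrDefault(id, 0L), 0L);
            samples.add(new Sample(id, info.getThreadName(), now - start));
        }
        samples.sort(Comparator.comparingLong(Sample::cpuNanos).reversed());
        return samples.subList(0, Math.min(n, samples.size()));
    }

    public static void main(String[] args) {
        // A daemon thread that spins so that something shows up at the top of the list.
        Thread spinner = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) { }
        }, "busy-spinner");
        spinner.setDaemon(true);
        spinner.start();
        for (Sample s : top(3, 500)) {
            System.out.printf("%s id=%d cpu=%dms%n", s.name(), s.id(), s.cpuNanos() / 1_000_000);
        }
    }
}
```

Sampling a delta rather than the cumulative CPU time matters: a thread that was busy an hour ago but is idle now should not rank first.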
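The thread -b command finds the thread that holds a lock other threads are waiting for, so there is no need to read a jstack dump by hand. It currently supports only synchronized blocks, not java.util.concurrent locks:
thread -b
For the DeadlockController above, Thread-0 and Thread-1 block each other. The report below is shown in the format that HotSpot's own deadlock detection (as printed by jstack) uses for this situation; it names the deadlocked threads, the monitor each one is waiting for, and the thread that currently holds it: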
Found one Java-level deadlock:
=============================
"Thread-1":
waiting to lock monitor 0x00007f8a9b0c0000 (object 0x0000000700a00000, a java.lang.Object),
which is held by "Thread-0"
"Thread-0":
waiting to lock monitor 0x00007f8a9b0c0008 (object 0x0000000700a00008, a java.lang.Object),
which is held by "Thread-1"
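The same detection is available programmatically through ThreadMXBean.findDeadlockedThreads(). The sketch below is an assumption-labeled illustration rather than what Arthas runs internally: DeadlockProbe is a hypothetical class that builds a two-thread lock cycle like the controller above and then asks the JVM to report it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class DeadlockProbe {

    /** Starts two daemon threads that take two fresh locks in opposite order. */
    static void startDeadlock() {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        spawn("probe-1", lockA, lockB, bothHoldFirstLock);
        spawn("probe-2", lockB, lockA, bothHoldFirstLock);
    }

    private static void spawn(String name, Object first, Object second, CountDownLatch latch) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                latch.countDown();
                try {
                    latch.await(); // wait until the other thread holds its first lock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (second) {
                    // never reached: the other thread holds this lock and waits for ours
                }
            }
        }, name);
        t.setDaemon(true); // daemon threads do not keep the JVM alive
        t.start();
    }

    /** Polls the JVM until a deadlock is reported or the timeout elapses. */
    static ThreadInfo[] findDeadlock(long timeoutMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findDeadlockedThreads();
            if (ids != null) {
                return mx.getThreadInfo(ids, true, true);
            }
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return new ThreadInfo[0];
    }

    public static void main(String[] args) {
        startDeadlock();
        for (ThreadInfo info : findDeadlock(2000)) {
            System.out.printf("%s waits for %s held by %s%n",
                    info.getThreadName(), info.getLockName(), info.getLockOwnerName());
        }
    }
}
```

The latch replaces the Thread.sleep(1000) used in the controller, so the cycle forms deterministically instead of depending on timing.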
List all threads in the BLOCKED state to find out what they are stuck on:
thread --state BLOCKED
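For frequent Full GC, memory leaks, and OOM, Arthas provides a complete toolchain: it shows memory and GC state in real time without a restart and can change manageable JVM flags at runtime.
The dashboard command refreshes an overview of the JVM's runtime state and is the first step of most investigations: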
dashboard
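The output includes a thread panel (ID, name, state, CPU usage), a memory panel (heap and non-heap regions plus GC counts and times), and a runtime panel (OS, Java version, load average, uptime).
To judge whether GC is behaving abnormally, check the per-collector statistics. Arthas has no standalone gc command; the jvm command prints the collection count and accumulated time of each collector:
jvm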
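Per-collector counts and times are exposed by the JVM through the standard GarbageCollectorMXBean, and monitoring tools presumably read the same data. As a minimal sketch (GcStats is a hypothetical class name), the figures can be read like this:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

public class GcStats {

    /** Returns collector name -> {collection count, accumulated collection time in ms}. */
    public static Map<String, long[]> snapshot() {
        Map<String, long[]> stats = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both values are -1 when the collector does not report them.
            stats.put(gc.getName(), new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return stats;
    }

    public static void main(String[] args) {
        snapshot().forEach((name, v) ->
                System.out.printf("%s: count=%d, time=%dms%n", name, v[0], v[1]));
    }
}
```

Taking two snapshots a minute apart and comparing the old-generation collector's count gives a quick measure of how often full collections actually run.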
When production shows a memory leak or frequent Full GC, generate a heap dump for in-depth analysis:
heapdump --live /tmp/heapdump.hprof
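The --live option dumps only objects that are still reachable after a Full GC and leaves out collectable garbage, which shrinks the dump file considerably and makes leaks easier to spot. The resulting hprof file can be analyzed with tools such as MAT or JProfiler to find the objects that occupy the most memory and trace the leak to its root cause.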
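On HotSpot, a live-objects heap dump can also be requested from inside the JVM through HotSpotDiagnosticMXBean.dumpHeap. The sketch below assumes a HotSpot-based JDK; HeapDumper and the temporary file location are illustrative choices, not something the article's setup prescribes.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeapDumper {

    /** Writes a dump of live (reachable) objects into a fresh temp directory and returns the file. */
    public static Path dumpLive() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            Path dir = Files.createTempDirectory("heapdump");
            Path file = dir.resolve("live.hprof"); // target must not exist yet
            bean.dumpHeap(file.toString(), true);  // true = only live objects
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = dumpLive();
        System.out.printf("wrote %s (%d bytes)%n", file, Files.size(file));
    }
}
```

Requesting a live dump forces a full collection first, so expect a pause on a large heap and avoid running it at peak traffic.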
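Manageable JVM flags can be changed at runtime without a restart. Flags that are not manageable, such as MaxMetaspaceSize, can only be set at startup:
# list all JVM flags
vmoption
# write a heap dump automatically on OOM
vmoption HeapDumpOnOutOfMemoryError true
# set the directory the automatic heap dump is written to
vmoption HeapDumpPath /tmp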
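Whether a flag can be changed at runtime is something the JVM itself reports. As a sketch under the assumption of a HotSpot-based JDK (ManageableFlags is a hypothetical helper), HotSpotDiagnosticMXBean shows which flags are writeable and lets the writeable ones be set:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class ManageableFlags {
    private static final HotSpotDiagnosticMXBean BEAN =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);

    /** True if the flag can be changed while the JVM is running. */
    public static boolean isWriteable(String flag) {
        return BEAN.getVMOption(flag).isWriteable();
    }

    /** Sets a writeable flag and returns the value the JVM reports afterwards. */
    public static String set(String flag, String value) {
        BEAN.setVMOption(flag, value); // throws IllegalArgumentException for read-only flags
        return BEAN.getVMOption(flag).getValue();
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError writeable: "
                + isWriteable("HeapDumpOnOutOfMemoryError"));
        System.out.println("MaxMetaspaceSize writeable: " + isWriteable("MaxMetaspaceSize"));
        System.out.println("HeapDumpOnOutOfMemoryError now: "
                + set("HeapDumpOnOutOfMemoryError", "true"));
    }
}
```

Checking isWriteable first avoids the exception that a read-only flag would raise on setVMOption.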
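For NoClassDefFoundError, ClassNotFoundException, or a release whose code does not seem to take effect, the sc, sm, jad, and classloader commands locate the cause quickly.
Check whether the JVM has loaded the target class and inspect its details:
# search for all classes whose name contains Order
sc *Order*
# show class details, including the class loader, code source, and annotations
sc -d com.jam.demo.controller.OrderController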
Decompile the code the JVM is actually running to investigate releases that did not take effect or class conflicts:
jad --source-only com.jam.demo.controller.OrderController
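Inspect the class loader hierarchy, the number of classes each loader has loaded, and its URLs to investigate broken parent delegation or class conflicts:
# show statistics for all class loaders
classloader
# show the class loader hierarchy
classloader -t
# list all classes loaded by the given class loader
classloader -a -c 1be6f5c0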
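The parent chain and code source that these commands report can also be read with plain reflection. The sketch below is illustrative only: LoaderChain is a hypothetical helper, and printing the loader's hashCode in hex is an assumption about how such hashes are typically derived.

```java
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;

public class LoaderChain {

    /** Returns the loaders from the class's own loader up to the bootstrap loader. */
    public static List<String> chain(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (ClassLoader cl = type.getClassLoader(); cl != null; cl = cl.getParent()) {
            names.add(cl.getClass().getName() + "@" + Integer.toHexString(cl.hashCode()));
        }
        names.add("bootstrap"); // represented as null in the Java API
        return names;
    }

    /** Returns the jar or directory the class was loaded from, if the JVM records one. */
    public static String location(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        return source == null ? "(no code source, e.g. a JDK class)"
                : String.valueOf(source.getLocation());
    }

    public static void main(String[] args) {
        chain(LoaderChain.class).forEach(System.out::println);
        System.out.println("LoaderChain loaded from: " + location(LoaderChain.class));
        System.out.println("String loaded from: " + location(String.class));
    }
}
```

When two versions of a library are on the classpath, comparing location(...) for the suspect class against the expected jar is often enough to confirm a class conflict.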
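Arthas integrates async-profiler and can generate flame graphs for CPU, allocation, and lock events directly in production, without a restart, to locate performance bottlenecks:
# start CPU sampling for 30 seconds
profiler start --duration 30 --event cpu
# check the sampling status
profiler status
# stop sampling and write an HTML flame graph
profiler stop --format html
The generated flame graph opens directly in a browser. The width of a frame reflects its share of the collected samples (CPU time for the cpu event) and the vertical axis is the call stack, so the widest frames near the top point at the most expensive methods.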
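When an investigation needs DEBUG logs, change the log level at runtime without a restart:
# show the logger configuration for the given class
logger -n com.jam.demo.controller.OrderController
# set the log level of the given class to DEBUG
logger -n com.jam.demo.controller.OrderController -l DEBUG
# set the root logger level to INFO
logger --name ROOT --level info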
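Arthas supports OGNL expressions, which can read static fields, call methods, and invoke Spring beans inside the running JVM. Expressions run with full privileges and can change state, so use them with care in production. In a Spring Boot application with an embedded container, ContextLoader.getCurrentWebApplicationContext() returns null, so the examples below obtain the ApplicationContext through vmtool instead, assuming a single application context. When the application runs from a fat jar, add -c with the class loader hash from sc -d so the classes can be resolved:
# call a service method on a Spring bean
vmtool --action getInstances --className org.springframework.context.ApplicationContext --express 'instances[0].getBean("orderServiceImpl").getOrderById("123456")'
# read a static field
ognl '@com.jam.demo.controller.DeadlockController@LOCK_A'
# call a MyBatis-Plus mapper method directly
vmtool --action getInstances --className org.springframework.context.ApplicationContext --express 'instances[0].getBean("orderMapper").selectById("123456")'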

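Symptom: production CPU usage stays at 100%, API responses time out, alerts fire, and the problem reappears 10 minutes after a restart. Troubleshooting process:
Symptom: the production service runs a Full GC every 3 minutes, old-generation usage is still at 85% after collection, memory keeps growing, and the service eventually hits OOM. Troubleshooting process:
Symptom: the production order query endpoint times out intermittently, 3 to 5 times a day; it cannot be reproduced reliably, local tests pass, and there are no exception logs. Troubleshooting process: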
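Arthas covers the main JVM troubleshooting scenarios in production, from CPU, memory, threads, and class loading to API performance and hot fixes, all without restarting the service or changing code, which makes incident diagnosis far more efficient. The advanced usage and end-to-end workflow in this article have been validated in production. With these skills you can handle most JVM incidents calmly and move from a developer who fights fires reactively to an engineer who finds the root cause quickly and fixes it.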