上一篇我们掌握了sys租户的三种连接方式,作为集群的“总管理员”,sys租户最核心的能力之一就是集群性能诊断。
不管是日常巡检、故障排查还是容量规划,都离不开性能诊断。本篇完全基于OceanBase官方demo,带你用黑屏命令快速诊断集群:CPU、内存、磁盘、IO四大核心指标,一键定位高负载租户、异常会话、合并转储状态,新手照着敲就能用。
一、先搞懂:性能诊断看什么
OceanBase集群性能,核心看4大核心资源:
CPU:是否过载、哪个租户占比高
内存:MemStore占用、是否触发冻结、各模块内存分布
磁盘:数据盘/日志盘使用率、租户磁盘占用
IO/后台任务:合并、转储、备份、导入导出是否影响性能
所有诊断操作,都在sys租户下执行,先登录:
# 登录sys租户(密码替换为你的)
obclient -h127.0.0.1 -P2881 -uroot@sys -p'OwoAyDN2uLeDIdn62gmn' -Doceanbase -A
二、核心诊断1:服务器资源总览(最常用)
直接查看OBServer的CPU、内存、数据盘、日志盘分配与使用率,一眼看懂集群负载。
-- 查看节点资源总览(CPU/内存/磁盘使用率)
select
svr_ip as ip,
CONCAT(cpu_capacity_max, 'C') as cpu_total,
CONCAT(cpu_assigned_max, 'C') as cpu_assigned,
CONCAT(TRUNCATE(mem_capacity / 1024 / 1024 / 1024, 2), 'G') as mem_total,
CONCAT(TRUNCATE(mem_assigned / 1024 / 1024 / 1024, 2), 'G') as mem_assigned,
CONCAT(TRUNCATE(data_disk_capacity / 1024 / 1024 / 1024, 2),'G') as total_data_size,
CONCAT(TRUNCATE(log_disk_capacity / 1024 / 1024 / 1024, 2), 'G') as total_log_size,
CONCAT(TRUNCATE((cpu_assigned_max / cpu_capacity_max) * 100, 2),'%') as cpu_assigned_percent,
CONCAT(TRUNCATE((mem_assigned / mem_capacity) * 100, 2),'%') as mem_assigned_percent,
CONCAT(TRUNCATE((data_disk_in_use / data_disk_capacity) * 100, 2),'%') as used_data_percent,
CONCAT(TRUNCATE((log_disk_in_use / log_disk_capacity) * 100, 2),'%') as used_log_percent
from oceanbase.GV$OB_SERVERS;
结果解读:
三、核心诊断 2:磁盘使用率细分排查
3.1 数据盘使用详情
-- 查看数据盘总使用情况
select
CONCAT(TRUNCATE(data_disk_capacity/1024/1024/1024,2),'G') as total_data_size,
CONCAT(TRUNCATE(data_disk_in_use/1024/1024/1024,2),'G') as used_data_size,
CONCAT(TRUNCATE((data_disk_in_use/data_disk_capacity) * 100, 2),'%') as used_data_percent,
CONCAT(TRUNCATE((data_disk_capacity - data_disk_in_use)/1024/1024/1024,2),'G') as free_data_size,
CONCAT(TRUNCATE(((data_disk_capacity - data_disk_in_use)/data_disk_capacity)*100,2),'%') as free_data_percent
from oceanbase.GV$OB_SERVERS;
3.2 日志盘使用详情
-- 查看日志盘总使用情况
select
CONCAT(TRUNCATE(log_disk_capacity/1024/1024/1024,2),'G') as total_log_size,
CONCAT(TRUNCATE(log_disk_in_use/1024/1024/1024,2),'G') as used_log_size,
CONCAT(TRUNCATE(log_disk_assigned/1024/1024/1024,2),'G') as assigned_log_size,
CONCAT(TRUNCATE((log_disk_in_use/log_disk_capacity) * 100, 2),'%') as used_log_percent,
CONCAT(TRUNCATE((log_disk_capacity - log_disk_assigned - log_disk_in_use)/1024/1024/1024,2),'G') as free_log_size,
CONCAT(TRUNCATE(((log_disk_capacity - log_disk_assigned - log_disk_in_use)/log_disk_capacity)*100,2),'%') as free_log_percent
from oceanbase.GV$OB_SERVERS;
磁盘高负载优化建议:
四、核心诊断3:定位高CPU占用租户
CPU高是最常见故障,用这条SQL一键定位耗CPU的租户(每100value=1核CPU)。
-- 查找CPU占用超过10核的租户(value>1000)
select * from gv$sysstat where con_id>1000 and name = 'cpu usage' and value > 1000;
CPU高负载优化建议
五、核心诊断4:租户级资源占用排行
查看所有租户的CPU、内存、磁盘占用,定位资源大户。
-- 查看各租户资源使用详情
SELECT
COALESCE(t1.TENANT_ID, -1) as TENANT_ID,
TENANT_NAME,
SUM(MAX_CPU) AS MAX_CPU,
SUM(MIN_CPU) AS MIN_CPU,
SUM(MAX_MEMORY) AS MAX_MEMORY,
SUM(MIN_MEMORY) AS MIN_MEMORY,
SUM(MAX_IOPS) AS MAX_IOPS,
SUM(MIN_IOPS) AS MIN_IOPS,
SUM(DATA_DISK_IN_USE) AS DATA_DISK_IN_USE,
SUM(LOG_DISK_IN_USE) AS LOG_DISK_IN_USE
FROM
(
SELECT
T1.UNIT_ID, T1.SVR_IP, T1.SVR_PORT, T1.ZONE, T2.TENANT_ID,
T1.MIN_CPU, T1.MAX_CPU, T1.MIN_MEMORY, T1.MAX_MEMORY,
T1.MIN_IOPS, T1.MAX_IOPS, T1.DATA_DISK_IN_USE, T1.LOG_DISK_IN_USE
FROM
(SELECT UNIT_ID,SVR_IP,SVR_PORT,ZONE,SUM(MIN_CPU) MIN_CPU,SUM(MAX_CPU) MAX_CPU,
SUM(MEMORY_SIZE) MIN_MEMORY,SUM(MEMORY_SIZE) MAX_MEMORY,SUM(MIN_IOPS) MIN_IOPS,
SUM(MAX_IOPS) MAX_IOPS,SUM(DATA_DISK_IN_USE) DATA_DISK_IN_USE,
SUM(LOG_DISK_IN_USE) LOG_DISK_IN_USE FROM oceanbase.GV$OB_UNITS GROUP BY UNIT_ID) T1
JOIN oceanbase.DBA_OB_UNITS T2 ON T1.UNIT_ID = T2.UNIT_ID
) t1
JOIN oceanbase.DBA_OB_TENANTS t2 ON t1.TENANT_ID = t2.TENANT_ID
WHERE TENANT_TYPE IN ('SYS', 'USER')
GROUP BY TENANT_ID;
六、核心诊断5:内存使用深度排查
6.1 MemStore内存使用
MemStore是写入内存区域,占比过高会触发冻结,影响写入。
-- 查看各租户MemStore使用情况
select
/*+ READ_CONSISTENCY(WEAK),query_timeout(100000000) */
TENANT_ID, SVR_IP,
round(ACTIVE_SPAN / 1024 / 1024 / 1024, 2) ACTIVE_GB,
round(MEMSTORE_USED / 1024 / 1024 / 1024, 2) TOTAL_GB,
round(FREEZE_TRIGGER / 1024 / 1024 / 1024, 2) FREEZE_TRIGGER_GB,
round(MEMSTORE_USED / FREEZE_TRIGGER * 100, 2) percent_trigger,
round(MEMSTORE_LIMIT / 1024 / 1024 / 1024, 2) MEM_LIMIT_GB
from V$OB_MEMSTORE
where tenant_id > 1000 or TENANT_ID = 1
order by tenant_id, TOTAL_GB desc;
6.2 内存模块占用Top10
-- 查看内存占用最大的10个模块
select * from V$OB_MEMORY order by hold desc limit 10;
内存高负载优化建议:
七、核心诊断6:会话与连接排查
7.1 各租户活跃会话数
-- 查看各租户活跃会话数
select
case when cnt is null then 0 else cnt end as cnt, tenant_name, tenant_id
from
(select DBA_OB_TENANTS.tenant_name,DBA_OB_TENANTS.tenant_id,cnt
from DBA_OB_TENANTS
left join (select count(state = 'ACTIVE' OR NULL) cnt,tenant tenant_name
from GV$OB_PROCESSLIST
where svr_ip = '127.0.0.1' and svr_port = '2882'
group by tenant) t1
on DBA_OB_TENANTS.tenant_name = t1.tenant_name
where DBA_OB_TENANTS.tenant_type <> 'META') t2;
7.2 实时非休眠会话
-- 查看活跃非睡眠会话
select user,host,state,count(*)
from GV$OB_PROCESSLIST
where state <> 'SLEEP'
group by user,host,state order by host limit 50;
八、核心诊断7:合并与转储状态(后台核心任务)
合并/转储是OB后台关键任务,运行中会占用大量IO/CPU,需排查状态。
-- sys租户查看所有租户合并状态
SELECT
tenant_id, global_broadcast_scn AS broadcast_scn, is_error AS error,
status, frozen_scn, last_scn, is_suspended AS suspend,
info, start_time, last_finish_time
FROM CDB_OB_MAJOR_COMPACTION;
合并高负载优化建议
九、辅助诊断:节点、资源池、RS任务
9.1 查看OBServer运行状态
SELECT
zone, svr_ip, svr_port, sql_port AS inner_port, with_rootserver,
UPPER(status) AS status, build_version,
stop_time * 1000000 AS stop_time,
start_service_time * 1000000 AS start_service_time,
last_offline_time * 1000000 AS last_offline_time
FROM oceanbase.DBA_OB_SERVERS;
9.2 查看资源池与Unit配置
-- 查看资源池、Unit规格详情
SELECT
t1.MODIFY_TIME AS UPDATE_TIME, t1.RESOURCE_POOL_ID, t1.NAME,
t1.UNIT_COUNT, t1.ZONE_LIST, t1.TENANT_ID, t2.MAX_CPU, t2.MIN_CPU,
CONCAT(TRUNCATE(t2.MEMORY_SIZE/1024/1024/1024,2),'G') MEM_SIZE,
CONCAT(TRUNCATE(t2.LOG_DISK_SIZE/1024/1024/1024,2),'G') LOG_DISK_SIZE
FROM oceanbase.DBA_OB_RESOURCE_POOLS t1
JOIN oceanbase.DBA_OB_UNIT_CONFIGS t2 ON t1.UNIT_CONFIG_ID = t2.UNIT_CONFIG_ID;
十、小结:性能诊断标准化流程
新手做OB性能诊断,按这个流程来,快速定位问题: