首页
学习
活动
专区
圈层
工具
发布
社区首页 >专栏 >零基础OceanBase数据库入门(7):集群性能诊断

零基础OceanBase数据库入门(7):集群性能诊断

作者头像
俊才
发布2026-04-13 17:13:06
发布2026-04-13 17:13:06
770
举报
文章被收录于专栏:数据库干货铺数据库干货铺

上一篇我们掌握了sys租户的三种连接方式,作为集群的“总管理员”,sys租户最核心的能力之一就是集群性能诊断。

不管是日常巡检、故障排查还是容量规划,都离不开性能诊断。本篇完全基于OceanBase官方demo,带你用黑屏命令快速诊断集群:CPU、内存、磁盘、IO四大核心指标,一键定位高负载租户、异常会话、合并转储状态,新手照着敲就能用。

一、先搞懂:性能诊断看什么

OceanBase集群性能,核心看4大核心资源:

CPU:是否过载、哪个租户占比高

内存:MemStore占用、是否触发冻结、各模块内存分布

磁盘:数据盘/日志盘使用率、租户磁盘占用

IO/后台任务:合并、转储、备份、导入导出是否影响性能

所有诊断操作,都在sys租户下执行,先登录:

代码语言:javascript
复制
# 登录sys租户(密码替换为你的)
obclient -h127.0.0.1 -P2881 -uroot@sys -p'OwoAyDN2uLeDIdn62gmn' -Doceanbase -A

二、核心诊断1:服务器资源总览(最常用)

直接查看OBServer的CPU、内存、数据盘、日志盘分配与使用率,一眼看懂集群负载。

代码语言:javascript
复制
-- 查看节点资源总览(CPU/内存/磁盘使用率)
select
  svr_ip as ip,
  CONCAT(cpu_capacity_max, 'C') as cpu_total,
  CONCAT(cpu_assigned_max, 'C') as cpu_assigned,
  CONCAT(TRUNCATE(mem_capacity / 1024 / 1024 / 1024, 2), 'G') as mem_total,
  CONCAT(TRUNCATE(mem_assigned / 1024 / 1024 / 1024, 2), 'G') as mem_assigned,
  CONCAT(TRUNCATE(data_disk_capacity / 1024 / 1024 / 1024, 2),'G') as total_data_size,
  CONCAT(TRUNCATE(log_disk_capacity / 1024 / 1024 / 1024, 2), 'G') as total_log_size,
  CONCAT(TRUNCATE((cpu_assigned_max / cpu_capacity_max) * 100, 2),'%') as cpu_assigned_percent,
  CONCAT(TRUNCATE((mem_assigned / mem_capacity) * 100, 2),'%') as mem_assigned_percent,
  CONCAT(TRUNCATE((data_disk_in_use / data_disk_capacity) * 100, 2),'%') as used_data_percent,
  CONCAT(TRUNCATE((log_disk_in_use / log_disk_capacity) * 100, 2),'%') as used_log_percent
from oceanbase.GV$OB_SERVERS;

结果解读:

  • cpu_assigned_percent:CPU分配率,过高会导致业务卡顿
  • mem_assigned_percent:内存分配率,超过80%需警惕
  • used_data_percent:数据盘使用率,超过85%必须扩容
  • used_log_percent:日志盘使用率,过高会影响写入

三、核心诊断 2:磁盘使用率细分排查

3.1 数据盘使用详情

代码语言:javascript
复制
-- 查看数据盘总使用情况
select
  CONCAT(TRUNCATE(data_disk_capacity/1024/1024/1024,2),'G') as total_data_size,
  CONCAT(TRUNCATE(data_disk_in_use/1024/1024/1024,2),'G') as used_data_size,
  CONCAT(TRUNCATE((data_disk_in_use/data_disk_capacity) * 100, 2),'%') as used_data_percent,
  CONCAT(TRUNCATE((data_disk_capacity - data_disk_in_use)/1024/1024/1024,2),'G') as free_data_size,
  CONCAT(TRUNCATE(((data_disk_capacity - data_disk_in_use)/data_disk_capacity)*100,2),'%') as free_data_percent
from oceanbase.GV$OB_SERVERS;

3.2 日志盘使用详情

代码语言:javascript
复制
-- 查看日志盘总使用情况
select
  CONCAT(TRUNCATE(log_disk_capacity/1024/1024/1024,2),'G') as total_log_size,
  CONCAT(TRUNCATE(log_disk_in_use/1024/1024/1024,2),'G') as used_log_size,
  CONCAT(TRUNCATE(log_disk_assigned/1024/1024/1024,2),'G') as assigned_log_size,
  CONCAT(TRUNCATE((log_disk_in_use/log_disk_capacity) * 100, 2),'%') as used_log_percent,
  CONCAT(TRUNCATE((log_disk_capacity - log_disk_assigned - log_disk_in_use)/1024/1024/1024,2),'G') as free_log_size,
  CONCAT(TRUNCATE(((log_disk_capacity - log_disk_assigned - log_disk_in_use)/log_disk_capacity)*100,2),'%') as free_log_percent
from oceanbase.GV$OB_SERVERS;

磁盘高负载优化建议:

  • 数据盘满:扩容磁盘、清理历史数据、分区归档
  • 日志盘满:清理过期日志、调整日志保留策略

四、核心诊断3:定位高CPU占用租户

CPU高是最常见故障,用这条SQL一键定位耗CPU的租户(每100value=1核CPU)。

代码语言:javascript
复制
-- 查找CPU占用超过10核的租户(value>1000)
select * from gv$sysstat where con_id>1000 and name = 'cpu usage' and value > 1000;

CPU高负载优化建议

  • 优化高消耗SQL
  • 对SQL做限流
  • 扩容租户CPU配额
  • 错峰执行批处理任务

五、核心诊断4:租户级资源占用排行

查看所有租户的CPU、内存、磁盘占用,定位资源大户。

代码语言:javascript
复制
-- 查看各租户资源使用详情
SELECT
  COALESCE(t1.TENANT_ID, -1) as TENANT_ID,
  TENANT_NAME,
  SUM(MAX_CPU) AS MAX_CPU,
  SUM(MIN_CPU) AS MIN_CPU,
  SUM(MAX_MEMORY) AS MAX_MEMORY,
  SUM(MIN_MEMORY) AS MIN_MEMORY,
  SUM(MAX_IOPS) AS MAX_IOPS,
  SUM(MIN_IOPS) AS MIN_IOPS,
  SUM(DATA_DISK_IN_USE) AS DATA_DISK_IN_USE,
  SUM(LOG_DISK_IN_USE) AS LOG_DISK_IN_USE
FROM
  (
    SELECT
      T1.UNIT_ID, T1.SVR_IP, T1.SVR_PORT, T1.ZONE, T2.TENANT_ID,
      T1.MIN_CPU, T1.MAX_CPU, T1.MIN_MEMORY, T1.MAX_MEMORY,
      T1.MIN_IOPS, T1.MAX_IOPS, T1.DATA_DISK_IN_USE, T1.LOG_DISK_IN_USE
    FROM
      (SELECT UNIT_ID,SVR_IP,SVR_PORT,ZONE,SUM(MIN_CPU) MIN_CPU,SUM(MAX_CPU) MAX_CPU,
        SUM(MEMORY_SIZE) MIN_MEMORY,SUM(MEMORY_SIZE) MAX_MEMORY,SUM(MIN_IOPS) MIN_IOPS,
        SUM(MAX_IOPS) MAX_IOPS,SUM(DATA_DISK_IN_USE) DATA_DISK_IN_USE,
        SUM(LOG_DISK_IN_USE) LOG_DISK_IN_USE FROM oceanbase.GV$OB_UNITS GROUP BY UNIT_ID) T1
    JOIN oceanbase.DBA_OB_UNITS T2 ON T1.UNIT_ID = T2.UNIT_ID
  ) t1
JOIN oceanbase.DBA_OB_TENANTS t2 ON t1.TENANT_ID = t2.TENANT_ID
WHERE TENANT_TYPE IN ('SYS', 'USER')
GROUP BY TENANT_ID;

六、核心诊断5:内存使用深度排查

6.1 MemStore内存使用

MemStore是写入内存区域,占比过高会触发冻结,影响写入。

代码语言:javascript
复制
-- 查看各租户MemStore使用情况
select
  /*+ READ_CONSISTENCY(WEAK),query_timeout(100000000) */
  TENANT_ID, SVR_IP,
  round(ACTIVE_SPAN / 1024 / 1024 / 1024, 2) ACTIVE_GB,
  round(MEMSTORE_USED / 1024 / 1024 / 1024, 2) TOTAL_GB,
  round(FREEZE_TRIGGER / 1024 / 1024 / 1024, 2) FREEZE_TRIGGER_GB,
  round(MEMSTORE_USED / FREEZE_TRIGGER * 100, 2) percent_trigger,
  round(MEMSTORE_LIMIT / 1024 / 1024 / 1024, 2) MEM_LIMIT_GB
from V$OB_MEMSTORE
where tenant_id > 1000 or TENANT_ID = 1
order by tenant_id, TOTAL_GB desc;

6.2 内存模块占用Top10

代码语言:javascript
复制
-- 查看内存占用最大的10个模块
select * from V$OB_MEMORY order by hold desc limit 10;

内存高负载优化建议:

  • 扩大租户内存配额
  • 调高转储线程数,加快内存释放
  • 开启写入限速,避免内存暴涨
  • 应用侧降级批任务

七、核心诊断6:会话与连接排查

7.1 各租户活跃会话数

代码语言:javascript
复制
-- 查看各租户活跃会话数
select
  case when cnt is null then 0 else cnt end as cnt, tenant_name, tenant_id
from
  (select DBA_OB_TENANTS.tenant_name,DBA_OB_TENANTS.tenant_id,cnt
   from DBA_OB_TENANTS
   left join (select count(state = 'ACTIVE' OR NULL) cnt,tenant tenant_name
              from GV$OB_PROCESSLIST
              where svr_ip = '127.0.0.1' and svr_port = '2882'
              group by tenant) t1
   on DBA_OB_TENANTS.tenant_name = t1.tenant_name
   where DBA_OB_TENANTS.tenant_type <> 'META') t2;

7.2 实时非休眠会话

代码语言:javascript
复制
-- 查看活跃非睡眠会话
select user,host,state,count(*)
from GV$OB_PROCESSLIST
where state <> 'SLEEP'
group by user,host,state order by host limit 50;

八、核心诊断7:合并与转储状态(后台核心任务)

合并/转储是OB后台关键任务,运行中会占用大量IO/CPU,需排查状态。

代码语言:javascript
复制
-- sys租户查看所有租户合并状态
SELECT
  tenant_id, global_broadcast_scn AS broadcast_scn, is_error AS error,
  status, frozen_scn, last_scn, is_suspended AS suspend,
  info, start_time, last_finish_time
FROM CDB_OB_MAJOR_COMPACTION;

合并高负载优化建议

  • 暂停正在执行的合并
  • 暂停备份、导入导出任务
  • 调低转储 / 迁移并发数
  • 错峰执行后台任务

九、辅助诊断:节点、资源池、RS任务

9.1 查看OBServer运行状态

代码语言:javascript
复制
SELECT
  zone, svr_ip, svr_port, sql_port AS inner_port, with_rootserver,
  UPPER(status) AS status, build_version,
  stop_time * 1000000 AS stop_time,
  start_service_time * 1000000 AS start_service_time,
  last_offline_time * 1000000 AS last_offline_time
FROM oceanbase.DBA_OB_SERVERS;

9.2 查看资源池与Unit配置

代码语言:javascript
复制
-- 查看资源池、Unit规格详情
SELECT
  t1.MODIFY_TIME AS UPDATE_TIME, t1.RESOURCE_POOL_ID, t1.NAME,
  t1.UNIT_COUNT, t1.ZONE_LIST, t1.TENANT_ID, t2.MAX_CPU, t2.MIN_CPU,
  CONCAT(TRUNCATE(t2.MEMORY_SIZE/1024/1024/1024,2),'G') MEM_SIZE,
  CONCAT(TRUNCATE(t2.LOG_DISK_SIZE/1024/1024/1024,2),'G') LOG_DISK_SIZE
FROM oceanbase.DBA_OB_RESOURCE_POOLS t1
JOIN oceanbase.DBA_OB_UNIT_CONFIGS t2 ON t1.UNIT_CONFIG_ID = t2.UNIT_CONFIG_ID;

十、小结:性能诊断标准化流程

新手做OB性能诊断,按这个流程来,快速定位问题:

  • 看总览:执行资源总览SQL,看CPU/内存/磁盘使用率
  • 找租户:定位高负载租户,确认是哪个业务导致
  • 查内存:看MemStore是否占满,是否触发冻结
  • 查会话:看是否有大量活跃会话、慢查询
  • 查后台:确认合并/转储/备份是否在运行
  • 做优化:对应官方建议,扩容、限流、优化SQL
本文参与 腾讯云自媒体同步曝光计划,分享自微信公众号。
原始发表:2026-04-07,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 数据库干货铺 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档