HBase regions stuck in RIT (Region-In-Transition) state longer than the threshold
2024-10-22 15:54:00
# hbase
# Issue roundup
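Background
The master web UI shows that the nodes are online, but services that depend on HBase report errors.
Checking the master further reveals that a large number of regions are in the RIT state.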
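Cause
This problem is usually caused by lost block replicas on HDFS, which leaves the HBase data in an inconsistent state.
It can also happen when HBase data files were deleted directly in HDFS while the hbase:meta table still holds entries for that table, which results in RIT.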
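Fixes
assign
For tables that still exist, bring the faulty regions back online by reassigning them to a suitable RegionServer, then manually update the table's entry in the meta table.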

    # Recover the region
    hbase> assign 'rowkey'
    # Check the region state; it should become OPEN
    hbase> get 'hbase:meta','rowkey'
    # Manually switch the table state from ENABLING back to ENABLED; the table is then usable again
    hbase> put 'hbase:meta', 'tablename','table:state',"\b\0"

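hbase> assign 'rowkey'
This reassigns the specified region. Here rowkey identifies the region: the post calls it the region's start rowkey, and it can be looked up on the table's page under Tables in the web UI. Note that the shell's assign command expects a region name or an encoded region name, which is also the form of the row key in hbase:meta.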
The table's page in the web UI lists all regions of the table; locate the faulty region there and you can read off its start rowkey.
get 'hbase:meta','rowkey'
This reads the state of the region with the given rowkey from hbase:meta; the rowkey is the same one used above.
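To make it easier to recognize the right value, here is a minimal sketch that splits a region row key into its parts. It assumes the row key follows the usual `hbase:meta` layout `<table>,<startKey>,<regionId>.<encodedName>.`; the class name `RegionNameParts` and the sample row key are hypothetical, and region replicas or older name formats are not handled.

```java
public class RegionNameParts {

    // Splits "<table>,<startKey>,<regionId>.<encodedName>." into
    // {table, startKey, regionId, encodedName}. The layout is an assumption
    // stated in the text above, not something this code verifies against HBase.
    static String[] parse(String regionName) {
        int firstComma = regionName.indexOf(',');
        int lastComma = regionName.lastIndexOf(',');
        if (firstComma < 0 || lastComma == firstComma || !regionName.endsWith(".")) {
            throw new IllegalArgumentException("Unexpected region name: " + regionName);
        }
        String table = regionName.substring(0, firstComma);
        // The start key may be empty (first region) and may itself contain commas.
        String startKey = regionName.substring(firstComma + 1, lastComma);
        // Drop the trailing '.' and split "<regionId>.<encodedName>".
        String tail = regionName.substring(lastComma + 1, regionName.length() - 1);
        int dot = tail.indexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("Missing encoded name: " + regionName);
        }
        return new String[] {table, startKey, tail.substring(0, dot), tail.substring(dot + 1)};
    }

    public static void main(String[] args) {
        // Hypothetical row key of the first region of a table named demo_table.
        String[] parts = parse("demo_table,,1729583640000.0123456789abcdef0123456789abcdef.");
        System.out.println("table       = " + parts[0]);
        System.out.println("startKey    = '" + parts[1] + "'");
        System.out.println("regionId    = " + parts[2]);
        System.out.println("encodedName = " + parts[3]);
    }
}
```

The encoded name in the last position is the short form that the shell's `assign` also accepts.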
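put 'hbase:meta', 'tablename','table:state',"\b\0"
This changes the table state, manually switching the table from ENABLING to ENABLED so that it can be used normally again.
These commands manually repair the state of HBase tables and regions, for cases where a region was not assigned correctly or is stuck in an abnormal state.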
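The value `"\b\0"` in the `put` command looks opaque, so here is a small sketch of where the two bytes come from. It assumes that `table:state` holds a protobuf message with a single enum field (field number 1) and that the states are numbered ENABLED = 0, DISABLED = 1, DISABLING = 2, ENABLING = 3, as in HBase's `TableState` definition. The class `TableStateBytes` is hypothetical and uses no HBase classes.

```java
import java.util.Arrays;

public class TableStateBytes {
    // State numbers assumed from HBase's TableState protobuf enum.
    static final int ENABLED = 0;
    static final int DISABLED = 1;
    static final int DISABLING = 2;
    static final int ENABLING = 3;

    // Encodes a message with one varint field (field number 1) holding the state.
    static byte[] encode(int state) {
        int tag = (1 << 3) | 0; // field number 1, wire type 0 (varint) gives 0x08, i.e. "\b"
        // States 0..3 fit into a single varint byte.
        return new byte[] {(byte) tag, (byte) state};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode(ENABLED)));  // [8, 0], written "\b\0" in the shell
        System.out.println(Arrays.toString(encode(ENABLING))); // [8, 3]
    }
}
```

Under these assumptions, writing `"\b\0"` stores the same bytes HBase itself would write for an enabled table, which is why the table leaves the ENABLING state.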
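Manually deleting metadata
The root cause is that the metadata no longer matches the data files, and the table state in the metadata was never updated. Deleting the metadata of the faulty regions leaves only valid regions recorded for the table in the metadata.
Code to delete the metadata of a faulty region
Note: delete only the metadata of the faulty region. If the table's other regions are healthy, do not delete all of the table's metadata, or the healthy regions will break as well.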
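
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.filter.PrefixFilter;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.junit.Test;

    // Member of a test class; hbaseConn is an org.apache.hadoop.hbase.client.Connection
    // that is opened elsewhere (for example in a setup method).
    @Test
    public void testDeleteSpecificRegion() throws IOException {
        TableName tableName = TableName.valueOf("hbase:meta");

        // Match only hbase:meta rows whose key starts with the faulty region's row key.
        String regionRowKey = "rowkey";
        Scan scan = new Scan();
        scan.setFilter(new PrefixFilter(Bytes.toBytes(regionRowKey)));

        List<Delete> deletes = new ArrayList<>();
        // try-with-resources closes the table and the scanner even if the scan fails.
        try (Table table = hbaseConn.getTable(tableName);
             ResultScanner rs = table.getScanner(scan)) {
            for (Result r : rs) {
                // One Delete per row removes every cell of that row.
                byte[] row = r.getRow();
                System.out.println("Region rowkey to delete: " + Bytes.toString(row));
                deletes.add(new Delete(row));
            }

            if (!deletes.isEmpty()) {
                // Remember the count first: Table.delete(List) removes applied deletes from the list.
                int matched = deletes.size();
                table.delete(deletes);
                System.out.println("Deleted " + matched + " region metadata entries.");
            } else {
                System.out.println("No matching region found for deletion.");
            }
        }
    }
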
After the deletion, stop the master manually and delete the log files under the hbase/MasterProcWALs/ directory; otherwise HBase will still report RIT.
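The warning about deleting only the faulty region's metadata matters because a prefix match selects every row key that starts with the given bytes. The sketch below imitates that matching with plain strings instead of a live `hbase:meta` scan; the class `MetaPrefixMatch`, the table names, and the row keys are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MetaPrefixMatch {

    // Returns the row keys that start with the prefix, mimicking a row-prefix match.
    static List<String> matching(List<String> metaRowKeys, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String key : metaRowKeys) {
            if (key.startsWith(prefix)) {
                hits.add(key);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical hbase:meta row keys: two regions of "orders", one of "orders_bak".
        List<String> rows = Arrays.asList(
            "orders,,1700000000000.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.",
            "orders,row500,1700000000001.bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb.",
            "orders_bak,,1700000000002.cccccccccccccccccccccccccccccccc.");

        // A bare table name also matches other tables sharing the prefix.
        System.out.println(matching(rows, "orders").size());    // 3
        // "table," narrows it to one table, but still hits every region of it.
        System.out.println(matching(rows, "orders,").size());   // 2
        // Only the full row key of the faulty region selects a single row.
        System.out.println(matching(rows, rows.get(1)).size()); // 1
    }
}
```

Printing the matched row keys before calling delete, as the test method does, is a cheap way to confirm that only the intended region is selected.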