一、问题现象
- hive 执行 insert overwrite table dwd_prod_info_frequent_di partition(dt='20230225',rgn='cn') 双分区的形式。
INFO : Moving data to directory hdfs://nameservice1/data/hive/warehouse/2c/dwd/dwd_prod_info_frequent_di/dt=20230225/rgn=cn/.hive-staging_hive_2023-02-26_22-18-23_057_6764590617411203663-4519/-ext-10000 from hdfs://nameservice1/data/hive/warehouse/ninebot_dw_cn_2c/dwd/dwd_prod_t_bms_info_frequent_di/dt=20230225/rgn=cn/.hive-staging_hive_2023-02-26_22-18-23_057_6764590617411203663-4519/-ext-10002
INFO : Starting task [Stage-0:MOVE] in serial mode
INFO : Loading data to table ninebot_dw_cn_2c.dwd_prod_t_bms_info_frequent_di partition (dt=20230225, rgn=cn) from hdfs://nameservice1/data/hive/warehouse/ninebot_dw_cn_2c/dwd/dwd_prod_t_bms_info_frequent_di/dt=20230225/rgn=cn/.hive-staging_hive_2023-02-26_22-18-23_057_6764590617411203663-4519/-ext-10000
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. Unable to move source hdfs://nameservice1/data/hive/warehouse/ninebot_dw_cn_2c/dwd/dwd_prod_t_bms_info_frequent_di/dt=20230225/rgn=cn/.hive-staging_hive_2023-02-26_22-18-23_057_6764590617411203663-4519/-ext-10000 to destination hdfs://nameservice1/data/hive/warehouse/ninebot_dw_cn_2c/dwd/dwd_prod_t_bms_info_frequent_di/dt=20230225/rgn=cn
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: Map: 17 Cumulative CPU: 94943.39 sec HDFS Read: 4486454836 HDFS Write: 15886513875 HDFS EC Read: 0 SUCCESS
INFO : Total MapReduce CPU Time Spent: 1 days 2 hours 22 minutes 23 seconds 390 msec
INFO : Completed executing command(queryId=hive_20230226221823_ff9fc631-15fc-4ebf-871f-36265e72b8fd); Time taken: 6032.278 seconds
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. Unable to move source hdfs://nameservice1/data/hive/warehouse/ninebot_dw_cn_2c/dwd/dwd_prod_t_bms_info_frequent_di/dt=20230225/rgn=cn/.hive-staging_hive_2023-02-26_22-18-23_057_6764590617411203663-4519/-ext-10000 to destination hdfs://nameservice1/data/hive/warehouse/ninebot_dw_cn_2c/dwd/dwd_prod_t_bms_info_frequent_di/dt=20230225/rgn=cn
taskId-3318086,error
二、持续时间
连续好几天的任务都报了相同的错误
三、问题原因
使用的CDH hive 2.1.1-cdh6.1.1 版本,上网查资源说是bug,并没有修复
四、解决问题
-- 这个参数默认了cdh的版本并没有开启,所以手动开启一下
set hive.warehouse.subdir.inherit.perms = false;
--
set hive.insert.into.multilevel.dirs= true;
最终每天的周期调度还是失败,但是手动删除文件后就在重跑就可以了,问题比较奇怪,所以我从hive改成 spark ,改成spark 之后任务不失败了,并且发现了问题的端倪。
突然间发现不仅有spark 写的文件 ,还有hive生成的文件,所以猜测到有可能是谁的表location写错了。导致写文件冲突了,由于hive每次生成的文件名称都是固定的按照编号,那么如果两个hive同时写一个文件的话,就会出现move过去相同的文件,也就导致move报错。
既然猜测是 location 写错了,那么就要定位到底是那个表的location写错了。于是登录hive得元数据库。
MySQL [metastore]> select
-> t1.TBL_NAME,
-> t2.LOCATION
-> from (
-> select * from TBLS
-> ) t1 join (
-> select * from SDS
-> where LOCATION like '%data/hive/warehouse/2c/dwd/dwd_prod_info_frequent_di%'
-> ) t2 on t1.SD_ID=t2.SD_ID;
+------------------------------------+----------------------------------------------------------------------------------------------+
| TBL_NAME | LOCATION |
+------------------------------------+----------------------------------------------------------------------------------------------+
| dwd_prod_bms_info_frequent_v_di | hdfs://nameservice1/data/hive/warehouse/2c/dwd/dwd_prod_info_frequent_di |
| dwd_prod_info_frequent_di | hdfs://nameservice1/data/hive/warehouse/2c/dwd/dwd_prod_info_frequent_di |
+------------------------------------+----------------------------------------------------------------------------------------------+
查询得到 有两张表的location相同了;
查询一下 创建时间;
MySQL [metastore]> select * from TBLS where TBL_NAME = 'dwd_prod_bms_info_frequent_v_di';
+--------+-------------+--------+------------------+-------+------------+-----------+---------+------------------------------------+----------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | OWNER_TYPE | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+--------+------------------+-------+------------+-----------+---------+------------------------------------+----------------+--------------------+--------------------+
| 737871 | 1677150410 | 386004 | 0 | simba | USER | 0 | 4460486 | dwd_prod_bms_info_frequent_v_di | EXTERNAL_TABLE | NULL | NULL |
+--------+-------------+--------+------------------+-------+------------+-----------+---------+------------------------------------+----------------+--------------------+--------------------+
1 row in set (0.00 sec)
正好和出现问题的时间差不多。所以根本问题是move文件的时候文件名称相同冲突导致的。
于是修改表的location 问题解决。
五、参考资料
https://www.notion.so/20230227-hive-move-8b1ecfbe901547faa7011d27ddc94e74?pvs=4#3be0a81c98924d47a4b2507b1b523402
https://www.notion.so/20230227-hive-move-8b1ecfbe901547faa7011d27ddc94e74?pvs=4#5eebdf49984d499588226ff2c087fb7c