Friday, February 28, 2014

HiveServer2 does not return ResultSets in UTF-8 encoding (fixing garbled UTF-8 text over HiveServer2 JDBC)

Add the following environment variables to the Hive startup script ($HIVE_HOME/bin/hive):

export LANG=en_US.UTF-8
export HADOOP_OPTS="$HADOOP_OPTS -Dfile.encoding=UTF-8"

MapR cluster: /opt/mapr/hive/hive-0.11/bin/hive
Cloudera cluster: /opt/cloudera/parcels/CDH/lib/hive/bin/hive
Other Hadoop distributions: /usr/lib/hive/bin/hive (probably; check your installation)
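
If you are not sure which of these paths exists on a given node, a small helper can probe them in order. This is only a sketch: the function name first_existing_file is made up for illustration, and the candidate paths are simply the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper: print the first argument that is an existing regular
# file and return 0; return 1 if none of the candidates exists.
first_existing_file() {
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Example call (paths taken from the list above):
#   first_existing_file \
#       /opt/mapr/hive/hive-0.11/bin/hive \
#       /opt/cloudera/parcels/CDH/lib/hive/bin/hive \
#       /usr/lib/hive/bin/hive
```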

Make sure your data in HDFS is encoded in UTF-8. If it is not, set the LANG variable and file.encoding in HADOOP_OPTS to the encoding you used for the files in HDFS.
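
One way to check whether a file really is UTF-8 is to run it through iconv, converting from UTF-8 to UTF-8: iconv exits with a non-zero status when it meets an invalid byte sequence. The sketch below assumes iconv is installed on the node; the function name is_valid_utf8 and the HDFS path in the comment are placeholders, not part of Hive or Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: read data from stdin and succeed only if every byte
# sequence is valid UTF-8. Output and error messages are discarded; only the
# exit status of iconv is used.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Example call against a file in HDFS (placeholder path):
#   if hdfs dfs -cat /path/to/your/file | is_valid_utf8; then
#       echo "looks like UTF-8"
#   else
#       echo "not valid UTF-8, maybe GBK or another encoding"
#   fi
```

Note that a pure ASCII file also passes this check, since ASCII is a subset of UTF-8.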

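When we read data through Hive JDBC, non-ASCII characters such as Chinese or other CJK text are often not read correctly by default: they come back either garbled or as question marks.

Adding the two environment variables above to the Hive startup script fixes this. The location of the startup script differs between Hadoop distributions; the common paths are listed above.

The key point is that the encoding of the files you store in HDFS must match the encoding configured in the environment of the services that read them, such as hive, hiveserver2, and sqoop. You can ensure this by setting LANG in the startup script and specifying file.encoding in HADOOP_OPTS.

If the files you store in HDFS are UTF-8, use the settings above. If they are in another encoding such as GBK, use: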

export LANG="zh_CN.GBK"
export HADOOP_OPTS="$HADOOP_OPTS -Dfile.encoding=GBK"
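
Since the whole fix depends on LANG and file.encoding naming the same charset, a quick consistency check can catch a mismatch before the service is started. This is a rough sketch with made-up helper names. It compares the two strings literally, so it treats "UTF-8" and "utf8" as different, and it assumes the options in HADOOP_OPTS are separated by whitespace and contain no glob characters.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the last -Dfile.encoding=... flag
# found in the option string passed as $1 (empty line if none is present).
file_encoding_from_opts() {
    enc=""
    for opt in $1; do   # intentionally unquoted: split the string on whitespace
        case "$opt" in
            -Dfile.encoding=*) enc="${opt#-Dfile.encoding=}" ;;
        esac
    done
    printf '%s\n' "$enc"
}

# Hypothetical helper: print the codeset part of a locale name,
# e.g. "en_US.UTF-8" gives "UTF-8" (empty line if the name has no dot).
locale_codeset() {
    case "$1" in
        *.*) cs="${1#*.}"; printf '%s\n' "${cs%%@*}" ;;
        *)   printf '\n' ;;
    esac
}

# Succeed only if both values are set and name the same charset.
# $1 is a LANG value, $2 is a HADOOP_OPTS string.
encoding_env_consistent() {
    a=$(locale_codeset "$1")
    b=$(file_encoding_from_opts "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example call with the current environment:
#   encoding_env_consistent "$LANG" "$HADOOP_OPTS" || echo "encoding mismatch"
```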


© Chutium / Teng Qiu @ ABC Netz Group