How to install Shark in CDH 5.0.0 GA
Requirements for Shark
1. CDH5
2. Spark
Spark should already be installed in CDH 5, under /opt/cloudera/parcels/CDH/lib/spark.
Following these steps, you will install Shark 0.9.1 in /var/lib/spark on CDH 5.0.0 GA with Hadoop version 2.3.0.
/var/lib/spark is the default home directory of the spark user in CDH 5.0.0 GA.
You need to run these scripts as root or as the spark user (you need to change the spark user's shell to /bin/bash; by default it is nologin).
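The shell change mentioned above can be done with usermod; a small sketch, assuming you are root and the spark user was created by the CDH install:

```shell
# Give the spark user a login shell (CDH creates it with nologin).
usermod -s /bin/bash spark
# Verify the change: the 7th colon-separated field of the passwd entry
# is the login shell.
getent passwd spark | cut -d: -f7
```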
1. Download Shark source code
export SPARK_USER_HOME=/var/lib/spark
cd $SPARK_USER_HOME
wget http://www.scala-lang.org/files/archive/scala-2.10.3.tgz
tar zxf scala-2.10.3.tgz
wget https://github.com/amplab/shark/archive/v0.9.1.tar.gz
tar zxf v0.9.1.tar.gz
Or you can download my shark-0.9.1 build; it is compiled against the CDH 5.0.0 packages:
http://user.cs.tu-berlin.de/~tqiu/fxlive/dataset/shark-0.9.1-cdh-5.0.0.tar.gz
2. Configure Shark
We can use the Hive 0.12 that ships in CDH5, so we do not need to download the Spark/Shark build of the Hive 0.11 binaries.
Set the following in $SPARK_USER_HOME/shark-0.9.1/conf/shark-env.sh :
export SPARK_USER_HOME=/var/lib/spark
export SPARK_MEM=2g
export SHARK_MASTER_MEM=1g
export SCALA_HOME="$SPARK_USER_HOME/scala-2.10.3"
export HIVE_HOME="/opt/cloudera/parcels/CDH/lib/hive"
export HIVE_CONF_DIR="$HIVE_HOME/conf"
export HADOOP_HOME="/opt/cloudera/parcels/CDH/lib/hadoop"
export SPARK_HOME="/opt/cloudera/parcels/CDH/lib/spark"
export MASTER="spark://test01:7077"
SPARK_JAVA_OPTS=" -Dspark.local.dir=/tmp "
SPARK_JAVA_OPTS+="-Dspark.kryoserializer.buffer.mb=10 "
SPARK_JAVA_OPTS+="-verbose:gc -XX:-PrintGCDetails -XX:+PrintGCTimeStamps "
export SPARK_JAVA_OPTS
(Change the host name test01 in MASTER="spark://test01:7077" to your master's host name.)
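Since shark-env.sh is just a shell fragment, you can syntax-check your edits before launching anything; a minimal sketch using bash -n, which parses without executing:

```shell
# Parse shark-env.sh without executing it; a non-zero exit means a syntax
# error (e.g. an unclosed quote in one of the export lines above).
SPARK_USER_HOME=/var/lib/spark
bash -n "$SPARK_USER_HOME/shark-0.9.1/conf/shark-env.sh" && echo "shark-env.sh: syntax OK"
```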
3. Build Shark with Hadoop 2.3.0-cdh5.0.0
If you downloaded my shark-0.9.1 build above (http://user.cs.tu-berlin.de/~tqiu/fxlive/dataset/shark-0.9.1-cdh-5.0.0.tar.gz), you do not need to build it and can jump to Step 5. Otherwise, you need to compile shark-0.9.1 against hadoop 2.3.0-cdh5:
cd $SPARK_USER_HOME/shark-0.9.1/
SHARK_HADOOP_VERSION=2.3.0-cdh5.0.0 ./sbt/sbt package
It takes a long time, depending on your network; normally it will be quite slow.
So maybe now you would rather download the pre-built shark-0.9.1 package for CDH 5.0.0 GA.
Again, it is here:
http://user.cs.tu-berlin.de/~tqiu/fxlive/dataset/shark-0.9.1-cdh-5.0.0.tar.gz
4. Parquet support
wget http://repo1.maven.org/maven2/com/twitter/parquet-hive/1.2.8/parquet-hive-1.2.8.jar -O $SPARK_USER_HOME/shark-0.9.1/lib/parquet-hive-1.2.8.jar
ln -s /opt/cloudera/parcels/CDH/lib/hadoop/parquet-hadoop.jar $SPARK_USER_HOME/shark-0.9.1/lib/
ln -s /opt/cloudera/parcels/CDH/lib/hadoop/parquet-common.jar $SPARK_USER_HOME/shark-0.9.1/lib/
ln -s /opt/cloudera/parcels/CDH/lib/hadoop/parquet-encoding.jar $SPARK_USER_HOME/shark-0.9.1/lib/
ln -s /opt/cloudera/parcels/CDH/lib/hadoop/parquet-format.jar $SPARK_USER_HOME/shark-0.9.1/lib/
ln -s /opt/cloudera/parcels/CDH/lib/hadoop/parquet-avro.jar $SPARK_USER_HOME/shark-0.9.1/lib/
ln -s /opt/cloudera/parcels/CDH/lib/hadoop/parquet-column.jar $SPARK_USER_HOME/shark-0.9.1/lib/
ln -s /opt/cloudera/parcels/CDH/lib/hadoop/parquet-thrift.jar $SPARK_USER_HOME/shark-0.9.1/lib/
ln -s /opt/cloudera/parcels/CDH/lib/hadoop/parquet-generator.jar $SPARK_USER_HOME/shark-0.9.1/lib/
ln -s /opt/cloudera/parcels/CDH/lib/hadoop/parquet-cascading.jar $SPARK_USER_HOME/shark-0.9.1/lib/
ln -s /opt/cloudera/parcels/CDH/lib/hadoop/parquet-hadoop-bundle.jar $SPARK_USER_HOME/shark-0.9.1/lib/
ln -s /opt/cloudera/parcels/CDH/lib/hadoop/parquet-scrooge.jar $SPARK_USER_HOME/shark-0.9.1/lib/
I am not sure whether all of these jars are needed, but Parquet works with this set.
If you enable Parquet support, you need to set SPARK_MEM in $SPARK_USER_HOME/shark-0.9.1/conf/shark-env.sh to at least 2 GB.
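Since these are symlinks into the CDH parcel, it is worth checking that each one still resolves before starting the shell (e.g. after a parcel upgrade moves the jars); a minimal sketch, assuming the paths above:

```shell
# Report any parquet symlink in the Shark lib directory whose target
# no longer exists; [ -e ] follows the link and fails on a dangling one.
SPARK_USER_HOME=/var/lib/spark
for j in "$SPARK_USER_HOME"/shark-0.9.1/lib/parquet-*.jar; do
  [ -e "$j" ] || echo "broken link: $j"
done
```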
5. Deploy shark to all the worker nodes
#MASTER
cd $SPARK_USER_HOME
tar zcf shark.tgz shark-0.9.1
scp this file to each worker, or pull it from each worker:
#WORKER
sudo ln -s /usr/bin/java /bin/java
export SPARK_USER_HOME=/var/lib/spark
cd $SPARK_USER_HOME
scp shark@test01:$SPARK_USER_HOME/shark.tgz $SPARK_USER_HOME/
tar zxf shark.tgz
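With more than a couple of workers, the copy-and-unpack steps are easier in a loop run from the master; a sketch, where the WORKERS list (test02, test03) is an assumption to replace with your own hosts, and passwordless SSH for the spark user is assumed:

```shell
# Push the tarball to each worker and unpack it in the spark home directory.
SPARK_USER_HOME=/var/lib/spark
WORKERS="test02 test03"
for w in $WORKERS; do
  scp "$SPARK_USER_HOME/shark.tgz" "spark@$w:$SPARK_USER_HOME/"
  ssh "spark@$w" "cd $SPARK_USER_HOME && tar zxf shark.tgz"
done
```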
6. Configure Spark
If your Spark service cannot be started in CM5 (Cloudera Manager 5), you may need to remove the "noexec" option from the /var or /var/run mount point, using the command:
mount -o remount,exec /var/run
For a permanent fix, change the mount options on the /var or /var/run line in /lib/init/fstab.
You may need to go back to #MASTER and add the workers to /etc/spark/conf/slaves. For example, if you have two worker nodes:
echo "test02" >> /etc/spark/conf/slaves
echo "test03" >> /etc/spark/conf/slaves
Also, in /etc/spark/conf/spark-env.sh you may need to change
export STANDALONE_SPARK_MASTER_HOST=`hostname`
to
export STANDALONE_SPARK_MASTER_HOST=`hostname -f`
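A quick way to catch the short-name vs. fully-qualified-name problem is to check that every host in the slaves file actually resolves; a sketch, assuming the default CDH path:

```shell
# Flag any worker listed in the slaves file that does not resolve
# via DNS or /etc/hosts.
SLAVES_FILE=/etc/spark/conf/slaves
while read -r host; do
  [ -z "$host" ] && continue
  getent hosts "$host" > /dev/null || echo "cannot resolve: $host"
done < "$SLAVES_FILE"
```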
7. Run it!
Finally, you should be able to run the Shark shell. Go back to #MASTER:
$SPARK_USER_HOME/shark-0.9.1/bin/shark-withinfo -skipRddReload
The -skipRddReload flag is only needed when you have tables with Hive/HBase mappings, because of an issue with PassthroughOutputFormat in the Hive HBase handler.
The error message is something like:
"Property value must not be null"
or
"java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.HivePassThroughOutputFormat"
8. Issues
Ref.: http://bigdataanalyze.blogspot.de/2014/03/installing-shark-in-cdh5-beta2.html
It is a good guide for installing Shark in CDH5 beta2.
The author has also collected some common issues with Shark on CDH5 beta2: http://bigdataanalyze.blogspot.de/2014/03/issues-on-shark-with-cdh5-beta2-1.html
I've been trying to get Spark/Shark working on CDH5 for several days now, but with no success. Apparently, CDH5 now uses a newer version of the jets3t library (0.9.0) instead of the version that ships with Spark/Shark (0.7.1).
Consequently, installing Shark on CDH5 results in a class-not-found error. (See https://issues.apache.org/jira/browse/SPARK-1556)
But when I try either a) to replace the jets3t jars in the binary releases with 0.9.0, or b) to use a version of Spark and/or Shark that I've compiled myself with 0.9.0, I get a verify error. (See https://github.com/apache/spark/pull/468#issuecomment-42027309)
Any ideas on how to fix this and work around this dilemma?
Thanks!
For the record, the following solved the issue for me:
cd /usr/lib/shark/lib
ln -s /usr/lib/hadoop/lib/jets3t-0.9.0.jar
I attempted to follow the instructions here, but am stuck on this error. Any ideas?
[root@cdh-head spark]# ./shark-0.9.1/bin/shark-withinfo -hiveconf hive.root.logger=INFO,console
Starting the Shark Command Line Client
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.addDeprecations([Lorg/apache/hadoop/conf/Configuration$DeprecationDelta;)V
at org.apache.hadoop.mapreduce.util.ConfigUtil.addDeprecatedKeys(ConfigUtil.java:54)
at org.apache.hadoop.mapreduce.util.ConfigUtil.loadResources(ConfigUtil.java:42)
at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:118)
at org.apache.hadoop.hive.conf.HiveConf.initialize(HiveConf.java:1077)
at org.apache.hadoop.hive.conf.HiveConf.<init>(HiveConf.java:1039)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jCommon(LogUtils.java:74)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4j(LogUtils.java:58)
at shark.SharkCliDriver$.main(SharkCliDriver.scala:94)
at shark.SharkCliDriver.main(SharkCliDriver.scala)
Same exception here. Hadoop version 2.3.0-cdh5.0.1
Same here, did anyone ever solve this?
Fails for me too... I guess no one has solved it yet? Thanks!
I had a similar class problem that I managed to solve by not skipping Step 3 and actually recompiling Shark against the Cloudera version that I had installed (5.0.2).
To solve this, remove two obsolete files:
mv /root/shark-0.9.1/lib_managed/jars/org.apache.hadoop/hadoop-core/hadoop-core-1.0.4.jar{,.backup}
mv /root/shark-0.9.1/lib_managed/jars/org.apache.hadoop/hadoop-test/hadoop-test-0.20.2.jar{,.backup}
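The `{,.backup}` suffix in those commands is bash brace expansion: `file{,.backup}` expands to `file file.backup`, so each mv renames the jar out of the classpath instead of deleting it. A tiny illustration, with a hypothetical path:

```shell
# Brace expansion happens before the command runs, so mv receives two
# arguments: the original name and the .backup name.
echo /tmp/demo.jar{,.backup}
```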
First, we want to say thanks for the link; the steps provided seem to be working well. However, we have a version-compatibility issue; any help is appreciated.
We have Hadoop CDH 5.0.0 and followed the above procedure, but we were not able to run "select * from table;". We saw that the jars used were in the /var/lib/spark/shark-0.9.1/lib_managed/jars/edu.berkeley.cs.shark/ directory and replaced them with Hive 0.12 jars; now we get the errors below when we run $SPARK_USER_HOME/shark-0.9.1/bin/shark-withinfo -skipRddReload
Exception in thread "main" java.lang.IncompatibleClassChangeError: Implementing class
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at org.apache.hadoop.hive.shims.ShimLoader.createShim(ShimLoader.java:120)
at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:115)
at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:80)
at org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator.setConf(HadoopDefaultAuthenticator.java:51)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.hive.ql.metadata.HiveUtils.getAuthenticator(HiveUtils.java:365)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:285)
at shark.SharkCliDriver$.main(SharkCliDriver.scala:128)
at shark.SharkCliDriver.main(SharkCliDriver.scala)
Thanks in advance.
Adding to the above: initially, when we followed the original steps, we were able to start the Shark service and see the tables from the Hive metastore, but not to do anything further. Then we replaced the jars with the Hive 0.12 jars, and now the service does not start; we get the above error. Thanks once again.
Oh, are you still using Shark? Maybe you can try the new release of Spark 1.1 and its Hive Thrift server; it should be a good replacement for Shark.
Hey guys, I had another go at this tonight. It seems you need to reference the Hadoop jars in the lib folder too. I did it with:
cd $SPARK_USER_HOME/shark-0.9.1/lib/
for a in `ls /opt/cloudera/parcels/CDH/lib/hadoop/hadoop*jar`; do ln -s $a `echo $a | cut -d"/" -f8`; done
but you get the idea. I also did all the parquet jars with the loop below, but I don't think you need to:
for a in `ls /opt/cloudera/parcels/CDH/lib/parquet/parquet*jar`; do ln -s $a `echo $a | cut -d"/" -f8`; done
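The `cut -d"/" -f8` works for that exact parcel path but breaks if the directory depth changes; a slightly more robust sketch of the same loop derives the link name with basename instead:

```shell
# Link every CDH hadoop jar into the current directory, using basename
# for the link name so the path depth no longer matters.
for a in /opt/cloudera/parcels/CDH/lib/hadoop/hadoop*.jar; do
  ln -s "$a" "$(basename "$a")"
done
```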