Tuesday, July 22, 2014

Lighting a Spark With HBase Full Edition with real world examples ~ dependencies, classpaths, handling ByteArray in HBase KeyValue object

First of all, there are many resources on the internet about integrating HBase and Spark, such as:

Spark has its own example: https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala

MapR also has a cool sample: http://www.mapr.com/developercentral/code/loading-hbase-tables-spark

and here, a more detailed code snippet: http://www.vidyasource.com/blog/Programming/Scala/Java/Data/Hadoop/Analytics/2014/01/25/lighting-a-spark-with-hbase

But none of them says anything about:
  • which jar libraries are needed, i.e. the dependency problem
  • how to set the classpath when starting a Spark job/application that connects to HBase
  • sc.newAPIHadoopRDD returns values of this holy class org.apache.hadoop.hbase.client.Result, but the objects inside a Result are org.apache.hadoop.hbase.KeyValue, part of HBase's core client-side Java API. Sometimes calling getColumn("columnFamily".getBytes(), "columnQualifier".getBytes()) is really not enough, and, more importantly, working with KeyValue objects from Scala is even more complicated.
Therefore this post aims to be a "Full" version...

I assume you have already read the samples above, so I will go straight to solving these three problems.

If you only want to see some code, jump to the next part of this doc: http://www.abcn.net/2014/07/spark-hbase-result-keyvalue-bytearray.html

1. dependency problem

The dependencies are similar to those of a plain HBase client program.

for maven:

<dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.0.1</version>
</dependency>

<dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase</artifactId>
        <version>0.98.2-hadoop2</version>
</dependency>

<dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-client</artifactId>
        <version>0.98.2-hadoop2</version>
</dependency>

<dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-common</artifactId>
        <version>0.98.2-hadoop2</version>
</dependency>

<dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-server</artifactId>
        <version>0.98.2-hadoop2</version>
</dependency>

sbt:

libraryDependencies ++= Seq(
        "org.apache.spark" % "spark-core_2.10" % "1.0.1",
        "org.apache.hbase" % "hbase" % "0.98.2-hadoop2",
        "org.apache.hbase" % "hbase-client" % "0.98.2-hadoop2",
        "org.apache.hbase" % "hbase-common" % "0.98.2-hadoop2",
        "org.apache.hbase" % "hbase-server" % "0.98.2-hadoop2"
)

Change the Spark and HBase versions to match yours.

2. classpath

In the days of Spark 0.9.x, you just needed to set the environment variable SPARK_CLASSPATH to HBase's jars. For example, to start spark-shell in local mode on the CDH5 Hadoop distribution:
export SPARK_CLASSPATH=/opt/cloudera/parcels/CDH/lib/hbase/hbase-server.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-client.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-common.jar:/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core.jar
and then
./bin/spark-shell --master local[2]
or just
SPARK_CLASSPATH=/opt/cloudera/parcels/CDH/lib/hbase/hbase-server.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-client.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-common.jar:/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core.jar ./bin/spark-shell --master local[2]

On your cluster, change the jar paths to those of your HBase installation. Other Hadoop distributions use paths like /usr/lib/xxx (Hortonworks HDP) or /opt/mapr/hbase-xxx (MapR).
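
Typing that long jar list by hand is error-prone. Here is a minimal sketch of a helper that builds the same colon-separated string. The function name hbase_classpath and the HBASE_DIR variable are my own illustration (they are not part of Spark or HBase), and the jar names are simply the ones from the CDH5 example above:

```shell
# Build a colon-separated classpath from a base directory and a list of
# jar paths relative to it. hbase_classpath is a hypothetical helper name.
hbase_classpath() {
  local base_dir="$1"
  shift
  local cp="" jar
  for jar in "$@"; do
    # ${cp:+$cp:} adds the ":" separator only when cp is already non-empty
    cp="${cp:+$cp:}${base_dir}/${jar}"
  done
  printf '%s\n' "$cp"
}

# Usage (Spark 0.9.x style); HBASE_DIR is an assumed variable, adjust it:
#   HBASE_DIR=/opt/cloudera/parcels/CDH/lib/hbase
#   export SPARK_CLASSPATH="$(hbase_classpath "$HBASE_DIR" hbase-server.jar \
#     hbase-protocol.jar hbase-hadoop2-compat.jar hbase-client.jar \
#     hbase-common.jar lib/htrace-core.jar)"
#   ./bin/spark-shell --master local[2]
```

With this, switching to another distribution should only mean changing the base directory, as long as the jar names there are the same.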

But, but... this lovely SPARK_CLASSPATH is deprecated in the new era of Spark 1.x!!! -_-

So, in Spark 1.x, there is one conf property and one command-line argument for this:
spark.executor.extraClassPath
and
--driver-class-path

WTF... but yes, you must give the whole jar list twice! And spark.executor.extraClassPath must be set in a conf file; at least in Spark 1.0.x it cannot be set via the command line...

So you need to edit conf/spark-defaults.conf and add this:
spark.executor.extraClassPath  /opt/cloudera/parcels/CDH/lib/hive/lib/hive-hbase-handler.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-server.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-client.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-common.jar:/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core.jar
Then start spark-shell, or submit your Spark job, passing the HBase jars to the driver via the command-line argument --driver-class-path:
./bin/spark-shell --master local[2]  --driver-class-path  /opt/cloudera/parcels/CDH/lib/hbase/hbase-server.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-hadoop2-compat.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-client.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-common.jar:/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core.jar
Unbelievable, but that is how it works in Spark 1.x ...
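
Since the jar list is needed in two places, it helps to keep it in one shell variable and reuse it. A rough sketch: set_executor_classpath is a hypothetical helper of mine (not a Spark command) that appends the property to a conf file only if the key is not there yet, and HBASE_CP is an assumed variable holding your colon-separated jar list:

```shell
# Append spark.executor.extraClassPath to a Spark conf file, unless the
# key is already present. Returns 1 (and changes nothing) if it is.
set_executor_classpath() {
  local conf_file="$1" cp="$2"
  if grep -q '^spark\.executor\.extraClassPath' "$conf_file" 2>/dev/null; then
    echo "spark.executor.extraClassPath already set in $conf_file" >&2
    return 1
  fi
  printf 'spark.executor.extraClassPath  %s\n' "$cp" >> "$conf_file"
}

# Usage, with HBASE_CP holding the colon-separated jar list:
#   set_executor_classpath conf/spark-defaults.conf "$HBASE_CP"
#   ./bin/spark-shell --master local[2] --driver-class-path "$HBASE_CP"
```

The helper refuses to overwrite an existing entry on purpose, so an already tuned spark-defaults.conf is never silently changed; edit that line by hand if it needs updating.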

3. how to use org.apache.hadoop.hbase.KeyValue in scala with Spark

This post is already long enough, so let us take a break. To see the code of the real-world examples, go to the next part of this doc: http://www.abcn.net/2014/07/spark-hbase-result-keyvalue-bytearray.html


© Chutium / Teng Qiu @ ABC Netz Group