Using an Oozie coordinator together with an Oozie workflow, we can schedule our data processing tasks. Oozie can also be used as a monitoring and error-handling system.
Background
Before introducing the Oozie framework in detail, we need some background on Oozie.
Oozie (http://oozie.apache.org/) is a workflow scheduler system to manage Apache Hadoop jobs. Workflows and schedulers (in Oozie, a scheduler is called a coordinator) are defined in XML files. Oozie provides a management CLI and a web interface that shows its running workflows. You can also communicate with Oozie from a Java program using the Oozie Java Client library.
XML Schema
There are two types of XML definition: the workflow, which defines the payload of an Oozie job, and the coordinator, which defines the scheduling information of an Oozie job.
You can use the following command to check whether an XML file is a valid workflow or coordinator definition in the Oozie framework:
$ oozie validate workflow.xml
Next, let's take a closer look at these two XML schemas.
Workflow
According to the workflow schema, a workflow must have exactly one start element and exactly one end element. Between them, it can contain any number (zero or more) of actions, decisions and other nodes. Some of the action and decision elements are explained in the Example chapter.
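To make the start/end rule concrete, here is a minimal sketch that embeds a trivial workflow (its start node transitions straight to its end node) and counts the start and end elements with the JDK's DOM parser. The workflow name demo-wf and the schema version uri:oozie:workflow:0.4 are assumptions for illustration, and this check is no substitute for oozie validate, which checks the full schema.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class WorkflowShapeCheck {

    // Hypothetical minimal workflow: <start> transitions directly to <end>.
    static final String MINIMAL_WORKFLOW =
            "<workflow-app xmlns=\"uri:oozie:workflow:0.4\" name=\"demo-wf\">"
            + "<start to=\"end\"/>"
            + "<end name=\"end\"/>"
            + "</workflow-app>";

    // Parses the XML namespace-aware and wraps checked parser errors.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed XML document", e);
        }
    }

    // True if the workflow has exactly one <start> and exactly one <end> element.
    static boolean hasSingleStartAndEnd(String workflowXml) {
        Document doc = parse(workflowXml);
        int starts = doc.getElementsByTagNameNS("*", "start").getLength();
        int ends = doc.getElementsByTagNameNS("*", "end").getLength();
        return starts == 1 && ends == 1;
    }

    public static void main(String[] args) {
        System.out.println(hasSingleStartAndEnd(MINIMAL_WORKFLOW));
    }
}
```

Any number of action or decision nodes could be added between the two elements without changing the result.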
Coordinator
The coordinator schema shows that the action element of a coordinator contains exactly one workflow. That means one coordinator can control only one workflow.
However, one workflow can have any number of sub-workflows.
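The one-coordinator-one-workflow relation is visible in the XML: the coordinator's action element wraps a single workflow element whose app-path points to the workflow application. A minimal sketch that reads that path with the JDK's DOM parser; the coordinator name, dates, frequency and HDFS path are hypothetical values.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class CoordinatorWorkflowPath {

    // Hypothetical daily coordinator that triggers exactly one workflow.
    static final String COORDINATOR =
            "<coordinator-app xmlns=\"uri:oozie:coordinator:0.2\" name=\"demo-coord\""
            + " frequency=\"${coord:days(1)}\" start=\"2013-01-05T00:00Z\""
            + " end=\"2013-12-31T00:00Z\" timezone=\"UTC\">"
            + "<action><workflow>"
            + "<app-path>hdfs://namenode:8020/user/demo/apps/wf</app-path>"
            + "</workflow></action>"
            + "</coordinator-app>";

    // Parses the XML namespace-aware and wraps checked parser errors.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed XML document", e);
        }
    }

    // Returns the app-path of the single workflow inside the coordinator's <action>.
    static String workflowAppPath(String coordinatorXml) {
        NodeList workflows = parse(coordinatorXml).getElementsByTagNameNS("*", "workflow");
        if (workflows.getLength() != 1) {
            throw new IllegalArgumentException(
                    "expected exactly one <workflow>, found " + workflows.getLength());
        }
        NodeList paths = ((Element) workflows.item(0)).getElementsByTagNameNS("*", "app-path");
        if (paths.getLength() != 1) {
            throw new IllegalArgumentException("<workflow> needs exactly one <app-path>");
        }
        return paths.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        System.out.println(workflowAppPath(COORDINATOR));
    }
}
```

Sub-workflows, by contrast, are declared inside the workflow definition, so the coordinator only ever sees the single top-level workflow.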
Management Interface
Oozie provides three kinds of management interface. Everything you want to do with Oozie can be done in the command line interface (CLI). There is also a web interface, reachable through the so-called Oozie URL, but it is read-only: you can view information in the browser, but you cannot manage your Oozie server or Oozie workflows from it. The third choice is just as powerful: you can do everything from your Java program with the Oozie Java Client library.
CLI
Some useful commands:
Start an Oozie workflow (or coordinator):
$ oozie job -oozie http://fxlive.de:11000/oozie -config /some/where/job.properties -run
Get the information of an Oozie job by its ID (such as 0000001-130104191423486-oozie-oozi-W):
$ oozie job -oozie http://fxlive.de:11000/oozie -info 0000001-130104191423486-oozie-oozi-W
Get the log of an Oozie job:
$ oozie job -oozie http://fxlive.de:11000/oozie -log 0000001-130104191423486-oozie-oozi-W
Stop (kill) an Oozie job:
$ oozie job -oozie http://fxlive.de:11000/oozie -kill 0000003-130104191423486-oozie-oozi-W
The "-oozie" option points to the so-called Oozie URL. You have to give this URL explicitly with each command, unless the OOZIE_URL environment variable is set, which the CLI then uses as the default.
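When jobs are started from another program instead of a terminal, the commands above can be assembled as argument lists for ProcessBuilder. A minimal sketch, assuming the oozie CLI is on the PATH; it also writes a small job.properties with oozie.wf.application.path (a coordinator would use oozie.coord.application.path instead). The names nameNode and jobTracker are only conventional variables that a workflow.xml may reference, and all host names and paths here are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OozieCli {

    // Builds ["oozie", "job", "-oozie", <url>, <options...>] for ProcessBuilder.
    static List<String> jobCommand(String oozieUrl, String... options) {
        List<String> cmd = new ArrayList<>(Arrays.asList("oozie", "job", "-oozie", oozieUrl));
        cmd.addAll(Arrays.asList(options));
        return cmd;
    }

    // Content of a minimal job.properties; nameNode and jobTracker are
    // conventional (hypothetical) variables referenced as ${nameNode} etc.
    static String jobProperties(String nameNode, String jobTracker, String appPath) {
        return "nameNode=" + nameNode + "\n"
                + "jobTracker=" + jobTracker + "\n"
                + "oozie.wf.application.path=" + appPath + "\n";
    }

    public static void main(String[] args) throws Exception {
        Path props = Files.createTempFile("job", ".properties");
        Files.write(props, jobProperties("hdfs://namenode:8020", "jobtracker:8021",
                "${nameNode}/user/demo/apps/wf").getBytes(StandardCharsets.UTF_8));
        // Same as: oozie job -oozie <url> -config <file> -run
        List<String> run = jobCommand("http://fxlive.de:11000/oozie",
                "-config", props.toString(), "-run");
        Process p = new ProcessBuilder(run).inheritIO().start(); // needs the oozie CLI
        System.exit(p.waitFor());
    }
}
```

The -info, -log and -kill commands follow the same pattern, e.g. jobCommand(url, "-kill", jobId).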
Web
The web interface is available at the Oozie URL itself. In this example, it is: http://fxlive.de:11000/oozie
Opening this URL in your browser, you can view all information about running jobs and configurations.
Java Client
There is also a Java client library for Oozie.
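The library's entry point is the class org.apache.oozie.client.OozieClient. To keep this sketch free of extra jars, it instead calls the Oozie web services API (the HTTP endpoints under the Oozie URL that, as far as we know, the CLI and OozieClient also use) with the JDK 11 HttpClient. The v1 path /v1/job/<job-id>?show=info is an assumption and may differ between Oozie versions; the job ID is the example one from the CLI section.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class OozieRest {

    // <oozieUrl>/v1/job/<jobId>?show=<what>, where what is e.g. "info" or "log".
    static URI jobUri(String oozieUrl, String jobId, String show) {
        String base = oozieUrl.endsWith("/")
                ? oozieUrl.substring(0, oozieUrl.length() - 1)
                : oozieUrl;
        return URI.create(base + "/v1/job/" + jobId + "?show=" + show);
    }

    // Fetches the job information as JSON; needs a reachable Oozie server.
    static String jobInfo(String oozieUrl, String jobId) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(jobUri(oozieUrl, jobId, "info")).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Oozie returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        System.out.println(jobInfo("http://fxlive.de:11000/oozie",
                "0000001-130104191423486-oozie-oozi-W"));
    }
}
```

For real applications the OozieClient library is the more convenient choice, since it parses these JSON responses into Java objects.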
=== Tips ===
1) Deploy Oozie ShareLib in HDFS
http://blog.cloudera.com/blog/2012/12/how-to-use-the-sharelib-in-apache-oozie/
https://ccp.cloudera.com/display/CDH4DOC/Oozie+Installation#OozieInstallation-InstallingtheOozieShareLibinHadoopHDFS
$ mkdir /tmp/ooziesharelib
$ cd /tmp/ooziesharelib
$ tar zxf /usr/lib/oozie/oozie-sharelib.tar.gz
$ sudo -u oozie hadoop fs -put share /user/oozie/share
2) Oozie Sqoop Action arguments from properties
The Oozie Sqoop action does not handle a multi-line Sqoop command from a property file very well. Instead, we should use <arg> tags to pass the Sqoop command line to the Sqoop job one argument at a time.
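Writing one <arg> element per token by hand is tedious, so a small generator can help. A minimal sketch, assuming the command contains no quoted values with embedded spaces (it simply splits on whitespace); the host, database, table and directory names are hypothetical. It escapes &, < and > so that JDBC URLs with query parameters stay valid inside workflow.xml.

```java
import java.util.StringJoiner;

class SqoopArgs {

    // Escapes the characters that cannot appear verbatim in XML text ("&" first).
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Turns "import --connect ... --table t" into one <arg> element per token.
    // Naive: splits on whitespace, so quoted values containing spaces are not supported.
    static String toArgElements(String sqoopCommand) {
        if (sqoopCommand.trim().isEmpty()) {
            return "";
        }
        StringJoiner xml = new StringJoiner("\n");
        for (String token : sqoopCommand.trim().split("\\s+")) {
            xml.add("<arg>" + escapeXml(token) + "</arg>");
        }
        return xml.toString();
    }

    public static void main(String[] args) {
        System.out.println(toArgElements(
                "import --connect jdbc:mysql://db.example.com/shop --table orders"
                + " --target-dir /user/demo/orders"));
    }
}
```

The generated lines go inside the <sqoop> action element in place of a single <command>.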
3) ZooKeeper connection problem when importing into HBase using the Sqoop action
Problem: The mappers of a Sqoop action try to access ZooKeeper on localhost instead of the cluster's ZooKeeper.
Solution:
- Open Cloudera Manager for the corresponding cluster
- Go to the ZooKeeper service and download hbase-site.xml
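- Copy hbase-site.xml into the Sqoop ShareLib directory in HDFS; with the upload from tip 1 this is /user/oozie/share/lib/sqoop/ (the local copy before the upload is /tmp/ooziesharelib/share/lib/sqoop/)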
4) ZooKeeper connection problem with the HBase Java client
Just like the ZooKeeper problem with the Sqoop action, we can put hbase-site.xml into the Oozie common ShareLib. Alternatively, if we want to load the HBase ZooKeeper configuration manually in Java, we can package hbase-site.xml in the jar and then:
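import org.apache.hadoop.conf.Configuration;

// Load hbase-site.xml from the classpath (it is packaged inside the jar)
Configuration conf = new Configuration();
conf.addResource("hbase-site.xml");
// Drop already-loaded properties so that the added resource is applied
conf.reloadConfiguration();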
5) Hive action throws NestedThrowables: JDOFatalInternalException and InvocationTargetException
Put the MySQL Java connector into the ShareLib or into the Hive workflow root.
If it still doesn't work, then take a look at http://cloudfront.blogspot.de/2012/06/failed-error-in-metadata.html
In short form:
hadoop fs -chmod g+w /tmp
hadoop fs -chmod 777 /tmp
hadoop fs -chmod g+w /user/hive/warehouse
hadoop fs -chmod 777 /user/hive/warehouse
6) Fork-join validation fails when actions share the same errorTo transition
This is fixed in Oozie version 3.3.2 (https://issues.apache.org/jira/browse/OOZIE-1035).
A temporary workaround for older Oozie versions is described here: https://issues.apache.org/jira/browse/OOZIE-1142
In short:
In oozie-site.xml, set oozie.validate.ForkJoin to false and restart Oozie.
7) Default maximum output data size is only 2 KB
Sometimes you will get this error:
Failing Oozie Launcher, Output data size [4 321] exceeds maximum [2 048]
Failing Oozie Launcher, Main class [com.myactions.action.InitAction], exception invoking main(), null
org.apache.oozie.action.hadoop.LauncherException
at org.apache.oozie.action.hadoop.LauncherMapper.failLauncher(LauncherMapper.java:571)
Yep, it will happen sooner or later, because the default maximum output data size is only 2 KB -_- To change this setting, set the property oozie.action.max.output.data to a larger value in oozie-site.xml, for example:
<property>
  <name>oozie.action.max.output.data</name>
  <value>1048576</value>
</property>
This sets the maximum output size to 1024 KB.
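Before raising the limit, it can help to check how large the captured output of an action actually is. A minimal sketch, assuming the launcher measures the output as the number of characters in the stored properties file (the exact measurement may differ between Oozie versions); the property names and the 3000-character message are hypothetical.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

class OutputSizeCheck {

    // Default of oozie.action.max.output.data, as in the error message above.
    static final int DEFAULT_MAX_OUTPUT_DATA = 2048;

    // Number of characters the properties take when stored as a .properties file.
    static int serializedSize(Properties props) {
        StringWriter out = new StringWriter();
        try {
            props.store(out, null); // still writes a date comment line
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory writer
        }
        return out.toString().length();
    }

    static boolean fits(Properties props, int maxOutputData) {
        return serializedSize(props) <= maxOutputData;
    }

    public static void main(String[] args) {
        Properties output = new Properties();
        output.setProperty("status", "0");
        StringBuilder longMessage = new StringBuilder();
        for (int i = 0; i < 3000; i++) {
            longMessage.append('x');
        }
        output.setProperty("message", longMessage.toString());
        System.out.println("size=" + serializedSize(output)
                + ", fits default=" + fits(output, DEFAULT_MAX_OUTPUT_DATA)
                + ", fits 1048576=" + fits(output, 1048576));
    }
}
```

If only a short status is needed downstream, truncating long messages is an alternative to raising the limit.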
8) SSH tunnel to bypass the firewall and reach the web interface
Port 11000 may be blocked by some firewalls by default. If you want to use the Oozie web interface, you may need to set up an SSH tunnel that forwards traffic from localhost:11000 to port 11000 on the Oozie server.
Then you can open the web interface at http://localhost:11000/oozie
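With OpenSSH, such a tunnel is a local port forward (-L) without a remote command (-N). A minimal sketch that starts it from Java; the login user@fxlive.de is hypothetical, and the process has to keep running while you use the browser.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class OozieTunnel {

    // ssh -N -L <localPort>:localhost:<remotePort> <target>
    // "localhost" in the -L spec is resolved on the remote side, i.e. the Oozie server.
    static List<String> tunnelCommand(String sshTarget, int localPort, int remotePort) {
        return Arrays.asList("ssh", "-N", "-L", localPort + ":localhost:" + remotePort, sshTarget);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical login on the Oozie server; blocks until the tunnel is closed.
        Process ssh = new ProcessBuilder(tunnelCommand("user@fxlive.de", 11000, 11000))
                .inheritIO()
                .start();
        System.exit(ssh.waitFor());
    }
}
```

Only the SSH port has to be reachable; the traffic to port 11000 travels inside the SSH connection.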
9) sendmail action after a decision action
The Java program sets a status property and a message property, and a decision action checks whether the status equals 0.
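This relies on the Java action's <capture-output/> element: the main class writes a properties file to the path given in the system property oozie.action.output.properties, and the decision node reads the values with the wf:actionData EL function, e.g. ${wf:actionData('java-node')['status'] eq '0'}, where java-node is a hypothetical action name. A minimal sketch of the Java side, using the property names status and message from above; this output also counts against the 2 KB limit from tip 7.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

class StatusReportingMain {

    // Writes status/message to the file Oozie reads when <capture-output/> is set.
    static void writeActionOutput(int status, String message) {
        String path = System.getProperty("oozie.action.output.properties");
        if (path == null) {
            throw new IllegalStateException("oozie.action.output.properties is not set; "
                    + "not running as an Oozie java action with <capture-output/>?");
        }
        Properties props = new Properties();
        props.setProperty("status", Integer.toString(status));
        props.setProperty("message", message);
        try (OutputStream out = new FileOutputStream(path)) {
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int status;
        String message;
        try {
            // ... the real work of the action goes here (hypothetical) ...
            status = 0;
            message = "OK";
        } catch (RuntimeException e) {
            status = 1;
            message = String.valueOf(e.getMessage());
        }
        writeActionOutput(status, message);
    }
}
```

The decision node can then route to the sendmail action when the status is not 0 and, for example, include the message property in the mail body.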