Sunday, March 18, 2012

Open Source JobScheduler

Introduction
The company I work for uses ActiveBatch Job Scheduling from Advanced Systems Concepts, Inc. For the longest time I have been looking for a viable replacement for it. It is not that there is anything inadequate about the software. On the contrary, I suspect we are only using a tiny portion of what it is capable of. The problem is that the software runs only on Microsoft Windows. We are a Linux/Mac shop and I resent having to run a Windows system to schedule jobs on Linux servers. In addition, the support contract costs $12,000 annually.

JobScheduler from sos-berlin.com
So I was delighted when I found a link to the JobScheduler at http://www.sos-berlin.com/.

JobScheduler has exactly what I was looking for: job chains with dependencies, email notification, logging and, in particular, event-driven scheduling. The one downside I have found so far is the requirement of a 32-bit JVM. The FAQ anticipates this question, stating, "You could ask why has this software not been ported to 64bit? The simple answer is: there is no need for this." It is true that there is no need in terms of the extra memory a 64-bit JVM makes available, but in 2012 one may reasonably say there is also no need to stay behind the times. The vast majority of systems today are installed as 64-bit systems, so why not update JobScheduler? What this means practically for 64-bit Linux users like myself is an extra configuration step after installation to ensure the 32-bit JVM is used.

On Debian I simply installed the 32 bit JVM with apt-get:

# apt-get install ia32-sun-java6-bin

and added the following line to jobscheduler_environment_variables.sh located in the bin directory under the freshly installed jobscheduler:

JAVA_HOME="/usr/lib/jvm/ia32-java-6-sun/jre"

Documentation
The documentation is pretty thorough. In the course of this write-up I will be working through the documents bundled with the software and trying out the sample implementations. In this way I hope to highlight any weaknesses as well as point out the strengths of the software. While there is a supported commercial version available, I will be using only the GPL'd version and am therefore reliant on the published documentation.


What is a Job?
The documentation defines a job as a program to be executed, its run time, and what happens in the event of an error. A job also contains any parameters to be used, pre- and post-processing, file locks and follow-on jobs. These are all part of the configuration defined in XML, using the <job> tag.

Jobs can be configured as part of the central configuration file in ./config/scheduler.xml but this requires a restart of the JobScheduler. A better solution is to use the hot directory at ./config/live where changes will be picked up with no restart necessary.

Jobs can either be executable programs or be implemented using the JobScheduler API. The API supports a number of languages including Java, Perl, and JavaScript.
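To make the second option concrete, here is a minimal sketch of a job whose body is written against the API in JavaScript rather than pointing at an external program. The names spooler_process and spooler_log are the API entry point and logging object as I understand them from the bundled API documentation, and the language value "javascript" is likewise an assumption for this release, so check all three against your installed version before relying on this. The log message itself is just an illustration.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- Sketch only: an API job with its script written inline.
     Assumes language="javascript" and the spooler_process/spooler_log
     names match the API documentation shipped with your release. -->
<job>
  <script language="javascript">
    <![CDATA[
      // Called by JobScheduler for each run of the job.
      function spooler_process() {
        spooler_log.info("Hello from an API job");
        // Returning false signals that there is nothing further to process.
        return false;
      }
    ]]>
  </script>
  <!-- Empty run_time: no schedule is defined here. -->
  <run_time/>
</job>
```

The script body sits in a CDATA section so that characters such as < and & inside the code do not have to be escaped as XML.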

My Environment
Before we get into some examples I should set out my installation environment. JobScheduler is installed at the following location:

/opt/sos-berlin.com/jobscheduler/scheduler

The main directory of interest under here is the bin directory. In addition there is a directory structure under the home directory of the user who launched the daemon. In my case it is here:

~/sos-berlin.com/jobscheduler/scheduler

and below here we have: config, jobs, logs, mail, and a symlink to scheduler_home.

The jobs directory contains sample job configurations, and the config/live directory is where we will put the XML configurations for jobs to be picked up without a restart.

Example One
Run a shell script every minute which will log the current date.

The script called getDate.sh is stored under jobs. It includes only the simple date command.

The XML controlling this job looks like this:

<?xml version="1.0" encoding="iso-8859-1"?>
<!-- This very verbose example will automatically start the job getDate.sh every minute, on the days specified within weekdays, within the period 00:00-23:59. Furthermore you can start the job manually in the period defined by run_time (00:00-23:59 here). The job executes the ./jobs/getDate.sh shell script. -->

<job>
<script language="shell">
<include file="jobs/getDate.sh"/>
<!-- for Windows use "jobs\my_shell_script.cmd" -->
</script>
<run_time begin = "00:00"
end = "23:59">
<weekdays>
<day day="1">
<period begin = "00:00" end = "23:59" repeat = "60"/>
</day>
<day day="2">
<period begin = "00:00" end = "23:59" repeat = "60"/>
</day>
<day day="3">
<period begin = "00:00" end = "23:59" repeat = "60"/>
</day>
<day day="4">
<period begin = "00:00" end = "23:59" repeat = "60"/>
</day>
<day day="5">
<period begin = "00:00" end = "23:59" repeat = "60"/>
</day>
<day day="6">
<period begin = "00:00" end = "23:59" repeat = "60"/>
</day>
<day day="7">
<period begin = "00:00" end = "23:59" repeat = "60"/>
</day>
</weekdays>
</run_time>
</job>

We can now see the job in the GUI of JobScheduler:


Here we see the job getDate on the left-hand side of the screen, with the logs created for each one-minute run on the right-hand side. This is a very simplified example but shows the basics of how JobScheduler works.
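Since the seven day blocks in the configuration above are identical, the schedule can probably be written far more compactly. The sketch below assumes that a period placed directly inside run_time applies to every day, which is how I read the run_time documentation; verify this against the XML reference for your release. The repeat value is in seconds, as in the verbose version.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- Sketch only: compact equivalent of the verbose weekday schedule.
     Assumes a period directly under run_time applies to all days. -->
<job>
  <script language="shell">
    <include file="jobs/getDate.sh"/>
  </script>
  <run_time>
    <!-- Run every 60 seconds between 00:00 and 23:59, every day. -->
    <period begin="00:00" end="23:59" repeat="60"/>
  </run_time>
</job>
```

The verbose form is still the one to reach for when individual days need different periods, for example a shorter window at the weekend.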

Example Two
In this example we will schedule a job by watching a directory for changes.

Again in the live directory we add a file called listDir.job.xml:

<?xml version="1.0" encoding="iso-8859-1"?>
<job>
<script language="shell">
<include file="jobs/listDir.sh"/>
<!-- for Windows use "jobs\my_shell_script.cmd" -->
</script>
<start_when_directory_changed directory = "~/sos-berlin.com/jobscheduler/scheduler/jobs/notification_dir" regex = "\.txt$"/>
<run_time/>
</job>

Now if we add a file whose name ends in .txt to the notification_dir folder, the script listDir.sh will be executed. This type of execution can be combined with the time-based execution detailed above.
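As a sketch of what such a combination might look like, the job below keeps the directory trigger from this example and replaces the empty run_time with a repeating period. The hourly interval (3600 seconds) is an arbitrary value chosen for illustration, and placing the period directly under run_time assumes it applies to every day; I have not verified how the two triggers interact if they fire at the same moment.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- Sketch only: directory-change trigger combined with a time-based schedule. -->
<job>
  <script language="shell">
    <include file="jobs/listDir.sh"/>
  </script>
  <!-- Start whenever a .txt file appears in the watched directory. -->
  <start_when_directory_changed directory="~/sos-berlin.com/jobscheduler/scheduler/jobs/notification_dir" regex="\.txt$"/>
  <run_time>
    <!-- Additionally run once an hour throughout the day (illustrative interval). -->
    <period begin="00:00" end="23:59" repeat="3600"/>
  </run_time>
</job>
```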