Job Status
Any DataStage job* that has been compiled at least once (or has been imported with binaries)** has a job status, which can be queried with the DataStage Director in the Status view, via the dsjob command line (dsjob -jobinfo <project> <job>) or via DataStage BASIC (e.g. DSGetJobInfo(<jobhandle>, DSJ.JOBSTATUS)). Until you start the job, the status is NOT RUNNING. When a job is started normally, its status changes to RUNNING (the other option would be a validation run). After execution the job status is either
- RUN OK (job finished without warnings) resp. VAL OK (when started in validation mode),
- RUN WARN (finished with warnings) resp. VAL WARN,
- RUN FAILED (aborted with a fatal error) resp. VAL FAILED or
- CRASHED (aborted because of an undetermined cause, e.g. because the DataStage engine died).
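For illustration, here is a minimal sketch of how job control code could query this status in DataStage BASIC (the job name "MyJob" and the message text are placeholders of mine; the calls themselves are the standard job control API):

   hJob = DSAttachJob("MyJob", DSJ.ERRFATAL)     ;* attach to the compiled job
   Status = DSGetJobInfo(hJob, DSJ.JOBSTATUS)    ;* numeric status code
   If Status = DSJS.RUNFAILED Or Status = DSJS.CRASHED Then
      Call DSLogWarn("MyJob aborted and needs a reset before a rerun", "StatusCheck")
   End
   ErrCode = DSDetachJob(hJob)                   ;* release the job handle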
A failed, stopped or crashed job needs to be reset before it can be rerun, so there is another status, RESET, which indicates that the job finished a reset run. Only restartable sequences can be restarted without a reset; they are displayed in the Director as "Aborted/Restartable", which is not a separate status but rather the combination of the sequence being in status RUN FAILED and being restartable - if you query the job status with dsjob, it will also tell you in a separate field whether the job is restartable.
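From the command line, the reset is just another run mode of dsjob (project and job names are placeholders):

   dsjob -run -mode RESET MyProject MyJob    # reset the aborted job
   dsjob -run MyProject MyJob                # then start it normally again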
A running job can be stopped and then yields a status of STOPPED for server jobs and usually RUN FAILED for parallel jobs (stopping a parallel job is like forcing the job to abort). There is also an interim job status, which is the job status before the after-job subroutine is executed and after the job stages and (for sequences / control jobs) all controlled jobs have finished processing, but I would not rely on that too much.
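Stopping, too, can be done via dsjob (again with placeholder names):

   dsjob -stop MyProject MyJob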
DataStage 9.1 adds workload management (WLM). Now a job can also be in the state QUEUED (4) before it starts to run. If it is queued too long and WLM_QUEUE_WAIT_TIMEOUT is reached, the job is STOPPED. As usual the DataStage documentation is rather meager about the new states; I am not sure if they are mentioned at all.
User Status
Next to the job status, every compiled job - even a parallel job - can have a user status, which is just a character string that is saved as part of the job runtime configuration (in the status file of the job in the DataStage engine repository) and which can also be queried with dsjob or in DataStage BASIC with DSGetJobInfo(<jobhandle>, DSJ.USERSTATUS). Although the user status string itself does not have a length limitation, I would use it only for at most about 3 kilobytes (querying longer status messages is not really supported from dsjob; you can still put megabytes into the user status field if you want, but don't try to open the job log afterwards in the Director or Designer...).
As the name says, the user status must be set by the user, i.e. by the developer, and this requires a bit of programming: you must call the function DSSetUserStatus. You can call it from external job control code (if you use your own custom job control rather than job sequences), or you can wrap the function
- into a DataStage before/after routine, to call it as a before/after subroutine of the job or as a before/after subroutine of a DataStage BASIC Transformer, or
- into a server transformation function, to call it within any expression field of the DataStage BASIC Transformer stage.
To set the user status from within the job itself you need the BASIC Transformer stage, originally a server stage that can also be used in parallel jobs (for this purpose only!). The BASIC Transformer is not exposed in the palette for parallel jobs, but you can still find the stage in the repository.
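As a sketch of the second option (the routine name SetUserStatus and its argument Arg1 are my own choices; only DSSetUserStatus itself is the documented call): create a transform function whose body is simply

   Call DSSetUserStatus(Arg1)    ;* store the value as the job's user status
   Ans = Arg1                    ;* pass the value through so the derivation still returns it

and call it in a derivation of the BASIC Transformer, e.g. SetUserStatus(RejectCount), where RejectCount is whatever value you want to expose.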
The user status is retained until the job is recompiled, the job status is cleared (via the Director or the DataStage Engine shell) or a new user status is set.
What are these statuses good for?
The job status is of course used to monitor and operate jobs and to build workflows, DataStage sequences or batches. If you have a sequence of dependent jobs and one of the jobs aborts, you usually do not want to run the subsequent jobs, but only those jobs which are required to clean up the left-overs of the aborted job, so that the failed job can be safely restarted. The clean-up I am talking about does not mean resetting the job, but e.g. removing partially written files or database changes which might impact the rerun. You easily achieve this by setting the triggers of the job activity stages in a DataStage sequence according to the job status - one trigger for the subsequent job activities in case the job finished OK (RUN OK/RUN WARN) and another trigger in case of an unexpected termination (usually "Otherwise", or by explicitly checking for RUN FAILED), as sketched below.
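A custom trigger expression on such a job activity could look like this (the activity name Load_Stage is my invention; I am assuming the activity variable $JobStatus and the DSJS status constants here, although the built-in "OK - (Conditional)" and "Otherwise" trigger types achieve the same without any expression):

   Load_Stage.$JobStatus = DSJS.RUNOK Or Load_Stage.$JobStatus = DSJS.RUNWARN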
If all your jobs are restartable by themselves (which I highly recommend for the sake of simplicity), you can safely use the "Automatically handle activities that fail" option of DataStage sequences and leave the triggers for the subsequent jobs unconditional. With it the DataStage sequence aborts itself if one job fails, unless you have defined an exception handler for the sequence. Usually you want to raise aborts to the top-level sequence rather than handle them in each low-level sequence, to be able to leverage the restart feature of sequences (see "Add checkpoints so sequence is restartable on failure"). When you restart a restartable sequence it will continue from the point of failure and also execute all activities which are checked as "Do not checkpoint run".
The user status does not replace the job status and cannot safely be used to implement proper failure or restart handling as we can with the job status: you can never be sure that the user status was correctly set before a job abort or crash, and you never know whether the job will finish as expected after you have set the user status.
Still, the user status can be used to complement the job status so you can distinguish different states in case your job finished OK.
One example is a staging job that does some quality checks on the source data. The number of failed checks can be set as the user status, which in turn can be queried by a calling sequence, just like the job status: the user status, like the job status, is exposed as a job activity variable to subsequent stages of the sequence (see the sketch below). Thus you could e.g. send an email if the number of failed checks is greater than 0. Another common use case for the user status is to check whether a database table is empty, or to check in a database table whether some downstream processing can be skipped.
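Such a branch could be a custom trigger expression on the staging job activity, for example (the activity name Check_Staging is my invention; $UserStatus is the activity variable holding the user status):

   Check_Staging.$UserStatus > 0

with that trigger leading to a Notification activity that sends the email.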
Frequently the user status is also misused to extract parameter values from a database and use them in subsequent job activities. For this it is much simpler to use a parameter set and let a job write the parameter value file, which can then be used by the subsequent job activities in a sequence.
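As a sketch of what I mean (all names made up): suppose you have a parameter set LoadParams with a value file called daily. A job can simply rewrite the flat file ParameterSets/LoadParams/daily in the project directory, where the value files are stored, with lines such as

   BusinessDate=2024-01-31
   BatchId=4711

and the subsequent job activities started with the value file daily read these values.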
And the short answer is...
...not to use the user status as a replacement for the job status. It can be used for additional branching logic in sequences, but not to hand over parameter values from a database table to subsequent job activities.
*Except for mainframe jobs, which we are not covering in this article at all.
**Uncompiled DataStage jobs do not have a status, even though they are displayed as "Not compiled" in the Director; that is not a status, but rather the Director's interpretation of the fact that there is no binary attached to the job. Trying to query the job status of an uncompiled job with dsjob fails with an error that you cannot attach to the job (DSAttachJob works only for compiled jobs).
Although the parameter set file can be used to write values to, it cannot be used in the way you described.
A sequence needs to have the parameter set defined on it to be able to hand this parameter set down to a job it calls; not specifying it on the job in the sequence gives a compilation error. When starting this sequence you can specify a specific value file, but this also doesn't work: its values are cached/stored at runtime in the sequence, so any subsequent calls to jobs from the sequence use the values known when the sequence was started. And then there is the problem of specifying a specific value file on, say, a production environment. Often these environments do not allow manual starting of jobs but work with scheduled runs, and those methods do not allow the use of specific value files.