Job Life Cycle Statuses: Additional Information

Resuming a Job

A job can be resumed through the resume() method, which is called by the job async executor and resources. Executors provide this method for cases where a job has timed out and is not responding. If the job is not resumed, it fails.

Note: If the isResumeSupported option is configured but resume() is not implemented, the job fails.
Note: Please contact the Ooyala Flex team for more information about resuming jobs.
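
The resume behaviour described above can be sketched as follows. This is a minimal illustration, not the JEF API: the Executor interface, the State enum, and resumeTimedOutJob() are hypothetical names, and only resume() and isResumeSupported come from the text. A default method that throws stands in for "resume() is not implemented".

```java
final class ResumeSketch {
    /** Hypothetical job states; only the state names come from the text. */
    enum State { RUNNING, FAILED }

    /** Hypothetical executor contract; the real JEF interfaces may differ. */
    interface Executor {
        /** Mirrors the isResumeSupported option. */
        boolean isResumeSupported();

        /** The default body stands in for "resume() is not implemented". */
        default void resume(String jobId) {
            throw new UnsupportedOperationException("resume() is not implemented");
        }
    }

    /** Returns the state a timed-out job would move to after a resume attempt. */
    static State resumeTimedOutJob(Executor executor, String jobId) {
        if (!executor.isResumeSupported()) {
            return State.FAILED; // nothing to call, so the job fails
        }
        try {
            executor.resume(jobId);
            return State.RUNNING;
        } catch (UnsupportedOperationException e) {
            return State.FAILED; // option configured, but resume() is missing
        }
    }

    public static void main(String[] args) {
        Executor configuredOnly = () -> true; // option set, resume() not overridden
        Executor resumable = new Executor() {
            @Override public boolean isResumeSupported() { return true; }
            @Override public void resume(String jobId) { /* reattach to the task here */ }
        };
        System.out.println(resumeTimedOutJob(configuredOnly, "job-1")); // FAILED
        System.out.println(resumeTimedOutJob(resumable, "job-2"));      // RUNNING
    }
}
```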

Retrying a Job

Jobs can be retried through the retry() method, which is called by the job sync and async executors and by resources when Enterprise has retried the job. A job executor may decide to run the execute() method immediately, but it is important to consider the different scenarios before executing the job again.

Note: Please contact the Ooyala Flex team for more information about retrying jobs.
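
One scenario an executor might want to check before re-running a job is whether an earlier attempt is still active. The sketch below illustrates that single check under assumptions: the Executor class, the in-flight set, and complete() are hypothetical, and only retry() and execute() are named in the text. Real executors may need to consider other scenarios as well.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class RetrySketch {
    /** Hypothetical executor; only execute() and retry() are named in the text. */
    static final class Executor {
        private final Set<String> inFlight = ConcurrentHashMap.newKeySet();
        private int executions = 0;

        void execute(String jobId) {
            inFlight.add(jobId); // remember that this job has an active attempt
            executions++;
        }

        /** Hypothetical hook called when an attempt ends. */
        void complete(String jobId) {
            inFlight.remove(jobId);
        }

        /** Runs execute() immediately unless an earlier attempt is still active. */
        boolean retry(String jobId) {
            if (inFlight.contains(jobId)) {
                return false; // avoid running the same job twice in parallel
            }
            execute(jobId);
            return true;
        }

        int executions() {
            return executions;
        }
    }

    public static void main(String[] args) {
        Executor executor = new Executor();
        executor.execute("job-1");
        System.out.println(executor.retry("job-1")); // false: first attempt still active
        executor.complete("job-1");
        System.out.println(executor.retry("job-1")); // true: execute() runs again
    }
}
```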

Cancelling a Job

As a user, you can cancel a job using either the UI or the REST API.

In a distributed environment with no available transaction options, you need a way to cancel jobs. This is especially important when a job is not found in the architecture.

In the case of JEF, Redis is used as a shared repository in order to mark jobs that are being cancelled. The Cancel service is available in the flex-executioncommons-library.
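
The idea of a shared repository of cancel markers can be sketched as follows. This is not the Cancel service from flex-executioncommons-library, whose API is not shown here: CancelStore, InMemoryCancelStore, and runSteps() are hypothetical names, and an in-memory set stands in for Redis so that the example runs without a server.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class CancelMarkerSketch {
    /** Hypothetical view of the shared repository; in JEF this role is played by Redis. */
    interface CancelStore {
        void markCancelling(String jobId);
        boolean isCancelling(String jobId);
        void clear(String jobId);
    }

    /** In-memory stand-in for Redis, used only to keep the sketch runnable. */
    static final class InMemoryCancelStore implements CancelStore {
        private final Set<String> cancelling = ConcurrentHashMap.newKeySet();

        @Override public void markCancelling(String jobId) { cancelling.add(jobId); }
        @Override public boolean isCancelling(String jobId) { return cancelling.contains(jobId); }
        @Override public void clear(String jobId) { cancelling.remove(jobId); }
    }

    /** Performs up to totalSteps units of work, checking the marker before each one. */
    static int runSteps(CancelStore store, String jobId, int totalSteps) {
        int done = 0;
        while (done < totalSteps && !store.isCancelling(jobId)) {
            done++; // one unit of work
        }
        return done;
    }

    public static void main(String[] args) {
        CancelStore store = new InMemoryCancelStore();
        System.out.println(runSteps(store, "job-1", 3)); // 3: no marker, job completes
        store.markCancelling("job-1");                   // e.g. set by another service
        System.out.println(runSteps(store, "job-1", 3)); // 0: marker seen before any work
    }
}
```

Because every service instance reads the same store, the instance that receives the cancel request does not need to be the one that is running the job.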

A job can be cancelled while it is in a QUEUED state in JEF. A job can also be cancelled while a job async executor is executing it.

cancel(): The cancel() method is called by the job async executor and resources if Enterprise has requested that the job be cancelled. The job executor might decide to ignore the request or not implement this interface. If so, the job can’t be cancelled.

Note: Currently JEF uses ActionProgressService notifications to check whether a cancel request is pending for a job. If there is a cancel request, the cancel() method is invoked for that executor. It is the responsibility of the executor to verify that the job ID corresponds to the execution job ID.
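
The job ID check mentioned in the note can be sketched as follows. The Executor class and its fields are hypothetical and only illustrate the responsibility described above: cancel() acts only when the requested job ID matches the job that the executor is actually running.

```java
final class CancelCheckSketch {
    /** Hypothetical executor that runs a single job at a time. */
    static final class Executor {
        private final String executionJobId;
        private volatile boolean cancelled = false;

        Executor(String executionJobId) {
            this.executionJobId = executionJobId;
        }

        /** Accepts the cancel request only if it targets the job being executed. */
        boolean cancel(String jobId) {
            if (!executionJobId.equals(jobId)) {
                return false; // request is for a different job, so ignore it
            }
            cancelled = true; // the running task is expected to observe this flag
            return true;
        }

        boolean isCancelled() {
            return cancelled;
        }
    }

    public static void main(String[] args) {
        Executor executor = new Executor("job-42");
        System.out.println(executor.cancel("job-7"));  // false: IDs do not match
        System.out.println(executor.cancel("job-42")); // true: job is marked as cancelled
    }
}
```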

Watcher and Restart Job Mechanism

Job async executors typically execute long-running tasks and/or use external services to execute jobs.

If an outage occurs, the whole microservice might stall or stop entirely while jobs are running. These jobs are managed by a watcher. All job executors run a watcher unless disabled. This scheduled thread watches all running jobs (across all job executors) in order to identify the following scenarios:

Queued Jobs

QUEUED jobs may be considered “stalled” if they have not moved to a RUNNING state before the timeout deadline. Stalled QUEUED jobs can be taken by another job executor with an available slot.

1) Any job executor microservice can take this job and move it to a RUNNING state.

2) Any new job executor microservice can take this job when it starts and move it to a RUNNING state.
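
The handling of stalled QUEUED jobs can be sketched as follows. All names here (Job, Executor, tryClaim(), the slot counter) are hypothetical, and the timeout value is an arbitrary example. The sketch runs in a single process; in a distributed deployment the claim would have to be made atomic across microservices, which is not shown.

```java
import java.time.Duration;
import java.time.Instant;

final class QueuedClaimSketch {
    enum State { QUEUED, RUNNING }

    /** Hypothetical job record. */
    static final class Job {
        final String id;
        final Instant queuedAt;
        State state = State.QUEUED;

        Job(String id, Instant queuedAt) {
            this.id = id;
            this.queuedAt = queuedAt;
        }
    }

    /** Hypothetical executor with a fixed number of free slots. */
    static final class Executor {
        private int freeSlots;

        Executor(int freeSlots) {
            this.freeSlots = freeSlots;
        }

        /** Takes the job if it is a stalled QUEUED job and a slot is available. */
        boolean tryClaim(Job job, Instant now, Duration timeout) {
            boolean stalled = job.state == State.QUEUED
                    && Duration.between(job.queuedAt, now).compareTo(timeout) > 0;
            if (!stalled || freeSlots == 0) {
                return false;
            }
            freeSlots--;
            job.state = State.RUNNING;
            return true;
        }
    }

    public static void main(String[] args) {
        Instant queuedAt = Instant.parse("2020-01-01T00:00:00Z");
        Job job = new Job("job-1", queuedAt);
        Executor executor = new Executor(1);
        Duration timeout = Duration.ofMinutes(5); // example value, not a JEF default

        System.out.println(executor.tryClaim(job, queuedAt.plusSeconds(60), timeout));  // false
        System.out.println(executor.tryClaim(job, queuedAt.plusSeconds(600), timeout)); // true
        System.out.println(job.state);                                                  // RUNNING
    }
}
```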

Running Jobs

RUNNING jobs may be considered “stalled” if they haven’t updated their progress before the timeout deadline. If a running job stalls, it is moved to a TIMED_OUT state.

TIMED_OUT jobs can be taken by any other job executor microservice with an available slot. If a job is taken by another job executor, it is moved to a RUNNING state.
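
A single watcher pass over RUNNING jobs can be sketched as follows. The Job class, the scan() method, and the use of a last-progress timestamp are assumptions made for illustration; the text does not describe how JEF tracks progress internally. In a real service a method like this would be invoked periodically by the scheduled watcher thread.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RunningWatcherSketch {
    enum State { RUNNING, TIMED_OUT }

    /** Hypothetical job record with the time of its last progress update. */
    static final class Job {
        final String id;
        final Instant lastProgressAt;
        State state = State.RUNNING;

        Job(String id, Instant lastProgressAt) {
            this.id = id;
            this.lastProgressAt = lastProgressAt;
        }
    }

    /** Moves stalled RUNNING jobs to TIMED_OUT and returns their IDs. */
    static List<String> scan(List<Job> jobs, Instant now, Duration timeout) {
        List<String> timedOut = new ArrayList<>();
        for (Job job : jobs) {
            boolean stalled = job.state == State.RUNNING
                    && Duration.between(job.lastProgressAt, now).compareTo(timeout) > 0;
            if (stalled) {
                job.state = State.TIMED_OUT;
                timedOut.add(job.id);
            }
        }
        return timedOut;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2020-01-01T01:00:00Z");
        List<Job> jobs = Arrays.asList(
                new Job("fresh", now.minusSeconds(30)),    // progressed recently
                new Job("stalled", now.minusSeconds(900))); // silent for 15 minutes
        System.out.println(scan(jobs, now, Duration.ofMinutes(5))); // [stalled]
    }
}
```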

Timed Out Jobs

TIMED_OUT jobs are moved back to a RUNNING state if the original job execution progresses.

TIMED_OUT jobs may be considered “stalled” if they still haven’t updated their progress by their second timeout deadline.

TIMED_OUT jobs call the resume() method for the same plugin.

  • If resume() is implemented and the job progresses, it is moved to a RUNNING state.
  • If resume() is not implemented or the job has not progressed, it is marked as FAILED.

Note: resume() is only supported in plugin jobs that are not using ActionProgress. In the case of ActionProgress, jobs marked as TIMED_OUT are moved to a FAILED state if no job executor takes responsibility for the job.
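
The outcomes for a TIMED_OUT plugin job can be summarised in a small decision function. This is one reading of the rules above, not JEF code: resolve() and its parameters are hypothetical, an empty Optional stands for "resume() is not implemented", and the sketch assumes that resume() is attempted once the second deadline has passed. The ActionProgress case from the note is not modelled.

```java
import java.util.Optional;
import java.util.function.BooleanSupplier;

final class TimedOutSketch {
    enum State { RUNNING, TIMED_OUT, FAILED }

    /**
     * Decides the next state of a TIMED_OUT job.
     * The resume supplier returns true if the job progressed after resume().
     */
    static State resolve(boolean originalProgressed,
                         boolean secondDeadlinePassed,
                         Optional<BooleanSupplier> resume) {
        if (originalProgressed) {
            return State.RUNNING; // the original execution is alive again
        }
        if (!secondDeadlinePassed) {
            return State.TIMED_OUT; // keep waiting until the second deadline
        }
        boolean progressed = resume.map(BooleanSupplier::getAsBoolean).orElse(false);
        return progressed ? State.RUNNING : State.FAILED;
    }

    public static void main(String[] args) {
        BooleanSupplier resumeWorks = () -> true;
        BooleanSupplier resumeStalls = () -> false;
        System.out.println(resolve(true, false, Optional.empty()));          // RUNNING
        System.out.println(resolve(false, true, Optional.of(resumeWorks)));  // RUNNING
        System.out.println(resolve(false, true, Optional.of(resumeStalls))); // FAILED
        System.out.println(resolve(false, true, Optional.empty()));          // FAILED
    }
}
```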
https://help.ooyala.com/sites/all/libraries/dita/en/media-logistics/flex/dev/70/jef_programming_guide_job_life_cycle_additional_info.html
