Watcher and Restart Job Mechanism

Job async executors typically execute long-running tasks and/or use external services to execute jobs.

If an outage occurs, the whole microservice may stall or stop entirely while jobs are running. To recover from this, jobs are monitored by a watcher. Every job executor runs a watcher unless the watcher has been disabled. This scheduled thread watches all running jobs (across all job executors) in order to identify the following scenarios:
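As a rough illustration of a watcher running as a scheduled thread, here is a minimal Python sketch. The Watcher class, its scan callback, the interval_seconds parameter, and the enabled flag are all hypothetical names chosen for this example; they are not taken from the product's API.

```python
import threading


class Watcher:
    """Hypothetical watcher: calls scan() at a fixed interval on its own thread."""

    def __init__(self, scan, interval_seconds, enabled=True):
        self._scan = scan                  # callback that inspects the jobs
        self._interval = interval_seconds  # assumed scan period, in seconds
        self._enabled = enabled            # assumed switch for disabling the watcher
        self._stop = threading.Event()
        self._thread = None

    def start(self):
        """Start the background thread; returns False if the watcher is disabled."""
        if not self._enabled:
            return False
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        return True

    def _run(self):
        # Event.wait() returns False on timeout and True once stop() sets the event.
        while not self._stop.wait(self._interval):
            self._scan()

    def stop(self):
        """Signal the thread to finish and wait for it to exit."""
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

In this sketch the scan callback is where the checks described in the following sections would run.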

Queued Jobs

QUEUED jobs may be considered “stalled” if they have not moved to a RUNNING state before the timeout deadline. A stalled QUEUED job can be taken by another job executor with an available slot.

  • Any job executor microservice can take this job and move it to a RUNNING state.
  • Any newly started job executor microservice can take this job at startup and move it to a RUNNING state.
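
The queued-job check above can be sketched as follows. This is a minimal illustration only: the Job fields, the queue_timeout parameter, and the free_slots count are assumptions made for the example, not the product's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    job_id: str
    state: str                   # "QUEUED", "RUNNING", ...
    queued_at: float             # assumed: time (seconds) the job entered QUEUED
    owner: Optional[str] = None  # assumed: executor currently responsible


def is_stalled_queued(job, now, queue_timeout):
    """A QUEUED job is stalled once it has waited longer than the timeout."""
    return job.state == "QUEUED" and now - job.queued_at > queue_timeout


def take_stalled_queued(jobs, executor_id, free_slots, now, queue_timeout):
    """Claim stalled QUEUED jobs, at most one per available slot."""
    taken = []
    for job in jobs:
        if free_slots == 0:
            break  # no slot left, leave the remaining jobs for other executors
        if is_stalled_queued(job, now, queue_timeout):
            job.state = "RUNNING"
            job.owner = executor_id
            taken.append(job.job_id)
            free_slots -= 1
    return taken
```

A newly started executor could call take_stalled_queued once at startup with its full slot count, which matches the second bullet above.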

Running Jobs

RUNNING jobs may be considered “stalled” if they have not updated their progress before the timeout deadline. If a running job stalls, it is moved to a TIMED_OUT state.

TIMED_OUT jobs can be taken by any other job executor microservice with an available slot. If a job is taken by another job executor, it is moved to a RUNNING state.
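
The two transitions described here (RUNNING to TIMED_OUT, then TIMED_OUT back to RUNNING under a new executor) might look like the sketch below. The RunningJob fields and the progress_timeout parameter are hypothetical, and resetting the progress timestamp on takeover is an assumption of this example rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunningJob:
    job_id: str
    state: str                   # "RUNNING" or "TIMED_OUT"
    last_progress_at: float      # assumed: time (seconds) of last progress update
    owner: Optional[str] = None  # assumed: executor currently responsible


def mark_timed_out(job, now, progress_timeout):
    """Move a RUNNING job to TIMED_OUT if it has not reported progress in time."""
    if job.state == "RUNNING" and now - job.last_progress_at > progress_timeout:
        job.state = "TIMED_OUT"
        return True
    return False


def take_over(job, executor_id, has_free_slot, now):
    """Let another executor with an available slot take a TIMED_OUT job."""
    if job.state == "TIMED_OUT" and has_free_slot:
        job.state = "RUNNING"
        job.owner = executor_id
        job.last_progress_at = now  # assumption: the deadline restarts on takeover
        return True
    return False
```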

Timed Out Jobs

TIMED_OUT jobs are moved back to a RUNNING state if the original job execution progresses.

TIMED_OUT jobs may be considered “stalled” if they still have not updated their progress before the second timeout deadline.

TIMED_OUT jobs call the resume() method of the same plugin:

  • If resume() is implemented and the job progresses, it is moved to a RUNNING state.

  • If resume() is not implemented or the job has not progressed, it is marked as FAILED.

Note: resume() is only supported in plugin jobs that do not use ActionProgress. For jobs that use ActionProgress, those marked as TIMED_OUT are moved to a FAILED state if no job executor takes responsibility for them. See the Development Plugin Guide for more information.
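
One possible reading of the timed-out rules is sketched below as a single decision function. The function name, its parameters, and the order of the checks (progress first, then the second deadline, then resume()) are assumptions made for illustration. Here resume is a hypothetical callable that returns True if the job progressed after being resumed, and passing None stands in for a plugin that does not implement resume() or a job for which resume() is not supported.

```python
from enum import Enum


class State(Enum):
    RUNNING = "RUNNING"
    TIMED_OUT = "TIMED_OUT"
    FAILED = "FAILED"


def resolve_timed_out(progressed, second_deadline_passed, resume=None):
    """Return the next state of a TIMED_OUT job under the assumed ordering."""
    if progressed:
        # The original execution reported progress again.
        return State.RUNNING
    if not second_deadline_passed:
        # Still within the second timeout deadline, keep waiting.
        return State.TIMED_OUT
    if resume is None:
        # resume() is not implemented (or not supported for this job).
        return State.FAILED
    # resume() is implemented: RUNNING if the job progressed, FAILED otherwise.
    return State.RUNNING if resume() else State.FAILED
```

This sketch does not model takeover by another job executor; it covers only what happens to a job that stays with its original plugin.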
https://help.ooyala.com/sites/all/libraries/dita/en/media-logistics/flex/dev/70/jef_programming_guide_watcher_restart_job_mechanism.html
