Watcher and Restart Job Mechanism

Job async executors typically execute long-running tasks and/or use external services to execute jobs.

If an outage occurs, the whole microservice may stall or stop entirely whilst jobs are running. To recover from this, jobs are managed by a watcher. Every job executor runs a watcher unless the watcher is disabled. This scheduled thread watches all running jobs (across all job executors) in order to identify the following scenarios:

Queued Jobs

QUEUED jobs are considered “stalled” if they have not moved to a RUNNING state before the timeout deadline. A stalled QUEUED job can be taken by another job executor with an available slot:

1) Any job executor microservice can take this job and move it to a RUNNING state.

2) Any new job executor microservice can take this job and move it to a RUNNING state when it starts up.
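
The QUEUED check above can be sketched roughly as follows. This is an illustration only, not the framework's implementation: the Job class, the try_claim() helper, the field names, and the 60-second timeout are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed value for illustration; the real timeout is configuration-specific.
QUEUED_TIMEOUT_SECONDS = 60.0


@dataclass
class Job:
    """Hypothetical job record; field names are assumptions."""
    job_id: str
    state: str                    # e.g. "QUEUED" or "RUNNING"
    queued_at: float              # time (in seconds) the job entered QUEUED
    owner: Optional[str] = None   # executor currently responsible for the job


def is_stalled_queued(job: Job, now: float,
                      timeout: float = QUEUED_TIMEOUT_SECONDS) -> bool:
    """A QUEUED job is stalled once it has waited longer than the timeout."""
    return job.state == "QUEUED" and now - job.queued_at > timeout


def try_claim(job: Job, executor_id: str, free_slots: int, now: float) -> bool:
    """Move a stalled QUEUED job to RUNNING if this executor has a free slot."""
    if free_slots <= 0 or not is_stalled_queued(job, now):
        return False
    job.state = "RUNNING"
    job.owner = executor_id
    return True
```

In this sketch, an executor with no free slot leaves the job untouched, so another executor (or one that starts later) can still claim it.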

Running Jobs

RUNNING jobs are considered “stalled” if they have not updated their progress before the timeout deadline. If a running job stalls, it is moved to a TIMED_OUT state.

TIMED_OUT jobs can be taken by any other job executor microservice with an available slot. If a job is taken by another job executor, it is moved to a RUNNING state.
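
As a rough sketch of the RUNNING check, a single watcher pass might look like the code below. The RunningJob class, the mark_timed_out() function, and the last_progress_at field are hypothetical names chosen for illustration; the actual watcher may track progress differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunningJob:
    """Hypothetical job record; field names are assumptions."""
    job_id: str
    state: str               # e.g. "RUNNING" or "TIMED_OUT"
    last_progress_at: float  # time (in seconds) of the last progress update


def mark_timed_out(jobs: List[RunningJob], now: float,
                   timeout: float) -> List[str]:
    """One watcher pass: move stalled RUNNING jobs to TIMED_OUT.

    Returns the ids of the jobs that were moved in this pass.
    """
    moved = []
    for job in jobs:
        if job.state == "RUNNING" and now - job.last_progress_at > timeout:
            job.state = "TIMED_OUT"
            moved.append(job.job_id)
    return moved
```

Jobs that are not in a RUNNING state are skipped, so repeating the pass does not report the same job twice.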

Timed Out Jobs

TIMED_OUT jobs are moved back to a RUNNING state if the original job execution progresses.

TIMED_OUT jobs are considered “stalled” if they still have not updated their progress by the second timeout deadline.

When a TIMED_OUT job stalls, the resume() method of the same plugin is called:

  • If resume() is implemented and the job progresses, it is moved to a RUNNING state.

  • If resume() is not implemented or the job has not progressed, it is marked as FAILED.
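
The TIMED_OUT rules above can be summarised as a small decision function. This is a sketch under stated assumptions: resolve_timed_out() and its parameters are hypothetical, and the plugin's resume() is modelled as an optional callable that returns True when the job has progressed, which may not match the real plugin interface.

```python
from typing import Callable, Optional


def resolve_timed_out(progressed: bool,
                      second_deadline_passed: bool,
                      resume: Optional[Callable[[], bool]]) -> str:
    """Return the next state for a job that is currently TIMED_OUT.

    progressed: the original execution has updated its progress again.
    second_deadline_passed: the second timeout deadline has expired.
    resume: stand-in for the plugin's resume(); None means "not implemented".
    """
    if progressed:
        return "RUNNING"        # original execution recovered on its own
    if not second_deadline_passed:
        return "TIMED_OUT"      # still waiting for the second deadline
    if resume is not None and resume():
        return "RUNNING"        # resume() is implemented and the job progressed
    return "FAILED"             # no resume(), or no progress after resume()
```

For example, resolve_timed_out(False, True, None) yields "FAILED", matching the case where resume() is not implemented.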

Note: resume() is only supported in plugin jobs that do not use ActionProgress. For jobs that use ActionProgress, those marked as TIMED_OUT are moved to a FAILED state if no job executor takes responsibility for them.
https://help.ooyala.com/sites/all/libraries/dita/en/media-logistics/flex/dev/70/jef_programming_guide_watcher_restart_job_mechanism.html
