Job Life Cycle Statuses: Additional Information
Resuming a Job
A job can be resumed with resume(). This method is called by the job async executor and resources. Executors provide it for cases where a job has timed out and no longer responds. If the job cannot be resumed, it fails.
Retrying a Job
A job can be retried with retry(). This method is called by the job sync and async executors and by resources when Enterprise has retried the job. A job executor may decide to run the execute() method immediately, but it is important to consider the different scenarios before executing the job again.
Cancelling a Job
As a user, you can cancel a job using either the UI or the REST API.
In a distributed environment without transaction support, you need a dedicated way to cancel jobs. This is especially important when a job cannot be located in the architecture.
In JEF, Redis is used as a shared repository to mark jobs that are being cancelled. The Cancel service is available in the flex-executioncommons-library.
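To illustrate the shared-repository idea, the sketch below marks jobs as being cancelled in a store that every executor instance can read. CancelMarkerStore, InMemoryCancelMarkerStore, and their method names are hypothetical and do not reflect the API of the Cancel service in flex-executioncommons-library. An in-memory set stands in for Redis so that the example runs without a server.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical abstraction over the shared repository (Redis in JEF).
interface CancelMarkerStore {
    void markCancelling(String jobId);
    boolean isCancelling(String jobId);
    void clear(String jobId);
}

// In-memory stand-in (assumption for illustration only). A real deployment
// would back this with Redis so that all executor instances see the markers.
class InMemoryCancelMarkerStore implements CancelMarkerStore {
    private final Set<String> cancelling = ConcurrentHashMap.newKeySet();

    @Override
    public void markCancelling(String jobId) {
        cancelling.add(jobId);
    }

    @Override
    public boolean isCancelling(String jobId) {
        return cancelling.contains(jobId);
    }

    @Override
    public void clear(String jobId) {
        cancelling.remove(jobId);
    }
}
```

Because the marker lives outside any single microservice, whichever executor holds the job can check it, even if the cancel request arrived at a different instance.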
A job can be cancelled while it is in a QUEUED state in JEF. A job can also be cancelled while a job async executor is executing it.
cancel(): The cancel() method is called by the job async executor and resources if Enterprise has requested cancellation of the job. A job executor may ignore the request or not implement this interface, in which case the job can’t be cancelled.
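The difference between an executor that implements cancel() and one that does not can be sketched as follows. This is a minimal illustration, not the JEF interface: SketchJobExecutor, both executor classes, the steps parameter, and the "COMPLETED" and "CANCELLED" outcome strings are all hypothetical names.

```java
// Hypothetical executor contract; JEF's real interfaces may differ.
interface SketchJobExecutor {
    String execute(int steps);

    // Default: cancellation is not implemented, so the request is ignored.
    default boolean cancel() {
        return false;
    }
}

// Does not override cancel(), so its jobs can't be cancelled.
class NonCancellableExecutor implements SketchJobExecutor {
    @Override
    public String execute(int steps) {
        return "COMPLETED";
    }
}

// Honours cancel() by checking a flag between units of work.
class CancellableExecutor implements SketchJobExecutor {
    private volatile boolean cancelRequested;

    @Override
    public String execute(int steps) {
        for (int i = 0; i < steps; i++) {
            if (cancelRequested) {
                return "CANCELLED";
            }
            // One unit of work would run here.
        }
        return "COMPLETED";
    }

    @Override
    public boolean cancel() {
        cancelRequested = true;
        return true; // request accepted
    }
}
```

The flag is checked cooperatively between work units, so a long-running step still finishes before the cancellation takes effect.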
Watcher and Restart Job Mechanism
Job async executors typically execute long-running tasks and/or use external services to execute jobs.
If an outage occurs, the whole microservice might stall or stop entirely while jobs are running. Such jobs are managed by a watcher. Every job executor runs a watcher unless it is disabled. This scheduled thread watches all running jobs (across all job executors) to identify the following scenarios:
QUEUED jobs may be considered “stalled” if they have not moved to a RUNNING state before the timeout deadline. Stalled QUEUED jobs can be taken by another job executor with an available slot.
1) Any job executor microservice can take this job and move it to a RUNNING state.
2) Any newly started job executor microservice can take this job and move it to a RUNNING state at startup.
RUNNING jobs may be considered “stalled” if they haven’t updated their progress before the timeout deadline. If a RUNNING job stalls, it is moved to a TIMED_OUT state.
TIMED_OUT jobs can be taken by any other job executor microservice with an available slot. If a job is taken by another job executor, it is moved to a RUNNING state.
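The watcher scenarios above can be summarised as a small decision function. This is a sketch under stated assumptions rather than the actual watcher code: JobState, WatcherRules, nextState, and the parameters idleMillis, timeoutMillis, and slotAvailable are hypothetical names, and a job is treated as stalled once its idle time reaches the timeout.

```java
enum JobState { QUEUED, RUNNING, TIMED_OUT, FAILED }

final class WatcherRules {
    private WatcherRules() {}

    // Returns the state the watcher would move a job to. idleMillis is the time
    // since the job was queued (QUEUED) or last reported progress (RUNNING).
    static JobState nextState(JobState current, long idleMillis,
                              long timeoutMillis, boolean slotAvailable) {
        boolean stalled = idleMillis >= timeoutMillis;
        switch (current) {
            case QUEUED:
                // A stalled QUEUED job is taken by an executor with a free slot.
                return (stalled && slotAvailable) ? JobState.RUNNING : JobState.QUEUED;
            case RUNNING:
                // A RUNNING job without a progress update is moved to TIMED_OUT.
                return stalled ? JobState.TIMED_OUT : JobState.RUNNING;
            case TIMED_OUT:
                // Another executor with a free slot can take the job over.
                return slotAvailable ? JobState.RUNNING : JobState.TIMED_OUT;
            default:
                return current;
        }
    }
}
```

For example, nextState(JobState.RUNNING, 20_000, 10_000, false) yields TIMED_OUT, and a later call with TIMED_OUT and a free slot yields RUNNING.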
Timed Out Jobs
TIMED_OUT jobs are moved back to a RUNNING state if the original job execution progresses.
TIMED_OUT jobs may be considered “stalled” if they still haven’t updated their progress by their second timeout deadline.
For a stalled TIMED_OUT job, the resume() method is called for the same plugin:
- If resume() is implemented and the job progresses, it is moved to a RUNNING state.
- If resume() is not implemented or the job has not progressed, it is marked as FAILED.
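The two outcomes for a TIMED_OUT job can be sketched as below. The names ResumeOutcome, ResumablePlugin, and TimedOutResolver are hypothetical, and the boolean return value of resume() (true if the job progressed) is an assumption made for illustration; the real resume() signature is not shown in this document.

```java
enum ResumeOutcome { RUNNING, FAILED }

// Hypothetical plugin contract. resume() returns true if the job progressed.
interface ResumablePlugin {
    // Default: resume() is not implemented, so no progress is reported.
    default boolean resume() {
        return false;
    }
}

final class TimedOutResolver {
    private TimedOutResolver() {}

    // Resolves a stalled TIMED_OUT job: RUNNING if the resumed job progressed,
    // FAILED if resume() is not implemented or the job made no progress.
    static ResumeOutcome resolve(ResumablePlugin plugin) {
        return plugin.resume() ? ResumeOutcome.RUNNING : ResumeOutcome.FAILED;
    }
}
```

A plugin that leaves the default in place is treated the same way as one whose resumed job made no progress: both end in FAILED.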