
Long running tasks

Long running tasks are ones that cannot be completed within the normal message processing window. Examples include encoding a video or preparing a large file for download. Because they take a long time to complete, they should be run in a way that doesn't block the message handling process.

When a message is read from the underlying queue, it must be processed and deleted within a configured timeout (usually a few minutes). If it is not deleted in this time, the underlying queue will assume that the consuming process has died and will make the message visible again. It will then be picked up by another handler, which will repeat the same process. After a number of iterations like this, the message will be sent to the dead letter queue.
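The redelivery behaviour described above can be sketched with a small in-memory model. This is an illustration only: `VisibilityQueue`, its methods, and the explicit `now` clock parameter are hypothetical names, not part of @node-ts/bus or of any real queue service, which implement this behaviour internally.

```typescript
// Simplified, hypothetical model of a queue with a visibility timeout
interface QueuedMessage {
  id: string
  body: string
  receiveCount: number
  invisibleUntil: number // timestamp in ms; hidden from consumers until then
}

class VisibilityQueue {
  readonly deadLetters: QueuedMessage[] = []
  private messages: QueuedMessage[] = []
  private readonly visibilityTimeoutMs: number
  private readonly maxReceiveCount: number

  constructor (visibilityTimeoutMs: number, maxReceiveCount: number) {
    this.visibilityTimeoutMs = visibilityTimeoutMs
    this.maxReceiveCount = maxReceiveCount
  }

  send (id: string, body: string): void {
    this.messages.push({ id, body, receiveCount: 0, invisibleUntil: 0 })
  }

  // Returns the next visible message and hides it for the visibility timeout
  receive (now: number): QueuedMessage | undefined {
    this.moveExhaustedToDeadLetters(now)
    const visible = this.messages.filter(m => m.invisibleUntil <= now)
    if (visible.length === 0) {
      return undefined
    }
    const message = visible[0]
    message.receiveCount++
    message.invisibleUntil = now + this.visibilityTimeoutMs
    return message
  }

  // A handler calls this once it has finished processing the message
  deleteMessage (id: string): void {
    this.messages = this.messages.filter(m => m.id !== id)
  }

  // Messages that timed out after their final allowed delivery are dead-lettered
  private moveExhaustedToDeadLetters (now: number): void {
    const remaining: QueuedMessage[] = []
    for (const message of this.messages) {
      const timedOut = message.invisibleUntil <= now
      if (timedOut && message.receiveCount >= this.maxReceiveCount) {
        this.deadLetters.push(message)
      } else {
        remaining.push(message)
      }
    }
    this.messages = remaining
  }
}
```

In this model, a message that is received but never deleted becomes visible again each time the timeout elapses, and moves to `deadLetters` once it has been delivered `maxReceiveCount` times without being deleted.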

The other downside to running time-consuming processes in a handling window is that it blocks that worker from consuming other messages while it waits for the process to complete. Handlers should process messages as fast as possible.

Approaching long running tasks

Let's say we have a command EncodeVideo. Encoding might take up to an hour, so a handler can't wait for it to finish without the message being returned to the queue.

Instead, a separate process should be started, and the events that follow should reflect the asynchronous nature of the task.

A naive approach

A simple, but not recommended, approach is to run the task in the background:

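import { videoService } from 'services'
import { EncodeVideo } from 'messages' // assumed location of the command definition

// Not recommended: the handler returns immediately, so the EncodeVideo command
// is deleted from the queue before encoding has finished
export const encodeVideoHandler = (encodeVideo: EncodeVideo) => {
  // Fire and forget: nothing awaits this promise, so a failure is never reported
  setTimeout(async () => videoService.encode(encodeVideo), 0)
}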

This will background the task and the EncodeVideo command will be deleted, but it has several flaws.

There is no retry mechanism if the process fails. It will simply die without publishing any message to indicate the failure.

There's no effective load balancing of tasks. The same handler service may receive all of the EncodeVideo commands and attempt to run hundreds of these processes in the background, eventually crashing the instance.

Lastly, if the service is restarted, the backgrounded tasks are killed and won't be retried.

A containerised approach

If your app runs in Kubernetes, Docker Swarm, ECS, etc., then starting a pod/task per long running task can be very effective. This is outside the scope of what @node-ts/bus provides, but it can be implemented relatively simply with handlers.

When receiving an EncodeVideo command, use the handler to start the encoding process in a new pod/task and leave it up to the scheduler to place it. This should also make it easier to scale out your app as the volume of long running tasks grows.
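As a rough sketch of this idea, the handler below only asks a scheduler to start the encoding task and then returns. The `TaskScheduler` interface, the fields on `EncodeVideo`, the image name, and the command-line arguments are all assumptions made for illustration. A real implementation would wrap the Kubernetes, Docker Swarm, or ECS client of your choice, and would be registered with the bus in the usual way.

```typescript
// Hypothetical command shape; the real EncodeVideo command may differ
interface EncodeVideo {
  videoId: string
  sourceUrl: string
}

// Hypothetical abstraction over the container scheduler (Kubernetes, ECS, ...)
interface TaskScheduler {
  // Resolves with the scheduler's id for the task once it has been accepted
  startTask (request: { image: string, args: string[] }): Promise<string>
}

// Builds a handler that starts the encoding task and returns straight away,
// so the command can be deleted well within the message timeout
const createEncodeVideoHandler = (scheduler: TaskScheduler) =>
  async (encodeVideo: EncodeVideo): Promise<string> => {
    const taskId = await scheduler.startTask({
      image: 'video-encoder:latest', // assumed image name
      args: ['--video-id', encodeVideo.videoId, '--source', encodeVideo.sourceUrl]
    })
    return taskId
  }
```

If `startTask` rejects, the handler rejects too, so the command is not deleted and gets retried by the queue as usual.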

Resiliency

Just using handlers is a good start, but it won't provide the reliability needed if a task fails or gets terminated by the scheduler.

Workflows can be used to listen for system messages from the scheduler that indicate when a task has exited, and rerun the task if necessary.
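The decision a workflow has to make when a task exits can be sketched in plain TypeScript. This deliberately does not use the @node-ts/bus workflow API: `TaskExited`, `EncodeVideoTracker`, and the `Decision` type are hypothetical names, and the attempt counts held in memory here would normally live in persisted workflow state.

```typescript
// Hypothetical event published when the scheduler reports a task has stopped
interface TaskExited {
  videoId: string
  exitCode: number
}

type Decision =
  | { type: 'completed', videoId: string }
  | { type: 'retry', videoId: string, attempt: number }
  | { type: 'failed', videoId: string }

class EncodeVideoTracker {
  private readonly maxAttempts: number
  private readonly attempts: Record<string, number> = {}

  constructor (maxAttempts: number) {
    this.maxAttempts = maxAttempts
  }

  // Record that an encoding task was started for this video
  taskStarted (videoId: string): void {
    this.attempts[videoId] = (this.attempts[videoId] || 0) + 1
  }

  // Decide what to do once the scheduler reports that the task has exited
  taskExited (event: TaskExited): Decision {
    const attempt = this.attempts[event.videoId] || 0
    if (event.exitCode === 0) {
      delete this.attempts[event.videoId]
      return { type: 'completed', videoId: event.videoId }
    }
    if (attempt < this.maxAttempts) {
      // The caller is expected to start a new task and call taskStarted again
      return { type: 'retry', videoId: event.videoId, attempt: attempt + 1 }
    }
    delete this.attempts[event.videoId]
    return { type: 'failed', videoId: event.videoId }
  }
}
```

A workflow built along these lines would start a new task on a `retry` decision and publish a success or failure event for the other two, so that the rest of the system learns the outcome of the original EncodeVideo command.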
