async Control Flows with Node.js

Callback Hell

Recently I found this example of Callback Hell in a professional, commercial code-base I was working on:

d3.json(GRAPH_FILE, function(data) {  
 d3.json("data/industries.json", function(industryJson) {
  d3.json("data/cc-latlong.min.json", function(countryCoordArr) {
   d3.json("data/capitals.json", function(error, capitals) {
    nodeData = data
    industries = industryJson
    countryCoordinates = countryCoordArr
    countryData = capitals
   })
  })
 })
})

Something looks odd here, doesn't it? Clearly this pattern cannot continue. What if we needed to load 10 or more sets of data? Do we just keep indenting, loading each new dataset inside the callback of the previous load?

To be fair, this was part of a product still in its prototype phase. But the effort that went into constructing this chunk of code would have been better spent using the correct tool for the job, and doing that consistently makes for a smoother transition from prototype to production. Worse, the code above starts loading each file only after the previous load has completed, even though none of the loads depends on another, and the error passed to the innermost callback is never checked. Hopefully future developers working in this codebase do not continue following this pattern.
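To make the cost of the nesting concrete, here is a minimal sketch that swaps d3.json for a hypothetical fakeLoad(name, ms, callback) function built on setTimeout. The dataset names and the 30 ms delays are made up; the point is only the order in which the loads start and finish.

```javascript
// Hypothetical stand-in for d3.json(url, callback): "loads" a file after
// `ms` milliseconds and records when each load starts and finishes.
const log = []

function fakeLoad(name, ms, callback) {
  log.push(`start ${name}`)
  setTimeout(() => {
    log.push(`end ${name}`)
    callback(null, { name })
  }, ms)
}

// Same shape as the pyramid above: each load is only started from inside
// the previous load's callback. (Errors are ignored here, as in the original.)
function loadNested(done) {
  const startedAt = Date.now()
  fakeLoad("graph", 30, (err, data) => {
    fakeLoad("industries", 30, (err, industries) => {
      fakeLoad("coords", 30, (err, coords) => {
        fakeLoad("capitals", 30, (err, capitals) => {
          done(null, { elapsedMs: Date.now() - startedAt, data, industries, coords, capitals })
        })
      })
    })
  })
}

loadNested((err, result) => {
  // The log alternates start/end: no load begins until the previous one ends,
  // so the total is roughly the sum of the four delays (about 120 ms).
  console.log(log.join(", "))
  console.log(`elapsed: ${result.elapsedMs} ms`)
})
```

The log alternates strictly between start and end entries, which is the one-at-a-time behaviour described above.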

Note that defining anonymous functions inside functions (and perhaps even inside those inner functions) is not inherently bad. For example, defining an inline callback for a click or scroll event is fine, and such functions will continue to be nested inside other functions. If it's more readable to name the function and move it outside the method, though, do that instead.
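One way to apply that advice to the loading code from the top of this article is to give every callback a name and declare them side by side. This is a sketch: loadJson is a hypothetical stand-in for d3.json, and the value assigned to GRAPH_FILE is invented.

```javascript
const GRAPH_FILE = "data/graph.json" // hypothetical value for the constant in the example

// Hypothetical stand-in for d3.json(url, callback) that answers with an
// error-first callback after a short delay.
function loadJson(url, callback) {
  setTimeout(() => callback(null, { source: url }), 5)
}

// The same four sequential loads as the pyramid above, but every callback is a
// named function declared at the same level, and every error is checked.
function loadAll(done) {
  const loaded = {}
  // Function declarations are hoisted, so onGraph can be referenced here.
  loadJson(GRAPH_FILE, onGraph)

  function onGraph(err, data) {
    if (err) return done(err)
    loaded.nodeData = data
    loadJson("data/industries.json", onIndustries)
  }

  function onIndustries(err, industryJson) {
    if (err) return done(err)
    loaded.industries = industryJson
    loadJson("data/cc-latlong.min.json", onCoordinates)
  }

  function onCoordinates(err, countryCoordArr) {
    if (err) return done(err)
    loaded.countryCoordinates = countryCoordArr
    loadJson("data/capitals.json", onCapitals)
  }

  function onCapitals(err, capitals) {
    if (err) return done(err)
    loaded.countryData = capitals
    done(null, loaded)
  }
}

loadAll((err, loaded) => {
  if (err) return console.error(err)
  console.log(Object.keys(loaded))
})
```

This flattens the pyramid and handles errors, but the four loads still run one after another; it improves readability, not performance.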

By about the third or fourth layer of callbacks, however, consider whether what you really want is an asynchronous control flow pattern.

Control Flows

Control Flows are patterns provided by the async library that make asynchronous operations more intuitive to write and to read. Would you rather wade through Callback Hell, or read a perfectly clear asynchronous Control Flow?

Choosing the best Control Flow for the task at hand improves both performance and readability. It's also easy to switch between series and parallel by changing the function name, because both take the same arguments: an array of anonymous (or named) functions, followed by a single, optional callback function to be called when the Control Flow is finished. waterfall takes the same arguments as well, but each of its tasks also receives the previous task's results, so its tasks have a different signature.

Fetching data from the web is a simple and commonly used asynchronous process, so we will use this type of process in the following examples.

async.series(tasks, [callback])

This Control Flow is probably the most common and the easiest to use. It runs each function in turn, starting the next one only after the previous one has called back, and finally calls the final callback with an array containing each function's result, in order. This means the functions do not have access to each other's results. If that is what you need, see async.waterfall.

const async = require('async')

/**
 * Example of async.series
 *
 * someAsyncMethod and thisAsyncMethod are placeholders for any functions
 * that take an error-first callback.
 *
 * @param {Function} done - Function to be called when
 * the async.series is complete.
 */
function asyncSeries(done) {
 async.series([
  // This is the first task. When `callback()` is called,
  // the series moves on to the next task.
  (callback) => {
   someAsyncMethod((err, result) => {
    return callback(err, result)
   })
  },
  // The next task in the series.
  (callback) => {
   // Since (err, result) is the standard callback signature in Node.js,
   // we don't need to wrap the callback in our own function: thisAsyncMethod
   // will call it with (err, result) directly.
   // This is equivalent to the way we called someAsyncMethod() in the first task.
   return thisAsyncMethod(callback)
  }
  // This final (and optional) function receives the first error, if any,
  // or an array holding every task's result in task order.
 ], (err, results) => {
   return done(err, results)
  })
}

This is a very short example of what the async library is all about.

The done() function is passed into asyncSeries() by its caller. When the series finishes, execution continues in whatever function the caller supplied as done.

In each of the functions in the series, we invoke callback() to move on to the next task. callback(err, result) takes an error object as the first argument and the result of the operation as the second. This Node.js convention is known as the error-first callback: an err of null (or any other falsy value) means the operation succeeded.
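Here is a minimal sketch of the error-first convention on its own, independent of the async library. parsePort and describePort are hypothetical functions invented for this example, and setTimeout stands in for real I/O.

```javascript
// Hypothetical async function following the error-first convention:
// it calls back with (error) on failure or (null, result) on success.
function parsePort(text, callback) {
  // setTimeout stands in for real asynchronous I/O.
  setTimeout(() => {
    const port = Number(text)
    if (!Number.isInteger(port) || port < 1 || port > 65535) {
      return callback(new Error(`invalid port: ${text}`))
    }
    callback(null, port)
  }, 0)
}

// The caller checks `err` first and only uses the result when err is falsy,
// then reports back to its own caller in the same (err, result) form.
function describePort(text, done) {
  parsePort(text, (err, port) => {
    if (err) return done(err)
    done(null, `listening on ${port}`)
  })
}

describePort("8080", (err, message) => console.log(err ? err.message : message)) // listening on 8080
describePort("http", (err, message) => console.log(err ? err.message : message)) // invalid port: http
```

Because describePort passes errors straight through to done, it can itself be used as a task body in any of the control flows in this article.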

As soon as any of the functions passes a non-null error to its callback, the series skips the remaining tasks and goes straight to the final function, the second argument to async.series. Otherwise, that final function runs once every function in the tasks array has completed.
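To show what happens under the hood, the series behaviour can be sketched without the library. This series function is a hand-rolled illustration of the behaviour described above, not the async library's source, and delayed is a hypothetical helper that fakes a slow task with setTimeout.

```javascript
// Hand-rolled sketch of the series contract (not the async library's source):
// run tasks one at a time, collect results in order, stop at the first error.
function series(tasks, finalCallback = () => {}) {
  const results = []

  function runTask(index) {
    // Every task has called back successfully: hand over all results.
    if (index === tasks.length) return finalCallback(null, results)
    tasks[index]((err, result) => {
      // The first error skips the remaining tasks.
      if (err) return finalCallback(err)
      results.push(result)
      // The next task starts only after this one has called back.
      runTask(index + 1)
    })
  }

  runTask(0)
}

// Hypothetical helper: a task that calls back with `value` after `ms` milliseconds.
function delayed(value, ms) {
  return (callback) => setTimeout(() => callback(null, value), ms)
}

// Even though "b" is the fastest task, results keep the task order.
series([delayed("a", 20), delayed("b", 5), delayed("c", 10)], (err, results) => {
  console.log(err, results) // null [ 'a', 'b', 'c' ]
})

// The second task fails, so the third task never runs.
const ran = []
series([
  (cb) => { ran.push(1); cb(null, 1) },
  (cb) => { ran.push(2); cb(new Error("boom")) },
  (cb) => { ran.push(3); cb(null, 3) },
], (err) => console.log(err.message, ran)) // boom [ 1, 2 ]
```

Each task is started from inside the previous task's callback, so the nesting from Callback Hell still exists; it has simply been moved into one reusable function.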

async.parallel(tasks, [callback])

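async.parallel works the same way as series, except that it does not wait for one task to complete before starting the next. As its name suggests, every task begins executing immediately. The final callback runs once the last task has called its callback() (or as soon as any task reports an error); in the example below, it then calls done(). The results array still follows the order of the tasks array, not the order in which the tasks finished.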

This is the best choice when none of the tasks depends on any of the others. Because the tasks' waiting periods overlap, a handful of independent requests takes roughly as long as the slowest one, rather than the sum of all of them. Efficient!

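/**
 * Example of async.parallel
 * (assumes jQuery's $ and the async library are available)
 *
 * @param {Function} done - Function to be called when
 * the async.parallel is complete.
 */
function asyncParallel(done) {
 async.parallel([
  (callback) => {
   $.ajax({ url: "https://movies.com/latest",
    success: (result) => callback(null, result),
    // Report failures too; otherwise async.parallel would wait forever.
    error: (xhr, status) => callback(new Error(status))
   })
  },
  (callback) => {
   $.ajax({ url: "https://movies.com/theaters",
    success: (result) => callback(null, result),
    error: (xhr, status) => callback(new Error(status))
   })
  },
  (callback) => {
   $.ajax({ url: "https://movies.com/showtimes",
    success: (result) => callback(null, result),
    error: (xhr, status) => callback(new Error(status))
   })
  }], (err, results) => {
   // Return early so done() is not called a second time on failure.
   if (err) return done(err)
   // If everything went smoothly, results contains [latest, theaters, showtimes]
   return done(null, results)
 })
}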
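A hand-rolled sketch of the parallel behaviour shows why results stay in task order: each result is stored at its task's index rather than pushed as it arrives. This parallel function is an illustration, not the async library's code, and delayed is a hypothetical helper that fakes a request with setTimeout.

```javascript
// Hand-rolled sketch of the parallel contract (not the async library's source):
// start every task at once, store each result at its task's index, and call
// the final callback when all tasks are done or on the first error.
function parallel(tasks, finalCallback = () => {}) {
  const results = new Array(tasks.length)
  let pending = tasks.length
  let finished = false
  if (pending === 0) return finalCallback(null, results)

  tasks.forEach((task, index) => {
    task((err, result) => {
      // Ignore anything that arrives after the final callback has run.
      if (finished) return
      if (err) {
        finished = true
        return finalCallback(err)
      }
      // Store by index, so results follow task order, not completion order.
      results[index] = result
      pending -= 1
      if (pending === 0) {
        finished = true
        finalCallback(null, results)
      }
    })
  })
}

// Hypothetical helper: a task that calls back with `value` after `ms` milliseconds.
function delayed(value, ms) {
  return (callback) => setTimeout(() => callback(null, value), ms)
}

const startedAt = Date.now()
parallel([delayed("latest", 60), delayed("theaters", 20), delayed("showtimes", 40)], (err, results) => {
  // results is [ 'latest', 'theaters', 'showtimes' ]; the elapsed time is
  // close to the slowest task (about 60 ms), not the 120 ms sum.
  console.log(results, `${Date.now() - startedAt} ms`)
})
```

All three tasks are started synchronously by forEach, so their timers run concurrently; only the bookkeeping in the shared callback decides when everything is done.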

async.waterfall(tasks, [callback])

Waterfall is the same idea as the series control flow, except that each task in the waterfall receives the results of the previous task. In this example, imagine our call to https://movies.com/latest returns an array of the latest movies. In the second task, we use the async.map function to find theaters for each of those movies.

/**
 * Example of async.waterfall
 * (assumes jQuery's $ and the async library are available)
 *
 * @param {Function} done - Function to be called when
 * the async.waterfall is complete.
 */
function asyncWaterfall(done) {
 async.waterfall([
  (callback) => {
   $.ajax({ url: "https://movies.com/latest",
    success: (movies) => callback(null, movies),
    error: (xhr, status) => callback(new Error(status))
   })
  },
  // Now we can use the results of the first call to find the theaters
  (movies, callback) => {
   // async.map is a utility provided by the async library for applying an action to every item
   // in an array. In this case, we want to call an endpoint for each movie in movies.
   // This is similar to Array.prototype.map, except async.map runs its iterations
   // asynchronously and in parallel. Efficient!
   async.map(movies, (movie, mapCb) => {
     $.ajax({
      url: 'https://movies.com/' + movie + '/theaters',
      // jQuery calls success(data, textStatus, jqXHR), so passing mapCb directly
      // would treat the data as an error. Wrap it in the (err, result) form instead.
      success: (theaters) => mapCb(null, theaters),
      error: (xhr, status) => mapCb(new Error(status))
     })
    }, (err, theatersPerMovie) => {
      // theatersPerMovie contains the result of every call made to movies.com, in movie order
      return callback(err, theatersPerMovie)
    })
  }], (err, results) => {
   if (err) return done(err)
   return done(null, results)
 })
}
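Stripped of jQuery and HTTP, the waterfall behaviour fits in a few lines: whatever a task passes to its callback after the error argument becomes the leading arguments of the next task. This waterfall function is a hand-rolled sketch rather than the library's implementation, and fetchLatestMovies and fetchTheaters are hypothetical stand-ins for the two requests above.

```javascript
// Hand-rolled sketch of the waterfall contract (not the async library's source):
// whatever a task passes to its callback after `err` becomes the leading
// arguments of the next task.
function waterfall(tasks, finalCallback = () => {}) {
  function runTask(index, args) {
    // The last task's results go to the final callback.
    if (index === tasks.length) return finalCallback(null, ...args)
    tasks[index](...args, (err, ...results) => {
      // The first error skips the remaining tasks.
      if (err) return finalCallback(err)
      runTask(index + 1, results)
    })
  }
  runTask(0, [])
}

// Hypothetical stand-ins for the two requests in the example above.
function fetchLatestMovies(callback) {
  setTimeout(() => callback(null, ["alien", "heat"]), 10)
}
function fetchTheaters(movie, callback) {
  setTimeout(() => callback(null, ["Odeon", "Rex"]), 5)
}

waterfall([
  fetchLatestMovies,
  // Receives the movie list from the first task, then its own callback.
  (movies, callback) => fetchTheaters(movies[0], (err, theaters) => callback(err, movies[0], theaters)),
  // Receives both values passed along by the second task.
  (movie, theaters, callback) => callback(null, `${movie} is showing at ${theaters.join(" and ")}`),
], (err, summary) => {
  console.log(err ? err.message : summary) // alien is showing at Odeon and Rex
})
```

Passing more than one value to a callback (as the second task does) is how a waterfall task hands several results to the next step.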

There are many more control flows, and one of them could be an even better option depending on the use case. Also note that the tasks array (the array containing all the functions you want to run with a given flow) can be built beforehand. If you have 100 or more tasks that need parallel processing, it's not a good idea to write them all inline in the async.parallel call. It's more readable to extract the tasks into a function that generates the tasks array, so your method looks like this:

function processTasks(done) {
  // getTasksArray() is your own helper that returns an array of task functions
  const tasks = getTasksArray()
  async.parallel(tasks, done)
}
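Here is one possible shape for such a helper. This getTasksArray takes a list of URLs (unlike the zero-argument call above), and fetchJson is a hypothetical stand-in for an HTTP request; both are assumptions made for illustration. Each generated task has the (callback) => ... shape that async.series and async.parallel expect.

```javascript
// Hypothetical stand-in for an HTTP request that answers with an
// error-first callback.
function fetchJson(url, callback) {
  setTimeout(() => callback(null, { url }), 5)
}

// Hypothetical helper: builds one task per URL. Each task has the
// (callback) => ... shape that async.series and async.parallel expect.
function getTasksArray(urls) {
  return urls.map((url) => (callback) => fetchJson(url, callback))
}

const urls = Array.from({ length: 100 }, (_, i) => `https://example.com/items/${i}`)
const tasks = getTasksArray(urls)

console.log(tasks.length) // 100
// Each task can be run on its own; this is what a control flow does with each task.
tasks[0]((err, data) => console.log(err, data.url)) // null https://example.com/items/0
```

With the async library loaded, the resulting array can be passed straight to async.parallel(tasks, done), as processTasks does.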

You can find an exhaustive list of the capabilities of the async library at the async project GitHub page.

It's important to have a working knowledge of many tools, and to use the correct tool for the job. A carpenter doesn't use the handle of a screwdriver to bang in a nail, and neither should a software developer use a synchronous programming style when performance and readability are at stake.