hot code loading in nodejs

Hot Code Loading in Node.js

Node.js Web应用代码热更新的另类思路

Reading through Fever today, this post by Jack Moffitt caught my eye. In it, he discusses a hack to allow a running Python process to dynamically reload code. While the hack itself, shall we say, lacks subtlety, Jack's post got me thinking. It's true, Erlang's hot code loading is a great feature, enabling Erlang's 99.9999999% uptime claims. It occurred to me that it wouldn't be terribly difficult to implement for node.js' CommonJS-based module loader.

A few hours (and a tasty home-made Paella later), here's my answer: Hotload node branch.

Umm… What does it do?

var requestHandler = require('./myRequestHandler');

process.watchFile('./myRequestHandler', function () {
  module.unCacheModule('./myRequestHandler');
  requestHandler = require('./myRequestHandler');
}

var reqHandlerClosure = function (req, res) {
  requestHandler.handle(req, res);
}

http.createServer(reqHandlerClosure).listen(8000);

Now, any time you modify myRequestHandler.js, the above code will notice and replace the local requestHandler with the new code. Any existing requests will continue to use the old code, while any new incoming requests will use the new code. All without shutting down the server, bouncing any requests, prematurely killing any requests, or even relying on an intelligent load balancer.

Awesome! How does it work?

Basically, all node modules are created as sandboxes, so that as long as you don't use global variables, you can be sure that any modules you write won't stomp on others' code, and vice versa, you can be sure that others' modules won't stomp on your code.

Modules are loaded by require()ing them and assigning the return to a local variable, like so:

var http = require('http');

The important insight is that the return value of require() is a self-contained closure. There's no reason it has to be the same each time. Essentially, require(file) says "read file, seal it in a protective case, and return that protective case." require() is smart, though, and caches modules so that multiple attempts torequire() the same module don't waste time (synchronously) reading from disk. Those caches don't get invalidated, though, and even though we can detect when files change, we can't just call require() again, since the cached version takes precedence.

There are a few ways to fix this, but the subtleties rapidly complicate matters. If the ultimate goal is to allow an already-executing module (e.g., an http request handler) to continue executing while new code is loaded, then automatic code reloading is out, since changing one module will change them all. In the approach I've taken here, I tried to achieve two goals:

  1. Make minimal changes to the existing node.js require() logic.
  2. Ensure that any require() calls within an already-loaded module will return functions corresponding to the pre-hot load version of the code.

The latter goal is important because a module expects a specific set of behaviour from the modules on which it depends. Hot loading only works so long as modules have a consistent view of the world.

To accomplish these goals, all I've done is move the module cache from a global one into the module itself. Reloading is minimised by copying parent's caches into child modules (made fast and efficient thanks to V8's approach to variable handling). Any module can load a new version of any loaded modules by first removing that module from its local cache. This doesn't affect any other modules (including dependent modules), but will ensure that any sub-modules are reloaded, as long as they're not in the parent's cache.

By taking a relatively conservative approach to module reloading, I believe this is a flexible and powerful approach to hot code reloading. Most server applications have a strongly hierarchical code structure; as long as code reloading is done at the top-level, before many modules have been required, it can be done simply and efficiently.

While I hope this patch or a modified one will make it into node.js, this approach can be adapted to exist outside of node's core, at the expense of maintaining two require() implementations.