Node.js ReadFile Benchmark

2016-08-03

For a lecture I wanted to show why Node.js is awesome. There are many reasons why Node is a great tool, but the main point for me is that it embraces asynchronous, non-blocking processing. To demonstrate that behaviour I wanted to use an old-school example: reading files. For this purpose I created a small bash script which creates 10,000 files, each 10 KB in size, in a ./files subdirectory:

mkdir -p ./files
for i in $(seq 10000); do dd if=/dev/zero of=./files/${i} bs=10K count=1 status=none; done
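If you would rather stay in Node.js for the setup too, the same kind of test data can be generated with the fs module. This is a minimal sketch; createTestFiles is a hypothetical helper, and the recursive option of fs.mkdirSync assumes a reasonably recent Node.js.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: a Node.js counterpart to the bash loop above.
// Creates `count` files named 1..count in `dir`, each `sizeBytes` zero bytes long.
function createTestFiles(dir, count, sizeBytes) {
    fs.mkdirSync(dir, { recursive: true }); // behaves like `mkdir -p`
    const zeros = Buffer.alloc(sizeBytes); // zero-filled, like reading /dev/zero
    for (let i = 1; i <= count; i++) {
        fs.writeFileSync(path.join(dir, String(i)), zeros);
    }
}
```

Calling createTestFiles("./files", 10000, 10 * 1024) produces the 10,000 files of 10 KB described above.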

Next we just need to get the list of files with fs.readdir, iterate over it, and read each file either synchronously or asynchronously.

Scripts

Synchronous:

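const fs = require("fs");
const path = require("path");

// fileReadSync.js: read the files one after another, blocking on each read
fs.readdir("./files", function (error, files) {
    if (error) throw error;
    for (let i = 0; i < files.length; i++) {
        fs.readFileSync(path.join("./files", files[i]));
    }
});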

Asynchronous:

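const fs = require("fs");
const path = require("path");

// fileReadAsync.js: start every read at once and let the callbacks fire later
fs.readdir("./files", function (err, files) {
    if (err) throw err;
    for (let i = 0; i < files.length; i++) {
        fs.readFile(path.join("./files", files[i]), function (error, data) {
            if (error) throw error;
            // data is discarded on purpose; only the reading itself is measured
        });
    }
});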
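Newer Node.js versions (released after this post) also offer a promise-based fs API. The asynchronous script could then be written roughly like this; it is a sketch assuming fs.promises is available, and readAllAsync is a hypothetical name. It does the same work: all reads are started at once and the result resolves when every one has finished.

```javascript
const fs = require("fs");
const path = require("path");

// Assumption: a Node.js version that provides fs.promises.
async function readAllAsync(dir) {
    const files = await fs.promises.readdir(dir);
    // map() starts every read immediately; Promise.all waits until all are done
    return Promise.all(
        files.map((name) => fs.promises.readFile(path.join(dir, name)))
    );
}
```

Usage would be readAllAsync("./files").then(() => { /* data discarded, as above */ }). The scheduling is the same as in the callback version, so it should not change the benchmark picture by itself.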

We don't print anything or do anything else with the data, as that would only increase the inaccuracy of our benchmark. A single console.log(), for example, could block the whole script, as it is synchronous most of the time.

Attention: Whether console.log() is synchronous or asynchronous depends on the runtime and on where the output goes. In our case we run node in a TTY on Linux, where it can block; the official documentation described the behaviour like this:

The console functions are usually asynchronous unless the destination is a file. Disks are fast and operating systems normally employ write-back caching; it should be a very rare occurrence indeed that a write blocks, but it is possible.

Additionally, console functions are blocking when outputting to TTYs (terminals) on OS X as a workaround for the OS’s very small, 1kb buffer size. This is to prevent interleaving between stdout and stderr.
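If you do want some output from the asynchronous run, one way to keep console.log() out of the hot path is to count finished reads and report a single summary after the last callback. This is a hypothetical sketch (readAllAndReport is not part of the original scripts); it only keeps a running byte total instead of logging per file.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: read every file in `dir` asynchronously and call
// done(err, fileCount, totalBytes) exactly once, after the last read.
function readAllAndReport(dir, done) {
    fs.readdir(dir, function (err, files) {
        if (err) return done(err);
        let pending = files.length;
        let totalBytes = 0;
        let failed = false;
        if (pending === 0) return done(null, 0, 0);
        files.forEach(function (name) {
            fs.readFile(path.join(dir, name), function (error, data) {
                if (failed) return; // an earlier read already reported an error
                if (error) {
                    failed = true;
                    return done(error);
                }
                totalBytes += data.length; // cheap bookkeeping, no I/O
                if (--pending === 0) done(null, files.length, totalBytes);
            });
        });
    });
}
```

A call such as readAllAndReport("./files", (err, n, bytes) => console.log(n, bytes)) then prints one line at the very end instead of 10,000.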

Benchmark

Now we just run the scripts on our files and measure the time with the shell's time command:

$ time node fileReadSync.js
real    0m0.200s
user    0m0.128s
sys    0m0.068s

$ time node fileReadAsync.js
real    0m0.627s
user    0m0.168s
sys    0m0.544s
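Note that time also counts starting node and loading modules. To time only the reading phase, one could measure inside the process. The sketch below is an assumption-based alternative to the setup above; timeSync and timeAsync are hypothetical helpers built on performance.now() from perf_hooks.

```javascript
const fs = require("fs");
const path = require("path");
const { performance } = require("perf_hooks");

// Time only the synchronous reads; returns milliseconds.
function timeSync(dir) {
    const files = fs.readdirSync(dir);
    const start = performance.now();
    for (const name of files) fs.readFileSync(path.join(dir, name));
    return performance.now() - start;
}

// Time only the asynchronous reads; calls done(err, milliseconds) once.
function timeAsync(dir, done) {
    const files = fs.readdirSync(dir);
    const start = performance.now();
    let pending = files.length;
    let failed = false;
    if (pending === 0) return done(null, 0);
    for (const name of files) {
        fs.readFile(path.join(dir, name), function (err) {
            if (failed) return;
            if (err) {
                failed = true;
                return done(err);
            }
            if (--pending === 0) done(null, performance.now() - start);
        });
    }
}
```

Running both helpers in one process means the second run may be served from the operating system's page cache, so timing one variant per process is the fairer comparison.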

Huh? Seems strange, doesn't it? Contrary to our expectation, the blocking fileReadSync seems to be faster than the asynchronous, non-blocking fileReadAsync. But how can that be? First I tried other combinations, such as 10 files of 100 MB each or 100 files of 10 MB each, and a few others, but no matter what, fileReadSync was always as fast as or up to 2 times faster than fileReadAsync. After a few tries I guessed that it had to be because of my SSD. So I installed an old HDD into my system, retested, and watch this:

$ time node fileReadSync.js
real    0m4.203s
user    0m0.084s
sys    0m1.704s

$ time node fileReadAsync.js
real    0m1.823s
user    0m0.184s
sys    0m2.712s

It works just as expected! The asynchronous version is faster than the synchronous one. So, by implication, the SSD is so fast that the overhead of scheduling the callbacks outweighs the time spent waiting for the files!

Summary

fileReadSync is very, very fast on a good SSD, so you probably don't need to bother with asynchronous loading and all the callback hassle at server startup. Just load your files synchronously and use the data! Nevertheless, you should always use the asynchronous readFile when processing data per request, so that you don't block other requests. If you have a different opinion on my results or my recommendation, please comment!
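The recommendation above can be sketched as follows, assuming a plain http server. The helpers loadAtStartup and createServer, the directory argument, and mapping the URL path to a file name are all hypothetical, not part of the original post: blocking reads happen once before the server listens, while per-request reads use the callback API.

```javascript
const fs = require("fs");
const path = require("path");
const http = require("http");

// At startup nothing else is waiting yet, so blocking reads are acceptable.
function loadAtStartup(dir) {
    const cache = new Map();
    for (const name of fs.readdirSync(dir)) {
        cache.set(name, fs.readFileSync(path.join(dir, name)));
    }
    return cache;
}

// While serving, read asynchronously so one slow read does not stall other clients.
function createServer(dir) {
    return http.createServer(function (req, res) {
        // Hypothetical mapping: "/name" -> dir/name; basename drops directory parts
        const name = path.basename(new URL(req.url, "http://localhost").pathname);
        fs.readFile(path.join(dir, name), function (err, data) {
            if (err) {
                res.statusCode = 404;
                return res.end("not found");
            }
            res.end(data);
        });
    });
}
```

Query strings and percent-encoded names are not handled here; a real server would need more careful path handling.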

How I set up this Blog with Hexo!

2016-07-01

Welcome to Hexo! This is your very first post. Check the documentation for more info. If you run into any problems when using Hexo, you can find the answer in the troubleshooting section or ask me on GitHub.

Quick Start

Create a new post

$ hexo new "My New Post"

More info: Writing

Run server

$ hexo server

More info: Server

Generate static files

$ hexo generate

More info: Generating

Deploy to remote sites

$ hexo deploy

More info: Deployment

Welcome to the Blog of DeusProx!

2016-07-01

Hello!

This is Gordon Lawrenz, and I'm a computer science student (M.Sc.) at RWTH Aachen. On GitHub I'm also known as DeusProx!

This blog will help me document my work, projects and other parts of my life.
It is also hosted on GitHub via GitHub Pages in this repository.

I hope you enjoy it!