Node.js ReadFile Benchmark
For a lecture I wanted to show why Node.js is awesome. There are many reasons why node is a great tool, but the main point for me is that it embraces asynchronous, non-blocking processing. To show that behaviour I wanted to use an old-school example: reading files. For this purpose I created a small bash script which creates 10,000 files of 10 KB each in a ./files subdirectory.
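The file generation step can also be sketched in Node itself. This is not the bash script mentioned above, just a minimal equivalent under assumptions: the helper name createFiles, the file-N.txt naming scheme and the filler byte are made up for illustration, and 10 KB is taken as 10 * 1024 bytes.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: create `count` files of `sizeBytes` bytes each in `dir`.
function createFiles(dir, count, sizeBytes) {
  // Create the target directory (and any missing parents) if needed.
  fs.mkdirSync(dir, { recursive: true });
  // Every file gets the same filler content; only the size matters here.
  const content = Buffer.alloc(sizeBytes, 'a');
  for (let i = 0; i < count; i++) {
    fs.writeFileSync(path.join(dir, `file-${i}.txt`), content);
  }
}

// Example call for the setup described above (not executed here):
// createFiles('./files', 10000, 10 * 1024);
```

Writing synchronously is fine for this step because it is a one-off setup and not part of the measurement.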
Next we just need to get the list of files with fs.readdir, iterate over it and read every single file, once synchronously and once asynchronously.
Scripts
Synchronous:
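A minimal sketch of what the synchronous variant could look like, assuming the files live in ./files. The function name readAllSync and the byte counter are illustrative and not taken from the original script.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: list `dir`, then read every file with the blocking
// fs.readFileSync. Calls `done(err, totalBytes)` when finished.
function readAllSync(dir, done) {
  fs.readdir(dir, (err, names) => {
    if (err) return done(err);
    let totalBytes = 0;
    try {
      for (const name of names) {
        // Blocks the event loop until this file is fully read.
        totalBytes += fs.readFileSync(path.join(dir, name)).length;
      }
    } catch (readErr) {
      return done(readErr);
    }
    done(null, totalBytes);
  });
}

// Example call (not executed here):
// readAllSync('./files', (err) => { if (err) throw err; });
```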
Asynchronous:
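A comparable sketch of the asynchronous variant, again with an illustrative function name (readAllAsync) that is not from the original script. All reads are started at once and a counter tracks how many are still pending.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: list `dir`, then start a non-blocking fs.readFile for
// every file. Calls `done(err, totalBytes)` once, after the last read finished.
function readAllAsync(dir, done) {
  fs.readdir(dir, (err, names) => {
    if (err) return done(err);
    if (names.length === 0) return done(null, 0);
    let pending = names.length;
    let totalBytes = 0;
    let failed = false;
    for (const name of names) {
      fs.readFile(path.join(dir, name), (readErr, data) => {
        if (failed) return; // an earlier read already reported an error
        if (readErr) {
          failed = true;
          return done(readErr);
        }
        totalBytes += data.length;
        if (--pending === 0) done(null, totalBytes);
      });
    }
  });
}

// Example call (not executed here):
// readAllAsync('./files', (err) => { if (err) throw err; });
```

Starting thousands of reads at once may run into the open-file limit (EMFILE) on some systems, so a real script might need to cap the number of reads in flight.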
We don’t print anything or do anything else, as that would only add inaccuracy to our benchmark. A single console.log(), for example, could block the whole script, as it is synchronous most of the time.
Attention: Whether console.log() is synchronous or asynchronous depends on the runtime and on where the output goes. In our case we run node in a TTY on Linux, and thanks to the official documentation we know that it is blocking:
The console functions are usually asynchronous unless the destination is a file. Disks are fast and operating systems normally employ write-back caching; it should be a very rare occurrence indeed that a write blocks, but it is possible.
Additionally, console functions are blocking when outputting to TTYs (terminals) on OS X as a workaround for the OS’s very small, 1kb buffer size. This is to prevent interleaving between stdout and stderr.
Benchmark
Now we just run the scripts on our files and measure the time with the Unix built-in command time.
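As a cross-check, the elapsed time can also be measured inside the script. The sketch below is an alternative to the time command, not what was used for the numbers in this post; the helper name timeIt is made up, and it relies on process.hrtime.bigint(), which returns a monotonic timestamp in nanoseconds.

```javascript
// Hypothetical helper: run a callback-style `task` and report how long it took.
// `task` receives a Node-style callback; `done(err, elapsedMs)` gets the result.
function timeIt(task, done) {
  const start = process.hrtime.bigint();
  task((err) => {
    // Convert the nanosecond difference to milliseconds.
    const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
    done(err, elapsedMs);
  });
}

// Example call (not executed here):
// timeIt((cb) => require('fs').readdir('.', cb), (err, ms) => {
//   if (err) throw err;
//   console.log(ms.toFixed(1) + ' ms');
// });
```

Unlike time, this excludes the startup of the node process itself, so the two numbers will not match exactly.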
Huh? Seems strange, doesn’t it? Contrary to our expectation, the blocking fileReadSync seems to be faster than the asynchronous, non-blocking fileRead. But how could that be? First I tried other combinations, such as 10 files of 100 MB each or 100 files of 10 MB each, and a few others, but no matter what, fileReadSync was always as fast as or up to 2 times faster than fileRead. After a few tries I guessed that it had to be because of my SSD. So I installed an old HDD in my system and ran the test again.
It worked just as expected! On the HDD the asynchronous example is faster than the synchronous one. The implication is that the SSD is so fast that the overhead of scheduling the callbacks costs more than simply waiting for the files.
Summary
readFileSync is very fast on a good SSD, so at server startup you probably don’t need to bother with asynchronous loading and all the callback hassle. Just load the files synchronously and use your data! Nevertheless, you should always use the asynchronous readFile when processing data on request, so that you don’t block any other requests. If you have a different opinion on my results or my last recommendation, please comment!