building blocks of Web Workers

This is post # 7 of the series dedicated to exploring JavaScript and its building components. 
In the process of identifying and describing the core elements, we also share some rules of thumb we use when building SessionStack, a lightweight JavaScript application that has to be robust and highly performant to help users see and reproduce their web app defects in real time.
If you missed the previous chapters, you can find them here:
An overview of the engine, the runtime, and the call stack
Inside Google’s V8 engine + 5 tips on how to write optimized code
Memory management + how to handle 4 common memory leaks
The event loop and the rise of Async programming + 5 ways to better coding with async/await
Deep dive into WebSockets and HTTP/2 with SSE + how to pick the right path
How JavaScript works: A comparison with WebAssembly + why in certain cases it’s better to use it over JavaScript

This time we’ll be taking apart Web Workers: we’ll offer an overview, discuss the different types of workers, how their building components come to play together, and what advantages and limitations they offer in different scenarios. 
Finally, we’ll provide 5 use cases in which Web Workers will be the right choice.
You should already be familiar with the fact that JavaScript runs on a single thread as we have discussed it previously in great detail. 
JavaScript, however, gives developers the opportunity to write asynchronous code too.
Limitations of Async programming
We have discussed async programming previously and when it should be used.
Async programming enables your app UI to be responsive, by “scheduling” parts of the code to be executed a bit later in the event loop, thus allowing the UI rendering to be performed first.
A good use case for async programming is making AJAX requests. 
Since requests can take a lot of time, they can be made asynchronously, and while the client is waiting for a response, other code can be executed.

This, however, poses a problem: requests are handled by the browser’s Web API, but how can other code be made asynchronous? For example, what if the code inside the success callback is very CPU-intensive?
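A minimal sketch of the problem, runnable in Node.js or a browser console. The response is simulated with setTimeout, and performCPUIntensiveCalculation is a hypothetical stand-in for any long synchronous computation; a real page would receive its response from fetch() or XMLHttpRequest instead.

```javascript
// Hypothetical stand-in for CPU-bound work: a long, synchronous loop.
function performCPUIntensiveCalculation(iterations) {
  let sum = 0;
  for (let i = 0; i < iterations; i++) {
    sum += Math.sqrt(i);
  }
  return sum;
}

// Success callback: while the loop inside it runs, nothing else on this
// thread (rendering, click handlers, other timers) can make progress.
function onSuccess(response) {
  return performCPUIntensiveCalculation(response.iterations);
}

const events = []; // records the order in which tasks actually ran

// Simulated asynchronous response (assumption: stands in for an AJAX call).
setTimeout(() => {
  events.push('callback start');
  onSuccess({ iterations: 5e6 });
  events.push('callback end');
}, 0);

// Any other task queued at the same time has to wait for the callback.
setTimeout(() => events.push('other task'), 0);
```

The second task cannot start until the callback has returned, which is the freeze a user would notice in the UI.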

If performCPUIntensiveCalculation is not an HTTP request but blocking code (e.g. a huge for loop), there is no way to free up the event loop and unblock the UI of the browser: it will freeze and be unresponsive to the user.
This means that asynchronous functions solve only a small part of the single-thread limitations of the JavaScript language.
In some cases, you can achieve good results in unblocking the UI from longer-running computations by using setTimeout. For example, by batching a complex computation in separate setTimeout calls, you can put them on separate “locations” in the event loop and this way buy time for the UI rendering/responsiveness to be performed.
Let’s take a look at a simple function that calculates the average of a numeric array:
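A minimal sketch of such a function; the name average and the choice to return 0 for an empty array are assumptions made for illustration.

```javascript
// Synchronously computes the arithmetic mean of an array of numbers.
function average(numbers) {
  const len = numbers.length;
  if (len === 0) {
    return 0; // assumption: treat the average of an empty array as 0
  }
  let sum = 0;
  for (let i = 0; i < len; i++) {
    sum += numbers[i];
  }
  return sum / len;
}

console.log(average([1, 2, 3, 4])); // 2.5
```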

This is how you can rewrite the code above and “emulate” asynchronicity:
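One possible shape for that rewrite is sketched here. The names averageAsync and calculateSumAsync are assumptions, and the result is delivered through a callback because it is no longer available synchronously.

```javascript
// Computes the average one element per timer task, so the event loop can
// run other work (rendering, input handling) between the steps.
function averageAsync(numbers, callback) {
  const len = numbers.length;
  let sum = 0;

  if (len === 0) {
    setTimeout(() => callback(0), 0); // assumption: empty input averages to 0
    return;
  }

  function calculateSumAsync(i) {
    if (i < len) {
      // Queue the next step as a separate task instead of looping.
      setTimeout(() => {
        sum += numbers[i];
        calculateSumAsync(i + 1);
      }, 0);
    } else {
      // All elements processed: report the result.
      callback(sum / len);
    }
  }

  calculateSumAsync(0);
}

averageAsync([1, 2, 3, 4], (result) => console.log(result)); // logs 2.5
```

Scheduling one timer per element keeps the sketch short but is slow for large arrays; processing a batch of elements per timer is a common compromise.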

This makes use of the setTimeout function, which schedules each step of the calculation further down the event loop. 
Between steps, the event loop has time to run other work, such as the rendering and input handling needed to keep the browser from freezing.
Web Workers will save the day
HTML5 has brought us lots of great things out of the box, including:
SSE (which we have described and compared to WebSockets in a previous post)
Geolocation
Application cache
Local Storage
Drag and Drop
Web Workers

Web Workers are in-browser threads that can be used to execute JavaScript code without blocking the event loop.
This is truly amazing. 
The whole paradigm of JavaScript is based on the idea of single-threaded environment but here come Web Workers which remove (partially) this limitation.
Web Workers allow developers to put long-running and computationally intensive tasks in the background without blocking the UI, making your app even more responsive. 
What’s more, no tricks with setTimeout are needed in order to hack your way around the event loop.
Here is a simple demo that shows the difference between sorting an array with and without Web Workers.
Overview of Web Workers
Web Workers allow you to do things like firing up long-running scripts to handle computationally intensive tasks, but without blocking the UI. 
In fact, it all takes place in parallel. 
Web Workers are truly multi-threaded.
You might say — “Wasn’t JavaScript a single-threaded language?”.
This should be your ‘aha!’ moment when you realize that JavaScript is a language, which doesn’t define a threading model. 
Web Workers are not part of JavaScript, they’re a browser feature which can be accessed through JavaScript. 
Most browsers have historically been single-threaded (this has, of course, changed), and most JavaScript implementations happen in the browser. 
The browser Web Worker API is not implemented in Node.js, which instead has the “cluster” and “child_process” modules and, in newer versions, “worker_threads”; these work a bit differently.
It’s worth noting that the specification mentions three types of Web Workers:
Dedicated Workers
Shared Workers
Service workers

Dedicated Workers
Dedicated Web Workers are instantiated by the main process and can only communicate with it.


Dedicated Workers browser support
Shared Workers
Shared workers can be reached by all processes running on the same origin (different browser tabs, iframes or other shared workers).



Shared Workers browser support
Service Workers
A Service Worker is an event-driven worker registered against an origin and a path. 
It can control the web page/site it is associated with, intercepting and modifying the navigation and resource requests, and caching resources in a very granular fashion to give you great control over how your app behaves in certain situations (e.g. when the network is not available).


Service Workers browser support
In this post, we’ll focus on Dedicated Workers and refer to them as “Web Workers” or “Workers”.
How Web Workers work
Web Workers are implemented as .js files which are included via asynchronous HTTP requests in your page. 
These requests are completely hidden from you by the Web Worker API.
Workers utilize thread-like message passing to achieve parallelism. 
They’re perfect for keeping your UI up-to-date, performant, and responsive for users.
Web Workers run in an isolated thread in the browser. 
As a result, the code that they execute needs to be contained in a separate file. 
That’s very important to remember.
Let’s see how a basic worker is created:

If the “task.js” file exists and is accessible, the browser will spawn a new thread which downloads the file asynchronously. 
Right after the download is completed, it will be executed and the worker will begin.
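If the provided path returns a 404, no exception is thrown in your page; instead, an error event is dispatched on the Worker object, so the failure goes unnoticed unless you listen for it.
The worker script starts running as soon as it has loaded; to hand it work, you send it a message with the postMessage method: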
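A minimal sketch of creating a worker and sending it a first message. The helper name startTask and the 'start' payload are assumptions; the feature check lets the snippet load harmlessly where the Web Worker API does not exist.

```javascript
// Creates a worker for the given script and posts an initial message to it.
// Returns null where the Web Worker API is unavailable.
function startTask(scriptUrl, payload) {
  if (typeof Worker === 'undefined') {
    return null;
  }
  const worker = new Worker(scriptUrl);

  // Load failures and uncaught exceptions inside the worker are reported
  // through the error event on the Worker object.
  worker.addEventListener('error', (e) => {
    console.error('Worker error:', e.message);
  });

  worker.postMessage(payload);
  return worker;
}

const worker = startTask('task.js', 'start');
```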

Web Worker communication
In order to communicate between a Web Worker and the page that created it, you need to use the postMessage method or a Broadcast Channel.
The postMessage method
Modern browsers accept any value that the structured clone algorithm can handle (objects, arrays, typed arrays, and so on) as the first parameter to the method, while very old browsers supported just a string.
Let’s see an example of how the page that creates a worker can communicate back and forth with it, by passing a JSON object as a more “complicated” example. 
Passing a string is quite the same.
Let’s take a look at the following HTML page (or part of it to be more precise):
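A sketch of the script such a page might contain. The element ids startButton and result, the file name worker.js, and the shape of the reply (an object with a result field) are all assumptions made for illustration.

```javascript
// Builds the message sent to the worker: a command name plus its payload.
function buildMessage(cmd, data) {
  return { cmd, data };
}

// Browser-only wiring; skipped where there is no DOM or Worker support.
if (typeof document !== 'undefined' && typeof Worker !== 'undefined') {
  const worker = new Worker('worker.js'); // hypothetical file name
  const button = document.getElementById('startButton'); // hypothetical id
  const output = document.getElementById('result'); // hypothetical id

  // Show whatever the worker sends back.
  worker.addEventListener('message', (e) => {
    output.textContent = String(e.data.result);
  });

  // Kick off the computation when the button is clicked.
  button.addEventListener('click', () => {
    worker.postMessage(buildMessage('average', [1, 2, 3, 4, 5]));
  });
}
```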

And this is what our worker script will look like:
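A sketch of a matching worker script. The command name average and the reply shape are assumptions; checking for importScripts is one common way to detect a worker scope, so the handler is only attached when the file really runs inside a worker.

```javascript
// Pure computation: arithmetic mean of an array of numbers.
function calculateAverage(numbers) {
  if (numbers.length === 0) {
    return 0;
  }
  let sum = 0;
  for (const n of numbers) {
    sum += n;
  }
  return sum / numbers.length;
}

// Maps an incoming { cmd, data } message to the reply sent back to the page.
function handleCommand(message) {
  switch (message.cmd) {
    case 'average':
      return { cmd: 'average', result: calculateAverage(message.data) };
    default:
      return { cmd: message.cmd, error: 'Unknown command' };
  }
}

// Attach the message handler only inside a worker global scope.
if (typeof importScripts === 'function') {
  self.addEventListener('message', (e) => {
    self.postMessage(handleCommand(e.data));
  });
}
```

Keeping the computation in plain functions means it can be exercised outside a worker as well.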

When the button is clicked, postMessage will be called from the main page. 
The worker.postMessage line passes the JSON object to the worker, adding cmd and data keys with their respective values. 
The worker will handle that message through the defined message handler.
When the message arrives, the actual computing is being performed in the worker, without blocking the event loop. 
The worker inspects the passed event e and runs its handler just like a standard JavaScript function. 
When it’s done, the result is passed back to the main page.
In the context of a worker, both the self and this reference the global scope for the worker.
There are two ways to stop a worker: by calling worker.terminate() from the main page or by calling self.close() inside of the worker itself.
Broadcast Channel
The Broadcast Channel is a more general API for communication. 
It lets us broadcast messages to all contexts sharing the same origin. 
All browser tabs, iframes, or workers served from the same origin can emit and receive messages:
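A minimal sketch of the API. For brevity both ends live in the same context here, whereas in practice the sender and receiver would typically sit in different tabs, iframes, or workers. The channel name test_channel is arbitrary, and the feature check is an assumption added so the snippet does nothing where BroadcastChannel is missing.

```javascript
const received = []; // messages seen by the receiving end

if (typeof BroadcastChannel !== 'undefined') {
  // Every context that opens a channel with the same name joins it.
  const sender = new BroadcastChannel('test_channel');
  const receiver = new BroadcastChannel('test_channel');

  receiver.onmessage = (e) => {
    received.push(e.data);
    console.log('Received:', e.data);
    // Close both ends once done so they stop listening.
    sender.close();
    receiver.close();
  };

  // Delivered to every other channel object with this name,
  // but not back to the sender itself.
  sender.postMessage('This is a test message.');
}
```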

And visually, you can see what Broadcast Channels look like to make it more clear:


When this post was written, Broadcast Channel had more limited browser support, though it has since shipped in all major browsers.


The size of messages
There are 2 ways to send messages to Web Workers:
Copying the message: the message is serialized, copied, sent over, and then de-serialized at the other end. 
The page and worker do not share the same instance, so the end result is that a duplicate is created on each pass. 
Browsers implement this with the structured clone algorithm, which serializes the value at one end and deserializes it at the other. 
As expected, these data operations add significant overhead to the message transmission. 
The bigger the message, the longer it takes to be sent.
Transferring the message: this means that the original sender can no longer use it once sent. 
Transferring data is almost instantaneous. 
The limitation is that only transferable objects, such as ArrayBuffer, MessagePort and ImageBitmap, can be sent this way.
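The difference between the two modes can be sketched without a worker by using the global structuredClone function (available in current browsers and Node.js 17+), which applies the same cloning rules as postMessage. The variable names are illustrative.

```javascript
// Copying: the receiver gets an independent duplicate.
const original = { samples: [1, 2, 3] };
const copy = structuredClone(original);
copy.samples.push(4);
console.log(original.samples.length); // 3, the original is untouched

// Transferring: ownership of the memory moves, nothing is duplicated.
const buffer = new ArrayBuffer(1024);
const moved = structuredClone(buffer, { transfer: [buffer] });
console.log(buffer.byteLength); // 0, the sender's buffer is now detached
console.log(moved.byteLength); // 1024

// With a worker, the equivalent calls would be:
//   worker.postMessage(original);          // copy
//   worker.postMessage(buffer, [buffer]);  // transfer
```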

Features available to Web Workers
Web Workers have access only to a subset of JavaScript features due to their multi-threaded nature. 
Here’s the list of features:
The navigator object
The location object (read-only)
XMLHttpRequest
setTimeout()/clearTimeout() and setInterval()/clearInterval()
The Application Cache
Importing external scripts using importScripts()
Creating other web workers

Web Worker limitations
Sadly, Web Workers don’t have access to some very crucial JavaScript features:
The DOM (it’s not thread-safe)
The window object
The document object
The parent object

This means that a Web Worker can’t manipulate the DOM (and thus the UI). 
It can be tricky at times, but once you learn how to properly use Web Workers, you’ll start using them as separate “computing machines” while all the UI changes will take place in your page code. 
The Workers will do all the heavy lifting for you and once the jobs are done, you’ll pass the results to the page which makes the necessary changes to the UI.
Handling errors
As with any JavaScript code, you’ll want to handle any errors that are thrown in your Web Workers. 
If an error occurs while a worker is executing, an error event (an ErrorEvent) is fired on the Worker object. 
The interface contains three useful properties for figuring out what went wrong:
filename - the name of the worker script that caused the error
lineno - the line number where the error occurred
message - a description of the error

This is an example:
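A sketch of what such a listener could look like. The file name workerWithError.js and the handler name onError follow the surrounding text; the message format and the 'go' payload are assumptions.

```javascript
// Formats the three ErrorEvent properties described above.
function describeError(e) {
  return 'ERROR: Line ' + e.lineno + ' in ' + e.filename + ': ' + e.message;
}

function onError(e) {
  console.error(describeError(e));
}

// Browser-only wiring; skipped where the Web Worker API is unavailable.
if (typeof Worker !== 'undefined' && typeof document !== 'undefined') {
  const worker = new Worker('workerWithError.js');
  worker.addEventListener('error', onError, false);
  worker.postMessage('go'); // hypothetical payload that triggers the work
}

// workerWithError.js (separate file) could contain:
//   self.addEventListener('message', function () {
//     postMessage(x * 2); // x is not defined, so this throws
//   });
```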

Here, you can see that we created a worker and started listening for the error event.
Inside the worker (in workerWithError.js) we create an intentional exception by multiplying x by 2 while x is not defined in that scope. 
The exception is propagated to the initial script, and onError is invoked with information about the error.
Good use cases for Web Workers
So far we’ve listed the strengths and limitations of Web Workers. 
Let’s now look at the strongest use cases for them:
Ray tracing: ray tracing is a rendering technique for generating an image by tracing the path of light as pixels. 
Ray tracing uses very CPU-intensive mathematical computations in order to simulate the path of light. 
The idea is to simulate some effects like reflection, refraction, materials, etc. 
All this computational logic can be added to a Web Worker to avoid blocking the UI thread. 
Even better — you can easily split the image rendering between several workers (and respectively between several CPUs). 
Here is a simple demo of ray tracing using Web Workers — https://nerget.com/rayjs-mt/rayjs.html.
Encryption: end-to-end encryption is getting more and more popular due to the increasing rigorousness of regulations on personal and sensitive data. 
Encryption can be quite time-consuming, especially if there’s a lot of data that has to be frequently encrypted (before sending it to the server, for example). 
This is a very good scenario in which a Web Worker can be used since it doesn’t require any access to the DOM or anything fancy — it’s pure algorithms doing their job. 
Once in the worker, it is seamless to the end user and doesn’t impact their experience.
Prefetching data: in order to optimize your website or web application and improve data loading time, you can leverage Web Workers to load and store some data in advance so that you can use it later when needed. 
Web Workers are amazing in this case because they won’t impact your app’s UI, unlike when this is done without workers.
Progressive Web Apps: they have to load quickly even when the network connection is shaky. 
This means that data has to be stored locally in the browser. 
This is where IndexedDB or similar APIs come into play. 
Basically, client-side storage is needed. 
In order to be used without blocking the UI thread, the work has to be done in Web Workers. 
In the case of IndexedDB, there is an asynchronous API that allows you to do this even without workers, but there was also a synchronous API before (it might be introduced again) which should only be used inside workers.
Spell checking: a basic spell checker works in the following way — the program reads a dictionary file with a list of correctly spelled words. 
The dictionary is being parsed as a search tree to make the actual text search-efficient. 
When a word is provided to the checker, the program checks whether it exists in the pre-built search tree. 
If the word is not found in the tree, the checker can offer alternate spellings by substituting characters and testing whether each result is a valid word that the user may have intended. 
All this processing can easily be offloaded to a Web Worker so that the user can just type words and sentences without any blocking of the UI, while the worker performs all the searching and providing of suggestions.

Performance and reliability are very critical for us at SessionStack. 
The reason why they’re so important is that once SessionStack is integrated into your web app, it starts recording everything from DOM changes and user interaction to network requests, unhandled exceptions and debug messages. 
All this data is transmitted to our servers in real-time which allows you to replay issues from your web apps as videos and see everything that happened to your users. 
This all takes place with minimum latency and no performance overhead for your app.
This is why we’re offloading (wherever it makes sense) logic from both our monitoring library and our player to Web Workers that are handling very CPU-intensive tasks like hashing to validate data integrity, rendering, etc.
Web technologies constantly change and develop, so we go the extra mile to ensure SessionStack is very lightweight and has zero performance impact on our users’ apps.
There is a free plan if you’d like to give SessionStack a try.

Using worker_threads in Node.js
This is a beginner’s guide to using worker_threads in Node.js. 

What are they good for?
As the documentation says:

Workers are useful for performing CPU-intensive JavaScript operations; do not use them for I/O, since Node.js’s built-in mechanisms for performing operations asynchronously already treat it more efficiently than Worker threads can.

worker_threads are more lightweight than the parallelism you can get using child_process or cluster. 
Additionally, worker_threads can share memory efficiently.

Hello world!
In the following example, Worker is the constructor for a worker. 
It requires an argument which is the path to a file containing the code for the worker to execute. 
In this case, we send it __filename so that the code that launches the worker and the code for the worker itself are in the same file. 
The constructor also takes an optional second options argument, but we do not use it here.

To differentiate whether we are in the main thread (which will launch the worker) or the worker itself, we use isMainThread. 
It is true if we are in the main thread (that is, not in a worker) and false if we are in a worker.

Once the worker is created, we listen (in the main thread) for the message event on the worker and use console.log() to print whatever is sent.

Lastly, to send a message from the worker to the main thread, we use parentPort.postMessage().


Put this code in a file called threads-example.js, invoke it with node (and the --experimental-worker flag if you are not running Node.js 11.7.0 or newer), and the output should be Hello world!.

$ node threads-example.js 
Hello world!
$

Calculating Primes
Now let’s do something a little more interesting. 
Without worker_threads, here is how you might calculate all the prime numbers less than 10,000,000:
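A simplified single-threaded sketch. It uses plain trial division rather than the ported implementation mentioned in the text, and a smaller upper bound (100,000 instead of 10,000,000) so that it finishes quickly; both are assumptions made for illustration.

```javascript
const min = 2;
const max = 1e5; // assumption: raise toward 1e7 for a heavier workload

// Returns every prime n with start <= n < start + range (trial division).
function generatePrimes(start, range) {
  const found = [];
  const end = start + range;
  for (let n = start; n < end; n++) {
    let isPrime = n >= 2;
    for (let d = 2; d * d <= n; d++) {
      if (n % d === 0) {
        isPrime = false;
        break;
      }
    }
    if (isPrime) {
      found.push(n);
    }
  }
  return found;
}

const primes = generatePrimes(min, max - min);
console.log('Found ' + primes.length + ' primes below ' + max);
```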


The code above was ported from a C# implementation at https://stackoverflow.com/a/34429272/436641. 
The important thing to know is that the generatePrimes() function does CPU-intensive work.

On my laptop, running the code above with the time utility reports results along the lines of:

real 0m17.209s
user 0m15.589s
sys 0m0.242s

Now let’s see what happens with worker_threads:


The new concept here is workerData. 
You set a value for the workerData option when invoking the Worker constructor. 
The value of workerData is cloned and available in the worker thread as require('worker_threads').workerData.

Running this script and passing it 2 on the command line (as the number of threads to use) yields far better performance than the single-threaded version:

real 0m7.881s
user 0m12.832s
sys 0m0.162s

With Node.js 11.7.0, you no longer need the --experimental-worker flag to use the worker_threads module. 
So it’s even easier to use worker_threads than it was when I wrote that first article.

Robert left a comment requesting more sophisticated real-world examples. 
I’m not sure this is more sophisticated than the prime number example in the previous article. 
But it is more of a real-world example!

I used to have a website that would solve Six Degrees Of Kevin Bacon queries, but for music rather than movies. 
Give it two musicians, and it would tell you how to connect them. 
It did this based on who recorded with whom on particular tracks. 
One thing I discovered is that there wasn’t a good source for track-by-track recording data. 
So I had to curate my own. 
A data dump is at https://github.com/Trott/music-routes-data.

Let’s say you wanted to connect Carrie Brownstein to Michael Jackson:

Carrie Brownstein played on Wild Flag’s “Black Tiles” with Janet Weiss.
Janet Weiss played on Bright Eyes’ “Clairaudients (Kill Or Be Killed)” with Nate Walcott.
Nate Walcott played on Pete Yorn’s “Social Development Dance” with Joey Waronker
Joey Waronker played on Paul McCartney’s “A Certain Softness” with Paul McCartney
Paul McCartney played on Michael Jackson’s “The Girl Is Mine” with Michael Jackson

The data is a simple undirected and unweighted graph. 
The algorithm is a breadth first search from each endpoint. 
Find all people one step away from Carrie Brownstein. 
Are any of them Michael Jackson? If not, find all people one step away from Michael Jackson. 
Are any of them also one step away from Carrie Brownstein? If not, find all people two steps away from Carrie Brownstein. 
If no overlap yet, find all people two steps away from Michael Jackson. 
And repeat until you find someone that is in both sets of connections.
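The alternating search described above can be sketched as follows. This is an illustrative single-threaded version over a tiny hand-built graph that uses the musicians from the example; the function names and the Map-based adjacency list are assumptions, and the sketch returns a connecting path without promising it is the shortest one.

```javascript
// Adds an undirected edge between two people.
function connect(graph, a, b) {
  if (!graph.has(a)) graph.set(a, []);
  if (!graph.has(b)) graph.set(b, []);
  graph.get(a).push(b);
  graph.get(b).push(a);
}

// Walks the parent links from the meeting point back to both endpoints.
function buildPath(meeting, fromSource, fromTarget) {
  const path = [];
  for (let p = meeting; p !== null; p = fromSource.get(p)) path.unshift(p);
  for (let p = fromTarget.get(meeting); p !== null; p = fromTarget.get(p)) path.push(p);
  return path;
}

// Breadth-first search from both endpoints, one level at a time, alternating.
function findConnection(graph, source, target) {
  if (source === target) return [source];
  const fromSource = new Map([[source, null]]); // person -> who reached them
  const fromTarget = new Map([[target, null]]);
  let sourceFrontier = [source];
  let targetFrontier = [target];
  let expandSource = true;

  while (sourceFrontier.length > 0 && targetFrontier.length > 0) {
    const frontier = expandSource ? sourceFrontier : targetFrontier;
    const seen = expandSource ? fromSource : fromTarget;
    const otherSide = expandSource ? fromTarget : fromSource;
    const next = [];
    for (const person of frontier) {
      for (const neighbor of graph.get(person) || []) {
        if (seen.has(neighbor)) continue;
        seen.set(neighbor, person);
        // Someone reached from both ends: the two searches have met.
        if (otherSide.has(neighbor)) return buildPath(neighbor, fromSource, fromTarget);
        next.push(neighbor);
      }
    }
    if (expandSource) sourceFrontier = next; else targetFrontier = next;
    expandSource = !expandSource;
  }
  return null; // no connection exists
}

const graph = new Map();
connect(graph, 'Carrie Brownstein', 'Janet Weiss');
connect(graph, 'Janet Weiss', 'Nate Walcott');
connect(graph, 'Nate Walcott', 'Joey Waronker');
connect(graph, 'Joey Waronker', 'Paul McCartney');
connect(graph, 'Paul McCartney', 'Michael Jackson');

console.log(findConnection(graph, 'Carrie Brownstein', 'Michael Jackson').join(' -> '));
```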

I wrote some code to do this:
Unfortunately, it’s single-threaded. 
This means it will be slow:

$ node index.js
search duration: 4144.005ms
Carrie Brownstein played on "Racehorse" with Janet Weiss
Janet Weiss played on "Clairaudients (Kill Or Be Killed)" with Mike Mogis
Mike Mogis played on "Social Development Dance" with Joey Waronker
Joey Waronker played on "A Certain Softness" with Paul McCartney
Paul McCartney played on "The Girl Is Mine" with Michael Jackson
$ 

4 seconds isn’t too bad, I suppose. 
But it can get a lot longer:

$ node index.js 8876 8992
search duration: 27410.035ms
Derek Holt played on "She's Falling Apart" with Dweezil Zappa
Dweezil Zappa played on "Trouble Every Day" with Steve Vai
Steve Vai played on "Fishing" with John Lydon
John Lydon played on "Bad Baby" with Martin Atkins
Martin Atkins played on "The Bushmaster" with David Yow
David Yow played on "Seasick" with David Wm. Sims
David Wm. Sims played on "Soul Machine" with Jim Kimball
Jim Kimball played on "Now I Agree" with Drew Thomas
Drew Thomas played on "Running Into Walls" with Tony Bono
Tony Bono played on "Insult To Injury" with Joe Cangelosi
Joe Cangelosi played on "Isolation" with Mille Petrozza
Mille Petrozza played on "World Beyond" with Rob Fioretti
$ 

Climax Blues Band guitarist Derek Holt to German thrash-metal bassist Rob Fioretti? Over 27 seconds! Unacceptable!

To the rescue, worker_threads!

Here’s the main thread code:

And here’s the worker thread code:

As far as worker_threads go, there isn’t anything that wasn’t covered in the previous article. 
Things do get a bit more complicated with the messaging. 
But there’s nothing new. 
Let’s see if we’ve improved performance.

$ node main.js
search duration: 724.048ms
Carrie Brownstein played on "Glass Tambourine" with Mary Timony
Mary Timony played on "All Dressed Up In Dreams" with Stephin Merritt
Stephin Merritt played on "The Dead Only Quickly" with Neil Hannon
Neil Hannon played on "Do They Know It's Christmas?" with Paul McCartney
Paul McCartney played on "The Girl Is Mine" with Michael Jackson
$ 

Nice! From over 4 seconds to about 724ms. 
Let’s see how we do on the 27-second route!

$ node main.js 8876 8992
search duration: 2767.005ms
Derek Holt played on "She's Falling Apart" with Dweezil Zappa
Dweezil Zappa played on "Smoke On The Water" with Steve Madaio
Steve Madaio played on "All By Myself" with Jim Keltner
Jim Keltner played on "Couldn't Call It Unexpected No. 4" with Marc Ribot
Marc Ribot played on "Bridge To The Beyond" with Mike Patton
Mike Patton played on "When The Stars Begin To Fall" with Duane Denison
Duane Denison played on "Soul Machine" with Jim Kimball
Jim Kimball played on "Monday's Highs" with Drew Thomas
Drew Thomas played on "William" with Tony Bono
Tony Bono played on "Battle Scars" with Joe Cangelosi
Joe Cangelosi played on "Prevail" with Frank Blackfire
Frank Blackfire played on "World Beyond" with Rob Fioretti
$ 

Whoa! That’s a roughly tenfold improvement! From over 27 seconds to under 3 seconds! And we’re only using two threads!

This may not quite be the complex example sought by the commenter in the last article. 
But I hope it is a step in that direction. 
And there’s always the possibility of a Part 3….