nodejs notes

Must Watch!
nodejs sqlite tutorial
real time chat application in Node.js using Express, Mongoose and Socket.io Web APIs Node Multi Client Chat


Must Watch

Learn JavaScript With This ONE Project! Node Debugger Trick audio speaker Zombie.js Puppeteer nodejs Knowledge Base


Beginner’s Guide to Server-Side Web Development with Node.js nodejs communicate between two servers

nodejs tutorials

♦nodejs ♦Node Getting Started ♦NodejsSQL Node.js Tutorial Online Node Terminal ♦Node Command-Line Programs ♦nodejsList nodejs in 3 minutes Node.js Tutorial For Absolute Beginners ♦Getting Started with Node.js Sublime Text Nodejs ♦nodejs_mongodb

Usage:

node [options] [V8 options] [script.js | -e "script" | -] [--] [arguments]
node inspect [script.js | -e "script" | <host>:<port>] …
node --v8-options

Next, create a new source file in the projects folder and call it hello-world.js. Open hello-world.js and paste in the following content:

const http = require('http');

const hostname = '127.0.0.1';
const port = 3000;

const server = http.createServer((req, res) => {
  res.statusCode = 200;
  res.setHeader('Content-Type', 'text/plain');
  res.end('Hello, World!\n');
});

server.listen(port, hostname, () => {
  console.log(`Server running at http://${hostname}:${port}/`);
});

Save the file, go back to the terminal window, and enter the following command:

$ node hello-world.js

Now, open any preferred web browser and visit http://127.0.0.1:3000.

XMLHttpRequest

Example Using the XMLHttpRequest Object
The basics
Response Type
Ready states
Aborting request
Synchronous requests
HTTP-headers
POST, FormData
Upload progress
Cross-origin requests
Summary




Example Using the XMLHttpRequest Object

function loadXMLDoc() {
  var xhttp = new XMLHttpRequest();
  xhttp.onreadystatechange = function() {
    if (this.readyState == 4 && this.status == 200) {
      document.getElementById("demo").innerHTML = this.responseText;
    }
  };
  xhttp.open("GET", "xmlhttp_info.txt", true);
  xhttp.send();
}

XMLHttpRequest is a built-in browser object that allows us to make HTTP requests in JavaScript. Despite having the word “XML” in its name, it can operate on any data, not only XML. We can upload/download files, track progress and much more. There is now another, more modern method, fetch, that somewhat deprecates XMLHttpRequest.

A Fetch API example:

let file = "fetch_info.txt";
fetch(file)
  .then(x => x.text())
  .then(y => document.getElementById("demo").innerHTML = y);

Since Fetch is based on async and await, the example above might be easier to understand like this:

getText("fetch_info.txt");

async function getText(file) {
  let x = await fetch(file);
  let y = await x.text();
  document.getElementById("demo").innerHTML = y;
}

In modern web development XMLHttpRequest is used for three reasons:

Historical reasons: we need to support existing scripts with XMLHttpRequest.
We need to support old browsers, and don’t want polyfills (e.g. to keep scripts tiny).
We need something that fetch can’t do yet, e.g. tracking upload progress.

Does that sound familiar? If yes, then all right, go on with XMLHttpRequest. Otherwise, please head on to Fetch.

The basics

XMLHttpRequest has two modes of operation: synchronous and asynchronous. Let’s see the asynchronous mode first, as it’s used in the majority of cases.

To do the request, we need 3 steps:

1. Create XMLHttpRequest:

let xhr = new XMLHttpRequest();

The constructor has no arguments.

2. Initialize it, usually right after new XMLHttpRequest:

xhr.open(method, URL, [async, user, password])

This method specifies the main parameters of the request:

method – HTTP method. Usually "GET" or "POST".
URL – the URL to request, a string, can be a URL object.
async – if explicitly set to false, then the request is synchronous; we’ll cover that a bit later.
user, password – login and password for basic HTTP auth (if required).

Please note that the open call, contrary to its name, does not open the connection. It only configures the request; the network activity only starts with the call of send.

3. Send it out:

xhr.send([body])

This method opens the connection and sends the request to the server. The optional body parameter contains the request body. Some request methods like GET do not have a body. And some of them like POST use body to send the data to the server. We’ll see examples of that later.

Then listen to xhr events for the response. These three events are the most widely used:

load – when the request is complete (even if the HTTP status is 400 or 500), and the response is fully downloaded.
error – when the request couldn’t be made, e.g. network down or invalid URL.
progress – triggers periodically while the response is being downloaded, reports how much has been downloaded.

xhr.onload = function() {
  alert(`Loaded: ${xhr.status} ${xhr.response}`);
};

xhr.onerror = function() { // only triggers if the request couldn't be made at all
  alert(`Network Error`);
};

xhr.onprogress = function(event) { // triggers periodically
  // event.loaded - how many bytes downloaded
  // event.lengthComputable = true if the server sent Content-Length header
  // event.total - total number of bytes (if lengthComputable)
  alert(`Received ${event.loaded} of ${event.total}`);
};

Here’s a full example. The code below loads the URL at /article/xmlhttprequest/example/load from the server and prints the progress:

// 1. Create a new XMLHttpRequest object
let xhr = new XMLHttpRequest();

// 2. Configure it: GET-request for the URL /article/.../load
xhr.open('GET', '/article/xmlhttprequest/example/load');

// 3. Send the request over the network
xhr.send();

// 4. This will be called after the response is received
xhr.onload = function() {
  if (xhr.status != 200) { // analyze HTTP status of the response
    alert(`Error ${xhr.status}: ${xhr.statusText}`); // e.g. 404: Not Found
  } else { // show the result
    alert(`Done, got ${xhr.response.length} bytes`); // response is the server response
  }
};

xhr.onprogress = function(event) {
  if (event.lengthComputable) {
    alert(`Received ${event.loaded} of ${event.total} bytes`);
  } else {
    alert(`Received ${event.loaded} bytes`); // no Content-Length
  }
};

xhr.onerror = function() {
  alert("Request failed");
};

Once the server has responded, we can receive the result in the following xhr properties:
status
HTTP status code (a number): 200, 404, 403 and so on, can be 0 in case of a non-HTTP failure.
statusText
HTTP status message (a string): usually OK for 200, Not Found for 404, Forbidden for 403 and so on.
response (old scripts may use responseText)
The server response body.
We can also specify a timeout using the corresponding property:

xhr.timeout = 10000; // timeout in ms, 10 seconds

If the request does not succeed within the given time, it gets canceled and the timeout event triggers.

URL search parameters: to add parameters to the URL, like ?name=value, and ensure the proper encoding, we can use a URL object:

let url = new URL('https://google.com/search');
url.searchParams.set('q', 'test me!'); // the parameter 'q' is encoded

xhr.open('GET', url); // https://google.com/search?q=test+me%21

Response Type

We can use the xhr.responseType property to set the response format:

"" (default) – get as string,
"text" – get as string,
"arraybuffer" – get as ArrayBuffer (for binary data, see chapter ArrayBuffer, binary arrays),
"blob" – get as Blob (for binary data, see chapter Blob),
"document" – get as XML document (can use XPath and other XML methods) or HTML document (based on the MIME type of the received data),
"json" – get as JSON (parsed automatically).

For example, let’s get the response as JSON:

let xhr = new XMLHttpRequest();

xhr.open('GET', '/article/xmlhttprequest/example/json');

xhr.responseType = 'json';

xhr.send();

// the response is {"message": "Hello, world!"}
xhr.onload = function() {
  let responseObj = xhr.response;
  alert(responseObj.message); // Hello, world!
};

Please note: in old scripts you may also find xhr.responseText and even xhr.responseXML properties. They exist for historical reasons, to get either a string or an XML document. Nowadays, we should set the format in xhr.responseType and get xhr.response as demonstrated above.

Ready states

XMLHttpRequest changes between states as it progresses. The current state is accessible as xhr.readyState. All states, as in the specification:

UNSENT = 0;           // initial state
OPENED = 1;           // open called
HEADERS_RECEIVED = 2; // response headers received
LOADING = 3;          // response is loading (a data packet is received)
DONE = 4;             // request complete

An XMLHttpRequest object travels them in the order 0 → 1 → 2 → 3 → … → 3 → 4. State 3 repeats every time a data packet is received over the network.

We can track them using the readystatechange event:

xhr.onreadystatechange = function() {
  if (xhr.readyState == 3) {
    // loading
  }
  if (xhr.readyState == 4) {
    // request finished
  }
};

You can find readystatechange listeners in really old code; they’re there for historical reasons, as there was a time when there were no load and other events. Nowadays, load/error/progress handlers deprecate it.

Aborting request

We can terminate the request at any time. The call to xhr.abort() does that:

xhr.abort(); // terminate the request

That triggers the abort event, and xhr.status becomes 0.
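
For illustration, here is a minimal sketch of aborting a slow request from a timer (the /slow-endpoint URL is hypothetical):

let xhr = new XMLHttpRequest();
xhr.open('GET', '/slow-endpoint'); // hypothetical URL

xhr.onabort = () => alert('Request was aborted');

xhr.send();

// give up if the request takes longer than 5 seconds
setTimeout(() => xhr.abort(), 5000);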

Synchronous requests

If in the open method the third parameter async is set to false, the request is made synchronously. In other words, JavaScript execution pauses at send() and resumes when the response is received, somewhat like the alert or prompt commands.

Here’s the rewritten example; the 3rd parameter of open is false:

let xhr = new XMLHttpRequest();

xhr.open('GET', '/article/xmlhttprequest/hello.txt', false);

try {
  xhr.send();
  if (xhr.status != 200) {
    alert(`Error ${xhr.status}: ${xhr.statusText}`);
  } else {
    alert(xhr.response);
  }
} catch(err) { // instead of onerror
  alert("Request failed");
}

It might look good, but synchronous calls are used rarely, because they block in-page JavaScript till the loading is complete. In some browsers it becomes impossible to scroll. If a synchronous call takes too much time, the browser may suggest closing the “hanging” webpage.

Many advanced capabilities of XMLHttpRequest, like requesting from another domain or specifying a timeout, are unavailable for synchronous requests. Also, as you can see, there is no progress indication. Because of all that, synchronous requests are used very sparingly, almost never. We won’t talk about them any more.

HTTP-headers

XMLHttpRequest allows us both to send custom headers and to read headers from the response. There are 3 methods for HTTP headers:
setRequestHeader(name, value)
Sets the request header with the given name and value. For instance:

xhr.setRequestHeader('Content-Type', 'application/json');

Headers limitations: several headers are managed exclusively by the browser, e.g. Referer and Host. The full list is in the specification. XMLHttpRequest is not allowed to change them, for the sake of user safety and correctness of the request.

Can’t remove a header: another peculiarity of XMLHttpRequest is that one can’t undo setRequestHeader. Once the header is set, it’s set. Additional calls add information to the header; they don’t overwrite it. For instance:

xhr.setRequestHeader('X-Auth', '123');
xhr.setRequestHeader('X-Auth', '456');

// the header will be:
// X-Auth: 123, 456
getResponseHeader(name)
Gets the response header with the given name (except Set-Cookie and Set-Cookie2). For instance: xhr.getResponseHeader('Content-Type')
getAllResponseHeaders()
Returns all response headers, except Set-Cookie and Set-Cookie2. Headers are returned as a single line, e.g.:

Cache-Control: max-age=31536000
Content-Length: 4260
Content-Type: image/png
Date: Sat, 08 Sep 2012 16:53:16 GMT

The line break between headers is always "\r\n" (it doesn’t depend on the OS), so we can easily split the result into individual headers. The separator between the name and the value is always a colon followed by a space, ": ". That’s fixed in the specification.

So, if we want to get an object with name/value pairs, we need to throw in a bit of JS. Like this (assuming that if two headers have the same name, then the latter overwrites the former):

let headers = xhr
  .getAllResponseHeaders()
  .split('\r\n')
  .reduce((result, current) => {
    let [name, value] = current.split(': ');
    result[name] = value;
    return result;
  }, {});

// headers['Content-Type'] = 'image/png'

POST, FormData

To make a POST request, we can use the built-in FormData object. The syntax:

let formData = new FormData([form]); // creates an object, optionally fill from <form>
formData.append(name, value);        // appends a field

We create it, optionally fill it from a form, append more fields if needed, and then:

xhr.open('POST', ...) – use the POST method.
xhr.send(formData) – submit the form to the server.

For instance:

<form name="person">
  <input name="name" value="John">
  <input name="surname" value="Smith">
</form>

<script>
  // pre-fill FormData from the form
  let formData = new FormData(document.forms.person);

  // add one more field
  formData.append("middle", "Lee");

  // send it out
  let xhr = new XMLHttpRequest();
  xhr.open("POST", "/article/xmlhttprequest/post/user");
  xhr.send(formData);

  xhr.onload = () => alert(xhr.response);
</script>

The form is sent with multipart/form-data encoding. Or, if we like JSON more, then JSON.stringify and send as a string. Just don’t forget to set the header Content-Type: application/json; many server-side frameworks automatically decode JSON with it:

let xhr = new XMLHttpRequest();

let json = JSON.stringify({
  name: "John",
  surname: "Smith"
});

xhr.open("POST", '/submit');
xhr.setRequestHeader('Content-type', 'application/json; charset=utf-8');
xhr.send(json);

The .send(body) method is quite omnivorous. It can send almost any body, including Blob and BufferSource objects.

Upload progress

The progress event triggers only on the downloading stage. That is: if we POST something, XMLHttpRequest first uploads our data (the request body), then downloads the response. If we’re uploading something big, then we’re surely more interested in tracking the upload progress. But xhr.onprogress doesn’t help here.

There’s another object, without methods, exclusively to track upload events: xhr.upload. It generates events similar to xhr, but xhr.upload triggers them solely on uploading:

loadstart – upload started.
progress – triggers periodically during the upload.
abort – upload aborted.
error – non-HTTP error.
load – upload finished successfully.
timeout – upload timed out (if the timeout property is set).
loadend – upload finished with either success or error.

Example of handlers:

xhr.upload.onprogress = function(event) {
  alert(`Uploaded ${event.loaded} of ${event.total} bytes`);
};

xhr.upload.onload = function() {
  alert(`Upload finished successfully.`);
};

xhr.upload.onerror = function() {
  alert(`Error during the upload: ${xhr.status}`);
};

Here’s a real-life example: file upload with progress indication:

<input type="file" onchange="upload(this.files[0])">

<script>
function upload(file) {
  let xhr = new XMLHttpRequest();

  // track upload progress
  xhr.upload.onprogress = function(event) {
    console.log(`Uploaded ${event.loaded} of ${event.total}`);
  };

  // track completion: both successful or not
  xhr.onloadend = function() {
    if (xhr.status == 200) {
      console.log("success");
    } else {
      console.log("error " + this.status);
    }
  };

  xhr.open("POST", "/article/xmlhttprequest/post/upload");
  xhr.send(file);
}
</script>

Cross-origin requests

node.js package CORS

XMLHttpRequest can make cross-origin requests, using the same CORS policy as fetch. Just like fetch, it doesn’t send cookies and HTTP authorization to another origin by default. To enable them, set xhr.withCredentials to true:

let xhr = new XMLHttpRequest();
xhr.withCredentials = true;

xhr.open('POST', 'http://anywhere.com/request');
...

See the chapter Fetch: Cross-Origin Requests for details about cross-origin headers.

Summary

Typical code of a GET request with XMLHttpRequest:

let xhr = new XMLHttpRequest();

xhr.open('GET', '/my/url');

xhr.send();

xhr.onload = function() {
  if (xhr.status != 200) { // HTTP error?
    // handle error
    alert('Error: ' + xhr.status);
    return;
  }
  // get the response from xhr.response
};

xhr.onprogress = function(event) {
  // report progress
  alert(`Loaded ${event.loaded} of ${event.total}`);
};

xhr.onerror = function() {
  // handle non-HTTP error (e.g. network down)
};

There are actually more events; the modern specification lists them (in lifecycle order):

loadstart – the request has started.
progress – a data packet of the response has arrived, the whole response body at the moment is in response.
abort – the request was canceled by the call xhr.abort().
error – a connection error has occurred, e.g. wrong domain name. Doesn’t happen for HTTP errors like 404.
load – the request has finished successfully.
timeout – the request was canceled due to timeout (only happens if a timeout was set).
loadend – triggers after load, error, timeout or abort.

The error, abort, timeout, and load events are mutually exclusive. Only one of them may happen.

The most used events are load completion (load) and load failure (error), or we can use a single loadend handler and check the properties of the request object xhr to see what happened.

We’ve already seen another event: readystatechange. Historically, it appeared long ago, before the specification settled. Nowadays, there’s no need to use it; we can replace it with the newer events, but it can often be found in older scripts.

If we need to track uploading specifically, then we should listen to the same events on the xhr.upload object.

ws: a Node.js WebSocket library



ws is a simple to use, blazing fast, and thoroughly tested WebSocket client and server implementation. Passes the quite extensive Autobahn test suite: server, client. Note: This module does not work in the browser. The client in the docs is a reference to a backend with the role of a client in the WebSocket communication. Browser clients must use the native WebSocket object. To make the same code work seamlessly on Node.js and the browser, you can use one of the many wrappers available on npm, like isomorphic-ws.

Installing

npm install ws

 Opt-in for performance

bufferutil is an optional module that can be installed alongside the ws module:

npm install --save-optional bufferutil

This is a binary addon that improves the performance of certain operations, such as masking and unmasking the data payload of the WebSocket frames. Prebuilt binaries are available for the most popular platforms, so you don't necessarily need to have a C++ compiler installed on your machine.

To force ws not to use bufferutil, use the WS_NO_BUFFER_UTIL environment variable. This can be useful to enhance security in systems where a user can put a package in the package search path of an application of another user, due to how the Node.js resolver algorithm works.
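
For example, assuming your entry point is called app.js, the variable can be set for a single run like this:

WS_NO_BUFFER_UTIL=1 node app.js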

  Legacy opt-in for performance

If you are running on an old version of Node.js (prior to v18.14.0), ws also supports the utf-8-validate module:

npm install --save-optional utf-8-validate

This contains a binary polyfill for buffer.isUtf8(). To force ws not to use utf-8-validate, use the WS_NO_UTF_8_VALIDATE environment variable.

API docs

See /doc/ws.md for Node.js-like documentation of ws classes and utility functions.

WebSocket compression

ws supports the permessage-deflate extension which enables the client and server to negotiate a compression algorithm and its parameters, and then selectively apply it to the data payloads of each WebSocket message. The extension is disabled by default on the server and enabled by default on the client. It adds a significant overhead in terms of performance and memory consumption, so we suggest enabling it only if it is really needed.

Note that Node.js has a variety of issues with high-performance compression, where increased concurrency, especially on Linux, can lead to catastrophic memory fragmentation and slow performance. If you intend to use permessage-deflate in production, it is worthwhile to set up a test representative of your workload and ensure Node.js/zlib will handle it with acceptable performance and memory usage.

Tuning of permessage-deflate can be done via the options defined below. You can also use zlibDeflateOptions and zlibInflateOptions, which are passed directly into the creation of raw deflate/inflate streams. See the docs for more options.

import WebSocket, { WebSocketServer } from 'ws';

const wss = new WebSocketServer({
  port: 8080,
  perMessageDeflate: {
    zlibDeflateOptions: {
      // See zlib defaults.
      chunkSize: 1024,
      memLevel: 7,
      level: 3
    },
    zlibInflateOptions: {
      chunkSize: 10 * 1024
    },
    // Other options settable:
    clientNoContextTakeover: true, // Defaults to negotiated value.
    serverNoContextTakeover: true, // Defaults to negotiated value.
    serverMaxWindowBits: 10, // Defaults to negotiated value.
    // Below options specified as default values.
    concurrencyLimit: 10, // Limits zlib concurrency for perf.
    threshold: 1024 // Size (in bytes) below which messages
    // should not be compressed if context takeover is disabled.
  }
});

The client will only use the extension if it is supported and enabled on the server. To always disable the extension on the client, set the perMessageDeflate option to false.

import WebSocket from 'ws';

const ws = new WebSocket('ws://www.host.com/path', {
  perMessageDeflate: false
});

Usage examples

 Sending and receiving text data

import WebSocket from 'ws';

const ws = new WebSocket('ws://www.host.com/path');

ws.on('error', console.error);

ws.on('open', function open() {
  ws.send('something');
});

ws.on('message', function message(data) {
  console.log('received: %s', data);
});

 Sending binary data

import WebSocket from 'ws';

const ws = new WebSocket('ws://www.host.com/path');

ws.on('error', console.error);

ws.on('open', function open() {
  const array = new Float32Array(5);

  for (var i = 0; i < array.length; ++i) {
    array[i] = i / 2;
  }

  ws.send(array);
});

 Simple server

import { WebSocketServer } from 'ws';

const wss = new WebSocketServer({ port: 8080 });

wss.on('connection', function connection(ws) {
  ws.on('error', console.error);

  ws.on('message', function message(data) {
    console.log('received: %s', data);
  });

  ws.send('something');
});

 External HTTP/S server

import { createServer } from 'https';
import { readFileSync } from 'fs';
import { WebSocketServer } from 'ws';

const server = createServer({
  cert: readFileSync('/path/to/cert.pem'),
  key: readFileSync('/path/to/key.pem')
});
const wss = new WebSocketServer({ server });

wss.on('connection', function connection(ws) {
  ws.on('error', console.error);

  ws.on('message', function message(data) {
    console.log('received: %s', data);
  });

  ws.send('something');
});

server.listen(8080);

 Multiple servers sharing a single HTTP/S server

import { createServer } from 'http';
import { WebSocketServer } from 'ws';

const server = createServer();
const wss1 = new WebSocketServer({ noServer: true });
const wss2 = new WebSocketServer({ noServer: true });

wss1.on('connection', function connection(ws) {
  ws.on('error', console.error);
  // ...
});

wss2.on('connection', function connection(ws) {
  ws.on('error', console.error);
  // ...
});

server.on('upgrade', function upgrade(request, socket, head) {
  const { pathname } = new URL(request.url, 'wss://base.url');

  if (pathname === '/foo') {
    wss1.handleUpgrade(request, socket, head, function done(ws) {
      wss1.emit('connection', ws, request);
    });
  } else if (pathname === '/bar') {
    wss2.handleUpgrade(request, socket, head, function done(ws) {
      wss2.emit('connection', ws, request);
    });
  } else {
    socket.destroy();
  }
});

server.listen(8080);

 Client authentication

import { createServer } from 'http';
import { WebSocketServer } from 'ws';

function onSocketError(err) {
  console.error(err);
}

const server = createServer();
const wss = new WebSocketServer({ noServer: true });

wss.on('connection', function connection(ws, request, client) {
  ws.on('error', console.error);

  ws.on('message', function message(data) {
    console.log(`Received message ${data} from user ${client}`);
  });
});

server.on('upgrade', function upgrade(request, socket, head) {
  socket.on('error', onSocketError);

  // This function is not defined on purpose. Implement it with your own logic.
  authenticate(request, function next(err, client) {
    if (err || !client) {
      socket.write('HTTP/1.1 401 Unauthorized\r\n\r\n');
      socket.destroy();
      return;
    }

    socket.removeListener('error', onSocketError);

    wss.handleUpgrade(request, socket, head, function done(ws) {
      wss.emit('connection', ws, request, client);
    });
  });
});

server.listen(8080);

Also see the provided example using express-session.

 Server broadcast

A client WebSocket broadcasting to all connected WebSocket clients, including itself:

import WebSocket, { WebSocketServer } from 'ws';

const wss = new WebSocketServer({ port: 8080 });

wss.on('connection', function connection(ws) {
  ws.on('error', console.error);

  ws.on('message', function message(data, isBinary) {
    wss.clients.forEach(function each(client) {
      if (client.readyState === WebSocket.OPEN) {
        client.send(data, { binary: isBinary });
      }
    });
  });
});

A client WebSocket broadcasting to every other connected WebSocket client, excluding itself:

import WebSocket, { WebSocketServer } from 'ws';

const wss = new WebSocketServer({ port: 8080 });

wss.on('connection', function connection(ws) {
  ws.on('error', console.error);

  ws.on('message', function message(data, isBinary) {
    wss.clients.forEach(function each(client) {
      if (client !== ws && client.readyState === WebSocket.OPEN) {
        client.send(data, { binary: isBinary });
      }
    });
  });
});

 Round-trip time

import WebSocket from 'ws';

const ws = new WebSocket('wss://websocket-echo.com/');

ws.on('error', console.error);

ws.on('open', function open() {
  console.log('connected');
  ws.send(Date.now());
});

ws.on('close', function close() {
  console.log('disconnected');
});

ws.on('message', function message(data) {
  console.log(`Round-trip time: ${Date.now() - data} ms`);

  setTimeout(function timeout() {
    ws.send(Date.now());
  }, 500);
});

 Use the Node.js streams API

import WebSocket, { createWebSocketStream } from 'ws';

const ws = new WebSocket('wss://websocket-echo.com/');

const duplex = createWebSocketStream(ws, { encoding: 'utf8' });

duplex.on('error', console.error);

duplex.pipe(process.stdout);
process.stdin.pipe(duplex);

 Other examples

For a full example with a browser client communicating with a ws server, see the examples folder. Otherwise, see the test cases.

FAQ

 How to get the IP address of the client?

The remote IP address can be obtained from the raw socket.

import { WebSocketServer } from 'ws';

const wss = new WebSocketServer({ port: 8080 });

wss.on('connection', function connection(ws, req) {
  const ip = req.socket.remoteAddress;

  ws.on('error', console.error);
});

When the server runs behind a proxy like NGINX, the de-facto standard is to use the X-Forwarded-For header.

wss.on('connection', function connection(ws, req) {
  const ip = req.headers['x-forwarded-for'].split(',')[0].trim();

  ws.on('error', console.error);
});

 How to detect and close broken connections?

Sometimes, the link between the server and the client can be interrupted in a way that keeps both the server and the client unaware of the broken state of the connection (e.g. when pulling the cord). In these cases, ping messages can be used as a means to verify that the remote endpoint is still responsive.

import { WebSocketServer } from 'ws';

function heartbeat() {
  this.isAlive = true;
}

const wss = new WebSocketServer({ port: 8080 });

wss.on('connection', function connection(ws) {
  ws.isAlive = true;
  ws.on('error', console.error);
  ws.on('pong', heartbeat);
});

const interval = setInterval(function ping() {
  wss.clients.forEach(function each(ws) {
    if (ws.isAlive === false) return ws.terminate();

    ws.isAlive = false;
    ws.ping();
  });
}, 30000);

wss.on('close', function close() {
  clearInterval(interval);
});

Pong messages are automatically sent in response to ping messages as required by the spec.

Just like the server example above, your clients might as well lose connection without knowing it. You might want to add a ping listener on your clients to prevent that. A simple implementation would be:

import WebSocket from 'ws';

function heartbeat() {
  clearTimeout(this.pingTimeout);

  // Use `WebSocket#terminate()`, which immediately destroys the connection,
  // instead of `WebSocket#close()`, which waits for the close timer.
  // Delay should be equal to the interval at which your server
  // sends out pings plus a conservative assumption of the latency.
  this.pingTimeout = setTimeout(() => {
    this.terminate();
  }, 30000 + 1000);
}

const client = new WebSocket('wss://websocket-echo.com/');

client.on('error', console.error);
client.on('open', heartbeat);
client.on('ping', heartbeat);
client.on('close', function clear() {
  clearTimeout(this.pingTimeout);
});

 How to connect via a proxy?

Use a custom http.Agent implementation like https-proxy-agent or socks-proxy-agent.
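
As a rough sketch, assuming the https-proxy-agent package (v7-style named export) and placeholder URLs, the agent is passed via the options object of the ws client constructor:

import WebSocket from 'ws';
import { HttpsProxyAgent } from 'https-proxy-agent';

// both URLs are placeholders
const agent = new HttpsProxyAgent('http://proxy.example.com:8080');
const ws = new WebSocket('wss://echo.example.com/', { agent });

ws.on('open', () => ws.send('hello through the proxy'));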

Nodejs Websocket Guide



Create a WebSocket Server

First, make a new file and name it server.js. Then add the following server-side code:

const express = require('express');
const webserver = express()
  .use((req, res) =>
    res.sendFile('/ws-client.html', { root: __dirname })
  )
  .listen(3000, () => console.log(`Listening on ${3000}`));

const { WebSocketServer } = require('ws');
const sockserver = new WebSocketServer({ port: 2048 });

sockserver.on('connection', ws => {
  console.log('New client connected!');
  ws.send('connection established');

  ws.on('close', () => console.log('Client has disconnected!'));

  ws.on('message', data => {
    sockserver.clients.forEach(client => {
      console.log(`distributing message: ${data}`);
      client.send(`${data}`);
    });
  });

  ws.onerror = function () {
    console.log('websocket error');
  };
});

Create a WebSocket Client

Construct a page in HTML called ws-client.html (note the id attributes on the form, the text input, and the message div; the client script below looks elements up by these ids):

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <meta http-equiv="X-UA-Compatible" content="IE=edge" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <title>WebSocket Chat App</title>
  <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0-alpha1/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-GLhlTQ8iRABdZLl6O3oVMWSktQOp6b7In1Zl3/Jr59b6EGGoI1aFkw7cmDA6j6gD" crossorigin="anonymous">
  <style>
    body { padding: 4rem; text-align: center; }
  </style>
</head>
<body>
  <h2>WebSocket Chat App</h2><br /><br /><br />
  <form id="input-form">
    <input type="text" placeholder="Enter message here" id="message" name="message"><br /><br />
    <input type="submit" value="Send"><br /><br />
  </form>
  <div id="messages"></div>
</body>
</html>

Add the following code to the script element of the HTML file to build a WebSocket client:

const webSocket = new WebSocket('ws://localhost:2048/');

webSocket.onmessage = (event) => {
  console.log(event);
  document.getElementById('messages').innerHTML +=
    'Message from server: ' + event.data + "<br />";
};

webSocket.addEventListener("open", () => {
  console.log("Client is now connected");
});

function sendMessage(event) {
  var inputMessage = document.getElementById('message');
  webSocket.send(inputMessage.value);
  inputMessage.value = "";
  event.preventDefault();
}

document.getElementById('input-form').addEventListener('submit', sendMessage);

The final HTML file should look like this:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <meta http-equiv="X-UA-Compatible" content="IE=edge" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <title>WebSocket Chat App</title>
  <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0-alpha1/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-GLhlTQ8iRABdZLl6O3oVMWSktQOp6b7In1Zl3/Jr59b6EGGoI1aFkw7cmDA6j6gD" crossorigin="anonymous">
  <style>
    body { padding: 4rem; text-align: center; }
  </style>
</head>
<body>
  <h2>WebSocket Chat App</h2><br /><br /><br />
  <form id="input-form">
    <input type="text" placeholder="Enter message here" id="message" name="message"><br /><br />
    <input type="submit" value="Send"><br /><br />
  </form>
  <div id="messages"></div>
  <script>
    const webSocket = new WebSocket('ws://localhost:2048/');

    webSocket.onmessage = (event) => {
      console.log(event);
      document.getElementById('messages').innerHTML +=
        'Message from server: ' + event.data + "<br />";
    };

    webSocket.addEventListener("open", () => {
      console.log("Client is now connected");
    });

    function sendMessage(event) {
      var inputMessage = document.getElementById('message');
      webSocket.send(inputMessage.value);
      inputMessage.value = "";
      event.preventDefault();
    }

    document.getElementById('input-form').addEventListener('submit', sendMessage);
  </script>
</body>
</html>

Start the Server and Run the Client

Finally, start the server by running the following command in your terminal:

node server.js

Now, go to your browser and open http://localhost:3000.

Alternatives to the ws Library

 Socket.io

Socket.io is a well-known WebSocket module that makes working with WebSocket connections in Node.js simple and intuitive. Compared to the ws library, Socket.io has a larger codebase and more dependencies, which can make it slower and more resource-hungry. Additionally, its heavy reliance on callbacks can make code harder to read and maintain.

 SockJS

SockJS, another WebSocket library, offers a fallback option for browsers that do not support WebSockets. It also supports a variety of transports, such as polling, which can be an advantage when dealing with older browsers and devices. One of SockJS's key benefits is that it supports server-side implementations in many languages, including Java, Python, and Ruby, which makes it a good option for building cross-platform applications. However, SockJS is less well-known and less widely used than the ws library and Socket.io, so it can be harder to find help and documentation online. It also offers fewer features and may not be appropriate for sophisticated applications.

 uWebSockets.js

uWebSockets.js is a lightweight WebSocket framework that offers a high-performance interface for managing WebSocket connections in Node.js. Built on a low-level C++ core, it delivers high throughput and low latency, which makes it a good option for applications that need high-speed data transmission. One of uWebSockets.js's key benefits is its compact codebase and low resource utilization, which can lower server costs and boost performance. It also has a built-in mechanism for handling disconnections and faults, which suits real-time applications. However, the learning curve for uWebSockets.js is steeper than that of other WebSocket libraries, so it may not be appropriate for novice programmers or developers with little background in low-level programming.
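
For orientation, here is a minimal echo-server sketch based on uWebSockets.js's App().ws() API (treat the handler signatures as an assumption to verify against the library's docs):

const uWS = require('uWebSockets.js');

uWS.App()
  .ws('/*', {
    open: (ws) => console.log('client connected'),
    message: (ws, message, isBinary) => {
      ws.send(message, isBinary); // echo the message back
    },
    close: (ws) => console.log('client disconnected')
  })
  .listen(9001, (listenSocket) => {
    if (listenSocket) console.log('Listening on port 9001');
  });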

Pros and Cons of WebSocket Libraries

Each library has advantages and disadvantages, depending on the application’s requirements and the developer’s experience. Here are some general pros and cons of using WebSocket libraries in Node.js.
Pros:
- WebSocket libraries provide a simple and convenient interface.
- They support real-time data transfers and bidirectional communication between clients and servers.
- They can create complex applications with multiple communication channels, e.g. chat rooms, multiplayer games, and real-time dashboards.

Cons:
- WebSocket libraries can consume more resources than traditional HTTP connections, which increases server costs and reduces performance.
- They can be more complicated to set up and configure compared to traditional HTTP connections.
- They require the server and client to support the WebSocket protocol, which may limit compatibility with older devices and browsers.

Alternatives of WebSockets

Let’s dive into the alternatives to WebSockets and compare them to WebSockets.

 Long Polling

Long polling is a method where the client sends a request to the server, and the server holds the request open until it has new data to return. As a result, near real-time communication is possible without a constant connection. Long polling, however, can be inefficient and slow, particularly for applications with many active clients.
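
A minimal client-side sketch of the pattern, assuming a hypothetical /poll endpoint that holds each request open until new data arrives:

async function poll() {
  while (true) {
    try {
      const res = await fetch('/poll'); // hypothetical long-polling endpoint
      if (res.ok) {
        const data = await res.json();
        console.log('update:', data);
      }
    } catch (err) {
      // network error: wait a bit before re-issuing the request
      await new Promise(resolve => setTimeout(resolve, 1000));
    }
  }
}

poll();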

 Server-Sent Events (SSE)

The SSE standard lets the server push real-time updates to the client over a single HTTP connection. Compared to WebSockets, SSE is easier to use and doesn’t require a separate protocol. However, not all browsers support it.
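
On the browser side, consuming SSE is just a few lines; this sketch assumes a hypothetical /events endpoint:

const source = new EventSource('/events'); // hypothetical SSE endpoint

source.onmessage = (event) => {
  console.log('new update:', event.data);
};

source.onerror = () => {
  console.log('connection lost, the browser will retry automatically');
};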

 WebRTC

The WebRTC protocol makes real-time communication possible across browsers. For applications like video conferencing or live streaming that need a lot of bandwidth and low latency, WebRTC is the best choice.

 MQTT

MQTT is a lightweight messaging protocol often used for Internet of Things (IoT) applications. MQTT is well-suited for low-power devices and unreliable network connections, but it’s not as widely supported as WebSockets.
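
As a small sketch using the npm mqtt package (the broker URL and topic are placeholders):

const mqtt = require('mqtt');

const client = mqtt.connect('mqtt://test.mosquitto.org'); // placeholder public broker

client.on('connect', () => {
  client.subscribe('sensors/temperature'); // placeholder topic
  client.publish('sensors/temperature', '21.5');
});

client.on('message', (topic, payload) => {
  console.log(`${topic}: ${payload.toString()}`);
});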
It’s crucial to take the particular requirements of your application into account when comparing WebSockets to these alternatives. WebSockets can support numerous active clients and provide low-latency, bidirectional communication. Modern browsers also widely support WebSockets, and they are simple to use in Node.js via the ws package. On the other hand, some alternatives, such as long polling and SSE, may be easier to build but may not be as efficient or scalable. WebRTC is excellent for some use cases, although it involves additional setup and is not always required. MQTT works well for Internet of Things applications, but it may not fit all real-time communication scenarios.

Chat with WebSockets, Server and Client both in NodeJs

There will be no persistence here. A client will see only the messages the server sends it from the point it connects. Also, whenever a client connects, the server will generate a random color and a random name for it, and the client will use those. It's not strictly necessary, just a little extra. Utils — this file contains the generate-random-color and generate-random-name helpers. You can check the contents of that file in full on GitHub; the link is at the end of the story. myClient is just a class called Client (not the actual client we are going to define that connects to the server). It contains 2 fields, name and color. handleReceviedMsg accepts the message that was sent to the server and returns it JSON-stringified. For WebSockets we are using the ws package, and chalk is used to make the console log a little colorful.
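
A hypothetical sketch of what such helpers might look like (the real ones live in the repo's utils file):

// pick a random chalk-compatible color name
function generateRandomColor() {
  const colors = ['red', 'green', 'yellow', 'blue', 'magenta', 'cyan'];
  return colors[Math.floor(Math.random() * colors.length)];
}

// pick a random name with a numeric suffix
function generateRandomName() {
  const names = ['Alice', 'Bob', 'Carol', 'Dave'];
  return names[Math.floor(Math.random() * names.length)] + Math.floor(Math.random() * 100);
}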

Server

First we create the WebSocketServer, running on port 8080, with the clientTracking option enabled. This tracking gives the WebSocket server a clients property. When the server detects a connection, we create a client instance; it's just a placeholder for a name and a color. Then we send the init or welcome message to the client with that information. Whenever a new message is received, we send that message back to every client, even to the one who sent it, so they know the message was received and relayed.
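
Here is a sketch of that server, assuming the ws package plus the Client class and the random-name/color helpers described above:

const { WebSocketServer } = require('ws');

const wss = new WebSocketServer({ port: 8080, clientTracking: true });

wss.on('connection', (ws) => {
  // placeholder identity for this connection (helpers from the article's utils)
  const client = new Client(generateRandomName(), generateRandomColor());

  // init/welcome message with the generated identity
  ws.send(JSON.stringify({ type: 'init', name: client.name, color: client.color }));

  ws.on('message', (data) => {
    // relay every message to all connected clients, the sender included
    wss.clients.forEach((c) => c.send(data.toString()));
  });
});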

Client

The client connects to the WebSocket server. In the initial or welcome message, the server returns the name and the color. The client uses these to display messages in color with the help of chalk. For client input we use Node's readline. Readline listens on the line event, which is triggered whenever the client presses the Enter key. On that event we capture the text that was entered and send it to the server together with the client's name and color. Everything else should be pretty much self-explanatory: open fires when the connection is established, close when the connection is terminated, and message whenever the server sends us a message. The full client.js and server.js, along with the whole structure and code, are at https://github.com/zprima/wschat
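
A sketch of that client, assuming the ws and chalk (v4-style require) packages and the JSON message shape used in the server sketch above:

const WebSocket = require('ws');
const chalk = require('chalk');
const readline = require('readline');

const ws = new WebSocket('ws://localhost:8080');
const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
let me = { name: '?', color: 'white' }; // filled in by the init message

ws.on('open', () => console.log('connected'));
ws.on('close', () => console.log('disconnected'));

ws.on('message', (data) => {
  const msg = JSON.parse(data);
  if (msg.type === 'init') {
    me = { name: msg.name, color: msg.color };
    return;
  }
  console.log(chalk[msg.color](`${msg.name}: ${msg.text}`));
});

// 'line' fires whenever the user presses Enter
rl.on('line', (text) => {
  ws.send(JSON.stringify({ name: me.name, color: me.color, text }));
});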

Creating a chat with Node.js

https://itnext.io/creating-a-chat-with-node-js-from-the-scratch-707896d64593

About Socket.IO

socket.io

This Node.js module provides a way to connect directly from the client browser to the application server. The library works through events: the server or client triggers events, and the other party responds to them. We will mainly use two very basic methods, emit and on; one emits an event and the other receives and handles it. Each side of the application will therefore have the Socket.IO library added. In addition to allowing the direct exchange of messages between two endpoints, Socket.IO also allows the broadcast of messages, sending an event to all other connected users. The broadcast can come from either client or server. When the user accesses the page, a socket is created with the server, and it is through this socket that the exchange of messages between a client and a server takes place. This socket, in turn, can either emit an event to a single socket or to all the sockets connected to it, which we call a message broadcast.
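
A minimal emit/on sketch (the event name and payload are placeholders; the client snippet assumes the page loads /socket.io/socket.io.js):

// server side
io.on('connection', (socket) => {
  socket.on('chat message', (msg) => {
    io.emit('chat message', msg); // broadcast to every connected socket
  });
});

// client side
const socket = io();
socket.emit('chat message', 'hello everyone');
socket.on('chat message', (msg) => console.log(msg));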

The project


Let’s now create a directory called \ChatJs and, inside it, a file called app.js, which will be the main file of our server. As a first step we will create a fairly simple server that only presents a success message on the browser screen.

Creating a simple application:

var app = require('http').createServer(response);
app.listen(3000);

console.log("App running…");

function response(req, res) {
  res.writeHead(200);
  res.end("Hi, your server is working!");
}

The script creates an HTTP server (which will be listening on port 3000) whose main request handler is the response() function, which, in turn, has two parameters: req (request) and res (response). Inside the function, we set a success code (200) and end the response with a string saying that the server is OK. Right after, just run the following command, which will run our application at the prompt:

node app.js

Note that when you run this code at the prompt, the terminal prints the content of the console.log call, saying that the application is running. However, it will not print any other lines, indicating that our application is currently running. At this point, we have only our Node.js server running. If you access http://localhost:3000/ in the browser you’ll see the message we passed in the end method.

Next, we will make our server return an HTML response that will be the main page of our chat. For this, we will have to load the FileSystem module, since we will navigate the project directory and open a file. So, let’s change our app.js just like we see in Listing 4. Before making the changes, go to the prompt and press Ctrl + C (or Command + C) to stop our application on the server. Listing 4.

Introducing an HTML page


var app = require('http').createServer(response);
var fs = require('fs');

app.listen(3000);

console.log("App running…");

function response(req, res) {
  fs.readFile(__dirname + '/index.html', function (err, data) {
    if (err) {
      res.writeHead(500);
      return res.end('Failed to load file index.html');
    }
    res.writeHead(200);
    res.end(data);
  });
}

After these changes we will again execute the command node app.js and, when accessing the address http://localhost:3000/ again, you’ll come across the message “Failed to load file index.html”, simply because we don’t have an index.html file inside our project yet. It is also important to remember that the server we created so far does not differentiate between paths, i.e. you can put anything after http://localhost:3000/ and it will always respond in the same way, because we have not implemented how it would treat these paths. So, you can very well call up addresses like http://localhost:3000/chat, http://localhost:3000/error, http://localhost:3000/potato, etc.

Let’s create a simple interface for our chat. Create an index.html file inside the project root directory. In this file enter a code equal to that shown in Listing 5. Listing 5.

Chat HTML code


<!DOCTYPE html>
<html>
<head>
  <title>ChatJS</title>
  <link rel="stylesheet" type="text/css" href="/css/style.css" />
</head>
<body>
  <div id="history"></div>
  <form id="chat">
    <input type="text" id="msg_text" name="msg_text" />
    <input type="submit" value="Send!" />
  </form>
</body>
</html>

Our index, for now, deals only with a div called history, which is where all the messages exchanged in the chat will be displayed. Right after it, we have a form with a text box and the message-send button. A very simple chat structure so far. However, if you now try to access the address http://localhost:3000/ you will receive the same error message. This is because we did not restart our server application, so we go to the prompt again, press Ctrl + C and then re-execute the app. As you may have noticed, we already left a link tag in the <head> of our application to load our CSS. Within the directory of our project create another directory called css and, inside it, the style.css file with the same content as shown in Listing 6. Listing 6.

style.css file


html, body, input {
  font-family: Georgia, Tahoma, Arial, sans-serif;
  margin: 0;
  padding: 0;
}
body {
  background: #302F31;
  padding: 10px;
}
form {
  margin: 15px 0;
}
form input[type='text'] {
  border: 2px solid #eb5424;
  border-radius: 5px;
  padding: 5px;
  width: 75%;
}
form input[type='submit'] {
  background: #eb5424;
  border: none;
  border-radius: 5px;
  color: #FFF;
  cursor: pointer;
  font-weight: bold;
  padding: 7px 5px;
  width: 19%;
}
#history {
  background: #FFF;
  border: 2px solid #eb5424;
  height: 550px;
}

If we restart the application, the style is not yet applied to the index page. The reason is that our app.js only deals with one request path so far. To solve this we will change our app.js file so that it loads the files that are passed in the request URL, instead of handling each of the URLs manually. Let’s take a closer look at the changes listed in Listing 7. Listing 7.

Path changes in app.js


var app = require('http').createServer(response);
var fs = require('fs');

app.listen(3000);

console.log("App running...");

function response(req, res) {
  var file = "";
  if (req.url == "/") {
    file = __dirname + '/index.html';
  } else {
    file = __dirname + req.url;
  }
  fs.readFile(file, function(err, data) {
    if (err) {
      res.writeHead(404);
      return res.end('Page or file not found');
    }
    res.writeHead(200);
    res.end(data);
  });
}

After restarting the app, the style is applied to the page.

Sending messages


We will now work on the messaging mechanism. Our application will communicate with the Node.js server through the client-side library of Socket.IO, while jQuery takes care of the interaction with the page. For this, we will change the app.js file as shown in Listing 8 and include a line at the beginning of the file stating that we are including Socket.IO in the application. Listing 8.

Including Socket.IO module


var app = require('http').createServer(response);
var fs = require('fs');
var io = require('socket.io')(app);
…

In order to use the require function on a module we first need to install it for our application. So, stop the application and run the following command to get this done:

npm install socket.io

Once finished, go to your index.html page and add the code snippet shown in Listing 9 at the end of the file. Listing 9.

Message sending event


…
<script type="text/javascript" src="https://code.jquery.com/jquery-3.3.1.min.js"></script>
<script type="text/javascript" src="/socket.io/socket.io.js"></script>
<script type="text/javascript">
  var socket = io.connect();
  $("form#chat").submit(function(e) {
    e.preventDefault();
    socket.emit("send message", $(this).find("#msg_text").val(), function() {
      $("form#chat #msg_text").val("");
    });
  });
</script>
</body>
</html>

We are declaring a socket variable that refers to the Socket.IO library, which will be responsible for all socket functionality. Next, we declare a submit event for our form in jQuery and call preventDefault so that the form does not proceed to its action, since we are the ones who are going to take care of the form response. Note that the emit method of the library is invoked, and we pass it three things: the event name (this will be useful on the server), the data we are sending (in this case only the contents of the message field) and finally the callback, a function that will be executed once the event is acknowledged. The latter, in particular, will only serve to clear the message field, so the user does not have to delete the message after sending it.

If we test our application now, the message sending will not work, not even the callback that clears the message field, because we have not yet implemented what the server has to do as soon as it receives this event. To do this, edit the app.js file and put the code shown in Listing 11 at the end of it. Listing 11.

Receiving messages from the client


io.on("connection", function(socket) { socket.on("send message", function(sent_msg, callback) { sent_msg = "repeat: " + sent_msg; io.sockets.emit("update messages", sent_msg); callback(); }); }); We’ve created a method that will work in response to the client’s connection to the server. When the client accesses the page it triggers this method on the server and when this socket receives a send message we trigger a method that has as parameters the sent data (the message field) and the callback that we created on the client side. Within this method we put the second part of the functionality: the module will send to the sockets connected to the server (all users) the update messages event and will also pass which new message was sent, with a specific datetime format. To provide the date and time we create a separate function because we will still use this method a few more times throughout the development. Right away, we call the callback that we created on the client side, which is the method for clearing the fields. Finally, also edit the index.html file and create the method that will update the messages for the users. The idea is quite simple: let’s give an append in the history div (the changes are in Listing 12). The following lines should be entered shortly after submitting the form. Listing 12.

Updating message history


socket.on("update messages", function(msg){ var final_message = $("<p />").text(msg); $("#history").append(final_message); }); Basically, the conversation between the server and the client is the same on both sides, that is, the two have an emit and on functions for issuing and receiving events, respectively. So, restart and access the application in two tabs and just send a message to see the power of Socket.IO in action. The application should display the message full source code here.

multiple clients


phantomJS headless NodeJS server handle multiple users

There are multiple clients C1, C2, ..., Cn. Clients emit requests to the server: R1, ..., Rn. The server receives a request and does data processing. When data processing is complete, the server emits a response to the clients: Rs1, ..., Rsn. When the server has finished data processing it emits the response in the following way:

// server listens for request from client
socket.on('request_from_client', function(data) {
  // user data and request_type are stored in the data variable
  var user = data.user.id;
  var action = data.action;

  // server does data processing
  do_some_action(..., function(rData) {
    // when the processing is completed, the response data is emitted as a response_event
    // The problem is here: how to make sure that the response data goes to the right client?
    socket.emit('response_to_client', rData);
  });
});

The instance of the socket object corresponds to a client connection, so every message you emit from that instance is sent to the client that opened that socket connection. Remember that upon the connection event you get (through the onDone callback) the socket connection object. This event triggers every time a client connects to the socket.io server. If you want to send a message to all clients you can use io.sockets.emit("message-to-all-clients"), and if you want to send an event to every client apart from the one that emits the event, socket.broadcast.emit("message-to-all-other-clients"). On every connection, a "new channel" is created.

multiple clients connecting to same server


Server:

var dgram = require('dgram');
var client = dgram.createSocket('udp4');

/** @requires socket.io */
var io = require('socket.io')(http);

/** Array of clients created to keep track of who is listening to what */
var clients = [];

io.sockets.on('connection', function(socket, username) {
  /** printing out the client who joined */
  console.log('New client connected (id=' + socket.id + ').');

  /** pushing new client to client array */
  clients.push(socket);

  /** listening for acknowledgement message */
  client.on('message', function(message, rinfo) {
    /** creating temp array to put data in */
    var temp = [];
    /** converting data bits to bytes */
    var number = req.body.size * 2;
    /** acknowledgement message is converted to a string from buffer */
    var message = message.toString();
    /** cutting hex string to correspond to requested data size */
    var data = message.substring(0, number);
    /** converting that data to decimal */
    var data = parseInt(data, 16);
    /** adding data to data array */
    temp[0] = data;
    /** emitting message to html page */
    socket.emit('temp', temp);
  });

  /** listening if client has disconnected */
  socket.on('disconnect', function() {
    clients.splice(clients.indexOf(client), 1);
    console.log('client disconnected (id=' + socket.id + ').');
    clearInterval(loop);
  });
});

Client:

var socket = io.connect('192.168.0.136:3000');

socket.on('temp', function(temp) {
  var temp = temp.toString();
  var message = temp.split(',').join("<br>");
  $('#output').html('' + message + '');
});

When a client connects, a random number called temp is emitted to the client. The above code works when one client connects to the server. Now how can you set up a new connection each time, so that if one tab is opened, it gets its own random message back, while when another tab opens, it gets its own random message back?

You could send an id back to the client and save it to localStorage (or a variable or anywhere else). Then have the client listen to a 'room' that is just for this client, e.g.:

var uuidSocket = io(serverHost + "/" + uuid);

uuidSocket.on("Info", (data) => {
  // do something on data
});

This id / room will be used by the server to inform the specific client, and so on.

Server side:

// you have your socket ready and inside the on('connect'...) you handle a register
// event where the client passes an id if one exists, else you create one.
socket.on('register', function(clientUuid) { // a client requests registration
  var id = clientUuid == null ? uuid.v4() : clientUuid; // create an id if client doesn't already have one
  var ns = "/" + id;
  socket.join(id);
  var nsp = app.io.of(ns); // create a room using this id that is only for this client
  clientToRooms[ns] = nsp; // save it to a dictionary for future use

  // set up what to do on connection
  nsp.on('connection', function(nsSocket) {
    console.log('someone connected');
    nsSocket.on('Info', function(data) {
      // just an example
    });
  });
});

Client side:

// you already have declared uuid and uuidSocket and have connected to the socket
// previously, so you define what to do on register:
socket.on("register", function(data) {
  if (uuid == undefined || uuidSocket == undefined) { // first time we get id from server
    // save id to a variable
    uuid = data.uuid;
    // save to localStorage for further usage (optional - only if you want one client per browser e.g.)
    localStorage.setItem('socketUUID', uuid);
    uuidSocket = io(serverHost + "/" + uuid); // set up the room --> will trigger nsp.on('connection', ...) on the server
    uuidSocket.on("Info", function(data) {
      // handle on Info
    });
  }
});

// initiate the register from the client
socket.emit("register", uuid);

Broadcasting means sending a message to everyone else except for the socket that starts it.

Server:

var io = require('socket.io')(80);

io.on('connection', function (socket) {
  socket.broadcast.emit('user connected');
});

WebSocket Communication



♦Complete Guide To Node Client-Server Communication
node js communicate with client side javascript

Using WebSockets with Node.js

The easiest way is to set up Express and have your client-side code communicate via Ajax (for example, using jQuery). Note that express.createServer() and express.bodyParser() below come from the old Express 2.x API; in modern Express you would use express() and express.json()/express.urlencoded() instead.

(function() {
  var app, express;
  express = require("express");
  app = express.createServer();
  app.configure(function() {
    app.use(express.bodyParser());
    return app.use(app.router);
  });
  app.configure("development", function() {
    return app.use(express.errorHandler({
      dumpExceptions: true,
      showStack: true
    }));
  });
  app.post("/locations", function(request, response) {
    var latitude, longitude;
    latitude = request.body.latitude;
    longitude = request.body.longitude;
    return response.json({}, 200);
  });
  app.listen(80);
}).call(this);

On the client side, call it like this (note that jQuery's $.post takes the URL as its first argument, so an options object like this belongs to $.ajax):

var latitude = 0, longitude = 0; // Set from form

$.ajax({
  type: "POST",
  url: "http://localhost/locations",
  data: { latitude: latitude, longitude: longitude },
  success: function (data) {
    console.log("Success");
  },
  dataType: "json"
});

Note this code is simply an example; you'll have to work out the error handling, etc. Another way is by making an HTTP request, just like any other server-side program in a web application: with the XMLHttpRequest object, or by generating a <form> and then submitting it, or a variety of other methods.

Create a new WebSockets connection


const url = 'wss://myserver.com/something' const connection = new WebSocket(url) connection is a WebSocket object. When the connection is successfully established, the open event is fired. Listen for it by assigning a callback function to the onopen property of the connection object: connection.onopen = () => { //... } If there's any error, the onerror function callback is fired: connection.onerror = error => { console.log(`WebSocket error: ${error}`) }

Sending data to the server using WebSockets


Once the connection is open, you can send data to the server. You can do so conveniently inside the onopen callback function: connection.onopen = () => { connection.send('hey') }

Receiving data from the server using WebSockets


Listen with a callback function on onmessage, which is called when the message event is received: connection.onmessage = e => { console.log(e.data) }

Implement a WebSockets server in Node.js


ws is a popular WebSockets library for Node.js. We'll use it to build a WebSockets server. It can also be used to implement a client and to communicate between two backend services over WebSockets (a minimal client sketch follows below). Install it easily using yarn: yarn init, then yarn add ws (or npm install ws). The code you need to write is very little: const WebSocket = require('ws') const wss = new WebSocket.Server({ port: 8080 }) wss.on('connection', ws => { ws.on('message', message => { console.log(`Received message => ${message}`) }) ws.send('ho!') }) This code creates a new server on port 8080 (a common choice for local development; the WebSocket protocol itself defaults to ports 80 and 443), and adds a callback function when a connection is established, sending ho! to the client and logging the messages it receives.
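As noted above, ws can also act as the client when two backend services talk to each other. A minimal sketch, assuming the server above is running on localhost:8080:

const WebSocket = require('ws');

// Connect to the ws server created above
const ws = new WebSocket('ws://localhost:8080');

ws.on('open', () => {
  ws.send('hey'); // the server will log: Received message => hey
});

ws.on('message', message => {
  console.log(`Server says: ${message}`); // prints the server's "ho!"
});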

Guide To Node Client-Server Communication





Complete source code for each implementation here.

WebSocket Client-Server Demo

WebSockets is a technology, based on the ws protocol, that makes it possible to establish a continuous full-duplex connection stream between a client and a server. A typical websocket client would be a user’s browser, but the protocol is platform independent. It is the closest API to a raw network socket in the browser. Except a WebSocket connection is also much more than a network socket, as the browser abstracts all the complexity behind a simple API and provides a number of additional services: Connection negotiation and same-origin policy enforcement Interoperability with existing HTTP infrastructure Message-oriented communication and efficient message framing Subprotocol negotiation and extensibility Here are some Node.js WebSocket libraries: websocket, ws, Socket.IO, SockJS, SocketCluster, and websocket-as-promised. This demo shows a client connecting to a websocket server and sharing data. Here is the server.js of a websocket. 'use strict'; const WebSocketServer = require('ws').Server const wss = new WebSocketServer({ port: 8081 }); wss.on('connection', (ws) => { ws.on('message', (message) => { console.log(`received: ${message}`); }); ws.on('close', () => { // the ws library emits 'close', not 'end', when a connection ends console.log('Connection ended...'); }); ws.send('Hello Client'); }); Here is the client.js of a websocket. console.log('open: '); var ws = new WebSocket("ws://127.0.0.1:8081"); ws.onopen = function (event) { console.log('Connection is open ...'); ws.send("Hello Server"); }; ws.onerror = function (err) { console.log('err: ', err); } ws.onmessage = function (event) { console.log(event.data); document.body.innerHTML += event.data + '<br>'; }; ws.onclose = function() { console.log("Connection is closed..."); }

Stream Updates with Server-Sent Events (SSE)

SSEs are sent over traditional HTTP. That means they do not require a special protocol or server implementation to get working. WebSockets, on the other hand, require full-duplex connections and new WebSocket servers to handle the protocol. In addition, Server-Sent Events have a variety of features that WebSockets lack by design, such as automatic reconnection, event IDs, and the ability to send arbitrary named events (a sketch of these follows below).
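A sketch of what those features look like on the wire and on the client; the event name and payload are made up for illustration:

// An SSE stream (Content-Type: text/event-stream) is plain text. Fields:
//   retry: 10000         <- reconnect delay in ms (automatic reconnection)
//   id: 42               <- event ID, echoed back as Last-Event-ID on reconnect
//   event: stock-update  <- an arbitrary named event
//   data: {"AAPL": 187}  <- payload; a blank line terminates the event
//
// On the client, named events are handled with addEventListener:
const source = new EventSource('/events');
source.addEventListener('stock-update', (e) => {
  console.log(JSON.parse(e.data));
});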

Server-Sent Events vs. WebSockets

APIs like WebSockets provide a richer protocol to perform bi-directional, full-duplex communication. Having a two-way channel is more attractive for things like games, messaging apps, and for cases where you need near real-time updates in both directions. However, in some scenarios data doesn’t need to be sent from the client. You simply need updates from some server action. A few examples would be friends’ status updates, stock tickers, news feeds, or other automated data push mechanisms (e.g. updating a client-side Web SQL Database or IndexedDB object store). If you need to send data to a server, XMLHttpRequest is always a friend. Here is the server.js of our Server-Sent Event; we will be sending out data to the client every 5 seconds with an updated timestamp via SSE. 'use strict'; const http = require('http'); const fs = require('fs'); http.createServer((req, res) => { debugHeaders(req); if (req.headers.accept && req.headers.accept == 'text/event-stream') { if (req.url == '/events') { sendSSE(req, res); } else { res.writeHead(404); res.end(); } } else { res.writeHead(200, {'Content-Type': 'text/html'}); res.write(fs.readFileSync(__dirname + '/index.html')); res.end(); } }).listen(8000); const sendSSE = (req, res) => { res.writeHead(200, { 'Content-Type': 'text/event-stream', 'Cache-Control': 'no-cache', 'Connection': 'keep-alive' }); const id = (new Date()).toLocaleTimeString(); setInterval(() => { constructSSE(res, id, (new Date()).toLocaleTimeString()); }, 5000); constructSSE(res, id, (new Date()).toLocaleTimeString()); //res.end(); } const constructSSE = (res, id, data) => { res.write('id: ' + id + '\n'); res.write('data: ' + data + '\n\n'); } const debugHeaders = (req) => { // console.log replaces the long-deprecated util.puts console.log('URL: ' + req.url); for (let key in req.headers) { console.log(key + ': ' + req.headers[key]); } console.log('\n\n'); } And here is the client.js referenced by the index.html on the client side. Notice how the client never sends out a formal request for data with SSEs. Once the initial connection has been made with the server, the plain-text data can be sent to the client as needed! var source = new EventSource('/events'); source.onmessage = function(e) { document.body.innerHTML += e.data + '<br>'; };

XMLHttpRequest (XHR)

XMLHttpRequest (XHR) is a browser-level API that enables the client to script data transfers via JavaScript. XHR made its first debut in Internet Explorer 5, became one of the key technologies behind the Asynchronous JavaScript and XML (AJAX) revolution, and is now a fundamental building block of nearly every modern web application.
XMLHTTP changed everything. It put the “D” in DHTML. It allowed us to asynchronously get data from the server and preserve document state on the client… The Outlook Web Access (OWA) team’s desire to build a rich Win32 like application in a browser pushed the technology into IE that allowed AJAX to become a reality. — Jim Van Eaton Outlook Web Access: A catalyst for web evolution
Here I am running a simple Express server with a simple route to send requested data to the client. 'use strict'; var express = require('express'); var app = express(); app.use(express.static(`${__dirname}/public`)); app.get('/api', function(req, res){ res.send((new Date()).toLocaleTimeString()); }); app.listen(3000); Here is the javascript file linked to my index.html on the client side. I am using the baked-in XHR methods as opposed to jQuery since I love to use vanilla JavaScript whenever possible. 'use strict' function reqListener (data) { document.body.innerHTML += this.responseText + '<br>'; } setInterval(function () { var oReq = new XMLHttpRequest(); oReq.addEventListener("load", reqListener); oReq.open("GET", "/api"); oReq.send(); }, 3000); In my Github repo, I cover two more use cases not referenced here, server to server communications and HTTP/2. If you are curious about those forms of communication, check it out. One word about HTTP/2 before wrapping up. HTTP/2 is the future of client-server communication, but it preserves HTTP/1.1 semantics, which means that all of these forms of communicating will still be relevant in the future; just the means by which they are transmitted will be updated. As you can see, there are a ton of different ways you can send data between a client and server. Before working on this project, I had no idea how many different ways were available in vanilla JavaScript for moving data around. Did I miss anything, or do you see something that needs to be fixed? Let me know in the comments below.

Create Web Server in Node.js

https://curl.se/docs/httpscripting.html https://www.digitalocean.com/community/tutorial-series/how-to-code-in-node-js https://www.w3schools.com/nodejs/nodejs_http.asp https://nodejs.org/en/learn/getting-started/introduction-to-nodejs https://runjs.app/blog/how-to-start-a-node-server https://www.tutorialsteacher.com/nodejs/create-nodejs-web-server

Introduction


A web server receives HTTP requests from a client, like your browser, and provides an HTTP response, like an HTML page or JSON from an API. This software generally falls into two categories: frontend and backend. Front-end code is concerned with how the content is presented, such as the color of a navigation bar and the text styling. Back-end code is concerned with how data is exchanged, processed, and stored. Code that handles network requests from your browser or communicates with the database is primarily managed by back-end code. Prerequisites Ensure that Node.js is installed on your development machine.

Step 1 — Creating a Basic HTTP Server


Let’s start by creating a server that returns plain text to the user. In the terminal, create a folder called first-servers: mkdir first-servers Then enter that folder: cd first-servers Now, create the file that will house the code: touch hello.js Open the file in a text editor. We will use nano as it’s available in the terminal: nano hello.js We start by loading the http module that’s standard with all Node.js installations. Add the following line to hello.js: first-servers/hello.js const http = require("http"); Our next step will be to define two constants, the host and port that our server will be bound to: first-servers/hello.js const host = 'localhost'; const port = 8000; The value localhost is a special private address that computers use to refer to themselves. It’s typically the equivalent of the internal IP address 127.0.0.1 and it’s only available to the local computer, not to any local networks we’ve joined or to the internet. The port is a number that servers use as an endpoint or “door” to our IP address. In our example, we will use port 8000 for our web server. Ports 8080 and 8000 are typically used as default ports in development, and in most cases developers will use them rather than other ports for HTTP servers. When we bind our server to this host and port, we will be able to reach our server by visiting http://localhost:8000 in a local browser. Let’s add a special function, which in Node.js we call a request listener. This function is meant to handle an incoming HTTP request and return an HTTP response. This function must have two arguments, a request object and a response object. The request object captures all the data of the HTTP request that’s coming in. The response object is used to return HTTP responses for the server. We want our first server to return this message whenever someone accesses it: "My first server!". Let’s add that function next: first-servers/hello.js const requestListener = function (req, res) { res.writeHead(200); res.end("My first server!"); }; The function would usually be named based on what it does. For example, if we created a request listener function to return a list of books, we would likely name it listBooks(). Since this one is a sample case, we will use the generic name requestListener. All request listener functions in Node.js accept two arguments: req and res (we can name them differently if we want). The HTTP request the user sends is captured in a Request object, which corresponds to the first argument, req. The HTTP response that we return to the user is formed by interacting with the Response object in second argument, res. The first line res.writeHead(200); sets the HTTP status code of the response. HTTP status codes indicate how well an HTTP request was handled by the server. In this case, the status code 200 corresponds to "OK". The next line of the function, res.end("My first server!");, writes the HTTP response back to the client who requested it. This function returns any data the server has to return. In this case, it’s returning text data. Finally, we can now create our server and make use of our request listener: first-servers/hello.js const server = http.createServer(requestListener); server.listen(port, host, () => { console.log(`Server is running on http://${host}:${port}`); }); Save and exit. In the first line, we create a new server object via the http module’s createServer() function. This server accepts HTTP requests and passes them on to our requestListener() function. 
After we create our server, we must bind it to a network address. We do that with the server.listen() method. It accepts three arguments: port, host, and a callback function that fires when the server begins to listen. All of these arguments are optional, but it is a good idea to explicitly state which port and host we want a web server to use. When deploying web servers to different environments, knowing the port and host it is running on is required to set up load balancing or a DNS alias. The callback function logs a message to our console so we can know when the server began listening to connections. Note: Even though requestListener() does not use the req object, it must still be the first argument of the function. With less than fifteen lines of code, we now have a web server. Let’s see it in action and test it end-to-end by running the program: node hello.js In the console, we will see this output: OutputServer is running on http://localhost:8000 Notice that the prompt disappears. This is because a Node.js server is a long running process. It only exits if it encounters an error that causes it to crash and quit, or if we stop the Node.js process running the server. In a separate terminal window, we’ll communicate with the server using cURL, a CLI tool to transfer data to and from a network. Enter the command to make an HTTP GET request to our running server: curl http://localhost:8000 When we press ENTER, our terminal will show the following output: OutputMy first server! We’ve now set up a server and got our first server response. Let’s break down what happened when we tested our server. Using cURL, we sent a GET request to the server at http://localhost:8000. Our Node.js server listened to connections from that address. The server passed that request to the requestListener() function. The function returned text data with the status code 200. The server then sent that response back to cURL, which displayed the message in our terminal. Before we continue, let’s exit our running server by pressing CTRL+C. This interrupts our server’s execution, bringing us back to the command line prompt. In most web sites we visit or APIs we use, the server responses are seldom in plain text. We get HTML pages and JSON data as common response formats. In the next step, we will learn how to return HTTP responses in common data formats we encounter in the web.
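For reference, here is the complete hello.js assembled from the snippets in this step:

const http = require("http");

const host = 'localhost';
const port = 8000;

const requestListener = function (req, res) {
    res.writeHead(200);
    res.end("My first server!");
};

const server = http.createServer(requestListener);
server.listen(port, host, () => {
    console.log(`Server is running on http://${host}:${port}`);
});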

Step 2 — Returning Different Types of Content


The response we return from a web server can take a variety of formats. JSON and HTML were mentioned before, and we can also return other text formats like XML and CSV. Finally, web servers can return non-text data like PDFs, zipped files, audio, and video. In this article, we return the following types of data: JSON, CSV, HTML In the context of Node.js, we need to do two things: Set the Content-Type header in our HTTP responses with the appropriate value. Ensure that res.end() gets the data in the right format. Let’s see this in action with some examples. Most changes exist within the requestListener() function. Let’s create files with this “template code” to make future sections easier to follow. Create a new file called html.js. This file will be used later to return HTML text in an HTTP response. We’ll put the template code here and copy it to the other servers that return various types. In the terminal, enter the following: touch html.js Now open this file in a text editor: nano html.js Let’s copy the “template code.” Enter this in nano: first-servers/html.js const http = require("http"); const host = 'localhost'; const port = 8000; const requestListener = function (req, res) {}; const server = http.createServer(requestListener); server.listen(port, host, () => { console.log(`Server is running on http://${host}:${port}`); }); Save and exit html.js, then return to the terminal. Now let’s copy this file into two new files. The first file will be to return CSV data in the HTTP response: cp html.js csv.js The second file will return a JSON response in the server: cp html.js json.js The remaining files will be for later exercises: cp html.js htmlFile.js cp html.js routes.js We’re now set up to continue our exercises. Let’s begin with returning JSON.

 Serving JSON

Open the json.js file: nano json.js We want to return a JSON response. Let’s modify the requestListener() function to return the appropriate header all JSON responses have by changing the highlighted lines like so: first-servers/json.js const requestListener = function (req, res) { res.setHeader("Content-Type", "application/json"); }; The res.setHeader() method adds an HTTP header to the response. HTTP headers are additional information that can be attached to a request or a response. The res.setHeader() method takes two arguments: the header’s name and its value. The Content-Type header is used to indicate the format of the data, also known as media type, that’s being sent with the request or response. In this case our Content-Type is application/json. Now, let’s return JSON content to the user. Modify json.js so it looks like this: first-servers/json.js const requestListener = function (req, res) { res.setHeader("Content-Type", "application/json"); res.writeHead(200); res.end(`{"message": "This is a JSON response"}`); }; Like before, we tell the user that their request was successful by returning a status code of 200. This time in the res.end() call, our string argument contains valid JSON. Save and exit json.js. Now, let’s run the server with the node command: node json.js In another terminal, let’s reach the server by using cURL: curl http://localhost:8000 As we press ENTER, we will see the following result: Output{"message": "This is a JSON response"} We now have successfully returned a JSON response, just like many of the popular APIs we create apps with. Be sure to exit the running server with CTRL+C so we can return to the standard terminal prompt. Next, let’s look at another popular format of returning data: CSV.

 Serving CSV

Open the csv.js file with a text editor: nano csv.js Let’s add the following lines to our requestListener() function: first-servers/csv.js const requestListener = function (req, res) { res.setHeader("Content-Type", "text/csv"); res.setHeader("Content-Disposition", "attachment; filename=oceanpals.csv"); }; This time, our Content-Type indicates that a CSV file is being returned, as the value is text/csv. The second header we add is Content-Disposition. This header tells the browser how to handle the data: display it inline in the browser or download it as a separate file. When we return CSV responses, most modern browsers automatically download the file even if the Content-Disposition header is not set. However, when returning a CSV file we should still add this header as it allows us to set the name of the CSV file. In this case, we signal to the browser that this CSV file is an attachment and should be downloaded. We then tell the browser that the file’s name is oceanpals.csv. Let’s write the CSV data in the HTTP response: first-servers/csv.js const requestListener = function (req, res) { res.setHeader("Content-Type", "text/csv"); res.setHeader("Content-Disposition", "attachment; filename=oceanpals.csv"); res.writeHead(200); res.end(`id,name,email\n1,Sammy Shark,shark@ocean.com`); }; Like before, we return a 200 OK status with our response. This time, our call to res.end() has a string that’s valid CSV. The comma separates the value in each column and the newline character (\n) separates the rows. We have two rows, one for the table header and one for the data. We’ll test this server in the browser. Save csv.js and exit the editor with CTRL+X. Run the server with the Node.js command: node csv.js In another terminal, let’s reach the server by using cURL: curl http://localhost:8000 The console will show this: Outputid,name,email 1,Sammy Shark,shark@ocean.com If we go to http://localhost:8000 in our browser, a CSV file will be downloaded. Its file name will be oceanpals.csv. Exit the running server with CTRL+C to return to the standard terminal prompt. Having returned JSON and CSV, we’ve covered two cases that are popular for APIs. Let’s move on to how we return data for websites people view in a browser.

 Serving HTML

Let’s reopen html.js with our text editor: nano html.js Modify the requestListener() function to return the appropriate Content-Type header for an HTML response: first-servers/html.js const requestListener = function (req, res) { res.setHeader("Content-Type", "text/html"); }; Now, let’s return HTML content to the user. Add the highlighted lines to html.js so it looks like this: first-servers/html.js const requestListener = function (req, res) { res.setHeader("Content-Type", "text/html"); res.writeHead(200); res.end(`<html><body><h2>This is HTML</h2></body></html>`); }; We first add the HTTP status code. We then call res.end() with a string argument that contains valid HTML. When we access our server in the browser, we will see an HTML page with one header tag containing This is HTML. Let’s save and exit. Now, let’s run the server with the node command: node html.js We will see Server is running on http://localhost:8000 when our program has started. Now go into the browser and visit http://localhost:8000. Our page will look like this: Image of HTML response returned from Node.js server Let’s quit the running server with CTRL+C and return to the standard terminal prompt. It’s common for HTML to be written in a file, separate from the server-side code like our Node.js programs. Next, let’s see how we can return HTML responses from files.

Step 3 — Serving an HTML Page From a File with the fs


We can serve HTML as strings in Node.js to the user, but it’s preferable that we load HTML files and serve their content. This way, as the HTML file grows we don’t have to maintain long strings in our Node.js code, keeping it more concise and allowing us to work on each aspect of our website independently. This “separation of concerns” is common in many web development setups, so it’s good to know how to load HTML files to support it in Node.js. To serve HTML files, we load the HTML file with the fs module and use its data when writing our HTTP response. First, we’ll create an HTML file that the web server will return. Create a new HTML file: touch index.html Now open index.html in a text editor: nano index.html Our web page will be minimal. It will have an orange background and will display some greeting text in the center. Add this code to the file: first-servers/index.html <!DOCTYPE html> <head> <title>My Website</title> <style> *, html { margin: 0; padding: 0; border: 0; } html { width: 100%; height: 100%; } body { width: 100%; height: 100%; position: relative; background-color: rgb(236, 152, 42); } .center { width: 100%; height: 50%; margin: 0; position: absolute; top: 50%; left: 50%; transform: translate(-50%, -50%); color: white; font-family: "Trebuchet MS", Helvetica, sans-serif; text-align: center; } h2 { font-size: 144px; } p { font-size: 64px; } </style> </head> <body> <div class="center"> <h2>Hello Again!</h2> <p>This is served from a file</p> </div> </body> </html> This single webpage shows two lines of text: Hello Again! and This is served from a file. The lines appear in the center of the page, one above the other. The first line of text is displayed in a heading, meaning it will be large. The second line of text will appear slightly smaller. All the text will appear white and the webpage has an orange background. While it’s not the scope of this article or series, if you are interested in learning more about HTML, CSS, and other front-end web technologies, you can take a look at Mozilla’s Getting Started with the Web guide. That’s all we need for the HTML, so save and exit. We can now move on to the server code. For this exercise, we’ll work on htmlFile.js. Open it with the text editor: nano htmlFile.js As we have to read a file, let’s begin by importing the fs module: first-servers/htmlFile.js const http = require("http"); const fs = require('fs').promises; This module contains a readFile() function that we’ll use to load the HTML file in place. We import the promise variant in keeping with modern JavaScript best practices. We use promises as it’s syntactically more succinct than callbacks, which we would have to use if we assigned fs to just require('fs'). To learn more about asynchronous programming best practices, you can read our How To Write Asynchronous Code in Node.js guide. We want our HTML file to be read when a user requests our server. Let’s begin by modifying requestListener() to read the file: first-servers/htmlFile.js const requestListener = function (req, res) { fs.readFile(__dirname + "/index.html") }; We use the fs.readFile() method to load the file. Its argument has __dirname + "/index.html". The special variable __dirname has the absolute path of where the Node.js code is being run. We then append /index.html so we can load the HTML file we created earlier. 
Now let’s return the HTML page once it’s loaded: first-servers/htmlFile.js const requestListener = function (req, res) { fs.readFile(__dirname + "/index.html") .then(contents => { res.setHeader("Content-Type", "text/html"); res.writeHead(200); res.end(contents); }) }; If the fs.readFile() promise successfully resolves, it will return its data. We use the then() method to handle this case. The contents parameter contains the HTML file’s data. We first set the Content-Type header to text/html to tell the client that we are returning HTML data. We then write the status code to indicate the request was successful. We finally send the client the HTML page we loaded, with the data in the contents variable. The fs.readFile() method can fail at times, so we should handle this case when we get an error. Add this to the requestListener() function: first-servers/htmlFile.js const requestListener = function (req, res) { fs.readFile(__dirname + "/index.html") .then(contents => { res.setHeader("Content-Type", "text/html"); res.writeHead(200); res.end(contents); }) .catch(err => { res.writeHead(500); res.end(err.toString()); return; }); }; Save the file and exit. When a promise encounters an error, it is rejected. We handle that case with the catch() method. It accepts the error that fs.readFile() returns, sets the status code to 500 signifying that an internal error was encountered, and returns the error to the user, converted with toString() because res.end() expects a string or Buffer. Run our server with the node command: node htmlFile.js In the web browser, visit http://localhost:8000. You will see this page: Image of HTML page loaded from a file in Node.js You have now returned an HTML page from the server to the user. You can quit the running server with CTRL+C. You will see the terminal prompt return when you do. When writing code like this in production, you may not want to load an HTML page every time you get an HTTP request. While this HTML page is roughly 800 bytes in size, more complex websites can be megabytes in size. Large files can take a while to load. If your site is expecting a lot of traffic, it may be best to load HTML files at startup and save their contents. After they are loaded, you can set up the server and make it listen to requests on an address. To demonstrate this method, let’s see how we can rework our server to be more efficient and scalable.

 Serving HTML Efficiently

Instead of loading the HTML for every request, in this step we will load it once at the beginning. The request will return the data we loaded at startup. In the terminal, re-open the Node.js script with a text editor: nano htmlFile.js Let’s begin by adding a new variable before we create the requestListener() function: first-servers/htmlFile.js let indexFile; const requestListener = function (req, res) { When we run this program, this variable will hold the HTML file’s contents. Now, let’s readjust the requestListener() function. Instead of loading the file, it will now return the contents of indexFile: first-servers/htmlFile.js const requestListener = function (req, res) { res.setHeader("Content-Type", "text/html"); res.writeHead(200); res.end(indexFile); }; Next, we shift the file reading logic from the requestListener() function to our server startup. Make the following changes as we create the server: first-servers/htmlFile.js const server = http.createServer(requestListener); fs.readFile(__dirname + "/index.html") .then(contents => { indexFile = contents; server.listen(port, host, () => { console.log(`Server is running on http://${host}:${port}`); }); }) .catch(err => { console.error(`Could not read index.html file: ${err}`); process.exit(1); }); Save the file and exit. The code that reads the file is similar to what we wrote in our first attempt. However, when we successfully read the file we now save the contents to our global indexFile variable. We then start the server with the listen() method. The key thing is that the file is loaded before the server is run. This way, the requestListener() function will be sure to return an HTML page, as indexFile is no longer an empty variable. Our error handler has changed as well. If the file can’t be loaded, we capture the error and print it to our console. We then exit the Node.js program with the exit() function without starting the server. This way we can see why the file reading failed, address the problem, and then start the server again. We’ve now created different web servers that return various types of data to a user. So far, we have not used any request data to determine what should be returned. We’ll need to use request data when setting up different routes or paths in a Node.js server, so next let’s see how they work together.
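Assembled from the snippets above, the complete efficient version of htmlFile.js reads:

const http = require("http");
const fs = require('fs').promises;

const host = 'localhost';
const port = 8000;

let indexFile;

const requestListener = function (req, res) {
    res.setHeader("Content-Type", "text/html");
    res.writeHead(200);
    res.end(indexFile);
};

const server = http.createServer(requestListener);

// Load the HTML once at startup; only start listening if the read succeeds
fs.readFile(__dirname + "/index.html")
    .then(contents => {
        indexFile = contents;
        server.listen(port, host, () => {
            console.log(`Server is running on http://${host}:${port}`);
        });
    })
    .catch(err => {
        console.error(`Could not read index.html file: ${err}`);
        process.exit(1);
    });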

Step 4 — Managing Routes Using an HTTP Request Object


Most websites we visit or APIs we use usually have more than one endpoint so we can access various resources. A good example would be a book management system, one that might be used in a library. It would not only need to manage book data, but it would also manage author data for cataloguing and searching convenience. Even though the data for books and authors are related, they are two different objects. In these cases, software developers usually code each object on different endpoints as a way to indicate to the API user what kind of data they are interacting with. Let’s create a new server for a small library, which will return two different types of data. If the user goes to our server’s address at /books, they will receive a list of books in JSON. If they go to /authors, they will receive a list of author information in JSON. So far, we have been returning the same response to every request we get. Let’s illustrate this quickly. Re-run our JSON response example: node json.js In another terminal, let’s do a cURL request like before: curl http://localhost:8000 You will see: Output{"message": "This is a JSON response"} Now let’s try another curl command: curl http://localhost:8000/todos After pressing Enter, you will see the same result: Output{"message": "This is a JSON response"} We have not built any special logic in our requestListener() function to handle a request whose URL contains /todos, so Node.js returns the same JSON message by default. As we want to build a miniature library management server, we’ll now separate the kind of data that’s returned based on the endpoint the user accesses. First, exit the running server with CTRL+C. Now open routes.js in your text editor: nano routes.js Let’s begin by storing our JSON data in variables before the requestListener() function: first-servers/routes.js const books = JSON.stringify([ { title: "The Alchemist", author: "Paulo Coelho", year: 1988 }, { title: "The Prophet", author: "Kahlil Gibran", year: 1923 } ]); const authors = JSON.stringify([ { name: "Paulo Coelho", countryOfBirth: "Brazil", yearOfBirth: 1947 }, { name: "Kahlil Gibran", countryOfBirth: "Lebanon", yearOfBirth: 1883 } ]); The books variable is a string that contains JSON for an array of book objects. Each book has a title or name, an author, and the year it was published. The authors variable is a string that contains the JSON for an array of author objects. Each author has a name, a country of birth, and their year of birth. Now that we have the data our responses will return, let’s start modifying the requestListener() function to return them to the correct routes. First, we’ll ensure that every response from our server has the correct Content-Type header: first-servers/routes.js const requestListener = function (req, res) { res.setHeader("Content-Type", "application/json"); } Now, we want to return the right JSON depending on the URL path the user visits. Let’s create a switch statement on the request’s URL: first-servers/routes.js const requestListener = function (req, res) { res.setHeader("Content-Type", "application/json"); switch (req.url) {} } To get the URL path from a request object, we need to access its url property. We can now add cases to the switch statement to return the appropriate JSON. JavaScript’s switch statement provides a way to control what code is run depending on the value of an object or JavaScript expression (for example, the result of mathematical operations). 
Let’s continue by adding a case for when the user wants to get our list of books: first-servers/routes.js const requestListener = function (req, res) { res.setHeader("Content-Type", "application/json"); switch (req.url) { case "/books": res.writeHead(200); res.end(books); break } } We set our status code to 200 to indicate the request is fine and return the JSON containing the list of our books. Now let’s add another case for our authors: first-servers/routes.js const requestListener = function (req, res) { res.setHeader("Content-Type", "application/json"); switch (req.url) { case "/books": res.writeHead(200); res.end(books); break case "/authors": res.writeHead(200); res.end(authors); break } } Like before, the status code will be 200 as the request is fine. This time we return the JSON containing the list of our authors. We want to return an error if the user tries to go to any other path. Let’s add the default case to do this: routes.js const requestListener = function (req, res) { res.setHeader("Content-Type", "application/json"); switch (req.url) { case "/books": res.writeHead(200); res.end(books); break case "/authors": res.writeHead(200); res.end(authors); break default: res.writeHead(404); res.end(JSON.stringify({error:"Resource not found"})); } } We use the default keyword in a switch statement to capture all other scenarios not captured by our previous cases. We set the status code to 404 to indicate that the URL they were looking for was not found. We then set a JSON object that contains an error message. Let’s test our server to see if it behaves as we expect. In another terminal, let’s first run a command to see if we get back our list of books: curl http://localhost:8000/books Press Enter to see the following output: Output[{"title":"The Alchemist","author":"Paulo Coelho","year":1988},{"title":"The Prophet","author":"Kahlil Gibran","year":1923}] So far so good. Let’s try the same for /authors. Type the following command in the terminal: curl http://localhost:8000/authors You will see the following output when the command is complete: Output[{"name":"Paulo Coelho","countryOfBirth":"Brazil","yearOfBirth":1947},{"name":"Kahlil Gibran","countryOfBirth":"Lebanon","yearOfBirth":1883}] Last, let’s try an erroneous URL to ensure that requestListener() returns the error response: curl http://localhost:8000/notreal Entering that command will display this message: Output{"error":"Resource not found"} You can exit the running server with CTRL+C. We’ve now created different avenues for users to get different data. We also added a default response that returns an HTTP error if the user enters a URL that we don’t support.
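Putting the pieces together, the full routes.js looks like this:

const http = require("http");

const host = 'localhost';
const port = 8000;

const books = JSON.stringify([
    { title: "The Alchemist", author: "Paulo Coelho", year: 1988 },
    { title: "The Prophet", author: "Kahlil Gibran", year: 1923 }
]);

const authors = JSON.stringify([
    { name: "Paulo Coelho", countryOfBirth: "Brazil", yearOfBirth: 1947 },
    { name: "Kahlil Gibran", countryOfBirth: "Lebanon", yearOfBirth: 1883 }
]);

const requestListener = function (req, res) {
    res.setHeader("Content-Type", "application/json");
    switch (req.url) {
        case "/books":
            res.writeHead(200);
            res.end(books);
            break;
        case "/authors":
            res.writeHead(200);
            res.end(authors);
            break;
        default:
            res.writeHead(404);
            res.end(JSON.stringify({ error: "Resource not found" }));
    }
};

const server = http.createServer(requestListener);
server.listen(port, host, () => {
    console.log(`Server is running on http://${host}:${port}`);
});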

Conclusion


In this tutorial, you’ve made a series of Node.js HTTP servers. You first returned a basic textual response. You then went on to return various types of data from our server: JSON, CSV, and HTML. From there you were able to combine file loading with HTTP responses to return an HTML page from the server to the user, and to create an API that used information about the user’s request to determine what data should be sent in its response. You’re now equipped to create web servers that can handle a variety of requests and responses. With this knowledge, you can make a server that returns many HTML pages to the user at different endpoints.

Node.js Web Server

To access web pages of any web application, you need a web server. The web server will handle all the HTTP requests for the web application, e.g., IIS is a web server for ASP.NET web applications and Apache is a web server for PHP or Java web applications. Node.js provides the capability to create your own web server, which will handle HTTP requests asynchronously. You can use IIS or Apache to run a Node.js web application, but it is recommended to use Node.js's own web server.

Create Node.js Web Server

Node.js makes it easy to create a simple web server that processes incoming requests asynchronously. The following example is a simple Node.js web server contained in the server.js file. server.js var http = require('http'); // 1 - Import Node.js core module var server = http.createServer(function (req, res) { // 2 - creating server //handle incoming requests here.. }); server.listen(5000); //3 - listen for any incoming requests console.log('Node.js web server at port 5000 is running..') In the above example, we import the http module using the require() function. The http module is a core module of Node.js, so there is no need to install it using NPM. The next step is to call the createServer() method of http and specify a callback function with request and response parameters. Finally, call the listen() method of the server object which was returned from createServer(), with a port number, to start listening to incoming requests on port 5000. You can specify any unused port here. Run the above web server by running node server.js in a command prompt or terminal window and it will display the message shown below. C:\> node server.js Node.js web server at port 5000 is running.. This is how you create a Node.js web server using simple steps. Now, let's see how to handle HTTP requests and send responses in a Node.js web server.

Handle HTTP Request

The http.createServer() method's callback includes request and response parameters, which are supplied by Node.js. The request object can be used to get information about the current HTTP request, e.g., URL, request headers, and data. The response object can be used to send a response for the current HTTP request. The following example demonstrates handling HTTP request and response in Node.js. server.js var http = require('http'); // Import Node.js core module var server = http.createServer(function (req, res) { //create web server if (req.url == '/') { //check the URL of the current request // set response header res.writeHead(200, { 'Content-Type': 'text/html' }); // set response content res.write('<html><body><p>This is home Page.</p></body></html>'); res.end(); }else if (req.url == "/student") { res.writeHead(200, { 'Content-Type': 'text/html' }); res.write('<html><body><p>This is student Page.</p></body></html>'); res.end(); }else if (req.url == "/admin") { res.writeHead(200, { 'Content-Type': 'text/html' }); res.write('<html><body><p>This is admin Page.</p></body></html>'); res.end(); }else res.end('Invalid Request!'); }); server.listen(5000); // listen for any incoming requests console.log('Node.js web server at port 5000 is running..') In the above example, req.url is used to check the URL of the current request and, based on that, it sends the response. To send a response, first it sets the response header using the writeHead() method, then writes a string as the response body using the write() method. Finally, the Node.js web server sends the response using the end() method. Now, run the above web server as shown below. C:\> node server.js Node.js web server at port 5000 is running.. To test it, you can use the command-line program curl, which most Mac and Linux machines have pre-installed. curl -i http://localhost:5000 You should see the following response (note the Content-Type: text/html header set by the code above). HTTP/1.1 200 OK Content-Type: text/html Date: Tue, 8 Sep 2015 03:05:08 GMT Connection: keep-alive <html><body><p>This is home Page.</p></body></html> For Windows users, point your browser to http://localhost:5000 and see the result. The same way, point your browser to http://localhost:5000/student and see the result. It will display "Invalid Request!" for all requests other than the above URLs.

Sending JSON Response

The following example demonstrates how to serve JSON response from the Node.js web server. server.js var http = require('http'); var server = http.createServer(function (req, res) { if (req.url == '/data') { //check the URL of the current request res.writeHead(200, { 'Content-Type': 'application/json' }); res.write(JSON.stringify({ message: "Hello World"})); res.end(); } }); server.listen(5000); console.log('Node.js web server at port 5000 is running..') So, this way you can create a simple web server that serves different responses.
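To try it, run node server.js and request the /data route with curl. Note that this handler only ever responds to /data; a request to any other path will hang, because res.end() is never called for it.

curl http://localhost:5000/data
{"message":"Hello World"}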

See a live example on Glitch

Here is a live example of a WebSockets server: https://glitch.com/edit/#!/flavio-websockets-server-example Here is a WebSockets client that interacts with the server: https://glitch.com/edit/#!/flavio-websockets-client-example

Node.js Generate html

The most basic way is: var http = require('http'); http.createServer(function (req, res) { var html = buildHtml(req); res.writeHead(200, { 'Content-Type': 'text/html', 'Content-Length': Buffer.byteLength(html), // byte length, not string length, in case of multi-byte characters 'Expires': new Date().toUTCString() }); res.end(html); }).listen(8080); function buildHtml(req) { var header = ''; var body = ''; // concatenate header string // concatenate body string return '<!DOCTYPE html>' + '<html><head>' + header + '</head><body>' + body + '</body></html>'; }; And access this HTML at http://localhost:8080 from your browser.

simple nodejs server code

var http = require('http'); http.createServer(function (req, res) { res.writeHead(200, {'Content-Type': 'text/html'}); res.end('Hello World!'); }).listen(8080);

Web Scraping with Node.js Comprehensive Guide

https://nanonets.com/blog/web-scraping-with-node-js/

Introduction


Web scraping is used to extract data from webpages automatically and at scale. Reasons to use web scraping: it automates tasks such as data entry, form filling, and other repetitive work, saving you time and improving efficiency. Some reasons Node.js is a great choice for web scraping: Node.js can handle multiple web scraping requests in parallel.

How to scrape webpages using Node JS?


Step 1 Setting up your environment: You must install Node.js if you haven’t already. Step 2 Installing necessary packages for web scraping with Node.js: Node.js has multiple options for web scraping like Cheerio, Puppeteer, and request. Install them easily using the following commands: npm install cheerio npm install puppeteer npm install request Step 3 Setting up your project directory: Create a new directory for the project, then from the command prompt create a new file to store your Node.js web scraping code. You can create the directory and file using the following commands: mkdir my-web-scraper cd my-web-scraper touch scraper.js Step 4 Making HTTP Requests with Node.js: In order to scrape webpages, you need to make HTTP requests. Node.js has a built-in http module that makes this easy; you can also use the axios or request libraries. Here is the code to make HTTP requests with Node.js: const http = require('http'); const url = 'http://example.com'; http.get(url, (res) => { let data = ''; res.on('data', (chunk) => { data += chunk; }); res.on('end', () => { console.log(data); }); }); Replace http://example.com with the URL of your choice to scrape other webpages. Step 5 Scraping HTML with Node.js: Once you have the HTML content of a web page, you need to parse it to extract the data you need. Several third-party libraries are available for parsing HTML in Node.js, such as Cheerio and JSDOM. Here is an example code snippet using Cheerio to parse HTML and extract data: const cheerio = require('cheerio'); const request = require('request'); const url = 'https://example.com'; request(url, (error, response, html) => { if (!error && response.statusCode == 200) { const $ = cheerio.load(html); const title = $('title').text(); const firstParagraph = $('p').first().text(); console.log(title); console.log(firstParagraph); } }); This code uses the request library to fetch the HTML content of the web page at url and then uses Cheerio to parse the HTML and extract the title and the first paragraph.

 How to handle javascript and dynamic content using Node.js?


Many modern web pages use JavaScript to render dynamic content, making it difficult to scrape them. To handle JavaScript rendering, you can use headless browsers like Puppeteer and Playwright, which allow you to simulate a browser environment and scrape dynamic content. Here is an example code snippet using Puppeteer to scrape a web page that renders content with JavaScript: const puppeteer = require('puppeteer'); (async () => { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto('https://example.com'); const title = await page.$eval('title', el => el.textContent); const firstParagraph = await page.$eval('p', el => el.textContent); console.log(title); console.log(firstParagraph); await browser.close(); })(); This code uses Puppeteer to launch a headless browser, navigate to the web page at url, and extract the title and the first paragraph. The page.$eval() method selects and extracts data from HTML elements.
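When the data appears only after client-side rendering, you usually have to wait for it before extracting. A hedged sketch using Puppeteer's waitForSelector; '#loaded-content' is a hypothetical selector:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Wait until network activity settles, then until the element exists
  await page.goto('https://example.com', { waitUntil: 'networkidle0' });
  await page.waitForSelector('#loaded-content'); // hypothetical selector
  const text = await page.$eval('#loaded-content', el => el.textContent);
  console.log(text);
  await browser.close();
})();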

Various ways to extract webpage data using NodeJS


Here are some libraries you can use to scrape webpages using NodeJS easily: Cheerio is a fast, flexible, and lightweight implementation of core jQuery designed for the server side. JSDOM is a pure-JavaScript implementation of the DOM for Node.js. It provides a way to create a DOM environment in Node.js and manipulate it with a standard API. Puppeteer is a Node.js library that provides a high-level API to control headless Chrome or Chromium. It can be used for web scraping, automated testing, crawling, and rendering.

Best Practices for Web Scraping with Node.js


Here are some best practices to follow when using Node.js for web scraping: Before scraping a website, read its terms of use. Ensure the webpage doesn’t have restrictions on web scraping or on the frequency of scraping. Limit the number of HTTP requests to prevent overloading the website by controlling the frequency of requests. Set appropriate headers in your HTTP requests to mimic the behavior of a regular user. Cache webpages and extracted data to reduce the load on the website. Web scraping can be error-prone due to the complexity and variability of websites. Monitor your scraping activity and adjust your rate limiting, headers, and other settings as needed. A minimal sketch of the rate-limiting and header advice follows below.
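A minimal sketch of the rate-limiting and header advice above, using Node's built-in fetch (Node.js 18+); the URLs and two-second delay are example values:

const urls = ['https://example.com/page/1', 'https://example.com/page/2'];
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

(async () => {
  for (const url of urls) {
    const res = await fetch(url, {
      // Identify the scraper instead of hammering the site anonymously
      headers: { 'User-Agent': 'Mozilla/5.0 (compatible; MyScraper/1.0)' }
    });
    console.log(url, res.status);
    await delay(2000); // wait 2 seconds between requests to avoid overloading the site
  }
})();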

Web Scraping with JavaScript and Node.js

https://www.scrapingbee.com/blog/web-scraping-javascript/

 The JavaScript Event Loop

Event Loop. As opposed to how many languages handle concurrency, with multi-threading, JavaScript has always only used a single thread and performed potentially blocking operations in an asynchronous fashion, relying primarily on callback functions (or function pointers, as C developers may call them). Let's quickly check that out with a simple web server example: // Import the "http" package const http = require('http'); // Use TCP port 3000 for our server const PORT = 3000; // Create a server instance and provide a request handler callback function const server = http.createServer((req, res) => { res.statusCode = 200; res.setHeader('Content-Type', 'text/plain'); res.end('Hello World'); }); // Start listening on port 3000 server.listen(PORT, () => { console.log(`Server running at port ${PORT}`); }); Here, we import the HTTP standard library with require, then create a server object with createServer and pass it an anonymous handler function, which the library will invoke for each incoming HTTP request. Finally, we listen on the specified port - and that's actually it. There are two interesting bits here and both already hint at our event loop and JavaScript's asynchronicity: The handler function we pass to createServer The fact that listen is not a blocking call, but returns immediately In most other languages, we'd usually have an accept function/method, which would block our thread and return the connection socket of the connecting client. At that point, at the latest, we'd have to switch to multi-threading, as otherwise we could handle exactly one connection at a time. In this case, however, we don't have to deal with thread management and we always stay with one thread, thanks to callbacks and the event loop. As mentioned, listen will return immediately, but - although there's no code following our listen call - the application won't exit immediately. That is because we still have a callback registered via createServer (the function we passed). Whenever a client sends a request, Node.js will parse it in the background and call our anonymous function and pass the request object. The only thing we have to pay attention to here is to return swiftly and not block the function itself, but it's hard to do that, as almost all standard calls are asynchronous (either via callbacks or Promises) - just make sure you don't run while (true); But enough of theory, let's check it out, shall we? If you have Node.js installed, all you need to do is save the code to the file MyServer.js and run it in your shell with node MyServer.js. Now, just open your browser and load http://localhost:3000 - voilà, you should get a lovely "Hello World" greeting. That was easy, wasn't it? One could assume the single-threaded approach may come with performance issues, because it only has one thread, but it's actually quite the opposite and that's the beauty of asynchronous programming. Single-threaded, asynchronous programming can have, especially for I/O intensive work, quite a few performance advantages, because one does not need to pre-allocate resources (e.g. threads). All right, that was a very nice example of how we easily create a web server in Node.js, but we are in the business of scraping, aren't we? So let's take a look at the JavaScript HTTP client libraries.

HTTP clients: querying the web

HTTP clients are tools capable of sending a request to a server and then receiving a response from it. Almost every tool that will be discussed in this article uses an HTTP client under the hood to query the server of the website that you will attempt to scrape.

 1. Built-In HTTP Client

As mentioned in our server example, Node.js does ship by default with an HTTP library. That library also has a built-in HTTP client. // Import the "http" package const http = require('http'); // Send a GET request to example.com and print the response const req = http.request('http://example.com', res => { const data = []; res.on('data', _ => data.push(_)) res.on('end', () => console.log(data.join())) }); // Close the connection req.end(); It's rather easy to get started, as there are zero third-party dependencies to install or manage, however - as you can notice from our example - the library does require a bit of boilerplate, as it provides the response only in chunks and you eventually need to stitch them together manually. You'll also need to use a separate library for HTTPS URLs (a sketch follows below). In short, it's convenient because it comes out-of-the-box, but it may require you to write more code than you may want.
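For HTTPS URLs the separate https core module is used; the API mirrors http:

const https = require('https');

// Same pattern as the http client above, but over TLS
const req = https.request('https://example.com', res => {
  const data = [];
  res.on('data', chunk => data.push(chunk));
  res.on('end', () => console.log(data.join()));
});

req.end();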

 2. Fetch API

Another built-in method would be the Fetch API. While browsers have supported it for a while already, it took Node.js a bit longer, but as of version 18, Node.js does support fetch().
While at it, also check out our dedicated article on node-fetch.
The Fetch API heavily uses Promises and coupled with await, that can really provide you with lean and legible code. async function fetch_demo() { // Request the Reddit URL and save the response in "resp" const resp = await fetch('https://www.reddit.com/r/programming.json'); // Use the .json() method to parse the response as JSON object console.log(await resp.json()); } fetch_demo(); The only workaround we had to employ was to wrap our code into a function, as top-level await is only available in ES modules, not in classic CommonJS scripts. Apart from that we really just called fetch() with our URL, awaited the response (Promise-magic happening in the background, of course), and used the json() function of our Response object (awaiting again) to get the response. Mind you, an already JSON-parsed response 😲. Not bad, two lines of code, no manual handling of data, no distinction between HTTP and HTTPS, and a native JSON object. fetch optionally accepts an additional options argument, where you can fine-tune your request with a specific request method (e.g. POST), additional HTTP headers, or authentication credentials; a POST sketch follows below.
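A sketch of that options argument in action; the httpbin.org endpoint and JSON body are just examples:

async function postDemo() {
  const resp = await fetch('https://httpbin.org/post', {
    method: 'POST',                                    // override the default GET
    headers: { 'Content-Type': 'application/json' },   // declare the body format
    body: JSON.stringify({ hello: 'world' })           // example payload
  });
  console.log(await resp.json());
}

postDemo();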

 3. Axios

Axios is pretty similar to Fetch. It's also a Promise-based HTTP client and it runs in both browsers and Node.js. Users of TypeScript will also love its built-in type support. One drawback, however, contrary to the libraries we mentioned so far: we do have to install it first. npm install axios Perfect, let's check out a first plain-Promise example: // Import Axios const axios = require('axios') axios // Request Reddit URL .get('https://www.reddit.com/r/programming.json') // Configure callback for the HTTP response .then((response) => { console.log(response) }) // Configure error callback .catch((error) => { console.error(error) }); Pretty straightforward. Relying on Promises, we can certainly also use await again and make the whole thing a bit less verbose. So let's wrap it into a function one more time: async function getForum() { try { // Request Reddit URL and "await" the response const response = await axios.get( 'https://www.reddit.com/r/programming.json' ) // Print the response, once available and get() returned console.log(response) } catch (error) { console.error(error) } } All you have to do is call getForum! You can find the Axios library at Github.

 4. SuperAgent

Much like Axios, SuperAgent is another robust HTTP client that has support for promises and the async/await syntax sugar. It has a fairly straightforward API like Axios, but SuperAgent has more dependencies and is less popular. Regardless, making an HTTP request with SuperAgent using promises, async/await, and callbacks looks like this: const superagent = require("superagent") const forumURL = "https://www.reddit.com/r/programming.json" // callbacks superagent .get(forumURL) .end((error, response) => { console.log(response) }) // promises superagent .get(forumURL) .then((response) => { console.log(response) }) .catch((error) => { console.error(error) }) // promises with async/await async function getForum() { try { const response = await superagent.get(forumURL) console.log(response) } catch (error) { console.error(error) } } You can find the SuperAgent library at GitHub and installing SuperAgent is as simple as npm install superagent.

  SuperAgent plugins

One feature, that sets SuperAgent apart from the other libraries here, is its extensibility. It features quite a list of plugins which allow for the tweaking of a request or response. For example, the superagent-throttle plugin would allow you to define throttling rules for your requests.
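A SuperAgent plugin is simply a function that receives the request object before it is sent; .use() applies it. A sketch with a trivial inline plugin (the header name is made up for illustration):

const superagent = require('superagent');

// A plugin is a function (req) => req; this one stamps a custom header
const withTracing = req => req.set('X-Trace-Id', Date.now().toString());

superagent
  .get('https://www.reddit.com/r/programming.json')
  .use(withTracing)
  .then(response => console.log(response.status))
  .catch(error => console.error(error));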

 5. Request

Even though it is not actively maintained any more, Request still is a popular and widely used HTTP client in the JavaScript ecosystem. It is fairly simple to make an HTTP request with Request: const request = require('request') request('https://www.reddit.com/r/programming.json', function ( error, response, body ) { console.error('error:', error) console.log('body:', body) }) What you will definitely have noticed here, is that we were neither using plain Promises nor await. That is because Request still employs the traditional callback approach, however there are a couple of wrapper libraries to support await as well. You can find the Request library at GitHub, and installing it is as simple as running npm install request. Should you use Request? We included Request in this list because it still is a popular choice. Nonetheless, development has officially stopped and it is not being actively maintained any more. Of course, that does not mean it is unusable, and there are still lots of libraries using it, but the fact itself, may still make us think twice before we use it for a brand-new project, especially with quite a list of viable alternatives and native fetch support.

 Comparison of the different libraries

Library | ✔️ Pros | ❌ Cons
HTTP package | Supported out-of-the-box | Relies only on callbacks; separate libraries for HTTP and HTTPS
Fetch | Supported out-of-the-box; Promise-based with support for await | Limited configurability
Axios | Support for TypeScript types | Requires additional dependency
SuperAgent | Extensible with plugins | Requires additional dependency
Request | Still a popular choice | Relies only on callbacks; requires additional dependency; not maintained any more

Data Extraction in JavaScript

Fetching the content of a site is, undoubtedly, an important step in any scraping project, but it's only the first step: we actually need to locate and extract the data as well. This is what we are going to check out next: how we can handle an HTML document in JavaScript and how to locate and select information for data extraction. First off, regular expressions 🙂

 Regular expressions: the hard way

The simplest way to get started with web scraping without any dependencies is to use a bunch of regular expressions on the HTML content you received from your HTTP client. But there is a big tradeoff. While absolutely great in their domain, regular expressions are not ideal for parsing document structures like HTML. Plus, newcomers often struggle with getting them right ("do I need a look-ahead or a look-behind?"). For complex web scraping, regular expressions can also get out of hand. With that said, let's give it a go nonetheless. Say there's a label with some username in it and we want the username. This is similar to what you'd have to do if you relied on regular expressions:
const htmlString = '<label>Username: John Doe</label>'
const result = htmlString.match(/<label>Username: (.+)<\/label>/)

console.log(result[1]) // John Doe
We are using String.match() here, which will provide us with an array containing the data of the evaluation of our regular expression. As we used a capturing group ((.+)), the second array element (result[1]) will contain whatever that group managed to capture. While this certainly worked in our example, anything more complex will either not work or will require a way more complex expression. Just imagine you have a couple of <label> elements in your HTML document. Don't get us wrong, regular expressions are an unimaginably great tool, just not for HTML 😊 - so let us introduce you to the world of CSS selectors and the DOM.
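Before we do, here is what that multiple-label case would take: a global flag plus matchAll(). A small sketch with hypothetical markup; note the lazy quantifier (.+?), without which a single greedy match would swallow everything between the first and the last label:
const html = '<label>Username: John Doe</label><label>Username: Jane Roe</label>'

// The g flag is required for matchAll(); each match carries its capture groups
const names = [...html.matchAll(/<label>Username: (.+?)<\/label>/g)]
  .map((match) => match[1])

console.log(names) // [ 'John Doe', 'Jane Roe' ]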

 Cheerio: Core jQuery for traversing the DOM

Cheerio is an efficient and lightweight library that allows you to use the rich and powerful API of jQuery on the server-side. If you have used jQuery before, you will feel right at home with Cheerio. It provides you with an incredibly easy way to parse an HTML string into a DOM tree, which you can then access via the elegant interface you may be familiar with from jQuery (including function-chaining).
// Import Cheerio
const cheerio = require('cheerio')

// Parse the provided HTML into a Cheerio object
const $ = cheerio.load('<h2 class="title">Hello world</h2>')

// Set the text of the "h2" tag and add an HTML class
$('h2.title').text('Hello there!')
$('h2').addClass('welcome')

// Print the modified HTML document
$.html() // <html><head></head><body><h2 class="title welcome">Hello there!</h2></body></html>
As you can see, using Cheerio really is almost identical to how you'd use jQuery. Keep in mind, Cheerio really focuses on DOM-manipulation and you won't be able to directly "port" jQuery functionality, such as XHR/AJAX requests or mouse handling (e.g. onClick), one-to-one in Cheerio. Cheerio is a great tool for most use cases when you need to handle the DOM yourself. Of course, if you want to crawl a JavaScript-heavy site (e.g. typical single-page applications), you may need something closer to a full browser engine. We'll be talking about that in just a second, under Headless Browsers in JavaScript. Time for a quick Cheerio example, wouldn't you agree? To demonstrate the power of Cheerio, we will attempt to crawl the r/programming forum on Reddit and get a list of post names. First, install Cheerio and Axios by running the following command: npm install cheerio axios. Then create a new file called crawler.js and copy/paste the following code:
// Import Axios and Cheerio
const axios = require('axios');
const cheerio = require('cheerio');

const getPostTitles = async () => {
  try {
    // Load Reddit
    const { data } = await axios.get(
      'https://old.reddit.com/r/programming/'
    );
    // Parse HTML with Cheerio
    const $ = cheerio.load(data);
    // Initialise empty data array
    const postTitles = [];
    // Iterate over all anchor links for the given selector and ....
    $('div > p.title > a').each((_idx, el) => {
      // .... extract for each the tag text and add it to the data array
      const postTitle = $(el).text()
      postTitles.push(postTitle)
    });
    // Return the array with all titles
    return postTitles;
  } catch (error) {
    throw error;
  }
};

// Wait for the promise to fulfill and print the title array it returned
getPostTitles()
  .then((postTitles) => console.log(postTitles));
getPostTitles() is an asynchronous function that will crawl the subreddit r/programming and collect its post titles. First, the HTML of the website is obtained using a simple HTTP GET request with the Axios HTTP client library. Then, the HTML data is fed into Cheerio using the cheerio.load() function. Wonderful, we now have, in good old-fashioned jQuery manner, the fully parsed HTML document as a DOM tree in $. What's next? Well, it might not be a bad idea to know where to get our post titles from. So, let's right-click one of the titles and pick Inspect. That takes us straight to the right element in the browser's developer tools.
Excellent, equipped with our knowledge of XPath and CSS selectors, we can now easily compose the expression we need for that element. For our example, we chose CSS selectors, and the following one just works beautifully. div > p.title > a If you have used jQuery, you probably know what we are up to, right? 😏 $('div > p.title > a') You were absolutely right. The Cheerio call is identical to jQuery (there was a reason why we used $ for our DOM variable before), and using Cheerio with our CSS selector will give us the very list of elements matching our selector. Now, we just need to iterate with each() over all elements and call their text() function to get their text content. 💯 jQuery, isn't it? So much for the explanation. Time to run our code. Open up your shell and run node crawler.js. You should now have a list of post titles similar to this: [ 'The State of the Subreddit (May 2024)', "Stack Overflow bans users en masse for rebelling against OpenAI partnership — users banned for deleting answers to prevent them being used to train ChatGPT | Tom's Hardware", 'Stack Overflow Community is Not Happy With The OpenAI Deal', "I'm sorry.. I built another tool", `Development notes from xkcd's "Machine"`, 'How Stripe Prevents Double Payment Using Idempotent API', 'How many jobs are available in technology in the US?', 'Test Failures Should Be Actionable', 'Approach to studying - generally vs for specific task.', 'RAG With PostgreSQL', 'Is programming easier that ever to learn?', 'Lincoln Learning Solutions', 'Did GitHub Copilot really increase my productivity?', '"usbredir is the name of a network protocol for sending USB device traffic over a network connection. It is also the name of the software package offering a parsing library, a usbredirhost library and several utilities implementing this protocol." The protocol is "completely independent of spice."', 'Codeacademy', 'C++23: std::expected - Monadic Extensions', 'Libyear', 'Free python data visualization libraries', 'Awesome Regex: The best regex tools, tutorials, libs, etc. for all major regex flavors', 'Best Online Courses for Data Science with R Programming in 2024', "Website Carbon Calculator v3 | What's your site's carbon footprint?", 'Java library', 'I built and am sharing my code for a live EEG brainwave visualization app using Python, PyQt6 and the Muse headband!', 'The search for easier safe systems programming', 'Facebook API AdSet', 'Learning Python' ] While this is a simple use case, it demonstrates the simple nature of the API provided by Cheerio.
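The same each() loop can collect more than just the text. For instance, attr('href') pulls the link target too; a sketch of how the loop inside getPostTitles() above could be extended:
// Collect title and URL per post instead of the bare text
$('div > p.title > a').each((_idx, el) => {
  postTitles.push({
    title: $(el).text(),    // the link text
    url: $(el).attr('href') // the link target
  })
});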
Do not forget to check out our Node.js Axios proxy tutorial if you want to learn more about using proxies for web scraping!
If your use case requires the execution of JavaScript and loading of external sources, the following few options will be helpful.

 jsdom: the DOM for Node

Similarly to how Cheerio replicates jQuery on the server-side, jsdom does the same for the browser's native DOM functionality. Unlike Cheerio, however, jsdom does not only parse HTML into a DOM tree, it can also handle embedded JavaScript code and it allows you to "interact" with page elements. Instantiating a jsdom object is rather easy:
// Import jsdom
const { JSDOM } = require('jsdom')

// Parse the given HTML document with jsdom
const { document } = new JSDOM(
  '<h2 class="title">Hello world</h2>'
).window

// Use querySelector() to get the h2 element with the specified HTML class
const heading = document.querySelector('.title')

// Set its text and add a class
heading.textContent = 'Hello there!'
heading.classList.add('welcome')

heading.outerHTML // <h2 class="title welcome">Hello there!</h2>
Here, we imported the library with require, created a new jsdom instance using the constructor, and passed our HTML snippet. Then, we simply used querySelector() (as we know it from front-end development) to select our element and tweaked its attributes a bit. Fairly standard, and we could have done that with Cheerio as well, of course. What sets jsdom apart, however, is the aforementioned support for embedded JavaScript code, and that is what we are going to check out now. The following example uses a simple local HTML page, with one button adding a <div> with an ID.
const { JSDOM } = require("jsdom")

const HTML = `
<html>
<body>
  <button onclick="const e = document.createElement('div'); e.id = 'myid'; this.parentNode.appendChild(e);">Click me</button>
</body>
</html>`;

const dom = new JSDOM(HTML, {
  runScripts: "dangerously",
  resources: "usable"
});

const document = dom.window.document;
const button = document.querySelector('button');

console.log("Element before click: " + document.querySelector('div#myid'));
button.click();
console.log("Element after click: " + document.querySelector('div#myid'));
Nothing too complicated here: we require() jsdom, set up our HTML document, pass the HTML to the jsdom constructor (important: we need to enable runScripts), select the button with a querySelector() call, and click() it. Voilà, that should give us this output:
Element before click: null
Element after click: [object HTMLDivElement]
Fairly straightforward, and the example showcased how we can use jsdom to actually execute the page's JavaScript code. When we loaded the document, there was initially no <div>. Only once we clicked the button was it added, by the site's code, not our crawler's code. In this context, the important details are runScripts and resources. These flags instruct jsdom to run the page's code, as well as fetch any relevant JavaScript files. As jsdom's documentation points out, that could potentially allow any site to escape the sandbox and get access to your local system, just by crawling it. Proceed with caution please. jsdom is a great library to handle most typical browser tasks within your local Node.js instance, but it still has some limitations, and that's where headless browsers really come to shine.
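jsdom can also fetch the page for you, instead of you passing an HTML string: JSDOM.fromURL() returns a Promise resolving to a jsdom instance for the downloaded document. A minimal sketch:
const { JSDOM } = require('jsdom')

// Download and parse the page in one step
JSDOM.fromURL('https://example.com/')
  .then((dom) => {
    // Query the parsed document just like before
    console.log(dom.window.document.querySelector('h1').textContent)
  })
  .catch(console.error)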

Headless Browsers in JavaScript

Sites become more and more complex, and often regular HTTP crawling won't suffice any more; one actually needs a full-fledged browser engine to get the necessary information from a site. This is particularly true for SPAs (single-page applications), which heavily rely on JavaScript and dynamic and asynchronous resources. Browser automation and headless browsers come to the rescue here. Let's check out how they can help us easily crawl single-page applications and other sites making use of JavaScript.

 1. Puppeteer: the headless browser

Puppeteer, as the name implies, allows you to manipulate the browser programmatically, just like a puppet would be manipulated by its puppeteer. It achieves this by providing a developer with a high-level API to control a version of Chrome, headless by default but configurable to run non-headless. (The architecture diagram here was taken from the Puppeteer docs.) Puppeteer is particularly more useful than the aforementioned tools because it allows you to crawl the web as if a real person were interacting with a browser. This opens up a few possibilities that weren't there before: You can get screenshots or generate PDFs of pages. You can crawl a Single Page Application and generate pre-rendered content. You can automate many different user interactions, like keyboard inputs, form submissions, navigation, etc. It could also play a big role in many other tasks outside the scope of web crawling, like UI testing, assisting performance optimization, etc. Quite often, you will probably want to take screenshots of websites or get to know a competitor's product catalog. Puppeteer can be used to do this. To start, install Puppeteer by running the following command: npm install puppeteer This will download a bundled version of Chromium which takes up about 180 to 300 MB, depending on your operating system. You can avoid that step, and use an already installed setup, by specifying a couple of Puppeteer environment variables, such as PUPPETEER_SKIP_CHROMIUM_DOWNLOAD. Generally, though, Puppeteer recommends using the bundled version and does not support custom setups. Let's attempt to get a screenshot and PDF of the r/programming forum on Reddit. Create a new file called crawler.js and copy/paste the following code:
const puppeteer = require('puppeteer')

async function getVisual() {
  try {
    const URL = 'https://www.reddit.com/r/programming/'
    const browser = await puppeteer.launch()
    const page = await browser.newPage()

    await page.goto(URL)
    await page.screenshot({ path: 'screenshot.png' })
    await page.pdf({ path: 'page.pdf' })

    await browser.close()
  } catch (error) {
    console.error(error)
  }
}

getVisual()
getVisual() is an asynchronous function that will take a screenshot of our page, as well as export it as a PDF document. To start, an instance of the browser is created by running puppeteer.launch(). Next, we create a new browser tab/page with newPage(). Now, we just need to call goto() on our page instance and pass it our URL. All these functions are of asynchronous nature and will return immediately, but as they are returning a JavaScript Promise, and we are using await, the flow still appears to be synchronous and, hence, once goto "returned", our website should have loaded. Excellent, we are ready to get pretty pictures. Let's just call screenshot() on our page instance and pass it a path to our image file. We do the same with pdf() and voilà, we should have at the specified locations two new files. Because we are responsible netizens, we also call close() on our browser object, to clean up behind ourselves. That's it. One thing to keep in mind: when goto() returns, the page has loaded, but it might not be done with all its asynchronous loading. So depending on your site, you may want to add additional logic in a production crawler, to wait for certain JavaScript events or DOM elements (see the sketch below). But let's run the code. Pop up a shell window, type node crawler.js, and after a few moments, you should have exactly the two mentioned files in your directory.
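A sketch of such waiting logic: waitForSelector() blocks until a matching element is in the DOM, and $$eval() then extracts data from all matches. The p.title > a selector is an assumption borrowed from the Cheerio example and may need adjusting to Reddit's live markup:
const puppeteer = require('puppeteer')

async function getTitles() {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  await page.goto('https://old.reddit.com/r/programming/')

  // Resolves once at least one matching element exists in the DOM
  await page.waitForSelector('p.title > a')

  // Run the mapping function inside the page and return the result
  const titles = await page.$$eval('p.title > a', (links) =>
    links.map((link) => link.textContent)
  )
  console.log(titles)

  await browser.close()
}

getTitles().catch(console.error)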
It's a great tool, and if you are really keen on it now, please also check out our other Puppeteer guides: How to download a file with Puppeteer, Handling and submitting HTML forms with Puppeteer, and Using Puppeteer with Python and Pyppeteer.

 2. Nightmare: an alternative to Puppeteer

If Puppeteer is too complex for your use case or there are issues with the default Chromium bundle, you might want to check out Nightmare as well. Although it is not actively developed any more, it still provides easy access to a proper browser engine and comes with a similar browser automation interface as Puppeteer. It uses Electron, and web-scraping benchmarks indicate that it offers significantly better performance than its predecessor, PhantomJS. As so often, our journey starts with NPM: npm install nightmare Once Nightmare is available on your system, we will use it to find ScrapingBee's website through a Brave search. To do so, create a file called crawler.js and copy-paste the following code into it:
// Import and instantiate Nightmare
const Nightmare = require('nightmare')
const nightmare = Nightmare()

nightmare
  // Load Brave page
  .goto('https://search.brave.com/')
  // Enter search term into text box
  .type('#searchbox', 'ScrapingBee')
  // Click the search button
  .click('#submit-button')
  // Wait until search listing is available
  .wait('#results a')
  // Query first search link
  .evaluate(
    () => document.querySelector('#results a').href
  )
  .end()
  .then((link) => {
    console.log('ScrapingBee Web Link:', link)
  })
  .catch((error) => {
    console.error('Search failed:', error)
  })
After the usual library import with require, we first create a new instance of Nightmare and save it in nightmare. After that, we are going to have lots of fun with function-chaining and Promises 🥳 We use goto() to load Brave from https://search.brave.com We type our search term "ScrapingBee" into Brave's search input, with the CSS selector #searchbox (Brave's quite straightforward with its naming, isn't it?) We click the submit button to start our search. Again, that's with the CSS selector #submit-button (Brave's really straightforward, we love that❣️) Let's take a quick break, until Brave returns the search list. wait, with the right selector, works wonders here. wait also accepts a time value, if you need to wait for a specific period of time. Once Nightmare has got the link list from Brave, we simply use evaluate() to run our custom code on the page (in this case querySelector()), get the first <a> element matching our selector, and return its href attribute. Last but not least, we call end() to run and complete our task queue. That's it, folks. end() returns a standard Promise with the value from our call to evaluate(). Of course, you could also use await here. That was pretty easy, wasn't it? And if everything went all right 🤞, we should now have the link to ScrapingBee's website at https://www.scrapingbee.com ScrapingBee Web Link: https://www.scrapingbee.com/ Do you want to try it yourself? Just run node crawler.js in your shell 👍

 3. Playwright, the new web scraping framework

Playwright is a cross-language and cross-platform framework by Microsoft.

Its main advantage over Puppeteer is that it is cross-browser and very easy to use.

Here is how to simply scrape a page with it:
// Import Playwright
const playwright = require('playwright');

async function main() {
  // Launch Chromium in full-UI mode
  const browser = await playwright.chromium.launch({
    headless: false // setting this to true will not run the UI
  });

  // Open new page
  const page = await browser.newPage();

  // Navigate to Yahoo
  await page.goto('https://finance.yahoo.com/world-indices');
  await page.waitForTimeout(5000); // wait for 5 seconds

  await browser.close();
}

main();
Feel free to check out our Playwright tutorial if you want to learn more.
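In a real scraper you would usually replace the fixed five-second sleep with a wait on a concrete element. A sketch of that pattern; the table-row selector is an assumption about Yahoo's markup and may need adjusting:
const playwright = require('playwright');

async function main() {
  const browser = await playwright.chromium.launch();
  const page = await browser.newPage();
  await page.goto('https://finance.yahoo.com/world-indices');

  // Wait for a concrete element instead of a fixed timeout
  await page.waitForSelector('table tbody tr');

  // Print the text content of the first index row
  console.log(await page.textContent('table tbody tr'));

  await browser.close();
}

main().catch(console.error);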

 Comparison of headless browser libraries

Library | ✔️ Pros | ❌ Cons
Puppeteer | Very popular | Chrome-only
Nightmare | Significantly better performance than PhantomJS | No longer maintained; Chrome-only
Playwright | Easier integration than Puppeteer; cross-browser support | Still relatively new

Summary

Phew, that was a long read! But we hope our examples managed to give you a first glimpse into the world of web scraping with JavaScript, and which libraries you can use to crawl the web and scrape the information you need. Let's give it a quick recap; what we learned today was: ✅ Node.js is a JavaScript runtime that allows JavaScript to be run server-side. It has a non-blocking nature thanks to the Event Loop. ✅ HTTP clients, such as the native libraries and fetch, as well as Axios, SuperAgent, node-fetch, and Request, are used to send HTTP requests to a server and receive a response. ✅ Cheerio abstracts the best out of jQuery for the sole purpose of running it server-side for web crawling, but does not execute JavaScript code. ✅ jsdom creates a DOM out of an HTML string, following the browser DOM standard, and allows you to perform DOM manipulations on it. ✅ Puppeteer and Nightmare are high-level browser automation libraries that allow you to programmatically manipulate web applications as if a real person were interacting with them. This article focused on JavaScript's scraping ecosystem and its tools. However, there are certainly also other aspects to scraping, which we could not cover in this context. For example, sites often employ techniques to recognize and block crawlers. You'll want to avoid these and blend in as a normal visitor. On this subject, and more, we have an excellent, dedicated guide on how not to get blocked as a crawler. Check it out.

Resources

Node.js Website - The main site of Node.js with its official documentation.
Puppeteer's Docs - Google's documentation of Puppeteer, with getting started guides and the API reference.
Playwright - An alternative to Puppeteer, backed by Microsoft.
Generating Random IPs to Use for Scraping - Guide to generating random IPs to help avoid bot detection.
ScrapingBee's Blog - Contains a lot of information about web scraping goodies on multiple platforms.
Handling infinite scroll with Puppeteer
Node-unblocker - a Node.js package to facilitate web scraping through proxies.
A JavaScript developer's guide to cURL - If you like this article, you will love this guide about how to use cURL with JavaScript.

Web Scraping nodejs server

setup a simple web server

Using Node.js as a simple web server. The simplest Node.js server is just: $ npm install http-server -g Now you can run a server via the following commands: $ cd MyApp $ http-server If you're using NPM 5.2.0 or newer, you can use http-server without installing it, via npx. This isn't recommended for use in production but is a great way to quickly get a server running on localhost: $ npx http-server Or, you can try this, which opens your web browser and enables CORS requests: $ http-server -o --cors For more options, check out the documentation for http-server on GitHub, or run: $ http-server --help Lots of other nice features and brain-dead-simple deployment to NodeJitsu.
Feature Forks Of course, you can easily top up the features with your own fork. You might find it's already been done in one of the existing 800+ forks of this project: https://github.com/nodeapps/http-server/network
Light Server: An Auto Refreshing Alternative A nice alternative to http-server is light-server. It supports file watching, auto-refreshing, and many other features. $ npm install -g light-server $ light-server Add to your directory context menu in Windows Explorer: reg.exe add HKCR\Directory\shell\LightServer\command /ve /t REG_EXPAND_SZ /f /d "\"C:\nodejs\light-server.cmd\" \"-o\" \"-s\" \"%V\""
Simple JSON REST server If you need to create a simple REST server for a prototype project, then json-server might be what you're looking for (a minimal sketch follows at the end of this section).
Auto Refreshing Editors Most web page editors and IDE tools now include a web server that will watch your source files and auto-refresh your web page when they change. I use Live Server with Visual Studio Code. The open-source text editor Brackets also includes a Node.js static web server. Just open any HTML file in Brackets, press "Live Preview", and it starts a static server and opens your browser at the page. The browser will auto-refresh whenever you edit and save the HTML file. This is especially useful when testing adaptive web sites: open your HTML page on multiple browsers/window sizes/devices, save your HTML page, and instantly see if your adaptive stuff is working, as they all auto-refresh.
PhoneGap Developers If you're coding a hybrid mobile app, you may be interested to know that the PhoneGap team took this auto-refresh concept on board with their PhoneGap App. This is a generic mobile app that can load HTML5 files from a server during development. This is a very slick trick, since you can now skip the slow compile/deploy steps in your development cycle for hybrid mobile apps when you're changing JS/CSS/HTML files, which is what you're doing most of the time. They also provide a static Node.js web server (run phonegap serve) that detects file changes.
PhoneGap + Sencha Touch Developers I've now extensively adapted the PhoneGap static server & PhoneGap Developer App for Sencha Touch & jQuery Mobile developers. Check it out at Sencha Touch Live. It supports --qr QR codes and --localtunnel, which proxies your static server from your desktop computer to a URL outside your firewall! Tons of uses. Massive speedup for hybrid mobile devs.
Cordova + Ionic Framework Developers Local server and auto-refresh features are baked into the ionic tool. Just run ionic serve from your app folder. Even better: ionic serve --lab to view auto-refreshing side-by-side views of both iOS and Android.
Further reading: Web Scraping In Node Js With Multiple Examples, Scraping the Web With Node.js, An Introduction to Web Scraping with Node JS
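Here is the promised json-server sketch. Given a plain JSON file, it serves a full REST API over its contents; the file name and data are illustrative:
$ npm install -g json-server
$ echo '{ "posts": [ { "id": 1, "title": "hello" } ] }' > db.json
$ json-server --watch db.json
GET http://localhost:3000/posts/1 now returns { "id": 1, "title": "hello" }, and POST/PUT/DELETE requests against /posts update db.json on disk.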

 What will we need?

For this project we'll be using Node.js. We'll also be using two open-source npm modules to make today's task a little easier: request-promise - Request is a simple HTTP client that allows us to make quick and easy HTTP calls. cheerio - jQuery for Node.js. Cheerio makes it easy to select, edit, and view DOM elements.

 Project Setup.

Create a new project folder. Within that folder create an index.js file. We’ll need to install and require our dependencies. Open up your command line, and install and save: request, request-promise, and cheerio npm install --save request request-promise cheerio Then require them in our index.js file: const rp = require("request-promise"); const cheerio = require("cheerio");

 Setting up the Request

request-promise accepts an object as input, and returns a promise. The options object needs to do two things: Pass in the URL we want to scrape. Tell Cheerio to load the returned HTML so that we can use it. Here's what that looks like:
const options = {
  uri: `https://www.yourURLhere.com`,
  transform: function (body) {
    return cheerio.load(body);
  }
};
The uri key is simply the website we want to scrape. The transform key tells request-promise to take the returned body and load it into Cheerio before returning it to us. Awesome. We've successfully set up our HTTP request options! Here's what your code should look like so far:
const rp = require("request-promise");
const cheerio = require("cheerio");

const options = {
  uri: `https://www.yourURLhere.com`,
  transform: function (body) {
    return cheerio.load(body);
  }
};

 Make the Request

Now that the options are taken care of, we can actually make our request. The boilerplate in the documentation for that looks like this:
rp(OPTIONS)
  .then(function (data) {
    // REQUEST SUCCEEDED: DO SOMETHING
  })
  .catch(function (err) {
    // REQUEST FAILED: ERROR OF SOME KIND
  });
We pass in our options object to request-promise, then wait to see if our request succeeds or fails. Either way, we do something with the returned data. Knowing what the documentation says to do, let's create our own version:
rp(options)
  .then(($) => {
    console.log($);
  })
  .catch((err) => {
    console.log(err);
  });
The code is pretty similar. The big difference is that I've used arrow functions. I've also logged out the returned data from our HTTP request. We're going to test to make sure everything is working so far. Replace the placeholder uri with the website you want to scrape. Then, open up your console and type: node index.js
// LOGS THE FOLLOWING:
{ [Function: initialize] fn: initialize { constructor: [Circular], _originalRoot: { type: "root", name: "root", namespace: "http://www.w3.org/1999/xhtml", attribs: {}, ...
If you don't see an error, then everything is working so far, and you just made your first scrape! Here is the full code of our boilerplate:
const rp = require('request-promise');
const cheerio = require('cheerio');

const options = {
  uri: `https://www.google.com`,
  transform: function (body) {
    return cheerio.load(body);
  }
};

rp(options)
  .then(($) => {
    console.log($);
  })
  .catch((err) => {
    console.log(err);
  });

 Using the Data

What good is our web scraper if it doesn’t actually return any useful data? This is where the fun begins. There are numerous things you can do with Cheerio to extract the data that you want. First and foremost, Cheerio’s selector implementation is nearly identical to jQuery’s. So if you know jQuery, this will be a breeze. If not, don’t worry, I’ll show you.

Selectors

The selector method allows you to traverse and select elements in the document. You can get data and set data using a selector. Imagine we have the following HTML in the website we want to scrape: <ul id="cities"> <li class="large">New York</li> <li id="medium">Portland</li> <li class="small">Salem</li> </ul> We can select IDs using (#), classes using (.), and elements by their tag names, e.g. div.
$(".large").text() // New York
$("#medium").text() // Portland
$("li[class=small]").html() // Salem - .html() returns the element's inner HTML

Looping

Just like jQuery, we can also iterate through multiple elements with the each() function. Using the same HTML code as above, we can return the inner text of each li with the following code:
const cities = [];
$("li").each(function (i, elem) {
  cities[i] = $(this).text();
});
// cities: [ 'New York', 'Portland', 'Salem' ]

Finding

Imagine we have two lists on our web site: <ul id="cities"> <li class="large">New York</li> <li id="c-medium">Portland</li> <li class="small">Salem</li> </ul> <ul id="towns"> <li class="large">Bend</li> <li id="t-medium">Hood River</li> <li class="small">Madras</li> </ul> We can select each list using their respective IDs, then find the small city/town within each list:
$("#cities").find(".small").text() // Salem
$("#towns").find(".small").text() // Madras
Finding will search all descendant DOM elements, not just immediate children as shown in this example.

Children

Children is similar to find. The difference is that children only searches for immediate children of the selected element. $("#cities").children("#c-medium").text(); // Portland

Text > HTML

Up until this point, all of my examples have included the .text() function. Hopefully you've been able to figure out that this function is what gets the text of the selected element. You can also use .html() to return the inner HTML of the given element, and $.html() on a selection if you need the element's own tag included:
$("#towns .large").text() // Bend
$("#towns .large").html() // Bend - inner HTML only
$.html($("#towns .large")) // <li class="large">Bend</li>

Additional Methods

There are more methods than I can count, and the documentation for all of them is available here.
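For a taste, here are a few of the most commonly used ones, run against the two lists from the example above (all of these mirror their jQuery counterparts):
$(".large").first().text()          // "New York" - first matched element
$(".large").last().text()           // "Bend" - last matched element
$("#cities li").eq(1).text()        // "Portland" - element at index 1
$("#c-medium").parent().attr("id")  // "cities" - read an attribute of the parent
$("#towns li").not(".small").length // 2 - filter matches out of a selection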

 Chrome Developer Tools

Don't forget, the Chrome Developer Tools are your friend. In Google Chrome, you can easily find element, class, and ID names using: CTRL + SHIFT + C With the element picker active, you can hover over any element on the page, and the element name and class name of the selected element are shown in real time!

 Limitations

As Jaye Speaks points out:
MOST websites modify the DOM using JavaScript. Unfortunately Cheerio doesn’t resolve parsing a modified DOM. Dynamically generated content from procedures leveraging AJAX, client-side logic, and other async procedures are not available to Cheerio.
Remember, this is an introduction to basic scraping. In order to get started, you'll need to find a static website with minimal DOM manipulation. Further reading: The Ultimate Guide to Web Scraping with Node.js, Web Scraping with Node.js, Debugging Node.js with Chrome DevTools, Debugging Node.js with Google Chrome.

Run Node.JS from page

See: Run Node.JS from page, javascript button, Express.js Web Application

Node.JS Examples

Node.js Examples

read a file

// include file system module
var fs = require('fs');

// read file sample.html
fs.readFile('sample.html',
  // callback function that is called when reading file is done
  function (err, data) {
    if (err) throw err;
    // data is a buffer containing file content
    console.log(data.toString('utf8'))
  });
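The same read with the promise-based fs API (available since Node 10), which pairs naturally with async/await:
// fs.promises exposes promise-returning versions of the fs functions
const fs = require('fs').promises;

async function readSample() {
  // With an encoding argument, readFile returns a string instead of a Buffer
  const data = await fs.readFile('sample.html', 'utf8');
  console.log(data);
}

readSample().catch(console.error);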

Connection between multiple clients to a single server

On the server side, the 'request' event will fire every time a client connects to your websocket server. Your websocket server will be able to handle multiple clients out of the box. Check the server-side usage example for the websocket module here: https://www.npmjs.com/package/websocket#server-example Installation In your project root: $ npm install websocket Then in your code:
var WebSocketServer = require('websocket').server;
var WebSocketClient = require('websocket').client;
var WebSocketFrame = require('websocket').frame;
var WebSocketRouter = require('websocket').router;
var W3CWebSocket = require('websocket').w3cwebsocket;

Server Example

#!/usr/bin/env node
var WebSocketServer = require('websocket').server;
var http = require('http');

var server = http.createServer(function (request, response) {
  console.log((new Date()) + ' Received request for ' + request.url);
  response.writeHead(404);
  response.end();
});
server.listen(8080, function () {
  console.log((new Date()) + ' Server is listening on port 8080');
});

var wsServer = new WebSocketServer({
  httpServer: server,
  // You should not use autoAcceptConnections for production
  // applications, as it defeats all standard cross-origin protection
  // facilities built into the protocol and the browser. You should
  // *always* verify the connection's origin and decide whether or not
  // to accept it.
  autoAcceptConnections: false
});

function originIsAllowed(origin) {
  // put logic here to detect whether the specified origin is allowed.
  return true;
}

wsServer.on('request', function (request) {
  if (!originIsAllowed(request.origin)) {
    // Make sure we only accept requests from an allowed origin
    request.reject();
    console.log((new Date()) + ' Connection from origin ' + request.origin + ' rejected.');
    return;
  }

  var connection = request.accept('echo-protocol', request.origin);
  console.log((new Date()) + ' Connection accepted.');
  connection.on('message', function (message) {
    if (message.type === 'utf8') {
      console.log('Received Message: ' + message.utf8Data);
      connection.sendUTF(message.utf8Data);
    } else if (message.type === 'binary') {
      console.log('Received Binary Message of ' + message.binaryData.length + ' bytes');
      connection.sendBytes(message.binaryData);
    }
  });
  connection.on('close', function (reasonCode, description) {
    console.log((new Date()) + ' Peer ' + connection.remoteAddress + ' disconnected.');
  });
});

Client Example

This is a simple example client that will print out any utf-8 messages it receives on the console, and periodically sends a random number. This code demonstrates a client in Node.js, not in the browser:
var WebSocketClient = require('websocket').client;

var client = new WebSocketClient();

client.on('connectFailed', function (error) {
  console.log('Connect Error: ' + error.toString());
});

client.on('connect', function (connection) {
  console.log('WebSocket Client Connected');
  connection.on('error', function (error) {
    console.log("Connection Error: " + error.toString());
  });
  connection.on('close', function () {
    console.log('echo-protocol Connection Closed');
  });
  connection.on('message', function (message) {
    if (message.type === 'utf8') {
      console.log("Received: '" + message.utf8Data + "'");
    }
  });

  function sendNumber() {
    if (connection.connected) {
      var number = Math.round(Math.random() * 0xFFFFFF);
      connection.sendUTF(number.toString());
      setTimeout(sendNumber, 1000);
    }
  }
  sendNumber();
});

client.connect('ws://localhost:8080/', 'echo-protocol');

TOP Node.JS Examples

Source: https://bytescout.com/blog/node-js-code-examples.html What is Node.js exactly, and what is Node.js used for? These are the essential questions we will answer here. Essentially, Node.js enables developers to build server apps in JavaScript. Projects in Node.js today commonly include: web application frameworks, messaging middleware, servers for online gaming, REST APIs and backends, and static file servers. Node.js app development is wildly popular. Projects built with Node.js plus a combination of front-end developer tools are faster than similar PHP apps because of the efficient async functionality supported by Node. Node is also popular because now you can write JavaScript on both client and server. This article covers the following aspects: Building Node.js Skills Under the Hood First Node.js App Build Your Own Node.js Module Adding MySQL to Advanced Node.js Apps Data Connection - Node JS Examples Adding AngularJS Components Best Practices for Fresh Ideas in Node.js In this advanced intro to Node.js, we will explore the latest methods on how to create a Node.js module, and lead up to a method to create a simple Node.js app, in order to see cutting-edge Node programming and gain a full understanding of the Node.js app framework. These are all apps we can build with Node.js and run simultaneously.

Building Node.js Skills

The best Node.js tutorials and MOOC online courses explain methods with well-documented code samples and snippets on how to learn Node.js properly. Extensive online education programs teach you all about Node.js, covering topics ranging from writing node modules to how to create a node server. Node.js is open-source, and as such the organization's own documentation is a great resource for study. Node's API reference documentation contains details on the functions and objects used to build Node.js programs. It also illustrates the arguments or parameters each method requires, the returned values of methods, and the predictable errors associated with each method. Importantly, developers should take careful note of method variations by Node.js version as documented; the latest version at the time of writing was 9.10.1. Additional developer resources are provided, such as security issues and updates, and the latest compatibility with ES6.

Under the Hood

Node uses Google Chrome's V8 runtime engine to compile JavaScript code to native machine code, which runs in the server environment. Node.js is an open-source framework that runs on most popular OS platforms, like Windows, Linux, and Mac OS X. Express.js is the standard web application framework for use with Node.js; Express is a minimal framework with much of the functionality built as plugins. A typical app will use Express for the backend, a MongoDB database, and an AngularJS frontend (the so-called MEAN stack). The standard "Hello world" in Node is:
var http = require('http');

http.createServer(function (req, res) {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('Hello Node.js World!');
}).listen(8080);
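Since Express comes up repeatedly below, here is the equivalent "Hello world" on top of Express for comparison; a minimal sketch, assuming Express has been installed with npm install express:
const express = require('express');
const app = express();

// Respond to GET / with a plain-text greeting
app.get('/', (req, res) => {
  res.send('Hello Node.js World!');
});

// Listen on the same port as the plain-http example above
app.listen(8080);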

First Node.js App

In order to follow our Node JS examples, be sure to download and install the latest Node.js and update its dependencies. The standard Node.js documentation includes complete details on how to install Node.js, and naturally, you will want to use the latest Node.js version. Trawling Google for tips will produce hits like "node latest version", and many of these pages refer to a specific package in Ubuntu, along with related bug reports. Making the distinction between beta and the latest stable version of Node is important to developers who wish to experiment with the newest features. Node.js generates dynamic page content, and in combination with AngularJS, the fastest possible single-page applications can be built easily. Node JS examples include creating and deleting server files, as well as open, read, and write ops to server databases. Node is event-driven, with events including HTTP requests. Node files include tasks to be executed when triggered by these events. With that background, let's get started setting up a real Node.js application. Use the command npm init to initialize a new npm project. This command creates a new package.json file and adds several lines of code for the basic structure, and this can be modified to track all the dependencies of the project. In order to test that your Node setup is correct, let's run a quick test. Copy the "Hello Node.js World!" code above to a text file and name it "test.js" to start. Now open a command-line interface (CLI) and run npm init in that folder. You can then run your hello world in the CLI by typing node test.js at the command prompt. If this works, your computer is now functioning as a web server, listening for events or requests on port 8080.

Build Your Own Node.js Module

The require('http') module is a built-in Node module that invokes the functionality of the HTTP library to create a local server. To add your own Node.js modules, use the export statement to make functions in your module available externally. Create a new text file to contain the functions in your module, called "modules.js", and add this function to return today's date and time:
exports.CurrentDateTime = function () {
  var d = new Date();
  return d;
};
Next, you can add require('./modules'); as below to include the modules file. And by the way, the Express framework can be included with a similar syntax, such as const express = require('express');, to expose all its methods. Now you can reference the methods of your function in this way:
var http = require('http');
var dateTime = require('./modules');

http.createServer(function (req, res) {
  res.writeHead(200, { 'Content-Type': 'text/html' });
  res.write("Current date and time: " + dateTime.CurrentDateTime());
  res.end();
}).listen(8080);
As mentioned, the HTTP module exposed with createServer() creates an HTTP server that listens on server port 8080 and then responds to client requests. A function passed to the http.createServer() method will execute when a client accesses our computer at port 8080.

Adding MySQL to Advanced Node.js Apps

Today's most popular combination of developer tools includes Express.js as a Node backend framework, along with a MySQL database and AngularJS frontend functionality. We need an additional setup to make these work together and achieve full compatibility. Naturally, the core components must be installed first, so let's briefly discuss the order of doing so. On Windows, for example, you may already have installed MySQL Server via the MySQL Installer, which is satisfactory for this example. And the MySQL X Protocol plugin may already be enabled; if not, enable it now by re-configuring the MySQL Server. Enabling the X Plugin exposes the use of MySQL as a document store. Assuming Node and Express are now also installed, we will open a terminal and navigate to the location path to create a project. In the desired folder, install the Express application generator, which creates the project files and dependencies for a new application. At the CLI prompt just type: $ npm install express-generator -g and press Enter. We want to use HTML instead of the native Jade interpreter of Express, so just enter this command for the setup: $ express pname --ejs and hit Enter (pname is the project name; here it doubles as the name of your MySQL DB). You can now verify the new server is operating with the new app framework by entering: $ npm start and opening a browser to http://localhost:3000 The next step is to connect Node.js to MySQL. Enter this command at the CLI prompt and hit Enter: $ npm install @mysql/xdevapi (the connector package required by the code below). Now install the AngularJS and Angular-Route modules with the following command: $ npm install angular@1.5.8 angular-route@1.5.8 and hit Enter. With everything installed, we can begin coding the web application. First, we will add a JSON file to the data folder with some data. Call it freshideas.json for this project. Add some data in a consistent format to reference later. A Node programming example (JSON file record) might include: { "title_": "Node.js: Testing Improvements", "link_": "http://mynodejs.com/freshideas/", "intro_": "Using Node.js with MySQL", "pub_": "Thu Sep 29 2016" }, Now we will update the app to create a schema plus a collection to upload the initial data. Next, open the "www" file that is in the bin folder, which contains configuration details for the web server hosting the app. Now, add a function to create the schema and the collection, and upload the JSON data file to the collection.
Add this code to the end of the "www" file:
function configureDataBase(callback) {
  mysql.getSession({
    host: 'localhost',
    port: '33080',
    dbUser: 'root',
    dbPassword: ''
  }).then(function (session) {
    var schema = session.getSchema('mysqlPname');
    schema.existsInDatabase().then(function (exists) {
      if (!exists) {
        session.createSchema('mysqlPname').then(function (newSchema) {
          newSchema.createCollection('Pname').then(function (PnameColl) {
            PnameColl.add(initialData).execute().then(function (PnameAdded) {
              var rowsAffected = PnameAdded.getAffectedItemsCount();
              if (rowsAffected <= 0) {
                console.log('No Pname added');
              } else {
                console.log(rowsAffected + ' Pname added');
              }
              session.close();
              callback('Done: collection initialized');
            }).catch(function (err) {
              console.log(err.message);
              console.log(err.stack);
            });
          }).catch(function (err) {
            console.log(err.message);
            console.log(err.stack);
          });
        }).catch(function (err) {
          console.log(err.message);
          console.log(err.stack);
        });
      } else {
        session.close();
        callback('Database already configured');
      }
    });
  }).catch(function (err) {
    console.log(err.message);
    console.log(err.stack);
  });
}
The above snippet illustrates how to set up the initialization and connect the MySQL DB to the app, assigning the xdevapi module to the mysql variable. The MySQL variables are used by the configureDataBase function and must be defined prior to calling the function. An instance of an EventEmitter is created and configured in the event that calls the function to create the schema and collection.

Data Connection – Node JS Examples

In this model, we will add a new file called Pname.js, consistent with the code used to configure MySQL. The new module will contain the methods used over the collections. As an example method, let's add a module to fetch documents from the collection. First, we define two variables, one to load the MySQL xdevapi module and one to store the configuration for connections to the server. Here is the basic code, which you can expand to suit your app:
var mysql_ = require('@mysql/xdevapi');
var config_ = {
  host: 'localhost',
  port: '33080',
  userid: 'root',
  password: '',
  schema: 'mysqlPname',
  collection: 'Pname'
};
Finally, we will add the method to the export object of this module and then call the getSession method to create a server connection. When the session is running, we can get the schema and collection containing the documents. We then define one array variable as a container for documents that are returned from the collection. Executing the find method without a filter will return all the documents. Once the execute method has returned the documents, they are added to the array variable. As such, we have a Node.js server capable of asynchronous access to the MySQL DB, running in the Express.js context.
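A sketch of what that export could look like; the X DevAPI method names (getSession, getSchema, getCollection, find, execute) are taken from the connector's documented API, but treat the details as an assumption to verify against your connector version:
// Hypothetical export built on the config above
exports.getAllPname = function (callback) {
  mysql_.getSession({
    host: config_.host,
    port: config_.port,
    user: config_.userid,
    password: config_.password
  }).then(function (session) {
    var collection = session.getSchema(config_.schema).getCollection(config_.collection);
    var docs = [];
    // find() without a filter matches every document; the callback passed
    // to execute() is invoked once per returned document
    return collection.find().execute(function (doc) {
      docs.push(doc);
    }).then(function () {
      session.close();
      callback(null, docs);
    });
  }).catch(function (err) {
    callback(err);
  });
};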

Adding AngularJS Components

To add components using the Angular framework to display the docs from Pname, we will create a folder in the public javascripts path with the defined name, and this folder will contain the template to add new docs as well. Begin by adding the new-comment.module.js component to the folder with the following code:
angular.module('newDoc', ['ngRoute']);

angular.module('newDoc').component('newDoc', {
  templateUrl: '/javascripts/Doc/new-comment.template.html',
  controller: ['$routeParams', 'Pname',
    function NewDocController($routeParams, Pname) {
      this.postId = $routeParams._Id;

      this.addComment = function () {
        if (!this.postId || !this.comment || this.comment === '') {
          return;
        }
        Pname.addComment({ id: this.postId, Doc: this.comment });
      };

      this.cancelAddComment = function () {
        this.comment = '';
        this.postId = '';
        Pname.cancelAddDoc();
      };
    }
  ]
});
This gives an excellent view of the powerful capability of Angular as a frontend for a Node.js server. The demo shows how to build a full-stack JavaScript app using all the platforms: Node.js with MySQL, the Express framework, and AngularJS as the frontend.

Best Practices for Fresh Ideas in Node.js

Node.js 8 included the async and await keywords for handling asynchronous operations such as file loading. This accelerated Node.js potential beyond PHP for many applications. It is essential to master these ES2017-level functions to optimize your coding skills. Node.js 8.5 introduced experimental support for ES modules with import and export. Further, Node.js 8.8 offered HTTP/2 without a flag. HTTP/2 supports server push and multiplexing, and thus enables efficient loading of page resources in a browser. Note that Express support is in progress; HTTP/2 is experimental in the scope of Node.js, with libraries now in development. Beyond the borders of Node.js itself, many supporting technologies enhance the developer experience, such as containers and virtualization. Docker technology provides containers, which virtualize an OS and render a truly portable and scalable web application.
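For orientation, here is what Node's core http2 module looks like in its simplest form; a minimal sketch of a plaintext (h2c) server; browsers require TLS for HTTP/2, so test this with a client such as curl --http2-prior-knowledge:
const http2 = require('http2');

const server = http2.createServer();

// Each incoming request surfaces as a 'stream' event
server.on('stream', (stream, headers) => {
  // Respond with the pseudo-header status plus regular headers, then the body
  stream.respond({ ':status': 200, 'content-type': 'text/plain' });
  stream.end('Hello over HTTP/2');
});

server.listen(8443);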

Node.js tutorial

Node.js is the runtime and npm is the Package Manager for Node.js modules. To run a Node.js application, you will need to install the Node.js runtime on your machine. The Node Package Manager is included in the Node.js distribution. You'll need to open a new terminal (command prompt) for the node and npm command-line tools to be on your PATH.
Tip: To test that you've got Node.js correctly installed on your computer, open a new terminal and type node --help and you should see the usage documentation.

Hello World

Let's get started by creating the simplest Node.js application, "Hello World". Create an empty folder called "hello", navigate into it, and open VS Code: mkdir hello cd hello code .
Tip: You can open files or folders directly from the command line. The period '.' refers to the current folder, therefore VS Code will start and open the Hello folder.
From the File Explorer toolbar, press the New File button and name the file app.js. Create a simple string variable in app.js and send the contents of the string to the console: var msg = 'Hello World'; console.log(msg); Save the file.

Running Hello World

It's simple to run app.js with Node.js. From a terminal, just type: node app.js You should see "Hello World" output to the terminal and then Node.js returns.

Integrated Terminal

VS Code has an integrated terminal which you can use to run shell commands. You can run Node.js directly from there and avoid switching out of VS Code while running command-line tools. View > Terminal (Ctrl+` on Windows/Linux, ⌃` on macOS, with the backtick character) will open the integrated terminal, and you can run node app.js there.

Debugging Hello World

VS Code ships with a debugger for Node.js applications. Let's try debugging our simple Hello World application. To set a breakpoint in app.js, put the editor cursor on the first line and press F9, or click in the editor's left gutter next to the line numbers. A red circle will appear in the gutter. To start debugging, select the Run view in the Activity Bar. You can now click the green arrow in the Debug toolbar or press F5 to launch and debug "Hello World". Your breakpoint will be hit, and you can view and step through the simple application. Notice that VS Code displays a differently colored Status Bar to indicate it is in Debug mode, and the DEBUG CONSOLE is displayed. Now that you've seen VS Code in action with "Hello World", the next section shows using VS Code with a full-stack Node.js web app.
Note: We're done with the "Hello World" example so navigate out of that folder before you create an Express app. You can delete the "Hello" folder if you wish as it is not required for the rest of the walkthrough.

An Express application

Express is a very popular application framework for building and running Node.js applications. You can scaffold (create) a new Express application using the Express Generator tool. The Express Generator is shipped as an npm module and installed by using the npm command-line tool.
Tip: To test that you've got npm correctly installed on your computer, type npm --help from a terminal and you should see the usage documentation.
Install the Express Generator by running the following from a terminal: npm install -g express-generator The -g switch installs the Express Generator globally on your machine so you can run it from anywhere. We can now scaffold a new Express application called myExpressApp by running: express myExpressApp --view pug This creates a new folder called myExpressApp with the contents of your application. The --view pug parameter tells the generator to use the pug template engine. To install all of the application's dependencies (again shipped as npm modules), go to the new folder and execute npm install: cd myExpressApp npm install At this point, we should test that our application runs. The generated Express application has a package.json file which includes a start script to run node ./bin/www. This will start the Node.js application running. From a terminal in the Express application folder, run: npm start
Tip: You can enable an explorer for the npm scripts in your workspace using the npm.enableScriptExplorer setting.
The Node.js web server will start, and you can browse to http://localhost:3000 to see the running application.

Great code editing

Close the browser and from a terminal in the myExpressApp folder, stop the Node.js server by pressing CTRL+C. Now launch VS Code: code .
Note: If you've been using the VS Code integrated terminal to install the Express generator and scaffold the app, you can open the myExpressApp folder from your running VS Code instance with the File > Open Folder command.
The Node.js and Express documentation does a great job explaining how to build rich applications using the platform and framework. Visual Studio Code will make you more productive in developing these types of applications by providing great code editing and navigation experiences. Open the file app.js and hover over the Node.js global object __dirname. Notice how VS Code understands that __dirname is a string. Even more interesting, you can get full IntelliSense against the Node.js framework. For example, you can require http and get full IntelliSense against the http class as you type in Visual Studio Code. VS Code uses TypeScript type declaration (typings) files (for example node.d.ts) to provide metadata to VS Code about the JavaScript-based frameworks you are consuming in your application. Type declaration files are written in TypeScript so they can express the data types of parameters and functions, allowing VS Code to provide a rich IntelliSense experience. Thanks to a feature called Automatic Type Acquisition, you do not have to worry about downloading these type declaration files; VS Code will install them automatically for you. You can also write code that references modules in other files. For example, in app.js we require the ./routes/index module, which exports an Express.Router class. If you bring up IntelliSense on index, you can see the shape of the Router class.

Debug your Express app

You will need to create a debugger configuration file launch.json for your Express application. Click on the Run icon in the Activity Bar and then the Configure gear icon at the top of the Run view to create a default launch.json file. Select the Node.js environment by ensuring that the type property in configurations is set to "node". When the file is first created, VS Code will look in package.json for a start script and will use that value as the program (which in this case is "${workspaceFolder}\\bin\\www") for the Launch Program configuration.

{
  "version": "0.2.0",
  "configurations": [
    {
      "type": "node",
      "request": "launch",
      "name": "Launch Program",
      "program": "${workspaceFolder}\\bin\\www"
    }
  ]
}

Save the new file and make sure Launch Program is selected in the configuration drop-down at the top of the Run view. Open app.js and set a breakpoint near the top of the file where the Express app object is created by clicking in the gutter to the left of the line number. Press F5 to start debugging the application. VS Code will start the server in a new terminal and hit the breakpoint we set. From there you can inspect variables, create watches, and step through your code. (Screenshot: Debug session.)

Deploy your application

If you'd like to learn how to deploy your web application, check out the Deploying Applications to Azure tutorials where we show how to run your website in Azure.

simple http proxy

create a simple http proxy in node.js
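Since this entry is only a title, here is a minimal sketch of such a proxy using only the built-in http module; the target host and port are assumptions for illustration.

const http = require('http');

// A minimal pass-through proxy: forward every incoming request to a
// target server (assumed here to be localhost:3000) and stream the
// response back to the original client.
const TARGET_HOST = 'localhost'; // assumption: the server we proxy to
const TARGET_PORT = 3000;

const proxy = http.createServer((clientReq, clientRes) => {
  const options = {
    hostname: TARGET_HOST,
    port: TARGET_PORT,
    path: clientReq.url,
    method: clientReq.method,
    headers: clientReq.headers,
  };

  const proxyReq = http.request(options, (proxyRes) => {
    clientRes.writeHead(proxyRes.statusCode, proxyRes.headers);
    proxyRes.pipe(clientRes); // stream the target's response back
  });

  proxyReq.on('error', (err) => {
    clientRes.writeHead(502);
    clientRes.end(`Proxy error: ${err.message}`);
  });

  clientReq.pipe(proxyReq); // stream the request body to the target
});

proxy.listen(8000); // browse via http://localhost:8000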

websockets to communicate between client and node.js server

https://medium.com/@joekarlsson/complete-guide-to-node-client-server-communication-b156440c029 This demo shows a client connecting to a websocket server and sharing data. Here is the server.js of a websocket.

'use strict';
const WebSocketServer = require('ws').Server;
const wss = new WebSocketServer({ port: 8081 });

wss.on('connection', (ws) => {
  ws.on('message', (message) => {
    console.log(`received: ${message}`);
  });
  ws.on('close', () => {
    console.log('Connection ended...');
  });
  ws.send('Hello Client');
});

Here is the client.js of a websocket.

console.log('open: ');
var ws = new WebSocket("ws://127.0.0.1:8081");
ws.onopen = function (event) {
  console.log('Connection is open ...');
  ws.send("Hello Server");
};
ws.onerror = function (err) {
  console.log('err: ', err);
};
ws.onmessage = function (event) {
  console.log(event.data);
  document.body.innerHTML += event.data + '<br>';
};
ws.onclose = function () {
  console.log("Connection is closed...");
};

https://stackoverflow.com/questions/52407025/client-server-communication-in-node-js I would use websockets for this. Once you've set up the connection you can initiate messages from either side. The ws npm package makes this pretty easy. Server example (using the ws npm package):

const WebSocket = require('ws');

// Set up server
const wss = new WebSocket.Server({ port: 8080 });

// Wire up some logic for the connection event (when a client connects)
wss.on('connection', function connection(ws) {
  // Wire up logic for the message event (when a client sends something)
  ws.on('message', function incoming(message) {
    console.log('received: %s', message);
  });

  // Send a message
  ws.send('Hello client!');
});

Client example (no need for any package here, it's built into most browsers):

// Create WebSocket connection.
const socket = new WebSocket('ws://localhost:8080');

// Connection opened
socket.addEventListener('open', function (event) {
  socket.send('Hello Server!');
});

// Listen for messages
socket.addEventListener('message', function (event) {
  console.log('Message from server ', event.data);
});

There are alternatives if you can't use websockets, such as polling (where the client periodically calls the server to see if there's a message) and long-polling (where the server holds an HTTP request open for an artificially long period of time until a message is ready).

send data to USB device in node.js

USB Library for Node.js

Installation: libusb is included as a submodule.

npm install usb

Windows: use Zadig to install the WinUSB driver for your USB device. Otherwise you will get LIBUSB_ERROR_NOT_SUPPORTED when attempting to open devices.

var usb = require('usb')

usb - Top-level object.
usb.getDeviceList() - Return a list of Device objects for the USB devices attached to the system.
usb.findByIds(vid, pid) - Convenience method to get the first device with the specified VID and PID, or undefined if no such device is present.
usb.LIBUSB_* - Constant properties from libusb.
usb.setDebugLevel(level : int) - Set the libusb debug level (between 0 and 4).

Device - Represents a USB device.
.busNumber - Integer USB device number.
.deviceAddress - Integer USB device address.
.portNumbers - Array containing the USB device port numbers, or undefined if not supported on this platform.
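Putting those calls together, here is a minimal sketch that lists attached devices and opens one by vendor/product id; the 0x1234/0x5678 ids are placeholders for your own device.

var usb = require('usb');

// Enumerate every USB device attached to the system.
usb.getDeviceList().forEach(function (device) {
  var desc = device.deviceDescriptor;
  console.log(
    'bus ' + device.busNumber +
    ' addr ' + device.deviceAddress +
    ' vid 0x' + desc.idVendor.toString(16) +
    ' pid 0x' + desc.idProduct.toString(16)
  );
});

// Look up a specific device (placeholder ids - substitute your own).
var device = usb.findByIds(0x1234, 0x5678);
if (device) {
  device.open(); // may throw LIBUSB_ERROR_NOT_SUPPORTED without the right driver
  // ... claim an interface and transfer data here ...
  device.close();
} else {
  console.log('Device not found');
}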

Scrape a site with Node and Cheerio in 5 minutes

https://www.twilio.com/blog/web-scraping-and-parsing-html-with-node-js-and-cheerio Web Scraping with Javascript and NodeJS Cheerio tutorial, web scraping in JavaScript Website scraping is a common problem with a common toolset. Two approaches dominate the web today: Automate a browser to navigate a site programmatically, using tools like Puppeteer or Selenium. Make an HTTP request to a website, retrieving data on the page using tools like Cheerio or BeautifulSoup. The first approach — driving a real browser programmatically — is typical for projects where you’re running automated website tests, or capturing screenshots of your site. The second approach has limitations. For example, Cheerio “is not a browser” and “does not produce a visual rendering, apply CSS, load external resources, or execute JavaScript”. But this approach is simple, and often sufficient, especially when you’re learning how scraping works.

Using Got to retrieve data to use with Cheerio

First let's write some code to grab the HTML from the web page and look at how we can start parsing through it. The following code will send a GET request to the web page we want, and will create a Cheerio object with the HTML from that page. We'll name it $ following the famous jQuery convention:

const fs = require('fs');
const cheerio = require('cheerio');
const got = require('got');

const vgmUrl = 'https://www.vgmusic.com/music/console/nintendo/nes';

got(vgmUrl).then(response => {
  const $ = cheerio.load(response.body);
  console.log($('title')[0]);  // log the raw element object
  console.log($('h1').text()); // print the text
}).catch(err => {
  console.log(err);
});

With this $ object, you can navigate through the HTML and retrieve DOM elements for the data you want, in the same way that you can with jQuery. For example, $('title') will get you an array of objects corresponding to every <title> tag on the page. There's typically only one title element, so this will be an array with one object. If you run this code with the command node index.js, it will log the structure of this object to the console.

Getting familiar with Cheerio

When you have an object corresponding to an element in the HTML you're parsing through, you can do things like navigate through its children, parent and sibling elements. The child of this <title> element is the text within the tags. So console.log($('title')[0].children[0].data); will log the title of the web page. If you want to get more specific in your query, there are a variety of selectors you can use to parse through the HTML. Two of the most common ones are to search for elements by class or ID. If you wanted to get a div with the ID of "menu" you would run $('#menu'), and if you wanted all of the columns in the table of VGM MIDIs with the "header" class, you'd do $('td.header'). What we want on this page are the hyperlinks to all of the MIDI files we need to download. We can start by getting every link on the page using $('a'). Add the following to your code in index.js:

got(vgmUrl).then(response => {
  const $ = cheerio.load(response.body);

  $('a').each((i, link) => {
    const href = link.attribs.href;
    console.log(href);
  });
}).catch(err => {
  console.log(err);
});

This code logs the URL of every link on the page. Notice that we're able to look through all elements from a given selector using the .each() function. Iterating through every link on the page is great, but we're going to need to get a little more specific than that if we want to download all of the MIDI files.

Filtering through HTML elements with Cheerio

Before writing more code to parse the content that we want, let's first take a look at the HTML that's rendered by the browser. Every web page is different, and sometimes getting the right data out of them requires a bit of creativity, pattern recognition, and experimentation. Our goal is to download a bunch of MIDI files, but there are a lot of duplicate tracks on this webpage, as well as remixes of songs. We only want one of each song, and because our ultimate goal is to use this data to train a neural network to generate accurate Nintendo music, we won't want to train it on user-created remixes. When you're writing code to parse through a web page, it's usually helpful to use the developer tools available to you in most modern browsers. If you right-click on the element you're interested in, you can inspect the HTML behind that element to get more insight. With Cheerio, you can write filter functions to fine-tune which data you want from your selectors. These functions loop through all elements for a given selector and return true or false based on whether they should be included in the set or not. If you looked through the data that was logged in the previous step, you might have noticed that there are quite a few links on the page that have no href attribute, and therefore lead nowhere. We can be sure those are not the MIDIs we are looking for, so let's write a short function to filter those out, as well as making sure that elements which do contain an href attribute lead to a .mid file:

const isMidi = (i, link) => {
  // Return false if there is no href attribute.
  if (typeof link.attribs.href === 'undefined') { return false }
  return link.attribs.href.includes('.mid');
};

Now we have the problem of not wanting to download duplicates or user generated remixes. For this we can use regular expressions to make sure we are only getting links whose text has no parentheses, as only the duplicates and remixes contain parentheses:

const noParens = (i, link) => {
  // Regular expression to determine if the text has parentheses.
  const parensRegex = /^((?!\().)*$/;
  return parensRegex.test(link.children[0].data);
};

Try adding these to your code in index.js:

got(vgmUrl).then(response => {
  const $ = cheerio.load(response.body);

  $('a').filter(isMidi).filter(noParens).each((i, link) => {
    const href = link.attribs.href;
    console.log(href);
  });
}).catch(err => {
  console.log(err);
});

Run this code again and it should only be printing .mid files.

Downloading the MIDI files we want from the webpage

Now that we have working code to iterate through every MIDI file that we want, we have to write code to download all of them. In the callback function for looping through all of the MIDI links, add this code to stream the MIDI download into a local file, complete with error checking (note that the MIDIs directory must already exist for the write stream to succeed):

$('a').filter(isMidi).filter(noParens).each((i, link) => {
  const fileName = link.attribs.href;

  got.stream(`${vgmUrl}/${fileName}`)
    .on('error', err => {
      console.log(err);
      console.log(`Error on ${vgmUrl}/${fileName}`);
    })
    .pipe(fs.createWriteStream(`MIDIs/${fileName}`))
    .on('error', err => {
      console.log(err);
      console.log(`Error on ${vgmUrl}/${fileName}`);
    })
    .on('finish', () => console.log(`Finished ${fileName}`));
});

Run this code from a directory where you want to save all of the MIDI files, and watch your terminal display all 2230 MIDI files (the count at the time of writing) as they download. With that, we should be finished scraping all of the MIDI files we need.

Worked Sample

const fs = require('fs');
const cheerio = require('cheerio');
const got = require('got');

const theAddr = "https://williamkpchan.github.io/LibDocs/GoNotes.html";

got(theAddr).then(response => {
  const $ = cheerio.load(response.body);
  $('h2').each(function (i) {
    console.log($(this).text());  // extract text content
    console.log(String($(this))); // convert object to string
  });
}).catch(err => {
  console.log(err);
});

sample

const cheerio = require('cheerio');
const $ = cheerio.load('<h2 class="title">Hello world</h2>');

$('h2.title').text('Hello there!');
$('h2').addClass('welcome');

$.html();
//=> <html><head></head><body><h2 class="title welcome">Hello there!</h2></body></html>

Copy to clipboard

const clipboardy = require('clipboardy');

// Copy
clipboardy.writeSync('🦄');

// Paste
clipboardy.readSync(); //=> '🦄'

Debugging

Nodejs Debugging Guide

await is only valid in async function

await can only be called in a function marked as async.

(async function () {
  var body = await httpGet('link');
  $.response.setBody(body);
})();

Basically, when you use one asynchronous operation, you need to make the entire flow asynchronous as well, which is why the await call above is wrapped in an async IIFE (immediately invoked function expression). Under the hood, an async function behaves much like an ES6 generator and always returns a promise.

Promises

Promises simplify deferred and asynchronous computations. A promise represents an operation that hasn't completed yet.
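As a quick, minimal sketch (the one-second timer and the resolved value are made up for illustration), here is how a promise is created and consumed:

// Create a promise that resolves after one second.
const wait = new Promise((resolve, reject) => {
  setTimeout(() => resolve('done!'), 1000);
});

// Consume it: .then() runs once the operation completes,
// .catch() runs if it fails (reject was called or an error was thrown).
wait
  .then(result => console.log(result)) // "done!" after ~1 second
  .catch(err => console.error(err));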

chalk colors

Example: chalk.red.bold.underline('Hello', 'world');

Colors: black, red, green, yellow, blue, magenta, cyan, white, blackBright (alias: gray, grey), redBright, greenBright, yellowBright, blueBright, magentaBright, cyanBright, whiteBright

Background colors: bgBlack, bgRed, bgGreen, bgYellow, bgBlue, bgMagenta, bgCyan, bgWhite, bgBlackBright (alias: bgGray, bgGrey), bgRedBright, bgGreenBright, bgYellowBright, bgBlueBright, bgMagentaBright, bgCyanBright, bgWhiteBright

Modifiers:
reset - Resets the current color chain.
bold - Make text bold.
dim - Emit only a small amount of light.
italic - Make text italic. (Not widely supported)
underline - Make text underline. (Not widely supported)
inverse - Invert background and foreground colors.
hidden - Prints the text, but makes it invisible.
strikethrough - Puts a horizontal line through the center of the text. (Not widely supported)
visible - Prints the text only when Chalk has a color level > 0. Can be useful for things that are purely cosmetic.

Node.js MySQL Tutorial About CRUD Application


Serving static files in Express

https://expressjs.com/en/starter/static-files.html To serve static files such as images, CSS files, and JavaScript files, use the express.static built-in middleware function in Express. The function signature is:

express.static(root, [options])

The root argument specifies the root directory from which to serve static assets. For example:

app.use(express.static('public'))

Now, you can load the files that are in the public directory:

http://localhost:3000/images/kitten.jpg
http://localhost:3000/css/style.css
http://localhost:3000/js/app.js
http://localhost:3000/images/bg.png
http://localhost:3000/hello.html

Express looks up the files relative to the static directory, so the name of the static directory is not part of the URL. To use multiple static assets directories, call the express.static middleware function multiple times:

app.use(express.static('public'))
app.use(express.static('files'))

Express looks up the files in the order in which you set the static directories with the express.static middleware function. NOTE: For best results, use a reverse proxy cache to improve performance of serving static assets. To create a virtual path prefix (where the path does not actually exist in the file system) for files that are served by the express.static function, specify a mount path for the static directory, as shown below:

app.use('/static', express.static('public'))

Now, you can load the files that are in the public directory from the /static path prefix:

http://localhost:3000/static/images/kitten.jpg
http://localhost:3000/static/css/style.css
http://localhost:3000/static/js/app.js
http://localhost:3000/static/images/bg.png
http://localhost:3000/static/hello.html

However, the path that you provide to the express.static function is relative to the directory from where you launch your node process. If you run the express app from another directory, it's safer to use the absolute path of the directory that you want to serve:

app.use('/static', express.static(path.join(__dirname, 'public')))

express dynamic content

Node.js dynamic content
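This entry is only a link stub, so here is a minimal sketch of serving dynamic content with Express: a route parameter and a query value are interpolated into the response at request time (the route and parameter names are made up for illustration).

const express = require('express');
const app = express();

// A dynamic route: the :name segment changes per request.
app.get('/hello/:name', (req, res) => {
  const name = req.params.name;                   // from the URL path
  const greeting = req.query.greeting || 'Hello'; // from ?greeting=...
  res.send(`<h1>${greeting}, ${name}!</h1><p>Served at ${new Date().toISOString()}</p>`);
});

app.listen(3000, () => console.log('listening on http://localhost:3000'));

// Try: http://localhost:3000/hello/world?greeting=Hi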

Socket.io with multiple clients

Socket.io with multiple clients connecting to same server Server side:

// you have your socket ready and inside the on('connection', ...) you handle
// a register event where the client passes an id if one exists, else you create one
// (assumes: var uuid = require('uuid'); and a clientToRooms dictionary in scope)
var clientToRooms = {};

// a client requests registration
socket.on('register', function (clientUuid) {
  // create an id if the client doesn't already have one
  var id = clientUuid == null ? uuid.v4() : clientUuid;
  var ns = "/" + id;
  socket.join(id); // create a room using this id, only for this client
  var nsp = app.io.of(ns);
  // save it to a dictionary for future use
  clientToRooms[ns] = nsp;
  // set up what to do on connection
  nsp.on('connection', function (nsSocket) {
    console.log('someone connected');
    nsSocket.on('Info', function (data) {
      // just an example
    });
  });
});

Client side:

// you already have declared uuid, uuidSocket and have connected to the socket
// previously, so you define what to do on register:
socket.on("register", function (data) {
  if (uuid == undefined || uuidSocket == undefined) { // first time we get id from server
    // save id to a variable
    uuid = data.uuid;
    // save to localStorage for further usage (optional - e.g. one client per browser)
    localStorage.setItem('socketUUID', uuid);
    uuidSocket = io(serverHost + "/" + uuid); // set up the room --> triggers nsp.on('connection', ...) on the server
    uuidSocket.on("Info", function (data) {
      // handle Info
    });
  }
});

// initiate the register from the client
socket.emit("register", uuid);

Send broadcast to all connected client in node js

var WebSocketServer = require("ws").Server;
var wss = new WebSocketServer({ port: 8100 });

wss.on('connection', function connection(ws) {
  ws.on('message', function (message) {
    wss.broadcast(message);
  });
});

wss.broadcast = function broadcast(msg) {
  console.log(msg);
  wss.clients.forEach(function each(client) {
    client.send(msg);
  });
};

socket.io broadcast to all connected sockets

server:

// emit only to the socket that the sender is connected to
socket.on('target', function (index) {
  // hard-coded answers
  var solution = "43526978";
  console.log('index: ' + solution[index]);
  socket.emit('targetResult', solution[index]);
});

In order to emit to everyone, use the following syntax:

socket.on('target', function (index) {
  // hard-coded answers
  var solution = "43526978";
  console.log('index: ' + solution[index]);
  io.sockets.emit('targetResult', solution[index]);
});

Notice that socket.emit changed to io.sockets.emit. This makes socket.io broadcast to all connected sockets.

Real Time Applications with Socket.io

https://www.rithmschool.com/courses/intermediate-node-express/real-time-applications

io.on('connection', function (socket) {
  console.log("connection!");
  io.sockets.emit('from server', 'HELLO!');
  socket.on('from client', function (data) {
    console.log(data);
  });
});

http.listen(3000, function () {
  console.log('listening on localhost:3000');
});

client

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>First Socket.io application</title>
</head>
<body>
  <script src="/socket.io/socket.io.js"></script>
  <script>
    var socket = io();
    socket.on('from server', function (data) {
      console.log(data);
      socket.emit('from client', 'WORLD!');
    });
  </script>
</body>
</html>

The client creates a websocket connection with the server (var socket = io()). Whenever a connection is created, the server receives a 'connection' event. In our application, this causes the server to send to all connected websockets a 'from server' event, with a message of 'HELLO!'. The client is set up to receive 'from server' events. When it receives such an event, it console logs the data, then emits its own event, called 'from client', with data of 'WORLD!'. Once the server receives a 'from client' event, it logs the corresponding data to the terminal. One thing to note is that when a client connects, the 'from server' event gets emitted to all websocket connections (we emit on io.sockets, not socket). You can verify that all clients receive the event by going to localhost:3000 on two separate tabs. The first tab should have 'HELLO!' logged twice: once when it connected to the server, and once when the other tab connected!

Different kinds of messages and rooms

When using Socket.io, there are different types of messages you may want to send to different users. For managing chat rooms, socket.io has the idea of a room, which has its own name, and each socket has its own id to ensure private messages can work. Here are the helpful methods for sending certain types of messages to certain users:

io.emit('name of event'); or io.sockets.emit('name of event'); - sends to everyone in every room, including the sender
io.to('name of room').emit('name of event'); - sends to everyone including the sender, in a room (the first parameter to to)
socket.broadcast.to('name of room').emit('name of event'); - sends to everyone except the sender in a room (the first parameter to to)
socket.emit('name of event'); - sends to the sender and no one else
socket.broadcast.to(someOtherSocket.id).emit(); - sends to a specific socket only (used for private chat)

Define what a mailer is

Use nodemailer to send users emails

Mailers

Another very common task when building backend applications is sending email to users. This requires setting up an email server and configuring it with your transactional mail provider (Mandrill, SendGrid, Gmail etc.). To get started sending mail to your users, check out Nodemailer.

A sample application to send emails

Since Gmail is not the easiest to configure, and Mandrill and SendGrid do not have a free tier, we will be using mailgun to set up transactional email. You can create a free account here. Let's now imagine that we want to send some information to a user when a form is submitted. Here is what that configuration might look like:

require('dotenv').load();
var express = require("express");
var app = express();
var bodyParser = require("body-parser");
var nodemailer = require('nodemailer');
var mg = require('nodemailer-mailgun-transport');

app.set("view engine", "pug");
app.use(bodyParser.urlencoded({ extended: true }));

var auth = {
  auth: {
    api_key: process.env.SECRET_KEY,
    domain: process.env.DOMAIN
  }
};

var nodemailerMailgun = nodemailer.createTransport(mg(auth));

app.get("/", function (req, res, next) {
  res.render("index");
});

app.get("/new", function (req, res, next) {
  res.render("new");
});

app.post('/', function (req, res, next) {
  var mailOpts = {
    from: 'elie@yourdomain.com',
    to: req.body.to,
    subject: req.body.subject,
    text: 'test message from mailgun',
    html: '<b>test message from mailgun</b>'
  };
  nodemailerMailgun.sendMail(mailOpts, function (err, response) {
    if (err) res.send(err);
    else res.send('email sent!');
  });
});

app.listen(3000, function () {
  console.log("Server is listening on port 3000");
});

As an exercise, try to work with this code to create an application that sends email!

Define what web scraping is. Use cheerio to scrape data from a website.

Web Scraping

Web scraping is the process of downloading and extracting data from a website. There are 3 main steps in scraping:

Downloading the HTML document from a website (we will be doing this with the request module)
Extracting data from the downloaded HTML (we will be doing this with cheerio)
Doing something with the data (usually saving it somehow, e.g. by writing to a file with fs or saving to a database)

Typically, you would want to access the data using a website's API, but often websites don't provide this programmatic access. When a website doesn't provide a programmatic way to download data, web scraping is a great way to solve the problem!

Robots.txt

Before you begin web scraping, it is a best practice to understand and honor a site's robots.txt file. The file may exist on any website that you visit and its role is to tell programs (like our web scraper) about rules on what it should and should not download on the site. Here is Rithm School's robots.txt file. As you can see, it doesn't provide any restrictions. Compare that file to Craigslist's robots.txt file, which is much more restrictive on what can be downloaded by a program. You can find out more information about the robots.txt file here.

Using cheerio

Cheerio is one of the many modules Node has for web scraping, but it is by far the easiest to get up and running with, especially if you know jQuery! The library is based off of jQuery and has identical functions for finding, traversing and manipulating the DOM. However, cheerio expects you to have an HTML page which it will load for you to work with. In order to retrieve the page, we need to make an HTTP request to get the HTML, and we will be using the request module to do that. Let's start with a simple application:

mkdir scraping_example && cd scraping_example
touch app.js
npm init -y
npm install --save cheerio request

Now in our app.js, let's scrape the first page of Craigslist:

var cheerio = require("cheerio");
var request = require("request");

request('https://sfbay.craigslist.org/search/apa?bedrooms=1&bathrooms=1&availabilityMode=0', function (err, response, body) {
  var $ = cheerio.load(body);
  // let's see the average price of 1 bedroom and bathroom in san francisco
  // (based on 1 page of craigslist...)
  var avg = Array.from($(".result-price")).reduce(function (acc, next) {
    return acc + parseInt($(next).text().substr(1));
  }, 0) / $(".result-price").length;
  console.log(`Average 1 bedroom price: \$${avg.toFixed(2)}`);
});

In the terminal, if you run node app.js, it should tell you what the average price of a one-bedroom apartment is in the Bay Area!

Define what a background job is. Explain what redis is and its use in background jobs. Use kue to run background jobs.

Background jobs

Another common issue when building applications is ensuring that long processes or tasks are not blocking or slowing down the entire application. This could happen when many emails are being sent, large files are being uploaded, or when you want to execute a process when you know there will be less traffic. Background job libraries often involve using another data store (usually a queue) to handle the order and management of jobs being processed. Kue is a very common tool (written by the same people who made Mongoose!) for handling background jobs. You can read more about it here.

Getting started with kue

To get started with kue we need to npm install --save kue, require the kue module, and create a queue (which is backed by an in-memory data store called redis):

var kue = require('kue');
var queue = kue.createQueue();

Once you have created the queue, it's time to queue up some tasks! These tasks can be time-consuming web scraping, gathering analytics, making bulk database writes, uploading files or sending emails. The route handler queues (produces) a job, while queue.process registers a worker that consumes jobs:

// producer: queue up an email job when a form is submitted
router.post('/', function (req, res, next) {
  const { title, to, subject, message } = req.body;
  queue.create('email', { title, to, subject, message }).save(function (err) {
    if (err) return next(err);
    res.send('email job queued!');
  });
});

// consumer: process email jobs as they arrive
queue.process('email', function (job, done) {
  const { title, to, subject, message } = job.data;
  // sendEmail is your own helper that uses nodemailer (or another tool)
  // to actually send the mail, then calls done()
  sendEmail(title, to, subject, message, done);
});

Kue UI

Kue also ships with a nice package called kue-dashboard, which provides an interface for you to see jobs running, stalled, failed, completed and much more. You can access it by starting a new server with node_modules/kue/bin/kue-dashboard -p 3001

Node.js Websocket Examples with Socket.io

What are Websockets?

Over the past few years, a new type of communication started to emerge on the web and in mobile apps, called websockets. This new protocol opens up a much faster and more efficient line of communication to the client. Like HTTP, websockets run on top of a TCP connection, but they're much faster because we don't have to open a new connection each time we want to send a message; the connection is kept alive for as long as the server or client wants. Even better, since the connection never dies, we finally have full-duplex communication available to us, meaning we can push data to the client instead of having to wait for them to ask for data from the server. This allows for data to be communicated back and forth, which is ideal for things like real-time chat applications, or even games.

Some Websocket Examples

Of the many different websocket libraries for Node.js available to us, I chose to use socket.io throughout this article because it seems to be the most popular and is, in my opinion, the easiest to use. While each library has its own unique API, they also have many similarities since they're all built on top of the same protocol, so hopefully you'll be able to translate the code below to any library you want to use. For the HTTP server, I'll be using Express, which is the most popular Node server out there. Keep in mind that you can also just use the plain http module if you don't need all of the features of Express; since most applications will use Express, though, that's what we'll be using as well. Note: Throughout these examples I have removed much of the boilerplate code, so some of this code won't work out of the box. In most cases you can refer to the first example to get the boilerplate code.

Establishing the Connection

In order for a connection to be established between the client and server, the server must do two things:

  • Hook in to the HTTP server to handle websocket connections
  • Serve up the socket.io.js client library as a static resource

In the code below, you can see item (1) being done on the 3rd line. Item (2) is done for you (by default) by the socket.io library and is served on the path /socket.io/socket.io.js. By default, all websocket connections and resources are served within the /socket.io path.

Server

var app = require('express')();
var server = require('http').Server(app);
var io = require('socket.io')(server);

app.get('/', function(req, res) {
  res.sendFile(__dirname + '/index.html');
});

server.listen(8080);

The client needs to do two things as well:

  • Load the library from the server
  • Call .connect() to the server address and websocket path

Client

<script src="/socket.io/socket.io.js"></script>
<script>
  var socket = io.connect('/');
</script>

If you navigate your browser to http://localhost:8080 and inspect the HTTP requests behind the scenes using your browser's developer tools, you should be able to see the handshake being executed, including the GET requests and resulting HTTP 101 Switching Protocols response.

Sending Data from Server to Client

Okay, now on to some of the more interesting parts. In this example we'll be showing you the most common way to send data from the server to the client. In this case, we'll be sending a message to a channel, which can be subscribed to and received by the client. So, for example, a client application might be listening on the 'announcements' channel, which would contain notifications about system-wide events, like when a user joins a chat room. On the server this is done by waiting for the new connection to be established, then by calling socket.emit() to send a message to the newly connected client (every client receives it on connecting).

Server

io.on('connection', function(socket) {
  socket.emit('announcements', { message: 'A new user has joined!' });
});

Client

<script src="/socket.io/socket.io.js"></script>
<script>
  var socket = io.connect('/');

  socket.on('announcements', function(data) {
    console.log('Got announcement:', data.message);
  });
</script>

Sending Data from Client to Server

But what would we do when we want to send data the other way, from client to server? It is very similar to the last example, using both the socket.emit() and socket.on() methods.

Server

io.on('connection', function(socket) {
  socket.on('event', function(data) {
    console.log('A client sent us this dumb message:', data.message);
  });
});

Client

<script src="/socket.io/socket.io.js"></script>
<script>
  var socket = io.connect('/');

  socket.emit('event', { message: 'Hey, I have an important message!' });
</script>

Counting Connected Users

This is a nice example to learn from since it shows a few more features of socket.io (like the disconnect event), it's easy to implement, and it is applicable to many webapps. We'll be using the connection and disconnect events to count the number of active users on our site, and we'll update all users with the current count.

Server

var numClients = 0;

io.on('connection', function(socket) {
  numClients++;
  io.emit('stats', { numClients: numClients });

  console.log('Connected clients:', numClients);

  socket.on('disconnect', function() {
    numClients--;
    io.emit('stats', { numClients: numClients });

    console.log('Connected clients:', numClients);
  });
});

Client

<script src="/socket.io/socket.io.js"></script>
<script>
  var socket = io.connect('/');

  socket.on('stats', function(data) {
    console.log('Connected clients:', data.numClients);
  });
</script>

A much simpler way to track the user count on the server would be to just use this:

var numClients = io.sockets.clients().length;

But apparently there are some issues surrounding this, so you might have to keep track of the client count yourself.

Rooms and Namespaces

Chances are as your application grows in complexity, you'll need more customization with your websockets, like sending messages to a specific user or set of users. Or maybe you need strict separation of logic between different parts of your app. This is where rooms and namespaces come in to play. Note: These features are not part of the websocket protocol, but added on top by socket.io. By default, socket.io uses the root namespace (/) to send and receive data. Programmatically, you can access this namespace via io.sockets, although many of its methods have shortcuts on io. So these two calls are equivalent:

io.sockets.emit('stats', { data: 'some data' });
io.emit('stats', { data: 'some data' });

To create your own namespace, all you have to do is the following:

var iosa = io.of('/stackabuse');

iosa.on('connection', function(socket){
  console.log('Connected to Stack Abuse namespace');
});

iosa.emit('stats', { data: 'some data' });

Also, the client must connect to your namespace explicitly:

<script src="/socket.io/socket.io.js"></script>
<script>
  var socket = io('/stackabuse');
</script>

Now any data sent within this namespace will be separate from the default / namespace, regardless of which channel is used. Going even further, within each namespace you can join and leave 'rooms'. These rooms provide another layer of separation on top of namespaces, and since a client can only be added to a room on the server side, they also provide some extra security. So if you want to make sure users aren't snooping on certain data, you can use a room to hide it. To be added to a room, you must .join() it:

io.on('connection', function(socket){
  socket.join('private-message-room');
});

Then from there you can send messages to everyone belonging to the given room:

io.to('private-message-room').emit('some event');

And finally, call .leave() to stop getting event messages from a room:

socket.leave('private-message-room');

Conclusion

This is just one library that implements the websockets protocol, and there are many more out there, all with their own unique features and strengths. I'd advise trying out some of the others (like node-websockets) so you get a feel for what's out there. Within just a few lines, you can create some pretty powerful applications, so I'm curious to see what you can come up with!

Built-in HTTP Module

Read the Query String

The req argument has a property called "url" which holds the part of the url that comes after the domain name:

var http = require('http');

http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/html'});
  res.write(req.url);
  res.end();
}).listen(8080);

When opening the two addresses below, you will see two different results:

http://localhost:8080/summer will produce this result: /summer
http://localhost:8080/winter will produce this result: /winter

Split the Query String

Node.js provides built-in modules to split the query string into readable parts, such as the URL module:

var http = require('http');
var url = require('url');

http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/html'});
  var q = url.parse(req.url, true).query;
  var txt = q.year + " " + q.month;
  res.end(txt);
}).listen(8080);

The address http://localhost:8080/?year=2017&month=July will produce this result: 2017 July

Single Web Socket Connection per User

Recently, I have been working on a real-time multi-player browser game and ran into the “single-session” problem. Essentially, I wanted to prevent a user from connecting more than once via web sockets. This is important because being logged on to the same account multiple times could create unfair scenarios and makes the server logic more complex. Since web socket connections are long lived, I needed to find a way to prevent this.

Wish list

A user can only be connected once, no matter how many browser tabs they have open.
A user can be identified via their authentication token.
The system must work in a clustered environment. Individual server nodes should be able to go down without affecting the rest of the system.
Authorization tokens should not be passed via query parameters, instead via a dedicated authentication event after the connection is established.

For this project we will use Node.js, Socket.IO, and Redis.

Humble Beginnings

Let's set up our project and get this show on the road. You can check out the full GitHub repo here. First, we will set up our Socket.IO server to accept connections from the front-end.

const http = require('http');
const io = require('socket.io')();

const PORT = process.env.PORT || 9000;
const server = http.createServer();

io.attach(server);

io.on('connection', (socket) => {
  console.log(`Socket ${socket.id} connected.`);

  socket.on('disconnect', () => {
    console.log(`Socket ${socket.id} disconnected.`);
  });
});

server.listen(PORT);

(A Socket.IO server in its simplest form) By default, the server will listen on port 9000 and echo the connection status of each client to the console. Socket.IO provides a built-in mechanism to generate a unique socket id which we will use to identify our client's socket connection. Next, we create a sample page to connect to our server. This page consists of a status display, an input box for our secret token (we will use it for authentication down the road) and buttons to connect and disconnect.

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8" />
  <title>Single User Websocket</title>
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <script src="https://cdnjs.cloudflare.com/ajax/libs/socket.io/2.1.1/socket.io.js"></script>
  <script src="index.js"></script>
</head>
<body>
  <h1>Single User Websocket Demo</h1>
  <p>
    <label for="status">Status: </label>
    <input type="text" id="status" name="status" value="Disconnected" readonly="readonly" style="width: 300px;" />
  </p>
  <p>
    <label for="token">My Token: </label>
    <input type="text" id="token" name="token" value="secret token" />
  </p>
  <p>
    <button id="connect" onclick="connect()">
      Connect
    </button>
    <button id="disconnect" onclick="disconnect()" disabled>
      Disconnect
    </button>
  </p>
</body>
</html>

(Sample front-end mark-up with inputs and buttons to connect and disconnect) Also, we need to set up some very rudimentary logic to perform the connect/disconnect and hook up our status and token inputs.

const socketUrl = 'http://localhost:9000';

let connectButton;
let disconnectButton;
let socket;
let statusInput;
let tokenInput;

const connect = () => {
  socket = io(socketUrl, {
    autoConnect: false,
  });

  socket.on('connect', () => {
    console.log('Connected');
    statusInput.value = 'Connected';
    connectButton.disabled = true;
    disconnectButton.disabled = false;
  });

  socket.on('disconnect', (reason) => {
    console.log(`Disconnected: ${reason}`);
    statusInput.value = `Disconnected: ${reason}`;
    connectButton.disabled = false;
    disconnectButton.disabled = true;
  });

  socket.open();
};

const disconnect = () => {
  socket.disconnect();
};

document.addEventListener('DOMContentLoaded', () => {
  connectButton = document.getElementById('connect');
  disconnectButton = document.getElementById('disconnect');
  statusInput = document.getElementById('status');
  tokenInput = document.getElementById('token');
});

(Our basic front-end logic… for now) This is everything you need to set up a basic web socket client and server. At this moment, we can connect, disconnect, and log the connection status to the user. And all of this in vanilla JavaScript too! 🍻 Next up: authenticating users.

Authentication

Letting users connect without knowing who they are is of little use. Let's add basic token authentication to the connection. We assume that the connection uses SSL/TLS once deployed. Never use an unencrypted connection. Ever. 😶 At this point we have a few options: a) append a user's token to the query string when they are connecting, or b) let any user connect and require them to send an authentication message after they connect. The Web Socket protocol specification (RFC 6455) does not prescribe a particular way for authentication and it does not allow for custom headers, and since query parameters could be logged by the server, I chose option b) for this example. We will implement the authentication with socketio-auth by Facundo Olano, an auth module for Socket.IO which allows us to prompt the client for a token after they connect. Should the user not provide it within a certain amount of time, we will close the connection from the server.

const http = require('http');
const io = require('socket.io')();
const socketAuth = require('socketio-auth');

const PORT = process.env.PORT || 9000;
const server = http.createServer();

io.attach(server);

// dummy user verification
async function verifyUser(token) {
  return new Promise((resolve, reject) => {
    // setTimeout to mock a cache or database call
    setTimeout(() => {
      // this information should come from your cache or database
      const users = [
        {
          id: 1,
          name: 'mariotacke',
          token: 'secret token',
        },
      ];

      const user = users.find((user) => user.token === token);

      if (!user) {
        return reject('USER_NOT_FOUND');
      }

      return resolve(user);
    }, 200);
  });
}

socketAuth(io, {
  authenticate: async (socket, data, callback) => {
    const { token } = data;

    try {
      const user = await verifyUser(token);
      socket.user = user;

      return callback(null, true);
    } catch (e) {
      console.log(`Socket ${socket.id} unauthorized.`);
      return callback({ message: 'UNAUTHORIZED' });
    }
  },
  postAuthenticate: (socket) => {
    console.log(`Socket ${socket.id} authenticated.`);
  },
  disconnect: (socket) => {
    console.log(`Socket ${socket.id} disconnected.`);
  },
});

server.listen(PORT);

(Hooking up socketio-auth with a dummy user lookup) We hook up socketAuth by passing it our io instance and configuration options in the form of three events: authenticate, postAuthenticate, and disconnect. First, our authenticate event is triggered after a client connects and emits a subsequent authentication event with a user token payload. Should the client not send this authentication event within a configurable amount of time, socketio-auth will terminate the connection. Once the user has sent their token, we verify it against our known users in a database. For example purposes, I created an async verifyUser method that mimics a real database or cache lookup. If the user is found, it will be returned, otherwise the promise is rejected with reason USER_NOT_FOUND. If all goes well, we invoke the callback and mark the socket as authenticated, or return UNAUTHORIZED if the token is invalid. We have to adapt our front-end code to send us the user's token upon connection. We modify our connect function as follows:

const connect = () => {
  let error = null;

  socket = io(socketUrl, {
    autoConnect: false,
  });

  socket.on('connect', () => {
    console.log('Connected');
    statusInput.value = 'Connected';
    connectButton.disabled = true;
    disconnectButton.disabled = false;

    socket.emit('authentication', {
      token: tokenInput.value,
    });
  });

  socket.on('unauthorized', (reason) => {
    console.log('Unauthorized:', reason);
    error = reason.message;
    socket.disconnect();
  });

  socket.on('disconnect', (reason) => {
    console.log(`Disconnected: ${error || reason}`);
    statusInput.value = `Disconnected: ${error || reason}`;
    connectButton.disabled = false;
    disconnectButton.disabled = true;
    error = null;
  });

  socket.open();
};

(Modified front-end code to emit the user authentication token upon connection) We added two things: socket.emit('authentication', { token }) to tell the server who we are, and an event listener socket.on('unauthorized') to react to rejections from our server. Now we have a system in place that lets us authenticate users and optionally kick them out should they not provide us a token after they initially connect. This however still does not prevent a user from connecting twice with the same token. Open a separate window and try it out. To force a single session, our server has to smarten up. 💡

Preventing Multiple Connections

Making sure that a user is only connected once is simple enough on a single server, since all connections are held in that server's memory. We can simply iterate through all connected clients and compare their ids with the new client. This approach breaks down when we talk about clusters, however. There is no easy way to determine if a particular user is connected or not without issuing a query across all nodes. With many users connecting, this creates a bottleneck. Surely there has to be a better way. Enter distributed locks with Redis. We will use Redis to lock and unlock resources, in our case: user sessions. Distributed locks are hard and you can read all about them here. For our use case, we will implement a resource lock on a single Redis node. Let's get started. The first thing we will do is connect Socket.IO to Redis to enable pub/sub across multiple Socket.IO servers. We will use the socket.io-redis adapter provided by Socket.IO.

const http = require('http');
const io = require('socket.io')();
const socketAuth = require('socketio-auth');
const adapter = require('socket.io-redis');

const PORT = process.env.PORT || 9000;
const server = http.createServer();

const redisAdapter = adapter({
  host: process.env.REDIS_HOST || 'localhost',
  port: process.env.REDIS_PORT || 6379,
  password: process.env.REDIS_PASS || 'password',
});

io.attach(server);
io.adapter(redisAdapter);

// dummy user verification ...

(We use the Socket.IO Redis adapter to enable pub/sub) This Redis server is used for its pub/sub functionality to coordinate events across multiple Socket.IO instances such as new sockets joining, exchanging messages, or disconnects. In our example, we will reuse the same server for our resource locks, though it could use a different Redis server as well. Let's create our Redis client as a separate module and promisify the methods so we can use async / await.

const bluebird = require('bluebird');
const redis = require('redis');

bluebird.promisifyAll(redis);

const client = redis.createClient({
  host: process.env.REDIS_HOST || 'localhost',
  port: process.env.REDIS_PORT || 6379,
  password: process.env.REDIS_PASS || 'password',
});

module.exports = client;

(A sample Redis client module) Let's talk theory for a moment. What is it exactly we are trying to achieve? We want to prevent users from having more than one concurrent web socket connection to us at any given time. For an online game this is important because we want to avoid users using their account for multiple games at the same time. Also, if we can guarantee that only a single user session per user exists, our server logic is simplified. To make this work, we must keep track of each connection, acquire a lock, and terminate other connections should the same user try to connect again. To acquire a lock, we use Redis' SET method with NX and an expiration (more on the expiration later). NX will make sure that we only set the key if it does not already exist. If it does, the command returns null. We can use this setup to determine if a session already exists and abort if it does. We modify our authenticate function as follows:

authenticate: async (socket, data, callback) => {
  const { token } = data;

  try {
    const user = await verifyUser(token);

    const canConnect = await redis
      .setAsync(`users:${user.id}`, socket.id, 'NX', 'EX', 30);

    if (!canConnect) {
      return callback({ message: 'ALREADY_LOGGED_IN' });
    }

    socket.user = user;

    return callback(null, true);
  } catch (e) {
    console.log(`Socket ${socket.id} unauthorized.`);
    return callback({ message: 'UNAUTHORIZED' });
  }
},

(Modified authenticate event handler with Redis lock) Once we have verified that a user has a valid token, we attempt to acquire a lock for their session. If Redis can SET the key, it means that it did not previously exist. We also added EX 30 to the command to auto-expire the lock after 30 seconds. This is important because our server or Redis might crash and we don't want to lock out our users forever. The reason I chose 30 seconds is because Socket.IO has a default ping of 25 seconds, that is, every 25 seconds it will probe connected users to see if they are still connected. In the next section, we will make use of this to renew the lock. To renew the lock, we're going to hook into the packet event of our socket connection to intercept ping packages. These are received every 25 seconds by default. If a package is not received by then, Socket.IO will terminate the connection.

postAuthenticate: async (socket) => {
  console.log(`Socket ${socket.id} authenticated.`);

  socket.conn.on('packet', async (packet) => {
    if (socket.auth && packet.type === 'ping') {
      await redis.setAsync(`users:${socket.user.id}`, socket.id, 'XX', 'EX', 30);
    }
  });
},

(Hooking into the internal "packet" event of Socket.IO) We're using the postAuthenticate event to register our packet event handler. Our handler then checks if the socket is authenticated via socket.auth and if the packet is of type ping. To renew the lock, we will again use Redis' SET command, this time with XX instead of NX. XX states that it will only be set if it already exists. We use this mechanism to refresh the expiration time on the key every 25 seconds. We can now authenticate users, acquire a lock per user id, and prevent multiple sessions from being created. Our locks will remain in effect as long as the clients report back to our servers every 25 seconds. Yet, there is one use case we have overlooked: if a user closes their browser with an active connection and attempts to reconnect, they will erroneously receive an ALREADY_LOGGED_IN message. This is because the previous lock is still in effect. To properly release the lock when a user intentionally leaves our site, we must remove the lock from Redis upon disconnect.

disconnect: async (socket) => {
  console.log(`Socket ${socket.id} disconnected.`);

  if (socket.user) {
    await redis.delAsync(`users:${socket.user.id}`);
  }
},

(Removing the session lock when a user disconnects) In our disconnect event, we check whether or not the socket was authenticated and then remove the lock from Redis via the DEL command. This cleans up the user session lock and prepares it for the next connection. That's all there is to it! To see our connection flow in action, open two browser windows and click Connect in each of them with the same token; you will receive a status of Disconnected: ALREADY_LOGGED_IN on the latter. Exactly what we wanted. Time to sit back and relax.

Conclusion

This mechanism is stateless and works in a clustered server environment.

make HTTP requests with Axios

https://blog.logrocket.com/how-to-make-http-requests-like-a-pro-with-axios/ Axios is a client HTTP API based on the XMLHttpRequest interface provided by browsers. The most common way for frontend programs to communicate with servers is through the HTTP protocol. The Fetch API and the XMLHttpRequest interface allow you to fetch resources and make HTTP requests. jQuery's $.ajax() function is a client HTTP API. As with Fetch, Axios is promise-based, and it provides a more powerful and flexible feature set.
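As a brief sketch (the URLs and payload are placeholders), a GET and a POST with axios look like this:

const axios = require('axios');

// GET: axios resolves with a response object; the payload lives on .data
axios.get('https://api.example.com/users/1')
  .then(response => console.log(response.data))
  .catch(error => console.error(error.message));

// POST: pass the request body as the second argument
axios.post('https://api.example.com/users', { name: 'Ada' })
  .then(response => console.log(response.status)) // e.g. 201
  .catch(error => console.error(error.message));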

Execute a command line

Use shelljs as follows:

var shell = require('shelljs');

shell.echo('hello world');
shell.exec('node --version');

Install with npm install shelljs

string_decoder

The string_decoder module provides an API for decoding Buffer objects into strings.

end() - Returns what remains of the input stored in the internal buffer
write() - Returns the specified buffer as a string

Example:

const { StringDecoder } = require('string_decoder');
const decoder = new StringDecoder('utf8');

const euro = Buffer.from([0xE2, 0x82, 0xAC]);
console.log(decoder.write(euro)); // €

Decode a stream of binary data (a buffer object) into a string:

var StringDecoder = require('string_decoder').StringDecoder;
var d = new StringDecoder('utf8');
var b = Buffer.from('abc');

console.log(b);          // write buffer
console.log(d.write(b)); // write decoded buffer

Writing Simple Module

Let's write a simple logging module which logs information, warnings or errors to the console. In Node.js, a module should be placed in a separate JavaScript file. So, create a Log.js file and write the following code in it.

Log.js

var log = {
  info: function (info) {
    console.log('Info: ' + info);
  },
  warning: function (warning) {
    console.log('Warning: ' + warning);
  },
  error: function (error) {
    console.log('Error: ' + error);
  }
};

module.exports = log;

In the above logging module, we have created an object with three functions - info(), warning() and error(). At the end, we have assigned this object to module.exports. The module.exports in the above example exposes a log object as a module. module.exports is a special object which is included in every JS file in a Node.js application by default. Use module.exports or exports to expose a function, object or variable as a module in Node.js. Now, let's see how to use the above logging module in our application.

Loading Local Module

To use local modules in your application, you need to load them using the require() function in the same way as a core module. However, you need to specify the path of the JavaScript file of the module. The following example demonstrates how to use the above logging module contained in Log.js.

app.js

var myLogModule = require('./Log.js');

myLogModule.info('Node.js started');

In the above example, app.js is using the log module. First, it loads the logging module using the require() function with the path where the logging module is stored. The logging module is contained in the Log.js file in the root folder, so we have specified the path './Log.js' in the require() function. The '.' denotes the root folder. The require() function returns a log object because the logging module exposes an object in Log.js using module.exports. So now you can use the logging module as an object and call any of its functions using dot notation, e.g. myLogModule.info(), myLogModule.warning() or myLogModule.error(). Run the above example using the command prompt (in Windows) as shown below.

C:\> node app.js
Info: Node.js started

Thus, you can create a local module using module.exports and use it in your application.

Working with images

Manipulate images

gm - GraphicsMagick and ImageMagick are two popular tools for creating, editing, composing and converting images; the gm module makes both available from Node.js.
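A minimal sketch of the gm module (it assumes GraphicsMagick or ImageMagick is installed on the machine, and the file names are placeholders):

var gm = require('gm');

// Resize photo.jpg to 240px wide (height scaled proportionally),
// fix the orientation, and write the result to a new file.
gm('photo.jpg')
  .resize(240)
  .autoOrient()
  .write('photo-small.jpg', function (err) {
    if (err) console.error(err);
    else console.log('photo-small.jpg written');
  });

// To use ImageMagick instead of GraphicsMagick:
// var gm = require('gm').subClass({ imageMagick: true });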

Process images

Sharp - Sharp claims to be four to five times faster than ImageMagick.
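A short sketch of sharp's promise-based API (file names are placeholders):

const sharp = require('sharp');

// Resize input.jpg to 300x200 and write it out as WebP.
sharp('input.jpg')
  .resize(300, 200)
  .toFormat('webp')
  .toFile('output.webp')
  .then(info => console.log('done:', info)) // width, height, size, ...
  .catch(err => console.error(err));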

Generate sprite sheets

spritesmith - Sprite sheets are bitmap files that contain many different small images (for example icons), and they are often used to reduce the overhead of downloading images and speed up overall page load. Generating sprite sheets manually is very cumbersome, but with spritesmith you can automate the process. This module takes a folder as input and combines all the images in it into one sprite sheet. It also generates a JSON file that contains all the coordinates for each of the images in the resulting image, which you can directly copy in your CSS code.
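Here is a rough sketch of the programmatic API (the icon file names are placeholders; the result is written out manually):

const fs = require('fs');
const Spritesmith = require('spritesmith');

// Combine individual icons into one sprite sheet.
Spritesmith.run({ src: ['icons/a.png', 'icons/b.png', 'icons/c.png'] }, function (err, result) {
  if (err) throw err;

  // result.image is a binary buffer of the combined image
  fs.writeFileSync('sprite.png', result.image);

  // result.coordinates maps each source file to {x, y, width, height}
  fs.writeFileSync('sprite.json', JSON.stringify(result.coordinates, null, 2));
});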

Dates, strings, colours

Format dates

Moment - Moment.js is a great alternative to JavaScript's Date object. The standard JavaScript API already comes with the Date object for working with dates and times. However, this object is not very user-friendly when it comes to printing and formatting dates. On the other hand, Moment.js offers a clean and fluid API, and the resulting code is very readable and easy to understand.

moment()
  .add(7, 'days')
  .subtract(1, 'months')
  .year(2009)
  .hours(0)
  .minutes(0)
  .seconds(0);

In addition, there is an add-on available for parsing and formatting dates in different time zones.

    Validate strings

validator When providing forms on a web page, you should always validate the values the user inputs – not only on the client side, but also on the server side, to prevent malicious data. A module that can help you here is validator.js. It provides several methods for validating strings, from isEmail() and isURL() to isMobilePhone() or isCreditCard(), and you can use it on both the server side and the client side.
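A few of those methods in action (a sketch; the sample values are made up):

const validator = require('validator');

console.log(validator.isEmail('foo@bar.com'));       // true
console.log(validator.isURL('https://example.com')); // true
console.log(validator.isCreditCard('1234'));         // false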

Convert colour values

TinyColor Converting colour values from one format into another is one of the tasks every frontend developer needs to do once in a while. TinyColor2 takes care of this programmatically, and it's available for Node.js as well as for browsers. It provides a set of conversion methods (e.g. toHexString(), toRgbString()), as well as methods for all sorts of colour operations (e.g. lighten(), saturate(), complement()).
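A small sketch of typical TinyColor2 usage (the colour values are chosen arbitrarily):

const tinycolor = require('tinycolor2');

const color = tinycolor('red');
console.log(color.toHexString());                        // "#ff0000"
console.log(color.toRgbString());                        // "rgb(255, 0, 0)"
console.log(tinycolor('red').lighten(20).toHexString()); // a lighter shade of red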

    Working with different formats

    Generate PDF files

pdfkit You want to dynamically generate PDF files? Then PDFKit is the module you are looking for. It supports embedding font types, embedding images and the definition of vector graphics, either programmatically (using a Canvas-like API) or by specifying SVG paths. Furthermore, you can define links, include notes, highlight text and more. The best way to start is the interactive browser demo on the PDFKit website.
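A minimal sketch of generating a one-line PDF (the file name and text are placeholders):

const PDFDocument = require('pdfkit');
const fs = require('fs');

const doc = new PDFDocument();
doc.pipe(fs.createWriteStream('example.pdf')); // stream the PDF to disk
doc.fontSize(24).text('Hello, PDFKit!', 100, 100);
doc.end(); // finalize the document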

    Process HTML files

cheerio Ever wanted to process HTML code on the server side and missed the jQuery utility methods? Then Cheerio is the answer. Although it implements only a subset of the core jQuery library, it makes processing HTML on the server side much easier. It is built on top of the htmlparser2 module, an HTML, XML and RSS parser. Plus, according to benchmarks, it's eight times faster than jsdom, another module for working with the DOM on the server side.
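A small sketch of the jQuery-like API (the HTML snippet is made up):

const cheerio = require('cheerio');

const $ = cheerio.load('<ul><li class="item">One</li><li class="item">Two</li></ul>');
$('li.item').each((i, el) => {
    console.log($(el).text()); // "One", then "Two"
});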

    Process CSV files

node-csv The CSV (comma-separated values) format is often used when interchanging table-based data; for example, Microsoft Excel allows you to export or import your data in that format. node-csv simplifies the process of working with CSV data in JavaScript, and provides functionality for generating, parsing, transforming and stringifying CSV. It comes with a callback API, a stream API and a synchronous API, so you can choose the style you prefer.
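A minimal parsing sketch using the csv-parse component (the import shape differs between versions, so treat this as an approximation; the input data is made up):

const { parse } = require('csv-parse'); // older versions: const parse = require('csv-parse')

const input = 'name,age\nSam,22\nRaj,24';
parse(input, { columns: true }, (err, records) => {
    if (err) throw err;
    console.log(records); // [ { name: 'Sam', age: '22' }, { name: 'Raj', age: '24' } ]
});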

    Process markdown files

marked Markdown is a popular format when creating content for the web. If you ever want to process markdown content programmatically (e.g. to write your own markdown editor), marked is worth a look. It takes a string of markdown code as input and outputs the appropriate HTML code. It is even possible to further customise that HTML output by providing custom renderers.
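A minimal sketch (the export shape varies across marked versions, so this is an approximation):

const { marked } = require('marked'); // older versions: const marked = require('marked')

console.log(marked('# Hello\n\nThis is **markdown**.'));
// roughly: <h1>Hello</h1><p>This is <strong>markdown</strong>.</p>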

    Minification

    Minify images

imagemin A very good module for minifying and optimising images is imagemin, which can be used programmatically, via the command line, as a gulp or Grunt plugin, or through imagemin-app (a graphical application available for all three of the big OSs). Its plugin-based architecture means it is also very flexible, and can be extended to support new image formats.

    Minify HTML

html-minifier After minifying images you should consider minifying your web app's HTML. The module HTMLMinifier can be used via the command line, but is also available for gulp and Grunt. On top of that, there are middleware solutions for integrating it into web frameworks like Koa and Express, so you can minify the HTML directly at runtime before serving it to the client via HTTP. According to benchmarks on the module's homepage, it is the best HTML minifier available.
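A small programmatic sketch (the two options shown are just a sample of the many available):

const { minify } = require('html-minifier');

const result = minify('<p    class="intro">  Hello   world  </p>', {
    collapseWhitespace: true,
    removeAttributeQuotes: true
});
console.log(result); // something like: <p class=intro>Hello world</p>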

    Minify CSS

    clean-css As well as images and HTML, you should consider minifying the CSS you send the user. A very fast module in this regard is clean-css, which can be used both from the command line and programmatically. It comes with support for source maps and also provides different compatibility modes to ensure the minified CSS is compatible with older versions of IE.

    Minify JavaScript

UglifyJS2 UglifyJS2 isn't just for minifying code, but it's very good at it. The popular module UglifyJS2 is often used for minifying JavaScript code, but because of its parsing features, in principle you can use it to do anything related to processing JavaScript code. UglifyJS2 parses JavaScript code into an abstract syntax tree (an object model that represents the code) and provides a tree walker component that can be used to traverse that tree. Ever wanted to write your own JavaScript optimiser? Then UglifyJS2 is for you.

    Minify SVG

    svgo Last but not least when it comes to minification, don't forget to minify the SVG content. This format has made a great comeback in the past few years, thanks to its great browser and tool support. Unfortunately, the SVG content that is generated by editors often contains redundant and useless information like comments and metadata.  With SVGO you can easily remove such information and create a minified version of your SVG content. The module has a plugin-based architecture, with (almost) every optimisation implemented as a separate plugin. As with all the other modules regarding minification, SVGO can be used either via the command line or programmatically.

    Utilities

    Log application output

    winston When you are dealing with complex web applications a proper logging library can be very useful to help you find runtime problems, both during development and in production. A very popular module in this regard is the winston library. It supports multiple transports, meaning you can tell winston to simply log to the console, but also to store logs in files or in databases (like CouchDB, MongoDB or Redis) or even stream them to an HTTP endpoint for further processing.
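A minimal sketch of a logger with two transports, following the winston 3.x API (the file name and levels are arbitrary): everything goes to the console, errors additionally to a file.

const winston = require('winston');

const logger = winston.createLogger({
    transports: [
        new winston.transports.Console(),
        new winston.transports.File({ filename: 'error.log', level: 'error' })
    ]
});

logger.info('Server started');
logger.error('Something went wrong');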

    Generate fake data

faker When implementing or testing user interfaces you often need dummy data such as email addresses, user names, street addresses and phone numbers. That is where faker.js comes into play. It can be used either on the server side (as a module for Node.js) or on the client side, and provides a set of methods for generating fake data. Need a user name? Just call faker.internet.userName() and you get a random one. Need a fake company name? Call faker.company.companyName() and you get one. And there are a lot more methods for all types of data.

    Send emails

nodemailer Programmatically sending emails is one of the features you often need when implementing websites. From registration confirmation, to notifying users of special events or sending newsletters, there are a lot of use cases that require you to get in touch with users. The standard Node.js API does not offer such a feature, but fortunately the module Nodemailer fills this gap. It supports both text and HTML content, embedded images and – most importantly – the secure SSL/STARTTLS protocol.
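A rough sketch of sending a message (the host, credentials and addresses are all placeholders):

const nodemailer = require('nodemailer');

const transporter = nodemailer.createTransport({
    host: 'smtp.example.com', // placeholder SMTP server
    port: 465,
    secure: true,             // use SSL/TLS
    auth: { user: 'you@example.com', pass: 'secret' }
});

transporter.sendMail({
    from: 'you@example.com',
    to: 'user@example.com',
    subject: 'Welcome!',
    text: 'Thanks for registering.',
    html: '<b>Thanks for registering.</b>'
}, (err, info) => {
    if (err) return console.error(err);
    console.log('Message sent: ' + info.messageId);
});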

    Create REST APIs

    node-restify REST is the de facto standard when implementing web applications that make use of web services. Frameworks like Express facilitate the creation of such web services, but often come with a lot of features such as templating and rendering that – depending on the use case – you may not need. On the other hand, the Node.js module restify focuses on the creation and the debugging of REST APIs. It has a very similar API to the Connect middleware (which is the base for Express) but gives you more control over HTTP interactions and also supports DTrace for troubleshooting applications in real time. 
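A minimal sketch of a restify service (the route and port are chosen arbitrarily):

const restify = require('restify');

const server = restify.createServer();

server.get('/hello/:name', (req, res, next) => {
    res.send({ greeting: 'Hello, ' + req.params.name });
    return next();
});

server.listen(8080, () => {
    console.log('%s listening at %s', server.name, server.url);
});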

    Create CLI applications

    commander There are already tons of command line applications (CLI applications) written in Node.js to address different use cases (see, for example, the aforementioned modules for minification). If you want to write your own CLI application, the module Commander.js is a very good starting point. It provides a fluent API for defining various aspects of CLI applications like the commands, options, aliases, help and many more, and it really simplifies the process of creating applications for the command line.
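A small sketch of that fluent API (the flag names are examples; where the parsed options land differs slightly between Commander versions):

const { program } = require('commander'); // older versions: const program = require('commander')

program
    .option('-p, --port <number>', 'port to listen on', '3000')
    .option('-v, --verbose', 'enable verbose output');

program.parse(process.argv);

const options = program.opts(); // in older versions the options live on program itself
console.log('Port:', options.port, 'Verbose:', !!options.verbose);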

    Conclusion

We've only scratched the surface of the huge number of Node.js modules out there. JavaScript is more popular than ever before, and new modules pop up every week. A good place to stay up to date is the 'most starred packages' section of the npm homepage or GitHub's list of trending repositories.

    Machine Learning

    Machine Learning USING JAVASCRIPT calculating color contrast Machine Learning USING JAVASCRIPT src code Machine Learning recommendation engine IN THE BROWSER Simple Machine Learning With JavaScript - Brain.js Build Your First Machine Learning AI With Neural Networks JavaScript machine learning libraries

    node.js access localStorage

LocalStorage is never accessible by the server. Ever. It would be a huge security issue. What you can use on the server is node-localstorage, a drop-in substitute that persists to disk:

var LocalStorage = require('node-localstorage').LocalStorage;
localStorage = new LocalStorage('./scratch');

    load json from node

https://stackabuse.com/reading-and-writing-json-files-with-node-js/

const fs = require('fs');

let rawdata = fs.readFileSync('student.json');
let student = JSON.parse(rawdata);
console.log(student);

Alternatively, the get-json package is very simple to use:

$ npm install get-json --save

var getJSON = require('get-json');

getJSON('http://api.listenparadise.org', function(error, response){
    console.log(response);
});

    Reading a tab separated data from text file

Use the JavaScript split function. Note that for the inner split on '\t' to work, the fields in the string must be separated by actual tab characters (written as \t in a string literal):

var r = [];
var t = "sam\ttory\t22;raj\tkumar\t24";
var v = t.split(";");
for (var i = 0; i < v.length; i++) {
    var w = v[i].split("\t");
    r.push({ Fname: w[0], lastname: w[1], Age: w[2] });
}
console.log(r);
// [ { Fname: 'sam', lastname: 'tory', Age: '22' },
//   { Fname: 'raj', lastname: 'kumar', Age: '24' } ]

    Remove empty elements from an array

var array = [0, 1, null, 2, "", 3, undefined, 3,,,,,, 4,, 4,, 5,, 6,,,,];

var result = array.filter(function (item) {
    return item != null; // the loose != comparison filters out both null and undefined
});

// filter also skips the empty slots (array holes), so:
console.log(result); // [0, 1, 2, "", 3, 3, 4, 4, 5, 6]

    Web Scraping with a Headless Browser: Puppeteer

    Headless just means there's no graphical user interface (GUI). Instead of interacting with visual elements the way you normally would—for example with a mouse or touch device—you automate use cases with a command-line interface (CLI).

    Headless Chrome and Puppeteer

There are many web scraping tools that can be used for headless browsing, like Zombie.js or headless Firefox using Selenium. But today we'll be exploring headless Chrome via Puppeteer, as it's a relatively new player, released at the start of 2018. Editor's note: It's worth mentioning Intoli's Remote Browser, another new player. Puppeteer is a Node.js library which provides a high-level API to control headless Chrome or Chromium or to interact with the DevTools protocol. It's maintained by the Chrome DevTools team and an awesome open-source community.

Puppeteer Setup

npm i puppeteer --save

    Using Puppeteer API for Automated Web Scraping

const puppeteer = require('puppeteer');

const url = process.argv[2];
if (!url) {
    throw "Please provide a URL as the first argument";
}

Keep in mind that Puppeteer is a promise-based library: it performs asynchronous calls to the headless Chrome instance under the hood. Let's keep the code clean by using async/await. For that, we need to define an async function first and put all the Puppeteer code in there:

async function run () {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto(url);
    await page.screenshot({path: 'screenshot.png'});
    browser.close();
}

run();

Altogether, the final code looks like this:

const puppeteer = require('puppeteer');

const url = process.argv[2];
if (!url) {
    throw "Please provide a URL as the first argument";
}

async function run () {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto(url);
    await page.screenshot({path: 'screenshot.png'});
    browser.close();
}

run();

Save it as screenshot.js and run it:

node screenshot.js https://github.com

Explore what happens in our run() function above. First, we launch a new headless browser instance, then we open a new page (tab) and navigate to the URL provided in the command-line argument. Lastly, we use Puppeteer's built-in method for taking a screenshot, and we only need to provide the path where it should be saved. We also need to make sure to close the headless browser after we are done with our automation. Now that we've covered the basics, let's move on to something a bit more complex.

    A Second Puppeteer Scraping Example

For the next part of our Puppeteer tutorial, let's say we want to scrape the newest articles from Hacker News. Create a new file named ycombinator-scraper.js and paste in the following code snippet:

const puppeteer = require('puppeteer');

function run () {
    return new Promise(async (resolve, reject) => {
        try {
            const browser = await puppeteer.launch();
            const page = await browser.newPage();
            await page.goto("https://news.ycombinator.com/");

            let urls = await page.evaluate(() => {
                let results = [];
                let items = document.querySelectorAll('a.storylink');
                items.forEach((item) => {
                    results.push({
                        url: item.getAttribute('href'),
                        text: item.innerText,
                    });
                });
                return results;
            });

            browser.close();
            return resolve(urls);
        } catch (e) {
            return reject(e);
        }
    });
}

run().then(console.log).catch(console.error);

Okay, there's a bit more going on here compared with the previous example. The first thing you might notice is that the run() function now returns a promise, so the async prefix has moved to the promise function's definition. We've also wrapped all of our code in a try-catch block so that we can handle any errors that cause our promise to be rejected. And finally, we're using Puppeteer's built-in method called evaluate(). This method lets us run custom JavaScript code as if we were executing it in the DevTools console. Anything returned from that function gets resolved by the promise. This method is very handy when it comes to scraping information or performing custom actions. The code passed to the evaluate() method is pretty basic JavaScript that builds an array of objects, each having url and text fields that represent the story URLs we see on https://news.ycombinator.com/. The output of the script looks something like this (but with 30 entries, originally):

[ { url: 'https://www.nature.com/articles/d41586-018-05469-3',
    text: 'Bias detectives: the researchers striving to make algorithms fair' },
  { url: 'https://mino-games.workable.com/jobs/415887',
    text: 'Mino Games Is Hiring Programmers in Montreal' },
  { url: 'http://srobb.net/pf.html',
    text: 'A Beginner\'s Guide to Firewalling with pf' },
  // ...
  { url: 'https://tools.ietf.org/html/rfc8439',
    text: 'ChaCha20 and Poly1305 for IETF Protocols' } ]

Pretty neat, I'd say! Okay, let's move forward. We only had 30 items returned, while there are many more available—they are just on other pages. We need to click on the “More” button to load the next page of results. Let's modify our script a bit to add support for pagination:

const puppeteer = require('puppeteer');

function run (pagesToScrape) {
    return new Promise(async (resolve, reject) => {
        try {
            if (!pagesToScrape) {
                pagesToScrape = 1;
            }
            const browser = await puppeteer.launch();
            const page = await browser.newPage();
            await page.goto("https://news.ycombinator.com/");
            let currentPage = 1;
            let urls = [];
            while (currentPage <= pagesToScrape) {
                let newUrls = await page.evaluate(() => {
                    let results = [];
                    let items = document.querySelectorAll('a.storylink');
                    items.forEach((item) => {
                        results.push({
                            url: item.getAttribute('href'),
                            text: item.innerText,
                        });
                    });
                    return results;
                });
                urls = urls.concat(newUrls);
                if (currentPage < pagesToScrape) {
                    await Promise.all([
                        await page.click('a.morelink'),
                        await page.waitForSelector('a.storylink')
                    ]);
                }
                currentPage++;
            }
            browser.close();
            return resolve(urls);
        } catch (e) {
            return reject(e);
        }
    });
}

run(5).then(console.log).catch(console.error);

Let's review what we did here: We added a single argument called pagesToScrape to our main run() function. We'll use this to limit how many pages our script will scrape. There is one more new variable named currentPage, which represents the page of results we are currently looking at; it's set to 1 initially. We also wrapped our evaluate() function in a while loop, so that it keeps running as long as currentPage is less than or equal to pagesToScrape. Finally, we added the block for moving to a new page and waiting for the page to load before restarting the while loop. You'll notice that we used the page.click() method to have the headless browser click on the “More” button. We also used the waitForSelector() method to make sure our logic is paused until the page contents are loaded. Both of those are high-level Puppeteer API methods ready to use out of the box. One of the problems you'll probably encounter during scraping with Puppeteer is waiting for a page to load. Hacker News has a relatively simple structure and it was fairly easy to wait for its page load completion. For more complex use cases, Puppeteer offers a wide range of built-in functionality, which you can explore in the API documentation on GitHub. This is all pretty cool, but our Puppeteer tutorial hasn't covered optimization yet. Let's see how we can make Puppeteer run faster.

    Optimizing Our Puppeteer Script

    The general idea is to not let the headless browser do any extra work. This might include loading images, applying CSS rules, firing XHR requests, etc. As with other tools, optimization of Puppeteer depends on the exact use case, so keep in mind that some of these ideas might not be suitable for your project. For instance, if we had avoided loading images in our first example, our screenshot might not have looked how we wanted. Anyway, these optimizations can be accomplished either by caching the assets on the first request, or canceling the HTTP requests outright as they are initiated by the website. Let's see how caching works first. You should be aware that when you launch a new headless browser instance, Puppeteer creates a temporary directory for its profile. It is removed when the browser is closed and is not available for use when you fire up a new instance—thus all the images, CSS, cookies, and other objects stored will not be accessible anymore. We can force Puppeteer to use a custom path for storing data like cookies and cache, which will be reused every time we run it again—until they expire or are manually deleted. const browser = await puppeteer.launch({ userDataDir: './data', }); This should give us a nice bump in performance, as lots of CSS and images will be cached in the data directory upon the first request, and Chrome won't need to download them again and again. However, those assets will still be used when rendering the page. In our scraping needs of Y Combinator news articles, we don't really need to worry about any visuals, including the images. We only care about bare HTML output, so let's try to block every request. Luckily, Puppeteer is pretty cool to work with, in this case, because it comes with support for custom hooks. We can provide an interceptor on every request and cancel the ones we don't really need. The interceptor can be defined in the following way: await page.setRequestInterception(true); page.on('request', (request) => { if (request.resourceType() === 'document') { request.continue(); } else { request.abort(); } }); As you can see, we have full control over the requests that get initiated. We can write custom logic to allow or abort specific requests based on their resourceType. We also have access to lots of other data like request.url so we can block only specific URLs if we want. In the above example, we only allow requests with the resource type of "document" to get through our filter, meaning that we will block all images, CSS, and everything else besides the original HTML response. 
Here's our final code:

const puppeteer = require('puppeteer');

function run (pagesToScrape) {
    return new Promise(async (resolve, reject) => {
        try {
            if (!pagesToScrape) {
                pagesToScrape = 1;
            }
            const browser = await puppeteer.launch();
            const page = await browser.newPage();
            await page.setRequestInterception(true);
            page.on('request', (request) => {
                if (request.resourceType() === 'document') {
                    request.continue();
                } else {
                    request.abort();
                }
            });
            await page.goto("https://news.ycombinator.com/");
            let currentPage = 1;
            let urls = [];
            while (currentPage <= pagesToScrape) {
                await page.waitForSelector('a.storylink');
                let newUrls = await page.evaluate(() => {
                    let results = [];
                    let items = document.querySelectorAll('a.storylink');
                    items.forEach((item) => {
                        results.push({
                            url: item.getAttribute('href'),
                            text: item.innerText,
                        });
                    });
                    return results;
                });
                urls = urls.concat(newUrls);
                if (currentPage < pagesToScrape) {
                    await Promise.all([
                        await page.waitForSelector('a.morelink'),
                        await page.click('a.morelink'),
                        await page.waitForSelector('a.storylink')
                    ]);
                }
                currentPage++;
            }
            browser.close();
            return resolve(urls);
        } catch (e) {
            return reject(e);
        }
    });
}

run(5).then(console.log).catch(console.error);

    Stay Safe with Rate Limits

Headless browsers are very powerful tools. They're able to perform almost any kind of web automation task, and Puppeteer makes this even easier. Despite all the possibilities, we must comply with a website's terms of service to make sure we don't abuse the system. Since this aspect is more architecture-related, I won't cover this in depth in this Puppeteer tutorial. That said, the most basic way to slow down a Puppeteer script is to add a sleep command to it:

await page.waitFor(5000);

This statement will force your script to sleep for five seconds (5000 ms). You can put this anywhere before browser.close(). Just like limiting your use of third-party services, there are lots of other more robust ways to control your usage of Puppeteer. One example would be building a queue system with a limited number of workers, as sketched below. Every time you want to use Puppeteer, you'd push a new task into the queue, but there would only be a limited number of workers able to work on the tasks in it. This is a fairly common practice when dealing with third-party API rate limits and can be applied to Puppeteer web data scraping as well.
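A minimal sketch of such a queue (not from the original article): each call to enqueue() takes a function returning a promise, and at most `concurrency` of them run at once. The usage line reuses the run() function defined earlier.

function createQueue(concurrency) {
    const tasks = [];
    let active = 0;

    function next() {
        // Start another task only if a worker slot is free and work is waiting.
        if (active >= concurrency || tasks.length === 0) return;
        active++;
        const { task, resolve, reject } = tasks.shift();
        task()
            .then(resolve, reject)
            .finally(() => { active--; next(); });
    }

    return function enqueue(task) {
        return new Promise((resolve, reject) => {
            tasks.push({ task, resolve, reject });
            next();
        });
    };
}

// At most two scraping runs will execute at the same time.
const enqueue = createQueue(2);
enqueue(() => run(5)).then(console.log).catch(console.error);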

    Puppeteer's Place in the Fast-moving Web

    In this Puppeteer tutorial, I've demonstrated its basic functionality as a web-scraping tool. However, it has much wider use cases, including headless browser testing, PDF generation, and performance monitoring, among many others. Web technologies are moving forward fast. Some websites are so dependent on JavaScript rendering that it's become nearly impossible to execute simple HTTP requests to scrape them or perform some sort of automation. Luckily, headless browsers are becoming more and more accessible to handle all of our automation needs, thanks to projects like Puppeteer and the awesome teams behind them!

    Understanding the basics

    What do you mean by "headless browser"?

    A headless browser is a web browser with no user interface (UI) whatsoever. Instead, it follows instructions defined by software developers in different programming languages. Headless browsers are mostly used for running automated quality assurance tests, or to scrape websites.

    Is it legal to scrape a website?

    Websites often allow other software to scrape their content. Please refer to the robots exclusion standard (robots.txt file) of the website that you intend to scrape, as it usually describes which pages you are allowed to scrape. You should also check the terms of service to see if you are allowed to scrape.

    What is a headless environment?

    Headless means that the given device or software has no user interface or input mechanism such as a keyboard or mouse. The term "headless environment" is more often used to describe computer software designed to provide services to other computers or servers.

    What is headless Chrome?

    Headless Chrome is essentially the Google Chrome web browser without its graphical user interface (GUI), based on the same underlying technology. Headless Chrome is instead controlled by scripts written by software developers.

    What is Google Puppeteer?

    Puppeteer is a Node.js library maintained by Chrome's development team from Google. Puppeteer provides a high-level API to control headless Chrome or Chromium or interact with the DevTools protocol.

    Is Selenium a framework?

Yes, but not a front-end web framework like Angular or React; Selenium is a software testing framework for web applications. Its primary use case is automating quality assurance tests on headless browsers, but it's often used to automate administration tasks on websites too.

    execute an external program from within Node.js

var exec = require('child_process').exec;
exec('pwd', function callback(error, stdout, stderr) {
    // result
});

exec buffers the command's output and has a buffer size limitation of 512k. In that case it is better to use spawn; with spawn one has access to the stdout of the executed command at run time:

var spawn = require('child_process').spawn;
var prc = spawn('java', ['-jar', '-Xmx512M', '-Dfile.encoding=utf8', 'script/importlistings.jar']);

prc.stdout.setEncoding('utf8');
prc.stdout.on('data', function (data) {
    var str = data.toString();
    var lines = str.split(/(\r?\n)/g);
    console.log(lines.join(""));
});

prc.on('close', function (code) {
    console.log('process exit code ' + code);
});

Note that the output may end up in stderr rather than stdout. The simplest way to fire and forget is:

const { exec } = require("child_process");
exec('yourApp').unref();

unref is necessary to end your process without waiting for "yourApp".

    Node.js MySQL Create Database

Creating a Database To create a database in MySQL, use the "CREATE DATABASE" statement. Example: create a database named "mydb":

var mysql = require('mysql');

var con = mysql.createConnection({
    host: "localhost",
    user: "yourusername",
    password: "yourpassword"
});

con.connect(function(err) {
    if (err) throw err;
    console.log("Connected!");
    con.query("CREATE DATABASE mydb", function (err, result) {
        if (err) throw err;
        console.log("Database created");
    });
});

Save the code above in a file called "demo_create_db.js" and run the file:

C:\Users\Your Name>node demo_create_db.js

Which will give you this result:

Connected!
Database created

    scrape with Playwright

Playwright is a rising star in the web scraping and automation space. If you thought Puppeteer was powerful, Playwright will blow your mind. Playwright is a browser automation library very similar to Puppeteer. Both allow you to control a web browser with only a few lines of code, and the possibilities are endless, from automating mundane tasks and testing web applications to data mining. With Playwright you can run Firefox and Safari (WebKit), not only Chromium-based browsers. It will also save you time, because Playwright automates away repetitive code, such as waiting for buttons to appear on the page.
    You don’t need to be familiar with Playwright, Puppeteer or web scraping to enjoy this tutorial, but knowledge of HTML, CSS and JavaScript is expected.
    In this tutorial you’ll learn how to: Start a browser with Playwright Click buttons and wait for actions Extract data from a website

    The Project

To showcase the basics of Playwright, we will create a simple scraper that extracts data about GitHub Topics. You'll be able to select a topic, and the scraper will return information about repositories tagged with this topic. We will use Playwright to start a browser, open the GitHub topic page, click the Load more button to display more repositories, and then extract the following information: Owner, Name, URL, Number of stars, Description, List of repository topics.

    Installation

To use Playwright you'll need Node.js version higher than 10 and a package manager. We'll use npm, which comes preinstalled with Node.js. You can confirm their existence on your machine by running:

node -v
npm -v

If you're missing either Node.js or npm, visit the installation tutorial to get started. Now that we know our environment checks out, let's create a new project and install Playwright:

mkdir playwright-scraper
cd playwright-scraper
npm init -y
npm i playwright
    The first time you install Playwright, it will download browser binaries, so the installation may take a bit longer.

    Building a scraper

Creating a scraper with Playwright is surprisingly easy, even if you have no previous scraping experience. If you understand JavaScript and CSS, it will be a piece of cake. In your project folder, create a file called scraper.js (or choose any other name) and open it in your favorite code editor. First, we will confirm that Playwright is correctly installed and working by running a simple script.

scraper.js

// Import the playwright library into our scraper.
const playwright = require('playwright');

async function main() {
    // Open a Chromium browser. We use headless: false
    // to be able to watch what's going on.
    const browser = await playwright.chromium.launch({ headless: false });
    // Open a new page / tab in the browser.
    const page = await browser.newPage({
        bypassCSP: true, // This is needed to enable JavaScript execution on GitHub.
    });
    // Tell the tab to navigate to the GitHub topics page.
    await page.goto('https://github.com/topics/javascript');
    // Pause for 10 seconds, to see what's going on.
    await page.waitForTimeout(10000);
    // Turn off the browser to clean up after ourselves.
    await browser.close();
}

main();

Now run it using your code editor or by executing the following command in your project folder:

node scraper.js

If you saw a Chromium window open and the GitHub Topics page successfully loaded, congratulations, you just robotized your web browser with Playwright!

    Loading more repositories

When you first open the topic page, the number of displayed repositories is limited to 30. You can load more by clicking the Load more… button at the bottom of the page. There are two things we need to tell Playwright to load more repositories: Click the Load more… button. Wait for the repositories to load. Clicking buttons is extremely easy with Playwright. By prefixing text= to a string you're looking for, Playwright will find the element that includes this string and click it. It will also wait for the element to appear if it's not rendered on the page yet. This is a huge improvement over Puppeteer and it makes Playwright lovely to work with.

await page.click('text=Load more');

After clicking, we need to wait for the repositories to load. If we didn't, the scraper could finish before the new repositories show up on the page and we would miss that data. page.waitForFunction() allows you to execute a function inside the browser and wait until the function returns true.

await page.waitForFunction(() => {
    const repoCards = document.querySelectorAll('article.border');
    return repoCards.length > 30;
});

To find that article.border selector, we used browser Dev Tools, which you can open in most browsers by right-clicking anywhere on the page and selecting Inspect. It means: select the <article> tag with the border class. Let's plug this into our code and do a test run.

// Import the playwright library into our scraper.
const playwright = require('playwright');

async function main() {
    // Open a Chromium browser. We use headless: false
    // to be able to watch what's going on.
    const browser = await playwright.chromium.launch({ headless: false });
    // Open a new page / tab in the browser.
    const page = await browser.newPage({
        bypassCSP: true, // This is needed to enable JavaScript execution on GitHub.
    });
    // Tell the tab to navigate to the GitHub topics page.
    await page.goto('https://github.com/topics/javascript');
    // Click and tell Playwright to keep watching for more than
    // 30 repository cards to appear in the page.
    await page.click('text=Load more');
    await page.waitForFunction(() => {
        const repoCards = document.querySelectorAll('article.border');
        return repoCards.length > 30;
    });
    // Pause for 10 seconds, to see what's going on.
    await page.waitForTimeout(10000);
    // Turn off the browser to clean up after ourselves.
    await browser.close();
}

main();

If you watch the run, you'll see that the browser first scrolls down and clicks the Load more… button, which changes the text into Loading more. After a second or two, you'll see the next batch of 30 repositories appear. Great job!

    Extracting data

Now that we know how to load more repositories, we will extract the data we want. To do this, we'll use the page.$$eval function. It tells the browser to find certain elements and then execute a JavaScript function with those elements. It works like this: page.$$eval finds our repositories and executes the provided function in the browser. We get repoCards, which is an Array of all the repo elements. The return value of the function becomes the return value of the page.$$eval call. Thanks to Playwright, you can pull data out of the browser and save it to a variable in Node.js. Magic! If you're struggling to understand the extraction code itself, be sure to check out this guide on working with CSS selectors and this tutorial on using those selectors to find HTML elements. And here's the code with extraction included. When you run it, you'll see 60 repositories with their information printed to the console.

// Import the playwright library into our scraper.
const playwright = require('playwright');

async function main() {
    // Open a Chromium browser. We use headless: false
    // to be able to watch what's going on.
    const browser = await playwright.chromium.launch({ headless: false, devtools: true });
    // Open a new page / tab in the browser.
    const page = await browser.newPage({
        bypassCSP: true, // This is needed to enable JavaScript execution on GitHub.
    });
    // Tell the tab to navigate to the GitHub topics page.
    await page.goto('https://github.com/topics/javascript');
    // Click and tell Playwright to keep watching for more than
    // 30 repository cards to appear in the page.
    await page.click('text=Load more');
    await page.waitForFunction(() => {
        const repoCards = document.querySelectorAll('article.border');
        return repoCards.length > 30;
    });
    // Extract data from the page. Selecting all 'article' elements
    // will return all the repository cards we're looking for.
    const repos = await page.$$eval('article.border', (repoCards) => {
        return repoCards.map(card => {
            const [user, repo] = card.querySelectorAll('h3 a');
            const stars = card.querySelector('a.social-count');
            const description = card.querySelector('div.px-3 > p + div');
            const topics = card.querySelectorAll('a.topic-tag');

            const toText = (element) => element && element.innerText.trim();

            return {
                user: toText(user),
                repo: toText(repo),
                url: repo.href,
                stars: toText(stars),
                description: toText(description),
                topics: Array.from(topics).map((t) => toText(t)),
            };
        });
    });

    // Print the results. Nice!
    console.log(`We extracted ${repos.length} repositories.`);
    console.dir(repos);

    // Turn off the browser to clean up after ourselves.
    await browser.close();
}

main();

Conclusion

In this tutorial we learned how to start a browser with Playwright, and control its actions with some of Playwright's most useful functions: page.click() to emulate mouse clicks, page.waitForFunction() to wait for things to happen and page.$$eval() to extract data from a browser page. But we've only scratched the surface of what's possible with Playwright. You can log into websites, fill forms, intercept network communication, and most importantly, use almost any browser in existence. Where will you take this project next? How about turning it into a command-line interface (CLI) tool that takes a topic and number of repositories on input and outputs a file with the repositories? You can do it now.

    Nodejs to parse command line arguments

Passing in arguments via the command line is an extremely basic programming task, and a necessity for anyone trying to write a simple Command-Line Interface (CLI). In Node.js, as in C and many related environments, all command-line arguments received by the shell are given to the process in an array called argv (short for 'argument values'). Node.js exposes this array for every running process in the form of process.argv - let's take a look at an example. Make a file called argv.js and add this line:

console.log(process.argv);

Now save it, and try the following in your shell:

$ node argv.js one two three four five
[ 'node', '/home/avian/argvdemo/argv.js', 'one', 'two', 'three', 'four', 'five' ]

There you have it - an array containing any arguments you passed in. Notice the first two elements - node and the path to your script. These will always be present - even if your program takes no arguments of its own, your script's interpreter and path are still considered arguments to the shell you're using. Where everyday CLI arguments are concerned, you'll want to skip the first two. Now try this in argv.js:

const myArgs = process.argv.slice(2);
console.log('myArgs: ', myArgs);

This yields:

$ node argv.js one two three four five
myArgs:  [ 'one', 'two', 'three', 'four', 'five' ]

Now let's actually do something with the args:

const myArgs = process.argv.slice(2);
console.log('myArgs: ', myArgs);

switch (myArgs[0]) {
    case 'insult':
        console.log(myArgs[1], 'smells quite badly.');
        break;
    case 'compliment':
        console.log(myArgs[1], 'is really cool.');
        break;
    default:
        console.log('Sorry, that is not something I know how to do.');
}

JS PRO TIP: Remember to break after each case - otherwise you'll run the next case too! Referring to your command-line arguments by array index isn't very clean, and can quickly turn into a nightmare when you start working with flags and the like - imagine you made a server, and it needed a lot of arguments. Imagine having to deal with something like myapp -h host -p port -r -v -b --quiet -x -o outfile - some flags need to know about what comes next, some don't, and most CLIs let users specify arguments in any order they want. Sound like a fun string to parse? Luckily, there are many third-party modules that make all of this trivial - one of which is yargs. It's available via npm; use this command from your app's base path:

npm i yargs

Once you have it, give it a try - it can really be a life-saver. Let's test it with a fun little leap-year checker and current-time teller:

const yargs = require('yargs');
const argv = yargs
    .command('lyr', 'Tells whether a year is a leap year or not', {
        year: {
            description: 'the year to check for',
            alias: 'y',
            type: 'number'
        }
    })
    .option('time', {
        alias: 't',
        description: 'Tell the present Time',
        type: 'boolean'
    })
    .help()
    .alias('help', 'h').argv;

if (argv.time) {
    console.log('The current time is: ', new Date().toLocaleTimeString());
}

if (argv._.includes('lyr')) {
    const year = argv.year || new Date().getFullYear();
    if ((year % 4 == 0 && year % 100 != 0) || year % 400 == 0) {
        console.log(`${year} is a Leap Year`);
    } else {
        console.log(`${year} is NOT a Leap Year`);
    }
}

console.log(argv);

The last line was included to let you see how yargs handles your arguments. Here's a quick reference:
  • argv.$0 contains the name of the script file which is executed like: '$0': 'myapp.js'.
  • argv._ is an array containing each element not attached to an option (or flag); these elements are referred to as commands in yargs.
  • Individual options (flags) become properties of argv, such as argv.h and argv.time. Note that non-single-letter flags must be passed in as --flag, like: node myapp.js --time.
  • A summary of elements used in the program:
  • argv: This is the modified process.argv which we have configured with yargs.
  • command(): This method is used to add commands, their descriptions and the options specific to those commands only. In the above code, lyr is the command and -y is an lyr-specific option: node myapp.js lyr -y 2016
  • option(): This method is used to add global options (flags) which can be accessed by all commands, or without any command.
  • help(): This method is used to display a help dialogue when the --help option is encountered; it contains descriptions of all the available commands and options.
  • alias(): This method provides an alias name to an option, like in the above code both --help and -h triggers the help dialogue.
  • For more information on yargs and the many, many other things it can do for your command-line arguments, please visit http://yargs.js.org/docs/

    Puppeteer Web Scraping in Node.js

While there are a few different libraries for scraping the web with Node.js, in this tutorial, I'll be using the puppeteer library. Puppeteer is a popular and easy-to-use npm package used for web automation and web scraping purposes. Some of puppeteer's most useful features include: Being able to extract a scraped element's text content. Being able to interact with a webpage by filling out forms, clicking on buttons or running searches inside a search bar. Being able to scrape and download images from the web. Being able to watch the scraping in progress by running with headless mode disabled. Installation For this tutorial, I will assume you already have npm and node_modules installed, as well as a package.json and package-lock.json file; if you don't, there are great setup guides available. To install puppeteer, run one of the following commands in your project's terminal:

npm i puppeteer

Or:

yarn add puppeteer

Once puppeteer is installed, it will appear as a directory inside your node_modules.

    make a simple web scraping script in Node.js

The web scraping script will get the first synonym of "smart" from the web thesaurus by: Getting the HTML contents of the web thesaurus' webpage. Finding the element that we want to scrape through its selector. Displaying the text content of the scraped element. Let's get started! Before scraping, and then extracting this element's text through its selector in Node.js, we need to set up a few things first. Create or open an empty JavaScript file - you can name it whatever you want, but I'll name mine "index.js" for this tutorial. Then, require puppeteer on the first line and create the async function inside which we will be writing our web scraping code:

index.js

const puppeteer = require('puppeteer')

async function scrape() {
}

scrape()

Next, initiate a new browser instance and define the "page" variable, which is going to be used for navigating to webpages and scraping elements within a webpage's HTML contents:

index.js

const puppeteer = require('puppeteer')

async function scrape() {
    const browser = await puppeteer.launch({})
    const page = await browser.newPage()
}

scrape()

Scraping the first synonym of "smart" To locate and copy the selector of the first synonym of "smart", which is what we're going to use to locate the synonym inside of the web thesaurus' webpage, first go to the web thesaurus' synonyms of "smart", right click on the first synonym and click on "inspect". This will make this webpage's DOM pop up at the right of your screen. Next, right click on the highlighted HTML element containing the first synonym and click on "copy selector". Finally, to navigate to the web thesaurus, scrape and display the first synonym of "smart" through the selector we copied earlier: First, make the "page" variable navigate to https://www.thesaurus.com/browse/smart inside the newly created browser instance. Next, we define the "element" variable by making the page wait for our desired element's selector to appear in the webpage's DOM. The text content of the element is then extracted using the evaluate() function, and displayed inside the "text" variable. Finally, we close the browser instance.
index.js

const puppeteer = require('puppeteer')

async function scrape() {
    const browser = await puppeteer.launch({})
    const page = await browser.newPage()
    await page.goto('https://www.thesaurus.com/browse/smart')

    var element = await page.waitForSelector("#meanings > div.css-ixatld.e15rdun50 > ul > li:nth-child(1) > a")
    var text = await page.evaluate(element => element.textContent, element)

    console.log(text)
    browser.close()
}

scrape()

Time to test Now if you run your index.js script using "node index.js", you will see that it has displayed the first synonym of the word "smart". Scraping the top 5 synonyms of smart We can adapt the same code to scrape the top 5 synonyms of smart instead of 1:

index.js

const puppeteer = require('puppeteer')

async function scrape() {
    const browser = await puppeteer.launch({})
    const page = await browser.newPage()
    await page.goto('https://www.thesaurus.com/browse/smart')

    for (let i = 1; i < 6; i++) {
        var element = await page.waitForSelector("#meanings > div.css-ixatld.e15rdun50 > ul > li:nth-child(" + i + ") > a")
        var text = await page.evaluate(element => element.textContent, element)
        console.log(text)
    }

    browser.close()
}

scrape()

The "element" variable will be "#meanings > div.css-ixatld.e15rdun50 > ul > li:nth-child(1) > a" on the first iteration, "#meanings > div.css-ixatld.e15rdun50 > ul > li:nth-child(2) > a" on the second, and so on until it reaches the last iteration, where the "element" variable will be "#meanings > div.css-ixatld.e15rdun50 > ul > li:nth-child(5) > a". As you can see, the only thing that is altered in the "element" variable throughout the iterations is the "li:nth-child()" value. This is because in our case, the elements that we are trying to scrape are all "li" elements inside a "ul" element, so we can easily scrape them in order by increasing the value inside "li:nth-child()": li:nth-child(1) for the first synonym. li:nth-child(2) for the second synonym. li:nth-child(3) for the third synonym. li:nth-child(4) for the fourth synonym. And li:nth-child(5) for the fifth synonym. Final notes While web scraping has many advantages, like: Saving time on manually collecting data. Being able to programmatically aggregate pieces of data scraped from the web. Creating a dataset of data that might be useful for machine learning, data visualization or data analytics purposes. It also has two disadvantages: Some websites don't allow for scraping their data; one popular example is craigslist. Some people consider it to be a gray area, since some use cases of web scraping practice user or entity data collection and storage.

    Node.js Modules with npm and package.json



     Introduction

    Because of such features as its speedy Input/Output (I/O) performance and its basis in the well-known JavaScript language, Node.js has quickly become a popular runtime environment for back-end web development. But as interest grows, larger applications are built, and managing the complexity of the codebase and its dependencies becomes more difficult. Node.js organizes this complexity using modules, which are any single JavaScript files containing functions or objects that can be used by other programs or modules. A collection of one or more modules is commonly referred to as a package, and these packages are themselves organized by package managers. The Node.js Package Manager (npm) is the default and most popular package manager in the Node.js ecosystem, and is primarily used to install and manage external modules in a Node.js project. It is also commonly used to install a wide range of CLI tools and run project scripts. npm tracks the modules installed in a project with the package.json file, which resides in a project’s directory and contains: All the modules needed for a project and their installed versions All the metadata for a project, such as the author, the license, etc. Scripts that can be run to automate tasks within the project As you create more complex Node.js projects, managing your metadata and dependencies with the package.json file will provide you with more predictable builds, since all external dependencies are kept the same. The file will keep track of this information automatically; while you may change the file directly to update your project’s metadata, you will seldom need to interact with it directly to manage modules. In this tutorial, you will manage packages with npm. The first step will be to create and understand the package.json file. You will then use it to keep track of all the modules you install in your project. Finally, you will list your package dependencies, update your packages, uninstall your packages, and perform an audit to find security flaws in your packages.

    Prerequisites

    To complete this tutorial, you will need: Node.js installed on your development machine. This tutorial uses version 18.3.0. To install this on macOS, follow the steps in How to Install Node.js and Create a Local Development Environment on macOS; to install this on Ubuntu 20.04, follow the Installing Using a PPA or Installing using the Node Version Manager section of How To Install Node.js on Ubuntu 20.04. By having Node.js installed you will also have npm installed; this tutorial uses version 8.11.0.

    Step 1 — Creating a package.json File

We begin this tutorial by setting up the example project—a fictional Node.js locator module that gets the user’s IP address and returns the country of origin. You will not be coding the module in this tutorial. However, the packages you manage would be relevant if you were developing it. First, you will create a package.json file to store useful metadata about the project and help you manage the project’s dependent Node.js modules. As the suffix suggests, this is a JSON (JavaScript Object Notation) file. JSON is a standard format used for sharing, based on JavaScript objects and consisting of data stored as key-value pairs. If you would like to learn more about JSON, read our Introduction to JSON article. Since a package.json file contains numerous properties, it can be cumbersome to create manually, without copying and pasting a template from somewhere else. To make things easier, npm provides the init command. This is an interactive command that asks you a series of questions and creates a package.json file based on your answers.

     Using the init Command

First, set up a project so you can practice managing modules. In your shell, create a new folder called locator:

mkdir locator

Then move into the new folder:

cd locator

Now, initialize the interactive prompt by entering:

npm init

Note: If your code will use Git for version control, create the Git repository first and then run npm init. The command automatically understands that it is in a Git-enabled folder. If a Git remote is set, it automatically fills out the repository, bugs, and homepage fields for your package.json file. If you initialized the repo after creating the package.json file, you will have to add this information in yourself. For more on Git version control, see our Introduction to Git: Installation, Usage, and Branches series. You will receive the following output:

Output
This utility will walk you through creating a package.json file.
It only covers the most common items, and tries to guess sensible defaults.

See `npm help init` for definitive documentation on these fields
and exactly what they do.

Use `npm install <pkg>` afterwards to install a package and
save it as a dependency in the package.json file.

Press ^C at any time to quit.
package name: (locator)

You will first be prompted for the name of your new project. By default, the command assumes it’s the name of the folder you’re in. Default values for each property are shown in parentheses (). Since the default value for name will work for this tutorial, press ENTER to accept it. The next value to enter is version. Along with the name, this field is required if your project will be shared with others in the npm package repository. Note: Node.js packages are expected to follow the Semantic Versioning (semver) guide. Therefore, the first number will be the MAJOR version number that only changes when the API changes. The second number will be the MINOR version that changes when features are added. The last number will be the PATCH version that changes when bugs are fixed. Press ENTER so the default version of 1.0.0 is accepted. The next field is description—a useful string to explain what your Node.js module does. Our fictional locator project would get the user’s IP address and return the country of origin. A fitting description would be Finds the country of origin of the incoming request, so type in something like this and press ENTER. The description is very useful when people are searching for your module. The following prompt will ask you for the entry point. If someone installs and requires your module, what you set in the entry point will be the first part of your program that is loaded. The value needs to be the relative location of a JavaScript file, and will be added to the main property of the package.json. Press ENTER to keep the default value of index.js. Note: Most modules have an index.js file as the main point of entry. This is the default value for a package.json’s main property, which is the point of entry for npm modules. If there is no package.json, Node.js will try to load index.js by default. Next, you’ll be asked for a test command, an executable script or command to run your project tests. In many popular Node.js modules, tests are written and executed with Mocha, Jest, Jasmine, or other test frameworks. Since testing is beyond the scope of this article, leave this option empty for now, and press ENTER to move on. The init command will then ask for the project’s git repository, which may live on a service such as GitHub (for more information, see GitHub’s Repository documentation).
You won’t use this in this example, so leave it empty as well. After the repository prompt, the command asks for keywords. This property is an array of strings with useful terms that people can use to find your repository. It’s best to have a small set of words that are really relevant to your project, so that searching can be more targeted. List these keywords as a string with commas separating each value. For this sample project, type ip,geo,country at the prompt. The finished package.json will have three items in the array for keywords. The next field in the prompt is author. This is useful for users of your module who want to get in contact with you. For example, if someone discovers an exploit in your module, they can use this to report the problem so that you can fix it. The author field is a string in the following format: "Name <Email> (Website)". For example, "Sammy <sammy@your_domain> (https://your_domain)" is a valid author. The email and website data are optional—a valid author could just be a name. Add your contact details as an author and confirm with ENTER. Finally, you’ll be prompted for the license. This determines the legal permissions and limitations users will have while using your module. Many Node.js modules are open source, so npm sets the default to ISC. At this point, you would review your licensing options and decide what’s best for your project. For more information on different types of open source licenses, see this license list from the Open Source Initiative. If you do not want to provide a license for a private repository, you can type UNLICENSED at the prompt. For this sample, use the default ISC license, and press ENTER to finish this process. The init command will now display the package.json file it’s going to create. It will look similar to this:

Output
About to write to /home/sammy/locator/package.json:

{
  "name": "locator",
  "version": "1.0.0",
  "description": "Finds the country of origin of the incoming request",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [
    "ip",
    "geo",
    "country"
  ],
  "author": "Sammy <sammy@your_domain> (https://your_domain)",
  "license": "ISC"
}

Is this OK? (yes)

Once the information matches what you see here, press ENTER to complete this process and create the package.json file. With this file, you can keep a record of modules you install for your project. Now that you have your package.json file, you can test out installing modules in the next step.

    Step 2 — Installing Modules

It is common in software development to use external libraries to perform ancillary tasks in projects. This allows the developer to focus on the business logic and create the application more quickly and efficiently by utilizing tools and code that others have written that accomplish tasks one needs. For example, if our sample locator module has to make an external API request to get geographical data, we could use an HTTP library to make that task easier. Since our main goal is to return pertinent geographical data to the user, we could install a package that makes HTTP requests easier for us instead of rewriting this code for ourselves, a task that is beyond the scope of our project. Let's run through this example. In your locator application, you will use the axios library, which will help you make HTTP requests. Install it by entering the following in your shell: npm install axios --save You begin this command with npm install, which will install the package (for brevity you can also use npm i). You then list the packages that you want installed, separated by a space. In this case, this is axios. Finally, you end the command with the optional --save parameter, which specifies that axios will be saved as a project dependency. (Since npm 5, installed packages are saved as dependencies by default, so the flag can also be omitted.) When the library is installed, you will see output similar to the following: Output... + axios@0.27.2 added 5 packages from 8 contributors and audited 5 packages in 0.764s found 0 vulnerabilities Now, open the package.json file, using a text editor of your choice. This tutorial will use nano: nano package.json You'll see a new property, as highlighted in the following: locator/package.json { "name": "locator", "version": "1.0.0", "description": "Finds the country of origin of the incoming request", "main": "index.js", "scripts": { "test": "echo \"Error: no test specified\" && exit 1" }, "keywords": [ "ip", "geo", "country" ], "author": "Sammy <sammy@your_domain> (https://your_domain)", "license": "ISC", "dependencies": { "axios": "^0.27.2" } } The --save option told npm to update the package.json with the module and version that was just installed. This is great, as other developers working on your projects can easily see what external dependencies are needed. Note: You may have noticed the ^ before the version number for the axios dependency. Recall that semantic versioning consists of three digits: MAJOR, MINOR, and PATCH. The ^ symbol signifies that any higher MINOR or PATCH version would satisfy this version constraint. If you see ~ at the beginning of a version number, then only higher PATCH versions satisfy the constraint. When you are finished reviewing package.json, close the file. If you used nano to edit the file, you can do so by pressing CTRL + X and then ENTER.
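To verify the new dependency from code, you could create a small index.js in the locator folder that calls a geolocation endpoint through axios. This is only a minimal sketch; the ipinfo.io URL is an illustrative public geo-IP service, not something prescribed by this tutorial:

const axios = require('axios');

// Ask a public geo-IP endpoint where the current machine's request comes from.
// (ipinfo.io is just an example service; any geo-IP API would work the same way.)
async function locate() {
  const response = await axios.get('https://ipinfo.io/json');
  console.log(response.data.country); // e.g. "US"
}

locate().catch((error) => console.error(error.message));

Running node index.js should then print a two-letter country code.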

     Development Dependencies

Packages that are used for the development of a project but not for building or running it in production are called development dependencies. They are not necessary for your module or application to work in production, but may be helpful while writing the code. For example, it's common for developers to use code linters to ensure their code follows best practices and to keep the style consistent. While this is useful for development, this only adds to the size of the distributable without providing a tangible benefit when deployed in production. Install a linter as a development dependency for your project. Try this out in your shell: npm i eslint@8.0.0 --save-dev In this command, you used the --save-dev flag. This will save eslint as a dependency that is only needed for development. Notice also that you added @8.0.0 to your dependency name. When modules are updated, they are tagged with a version. The @ tells npm to look for a specific tag of the module you are installing. Without a specified tag, npm installs the latest tagged version. Open package.json again: nano package.json This will show the following: locator/package.json { "name": "locator", "version": "1.0.0", "description": "Finds the country of origin of the incoming request", "main": "index.js", "scripts": { "test": "echo \"Error: no test specified\" && exit 1" }, "keywords": [ "ip", "geo", "country" ], "author": "Sammy <sammy@your_domain> (https://your_domain)", "license": "ISC", "dependencies": { "axios": "^0.27.2" }, "devDependencies": { "eslint": "^8.0.0" } } eslint has been saved under devDependencies, along with the version number you specified earlier. Exit package.json.
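As a side note, once ESLint has a configuration file in your project (you could generate one with npx eslint --init, for example), a common convention is to expose the linter through an npm script, so collaborators can run it with npm run lint. A sketch of the relevant package.json fragment:

"scripts": {
  "test": "echo \"Error: no test specified\" && exit 1",
  "lint": "eslint ."
}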

     Automatically Generated Files: node_modules and package-lock.json

When you first install a package to a Node.js project, npm automatically creates the node_modules folder to store the modules needed for your project and a package-lock.json file. Confirm these are in your working directory. In your shell, type ls and press ENTER. You will observe the following output: Outputnode_modules package.json package-lock.json The node_modules folder contains every installed dependency for your project. In most cases, you should not commit this folder into your version controlled repository. As you install more dependencies, the size of this folder will quickly grow. Furthermore, the package-lock.json file keeps a record of the exact versions installed in a more succinct way, so including node_modules is not necessary. While the package.json file lists dependencies that tell us the suitable versions that should be installed for the project, the package-lock.json file keeps track of all changes in package.json or node_modules and tells us the exact version of the package installed. You usually commit this to your version controlled repository instead of node_modules, as it's a cleaner representation of all your dependencies.
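If you are using Git, excluding the folder is a one-line addition. A minimal .gitignore for a project like this would contain just:

node_modules/

package.json and package-lock.json, on the other hand, should be committed.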

     Installing from package.json

    With your package.json and package-lock.json files, you can quickly set up the same project dependencies before you start development on a new project. To demonstrate this, move up a level in your directory tree and create a new folder named cloned_locator in the same directory level as locator: cd .. mkdir cloned_locator Move into your new directory: cd cloned_locator Now copy the package.json and package-lock.json files from locator to cloned_locator: cp ../locator/package.json ../locator/package-lock.json . To install the required modules for this project, type: npm i npm will check for a package-lock.json file to install the modules. If no lock file is available, it would read from the package.json file to determine the installations. It is usually quicker to install from package-lock.json, since the lock file contains the exact version of modules and their dependencies, meaning npm does not have to spend time figuring out a suitable version to install. When deploying to production, you may want to skip the development dependencies. Recall that development dependencies are stored in the devDependencies section of package.json, and have no impact on the running of your app. When installing modules as part of the deployment process to deploy your application, omit the dev dependencies by running: npm i --production The --production flag ignores the devDependencies section during installation. For now, stick with your development build. Before moving to the next section, return to the locator folder: cd ../locator
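Note: npm also ships the related npm ci command, which installs strictly from package-lock.json, removes any existing node_modules folder first, and fails if the lock file is missing or out of sync with package.json. This makes it a popular choice for continuous integration pipelines, where reproducible installs matter: npm ci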

     Global Installations

    So far, you have been installing npm modules for the locator project. npm also allows you to install packages globally. This means that the package is available to your user in the wider system, like any other shell command. This ability is useful for the many Node.js modules that are CLI tools. For example, you may want to blog about the locator project that you’re currently working on. To do so, you can use a library like Hexo to create and manage your static website blog. Install the Hexo CLI globally like this: npm i hexo-cli -g To install a package globally, you append the -g flag to the command. Note: If you get a permission error trying to install this package globally, your system may require super user privileges to run the command. Try again with sudo npm i hexo-cli -g. Test that the package was successfully installed by typing: hexo --version You will see output similar to: Outputhexo-cli: 4.3.0 os: linux 5.15.0-35-generic Ubuntu 22.04 LTS 22.04 LTS (Jammy Jellyfish) node: 18.3.0 v8: 10.2.154.4-node.8 uv: 1.43.0 zlib: 1.2.11 brotli: 1.0.9 ares: 1.18.1 modules: 108 nghttp2: 1.47.0 napi: 8 llhttp: 6.0.6 openssl: 3.0.3+quic cldr: 41.0 icu: 71.1 tz: 2022a unicode: 14.0 ngtcp2: 0.1.0-DEV nghttp3: 0.1.0-DEV So far, you have learned how to install modules with npm. You can install packages to a project locally, either as a production or development dependency. You can also install packages based on pre-existing package.json or package-lock.json files, allowing you to develop with the same dependencies as your peers. Finally, you can use the -g flag to install packages globally, so you can access them regardless of whether you’re in a Node.js project or not. Now that you can install modules, in the next section you will practice techniques to administer your dependencies.

    Step 3 — Managing Modules

    A complete package manager can do a lot more than install modules. npm has over 20 commands relating to dependency management available. In this step, you will: List modules you have installed. Update modules to a more recent version. Uninstall modules you no longer need. Perform a security audit on your modules to find and fix security flaws. While these examples will be done in your locator folder, all of these commands can be run globally by appending the -g flag at the end of them, exactly like you did when installing globally.

     Listing Modules

    If you would like to know which modules are installed in a project, it would be easier to use the list or ls command instead of reading the package.json directly. To do this, enter: npm ls You will see output like this: Output├── axios@0.27.2 └── eslint@8.0.0 The --depth option allows you to specify what level of the dependency tree you want to see. When it’s 0, you only see your top level dependencies. If you want to see the entire dependency tree, use the --all argument: npm ls --all You will see output like the following: Output├─┬ axios@0.27.2 │ ├── follow-redirects@1.15.1 │ └─┬ form-data@4.0.0 │ ├── asynckit@0.4.0 │ ├─┬ combined-stream@1.0.8 │ │ └── delayed-stream@1.0.0 │ └─┬ mime-types@2.1.35 │ └── mime-db@1.52.0 └─┬ eslint@8.0.0 ├─┬ @eslint/eslintrc@1.3.0 │ ├── ajv@6.12.6 deduped │ ├── debug@4.3.4 deduped │ ├── espree@9.3.2 deduped │ ├── globals@13.15.0 deduped │ ├── ignore@5.2.0 │ ├── import-fresh@3.3.0 deduped │ ├── js-yaml@4.1.0 deduped │ ├── minimatch@3.1.2 deduped │ └── strip-json-comments@3.1.1 deduped . . .

     Updating Modules

    It is a good practice to keep your npm modules up to date. This improves your likelihood of getting the latest security fixes for a module. Use the outdated command to check if any modules can be updated: npm outdated You will get output like the following: OutputPackage Current Wanted Latest Location Depended by eslint 8.0.0 8.17.0 8.17.0 node_modules/eslint locator This command first lists the Package that’s installed and the Current version. The Wanted column shows which version satisfies your version requirement in package.json. The Latest column shows the most recent version of the module that was published. The Location column states where in the dependency tree the package is located. The outdated command has the --depth flag like ls. By default, the depth is 0. It seems that you can update eslint to a more recent version. Use the update or up command like this: npm up eslint The output of the command will contain the version installed: Output removed 7 packages, changed 4 packages, and audited 91 packages in 1s 14 packages are looking for funding run `npm fund` for details found 0 vulnerabilities To see which version of eslint that you are using now, you can use npm ls using the package name as an argument: npm ls eslint The output will resemble the npm ls command you used before, but include only the eslint package’s versions: Output└─┬ eslint@8.17.0 └─┬ eslint-utils@3.0.0 └── eslint@8.17.0 deduped If you wanted to update all modules at once, then you would enter: npm up

     Uninstalling Modules

The npm uninstall command can remove modules from your projects. This means the module will no longer be installed in the node_modules folder, nor will it be seen in your package.json and package-lock.json files. Removing dependencies from a project is a normal activity in the software development lifecycle. A dependency may not solve the problem as advertised, or may not provide a satisfactory development experience. In these cases, it may be better to uninstall the dependency and build your own module. Imagine that axios does not provide the development experience you would have liked for making HTTP requests. Uninstall axios with the uninstall or un command by entering: npm un axios Your output will be similar to: Outputremoved 8 packages, and audited 83 packages in 542ms 13 packages are looking for funding run `npm fund` for details found 0 vulnerabilities It doesn't explicitly say that axios was removed. To verify that it was uninstalled, list the dependencies once again: npm ls Now, we only see that eslint is installed: Outputlocator@1.0.0 /home/ubuntu/locator └── eslint@8.17.0 This shows that you have successfully uninstalled the axios package.

     Auditing Modules

    npm provides an audit command to highlight potential security risks in your dependencies. To see the audit in action, install an outdated version of the request module by running the following: npm i request@2.60.0 When you install this outdated version of request, you’ll notice output similar to the following: Outputnpm WARN deprecated cryptiles@2.0.5: This version has been deprecated in accordance with the hapi support policy (hapi.im/support). Please upgrade to the latest version to get the best features, bug fixes, and security patches. If you are unable to upgrade at this time, paid support is available for older versions (hapi.im/commercial). npm WARN deprecated sntp@1.0.9: This module moved to @hapi/sntp. Please make sure to switch over as this distribution is no longer supported and may contain bugs and critical security issues. npm WARN deprecated boom@2.10.1: This version has been deprecated in accordance with the hapi support policy (hapi.im/support). Please upgrade to the latest version to get the best features, bug fixes, and security patches. If you are unable to upgrade at this time, paid support is available for older versions (hapi.im/commercial). npm WARN deprecated node-uuid@1.4.8: Use uuid module instead npm WARN deprecated har-validator@1.8.0: this library is no longer supported npm WARN deprecated hoek@2.16.3: This version has been deprecated in accordance with the hapi support policy (hapi.im/support). Please upgrade to the latest version to get the best features, bug fixes, and security patches. If you are unable to upgrade at this time, paid support is available for older versions (hapi.im/commercial). npm WARN deprecated request@2.60.0: request has been deprecated, see https://github.com/request/request/issues/3142 npm WARN deprecated hawk@3.1.3: This module moved to @hapi/hawk. Please make sure to switch over as this distribution is no longer supported and may contain bugs and critical security issues. added 56 packages, and audited 139 packages in 4s 13 packages are looking for funding run `npm fund` for details 9 vulnerabilities (5 moderate, 2 high, 2 critical) To address all issues, run: npm audit fix --force Run `npm audit` for details. npm is telling you that you have deprecated packages and vulnerabilities in your dependencies. To get more details, audit your entire project with: npm audit The audit command shows tables of output highlighting security flaws: Output# npm audit report bl <1.2.3 Severity: moderate Remote Memory Exposure in bl - https://github.com/advisories/GHSA-pp7h-53gx-mx7r fix available via `npm audit fix` node_modules/bl request 2.16.0 - 2.86.0 Depends on vulnerable versions of bl Depends on vulnerable versions of hawk Depends on vulnerable versions of qs Depends on vulnerable versions of tunnel-agent node_modules/request cryptiles <=4.1.1 Severity: critical Insufficient Entropy in cryptiles - https://github.com/advisories/GHSA-rq8g-5pc5-wrhr Depends on vulnerable versions of boom fix available via `npm audit fix` node_modules/cryptiles hawk <=9.0.0 Depends on vulnerable versions of boom Depends on vulnerable versions of cryptiles Depends on vulnerable versions of hoek Depends on vulnerable versions of sntp node_modules/hawk . . . 9 vulnerabilities (5 moderate, 2 high, 2 critical) To address all issues, run: npm audit fix You can see the path of the vulnerability, and sometimes npm offers ways for you to fix it. You can run the update command as suggested, or you can run the fix subcommand of audit. 
In your shell, enter: npm audit fix You will see similar output to: Outputnpm WARN deprecated har-validator@5.1.5: this library is no longer supported npm WARN deprecated uuid@3.4.0: Please upgrade to version 7 or higher. Older versions may use Math.random() in certain circumstances, which is known to be problematic. See https://v8.dev/blog/math-random for details. npm WARN deprecated request@2.88.2: request has been deprecated, see https://github.com/request/request/issues/3142 added 19 packages, removed 34 packages, changed 13 packages, and audited 124 packages in 3s 14 packages are looking for funding run `npm fund` for details found 0 vulnerabilities npm was able to safely update the vulnerable packages, and the audit now reports no remaining vulnerabilities. However, the output still warns about three deprecated packages in your dependencies. The audit fix command does not always fix every problem. Although a version of a module may have a security vulnerability, if you update it to a version with a different API then it could break code higher up in the dependency tree. You can use the --force parameter to ensure the vulnerabilities are gone, like this: npm audit fix --force As mentioned before, this is not recommended unless you are sure that it won't break functionality.

    Conclusion

In this tutorial, you went through various exercises to demonstrate how Node.js modules are organized into packages, and how these packages are managed by npm. In a Node.js project, you used npm packages as dependencies by creating and maintaining a package.json file, a record of your project's metadata, including what modules you installed. You also used the npm CLI tool to install, update, and remove modules, in addition to listing the dependency tree for your projects and checking and updating modules that are outdated. In the future, leveraging existing code by using modules will speed up development time, as you don't have to repeat functionality. You will also be able to create your own npm modules, and these will in turn be managed by others via npm commands. As for next steps, experiment with what you learned in this tutorial by installing and testing the variety of packages out there. See what the ecosystem provides to make problem solving easier. For example, you could try out TypeScript, a superset of JavaScript, or turn your website into mobile apps with Cordova. If you'd like to learn more about Node.js, see our other Node.js tutorials.

    nodejs-web-scraper



nodejs-web-scraper is a simple tool for scraping/crawling server-side rendered pages. It supports features like recursive scraping (pages that "open" other pages), file download and handling, automatic retries of failed requests, concurrency limitation, pagination, request delay, etc. Tested on Node 10 - 16 (Windows 7, Linux Mint). The API uses Cheerio selectors (see https://cheerio.js.org/ for reference). For any questions or suggestions, please open a GitHub issue.

    Installation

    $ npm install nodejs-web-scraper

    Basic examples

     Collect articles from a news site

Let's say we want to get every article (from every category) from a news site. We want each item to contain the title, story and image link (or links). const { Scraper, Root, DownloadContent, OpenLinks, CollectContent } = require('nodejs-web-scraper'); const fs = require('fs'); (async () => { const config = { baseSiteUrl: `https://www.some-news-site.com/`, startUrl: `https://www.some-news-site.com/`, filePath: './images/', concurrency: 10,//Maximum concurrent jobs. More than 10 is not recommended. Default is 3. maxRetries: 3,//The scraper will try to repeat a failed request a few times (excluding 404). Default is 5. logPath: './logs/'//Highly recommended: Creates a friendly JSON for each operation object, with all the relevant data. } const scraper = new Scraper(config);//Create a new Scraper instance, and pass config to it. //Now we create the "operations" we need: const root = new Root();//The root object fetches the startUrl, and starts the process. //Any valid cheerio selector can be passed. For further reference: https://cheerio.js.org/ const category = new OpenLinks('.category',{name:'category'});//Opens each category page. const article = new OpenLinks('article a', {name:'article' });//Opens each article page. const image = new DownloadContent('img', { name: 'image' });//Downloads images. const title = new CollectContent('h1', { name: 'title' });//"Collects" the text from each H1 element. const story = new CollectContent('section.content', { name: 'story' });//"Collects" the article body. root.addOperation(category);//Then we create a scraping "tree": category.addOperation(article); article.addOperation(image); article.addOperation(title); article.addOperation(story); await scraper.scrape(root); const articles = article.getData()//Will return an array of all article objects (from all categories), each //containing its "children" (titles, stories and the downloaded image urls) //If you just want to get the stories, do the same with the "story" variable: const stories = story.getData(); fs.writeFile('./articles.json', JSON.stringify(articles), () => { })//Will produce a formatted JSON containing all article pages and their selected data. fs.writeFile('./stories.json', JSON.stringify(stories), () => { }) })(); This basically means: "go to https://www.some-news-site.com; Open every category; Then open every article in each category page; Then collect the title, story and image href, and download all images on that page".

     Get data of every page as a dictionary

An alternative, perhaps more friendly way to collect the data from a page, would be to use the "getPageObject" hook. const { Scraper, Root, OpenLinks, CollectContent, DownloadContent } = require('nodejs-web-scraper'); const fs = require('fs'); (async () => { const pages = [];//All ad pages. //pageObject will be formatted as {title,phone}, because these are the names we chose for the scraping operations below. //Note that each key is an array, because there might be multiple elements fitting the querySelector. //This hook is called after every page finished scraping. //It will also get an address argument. const getPageObject = (pageObject,address) => { pages.push(pageObject) } const config = { baseSiteUrl: `https://www.profesia.sk`, startUrl: `https://www.profesia.sk/praca/`, filePath: './images/', logPath: './logs/' } const scraper = new Scraper(config); const root = new Root(); const jobAds = new OpenLinks('.list-row h2 a', { name: 'Ad page', getPageObject });//Opens every job ad, and calls the getPageObject, passing the formatted dictionary. const phones = new CollectContent('.details-desc a.tel', { name: 'phone' })//Important to choose a name, for the getPageObject to produce the expected results. const titles = new CollectContent('h1', { name: 'title' }); root.addOperation(jobAds); jobAds.addOperation(titles); jobAds.addOperation(phones); await scraper.scrape(root); fs.writeFile('./pages.json', JSON.stringify(pages), () => { });//Produces a formatted JSON with all job ads. })() Let's describe again in words, what's going on here: "Go to https://www.profesia.sk/praca/; Then, open every job ad; Then, collect the title and phone of each ad."

     Download all images from a page

A simple task: download all images on a page (including base64). const { Scraper, Root, DownloadContent } = require('nodejs-web-scraper'); (async () => { const config = { baseSiteUrl: `https://spectator.sme.sk`,//Important to provide the base url, which is the same as the starting url, in this example. startUrl: `https://spectator.sme.sk/`, filePath: './images/', cloneFiles: true,//Will create a new image file with an appended name, if the name already exists. Default is false. } const scraper = new Scraper(config); const root = new Root();//Root corresponds to the config.startUrl. This object starts the entire process const images = new DownloadContent('img')//Create an operation that downloads all image tags in a given page (any Cheerio selector can be passed). root.addOperation(images);//We want to download the images from the root page, so we need to pass the "images" operation to the root. await scraper.scrape(root);//Pass the Root to the Scraper.scrape() and you're done. })(); When done, you will have an "images" folder with all downloaded files.

     Use multiple selectors

If you need to select elements from different possible classes ("or" operator), just pass comma-separated classes. This is part of the jQuery specification (which Cheerio implements), and has nothing to do with the scraper. const { Scraper, Root, CollectContent } = require('nodejs-web-scraper'); (async () => { const config = { baseSiteUrl: `https://spectator.sme.sk`, startUrl: `https://spectator.sme.sk/`, } function getElementContent(element){ // Do something... } const scraper = new Scraper(config); const root = new Root(); const title = new CollectContent('.first_class, .second_class',{getElementContent});//Any of these will fit. root.addOperation(title); await scraper.scrape(root); })();

    Advanced Examples

     Pagination

Get every job ad from a job-offering site. Each job object will contain a title, a phone and image hrefs. Being that the site is paginated, use the pagination feature. const { Scraper, Root, OpenLinks, CollectContent, DownloadContent } = require('nodejs-web-scraper'); const fs = require('fs'); (async () => { const pages = [];//All ad pages. //pageObject will be formatted as {title,phone,images}, because these are the names we chose for the scraping operations below. const getPageObject = (pageObject,address) => { pages.push(pageObject) } const config = { baseSiteUrl: `https://www.profesia.sk`, startUrl: `https://www.profesia.sk/praca/`, filePath: './images/', logPath: './logs/' } const scraper = new Scraper(config); const root = new Root({ pagination: { queryString: 'page_num', begin: 1, end: 10 } });//Open pages 1-10. // YOU NEED TO SUPPLY THE QUERYSTRING that the site uses (more details in the API docs). "page_num" is just the string used on this example site. const jobAds = new OpenLinks('.list-row h2 a', { name: 'Ad page', getPageObject });//Opens every job ad, and calls the getPageObject, passing the formatted object. const phones = new CollectContent('.details-desc a.tel', { name: 'phone' })//Important to choose a name, for the getPageObject to produce the expected results. const images = new DownloadContent('img', { name: 'images' }) const titles = new CollectContent('h1', { name: 'title' }); root.addOperation(jobAds); jobAds.addOperation(titles); jobAds.addOperation(phones); jobAds.addOperation(images); await scraper.scrape(root); fs.writeFile('./pages.json', JSON.stringify(pages), () => { });//Produces a formatted JSON with all job ads. })() Let's describe again in words, what's going on here: "Go to https://www.profesia.sk/praca/; Then paginate the root page, from 1 to 10; Then, on each pagination page, open every job ad; Then, collect the title, phone and images of each ad."

     Get an entire HTML file

const sanitize = require('sanitize-filename');//Using this npm module to sanitize file names. const fs = require('fs'); const { Scraper, Root, OpenLinks } = require('nodejs-web-scraper'); (async () => { const config = { baseSiteUrl: `https://www.profesia.sk`, startUrl: `https://www.profesia.sk/praca/`, removeStyleAndScriptTags: false//Telling the scraper NOT to remove style and script tags, because I want them in my HTML files, for this example. } let directoryExists; const getPageHtml = (html, pageAddress) => {//Saving the HTML file, using the page address as a name. if(!directoryExists){ fs.mkdirSync('./html'); directoryExists = true; } const name = sanitize(pageAddress) fs.writeFile(`./html/${name}.html`, html, () => { }) } const scraper = new Scraper(config); const root = new Root({ pagination: { queryString: 'page_num', begin: 1, end: 100 } }); const jobAds = new OpenLinks('.list-row h2 a', { getPageHtml });//Opens every job ad, and calls a hook after every page is done. root.addOperation(jobAds); await scraper.scrape(root); })() Description: "Go to https://www.profesia.sk/praca/; Paginate 100 pages from the root; Open every job ad; Save every job ad page as an HTML file."

     Downloading a file that is not an image

const config = { baseSiteUrl: `https://www.some-content-site.com`, startUrl: `https://www.some-content-site.com/videos`, filePath: './videos/', logPath: './logs/' } const scraper = new Scraper(config); const root = new Root(); const video = new DownloadContent('a.video',{ contentType: 'file' });//The "contentType" makes it clear for the scraper that this is NOT an image (therefore the "href" is used instead of the "src"). const description = new CollectContent('h1'); root.addOperation(video); root.addOperation(description); await scraper.scrape(root); console.log(description.getData())//You can call the "getData" method on every operation object, giving you the aggregated data collected by it. Description: "Go to https://www.some-content-site.com; Download every video; Collect each h1; At the end, get the entire data from the "description" object."

     getElementContent and getPageResponse hooks

const getPageResponse = async (response) => { //Do something with response.data (the HTML content). No need to return anything. } const myDivs=[]; const getElementContent = (content, pageAddress) => { myDivs.push(`myDiv content from page ${pageAddress} is ${content}...`) } const config = { baseSiteUrl: `https://www.nice-site`, startUrl: `https://www.nice-site/some-section`, } const scraper = new Scraper(config); const root = new Root(); const articles = new OpenLinks('article a'); const posts = new OpenLinks('.post a', {getPageResponse});//Is called after the HTML of a link was fetched, but before the children have been scraped. Is passed the response object of the page. const myDiv = new CollectContent('.myDiv',{getElementContent});//Will be called after every "myDiv" element is collected. root.addOperation(articles); articles.addOperation(myDiv); root.addOperation(posts); posts.addOperation(myDiv) await scraper.scrape(root); Description: "Go to https://www.nice-site/some-section; Open every article link; Collect each .myDiv; Call getElementContent()". "Also, from https://www.nice-site/some-section, open every post; Before scraping the children (myDiv object), call getPageResponse(); Collect each .myDiv".

     Add additional conditions

    In some cases, using the cheerio selectors isn't enough to properly filter the DOM nodes. This is where the "condition" hook comes in. Both OpenLinks and DownloadContent can register a function with this hook, allowing you to decide if this DOM node should be scraped, by returning true or false. const config = { baseSiteUrl: `https://www.nice-site`, startUrl: `https://www.nice-site/some-section`, } /** * Will be called for each node collected by cheerio, in the given operation(OpenLinks or DownloadContent) */ const condition = (cheerioNode) => { //Note that cheerioNode contains other useful methods, like html(), hasClass(), parent(), attr() and more. const text = cheerioNode.text().trim();//Get the innerText of the <a> tag. if(text === 'some text i am looking for'){//Even though many links might fit the querySelector, Only those that have this innerText, // will be "opened". return true } } const scraper = new Scraper(config); const root = new Root(); //Let's assume this page has many links with the same CSS class, but not all are what we need. const linksToOpen = new OpenLinks('some-css-class-that-is-just-not-enough',{condition}); root.addOperation(linksToOpen); await scraper.scrape(root);

     Scraping an auth protected site

    Please refer to this guide: https://nodejs-web-scraper.ibrod83.com/blog/2020/05/23/crawling-subscription-sites/

    API

    class Scraper(config)

The main nodejs-web-scraper object. Starts the entire scraping process via Scraper.scrape(Root). Holds the configuration and global state. These are the available options for the scraper, with their default values: const config = { baseSiteUrl: '',//Mandatory. If your site sits in a subfolder, provide the path WITHOUT it. startUrl: '',//Mandatory. The page from which the process begins. logPath: null,//Highly recommended. Will create a log for each scraping operation (object). cloneFiles: true,//If an image with the same name exists, a new file with a number appended to it is created. Otherwise, it's overwritten. removeStyleAndScriptTags: true,//Removes any <style> and <script> tags found on the page, in order to serve Cheerio with a light-weight string. Change this ONLY if you have to. concurrency: 3,//Maximum concurrent requests. Highly recommended to keep it at 10 at most. maxRetries: 5,//Maximum number of retries of a failed request. delay: 200, timeout: 6000, filePath: null,//Needs to be provided only if a "downloadContent" operation is created. auth: null,//Can provide basic auth credentials (no clue what sites actually use it) headers: null,//Provide custom headers for the requests. proxy: null,//Use a proxy. Pass a full proxy URL, including the protocol and the port. showConsoleLogs: true,//Set to false, if you want to disable the messages onError: null//Callback function that is called whenever an error occurs - signature is: onError(errorString) => {} } Public methods:
async scrape(Root): After all objects have been created and assembled, you begin the process by calling this method, passing the root object.

    class Root([config])

Root is responsible for fetching the first page, and then scraping the children. It can also be paginated, hence the optional config. For instance: const root = new Root({ pagination: { queryString: 'page', begin: 1, end: 100 }}) The optional config takes these properties: { pagination:{},//In case your root page is paginated. getPageObject:(pageObject,address)=>{},//Gets a formatted page object with all the data we choose in our scraping setup. Also gets an address argument. getPageHtml:(htmlString,pageAddress)=>{},//Get the entire html page, and also the page address. Called for each page opened by this Root object. getPageData:(cleanData)=>{},//Called after all data was collected by the root and its children. getPageResponse:(response)=>{},//Will be called after a link's html was fetched, but BEFORE the child operations are performed on it (like, collecting some data from it). Is passed the response object (a custom response object, that also contains the original node-fetch response). Notice that any modification to this object might result in an unexpected behavior with the child operations of that page. getException:(error)=>{}//Get every exception thrown by Root. } Public methods:
addOperation(Operation): Add a scraping operation (OpenLinks, DownloadContent, CollectContent).
getData(): Gets all data collected by this operation. In the case of root, it will just be the entire scraping tree.
getErrors(): In the case of root, it will show all errors in every operation.

    class OpenLinks(querySelector,[config])

    Responsible for "opening links" in a given page. Basically it just creates a nodelist of anchor elements, fetches their html, and continues the process of scraping, in those pages - according to the user-defined scraping tree. The optional config can have these properties: { name:'some name',//Like every operation object, you can specify a name, for better clarity in the logs. pagination:{},//Look at the pagination API for more details. condition:(cheerioNode)=>{},//Use this hook to add additional filter to the nodes that were received by the querySelector. Return true to include, falsy to exclude. getPageObject:(pageObject,address)=>{},//Gets a formatted page object with all the data we choose in our scraping setup. Also gets an address argument. getPageHtml:(htmlString,pageAddress)=>{}//Get the entire html page, and also the page address. Called with each link opened by this OpenLinks object. getElementList:(elementList)=>{},//Is called each time an element list is created. In the case of OpenLinks, will happen with each list of anchor tags that it collects. Those elements all have Cheerio methods available to them. getPageData:(cleanData)=>{}//Called after all data was collected from a link, opened by this object.(if a given page has 10 links, it will be called 10 times, with the child data). getPageResponse:(response)=>{}//Will be called after a link's html was fetched, but BEFORE the child operations are performed on it(like, collecting some data from it). Is passed the response object(a custom response object, that also contains the original node-fetch response). Notice that any modification to this object, might result in an unexpected behavior with the child operations of that page. getException:(error)=>{}//Get every exception throw by this openLinks operation, even if this was later repeated successfully. slice:[start,end]//You can define a certain range of elements from the node list.Also possible to pass just a number, instead of an array, if you only want to specify the start. This uses the Cheerio/Jquery slice method. } Public methods:
addOperation(Operation): Add a scraping operation (OpenLinks, DownloadContent, CollectContent).
getData(): Will get the data from all pages processed by this operation.
getErrors(): Gets all errors encountered by this operation.

    class CollectContent(querySelector,[config])

    Responsible for simply collecting text/html from a given page. The optional config can receive these properties: { name:'some name', contentType:'text',//Either 'text' or 'html'. Default is text. shouldTrim:true,//Default is true. Applies JS String.trim() method. getElementList:(elementList,pageAddress)=>{}, getElementContent:(elementContentString,pageAddress)=>{}//Called with each element collected. getAllItems: (items, address)=>{}//Called after an entire page has its elements collected. slice:[start,end] } Public methods:
getData(): Gets all data collected by this operation.

    class DownloadContent(querySelector,[config])

Responsible for downloading files/images from a given page. The optional config can receive these properties: { name:'some name', contentType:'image',//Either 'image' or 'file'. Default is image. alternativeSrc:['first-alternative','second-alternative'],//Provide alternative attributes to be used as the src. Will only be invoked, //If the "src" attribute is undefined or is a dataUrl. If no matching alternative is found, the dataUrl is used. condition:(cheerioNode)=>{},//Use this hook to add an additional filter to the nodes that were received by the querySelector. Return true to include, falsy to exclude. getElementList:(elementList)=>{}, getException:(error)=>{},//Get every exception thrown by this downloadContent operation, even if this was later repeated successfully. filePath:'./somePath',//Overrides the global filePath passed to the Scraper config. slice:[start,end] } Public methods:
getData(): Gets all file names that were downloaded, and their relevant data.
getErrors(): Gets all errors encountered by this operation.

    Pagination explained

nodejs-web-scraper covers most scenarios of pagination (assuming it's server-side rendered, of course). //If a site uses a queryString for pagination, this is how it's done: const productPages = new OpenLinks('a.product', { pagination: { queryString: 'page_num', begin: 1, end: 1000 } });//You need to specify the query string that the site uses for pagination, and the page range you're interested in. //If the site uses some kind of offset (like Google search results), instead of just incrementing by one, you can do it this way: { pagination: { queryString: 'page_num', begin: 1, end: 100, offset: 10 } } //If the site uses routing-based pagination: { pagination: { routingString: '/', begin: 1, end: 100 } }

    Error Handling

nodejs-web-scraper will automatically repeat every failed request (except 404, 400, 403 and invalid images). The number of repetitions depends on the global config option "maxRetries", which you pass to the Scraper. If a request fails "indefinitely", it will be skipped. After the entire scraping process is complete, all "final" errors will be printed as a JSON into a file called "finalErrors.json" (assuming you provided a logPath). Alternatively, use the onError callback function in the scraper's global config.
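Putting those pieces together, here is a minimal sketch of wiring up retries and the onError callback (the URLs are placeholders):

const { Scraper, Root } = require('nodejs-web-scraper');

const config = {
    baseSiteUrl: 'https://www.some-site.com',
    startUrl: 'https://www.some-site.com/articles',
    maxRetries: 3,//Each failed request (except 404, 400, 403) is retried up to 3 times.
    logPath: './logs/',//Final errors also end up in ./logs/finalErrors.json.
    onError: (errorString) => console.error('Scraper error:', errorString)//Called whenever an error occurs.
};

const scraper = new Scraper(config);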

    Automatic logs

If a logPath was provided, the scraper will create a log for each operation object you create, and also the following ones: "log.json" (summary of the entire scraping tree), and "finalErrors.json" (an array of all FINAL errors encountered). I really recommend using this feature, alongside your own hooks and data handling.

    Concurrency

The program uses rather complex concurrency management. Since memory consumption can get very high in certain scenarios, I've force-limited the concurrency of pagination and "nested" OpenLinks operations. It should still be very quick. As a general note, I recommend limiting the concurrency to 10 at most. The config.delay option is also a key factor.

    Web Scraping with JavaScript and NodeJS



    JavaScript has become one of the most popular and widely used languages due to the massive improvements it has seen and the introduction of the runtime known as NodeJS. Whether it's a web or mobile application, JavaScript now has the right tools. This article will explain how the vibrant ecosystem of NodeJS allows you to efficiently scrape the web to meet most of your requirements.

     Prerequisites

This post is primarily aimed at developers who have some level of experience with JavaScript. However, if you have a firm understanding of web scraping but have no experience with JavaScript, it may still serve as a light introduction to JavaScript. Still, having experience in the following fields will certainly help: ✅ Experience with JavaScript ✅ Experience using the browser's DevTools to extract selectors of elements ✅ Some experience with ES6 JavaScript (Optional) ⭐ Make sure to check out the resources at the end of this article for more details on the subject!

     Outcomes

After reading this post, you will be able to: Have a functional understanding of NodeJS Use multiple HTTP clients to assist in the web scraping process Use multiple modern and battle-tested libraries to scrape the web

    Understanding NodeJS: A brief introduction

JavaScript was originally meant to add rudimentary scripting abilities to browsers, in order to allow websites to support more custom ways of interactivity with the user, like showing a dialog box or creating additional HTML content on-the-fly. For this purpose, browsers provide a runtime environment (with global objects such as document and window) to enable your code to interact with the browser instance and the page itself. And for more than a decade, JavaScript was really mostly confined to that use case and to the browser. However, that changed when Ryan Dahl introduced NodeJS in 2009. NodeJS took Chrome's JavaScript engine and brought it to the server (or rather, the command line). Contrary to the browser environment, it no longer had access to a browser window or cookie storage, but what it got instead was full access to the system resources. Now, it could easily open network connections, store records in databases, or even just read and write files on your hard drive. Essentially, Node.js introduced JavaScript as a server-side language and provides a regular JavaScript engine, freed from the usual browser sandbox shackles and, instead, pumped up with a standard system library for networking and file access.

     The JavaScript Event Loop

What it kept, was the Event Loop. As opposed to how many languages handle concurrency, with multi-threading, JavaScript has always only used a single thread and performed blocking operations in an asynchronous fashion, relying primarily on callback functions (or function pointers, as C developers may call them). Let's quickly check that out with a simple web server example: const http = require('http'); const port = 3000; const server = http.createServer((req, res) => { res.statusCode = 200; res.setHeader('Content-Type', 'text/plain'); res.end('Hello World'); }); server.listen(port, () => { console.log(`Server running at port ${port}`); }); Here, we import the HTTP standard library with require, then create a server object with createServer and pass it an anonymous handler function, which the library will invoke for each incoming HTTP request. Finally, we listen on the specified port - and that's actually it. There are two interesting bits here and both already hint at our event loop and JavaScript's asynchronicity: The handler function we pass to createServer The fact that listen is not a blocking call, but returns immediately In most other languages, we'd usually have an accept function/method, which would block our thread and return the connection socket of the connecting client. At that point, at the latest, we'd have to switch to multi-threading, as otherwise we could handle exactly one connection at a time. In this case, however, we don't have to deal with thread management and we always stay with one thread, thanks to callbacks and the event loop. As mentioned, listen will return immediately, but - although there's no code following our listen call - the application won't exit immediately. That is because we still have a callback registered via createServer (the function we passed). Whenever a client sends a request, Node.js will parse it in the background and call our anonymous function and pass the request object. The only thing we have to pay attention to here is to return swiftly and not block the function itself, but it's actually hard to do that, as almost all standard calls are asynchronous (either via callbacks or Promises) - just make sure you don't run while (true); 😀 But enough of theory, let's check it out, shall we? If you have Node.js installed, all you need to do is save the code to the file MyServer.js and run it in your shell with node MyServer.js. Now, just open your browser and load http://localhost:3000 - voilà, you should get a lovely "Hello World" greeting. That was easy, wasn't it? One could assume the single-threaded approach may come with performance issues, because it only has one thread, but it's actually quite the opposite and that's the beauty of asynchronous programming. Single-threaded, asynchronous programming can have, especially for I/O intensive work, quite a few performance advantages, because one does not need to pre-allocate resources (e.g. threads). All right, that was a very nice example of how we easily create a web server in Node.js, but we are in the business of scraping, aren't we? So let's take a look at the JavaScript HTTP client libraries.
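To see this non-blocking behavior in isolation, here is a tiny sketch: even with a delay of 0 ms, the timer callback only runs once the current call stack has finished, so the log lines appear out of source order.

// Demonstrates the event loop: the setTimeout callback is queued
// and only executed after the currently running code has completed.
console.log('before');
setTimeout(() => console.log('callback'), 0);
console.log('after');
// Prints: before, after, callback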

    HTTP clients: querying the web

    HTTP clients are tools capable of sending a request to a server and then receiving a response from it. Almost every tool that will be discussed in this article uses an HTTP client under the hood to query the server of the website that you will attempt to scrape.

     1. Built-In HTTP Client

As mentioned in our server example, Node.js does ship by default with an HTTP library. That library also has a built-in HTTP client. const http = require('http'); const req = http.request('http://example.com', res => { const data = []; res.on('data', _ => data.push(_)) res.on('end', () => console.log(data.join())) }); req.end(); It's rather easy to get started, as there are zero third-party dependencies to install or manage, however - as you can notice from our example - the library does require a bit of boilerplate, as it provides the response only in chunks and you eventually need to stitch them together manually. You'll also need to use the separate https module for HTTPS URLs. In short, it's convenient because it comes out-of-the-box, but it may require you to write more code than you may want. Hence, let's take a look at the other HTTP libraries. Shall we?
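For completeness, the equivalent sketch for an HTTPS URL uses the https module and is otherwise identical, chunk-stitching included:

const https = require('https');

// Same boilerplate as before, just with the https module for TLS connections.
const req = https.request('https://example.com', res => {
  const data = [];
  res.on('data', chunk => data.push(chunk));
  res.on('end', () => console.log(data.join('')));
});
req.end();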

     2. Fetch API

Another built-in method would be the Fetch API. While browsers have supported it for a while already, it took Node.js a bit longer, but as of version 18, Node.js does support fetch(). To be fair, for the time being, it still is considered an experimental feature, so if you prefer to play it safe, you can also opt for the polyfill/wrapper library node-fetch, which provides the same functionality. While at it, also check out our dedicated article on node-fetch. The Fetch API heavily uses Promises and coupled with await, that can really provide you with lean and legible code. async function fetch_demo() { const resp = await fetch('https://www.reddit.com/r/programming.json'); console.log(await resp.json()); } fetch_demo(); The only workaround we had to employ, was to wrap our code into a function, as top-level await is only available in ES modules, not in CommonJS scripts. Apart from that we really just called fetch() with our URL, awaited the response (Promise-magic happening in the background, of course), and used the json() function of our Response object (awaiting again) to get the response. Mind you, an already JSON-parsed response 😲. Not bad, two lines of code, no manual handling of data, no distinction between HTTP and HTTPS, and a native JSON object. fetch optionally accepts an additional options argument, where you can fine-tune your request with a specific request method (e.g. POST), additional HTTP headers, or pass authentication credentials.
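To illustrate that options argument, here is a short sketch of a POST request with a JSON body; httpbin.org is just a convenient echo service for testing, not part of the original example:

async function postDemo() {
  const resp = await fetch('https://httpbin.org/post', {
    method: 'POST', // override the default GET
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ hello: 'world' })
  });
  console.log(await resp.json()); // httpbin echoes the request back as JSON
}

postDemo();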

     3. Axios

Axios is pretty similar to Fetch. It's also a Promise-based HTTP client and it runs in both browsers and Node.js. Users of TypeScript will also love its built-in type support. One drawback, however: contrary to the libraries we mentioned so far, we do have to install it first. npm install axios Perfect, let's check out a first plain-Promise example: const axios = require('axios') axios .get('https://www.reddit.com/r/programming.json') .then((response) => { console.log(response) }) .catch((error) => { console.error(error) }); Pretty straightforward. Relying on Promises, we can certainly also use await again and make the whole thing a bit less verbose. So let's wrap it into a function one more time: async function getForum() { try { const response = await axios.get( 'https://www.reddit.com/r/programming.json' ) console.log(response) } catch (error) { console.error(error) } } All you have to do is call getForum! You can find the Axios library at GitHub.

     4. SuperAgent

    Much like Axios, SuperAgent is another robust HTTP client that has support for promises and the async/await syntax sugar. It has a fairly straightforward API like Axios, but SuperAgent has more dependencies and is less popular. Regardless, making an HTTP request with SuperAgent using promises, async/await, and callbacks looks like this: const superagent = require("superagent") const forumURL = "https://www.reddit.com/r/programming.json" // callbacks superagent .get(forumURL) .end((error, response) => { console.log(response) }) // promises superagent .get(forumURL) .then((response) => { console.log(response) }) .catch((error) => { console.error(error) }) // promises with async/await async function getForum() { try { const response = await superagent.get(forumURL) console.log(response) } catch (error) { console.error(error) } } You can find the SuperAgent library at GitHub and installing SuperAgent is as simple as npm install superagent.

     SuperAgent plugins

    One feature, that sets SuperAgent apart from the other libraries here, is its extensibility. It features quite a list of plugins which allow for the tweaking of a request or response. For example, the superagent-throttle plugin would allow you to define throttling rules for your requests.
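As a rough sketch of how such a plugin is wired in via SuperAgent's use() method (the option names below follow the superagent-throttle README, so double-check them against the plugin's current documentation):

const superagent = require('superagent');
const Throttle = require('superagent-throttle');

// Allow at most 5 requests per 10 seconds, 2 of them in parallel.
const throttle = new Throttle({ active: true, rate: 5, ratePer: 10000, concurrent: 2 });

superagent
  .get('https://www.reddit.com/r/programming.json')
  .use(throttle.plugin())
  .then((response) => console.log(response.status))
  .catch((error) => console.error(error));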

     5. Request

Even though it is not actively maintained any more, Request still is a popular and widely used HTTP client in the JavaScript ecosystem. It is fairly simple to make an HTTP request with Request: const request = require('request') request('https://www.reddit.com/r/programming.json', function ( error, response, body ) { console.error('error:', error) console.log('body:', body) }) What you will definitely have noticed here is that we were neither using plain Promises nor await. That is because Request still employs the traditional callback approach, however there are a couple of wrapper libraries to support await as well. You can find the Request library at GitHub, and installing it is as simple as running npm install request. Should you use Request? We included Request in this list because it still is a popular choice. Nonetheless, development has officially stopped and it is not being actively maintained any more. Of course, that does not mean it is unusable, and there are still lots of libraries using it, but the fact itself may still make us think twice before we use it for a brand-new project, especially with quite a list of viable alternatives and native fetch support.

    Data Extraction in JavaScript

Fetching the content of a site is, undoubtedly, an important step in any scraping project, but it's only the first step and we actually need to locate and extract the data as well. This is what we are going to check out next: how we can handle an HTML document in JavaScript and how to locate and select information for data extraction. First off, regular expressions 🙂

     Regular expressions: the hard way

The simplest way to get started with web scraping without any dependencies, is to use a bunch of regular expressions on the HTML content you received from your HTTP client. But there is a big tradeoff. While absolutely great in their domain, regular expressions are not ideal for parsing document structures like HTML. Plus, newcomers often struggle with getting them right ("do I need a look-ahead or a look-behind?"). For complex web scraping, regular expressions can also get out of hand. With that said, let's give it a go nonetheless. Say there's a label with some username in it and we want the username. This is similar to what you'd have to do if you relied on regular expressions: const htmlString = '<label>Username: John Doe</label>' const result = htmlString.match(/<label>Username: (.+)<\/label>/) console.log(result[1]) // John Doe We are using String.match() here, which will provide us with an array containing the data of the evaluation of our regular expression. As we used a capturing group ((.+)), the second array element (result[1]) will contain whatever that group managed to capture. While this certainly worked in our example, anything more complex will either not work or will require a way more complex expression. Just imagine you have a couple of <label> elements in your HTML document. Don't get us wrong, regular expressions are an unimaginably great tool, just not for HTML 😊 - so let us introduce you to the world of CSS selectors and the DOM.
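To make the multiple-element problem concrete, a global expression combined with matchAll() can at least collect every label, though it stays brittle the moment the markup changes. A minimal sketch:

const html = '<label>Username: John Doe</label><label>Username: Jane Roe</label>';
// The g flag plus a lazy quantifier (.+?) yields one match per <label>
// instead of one greedy match spanning both elements.
const matches = [...html.matchAll(/<label>Username: (.+?)<\/label>/g)];
console.log(matches.map(m => m[1])); // [ 'John Doe', 'Jane Roe' ]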

     Cheerio: Core jQuery for traversing the DOM

Cheerio is an efficient and light library that allows you to use the rich and powerful API of jQuery on the server-side. If you have used jQuery before, you will feel right at home with Cheerio. It provides you with an incredibly easy way to parse an HTML string into a DOM tree, which you can then access via the elegant interface you may be familiar with from jQuery (including function-chaining). const cheerio = require('cheerio') const $ = cheerio.load('<h2 class="title">Hello world</h2>') $('h2.title').text('Hello there!') $('h2').addClass('welcome') $.html() // <h2 class="title welcome">Hello there!</h2> As you can see, using Cheerio really is almost identical to how you'd use jQuery. Keep in mind, Cheerio really focuses on DOM-manipulation and you won't be able to directly "port" jQuery functionality, such as XHR/AJAX requests or mouse handling (e.g. onClick), one-to-one in Cheerio. Cheerio is a great tool for most use cases when you need to handle the DOM yourself. Of course, if you want to crawl a JavaScript-heavy site (e.g. typical Single-page applications) you may need something closer to a full browser engine. We'll be talking about that in just a second, under Headless Browsers in JavaScript. Time for a quick Cheerio example, wouldn't you agree? To demonstrate the power of Cheerio, we will attempt to crawl the r/programming forum in Reddit and get a list of post names. First, install Cheerio and Axios by running the following command: npm install cheerio axios. Then create a new file called crawler.js and copy/paste the following code: const axios = require('axios'); const cheerio = require('cheerio'); const getPostTitles = async () => { try { const { data } = await axios.get( 'https://old.reddit.com/r/programming/' ); const $ = cheerio.load(data); const postTitles = []; $('div > p.title > a').each((_idx, el) => { const postTitle = $(el).text() postTitles.push(postTitle) }); return postTitles; } catch (error) { throw error; } }; getPostTitles() .then((postTitles) => console.log(postTitles)); getPostTitles() is an asynchronous function that will crawl the subreddit r/programming forum. First, the HTML of the website is obtained using a simple HTTP GET request with the Axios HTTP client library. Then, the HTML data is fed into Cheerio using the cheerio.load() function. Wonderful, we now have the fully parsed HTML document as a DOM tree, in good old-fashioned jQuery manner, in $. What's next? Well, it might not be a bad idea to know where to get our posting titles from. So, let's right click one of the titles and pick Inspect. That should get us right to the right element in the browser's developer tools. [Image: Inspecting the Reddit DOM in the browser's developer tools]
Excellent, equipped with our knowledge of XPath or CSS selectors, we can now easily compose the expression we need for that element. For our example, we chose CSS selectors, and the following one works beautifully: div > p.title > a If you have used jQuery, you probably know what we are up to, right? 😏 $('div > p.title > a') You were absolutely right. The Cheerio call is identical to jQuery (there was a reason why we used $ for our DOM variable before), and using Cheerio with our CSS selector will give us the very list of elements matching our selector. Now, we just need to iterate with each() over all elements and call their text() function to get their text content. 💯 jQuery, isn't it? So much for the explanation. Time to run our code. Open up your shell and run node crawler.js. You'll then see an array of about 25 or 26 post titles (it'll be quite long). While this is a simple use case, it demonstrates the simple nature of the API provided by Cheerio. If your use case requires the execution of JavaScript and loading of external sources, the following few options will be helpful. Don't forget to check out our NodeJS Axios proxy tutorial if you want to learn more about using proxies for web scraping!
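One more Cheerio nicety before we jump to headless browsers: attr() reads element attributes, so you could collect the link targets along with the titles. A self-contained toy snippet of our own, reusing the same selector idea:

const cheerio = require('cheerio')

const $ = cheerio.load('<p class="title"><a href="/post/1">First post</a></p>')

// text() returns the link text, attr('href') the attribute value
$('p.title > a').each((_idx, el) => {
  console.log($(el).text(), '->', $(el).attr('href'))
})
// First post -> /post/1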

     jsdom: the DOM for Node

Similarly to how Cheerio replicates jQuery on the server-side, jsdom does the same for the browser's native DOM functionality. Unlike Cheerio, however, jsdom does not only parse HTML into a DOM tree, it can also handle embedded JavaScript code and it allows you to "interact" with page elements. Instantiating a jsdom object is rather easy:

const { JSDOM } = require('jsdom')
const { document } = new JSDOM('<h2 class="title">Hello world</h2>').window

const heading = document.querySelector('.title')
heading.textContent = 'Hello there!'
heading.classList.add('welcome')

heading.outerHTML // <h2 class="title welcome">Hello there!</h2>

Here, we imported the library with require, created a new jsdom instance using the constructor, and passed our HTML snippet. Then, we simply used querySelector() (as we know it from front-end development) to select our element and tweaked its attributes a bit. Fairly standard, and we could have done that with Cheerio as well, of course. What sets jsdom apart, however, is the aforementioned support for embedded JavaScript code, and that is what we are going to check out now. The following example uses a simple local HTML page, with one button adding a <div> with an ID.

const { JSDOM } = require("jsdom")

const HTML = `
<html>
  <body>
    <button onclick="const e = document.createElement('div'); e.id = 'myid'; this.parentNode.appendChild(e);">Click me</button>
  </body>
</html>`;

const dom = new JSDOM(HTML, { runScripts: "dangerously", resources: "usable" });
const document = dom.window.document;

const button = document.querySelector('button');

console.log("Element before click: " + document.querySelector('div#myid'));
button.click();
console.log("Element after click: " + document.querySelector('div#myid'));

Nothing too complicated here: we require() jsdom, set up our HTML document, pass the HTML to our jsdom constructor (important: we need to enable runScripts), select the button with a querySelector() call, and click() it. Voilà, that should give us this output:

Element before click: null
Element after click: [object HTMLDivElement]

Fairly straightforward, and the example showcases how we can use jsdom to actually execute the page's JavaScript code. When we loaded the document, there was initially no <div>. Only once we clicked the button was it added - by the site's code, not our crawler's code. In this context, the important details are runScripts and resources. These flags instruct jsdom to run the page's code, as well as fetch any relevant JavaScript files. As jsdom's documentation points out, that could potentially allow any site to escape the sandbox and get access to your local system, just by crawling it. Proceed with caution, please. jsdom is a great library to handle most typical browser tasks within your local Node.js instance, but it still has some limitations, and that's where headless browsers really come to shine. 💡 We released a new feature that makes this whole process way simpler. You can now extract data from HTML with one simple API call. Feel free to check the documentation here.

    Headless Browsers in JavaScript

Sites are becoming more and more complex, and often plain HTTP crawling no longer suffices: one actually needs a full-fledged browser engine to get the necessary information from a site. This is particularly true for SPAs, which rely heavily on JavaScript and dynamic and asynchronous resources. Browser automation and headless browsers come to the rescue here. Let's check out how they can help us easily crawl Single-page Applications and other sites making use of JavaScript.

     1. Puppeteer: the headless browser

Puppeteer, as the name implies, allows you to manipulate the browser programmatically, just like a puppet would be manipulated by its puppeteer. It achieves this by providing a high-level API to control Chrome, which runs headless by default but can be configured to run with its full UI. [Diagram: the Puppeteer hierarchy, taken from the Puppeteer docs] Puppeteer is particularly more useful than the aforementioned tools because it allows you to crawl the web as if a real person were interacting with a browser. This opens up a few possibilities that weren't there before: You can get screenshots or generate PDFs of pages. You can crawl a Single Page Application and generate pre-rendered content. You can automate many different user interactions, like keyboard inputs, form submissions, navigation, etc. It could also play a big role in many other tasks outside the scope of web crawling, like UI testing, assisting with performance optimization, etc. Quite often, you will probably want to take screenshots of websites or get to know a competitor's product catalog. Puppeteer can be used to do this. To start, install Puppeteer by running the following command: npm install puppeteer This will download a bundled version of Chromium, which takes up about 180 to 300 MB, depending on your operating system. You can avoid that step, and use an already installed setup, by specifying a couple of Puppeteer environment variables, such as PUPPETEER_SKIP_CHROMIUM_DOWNLOAD. Generally, though, Puppeteer recommends using the bundled version and does not guarantee support for custom setups. Let's attempt to get a screenshot and PDF of the r/programming forum on Reddit, create a new file called crawler.js, and copy/paste the following code:

const puppeteer = require('puppeteer')

async function getVisual() {
  try {
    const URL = 'https://www.reddit.com/r/programming/'
    const browser = await puppeteer.launch()
    const page = await browser.newPage()

    await page.goto(URL)
    await page.screenshot({ path: 'screenshot.png' })
    await page.pdf({ path: 'page.pdf' })

    await browser.close()
  } catch (error) {
    console.error(error)
  }
}

getVisual()

getVisual() is an asynchronous function that will take a screenshot of our page, as well as export it as a PDF document. To start, an instance of the browser is created by running puppeteer.launch(). Next, we create a new browser tab/page with newPage(). Now, we just need to call goto() on our page instance and pass it our URL. All these functions are of an asynchronous nature and will return immediately, but as they return a JavaScript Promise and we are using await, the flow still appears to be synchronous and, hence, once goto "returned", our website should have loaded. Excellent, we are ready to get pretty pictures. Let's just call screenshot() on our page instance and pass it a path to our image file. We do the same with pdf(), and voilà, we should have two new files at the specified locations. Because we are responsible netizens, we also call close() on our browser object, to clean up behind ourselves. That's it. One thing to keep in mind: when goto() returns, the page has loaded, but it might not be done with all its asynchronous loading. So depending on your site, you may want to add additional logic in a production crawler, to wait for certain JavaScript events or DOM elements. But let's run the code. Pop up a shell window, type node crawler.js, and after a few moments, you should have exactly the two mentioned files in your directory.
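On that note - waiting for specific DOM elements - here's a minimal sketch using page.waitForSelector(). Treat it as an illustration: the .post-title selector is hypothetical, so substitute whatever your target site actually uses.

const puppeteer = require('puppeteer')

async function getTitles(url) {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  await page.goto(url)

  // Resolves once a matching element exists in the DOM (or rejects on timeout)
  await page.waitForSelector('.post-title', { timeout: 30000 })

  // Run a function in the page against all matching elements
  const titles = await page.$$eval('.post-title', els => els.map(e => e.textContent.trim()))

  await browser.close()
  return titles
}

getTitles('https://example.com').then(console.log).catch(console.error)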
It's a great tool, and if you are really keen on it now, please also check out our other guides on Puppeteer:
How to download a file with Puppeteer
Handling and submitting HTML forms with Puppeteer
Using Puppeteer with Python and Pyppeteer

     2. Nightmare: an alternative to Puppeteer

Nightmare is another high-level browser automation library, like Puppeteer. It uses Electron, and web scraping benchmarks indicate that it performs significantly better than its predecessor, PhantomJS. If Puppeteer is too complex for your use case or there are issues with the default Chromium bundle, Nightmare - despite its name 😨 - may just be the right thing for you. As so often, our journey starts with NPM: npm install nightmare Once Nightmare is available on your system, we will use it to find ScrapingBee's website through a Brave search. To do so, create a file called crawler.js and copy/paste the following code into it:

const Nightmare = require('nightmare')
const nightmare = Nightmare()

nightmare
  .goto('https://search.brave.com/')
  .type('#searchbox', 'ScrapingBee')
  .click('#submit-button')
  .wait('#results a')
  .evaluate(() => document.querySelector('#results a').href)
  .end()
  .then((link) => {
    console.log('ScrapingBee Web Link:', link)
  })
  .catch((error) => {
    console.error('Search failed:', error)
  })

After the usual library import with require, we first create a new instance of Nightmare and save that in nightmare. After that, we are going to have lots of fun with function-chaining and Promises 🥳 We use goto() to load Brave from https://search.brave.com We type our search term "ScrapingBee" into Brave's search input, with the CSS selector #searchbox (Brave's quite straightforward with its naming, isn't it?) We click the submit button to start our search. Again, that's with the CSS selector #submit-button (Brave's really straightforward, we love that❣️) Let's take a quick break until Brave returns the search list. wait, with the right selector, works wonders here. wait also accepts a time value, if you need to wait for a specific period of time. Once Nightmare has got the link list from Brave, we simply use evaluate() to run our custom code on the page (in this case querySelector()), get the first <a> element matching our selector, and return its href attribute. Last but not least, we call end() to run and complete our task queue. That's it, folks. end() returns a standard Promise with the value from our call to evaluate(). Of course, you could also use await here. That was pretty easy, wasn't it? And if everything went all right 🤞, we should now have the link to ScrapingBee's website at https://www.scrapingbee.com ScrapingBee Web Link: https://www.scrapingbee.com/ Wanna try it yourself? Just run node crawler.js in your shell 👍
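And since we just said you could also use await here: Nightmare's task queue is "thenable", so the very same flow can be awaited directly. A minimal sketch of that variant:

const Nightmare = require('nightmare')

async function findScrapingBee() {
  try {
    // Awaiting the chain runs the queued actions and yields evaluate()'s value
    const link = await Nightmare()
      .goto('https://search.brave.com/')
      .type('#searchbox', 'ScrapingBee')
      .click('#submit-button')
      .wait('#results a')
      .evaluate(() => document.querySelector('#results a').href)
      .end()
    console.log('ScrapingBee Web Link:', link)
  } catch (error) {
    console.error('Search failed:', error)
  }
}

findScrapingBee()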

     3. Playwright, the new web scraping framework

    Playwright is the new cross-language, cross-platform headless framework supported by Microsoft.

Its main advantage over Puppeteer is that it is cross-platform and very easy to use.

Here is how to simply scrape a page with it:

const playwright = require('playwright');

async function main() {
  const browser = await playwright.chromium.launch({
    headless: false // setting this to true will not run the UI
  });

  const page = await browser.newPage();
  await page.goto('https://finance.yahoo.com/world-indices');
  await page.waitForTimeout(5000); // wait for 5 seconds
  await browser.close();
}

main();

Feel free to check out our Playwright tutorial if you want to learn more.
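If you want to actually extract data rather than just load the page, here's a small sketch of our own (generic selectors, not taken from the tutorial): it grabs the page title and the text of every link via $$eval(), which runs a callback inside the page against all matching elements.

const playwright = require('playwright');

async function scrape(url) {
  const browser = await playwright.chromium.launch(); // headless by default
  const page = await browser.newPage();
  await page.goto(url);

  const title = await page.title();
  // $$eval(selector, fn) evaluates fn in the page over all matches
  const links = await page.$$eval('a', (els) => els.map((el) => el.textContent.trim()));

  await browser.close();
  return { title, links };
}

scrape('https://example.com').then(console.log).catch(console.error);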

    Summary

Phew, that was a long read! But we hope our examples managed to give you a first glimpse into the world of web scraping with JavaScript, and which libraries you can use to crawl the web and scrape the information you need. Let's give it a quick recap, what we learned today was: ✅ NodeJS is a JavaScript runtime that allows JavaScript to be run server-side. It has a non-blocking nature thanks to the Event Loop. ✅ HTTP clients, such as the native libraries and fetch, as well as Axios, SuperAgent, node-fetch, and Request, are used to send HTTP requests to a server and receive a response. ✅ Cheerio abstracts the best out of jQuery for the sole purpose of running it server-side for web crawling, but does not execute JavaScript code. ✅ JSDOM creates a DOM per the standard JavaScript specification out of an HTML string and allows you to perform DOM manipulations on it. ✅ Puppeteer and Nightmare are high-level browser automation libraries that allow you to programmatically manipulate web applications as if a real person were interacting with them. This article focused on JavaScript's scraping ecosystem and its tools. However, there are certainly also other aspects to scraping, which we could not cover in this context. For example, sites often employ techniques to recognize and block crawlers. You'll want to avoid these and blend in as a normal visitor. On this subject, and more, we have an excellent, dedicated guide on how not to get blocked as a crawler. Check it out please. 💡 Should you love scraping, but the usual time-constraints for your project don't allow you to tweak your crawlers to perfection, then please have a look at our scraping API platform. ScrapingBee was built with all these things in mind and has got your back in all crawling tasks. Happy Scraping!

    Resources

Would you like to read more? Check these links out: NodeJS Website - The main site of NodeJS with its official documentation. Puppeteer's Docs - Google's documentation of Puppeteer, with getting started guides and the API reference. Playwright - An alternative to Puppeteer, backed by Microsoft. ScrapingBee's Blog - Contains a lot of information about Web Scraping goodies on multiple platforms. Handling infinite scroll with Puppeteer Node-unblocker - a Node.js package to facilitate web scraping through proxies.

    Global Objects



Node.js global objects are available in all modules. We do not need to include these objects in our application; rather, we can use them directly. These objects can be modules, functions, strings, or objects themselves, as explained below.

    __filename

The __filename represents the filename of the code being executed. This is the resolved absolute path of this code file. For a main program, this is not necessarily the same filename used in the command line. The value inside a module is the path to that module file. Example Create a js file named main.js with the following code −

// Let's try to print the value of __filename
console.log(__filename);

Now run main.js to see the result −

$ node main.js

Based on the location of your program, it will print the main file name as follows −

/web/com/1427091028_21099/main.js

    __dirname

The __dirname represents the name of the directory that the currently executing script resides in. Example Create a js file named main.js with the following code −

// Let's try to print the value of __dirname
console.log(__dirname);

Now run main.js to see the result −

$ node main.js

Based on the location of your program, it will print the current directory name as follows −

/web/com/1427091028_21099

    setTimeout(cb, ms)

The setTimeout(cb, ms) global function is used to run callback cb after at least ms milliseconds. The actual delay depends on external factors like OS timer granularity and system load. A timer cannot span more than 24.8 days. This function returns an opaque value that represents the timer, which can be used to clear the timer. Example Create a js file named main.js with the following code −

function printHello() {
  console.log("Hello, World!");
}

// Now call the above function after 2 seconds
setTimeout(printHello, 2000);

Now run main.js to see the result −

$ node main.js

Verify that the output is printed after a short delay.

Hello, World!

    clearTimeout(t)

The clearTimeout(t) global function is used to stop a timer that was previously created with setTimeout(). Here t is the timer returned by the setTimeout() function. Example Create a js file named main.js with the following code −

function printHello() {
  console.log("Hello, World!");
}

// Now call the above function after 2 seconds
var t = setTimeout(printHello, 2000);

// Now clear the timer
clearTimeout(t);

Now run main.js to see the result −

$ node main.js

Verify the output: you will not find anything printed.

    setInterval(cb, ms)

The setInterval(cb, ms) global function is used to run callback cb repeatedly, with at least ms milliseconds between runs. The actual delay depends on external factors like OS timer granularity and system load. A timer cannot span more than 24.8 days. This function returns an opaque value that represents the timer, which can be used to clear the timer using the function clearInterval(t). Example Create a js file named main.js with the following code −

function printHello() {
  console.log("Hello, World!");
}

// Now call the above function every 2 seconds
setInterval(printHello, 2000);

Now run main.js to see the result −

$ node main.js

The above program will execute printHello() every 2 seconds, and it keeps running until the timer is cleared or the process exits.
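To see setInterval() and clearInterval() working together, here is a small sketch of our own that stops the timer from inside its own callback after three runs:

let ticks = 0;

const t = setInterval(() => {
  ticks += 1;
  console.log(`Hello, World! (${ticks})`);
  if (ticks === 3) {
    clearInterval(t); // no timers left, so the process can exit
  }
}, 2000);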

    Global Objects

The following table provides a list of other objects which we use frequently in our applications. For more detail, you can refer to the official documentation.
Sr.No.  Module Name  Description
1       Console      Used to print information on stdout and stderr.
2       Process      Used to get information on the current process. Provides multiple events related to process activities.
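Both of these are available in every module without a require(). A tiny demonstration (our own example):

// console writes to stdout/stderr; process describes the running process
console.log(`PID ${process.pid}, Node ${process.version}, platform ${process.platform}`);
process.on('exit', (code) => console.error(`Exiting with code ${code}`));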

    Global Modules: CommonJS modules



CommonJS modules are the original way to package JavaScript code for Node.js. Node.js also supports the ECMAScript modules standard used by browsers and other JavaScript runtimes. In Node.js, each file is treated as a separate module. For example, consider a file named foo.js:

const circle = require('./circle.js');
console.log(`The area of a circle of radius 4 is ${circle.area(4)}`);

On the first line, foo.js loads the module circle.js that is in the same directory as foo.js. Here are the contents of circle.js:

const { PI } = Math;

exports.area = (r) => PI * r ** 2;

exports.circumference = (r) => 2 * PI * r;

The module circle.js has exported the functions area() and circumference(). Functions and objects are added to the root of a module by specifying additional properties on the special exports object. Variables local to the module will be private, because the module is wrapped in a function by Node.js (see module wrapper). In this example, the variable PI is private to circle.js. The module.exports property can be assigned a new value (such as a function or object). Below, bar.js makes use of the square module, which exports a Square class:

const Square = require('./square.js');
const mySquare = new Square(2);
console.log(`The area of mySquare is ${mySquare.area()}`);

The square module is defined in square.js:

// Assigning to exports will not modify module, must use module.exports
module.exports = class Square {
  constructor(width) {
    this.width = width;
  }

  area() {
    return this.width ** 2;
  }
};

The CommonJS module system is implemented in the module core module.

     Enabling

Node.js has two module systems: CommonJS modules and ECMAScript modules. By default, Node.js will treat the following as CommonJS modules: Files with a .cjs extension; Files with a .js extension when the nearest parent package.json file contains a top-level field "type" with a value of "commonjs"; Files with a .js extension when the nearest parent package.json file doesn't contain a top-level field "type". Package authors should include the "type" field, even in packages where all sources are CommonJS. Being explicit about the type of the package will make things easier for build tools and loaders to determine how the files in the package should be interpreted. Files with an extension that is not .mjs, .cjs, .json, .node, or .js (when the nearest parent package.json file contains a top-level field "type" with a value of "module", those files will be recognized as CommonJS modules only if they are being included via require(), not when used as the command-line entry point of the program). See Determining module system for more details. Calling require() always uses the CommonJS module loader. Calling import() always uses the ECMAScript module loader.
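For instance, a minimal package.json making the package type explicit (a hypothetical file of our own; with "type": "module" instead, the .js files in the package would be treated as ES modules):

{
  "name": "my-package",
  "type": "commonjs",
  "main": "index.js"
}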

     Accessing the main module

    When a file is run directly from Node.js, require.main is set to its module. That means that it is possible to determine whether a file has been run directly by testing require.main === module. For a file foo.js, this will be true if run via node foo.js, but false if run by require('./foo'). When the entry point is not a CommonJS module, require.main is undefined, and the main module is out of reach.
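A minimal sketch of that test in practice (the file name is our own example):

// check-main.js
if (require.main === module) {
  console.log('Run directly: node check-main.js');
} else {
  console.log('Loaded via require()');
}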

     Package manager tips

    The semantics of the Node.js require() function were designed to be general enough to support reasonable directory structures. Package manager programs such as dpkg, rpm, and npm will hopefully find it possible to build native packages from Node.js modules without modification. Below we give a suggested directory structure that could work: Let's say that we wanted to have the folder at /usr/lib/node/<some-package>/<some-version> hold the contents of a specific version of a package. Packages can depend on one another. In order to install package foo, it may be necessary to install a specific version of package bar. The bar package may itself have dependencies, and in some cases, these may even collide or form cyclic dependencies. Because Node.js looks up the realpath of any modules it loads (that is, it resolves symlinks) and then looks for their dependencies in node_modules folders, this situation can be resolved with the following architecture: /usr/lib/node/foo/1.2.3/: Contents of the foo package, version 1.2.3. /usr/lib/node/bar/4.3.2/: Contents of the bar package that foo depends on. /usr/lib/node/foo/1.2.3/node_modules/bar: Symbolic link to /usr/lib/node/bar/4.3.2/. /usr/lib/node/bar/4.3.2/node_modules/*: Symbolic links to the packages that bar depends on. Thus, even if a cycle is encountered, or if there are dependency conflicts, every module will be able to get a version of its dependency that it can use. When the code in the foo package does require('bar'), it will get the version that is symlinked into /usr/lib/node/foo/1.2.3/node_modules/bar. Then, when the code in the bar package calls require('quux'), it'll get the version that is symlinked into /usr/lib/node/bar/4.3.2/node_modules/quux. Furthermore, to make the module lookup process even more optimal, rather than putting packages directly in /usr/lib/node, we could put them in /usr/lib/node_modules/<name>/<version>. Then Node.js will not bother looking for missing dependencies in /usr/node_modules or /node_modules. In order to make modules available to the Node.js REPL, it might be useful to also add the /usr/lib/node_modules folder to the $NODE_PATH environment variable. Since the module lookups using node_modules folders are all relative, and based on the real path of the files making the calls to require(), the packages themselves can be anywhere.

     The .mjs extension

Due to the synchronous nature of require(), it is not possible to use it to load ECMAScript module files. Attempting to do so will throw an ERR_REQUIRE_ESM error. Use import() instead. The .mjs extension is reserved for ECMAScript Modules, which cannot be loaded via require(). See the Determining module system section for more info regarding which files are parsed as ECMAScript modules.

     All together

To get the exact filename that will be loaded when require() is called, use the require.resolve() function. Putting together all of the above, here is the high-level algorithm in pseudocode of what require() does:

require(X) from module at path Y
1. If X is a core module,
   a. return the core module
   b. STOP
2. If X begins with '/'
   a. set Y to be the filesystem root
3. If X begins with './' or '/' or '../'
   a. LOAD_AS_FILE(Y + X)
   b. LOAD_AS_DIRECTORY(Y + X)
   c. THROW "not found"
4. If X begins with '#'
   a. LOAD_PACKAGE_IMPORTS(X, dirname(Y))
5. LOAD_PACKAGE_SELF(X, dirname(Y))
6. LOAD_NODE_MODULES(X, dirname(Y))
7. THROW "not found"

LOAD_AS_FILE(X)
1. If X is a file, load X as its file extension format. STOP
2. If X.js is a file, load X.js as JavaScript text. STOP
3. If X.json is a file, parse X.json to a JavaScript Object. STOP
4. If X.node is a file, load X.node as binary addon. STOP

LOAD_INDEX(X)
1. If X/index.js is a file, load X/index.js as JavaScript text. STOP
2. If X/index.json is a file, parse X/index.json to a JavaScript object. STOP
3. If X/index.node is a file, load X/index.node as binary addon. STOP

LOAD_AS_DIRECTORY(X)
1. If X/package.json is a file,
   a. Parse X/package.json, and look for "main" field.
   b. If "main" is a falsy value, GOTO 2.
   c. let M = X + (json main field)
   d. LOAD_AS_FILE(M)
   e. LOAD_INDEX(M)
   f. LOAD_INDEX(X) DEPRECATED
   g. THROW "not found"
2. LOAD_INDEX(X)

LOAD_NODE_MODULES(X, START)
1. let DIRS = NODE_MODULES_PATHS(START)
2. for each DIR in DIRS:
   a. LOAD_PACKAGE_EXPORTS(X, DIR)
   b. LOAD_AS_FILE(DIR/X)
   c. LOAD_AS_DIRECTORY(DIR/X)

NODE_MODULES_PATHS(START)
1. let PARTS = path split(START)
2. let I = count of PARTS - 1
3. let DIRS = []
4. while I >= 0,
   a. if PARTS[I] = "node_modules" CONTINUE
   b. DIR = path join(PARTS[0 .. I] + "node_modules")
   c. DIRS = DIR + DIRS
   d. let I = I - 1
5. return DIRS + GLOBAL_FOLDERS

LOAD_PACKAGE_IMPORTS(X, DIR)
1. Find the closest package scope SCOPE to DIR.
2. If no scope was found, return.
3. If the SCOPE/package.json "imports" is null or undefined, return.
4. let MATCH = PACKAGE_IMPORTS_RESOLVE(X, pathToFileURL(SCOPE), ["node", "require"]) defined in the ESM resolver.
5. RESOLVE_ESM_MATCH(MATCH).

LOAD_PACKAGE_EXPORTS(X, DIR)
1. Try to interpret X as a combination of NAME and SUBPATH where the name may have a @scope/ prefix and the subpath begins with a slash (`/`).
2. If X does not match this pattern or DIR/NAME/package.json is not a file, return.
3. Parse DIR/NAME/package.json, and look for "exports" field.
4. If "exports" is null or undefined, return.
5. let MATCH = PACKAGE_EXPORTS_RESOLVE(pathToFileURL(DIR/NAME), "." + SUBPATH, `package.json` "exports", ["node", "require"]) defined in the ESM resolver.
6. RESOLVE_ESM_MATCH(MATCH)

LOAD_PACKAGE_SELF(X, DIR)
1. Find the closest package scope SCOPE to DIR.
2. If no scope was found, return.
3. If the SCOPE/package.json "exports" is null or undefined, return.
4. If the SCOPE/package.json "name" is not the first segment of X, return.
5. let MATCH = PACKAGE_EXPORTS_RESOLVE(pathToFileURL(SCOPE), "." + X.slice("name".length), `package.json` "exports", ["node", "require"]) defined in the ESM resolver.
6. RESOLVE_ESM_MATCH(MATCH)

RESOLVE_ESM_MATCH(MATCH)
1. let RESOLVED_PATH = fileURLToPath(MATCH)
2. If the file at RESOLVED_PATH exists, load RESOLVED_PATH as its extension format. STOP
3. THROW "not found"

     Caching

    Modules are cached after the first time they are loaded. This means (among other things) that every call to require('foo') will get exactly the same object returned, if it would resolve to the same file. Provided require.cache is not modified, multiple calls to require('foo') will not cause the module code to be executed multiple times. This is an important feature. With it, "partially done" objects can be returned, thus allowing transitive dependencies to be loaded even when they would cause cycles. To have a module execute code multiple times, export a function, and call that function.
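Here is a small two-file sketch of our own showing that the cached exports object is shared:

counter.js:
let count = 0;
exports.increment = () => ++count;

main.js:
const a = require('./counter');
const b = require('./counter');

a.increment();
console.log(b.increment()); // prints 2: a and b are the very same cached object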

     Module caching caveats

    Modules are cached based on their resolved filename. Since modules may resolve to a different filename based on the location of the calling module (loading from node_modules folders), it is not a guarantee that require('foo') will always return the exact same object, if it would resolve to different files. Additionally, on case-insensitive file systems or operating systems, different resolved filenames can point to the same file, but the cache will still treat them as different modules and will reload the file multiple times. For example, require('./foo') and require('./FOO') return two different objects, irrespective of whether or not ./foo and ./FOO are the same file.

     Core modules

Node.js has several modules compiled into the binary. These modules are described in greater detail elsewhere in this documentation. The core modules are defined within the Node.js source and are located in the lib/ folder. Core modules can be identified using the node: prefix, in which case they bypass the require cache. For instance, require('node:http') will always return the built-in HTTP module, even if there is a require.cache entry by that name. Some core modules are always preferentially loaded if their identifier is passed to require(). For instance, require('http') will always return the built-in HTTP module, even if there is a file by that name. The list of core modules that can be loaded without using the node: prefix is exposed as module.builtinModules.

     Cycles

When there are circular require() calls, a module might not have finished executing when it is returned. Consider this situation:

a.js:
console.log('a starting');
exports.done = false;
const b = require('./b.js');
console.log('in a, b.done = %j', b.done);
exports.done = true;
console.log('a done');

b.js:
console.log('b starting');
exports.done = false;
const a = require('./a.js');
console.log('in b, a.done = %j', a.done);
exports.done = true;
console.log('b done');

main.js:
console.log('main starting');
const a = require('./a.js');
const b = require('./b.js');
console.log('in main, a.done = %j, b.done = %j', a.done, b.done);

When main.js loads a.js, then a.js in turn loads b.js. At that point, b.js tries to load a.js. In order to prevent an infinite loop, an unfinished copy of the a.js exports object is returned to the b.js module. b.js then finishes loading, and its exports object is provided to the a.js module. By the time main.js has loaded both modules, they're both finished. The output of this program would thus be:

$ node main.js
main starting
a starting
b starting
in b, a.done = false
b done
in a, b.done = true
a done
in main, a.done = true, b.done = true

Careful planning is required to allow cyclic module dependencies to work correctly within an application.

     File modules

    If the exact filename is not found, then Node.js will attempt to load the required filename with the added extensions: .js, .json, and finally .node. When loading a file that has a different extension (e.g. .cjs), its full name must be passed to require(), including its file extension (e.g. require('./file.cjs')). .json files are parsed as JSON text files, .node files are interpreted as compiled addon modules loaded with process.dlopen(). Files using any other extension (or no extension at all) are parsed as JavaScript text files. Refer to the Determining module system section to understand what parse goal will be used. A required module prefixed with '/' is an absolute path to the file. For example, require('/home/marco/foo.js') will load the file at /home/marco/foo.js. A required module prefixed with './' is relative to the file calling require(). That is, circle.js must be in the same directory as foo.js for require('./circle') to find it. Without a leading '/', './', or '../' to indicate a file, the module must either be a core module or is loaded from a node_modules folder. If the given path does not exist, require() will throw a MODULE_NOT_FOUND error.

     Folders as modules

There are three ways in which a folder may be passed to require() as an argument. The first is to create a package.json file in the root of the folder, which specifies a main module. An example package.json file might look like this: { "name" : "some-library", "main" : "./lib/some-library.js" } If this was in a folder at ./some-library, then require('./some-library') would attempt to load ./some-library/lib/some-library.js. If there is no package.json file present in the directory, or if the "main" entry is missing or cannot be resolved, then Node.js will attempt to load an index.js or index.node file out of that directory. For example, if there was no package.json file in the previous example, then require('./some-library') would attempt to load: ./some-library/index.js ./some-library/index.node If these attempts fail, then Node.js will report the entire module as missing with the default error: Error: Cannot find module 'some-library' In all three above cases, an import('./some-library') call would result in an ERR_UNSUPPORTED_DIR_IMPORT error. Using package subpath exports or subpath imports can provide the same containment organization benefits as folders as modules, and work for both require and import.

     Loading from node_modules folders

    If the module identifier passed to require() is not a core module, and does not begin with '/', '../', or './', then Node.js starts at the directory of the current module, and adds /node_modules, and attempts to load the module from that location. Node.js will not append node_modules to a path already ending in node_modules. If it is not found there, then it moves to the parent directory, and so on, until the root of the file system is reached. For example, if the file at '/home/ry/projects/foo.js' called require('bar.js'), then Node.js would look in the following locations, in this order: /home/ry/projects/node_modules/bar.js /home/ry/node_modules/bar.js /home/node_modules/bar.js /node_modules/bar.js This allows programs to localize their dependencies, so that they do not clash. It is possible to require specific files or sub modules distributed with a module by including a path suffix after the module name. For instance require('example-module/path/to/file') would resolve path/to/file relative to where example-module is located. The suffixed path follows the same module resolution semantics.

     Loading from the global folders

    If the NODE_PATH environment variable is set to a colon-delimited list of absolute paths, then Node.js will search those paths for modules if they are not found elsewhere. On Windows, NODE_PATH is delimited by semicolons (;) instead of colons. NODE_PATH was originally created to support loading modules from varying paths before the current module resolution algorithm was defined. NODE_PATH is still supported, but is less necessary now that the Node.js ecosystem has settled on a convention for locating dependent modules. Sometimes deployments that rely on NODE_PATH show surprising behavior when people are unaware that NODE_PATH must be set. Sometimes a module's dependencies change, causing a different version (or even a different module) to be loaded as the NODE_PATH is searched. Additionally, Node.js will search in the following list of GLOBAL_FOLDERS: 1: $HOME/.node_modules 2: $HOME/.node_libraries 3: $PREFIX/lib/node Where $HOME is the user's home directory, and $PREFIX is the Node.js configured node_prefix. These are mostly for historic reasons. It is strongly encouraged to place dependencies in the local node_modules folder. These will be loaded faster, and more reliably.

     The module wrapper

Before a module's code is executed, Node.js will wrap it with a function wrapper that looks like the following:

(function(exports, require, module, __filename, __dirname) {
  // Module code actually lives in here
});

By doing this, Node.js achieves a few things: It keeps top-level variables (defined with var, const, or let) scoped to the module rather than the global object. It helps to provide some global-looking variables that are actually specific to the module, such as: The module and exports objects that the implementor can use to export values from the module. The convenience variables __filename and __dirname, containing the module's absolute filename and directory path.

     The module scope

    __dirname

<string> The directory name of the current module. This is the same as the path.dirname() of the __filename. Example: running node example.js from /Users/mjr

const path = require('node:path');

console.log(__dirname);
// Prints: /Users/mjr
console.log(path.dirname(__filename));
// Prints: /Users/mjr

    __filename

<string> The file name of the current module. This is the current module file's absolute path with symlinks resolved. For a main program this is not necessarily the same as the file name used in the command line. See __dirname for the directory name of the current module. Examples: Running node example.js from /Users/mjr

console.log(__filename);
// Prints: /Users/mjr/example.js
console.log(__dirname);
// Prints: /Users/mjr

Given two modules: a and b, where b is a dependency of a and there is a directory structure of:

/Users/mjr/app/a.js
/Users/mjr/app/node_modules/b/b.js

References to __filename within b.js will return /Users/mjr/app/node_modules/b/b.js, while references to __filename within a.js will return /Users/mjr/app/a.js.

    exports

    <Object> A reference to the module.exports that is shorter to type. See the section about the exports shortcut for details on when to use exports and when to use module.exports.

    module

    <module> A reference to the current module, see the section about the module object. In particular, module.exports is used for defining what a module exports and makes available through require().

    require(id)

id <string> module name or path Returns: <any> exported module content Used to import modules, JSON, and local files. Modules can be imported from node_modules. Local modules and JSON files can be imported using a relative path (e.g. ./, ./foo, ./bar/baz, ../foo) that will be resolved against the directory named by __dirname (if defined) or the current working directory. The relative paths of POSIX style are resolved in an OS-independent fashion, meaning that the examples above will work on Windows in the same way they would on Unix systems.

// Importing a local module with a path relative to the `__dirname` or current
// working directory. (On Windows, this would resolve to .\path\myLocalModule.)
const myLocalModule = require('./path/myLocalModule');

// Importing a JSON file:
const jsonData = require('./path/filename.json');

// Importing a module from node_modules or a Node.js built-in module:
const crypto = require('node:crypto');
      require.cache
    <Object> Modules are cached in this object when they are required. By deleting a key value from this object, the next require will reload the module. This does not apply to native addons, for which reloading will result in an error. Adding or replacing entries is also possible. This cache is checked before built-in modules and if a name matching a built-in module is added to the cache, only node:-prefixed require calls are going to receive the built-in module. Use with care! const assert = require('node:assert'); const realFs = require('node:fs'); const fakeFs = {}; require.cache.fs = { exports: fakeFs }; assert.strictEqual(require('fs'), fakeFs); assert.strictEqual(require('node:fs'), realFs);
      require.extensions
Deprecated <Object> Instructs require on how to handle certain file extensions. Process files with the extension .sjs as .js: require.extensions['.sjs'] = require.extensions['.js']; Deprecated. In the past, this list has been used to load non-JavaScript modules into Node.js by compiling them on-demand. However, in practice, there are much better ways to do this, such as loading modules via some other Node.js program, or compiling them to JavaScript ahead of time. Avoid using require.extensions. Using it could cause subtle bugs, and resolving the extensions gets slower with each registered extension.
      require.main
<module> | <undefined> The Module object representing the entry script loaded when the Node.js process launched, or undefined if the entry point of the program is not a CommonJS module. See "Accessing the main module". In entry.js script: console.log(require.main);

node entry.js

Module {
  id: '.',
  path: '/absolute/path/to',
  exports: {},
  filename: '/absolute/path/to/entry.js',
  loaded: false,
  children: [],
  paths: [
    '/absolute/path/to/node_modules',
    '/absolute/path/node_modules',
    '/absolute/node_modules',
    '/node_modules'
  ]
}
      require.resolve(request[, options])
request <string> The module path to resolve. options <Object> paths <string[]> Paths to resolve module location from. If present, these paths are used instead of the default resolution paths, with the exception of GLOBAL_FOLDERS like $HOME/.node_modules, which are always included. Each of these paths is used as a starting point for the module resolution algorithm, meaning that the node_modules hierarchy is checked from this location. Returns: <string> Use the internal require() machinery to look up the location of a module, but rather than loading the module, just return the resolved filename. If the module cannot be found, a MODULE_NOT_FOUND error is thrown.
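A quick sketch of resolving without loading (assuming cheerio is installed; the printed path is illustrative and depends on your project layout):

// Look up where require('cheerio') would load from, without executing it
console.log(require.resolve('cheerio'));
// e.g. /home/user/project/node_modules/cheerio/lib/index.js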
require.resolve.paths(request)
    request <string> The module path whose lookup paths are being retrieved. Returns: <string[]> | <null> Returns an array containing the paths searched during resolution of request or null if the request string references a core module, for example http or fs.

     The module object

    <Object> In each module, the module free variable is a reference to the object representing the current module. For convenience, module.exports is also accessible via the exports module-global. module is not actually a global but rather local to each module.

    module.children

    <module[]> The module objects required for the first time by this one.

    module.exports

<Object> The module.exports object is created by the Module system. Sometimes this is not acceptable; many want their module to be an instance of some class. To do this, assign the desired export object to module.exports. Assigning the desired object to exports will simply rebind the local exports variable, which is probably not what is desired. For example, suppose we were making a module called a.js:

const EventEmitter = require('node:events');

module.exports = new EventEmitter();

// Do some work, and after some time emit
// the 'ready' event from the module itself.
setTimeout(() => {
  module.exports.emit('ready');
}, 1000);

Then in another file we could do:

const a = require('./a');
a.on('ready', () => {
  console.log('module "a" is ready');
});

Assignment to module.exports must be done immediately. It cannot be done in any callbacks. This does not work:

x.js:
setTimeout(() => {
  module.exports = { a: 'hello' };
}, 0);

y.js:
const x = require('./x');
console.log(x.a);
      exports shortcut
The exports variable is available within a module's file-level scope, and is assigned the value of module.exports before the module is evaluated. It allows a shortcut, so that module.exports.f = ... can be written more succinctly as exports.f = .... However, be aware that like any variable, if a new value is assigned to exports, it is no longer bound to module.exports:

module.exports.hello = true; // Exported from require of module
exports = { hello: false };  // Not exported, only available in the module

When the module.exports property is being completely replaced by a new object, it is common to also reassign exports:

module.exports = exports = function Constructor() {
  // ... etc.
};

To illustrate the behavior, imagine this hypothetical implementation of require(), which is quite similar to what is actually done by require():

function require(/* ... */) {
  const module = { exports: {} };
  ((module, exports) => {
    // Module code here. In this example, define a function.
    function someFunc() {}
    exports = someFunc;
    // At this point, exports is no longer a shortcut to module.exports, and
    // this module will still export an empty default object.
    module.exports = someFunc;
    // At this point, the module will now export someFunc, instead of the
    // default object.
  })(module, module.exports);
  return module.exports;
}

    module.filename

    <string> The fully resolved filename of the module.

    module.id

    <string> The identifier for the module. Typically this is the fully resolved filename.

    module.isPreloading

    Type: <boolean> true if the module is running during the Node.js preload phase.

    module.loaded

    <boolean> Whether or not the module is done loading, or is in the process of loading.

    module.parent

Deprecated: Please use require.main and module.children instead. <module> | <null> | <undefined> The module that first required this one, or null if the current module is the entry point of the current process, or undefined if the module was loaded by something that is not a CommonJS module (e.g. REPL or import).

    module.path

    <string> The directory name of the module. This is usually the same as the path.dirname() of the module.id.

    module.paths

    <string[]> The search paths for the module.

    module.require(id)

    id <string> Returns: <any> exported module content The module.require() method provides a way to load a module as if require() was called from the original module. In order to do this, it is necessary to get a reference to the module object. Since require() returns the module.exports, and the module is typically only available within a specific module's code, it must be explicitly exported in order to be used.
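Since module.require() is only useful once you have a module object in hand, here is a small two-file sketch of our own showing the "must be explicitly exported" point:

loader.js:
// Expose this module's own require(); ids resolve relative to loader.js
module.exports = { load: (id) => module.require(id) };

main.js:
const { load } = require('./loader');
const path = load('node:path'); // loaded as if required from loader.js
console.log(path.sep);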

     The Module object

This section was moved to Modules: module core module.
module.builtinModules
module.createRequire(filename)
module.syncBuiltinESMExports()

     Source map v3 support

This section was moved to Modules: module core module.
module.findSourceMap(path)
Class: module.SourceMap
new SourceMap(payload)
sourceMap.payload
sourceMap.findEntry(lineNumber, columnNumber)

    change node.js's console font color

    cli-color

    Usage:

const clc = require('cli-color');

console.log(clc.red('Text in red'));

Styles can be mixed:

console.log(clc.red.bgWhite.underline("Underlined red text on white background."));

Styled text can be mixed with unstyled:

console.log(clc.red("red") + " plain " + clc.blue("blue"));

Styled text can be nested:

console.log(clc.red("red " + clc.blue("blue") + " red"));

The best way is to predefine the needed stylings and then use them:

var error = clc.red.bold;
var warn = clc.yellow;
var notice = clc.blue;

console.log(error("Error!"));
console.log(warn("Warning"));
console.log(notice("Notice"));

Note: No colors or styles are output when the NO_COLOR env var is set. Styles Styles will display correctly if the font used in your console supports them: bold italic underline blink inverse strike Colors
Foreground  Background
black       bgBlack
red         bgRed
green       bgGreen
yellow      bgYellow
blue        bgBlue
magenta     bgMagenta
cyan        bgCyan
white       bgWhite

    create reminder server

follow these steps: Set up a new Node.js project: Create a new directory for your project, navigate to it in a terminal, and run the command npm init to initialize a new Node.js project. Follow the prompts to set up your project. Install dependencies: Install the necessary dependencies for your server. In this case, you'll need express to create the server and handle HTTP requests, and body-parser to parse incoming request bodies. Run the following command to install these dependencies: npm install express body-parser Create the server: Create a new JavaScript file, such as server.js, and require the necessary dependencies:

const express = require('express');
const bodyParser = require('body-parser');

const app = express();
const port = 3000; // Choose the desired port number

app.use(bodyParser.urlencoded({ extended: false }));
app.use(bodyParser.json());

// Start the server
app.listen(port, () => {
  console.log(`Server is running on port ${port}`);
});

Define routes: Add routes to handle different API endpoints. For example, you can have a route to create a new reminder and a route to fetch all reminders:

// Create a new reminder
app.post('/reminders', (req, res) => {
  const { text, date } = req.body;
  // Save the reminder to a database or perform necessary actions
  console.log(`New reminder created: ${text} on ${date}`);
  res.sendStatus(201); // Send a success status code
});

// Get all reminders
app.get('/reminders', (req, res) => {
  // Retrieve reminders from the database or any other storage
  const reminders = [
    { text: 'Reminder 1', date: '2023-11-22' },
    { text: 'Reminder 2', date: '2023-11-23' }
  ];
  res.json(reminders); // Send the reminders as JSON response
});

Handle client requests: The server is now ready to receive requests from clients. You can send HTTP requests to the server using a library like axios from your client application. For example, in a separate JavaScript file:

const axios = require('axios');

// Create a new reminder
axios.post('http://localhost:3000/reminders', {
  text: 'Meeting',
  date: '2023-11-24'
})
  .then(response => {
    console.log('Reminder created successfully');
  })
  .catch(error => {
    console.error('Error creating reminder:', error);
  });

// Get all reminders
axios.get('http://localhost:3000/reminders')
  .then(response => {
    const reminders = response.data;
    console.log('All reminders:', reminders);
  })
  .catch(error => {
    console.error('Error retrieving reminders:', error);
  });

Remember to run your Node.js server by executing node server.js in the terminal. Now you have a basic reminder server that can handle client requests for creating reminders and fetching all reminders. Feel free to enhance the server with additional functionality and error handling as per your requirements. Here's an example of how to connect to the reminder server through an HTML browser page. Create an HTML file: Create a new HTML file, such as index.html, and add the following content:

<!DOCTYPE html>
<html>
<head>
  <title>Reminder App</title>
</head>
<body>
  <h1>Reminder App</h1>
  <form id="reminderForm">
    <label for="reminderText">Reminder Text:</label>
    <input type="text" id="reminderText" required>
    <label for="reminderDate">Reminder Date:</label>
    <input type="date" id="reminderDate" required>
    <button type="submit">Create Reminder</button>
  </form>
  <ul id="reminderList"></ul>
  <script src="script.js"></script>
</body>
</html>

Create a JavaScript file: Create a new JavaScript file, such as script.js, in the same directory as the HTML file.
This file will contain the client-side JavaScript code to interact with the server. Write JavaScript code: In the script.js file, add the following JavaScript code to handle form submission and retrieve reminders from the server:

document.addEventListener('DOMContentLoaded', () => {
  const form = document.getElementById('reminderForm');
  const reminderList = document.getElementById('reminderList');

  form.addEventListener('submit', (event) => {
    event.preventDefault();
    const text = document.getElementById('reminderText').value;
    const date = document.getElementById('reminderDate').value;
    createReminder(text, date);
  });

  getReminders();

  function createReminder(text, date) {
    fetch('http://localhost:3000/reminders', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text, date })
    })
      .then(response => {
        if (response.ok) {
          console.log('Reminder created successfully');
          getReminders(); // Refresh the reminder list
        } else {
          throw new Error('Error creating reminder');
        }
      })
      .catch(error => {
        console.error('Error creating reminder:', error);
      });
  }

  function getReminders() {
    fetch('http://localhost:3000/reminders')
      .then(response => response.json())
      .then(reminders => {
        reminderList.innerHTML = ''; // Clear the existing list
        reminders.forEach(reminder => {
          const li = document.createElement('li');
          li.textContent = `${reminder.text} - ${reminder.date}`;
          reminderList.appendChild(li);
        });
      })
      .catch(error => {
        console.error('Error retrieving reminders:', error);
      });
  }
});

Start the reminder server: Make sure your Node.js reminder server is running by executing node server.js in the terminal. Test the application: Open the HTML file (index.html) in a web browser. You should see a form to create reminders and a list to display existing reminders. When you submit the form, it will send a request to the server to create a new reminder. The list will update to display all the reminders fetched from the server. Make sure the server is running on the same port specified in the JavaScript code (http://localhost:3000/reminders). Adjust the URL if your server is running on a different port. With this setup, you can interact with the reminder server through the HTML browser page, creating reminders and displaying them in real time.

    How To Code in Node.js Tutorials

    Write and Run Your First Program in Node.js

    How To Write and Run Your First Program in Node.js You’ll be introduced to a few Node-specific concepts and build your way up to create a program that helps users inspect environment variables on their system.

    Use the Node.js REPL

How To Use the Node.js REPL The Node.js Read-Eval-Print-Loop (REPL) is an interactive shell that processes Node.js expressions. The shell reads JavaScript code the user enters, evaluates the result of interpreting the line of code, prints the result to the user, and loops until the user signals to quit. The REPL is bundled with every Node.js installation and allows you to quickly test and explore JavaScript code within the Node environment without having to store it in a file.

    Use Node.js Modules with npm and package.json

    How To Use Node.js Modules with npm and package.json The Node.js Package Manager (npm) is the default and most popular package manager in the Node.js ecosystem, and is primarily used to install and manage external modules in a Node.js project. In this tutorial, you will manage packages with npm, first keeping track of modules with the package.json file, and then using the npm CLI tool to list your package dependencies, update your packages, uninstall your packages, and perform an audit to find security flaws in your packages.

    Create a Node.js Module

    How To Create a Node.js Module In this tutorial, you will create a Node.js module organized with npm that suggests what color web developers should use in their designs. You will develop the module by storing the colors as an array, and providing a function to retrieve one randomly. Afterwards, you will run through various ways of exporting and importing a module into a Node.js application.

    Write Asynchronous Code in Node.js

    How To Write Asynchronous Code in Node.js With asynchronous programming, JavaScript and Node.js developers can execute other code while waiting for activities like network requests to finish. This can make writing API calls much more efficient. In this tutorial, you will learn how JavaScript manages asynchronous tasks with help from the Event Loop. You will then create a program that uses asynchronous programming written in three ways: with callbacks, promises, and the async/await keywords.

    Test a Node.js Module with Mocha and Assert

    How To Test a Node.js Module with Mocha and Assert Testing is an integral part of software development. With the right test setup, this process can be automated, saving a lot of time. In this article, you’ll write tests for a Node.js TODO list module. You will set up and use the Mocha test framework to structure a series of integration tests. Then you’ll use the Node.js assert module to create the tests themselves. Finally, you will try out testing with asynchronous code, and use hooks to prepare your test fixtures and environments.

    Create a Web Server in Node.js with the HTTP Module

    How To Create a Web Server in Node.js with the HTTP Module Node.js allows developers to use JavaScript to write back-end code, even though traditionally it was used in the browser to write front-end code. Having both the frontend and backend together like this reduces the effort it takes to make a web server. In this tutorial, you will learn how to build web servers using the http module that’s included in Node.js. You will build web servers that can return JSON data, CSV files, and HTML web pages.

    Using Buffers in Node.js

Using Buffers in Node.js A buffer is a space in memory (typically RAM) that stores binary data. In Node.js, we can access these spaces of memory with the built-in Buffer class. Buffers are useful when using JavaScript to interact with binary data, usually at lower networking levels. In this tutorial, you will use the Node.js REPL to create buffers, read from buffers, write to and copy from buffers, and use buffers to convert between binary data and data encoded with ASCII and UTF-8.

    Using Event Emitters in Node.js

    Using Event Emitters in Node.js Event emitters are objects in Node.js that trigger an event by sending a message to signal that an action was completed. In this article, you will create an event listener for a TicketManager JavaScript class that allows a user to buy tickets. You will set up listeners for the buy event, which will trigger every time a ticket is bought. This process will also teach you how to manage erroneous events from the emitter and how to manage event subscribers.
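As a minimal sketch of that pattern (the class body here is illustrative, not the tutorial's full code):

const EventEmitter = require('events');

// A class that emits a "buy" event whenever a ticket is bought
class TicketManager extends EventEmitter {
  buy(email, price) {
    this.emit('buy', email, price, Date.now());
  }
}

const ticketManager = new TicketManager();
ticketManager.on('buy', (email, price) => {
  console.log(`Ticket sold to ${email} for ${price}`);
});
ticketManager.buy('test@example.com', 20);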

    Debug Node.js with the Built-In Debugger and Chrome DevTools

    How To Debug Node.js with the Built-In Debugger and Chrome DevTools In this article, you will use a debugger to debug some sample Node.js applications. You will first debug code using the built-in Node.js debugger tool, setting up watchers and breakpoints so you can find the root cause of a bug. You will then use Google Chrome DevTools as a Graphical User Interface (GUI) alternative to the command line Node.js debugger.
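For reference, the relevant commands look like this (app.js is a placeholder script name):

node inspect app.js     # launch the built-in command-line debugger
# at the debug> prompt: cont, next, step to control execution,
# watch('myVariable') to add a watcher,
# setBreakpoint(12) or sb(12) to break at line 12

node --inspect app.js   # expose the debugger so Chrome DevTools can attach
                        # (open chrome://inspect in Chrome)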

    Launch Child Processes in Node.js

    How To Launch Child Processes in Node.js Since Node.js instances create a single process with a single thread, JavaScript operations that take a long time to run can sometimes block the execution of other code. A key strategy to work around this problem is to launch a child process to run multiple processes concurrently. In this tutorial, you will use the child_process module to create child processes while executing a series of sample Node.js applications.
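For instance, a minimal sketch using the module's exec() function (the shell command is illustrative):

const { exec } = require('child_process');

// Run a shell command in a child process; the callback receives its output
exec('ls -lh', (error, stdout, stderr) => {
  if (error) {
    console.error(`Command failed: ${error.message}`);
    return;
  }
  console.log(stdout);
});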

    Work with Files using the fs Module in Node.js

    How To Work with Files using the fs Module in Node.js With Node.js, you can use JavaScript to programmatically manipulate files with the built-in fs module. The name is short for “file system,” and the module contains all the functions you need to read, write, and delete files on the local machine.

    Create an HTTP Client with Core HTTP in Node.js

    How To Create an HTTP Client with Core HTTP in Node.js It’s common for a modern JavaScript application to communicate with other servers to accomplish a task. In this article, you will use the https Node.js module to make HTTP requests to a web API, including GET, POST, PUT, and DELETE requests.
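As a minimal sketch of the pattern (the URL is a placeholder, not necessarily the API used in the article):

const https = require('https');

// Send a GET request and accumulate the response body
https.get('https://jsonplaceholder.typicode.com/users?_limit=2', (res) => {
  let data = '';
  res.on('data', chunk => { data += chunk; });      // body arrives in chunks
  res.on('end', () => console.log(JSON.parse(data))); // parse when complete
}).on('error', err => console.error(err.message));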

    Work with Files using the fs Module

    Introduction

A back-end program or command line interface (CLI) tool might need to write downloaded data to a file in order to save it, or a data-intensive application may need to export to JSON, CSV, or Excel formats. These programs would need to communicate with the filesystem of the operating system on which they are running. With Node.js, you can programmatically manipulate files with the built-in fs module. The name is short for “file system,” and the module contains all the functions you need to read, write, and delete files on the local machine. This unique aspect of Node.js makes JavaScript a useful language for back-end and CLI tool programming. The fs module supports interacting with files synchronously, asynchronously, or via streams; this tutorial will focus on how to use the asynchronous, Promise-based API, the most commonly used method for Node.js developers.

    Prerequisites

    This article uses JavaScript Promises to work with files, particularly with the async/await syntax. If you’re not familiar with Promises, async/await syntax, or asynchronous programming, check How To Write Asynchronous Code in Node.js.

    Step 1 — Reading Files with readFile()

In this step, you’ll write a program to read files in Node.js. To do this, you’ll need to import the fs module, a standard Node.js module for working with files, and then use the module’s readFile() function. Your program will read the file, store its contents in a variable, then log its contents to the console. The first step will be to set up the coding environment for this activity and the ones in the later sections. Create a folder to store your code. In your terminal, make a folder called node-files: mkdir node-files Change your working directory to the newly created folder with the cd command: cd node-files In this folder, you’ll create two files. The first file will be a new file with content that your program will read later. The second file will be the Node.js module that reads the file. Create the file greetings.txt with the following command: echo "hello, hola, bonjour, hallo" > greetings.txt The echo command prints its string argument to the terminal. You use the > symbol to redirect echo’s output to a new file, greetings.txt. Now, create and open readFile.js in your text editor of choice. This tutorial uses nano, a terminal text editor. You can open this file with nano like this: nano readFile.js The code for this file can be broken up into three sections. First, you need to import the Node.js module that allows your program to work with files. In your text editor, type this code: node-files/readFile.js const fs = require('fs').promises; As mentioned earlier, you use the fs module to interact with the filesystem. Notice, though, that you are importing the .promises part of the module. When the fs module was first created, the primary way to write asynchronous code in Node.js was through callbacks. As promises grew in popularity, the Node.js team worked to support them in the fs module out of the box. In Node.js version 10, they created a promises object in the fs module that uses promises, while the main fs module continues to expose functions that use callbacks. In this program, you are importing the promise version of the module. Once the module is imported, you can create an asynchronous function to read the file. Asynchronous functions begin with the async keyword. With an asynchronous function, you can resolve promises using the await keyword, instead of chaining the promise with the .then() method. Create a new function readFile() that accepts one argument, a string called filePath. Your readFile() function will use the fs module to load the file into a variable using async/await syntax. Enter the following highlighted code: node-files/readFile.js const fs = require('fs').promises; async function readFile(filePath) { try { const data = await fs.readFile(filePath); console.log(data.toString()); } catch (error) { console.error(`Got an error trying to read the file: ${error.message}`); } } You define the function with the async keyword so you can later use the accompanying await keyword. To capture errors in your asynchronous file reading operation, you enclose the call to fs.readFile() with a try...catch block. Within the try section, you load a file into a data variable with the fs.readFile() function. The only required argument for that function is the file path, which is given as a string. The fs.readFile() function returns a buffer object by default. A buffer object can store any kind of file type. When you log the contents of the file, you convert those bytes into text by using the toString() method of the buffer object.
If an error is caught, typically if the file is not found or the program does not have permission to read the file, you log the error you received in the console. Finally, call the function on the greetings.txt file with the following highlighted line: node-files/readFile.js const fs = require('fs').promises; async function readFile(filePath) { try { const data = await fs.readFile(filePath); console.log(data.toString()); } catch (error) { console.error(`Got an error trying to read the file: ${error.message}`); } } readFile('greetings.txt'); Be sure to save your contents. With nano, you can save and exit by pressing CTRL+X. Your program will now read the greetings.txt file you created earlier and log its contents to the terminal. Confirm this by executing your module with node: node readFile.js You will receive the following output: Outputhello, hola, bonjour, hallo You’ve now read a file with the fs module’s readFile() function using the async/await syntax. Note: In some earlier versions of Node.js, you will receive the following warning when using the fs module: (node:13085) ExperimentalWarning: The fs.promises API is experimental The promises object of the fs module was introduced in Node.js version 10, so some earlier versions still call the module experimental. This warning was removed when the API became stable in version 12.6. Now that you’ve read a file with the fs module, you will next create a file and write text to it.

    Step 2 — Writing Files with writeFile()

    In this step, you will write files with the writeFile() function of the fs module. You will create a CSV file in Node.js that keeps track of a grocery bill. The first time you write the file, you will create the file and add the headers. The second time, you will append data to the file. Open a new file in your text editor: nano writeFile.js Begin your code by importing the fs module: node-files/writeFile.js const fs = require('fs').promises; You will continue to use async/await syntax as you create two functions. The first function will be to make the CSV file. The second function will be to add data to the CSV file. In your text editor, enter the following highlighted code: node-files/writeFile.js const fs = require('fs').promises; async function openFile() { try { const csvHeaders = 'name,quantity,price' await fs.writeFile('groceries.csv', csvHeaders); } catch (error) { console.error(`Got an error trying to write to a file: ${error.message}`); } } This asynchronous function first creates a csvHeaders variable that contains the column headings of your CSV file. You then use the writeFile() function of the fs module to create a file and write data to it. The first argument is the file path. As you provided just the file name, Node.js will create the file in the same directory that you’re executing the code in. The second argument is the data you are writing, in this case the csvHeaders variable. Next, create a new function to add items to your grocery list. Add the following highlighted function in your text editor: node-files/writeFile.js const fs = require('fs').promises; async function openFile() { try { const csvHeaders = 'name,quantity,price' await fs.writeFile('groceries.csv', csvHeaders); } catch (error) { console.error(`Got an error trying to write to a file: ${error.message}`); } } async function addGroceryItem(name, quantity, price) { try { const csvLine = `\n${name},${quantity},${price}` await fs.writeFile('groceries.csv', csvLine, { flag: 'a' }); } catch (error) { console.error(`Got an error trying to write to a file: ${error.message}`); } } The asynchronous addGroceryItem() function accepts three arguments: the name of the grocery item, the amount you are buying, and the price per unit. These arguments are used with template literal syntax to form the csvLine variable, which is the data you are writing to the file. You then use the writeFile() method as you did in the openFile() function. However, this time you have a third argument: a JavaScript object. This object has a flag key with the value a. Flags tell Node.js how to interact with the file on the system. By using the flag a, you are telling Node.js to append to the file, not overwrite it. If you don’t specify a flag, it defaults to w, which creates a new file if none exists or overwrites a file if it already exists. You can learn more about filesystem flags in the Node.js documentation. To complete your script, use these functions. Add the following highlighted lines at the end of the file: node-files/writeFile.js async function addGroceryItem(name, quantity, price) { try { const csvLine = `\n${name},${quantity},${price}` await fs.writeFile('groceries.csv', csvLine, { flag: 'a' }); } catch (error) { console.error(`Got an error trying to write to a file: ${error.message}`); } } (async function () { await openFile(); await addGroceryItem('eggs', 12, 1.50); await addGroceryItem('nutella', 1, 4); })(); To call the functions, you first create a wrapper function with async function. 
Since the await keyword cannot be used from the global scope as of the writing of this tutorial, you must wrap the asynchronous functions in an async function. Notice that this function is anonymous, meaning it has no name to identify it. Your openFile() and addGroceryItem() functions are asynchronous functions. Without enclosing these calls in another function, you cannot guarantee the order in which they execute, so the file’s contents could be written out of order. The wrapper you created is defined with the async keyword. Within that function you order the function calls using the await keyword. Finally, the async function definition is enclosed in parentheses. These tell JavaScript that the code inside them is a function expression. The parentheses at the end of the function and before the semicolon are used to invoke the function immediately. This is called an Immediately-Invoked Function Expression (IIFE). By using an IIFE with an anonymous function, you can test that your code produces a CSV file with three lines: the column headers, a line for eggs, and the last line for nutella. Save and exit nano with CTRL+X. Now, run your code with the node command: node writeFile.js There will be no output. However, a new file will exist in your current directory. Use the cat command to display the contents of groceries.csv: cat groceries.csv You will receive the following output: node-files/groceries.csv name,quantity,price eggs,12,1.5 nutella,1,4 Your call to openFile() created a new file and added the column headings for your CSV. The subsequent calls to addGroceryItem() then added your two lines of data. With the writeFile() function, you can create and edit files. Next, you will delete files, a common operation when you have temporary files or need to make space on a hard drive.

    Step 3 — Deleting Files with unlink()

    In this step, you will delete files with the unlink() function in the fs module. You will write a Node.js script to delete the groceries.csv file that you created in the last section. In your terminal, create a new file for this Node.js module: nano deleteFile.js Now you will write code that creates an asynchronous deleteFile() function. That function will accept a file path as an argument, passing it to the unlink() function to remove it from your filesystem. In your text editor, write the following code: node-files/deleteFile.js const fs = require('fs').promises; async function deleteFile(filePath) { try { await fs.unlink(filePath); console.log(`Deleted ${filePath}`); } catch (error) { console.error(`Got an error trying to delete the file: ${error.message}`); } } deleteFile('groceries.csv'); The unlink() function accepts one argument: the file path of the file you want to be deleted. Warning: When you delete the file with the unlink() function, it is not sent to your recycle bin or trash can but permanently removed from your filesystem. This action is not reversible, so please be certain that you want to remove the file before executing your code. Exit nano, ensuring that you save the contents of the file by entering CTRL+X. Now, execute the program. Run the following command in your terminal: node deleteFile.js You will receive the following output: OutputDeleted groceries.csv To confirm that the file no longer exists, use the ls command in your current directory: ls This command will display these files: OutputdeleteFile.js greetings.txt readFile.js writeFile.js You’ve now confirmed that your file was deleted with the unlink() function. So far you’ve learned how to read, write, edit, and delete files. The following section uses a function to move files to different folders. After learning that function, you will be able to do the most critical file management tasks in Node.js.

    Step 4 — Moving Files with rename()

    Folders are used to organize files, so being able to programmatically move files from one folder to another makes file management easier. You can move files in Node.js with the rename() function. In this step, you’ll move a copy of the greetings.txt file into a new folder. Before you can code your Node.js module, you need to set a few things up. Begin by creating a folder that you’ll be moving your file into. In your terminal, create a test-data folder in your current directory: mkdir test-data Now, copy the greetings.txt file that was used in the first step using the cp command: cp greetings.txt greetings-2.txt Finish the setup by opening a JavaScript file to contain your code: nano moveFile.js In your Node.js module, you’ll create a function called moveFile() that calls the rename() function. When using the rename() function, you need to provide the file path of the original file and the path of the destination location. For this example, you’ll use a moveFile() function to move the greetings-2.txt file into the test-data folder. You’ll also change its name to salutations.txt. Enter the following code in your open text editor: node-files/moveFile.js const fs = require('fs').promises; async function moveFile(source, destination) { try { await fs.rename(source, destination); console.log(`Moved file from ${source} to ${destination}`); } catch (error) { console.error(`Got an error trying to move the file: ${error.message}`); } } moveFile('greetings-2.txt', 'test-data/salutations.txt'); As mentioned earlier, the rename() function takes two arguments: the source and destination file paths. This function can move files to other folders, rename a file in its current directory, or move and rename at the same time. In your code, you are moving and renaming your file. Save and exit nano by pressing CTRL+X. Next, execute this program with node. Enter this command to run the program: node moveFile.js You will receive this output: OutputMoved file from greetings-2.txt to test-data/salutations.txt To confirm that the file no longer exists in your current directory, you can use the ls command: ls This command will display these files and folder: OutputdeleteFile.js greetings.txt moveFile.js readFile.js test-data writeFile.js You can now use ls to list the files in the test-data subfolder: ls test-data Your moved file will appear in the output: Outputsalutations.txt You have now used the rename() function to move a file from your current directory into a subfolder. You also renamed the file with the same function call.

    Conclusion

You first loaded the contents of a file with readFile(). You then created new files and appended data to an existing file with the writeFile() function. You permanently removed a file with the unlink() function, and then moved and renamed a file with rename().

    How To Write Asynchronous Code in Node.js

    Introduction

    For many programs in JavaScript, code is executed as the developer writes it—line by line. This is called synchronous execution, because the lines are executed one after the other, in the order they were written. However, not every instruction you give to the computer needs to be attended to immediately. For example, if you send a network request, the process executing your code will have to wait for the data to return before it can work on it. In this case, time would be wasted if it did not execute other code while waiting for the network request to be completed. To solve this problem, developers use asynchronous programming, in which lines of code are executed in a different order than the one in which they were written. With asynchronous programming, we can execute other code while we wait for long activities like network requests to finish. JavaScript code is executed on a single thread within a computer process. Its code is processed synchronously on this thread, with only one instruction run at a time. Therefore, if we were to do a long-running task on this thread, all of the remaining code is blocked until the task is complete. By leveraging JavaScript’s asynchronous programming features, we can offload long-running tasks to a background thread to avoid this problem. When the task is complete, the code we need to process the task’s data is put back on the main single thread. In this tutorial, you will learn how JavaScript manages asynchronous tasks with help from the Event Loop, which is a JavaScript construct that completes a new task while waiting for another. You will then create a program that uses asynchronous programming to request a list of movies from a Studio Ghibli API and save the data to a CSV file. The asynchronous code will be written in three ways: callbacks, promises, and with the async/await keywords. Note: As of this writing, asynchronous programming is no longer done using only callbacks, but learning this obsolete method can provide great context as to why the JavaScript community now uses promises. The async/await keywords enable us to use promises in a less verbose way, and are thus the standard way to do asynchronous programming in JavaScript at the time of writing this article.

    Prerequisites

    Node.js installed on your development machine. This tutorial uses version 10.17.0. To install this on macOS or Ubuntu 18.04, follow the steps in How to Install Node.js and Create a Local Development Environment on macOS or the Installing Using a PPA section of How To Install Node.js on Ubuntu 18.04. You will also need to be familiar with installing packages in your project. Get up to speed by reading our guide on How To Use Node.js Modules with npm and package.json. It is important that you’re comfortable creating and executing functions in JavaScript before learning how to use them asynchronously. If you need an introduction or refresher, you can read our guide on How To Define Functions in JavaScript

    The Event Loop

Let’s begin by studying the internal workings of JavaScript function execution. Understanding how this behaves will allow you to write asynchronous code more deliberately, and will help you with troubleshooting code in the future. As the JavaScript interpreter executes the code, every function that is called is added to JavaScript’s call stack. The call stack is a stack—a list-like data structure where items can only be added to the top, and removed from the top. Stacks follow the “Last in, first out” or LIFO principle. If you add two items on the stack, the most recently added item is removed first. Let’s illustrate with an example using the call stack. If JavaScript encounters a function functionA() being called, it is added to the call stack. If that function functionA() calls another function functionB(), then functionB() is added to the top of the call stack. As JavaScript completes the execution of a function, it is removed from the call stack. Therefore, JavaScript will execute functionB() first, remove it from the stack when complete, and then finish the execution of functionA() and remove it from the call stack. This is why inner functions are always executed before their outer functions. When JavaScript encounters an asynchronous operation, like writing to a file, it adds it to a table in its memory. This table stores the operation, the condition for it to be completed, and the function to be called when it’s completed. As the operation completes, JavaScript adds the associated function to the message queue. A queue is another list-like data structure where items can only be added to the bottom but removed from the top. In the message queue, if two or more asynchronous operations are ready for their functions to be executed, the asynchronous operation that was completed first will have its function marked for execution first. Functions in the message queue are waiting to be added to the call stack. The event loop is a perpetual process that checks if the call stack is empty. If it is, then the first item in the message queue is moved to the call stack. JavaScript prioritizes the synchronous function calls it interprets in the code over functions waiting in the message queue: queued functions run only once the call stack is empty. The combined effect of the call stack, message queue, and event loop allows JavaScript code to be processed while managing asynchronous activities. Now that you have a high-level understanding of the event loop, you know how the asynchronous code you write will be executed. With this knowledge, you can now create asynchronous code with three different approaches: callbacks, promises, and async/await.
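To see the event loop in action, consider this short script (a minimal sketch; the log text is illustrative). Even with a delay of 0 milliseconds, the setTimeout callback is placed in the message queue and runs only after the call stack has emptied:

console.log('first');

setTimeout(() => {
  // This callback waits in the message queue until the stack is empty
  console.log('third: pulled from the message queue by the event loop');
}, 0);

console.log('second');

Running this script prints first, then second, and only then the setTimeout message.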

    Asynchronous Programming with Callbacks

    A callback function is one that is passed as an argument to another function, and then executed when the other function is finished. We use callbacks to ensure that code is executed only after an asynchronous operation is completed. For a long time, callbacks were the most common mechanism for writing asynchronous code, but now they have largely become obsolete because they can make code confusing to read. In this step, you’ll write an example of asynchronous code using callbacks so that you can use it as a baseline to see the increased efficiency of other strategies. There are many ways to use callback functions in another function. Generally, they take this structure: function asynchronousFunction([ Function Arguments ], [ Callback Function ]) { [ Action ] } While it is not syntactically required by JavaScript or Node.js to have the callback function as the last argument of the outer function, it is a common practice that makes callbacks easier to identify. It’s also common for JavaScript developers to use an anonymous function as a callback. Anonymous functions are those created without a name. It’s usually much more readable when a function is defined at the end of the argument list. To demonstrate callbacks, let’s create a Node.js module that writes a list of Studio Ghibli movies to a file. First, create a folder that will store our JavaScript file and its output: mkdir ghibliMovies Then enter that folder: cd ghibliMovies We will start by making an HTTP request to the Studio Ghibli API, which our callback function will log the results of. To do this, we will install a library that allows us to access the data of an HTTP response in a callback. In your terminal, initialize npm so we can have a reference for our packages later: npm init -y Then, install the request library: npm i request --save Now open a new file called callbackMovies.js in a text editor like nano: nano callbackMovies.js In your text editor, enter the following code. Let’s begin by sending an HTTP request with the request module: callbackMovies.js const request = require('request'); request('https://ghibliapi.herokuapp.com/films'); In the first line, we load the request module that was installed via npm. The module returns a function that can make HTTP requests; we then save that function in the request constant. We then make the HTTP request using the request() function. Let’s now print the data from the HTTP request to the console by adding the highlighted changes: callbackMovies.js const request = require('request'); request('https://ghibliapi.herokuapp.com/films', (error, response, body) => { if (error) { console.error(`Could not send request to API: ${error.message}`); return; } if (response.statusCode != 200) { console.error(`Expected status code 200 but received ${response.statusCode}.`); return; } console.log('Processing our list of movies'); movies = JSON.parse(body); movies.forEach(movie => { console.log(`${movie['title']}, ${movie['release_date']}`); }); }); When we use the request() function, we give it two parameters: The URL of the website we are trying to request A callback function that handles any errors or successful responses after the request is complete Our callback function has three arguments: error, response, and body. When the HTTP request is complete, the arguments are automatically given values depending on the outcome. If the request failed to send, then error would contain an object, but response and body would be null. 
If it made the request successfully, then the HTTP response is stored in response. If our HTTP response returns data (in this example we get JSON) then the data is set in body. Our callback function first checks to see if we received an error. It’s best practice to check for errors in a callback first so the execution of the callback won’t continue with missing data. In this case, we log the error and end the function’s execution. We then check the status code of the response. Our server may not always be available, and APIs can change, causing once-sensible requests to become incorrect. By checking that the status code is 200, which means the request was “OK”, we can have confidence that our response is what we expect it to be. Finally, we parse the response body to an Array and loop through each movie to log its name and release year. After saving and quitting the file, run this script with: node callbackMovies.js You will get the following output: Output Castle in the Sky, 1986 Grave of the Fireflies, 1988 My Neighbor Totoro, 1988 Kiki's Delivery Service, 1989 Only Yesterday, 1991 Porco Rosso, 1992 Pom Poko, 1994 Whisper of the Heart, 1995 Princess Mononoke, 1997 My Neighbors the Yamadas, 1999 Spirited Away, 2001 The Cat Returns, 2002 Howl's Moving Castle, 2004 Tales from Earthsea, 2006 Ponyo, 2008 Arrietty, 2010 From Up on Poppy Hill, 2011 The Wind Rises, 2013 The Tale of the Princess Kaguya, 2013 When Marnie Was There, 2014 We successfully received a list of Studio Ghibli movies with the year they were released. Now let’s complete this program by writing the movie list we are currently logging into a file. Update the callbackMovies.js file in your text editor to include the following highlighted code, which creates a CSV file with our movie data: callbackMovies.js const request = require('request'); const fs = require('fs'); request('https://ghibliapi.herokuapp.com/films', (error, response, body) => { if (error) { console.error(`Could not send request to API: ${error.message}`); return; } if (response.statusCode != 200) { console.error(`Expected status code 200 but received ${response.statusCode}.`); return; } console.log('Processing our list of movies'); movies = JSON.parse(body); let movieList = ''; movies.forEach(movie => { movieList += `${movie['title']}, ${movie['release_date']}\n`; }); fs.writeFile('callbackMovies.csv', movieList, (error) => { if (error) { console.error(`Could not save the Ghibli movies to a file: ${error}`); return; } console.log('Saved our list of movies to callbackMovies.csv'); }); }); Noting the highlighted changes, we see that we import the fs module. This module is standard in all Node.js installations, and it contains a writeFile() method that can asynchronously write to a file. Instead of logging the data to the console, we now add it to a string variable movieList. We then use writeFile() to save the contents of movieList to a new file—callbackMovies.csv. Finally, we provide a callback to the writeFile() function, which has one argument: error. This allows us to handle cases where we are not able to write to a file, for example when the user running the node process does not have permission to write to the file.
Save the file and run this Node.js program once again with: node callbackMovies.js In your ghibliMovies folder, you will see callbackMovies.csv, which has the following content: callbackMovies.csv Castle in the Sky, 1986 Grave of the Fireflies, 1988 My Neighbor Totoro, 1988 Kiki's Delivery Service, 1989 Only Yesterday, 1991 Porco Rosso, 1992 Pom Poko, 1994 Whisper of the Heart, 1995 Princess Mononoke, 1997 My Neighbors the Yamadas, 1999 Spirited Away, 2001 The Cat Returns, 2002 Howl's Moving Castle, 2004 Tales from Earthsea, 2006 Ponyo, 2008 Arrietty, 2010 From Up on Poppy Hill, 2011 The Wind Rises, 2013 The Tale of the Princess Kaguya, 2013 When Marnie Was There, 2014 It’s important to note that we write to our CSV file in the callback of the HTTP request. Once the code is in the callback function, it will only write to the file after the HTTP request was completed. If we wanted to communicate to a database after we wrote our CSV file, we would make another asynchronous function that would be called in the callback of writeFile(). The more asynchronous code we have, the more callback functions have to be nested. Let’s imagine that we want to execute five asynchronous operations, each one only able to run when another is complete. If we were to code this, we would have something like this: doSomething1(() => { doSomething2(() => { doSomething3(() => { doSomething4(() => { doSomething5(() => { // final action }); }); }); }); }); When nested callbacks have many lines of code to execute, they become substantially more complex and unreadable. As your JavaScript project grows in size and complexity, this effect will become more pronounced, until it is eventually unmanageable. Because of this, developers no longer use callbacks to handle asynchronous operations. To improve the syntax of our asynchronous code, we can use promises instead.

    Using Promises for Concise Asynchronous Programming

A promise is a JavaScript object that will return a value at some point in the future. Asynchronous functions can return promise objects instead of concrete values. If we get a value in the future, we say that the promise was fulfilled. If we get an error in the future, we say that the promise was rejected. Otherwise, the promise is still being worked on in a pending state. Promises generally take the following form: promiseFunction() .then([ Callback Function for Fulfilled Promise ]) .catch([ Callback Function for Rejected Promise ]) As shown in this template, promises also use callback functions. We have a callback function for the then() method, which is executed when a promise is fulfilled. We also have a callback function for the catch() method to handle any errors that come up while the promise is being executed. Let’s get firsthand experience with promises by rewriting our Studio Ghibli program to use promises instead. Axios is a promise-based HTTP client for JavaScript, so let’s go ahead and install it: npm i axios --save Now, with your text editor of choice, create a new file promiseMovies.js: nano promiseMovies.js Our program will make an HTTP request with axios and then use a special promise-based version of fs to save to a new CSV file. Type this code in promiseMovies.js so we can load Axios and send an HTTP request to the movie API: promiseMovies.js const axios = require('axios'); axios.get('https://ghibliapi.herokuapp.com/films'); In the first line we load the axios module, storing the returned function in a constant called axios. We then use the axios.get() method to send an HTTP request to the API. The axios.get() method returns a promise. Let’s chain that promise so we can print the list of Ghibli movies to the console: promiseMovies.js const axios = require('axios'); const fs = require('fs').promises; axios.get('https://ghibliapi.herokuapp.com/films') .then((response) => { console.log('Successfully retrieved our list of movies'); response.data.forEach(movie => { console.log(`${movie['title']}, ${movie['release_date']}`); }); }) Let’s break down what’s happening. After making an HTTP GET request with axios.get(), we use the then() function, which is only executed when the promise is fulfilled. In this case, we print the movies to the screen like we did in the callbacks example. To improve this program, add the highlighted code to write the HTTP data to a file: promiseMovies.js const axios = require('axios'); const fs = require('fs').promises; axios.get('https://ghibliapi.herokuapp.com/films') .then((response) => { console.log('Successfully retrieved our list of movies'); let movieList = ''; response.data.forEach(movie => { movieList += `${movie['title']}, ${movie['release_date']}\n`; }); return fs.writeFile('promiseMovies.csv', movieList); }) .then(() => { console.log('Saved our list of movies to promiseMovies.csv'); }) We additionally import the fs module once again. Note how after the fs import we have .promises. Node.js includes a promise-based version of the callback-based fs library, so backward compatibility is not broken in legacy projects. The first then() function that processes the HTTP request now calls fs.writeFile() instead of printing to the console. Since we imported the promise-based version of fs, our writeFile() function returns another promise. As such, we append another then() function for when the writeFile() promise is fulfilled. A promise can return a new promise, allowing us to execute promises one after the other.
This paves the way for us to perform multiple asynchronous operations. This is called promise chaining, and it is analogous to nesting callbacks. The second then() is only called after we successfully write to the file. Note: In this example, we did not check for the HTTP status code like we did in the callback example. By default, axios does not fulfil its promise if it gets a status code indicating an error. As such, we no longer need to validate it. To complete this program, chain the promise with a catch() function as it is highlighted in the following: promiseMovies.js const axios = require('axios'); const fs = require('fs').promises; axios.get('https://ghibliapi.herokuapp.com/films') .then((response) => { console.log('Successfully retrieved our list of movies'); let movieList = ''; response.data.forEach(movie => { movieList += `${movie['title']}, ${movie['release_date']}\n`; }); return fs.writeFile('promiseMovies.csv', movieList); }) .then(() => { console.log('Saved our list of movies to promiseMovies.csv'); }) .catch((error) => { console.error(`Could not save the Ghibli movies to a file: ${error}`); }); If any promise is not fulfilled in the chain of promises, JavaScript automatically goes to the catch() function if it was defined. That’s why we only have one catch() clause even though we have two asynchronous operations. Let’s confirm that our program produces the same output by running: node promiseMovies.js In your ghibliMovies folder, you will see the promiseMovies.csv file containing: promiseMovies.csv Castle in the Sky, 1986 Grave of the Fireflies, 1988 My Neighbor Totoro, 1988 Kiki's Delivery Service, 1989 Only Yesterday, 1991 Porco Rosso, 1992 Pom Poko, 1994 Whisper of the Heart, 1995 Princess Mononoke, 1997 My Neighbors the Yamadas, 1999 Spirited Away, 2001 The Cat Returns, 2002 Howl's Moving Castle, 2004 Tales from Earthsea, 2006 Ponyo, 2008 Arrietty, 2010 From Up on Poppy Hill, 2011 The Wind Rises, 2013 The Tale of the Princess Kaguya, 2013 When Marnie Was There, 2014 With promises, we can write much more concise code than using only callbacks. The promise chain of callbacks is a cleaner option than nesting callbacks. However, as we make more asynchronous calls, our promise chain becomes longer and harder to maintain. The verbosity of callbacks and promises comes from the need to create functions when we have the result of an asynchronous task. A better experience would be to wait for an asynchronous result and put it in a variable outside the function. That way, we can use the results in the variables without having to make a function. We can achieve this with the async and await keywords.

    Writing JavaScript with async/await

    The async/await keywords provide an alternative syntax when working with promises. Instead of having the result of a promise available in the then() method, the result is returned as a value like in any other function. We define a function with the async keyword to tell JavaScript that it’s an asynchronous function that returns a promise. We use the await keyword to tell JavaScript to return the results of the promise instead of returning the promise itself when it’s fulfilled. In general, async/await usage looks like this: async function() { await [Asynchronous Action] } Let’s see how using async/await can improve our Studio Ghibli program. Use your text editor to create and open a new file asyncAwaitMovies.js: nano asyncAwaitMovies.js In your newly opened JavaScript file, let’s start by importing the same modules we used in our promise example: asyncAwaitMovies.js const axios = require('axios'); const fs = require('fs').promises; The imports are the same as promiseMovies.js because async/await uses promises. Now we use the async keyword to create a function with our asynchronous code: asyncAwaitMovies.js const axios = require('axios'); const fs = require('fs').promises; async function saveMovies() {} We create a new function called saveMovies() but we include async at the beginning of its definition. This is important as we can only use the await keyword in an asynchronous function. Use the await keyword to make an HTTP request that gets the list of movies from the Ghibli API: asyncAwaitMovies.js const axios = require('axios'); const fs = require('fs').promises; async function saveMovies() { let response = await axios.get('https://ghibliapi.herokuapp.com/films'); let movieList = ''; response.data.forEach(movie => { movieList += `${movie['title']}, ${movie['release_date']}\n`; }); } In our saveMovies() function, we make an HTTP request with axios.get() like before. This time, we don’t chain it with a then() function. Instead, we add await before it is called. When JavaScript sees await, it will only execute the remaining code of the function after axios.get() finishes execution and sets the response variable. The other code saves the movie data so we can write to a file. Let’s write the movie data to a file: asyncAwaitMovies.js const axios = require('axios'); const fs = require('fs').promises; async function saveMovies() { let response = await axios.get('https://ghibliapi.herokuapp.com/films'); let movieList = ''; response.data.forEach(movie => { movieList += `${movie['title']}, ${movie['release_date']}\n`; }); await fs.writeFile('asyncAwaitMovies.csv', movieList); } We also use the await keyword when we write to the file with fs.writeFile(). To complete this function, we need to catch errors our promises can throw. Let’s do this by encapsulating our code in a try/catch block: asyncAwaitMovies.js const axios = require('axios'); const fs = require('fs').promises; async function saveMovies() { try { let response = await axios.get('https://ghibliapi.herokuapp.com/films'); let movieList = ''; response.data.forEach(movie => { movieList += `${movie['title']}, ${movie['release_date']}\n`; }); await fs.writeFile('asyncAwaitMovies.csv', movieList); } catch (error) { console.error(`Could not save the Ghibli movies to a file: ${error}`); } } Since promises can fail, we encase our asynchronous code with a try/catch clause. This will capture any errors that are thrown when either the HTTP request or file writing operations fail. 
Finally, let’s call our asynchronous function saveMovies() so it will be executed when we run the program with node asyncAwaitMovies.js const axios = require('axios'); const fs = require('fs').promises; async function saveMovies() { try { let response = await axios.get('https://ghibliapi.herokuapp.com/films'); let movieList = ''; response.data.forEach(movie => { movieList += `${movie['title']}, ${movie['release_date']}\n`; }); await fs.writeFile('asyncAwaitMovies.csv', movieList); } catch (error) { console.error(`Could not save the Ghibli movies to a file: ${error}`); } } saveMovies(); At a glance, this looks like a typical synchronous JavaScript code block. It has fewer functions being passed around, which looks a bit neater. These small tweaks make asynchronous code with async/await easier to maintain. Test this iteration of our program by entering this in your terminal: node asyncAwaitMovies.js In your ghibliMovies folder, a new asyncAwaitMovies.csv file will be created with the following contents: asyncAwaitMovies.csv Castle in the Sky, 1986 Grave of the Fireflies, 1988 My Neighbor Totoro, 1988 Kiki's Delivery Service, 1989 Only Yesterday, 1991 Porco Rosso, 1992 Pom Poko, 1994 Whisper of the Heart, 1995 Princess Mononoke, 1997 My Neighbors the Yamadas, 1999 Spirited Away, 2001 The Cat Returns, 2002 Howl's Moving Castle, 2004 Tales from Earthsea, 2006 Ponyo, 2008 Arrietty, 2010 From Up on Poppy Hill, 2011 The Wind Rises, 2013 The Tale of the Princess Kaguya, 2013 When Marnie Was There, 2014 You have now used the JavaScript features async/await to manage asynchronous code.

    Conclusion

    In this tutorial, you learned how JavaScript handles executing functions and managing asynchronous operations with the event loop. You then wrote programs that created a CSV file after making an HTTP request for movie data using various asynchronous programming techniques. First, you used the obsolete callback-based approach. You then used promises, and finally async/await to make the promise syntax more succinct. With your understanding of asynchronous code with Node.js, you can now develop programs that benefit from asynchronous programming, like those that rely on API calls. Have a look at this list of public APIs. To use them, you will have to make asynchronous HTTP requests like we did in this tutorial. For further study, try building an app that uses these APIs to practice the techniques you learned here.

    How To Build WebSocket Server And Client in NodeJS

    What is WebSocket?

WebSocket is a protocol with client and server implementations: an API for establishing a persistent, two-way connection between a web browser and a server.

    NodeJS and WebSocket

NodeJS is a JavaScript runtime built on Chrome’s V8 JavaScript engine, and with it you can create a WebSocket server in a few minutes. Unlike HTTP servers, WebSocket servers don’t have any routes by default because they are unnecessary: in this protocol, you simply use strings to send and receive data from the client side. A good practice is to send a JSON object serialized to a string, as sketched below.
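A minimal sketch of that practice, assuming ws is an open WebSocket connection (the type and text fields are illustrative):

ws.send(JSON.stringify({ type: 'chat', text: 'Hello!' }));

ws.on('message', data => {
  const message = JSON.parse(data); // parse the string back into an object
  console.log(message.type, message.text);
});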

    Why should I use WebSocket?

WebSockets let the client and server send messages to each other at any time over a single long-lived connection, without the overhead of opening a new HTTP request for each message, and a server can hold many such connections at once. They can also work through proxies.

    Building a WebSocket Server With NodeJS

Create a directory for the WebSocket server to be built with NodeJS: mkdir nodejs-websocket-server cd nodejs-websocket-server Next, run the following command to install the ws library as a dependency: npm install ws This will create a package.json file in your project and install the dependency in the node_modules directory. After installation, create a JavaScript file, for example main.js, and paste the following code for creating a WebSocket server: // Importing the required modules const WebSocketServer = require('ws'); // Creating a new websocket server const wss = new WebSocketServer.Server({ port: 8080 }) // Creating connection using websocket wss.on("connection", ws => { console.log("new client connected"); // sending message to client ws.send('Welcome, you are connected!'); //on message from client ws.on("message", data => { console.log(`Client has sent us: ${data}`) }); // handling what to do when clients disconnects from server ws.on("close", () => { console.log("the client has closed connection!"); }); // handling client connection error ws.onerror = function () { console.log("Some Error occurred") } }); console.log("The WebSocket server is running on port 8080"); This code will create a basic WebSocket server for you. The code is self-explanatory and can be edited as per your needs. For testing it, open up a terminal and type: node main.js

Building A WebSocket Client For NodeJS WebSocket Server

Keep the NodeJS program (WebSocket server) running in the terminal. Now it’s time to test the WebSocket server, and to do so, we need a WebSocket client. A simple way to test any WebSocket server is to use the online WebSocket tester by PieSocket. Simply open the link above and enter ws://localhost:8080 in it to test the WebSocket server you just created. "Upgrade Required" is a reference to the header that is sent when establishing a WebSocket connection between a client (i.e. the browser) and the server. You need a client application that connects to your WebSocket server, which could be a static HTML page. We can also build a WebSocket client in HTML and JavaScript quickly. To create a WebSocket client, start by creating an HTML file (the WebSocket client) and include the following JavaScript code snippet in the file to connect to the server we just launched on our local machine. <!DOCTYPE html> <html> <head> <meta charset="utf-8"> <title>NodeJS WebSocket Server</title> </head> <body> <h2>Hello world</h2> <script> const ws = new WebSocket("ws://localhost:8080"); ws.addEventListener("open", () =>{ console.log("We are connected"); ws.send("How are you?"); ws.send("I am going to close."); ws.close(); }); ws.addEventListener('message', function (event) { console.log(event.data); }); </script> </body> </html> Open up the HTML file in your browser, and you will see in the terminal that you have been connected to the WebSocket server. When you open the HTML file, a WebSocket connection to the server is made, and you can see it in the Network tab of your browser’s developer tools (right click > inspect element > click the Network tab). Check the console logs in developer tools to see the messages from the WebSocket server. For a more advanced tutorial on the application of a WebSocket server, you should see: Create A Real-time Chat Web Application With WebSocket. We can also use a proxy server with WebSocket. To pick up a proxy server’s address, you can paste these two lines of code in your JavaScript file (note that they only read the proxy address from the environment; the WebSocket connection must still be routed through it, for example via an HTTP agent):
var proxy = process.env.http_proxy || 'http://168.63.76.32:3128';
console.log('using proxy server %j', proxy);

    Thread in Operating System



A thread is a single sequence stream within a process. Threads are also called lightweight processes as they possess some of the properties of processes. Each thread belongs to exactly one process. In an operating system that supports multithreading, the process can consist of many threads. But threads can run in parallel only if the machine has more than one CPU; otherwise, the threads must take turns on the single CPU via context switches.

    What is Thread in Operating Systems?


In a process, a thread refers to a single sequential activity being executed. These activities are also known as threads of execution or threads of control. Any operating system process can execute threads, so we can say that a process can have multiple threads.

    Why Do We Need Thread?


Threads run in parallel, improving application performance. Each such thread has its own CPU state and stack, but they share the address space of the process and the environment. Threads can share common data, so they do not need to use inter-process communication. Like processes, threads also have states like ready, executing, blocked, etc. Priority can be assigned to threads just as to processes, and the highest-priority thread is scheduled first. Each thread has its own Thread Control Block (TCB). Like a process, a context switch occurs for the thread, and register contents are saved in the TCB. As threads share the same address space and resources, synchronization is also required for the various activities of the threads.

    Components of Threads


These are the basic components of a thread: stack space, register set, and program counter.

    Types of Thread in Operating System


Threads are of two types, described below: User Level Threads and Kernel Level Threads.

     1. User Level Threads


A User-Level Thread is a type of thread that is not created using system calls. The kernel plays no part in the management of user-level threads, so they can be implemented entirely by the user. Because the kernel is unaware of them, it schedules the containing process as a single unit. Let’s look at the advantages and disadvantages of User-Level Threads.

    Advantages of User-Level Threads


Implementation of a User-Level Thread is easier than that of a Kernel-Level Thread. Context switch time is lower for User-Level Threads. User-Level Threads are more efficient than Kernel-Level Threads. Because a thread consists only of a program counter, register set, and stack space, it has a simple representation.

    Disadvantages of User-Level Threads


There is a lack of coordination between the threads and the kernel. If one thread causes a page fault, the whole process can be blocked.

     2. Kernel Level Threads


A Kernel-Level Thread is a type of thread that the operating system recognizes and manages directly. The kernel maintains its own thread table to keep track of all threads in the system and helps in managing them. Kernel-level threads have somewhat longer context-switching times, since each switch passes through the kernel.

    Advantages of Kernel-Level Threads


The kernel has up-to-date information on all threads. Applications whose threads block frequently are better handled by Kernel-Level Threads, since the kernel can schedule another thread while one is blocked. Whenever a thread requires more processing time, the kernel can allocate more time to it.

    Disadvantages of Kernel-Level threads


A Kernel-Level Thread is slower than a User-Level Thread. Implementation of this type of thread is a little more complex than that of a user-level thread. For more, refer to the Difference Between User-Level Thread and Kernel-Level Thread.

    Difference Between Process and Thread


    The primary difference is that threads within the same process run in a shared memory space, while processes run in separate memory spaces. Threads are not independent of one another like processes are, and as a result, threads share with other threads their code section, data section, and OS resources (like open files and signals). But, like a process, a thread has its own program counter (PC), register set, and stack space. For more, refer to Difference Between Process and Thread.

    What is Multi-Threading?


A thread is also known as a lightweight process. The idea is to achieve parallelism by dividing a process into multiple threads. For example, in a browser, multiple tabs can be different threads. MS Word uses multiple threads: one thread to format the text, another thread to process inputs, etc. More advantages of multithreading are discussed below. Multithreading is a technique used in operating systems to improve the performance and responsiveness of computer systems. Multithreading allows multiple threads (i.e., lightweight processes) to share the same resources of a single process, such as the CPU, memory, and I/O devices. (Figure: single-threaded vs. multi-threaded process.)
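Although this section describes threads at the operating-system level, Node.js itself exposes threads through its built-in worker_threads module; here is a minimal sketch (the message text is illustrative):

const { Worker, isMainThread, parentPort } = require('worker_threads');

if (isMainThread) {
  // Main thread: spawn a worker that runs this same file
  const worker = new Worker(__filename);
  worker.on('message', msg => console.log(`From worker: ${msg}`));
} else {
  // Worker thread: send a message back to the main thread
  parentPort.postMessage('hello from a separate thread');
}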

    Benefits of Thread in Operating System


Responsiveness: If the process is divided into multiple threads, and one thread completes its execution, its output can be returned immediately. Faster context switch: Context switch time between threads is lower compared to the process context switch. Process context switching requires more overhead from the CPU. Effective utilization of multiprocessor system: If we have multiple threads in a single process, then we can schedule multiple threads on multiple processors. This will make process execution faster. Resource sharing: Resources like code, data, and files can be shared among all threads within a process. Note: Stacks and registers can’t be shared among the threads. Each thread has its own stack and registers. Communication: Communication between multiple threads is easier, as the threads share a common address space, whereas processes must follow specific inter-process communication techniques to communicate with each other. Enhanced throughput of the system: If a process is divided into multiple threads, and each thread function is considered as one job, then the number of jobs completed per unit of time is increased, thus increasing the throughput of the system.

    Conclusion


    Threads in operating systems are lightweight processes that improve application speed by executing concurrently within the same process. They share the process’s address space and resources, which allows for more efficient communication and resource utilisation. Threads are classified as either user-level or kernel-level, with each having advantages and drawbacks. Multithreading enhances system response time, context switching speed, resource sharing, and overall throughput. This technique is critical for improving the speed and responsiveness of current computing systems.

    Frequently Asked Questions on Thread in Operating System – FAQs


     What is thread state in operating system?


There are four different states of a thread: new, runnable, blocked, and terminated.

     Why thread is better than process?


Threads require fewer resources than processes to create and switch between, which is why a thread is often preferred over a process.

     Why is multithreading faster?


Even though the computer system’s processor carries out only one instruction at a time, with multithreading the system switches between threads from several applications so quickly that it appears as though the programs are running simultaneously.

     What is deadlock in OS?


    A deadlock is a situation where a set of processes is blocked because each process is holding a resource and waiting for another resource acquired by some other process.

    Create Load Balancing Servers using Node.js



In Node.js, load balancing is a technique used to distribute incoming network traffic across multiple servers so that no single server becomes overwhelmed, improving responsiveness and availability. In this article, we'll explore how to create a load-balancing server using Node.js.

    Why Load Balancing?

Load balancing is essential for:
Improved Performance: Distributes the workload evenly across multiple servers.
High Availability: Ensures service continuity if one server fails.
Scalability: Allows the application to handle increased traffic by adding more servers.

     How to set up load balancing server?

    Using Cluster Module

NodeJS has a built-in Cluster module to take advantage of a multi-core system. Using this module you can launch a NodeJS instance on each core of your machine: the master process listens on a port, accepts client requests, and distributes them across the workers. The following example demonstrates the performance difference with and without the Cluster module.

     Without Cluster Module:

Make sure you have installed the express module using the following command (crypto is built into Node.js, so it does not need to be installed separately): npm install express Example: implementation using the above modules. const { generateKeyPair } = require('crypto'); const app = require('express')(); // API endpoint // Send public key as a response app.get('/key', (req, res) => { generateKeyPair('rsa', { modulusLength: 2048, publicKeyEncoding: { type: 'spki', format: 'pem' }, privateKeyEncoding: { type: 'pkcs8', format: 'pem', cipher: 'aes-256-cbc', passphrase: 'top secret' } }, (err, publicKey, privateKey) => { // Handle errors, then use the generated key pair if (err) return res.status(500).send(err.message); res.send(publicKey); }) }) app.listen(3000, err => { err ? console.log("Error in server setup") : console.log('Server listening on PORT 3000') }); Step to run the application: run the following command from the root directory of the project: node index.js Output: We will see the following output on the terminal screen: Server listening on PORT 3000 Now open your browser and go to http://localhost:3000/key, and you will see the following output:
-----BEGIN PUBLIC KEY----- MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAwAneYp5HlT93Y3ZlPAHjZAnPFvBskQKKfo4an8jskcgEuG85KnZ7/16kQw2Q8/7Ksdm0sIF7qmAUOu0B773X 1BXQ0liWh+ctHIq/C0e9eM1zOsX6vWwX5Y+WH610cpcb50ltmCeyRmD5Qvf+OE/C BqYrQxVRf4q9+029woF84Lk4tK6OXsdU+Gdqo2FSUzqhwwvYZJJXhW6Gt259m0wD YTZlactvfwhe2EHkHAdN8RdLqiJH9kZV47D6sLS9YG6Ai/HneBIjzTtdXQjqi5vF Y+H+ixZGeShypVHVS119Mi+hnHs7SMzY0GmRleOpna58O1RKPGQg49E7Hr0dz8eh 6QIDAQAB -----END PUBLIC KEY-----
The above code listens on port 3000 and sends the public key as a response. Generating an RSA key is CPU-intensive work, and here only one NodeJS instance is working on a single core. To measure performance, we used the autocannon load-testing tool: the server responded to 2000 requests when running 500 concurrent connections for 10 seconds, averaging 190.1 requests per second.
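For reference, the autocannon run behind those numbers was presumably something like the following (the flags are real autocannon options; the exact invocation is an assumption):

autocannon -c 500 -d 10 http://localhost:3000/key

Here -c sets the number of concurrent connections and -d the test duration in seconds.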

     Using Cluster Module:

Example: implementation using the cluster module. const express = require('express'); const cluster = require('cluster'); const { generateKeyPair } = require('crypto'); // Check the number of available CPUs. const numCPUs = require('os').cpus().length; const app = express(); const PORT = 3000; // For Master process if (cluster.isMaster) { console.log(`Master ${process.pid} is running`); // Fork workers. for (let i = 0; i < numCPUs; i++) { cluster.fork(); } // This event fires when a worker dies cluster.on('exit', (worker, code, signal) => { console.log(`worker ${worker.process.pid} died`); }); } // For Worker else { // Workers can share any TCP connection // In this case it is an HTTP server app.listen(PORT, err => { err ? console.log("Error in server setup") : console.log(`Worker ${process.pid} started`); }); // API endpoint // Send public key app.get('/key', (req, res) => { generateKeyPair('rsa', { modulusLength: 2048, publicKeyEncoding: { type: 'spki', format: 'pem' }, privateKeyEncoding: { type: 'pkcs8', format: 'pem', cipher: 'aes-256-cbc', passphrase: 'top secret' } }, (err, publicKey, privateKey) => { // Handle errors, then use the generated key pair if (err) return res.status(500).send(err.message); res.send(publicKey); }) }) } Step to run the application: run the following command from the root directory of the project: node index.js Output: We will see the following output on the terminal screen: Master 16916 is running Worker 6504 started Worker 14824 started Worker 20868 started Worker 12312 started Worker 9968 started Worker 16544 started Worker 8676 started Worker 11064 started Now open your browser and go to http://localhost:3000/key, you will see the following output:
-----BEGIN PUBLIC KEY----- MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAzxMQp9y9MblP9dXWuQhf sdlEVnrgmCIyP7CAveYEkI6ua5PJFLRStKHTe3O8rxu+h6I2exXn92F/4RE9Yo8EOnrUCSlqy9bl9qY8D7uBMWir0I65xMZu3rM9Yxi+6gP8H4CMDiJhLoIEap+d9Czr OastDPwI+HF+6nmLkHvuq9X5aORvdiOBwMooIoiRpHbgcHovSerJIfQipGs74IiR 107GbpznSUxMIuwV1fgc6mAULuGZl+Daj0SDxfAjk8KiHyXbfHe5stkPNOCWIsbAtCbGN0bCTR8ZJCLdZ4/VGr+eE0NOvOrElXdXLTDVVzO5dKadoEAtzZzzuQId2P/z JwIDAQAB -----END PUBLIC KEY-----
The above NodeJS application launches a worker on each core of the system; the master process accepts requests and distributes them across all workers. In this case the server responded to 5000 requests when running 500 concurrent connections for 10 seconds, averaging 162.06 requests per second. So, using the cluster module you can handle more requests. But sometimes that is not enough; if that is your case, then your option is horizontal scaling.

    Using Nginx

If your system has more than one application server and you need to distribute client requests across all of them, you can use Nginx as a reverse proxy. Nginx sits in front of your server pool and distributes incoming requests across it. In the following example, we have 4 instances of the same NodeJS application on different ports; you could also use separate machines. Example: implementation of load-balancing servers using Nginx. const app = require('express')(); // API endpoint app.get('/', (req,res)=>{ res.send("Welcome to GeeksforGeeks !"); }) // Launching application on several ports app.listen(3000); app.listen(3001); app.listen(3002); app.listen(3003); Now install Nginx on your machine and create a new file in /etc/nginx/conf.d/ called your-domain.com.conf with the following code in it. upstream my_http_servers { # httpServer1 listens to port 3000 server 127.0.0.1:3000; # httpServer2 listens to port 3001 server 127.0.0.1:3001; # httpServer3 listens to port 3002 server 127.0.0.1:3002; # httpServer4 listens to port 3003 server 127.0.0.1:3003; } server { listen 80; server_name your-domain.com www.your-domain.com; location / { proxy_set_header X-Real-IP $remote_addr; proxy_set_header Host $http_host; proxy_pass http://my_http_servers; } }
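After saving the config, you would normally validate and reload Nginx; on a typical systemd-based Linux install (an assumption about your environment) that is:

sudo nginx -t
sudo systemctl reload nginx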

    Using Express Web Server

There are real advantages to an Express-based load balancer: if you are comfortable with NodeJS, you can implement one yourself, as shown in the following example. Step 1: Create an empty NodeJS application. mkdir LoadBalancer cd LoadBalancer npm init -y Step 2: Install required dependencies like ExpressJS, axios, and Concurrently using the following command. npm i express axios npm i concurrently -g Step 3: Create two files: config.js for the load balancer server and index.js for the application servers. Example: implementation of an Express-based load balancer. Here, the filename is config.js: const express = require('express'); const path = require('path'); const app = express(); const axios = require('axios'); // Application servers const servers = [ "http://localhost:3000", "http://localhost:3001" ] // Track the current application server to send request let current = 0; // Receive new request // Forward to application server const handler = async (req, res) =>{ // Destructure following properties from request object const { method, url, headers, body } = req; // Select the current server to forward the request const server = servers[current]; // Update track to select next server current === (servers.length-1)? current = 0 : current++ try{ // Requesting to underlying application server const response = await axios({ url: `${server}${url}`, method: method, headers: headers, data: body }); // Send back the response data // from application server to client res.send(response.data) } catch(err){ // Send back the error message res.status(500).send("Server error!") } } // Serve favicon.ico image app.get('/favicon.ico', (req, res ) => res.sendFile('/favicon.ico')); // When receive new request // Pass it to handler method app.use((req,res)=>{handler(req, res)}); // Listen on PORT 8080 app.listen(8080, err =>{ err ? console.log("Failed to listen on PORT 8080"): console.log("Load Balancer Server " + "listening on PORT 8080"); }); Here, the filename is index.js: const express = require('express'); const app1 = express(); const app2 = express(); // Handler method const handler = num => (req,res)=>{ const { method, url, headers, body } = req; res.send('Response from server ' + num); } // Only handle GET and POST requests // Receive request and pass to handler method app1.get('*', handler(1)).post('*', handler(1)); app2.get('*', handler(2)).post('*', handler(2)); // Start server on PORT 3000 app1.listen(3000, err =>{ err ? console.log("Failed to listen on PORT 3000"): console.log("Application Server listening on PORT 3000"); }); // Start server on PORT 3001 app2.listen(3001, err =>{ err ? console.log("Failed to listen on PORT 3001"): console.log("Application Server listening on PORT 3001"); }); Explanation: The above code starts two Express apps, one on port 3000 and another on port 3001. The separate load balancer process alternates between the two, sending one request to port 3000, the next to port 3001, and the next back to port 3000. Step 4: Open a command prompt in your project folder and run the two scripts in parallel using concurrently. concurrently "node config.js" "node index.js" Output: Now, open a browser, go to http://localhost:8080/, and make a few requests; you will see responses from both application servers.
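With both processes running, you can check the round-robin behaviour from another terminal; given the two application servers above, successive requests should alternate (background requests such as the favicon may shift the order):

curl http://localhost:8080/   # Response from server 1
curl http://localhost:8080/   # Response from server 2
curl http://localhost:8080/   # Response from server 1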

    WebSocket Client & Server Implementation

    https://www.npmjs.com/package/websocket

    Overview

    There are some example client and server applications that implement various interoperability testing protocols in the "test/scripts" folder. https://github.com/theturtle32/WebSocket-Node/tree/master/test/scripts

    Installation

    In your project root: $ npm install websocket Then in your code: var WebSocketServer = require('websocket').server; var WebSocketClient = require('websocket').client; var WebSocketFrame = require('websocket').frame; var WebSocketRouter = require('websocket').router; var W3CWebSocket = require('websocket').w3cwebsocket;

    Server Example

Here's a short example showing a server that echoes back anything sent to it, whether utf-8 or binary. #!/usr/bin/env node var WebSocketServer = require('websocket').server; var http = require('http'); var server = http.createServer(function(request, response) { console.log((new Date()) + ' Received request for ' + request.url); response.writeHead(404); response.end(); }); server.listen(8080, function() { console.log((new Date()) + ' Server is listening on port 8080'); }); var wsServer = new WebSocketServer({ httpServer: server, // You should not use autoAcceptConnections for production // applications, as it defeats all standard cross-origin protection // facilities built into the protocol and the browser. You should // *always* verify the connection's origin and decide whether or not // to accept it. autoAcceptConnections: false }); function originIsAllowed(origin) { // put logic here to detect whether the specified origin is allowed. return true; } wsServer.on('request', function(request) { if (!originIsAllowed(request.origin)) { // Make sure we only accept requests from an allowed origin request.reject(); console.log((new Date()) + ' Connection from origin ' + request.origin + ' rejected.'); return; } var connection = request.accept('echo-protocol', request.origin); console.log((new Date()) + ' Connection accepted.'); connection.on('message', function(message) { if (message.type === 'utf8') { console.log('Received Message: ' + message.utf8Data); connection.sendUTF(message.utf8Data); } else if (message.type === 'binary') { console.log('Received Binary Message of ' + message.binaryData.length + ' bytes'); connection.sendBytes(message.binaryData); } }); connection.on('close', function(reasonCode, description) { console.log((new Date()) + ' Peer ' + connection.remoteAddress + ' disconnected.'); }); });

    Client Example

    This is a simple example client that will print out any utf-8 messages it receives on the console, and periodically sends a random number. This code demonstrates a client in Node.js, not in the browser #!/usr/bin/env node var WebSocketClient = require('websocket').client; var client = new WebSocketClient(); client.on('connectFailed', function(error) { console.log('Connect Error: ' + error.toString()); }); client.on('connect', function(connection) { console.log('WebSocket Client Connected'); connection.on('error', function(error) { console.log("Connection Error: " + error.toString()); }); connection.on('close', function() { console.log('echo-protocol Connection Closed'); }); connection.on('message', function(message) { if (message.type === 'utf8') { console.log("Received: '" + message.utf8Data + "'"); } }); function sendNumber() { if (connection.connected) { var number = Math.round(Math.random() * 0xFFFFFF); connection.sendUTF(number.toString()); setTimeout(sendNumber, 1000); } } sendNumber(); }); client.connect('ws://localhost:8080/', 'echo-protocol');

    Client Example using the W3C WebSocket API

    Same example as above but using the W3C WebSocket API. var W3CWebSocket = require('websocket').w3cwebsocket; var client = new W3CWebSocket('ws://localhost:8080/', 'echo-protocol'); client.onerror = function() { console.log('Connection Error'); }; client.onopen = function() { console.log('WebSocket Client Connected'); function sendNumber() { if (client.readyState === client.OPEN) { var number = Math.round(Math.random() * 0xFFFFFF); client.send(number.toString()); setTimeout(sendNumber, 1000); } } sendNumber(); }; client.onclose = function() { console.log('echo-protocol Client Closed'); }; client.onmessage = function(e) { if (typeof e.data === 'string') { console.log("Received: '" + e.data + "'"); } };

    Request Router Example

    For an example of using the request router, see libwebsockets-test-server.js in the test folder.

    Resources

    A presentation on the state of the WebSockets protocol that I gave on July 23, 2011 at the LA Hacker News meetup. WebSockets: The Real-Time Web, Delivered

    Multiple servers on Node.js



Here is an example with a possible solution for two servers using Node. First, you need to create a project: mkdir simulate4servers cd simulate4servers npm init (entry point: index.js)

    Install Express:

npm install express --save Put the following files into the directory: File app.js: 'use strict'; const express = require('express'); const app = express(); const routes = require('./routes.js'); app.use('/',routes); module.exports = app; File app2.js: 'use strict'; const express = require('express'); const app2 = express(); const routes = require('./routes2.js'); app2.use('/',routes); module.exports = app2; File configDomain1.js: module.exports = { port: 8000 } File configDomain2.js: module.exports = { port: 8010 } File routes.js: 'use strict'; const express = require('express'); const api = express.Router(); api.get('/', (req,res)=>{ res.send({message: 'Hello World!'}); }); module.exports = api; File routes2.js: 'use strict'; const express = require('express'); const api = express.Router(); api.get('/', (req,res)=>{ res.send({message: 'Hello World 2!'}); }); module.exports = api; File index.js: 'use strict'; const app = require('./app') const app2 = require('./app2') const config = require('./configDomain1'); const config2 = require('./configDomain2'); app.listen(config.port, () => { console.log(`API REST running in http://localhost:${config.port}`); }); app2.listen(config2.port, () => { console.log(`API REST running in http://localhost:${config2.port}`); }); And finally, run it: node index.js
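Both apps should now respond independently; with the ports configured above, you can verify with:

curl http://localhost:8000/   # {"message":"Hello World!"}
curl http://localhost:8010/   # {"message":"Hello World 2!"}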

    another example:

const express = require("express"); const server1 = express(); const server2 = express(); const server3 = express(); server1.listen(8000, () => { console.log("Server 1 is up and running on port 8000"); }) server2.listen(8010, () => { console.log("Server 2 is up and running on port 8010"); }) server3.listen(8020, () => { console.log("Server 3 is up and running on port 8020"); }) express() creates and returns a new Express application instance; call it multiple times to create multiple instances, and have each one listen on a different port. That's all.

    include server.use(cors())

This answer is correct, but it should also include server.use(cors()), otherwise the browser may reject requests with a CORS policy error (no 'Access-Control-Allow-Origin' header). Code to make multiple servers: const express = require('express') const cors = require('cors') const server = express() const server2 = express() server.use(cors()) server2.use(cors()) server.get('/',(req,res)=>res.send({"server":"3001","name":"aakash4dev","data":"data 1"})) server2.get('/',(req,res)=>res.send({"server":"3002","name":"aakash4dev","data":"data 2"})) server.listen(3001) server2.listen(3002)

    %j placeholder

In the node.js source, the %j placeholder results in a call to JSON.stringify() on the passed argument. In passing your json variable, you are not passing a valid JSON-serializable object: you are passing a node.js module which, among other things, has exported functions. > JSON.stringify({name: "Bob"}) '{"name":"Bob"}' > JSON.stringify({func: function(){}}) '{}'
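To see %j in action directly (console.log formats through util.format, so both calls below behave the same):

const util = require('util');
console.log('%j', { name: 'Bob' });                 // {"name":"Bob"}
console.log(util.format('%j', { fn: () => {} }));   // {} - functions are dropped by JSON.stringify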

    node.js: read a text file into an array

Synchronous: var fs = require('fs'); var array = fs.readFileSync('file.txt').toString().split("\n"); for (const line of array) { console.log(line); } Asynchronous: var fs = require('fs'); fs.readFile('file.txt', function(err, data) { if(err) throw err; var array = data.toString().split("\n"); for (const line of array) { console.log(line); } }); Using the Node.js readline module. var fs = require('fs'); var readline = require('readline'); var filename = process.argv[2]; readline.createInterface({ input: fs.createReadStream(filename), terminal: false }).on('line', function(line) { console.log('Line: ' + line); });
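On modern Node versions the promise-based fs API is another option; a minimal sketch:

const fsp = require('fs/promises');

async function readLines(path) {
  // Read the whole file as UTF-8, then split into lines
  const data = await fsp.readFile(path, 'utf8');
  return data.split('\n');
}

readLines('file.txt').then(lines => lines.forEach(line => console.log(line)));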

    WebSocket send data to server on initial request

Assign a handler to the WebSocket onopen event; then, in the callback, use the WebSocket send method to send data: websocket.onopen = function() { websocket.send(yourData) } To accept the incoming message server-side, just accept the request. This assumes you are using websocket-node: ws.on('request', (request) => { const conn = request.accept('echo-protocol', request.origin) conn.on('message', (message) => { console.log(message) }) }) If you are using the ws package instead, just listen for the 'message' event. ws.on('message', (message) => { console.log(message) }) example: // Create WebSocket connection. const socket = new WebSocket("ws://localhost:8080"); // Connection opened socket.addEventListener("open", (event) => { socket.send("Hello Server!"); }); // Listen for messages socket.addEventListener("message", (event) => { console.log("Message from server ", event.data); });

    WebSocket

    Constructor

    WebSocket(): Returns a newly created WebSocket object.

    Instance properties

    WebSocket.binaryType: The binary data type used by the connection. WebSocket.bufferedAmount: The number of bytes of queued data. WebSocket.extensions: The extensions selected by the server. WebSocket.protocol: The sub-protocol selected by the server. WebSocket.readyState: The current state of the connection. WebSocket.url: The absolute URL of the WebSocket.

    Instance methods

    WebSocket.close(): Closes the connection. WebSocket.send(): Enqueues data to be transmitted.

    Events

Listen to these events using addEventListener() or by assigning an event listener to the oneventname property of this interface. close: Fired when a connection with a WebSocket is closed. Also available via the onclose property. error: Fired when a connection with a WebSocket has been closed because of an error, such as when some data couldn't be sent. Also available via the onerror property. message: Fired when data is received through a WebSocket. Also available via the onmessage property. open: Fired when a connection with a WebSocket is opened. Also available via the onopen property.

    Examples

    // Create WebSocket connection. const ws = new WebSocket("ws://localhost:8080"); // Connection opened ws.addEventListener("open", (event) => { ws.send("Hello Server!"); }); // Listen for messages ws.addEventListener("message", (event) => { console.log("Message from server ", event.data); });


    Sending messages with Websockets

HTML/JavaScript code that uses WebSockets to communicate with a server: // var ws = new WebSocket('ws://' + document.domain + ':' + location.port + '/feed') var ws = new WebSocket("ws://localhost:57252/"); var messages = document.createElement('ul'); document.body.appendChild(messages); // attach the list so received messages are visible ws.onmessage = function (event) { var messages = document.getElementsByTagName('ul')[0], message = document.createElement('li'), content = document.createTextNode('Received: ' + event.data); message.appendChild(content); messages.appendChild(message); };
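The snippet above only receives; to send from the same page, wait for the connection to open first (a sketch against the same ws object):

ws.onopen = function () {
  // Safe to send once the connection is open
  ws.send('Hello from the browser!');
};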

    Chat application with WebSocket



    In this tutorial we'll create a simple chat app using Deno. Our chat app will allow multiple chat clients connected to the same backend to send group messages through web sockets. After a client chooses a username, they can then start sending group messages to other online clients. Each client also displays the list of currently active users.

    Building the View

We can build a simple UI with the following as our index.html. Note that the app.js script is our chat client (which will be discussed in detail later) <!-- index.html --> <html> <head> <title>Chat App</title> <script src="/public/app.js"></script> </head> <body> <div style="text-align: center"> <div> <b>Users</b> <hr /> <div id="users"></div> <hr class="visible-xs visible-sm" /> </div> <div> <input id="data" placeholder="send message" /> <hr /> <div id="conversation"></div> </div> </div> </body> </html>

    WebSocket Primer

We will rely on Deno's native support for web sockets when building our client and server. A web socket is a bidirectional communication channel that allows both the client and the server to send messages to each other at any time. Web sockets are frequently used in realtime applications where low latency is critical. Each of our clients will keep a web socket connection open to our server so they can receive the latest messages and user logins without constantly polling.

    Chat Client

    The chat client app.js runs in the browser and listens for updates from our server and then manipulates the DOM. Specifically our client is listening for new messages and the list of currently active users. We need to add event handlers to our client's web socket to specify what happens when our clients receives a new message or event. // app.js const myUsername = prompt("Please enter your name") || "Anonymous"; const socket = new WebSocket( `ws://localhost:8080/start_web_socket?username=${myUsername}`, ); socket.onmessage = (m) => { const data = JSON.parse(m.data); switch (data.event) { case "update-users": // refresh displayed user list let userListHtml = ""; for (const username of data.usernames) { userListHtml += `<div> ${username} </div>`; } document.getElementById("users").innerHTML = userListHtml; break; case "send-message": // display new chat message addMessage(data.username, data.message); break; } }; function addMessage(username, message) { // displays new message document.getElementById( "conversation", ).innerHTML += `<b> ${username} </b>: ${message} <br/>`; } // on page load window.onload = () => { // when the client hits the ENTER key document.getElementById("data").addEventListener("keypress", (e) => { if (e.key === "Enter") { const inputElement = document.getElementById("data"); var message = inputElement.value; inputElement.value = ""; socket.send( JSON.stringify({ event: "send-message", message: message, }), ); } }); };

    Chat Server

oak is the Deno middleware framework that we'll be using to set up our server. Our server will return the plain index.html file previously shown when the user first navigates to the site. Our server also exposes a /start_web_socket endpoint which the chat clients will use to create their web socket connection. Note that the client's initial HTTP connection is converted into a WebSocket connection by the server via HTTP's protocol upgrade mechanism. Our server will maintain web socket connections with each active client and tell them which users are currently active. Our server will also broadcast a message to all active clients whenever there is a new message so that each client can display it. // server.js import { Application, Router } from "https://deno.land/x/oak/mod.ts"; const connectedClients = new Map(); const app = new Application(); const port = 8080; const router = new Router(); // send a message to all connected clients function broadcast(message) { for (const client of connectedClients.values()) { client.send(message); } } // send updated users list to all connected clients function broadcast_usernames() { const usernames = [...connectedClients.keys()]; console.log( "Sending updated username list to all clients: " + JSON.stringify(usernames), ); broadcast( JSON.stringify({ event: "update-users", usernames: usernames, }), ); } router.get("/start_web_socket", async (ctx) => { const socket = await ctx.upgrade(); const username = ctx.request.url.searchParams.get("username"); if (connectedClients.has(username)) { socket.close(1008, `Username ${username} is already taken`); return; } socket.username = username; connectedClients.set(username, socket); console.log(`New client connected: ${username}`); // broadcast the active users list when a new user logs in socket.onopen = () => { broadcast_usernames(); }; // when a client disconnects, remove them from the connected clients list // and broadcast the active users list socket.onclose = () => { console.log(`Client ${socket.username} disconnected`); connectedClients.delete(socket.username); broadcast_usernames(); }; // broadcast new message if someone sent one socket.onmessage = (m) => { const data = JSON.parse(m.data); switch (data.event) { case "send-message": broadcast( JSON.stringify({ event: "send-message", username: socket.username, message: data.message, }), ); break; } }; }); app.use(router.routes()); app.use(router.allowedMethods()); app.use(async (context) => { await context.send({ root: `${Deno.cwd()}/`, index: "public/index.html", }); }); console.log("Listening at http://localhost:" + port); await app.listen({ port }); We can start our server with the following command. Note we need to explicitly grant access to the file system and network because Deno is secure by default. deno run --allow-read --allow-net server.js Now if you visit http://localhost:8080 you will be able to start a chat session. You can open 2 simultaneous windows and try chatting with yourself.


    WebSockets Real time chat app



    Creating the backend

    Step 1: Environment Set up

Create a new project directory on your machine, name it real-time-chat-app, and cd into it: mkdir real-time-chat-app cd real-time-chat-app Then initialize the Node.js project with npm init to create a package.json file: npm init

    Step 2: Installing Dependencies

The next step is to install the dependencies for our project. We will need Express and the ws WebSocket library to set up the server for our real-time chat application. Install Express: npm install express Install the ws WebSocket library: npm install ws

    Step 3: Creating the Back-End servers

Create an index.js file in your project, then write the below code to set up an Express server: const express = require('express'); const app = express(); const PORT = process.env.PORT || 3000; app.get('/', (req, res) => { res.send('Hello World!'); }); app.listen(PORT, () => { console.log(`Server running on port ${PORT}`); }); index.js This is a simple Express server which listens on port 3000 and returns "Hello World!" on the / endpoint. The next step is to add the ws library to the index.js file and set up the WebSocket server running independently on another port. Add the below code to the index.js file: const WebSocket = require('ws'); // We are creating a new websocket server here and running it on port 8181 const wss = new WebSocket.Server({ port: 8181 }); wss.on('connection', function connection(ws) { ws.on('message', function incoming(message) { console.log('received: %s', message); }); ws.send('This is a message'); }); What are we doing in this code? We import the WebSocket library ws and run the WebSocket server on port 8181. This server runs independently of the HTTP Express server, which means it does not share the same port or connection as the HTTP server. We handle WebSocket connections with wss.on('connection', function connection(ws) { //..... }); here we are listening for new WebSocket connections; when a client connects to the server via WebSockets, the callback is triggered, and the ws parameter represents the connected WebSocket client. Inside the callback we set up another listener for any messages the client sends to the server. Whenever a message is received we log it to the console for now; later we can send it back to the client or do whatever we want with it: ws.on('message', function incoming(message) { console.log('received: %s', message); }); Lastly, we send a sample message back to the client: ws.send('This is a message');

    Testing the Back-end server

Now that we have created our simple Express and WebSocket servers, we can test them. It is quite easy; we will use a third-party tool such as Postman to do this.

    Testing the express js server

Paste localhost:3000 and send a GET request to the Express server, and you will get a "Hello World!" response.

    Testing the websocket server

To create a WebSocket request, click New in the Postman sidebar, then click WebSocket. Paste the WebSocket server URL, ws://localhost:8181, and click the Connect button to connect to the WebSocket server. Type a message in the message section and click Send. You can see the connection established in the Postman response section, along with the message that was sent from the WebSocket server, and the incoming message logged in the server's console.
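If you prefer the command line to Postman, the wscat utility from npm can do the same interactive test:

npx wscat -c ws://localhost:8181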
Here is how the complete back-end code looks: const express = require('express'); const app = express(); const PORT = process.env.PORT || 3000; const WebSocket = require('ws'); // Create a WebSocket server completely detached from the HTTP server. const wss = new WebSocket.Server({ port: 8181 }); wss.on('connection', function connection(ws) { console.log("WS connection arrived"); ws.on('message', function incoming(message) { console.log('received: %s', message); }); ws.send('this is a message'); }); app.get('/', (req, res) => { res.send('Hello World!'); }); app.listen(PORT, () => { console.log(`Server running on port ${PORT}`); }); index.js Now we have created a simple server and tested the basic functionality. Our server doesn't do much yet except send a basic message back to the client. We want the server to take the message from a client and send it back to the sender. Let us edit the server code to add that functionality: const express = require('express'); const http = require('http'); const WebSocket = require('ws'); const path = require('path'); const app = express(); const PORT = process.env.PORT || 3000; // Serve static files from a 'public' directory app.use(express.static(path.join(__dirname, 'public'))); // Create HTTP server by passing the Express app const server = http.createServer(app); // Integrate WebSocket with the HTTP server const wss = new WebSocket.Server({ server }); wss.on('connection', function connection(ws) { console.log("WS connection arrived"); ws.on('message', function incoming(message) { console.log('received: %s', message); // Echo the message back to the client ws.send(`Echo: ${message}`); }); // Send a welcome message on new connection ws.send('Welcome to the chat!'); }); // Default route can be removed if you are serving only static files // app.get('/', (req, res) => { // res.send('Hello World!'); // }); // Start the server server.listen(PORT, () => { console.log(`Server running on port ${PORT}`); }); index.js What are we doing here? We have edited the Express code to serve files from the public directory; we are going to build the front end of our chat app and serve its HTML and JS files from there. In the WebSocket handler we echo the message we receive back to the client. We can also broadcast the message to all the clients connected to the WebSocket server; we will do this later in the article as bonus content.

    Building the front end / Client side

    Next we are going to be building the front end of our real time chat application.

    Step 4 : Creating the Public Directory

In the root folder of your application, create a new directory called public. There, create two new files:
  index.html : Here we will write the UI of the chat app
  app.js : Here we will write the front-end logic of the chat app

    Step 5 : Create the UI of the chat app

Open the index.html file and paste the following code in it. <!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <title>Chat App</title> <link href="https://cdn.jsdelivr.net/npm/tailwindcss@2.0.1/dist/tailwind.min.css" rel="stylesheet"> </head> <body class="bg-gray-100"> <div class="container mx-auto p-4"> <h2 class="text-2xl font-bold mb-2">Real-Time Chat</h2> <div id="messages" class="bg-white p-4 h-64 overflow-auto mb-4"></div> <input type="text" id="messageInput" class="border p-2 w-full"> <button id="sendButton" class="bg-blue-500 hover:bg-blue-700 text-white font-bold py-2 px-4 rounded"> Send </button> </div> <script src="app.js"></script> </body> </html> index.html This basically creates a chat interface and styles it with Tailwind CSS. You can open the file in the browser to see it.

    Step 6 : Implementing websocket on the client side

Next we are going to create a javascript file and name it app.js on the client side const ws = new WebSocket(`ws://${window.location.host}`); const messages = document.getElementById('messages'); const messageInput = document.getElementById('messageInput'); const sendButton = document.getElementById('sendButton'); ws.onopen = () => { console.log('Connected to the server'); }; ws.onmessage = (event) => { const message = document.createElement('div'); message.textContent = event.data; messages.appendChild(message); }; ws.onerror = (error) => { console.error('WebSocket error:', error); }; ws.onclose = () => { console.log('Disconnected from the server'); }; sendButton.onclick = () => { const message = messageInput.value; ws.send(message); messageInput.value = ''; }; app.js This file handles the connection on the client side: receiving and sending messages, clearing the message box, and updating the HTML interface. If you go to http://localhost:3000 you can see the chat interface, and if you send a message you can see it in the chat box.


    Bonus: Sending chat message to all the connected clients

In this section we are going to change the server code to send each message to all the connected clients instead of just the sender. For this, first we need to create an array to store all the clients currently connected to the WebSocket server: // Array to store all the connected clients const clients = []; Then edit the WebSocket code to add a client to the array whenever a new client connects, and broadcast each incoming message to all the connected clients: wss.on('connection', function connection(ws) { console.log("WS connection arrived"); // Add the new connection to our list of clients clients.push(ws); //... ws.on('message', function incoming(message) { console.log('received: %s', message); // Broadcast the message to all clients clients.forEach(client => { if (client.readyState === WebSocket.OPEN) { client.send(message.toString()); } }); }); When the connection closes, remove the client from the clients array: ws.on('close', () => { // Remove the client from the array when it disconnects const index = clients.indexOf(ws); if (index > -1) { clients.splice(index, 1); } }); Here is how the complete server code looks: const express = require('express'); const http = require('http'); const WebSocket = require('ws'); const path = require('path'); const app = express(); const PORT = process.env.PORT || 3000; // Serve static files from a 'public' directory app.use(express.static(path.join(__dirname, 'public'))); // Create HTTP server by passing the Express app const server = http.createServer(app); // Integrate WebSocket with the HTTP server const wss = new WebSocket.Server({ server }); // Array to keep track of all connected clients const clients = []; wss.on('connection', function connection(ws) { console.log("WS connection arrived"); // Add the new connection to our list of clients clients.push(ws); ws.on('message', function incoming(message) { console.log('received: %s', message); // Broadcast the message to all clients clients.forEach(client => { if (client.readyState === WebSocket.OPEN) { console.log("message", message.toString()) client.send(message.toString()); } }); }); ws.on('close', () => { // Remove the client from the array when it disconnects const index = clients.indexOf(ws); if (index > -1) { clients.splice(index, 1); } }); // Send a welcome message on new connection ws.send('Welcome to the chat!'); }); // Start the server server.listen(PORT, () => { console.log(`Server running on port ${PORT}`); }); index.js
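If you wanted to exclude the sender from the broadcast (some chat UIs render the user's own message locally instead), the loop inside the message handler only needs one extra check:

clients.forEach(client => {
  // Skip the socket that sent the message
  if (client !== ws && client.readyState === WebSocket.OPEN) {
    client.send(message.toString());
  }
});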


    Writing WebSocket client applications



    Note: This feature is available in Web Workers.
    WebSocket client applications use the WebSocket API to communicate with WebSocket servers using the WebSocket protocol.
    
    Note: The example snippets in this article are taken from our WebSocket chat client/server sample.
    
    

    Creating a WebSocket object

    In order to communicate using the WebSocket protocol, you need to create a WebSocket object; this will automatically attempt to open the connection to the server. The WebSocket constructor accepts one required and one optional parameter: webSocket = new WebSocket(url, protocols); url The URL to which to connect; this should be the URL to which the WebSocket server will respond. This should use the URL scheme wss://, although some software may allow you to use the insecure ws:// for local connections. protocols Optional Either a single protocol string or an array of protocol strings. These strings are used to indicate sub-protocols, so that a single server can implement multiple WebSocket sub-protocols (for example, you might want one server to be able to handle different types of interactions depending on the specified protocol). If you don't specify a protocol string, an empty string is assumed. The constructor will throw a SecurityError if the destination doesn't allow access. This may happen if you attempt to use an insecure connection (most user agents now require a secure link for all WebSocket connections unless they're on the same device or possibly on the same network).

     Connection errors

    If an error occurs while attempting to connect, first a simple event with the name error is sent to the WebSocket object (thereby invoking its onerror handler), and then the CloseEvent is sent to the WebSocket object (thereby invoking its onclose handler) to indicate the reason for the connection's closing. The browser may also output to its console a more descriptive error message as well as a closing code as defined in RFC 6455, Section 7.4 through the CloseEvent.

     Examples

This simple example creates a new WebSocket, connecting to the server at wss://www.example.com/socketserver. A custom protocol of "protocolOne" is named in the request for the socket in this example, though this can be omitted. const exampleSocket = new WebSocket( "wss://www.example.com/socketserver", "protocolOne", ); On return, exampleSocket.readyState is CONNECTING. The readyState will become OPEN once the connection is ready to transfer data. If you want to open a connection and are flexible about the protocols you support, you can specify an array of protocols: const exampleSocket = new WebSocket("wss://www.example.com/socketserver", [ "protocolOne", "protocolTwo", ]); Once the connection is established (that is, readyState is OPEN), exampleSocket.protocol will tell you which protocol the server selected. Establishing a WebSocket relies on the HTTP Upgrade mechanism, so the request for the protocol upgrade is implicit when we address the web server as ws://www.example.com or wss://www.example.com.

    Sending data to the server

    Once you've opened your connection, you can begin transmitting data to the server. To do this, call the WebSocket object's send() method for each message you want to send: exampleSocket.send("Here's some text that the server is urgently awaiting!"); You can send data as a string, Blob, or ArrayBuffer. As establishing a connection is asynchronous and prone to failure there is no guarantee that calling the send() method immediately after creating a WebSocket object will be successful. We can at least be sure that attempting to send data only takes place once a connection is established by defining an onopen event handler to do the work: exampleSocket.onopen = (event) => { exampleSocket.send("Here's some text that the server is urgently awaiting!"); };
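A common defensive pattern is to queue outgoing messages until the socket opens; a hypothetical helper (safeSend is our own name, not part of the API):

const pending = [];
function safeSend(socket, data) {
  if (socket.readyState === WebSocket.OPEN) {
    socket.send(data);
  } else {
    // Connection not open yet: buffer the message
    pending.push(data);
  }
}
exampleSocket.addEventListener("open", () => {
  // Flush anything queued before the connection opened
  while (pending.length) exampleSocket.send(pending.shift());
});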

     Using JSON to transmit objects

    One handy thing you can do is use JSON to send reasonably complex data to the server. For example, a chat program can interact with a server using a protocol implemented using packets of JSON-encapsulated data: // Send text to all users through the server function sendText() { // Construct a msg object containing the data the server needs to process the message from the chat client. const msg = { type: "message", text: document.getElementById("text").value, id: clientID, date: Date.now(), }; // Send the msg object as a JSON-formatted string. exampleSocket.send(JSON.stringify(msg)); // Blank the text input element, ready to receive the next line of text from the user. document.getElementById("text").value = ""; }

    Receiving messages from the server

    WebSockets is an event-driven API; when messages are received, a message event is sent to the WebSocket object. To handle it, add an event listener for the message event, or use the onmessage event handler. To begin listening for incoming data, you can do something like this: exampleSocket.onmessage = (event) => { console.log(event.data); };

     Receiving and interpreting JSON objects

    Let's consider the chat client application first alluded to in Using JSON to transmit objects. There are assorted types of data packets the client might receive, such as: Login handshake Message text User list updates The code that interprets these incoming messages might look like this: exampleSocket.onmessage = (event) => { const f = document.getElementById("chatbox").contentDocument; let text = ""; const msg = JSON.parse(event.data); const time = new Date(msg.date); const timeStr = time.toLocaleTimeString(); switch (msg.type) { case "id": clientID = msg.id; setUsername(); break; case "username": text = `User <em>${msg.name}</em> signed in at ${timeStr}<br>`; break; case "message": text = `(${timeStr}) ${msg.name} : ${msg.text} <br>`; break; case "rejectusername": text = `Your username has been set to <em>${msg.name}</em> because the name you chose is in use.<br>`; break; case "userlist": document.getElementById("userlistbox").innerHTML = msg.users.join("<br>"); break; } if (text.length) { f.write(text); document.getElementById("chatbox").contentWindow.scrollByPages(1); } }; Here we use JSON.parse() to convert the JSON object back into the original object, then examine and act upon its contents.

     Text data format

    Text received over a WebSocket connection is in UTF-8 format.

    Closing the connection

    When you've finished using the WebSocket connection, call the WebSocket method close(): exampleSocket.close(); It may be helpful to examine the socket's bufferedAmount attribute before attempting to close the connection to determine if any data has yet to be transmitted on the network. If this value isn't 0, there's pending data still, so you may wish to wait before closing the connection.
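Put into code, that advice might look like the sketch below (closeWhenDrained and the 100 ms polling interval are our own choices):

function closeWhenDrained(socket) {
  if (socket.bufferedAmount === 0) {
    socket.close();
  } else {
    // Data is still queued for the network; check again shortly
    setTimeout(() => closeWhenDrained(socket), 100);
  }
}
closeWhenDrained(exampleSocket);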

    Security considerations

    WebSockets should not be used in a mixed content environment; that is, you shouldn't open a non-secure WebSocket connection from a page loaded using HTTPS or vice versa. Most browsers now only allow secure WebSocket connections, and no longer support using them in insecure contexts.

    WebSocket connections with multiple clients example



    client:

    use the send method to send data to the server. const socket = new WebSocket('ws://localhost:8080'); socket.addEventListener('open', function (event) { socket.send('Hello Server!'); }); socket.addEventListener('message', function (event) { console.log('Message from server ', event.data); }); socket.addEventListener('close', function (event) { console.log('The connection has been closed'); });

    server:

    On the server, we similarly need to listen for WebSocket requests. Use the popular ws package to open a connection and listen for messages: const WebSocket = require('ws'); const ws = new WebSocket.Server({ port: 8080 }); ws.on('connection', function connection(wsConnection) { wsConnection.on('message', function incoming(message) { console.log(`server received: ${message}`); }); wsConnection.send('got your message!'); }); Although in this example, we’re sending strings, a common use case of WebSockets is to send stringified JSON data or even binary data, allowing you to structure your messages in the format convenient to you. For a more complete example, Socket.io, a popular front-end framework for making and managing WebSocket connections, has a fantastic walkthrough for building a Node/JavaScript chat app. This library automatically switches between WebSockets and long polling, and also simplifies broadcasting messages to groups of connected users.
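Building on that note about stringified JSON, a minimal envelope pattern (the type/payload field names are just a convention we are assuming):

// Client side: wrap outgoing messages in a typed envelope
socket.send(JSON.stringify({ type: 'chat', payload: 'Hello!' }));

// Server side: parse and dispatch on the type field
wsConnection.on('message', (raw) => {
  const msg = JSON.parse(raw.toString());
  if (msg.type === 'chat') {
    console.log('chat message:', msg.payload);
  }
});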

    single websocket, multiple clients

I generate a number on the server and send the same random number to every connection (the clients can be different devices). Before the "connection" handler you should add: const WS = require('ws'); const WS_PORT = 8081; const express = require('express'); const app = express(); const PORT = 3000; app.listen(PORT, () => console.log(`Server listening, go to http://localhost:${PORT}`)); app.use(express.static('public')); const wss = new WS.Server({ port: WS_PORT }); const wsSelected = new Set(); // Broadcast the same number to each connected client every 2 seconds const interval = setInterval(() => { const randomNumber = Math.floor(Math.random() * 100); wsSelected.forEach(ws => ws.send(randomNumber.toString())); }, 2000); In the "connection" handler, add each new socket to the set and remove it again when it closes, so we never send to a dead connection: wss.on("connection", ws => { console.log("New client!"); // Track this connection wsSelected.add(ws); ws.on("close", () => wsSelected.delete(ws)); });

    WebSocket Real-Time Chat Room Multiple Clients



    We are going to be using the following packages: WS, Express npm install express ws

Create server.js

    const express = require('express'); const http = require('http'); const WebSocket = require('ws'); const port = 6969; const server = http.createServer(express); const wss = new WebSocket.Server({ server }) wss.on('connection', function connection(ws) { ws.on('message', function incoming(data) { wss.clients.forEach(function each(client) { if (client !== ws && client.readyState === WebSocket.OPEN) { client.send(data); } }) }) }) server.listen(port, function() { console.log(`Server is listening on ${port}!`) })

    Creating our index.html file

    <pre id="messages" style="height: 400px; overflow: scroll"></pre> <input type="text" id="messageBox" placeholder="Type your message here" style="display: block; width: 100%; margin-bottom: 10px; padding: 10px;" /> <button id="send" title="Send Message!" style="width: 100%; height: 30px;">Send Message</button> <script> (function() { const sendBtn = document.querySelector('#send'); const messages = document.querySelector('#messages'); const messageBox = document.querySelector('#messageBox'); let ws; function showMessage(message) { messages.textContent += `\n\n${message}`; messages.scrollTop = messages.scrollHeight; messageBox.value = ''; } function init() { if (ws) { ws.onerror = ws.onopen = ws.onclose = null; ws.close(); } ws = new WebSocket('ws://localhost:6969'); ws.onopen = () => { console.log('Connection opened!'); } ws.onmessage = ({ data }) => showMessage(data); ws.onclose = function() { ws = null; } } sendBtn.onclick = function() { if (!ws) { showMessage("No WebSocket connection :("); return ; } ws.send(messageBox.value); showMessage(messageBox.value); } init(); })(); </script>

    How does the code work?

Showing the final code is great, but how exactly does it all connect and work together? Fine, you win! Let’s go ahead and dissect the server.js file! const express = require('express'); const http = require('http'); const WebSocket = require('ws'); const port = 6969; const server = http.createServer(express); const wss = new WebSocket.Server({ server }) So here what’s going on is we are just doing the usual requires: we pull in express, ws, and you might have spotted http as well. We use http so we can initialise a server, and we pass express in there like so: const server = http.createServer(express); along with setting the port to 6969. Lastly, we assign the new WebSocket server to wss. wss.on('connection', function connection(ws) { ws.on('message', function incoming(data) { wss.clients.forEach(function each(client) { if (client !== ws && client.readyState === WebSocket.OPEN) { client.send(data); } }) }) }) Next, we listen for a connection on our newly initialised WebSocket server by doing wss.on('connection', function connection(ws) { - I named this wss to remind myself that this is the WebSocket Server, but feel free to name this as you like. Once we have the connection, we listen for a message from the client. Next, you’ll see that we have a function called incoming; this function gives us data, which is the user's message from the front-end (we will come to the front-end part shortly); we will use data later on to send it to all the connected clients. So now we have the data (the message) sent from a client, and we want to broadcast it to every client apart from the sender. We run a forEach loop over each connected client, and then we use an if statement to make sure that the client is connected and the socket is open--an important aspect of this if statement is that we are also checking that we are not sending the message back to the client who sent it. If that statement comes back as true, we broadcast the message using client.send(data);. server.listen(port, function() { console.log(`Server is listening on ${port}!`) }) Lastly, for the server.js file, we just listen on the port that we set above--this is just standard Express! Okay, phew, we’re done with the server.js file; now onto the index.html file. <h1>Real Time Messaging</h1> <pre id="messages" style="height: 400px; overflow: scroll"></pre> <input type="text" id="messageBox" placeholder="Type your message here" style="display: block; width: 100%; margin-bottom: 10px; padding: 10px;" /> <button id="send" title="Send Message!" style="width: 100%; height: 30px;">Send Message</button> Here we’re creating a box so we can see the messages that are sent from the clients (as well as our own sent messages); secondly, we create an input that allows the user to type a message; and finally, we create a button that allows the user to send a message! I’m going to presume you already know what the script tags do, but what does (function() {})() do? Well, that's an immediately invoked function expression (IIFE): it runs as soon as it is defined. So as soon as we define this function, we invoke it--basically, we run it. const sendBtn = document.querySelector('#send'); const messages = document.querySelector('#messages'); const messageBox = document.querySelector('#messageBox'); Here, we’re just selecting our button, messages, and input DOM elements. Once we've got those selected, we go ahead and create an unassigned variable, let ws; we need this later on. 
function showMessage(message) { messages.textContent += `\n\n${message}`; messages.scrollTop = messages.scrollHeight; messageBox.value = ''; } This is just a function that we can call with a message: it uses the messages selector, appends the text, and then clears the sent message from the user's message box. function init() { if (ws) { ws.onerror = ws.onopen = ws.onclose = null; ws.close(); } ws = new WebSocket('ws://localhost:6969'); ws.onopen = () => { console.log('Connection opened!'); } ws.onmessage = ({ data }) => showMessage(data); ws.onclose = function() { ws = null; } } The init function is built so that we can separate out our implementation of the connection to the server. We check whether there's already a connection for the user; if there is, we null its handlers and close it. Then we initialise a new connection to the server: ws = new WebSocket('ws://localhost:6969');. Once we have a connection to the server, we simply console.log a message stating that we have successfully connected: ws.onopen = () => { console.log('Connection opened!'); } Following the above, we check for a message. If there’s a message, we pass it to showMessage, which adds it to the chatbox using the function we created earlier. Lastly, if the connection closes, we null that particular connection with ws = null;. Furthermore, we then find ourselves at the sendBtn part of the code. This is quite self-explanatory, but let's make sure we fully understand what is going on. We have sendBtn.onclick, which is our trigger to send a message. We first check whether there is currently no active WebSocket connection with if (!ws), because we don't want to try to send a message when there's no connection; in that case we just show No WebSocket connection :(. If there is a connection, we fire the message to the server with ws.send(messageBox.value) and then show the message in our own message box. And lastly, the most important part: we run our init function by invoking it with init();.

    run the server

To run the server, just use yarn start (or node server.js directly) and you should see Server is listening on 6969!. Then open index.html in your browser (try it in 2 different browsers): when you send a message in one of the windows, the sent messages appear in all your open browser connections!

    Promise



    Introduction

Imagine that you’re a top singer, and fans ask day and night for your upcoming song. To get some relief, you promise to send it to them when it’s published. You give your fans a list. They can fill in their email addresses, so that when the song becomes available, all subscribed parties instantly receive it. And even if something goes very wrong, say, a fire in the studio, so that you can’t publish the song, they will still be notified. Everyone is happy: you, because the people don’t crowd you anymore, and fans, because they won’t miss the song. This is a real-life analogy for things we often have in programming: A “producing code” that does something and takes time. For instance, some code that loads the data over a network. That’s a “singer”. A “consuming code” that wants the result of the “producing code” once it’s ready. Many functions may need that result. These are the “fans”. A promise is a special JavaScript object that links the “producing code” and the “consuming code” together. In terms of our analogy: this is the “subscription list”. The “producing code” takes whatever time it needs to produce the promised result, and the “promise” makes that result available to all of the subscribed code when it’s ready. The analogy isn’t terribly accurate, because JavaScript promises are more complex than a simple subscription list: they have additional features and limitations. But it’s fine to begin with. The constructor syntax for a promise object is: let promise = new Promise(function(resolve, reject) { // executor (the producing code, "singer") }); The function passed to new Promise is called the executor. When new Promise is created, the executor runs automatically. It contains the producing code which should eventually produce the result. In terms of the analogy above: the executor is the “singer”. Its arguments resolve and reject are callbacks provided by JavaScript itself. Our code is only inside the executor. When the executor obtains the result, be it sooner or later, it should call one of these callbacks: resolve(value) — if the job is finished successfully, with result value. reject(error) — if an error has occurred, error is the error object. So to summarize: the executor runs automatically and attempts to perform a job. When it is finished with the attempt, it calls resolve if it was successful or reject if there was an error. The promise object returned by the new Promise constructor has these internal properties: state — initially "pending", then changes to either "fulfilled" when resolve is called or "rejected" when reject is called. result — initially undefined, then changes to value when resolve(value) is called or error when reject(error) is called. So the executor eventually moves the promise to one of these states: "fulfilled" or "rejected". Later we’ll see how “fans” can subscribe to these changes. Here’s an example of a promise constructor and a simple executor function with “producing code” that takes time (via setTimeout): let promise = new Promise(function(resolve, reject) { // the function is executed automatically when the promise is constructed // after 1 second signal that the job is done with the result "done" setTimeout(() => resolve("done"), 1000); }); We can see two things by running the code above: The executor is called automatically and immediately (by new Promise). The executor receives two arguments: resolve and reject. These functions are pre-defined by the JavaScript engine, so we don’t need to create them. We should only call one of them when ready. 
After one second of “processing”, the executor calls resolve("done") to produce the result. This changes the state of the promise object to "fulfilled", with result "done". That was an example of a successful job completion, a “fulfilled promise”. And now an example of the executor rejecting the promise with an error: let promise = new Promise(function(resolve, reject) { // after 1 second signal that the job is finished with an error setTimeout(() => reject(new Error("Whoops!")), 1000); }); The call to reject(...) moves the promise object to the "rejected" state, with the error as its result. To summarize, the executor should perform a job (usually something that takes time) and then call resolve or reject to change the state of the corresponding promise object. A promise that is either resolved or rejected is called “settled”, as opposed to an initially “pending” promise. There can be only a single result or an error The executor should call only one resolve or one reject. Any state change is final. All further calls of resolve and reject are ignored: let promise = new Promise(function(resolve, reject) { resolve("done"); reject(new Error("…")); // ignored setTimeout(() => resolve("…")); // ignored }); The idea is that a job done by the executor may have only one result or an error. Also, resolve/reject expect only one argument (or none) and will ignore additional arguments. Reject with Error objects In case something goes wrong, the executor should call reject. That can be done with any type of argument (just like resolve). But it is recommended to use Error objects (or objects that inherit from Error). The reasoning for that will soon become apparent. Immediately calling resolve/reject In practice, an executor usually does something asynchronously and calls resolve/reject after some time, but it doesn’t have to. We also can call resolve or reject immediately, like this: let promise = new Promise(function(resolve, reject) { // not taking our time to do the job resolve(123); // immediately give the result: 123 }); For instance, this might happen when we start to do a job but then see that everything has already been completed and cached. That’s fine. We immediately have a resolved promise. The state and result are internal The properties state and result of the Promise object are internal. We can’t directly access them. We can use the methods .then/.catch/.finally for that. They are described below.

    Consumers: then, catch

    A Promise object serves as a link between the executor (the “producing code” or “singer”) and the consuming functions (the “fans”), which will receive the result or error. Consuming functions can be registered (subscribed) using the methods .then and .catch.

     then

    The most important, fundamental one is .then. The syntax is: promise.then( function(result) { /* handle a successful result */ }, function(error) { /* handle an error */ } ); The first argument of .then is a function that runs when the promise is resolved and receives the result. The second argument of .then is a function that runs when the promise is rejected and receives the error. For instance, here’s a reaction to a successfully resolved promise: let promise = new Promise(function(resolve, reject) { setTimeout(() => resolve("done!"), 1000); }); // resolve runs the first function in .then promise.then( result => alert(result), // shows "done!" after 1 second error => alert(error) // doesn't run ); The first function was executed. And in the case of a rejection, the second one: let promise = new Promise(function(resolve, reject) { setTimeout(() => reject(new Error("Whoops!")), 1000); }); // reject runs the second function in .then promise.then( result => alert(result), // doesn't run error => alert(error) // shows "Error: Whoops!" after 1 second ); If we’re interested only in successful completions, then we can provide only one function argument to .then: let promise = new Promise(resolve => { setTimeout(() => resolve("done!"), 1000); }); promise.then(alert); // shows "done!" after 1 second

     catch

    If we’re interested only in errors, then we can use null as the first argument: .then(null, errorHandlingFunction). Or we can use .catch(errorHandlingFunction), which is exactly the same: let promise = new Promise((resolve, reject) => { setTimeout(() => reject(new Error("Whoops!")), 1000); }); // .catch(f) is the same as promise.then(null, f) promise.catch(alert); // shows "Error: Whoops!" after 1 second The call .catch(f) is a complete analog of .then(null, f), it’s just a shorthand.

    Cleanup: finally

Just like there’s a finally clause in a regular try {...} catch {...}, there’s finally in promises. The call .finally(f) is similar to .then(f, f) in the sense that f always runs when the promise is settled, be it resolve or reject. The idea of finally is to set up a handler for performing cleanup/finalizing after the previous operations are complete. E.g. stopping loading indicators, closing no longer needed connections, etc. Think of it as a party finisher. No matter whether the party was good or bad and how many friends were at it, we still need (or at least should) clean up after it. The code may look like this: new Promise((resolve, reject) => { /* do something that takes time, and then call resolve or maybe reject */ }) // runs when the promise is settled, doesn't matter successfully or not .finally(() => stop loading indicator) // so the loading indicator is always stopped before we go on .then(result => show result, err => show error) Please note that finally(f) isn’t exactly an alias of then(f,f) though. There are important differences: A finally handler has no arguments. In finally we don’t know whether the promise is successful or not. That’s all right, as our task is usually to perform “general” finalizing procedures. Please take a look at the example above: as you can see, the finally handler has no arguments, and the promise outcome is handled by the next handler. A finally handler “passes through” the result or error to the next suitable handler. For instance, here the result is passed through finally to then: new Promise((resolve, reject) => { setTimeout(() => resolve("value"), 2000); }) .finally(() => alert("Promise ready")) // triggers first .then(result => alert(result)); // <-- .then shows "value" As you can see, the value returned by the first promise is passed through finally to the next then. That’s very convenient, because finally is not meant to process a promise result. As said, it’s a place to do generic cleanup, no matter what the outcome was. And here’s an example of an error, for us to see how it’s passed through finally to catch: new Promise((resolve, reject) => { throw new Error("error"); }) .finally(() => alert("Promise ready")) // triggers first .catch(err => alert(err)); // <-- .catch shows the error A finally handler also shouldn’t return anything. If it does, the returned value is silently ignored. The only exception to this rule is when a finally handler throws an error. Then this error goes to the next handler, instead of any previous outcome. To summarize: A finally handler doesn’t get the outcome of the previous handler (it has no arguments). This outcome is passed through instead, to the next suitable handler. If a finally handler returns something, it’s ignored. When finally throws an error, then the execution goes to the nearest error handler. These features are helpful and make things work just the right way if we use finally how it’s supposed to be used: for generic cleanup procedures. We can attach handlers to settled promises If a promise is pending, .then/catch/finally handlers wait for its outcome. Sometimes, it might be that a promise is already settled when we add a handler to it. In such a case, these handlers just run immediately: // the promise becomes resolved immediately upon creation let promise = new Promise(resolve => resolve("done!")); promise.then(alert); // done! (shows up right now) Note that this makes promises more powerful than the real life “subscription list” scenario. 
If the singer has already released their song and then a person signs up on the subscription list, they probably won’t receive that song. Subscriptions in real life must be done prior to the event. Promises are more flexible. We can add handlers any time: if the result is already there, they just execute.

    Example: loadScript

    Next, let’s see more practical examples of how promises can help us write asynchronous code. We’ve got the loadScript function for loading a script from the previous chapter. Here’s the callback-based variant, just to remind us of it: function loadScript(src, callback) { let script = document.createElement('script'); script.src = src; script.onload = () => callback(null, script); script.onerror = () => callback(new Error(`Script load error for ${src}`)); document.head.append(script); } Let’s rewrite it using Promises. The new function loadScript will not require a callback. Instead, it will create and return a Promise object that resolves when the loading is complete. The outer code can add handlers (subscribing functions) to it using .then: function loadScript(src) { return new Promise(function(resolve, reject) { let script = document.createElement('script'); script.src = src; script.onload = () => resolve(script); script.onerror = () => reject(new Error(`Script load error for ${src}`)); document.head.append(script); }); } Usage: let promise = loadScript("https://cdnjs.cloudflare.com/ajax/libs/lodash.js/4.17.11/lodash.js"); promise.then( script => alert(`${script.src} is loaded!`), error => alert(`Error: ${error.message}`) ); promise.then(script => alert('Another handler...')); We can immediately see a few benefits over the callback-based pattern:
Promises: Promises allow us to do things in the natural order. First, we run loadScript(script), and .then we write what to do with the result.
Callbacks: We must have a callback function at our disposal when calling loadScript(script, callback). In other words, we must know what to do with the result before loadScript is called.
Promises: We can call .then on a Promise as many times as we want. Each time, we’re adding a new “fan”, a new subscribing function, to the “subscription list”. More about this in the next chapter: Promises chaining.
Callbacks: There can be only one callback.
    So promises give us better code flow and flexibility. But there’s more. We’ll see that in the next chapters.

    Tasks

     Re-resolve a promise?

    What’s the output of the code below? let promise = new Promise(function(resolve, reject) { resolve(1); setTimeout(() => resolve(2), 1000); }); promise.then(alert); The output is: 1. The second call to resolve is ignored, because only the first call of reject/resolve is taken into account. Further calls are ignored.

     Delay with a promise

    The built-in function setTimeout uses callbacks. Create a promise-based alternative. The function delay(ms) should return a promise. That promise should resolve after ms milliseconds, so that we can add .then to it, like this: function delay(ms) { // your code } delay(3000).then(() => alert('runs after 3 seconds')); function delay(ms) { return new Promise(resolve => setTimeout(resolve, ms)); } delay(3000).then(() => alert('runs after 3 seconds')); Please note that in this task resolve is called without arguments. We don’t return any value from delay, just ensure the delay.

     Animated circle with promise

    Rewrite the showCircle function in the solution of the task Animated circle with callback so that it returns a promise instead of accepting a callback. The new usage: showCircle(150, 150, 100).then(div => { div.classList.add('message-ball'); div.append("Hello, world!"); }); Take the solution of the task Animated circle with callback as the base.

    Synchronous request with Websockets



First, make a function to send the data:

var socketQueueId = 0; var socketQueue = {}; function sendData(data, onReturnFunction){ socketQueueId++; if (typeof(onReturnFunction) == 'function'){ // the 'i_' prefix is a good way to force string indices; you'll want that in case your server side doesn't care and mixes both, as PHP might do socketQueue['i_'+socketQueueId] = onReturnFunction; } var jsonData = JSON.stringify({'cmd_id':socketQueueId, 'json_data':data}); try{ webSocket.send(jsonData); console.log('Sent'); }catch(e){ console.log('Sending failed... disconnected?'); } }

Then, on the server side

when processing the request, send the cmd_id back to the client along with the response. Then, on the client, handle incoming messages: webSocket.onmessage = function(e) { var data; try{ data = JSON.parse(e.data); }catch(er){ console.log('socket parse error: '+e.data); return; } if (typeof(data['cmd_id']) != 'undefined' && typeof(socketQueue['i_'+data['cmd_id']]) == 'function'){ var execFunc = socketQueue['i_'+data['cmd_id']]; execFunc(data['result']); // to free up memory.. and it is IMPORTANT delete socketQueue['i_'+data['cmd_id']]; return; }else{ socketReceiveData(e.data); } }

    and create a function to handle all other types of returns:

function socketReceiveData(data){ // whatever processing you might need }

    sendData

So now, if you want to send some data to the server and wait for the response to that specific request, you simply do: sendData('man whats 1+1', function(data){console.log('server response:');console.log(data);});
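A small add-on sketch (not from the original answer): wrapping sendData in a Promise so callers can await the reply instead of passing a callback. sendDataAsync is a hypothetical helper built on the sendData function above.
function sendDataAsync(data) {
  return new Promise(function (resolve) {
    // resolve with whatever the server sends back for this cmd_id
    sendData(data, resolve);
  });
}
// usage:
sendDataAsync('man whats 1+1').then(function (result) { console.log('server response:', result); });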

    JavaScript Promise



    Syntax: let promise = new Promise(function(resolve, reject){
    //do something
}); Parameters: The promise constructor takes a single argument, a callback function (the executor). The callback function takes two arguments, resolve and reject. Perform operations inside the callback function; if everything goes well, call resolve, otherwise call reject.

    A Promise has four states

State: Description
Fulfilled: the action related to the promise succeeded
Rejected: the action related to the promise failed
Pending: the promise is still pending, i.e. not fulfilled or rejected yet
Settled: the promise has been fulfilled or rejected (strictly speaking this is not a fourth state, just a name for "no longer pending")
Example 1: In this example we create a promise comparing two strings; if they match we resolve, otherwise we reject, and we then log success or error accordingly. let promise = new Promise(function (resolve, reject) { const x = "geeksforgeeks"; const y = "geeksforgeeks" if (x === y) { resolve(); } else { reject(); } }); promise.then(function () { console.log('Success, You are a GEEK'); }). catch(function () { console.log('Some error has occurred'); }); Output: Success, You are a GEEK Promise Consumers: Promises can be consumed by registering functions using the .then and .catch methods.

    1. Promise then() Method

The then() method is invoked when a promise is either resolved or rejected. It acts as a carrier that takes the result from the promise and passes it on to the appropriate handler. Parameters: It takes two functions as parameters. The first function is executed if the promise is resolved and a result is received. The second function is executed if the promise is rejected and an error is received. (It is optional; the better way to handle errors is the .catch() method.) Syntax: .then(function(result){
//handle success
}, function(error){
//handle error
}) Example 2: This example shows how the then method handles a resolved promise: let promise = new Promise(function (resolve, reject) { resolve('Geeks For Geeks'); }) promise .then(function (successMessage) { //success handler function is invoked console.log(successMessage); }, function (errorMessage) { console.log(errorMessage); }); Output: Geeks For Geeks Example 3: This example shows a rejected promise being handled by the second function of the then method: let promise = new Promise(function (resolve, reject) { reject('Promise Rejected') }) promise .then(function (successMessage) { console.log(successMessage); }, function (errorMessage) { //error handler function is invoked console.log(errorMessage); }); Output: Promise Rejected

    2. Promise catch() Method

The catch() method is invoked when a promise is either rejected or some error occurs during execution. It is used as an error handler wherever there is a chance of getting an error. Parameters: It takes one function as a parameter, the function that handles errors or promise rejections. (.catch() internally calls .then(null, errorHandler), i.e. .catch() is just a shorthand for .then(null, errorHandler).) Syntax: .catch(function(error){
//handle error
}) Example 4: This example shows the catch method handling the reject function of a promise. let promise = new Promise(function (resolve, reject) { reject('Promise Rejected') }) promise .then(function (successMessage) { console.log(successMessage); }) .catch(function (errorMessage) { //error handler function is invoked console.log(errorMessage); }); Output: Promise Rejected Supported browsers: Google Chrome 5.0, Edge 12, Firefox 4.0, Safari 5.0, Opera 11.1

    FAQs – JavaScript Promise

    How do Promises work in JavaScript?

    Promises use then() and catch() methods to handle asynchronous results, allowing chaining of operations.

    What are the states of a Promise?

    Promises have three states: pending (initial state), fulfilled (successful completion), and rejected (failure).

    How do you create a Promise in JavaScript?

    Promises are created using the new Promise() constructor, which takes an executor function with resolve and reject parameters

    What is Promise chaining?

    Promise chaining is the practice of sequentially executing asynchronous operations using multiple then() calls on a Promise.

    Can Promises be canceled in JavaScript?

    Promises cannot be canceled natively, but techniques like using an external flag or a custom implementation can simulate cancellation.
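A minimal sketch of the external-flag technique, using the standard AbortController as the flag carrier; cancellableDelay is a made-up name for illustration:
function cancellableDelay(ms, signal) {
  return new Promise((resolve, reject) => {
    if (signal.aborted) return reject(new Error('cancelled'));
    const timer = setTimeout(resolve, ms);
    // the abort event acts as the external cancellation flag
    signal.addEventListener('abort', () => {
      clearTimeout(timer);
      reject(new Error('cancelled'));
    });
  });
}
const controller = new AbortController();
cancellableDelay(3000, controller.signal)
  .then(() => console.log('done'))
  .catch(err => console.log(err.message));
controller.abort(); // logs "cancelled" instead of "done"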

JavaScript Promise: A Complete Introduction



Ajax lets a page send data to and fetch data from a server without reloading the browser view, which greatly improves the user experience and reduces server load (the server only handles data; the front end renders the view). Ajax is the name of a JavaScript technique for fetching remote data, while Promise is a language feature for handling asynchronous behaviour in general; it is not specific to Ajax, so the two are different things. Q: How do Promises relate to async and await? Promise is syntax for improving asynchronous code, and async/await builds on Promises to give asynchronous code a structure close to "synchronous" code, making it easier to read and manage. Q: Are Promises used often? Do I have to learn them? They are used very frequently; learning them is a must.
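A small sketch (my addition) showing the same request written with .then and with async/await; both rest on the same underlying Promise, and axios is assumed to be loaded as in the examples below:
// Promise style
axios.get('https://randomuser.me/api/').then(res => console.log(res.data));
// async/await style: reads like synchronous code
async function getUser() {
  const res = await axios.get('https://randomuser.me/api/');
  console.log(res.data);
}
getUser();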

The problem with asynchrony

Promise exists to improve the structure of asynchronous JavaScript. As covered in earlier articles, JavaScript is a synchronous language that can only do one thing at a time; when it meets an asynchronous event, that event is moved to the end of the queue and runs only after all the other code has finished. In the code below, the console output order is: start, end of code, async event (last). console.log('start'); setTimeout(() => { console.log('async event'); }, 0); console.log('end of code'); Although setTimeout is given a delay of 0, it is still an asynchronous event, so it runs only after the other code has finished. The same applies to Ajax: if you must be sure the remote data has arrived before continuing, writing the code sequentially will not present the data correctly. The following deliberately wrong example uses axios, a Promise-based Ajax library: a data object is defined first, the middle part uses axios to try to fetch remote data, and the console.log(data) right after it still shows the initially defined object, not the data fetched by Ajax. let data = {} console.log('start'); axios.get('https://randomuser.me/api/').then(function(response) { data = response; }); console.log(data); In the example above, the place where data is correctly assigned is the following fragment; if you expect to do anything further with the fetched data, understanding how Promises work is very important. then(function(response) { data = response; })
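For contrast, a corrected version of the wrong example above: any work that depends on the response goes inside then, so it only runs once the data has actually arrived.
console.log('start');
axios.get('https://randomuser.me/api/').then(function (response) {
  // runs only after the remote data has arrived
  console.log(response.data);
});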

Promise structure and states

Structure

Promise itself is a constructor function, and functions are objects, so extra properties and methods can live on it. In the console you can see that Promise directly exposes the methods all, race, resolve, and reject (their usage is introduced later): Promise.all Promise.race Promise.resolve Promise.reject Objects produced by the Promise constructor with new can use its prototype methods (defined on prototype), which include then, catch, and finally; these methods can only be called on such an instance. Creating the object p via new Promise() lets p use the Promise prototype methods: const p = new Promise(); p.then(); // the promise succeeded p.catch(); // the promise failed p.finally(); // the async work finished (whether or not it succeeded) (Note: in real code new Promise must be given an executor function, otherwise it throws a TypeError; the snippet above only illustrates the prototype methods.) In addition, when a Promise is constructed it must be passed a function as its argument (the executor function), whose parameters are resolve and reject. These two methods represent the success and failure results respectively; note that only one of the two can be called, and calling it ends this Promise. new Promise(function(resolve, reject) { resolve(); // the success callback reject(); // the failure callback }); The names resolve and reject can be customised, but in practice most developers keep these names.

States

The point of a Promise is handling asynchronous events, and the asynchronous process moves through different states. While a Promise runs, you can observe the following states: pending: the work is running and there is no result yet; resolved: the work finished and succeeded, returning the resolve result (the promise has been fulfilled); rejected: the work finished but failed, returning the rejected result. Once fulfilled or rejected, the state is final and never changes; inside the Promise, resolve or reject returns the result, and the caller reads the value with then or catch. To judge whether a Promise has completed, check whether resolve or reject was called in the Promise's work; in the following example neither method is called, so the Promise's result stays at pending. function promise() { return new Promise((resolve, reject) => {}); } console.dir(promise()); Inspecting the promise shows these two properties: [[PromiseStatus]]: "pending" -> the current state [[PromiseValue]]: undefined -> the value passed to resolve or reject In the next example, the executor immediately calls reject('failure'), so in the end you get the rejected state and its value. function promise() { return new Promise((resolve, reject) => {reject('failure');}); } console.dir(promise());

Building your own Promise

The best way to get familiar with Promises is to write one yourself. A Promise executor receives resolve and reject by default (the names can be customised, but everyone is used to resolve and reject); resolve represents success and reject failure, exactly one of the two is called, and only once. After declaring the function, return new Promise to create and return a Promise object, with an executor function inside that takes resolve and reject; at this point you have the common Promise structure, and you then call resolve or reject according to the outcome (the example below randomly calls resolve or reject). function promise() { return new Promise((resolve, reject) => { // randomly get 0 or 1 const num = Math.random() > 0.5 ? 1 : 0; // on 1 call resolve, otherwise reject if (num) { resolve('success'); } else { reject('failure'); } }); } Before invoking the promise, recall its three states: pending -> no result yet; resolved: the work finished and succeeded, returning the resolve result; rejected: the work finished but failed, returning the rejected result. Every run necessarily passes through pending and then enters either fulfilled or rejected, and then() and catch() retrieve the success or failure result. .then(onFulfilled, onRejected) can take two callback functions, each with its own parameter: onFulfilled: the success handler; its parameter is the value passed to resolve inside the Promise. onRejected: the failure handler; its parameter is the value passed to reject inside the Promise. // promise.then(onFulfilled, onRejected); // the former is the resolve callback, the latter the reject one promise().then((success) => { console.log(success); }, (fail) => {console.log(fail); }) In most cases developers prefer to use .then() only to get the success result and hand failures to catch(onRejected); the difference between the two styles is very small. // promise.then(onFulfilled); // promise.catch(onRejected) promise().then(success => { console.log(success);}) // all failures are handed to catch .catch(fail => { console.log(fail); });

Chaining

To make sure one asynchronous task finished before running the next, we used to rely on callback after callback; searching the web for "callback hell" turns up plenty of articles and the famous code "hadouken" pyramid pictures. Another strength of Promises is that then and catch can be chained to run task after task. For this example the Promise is changed so that passing 0 calls reject, while any other value calls resolve. function promise(num) { return new Promise((resolve, reject) => { num ? resolve(`${num}, success`) : reject('failure'); }); } Next, when we want to make sure a Promise task has finished before starting the next one, we can return a value to move on to the next then. This return has the following characteristics: the returned value is not limited to promise functions; any expression can be returned; if it is a promise function, it keeps following the then and catch flow; if it is not a promise function, the next then receives it as its result. promise(1) .then(success => { console.log(success); return promise(2); }) .then(success => { console.log(success); return promise(0); // this step goes to catch }) .then(success => { // since the previous step rejected, this step does not run console.log(success); return promise(3); }) .catch(fail => { console.log(fail); })

then vs catch: handling failures differently

Both then and catch can be chained, and as mentioned above, then can also receive the failure result; the original article illustrates the two behaviours with diagrams, summarised here. Not using then to receive failures: whenever any stage hits a reject, execution jumps straight to catch, and none of the then calls after it run. (Side note: catch can still use return to continue the chain, though this is rarely written in practice.) Using then to receive failures: exactly one of the two functions in then (onFulfilled, onRejected) always executes, so this style guarantees that every link of the chain runs.
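Since the diagrams are not reproduced here, a small sketch of the two behaviours, reusing the promise(num) helper defined above:
// 1) failures left to catch: the then after the rejection is skipped
promise(0)
  .then(success => { console.log(success); return promise(2); }) // skipped
  .catch(fail => console.log('caught:', fail));
// 2) failures received in then: one of the two callbacks always runs
promise(0)
  .then(success => console.log(success), fail => console.log('handled:', fail))
  .then(() => console.log('this link still runs'));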

Completing with finally

At the very end you can use finally to confirm the work has completed; finally takes no parameters. It is a good fit for, say, turning off a loading effect once an Ajax request has finished loading. promise(1) .then(success => { console.log(success); }).finally(() => { console.log('done'); })

Rewriting XMLHttpRequest with a Promise

A large part of Promise usage is handling Ajax behaviour; this section rewrites the traditional style to show how it differs from the Promise style. Traditionally, you create an object that can make remote requests with the XMLHttpRequest constructor, then in turn define the method (GET) and the state handler (onload) and send the request (send); any behaviour after the result arrives must be written inside onload. The structure looks like this: var url = 'https://jsonplaceholder.typicode.com/todos/1'; // define the HTTP request var req = new XMLHttpRequest(); // define the method req.open('GET', url); // when the request completes, run this function req.onload = function() { if (req.status == 200) { // on success, print the result directly console.log(req.response); } else { // the failure branch } }; // send the request req.send(); Next, this behaviour is wrapped into a get function containing the Promise and the XMLHttpRequest logic above. To use it you just call get(url)..., and everything that follows fits the Promise structure; with repeated use, the code becomes much more readable. function get(url) { return new Promise((resolve, reject)=> { // define the HTTP request var req = new XMLHttpRequest(); req.open('GET', url); req.onload = function() { if (req.status == 200) { // return the successful result with resolve; the response can also be converted to JSON right here resolve(JSON.parse(req.response)); } else { // return a custom failure with reject reject(new Error(req.statusText)) } }; req.send(); }); } // from here on, HTTP requests can go straight through the get function get('https://jsonplaceholder.typicode.com/todos/1') .then((res) => { console.log(res); }) .catch((res) => { console.error(res) })

Promise methods

Finally, the other methods available on the Promise object: Promise API all -> run several Promise actions at the same time and return all the results together once everything completes. race -> run several Promises at the same time but return only the first one to finish. Promise.reject, Promise.resolve -> define a Promise object that is already fulfilled or rejected. This section uses the promise function defined below, which takes two parameters: num: whether this Promise succeeds; time: how long this Promise takes to run. function promise(num, time = 500) { return new Promise((resolve, reject) => { setTimeout(() => { num ? resolve(`${num}, success`) : reject('failure'); }, time); }); }

     Promise.all

Pass several promise calls in as an array; once all of them have completed, an array of results is returned, in the same order as the input. Promise.all([promise(1), promise(2), promise(3, 3000)]) .then(res => { console.log(res); }); This method is well suited to running several APIs together and making sure all of them have completed before other work proceeds.

     Promise.race

Pass several promise calls in as an array; a single result is returned, namely the first one to finish running. In the example below that is the result of promise(1). Promise.race([promise(1), promise(2), promise(3, 3000)]).then(res => { console.log(res); }); This method can be used when an endpoint is unstable: fire several identical API calls at once to ensure one of them works. In practice, though, it is not used often.

     Promise.reject, Promise.resolve

These two methods define a Promise object whose state (resolved, rejected) is already decided. Like new Promise, they produce a new Promise object, but its outcome is already settled. Examples: Promise.resolve produces a new Promise object whose resolve result can be read with then. var result = Promise.resolve('result'); result.then(res => { console.log('resolved', res); // the success branch receives the result correctly }, res => { console.log('rejected', res); // the failure branch receives nothing }); Switching to Promise.reject produces a Promise object that always shows the rejected result. var result = Promise.reject('result'); result.then(res => { console.log(res); }, res => { console.log(res); // only this branch produces output }); // rejected result Note: Promise.reject always uses the value you pass as the rejection reason, even if it is a Promise; Promise.resolve, on the other hand, will follow a Promise (or thenable) passed to it rather than wrapping it.

    Node.js quick file server

npm install http-server -g
http-server -o /path/to/static/content
to serve the current path and use all locally available addresses: http-server -o .
to serve the current path on localhost only: http-server -a localhost
access a file: http://127.0.0.1:8080/color.js or http://localhost:8080/color.js
to serve the current path on localhost with a specific port: http-server -a localhost -p 80
npx http-server: npx turns it into a one-liner that downloads the necessary files and runs it: npx http-server -o /path/to/static/content

    file server code:

var http = require('http'); var fs = require('fs'); var path = require('path'); http.createServer(function (request, response) { console.log('request starting...'); var filePath = '.' + request.url; console.log(filePath) if (filePath == './') // if no file is specified, fall back to index.html filePath = './index.html'; var extname = path.extname(filePath); var contentType = 'text/html'; switch (extname) { case '.js': contentType = 'text/javascript'; break; case '.css': contentType = 'text/css'; break; case '.json': contentType = 'application/json'; break; case '.png': contentType = 'image/png'; break; case '.jpg': contentType = 'image/jpg'; break; case '.wav': contentType = 'audio/wav'; break; } fs.readFile(filePath, function(error, content) { if (error) { if(error.code == 'ENOENT'){ // serve the 404 page with a 404 status code fs.readFile('./404.html', function(error, content) { response.writeHead(404, { 'Content-Type': 'text/html' }); response.end(content, 'utf-8'); }); } else { response.writeHead(500); response.end('error: '+error.code+' ..\n'); } } else { response.writeHead(200, { 'Content-Type': contentType }); response.end(content, 'utf-8'); } }); }).listen(8125); console.log('Server running at http://127.0.0.1:8125/'); UPDATE If you need to access your server from an external domain/file, you need to overcome CORS; in your node.js file, set the headers below: // Website you wish to allow to connect response.setHeader('Access-Control-Allow-Origin', '*'); // Request methods you wish to allow response.setHeader('Access-Control-Allow-Methods', 'GET, POST, OPTIONS, PUT, PATCH, DELETE'); // Request headers you wish to allow response.setHeader('Access-Control-Allow-Headers', 'X-Requested-With,content-type'); // Set to true if you need the website to include cookies in the requests sent // to the API (e.g. in case you use sessions) response.setHeader('Access-Control-Allow-Credentials', true); UPDATE const http = require('http'); const url = require('url'); const fs = require('fs'); const path = require('path'); const port = process.argv[2] || 9000; http.createServer(function (req, res) { console.log(`${req.method} ${req.url}`); // parse URL const parsedUrl = url.parse(req.url); // extract URL path let pathname = `.${parsedUrl.pathname}`; // based on the URL path, extract the file extension. e.g. .js, .doc, ... const ext = path.parse(pathname).ext; // maps file extension to MIME type const map = { '.ico': 'image/x-icon', '.html': 'text/html', '.js': 'text/javascript', '.json': 'application/json', '.css': 'text/css', '.png': 'image/png', '.jpg': 'image/jpeg', '.wav': 'audio/wav', '.mp3': 'audio/mpeg', '.svg': 'image/svg+xml', '.pdf': 'application/pdf', '.doc': 'application/msword' }; // note: fs.exists is deprecated; fs.access or fs.stat is preferred in new code fs.exists(pathname, function (exist) { if(!exist) { // if the file is not found, return 404 res.statusCode = 404; res.end(`File ${pathname} not found!`); return; } // if is a directory search for index file matching the extension if (fs.statSync(pathname).isDirectory()) pathname += '/index' + ext; // read file from file system fs.readFile(pathname, function(err, data){ if(err){ res.statusCode = 500; res.end(`Error getting the file: ${err}.`); } else { // if the file is found, set Content-type and send data res.setHeader('Content-type', map[ext] || 'text/plain' ); res.end(data); } }); }); }).listen(parseInt(port)); console.log(`Server listening on port ${port}`);

    file server use expressjs/serve-static

    myapp.js: var http = require('http'); var finalhandler = require('finalhandler'); var serveStatic = require('serve-static'); var serve = serveStatic("./"); var server = http.createServer(function(req, res) { var done = finalhandler(req, res); serve(req, res, done); }); server.listen(8000); and then from command line: $ npm install finalhandler serve-static $ node myapp.js

    express file server code

https://stackoverflow.com/questions/16333790/node-js-quick-file-server-static-files-over-http expressserver.js var express = require('express'); var server = express(); server.use(express.static(__dirname)); server.listen(8080); browse: http://localhost:8080/ To serve a specific HTML file at the root, use res.sendFile (express.static expects a directory, not a file): server.get('/', function (req, res) { res.sendFile(__dirname + '/myfile.html'); }); To serve files from a different location, set the directory on the third line: server.use('/', express.static(__dirname + '/public')); CD to the folder containing your file and run node from the console with this command: node expressserver.js Browse to localhost:8080

    nodejs push a message from web server to browser

Node js example index.js var app = require('express')(); var http = require('http').Server(app); var io = require('socket.io')(http); app.get("/", function (req, res) { res.sendFile("index.html", {root: __dirname}); }); io.on("connection", function (socket) { socket.on("notify", function (notification_request) { io.emit('notify', JSON.stringify(notification_request)); }); }); http.listen(3000, function () { console.log('listening on 3000'); }); your frontend index.html, before </body> (this assumes jQuery and the socket.io client script are already loaded): <script> var socket = io(); $('button').click(function () { // notify event triggered socket.emit('notify', {"notification-1": "message1", "notification-2": "message2", "notification-3": "message3"}); return false; }); socket.on('notify', function (notification) { var notifications = JSON.parse(notification); // process notification array $('#notification-div').append(notifications); // display the notification here; it is reflected for all clients }); </script> Run your index.js file in a terminal or CLI to activate the server. And don't forget to install the modules used (npm install express socket.io): var app = require('express')(); var http = require('http').Server(app); var io = require('socket.io')(http); Express.js Send Response From Server to Client Sending response from Node.js server to browser

    WebSocket Chat application with browser

    <!-- index.html --> <html><head><title>Chat App</title> <script> const myUsername = prompt("Please enter your name") || "Anonymous"; const socket = new WebSocket( `ws://localhost:8080/start_web_socket?username=${myUsername}`, ); socket.onmessage = (m) => { const data = JSON.parse(m.data); switch (data.event) { case "update-users": // refresh displayed user list let userListHtml = ""; for (const username of data.usernames) { userListHtml += `<div> ${username} </div>`; } document.getElementById("users").innerHTML = userListHtml; break; case "send-message": // display new chat message addMessage(data.username, data.message); break; } }; function addMessage(username, message) { // displays new message document.getElementById( "conversation", ).innerHTML += `<b> ${username} </b>: ${message} <br/>`; } // on page load window.onload = () => { // when the client hits the ENTER key document.getElementById("data").addEventListener("keypress", (e) => { if (e.key === "Enter") { const inputElement = document.getElementById("data"); var message = inputElement.value; inputElement.value = ""; socket.send( JSON.stringify({ event: "send-message", message: message, }), ); } }); }; </script> </head> <body> <b>Users</b> <div id="users"></div> <input id="data" placeholder="send message" /> <div id="conversation"></div> </body> </html> // server.js import { Application, Router } from "https://deno.land/x/oak/mod.ts"; const connectedClients = new Map(); const app = new Application(); const port = 8080; const router = new Router(); // send a message to all connected clients function broadcast(message) { for (const client of connectedClients.values()) { client.send(message); } } // send updated users list to all connected clients function broadcast_usernames() { const usernames = [...connectedClients.keys()]; console.log( "Sending updated username list to all clients: " + JSON.stringify(usernames), ); broadcast( JSON.stringify({ event: "update-users", usernames: usernames, }), ); } router.get("/start_web_socket", async (ctx) => { const socket = await ctx.upgrade(); const username = ctx.request.url.searchParams.get("username"); if (connectedClients.has(username)) { socket.close(1008, `Username ${username} is already taken`); return; } socket.username = username; connectedClients.set(username, socket); console.log(`New client connected: ${username}`); // broadcast the active users list when a new user logs in socket.onopen = () => { broadcast_usernames(); }; // when a client disconnects, remove them from the connected clients list // and broadcast the active users list socket.onclose = () => { console.log(`Client ${socket.username} disconnected`); connectedClients.delete(socket.username); broadcast_usernames(); }; // broadcast new message if someone sent one socket.onmessage = (m) => { const data = JSON.parse(m.data); switch (data.event) { case "send-message": broadcast( JSON.stringify({ event: "send-message", username: socket.username, message: data.message, }), ); break; } }; }); app.use(router.routes()); app.use(router.allowedMethods()); app.use(async (context) => { await context.send({ root: `${Deno.cwd()}/`, index: "public/index.html", }); }); console.log("Listening at http://localhost:" + port); await app.listen({ port });

    execute an external program from within Node.js

The simplest way: const { exec } = require("child_process"); exec('Minute.html').unref(); or var run = require('child_process').exec; run('Minute.html'); (exec runs the command through a shell, so on Windows this opens the file with its default associated program). Note: exec buffers the child's entire output in memory; output beyond the maxBuffer option (1 MB by default in modern Node versions) causes the call to fail, so prefer spawn for large or streaming output.
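A small sketch (my addition) of the spawn alternative, which streams output instead of buffering it and therefore has no maxBuffer limit; 'node -v' is just an illustrative command:
const { spawn } = require('child_process');
const child = spawn('node', ['-v']);
// stream stdout/stderr as they arrive
child.stdout.on('data', chunk => process.stdout.write(chunk));
child.stderr.on('data', chunk => process.stderr.write(chunk));
child.on('close', code => console.log(`child exited with code ${code}`));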

    Writing files with Node.js

Use fs.writeFile(): const fs = require('node:fs'); const content = 'Some content!'; fs.writeFile('test.html', content, err => { if (err) { console.error(err); } else { console.log("success!"); } });
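For comparison, a sketch of the same write using the promise-based fs API (node:fs/promises) with async/await; writeExample is just an illustrative wrapper name:
const fsp = require('node:fs/promises');
async function writeExample() {
  try {
    await fsp.writeFile('test.html', 'Some content!'); // resolves when the write completes
    console.log('success!');
  } catch (err) {
    console.error(err);
  }
}
writeExample();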

    Node.js File System Module

    File Server

    To include the File System module, use the require() method: var fs = require('fs'); Common use for the File System module: Read files, Create files, Update files, Delete files, Rename files

    Read Files

The fs.readFile() method. Create a Node.js file that reads the HTML file, and return the content: Example var http = require('http'); var fs = require('fs'); http.createServer(function(req, res) { fs.readFile('demofile1.html', function(err, data) { if (err) { // guard against a missing file instead of writing undefined data res.writeHead(404, {'Content-Type': 'text/html'}); return res.end('404 Not Found'); } res.writeHead(200, {'Content-Type': 'text/html'}); res.write(data); return res.end(); }); }).listen(8080); Save the code above in a file called "demo_readfile.js", and initiate the file: Initiate demo_readfile.js: C:\Users\Your Name>node demo_readfile.js

    Create Files

    The File System module has methods for creating new files: fs.appendFile() fs.open() fs.writeFile() The fs.appendFile() method appends specified content to a file. If the file does not exist, the file will be created: Example Create a new file using the appendFile() method: var fs = require('fs'); fs.appendFile('mynewfile1.txt', 'Hello content!', function (err) { if (err) throw err; console.log('Saved!'); }); The fs.open() method takes a "flag" as the second argument, if the flag is "w" for "writing", the specified file is opened for writing. If the file does not exist, an empty file is created: Example Create a new, empty file using the open() method: var fs = require('fs'); fs.open('mynewfile2.txt', 'w', function (err, file) { if (err) throw err; console.log('Saved!'); }); The fs.writeFile() method replaces the specified file and content if it exists. If the file does not exist, a new file, containing the specified content, will be created: Example Create a new file using the writeFile() method: var fs = require('fs'); fs.writeFile('mynewfile3.txt', 'Hello content!', function (err) { if (err) throw err; console.log('Saved!'); });

    Update Files

    The File System module has methods for updating files: fs.appendFile() fs.writeFile() The fs.appendFile() method appends the specified content at the end of the specified file: Example Append "This is my text." to the end of the file "mynewfile1.txt": var fs = require('fs'); fs.appendFile('mynewfile1.txt', ' This is my text.', function (err) { if (err) throw err; console.log('Updated!'); }); The fs.writeFile() method replaces the specified file and content: Example Replace the content of the file "mynewfile3.txt": var fs = require('fs'); fs.writeFile('mynewfile3.txt', 'This is my text', function (err) { if (err) throw err; console.log('Replaced!'); });

    Delete Files

    To delete a file with the File System module, use the fs.unlink() method. The fs.unlink() method deletes the specified file: Example Delete "mynewfile2.txt": var fs = require('fs'); fs.unlink('mynewfile2.txt', function (err) { if (err) throw err; console.log('File deleted!'); });

    Rename Files

    To rename a file with the File System module, use the fs.rename() method. The fs.rename() method renames the specified file: Example Rename "mynewfile1.txt" to "myrenamedfile.txt": var fs = require('fs'); fs.rename('mynewfile1.txt', 'myrenamedfile.txt', function (err) { if (err) throw err; console.log('File Renamed!'); });

    Upload Files

    You can also use Node.js to upload files to your computer. There is a very good module for working with file uploads, called "Formidable". npm install formidable The file will be uploaded, and placed on a temporary folder: var http = require('http'); var formidable = require('formidable'); http.createServer(function (req, res) { if (req.url == '/fileupload') { var form = new formidable.IncomingForm(); form.parse(req, function (err, fields, files) { res.write('File uploaded'); res.end(); }); } else { res.writeHead(200, {'Content-Type': 'text/html'}); res.write('<form action="fileupload" method="post" enctype="multipart/form-data">'); res.write('<input type="file" name="filetoupload"><br>'); res.write('<input type="submit">'); res.write('</form>'); return res.end(); } }).listen(8080); Save the File When a file is successfully uploaded to the server, it is placed on a temporary folder. The path to this directory can be found in the "files" object, passed as the third argument in the parse() method's callback function. To move the file to the folder of your choice, use the File System module, and rename the file:
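The move-and-rename step that last sentence refers to looks like this; a sketch based on the formidable v1 API, where files.filetoupload.path holds the temporary path and .name the original file name (newer formidable versions use .filepath and .originalFilename), with var fs = require('fs'); added at the top of the server file:
form.parse(req, function (err, fields, files) {
  var oldpath = files.filetoupload.path; // temporary location
  var newpath = 'C:/Users/Your Name/' + files.filetoupload.name; // destination folder
  fs.rename(oldpath, newpath, function (err) {
    if (err) throw err;
    res.write('File uploaded and moved!');
    res.end();
  });
});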

    Node.js Output to console

    console.log("a n b"); console.log('My %s has %d ears', 'cat', 2); %s format a variable as a string %d format a variable as a number %i format a variable as its integer part only %o format a variable as an object console.clear() clears the console Calculate the time spent: time() and timeEnd() const doSomething = () => console.log('test'); const measureDoingSomething = () => { console.time('doSomething()'); // do something, and measure the time it takes doSomething(); console.timeEnd('doSomething()'); }; measureDoingSomething(); stdout and stderr: console.log printing messages This is the standard output, or stdout. console.error prints to the stderr stream. It will not appear in the console, but it will appear in the error log. Color the output npm install chalk const chalk = require('chalk'); console.log(chalk.yellow('hi!')); Create a progress bar npm install progress const ProgressBar = require('progress'); const bar = new ProgressBar(':bar', { total: 10 }); const timer = setInterval(() => { bar.tick(); if (bar.complete) { clearInterval(timer); } }, 100);

    get colors on the command line

    npm install colors const colors = require('colors'); const stringOne = 'This is a plain string.'; const stringTwo = 'This string is red.'.red; const stringThree = 'This string is blue.'.blue; const today = new Date().toLocaleDateString(); // returns today's date in mm/dd/yyyy format console.log(stringOne.black.bgMagenta); console.log(stringOne.yellow.bgRed.bold); console.log(`Today is: ${today}`.black.bgGreen); console.log(stringTwo); console.log(stringThree); console.log(stringTwo.magenta); console.log(stringThree.grey.bold);

    nodejs set value of text field in html

index.html <body><input type="text" name="someVal" value="{{someVal}}"></body> server.js var http = require('http'); var fs = require('fs'); http.createServer((req, res) => { fs.readFile('index.html', (err, data) => { if (err) { res.writeHead(500); res.end(String(err)); return; } data = data.toString().replace(/\{\{someVal\}\}/, 'your value here'); res.writeHead(200); res.end(data, 'utf8'); }); }).listen(8080); This server.js will open an HTTP server on port 8080. It replaces the placeholder in your HTML with your value and then sends the modified content to the client. If that's all you want to do, PHP might do a better job for you. The other option, updating a value in an already-loaded page, is a lot more elaborate. You would have to either use AJAJ (Asynchronous JavaScript and JSON), which requires the client to know when to fetch the value, or make use of websockets, which enable the server to push a value to the client. Look at frameworks like Meteor and Socket.IO

Nodejs: different ways to manipulate the DOM

1. Using Template Engines Using a template engine like EJS, Pug, or Handlebars with Express.js, you can directly set values in your HTML templates. Example with EJS: First, set up your Express server: const express = require('express'); const app = express(); const path = require('path'); app.set('view engine', 'ejs'); app.set('views', path.join(__dirname, 'views')); app.get('/', (req, res) => { const value = "Hello, World!"; res.render('index', { value }); }); app.listen(3000, () => { console.log('Server is running on http://localhost:3000'); }); Then, create an index.ejs file in the views directory (a minimal sketch is shown after this section). 2. Using JSON Responses with AJAX Send data as JSON from your Express server and use JavaScript on the client side to set the values of HTML elements. Example: Set up an endpoint in Express: app.get('/data', (req, res) => { res.json({ value: "Hello, World!" }); }); On the client side, use AJAX to fetch the data and update the HTML element (see the client sketch after this section). Summary Template Engines: allow you to embed server-side data directly into your HTML. AJAX Requests: fetch data from your server and manipulate the DOM on the client side using JavaScript. Both methods are commonly used in Express.js applications to set the values of HTML elements dynamically.
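Minimal sketches of the two client files referenced above (my reconstruction; file names and element ids are illustrative). First, views/index.ejs for the template-engine variant:
<!DOCTYPE html>
<html>
<body>
  <!-- the server-side value is interpolated into the input -->
  <input type="text" value="<%= value %>">
</body>
</html>
And a plain HTML snippet using fetch for the AJAX variant:
<input type="text" id="valueField">
<script>
  // fetch JSON from the /data endpoint and set the input's value
  fetch('/data')
    .then(res => res.json())
    .then(json => { document.getElementById('valueField').value = json.value; });
</script>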

    Node.js write data in JSON

https://blog.logrocket.com/reading-writing-json-files-node-js-complete-tutorial/ https://stackoverflow.com/questions/36856232/write-add-data-in-json-file-using-node-js Create a JavaScript object with an array in it var obj = { table: [] }; Add some data to it, for example: obj.table.push({id: 1, square:2}); Convert it from an object to a string with JSON.stringify var json = JSON.stringify(obj); Use fs to write the file to disk var fs = require('fs'); fs.writeFile('myjsonfile.json', json, 'utf8', callback); If you want to append to it, read the JSON file and convert it back to an object fs.readFile('myjsonfile.json', 'utf8', function readFileCallback(err, data){ if (err){ console.log(err); } else { obj = JSON.parse(data); // now it's an object obj.table.push({id: 2, square:3}); // add some data json = JSON.stringify(obj); // convert it back to JSON fs.writeFile('myjsonfile.json', json, 'utf8', callback); // write it back }}); This works effectively for data up to about 100 MB; beyond that limit, you should use a database engine. UPDATE: Create a function which returns the current date (year+month+day) as a string, and create the file named this string + .json. The fs module has a function which can check for file existence, fs.stat(path, callback). With this, you can check whether the file exists: if it exists, use the read function; if not, use the create function. Use the date string as the path, because the file is named after today's date + .json. The callback receives an error (e.g. ENOENT) if the file does not exist, and a stats object otherwise (see the sketch after this section). Another simple method: var fs = require('fs'); var data = {} data.table = [] for (i=0; i <26 ; i++){ var obj = { id: i, square: i * i } data.table.push(obj) } fs.writeFile ("input.json", JSON.stringify(data), function(err) { if (err) throw err; console.log('complete'); } ); Note: the err callback must be attached fs.writeFile("myjsonfile.json", json, (err) => { if (err) console.log(err); else { console.log("File written successfully\n"); console.log("The written file has the following contents:"); console.log(fs.readFileSync("myjsonfile.json", "utf8")); // include 'utf8' after the filename, otherwise it just returns a buffer } });
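A sketch of the date-named-file idea described in the UPDATE above (the file-name format and the appended record are assumptions):
var fs = require('fs');
function todayFileName() {
  var d = new Date();
  // year + month + day, e.g. 2024-5-17.json
  return d.getFullYear() + '-' + (d.getMonth() + 1) + '-' + d.getDate() + '.json';
}
var filePath = todayFileName();
fs.stat(filePath, function (err, stats) {
  if (err) {
    // err is set (ENOENT) when the file does not exist: create it
    fs.writeFile(filePath, JSON.stringify({ table: [] }), 'utf8', function (err) { if (err) console.log(err); });
  } else {
    // file exists: read, append, write back
    fs.readFile(filePath, 'utf8', function (err, data) {
      if (err) { console.log(err); return; }
      var obj = JSON.parse(data);
      obj.table.push({ id: obj.table.length + 1, square: 9 });
      fs.writeFile(filePath, JSON.stringify(obj), 'utf8', function (err) { if (err) console.log(err); });
    });
  }
});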

    Using a Promise object to read files in Node.js

    The simplest way to read a file in Node.js is to use the fs.readFile() method, passing it the file path, the encoding, and a callback function that will be called with the file data (and the error): const fs = require('node:fs'); fs.readFile('/Users/joe/test.txt', 'utf8', (err, data) => { if (err) { console.error(err); return; } console.log(data); }); Alternatively, you can use the synchronous version fs.readFileSync(): const fs = require('node:fs'); try { const data = fs.readFileSync('/Users/joe/test.txt', 'utf8'); console.log(data); } catch (err) { console.error(err); } You can also use the promise-based fsPromises.readFile() method offered by the fs/promises module: const fs = require('node:fs/promises'); async function example() { try { const data = await fs.readFile('/Users/joe/test.txt', { encoding: 'utf8' }); console.log(data); } catch (err) { console.log(err); } } example(); ================= const fs = require("fs") fs.readFile('example.txt', 'utf8', function(err, data) { if (err) { // Handle error console.error('Error reading file:', err); } else { // Handle successful read console.log('File content:', data); } }); =============== const fs = require('fs'); // Function to read a file asynchronously using Promises function readFileAsync(filePath) { return new Promise((resolve, reject) => { // Read file asynchronously fs.readFile(filePath, 'utf8', (err, data) => { if (err) { // If an error occurs, reject the Promise with the error reject(err); } else { // If successful, resolve the Promise with the file content resolve(data); } }); }); } // Example usage const filePath = 'example.txt'; readFileAsync(filePath) .then(data => { // File content successfully read console.log('File content:', data); }) .catch(error => { // Error occurred while reading file console.error('Error reading file:', error); }); ============ Using Promises with fs.readFile fs.readFileAsync = function (filename) { return new Promise((resolve, reject) => { fs.readFile(filename, (err, data) => { if (err) reject(err); else resolve(data); }); }); }; const IMG_PATH = "foo"; // utility function function getImageByIdAsync(i) { return fs.readFileAsync(IMG_PATH + "/image" + i + ".png"); } Usage with a single image: getImageByIdAsync(0).then(imgBuffer => { console.log(imgBuffer); }).catch(err => { console.error(err); }); Usage with multiple images: var images = [1,2,3,4].map(getImageByIdAsync); Promise.all(images).then(imgBuffers => { // all images have loaded }).catch(err => { console.error(err); }); To promisify a function means to take an asynchronous function with callback semantics and derive from it a new function with promise semantics. It can be done manually, as shown above, or – preferably – automatically. Among others, the Bluebird promise library has a helper for that, see http://bluebirdjs.com/docs/api/promisification.html
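    Node's built-in util.promisify does this derivation automatically, so the manual wrapper above isn't needed in modern Node; a quick sketch:

    const fs = require('fs');
    const { promisify } = require('util');

    // derive a promise-returning version of fs.readFile
    const readFileAsync = promisify(fs.readFile);

    readFileAsync('example.txt', 'utf8')
      .then((data) => console.log('File content:', data))
      .catch((err) => console.error('Error reading file:', err));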

    Promise examples

    new Promise((resolveOuter) => { resolveOuter( new Promise((resolveInner) => { setTimeout(resolveInner, 1000); }), ); }); A useful idiom: use a function with no arguments, e.g. f = () => expression, to create a lazily-evaluated expression, and call f() to evaluate the expression immediately. The promise methods then(), catch(), and finally() are used to associate further action with a promise that becomes settled. The then() method takes up to two arguments; the first argument is a callback function for the fulfilled case of the promise, and the second argument is a callback function for the rejected case. The catch() and finally() methods call then() internally and make error handling less verbose. For example, a catch() is really just a then() without passing the fulfillment handler. As these methods return promises, they can be chained. For example: const myPromise = new Promise((resolve, reject) => { setTimeout(() => { resolve("foo"); }, 300); }); myPromise .then(handleFulfilledA, handleRejectedA) .then(handleFulfilledB, handleRejectedB) .then(handleFulfilledC, handleRejectedC); On the other hand, in the absence of an immediate need, it is simpler to leave out error handling until the final catch() handler. myPromise .then(handleFulfilledA) .then(handleFulfilledB) .then(handleFulfilledC) .catch(handleRejectedAny); Using arrow functions for the callback functions, implementation of the promise chain might look something like this: myPromise .then((value) => `${value} and bar`) .then((value) => `${value} and bar again`) .then((value) => `${value} and again`) .then((value) => `${value} and again`) .then((value) => { console.log(value); }) .catch((err) => { console.error(err); }); A promise can participate in more than one chain. For the following code, the fulfillment of promiseA will cause both handleFulfilled1 and handleFulfilled2 to be added to the job queue. Because handleFulfilled1 is registered first, it will be invoked first. const promiseA = new Promise(myExecutorFunc); const promiseB = promiseA.then(handleFulfilled1, handleRejected1); const promiseC = promiseA.then(handleFulfilled2, handleRejected2); An action can be assigned to an already settled promise. In this case, the action is added immediately to the back of the job queue and will be performed when all existing jobs are completed. Therefore, an action for an already "settled" promise will occur only after the current synchronous code completes and at least one loop-tick has passed. This guarantees that promise actions are asynchronous. const promiseA = new Promise((resolve, reject) => { resolve(777); }); // At this point, "promiseA" is already settled. 
promiseA.then((val) => console.log("asynchronous logging has val:", val)); console.log("immediate logging"); // produces output in this order: // immediate logging // asynchronous logging has val: 777 =============== Some Examples function myDisplayer(some) { document.getElementById("demo").innerHTML = some; } let myPromise = new Promise(function(myResolve, myReject) { let x = 0; // The producing code (this may take some time) if (x == 0) { myResolve("OK"); } else { myReject("Error"); } }); myPromise.then( function(value) {myDisplayer(value);}, function(error) {myDisplayer(error);} ); =============== Waiting for a Timeout Example Using Callback setTimeout(function(){myFunction("I love You !!!")}, 3000); function myFunction(value) { document.getElementById("demo").innerHTML = value; } Example Using Promise const timeout = new Promise( function(resolve, reject) { setTimeout(function() { resolve("I love You !!"); }, 3000); } ); timeout.then(function(value) { document.getElementById("demo").innerHTML = value; }); =============== Waiting for a file Example using Callback function getFile(myCallback) { let req = new XMLHttpRequest(); req.open('GET', "mycar.html"); req.onload = function() { if (req.status == 200) { myCallback(req.responseText); } else { myCallback("Error: " + req.status); } } req.send(); } getFile(myDisplayer); Example using Promise const getFilePromise = new Promise( function(resolve, reject) { let req = new XMLHttpRequest(); req.open('GET', "mycar.html"); req.onload = function() { if (req.status == 200) { resolve(req.response); } else { reject("File not Found"); } }; req.send(); } ); getFilePromise.then( function(value) {myDisplayer(value);}, function(error) {myDisplayer(error);} ); =============== Consider a weather application that fetches weather data from an API. Let’s create a function getWeatherData() that uses Promises to handle the asynchronous fetch operation: function getWeatherData() { return new Promise((resolve, reject) => { const apiKey = 'your-api-key'; const apiUrl = `https://api.weatherapi.com/v1/current.json?key=${apiKey}&q=London`; fetch(apiUrl) .then(response => { if (response.ok) { return response.json(); } else { throw new Error('Unable to fetch weather data.'); } }) .then(data => { resolve(data); }) .catch(error => { reject(error); }); }); } Now, let’s use the getWeatherData() function and chain Promises to handle the weather data retrieval and display: getWeatherData() .then(data => { const weather = data.current; console.log(`Temperature in ${data.location.name}: ${weather.temp_c}°C`); console.log(`Condition: ${weather.condition.text}`); }) .catch(error => { console.error('Error fetching weather data:', error); }); =============== Example: loadScript Next, let’s see more practical examples of how promises can help us write asynchronous code. We’ve got the loadScript function for loading a script from the previous chapter. Here’s the callback-based variant, just to remind us of it: function loadScript(src, callback) { let script = document.createElement('script'); script.src = src; script.onload = () => callback(null, script); script.onerror = () => callback(new Error(`Script load error for ${src}`)); document.head.append(script); } Let’s rewrite it using Promises. The new function loadScript will not require a callback. Instead, it will create and return a Promise object that resolves when the loading is complete. 
The outer code can add handlers (subscribing functions) to it using .then: function loadScript(src) { return new Promise(function(resolve, reject) { let script = document.createElement('script'); script.src = src; script.onload = () => resolve(script); script.onerror = () => reject(new Error(`Script load error for ${src}`)); document.head.append(script); }); } Usage: let promise = loadScript("https://cdnjs.cloudflare.com/ajax/libs/lodash.js/4.17.11/lodash.js"); promise.then( script => alert(`${script.src} is loaded!`), error => alert(`Error: ${error.message}`) ); promise.then(script => alert('Another handler...')); =============== What’s the output of the code below? let promise = new Promise(function(resolve, reject) { resolve(1); setTimeout(() => resolve(2), 1000); }); promise.then(alert); The output is: 1. The second call to resolve is ignored, because only the first call to resolve/reject is taken into account; further calls are ignored. =============== Delay with a promise The built-in function setTimeout uses callbacks. Create a promise-based alternative. The function delay(ms) should return a promise. That promise should resolve after ms milliseconds, so that we can add .then to it, like this: function delay(ms) { // your code } delay(3000).then(() => alert('runs after 3 seconds')); solution function delay(ms) { return new Promise(resolve => setTimeout(resolve, ms)); } delay(3000).then(() => alert('runs after 3 seconds')); Please note that in this task resolve is called without arguments. We don’t return any value from delay, just ensure the delay. =============== Animated circle with promise Rewrite the showCircle function in the solution of the task Animated circle with callback so that it returns a promise instead of accepting a callback. The new usage: showCircle(150, 150, 100).then(div => { div.classList.add('message-ball'); div.append("Hello, world!"); }); Take the solution of the task Animated circle with callback as the base. Output at: https://plnkr.co/edit/gSQLOOs3AK3jbcCBKuna?p=preview
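    The linked solution is not reproduced in these notes; a minimal sketch of what the promise-based showCircle might look like, assuming the callback version creates an absolutely-positioned div (a CSS class "circle" with a width/height transition) and grows it:

    function showCircle(cx, cy, radius) {
      return new Promise((resolve) => {
        const div = document.createElement('div');
        div.style.width = '0';
        div.style.height = '0';
        div.style.left = cx + 'px';
        div.style.top = cy + 'px';
        div.className = 'circle';
        document.body.append(div);

        // let the browser render the 0-size circle first, then grow it
        setTimeout(() => {
          div.style.width = radius * 2 + 'px';
          div.style.height = radius * 2 + 'px';
          // resolve with the div once the CSS transition finishes
          div.addEventListener('transitionend', function handler() {
            div.removeEventListener('transitionend', handler);
            resolve(div);
          });
        }, 0);
      });
    }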

    fs.readFile

    Asynchronous: fs.readFile(filename, [encoding], [callback]) fs.readFile(filename, function (err, data) { if (err) throw err; console.log(data); }); The callback is passed two arguments (err, data), where data is the contents of the file. If no encoding is specified, the raw buffer is returned. Synchronous: fs.readFileSync(filename, [encoding]) If encoding is specified, this function returns a string; otherwise it returns a buffer. var text = fs.readFileSync(filename, 'utf8'); console.log(text);

    ReferenceError: audio is not defined

    If you are running this code in Node.js, you won't have access to the HTML5 Audio API: it is a client-side feature available only in web browsers, so a Node.js server has no built-in audio playback. If you want to work with audio files in a Node.js environment, you can use external libraries such as "node-wav-player" or "node-speaker" for audio playback, or "node-lame" for audio encoding and decoding. These libraries are designed for server-side audio processing.
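    For instance, a minimal sketch with node-wav-player (the file path is a placeholder; play() returns a promise per the library's README):

    // assumes: npm install node-wav-player
    const player = require('node-wav-player');

    player.play({ path: './audio/speech.wav' }) // hypothetical path
      .then(() => {
        console.log('The wav file started playing.');
      })
      .catch((err) => {
        console.error('Playback failed:', err);
      });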

    Ollama-OCR



    Llama 3.2-Vision is a multimodal large language model that comes in two sizes, 11B and 90B; it processes text and image inputs and generates text output. The model excels at visual recognition, image reasoning, image captioning, and answering questions about images, and it outperforms existing open-source and closed-source multimodal models on several industry benchmarks. Ollama-OCR uses a locally running Llama 3.2-Vision
    model by default and can accurately recognize text in images while preserving the original formatting. https://github.com/bytefer/ollama-ocr

    Llama 3.2-Vision use cases


     Handwriting recognition


     OCR recognition


     Image Q&A


     Install Ollama


    Before getting started with Llama 3.2-Vision, you need to install Ollama[2], a platform that supports running multimodal models locally. (The original installation steps are not included in these notes; installers are available from https://ollama.com.)

     Install Llama 3.2-Vision 11B


    After installing Ollama, you can install Llama 3.2-Vision 11B with the following command (omitted in the original notes; see below):
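    With current Ollama releases this is presumably:

    ollama pull llama3.2-vision

    (The llama3.2-vision tag defaults to the 11B variant; the 90B variant is llama3.2-vision:90b.)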

     Install Ollama-OCR


    npm install ollama-ocr
    # or using pnpm
    pnpm add ollama-ocr

    Using Ollama-OCR


     OCR


    import { ollamaOCR, DEFAULT_OCR_SYSTEM_PROMPT } from "ollama-ocr"; async function runOCR() { const text = await ollamaOCR({ filePath: "./handwriting.jpg", systemPrompt: DEFAULT_OCR_SYSTEM_PROMPT, }); console.log(text); } The test image (a handwriting sample, not reproduced in these notes) produced the following output: The Llama 3.2-Vision collection of multimodal large language models (LLMs) is a collection of instruction-tuned image reasoning generative models in 118 and 908 sizes (text + images in / text out). The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. The models outperform many of the available open source and closed multimodal models on common industry benchmarks. (Note the "118 and 908": the model misread "11B and 90B" in the handwriting.)

     Markdown output


    import { ollamaOCR, DEFAULT_MARKDOWN_SYSTEM_PROMPT } from "ollama-ocr"; async function runOCR() { const text = await ollamaOCR({ filePath: "./trader-joes-receipt.jpg", systemPrompt: DEFAULT_MARKDOWN_SYSTEM_PROMPT, }); console.log(text); } The test image (a Trader Joe's receipt) and its Markdown output are not reproduced in these notes. ollama-ocr uses a locally running vision model; if you want to use a hosted Llama 3.2-Vision model instead, try the llama-ocr[4] library.

    OCR powerhouse: Tesseract.js

    1. Project: naptha/tesseract.js. Tesseract.js is a JavaScript version of Google's open-source Tesseract OCR engine, compiled to WebAssembly so that it can run in both browsers and Node.js. Features: its main job is to extract text from images and turn it into editable text; it can recognize more than 100 languages, covering nearly all major world languages, including English, Chinese, Japanese, French, and German. 3.2 Technical highlights. High performance: thanks to WebAssembly and efficient memory management, it greatly reduces run time and memory usage when processing the first image, giving it good runtime efficiency. 3.3 Use cases. Image search: after a user uploads an image, an application can automatically extract keywords from it, enabling content search based on images and improving search accuracy and efficiency. Document conversion: scanned paper documents can be turned into editable electronic text, making them easy to edit, store, and retrieve, which matters for digital offices and document management. Real-time video captioning: combined with computer vision, it can recognize text in live video streams and display the results as captions on the video, making video content easier to understand and share. Digitizing old books: for historical documents with large amounts of handwriting, Tesseract.js can help convert them quickly into digital text, supporting the preservation, study, and dissemination of ancient texts. 3.4 Strengths and weaknesses. Like any JavaScript library, it has both strengths and weaknesses, which developers must weigh for their specific scenario: (1) Strengths. Good cross-platform compatibility: runs on many operating systems, such as Windows, Linux, and macOS. Fairly high recognition accuracy: it inherits the strengths of the Tesseract OCR engine and, with suitable training and tuning, can accurately recognize many languages and fonts. Strong extensibility: it supports custom training data, so users can add their own fonts, languages, and training datasets to further improve recognition accuracy. (2) Weaknesses. Recognition of complex images can be limited: for images with busy backgrounds, unusual fonts, or poor quality, accuracy may suffer, and some preprocessing or tuning is needed. Training and tuning take technical skill and time: getting higher accuracy requires effort spent collecting training data and optimizing the model. 4. How to use. You need Node.js installed before using this project; if you don't have it, install it as follows. 4.1 Install Node.js. Go to the official Node.js website, click download, and install it; be careful to use the official site so you don't get scammed. 4.2 Clone the project. First clone the project into a local directory: git clone https://github.com/naptha/tesseract.js.git Then run the following command to enter the project root: cd tesseract.js 4.3 Install dependencies. In the tesseract.js directory, run the following command to install the dependencies: npm install 4.4 Start the service. Run the following command to start the service: npm run start 4.5 Access the service. In a browser, open http://localhost:3000/examples/browser/basic-efficient.html to bring up the demo. Because this library currently provides only an API, user-facing use requires some development work to integrate it into your own program. You can, however, preview the recognition results in the browser's developer tools first: open the developer tools (usually F12), switch to the console, then upload an image; the recognition result appears in the console window. 4.6 Recognizing Chinese. The demo supports only English by default, so the code needs a small change to support Chinese recognition (the diff was shown as an image in the original notes and is not reproduced here; a sketch of the idea follows below). After making the change, run npm run start again and upload an image containing Chinese text.
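    A minimal sketch of Chinese recognition with Tesseract.js in Node.js (the image path is a placeholder; this uses the library's recognize shortcut with the chi_sim language pack, rather than the demo page's exact diff):

    // assumes: npm install tesseract.js
    const Tesseract = require('tesseract.js');

    // 'chi_sim' selects the Simplified Chinese language data;
    // Tesseract.js downloads it automatically on first use
    Tesseract.recognize('./chinese-sample.png', 'chi_sim', {
      logger: (m) => console.log(m.status, m.progress), // progress reporting
    })
      .then(({ data: { text } }) => {
        console.log('Recognized text:', text);
      })
      .catch((err) => {
        console.error('OCR failed:', err);
      });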