Written by Harriet Ryder
Setup
If you want to create some zipped files to practice on, follow the instructions below to create a new Node project and the practice files:
$ mkdir zipping-practice
$ cd zipping-practice
$ touch index.js
$ mkdir data
$ echo 'whatever text you want' > data/file1.txt (this will be one of your practice files… make however many you want)
$ gzip -r data/*.txt (this zips all the files ending in .txt)
You will now see that your data directory is full of files ending in .gz, which is a compressed format. This format is commonly used when compressing data to be sent via HTTP. You can read more about it here but it’s quite boring 💤
The code
Open up index.js in your editor.
We’re going to use a module that comes with Node called Zlib which has a bunch of methods for compressing and uncompressing things. We’ll also use the filesystem module to allow us to read and write data from the filesystem (because we need to read the zipped files and write new, unzipped files).
First of all, let’s just unzip one file before working out how to do it for ALL the files:
const fs = require('fs');
const zlib = require('zlib');

const fileContents = fs.createReadStream('./data/file1.txt.gz');
console.log(fileContents);
We bring in the two modules we’ll need and then open up our first file with the createReadStream method. If you log fileContents now you will see something like this:
ReadStream {
  _readableState: ReadableState {
    objectMode: false,
    highWaterMark: 65536,
    buffer: BufferList { head: null, tail: null, length: 0 },
    length: 0,
    pipes: null,
    pipesCount: 0,
    flowing: null,
    ended: false,
    ...etc
That doesn’t look like the contents of your file though! What is it? Is that what zipped data looks like?
Nope, it’s a “Readable stream”, which is an object (or interface) allowing you to read a stream of binary data. What does that mean? It means that this object will give you chunks of the data (i.e. the contents of the file) bit by bit, so you can process the file bit by bit and not have to hold the entire file in memory. This is great for big files, but unless you piped loads of text into the file in the steps above, we aren’t going to need our file delivered to us in chunks of binary data.
Too bad though, because createReadStream gives it to us in chunks (well, one chunk) and there’s nothing we can do about it. 😖 And as we’ll see in a minute, our unzipping method, createGunzip, is built around streams anyway.
BTW this is a pretty great article on streams if you want to know more 🙌
const fs = require('fs');
const zlib = require('zlib');

const fileContents = fs.createReadStream('./data/file1.txt.gz');
const writeStream = fs.createWriteStream('./data/file1.txt');
const unzip = zlib.createGunzip();

fileContents.pipe(unzip).pipe(writeStream);
Next up we create two more streams: a writeStream (which lets us pipe the unzipped data piece by piece into a new file) and a gunzip stream (which does the actual unzipping once we pipe data through it).
So we pipe our file contents like so:
original file → unzip stream → new file
If you open file1.txt you should see it contains the same text you put in it earlier.
All the unzipping for all the files
const fs = require('fs');
const zlib = require('zlib');

const directoryFiles = fs.readdirSync('./data');

directoryFiles.forEach(filename => {
  const fileContents = fs.createReadStream(`./data/${filename}`);
  const writeStream = fs.createWriteStream(`./data/${filename.slice(0, -3)}`);
  const unzip = zlib.createGunzip();
  fileContents.pipe(unzip).pipe(writeStream);
});
We can do the same as we did above, but for each file in our ./data directory. NB it might be an idea to write your unzipped files to a fresh directory to keep them separate.

Note how we slice off the final .gz of the filename when we create the name of the new file: file1.txt.gz becomes file1.txt.
This is fine, but if you want to work programmatically with your unzipped files afterwards, you need to know when the process of unzipping has finished. Since writing to the filesystem with our writeStream is asynchronous, we’ll need to listen for an event that tells us when it’s finished, and we’ll need a way of knowing when all the files have been unzipped.
const fs = require('fs');
const zlib = require('zlib');

const directoryFiles = fs.readdirSync('./data');

Promise.all(directoryFiles.map(filename => {
  return new Promise((resolve, reject) => {
    const fileContents = fs.createReadStream(`./data/${filename}`);
    const writeStream = fs.createWriteStream(`./data/${filename.slice(0, -3)}`);
    const unzip = zlib.createGunzip();
    fileContents.pipe(unzip).pipe(writeStream)
      .on('finish', resolve)  // 'finish' receives no arguments
      .on('error', reject);   // errors arrive via the 'error' event
  });
}))
.then(() => console.log('done'));
By mapping over the filenames and creating a promise for each one, we can safely know when all of our files have been unzipped. We resolve each promise when we receive the ‘finish’ event from the writeStream, telling us it’s finished writing to the new file. Then you can continue to do whatever you want in the next .then block 🙂
Zipping it all back up again
Okay, you changed your mind, you want to zip everything back up again. Luckily, you only need to change a few characters around!
const fs = require('fs');
const zlib = require('zlib');

const directoryFiles = fs.readdirSync('./data');

Promise.all(directoryFiles.map(filename => {
  return new Promise((resolve, reject) => {
    const fileContents = fs.createReadStream(`./data/${filename}`);
    const writeStream = fs.createWriteStream(`./data/${filename}.gz`);
    const zip = zlib.createGzip();
    fileContents.pipe(zip).pipe(writeStream)
      .on('finish', resolve)
      .on('error', reject);
  });
}))
.then(() => console.log('done'));
So there you have it — zipping and unzipping with NodeJS and JavaScript.
Thanks for reading! Hope you learned something and don’t forget to follow me for regular programming posts 👋