Finding duplicate JavaScript code

Duplicate code is a sequence of source code that occurs more than once. Duplication is usually considered bad practice: a bug fixed in one copy is easily missed in the others, and every change has to be repeated in several places, which makes the code harder to maintain.

It is worth mentioning that duplicate code is not always identical code. Duplicates come in many forms, and more often than not we see the same expressions or structure with different values or variable names.
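To illustrate, here is a small made-up example. The function names, properties, and factors are hypothetical and exist only to show two functions that share a structure while differing in names and values, followed by one possible way to merge them.

```javascript
// Two hypothetical functions: same structure, different names and values.
function totalPrice(items) {
  let total = 0;
  for (const item of items) {
    total += item.price * 1.2;
  }
  return total;
}

function totalWeight(parcels) {
  let sum = 0;
  for (const parcel of parcels) {
    sum += parcel.weight * 0.5;
  }
  return sum;
}

// One possible refactoring: turn the differing parts into parameters.
function weightedSum(list, key, factor) {
  let result = 0;
  for (const entry of list) {
    result += entry[key] * factor;
  }
  return result;
}
```

A text search would not flag totalPrice and totalWeight as duplicates, because no line is literally repeated. This is the kind of clone that structure-aware tools are meant to catch.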

There can also be duplicate code that looks completely different but is functionally identical. Sequences of duplicate code are also referred to as code clones, and the process of finding them as clone detection.
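As a small illustration of functionally identical code that looks different, consider the two hypothetical functions below. Both add up an array of numbers, yet they share almost no text or structure, which is why this kind of clone is much harder for tools to detect.

```javascript
// Hypothetical example: imperative version.
function sumWithLoop(numbers) {
  let total = 0;
  for (let i = 0; i < numbers.length; i++) {
    total += numbers[i];
  }
  return total;
}

// Hypothetical example: functional version with the same behavior.
function sumWithReduce(numbers) {
  return numbers.reduce((total, value) => total + value, 0);
}
```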

Finding duplicates

There are many ways to find duplicate code. If the codebase is relatively small, it might be feasible to identify the problematic code manually. In all other cases, it is better to use an automated tool.

In this post, I want to mention two tools for finding duplicate JavaScript code. The first one is jsinspect, a dedicated tool for finding copy-pasted and structurally similar code. The second one is PMD, a general-purpose source code analyzer that includes a Copy Paste Detector (CPD). PMD supports many programming languages, and JavaScript is among them.

jsinspect

jsinspect parses the source code into an abstract syntax tree (AST). The code is represented as a tree where each node denotes a construct such as a block statement, a variable declaration, or another language construct. Because it compares trees rather than text, jsinspect should be good at finding structurally similar code.
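The idea of comparing structure rather than text can be sketched in a few lines. This is not how jsinspect is implemented. It is a minimal sketch that assumes hand-written, ESTree-like nodes instead of a real parser, and the helper names (identifier, literal, multiply, declare, shape) are made up for this example.

```javascript
// Small builders for hand-written, ESTree-like nodes. A real tool would get
// these from a parser; building them by hand keeps the sketch dependency-free.
const identifier = (name) => ({ type: 'Identifier', name });
const literal = (value) => ({ type: 'Literal', value });
const multiply = (left, right) => ({ type: 'BinaryExpression', operator: '*', left, right });
const declare = (name, init) => ({
  type: 'VariableDeclaration',
  kind: 'const',
  declarations: [{ type: 'VariableDeclarator', id: identifier(name), init }],
});

// Serialize only the node types, ignoring identifier names and literal values.
function shape(node) {
  if (Array.isArray(node)) {
    return node.map(shape).filter(Boolean).join(',');
  }
  if (node === null || typeof node !== 'object' || typeof node.type !== 'string') {
    return ''; // primitives such as names, operators, and values are skipped
  }
  const children = Object.keys(node)
    .filter((key) => key !== 'type')
    .map((key) => shape(node[key]))
    .filter(Boolean);
  return children.length > 0 ? `${node.type}(${children.join(',')})` : node.type;
}

const price = declare('total', multiply(identifier('price'), literal(1.2))); // const total = price * 1.2;
const weight = declare('sum', multiply(identifier('weight'), literal(0.5))); // const sum = weight * 0.5;
const constant = declare('limit', literal(10));                              // const limit = 10;

console.log(shape(price) === shape(weight));   // true: same structure
console.log(shape(price) === shape(constant)); // false: different structure
```

Because the sketch skips operators as well as names and values, it would also treat a * b and a + b as the same shape. A real tool has to decide how much of that detail to keep.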

We can install jsinspect globally on the system as an npm package:

npm install -g jsinspect

Run jsinspect with:

jsinspect ./path/to/src

Configuration can be specified with command-line arguments or provided in a dedicated configuration file. The essential options are -i to match identifiers and -t <number> to specify the smallest subset of nodes to analyze. jsinspect has sensible defaults: it only goes through .js files and ignores things like require statements and node_modules directories.

Two different reporters are available to generate reports, which makes it easy to integrate jsinspect into a build process.

PMD Copy Paste Detector

PMD is a static source code analyzer. The part that we are interested in is called CPD (Copy Paste Detector), which is token-based and uses the Rabin–Karp string-matching algorithm. Because CPD compares token sequences rather than syntax trees, it takes a rather different approach to clone detection than jsinspect. It can be interesting to compare their outputs and use both tools when de-duplicating code in an application.
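To make the token-based approach more concrete, here is a minimal sketch of Rabin–Karp-style matching over tokens. It is not CPD's implementation. The tokenizer is deliberately naive, and the function names, hash base, and modulus are assumptions chosen for this example.

```javascript
// Naive tokenizer: identifiers/keywords, numbers, and single punctuation characters.
function tokenize(source) {
  return source.match(/[A-Za-z_$][\w$]*|\d+(?:\.\d+)?|[^\s\w]/g) || [];
}

// Compare two windows token by token to rule out hash collisions.
function sameWindow(tokens, a, b, length) {
  for (let i = 0; i < length; i++) {
    if (tokens[a + i] !== tokens[b + i]) return false;
  }
  return true;
}

// Return pairs [firstStart, laterStart] of non-overlapping, identical token windows.
function findDuplicateWindows(tokens, minimumTokens) {
  const BASE = 257;
  const MOD = 1000003;
  const duplicates = [];
  if (minimumTokens < 1 || tokens.length < minimumTokens) return duplicates;

  // Map every distinct token to a small integer code.
  const ids = new Map();
  const codes = tokens.map((token) => {
    if (!ids.has(token)) ids.set(token, ids.size + 1);
    return ids.get(token);
  });

  // BASE^(minimumTokens - 1) mod MOD, used to drop the oldest token from the hash.
  let highPower = 1;
  for (let i = 1; i < minimumTokens; i++) highPower = (highPower * BASE) % MOD;

  // Hash of the first window.
  let hash = 0;
  for (let i = 0; i < minimumTokens; i++) hash = (hash * BASE + codes[i]) % MOD;

  const seen = new Map(); // hash -> start indexes of earlier windows
  for (let start = 0; ; start++) {
    const earlier = seen.get(hash) || [];
    for (const previous of earlier) {
      if (previous + minimumTokens <= start && sameWindow(tokens, previous, start, minimumTokens)) {
        duplicates.push([previous, start]);
        break;
      }
    }
    seen.set(hash, [...earlier, start]);

    const next = start + minimumTokens;
    if (next >= codes.length) break;
    // Roll the hash: remove the oldest token, append the next one.
    hash = (hash - ((codes[start] * highPower) % MOD) + MOD) % MOD;
    hash = (hash * BASE + codes[next]) % MOD;
  }
  return duplicates;
}

const tokens = tokenize('a = b + 1; c = d * 2; a = b + 1;');
console.log(findDuplicateWindows(tokens, 6)); // [ [ 0, 12 ] ]
```

The rolling hash lets each window be checked in constant time instead of re-reading all of its tokens. Since this sketch compares tokens literally, renamed variables break a match, whereas the AST-based approach can ignore them.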

PMD is distributed as a zip archive. After downloading and extracting it, navigate to the folder that contains run.sh and run CPD with:

./run.sh cpd --minimum-tokens 15 --files ./path/to/src --language ecmascript

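We can tune the results by changing the --minimum-tokens parameter, which sets the minimum number of consecutive matching tokens for a sequence to be reported as a duplicate. A lower value reports more, shorter duplicates, and a higher value reports fewer, longer ones.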

Last updated on 8.3.2022.