Software development and beyond

Finding duplicate JavaScript code

Duplicate code is a sequence of source code that occurs more than once in a codebase. It is usually considered bad practice: a fix applied to one copy is easily missed in the others, which introduces bugs and makes the code harder to maintain. Duplicate code is not always textually identical. It comes in many forms, and more often than not we see the same expressions or structure with different values or variable names. There can also be code that looks completely different but is functionally identical. Sequences of duplicate code are also referred to as code clones, and the process of finding them as clone detection.
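To make these forms concrete, here is a small sketch. The functions and the order data are made up for illustration: the first two functions share the same structure with different names and values, the third looks different but computes the same result, and the last one shows one possible way to remove the duplication.

```javascript
// Clone with renamed variables and a different constant (amounts in cents).
function bookOrderTotal(books) {
  let total = 0;
  for (const book of books) {
    total += book.price * book.quantity;
  }
  return total + 300;
}

function gameOrderTotal(games) {
  let sum = 0;
  for (const game of games) {
    sum += game.price * game.quantity;
  }
  return sum + 500;
}

// Looks different, but is functionally identical to gameOrderTotal.
function gameOrderTotalReduce(games) {
  return games.reduce((acc, g) => acc + g.price * g.quantity, 0) + 500;
}

// One possible refactoring: the varying value becomes a parameter.
function orderTotal(items, shipping) {
  let total = 0;
  for (const item of items) {
    total += item.price * item.quantity;
  }
  return total + shipping;
}

const cart = [{ price: 1000, quantity: 2 }, { price: 250, quantity: 1 }];
console.log(gameOrderTotal(cart), gameOrderTotalReduce(cart), orderTotal(cart, 500));
// prints 2750 2750 2750
```

A purely textual search would miss all of these clones, which is why the tools below look at structure or tokens instead.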

There are many ways to find duplicate code. If the codebase is relatively small and you know it well, it might be feasible to identify the problematic code manually. In all other cases it is better to use an automated tool.

In this blog post I want to talk about two tools for finding duplicate JavaScript code. The first is jsinspect, a dedicated tool for finding copy-pasted and structurally similar code. The second is PMD, a general-purpose source code analyzer that includes a Copy Paste Detector (CPD). PMD supports many programming languages, JavaScript among them.


jsinspect

jsinspect uses an abstract syntax tree (AST) to represent the structure of the source code. The code is represented as a tree in which each node denotes a construct such as a block statement or a variable declaration. Because of this, jsinspect should be good at finding structurally similar code.
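To illustrate why a tree representation helps, here is a minimal sketch that compares two hand-written, ESTree-style trees by their shape. It is not how jsinspect is implemented. The helpers declaration and fingerprint are hypothetical names I made up, and the matchIdentifiers flag only mimics the idea of matching identifiers.

```javascript
// Builds an ESTree-style tree for statements like `var total = price * 2;`.
function declaration(name, operand, value) {
  return {
    type: "VariableDeclaration",
    declarations: [{
      type: "VariableDeclarator",
      id: { type: "Identifier", name: name },
      init: {
        type: "BinaryExpression",
        operator: "*",
        left: { type: "Identifier", name: operand },
        right: { type: "Literal", value: value }
      }
    }]
  };
}

// Describes a tree by its node types only. Operators and literal values
// are ignored; identifier names are included only when requested.
function fingerprint(node, matchIdentifiers) {
  if (Array.isArray(node)) {
    return node.map(n => fingerprint(n, matchIdentifiers)).join(",");
  }
  let label = node.type;
  if (matchIdentifiers && node.type === "Identifier") {
    label += ":" + node.name;
  }
  const children = Object.keys(node)
    .filter(key => node[key] !== null && typeof node[key] === "object")
    .map(key => fingerprint(node[key], matchIdentifiers));
  return children.length ? label + "(" + children.join(",") + ")" : label;
}

const first = declaration("total", "price", 2);  // var total = price * 2;
const second = declaration("sum", "cost", 3);    // var sum = cost * 3;

console.log(fingerprint(first, false) === fingerprint(second, false)); // prints true
console.log(fingerprint(first, true) === fingerprint(second, true));   // prints false
```

The two statements differ in every name and value, yet their trees have the same shape, so a structural comparison treats them as clones unless identifier names are taken into account.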

If you already have Node.js, you can install jsinspect very quickly, because it is distributed as an npm package.

npm install -g jsinspect

There are also dedicated packages, grunt-jsinspect and gulp-jsinspect, for integrating it with popular JavaScript task runners.

To run jsinspect, just type:

jsinspect ./path/to/src

More configuration can be specified through command-line arguments or a dedicated configuration file. The most important options are -i to match identifiers and -t <number> to set the threshold, the smallest subset of nodes to analyze. jsinspect has sensible defaults: it goes only through .js files and ignores things like require statements and node_modules directories.
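As a sketch of the configuration file route: as far as I know, jsinspect reads a JSON file named .jsinspectrc whose keys mirror the long option names. Treat the file name, the keys and the values below as assumptions and check them against the README of the version you have installed.

```json
{
  "threshold": 30,
  "identifiers": true
}
```

Under that assumption, this would be equivalent to passing -t 30 -i on the command line.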

jsinspect is designed to run as part of a build process and offers two types of reporters for generating reports.

PMD Copy Paste Detector

PMD is a source code analyzer. The part we are interested in is CPD (Copy Paste Detector), which is token-based and built on the Rabin–Karp string-matching algorithm. Because CPD works on tokens, it takes a rather different approach to clone detection than jsinspect. It can be very interesting to compare their outputs, and I think they complement each other well.
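To give an idea of what token-based detection with a Rabin–Karp style rolling hash looks like, here is a heavily simplified sketch. It is not CPD's actual code: the tokenizer is a crude regular expression, and tokenize, tokenCode, findDuplicateWindows, BASE and MOD are all names and values I picked for illustration. The windowSize parameter plays a role similar to a minimum token count.

```javascript
const BASE = 257;
const MOD = 1000003;

// Crude tokenizer: identifiers/keywords, digit runs, single punctuation characters.
function tokenize(source) {
  return source.match(/[A-Za-z_$][\w$]*|\d+|[^\s\w]/g) || [];
}

// Maps a token string to a small integer.
function tokenCode(token) {
  let h = 0;
  for (const ch of token) h = (h * 31 + ch.codePointAt(0)) % MOD;
  return h;
}

function sameTokens(tokens, a, b, length) {
  for (let i = 0; i < length; i++) {
    if (tokens[a + i] !== tokens[b + i]) return false;
  }
  return true;
}

// Returns [earlierStart, laterStart] pairs of identical, non-overlapping token windows.
function findDuplicateWindows(tokens, windowSize) {
  const codes = tokens.map(tokenCode);
  const matches = [];
  if (codes.length < windowSize) return matches;

  // BASE^(windowSize - 1) mod MOD, needed to remove the outgoing token.
  let highPower = 1;
  for (let i = 1; i < windowSize; i++) highPower = (highPower * BASE) % MOD;

  // Hash of the first window.
  let hash = 0;
  for (let i = 0; i < windowSize; i++) hash = (hash * BASE + codes[i]) % MOD;

  const seen = new Map(); // hash -> start indices of windows with that hash
  for (let start = 0; ; start++) {
    const bucket = seen.get(hash) || [];
    for (const earlier of bucket) {
      // Equal hashes can collide, so verify the tokens themselves.
      if (earlier + windowSize <= start && sameTokens(tokens, earlier, start, windowSize)) {
        matches.push([earlier, start]);
      }
    }
    bucket.push(start);
    seen.set(hash, bucket);

    const next = start + windowSize;
    if (next >= codes.length) break;
    // Roll the window: drop codes[start], append codes[next].
    hash = (hash - (codes[start] * highPower) % MOD + MOD) % MOD;
    hash = (hash * BASE + codes[next]) % MOD;
  }
  return matches;
}

const source = "var a = 1 + 2; var b = 3; var a = 1 + 2;";
console.log(findDuplicateWindows(tokenize(source), 7)); // prints [ [ 0, 12 ] ]
```

Because the comparison works on a flat token stream rather than a tree, renamed variables break the match here, which is one reason the two approaches can report different clones.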

PMD is distributed as a zip archive. Once it is downloaded and extracted, navigate to the folder and run it with:

./ cpd --minimum-tokens 15 --files ./path/to/src --language ecmascript

Again, we can tweak the output by changing the --minimum-tokens parameter, the minimum length in tokens that a duplicate must have to be reported.

Happy hunting!

Last updated on 5.5.2015.

development-tools javascript