cloc

Count Lines of Code

cloc counts blank lines, comment lines, and physical lines of source code in many programming languages.

Latest release: v2.02 (Aug. 2, 2024)

cloc moved to GitHub in September 2015 after being hosted at http://cloc.sourceforge.net/ since August 2006.

Quick Start
Overview
Download
License
Why Use cloc?
Other Counters
Building a Windows Executable
Basic Use
Options
Recognized Languages
How It Works
Advanced Use
Complex regular subexpression recursion limit
Limitations
Requesting Support for Additional Languages
Reporting Problems
Citation
Acknowledgments
Copyright

Quick Start ▲

Step 1: Download cloc (several methods, see below) or run cloc's docker image. The Windows executable has no requirements. The source version of cloc requires a Perl interpreter, and the Docker version of cloc requires a Docker installation.

Step 2: Open a terminal (cmd.exe on Windows).

Step 3: Invoke cloc to count your source files, directories, archives, or git commits. The executable name differs depending on whether you use the development source version (cloc), source for a released version (cloc-2.02.pl) or a Windows executable (cloc-2.02.exe).

On this page, cloc is the generic term used to refer to any of these.

Include Security has a YouTube video showing the steps in action.

a file

a directory

an archive

We'll pull cloc's source zip file from GitHub, then count the contents:

a git repository, using a specific commit

This example uses code from PuDB, a fantastic Python debugger.

each subdirectory of a particular directory

Say you have a directory with three different git-managed projects, Project0, Project1, and Project2. You can use your shell's looping capability to count the code in each. This example uses bash (scroll down for cmd.exe example):
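A minimal sketch of such a loop, written as a POSIX shell function. The name count_each_project is a hypothetical helper introduced here for illustration; --vcs=git asks cloc to count only the files that git tracks in each project.

```shell
# Hypothetical helper: run cloc inside each immediate subdirectory of a parent directory.
count_each_project() {
    parent=${1:-.}
    for d in "$parent"/*/; do
        [ -d "$d" ] || continue    # skip the literal pattern when nothing matches
        # The subshell keeps the caller's working directory unchanged.
        ( cd "$d" && echo "$d" && cloc --vcs=git )
    done
}
```

Calling count_each_project . from the directory that holds Project0, Project1, and Project2 prints each directory name followed by its cloc report.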

each subdirectory of a particular directory (Windows/cmd.exe)

Overview ▲

cloc counts blank lines, comment lines, and physical lines of source code in many programming languages. Given two versions of a code base, cloc can compute differences in blank, comment, and source lines. It is written entirely in Perl with no dependencies outside the standard distribution of Perl v5.6 and higher (code from some external modules is embedded within cloc) and so is quite portable. cloc is known to run on many flavors of Linux, FreeBSD, NetBSD, OpenBSD, macOS, AIX, HP-UX, Solaris, IRIX, z/OS, and Windows. (To run the Perl source version of cloc on Windows one needs ActiveState Perl 5.6.1 or higher, Strawberry Perl, Windows Subsystem for Linux, Cygwin, MobaXTerm with the Perl plug-in installed, or a mingw environment and terminal such as provided by Git for Windows. Alternatively one can use the Windows binary of cloc generated with PAR::Packer to run on Windows computers that have neither Perl nor Cygwin.)

In addition to counting code in individual text files, directories, and git repositories, cloc can also count code in archive files such as .tar (including compressed versions), .zip, Python wheel .whl, Jupyter notebook .ipynb, source RPMs .rpm or .src (requires rpm2cpio), and Debian .deb files (requires dpkg-deb).

cloc contains code from David Wheeler's SLOCCount, Damian Conway and Abigail's Perl module Regexp::Common, Sean M. Burke's Perl module Win32::Autoglob, and Tye McQueen's Perl module Algorithm::Diff. Language scale factors were derived from Mayes Consulting, LLC web site http://softwareestimator.com/IndustryData2.htm.

New releases nominally appear every six months.

Run via docker

docker run --rm -v $PWD:/tmp aldanial/cloc

Run via docker on git-bash

docker run --rm -v "/$(pwd -W)":/tmp aldanial/cloc

Install via package manager

Depending on your operating system, one of these installation methods may work for you (all but the last two entries for Windows require a Perl interpreter):

npm install -g cloc              # https://www.npmjs.com/package/cloc
sudo apt install cloc            # Debian, Ubuntu
sudo yum install cloc            # Red Hat, Fedora
sudo dnf install cloc            # Fedora 22 or later
sudo pacman -S cloc              # Arch
sudo emerge -av dev-util/cloc    # Gentoo https://packages.gentoo.org/packages/dev-util/cloc
sudo apk add cloc                # Alpine Linux
doas pkg_add cloc                # OpenBSD
sudo pkg install cloc            # FreeBSD
sudo port install cloc           # macOS with MacPorts
brew install cloc                # macOS with Homebrew
winget install AlDanial.Cloc     # Windows with winget
choco install cloc               # Windows with Chocolatey
scoop install cloc               # Windows with Scoop

Note: I don't control any of these packages. If you encounter a bug in cloc using one of the above packages, try with cloc pulled from the latest stable release here on GitHub (link follows below) before submitting a problem report.

Stable release

https://github.com/AlDanial/cloc/releases/latest

Development version

https://github.com/AlDanial/cloc/raw/master/cloc

License ▲

cloc is licensed under the GNU General Public License, v 2, excluding portions which are copied from other sources. Code copied from the Regexp::Common, Win32::Autoglob, and Algorithm::Diff Perl modules is subject to the Artistic License.

Why Use cloc? ▲

cloc has many features that make it easy to use, thorough, extensible, and portable:

Exists as a single, self-contained file that requires minimal installation effort---just download the file and run it.
Can read language comment definitions from a file and thus potentially work with computer languages that do not yet exist.
Allows results from multiple runs to be summed together by language and by project.
Can produce results in a variety of formats: plain text, Markdown, SQL, JSON, XML, YAML, comma separated values.
Can count code within compressed archives (tar balls, Zip files, Java .ear files).
Has numerous troubleshooting options.
Handles file and directory names with spaces and other unusual characters.
Has no dependencies outside the standard Perl distribution.
Runs on Linux, FreeBSD, NetBSD, OpenBSD, macOS, AIX, HP-UX, Solaris, IRIX, and z/OS systems that have Perl 5.6 or higher. The source version runs on Windows with either ActiveState Perl, Strawberry Perl, Cygwin, or MobaXTerm+Perl plugin. Alternatively on Windows one can run the Windows binary which has no dependencies.

Other Counters ▲

If cloc does not suit your needs here are other freely available counters to consider:

Other references:

QSM's directory of code counting tools.
The Wikipedia entry for source code line counts.

Regexp::Common, Digest::MD5, Win32::Autoglob, Algorithm::Diff

Although cloc does not need Perl modules outside those found in the standard distribution, cloc does rely on a few external modules. Code from three of these external modules--Regexp::Common, Win32::Autoglob, and Algorithm::Diff--is embedded within cloc. A fourth module, Digest::MD5, is used only if it is available. If cloc finds Regexp::Common or Algorithm::Diff installed locally it will use those installations. If it doesn't, cloc will install the parts of Regexp::Common and/or Algorithm::Diff it needs to temporary directories that are created at the start of a cloc run then removed when the run is complete. The necessary code from Regexp::Common v2.120 and Algorithm::Diff v1.1902 is embedded within the cloc source code (see subroutines Install_Regexp_Common() and Install_Algorithm_Diff()). Only three lines are needed from Win32::Autoglob and these are included directly in cloc.

Additionally, cloc will use Digest::MD5 to validate uniqueness among equally-sized input files if Digest::MD5 is installed locally.

A parallel processing option, --processes=N, was introduced with cloc version 1.76 to enable faster runs on multi-core machines. However, to use it, one must have the module Parallel::ForkManager installed. This module does not work reliably on Windows so parallel processing will only work on Unix-like operating systems.

The Windows binary is built on a computer that has both Regexp::Common and Digest::MD5 installed locally.

Building a Windows Executable ▲

Create your own executable

The most robust option for creating a Windows executable of cloc is to use ActiveState's Perl Development Kit. It includes a utility, perlapp, which can build stand-alone Windows, Mac, and Linux binaries of Perl source code.

perl2exe will also do the trick. If you have perl2exe, modify lines 84-87 of the cloc source code; this minor change is necessary to make a cloc Windows executable.

Otherwise, to build a Windows executable with pp from PAR::Packer, first install a Windows-based Perl distribution (for example Strawberry Perl or ActivePerl) following their instructions. Next, open a command prompt, aka a DOS window and install the PAR::Packer module. Finally, invoke the newly installed pp command with the cloc source code to create an .exe file:

As a variation on the instructions above, if you installed the portable version of Strawberry Perl, you will need to run portableshell.bat first to properly set up your environment.

The Windows executable in the Releases section, cloc-2.02.exe, was built on a 64-bit Windows 10 computer using Strawberry Perl 5.30.2 and PAR::Packer.

Is the Windows executable safe to run? Does it have malware?

Ideally, no one would need the Windows executable because they have a Perl interpreter installed on their machines and can run the cloc source file. On centrally-managed corporate Windows machines, however, this may be difficult or impossible.

The Windows executable distributed with cloc is provided on a best-effort basis as a virus- and malware-free .exe. You are encouraged to run your own virus scanners against the executable and also check sites such as https://www.virustotal.com/ . The entries for recent versions are:

cloc-2.02-winget.exe: (includes PR 850 to allow running from a symlink on Windows) https://www.virustotal.com/gui/file/be033061e091fea48a5bc9e8964cee0416ddd5b34bd5226a1c9aa4b30bdba66a?nocache=1

cloc-2.02.exe: https://www.virustotal.com/gui/file/369ed76125f7399cd582d169adf39a2e08ae5066031fea0cc8b2836ea50e7ce2?nocache=1

cloc-2.00.exe: https://www.virustotal.com/gui/file/7a234ef0cb495de1b5776acf88c5554e2bab1fb02725a5fb85756a6db3121c1f

cloc-1.98.exe: https://www.virustotal.com/gui/file/88615d193ec8c06f7ceec3cc1d661088af997798d87ddff331d9e9f9128a6782?nocache=1

cloc-1.96.1.exe: https://www.virustotal.com/gui/file/00b1c9dbbfb920dabd374418e1b86d2c24b8cd2b8705aeb956dee910d0d75d45?nocache=1

cloc-1.96.exe: https://www.virustotal.com/gui/file/54bf5f46fbaba7949c4eb2d4837b03c774c0ba587448a5bad9b8efc0222b1583?nocache=1

cloc-1.94.exe: https://www.virustotal.com/gui/file/b48a6002fb75fa66ec5d0c05a5c4d51f2ad22b5b025b7eb4e3945d18419c0952?nocache=1

cloc-1.92.exe: https://www.virustotal.com/gui/file/2668fcf8609c431e8934fe9e1866bc620c58d198c4eb262f1d3ef31ef4a690f7

cloc-1.90.exe: https://www.virustotal.com/gui/file/d655caae55486f9bac39f7e3c7b7553bcfcfe2b88914c79bfc328055f22b8a37/detection

cloc-1.88.exe: https://www.virustotal.com/gui/file/97d5d2631d1cccdbfd99267ab8a4cf5968816bbe52c0f9324e72e768857f642d/detection

cloc-1.86.exe: https://www.virustotal.com/gui/file/1b2e189df1834411b34534db446330d1c379b4bc008af3042ee9ade818c6a1c8/detection

cloc-1.84.exe: https://www.virustotal.com/gui/file/e73d490c1e4ae2f50ee174005614029b4fa2610dcb76988714839d7be68479af/detection

cloc-1.82.exe: https://www.virustotal.com/#/file/2e5fb443fdefd776d7b6b136a25e5ee2048991e735042897dbd0bf92efb16563/detection

cloc-1.80.exe: https://www.virustotal.com/#/file/9e547b01c946aa818ffad43b9ebaf05d3da08ed6ca876ef2b6847be3bf1cf8be/detection

cloc-1.78.exe: https://www.virustotal.com/#/file/256ade3df82fa92febf2553853ed1106d96c604794606e86efd00d55664dd44f/detection

cloc-1.76.exe: https://www.virustotal.com/#/url/c1b9b9fe909f91429f95d41e9a9928ab7c58b21351b3acd4249def2a61acd39d/detection

cloc-1.74_x86.exe: https://www.virustotal.com/#/file/b73dece71f6d3199d90d55db53a588e1393c8dbf84231a7e1be2ce3c5a0ec75b/detection

cloc 1.72 exe: https://www.virustotal.com/en/url/8fd2af5cd972f648d7a2d7917bc202492012484c3a6f0b48c8fd60a8d395c98c/analysis/

cloc 1.70 exe: https://www.virustotal.com/en/url/63edef209099a93aa0be1a220dc7c4c7ed045064d801e6d5daa84ee624fc0b4a/analysis/

cloc 1.68 exe: https://www.virustotal.com/en/file/c484fc58615fc3b0d5569b9063ec1532980281c3155e4a19099b11ef1c24443b/analysis/

cloc 1.66 exe: https://www.virustotal.com/en/file/54d6662e59b04be793dd10fa5e5edf7747cf0c0cc32f71eb67a3cf8e7a171d81/analysis/1453601367/

Why is the Windows executable so large?

Windows executables of cloc versions 1.60 and earlier, created with perl2exe as noted above, are about 1.6 MB, while versions 1.62 and 1.64, created with PAR::Packer, are 11 MB. Version 1.66, built with a newer version of PAR::Packer, is about 5.5 MB. Why are the PAR::Packer executables so much larger than those built with perl2exe? My theory is that perl2exe uses smarter tree pruning logic than PAR::Packer, but that's pure speculation.

Basic Use ▲

cloc is a command line program that takes file, directory, and/or archive names as inputs. Here's an example of running cloc against the Perl v5.22.0 source distribution:

To run cloc on Windows computers, open up a command (aka DOS) window and invoke cloc.exe from the command line there. Alternatively, try ClocViewer, the GUI wrapper around cloc found at https://github.com/Roemer/ClocViewer.

See also https://github.com/jmensch1/codeflower for a graphical rendering of cloc results.

Options ▲

Recognized Languages ▲

The above list can be customized by reading language definitions from a file with the --read-lang-def or --force-lang-def options.

These file extensions map to multiple languages:

cl files could be Lisp or OpenCL
cls files could be Visual Basic, TeX or Apex Class
cs files could be C# or Smalltalk
d files could be D or dtrace
f files could be Fortran 77 or Forth
fnc files could be Oracle PL or SQL
for files could be Fortran 77 or Forth
fs files could be F# or Forth
inc files could be PHP or Pascal
itk files could be Tcl or Tk
jl files could be Lisp or Julia
lit files could be PL or M
m files could be MATLAB, Mathematica, Objective-C, MUMPS or Mercury
p6 files could be Perl or Prolog
pl files could be Perl or Prolog
PL files could be Perl or Prolog
pp files could be Pascal or Puppet
pro files could be IDL, Qt Project, Prolog or ProGuard
ts files could be TypeScript or Qt Linguist
ui files could be Qt or Glade
v files could be Verilog-SystemVerilog or Coq

cloc has subroutines that attempt to identify the correct language based on the file's contents for these special cases. Language identification accuracy is a function of how much code the file contains; .m files with just one or two lines for example, seldom have enough information to correctly distinguish between MATLAB, Mercury, MUMPS, or Objective-C.

Languages with file extension collisions are difficult to customize with --read-lang-def or --force-lang-def as they have no mechanism to identify languages with common extensions. In this situation one must modify the cloc source code.

How It Works ▲

cloc's method of operation resembles SLOCCount's: First, create a list of files to consider. Next, attempt to determine whether or not found files contain recognized computer language source code. Finally, for files identified as source files, invoke language-specific routines to count the number of source lines.

A more detailed description:

If the input file is an archive (such as a .tar.gz or .zip file), create a temporary directory and expand the archive there using a system call to an appropriate underlying utility (tar, bzip2, unzip, etc) then add this temporary directory as one of the inputs. (This works more reliably on Unix than on Windows.)
Use File::Find to recursively descend the input directories and make a list of candidate file names. Ignore binary and zero-sized files.
Make sure the files in the candidate list have unique contents (first by comparing file sizes, then, for equally sized files, by comparing MD5 hashes of the file contents with Digest::MD5). For each set of identical files, remove all but the first copy, as determined by a lexical sort. The removed files are not included in the report. (The --skip-uniqueness switch disables the uniqueness tests and forces all copies of files to be included in the report.) See also the --ignored= switch to see which files were ignored and why.
Scan the candidate file list for file extensions which cloc associates with programming languages (see the --show-lang and --show-ext options). Files which match are classified as containing source code for that language. Each file without an extension is opened and its first line read to see if it is a Unix shell script (anything that begins with #!). If it is a shell script, the file is classified by that scripting language (if the language is recognized). If the file does not have a recognized extension or is not a recognized scripting language, the file is ignored.
All remaining files in the candidate list should now be source files for known programming languages. For each of these files:
1. Read the entire file into memory.
2. Count the number of lines (= Loriginal).
3. Remove blank lines, then count again (= Lnon_blank).
4. Loop over the comment filters defined for this language. (For
  example, C++ has two filters: (1) remove lines that start with
  optional whitespace followed by // and (2) remove text between
  /* and */) Apply each filter to the code to remove comments.
  Count the left over lines (= Lcode).
5. Save the counts for this language:
  - blank lines = Loriginal - Lnon_blank
  - comment lines = Lnon_blank - Lcode
  - code lines = Lcode
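
The arithmetic in steps 2 through 5 can be illustrated with a few standard Unix tools. This is a simplified sketch, not cloc's implementation: the function name count_hash_comments is hypothetical, and it assumes a language whose only comments are whole lines starting with #, so a single grep stands in for the loop over comment filters.

```shell
# Simplified sketch of the counting arithmetic for a language whose only
# comments are whole lines beginning with '#'. Not cloc's actual code.
count_hash_comments() {
    file=$1
    # L_original: every line in the file.
    l_original=$(grep -c '' "$file" || true)
    # L_non_blank: lines left after removing blank (whitespace-only) lines.
    l_non_blank=$(grep -c -v '^[[:space:]]*$' "$file" || true)
    # L_code: lines left after also applying the single comment filter.
    l_code=$(grep -v '^[[:space:]]*$' "$file" | grep -c -v '^[[:space:]]*#' || true)
    echo "blank=$((l_original - l_non_blank)) comment=$((l_non_blank - l_code)) code=$l_code"
}
```

For a six-line file containing two blank lines, two # lines, and two assignments, count_hash_comments prints blank=2 comment=2 code=2.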

The options modify the algorithm slightly. The --read-lang-def option for example allows the user to read definitions of comment filters, known file extensions, and known scripting languages from a file. The code for this option is processed between Steps 2 and 3.

Advanced Use ▲

Remove Comments from Source Code ▲

How can you tell if cloc correctly identifies comments? One way to convince yourself cloc is doing the right thing is to use its --strip-comments option to remove comments and blank lines from files, then compare the stripped-down files to originals.

Let's try this out with the SQLite amalgamation, a C file containing all code needed to build the SQLite library along with a header file:

The extension argument given to --strip-comments is arbitrary; here nc was used as an abbreviation for "no comments".

cloc removed over 31,000 lines from the file:

We can now compare the original file, sqlite3.c and the one stripped of comments, sqlite3.c.nc with tools like diff or vimdiff and see what exactly cloc considered comments and blank lines. A rigorous proof that the stripped-down file contains the same C code as the original is to compile these files and compare checksums of the resulting object files.

First, the original source file:

Next, the version without comments:

cloc removed over 31,000 lines of comments and blanks but did not modify the source code in any significant way since the resulting object file matches the original.

Work with Compressed Archives ▲

Versions of cloc before v1.07 required an --extract-with=CMD option to tell cloc how to expand an archive file. Beginning with v1.07 this extraction is attempted automatically. At the moment the automatic extraction method works reasonably well on Unix-type OS's for the following file types: .tar.gz, .tar.bz2, .tar.xz, .tgz, .zip, .ear, .deb. Some of these extensions work on Windows if one has WinZip installed in the default location (C:\Program Files\WinZip\WinZip32.exe). Additionally, with newer versions of WinZip, the command line add-on (http://www.winzip.com/downcl.htm) is needed for correct operation; in this case one would invoke cloc with something like

Ref. http://sourceforge.net/projects/cloc/forums/forum/600963/topic/4021070?message=8938196

In situations where the automatic extraction fails, one can try the --extract-with=CMD option to count lines of code within tar files, Zip files, or other compressed archives for which one has an extraction tool. cloc takes the user-provided extraction command and expands the archive to a temporary directory (created with File::Temp), counts the lines of code in the temporary directory, then removes that directory. While not especially helpful when dealing with a single compressed archive (after all, if you're going to type the extraction command anyway why not just manually expand the archive?) this option is handy for working with several archives at once.

For example, say you have the following source tarballs on a Unix machine

perl-5.8.5.tar.gz
Python-2.4.2.tar.gz

and you want to count all the code within them. The command would be

If that Unix machine has GNU tar (which can uncompress and extract in one step) the command can be shortened to
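
Both forms can be sketched as small wrapper functions. The function names are hypothetical helpers, not part of cloc; cloc substitutes each archive's name for the >FILE< placeholder before running the extraction command.

```shell
# Hypothetical helper: decompress with gzip, then unpack with any tar.
count_gz_tarballs() {
    cloc --extract-with='gzip -dc >FILE< | tar xf -' "$@"
}

# Hypothetical helper: GNU tar can uncompress and extract in one step.
count_gz_tarballs_gnu() {
    cloc --extract-with='tar zxf >FILE<' "$@"
}
```

With these defined, count_gz_tarballs perl-5.8.5.tar.gz Python-2.4.2.tar.gz counts the code in both tarballs in one run.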

On a Windows computer with WinZip installed in c:\Program Files\WinZip the command would look like

Java .ear files are Zip files that contain additional Zip files. cloc can handle nested compressed archives without difficulty--provided all such files are compressed and archived in the same way. Examples of counting a Java .ear file in Unix and Windows:

Differences ▲

The --diff switch allows one to measure the relative change in source code and comments between two versions of a file, directory, or archive. Differences reveal much more than absolute code counts of two file versions. For example, say a source file has 100 lines and its developer delivers a newer version with 102 lines. Did the developer add two comment lines, or delete seventeen source lines and add fourteen source lines and five comment lines, or did the developer do a complete rewrite, discarding all 100 original lines and adding 102 lines of all new source? The diff option tells how many lines of source were added, removed, modified or stayed the same, and how many lines of comments were added, removed, modified or stayed the same.

Differences in blank lines are handled much more coarsely because these are stripped by cloc early on. Unless a file pair is identical, cloc will report only differences in absolute counts of blank lines. In other words, one can expect to see only entries for 'added' if the second file has more blanks than the first, and 'removed' if the situation is reversed. The entry for 'same' will be non-zero only when the two files are identical.

In addition to file pairs, one can give cloc pairs of directories, or pairs of file archives, or a file archive and a directory. cloc will try to align file pairs within the directories or archives and compare diffs for each pair. For example, to see what changed between GCC 4.4.0 and 4.5.0 one could do

Be prepared to wait a while for the results though; the --diff option runs much more slowly than an absolute code count.

To see how cloc aligns files between the two archives, use the --diff-alignment option

to produce the file align.txt which shows the file pairs as well as files added and deleted. The symbols == and != before each file pair indicate if the files are identical (==) or if they have different content (!=).

Here's sample output showing the difference between the Python 2.6.6 and 2.7 releases:

A pair of errors occurred. The first was caused by timing out when computing diffs of the file Python-X/Mac/Modules/qt/_Qtmodule.c in each Python version. This file has > 26,000 lines of C code and takes more than 10 seconds--the default maximum duration for diff'ing a single file--on my slow computer. (Note: this refers to performing differences with the sdiff() function in the Perl Algorithm::Diff module, not the command line diff utility.) This error can be overcome by raising the time to, say, 20 seconds with --diff-timeout 20.

The second error is more problematic. The files Python-X/Mac/Modules/qd/qdsupport.py include Python docstrings (text between pairs of triple quotes) containing C comments. cloc treats docstrings as comments and handles them by first converting them to C comments, then using the C comment removing regular expression. Nested C comments yield erroneous results however.
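
As a sketch of the timeout workaround described above, the wrapper below runs a diff with a caller-chosen per-file limit. The function name diff_with_timeout is a hypothetical helper, and the archive names in the usage note are assumed.

```shell
# Hypothetical helper: diff two versions with a longer per-file timeout (in seconds).
diff_with_timeout() {
    timeout_s=$1
    old=$2
    new=$3
    cloc --diff --diff-timeout "$timeout_s" "$old" "$new"
}
```

For example, diff_with_timeout 20 Python-2.6.6.tar.bz2 Python-2.7.tar.bz2 would allow 20 seconds per file pair instead of the default 10.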

Create Custom Language Definitions ▲

cloc can write its language comment definitions to a file or can read comment definitions from a file, overriding the built-in definitions. This can be useful when you want to use cloc to count lines of a language not yet included, to change association of file extensions to languages, or to modify the way existing languages are counted.

The easiest way to create a custom language definition file is to make cloc write its definitions to a file, then modify that file:

creates the file my_definitions.txt which can be modified then read back in with either the --read-lang-def or --force-lang-def option. The difference between the options is that the former merges language definitions from the given file in with cloc's internal definitions, with cloc's taking precedence if there are overlaps. The --force-lang-def option, on the other hand, replaces cloc's definitions completely. This option has a disadvantage in preventing cloc from counting languages whose extensions map to multiple languages as these languages require additional logic that is not easily expressed in a definitions file.

Each language entry has four parts:

The language name starting in column 1.
One or more comment filters starting in column 5.
One or more filename extensions starting in column 5.
A 3rd generation scale factor starting in column 5. This entry must be provided but its value is not important unless you want to compare your language to a hypothetical third generation programming language.

A filter defines a method to remove comment text from the source file. For example the entry for C++ looks like this

C++ has two filters: first, remove lines matching Regexp::Common's C++ comment regex. The second filter using remove_inline is currently unused. Its intent is to identify lines with both code and comments and it may be implemented in the future.

A more complete discussion of the different filter options may appear here in the future. The output of cloc's --write-lang-def option should provide enough examples for motivated individuals to modify or extend cloc's language definitions.
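
As a minimal sketch of the four-part entry layout described above, the snippet below writes a definition file for an invented language. The language name MyLang, the .mylang extension, the scale factor value, and the file name mylang_definitions.txt are all assumptions made for illustration; remove_matches is one of the filter types that appears in the output of --write-lang-def.

```shell
# Write a minimal, hypothetical definition file. The language name starts in
# column 1; the filter, extension, and scale factor lines start in column 5.
cat > mylang_definitions.txt <<'EOF'
MyLang
    filter remove_matches ^\s*#
    extension mylang
    3rd_gen_scale 1.00
EOF
```

The file could then be merged with the built-in definitions by running cloc --read-lang-def=mylang_definitions.txt on a source directory.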

Combine Reports ▲

If you manage multiple software projects you might be interested in seeing line counts by project, not just by language. Say you manage three software projects called MariaDB, PostgreSQL, and SQLite. The teams responsible for each of these projects run cloc on their source code and provide you with the output. For example the MariaDB team does

and provides you with the file mariadb-10.1.txt. The contents of the three files you get are

While these three files are interesting, you also want to see the combined counts from all projects. That can be done with cloc's --sum-reports option:
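
A sketch of the summation step as a wrapper function. The name sum_cloc_reports is a hypothetical helper; apart from mariadb-10.1.txt, the report file names in the usage note are assumed.

```shell
# Hypothetical helper: combine plain-text cloc reports into <prefix>.lang and <prefix>.file.
sum_cloc_reports() {
    prefix=$1
    shift
    cloc --sum-reports --out="$prefix" "$@"
}
```

For example, sum_cloc_reports databases mariadb-10.1.txt sqlite.txt postgresql.txt passes the prefix databases to --out along with the three report files.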

The report combination produces two output files, one for sums by programming language (databases.lang) and one by project (databases.file). Their contents are

Report files themselves can be summed together. Say you also manage development of Perl and Python and you want to keep track of those line counts separately from your database projects. First create reports for Perl and Python separately:

then sum these together with

Finally, combine the combination files:

One limitation of the --sum-reports feature is that the individual counts must be saved in the plain text format. Counts saved as XML, JSON, YAML, or SQL will produce errors if used in a summation.

SQL ▲

Cloc can write results in the form of SQL table create and insert statements for use with relational database programs such as SQLite, MySQL, PostgreSQL, Oracle, or Microsoft SQL. Once the code count information is in a database, the information can be interrogated and displayed in interesting ways.

A database created from cloc SQL output has two tables, metadata and t:

Table metadata:

Field	Type
id	integer primary key
timestamp	text
project	text
elapsed_s	text

Table t:

Field	Type
project	text
language	text
file	text
nBlank	integer
nComment	integer
nCode	integer
nScaled	real
foreign key (id)	references metadata (id)

The metadata table contains information about when the cloc run was made. Run time is stored two ways: as Unix epoch seconds in id and as an ISO 8601 formatted text string in the local time zone (for example 2024-03-01 14:19:41) in timestamp. The --sql-append switch allows one to combine many runs in a single database; each run adds a row to the metadata table. The code count information resides in table t. The id key makes it easy to associate a run's code count with its metadata.

Let's repeat the code count examples of Perl, Python, SQLite, MariaDB and PostgreSQL tarballs shown in the Combine Reports example above, this time using the SQL output options and the SQLite database engine.

The --sql switch tells cloc to generate output in the form of SQL table create and insert commands. The switch takes an argument of a file name to write these SQL statements into, or, if the argument is 1 (numeric one), streams output to STDOUT. Since the SQLite command line program, sqlite3, can read commands from STDIN, we can dispense with storing SQL statements to a file and use --sql 1 to pipe data directly into the SQLite executable:
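
A sketch of that pipeline as a reusable function. The name load_into_sqlite is a hypothetical helper, and the archive name in the usage note is assumed; any arguments after the database name are passed through to cloc unchanged.

```shell
# Hypothetical helper: stream cloc's SQL output straight into an SQLite database.
load_into_sqlite() {
    project=$1
    source_path=$2
    db=$3
    shift 3
    # Remaining arguments (for example --sql-append) are passed through to cloc.
    cloc --sql 1 --sql-project "$project" "$@" "$source_path" | sqlite3 "$db"
}
```

For example, load_into_sqlite mariadb mariadb-server-10.1.zip code.db creates code.db from a first run; adding --sql-append as a fourth argument on later runs adds to the existing database instead of replacing it.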

The --sql-project mariadb part is optional; there's no need to specify a project name when working with just one code base. However, since we'll be adding code counts from four other tarballs, we'll only be able to identify data by input source if we supply a project name for each run.

Now that we have a database we will need to pass in the --sql-append switch to tell cloc not to wipe out this database but instead add more data:

Now the fun begins--we have a database, code.db, with lots of information about the five projects and can query it for all manner of interesting facts.

Which is the longest file over all projects?

sqlite3's default output format leaves a bit to be desired. We can add an option to the program's rc file, ~/.sqliterc, to show column headers:

One might be tempted to also include

in ~/.sqliterc but this causes problems when the output has more than one row since the widths of entries in the first row govern the maximum width for all subsequent rows. Often this leads to truncated output--not at all desirable. One option is to write a custom SQLite output formatter such as sqlite_formatter, included with cloc.

To use it, simply pass sqlite3's STDOUT into sqlite_formatter via a pipe:

If the "Project File" line doesn't appear, add .header on to your ~/.sqliterc file as explained above.

What is the longest file over all projects?

What is the longest file in each project?

Which files in each project have the most code lines?

Which C source files with more than 300 lines have a comment ratio below 1%?

What are the ten longest files (based on code lines) that have no comments at all? Exclude header, .html, and YAML files.

What are the most popular languages (in terms of lines of code) in each project?

Custom Column Output ▲

Cloc's default output is a text table with five columns: language, file count, number of blank lines, number of comment lines and number of code lines. The switches --by-file, --3, and --by-percent generate additional information but sometimes even those are insufficient.

The --sql option described in the previous section offers the ability to create custom output. This section has a pair of examples that show how to create custom columns. The first example includes an extra column, Total, which is the sum of the numbers of blank, comment, and code lines. The second shows how to include the language name when running with --by-file.

Example 1: Add a "Totals" column.

The first step is to run cloc and save the output to a relational database, SQLite in this case:

(the tar file comes from the YAML-C++ project).

Second, we craft an SQL query that returns the regular cloc output plus an extra column for totals, then save the SQL statement to a file, query_with_totals.sql:
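A minimal sketch of a query with this shape follows, run from Python against an invented in-memory table rather than the real counts.db. It assumes a table t with columns Project, Language, File, nBlank, nComment, and nCode. The sample rows are hypothetical and do not come from the YAML-C++ project.

```python
import sqlite3

# Assumed schema: a table `t` holding one row per counted file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (Project text, Language text, File text,"
    " nBlank integer, nComment integer, nCode integer)"
)
# Invented sample rows, for illustration only.
conn.executemany(
    "insert into t values (?, ?, ?, ?, ?, ?)",
    [
        ("demo", "C++", "src/a.cpp", 10, 5, 100),
        ("demo", "C++", "src/b.cpp", 20, 15, 200),
        ("demo", "CMake", "CMakeLists.txt", 3, 2, 30),
    ],
)

# Per-language sums, plus Total = blank + comment + code.
query_with_totals = """
select Language,
       count(File)   as files,
       sum(nBlank)   as blank,
       sum(nComment) as comment,
       sum(nCode)    as code,
       sum(nBlank) + sum(nComment) + sum(nCode) as Total
  from t
 group by Language
 order by code desc
"""
totals = conn.execute(query_with_totals).fetchall()
for row in totals:
    print(row)
```

The text of query_with_totals could equally be saved to a .sql file and fed to the sqlite3 command line program.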

Third, we run this query through SQLite using the counts.db database. We'll include the -header switch so that SQLite prints the column names:

The extra column for Total is there but the format is unappealing. Running the output through sqlite_formatter yields the desired result:

The next section, Wrapping cloc in other scripts, shows one way these commands can be combined into a new utility program.

Example 2: Include a column for "Language" when running with --by-file.

Output from --by-file omits each file's language to save screen real estate; file paths for large projects can be long and including an extra 20 or so characters for a Language column can be excessive.

As an example, here are the first few lines of output using the same code base as in Example 1:

The absence of language identification for each file is a bit disappointing, but this can be remedied with a custom column solution.

The first step, creating a database, matches that from Example 1 so we'll go straight to the second step of creating the desired SQL query. We'll store this one in the file by_file_with_language.sql:

Our desired extra column appears when we pass this custom SQL query through our database:

Wrapping cloc in other scripts ▲

More complex code counting solutions are possible by wrapping cloc in scripts or programs. The steps that produce the Total column in Example 1 of Custom Column Output could be reduced to a single command with this shell script (on Linux):

Saving the lines above to total_columns.sh and making it executable (chmod +x total_columns.sh) would let us do

to directly get

Other examples:

Count code from a specific branch of a web-hosted git repository and send the results as a .csv email attachment: https://github.com/dannyloweatx/checkmarx

git and UTF8 pathnames ▲

cloc's --git option may fail if you work with directory or file names with UTF-8 characters (for example, see issue 457). The solution, https://stackoverflow.com/questions/22827239/how-to-make-git-properly-display-utf-8-encoded-pathnames-in-the-console-window, is to apply this git configuration command:

Your console's font will need to be capable of displaying Unicode characters.

Third Generation Language Scale Factors ▲

cloc versions before 1.50 by default computed, for the provided inputs, a rough estimate of how many lines of code would be needed to write the same code in a hypothetical third-generation computer language. To produce this output one must now use the --3 switch.

Scale factors were derived from the 2006 version of language gearing ratios listed at the Mayes Consulting web site, http://softwareestimator.com/IndustryData2.htm, using this equation:

cloc scale factor for language X = 3rd generation default gearing ratio / language X gearing ratio

For example, cloc 3rd generation scale factor for DOS Batch = 80 / 128 = 0.625.
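The arithmetic above can be checked with a few lines of Python. The function name is hypothetical, and the last step assumes that a third-generation equivalent is obtained by multiplying a code line count by the scale factor; the 1,000-line figure is invented for the example.

```python
def scale_factor(third_gen_default_ratio, language_ratio):
    """Return 3rd generation default gearing ratio / language gearing ratio."""
    if language_ratio <= 0:
        raise ValueError("gearing ratio must be positive")
    return third_gen_default_ratio / language_ratio


# Gearing ratios from the DOS Batch example above.
dos_batch_scale = scale_factor(80, 128)
print(dos_batch_scale)

# Assumption: equivalent lines = code lines * scale factor.
# 1,000 lines of DOS Batch is a made-up input.
equivalent_lines = 1000 * dos_batch_scale
print(equivalent_lines)
```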

The biggest flaw with this approach is that gearing ratios are defined for logical lines of source code, not the physical lines that cloc counts. The values in cloc's 'scale' and '3rd gen. equiv.' columns should be taken with a large grain of salt.

options.txt configuration file ▲

If you find yourself using the same command line switches every time you invoke cloc, you can save some typing by adding those switches to the options.txt runtime configuration file. cloc will look for this file in the following default locations:

If you run cloc with --help, cloc will tell you where it expects to find this config file. The information appears by the explanation of the --config switch, after the text the default location of. On Unix-like operating systems, this can be simplified to

and in a Windows cmd terminal with

Place each switch and its arguments, if any, on a line by itself. Lines prefixed with the # symbol are ignored as comments, and blank lines are skipped. Leading hyphens on the switches are optional. Here's a sample file:

The path to the options.txt file can also be specified with the --config FILE switch.
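The line rules described above (one switch per line, # comments ignored, blank lines skipped, leading hyphens optional) can be sketched in a few lines of Python. This is only an illustration of the file format, not cloc's own parsing code. The function name parse_options and the sample file contents are assumptions made for the example.

```python
def parse_options(text):
    """Turn options-file text into a list of command-line style arguments."""
    args = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        # The switch is the first word; anything after it is its argument.
        switch, _, value = line.partition(" ")
        # Leading hyphens are optional, so normalize to a "--" prefix.
        args.append("--" + switch.lstrip("-"))
        if value.strip():
            args.append(value.strip())
    return args


# Hypothetical options.txt contents.
sample = """\
# options.txt
--vcs git
by-file

exclude-dir node_modules
"""
parsed = parse_options(sample)
print(parsed)
```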

Finally, if cloc finds an options.txt file in the same directory as files given by any of these switches (in the listed priority), it will use that configuration file from that location:

--list-file
--exclude-list-file
--read-lang-def
--force-lang-def
--diff-list-file

Run with --verbose to have cloc tell you which, if any, options.txt file it uses.

Java Programmatic Interface ▲

Ozren Dabić created a Java programmatic interface to cloc. It is available at https://github.com/seart-group/jcloc

Complex regular subexpression recursion limit ▲

cloc relies on the Regexp::Common module's regular expressions to remove comments from source code. If comments are malformed, for example the /* start comment marker appears in a C program without a corresponding */ marker, the regular expression engine could enter a recursive loop, eventually triggering the warning Complex regular subexpression recursion limit.

The most common cause for this warning is the existence of comment markers in string literals. While language compilers and interpreters are smart enough to recognize that "/*" (for example) is a string and not a comment, cloc is fooled. File path globs, as in this line of JavaScript

are frequent culprits.
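To see why a comment marker inside a string fools a regex-based stripper, consider the Python sketch below. It uses a deliberately simple block-comment pattern, not the Regexp::Common expressions cloc relies on, and the JavaScript snippet is invented for the example. It shows miscounting rather than the Perl recursion warning itself.

```python
import re

# Simplified stand-in for a C-style block comment pattern.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

# Invented JavaScript: the glob string on the first line contains "/*".
source = '''var paths = globArray("**/*.js", {cwd: srcPath});
var count = paths.length;  /* number of matches */
var done = true;
'''


def count_nonblank(text):
    """Count lines that still hold something after stripping whitespace."""
    return sum(1 for line in text.splitlines() if line.strip())


stripped = BLOCK_COMMENT.sub("", source)
lines_before = count_nonblank(source)
lines_after = count_nonblank(stripped)
print(stripped)
print(lines_before, lines_after)
```

The pattern treats the "/*" inside the glob string as the start of a comment and consumes everything up to the next "*/", so the second line of real code disappears from the count.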

In an attempt to overcome this problem, a different algorithm which removes comment markers in strings can be enabled with the --strip-str-comments switch. Doing so, however, has drawbacks: cloc will run more slowly and the output of --strip-comments will contain strings that no longer match the input source.

Limitations ▲

Identifying comments within source code is trickier than one might expect. Many languages would need a complete parser to be counted correctly. cloc does not attempt to parse any of the languages it aims to count and therefore is an imperfect tool. The following are known problems:

If you suspect your code has such strings, use the switch --strip-str-comments to switch to the algorithm which removes embedded comment markers. Its use will render the five lines above as

and therefore return a count of five lines of code. See the previous section on drawbacks to using --strip-str-comments.

Requesting Support for Additional Languages ▲

If cloc does not recognize a language you are interested in counting, create a GitHub issue requesting support for your language. Include this information:

Reporting Problems ▲

If you encounter a problem with cloc, first check to see if you're running with the latest version of the tool:

If the version is older than the most recent release at https://github.com/AlDanial/cloc/releases, download the latest version and see if it solves your problem.

If the problem happens with the latest release, submit a new issue at https://github.com/AlDanial/cloc/issues only if you can supply enough information for anyone reading the issue report to reproduce the problem. That means providing

Problem reports that cannot be reproduced will be ignored and eventually closed.

Citation ▲

Please use the following bibtex entry to cite cloc in a publication:

(Update the version number and corresponding year if this entry is outdated.)

Acknowledgments ▲

Wolfram Rösler provided most of the code examples in the test suite. These examples come from his Hello World collection.

Ismet Kursunoglu found errors with the MUMPS counter and provided access to a computer with a large body of MUMPS code to test cloc.

Tod Huggins gave helpful suggestions for the Visual Basic filters.

Anton Demichev found a flaw with the JSP counter in cloc v0.76 and wrote the XML output generator for the --xml option.

Reuben Thomas pointed out that ISO C99 allows // as a comment marker, provided code for the --no3 and --stdin-name options and for counting the m4 language, and suggested several user-interface enhancements.

Michael Bello provided code for the --opt-match-f, --opt-not-match-f, --opt-match-d, and --opt-not-match-d options.

Mahboob Hussain inspired the --original-dir and --skip-uniqueness options, found a bug in the duplicate file detection logic and improved the JSP filter.

Randy Sharo found and fixed an uninitialized variable bug for shell scripts having only one line.

Steven Baker found and fixed a problem with the YAML output generator.

Greg Toth provided code to improve blank line detection in COBOL.

Joel Oliveira provided code to let --exclude-list-file handle directory name exclusion.

Blazej Kroll provided code to produce an XSLT file, cloc-diff.xsl, when producing XML output for the --diff option.

Denis Silakov enhanced the code which generates cloc.xsl when using --by-file and --by-file-by-lang options, and provided an XSL file that works with --diff output.

Andy provided code to fix several bugs: correct output of --counted so that only files that are used in the code count appear and that results are shown by language rather than file name; allow --diff output from multiple runs to be summed together with --sum-reports.

Jari Aalto created the initial version of cloc.1.pod and maintains the Debian package for cloc.

Mikkel Christiansen provided counter definitions for Clojure and ClojureScript.

Vera Djuraskovic from Webhostinggeeks.com provided the Serbo-Croatian translation.

Gill Ajoft of Ajoft Software provided the Bulgarian translation.

The Knowledge Team provided the Slovakian translation.

Erik Gooven Arellano Casillas provided an update to the MXML counter to recognize ActionScript comments.

Gianluca Casati created the cloc CPAN package.

Ryan Lindeman implemented the --by-percent feature.

Kent C. Dodds, @kentcdodds, created and maintains the npm package of cloc.

Viktoria Parnak provided the Ukrainian translation.

Natalie Harmann provided the Belarusian translation.

Nithyal at Healthcare Administration Portal provided the Tamil translation.