
A toolset for fast and efficient scanning of git repositories to capture data, with a focus on large repos

MIT License




npm install @discoveryjs/scan-git


import { createGitReader } from '@discoveryjs/scan-git';

const repo = await createGitReader('path/to/.git');
const commits = await repo.log({ ref: 'my-branch', depth: 10 });


await repo.dispose();
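Since the reader should be disposed once scanning is done, it may be convenient to wrap the open/dispose pair so that `dispose()` also runs when the work in between throws. The following is only a sketch: `withGitReader` is a hypothetical helper, not part of scan-git, and it receives the factory as an argument so it does not depend on how the package is imported.

```javascript
// Hypothetical helper, not part of scan-git: opens a reader, runs `fn`,
// and always disposes the reader, even if `fn` throws.
async function withGitReader(createReader, gitdir, fn) {
    const repo = await createReader(gitdir);

    try {
        return await fn(repo);
    } finally {
        await repo.dispose();
    }
}

// Usage (assumes `createGitReader` is imported from '@discoveryjs/scan-git'):
// const commits = await withGitReader(createGitReader, 'path/to/.git',
//     (repo) => repo.log({ ref: 'my-branch', depth: 10 }));
```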

createGitReader(gitdir, options?)

  • gitdir: string - path to the git repo
  • options – optional settings:
    • cruftPacks – defines how cruft packs are processed:
      • 'include' or true (default) - process all packs
      • 'exclude' or false - exclude cruft packs from processing
      • 'only' - process cruft packs only
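The accepted `cruftPacks` values can be reduced to three modes. The sketch below shows that mapping; `normalizeCruftPacks` is a hypothetical function written for illustration and is not taken from scan-git's source.

```javascript
// Hypothetical normalizer, not scan-git's own code: maps every documented
// `cruftPacks` value to one of 'include' | 'exclude' | 'only'.
function normalizeCruftPacks(value = 'include') {
    if (value === true) return 'include';
    if (value === false) return 'exclude';

    if (value === 'include' || value === 'exclude' || value === 'only') {
        return value;
    }

    throw new TypeError(`Unsupported cruftPacks value: ${String(value)}`);
}
```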


Common parameters:

  • ref: string – a reference to an object in repository
  • withOid: boolean – a flag to include resolved oid for a reference


repo.defaultBranch()

Returns the default branch name of the repo:

const defaultBranch = await repo.defaultBranch();
// 'main'

The algorithm to identify a default branch name:

  • if there is only one branch, it is the default
  • otherwise, look for specific branch names, in this order:
    • upstream/HEAD
    • origin/HEAD
    • main
    • master
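The lookup order above can be sketched as a small pure function. This is an illustration under assumptions, not scan-git's implementation: `pickDefaultBranch` is hypothetical, `localBranches` is a plain list of branch names, and `remoteHeads` is an assumed map from `'upstream/HEAD'` or `'origin/HEAD'` to the branch name that ref points to.

```javascript
// Hypothetical sketch of the documented lookup order (not scan-git's code).
function pickDefaultBranch(localBranches, remoteHeads = {}) {
    // a single branch must be the default
    if (localBranches.length === 1) {
        return localBranches[0];
    }

    // remote HEADs take priority, upstream before origin
    for (const candidate of ['upstream/HEAD', 'origin/HEAD']) {
        if (Object.prototype.hasOwnProperty.call(remoteHeads, candidate)) {
            return remoteHeads[candidate];
        }
    }

    // fall back to the conventional names
    for (const candidate of ['main', 'master']) {
        if (localBranches.includes(candidate)) {
            return candidate;
        }
    }

    return null; // nothing matched
}
```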


repo.expandRef(ref)

Expands a ref into its full form, e.g. 'main' -> 'refs/heads/main'. Returns null if the ref doesn't exist. Symbolic ref names ('HEAD', 'FETCH_HEAD', 'CHERRY_PICK_HEAD', 'MERGE_HEAD' and 'ORIG_HEAD') are returned unchanged.

const fullPath = repo.expandRef('heads/main');
// 'refs/heads/main'
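One plausible way to implement such an expansion is to try a fixed list of prefixes, in the order git itself documents for resolving short ref names. The sketch below does that against a plain set of full ref names. `expandRefSketch` and `knownRefs` are hypothetical, and whether scan-git uses exactly this order is an assumption.

```javascript
// Prefixes tried in order, mirroring git's documented lookup rules for short names.
const REF_PREFIXES = ['', 'refs/', 'refs/tags/', 'refs/heads/', 'refs/remotes/'];
const SYMBOLIC_REFS = new Set([
    'HEAD', 'FETCH_HEAD', 'CHERRY_PICK_HEAD', 'MERGE_HEAD', 'ORIG_HEAD'
]);

// Hypothetical stand-in for ref expansion; `knownRefs` is a Set of full ref names.
function expandRefSketch(ref, knownRefs) {
    if (SYMBOLIC_REFS.has(ref)) {
        return ref; // symbolic names are returned unchanged
    }

    for (const prefix of REF_PREFIXES) {
        const candidate = prefix + ref;

        if (knownRefs.has(candidate)) {
            return candidate;
        }
    }

    // last rule: a remote name resolves to that remote's HEAD
    const remoteHead = `refs/remotes/${ref}/HEAD`;
    return knownRefs.has(remoteHead) ? remoteHead : null;
}
```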


repo.resolveRef(ref)

Resolves a ref into an oid if it exists, otherwise throws an exception. If ref is already an oid, it is returned as is. If ref is not a full path, it is expanded first.

const oid = repo.resolveRef('main');
// '8bb6e23769902199e39ab70f2441841712cbdd62'


repo.isRefExists(ref)

Checks if a ref exists.

const isValidRef = repo.isRefExists('main');
// true
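Because `resolveRef()` throws for a missing ref, callers that treat a missing ref as a normal case may prefer a null result. A minimal sketch combining the two methods above follows. `resolveRefOrNull` is a hypothetical helper, and it does not assume that `isRefExists()` accepts raw oids, so oids are better passed to `resolveRef()` directly.

```javascript
// Hypothetical wrapper: returns null instead of throwing for a missing ref.
// `repo` is any object exposing isRefExists() and resolveRef() as described above.
async function resolveRefOrNull(repo, ref) {
    // `await` works whether the methods return a plain value or a promise
    if (!(await repo.isRefExists(ref))) {
        return null;
    }

    return await repo.resolveRef(ref);
}
```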


repo.listRemotes()

Get a list of remotes.

const remotes = repo.listRemotes();
// [
//   'origin'
// ]

repo.listRemoteBranches(remote, withOid?)

Get a list of branches for a remote.

const originBranches = await repo.listRemoteBranches('origin');
// [
//   'HEAD',
//   'main'
// ]

const originBranches = await repo.listRemoteBranches('origin', true);
// [
//   { name: 'HEAD', oid: '7c2a62cdbc2ef28afaaed3b6f3aef9b581e5aa8e' },
//   { name: 'main', oid: '56ea7a808e35df13e76fee92725a65a373a9835c' }
// ]


repo.listBranches(withOid?)

Get a list of local branches.

const localBranches = await repo.listBranches();
// [
//   'HEAD',
//   'main'
// ]

const localBranches = await repo.listBranches(true);
// [
//   { name: 'HEAD', oid: '7c2a62cdbc2ef28afaaed3b6f3aef9b581e5aa8e' },
//   { name: 'main', oid: '56ea7a808e35df13e76fee92725a65a373a9835c' }
// ]


repo.listTags(withOid?)

Get a list of tags.

const tags = await repo.listTags();
// [
//   'v1.0.0',
//   'some-feature'
// ]

const tags = await repo.listTags(true);
// [
//   { name: 'v1.0.0', oid: '7c2a62cdbc2ef28afaaed3b6f3aef9b581e5aa8e' },
//   { name: 'some-feature', oid: '56ea7a808e35df13e76fee92725a65a373a9835c' }
// ]
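The `{ name, oid }` entries returned when `withOid` is set are easy to post-process, for example to find branches and tags that point at the same object. The sketch below only relies on the entry shape shown in the examples above; `groupNamesByOid` is a hypothetical helper.

```javascript
// Groups `{ name, oid }` entries (the `withOid` result shape) by oid.
function groupNamesByOid(entries) {
    const byOid = new Map();

    for (const { name, oid } of entries) {
        if (!byOid.has(oid)) {
            byOid.set(oid, []);
        }

        byOid.get(oid).push(name);
    }

    return byOid;
}

// Usage (assumes an opened `repo`):
// const byOid = groupNamesByOid([
//     ...await repo.listBranches(true),
//     ...await repo.listTags(true)
// ]);
```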

File lists


repo.treeOidFromRef(ref)

Resolve a tree oid by a commit reference.

  • ref: string (default: 'HEAD') – commit reference

const treeOid = await repo.treeOidFromRef('HEAD');
// 'a1b2c3d4e5f6...'

repo.listFiles(ref, filesWithHash)

List all files in the repository at the specified commit reference.

  • ref: string (default: 'HEAD') – commit reference
  • filesWithHash: boolean (default: false) – set to true to return blob hashes along with paths

const headFiles = repo.listFiles(); // the same as repo.listFiles('HEAD')
// [ 'file.ext', 'path/to/file.ext', ... ]

const headFilesWithHashes = repo.listFiles('HEAD', true);
// [ { path: 'file.ext', hash: 'f2e492a3049...' }, ... ]
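A plain path list such as the one returned by `listFiles()` is enough for simple repository statistics. As an illustration, the hypothetical helper below counts files per extension. It works on any array of path strings and makes no assumption about scan-git beyond the result shape shown above.

```javascript
// Counts paths per file extension; dotfiles and names without a dot
// are counted under '(none)'.
function countByExtension(paths) {
    const counts = {};

    for (const path of paths) {
        const name = path.slice(path.lastIndexOf('/') + 1);
        const dot = name.lastIndexOf('.');
        const ext = dot > 0 ? name.slice(dot) : '(none)';

        counts[ext] = (counts[ext] || 0) + 1;
    }

    return counts;
}

// Usage (assumes an opened `repo`):
// const stats = countByExtension(await repo.listFiles('HEAD'));
```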

repo.getPathEntry(path, ref)

Retrieve a tree entry (file or directory) by its path at the specified commit reference.

  • path: string - the path to the file or directory
  • ref: string (default: 'HEAD') - commit reference

const entry = await repo.getPathEntry('path/to/file.txt');
// { isTree: false, path: 'path/to/file.txt', hash: 'a1b2c3d4e5f6...' }

repo.getPathsEntries(paths, ref)

Retrieve a list of tree entries (files or directories) by their paths at the specified commit reference.

  • paths: string[] - an array of paths to files or directories
  • ref: string (default: 'HEAD') - commit reference

const entries = await repo.getPathsEntries([
    'path/to/file1.txt',
    'path/to/dir1',
    'path/to/file2.txt'
]);
// [
//   { isTree: false, path: 'path/to/file1.txt', hash: 'a1b2c3d4e5f6...' },
//   { isTree: true, path: 'path/to/dir1', hash: 'b1c2d3e4f5g6...' },
//   { isTree: false, path: 'path/to/file2.txt', hash: 'c1d2e3f4g5h6...' }
// ]
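The `isTree` flag makes it straightforward to separate directories from files in the result. A small sketch, using a hypothetical `partitionEntries` helper that relies only on the entry shape shown above:

```javascript
// Splits tree entries into file paths and directory paths using `isTree`.
function partitionEntries(entries) {
    const files = [];
    const dirs = [];

    for (const entry of entries) {
        (entry.isTree ? dirs : files).push(entry.path);
    }

    return { files, dirs };
}
```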

repo.deltaFiles(nextRef, prevRef)

Compute the file delta (changes) between two commit references, including added, modified, and removed files.

  • nextRef: string (default: 'HEAD') - commit reference for the "next" state
  • prevRef: string (optional) - commit reference for the "previous" state

const fileDelta = await repo.deltaFiles('HEAD', 'branch-name');
// {
//   add: [ { path: 'path/to/new/file.txt', hash: 'a1b2c3d4e5f6...' }, ... ],
//   modify: [ { path: 'path/to/modified/file.txt', hash: 'f1e2d3c4b5a6...', prevHash: 'a1b2c3d4e5f6...' }, ... ],
//   remove: [ { path: 'path/to/removed/file.txt', hash: 'a1b2c3d4e5f6...' }, ... ]
// }
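The delta object can be turned into a compact status listing similar to what `git diff --name-status` prints. The sketch below assumes only the `add` / `modify` / `remove` shape shown above; `formatDelta` is a hypothetical helper.

```javascript
// Formats a delta as "A path", "M path" and "D path" lines.
function formatDelta(delta) {
    return [
        ...delta.add.map((entry) => `A ${entry.path}`),
        ...delta.modify.map((entry) => `M ${entry.path}`),
        ...delta.remove.map((entry) => `D ${entry.path}`)
    ];
}

// Usage (assumes an opened `repo`):
// console.log(formatDelta(await repo.deltaFiles('HEAD', 'branch-name')).join('\n'));
```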




repo.log(options)

Return a list of commits in topological order.

  • ref – a commit reference: an oid (hash) or a ref name
  • depth (default 50) – limits the number of commits returned

const commits = await repo.log({ ref: 'my-branch', depth: 10 });
// [
//     Commit,
//     Commit,
//     ...
// ]

Note: Pass Infinity as depth value to load all the commits that are reachable from ref at once.
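When only a lower bound on history size is needed, a bounded `depth` avoids loading every commit. The hypothetical helper below relies solely on `depth` limiting the number of returned commits, as described above.

```javascript
// Hypothetical helper: true if at least `count` commits are reachable from `ref`.
// Loads at most `count` commits instead of the whole history.
async function hasAtLeastCommits(repo, ref, count) {
    const commits = await repo.log({ ref, depth: count });

    return commits.length >= count;
}
```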

Statistics & info


repo.readObjectByHash(hash, cache?)


repo.readObjectByOid(oid, cache?)


repo.stat()

Returns statistics for a repo:

const stats = await repo.stat();
// {
//     refs: { ... },
//     objects: {
//         loose: { ... },
//         packed: { ... }
//     }
// }









scan-git isomorphic-git Feature
loose refs
packed refs
🚫 index file: speeds up fetching a file list for HEAD
loose objects
packed objects (*.pack + *.idx files)
🚫 2Gb+ packs support: version 2 pack-*.idx files support packs larger than 4 GiB by adding an optional table of 8-byte offset entries for large offsets
🚫 On-disk reverse indexes (*.rev files): a reverse index speeds up operations such as seeking an object by offset or scanning objects in pack order
🚫 🚫 multi-pack-index (MIDX): stores a list of objects and their offsets into multiple packfiles; can provide O(log N) lookup time for any number of packfiles
🚫 🚫 multi-pack-index reverse indexes (RIDX): similar to the pack-based reverse index
🚫 Cruft packs: a cruft pack eliminates the need to store unreachable objects in a loose state by including the per-object mtimes in a separate file alongside a single pack containing all loose objects
🚫 🚫 Pack and multi-pack bitmaps: bitmaps store reachability information about the set of objects in a packfile or a multi-pack index
🚫 (TBD) 🚫 commit-graph: a binary file format that stores a structured representation of Git's commit history and speeds up some operations
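To make the large-offset entry in the list above more concrete: in a version 2 pack index, each object has a 4-byte offset, and when its top bit is set, the remaining 31 bits index into the optional table of 8-byte offsets. The sketch below decodes an offset that way using Node.js buffers. The function and parameter names are illustrative and are not taken from scan-git.

```javascript
// Sketch of pack-*.idx v2 offset decoding (names are illustrative).
// `offsetTable` holds 4-byte big-endian entries; `largeOffsetTable` holds the
// optional 8-byte entries used when the top bit of a 4-byte entry is set.
function readObjectOffset(offsetTable, largeOffsetTable, index) {
    const raw = offsetTable.readUInt32BE(index * 4);

    if ((raw & 0x80000000) === 0) {
        return raw; // plain 31-bit offset into the pack
    }

    // the low 31 bits are an index into the 8-byte table
    const largeIndex = raw & 0x7fffffff;

    // Number() is exact only up to 2^53 - 1, which is ample for real packs
    return Number(largeOffsetTable.readBigUInt64BE(largeIndex * 8));
}
```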
