transit

An attempt to track code moves over the life of a project.

MIT License

[Class Project]

Project Question

As developers, we commonly restructure our code. This is usually done in one commit (otherwise it's sloppy). Can we track when different parts of codebases undergo "movement" during refactoring?

Methodology

Given a functioning Git repository, this tool attempts to:

  1. Analyze each diff.
  2. Match deletions with additions that have the same code signature. Such a match corresponds to a 'code move'.
  3. Ideally, account for relevant variable name changes, so that a move is still detected when variables are renamed.

Accounting for changes in variable names is not yet fully implemented; see Threats to Validity for the current, partial support in Rust files.
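Step 2 can be pictured as a lookup from "signature" to added code. The following is a minimal sketch, not transit's implementation: it assumes a diff has already been split into blocks of consecutive deleted and added lines (the `Block` and `Move` types are hypothetical), uses the whitespace-trimmed text of a block as its signature, and does not handle renamed variables (step 3). It targets current stable Rust rather than the pinned 2015 nightly.

```rust
use std::collections::HashMap;

/// A contiguous run of lines removed from, or added to, one file in a diff.
/// Hypothetical type for illustration; not transit's data model.
#[derive(Debug)]
struct Block {
    file: String,
    lines: Vec<String>,
}

/// A detected move: a deleted block whose content reappears as an added block.
#[derive(Debug, PartialEq)]
struct Move {
    from_file: String,
    to_file: String,
    len: usize,
}

/// Naive signature: each line trimmed of surrounding whitespace, then joined.
fn signature(block: &Block) -> String {
    block
        .lines
        .iter()
        .map(|l| l.trim())
        .collect::<Vec<_>>()
        .join("\n")
}

/// Pair each deletion with an unused addition that has the same signature.
fn find_moves(deletions: &[Block], additions: &[Block]) -> Vec<Move> {
    // Index additions by signature; each addition is matched at most once.
    let mut by_sig: HashMap<String, Vec<&Block>> = HashMap::new();
    for add in additions {
        by_sig.entry(signature(add)).or_default().push(add);
    }
    let mut moves = Vec::new();
    for del in deletions {
        if let Some(candidates) = by_sig.get_mut(&signature(del)) {
            // Consume one matching addition so it cannot be reused.
            if let Some(add) = candidates.pop() {
                moves.push(Move {
                    from_file: del.file.clone(),
                    to_file: add.file.clone(),
                    len: del.lines.len(),
                });
            }
        }
    }
    moves
}

fn main() {
    let deletions = vec![Block {
        file: "src/lib.rs".into(),
        lines: vec!["fn helper() {".into(), "    42".into(), "}".into()],
    }];
    // Same code, re-indented, in another file.
    let additions = vec![Block {
        file: "src/util.rs".into(),
        lines: vec!["fn helper() {".into(), "  42".into(), "}".into()],
    }];
    for m in find_moves(&deletions, &additions) {
        println!("{} -> {} ({} lines)", m.from_file, m.to_file, m.len); // prints "src/lib.rs -> src/util.rs (3 lines)"
    }
}
```

Indexing additions in a `HashMap` keeps matching roughly linear in the number of blocks, at the cost of only finding exact (post-normalization) matches.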

Because our deliverable is a tool, we did not gather significant outside metrics. Instead, we generated several test repositories to verify that the tool works.

Installing transit

These instructions are for Linux or macOS and require root access. Windows users are, unfortunately, on their own.

You will need the Rust compiler:

curl -L https://static.rust-lang.org/rustup.sh -o rustup
chmod +x rustup
sudo ./rustup --channel=nightly --date=2015-04-16

This should install cargo and rustc. Currently, transit does not track master, so we have included an appropriate Cargo.lock.

Clone the repository and build it:

git clone git@github.com:Hoverbear/transit.git && \
cd transit && \
cargo build --release

You can now run the binary on any repository, even transit's own! It outputs JSON.

./target/release/transit .

Or view the output in a fancy web interface (recommended):

./target/release/transit --web=8080

Then visit localhost:8080, enter . into the Repository field, and hit the button. After a moment, you should see some graphs.

Metrics

Along a revwalk (a walk through the commit history), we tracked the number of lines added and deleted, and algorithmically counted one specific type of refactoring: code moves. We developed a tool and visualization software to do this.
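The revwalk itself is not shown here. As a sketch, assuming each commit's changes are available as unified diff text (as printed by `git diff -p`), the per-commit added and deleted line counts can be tallied as follows. This is an illustration, not transit's code; transit may compute these counts differently.

```rust
/// Count (added, deleted) content lines in unified diff text such as the
/// output of `git diff -p`. Only lines inside a hunk (after an "@@" header)
/// are counted, so the "--- a/..." and "+++ b/..." file headers are skipped.
/// Assumes each file section starts with a "diff --git" line, as git emits.
fn count_changes(diff: &str) -> (usize, usize) {
    let (mut added, mut deleted) = (0, 0);
    let mut in_hunk = false;
    for line in diff.lines() {
        if line.starts_with("diff --git") {
            in_hunk = false; // a new file section begins; headers follow
        } else if line.starts_with("@@") {
            in_hunk = true; // hunk header: content lines follow
        } else if in_hunk {
            if line.starts_with('+') {
                added += 1;
            } else if line.starts_with('-') {
                deleted += 1;
            }
            // ' ' (context) and '\' ("No newline at end of file") lines are ignored.
        }
    }
    (added, deleted)
}

fn main() {
    // One commit's diff; along a revwalk this would run once per commit.
    let diff = [
        "diff --git a/src/lib.rs b/src/lib.rs",
        "--- a/src/lib.rs",
        "+++ b/src/lib.rs",
        "@@ -1,3 +1,4 @@",
        " fn main() {",
        "-    println!(\"old\");",
        "+    println!(\"new\");",
        "+    println!(\"extra\");",
        " }",
    ]
    .join("\n");
    let (added, deleted) = count_changes(&diff);
    println!("+{} -{}", added, deleted); // prints "+2 -1"
}
```

Tracking whether we are inside a hunk, rather than just skipping lines that start with "+++" or "---", avoids miscounting a deleted line whose own content begins with "--".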

Results

We ran transit against the following repositories:

The outputs are stored in ./examples_runs/.

Analysis

capnproto-rust

Due to the length of the output, the results from capnproto-rust are stored in a JSON file.

Transit found 52 moves in this repository, 30 of which were single-line moves.

rust-url

Transit found 8 moves in this repository, 1 of which was a single-line move. Most of these moves spanned 100 or more lines of code.

On closer inspection, the 3-line move in commit https://github.com/servo/rust-url/commit/a1fdd28ec7761777c6d075bfe9974150a24c4d34 is actually a change in logic rather than a pure move.

git2-rs

Transit found 7 moves in this repository, 2 of which were single-line moves.

connect

Transit found 91 moves in this repository, 42 of which were single-line moves.
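The per-repository figures above are simple tallies over the sizes of detected moves. A minimal sketch, assuming each move has already been reduced to its line count (a hypothetical input; transit itself emits JSON), with made-up lengths rather than data from these repositories:

```rust
/// Tallies over detected moves, mirroring the counts reported per repository.
#[derive(Debug, PartialEq)]
struct MoveSummary {
    total: usize,
    single_line: usize,
    large: usize, // moves spanning 100 or more lines
}

/// `move_lengths` holds the number of lines in each detected move.
fn summarize(move_lengths: &[usize]) -> MoveSummary {
    MoveSummary {
        total: move_lengths.len(),
        single_line: move_lengths.iter().filter(|&&n| n == 1).count(),
        large: move_lengths.iter().filter(|&&n| n >= 100).count(),
    }
}

fn main() {
    // Illustrative lengths only; not results from any analyzed repository.
    let lengths = [1, 3, 120, 1, 250];
    let s = summarize(&lengths);
    // prints "5 moves, 2 single-line, 2 with 100+ lines"
    println!(
        "{} moves, {} single-line, {} with 100+ lines",
        s.total, s.single_line, s.large
    );
}
```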

Hyper

Hovering over a data point on a graph shows a tooltip with detailed numbers, and clicking the tip of a data point displays an alert with its commit IDs for later examination with git diff -p $OLD_ID $NEW_ID.

Overall

Transit successfully detects code moves in the repositories we analyzed.

Some of the detected moves were not simple refactorings but changes that altered the logic of the analyzed programs. Note that, beyond our small test data, we did not measure what percentage of moves transit failed to detect.

Project Management

Team Member     GitHub Account
Andrew Hobden   @Hoverbear
Brody Holden    @BrodyHolden
Fraser DeLisle  @fraserd

Milestone 1

Date          Task                                            Complete
February 3    Initial prototype of project system             Yes
February 10   Well-defined project output                     Yes
February 12   Feature freeze                                  Yes
February 17   Complete refactor identification functionality  Yes
February 19   Complete testing & release version 1.0          Yes
February 21   Complete analysis of target codebases           Yes
February 21   Document findings                               Yes
February 22   Finalized report                                Yes
February 23   Submit final project                            Yes

A breakdown of which tasks were completed by which team members is tracked in issue #2.

Milestone 2

Work completed for this milestone was tracked here by issue, and here by task and owner.

Threats to Validity

Transit does not detect all possible code moves. It currently uses two approaches:

  • For Rust files, detect moves where variables were renamed but no other code changed. See issue #14 for discussion of accuracy.
  • For all other file types, strip whitespace before comparing code.
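The two normalizations above can be sketched as string transformations applied before comparing a deletion with an addition. This is a hypothetical approximation, not transit's code: `strip_whitespace` follows the second bullet directly, while `canonicalize_identifiers` is one assumed way to ignore consistent variable renames (replacing each non-keyword identifier with a placeholder numbered by first appearance). It is deliberately naive, which is one source of the inaccuracy discussed in issue #14.

```rust
use std::collections::HashMap;

/// Non-Rust files: remove all whitespace so that re-indented or re-wrapped
/// code still compares equal.
fn strip_whitespace(code: &str) -> String {
    code.chars().filter(|c| !c.is_whitespace()).collect()
}

/// Hypothetical rename-tolerant normalization for Rust snippets. Each
/// non-keyword identifier becomes a placeholder numbered by first appearance
/// (v0, v1, ...), so consistently renamed code normalizes identically.
/// Naive: it also renames function and type names, and ignores string and
/// comment boundaries.
fn canonicalize_identifiers(code: &str) -> String {
    const KEYWORDS: &[&str] = &[
        "as", "break", "const", "continue", "else", "enum", "false", "fn", "for", "if",
        "impl", "in", "let", "loop", "match", "mod", "move", "mut", "pub", "ref",
        "return", "self", "Self", "static", "struct", "trait", "true", "type", "use",
        "where", "while",
    ];
    let mut names: HashMap<String, usize> = HashMap::new();
    let mut out = String::new();
    let mut chars = code.chars().peekable();
    while let Some(c) = chars.next() {
        if c.is_alphabetic() || c == '_' {
            // Read the rest of the identifier (letters, digits, underscores).
            let mut ident = c.to_string();
            while let Some(&next) = chars.peek() {
                if next.is_alphanumeric() || next == '_' {
                    ident.push(next);
                    chars.next();
                } else {
                    break;
                }
            }
            if KEYWORDS.contains(&ident.as_str()) {
                out.push_str(&ident);
            } else {
                let fresh = names.len();
                let id = *names.entry(ident).or_insert(fresh);
                out.push_str(&format!("v{}", id));
            }
            out.push(' '); // keep adjacent tokens from running together
        } else if !c.is_whitespace() {
            out.push(c); // punctuation, operators, and digits are kept as-is
        }
    }
    out
}

fn main() {
    // Same structure, different variable names: treated as equivalent.
    let before = canonicalize_identifiers("let total = total + step;");
    let after = canonicalize_identifiers("let sum = sum + delta;");
    println!("{}", before == after); // prints "true"
    println!("{}", before); // prints "let v0 =v0 +v1 ;"
    // Whitespace-only differences vanish after stripping.
    println!("{}", strip_whitespace("fn f() {\n    1\n}") == strip_whitespace("fn f() { 1 }")); // prints "true"
}
```

Because placeholders are assigned by order of first use, swapping which variable appears where (`a + b` versus `b + a`) still produces different normalized text, so only consistent renames are ignored.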

The moves we do detect may be false positives. This is expected, given the imprecise nature of working with diffs and the naivety of our algorithm.

Future Work

  • Further Language Support (See issue #13 for discussion)
  • Streamed Results
  • More accurate results

Resources