Dealing with large Pull Requests


Before anything else: I am not going to explain in depth why large pull requests are so detrimental. Other articles already do that very well; see the external links below.

In short, they take a long time to review, reviewers more easily miss parts of the change or even bugs, they may block other pieces of work, and so on.

Do not make large pull requests

Well, that's it, problem solved, just don't make large pull requests!

The problem is, it is easier said than done, and especially in a monorepo.

A monorepo brings dependencies closer to each other. It is easier to work on them at the same time, which drastically reduces workflow time. It is therefore easy to create larger pull requests, and if you don't, you may lose some of the monorepo benefits.

So should we create larger pull requests in a monorepo?

The answer is still no. Large pull requests are so detrimental, especially when you work with multiple teams and more developers, that you should still avoid them at any cost.

How to prevent large pull requests

Large pull requests often originate in the planning phase, for example when you meet with your team to create the tickets to work on.

Try to reduce the size and scope of the tickets, which has a good chance of reducing the size of the pull requests. If a feature is incomplete and not releasable, it is worth spending a bit more time hiding it behind a feature flag and continuing the work in another pull request.
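As a rough illustration of the feature flag idea, here is a minimal sketch. The flag store, the flag name `newCheckout`, and the `renderCheckout` function are all hypothetical; a real project would more likely read flags from configuration or a dedicated service.

```javascript
// Hypothetical in-memory flag store; real flags usually come from config.
const featureFlags = { newCheckout: false };

const isEnabled = (flag) => featureFlags[flag] === true;

// The unfinished feature is merged but stays unreachable until the flag is on.
const renderCheckout = () =>
  isEnabled('newCheckout') ? 'new checkout (work in progress)' : 'current checkout';

console.log(renderCheckout());
```

With the flag off, the incomplete code can be merged in small pull requests without being exposed to users.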

How to reduce large pull requests

Something I learned in software development: reduce the human factor. The idea is nothing new: the paper Human Factors in Software Engineering dates from 1979.

I would even say: automate as much as you can!

To drastically reduce large pull requests, let's automate a process that tells our beloved developers that they should probably break down their beautiful work into several pull requests.

Steps to automate a warning

There are several ways to do that. It will depend on the tools you are using.

However, I can give some directions for git and GitHub.

The steps:

  • Find out if a pull request is large
  • Notify the developers in their pull requests

Notify the developers

It is good practice to work backward: when creating a new feature, always start from the user's perspective. So we will start with how to notify the developers.

You could call the GitHub API directly, triggered by GitHub webhooks; that would work just fine.
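For reference, a minimal sketch of that direct route, assuming Node 18+ (for the global `fetch`) and a token that is allowed to comment on the repository. The helper names and their parameters are hypothetical; the URL is GitHub's REST endpoint for issue comments, which also serves pull requests.

```javascript
// Build the HTTP request for a pull request comment.
// GitHub treats pull requests as issues for commenting purposes.
const buildCommentRequest = ({ owner, repo, prNumber, token, body }) => ({
  url: `https://api.github.com/repos/${owner}/${repo}/issues/${prNumber}/comments`,
  options: {
    method: 'POST',
    headers: {
      Accept: 'application/vnd.github+json',
      Authorization: `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ body }),
  },
});

// Send the comment; rejects if GitHub answers with a non-2xx status.
const postComment = async (params) => {
  const { url, options } = buildCommentRequest(params);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`GitHub API responded with ${response.status}`);
  }
  return response.json();
};
```

A webhook handler would call `postComment` with the repository details taken from the event payload.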

Here is an example with DangerJS (JavaScript) and a GitHub Action.

  • Install DangerJS: npm install danger --save-dev
  • Create a GitHub Action workflow file: .github/workflows/danger.yml

You will need to authenticate with a GITHUB_TOKEN, and note that we also install execa.

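name: "Danger JS"

on: [pull_request]

jobs:
  build:
    name: Danger JS
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v1
      # execa is used by dangerfile.js to run the shell script
      - run: npm install execa
      - name: Danger
        uses: danger/danger-js@9.1.6
        env:
          # Assumes the token GitHub provides to every workflow run;
          # swap in your own secret if you use a dedicated one.
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          DANGER_DISABLE_TRANSPILATION: true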

Find out if a pull request is large

  • Create a DangerJS file: dangerfile.js

This function, executed by DangerJS, runs a custom shell script to retrieve the information we need from git. The sum of git insertions and git deletions is then compared to a threshold value of our choice.

const execa = require('execa');

// Danger renders the message as HTML in the pull request comment.
const formatMessage = (title, body) => `<b>${title}</b><br/>${body}`;

(() => {
  // Maximum number of changed lines (insertions + deletions) before warning.
  const CHANGES_THRESHOLD = 800;
  const pr = danger.github.pr;
  const base = pr.base ? pr.base.ref : 'master';
  // The script prints "<insertions>,<deletions>" as its last line of stdout;
  // earlier lines (such as the git version) are ignored.
  const lastLine = execa
    .commandSync(`./scripts/git/get-changes.sh ${base}`)
    .stdout.trim()
    .split('\n')
    .pop();
  const [insertions, deletions] = lastLine.split(',');
  const changes = Number(insertions) + Number(deletions);
  if (changes > CHANGES_THRESHOLD) {
    warn(
      formatMessage(
        'Pull Request size',
        `This is a big Pull Request, we found ${changes} changes
(additions and deletions).<br/>
The threshold is currently ${CHANGES_THRESHOLD}, we strongly advise that you
break down this Pull Request into smaller ones to ease the review and merging process.`
      )
    );
  }
})();
  • Create a script file: scripts/git/get-changes.sh

This script retrieves two commit SHAs: the latest from the pull request branch and the latest from the base branch.
A diff is then created between the two commits. Note that we exclude the package-lock.json file, as lock files can be large and misleading.

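#!/usr/bin/env sh

# Log the git version to stderr so it does not pollute the parsed output.
git --version >&2

CURRENT_COMMIT=$(git rev-parse HEAD)
BASE_COMMIT=$(git rev-parse "$1")
# Git omits the insertion or deletion part when its count is zero,
# so each number is extracted separately and defaults to 0.
STAT=$(git diff --shortstat "$CURRENT_COMMIT" "$BASE_COMMIT" ':(exclude)package-lock.json')
INSERTIONS=$(echo "$STAT" | sed -nE 's/.* ([0-9]+) insertion.*/\1/p')
DELETIONS=$(echo "$STAT" | sed -nE 's/.* ([0-9]+) deletion.*/\1/p')
echo "${INSERTIONS:-0},${DELETIONS:-0}"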
  • Make the script file executable: chmod +x scripts/git/get-changes.sh
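One detail worth keeping in mind: git drops the insertion or deletion part of the `--shortstat` line when its count is zero. As an alternative to parsing with sed, the line could be parsed in JavaScript. The sketch below is only an illustration, and the `parseShortstat` helper is a hypothetical name, not part of DangerJS.

```javascript
// Parse one line of `git diff --shortstat` output, for example
// " 3 files changed, 10 insertions(+), 5 deletions(-)".
// A missing part (git omits zero counts) is reported as 0.
const parseShortstat = (output) => {
  const count = (word) => {
    const match = output.match(new RegExp(`(\\d+) ${word}`));
    return match ? Number(match[1]) : 0;
  };
  return { insertions: count('insertion'), deletions: count('deletion') };
};

const { insertions, deletions } = parseShortstat(' 1 file changed, 2 insertions(+)');
console.log(insertions + deletions);
```

The dangerfile could then run `git diff --shortstat` through execa and feed the raw output to this helper.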

Conclusion

You now have a GitHub Action verifying your pull request size. You could even make it a blocker for merging if you so wish.
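One possible way to get there, sketched under assumptions: keep the warning at the current threshold and add a second, higher threshold above which Danger's `fail` is called instead of `warn`, which makes the check fail. Both threshold values and the `sizeLevel` helper are hypothetical, and actually blocking the merge would also require marking the check as required in the repository's branch protection settings.

```javascript
// Hypothetical thresholds; tune them to your team.
const WARN_THRESHOLD = 800;
const FAIL_THRESHOLD = 2000;

// Decide how to react to a given number of changed lines.
const sizeLevel = (changes) => {
  if (changes > FAIL_THRESHOLD) return 'fail';
  if (changes > WARN_THRESHOLD) return 'warn';
  return 'ok';
};

// Inside dangerfile.js, where Danger provides `warn` and `fail` as globals:
//   const level = sizeLevel(changes);
//   if (level === 'fail') fail(`This Pull Request has ${changes} changes, please split it.`);
//   if (level === 'warn') warn(`This Pull Request has ${changes} changes.`);
console.log(sizeLevel(1200));
```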

External links

Anatomy of a perfect pull request