Want to quickly become a better developer? Learn to leverage commit messages.

The following post is about commits. I decided to write it, even though many smart people have done it before, as I believe this topic deserves a great deal of attention. Here is my plan.

First, I’ll say that an average developer has no clue about how to use Git to collaborate. Then, I’ll blame it on Git tutorials. Finally, I’ll share my experiences to help you get past the basics and become the developer every team needs. Sounds good? Let’s start.

So You Commit

Writing this post, I feel like an old guy screaming: “You kids get off my lawn!” Have you ever walked across a beautiful green patch without noticing that you were treading on it? When your mind is focused solely on getting somewhere, it doesn’t bother to notice the surroundings. But if you try to grow your own perfect lawn, you are ready to throw stones at dogs that dare look at it.

When it comes to commit messages, the perception span is exactly the same: ranging from developers oblivious to a project’s history to those obsessed with it.

It’s hard to give an example of a bad workflow, but let me sketch one out. Just keep in mind that the very same commands will be completely valid in a different context.

Let’s say, this is your regular day at work:

$ git checkout -b my-dev-branch
$ vim broken-file
$ vim another-file
$ vim is-this-broken-too
$ git add -p
$ git commit -m "Fix synchronization"
$ git rm old-file
$ git commit -m "Removed old code"
$ git push --set-upstream origin my-dev-branch 

And then you create a pull request to merge my-dev-branch to master.

If you do that, you are the person who creates commits because that’s what it takes to push changes.

There is nothing wrong with these commands. The point is, these are all the commands. The important piece can’t be seen here because it’s missing.

The Way You Were Told

And that example is also what you’ll learn when reading about Git. See for yourself:

https://www.atlassian.com/git/tutorials/saving-changes

http://www-cs-students.stanford.edu/~blynn/gitmagic/ch02.html#_saving_state

https://git-scm.com/docs/git-commit

All great sources. All worth reading. But all of them teach you only the mechanics of creating commits.

Materials like that are aimed at beginners. Some, like Git Magic, even explain what a commit and a branch are, and why we need version control in the first place. Awesome stuff.

You don’t need to know more to make that damned commit and push it to master, but…

These are introductory level materials. You’re past that. You’re now on a lifelong journey to become a professional developer. And to be a pro, you need to learn. Never forget that everything you do can be done better. Folding a shirt is a great example. Just imagine how many things there are in your daily life that normally take you 30 seconds although they probably could be done in just 2.

So, back to Git. You know the basics. Why not read something more advanced? Start with Pro Git. I assure you, after having read it you will begin to notice people who haven’t.

Yet I Need More

What matters most in the example I gave earlier is the mindset reflected in those commands.

You’re implementing new cool things and you’re fixing bugs. Projects are moving fast. Initially, all seems fine. But after a year the colleagues that were there before you move on to a different project and a lot of knowledge leaves with them. Work slows down a little. You stumble upon code sections you’re not comfortable changing and instead you come up with workarounds. Quality of the codebase starts to drop. Eventually, you decide to rewrite some major parts.

Sounds familiar? It sure happened to me a lot.

When you’re focused on making changes and pushing them to master, you’re moving from one place to another. You’re moving through your sprint. You’re focused on single tasks and there’s a risk that you won’t notice treading on a well-maintained lawn.

The important aspects that you miss when you work like that are the long-term maintainability and the day-to-day maintenance of your codebase. It’s a cliché, but one that makes all the difference.

Once you realize that your project will be in place for years and that you will work on it for years, you begin to see things in a different light. Kids on your lawn start to bother you. And then one day you come across something like this in your code:

delay = 120

And so you start to wonder. Everywhere else it reads 100. Why 120? Maybe the person who wrote it remembers? Well, can’t ask them now, they’re no longer here. But by now, you’ve read a bit about Git, so you know how to figure that out.

$ git blame broken-file

Yes! You have all the answers. Oh, wait. Shit. It was your commit. All it says is: “Fix synchronization”. But that synchronization service was shut down 2 years ago. And why 100+20? Or is it 2*60? Can you change it to 100? Why is the delay there anyway?

You change it to 100. Everything seems to work. You remove it. Things work just fine. A few days later an alert wakes you up. An hour passes and suddenly you remember. That cron job that fires on Saturdays causes extra load so once you decided to hack around it by delaying your processing. Heh, let’s revert.

$ vim broken-file
$ git add -p
$ git commit

Today you’re a different person. You know that in a few months you may forget what this line is all about. You also know that your colleagues may ask you about some old code and you will need to quickly recall such details. This time you write this:

Delay processing to account for machine load

When processing starts without a delay it can fail when machine is under heavy
load. On Saturdays we experience issues (requests get empty responses) while
backup jobs are running.

This commit introduces a delay that mitigates the problem by allowing our tasks
to de-synchronize by 120 ms. This is a hackish workaround but it's middle of the
night and customers are in pain. It will actually work just fine, it's just not
very elegant.

It would be better to notify our process once backup is done and handle enqueued
requests then.
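
A message like this only pays off if a later reader can get back to it from the code. Here is a minimal sketch of that round trip in a throwaway repository. The file name worker.conf, the committer identity and the shortened message body are assumptions made for illustration; in the story above the file was simply broken-file.

```shell
#!/bin/sh
set -eu

# Throwaway repository so the sketch never touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
# Hypothetical identity, set locally so committing works on a fresh machine.
git config user.name "Example Dev"
git config user.email "dev@example.com"

# worker.conf is a hypothetical stand-in for the file from the story.
printf 'delay = 100\n' > worker.conf
git add worker.conf
git commit -q -m "Add worker configuration"

printf 'delay = 120\n' > worker.conf
git add worker.conf
# -F - reads the whole message (subject, blank line, body) from stdin.
git commit -q -F - <<'EOF'
Delay processing to account for machine load

When processing starts without a delay it can fail when machine is under
heavy load. This commit introduces a delay that mitigates the problem.
EOF

# Months later: which commit last touched line 1 of the file?
# With --porcelain, the first field of the first output line is the hash.
sha=$(git blame --porcelain -L 1,1 worker.conf | head -n 1 | cut -d ' ' -f 1)

# Read the full message of that commit, not just the one-line summary.
subject=$(git log -1 --format=%s "$sha")
body=$(git log -1 --format=%b "$sha")
printf '%s\n\n%s\n' "$subject" "$body"
```

In day-to-day work the same thing is usually two commands typed by hand: git blame on the file, then git show on the hash it reports.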

To Be Happy

The moment you start writing your commit messages like you’re writing a letter to someone from a different project, you become a better developer. It means your approach to code has matured.

You’re now working on a living codebase. Although it keeps changing over the years, everybody knows what’s going on or can figure it out by checking the project’s history. You can use git bisect to find the commits that introduce problems, and you can read those commits’ messages to understand which part of a change is essential and which part can be fixed. You no longer introduce a new bug while fixing another. You no longer need to ask others for advice because all the information you need is in the commit message. Investing the time to write these messages saves you and your teammates an insane number of hours over the lifetime of your project. And with this regained time on your hands you can create even more value for your customers!

You can collaborate not only with your teammates but also with all the people who worked on the same project before you and those who will come after you. You can put “working in everlasting teams” in your CV.

Background

I learned to appreciate a project’s history by discovering git bisect. Since then, I’ve used it a lot and it’s saved me countless hours I would otherwise have spent debugging. It is one of the best tools for finding the cause of a regression in a big system you’re not familiar with, or in a project with many contributors.

This command is very powerful, but you need to lay the groundwork. Using bisect requires commits that properly isolate individual changes. It requires a sensible merge tree. It requires a team that commits changes that build and pass tests.

Finally, once you manage to find the offending commit, you realize that it also requires a team of people who write good commit messages.
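
To make those requirements concrete, here is a rough sketch of an automated bisect in a throwaway repository. Everything in it is made up for illustration: the files delay.txt and history.txt, the commit subjects, and the one-line grep check that stands in for a real test suite. The point is only that bisect run can work because each commit is small and the check gives a clear pass or fail on every one of them.

```shell
#!/bin/sh
set -eu

# Throwaway repository with a hypothetical five-commit history.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# The third commit introduces the regression: the delay drops from 120 to 0.
for n in 1 2 3 4 5; do
    if [ "$n" -ge 3 ]; then echo 0 > delay.txt; else echo 120 > delay.txt; fi
    echo "change $n" >> history.txt
    git add delay.txt history.txt
    if [ "$n" -eq 3 ]; then
        git commit -q -m "Remove processing delay"
    else
        git commit -q -m "Unrelated change $n"
    fi
done

# The oldest commit is known to be good, the newest is known to be bad.
first=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$first" >/dev/null

# The check exits 0 on good commits and 1 on bad ones; git drives the search.
git bisect run sh -c 'grep -qx 120 delay.txt' >/dev/null

# After a finished run, refs/bisect/bad points at the first bad commit.
culprit=$(git rev-parse refs/bisect/bad)
culprit_subject=$(git log -1 --format=%s "$culprit")
git bisect reset >/dev/null 2>&1

printf 'First bad commit: %s\n' "$culprit_subject"
```

Once bisect hands you the hash, the commit message is the next thing you read, which is where the previous section comes in.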

Those Other Blog Posts

I’ll just mention the two texts I liked most. Want more? Just google “commit message guidelines”.

http://chris.beams.io/posts/git-commit/

Great post. First hit on Google. The only thing I don’t like about it is its structure. The 7th point is the most important one. All the rest is style. If your message lacks content, its formatting doesn’t matter.

https://wiki.openstack.org/wiki/GitCommitMessages

Good read because it talks about commit messages from a code reviewer’s point of view. And the code reviewer may be the only person who ever sees your commit and needs to understand it just as well as you do. A commit message should become a communication tool that speeds up reviews.

Posted by

Tomek Rydzyński

  • Mateusz Herych

    Great post, I agree with it. Thanks.

    However, it’s worth noting that sometimes even awesome commit messages just aren’t enough. Blame can be polluted by many things over time: a basic variable rename, a reorder, or a massive refactoring that affects wide ranges of the codebase. Sometimes you can get by with -C or -M; other times they won’t help much.

    That’s why I tend more and more often to add a piece of documentation in such places, and the delay constant mentioned in the article would probably qualify. A nice, descriptive comment never hurt anyone, and it’s much harder to bury than a commit message.

    • Tomek Rydzyński

      Good point! Configuration settings or magic constants always deserve a comment. I didn’t look at it from that perspective and missed the fact that my example change got so small it fell into that category. Thanks for pointing that out.

  • Max

    Nice article. Thank you.

    I have a question. When you work on a feature, do you tend to create one big commit with a complete description and all the changed or added files, or do you make several smaller commits?

    • Tomek Rydzyński

      Hi Max, thanks for asking. This question deserves a post (a book even) on its own. I’ll tell you what I do, but other teams may be happy with other rules. Briefly speaking, I make my commits in a “Review Driven” manner.

      Generally, I prefer smaller commits, but I define size in terms of complexity, not the number of affected lines or files. Renaming `config` to `configuration` may require changing 10k lines in your project. That’s a big commit, but a very simple one. Changing the graph traversal order may require changing 5 lines of code, but understanding such a change requires a lot of attention and thought from the reviewer.

      Each commit should be valuable on its own and directly related changes should be in one commit. This way a reviewer can focus on one thing and see all changes that this one thing needed. Also, should things go wrong, this way we can revert the whole change at once.

      We use pull requests for reviews, and in one pull request you can have many commits. So reviewers can watch changes the way they like.

      I may have several commits for a number of reasons. Some things usually go into separate commits, like code formatting or one refactoring step.

      Personally I hate reviewing commits where I need to look for the change among 100 lines that only introduce line breaks. It’s easy to start scrolling down mindlessly and miss a line that was not only formatted but also changed in another way. So I don’t do such commits myself.

      And finally, this all applies to commits that I show to other people in a pull request. When I’m on my personal branch, I commit early and often. I don’t write commit messages, and I don’t bother if tests don’t pass. But I use `git rebase -i` a lot while I work to reorder and squash commits, so while the recent commits may be ugly, the older ones are slowly being shaped into a civilized form.

      Does it make sense?

      • Max

        Yes it does, thank you :)

        The reason I asked is that I know guys who follow this rule: one feature -> one branch -> one commit -> one pull request. So all the changes end up in one commit.
        Maybe it is not the best solution, but it is quite simple to follow. All the changes in one commit and one pull request.
        As you mentioned, the downside is having a long list of changes if it is a big commit. It is difficult to review and be focused.

        But I prefer small commits. On another project I want to try making small commits and then opening a pull request with them. I think it will be easier to review.

        “Each commit should be valuable on its own and directly related changes should be in one commit.” This is a really nice quote :)