Git Commit Creation
This is an article in which I explore the details and thinking that go into how you should create git commits, and why. I like to think of it as the article I wish had existed when I was just starting out over 20 years ago.
I wanted to cover, at a high level, all the things that you should think about. That way it can at least work as an entry point to deeper exploration of particular areas if the reader isn’t completely sold or just wants to gain a deeper understanding. At the same time, I tried to provide enough detail to show why and how these choices are valuable. This is always a tricky balance.
Anyways, I would love any feedback or thoughts on how this could be improved.
Thanks
Good analysis. There are a few things that I think are a bit opinionated, and there is nothing wrong with that; I just don’t agree with a few things out of context. For example, I agree that code on main should be buildable and testable. Code in your own branch should be for yourself and should still have commits. Also, lazygit really abstracts a huge chunk of git logic while making it easier to understand.
What is important to me is that when you open a pull request, the commits in that pull request, the things you are requesting to be brought into the mainline, are in a good and final form. Meaning they are actually ready to be brought into mainline. That means each of them needs to be logically chunked, clearly define and communicate its intent, etc., as I outlined in the article.
Now, prior to the moment when you open the pull request, you can have your commits look like whatever horrible mess you want locally. That includes commits in a local branch that you aren’t going to integrate into mainline or open a pull request for.
To facilitate this refinement of commits prior to opening a pull request, I personally use interactive rebase a lot, and I use https://git-ps.sh/ to facilitate a patch stack based workflow locally.
I have found that organizing my commits according to the principles outlined in the article is extremely valuable even locally. It seems like most devs aren’t comfortable enough with the tools, such as git interactive rebase, that facilitate refining commits as you go along. This is simply a skill issue, and the skill is well worth learning.
If you take the stance of, “It is my PR, so whatever commits are in there are whatever commits I created,” with no logical separation and no clear commit descriptions explaining the intent of each of the changes you made, you then lose the benefits outlined in the article. Tools like git bisect won’t work properly, reverting commits sucks, etc.
If you say, well, what if we use GitHub to require that PRs be integrated using Squash & Merge? You still end up losing almost all of the benefits.
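To make the git bisect point a bit more concrete, here is a minimal sketch of the idea behind it: a binary search over a linear history for the first bad commit. This is not Git's actual implementation (real histories are graphs, not lists), and `first_bad_commit`, `is_good`, and the commit names are hypothetical names I made up for illustration. The thing to notice is that the search only works if every commit it lands on can be built and tested.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Binary search for the first bad commit in a linear history.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit in between can be built and tested so is_good can answer.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid  # the bug was introduced after mid
        else:
            hi = mid  # the bug was introduced at or before mid
    return commits[hi]


# Hypothetical history of eight commits where the bug was introduced in "c5".
history = [f"c{i}" for i in range(1, 9)]
broken_from = history.index("c5")
culprit = first_bad_commit(history, lambda c: history.index(c) < broken_from)
print(culprit)
```

If a commit in the middle doesn't build, `is_good` can't give a meaningful answer for it, and the smaller and more single-purpose the culprit commit is, the more the answer actually tells you.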
So I think the dividing line needs to be that if you are requesting something to be integrated into mainline, or you are going to integrate something into mainline, you should follow the principles in the article. Otherwise, you are just taking the stance that you don’t care about the values outlined in the article.
Which is a stance you can take. But whether those values are present or not based on your position is not opinion. That is fact, as you can prove it by simply testing these things out. So be careful not to fall into the trap of saying, “Well, that is your opinion,” and missing the fact that by having a different opinion you are factually losing out on the benefits.
I agree with you. They are good principles. I was just saying that not every commit is merged into main, and that doesn’t make it less useful in the local context even if you don’t adhere to those principles. Am I crazy to assume that people tend to avoid merging things into main that don’t work? 😅 Git is more than a project sharing tool. I use it for projects I do alone. I use it even when I don’t sync it anywhere, because it’s good to have save points.
Yeah, a pull request should be formatted correctly, build, be documented, pass all tests, and have its changes covered by tests.
Your own branch? Do whatever you want, it will get squashed anyway in the pull request.
So this bit confuses me. In the intent and scope section, the article says that the entire process of bug fixing in the included example consists of the literal bug fix, cleaning up the toggle, correcting lints, and correcting duplication. That points to linting issues.
The earlier section says that a commit should be ‘buildable’ and ‘testable’. So if there are linting issues, the commit won’t satisfy these criteria, right?
What am I missing here?
In the example I provide, the project that the PR was made for doesn’t have linting passing as a build requirement. But that is irrelevant to the point I am trying to make, which is to split things out based on those singular intents. Do you think that point was clear?
Maybe I should change the example so that it isn’t based on linting which can be part of the build requirements but doesn’t have to be.
Aah. I assumed linting was part of the build as well. My bad. I did understand the idea you were describing. It’s just that my assumption kind of threw me off.
I wanted to ask something related to that. As you mentioned, git takes a snapshot of the repo on every commit. So splitting up the bug fix and other activities means you have 3 or 4 commits instead of one. Let us say we are dealing with a very large repo. This does not look ideal in that context right? So do you think the way you proposed is only suitable for smaller repos?
Actually, the cost of having more commits is negligible because of the way that Git stores the snapshots behind the scenes. Specifically, it uses a content-addressable key-value store. So the storage is bound to the file changes, irrespective of the number of commits.
The commits simply hold the SHA of each of the files. Technically, it is a bit more complicated than that. But for understanding the size implications and what the storage is bound to, that mental model should get you there. Git also does additional smart things when packing this key-value store so that things are stored more efficiently, which helps as well.
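As a small illustration of the content-addressable part, here is a sketch of how the key for a file's contents is derived in a SHA-1 based repository: it is the SHA-1 of a short `blob <size>` header plus the file bytes. The function name `git_blob_id` is just something I made up for this example, and repositories using the newer SHA-256 object format would use a different hash.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object ID Git assigns to file content (SHA-1 repositories)."""
    # Git hashes a header of the form b"blob <size>\0" followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Same content always gives the same key, no matter which commit it is in.
print(git_blob_id(b"hello\n"))
```

You can compare the result with `echo hello | git hash-object --stdin`. Since the key depends only on the content, a file that is unchanged across many commits maps to one stored object.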
If you want to start understanding more about the internals of Git and how it actually stores stuff, the Pro Git book has a Git Internals section, https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain, which is a great place to start.
I think I got the idea. So essentially a new copy of the file is created and stored only if there is a change; otherwise it just refers to the older SHA. Am I right? Now I understand why LFS was needed for binaries, since otherwise they create a lot of storage problems, but not for huge monorepos.
I’m not a developer, but a design person who covers much more including architecture. But in my org I happen to teach developers how to use Git. Strange, I know. But that is the case. It gave me a good opportunity to learn Git in depth.
I went through your blogs and the patch stack workflow. I have to say that I have not been happy with the branching workflow, and I always felt that it is not the best (I agree with the point about “unjust popularity”). The patch stack workflow makes more sense to me. Unfortunately we won’t be able to adopt it, since getting everyone onto Git itself was a huge effort. Also, developers are not that keen on creating good code, just working code. I’m extremely frustrated with that.
Also your blog design is really good. I love it. I always wanted to create something like that. But never managed to sit down and do it. Can you give me a brief about the tech stack used for the blog?
Do you use RNote for diagrams? The style looks familiar. Or is it something else?
Yep. It just points to the old sha if it hasn’t changed that file.
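Here is a toy model of that, just to show the size implications. It is not Git's real object format (there are no trees, no compression, and no packing here), and `ToyStore` and the file names are hypothetical. Each commit is a full snapshot of path to key, but content is only stored once per distinct version.

```python
import hashlib


class ToyStore:
    """Toy content-addressable store; a mental model, not Git's real format."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}         # key -> file content
        self.commits: list[dict[str, str]] = []   # each commit: path -> key

    def commit(self, files: dict[str, bytes]) -> None:
        snapshot = {}
        for path, content in files.items():
            key = hashlib.sha1(content).hexdigest()
            # Unchanged content hashes to an existing key, so nothing new is stored.
            self.blobs.setdefault(key, content)
            snapshot[path] = key
        self.commits.append(snapshot)


store = ToyStore()
files = {"app.py": b"v1", "README": b"docs", "big.bin": b"x" * 1000}
store.commit(files)
store.commit({**files, "app.py": b"v2"})  # only app.py changes
store.commit({**files, "app.py": b"v3"})  # only app.py changes again

print(len(store.commits), len(store.blobs))
```

Three commits over three files end up with five stored blobs rather than nine, because `README` and `big.bin` are only stored once. Splitting one change into several commits mostly adds small snapshot entries, not new copies of the files.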
In terms of the blog stack, it is very simple. It is a static site generated with Zola, a static site generator written in Rust.
The styles are just hand-rolled SCSS that I have whipped up and tweaked over the years. Every so often I feel it needs a refresh and I rework the styling. Recently I pulled in some stuff to make it feel more terminal-like.
In terms of the diagrams, I created them with Excalidraw. It is a go-to of mine for diagramming.
What am I missing here?
That shoddy code rots when you update the compiler. (And occasionally good code, depending on what rules the compiler wants to start enforcing)
These types of changes are inevitable.