Published by Marco on 15. Mar 2013 09:55:11
Updated by Marco on 26. Sep 2016 21:14:26

This article originally appeared on "earthli News" and has been cross-posted.


In order to program in 2013, it is important not to waste any time honing your
skills with outdated tools and work-flows. What are the essential pieces of
software for developing software in 2013?


   A runtime is a given for all but the most esoteric of programming exercises.
   Without something to execute your code, there is almost no point in writing
   it.


   Programming without an integrated debugger can be very time-consuming,
   error-prone and will quite frankly suck the fun right out of the whole
   endeavor. And, by "debugger" I mean a source-level single-step debugger with
   call-stack and variable/object/structure inspection as well as expression
   evaluation. Poring through logs and inserting print statements is not a
   viable long-term or even medium-term solution. You shouldn't be writing in a
   language without one of these unless you absolutely can't avoid it (NAnt
   build scripts come to mind).


   A syntax/semantics checker of some sort integrated into the editor ensures a
   tighter feedback/error-finding loop and saves time, energy and frustration. I
   was deliberately cagey with the "checker" because I understand that some
   languages, like JavaScript [1], do not have a compiled form. Duck-typed
   languages like Python or Ruby also limit static checking but anything is
   better than nothing.


   A source-control system is essential in order to track changes, test ideas
   and manage releases. A lot of time can be wasted -- and effort lost --
   without good source control. Great source control decreases timidity,
   encourages experimentation and allows for interruptible work-flows. I will
   argue below that private branches and history rewriting are also essential.

Even for the smallest projects, there is no reason to forgo any of these tools.

[Managing your Source Code]

"tl;dr: It's 2013 and your local commit history is not sacrosanct. No one wants
to see how you arrived at the solution; they just want to see clean commits that
explain your solution as clearly as possible. Use git; use rebase; use "rebase
interactive"; use the index; stage hunks; squash merge; go nuts. [2]"


[image]I would like to focus on the "versioning" part of the tool-chain. Source
control tells the story of your code, showing how it evolved to where it is at
any given point. If you look closely at the "Encodo Branching Model" [3] diagram
(click to enlarge), you can see the story of the source code:

   1. All development was done in the master branch until v1.0 was released
   2. Work on B was started in a feature branch
   3. Work on hotfix v1.0.1 was started in a hotfix branch
   4. Work on A was started in another feature branch
   5. Hotfix v1.0.1 was released, tagged and merged back to the master branch
   6. Development continued on master and both feature branches
   7. Master was merged to feature branch A (includes hotfix v1.0.1 commits)
   8. Finalization for release v1.1 was started in a release branch
   9. Feature A was completed and merged back to the master branch
   10. Version v1.1 was released, tagged and merged back to the master branch
   11. Master was merged to feature branch B (includes v1.1 and feature A
       commits)
   12. Development continued on master and feature B
   13. Version v1.2 was released and tagged
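
The story above can be replayed with plain git commands in a scratch
repository. The following is only a sketch: branch and tag names mirror the
diagram, and steps 6, 8, 12 and 13 are elided for brevity.

```shell
# Scratch repository; branch and tag names mirror the diagram.
cd "$(mktemp -d)" && git init -q demo && cd demo
git config user.email demo@example.com && git config user.name "Demo"
git commit -q --allow-empty -m "initial development"      # 1. work on master
master=$(git symbolic-ref --short HEAD)
git tag -a v1.0 -m "Version 1.0"                          #    release v1.0

git checkout -q -b feature/B "$master"                    # 2. start feature B
git commit -q --allow-empty -m "work on B"
git checkout -q -b hotfix/v1.0.1 v1.0                     # 3. start the hotfix
git commit -q --allow-empty -m "urgent fix"
git checkout -q -b feature/A "$master"                    # 4. start feature A
git commit -q --allow-empty -m "work on A"

git checkout -q "$master"                                 # 5. release v1.0.1
git merge -q --no-ff -m "Merge hotfix v1.0.1" hotfix/v1.0.1
git tag -a v1.0.1 -m "Hotfix 1.0.1"
git checkout -q feature/A                                 # 7. master -> A
git merge -q -m "Merge master into A" "$master"

git checkout -q "$master"                                 # 9. feature A is done
git merge -q --no-ff -m "Merge feature A" feature/A
git tag -a v1.1 -m "Version 1.1"                          # 10. release v1.1
git checkout -q feature/B                                 # 11. master -> B
git merge -q -m "Merge master into B" "$master"
git log --oneline --graph --all
```

The final log shows the same strands as the diagram: a hotfix branch folded
back into master and two feature branches kept up-to-date with it.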

Small, precise, well-documented commits are essential in order for others to
understand the project -- especially those who weren't involved in developing
the code. It should be obvious from which commits you made a release. You should
be able to go back to any commit and easily start working from there. You should
be able to maintain multiple lines of development, both for maintenance of
published versions and for development of new features. The difficulty of
merging these branches should be determined by the logical distance between them
rather than by the tools. Merging should almost always be automatic.

Nowhere in those requirements does it say that you're not allowed to lie about
how you got to that pristine tree of commits.

[Why you should be using private branches and history rewriting]

A few good articles about Git have recently appeared -- "Understanding the Git
Workflow" by Benjamin Sandofsky is one such -- explaining better than ever why
rewriting history is better than server-side, immutable commits.

In the article cited above, Sandofsky divides his work up into "Short-lived work
[...] larger work [...] and branch bankruptcy." These concepts are documented to
some degree in the Branch Management chapter of the "Encodo Git Handbook" (of
which I am co-author). I will expand on these themes below.

Note: The linked articles deal exclusively with the command line, which isn't
everyone's favorite user interface (I, for one, like it). We use the
"SmartGit/Hg"  client for
visualizing diffs, organizing commits and browsing the log. We also use the
command-line for a lot of operations, but SmartGit is a very nice tool and
version 3 supports nearly all of the operations described in this article.

[What is rebasing?]

As you can see from the diagram above, a well-organized and active project will
have multiple branches. Merging and rebasing are two different ways of getting
commits from one branch into another.

Merging commits into a branch creates a merge commit, which shows up in the
history to indicate that n commits were made on a separate branch. Rebasing
those commits instead re-applies those commits to the head of the indicated
branch without a merge commit. In both cases there can be conflicts, but neither
method poses a greater risk of conflicts than the other. [4] You cannot tell
from the history that rebased commits were developed in a separate branch. You
can, however, tell that the commits were rebased because the author date (the
time the commit was originally created) differs from the commit date (the last
time that the commit was applied).
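
A minimal sketch of the difference, runnable in a scratch repository (branch
and file names are invented for the demo):

```shell
# Scratch repository with one feature branch; names are invented for the demo.
cd "$(mktemp -d)" && git init -q demo && cd demo
git config user.email demo@example.com && git config user.name "Demo"
git commit -q --allow-empty -m "base"
master=$(git symbolic-ref --short HEAD)
git checkout -q -b feature
echo change > file.txt && git add file.txt && git commit -q -m "feature work"
git checkout -q "$master"
echo other > other.txt && git add other.txt && git commit -q -m "master work"

# "git merge feature" here would record a merge commit; rebasing instead
# re-applies the feature commit on top of master with no merge commit.
git checkout -q feature
git rebase -q "$master"

# The author date (%ad) is preserved; the commit date (%cd) is updated.
git log --format='%h  author: %ad  commit: %cd  %s'
```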

[What do you recommend?]

At "Encodo" , we primarily work in the master branch because
we generally work on very manageable, bite-sized issues that can easily be
managed in a day. Developers are free to use local branches but are not required
to do so. If some other requirement demands priority, we shunt the pending issue
into a private branch. Such single-issue branches are focused and involve only a
handful of files. It is not at all important to "remember" that the issue was
developed in a branch rather than the master branch. If there are several
commits, it may be important for other users to know that they were developed
together and a merge-commit can be used to indicate this. Naturally, larger
changes are developed in feature branches, but those are generally the exception
rather than the rule.

"Remember: Nowhere in those requirements does it say that you're not allowed to
lie about how you got to that pristine tree of commits."

Otherwise? Local commit history is absolutely not sacrosanct. We rebase like
crazy to avoid unwanted merge commits. That is, when we pull from the central
repository, we rebase our local commits on top of the commits that come from the
origin. This has worked well for us.
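
That pull-then-rebase workflow looks like this; a sketch with two throwaway
repositories, where "origin" stands in for the central repository:

```shell
# Two throwaway repositories: "origin" stands in for the central repository.
cd "$(mktemp -d)"
git init -q origin
(cd origin && git config user.email demo@example.com \
           && git config user.name "Demo" \
           && git commit -q --allow-empty -m "base")
git clone -q origin clone
(cd origin && echo up > up.txt && git add up.txt && git commit -q -m "upstream work")

cd clone
git config user.email demo@example.com && git config user.name "Demo"
echo local > local.txt && git add local.txt && git commit -q -m "local work"

# Rebase our local commit on top of the incoming commits: no merge commit.
git pull -q --rebase
git log --oneline
# To make this the default for every pull: git config --global pull.rebase true
```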

If the local commit history is confusing -- and this will sometimes come up
during code review -- we use an interactive rebase to reorganize the changes
into a more soothing and/or understandable set of commits. See "Sandofsky's
article" for a good introduction to using interactive rebasing to combine and
edit commits.
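
A minimal sketch of such a squash. Interactively you would edit the todo list
in your editor; here both editors are scripted via environment variables so the
demo runs unattended, and the file names are invented:

```shell
# Scratch repository with three commits; the last two belong together.
cd "$(mktemp -d)" && git init -q demo && cd demo
git config user.email demo@example.com && git config user.name "Demo"
git commit -q --allow-empty -m "base"
echo a > a.txt && git add a.txt && git commit -q -m "add feature"
echo b > a.txt && git add a.txt && git commit -q -m "fix typo in feature"

# Interactively, one would change the second "pick" to "squash" in the todo
# list; here the editors are scripted so that the demo runs unattended.
GIT_SEQUENCE_EDITOR='sed -i -e "2s/^pick/squash/"' GIT_EDITOR=true \
    git rebase -i HEAD~2
git log --oneline    # "base" plus one combined feature commit
```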

Naturally, we weigh the amount of confusion caused by the offending commits
against the amount of effort required to clean up the history. We don't use
bisect [5] very often, so we don't invest a lot of time in enforcing the clean,
compilable commits required by that tool. For us, the history is interesting,
but we rarely go back farther than a few weeks in the log. [6]

[When to merge? When to rebase?]

At Encodo, there are only a few reasons to retain a merge commit in the official
history:

   1. If we want to remember which commits belonged to a particular feature. Any
      reasonable tool will show these commits graphically as a separate strand
      running alongside the master branch.
   2. If a rebase involves too much effort or is too error-prone. If there are a
      lot of commits in the branch to be integrated, there may be subtle
      conflicts that resolve more easily if you merge rather than rebase.
      Sometimes we just pull the e-brake and do a merge rather than waste time
      and effort trying to get a clean rebase. This is not to say that the tools
      are lacking or at fault but that we are pragmatic rather than ideological.
   3. If there are merge commits in a feature branch with a large number of
      well-organized commits and/or a large number of changes or affected files.
      In this case, using a squash merge and rebuilding the commit history would
      be onerous and error-prone, so we just merge to avoid issues that can
      arise when rebasing merge commits (related to the point above). [7]

[When should I use private branches? What are they exactly?]

There are no rules for local branches: you can name them whatever you like.
However, if you promote a local branch to a private branch, at Encodo we prefix
the branch name with the developer's initials. My branches are named
"mvb/feature1", for example.

What's the difference between the two? Private branches may get pushed to our
common repository. Why would you need to do that? Well, I, for example, have a
desktop at work and, if I want to work at home, I have to transfer my workspace
somehow to the machine at home. One solution is to work on a virtual machine
that's accessible from both places; another is to remote in to the desktop at
work from home; a third is simply to push the work to the central repository and
pull it from home. This last solution has the advantage of speed and less
reliance on connectivity: you work locally and synchronize only when needed.

What often happens to me is that I start work on a feature but can only spend an
hour or two on it before I get pulled off onto something else. I push the
private branch, work on it a bit more at home, push back, work on another,
higher-priority feature branch, merge that into master, work on master,
whatever. A few weeks later and I've got a private branch with a few ugly
commits, some useful changes and a handful of merge commits from the master
branch. The commit history is a disgusting mess and I have a sneaking suspicion
that I've only made changes to about a dozen files but have a dozen commits for
those changes.

That's where the aforementioned "branch bankruptcy" comes in. You're not
obligated to keep that branch; you can keep the changes, though. As shown in the
"referenced article" , you execute
the following git commands:

git checkout master
git checkout -b cleaned_up_branch
git merge --squash private_feature_branch
git reset

The --squash option tells git to collect all of the changes from the
private_feature_branch in the index (staging); the subsequent reset then empties
the index so that those changes remain in the working tree. From here, you can
make a single, clean, well-written commit or several commits that correspond
logically to the various changes you made.
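
The whole bankruptcy flow can be replayed in a scratch repository; file and
branch names are invented for the demo:

```shell
# Scratch repository with a messy private branch to squash away.
cd "$(mktemp -d)" && git init -q demo && cd demo
git config user.email demo@example.com && git config user.name "Demo"
git commit -q --allow-empty -m "base"
master=$(git symbolic-ref --short HEAD)
git checkout -q -b private_feature_branch
echo a > a.txt && git add a.txt && git commit -q -m "wip"
echo b > b.txt && git add b.txt && git commit -q -m "more wip"

git checkout -q "$master"
git checkout -q -b cleaned_up_branch
git merge -q --squash private_feature_branch   # all changes land in the index
git reset -q                                   # ...then drop to the working tree
git status --short                             # a.txt and b.txt, uncommitted
git add . && git commit -q -m "Implement the feature cleanly"
```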

Git also lets you lose your attachment to checking in all the changes in a file
at once: if a file has changes that correspond to different commits, you can add
only selected differences in a file to the index (staging). "In praise of Git's
index" by Aristotle Pagaltzis provides a great introduction. If you, like me,
regularly take advantage of
refactoring and cleanup tools while working on something else, you'll appreciate
the ability to avoid checking in dozens of no-brainer cleanup/refactoring
changes along with a one-liner bug-fix. [8]
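
Staging only part of a file is done with git add -p, which steps through each
hunk in turn; in this sketch the answers ("y", then "n") are piped in so that
it runs unattended:

```shell
# Scratch repository with a ten-line file committed as the baseline.
cd "$(mktemp -d)" && git init -q demo && cd demo
git config user.email demo@example.com && git config user.name "Demo"
printf 'one\ntwo\nthree\nfour\nfive\nsix\nseven\neight\nnine\nten\n' > f.txt
git add f.txt && git commit -q -m "base"

# Two unrelated changes, far enough apart to land in separate hunks.
printf 'ONE\ntwo\nthree\nfour\nfive\nsix\nseven\neight\nnine\nTEN\n' > f.txt

# "git add -p" asks per hunk; answer "y" to the first, "n" to the second.
printf 'y\nn\n' | git add -p f.txt > /dev/null
git diff --cached --stat    # staged: the "ONE" change
git diff --stat             # still unstaged: the "TEN" change
```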

[One final example: cherry picking and squashing]

I recently renamed several projects in our solution, which involved renaming the
folders as well as the project files and all references to those files and
folders. Git automatically recognizes these kinds of renames as long as the old
file is removed and the new file is added in the same commit.

I selected all of the files for the rename in SmartGit and committed them, using
the index editor to stage only the hunks from the project files that
corresponded to the rename. Nice and neat. I selected a few other files and
committed those as a separate bug-fix. Two seconds later, the UI refreshed and
showed me a large number of deleted files that I should have included in the
first commit. Now, one way to go about fixing this is to revert the two commits
and start all over, picking the changes apart (including playing with the index
editor to stage individual hunks).

Instead of doing that, I did the following:

   1. I committed the deleted files with the commit message "doh!" (to avoid
      losing these changes in the reset in step 3)
   2. I created a "temp" branch to mark that commit (to keep the commit visible
      once I reset in step 3)
   3. I hard-reset my master branch to the origin
   4. I cherry-picked the partial-rename commit to the workspace
   5. I cherry-picked the "doh!" commit to the workspace
   6. Now the workspace had the rename commit I'd wanted in the first place
   7. I committed that with the original commit message
   8. I cherry-picked and committed the separate bug-fix commit
   9. I deleted the "temp" branch (releasing the incorrect commits on it to be
      garbage-collected at some point)
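
The steps above can be replayed in a scratch repository. The file names here
are invented, and a local "pushed" branch stands in for origin/master:

```shell
# Scratch repository; a local "pushed" branch stands in for origin/master.
cd "$(mktemp -d)" && git init -q demo && cd demo
git config user.email demo@example.com && git config user.name "Demo"
echo x > old.txt && echo ref > proj.txt
git add . && git commit -q -m "base"
git branch pushed

# The mistake: a "rename" commit that adds the new file but forgets to delete
# the old one, followed by an unrelated bug-fix commit.
cp old.txt new.txt && echo new > proj.txt
git add new.txt proj.txt && git commit -q -m "Rename projects"
echo fix >> proj.txt && git add proj.txt && git commit -q -m "Fix bug"

# The recovery, following the steps above:
git rm -q old.txt && git commit -q -m "doh!"   # 1. save the deletion
git branch temp                                # 2. keep it reachable
git reset -q --hard pushed                     # 3. back to the pushed state
git cherry-pick -n temp~2                      # 4. the partial rename...
git cherry-pick -n temp                        # 5. ...plus the deletion
git commit -q -m "Rename projects"             # 6./7. the commit as intended
git cherry-pick temp~1                         # 8. the separate bug-fix
git branch -D temp                             # 9. discard the scaffolding
git log --oneline
```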

Now my master branch was ready to push to the server, all neat and tidy. And
nobody was the wiser.


[1] There are alternatives now, though, like Microsoft's TypeScript, that
    warrant a look if only because they help tighten the error-finding feedback
    loop and have the potential to make you more efficient (the efficiency may
    be robbed immediately back, however, if debugging generated code becomes
    difficult or even nightmarish).

[2] Once you've pushed, though? No touchie. At that point, you've handed in your
    test and you get graded on that.

[3] According to my business card, I'm a "senior developer and partner" at
    "Encodo System AG".

[4] With the exception, mentioned elsewhere as well, that rebasing merge-commits
    can sometimes require you to re-resolve previously resolved conflicts, which
    can be error-prone if the conflicts were difficult to resolve in the first
    place. Merging merge-commits avoids this problem.

[5] bisect is a git feature that executes a command against various commits to
    try to localize the commit that caused a build or test failure. Basically,
    you tell it the last commit that worked and git uses a binary search to find
    the offending commit. Of course, if you have commits that don't compile,
    this won't work very well. We haven't used this feature very much because we
    know the code in our repositories well and using blame and log is much
    faster. Bisect is much more useful for maintainers who don't know the code
    very well, but still need to figure out at which commit it stopped working.
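
A minimal bisect run against a scratch repository; the "bug" is simulated by
the presence of a status.txt file:

```shell
# Scratch repository: ten commits, with a "bug" appearing at commit 7.
cd "$(mktemp -d)" && git init -q demo && cd demo
git config user.email demo@example.com && git config user.name "Demo"
for i in $(seq 1 10); do
    echo "$i" > version.txt
    if test "$i" -ge 7; then echo broken > status.txt; fi
    git add . && git commit -q -m "commit $i"
done

# bad = HEAD, good = the first commit; "bisect run" automates the search.
git bisect start HEAD HEAD~9
git bisect run sh -c 'test ! -f status.txt' | tee bisect.log
git bisect reset
grep 'is the first bad commit' bisect.log
```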

[6] To be clear: we're only so cavalier with our private repositories to which
    access is restricted to those who already know what's going on. If we commit
    changes to public, open-source or customer repositories, we make sure that
    every commit compiles. See "Aristotle's index article" (cited above) for
    tips on
    building and testing against staged files to ensure that a project compiles,
    runs and passes all tests before making a commit -- even if you're not
    committing all extant changes.

[7] That said, with experience we've learned that an interactive rebase and
    judicious squashing will create commits that avoid these problems. With
    practice, these situations crop up more and more rarely.

[8] Of course, you can also create a separate branch for your refactoring and
    merge it all back together, but that's more work and is in my experience
    rarely necessary.