TL;DR: During a migration across a few dozen repos, “looks done” failed me three different ways: stale code search, stale local clones, and an auto-merge that quietly reverted my changes. The checks I ended up trusting were greps on fetched file contents.

Motivation

I was rolling a config migration across a few dozen repos at work and needed to answer one question repeatedly: is this repo done? I got that answer wrong three times, each from trusting a different proxy for the actual file contents.

Failure 1: the code search index

GitHub code search seemed like the obvious sweep tool: search the org for the old config string, fix whatever comes back. The index was wrong in both directions. It returned hits for files that no longer existed on the default branch, and it missed files that did exist and still contained the old string. One repo I’d written off as out of scope showed up in a late sweep, and the live file confirmed it had the old config after all.

The check I switched to was fetching the raw file from the default branch:

gh api "repos/org/repo/contents/pyproject.toml?ref=main" \
  -H 'Accept: application/vnd.github.raw' | grep -nE 'old-pattern|new-pattern'

Slower per repo, but it answers the actual question. Code search answers “what was on some branch when the indexer last visited”, which is a different question.
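
A minimal sketch of wrapping that check so a failed fetch can't pass for "no matches". The helper names `fetch_raw` and `classify` are hypothetical, and `org/repo`, `pyproject.toml` and the two patterns are the same placeholders as above:

```shell
# Fetch one file's raw contents from a given ref.
# Returns non-zero if gh fails (404, auth error, network).
# ASSUMPTION: helper name and argument order are illustrative only.
fetch_raw() {
  # $1 = owner/repo, $2 = file path, $3 = branch or ref
  gh api "repos/$1/contents/$2?ref=$3" \
    -H 'Accept: application/vnd.github.raw'
}

# Classify file contents read from stdin as old / new / both / neither.
classify() {
  content=$(cat)
  has_old=0
  has_new=0
  printf '%s\n' "$content" | grep -q 'old-pattern' && has_old=1
  printf '%s\n' "$content" | grep -q 'new-pattern' && has_new=1
  case "$has_old$has_new" in
    10) echo old ;;
    01) echo new ;;
    11) echo both ;;
    *)  echo neither ;;
  esac
}

# Usage (needs an authenticated gh CLI):
#   if content=$(fetch_raw org/repo pyproject.toml main); then
#     printf '%s\n' "$content" | classify
#   else
#     echo "fetch failed for org/repo" >&2
#   fi
```

Capturing the fetch into a variable before grepping keeps the fetch's exit status visible; piping `gh` straight into `grep` reports only grep's status.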

Failure 2: stale local clones

For repos I had checked out, I grepped the clone. Several of those clones were weeks behind, so the grep flagged repos as still needing the migration when it had already landed on a branch my clone didn’t have, or classified them as clean when the default branch had since gained the old pattern. Same fix as above: fetch first, or query the remote contents directly and skip the clone.
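
The "fetch first" variant could look something like the sketch below, which greps the file as it exists on the remote-tracking branch and ignores the working tree. `grep_upstream` is a hypothetical helper, and it assumes the remote is named `origin`:

```shell
# Grep a file as it exists on origin/<branch> after fetching,
# instead of the possibly stale working tree.
# ASSUMPTION: helper name is illustrative; remote is called "origin".
grep_upstream() {
  # $1 = clone directory, $2 = branch, $3 = file path, $4 = ERE pattern
  git -C "$1" fetch --quiet origin || return 2
  # Exit status follows grep: 0 if the pattern matched, 1 if not.
  git -C "$1" grep -n -E -e "$4" "origin/$2" -- "$3"
}

# Usage:
#   grep_upstream ~/src/repo main pyproject.toml 'old-pattern|new-pattern'
```

Because the tree being searched is `origin/<branch>`, the result does not depend on which branch the clone has checked out or how long ago it was pulled.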

Failure 3: the silent merge revert

This was the one that worried me. Some of the repos had automation merging one long-lived branch into another after each merge. On one heavily diverged repo, that auto-merge resolved conflicts by taking the other branch’s side, and my just-merged changes vanished from it. No conflict markers, no failed check, and the merge commit came back green.

I caught it only because I’d started checking the post-merge file contents instead of the merge status:

gh api "repos/org/repo/contents/pyproject.toml?ref=other-branch" \
  -H 'Accept: application/vnd.github.raw' | grep -c 'new-pattern'

A green merge tells you the merge completed, not which side won the conflict.
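
The effect is easy to reproduce in a throwaway repo. The sketch below uses git's `-X ours` strategy option as a stand-in for whatever the real automation did, which is an assumption; the function name, branch names and file contents are all made up for the demo:

```shell
# Reproduce a merge that exits 0 yet drops the incoming side's change.
# ASSUMPTION: "-X ours" stands in for the real automation's conflict handling.
demo_silent_revert() (
  set -e
  dir=$(mktemp -d)
  cd "$dir"
  git init -q .
  git symbolic-ref HEAD refs/heads/main
  git config user.name demo
  git config user.email demo@example.com

  printf 'config = old-pattern\n' > pyproject.toml
  git add pyproject.toml
  git commit -q -m 'base'

  # The long-lived branch diverges on the same line.
  git checkout -q -b other-branch
  printf 'config = old-pattern  # tuned\n' > pyproject.toml
  git commit -q -am 'diverge on other-branch'

  # The migration lands on main.
  git checkout -q main
  printf 'config = new-pattern\n' > pyproject.toml
  git commit -q -am 'migrate on main'

  # Auto-merge main into other-branch, resolving conflicts in favour of
  # other-branch. No conflict markers are written and the exit status is 0.
  git checkout -q other-branch
  merge_exit=0
  git merge -q -X ours -m 'auto-merge main' main >/dev/null || merge_exit=$?

  # Only the content check shows the migration is missing.
  count=$(grep -c 'new-pattern' pyproject.toml || true)
  printf 'merge_exit=%s new_pattern_count=%s\n' "$merge_exit" "$count"
)
```

Running `demo_silent_revert` reports a zero merge exit status alongside a zero count for the new pattern, which is the combination a status check alone cannot see.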

What I’d keep

For every repo, I settled on a small set of grep gates that had to pass on fetched content from the branch that matters, with explicit expected counts:

grep -c 'old-pattern' file   # want 0
grep -c 'new-pattern' file   # want >0

Cheap and reliable. Every wrong answer in this migration came from a proxy: an index, a clone, a green tick. The fetched file was the source I ended up trusting, so by the end it was the only one I asked.
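
The two gates can be folded into one function that returns a distinct status for "could not check". This is a sketch: `check_gates` is a hypothetical name, and treating a missing or empty file as an error rather than a pass is an assumption about what a failed fetch leaves behind:

```shell
# Pass only if the old pattern is gone AND the new pattern is present.
# Note that grep -c counts matching lines, not individual occurrences.
# ASSUMPTION: an empty or missing file means the fetch failed, so it is
# reported as an error (status 2) instead of a clean result.
check_gates() {
  file=$1
  if [ ! -s "$file" ]; then
    echo "ERROR missing or empty: $file" >&2
    return 2
  fi
  # grep -c exits 1 when the count is 0, hence the "|| true".
  old=$(grep -c 'old-pattern' "$file" || true)
  new=$(grep -c 'new-pattern' "$file" || true)
  if [ "$old" -eq 0 ] && [ "$new" -gt 0 ]; then
    echo "PASS old=$old new=$new"
  else
    echo "FAIL old=$old new=$new"
    return 1
  fi
}
```

Printing both counts on every run makes a FAIL self-explanatory and keeps a PASS auditable afterwards.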

One shell footnote: several of my “all clean” sweep results turned out to be loops that had failed without me noticing (zsh not word-splitting an unquoted variable, or a command-not-found error inside a subshell that the pipeline swallowed). A sweep that errors out looks identical to a sweep that found nothing. I now treat a suspiciously clean result as a prompt to re-run one repo by hand before believing it.
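
A sketch of a sweep loop that sidesteps both traps: repos are passed as separate arguments and iterated with `"$@"`, so nothing depends on word-splitting, and the fetch is captured before grepping so its failure is reported as its own state. `sweep` and the fetch command passed to it are hypothetical names:

```shell
# Report each repo as OLD, CLEAN or ERROR, never silently.
# ASSUMPTION: the first argument is any command that prints one repo's
# file contents on stdout and exits non-zero on failure.
sweep() {
  fetch_cmd=$1
  shift
  for repo in "$@"; do
    # A missing command (exit 127) or failed fetch lands here,
    # instead of feeding empty input to grep.
    if ! content=$("$fetch_cmd" "$repo"); then
      printf '%s ERROR\n' "$repo"
      continue
    fi
    if printf '%s\n' "$content" | grep -q 'old-pattern'; then
      printf '%s OLD\n' "$repo"
    else
      printf '%s CLEAN\n' "$repo"
    fi
  done
}

# Usage, with a hypothetical fetch function taking owner/repo:
#   sweep my_fetch org/repo-a org/repo-b org/repo-c
```

With three possible states per repo, an all-CLEAN report requires every fetch to have succeeded, so a broken loop shows up as a column of ERROR lines.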