Delayed Forgetting

[Illustration: Git blobs, trees, commits, tags, object IDs, loose objects, packfiles, and delta chains shown as a storage cross-section.] A cross-section of everything I keep.

You think you saved a file.

What happened: I read its bytes, prepended a header naming its type and length, ran the whole thing through SHA-1, and filed the object under the resulting digest. Change one byte and you have a different object with a different name. I cannot lie about content. The address is the content.

repo $ printf 'hello\n' | git hash-object --stdin

ce013625030ba8dba906f756967f9e9ca394464a
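To check that the name really is derived from the bytes, here is a minimal Python sketch of the same computation. It assumes a repository using the default SHA-1 object format (SHA-256 repositories hash the same header plus content with a different function). `hash_object` is an illustrative helper name, not a Git API.

```python
import hashlib


def hash_object(obj_type: str, data: bytes) -> str:
    """Compute the object ID Git would assign, assuming the SHA-1 object format."""
    # The header is "<type> <size in bytes>" followed by a NUL byte.
    header = f"{obj_type} {len(data)}".encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # Same bytes as `printf 'hello\n' | git hash-object --stdin`.
    print(hash_object("blob", b"hello\n"))
    # Flip one byte and the resulting name is unrelated to the first.
    print(hash_object("blob", b"hellO\n"))
```

This only computes the name. `git hash-object -w` is the variant that also stores the object.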

Four kinds of object. That is my whole vocabulary.

Blobs hold bytes. Trees hold names pointing at blobs and other trees. Commits point at one tree and at their parents. Annotated tags point, with a message and optionally a signature, at whatever you wanted to canonize.

repo $

tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
parent 6f1a3c9e0b2d4f7a8c1e5b9d3a7f0c2e4d6b8a1f
author you <you@dev> 1718900000 +0000
committer you <you@dev> 1718900000 +0000

initial import

# a commit is just text. you could write it by hand.
# git commit-tree does exactly that, underneath.
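Because a commit is just text, the header-plus-content rule covers it too. The minimal Python sketch below rebuilds the commit above by hand and hashes it as type `commit`. It assumes SHA-1 and an unsigned commit with no extra headers, and `commit_text` is a hypothetical helper. Hashing an empty tree body also shows that 4b825dc642cb6eb9a060e54bf8d69288fbee4904, the tree in that commit, is the ID of a tree with no entries.

```python
from __future__ import annotations

import hashlib


def hash_object(obj_type: str, data: bytes) -> str:
    """SHA-1 over "<type> <size>", a NUL byte, and the content."""
    return hashlib.sha1(f"{obj_type} {len(data)}".encode() + b"\0" + data).hexdigest()


def commit_text(tree: str, parents: list[str], ident: str, when: str, message: str) -> bytes:
    """Assemble the body of an unsigned commit object."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]   # none for a root commit, two for a merge
    lines.append(f"author {ident} {when}")
    lines.append(f"committer {ident} {when}")
    # A blank line separates the headers from the message.
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


EMPTY_TREE = hash_object("tree", b"")           # the ID of a tree with no entries

body = commit_text(
    EMPTY_TREE,
    ["6f1a3c9e0b2d4f7a8c1e5b9d3a7f0c2e4d6b8a1f"],
    "you <you@dev>",
    "1718900000 +0000",
    "initial import",
)
print(body.decode(), end="")
print(hash_object("commit", body))
```

`git commit-tree` produces text of this shape from a tree ID, `-p` parents, and a message.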

A commit is a snapshot, not a diff. I store whole trees, but untouched files and directories keep their OIDs, so a new commit shares every unchanged blob and subtree with its parent. Delta compression sorts out the remaining redundancy later, when I pack. Loose objects come first: one zlib-deflated file each, sprawling under .git/objects/ab/cdef…. Then a packfile collapses them into delta chains against similar neighbors, and a single object can become a few bytes plus a pointer to its base.
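Here is a rough Python sketch of that loose layout. It assumes SHA-1 and uses hypothetical helper names. Real Git also writes through a temporary file and marks objects read-only. Its zlib level (`core.looseCompression`) may also differ, so the compressed bytes can differ even though the inflated object is identical.

```python
from __future__ import annotations

import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose(objects_dir: Path, obj_type: str, data: bytes) -> str:
    """Store header + data, zlib-deflated, at objects/<first 2 hex>/<remaining 38 hex>."""
    raw = f"{obj_type} {len(data)}".encode() + b"\0" + data
    oid = hashlib.sha1(raw).hexdigest()          # the name covers the header too
    path = objects_dir / oid[:2] / oid[2:]       # two-character fan-out directory
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(raw))
    return oid


def read_loose(objects_dir: Path, oid: str) -> tuple[str, bytes]:
    """Inflate a loose object and split it back into (type, content)."""
    raw = zlib.decompress((objects_dir / oid[:2] / oid[2:]).read_bytes())
    header, _, data = raw.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(data):
        raise ValueError("corrupt object: size in header does not match content")
    return obj_type, data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        objects = Path(tmp) / "objects"
        oid = write_loose(objects, "blob", b"hello\n")
        print(oid[:2] + "/" + oid[2:])
        print(read_loose(objects, oid))
```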

Snapshots are how I think. Deltas are how I sleep.

What the object database actually is

Git stores all history in a content-addressed object database. Each object's ID is the hash of a header (the object type and size) plus its content, so identical content always produces one object. There are four object types: blobs (file contents), trees (directory listings mapping names to blobs and subtrees), commits (a single tree plus zero or more parent commits and metadata), and annotated tags.

New objects are written individually as zlib-compressed loose files under .git/objects. Periodically Git consolidates them into packfiles, where objects can be stored as deltas against similar objects to save space; reads transparently reconstruct the full object. Garbage collection keeps every object reachable by following pointers from a ref, a reflog entry, or the index; unreachable objects are pruned only after an expiry period. The alternates mechanism lets one repository read objects from another store without copying them.
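Reachability is a plain graph traversal: the mark phase of a mark-and-sweep collector. This Python sketch uses a hypothetical toy store that maps each OID to the OIDs it references. Real Git seeds the walk from refs, reflogs, and the index, and it deletes unmarked objects only after the expiry window.

```python
from __future__ import annotations

from collections import deque


def reachable(objects: dict[str, list[str]], roots: list[str]) -> set[str]:
    """Mark every object reachable from the roots by following stored pointers."""
    seen: set[str] = set()
    queue = deque(roots)
    while queue:
        oid = queue.popleft()
        if oid in seen:
            continue
        seen.add(oid)
        # Commits point at a tree and parents; trees point at blobs and subtrees.
        queue.extend(objects.get(oid, []))
    return seen


# Hypothetical store: OID -> OIDs it points at.
objects = {
    "c2": ["t2", "c1"], "c1": ["t1"],
    "t2": ["b1", "b2"], "t1": ["b1"],
    "b1": [], "b2": [],
    "c9": ["t9"], "t9": ["b9"], "b9": [],   # an abandoned commit no root reaches
}

live = reachable(objects, roots=["c2"])        # e.g. refs/heads/main -> c2
print(sorted(set(objects) - live))             # prune candidates, once expired
```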

[Illustration: the working tree, index, and commit creation as three synchronized filesystem layers, with staged hunks, pathspec filters, stat cache entries, and tree writes.] Three layers, one truth deferred.

I keep three versions of you at once.

The working tree — what you can see and break. The index — the staging area, a flat binary list of paths and the blob OIDs you have promised to commit. And HEAD — the last thing you actually meant.

add moves right toward intent. reset and restore move left toward forgiveness.

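When you stage a hunk in patch mode, I do not save your file. I write a new blob holding the version you chose (the old index content plus only the hunks you accepted) and drop its OID into the index. Beside each entry the index keeps cached stat data, such as size and mtime, so I can skip re-reading files that haven't changed.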

repo $ git status -s

MM src/core.c
A  src/new.c
?? scratch.log

Two columns. Left is index-versus-HEAD. Right is worktree-versus-index. The same file can be half-promised and half-dirty, and I will tell you so without judgment.
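The two columns are two independent comparisons. The Python sketch below assumes each layer has been reduced to a hypothetical map from path to blob OID (any content fingerprint works). Renames, type changes, ignored files, and conflicts are left out. The sample inputs are made up to reproduce the three lines above.

```python
from __future__ import annotations


def short_status(head: dict[str, str], index: dict[str, str],
                 worktree: dict[str, str]) -> dict[str, str]:
    """Return {path: XY}: X compares index with HEAD, Y compares worktree with index."""
    codes = {}
    for path in sorted(set(head) | set(index) | set(worktree)):
        if path not in head and path not in index:
            codes[path] = "??"                     # untracked
            continue
        if path not in head:
            x = "A"                                # new in the index
        elif path not in index:
            x = "D"                                # deletion staged
        else:
            x = "M" if head[path] != index[path] else " "
        if path not in index:
            y = " "                                # nothing staged to compare against
        elif path not in worktree:
            y = "D"
        else:
            y = "M" if worktree[path] != index[path] else " "
        if x + y != "  ":
            codes[path] = x + y                    # clean paths are not listed
    return codes


head = {"src/core.c": "a1"}
index = {"src/core.c": "a2", "src/new.c": "n1"}
worktree = {"src/core.c": "a3", "src/new.c": "n1", "scratch.log": "s1"}
for path, code in short_status(head, index, worktree).items():
    print(code, path)
```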

intent-to-add is my favorite small lie: a path registered with no content yet, so diff will show it but commit won't ship it. .gitignore and .gitattributes are the rules I apply before I ever look: what to never track, what to pass through CRLF conversion or clean/smudge filters, what is binary and beyond diffing.

repo $ git write-tree

d8329fc1cc938780ffdd9f94e0d364e0ea74f579

That is the index, frozen into a tree object, ready to become a commit. The porcelain hides this. The plumbing admits it.
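What write-tree freezes is a simple binary format. This Python sketch handles a flat, single-directory index and assumes SHA-1; the helper names are hypothetical. Each entry becomes its mode, a space, its name, a NUL byte, and the 20 raw bytes of the OID. Entries are sorted the way Git sorts tree entries, where a subtree compares as if its name ended in `/`. Nested directories would be built bottom-up the same way, with mode `40000`.

```python
from __future__ import annotations

import hashlib


def hash_object(obj_type: str, data: bytes) -> str:
    """SHA-1 over "<type> <size>", a NUL byte, and the content."""
    return hashlib.sha1(f"{obj_type} {len(data)}".encode() + b"\0" + data).hexdigest()


def tree_body(entries: list[tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex OID) entries into a tree object body."""
    def sort_key(entry: tuple[str, str, str]) -> bytes:
        mode, name, _ = entry
        # Subtrees (mode 40000) sort as though their name ended in "/".
        return name.encode() + (b"/" if mode == "40000" else b"")

    body = b""
    for mode, name, oid in sorted(entries, key=sort_key):
        body += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
    return body


blob = hash_object("blob", b"version 1\n")
tree = hash_object("tree", tree_body([("100644", "test.txt", blob)]))
print("blob", blob)
print("tree", tree)
```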

How the index works

Git tracks three states: the working tree (files on disk), the index (a binary file listing paths, modes, and blob IDs to be committed, plus a stat cache for change detection), and HEAD (the most recent commit). git add writes blob objects from working-tree content and records their IDs in the index. git commit turns the index into a tree and writes a commit pointing to it.

git restore copies content from the index or a commit into the working tree or index; git reset moves HEAD, and the branch it points to, to a given commit and, depending on mode, also resets the index (--mixed, the default) and the working tree (--hard), while --soft leaves both alone. Pathspecs limit which paths a command touches. .gitattributes controls per-path filtering and diff behavior; .gitignore excludes untracked files. During a merge the index holds multiple stages of the same path, making it the data structure where conflict resolution happens.

[Illustration: Git references as movable branch and tag labels over a commit DAG, with symbolic refs, packed refs, namespaces, reflogs, revision ranges, and ancestry selectors.] Names are sticky notes on a graph that never moves.

The commits never change. The names do.

A branch is forty hex characters in a file. main is just refs/heads/main holding an OID. When you commit, I do not move history — I write a new commit and rewrite that one tiny file to point at it.

repo $ cat .git/HEAD

ref: refs/heads/main

HEAD is a symbolic ref — a pointer to a pointer. Detach it and HEAD names a commit directly; now you stand on the graph itself, branchless, and I quietly warn you that anything you build here is reachable only by memory.

my memory is the reflog.

repo $

a3f9c1d HEAD@{0}: commit: tighten parser
6f1a3c9 HEAD@{1}: reset: moving to HEAD~1
8b2e0aa HEAD@{2}: commit: oops, ship it
6f1a3c9 HEAD@{3}: checkout: moving from main to detached
6f1a3c9 HEAD@{4}: pull: Fast-forward

# every position HEAD ever held. local. private.
# this is how I undo what you swore was lost.
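Under the hood each reflog is a plain append-only text file; HEAD's lives at .git/logs/HEAD. This Python sketch assumes the files-backend line layout: old OID, new OID, committer identity, Unix timestamp, timezone, a tab, then the message. It numbers entries newest first, which is how HEAD@{n} counts. The OIDs in the sample are made up, and `parse_reflog` is a hypothetical name.

```python
from __future__ import annotations


def parse_reflog(text: str) -> list[dict]:
    """Parse reflog text into entries, newest first, so index n means HEAD@{n}."""
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        meta, _, message = line.partition("\t")
        old, new, rest = meta.split(" ", 2)
        # rest looks like "Name <email> <unix-seconds> <tz>".
        who, seconds, tz = rest.rsplit(" ", 2)
        entries.append({"old": old, "new": new, "who": who,
                        "time": int(seconds), "tz": tz, "message": message})
    entries.reverse()          # lines are appended, so the newest is last on disk
    return entries


ZERO = "0" * 40
A = "6f1a3c9e0b2d4f7a8c1e5b9d3a7f0c2e4d6b8a1f"
B = "8b2e0aac1f5d9e3b7a0c2f4e6d8b1a3c5e7f9b0d"
log = (
    f"{ZERO} {A} you <you@dev> 1718900000 +0000\tpull: Fast-forward\n"
    f"{A} {B} you <you@dev> 1718900100 +0000\tcommit: oops, ship it\n"
)
for n, entry in enumerate(parse_reflog(log)):
    print(f"{entry['new'][:7]} HEAD@{{{n}}}: {entry['message']}")
```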

I keep refs as loose files until maintenance (git pack-refs, which git gc runs for you) flattens them into packed-refs: one file, sorted, fast to scan. A loose file, if one exists, overrides its packed entry. The plumbing never cares which form they're in.
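A few lines of Python can sketch how a name becomes a hash in the classic files backend. This assumes loose ref files under .git plus a packed-refs file; repositories using the newer reftable format store refs differently. The function names are hypothetical. The sketch also skips the rules that expand a short name like main into candidates such as refs/tags/main and refs/heads/main.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def read_packed_refs(git_dir: Path) -> dict[str, str]:
    """Parse .git/packed-refs lines of the form "<oid> <refname>"."""
    refs: dict[str, str] = {}
    packed = git_dir / "packed-refs"
    if packed.is_file():
        for line in packed.read_text().splitlines():
            # Skip the "# pack-refs with: ..." header and "^<oid>" peeled-tag lines.
            if not line.strip() or line.startswith(("#", "^")):
                continue
            oid, name = line.split(" ", 1)
            refs[name] = oid
    return refs


def resolve(git_dir: Path, name: str, depth: int = 0) -> str:
    """Follow symbolic refs to an OID; a loose ref file takes precedence over packed-refs."""
    if depth > 5:                                   # arbitrary guard against ref loops
        raise RuntimeError(f"too many levels of symbolic refs at {name}")
    loose = git_dir / name
    if loose.is_file():
        content = loose.read_text().strip()
        if content.startswith("ref: "):             # symbolic ref, e.g. HEAD
            return resolve(git_dir, content[len("ref: "):], depth + 1)
        return content
    packed = read_packed_refs(git_dir)
    if name in packed:
        return packed[name]
    raise KeyError(f"unknown ref {name}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        git = Path(tmp)
        (git / "HEAD").write_text("ref: refs/heads/main\n")
        (git / "packed-refs").write_text(
            "# pack-refs with: peeled fully-peeled sorted\n"
            + "a" * 40 + " refs/heads/main\n"
        )
        print(resolve(git, "HEAD"))                 # answered by packed-refs
        (git / "refs" / "heads").mkdir(parents=True)
        (git / "refs" / "heads" / "main").write_text("b" * 40 + "\n")
        print(resolve(git, "HEAD"))                 # the loose file now wins
```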

repo $

refs/heads/main          a3f9c1d
refs/heads/wip           8b2e0aa
refs/remotes/origin/main 6f1a3c9
refs/tags/v1.0           c0ffee1

# branches, remote-tracking refs, tags — all just refs,
# living in namespaces under refs/.

And the revision language is how you point without copying OIDs by hand. HEAD~2 walks first-parents. main^2 takes the second parent. A..B means reachable from B but not A. A...B is the symmetric difference. git rev-parse resolves all of it down to the only thing I truly understand: a hash.
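Each of those selectors is a small graph operation. This Python sketch runs them on a hypothetical five-commit history with one merge. The function names are illustrative; real revision parsing (git rev-parse, git rev-list) supports far more syntax.

```python
from __future__ import annotations

# Hypothetical history, parents listed first-parent first:
#
#   A - B - C - M    main
#        \     /
#         D - E      feature
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"], "M": ["C", "E"]}


def tilde(rev: str, n: int) -> str:
    """rev~n: follow first parents n times."""
    for _ in range(n):
        rev = parents[rev][0]
    return rev


def caret(rev: str, n: int = 1) -> str:
    """rev^n: the n-th parent (1-based)."""
    return parents[rev][n - 1]


def ancestors(rev: str) -> set[str]:
    """Every commit reachable from rev, including rev itself."""
    seen: set[str] = set()
    stack = [rev]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def two_dot(a: str, b: str) -> set[str]:
    """A..B: reachable from B but not from A."""
    return ancestors(b) - ancestors(a)


def three_dot(a: str, b: str) -> set[str]:
    """A...B: reachable from exactly one side (symmetric difference)."""
    return ancestors(a) ^ ancestors(b)


print(tilde("M", 2), caret("M", 2))
print(sorted(two_dot("C", "E")), sorted(three_dot("C", "E")))
```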

[Illustration: a commit DAG with merge bases, recursive and ort merge machinery, fast-forward updates, conflict stages in the index, rerere memory, and signed merge commits.] A merge records topology. It refuses to forget either parent.

To merge, I first find where you diverged.

repo $ git merge-base main feature

6f1a3c9e0b2d4f7a8c1e5b9d3a7f0c2e4d6b8a1f

That ancestor is the common point. If your tip is itself an ancestor of the branch you are merging, there is nothing to combine: I just slide the ref forward. A fast-forward. No new commit, no merge, no story.
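The Python sketch below finds a merge base and applies the fast-forward test on a hypothetical toy DAG. It uses the definition from the git merge-base documentation: a best common ancestor is one that is not an ancestor of any other common ancestor. The sketch is quadratic and only shows the idea; Git uses a much faster graph walk.

```python
from __future__ import annotations

# Hypothetical history:
#
#   A - B - C        main
#        \
#         D - E      feature
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"]}


def ancestors(rev: str) -> set[str]:
    """Every commit reachable from rev, including rev itself."""
    seen: set[str] = set()
    stack = [rev]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def merge_bases(a: str, b: str) -> set[str]:
    """Best common ancestors: common ancestors that are not ancestors of another one."""
    common = ancestors(a) & ancestors(b)
    return {c for c in common
            if not any(c != other and c in ancestors(other) for other in common)}


def merge_kind(ours: str, theirs: str) -> str:
    if theirs in ancestors(ours):
        return "already up to date"      # nothing new to bring in
    if ours in ancestors(theirs):
        return "fast-forward"            # just move the ref
    return "three-way merge"             # new commit with two parents


print(merge_bases("C", "E"), merge_kind("C", "E"), merge_kind("B", "E"))
```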

Otherwise I let the ort strategy diff each side against the base and record the result as a real merge commit with two parents. Where they touched different things, I take both. Where they touched the same lines, the index splits into stages: 1 is base, 2 is ours, 3 is theirs.

repo $ git ls-files -u

100644 6f1a3c9… 1 src/core.c
100644 a3f9c1d… 2 src/core.c
100644 8b2e0aa… 3 src/core.c
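The first decision is made per path, by comparing blob IDs. This Python sketch shows that trivial-merge rule, assuming each side is reduced to an OID (or None when the path is absent). When both sides changed, real ort then tries a line-level merge and leaves stages behind only if hunks overlap. The sketch does not model that step. `merge_path` is a hypothetical name.

```python
from __future__ import annotations

from typing import Optional


def merge_path(base: Optional[str], ours: Optional[str],
               theirs: Optional[str]) -> dict:
    """Trivial per-path three-way rule on blob IDs (None means the path is absent).

    Returns {"resolved": oid_or_None} or {"stages": {stage: oid}} where stage
    1 = base, 2 = ours, 3 = theirs, mirroring `git ls-files -u`.
    """
    if ours == theirs:
        return {"resolved": ours}        # both sides agree (possibly both unchanged)
    if ours == base:
        return {"resolved": theirs}      # only their side changed
    if theirs == base:
        return {"resolved": ours}        # only our side changed
    # Both sides changed differently: record a conflict, omitting absent versions.
    versions = {1: base, 2: ours, 3: theirs}
    return {"stages": {s: oid for s, oid in versions.items() if oid is not None}}


print(merge_path("b1", "b1", "t1"))
print(merge_path("b1", "o1", "t1"))
print(merge_path(None, "o1", "t1"))    # add/add: no common base, so no stage 1
```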

Three versions of one path, held in tension, until you choose. Turn on rerere (rerere.enabled) and it memorizes your resolution, so the next time the same conflict surfaces, I replay your decision without asking.

A merge does not pick a winner. It records that there were two truths and keeps both ancestries.

revert writes a new commit that undoes an old one — history-safe. reset moves a name backward and pretends the rest never shipped — history-altering. Same direction, opposite honesty.

How merging works

Git finds the merge base, the best common ancestor of the branches. If the current branch's tip is already an ancestor of the branch being merged, Git performs a fast-forward, simply advancing the ref with no new commit. Otherwise it creates a merge commit with multiple parents, preserving both lines of history.

The default ort strategy performs a three-way merge: it compares each branch against the merge base and combines non-overlapping changes automatically. Overlapping changes become conflicts, represented in the index as stages 1 (base), 2 (ours), and 3 (theirs). The rerere feature records how you resolve a conflict and reapplies that resolution automatically if the same conflict recurs. git revert creates a new commit that reverses a prior one; git reset moves a branch ref backward without creating a commit.

[Illustration: commits replayed onto a new base by rebase and cherry-pick, old objects unreachable but visible in reflogs, interactive todo lists, autosquash, fixup commits, and force-with-lease guards.] Rebase doesn't move commits. It builds new ones that look like the old.

Here is the thing nobody tells you about rebase: it never edits a commit. It can't. Commits are immutable.

What it does is replay. Take each commit's diff, apply it onto a new base, write a new commit with a new OID and a new parent. The old ones don't die — they just stop being reachable from any branch, and drift toward the next garbage collection.
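Every replayed commit gets a new OID because the parent's ID is part of the hashed commit text. A new base changes the first commit's ID, which changes the next commit's parent line, and so on down the chain. This Python sketch uses hypothetical helpers and holds the tree, identity, and timestamps fixed to isolate that effect; a real rebase usually changes the tree and committer date as well.

```python
from __future__ import annotations

import hashlib


def hash_object(obj_type: str, data: bytes) -> str:
    return hashlib.sha1(f"{obj_type} {len(data)}".encode() + b"\0" + data).hexdigest()


# Fixed tree, identity, and timestamps so that only the parent varies.
TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
IDENT = "you <you@dev> 1718900000 +0000"


def commit_id(parent: str, message: str) -> str:
    body = (f"tree {TREE}\nparent {parent}\n"
            f"author {IDENT}\ncommitter {IDENT}\n\n{message}\n")
    return hash_object("commit", body.encode())


def replay(messages: list[str], onto: str) -> list[str]:
    """Rebuild a chain on top of `onto`; each new parent is the previous new commit."""
    new_ids = []
    parent = onto
    for message in messages:
        parent = commit_id(parent, message)
        new_ids.append(parent)
    return new_ids


old_base, new_base = "1" * 40, "2" * 40
before = replay(["parser: split lexer", "add tests"], old_base)
after = replay(["parser: split lexer", "add tests"], new_base)
for old, new in zip(before, after):
    print(old[:7], "->", new[:7])
```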

repo $ git rebase -i HEAD~4

pick 8b2e0aa parser: split lexer
squash 1c4d7e0 fix typo
fixup  9a0b2c1 fix typo again
reword 2f3e1d8 add tests

# this list is a program. each line is an instruction.
# autosquash sorts fixup!/squash! commits to their targets.

The interactive todo is a tiny language. You reorder it, you collapse three commits into one clean story, and I obey line by line. cherry-pick is the same replay, applied to a single commit from somewhere else. amend is replay of just the tip.
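Autosquash is essentially a reordering pass over that todo list. This simplified Python sketch assumes exact subject matching only. Real Git can also match a `fixup!` line by commit ID and handles nested `fixup! fixup!` subjects. The tuples stand in for todo lines, and `autosquash` is a hypothetical name.

```python
from __future__ import annotations

PREFIXES = (("fixup! ", "fixup"), ("squash! ", "squash"))


def autosquash(todo: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Move "fixup! X" / "squash! X" lines right after the line whose subject is X."""
    targets = {subject for _, _, subject in todo
               if not subject.startswith(tuple(p for p, _ in PREFIXES))}
    attached: dict[str, list[tuple[str, str, str]]] = {}
    kept = []
    for action, oid, subject in todo:
        for prefix, new_action in PREFIXES:
            target = subject[len(prefix):]
            if subject.startswith(prefix) and target in targets:
                attached.setdefault(target, []).append((new_action, oid, subject))
                break
        else:
            kept.append((action, oid, subject))      # ordinary line, or no target found
    result = []
    for line in kept:
        result.append(line)
        result.extend(attached.pop(line[2], []))
    return result


todo = [
    ("pick", "8b2e0aa", "parser: split lexer"),
    ("pick", "2f3e1d8", "add tests"),
    ("pick", "9a0b2c1", "fixup! parser: split lexer"),
    ("pick", "1c4d7e0", "squash! add tests"),
]
for action, oid, subject in autosquash(todo):
    print(action, oid, subject)
```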

None of this is dangerous locally. The danger is the publication boundary. Once a commit is on a remote that others built upon, rewriting it forks reality.

repo $ git push --force-with-lease origin main

That flag is the only force I respect: it refuses unless the remote is still where I last saw it. Brutal honesty as a safety latch.
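Stripped of transport, --force-with-lease is a compare-and-swap on the remote ref. This Python sketch uses a plain dict as a stand-in for the remote's refs, with made-up short OIDs. By default the expected value is the remote-tracking ref you last fetched; `--force-with-lease=<refname>:<expect>` lets you state it explicitly.

```python
def push_with_lease(remote_refs: dict, ref: str, new_oid: str, expected_oid: str) -> bool:
    """Compare-and-swap: overwrite remote_refs[ref] only if it still equals expected_oid."""
    if remote_refs.get(ref) != expected_oid:
        return False           # the remote moved since we last looked: refuse
    remote_refs[ref] = new_oid
    return True


remote = {"refs/heads/main": "a3f9c1d"}
last_seen = "a3f9c1d"          # what our refs/remotes/origin/main recorded

print(push_with_lease(remote, "refs/heads/main", "9d8e7f6", last_seen))
remote["refs/heads/main"] = "c0ffee1"      # a teammate pushes in between
print(push_with_lease(remote, "refs/heads/main", "1234abc", "9d8e7f6"))
```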

replace refs and notes attach without rewriting. signed commits prove who replayed.

History rewriting

Because commits are immutable, operations that "edit history" actually create new commits. Rebase replays each commit's changes onto a new base, producing new commits with new IDs; cherry-pick replays a single commit elsewhere; amend rewrites only the latest commit. Interactive rebase exposes a todo list where you can reorder, squash, fixup, reword, or drop commits, and autosquash automatically positions specially named fixup/squash commits.

The original commits become unreachable but remain recoverable through the reflog until expiry and garbage collection. Tools like git-filter-repo rewrite history in bulk. Rewriting published history diverges from collaborators' copies; --force-with-lease only overwrites a remote ref if it still matches your last known value. Replace refs and notes can attach data to commits without rewriting them, and commits can be cryptographically signed.

[Illustration: repositories exchanging object graphs through fetch, push, pull, refspecs, remote-tracking branches, shallow and partial clones, promisor objects, and reachability bitmaps.] Transport moves objects. Integration is a separate verb.

I am not alone, though I am content-addressed enough not to need anyone.

When you fetch, two distinct things happen, and conflating them is the root of most confusion. First: transport. I negotiate with the remote — what do you have, what do I have, send the difference — and copy the missing objects into my store. Second: I update remote-tracking refs to mark where the remote's branches stood.

repo $ cat .git/config

[remote "origin"]
    url = git@host:proj.git
    fetch = +refs/heads/*:refs/remotes/origin/*

That last line is a refspec. Left of the colon: their names. Right: where I file them locally. The + permits non-fast-forward updates. origin/main is not your branch — it is my record of theirs at last contact.
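A refspec is a pattern rewrite. This Python sketch assumes the simple form shown above: an optional leading +, a colon, and at most one * on each side. It ignores negative and source-only refspecs, and `map_ref` is a hypothetical name.

```python
from __future__ import annotations

from typing import Optional


def map_ref(refspec: str, ref: str) -> Optional[tuple[str, bool]]:
    """Map a remote ref through a fetch refspec.

    Returns (local ref name, force flag), or None if the refspec does not match.
    """
    force = refspec.startswith("+")
    src, dst = (refspec[1:] if force else refspec).split(":", 1)
    if "*" not in src:
        return (dst, force) if ref == src else None
    prefix, suffix = src.split("*", 1)
    if (len(ref) >= len(prefix) + len(suffix)
            and ref.startswith(prefix) and ref.endswith(suffix)):
        matched = ref[len(prefix):len(ref) - len(suffix)]
        return dst.replace("*", matched, 1), force
    return None


spec = "+refs/heads/*:refs/remotes/origin/*"
for ref in ["refs/heads/main", "refs/heads/release-2.1", "refs/tags/v1.0"]:
    print(ref, "->", map_ref(spec, ref))
```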

fetch copies. pull copies, then integrates. push copies, then asks them to move a ref. Never the same act.

A shallow clone truncates history at a depth and leaves a .git/shallow grafting boundary. A partial clone (for example, one made with --filter=blob:none) fetches commits and trees but leaves blobs as promisors: placeholders I'll redeem on demand. Sparse checkout then narrows what even lands in the working tree.

repo $

a3f9c1d…  refs/heads/main
8b2e0aa…  refs/heads/release-2.1
c0ffee1…  refs/heads/wip

# refs and their OIDs on the far side,
# read without fetching a single object.

Remotes and synchronization

A remote is a named URL with associated refspecs. Fetch negotiates with the remote to determine which objects are missing, transfers them into the local object store, and updates remote-tracking refs (e.g. refs/remotes/origin/*) that mirror the remote's branches. Pull runs fetch then integrates via merge or rebase. Push sends local objects and requests the remote update its refs.

Refspecs map source refs to destinations; a leading + allows non-fast-forward updates. Shallow clones limit history depth; partial clones omit some objects (commonly blobs) and fetch them lazily as promisor objects. Sparse checkout limits which paths populate the working tree. Pack bitmaps precompute reachability so the side building a pack can quickly work out which objects to send. Copying objects, updating refs, and integrating history are independent steps.

[Illustration: parallel lanes for trunk-based development, feature branches, protected branches, stacked changes, release branches, tags, pull requests, CI gates, and bisectable history.] Workflows are policy. The object graph doesn't care which one you chose.

Here is what amuses me.

Trunk-based, GitFlow, stacked diffs, merge queues, squash-merge religion — all of it is the same four object types underneath. Workflows are policies people layer on a graph that has no opinion. I store commits. You invent the ceremony.

Squash merges flatten a branch into one commit — clean trunk, lost granularity. True merges keep both ancestries — honest topology, busier log. Linear history makes bisect a clean binary search; merge bubbles make it negotiate. Neither is correct. They are trade-offs you pay later, during an incident, at 3 a.m.

repo $ git bisect start HEAD v1.0

Bisecting: 6 revisions left to test after this (roughly 3 steps)
[8b2e0aa…] parser: split lexer

I halve your history until the first bad commit confesses. blame names who last touched each line; hooks enforce what you swore the rules were; Signed-off-by trailers build a chain of who took responsibility.
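On linear history, bisect is an ordinary binary search. This Python sketch assumes a linear list of commits whose first element is known good and last is known bad. A hypothetical `is_bad` predicate stands in for your test. Real git bisect also walks merge-heavy DAGs, which is where the extra negotiation comes from.

```python
from __future__ import annotations

from typing import Callable


def first_bad(commits: list[str], is_bad: Callable[[str], bool]) -> tuple[str, int]:
    """Binary search over linear history.

    Assumes commits[0] is known good and commits[-1] is known bad.
    Returns the first bad commit and the number of commits tested.
    """
    good, bad = 0, len(commits) - 1        # invariant: commits[good] good, commits[bad] bad
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad], tests


history = [f"c{i}" for i in range(16)]          # hypothetical, oldest first
culprit, tests = first_bad(history, lambda c: int(c[1:]) >= 9)
print(culprit, tests)
```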

a bisectable history is a gift you leave your future self.

Collaboration patterns

Teams agree on conventions layered over the same object model. Trunk-based development keeps a single integration branch with short-lived feature branches; other models use long-lived release and develop branches. Pull/merge requests gate changes behind review and CI. Squash merges produce one commit per change set, simplifying trunk but discarding intermediate commits; true merges preserve full branch history.

Merge queues serialize and re-test integrations to keep trunk green. Release branches and tags mark shippable points; hotfixes and backports carry fixes between them. Linear, small, self-contained commits make git bisect efficient and git blame meaningful. Hooks enforce checks at commit/push time, and Signed-off-by trailers record who took responsibility for each change.

[Illustration: gc, maintenance, fsck, prune, repack, commit-graph files, multi-pack indexes, worktrees, submodules, sparse checkouts, dangling objects, and reflog rescue paths.] Garbage collection is the only forgetting I'm allowed to do.

I never delete on impulse.

When you rewrite, the old commits go unreachable, but they sit in my store, intact, waiting. Only gc sweeps them, and only after their reflog entries have expired: ninety days by default, thirty for entries whose commits are no longer reachable from the ref's current tip. Forgetting, in me, is scheduled.
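The two defaults amount to a per-entry age check. This Python sketch assumes the default gc.reflogExpire (90 days) and gc.reflogExpireUnreachable (30 days) and leaves the reachability test to the caller. Even after its entry expires, a loose unreachable object still waits out gc.pruneExpire (two weeks by default) before it is deleted. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed defaults: gc.reflogExpire = 90 days, gc.reflogExpireUnreachable = 30 days.
EXPIRE = timedelta(days=90)
EXPIRE_UNREACHABLE = timedelta(days=30)


def reflog_entry_expired(written: datetime, reachable_from_tip: bool, now: datetime) -> bool:
    """True if gc would drop this reflog entry under the default settings (simplified)."""
    limit = EXPIRE if reachable_from_tip else EXPIRE_UNREACHABLE
    return now - written > limit


now = datetime(2024, 6, 20, tzinfo=timezone.utc)
written = now - timedelta(days=45)
print(reflog_entry_expired(written, True, now))    # a commit still on the branch
print(reflog_entry_expired(written, False, now))   # a commit you rewrote away
```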

repo $

Checking object directories: 100% (256/256), done.
Checking objects: 100% (1843/1843), done.
dangling commit 8b2e0aac1f5d9e3b7a0c2f4e6d8b1a3c5e7f9b0d
dangling blob   a3f9c1d2e4b6080f1c3e5d7b9a0c2e4f6d8b1a3c

# unreachable, but alive. these are your rescue handles.
# git show that commit. git branch rescue 8b2e0aa.
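fsck's dangling objects are the unreachable objects that nothing else points at. That is why they make good rescue handles: from a dangling commit you can reach the rest of what was lost. This Python sketch shows the distinction on a hypothetical toy store. `git fsck --unreachable` would also list t9, which only the dangling commit references.

```python
from __future__ import annotations


def reachable(objects: dict[str, list[str]], roots: list[str]) -> set[str]:
    """Every object reachable from the roots."""
    seen: set[str] = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid not in seen:
            seen.add(oid)
            stack.extend(objects.get(oid, []))
    return seen


def dangling(objects: dict[str, list[str]], roots: list[str]) -> set[str]:
    """Unreachable objects that no other object points at: the tips worth rescuing."""
    unreachable = set(objects) - reachable(objects, roots)
    pointed_at = {child for oid in unreachable for child in objects[oid]}
    return unreachable - pointed_at


# Hypothetical store: OID -> OIDs it references.
objects = {
    "c2": ["t2", "c1"], "c1": ["t1"], "t2": ["b1"], "t1": ["b1"], "b1": [],
    "c9": ["t9", "c1"], "t9": ["b1"],   # the commit a hard reset left behind
    "b7": [],                           # a blob staged once, never committed
}
print(sorted(dangling(objects, roots=["c2"])))
```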

So when you panic — "I lost my work, I reset too hard, I rebased into the void" — I am calm. The reflog still holds where HEAD stood. The dangling commit still exists. You did not destroy anything. You only stopped pointing at it.

repo $ git update-ref refs/heads/rescue 8b2e0aa

One ref, restored, and the unreachable becomes reachable again. That is the whole recovery: name what you thought you lost.

Beneath maintenance: repack rebuilds packfiles, commit-graph caches ancestry so I don't re-walk it, multi-pack-index lets many packs answer as one, and cruft packs hold the unreachable-but-not-yet-expired so I stop spraying loose files across the disk.

And I am not always one tree. worktree gives several checkouts one object store. submodule nests other repositories as pinned OIDs. archive and bundle serialize me for the offline and the paranoid.

Maintenance and recovery

git gc and git maintenance consolidate loose objects, repack history, prune unreachable objects past their expiry, and build acceleration structures: the commit-graph caches commit ancestry, and a multi-pack index lets many packfiles be queried as one. Cruft packs store unreachable objects that haven't yet reached expiry. Reflog and object expiry settings control how long history is retained before pruning.

Recovery relies on the fact that rewrites add new objects and move refs but never alter existing objects. git fsck reports dangling commits and blobs that remain in the store. The reflog records prior ref positions. You restore lost work by pointing a ref at a recovered object with git update-ref or git branch. Worktrees share one object store across checkouts, submodules embed pinned external repositories, bundles package history for offline transport, and archives export a snapshot of a tree. Plumbing commands (cat-file, hash-object, update-index, write-tree, commit-tree, update-ref, for-each-ref) operate directly on these structures.
