Under the Hood: The Elegant Geometry of Git Internals
Peeling back the layers of Git’s brain

Most developers treat Git like a black box: you type commands, and history magically appears. But beneath the surface of git commit there is no complex algorithm, just an elegant, remarkably simple data store.
Ever wonder why Git is fast and why committed work is so hard to lose? The answer lies in the .git directory. Let’s stop memorizing commands and study the brain of our project.
What we’re uncovering:
The .git Anatomy: Why this hidden folder is the "brain" of your project.
The Trinity of Objects: How Blobs, Trees, and Commits link together to map your entire history.
The Content-Addressable Secret: How SHA-1 hashing makes stored objects immutable and makes corruption detectable.
The Lifecycle of a Change: Demystifying what happens behind the scenes when you press “Enter”.
By the time we're done, you won't just use Git; you’ll understand the architecture that changed software development forever.
Git's Control Center: The .git Folder
If you peek inside the hidden .git directory after running git init, you will see Git’s brain: everything it needs to track the project’s history.
Let’s run some commands to see this in action:
# Create a new directory and initialize Git
mkdir my-project
cd my-project
git init
# List the .git directory contents
ls -la .git/
You will see something like:
Structure of the .git Directory
.git/
├── HEAD # Points to current branch
├── config # Repository configuration
├── description # Repository description (for GitWeb)
├── hooks/ # Scripts that run on Git events
├── info/ # Repository-local exclude patterns (not shared with clones)
│ └── exclude
├── objects/ # Git's object database
│ ├── info/
│ └── pack/
├── refs/ # References (branches, tags)
│ ├── heads/ # Local branches
│ ├── remotes/ # Remote branches
│ └── tags/ # Tags
└── index # Staging area (binary file)
objects/: This is the heart of the operation. It’s a content-addressable database. Every version of every file, every folder structure, and every commit message ends up here as a "Git Object."
refs/: Short for "references." This is where Git keeps track of where your branches (heads) and tags point. It’s essentially a map of pointers to specific commits.
HEAD: A tiny file with a massive responsibility. Git determines your current position through this.
index: This file (which only appears after you git add something) is the "Staging Area." It acts as a bridge between your working directory and the permanent database.
Why does `.git` exist?
The .git folder is the project’s DNA. It is what makes Git a truly distributed VCS: every developer who clones the project receives its entire history. If the working files break or the codebase is lost, the project can be restored to a previous state from the .git folder alone.
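A minimal sketch of the points above, run in a throwaway repository: HEAD is a plain text file, and the index only appears once something is staged. The function name demo_git_dir is hypothetical, and the branch name printed depends on your Git configuration.

```shell
# Sketch: watch .git change as the first file is staged (temporary repo).
demo_git_dir() (
  repo=$(mktemp -d) || exit 1
  cd "$repo" || exit 1
  git init -q

  # HEAD is a plain text file holding a symbolic reference to a branch.
  cat .git/HEAD

  # The index does not exist until something is staged.
  [ -e .git/index ] && echo "index: present" || echo "index: absent"

  echo "Hello Git!" > hello.txt
  git add hello.txt
  [ -e .git/index ] && echo "index: present" || echo "index: absent"
)

demo_git_dir
```

The function body runs in a subshell, so the cd does not affect your current directory.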
Git Objects: The Building Blocks
Git stores everything as objects in the .git/objects/ directory. The three core object types are:
Blob (Binary Large Object): Stores the content of a file
Tree: Stores the structure of a directory
Commit: Stores commit metadata and history
Let’s visualise how Git stores a simple file:
Object = Header + Content (the header is "<type> <size>" followed by a null byte)
Hash = SHA-1(Header + Content)

# Create a simple file
echo "Hello Git!" > hello.txt
# Add it to Git
git add hello.txt
# Find the blob hash Git created
git ls-files -s
100644 670a245535fe6316eb2316c1103b1a88bb519334 0 hello.txt
# View the blob content using git cat-file
git cat-file -p 670a245535fe6316eb2316c1103b1a88bb519334
Hello Git!
# Check the object type
git cat-file -t 670a245535fe6316eb2316c1103b1a88bb519334
blob
# Check the object size
git cat-file -s 670a245535fe6316eb2316c1103b1a88bb519334
11
Git created the blob object even though the code has not been committed.
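To make the "Header + Content" formula concrete, here is a small sketch that computes a blob ID without Git at all. The helper name blob_hash is hypothetical, and the sketch assumes sha1sum from GNU coreutils is available (on macOS, shasum is the usual substitute).

```shell
# Compute a blob ID by hand: SHA-1 over "blob <size>\0" followed by the raw bytes.
blob_hash() {
  size=$(wc -c < "$1" | tr -d ' ')
  { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d ' ' -f 1
}

f=$(mktemp)
echo "hello" > "$f"
blob_hash "$f"   # should match what `git hash-object "$f"` prints
rm -f "$f"
```

Because the ID depends only on the bytes and their length, the filename never enters the calculation.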
The Trinity of Objects
Blob Objects
The blob object stores only the raw content of a file. Metadata such as the filename and mode lives in the tree that points to the blob.

Two files with identical content share the same blob, even if they have different names or live in different directories. This keeps storage efficient.
Tree Objects
A tree object represents a directory. It contains entries that point to blobs and other trees (subdirectories).

Each entry contains:
Mode: File permissions (100644 for regular file, 040000 for directory)
Type: blob or tree
Hash: SHA-1 of the object
Name: Filename or directory name
# Let's create a simple project structure
mkdir src
echo "# My Project" > README.md
echo "console.log('Hello')" > src/script.js
git add .
git commit -m "Initial commit"
[main e96fd80] Initial commit
2 files changed, 2 insertions(+)
create mode 100644 README.md
create mode 100644 src/script.js
# Find the commit hash
git log --oneline
e96fd80 Initial commit
# View the commit object
git cat-file -p e96fd80
tree 94adf3b26b1674cbe0bd655e26779ddf22a56976
parent 97f2a8f1aa3505fac6e490a5692972e154e328dc
author your-name <your-mail@email.com> 1643723400 +0000
committer your-github-id <your-mail@email.com> 1643723400 +0000
Initial commit
# View the root tree
git cat-file -p 94adf3b26b1674cbe0bd655e26779ddf22a56976
100644 blob a2beefd59223ea16000788d77e62f96bdaf23c7c README.md
040000 tree 3b526029261c5c5aeca0a5624002a449a5ce689d src
# View the src/ subdirectory tree
git cat-file -p 3b526029261c5c5aeca0a5624002a449a5ce689d
100644 blob e6eefa224163ef16c6c12834c767a14c44c4d810 script.js
The hierarchy: commit → root tree → subdirectory tree → blob
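The same hierarchy can be assembled by hand with plumbing commands, which shows that a tree is nothing more than a list of (mode, type, hash, name) entries. This is a sketch in a throwaway repository; build_tree_by_hand is a hypothetical name.

```shell
# Sketch: build root tree -> src/ subtree -> blobs without using git add.
build_tree_by_hand() (
  repo=$(mktemp -d) || exit 1
  cd "$repo" || exit 1
  git init -q

  # Store two blobs directly in the object database.
  readme=$(echo "# My Project" | git hash-object -w --stdin)
  script=$(echo "console.log('Hello')" | git hash-object -w --stdin)

  # Inner tree for src/, then a root tree that points at it.
  src_tree=$(printf '100644 blob %s\tscript.js\n' "$script" | git mktree)
  root_tree=$(printf '100644 blob %s\tREADME.md\n040000 tree %s\tsrc\n' \
    "$readme" "$src_tree" | git mktree)

  # Walk the root tree recursively down to the blobs.
  git ls-tree -r "$root_tree"
)

build_tree_by_hand
```

git mktree reads entries in the same format that git ls-tree prints, so the two commands are natural inverses.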
Commit Objects
A commit object ties everything together. It points to a tree and contains metadata about the commit.

The components are:
tree: Points to the root tree of your project at this commit
parent: Points to the previous commit(s) (the first commit has none; merge commits have multiple parents)
author/committer: Who wrote the changes and who committed them, each with a timestamp
message: Your commit message
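As a quick check of the parent link described above, the sketch below makes two commits in a throwaway repository and prints the header of the second one. The function name show_commit_links and the "Example" identity are placeholders, not anything Git requires.

```shell
# Sketch: the second commit's "parent" line is the first commit's hash.
show_commit_links() (
  repo=$(mktemp -d) || exit 1
  cd "$repo" || exit 1
  git init -q
  # Identity supplied inline so the sketch does not depend on global config.
  export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
  export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

  echo "one" > file.txt && git add file.txt && git commit -q -m "First"
  first=$(git rev-parse HEAD)
  echo "two" > file.txt && git add file.txt && git commit -q -m "Second"

  # Print only the header lines of the second commit (up to the blank line).
  git cat-file -p HEAD | sed '/^$/q'
  echo "first commit: $first"
)

show_commit_links
```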
What Happens During git add?
When git add is run, Git creates objects and updates the staging area.

Detailed breakdown:
1. Hash the file: Git computes the SHA-1 hash of the file content (prefixed with a short header)
2. Create blob: Git compresses the content with zlib and stores it in .git/objects/
3. Update index: Git adds an entry to .git/index (staging area) mapping the filename to the blob hash
The index is a binary file (`.git/index`) containing a sorted list of all tracked files and their blob hashes.
# List all files in the index with full details
git ls-files --stage
100644 a2beefd59223ea16000788d77e62f96bdaf23c7c 0 README.md
100644 e6eefa224163ef16c6c12834c767a14c44c4d810 0 src/script.js
The blob exists in .git/objects/, but no commit or tree has been created yet. The staging area is just a list of "what should go in the next commit."
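The three steps can be reproduced one at a time with plumbing commands. This is a rough sketch of what git add does for a single regular file, not its actual implementation; stage_by_hand is a hypothetical name.

```shell
# Sketch: stage test.txt without calling git add.
stage_by_hand() (
  repo=$(mktemp -d) || exit 1
  cd "$repo" || exit 1
  git init -q
  echo "This is a test file" > test.txt

  # Steps 1-2: hash the content and write the compressed blob.
  blob=$(git hash-object -w test.txt)

  # Loose objects live at .git/objects/<first 2 hex chars>/<remaining 38>.
  rest=${blob#??}
  dir=${blob%"$rest"}
  ls ".git/objects/$dir/$rest"

  # Step 3: record the path -> blob mapping in the index.
  git update-index --add --cacheinfo "100644,$blob,test.txt"
  git ls-files --stage
)

stage_by_hand
```

The two-character directory split keeps any single directory under .git/objects/ from growing too large.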
Watching Git Add in Action
# Start with a fresh repository (git init, no commits yet)
git status
On branch main
No commits yet
nothing to commit (create/copy files and use "git add" to track)
# Create a new file
echo "This is a test file" > test.txt
# Check status - file is untracked
git status
On branch main
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
test.txt
# Count objects before adding
find .git/objects -type f | wc -l
0
# Add the file
git add test.txt
# Count objects after adding - a new blob was created!
find .git/objects -type f | wc -l
1
# View the staging area
git ls-files -s
100644 9f4b6d8bfeaf44aaa69872286163784706d1b053 0 test.txt
# The blob exists, even before commit!
git cat-file -p 9f4b6d
This is a test file
# But no commit or tree objects exist yet
git rev-parse HEAD
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
The blob was created during git add, not during git commit.
What Happens During git commit?
The git commit command transforms your staged changes into permanent history by creating tree and commit objects.

Detailed breakdown:
1. Read the index: Git reads .git/index to see what files are staged
2. Create tree objects: Git creates tree objects for each directory, working bottom-up
3. Create commit object: Git creates a commit object pointing to the root tree
4. Update branch reference: Git updates .git/refs/heads/main to point to the new commit
5. Leave the index in place: The index is not cleared; it simply matches the new commit now
# Continuing from our previous example with test.txt staged
git commit -m "Add test file"
[main (root-commit) 2ae372a] Add test file
1 file changed, 1 insertion(+)
create mode 100644 test.txt
# Count objects after commit - now we have blob + tree + commit!
find .git/objects -type f | wc -l
3
# View the commit
git cat-file -p HEAD
tree 7abbe04404b7c52c1aa0b4292ae70db4468f09c7
author your-name <your-email@mail.com> 1707475200 +0000
committer your-name <your-email@mail.com> 1707475200 +0000
Add test file
# View the tree that was created
git cat-file -p 7abbe04404b7c52c1aa0b4292ae70db4468f09c7
100644 blob 9f4b6d8bfeaf44aaa69872286163784706d1b053 test.txt
# The branch reference was updated
cat .git/refs/heads/main
2ae372a8dbf3d096876a303255518a5549921cbf
# And HEAD points to it
cat .git/HEAD
ref: refs/heads/main
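The commit steps listed above map closely onto three plumbing commands. The sketch below is an approximation of what git commit does for a first commit, assuming the branch is called main; commit_by_hand and the "Example" identity are placeholders.

```shell
# Sketch: create a commit without calling git commit.
commit_by_hand() (
  repo=$(mktemp -d) || exit 1
  cd "$repo" || exit 1
  git init -q
  git symbolic-ref HEAD refs/heads/main   # make sure HEAD names "main"
  export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
  export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

  echo "This is a test file" > test.txt
  git add test.txt

  # Steps 1-2: read the index and write tree objects for it.
  tree=$(git write-tree)
  # Step 3: wrap the root tree in a commit object (no parent: first commit).
  commit=$(echo "Add test file" | git commit-tree "$tree")
  # Step 4: move the branch to the new commit.
  git update-ref refs/heads/main "$commit"

  git --no-pager log --oneline
  git status --short   # prints nothing: index and HEAD now agree
)

commit_by_hand
```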
How Git Tracks Changes
Conceptually, Git doesn't store diffs or deltas between versions; it stores a snapshot of the whole project at each commit.
Objects are identified by their content hash:
Files with identical content share the same blob
Unchanged files between commits reuse existing blobs
Only modified files create new blobs
Let's visualise it using bash commands:
# Create initial commit with three files
echo "hello" > file1.txt
echo "world" > file2.txt
echo "test" > file3.txt
git add .
git commit -m "Initial commit"
[main (root-commit) 1232eb8] Initial commit
3 files changed, 3 insertions(+)
# Record the blob hash for file1.txt
git ls-files -s file1.txt
100644 ce013625030ba8dba906f756967f9e9ca394464a 0 file1.txt
# Count total objects
find .git/objects -type f | wc -l
5 # 3 blobs + 1 tree + 1 commit
# Now modify ONLY file2.txt
echo "MODIFIED" > file2.txt
git add file2.txt
git commit -m "Update file2"
[main 6bb1664] Update file2
1 file changed, 1 insertion(+), 1 deletion(-)
# Count objects again
find .git/objects -type f | wc -l
8 # Added: 1 new blob + 1 new tree + 1 new commit = 3 more
# Check file1.txt blob hash again - IT'S THE SAME!
git ls-files -s file1.txt
100644 ce013625030ba8dba906f756967f9e9ca394464a 0 file1.txt
# Verify that only one blob holds the content "hello"
git cat-file --batch-all-objects --batch-check | awk '$2 == "blob" {print $1}' | while read -r hash; do git cat-file -p "$hash" | grep -qx "hello" && echo "$hash"; done
ce013625030ba8dba906f756967f9e9ca394464a # Only one blob with "hello"!
# View both trees to see the reuse
git cat-file -p main~1^{tree} # First commit's tree
100644 blob ce013625030ba8dba906f756967f9e9ca394464a file1.txt
100644 blob cc628ccd10742baea8241c5924df992b5c019f71 file2.txt
100644 blob 9daeafb9864cf43055ae93beb0afd6c7d144bfa4 file3.txt
git cat-file -p main^{tree} # Second commit's tree
100644 blob ce0136... file1.txt # ← Same blob hash
100644 blob b5dc6b... file2.txt # ← Different blob (changed)
100644 blob 9daeaf... file3.txt # ← Same blob hash
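Because unchanged content is shared, each new commit adds only a few objects. Git also packs loose objects into packfiles, where similar objects are delta-compressed. This is purely a storage optimization: every commit still behaves as a full snapshot.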
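Packing can be observed directly. In this sketch, a single commit leaves three loose objects (blob, tree, commit), and git gc moves them into a packfile. The name pack_demo and the "Example" identity are placeholders.

```shell
# Sketch: loose objects before and after git gc.
pack_demo() (
  repo=$(mktemp -d) || exit 1
  cd "$repo" || exit 1
  git init -q
  export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
  export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

  echo "hello" > file1.txt
  git add file1.txt && git commit -q -m "Initial commit"

  # "count" is the number of loose objects.
  git count-objects -v | grep '^count:'

  git gc -q

  # After gc the same objects are reported as packed instead of loose.
  git count-objects -v | grep -E '^(count|in-pack):'
)

pack_demo
```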
Hash-Based Integrity
Every Git object is identified by the SHA-1 hash of its content. This provides powerful guarantees:
Content Integrity
Tamper detection: Any corruption or malicious modification changes the hash
Verification: You can verify repository integrity by rehashing objects
Deduplication: Identical content always produces the same hash
Verifying hash integrity:
# Create a file and see its hash
echo "hello" > hello.txt
git hash-object hello.txt
ce013625030ba8dba906f756967f9e9ca394464a
# Even a small change to the content completely changes the hash
echo "hello world" > hello.txt
git hash-object hello.txt
3b18e512dba79e4c8300dd08aeb37f8e728b8dad # Completely different!
# Same content = same hash (deduplication)
echo "Hello, Git!" > file_a.txt
echo "Hello, Git!" > file_b.txt
git hash-object file_a.txt
b7aec520dec0a7516c18eb4c68b64ae1eb9b5a5e
git hash-object file_b.txt
b7aec520dec0a7516c18eb4c68b64ae1eb9b5a5e # Identical hash!
# Git will store only ONE blob for both files
git add file_a.txt file_b.txt
git ls-files -s
100644 b7aec520dec0a7516c18eb4c68b64ae1eb9b5a5e 0 file_a.txt
100644 b7aec520dec0a7516c18eb4c68b64ae1eb9b5a5e 0 file_b.txt
# Same blob hash for both files!
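Tamper detection can be tried out safely in a throwaway repository. The sketch below overwrites one object file on disk and lets git fsck notice that the stored bytes no longer match the object's name. tamper_demo and the "Example" identity are placeholders.

```shell
# Sketch: git fsck notices an object file that was modified on disk.
tamper_demo() (
  repo=$(mktemp -d) || exit 1
  cd "$repo" || exit 1
  git init -q
  export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
  export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

  echo "hello" > file1.txt
  git add file1.txt && git commit -q -m "Initial commit"

  git fsck >/dev/null 2>&1 && echo "before: ok"

  # Locate the blob's loose object file and overwrite it with other bytes.
  blob=$(git rev-parse HEAD:file1.txt)
  rest=${blob#??}
  obj=".git/objects/${blob%"$rest"}/$rest"
  chmod u+w "$obj"   # object files are written read-only
  echo "tampered" > "$obj"

  git fsck >/dev/null 2>&1 || echo "after: corruption detected"
)

tamper_demo
```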
Commit Chain Integrity
Each commit includes its parent’s hash in its content, so the entire history forms a cryptographic chain. Changing any commit changes its hash, which in turn changes the hash of every descendant commit. This makes rewritten history detectable.
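A small sketch of the chain property: two commits that agree on tree, message, author, and date, but differ in their parent, get different hashes. Dates are pinned so the result is repeatable; chain_demo and the "Example" identity are placeholders.

```shell
# Sketch: the parent hash is part of what a commit's own hash covers.
chain_demo() (
  repo=$(mktemp -d) || exit 1
  cd "$repo" || exit 1
  git init -q
  export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
  export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
  # Fixed timestamps, so identical inputs really are identical.
  export GIT_AUTHOR_DATE="1700000000 +0000" GIT_COMMITTER_DATE="1700000000 +0000"

  tree=$(git mktree </dev/null)   # the empty tree
  a=$(echo "A" | git commit-tree "$tree")
  b=$(echo "B" | git commit-tree "$tree")

  # Same tree, message, author, and date; only the parent differs.
  child_of_a=$(echo "child" | git commit-tree "$tree" -p "$a")
  child_of_b=$(echo "child" | git commit-tree "$tree" -p "$b")
  again=$(echo "child" | git commit-tree "$tree" -p "$a")

  [ "$child_of_a" != "$child_of_b" ] && echo "different parent -> different hash"
  [ "$child_of_a" = "$again" ] && echo "same inputs -> same hash"
)

chain_demo
```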
Essential Git Internals Commands
Viewing Objects
# View any object's content
git cat-file -p <hash>
# Check object type (blob/tree/commit)
git cat-file -t <hash>
# Check object size
git cat-file -s <hash>
# Print raw content, checking that the object has the given type
git cat-file blob <hash> # View blob content
git cat-file commit <hash> # View commit details
git cat-file tree <hash> # Raw binary tree entries (use -p for readable output)
Exploring the Repository
# List all files in the staging area
git ls-files --stage
# List all files tracked at HEAD
git ls-tree HEAD
git ls-tree -r HEAD # Recursive
# List objects in a tree
git ls-tree <tree-hash>
# View commit history with hashes
git log --oneline
git log --oneline --graph --all
# Show all reachable objects
git rev-list --objects --all
Finding Objects
# Find all object files
find .git/objects -type f
# Count total objects
git count-objects -v
# Hash a file without storing it
git hash-object <file>
# Hash and store a file
git hash-object -w <file>
# Hash from stdin
echo "content" | git hash-object --stdin
echo "content" | git hash-object -w --stdin
Repository Health
# Check repository integrity
git fsck
git fsck --full
# Find unreachable objects
git fsck --unreachable
# Save dangling objects into .git/lost-found/
git fsck --lost-found
# View reflog (history of where HEAD and branch tips have pointed)
git reflog
git reflog show HEAD
Advanced Inspection
# Compare two commits
git diff <hash1> <hash2>
# Show what changed in a commit
git show <hash>
# Show commits with per-file change summaries
git log --stat
git log --name-status
# Verify a file's hash
git hash-object <file>
# List commits that touched a file, across all branches
git log --all --full-history -- <filepath>
Understanding References
# View current branch
cat .git/HEAD
# View branch references
cat .git/refs/heads/main
ls -la .git/refs/heads/
# View all references
git show-ref
# Symbolic ref (what HEAD points to)
git symbolic-ref HEAD
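To see that a branch is only a pointer, the sketch below creates one by writing a commit ID under refs/heads/. It assumes the default files-based ref storage, where each new ref is a small text file; ref_demo and the "Example" identity are placeholders.

```shell
# Sketch: a branch is a file containing a 40-character commit hash.
ref_demo() (
  repo=$(mktemp -d) || exit 1
  cd "$repo" || exit 1
  git init -q
  git symbolic-ref HEAD refs/heads/main   # make sure HEAD names "main"
  export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
  export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

  echo "hello" > file1.txt
  git add file1.txt && git commit -q -m "Initial commit"
  commit=$(git rev-parse HEAD)

  # Creating a branch amounts to recording a commit ID under refs/heads/.
  git update-ref refs/heads/feature "$commit"
  cat .git/refs/heads/feature

  # HEAD itself still points at main.
  git symbolic-ref HEAD

  [ "$(cat .git/refs/heads/feature)" = "$commit" ] && echo "feature -> same commit as main"
)

ref_demo
```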
Understanding Git's internals changes how you work at the terminal: you know why you run git add and git commit, not just that you should. Blobs, trees, and commits are the three building blocks of a content-addressable filesystem.
Git uses SHA-1 hashes as addresses: git add hashes the content and creates blobs, and git commit takes a permanent snapshot by creating trees and a commit.
With these fundamentals in place, Git becomes a logical system, and you can reason about what each command is doing under the hood.





