Introduction to git

For PHY7518 • April 14, 2025 • Uni Bonn

Sagnik Ghosh

.... And who will save thy code from thyself?

@smutch.github.io ( "... and yes this comic is under version control" )

.... And then there was git !

.... very nice! but....

xkcd/1597

Basic backend:

Version control: Idea 1

(Figure: copying folders by hand: my_folder_1 (my_file_1, my_file_2, my_img_2) → my_folder_1a (my_file_1, my_file_2a, my_img_2) → folder_1a, folder_1b, folder_2a, folder_1.1b, folder_2c, folder_2d, folder_1a_new, folder_1a_new_new, folder_1arrrrgghhhhh!)

Basic backend:

But not all is lost:

Pointers, data, and B-Trees

In 1970, Rudolf Bayer and Edward M. McCreight, while at Boeing, introduced B-trees, a self-balancing tree data structure. B-trees allow efficient insertion, deletion, and search operations, making them ideal for databases and file systems.

git builds on a related idea: its history is a Directed Acyclic Graph (DAG), in which every commit points to its parent commit(s) and to a tree of its files.

Basic backend:

📦 Git Objects: Blobs, Trees, Commits, Tags

Git tracks content using four types of objects:

Blob – A file's contents.
Tree – Like a directory: it maps filenames to blobs or other trees (yes, this is the tree in git tree!).
Commit – Points to a tree and has metadata (author, message, parents).
Tag – A label pointing to a specific commit or object.
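To see the four object types in practice, here is a minimal sketch that builds a throwaway repository and asks `git cat-file -t` for the type of each object. The file name `hello.txt`, the tag `v0.1` and the identity are placeholders; note that only annotated tags (`git tag -a`) create a separate tag object.

```shell
set -eu

# Scratch repository in a temporary directory (removed on exit)
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Placeholder identity so commits work without a global git config
export GIT_AUTHOR_NAME="Ada Example" GIT_AUTHOR_EMAIL="ada@example.org"
export GIT_COMMITTER_NAME="Ada Example" GIT_COMMITTER_EMAIL="ada@example.org"

git init -q
echo "hello" > hello.txt
git add hello.txt
git commit -q -m "first commit"
git tag -a v0.1 -m "first release"    # annotated tag: stored as its own object

commit_type=$(git cat-file -t HEAD)              # the commit HEAD points to
tree_type=$(git cat-file -t 'HEAD^{tree}')       # the commit's top-level tree
blob_type=$(git cat-file -t HEAD:hello.txt)      # the file contents
tag_type=$(git cat-file -t v0.1)                 # the annotated tag object

echo "$commit_type $tree_type $blob_type $tag_type"   # prints: commit tree blob tag
```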

Basic backend: git commit

commit <object_size>
tree <tree_hash>
parent <parent_hash>         # (optional, omitted in initial commit)
author <name> <email> <timestamp> <timezone>
committer <name> <email> <timestamp> <timezone>

<commit message>

A git commit object contains the following fields; the objects it refers to (the tree, the parents) are referenced by their SHA-1 or SHA-256 hash.

🗂️ `tree <tree_hash>`

Points to a tree object that represents the directory structure and file contents at the time of the commit.
Think of it as a snapshot of the project’s state.

Basic backend: git commit

🧬 `parent <commit_hash>`

References the immediate predecessor commit(s).
Regular commits have 1 parent.
Merge commits have 2 or more parents.
The first commit has no parent.

👤 `author <name> <email> <timestamp> <timezone>`

This is who originally wrote the code and when.
You can set this with git config --global user.name "Your Name" and git config --global user.email "you@example.org".

Basic backend: git commit

🛠️ `committer <name> <email> <timestamp> <timezone>`

This is who added the commit to the repository.
Often the same as the author, but not always (e.g., when someone else rebases or cherry-picks your work).
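A small sketch showing that author and committer are stored separately: the identity is set per repository with `git config` (the `--global` variant sets it for all repositories), and the committer is overridden for one commit through git's `GIT_COMMITTER_NAME`/`GIT_COMMITTER_EMAIL` environment variables, which loosely mimics someone else applying your work. All names and e-mail addresses are made up.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q

# Repository-local identity; `git config --global ...` would set it for all repositories
git config user.name "Ada Example"
git config user.email "ada@example.org"

echo "v1" > notes.txt
git add notes.txt
# Override only the committer for this single commit
GIT_COMMITTER_NAME="Bob Maintainer" GIT_COMMITTER_EMAIL="bob@example.org" \
  git commit -q -m "add notes"

author=$(git log -1 --format=%an)       # %an = author name
committer=$(git log -1 --format=%cn)    # %cn = committer name
echo "author=$author committer=$committer"

git cat-file -p HEAD    # raw commit object: tree, author and committer lines, message
```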

📝 Commit Message

The message describing what the commit does.
This is what you write after git commit -m "your message".

Basic backend: git commit

🔐 Internals (Bonus for Nerds🧠)

The commit is stored in .git/objects/ as a zlib-compressed object, just like blobs and trees.
Its SHA-1 (or SHA-256) hash uniquely identifies it and is calculated from the commit's entire content.
Changing anything (e.g., author name, message, tree) changes the commit hash.

You can check all this using:

git cat-file -p <commit_hash>

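This is also why the repository size does not explode with more commits: git only stores new blobs for files that have changed, while unchanged files are shared between commits through the same hash. When objects are packed (e.g. by git gc), git additionally stores similar objects with delta compression.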
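A quick way to check this, sketched in a scratch repository with made-up file names: two commits where only `analysis.py` changes. `git rev-parse <commit>:<path>` prints the hash of the blob a commit stores for that path, so an unchanged file should show the same hash in both commits.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
# Placeholder identity so commits work without a global git config
export GIT_AUTHOR_NAME="Ada Example" GIT_AUTHOR_EMAIL="ada@example.org"
export GIT_COMMITTER_NAME="Ada Example" GIT_COMMITTER_EMAIL="ada@example.org"

git init -q
echo "constant input data" > big_input.txt
echo "x = 1" > analysis.py
git add big_input.txt analysis.py
git commit -q -m "first version"

echo "x = 2" > analysis.py        # only analysis.py changes
git commit -q -am "tweak analysis"

# Blob hash that each commit records for each path
input_old=$(git rev-parse HEAD~1:big_input.txt)
input_new=$(git rev-parse HEAD:big_input.txt)
code_old=$(git rev-parse HEAD~1:analysis.py)
code_new=$(git rev-parse HEAD:analysis.py)

echo "big_input.txt: $input_old -> $input_new (same blob, stored once)"
echo "analysis.py:   $code_old -> $code_new (new blob)"

git cat-file -p HEAD              # tree, parent, author, committer, message
```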

git LFS:

But what about big-data

Normally, services like GitHub/GitLab have a limit on both individual file sizes and the size of the commits being pushed.

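Enter Git LFS (Large File Storage): large files are replaced in the repository by small pointer files, while their actual content is kept on a separate LFS server.

It has to be installed separately; then enable it once:

git lfs install

Then choose which files to track (this writes the patterns to .gitattributes):

git lfs track "*.psd"

Make sure .gitattributes itself is tracked:

git add .gitattributes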

Zenodo:

But ultimately, keeping large data in the code repository is a bad idea. Best practice is to store code and data separately. Alternatives include sftp.uni-bonn.de, where you can even initialise a separate git repository just for the data!

Version-tracked data can also be shared publicly (modulo some restrictions).

Zenodo 🚀 is an open-access repository developed by CERN 🏛️ through the OpenAIRE initiative 🇪🇺. It lets researchers share, preserve, and cite all kinds of research outputs—papers 📄, datasets 📊, software 💻, presentations 📽️, and more. Every upload gets a DOI 🔗, making it easy to cite and find. Zenodo supports all disciplines 🌍 and is free to use 💸, promoting open science and long-term preservation 📦.

git basics:

So far so good, so how to use git?

Initialise repo locally

git init

or clone

git clone <repo-url>

git basics:

to check status

git status

after change, stage a specific file or all changes

git add <file>
git add .

Then commit the staged changes:

git commit -m "Your commit message"
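Putting these steps together, a minimal local round trip could look like the following sketch (directory, file name and identity are placeholders; `--short` only makes the status output compact).

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
# Placeholder identity so commits work without a global git config
export GIT_AUTHOR_NAME="Ada Example" GIT_AUTHOR_EMAIL="ada@example.org"
export GIT_COMMITTER_NAME="Ada Example" GIT_COMMITTER_EMAIL="ada@example.org"

mkdir my_project
cd my_project
git init -q                       # creates the hidden .git/ directory

echo "print('hello')" > main.py
git status --short                # ?? main.py   (untracked)

git add main.py                   # stage the file
git status --short                # A  main.py   (staged, not yet committed)

git commit -q -m "Add hello script"
git log --oneline                 # one commit in the history
n_commits=$(git rev-list --count HEAD)
```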

git basics:

finally to push to github/gitlab

git push
git push <remote> <branch>

to view current remotes

git remote -v

you can also add more remotes to the repo:

git remote add <name> <url>
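A self-contained sketch of working with remotes: here two local bare repositories stand in for GitHub/GitLab (the paths `server.git`/`backup.git` and the remote name `backup` are hypothetical). With a real host, `<url>` is the HTTPS or SSH address shown on the repository page.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
# Placeholder identity so commits work without a global git config
export GIT_AUTHOR_NAME="Ada Example" GIT_AUTHOR_EMAIL="ada@example.org"
export GIT_COMMITTER_NAME="Ada Example" GIT_COMMITTER_EMAIL="ada@example.org"

# Two bare repositories play the role of hosted remotes
git init -q --bare server.git
git init -q --bare backup.git

git init -q -b main work
cd work
echo "data" > file.txt
git add file.txt
git commit -q -m "initial"

git remote add origin ../server.git     # the usual name for the main remote
git remote add backup ../backup.git     # a second, hypothetical remote
git remote -v                           # fetch and push URL of every remote

git push -q -u origin main              # -u: remember origin/main as upstream
git push -q backup main

local_head=$(git rev-parse HEAD)
origin_head=$(git ls-remote origin refs/heads/main | cut -f1)
backup_head=$(git ls-remote backup refs/heads/main | cut -f1)
n_remotes=$(git remote | wc -l | tr -d ' ')
```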

.gitignore

.gitignore 🚫📁 is a special file in Git that tells it which files or folders to skip when committing. It keeps your repo clean by ignoring things like logs 📝, temp files 🧹, and secrets 🔐—stuff you don’t want in version control.

# Ignore Python cache
__pycache__/

# Ignore log files
*.log

# Ignore environment files
.env

# Ignore OS-specific files
.DS_Store
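To find out which rule ignores a file, `git check-ignore -v` prints the matching .gitignore line. The sketch reuses some of the patterns above with made-up file names; ignored files also disappear from `git status`.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q

cat > .gitignore <<'EOF'
# Ignore Python cache
__pycache__/
# Ignore log files
*.log
# Ignore environment files
.env
EOF

mkdir __pycache__
touch __pycache__/mod.cpython-312.pyc run.log .env analysis.py

# Prints "<source>:<line>:<pattern>	<path>" for every ignored path
git check-ignore -v run.log .env __pycache__/mod.cpython-312.pyc

# Only .gitignore and analysis.py remain visible as untracked files
untracked=$(git status --porcelain --untracked-files=all)
echo "$untracked"
```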

github:

GitHub is the world’s leading platform for hosting and collaborating on code. Built around Git 🧠, it lets developers share projects 🌐, track issues 🐞, and work together through pull requests 🔀. With features like Actions ⚙️ for CI/CD, Pages 🌍 for documentation, and Copilot 🤖 for AI-assisted coding, GitHub powers open-source 🔓 and enterprise development alike. Owned by Microsoft 🪟, it’s the go-to hub for coders around the world 🌎.

gitlab.uni-bonn.de:

GitLab is a full DevOps platform that brings your entire software development lifecycle into a single application. From version control 📂 and issue tracking 🧾 to CI/CD pipelines 🚀, security scans 🔒, and deployment tools ☁️—GitLab covers it all. Built on Git, it’s available both as a cloud service ☁️ and self-hosted solution 🏠, making it perfect for teams that want speed ⚡, control 🔧, and powerful collaboration 👥. Open core and proudly community-driven 💬!

Since Jan 2024 Bonn offers a gitlab instance.

github organizations:

GitHub Organizations are shared accounts where teams can collaborate on projects more efficiently. They provide structured access control, central management, and powerful tools for scaling development across multiple repositories.

Repository Management: Create, manage, and share repositories with team members.
Access Control: Configure permissions for repositories, ensuring secure and efficient collaboration.
Security & Compliance: Utilize features like SAML SSO and audit logs for enhanced security and compliance.
Project Integration: Seamlessly integrate with GitHub Actions, Issues, and other GitHub tools for streamlined workflows.
Visibility & Insights: Get detailed insights into contributions, activity, and project progress.
Custom Workflows: Leverage GitHub Apps and integrations to tailor your development process.

github classroom:

GitHub Classroom is a free tool by GitHub that helps teachers manage coding assignments with ease. It lets instructors create, distribute, and grade programming tasks using GitHub repositories—perfect for computer science classes, bootcamps, and workshops.

Each student gets every assignment as their own repository.
Tutors can see the commits and comment on changes.
Grading can be automated with autograding tests that run on GitHub Actions.

github classroom:

git branches:

".. through a beautiful distributed graph theory tree model"

Trees can branch :)

master:

Of course every tree already has at least one branch. To view them:

git branch -v
>> master 8e1f3aa added points

This prints the branch names together with the head of each branch, i.e. the short hash and message of the commit it currently points to.

there are tools to visualise this:

from:

...
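Git itself can also draw the graph in the terminal. The sketch below builds a tiny history with one side branch (branch and file names are placeholders), merges it, and prints it with `git log --graph --oneline --all`; the merge commit is the one with two parents.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
# Placeholder identity so commits work without a global git config
export GIT_AUTHOR_NAME="Ada Example" GIT_AUTHOR_EMAIL="ada@example.org"
export GIT_COMMITTER_NAME="Ada Example" GIT_COMMITTER_EMAIL="ada@example.org"

git init -q -b main
echo a > a.txt; git add a.txt; git commit -q -m "A: start"

git switch -q -c feature                 # side branch
echo b > b.txt; git add b.txt; git commit -q -m "B: feature work"

git switch -q main                       # back on main, history diverges
echo c > c.txt; git add c.txt; git commit -q -m "C: fix on main"

git merge -q --no-edit -m "M: merge feature" feature
git log --graph --oneline --all          # ASCII drawing of the DAG

n_parents=$(git log -1 --format=%p | wc -w | tr -d ' ')   # parents of HEAD
```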

note: GitHub, GitLab and git can show different naming conventions for the default branch

GitHub used to call it "master" and now mostly calls it "main"

GitLab (including the Bonn instance) calls it "main"

master:

This actually depends on two things:

If created by a local installation, it depends on the git version:

with versions < 2.28 (released July 2020) the default branch is always "master"
since 2.28 the default branch name is configurable (init.defaultBranch); if you never set it, git still uses "master"

GitHub switched the default branch name for new repositories to "main" on October 1, 2020.

So repos created before that date use "master"
repos created after use "main"

For GitLab instances it depends on the GitLab version and the instance's settings.
Bonn-git only started in January 2024, so it uses "main" :)

master --> main:

main:

Anyhow, starting with git 2.28 the default is globally configurable to whatever you prefer. The command is:

git config --global init.defaultBranch main

And of course you can rename any branch too. If you are currently on the branch you want to rename:

git branch -m <new-branch-name>

If you are on a different branch:

git branch -m <old-branch-name> <new-branch-name>
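A sketch of these commands in a scratch repository. To avoid changing your real global configuration, it passes `init.defaultBranch` for a single command with `git -c`, which acts like the `--global` setting for that one call; the branch names `dev`, `old-name` and `new-name` are placeholders.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
# Placeholder identity so commits work without a global git config
export GIT_AUTHOR_NAME="Ada Example" GIT_AUTHOR_EMAIL="ada@example.org"
export GIT_COMMITTER_NAME="Ada Example" GIT_COMMITTER_EMAIL="ada@example.org"

# Same effect as `git config --global init.defaultBranch main`, but for this call only
git -c init.defaultBranch=main init -q
echo x > x.txt; git add x.txt; git commit -q -m "init"
first=$(git branch --show-current)          # main

git branch -m dev                           # rename the branch you are on
renamed=$(git branch --show-current)        # dev

git branch old-name                         # create another branch ...
git branch -m old-name new-name             # ... and rename it while staying on dev
git branch --list
```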

main:

note: these changes are of course local, so you have to apply them separately on GitHub (the remote)

a safe workflow is:

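1. Push the newly named branch to remote:

git push origin <new-branch-name>

2. Delete the old branch from the remote:

git push origin --delete <old-branch-name>

3. Update the upstream tracking (optional but recommended):

git push --set-upstream origin <new-branch-name>

If the old branch is the repository's default branch on GitHub, change the default branch in the repository settings first; otherwise GitHub refuses to delete it.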
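The three remote steps can be tried safely against a local bare repository standing in for GitHub (`server.git` and the old/new names `master`/`main` are assumptions for this demo); the local rename is repeated so the script runs on its own.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
# Placeholder identity so commits work without a global git config
export GIT_AUTHOR_NAME="Ada Example" GIT_AUTHOR_EMAIL="ada@example.org"
export GIT_COMMITTER_NAME="Ada Example" GIT_COMMITTER_EMAIL="ada@example.org"

git init -q --bare -b main server.git    # stand-in for GitHub; its default branch is main
git init -q -b master work
cd work
echo x > x.txt; git add x.txt; git commit -q -m "init"
git remote add origin ../server.git
git push -q -u origin master             # the remote now has the old name

git branch -m master main                # local rename
git push -q origin main                  # 1. push the newly named branch
git push -q origin --delete master       # 2. delete the old branch on the remote
git push -q --set-upstream origin main   # 3. track origin/main from now on

remote_branches=$(git ls-remote --heads origin)
upstream=$(git rev-parse --abbrev-ref '@{upstream}')
echo "$remote_branches"
echo "upstream: $upstream"
```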

create a branch:

syntax:

git branch <branch-name>

But this does not switch you to the new branch. You have to check it out! Either of these creates the branch and switches to it in one command:

git checkout -b <branch-name>
# or (recommended in newer Git versions)
git switch -c <branch-name>

git switch exists since git 2.23; "-c" stands for "create".
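A short sketch contrasting the commands: `git branch` only creates the new branch pointer, while `switch -c` and `checkout -b` also move you onto it. Branch names are placeholders.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
# Placeholder identity so commits work without a global git config
export GIT_AUTHOR_NAME="Ada Example" GIT_AUTHOR_EMAIL="ada@example.org"
export GIT_COMMITTER_NAME="Ada Example" GIT_COMMITTER_EMAIL="ada@example.org"

git init -q -b main
echo x > x.txt; git add x.txt; git commit -q -m "init"

git branch dev                                # create only
after_branch=$(git branch --show-current)     # still main

git switch -q -c feature/plot                 # create and switch
after_switch=$(git branch --show-current)     # feature/plot

git checkout -q -b hotfix                     # the older equivalent
after_checkout=$(git branch --show-current)   # hotfix
```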

best practices:

Always work on a dev branch and merge changes into main when you are satisfied. Make sure to pull before you push! Handle merge conflicts locally!

rebase replays your branch's commits on top of the other branch, so your branch's HEAD ends up in front (and the replayed commits get new hashes). Are you sure? Are you really really sure? Maybe think once again before you merge with main!

Remember:

best practices:

Recall: a commit hash is computed from the tree, the parent hash(es), the author, the committer and the message.

So, what happens if two authors change two different files on the same branch, then commit and push without pulling first?

The second push is rejected at first. But after a git pull, git sees that the history has forked (like an implicit branch) and automatically merges it back, creating a merge commit with two parents and its own, separate hash!
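This scenario can be replayed locally. In the sketch, a bare repository plays the server and two clones, `alice` and `bob` (hypothetical), each change a different file. Bob's first push is rejected because the remote has a commit he lacks; after pulling, git merges automatically and the resulting commit has two parents. `--no-rebase` is passed explicitly so the result does not depend on local pull settings.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
# Placeholder identity shared by both clones
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.org"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.org"

# A bare repository stands in for GitHub/GitLab
git init -q --bare -b main server.git
git init -q -b main seed
(cd seed && echo "a0" > a.txt && echo "b0" > b.txt && git add . &&
  git commit -q -m "initial" && git push -q ../server.git main)

git clone -q server.git alice
git clone -q server.git bob

# Alice edits a.txt and pushes first
(cd alice && echo "a1" > a.txt && git commit -q -am "alice: update a" && git push -q)

# Bob edits a different file and tries to push without pulling
cd bob
echo "b1" > b.txt
git commit -q -am "bob: update b"
if git push -q 2>/dev/null; then push_result=accepted; else push_result=rejected; fi
echo "first push: $push_result"

# fetch + merge: the changes do not overlap, so git creates the merge commit itself
git pull -q --no-rebase --no-edit
git push -q
n_parents=$(git log -1 --format=%p | wc -w | tr -d ' ')
git log --graph --oneline
```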

best practices:

Things of course do not go this smoothly if the change is in the same line of the same file. First, the push is rejected:

! [rejected]        main -> main (fetch first)
error: failed to push some refs to 'https://github.com/your/repo.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

if you now try git pull

Auto-merging filename.txt
CONFLICT (content): Merge conflict in filename.txt
Automatic merge failed; fix conflicts and then commit the result.
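The same-line case can be reproduced with a similar hypothetical `alice`/`bob` setup. The sketch records that the pull stops with a conflict, lists the unmerged file, and then resolves it by simply keeping Bob's line before committing the merge and pushing.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
# Placeholder identity shared by both clones
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.org"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.org"

git init -q --bare -b main server.git
git init -q -b main seed
(cd seed && echo "line 1" > notes.txt && git add notes.txt &&
  git commit -q -m "initial" && git push -q ../server.git main)
git clone -q server.git alice
git clone -q server.git bob

# Both edit the SAME line of the SAME file; Alice pushes first
(cd alice && echo "line 1 (alice)" > notes.txt && git commit -q -am "alice: edit line 1" && git push -q)

cd bob
echo "line 1 (bob)" > notes.txt
git commit -q -am "bob: edit line 1"
git push -q 2>/dev/null || echo "push rejected: remote contains work we do not have"

# The pull stops with "CONFLICT (content): Merge conflict in notes.txt"
if git pull --no-rebase --no-edit; then pull_ok=yes; else pull_ok=no; fi
conflicted=$(git diff --name-only --diff-filter=U)    # files still unmerged
if grep -q '^<<<<<<<' notes.txt; then had_markers=yes; else had_markers=no; fi
cat notes.txt                                         # shows <<<<<<< / ======= / >>>>>>>

# Resolve by hand (here: keep Bob's line), then conclude the merge and push
echo "line 1 (bob)" > notes.txt
git add notes.txt
git commit -q --no-edit
git push -q
n_parents=$(git log -1 --format=%p | wc -w | tr -d ' ')
```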

best practices:

This happens because git pull is shorthand for,

git fetch
git merge origin/your-branch

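One workaround is git pull --rebase, which replays your local commits on top of the fetched remote branch, so your HEAD moves to the front and no merge commit is created (conflicts on the same line still have to be resolved by hand, but the history stays linear). This is also shorthand for: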

git fetch
git rebase origin/your-branch
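For comparison, a sketch of git pull --rebase in a hypothetical two-clone setup (`alice`, `bob`) with non-overlapping changes: Bob's commit is replayed on top of Alice's, so it gets a new hash, has a single parent, and the follow-up push is a plain fast-forward.

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
# Placeholder identity shared by both clones
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.org"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.org"

git init -q --bare -b main server.git
git init -q -b main seed
(cd seed && echo "a0" > a.txt && echo "b0" > b.txt && git add . &&
  git commit -q -m "initial" && git push -q ../server.git main)
git clone -q server.git alice
git clone -q server.git bob

(cd alice && echo "a1" > a.txt && git commit -q -am "alice: update a" && git push -q)

cd bob
echo "b1" > b.txt
git commit -q -am "bob: update b"
before=$(git rev-parse HEAD)

git pull -q --rebase              # fetch + rebase: replay Bob's commit on top of Alice's
after=$(git rev-parse HEAD)
n_parents=$(git log -1 --format=%p | wc -w | tr -d ' ')
below=$(git log -1 --format=%s HEAD~1)

git push -q                       # now a fast-forward, so it is accepted
git log --graph --oneline
```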

But note: these fixes are handled locally and can't be applied at the level of the host. So always, best practice:

"pull" before "push"

Extras: more features from github

git issues:

GitHub Issues are a powerful tool for tracking tasks, bugs, enhancements, and feature requests within a project. Integrated directly into GitHub repositories, they help teams stay organized, prioritize work, and collaborate more effectively. Each issue can be labeled, assigned, commented on, and linked to code changes, making it easier to manage a project's development lifecycle.

Key Benefits of GitHub Issues:

✅ Task Management: Clearly define and track to-dos, bugs, and features in one place.
🏷️ Organization & Prioritization: Use labels, milestones, and assignees to categorize and schedule work.
🔄 Integration with Code: Link issues to commits, pull requests, and branches for seamless traceability.
💬 Team Collaboration: Enable discussions, feedback, and updates in a centralized, transparent space.

git issues:

milestones:

Milestones in GitHub provide a way to group issues and pull requests under a common goal or deadline—like a version release, sprint, or project phase. They help visualize progress and ensure that work is aligned with broader project objectives.

When you assign an issue to a milestone, it becomes part of that goal’s progress tracker. GitHub automatically updates the milestone’s completion percentage based on the number of open vs. closed issues, giving you a clear view of what's left to do.

Using Milestones to Track Issues:

📅 Define Goals: Create milestones for key project phases (e.g., v1.0, Sprint 3, "MVP").
🔗 Link Issues: Assign related issues or PRs to a milestone to group them together.
📊 Track Progress: View how many issues are completed vs. remaining at a glance.
🚦 Plan & Prioritize: Use milestones to focus your team’s efforts and stay on schedule.

milestones:

github projects:

GitHub Projects offer a Kanban-style board or spreadsheet-like interface to organize and manage work across issues, pull requests, and milestones. They help teams break down goals into manageable tasks while keeping everything tied to the source code.

When you close an issue that's part of a milestone and also added to a GitHub Project, it updates both progress bars—providing real-time insight into how close you are to completing a milestone or project.

Why Closing Issues Helps Track Progress:

✅ Progress Automation: Closed issues auto-update milestone and project progress bars.
📦 Milestone Tracking: Each closed issue brings the milestone closer to completion, making it a visible indicator of progress.
📋 Project Board Sync: Items in GitHub Projects can auto-move (e.g., from “In Progress” to “Done”) when the issue is closed.
📈 Reporting & Focus: Helps managers and contributors instantly see what's done, what’s pending, and if the team is on track.

github projects:

git wikis:

GitHub Wikis are a powerful way to maintain shared, living documentation right alongside your codebase. Designed for collaboration, a wiki is stored in a separate Git repository, meaning it can be cloned, edited, and version-controlled just like your main project—but without cluttering your core code.

Wikis are written in simple Markdown, making them easy to read and write, even for non-developers. Plus, they support LaTeX-style math syntax, which is perfect for technical documentation involving formulas or equations.

Why GitHub Wikis Are Great for Shared Docs:

🧑‍🤝‍🧑 Collaborative & Centralized: Keep team knowledge, guides, and notes in one accessible place.
🧾 Clean, Easy Formatting: Uses Markdown for fast, readable formatting—with support for headings, lists, code blocks, etc.
🧮 Math-Friendly: Supports LaTeX-style expressions for math-heavy projects (e.g., $E=mc^2$ renders beautifully).
🗂️ Standalone Git Repo: Clone or edit the wiki independently using git clone https://github.com/user/repo.wiki.git.

git wikis:

README.md:

README.md is the front door to your project—it's the first thing most people see when they visit your GitHub repository. Written in Markdown, it gives users and collaborators a quick overview of what the project is about, how to use it, and how to contribute.

A good README helps others pick up the project quickly, reducing onboarding time and confusion. It should include all essential information someone needs to get started or decide if they want to contribute.

Why a Good README.md Matters:

🚀 Quick Onboarding: Gives users an instant understanding of the project’s purpose and usage.
🛠️ Setup Instructions: Helps new contributors get the environment up and running without digging through code.
📖 Essential Documentation: Acts as a one-stop reference for key info—features, dependencies, usage examples, etc.
🤝 Collaboration Ready: Should list contribution guidelines, licensing, contact info, and any project-specific conventions.

README.md:

github pages:

GitHub Pages lets you turn your GitHub repository into a fully hosted website, directly from your code—no external server or hosting service needed. It’s perfect for project documentation, personal portfolios, blogs, or landing pages.

You can create GitHub Pages using plain HTML, Markdown, or Jekyll (a static site generator), and they’re served right from your repo—either from a special gh-pages branch or your docs/ folder.

Why GitHub Pages Are Useful:

🌐 Instant Websites: Host project documentation or personal pages with just a few clicks.
🧾 Great for Docs: Turn your README, wiki, or Markdown files into clean, readable websites.
🔧 Customizable: Supports Jekyll themes, custom CSS, and even custom domains.
🚫 No Extra Hosting Needed: Free and seamlessly integrated with your GitHub repo.

github pages:

github pages:

More Extras:
the .HDF5 file format

extras (.hdf5):

HDF5 is a powerful, flexible file format designed to store and organize large, complex datasets. Used widely in science, engineering, and machine learning, it supports high-performance I/O, hierarchical structures, and cross-platform compatibility—making it ideal for big data applications.

Hierarchical Storage 🌲 – Organize data in a file like folders and files (groups and datasets).
Supports Large Data 🗃️ – Handles terabytes of data efficiently.
Self-Describing Format 🧠 – Data is stored with metadata, so it’s easy to understand and parse.
Cross-Platform 🖥️📱 – Works consistently across operating systems and programming languages.
Compression Support 🗜️ – Built-in compression saves disk space.
Partial I/O 📦 – Load just the parts of the dataset you need—great for big files.
Multi-language Support 🧪 – Compatible with Python (via h5py), C/C++, Fortran, Julia, MATLAB, and more.

extras (.hdf5):

This is a strongly recommended, industry-standard way to store your data.

Since you can attach attributes to the data itself, the file documents itself. You can also automate things so that whenever a file is created, it stores metadata such as the machine, number of cores, date/time, compiler version, list of packages used, and the code itself, making your results extremely reproducible. The sky is the limit!

Here's a tutorial to get you started:

Octocat wishes you
Happy git-ing!

Artwork by:
