This page is also hosted on GitHub.io.
Before the session
-
Make a GitHub account, if you don’t have one already
-
Get git, if you haven’t already
-
Windows users, make sure you get "git bash" during the installation
-
Mac users, it may be installed if you have Xcode. If you have homebrew, just
brew install git -
Linux users, you probably already have it, or can install it through your package manager
-
Open a terminal (or git bash, on windows) and enter
gitto make sure it’s installed - you should see some help text.
-
Why
Your code, data, and manuscripts backed up wherever you want it. There is no single source of truth: git is decentralised and every copy can be the master.
Anyone can use the specific version of your project you published on X date, you sent to Y journal, you updated with Z changes.
Every "save" is assigned to your name and email address, even if someone else takes over the project. Your contributions are persisted, and if you need to look up who to ask about a particular change, you can.
Git allows multiple people to work on a project simultaneously, with well-defined merge points. It makes it easy to see what’s changed between each revision.
Keep code, data, and figures in sync with each other, rather than thinking "which version of the code produced this figure?".
What
Anything plain text!
Binary files can be added, but it’s best to keep them small and changing infrequently, and you can’t track changes at the sub-file resolution. For larger binary data, there are git-based/ git-like solutions too (git-lfs, dvc).
Besides git, there are other advantages to writing manuscripts and data in plain text: less lock-in with any particular software vendor, more stable over time, more flexible tooling, and it separates content from presentation.
Concepts
Your current project as it exists on your machine right now, regardless of git.
Files that git cares about versioning. Files are not tracked by default: you need to tell git which files to track. You can set rules to blanket ignore particular directories, filenames, or extensions, so that they are never tracked.
Each git "save point" (commit) is a snapshot of a set of files. These are each stored in full (although as an implementation detail, files which don’t change are compressed down to nothing). Each snapshot knows its parent snaphots (the commits immediately prior). Each snapshot also has an author and a commit message.
Files are not tracked until you add them to the tracked list. Tracked files which are changed will not be committed unless you add them to the index (also referred to as the "staging area"). This allows you to make a lot of changes in one go, but then put changes to different files in different commits.
You can refer to git commits in different ways. Branches are useful for working in parallel. HEAD refers to the commit with no children on your current branch. Tags are useful for giving meaning to a commit (e.g. a released version).
You can push your changes onto a git repository which lives elsewhere (e.g. on another computer, or on GitHub). You can also pull changes in that repository onto your local repository. There is no interaction between different repos except when you tell them to interact, and neither is the "one true version": the concepts of original/ backup etc. are left to the user.
|
Note
|
GitHub is not git. It is just the most ubiquitous one of the many organisations which provide somewhere to host a remote repository, and also provides some tooling around it. You can use git without GitHub, and your choice to use GitHub as a remote backup and/or the permanent source of truth for your project is purely your perspective as a human, not git’s (which treats all repos as equally valid). |
Basic usage
First-time configuration
You need to tell git who you are. These details are purely used to identify you with your contributions, not for authentication.
git config --global user.name "My Name" git config --global user.email "my.email@host.com"
Also, by default you should should have colourful output, but just in case:
git config --global color.ui true
A few tasks in git require external tools.
The most important is a text editor: which one you use will depend on your operating system, what you have installed, and your personal preferences.
vim may be the default, and is very powerful but tricky to start to use.
nano is installed on many unix-like systems and is simpler for basic tasks.
Windows users may want to keep things simple with notepad.
You can set it with git config --global core.editor <my_favourite_editor>
Common commands
-
git --help: Show help messages. Add--helpto any subcommand to see messages specific to that subcommand. -
git init: Make the current working directory a git repository. -
git status: Show which files are changed, tracked, and staged in the working directory. -
git log: List all of the prior commits on this branch, with their authors, timestamps, messages etc. -
git add <file_or_directory>: Add the given file to the index, staging it. If the file was previously untracked, it will be added to the tracking list. -
git commit: Commit the indexed changes. A text editor will be opened for you to type a message about the changes this commit represents. Abort the commit by saving an empty message. -
git checkout <reference>: Change your current working directory to the snapshot saved in the given commit, branch, tag, or HEAD. -
git branch <branch_name>: Make a branch which can exist in parallel with the original branch. Commits made to this branch can be merged back at some point in the future. -
git diff: Look at the differences between some combination of your working directory, the index, a file, or a commit. -
git push <remote_name>: Push your changes from a particular branch onto the given remote. -
git pull <remote_name>: Pull changes from a particular remote into your current branch.
Workflow
Starting a new project
-
Make new directory, and navigate into it.
-
Initialise git:
git init. -
Make some files.
-
Track and stage those files:
git add my_file.txt,git add path/to/directoryetc. -
Commit those files:
git commit, type and save an informative message. -
Make and commit a few more changes. Look at the
git diffbeforegit adding your changes. Look at thegit logafter committing them. -
Copy the long alphanumeric string at the top of a commit, and
git checkout <that_string>, to have a look at the repository at that moment in time (make sure you commit your changes first!);git checkout <your_branch>to go back. The default branch is calledmaster.
Contributing to a project on GitHub
-
Find the GitHub page for that project
-
Click "Fork" in the top right to clone their repository into your GitHub account
-
In your fork, click "Clone or download" and copy the URL in the popup ("Use HTTPS" if you have the option)
-
git clone <that_url>will clone your repository onto your computer, in a directory with the same name as the project -
Make some changes,
git addthem, andgit committhem. -
git pushthose changes up to your remote repository (by default, it will be calledorigin). -
On the GitHub page of your project, request that the original developer pulls your changes into their repository by making a Pull Request. On GitHub, PRs can be reviewed, commented on, and updated before the merge.
Things to remember
-
Large binary files should be kept out of the git repository: consider adding them to the
.gitignore -
Git detects text changes by line: formats like LaTeX, Asciidoc, and (most flavours of) Markdown won’t break a paragraph without a double newline, so put each sentence on a new line. This also plays very nicely with good text editors.
-
Git is a tool. How you use it, which repos you treat as a source of truth, how you handle merges etc. are up to you.
-
Options make simple tasks more ergonomic, e.g.
git commit -m "My message"for short commitmessages,git checkout -b my_new_branchto make and check out a newbranch in one go,git add -uto add allupdated files (i.e. those already being tracked). -
Interacting with a remote repository is much easier using SSH, although it’s more complicated to set up. GitHub’s documentation is excellent.
-
It is possible, and sometimes necessary, to manually merge 2 or 3 revisions, to rewrite git history, or to rebase one chain of commits onto another branch (e.g. if the master branch changes while you’re working on a separate branch). These are more advanced tasks and out of scope for this tutorial.
Links
-
Official git docs: dense but authoritative
-
missing-semester/version-control, an introduction to some git concepts
-
Gitflow, a suggested workflow/ branching model for collaborative work
-
GitHub Hello World, a brief introduction to working with git/GitHub and entry point into GitHub’s documentation
Related topics
-
Sanitising data for optimal version control
-
Document creation with plaintext
-
The ecosystem of text-mangling tools
Appendix A: This document
This document can be compiled to a number of targets with Asciidoctor. To serve it as a static website:
-
Run
asciidoctor -o docs/index.html -
Add, commit, and push the output up to GitHub
-
On the repo page, go to Settings → Options → GitHub Pages → Source → master branch /docs folder
Appendix B: Authors
If you contribute meaningfully to this document, feel free to add yourself to the list of authors below.
-
Chris L. Barnes (clbarnes)