Version Control

What is version control?

A version-control system (VCS) is a program (or set of programs) that tracks changes to a collection of files. One goal is to easily recall earlier versions of individual files or the entire project. Another is to allow several team members to work on a project, even on the same files, at the same time without impacting each other.

Another name for version-control systems is software configuration management (SCM) systems. The two terms are often used interchangeably; in fact, Git's official documentation is located at git-scm.com. Technically, version control is just one of the practices involved in SCM, and a VCS can be used for projects other than software, including books and online tutorials.

With a version-control system, you can:

See all the changes made to your project, when the changes were made, and who made them.
Include a message with every change explaining the reasoning behind it.
Retrieve past versions of the entire project or individual files.
Create branches, where changes can be made experimentally. This feature allows several different sets of changes (for example, features or bug fixes) to be worked on at the same time, possibly by different people, without impacting the master branch. Later, you can merge the changes you want to keep back into master.
Attach a tag to a version — for example, to mark a new release.

Git is a fast, versatile, highly scalable, free, open-source version-control system. It was originally written by Linus Torvalds, the creator of Linux.

Distributed version control

Earlier VCSs, such as CVS, Subversion (SVN), and Perforce, used a centralized server to store a project's history. This centralization meant that one server was also a single point of failure.

Git is distributed, which means that a project's complete history is stored on the client as well as the server. You can edit files without a network connection, check them in locally, and sync with the server when a connection becomes available. If a server goes down, you still have a local copy of the project. Technically, you don't even have to have a server. Changes could be passed around in e-mail or shared using removable media, although most teams sync through a shared server in practice.
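
The offline workflow described above can be pictured with a small toy model. This is only a sketch and not how Git is implemented: the ToyRepo class and its clone, commit, and push methods are hypothetical names invented for illustration, and history is reduced to a plain list of commit messages.

```python
from typing import List, Optional


class ToyRepo:
    """Toy stand-in for a repository: history is just a list of messages."""

    def __init__(self, history: Optional[List[str]] = None) -> None:
        self.history: List[str] = list(history or [])

    def clone(self) -> "ToyRepo":
        # A clone receives the complete history, not just the latest version.
        return ToyRepo(self.history)

    def commit(self, message: str) -> None:
        # Committing touches only the local copy; no network is involved.
        self.history.append(message)

    def push(self, remote: "ToyRepo") -> None:
        # Simplification: only allow a push when the remote's history is a
        # prefix of ours, so nothing on the remote would be lost.
        shared = len(remote.history)
        if self.history[:shared] != remote.history:
            raise ValueError("remote has diverged; sync with it first")
        remote.history.extend(self.history[shared:])


server = ToyRepo(["Initial commit"])
laptop = server.clone()

# Work offline: both commits exist only on the laptop for now.
laptop.commit("Fix typo")
laptop.commit("Add chapter 2")
print(server.history)  # ['Initial commit']

# Sync when a connection becomes available.
laptop.push(server)
print(server.history)  # ['Initial commit', 'Fix typo', 'Add chapter 2']
```

Because the laptop holds the full history, it could also recreate the server's contents if the server were lost.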


Git terminology

To understand Git, you have to understand the terminology. Here is a short list of terms that Git users frequently use. Don't sweat the details for now; all of these terms will become familiar as you work your way through the exercises in this module.

Working tree: The set of nested directories and files that contain the project being worked on.
Repository (repo): The directory, located at the top level of a working tree, where Git keeps all of the history and metadata for a project. Repositories are almost always referred to as repos. A bare repository is one that is not part of a working tree; it is used for sharing or backup. A bare repo is usually a directory with a name ending in .git — for example, project.git.
Hash: A number produced by a hash function that represents the contents of a file or other object as a fixed number of digits. Git uses hashes that are 160 bits long. One advantage to using hashes is that Git can tell whether a file has changed by hashing its contents and comparing the result to the previous hash. If the file's time-and-date stamp has changed but its contents have not, the hash is the same, so Git knows the file is unchanged.
Object: A Git repo contains four types of "objects," each uniquely identified by an SHA-1 hash. A blob object contains the contents of an ordinary file. A tree object represents a directory; it contains names, hashes, and permissions. A commit object represents a specific version of the working tree. A tag object attaches a name and a message to another object, usually a commit.
Commit: When used as a verb, "commit" means to make a commit object. This action takes its name from commits to a database. It means you are committing the changes you have made so others can eventually see them, too.
Branch: A named series of linked commits. The most recent commit on a branch is called the head. The default branch, which is created when you initialize a repository, is called "master" in this module; newer Git configurations and some hosting services name it "main" instead. The head of the current branch is called HEAD. Branches are an incredibly useful feature of Git because they allow developers to work independently (or together) in branches and later merge their changes into master.
Remote: A named reference to another Git repository. When you clone a repo, Git creates a remote named "origin" that points to the repo you cloned from and is the default remote for push and pull operations.
Commands, subcommands, and options: Git operations are performed using commands such as git push and git pull. git is the command, and push or pull is the subcommand. The subcommand specifies the operation you want Git to perform. Commands are frequently accompanied by options, which begin with a hyphen (-) or a double hyphen (--), as in git reset --hard.
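
To make the Hash entry above more concrete, here is a minimal Python sketch of how the hash of a blob can be computed for a repo that uses SHA-1. Git hashes a short header ("blob", the content length in bytes, and a null byte) followed by the file's contents. The function names git_blob_hash and has_changed are made up for this illustration and are not part of Git.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 hash Git would assign to a blob with this content."""
    # Git prepends the header "blob <size>\0" before hashing the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def has_changed(previous_hash: str, content: bytes) -> bool:
    """Compare the content's hash to a stored hash (illustrative helper)."""
    # Timestamps play no part here: only the bytes of the content matter.
    return git_blob_hash(content) != previous_hash


original = git_blob_hash(b"hello world\n")
print(original)           # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(len(original) * 4)  # 160 (40 hexadecimal digits, 4 bits each)

print(has_changed(original, b"hello world\n"))   # False: same contents
print(has_changed(original, b"hello, world\n"))  # True: contents differ
```

For a file that contains the same bytes, running git hash-object on it should print the same value.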

These terms and others such as "push" and "pull" will make more sense shortly. But you have to start somewhere, and you might find it helpful to come back and review this glossary after you have completed this module.
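
If it helps to see the Commit and Branch entries as a data structure, the following toy model shows commits linked through their parents, branches as names that point at a head commit, and HEAD as the name of the current branch. It is a simplified sketch, not Git's actual storage format: the Commit and ToyRepo classes are hypothetical, and the toy hash covers only the message and the parent, whereas real commit objects record more.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Commit:
    message: str
    parent: Optional[str]  # hash of the previous commit; None for the first

    @property
    def commit_id(self) -> str:
        # Toy hash over parent and message only (an assumption of this sketch).
        data = f"{self.parent}\n{self.message}".encode("utf-8")
        return hashlib.sha1(data).hexdigest()


class ToyRepo:
    def __init__(self) -> None:
        self.commits: Dict[str, Commit] = {}
        # A branch is a name that points at its most recent commit (its head).
        self.branches: Dict[str, Optional[str]] = {"master": None}
        self.head = "master"  # HEAD: the name of the current branch

    def commit(self, message: str) -> str:
        new = Commit(message, self.branches[self.head])
        self.commits[new.commit_id] = new
        self.branches[self.head] = new.commit_id  # advance the branch head
        return new.commit_id

    def branch(self, name: str) -> None:
        # A new branch starts at the current branch's head commit.
        self.branches[name] = self.branches[self.head]

    def switch(self, name: str) -> None:
        if name not in self.branches:
            raise KeyError(f"no such branch: {name}")
        self.head = name

    def log(self, name: str) -> List[str]:
        # Walk from the branch head back through the parent links.
        messages: List[str] = []
        current = self.branches[name]
        while current is not None:
            commit = self.commits[current]
            messages.append(commit.message)
            current = commit.parent
        return messages


repo = ToyRepo()
repo.commit("Add README")
repo.branch("feature")
repo.switch("feature")
repo.commit("Start feature")

print(repo.log("master"))   # ['Add README']
print(repo.log("feature"))  # ['Start feature', 'Add README']
```

Committing on the feature branch leaves master untouched, which is why experimental work on a branch does not affect it until you merge.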

The Git command line

There are several different GUIs available for Git, such as GitHub Desktop. Many program editors such as Microsoft's Visual Studio Code also have an interface to Git. They all work differently and have different limitations. None of them implement all of Git's functionality.

The exercises in this module use the Git command line — specifically, Git commands executed in Azure's Cloud Shell. However, Git's command-line interface works the same no matter what operating system you're using. Plus, the command line lets you tap into all of Git's functionality. Developers who see Git only through a GUI sometimes find themselves confronted with error messages they can't resolve, and have to resort to the command line to get going again.

The next step is to try out Git for yourself!

Up next: Exercise - Try out Git