Git and GitHub

From Sustainability Methods
Type: Me, Myself and I | Group Collaboration | The Academic System | Software
Team Size: 1 | 2-10 | 11-30 | 30+

What, Why & When

Fig.1: GitHub Logo
Fig.2: git Logo

Have you ever accidentally overwritten or deleted your teammates’ code in a shared Google Drive folder? Ever wanted to go back to a previous version of your script because your bug fixing just created more problems? Ever struggled to combine different snippets of code that were collected through email or even WhatsApp hours before the deadline?

Fear no more, because this article will introduce you to git and GitHub and help you manage the above issues with ease.

Git is an open-source distributed version control system (DVCS) and source code management (SCM) system that was initiated by Linus Torvalds, the same person who created the Linux operating system kernel. Git manages changes made to a code repository, allowing developers to go back to previous versions, duplicate the code to other environments and merge the work of different developers.

Goals

GitHub is a web-based hosting service that provides a graphical user interface to the underlying functionality of git and extends it with many features that facilitate collaboration between developers. These include user and access management, bug tracking, continuous integration and many more.

A large share of the companies involved in software development use GitHub or a comparable git-based platform. This makes GitHub one of the largest hosts of source code in the world, and working with git a basic skill for every developer, engineer or data scientist.

Getting started

In the following, the basic functionalities of git and GitHub will be described. Hopefully, this helps to streamline your efforts to deliver the best team reports ever submitted in Software for Analyzing Data.

COMPONENTS OF GITHUB

Repositories

A repository (‘repo’) is the central object of git: it contains the actual code files of a project together with each file’s revision history.

Organizations, Teams & People

Users of GitHub are called people on the site. They can be members of a team, which in turn can be part of a larger organization. Teams can also be nested inside each other to reflect the organizational structure of your company. Your membership in a team or organization determines your access to its repositories.

Issues

Issues is the built-in bug tracking function of GitHub. You can mark, discuss and assign bugs and other problems with the code directly on the platform.

Projects

Projects is the built-in project management solution of GitHub. You can use management frameworks such as Kanban to assign tasks to team members and review their completion.

GITHUB FUNCTIONS & WORKFLOWS

Since working with GitHub is usually a team effort, every team develops its own style or model of using it over time. Different teams will make different agreements on how to use the functionalities of git in order to best meet their needs. Some might use a shared repository model where you branch & merge your development efforts, while others might prefer a Fork & Pull model.

In the following, the different functions of git will be described along a generic workflow that could serve as a starting point to develop a more refined process.

Fig.3: A basic GitHub workflow

Branch and Clone

When you are working on a project and want to implement new ideas, you should create a branch. The branch starts as a duplicate of the main project at the time of its creation and enables you to work on your ideas inside a safe “sandbox” environment without affecting the main branch. Using separate branches enables a team to work simultaneously on the same project. You can then clone the repository to your local system and switch to your branch to make changes to its files.

An alternative to branching is forking the repository. A branch lives within the same repository and can later be merged into the main branch, whereas a fork is an independent copy of the entire repository whose changes can be proposed back to the original by way of a pull request. Which model is applied depends on your style and your organization’s agreement. In this article, we assume a shared repository model with branching as the main method.
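On the command line, cloning and branching could look like the following minimal sketch. Since no real GitHub repository is assumed here, a throwaway local repository stands in for the server; the folder names, the file analysis.R and the branch name feature/plots are hypothetical. With a real project you would pass the repository URL shown on GitHub to git clone instead of a local path.

```shell
# Create a throwaway "server" repository so the example runs without GitHub.
demo=$(mktemp -d)
git init -q "$demo/project"
cd "$demo/project"
git symbolic-ref HEAD refs/heads/main     # make sure the first branch is called "main"
git config user.name "Demo User"          # placeholder identity for this demo only
git config user.email "demo@example.com"
echo 'print("hello")' > analysis.R
git add analysis.R
git commit -q -m "Add first analysis script"

# Clone the repository into a second folder, as you would from GitHub.
git clone -q "$demo/project" "$demo/my-copy"
cd "$demo/my-copy"

# Create a branch for the new idea and switch to it.
git checkout -q -b feature/plots
git branch     # lists the local branches, the active one is marked with *
```

Everything you now change inside my-copy happens on feature/plots, while main stays untouched.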

Commit

After completing a small increment or milestone in your development, a commit saves this state as a revertible snapshot in the project’s history. Every feature or change should be committed individually in order to be able to retrace the development process and revert if necessary. The changes made should be described in the commit message to make it easier to follow along.
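On the command line, a commit is prepared in two steps: git add stages the changes and git commit stores them as a snapshot. The sketch below uses a throwaway repository; the file name, the commit messages and the demo identity are illustrative assumptions. It also shows one way to undo a change, namely git revert, which adds a new commit that reverses an earlier one.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity for this demo only
git config user.email "demo@example.com"

# First increment: add a script and commit it.
echo 'x <- 1' > model.R
git add model.R
git commit -q -m "Add model script"

# Second increment: change the script and commit the change separately.
echo 'y <- 2' >> model.R
git status --short     # shows model.R as modified
git add model.R
git commit -q -m "Add second variable to model"

# The history now contains one entry per increment.
git log --oneline

# Undo the last change by creating a new commit that reverses it.
git revert --no-edit HEAD
```

Because each change was committed individually, the second increment can be reverted without touching the first one.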

Pull request

Pull requests are designed to start a discussion and encourage feedback on your code. A pull request lets you compare two branches of a repository and initiates a discussion about the changed features. If you were developing on a forked project, the pull request will notify the maintainers of the original repository so they can review and consider your changes.
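The comparison that a pull request displays on GitHub can be approximated locally. The following sketch, again in a throwaway repository with hypothetical file and branch names, lists the commits and the line changes that a branch would contribute to main. Opening the pull request itself happens on GitHub and is not shown here.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # make sure the first branch is called "main"
git config user.name "Demo User"          # placeholder identity for this demo only
git config user.email "demo@example.com"
echo 'Results' > report.md
git add report.md
git commit -q -m "Start report"

# Do some work on a separate branch.
git checkout -q -b feature/summary
echo 'Summary of findings' >> report.md
git commit -q -a -m "Add summary section"

# Commits that are on the feature branch but not on main.
git log --oneline main..feature/summary

# Line-by-line changes the branch introduces since it split off from main.
git diff main...feature/summary
```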

Discussion, issues and review

Following the Pull Request, other collaborators can discuss and review the changes with you or even add follow-up commits before you merge your changes into the main branch.

Deploy

To test your changes under realistic conditions, the branch is often deployed to a test environment before it is merged into the main branch. This is not always the case and depends on the specific setup of your team.

Push and Merge

Commits initially exist only on your local machine. By pushing your branch to the server, you publish it and make the committed changes available to your collaborators; this has to happen before a pull request can be opened for the branch. When everything is in order, the changes can be merged into the main branch, bringing your work together with other changes that were made on different branches in the meantime.
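Pushing and merging can be sketched on the command line as follows. A local bare repository stands in for the GitHub server here, and all folder, file and branch names are hypothetical. On GitHub itself, the merge is typically triggered from the pull request page rather than typed locally, so the last three commands are only one possible route.

```shell
demo=$(mktemp -d)
git init -q --bare "$demo/server.git"     # stands in for the GitHub server

git init -q "$demo/work"
cd "$demo/work"
git symbolic-ref HEAD refs/heads/main     # make sure the first branch is called "main"
git config user.name "Demo User"          # placeholder identity for this demo only
git config user.email "demo@example.com"
git remote add origin "$demo/server.git"
echo 'Results' > report.md
git add report.md
git commit -q -m "Start report"
git push -q -u origin main

# Work on a branch and publish it to the server.
git checkout -q -b feature/summary
echo 'Summary of findings' >> report.md
git commit -q -a -m "Add summary section"
git push -q -u origin feature/summary

# Merge the branch into main and publish the result.
git checkout -q main
git merge -q feature/summary
git push -q origin main
```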

Fig.4: A more refined GitHub workflow

INSTALLATION GUIDE

Basic: GitHub desktop

The most important functions of git can be accessed through the graphical user interface of GitHub Desktop without having to worry about the command line. Installing GitHub Desktop is the recommended way to get started if you are new to GitHub. Just download and run the installer from the link below. It comes packaged with all necessary dependencies.

https://desktop.github.com

Command Line

The command line interface is more advanced, but also lightweight and fast to use. Depending on your operating system, git can be installed through Homebrew or with the installer from git-scm.

https://github.com/git-guides/install-git

Once installed, the git functions can be accessed by typing commands in the command line:

git clone  - Duplicate a repository to your local machine
git add    - Add changed and new files to your next commit
git commit - Commit staged changes as a snapshot of the current state
git status - View which files have been changed and staged
git branch - List the branches or create a new branch of the repository
git merge  - Merge another branch into the active branch
git pull   - Pull changes from the server
git push   - Push committed changes to the server
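How these commands interact between two collaborators can be sketched as follows. The collaborators "alice" and "bob", the file notes.txt and the local bare repository that stands in for the GitHub server are assumptions made for this illustration only.

```shell
demo=$(mktemp -d)
git init -q --bare "$demo/server.git"     # stands in for the GitHub server

# Collaborator A creates the project and pushes it.
git init -q "$demo/alice"
cd "$demo/alice"
git symbolic-ref HEAD refs/heads/main     # make sure the first branch is called "main"
git config user.name "Alice"              # placeholder identity for this demo only
git config user.email "alice@example.com"
git remote add origin "$demo/server.git"
echo 'v1' > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git push -q -u origin main

# Collaborator B clones the project from the server.
git clone -q -b main "$demo/server.git" "$demo/bob"

# A publishes another change.
echo 'v2' >> notes.txt
git commit -q -a -m "Extend notes"
git push -q origin main

# B pulls the change into the local copy.
cd "$demo/bob"
git pull -q --ff-only origin main
git status     # reports a clean working tree
```

The --ff-only option is a cautious choice: the pull only succeeds if B has no diverging local commits, so nothing is merged silently.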

IDE integrations

Fig.5: Git integration in JetBrains PyCharm

Most popular Integrated Development Environments (IDEs) such as the JetBrains products (IntelliJ, PyCharm, …), VSCode, DBeaver or Eclipse, and even collaboration tools such as Slack, offer a direct integration with GitHub.

This can be very convenient, since it allows you to commit and push changes directly from the development interface without interrupting your workflow. At the same time, updates can be pulled just as quickly. In addition, the integration usually gives some visual cues about the status of your local branch compared to the remote repository, which can be very helpful.

If one is available for your IDE, I would recommend installing the GitHub extension for it.

Links & Further reading


The author of this entry is Moritz Kath.