Git Pull vs Git Clone (software version tracking)

A forum for discussion on the software for the WMT River Control System
hamishmb
Posts: 1891
Joined: 16/05/2017, 16:41

Git Pull vs Git Clone (software version tracking)

Post by hamishmb »

Terry,

After our conversation this morning about git, I thought I'd have a look as well to double-check :)

I thought you were right, but from what I've seen:

git clone pulls down a whole repository into a local folder, and git pull will also do that provided you run git init first. At this stage you can use either.

However, if you want to update your local copy of the repository, say before you make some changes, you need to use git pull to pull the new changes down from the server. git clone won't do this for you.

So, maybe we should just use git pull, because it's simpler. I have to say I'm confused now, but essentially always using git pull is what I've done for about 2 years, nothing's ever shown up as being weird, and other repositories tend to tell me to follow one of the two methods above when, e.g., downloading something to compile.

Hamish
TerryJC
Posts: 2616
Joined: 16/05/2017, 17:17

Re: Git Pull vs Git Clone (software version tracking)

Post by TerryJC »

Hamish,

If you are the only individual pulling copies of the code from the Repository you can do what you like because you will always know what is going on. Even in small teams like ours, it really makes little difference because you (hopefully) talk to each other. I believe that problems will arise as teams get bigger and also as Project timescales stretch out because the record will be littered with Pulls which are not followed by commits, which may well confuse later developers.

Here is what Ralph from the Dorset LUG said when I asked the List about it:
> Is there a 'proper' way to grab a copy of a program from Github without making a Pull Request?

A Pull Request is not you wanting to pull, it's a request to the other party to pull from you, i.e. you've commits they haven't got.

> Previously, I've downloaded the whole repository and just grabbed the code I wanted

Yes, do that. git-clone(1) their repo. You can use --depth=1 if their repo is huge and you want a shallow repository that lacks history; less disk space, faster transfer.

> but I always believed that you only 'checked-out' a file if you intended to change it.

It's not that precisely defined, and many other systems use it to mean just getting a copy. For example, co(1), which stands for `check out' and is part of RCS, does a check-out either with an exclusive lock (-l) or unlocked (-u).

https://help.github.com/articles/cloning-a-repository/ may be useful. You can clone someone else's repo as well as your own.
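To make Ralph's --depth=1 suggestion concrete, here is a minimal sketch comparing a full clone with a shallow one. It assumes a POSIX shell with Git installed, and uses a throwaway local repository called "upstream" (a made-up name) in place of someone's GitHub repo, so it runs offline.

```shell
#!/bin/sh
# Sketch: full clone vs shallow clone (--depth=1).
# "upstream" is a hypothetical local repository standing in for "their repo".
set -e
DEMO=$(mktemp -d)
# Throwaway identity so the demo commits work without any git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Build an "upstream" repository with three commits of history.
git init -q "$DEMO/upstream"
for n in 1 2 3; do
  echo "release $n" > "$DEMO/upstream/VERSION"
  git -C "$DEMO/upstream" add VERSION
  git -C "$DEMO/upstream" commit -q -m "Release $n"
done

# Full clone: every commit in the history is copied.
git clone -q "$DEMO/upstream" "$DEMO/full"

# Shallow clone: only the latest commit. The file:// URL matters, because
# Git ignores --depth when cloning from a plain local path.
git clone -q --depth=1 "file://$DEMO/upstream" "$DEMO/shallow"

git -C "$DEMO/full" rev-list --count HEAD      # prints: 3
git -C "$DEMO/shallow" rev-list --count HEAD   # prints: 1
cat "$DEMO/shallow/VERSION"                    # prints: release 3
```

With a real project you would give git clone the repository's URL from GitHub instead of a local path; the files you get are the same, only the history is trimmed.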
So, in the short term and for small teams, I guess it doesn't matter, but it may be a good idea to get into good habits for the day that you work for Google :D
Terry

Re: Git Pull vs Git Clone (software version tracking)

Post by hamishmb »

Okay, so we'll use clone when downloading for the first time, or if we don't intend to make changes, then pull to update our local copy of the repo :)
I still wonder if we're talking at cross purposes here...

Hamish

Re: Git Pull vs Git Clone (software version tracking)

Post by TerryJC »

hamishmb wrote: Okay, so we'll use clone when downloading for the first time, or if we don't intend to make changes, then pull to update our local copy of the repo :)
I still wonder if we're talking at cross purposes here...
Partly maybe. Let me explain how we worked when I developed software for a living. We didn't use Git, partly because it hadn't been invented and mostly because all our historical code resided in Microsoft SourceSafe databases. However, goals and issues were the same.

A developer who wanted to add a new piece of code that he had just written to the database would 'check it in'. That is a 'commit' in Git terminology. If, subsequently, he or another developer wanted to change that code, then he would 'check it out' (Pull, in Git-speak). He would make his changes and then 'check in' the result, i.e. another commit in Git. This creates a record from which the history of the piece of code can be traced, e.g. who did what, when and why.

Sticking to Git parlance from now on, there is a second class of user who might have Pulled code from the database: the engineer who had been tasked to test the code. However, this would not normally happen if he didn't expect to make any changes to the code. So instead, he would 'get' a read-only copy (Clone), carry out his tests and inform the author if he found any problems. He might then add a Comment to the version in the database to say what he had done.

There was also a third class of user: the peer reviewer. He would pretty much follow the same process as the tester, so would normally not Pull anything either.

Finally, there was the guy (or gal) who wanted to re-use the code in another project, or maybe just wanted a quick look at what had been done. They would definitely not Pull a copy, because if they did, they would leave a trail in the history that would go nowhere. These are the kinds of situations I was referring to when I tried to make the distinction between when to Pull and when to Clone.

I think that the biggest single factor that makes users of Git prefer to Pull code, rather than Clone it, is the fact that a Clone normally copies the whole tree, whereas in SourceSafe a 'Get' could be performed on a single file. I suspect that the reason for this lies in the thought processes of the developer of Git (Linus Torvalds), who at the time was only interested in providing a tool for developers of the Linux kernel, where not having the whole tree would be bound to cause problems for people trying to test things offline, so to speak. Since our codebase is likely to be very shallow, that Clone behaviour gets in the way a bit.

I hope that helps you understand why I find the concept of Pulling a file just to test it a bit alien and dangerous.
Terry

Re: Git Pull vs Git Clone (software version tracking)

Post by hamishmb »

Thanks Terry, that makes much more sense. Eww Microsoft :P

I see. Note that if you want to update your copy of a repository after cloning (which you'd only do if you were developing it, I suppose), you can use pull for that. So, in short, users of the code should probably clone, and developers (who are going to push changes back) should pull.

I was confused because I thought you were talking about cloning a repository to work on it and push changes back! I didn't realise you were talking about just downloading it to test the software, and in that case I agree that clone is better. I think we're on the same page now, but do note that you need to use push to actually update the online repository after committing :)
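As a rough illustration of that last point, the sketch below shows that a commit only changes your local copy until you push. It assumes a POSIX shell with Git available; "server.git" is a hypothetical local bare repository standing in for the online (GitHub) repository, and the demo identity is made up.

```shell
#!/bin/sh
# Sketch: a commit is local; only "git push" updates the shared repository.
# "server.git" is a hypothetical local bare repository standing in for GitHub.
set -e
DEMO=$(mktemp -d)
# Throwaway identity so the demo commits work without any git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Create the "server" and give it one commit.
git init -q --bare "$DEMO/server.git"
git init -q "$DEMO/seed"
echo "v1" > "$DEMO/seed/notes.txt"
git -C "$DEMO/seed" add notes.txt
git -C "$DEMO/seed" commit -q -m "Initial commit"
git -C "$DEMO/seed" push -q "$DEMO/server.git" HEAD

# A developer clones it, edits a file and commits.
git clone -q "$DEMO/server.git" "$DEMO/dev"
echo "v2" > "$DEMO/dev/notes.txt"
git -C "$DEMO/dev" commit -q -am "Update notes"

# The new commit exists only in the developer's copy so far.
git -C "$DEMO/server.git" rev-list --count HEAD   # prints: 1

# Pushing sends the new commit to the server.
git -C "$DEMO/dev" push -q
git -C "$DEMO/server.git" rev-list --count HEAD   # prints: 2
```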

Hamish

Re: Git Pull vs Git Clone (software version tracking)

Post by TerryJC »

hamishmb wrote: I think we're on the same page now, but do note that you need to use push to actually update the online repository after committing
OK. I wasn't aware of that, but I am aware that Git is far more functional than SourceSafe ever was, and allows many developers to Pull the same code and perform an intelligent merge after the changes. We could only ever Check Out a file one at a time, and merging was very difficult if someone had jumped the gun, made changes to another copy and then tried to do a swift Check Out / Check In to save time.

Also dependencies could cause problems. Two users could check out two different files that were dependent on each other in some way and then break everything when they both checked in their changes. I gather that Git has tools to help with this; Linus is a very clever bloke. 8-)
Terry

Re: Git Pull vs Git Clone (software version tracking)

Post by hamishmb »

SourceSafe sounds like a nightmare then :P Although it was made by Microsoft, so it probably doesn't come as too much of a surprise XD
Hamish

Re: Git Pull vs Git Clone (software version tracking)

Post by hamishmb »

Note in case there's still confusion here:

Terry, you were confusing a pull request (merging someone else's commits into your branch/merging branches like Patrick and I are doing) with running "git pull", which is used to update your local copy of a remote repository (and does almost exactly the same thing as "git clone"). This is a good resource: https://git-scm.com/book/en/v2

I imagine you've since realised, but I thought I might as well leave this here as a note, just in case.
Hamish
PatrickW
Posts: 146
Joined: 25/11/2019, 13:34

Re: Git Pull vs Git Clone (software version tracking)

Post by PatrickW »

I also recommend the Pro Git book that Hamish has linked to, although I do find the Web version a bit awkward to navigate, so I'll link to some relevant sections.

I'd like to clarify some points that I feel are not fully explained in this thread. As Hamish said, Terry might well already have figured all of this out, but perhaps it can help someone else.

No Git equivalent to SourceSafe's "check out"

Git doesn't really have an equivalent to checking something out from a system like SourceSafe or Subversion. Files aren't locked or controlled exclusively, and people don't register an interest in particular files or an intention to work on them. Instead, changes are integrated together after the fact, through merging and rebasing.

Running "git pull" doesn't create a record that you have checked out a copy of the repository or that you intend to work on it. (OK, so it might technically show up in a log file somewhere, but it won't appear anywhere where other users of the repository are expected to look for it.)

So, if you don't intend to modify the code, it's perfectly safe and expected to run "git pull" to update yourself to the latest copy of the code from the remote repository. You'd typically run "git clone" the first time, to set up your local repository, and "git pull" on each subsequent occasion.
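A minimal sketch of that clone-once, pull-afterwards routine, assuming a POSIX shell with Git available. A local bare repository ("server.git", a hypothetical name) stands in for the remote on GitHub, and a second working copy ("seed") plays the part of another developer pushing an update.

```shell
#!/bin/sh
# Sketch: "git clone" once, then "git pull" to pick up later changes.
# A local bare repository stands in for the server; names are hypothetical.
set -e
DEMO=$(mktemp -d)
# Throwaway identity so the demo commits work without any git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Set up the "server" with one commit on it.
git init -q --bare "$DEMO/server.git"
git init -q "$DEMO/seed"
echo "version 1" > "$DEMO/seed/README"
git -C "$DEMO/seed" add README
git -C "$DEMO/seed" commit -q -m "First version"
git -C "$DEMO/seed" push -q "$DEMO/server.git" HEAD

# First time: clone the whole repository into a new folder.
git clone -q "$DEMO/server.git" "$DEMO/mycopy"

# Meanwhile, someone else pushes a new commit to the server.
echo "version 2" > "$DEMO/seed/README"
git -C "$DEMO/seed" commit -q -am "Second version"
git -C "$DEMO/seed" push -q "$DEMO/server.git" HEAD

# Every time after that: pull to update the existing copy.
git -C "$DEMO/mycopy" pull -q
cat "$DEMO/mycopy/README"    # prints: version 2
```

Note that the pull leaves no trace on the server; only the local copy in "mycopy" changes.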

If you are doing development work, then it's the same story, although, if you've committed changes locally, then "git pull" will merge your local commits with any updates from the remote. This merge is entirely local and nobody knows about it until you decide to push the results of it back to the remote. If you just want to see the status of the remote without merging anything, then you can use "git fetch", which is like "git pull" except it doesn't do a merge.
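The difference between fetch and pull can be seen in a short sketch, assuming a POSIX shell with Git, a hypothetical local bare repository in place of the real remote, and made-up directory names. After the fetch, the new commit is visible through the upstream branch but your files are untouched; merging it in is the extra step that "git pull" performs by default (it can be configured to rebase instead).

```shell
#!/bin/sh
# Sketch: "git fetch" downloads new commits without touching your files;
# by default "git pull" is roughly "git fetch" followed by a merge.
set -e
DEMO=$(mktemp -d)
# Throwaway identity so the demo commits work without any git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical "server" with one commit, cloned into "mine".
git init -q --bare "$DEMO/server.git"
git init -q "$DEMO/seed"
echo "version 1" > "$DEMO/seed/README"
git -C "$DEMO/seed" add README
git -C "$DEMO/seed" commit -q -m "First version"
git -C "$DEMO/seed" push -q "$DEMO/server.git" HEAD
git clone -q "$DEMO/server.git" "$DEMO/mine"

# Someone else pushes an update to the server.
echo "version 2" > "$DEMO/seed/README"
git -C "$DEMO/seed" commit -q -am "Second version"
git -C "$DEMO/seed" push -q "$DEMO/server.git" HEAD

# Fetch: the new commit arrives in the remote-tracking branch only.
git -C "$DEMO/mine" fetch -q
git -C "$DEMO/mine" rev-list --count 'HEAD..@{upstream}'   # prints: 1
cat "$DEMO/mine/README"                                    # prints: version 1

# Merging the upstream branch is the step "git pull" would have done too.
git -C "$DEMO/mine" merge -q '@{upstream}'
cat "$DEMO/mine/README"                                    # prints: version 2
```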

Git does have a "checkout" command, but the meaning of "checkout" in Git is to populate your working directory with the files from a specific commit, which is an entirely local operation. "git checkout" is commonly used to switch to a different branch.
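A small sketch of "git checkout" switching branches, assuming a POSIX shell with Git; the repository and the "experiment" branch name are made up. Nothing here talks to a remote, so nobody else can see that the checkout happened.

```shell
#!/bin/sh
# Sketch: "git checkout" is a purely local operation that swaps the files in
# the working directory to match another branch or commit.
set -e
DEMO=$(mktemp -d)
# Throwaway identity so the demo commits work without any git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$DEMO/repo"
echo "stable" > "$DEMO/repo/app.txt"
git -C "$DEMO/repo" add app.txt
git -C "$DEMO/repo" commit -q -m "Stable version"

# Create a new branch called "experiment" (a made-up name) and switch to it.
git -C "$DEMO/repo" checkout -q -b experiment
echo "experimental" > "$DEMO/repo/app.txt"
git -C "$DEMO/repo" commit -q -am "Try something"

# Switch back to the previous branch; the file reverts to that branch's version.
git -C "$DEMO/repo" checkout -q -
cat "$DEMO/repo/app.txt"    # prints: stable
```

Git 2.23 and later also provide "git switch" for the branch-switching part of what checkout does.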

Chapter 1.1 of the Pro Git book briefly touches upon some of the conceptual differences between Git and other version control systems.

Pull requests

Notionally, you can think of a pull request as you asking somebody else to run "git pull" to pull in work that you have done. Making a pull request instead of pushing your work allows the other person to control how your work is merged or integrated, and the pull request may be intended to initiate a collaborative discussion about the integration. Pull requests enable projects to accept contributions, through Git, from people who don't have write access to the project's repository.

Pull requests are a collaboration strategy and not a core feature of Git. There isn't a universal standard for pull requests. Git itself provides a way to produce an email-based pull request, while GitHub has a web-based pull request feature, but those are only two possible options that projects can choose to use or not use. Not all projects make use of pull requests.

Chapter 5 of the Pro Git book explains some of the different workflows that Git can be used within, with pull requests discussed in chapter 5.2.