Wireshark mailing list archives
Re: Wireshark Git Mirror Maintenance
From: Evan Huus <eapache () gmail com>
Date: Sun, 3 Aug 2014 18:34:11 -0400
On Sun, Aug 3, 2014 at 6:20 PM, Gerald Combs <gerald () wireshark org> wrote:
> On 8/3/14, 11:34 AM, Evan Huus wrote:
>> On Mon, May 13, 2013 at 7:54 PM, Gerald Combs <gerald () wireshark org> wrote:
>>> On 5/10/13 1:47 PM, Evan Huus wrote:
>>>> Hi Gerald
>>>>
>>>> I just cloned the Wireshark git mirror onto a new machine and was surprised at how large it was to download. Running an aggressive git gc on the finished clone reduced the disk usage on my machine from ~500MB to ~150MB.
>>>>
>>>> I'm a bit surprised: git is supposed to automatically garbage collect repositories when they get too cluttered, but perhaps its threshold for automatic gc is just very high.
>>>>
>>>> I pinged Balint (CCed) about this and he suggested running gc on a weekly basis and gc --aggressive on a monthly basis on the server. It would probably save a non-trivial amount of bandwidth in the long term as more people clone the repository.
>>>
>>> It might be due to our particular circumstances (a bare repository only updated via the mirror script), but git's automatic garbage collection doesn't seem to happen very often. The mirror script runs "git gc --auto" each time it synchronizes, which keeps it from filling up the disk (which happened early on), but as you point out there is room for improvement.
>>>
>>> I added a cron job that runs "git gc --aggressive" each week. Here is the output from a manual run, which includes "git count-objects -v" before and after:
>>>
>>> 2013-05-13 14:38:12: Started.
>>> 2013-05-13 14:38:12: Synchronizing repository wireshark
>>> 2013-05-13 14:38:12: Object count start
>>> count: 0
>>> size: 0
>>> in-pack: 316591
>>> packs: 45
>>> size-pack: 567146
>>> prune-packable: 0
>>> garbage: 0
>>> 2013-05-13 14:38:12: Collecting garbage
>>> 2013-05-13 15:09:56: Object count start
>>> count: 0
>>> size: 0
>>> in-pack: 316596
>>> packs: 2
>>> size-pack: 127499
>>> prune-packable: 0
>>> garbage: 0
>>> 2013-05-13 15:09:56: Done
>>
>> So it's been over a year since this conversation and we have actually migrated to Git/Gerrit, so I have no idea what Gerrit is doing in this regard (is there even a "real" git repository backing it, or is it all internal magic?), but I recently came across [1], which suggests that repeated use of --aggressive maybe wasn't such a good idea after all. It suggests just sticking to regular `git gc` except in cases of large one-time imports (like we did on migration), at which point you should run the apparently very slow `git repack -a -d --depth=250 --window=250`.
>>
>> FWIW, a fresh clone from Gerrit right now is 213MB; my local repo is only 161MB, and my current desktop is actually not beefy enough to run the recommended repack command, so I have no idea what improvement that would give.
>
> It's a "real" git repository, but any operations performed by Gerrit are done using JGit. The weekly automatic number update script runs `gerrit gc --all`, which uses JGit's garbage collector. Many sites, including Google, appear to run it one or more times a day. We may want to do the same.
>
> I tried running `git repack -a -d --depth=250 --window=250` on the server. It ran successfully and shrank the repository from 248 MB to 208 MB, but now the OS X builders are timing out during `git fetch`...
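The weekly job described in the 2013 message (a gc pass with `git count-objects -v` logged before and after) could be sketched as a small POSIX shell script. This is a hypothetical reconstruction, not the actual mirror or cron script: the function names `log`, `size_pack_kib`, `percent_saved`, and `maintain_mirror`, the log wording, and passing gc options as arguments are all assumptions. Since `size-pack` in `git count-objects -v` output is reported in KiB, the log in the thread corresponds to roughly 554 MiB shrinking to about 125 MiB.

```shell
#!/bin/sh
# Hypothetical sketch of a mirror maintenance job; NOT the actual Wireshark
# mirror or cron script. Function names and log wording are assumptions.

# Print a timestamped log line, e.g. "2013-05-13 14:38:12: Started."
log() {
    printf '%s: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*"
}

# Read `git count-objects -v` output on stdin and print the size-pack
# field (total size of the pack files, in KiB).
size_pack_kib() {
    awk -F': ' '$1 == "size-pack" { print $2 }'
}

# Print the integer percentage saved going from $1 KiB to $2 KiB.
percent_saved() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

# Run `git gc` on the repository at $1, logging the pack size before and
# after. Any remaining arguments are passed to `git gc` (e.g. --aggressive).
maintain_mirror() {
    repo=$1
    shift
    log "Started."
    before=$(git -C "$repo" count-objects -v | size_pack_kib)
    log "Collecting garbage"
    git -C "$repo" gc "$@"
    after=$(git -C "$repo" count-objects -v | size_pack_kib)
    log "size-pack: ${before} KiB -> ${after} KiB"
    log "Done"
}
```

Usage might look like `maintain_mirror /path/to/wireshark.git --aggressive` (the path is hypothetical). Feeding the two `size-pack` values from the log into `percent_saved 567146 127499` prints `77`, i.e. the aggressive gc removed about three quarters of the pack data.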
Hmm, that's interesting; I would have expected a bigger improvement (given my local copy is still smaller than the one on the server). Perhaps it is worth trying an --aggressive gc just once (or passing the -f and -F flags to the existing repack command, which is probably even *more* aggressive). No idea why the buildbots would be timing out... I don't think the gc should have materially affected their ability to pull down deltas.
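The one-time, more aggressive repack suggested above could be wrapped as in the following sketch. The function name `one_time_repack` and the `DRY_RUN` switch are hypothetical. The flags themselves are standard `git repack` options: `-f` passes `--no-reuse-delta` and `-F` passes `--no-reuse-object` to `git pack-objects`, so existing deltas and compressed objects are recomputed rather than reused. That is also why such a run can be slow and memory hungry, as noted earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper for a one-time, more aggressive repack (e.g. after a
# large import). Set DRY_RUN=1 to print the command instead of running it.
one_time_repack() {
    # -a: pack all reachable objects into a single pack
    # -d: remove packs made redundant by the new pack
    # -f: pass --no-reuse-delta to git pack-objects (recompute deltas)
    # -F: pass --no-reuse-object to git pack-objects (recompress everything)
    set -- git -C "$1" repack -a -d -f -F --depth=250 --window=250
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Running it once with `DRY_RUN=1` is a cheap way to confirm the exact command line before committing server time to a run that may take a long while.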
___________________________________________________________________________ Sent via: Wireshark-dev mailing list <wireshark-dev () wireshark org> Archives: http://www.wireshark.org/lists/wireshark-dev Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev mailto:wireshark-dev-request () wireshark org?subject=unsubscribe