Wireshark mailing list archives

Re: Wireshark Git Mirror Maintenance


From: Gerald Combs <gerald () wireshark org>
Date: Sun, 03 Aug 2014 15:20:12 -0700

On 8/3/14, 11:34 AM, Evan Huus wrote:
On Mon, May 13, 2013 at 7:54 PM, Gerald Combs <gerald () wireshark org> wrote:

    On 5/10/13 1:47 PM, Evan Huus wrote:
    > Hi Gerald
    >
    > I just cloned the Wireshark git mirror onto a new machine and was
    > surprised at how large it was to download. Running an aggressive git
    > gc on the finished clone reduced the disk usage on my machine from
    > ~500MB to ~150MB.
    >
    > I'm a bit surprised - git is supposed to automatically garbage collect
    > repositories when they get too cluttered, but perhaps its threshold
    > for automatic gc is just very high.
    >
    > I pinged Balint (CCed) about this and he suggested running gc on a
    > weekly basis and gc --aggressive on a monthly basis on the server. It
    > would probably save a non-trivial amount of bandwidth in the long term
    > as more people clone the repository.
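
    For context on the auto-gc threshold question above: the knobs involved
    are the gc.auto and gc.autoPackLimit config options. A quick way to check
    them is below; the defaults quoted in the comments are assumptions based
    on stock git rather than anything verified on the server:

        # Inspect the thresholds that control when `git gc --auto` actually runs.
        # Unset values mean git falls back to its built-in defaults.
        git config --get gc.auto           # default: 6700 loose objects
        git config --get gc.autoPackLimit  # default: 50 packs
        # See how close the repository currently is to those limits.
        git count-objects -v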

    It might be due to our particular circumstances (a bare repository only
    updated via the mirror script), but git's automatic garbage collection
    doesn't seem to happen very often. The mirror script runs "git gc
    --auto" each time it synchronizes, which keeps the repository from
    filling up the disk (something that did happen early on), but as you
    point out there is room for improvement. I added a cron job that runs
    "git gc --aggressive" each week. Here is the output from a manual run,
    which includes "git count-objects -v" before and after:

    2013-05-13 14:38:12: Started.
    2013-05-13 14:38:12: Synchronizing repository wireshark
    2013-05-13 14:38:12: Object count start
    count: 0
    size: 0
    in-pack: 316591
    packs: 45
    size-pack: 567146
    prune-packable: 0
    garbage: 0
    2013-05-13 14:38:12: Collecting garbage
    2013-05-13 15:09:56: Object count start
    count: 0
    size: 0
    in-pack: 316596
    packs: 2
    size-pack: 127499
    prune-packable: 0
    garbage: 0
    2013-05-13 15:09:56: Done 
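
    The cron job itself is just a small wrapper; a minimal sketch of the
    sort of script that would produce output like the above (the repository
    path and the logging helper are placeholders, not the actual mirror
    script):

        #!/bin/sh
        # Weekly maintenance sketch: log object counts, run an aggressive gc,
        # then log the counts again so the savings are visible in the log.
        REPO=/path/to/wireshark.git    # placeholder; adjust for the real mirror
        log() { echo "$(date '+%Y-%m-%d %H:%M:%S'): $*"; }

        cd "$REPO" || exit 1
        log "Object count start"
        git count-objects -v
        log "Collecting garbage"
        git gc --aggressive --quiet
        log "Object count start"
        git count-objects -v
        log "Done"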


So it's been over a year since this conversation and we have since
migrated to Git/Gerrit; I have no idea what Gerrit is doing in this
regard (is there even a "real" git repository backing it, or is it all
internal magic?), but I recently came across [1], which suggests that
repeated use of --aggressive maybe wasn't such a good idea after all.

It suggests sticking to regular `git gc` except after large one-time
imports (like the one we did on migration), at which point you should run
the apparently-very-slow `git repack -a -d --depth=250 --window=250`.
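
In practice that boils down to something like this (the window/depth
values are the ones from [1]; how long the repack takes obviously depends
on the machine):

    # Routine maintenance: plain gc with git's default settings.
    git gc

    # One-off after a large import: repack everything into a single pack,
    # searching much harder for good deltas (slow and memory-hungry).
    git repack -a -d --depth=250 --window=250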

FWIW, a fresh clone from Gerrit right now is 213 MB and my local repo is
only 161 MB, but my current desktop isn't beefy enough to run the
recommended repack command, so I have no idea what improvement it would
give.

It's a "real" git repository but any operations performed by Gerrit are
done using JGit. The weekly automatic number update script runs `gerrit
gc --all`, which uses JGit's garbage collector. Many sites including
Google appear to run it one or more times a day. We may want to to the same.
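
Something along these lines would do it (a sketch only: the host, port
and account are placeholders, and it assumes that account has been
granted Gerrit's gc capability):

    # Hypothetical crontab entry: run Gerrit's JGit garbage collector over
    # all projects once a day at 03:00, appending the output to a log file.
    0 3 * * * ssh -p 29418 gerritadmin@gerrit.example.org gerrit gc --all >> /var/log/gerrit-gc.log 2>&1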

I tried running `git repack -a -d --depth=250 --window=250` on the
server. It ran successfully and shrank the repository from 248 MB to
208 MB, but now the OS X builders are timing out during `git fetch`...
___________________________________________________________________________
Sent via:    Wireshark-dev mailing list <wireshark-dev () wireshark org>
Archives:    http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
             mailto:wireshark-dev-request () wireshark org?subject=unsubscribe

