Wireshark mailing list archives

Re: Wireshark Git Mirror Maintenance


From: Evan Huus <eapache () gmail com>
Date: Sun, 3 Aug 2014 18:34:11 -0400

On Sun, Aug 3, 2014 at 6:20 PM, Gerald Combs <gerald () wireshark org> wrote:

On 8/3/14, 11:34 AM, Evan Huus wrote:
On Mon, May 13, 2013 at 7:54 PM, Gerald Combs <gerald () wireshark org> wrote:

    On 5/10/13 1:47 PM, Evan Huus wrote:
    > Hi Gerald
    >
    > I just cloned the Wireshark git mirror onto a new machine and was
    > surprised at how large it was to download. Running an aggressive git
    > gc on the finished clone reduced the disk usage on my machine from
    > ~500MB to ~150MB.
    >
    > I'm a bit surprised - git is supposed to automatically garbage collect
    > repositories when they get too cluttered, but perhaps its threshold
    > for automatic gc is just very high.
    >
    > I pinged Balint (CCed) about this and he suggested running gc on a
    > weekly basis and gc --aggressive on a monthly basis on the server. It
    > would probably save a non-trivial amount of bandwidth in the long term
    > as more people clone the repository.

    It might be due to our particular circumstances (a bare repository only
    updated via the mirror script) but git's automatic garbage collection
    doesn't seem to happen very often. The mirror script runs "git gc
    --auto" each time it synchronizes which keeps it from filling up the
    disk (which happened early on) but as you point out there is room for
    improvement. I added a cron job that runs "git gc --aggressive" each
    week. Here is the output from a manual run, which includes "git
    count-objects -v" before and after:

    2013-05-13 14:38:12: Started.
    2013-05-13 14:38:12: Synchronizing repository wireshark
    2013-05-13 14:38:12: Object count start
    count: 0
    size: 0
    in-pack: 316591
    packs: 45
    size-pack: 567146
    prune-packable: 0
    garbage: 0
    2013-05-13 14:38:12: Collecting garbage
    2013-05-13 15:09:56: Object count start
    count: 0
    size: 0
    in-pack: 316596
    packs: 2
    size-pack: 127499
    prune-packable: 0
    garbage: 0
    2013-05-13 15:09:56: Done
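
The before/after comparison in the log above can be reproduced with something like the following. This is a minimal sketch, not the actual mirror script: the function names `pack_summary` and `gc_with_report` are hypothetical, and the choice of `--aggressive` simply mirrors the cron job described above. It relies only on `git count-objects -v` reporting `size-pack` in KiB.

```shell
# Hypothetical helper: summarise `git count-objects -v` output read from stdin.
# git reports size-pack in KiB, so divide by 1024 to get MiB.
pack_summary() {
    awk -F': ' '
        $1 == "packs"     { packs = $2 }
        $1 == "size-pack" { kib = $2 }
        END { printf "packs=%d size-pack=%.1fMiB\n", packs, kib / 1024 }
    '
}

# Hypothetical wrapper: report, collect garbage, report again.
# $1 is the path to the (bare) repository.
gc_with_report() {
    repo=$1
    git -C "$repo" count-objects -v | pack_summary
    git -C "$repo" gc --aggressive --quiet
    git -C "$repo" count-objects -v | pack_summary
}
```

Fed the two listings from the log, `pack_summary` prints `packs=45 size-pack=553.9MiB` and `packs=2 size-pack=124.5MiB`, a reduction of roughly 78% in pack size.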


So it's been over a year since this conversation and we have since
migrated to Git/Gerrit, so I have no idea what Gerrit is doing in this
regard (is there even a "real" git repository backing it, or is it all
internal magic?). However, I recently came across [1], which suggests that
repeated use of --aggressive maybe wasn't such a good idea after all.

It suggests just sticking to regular `git gc` except in cases of large
one-time imports (like we did on migration) at which point you should
run the apparently-very-slow `git repack -a -d --depth=250 --window=250`.
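
The split described above (plain gc routinely, a deep repack once after a big import) could be sketched as two small wrappers. The function names and the repository-path argument are hypothetical; the git commands themselves are the ones quoted in this message.

```shell
# Routine maintenance: plain gc, suitable for a regular cron job.
# $1 is the path to the repository.
routine_gc() {
    git -C "$1" gc --quiet
}

# One-time deep repack, intended to follow a large import.
# -a/-d: pack everything into a single pack and delete the redundant ones.
# --depth/--window: allow longer delta chains and a wider search window,
# trading (a lot of) CPU time and memory for a smaller pack.
post_import_repack() {
    git -C "$1" repack -a -d --depth=250 --window=250
}
```

Only `routine_gc` would be scheduled; `post_import_repack` is meant to be run by hand, on a machine with enough memory for the larger window.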

FWIW, a fresh clone from Gerrit right now is 213MB - my local repo is
only 161MB, and my current desktop is actually not beefy enough to run
the recommended repack command so I have no idea what improvement that
would give.

It's a "real" git repository but any operations performed by Gerrit are
done using JGit. The weekly automatic number update script runs `gerrit
gc --all`, which uses JGit's garbage collector. Many sites including
Google appear to run it one or more times a day. We may want to do the
same.
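
For reference, running that collection more often over Gerrit's SSH interface might look like the sketch below. The host name and the function name are assumptions, 29418 is Gerrit's default SSH port, and the calling account needs sufficient administrative rights on the server; `gerrit gc --all` is the command named above.

```shell
# Hypothetical endpoint; override via the environment for a real server.
GERRIT_HOST=${GERRIT_HOST:-gerrit.example.org}
GERRIT_PORT=${GERRIT_PORT:-29418}   # Gerrit's default SSH port

# Ask Gerrit to garbage-collect every project using JGit's collector.
gerrit_gc_all() {
    ssh -p "$GERRIT_PORT" "$GERRIT_HOST" gerrit gc --all
}
```

Calling `gerrit_gc_all` from a daily cron job instead of the weekly script would match the once-a-day cadence mentioned above.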

I tried running `git repack -a -d --depth=250 --window=250` on the
server. It ran successfully and shrank the repository from 248 MB to 208
MB but now the OS X builders are timing out during `git fetch`...


Hmm, that's interesting, I would have expected a bigger improvement (given
my local copy is still smaller than the one on the server). Perhaps it is
worth trying an --aggressive gc just once (or passing the -f and -F flags
to the existing repack command, which is probably even *more* aggressive).
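
Concretely, the repack variant suggested above would be something like this sketch (the function name and path argument are hypothetical). `-f` tells git not to reuse existing deltas and `-F` not to reuse existing packed objects, so everything is recomputed from scratch; expect it to be slower than the run already tried.

```shell
# One-off, from-scratch repack of the repository at $1.
# -f: recompute deltas instead of reusing existing ones
# -F: also re-encode objects instead of copying them from existing packs
full_repack() {
    git -C "$1" repack -a -d -f -F --depth=250 --window=250
}
```

Comparing `git count-objects -v` before and after would show whether it closes the gap to the smaller local copy.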

No idea why the buildbots would be timing out... I don't think the gc
should have materially affected their ability to pull down deltas.
___________________________________________________________________________
Sent via:    Wireshark-dev mailing list <wireshark-dev () wireshark org>
Archives:    http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
             mailto:wireshark-dev-request () wireshark org?subject=unsubscribe
