Coda File System

Re: backup fails on large volumes?

From: Sean Caron <caron.sean_at_gmail.com>
Date: Tue, 5 Sep 2006 14:24:19 -0400

Hi Jan,

Thanks for the information -- very enlightening as to what is happening
here.

It turned out that it only looked like my large volumes were being dumped
correctly. What would happen is that it would churn for a while, dumping,
and sure enough, whenever it got to a certain amount, it would delete
whatever it had so far in /backup (or something), throw up one of those
overflowed byte counts, and proceed to the next volume.

I'll take a look at backup.cc when I get back home today and see if I can
get it to cooperate with what you've told me here; otherwise I suppose that
I will just have to bite the bullet, move my tape drive to a client machine,
and give this Amanda program a shot.

Thanks, Sean
scaron_at_umich.edu

On 9/5/06, Jan Harkes <jaharkes_at_cs.cmu.edu> wrote:
>
> On Sun, Sep 03, 2006 at 05:06:03PM -0400, Sean Caron wrote:
> > Hmm... it seems to be working (so far) now, the second time around.
> > I'll try this again and see how it goes.
>
> Actually, I think it will probably not result in a valid volume dump.
>
>
> On Sun, Sep 03, 2006 at 04:41:44PM -0400, Sean Caron wrote:
> > I have recently added a bunch of data to my Coda cell, resulting in some
> > volumes that are a little larger than what I have been working with
> > recently, say somewhere between three and six gigabytes for a volume
> > (can't tell for sure since cfs lv hangs). When backup tries to gulp down
> > one of these large volumes, it seems as if a counter overflows somewhere
> > and it doesn't dump the volume. An example entry from the backup log
> > looks something like:
> >
> > 16:03:33 Dumping 7f00001f.1000029 to
> > /backup/03Sep2006/blossom.diablonet.net-7f00001f.01000029 ...
> > 16:20:39                Transferred -636288144 bytes
> >
> > I haven't peeked at the code yet; just thought I would report it. I'm
> > wondering how easy it would be to fix. The volumes seem to work fine in
> > normal use; backup just chokes.
>
> That message is, as far as I know, harmless (incorrect, but harmless).
> However, there is a real problem in the backup dump code which is fixable
> with an ugly hack.
>
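
For illustration, here is a rough sketch of how a byte count can come out
negative like the one in the log above. It assumes the counter is a signed
32-bit integer that wrapped exactly once; neither assumption is confirmed by
the backup source, and the helper name is made up for this example.

```python
def unwrap_signed32(reported: int, wraps: int = 1) -> int:
    """Recover a byte count from a wrapped signed 32-bit value.

    Assumes (hypothetically) that the counter is a signed 32-bit integer
    and that the true count exceeded the reported one by `wraps` * 2**32.
    """
    return reported + wraps * 2**32


reported = -636288144            # value printed in the backup log
actual = unwrap_signed32(reported)
print(actual)                    # 3658679152
print(round(actual / 2**30, 2))  # size in GiB
```

Under those assumptions the real count would be roughly 3.4 GiB, which is at
least consistent with the three to six gigabytes estimated above.
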
> The problem is that the backup dump and restore code is not dumping a
> single large stream on the server side, but actually uses a temporary
> buffer (I think it was .5 or 1MB in size). It packs the buffer locally
> and then sends it to the process (volutil/backup) that requested the
> dump.
>
> However, the related SFTP transfer code actually includes an offset
> indicating where this chunk of data should end up. This offset is
> an unsigned 32-bit integer. So once the server has sent 4GB worth of
> chunks the offsets will wrap around to zero and we end up overwriting
> the start of the dumpfile.
>
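
A small sketch of the wraparound described above, assuming 1 MB chunks (the
thread only says .5 or 1MB) and an offset that keeps just the low 32 bits.
The function name is hypothetical and this is not the SFTP code itself.

```python
CHUNK = 1 << 20          # assumed chunk size: 1 MiB
MASK32 = 0xFFFFFFFF      # an unsigned 32-bit field keeps only the low 32 bits


def chunk_offsets(total_bytes: int, chunk: int = CHUNK):
    """Yield (true_offset, offset_as_stored_in_32_bits) for each chunk."""
    for true_offset in range(0, total_bytes, chunk):
        yield true_offset, true_offset & MASK32


# Find the first chunk of a 5 GiB dump whose 32-bit offset no longer matches.
for true_off, wire_off in chunk_offsets(5 * 2**30):
    if true_off != wire_off:
        print(true_off, wire_off)   # 4294967296 0
        break
```

A receiver that seeks to the stored offset would put that chunk at the
beginning of the file, on top of the data written first.
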
> Now I was surprised when I actually got successful >4GB dumps with
> Amanda, which is using volutil dump and pretty much the same code as
> backup. The difference was that instead of writing to a local file on
> disk, Amanda writes the volume dump to a pipe to gzip (which then writes
> to the tcp socket to the backup server). The interesting thing about
> the pipe is that it doesn't allow seeking.
>
> Now because we have a single server sending each chunk in the correct
> order to a single client, the ESPIPE errors when trying to seek are
> really not a problem and in fact not seeking ended up being the correct
> thing to do here.
>
> Currently the S_WriteDump code in backup.cc needs to know the seekoffset
> because it reopens the file for every chunk and wants to know where to
> write the next chunk. It doesn't append, but actually prefers to
> truncate and overwrite, and passing a seekoffset is the only way to make
> it append.
>
> However, it is possible to open the dump file only once and pass an
> already open file descriptor to SFTP; this avoids the need for
> seeking. I just checked and the S_WriteDump implementation in volutil is
> passing the open fd (which is why it could write to a pipe). The only
> problem there is that it is still messing around by passing a 32-bit
> seekoffset, which probably should be set to -1.
>
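
To illustrate the point about pipes and open descriptors above, here is a
minimal sketch, unrelated to the actual Coda sources: an already open
descriptor is written chunk by chunk with no offset at all, and an attempted
seek on a pipe is refused (with ESPIPE on POSIX systems). The function name
and the tiny chunks are invented for the example.

```python
import errno
import os


def write_chunks(fd: int, chunks) -> int:
    """Write chunks in order to an already open descriptor, never seeking."""
    total = 0
    for chunk in chunks:
        view = memoryview(chunk)
        while view:                      # os.write may write only part
            n = os.write(fd, view)
            view = view[n:]
            total += n
    return total


read_fd, write_fd = os.pipe()

# Seeking on a pipe is refused by the kernel.
try:
    os.lseek(write_fd, 0, os.SEEK_SET)
    seek_errno = None
except OSError as exc:
    seek_errno = exc.errno

# Sequential writes still arrive in order without any offset.
written = write_chunks(write_fd, [b"chunk-1;", b"chunk-2;", b"chunk-3;"])
os.close(write_fd)
data = os.read(read_fd, 1024)
os.close(read_fd)

print(seek_errno == errno.ESPIPE, written, data)
```

Because the chunks come from one sender in order, plain sequential writes
are enough to rebuild the stream, and no offset ever has to fit in 32 bits.
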
> Jan
>
>
Received on 2006-09-05 17:26:31