On Sun, Sep 03, 2006 at 05:06:03PM -0400, Sean Caron wrote:
> Hmm... it seems to be working (so far) now, the second time around. I'll
> try this again and see how it goes.

Actually, I think it will probably not result in a valid volume dump.

On Sun, Sep 03, 2006 at 04:41:44PM -0400, Sean Caron wrote:
> I have recently added a bunch of data to my Coda cell, resulting in some
> volumes that are a little larger than what I have been working with
> recently.. say somewhere between three and six gigabytes for a volume
> (can't tell for sure since cfs lv hangs). When backup tries to gulp down
> one of these large volumes, it seems as if a counter gets overflowed
> somewhere and it doesn't dump the volume. An example entry from the
> backup log looks something like,
>
> 16:03:33 Dumping 7f00001f.1000029 to
> /backup/03Sep2006/blossom.diablonet.net-7f00001f.01000029 ...
> 16:20:39 Transferred -636288144 bytes
>
> I haven't peeked at the code yet; just thought I would report it..
> wondering how easy it would be to fix? The volumes seem to work fine in
> normal use.. backup just chokes.

That message is, as far as I know, harmless (incorrect, but harmless).
However, there is a real problem in the backup dump code, which is fixable
with an ugly hack.

The problem is that the backup dump and restore code does not dump a single
large stream on the server side, but actually uses a temporary buffer (I
think it was .5 or 1MB in size). It packs the buffer locally and then sends
it to the process (volutil/backup) that requested the dump. However, the
related SFTP transfer code includes an offset indicating where each chunk
of data should end up, and this offset is an unsigned 32-bit integer. So
once the server has sent 4GB worth of chunks, the offsets wrap around to
zero and we end up overwriting the start of the dump file.

Now, I was surprised when I actually got successful >4GB dumps with Amanda,
which uses volutil dump and pretty much the same code as backup. The
difference is that instead of writing to a local file on disk, Amanda
writes the volume dump to a pipe to gzip (which then writes to the tcp
socket to the backup server). The interesting thing about the pipe is that
it doesn't allow seeking. Because we have a single server sending each
chunk in the correct order to a single client, the ESPIPE errors when
trying to seek are really not a problem; in fact, not seeking turned out to
be the correct thing to do here.

Currently the S_WriteDump code in backup.cc needs to know the seek offset
because it reopens the file for every chunk and has to know where to write
the next one. It doesn't append; it actually prefers to truncate and
overwrite, and passing a seek offset is the only way to make it append.
However, it is possible to open the dump file only once and pass an
already open file descriptor to SFTP, which avoids the need for seeking.

I just checked, and the S_WriteDump implementation in volutil does pass the
open fd (which is why it could write to a pipe). The only problem there is
that it is still messing around with a 32-bit seek offset, which probably
should be set to -1.

Jan
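
P.S. To make the offset wraparound concrete, here is a minimal standalone
sketch (the 1MB chunk size is an assumption on my part; the real SFTP code
and its field names differ). For what it's worth, the log line quoted above
is consistent with the same class of problem: a transfer total of
3658679152 bytes (about 3.4GB) printed through a signed 32-bit counter
comes out as 3658679152 - 2^32 = -636288144.

    #include <cstdint>
    #include <cstdio>

    int main()
    {
        const uint64_t chunk = 1024 * 1024; // assumed 1MB transfer buffer
        uint32_t offset = 0;                // 32-bit on-the-wire seek offset

        for (uint64_t sent = 0; sent <= (5ULL << 30); sent += chunk) {
            // After 4GB (4096 chunks of 1MB) the 32-bit offset wraps back
            // to zero, so the next chunk lands over the start of the dump.
            if (offset == 0 && sent > 0)
                printf("offset wrapped after %llu bytes\n",
                       (unsigned long long)sent);
            offset += chunk; // truncated modulo 2^32
        }
        return 0;
    }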
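
And a hypothetical sketch of the two write strategies described above; the
function names and signatures are mine for illustration, not the actual
S_WriteDump code in backup.cc or volutil:

    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdint>

    // backup.cc style: reopen the dump file for every chunk and seek to
    // a 32-bit offset before writing.  Once more than 4GB has been sent
    // the offset wraps and the start of the file is overwritten.
    static ssize_t write_chunk_reopen(const char *path, uint32_t seekoffset,
                                      const void *buf, size_t len)
    {
        int fd = open(path, O_WRONLY | O_CREAT, 0600);
        if (fd == -1) return -1;
        lseek(fd, (off_t)seekoffset, SEEK_SET);
        ssize_t n = write(fd, buf, len);
        close(fd);
        return n;
    }

    // volutil style: the file (or pipe) is opened once and every chunk
    // is written sequentially to the same descriptor.  No seeking, so
    // the 32-bit offset never matters, and a pipe (where lseek fails
    // with ESPIPE) works fine.
    static ssize_t write_chunk_sequential(int fd, const void *buf, size_t len)
    {
        return write(fd, buf, len);
    }

    int main()
    {
        const char msg[] = "chunk";
        int fd = open("dump.out", O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd != -1) {
            write_chunk_sequential(fd, msg, sizeof(msg) - 1);
            close(fd);
        }
        write_chunk_reopen("dump.out", 0, msg, sizeof(msg) - 1);
        return 0;
    }

Opening the file once and handing the open fd to SFTP sidesteps the 32-bit
offset entirely, which is why the volutil dump to a pipe survives past 4GB.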