Hi Jan,

Thanks for the information -- very enlightening as to what is happening
here. It turned out that it only looked like my large volumes were being
dumped correctly; what would happen is that it would churn for a while,
dumping, and sure enough, whenever it got to a certain amount it would
delete whatever it had written so far in /backup (or something), throw up
one of those overflowed byte counts, and proceed to the next volume.

I'll take a look at backup.cc when I get back home today and see if I can
get it to cooperate with what you've told me here; otherwise I suppose I
will just have to bite the bullet, move my tape drive to a client machine,
and give this Amanda program a shot.

Thanks,

Sean
scaron_at_umich.edu

On 9/5/06, Jan Harkes <jaharkes_at_cs.cmu.edu> wrote:
>
> On Sun, Sep 03, 2006 at 05:06:03PM -0400, Sean Caron wrote:
> > Hmm... it seems to be working (so far) now, the second time around.
> > I'll try this again and see how it goes.
>
> Actually, I think it will probably not result in a valid volume dump.
>
> On Sun, Sep 03, 2006 at 04:41:44PM -0400, Sean Caron wrote:
> > I have recently added a bunch of data to my Coda cell, resulting in
> > some volumes that are a little larger than what I have been working
> > with recently.. say somewhere between three and six gigabytes for a
> > volume (can't tell for sure since cfs lv hangs). When backup tries to
> > gulp down one of these large volumes, it seems as if a counter gets
> > overflowed somewhere and it doesn't dump the volume. An example entry
> > from the backup log looks something like,
> >
> > 16:03:33 Dumping 7f00001f.1000029 to
> > /backup/03Sep2006/blossom.diablonet.net-7f00001f.01000029 ...
> > 16:20:39 Transferred -636288144 bytes
> >
> > I haven't peeked at the code yet; just thought I would report it..
> > wondering how easy it would be to fix? The volumes seem to work fine
> > in normal use.. backup just chokes.
>
> That message is, as far as I know, harmless (incorrect, but harmless).
> However, there is a real problem in the backup dump code which is
> fixable with an ugly hack.
>
> The problem is that the backup dump and restore code is not dumping a
> single large stream on the server side, but actually uses a temporary
> buffer (I think it was .5 or 1MB in size). It packs the buffer locally
> and then sends it to the process (volutil/backup) that requested the
> dump.
>
> However, the related SFTP transfer code includes an offset indicating
> where each chunk of data should end up. This offset is an unsigned
> 32-bit integer, so once the server has sent 4GB worth of chunks the
> offsets wrap around to zero and we end up overwriting the start of the
> dump file.
>
> Now I was surprised when I actually got successful >4GB dumps with
> Amanda, which uses volutil dump and pretty much the same code as
> backup. The difference was that instead of writing to a local file on
> disk, Amanda writes the volume dump to a pipe to gzip (which then
> writes to the TCP socket to the backup server). The interesting thing
> about the pipe is that it doesn't allow seeking.
>
> Because we have a single server sending each chunk in the correct order
> to a single client, the ESPIPE errors when trying to seek are really
> not a problem, and in fact not seeking turned out to be the correct
> thing to do here.
>
> Currently the S_WriteDump code in backup.cc needs to know the
> seekoffset because it reopens the file for every chunk and wants to
> know where to write the next chunk.
> It doesn't append, but actually prefers to truncate and overwrite, and
> passing a seekoffset is the only way to make it append.
>
> However, it is possible to open the dump file only once and pass an
> already open file descriptor to SFTP; this avoids the need for seeking
> entirely. I just checked, and the S_WriteDump implementation in volutil
> is passing the open fd (which is why it could write to a pipe). The
> only problem there is that it is still messing around by passing a
> 32-bit seekoffset, which probably should be set to -1.
>
> Jan

Received on 2006-09-05 17:26:31
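
For anyone hitting the same symptoms, here is a minimal standalone sketch
of the two effects Jan describes. It is not the actual Coda SFTP/backup
code; the 1MB chunk size and all variable names are assumptions, and it
only models the arithmetic: the transferred-byte counter printed through a
signed 32-bit integer goes negative past 2GB (the -636288144 in the log),
and the unsigned 32-bit per-chunk offset silently wraps after 4GB, which is
what sends later chunks back to the start of the dump file.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define CHUNK (1024u * 1024u)   /* assumed ~1MB server-side transfer buffer */

int main(void)
{
    /* 1. The harmless symptom: a ~3.4GB transfer reported through a signed
     *    32-bit counter shows up negative, like the message in the log. */
    uint64_t transferred = 3658679152ULL;               /* ~3.4GB */
    printf("Transferred %" PRId32 " bytes\n", (int32_t)transferred);

    /* 2. The real bug: every SFTP chunk carries an unsigned 32-bit offset
     *    saying where it belongs in the dump file, and backup.cc reopens
     *    the file and seeks to that offset for each chunk.  Past 4GB the
     *    offset wraps, so later chunks overwrite the start of the file. */
    uint32_t offset = 0;
    for (uint64_t sent = 0; sent < 5ULL * 1024 * 1024 * 1024; sent += CHUNK)
        offset += CHUNK;                                 /* wraps at 4GB */
    printf("offset after a 5GB dump: %" PRIu32 " (about 1GB, not 5GB)\n",
           offset);

    /* Writing each chunk sequentially to one already-open descriptor (or a
     * pipe), as volutil does, never consults this offset -- which is why
     * the Amanda-to-gzip pipeline produced valid >4GB dumps. */
    return 0;
}

The numbers are only meant to mirror the log entry above (two's-complement
conversion assumed); the sketch does not touch any files and is not a patch
for backup.cc.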