On Mon, Feb 23, 2004 at 07:35:37PM -0500, Jan Harkes wrote:
...
<snipped description of using Amanda to perform Coda backups>
...
> I'm not yet 100% convinced about the reliability, it was working quite
> well during the experimental setup when my desktop was running the
> Amanda server, but since we moved that functionality to a machine in the
> lab a lot of incremental backups seem to fail. Haven't figured out yet
> what exactly is causing that.

Yay, found the culprit. The network link to the new backup server turned
out to be extremely lossy, and that triggers a synchronization problem in
the UDP communication between the Amanda server (dumper?) and amandad on
the Amanda client.

What happens is that the ACK for a reply is lost when a sendbackup is
started, so amandad retransmits the REP packets a couple of times (these
contain the data and mesg ports that dumper is supposed to connect to).
Now when the next volume is scheduled, the dumper sees the old REP packet
and assumes it belongs to the current REQ. So it ends up trying to connect
to ports that are long gone.

The following patch makes sure that we ignore REP packets that have an
incorrect sequence number, and thus are clearly not responses to our
currently outstanding request.

Jan

--- amanda-2.4.4/common-src/protocol.c.orig	2003-04-24 15:38:25.000000000 -0400
+++ amanda-2.4.4/common-src/protocol.c	2004-02-24 14:12:06.000000000 -0500
@@ -733,7 +733,7 @@
 		amfree(p);
 		return;
 	    }
-	    else if(pkt->type == P_REP) {
+	    else if(pkt->type == P_REP && pkt->sequence == p->origseq) {
 		/* no ack, just rep */
 		p->state = S_REPWAIT;
 		break;
@@ -764,7 +764,7 @@
 	else if(action != A_RCVDATA)
 	    goto badaction;
 	/* got the packet with the right handle, now check it */
-	if(pkt->type != P_REP) {
+	if(pkt->type != P_REP || pkt->sequence != p->origseq) {
 	    pending_enqueue(p);
 	    return;
 	}

Received on 2004-02-24 14:34:26
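
For anyone who wants to see the sequence check from the patch above in isolation, here is a minimal sketch. It is not Amanda's actual code: the types `packet_t` and `pending_req_t`, the `PKT_*` constants, and the function `reply_matches` are all made up for illustration, and only the `sequence` and `origseq` field names are borrowed from the patch.

```c
#include <stdbool.h>

/* Hypothetical, simplified types; Amanda's real structures differ. */
typedef enum { PKT_REQ, PKT_ACK, PKT_REP } pkt_type_t;

typedef struct {
    pkt_type_t type;
    int sequence;   /* sequence number carried in the packet */
} packet_t;

typedef struct {
    int origseq;    /* sequence number of the currently outstanding request */
} pending_req_t;

/*
 * Accept a packet as the reply to the outstanding request only if it is
 * a reply AND its sequence number matches the request we are waiting on.
 * A retransmitted reply to an earlier request fails the second test.
 */
bool reply_matches(const packet_t *pkt, const pending_req_t *req)
{
    return pkt->type == PKT_REP && pkt->sequence == req->origseq;
}
```

With a pending request whose `origseq` is 42, a retransmitted reply carrying sequence 41 is rejected, while a reply carrying sequence 42 is accepted. Checking the type alone, as the code did before the patch, would have accepted both.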