Today I discovered 2 serious bugs. One has only been in the development tree for a couple of weeks. The other must have been present for the past 6-8 months.

The `development' bug affects only store operations to replicated servers. At some point we switched the file transfer code from FILEBYNAME to FILEBYFD, but the code in sftp for the BYFD case has possibly had a bug ever since multi-rpcs were introduced. It didn't handle the fact that the file offset is shared between the multiple transfers. So the first server would get the first window, the second server the second window, etc., until the file is transferred, at which point the servers complain about a length discrepancy and return an error. As a result, the client switches to write-disconnected operation and reintegrates the store. The backfetches are normal rpcs, so the reintegration succeeds.

The other bug is related to shadowing files during reintegration. When a file that is involved in a reintegration is opened for writing, a new container is created and the data is copied over. Then a swap is performed to avoid interfering with a possibly ongoing backfetch. This swap is not recorded in RVM, and the next time venus starts it attaches the fso to the discarded shadow file. If the fso was marked dirty, an assertion is triggered; otherwise the file is refetched.

Also, due to how shadow file names are created, if a file is shadowed twice, the container file is truncated during the copy. Instant data loss.

It is a testament to the robustness and failure tolerance of Coda's design that these problems have gone mostly unnoticed.

I've committed experimental patches to CVS for both bugs, but have only tested them lightly.

Jan

PS. There is one known Y2K problem, but it is quite harmless. Look in your codacon in the new year, it will show 01/01/100 :) (except if you run bleeding edge with the aforementioned other bugs).

Received on 1999-12-31 00:03:34
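The shared file offset problem in the FILEBYFD case can be illustrated with a small sketch. This is not the sftp code and not the committed patch; it is a Python model, assuming a Unix system, in which `WINDOW`, `send_shared_offset`, `send_private_offsets` and `demo` are all hypothetical names. It only shows why several transfers reading from one descriptor each end up with a different slice of the file, and how a per-transfer offset (here via `pread`) would avoid that.

```python
import os
import tempfile

WINDOW = 4  # hypothetical window size in bytes, chosen only for illustration


def send_shared_offset(fd, n_servers):
    """Buggy pattern: every transfer read()s from the same fd,
    so all transfers advance one shared file offset."""
    received = [b"" for _ in range(n_servers)]
    os.lseek(fd, 0, os.SEEK_SET)
    while True:
        progressed = False
        for i in range(n_servers):
            chunk = os.read(fd, WINDOW)  # moves the offset for everyone
            if chunk:
                received[i] += chunk
                progressed = True
        if not progressed:
            return received


def send_private_offsets(fd, n_servers):
    """One possible remedy (an assumption, not the actual patch):
    keep an offset per transfer and use pread(), which leaves the
    descriptor's shared offset untouched."""
    received = [b"" for _ in range(n_servers)]
    offsets = [0] * n_servers
    while True:
        progressed = False
        for i in range(n_servers):
            chunk = os.pread(fd, WINDOW, offsets[i])
            if chunk:
                received[i] += chunk
                offsets[i] += len(chunk)
                progressed = True
        if not progressed:
            return received


def demo():
    """Transfer a 16-byte file to two model 'servers' both ways."""
    data = b"AAAABBBBCCCCDDDD"
    with tempfile.TemporaryFile() as f:
        f.write(data)
        f.flush()
        fd = f.fileno()
        shared = send_shared_offset(fd, 2)
        private = send_private_offsets(fd, 2)
    return data, shared, private
```

In the shared-offset variant each model server ends up with only every other window, which corresponds to the length discrepancy the servers complain about; with private offsets both receive the complete file.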