(Illustration by Gaich Muramatsu)
Your idea did occur to me -- but as you'll see below I was mainly changing infrastructure, not (yet) rds. We use magic constants in guards before and after the regions allocated. I suppose that rds_fake_free could change the constant from "malloced" to "fakefreed" and then rds_do_free would change it again to "freed". In that way we could stop right at the offending point. First of all, there seems to be a misunderstanding either in the code or in the man page for rds_fake_free. The man page says that rds_do_free is to be called just BEFORE rvm_end_transaction. I think this is not safe and must be called AFTER rvm_end_transaction. Calling AFTER is safe -- only the risk of memory leak exists. By calling rds_do_free first there is a risk of reallocating memory and clobbering which is much more serious. So I think the man page for rds_fake_free is wrong and the code is right. One would have to think _extremely_ carefully about the following problem: which transaction (if any) is this change of the guard from "malloced" to "fakefreed" part of? What happens if the system crashes between "rvm_end_transaction" and "rds_do_free" -- should we end with "malloced" or "fakefreed" in the constants. Currently this situation can lead to a memory leak because we keep malloced. In any case it cannot be put to "freed" because another thread would then be able to reallocate and clobber. The memory leadk is very unlikely since we do not flush rvm_end_transaction, we only flush at rds_do_free even if the transaction is flush. (This happens in rvmlib_end_transaction -- the version you have is pretty unintelligible, but the new one is pretty clean.) It might be best to make the "fakefreed" constant part of the main transaction and additionally weave the fake-freed area into a (new) "fakefreed_list" -- rvmlib_rec_free should do this. Then the following holds: - main transaction: changes guard from malloced to fakefreed. Moves area onto the fakefreed list. rds_malloc cannot reallocate. - rds_do_free transaction changes guard from fakefreed to free. Moves area from the fakefreed list to the free list. The crashing behaviour is then as follows: - crash before rvm_end_transaction: we end up with the state before begin transaction. Good. - crash after rvm_end_transaction and before rds_do_free: the transaction went through and the faked blocks are visible on the fakefreed_list. We can garbage collect them on the next startup of rds. We have eliminated the memory leak. In fact most of the work I needed to put in was to rewrite the rvmlib.{c,h} files to make it possible to step into the routines with gdb; I also removed the non-local gotos used for aborting transactions. Then the printing was a mess (it printed addresses that were totally wrong, then only wrong by sizes of guards etc. and hard to understand). Finally the assertions were wrong since they removed a stack frame from the stack, and sadly they always removed the most important one. So basically I just went straight ahead and didn't modify the rds_fake_free routine. Unfortunately, introducing an fakefreed_list might mean a reinitialization of rvm which is not pleasant... Peter J.A. Harkes writes: > > "Peter J. Braam" wrote: > > > > This is a message about improvements I have made to the RVM handling, with > > a view to increasing our debugging capability dramatically. > > > <snip> > > Isn't the same checking possible by using magic-numbers. A the state of > a segement (allocated/freed) is marked using some `magic' value. A > subsequent begin_transaction or free operation can then test whether the > segment is already `marked' free, and complain loudly when it is. > > JanReceived on 1998-04-03 10:42:47