Coda wishlist/todolist/issues

It all depends on what you'd like to work on; there is plenty to be done in many areas.

Here is a broad overview of projects, based on the issues I currently have with the implementation and ordered by subsystem.

LWP, the userspace threading package

It is a small hurdle for any new system we port to because it relies on assembler code to switch stacks.
We seem to have problems when we use libraries that use pthreads.
There is no convenient way to debug threading/locking problems. gdb doesn't understand our threads, and I only got as far as adding a function (i386 only) that dumps the return addresses on the stacks of all threads. That function also tends to crash somewhere while dumping, so the trace is often incomplete.
We cannot use any blocking operation without blocking the whole process. This is a problem with DNS lookups, and with fsync when there are a lot of dirty pages in the VM.
We have issues with C++ exception handling, so we cannot use exceptions in our C++ code anywhere.
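The DNS case above is commonly worked around by handing the blocking call to a separate kernel thread and waiting for a completion byte on a pipe, which a select()-based loop can watch without stalling the rest of the process. Below is a minimal sketch of that pattern using plain POSIX threads; `struct lookup_req`, `async_lookup()` and `async_lookup_finish()` are hypothetical names, not Coda or LWP APIs. Since mixing pthreads with LWP is itself one of the problems listed above, this only illustrates the idea and would not drop into the current code as-is.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical request record shared by the caller and the helper thread. */
struct lookup_req {
    char host[256];
    int flags;                 /* extra getaddrinfo() flags, e.g. AI_NUMERICHOST */
    int status;                /* getaddrinfo() return value */
    struct addrinfo *result;   /* caller frees it with freeaddrinfo() */
    int notify_fd;             /* write end of the completion pipe */
};

/* Runs in its own kernel thread, so only this thread sits in the resolver. */
static void *lookup_thread(void *arg)
{
    struct lookup_req *req = arg;
    struct addrinfo hints;
    char done = 1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_DGRAM;
    hints.ai_flags = req->flags;
    req->status = getaddrinfo(req->host, NULL, &hints, &req->result);

    /* One byte on the pipe tells the main loop the lookup has finished. */
    ssize_t n = write(req->notify_fd, &done, 1);
    (void)n;
    return NULL;
}

/* Starts a lookup. The caller puts *read_fd in its select() set and calls
 * async_lookup_finish() once it is readable. Returns 0 on success. */
static int async_lookup(struct lookup_req *req, const char *host, int flags,
                        int *read_fd, pthread_t *tid)
{
    int fds[2];

    assert(req != NULL && host != NULL);
    if (pipe(fds) != 0)
        return -1;
    strncpy(req->host, host, sizeof(req->host) - 1);
    req->host[sizeof(req->host) - 1] = '\0';
    req->flags = flags;
    req->result = NULL;
    req->notify_fd = fds[1];
    if (pthread_create(tid, NULL, lookup_thread, req) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    *read_fd = fds[0];
    return 0;
}

/* Reaps the helper thread (blocking until the lookup is done if read_fd is
 * not readable yet) and returns the getaddrinfo() status. */
static int async_lookup_finish(struct lookup_req *req, int read_fd, pthread_t tid)
{
    char done;
    ssize_t n = read(read_fd, &done, 1);
    (void)n;
    pthread_join(tid, NULL);
    close(read_fd);
    close(req->notify_fd);
    return req->status;
}
```

In an event loop, read_fd would sit in the select() set next to the network sockets, and async_lookup_finish() would only be called once it becomes readable, so nothing else waits on the resolver.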

All of these issues could probably be solved by using the implementation of LWP that runs on top of pthreads, which is available in the same source tarball. However, the existing code that uses LWP makes assumptions about locking, there is some confusion about which thread runs first after a fork, etc., and it assumes that threads never really run concurrently, so a lot of things are done without proper locking. The pthread-based LWP implementation tries to deal with these issues, and with some effort you can actually get a Coda client and server running on top of pthreads. However, it has hardly been used at all, and it is only barely stable enough to run a pthreaded Coda client under valgrind. Win9x/NT don't have pthreads, so we still have to rely on the old LWP implementation on some platforms.
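One way a pthreads-based LWP layer can keep the "threads never really run concurrently" assumption intact is a single global run lock: a thread must hold it to execute, and it is released only at the points where LWP would have switched threads anyway (explicit yields and genuinely blocking calls). A minimal sketch of that idea, assuming POSIX threads; `lwp_enter()`, `lwp_yield()`, `lwp_block_begin()` and the other names are hypothetical and not the interface of the pthread LWP implementation in the tarball.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Only the thread holding run_lock executes "LWP" code, so code written for
 * non-preemptive LWP scheduling keeps working without finer-grained locks. */
static pthread_mutex_t run_lock = PTHREAD_MUTEX_INITIALIZER;

void lwp_enter(void) { pthread_mutex_lock(&run_lock); }
void lwp_leave(void) { pthread_mutex_unlock(&run_lock); }

/* Explicit yield point, where old LWP code would context-switch. pthread
 * mutexes are not fair, so another thread may run here but is not guaranteed to. */
void lwp_yield(void)
{
    pthread_mutex_unlock(&run_lock);
    sched_yield();
    pthread_mutex_lock(&run_lock);
}

/* Bracket genuinely blocking calls (DNS, fsync, ...) so other threads can run
 * meanwhile; code in between must not touch shared state. */
void lwp_block_begin(void) { pthread_mutex_unlock(&run_lock); }
void lwp_block_end(void)   { pthread_mutex_lock(&run_lock); }

/* Example worker: the plain, non-atomic increment is safe only because it
 * always happens while run_lock is held. */
static long shared_counter;

static void *worker(void *arg)
{
    int n = *(const int *)arg;

    assert(n >= 0);
    lwp_enter();
    for (int i = 0; i < n; i++) {
        shared_counter++;
        if (i % 100 == 0)
            lwp_yield();
    }
    lwp_leave();
    return NULL;
}
```

Only one thread runs at a time, as with the old LWP, but blocking calls bracketed by lwp_block_begin()/lwp_block_end() no longer stall the whole process. Proper locking could then be introduced piecemeal by shrinking the regions that need run_lock.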

RVM, recoverable (persistent) virtual memory

It is very hard to debug any RVM-related problems, because RVM handles errors internally and then returns one of a dozen or so generic error codes, so we can't really tell where the problem occurred. No information is printed about where, or why, exactly we failed.
The RDS allocator was a relatively 'quick hack'. It suffers from long-term fragmentation issues that only show up once servers have been running for a long time. Remember, the same allocated areas are used even across restarts; that's the nice thing about persistence.
In my opinion, RDS pretty much needs to be redone, probably as a lower-level allocator that hands out memory at page granularity, with a more specialized slab allocator on top of it that hands out sub-page-size objects. As far as I know, this is pretty similar to how Solaris and Linux handle kernel memory allocation.
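The two-layer design suggested above can be sketched roughly as follows: a page allocator hands out whole pages from one region, and a slab cache per object size carves pages into equal-sized objects and gives a page back as soon as its last object is freed. This is a minimal, non-persistent sketch under assumptions (4 KB pages, a fixed 64-page region from aligned_alloc() standing in for the RVM segment, no transactions); all names are hypothetical and not part of RDS.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE 4096
#define NPAGES    64   /* stand-in for the size of the persistent heap */
#define OBJ_ALIGN 16
#define ROUND_UP(n) (((n) + OBJ_ALIGN - 1) & ~(size_t)(OBJ_ALIGN - 1))

/* Lower layer: whole pages from one region. In RVM this would be the
 * recoverable segment; here it is ordinary heap memory. */
static unsigned char *region;
static unsigned char page_used[NPAGES];

static int heap_init(void)
{
    region = aligned_alloc(PAGE_SIZE, (size_t)NPAGES * PAGE_SIZE);
    return region ? 0 : -1;
}

static size_t page_index(const void *p)
{
    return (size_t)((const unsigned char *)p - region) / PAGE_SIZE;
}

static void *page_alloc(void)
{
    for (size_t i = 0; i < NPAGES; i++)
        if (!page_used[i]) {
            page_used[i] = 1;
            return region + i * PAGE_SIZE;
        }
    return NULL;
}

/* Upper layer: a slab is one page of equally sized objects, with its
 * bookkeeping header at the start of that page. */
struct slab {
    struct slab *next;   /* next slab of the same cache */
    void *free;          /* singly linked list of free objects */
    unsigned inuse;
};

struct slab_cache {
    size_t objsize;
    struct slab *slabs;
};

static void slab_cache_init(struct slab_cache *c, size_t objsize)
{
    c->objsize = ROUND_UP(objsize < sizeof(void *) ? sizeof(void *) : objsize);
    c->slabs = NULL;
    assert(c->objsize <= PAGE_SIZE - ROUND_UP(sizeof(struct slab)));
}

static struct slab *slab_of(void *obj)
{
    return (struct slab *)(region + page_index(obj) * PAGE_SIZE);
}

static void *slab_alloc(struct slab_cache *c)
{
    struct slab *s = c->slabs;
    while (s && !s->free)
        s = s->next;
    if (!s) {                        /* all slabs full: carve up a new page */
        if (!(s = page_alloc()))
            return NULL;
        *s = (struct slab){ c->slabs, NULL, 0 };
        c->slabs = s;
        for (size_t off = ROUND_UP(sizeof(*s)); off + c->objsize <= PAGE_SIZE;
             off += c->objsize) {
            unsigned char *o = (unsigned char *)s + off;
            *(void **)o = s->free;   /* push object onto the free list */
            s->free = o;
        }
    }
    void *obj = s->free;
    s->free = *(void **)obj;
    s->inuse++;
    return obj;
}

static void slab_free(struct slab_cache *c, void *obj)
{
    struct slab *s = slab_of(obj);
    *(void **)obj = s->free;
    s->free = obj;
    if (--s->inuse == 0) {           /* empty slab: hand the page back */
        struct slab **pp = &c->slabs;
        while (*pp != s)
            pp = &(*pp)->next;
        *pp = s->next;
        page_used[page_index(s)] = 0;
    }
}
```

Because a slab's header lives at the start of its own page, slab_free() finds the owning slab with simple arithmetic, and pages from emptied slabs become available to every cache again, which should help against the kind of long-term fragmentation described above.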

One useful extension for a slab allocator is a callback into the application, so that we can ask higher layers to move an object. The application can then move the object by allocating a new one, fixing up any references and freeing the old one; it can discard the object if it isn't all that critical; or it can ignore the callback if the object is too hard to move or there is a temporary problem locking it. As objects are reallocated they migrate to fuller slabs, and this way we get a better fill ratio.
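A toy model of the move callback: the allocator picks the least-filled slab and asks the application to evacuate each live object in it, and the application either moves the object (allocate elsewhere, copy, fix references, free the old one), discards it, or refuses. New allocations here go to the fullest slab that still has room, which is one possible reading of moving objects to fuller slabs. The fixed-size arrays, `evacuate_sparsest()` and `move_to_fuller_slab()` are hypothetical illustrations, not existing RVM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NSLABS   4
#define PER_SLAB 8
#define OBJ_SIZE 32

/* What the application did with an object the allocator asked it to move. */
enum move_result { MOVE_DONE, MOVE_DISCARDED, MOVE_REFUSED };

/* Application callback; 'avoid' is the slab being emptied. On MOVE_DONE or
 * MOVE_DISCARDED the callback has already released the old object. */
typedef enum move_result (*move_cb)(void *obj, int avoid, void *ctx);

static unsigned char heap[NSLABS * PER_SLAB * OBJ_SIZE];
static bool used[NSLABS * PER_SLAB];

static int slab_fill(int s)
{
    int n = 0;
    for (int i = 0; i < PER_SLAB; i++)
        n += used[s * PER_SLAB + i];
    return n;
}

/* Allocate from the fullest slab that still has room, never from 'avoid'
 * (-1 = no restriction); packing into full slabs improves the fill ratio. */
static void *obj_alloc(int avoid)
{
    int best = -1;
    for (int s = 0; s < NSLABS; s++)
        if (s != avoid && slab_fill(s) < PER_SLAB &&
            (best < 0 || slab_fill(s) > slab_fill(best)))
            best = s;
    if (best < 0)
        return NULL;
    for (int i = best * PER_SLAB; i < (best + 1) * PER_SLAB; i++)
        if (!used[i]) {
            used[i] = true;
            return &heap[i * OBJ_SIZE];
        }
    return NULL;
}

static void obj_free(void *p)
{
    assert(p != NULL);
    used[((unsigned char *)p - heap) / OBJ_SIZE] = false;
}

/* Ask the owner to move every live object out of the emptiest non-empty
 * slab. Returns true if that slab ended up completely free. */
static bool evacuate_sparsest(move_cb cb, void *ctx)
{
    int victim = -1;
    for (int s = 0; s < NSLABS; s++)
        if (slab_fill(s) > 0 && (victim < 0 || slab_fill(s) < slab_fill(victim)))
            victim = s;
    if (victim < 0)
        return false;
    for (int i = victim * PER_SLAB; i < (victim + 1) * PER_SLAB; i++)
        if (used[i])
            (void)cb(&heap[i * OBJ_SIZE], victim, ctx);  /* refused objects stay */
    return slab_fill(victim) == 0;
}

/* Example owner: keeps plain pointers to its objects and fixes them up. */
struct app {
    void *refs[NSLABS * PER_SLAB];
    int n;
};

static enum move_result move_to_fuller_slab(void *obj, int avoid, void *ctx)
{
    struct app *app = ctx;
    for (int k = 0; k < app->n; k++) {
        if (app->refs[k] != obj)
            continue;
        void *copy = obj_alloc(avoid);
        if (!copy)
            return MOVE_REFUSED;       /* no room elsewhere right now */
        memcpy(copy, obj, OBJ_SIZE);
        app->refs[k] = copy;           /* fix up the only reference */
        obj_free(obj);
        return MOVE_DONE;
    }
    return MOVE_REFUSED;               /* not ours: leave it alone */
}
```

A refused move simply leaves the slab partly occupied, so the allocator can try again later; nothing in it depends on every callback succeeding.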

Another solution would be to implement an allocation strategy similar to what, for instance, PalmOS uses. Instead of keeping pointers, we store handles. A handle can be converted into a pointer, which locks the object in RVM. Any unlocked objects can be moved around freely by the allocator whenever it needs more space during an allocation. However, this approach would require extensive changes to all applications that use RVM.
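A minimal sketch of the handle-based scheme: the application keeps integer handles, `h_lock()` turns a handle into a pointer and pins the object, and when an allocation does not fit, `compact()` slides every unlocked object down over the holes while pinned objects stay put. The fixed 256-byte heap, the bump allocation and all names are assumptions made for illustration; a real version inside RVM would presumably also need the moves themselves to be recoverable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HEAP_SIZE   256
#define MAX_HANDLES 16

typedef int handle_t;        /* index into the handle table, -1 = failure */

struct hentry {
    size_t off, size;        /* where the object currently lives */
    int locks;               /* > 0 while someone holds a raw pointer */
    int live;
};

static unsigned char heap[HEAP_SIZE];
static struct hentry table[MAX_HANDLES];
static size_t heap_top;      /* bump pointer; freed space is reclaimed only by compact() */

/* Slide unlocked objects down over holes, in address order. Locked objects
 * stay where they are, because somebody holds a pointer into them. */
static void compact(void)
{
    size_t dst = 0, scan = 0;
    for (;;) {
        int next = -1;
        for (int h = 0; h < MAX_HANDLES; h++)
            if (table[h].live && table[h].off >= scan &&
                (next < 0 || table[h].off < table[next].off))
                next = h;
        if (next < 0)
            break;
        struct hentry *e = &table[next];
        scan = e->off + e->size;     /* original end: resume the scan here */
        if (e->locks == 0 && e->off != dst) {
            memmove(heap + dst, heap + e->off, e->size);
            e->off = dst;
        }
        dst = e->off + e->size;
    }
    heap_top = dst;
}

static handle_t h_alloc(size_t size)
{
    int h = 0;
    while (h < MAX_HANDLES && table[h].live)
        h++;
    if (size == 0 || h == MAX_HANDLES)
        return -1;
    if (HEAP_SIZE - heap_top < size) {
        compact();                   /* unlocked objects may move here */
        if (HEAP_SIZE - heap_top < size)
            return -1;
    }
    table[h] = (struct hentry){ heap_top, size, 0, 1 };
    heap_top += size;
    return h;
}

/* Converting a handle to a pointer pins the object until h_unlock(). */
static void *h_lock(handle_t h)
{
    table[h].locks++;
    return heap + table[h].off;
}

static void h_unlock(handle_t h)
{
    assert(table[h].locks > 0);
    table[h].locks--;
}

static void h_free(handle_t h)
{
    assert(table[h].locks == 0);
    table[h].live = 0;
}
```

Pinned objects can leave holes that compaction cannot close, so in this scheme code should keep objects locked only briefly.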
The logging/recovery code alone is 20000 lines. I've got an initial implementation that relies heavily on facilities provided by the OS (mmap/munmap/mprotect/segfault handler); it is only about 2000 lines and, although a bit slower, does pretty much the same things.
To address RVM's complexity and debugging issues, the 2000-line implementation could be cleaned up a bit more, and it offers some nice opportunities for optimization (and, in some areas, redesign). So far it has only been tested by running rvm_basher, a program that performs many random transactions and checks whether everything stays consistent. Real code, such as the Coda client, also stores 'non-persistent' data in the persistent memory area. These cases could be considered bugs in Venus and wrapped in simple, efficient 'no-flush' transactions, or we have to learn to deal with the added complexity.
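The OS-based approach mentioned above can be illustrated with write tracking through page protection: the segment is mapped read-only, the first write to a page faults, and a SIGSEGV handler saves the page's pre-image, marks the page dirty and makes it writable. Commit then knows exactly which pages to log, and abort copies the pre-images back. This is a minimal sketch assuming Linux (other systems may deliver SIGBUS instead), with anonymous memory standing in for the mapped data segment and a hypothetical `log_write()` in place of a real log; it is not the 2000-line implementation itself. Calling mprotect() from a signal handler is common practice on Linux but is not guaranteed by POSIX.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPAGES 8

static unsigned char *seg;    /* stand-in for the mapped recoverable segment */
static unsigned char *undo;   /* pre-image of each page, saved on first write */
static size_t pagesz;
static volatile sig_atomic_t dirty[NPAGES];

/* First write to a read-only page: save its old contents, mark it dirty and
 * make it writable; the faulting store is then restarted and succeeds. */
static void on_fault(int sig, siginfo_t *si, void *uctx)
{
    uintptr_t a = (uintptr_t)si->si_addr, base = (uintptr_t)seg;
    (void)uctx;
    if (a < base || a >= base + NPAGES * pagesz) {
        signal(sig, SIG_DFL);            /* a genuine crash: let it happen */
        return;
    }
    size_t i = (a - base) / pagesz;
    memcpy(undo + i * pagesz, seg + i * pagesz, pagesz);
    dirty[i] = 1;
    mprotect(seg + i * pagesz, pagesz, PROT_READ | PROT_WRITE);
}

static int seg_init(void)
{
    struct sigaction sa;

    pagesz = (size_t)sysconf(_SC_PAGESIZE);
    seg = mmap(NULL, NPAGES * pagesz, PROT_READ,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    undo = mmap(NULL, NPAGES * pagesz, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (seg == MAP_FAILED || undo == MAP_FAILED)
        return -1;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Commit: the dirty pages are what would go to the log (only counted here),
 * then each one is write-protected again for the next transaction. */
static int txn_commit(void)
{
    int n = 0;
    for (size_t i = 0; i < NPAGES; i++)
        if (dirty[i]) {
            /* log_write(seg + i * pagesz, pagesz);   hypothetical */
            n++;
            dirty[i] = 0;
            mprotect(seg + i * pagesz, pagesz, PROT_READ);
        }
    return n;
}

/* Abort: copy the saved pre-images back and write-protect again. */
static void txn_abort(void)
{
    for (size_t i = 0; i < NPAGES; i++)
        if (dirty[i]) {
            memcpy(seg + i * pagesz, undo + i * pagesz, pagesz);
            dirty[i] = 0;
            mprotect(seg + i * pagesz, pagesz, PROT_READ);
        }
}
```

In this scheme every store into the segment, including 'non-persistent' data, dirties a page and ends up in the log, which is why such stores would need to be wrapped or moved elsewhere.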

RPC2, remote procedure call

Probably the biggest issue: it's not 64-bit clean. Changing the basic datatype (RPC2_Integer) from 'long' to 'int' should go a long way, but there are literally hundreds of places where we have explicit casts, or try to stuff a possibly 64-bit value into what would then be a 32-bit integer. On 64-bit platforms these are mostly pointers, but on some 32-bit systems possibly even off_t and size_t. And these casts often hide the compiler warnings.
There are issues with timeout handling: every logical client->server connection does its own retransmissions, unaware of how many others are trying to push data through a possibly weak link.
Detecting and cleaning up dead connections doesn't work on the server side of an RPC2 connection, so we do keepalive pinging not only from the client to the server, but also in the reverse direction.
Having a single bidirectional client<->server connection that handles timeouts and keepalives, with the logical RPC2 connections multiplexed over it, would help here.
SFTP, the streaming transport, has windowing and all that, but really isn't up to par: it doesn't do slow start, window scaling or similar useful things.
SFTP could be multiplexed over the same host-to-host connections as RPC2, as long as it doesn't 'clog the pipe'. For instance, when the host-to-host connection is a TCP pipe, we cannot simply use 'sendfile' for the data transfers, because that would block all other concurrent traffic until the transfer is finished.
There is no really strong encryption. In fact, adding strong encryption will likely turn up a couple of implementation mistakes where we modify or check data in the encrypted packet without first decrypting it.
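One way to act on the timeout and SFTP points above is to keep RTT estimation and a congestion window per host instead of per logical connection. A minimal sketch of that, with TCP's algorithms swapped in as an assumption (the RFC 6298 retransmission timer and RFC 5681 style slow start and congestion avoidance, counted in packets); `struct host_link` and the `link_*` functions are hypothetical and not part of RPC2 or SFTP.

```c
#include <assert.h>

#define RTO_INIT_MS  1000.0   /* RFC 6298 initial RTO */
#define RTO_MIN_MS   1000.0   /* RFC 6298 recommended lower bound */
#define RTO_MAX_MS  60000.0

/* Timeout and congestion state kept once per peer host and shared by every
 * logical connection to it (a hypothetical structure, not RPC2's own). */
struct host_link {
    double srtt, rttvar, rto;   /* milliseconds */
    int have_sample;
    unsigned cwnd;              /* congestion window, in packets */
    unsigned ssthresh;          /* slow-start threshold, in packets */
    unsigned acked;             /* ACKs counted towards the next cwnd step */
    unsigned inflight;          /* packets outstanding over all connections */
};

static double clamp(double v, double lo, double hi)
{
    return v < lo ? lo : v > hi ? hi : v;
}

static void link_init(struct host_link *l)
{
    *l = (struct host_link){ .rto = RTO_INIT_MS, .cwnd = 1, .ssthresh = 64 };
}

/* RFC 6298 smoothing: alpha = 1/8, beta = 1/4, RTO = SRTT + 4 * RTTVAR. */
static void link_rtt_sample(struct host_link *l, double r)
{
    if (!l->have_sample) {
        l->srtt = r;
        l->rttvar = r / 2;
        l->have_sample = 1;
    } else {
        double err = l->srtt > r ? l->srtt - r : r - l->srtt;
        l->rttvar = 0.75 * l->rttvar + 0.25 * err;
        l->srtt = 0.875 * l->srtt + 0.125 * r;
    }
    l->rto = clamp(l->srtt + 4 * l->rttvar, RTO_MIN_MS, RTO_MAX_MS);
}

/* Any connection to this host may transmit only while the shared window allows. */
static int link_can_send(const struct host_link *l) { return l->inflight < l->cwnd; }

static void link_sent(struct host_link *l)
{
    assert(link_can_send(l));
    l->inflight++;
}

/* rtt_ms < 0 means "no valid sample", e.g. for a retransmitted packet
 * (Karn's algorithm). */
static void link_acked(struct host_link *l, double rtt_ms)
{
    if (l->inflight > 0)
        l->inflight--;
    if (rtt_ms >= 0)
        link_rtt_sample(l, rtt_ms);
    if (l->cwnd < l->ssthresh) {
        l->cwnd++;                       /* slow start: doubles every RTT */
    } else if (++l->acked >= l->cwnd) {
        l->cwnd++;                       /* congestion avoidance: +1 per RTT */
        l->acked = 0;
    }
}

/* Retransmission timeout: lower the threshold, restart slow start and back
 * off the timer. Outstanding packets are treated as lost and will be resent. */
static void link_timeout(struct host_link *l)
{
    l->ssthresh = l->inflight / 2 > 2 ? l->inflight / 2 : 2;
    l->cwnd = 1;
    l->acked = 0;
    l->inflight = 0;
    l->rto = clamp(l->rto * 2, RTO_MIN_MS, RTO_MAX_MS);
}
```

With this split, a logical connection only asks link_can_send() before transmitting, and a timeout on any connection slows down all traffic to that host, instead of each connection retransmitting independently over the same weak link.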

Coda clients

There are some problems with object locking/refcounting. Sometimes during or after the repair of a conflict, Venus thinks the object is still open and refuses to fetch the fixed copy; there are also occasional crashes during object destruction, etc.
Local-global repair is an ugly implementation. It doesn't allow us to repair a reintegration conflict that resulted from a server conflict; in fact, it hides the server conflict, and we need another client to fix such a situation. Also, local objects are 'temporarily' moved into a fake volume, so that they won't cause problems with objects that have identical file identifiers on the server. However, sometimes objects are accidentally not moved back (which crashes the client during restart), and it's an ugly solution anyway.
Merging the functionality of local-global and server-server conflicts, so that all conflicts show up as 'local replica1 replica2 replica3', would be great. The problem is that server-server conflicts don't use a fake volume for the expansion the way local-global does, so we cannot 'mount' the copy in the cache as 'local'. The repair program actually already has a lot of the groundwork done to combine l-g and s-s repair.

Coda server

A lot can be improved in volume handling. If the VSG structure is needed at all (probably only for replication/resolution purposes), VSGs can be created on the fly from information that is available in the VRDB, instead of having a fixed set in the VSGDB file. Removing the direct dependencies on hardcoded VSGs would allow us to resize/move individual volumes. Current CVS has removed the use of the VSGDB in all but one location, createvol_rep, and it shouldn't take much effort to remove that last VSG dependency.
The servers could use a replicated database for the data that is currently stored in files in /vice/db and copied around by the updateclnt/updatesrv daemons. AFS uses Ubik; maybe we could use that or something similar. Mainly, it just needs to be really fast for reads, and consistent across all servers after any write, i.e. either all servers have the new data or none do.
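Deriving a VSG on the fly can be as simple as treating it as the canonical (sorted, duplicate-free) set of servers that hold a volume's replicas according to that volume's VRDB entry, and caching one VSG per distinct set. A minimal sketch; `struct vrdb_entry`, `struct vsg` and `vsg_for_volume()` are hypothetical, simplified structures and not Coda's actual VRDB format.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_REPLICAS 8
#define MAX_VSGS     64

/* Hypothetical, simplified view of a replicated-volume entry in the VRDB. */
struct vrdb_entry {
    unsigned long volid;
    int nreplicas;
    unsigned char server_ids[MAX_REPLICAS];   /* servers hosting a replica */
};

/* A VSG here is just a canonical (sorted, duplicate-free) set of server ids. */
struct vsg {
    int nmembers;
    unsigned char members[MAX_REPLICAS];
};

static struct vsg vsg_cache[MAX_VSGS];
static int nvsgs;

static int cmp_u8(const void *a, const void *b)
{
    return *(const unsigned char *)a - *(const unsigned char *)b;
}

/* Derive the VSG for a volume from its VRDB entry instead of a fixed VSGDB
 * file; identical server sets share one cached VSG. Returns an index into
 * vsg_cache, or -1 if the cache is full. */
static int vsg_for_volume(const struct vrdb_entry *v)
{
    struct vsg key = { 0 };
    unsigned char tmp[MAX_REPLICAS];

    assert(v->nreplicas >= 0 && v->nreplicas <= MAX_REPLICAS);
    memcpy(tmp, v->server_ids, (size_t)v->nreplicas);
    qsort(tmp, (size_t)v->nreplicas, 1, cmp_u8);
    for (int i = 0; i < v->nreplicas; i++)
        if (key.nmembers == 0 || key.members[key.nmembers - 1] != tmp[i])
            key.members[key.nmembers++] = tmp[i];

    for (int i = 0; i < nvsgs; i++)
        if (vsg_cache[i].nmembers == key.nmembers &&
            memcmp(vsg_cache[i].members, key.members, (size_t)key.nmembers) == 0)
            return i;
    if (nvsgs == MAX_VSGS)
        return -1;
    vsg_cache[nvsgs] = key;
    return nvsgs++;
}
```

Moving or resizing a volume then only means updating its VRDB entry: the next lookup yields whichever VSG matches the new server set, with no VSGDB file to keep in sync.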