Coda wishlist/todolist/issues
It all depends on what you'd like to work on; there is a lot to
be found in many areas.
Here is a broad overview of projects, based on the current issues
I have with the implementation, ordered by 'subsystem'.
LWP, the userspace threading package
-
It is a small hurdle for any new system we port to because it
relies on assembler code to switch stacks.
-
We seem to have problems when we use libraries that use
pthreads.
-
There is no convenient way to debug threading/locking problems.
gdb doesn't understand our threads, and I only got as far as
including a function that dumps the return addresses on the stack of
all threads, on i386 only. It also happens to crash somewhere while
dumping, so the trace is often incomplete.
-
We cannot use any blocking operation without blocking the whole
process. This is a problem with DNS lookups, and with fsync when
there are a lot of dirty pages in the VM.
-
We have issues with C++ exception handling, so we cannot use
exceptions in our C++ code anywhere.
All of these issues could probably be solved by using the
implementation of LWP that runs on top of pthreads, which is
available in the same source tarball. However, the existing code
that uses LWP makes assumptions about locking, there is some
confusion about which thread runs first after a fork, and it
assumes that threads never really run concurrently, so a lot of
stuff is done without proper locking. The pthread based LWP
implementation tries to deal with these issues, and with some effort
you can actually get a Coda client and server running on top of
pthreads. However, it has not been used much at all and it is barely
stable enough to valgrind a pthreaded Coda client. Win9x/NT don't
have pthreads, so we still have to rely on the old LWP
implementation for some platforms.
RVM, recoverable (persistent) virtual memory
-
It is very hard to debug any RVM related problems, because RVM
really nicely handles the error internally and returns some error
code that is one of a dozen (i.e. we can't really tell where the
problem occurred). No information is printed as to where or why
exactly we failed.
-
The RDS allocator was a relatively 'quick hack'. It suffers
from long term fragmentation issues which only show up once servers
have been running for a long period of time. Remember, even across
restarts they use the same allocated areas, that's the nice thing
about persistency.
In my opinion, RDS pretty much needs to be redone, probably as a
lower level allocator that hands out memory at page granularity,
with on top of that a more specialized slab allocator that hands out
sub-pagesize objects. As far as I know, this is pretty similar to
how Solaris and Linux deal with memory allocation.
One useful extension for a slab allocator is a callback to the
application, so that we can tell higher layers to move an object.
The object can then be moved by allocating a new object, fixing up
any references and freeing the old object. Alternatively, the object
can be discarded if it isn't all that critical, or the callback can
be ignored if it's too hard to move or we have a temporary problem
locking the object, etc. When objects are reallocated they move to
higher order slabs, and this way we get a better fill ratio of the
slabs.
Another solution would be to implement an allocation strategy
similar to what, for instance, PalmOS is using. Instead of keeping
pointers, we store handles. A handle can be converted into a
pointer, which happens to lock the in-RVM object. Any unlocked
objects can be freely moved around by the allocator whenever it
needs to get some more space during an allocation. However, this
approach would require extensive changes to any applications that
use RVM.
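The handle idea can be illustrated with a small sketch. This is not RDS or PalmOS code; the names (hnd_alloc, hnd_lock, hnd_unlock, hnd_free), the fixed 64-byte arena and the bump allocation are all made up for illustration, and it uses ordinary memory instead of RVM. Alignment is ignored, so it is only suitable for raw bytes. The point is that a handle stays valid while the pointer behind it may change whenever the object is unlocked.

```c
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE  64   /* tiny on purpose, so compaction is easy to trigger */
#define MAX_HANDLES 16

struct entry { size_t off, len; int locks, used; };

static unsigned char arena[ARENA_SIZE];
static struct entry tab[MAX_HANDLES];
static size_t top;       /* first unused byte of the arena */

static int valid(int h)
{
    return h >= 0 && h < MAX_HANDLES && tab[h].used;
}

/* Slide unlocked objects towards the start of the arena, in address
 * order. Locked objects stay where they are and packing continues
 * behind them (any gap in front of a locked object stays unused). */
static void compact(void)
{
    size_t dst = 0, scanned = 0;
    for (;;) {
        int i, next = -1;
        for (i = 0; i < MAX_HANDLES; i++)
            if (tab[i].used && tab[i].off >= scanned &&
                (next < 0 || tab[i].off < tab[next].off))
                next = i;
        if (next < 0)
            break;
        scanned = tab[next].off + tab[next].len;
        if (tab[next].locks > 0) {
            dst = scanned;                       /* pinned, skip over it */
        } else {
            memmove(arena + dst, arena + tab[next].off, tab[next].len);
            tab[next].off = dst;                 /* only the table changes */
            dst += tab[next].len;
        }
    }
    top = dst;
}

/* Returns a handle (>= 0), or -1 when there is no room even after
 * moving unlocked objects around. */
int hnd_alloc(size_t len)
{
    int i, slot = -1;
    if (len == 0 || len > ARENA_SIZE)
        return -1;
    for (i = 0; i < MAX_HANDLES && slot < 0; i++)
        if (!tab[i].used)
            slot = i;
    if (slot < 0)
        return -1;
    if (ARENA_SIZE - top < len)
        compact();
    if (ARENA_SIZE - top < len)
        return -1;
    tab[slot].off = top;
    tab[slot].len = len;
    tab[slot].locks = 0;
    tab[slot].used = 1;
    top += len;
    return slot;
}

/* Converting a handle into a pointer locks the object in place. */
void *hnd_lock(int h)
{
    if (!valid(h))
        return NULL;
    tab[h].locks++;
    return arena + tab[h].off;
}

void hnd_unlock(int h)
{
    if (valid(h) && tab[h].locks > 0)
        tab[h].locks--;
}

/* Freeing a locked object is refused. */
int hnd_free(int h)
{
    if (!valid(h) || tab[h].locks > 0)
        return -1;
    tab[h].used = 0;
    return 0;
}
```

A caller has to bracket every access with hnd_lock/hnd_unlock and must not keep the pointer after unlocking, which is exactly why this approach would require such extensive changes to existing RVM applications.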
-
RVM is 20000 lines of code for just the logging/recovery. I've got
an initial implementation that heavily uses the facilities provided
by the OS (mmap/munmap/mprotect/segfault handler) which is only
about 2000 lines and, although it is a bit slower, pretty much does
the same things.
For the RVM complexity and debugging issues, the 2000 line
implementation could be cleaned up a bit more, and it has some nice
optimization (and in some areas redesign) opportunities. Also, it
has only been tested by running the rvm_basher, a program that does
many random transactions and checks whether everything is
consistent. Real code, such as the Coda client, also stores
'non-persistent' data in the persistent memory area. These could be
considered bugs in Venus and wrapped in simple, efficient
'no-flush' transactions, or we have to learn to deal with the added
complexity.
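To give an idea of what "using the facilities provided by the OS" can look like, here is a minimal sketch of write tracking with mmap, mprotect and a SIGSEGV handler. It is a common technique and not necessarily how the 2000 line implementation does it. All names (track_init, track_commit, etc.) are hypothetical, it is single-threaded, it uses an anonymous mapping instead of a real data segment, and it calls mprotect from a signal handler, which works on Linux but is not guaranteed by POSIX.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_PAGES 64            /* arbitrary limit for this sketch */

static char *region;            /* start of the tracked mapping */
static size_t region_pages;     /* number of pages in the mapping */
static size_t page_size;
static volatile sig_atomic_t dirty[MAX_PAGES];

/* Write fault: mark the page dirty and make it writable, so that the
 * faulting store is retried and succeeds once the handler returns. */
static void on_segv(int sig, siginfo_t *si, void *ctx)
{
    uintptr_t addr = (uintptr_t)si->si_addr;
    uintptr_t base = (uintptr_t)region;
    size_t page;

    (void)sig;
    (void)ctx;
    if (region == NULL || addr < base ||
        addr >= base + region_pages * page_size) {
        /* Not our region, so a real crash. Restore the default
         * action; the fault repeats and terminates the process. */
        signal(SIGSEGV, SIG_DFL);
        return;
    }
    page = (addr - base) / page_size;
    dirty[page] = 1;
    if (mprotect(region + page * page_size, page_size,
                 PROT_READ | PROT_WRITE) != 0)
        signal(SIGSEGV, SIG_DFL);
}

/* Map a read-only region of 'pages' pages and install the handler. */
int track_init(size_t pages)
{
    struct sigaction sa;
    void *p;

    if (pages == 0 || pages > MAX_PAGES)
        return -1;
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    p = mmap(NULL, pages * page_size, PROT_READ,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    region = p;
    region_pages = pages;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

char *track_base(void)          { return region; }
size_t track_page_size(void)    { return page_size; }
int track_is_dirty(size_t page) { return page < region_pages && dirty[page]; }

/* End of a "transaction": a real implementation would write the dirty
 * pages to a log here. This sketch only forgets them and write-protects
 * the region again. */
int track_commit(void)
{
    size_t i;

    if (region == NULL)
        return -1;
    for (i = 0; i < region_pages; i++)
        dirty[i] = 0;
    return mprotect(region, region_pages * page_size, PROT_READ);
}
```

The application never has to declare what it is about to modify; the first store to a page faults, and the set of dirty pages is known at commit time. The price is one fault per modified page per transaction, which fits the "a bit slower" observation above.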
RPC2, remote procedure call
-
Probably the biggest issue is that it's not 64-bit clean. Correcting
the basic datatype (RPC2_Integer) to be 'int' instead of 'long'
should go a long way, but there are literally hundreds of places
where we have explicit casts, or try to stuff a possibly 64-bit
value into the then 32-bit integer. On 64-bit platforms these are
mostly pointers, but on some 32-bit systems possibly even off_t and
size_t. And these casts are often hiding the compiler
warnings.
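One way to get rid of the silent truncation is to replace the explicit casts with small checked conversion helpers, so that a value that doesn't fit becomes an error instead of garbage on the wire. A minimal sketch follows; wire_int is a hypothetical stand-in for a fixed-width RPC2_Integer and the helper names are made up, none of this is existing RPC2 API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-width replacement for a 'long' based integer. */
typedef int32_t wire_int;

/* Fails to compile if the wire type is not exactly 32 bits wide. */
_Static_assert(sizeof(wire_int) == 4, "wire_int must be 32 bits");

/* Store v in *out if it fits in 32 bits. Returns 0 on success, or -1
 * (leaving *out untouched) when the value would be truncated. */
int wire_int_from_i64(int64_t v, wire_int *out)
{
    if (v < INT32_MIN || v > INT32_MAX)
        return -1;
    *out = (wire_int)v;
    return 0;
}

/* Same check for unsigned sizes such as size_t. */
int wire_int_from_size(size_t v, wire_int *out)
{
    if (v > (size_t)INT32_MAX)
        return -1;
    *out = (wire_int)v;
    return 0;
}
```

Pointers are a different matter: they should not be squeezed into a 32-bit integer at all, so those places need a redesign rather than a checked cast.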
-
There are issues with timeout handling. All logical
client->server connections do their own retransmissions, unaware
of how many others are trying to push data through a possibly weak
link.
-
Detecting and cleaning up dead connections doesn't work on the
server side of the RPC2 connection, so we do keepalive pinging, not
only from the client to the server, but also in the reverse
direction.
Having a single bidirectional client<->server connection
that deals with timeout/keepalive handling, and on which the logical
RPC2 connections are multiplexed, would help here.
-
SFTP, the streaming transport, has windowing and all that stuff,
but really isn't up to par. It doesn't do slow-start or window
scaling and useful things like that.
SFTP could be multiplexed into the same host-to-host connections
as RPC2, as long as it doesn't 'clog the pipe'. For example, when
the host-to-host connection is a TCP pipe, we cannot simply use
'sendfile' for the data transfers. That would block out all other
concurrent traffic until the data transfer is finished.
-
There is no real strong encryption. In fact, adding strong
encryption will likely turn up a couple of implementation mistakes
where we are modifying/checking data in the encrypted packet
without first decrypting it.
Coda clients
-
There are some problems with object locking/refcounting.
Sometimes, during or after the repair of a conflict, Venus thinks
the object is still open and refuses to fetch the fixed copy. There
are also some occasional crashes during object destruction, etc.
-
Local-global repair is an ugly implementation. It doesn't allow us
to repair the case where we had a reintegration conflict that
resulted from a server conflict; in fact it hides the server
conflict and we need another client to fix such a situation. Also,
local objects are 'temporarily' moved into a faked volume, so that
they won't cause problems with the objects with identical file
identifiers on the server. However, sometimes objects are
accidentally not moved back (which crashes the client during
restart), and it's an ugly solution anyway.
Merging the functionality of local-global and server-server
conflicts, so that all conflicts show up as 'local replica1
replica2 replica3', would be great. The problem is that
server-server conflicts don't use a fake volume for the expansion
like local-global does, so we cannot 'mount' the copy in the cache
as 'local'. The repair program actually already has a lot of the
ground work done to combine l-g and s-s repair.
Coda server
-
A lot can be improved in the volume handling. If the VSG
structure is needed at all (probably is for replication/resolution
purposes only) VSG's can be created on-the-fly from information
that is available in the VRDB, instead of having a fixed set in the
VSGDB file. Removing the direct dependencies on hardcoded VSG's
would allow us to resize/move single volumes.
Current CVS removed the use of the VSGDB in all but one location,
createvol_rep, and it shouldn't take much effort to remove the last
VSG dependency.
-
The servers could use a replicated database for storage of
things that are currently in files in /vice/db and copied around by
the updateclnt/updatesrv daemons. AFS uses Ubik, maybe we could use
that or something similar. Mainly, though, it just needs to be really
fast for reading, and consistent across all servers after any write
operation, i.e. either all servers have the new data or none.
There is more, but for some reason I always start with LWP, and
get tired of writing things down by the time I arrive at Coda
clients, which is why Coda servers probably need the most work ;)