Coda File System

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Tue, 28 Jan 2003 02:07:27 -0500

Hi all,

The hassle with libdb is about to come to an end. It is impossible to
keep supporting the 'standard' libdb 1.85, which is being very
aggressively deprecated. We already have to pull in special 'oldlibs' or
'compat' packages, and now even the development libraries and headers
are being dropped.

Newer libdb versions from Sleepycat Software do not inspire much
confidence, beginning with how aggressively they have been
'unsupporting' the old db1 format. The header files keep getting renamed
(db.h -> db1.h -> db185.h), and so does the library (libdb.so,
libdb1.so, libdb1-compat, etc.). Sometimes a wrapper around db2 is even
shipped under the old library name, which results in databases that
cannot be read by systems that still use 1.85. Just one look at the
configure.in test for a compatible libdb is enough to give instant
headaches, and it doesn't even cover all cases...

At the moment my system has libdb2, libdb3, and libdb4.0. What are the
differences? Are they just API differences, or file format changes as
well? And even if a libdb4.0 application can read a db2 file, will a
libdb2-linked application still be able to read it after the newer
library has updated it? Also, a lot of the newly introduced
functionality, like transactions, is really not needed by most
applications.

Too many questions, so I've been looking at how other projects are
dealing with this situation, and they are dealing with it badly. Samba
built their own trivial database library (tdb), Enlightenment came up
with libedb, some projects went back to gdbm, etc.

I finally decided that Coda's needs were both incredibly limited and
very specialized. We currently only use libdb for accessing the user and
group membership information.

 - We only need a simple 'name/uid' to 'object' mapping; a trivial
   hash-based lookup table is more than sufficient.
 - Predominantly read access, very infrequent writes.
 - A concurrent-readers, single-writer model is not a problem at all.
 - Databases need to be shareable across heterogeneous systems by a
   simple file copy across the network (updateclnt/updatesrv).
 - Can't use thread libraries as they conflict with LWP.

Based on these criteria, most of the libraries I looked at were not
applicable. The only promising exception was Dan Bernstein's CDB. It has
a very simple file layout and is extremely efficient for lookups, even
when dealing with large datasets, and the basic file format has been
stable since 1996. The only problem was that it required a bunch of
separate tools to convert a read/write 'master copy' into the
efficiently indexed 'constant database' format.
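
To give a feel for how simple the layout is, here is a minimal C sketch
of the part of the published CDB specification that locates a key. The
helper names are hypothetical and not taken from rwcdb. A CDB file
starts with 256 (position, length) pairs pointing at hash tables,
followed by the records and then the tables themselves, with every
integer stored as 32-bit little-endian. A lookup hashes the key, picks a
table and a starting slot, then probes successive slots until it finds
a matching record or an empty slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CDB hash: start at 5381, then h = ((h << 5) + h) ^ c for each byte,
 * with all arithmetic done modulo 2^32. */
static uint32_t cdb_hash(const unsigned char *p, size_t len)
{
    uint32_t h = 5381;
    while (len--)
        h = ((h << 5) + h) ^ *p++;
    return h;
}

/* The hash modulo 256 selects one of the 256 tables listed in the header. */
static uint32_t cdb_table(uint32_t h)
{
    return h & 255;
}

/* The hash divided by 256, modulo the table's slot count, is the first slot
 * to probe; later probes move to the next slot, wrapping around. */
static uint32_t cdb_first_slot(uint32_t h, uint32_t nslots)
{
    assert(nslots > 0);   /* a table of length 0 means the key cannot be present */
    return (h >> 8) % nslots;
}
```

For example, the one-byte key "a" hashes to 177604, so it lands in
table 196 and, in a 10-slot table, probing starts at slot 3.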

I started off with a read-only implementation that follows the
specification. It came to around 370 lines of code, and this is the part
of the code that is used most of the time. Adding write support bumped
the total size to about 716 lines. The generated databases are fully
compatible with the original CDB specification and can be read with the
official tools, and vice versa.
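
A round trip through an in-memory image shows what producing and reading
that format involves. This is a minimal sketch, not the rwcdb code:
cdb_build and cdb_find are hypothetical names, keys and values are
assumed to be NUL-terminated strings, the reader does no bounds
checking, and each table gets twice as many slots as it has records, as
the reference cdbmake does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* CDB hash: start at 5381, then h = ((h << 5) + h) ^ c, modulo 2^32. */
static uint32_t cdb_hash(const unsigned char *p, size_t len)
{
    uint32_t h = 5381;
    while (len--)
        h = ((h << 5) + h) ^ *p++;
    return h;
}

/* Positions, lengths and hashes are all 32-bit little-endian integers. */
static void put32(unsigned char *p, uint32_t v)
{
    for (int i = 0; i < 4; i++, v >>= 8)
        p[i] = v & 0xff;
}

static uint32_t get32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical writer. Layout: 256 x {pos, len} header (2048 bytes),
 * records {klen, dlen, key, data}, then 256 tables of {hash, pos} slots. */
static unsigned char *cdb_build(const char *const *keys, const char *const *vals,
                                uint32_t n, size_t *outlen)
{
    size_t size = 2048 + 16 * (size_t)n;    /* header + two 8-byte slots per record */
    for (uint32_t r = 0; r < n; r++)
        size += 8 + strlen(keys[r]) + strlen(vals[r]);

    unsigned char *buf = calloc(1, size);   /* zeroed, so every slot starts empty */
    uint32_t *hash = malloc((n + 1) * sizeof *hash);
    uint32_t *rpos = malloc((n + 1) * sizeof *rpos);
    if (!buf || !hash || !rpos) {
        free(buf); free(hash); free(rpos);
        return NULL;
    }

    uint32_t off = 2048;                    /* records start right after the header */
    for (uint32_t r = 0; r < n; r++) {
        uint32_t kl = (uint32_t)strlen(keys[r]), dl = (uint32_t)strlen(vals[r]);
        put32(buf + off, kl);
        put32(buf + off + 4, dl);
        memcpy(buf + off + 8, keys[r], kl);
        memcpy(buf + off + 8 + kl, vals[r], dl);
        hash[r] = cdb_hash((const unsigned char *)keys[r], kl);
        rpos[r] = off;
        off += 8 + kl + dl;
    }
    for (uint32_t t = 0; t < 256; t++) {    /* table t holds keys with (hash & 255) == t */
        uint32_t len = 0;
        for (uint32_t r = 0; r < n; r++)
            len += ((hash[r] & 255) == t) ? 2 : 0;   /* twice as many slots as records */
        put32(buf + 8 * t, off);
        put32(buf + 8 * t + 4, len);
        for (uint32_t r = 0; r < n; r++) {
            if ((hash[r] & 255) != t)
                continue;
            uint32_t s = (hash[r] >> 8) % len;
            while (get32(buf + off + 8 * s + 4) != 0)  /* linear probing */
                s = (s + 1) % len;
            put32(buf + off + 8 * s, hash[r]);
            put32(buf + off + 8 * s + 4, rpos[r]);
        }
        off += 8 * len;
    }
    assert(off == size);                    /* every byte of the image was laid out */
    free(hash);
    free(rpos);
    *outlen = off;
    return buf;
}

/* Hypothetical reader: returns a pointer to the data (length in *dlen) or NULL.
 * No bounds checking; the image is trusted. */
static const unsigned char *cdb_find(const unsigned char *db, const char *key, uint32_t *dlen)
{
    uint32_t kl = (uint32_t)strlen(key);
    uint32_t h = cdb_hash((const unsigned char *)key, kl);
    uint32_t tpos = get32(db + 8 * (h & 255)), len = get32(db + 8 * (h & 255) + 4);
    for (uint32_t i = 0, s = len ? (h >> 8) % len : 0; i < len; i++, s = (s + 1) % len) {
        const unsigned char *slot = db + tpos + 8 * s;
        uint32_t rp = get32(slot + 4);
        if (rp == 0)
            return NULL;                    /* empty slot: the key is not present */
        if (get32(slot) == h && get32(db + rp) == kl && memcmp(db + rp + 8, key, kl) == 0) {
            *dlen = get32(db + rp + 4);
            return db + rp + 8 + kl;
        }
    }
    return NULL;
}
```

A lookup only touches the header entry, a few hash slots and the record
itself, so reads stay cheap regardless of database size. The price is
that any update means rewriting the whole file, which fits the 'very
infrequent writes' requirement above.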

This 'rwcdb' library has been committed to CVS (coda/lib-src/rwcdb). The
sources are released under the LGPL, since the library might be as
useful for other projects as it is for Coda.

To prepare for the format change, it is best to keep an exported version
of the user and group databases alongside the prot_{index,user}.pdb
files. Simply run 'pdbtool export coda.users coda.groups' and remember
to re-export whenever the pdb databases are updated, i.e. when new users
or groups are added. Once the new format becomes active, the new
database can be built with 'pdbtool import coda.users coda.groups'.

I already have the necessary changes for pdbtool, auth2 and codasrv
ready, and will probably commit them to CVS later this week.

Jan

Dropping support for libdb