Coda File System

Files/load distribution across realms

From: Ivan Popov <pin_at_medic.chalmers.se>
Date: Fri, 1 Oct 2004 17:14:11 +0200
Hello,

A question that came up in a private mail exchange with Troy.

Coda behaves well with weak connectivity, but it still might be nice
to have some "cache-like" data stored somewhere other than the servers
and the clients.

How could we build geographically separate repositories of the
files present in one realm, for mostly read-only access?

Lookaside seems to be a good solution for that, though ideally we'd like
to use _Coda_ itself alongside or instead of some "out-of-band" mechanism.

Now I wonder what would happen if I told Venus to look for lookaside
candidates on a _Coda_ file system. If we manage to avoid deadlocks,
Venus could use a file cached from one realm to satisfy file accesses
to another realm.

As an imaginary example,

I have /coda/slowlink.realm
Troy has a well-maintained /coda/fast.realm
Jan has a client.

I ask Troy to give me write access to a world-readable subtree on his realm.
There I build a separate tree of my to-be-world-readable files,
with the files named after their checksums.

Then say once a week I rebuild that checksum-files-repository....
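
To make the weekly rebuild concrete, here is a minimal sketch of how such a
checksum-named tree could be produced. It is only an illustration: SHA-256 is
an assumed checksum, and the function names and the flat layout of the mirror
directory are hypothetical, not anything Coda prescribes. The mirror directory
is assumed to live outside the source tree.

```python
import hashlib
import shutil
from pathlib import Path


def file_checksum(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_checksum_repository(source_root: Path, mirror_root: Path) -> int:
    """Copy every regular file under source_root into mirror_root,
    naming each copy after its checksum.

    Returns the number of files newly stored in the mirror.
    """
    mirror_root.mkdir(parents=True, exist_ok=True)
    stored = 0
    for path in sorted(source_root.rglob("*")):
        if not path.is_file():
            continue
        target = mirror_root / file_checksum(path)
        if not target.exists():  # identical contents share one entry
            shutil.copyfile(path, target)
            stored += 1
    return stored
```

Running it a second time on an unchanged tree stores nothing new, so a weekly
rebuild would only copy files whose contents have changed.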

Jan builds a "lookaside" database from the subtree on Troy's realm
(which is logically equivalent to "find ." if the files are named after
their checksums, so Jan does not need to fetch the contents for that).

Jan tells his Venus to use that lookaside database.
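
As a rough sketch of why a directory listing is enough here: when every file
name is already the checksum, an index from checksum to path can be built
without reading any file contents. The function name and the in-memory
dictionary are assumptions for illustration only; this is not the format of a
real lookaside database.

```python
import os
from typing import Dict


def build_lookaside_index(mirror_root: str) -> Dict[str, str]:
    """Map checksum (taken from the file name) to the file's path.

    Only directory entries are listed; no file is opened or fetched.
    """
    index: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(mirror_root):
        for name in filenames:
            index[name] = os.path.join(dirpath, name)
    return index
```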

Jan's client accesses /coda/slowlink.realm/xyz, finds out that there is
a matching copy on /coda/fast.realm/slowlink-realm-mirror/8324287346378247352
and fetches that copy instead.

If the file is not present, or the fetched copy's checksum does not
match (the file was changed or forged), the client looks further
in the lookaside database for the next fetch candidate, or finally uses
/coda/slowlink.realm/xyz anyway.
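
The fallback order described above could look roughly like this. It is a
sketch under assumptions, not Venus code: SHA-256 stands in for the checksum,
and `read_candidate` and `read_origin` are hypothetical callbacks standing in
for fetches from the mirror realm and from the original realm.

```python
import hashlib
from typing import Callable, Iterable, Optional


def fetch_with_lookaside(
    expected_checksum: str,
    candidates: Iterable[str],
    read_candidate: Callable[[str], Optional[bytes]],
    read_origin: Callable[[], bytes],
) -> bytes:
    """Try each lookaside candidate in turn; fall back to the origin.

    A candidate is accepted only if its contents hash to the expected
    checksum, so a missing, changed, or forged copy is skipped.
    """
    for path in candidates:
        data = read_candidate(path)
        if data is None:  # candidate not present
            continue
        if hashlib.sha256(data).hexdigest() == expected_checksum:
            return data
    # No usable candidate: fetch from the original realm anyway.
    return read_origin()
```

Because every candidate is verified before use, a stale or malicious mirror
can cost time but cannot change what the client ends up reading.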

Coda would then get a transparent "read-only replication" feature across
realms (inside a realm there is no need for that).

Jan, would you comment: is it possible to avoid deadlocks in that situation,
or am I missing something important that makes it impossible or not useful?

It would be nice to have the possibility to mirror data in geographically
distant locations, still keeping _all_ semantics and functionality intact.

The next step would be to encrypt each file with its checksum as the key
(so if you have the checksum, you can read the file), but name
the files in the lookaside repositories after another_checksum_of_checksum,
so that you cannot make sense of a randomly fetched file, but you can always
fetch one you know the checksum for.
With a careful implementation it should be fairly resistant to brute
force attacks, and you are always free to _not_ expose your files in that way.
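
A minimal sketch of the naming part of that idea, with the encryption itself
left out. SHA-256 is an assumed checksum, using it a second time for
"another_checksum_of_checksum" is one possible choice among many, and the
function name is hypothetical.

```python
import hashlib
from typing import Tuple


def derive_key_and_name(content: bytes) -> Tuple[bytes, str]:
    """Derive the encryption key and the repository file name.

    key  = checksum of the content (whoever knows it can decrypt)
    name = checksum of the key (reveals the location, not the key)
    """
    key = hashlib.sha256(content).digest()
    name = hashlib.sha256(key).hexdigest()
    return key, name
```

Someone who knows the checksum can compute the name and find the file, while
someone who only lists the repository sees names from which the key cannot be
recovered without inverting the hash.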

With that encryption step, non-world-readable data would be distributable
just like world-readable data. Not that I feel I need that, but somebody
probably would.

Regards,
--
Ivan
Received on 2004-10-01 11:15:17