Coda File System

Re: Coda development roadmap

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Fri, 1 Aug 2014 13:54:06 -0400
On Thu, Jul 31, 2014 at 03:19:01PM +0200, u-codalist-z149_at_aetey.se wrote:
> Based on the received comments I consider the following development plan,
> aimed at getting rid of complexity and of non-essential limitations.
...
> The definition of the relevant core functionality is based on my
> experience of using Coda for myself and deploying it at Aetey and Chalmers
> for about 12 years - and of course on your comments or lack of those.

I have actually been an active Coda user for 18 years and a developer
for about 16 of those, and I have been giving you feedback which has
been completely ignored. So it seems pointless for me to give you
feedback on any of this roadmap, but here I go.

Trying to address all of the issues in these long emails properly would
only make my response equally long and rambling. So I'll just say here
that I disagree with quite a lot in your email, and will highlight a
few points.

> The experience reflects the following kinds of scenarios,
> in the order of diminishing importance:
> 
> - delivering software to *nix-like workstations and servers
>   avoiding any dependency on locally installed software
> - large scale administration of *nix-like workstation (solution
>   originally developed at Chalmers, "hotbeat")
> - storing the data to be used/published via web-like services
> - accessing one's own personal and/or work-related data
>   (aka homedir and alike)
> - storing mail (in Maildir)

It looks like actual users are listed as #4 and #5 in this list, while
the top priorities are sysadmin/package-install tasks that could be
handled with something like rsync and a nightly cron job.

> - storing mail (in Maildir)
>   value of Coda: convenient, eliminates the need for an extra protocol
>         (like IMAP) and extra authentication and authorization management,
>         mail contents is consistently cached at the client/MUA,
>         MXs can act in parallel instead of buffering/resending

Ignoring the 'minor' inconvenience of the 4096-entry directory
limitation, this is actually only convenient if your email application
treats a Maildir folder pretty much like an IMAP server, because
building even a simple index of all the email in a folder requires an
access to every message. My inbox currently has 16864 emails, and that
doesn't even include mailing list traffic, which is filed into its own
respective folders.
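To make that cost concrete, here is a minimal sketch (hypothetical
paths, error handling trimmed) of what building even a trivial subject
index requires: one open and read per message, and on a cold Coda cache
each of those is a separate fetch from the server.

    #include <dirent.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical sketch: indexing a Maildir means touching every file. */
    int main(void)
    {
        DIR *d = opendir("Maildir/cur");     /* path is illustrative */
        struct dirent *e;
        char path[4352], line[1024];

        if (!d)
            return 1;
        while ((e = readdir(d)) != NULL) {
            if (e->d_name[0] == '.')
                continue;
            snprintf(path, sizeof path, "Maildir/cur/%s", e->d_name);
            FILE *f = fopen(path, "r");      /* one open per message */
            if (!f)
                continue;
            while (fgets(line, sizeof line, f)) {
                if (strncmp(line, "Subject:", 8) == 0) {
                    fputs(line, stdout);     /* grab one header for the index */
                    break;
                }
            }
            fclose(f);
        }
        closedir(d);
        return 0;
    }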

But luckily mutt, kmail, etc. do create their own index caches, which
significantly speeds up loading large maildir folders. However, the way
these caches are updated often does not have the same lockless
properties as maildir itself, so instead of getting conflicts on the
email folders you get conflicts on the index. The one good thing is
that at least that doesn't prevent delivery... unless you deliver on
the same machine where you read your email: with a conflict on the
index, everything gets appended to the CML and nothing is propagated
back to the servers even if you have tokens. Guess how much email you
can lose when you install a new client.

> - volume names to be treated as comments, meant for humans only,
>   dropping the corresponding indirection layer and the related code

The corresponding indirection layer is only used for humans.
Internally, Coda clients and servers use the volume id; the only places
the name is used are cfs makemount and when volume ids are mapped back
to names for display, e.g. in codacon or cfs listvol.
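Roughly, the split looks like this (a hypothetical sketch, not the
actual venus/codasrv data structures): the name is resolved once in
each direction, and everything else keys on the 32-bit id.

    #include <stdint.h>
    #include <string.h>

    typedef uint32_t VolumeId;

    struct volent {                 /* hypothetical volume entry */
        VolumeId id;                /* what clients and servers key on */
        char     name[32];          /* human-readable comment */
    };

    static struct volent voldb[] = {
        { 0x7f000001, "u.jaharkes" },   /* made-up example entries */
        { 0x7f000002, "coda:root" },
    };

    /* cfs makemount direction: resolve a name to the id that actually
     * gets stored in the mount point. */
    static VolumeId volume_by_name(const char *name)
    {
        for (size_t i = 0; i < sizeof voldb / sizeof *voldb; i++)
            if (strcmp(voldb[i].name, name) == 0)
                return voldb[i].id;
        return 0;
    }

    /* Display direction: map an id back to a name for codacon or
     * cfs listvol output. */
    static const char *name_by_volume(VolumeId id)
    {
        for (size_t i = 0; i < sizeof voldb / sizeof *voldb; i++)
            if (voldb[i].id == id)
                return voldb[i].name;
        return "???";
    }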

> - clients need to contact VSGs but servers only need to contact AVSGs,
>   severs have also higher demands on reliability of the mapping
>   AVSG (a set of server ids) => set of endpoints (ip:port),
>   the mapping is to be implemented by a "db/servers"-lookalike

Do you actually know what AVSG means? The Accessible VSG is the subset
of a volume's storage group that the client can currently reach, so it
isn't just a set of server ids.
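A minimal sketch of that distinction (hypothetical names, assuming
AVSG = Accessible VSG): the VSG is static data from the volume location
database, while the AVSG is dynamic state that has to be re-derived as
servers come and go.

    #include <stdbool.h>
    #include <stdint.h>

    #define VSG_MEMBERS 8           /* hypothetical limit */

    struct vsg {                    /* static: from the volume location db */
        uint32_t serverid[VSG_MEMBERS];
        int      nservers;
    };

    struct avsg {                   /* dynamic: tracks current reachability */
        bool up[VSG_MEMBERS];
        /* ...plus the connectivity/version state the client keeps per
         * member, omitted in this sketch */
    };

    /* Re-derive the accessible subset; 'probe' stands in for the
     * client's liveness checks. */
    static void refresh_avsg(const struct vsg *v, struct avsg *a,
                             bool (*probe)(uint32_t serverid))
    {
        for (int i = 0; i < v->nservers; i++)
            a->up[i] = probe(v->serverid[i]);
    }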

> - volume ids shall be maintained realm-wise not server-wise
>   (each replica of the same volume shall bear the same volume id),
>   dropping the extra mapping from repvol to volreps and the corresponding
>   code

Then you cannot expand a conflict on the client, because we would have
to compare different directory objects that all carry the identical
object identifier (realm.volume.vnode.uniquifier).

The replicated volume and volume replica distinction goes much deeper
than just the confusing repvol vs. volrep naming.

    Replicated volumes
        - Only exist on the client and in the VRDB file on the server
        - Have callbacks, cache revalidation, disconnected operation.
        - Are built by combining one or more volume replicas.

    Volume replicas
        - Actually have a physical presence on a single Coda server
        - Use check-on-open for cache consistency.
        - Are normally only visible when a conflict is expanded or when
          a backup/snapshot volume is mounted.

During repair you need to be able to show all copies of a particular
object that is in conflict. So when you run cfs expand foo, it turns
foo into a directory that contains 'localhost', which maps to the local
copy (the replicated volume), plus a copy for every server that stores
the object (each volume replica). When we then ask the kernel to open
one of these copies, the fact that they use different volume ids is
essential.
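In identifier terms (a hypothetical struct modeled on Coda's file
identifiers, not the exact source), the problem looks like this:

    #include <stdint.h>

    struct fid {
        uint32_t realm;     /* which realm */
        uint32_t volume;    /* replicated volume id OR volume replica id */
        uint32_t vnode;     /* object within the volume */
        uint32_t unique;    /* uniquifier */
    };

    static int same_object(struct fid a, struct fid b)
    {
        return a.realm == b.realm && a.volume == b.volume &&
               a.vnode == b.vnode && a.unique == b.unique;
    }

    /* If all replicas shared one volume id, the copies exposed by
     * 'cfs expand foo' (localhost plus one per server) would compare
     * equal here, and the kernel could not tell venus which copy is
     * being opened. Distinct replica ids keep the fids distinct. */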

> - the kernel part of the Coda client is to be simplified by dropping
>   the pioctl part which should instead go via a plain socket out of
>   the /coda name space, importing the change from Ulocoda

The kernel part of the pioctl interface is tiny: text is 163 bytes and
data is 200 bytes; it is just a copy-in-copy-out in the kernel module.
But as it does that, it also tags the request with the actual uid of
the user making the request. As far as I remember, the 'plain socket'
was a unix domain socket with no cross-platform way of telling Venus
the uid of the connecting client. So you are either relying on the
politeness of the client not to lie about the identity of the user
making the request, or assuming that every system is a single-user
system.
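For comparison, on Linux a unix domain socket can authenticate the peer
with SO_PEERCRED, but that is exactly the non-portable part; a rough
sketch:

    #define _GNU_SOURCE              /* for struct ucred */
    #include <sys/socket.h>
    #include <sys/types.h>

    /* 'fd' would be an accepted AF_UNIX connection inside venus. */
    static uid_t peer_uid(int fd)
    {
        struct ucred cred;
        socklen_t len = sizeof cred;

        /* Linux-specific; BSDs need LOCAL_PEERCRED or getpeereid(),
         * and other platforms may have no equivalent at all. */
        if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cred, &len) < 0)
            return (uid_t)-1;        /* caller cannot be authenticated */
        return cred.uid;
    }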

Moving to FUSE would be a much better approach if you want to get rid of
kernel complexity. Patches are welcome.
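As a rough illustration of why (a minimal libfuse 3 stub, not a Coda
port): all of the filesystem logic lives in userspace callbacks, each
request arrives already tagged with the caller's uid via
fuse_get_context(), and only the generic fuse module stays in the
kernel.

    #define FUSE_USE_VERSION 31
    #include <fuse3/fuse.h>
    #include <sys/stat.h>
    #include <errno.h>
    #include <string.h>

    /* The only callback implemented here; a real client would fill in
     * read/readdir/etc. and talk to the cache manager from these
     * handlers. */
    static int stub_getattr(const char *path, struct stat *st,
                            struct fuse_file_info *fi)
    {
        (void)fi;
        memset(st, 0, sizeof *st);
        if (strcmp(path, "/") == 0) {
            st->st_mode = S_IFDIR | 0755;
            st->st_nlink = 2;
            return 0;
        }
        return -ENOENT;
    }

    static const struct fuse_operations stub_ops = {
        .getattr = stub_getattr,
    };

    int main(int argc, char *argv[])
    {
        /* fuse_main() mounts and runs the event loop; the caller's
         * uid and gid are available through fuse_get_context(). */
        return fuse_main(argc, argv, &stub_ops, NULL);
    }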

Jan