Coda File System

From: <u-codalist-z149_at_aetey.se> Date: Thu, 31 Jul 2014 15:19:01 +0200

Based on the received comments I consider the following development plan,
aimed at getting rid of complexity and of non-essential limitations.

The primary goal is to make the codebase more understandable and manageable.

The main approach is "no code should be present for support of
possibilities/features which can be implemented independently or safely
postponed until the need arises".

Once the code base is simplified and deployment streamlined,
further improvements (say, supporting IPv6) will become feasible.

The definition of the relevant core functionality is based on my
experience of using Coda for myself and deploying it at Aetey and Chalmers
for about 12 years - and of course on your comments or lack of those.

-------------------------------------------------------------------------

The experience reflects the following kinds of scenarios,
in the order of diminishing importance:

- delivering software to *nix-like workstations and servers
  avoiding any dependency on locally installed software
  value of Coda: crucial, no other file system can be a proper
        substitute (global name space, zero client maintenance,
        versatile strong authentication, consistent caching,
        disconnected mode, a Coda server reboot/maintenance does not
        disrupt the service or need extra administrative actions)

- large scale administration of *nix-like workstations (solution
  originally developed at Chalmers, "hotbeat")
  value of Coda: crucial, no other file system can be a proper
        substitute (global name space, zero client maintenance,
        versatile strong authentication, consistent caching,
        disconnected mode, a Coda server reboot/maintenance does not
        disrupt the service or need extra administrative actions)

- storing the data to be used/published via web-like services
  value of Coda: extremely convenient,
        could be somewhat approximated by AFS or NFSv4 but is more
        convenient due to the zero client configuration and ease of
        infrastructure maintenance (a Coda server reboot/maintenance
        does not disrupt the service or need extra administrative actions)

- accessing one's own personal and/or work-related data
  (aka homedir and alike)
  value of Coda: extremely convenient, the experience could be somewhat
        approximated by AFS or NFSv4 but is more robust remotely (network
        glitches or server outages do not interrupt the workflow)

- storing mail (in Maildir)
  value of Coda: convenient, eliminates the need for an extra protocol
        (like IMAP) and extra authentication and authorization management,
        mail content is consistently cached at the client/MUA,
        MXs can act in parallel instead of buffering/resending

This experience suggests, among other things, that some of the current
features can be dropped without any loss of usability, or even to its benefit.

-------------------------------------------------------------------------

Features and/or implementation details to be changed:

- volume names to be treated as comments, meant for humans only,
  dropping the corresponding indirection layer and the related code

- server id to be treated as the primary means to identify a server,
  to be known also to clients (to be able to enumerate the AVSG while
  requesting resolution)

- volume storage groups and their mapping to the corresponding servers'
  endpoints to be treated as public data and kept in DNS,
  a "name" to look up can be e.g. defined as a suitably formatted list of
  server ids in decimal in increasing order
  (if the number of servers in a realm is not to exceed 999999, 8 servers
  in a VSG would imply a name of max about 56 characters long which fits
  into the DNS name component restrictions)

- clients need to contact VSGs but servers only need to contact AVSGs,
  servers also have higher demands on the reliability of the mapping
  AVSG (a set of server ids) => set of endpoints (ip:port),
  the mapping is to be implemented by a "db/servers"-lookalike

- volume ids shall be maintained realm-wise not server-wise
  (each replica of the same volume shall bear the same volume id),
  dropping the extra mapping from repvol to volreps and the corresponding
  code

- volume ids at volume creation shall be assigned without using a central
  coordination point (eliminating any "scm"-like dependency)

- mountpoints are to contain a list of server ids (representing the VSG)
  and a volume id, iow no longer involving the Coda servers in the
  resolution of a mount point (this implies a possibility to create
  inconsistent mount points, which otoh doesn't look too dangerous)

- some of the Coda services are to be moved out of the implementation:
  - authentication advertisement (already is external)
  - VSG endpoints location (to be placed in DNS)

- placement of the remaining Coda services shall be made independent
  from each other; "scm" notion is to be more correctly represented by
  three separate master databases:
  - "identity control service"
    which would serve as the master for the data represented by today's
    prot_users.cdb,
    = to be distributed to both file servers and all authentication servers
  - "authentication control service" which would serve
    as the master for the data represented today by auth2.pw,
    = to be distributed to authentication servers implementing the Coda
    password authentication
  - "internal directory service" which would serve as the master for
    "db/servers"-lookalike
    = to be distributed to file servers

- to summarize, DNS shall contain
  - _codaauan._tcp.<realm>         authentication announcement service
  - _codaauth._udp.<realm>         Coda password authentication service
  - _codavsg_<vsg>.<realm>         each record containing endpoints for
                                   the participating servers along with
                                   the corresponding server id in the
                                   priority field, to avoid a need for
                                   extra DNS queries

- the kernel part of the Coda client is to be simplified by dropping
  the pioctl part which should instead go via a plain socket out of
  the /coda name space, importing the change from Ulocoda
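
A minimal sketch of the VSG naming and lookup idea above, in Python. The
"-" separator, the exact "_codavsg_" label layout, the function names and
the example hosts and port are assumptions made for illustration; the plan
only says "a suitably formatted list of server ids in decimal in increasing
order". The records are assumed to be SRV-like tuples (priority, weight,
port, target), with the server id carried in the priority field. Note that
a real SRV priority field is 16 bits wide, so ids above 65535 would need
another encoding.

```python
from typing import Dict, Iterable, Tuple

MAX_DNS_LABEL = 63       # limit on a single DNS name component, in octets
MAX_SERVER_ID = 999999   # upper bound used in the example in the plan

# Hypothetical record shape: (priority, weight, port, target)
SrvRecord = Tuple[int, int, int, str]


def vsg_label(server_ids: Iterable[int], sep: str = "-",
              prefix: str = "_codavsg_") -> str:
    """Build a DNS label for a VSG from its server ids.

    Ids are de-duplicated and written in decimal, in increasing order,
    so the same set of servers always yields the same label.
    """
    ids = sorted(set(server_ids))
    if not ids:
        raise ValueError("a VSG needs at least one server id")
    if ids[0] < 0 or ids[-1] > MAX_SERVER_ID:
        raise ValueError("server id out of range")
    label = prefix + sep.join(str(i) for i in ids)
    if len(label) > MAX_DNS_LABEL:
        raise ValueError(
            f"label is {len(label)} octets, limit is {MAX_DNS_LABEL}")
    return label


def endpoints_by_server_id(
        records: Iterable[SrvRecord]) -> Dict[int, Tuple[str, int]]:
    """Map server id (taken from the priority field) to (host, port)."""
    return {priority: (target.rstrip("."), port)
            for priority, _weight, port, target in records}


if __name__ == "__main__":
    print(vsg_label([3, 1, 2]))
    print(endpoints_by_server_id([
        (1, 0, 2432, "a.example.org."),
        (2, 0, 2432, "b.example.org."),
    ]))
```

With these assumed choices, eight six-digit ids alone take 55 characters
(48 digits plus 7 separators), but together with the 9-character
"_codavsg_" prefix the label reaches 64 characters, one over the 63-octet
limit, so the prefix or the id encoding would have to be chosen with that
in mind.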

-------------------------------------------------------------------------

Compatibility:

I think it is possible, over the course of several years, to maintain
both the old and the new codepaths with fallbacks back and forth, thus
preserving the compatibility.

Nevertheless, to ease the development and testing, a parallel use of an
additional file name space (like /coda-staging/....) might be beneficial.

I feel it may be easier to manage data duplication at the deployment
level and run double clients on the client hosts than to bear the burden
of wire-compatibility on the source level.

-------------------------------------------------------------------------

I hope someone finds this intention and approach desirable and joins the
effort or otherwise contributes by expressing his/her opinion.

An extremely valuable contribution would be finding a funding source
for Coda development. Coda is useful and quite unique. If not otherwise,
public authorities should be concerned about helping such projects.
Anybody having relevant contacts?

Yours,
Rune

Coda development roadmap