Coda File System

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Thu, 2 Nov 2006 21:43:01 -0500

Actually, this is about the removal of a state in Coda, as we're getting
rid of connected mode operation.

Really this is a heads up about some changes that have been brewing in
the 'expand-wdonly' branch in the git repository for the past year which
I will push to CVS real soon now.

Connected mode is where it all started, Coda having its roots in AFS2
and all. But everything in Coda's design inherently favors disconnected
and write-disconnected operation: the lack of locking, and the fact
that we notify clients of an update only after it has happened. It is
simply not possible to force an operation through by retrying it
indefinitely, because we do not have "last writer wins" semantics but
"last writer gets a conflict". Applications that absolutely rely on
connected mode consistency and semantics are doomed to fail as soon as
someone trips over a network cable, or when the client decides it is
better to back off and work (write-)disconnected for a while because of
server load, network congestion, etc.

Instead of treating logging and reintegration as an occasionally used
fallback for 'normal' operation, I'd rather have it as the main focus of
the Coda File System and make it the best it can possibly be. And there
are definite advantages to always using write-back logging:

- Performance

Changes are logged and written back to the servers asynchronously,
minimizing the time we block the application. We also send updates in
batches, and the server can now commit up to 100 operations in a single
transaction, which is considerably more efficient. Finally, clients
optimize the logs to remove operations that cancel each other out, for
instance the creation and removal of temporary files, or multiple writes
to the same file.
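To make the batching and log optimization a bit more concrete, here is a
toy model in Python. It is only a sketch under stated assumptions: the
Record type, the three operation names, and the cancellation rules are
made up for illustration and are not how Venus actually represents or
optimizes its log. Only the limit of 100 operations per transaction
comes from the text above.

```python
from dataclasses import dataclass

MAX_BATCH = 100  # from the text: up to 100 operations per server transaction


@dataclass
class Record:
    """Hypothetical log record: an operation name and the file it touches."""
    op: str    # "create", "store" or "remove" in this toy model
    path: str


def optimize(log):
    """Drop log records that cancel out or are superseded (toy rules)."""
    out = []
    for rec in log:
        if rec.op == "store":
            # A later store to the same file supersedes any earlier one.
            out = [r for r in out if not (r.op == "store" and r.path == rec.path)]
            out.append(rec)
        elif rec.op == "remove":
            if any(r.op == "create" and r.path == rec.path for r in out):
                # Created and removed locally: nothing needs to reach the server.
                out = [r for r in out if r.path != rec.path]
            else:
                # Removing a file the server knows about: pending stores are moot.
                out = [r for r in out if not (r.op == "store" and r.path == rec.path)]
                out.append(rec)
        else:
            out.append(rec)
    return out


def batches(log, size=MAX_BATCH):
    """Split an optimized log into transactions of at most `size` records."""
    return [log[i:i + size] for i in range(0, len(log), size)]


log = [Record("create", "/tmp/x"), Record("store", "/tmp/x"),
       Record("remove", "/tmp/x"), Record("store", "notes.txt"),
       Record("store", "notes.txt")]
print(optimize(log))  # [Record(op='store', path='notes.txt')]
```

In this model the temporary file never reaches the server at all, and
the two writes to notes.txt collapse into one store.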

- Semantics

Connected mode and write-disconnected mode are very different in
behaviour (and implementation). Without the hard switchover point the
story becomes a lot simpler. There are now just two numbers:

* How long until changes are eligible for writeback.
* How much time are we allowed to spend on writing back changes.

(the second number defines how long the volume is locked and, in a way,
how much of the available bandwidth is used for writeback)

About once every 5 seconds all volumes are checked, and any pending
changes eligible for writeback are pushed back to the servers. If there
is too much queued, we simply continue with the rest the next time we
check the writeback logs. By default the values are set to 0 and 1.0
seconds respectively. This combines reasonable consistency, as
everything is immediately eligible for reintegration, with smooth
adaptation when bandwidth is limited, as we only use about one fifth
(1.0 s out of every 5 s) of the available time/bandwidth for writeback.
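A rough model of that periodic check, again as a hypothetical Python
sketch: the Volume class, the per-record cost estimate, and the rule of
always sending at least one record per pass are assumptions, not Venus
code. It shows how the aging window gates eligibility, how the time
budget caps each pass, and why the 1.0 second default works out to about
a fifth of a 5 second interval.

```python
from dataclasses import dataclass, field

CHECK_INTERVAL = 5.0  # seconds; "about once every 5 seconds"


@dataclass
class Volume:
    """Hypothetical per-volume writeback state."""
    age: float = 0.0   # seconds before a change becomes eligible for writeback
    time: float = 1.0  # seconds of writeback allowed per check
    pending: list = field(default_factory=list)  # (logged_at, est_cost) tuples


def writeback_pass(vol, now):
    """Send eligible records, oldest first, until this pass's budget is spent.

    Whatever is left stays queued for the next check. At least one eligible
    record is sent per pass so a single large record cannot stall forever.
    """
    sent, spent = [], 0.0
    while vol.pending:
        logged_at, cost = vol.pending[0]
        if now - logged_at < vol.age:
            break  # oldest record is still aging, so later ones are too
        if sent and spent + cost > vol.time:
            break  # budget exhausted; continue at the next check
        sent.append(vol.pending.pop(0))
        spent += cost
    return sent


def writeback_share(vol):
    """Approximate fraction of time/bandwidth spent on writeback."""
    return vol.time / CHECK_INTERVAL


vol = Volume(pending=[(0.0, 0.4)] * 5)
print(len(writeback_pass(vol, now=10.0)), len(vol.pending))  # 2 3
print(writeback_share(Volume()))  # 0.2
```

With the defaults everything is eligible immediately, but each check
spends only roughly a second writing back, so a large backlog drains over
several checks instead of monopolizing a slow link.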

Deep down these numbers have always existed for write-disconnected
volumes, but they defaulted to 30 seconds for aging and 60 seconds for
the writeback period. They were also not stored persistently, so any
switch to or from another connectivity mode would override user-set
preferences.

Not having separate connected and log-based write paths also resolves an
issue that was hard to solve with the old clients. A connected store
that completed successfully on the server, but where the client was
disconnected before it received the final reply would get logged as a
pending update by the client. When the logged store was reintegrated,
the server would flag it as an update/update conflict. When we always
log, the operation only ever reaches the server through reintegration,
and reintegration knows how to correctly detect and handle retries after
a disconnection.
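The retry problem is easier to see in a small model. This Python sketch
assumes each logged operation carries a unique identifier that the
server remembers once the operation is committed; the Server class, the
identifiers, and the per-object version counter are illustrative
assumptions rather than Coda's actual server code.

```python
class Server:
    """Toy server that recognizes a replayed log record (hypothetical model)."""

    def __init__(self):
        self.committed = set()  # identifiers of operations already applied
        self.version = {}       # path -> version counter

    def reintegrate(self, op_id, path, base_version):
        if op_id in self.committed:
            return "ok (retry)"  # applied earlier; only the reply was lost
        if self.version.get(path, 0) != base_version:
            return "conflict"    # object changed since the client's version
        self.version[path] = base_version + 1
        self.committed.add(op_id)
        return "ok"


srv = Server()
print(srv.reintegrate("store-1", "paper.tex", 0))  # ok; suppose this reply is lost
print(srv.reintegrate("store-1", "paper.tex", 0))  # ok (retry)
print(srv.reintegrate("store-2", "paper.tex", 0))  # conflict
```

The second call is the resent log record, which the server recognizes
and acknowledges. The third call resembles what the old connected path
effectively did: the store was logged again as a new operation against a
version the server had already moved past, so it showed up as an
update/update conflict.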

- Reliability

Using only write-disconnection for over a year has really forced me to
focus on the reliability of reintegration and resolution. Most of the
server-side improvements in recent releases are a direct result of my
need for reliable write-disconnected operation.

- Maintainability

The combined patches remove a little over 5000 lines from the Coda
client, almost 15% of the code in coda-src/venus. At some later point,
when we remove the connected operations from the server, we can drop an
additional 1300 lines. Fewer lines of code means less code to maintain
(and ideally fewer bugs).

Of course there are drawbacks. All of these already exist with the
current clients; they just become less avoidable and more visible:

- Local changes hide global updates.

When we create a new file in a directory, we cannot refresh the locally
cached copy until all changes have been pushed to the server. This is
not as noticeable when there is no sharing between clients or when
changes are reintegrated quickly. However, not being able to refresh
dirty cached data leads to:

- More complex conflicts.

There used to be two types of conflict: server-server (the result of
failed resolution) and local-global (the result of failed
reintegration). Only occasionally did reintegration fail because of a
server-server inconsistency. This didn't happen all that often,
probably because most people tried to keep their clients in connected
mode, in which case the operation would simply fail instead of being
preserved in the log. But now that all modifications are logged and
reintegrated, when reintegration fails the log record stays around as a
conflict.

This is good in one way, because we won't lose conflicting updates. On
the other hand, these conflicts were impossible to repair because the
two types of conflict are expanded and handled differently. We used to
bring in a second client to first repair the underlying server-server
inconsistency before we could try to fix the reintegration conflict.

The new client code actually tries to unify the way a conflict is
expanded, and to some extent attempts to repair all conflicts. However,
repair is still an area that is not completely sorted out, and although
server-server conflict repair works as well as before, the local-global
and local-server-server cases still require more work. Also, the
reintegration and resolution improvements on the server have made many
of the unnecessary conflicts that plagued reintegration a thing of the
past.

- Cooperation between multiple authors

Because updates are logged and written back asynchronously, working
together on the same set of files requires some user action to make sure
the updates are propagated. Either use 'cfs fr' (forcereintegrate) to
push pending updates to the servers at a synchronization point, or set
the reintegration age to a low value.

There is also a special setting which makes reintegration occur
synchronously. When both reintegration age and time are set to 0 we
reintegrate all pending changes before venus returns to the application.
Behaviour-wise this comes really close to the old 'connected mode
operation' combined with 'cfs strong', and in fact using 'cfs strong'
will place the specified volume in this synchronous reintegration mode.
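The synchronous special case can be sketched the same way. In this
hypothetical Python model (the names Volume, log_update and make_strong
are made up, and make_strong only mimics the described effect of 'cfs
strong'), an update is flushed before control returns to the application
only when both the reintegration age and time are 0; otherwise it just
waits in the log for the periodic writeback check.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Hypothetical per-volume settings and pending log records."""
    age: float = 0.0
    time: float = 1.0
    pending: list = field(default_factory=list)


def is_synchronous(vol):
    # From the text: age 0 and time 0 selects synchronous reintegration.
    return vol.age == 0 and vol.time == 0


def make_strong(vol):
    # Mimics the described effect of 'cfs strong'; not its implementation.
    vol.age, vol.time = 0, 0


def log_update(vol, record, send):
    """Log an update; in synchronous mode, flush the log before returning."""
    vol.pending.append(record)
    if is_synchronous(vol):
        while vol.pending:
            send(vol.pending.pop(0))
    # Otherwise the record waits for the periodic writeback check.


sent = []
vol = Volume()
log_update(vol, "mkdir a", sent.append)
print(sent, vol.pending)  # [] ['mkdir a']
make_strong(vol)
log_update(vol, "store b", sent.append)
print(sent, vol.pending)  # ['mkdir a', 'store b'] []
```

Note that switching to synchronous mode also pushes out whatever was
already queued, since the whole log is drained before the call returns.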

Although I do not believe it fundamentally changes anything for the
end-user, it definitely changes some of the thinking about how the
system works. And probably in a good way: if you assume your writes are
always logged and delayed, you won't get a nasty surprise when Coda
decides it may be better to disconnect for a while.

One question was what version number to use for this release. I think it
is 95% there, but it probably isn't ready for a major release. On the
other hand it is more of a milestone than a 6.1.3 release would
indicate. So I bumped the version to 6.9.0, and at some point it will
become the future Coda-7.0.

There are definitely more advantages, and probably a couple of other
disadvantages, but this short email has gotten too long already.

Jan

The state of Coda