Coda File System

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Thu, 21 Apr 2016 14:25:28 -0400

On Thu, Apr 21, 2016 at 02:16:02PM +0200, u-myfx_at_aetey.se wrote:
> The only problem we hit there was not with clog but with memory alignment
> in cfs (which we then fixed).

Looking forward to seeing patches related to that.

> >     - there was some sort of configuration daemon serving up
> >       configuration data for the authentication from a tcp port which
> >       introduces a whole slew of unevaluated security concerns. Since it
> >       is http how is the configuration secured from MitM attacks, can
> 
> There is no http involved (it would be a large overhead without any reason).
> 
> This data does not have to be secured against MitM, the worst which can
> happen is disruption of communication / DOS which MitM always can achieve.

Depends on what the configuration contains. If it lists anything like a
set of suggested authentication methods then you are exposed to all
security bugs in any of the mechanisms the client supports even when
your configuration has disabled them.

> > Allowing server IP addresses to change without client-side intervention
> > has to introduce some new level of indirection above the IP layer and
> > the most obvious one is to refer to servers using their DNS names. This
> > works nicely with the 'new' RPC2 call I added almost 10 years ago.
> > 
> >     https://github.com/cmusatyalab/coda/commit/86d97f6db3ac13d6d83f47636e23891f9380f537
> 
> Even something which looks obvious can be deceiving.
> 
> For reference here is an excerpt from my letter to this list, 2010,
> outlining one of the problems with it:

> ----
>  ...
> 
> Even today, a server's invariant "identity" in a realm is not its ipv4
> address but its server id.

No, server ids are an implementation artifact that has crept into parts
that have become visible to clients (like the volumeids). Ideally we
wouldn't have (or pay attention to) server ids, which would simplify things
and allow other improvements, such as migrating volumes from one server to
another for load balancing purposes.

> With ViceGetVolumeLocation() we make a similar mistake (!), we do
> not allow for the returned string to be invalidated "properly". There
> should have been some kind of a validity promise (say a TTL) included.
> A possibility to change host names in a realm setup without confusing
> the clients is a Good Thing (TM), why forbid this by design?

GetVolumeLocation returns a DNS hostname, DNS records have a TTL. There
is nothing here forbidden by design, we just don't duplicate something
that is already there.
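
To illustrate relying on the DNS TTL rather than a separate validity
promise, here is a minimal sketch of a client-side cache that re-resolves a
hostname once its TTL has expired. The `TTLHostCache` class and the resolver
signature are assumptions made for illustration, not part of Coda or RPC2;
the resolver is injected because the standard library's `getaddrinfo` does
not expose record TTLs.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical resolver signature: hostname -> (addresses, ttl_seconds).
Resolver = Callable[[str], Tuple[List[str], float]]


class TTLHostCache:
    """Cache hostname -> addresses, re-resolving once the DNS TTL expires."""

    def __init__(self, resolver: Resolver,
                 clock: Callable[[], float] = time.monotonic):
        self._resolver = resolver
        self._clock = clock
        # hostname -> (addresses, absolute expiry time)
        self._entries: Dict[str, Tuple[List[str], float]] = {}

    def lookup(self, hostname: str) -> List[str]:
        now = self._clock()
        cached = self._entries.get(hostname)
        if cached is not None and now < cached[1]:
            return cached[0]  # still within the TTL, no new query
        # Expired or never seen: ask the resolver again.
        addresses, ttl = self._resolver(hostname)
        self._entries[hostname] = (addresses, now + ttl)
        return addresses
```

With a cache like this, a server that moves to a new address is picked up
by clients once the old record's TTL runs out, without any extra field in
the RPC reply.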

> So this would replace the "constant ip numbers" constraint with
> "constant host names", why keep such a constraint at all? From system
> administration perspective this easily becomes a PITA in the long run
> when networking changes, dns domain names change and naming policies
> change as well.

I don't care if you want to name (or cname) your servers 1.mydomain.com,
2.mydomain.com, etc. There is no difference there.

> Actually, the whole concept of a "hostname" is broken, DNS purpose
> is to map _service_ names to endpoints (ip+port) of the corresponding
> daemons.
> Hardware units or OS instances (aka hosts) are irrelevant for this
> (even if they looked "natural" long ago when IP networking was in
> its infancy and uucp was state-of-the-art).
> 
> That's why we resolve the services, which are the numbered server
> instances in a realm, to endpoints via SRV records, the exact tool for
> the purpose.

Actually, if you really read your argument, you should be pushing volume
name -> endpoint mapping to the SRV records. Skip volume ids,
server ids, and everything else in between. A client wants to connect to
a volume wherever it is. There is nothing special about the server id
number.
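
As a rough sketch of what a volume name -> endpoint lookup through SRV
records could look like: the query-name scheme in `volume_srv_name` is
purely hypothetical (no such convention is claimed to exist in Coda), and
the ordering below is a deterministic simplification of the weighted
selection described in RFC 2782. The actual DNS query is left out; the
records are assumed to have been fetched already.

```python
from typing import List, NamedTuple, Tuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def volume_srv_name(volume: str, realm: str) -> str:
    # Hypothetical naming scheme, used here only for illustration.
    return f"_codasrv._udp.{volume}.{realm}"


def order_endpoints(records: List[SrvRecord]) -> List[Tuple[str, int]]:
    # Lower priority values are tried first. Within one priority this
    # simply prefers the higher weight; RFC 2782 instead asks for a
    # weighted random choice.
    ranked = sorted(records, key=lambda r: (r.priority, -r.weight))
    return [(r.target, r.port) for r in ranked]
```

The client would then try the returned (host, port) pairs in order, with no
server id appearing anywhere in the lookup.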

It just seems like we're having the same discussions over and over
again, and yet it always results in us disagreeing and you just
doing your thing anyway:

http://coda.cs.cmu.edu/maillists/codalist/codalist-2014/9334.html
- That adding an alternate 'server-id' namespace based on an
  internal implementation artifact is not a good idea aside from the
  fact that, as is, it only allows for at most 253 servers in a realm.
- The fact that synchronous DNS calls are bad on the client.

http://www.coda.cs.cmu.edu/maillists/codalist/codalist-2014/9338.html
- A mention by me to put volume -> endpoint lookups in DNS.

http://www.coda.cs.cmu.edu/maillists/codalist/codalist-2014/9373.html
- My expression of annoyance that although you keep talking about
  patches, I have not actually seen much especially in areas that would
  help move things (like the cvs -> git conversion) forward.

http://www.coda.cs.cmu.edu/maillists/codalist/codalist-2014/9322.html
- Nice, I just found this updateclnt fd-leak patch from you, applied!

> > Anything that removes lines of code, and/or reduces overall complexity
> > will be much easier to merge because it reduces the amount of cleanup
> > necessary.
> 
> Looking forward to it. As a low-hanging fruit, would you object to
> removing the hack which supports running "numbered" server instances on
> the same computer, with different assigned ips and dns names?

Yes, that would be a candidate. We haven't used it, and it relied on
binding multiple IP addresses to an interface and then explicitly binding
each server to its own address. It was useful at the time.
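
For readers unfamiliar with the technique, binding a server instance to one
specific local address instead of the wildcard address looks roughly like
this. It is a generic socket sketch, not the Coda server code; the
`bind_instance` helper is hypothetical and UDP is assumed.

```python
import socket


def bind_instance(address: str, port: int) -> socket.socket:
    """Bind a UDP socket to one specific local address, not the wildcard."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Binding to an explicit address means this socket only receives
    # packets sent to that address, so several instances can share a port
    # number as long as each one has its own IP on the interface.
    sock.bind((address, port))
    return sock
```

Each numbered instance would call this with its own address, which first
has to be configured on the interface by the administrator.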

> I guess the motivation for it disappeared over 10 years ago when you
> implemented coalescing of free space in RVM and made the servers a lot
> more scalable.

More aggressive RVM defragmentation definitely helped some, but what
helped more is that the average file size has gone up considerably.
People aren't so much storing more files as they are storing much larger
files. As an example, over the same time period digital cameras went
from low-res 640x480 jpeg compressed images to DSLR cameras with 40-50
megapixel RAW images.

Aside from that, several of the Coda limits have probably prevented the
necessary growth: directory size limitations, client cache size limits,
number of volumes per server, and number of servers per realm.

Finally, with virtualization and containerization it has become easier to
just deploy multiple Coda file servers in separate guest VMs on a
rackmount server instead of trying to cram multiple servers together on
the same host.

Jan

Re: Coda git repository available