Coda File System

Re: DNS lookups and disconnected mode

From: <u+codalist-p4pg_at_chalmers.se>
Date: Sat, 27 Oct 2007 10:56:35 +0200
Hi Jan,

thanks for the reply, it explains well which workarounds can be applied
so that people will not be totally stuck.

Unfortunately you did not say anything about the prospects of fixing
this issue so that no special client host configuration
would be necessary.

On Sat, Oct 27, 2007 at 01:34:19AM -0400, Jan Harkes wrote:
> On Fri, Oct 26, 2007 at 10:29:50AM +0200, u+codalist-p4pg_at_chalmers.se wrote:
> > Unfortunately, as soon as we physically disconnected a machine
> > from the net so that it can not reach DNS, it becomes hardly possible to
> > use Coda. Venus is making DNS queries all the time and waiting for answers
> > which never come.
> 
> There must be something very different in the way your system is set up
> compared to mine.

Certainly. I guess that's why you did not notice the problem before.
(My machine relies on a caching DNS server on the network where it is normally
connected, like most other machines that usually connect via that
network. A colleague of mine who first reported the issue has a different
setup, but one that is also entirely appropriate and well configured.)

> First of all, as far as I know we only refresh the realm addresses when
> the client is restarted, so if you don't actually shut down the client or
> machine it works just fine when the network disappears (modulo a 60-90
> second RPC2 timeout).

Well, maybe I was hit by multiple RPC2 timeouts at the same time.
I did not test as thoroughly how long it takes to disconnect in the middle
of a session. I think, though, that I saw DNS queries in that situation as well
(while strace-ing Venus).

Anyway, a realistic usage pattern here is to turn the computer on
and begin using it. Without a network it took an hour before I could begin
working. Not especially good, say, when you have a 40-minute bus ride
and want to use that time for working :)

Of course I waited an hour mostly to check whether it would ever come up.
Then I verified that this was because of DNS, applied some obvious tricks and
voila, 4 minutes was enough after the next restart.

> Now if venus is restarted but the network is not available, DNS queries
> actually do time out if there is no response from the DNS servers; in
> this case Coda falls back on previously cached information for the realm.

Exactly, but it takes an hour :)
At least when you have three realms and 5 servers to talk to.

> The DNS timeout could be quite long, because there are several levels of
> fallback going on, at the highest level, venus first tries SRV records
> and then falls back on doing a normal A record lookup. Below that the
> resolver library may try various aliases that are defined by the
> 'search' option in /etc/resolv.conf (although I think I've tried to
> disable that type of expansion) as well as sequentially trying each
> defined upstream DNS server. So if you have 3 servers it actually
> iterates over each of those before it gives up. And on each following
> query it will probably just try all of the servers again. libresolv by
> itself doesn't do any caching, so each lookup has to go across the net.

The libresolv design is quite old...
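
To get a feeling for how these fallback levels multiply, here is a
back-of-the-envelope model in Python. It is only a sketch under my own
assumptions: every query is assumed to wait a fixed timeout on every
nameserver for every attempt, which is simpler than what libresolv really
does, and the function name and its parameters are hypothetical. The defaults
of 5 seconds and 2 attempts are, as far as I know, the usual glibc resolver
defaults.

```python
def worst_case_seconds(names, record_types=2, search_suffixes=0,
                       nameservers=3, attempts=2, timeout_s=5):
    """Rough upper bound on the time spent in sequential, blocking lookups.

    Assumed model (not libresolv's exact algorithm): each query waits
    timeout_s on every nameserver for every attempt, and nothing is cached.
    """
    # One query for the name as given, plus one per 'search' suffix,
    # repeated for each record type (for example SRV, then A).
    queries_per_name = record_types * (1 + search_suffixes)
    waits_per_query = nameservers * attempts
    return names * queries_per_name * waits_per_query * timeout_s


# Hypothetical example: 8 names (say 3 realms and 5 servers), SRV then A,
# 3 nameservers in /etc/resolv.conf, no search expansion.
print(worst_case_seconds(names=8))  # 480
```

That is 480 seconds for a single pass over the names in this simplified
model; whenever the lookups are repeated, the total grows accordingly.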

> Having a local DNS cache helps a lot because it caches both successful
> as well as failed lookups and avoids a lot of network traffic, but in
> some cases also handles things like only sending DNS queries to servers
> that are known to be reachable. I've successfully used both dnsmasq as
> well as pdns.

> Something like ifplugd or networkmanager can automatically bring the
> interface up or down when the network cable is connected or removed.

> Install networkmanager or ifplugd, or use a local dns cache like
> dnsmasq, pdnsd, or even a caching only bind/named.

Unfortunately, as I already pointed out:

> > Both have two major drawbacks though:
> > - require root privileges and possibly questionable changes
> >   in the local setup
> > - represent an extra burden

So yes there are workarounds, but not a solution.

> > How hard is it? Do we need a non-standard/non-existent DNS-resolver
> > library?
> 
> Right, the lack of DNS caching is really a libresolve issue. DNS has

So, my conclusion is that Venus can gain a lot from avoiding libresolv
and using an asynchronous resolver library. Then we would have to wait for
no more than one DNS timeout (and would be able to choose its value).
To avoid introducing a hard dependency this could be conditional,
so that the existing code interfacing to libresolv could still be compiled in
when we do not want to bring along another library.
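
To illustrate what I mean by waiting for no more than one timeout, here is a
minimal sketch in Python. It is not Venus code and it does not use c-ares or
adns; it merely runs ordinary blocking lookups in parallel threads and stops
waiting after one overall deadline. The function name resolve_all and the
injectable resolver argument are my own inventions for the example.

```python
import concurrent.futures
import socket


def resolve_all(names, timeout_s, resolver=socket.gethostbyname):
    """Start all lookups at once and wait at most timeout_s in total.

    Returns a dict mapping each name to its result, or to None when the
    lookup failed or had not finished before the deadline.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(names)))
    futures = {pool.submit(resolver, name): name for name in names}
    # A single deadline covers every lookup, instead of one per name.
    done, not_done = concurrent.futures.wait(futures, timeout=timeout_s)

    results = {}
    for fut in done:
        try:
            results[futures[fut]] = fut.result()
        except OSError:  # socket.gaierror is a subclass of OSError
            results[futures[fut]] = None
    for fut in not_done:
        # Still blocked in the resolver; the caller would fall back on
        # previously cached information here.
        results[futures[fut]] = None

    # Do not block on stragglers; their threads finish in the background.
    pool.shutdown(wait=False)
    return results
```

With three realms and five servers, something like
resolve_all(list_of_names, timeout_s=5) would return after about 5 seconds at
worst, whatever the number of names. A real asynchronous resolver would do the
same without threads, by sending all queries and polling the sockets.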

I have had a brief look at c-ares. It seems to lack support for SRV lookups,
but there are patches to add that.
There are no license problems.
A compiled library for i486 is 39088 bytes (without the patch), not a big deal;
libresolv is bigger :)

There is also adns which seems to have support for SRV-records. It is GPL.
I haven't tried to compile it.

Any chance of Venus doing asynchronous DNS lookups, at least as a compile
option?

Best regards,
Rune
Received on 2007-10-27 04:58:24