Hello Jan,

On Thu, Apr 21, 2016 at 10:57:21AM -0400, Jan Harkes wrote:
> On Thu, Apr 21, 2016 at 02:16:02PM +0200, u-myfx_at_aetey.se wrote:
> > We did not see any practical problems or extra stalls caused by
> > synchronous DNS resolution. Definitely not an issue in our workloads.
> > Of course nothing precludes changing to asynchronous resolution if
> > needed but the effort and possible dependencies are hardly justified.
>
> I am literally fuming reading this. I don't know if you remember, but
> several years ago you had me chasing down a server 'deadlock' issue
> related to callbacks, which I was unable to reproduce and I spent about
> a week on this going back and forth with new patches trying to turn
> readlocks into writelocks in the hope it would avoid some possible lock
> ordering issue, adding global timeouts to the callback break multirpc
> calls and other workarounds....
>
> You were running your servers with clients that were doing ******
> synchronous DNS lookups?

No. (We switched to DNS-based server lookups in 2014.)

> And you don't think that would be causing any
> practical problems or extra stalls?
>
> > If not otherwise, the presence of callbacks is much more of a concern.
> > In a file system where clients go disconnected as a matter of normal
> > operation, callbacks do not give much benefit, at the same time callback
> > breaking _does_ cause stalls.
>
> And on top of that you are blaming the callbacks for your woes.

On top of what?

> Sorry, but I have to cool down before I can respond to any of the rest

This would certainly not hurt. :)

> of your email. In the meantime I'll be busy finding and reverting the
> patch that introduced a global timeout for callback rpcs and any other
> possible regressions that may have been introduced.

Oh, thanks for looking. It would be nice if such stalls could be eliminated.
(When several hundred clients hold callbacks on a volume or on a common directory, some of them are bound to be disconnected or dead by the time you update something, and then it takes time to break their callbacks...)

> Jan

Best regards,
Rune

Received on 2016-04-21 11:48:07