On Wed, Sep 04, 2002 at 02:38:10PM +0200, Stephan Kanthak wrote:
> After having a more or less detailed look into distributed file systems after
> reading a lot of manuals I am wondering a little bit about one single point
> in the current implementations (includes CODA):
>
> Why is there the need of dedicated file servers? I think it would be easier to
> use and handle if there are only two types of machines:
>
> - one central server that holds the complete list of files (not the files
>   itself) and a list of all connected clients
> - a lot of clients with cache(s) that hold the files' data

Disconnected operation
    When a client goes offline, other clients would not be able to get
    the data it holds. It is also easier for system administrators to
    back up all data from a 'single known location' than to force
    everyone to bring their laptops in once in a while. And I would hate
    to be connected over a dialup line around the time backups kick in :)

Cache replacement
    When a client's cache fills up, we can make room without having to
    check with the servers whether we happen to be the last client that
    has a copy of the file.

Security
    Clients are considered to be untrusted; any permissions are
    ultimately checked on the central server. If your client does not
    have the required 'token' that gives it permission to read or modify
    a file, it will not be able to even touch that data.

    With peer-to-peer cache sharing, how does one client know whether
    another, completely unknown client is really allowed to get that
    file? Either it has to have access to the auth2 password so that it
    can check the presented token, which conflicts with clients being
    untrusted. Or it has to forward the token to the central server for
    verification, which is a performance killer. Or it could serve only
    encrypted files, with the keys obtainable only from the central
    server by the client that wants to see the file contents.
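The first two options in the Security paragraph can be illustrated with a keyed MAC. This is only a minimal sketch, not Coda's actual token format or auth2 protocol: `SERVER_SECRET`, `issue_token`, and `verify_token` are hypothetical names, and the token is simply an HMAC over the client and file identifiers. The point it shows is that whoever verifies a token needs the same secret that issued it, so a peer either holds the secret (and is therefore trusted) or must ask the server.

```python
import hashlib
import hmac

# Hypothetical secret, standing in for whatever only the central server knows.
SERVER_SECRET = b"known-only-to-the-central-server"


def issue_token(secret: bytes, client_id: str, file_id: str) -> str:
    """Server side: bind one client to one file with a keyed MAC."""
    # A real format would need unambiguous encoding; ':' is enough for a sketch.
    message = f"{client_id}:{file_id}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_token(secret: bytes, client_id: str, file_id: str, token: str) -> bool:
    """Recompute the MAC and compare in constant time.

    Only a party that holds `secret` can do this, which is exactly
    why an untrusted peer cannot check the token on its own.
    """
    expected = issue_token(secret, client_id, file_id)
    return hmac.compare_digest(expected, token)


token = issue_token(SERVER_SECRET, "alice", "/coda/doc.txt")
# The server (or a peer that was handed the secret) accepts the right client...
assert verify_token(SERVER_SECRET, "alice", "/coda/doc.txt", token)
# ...and rejects the same token presented by someone else.
assert not verify_token(SERVER_SECRET, "mallory", "/coda/doc.txt", token)
# A peer without the secret cannot verify anything and has to forward the token.
assert not verify_token(b"peer-guess", "alice", "/coda/doc.txt", token)
```

In this sketch every peer-to-peer fetch costs either a copy of the secret or a round trip to the server, which is the trade-off described above.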
> Performance should be not worse than NFS in that case, even better
> because e.g. under linux the file system cache in memory will only hold
> directory inodes.

Actually, Linux caches all inodes in the icache, directory lookups (both
successful and unsuccessful) in the dcache, and file data in the
pagecache. So it doesn't really matter where the data is stored; even
when it is in a flat file it will be cached. A more scalable solution
would probably be a backend database, either mysql, libdb3, or (if
updates are infrequent) cdb.

> That system could work quite well even on server disconnect, because
> the local filesystem still works. And you don't need any adminstration
> of file servers or dedicated file servers. The server can also keep track
> of preconfigured minimum and maximum replication counts in order to

Ehh, there still is a central server for the file location lookups, so
there is some administration involved. Also, when there are multiple
servers, which ones are responsible for storing the location information
of an object? Do they also store data to validate the object (md5/sha
checksum, decryption keys)? What if the location server is offline and
an update occurs? Is the update then stored at some other available
server, so that when the original server comes back we have 2
conflicting 'locations' for an object?

And when a client updates a file, does the server force-feed the update
to other clients, even when they are not interested, just to maintain
the replication counts? I wouldn't want that when I'm connected over
dialup either.

I do think it would be a very interesting project, and quite feasible.
Intermezzo actually has some of these ideas, and others have come up in
Coda meetings, although we were mostly discussing clients 'stealing'
from other clients' caches when that gives them some performance
benefit. The authoritative copy would still be on the file servers.
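To make the bookkeeping behind the replication-count and cache-replacement questions concrete, here is a minimal sketch of what a central location server would have to track in the proposed design. It is an illustration only: `LocationIndex`, `register`, and `request_evict` are hypothetical names, not anything from Coda or Intermezzo, and the sketch ignores disconnected clients entirely.

```python
from collections import defaultdict


class LocationIndex:
    """Hypothetical central index: which clients hold a copy of each file."""

    def __init__(self, min_replicas: int = 1):
        self.min_replicas = min_replicas
        self.holders = defaultdict(set)  # file_id -> set of client ids

    def register(self, file_id: str, client_id: str) -> None:
        """Record that a client now caches a copy of the file."""
        self.holders[file_id].add(client_id)

    def request_evict(self, file_id: str, client_id: str) -> bool:
        """Ask permission to drop a cached copy.

        Refused when dropping it would leave fewer than
        `min_replicas` copies anywhere in the system.
        """
        current = self.holders[file_id]
        if client_id not in current:
            return True  # the index has no record of this copy
        if len(current) - 1 < self.min_replicas:
            return False  # this may be the last copy: client must keep it
        current.remove(client_id)
        return True


index = LocationIndex(min_replicas=1)
index.register("report.txt", "laptop")
index.register("report.txt", "desktop")
assert index.request_evict("report.txt", "laptop") is True
# The desktop now holds the only copy and is not allowed to free the space.
assert index.request_evict("report.txt", "desktop") is False
```

Every eviction in this sketch is a round trip to the index, and a client with a full cache can be told to keep the data anyway. With dedicated file servers holding the authoritative copy, a client can drop cached data without asking.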
> If I missed any implementation of a distributed file system that has exactly
> the features mentioned above, please tell me. I already tried CODA and

I guess you looked at typical P2P systems like FreeNet and Gnutella,
etc.? Then there are also Unison and Intermezzo.

Jan

Received on 2002-09-04 11:00:01