Coda File System

Re: Building a coda "appliance"

From: Yan Seiner <yan_at_seiner.com>
Date: Sat, 30 Jun 2007 08:08:17 -0700
Jan Harkes wrote:
> On Wed, Jun 27, 2007 at 10:36:35AM -0700, Yan Seiner wrote:
>   
>> I'm trying to build a pair of coda "appliances" - basically embedded 
>> boxes with a VPN and coda server/client, each acting as a samba server 
>> to its network.  The goal is to have two identical replicas of the same 
>> data.
>>
>> One side would be the server, the other would be the client.  Otherwise 
>> the boxes would be identical.
>>
>> I've got coda built and installed, and now I'm trying to map out my 
>> approach.
>>
>> The hardware consists of a 200 MHz ARM CPU with 32 MB of RAM.  The data 
>> consists of approximately 300 GB of CAD files.
>>
>> Is this enough RAM?  Can the RVM metadata be kept in a swap partition or 
>> do I need physical RAM for it?
>>     
>
> Sounds like your hardware is in the same ballpark as the Linksys NSLU I
> have at home. I guess it 'could' run a server, but I really haven't
> tried.
>   
Pretty close.  I can get the hardware with up to 128 MB RAM if needed.

> The metadata is VM backed, so having swap space is definitely useful;
> the server doesn't really care about physical RAM except that swapping
> will slow it down, which in turn would cause the client to switch to
> disconnected or weakly connected operation even over a well-connected
> network.
>
> We use a private mmap of the RVM data file, so in low memory situations,
> clean in-memory pages are simply discarded and paged back in. Dirty
> pages are written to swap.
>
> A problem with your setup is that the box that runs the server will also
> have to run a client in order to provide local access. But that will
> mean that the metadata is cached both by the server and the client,
> and possibly some in the kernel and in the samba daemon. And I think
> that would really get a bit tight with only 32MB of memory.
>   
If that becomes the major issue then it wouldn't be too hard to set up a 
server and two clients, but if I understand your comments then I suspect 
there are other issues.
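
To make the paging behaviour Jan describes concrete: with a private,
copy-on-write mapping, untouched pages can always be reclaimed and
re-read from the data file, while modified pages are anonymous memory
that only swap can back. A minimal sketch of that kind of mapping (the
"rvm_data" file name is a placeholder, not Coda's actual layout):

    # Minimal sketch of a MAP_PRIVATE mapping as described above;
    # "rvm_data" is a placeholder name, not Coda's actual layout.
    import mmap

    with open("rvm_data", "r+b") as f:
        m = mmap.mmap(f.fileno(), 0,
                      prot=mmap.PROT_READ | mmap.PROT_WRITE,
                      flags=mmap.MAP_PRIVATE)
        value = m[0]              # clean page: demand-paged in from the file
        m[0] = (value + 1) % 256  # dirty page: copy-on-write, backed by swap only
        m.close()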

> Another problem is that clients connected to the samba daemon won't be
> able to repair conflicts, so a conflict is pretty much fatal in such a
> setup (and, in Coda's optimistic model, unavoidable).
>   
So we'd be better off using a Coda client on each PC workstation.

> Also, unlike the samba and nfsd daemons, the Coda servers are stateful:
> they remember which clients fetched a copy of which objects and send
> callbacks if any of those files change. Every callback requires a bit of
> allocated memory; with many files x many clients it does add up, but in
> your case you'd only have two, maybe three, clients.
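
To illustrate why that state costs memory, here is a hypothetical
sketch of the bookkeeping a stateful server has to do (illustrative
only, not Coda's actual data structures):

    # Hypothetical sketch of callback bookkeeping in a stateful server;
    # not Coda's actual implementation.
    callbacks = {}  # object id -> set of clients holding a callback promise

    def register_fetch(fid, client):
        # A client fetched object `fid`: remember the callback promise.
        callbacks.setdefault(fid, set()).add(client)

    def break_callbacks(fid):
        # Object `fid` changed: notify every client that cached it.
        for client in callbacks.pop(fid, set()):
            notify(client, fid)  # placeholder for the callback-break RPC

    def notify(client, fid):
        print("callback break for", fid, "->", client)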
>
> Our old Coda deployment ran on reasonably modest hardware. The Coda
> testserver was a Pentium 90 with 64MB of memory and it didn't really
> have much trouble, although it did have swap, rvm log, rvm data and
> the file data (/vicepa) on separate spindles (4 SCSI drives).
> The main server group used to consist of something like 200MHz PII
> machines with 128MB of memory, but again we spread swap, rvm log, rvm
> data and file data across different disks.
>
>   
>> Also, how should I structure this so that all the data is available to 
>> both sides, even in the event of a VPN failure?  (These boxes would be 
>> pretty much on opposite sides of the globe, so I can't really be sure 
>> the VPN will be available 100%.)  This means that the client would have 
>> to actively hoard all the data?  Is that practical?  Or should I use a 
>> different approach?
>>     
>
> It really depends on how many file objects you are talking about. If
> each file is 1GB, then we're just talking about ~300 files, and I don't
> see any possible problem hoarding everything.
>
> If each file is ~4KB then I don't think it is feasible (at the moment);
> the client won't be able to keep all the metadata in memory, and it
> will basically bring the device to a virtual halt in a swap frenzy
> about every 10 minutes during the hoard walk.
>   
The files tend to be large, but not that large. I'll have to look at it.
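
For a rough sense of the two extremes Jan describes (the ~1KB of client
metadata per object below is an assumed round figure, not a measured
Coda number):

    # Back-of-the-envelope numbers for 300GB of data; the ~1KB of client
    # metadata per object is an assumed round figure, not a measured one.
    DATA = 300 * 2**30  # 300 GB of CAD data

    for size, label in ((2**30, "1GB files"), (4 * 2**10, "4KB files")):
        objects = DATA // size
        meta_mb = objects * 1024 / 2**20  # assumed ~1KB metadata per object
        print("%s: ~%d objects, ~%.0f MB of metadata" % (label, objects, meta_mb))
    # 1GB files: ~300 objects, ~0 MB of metadata
    # 4KB files: ~78643200 objects, ~76800 MB of metadata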

> Have you considered a setup that periodically mirrors or syncs both
> sites with something like unison or rsync? I just think that if your
> clients are going to be using a stateless filesystem to access the data
> on the appliances, they would just suffer from the drawbacks of Coda's
> weaker consistency model (no file locking, files becoming inaccessible
> due to conflicts) without really benefiting from Coda's features
> (persistent local disk cache, fast access to cached file data, writeback
> logging and log optimizations, directory ACLs for access control).
>   
I used to use unison to do something like this.
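
Something along these lines, run from cron, would be the simplest form
of the periodic mirror Jan suggests (the host and paths are
placeholders, and it assumes rsync over ssh is available on both
appliances):

    # Cron-driven one-way mirror sketch; the host and paths are
    # placeholders, and rsync/ssh are assumed present on both boxes.
    import subprocess

    SRC = "/srv/cad/"                # local CAD tree (placeholder)
    DST = "remote-office:/srv/cad/"  # peer appliance (placeholder)

    def mirror():
        # -a preserves times/permissions, -z compresses over the slow VPN,
        # --delete propagates removals so the replicas stay identical.
        subprocess.run(["rsync", "-az", "--delete", SRC, DST], check=True)

    if __name__ == "__main__":
        mirror()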

What I was hoping for is a real-time solution, as the two offices are 
nearly 12 hours apart, and our workweeks are different due to cultural 
differences, giving us only a few hours a week of overlap (or downtime, 
depending on the perspective).

What concerns me is the comment that Coda doesn't do file locking.  
Maybe I misread something, or just assumed that Coda would do remote 
file locking.

In our situation, we work with "assemblies" where each assembly consists 
of many separate files, which are opened concurrently either RO or RW, 
and file locking across the network is essential to prevent corruption 
of the entire assembly.

Am I reading your comments correctly, in that Coda doesn't do file 
locking across the network?

Or would that just be a consequence of my proposed samba<->Coda 
setup?  Would I get file locking if each PC was a Coda client?
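
For concreteness, the kind of lock I mean is an ordinary whole-file
advisory lock taken on each client, something like the sketch below
(the file name is a placeholder):

    # Illustrative only: a whole-file advisory lock of the kind the
    # workflow relies on. Whether a second client across the network
    # would see this lock held is exactly the question above.
    import fcntl

    with open("assembly.prt", "r+b") as f:  # placeholder file name
        fcntl.lockf(f, fcntl.LOCK_EX)       # take an exclusive advisory lock
        # ... read and modify the assembly file while holding the lock ...
        fcntl.lockf(f, fcntl.LOCK_UN)       # release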

--Yan

-- 
  o__
  ,>/'_          o__
  (_)\(_)        ,>/'_          o__
Yan Seiner      (_)\(_)         ,>/'_   o__     o__
Certified Personal Trainer     (_)\(_)  ,>/'_   ,>/'_
Licensed Professional Engineer         (_)\(_) (_)\(_)

Linux has made big progress over the competition: when things sit and don't start right away, we get a watch cursor, while those poor guys have to settle for an hourglass.
Received on 2007-06-30 11:11:02