Trent Shipley wrote: >>Good suggestion. Another way (I did this in a C++ routine) is to keep a >>list of all the directory inodes you process. Every time you open a new >>one, you check it against the list. If it matches anything in the list, >>you skip that one. Of course this would only work for traversing a >>reasonbly sized file tree. >> >>Vaughn > > > Was that an ordered or unordered list? (My instinct would be to use an > ordered or perhaps an indexed list.) > > Actually what you want is a perfect hash. Any collision means that you've > visited the node already. (When I last delt with this problem I was working > in PERL and got hashes for free.) > > For a modest set of I-nodes you might use an array. > > There should also be a way to do it very nicely with a combination of array > and bit-mask. Is this all on one filesystem? Symbolic links can cross filesystems. Either you have to prune any branches that leave your filesystem, or your checklist has to be referenced by a filesystem ID + inode, not just the inode by itself. I gather the task here is to visit each eligible node once. The most efficient solution here is very likely to be the most brutally dumb one: a LINEAR SEARCH of all the inodes encountered in the current branch, and throw away your list each time you hit a real file. A given symlink can't be part of more than one chain, so when the chain hits a real file, you're done with the set of links you just traversed. Linear searches that average more than a few can add up to some serious processing time, but I believe the nature of this situation is that a chain of *any* length is an exception. Your typical chain of symlinks is not going to exceed 2. So allow up to 30 in a chain -- you'll never get there. Each time you push a symlink onto the list, search backwards to ensure that inode isn't already in the list. (Backwards reflects my guess that there may be a few links in the chain, but it'll be the last two that are pointing to each other.) When you get to a real file, erase that list and start over. As far as speed is concerned, doing a linear search of your 2 or 20 inodes in a list will take many orders of magnitude less time than the I/O you'll be doing in this process. In the old days engineers went to tremendous trouble to design a fast multiply operation. When they finally hooked up probes to figure out how much of its time the computer was multiplying, the answer was virtually no time at all. The moral is: First, design it simple. If performance becomes a problem, then grease the slow spots. (Of course you have to think about performance from the beginning, but it is *very* hard to anticipate where the performance problems are going to be.) I think you'll find this is dazzlingly fast. If not, you have invested no time in the simple approach, and you can always move on to hashes or bit arrays. So let me put on my propeller beanie, and ... In Perl (which is all I use nowadays) of course the hash is for free, and you probably have room for a lot of inodes. Just store $inodes{$inode} = 1. Again, once you get to a real file, you can set %inodes = (), just forget that you ever saw these guys. In a "real" programming language like C, you could use a bit-array as the most straightforward solution ... you know, a combination of character array and bit-masking, directly indexed by the inode. This would be blazingly fast if you have the memory. However, modern *nix systems allow for a *lot* of inodes -- I don't know how many. Divide that by 8, can you spare that many bytes to map all the possible inodes in the system? Of course this would be multiplied by the number of filesystems your search could span. ... and any time you wanted to purge this bitmap, that would eat some time. So I don't think the bitmap is really any good for this particular problem. Vic