How to repair a full Unix directory

A few easy techniques quickly bring the situation under control.

A reader wrote me this week that his bash scripts were complaining "out of memory"; what should he do? It didn't take long to get him moving again.

While my colleague Sandra Henry-Stocker usually covers this territory in her "Unix as a Second Language", the ideas involved in this episode apply nicely in common situations developers and Windows administrators encounter, so I think there is value in reporting them here. My correspondent knew that he wanted to run

    find . -type f -exec grep -i -l -H "keyword" '{}' + | xargs rm -rf

but he was getting "out of memory" because he had millions (!) of files in his directory tree and, if I understood him correctly, was operating an older host with only 256 megabytes of main memory. What should he do?

My first thought:

        # Caution:  this coding is fragile, in that it mishandles filenames which
        # embed blanks.  Accommodating those is a story for another day.
    find . -type f -exec grep -i -l -H "$keyword" {} \; > $INTERMEDIATE_FILE
    for NAME in `cat $INTERMEDIATE_FILE`
    do
        rm -rf $NAME
    done
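For what it's worth, the blanks-safe variant is less work than that caution might suggest. Here is a sketch of my own (not part of the original exchange) that keeps the names NUL-delimited end to end, using GNU grep's `-Z` together with `xargs -0`; it builds a throwaway directory purely so the example is self-contained:

```shell
# Sketch: a blanks-safe cleanup. NUL-delimited names survive embedded
# spaces. Demonstrated in a throwaway directory; "keyword" stands in
# for the reader's real search string.
workdir=$(mktemp -d)
printf 'keyword here\n' > "$workdir/hit file.txt"   # name embeds a blank
printf 'nothing here\n' > "$workdir/miss.txt"

keyword="keyword"
find "$workdir" -type f -print0 \
    | xargs -0 grep -i -l -Z "$keyword" \
    | xargs -0 rm -f
```

Because `xargs` re-invokes `grep` and `rm` in batches sized to fit the system's argument limits, no single command line ever has to hold the whole file list.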

Did that help? "Yes!", the report came back--well, "yes and no." As I'm a big believer that long journeys begin with small steps, I found more encouragement than discouragement in that answer. Apparently the questioner needed to do several waves of cleanup, and "unrolling" the one-liner with an intermediate file helped with some of the out-of-memory situations, but not all.

"One step at a time", I thought. After a little more negotiation, we reduced his symptoms to "

out of memory
" faults with



    find ./ -size -6k -type f >> $INTERMEDIATE_FILE

Did I have any tricks left for those?

Sure; in fact, I have a history of creating this situation for myself. I often use temporary files for various test automations I run, and, unless I'm scrupulous about cleaning up after the tests, it's easy to find myself with tens of thousands of files matching, for example, /tmp/tmp*.log. I've often had so many of these that trying to clean up the mess with

    rm /tmp/tmp*.log

does just what my questioner described: it complains "out of memory". In a case like this, it's time to "eat the elephant one bite at a time", which translates, in this case, to something like

    rm /tmp/tmp*a*.log
    rm /tmp/tmp*b*.log
    rm /tmp/tmp*[g-j]*.log
    rm /tmp/tmp*[A-H]*.log
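If even typing those slices by hand gets tedious, the same idea can be scripted. This is a sketch of mine rather than anything from the exchange: loop over one character at a time, so each `rm` only ever expands the subset of names containing that character (demonstrated against a scratch directory to keep the example self-contained):

```shell
# Sketch: "one bite at a time", automated. Each rm's glob expands only
# the names containing the current character, so no single argument
# list need cover the whole directory.
scratch=$(mktemp -d)
touch "$scratch/tmp1a2.log" "$scratch/tmp3b4.log" "$scratch/tmp5c6.log"

for slice in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
    rm -f "$scratch"/tmp*"$slice"*.log
done
```

The `-f` flag matters here: when a slice matches nothing, the unexpanded glob reaches `rm` as a literal name, and `-f` keeps `rm` quiet about it. Names containing no letters at all would survive this loop and need a digit pass of their own.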

In English, the idea is to specify a subset of the matching files small enough to fit in memory, but large enough to nibble away at the whole list. After slicing out a few "chunks", we quickly reduce the collection of remaining files to a manageable size, where more traditional bash programming can take over.


With find, a homologous approach would be something like

    find . -name "*a*" -size -6k -type f >> $INTERMEDIATE_FILE
    find . -name "*[bc]*" -size -6k -type f >> $INTERMEDIATE_FILE

The excitement wasn't quite over yet, of course; situations like this seem always to have "loose ends". In the case of my questioner, he had many files whose names included non-ASCII Unicode characters. I've got plenty of tricks for dealing with those, too, including switching to Tcl for my scripting. This time, though, we started with the files whose names were easy to express, processed all of them, and then determined, to my non-surprise, that the residuum was small enough that the questioner could handle it with his usual bash coding skills. Mission accomplished.

What's the conclusion? I don't have a particularly polished aphorism to summarize what happened. I do know, though, that many cases that look like "show-stoppers" the first time encountered turn out to be easy to solve for someone with just a little more experience. If you're feeling stuck, be clear with yourself what your true requirements are, what you're getting, and what appears to constrain you. Ask for help; someone else, with a different perspective, might quickly see a way to fit together all the elements of your problem to make a solution.

There's also a lesson here about craft-work that I don't yet know how to put into words. Part of the difference between "textbook learning" and the kind of professional training that diesel mechanics, physicians, lawyers, and plumbers all practice has to do with learning how to handle novel situations. It involves thorough apprenticeship in the basics, followed by exposure to progressively more challenging variations. If

    rm *

doesn't give you what you want, break down the "*" part into pieces small enough to handle.

This story, "How to repair a full Unix directory" was originally published by ITworld.

Copyright © 2010 IDG Communications, Inc.
