Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


os.walk() python: xml representation of a directory structure, recursion

I am trying to use os.walk() to generate an XML representation of a directory structure, but I am getting a lot of duplicates. The first portion of the XML file is correct: directories are nested properly and files are in the right place. After that, however, the traversal continues and emits the same directories again. I am not sure why.

Here is my code:

def dirToXML(self,directory):
        curdir = os.getcwd()
        os.chdir(directory)
        xmlOutput=""

        tree = os.walk(directory)
        for root, dirs, files in tree:
            pathName = string.split(directory, os.sep)
            xmlOutput+="<dir><name><![CDATA["+pathName.pop()+"]]></name>"
            if len(files)>0:
                xmlOutput+=self.fileToXML(files)
            for subdir in dirs:
                xmlOutput+=self.dirToXML(os.path.join(root,subdir))
            xmlOutput+="</dir>"

        os.chdir(curdir)
        return xmlOutput  

fileToXML simply formats the list of file names as XML, so there is no need to worry about that.

The Directory Structure is simply:

images/
images/testing.xml
images/structure.xml
images/Hellos
images/Goodbyes
images/Goodbyes/foo
images/Goodbyes/bar
images/Goodbyes/square

and the resulting xml file became:

<structure>
<dir>
<name>images</name>
  <files>
    <file>
      <name>structure.xml</name>
    </file>
    <file>
      <name>testing.xml</name>
    </file>
  </files>
  <dir>
    <name>Hellos</name>
  </dir>
  <dir>
    <name>Goodbyes</name>
    <dir>
      <name>foo</name>
    </dir>
    <dir>
      <name>bar</name>
    </dir>
    <dir>
      <name>square</name>
    </dir>
  </dir>
  <dir>
    <name>foo</name>
  </dir>
  <dir>
    <name>bar</name>
  </dir>
  <dir>
      <name>square</name>
    </dir>
  </dir>
  <dir>
    <name>Hellos</name>
  </dir>
  <dir>
    <name>Goodbyes</name>
    <dir>
      <name>foo</name>
    </dir>
    <dir>
      <name>bar</name>
    </dir>
    <dir>
      <name>square</name>
    </dir>
  </dir>
  <dir>
    <name>foo</name>
  </dir>
  <dir>
    <name>bar</name>
  </dir>
  <dir>
    <name>square</name>
  </dir>
</structure>

Any help would be much appreciated!



1 Reply


I'd recommend against using os.walk(), since you have to do so much to massage its output. Instead, just use a recursive function that uses os.listdir(), os.path.join(), os.path.isdir(), etc.

import os
from xml.sax.saxutils import escape as xml_escape

def DirAsXML(path):
    # Open the element and record the directory's own name.
    result = '<dir>\n<name>%s</name>\n' % xml_escape(os.path.basename(path))
    dirs = []
    files = []
    # Split the directory's entries into subdirectories and regular files.
    for item in os.listdir(path):
        itempath = os.path.join(path, item)
        if os.path.isdir(itempath):
            dirs.append(item)
        elif os.path.isfile(itempath):
            files.append(item)
    if files:
        result += '  <files>\n' \
            + '\n'.join('    <file>\n      <name>%s</name>\n    </file>'
            % xml_escape(f) for f in files) + '\n  </files>\n'
    for d in dirs:
        x = DirAsXML(os.path.join(path, d))
        # Indent the child's XML by two spaces. The trailing newline keeps the
        # next sibling or the closing tag on its own line.
        result += '\n'.join('  ' + line for line in x.split('\n')) + '\n'
    result += '</dir>'
    return result

if __name__ == '__main__':
    print('<structure>\n' + DirAsXML(os.getcwd()) + '\n</structure>')
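
As for why the original code produces duplicates: os.walk() already descends into every subdirectory by itself and yields one (root, dirs, files) tuple per directory. Calling dirToXML() again for each subdir inside that loop therefore walks the same subtrees a second time. The sketch below only illustrates that behaviour; the helper names walked_dirs and make_sample_tree are made up for this example, and the sample layout just imitates the one in the question.

```python
import os
import tempfile

def walked_dirs(top):
    """Return the relative path of every directory os.walk() yields, in order."""
    return [os.path.relpath(root, top) for root, dirs, files in os.walk(top)]

def make_sample_tree(base):
    """Create a layout similar to the one in the question (illustrative names)."""
    for sub in ("Hellos", "Goodbyes/foo", "Goodbyes/bar", "Goodbyes/square"):
        os.makedirs(os.path.join(base, *sub.split("/")))
    for name in ("testing.xml", "structure.xml"):
        open(os.path.join(base, name), "w").close()

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        top = os.path.join(tmp, "images")
        os.makedirs(top)
        make_sample_tree(top)
        # One line per directory; no recursion was needed to reach any of them.
        for rel in walked_dirs(top):
            print(rel)
```

Each of the six directories (the top one shows up as ".") is listed exactly once from a single os.walk() call, so any extra recursion on top of it visits them again.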

Personally, I'd recommend a much less verbose XML schema, putting names in attributes and getting rid of the <files> group:

import os
from xml.sax.saxutils import quoteattr as xml_quoteattr

def DirAsLessXML(path):
    # quoteattr() escapes the name and adds the surrounding quotes.
    result = '<dir name=%s>\n' % xml_quoteattr(os.path.basename(path))
    for item in os.listdir(path):
        itempath = os.path.join(path, item)
        if os.path.isdir(itempath):
            # Indent the child's XML by two spaces; the trailing newline keeps
            # the next entry or the closing tag on its own line.
            result += '\n'.join('  ' + line for line in
                DirAsLessXML(itempath).split('\n')) + '\n'
        elif os.path.isfile(itempath):
            result += '  <file name=%s />\n' % xml_quoteattr(item)
    result += '</dir>'
    return result

if __name__ == '__main__':
    print('<structure>\n' + DirAsLessXML(os.getcwd()) + '\n</structure>')

This gives an output like:

<structure>
<dir name="local">
  <dir name=".hg">
    <file name="00changelog.i" />
    <file name="branch" />
    <file name="branch.cache" />
    <file name="dirstate" />
    <file name="hgrc" />
    <file name="requires" />
    <dir name="store">
      <file name="00changelog.i" />

etc.

If os.walk() worked more like expat's callbacks, you'd have an easier time of it.
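
If you do want to stay with os.walk(), one possible approach is to make a single pass and keep a dictionary from directory path to XML element, so each directory is attached to its parent without any recursion. This is only a sketch, not the asker's method: the function name walk_to_xml is hypothetical, it uses the attribute-style schema from above, and it builds the tree with xml.etree.ElementTree instead of string concatenation.

```python
import os
import xml.etree.ElementTree as ET

def walk_to_xml(top):
    """Build a <dir>/<file> element tree from one os.walk() pass (no recursion)."""
    top = os.path.abspath(top)
    root_elem = ET.Element("dir", name=os.path.basename(top))
    elems = {top: root_elem}  # directory path -> its XML element
    for root, dirs, files in os.walk(top):
        # Sorting in place also fixes the order in which os.walk() descends.
        dirs.sort()
        parent = elems[root]
        for f in sorted(files):
            ET.SubElement(parent, "file", name=f)
        for d in dirs:
            # Register each child now; os.walk() visits it later (top-down),
            # using the same joined path as its root.
            elems[os.path.join(root, d)] = ET.SubElement(parent, "dir", name=d)
    return root_elem
```

To get text out, something like ET.tostring(walk_to_xml(os.getcwd()), encoding="unicode") should work; ElementTree takes care of escaping the names.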

