Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
794 views
in Technique[技术] by (71.8m points)

xml twig - Updating xml attribute value based on other with Perl

This is my sample xml file

<manifest>
 <default>
    <remote>remote1</remote>
    <revision>rev1</revision>
 </default>
 <project>
    <name>common</name>
    <path>opensource/device</path>
    <revision>sa</revision>
    <x-ship>oss</x-ship>
 </project>
 <project>
   <name>external</name>
   <path>source/tp</path>
   <x-ship>none</x-ship>
 </project>
 <project>
   <name>ws</name>
   <path>opensource/ws</path>
   <remote>nj</remote>
   <revision>myno</revision>
   <x-ship>none</x-ship>
 </project>
</manifest>

In this I need to update the value of revision only when <path> has "opensource" string in it.

I searched a lot but couldn't find anything useful to achieve this, I could modify the value based on the position as below, can anyone help me updating this? Or let me know if there is a better Perl library to do this.

#!/usr/bin/perl

use strict;
use warnings;

use XML::Simple;

my $xml_file = 'dev.xml';

my $xml = XMLin(
    $xml_file,
    KeepRoot => 1,
    ForceArray => 1,
);

$xml->{manifest}->[0]->{project}->[2]->{revision} = 'kyo';

XMLout(
    $xml,
    KeepRoot => 1,
    NoAttr => 1,
    OutputFile => $xml_file,
);
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

There is definitely a learning curve, but XML::Twig and XPath syntax can handle this quite well. The following demonstrates that for a variation of the fake data that you provided.

Please note that one of the big features to twig is the ability to parse data as you go instead of having to load an extremely large XML file entirely into memory. That may not be a limitation in your case, but is an important feature for some.

use strict;
use warnings;

use XML::Twig;

my $data = do { local $/; <DATA> };

my $t= XML::Twig->new( 
    twig_handlers => {
        q{project[string(path) =~ /opensource/]/revision} => &revision,
    },
    pretty_print => 'indented',
);
$t->parse( $data );
$t->print;

sub revision {
    my ($twig, $rev) = @_;
    $rev->set_text("open source - " . $rev->text());
}

__DATA__
<manifest>
 <default>
    <remote>remote1</remote>
    <revision>rev1</revision>
 </default>
 <project>
    <name>common</name>
    <path>NOTopensource/device</path>
    <revision>sa</revision>
    <x-ship>oss</x-ship>
 </project>
 <project>
   <name>external</name>
   <path>source/tp</path>
   <x-ship>none</x-ship>
 </project>
 <project>
   <name>ws</name>
   <path>opensource/ws</path>
   <remote>nj</remote>
   <revision>myno</revision>
   <x-ship>none</x-ship>
 </project>
</manifest>

Output:

You'll notice that the last revision has open source - prefixed to it.

<manifest>
  <default>
    <remote>remote1</remote>
    <revision>rev1</revision>
  </default>
  <project>
    <name>common</name>
    <path>NOTopensource/device</path>
    <revision>sa</revision>
    <x-ship>oss</x-ship>
  </project>
  <project>
    <name>external</name>
    <path>source/tp</path>
    <x-ship>none</x-ship>
  </project>
  <project>
    <name>ws</name>
    <path>opensource/ws</path>
    <remote>nj</remote>
    <revision>open source - myno</revision>
    <x-ship>none</x-ship>
  </project>
</manifest>

Addendum about sibling elements:

Yes, there are methods to traverse within a twig to nearby xml elements. For example, if I wanted to pull the name of the revision I was editting, and place that in the new text, I could do the following:

sub revision {
    my ($twig, $rev) = @_;
    my $name = $rev->parent()->first_child("name");
    $rev->set_text("open source - " $name->text() . ' - '. $rev->text());
}

Note the ws now added to the edited revision tag:

  <project>
    <name>ws</name>
    <path>opensource/ws</path>
    <remote>nj</remote>
    <revision>open source - ws - myno</revision>
    <x-ship>none</x-ship>
  </project>

This method of traversing the twig to nearby elements can often be a useful way of filtering. I easily could've done the same to enforce this branch be one with a path that contained opensource, but setting that requirement in the xpath for the handler is convenient if one is familiar with the xpath syntax.

Also note, in my above example, I assume there is a sibling of type name. Normally I would check to make sure before calling ->text() or one might get an error.

Addendum about Attributes:

Concerning your edge case with an alternative format:

<project path="opensource" revision="apple" name="platform" x-ship="none"/>

The above contains the same data as other projects, but instead of the values being child elements, they are attributes. This is also a feature of XML, but it is different and so must be handled in a different way.

The following is an edit of the originally suggested script that adds a new handler for projects that contain a path attribute versus a child:

use strict;
use warnings;

use XML::Twig;

my $data = do { local $/; <DATA> };

my $t= XML::Twig->new( 
    twig_handlers => {
        q{project[string(path) =~ /opensource/]/revision} => &revision,
        q{project[@path =~ /opensource/]} => &project,
    },
    pretty_print => 'indented',
);
$t->parse( $data );
$t->print;

sub revision {
    my ($twig, $rev) = @_;
    $rev->set_text("open source - " . $rev->text());
}

sub project {
    my ($twig, $project) = @_;

    $project->set_att(
        revision => 'open source - ' . $project->{att}{revision},
    );
}

__DATA__
<manifest>
 <default>
    <remote>remote1</remote>
    <revision>rev1</revision>
 </default>
 <project>
    <name>common</name>
    <path>NOTopensource/device</path>
    <revision>sa</revision>
    <x-ship>oss</x-ship>
 </project>
 <project path="opensource" revision="apple" name="platform" x-ship="none"/>
 <project>
   <name>external</name>
   <path>source/tp</path>
   <x-ship>none</x-ship>
 </project>
 <project>
   <name>ws</name>
   <path>opensource/ws</path>
   <remote>nj</remote>
   <revision>myno</revision>
   <x-ship>none</x-ship>
 </project>
</manifest>

And just to give you something to compare and learn from, here is the same code but with the filtering done in the handler versus using xpath:

    twig_handlers => {
        q{project[string(path) =~ /opensource/]/revision} => &revision,
        q{project} => &project,
    },

...

sub project {
    my ($twig, $project) = @_;

    if ($project->{att}{path} && $project->{att}{path} =~ /opensource/) {
        $project->set_att(
            revision => 'open source - ' . $project->{att}{revision},
        );
    }
}

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...