Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


perl - How to make Mason2 UTF-8 clean?

Reformulating the question, because

Comment: This question already earned the "Popular Question" badge, so I'm probably not the only hopeless person. :)

Unfortunately, demonstrating the full problem stack leads to a very long question, and it is very Mason-specific.

First, the opinions-only part :)

I have been using HTML::Mason for ages and am now trying to use Mason2. Poet and Mason are the most advanced frameworks on CPAN. I have found nothing comparable that allows writing such clean (but very hackable :)) web apps out of the box, with many batteries included (logging, caching, config management, native PSGI support, etc.).

Unfortunately, the author doesn't care about the rest of the world: by default it is ASCII-only, without any manual, FAQ or advice about how to use it with Unicode.

Now the facts. Demo. Create a Poet app:

poet new my #the "my" directory is the $poet_root
mkdir -p my/comps/xls
cd my/comps/xls

and add the following to dhandler.mc (it demonstrates the two basic problems):

<%class>
    has 'dwl';
    use Excel::Writer::XLSX;
</%class>
<%init>
    my $file = $m->path_info;

    $file =~ s/[^\w.]//g;
    my $cell = lc join ' ', "ÅNGSTRÖM", "in the", $file;

    if( $.dwl ) {
        #create xlsx in the memory
        my $excel;
        open my $fh, '>', \$excel or die "Failed open scalar: $!";
        my $workbook  = Excel::Writer::XLSX->new( $fh );
        my $worksheet = $workbook->add_worksheet();
        $worksheet->write(0, 0, $cell);
        $workbook->close();

        #poet/mason output
        $m->clear_buffer;
        $m->res->content_type("application/vnd.ms-excel");
        $m->print($excel);
        $m->abort();
    }
</%init>
<table border=1>
<tr><td><% $cell %></td></tr>
</table>
<a href="?dwl=yes">download <% $file %></a>

and run the app

../bin/run.pl

go to http://0:5000/xls/hello.xlsx and you will get:

+----------------------------+
| ÅngstrÖm in the hello.xlsx |
+----------------------------+
download hello.xlsx

Clicking "download hello.xlsx" gives you hello.xlsx in your downloads.

The above demonstrates the first problem: the component's source isn't "under" use utf8;, so lc doesn't understand non-ASCII characters.
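The effect can be reproduced outside Mason. The following is only a minimal standalone sketch, assuming nothing beyond the core Encode module; the UTF-8 octets are written with escapes so that the result does not depend on how the script file itself is saved:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 octets of "ÅNGSTRÖM", as a component without `use utf8` sees them
my $octets = "\xC3\x85NGSTR\xC3\x96M";

# lc on the raw octets only lowercases the ASCII letters
my $lc_octets = lc $octets;

# Decode to characters first; lc then handles Å and Ö as well
my $lc_chars = lc decode('UTF-8', $octets);

# Encode again so both results can be compared byte by byte
my $lc_encoded = encode('UTF-8', $lc_chars);

printf "lc on octets:     %vX\n", $lc_octets;
printf "lc on characters: %vX\n", $lc_encoded;
```

The first line of output still contains the bytes C3.85 and C3.96 (Å and Ö), the second one contains C3.A5 and C3.B6 (å and ö).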

The second problem is the following: try http://0:5000/xls/hélló.xlsx or http://0:5000/xls/h%C3%A9ll%C3%B3.xlsx and you will see:

+--------------------------+
| ÅngstrÖm in the hll.xlsx |
+--------------------------+
download hll.xlsx
#note the wrong filename

Of course, the input (the path_info) isn't decoded; the script works with the UTF-8 encoded octets and not with Perl characters.
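Why the filename loses its accented letters can be shown with plain Perl. This is a hedged sketch, not Poet code: the percent-decoding is done by hand here (in a PSGI app the server normally does that step), and only the core Encode module is assumed:

```perl
use strict;
use warnings;
use Encode qw(decode);

# The path as it appears percent-encoded in the request URL
my $raw = '/h%C3%A9ll%C3%B3.xlsx';

# Undo the percent-encoding; the result is still UTF-8 octets
(my $octets = $raw) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

# Same cleanup as in the component, applied to octets:
# the bytes of é and ó do not match \w and are removed
(my $from_octets = $octets) =~ s/[^\w.]//g;

# Decode first, then clean up: é and ó are word characters and survive
(my $from_chars = decode('UTF-8', $octets)) =~ s/[^\w.]//g;

printf "from octets:     %s (%d chars)\n", $from_octets, length $from_octets;
printf "from characters: %d chars\n", length $from_chars;
```

The octet version ends up as hll.xlsx, exactly the wrong filename shown above, while the decoded version keeps all ten characters.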

So, telling Perl that the source is in UTF-8, by adding use utf8; into the <%class> block, results in:

+--------------------------+
| ?ngstr?m in the hll.xlsx |
+--------------------------+
download hll.xlsx

Adding use feature 'unicode_strings' (or use 5.014;) is even worse:

+----------------------------+
| ?ngstr?m in the h?ll?.xlsx |
+----------------------------+
download h?ll?.xlsx

Of course, the source now contains wide characters, so it needs Encode::encode_utf8 at the output.
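A small standalone sketch of why the unencoded output shows up as broken characters in a browser that expects UTF-8. The helper is_valid_utf8 is a hypothetical name introduced only for this demo; everything else is the core Encode module:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper for this demo: true if the byte string is well-formed UTF-8
sub is_valid_utf8 {
    my ($bytes) = @_;
    return eval { decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC); 1 } ? 1 : 0;
}

# Character string "ångström" (the result of lc), å and ö given as code points
my $chars = "\x{E5}ngstr\x{F6}m";

# Sent as-is, each of these characters goes out as one Latin-1 byte,
# which is not a valid UTF-8 sequence
my $raw_ok = is_valid_utf8($chars);

# Encoded once at the output, the same text becomes valid UTF-8 octets
my $octets     = encode('UTF-8', $chars);
my $encoded_ok = is_valid_utf8($octets);

print "raw valid: $raw_ok, encoded valid: $encoded_ok\n";
```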

One could try to use a filter such as:

<%filter uencode><% Encode::encode_utf8($yield->()) %></%filter>

and filter the whole output:

% $.uencode {{
<table border=1>
<tr><td><% $cell %></td></tr>
</table>
<a href="?dwl=yes">download <% $file %></a>
% }}

but this helps only partially, because one still needs to care about the encoding in the <%init> or <%perl> blocks. Encoding and decoding inside the Perl code in many places (read: not at the borders) leads to spaghetti code.

The encoding/decoding should clearly be done somewhere at the Poet/Mason borders, because Plack of course operates at the byte level.


Partial solution.

Happily, Poet cleverly allows modifying its (and Mason's) parts, so in $poet_root/lib/My/Mason you could modify Compilation.pm to:

override 'output_class_header' => sub {
    return join("\n",
        super(), qq(
        use 5.014;
        use utf8;
        use Encode;
        )
    );
};

which will insert the wanted preamble into every Mason component. (Don't forget to touch every component, or simply remove the compiled objects from $poet_root/data/obj.)

Also you could try to handle the requests/responses at the borders, by editing $poet_root/lib/My/Mason/Request.pm to:

#found this code somewhere on the net
use Encode;
override 'run' => sub {
    my($self, $path, $args) = @_;

    #decode values - but still missing the "keys" decode
    foreach my $k (keys %$args) {
        $args->set($k, decode_utf8($args->get($k)));
    }

    my $result = super();

    #encode the output - BUT THIS BREAKS the inline XLS
    $result->output( encode_utf8($result->output()) );
    return $result;
};

Encoding everything is the wrong strategy; it breaks e.g. the XLS.
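A more selective rule could look roughly like the sketch below. It is plain Perl, not an answer to where to hook it into Poet/Mason: encode_body_if_text is a hypothetical helper name, and treating only text/* content types as text is an assumption made for illustration:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper, not part of Mason or Poet: encode the body only
# when the response is declared as text; return binary payloads untouched.
sub encode_body_if_text {
    my ($content_type, $body) = @_;
    return $body unless defined $content_type && $content_type =~ m{^text/}i;
    return encode('UTF-8', $body);
}

# Character data for an HTML response gets encoded to UTF-8 octets
my $html = encode_body_if_text('text/html', "<td>\x{E5}ngstr\x{F6}m</td>");

# Already-binary data (here just a few sample bytes) passes through unchanged
my $xlsx = encode_body_if_text('application/vnd.ms-excel', "PK\x03\x04\xFF");

printf "html: %d octets, xlsx: %d octets\n", length $html, length $xlsx;
```

A response without any content type is passed through unencoded in this sketch; a real application would have to decide what its default should be.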

So, 4 years later (I asked the original question in 2011) I still don't know :( how to use Unicode correctly in Mason2 applications, and there still isn't any documentation or helpers for it. :(

The main questions are: where (which methods should be modified by Moose's method modifiers) and how to correctly decode the inputs and encode the output (in the Poet/Mason app)

  • but only the textual ones, e.g. text/plain or text/html and such...
  • and do the above "surprise free", i.e. so that it simply works. ;)

Could someone please help with real code: what should I modify in the above?


1 Reply


The Mason2 manual describes how component inheritance works, so I think that putting this common code in your main Base.mp component (from which all the others inherit) might solve your issue.

Creating plugins is described in Mason::Manual::Plugins.

So, you can build your own plugin that modifies Mason::Request, and by overriding request_args() you can return the UTF-8 decoded parameters.
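The decoding step itself could be sketched as below, independent of how it gets wired into a plugin. decode_params is a hypothetical helper name, and the input is assumed to be a plain hash reference of UTF-8 octets (not the argument object used in the question); unlike a values-only loop, it decodes the keys too:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return a new hash ref with keys and values decoded
# from UTF-8 octets to Perl characters. Multi-valued parameters are
# assumed to be array refs.
sub decode_params {
    my ($args) = @_;
    my %decoded;
    for my $key (keys %$args) {
        my $value = $args->{$key};
        $decoded{ decode('UTF-8', $key) } =
            ref($value) eq 'ARRAY'
            ? [ map { decode('UTF-8', $_) } @$value ]
            : decode('UTF-8', $value);
    }
    return \%decoded;
}

# Sample input: octets for the key "náme" and the value "hélló"
my $params = decode_params({
    "n\xC3\xA1me" => "h\xC3\xA9ll\xC3\xB3",
    dwl           => 'yes',
});

printf "%s: %d chars\n", ($_ eq 'dwl' ? 'dwl' : 'name'), length $params->{$_}
    for sort keys %$params;
```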

Edit:

Regarding the UTF-8 output, you can add an Apache directive to ensure that text/plain and text/html outputs are always interpreted as UTF-8:

AddDefaultCharset utf-8
