perlperf

(源代码, CPAN)

内容

名称
描述
概述
一般指南
基准测试
- 分配和解引用变量。
- 搜索和替换或 tr
性能分析工具
排序
日志记录
- 如果 DEBUG（常量）则记录日志
附录
参见
- PERLDOCS
- 手册页
- 模块
- URL
作者

#名称

perlperf - Perl 性能和优化技巧

#描述

本文介绍了可用于 Perl 程序的性能和优化技巧，特别是针对 Perl 程序。虽然许多 Perl 开发人员来自其他语言，并且可以在适当的情况下使用他们先前的知识，但还有许多其他人可能会从一些 Perl 特定的提示中受益。如果你想要简短的版本，也许最好的建议来自著名的日本武士宫本武藏，他在 1645 年说过

"Do Not Engage in Useless Activity"

。

#概述

程序员最常犯的错误可能是试图在程序真正发挥作用之前优化代码——这是一个坏主意。一个运行速度极快的程序如果不能正常工作，就没有意义。首要任务是让程序正确地完成一些有用的事情（更不用说确保测试套件完全正常工作），然后才考虑优化它。在决定优化现有的工作代码后，需要考虑几个简单但必不可少的步骤，这些步骤是任何优化过程的内在组成部分。

#横向迈出一步

首先，您需要为现有代码建立一个基准时间，这个时间需要可靠且可重复。您可能需要使用Benchmark或Devel::NYTProf模块，或类似的工具，或者使用Unix系统的time工具，具体取决于情况。请参阅本文档的底部，获取更多关于基准测试和分析模块的列表，以及推荐的进一步阅读材料。

#向前迈出一步

接下来，在检查了程序的热点（代码运行缓慢的地方）之后，更改代码以使其运行得更快。使用版本控制软件，如subversion，可以确保所有更改都是可逆的。在代码中随意修改很容易——不要一次更改太多内容，否则您可能无法发现真正导致代码运行缓慢的部分。

#再横向迈出一步

仅仅说：“这会让它运行得更快”是不够的，您需要进行验证。在上述第一步中，使用基准测试或分析模块重新运行代码，并检查新代码是否在更短的时间内执行了相同的任务。保存您的工作并重复...

#一般指南

在考虑性能时，最关键的一点是要记住，没有所谓的“银弹”，因此没有硬性规则，只有指导原则。

很明显，内联代码比子程序或方法调用更快，因为开销更少，但这种方法的缺点是可维护性较差，并且会以更大的内存使用为代价。没有免费的午餐。如果你要在一个列表中搜索一个元素，将数据存储在哈希结构中，然后简单地查看键是否已定义，可能比使用 grep() 等遍历整个数组更有效。substr() 可能比 grep() 快（得多），但不如它灵活，因此你还有另一个权衡。你的代码可能包含一行代码需要 0.01 秒才能执行，如果你调用它 1,000 次，很可能在一个程序解析中等大小的文件时，你已经有了 10 秒的延迟，仅仅在一个代码位置，如果你调用该行 100,000 次，你的整个程序将慢到无法忍受。

将子程序用作排序的一部分是一种获得你想要结果的强大方法，但通常比内置的字母cmp和数字<=>排序运算符慢。可以对数据进行多次遍历，构建索引以使即将进行的排序更高效，并使用所谓的OM（兽人策略）提前缓存排序键。缓存查找虽然是一个好主意，但它本身可能是减速的来源，因为它强制对数据进行两次遍历 - 一次是设置缓存，一次是排序数据。使用pack()将所需的排序键提取到一个一致的字符串中，可以是一种有效的方式来构建一个用于比较的单个字符串，而不是使用多个排序键，这使得可以使用标准的、用c编写的、快速的 perl sort()函数对输出进行排序，这也是GRT（Guttman Rossler 变换）的基础。某些字符串组合可能会使GRT变慢，仅仅是因为它们过于复杂。

对于使用数据库后端的应用程序，标准的DBIx命名空间试图帮助保持事物快速，尤其是因为它试图不查询数据库直到最后一刻，但始终阅读你选择的库附带的文档。在处理数据库的开发人员面临的众多问题中，应该始终使用SQL占位符，并在可能有利的情况下考虑预取数据集。通过将多个进程分配给解析单个文件来拆分一个大型文件，例如使用POE、threads或fork，也可以是一种优化你对可用CPU资源使用情况的有效方法，尽管这种技术充满了并发问题，并且需要高度关注细节。

每个案例都有特定的应用场景和一个或多个例外情况，没有比运行一些测试并找出哪种方法最适合你的特定环境更好的方法了，这就是为什么编写最佳代码不是一门精确的科学，也是为什么我们如此喜欢使用 Perl - TMTOWTDI。

#基准测试

以下是一些示例，用于演示 Perl 基准测试工具的使用。

#变量赋值和解引用。

我相信我们大多数人都见过类似（或更糟）的代码

if ( $obj->{_ref}->{_myscore} >= $obj->{_ref}->{_yourscore} ) {
    ...

这种代码看起来很糟糕，而且对拼写错误非常敏感，明确地解引用变量会更清晰。我们正在回避使用面向对象编程技术来封装通过方法访问变量的问题，这些方法只能通过对象访问。这里我们只是讨论选择的技术实现，以及它是否会影响性能。我们可以通过将比较代码放在一个文件中并运行一个 Benchmark 测试来查看这种解引用操作是否有任何开销。

# 解引用

#!/usr/bin/perl

use v5.36;

use Benchmark;

my $ref = {
        'ref'   => {
            _myscore    => '100 + 1',
            _yourscore  => '102 - 1',
        },
};

timethese(1000000, {
        'direct'       => sub {
          my $x = $ref->{ref}->{_myscore} . $ref->{ref}->{_yourscore} ;
        },
        'dereference'  => sub {
            my $ref  = $ref->{ref};
            my $myscore = $ref->{_myscore};
            my $yourscore = $ref->{_yourscore};
            my $x = $myscore . $yourscore;
        },
});

必须运行足够多次的任何计时测量，以便数字稳定在数值平均值上，否则每次运行都会由于环境的变化而自然波动，例如，为了减少对 CPU 资源和网络带宽的争用。运行上述代码一百万次迭代，我们可以查看 Benchmark 模块输出的报告，以查看哪种方法最有效。

$> perl dereference

Benchmark: timing 1000000 iterations of dereference, direct...
dereference:  2 wallclock secs ( 1.59 usr +  0.00 sys =  1.59 CPU) @ 628930.82/s (n=1000000)
    direct:  1 wallclock secs ( 1.20 usr +  0.00 sys =  1.20 CPU) @ 833333.33/s (n=1000000)

差异很明显，解引用方法更慢。虽然它在我们的测试中平均每秒执行 628,930 次，但直接方法却多运行了 204,403 次，不幸的是。不幸的是，由于有许多使用多层直接变量访问编写的代码示例，而且通常很糟糕。然而，它确实快了一点。问题仍然是，微不足道的收益是否真的值得眼睛疲劳，或者是否值得失去可维护性。

#搜索和替换或 tr

如果我们有一个需要修改的字符串，虽然正则表达式几乎总是更灵活，但 tr（一个经常被低估的工具）仍然很有用。一种场景可能是将所有元音替换为另一个字符。正则表达式解决方案可能如下所示

$str =~ s/[aeiou]/x/g

tr 替代方案可能如下所示

$str =~ tr/aeiou/xxxxx/

我们可以将其放入一个测试文件中，我们可以运行该文件来检查哪种方法最快，使用全局 $STR 变量分配给 my $str 变量，以避免 perl 尝试通过注意到它只分配了一次来优化任何工作。

# 正则表达式转写

#!/usr/bin/perl

use v5.36;

use Benchmark;

my $STR = "$$-this and that";

timethese( 1000000, {
'sr'  => sub { my $str = $STR; $str =~ s/[aeiou]/x/g; return $str; },
'tr'  => sub { my $str = $STR; $str =~ tr/aeiou/xxxxx/; return $str; },
});

运行代码会给出我们的结果

$> perl regex-transliterate

Benchmark: timing 1000000 iterations of sr, tr...
        sr:  2 wallclock secs ( 1.19 usr +  0.00 sys =  1.19 CPU) @ 840336.13/s (n=1000000)
        tr:  0 wallclock secs ( 0.49 usr +  0.00 sys =  0.49 CPU) @ 2040816.33/s (n=1000000)

tr 版本是明显的赢家。一种解决方案灵活，另一种解决方案快速 - 并且适当地由程序员选择使用哪种解决方案。

查看 Benchmark 文档以获取更多有用的技术。

#性能分析工具

一段稍微大一点的代码将提供一些内容，性能分析器可以根据这些内容生成更全面的报告统计信息。此示例使用简单的 wordmatch 程序，该程序解析给定的输入文件并输出有关内容的简短报告。

# wordmatch

#!/usr/bin/perl

use v5.36;

=head1 NAME

filewords - word analysis of input file

=head1 SYNOPSIS

    filewords -f inputfilename [-d]

=head1 DESCRIPTION

This program parses the given filename, specified with C<-f>, and
displays a simple analysis of the words found therein.  Use the C<-d>
switch to enable debugging messages.

=cut

use FileHandle;
use Getopt::Long;

my $debug   =  0;
my $file    = '';

my $result = GetOptions (
    'debug'         => \$debug,
    'file=s'        => \$file,
);
die("invalid args") unless $result;

unless ( -f $file ) {
    die("Usage: $0 -f filename [-d]");
}
my $FH = FileHandle->new("< $file")
                              or die("unable to open file($file): $!");

my $i_LINES = 0;
my $i_WORDS = 0;
my %count   = ();

my @lines = <$FH>;
foreach my $line ( @lines ) {
    $i_LINES++;
    $line =~ s/\n//;
    my @words = split(/ +/, $line);
    my $i_words = scalar(@words);
    $i_WORDS = $i_WORDS + $i_words;
    debug("line: $i_LINES supplying $i_words words: @words");
    my $i_word = 0;
    foreach my $word ( @words ) {
        $i_word++;
        $count{$i_LINES}{spec} += matches($i_word, $word,
                                          '[^a-zA-Z0-9]');
        $count{$i_LINES}{only} += matches($i_word, $word,
                                          '^[^a-zA-Z0-9]+$');
        $count{$i_LINES}{cons} += matches($i_word, $word,
                                    '^[(?i:bcdfghjklmnpqrstvwxyz)]+$');
        $count{$i_LINES}{vows} += matches($i_word, $word,
                                          '^[(?i:aeiou)]+$');
        $count{$i_LINES}{caps} += matches($i_word, $word,
                                          '^[(A-Z)]+$');
    }
}

print report( %count );

sub matches {
    my $i_wd  = shift;
    my $word  = shift;
    my $regex = shift;
    my $has = 0;

    if ( $word =~ /($regex)/ ) {
        $has++ if $1;
    }

    debug( "word: $i_wd "
          . ($has ? 'matches' : 'does not match')
          . " chars: /$regex/");

    return $has;
}

sub report {
    my %report = @_;
    my %rep;

    foreach my $line ( keys %report ) {
        foreach my $key ( keys $report{$line}->%* ) {
            $rep{$key} += $report{$line}{$key};
        }
    }

    my $report = qq|
$0 report for $file:
lines in file: $i_LINES
words in file: $i_WORDS
words with special (non-word) characters: $i_spec
words with only special (non-word) characters: $i_only
words with only consonants: $i_cons
words with only capital letters: $i_caps
words with only vowels: $i_vows
|;

    return $report;
}

sub debug {
    my $message = shift;

    if ( $debug ) {
        print STDERR "DBG: $message\n";
    }
}

exit 0;

#Devel::DProf

这个久负盛名的模块十多年来一直是 Perl 代码性能分析的实际标准，但已被许多其他模块取代，这些模块将我们带回了 21 世纪。虽然建议您从本文中提到的几个工具以及本文底部 CPAN 列表中的工具中评估您的工具（目前 Devel::NYTProf 似乎是首选工具 - 参见下文），但我们将快速看一下 Devel::DProf 的输出，以设置 Perl 性能分析工具的基线。在命令行上使用 -d 开关，在 Devel::DProf 的控制下运行上述程序。

$> perl -d:DProf wordmatch -f perl5db.pl

<...multiple lines snipped...>

wordmatch report for perl5db.pl:
lines in file: 9428
words in file: 50243
words with special (non-word) characters: 20480
words with only special (non-word) characters: 7790
words with only consonants: 4801
words with only capital letters: 1316
words with only vowels: 1701

Devel::DProf 生成一个特殊文件，默认情况下称为 tmon.out，该文件由 dprofpp 程序读取，该程序已作为 Devel::DProf 发行版的一部分安装。如果您在没有选项的情况下调用 dprofpp，它将读取当前目录中的 tmon.out 文件并生成程序运行的人类可读统计报告。请注意，这可能需要一些时间。

$> dprofpp

Total Elapsed Time = 2.951677 Seconds
  User+System Time = 2.871677 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 102.   2.945  3.003 251215   0.0000 0.0000  main::matches
 2.40   0.069  0.069 260643   0.0000 0.0000  main::debug
 1.74   0.050  0.050      1   0.0500 0.0500  main::report
 1.04   0.030  0.049      4   0.0075 0.0123  main::BEGIN
 0.35   0.010  0.010      3   0.0033 0.0033  Exporter::as_heavy
 0.35   0.010  0.010      7   0.0014 0.0014  IO::File::BEGIN
 0.00       - -0.000      1        -      -  Getopt::Long::FindOption
 0.00       - -0.000      1        -      -  Symbol::BEGIN
 0.00       - -0.000      1        -      -  Fcntl::BEGIN
 0.00       - -0.000      1        -      -  Fcntl::bootstrap
 0.00       - -0.000      1        -      -  warnings::BEGIN
 0.00       - -0.000      1        -      -  IO::bootstrap
 0.00       - -0.000      1        -      -  Getopt::Long::ConfigDefaults
 0.00       - -0.000      1        -      -  Getopt::Long::Configure
 0.00       - -0.000      1        -      -  Symbol::gensym

dprofpp 将生成有关 wordmatch 程序活动的相当详细的报告。时钟时间、用户时间和系统时间位于分析的顶部，之后是定义报告的主要列。查看 dprofpp 文档以了解其支持的许多选项的详细信息。

另请参见 Apache::DProf，它将 Devel::DProf 连接到 mod_perl。

#Devel::Profiler

让我们使用不同的性能分析器查看同一个程序：Devel::Profiler，它是 Devel::DProf 的直接 Perl 替代品。使用方法略有不同，因为您不是使用特殊的 -d: 标志，而是使用 -M 直接将 Devel::Profiler 作为模块引入。

$> perl -MDevel::Profiler wordmatch -f perl5db.pl

<...multiple lines snipped...>

wordmatch report for perl5db.pl:
lines in file: 9428
words in file: 50243
words with special (non-word) characters: 20480
words with only special (non-word) characters: 7790
words with only consonants: 4801
words with only capital letters: 1316
words with only vowels: 1701

Devel::Profiler 生成一个与 dprofpp 程序兼容的 tmon.out 文件，从而节省了构建专用统计信息读取程序的步骤。因此，dprofpp 的用法与上面的示例相同。

$> dprofpp

Total Elapsed Time =   20.984 Seconds
  User+System Time =   19.981 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 49.0   9.792 14.509 251215   0.0000 0.0001  main::matches
 24.4   4.887  4.887 260643   0.0000 0.0000  main::debug
 0.25   0.049  0.049      1   0.0490 0.0490  main::report
 0.00   0.000  0.000      1   0.0000 0.0000  Getopt::Long::GetOptions
 0.00   0.000  0.000      2   0.0000 0.0000  Getopt::Long::ParseOptionSpec
 0.00   0.000  0.000      1   0.0000 0.0000  Getopt::Long::FindOption
 0.00   0.000  0.000      1   0.0000 0.0000  IO::File::new
 0.00   0.000  0.000      1   0.0000 0.0000  IO::Handle::new
 0.00   0.000  0.000      1   0.0000 0.0000  Symbol::gensym
 0.00   0.000  0.000      1   0.0000 0.0000  IO::File::open

有趣的是，我们得到了略微不同的结果，这主要是因为生成报告的算法不同，即使输出文件格式据称是相同的。经过的时间、用户时间和系统时间清楚地显示了Devel::Profiler执行自身运行所花费的时间，但列出的内容感觉比我们之前从Devel::DProf获得的更准确。例如，102% 的数字消失了。这就是我们必须使用我们手头的工具，并认识到它们的优缺点，然后才能使用它们的地方。有趣的是，两个报告中每个子例程的调用次数是相同的，只是百分比不同。正如Devel::Profiler的作者所写

...running HTML::Template's test suite under Devel::DProf shows
output() taking NO time but Devel::Profiler shows around 10% of the
time is in output().  I don't know which to trust but my gut tells me
something is wrong with Devel::DProf.  HTML::Template::output() is a
big routine that's called for every test. Either way, something needs
fixing.

YMMV。

另请参阅Devel::Apache::Profiler，它将Devel::Profiler挂钩到mod_perl。

#Devel::SmallProf

Devel::SmallProf 分析器检查 Perl 程序的运行时，并生成逐行列表以显示每行被调用的次数以及每行执行所花费的时间。它通过在运行时向 Perl 提供熟悉的-d标志来调用。

$> perl -d:SmallProf wordmatch -f perl5db.pl

<...multiple lines snipped...>

wordmatch report for perl5db.pl:
lines in file: 9428
words in file: 50243
words with special (non-word) characters: 20480
words with only special (non-word) characters: 7790
words with only consonants: 4801
words with only capital letters: 1316
words with only vowels: 1701

Devel::SmallProf 默认情况下将输出写入名为smallprof.out的文件。文件的格式如下所示

<num> <time> <ctime> <line>:<text>

程序终止后，可以使用任何标准文本过滤实用程序检查和排序输出。以下内容可能就足够了

$> cat smallprof.out | grep \d*: | sort -k3 | tac | head -n20

251215   1.65674   7.68000    75: if ( $word =~ /($regex)/ ) {
251215   0.03264   4.40000    79: debug("word: $i_wd ".($has ? 'matches' :
251215   0.02693   4.10000    81: return $has;
260643   0.02841   4.07000   128: if ( $debug ) {
260643   0.02601   4.04000   126: my $message = shift;
251215   0.02641   3.91000    73: my $has = 0;
251215   0.03311   3.71000    70: my $i_wd  = shift;
251215   0.02699   3.69000    72: my $regex = shift;
251215   0.02766   3.68000    71: my $word  = shift;
 50243   0.59726   1.00000    59:  $count{$i_LINES}{cons} =
 50243   0.48175   0.92000    61:  $count{$i_LINES}{spec} =
 50243   0.00644   0.89000    56:  my $i_cons = matches($i_word, $word,
 50243   0.48837   0.88000    63:  $count{$i_LINES}{caps} =
 50243   0.00516   0.88000    58:  my $i_caps = matches($i_word, $word, '^[(A-
 50243   0.00631   0.81000    54:  my $i_spec = matches($i_word, $word, '[^a-
 50243   0.00496   0.80000    57:  my $i_vows = matches($i_word, $word,
 50243   0.00688   0.80000    53:  $i_word++;
 50243   0.48469   0.79000    62:  $count{$i_LINES}{only} =
 50243   0.48928   0.77000    60:  $count{$i_LINES}{vows} =
 50243   0.00683   0.75000    55:  my $i_only = matches($i_word, $word, '^[^a-

您可以立即看到子例程分析模块的略微不同的重点，我们开始确切地看到哪一行代码花费了最多的时间。例如，该正则表达式行看起来有点可疑。请记住，这些工具应该一起使用，没有一种最好的方法来分析您的代码，您需要使用最适合工作的工具。

另请参阅Apache::SmallProf，它将Devel::SmallProf挂钩到mod_perl。

#Devel::FastProf

Devel::FastProf 是另一个 Perl 行分析器。它是为了获得比例如Devel::SmallProf 更快的行分析器而编写的，因为它是用C编写的。要使用Devel::FastProf，请向 Perl 提供-d参数

$> perl -d:FastProf wordmatch -f perl5db.pl

<...multiple lines snipped...>

wordmatch report for perl5db.pl:
lines in file: 9428
words in file: 50243
words with special (non-word) characters: 20480
words with only special (non-word) characters: 7790
words with only consonants: 4801
words with only capital letters: 1316
words with only vowels: 1701

Devel::FastProf 将统计信息写入当前目录中的fastprof.out文件。可以使用fprofpp 命令行程序解释输出文件（可以指定）。

$> fprofpp | head -n20

# fprofpp output format is:
# filename:line time count: source
wordmatch:75 3.93338 251215: if ( $word =~ /($regex)/ ) {
wordmatch:79 1.77774 251215: debug("word: $i_wd ".($has ? 'matches' : 'does not match')." chars: /$regex/");
wordmatch:81 1.47604 251215: return $has;
wordmatch:126 1.43441 260643: my $message = shift;
wordmatch:128 1.42156 260643: if ( $debug ) {
wordmatch:70 1.36824 251215: my $i_wd  = shift;
wordmatch:71 1.36739 251215: my $word  = shift;
wordmatch:72 1.35939 251215: my $regex = shift;

我们可以立即看到，每行被调用的次数与Devel::SmallProf 输出相同，并且序列仅根据每行执行所花费的时间的顺序略有不同，例如if ( $debug ) { 和 my $message = shift;。记录的实际时间差异可能是在内部使用的算法中，也可能是由于系统资源限制或竞争。

另请参阅DBIx::Profile，它将分析在DBIx::* 命名空间下运行的数据库查询。

#Devel::NYTProf

Devel::NYTProf 是下一代 Perl 代码分析器，它修复了其他工具的许多缺陷，并实现了许多酷炫的功能。首先，它可以同时用作行分析器、块分析器或子程序分析器。它还可以在提供clock_gettime() 的系统上使用亚微秒（100ns）分辨率。它甚至可以由正在分析的程序启动和停止。它是一行代码即可分析mod_perl 应用程序。它用c 编写，可能是 Perl 最快的分析器。酷炫的功能还有很多。好了，让我们看看它是如何工作的 - 只需使用熟悉的-d 开关将其插入并运行代码即可。

$> perl -d:NYTProf wordmatch -f perl5db.pl

wordmatch report for perl5db.pl:
lines in file: 9427
words in file: 50243
words with special (non-word) characters: 20480
words with only special (non-word) characters: 7790
words with only consonants: 4801
words with only capital letters: 1316
words with only vowels: 1701

NYTProf 默认情况下会将报告数据库生成到文件nytprof.out 中。可以使用提供的nytprofhtml（HTML 输出）和nytprofcsv（CSV 输出）程序从这里生成人类可读的报告。为了方便起见，我们使用 Unix 系统html2text 实用程序转换了nytprof/index.html 文件。

$> html2text nytprof/index.html

Performance Profile Index
For wordmatch
  Run on Fri Sep 26 13:46:39 2008
Reported on Fri Sep 26 13:47:23 2008

         Top 15 Subroutines -- ordered by exclusive time
|Calls |P |F |Inclusive|Exclusive|Subroutine                          |
|      |  |  |Time     |Time     |                                    |
|251215|5 |1 |13.09263 |10.47692 |main::              |matches        |
|260642|2 |1 |2.71199  |2.71199  |main::              |debug          |
|1     |1 |1 |0.21404  |0.21404  |main::              |report         |
|2     |2 |2 |0.00511  |0.00511  |XSLoader::          |load (xsub)    |
|14    |14|7 |0.00304  |0.00298  |Exporter::          |import         |
|3     |1 |1 |0.00265  |0.00254  |Exporter::          |as_heavy       |
|10    |10|4 |0.00140  |0.00140  |vars::              |import         |
|13    |13|1 |0.00129  |0.00109  |constant::          |import         |
|1     |1 |1 |0.00360  |0.00096  |FileHandle::        |import         |
|3     |3 |3 |0.00086  |0.00074  |warnings::register::|import         |
|9     |3 |1 |0.00036  |0.00036  |strict::            |bits           |
|13    |13|13|0.00032  |0.00029  |strict::            |import         |
|2     |2 |2 |0.00020  |0.00020  |warnings::          |import         |
|2     |1 |1 |0.00020  |0.00020  |Getopt::Long::      |ParseOptionSpec|
|7     |7 |6 |0.00043  |0.00020  |strict::            |unimport       |

For more information see the full list of 189 subroutines.

报告的第一部分已经显示了有关哪些子程序使用最多时间的关键信息。接下来给出一些关于分析的源文件的统计信息。

        Source Code Files -- ordered by exclusive time then name
|Stmts  |Exclusive|Avg.   |Reports                     |Source File         |
|       |Time     |       |                            |                    |
|2699761|15.66654 |6e-06  |line   .    block   .    sub|wordmatch           |
|35     |0.02187  |0.00062|line   .    block   .    sub|IO/Handle.pm        |
|274    |0.01525  |0.00006|line   .    block   .    sub|Getopt/Long.pm      |
|20     |0.00585  |0.00029|line   .    block   .    sub|Fcntl.pm            |
|128    |0.00340  |0.00003|line   .    block   .    sub|Exporter/Heavy.pm   |
|42     |0.00332  |0.00008|line   .    block   .    sub|IO/File.pm          |
|261    |0.00308  |0.00001|line   .    block   .    sub|Exporter.pm         |
|323    |0.00248  |8e-06  |line   .    block   .    sub|constant.pm         |
|12     |0.00246  |0.00021|line   .    block   .    sub|File/Spec/Unix.pm   |
|191    |0.00240  |0.00001|line   .    block   .    sub|vars.pm             |
|77     |0.00201  |0.00003|line   .    block   .    sub|FileHandle.pm       |
|12     |0.00198  |0.00016|line   .    block   .    sub|Carp.pm             |
|14     |0.00175  |0.00013|line   .    block   .    sub|Symbol.pm           |
|15     |0.00130  |0.00009|line   .    block   .    sub|IO.pm               |
|22     |0.00120  |0.00005|line   .    block   .    sub|IO/Seekable.pm      |
|198    |0.00085  |4e-06  |line   .    block   .    sub|warnings/register.pm|
|114    |0.00080  |7e-06  |line   .    block   .    sub|strict.pm           |
|47     |0.00068  |0.00001|line   .    block   .    sub|warnings.pm         |
|27     |0.00054  |0.00002|line   .    block   .    sub|overload.pm         |
|9      |0.00047  |0.00005|line   .    block   .    sub|SelectSaver.pm      |
|13     |0.00045  |0.00003|line   .    block   .    sub|File/Spec.pm        |
|2701595|15.73869 |       |Total                       |
|128647 |0.74946  |       |Average                     |
|       |0.00201  |0.00003|Median                      |
|       |0.00121  |0.00003|Deviation                   |

Report produced by the NYTProf 2.03 Perl profiler, developed by Tim Bunce and
Adam Kaplan.

此时，如果您使用的是html 报告，您可以点击各种链接深入到每个子程序和每行代码。因为我们在这里使用文本报告，并且为每个源文件构建了一个完整的报告目录，所以我们只显示对应wordmatch-line.html 文件的一部分，足以让您了解您可以从这个酷炫的工具中获得的输出类型。

$> html2text nytprof/wordmatch-line.html

Performance Profile -- -block view-.-line view-.-sub view-
For wordmatch
Run on Fri Sep 26 13:46:39 2008
Reported on Fri Sep 26 13:47:22 2008

File wordmatch

 Subroutines -- ordered by exclusive time
|Calls |P|F|Inclusive|Exclusive|Subroutine    |
|      | | |Time     |Time     |              |
|251215|5|1|13.09263 |10.47692 |main::|matches|
|260642|2|1|2.71199  |2.71199  |main::|debug  |
|1     |1|1|0.21404  |0.21404  |main::|report |
|0     |0|0|0        |0        |main::|BEGIN  |


|Line|Stmts.|Exclusive|Avg.   |Code                                           |
|    |      |Time     |       |                                               |
|1   |      |         |       |#!/usr/bin/perl                                |
|2   |      |         |       |                                               |
|    |      |         |       |use strict;                                    |
|3   |3     |0.00086  |0.00029|# spent 0.00003s making 1 calls to strict::    |
|    |      |         |       |import                                         |
|    |      |         |       |use warnings;                                  |
|4   |3     |0.01563  |0.00521|# spent 0.00012s making 1 calls to warnings::  |
|    |      |         |       |import                                         |
|5   |      |         |       |                                               |
|6   |      |         |       |=head1 NAME                                    |
|7   |      |         |       |                                               |
|8   |      |         |       |filewords - word analysis of input file        |
<...snip...>
|62  |1     |0.00445  |0.00445|print report( %count );                        |
|    |      |         |       |# spent 0.21404s making 1 calls to main::report|
|63  |      |         |       |                                               |
|    |      |         |       |# spent 23.56955s (10.47692+2.61571) within    |
|    |      |         |       |main::matches which was called 251215 times,   |
|    |      |         |       |avg 0.00005s/call: # 50243 times               |
|    |      |         |       |(2.12134+0.51939s) at line 57 of wordmatch, avg|
|    |      |         |       |0.00005s/call # 50243 times (2.17735+0.54550s) |
|64  |      |         |       |at line 56 of wordmatch, avg 0.00005s/call #   |
|    |      |         |       |50243 times (2.10992+0.51797s) at line 58 of   |
|    |      |         |       |wordmatch, avg 0.00005s/call # 50243 times     |
|    |      |         |       |(2.12696+0.51598s) at line 55 of wordmatch, avg|
|    |      |         |       |0.00005s/call # 50243 times (1.94134+0.51687s) |
|    |      |         |       |at line 54 of wordmatch, avg 0.00005s/call     |
|    |      |         |       |sub matches {                                  |
<...snip...>
|102 |      |         |       |                                               |
|    |      |         |       |# spent 2.71199s within main::debug which was  |
|    |      |         |       |called 260642 times, avg 0.00001s/call: #      |
|    |      |         |       |251215 times (2.61571+0s) by main::matches at  |
|103 |      |         |       |line 74 of wordmatch, avg 0.00001s/call # 9427 |
|    |      |         |       |times (0.09628+0s) at line 50 of wordmatch, avg|
|    |      |         |       |0.00001s/call                                  |
|    |      |         |       |sub debug {                                    |
|104 |260642|0.58496  |2e-06  |my $message = shift;                           |
|105 |      |         |       |                                               |
|106 |260642|1.09917  |4e-06  |if ( $debug ) {                                |
|107 |      |         |       |print STDERR "DBG: $message\n";                |
|108 |      |         |       |}                                              |
|109 |      |         |       |}                                              |
|110 |      |         |       |                                               |
|111 |1     |0.01501  |0.01501|exit 0;                                        |
|112 |      |         |       |                                               |

里面有大量非常有用的信息 - 这似乎是前进的方向。

另请参阅Devel::NYTProf::Apache，它将Devel::NYTProf 连接到mod_perl。

#排序

Perl 模块不是性能分析师唯一可用的工具，系统工具如time 不应被忽视，如以下示例所示，我们快速看一下排序。许多书籍、论文和文章都写过关于高效排序算法的内容，这里不是重复这些工作的地方，还有几个优秀的排序模块值得一看：Sort::Maker、Sort::Key 等等。但是，仍然可以对与排序数据集相关的某些 Perl 特定解释进行一些观察，并给出一些关于如何排序大量数据会影响性能的示例。首先，在排序大量数据时，一个经常被忽视的点是，可以尝试减少要处理的数据集，在许多情况下，grep() 可以作为简单的过滤器非常有用

@data = sort grep { /$filter/ } @incoming

这样的命令可以大大减少一开始要实际排序的材料数量，并且不应仅仅因为它简单而被轻易忽略。KISS 原则经常被忽视 - 下一个示例使用简单的系统time 实用程序进行演示。让我们看一个实际的示例，对一个大型文件的内容进行排序，一个 apache 日志文件就可以了。这个文件有超过 25 万行，大小为 50M，其中一小段如下所示

# 日志文件

188.209-65-87.adsl-dyn.isp.belgacom.be - - [08/Feb/2007:12:57:16 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
188.209-65-87.adsl-dyn.isp.belgacom.be - - [08/Feb/2007:12:57:16 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
151.56.71.198 - - [08/Feb/2007:12:57:41 +0000] "GET /suse-on-vaio.html HTTP/1.1" 200 2858 "http://www.linux-on-laptops.com/sony.html" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1"
151.56.71.198 - - [08/Feb/2007:12:57:42 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/suse-on-vaio.html" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1"
151.56.71.198 - - [08/Feb/2007:12:57:43 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1"
217.113.68.60 - - [08/Feb/2007:13:02:15 +0000] "GET / HTTP/1.1" 304 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
217.113.68.60 - - [08/Feb/2007:13:02:16 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
debora.to.isac.cnr.it - - [08/Feb/2007:13:03:58 +0000] "GET /suse-on-vaio.html HTTP/1.1" 200 2858 "http://www.linux-on-laptops.com/sony.html" "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.0 (like Gecko)"
debora.to.isac.cnr.it - - [08/Feb/2007:13:03:58 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/suse-on-vaio.html" "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.0 (like Gecko)"
debora.to.isac.cnr.it - - [08/Feb/2007:13:03:58 +0000] "GET /favicon.ico HTTP/1.1" 404 209 "-" "Mozilla/5.0 (compatible; Konqueror/3.4; Linux) KHTML/3.4.0 (like Gecko)"
195.24.196.99 - - [08/Feb/2007:13:26:48 +0000] "GET / HTTP/1.0" 200 3309 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9"
195.24.196.99 - - [08/Feb/2007:13:26:58 +0000] "GET /data/css HTTP/1.0" 404 206 "http://www.rfi.net/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9"
195.24.196.99 - - [08/Feb/2007:13:26:59 +0000] "GET /favicon.ico HTTP/1.0" 404 209 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.0.9) Gecko/20061206 Firefox/1.5.0.9"
crawl1.cosmixcorp.com - - [08/Feb/2007:13:27:57 +0000] "GET /robots.txt HTTP/1.0" 200 179 "-" "voyager/1.0"
crawl1.cosmixcorp.com - - [08/Feb/2007:13:28:25 +0000] "GET /links.html HTTP/1.0" 200 3413 "-" "voyager/1.0"
fhm226.internetdsl.tpnet.pl - - [08/Feb/2007:13:37:32 +0000] "GET /suse-on-vaio.html HTTP/1.1" 200 2858 "http://www.linux-on-laptops.com/sony.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
fhm226.internetdsl.tpnet.pl - - [08/Feb/2007:13:37:34 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net/suse-on-vaio.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
80.247.140.134 - - [08/Feb/2007:13:57:35 +0000] "GET / HTTP/1.1" 200 3309 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
80.247.140.134 - - [08/Feb/2007:13:57:37 +0000] "GET /data/css HTTP/1.1" 404 206 "http://www.rfi.net" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
pop.compuscan.co.za - - [08/Feb/2007:14:10:43 +0000] "GET / HTTP/1.1" 200 3309 "-" "www.clamav.net"
livebot-207-46-98-57.search.live.com - - [08/Feb/2007:14:12:04 +0000] "GET /robots.txt HTTP/1.0" 200 179 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
livebot-207-46-98-57.search.live.com - - [08/Feb/2007:14:12:04 +0000] "GET /html/oracle.html HTTP/1.0" 404 214 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
dslb-088-064-005-154.pools.arcor-ip.net - - [08/Feb/2007:14:12:15 +0000] "GET / HTTP/1.1" 200 3309 "-" "www.clamav.net"
196.201.92.41 - - [08/Feb/2007:14:15:01 +0000] "GET / HTTP/1.1" 200 3309 "-" "MOT-L7/08.B7.DCR MIB/2.2.1 Profile/MIDP-2.0 Configuration/CLDC-1.1"

这里具体的任务是按响应代码、查询、浏览器、来源 URL 和最后日期对该文件中的 286,525 行进行排序。一种解决方案可能是使用以下代码，该代码迭代命令行上给出的文件。

# sort-apache-log

#!/usr/bin/perl -n

use v5.36;

my @data;

LINE:
while ( <> ) {
    my $line = $_;
    if (
        $line =~ m/^(
            ([\w\.\-]+)             # client
            \s*-\s*-\s*\[
            ([^]]+)                 # date
            \]\s*"\w+\s*
            (\S+)                   # query
            [^"]+"\s*
            (\d+)                   # status
            \s+\S+\s+"[^"]*"\s+"
            ([^"]*)                 # browser
            "
            .*
        )$/x
    ) {
        my @chunks = split(/ +/, $line);
        my $ip      = $1;
        my $date    = $2;
        my $query   = $3;
        my $status  = $4;
        my $browser = $5;

        push(@data, [$ip, $date, $query, $status, $browser, $line]);
    }
}

my @sorted = sort {
    $a->[3] cmp $b->[3]
            ||
    $a->[2] cmp $b->[2]
            ||
    $a->[0] cmp $b->[0]
            ||
    $a->[1] cmp $b->[1]
            ||
    $a->[4] cmp $b->[4]
} @data;

foreach my $data ( @sorted ) {
    print $data->[5];
}

exit 0;

运行此程序时，重定向 STDOUT，以便可以从以下测试运行中检查输出是否正确，并使用系统 time 实用程序检查总体运行时间。

$> time ./sort-apache-log logfile > out-sort

real    0m17.371s
user    0m15.757s
sys     0m0.592s

该程序运行时间略超过 17 秒。注意 time 输出的不同值，始终使用相同的值很重要，不要混淆每个值的含义。

#经过的实际时间: 从调用 time 到其终止的总时间或时钟时间。经过的时间包括用户时间和系统时间，以及等待系统上其他用户和进程的时间。不可避免地，这是给定测量中最不精确的。
#用户 CPU 时间: 用户时间是整个进程在本系统上代表用户执行此程序所花费的时间量。
#系统 CPU 时间: 系统时间是内核本身代表此进程用户执行例程或系统调用所花费的时间量。

将此相同进程作为 Schwarzian 变换 运行，可以消除用于存储所有数据的输入和输出数组，并且也可以直接在输入到达时对其进行处理。否则，代码看起来非常相似

# sort-apache-log-schwarzian

#!/usr/bin/perl -n

use v5.36;

print

    map $_->[0] =>

    sort {
        $a->[4] cmp $b->[4]
                ||
        $a->[3] cmp $b->[3]
                ||
        $a->[1] cmp $b->[1]
                ||
        $a->[2] cmp $b->[2]
                ||
        $a->[5] cmp $b->[5]
    }
    map  [ $_, m/^(
        ([\w\.\-]+)             # client
        \s*-\s*-\s*\[
        ([^]]+)                 # date
        \]\s*"\w+\s*
        (\S+)                   # query
        [^"]+"\s*
        (\d+)                   # status
        \s+\S+\s+"[^"]*"\s+"
        ([^"]*)                 # browser
        "
        .*
    )$/xo ]

    => <>;

exit 0;

对与上面相同的日志文件运行新代码，以检查新时间。

$> time ./sort-apache-log-schwarzian logfile > out-schwarz

real    0m9.664s
user    0m8.873s
sys     0m0.704s

时间已缩短一半，这在任何标准下都是一个可观的提速。当然，重要的是要检查输出是否与第一次程序运行一致，这就是 Unix 系统 cksum 实用程序发挥作用的地方。

$> cksum out-sort out-schwarz
3044173777 52029194 out-sort
3044173777 52029194 out-schwarz

顺便说一下，也要注意来自经理的压力，他们看到你将程序的运行时间提高了 50%，然后一个月后又要求你再次这样做（真实故事）——你只需要指出你只是人，即使你是 Perl 程序员，你也会看到你能做什么……

#记录

任何良好开发过程的重要组成部分是适当的错误处理，并提供信息丰富的消息，但是存在一种思想流派认为日志文件应该很“健谈”，就好像不间断的输出链可以确保程序的生存一样。如果速度在任何方面都是问题，这种方法是错误的。

常见的景象是看起来像这样的代码

logger->debug( "A logging message via process-id: $$ INC: "
                                                      . Dumper(\%INC) )

问题在于，即使在日志配置⽂件中设置的调试级别为零，这段代码也会始终被解析和执⾏。例如，⼀旦进入 debug()⼦例程，并确认内部的 $debug 变量为零，则发送进来的消息将被丢弃，程序将继续执⾏。但在给出的示例中，\%INC 哈希表已经转储，并且消息字符串已经构建，所有这些⼯作都可以通过语句级别的调试变量来绕过，例如：

logger->debug( "A logging message via process-id: $$ INC: "
                                           . Dumper(\%INC) ) if $DEBUG;

可以通过设置包含两种形式的测试脚本，包括模拟典型 logger() 功能的 debug() ⼦例程，来演示这种效果。

# ifdebug

#!/usr/bin/perl

use v5.36;

use Benchmark;
use Data::Dumper;
my $DEBUG = 0;

sub debug {
    my $msg = shift;

    if ( $DEBUG ) {
        print "DEBUG: $msg\n";
    }
};

timethese(100000, {
        'debug'       => sub {
            debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) )
        },
        'ifdebug'  => sub {
            debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) ) if $DEBUG
        },
});

让我们看看 Benchmark 对此有何看法

$> perl ifdebug
Benchmark: timing 100000 iterations of constant, sub...
   ifdebug:  0 wallclock secs ( 0.01 usr +  0.00 sys =  0.01 CPU) @ 10000000.00/s (n=100000)
            (warning: too few iterations for a reliable count)
     debug: 14 wallclock secs (13.18 usr +  0.04 sys = 13.22 CPU) @ 7564.30/s (n=100000)

在⼀种情况下，代码在输出任何调试信息方面做的事情完全相同，换句话说，什么也没做，需要 14 秒，而在另⼀种情况下，代码需要 1/100 秒。看起来⾮常确定。在调用⼦例程之前使⽤ $DEBUG 变量，⽽不是依赖于其中的智能功能。

#如果 DEBUG（常量）则记录

可以通过使⽤编译时 DEBUG 常量，将上⼀个想法更进⼀步。

# ifdebug-constant

#!/usr/bin/perl

use v5.36;

use Benchmark;
use Data::Dumper;
use constant
    DEBUG => 0
;

sub debug {
    if ( DEBUG ) {
        my $msg = shift;
        print "DEBUG: $msg\n";
    }
};

timethese(100000, {
        'debug'       => sub {
            debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) )
        },
        'constant'  => sub {
            debug( "A $0 logging message via process-id: $$" . Dumper(\%INC) ) if DEBUG
        },
});

运⾏此程序会产⽣以下输出

$> perl ifdebug-constant
Benchmark: timing 100000 iterations of constant, sub...
  constant:  0 wallclock secs (-0.00 usr +  0.00 sys = -0.00 CPU) @ -7205759403792793600000.00/s (n=100000)
            (warning: too few iterations for a reliable count)
       sub: 14 wallclock secs (13.09 usr +  0.00 sys = 13.09 CPU) @ 7639.42/s (n=100000)

DEBUG 常量甚至超过了 $debug 变量，时钟停在负零秒，并⽣成⼀条“警告：迭代次数太少，无法获得可靠的计数”消息。为了了解实际情况，以及为什么我们在认为要求 100000 次迭代时却迭代次数太少，我们可以使⽤⾮常有⽤的 B::Deparse 来检查新代码

$> perl -MO=Deparse ifdebug-constant

use Benchmark;
use Data::Dumper;
use constant ('DEBUG', 0);
sub debug {
    use warnings;
    use strict 'refs';
    0;
}
use warnings;
use strict 'refs';
timethese(100000, {'sub', sub {
    debug "A $0 logging message via process-id: $$" . Dumper(\%INC);
}
, 'constant', sub {
    0;
}
});
ifdebug-constant syntax OK

输出显示我们正在测试的 constant() ⼦例程被 DEBUG 常量的值替换：零。要测试的⾏已被完全优化掉，你⽆法获得比这更⾼的效率。

#附录

本文档提供了多种⽅法来识别热点，并检查任何修改是否改进了代码的运⾏时间。

最后⼀点，请记住，（在撰写本文时）⽆法产⽣在零或负时间内运⾏的有效程序，这个基本原则可以写成：有效程序很慢，这是它们的定义。当然，可以编写⼀个近乎瞬时的程序，但它不会做太多事情，以下是一个⾮常有效的程序

$> perl -e 0

对它进⾏任何更进⼀步的优化都是 p5p 的⼯作。

#另请参阅

可以使用以下模块和链接找到更多阅读材料。

#PERLDOCS

例如：perldoc -f sort。

perlfaq4.

perlfork, perlfunc, perlretut, perlthrtut.

线程.

#手册页

时间.

#模块

当然，不可能在这里单独展示所有与 Perl 性能相关的代码，但以下列出了 CPAN 中值得进一步关注的模块。

Apache::DProf
Apache::SmallProf
Benchmark
DBIx::Profile
Devel::AutoProfiler
Devel::DProf
Devel::DProfLB
Devel::FastProf
Devel::GraphVizProf
Devel::NYTProf
Devel::NYTProf::Apache
Devel::Profiler
Devel::Profile
Devel::Profit
Devel::SmallProf
Devel::WxProf
POE::Devel::Profiler
Sort::Key
Sort::Maker

#网址

非常有用的在线参考资料

https://web.archive.org/web/20120515021937/http://www.ccl4.org/~nick/P/Fast_Enough/

https://web.archive.org/web/20050706081718/http://www-106.ibm.com/developerworks/library/l-optperl.html

https://perlbuzz.com/2007/11/14/bind_output_variables_in_dbi_for_speed_and_safety/

http://en.wikipedia.org/wiki/Performance_analysis

http://apache.perl.org/docs/1.0/guide/performance.html

http://perlgolf.sourceforge.net/

http://www.sysarch.com/Perl/sort_paper.html

内容

#名称

#描述

#概述