
NAME

perldsc - Perl Data Structures Cookbook

DESCRIPTION

Perl lets us have complex data structures. You can write something like this and, all of a sudden, you have an array with three dimensions!

for my $x (1 .. 10) {
    for my $y (1 .. 10) {
        for my $z (1 .. 10) {
            $AoA[$x][$y][$z] =
                $x ** $y + $z;
        }
    }
}

Alas, however simple this may appear, underneath it is a much more elaborate construct than meets the eye!

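How do you print it out? Why can't you just say print @AoA? How do you sort it? How can you pass it to a function or get one of these back from a function? Is it an object? Can you save it to disk to read back later? How do you access whole rows or columns of that matrix? Do all the values have to be numeric?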

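As you can see, it's quite easy to become confused. While some small portion of the blame for this can be attributed to the reference-based implementation, it's really more due to a lack of existing documentation with examples designed for the beginner.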

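This document is meant to be a detailed but understandable treatment of the many different sorts of data structures you might want to develop. It should also serve as a cookbook of examples. That way, when you need to create one of these complex data structures, you can just lift a drop-in example from here.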

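Let's look at each of these possible constructs in detail. Each of the following has its own section below: arrays of arrays, hashes of arrays, arrays of hashes, hashes of hashes, and more elaborate records.

But for now, let's look at general issues common to all these types of data structures.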

REFERENCES

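The most important thing to understand about all data structures in Perl, including multidimensional arrays, is that even though they might appear otherwise, Perl @ARRAYs and %HASHes are all internally one-dimensional. They can hold only scalar values (meaning a string, number, or a reference). They cannot directly contain other arrays or hashes, but instead contain *references* to other arrays or hashes.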

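You can't use a reference to an array or hash in quite the same way that you would a real array or hash. For C or C++ programmers unused to distinguishing between arrays and pointers to the same, this can be confusing. If so, just think of it as the difference between a structure and a pointer to a structure.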

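You can (and should) read more about references in perlref. Briefly, references are rather like pointers that know what they point to. (Objects are also a kind of reference, but we won't be needing them right away, if ever.) This means that when you have something which looks to you like an access to a two-or-more-dimensional array and/or hash, what's really going on is that the base type is merely a one-dimensional entity that contains references to the next level. It's just that you can *use* it as though it were a two-dimensional one. This is actually the way almost all C multidimensional arrays work as well.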

$array[7][12]                       # array of arrays
$array[7]{string}                   # array of hashes
$hash{string}[7]                    # hash of arrays
$hash{string}{'another string'}     # hash of hashes
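To make the "one-dimensional container of references" idea concrete, here is a minimal sketch that asks ref() what the elements really are. The variable names are illustrative only and are not part of the original text.

```perl
use strict;
use warnings;

# Each top-level element is a scalar that holds a reference.
my @AoA = ( [2, 3], [4, 5, 7], [0] );
my %HoA = ( odds => [1, 3, 5] );

my $row_type  = ref $AoA[1];        # "ARRAY"
my $list_type = ref $HoA{odds};     # "ARRAY"
my $leaf_type = ref $AoA[1][2];     # "" (a plain number, not a reference)

print "row: $row_type, list: $list_type, leaf: '$leaf_type'\n";
```

Only the innermost values are ordinary scalars; everything above them is reached by following a reference.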

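Now, because the top level contains only references, if you try to print out your array with a simple print() function, you'll get something that doesn't look very nice, like this: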

  my @AoA = ( [2, 3], [4, 5, 7], [0] );
  print $AoA[1][2];
7
  print @AoA;
ARRAY(0x83c38)ARRAY(0x8b194)ARRAY(0x8b1d0)

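That's because Perl doesn't (ever) implicitly dereference your variables. If you want to get at the thing a reference is referring to, then you have to do this yourself using either prefix typing indicators, like ${$blah}, @{$blah}, @{$blah[$i]}, or else postfix pointer arrows, like $a->[3], $h->{fred}, or even $ob->method()->[3].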

COMMON MISTAKES

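The two most common mistakes made in constructing something like an array of arrays are either accidentally counting the number of elements or else taking a reference to the same memory location repeatedly. Here's the case where you just get the count instead of a nested array: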

for my $i (1..10) {
    my @array = somefunc($i);
    $AoA[$i] = @array;      # WRONG!
}

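That's just the simple case of assigning an array to a scalar and getting its element count. If that's what you really and truly want, then you might do well to consider being a tad more explicit about it, like this: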

for my $i (1..10) {
    my @array = somefunc($i);
    $counts[$i] = scalar @array;
}
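A small side-by-side sketch may help here. It assumes a three-element list of made-up names and stores it both ways, so you can see the count land in one slot and a nested array in the other.

```perl
use strict;
use warnings;

my @array = ("fred", "barney", "wilma");
my @AoA;

$AoA[0] = @array;        # scalar context: stores the element count, 3
$AoA[1] = [ @array ];    # stores a reference to a copy of the list

print "first slot:  $AoA[0]\n";       # the count
print "second slot: $AoA[1][2]\n";    # an element of the nested array
```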

Here's the case of taking a reference to the same memory location again and again:

# Either without strict or having an outer-scope my @array;
# declaration.

for my $i (1..10) {
    @array = somefunc($i);
    $AoA[$i] = \@array;     # WRONG!
}

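So, what's the big problem with that? It looks right, doesn't it? After all, I just told you that you need an array of references, so by golly, you've made me one!

Unfortunately, while this is true, it's still broken. All the references in @AoA refer to the *very same place*, and they will therefore all hold whatever was last in @array! It's similar to the problem demonstrated in the following C program: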

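#include <stdio.h>
#include <pwd.h>

int main(void) {
    struct passwd *rp, *dp;
    /* getpwnam() returns a pointer to static storage,
       which the next call may overwrite */
    rp = getpwnam("root");
    dp = getpwnam("daemon");

    printf("daemon name is %s\nroot name is %s\n",
            dp->pw_name, rp->pw_name);
    return 0;
}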

Which will print:

daemon name is daemon
root name is daemon

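The problem is that both rp and dp are pointers to the same location in memory! In C, you'd have to remember to malloc() yourself some new memory. In Perl, you'll want to use the array constructor [] or the hash constructor {} instead. Here's the right way to do the preceding broken code fragment: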

# Either without strict or having an outer-scope my @array;
# declaration.

for my $i (1..10) {
    @array = somefunc($i);
    $AoA[$i] = [ @array ];
}

The square brackets make a reference to a new array with a *copy* of what's in @array at the time of the assignment. This is what you want.
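To see the difference between the broken and the fixed loop, here is a minimal sketch that fills two arrays in the same pass, one with \@array and one with [ @array ]. Note that somefunc() here is a hypothetical stand-in, since the original leaves it undefined.

```perl
use strict;
use warnings;

# somefunc() is a hypothetical stand-in that returns a small list.
sub somefunc { my ($i) = @_; return ($i, $i * 10) }

my (@array, @shared, @copied);
for my $i (1 .. 3) {
    @array = somefunc($i);
    $shared[$i] = \@array;       # every slot points at the one @array
    $copied[$i] = [ @array ];    # every slot gets its own fresh array
}

print "shared: $shared[1][0] $shared[2][0] $shared[3][0]\n";   # 3 3 3
print "copied: $copied[1][0] $copied[2][0] $copied[3][0]\n";   # 1 2 3
```

Every slot of @shared shows whatever the last call left in @array, while @copied keeps each pass's values.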

Note that this will produce something similar:

# Either without strict or having an outer-scope my @array;
# declaration.
for my $i (1..10) {
    @array = 0 .. $i;
    $AoA[$i]->@* = @array;
}

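Is it the same? Well, maybe so, and maybe not. The subtle difference is that when you assign something in square brackets, you know for sure it's always a brand new reference with a new *copy* of the data. Something else could be going on in this new case with the $AoA[$i]->@* dereference on the left-hand side of the assignment. It all depends on whether $AoA[$i] had been undefined to start with, or whether it already contained a reference. If you had already populated @AoA with references, as in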

$AoA[3] = \@another_array;

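then the assignment with the indirection on the left-hand side would use the existing reference that was already there: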

$AoA[3]->@* = @array;

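Of course, this *would* have the "interesting" effect of clobbering @another_array. (Have you ever noticed how when a programmer says something is "interesting", rather than meaning "intriguing", they're disturbingly more apt to mean that it's "annoying", "difficult", or both? :-)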
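Here is a minimal sketch of that clobbering, alongside the harmless case where the slot starts out undefined. The contents of @another_array and @array are made up for illustration.

```perl
use strict;
use warnings;

my @another_array = ("keep", "me");
my @array         = (1, 2, 3);
my @AoA;

$AoA[3] = \@another_array;   # slot already holds a reference
$AoA[3]->@* = @array;        # assigns through it, overwriting @another_array
$AoA[4]->@* = @array;        # slot was undef: a new array springs into being

print "another_array is now: @another_array\n";   # 1 2 3
```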

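So just remember always to use the array or hash constructors with [] or {}, and you'll be fine, although it's not always optimally efficient.

Surprisingly, the following dangerous-looking construct will actually work out fine: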

for my $i (1..10) {
    my @array = somefunc($i);
    $AoA[$i] = \@array;
}

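That's because my() is more of a run-time statement than it is a compile-time declaration *per se*. This means that the my() variable is remade afresh each time through the loop. So even though it *looks* as though you stored the same variable reference each time, you actually did not! This is a subtle distinction that can produce more efficient code at the risk of misleading all but the most experienced of programmers. So I usually advise against teaching it to beginners. In fact, except for passing arguments to functions, I seldom like to see the gimme-a-reference operator (backslash) used much at all in code. Instead, I advise beginners that they (and most of the rest of us) should try to use the much more easily understood constructors [] and {} instead of relying upon lexical (or dynamic) scoping and hidden reference-counting to do the right thing behind the scenes.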
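If you want to convince yourself that a fresh @array really is made on each pass, a quick sketch like the following compares the stored references. The loop body uses made-up values in place of somefunc().

```perl
use strict;
use warnings;

my @AoA;
for my $i (1 .. 3) {
    my @array = ($i, $i * $i);   # a brand new @array on every pass
    $AoA[$i] = \@array;          # so each slot gets a different reference
}

# References compare numerically by address.
my $distinct = $AoA[1] != $AoA[2] && $AoA[2] != $AoA[3];
print "distinct references: ", ($distinct ? "yes" : "no"), "\n";
print "rows: @{ $AoA[1] } | @{ $AoA[2] } | @{ $AoA[3] }\n";   # 1 1 | 2 4 | 3 9
```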

Note also that there exists another way to write a dereference! These two lines are equivalent:

$AoA[$i]->@* = @array;
@{ $AoA[$i] } = @array;

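The first form, called *postfix dereference*, is generally easier to read, because the expression can be read from left to right, and there are no enclosing braces to balance. On the other hand, it is also newer. It was added to the language in 2014, so you will often encounter the other form, *circumfix dereference*, in older code.

In summary: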

$AoA[$i] = [ @array ];     # usually best
$AoA[$i] = \@array;        # perilous; just how my() was that array?
$AoA[$i]->@*  = @array;    # way too tricky for most programmers
@{ $AoA[$i] } = @array;    # just as tricky, and also harder to read

CAVEAT ON PRECEDENCE

Speaking of things like @{$AoA[$i]}, the following are actually the same thing:

$aref->[2][2]       # clear
$$aref[2][2]        # confusing

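That's because Perl's precedence rules on its five prefix dereferencers (which look like someone swearing: $ @ * % &) make them bind more tightly than the postfix subscripting brackets or braces! This will no doubt come as a great shock to the C or C++ programmer, who is quite accustomed to using *a[i] to mean what's pointed to by the *i'th* element of a. That is, they first take the subscript, and only then dereference the thing at that subscript. That's fine in C, but this isn't C.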

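The seemingly equivalent construct in Perl, $$aref[$i], first dereferences $aref, treating it as a reference to an array, and only then takes the subscript, giving you the *i'th* value of the array pointed to by $aref. If you wanted the C notion, where the *i'th* element of an array @AoA is itself dereferenced, you could write $AoA[$i]->$* to dereference that item explicitly, reading left to right.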
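The two readings can be put next to each other in a short sketch. Here $aref is a single array reference, while @refs is an array of scalar references; both are invented for this illustration.

```perl
use strict;
use warnings;

my $aref = [ 10, 20, 30 ];            # one reference to an array
my @refs = ( \"a", \"b", \"c" );      # an array of scalar references

my $from_arrow  = $aref->[1];         # 20
my $from_prefix = $$aref[1];          # also 20: $aref is dereferenced first
my $c_style     = ${ $refs[1] };      # "b": subscript first, then dereference
my $c_postfix   = $refs[1]->$*;       # "b" again, written left to right

print "$from_arrow $from_prefix $c_style $c_postfix\n";
```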

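WHY YOU SHOULD ALWAYS use VERSION

If this is starting to sound scarier than it's worth, relax. Perl has some features to help you avoid its most common pitfalls. One way to avoid getting confused is to start every program with: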

use strict;

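This way, you'll be forced to declare all your variables with my(), and accidental "symbolic dereferencing" is disallowed as well. Therefore, if you'd done this: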

my $aref = [
    [ "fred", "barney", "pebbles", "bambam", "dino", ],
    [ "homer", "bart", "marge", "maggie", ],
    [ "george", "jane", "elroy", "judy", ],
];

print $aref[2][2];

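The compiler would immediately flag that as an error *at compile time*, because you were accidentally accessing `@aref`, an undeclared variable, and it would thereby remind you to write instead: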

print $aref->[2][2]
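To watch strict catch this without killing your program, one option is to compile the mistaken line inside a string eval and look at the error. This is only a sketch for experimentation; the shortened data and the $oops variable are illustrative.

```perl
use strict;
use warnings;

# Compile the mistaken line inside a string eval so the compile-time
# error can be captured instead of aborting this script.
my $ok = eval q{
    use strict;
    my $aref = [ [ "fred" ], [ "homer" ] ];
    my $oops = $aref[1][0];    # refers to @aref, which was never declared
    1;
};
my $error = $@;

print "compiled: ", ($ok ? "yes" : "no"), "\n";
print "message:  $error";
```

The message names @aref as the undeclared global, which is the hint that an arrow was left out.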

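Since Perl version 5.12, a `use VERSION` declaration will also enable the `strict` pragma. In addition, it will enable a feature bundle, giving more useful features. Since version 5.36, it will also enable the `warnings` pragma. Often the best way to activate all these things at once is to start a file with: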

use v5.36;

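In this way, every file will start with `strict`, `warnings`, and many useful named features all switched on, as well as several older features being switched off (such as `indirect`). For more information, see "use VERSION" in perlfunc.

DEBUGGING

You can use the debugger's `x` command to dump out complex data structures. For example, given the nested-array assignment above (here with the reference held in $AoA), here's the debugger output: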

DB<1> x $AoA
$AoA = ARRAY(0x13b5a0)
   0  ARRAY(0x1f0a24)
      0  'fred'
      1  'barney'
      2  'pebbles'
      3  'bambam'
      4  'dino'
   1  ARRAY(0x13b558)
      0  'homer'
      1  'bart'
      2  'marge'
      3  'maggie'
   2  ARRAY(0x13b540)
      0  'george'
      1  'jane'
      2  'elroy'
      3  'judy'
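Outside the debugger, one commonly used alternative is the core Data::Dumper module, which renders a structure as Perl source text. The sketch below is just one way to use it, with a shortened copy of the data; the two configuration settings are optional choices, not requirements.

```perl
use strict;
use warnings;
use Data::Dumper;

my $AoA = [
    [ "fred", "barney", "pebbles", "bambam", "dino" ],
    [ "homer", "bart", "marge", "maggie" ],
];

local $Data::Dumper::Indent   = 1;   # simpler, fixed indentation
local $Data::Dumper::Sortkeys = 1;   # stable key order for any hashes

my $dump = Dumper($AoA);
print $dump;
```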

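CODE EXAMPLES

Presented with little comment here are short code examples illustrating access of various types of data structures.

ARRAYS OF ARRAYS

Declaration of an ARRAY OF ARRAYS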

my @AoA = (
       [ "fred", "barney" ],
       [ "george", "jane", "elroy" ],
       [ "homer", "marge", "bart" ],
     );

Generation of an ARRAY OF ARRAYS

# reading from file
while ( <> ) {
    push @AoA, [ split ];
}

# calling a function
for my $i ( 1 .. 10 ) {
    $AoA[$i] = [ somefunc($i) ];
}

# using temp vars
for my $i ( 1 .. 10 ) {
    my @tmp = somefunc($i);
    $AoA[$i] = [ @tmp ];
}

# add to an existing row
push $AoA[0]->@*, "wilma", "betty";

Access and Printing of an ARRAY OF ARRAYS

# one element
$AoA[0][0] = "Fred";

# another element
$AoA[1][1] =~ s/(\w)/\u$1/;

# print the whole thing with refs
for my $aref ( @AoA ) {
    print "\t [ @$aref ],\n";
}

# print the whole thing with indices
for my $i ( 0 .. $#AoA ) {
    print "\t [ $AoA[$i]->@* ],\n";
}

# print the whole thing one at a time
for my $i ( 0 .. $#AoA ) {
    for my $j ( 0 .. $AoA[$i]->$#* ) {
        print "elem at ($i, $j) is $AoA[$i][$j]\n";
    }
}

HASHES OF ARRAYS

Declaration of a HASH OF ARRAYS

my %HoA = (
       flintstones        => [ "fred", "barney" ],
       jetsons            => [ "george", "jane", "elroy" ],
       simpsons           => [ "homer", "marge", "bart" ],
     );

Generation of a HASH OF ARRAYS

# reading from file
# flintstones: fred barney wilma dino
while ( <> ) {
    next unless s/^(.*?):\s*//;
    $HoA{$1} = [ split ];
}

# reading from file; more temps
# flintstones: fred barney wilma dino
while ( my $line = <> ) {
    my ($who, $rest) = split /:\s*/, $line, 2;
    my @fields = split ' ', $rest;
    $HoA{$who} = [ @fields ];
}

# calling a function that returns a list
for my $group ( "simpsons", "jetsons", "flintstones" ) {
    $HoA{$group} = [ get_family($group) ];
}

# likewise, but using temps
for my $group ( "simpsons", "jetsons", "flintstones" ) {
    my @members = get_family($group);
    $HoA{$group} = [ @members ];
}

# append new members to an existing family
push $HoA{flintstones}->@*, "wilma", "betty";

Access and Printing of a HASH OF ARRAYS

# one element
$HoA{flintstones}[0] = "Fred";

# another element
$HoA{simpsons}[1] =~ s/(\w)/\u$1/;

# print the whole thing
foreach my $family ( keys %HoA ) {
    print "$family: $HoA{$family}->@* \n"
}

# print the whole thing with indices
foreach my $family ( keys %HoA ) {
    print "$family: ";
    foreach my $i ( 0 .. $HoA{$family}->$#* ) {
        print " $i = $HoA{$family}[$i]";
    }
    print "\n";
}

# print the whole thing sorted by number of members
foreach my $family ( sort { $HoA{$b}->@* <=> $HoA{$a}->@* } keys %HoA ) {
    print "$family: $HoA{$family}->@* \n"
}

# print the whole thing sorted by number of members and name
foreach my $family ( sort {
                           $HoA{$b}->@* <=> $HoA{$a}->@*
                                         ||
                                     $a cmp $b
           } keys %HoA )
{
    print "$family: ", join(", ", sort $HoA{$family}->@* ), "\n";
}

ARRAYS OF HASHES

Declaration of an ARRAY OF HASHES

my @AoH = (
       {
           Lead     => "fred",
           Friend   => "barney",
       },
       {
           Lead     => "george",
           Wife     => "jane",
           Son      => "elroy",
       },
       {
           Lead     => "homer",
           Wife     => "marge",
           Son      => "bart",
       }
 );

Generation of an ARRAY OF HASHES

# reading from file
# format: LEAD=fred FRIEND=barney
while ( <> ) {
    my $rec = {};
    for my $field ( split ) {
        my ($key, $value) = split /=/, $field;
        $rec->{$key} = $value;
    }
    push @AoH, $rec;
}


# reading from file
# format: LEAD=fred FRIEND=barney
# no temp
while ( <> ) {
    push @AoH, { split /[\s=]+/ };
}

# calling a function  that returns a key/value pair list, like
# "lead","fred","daughter","pebbles"
while ( my %fields = getnextpairset() ) {
    push @AoH, { %fields };
}

# likewise, but using no temp vars
while (<>) {
    push @AoH, { parsepairs($_) };
}

# add key/value to an element
$AoH[0]{pet} = "dino";
$AoH[2]{pet} = "santa's little helper";

Access and Printing of an ARRAY OF HASHES

# one element
$AoH[0]{lead} = "fred";

# another element
$AoH[1]{lead} =~ s/(\w)/\u$1/;

# print the whole thing with refs
for my $href ( @AoH ) {
    print "{ ";
    for my $role ( keys %$href ) {
        print "$role=$href->{$role} ";
    }
    print "}\n";
}

# print the whole thing with indices
for my $i ( 0 .. $#AoH ) {
    print "$i is { ";
    for my $role ( keys $AoH[$i]->%* ) {
        print "$role=$AoH[$i]{$role} ";
    }
    print "}\n";
}

# print the whole thing one at a time
for my $i ( 0 .. $#AoH ) {
    for my $role ( keys $AoH[$i]->%* ) {
        print "elem at ($i, $role) is $AoH[$i]{$role}\n";
    }
}

HASHES OF HASHES

Declaration of a HASH OF HASHES

my %HoH = (
       flintstones => {
               lead      => "fred",
               pal       => "barney",
       },
       jetsons     => {
               lead      => "george",
               wife      => "jane",
               "his boy" => "elroy",
       },
       simpsons    => {
               lead      => "homer",
               wife      => "marge",
               kid       => "bart",
       },
);

Generation of a HASH OF HASHES

# reading from file
# flintstones: lead=fred pal=barney wife=wilma pet=dino
while ( <> ) {
    next unless s/^(.*?):\s*//;
    my $who = $1;
    for my $field ( split ) {
        my ($key, $value) = split /=/, $field;
        $HoH{$who}{$key} = $value;
    }
}


# reading from file; more temps
while ( <> ) {
    next unless s/^(.*?):\s*//;
    my $who = $1;
    my $rec = {};
    $HoH{$who} = $rec;
    for my $field ( split ) {
        my ($key, $value) = split /=/, $field;
        $rec->{$key} = $value;
    }
}

# calling a function  that returns a key,value hash
for my $group ( "simpsons", "jetsons", "flintstones" ) {
    $HoH{$group} = { get_family($group) };
}

# likewise, but using temps
for my $group ( "simpsons", "jetsons", "flintstones" ) {
    my %members = get_family($group);
    $HoH{$group} = { %members };
}

# append new members to an existing family
my %new_folks = (
    wife => "wilma",
    pet  => "dino",
);

for my $what (keys %new_folks) {
    $HoH{flintstones}{$what} = $new_folks{$what};
}

Access and Printing of a HASH OF HASHES

# one element
$HoH{flintstones}{wife} = "wilma";

# another element
$HoH{simpsons}{lead} =~ s/(\w)/\u$1/;

# print the whole thing
foreach my $family ( keys %HoH ) {
    print "$family: { ";
    for my $role ( keys $HoH{$family}->%* ) {
        print "$role=$HoH{$family}{$role} ";
    }
    print "}\n";
}

# print the whole thing  somewhat sorted
foreach my $family ( sort keys %HoH ) {
    print "$family: { ";
    for my $role ( sort keys $HoH{$family}->%* ) {
        print "$role=$HoH{$family}{$role} ";
    }
    print "}\n";
}


# print the whole thing sorted by number of members
foreach my $family ( sort { $HoH{$b}->%* <=> $HoH{$a}->%* } keys %HoH ) {
    print "$family: { ";
    for my $role ( sort keys $HoH{$family}->%* ) {
        print "$role=$HoH{$family}{$role} ";
    }
    print "}\n";
}

# establish a sort order (rank) for each role
my $i = 0;
my %rank;
for ( qw(lead wife son daughter pal pet) ) { $rank{$_} = ++$i }

# now print the whole thing sorted by number of members
foreach my $family ( sort { $HoH{$b}->%* <=> $HoH{$a}->%* } keys %HoH ) {
    print "$family: { ";
    # and print these according to rank order
    for my $role ( sort { $rank{$a} <=> $rank{$b} }
                                              keys $HoH{$family}->%* )
    {
        print "$role=$HoH{$family}{$role} ";
    }
    print "}\n";
}

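MORE ELABORATE RECORDS

Declaration of MORE ELABORATE RECORDS

Here's a sample showing how to create and use a record whose fields are of many different sorts: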

my $rec = {
    TEXT      => $string,
    SEQUENCE  => [ @old_values ],
    LOOKUP    => { %some_table },
    THATCODE  => \&some_function,
    THISCODE  => sub { $_[0] ** $_[1] },
    HANDLE    => \*STDOUT,
};

print $rec->{TEXT};

print $rec->{SEQUENCE}[0];
my $last = pop $rec->{SEQUENCE}->@*;

print $rec->{LOOKUP}{"key"};
my ($first_k, $first_v) = each $rec->{LOOKUP}->%*;

my $answer = $rec->{THATCODE}->($arg);
$answer = $rec->{THISCODE}->($arg1, $arg2);

# careful of extra block braces on fh ref
print { $rec->{HANDLE} } "a string\n";

use FileHandle;
$rec->{HANDLE}->autoflush(1);
$rec->{HANDLE}->print(" a string\n");

Declaration of a HASH OF COMPLEX RECORDS

my %TV = (
   flintstones => {
       series   => "flintstones",
       nights   => [ qw(monday thursday friday) ],
       members  => [
           { name => "fred",    role => "lead", age  => 36, },
           { name => "wilma",   role => "wife", age  => 31, },
           { name => "pebbles", role => "kid",  age  =>  4, },
       ],
   },

   jetsons     => {
       series   => "jetsons",
       nights   => [ qw(wednesday saturday) ],
       members  => [
           { name => "george",  role => "lead", age  => 41, },
           { name => "jane",    role => "wife", age  => 39, },
           { name => "elroy",   role => "kid",  age  =>  9, },
       ],
    },

   simpsons    => {
       series   => "simpsons",
       nights   => [ qw(monday) ],
       members  => [
           { name => "homer", role => "lead", age  => 34, },
           { name => "marge", role => "wife", age => 37, },
           { name => "bart",  role => "kid",  age  =>  11, },
       ],
    },
 );

Generation of a HASH OF COMPLEX RECORDS

# reading from file
# this is most easily done by having the file itself be
# in the raw data format as shown above.  perl is happy
# to parse complex data structures if declared as data, so
# sometimes it's easiest to do that

# here's a piece by piece build up
my $rec = {};
$rec->{series} = "flintstones";
$rec->{nights} = [ find_days() ];

my @members = ();
# assume this file in field=value syntax
while (<>) {
    my %fields = split /[\s=]+/;
    push @members, { %fields };
}
$rec->{members} = [ @members ];

# now remember the whole thing
$TV{ $rec->{series} } = $rec;

###########################################################
# now, you might want to make interesting extra fields that
# include pointers back into the same data structure so if
# change one piece, it changes everywhere, like for example
# if you wanted a {kids} field that was a reference
# to an array of the kids' records without having duplicate
# records and thus update problems.
###########################################################
foreach my $family (keys %TV) {
    my $rec = $TV{$family}; # temp pointer
    my @kids = ();
    for my $person ( $rec->{members}->@* ) {
        if ($person->{role} =~ /kid|son|daughter/) {
            push @kids, $person;
        }
    }
    # REMEMBER: $rec and $TV{$family} point to same data!!
    $rec->{kids} = [ @kids ];
}

# you copied the array, but the array itself contains pointers
# to uncopied objects. this means that if you make bart get
# older via

$TV{simpsons}{kids}[0]{age}++;

# then this would also change in
print $TV{simpsons}{members}[2]{age};

# because $TV{simpsons}{kids}[0] and $TV{simpsons}{members}[2]
# both point to the same underlying anonymous hash table

# print the whole thing
foreach my $family ( keys %TV ) {
    print "the $family";
    print " is on during $TV{$family}{nights}->@*\n";
    print "its members are:\n";
    for my $who ( $TV{$family}{members}->@* ) {
        print " $who->{name} ($who->{role}), age $who->{age}\n";
    }
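    # the records have no {lead} field; find the member whose role is "lead"
    my ($lead) = grep { $_->{role} eq "lead" } $TV{$family}{members}->@*;
    print "it turns out that $lead->{name} has ";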
    print scalar ( $TV{$family}{kids}->@* ), " kids named ";
    print join (", ", map { $_->{name} } $TV{$family}{kids}->@* );
    print "\n";
}
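The shared-record behavior described in the comments above can be checked with a cut-down, self-contained sketch. It assumes a single family with two members, so bart sits at index 1 here rather than index 2.

```perl
use strict;
use warnings;

my %TV = (
    simpsons => {
        members => [
            { name => "homer", role => "lead", age => 34 },
            { name => "bart",  role => "kid",  age => 11 },
        ],
    },
);

# Copy the list of matching records; the hashes themselves are not copied.
my @kids = grep { $_->{role} eq "kid" } $TV{simpsons}{members}->@*;
$TV{simpsons}{kids} = [ @kids ];

$TV{simpsons}{kids}[0]{age}++;                       # bart gets older ...
print "bart is $TV{simpsons}{members}[1]{age}\n";    # ... in both places: 12
```

The [ @kids ] constructor makes a new outer array, but its elements are still references to the same anonymous hashes held in {members}.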

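Database Ties

You cannot easily tie a multilevel data structure (such as a hash of hashes) to a dbm file. The first problem is that all but GDBM and Berkeley DB have size limitations, but beyond that, you also have problems with how references are to be represented on disk. One experimental module that does partially attempt to address this need is the MLDBM module. Check your nearest CPAN site as described in perlmodlib for source code to MLDBM.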
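To give a feel for the "references on disk" problem, here is a minimal sketch using the core Storable module to flatten one nested record into a plain string and rebuild it. This is not MLDBM and makes no claim about how MLDBM is implemented; it only illustrates the general idea of serializing a nested value so it could be stored where only flat strings are allowed. The data is made up.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

my %HoH = (
    flintstones => { lead => "fred",   pal  => "barney" },
    jetsons     => { lead => "george", wife => "jane"   },
);

# Flatten one nested record into a single string value ...
my $frozen = freeze( $HoH{jetsons} );

# ... and rebuild an independent copy of the structure from it.
my $copy = thaw($frozen);
print "restored wife: $copy->{wife}\n";   # jane
```

Because the rebuilt hash is a copy, changing it does not affect the original record, which is one reason tying such data transparently is harder than it first looks.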

SEE ALSO

perlref, perllol, perldata, perlobj

AUTHOR

Tom Christiansen