Perl survival guide for C programmers

You know C and you have some Perl code to read or maintain. This document was written for you. It presents the elements of the language that a) need to be understood to do anything interesting and b) cannot easily be looked up in the documentation.

1. Data types

There are three data types in Perl: scalars, lists and hashes.

1.1. Scalars

A scalar can be a string, a number, an undefined value (undef) or a reference. Scalar variables have a "$" in front of their names. Unlike sh, Perl requires the "$" both for lvalues and rvalues.

Syntax     Description                          Type    Lvalue  Rvalue
number     numeric constant                     scalar  no      yes
"string"   string constant (interpolated)       scalar  no      yes
'string'   string constant (not interpolated)   scalar  no      yes
$name      the scalar variable name             scalar  yes     yes
\$name     reference to scalar variable name    scalar  no      yes

1.1.1. Numbers and strings

You can convert freely between strings and numbers. For example,

$n = 1;
$s = "a" . $n;

is equivalent to:

$s = "a1";

And

$s = "1";
$n = $s + 1;

is equivalent to:

$n = 2;
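The same rule can be checked in a small runnable sketch. It uses only core Perl, and the variable names are made up for illustration. It shows that the operator, not the value, decides whether a scalar is treated as a string or a number, which is also why Perl has separate comparison operators for each ("==" for numbers, "eq" for strings).

```perl
use strict;
use warnings;

# The operator decides how a value is interpreted:
# "." always works on strings, "+" always works on numbers.
my $x = "10";
my $y = 5;

my $sum    = $x + $y;    # numeric addition: 15
my $concat = $x . $y;    # string concatenation: "105"

# Comparisons follow the same rule: "==" compares numbers, "eq" compares strings.
my $num_equal = ("1.0" == 1)   ? 1 : 0;   # 1: both are the number 1
my $str_equal = ("1.0" eq "1") ? 1 : 0;   # 0: the strings differ

print "sum=$sum concat=$concat num_equal=$num_equal str_equal=$str_equal\n";
```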

1.1.2. References

References are similar to C pointers, except that Perl has no equivalent of C's pointer arithmetic.

The operator to take a reference ("&" in C) is "\". For example, "\$n" is a reference to "$n".

Perl has no single dereferencing operator like "*" in C. How to dereference a reference is explained in section 3.1.
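A minimal sketch of the pointer analogy, using the "\" operator and the "$$" form of dereferencing (one of the forms covered in section 3.1). The variable names are illustrative only.

```perl
use strict;
use warnings;

# Rough Perl equivalent of the C idiom:  int n = 1; int *p = &n; *p = 2;
my $n = 1;
my $p = \$n;      # "\" takes a reference, like "&" in C
$$p = 2;          # "$$p" dereferences a scalar reference, like "*p" in C

print "n is now $n\n";
print "p refers to a ", ref($p), "\n";   # ref() names the kind of referent
```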

1.2. Lists

Lists are ordered collections of elements. They are similar to C arrays except that:

Syntax          Description                               Type    Lvalue  Rvalue
(scalar, ...)   list constant                             list    †       yes
[scalar, ...]   reference to an anonymous list constant   scalar  no      yes
@name           all the elements                          list    yes     yes
\@name          reference to list variable name           scalar  no      yes
$name[i]        element number i                          scalar  yes     yes
@name[i..j]     elements number i through j               list    yes     yes
scalar(@name)   number of elements, counting any gaps     scalar  no      yes
$#name          index of the last element                 scalar  yes     yes

† Can be an lvalue if all of its elements are lvalues.

See also "pop", "push", "shift", "splice", "unshift" in perlfunc(1).
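A short sketch of the operations in the table, plus push, pop, shift and unshift. The list contents are arbitrary. Note how assigning past the end grows the list and leaves undefined gaps, which scalar(@name) still counts.

```perl
use strict;
use warnings;

my @name = ('a', 'b', 'c');

my $count = scalar(@name);   # 3: number of elements
my $last  = $#name;          # 2: index of the last element
my @slice = @name[0..1];     # ('a', 'b'): elements 0 through 1

push @name, 'd';             # append at the end:     a b c d
unshift @name, 'z';          # prepend at the start:  z a b c d
my $popped  = pop @name;     # removes 'd':           z a b c
my $shifted = shift @name;   # removes 'z':           a b c

# Assigning past the end grows the list; the skipped elements
# (the "gaps") are undef but are still counted.
$name[5] = 'f';                   # a b c undef undef f
my $new_count = scalar(@name);    # 6

print "count=$count last=$last slice=@slice popped=$popped shifted=$shifted new_count=$new_count\n";
```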

1.3. Hashes

Hashes are unordered collections of (key, value) pairs. The key is a string (any scalar used as a key is converted to a string). The value is a scalar. No two elements can have the same key.

Syntax                Description                                 Type    Lvalue  Rvalue
(key => value, ...)   hash constant                               list    †       yes
{key => value, ...}   reference to an anonymous hash constant     scalar  no      yes
%name                 the entire hash                             list    yes     yes
\%name                reference to hash variable name             scalar  no      yes
$name{k}              the element whose key is k                  scalar  yes     yes
exists($name{k})      true iff an element whose key is k exists   scalar  no      yes
keys(%name)           an unordered list of all the keys           list    no      yes

† Can be an lvalue if all of its elements are lvalues.

See also "delete", "each", "exists", "keys", "values" in perlfunc(1).
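A sketch of the hash operations above. The %tilt hash borrows two axial tilt values that appear later in this document; the other names are illustrative. It also shows that keys are strings, so 1 and "1" address the same element.

```perl
use strict;
use warnings;

my %tilt = ('Earth' => 23.44, 'Mars' => 25.19);

$tilt{'Venus'} = 177.36;                              # add an element
my $has_mars = exists($tilt{'Mars'}) ? 1 : 0;         # 1
delete $tilt{'Mars'};                                 # remove it again
my $has_mars_after = exists($tilt{'Mars'}) ? 1 : 0;   # 0

# keys() returns the keys in no particular order; sort them for stable output.
my @sorted_keys = sort keys %tilt;                    # ('Earth', 'Venus')
for my $k (@sorted_keys)
{
    print "$k: $tilt{$k}\n";
}

# Keys are strings: the number 1 and the string "1" name the same element.
my %by_number = (1 => 'one');
my $same_key = exists($by_number{'1'}) ? 1 : 0;       # 1
```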

2. Context

In Perl, every expression is evaluated in scalar or list context. Some functions and operators work differently depending on the context.

Type of variable   In scalar context                       In list context
Scalar             the value of the scalar                 a one-element list
List               the number of elements in the list      the elements of the list
Hash               a string "x/y", where y is the total    all the keys and values in the hash;
                   number of buckets and x is the number   the keys are not ordered, but each
                   of buckets used (Perl 5.26 and later    is followed by its value
                   return the number of keys instead)
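A sketch of how the same list variable yields different results depending on context. The planet names are only sample data.

```perl
use strict;
use warnings;

my @planets = ('Mercury', 'Venus', 'Earth', 'Mars');

my $count   = @planets;   # scalar context: the number of elements
my ($first) = @planets;   # the parentheses make this a list assignment,
                          # so $first receives the first element
my @copy    = @planets;   # list context: all the elements

# scalar() forces scalar context where list context would otherwise apply,
# for example inside print's argument list.
print "There are ", scalar(@planets), " planets; the first is $first\n";
```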

3. Data types in practice

3.1. References

References are used to pass lists and hashes to subs and to build more complex data structures. The tables in sections 1.1, 1.2 and 1.3 above show how to take references ("&" in C parlance). This one shows how to dereference them ("*" and "->" in C parlance).

Syntax          Description                                     Type     Lvalue  Rvalue
${scalarref}    dereference a scalar reference                  scalar   yes     yes
@{arrayref}     dereference a list reference                    list     yes     yes
%{hashref}      dereference a hash reference                    hash     yes     yes
ref->...        infix dereferencing operator; what follows      varies   yes     yes
                the arrow depends on the type of object
                referenced by ref: [i] for a list, {k} for
                a hash (see the next two rows)
arrayref->[i]   element i of the list referenced by arrayref;   scalar   yes     yes
                shorthand for ${arrayref}[i]
hashref->{k}    element of the hash referenced by hashref       scalar   yes     yes
                whose key is k; shorthand for ${hashref}{k}
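A sketch exercising each kind of dereference with made-up variables. When the reference is held in a simple scalar variable, the braces can be dropped, so @$arrayref means the same as @{$arrayref}; the braces are kept here to match the table.

```perl
use strict;
use warnings;

my $n    = 42;
my @list = (10, 20, 30);
my %hash = ('k' => 'v');

my $scalarref = \$n;
my $arrayref  = \@list;
my $hashref   = \%hash;

my $value = ${$scalarref};       # 42
my @all   = @{$arrayref};        # (10, 20, 30), a copy
my %copy  = %{$hashref};         # ('k' => 'v'), a copy

# Element access: the arrow form and the ${...} form are equivalent.
my $second  = $arrayref->[1];    # 20
my $second2 = ${$arrayref}[1];   # 20
my $v       = $hashref->{'k'};   # 'v'
my $v2      = ${$hashref}{'k'};  # 'v'

# Assigning through a reference modifies the original variable.
$arrayref->[0] = 99;             # @list is now (99, 20, 30)

print "$value @all $copy{'k'} $second $second2 $v $v2 $list[0]\n";
```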

3.2. Structs

C structs can be approximated with hashes.

Where you would write this in C:

#include <stdio.h>

struct x
{
    int num;
    char *str;
};

void recurse(struct x *y)
{
    if (y->num == 0)
    {
        puts(y->str);
    }
    else
    {
        y->num--;
        recurse(y);
    }
}

int main (int argc, char *argv[])
{
    struct x y =
    {
      5,
      "hello"
    };

    recurse(&y);
    return 0;
}

... you may write this in Perl:

use strict;
use warnings;

# No declaration necessary

sub recurse
{
    # Arguments arrive in @_; no parameter list is declared.
    my ($y) = @_;
    # $y is now the first argument.
    # It is a hash reference.

    if ($y->{'num'} == 0)
    {
        print $y->{'str'}, "\n";
    }
    else
    {
        $y->{'num'}--;
        recurse($y);
    }
}

my %y =
(
  'num' => 5,
  'str' => 'hello'
);

# Call recurse with a reference to %y
recurse(\%y);
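The Perl version passes \%y rather than %y. A minimal sketch of why, with two hypothetical subs (increment_copy and increment_ref) written only for this comparison: a hash passed directly is flattened into a list of keys and values in @_, and copying that list into a new hash, the usual idiom, leaves the caller's hash untouched. A reference is a single scalar, so the sub can reach the caller's hash through it.

```perl
use strict;
use warnings;

# %y passed directly arrives as a flat list of keys and values in @_;
# copying it into %h builds a separate hash, so changes stay local.
sub increment_copy
{
    my %h = @_;
    $h{'num'}++;
}

# \%y passed arrives as one scalar: a reference to the caller's hash.
sub increment_ref
{
    my ($h) = @_;
    $h->{'num'}++;
}

my %y = ('num' => 5, 'str' => 'hello');

increment_copy(%y);
my $after_copy = $y{'num'};   # still 5
increment_ref(\%y);
my $after_ref = $y{'num'};    # now 6

print "after copy: $after_copy, after ref: $after_ref\n";
```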

3.3. Complex data structures

Because the value of a list or hash element must be a scalar, there is no such thing as a list of lists, a list of hashes, a hash of hashes or a hash of lists. However, since a reference to a list or hash is a scalar, we can approximate them with lists of references to lists, lists of references to hashes, etc.

3.3.1. Lists of lists

Suppose you want to make a database of planets in the solar system and their satellites. Because the order of planets and satellites is important, we must use a list of planets and, for each planet, a list of its satellites:

my @planet_sat =
(
  ['Mercury'],
  ['Venus'],
  ['Earth', 'Moon'],
  ['Mars', 'Phobos', 'Deimos'],
);

In the code above, @planet_sat is a list of references to lists, one per planet. In each sub-list, the first element is the name of the planet and the subsequent ones, if any, are the names of its satellites. Here's how you use it:

print "The 4th planet is $planet_sat[3]->[0]\n";
print "Its first satellite is $planet_sat[3]->[1]\n";
print "Its second satellite is $planet_sat[3]->[2]\n";

This illustrates the use of the "->" operator. $planet_sat[3] is the fourth element in @planet_sat. Its value is a reference to a list, of which we take the first element by appending "->[0]".
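To walk the whole structure rather than one planet, a loop can dereference each sub-list in turn. A sketch reusing @planet_sat from above; the @lines accumulator is made up for illustration. It also shows that the arrow is optional between two subscripts.

```perl
use strict;
use warnings;

my @planet_sat =
(
  ['Mercury'],
  ['Venus'],
  ['Earth', 'Moon'],
  ['Mars', 'Phobos', 'Deimos'],
);

my @lines;
for my $p (@planet_sat)
{
    # $p is a reference to one sub-list; @$p is that sub-list.
    my ($planet, @satellites) = @$p;
    push @lines, "$planet: " . (@satellites ? join(', ', @satellites) : 'none');
}
print "$_\n" for @lines;

# Between two subscripts the arrow may be omitted:
# $planet_sat[3][1] means the same as $planet_sat[3]->[1].
my $phobos = $planet_sat[3][1];
```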

3.3.2. Lists of lists of lists

Another approach to the same problem, which separates each planet more strongly from its satellites:

my @planet_sat2 =
(
  ['Mercury', []],
  ['Venus',   []],
  ['Earth',   ['Moon']],
  ['Mars',    ['Phobos', 'Deimos']],
);

In the code above, each element of @planet_sat2 is a reference to a two-element list: the first element is the name of the planet and the second is a reference to a (possibly empty) list of its satellites. So this is actually a list of lists of lists.

Accessing a satellite now involves another level of dereferencing, hence another -> operator:

print "The name of the 4th planet is $planet_sat2[3]->[0]\n";
print "It has ", scalar(@{$planet_sat2[3]->[1]}), " satellites\n";
print "The first is $planet_sat2[3]->[1]->[0]\n";
print "The second is $planet_sat2[3]->[1]->[1]\n";

The second line in the code fragment above illustrates the use of @{...}. Its function is to dereference a list reference. The resulting list is then evaluated in scalar context to get an element count.
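The same idea in a loop over every planet, as a sketch reusing @planet_sat2; the %count hash is a hypothetical addition that records how many satellites each planet has.

```perl
use strict;
use warnings;

my @planet_sat2 =
(
  ['Mercury', []],
  ['Venus',   []],
  ['Earth',   ['Moon']],
  ['Mars',    ['Phobos', 'Deimos']],
);

my %count;
for my $p (@planet_sat2)
{
    my $name = $p->[0];
    my $sats = $p->[1];               # reference to the list of satellites
    $count{$name} = scalar(@$sats);   # element count via scalar context
    for my $s (@$sats)
    {
        print "$s orbits $name\n";
    }
}
```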

3.3.3. Lists of hashes

Our second example will be a database of the planets and their physical and astronomical characteristics. This time we'll use a list of hashes:

my @planet_prop =
(
  { 'name' => 'Mercury', 'mass' => 3.3022e23, 'tilt' =>   0.01 },
  { 'name' => 'Venus',   'mass' => 4.8685e24, 'tilt' => 177.36 },
  { 'name' => 'Earth',   'mass' => 5.9736e24, 'tilt' =>  23.44 },
  { 'name' => 'Mars',    'mass' => 6.4185e23, 'tilt' =>  25.19 },
);

print "The 4th planet is $planet_prop[3]->{'name'}\n";
print "Its mass is       $planet_prop[3]->{'mass'} kg\n";
print "Its axial tilt is $planet_prop[3]->{'tilt'} deg\n";

As you can see, "->{key}" is to hash references what "->[index]" is to list references.
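Because each element is a hash reference, sorting the list by one field only needs "->{...}" inside sort's comparison block. A sketch reusing @planet_prop that orders the planets from lightest to heaviest; sort's $a/$b convention and the <=> numeric comparison operator are standard Perl, while the @by_mass and @names names are made up.

```perl
use strict;
use warnings;

my @planet_prop =
(
  { 'name' => 'Mercury', 'mass' => 3.3022e23, 'tilt' =>   0.01 },
  { 'name' => 'Venus',   'mass' => 4.8685e24, 'tilt' => 177.36 },
  { 'name' => 'Earth',   'mass' => 5.9736e24, 'tilt' =>  23.44 },
  { 'name' => 'Mars',    'mass' => 6.4185e23, 'tilt' =>  25.19 },
);

# sort's block sees two elements as $a and $b; both are hash references here.
# "<=>" compares numerically ("cmp" would compare as strings).
my @by_mass = sort { $a->{'mass'} <=> $b->{'mass'} } @planet_prop;

my @names = map { $_->{'name'} } @by_mass;
print join(', ', @names), "\n";
```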

3.3.4. Lists of hashes of lists

To fold the satellite lists back into @planet_prop, add a "sat" member whose value is a reference to a list of satellites:

my @planet_prop2 =
(
  { 'name' => 'Mercury',
    'mass' => 3.3022e23,
    'tilt' => 0.01,
    'sat'  => [] },
  { 'name' => 'Venus',
    'mass' => 4.8685e24,
    'tilt' => 177.36,
    'sat'  => [] },
  { 'name' => 'Earth',
    'mass' => 5.9736e24,
    'tilt' =>  23.44,
    'sat'  => ['Moon'] },
  { 'name' => 'Mars',
    'mass' => 6.4185e23,
    'tilt' =>  25.19,
    'sat'  => ['Phobos', 'Deimos'] },
);

print "The 4th planet is $planet_prop2[3]->{'name'}\n";
print "It has ", scalar(@{$planet_prop2[3]->{'sat'}}), " satellites\n";
print "The first one is $planet_prop2[3]->{'sat'}->[0]\n";
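A nested structure can also be extended in place. This sketch uses an abbreviated copy of @planet_prop2 (only the name and sat members, and only two planets) and adds Jupiter with two of its moons; Jupiter's mass and tilt are deliberately left out rather than invented.

```perl
use strict;
use warnings;

# Abbreviated copy of @planet_prop2: only the 'name' and 'sat' members.
my @planet_prop2 =
(
  { 'name' => 'Earth', 'sat' => ['Moon'] },
  { 'name' => 'Mars',  'sat' => ['Phobos', 'Deimos'] },
);

# Append a new planet: an anonymous hash whose 'sat' member is an empty list.
push @planet_prop2, { 'name' => 'Jupiter', 'sat' => [] };

# @{...} turns the 'sat' reference back into a list that push can extend.
push @{$planet_prop2[2]->{'sat'}}, 'Io', 'Europa';

for my $p (@planet_prop2)
{
    printf "%-8s %d satellite(s)\n", $p->{'name'}, scalar(@{$p->{'sat'}});
}
```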

4. Further reading

The man pages referenced in perl(1), in particular:

perlsyn(1)     syntax
perldata(1)    data structures
perlop(1)      operators and precedence
perlsub(1)     subroutines
perlfunc(1)    built-in functions
perlvar(1)     predefined variables
perlreftut(1)  references, short introduction
perlre(1)      regular expressions, the rest of the story
perlretut(1)   regular expressions tutorial
perlref(1)     references, the rest of the story
perlreref(1)   regular expressions quick reference

This is http://www.teaser.fr/~amajorel/psgfcp/, last modified AYM 2007-12-11.