Computing - Perl

Perl stands for Practical Extraction and Report Language. Perl is one of the most exciting computer languages available today. The ease in which you can do character processing in Perl has made it one of the most popular languages for developing web pages. Some web sites, such as Yahoo or SlashDot, rely very heavily on Perl, using tens of thousands of lines of Perl code.

Perl is an example of open source software. That means that Perl interpreters can be obtained for free, with even source code available. Thus, anyone can examine the code to find and fix any defects. Perl has been ported to a wide variety of operating systems, including Linux and Windows. As long as you don't use operating system specific commands, Perl programs can be very portable.

Here's an interesting story involving Perl. In 1997, one student overwhelmingly won the UCLA undergraduate programming contest using Perl to solve five out of six problems. The second place team used something else and only solved three problems. As a result of this, Perl has been banned from this contest!

I'm not going into much detail here on Perl since there is already much good information on it. But I will describe some of the basics and, later, I'll present a more substantial example.

Basics

Here is the usual "Hello World" program:

#!/usr/bin/perl
print "Hello, World!\n";

Exciting, huh? Well, okay, we'll get to something exciting later, but it seems that every language tutorial has to start with this example. Here are some of the essential concepts. For more information, check out the links.

Datatypes

Like other scripting languages, such as REXX, data in Perl is basically character. Some operations, however, such as the arithmetic operators, treat the character data as numbers.

Normally, you don't have to worry about the difference and you just code what makes sense, but you do have to be careful with comparisons. The operators eq, ne, le, lt, ge, gt, and cmp perform the comparison in character form. The operators ==, !=, <=, <, >=, >, and <=> perform the comparison in numeric form.

Data structures

Names are prefixed with $, @, or % depending on how the data is to be accessed. Scalar variables are accessed using $, for example, $max.

Arrays are indicated by @, such as @array. Arrays are indexed starting from 0 (like C and Java). An example of an indexed array is $array[17], which accesses the 18th element of the array. Note that since the array element contains a scalar value, the name is prefixed by $. @array gives you the entire array.

Hashes are indicated by %. A hash is like an array, but indexed with any arbitrary character string. An example of using a hash is $hash{'name'}. As with arrays, %hash refers to the entire hash.

References

References are like pointers in other languages. You can set a reference to another object using $r=\@array. In this case, $r is a reference to the other object, @$r refers to the original array object, and $$r[17] refers to a particular array element.

Predefined Variables

Perl has a large number of predefined variables. Here are a few of them:

@ARGV is an array that holds the command line parameters specified when calling the program.

@_ is an array that holds the parameters passed to a subroutine.

$_ is a commonly used variable. Many operations implicitly use this variable if an operand is not explicitly specified. For example, many programs loop through the records of a file. Rather than explicitly reading each record into a variable, these programs often default to using $_, which makes the code within the loop a little more concise.

Regular expressions

Here's where Perl really shines. Regular expressions are used in matching and extracting text from character strings. Regular expressions are typically coded between slashes, as in /something/. Conditional expression $var=~/what/ evaluates true if variable $var contains the string what.

To check if a variable contains a numeric value, you can use a regular expression like /^\d+$/. Symbol ^ matches the beginning of the string. Symbol \d matches a digit. The + indicates that one or more of the characters match. And finally, the $ matches the end of the string.

Subroutines

Subroutines are defined using statement sub, and work like subroutines in most other languages.

But, here's another place where you can really take advantage of the language. If you define a subroutine as taking a parameter of type &, you can pass an anonymous subroutine as a parameter. This is simply a block of code delimited by braces.

For example, Perl function sort can take a block as a parameter. The following expression sorts the array into numeric sequence (rather than character sequence).

sort { $a <=> $b } @array

Comments

Comments are what follow the character #. The first line of a Perl program is typically #!/usr/bin/perl which indicates to Posix style systems (like Unix and Linux) the interpreter that runs the program.

Links

.The Perl Institute is the best place to learn more about Perl. You don't need any other links! You can find good tutorials, lists of recommended books, and information on how to download and run Perl on your computer.

Sample Perl Program

Here is a non-trivial program that illustrates many features of Perl.

I first wrote a program to solve the triangle puzzle in C after my sister-in-law tried, and failed, to solve the puzzle. When I first ran the program, I thought there was a bug in it since it spewed out hundreds of solutions. But after checking out a few of them, I realised that there really are tens of thousands of solutions!

I lost the original C source, but I thought it would be a good exercise to try this again in Perl. The key to solving the puzzle is the recursive subroutine check. This subroutine checks to see if we have one peg left on the board and, thus, a solution to the puzzle. If not, the subroutine loops through all possible moves, and then checks the resulting configuration. Since that's what the subroutine already does, it can be called again.

By the way, if you want to see a sample of what this program writes out, scroll down to the bottom of this page.

#!/usr/bin/perl -w
########################################################################
# triangle.pl {empty-hole {max}}                                       #
#                                                                      #
# Solve the triangle puzzle.                                           #
#                                                                      #
# empty-hole -- The index of the empty hole.  It must be a number      #
#               between 0 and 14.  If not specified,  hole 0 is        #
#               initially empty.                                       #
# max -- The maximum number of desired solutions.  If not specified,   #
#        if defaults to 1.                                             #
#                                                                      #
# The triangle puzzle is played on a triangular arrangement of         #
# 15 holes into which are placed 14 pegs.  Pegs are moved jumping      #
# across neighboring pegs into the empty hole immediately beyond.      #
# The jumped peg is removed.  You win when all except one peg are      #
# removed.                                                             #
#                                                                      #
#               0                                                      #
#             1   2                                                    #
#           3   4   5                                                  #
#         6   7   8   9                                                #
#       10  11  12  13  14                                             #
#                                                                      #
#----------------------------------------------------------------------#
# (c) Copyright 1999 Hans Boldt                                        #
#                                                                      #
# This program may be freely distributed and modified provided that    #
# the original copyright be kept and that the source code must be      #
# available.                                                           #
########################################################################

# Initialize array that represents game board:
@board = (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1);

# 'moves' is an array that holds all possible moves.  Each element
# is a 3 element array that indicates:
#    - 0: index of hole with peg to move
#    - 1: index of hole to jump
#    - 2: index of destination
@moves = (
    [  0,  1,  3 ], [  0,  2,  5 ],
    [  1,  3,  6 ], [  1,  4,  8 ],
    [  2,  4,  7 ], [  2,  5,  9 ],
    [  3,  1,  0 ], [  3,  4,  5 ],
    [  3,  7, 12 ], [  3,  6, 10 ],
    [  4,  7, 11 ], [  4,  8, 13 ],
    [  5,  2,  0 ], [  5,  4,  3 ],
    [  5,  8,  12], [  5,  9, 14 ],
    [  6,  3,  1 ], [  6,  7,  8 ],
    [  7,  4,  2 ], [  7,  8,  9 ],
    [  8,  4,  1 ], [  8,  7,  6 ],
    [  9,  5,  2 ], [  9,  8,  7 ],
    [ 10,  6,  3 ], [ 10, 11, 12 ],
    [ 11,  7,  4 ], [ 11, 12, 13 ],
    [ 12, 11, 10 ], [ 12,  7,  3 ],
    [ 12,  8,  5 ], [ 12, 13, 14 ],
    [ 13, 12, 11 ], [ 13,  8,  4 ],
    [ 14, 13, 12 ], [ 14,  9,  5 ]   );

# Get parameters of program
($empty, $max, @rest) = @ARGV;

# If first parameter is ?, print out some help text
if ($empty and $empty eq '?')
{
   print "triangle.pl {first {max}}\n";
   print "\n";
   print "Solve the triangle puzzle.\n\n";
   print "   first -- Index of empty hole (0-14)\n";
   print "   max   -- Maximum number of solutions wanted\n\n";
   exit 1;
}

# Validate first parameter if specified
if ($empty and $empty ne '')
{
   # First parameter must be a non-negative number less than 15
   if (not ($empty =~ /^\d+$/)
   or  $empty > 14)
   {
      die "***** First parameter '$empty' is not valid.";
   }
}
else
{
   # Default to hole number 0
   $empty = 0;
}

# Validate second parameter if specified
if ($max and $max ne '')
{
   # Second parameter must be a non-negative number
   not ($max =~ /^\d+$/)
         and die "***** Second parameter '$max' is not valid.";
}
else
{
   # Default number of solutions desired is 1
   $max = 1;
}

# If third parameter specified, issue warning
if (@rest)
{
   print STDERR "+++++ Excess parameters @rest ignored\n\n";
}

# Remove desired peg from board
$board[$empty] = 0;
$count = 0;

# Check if we have a solution
check("");

exit 0;


#----------------------------------------------------------------------#
# check   -- Check if we have found a solution.  If not, try           #
#            all possible moves                                        #
#----------------------------------------------------------------------#
# Parameters:                                                          #
#   $sol -- list of moves that got us to this position                 #
#----------------------------------------------------------------------#
# Globals:                                                             #
#   @board -- game board.                                              #
#   @moves -- list of all possible moves.                              #
#----------------------------------------------------------------------#
# Notes:                                                               #
#   This is a recursive subroutine. First, we check if we have only    #
#   one peg left.  If so, we have a solution to the puzzle.            #
#                                                                      #
#   If we don't have a solution yet, try out all possible moves.  If   #
#   a move is possible, make the move and then check if we have only   #
#   one peg left.  This is where the recursion comes in - we can call  #
#   this subroutine again to check for a solution.                     #
#----------------------------------------------------------------------#
sub check
{
   # Get solution so far
   my $sol = $_[0];

   # Check if we have a solution
   $pegs = 0;
   foreach (@board)
   {
      $pegs += $_;
   }

   # Do we have a solution?
   if ($pegs == 1)
   {
      # Yes! We have a solution.
      $count++;
      chop $sol; # chop trailing blank
      chop $sol; # chop trailing comma
      print "$count:\t$sol\n";

      # Exit if we have enough solutions already.
      $count >= $max and exit;
      return;
   }

   # We don't have a solution yet.  Go through all possible moves
   # and check if we have a solution yet.
   my ($arr, $from, $skip, $to);
   foreach $arr (@moves)
   {
      # Get one possible move
      ($from, $skip, $to) = @$arr;

      # Is this move possible on this board?
      if ($board[$from] == 1
      and $board[$skip] == 1
      and $board[$to] == 0)
      {
         # Move is possible -- make move: a peg is removed from
         # one hole, jumps over a neighboring peg, and is placed
         # into the target hole.  The jumped peg is removed.
         $board[$from] = 0;
         $board[$skip] = 0;
         $board[$to]   = 1;

         # Check if we have a solution
         check("$sol$from-$to, ");

         # Back out of move
         $board[$from] = 1;
         $board[$skip] = 1;
         $board[$to]   = 0;
      }
   }

} # End of sub check

Here are a few sample solutions printed by this program:

1:      3-0, 5-3, 0-5, 6-1, 9-2, 11-4, 12-5, 1-8, 2-9, 14-5, 5-12, 13-11, 10-12
2:      3-0, 5-3, 0-5, 6-1, 9-2, 11-4, 12-5, 2-9, 1-8, 14-5, 5-12, 13-11, 10-12
3:      3-0, 5-3, 0-5, 6-1, 9-2, 11-4, 12-5, 2-9, 14-5, 1-8, 5-12, 13-11, 10-12
4:      3-0, 5-3, 0-5, 6-1, 9-2, 11-4, 13-11, 10-12, 12-5, 1-8, 2-9, 14-5, 5-12
5:      3-0, 5-3, 0-5, 6-1, 9-2, 11-4, 13-11, 10-12, 12-5, 2-9, 1-8, 14-5, 5-12
6:      3-0, 5-3, 0-5, 6-1, 9-2, 11-4, 13-11, 10-12, 12-5, 2-9, 14-5, 1-8, 5-12
7:      3-0, 5-3, 0-5, 6-1, 9-2, 11-4, 13-11, 10-12, 12-5, 2-9, 14-5, 5-3, 1-6
8:      3-0, 5-3, 0-5, 6-1, 9-2, 11-4, 13-11, 10-12, 12-5, 2-9, 14-5, 5-3, 3-0
9:      3-0, 5-3, 0-5, 6-1, 9-2, 12-3, 1-6, 10-3, 13-4, 2-7, 3-12, 11-13, 14-12
10:     3-0, 5-3, 0-5, 6-1, 9-2, 12-3, 1-6, 10-3, 14-12, 11-13, 13-4, 2-7, 3-12