Genealogy Introduction Comment book Family Album Christmas 1998 Cats Our Cats Neighbor Cats Boldt Introduction Album Mecklenburg Names index Moll Introduction Moll Reunion 2003 MollFest 2000 Album Group Photos Famous relatives Names index Genealogy Computing Introduction Genealogy & Perl Perl Gedcom Objects Modules Programs 1 Programs 2 |
Computing - Perl and GedcomOkay, so here's where we get to the useful stuff. This page discusses some basic techniques for using Perl to process Gedcom files. I think it should be clear how powerful Perl is in this area. Parsing Gedcom RecordsUsing Perl's robust support for pattern matching, it is rather easy to split a Gedcom record into its parts. The following regular expression alone does that: /^(\d+)\s+(@\S+@)?\s*(\S+)\s+(.*)/; Let's break down that expression to see what it does:
The parentheses around each element indicate that the matched
text is placed in predefined variables Reading Gedcom FilesThe following Perl code reads the Gedcom records from standard input:
@lines = <STDIN>; # Read all lines into array @lines
foreach (@lines) # Each line is placed into predefined scalar $_
{
# Get rid of CR/LF chars
chomp; # Get rid of end of line character
s/[\cM\cJ]//; # Get rid of other CR or LF characters
# Parse record
/^(\d+)\s+(@\S+@)?\s*(\S+)\s+(.*)/;
$level = $1; # Save the level number
$label = $2; # Save the label (if specified)
$tag = uc($3); # Uppercase the tag
$text = $4; # Save everything else
# Process record
...
}
In this code fragment, you can substitute the ellipses with the code to process the record. Put loop into subroutineTo make this more flexible, the above loop can be specified in a subroutine. The processing to be done on each record can be passed in as a block:
sub loopGedcom (&\@)
{
my ($proc, $lins) = @_;
local $level, $label, $tag, $text;
foreach (@$lins)
{
# Get rid of CR/LF chars
chomp;
s/[\cM\cJ]//;
# Parse record
/^(\d+)\s+(@\S+@)?\s*(\S+)\s+(.*)/;
$level = $1;
$label = $2;
$tag = uc($3);
$text = $4;
# Process record
&$proc;
}
}
This subroutine takes two parameters:
Here's an example of how to call the above subroutine:
# Read input file
@lines = <STDIN>;
# Loop through all records
loopGedcom
{
# Do we have a NAME record?
if ($tag eq 'NAME')
{
# Process NAME record
...
}
}
@lines;
Put subroutine into moduleSince this subroutine can be used by any Perl program that
processes Gedcom files, it should be placed in a Perl
module. Modules are files with extension A module can contain any Perl code and ends with the number
# Module "gedcom.pm"
sub loopGedcom (&\@)
{
my ($proc, $lins) = @_;
local $level, $label, $tag, $text;
foreach (@$lins)
{
# Get rid of CR/LF chars
chomp;
s/[\cM\cJ]//;
# Parse record
/^(\d+)\s+(@\S+@)?\s*(\S+)\s+(.*)/;
$level = $1;
$label = $2;
$tag = uc($3);
$text = $4;
# Process record
&$proc;
}
}
1
Sample ProgramNow, let's do something useful. When publishing genealogy data on the internet, it is useful to be able to list all surnames in a Gedcom file. The following program reads a Gedcom file from standard input and writes out the unique surnames to standard output.
#!/usr/bin/perl
########################################################################
# surnames #
# #
# Print a sorted list of surnames in the input gedcom file. #
# #
# Reads from STDIN and writes to STDOUT. #
#----------------------------------------------------------------------#
# (c) Copyright 1999 Hans Boldt #
# #
# This program may be freely distributed and modified provided that #
# the original copyright be kept and that the source code must be #
# available. #
########################################################################
use gedcom; # Contains subroutine 'loopGedcom'
# Read input file
@lines = <STDIN>>;
# Loop through GEDCOM data and extract surnames
loopGedcom
{
# Do we have a name?
if ($tag eq 'NAME'
&& $text =~ m"/(.*)/")
{
if ($1 and $1 ne '')
{
$list{$1} ++;
}
}
}
@lines;
# Print out list of surnames
foreach (sort keys %list) # Each item of list is placed into $_
{
print "$_\n";
}
exit;
(To call this program on your computer, check the documentation specific your version of Perl.) |