#+AUTHOR: Simon Watson
#+TITLE: In Defense of Perl
* Preamble
I've wanted to blog about Perl for a while now. I've had this conversation
with quite a few friends, peers, and co-workers over the years and I can never
seem to fully get to the bottom of it.
Perl has obviously lost a lot of ground in recent decades, and is not really
what I would consider a popular language anymore. I think there are a few
reasons for this, and they're not unfair reasons.
With that said, in my experience Perl seems to have a very bad reputation. Not
only is it not popular, people are often offended by it. It seems to provoke
strong reactions.
Languages like Ruby and PHP are often called "dead" and "not modern", and seem
to have many people calling out their usage. With that said both of these langs
seem to be thriving with strong communities, and for every person that nay-says
their use, there is another in the discussion celebrating them.
This doesn't seem to be the case with Perl in my experience.
Below I'd like to "defend" Perl a bit, and show why it still has a place in the
modern computing landscape. In doing so I will try to be clear about my own biases
and forthcoming about Perl's many issues.
Disclaimer/Note: Most of the comparisons I make in this article are between Perl
and Python. Not because Python is a bad language, but because Python is commonly
the language that's suggested as the "better" alternative to writing
something in Perl.
** My History and biases
Skip this section if you're more interested in the arguments I present on Perl's
behalf. This section is to give a little background and to try and enumerate my
biases in favor of Perl.
______________________________________________________________________________
*** Bias #1
Perl was the first language I felt I had learned to a pretty complete level. I'm
always hesitant to say I'm an expert in anything, but if I was ever an expert (or
close to it) in one area, it's probably Perl programming and syntax.
I think this immediately creates a bias in my head that Perl is a good language. I
think this kind of bias is pretty common when trying to talk about programming languages
objectively.
- I'm very familiar with it so it /feels/ easy
- I know all the standard patterns so I reach for it a lot, and
it's /very/ quick for me to write relative to other languages;
I often use it for prototyping even if I end up rewriting in
another lang later
*** Bias #2
I'm a sysadmin by trade, not a software engineer. Despite that, I've had to write
and maintain software (especially things like tooling) many times in my career.
I mention this because I think Perl favors this profession more so than strict
software engineering jobs. I'll touch more on this later, but wanted to mention
this here.
*** Bias #3
I very rarely need to deploy or write software for anything other than a Unix
environment, and even then 95% of the time, it's for a Linux environment. Again,
I'll cover this more later, but my experience and arguments are heavily biased
towards the /Unix/ Perl programming experience. Not only is this a bias, it's
a disclaimer of sorts, as I won't really be covering the non-Unix system Perl
programming experience.
** Why Perl is "bad"
Before I get into my arguments for using Perl today, it's important to cover some
of its downsides and talk a bit about why I believe it's been relatively abandoned
by the modern software development community.
I think one of the biggest reasons right off the bat that Perl has fallen by the wayside is
the Perl5/Perl6 debacle. Others have covered this pretty extensively, so I won't
belabor the point, but in essence I think the effort to "modernize" Perl, and
try and make it into a language that non-Perl users could love, while still keeping
Perl5 users happy, was too tall of an order. The spiral that came out of it fragmented
the community and userbase, and gave the initiative to Python.
The Perl Wikipedia page has some decent coverage of the various Perl5/6/7 lineages:
https://en.wikipedia.org/wiki/Perl#Raku_(Perl_6)
( For some more interesting reading/background, see:
https://en.wikipedia.org/wiki/Outline_of_Perl )
Secondly, the syntax. I will make some arguments for Perl's syntax later, but I'll take a
brief moment here to acknowledge that it can look esoteric on a good day, and downright
illegible on a bad one. The heavy use of sigils is something I will aim to cast as a
positive later on, but I will admit that needing to memorize and have an awareness of
different context-sensitive sigils can make code look "messy" or hard to decipher. More
on this later.
Lastly, and by and large the argument I'm most likely to hear against using Perl:
"No one uses it."
In this blog post I hope to address these arguments and others, with
concrete and constructive counter points.
My aim in writing this is _not to convince people to program in Perl._ It's to convince
people that _Perl is a perfectly fine language to use_ for many different problem
areas -- it's to show that it may in fact be the /better/ choice for some problem areas.
I hope the distinction is clear and that I can convince you!
* Addressing Arguments
Preamble out of the way, I'll get right down to brass tacks.
** Perl Syntax
*** Examples
People often talk about how Perl is completely unreadable and "write only". This can
be true, but I think it can be true for /any/ language, and as such doesn't really feel
like a valid criticism.
With that said, let's explore it a bit.
Let's start with something basic like making a hash using two arrays:
#+BEGIN_SRC perl
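#!/usr/bin/perl
use strict;
use warnings;
my @keys = ("a", "b", "c");
my @vals = (1, 2, 3);
my %hash;
# Hash slice: assign each value to the key at the same position
@hash{@keys} = @vals;
# Print the pairs in sorted key order (hash order is not guaranteed)
foreach my $key (sort keys %hash) {
    print "$key : $hash{$key}\n";
}
# Output:
# perl ar2h.pl
# a : 1
# b : 2
# c : 3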
#+END_SRC
#+BEGIN_SRC python
keys = ['a', 'b', 'c']
values = [1, 2, 3]
hash = {key: value for key, value in zip(keys, values)}
print(hash)
# Output:
# python3 ar2h.py
# {'a': 1, 'b': 2, 'c': 3}
#+END_SRC
For those unfamiliar with Perl's syntax, I'll break down briefly what's happening here:
We have two arrays with data in them, and an empty hash. Hashes in Perl are denoted by the '%'
symbol, arrays by the '@' symbol.
By addressing the hash '%hash' with the '@' symbol, we are essentially addressing one dimension of
the hash as a list. Perl calls this a hash slice. This 'syntax sugar' gives us an extremely ergonomic
way to reason about how data assignment is working in the assignment line.
We're taking the hash 'hash' and assigning the array 'keys' to its first dimension (the keys), and the
array 'vals' to its second dimension (the values). Because arrays are ordered, this mapping is
intuitive and predictable.
There's nothing terrible about the example Python code to me, but the idea that it's intrinsically
more readable doesn't ring true for me. It's just different, and presupposes that you understand
its assignment syntax in the way Perl presupposes you understand its sigils and tokens. To reiterate,
Python's syntax is no better or worse than Perl's in this case -- it's just different. The programmer
may have preferences for one or the other, but I don't think an argument can be made that one or the
other is objectively better.
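For what it's worth, the comprehension isn't the only way to spell this in Python. A minimal sketch,
reusing the same sample data, that passes the zipped pairs straight to the built-in =dict= constructor
(the name =mapping= is mine, chosen to avoid shadowing the built-in =hash=):
#+BEGIN_SRC python
keys = ['a', 'b', 'c']
values = [1, 2, 3]
# zip() pairs items by position; dict() builds the mapping from those pairs
mapping = dict(zip(keys, values))
print(mapping)
#+END_SRC
This doesn't change the argument: either spelling asks the reader to already know what =zip= does,
much as the Perl version asks them to know what =@hash{@keys}= means.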
With a simple example out of the way, I'm going to provide a code example from some code I wrote in
the past month:
#+BEGIN_SRC perl
if ( $log_file_path =~ m/(\d{4}-\d{2}-\d{2})\.log/ ) {
    my $log_date = $1;
    $log_date =~ tr/-//d;
    if ( $log_date < $LATEST_DATE ) {
        next;
    } else {
        my ($serial, $parsed_log_ref) = parse_log($log_file_path, \&json_line_parser);
        my $output_file_path = $PROCESSED_LOG_DIR_PATH . "/" . $serial;
        write_parsed_log_array($output_file_path, $parsed_log_ref);
    }
} else {
    die "Couldn't match log date in &process_seat_dir, exiting...\n";
}
#+END_SRC
This is a code path taken frequently in a log parser I wrote for something at my job.
Let's walk through the code and break it down in plain English. Feel free to skip if it's self-evident:
- Enter the =if= block if the variable =$log_file_path= matches a regex that looks something like =$YEAR-$MONTH-$DAY.log=.
- Upon entering the block, copy the first regex capture group (enclosed in =()= in the regex) into a variable,
=$log_date=.
- Use the Perl built-in =tr///= operator with the =d= modifier to delete any =-= chars from the string.
- Compare the resulting string (something that looks like =$YEAR$MONTH$DAY=) to a variable we set elsewhere in
the function scope, and skip to the next loop iteration via =next= if it's lower than =$LATEST_DATE=.
- Otherwise, call =parse_log()=. It expects to be passed two arguments: a string and a function reference.
It returns two values: a string, and a reference to an array, which represents an ordered list of
the lines in a file.
- Assign these two return values to variables called =$serial= and =$parsed_log_ref=.
- Construct a path name via the Perl built-in string concatenation operator ( =.= ) and assign it to =$output_file_path=.
- Finally, call a function that will flatten and write out the array of log lines to a file.
End syntax explanation.
____________________________________________________________
I think there are two potentially tricky Perl syntax-isms in the above snippet.
Firstly, the data type of =$parsed_log_ref= is completely opaque. If you don't have insight into what =parse_log=
is returning, you have no idea that =$parsed_log_ref= is an array reference. Statically typed languages obviously
solve this kind of problem for you, but I think in the domain of dynamic languages, this is a common problem
that comes with the territory. To my knowledge Python and Ruby don't have great answers for this (please feel
free to correct me on this).
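One partial answer on the Python side, as far as I can tell, is optional type annotations. They aren't
enforced at runtime, but an external checker such as mypy can verify them, and they at least tell the reader
what a function returns. Below is a minimal sketch of a hypothetical Python counterpart to =parse_log=. The
names, the ="serial"= field, and the return shape are my assumptions based on the description above, and it
takes a list of lines instead of a file path to stay self-contained:
#+BEGIN_SRC python
import json
from typing import Callable

# Hypothetical alias: a line parser takes one raw line and returns a dict.
LineParser = Callable[[str], dict]

def json_line_parser(line: str) -> dict:
    """Parse one JSON-encoded log line into a dict."""
    return json.loads(line)

def parse_log(lines: list[str], line_parser: LineParser) -> tuple[str, list[dict]]:
    """Return a (serial, parsed_lines) pair; the annotation documents both types."""
    parsed = [line_parser(line) for line in lines if line.strip()]
    # Assumption: every line carries a "serial" field; read it from the first line.
    serial = str(parsed[0]["serial"]) if parsed else ""
    return serial, parsed

serial, parsed_log = parse_log(
    ['{"serial": "A100", "msg": "boot"}', '{"serial": "A100", "msg": "ok"}'],
    json_line_parser,
)
print(serial, len(parsed_log))
#+END_SRC
The signature alone now says the second return value is a list of dicts, which is the information that
=$parsed_log_ref= hides in the Perl version.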
Secondly, unless you're familiar with Perl's tokens, it's unclear what =\&json_line_parser= is. I think this kind of
notation can actually be a /plus/ for Perl.
If I am passing some data to a function by reference, it's pretty clear what that data is (assuming it's not
encapsulated in a scalar like the aforementioned =$parsed_log_ref= example):
\@array
\%hash
\&function
\$scalar
For me personally, being able to denote type at a glance can be useful, as opposed to bare words in languages like Python,
where a lot of the time it's up to me as the reader to understand all the surrounding context in order to know what type
a variable is. I realize that in Python you can sometimes use ={}= or =[]= for type hints.
As mentioned above, Perl has this issue as well to an extent, but I think to a lesser extent than dynamically typed
languages that don't denote type with any kind of special syntax.
I think though that it's difficult to make an _objective_ argument in this regard, so...moving on.
*** Perlvars and Perl Magic
A friend brought up another interesting point with the last code snippet, the use of the Perl built-in =$1= var.
To me there seems to be some difficulty in making _objective_ arguments around syntax preferences, but in
discussing this article with a friend, I found there may be _objective_ arguments to make against
some of the abstractions Perl provides the user.
In the context of the previous code snippet, I'm using =m//= and regex capture groups to assign a variable:
#+BEGIN_SRC perl
if ( $log_file_path =~ m/(\d{4}-\d{2}-\d{2})\.log/ ) {
my $log_date = $1;
# ...
#+END_SRC
My friend's argument against this kind of magic assignment was that it breaks well-understood mental models of
how programs operate: you can't use a variable you haven't defined.
In the greater context of the program this snippet comes from, =$1= is never assigned anywhere;
it's set for me by Perl's regex engine after a successful match.
I was able to be convinced this kind of behavior is more harmful than the way Python handles this problem
( =re.match=, etc.) because it forces the reader, who may be familiar with many other languages, but not
Perl, to understand Perl-specific implementation details.
This is a very valid criticism I think, and as such this leads to the obvious question of what benefit
do you get from this complexity? I think, in a word: brevity.
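For comparison, here is a rough sketch of how the same date extraction might look in Python. The file path
is a made-up example, and I use =re.search= rather than =re.match= because =re.match= only matches at the
start of the string:
#+BEGIN_SRC python
import re

# Hypothetical file name, used only for illustration.
log_file_path = "/var/log/app/2022-03-14.log"

# re.search returns a match object (or None). The captured group is read from
# that object explicitly instead of from an implicit variable like $1.
match = re.search(r"(\d{4}-\d{2}-\d{2})\.log", log_file_path)
if match is None:
    raise SystemExit("Couldn't match log date, exiting...")

# Equivalent of tr/-//d: drop the dashes from the captured date.
log_date = match.group(1).replace("-", "")
print(log_date)
#+END_SRC
Every name that gets used here is assigned somewhere the reader can see, at the cost of a few more lines
than the Perl version.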
* Perl is /fast/
You're right. It's probably not as fast as C, but below I will try and show that for a lot of cases, Perl is
much faster than its main competition, Python, particularly in certain domains.
Note: I know benchmarking can be delicate and, if not handled carefully, can produce poor data and poorer conclusions.
I've tried to be as fair and accurate as possible in these comparisons, and I'm making an effort to act in
good faith. If you believe I've made a mistake or that there is a faster way to do something, please let me know.
The basic example below generates a file with 1,000,000 newline-separated 5-char strings:
Perl:
#+BEGIN_SRC perl
#!/usr/bin/perl
# perl --version | head -n2 | tail -n1
# This is perl 5, version 34, subversion 0 (v5.34.0) built for x86_64-linux
use strict;
use warnings;
my @chars = ( "A".."Z" );
foreach ( 1..1000000 ) {
    my $string = "";
    # Append five random uppercase letters
    foreach ( 1..5 ) {
        $string = $string . $chars[ rand @chars ];
    }
    print("$string\n");
}
#+END_SRC
Result:
#+BEGIN_EXAMPLE
/tmp/tmp.G5LzZmzDQq λ time ./gen_words.pl > output.txt
real 0m1.065s
user 0m1.031s
sys 0m0.003s
#+END_EXAMPLE
Python:
#+BEGIN_SRC python
#python -V
#Python 3.10.2
import string
import random
word_list = list(string.ascii_uppercase)
for x in range(0,1000000):
    # Named "word" so the loop doesn't shadow the imported string module
    word = ""
    # Append five random uppercase letters
    for y in range(0,5):
        word = word + random.choice(word_list)
    print(word)
#+END_SRC
Result:
#+BEGIN_EXAMPLE
/tmp/tmp.G5LzZmzDQq λ time python gen_words.py > output2.txt
real 0m3.986s
user 0m3.968s
sys 0m0.011s
#+END_EXAMPLE
Python ends up being roughly 3.7x slower here (3.986s vs. 1.065s wall-clock).
The next basic example takes a file of 1,000,000 newline-separated 5-char strings and reports how many are valid
words.
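In the spirit of the note above about faster ways to do things, one Python variant that may be worth timing
builds each string with =random.choices= and =str.join= instead of concatenating in an inner loop. This is
an untested suggestion, not a measured result: I have not benchmarked it against the numbers above, and the
helper name =gen_word= is mine.
#+BEGIN_SRC python
import random
import string

LETTERS = string.ascii_uppercase

def gen_word(length=5):
    """Return one random uppercase string of the given length."""
    # random.choices draws `length` letters in a single call;
    # str.join then builds the string once instead of concatenating in a loop.
    return "".join(random.choices(LETTERS, k=length))

# Small demo; a benchmark run would loop 1,000,000 times and print each word.
sample = [gen_word() for _ in range(3)]
print("\n".join(sample))
#+END_SRC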
#+BEGIN_SRC perl
#!/usr/bin/perl
use strict;
use warnings;
foreach {
}
#+END_SRC
* There is no better Unix glue
- Here talk about Perl's "best" use case, as a glue language
for processing text streams and/or unstructured text data
* Lesser known Perl features
- Perl magic goes here
* Feedback/Topics/Notes To cover
- Top 10 things not to do in Python code
- People prefer Python as more people know it
- Perception that the Python stdlib is more complete
- People like Perl for its portability
- People like Perl for text generation/report generation
- People like Perl for its use of one-liners
- Cover "higher order perl"
- Perl's history as a "sysadmin lang" re: Larry Wall/Randal Schwartz
- Perl is more like Lisp than it is like C, and this is an important
distinction
- Talk about higher order functions
- Talk about string function references
- Talk about how Perl is /faster/ than Python in most text stream
processing cases (prove this!)
- Not welcoming to newcomers