Using Regular Expressions


November 10th, 2012 k3a

In my opinion, regular expressions are one of the most useful tools in general programming, and they are one of my secret weapons. I use them very often: for parsing data from web pages, extracting language strings from game code (to internationalize the game), modifying and processing text files, and processing the output of various utilities such as the “svn” terminal client. I mostly use Ruby or Perl for this, as they offer various other cool features for text processing (and Perl is super-fast as well).

For example, to remove all files that the “svn stat” command marks with a “!” character (files missing from the working copy), I used this Perl one-liner:

svn stat | perl -ne 'system "svn rm \"$1@\"\n" if /\!\s+(.*)/'

I can now improve it, since I have learnt a bit more Perl in the meantime:

svn stat | perl -ne 'system qq[svn rm "$1@"\n] if /\!\s+(.*)/'

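qqSEPxxxSEP is equivalent to the “xxx” double-quoted string literal (variables are still interpolated), but it lets you choose the SEP delimiter, so double quotes inside the string no longer need backslashes. Bracket-style delimiters come in pairs, as in qq[xxx].
Such a short one-line script can save you the several minutes you would otherwise spend copy-pasting each file name and typing the rm command before it… would you do it manually again?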
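
To see what qq buys you without touching a real repository, here is a small dry-run sketch. It assumes a made-up sample of “svn stat” output (the file names are hypothetical) and only builds the commands instead of running them:

```perl
use strict;
use warnings;

# Hypothetical sample of "svn stat" output; the file names are made up.
my @status_lines = (
    '!       old/readme.txt',
    'M       src/main.c',
    '!       img/logo@2x.png',
);

# Build (but do not run) one "svn rm" command per missing file.
sub build_rm_commands {
    my @commands;
    for my $line (@_) {
        # "!" in the first column marks a file missing from the working copy.
        next unless $line =~ /^!\s+(.*)/;
        # qq[...] behaves like "...", so the inner double quotes need no backslashes.
        # The trailing "@" keeps svn from reading an "@" in the file name as a peg revision.
        push @commands, qq[svn rm "$1@"];
    }
    return @commands;
}

print "$_\n" for build_rm_commands(@status_lines);
# svn rm "old/readme.txt@"
# svn rm "img/logo@2x.png@"
```

Printing the commands first and only then swapping print for system is a cheap way to check the regexp before anything gets deleted.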

If you still don’t know which characters you have to escape and which you don’t, don’t worry, it’s actually quite easy. All you need to do is think about what each character means in a regexp. If you know that parentheses “(” and “)” delimit match groups and you wish to match a literal “(” in the input, you have to escape it as “\(”; otherwise you start a new match group instead of matching the input character. The dot means “any character” in a regexp, so if you want to match a literal dot in the input, you need to escape it like this: “/\./”. The same goes for “/” if you use it as the regexp delimiter. If you want to match the input “http://k3a.me/”, you obviously can’t write the regexp “/http://k3a.me//”; you have to escape the special characters like this: “/http:\/\/k3a\.me\//”. Makes sense, eh?
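
A quick sketch of why the escaping matters. The look-alike host “k3a-me” and the “call(42)” string are made up for the demonstration; quotemeta and \Q…\E are Perl built-ins that do the escaping for you when the text to match comes from a variable:

```perl
use strict;
use warnings;

my $url = 'http://k3a.me/';

# Unescaped dot: "." matches any character, so a look-alike host slips through.
my $loose_match = 'http://k3a-me/' =~ /http:\/\/k3a.me\// ? 1 : 0;

# Escaped dot: only a literal "." is accepted.
my $strict_match = 'http://k3a-me/' =~ /http:\/\/k3a\.me\// ? 1 : 0;

print "unescaped dot matched look-alike: $loose_match\n";    # 1
print "escaped dot matched look-alike: $strict_match\n";     # 0

# Escaped parentheses match literal "(" and ")" instead of opening a group.
print "found literal parens\n" if 'call(42)' =~ /call\(\d+\)/;

# quotemeta escapes every non-word character, special or not.
my $escaped = quotemeta($url);
print "$escaped\n";    # http\:\/\/k3a\.me\/

# \Q...\E applies the same escaping inside a pattern.
print "quoted pattern matches\n" if $url =~ /^\Q$url\E$/;
```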

And one more thing. Did you know that you don’t have to write “/http:\/\/k3a\.me/” and can write “m#http://k3a\.me#” instead? The “m” prefix lets you choose the delimiter of the expression. With a well-chosen delimiter, your regexp will be much more readable. Almost any punctuation character works, for example “m|http://|” or “m,http://,”.
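
As a small illustration (the input string is made up), the same pattern can be written with several delimiters, and substitutions accept custom delimiters as well:

```perl
use strict;
use warnings;

my $input = 'visit http://k3a.me/ today';

# The same pattern written with three different delimiters.
my $with_slashes = $input =~ /http:\/\/k3a\.me\//;
my $with_hash    = $input =~ m#http://k3a\.me/#;
my $with_braces  = $input =~ m{http://k3a\.me/};

print "all three match\n" if $with_slashes && $with_hash && $with_braces;

# s/// takes custom delimiters too; paired brackets wrap each half separately.
(my $secure = $input) =~ s{http://}{https://};
print "$secure\n";    # visit https://k3a.me/ today
```

Pick a delimiter that does not occur in the pattern itself; otherwise you are back to escaping.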

By the way, I finally came across one more advanced Perl feature: using Perl code to create the replacement string of a regexp substitution:

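use POSIX qw(ceil);  # ceil() is not a Perl built-in

while (<DATA>) {
  # $1 is the old rows value, $2 is the textarea content
  s|<textarea rows="(.+?)">(.*?)</textarea>| {
    my $rows = ceil(length($2) / 80);
    qq[<textarea rows="$rows">$2</textarea>];
  }|egis;
  print;
}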

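It reads the DATA input line by line and tries to match textarea tags. When a match is found, the code in the replacement part runs and its result becomes the replacement string; that is what the “e” in the “egis” modifiers does (“g” replaces every match, “i” ignores case, “s” lets the dot match newlines). The matched groups are available to the code as input. You can also use a function call or any other expression, like: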

$str =~ s/some(thing|body)/getName($1)/eg;

This is what I wanted to achieve for a long time. :)
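
For experimenting, here is a self-contained sketch of both forms that works on in-memory strings. The get_name lookup and its return values are hypothetical stand-ins, and resize_textareas is just a name chosen for this example; the 80 characters per row come from the snippet above:

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical lookup standing in for getName(); the names are made up.
sub get_name {
    my ($kind) = @_;
    return $kind eq 'thing' ? 'a widget' : 'Alice';
}

my $str = 'something called somebody';
# /e evaluates the replacement as Perl code; /g repeats it for every match.
$str =~ s/some(thing|body)/get_name($1)/eg;
print "$str\n";    # a widget called Alice

# Recompute each textarea's rows attribute from its content length,
# assuming 80 characters per row.
sub resize_textareas {
    my ($html) = @_;
    $html =~ s{<textarea rows="(.+?)">(.*?)</textarea>}{
        my $rows = ceil(length($2) / 80);
        qq[<textarea rows="$rows">$2</textarea>];
    }egis;
    return $html;
}

# 200 characters of content need ceil(200 / 80) = 3 rows.
print resize_textareas('<textarea rows="1">' . ('x' x 200) . '</textarea>'), "\n";
```

With braces as delimiters, the replacement half reads like an ordinary code block, which is handy once it grows beyond a single expression.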

Happy matching, parsing and replacing!

Categories: Programming Tags: boundary, delimiter, pattern, pcre, perl
