LINUX.IE, website of the Irish Linux Users' Group

[ILUG] Perl Regex Help Needed


Rory Winston rwinston at eircom.net
Sun Sep 11 16:53:30 IST 2005


Hi all

First of all, yes, I *do* realise that this is not a Perl mailing list 
per se. However, whenever I have gotten really stuck in the past and had 
to turn to this list, the combined expertise gathered here has never 
been found wanting. So apologies in advance. And yes, I have RTFM, etc., 
but I still can't figure this one out.

Consider the following - I have a single concatenated file of questions 
and answers. It looks something like this:

--- FILE 1 ---

Q 1) blah blah blah blah blah blah blah blah blah blah blah blah blah 
blah blah blah
blah blah blah blah blah blah blah blah

A 1) rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb 
rhubarb rhubarb rhubarb
 rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb 
rhubarb rhubarb
 rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb 
rhubarb rhubarb
 rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb 
rhubarb rhubarb

Q 2) blah blah blah blah blah blah blah blah blah blah blah blah blah 
blah blah blah
blah blah blah blah blah blah blah blah

A 2) rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb 
rhubarb rhubarb rhubarb
 rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb 
rhubarb rhubarb
 rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb 
rhubarb rhubarb
 rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb 
rhubarb rhubarb

Q 3) blah blah blah blah) blah blah blah blah blah blah blah blah blah 
blah blah blah)
blah blah blah blah blah blah blah blah

A 3) rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb 
rhubarb rhubarb rhubarb
 rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb 
rhubarb rhubarb
 rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb 
rhubarb rhubarb
 rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb 
rhubarb rhubarb


I want to extract the questions and answers, and mark them up as HTML. 
Sounds simple eh? I wish.
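
The markup half is the easier part. As a minimal sketch, assuming each 
entry has already been split into a type (Q or A), a number and its text, 
something like the following would do. The helper names escape_html and 
entry_to_html, the <dl>/<dt>/<dd> layout and the id scheme are all 
illustrative choices of mine, not a fixed requirement.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first so later entities are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: wrap one entry in <dt> (question) or <dd> (answer).
sub entry_to_html {
    my ( $type, $num, $text ) = @_;
    my $tag = $type eq 'Q' ? 'dt' : 'dd';
    return sprintf '<%s id="%s%d">%s</%s>',
        $tag, lc($type), $num, escape_html($text), $tag;
}

print "<dl>\n";
print entry_to_html( 'Q', 1, 'blah blah & blah' ),  "\n";
print entry_to_html( 'A', 1, 'rhubarb <rhubarb>' ), "\n";
print "</dl>\n";
```

Escaping & before the other characters matters, otherwise the ampersands 
introduced by &lt; and friends would be escaped a second time.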

What I really want is a regex that can extract the pertinent info from a 
question/answer line (such as the question/answer number), and then the 
text itself. My first attempt was:

my $trivia = do { local $/; <TRIVIA> };     # Slurp

while ( $trivia =~ m/^[QA] (\d+)\) (.*)/gm) {
    print "Matched ($1) and ($2)\n";
}

This works - sort of. The problem with the above regex is that it will 
only grab the question/answer text to the end of the line, and not until 
the next question/answer delimiter. I guess I could add the /s modifier 
so that the . in the (.*) portion also matches newlines, like so:

while ( $trivia =~ m/^[QA] (\d+)\) (.*)/gms) {
    print "Matched ($1) and ($2)\n";
}

But that doesn't work either. It greedily grabs *everything* up to the 
end of the file. I tried to coerce the (.*) match to be not quite so 
greedy by adding the ? that makes the quantifier lazy:

while ( $trivia =~ m/^[QA] (\d+)\) (.*?)/gms) {
    print "Matched ($1) and ($2)\n";
}

But that now grabs *nothing*: with nothing after it in the pattern, a 
lazy (.*?) is satisfied by the empty string. At this point (having also 
tried some combinations of using the \G assertion), I am well and truly 
stuck. I just want to say "grab everything up until the next instance of 
a pattern that signifies a question/answer".
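
For what it is worth, one way that "up until the next delimiter" idea 
could be written is a lazy match followed by a lookahead, so that the 
lazy (.*?) has something to stop at. This is only a sketch under some 
assumptions: parse_trivia is a hypothetical helper name, the delimiter is 
assumed to always look like "Q n) " or "A n) " at the start of a line, 
and no line of body text is assumed to start that way. The number is 
captured as (\d+) so that multi-digit numbers are captured whole.

```perl
use strict;
use warnings;

# Hypothetical helper: split slurped text into [type, number, text] records.
sub parse_trivia {
    my ($trivia) = @_;
    my @entries;

    # /m lets ^ match at every line start, /s lets . match newlines.
    # (.*?) expands lazily until the lookahead sees the next delimiter
    # at the start of a line, or the end of the string (\z).
    while ( $trivia =~ m/^([QA]) (\d+)\) (.*?)(?=^[QA] \d+\) |\z)/gms ) {
        my ( $type, $num, $text ) = ( $1, $2, $3 );    # copy before reusing regexes
        $text =~ s/\s+/ /g;     # join wrapped lines into a single line
        $text =~ s/^ | $//g;    # trim a leading or trailing space
        push @entries, [ $type, $num, $text ];
    }
    return @entries;
}

my $sample = <<'END';
Q 1) blah blah
blah blah

A 1) rhubarb rhubarb
 rhubarb rhubarb

Q 2) blah blah) blah
END

for my $e ( parse_trivia($sample) ) {
    print "Matched ($e->[0] $e->[1]) and ($e->[2])\n";
}
```

Because the lookahead does not consume the next delimiter, the following 
iteration of the /g loop starts exactly on it, and stray ")" characters 
inside the text (as in question 3 above) are harmless since only ")" 
directly after the number at the start of a line is treated as part of a 
delimiter.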

If anyone can help with this at all, the retro computing community will 
be very thankful!!!

Thanks
Rory




