Hi all
First of all, yes, I *do* realise that this is not a Perl mailing list
per se. However, whenever I have gotten really stuck in the past and had
to turn to this list, the combined expertise gathered here has never
been found wanting. So apologies in advance. And yes, I have RTFM'd, etc.,
but I still can't figure this one out.
Consider the following: I have a single concatenated file of questions
and answers. It looks something like this:
--- FILE 1 ---
Q 1) blah blah blah blah blah blah blah blah blah blah blah blah blah
blah blah blah
blah blah blah blah blah blah blah blah
A 1) rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb
rhubarb rhubarb rhubarb
rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb
rhubarb rhubarb
rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb
rhubarb rhubarb
rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb
rhubarb rhubarb
Q 2) blah blah blah blah blah blah blah blah blah blah blah blah blah
blah blah blah
blah blah blah blah blah blah blah blah
A 2) rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb
rhubarb rhubarb rhubarb
rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb
rhubarb rhubarb
rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb
rhubarb rhubarb
rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb
rhubarb rhubarb
Q 3) blah blah blah blah) blah blah blah blah blah blah blah blah blah
blah blah blah)
blah blah blah blah blah blah blah blah
A 3) rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb
rhubarb rhubarb rhubarb
rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb
rhubarb rhubarb
rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb
rhubarb rhubarb
rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb rhubarb
rhubarb rhubarb
I want to extract the questions and answers, and mark them up as HTML.
Sounds simple, eh? I wish.
What I really want is a regex that can extract the pertinent info from a
question/answer line (such as the question/answer number), and then the
text itself. My first attempt was:
my $trivia = do { local $/; <TRIVIA> };   # Slurp the whole file into one string
while ( $trivia =~ m/^[QA] (\d+)\) (.*)/gm ) {
    print "Matched ($1) and ($2)\n";
}
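A side issue in the number capture, separate from the line-ending problem: the `+` needs to sit inside the capturing group. `(\d)+` repeats the group, so `$1` keeps only the digit from the last repetition, which goes wrong from question 10 onwards; `(\d+)` captures the whole number. A minimal check with a made-up two-digit line:

```perl
use strict;
use warnings;

# A made-up question line with a two-digit number.
my $line = "Q 12) blah blah blah";

# Quantifier outside the group: the group itself is repeated,
# and $1 holds only what it matched on the last repetition.
my ($outside) = $line =~ m/^[QA] (\d)+\) /;

# Quantifier inside the group: $1 holds the whole number.
my ($inside) = $line =~ m/^[QA] (\d+)\) /;

print "(\\d)+ captured '$outside'\n";   # prints: (\d)+ captured '2'
print "(\\d+) captured '$inside'\n";    # prints: (\d+) captured '12'
```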
This works, sort of. The problem with the above regex is that it only
grabs the question/answer text up to the end of the line, and not up to
the next question/answer delimiter. I guess I could add the /s modifier
so that the (.*) portion also matches newlines, like so:
while ( $trivia =~ m/^[QA] (\d+)\) (.*)/gms ) {
    print "Matched ($1) and ($2)\n";
}
But that doesn't work either. It greedily grabs *everything*. I tried to
coerce the (.*) match into being less greedy by adding the lazy ?
modifier:
while ( $trivia =~ m/^[QA] (\d+)\) (.*?)/gms ) {
    print "Matched ($1) and ($2)\n";
}
But that now grabs *nothing*. At this point (having also tried some
combinations using the \G assertion), I am well and truly stuck. I just
want to say "grab everything up until the next instance of a pattern
that signifies a question/answer".
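Why the lazy version came back empty: with nothing after `(.*?)`, the empty string already completes the match, so Perl stops there. A lazy quantifier only helps when something follows it that it has to reach. One common way to say "up until the next marker" is a zero-width lookahead. A minimal sketch, assuming that no continuation line itself starts with something like `Q 7) ` or `A 7) `; the sample text and the `<dl>`/`<dt>`/`<dd>` markup are made up for illustration, not taken from the real file:

```perl
use strict;
use warnings;

# Made-up sample in the same shape as the real file.
my $trivia = <<'END';
Q 1) blah blah blah
blah blah
A 1) rhubarb rhubarb
rhubarb
Q 2) blah blah) blah
A 2) rhubarb
END

# /m: ^ matches at the start of every line.
# /s: . also matches newlines, so (.*?) can span lines.
# (.*?) is lazy, and the lookahead (?=...) tells it where to stop:
# at the start of the next "Q n) " / "A n) " line, or at the very end
# of the string. The lookahead is zero-width, so the next marker is
# not consumed and the following /g iteration starts right on it.
my @records;
while ( $trivia =~ m/^([QA]) (\d+)\) (.*?)(?=^[QA] \d+\) |\z)/gms ) {
    my ( $kind, $num, $text ) = ( $1, $2, $3 );
    $text =~ s/\s+\z//;    # trim the newline(s) left before the next marker
    push @records, [ $kind, $num, $text ];
}

# Minimal escaping for text placed inside HTML elements.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Hypothetical markup: one <dt> per question, one <dd> per answer.
print "<dl>\n";
for my $r (@records) {
    my ( $kind, $num, $text ) = @$r;
    my $tag = $kind eq 'Q' ? 'dt' : 'dd';
    print "<$tag>$num) ", html_escape($text), "</$tag>\n";
}
print "</dl>\n";
```

Because the lookahead does not consume the next marker, no \G juggling should be needed: plain /g resumes where the previous match ended, which is exactly at the next "Q n)" or "A n)" line. Stray ")" characters inside the text (as in question 3 above) are harmless, since the stop condition only fires at the start of a line.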
If anyone can help with this at all, the retro computing community will
be very thankful!!!
Thanks
Rory
Maintained by the ILUG website team.