On Thu, Dec 1, 2011 at 20:09, Kingsley G. Morse Jr. <kingsley at loaner.com> wrote:
> Hi Marcus,
>
> Here's a sample script that may help.
> It echos your data into a pipe of two seds.
>
> Maybe you can quickly and easily see it work by
> using your mouse to copy and paste it into a
> command line.
>
> #!/bin/bash
> echo "<tag>(FR) text
> <tag> - (FR) text
> <tag> (FR)
> text
> <tag>
> (FR) text
> <tag>
> - (FR) text
> <tag>
> othertext - (FR) text
> <tag>othertext - (FR) text
> <tag>othertext -
> (FR) text" | sed -E -n '/<tag>/{N; s/\n//; s/<tag>(.*)\(FR\)/<tag language="FR" attribute=\1>/g;p;}' | sed 's/> \?/>\n/g'
>
> It seems to me that useful examples are at
>
> http://codesnippets.joyent.com/posts/show/2111
>
> I hope that helps,

It does, thanks.
With this one I was able to get to here:
sed -E -n '/>/{N; s/\n//; s/>(.*)\(([FR|PT|ES|GA]*)\)/ language="\2" attribute="\1">/g;p;}' | sed -e 's/> \?/>\n/g'
which almost does what it should. I tried to get rid of the extra
spaces and dash if they were there using [ ]*-*[ ]* but it never seems
to match (although _just_ one space either side will match).
Also, is there any way in sed to specify "only two of these
characters"? So instead of FR|PT|ES etc. I could use something like
[A-Z]2?
Marcus
--
Marcus Furlong
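
A possible way to approach both questions above, offered as a sketch rather than an answer from the thread. Two points seem relevant. First, [FR|PT|ES|GA] is a bracket expression, so it matches single characters from the set F, R, |, P, T and so on; alternation would be written (FR|PT|ES|GA), and with -E "exactly two capitals" can be written as the interval [A-Z]{2}. Second, (.*) is greedy, so it has probably already consumed the spaces and the dash by the time [ ]*-*[ ]* is tried, leaving that part to match the empty string. One workaround is to strip the separator in a first substitution and capture in a second. The function name tag_language is hypothetical, and the sketch assumes a sed that accepts -E and input where the tag and the language code are already on one line (the N; s/\n// joining step is left out).

```shell
#!/bin/bash
# Hypothetical helper, not from the thread.
# Assumes: sed supports -E; the tag and "(XX)" are already on the same line.
tag_language() {
  # 1st expression: drop optional spaces and an optional dash before "(XX)".
  # 2nd expression: capture the remaining text and the two-letter code.
  sed -E \
    -e 's/[ ]*-?[ ]*\(([A-Z]{2})\)/(\1)/' \
    -e 's/<tag>(.*)\(([A-Z]{2})\)/<tag language="\2" attribute="\1">/'
}

printf '%s\n' \
  '<tag>othertext - (FR) text' \
  '<tag> - (PT) text' \
  '<tag>(ES) text' | tag_language
# Prints:
# <tag language="FR" attribute="othertext"> text
# <tag language="PT" attribute=""> text
# <tag language="ES" attribute=""> text
```

Doing the cleanup as a separate substitution keeps each expression short; the same effect could probably be had in one expression by making the capture group unable to end in a space or dash, at the cost of readability.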