LINUX.IE, website of the Irish Linux Users' Group

[ILUG] Re: Re: sed question

Marcus Furlong furlongm at hotmail.com
Fri Aug 15 23:22:32 IST 2008


On Friday 15 August 2008 22:18 in <g84rpc$dbf$1 at ger.gmane.org>, Marcus
Furlong wrote:

> On Friday 15 August 2008 05:27 in
> <ddb467af0808142127x7bb9aff9re9a362ad5e264a80 at mail.gmail.com>, Emen Zhao
> wrote:
> 
>> Hello Marcus,
>> 
>> Try if this helps. It assumes all elements are missing ending tag, and
>> doesn't support embedded tags. If that's the case, a more sophisticated
>> script might be needed.
>> 
>> perl -0777 -wpl -e 's{(<(\w+).*?>.*?)(?=\s*(<\w|\z))}{$1. " </$2>"}esg'
>> 
>> Hope this helps.
> 
> It does, it _almost_ does what I need. It doesn't seem to handle the case
> where the tag content starts on a new line though:
> 
> <third>
> hello
> <third>hello
> 
> becomes
> 
> <third> </third>
> hello
> <third>hello </third>
> 
> I tried undef $/ (as per a different post) but that doesn't seem to help
> either. Any ideas how to fix this?

OK, this only happens if it's on the first line, so I added an extra line
and it works perfectly now, thanks!

One final question for the list on the same topic.

Some of the tags contain an attribute, say "my_attribute", which, according
to the DTD, should only contain certain values. If the value is not valid,
I want to remove the attribute entirely. E.g. if ASD, SDF, DFG, FGH, GHJ and
HJK are the valid values for this attribute, then the following:

<third my_attribute="ASD">
<third my_attribute="AD AD">
<third my_attribute="">
<third my_attribute="HJK">

would become:

<third my_attribute="ASD">
<third>
<third>
<third my_attribute="HJK">

I threw together the following snippet, which works, but it strikes me as a
horrible hack, and I'm sure there's a perl/sed one-liner that could do it.
Does anyone know how it could be done somewhat more elegantly?

# some values contain spaces so convert them to underscores
# and back again before removal
# ([^"]* stops at the closing quote, so other attributes on the same
# line are not swallowed into the match)
for k in $(grep -o 'my_attribute="[^"]*"' "${xml_filename}" |
sed -e 's/my_attribute=//' -e 's/ /_/g') ; do
  valid=false
  # the following are the valid values that this attribute can have
  for l in ASD SDF DFG FGH GHJ HJK ; do
    if [ "${k}" = "\"${l}\"" ] ; then
      valid=true
    fi
  done
  if [ "${valid}" = "false" ] ; then
    # restore the spaces, then strip the attribute from the file
    k=$(echo "${k}" | sed -e 's/_/ /g')
    sed -i -e "s/ my_attribute=${k}//" "${xml_filename}"
  fi
done

Marcus.



