LINUX.IE, website of the Irish Linux Users' Group
[ILUG] Sed question

Brian Foster blf at utvinternet.ie
Fri Feb 22 17:01:35 GMT 2002


  | Date: Fri, 22 Feb 2002 14:10:05 +0000
  | From: Padraig Brady <padraig at antefacto.com>
  | 
  | Stephen_Reilly at dell.com wrote:
  | > <"
  | > sed -e 's/<[^>]*img/¬<img/g' foo.jsp | #put each <img>...
  | > tr "¬" "\n" |                          #on a new line.
  | > sed -n 's/<.*img.*src="\([^"]*\)".*/\1/gp' |
  | > sort -u
  | > ">
  | > 
  | > hmmm, guess I better stop calling files "¬" ...
  | 
  | true. I wouldn't have to do it if sed recognised c escapes
  | like I mentioned previously.

if you are using a Bourne-ish shell (e.g., sh, ksh, bash, ...)
then to insert a newline (before each `foo' in the following
example), you can do (sans the indentation):

   sed -e 's/foo/\
   foo/g'

other shells with obnoxious quoting rules are exercises best
left to the reader ....
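A minimal sketch of the backslash-newline trick above, wrapped in a shell function so it can be tried without a file. The function name insert_newline_before_foo and the sample input are made up for illustration. Note that the continuation line has to start in column 0, otherwise the indentation ends up in the output.

```shell
#!/bin/sh
# insert_newline_before_foo is a hypothetical helper name, used only
# for this illustration.  It reads stdin and writes stdout.
insert_newline_before_foo() {
    # The backslash at the end of the first line escapes the newline,
    # so the replacement text is "<newline>foo".  The second line is
    # deliberately NOT indented: any leading blanks would become part
    # of the replacement.
    sed -e 's/foo/\
foo/g'
}

# Sample input (made up): one line containing "foo" twice.
printf '%s\n' 'xfooyfoo' | insert_newline_before_foo
```

Each foo now starts a line of its own, which is what the tr step in the quoted pipeline was achieving with a placeholder character.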

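b.t.w., sed(1) does recognize \n for newline in REs (that is,
on the match side; \n in the replacement text is an extension
that only some seds, e.g. GNU sed, accept); without which, the
hold space can be awkward to use in some cases.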
I haven't tried, but I suspect the above IMG problem _might_
be solvable in one sed command, even with multiple IMGs on
one line.  harder, however, might be the SRC on a separate
line from its IMG (which I _think_ is legal HTML).
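Two rough sketches of the points above, both untested against real JSP files. The first shows \n matching an embedded newline in an RE after N has joined two lines. The second is one possible way to pull every SRC out of a line with a single sed command, using the hold space to restore the line after each extraction. The function names join_pairs and img_srcs are hypothetical, and the second sketch assumes lowercase img and src, double-quoted values, and the whole tag on one line, so it does not address the SRC-on-a-separate-line case.

```shell
#!/bin/sh
# join_pairs (hypothetical name): N appends the next input line to the
# pattern space, separated by a newline; \n in the RE then matches
# that embedded newline.
join_pairs() {
    sed -e 'N' -e 's/\n/ /'
}

# img_srcs (hypothetical name): print the src value of every <img> on
# each line, using one sed command.  Assumes lowercase tag and
# attribute names, double-quoted values, and no tag split over lines.
img_srcs() {
    sed -n -e ':loop
        /<img[^>]*src="[^"]*"/{
            h
            s/.*<img[^>]*src="\([^"]*\)".*/\1/
            p
            g
            s/\(.*\)<img[^>]*src="[^"]*"/\1/
            b loop
        }' | sort -u
}

# Sample input (made up).
printf '%s\n' a b c d | join_pairs
printf '%s\n' '<p><img src="a.png"> and <img alt="x" src="b.png"></p>' \
              '<img src="a.png">' | img_srcs
```

In img_srcs, h saves the line, the first s reduces it to the last remaining src value, p prints that, g restores the saved line, and the second s deletes the tag just handled before looping. Values come out last-first within a line, which the sort -u hides.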

for your amusement, here's a little bash(1)/sed script that
I threw together a few days ago to solve a stupid little
format conversion problem.  much to my embarrassment, it
took me three tries to get it right ....  ;-(

=====(cut here and below)=====:fixup=====(cut here and below)=====
#!/bin/bash
#
# Copyright © 2002 Brian L Foster.  All rights reserved.
# $Id: :fixup,v 1.1 2002/02/19 17:32:52 blf Exp $
#
case $# in
2)	tex=$1
	raw=$2
	;;
1)	tex=$1
	raw=/dev/stdin
	;;
*)	echo "Usage: $0 source.tex [ dvi2tty.raw ]" >&2
	exit 2
	;;
esac

	# The inner sed(1) script transforms LaTeX  \textsc{word}
	# into the sed command                      s/\<word\>/WORD/g
	# which the outer sed executes.  The inner sed script hence
	# reads the original LaTeX input source, whilst the outer
	# script reads the dvi2tty(1) conversion of that source,
	# writing to stdout a modified version of the conversion.
	#
sed -e 's/IRL£/IEP/g'	\
    -e 's/unix/Unix/g'	\
    -e "$(
	cat -- "$tex" | tr ' \t' '\n\n' | \
		sed -n -e '/\\textsc{\([A-Za-z0-9]\{1,\}\)}/{
			s/^.*\\textsc{\([A-Za-z0-9]\{1,\}\)}.*$/\1/
			h
			y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
			x
			s,^.*$,s/\\<&\\>/,
			G
			s,\n\(.*\)$,\1/g,
			p
		}' | sort | uniq
 	)" -- "$raw"
=====(cut here and above)=====:fixup=====(cut here and above)=====

b.t.w., there's at least one spurious backslash in the above
(e.g., the one ending the cat line; a trailing | already
continues the command onto the next line).
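To see what the inner sed in the script above actually generates, here is a small sketch that runs just that stage on one made-up line of LaTeX. The function name gen_smallcaps_cmds and the sample sentence are assumptions for illustration; the sed commands are the ones from the script, with sort | uniq shortened to sort -u.

```shell
#!/bin/sh
# gen_smallcaps_cmds (hypothetical name): read LaTeX on stdin and
# write one sed substitute command per distinct \textsc{word}.
gen_smallcaps_cmds() {
    # One word per line, so each \textsc{...} is handled on its own.
    tr ' \t' '\n\n' |
        sed -n -e '/\\textsc{\([A-Za-z0-9]\{1,\}\)}/{
            s/^.*\\textsc{\([A-Za-z0-9]\{1,\}\)}.*$/\1/
            h
            y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
            x
            s,^.*$,s/\\<&\\>/,
            G
            s,\n\(.*\)$,\1/g,
            p
        }' | sort -u
}

# Sample input (made up).  Prints:
#   s/\<Ansi\>/ANSI/g
#   s/\<posix\>/POSIX/g
printf '%s\n' 'The \textsc{posix} and \textsc{Ansi} standards, and \textsc{posix} again.' |
    gen_smallcaps_cmds
```

Step by step: the first s keeps only the word, h copies it to the hold space, y upper-cases the pattern space, and x swaps the two so the original word is in the pattern space and the upper-case copy is held. The next s wraps the word as s/\<word\>/, G appends a newline plus the held upper-case copy, and the last s replaces that newline with the copy followed by /g. The \< and \> word anchors in the generated commands are extensions (GNU sed has them), so the outer sed needs to be one that supports them.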

cheers!
	-blf-
--
 Innovative, very experienced, Unix and      | Brian Foster    Dublin, Ireland
 Chorus (embedded RTOS) kernel internals     | e-mail: blf at utvinternet.ie
 expert looking for a new position ...       | mobile: (+353 or 0)86 854 9268
  For a resume, contact me, or see my website  http://www.blf.utvinternet.ie



