[ILUG] Sed question
Brian Foster
blf at utvinternet.ie
Fri Feb 22 17:01:35 GMT 2002
| Date: Fri, 22 Feb 2002 14:10:05 +0000
| From: Padraig Brady <padraig at antefacto.com>
|
| Stephen_Reilly at dell.com wrote:
| > <"
| > sed -e 's/<[^>]*img/¬<img/g' foo.jsp | #put each <img>...
| > tr "¬" "\n" | #on a new line.
| > sed -n 's/<.*img.*src="\([^"]*\)".*/\1/gp' |
| > sort -u
| > ">
| >
| > hmmm, guess I better stop calling files "¬" ...
|
| true. I wouldn't have to do it if sed recognised c escapes
| like I mentioned previously.
if you are using a Bourne-ish shell (e.g., sh, ksh, bash, ...)
then to insert a newline (before each `foo' in the following
example), you can do (sans the indentation):
sed -e 's/foo/\
foo/g'
other shells with obnoxious quoting rules are exercises best
left to the reader ....
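a minimal sketch of that backslash-newline trick, wrapped in a shell function so it can be tried on a one-line input. the function name split_before_foo and the sample input are made up for illustration:

```shell
#!/bin/sh
# Insert a newline before every "foo". The replacement text contains a
# backslash immediately followed by a literal newline, which is how sed
# spells "newline" on the replacement side of s///.
split_before_foo() {
    sed -e 's/foo/\
foo/g'
}

printf 'xfooyfooz\n' | split_before_foo
# prints three lines: "x", "fooy", "fooz"
```

note that the second line of the sed script must start in column one, otherwise the leading blanks end up in the output.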
b.t.w., sed(1) does recognize \n for newline in REs (i.e., on the
pattern side of s///, where it matches a newline embedded in the
pattern space); without which, the hold space can be awkward to
use in some cases.
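to illustrate why \n in the RE matters once the hold space is involved, here is a small sketch. G appends a newline plus the hold space to the pattern space, and the only way to pick the two halves apart again is to match that embedded newline. the function name tag_with_section and the "[section]" input format are invented for this example:

```shell
#!/bin/sh
# Remember the most recent "[section]" line in the hold space and prefix
# every following line with it.
#   h  copies the section line into the hold space
#   G  appends "\n" plus the hold space to the current line
#   the final s command matches the embedded newline with \n and swaps
#   the two halves
tag_with_section() {
    sed -n -e '/^\[.*\]$/{
h
d
}
G
s/^\(.*\)\n\(.*\)$/\2 \1/p'
}

printf '[a]\nx\ny\n[b]\nz\n' | tag_with_section
# prints "[a] x", "[a] y", "[b] z" on separate lines
```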
I haven't tried, but I suspect the above IMG problem _might_
be solvable in one sed command, even with multiple IMGs on
one line. harder, however, might be the SRC on a separate
line from its IMG (which I _think_ is legal HTML).
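one possible shape for such a single sed command, offered as an untested-in-anger sketch rather than a definitive answer. it assumes lowercase `<img`, double-quoted src values, and a sed whose D command restarts the cycle without reading new input (as POSIX specifies and GNU sed does). it does not handle a SRC on a different line from its IMG. the function name extract_img_src is hypothetical:

```shell
#!/bin/sh
# Print the src of every <img> on a line, one per output line.
# For a line containing an img tag with a src attribute:
#   1. replace the leftmost '<img ... src="URL"' by "\nURL\n"
#   2. drop everything before the URL
#   3. P prints the URL (the text up to the first newline)
#   4. D deletes it and reruns the script on the rest of the line
extract_img_src() {
    sed -n -e '/<img[^>]*src="[^"]*"/{
s/<img[^>]*src="\([^"]*\)"/\
\1\
/
s/^.*\n\(.*\n\)/\1/
P
D
}'
}

printf '%s\n' '<p><img src="a.png" alt="x"> text <img class="c" src="b.gif"></p>' |
    extract_img_src
# prints "a.png" then "b.gif"
```

piping the result through sort -u, as in the quoted pipeline, would remove duplicates.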
for your amusement, here's a little bash(1)/sed script that
I threw together a few days ago to solve a stupid little
format conversion problem. much to my embarrassment, it
took me three tries to get it right .... ;-(
=====(cut here and below)=====:fixup=====(cut here and below)=====
#!/bin/bash
#
# Copyright © 2002 Brian L Foster. All rights reserved.
# $Id: :fixup,v 1.1 2002/02/19 17:32:52 blf Exp $
#
case $# in
2)	tex=$1
	raw=$2
	;;
1)	tex=$1
	raw=/dev/stdin
	;;
*)	echo "Usage: $0 source.tex [ dvi2tty.raw ]" >&2
	exit 2
	;;
esac
# The inner sed(1) script transforms LaTeX \textsc{word}
# into the sed command s/\<word\>/WORD/g
# which the outer sed executes. The inner sed script hence
# reads the original LaTeX input source, whilst the outer
# script reads the dvi2tty(1) conversion of that source,
# writing to stdout a modified version of the conversion.
#
sed -e 's/IRL£/IEP/g' \
-e 's/unix/Unix/g' \
-e "$(
cat -- "$tex" | tr ' \t' '\n\n' | \
sed -n -e '/\\textsc{\([A-Za-z0-9]\{1,\}\)}/{
s/^.*\\textsc{\([A-Za-z0-9]\{1,\}\)}.*$/\1/
h
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
x
s,^.*$,s/\\<&\\>/,
G
s,\n\(.*\)$,\1/g,
p
}' | sort | uniq
)" -- "$raw"
=====(cut here and above)=====:fixup=====(cut here and above)=====
b.t.w., there's at least one spurious backslash in the above.
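to see what the inner sed actually hands to the outer one, the inner pipeline can be run on its own. the sketch below lifts it into a function (the name gen_rules and the sample LaTeX line are made up for illustration) and prints the generated s commands. the \< and \> word anchors in the generated commands are a GNU sed extension, so the outer sed is assumed to be GNU sed or compatible:

```shell
#!/bin/sh
# Read LaTeX on stdin and emit one sed s command per \textsc{word}.
# Each command replaces the word (as written in the source) by its
# upper-case form, mirroring the inner script in the posting above.
gen_rules() {
    tr ' \t' '\n\n' |
    sed -n -e '/\\textsc{\([A-Za-z0-9]\{1,\}\)}/{
s/^.*\\textsc{\([A-Za-z0-9]\{1,\}\)}.*$/\1/
h
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
x
s,^.*$,s/\\<&\\>/,
G
s,\n\(.*\)$,\1/g,
p
}' | sort -u
}

printf 'The \\textsc{posix} and \\textsc{Ieee} standards\n' | gen_rules
# prints:
#   s/\<Ieee\>/IEEE/g
#   s/\<posix\>/POSIX/g
```

step by step for `posix`: h saves it, y upper-cases the pattern space, x swaps so the pattern space holds `posix` and the hold space `POSIX`, the first s builds `s/\<posix\>/`, G appends a newline and `POSIX`, and the last s removes that newline and adds the closing `/g`.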
cheers!
-blf-
--
Innovative, very experienced, Unix and | Brian Foster Dublin, Ireland
Chorus (embedded RTOS) kernel internals | e-mail: blf at utvinternet.ie
expert looking for a new position ... | mobile: (+353 or 0)86 854 9268
For a resume, contact me, or see my website http://www.blf.utvinternet.ie