Re: [ILUG] Filtering a file.

From: David Neary (dneary at domain wanadoo.fr)
Date: Fri 12 Jul 2002 - 09:32:39 IST


Brian Foster wrote:
> since the original poster has since changed the spec, to one
> which invites a different simpler solution, this discussion is
> now just academic (but, IMHO, still interesting). of course,
> if my reading of that original spec was correct rather than
> your very plausible alternative, then Stephen Reilly had the
> best solution.

Granted :) All my early ideas were just plain wrong.

> | Brian Foster wrote:
> | > <In reference to sed '/pattern/{d;d}'>
> | > yer missing a `;' here,
> | > after the 2nd `d' before
> | > the closing `}' [ ... ]
> more to the point, I'm fairly certain I've
> never met a `sed' where the originally posted version (i.e.,
> sans the 2nd `;') would work.

You're correct. I was wrong again. What I meant was that if I
made the modification that I thought was necessary (that is,
putting the } on a line of its own) then the second d wouldn't
need a semi-colon. I was assuming that the } had to be on a
line of its own; with a bit of testing, I now know that (at
least) GNU sed accepts a semi-colon before the closing brace
in place of the newline.
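
A minimal sketch of the two spellings, for comparison. The sample
input and the word "skip" are made up for illustration, and N;d is
simply one way to drop a matching line plus the line after it; it is
not necessarily the command posted earlier in the thread.

```shell
# Hypothetical sample input; 'skip' stands in for the real pattern.
input='keep 1
skip
gone
keep 2'

# Form 1: one command per line, closing brace on a line of its own
# (the layout the man pages describe).
out_newline=$(printf '%s\n' "$input" | sed '/skip/{
N
d
}')

# Form 2: one-liner with a semi-colon before the closing brace
# (accepted by GNU sed; apparently by other seds too, undocumented).
out_semicolon=$(printf '%s\n' "$input" | sed '/skip/{N;d;}')

printf '%s\n' "$out_newline"
printf '%s\n' "$out_semicolon"
```

Both forms should print only the two "keep" lines on this input.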

> | the semicolon isn't necessary as a separator.
>
> in what `sed' is it not necessary?? I'm puzzled as to which
> `sed' does this .... ?

All seds - a newline is an acceptable separator. Although I may
have been specifically referring to my botched first effort above,
in which case I was just plain wrong :)

> | >[ ... ] this somewhat obscure command does the trick:
> | >
> | > sed -n '\¬^/XXX/ CE¬{n;d;};p' filename
>
> good point. however, I call this an ambiguity in the original
> spec. whilst I no longer have the original posting readily
> available, I don't recall the description discussing, nor the
> sample data illustrating, this maybe-pathological case.

Nope - it didn't. Its only attempt at a spec was "I want to
remove lines matching /pattern/, and the line following them."
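
To make the ambiguity concrete, here is a rough sketch. I am assuming,
without the original posting to hand, that the awkward case is two
matching lines in a row; the sample data and the word "skip" are
invented. The first command follows the n;d approach quoted above, the
second keeps deleting for as long as the following line also matches.

```shell
# Hypothetical input with two consecutive matching lines.
input='keep 1
skip A
skip B
gone?
keep 2'

# Reading 1: a match swallows exactly one following line, even if
# that line itself matches. The line after it is then kept.
out_pairs=$(printf '%s\n' "$input" | sed -n '/skip/{n;d;};p')

# Reading 2: every matching line, and every line that follows a
# matching line, is removed. The loop keeps fetching lines with n
# until it has one that does not match, then deletes that one too.
out_chain=$(printf '%s\n' "$input" | sed -n '/skip/{
:a
n
/skip/ba
d
}
p')

printf '%s\n---\n%s\n' "$out_pairs" "$out_chain"
```

On this input the first reading keeps the "gone?" line and the second
removes it, so the one-sentence spec does not settle which is wanted.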

> (also, AFAIK, the `\¬...¬' RE address notation is also rare,
> but is, I _think_, in all(?) `sed's --- what does POSIX say??)
> "rare" means "not commonly used (or even known)".

Well, the ORA book insists that sed needs the // for addresses -
so this might be a GNU sed extension. It's at least documented by
GNU sed (unlike, say, the semicolon) :) It appears to work in BSD
sed but is undocumented there.
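
A small sketch of the alternate-delimiter address, using a comma as
the delimiter and a made-up list of paths. My understanding is that
POSIX describes the same construct as \cREc for any character c other
than backslash or newline, but treat that as something to check
against the standard rather than as settled.

```shell
# Hypothetical input: a few absolute paths.
paths='/usr/bin/sed
/etc/passwd
/usr/lib/libc.so'

# Conventional form: every slash inside the RE has to be escaped.
out_slash=$(printf '%s\n' "$paths" | sed -n '/^\/usr\//p')

# Alternate form: \, opens the address and the next , closes it,
# so the slashes in the RE need no escaping.
out_comma=$(printf '%s\n' "$paths" | sed -n '\,^/usr/,p')

printf '%s\n' "$out_comma"
```

Both commands should select the same two /usr lines; the second is
just easier to read when the pattern is full of slashes.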

> | according to my man page, the closing } has to be on
> | its own line.
>
> I do, however, believe your (presumably non-GNU) sed(1) man
> page says/implies the '}' has to be on its own line --- as I
> recall, both the 7th Edition and some commercial *ix man pages
> do seem to say just that.

Well, this is the BSD man page for sed...

     Two of the commands take a command-list, which is a list of
     sed commands separated by NEWLINE characters, as follows:

     { command
     command
     }

     The { can be preceded with blank characters and can be
     followed with white space. The commands can be preceded by
     white space. The terminating } must be preceded by a NEWLINE
     character and can be preceded or followed by <blank>s.

> I am curious if you have/know a `sed' where adding that `;'
> doesn't work, or, for that matter, if you have/know a `sed'
> where your original neither-`;'-nor-newline posting works ...?

Nope - I don't know of a sed where adding the ; doesn't work. But
this behaviour is undocumented in all seds that I currently have
access to. And the original 'solution' was wrong. As was the
follow-up solution, but the last one was right :)

> nor can I. ;-\ strictly FYI, that script happens to be very
> close to a script I first wrote c.20 years ago(!) and still
> have lying around somewhere, called `squeeze', which removes
> all trailing whitespace and then --- this is where it is quite
> similar --- collapses all consecutive empty lines into one.

There's an easier way to do that...

sed '
s/[ \t]*$//
/^$/{
N
s/[ \t]*$//
/^\n$/D
}' file

N appends the next line to the pattern buffer (after a newline),
rather than replacing the pattern buffer with it as n does. D
deletes the pattern buffer up to and including the first newline
and restarts the script without reading more input. So if the next
line added is empty, then the first empty line gets deleted, and
the second gets fed back into the loop, and if the next line is not
empty, then the entire pattern buffer gets printed, including the
one empty line. The substitution at the start strips every line
that begins a cycle in the normal way; lines pulled in by N never
reach it, which is why the substitution is repeated inside the
braces.
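
A quick way to watch this happen, sketched with invented input fed in
by printf instead of a file. GNU sed reads \t inside the brackets as a
tab; I am not certain every sed does, so the sample only uses spaces
as trailing whitespace. It also ends on a non-empty line, because what
N does at end of input differs between seds.

```shell
# Input lines: "a" plus two trailing spaces, two empty lines,
# a line holding only two spaces, then "b".
squeezed=$(printf 'a  \n\n\n  \nb\n' | sed '
s/[ \t]*$//
/^$/{
N
s/[ \t]*$//
/^\n$/D
}')

printf '%s\n' "$squeezed"
```

The run of three blank-looking lines should collapse to a single empty
line between "a" and "b", and the trailing spaces after "a" should be
gone.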

> thanks for an interesting discussion into some of the trivia
> of `sed'.

No problem :) I made some biggish mistakes, though...

Cheers,
Dave.

-- 
       David Neary,
    Marseille, France
  E-Mail: bolsh at domain gimp.org

