Re: [ILUG] Filtering a file.

From: Brian Foster (blf at domain utvinternet.ie)
Date: Thu 11 Jul 2002 - 15:11:26 IST


  | Date: Thu, 11 Jul 2002 09:28:11 +0200
  | From: David Neary <dneary at domain wanadoo.fr>
  |
  | David Neary [ previously ] wrote:
  | > Aherne Peter-pahern02 wrote:
  | > > What I want to do is get any line starting with /XXX/ CE and remove
  | > > that line and the following one. [ ... ]
  | >
  | > OK - this may not work, but the idea is sound enough.
  | >
  | > sed '/^\/XXX\/ CE/{d;d}' filename
                            ↑
 close, but not quite. │
 yer missing a `;' here, ───┘
 after the 2nd `d' before
 the closing `}', i.e.:

      sed '/^\/XXX\/ CE/{d;d;}' filename
 or:
      sed '\¬^/XXX/ CE¬{d;d;}' filename

 _however_, these commands are not actually correct!

 they are not correct because `d' starts the next cycle,
 i.e., the next line is read and the program starts from the
 beginning. hence, the 2nd `d' is never executed, and thus
 the following line is printed. but it should have been
 "removed" (not printed) .... ;-(
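
 a quick way to see this for yourself; the sample input below
 is made up for illustration, and the RE is written with the
 plain `/' delimiter:

```shell
# hypothetical sample input: one matching line plus the line after it
input='keep A
/XXX/ CE first
follows first
keep B'

# `d' deletes the pattern space and immediately starts the next cycle,
# so the 2nd `d' is never reached and the following line survives
out=$(printf '%s\n' "$input" | sed '/^\/XXX\/ CE/{d;d;}')
printf '%s\n' "$out"
```

 only the matching line goes away; `follows first' is still printed.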

 unfortunately, the obvious ed(1)-inspired fix doesn't work,
 at least with GNU `sed', which seems to lack the concept of
 address arithmetic:

      sed '\¬^/XXX/ CE¬,//+1d' filename # DOES NOT WORK
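
 for what it's worth, GNU `sed' 4.x does document a restricted
 form of address arithmetic, `addr1,+N' (addr1 plus the N lines
 after it). it is a GNU extension, not POSIX, so the sketch below
 assumes a reasonably recent GNU `sed'; the sample input is made up:

```shell
# hypothetical sample input
input='keep A
/XXX/ CE first
follows first
keep B'

# GNU-only: the range is the matching line and the 1 line after it;
# other seds will most likely reject the `+1'
out=$(printf '%s\n' "$input" | sed '/^\/XXX\/ CE/,+1d')
printf '%s\n' "$out"
```

 since that form is GNU-only, it is no help where portability matters.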

 instead, this somewhat obscure command does the trick:

      sed -n '\¬^/XXX/ CE¬{n;d;};p' filename

 what this does:

  -n … … … … … … … … never print anything automagically.

  \¬^/XXX/ CE¬{ … … starting with lines matching the RE
                     `^/XXX/ CE' do the commands enclosed
                     in braces `{ ... }'.

  n; … … … … … … … … forget the current (matching) line and
                     read the next (following) line. (if `-n'
                     was not specified, this would first print
                     the current line.) hence, `sed' has read
                     both the matching line and the following
                     line, so now all we need to do is...

  d; … … … … … … … … forget (delete) the following line. this
                     ends the program (for matching lines), so
                     `sed' reads the next line and starts again.

  }; … … … … … … … … end of brace `{ ... }'-enclosed commands.

  p … … … … … … … … if we get this far, which could _only_
                     happen if neither the current nor the
                     previous line matched, print the line.

 the `sed' program now ends, so `sed' forgets what it just
 read, reads the next input line, and starts over again.
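
 the walk-through above can be tried on a small made-up input.
 this is only a sketch: the sample lines are hypothetical, and
 the RE uses the plain `/' delimiter in case `¬' is awkward in
 your locale:

```shell
# hypothetical sample input: two matching lines, each followed by one line
input='keep A
/XXX/ CE one
follows one
keep B
/XXX/ CE two
follows two
keep C'

# -n : no automatic printing
# n  : on a matching line, read the following line (nothing is printed)
# d  : delete that following line and start the next cycle
# p  : only reached for lines that were neither matched nor swallowed by `n'
out=$(printf '%s\n' "$input" | sed -n '/^\/XXX\/ CE/{n;d;};p')
printf '%s\n' "$out"
```

 only the three `keep' lines should come out.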

  |[ ... ]
  | Sed commands should be on separate lines,

 IMHO, it's a matter of taste/style. e.g., I'd normally write
 the above in a mix of styles, as:

      sed -n -e '\¬^/XXX/ CE¬{n;d;}' -e p -- filename

 though whether that is any clearer is arguable.

  | and the trailing }
  | needs to be on a line of its own. Who knew!

 not quite. `}' is a command and hence needs to be separated
 from the other commands, either by `;' or a newline.

 the obscure topic of when `;'s are used in `sed' commands is,
 AFAIK, incompletely discussed in (most?) sed(1) manual pages;
 and the GNU sed(1) man page does not mention `;'s at all!

 the rule is simple: `;' can be used anyplace(?) a command-
 separating newline can be used. (known exceptions are commands
 whose argument runs to the end of the line, such as `a', `i',
 `c', `r' and `w': there a `;' is taken as part of the text or
 filename.) IMHO, part of the confusion arises because `}' is
 itself a _command_, unlike C/C++/awk/&tc (but similar to
 Bourne-ish shells), and hence must itself be separated from
 the other commands. (the other part of the confusion is that
 `{' is not a command per se, and hence does not need to be
 separated.)
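
 to illustrate, here is the same program spelled three ways:
 with `;' separators, with newlines, and as `-e' fragments. a
 sketch only; the sample input is made up, and it reads stdin
 rather than a file:

```shell
# hypothetical sample input
input='keep A
/XXX/ CE first
follows first
keep B'

# 1. everything on one line, `;' after each command (including before `}')
one_line=$(printf '%s\n' "$input" | sed -n '/^\/XXX\/ CE/{n;d;};p')

# 2. newlines as separators; `}' sits on its own line
multi_line=$(printf '%s\n' "$input" | sed -n '/^\/XXX\/ CE/{
n
d
}
p')

# 3. one `-e' per top-level command; each fragment ends as a newline would
fragments=$(printf '%s\n' "$input" | sed -n -e '/^\/XXX\/ CE/{n;d;}' -e p)

printf '%s\n---\n%s\n---\n%s\n' "$one_line" "$multi_line" "$fragments"
```

 all three should print the same two `keep' lines.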

  | This would also work, I think...
  |
  | sed -n '/^\/XXX\/ CE/{
  | n
  | n
  | }
  | /^\/XXX\/ CE/! p' filename

 close but not quite. it will fail on the input:

    /XXX/ CE one, 1st line
                  2nd line
    /XXX/ CE two, 3rd line
                  4th line

 printing the `... 4th line'. the reason this fails is left as
 an exercise to the reader.
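
 for anyone who wants to check their answer, the failure is easy
 to reproduce. a sketch with a simplified, made-up version of
 the input above (spoiler: the comments trace the cycles):

```shell
# simplified sample input: two matching lines back to back, each with a follower
input='/XXX/ CE one
2nd line
/XXX/ CE two
4th line'

# the quoted n;n version
# cycle 1: line 1 matches; 1st `n' reads line 2, 2nd `n' reads line 3;
#          line 3 matches the RE, so `!p' does not print it
# cycle 2: line 4 starts a fresh cycle, matches nothing, and is printed
bad=$(printf '%s\n' "$input" | sed -n '/^\/XXX\/ CE/{
n
n
}
/^\/XXX\/ CE/! p')

# the n;d version handles each matching line in its own cycle
good=$(printf '%s\n' "$input" | sed -n '/^\/XXX\/ CE/{n;d;};p')

printf 'n;n prints: [%s]\n' "$bad"
printf 'n;d prints: [%s]\n' "$good"
```

 the n;n version lets `4th line' through; the n;d version prints nothing.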

  | Sorry for the earlier misinformation.

 thanks for the corrections. I hope my comments above are also
 useful and not too misleading.

cheers!
        -blf-

  | David Neary,
  | Marseille, France
  | E-Mail: bolsh at domain gimp.org

--
 Innovative, very experienced, Unix and      | Brian Foster    Dublin, Ireland
 Chorus (embedded RTOS) kernel internals     | e-mail: blf at domain utvinternet.ie
 expert looking for a new position ...       | mobile: (+353 or 0)86 854 9268
  For a résumé, contact me, or see my website  http://www.blf.utvinternet.ie
    Stop E$$o (ExxonMobile):  «Whatever you do, don't buy Esso --- they
     don't give a damn about global warming.»    http://www.stopesso.com
     Supported by Greenpeace, Friends of the Earth, and numerous others...

