LINUX.IE, website of the Irish Linux Users' Group

[ILUG] Re: perl file processing


Marcus Furlong furlongm at hotmail.com
Wed Oct 8 20:55:21 IST 2008


Brian Foster <blf <at> utvinternet.ie> writes:

> 
>  below's a quickly-put-together all-awk(1) solution,
>  albeit if this was my problem I'd be more inclined
>  to do some filtering first, probably with sed(1)
>  like Francis did.
> cheers!
> 	-blf-
> 
> #!/bin/gawk -f
> BEGIN {
> 	state  = 0
> 	ncols  = 0
> 	nlines = 0
> 
> 	STDERR = "/dev/stderr"
> }
> 
> state == 0 && $0 == "=== Stratified cross-validation ===" {
> 	state = 1
> 	next
> }
> 
> state == 1 && $0 == "=== Detailed Accuracy By Class ===" {
> 	state = 2
> 	next
> }
> 
> state == 2 && 2 <= NF && $NF ~ /^[A-Z]$/ {
> 	for (n = 1; n < NF; n++) {
> 		if ($n !~ /^[0-9.]*$/)
> 			next
> 	}
> 	state = 3
> 	ncols = NF
> }
> 
> state == 3 && NF != ncols { exit }	# goto END
> 
> state == 3 && $NF !~ /^[A-Z]$/ { exit }	# goto END
> 
> state == 3 {
> 	for (n = 1; n < ncols; n++)
> 		col[n] += (0 + $n)
> 	nlines++
> 	next
> }
> 
> END {
> #debug	print "EXIT(" FNR "): nlines=" nlines, "ncols=" ncols
> 	if (nlines <= 0) {
> 		print FILENAME ": Data not found, state =", state  >STDERR
> 		exit 1
> 	}
> 	for (n = 1; n < ncols; n++)
> 		print col[n]/nlines
> }
> 

Thanks for this one too.

I had to change [A-Z] to [a-zA-Z]* as the class is not always a single uppercase
character. I ran both yours and Francis' from a shell script over the 250 files,
and out of curiosity, I checked the times for each.
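
For anyone following along, the pattern change described above can be checked with a small sketch like the one below. The three sample labels and the function name check_labels are made up for illustration and are not taken from the real data files.

```shell
#!/bin/sh
# Sketch only: the labels below are hypothetical, not from the real files.
# Compares the original class test with the relaxed one on the last field.
check_labels() {
	printf '%s\n' A yes Iris7 | awk '
	{
		strict  = ($NF ~ /^[A-Z]$/)     ? "match" : "no match"
		relaxed = ($NF ~ /^[a-zA-Z]*$/) ? "match" : "no match"
		print $NF ": strict=" strict ", relaxed=" relaxed
	}'
}

check_labels
```

Only the single uppercase letter passes the original test, while the relaxed test also accepts "yes" and still rejects "Iris7". Note that `*` would accept an empty string as well; that cannot happen for `$NF` when NF is at least 2, but `[a-zA-Z]+` would state the intent more strictly.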

The sed/awk one comes out with the following:
real    0m32.733s
user    0m10.915s
sys     0m11.838s

whereas the pure awk one is a lot faster:
real    0m18.239s
user    0m2.035s
sys     0m2.761s

I ran each a few times just in case, and the times are fairly consistent. Anyway,
thanks to you both!
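
A minimal sketch of the kind of wrapper used for the timings above, assuming one awk process per input file. The function name run_all and the file names in the usage note are hypothetical; the actual script used for the 250 files is not shown in this thread.

```shell
#!/bin/sh
# Sketch only: run_all and its arguments are hypothetical names.
# Runs one awk script over every regular file in a directory,
# starting a separate awk process for each file.
run_all() {
	script=$1
	dir=$2
	for f in "$dir"/*; do
		[ -f "$f" ] || continue	# skip subdirectories and unmatched globs
		awk -f "$script" "$f"
	done
}

# When invoked with two arguments, process the given directory.
if [ "$#" -eq 2 ]; then
	run_all "$1" "$2"
fi
```

Saved as, say, run_all.sh, the whole loop can be timed from bash with `time sh run_all.sh extract.awk results`, which reports real, user and sys in the same format as the figures quoted above.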

Marcus.



