bash - How do I count multiple overlapping strings and get the total occurrences per line (awk or anything else)
I have an input file like this:

    315secondbin x12121321211332123x
    315firstbin 3212212121x
    315thirdbin 132221312
    316firstbin 121
    316secondbin 1212

What I want is to count how many instances of a few different strings (say "121" and "212") exist in each line, counting overlaps. The expected output would be:

    6
    5
    0
    1
    2
So I modified an awk script from another thread to use the or operator, in the hope that it would count anything that meets either condition:

    {
        count = 0
        $0 = tolower($0)
        while (length() > 0) {
            m = match($0, /212/ || /121/)
            if (m == 0)
                break
            count++
            $0 = substr($0, m + 1)
        }
        print count
    }

Unfortunately, my output is this:

    8
    4
    0
    2
    3

But if I leave out the or, it counts perfectly. What am I doing wrong?
Also, I run the script on the file ymaz.txt by running:

    cat ymaz.txt | awk -v "pattern=" -f count3.awk
As an alternate approach I tried this:

    {
        count = 0
        $0 = tolower($0)
        while (length() > 0) {
            m = match($0, /212/)
            y = match($0, /121/)
            if ((m == 0) && (y == 0))
                break
            count++
            $0 = substr($0, (m + 1) + (y + 1))
        }
        print count
    }

But the output is this:

    1
    1
    0
    1
    1

What am I doing wrong? I know I should be understanding the code and not just cutting and pasting stuff together, but that's my skill level at this point.

BTW, when I don't have the or in there (i.e. I'm searching for just 1 string), it works perfectly.
You're making it too complicated:

    {
        count = 0
        while ( match($0, /121|212/) ) {
            count++
            $0 = substr($0, RSTART+1)
        }
        print count
    }

    $ awk -f tst.awk file
    6
    5
    0
    1
    2
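For anyone who wants to try this without creating files, here is a minimal self-contained sketch of the same loop. The function name count_overlaps and the idea of passing the pattern in through -v are assumptions for illustration, not part of the original script; the sample lines are the ones from the question.

```shell
#!/bin/sh
# count_overlaps PATTERN (hypothetical helper name): reads lines on stdin and
# prints, for each line, how many start positions match PATTERN.
count_overlaps() {
    awk -v pattern="$1" '{
        s = $0
        n = 0
        # pattern is a string, so awk treats it as a dynamic regexp here
        while (match(s, pattern)) {
            n++
            # restart one character after the start of this match,
            # so that overlapping matches are still found
            s = substr(s, RSTART + 1)
        }
        print n
    }'
}

printf '%s\n' \
    '315secondbin x12121321211332123x' \
    '315firstbin 3212212121x' \
    '315thirdbin 132221312' \
    '316firstbin 121' \
    '316secondbin 1212' | count_overlaps '121|212'
```

This should print 6, 5, 0, 1 and 2 on separate lines. Note that the loop counts at most one match per start position, which is fine for "121" and "212" because they can never start at the same character; patterns that can begin at the same position (say "12" and "121") would be counted once there, not twice.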
Your fundamental problem is confusing a condition with a regexp. A regexp can be compared to a string to form a condition, and when the string in question is $0 you can leave it out and just use the regexp as shorthand for $0 ~ regexp, but in that context what's being tested is still a condition. The 2nd arg to match() is a regexp, not a condition. | is the or operator in a regexp, while || is the or operator in a condition. /.../ are regexp delimiters.
/foo/ is a regexp
$0 ~ /foo/ is a condition
/foo/ in a conditional context is shorthand for $0 ~ /foo/, but in any other context it is just a regexp
/foo/ || /bar/ in a conditional context is shorthand for $0 ~ /foo/ || $0 ~ /bar/
As the 2nd arg to match(), awk assumes you intended to write:

    match($0,($0 ~ /foo/ || $0 ~ /bar/))

i.e. it will test the current record against foo or bar, and if that is true the condition evaluates to 1, and that 1 is then given to match() as its 2nd arg.
Look:

    $ echo foo | gawk 'match($0,/foo/||/bar/)'
    $ echo foo | gawk '{print /foo/||/bar/}'
    1
    $ echo 1foo | gawk 'match($0,/foo/||/bar/)'
    1foo
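To make the difference easier to see, a small sketch that prints what match() actually returns in each case may help. It uses plain awk rather than gawk, on the assumption that your awk parses /foo/ || /bar/ as a condition in that position the way gawk does.

```shell
#!/bin/sh
# Condition as 2nd arg: ($0 ~ /foo/ || $0 ~ /bar/) is true, i.e. 1,
# so match() searches "foo" for the string "1" and finds nothing.
echo foo | awk '{ print match($0, /foo/ || /bar/) }'

# Same condition, but now the record really contains a "1" at position 1.
echo 1foo | awk '{ print match($0, /foo/ || /bar/) }'

# Regexp alternation as 2nd arg: match() searches for foo or bar.
echo foo | awk '{ print match($0, /foo|bar/) }'
```

The three commands should print 0, 1 and 1 respectively: only the last one is actually looking for foo or bar.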
Get the book Effective Awk Programming, 4th Edition, by Arnold Robbins.