bash - How do I count multiple overlapping strings and get the total occurrences per line (awk or anything else)
I have an input file like this:

```
315secondbin x12121321211332123x
315firstbin 3212212121x
315thirdbin 132221312
316firstbin 121
316secondbin 1212
```

What I want is to count how many instances of a few different strings (say "121" and "212") exist in each line, counting overlaps. So the expected output would be:
```
6
5
0
1
2
```

So I modified the awk from this thread to use the OR operator, in hopes that it would count a match if it meets either condition:
```awk
{
    count = 0
    $0 = tolower($0)
    while (length() > 0) {
        m = match($0, /212/ || /121/)
        if (m == 0)
            break
        count++
        $0 = substr($0, m + 1)
    }
    print count
}
```

Unfortunately, the output is this:
```
8
4
0
2
3
```

But if I leave out the OR, it counts perfectly. What am I doing wrong?
Also, I run the script on the file ymaz.txt by running:

```
cat ymaz.txt | awk -v "pattern=" -f count3.awk
```

As an alternate approach I tried this:
```awk
{
    count = 0
    $0 = tolower($0)
    while (length() > 0) {
        m = match($0, /212/)
        y = match($0, /121/)
        if ((m == 0) && (y == 0))
            break
        count++
        $0 = substr($0, (m + 1) + (y + 1))
    }
    print count
}
```

but the output is this:
```
1
1
0
1
1
```

What am I doing wrong? I know I should be understanding the code, not cutting and pasting stuff together, but that's my skill level at this point.
BTW, when I don't have the OR in there (i.e. I'm searching for just one string), it works perfectly.
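One way to see why the alternate loop stops after one match is to print the positions it computes. The sketch below is a trimmed copy of that loop (the `tolower()` call is dropped because it does not affect digits), run on one sample line with an added `printf`; the variable name `next_start` is a hypothetical addition for the trace.

```shell
#!/bin/sh
# Trace of the alternate loop on one sample line.
# next_start is a hypothetical helper variable added only for printing.
result=$(printf '315firstbin 3212212121x\n' | awk '{
    count = 0
    while (length($0) > 0) {
        m = match($0, /212/)              # position of first "212", or 0
        y = match($0, /121/)              # position of first "121", or 0
        if (m == 0 && y == 0) break
        count++
        next_start = (m + 1) + (y + 1)    # adds BOTH positions together
        printf "m=%d y=%d len=%d next_start=%d\n", m, y, length($0), next_start
        $0 = substr($0, next_start)       # jumps past the end of the line
    }
    print "count=" count
}')
echo "$result"
```

Because the restart position is the sum of both match positions, it lands beyond the end of the 23-character line, so the rest of the record is skipped after the first hit. Restarting one character after the leftmost match (the smaller non-zero of `m` and `y`) is what the loop actually needs.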
You're making it too complicated:
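```awk
{
    count = 0
    while ( match($0,/121|212/) ) {
        count++
        $0 = substr($0,RSTART+1)
    }
    print count
}
```

```
$ awk -f tst.awk file
6
5
0
1
2
```

match() sets RSTART to where the match starts, so restarting one character later is what lets overlapping matches such as the 121 and 212 in "1212" both be counted.

Your fundamental problem is confusing a condition with a regexp. A regexp can be compared to a string to form a condition, and when the string in question is $0 you can leave it out and use the regexp alone as shorthand for `$0 ~ regexp`, but in that context what's being tested is still a condition. The 2nd arg to match() is a regexp, not a condition. `|` is the OR operator in a regexp, while `||` is the OR operator in a condition. `/.../` are regexp delimiters.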
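Since the question also allows "anything else", here is a minimal alternative sketch that counts overlapping occurrences of each literal string separately with awk's `index()` and adds them up per line. The `-v strs='121,212'` variable and its comma-separated format are assumptions made for this example, and the sample input is inlined with `printf`.

```shell
#!/bin/sh
# Count overlapping occurrences of each literal string and sum them per line.
# The strs variable and its comma-separated format are hypothetical choices.
result=$(printf '%s\n' \
    '315secondbin x12121321211332123x' \
    '315firstbin 3212212121x' \
    '315thirdbin 132221312' \
    '316firstbin 121' \
    '316secondbin 1212' |
awk -v strs='121,212' '
BEGIN { n = split(strs, s, ",") }             # s[1..n] holds the strings
{
    total = 0
    for (i = 1; i <= n; i++) {
        if (s[i] == "") continue              # skip empty entries
        rest = $0
        while ((p = index(rest, s[i])) > 0) { # index() matches literally
            total++
            rest = substr(rest, p + 1)        # step one char so overlaps count
        }
    }
    print total
}')
echo "$result"
```

Unlike the single `121|212` regexp, this counts a position twice if two of the strings start there (for example "12" and "121"). With 121 and 212 that cannot happen, because they begin with different characters, so both approaches print 6, 5, 0, 1, 2 for this input.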
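- `/foo/` is a regexp.
- `$0 ~ /foo/` is a condition.
- `/foo/` in a conditional context is shorthand for `$0 ~ /foo/`; in any other context it is a regexp.
- `/foo/ || /bar/` in a conditional context is shorthand for `$0 ~ /foo/ || $0 ~ /bar/`.

So when you use `/foo/ || /bar/` as the 2nd arg to match(), awk assumes you intended to write:

```awk
match($0,($0 ~ /foo/ || $0 ~ /bar/))
```

i.e. test the current record against foo or bar, and if either matches the condition evaluates to 1, and that 1 is what's given to match() as its 2nd arg. match() then uses that number as the regexp "1" (or "0" when the condition is false), which is why your first loop was counting the character 1 rather than occurrences of 121 or 212.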
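To confirm that reading, the sketch below rewrites the question's first loop with the condition stored in an explicitly named variable (`cond`, a hypothetical name) and passed to match(). On the question's sample input it reproduces the same wrong numbers the question reports.

```shell
#!/bin/sh
# The first loop from the question, with the 2nd arg of match() spelled out.
# cond is a hypothetical variable name used only to make the evaluation visible.
result=$(printf '%s\n' \
    '315secondbin x12121321211332123x' \
    '315firstbin 3212212121x' \
    '315thirdbin 132221312' \
    '316firstbin 121' \
    '316secondbin 1212' |
awk '{
    count = 0
    while (length($0) > 0) {
        cond = ($0 ~ /212/ || $0 ~ /121/)  # a condition: evaluates to 1 or 0
        m = match($0, cond)                # so this searches for "1" (or "0")
        if (m == 0) break
        count++
        $0 = substr($0, m + 1)
    }
    print count
}')
echo "$result"   # prints 8, 4, 0, 2, 3, one per line
```

Each iteration counts one "1" character for as long as the rest of the line still contains 121 or 212, which explains both the 8 on the first line and the 0 on the third.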
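Look:

```
$ echo foo | gawk 'match($0,/foo/||/bar/)'
$ echo foo | gawk '{print /foo/||/bar/}'
1
$ echo 1foo | gawk 'match($0,/foo/||/bar/)'
1foo
```

The first command prints nothing because match() is looking for "1" in "foo"; the last one prints because "1foo" contains a 1.

Get the book Effective Awk Programming, 4th Edition, by Arnold Robbins.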