python - Optimize this function with numpy (or other vectorization methods) -
i computing python classic calculation in field of population genetics. aware there exists many algorithm job wanted build own reason.
the below paragraph picture because mathjax not supported on stackoverflow
i have efficient algorithm calculate fst
. moment manage make loops , no calculations vectorized how can make calculation using numpy (or other vectorization methods)?
here code think should job:
def fst(w, p): = len(p[0]) k = len(p) h_t = 0 h_s = 0 in xrange(i): bar_p_i = 0 k in xrange(k): bar_p_i += w[k] * p[k][i] h_s += w[k] * p[k][i] * p[k][i] h_t += bar_p_i*bar_p_i h_t = 1 - h_t h_s = 1 - h_s return (h_t - h_s) / h_t def main(): w = [0.2, 0.1, 0.2, 0.5] p = [[0.1,0.3,0.6],[0,0,1],[0.4,0.5,0.1],[0,0.1,0.9]] f = fst(w,p) print("fst = " + str(f)) return main()
there's no reason use loops here. , shouldn't use numba or cython stuff - linear algebra expressions 1 have whole reason behind vectorized operations in numpy.
since type of problem going pop again , again if keep using numpy, recommend getting basic handle on linear algebra in numpy. might find book chapter helpful:
https://www.safaribooksonline.com/library/view/python-for-data/9781449323592/ch04.html
as specific situation: start creating numpy arrays variables:
import numpy np w = np.array(w) p = np.array(p)
now, \bar p_i^2 defined dot product. that's easy:
bar_p_i = p.t.dot(w)
note t, transpose, because dot product takes sum of elements indexed last index of first matrix , first index of second matrix. transpose inverts indices first index becomes last.
you h_t defined sum. that's easy:
h_t = 1 - bar_p_i.sum()
similarly h_s:
h_s = 1 - ((bar_p_i**2).t.dot(w)).sum()
Comments
Post a Comment