MATLAB spending an incredible amount of time writing a relatively small matrix
I have a small MATLAB script (included below) for handling data read from a CSV file with two columns and hundreds of thousands of rows. Each entry is a natural number, with zeros occurring only in the second column. The code is taking an incredible amount of time (hours) to run what should be achievable in at most a few seconds. The profiler identifies that approximately 100% of the run time is spent writing a matrix of zeros, whose size varies depending on the input but in all usage is smaller than 1000x1000.
The code is as follows:
function [data] = datahandler(d)
n = size(d,1);
s = max(d,1);
data = zeros(s,s);
for i = 1:n
    data(d(i,1),d(i,2)+1) = data(d(i,1),d(i,2)+1) + 1;
end
It's the data = zeros(s,s); line that takes around 100% of the runtime. I can make the code run by swapping the s's in that line for 1000, which is a sufficient upper bound to ensure I won't run into errors for any of the data I'm looking at.
Obviously there are better ways of doing this, but since I had just bashed the code together to format some data, I wasn't too concerned. As I said, I fixed it by replacing s with 1000 for my purposes, but I'm perplexed as to why writing that matrix would bog MATLAB down for several hours. The new code runs instantaneously.
I'd be very interested if anyone has seen this kind of behaviour before, or knows why it would be happening. It's a little disconcerting, and it would be good to be confident that I can initialize matrices freely without killing MATLAB.
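Your call to zeros is incorrect. Looking at your code, d looks like an n x 2 array (n rows, two columns). However, your call of s = max(d,1) would actually generate another n x 2 array. Consulting the documentation for max, this is what happens when you call max in the way you used it: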
c = max(a,b) returns an array the same size as a and b with the largest elements taken from a or b. Either the dimensions of a and b are the same, or one can be a scalar.
Therefore, because you used max(d,1), you are comparing every value in d with the value 1, so what you get in the end is essentially a copy of d (with any zeros replaced by 1). Using this as input to zeros has rather undefined behaviour. What will probably happen is that, for each row of s, it allocates a temporary zeros matrix of that size and then tosses the temporary result, so that only the dimensions of the last row of s are recorded. Because you have a very large matrix d, this is probably why the profiler hangs here at 100% utilization. Each parameter to zeros must be a scalar, yet your call to produce s produces a matrix.
What I believe you intended should have been:
s = max(d(:));
This finds the overall maximum of the matrix d by unrolling d into a single vector and taking the maximum of that vector. If you do this, your code should run faster.
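To make the difference concrete, here is a minimal sketch on a tiny made-up matrix (the values in d, and the names s_wrong and s_right, are purely illustrative and not from your data). It just compares the shape of the two results:

```matlab
% Small stand-in for the real data: two columns of natural numbers,
% with zeros allowed only in the second column (values are made up).
d = [3 0; 1 2; 4 1];

s_wrong = max(d, 1);   % element-wise max of d against the scalar 1
s_right = max(d(:));   % overall maximum over every entry of d

disp(size(s_wrong))      % same shape as d, so not a valid scalar size
disp(isscalar(s_right))  % a single number, which is what zeros expects

data = zeros(s_right, s_right);
disp(size(data))
```

One caveat I'd flag: your loop writes to column d(i,2)+1, so if the largest value happens to sit in the second column, an s-by-s matrix would be one column short. Using max(d(:)) + 1 as the size would be a safe bound in that case.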
As a side note, this post may interest you:
Faster way to initialize arrays via empty matrix multiplication? (Matlab)
It was shown in that post that doing zeros(n,n) is in fact slow, and that there are several neat tricks for initializing an array of zeros. One way to accomplish this is by empty matrix multiplication:
data = zeros(n,0)*zeros(0,n);
One of my personal favourites is that, if you assume data is not yet declared / initialized, you can do:
data(n,n) = 0;
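As a quick sanity check that these approaches all produce the same matrix, here is a small sketch (n = 5 and the variable names a, b and c are arbitrary choices for illustration). It only checks equality of the results; whether one is actually faster than another will depend on your MATLAB version and matrix size, so I'd time them on your own machine before relying on it:

```matlab
n = 5;                           % illustrative size only

a = zeros(n, n);                 % the standard call
b = zeros(n, 0) * zeros(0, n);   % empty matrix multiplication
clear c                          % c must not already exist for the next line
c(n, n) = 0;                     % assigning the last element grows c, zero-filled

disp(isequal(a, b, c))           % all three are n-by-n matrices of zeros
```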
If I can also comment, that for loop is quite inefficient. What you are doing is calculating a 2D histogram / accumulation of your data. You can replace the for loop with a more efficient accumarray call. This also avoids allocating an array of zeros yourself, as accumarray does that under the hood for you.
As such, your code would basically become this:
function [data] = datahandler(d)
data = accumarray([d(:,1) d(:,2)+1], 1);
accumarray in this case takes all pairs of row and column coordinates, stored in d(i,1) and d(i,2) + 1 for i = 1, 2, ..., size(d,1), and places every pair that matches the same row and column coordinates into its own 2D bin. It then adds up all of the occurrences, so the output at each 2D bin gives the total tally of how many rows of d mapped to that row and column coordinate.
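Here is a small worked example of that tallying, again on made-up data (the matrix d and the names counts and ref are illustrative). It runs the accumarray call next to the original loop and checks that the two agree:

```matlab
% Made-up input: the pair (1,0) appears three times, (2,1) twice, (3,2) once.
d = [1 0; 2 1; 1 0; 3 2; 2 1; 1 0];

% One call: each row of the subscript matrix selects a bin, and the
% constant 1 is added to that bin once per occurrence.
counts = accumarray([d(:,1), d(:,2)+1], 1);

% The same tally with the explicit loop, for comparison.
ref = zeros(max(d(:,1)), max(d(:,2)) + 1);
for i = 1:size(d,1)
    r = d(i,1);
    c = d(i,2) + 1;
    ref(r,c) = ref(r,c) + 1;
end

disp(counts)
disp(isequal(counts, ref))
```

Note that accumarray sizes its output from the largest row and column subscripts it sees, so the result is not necessarily square. If you need a fixed size, accumarray accepts the output size as a third argument, e.g. accumarray(subs, 1, [s s]).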