Got Questions? Get Answers.
Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
Count identical rows in a matrix

Subject: Count identical rows in a matrix

From: H-H Fahad

Date: 1 Jun, 2009 16:35:02

Message: 1 of 8

Hello everybody,
I have a digit matrix of size [3200 2], and want to count identical rows in that matrix. Example:

[123 343] 2 repititions
[123 688]
[145 668]
[123 343] 2 repititions
...

I have tired to go through the rows one by one, and this resulted in a script with runtime n^2 (app 7 minutes, & I have to run it 188 times!)

I know matlab can do this much faster with its integrated functions, but I'm afraid that's all I know about it ;)
Since there are real experts here, I decided to post a message, and see if anybody can help me!
Regards Hamid

Subject: Count identical rows in a matrix

From: Steven Lord

Date: 1 Jun, 2009 16:37:22

Message: 2 of 8


"H-H Fahad" <kontaktemail@gmail.com> wrote in message
news:h00vvm$7vv$1@fred.mathworks.com...
> Hello everybody,
> I have a digit matrix of size [3200 2], and want to count identical rows
> in that matrix. Example:
>
> [123 343] 2 repititions
> [123 688]
> [145 668]
> [123 343] 2 repititions
> ...
>
> I have tired to go through the rows one by one, and this resulted in a
> script with runtime n^2 (app 7 minutes, & I have to run it 188 times!)
>
> I know matlab can do this much faster with its integrated functions, but
> I'm afraid that's all I know about it ;)
> Since there are real experts here, I decided to post a message, and see if
> anybody can help me!
> Regards Hamid

If all the elements of your 3200-by-2 matrix are positive integers, use
SPARSE or ACCUMARRAY with the first column of your matrix being the row
indices and the second the column indices.

--
Steve Lord
slord@mathworks.com

Subject: Count identical rows in a matrix

From: Bruno Luong

Date: 1 Jun, 2009 17:17:01

Message: 3 of 8

"Steven Lord" <slord@mathworks.com> wrote in message <h0103f$fiv$1@fred.mathworks.com>...

> If all the elements of your 3200-by-2 matrix are positive integers, use
> SPARSE or ACCUMARRAY with the first column of your matrix being the row
> indices and the second the column indices.
>

For generic array, call UNIQUE first

A=-ceil(4*rand(20,2));

[Au I J]=unique(A,'rows');
Count=accumarray(J,1);
fprintf('(%d,%d) -> %d times\n',[Au Count]')

Bruno

Subject: Count identical rows in a matrix

From: Jos

Date: 1 Jun, 2009 18:15:18

Message: 4 of 8

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <h012ed$hva$1@fred.mathworks.com>...
> "Steven Lord" <slord@mathworks.com> wrote in message <h0103f$fiv$1@fred.mathworks.com>...
>
> > If all the elements of your 3200-by-2 matrix are positive integers, use
> > SPARSE or ACCUMARRAY with the first column of your matrix being the row
> > indices and the second the column indices.
> >
>
> For generic array, call UNIQUE first
>
> A=-ceil(4*rand(20,2));
>
> [Au I J]=unique(A,'rows');
> Count=accumarray(J,1);
> fprintf('(%d,%d) -> %d times\n',[Au Count]')
>
> Bruno

I suggest using HISTC instead of accumarray:

Count = histc(J,1:numel(Au))

Jos

Subject: Count identical rows in a matrix

From: Bruno Luong

Date: 1 Jun, 2009 18:35:02

Message: 5 of 8


>
> I suggest using HISTC instead of accumarray:
>
> Count = histc(J,1:numel(Au))
>
> Jos

It probably depends on the size. accumarray has linear complexity O(N) - in principle (?) - but with non negligible overhead, histc is N*log(N), and very well optimized Matlab stick function. But I haven't time them to know which one is better. Sparse command - for sure - is at best O(N)*log(N).

Bruno

Subject: Count identical rows in a matrix

From: Bruno Luong

Date: 1 Jun, 2009 18:55:03

Message: 6 of 8

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <h0170m$cl0$1@fred.mathworks.com>...
>
> >
> > I suggest using HISTC instead of accumarray:
> >
> > Count = histc(J,1:numel(Au))
> >
> > Jos
>
> It probably depends on the size. accumarray has linear complexity O(N) - in principle (?) - but with non negligible overhead, histc is N*log(N), and very well optimized Matlab stick function. But I haven't time them to know which one is better. Sparse command - for sure - is at best O(N)*log(N).
>

OK I have timed the two for large size array:

>> B=randperm(1e6).';

>> tic; I=histc(B,1:1e6); toc;
Elapsed time is 0.672955 seconds. % best time for 3 runs

>> tic; J=accumarray(B,1); toc;
Elapsed time is 0.083807 seconds. % best time for 3 runs

It indicates that accumarray complexity is likely to be linear.

% Bruno

Subject: Count identical rows in a matrix

From: Jos

Date: 1 Jun, 2009 19:29:01

Message: 7 of 8

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <h01867$s4q$1@fred.mathworks.com>...
> "Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <h0170m$cl0$1@fred.mathworks.com>...
> >
> > >
> > > I suggest using HISTC instead of accumarray:
> > >
> > > Count = histc(J,1:numel(Au))
> > >
> > > Jos
> >
> > It probably depends on the size. accumarray has linear complexity O(N) - in principle (?) - but with non negligible overhead, histc is N*log(N), and very well optimized Matlab stick function. But I haven't time them to know which one is better. Sparse command - for sure - is at best O(N)*log(N).
> >
>
> OK I have timed the two for large size array:
>
> >> B=randperm(1e6).';
>
> >> tic; I=histc(B,1:1e6); toc;
> Elapsed time is 0.672955 seconds. % best time for 3 runs
>
> >> tic; J=accumarray(B,1); toc;
> Elapsed time is 0.083807 seconds. % best time for 3 runs
>
> It indicates that accumarray complexity is likely to be linear.
>
> % Bruno

You're right Bruno, thanks for your insight. Btw, specifying the size argument for accumarray will speed things up once more:

B = randperm(1e6) .';
f1 = @() accumarray(B,1) ;
f2 = @() accumarray(B,1,[numel(B) 1]) ;

t1 = timeit(f1)
t2 = timeit(f2)
% t1/t2 ~= 1.15

Best,
Jos

Subject: Count identical rows in a matrix

From: Bruno Luong

Date: 1 Jun, 2009 19:45:03

Message: 8 of 8

"Jos " <#10584@fileexchange.com> wrote in message <h01a5t$8b4$1@fred.mathworks.com>...

>
> t1 = timeit(f1)
> t2 = timeit(f2)
> % t1/t2 ~= 1.15
>

Ah good to know Jos. One more reason I won't use the free floating accumarray in the future (the other reasons are readability and consistent outputs).

Bruno

Tags for this Thread

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us