Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
speedup of ismember

Subject: speedup of ismember

From: Michal Kvasnicka

Date: 22 Apr, 2011 10:25:08

Message: 1 of 8

Hello,
I am looking for any possibilities, how to speedup following code for location of the integer vector within the matrix (buffer).

Lets start with testing integer matrix (buffer) respresented by:
 m = 1000000; % ... number of rows
 n = 20; % ... dimension of vector
 M = randi([1 n],m,n);
 
and then the function which is able to find the location of unique row vector x within matrix M may looks like:

 testloc = 123456;
 x = M(testloc,:);
 [tf loc] = ismember(x,M,'rows');

Is possible to find locations of multiple vectors, too:
 [tf loc] = ismember([x;y;z],M,'rows');

This solution is very elegant, but I am looking for something faster. For large m > 1e6 and m>20 is typical response time about 0.1sec. In a case of multiple vectors is the response time about 10times longer.
Is there any chance to speedup this searching method based on "ismember" function?

Important notes:
1. The matrix M contains only unique vectors (rows).
2. The matrix M plays a role of the sampling vectors buffer, so its starting size is zero and then the matrix M is extended by each unique vector sample without any ordering.
3. The problem is motivated by optimization of the cpu time consuming black-box function. In this case is very important to avoid repeated evaluation randomly generated input sampling vectors by properly designed buffering.
4. The response of the searching method must be significantly faster then evaluation of the black-box function even in the case of large number od samplicng points in the buffer (matrix).

Michal

Subject: speedup of ismember

From: per isakson

Date: 22 Apr, 2011 13:07:05

Message: 2 of 8

"Michal Kvasnicka" wrote in message <iorl24$dal$1@fred.mathworks.com>...
> Hello,
> I am looking for any possibilities, how to speedup following code for location of the integer vector within the matrix (buffer).
>
> Lets start with testing integer matrix (buffer) respresented by:
> m = 1000000; % ... number of rows
> n = 20; % ... dimension of vector
> M = randi([1 n],m,n);
>
> and then the function which is able to find the location of unique row vector x within matrix M may looks like:
>
> testloc = 123456;
> x = M(testloc,:);
> [tf loc] = ismember(x,M,'rows');
>
> Is possible to find locations of multiple vectors, too:
> [tf loc] = ismember([x;y;z],M,'rows');
>
> This solution is very elegant, but I am looking for something faster. For large m > 1e6 and m>20 is typical response time about 0.1sec. In a case of multiple vectors is the response time about 10times longer.
> Is there any chance to speedup this searching method based on "ismember" function?
>
> Important notes:
> 1. The matrix M contains only unique vectors (rows).
> 2. The matrix M plays a role of the sampling vectors buffer, so its starting size is zero and then the matrix M is extended by each unique vector sample without any ordering.
> 3. The problem is motivated by optimization of the cpu time consuming black-box function. In this case is very important to avoid repeated evaluation randomly generated input sampling vectors by properly designed buffering.
> 4. The response of the searching method must be significantly faster then evaluation of the black-box function even in the case of large number od samplicng points in the buffer (matrix).
>
> Michal

I have a function, which is six times faster than ismember with your first example (one row). It is bases on a contribution by us, which I once downloaded from the FEX (or copied from a post to this Newsgroup). The function is based on STRFIND (us' trick).

Now, I cannot find it in the FEX, but see: strpat: a pedestrian, exactly matching pattern finder / replacer, by us

BTW: tic, ix = find( all( M == repmat( x, m, 1 ), 2 ), 1 ); toc is more than twice as fast as ismember with your example.

- per

Subject: speedup of ismember

From: Michal Kvasnicka

Date: 22 Apr, 2011 14:05:19

Message: 3 of 8

"per isakson" wrote in message <ioruhp$fko$1@fred.mathworks.com>...
> "Michal Kvasnicka" wrote in message <iorl24$dal$1@fred.mathworks.com>...
> > Hello,
> > I am looking for any possibilities, how to speedup following code for location of the integer vector within the matrix (buffer).
> >
> > Lets start with testing integer matrix (buffer) respresented by:
> > m = 1000000; % ... number of rows
> > n = 20; % ... dimension of vector
> > M = randi([1 n],m,n);
> >
> > and then the function which is able to find the location of unique row vector x within matrix M may looks like:
> >
> > testloc = 123456;
> > x = M(testloc,:);
> > [tf loc] = ismember(x,M,'rows');
> >
> > Is possible to find locations of multiple vectors, too:
> > [tf loc] = ismember([x;y;z],M,'rows');
> >
> > This solution is very elegant, but I am looking for something faster. For large m > 1e6 and m>20 is typical response time about 0.1sec. In a case of multiple vectors is the response time about 10times longer.
> > Is there any chance to speedup this searching method based on "ismember" function?
> >
> > Important notes:
> > 1. The matrix M contains only unique vectors (rows).
> > 2. The matrix M plays a role of the sampling vectors buffer, so its starting size is zero and then the matrix M is extended by each unique vector sample without any ordering.
> > 3. The problem is motivated by optimization of the cpu time consuming black-box function. In this case is very important to avoid repeated evaluation randomly generated input sampling vectors by properly designed buffering.
> > 4. The response of the searching method must be significantly faster then evaluation of the black-box function even in the case of large number od samplicng points in the buffer (matrix).
> >
> > Michal
>
> I have a function, which is six times faster than ismember with your first example (one row). It is bases on a contribution by us, which I once downloaded from the FEX (or copied from a post to this Newsgroup). The function is based on STRFIND (us' trick).
>
> Now, I cannot find it in the FEX, but see: strpat: a pedestrian, exactly matching pattern finder / replacer, by us
>
> BTW: tic, ix = find( all( M == repmat( x, m, 1 ), 2 ), 1 ); toc is more than twice as fast as ismember with your example.
>
> - per
My be I am stupid, but I can not find the way how to effectively use STRPAT as a faster ISMEMBER replacement.

Could you help me a bit more?

Michal

Subject: speedup of ismember

From: Matt J

Date: 22 Apr, 2011 14:21:04

Message: 4 of 8

"Michal Kvasnicka" wrote in message <iorl24$dal$1@fred.mathworks.com>...
>
> This solution is very elegant, but I am looking for something faster. For large m > 1e6 and m>20 is typical response time about 0.1sec. In a case of multiple vectors is the response time about 10times longer.
================================

Using BSXFUN in conjunction with FIND, I get nearly a factor of 20 speed-up in the test case below:


 m = 1000000; % ... number of rows
 n = 20; % ... dimension of vector
 M = randi([1 n],m,n);
 
  testloc = 123456;
  testloc=testloc:testloc+5;
 x = M(testloc,:);
 
 tic;
 [~,loc1] = ismember(x,M,'rows');
 toc;
 %Elapsed time is 4.058689 seconds.
 
  tic;
  loc= find(any(all(bsxfun(@eq,reshape(x.',1,n,[]),M),2),3));
 toc;
%Elapsed time is 0.260783 seconds.

Subject: speedup of ismember

From: Matt J

Date: 22 Apr, 2011 14:30:20

Message: 5 of 8

"Michal Kvasnicka" wrote in message <iorl24$dal$1@fred.mathworks.com>...
>
> 2. The matrix M plays a role of the sampling vectors buffer, so its starting size is zero and then the matrix M is extended by each unique vector sample without any ordering.
================

Incidentally, this is a notoriously bad thing to do. It is better that you pre-allocate M with the largest size it will need to assume. Then, run your search on some truncated portion of the matrix M(1:N,:);

Also, if you are going to be appending vectors to matrices, it is better to do so column-wise instead of row-wise. This makes memory access more efficient.

Subject: speedup of ismember

From: Michal Kvasnicka

Date: 22 Apr, 2011 14:59:05

Message: 6 of 8

"Matt J" wrote in message <ios3ds$d56$1@fred.mathworks.com>...
> "Michal Kvasnicka" wrote in message <iorl24$dal$1@fred.mathworks.com>...
> >
> > 2. The matrix M plays a role of the sampling vectors buffer, so its starting size is zero and then the matrix M is extended by each unique vector sample without any ordering.
> ================
>
> Incidentally, this is a notoriously bad thing to do. It is better that you pre-allocate M with the largest size it will need to assume. Then, run your search on some truncated portion of the matrix M(1:N,:);
>
> Also, if you are going to be appending vectors to matrices, it is better to do so column-wise instead of row-wise. This makes memory access more efficient.

So, if I understand you well, you propose work with matrix M with n rows and m columns and them use M' in your search method:
loc= find(any(all(bsxfun(@eq,reshape(x,1,n,[]),M'),2),3))
 Am I right?

Subject: speedup of ismember

From: per isakson

Date: 22 Apr, 2011 15:40:04

Message: 7 of 8

"Michal Kvasnicka" wrote in message <ios1uv$fp6$1@fred.mathworks.com>...
> "per isakson" wrote in message <ioruhp$fko$1@fred.mathworks.com>...
> > "Michal Kvasnicka" wrote in message <iorl24$dal$1@fred.mathworks.com>...
....
>
> Could you help me a bit more?
>
> Michal

Matt's solution is faster than my function, which is based one STRFIND.

However, the idea (which is not at all obvious) is to rearrange M and x so that they fit STRFIND. Use STRFIND as in:
>> row = (1:12);
>> pttrn= (3:5);
>> strfind( row, pttrn )
ans =
     3
and next check that the index returned match the beginning of a row.

- per

Subject: speedup of ismember

From: Matt J

Date: 22 Apr, 2011 15:52:05

Message: 8 of 8

"Michal Kvasnicka" wrote in message <ios53p$cf5$1@fred.mathworks.com>...
> "Matt J" wrote in message <ios3ds$d56$1@fred.mathworks.com>...
> > "Michal Kvasnicka" wrote in message <iorl24$dal$1@fred.mathworks.com>...
> > >
> > > 2. The matrix M plays a role of the sampling vectors buffer, so its starting size is zero and then the matrix M is extended by each unique vector sample without any ordering.
> > ================
> >
> > Incidentally, this is a notoriously bad thing to do. It is better that you pre-allocate M with the largest size it will need to assume. Then, run your search on some truncated portion of the matrix M(1:N,:);
> >
> > Also, if you are going to be appending vectors to matrices, it is better to do so column-wise instead of row-wise. This makes memory access more efficient.
>
> So, if I understand you well, you propose work with matrix M with n rows and m columns

Yes.

>and them use M' in your search method:
> loc= find(any(all(bsxfun(@eq,reshape(x,1,n,[]),M'),2),3))

No, if you work with M organized columnwise, I would reorganize this accordingly

  tic;
  loc1= find(any(all(bsxfun(@eq,M,reshape(x,n,1,size(x,2))),1),3));
 toc;
 %Elapsed time is 0.351774 seconds.

Another possibly faster approach based on Cauchy-Schwartz is as follows. I don't include the computation of Mnorms in the tic...toc, because that is something you could update incrementally, as you are doing with M,

  
   Mnorms=sqrt(sum(M.^2,1));
   tic;
    [~,loc]=max(bsxfun(@rdivide,x.'*M, Mnorms),[],2);
   toc;
   %Elapsed time is 0.144179 seconds.

Tags for this Thread

No tags are associated with this thread.

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us