SpinMan

08-15-2011, 12:36 PM

I've got a long vector of numbers and I'd like to find out where each occurrence of a particular series is. An example might be where does [43,44,45] occur in indgen(100)?

Any ideas?

Any ideas?

View Full Version : Pattern Matching

SpinMan

08-15-2011, 12:36 PM

I've got a long vector of numbers and I'd like to find out where each occurrence of a particular series is. An example might be where does [43,44,45] occur in indgen(100)?

Any ideas?

Any ideas?

hcrisp

08-16-2011, 01:30 PM

Here is one way (you may have to protect against errors, however):

vals = INDGEN(100)

nvals = N_ELEMENTS(vals)

vsearch = [43, 44, 45]

nsearch = N_ELEMENTS(vsearch)

nend = nvals - nsearch ids = WHERE(vals(0:nend) EQ vsearch(0) AND vals(1:nend+1) EQ vsearch(1) AND vals(2:nend+2) EQ vsearch(2), cnt)

IF cnt GT 0 THEN PRINT, ids

Also, you could revert to using strings with delimiters and use STRPOS, but that would probably not be my choice of methods.

vals = INDGEN(100)

nvals = N_ELEMENTS(vals)

vsearch = [43, 44, 45]

nsearch = N_ELEMENTS(vsearch)

nend = nvals - nsearch ids = WHERE(vals(0:nend) EQ vsearch(0) AND vals(1:nend+1) EQ vsearch(1) AND vals(2:nend+2) EQ vsearch(2), cnt)

IF cnt GT 0 THEN PRINT, ids

Also, you could revert to using strings with delimiters and use STRPOS, but that would probably not be my choice of methods.

SpinMan

09-20-2011, 08:08 AM

Thanks! I'm still looking for a sliding window implementation whereby i could search very large arrays for long patterns.

hcrisp

09-20-2011, 10:35 AM

Well, here would be a way to do it with strings. The 'find' variable can be any length. Will work with integers, but probably not with float values. It will also error if your sequence contains false matches, like [4,3,4,4,4,5], but you could introduce a delimiter to deal with that possbility if your 'find' string is not unique enough.

vals = INDGEN(100)

istrvals = STRTRIM(vals, 2)

strvals = STRJOIN(istrvals)

indicies = LONARR(N_ELEMENTS(vals))

FOR i=1L, N_ELEMENTS(vals)-1 DO indicies(i) = TOTAL(STRLEN(istrvals(0:i-1)))

find = [43, 44, 45]

nfind = N_ELEMENTS(find)

strfind = STRJOIN(STRTRIM(find, 2))

pos = STRPOS(strvals, strfind)

ids = WHEREIN(indicies, pos)

PM, ids

FOR i=0L, N_ELEMENTS(ids)-1 DO PM, vals(ids(i):ids(i)+nfind-1)

vals = INDGEN(100)

istrvals = STRTRIM(vals, 2)

strvals = STRJOIN(istrvals)

indicies = LONARR(N_ELEMENTS(vals))

FOR i=1L, N_ELEMENTS(vals)-1 DO indicies(i) = TOTAL(STRLEN(istrvals(0:i-1)))

find = [43, 44, 45]

nfind = N_ELEMENTS(find)

strfind = STRJOIN(STRTRIM(find, 2))

pos = STRPOS(strvals, strfind)

ids = WHEREIN(indicies, pos)

PM, ids

FOR i=0L, N_ELEMENTS(ids)-1 DO PM, vals(ids(i):ids(i)+nfind-1)

hcrisp

09-20-2011, 10:49 AM

Found a better way. I've always known TENSORs can be useful, and I finally found a use case. It's unbelievably compact. I wonder why nobody from VNI mentioned this? Are they reading the forums?

a = [LINDGEN(100, LINDGEN(100)]

b = [43, 44, 45]

c = TENSOR_EQ(a, b)

id = WHERE(c(*,0) EQ 1, cnt)

pm, id

a = [LINDGEN(100, LINDGEN(100)]

b = [43, 44, 45]

c = TENSOR_EQ(a, b)

id = WHERE(c(*,0) EQ 1, cnt)

pm, id

frog

09-21-2011, 01:54 AM

a=findgen(100)

a(20:22)=[2,3,4]

idx = wherein(a, [2,3,4])

print, idx

2 3 4 20 21 22

Hope this help!

a(20:22)=[2,3,4]

idx = wherein(a, [2,3,4])

print, idx

2 3 4 20 21 22

Hope this help!

SpinMan

09-22-2011, 10:47 AM

Thanks for the suggestions.

The tensor method can easily exceed memory limits as I was wont to discover, and execution time is long. And if the pattern occurs more than once the indices returned multiple times; i.e., 43,85,143,185. So there's a little manuipulation to be done.

The wherein method just returns the locations of the elements in the pattern. Not where the pattern exists.

So, after much thought I'm going with a loop method of finding all the occurrences of pat(0) and then SAMEing each location+pattern_length with the pattern itself. This method is very fast. On 10^6 arrays with a pattern of 5 elements it took 1.25 seconds to find all 13 patterns.

Thanks again!

The tensor method can easily exceed memory limits as I was wont to discover, and execution time is long. And if the pattern occurs more than once the indices returned multiple times; i.e., 43,85,143,185. So there's a little manuipulation to be done.

The wherein method just returns the locations of the elements in the pattern. Not where the pattern exists.

So, after much thought I'm going with a loop method of finding all the occurrences of pat(0) and then SAMEing each location+pattern_length with the pattern itself. This method is very fast. On 10^6 arrays with a pattern of 5 elements it took 1.25 seconds to find all 13 patterns.

Thanks again!

allan

09-23-2011, 12:09 PM

below is some code similar to what spinman suggested but probably faster;

instead of looping over the container array, it loops over those elements of

the container array which match the first element of the pattern;

the runtime for the parameters mentioned in the last post

is about 0.1 seconds on the slowest machine i could find;

just paste the entire code block at the wave prompt;

m = 1e6

n = 5

p = 13

a = long( m*randomu(s,m) ) ;random container

b = long( m*randomu(s,n) ) ;random pattern

l = long( m*randomu(s,p)-n ) ;random locations

for i=0l,p-1 do a(l(i)) = b ;insert patterns

t = systime(1) ;start time

r = lonarr(m) ;result storage

i = where( a(0:m-n) eq b(0), q ) ;possible locations

for j=0l,q-1 do begin & k = i(j) & r(k) = min( a(k:k+p-1) eq b ) & end

j = where(r)

print, systime(1)-t

print, l(sort(l))

print, j

instead of looping over the container array, it loops over those elements of

the container array which match the first element of the pattern;

the runtime for the parameters mentioned in the last post

is about 0.1 seconds on the slowest machine i could find;

just paste the entire code block at the wave prompt;

m = 1e6

n = 5

p = 13

a = long( m*randomu(s,m) ) ;random container

b = long( m*randomu(s,n) ) ;random pattern

l = long( m*randomu(s,p)-n ) ;random locations

for i=0l,p-1 do a(l(i)) = b ;insert patterns

t = systime(1) ;start time

r = lonarr(m) ;result storage

i = where( a(0:m-n) eq b(0), q ) ;possible locations

for j=0l,q-1 do begin & k = i(j) & r(k) = min( a(k:k+p-1) eq b ) & end

j = where(r)

print, systime(1)-t

print, l(sort(l))

print, j

Powered by vBulletin® Version 4.2.3 Copyright © 2019 vBulletin Solutions, Inc. All rights reserved.