View Full Version : Pattern Matching

08-15-2011, 11:36 AM
I've got a long vector of numbers and I'd like to find out where each occurrence of a particular series is. An example might be where does [43,44,45] occur in indgen(100)?

Any ideas?

08-16-2011, 12:30 PM
Here is one way (you may have to protect against errors, however):

vals = INDGEN(100)
nvals = N_ELEMENTS(vals)
vsearch = [43, 44, 45]
nsearch = N_ELEMENTS(vsearch)
nend = nvals - nsearch ids = WHERE(vals(0:nend) EQ vsearch(0) AND vals(1:nend+1) EQ vsearch(1) AND vals(2:nend+2) EQ vsearch(2), cnt)
IF cnt GT 0 THEN PRINT, ids

Also, you could revert to using strings with delimiters and use STRPOS, but that would probably not be my choice of methods.

09-20-2011, 07:08 AM
Thanks! I'm still looking for a sliding window implementation whereby i could search very large arrays for long patterns.

09-20-2011, 09:35 AM
Well, here would be a way to do it with strings. The 'find' variable can be any length. Will work with integers, but probably not with float values. It will also error if your sequence contains false matches, like [4,3,4,4,4,5], but you could introduce a delimiter to deal with that possbility if your 'find' string is not unique enough.

vals = INDGEN(100)
istrvals = STRTRIM(vals, 2)
strvals = STRJOIN(istrvals)
indicies = LONARR(N_ELEMENTS(vals))
FOR i=1L, N_ELEMENTS(vals)-1 DO indicies(i) = TOTAL(STRLEN(istrvals(0:i-1)))
find = [43, 44, 45]
nfind = N_ELEMENTS(find)
strfind = STRJOIN(STRTRIM(find, 2))
pos = STRPOS(strvals, strfind)
ids = WHEREIN(indicies, pos)
PM, ids
FOR i=0L, N_ELEMENTS(ids)-1 DO PM, vals(ids(i):ids(i)+nfind-1)

09-20-2011, 09:49 AM
Found a better way. I've always known TENSORs can be useful, and I finally found a use case. It's unbelievably compact. I wonder why nobody from VNI mentioned this? Are they reading the forums?

a = [LINDGEN(100, LINDGEN(100)]
b = [43, 44, 45]
c = TENSOR_EQ(a, b)
id = WHERE(c(*,0) EQ 1, cnt)
pm, id

09-21-2011, 12:54 AM
idx = wherein(a, [2,3,4])
print, idx
2 3 4 20 21 22

Hope this help!

09-22-2011, 09:47 AM
Thanks for the suggestions.

The tensor method can easily exceed memory limits as I was wont to discover, and execution time is long. And if the pattern occurs more than once the indices returned multiple times; i.e., 43,85,143,185. So there's a little manuipulation to be done.

The wherein method just returns the locations of the elements in the pattern. Not where the pattern exists.

So, after much thought I'm going with a loop method of finding all the occurrences of pat(0) and then SAMEing each location+pattern_length with the pattern itself. This method is very fast. On 10^6 arrays with a pattern of 5 elements it took 1.25 seconds to find all 13 patterns.

Thanks again!

09-23-2011, 11:09 AM
below is some code similar to what spinman suggested but probably faster;
instead of looping over the container array, it loops over those elements of
the container array which match the first element of the pattern;
the runtime for the parameters mentioned in the last post
is about 0.1 seconds on the slowest machine i could find;
just paste the entire code block at the wave prompt;

m = 1e6
n = 5
p = 13
a = long( m*randomu(s,m) ) ;random container
b = long( m*randomu(s,n) ) ;random pattern
l = long( m*randomu(s,p)-n ) ;random locations
for i=0l,p-1 do a(l(i)) = b ;insert patterns
t = systime(1) ;start time
r = lonarr(m) ;result storage
i = where( a(0:m-n) eq b(0), q ) ;possible locations
for j=0l,q-1 do begin & k = i(j) & r(k) = min( a(k:k+p-1) eq b ) & end
j = where(r)
print, systime(1)-t
print, l(sort(l))
print, j