"Search-and-Replace" In an Array

Question

How do you search for elements in an array that meet a certain test, and then replace or select those elements (like the where function in IDL and the find function in MATLAB)?

Answer

In terms of "bang-for-the-buck," the IDL where and MATLAB find functions are arguably the single most important functions available in those languages. Combined with the ability to easily extract any arbitrary list of elements, they let scientists write compact and readable data analysis code. For instance, let's say we have an array of one month of daily temperatures (T), and we want to extract all temperatures greater than 280 K and put them into a separate array (bigT). The IDL code to do this would be [1]:

bigpts = where(T gt 280.0)
bigT = T[bigpts]

To replace the elements of T specified by bigpts with some other value (say 5.0), in IDL just type: T[bigpts] = 5.0.

How can we do this in Python? The key is that, instead of creating an array that contains the indices of the array elements that meet our condition, we create a mask: an array of the same shape as the data whose values are 1 or 0, depending on whether the corresponding element in the original data array meets the condition [2].
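To make the difference between an index list and a mask concrete, here is a minimal plain-Python sketch that needs no array package. The sample values and the names temps, indices, and mask are made up for illustration and are not part of Numeric:

```python
# Hypothetical sample data: five temperatures in kelvin.
temps = [278.5, 281.2, 279.9, 283.0, 280.0]

# IDL/MATLAB style: a list of the positions that pass the test.
indices = [i for i, t in enumerate(temps) if t > 280.0]

# Mask style: one 1-or-0 flag per element, same length as the data.
mask = [1 if t > 280.0 else 0 for t in temps]

# Both describe the same selection of elements.
from_indices = [temps[i] for i in indices]
from_mask = [t for t, m in zip(temps, mask) if m]

print(indices)       # positions of the matching elements
print(mask)          # flags, one per element
print(from_mask)     # the matching elements themselves
```

The index list is only as long as the number of matches, while the mask always has the same length as the data.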

We create an array of 28 points of temperature data:

import Numeric

T = Numeric.sin(Numeric.arange(28, typecode=Numeric.Float32)) * 5.0 + 279.0

where T has an amplitude of 5 K, oscillating around a mean of 279 K.

There are a variety of ways to create masks in Numeric. For instance, to create a mask of the points greater than 280 K, you can use the Numeric.where command:

bigpts_mask = Numeric.where(T > 280.0, 1, 0)

or one of the element-wise logical and comparison functions:

bigpts_mask = Numeric.greater(T, 280.0)

To extract these elements into the array bigT, use the Numeric.compress function:

bigT = Numeric.compress(bigpts_mask, T)

To replace the values in the original data array T for which bigpts_mask is true (i.e. 1), use the Numeric.putmask command. For instance, to replace all the points where T is greater than 280 K with 5 K:

Numeric.putmask(T, bigpts_mask, 5.0)
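For readers without Numeric installed, the same three steps (build a mask, extract, replace) can be sketched with NumPy, Numeric's successor. This is an assumption on our part: the original tip targets Numeric only, and the sketch below uses NumPy's boolean indexing in place of Numeric.compress and Numeric.putmask:

```python
import numpy as np

# Same synthetic data as above: 28 points oscillating 5 K around 279 K.
T = np.sin(np.arange(28, dtype=np.float32)) * 5.0 + 279.0

# Boolean mask: True where the temperature exceeds 280 K.
bigpts_mask = T > 280.0

# IDL-style flat indices, if they are wanted (compare footnote [1]).
bigpts = np.flatnonzero(bigpts_mask)

# Extract the matching elements; np.compress(bigpts_mask, T) gives the same result.
bigT = T[bigpts_mask]

# Replace the matching elements in place; np.putmask(T, bigpts_mask, 5.0) has the same effect.
T[bigpts_mask] = 5.0
```

Note that bigT is a copy made before the replacement, so it still holds the original temperatures.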

Numeric.putmask can also replace array members element by element, repeating the replacement array if necessary. For instance:

data = Numeric.array([3, 5, -2, 4, 6, 1, -5, -8],
                     typecode=Numeric.Float32)
mask = [1,1,1,0,0,1,0,1]
newdata = [-999,-888]
Numeric.putmask(data, mask, newdata)

gives

>>> data
array([-999., -888., -999., 4., 6., -888., -5., -888.],'f')

Note that newdata is shorter than data; when doing the replacement based on mask, Numeric repeats newdata as many times as needed to make a replacement array of the same size as data, then applies this "virtual" replacement array at the places where mask is true.
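The repeat-then-apply behavior can be illustrated in plain Python. The helper putmask_sketch below is hypothetical and written only for this illustration; unlike Numeric.putmask, which modifies its first argument in place, it returns a new list:

```python
def putmask_sketch(data, mask, values):
    """Return a copy of data with masked positions replaced (illustrative helper)."""
    # Repeat values to build a "virtual" replacement list as long as data.
    virtual = [values[i % len(values)] for i in range(len(data))]
    # Take the virtual value where the mask is true, the original value elsewhere.
    return [v if m else d for d, m, v in zip(data, mask, virtual)]


data = [3.0, 5.0, -2.0, 4.0, 6.0, 1.0, -5.0, -8.0]
mask = [1, 1, 1, 0, 0, 1, 0, 1]
newdata = [-999.0, -888.0]

result = putmask_sketch(data, mask, newdata)
print(result)
# [-999.0, -888.0, -999.0, 4.0, 6.0, -888.0, -5.0, -888.0]
```

The replacement value depends on the element's position in data, not on how many masked elements came before it, which is why position 5 receives -888 rather than -999.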


Footnotes:

[1] bigpts is an array of the indices in T that satisfy the condition in the where call. In IDL the indices are converted to "equivalent" 1-D values, regardless of the dimension of T, so they uniquely refer to elements in T, as long as the dimensions of T do not change. Thus, in IDL bigT is a 1-D array (regardless of the dimension of T).

[2] Here, by "mask" we do not mean a "masked array," which is an array that has missing or invalid elements; masked arrays are accommodated by the MA and MV modules.

Notes: Thanks to Mike Steder for the help!

Updated: March 19, 2004 by Johnny Lin <email address>. License.