How do you search for elements in an array that meet a certain test, and then replace or select those elements (like the where function in IDL and the find function in MATLAB)?
In terms of "bang for the buck," the IDL where and MATLAB find functions are arguably the single most important functions available in those languages. Combined with the ability to easily extract any arbitrary list of elements, they enable scientists to write compact and readable data analysis code in IDL and MATLAB.
For instance, let's say we have an array of one month of daily temperatures (T), and we want to extract all temperatures greater than 280 K and put them into a separate array (bigT). The IDL code to do this would be [1]:
bigpts = where(T gt 280.0)
bigT = T[bigpts]
To replace the elements of T specified by bigpts with some other value (say 5.0), in IDL just type: T[bigpts] = 5.0
How can we do this in Python? The key is that, instead of creating an array that contains the indices of the elements that meet our condition, we create a mask whose values are 1 or 0, depending on whether the corresponding element in the original data array meets the condition [2].
We create an array of 28 points of temperature data:
import Numeric
T = Numeric.sin(Numeric.arange(28, typecode=Numeric.Float32)) \
    * 5.0 + 279.0
where T has an amplitude of 5 K, oscillating around a mean of 279 K.
There are a variety of ways to create masks in Numeric. For instance, to create a mask of the points greater than 280 K, you can use the Numeric.where command:
bigpts_mask = Numeric.where(T > 280.0, 1, 0)
or element-wise logical and comparison functions:
bigpts_mask = Numeric.greater(T, 280.0)
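The two ways of building a mask shown above can be sketched with NumPy. Note the assumption: NumPy, the successor to Numeric, is used here in place of the Numeric module named in the text, and dtype= replaces Numeric's typecode= argument. The names mask_where, mask_greater, and mask_operator are chosen only for this illustration.

```python
import numpy as np

# Same synthetic data as in the text: 28 points oscillating around 279 K.
# Assumption: NumPy stands in for the Numeric module.
T = np.sin(np.arange(28, dtype=np.float32)) * 5.0 + 279.0

mask_where = np.where(T > 280.0, 1, 0)   # integer mask of 1s and 0s
mask_greater = np.greater(T, 280.0)      # boolean mask (True/False)
mask_operator = T > 280.0                # operator form of np.greater

# All three masks flag the same elements.
same = np.array_equal(mask_where, mask_greater.astype(int))
print(same, int(mask_where.sum()))
```

In NumPy the comparison forms return a boolean array rather than an array of 1s and 0s, but the two kinds of mask are interchangeable wherever a true/false test is applied.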
To extract these elements into the array bigT, use the Numeric.compress function:
bigT = Numeric.compress(bigpts_mask, T)
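As a minimal sketch of the extraction step, again assuming NumPy in place of Numeric: np.compress mirrors the call above, and indexing with a boolean mask is an alternative spelling that should give the same 1-D result. The names bigT_compress and bigT_indexed are illustrative only.

```python
import numpy as np

# Assumption: NumPy stands in for the Numeric module used in the text.
T = np.sin(np.arange(28, dtype=np.float32)) * 5.0 + 279.0
bigpts_mask = T > 280.0

bigT_compress = np.compress(bigpts_mask, T)  # mirrors Numeric.compress
bigT_indexed = T[bigpts_mask]                # boolean indexing

print(np.array_equal(bigT_compress, bigT_indexed))
```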
To replace the values in the original data array T for which bigpts_mask is true (i.e. 1), use the Numeric.putmask command. For instance, to replace all the points where T is greater than 280 K with 5 K:
Numeric.putmask(T, bigpts_mask, 5.0)
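The replacement step can be sketched the same way, under the assumption that NumPy is used instead of Numeric. The copy T_original is a hypothetical helper kept only to show that unmasked elements are left untouched.

```python
import numpy as np

# Assumption: NumPy stands in for the Numeric module used in the text.
T = np.sin(np.arange(28, dtype=np.float32)) * 5.0 + 279.0
bigpts_mask = T > 280.0
T_original = T.copy()  # kept only for comparison afterwards

# putmask modifies T in place; it does not return a new array.
np.putmask(T, bigpts_mask, 5.0)

# Masked elements are now 5.0; all other elements are unchanged.
print(bool(np.all(T[bigpts_mask] == 5.0)))
print(np.array_equal(T[~bigpts_mask], T_original[~bigpts_mask]))
```

With a scalar replacement value, the assignment T[bigpts_mask] = 5.0 is an equivalent NumPy spelling that reads much like the IDL original.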
Numeric.putmask can also be used to replace array members element by element, repeating the replacement array if necessary. For instance:
data = Numeric.array([3,5,-2,4,6,1,-5,-8],
                     typecode=Numeric.Float32)
mask = [1,1,1,0,0,1,0,1]
newdata = [-999,-888]
Numeric.putmask(data, mask, newdata)
gives
>>> data
array([-999., -888., -999., 4., 6., -888., -5., -888.],'f')
Note that newdata is shorter than data; when doing the replacement based on mask, Python repeats newdata as many times as needed to make a replacement array of the same size as data, then applies the "virtual" replacement array at places where mask is true.
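To make the "virtual" replacement array concrete, here is a short sketch that builds it explicitly and compares the result with putmask. It assumes NumPy rather than Numeric; the names virtual and expected are introduced only for this illustration.

```python
import numpy as np

# Assumption: NumPy stands in for the Numeric module used in the text.
data = np.array([3, 5, -2, 4, 6, 1, -5, -8], dtype=np.float32)
mask = np.array([1, 1, 1, 0, 0, 1, 0, 1], dtype=bool)
newdata = np.array([-999, -888], dtype=np.float32)

# Build the "virtual" array by repeating newdata to the length of data:
# [-999, -888, -999, -888, -999, -888, -999, -888]
virtual = np.resize(newdata, data.shape)

# Take the virtual value where mask is true, the original value elsewhere.
expected = np.where(mask, virtual, data)

# putmask performs the same replacement in place.
np.putmask(data, mask, newdata)
print(np.array_equal(data, expected))
```

The replacement value at each position depends on that position in data, not on how many masked elements came before it. This is why index 5 receives -888 rather than the "next" unused value. In NumPy, the assignment data[mask] = newdata is not a substitute here, since it expects one value per masked element and raises an error when the lengths differ.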
Footnotes:
[1] bigpts is an array of the indices in T that satisfy the condition in the where call. In IDL the indices are converted to "equivalent" 1-D values, regardless of the dimension of T, so they uniquely refer to elements in T, as long as the dimensions of T do not change. Thus, in IDL bigT is a 1-D array (regardless of the dimension of T).
[2] Here, by "mask" we don't mean a "masked array," which is an array that has missing or invalid elements; those are accommodated by the MA or MV modules.
Notes: Thanks to Mike Steder for the help!