;---------------------------------------------------------------------------- ; \$Id: ymdh_mean.pro,v 1.3 2002/09/14 01:10:49 johnny Exp johnny \$ ;+ ; NAME: ; YMDH_MEAN ; ; PURPOSE: ; Given a set of timeseries and a corresponding vector of Year/Month/Day/ ; Hour long integer (expressed as 4-digit year yyyymmddhh) giving the ; time coordinates of each element in the timeseries, calculates a ; "staggered" running mean (in contrast with SMOOTH which calculates a ; non-staggered running mean), e.g. a 24-hour average is computed every ; 24 hours. See example for a fuller explanation. ; ; CATEGORY: ; Statistics. ; ; CALLING SEQUENCE: ; Result = YMDH_MEAN(in_data, in_ymdh, in_increment) ; ; INPUTS: ; in_data: Array of data at in_ymdh times. Dimensioned (NTIME, ; NSERIES) where each row of the array is a separate ; timeseries. Array can contain NaN values. Variable ; unchanged by routine. ; ; in_ymdh: Time values of the points in in_data. Year/Month/Day/ ; Hour (yyyymmddhh) as a long integer (dimensioned NTIME). ; Timestep between elements must be regular (e.g. every ; hour, 6 hours, day, etc.). Variable unchanged by ; routine. ; ; in_increment: The number of timesteps to compute the average on, in ; units of the in_ymdh interval. Thus, if in_increment=2 ; and in_ymdh is every 6 hours, the mean returned is ; that over every 12 hours. Scalar integer. Variable ; unchanged by routine. ; ; INPUT KEYWORD PARAMETERS: ; MIN_NUM: Minimum number of non-NaN data points needed for the ; function to return a mean value. If fewer than this ; number in an interval exists, NaN is returned. Default ; is set to the maximum of [in_increment-1, 1]. Variable ; is unchanged by routine. ; ; OUTPUT KEYWORD PARAMETERS: ; YMDH_OUT: Year/Month/Day/Hour (yyyymmddhh) of each element in the ; output averaged timeseries. Set to the original time ; value of the beginning of the sub-range that was ; averaged for that element in Result. Long integer ; dimensioned NTOUT. Created by function. ; ; OUTPUTS: ; Result: Array of means, dimensioned (NTOUT, NSERIES). Each ; row is a separate averaged timeseries. Same type as ; in_data if in_data is floating or double, but will ; default to floating if in_data is not floating or ; double. Created by function. ; ; FILE DEVICE I/O: ; None. ; ; COMMON BLOCKS: ; None. ; ; EXAMPLE: ; If you have an hourly dataset data at ymdh times and issue this ; function call: ; ; Result = YMDH_MEAN(data, ymdh, 6, Ymdh_Out=ymdh_out) ; ; then Result will contain the mean of data over every 6 hour ; period, computed every 6 hour period starting with the first ; hour of data. The mean will be written at the first occurrence ; of the data and will be calculated when there are at least 5 data ; non-NaN points available. ; ; Thus, if the first 8 data points occur at: ; ; [ 1998010102, 1998010103, 1998010104, 1998010105, 1998010106 \$ ; , 1998010107, 1998010108, 1998010109 ... ] ; ; then the first element in Result will correspond to the first ; element in ymdh_out which will have the value 1998010102, and that ; element in Result will be the mean of the non-NaN data values at: ; ; [ 1998010102, 1998010103, 1998010104, 1998010105, 1998010106 \$ ; , 1998010107 ], ; ; MODIFICATION HISTORY: ; - 12 Sep 2002: Orig. ver. Johnny Lin, CIRES/University of Colorado. ; Email: air_jlin@yahoo.com. Passed moderately adequate tests. ; ; NOTES: ; - Written for IDL 5.5. ; - All keyword parameters are optional unless otherwise stated. ; - Note that the output times exist only when a full in_increment number ; of input timesteps are present. Thus, if NTIME=10 and in_increment=3, ; then NTOUT=3 and the last input timestep is discarded and left unused ; in averaging. ; - No procedures called with _Extra keyword invoked. ; - User-written procedures called: YMDH_TO_JULDAY. ;- ; Copyright (c) 2002 Johnny Lin. For licensing, distribution conditions, ; and contact information, see http://www.johnny-lin.com/lib.html. ;---------------------------------------------------------------------------- FUNCTION YMDH_MEAN, in_data, in_ymdh, in_increment \$ , MIN_NUM = in_min_num \$ , YMDH_OUT = out_ymdh_out \$ , _EXTRA = extra ; -------------------- Error Check and Parameter Setting -------------------- COMPILE_OPT IDL2 ON_ERROR, 0 data = in_data ;- protect input ymdh = in_ymdh increment = in_increment epsilon = (MACHAR()).eps ;- set floating-point precision variable ; for use in floating-point comparisons if (SIZE(data, /N_Dimensions) eq 1) then begin ;- set NTIME, NSERIES NTIME = N_ELEMENTS(data) ; dimensions NSERIES = 1 endif else begin sd_array = SIZE(data, /Dimensions) NTIME = sd_array[0] NSERIES = sd_array[1] endelse if (N_ELEMENTS(in_min_num) ne 0) then begin ;- set min_num parameter min_num = in_min_num endif else begin min_num = (increment-1) > 1 endelse type_data = SIZE(data, /Type) ;- set type of output if ((type_data ne 4) and (type_data ne 5)) then \$ ; data type_data = 4 if ((SIZE(data))[1] ne NTIME) then \$ ;- error check MESSAGE, 'error--bad data dims' if ( (SIZE(ymdh, /Type) ne 3) or \$ (SIZE(increment, /Type) ne 3) ) then \$ MESSAGE, 'error--these inputs must be long' if (NTIME lt increment) then \$ MESSAGE, 'error--NTIME must be ge increment' if (min_num gt increment) then \$ MESSAGE, 'error--min_num must be le increment' ; -------------- Make Sure Input YMDH Is In Regular Timesteps --------------- jday_in = YMDH_TO_JULDAY(ymdh) time_incr = (jday_in - SHIFT(jday_in,1))[1:*] tmp = WHERE(ABS(time_incr - MEAN(time_incr)) ge 2.d0*epsilon, count) if (count ne 0) then MESSAGE, 'error--data timestep changes in timeseries' time_incr = 0 ;- deallocate variables to save memory jday_in = 0 ; -------------------------------- Set NTOUT -------------------------------- ; ; NTOUT is a long integer and is the number of elements in the output ; averaged timeseries. Since it is calculated using integer division, ; it is the same as FLOOR(FLOAT(NTIME)/FLOAT(increment)). NTOUT = NTIME/increment ; ----- Set Arrays of Beginning and Ending Indices For Averaging Regions ---- ; ; For each NTOUT point, beg_pt is the beginning (inclusive) index of the ; original timeseries to take the average over, and end_pt is the ending ; (inclusive) index to take the average over. beg_pt = LINDGEN(NTOUT+1) * increment end_pt = SHIFT(beg_pt, -1) - 1 beg_pt = beg_pt[0:NTOUT-1] end_pt = end_pt[0:NTOUT-1] if (MAX([beg_pt,end_pt]) ge NTIME) then \$ MESSAGE, 'error--bad indices' ; ------------- Take Means and Set Times Those Means "Occur" At ------------- ; ; Variable means is the means and ymdh_out is when those means occur at, ; which is set to the beginning of the range over which the means were ; calculated. means = MAKE_ARRAY(NTOUT, NSERIES, Type=type_data, Value=!VALUES.F_NAN) min_num_test = FLOAT(min_num)-0.01 ;- floating value for integral value ; ge min_num test for iser=0,NSERIES-1 do begin for it=0,NTOUT-1 do begin sub_data = data[beg_pt[it]:end_pt[it], iser] num_good = TOTAL(FINITE(sub_data)) if (num_good ge min_num_test) then begin means[it, iser] = MEAN(sub_data, /Nan) endif endfor endfor ymdh_out = ymdh[beg_pt] ; ----------------------------- Prepare Output ------------------------------ out_ymdh_out = TEMPORARY(ymdh_out) ;- return output times (optional) Result = TEMPORARY(means) ;- return means RETURN, Result END ;=== end of function === ; ========== end file ==========