;---------------------------------------------------------------------------- ; \$Id: pcomp_k67.pro,v 1.2 2002/08/17 03:24:38 johnny Exp \$ ;+ ; NAME: ; PCOMP_K67 ; ; PURPOSE: ; Principal component analysis using the method of Kutzbach (1967). ; This method is based on finding the eigenvector and eigenvalues of ; the covariance matrix of the input data. ; ; CATEGORY: ; Statistics. ; ; CALLING SEQUENCE: ; Result = PCOMP_K67(in_data) ; ; INPUTS: ; in_data: Input data of N observations in time at M locations. ; Floating array dimensioned (M,N), i.e. each column ; is N observations at a single location. Equivalent ; to transpose of matrix F in Kutzbach (1967, p. 792). ; Unchanged by function. ; ; INPUT KEYWORD PARAMETERS: ; REMOVE_MEAN If set to true, the mean of each of the M timeseries ; is removed to give a zero mean for each M timeseries ; in the calculations in the function. The returned ; projection onto the eigenvectors will be the projec- ; tion onto this zeroed mean data. Unchanged by fun- ; ction. ; ; OUTPUT KEYWORD PARAMETERS: ; EIGENVALUES: Returns vector of eigenvalues (float dimensioned M). ; Note that in Kutzbach (1967), it seems that eigen- ; values are treated as a "vector of eigenvalues" and ; so this keyword outputs them as an IDL vector (i.e. ; a row) and not a column vector. Eigenvalues are ; given in descending order. Created by function. ; ; EIGENVECTORS: Returns array of eigenvectors where each row i is ; the eigenvector corresponding to eigenvalue i. ; Equivalent to transpose of matrix E in Kutzbach ; (1967, p. 792). Floating array dimensioned (M,M). ; Created by function. ; ; MEAN_LOC: Returns the mean over the timeseries for each M ; location. Floating vector dimensioned M. Created ; by function. ; ; VARIANCE: Returns fraction of total variance explained by k+1 ; largest eigenvalues, at the kth element. Thus, at ; element M-1, the total fraction explained is 1. ; Floating vector (dimensioned M). Created by function. ; ; OUTPUTS: ; Result: Each M col. is a principal component, and correspond ; to the M eigenvalues and eigenvectors. Or, for each ; [i,n] value, the coefficient is the projection of ; the nth observation vector (or the observation vector ; with mean removed, if Remove_Mean is set) on the ith ; eigenvector. Equivalent to transpose of matrix C in ; Kutzbach (1967, p. 792). Floating array dimensioned ; (M,N). Created by function. ; ; FILE DEVICE I/O: ; None. ; ; COMMON BLOCKS: ; None. ; ; EXAMPLE: ; None. ; ; MODIFICATION HISTORY: ; - 16 Aug 2002: Orig. ver. Johnny Lin, CIRES/University of Colorado. ; Email: air_jlin@yahoo.com. Passed passably adequate tests. ; ; NOTES: ; - Written for IDL 5.5. ; - Reference: Kutzbach, J. E. (1967), "Empirical eigenvectors of sea- ; level pressure, surface temperature and precipitation complexes over ; North America," J. Applied Meteorology, Vol. 6, pp. 791-802. ; - Note that Kutzbach (1967)'s method assumes that the mean of each of ; the M timeseries is zero. An error check is implemented to make sure ; M and N are both greater than or equal to 2. ; - All keyword parameters are optional unless otherwise stated. ; - No procedures called with _Extra keyword invoked. ; - No user-written procedures called. ;- ; Copyright (c) 2002 Johnny Lin. For licensing, distribution conditions, ; and contact information, see http://www.johnny-lin.com/lib.html. ;---------------------------------------------------------------------------- FUNCTION PCOMP_K67, in_data \$ , EIGENVALUES = out_eigenvalues \$ , EIGENVECTORS = out_eigenvectors \$ , REMOVE_MEAN = remove_mean \$ , MEAN_LOC = out_mean_loc \$ , VARIANCE = out_variance \$ , _EXTRA = extra ; -------------------- Error Check and Parameter Setting -------------------- COMPILE_OPT IDL2 ON_ERROR, 0 dims = SIZE(in_data) if (dims[0] ne 2) then MESSAGE, 'error--bad dimensions' M = dims[1] ;- set number of locations N = dims[2] ;- set number of observations in time if (MIN([m,n]) lt 2) then MESSAGE, 'error--too few pts.' F = TRANSPOSE(in_data) ;- protect input and put input data in col.-major ; format (each row is N observations at a single ; location). ; ------------------ Calculate and Remove Mean (optional) ------------------- mean_value = FLTARR(M) ;- calculate mean of each timeseries for j=0,M-1 do begin ; location mean_value[j] = MEAN(F[*,j]) endfor if (KEYWORD_SET(remove_mean) eq 1) then begin ;- remove mean PRINT, 'PCOMP_K67: Removing mean for each timeseries' ; from time- for j=0,M-1 do begin ; series F[*,j] = F[*,j] - mean_value[j] endfor endif ; ------------------------------- Calculations ------------------------------ ; ; To make debugging easier, all calculations are done assuming conventional ; linear algebra notation, i.e. column-major (indexing through all elements ; in a column first), vs. row-major which is what IDL uses, for the ; following variables from Kutzbach (1967): F, R, E, C. R = ( F ## TRANSPOSE(F) ) / FLOAT(N) ;- calc. covariance ;- calc. eigenvalues and eigenvectors, and make the eigenvector matrix E ; column-major (eigenvalues are given in descending order): eigenvalues = EIGENQL(R, Eigenvectors=eigenvectors) E = TRANSPOSE(eigenvectors) C = TRANSPOSE(E) ## F ;- calc. coeff. representing the projection of the ; observation vectors on the eigenvectors N_eig = N_ELEMENTS(eigenvalues) ;- calc. tot. variance explained of tot_var_k = FLTARR(N_eig) ; the (i+1)th largest eigenvalues tot_var = TOTAL(eigenvalues) for i=0,N_eig-1 do begin tot_var_k[i] = TOTAL(eigenvalues[0:i]) / tot_var endfor ; ------------------------------ Prepare Output ----------------------------- out_eigenvalues = eigenvalues out_eigenvectors = eigenvectors out_mean_loc = mean_value out_variance = tot_var_k Result = TRANSPOSE(TEMPORARY(C)) RETURN, Result END ;=== end of function === ; ========== end file ==========