Friday 11 January 2019

PCA CORRELATION: A BRIEF INTRODUCTION TO DIMENSION REDUCTION (A Brute Force Mean Variance Standardization)

DISCLAIMER: This blog's author is not a stakeholder of Chevron Texaco, nor is this article formal financial advice.

Have you ever wondered why covariance & correlation are so numerically different? Can they converge? In the following lines I will explain a crucial part of PRINCIPAL COMPONENT ANALYSIS, an early tool for dimension reduction and data noise removal. The tool is more than 100 years old, so by today's standards it is obsolete and comes with plenty of caveats.

The URL below links to a spreadsheet with prices for the CHEVRON-TEXACO share; its daily high and low prices are set against the TEXAS WTI reference price:

https://1drv.ms/x/s!ApxRazJ7xJyUf9ZPugnE4Jenbuc


Click on the CVX tab, where you will find plenty of raw data... What is the trick?

1. I computed the continuously compounded returns, their average and their standard deviation.

2. MEAN VARIANCE STANDARDIZATION of those returns, using this formula: Z = (x - x̄) / s, where x̄ is the average return and s is its standard deviation.

3. Once you get your Z statistics out of the returns, bear in mind that each standardized series has mean = 0 and standard deviation = 1, just like a standard normal distribution. Thus, COVARIANCE & CORRELATION take the same value for any 2 standardized data sets.
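Steps 1 to 3 can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the spreadsheet's own formulas: the function names are hypothetical, the prices are made-up placeholders rather than CVX data, and s is assumed to be the sample standard deviation (n - 1 in the denominator).

```python
import numpy as np

def log_returns(prices):
    """Step 1: continuously compounded returns, r_t = ln(P_t / P_{t-1})."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices))

def standardize(x):
    """Step 2: Z = (x - mean) / s, with s the sample standard deviation (ddof=1)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

# Hypothetical daily high prices, for illustration only (not the CVX spreadsheet data).
highs = [118.2, 119.0, 117.5, 120.3, 121.1, 119.8]

r = log_returns(highs)   # 6 prices -> 5 returns
z = standardize(r)

# Step 3: the standardized series has mean 0 and standard deviation 1 (up to rounding).
print(np.isclose(z.mean(), 0.0), np.isclose(z.std(ddof=1), 1.0))
```

Whatever the original price level, the standardized returns always end up with mean 0 and standard deviation 1; that property alone is what makes covariance and correlation coincide in the next step.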

Now, click on the COVARIANCE-CORRELATION tab; you will find the products of 2 data sets: high & low returns, high returns & Texas WTI price change, and low returns & Texas WTI price change.
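The same products can be reproduced outside the spreadsheet. The sketch below assumes three short, made-up return series (placeholders, not the spreadsheet values) and a hypothetical helper `cov_of_z`. It multiplies paired Z scores, sums them and divides by n - 1, then compares the result with the raw covariance and with NumPy's Pearson correlation.

```python
import numpy as np

def standardize(x):
    """Z = (x - mean) / s, with s the sample standard deviation (ddof=1)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

def cov_of_z(a, b):
    """Sum of products of paired Z scores divided by (n - 1): the sample covariance of Z."""
    za, zb = standardize(a), standardize(b)
    return float(np.sum(za * zb) / (len(za) - 1))

# Hypothetical daily returns, for illustration only (not taken from the spreadsheet).
high_r = np.array([0.012, -0.008, 0.021, 0.004, -0.015, 0.009])
low_r = np.array([0.010, -0.011, 0.018, 0.006, -0.012, 0.007])
wti_r = np.array([0.025, -0.030, 0.010, 0.015, -0.005, 0.020])

pairs = {
    "high & low": (high_r, low_r),
    "high & WTI": (high_r, wti_r),
    "low & WTI": (low_r, wti_r),
}
for name, (a, b) in pairs.items():
    raw_cov = np.cov(a, b)[0, 1]    # covariance of raw returns: depends on their scale
    corr = np.corrcoef(a, b)[0, 1]  # Pearson correlation: scale-free
    print(f"{name}: raw cov={raw_cov:.6f}  cov(Z)={cov_of_z(a, b):.4f}  corr={corr:.4f}")
```

The raw covariance is a tiny number because daily returns are small, while the covariance of the Z scores matches the correlation. This is the sense in which the two measures "converge" once the data are standardized.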


2 dimensions are now 1 for numerical purposes. For the particular case of CHEVRON TEXACO, there is a strong positive correlation between the highest and lowest intraday price returns, while the relationship of those returns with the changes in the Texas WTI reference price is only moderately positive...
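To connect this with PRINCIPAL COMPONENT ANALYSIS proper, here is a sketch of how two standardized series can be collapsed into a single dimension. It assumes the textbook approach of eigen-decomposing the correlation matrix; the helper names and the return series are hypothetical placeholders. For two standardized series with correlation r, the eigenvalues are 1 + r and 1 - r, so the first component keeps a share (1 + |r|) / 2 of the total variance.

```python
import numpy as np

def standardize(x):
    """Z = (x - mean) / s, with s the sample standard deviation (ddof=1)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

def first_component(a, b):
    """PCA of two series: returns (share of variance kept, loading vector, 1-D scores)."""
    Z = np.column_stack([standardize(a), standardize(b)])
    R = np.cov(Z, rowvar=False)           # equals the correlation matrix, since Z is standardized
    eigvals, eigvecs = np.linalg.eigh(R)  # eigh returns eigenvalues in ascending order
    w = eigvecs[:, -1]                    # loadings of the largest eigenvalue
    scores = Z @ w                        # the single new dimension: one number per day
    return eigvals[-1] / eigvals.sum(), w, scores

# Hypothetical high and low returns, for illustration only.
high_r = np.array([0.012, -0.008, 0.021, 0.004, -0.015, 0.009])
low_r = np.array([0.010, -0.011, 0.018, 0.006, -0.012, 0.007])

share, w, scores = first_component(high_r, low_r)
r = np.corrcoef(high_r, low_r)[0, 1]
print(f"r = {r:.4f}, variance kept by 1 dimension = {share:.4f}, loadings = {w}")
```

With a strong correlation such as high vs low returns, almost all of the variance survives in one dimension, and the loadings are (±1/√2, ±1/√2): the new axis is simply the average of both standardized series. A moderate correlation, like returns vs WTI changes, would keep noticeably less.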

The company is, perhaps, not hugely affected by changes in the price of CRUDE OIL thanks to investment diversification...

CAVEAT: By treating all of the standardized values from your data sets as a standard normal distribution, we are assuming PERFECT NORMALITY. Values falling far out in the tails (black swans) are either left out of the ensemble or assumed to be PERFECTLY NORMAL.
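One way to see the caveat: standardization only shifts and rescales the data, so it cannot remove fat tails. The sketch below uses simulated Student t returns (3 degrees of freedom, an assumption chosen only because it is fat-tailed; this is not market data) and a hypothetical `excess_kurtosis` helper. Excess kurtosis is identical before and after computing Z, and it stays far above the 0 of a normal distribution.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis: fourth central moment / variance^2 - 3 (0 for a normal)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.mean(d**4) / np.mean(d**2) ** 2 - 3.0)

rng = np.random.default_rng(42)
# Simulated fat-tailed "returns" (Student t, 3 degrees of freedom), NOT market data.
x = 0.01 * rng.standard_t(df=3, size=5000)
z = (x - x.mean()) / x.std(ddof=1)

print("excess kurtosis before:", round(excess_kurtosis(x), 3))
print("excess kurtosis after: ", round(excess_kurtosis(z), 3))  # unchanged: Z is only shifted and rescaled
print("share beyond |Z| > 3:", np.mean(np.abs(z) > 3), "vs about 0.0027 under normality")
```

If the share of |Z| > 3 observations in real return data is well above roughly 0.27%, the "perfect normality" assumption behind reading Z scores off a standard normal table does not hold.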

To solve the above problem, Variational Autoencoders from the field of Machine/Deep Learning are a more accurate way out. Mathematical Models are quite inferior to Data Models by today's technology standards.

Sources & Acknowledgements:  
Shingai Manjengwa from Fireside Analytics Inc.
Mikhail Lakirovich, Greg Filla, Armand Ruiz & Saeed Aghabozorgi from IBM


https://www.tastytrade.com/tt/learn/correlation