Friday, July 28, 2023

Distributions

 

Ain’t We Got Fun

There's nothing surer
The rich get rich and the poor get poorer
In the meantime, in between time
Don't we have fun?

But is it normal for the rich to get richer?

It is proposed that the Cumulative Distribution Function, CDF, of an exponential distribution with a rate parameter, λ, which is 1 − e^(−λx), can be approximated by a coordinate translation of the random normal logistic distribution, also known as the hyperbolic secant squared distribution, whose CDF is ½·tanh((x − µ)/(2s)) + ½. The translation moves the origin from (0, 0) to (λ, 0.5), and the logistic CDF is also scaled by 2. This means that the range parameter, s, of the logistic distribution can be approximated by s = 1/(2·λ·ln(2)). While the exponential distribution is traditionally defined only for x > 0, it can be translated to begin at any location, µ, if the exponential distribution is then defined for x > µ > 0.
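The two CDFs and the proposed relationship between s and λ can be written out as a short Python sketch. The function names are illustrative, and the "scaled by 2" step is read here as doubling the logistic CDF about its median so that it starts at 0, which is one interpretation of the translation described above and not necessarily the exact construction used for the figures.

```python
import math

def exponential_cdf(x, lam, mu=0.0):
    """CDF of an exponential distribution with rate lam, shifted to start at mu."""
    return 1.0 - math.exp(-lam * (x - mu)) if x > mu else 0.0

def logistic_cdf(x, mu, s):
    """CDF of the logistic distribution: 1/2 * tanh((x - mu) / (2 s)) + 1/2."""
    return 0.5 * math.tanh((x - mu) / (2.0 * s)) + 0.5

def scaled_logistic_cdf(x, mu, s):
    """Logistic CDF doubled about its median (assumed reading of 'scaled by 2').

    Algebraically this is tanh((x - mu) / (2 s)), which is 0 at x = mu.
    """
    return 2.0 * logistic_cdf(x, mu, s) - 1.0

def range_from_rate(lam):
    """Range parameter proposed in the text: s = 1 / (2 * lam * ln 2)."""
    return 1.0 / (2.0 * lam * math.log(2.0))

lam = 1.0 / math.log(2.0)   # example rate, roughly 1.44
s = range_from_rate(lam)    # roughly 0.5
mu = 0.0                    # example location
for x in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(x, round(exponential_cdf(x, lam, mu), 4),
          round(scaled_logistic_cdf(x, mu, s), 4))
```

Printing the two columns side by side gives a quick numerical feel for how closely the curves track each other above µ.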

Because the logistic function is already defined for all values of x, the exponential distribution, whose CDF is also known as the exponential association, can also be defined for all values of x, including x < µ, if its parameter s is a function of λ. This means that there is no need for a combination of the exponential distribution and a random normal function, either as an Exponentially Modified Gaussian distribution as proposed by Grushka [1], or as an Exponentially Modified Logistic distribution as proposed by Reyes [2].

The figure below shows the CDF of a logistic distribution (blue), which does not look like the CDF of the exponential distribution (red). Also shown, as a dashed red curve, is what the CDF of the exponential distribution would be for x < 0. Doubling the logistic function and shifting its origin along the y-axis from (0, 0) to (0, 0.5) gives a curve (green) that does look like the exponential distribution for x > 0.


As shown below, if the exponential distribution is shifted along the x-axis from an origin of (0, 0) to an origin of (µ, 0), so that both curves cross at µ, then the two curves look more similar for x > µ.


By setting the two curves equal at a common location, µ, it is possible to solve for s, the range parameter of the logistic distribution, in terms of λ, the rate parameter of the exponential distribution. This function is s = 1/(2·ln(2)·λ). If the variance is equal to 1.0, then the relationship between s and the variance, σ² = s²π²/3, can be used to compute that s = 0.55. At that value of s, the correlation between the two curves from µ to µ + 3σ is almost perfect at 0.9967. However, if the difference between the scaled logistic distribution above the median and the exponential distribution is minimized, the values become λ = 1/ln(2) = 1.44 and s = 0.5. The variance thus becomes 0.822, and the correlation between the exponential curve and the scaled and shifted logistic curve increases to 0.9982.
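A comparison of this kind can be sketched in Python as follows. The helper names, the evenly spaced grid, and the number of grid points are all assumptions made for illustration. The original calculation's sampling is not stated, so the correlation this sketch produces may differ somewhat from the figures quoted above.

```python
import math

def logistic_variance(s):
    """Variance of a logistic distribution: s^2 * pi^2 / 3."""
    return s * s * math.pi ** 2 / 3.0

def range_from_variance(variance):
    """Invert the variance formula to recover the range parameter s."""
    return math.sqrt(3.0 * variance) / math.pi

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def curve_correlation(lam, s, n_points=301):
    """Correlate the two curves on an even grid from mu to mu + 3 sigma.

    The grid is expressed as offsets x - mu, so mu itself drops out.
    n_points is an arbitrary choice, not a value taken from the text.
    """
    sigma = math.sqrt(logistic_variance(s))
    offsets = [3.0 * sigma * i / (n_points - 1) for i in range(n_points)]
    exponential = [1.0 - math.exp(-lam * d) for d in offsets]
    scaled_logistic = [math.tanh(d / (2.0 * s)) for d in offsets]
    return pearson(exponential, scaled_logistic)

s_unit = range_from_variance(1.0)              # about 0.55
lam_unit = 1.0 / (2.0 * math.log(2.0) * s_unit)
print(s_unit, curve_correlation(lam_unit, s_unit))

s_half = 0.5
lam_half = 1.0 / math.log(2.0)                 # about 1.44
print(logistic_variance(s_half), curve_correlation(lam_half, s_half))
```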


It is thus proposed that there is no need to develop a new distribution combining the exponential and a random normal distribution. The exponential distribution with a constraint of x > µ appears to be merely the upper half of a normal logistic distribution, the half beginning at the median. It is also suggested that the lowest variance for a normal distribution should be 0.822, the lowest standard deviation should be 0.9069, the corresponding range parameter, s, should be 0.5, and that the rate parameter of the exponential distribution is related to the difference between the mean and median of any distribution.
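For a textbook exponential distribution shifted to start at µ, the mean is µ + 1/λ and the median is µ + ln(2)/λ, so the gap between them is (1 − ln 2)/λ. The sketch below uses only that standard relationship to back out a rate from a mean and a median. It is an illustration under that assumption, not the fitting procedure used for the charts in this post, and the function names and sample values are hypothetical.

```python
import math

def rate_from_mean_median(mean, median):
    """Rate of a shifted exponential whose mean and median are given.

    Uses mean - median = (1 - ln 2) / lam, which holds for a textbook
    exponential distribution shifted to any starting location.
    """
    gap = mean - median
    if gap <= 0:
        raise ValueError("mean must exceed median for an exponential shape")
    return (1.0 - math.log(2.0)) / gap

def location_from_mean(mean, lam):
    """Starting location mu implied by the mean: mu = mean - 1 / lam."""
    return mean - 1.0 / lam

# Hypothetical values, chosen only to show the round trip.
true_lam, true_mu = 2.0, 1.0
mean = true_mu + 1.0 / true_lam
median = true_mu + math.log(2.0) / true_lam

lam = rate_from_mean_median(mean, median)
print(lam, location_from_mean(mean, lam))
```

If the mean and median coincide, the gap is zero and no finite rate satisfies the relationship, which is why the sketch raises an error in that case.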

Thus, if the mean household income in 2021 is $66,018 and the median household income is $58,153 according to the U.S. Census, and income follows an exponential distribution, the curve would be as shown below. The figure also shows the reported mean household income at the mid-point of each decile, as well as the reported mean income limit of the highest 5%. This suggests that the distribution will be a true exponential distribution only when zero represents an absolute value, e.g. the vector distance from an object, or an empty condition, where the mean and the median of the distribution are the same. Such a distribution is skewed by definition and is not normal. However, if the median and the mean are appreciably different, then the distribution may only appear to follow an exponential distribution. The distribution is in fact normal, and it appears to be a skewed exponential distribution because only the portion above the median is being used. Or as Garrison Keillor ironically puts it in his tales from Lake Wobegon, “All the children are above average.”


The chart above has been adjusted for inflation, i.e. all incomes are in 2021 US Dollars. Both the 1968 and the 2021 distributions have the same total income for society and vary only in how it is distributed to individual households. The chart suggests that the income distribution in 1968 was less skewed, and that if it were viewed as a normal distribution for all incomes, including subsidies and transfers, i.e. negative incomes, the lower income range would be from $0 to $100,645 instead of the current range of $0 to $163,547, and the income needed to be wealthy would be $301,934 instead of $490,642. The 1968 distribution was less normal, with a lower coefficient of determination, r², to the random distribution, but was more equitable, with a lower variance. The 2021 distribution was more normal but less equitable. The challenge is to distribute incomes in a manner that is both normal and equitable.

[1] Grushka, E. (1972). Characteristics of Exponentially Modified Gaussian Peaks in Chromatography. Analytical Chemistry Vol 44, pp. 1733-1738.

[2] Reyes, J., Venegas, O., & Gómez, H. W. (2018). Exponentially-Modified Logistic Distribution with Application to Mining and Nutrition Data. Applied Mathematics & Information Sciences Vol 12 Number 6, pp. 1109-1116.

 

 






