Saturday, October 8, 2022

Functions

 

Walkin' on the Sun

And it ain't no joke when our mama's handkerchief is soaked With her tears because her baby's life has been revoked The bond is broke up, so choke up and focus on the close up Mr. Wizard can't perform, no God-like hocus-pocus

Where is Mr. Wizard when you need him? Perhaps I can invoke his memory in this blog post.

A function that has a value of zero at zero and a value of 1 at infinity describes a whole family of functions. But if the inverse of that function should also have a value of 1 at zero and a value of zero at infinity, that narrows it down considerably. By that, an actual value of zero is meant, not a discontinuity that is merely being defined as zero. For example, the limit of f(x) = x as x approaches infinity is infinity, which means that the limit of 1/f(x) as x approaches infinity is zero; but this is a limit, NOT an actual value. At x = 0, 1/f(x) = 1/0 is a discontinuity: it is undefined, although it is sometimes simply defined as zero. A function with no such discontinuity will be a form of the exponential function. The integral of the exponential distribution, f(x) = λ*exp(−λ*x), which is its Cumulative Distribution Function (CDF), has a value of zero at zero and a value of 1 at infinity. That integral is an exponential association, 1 − exp(−λ*x).
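A minimal Python sketch of the exponential distribution and its CDF, the exponential association, using only the standard library. It checks that the CDF is exactly 0 at x = 0, that it approaches 1 far out on the x-axis, and that numerically integrating the density reproduces the closed form. Reading the "inverse" in the paragraph above as the complement exp(−λx), which is exactly 1 at x = 0, is an assumption of this sketch; the rate λ = 0.5 and all function names are hypothetical.

```python
import math


def exp_pdf(x: float, lam: float) -> float:
    """Exponential density, f(x) = lam * exp(-lam * x), defined for x >= 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def exp_cdf(x: float, lam: float) -> float:
    """Exponential association (the CDF), F(x) = 1 - exp(-lam * x), for x >= 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0


def trapezoid(f, a: float, b: float, n: int = 10_000) -> float:
    """Composite trapezoid rule, used here to integrate the density numerically."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h


lam = 0.5  # hypothetical rate parameter
print(exp_cdf(0.0, lam))        # 0.0: an actual value at x = 0, not a limit
print(1.0 - exp_cdf(0.0, lam))  # 1.0: the complement exp(-lam*x) at x = 0
print(exp_cdf(40.0, lam))       # within about 2e-9 of 1
# Integrating the density from 0 to 3 reproduces the closed-form CDF at 3
print(abs(trapezoid(lambda t: exp_pdf(t, lam), 0.0, 3.0) - exp_cdf(3.0, lam)) < 1e-6)  # True
```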

The exponential distribution does not allow values of x less than zero (and by that, ABSOLUTE ZERO is meant, not a relative zero that is merely a translation of absolute zero; e.g., Centigrade/Celsius is a relative temperature scale in which 0 degrees C is 273.15 kelvins on the absolute Kelvin scale). An exponential distribution is NOT normal: it has a skew of 2, compared to a normal distribution's skew of 0. One combination of an exponential and a Gaussian normal distribution that approaches normal is the exponentially modified Gaussian distribution. It is defined by the exponential rate parameter, λ, AND the normal distribution parameters: the standard deviation, σ, whose square is the variance, and the mean, µ.
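The claim that the exponentially modified Gaussian "approaches normal" can be made concrete with cumulants. Treat the distribution as the sum of an independent N(µ, σ²) variable and an exponential variable with rate λ. Cumulants of independent variables add, so the variance is σ² + 1/λ² and the third cumulant is 2/λ³, because the normal part contributes no third cumulant. The sketch below, with hypothetical function names, shows the skew falling from 2 (a pure exponential, σ → 0) toward 0 as σλ grows. It also runs a seeded Monte Carlo check that exponential draws have a skew near 2.

```python
import math
import random


def emg_skewness(sigma: float, lam: float) -> float:
    """Skewness of N(mu, sigma^2) + Exponential(rate lam); mu does not affect it.

    Variance (2nd cumulant) = sigma^2 + 1/lam^2; 3rd cumulant = 2/lam^3.
    """
    k2 = sigma ** 2 + 1.0 / lam ** 2
    k3 = 2.0 / lam ** 3
    return k3 / k2 ** 1.5


def sample_skewness(xs: list[float]) -> float:
    """Moment-based sample skewness, m3 / m2^1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# sigma -> 0 recovers the exponential's skew of 2; large sigma*lam tends toward 0
for sigma in (1e-6, 0.5, 1.0, 2.0, 10.0):
    print(sigma, round(emg_skewness(sigma, lam=1.0), 4))

# Seeded Monte Carlo check: exponential draws have a sample skew close to 2
rng = random.Random(42)
draws = [rng.expovariate(1.0) for _ in range(200_000)]
print(round(sample_skewness(draws), 2))
```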

Another distribution with zero skew, and therefore "normal" in the sense used in this post, is the Logistic distribution. That distribution is based on the probability of an individual member of a group, i.e., of a distribution, selecting an alternative. That probability is itself based on the mean, µ, of all the probabilities, as well as on the scale, s, the range over which that probability changes. The scale, s, is directly proportional to the standard deviation: σ = s·π/√3. Like the Gaussian distribution, the Logistic distribution has a mean equal to its median. If the exponential distribution's mean, 1/λ, is chosen to be equal to the logistic scale parameter, s (that is, λ = 1/s), and the exponential distribution is combined with the Logistic distribution, then the result is a form of the exponentially modified Logistic distribution proposed by Reyes (Reyes, Venegas, & Gómez, 2018). The exponentially modified Gaussian distribution and the exponentially modified Logistic distribution both still lack the desired feature of a value of zero at zero and a value of 1 at infinity, but they are a step in that direction.
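Adding a Logistic(µ, s) variable to an independent exponential variable with mean s (rate 1/s) and integrating gives a closed-form CDF, F(x) = 1 − exp((µ − x)/s)·ln(1 + exp((x − µ)/s)). The sketch below checks that closed form against a seeded simulation of the sum, and also prints the logistic σ = s·π/√3. Whether this matches the exact parameterization used by Reyes et al. is not verified here. The parameters µ = 2 and s = 0.8 and the function names are hypothetical.

```python
import math
import random


def eml_cdf(x: float, mu: float, s: float) -> float:
    """CDF of Logistic(mu, s) + Exponential(mean s), in closed form:

    F(x) = 1 - exp((mu - x) / s) * ln(1 + exp((x - mu) / s))
    """
    u = (x - mu) / s
    if u > 0:
        # ln(1 + e^u) = u + ln(1 + e^-u) avoids overflow for large u
        tail = math.exp(-u) * (u + math.log1p(math.exp(-u)))
    else:
        a = math.exp(u)
        if a == 0.0:  # far left tail: F(x) is effectively 0
            return 0.0
        tail = math.log1p(a) / a
    return 1.0 - tail


def logistic_sd(s: float) -> float:
    """Standard deviation of a logistic distribution with scale s."""
    return s * math.pi / math.sqrt(3.0)


mu, s = 2.0, 0.8  # hypothetical parameters for illustration
rng = random.Random(1)
n = 200_000
samples = []
for _ in range(n):
    p = rng.random()
    while p == 0.0:  # random() can return 0.0; avoid log(0)
        p = rng.random()
    logistic_draw = mu + s * math.log(p / (1.0 - p))  # inverse logistic CDF
    exponential_draw = rng.expovariate(1.0 / s)       # expovariate takes the rate, 1/s
    samples.append(logistic_draw + exponential_draw)

for x in (0.0, 1.0, 2.0, 3.0, 5.0):
    empirical = sum(v <= x for v in samples) / n
    print(x, round(empirical, 3), round(eml_cdf(x, mu, s), 3))
print(round(logistic_sd(s), 4))  # sigma = s * pi / sqrt(3)
```

The value at x = 0 is small but not zero for these parameters, which matches the remark that this distribution lacks an exact zero at zero.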

Figure 1 Cumulative Distribution Functions. 


The optimal desired function would also be one that is normal, i.e., has a skew close to zero as in the Gaussian or Logistic distribution, but that allows any value of x, including negative values, unlike the exponential distribution. The exponential distribution appears to be a continuation of the exponential function on the y-axis, restricted to x ≥ 0. The desirable function is thus most probably a combination of a normal distribution and the exponential distribution. One feature of a normal distribution is that its mean is equal to its median; another is that it follows the 68/95/99.7 rule; in other words,

68% of the values fall between the mean plus or minus 1 standard deviation, +/-σ;

95% of the values fall between the mean plus or minus 2 standard deviations, +/-2σ; and

99.7% of the values fall between the mean plus or minus 3 standard deviations, +/-3σ.
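The 68/95/99.7 rule can be checked directly. For a normal distribution, the share of values within µ ± kσ is erf(k/√2). This uses Python's standard `math.erf`; the helper name is hypothetical. The exact values are 68.27%, 95.45% and 99.73%. The last line prints the share between µ − 3σ and µ, which is half of the 3σ share.

```python
import math


def within_k_sigma(k: float) -> float:
    """Share of a normal distribution within mu +/- k*sigma, erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"+/-{k} sigma: {100 * within_k_sigma(k):.2f}%")
# +/-1 sigma: 68.27%
# +/-2 sigma: 95.45%
# +/-3 sigma: 99.73%
print(f"mu - 3 sigma to mu: {50 * within_k_sigma(3):.2f}%")
```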

Thus, if the skew is zero, 49.85% of the values (half of 99.7%), which is close to the 50% of the values that lie below the median, must fall between 3 standard deviations below the mean, μ − 3σ, and the mean, μ. For a quantity that can not be negative, μ − 3σ can not be less than zero, so the standard deviation, σ, the square root of the variance, can be at most the mean divided by 3, μ/3. Pearson's Second Coefficient of Skewness, 3(μ − median)/σ, never exceeds 3 in magnitude, because the mean and the median are never more than one standard deviation apart. Combining the two, μ − median ≤ σ ≤ μ/3, so the median is at least 2μ/3. Thus, to be normal, the mean, μ, can not be more than 1.5 times the median. The exponential distribution, even though it has a skew of 2 compared to a normal distribution's skew of zero, has a ratio of its mean to its median of 1/ln 2 ≈ 1.44. If the ratio of the mean to the median exceeds 1.5, the distribution is not only not normal, but its observations can not consist of a single distribution. Also, as the mean, μ, increases, the largest variance consistent with a normal distribution, σ² = (μ/3)², also increases. An exponentially modified Gaussian distribution has a range of valid variances that depends on the values of μ, σ, and λ. Within the valid ranges of the variance, the ratio of the mean to the median can not exceed 1.15. An exponentially modified Logistic distribution has a range that depends on the values of μ and s. Within the valid ranges, the ratio of the mean to the median can not exceed 1.5.
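The arithmetic in the paragraph above can be reproduced in a few lines. For an exponential distribution with rate λ, the mean and the standard deviation are both 1/λ and the median is ln 2/λ. So the mean-to-median ratio is 1/ln 2 for any λ, Pearson's second coefficient is 3(1 − ln 2), and μ − 3σ is negative. The last lines show the worst case behind the 1.5 bound, σ = μ/3 with the median exactly one standard deviation below the mean. The rate and the function names are hypothetical.

```python
import math


def exponential_summary(lam: float) -> dict[str, float]:
    """Mean, median and standard deviation of an exponential distribution with rate lam."""
    return {"mean": 1.0 / lam, "median": math.log(2.0) / lam, "sd": 1.0 / lam}


def pearson_second_skew(mean: float, median: float, sd: float) -> float:
    """Pearson's second coefficient of skewness, 3 * (mean - median) / sd."""
    return 3.0 * (mean - median) / sd


e = exponential_summary(lam=0.5)  # the ratios below do not depend on lam
print(round(e["mean"] / e["median"], 4))                          # 1/ln(2), about 1.4427
print(round(pearson_second_skew(e["mean"], e["median"], e["sd"]), 4))
print(e["mean"] - 3 * e["sd"] >= 0)                               # mu - 3*sigma is negative here

# Worst case of the 1.5 bound: sigma = mu/3 and median = mu - sigma
mu = 10.0
sigma = mu / 3
print(round(mu / (mu - sigma), 4))
```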

If the observations are expected to represent the distribution of a phase change, i.e., going from a cumulative probability of zero to a cumulative probability of 1, and the resulting equation is NOT required to be normal, then the regression equation should be expected to be the Cumulative Distribution Function of an exponential distribution, in other words, an exponential association (the black line in Figure 1). This requires regressing, i.e., solving, for only one parameter, λ, by fitting the data to

$$y = 1 - e^{-\lambda x}$$
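A minimal sketch of the one-parameter nonlinear regression. The post does not name a solver, so Gauss-Newton is assumed here. The starting value comes from the linearization ln(1 − y) = −λx, fitted through the origin. Only points well below 1 are used for that start, because ln(1 − y) is very sensitive to noise near y = 1. The "true" λ = 0.7 and the sine-wave "noise" are hypothetical and exist only to produce example data.

```python
import math


def fit_exponential_association(xs, ys, iterations=50):
    """Least-squares fit of y = 1 - exp(-lam * x) by one-parameter Gauss-Newton.

    Starting value: ln(1 - y) = -lam * x fitted through the origin, using only
    points with x > 0 and 0 < y < 0.95 (well away from y = 1).
    """
    pts = [(x, y) for x, y in zip(xs, ys) if x > 0 and 0 < y < 0.95]
    lam = -sum(x * math.log(1 - y) for x, y in pts) / sum(x * x for x, _ in pts)
    for _ in range(iterations):
        # residuals (data minus model) and the model's derivative with respect to lam
        r = [y - (1 - math.exp(-lam * x)) for x, y in zip(xs, ys)]
        j = [x * math.exp(-lam * x) for x in xs]
        step = sum(a * b for a, b in zip(j, r)) / sum(a * a for a in j)
        lam += step
        if abs(step) < 1e-12:
            break
    return lam


true_lam = 0.7  # hypothetical rate used only to generate illustrative data
xs = [0.5 * i for i in range(1, 21)]
ys = [1 - math.exp(-true_lam * x) + 0.01 * math.sin(3 * x) for x in xs]  # small deterministic "noise"
print(round(fit_exponential_association(xs, ys), 3))
```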

If the ideal equation is also expected to be normally distributed, i.e., to have a skew as close to zero as possible, then the cumulative Exponentially Modified Logistic Distribution should be used as the basis of the regression (the red line in Figure 1). This requires solving for two parameters, µ and s, where s = (√3/π)·σ, by fitting the data to

$$y = 1 - e^{(\mu - x)/s}\,\ln\left(e^{(x - \mu)/s} + 1\right)$$
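A minimal sketch of the two-parameter regression. It assumes the closed form F(x) = 1 − exp((µ − x)/s)·ln(exp((x − µ)/s) + 1). It also assumes a Gauss-Newton solver with a central-difference Jacobian and step halving, since the post does not name a solver. The starting value for µ uses the fact that F(µ) = 1 − ln 2 ≈ 0.307. The example data are generated from hypothetical parameters µ = 2 and s = 0.8, and all function names are hypothetical.

```python
import math


def eml_cdf(x, mu, s):
    """F(x) = 1 - exp((mu - x) / s) * ln(exp((x - mu) / s) + 1)."""
    u = (x - mu) / s
    if u > 0:
        tail = math.exp(-u) * (u + math.log1p(math.exp(-u)))  # overflow-safe form
    else:
        a = math.exp(u)
        if a == 0.0:
            return 0.0
        tail = math.log1p(a) / a
    return 1.0 - tail


def sse(mu, s, xs, ys):
    """Sum of squared residuals of the data against the model."""
    return sum((y - eml_cdf(x, mu, s)) ** 2 for x, y in zip(xs, ys))


def fit_eml(xs, ys, iterations=100):
    """Two-parameter Gauss-Newton fit of (mu, s) with step halving."""
    # Start: F(mu) = 1 - ln 2, so take the first x where y reaches that level
    mu = next((x for x, y in zip(xs, ys) if y >= 1 - math.log(2)), xs[len(xs) // 2])
    s = 1.0
    for _ in range(iterations):
        r = [y - eml_cdf(x, mu, s) for x, y in zip(xs, ys)]
        # central-difference Jacobian of the model with respect to mu and s
        hm, hs = 1e-6 * max(1.0, abs(mu)), 1e-6 * s
        jm = [(eml_cdf(x, mu + hm, s) - eml_cdf(x, mu - hm, s)) / (2 * hm) for x in xs]
        js = [(eml_cdf(x, mu, s + hs) - eml_cdf(x, mu, s - hs)) / (2 * hs) for x in xs]
        # 2x2 normal equations (J^T J) d = J^T r, solved by Cramer's rule
        a11 = sum(v * v for v in jm)
        a12 = sum(v * w for v, w in zip(jm, js))
        a22 = sum(w * w for w in js)
        b1 = sum(v * q for v, q in zip(jm, r))
        b2 = sum(w * q for w, q in zip(js, r))
        det = a11 * a22 - a12 * a12
        if det == 0.0:
            break
        d_mu = (b1 * a22 - a12 * b2) / det
        d_s = (a11 * b2 - a12 * b1) / det
        # halve the step until s stays positive and the fit does not get worse
        current, step = sse(mu, s, xs, ys), 1.0
        while step > 1e-10:
            new_mu, new_s = mu + step * d_mu, s + step * d_s
            if new_s > 0 and sse(new_mu, new_s, xs, ys) <= current:
                break
            step /= 2
        else:
            break  # no acceptable step: treat as converged
        mu, s = new_mu, new_s
        if abs(step * d_mu) < 1e-12 and abs(step * d_s) < 1e-12:
            break
    return mu, s


true_mu, true_s = 2.0, 0.8  # hypothetical parameters used only to make example data
xs = [0.25 * i for i in range(41)]  # x from 0 to 10
ys = [eml_cdf(x, true_mu, true_s) for x in xs]
mu_hat, s_hat = fit_eml(xs, ys)
sigma_hat = math.pi * s_hat / math.sqrt(3.0)  # sigma = s * pi / sqrt(3)
print(round(mu_hat, 4), round(s_hat, 4), round(sigma_hat, 4))
```

Step halving keeps every accepted step from increasing the squared error. It is a simple safeguard in place of a full Levenberg-Marquardt solver.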

The ideal equation will be somewhere between the black line and the red line. 

It is noted that both regressions are of non-linear equations, while linear regression is the more common technique. By applying a logarithmic transformation to the dependent data, y, the independent data, x, or both, linear regression can also be used to fit some non-linear functions.

· Linear-linear, where neither the y nor the x data are transformed, under linear regression solves for

$y = m x + b$; Linear

· Log-linear, where the independent data, x, are transformed to their natural logarithm and the dependent data, y, are not transformed, under linear regression solves for

$y = b + m \ln(x)$; Logarithmic

· Linear-log, where the independent data, x, are not transformed and the dependent data, y, are transformed to their natural logarithm, under linear regression solves for

$y = e^{b + m x}$; Exponential

· Log-log, where both the y and x data are transformed to their natural logarithms, under linear regression solves for

$y = e^{b} x^{m}$; Power
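The four bulleted forms can all be fitted with one ordinary-least-squares routine by transforming the data first. This is a minimal sketch with hypothetical function names and made-up, noise-free data, so each fit should recover its generating parameters. For the Exponential and Power forms, b is reported on the log scale, matching the e^b in the equations.

```python
import math


def ols(us: list[float], vs: list[float]) -> tuple[float, float]:
    """Ordinary least squares for v = b + m*u; returns (b, m)."""
    n = len(us)
    mean_u, mean_v = sum(us) / n, sum(vs) / n
    sxx = sum((u - mean_u) ** 2 for u in us)
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    m = sxy / sxx
    return mean_v - m * mean_u, m


def fit(kind: str, xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Fit one of the four bulleted forms: transform x and/or y with ln, then use OLS."""
    log_x = kind in ("logarithmic", "power")   # x -> ln(x); requires x > 0
    log_y = kind in ("exponential", "power")   # y -> ln(y); requires y > 0
    us = [math.log(x) for x in xs] if log_x else list(xs)
    vs = [math.log(y) for y in ys] if log_y else list(ys)
    return ols(us, vs)


xs = [0.5 * i for i in range(1, 21)]
examples = {
    "linear": [1.0 + 2.0 * x for x in xs],                # b = 1, m = 2
    "logarithmic": [1.0 + 2.0 * math.log(x) for x in xs], # b = 1, m = 2
    "exponential": [math.exp(0.2 + 0.4 * x) for x in xs], # b = 0.2, m = 0.4
    "power": [math.exp(0.2) * x ** 1.5 for x in xs],      # b = 0.2, m = 1.5
}
for kind, ys in examples.items():
    b, m = fit(kind, xs, ys)
    print(kind, round(b, 6), round(m, 6))
```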

While all but the first bullet are also non-linear equations, none of the bulleted equations is one of the non-linear equations shown in Figure 1. This does not mean that linear regression can not be used when nonlinear regression software is not available. It is only necessary to find the x observation at which the y data reach 99.7% of their maximum value, and to separate the data between 0 and that point into buckets of equal intervals of the independent variable, x. In electrical engineering, the voltage across the capacitor of a charging resistor-capacitor (RC) circuit follows an exponential association over time. It has also been shown that segregating the data into six buckets and doing a linear regression on each bucket can yield a set of linear regressions that approximate the nonlinear curve. The same segregation and use of linear regression can be applied to any exponential association. And, as noted above, the behavior of most groups, i.e., distributions, should be expected to be similar to that of an exponential association. If that behavior is also expected to follow a normal distribution, identifying the parameters takes a little more work, but it is not insurmountable.
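A minimal sketch of the bucketing approach described above. It finds the first x at which y reaches 99.7% of its maximum observed value, splits [0, that x] into six equal-width buckets, and fits a separate ordinary-least-squares line in each. The data imitate a charging RC circuit with a hypothetical time constant τ = 1, i.e., y = 1 − exp(−t/τ), sampled without noise. For comparison, the sketch also reports the worst error of a single straight line over the same range. All function names are hypothetical.

```python
import math


def ols(us, vs):
    """Ordinary least squares for v = b + m*u; returns (b, m)."""
    n = len(us)
    mean_u, mean_v = sum(us) / n, sum(vs) / n
    m = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs)) / sum(
        (u - mean_u) ** 2 for u in us)
    return mean_v - m * mean_u, m


def bucketed_linear_fit(xs, ys, buckets=6, fraction=0.997):
    """Fit one line per equal-width bucket of x between 0 and the cutoff.

    The cutoff is the first x at which y reaches `fraction` of the maximum observed y.
    """
    y_max = max(ys)
    cutoff = min(x for x, y in zip(xs, ys) if y >= fraction * y_max)
    width = cutoff / buckets
    fits = []
    for k in range(buckets):
        last = k == buckets - 1
        lo, hi = k * width, (cutoff if last else (k + 1) * width)
        pts = [(x, y) for x, y in zip(xs, ys) if lo <= x < hi or (last and x == hi)]
        b, m = ols([x for x, _ in pts], [y for _, y in pts])
        fits.append((lo, hi, b, m))
    return cutoff, fits


def predict(fits, x):
    """Evaluate the line of the bucket whose interval contains x."""
    for lo, hi, b, m in fits:
        if lo <= x <= hi:
            return b + m * x
    raise ValueError("x is outside the fitted range")


tau = 1.0  # hypothetical RC time constant, so y = 1 - exp(-t / tau)
xs = [0.05 * i for i in range(161)]  # t from 0 to 8 time constants
ys = [1 - math.exp(-x / tau) for x in xs]
cutoff, fits = bucketed_linear_fit(xs, ys)
inside = [(x, y) for x, y in zip(xs, ys) if x <= cutoff]
six_err = max(abs(predict(fits, x) - y) for x, y in inside)
b1, m1 = ols([x for x, _ in inside], [y for _, y in inside])  # a single line, for comparison
one_err = max(abs(b1 + m1 * x - y) for x, y in inside)
print(round(cutoff, 2), round(six_err, 3), round(one_err, 3))
```

For an exponential association that has levelled off, the 99.7% point falls near ln(1/0.003) ≈ 5.8 time constants. With noisy data the cutoff should be chosen with more care than taking the first point that crosses the threshold.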

References

Reyes, J., Venegas, O., & Gómez, H. W. (2018). Exponentially-modified logistic distribution with application to mining and nutrition data. Appl. Math., 12(6), 1109–1116.