Walkin' on the Sun
And it ain't no joke
when our mama's handkerchief is soaked
With her tears because her baby's life has been revoked
The bond is broke up, so choke up and focus on the close up
Mr. Wizard can't perform, no God-like hocus-pocus
Where is Mr. Wizard when you need him? Perhaps I can invoke his memory in this blog post.
A function that has a value of zero at zero and a value of 1 at infinity describes a whole family of functions. But if the inverse of that function, i.e., its complement 1 − f(x), should also have a value of 1 at zero and a value of zero at infinity, then that narrows it down considerably. And by that, it is meant an actual value of zero, not a discontinuity that is merely being defined as zero. For example, the limit of f(x) = x as x approaches infinity is infinity, which means that the limit of 1/f(x) as x approaches infinity is zero, but this is a limit, NOT an actual value. Similarly, at x = 0 there is a discontinuity in 1/f(x), i.e., 1/0, which is often simply defined as zero. The function without such a discontinuity will be a form of an exponential function. The integral of the exponential distribution's probability density function, f(x) = λ·exp(−λx), i.e., its Cumulative Distribution Function, has a value of zero at zero and a value of 1 at infinity. That integral is an exponential association, 1 − exp(−λx).
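The two boundary properties can be checked directly. This is a minimal sketch: the function names and the rate λ = 0.5 are illustrative choices, not values from any data. It evaluates the exponential association and its complement, showing an actual value of 0 (and 1 for the complement) at x = 0 rather than a limit, while the approach to 1 at large x is only asymptotic.

```python
import math


def exp_cdf(x: float, lam: float) -> float:
    """CDF of the exponential distribution, i.e. the exponential association 1 - exp(-lam*x)."""
    if x < 0:
        raise ValueError("the exponential distribution does not allow x < 0")
    return 1.0 - math.exp(-lam * x)


def exp_complement(x: float, lam: float) -> float:
    """Complement of the CDF, 1 - F(x) = exp(-lam*x): exactly 1 at x = 0, tending to 0."""
    if x < 0:
        raise ValueError("the exponential distribution does not allow x < 0")
    return math.exp(-lam * x)


lam = 0.5  # illustrative rate parameter, not taken from any data
for x in (0.0, 1.0, 5.0, 50.0):
    print(x, exp_cdf(x, lam), exp_complement(x, lam))
```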
The exponential distribution does not allow values of x less than zero (and by that it is meant ABSOLUTE ZERO, not a relative zero, which is merely a translation from absolute zero; e.g., Celsius (centigrade) is a relative temperature scale on which 0 °C is 273.15 K on the absolute Kelvin scale). An exponential distribution is NOT normal: it has a skew of 2, compared to a normal distribution's skew of 0. One combination of an exponential and a Gaussian normal distribution that approaches normal is the exponentially modified Gaussian distribution. It is defined by the exponential rate parameter, λ, AND the normal distribution parameters: the mean, µ, and the standard deviation, σ, where the variance is the square of σ.
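How close the exponentially modified Gaussian gets to normal can be sketched through its skew. This assumes the standard construction of the distribution as the sum of an independent normal N(µ, σ²) variable and an exponential variable with rate λ, so means, variances, and third central moments simply add; `emg_moments` and the parameter values are hypothetical, for illustration only. As σ shrinks the skew tends to the exponential's 2, and as σλ grows it tends to the normal's 0.

```python
def emg_moments(mu, sigma, lam):
    """Mean, variance and skewness of an exponentially modified Gaussian.

    Assumes the EMG is the sum of an independent normal N(mu, sigma^2) variable
    and an exponential variable with rate lam, so means, variances and third
    central moments add.
    """
    mean = mu + 1.0 / lam
    var = sigma ** 2 + 1.0 / lam ** 2
    third = 2.0 / lam ** 3  # the normal part has a third central moment of 0
    skew = third / var ** 1.5
    return mean, var, skew


# Illustrative parameters: hold lam = 1 and widen the Gaussian part.
for sigma in (1e-6, 0.5, 1.0, 2.0, 5.0):
    _, _, skew = emg_moments(mu=0.0, sigma=sigma, lam=1.0)
    print(f"sigma={sigma:g}  skew={skew:.4f}")
```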
Another symmetric distribution with a skew of zero, like the Gaussian, is the logistic distribution. That distribution is based on the probability of an individual member of a group (a distribution) selecting an alternative. That probability depends on the mean, µ, and on the scale, s, the range over which the probability changes. The scale, s, is directly proportional to the standard deviation, σ (σ = sπ/√3). Like the normal distribution, the logistic distribution has a mean equal to its median. If the mean of the exponential distribution, 1/λ, is chosen to be equal to the logistic scale parameter, s (i.e., λ = 1/s), and the exponential distribution is combined with the logistic distribution, then the result is as proposed by Reyes et al. (2018).
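Where the closed-form CDF of this combination comes from can be shown with a short derivation. It is a sketch assuming X is logistic with location µ and scale s, Y is an independent exponential variable with rate λ = 1/s (so its mean, 1/λ, equals s), and Z = X + Y; the standardized variable z = (x − µ)/s and the density name f_L are notation introduced here. Because means add, the mean of Z is µ + s.

```latex
% Z = X + Y, X ~ Logistic(mu, s), Y ~ Exponential(rate lambda = 1/s), independent.
% Standardize with z = (x - mu)/s so that lambda * s = 1.
\begin{aligned}
F_Z(z) &= \int_{-\infty}^{z} \left(1 - e^{-(z-t)}\right) f_L(t)\,dt,
  \qquad f_L(t) = \frac{e^{t}}{(1+e^{t})^{2}} \\
&= F_L(z) - e^{-z}\int_{0}^{e^{z}} \frac{u}{(1+u)^{2}}\,du
  \qquad (u = e^{t}) \\
&= \frac{e^{z}}{1+e^{z}} - e^{-z}\left[\ln\left(1+e^{z}\right) + \frac{1}{1+e^{z}} - 1\right] \\
&= 1 - e^{-z}\ln\left(1+e^{z}\right)
\end{aligned}
```

Substituting z = (x − µ)/s back gives F(x) = 1 − e^{(µ−x)/s} ln(1 + e^{(x−µ)/s}), which tends to 0 as x → −∞ and to 1 as x → ∞.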
Figure 1. Cumulative Distribution Functions.
The optimal desired function would also be normal, i.e., have a skew close to zero as the Gaussian and logistic distributions do, but would allow any value of x, including negative values, unlike the exponential distribution, which appears to be a continuation of the exponential function on the y-axis restricted to x ≥ 0. The desired function is thus most probably a combination of a normal distribution and the exponential distribution. While one feature of a normal distribution is that its mean is equal to its median, another feature of the Gaussian normal distribution is that it follows the 68/95/99.7 rule; in other words,
68% of the values fall within the mean plus or minus 1 standard deviation, μ ± σ;
95% of the values fall within the mean plus or minus 2 standard deviations, μ ± 2σ; and
99.7% of the values fall within the mean plus or minus 3 standard deviations, μ ± 3σ.
Thus, if the skew is zero, 49.85% of the values (half of 99.7%, which is close to the 50% of the values that lie below the median) must fall between the mean, μ, and 3 standard deviations below it, μ − 3σ. If no value can be less than absolute zero, then μ − 3σ can not be less than 0, so the standard deviation, σ, the square root of the variance, can be at most the mean, μ, divided by 3. Pearson's Second Coefficient of Skewness, 3(μ − median)/σ, is zero when the skew is zero and can never exceed 3, because the mean and the median can never be more than one standard deviation apart. With σ at most μ/3, the mean can therefore exceed the median by at most μ/3, so the mean, μ, can not be more than 1.5 times the median. The exponential distribution, even though it has a skew of 2 compared to a normal distribution's skew of zero, has a ratio of its mean to its median of only 1/ln 2 ≈ 1.44. If the ratio of the mean to the median exceeds 1.5, the distribution is not only not normal, its observations can not consist of a single distribution.
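The arithmetic in this argument can be reproduced in a few lines. This is a sketch: `normal_within` and `max_mean_to_median` are hypothetical helper names, the 68/95/99.7 fractions come from the Gaussian error function, and the 1.5 bound assumes μ − 3σ ≥ 0 together with the fact that the mean and median are at most one σ apart.

```python
import math


def normal_within(k):
    """Fraction of a Gaussian normal distribution lying within mu +/- k*sigma."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(k, round(normal_within(k), 4))  # 0.6827, 0.9545, 0.9973

# Half of the 99.7% band lies between mu - 3*sigma and mu.
half_band_pct = 99.7 / 2  # 49.85


def max_mean_to_median(mu):
    """Upper bound on mean/median when mu - 3*sigma >= 0 (no values below absolute zero).

    sigma <= mu / 3, and the mean can exceed the median by at most one sigma,
    so median >= mu - mu / 3 = 2 * mu / 3.
    """
    sigma_max = mu / 3.0
    return mu / (mu - sigma_max)


# Exponential distribution: mean 1/lam, median ln(2)/lam, so the ratio is 1/ln 2.
exp_ratio = 1.0 / math.log(2.0)
print(max_mean_to_median(10.0), round(exp_ratio, 4))
```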
Also, as the mean, μ, increases, the variance, σ², must also increase for the distribution to remain normal. An exponentially modified Gaussian distribution has a range of valid variances depending on the values of μ, σ, and λ. Within the valid ranges of the variance, the ratio of the mean to the median can not exceed 1.15. An exponentially modified logistic distribution has a range depending on the values of μ and s. Within the valid ranges, the ratio of the mean to the median can not exceed 1.5.
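The mean-to-median ratio of the exponentially modified logistic distribution can be computed numerically for any candidate parameters. This is a sketch assuming the λ = 1/s form, whose mean is µ + s (logistic mean plus exponential mean) and whose CDF is F(x) = 1 − e^{(µ−x)/s} ln(1 + e^{(x−µ)/s}); the median is found by bisection, and the function names and parameter pairs are illustrative only. It does not attempt to establish the valid ranges described above.

```python
import math


def eml_cdf(x, mu, s):
    """CDF of the exponentially modified logistic with exponential mean 1/lam = s:
    F(x) = 1 - exp(-t) * ln(1 + exp(t)), t = (x - mu) / s."""
    t = (x - mu) / s
    if t < -30.0:
        # For very negative t, exp(-t) * ln(1 + exp(t)) ~ 1 - exp(t)/2.
        return math.exp(t) / 2.0
    softplus = t + math.log1p(math.exp(-t)) if t > 30.0 else math.log1p(math.exp(t))
    return 1.0 - math.exp(-t) * softplus


def eml_median(mu, s, iters=200):
    """Median by bisection on the monotone CDF."""
    lo, hi = mu - 50.0 * s, mu + 50.0 * s
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if eml_cdf(mid, mu, s) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def eml_mean(mu, s):
    # Logistic mean mu plus exponential mean s.
    return mu + s


# Illustrative (hypothetical) parameter values.
for mu, s in ((0.0, 1.0), (5.0, 1.0), (5.0, 2.0)):
    med = eml_median(mu, s)
    print(mu, s, round(eml_mean(mu, s), 4), round(med, 4), round(eml_mean(mu, s) / med, 4))
```

Because µ and s are location and scale, the median is always µ plus a fixed multiple (about 0.92) of s, so the ratio depends only on µ/s.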
If the observations are expected to represent the distribution of a phase change, i.e., going from a cumulative probability of zero to a cumulative probability of 1, and the resulting equation is NOT required to be normal, then the regression equation should be expected to be the Cumulative Distribution Function of an exponential distribution, in other words, an exponential association (i.e., the black line in Figure 1). This requires regressing, i.e., solving, for only one parameter. That is, solving for the parameter λ by regressing the data
to fit

$$y = 1 - e^{-\lambda x}$$
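A one-parameter nonlinear regression of this form can be done without special software. The sketch below is one possible approach, not the author's procedure: it minimizes the sum of squared errors over λ with a golden-section search, assuming the error has a single minimum in the search interval. The data are synthetic and noise-free (generated with λ = 0.5) purely for illustration.

```python
import math


def sse(lam, xs, ys):
    """Sum of squared errors of the exponential association 1 - exp(-lam*x)."""
    return sum((y - (1.0 - math.exp(-lam * x))) ** 2 for x, y in zip(xs, ys))


def fit_lambda(xs, ys, lo=1e-6, hi=100.0, tol=1e-10):
    """One-parameter nonlinear least squares by golden-section search.

    Assumes the SSE has a single minimum for lam in [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = sse(c, xs, ys), sse(d, xs, ys)
    while b - a > tol:
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = sse(c, xs, ys)
        else:  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = sse(d, xs, ys)
    return (a + b) / 2.0


# Synthetic, noise-free data generated with lam = 0.5 (illustration only).
xs = [0.5 * i for i in range(1, 25)]
ys = [1.0 - math.exp(-0.5 * x) for x in xs]
print(fit_lambda(xs, ys))
```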
If the ideal equation is also expected to be normally distributed, i.e., to have a skew as close to zero as possible, then the cumulative exponentially modified logistic distribution should be used as the basis of the regression (i.e., the red line in Figure 1).
This will require solving for two parameters, µ and s, where s = √3σ/π, by regressing the data to fit

$$y = 1 - e^{(\mu - x)/s}\,\ln\left(e^{(x - \mu)/s} + 1\right)$$
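The two-parameter regression can be sketched the same way. This assumes noise-free synthetic data (generated with µ = 2, s = 1.5) and uses a hand-rolled Gauss-Newton solver with step halving and forward-difference derivatives, chosen only to keep the example dependency-free; a library routine such as SciPy's `curve_fit` would be the usual choice. The model is F(x) = 1 − e^{(µ−x)/s} ln(e^{(x−µ)/s} + 1), and the fitted s is converted back to σ with σ = sπ/√3. All function names are hypothetical.

```python
import math


def eml_cdf(x, mu, s):
    """F(x) = 1 - exp((mu - x)/s) * ln(exp((x - mu)/s) + 1), computed stably."""
    t = (x - mu) / s
    if t < -30.0:
        return math.exp(t) / 2.0  # asymptotic form for very negative t
    softplus = t + math.log1p(math.exp(-t)) if t > 30.0 else math.log1p(math.exp(t))
    return 1.0 - math.exp(-t) * softplus


def sse(mu, s, xs, ys):
    return sum((y - eml_cdf(x, mu, s)) ** 2 for x, y in zip(xs, ys))


def fit_eml(xs, ys, mu, s, iters=100, h=1e-6):
    """Gauss-Newton with step halving for the two parameters mu and s.

    Uses forward-difference derivatives; assumes a reasonable starting guess."""
    cur = sse(mu, s, xs, ys)
    for _ in range(iters):
        a11 = a12 = a22 = g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            f = eml_cdf(x, mu, s)
            r = y - f
            j_mu = (eml_cdf(x, mu + h, s) - f) / h
            j_s = (eml_cdf(x, mu, s + h) - f) / h
            a11 += j_mu * j_mu
            a12 += j_mu * j_s
            a22 += j_s * j_s
            g1 += j_mu * r
            g2 += j_s * r
        det = a11 * a22 - a12 * a12
        if det == 0.0:
            break
        # Solve the 2x2 normal equations (J^T J) step = J^T r.
        d_mu = (a22 * g1 - a12 * g2) / det
        d_s = (a11 * g2 - a12 * g1) / det
        t = 1.0
        while t > 1e-10:
            new_mu, new_s = mu + t * d_mu, s + t * d_s
            if new_s > 0.0:
                new = sse(new_mu, new_s, xs, ys)
                if new < cur:
                    break
            t /= 2.0
        else:
            break  # no step reduces the SSE: treat as converged
        mu, s, cur = new_mu, new_s, new
    return mu, s


# Synthetic, noise-free data generated with mu = 2, s = 1.5 (illustration only).
xs = [-4.0 + 0.25 * i for i in range(73)]
ys = [eml_cdf(x, 2.0, 1.5) for x in xs]
mu_hat, s_hat = fit_eml(xs, ys, mu=1.0, s=1.0)
sigma_hat = s_hat * math.pi / math.sqrt(3.0)  # since s = sqrt(3) * sigma / pi
print(mu_hat, s_hat, sigma_hat)
```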
The ideal equation will be somewhere between the black
line and the red line.
It is noted that both regressions are of non-linear equations. Linear regression is a more common technique, and through a logarithmic transformation of the dependent data, y, the independent data, x, or both, it can also be used to fit some non-linear functions:
· Linear-linear, where neither the y nor the x data are transformed; linear regression solves for $y = m x + b$ (Linear).
· Log-linear, where the independent data, x, are transformed to their natural logarithm and the dependent data, y, are not transformed; linear regression solves for $y = b + m\ln(x)$ (Logarithmic).
· Linear-log, where the independent data, x, are not transformed and the dependent data, y, are transformed to their natural logarithm; linear regression solves for $y = e^{b + m x}$ (Exponential).
· Log-log, where both the y and x data are transformed to their natural logarithm; linear regression solves for $y = e^{b} x^{m}$ (Power).
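The four transformations above reduce to one ordinary least-squares routine applied to transformed data. This is a minimal sketch: `ols` and `fit_transformed` are hypothetical helper names, and the data are synthetic and noise-free so the recovered m and b can be checked against the values used to generate them.

```python
import math


def ols(xs, ys):
    """Ordinary least squares for y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, mean_y - m * mean_x


def fit_transformed(xs, ys, log_x=False, log_y=False):
    """Linear regression after optional natural-log transforms of x and/or y.

    log_x only      -> y = b + m*ln(x)      (Logarithmic)
    log_y only      -> y = exp(b + m*x)     (Exponential)
    log_x and log_y -> y = exp(b) * x**m    (Power)
    """
    tx = [math.log(x) for x in xs] if log_x else list(xs)
    ty = [math.log(y) for y in ys] if log_y else list(ys)
    return ols(tx, ty)


# Synthetic, noise-free data (illustration only).
xs = [float(i) for i in range(1, 11)]
m, b = fit_transformed(xs, [math.exp(0.3 + 0.2 * x) for x in xs], log_y=True)
print("exponential:", m, b)
m, b = fit_transformed(xs, [math.exp(0.5) * x ** 1.7 for x in xs], log_x=True, log_y=True)
print("power:", m, b)
```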
While all but the first bulleted equation are non-linear, none of them is one of the non-linear equations shown in Figure 1. This does not mean that linear regression can not be used if nonlinear regression software is not available. It is only necessary to find the x observation at which the y data reach 99.7% of the maximum value, and to separate the data between 0 and that point into buckets of equal intervals of the independent variable, x. In electrical engineering, the voltage across the capacitor of a charging resistor-capacitor (RC) circuit follows an exponential association over time. It has also been shown that segregating such data into six buckets and doing a linear regression on each bucket can yield linear regressions that approximate the nonlinear curve. The same segregation and use of linear regression can be applied to any exponential association. And, as noted above, most group (distribution) behavior should be expected to be similar to that of an exponential association. If that behavior is also expected to follow a normal distribution, identifying the parameters takes a little more work, but it is not insurmountable.
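The bucketing procedure can be sketched as follows, assuming equal-width buckets and an ordinary least-squares line per bucket; the RC values (R = 1 kΩ, C = 1 mF, so λ = 1/RC = 1 per second) and the helper names are hypothetical. Since 1 − e^{−λx} = 0.997 at x = −ln(0.003)/λ ≈ 5.81/λ, six buckets each span close to one RC time constant, and the fitted slopes fall from bucket to bucket as the curve flattens.

```python
import math


def ols(xs, ys):
    """Ordinary least squares for y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, mean_y - m * mean_x


def x_at_fraction(lam, frac=0.997):
    """x at which 1 - exp(-lam*x) reaches frac of its maximum value of 1."""
    return -math.log(1.0 - frac) / lam


def bucket_fits(lam, n_buckets=6, pts_per_bucket=20):
    """Split [0, x_at_fraction(lam)] into equal-width buckets and fit a line to each."""
    x_end = x_at_fraction(lam)
    width = x_end / n_buckets
    fits = []
    for k in range(n_buckets):
        lo = k * width
        xs = [lo + width * i / (pts_per_bucket - 1) for i in range(pts_per_bucket)]
        ys = [1.0 - math.exp(-lam * x) for x in xs]  # synthetic, noise-free samples
        m, b = ols(xs, ys)
        fits.append((lo, lo + width, m, b))
    return fits


# Hypothetical charging RC circuit: R = 1 kOhm, C = 1 mF, so RC = 1 s and lam = 1/RC.
lam = 1.0 / (1e3 * 1e-3)
print(round(x_at_fraction(lam), 3))
for lo, hi, m, b in bucket_fits(lam):
    print(f"[{lo:.3f}, {hi:.3f}]  y = {m:.4f}*x {b:+.4f}")
```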
References
Reyes, J., Venegas, O., & Gómez, H. W. (2018). Exponentially-modified logistic distribution with application to mining and nutrition data. Appl. Math, 12(6), 1109–1116.