Wednesday, August 30, 2023

Disaggregation II

 


Breaking Up Is Hard to Do

They say that breaking up is hard to do
Now I know
I know that it's true
Don't say that this is the end
Instead of breaking up I wish that we were making up again

Disaggregating is breaking up.

I am returning to the original purpose of this blog, which was to keep from repeating myself to colleagues by instead referring them to a link to a blog post. So while the subject probably does not interest the vast majority of you, here it goes.

1.  Define disaggregation. The unique primary key fields in the Freight Analysis Framework, FAF, are Origin, Destination, Commodity, and Mode. Disaggregation is the opposite of aggregation. Each of these reported primary key fields is an aggregation.

  • Origins and destinations are reported at the level of FAF regions. Domestically these are states or portions of states, and they can be disaggregated to counties, districts, firms, etc. However, I do not believe there is an interest in disaggregating international FAF regions into countries (e.g. Europe into Germany, France, etc.).
  • Commodities are reported in the Standard Classification of Transported Goods at the two-digit level, SCTG2. These could be converted into another commodity classification system, e.g. the Standard Transportation Commodity Code, STCC, or disaggregated within the hierarchical SCTG system, e.g. to SCTG5, but I do not believe that this is what is meant by disaggregation.
  • Similarly, modes have been aggregated and can be disaggregated (e.g. Water into Great Lakers, Inland barges, and Deep Sea ships), but I do not believe that this is the intent.

IMHO disaggregation is intended to mean disaggregating only DOMESTIC origins and destinations into smaller zones.

2. Why are you disaggregating? IMHO, the trip table is disaggregated either for use in analysis or for use in traffic assignment. When it is done for analysis, such as the preparation of Freight Plans, assignment may, or may not, also be done. When it is done only for assignment, the purpose is to prevent “lumpy loading,” i.e. over-assigning links, and it is the link volumes, not the trip table itself, that may be analyzed. So disaggregation does NOT mean assignment. In fact, assignment is generally done of the vehicles that carry freight, while the FAF reports the contents of those vehicles. These are NOT the same.

3. What does disaggregation include? To STEM-ers like myself, disaggregation is simply the opposite of aggregation: if you compress a matrix to aggregate it, then you expand a matrix to disaggregate it. This means that if the matrix has II/IE/EI/EE cells, where I is the internal geography to be disaggregated and E is the external geography not to be disaggregated, then EE is NOT disaggregated/expanded. However, non-STEM-ers often include EE pass-through as part of disaggregation. It is possible to identify EE pass-through, which is simply subarea extraction, but that needs a trip table AND a network, unlike matrix expansion, which needs only a trip table. So, does disaggregation include, or exclude, EE pass-through?
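The matrix-expansion view of item 3 can be sketched in a few lines. This is a minimal illustration, not the tool discussed in this blog: the zone names, tonnages, and shares are invented, and for brevity it assumes the same shares apply to origins and destinations.

```python
# Hypothetical aggregated table: "I" is the internal zone, "E" is external.
flows = {
    ("I", "I"): 100.0,
    ("I", "E"): 40.0,
    ("E", "I"): 60.0,
    ("E", "E"): 500.0,
}

# Assumed shares of the internal zone held by each smaller zone (sum to 1.0).
shares = {"I1": 0.7, "I2": 0.3}


def expand(flows, shares, internal="I"):
    """Expand internal origins/destinations by shares; E-E passes through unchanged."""
    out = {}
    for (origin, dest), tons in flows.items():
        o_parts = list(shares.items()) if origin == internal else [(origin, 1.0)]
        d_parts = list(shares.items()) if dest == internal else [(dest, 1.0)]
        for o_zone, o_share in o_parts:
            for d_zone, d_share in d_parts:
                out[(o_zone, d_zone)] = tons * o_share * d_share
    return out


expanded = expand(flows, shares)
```

The E-E cell comes out exactly as it went in, because no network is available here to say whether that flow passes through the internal area.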

4.  Where do you get the disaggregation factors? If you have decided to disaggregate Domestic Origins and Destinations, it is important to understand that the reported Domestic Origins and Destinations in the FAF are sometimes POEs, i.e. Ports of Entry for imports and Ports of Exit for exports, not origins and destinations.  In the FAF, actual origins and destinations are correlated with the economy of the shippers and receivers, but the POE is correlated with the carrier(s). There are different sources for economic disaggregation factors and for carrier disaggregation factors. You can infer what kind of a POE it is based on the reported foreign and domestic modes.

5. How finely can you disaggregate? The FAF is an expansion of a sample, e.g. the Commodity Flow Survey, CFS, plus additions that are Out Of Scope, OOS, with respect to the CFS. The expansion of a survey is for statistically significant zones. To protect the privacy of the survey respondents, the original surveys and the expansion factors are not reported. Besides, even if the surveys were reported, they would NOT be statistically significant for smaller geographies. Additionally, for OOS flows there might not even be surveys, and the whole concept of statistical significance does not apply. But you can come up with most probable disaggregation factors even if you cannot come up with exact disaggregation factors. When disaggregating on the most probable, you may be effectively betting on the favorite, but that does not preclude the possibility that the factors associated with the long shot should have been used. Because these are most probable, not exact, disaggregation factors, I am uncomfortable disaggregating, i.e. inferring, too finely.

6. How different should disaggregation factors be? The FAF is an asymmetrical table, e.g. the tons of grain from Iowa to New York City does NOT equal the tons of grain from New York City to Iowa. The FAF is asymmetrical across every primary key field, i.e. origin/destination/commodity/mode. Thus, ideally disaggregation factors should also be different by origin/destination/commodity/mode. But this is the ideal. The more dimensions that can be included the better, but some dimensions, and not others, is better than nothing. 

7. How should a successful disaggregation be judged? Is the goal to match the original table or to match some secondary source? What are those secondary sources and what is the error between them and the original FAF? The difference between the disaggregated FAF and the secondary source might in fact only be error between the original FAF and the secondary source. Matching the original if the disaggregation factors sum to 100% is easy, but this may not match the secondary source. 
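The two yardsticks in item 7 can be kept separate in code. The sketch below uses invented region and county names and invented tonnages: it first re-aggregates a disaggregated table to confirm that it reproduces the original, then reports the gap against a hypothetical secondary source.

```python
def reaggregate(disaggregated, parent):
    """Sum small-zone flows back up to their parent regions."""
    totals = {}
    for (origin, dest), tons in disaggregated.items():
        key = (parent.get(origin, origin), parent.get(dest, dest))
        totals[key] = totals.get(key, 0.0) + tons
    return totals


# Hypothetical data: one regional flow split across two origin counties.
original = {("R1", "R2"): 100.0}
parent = {"C1": "R1", "C2": "R1", "C3": "R2"}
disaggregated = {("C1", "C3"): 70.0, ("C2", "C3"): 30.0}

# Test 1: does the disaggregated table match the original when re-aggregated?
rebuilt = reaggregate(disaggregated, parent)
max_error = max(abs(rebuilt.get(k, 0.0) - v) for k, v in original.items())

# Test 2: how far is it from a (made-up) secondary source at county level?
secondary = {("C1", "C3"): 65.0, ("C2", "C3"): 35.0}
secondary_gap = {k: disaggregated[k] - secondary[k] for k in secondary}
```

Here `max_error` is zero while `secondary_gap` is not, which is the situation described above: the first test passes easily, and the second may only be measuring the disagreement between the FAF and the secondary source.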


In conclusion

·        What is meant by disaggregation? Does it include only II/IE/EI or does it also include EE pass-through?

·        What is the deliverable? Is it the disaggregation process and a set of default disaggregation factors, or does it also include the process by which to create new disaggregation factors?

·        If the process of creating new disaggregation factors relies on public data sources, what is to be done if the format, contents, link location, or processing of those public sources, which are not under the consultant's control, changes? For example, the US Army Corps of Engineers, USACE, Waterborne Commerce Statistics Center, WCSC, changed the format, contents, and location of the reporting that is usable with FAF3 (2007), FAF4 (2012), and FAF5 (2017). The number of FAF regions in FAF3 is different from that in FAF4 or FAF5. The number of regions stayed the same between FAF4 and FAF5, but three counties switched regions. Also, the numeric code associated with Rhode Island as a FAF region changed from 440 to 441. (I am not disputing either change in the FAF, only pointing out that it did change, and that this has impacted the software.) The counties not served by active rail tracks or navigable water appear to be no longer reported by the Oak Ridge National Laboratory, ORNL, Center for Transportation Analysis, CTA.

Disaggregation

 

Pave the World

I swear I'm in fourth gear,
My rollers down, trail of smears.
Asphalt, aggregate,
No more trees, pavement.

If you don’t like aggregate, then how do you feel about disaggregation?

I am returning to the original purpose of this blog, which was to keep from repeating myself to colleagues by instead referring them to a link to a blog post. So while the subject probably does not interest the vast majority of you, here it goes.

Disaggregation

Disaggregation is merely the opposite of aggregation. Matrices might be aggregated, in which case they can of course be disaggregated. There are also cells in the matrix which might need to be aggregated, or otherwise identified, for example cells that pass through the area to be “disaggregated.” Identifying those cells requires additional knowledge, for example a network and/or its routing table, which shows which cells of the matrix use links of the network that are located in the area being disaggregated. The application of disaggregation factors, without identifying which links in the disaggregated network should be used, is called matrix expansion. The filtering of cells in the matrix that use links in the disaggregation area is variously called subarea extraction, windowing, or select link analysis. For purposes of this discussion, disaggregation is taken to mean only matrix expansion, which occurs without a network and includes only cells that travel to, from, or within the study area being disaggregated. These are also known as I-I, I-E, and E-I trips, where I is the internal geography to be disaggregated and E is the external geography that is not to be disaggregated. Cells that are E-E, which begin outside of the area to be disaggregated and end outside of the area to be disaggregated but use links within it, i.e. E-E pass-through, require a different method than matrix expansion and its disaggregation factors.

Disaggregation, i.e. matrix expansion, requires the identification of how the matrix was aggregated in the first place. This is not always known, or even knowable. The matrix might not have been aggregated. For example, the Federal Highway Administration’s, FHWA’s, Freight Analysis Framework, FAF, is developed from the Commodity Flow Survey, CFS, and from methods applied to shipments that are outside the scope of the Commodity Flow Survey. Those CFS Out Of Scope, OOS, flows were not aggregated. The Commodity Flow Survey flows were aggregated by applying expansion factors to surveys. However, the surveys themselves are not disclosed, to protect the confidentiality of the survey respondents. Those surveys also do not represent 100% of the flows. Their expansion factors apply only to the groupings of the surveys which are statistically significant. There are no expansion factors for groupings of cells which were not statistically significant. The absence of aggregation factors means that it is not possible to develop guaranteed disaggregation factors. However, that does not mean that it is not possible to develop most probable disaggregation factors.

The concept of the most probable is common in transportation planning.

·        Frataring, often known as IPF, Iterative Proportional Fitting;

·        the Gravity Model;

·        the Logit Mode Split Model; and

·        Origin Destination Matrix Estimation, ODME, etc.

are all based on finding the most probable trip table. Disaggregation itself is an intermediate, meso, state between individual, micro, states and the aggregated, macro, state. The desired matrix is an intermediate, thus the mathematical concept of the most probable, the mesostate which contains the greatest number of microstates, can be used. However using the most probable disaggregation requires honoring the concept of the most probable disaggregation.
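As one concrete instance of the methods listed above, here is a minimal sketch of Iterative Proportional Fitting on a two-by-two table. The seed values and the row and column targets are invented, and the fixed iteration count is a simplification; a production version would test for convergence.

```python
def ipf(seed, row_targets, col_targets, iterations=50):
    """Alternately scale rows and columns of a seed table toward target totals."""
    table = [row[:] for row in seed]
    for _ in range(iterations):
        # Scale each row so that it sums to its target.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [value * target / total for value in table[i]]
        # Scale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
    return table


# Hypothetical seed pattern and marginal totals (both sets of targets sum to 100).
seed = [[1.0, 2.0], [3.0, 4.0]]
fitted = ipf(seed, row_targets=[40.0, 60.0], col_targets=[30.0, 70.0])
```

The fitted table keeps the relative pattern of the seed while honoring both sets of totals.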

Dimensions Being Disaggregated.

Theoretically any of the indices, dimensions, primary key fields of the matrix, can be used in disaggregation. For example, the FAF has four primary key fields:  the origin, the destination, the commodity, and the mode. While it is possible to disaggregate commodities in the FAF, which are currently reported as Standard Classification of Transported Goods at the two-digit level, SCTG2, by a greater number of SCTG digits, this is not often done (using a different Commodity classification system would be CROSSWALKING, not DISAGGREGATING). While the modes reported in the FAF can be disaggregated (e.g. Water can be disaggregated into Inland Barges, Great Lakers, and Deep Sea vessels) the most common application of disaggregation is for the origins and destinations, geography, from the reported states and regions to smaller zones, for example to county-equivalents in those states and regions.

Correlation is not causation

Disaggregation can be based on any characteristic of the dimensions of the matrix. For example, a region with two counties can be disaggregated 50% to each county for both origins and destinations. However, it is not expected that the flows will be correlated with the mere number of those smaller zones. It might be more likely, for example, that origins will be correlated with employment and that destinations will be correlated with population. If the two counties had respectively 70% and 30% of the employment and 60% and 40% of the population, those characteristics might be used as disaggregation factors instead. However, those factors do not consider the commodity or mode. It may not be possible to develop or obtain factors that are CAUSING the flow. It is more likely that information will be available showing CORRELATED factors. In this case, care should be taken that the correlation makes sense and is not a spurious correlation.
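The two-county example can be worked through numerically. This sketch applies the 70%/30% employment shares to origins and the 60%/40% population shares to destinations. The 1,000-ton intraregional flow and the county labels are invented, and multiplying the two shares assumes origins and destinations are independent, which is an assumption of this sketch rather than something the data guarantees.

```python
# Shares from the text: employment drives origins, population drives destinations.
origin_share = {"County 1": 0.70, "County 2": 0.30}
destination_share = {"County 1": 0.60, "County 2": 0.40}

regional_tons = 1000.0  # hypothetical flow within the region

county_flows = {
    (origin, dest): regional_tons * o_share * d_share
    for origin, o_share in origin_share.items()
    for dest, d_share in destination_share.items()
}

for (origin, dest), tons in sorted(county_flows.items()):
    print(f"{origin} -> {dest}: {tons:.0f} tons")
```

The four county-to-county cells come to roughly 420, 280, 180, and 120 tons, which still sum to the regional total.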

Perspective Matters

The nature of the matrix being disaggregated may have information that can be used in creating disaggregation factors. For example, in the transportation of freight, there are three functions: shipper, receiver, and carrier. It is possible for two or more of those functions to be combined: e.g. Dole may carry goods that it receives in trucks which it owns. Wal-Mart may carry goods in trucks which it owns, and additionally serve as both the shipper and the receiver, and that flow may not even be reported if it is an internal business operation. The source of the matrix may provide information on the dimensions and the disaggregation factors to be used.

As noted, the FAF is derived from the Commodity Flow Survey of shippers. The dimension is an origin, but the Domestic Origin is the origin of the shipper for domestic trade, while the Foreign Origin is the origin of the shipper for import trade. The Domestic Origin for imports, and the Domestic Destination for exports, is more properly a Port Of Entry/Exit, POE, which is a carrier transfer point at which no economic activity of the shipper or receiver is likely to occur.

Because the flows are as reported by the shipper, the mode used by the carrier may not be known, only the rate that was charged by the carrier. Thus a mode in the FAF is Air and Truck, where the shipper pays a rate that is typical of air but the carrier might choose operationally to use a truck. Because the flow is as reported by the shipper, the commodity is always known, which is why the SCTG2 system, which does not include Freight All Kinds, FAK, is used. The carrier, by contrast, may not know the commodity and may use the Standard Transportation Commodity Code, STCC, system, which does include FAK. Similarly, the FAF does not include the packaging of the commodity in containers, while containers may be an important item reported by carriers.

In carrier-focused matrices, the origins may include both carrier and shipper origins and the destinations may include both carrier and receiver destinations. A source of economic activity from an origin may only be pertinent for shipper-receiver matrices, such as the FAF, not carrier matrices such as Transearch. The first origin and the last destination may reflect either the economic activity of a customer or a carrier's operations, while intermediate stops, as reported in modal databases, are almost always carrier operations/activity.

Disaggregation Factors should be unique.

The matrix is probably not symmetric and has unique flows in each cell, and thus the disaggregation factors should also be expected to be unique. This is the ideal. It may be that disaggregation factors can be created for several dimensions, but not for all dimensions, in which case something is better than nothing. For example, import trucks and trains are reported at Border Crossing Ports of Entry. The commodity contents of those trucks and trains are not reported, and the export flows are not reported. It can be assumed that the relative share for vehicles is the same as the relative share for the commodities in those vehicles, and that the relative share for imports is the same as the relative share for exports. While this is hardly unique, it is preferable to making no assumptions at all.

Disaggregation factors should sum to 100%.

Disaggregation factors must sum to 100% for each dimension for each cell that is being disaggregated. Thus if regions are being disaggregated to counties, the disaggregation factors for those counties should sum to 100% for each region. This means that the absolute number of shipments from each county is not needed to create disaggregation factors, only the relative number of shipments, activities, for each county in that region.
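The point that only relative activity matters can be shown with a small normalization helper. The county names and shipment counts below are invented for illustration.

```python
def to_shares(activity_by_county):
    """Convert absolute activity levels into shares that sum to 1.0 (100%)."""
    total = sum(activity_by_county.values())
    if total <= 0:
        raise ValueError("activity must have a positive total")
    return {county: value / total for county, value in activity_by_county.items()}


# Hypothetical activity for the two counties in one region.
shipments = {"County A": 350, "County B": 150}
shares = to_shares(shipments)

# Scaling every county by the same amount leaves the shares unchanged,
# which is why absolute shipment counts are not needed.
scaled_shares = to_shares({county: value * 10 for county, value in shipments.items()})
```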

The number of disaggregations, mesostates, should be kept as small as possible.

The most probable disaggregation factors depend on having as many microstates as possible. The number of microstates will not change. Increasing the number of disaggregation factors for each cell increases the number of mesostates but decreases the number of microstates in each mesostate.

The most probable is not a guarantee. For example, in the game of craps, the most probable total over a very large number of rolls is a seven. On the next roll the total may be a two, but this does not change the fact that seven was still the most probable total for that roll. To ensure that the most probable event has meaning, the number of events, i.e. the number of microstates in a mesostate, should be kept as high as possible, which means that the number of mesostates should be as low as possible.
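The craps example can be checked by counting. In this sketch each ordered pair of dice faces is treated as a microstate and each total as a mesostate. That mapping onto the terminology above is my own illustration.

```python
from collections import Counter
from itertools import product

# Each ordered pair of faces (36 in all) is one microstate; each total is a mesostate.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

most_probable_total, ways = counts.most_common(1)[0]
print(most_probable_total, ways, counts[2])
```

Seven is reached by six of the 36 microstates and two by only one, so a two can still occur on any single roll even though seven remains the most probable total.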

Activities should correlate with a disaggregation factor.

Economic activity may apply to all goods transported from a shipper to a receiver. However, only those economic activities that can also use a mode should be related to any disaggregation of that mode's flows. Information is available on which counties are not served by active rail tracks or navigable water. This does not mean that the remaining counties are served by rail or water carriers, only that if the tracks or navigable water do not exist, then service by carriers could not be offered. Such information is not available for other modes, and as a default all economic activity may be considered in creating disaggregation factors. While there is no information on trucks, it is assumed that trucks serve all economic activities.

Disaggregation factors in the tool that I developed

Domestic economic activity

For the domestic trade type, the origin is the domestic origin, and the destination is the domestic destination. The disaggregation factors should also be by commodity and by mode. As noted above, the Oak Ridge National Laboratory’s Center for Transportation Analysis has identified counties that are NOT serviceable by rail or water, and this information can be used in developing relative disaggregation factors. The economic activity information is not generally available by commodity but is available by employment in an economic sector, which is generally the industry at the three-digit level of the North American Industry Classification System, NAICS3. Economic Input-Output models have Make and Use tables which relate economic sectors to commodities. The Bureau of Economic Analysis, BEA, maintains Make and Use tables, and crosswalks can be developed between its commodities and economic sectors and NAICS3 industries and SCTG2 commodities. The BEA Make and Use tables are for average percentages of value by economic sector. These averages are just that, and a firm may be more or less productive than the average. Additionally, the share of employment may not equate to the share of economic value, and the share of the economic value may not equate to the share of tonnages. However, if these are assumed, it is possible to develop the economic activity by commodity for origin (shipper/Make) and destination (receiver/Use).
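The chain from employment to commodity shares can be sketched as follows. Every number here is invented, the Make shares are not taken from the BEA tables, and the pairing of NAICS3 industries with SCTG2 commodities is illustrative only. The sketch also carries the assumptions named above, that employment share stands in for value share and value share for tonnage share.

```python
# Hypothetical employment by county and NAICS3 industry.
employment = {
    "County A": {"311": 800, "325": 200},
    "County B": {"311": 200, "325": 800},
}

# Hypothetical Make shares: fraction of each industry's output by SCTG2 commodity.
make_share = {
    "311": {"SCTG 07": 0.9, "SCTG 20": 0.1},
    "325": {"SCTG 20": 1.0},
}

# Origin-side activity by commodity and county: employment weighted by Make share.
activity = {}
for county, by_industry in employment.items():
    for industry, jobs in by_industry.items():
        for commodity, share in make_share[industry].items():
            by_county = activity.setdefault(commodity, {})
            by_county[county] = by_county.get(county, 0.0) + jobs * share

# Relative disaggregation factors by commodity; each commodity sums to 1.0.
origin_factors = {
    commodity: {county: value / sum(by_county.values()) for county, value in by_county.items()}
    for commodity, by_county in activity.items()
}
```

A destination-side version would follow the same pattern with Use shares in place of Make shares.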

Import carrier activity

The economic activity of imports to the domestic destination is the same as the domestic economic activity above. The domestic origin for imports is a Port of Entry. For truck and rail, the Bureau of Transportation Statistics, BTS, Transborder flows, can identify the relative share of carrier activity at each truck and train border crossing in a FAF region. For water, the US Army Corps of Engineers Waterborne Commerce Statistics Center reports import tonnages by commodity. It uses the Public Group commodity classification codes, but those can be crosswalked to SCTG2 commodities. The relative share by water Port POE for each commodity for each FAF region can be calculated from this information. There is no information on the domestic mode which is used at these POEs. For all other foreign modes there is no information that can be used to develop disaggregation factors, so the POE is merely relabeled with its foreign mode and domestic region.

Export carrier activity

The economic activity of exports from the domestic origin is the same as the domestic economic activity above. The domestic destination for exports however is a Port of Exit, POE. For truck and rail, the Bureau of Transportation Statistics, BTS, Transborder flows, can identify the relative share of carrier import activity at each truck and train border crossing and it is assumed that carrier export activity has the same share. For water, the US Army Corps of Engineers Waterborne Commerce Statistics Center, reports export tonnages by commodity by port waterway. It uses the Public Group commodity classification codes, but these can be crosswalked to SCTG2 commodities as used by the FAF. The relative share by water Port POE for each commodity for each region can be calculated from this information. There is no information on the domestic mode that is used at these POEs. For all other foreign modes there is no information that can be used to develop disaggregation factors, so the POE is merely relabeled with its foreign mode and domestic region.
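The border-crossing shares described for imports and exports can be sketched as below. The region, crossing names, and counts are invented and do not come from the BTS Transborder data. The reuse of import shares for exports is the assumption stated above.

```python
from collections import defaultdict

# Hypothetical inbound vehicle counts: (FAF region, mode, border crossing, count).
crossings = [
    ("Region 1", "Truck", "Crossing A", 6000),
    ("Region 1", "Truck", "Crossing B", 4000),
    ("Region 1", "Rail", "Crossing A", 500),
]


def poe_shares(records):
    """Relative share of each crossing within its FAF region and mode."""
    totals = defaultdict(float)
    for region, mode, _port, count in records:
        totals[(region, mode)] += count
    return {
        (region, mode, port): count / totals[(region, mode)]
        for region, mode, port, count in records
    }


import_shares = poe_shares(crossings)
# Assumption from the text: export shares are taken to equal import shares.
export_shares = dict(import_shares)

# Apply the truck shares to a hypothetical 1,000 tons of imports for Region 1.
import_tons = 1000.0
tons_by_poe = {
    port: import_tons * share
    for (region, mode, port), share in import_shares.items()
    if (region, mode) == ("Region 1", "Truck")
}
```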

Monday, August 28, 2023

Complexity

 

Just My Imagination (Running Away with Me)

But it was just my imagination
Runnin' away with me
It was just my imagination
Runnin' away with me

Reality is just one component of a complex number that requires an imagination.

A complex number, x, has a real component and an imaginary component, $a+bi$, where a is the coefficient of the real component and b is the coefficient of the imaginary component, i.  This can also be expressed as $re^{i\theta}$, where r is the real component and θ is the angle to the imaginary axis.  It is thus proposed that x, the relationship to the absolute, is complex, and has both a real and an imaginary component.

For example, the Probability Density Function, PDF, of an exponential distribution, e.g. the formula for radioactive decay, is $\lambda e^{-\lambda x}$, where λ is the decay rate and $x > 0$.  This has a Cumulative Distribution Function, CDF, which is the integral of the PDF, of $1-e^{-\lambda x}$.  This can be translated along the x-axis by an amount µ as PDF $= \lambda e^{-\lambda(x-\mu)}$, which makes the CDF $= 1-e^{-\lambda(x-\mu)}$.  The median is when the CDF = 50%.  Thus $0.5 = 1-e^{-\lambda(x-\mu)}$ requires that the median occurs when $x-\mu = -\ln(0.5)/\lambda$.  If µ = 0, which means that x is a relationship to absolute zero, the median is $-\ln(0.5)/\lambda$. But for an exponential distribution with a µ of zero, the median has previously been defined as $1/\lambda$.  To resolve this apparent paradox, where the median must be both $-\ln(0.5)/\lambda$ and $1/\lambda$, it is proposed that the median is a complex number and that $1/\lambda$ is only the real component of that complex number, $-\ln(0.5)/\lambda$.  This means that $-\ln(0.5)/\lambda = (1/\lambda)e^{i\theta}$, and that the angle θ, in radians, of the imaginary axis, is thus $\ln(-\ln(0.5)) = -0.36651i$. Also, according to Euler’s Formula, $i\theta = \ln(\cos(\theta)+i\sin(\theta))$, which means that $(1/\lambda)e^{-0.36651i} = a\cos(\theta)+i\,b\sin(\theta)$. This means that a decay rate, λ, defines the complex number, the angle of the imaginary axis, and the translation along the x-axis of the complex number x.
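The algebraic step that places the 50% point of the translated CDF at $x-\mu = -\ln(0.5)/\lambda$ can be checked numerically. The function names and the sample values of λ and µ below are my own. The sketch checks only that one step and takes no position on the complex-number proposal.

```python
import math


def shifted_exponential_cdf(x, lam, mu=0.0):
    """CDF of an exponential distribution translated along the x-axis by mu."""
    if x < mu:
        return 0.0
    return 1.0 - math.exp(-lam * (x - mu))


def cdf_half_point(lam, mu=0.0):
    """Value of x at which the translated CDF equals 0.5."""
    return mu - math.log(0.5) / lam


# Hypothetical decay rate and translation.
lam, mu = 2.0, 1.5
x_half = cdf_half_point(lam, mu)
check = shifted_exponential_cdf(x_half, lam, mu)
```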

Coincidence

 

It’s All Over Now, Baby Blue

The highway is for gamblers, better use your sense
Take what you have gathered from coincidence
The empty-handed painter from your streets
Is drawing crazy patterns on your sheets
This sky, too, is folding under you
And it's all over now, baby blue

Coincidence is God’s sense of humor

I have been musing on Pythagoras’ Theorem.  The Transportation Modeling Improvement Program included a post from someone who wanted to know whether he should get a Professional Engineering, PE, license.  In responding I was going to quote from the movie the Wizard of Oz https://www.youtube.com/watch?v=uCOxU2rKLas. When given a diploma, e.g. a PE, the Scarecrow  quotes Pythagoras’ Theorem, and to me that is the coincidence.  

The Scarecrow had to already know that Theorem.  The Wizard's diploma only said that he could be trusted to know it.  Getting a PE does not add to your wisdom.  It does serve as a shorthand that others have tested your wisdom and it can be trusted.    The reality is that you had to already know Pythagoras’ Theorem BEFORE you get your “Doctor of Thinkology, Th.D.”,…. or a PE.  A PE says that your judgement has been judged by others.  It may be that you will someday be in a position where you are not known, and your opinion may have to be quickly judged.  In that case a PE may serve as that proof. Getting a PE to me is like giving Chicken Soup to a dead man.  Will it help?  It couldn’t hurt!

Wednesday, August 23, 2023

Climate Change III

 

Sinnerman

Well, I run to the river
It was boilin', I run to the sea
It was boilin', I run to the sea
It was boilin', all on that day
So I ran to the Lord
I said Lord, hide me
Please hide me
Please help me, all on that day
He said, hide?
Where were you?
When you oughta have been prayin'
I said Lord, Lord
Hear me prayin', Lord, Lord
Hear me prayin', Lord, Lord
Hear me prayin', all on that day
Sinnerman, you oughta be prayin'
Oughta be prayin', sinnerman
Oughta be prayin', all on that day

When the hurricane and earthquake hit SoCal
and the wildfire hits Maui and Canada, where you gonna run to?

Are you looking for someplace to hide from Climate Change?  Good luck with that!

Friday, August 18, 2023

Coordinate Transformation

 

Reflections

Through the mirror of my mind
Time after time
I see reflections of you and me
Reflections of
The way life used to be
Reflections of
The love you took from me

But can you reflect an absolute?

Mathematically you can translate a function with respect to an x-axis as g(x)=f(x-µ), where µ is the new location.  You can rotate a function with respect to an x-axis as g(x)=sin(θ)*f(x), where θ is the angle between the rotated and the original x-axis.  You can reflect a function with respect to an x-axis as g(x)=-f(x). But if x is the relationship to an absolute, then that can restrict the ability to translate, rotate, or reflect a function of x, f(x). Translation, rotation, and reflection together can be called coordinate transformation.

The simplest one to explain is the translation.  There is an absolute zero temperature.  It is measured in degrees Kelvin.  There is a 0° Kelvin, but by definition you cannot have a temperature below 0 degrees Kelvin.  It is more convenient for us to use a translated temperature scale where the origin, zero, has been translated by 273.15 degrees Kelvin, to the freezing point of water.  You can observe a temperature of -273° Celsius, but you can’t observe a temperature of -274° Celsius because that would be below absolute zero.
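The temperature example is a translation with a floor, which can be sketched as a conversion that refuses values below absolute zero. The function name is my own.

```python
KELVIN_OFFSET = 273.15  # translation between the Celsius and Kelvin origins


def celsius_to_kelvin(celsius):
    """Translate Celsius to Kelvin, rejecting anything below absolute zero."""
    kelvin = celsius + KELVIN_OFFSET
    if kelvin < 0:
        raise ValueError(f"{celsius} degrees Celsius is below absolute zero")
    return kelvin


observable = celsius_to_kelvin(-273)  # a small positive Kelvin value

try:
    celsius_to_kelvin(-274)
    below_zero_rejected = False
except ValueError:
    below_zero_rejected = True
```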

Rotating a function in one direction is equivalent to rotating a function in the opposite direction by 2π because the period of the sine function is 2π.  This has an impact on Euler’s Formula, $e^{ix}=\cos(x)+i\sin(x)$, which is a rotation of the imaginary axis by x degrees.  If x is the relationship to the absolute, then x cannot be less than zero because you can’t rotate the imaginary axis by something less than absolute zero.  That is why $e^{ix}$ cannot be defined as a real number, or else it would create the paradox which would require both that $\cos(-x)=\cos(x)$ and $\cos(-x)=-\cos(x)$.

This same restriction applies to the reflection.  If the origin of zero has been translated to µ, then you can reflect that function between µ and 0, but you cannot reflect that function below 0 or else it would be a reflection below absolute zero.

Since a random number has the parameters µ and σ, and the variance, $\sigma^2$, defines the range of that random number, a random number must have µ>3σ or else the random number could be below absolute zero.
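To put a number on the µ>3σ rule, the sketch below assumes the random number is normally distributed, which the paragraph above does not state, and computes the chance of a value below zero. The function name is my own.

```python
import math


def prob_below_zero(mu, sigma):
    """P(X < 0) for a normal random variable with mean mu and standard deviation sigma."""
    z = (0.0 - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


at_three_sigma = prob_below_zero(mu=3.0, sigma=1.0)
at_one_sigma = prob_below_zero(mu=1.0, sigma=1.0)
print(f"{at_three_sigma:.5f} {at_one_sigma:.5f}")
```

Under that normal assumption, a mean three standard deviations above zero leaves a chance of roughly 0.13% of falling below zero, compared with about 16% when the mean is only one standard deviation above zero.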

Thursday, August 17, 2023

Strategy v. Tactics

 

Climb Every Mountain

Climb every mountain
Ford every stream
Follow every mountain (every mountain)
Don't you ever give up, no ohh
Climb every mountain (every mountain)
There's a brighter day on the other side
Follow every rainbow
'Till you find your dream

It is NOT winning every battle!

Winning every battle might seem like a winning strategy, but it is not.  Bad strategies include Pyrrhic victories and "Winning the battle but losing the war". Winning a battle, being a good field general, a great tactician, does not mean that you would be a good strategist.  Frank Robinson might have been a great player and great manager, but the more likely examples are Ted Williams who was a great player/tactician, but a lousy manager/strategist and Joe Torre who was a lousy player/tactician but a great coach/strategist.  Dynasties are more likely when a great player is teamed with a great coach,  e.g. Tom Brady and Bill Belichick, than one without the other.

Winning every battle is a good strategy only if you prevent the other side in the battle from ever contending again.  But then you have to go all scorched earth on this, a la Rome v. Carthage. For example, while Dunkirk may have been a tactical loss for the British, destroying the British Army might have only emboldened the small ship captains who participated in the miracle at Dunkirk and other civilians who would have ultimately defeated Hitler.  I think that this is a lesson that Vladimir Putin is learning in his ill-fated invasion of Ukraine.  The fact that the leader of Ukraine is an ordinary former TV comedian, and not a bully, only reinforces the fact that bullying/dominating alone does not always work.

Sometimes sacrificing a tactical victory is the better strategy.  In the 2004 American League Baseball Championship Series, Tim Wakefield was scheduled to be the Game 4 starting pitcher. But when Bronson Arroyo was knocked out of Game 3 after only two innings and the relievers were taking a beating in the game that the Sox would go on to lose 19-8, Wakefield sacrificed his chance to start Game 4 and volunteered to "take one for the team." He ended up pitching 3 1/3 innings in Game 3, allowing manager Terry Francona to keep the rest of the Red Sox bullpen rested. Relief pitchers Keith Foulke and Mike Timlin didn't have to pitch at all that day, and Alan Embree faced only four batters. When the next two games of the series went extra innings, it was crucial to have rested relievers, and Francona and his Red Sox teammates were quick to point out that without Wake's selfless act, the comeback would not have been possible.  In other words, Wakefield knew that a strategic series victory for the team was more important than his own tactical win in Game 4. And in doing so, he climbed the mountain and found every Red Sox fan’s dream.