Wednesday, August 30, 2023

Disaggregation

 

Pave the World

I swear I'm in fourth gear,
My rollers down, trail of smears.
Asphalt, aggregate,
No more trees, pavement.

If you don’t like aggregate, then how do you feel about disaggregation?

I am returning to the original purpose of this blog, which was to keep from repeating myself to colleagues by referring them instead to a link to a blog post. So while the subject probably does not interest the vast majority of you, here goes.

Disaggregation

Disaggregation is merely the opposite of aggregation. Matrices might be aggregated, in which case they can of course be disaggregated. There are also cells in the matrix that might need to be aggregated, or otherwise identified, for example cells that pass through the area to be “disaggregated.” Knowledge of which cells pass through that area then also needs to be provided, for example from a network and/or its routing table, which identifies which cells of the matrix use links of the network that are located in the area being disaggregated. The application of disaggregation factors, without identifying which links in the disaggregated network should be used, is called matrix expansion. The filtering of cells in the matrix that use links in the disaggregation area is variously called subarea extraction, windowing, or select link analysis. For purposes of this discussion, disaggregation is taken to mean only matrix expansion, which occurs without a network and includes only cells that travel to, from, or within the study area being disaggregated. These are also known as I-I, I-E and E-I trips, where I denotes the internal zones to be disaggregated and E denotes the external zones that are not to be disaggregated. Cells that are E-E, which begin and end outside of the area to be disaggregated but use links within it, i.e. E-E pass-through, use a different method than matrix expansion and its disaggregation factors.
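As a minimal sketch of the I-I, I-E, E-I and E-E labels described above, the snippet below classifies each cell by whether its origin and destination fall inside the study area, and keeps only the cells that matrix expansion would apply to. The zone names, the flows, and the function name classify_cell are hypothetical and are only there for illustration.

```python
def classify_cell(origin, destination, internal_zones):
    """Label a matrix cell as I-I, I-E, E-I or E-E."""
    o = "I" if origin in internal_zones else "E"
    d = "I" if destination in internal_zones else "E"
    return f"{o}-{d}"


# Hypothetical zones inside the area being disaggregated.
internal = {"A", "B"}

# Hypothetical flows keyed by (origin, destination).
flows = {
    ("A", "B"): 10.0,  # I-I
    ("A", "X"): 4.0,   # I-E
    ("X", "B"): 6.0,   # E-I
    ("X", "Y"): 8.0,   # E-E, handled by a different method
}

# Matrix expansion applies to everything except the E-E cells.
to_expand = {
    cell: tons
    for cell, tons in flows.items()
    if classify_cell(cell[0], cell[1], internal) != "E-E"
}
```

Whether an E-E cell actually passes through the area cannot be decided from the matrix alone. That is where the network or routing table comes in.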

Disaggregation, i.e. matrix expansion, requires identifying how the matrix was aggregated in the first place. This is not always known, or even knowable. The matrix might not have been aggregated at all. For example, the Federal Highway Administration’s, FHWA’s, Freight Analysis Framework, FAF, is developed from the Commodity Flow Survey, CFS, and from methods applied to shipments that are outside the scope of the Commodity Flow Survey. Those CFS Out Of Scope, OOS, flows were not aggregated. The Commodity Flow Survey flows were aggregated by applying expansion factors to surveys. However, the surveys themselves are not disclosed, to protect the confidentiality of the survey respondents. Those surveys also do not represent 100% of the flows. Their expansion factors apply only to the groupings of surveys that are statistically significant. There are no expansion factors for groupings of cells that were not statistically significant. The absence of aggregation factors means that it is not possible to develop guaranteed disaggregation factors. However, that does not mean that it is not possible to develop the most probable disaggregation factors.

The concept of the most probable is common in transportation planning.

·        Frataring, often known as IPF, Iterative Proportional Fitting;

·        the Gravity Model;

·        the Logit Mode Split Model; and

·        Origin Destination Matrix Estimation, ODME, etc.

are all based on finding the most probable trip table. Disaggregation itself is an intermediate, meso, state between individual, micro, states and the aggregated, macro, state. The desired matrix is an intermediate state, thus the mathematical concept of the most probable, the mesostate which contains the greatest number of microstates, can be used. However, using the most probable disaggregation requires honoring the concept of the most probable.
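To make one of the methods listed above concrete, here is a minimal sketch of Iterative Proportional Fitting in plain Python. It is a generic textbook version, not the procedure used by the FAF or by any particular planning package. The seed table and the row and column targets are hypothetical numbers.

```python
def ipf(seed, row_targets, col_targets, iterations=50):
    """Scale a seed table so its row and column sums match the targets."""
    table = [row[:] for row in seed]  # work on a copy of the seed
    for _ in range(iterations):
        # Scale each row to its target total.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [v * target / total for v in table[i]]
        # Scale each column to its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
    return table


# Hypothetical 2 x 2 seed and marginal totals (both sets sum to 100).
seed = [[1.0, 2.0], [3.0, 4.0]]
fitted = ipf(seed, row_targets=[40.0, 60.0], col_targets=[30.0, 70.0])
```

The fitted table keeps the relative pattern of the seed while matching the marginal totals, which is the sense in which it is the most probable table given that seed.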

Dimensions Being Disaggregated.

Theoretically any of the indices, dimensions, primary key fields of the matrix, can be used in disaggregation. For example, the FAF has four primary key fields:  the origin, the destination, the commodity, and the mode. While it is possible to disaggregate commodities in the FAF, which are currently reported as Standard Classification of Transported Goods at the two-digit level, SCTG2, by a greater number of SCTG digits, this is not often done (using a different Commodity classification system would be CROSSWALKING, not DISAGGREGATING). While the modes reported in the FAF can be disaggregated (e.g. Water can be disaggregated into Inland Barges, Great Lakers, and Deep Sea vessels) the most common application of disaggregation is for the origins and destinations, geography, from the reported states and regions to smaller zones, for example to county-equivalents in those states and regions.

Correlation is not causation

Disaggregation can be based on any characteristic of the dimensions of the matrix. For example, a region with two counties can be disaggregated 50% to each county for both origins and destinations, simply because there are two counties. However, it is not expected that the flows will be correlated with the number of those smaller zones. It is more likely that origins will be correlated with employment and that destinations will be correlated with population. If the two counties had respectively 70% and 30% of the employment and 60% and 40% of the population, those characteristics might be used as disaggregation factors instead. However, those factors do not consider the commodity or mode. It may not be possible to develop or obtain factors that are CAUSING the flow. It is more likely that the available information will show CORRELATED factors. In this case, care should be taken that the correlation makes sense and is not a spurious correlation.
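The two-county example can be sketched as follows, using the 70%/30% employment shares for origins and the 60%/40% population shares for destinations. The regional flow of 1,000 tons is a hypothetical number. Multiplying the origin share by the destination share assumes the two are independent, which is a simplification made for this sketch.

```python
# Shares from the example: employment drives origins, population drives destinations.
origin_share = {"County 1": 0.70, "County 2": 0.30}
dest_share = {"County 1": 0.60, "County 2": 0.40}

# Hypothetical flow within the region, in tons.
region_flow = 1000.0

# Assumption: origin and destination shares are applied independently.
county_flows = {
    (o, d): region_flow * o_share * d_share
    for o, o_share in origin_share.items()
    for d, d_share in dest_share.items()
}

for (o, d), tons in county_flows.items():
    print(f"{o} -> {d}: {tons:.0f} tons")
```

Because each set of shares sums to 100%, the four county-to-county flows add back up to the original regional flow.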

Perspective Matters

The nature of the matrix being disaggregated may have information that can be used in creating disaggregation factors. For example, in the transportation of freight, there are three functions: shipper, receiver, and carrier. It is possible for two or more of those functions to be combined: e.g. Dole may carry goods that it receives in trucks which it owns. Wal-Mart may carry goods in trucks which it owns, and additionally serve as both the shipper and the receiver, and that flow may not even be reported if it is an internal business operation. The source of the matrix may provide information on the dimensions and the disaggregation factors to be used.

As noted, the FAF is derived from the Commodity Flow Survey of shippers. The dimension is an origin, but the Domestic Origin is the origin of the shipper for domestic trade, while the Foreign Origin is the origin of the shipper for import trade. The Domestic Origin for imports, and the Domestic Destination for exports, is more properly a Port Of Entry/Exit, POE, which is a carrier transfer point at which no economic activity of the shipper or receiver is likely to occur.

Because the flows are as reported by the shipper, the mode used by the carrier may not be known, only the rate that was charged by the carrier. Thus a mode in the FAF is Air and Truck, where the shipper pays a rate that is typical of air, but the carrier might choose operationally to use a truck. Because the flow is as reported by the shipper, the commodity is always known, which is why the SCTG2 system, which does not include Freight All Kinds, FAK, is used. The carrier may not know the commodity and may use the Standard Transportation Commodity Code system, which does include FAK. Similarly, the FAF does not include the packaging of the commodity in containers, while containers may be an important item reported by carriers.

In carrier-focused matrices, the origins may include both carrier and shipper origins and the destinations may include both carrier and receiver destinations. A source of economic activity from an origin may only be pertinent for shipper-receiver matrices, such as the FAF, not carrier matrices such as Transearch. The first origin and the last destination may also reflect the economic activity of a customer or a carrier’s operations, while intermediate stops, as reported in modal databases, are almost always carrier operations/activity.

Disaggregation Factors should be unique.

The matrix is probably not symmetric and has unique flows in each cell, and thus the disaggregation factors should also be expected to be unique. This is the ideal. It may be that disaggregation factors can be created for several dimensions, but not for all dimensions, in which case something is better than nothing. For example, import trucks and trains are reported at Border Crossing Ports of Entry. The commodity contents of those trucks and trains are not reported, and the export flows are not reported. It can be assumed that the relative share for vehicles is the same as the relative share for the commodities in those vehicles, and that the relative share for imports is the same as the relative share for exports. While this is hardly unique, it is preferable to making no assumptions at all.

Disaggregation factors should sum to 100%.

Disaggregation factors must sum to 100% for each dimension for each cell that is being disaggregated. Thus if regions are being disaggregated to counties, the disaggregation factors for those counties should sum to 100% for each region. This means that the absolute number of shipments from each county is not needed to create disaggregation factors, only the relative number of shipments, activities, for each county in that region.
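A minimal sketch of that normalization is shown below. The county names, region names, and shipment counts are hypothetical, as is the helper name normalize_by_region. The second call multiplies every count by 1,000 to show that only the relative numbers matter.

```python
from collections import defaultdict


def normalize_by_region(activity, county_region):
    """Turn county activity into factors that sum to 1.0 within each region."""
    totals = defaultdict(float)
    for county, value in activity.items():
        totals[county_region[county]] += value

    factors = {}
    for county, value in activity.items():
        region = county_region[county]
        if totals[region] <= 0:
            raise ValueError(f"no activity to share out in region {region}")
        factors[county] = value / totals[region]
    return factors


# Hypothetical counties, their regions, and their shipment counts.
county_region = {"A": "R1", "B": "R1", "C": "R2"}
shipments = {"A": 30.0, "B": 10.0, "C": 5.0}

factors = normalize_by_region(shipments, county_region)

# Scaling every count by the same amount leaves the factors unchanged.
scaled = normalize_by_region(
    {county: 1000 * value for county, value in shipments.items()},
    county_region,
)
```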

The number of disaggregations, mesostates, should be kept as small as possible.

The most probable disaggregation factors depend on having as many microstates as possible. The number of microstates will not change. Increasing the number of disaggregation factors for each cell increases the number of mesostates but decreases the number of microstates in each mesostate.

The most probable is not a guarantee. For example, in the game of craps, the most probable total, and the most frequent one over a very large number of rolls, is a seven. On the next roll the total may be a two, but this does not change the fact that seven was still the most probable total for that roll. To ensure that the most probable event has meaning, the number of events, i.e. the number of microstates in a mesostate, should be kept as high as possible, which means that the number of mesostates should be as low as possible.
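The craps example can be checked by enumeration. In this sketch each ordered pair of dice faces is treated as a microstate and each total as a mesostate. That mapping is my reading of the analogy, used only for illustration.

```python
from collections import Counter
from itertools import product

# Every ordered pair of faces on two dice is one microstate (36 in all).
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# The total (mesostate) with the most microstates is the most probable one.
most_probable_total, ways = counts.most_common(1)[0]

print(most_probable_total, ways, sum(counts.values()))  # prints "7 6 36"
print(counts[2])  # prints "1": only one way to roll a two
```

A seven can be made six ways out of 36, while a two can be made only one way. A two still happens, just less often.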

Activities should correlate with a disaggregation factor.

Economic activity may apply to all goods transported from a shipper to a receiver. However, only those economic activities that also use a mode should be related to any disaggregation of that mode’s flows. Information is available on which counties are not served by active rail tracks or navigable water. This does not mean that the remaining counties are served by rail or water carriers, only that if the tracks or navigable water do not exist, then service by carriers could not be offered. Such information is not available for other modes, and as a default all economic activity may be considered in creating their disaggregation factors. While there is no such information for trucks, it is assumed that trucks serve all economic activities.
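One way to apply the rail and water information is to drop the activity of counties that cannot be served by the mode before computing the shares. The sketch below does this for rail. The county names, the activity values, and the function name modal_factors are hypothetical, and counties with no flag are assumed to be serviceable.

```python
def modal_factors(activity, serviceable):
    """Share out a region's modal flow among counties the mode can reach."""
    # Assumption: a county missing from `serviceable` is treated as reachable.
    eligible = {
        county: value
        for county, value in activity.items()
        if serviceable.get(county, True)
    }
    total = sum(eligible.values())
    if total <= 0:
        raise ValueError("no serviceable county has any activity")
    # Counties the mode cannot reach get a factor of zero.
    return {county: eligible.get(county, 0.0) / total for county in activity}


# Hypothetical economic activity for three counties in one region.
activity = {"A": 50.0, "B": 30.0, "C": 20.0}

# Hypothetical flag: county C has no active rail track.
rail_serviceable = {"A": True, "B": True, "C": False}

rail = modal_factors(activity, rail_serviceable)
truck = modal_factors(activity, {})  # trucks are assumed to serve every county
```

The rail factors still sum to 100% for the region, with county C's share redistributed to the counties that have track.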

Disaggregation factors in the tool that I developed

Domestic economic activity

For the domestic trade type, the origin is the domestic origin, and the destination is the domestic destination. The disaggregation factors should also be by commodity and by mode. As noted above, the Oak Ridge National Laboratory’s Center for Transportation Analysis has identified counties that are NOT serviceable by rail or water, and this information can be used in developing relative disaggregation factors. Economic activity information is not generally available by commodity but is available as employment by economic sector, which is generally the industry at the three-digit level of the North American Industry Classification System, NAICS3. Economic Input-Output models have Make and Use tables which relate economic sectors to commodities. The Bureau of Economic Analysis, BEA, maintains Make and Use tables, and crosswalks can be developed between its commodities and economic sectors and the NAICS3 industries and SCTG2 commodities. The BEA Make and Use tables give average percentages of value by economic sector. These averages are just that, and a firm may be more or less productive than the average. Additionally, the share of employment may not equate to the share of economic value, and the share of economic value may not equate to the share of tonnage. However, if these equivalences are assumed, it is possible to develop the economic activity by commodity for the origin (shipper/Make) and the destination (receiver/Use).
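The chain from employment to commodity factors can be sketched as below. This is not the code of the tool itself. The employment figures and the Make shares are made-up numbers, and the industry and commodity codes are used only as labels. It makes the same assumptions as the text: employment share stands in for value share, and value share stands in for tonnage share.

```python
from collections import defaultdict

# Hypothetical employment by county and NAICS3 industry.
employment = {
    "County 1": {"311": 100, "325": 50},
    "County 2": {"311": 300, "325": 0},
}

# Hypothetical Make shares: portion of each industry's output by SCTG2 commodity.
make_share = {
    "311": {"05": 0.4, "07": 0.6},
    "325": {"20": 1.0},
}


def origin_factors(employment, make_share):
    """Return {commodity: {county: factor}} with factors summing to 1.0."""
    activity = defaultdict(lambda: defaultdict(float))
    for county, by_industry in employment.items():
        for industry, jobs in by_industry.items():
            for commodity, share in make_share.get(industry, {}).items():
                activity[commodity][county] += jobs * share

    factors = {}
    for commodity, by_county in activity.items():
        total = sum(by_county.values())
        if total > 0:  # skip commodities with no activity in the region
            factors[commodity] = {c: v / total for c, v in by_county.items()}
    return factors


factors = origin_factors(employment, make_share)
```

Destination factors would follow the same pattern with Use shares in place of Make shares.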

Import carrier activity

The economic activity of imports to the domestic destination is the same as the domestic economic activity above. The domestic origin for imports is a Port of Entry, POE. For truck and rail, the Bureau of Transportation Statistics, BTS, Transborder flows can identify the relative share of carrier activity at each truck and train border crossing in a FAF region. For water, the US Army Corps of Engineers Waterborne Commerce Statistics Center reports import tonnages by commodity. It uses the Public Group commodity classification codes, but those can be crosswalked to SCTG2 commodities. The relative share by water POE for each commodity for each FAF region can be calculated from this information. There is no information on the domestic mode which is used at these POEs. For all other foreign modes there is no information that can be used to develop disaggregation factors, so the POE is merely relabeled with its foreign mode and domestic region.

Export carrier activity

The economic activity of exports from the domestic origin is the same as the domestic economic activity above. The domestic destination for exports, however, is a Port of Exit, POE. For truck and rail, the Bureau of Transportation Statistics, BTS, Transborder flows can identify the relative share of carrier import activity at each truck and train border crossing, and it is assumed that carrier export activity has the same share. For water, the US Army Corps of Engineers Waterborne Commerce Statistics Center reports export tonnages by commodity by port waterway. It uses the Public Group commodity classification codes, but these can be crosswalked to SCTG2 commodities as used by the FAF. The relative share by water POE for each commodity for each region can be calculated from this information. There is no information on the domestic mode that is used at these POEs. For all other foreign modes there is no information that can be used to develop disaggregation factors, so the POE is merely relabeled with its foreign mode and domestic region.
