Pave the
World
I swear I'm
in fourth gear,
My rollers down, trail of smears.
Asphalt, aggregate,
No more trees, pavement.
If you don’t like
aggregate, then how do you feel about disaggregation?
I am returning to the original purpose of this blog, which
was to keep from repeating myself to colleagues and instead refer them to
a link to a blog post. So while the subject probably does not interest the vast
majority of you, here it goes.
Disaggregation
Disaggregation is merely the opposite of aggregation. Matrices
might be aggregated, in which case they can of course be disaggregated. There
are also cells in the matrix which might need to be aggregated, or
otherwise identified, for example cells that pass through the area to be “disaggregated.”
Identifying those cells requires additional information, for example
a network and/or its routing table, which shows which cells of the matrix
use links of the network that are located in the area being disaggregated.
The application of disaggregation factors, without identifying
which links in the disaggregated network should be used, is called matrix expansion.
The filtering of cells in the matrix that use links in the disaggregation area
is variously called subarea extraction, windowing, or select link analysis. For
purposes of this discussion, disaggregation is taken to mean only matrix expansion,
which occurs without a network and includes only cells that travel to, from, or
within the study area which is being disaggregated. These are also known as I-I,
I-E and E-I trips, where I is an internal end, inside the area to be disaggregated, and E is an external end that is not to be disaggregated.
Cells that are E-E, which begin outside of the area to be disaggregated,
end outside of the area to be disaggregated, but also use links within the area
to be disaggregated, i.e. E-E Pass-through, use a different method than matrix expansion
and its disaggregation factors.
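The distinction between these cell types can be sketched in a few lines of Python. This is only an illustration: the zone names and the `classify_cell` helper are hypothetical, and the sketch labels cells by their two ends only. Deciding whether an E-E cell actually passes through the area would still require a network or routing table.

```python
def classify_cell(origin, destination, internal_zones):
    """Label a matrix cell by whether each end is Internal or External."""
    o = "I" if origin in internal_zones else "E"
    d = "I" if destination in internal_zones else "E"
    return f"{o}-{d}"


# Hypothetical zones: A and B are inside the area being disaggregated.
internal = {"A", "B"}
cells = [("A", "B"), ("A", "X"), ("X", "B"), ("X", "Y")]

labels = {cell: classify_cell(cell[0], cell[1], internal) for cell in cells}

# Only I-I, I-E and E-I cells are candidates for matrix expansion.
expandable = [cell for cell, label in labels.items() if label != "E-E"]
print(labels)
print(expandable)
```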
Disaggregation, i.e. matrix expansion, requires the identification
of how the matrix was aggregated in the
first place. This is not always known, or even knowable. The matrix might not
have been aggregated. For example, the Federal Highway Administration’s, FHWA’s, Freight Analysis Framework, FAF, is developed from the Commodity Flow Survey, CFS, and from methods applied to shipments that are outside the scope of the Commodity Flow Survey. Those CFS Out Of Scope, OOS, flows
were not aggregated. The Commodity Flow Survey flows were aggregated by
applying expansion factors to surveys. However the surveys themselves are not
disclosed, to protect the confidentiality of the survey respondents. Those surveys
also do not represent 100% of the flows. Their expansion factors only apply to the
groupings of the surveys which are statistically significant. There are no expansion
factors for groupings of cells which were not statistically significant. The absence
of aggregation factors means that it is
not possible to develop guaranteed disaggregation factors. However
that does not mean that it is not possible to develop the most probable disaggregation
factors.
The concept of the most probable is common in transportation
planning. The following are all based on finding the most probable trip table:
· Frataring, often known as IPF, Iterative Proportional Fitting;
· the Gravity Model;
· the Logit Mode Split Model; and
· Origin Destination Matrix Estimation, ODME.
Disaggregation itself is an intermediate, meso, state between individual,
micro, states and the aggregated, macro, state. The desired matrix is an
intermediate, thus the mathematical concept of the most probable, the mesostate
which contains the greatest number of microstates, can be used. However using the
most probable disaggregation requires honoring the conditions under which the
concept of the most probable holds.
Dimensions Being Disaggregated.
Theoretically any of the indices, i.e. the dimensions or primary key
fields, of the matrix can be used in disaggregation. For example, the FAF has four primary key fields: the origin, the
destination, the commodity, and the mode. While it is possible to disaggregate commodities
in the FAF, which are currently reported as Standard Classification of Transported
Goods at the two-digit level, SCTG2, by a greater number of SCTG digits, this is
not often done (using a different commodity classification system would be CROSSWALKING, not DISAGGREGATING). While the modes reported in the FAF can be disaggregated
(e.g. Water can be disaggregated into Inland Barges, Great Lakers, and Deep Sea
vessels), the most common application of disaggregation is for the origins and destinations,
i.e. the geography, from the reported states and regions to smaller zones, for example
to county-equivalents in those states and regions.
Correlation is not causation
Disaggregation can be based on any characteristic of the dimensions
of the matrix. For example, a region with two counties can be disaggregated 50%
to each county for both origins and destinations. However it is not expected
that the flows will be correlated with the number of those smaller
zones. For example, it is more likely that origins will be correlated
with employment and that destinations will be correlated with population. If
the two counties had respectively 70% and 30% of the employment and 60% and 40%
of the population, those characteristics might be used as disaggregation
factors instead. However those factors do not consider the commodity or mode. It
may not be possible to develop or obtain factors that are CAUSING the flow. It
is more likely that information on CORRELATED factors will be available. In this case, care
should be taken that the correlation makes sense and is not a spurious
correlation.
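As a worked example of the two-county case, the sketch below splits a single regional flow using the 70%/30% employment shares for origins and the 60%/40% population shares for destinations. The 1,000 tons, the county names, and the assumption that the origin and destination shares can simply be multiplied together are all illustrative assumptions, not something taken from the FAF.

```python
import math

# Hypothetical flow that starts and ends in the same two-county region.
region_flow = 1000.0  # tons

# Origin shares from employment, destination shares from population.
origin_share = {"County 1": 0.70, "County 2": 0.30}
destination_share = {"County 1": 0.60, "County 2": 0.40}

# Assumption: origin and destination shares are independent, so the share
# of each county-to-county cell is the product of the two shares.
county_flows = {
    (o, d): region_flow * o_share * d_share
    for o, o_share in origin_share.items()
    for d, d_share in destination_share.items()
}

for (o, d), tons in county_flows.items():
    print(f"{o} -> {d}: {tons:.0f} tons")

# The disaggregated cells must add back up to the regional flow.
assert math.isclose(sum(county_flows.values()), region_flow)
```

With these shares, County 1 to County 1 receives 42% of the flow and County 2 to County 2 receives 12%, compared with 25% each under the naive equal split.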
Perspective Matters
The nature of the matrix being disaggregated may have information
that can be used in creating disaggregation factors. For example, in the transportation
of freight, there are three functions: shipper, receiver, and carrier. It is
possible for two or more of those functions to be combined: e.g. Dole may carry
goods that it receives in trucks which it owns. Wal-Mart may carry goods in
trucks which it owns, and additionally serve as both the shipper and the receiver,
and that flow may not even be reported if it is an internal business operation.
The source of the matrix may provide information on the dimensions and the disaggregation
factors to be used.
As noted, the FAF is derived from the Commodity Flow Survey
of shippers. The dimension is an origin, but the Domestic Origin is the origin
of the shipper for domestic trade, while the Foreign Origin is the origin of
the shipper for import trade. The Domestic Origin for imports, and the Domestic
Destination for exports, is more properly a Port Of Entry/Exit, POE, which is a
carrier transfer point at which no economic activity of the shipper or receiver
is likely to occur.
Because the flows are as reported by the shipper, the mode
used by the carrier may not be known, only the rate that was charged by the
carrier. Thus a mode in the FAF is Air and Truck, where the shipper pays a rate
that is typical of air, but the carrier might choose operationally to use a
truck. Because the flow is as reported by the shipper, the commodity is always
known, which is why the SCTG2 system, which does not include Freight All Kinds,
FAK, is used. The carrier may not know the commodity and may use the Standard
Transportation Commodity Code, STCC, commodity system, which does include FAK. Similarly
the FAF does not include the packaging of the commodity in containers, while
containers may be important in reporting by carriers.
In carrier focused matrices, the origins may include both
carrier and shipper origins and the destinations may include both carrier and receiver
destinations. A source of economic activity from an origin may only be
pertinent for shipper-receiver matrices, such as the FAF, not carrier matrices
such as Transearch. The first origin and the last destination may also reflect the
economic activity of a customer, or carrier operations, while intermediate
stops, as reported in modal databases, are almost always carrier operations/activity.
Disaggregation Factors should be unique.
The matrix is probably not symmetric and has unique flows
in each cell, and thus the disaggregation factors should also be expected to be
unique. This is the ideal. It may be that disaggregation factors can be created
for several dimensions, but not for all dimensions, in which case something is
better than nothing. For example, import trucks and trains are reported at
Border Crossing Ports of Entry. The commodity contents of those trucks and
trains are not reported, and the export flows are not reported. It can be assumed
that the relative share for vehicles is the same as the relative share for the
commodities in those vehicles, and that the relative share for imports is the same
as the relative share for exports. While this is hardly unique, it is preferable
to making no assumptions at all.
Disaggregation factors should sum to 100%.
Disaggregation factors must sum to 100% for each dimension
for each cell that is being disaggregated. Thus if regions are being disaggregated
to counties, the disaggregation factors for those counties should sum to 100%
for each region. This means that the absolute number of shipments from each
county is not needed to create disaggregation factors, only the relative number
of shipments, activities, for each county in that region.
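A minimal sketch of this normalization follows. The county and region names, the activity values, and the `relative_shares` helper are hypothetical. The point it illustrates is that scaling all of the absolute values by the same amount leaves the factors unchanged.

```python
from collections import defaultdict


def relative_shares(activity, county_to_region):
    """Convert activity by county into shares that sum to 1 within each region."""
    region_totals = defaultdict(float)
    for county, value in activity.items():
        region_totals[county_to_region[county]] += value
    # Assumes every region has some activity; a region with a zero total
    # would need a fallback rule that is not covered here.
    return {
        county: value / region_totals[county_to_region[county]]
        for county, value in activity.items()
    }


county_to_region = {"A": "R1", "B": "R1", "C": "R2", "D": "R2"}
shipments = {"A": 50, "B": 150, "C": 10, "D": 30}

shares = relative_shares(shipments, county_to_region)
print(shares)

# Multiplying every county by the same constant gives the same factors,
# so only relative activity is needed.
scaled = {county: 1000 * value for county, value in shipments.items()}
assert relative_shares(scaled, county_to_region) == shares
```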
The number of disaggregations, mesostates, should be kept as small as possible.
The most probable disaggregation factors depend on having as many microstates as possible in each mesostate. The total number of microstates will not change. Increasing
the number of disaggregation factors for each cell increases the number of mesostates
but decreases the number of microstates in each mesostate.
The most probable is not a
guarantee. For example in the game of craps, the most probable total over
a very large number of rolls is a seven. On the next roll the total may be a
two, but this does not change the fact that seven was still the most probable
total for that roll. To ensure that the most probable event has meaning, the number of events,
i.e. the number of microstates in a mesostate, should be kept as high as possible,
which means that the number of mesostates should be as low as possible.
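The craps example can be checked by counting. In the sketch below each ordered pair of dice faces is treated as a microstate and each total as a mesostate; that mapping onto the dice is my own illustration of the terms used above.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Microstates: every ordered pair of faces on two six-sided dice.
# Mesostates: the totals, each containing one or more microstates.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

total_microstates = sum(ways.values())
probability = {total: Fraction(n, total_microstates) for total, n in ways.items()}

# The most probable mesostate is the one containing the most microstates.
most_probable = max(ways, key=ways.get)

print(most_probable, ways[most_probable], probability[most_probable])
print(2, ways[2], probability[2])
```

Seven contains 6 of the 36 microstates and two contains only 1, so seven is six times as likely as two on any single roll, yet a two can still occur.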
Activities should correlate with a disaggregation factor.
Economic activity may apply to all goods transported from
a shipper to a receiver. However only those economic activities that also use a
mode should be related to any disaggregation of those modal flows. Information
is available on which counties are not served by active rail tracks or navigable
water. This does not mean that the remaining counties are served by rail or water carriers, only
that if the tracks or navigable water do not exist, then service by carriers could not be offered. Such information
is not available for other modes, and as a default all economic activity may be considered
in creating disaggregation factors. While there is no information on trucks, it
is assumed that trucks serve all economic activities.
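One way to apply this rule is to zero out the activity of counties that cannot be served by a mode before normalizing. The following is a sketch under that assumption; the county names, activity values, and the `modal_factors` helper are hypothetical.

```python
def modal_factors(activity, not_serviceable=frozenset()):
    """Shares of a region's activity, excluding counties a mode cannot serve."""
    eligible = {
        county: (0.0 if county in not_serviceable else value)
        for county, value in activity.items()
    }
    total = sum(eligible.values())
    if total == 0:
        raise ValueError("no eligible activity in this region for this mode")
    return {county: value / total for county, value in eligible.items()}


# Hypothetical economic activity for three counties in one region.
activity = {"A": 50.0, "B": 30.0, "C": 20.0}

# Default: trucks are assumed to serve all economic activity.
truck = modal_factors(activity)

# County C is assumed to have no active rail track.
rail = modal_factors(activity, not_serviceable={"C"})

print(truck)
print(rail)
```

County C keeps a truck factor but receives a rail factor of zero, and the rail factors of the other two counties are rescaled so that they still sum to 100%.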
Disaggregation factors in the tool that I developed
Domestic economic activity
For the domestic trade type, the origin is the domestic origin,
and the destination is the domestic destination. The disaggregation factors
should also be by commodity and by mode. As noted above, the Oak Ridge National
Laboratory’s Center for Transportation Analysis has identified counties that are
NOT serviceable by rail or water, and this information can be used in developing relative
disaggregation factors. The economic activity information is not generally
available by commodity but is available by employment in an economic sector,
which is generally the industry by the three-digit North American Industry Classification
System, NAICS3. Economic Input-Output models have Make and Use tables
which relate economic sectors with commodities. The Bureau of Economic Analysis, BEA,
maintains Make and Use tables, and crosswalks can be developed between its
economic sectors and commodities and the NAICS3
industries and SCTG2 commodities. The BEA Make and Use tables are for average percentages
of value by economic sector. These averages are just that, and a firm may be
more or less productive than the average. Additionally the share of employment
may not equate to the share of economic value, and the share of the economic value
may not equate to the share of tonnages. However if these equivalences are assumed, it is
possible to develop the economic activity by commodity for origin (shipper/Make) and destination (receiver/Use).
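The chain from employment to commodity factors can be sketched as follows for the origin (Make) side. Everything here is a placeholder: the industry and commodity labels, the employment counts, and the Make shares are invented for illustration and are not BEA values, and the `origin_factors` helper is not the tool itself. The sketch also relies on the assumptions stated above, that employment share, value share, and tonnage share can be treated as equivalent.

```python
from collections import defaultdict

# Hypothetical employment by county and industry (stand-ins for NAICS3).
employment = {
    "County 1": {"IND_A": 700, "IND_B": 100},
    "County 2": {"IND_A": 300, "IND_B": 900},
}

# Hypothetical Make shares: the portion of each industry's output that is
# each commodity (stand-ins for SCTG2 after crosswalking).
make_share = {
    "IND_A": {"COM_X": 1.0},
    "IND_B": {"COM_Y": 0.8, "COM_Z": 0.2},
}


def origin_factors(employment, make_share):
    """Origin disaggregation factors by commodity and county for one region."""
    activity = defaultdict(dict)  # commodity -> county -> activity
    for county, by_industry in employment.items():
        for industry, jobs in by_industry.items():
            for commodity, share in make_share[industry].items():
                previous = activity[commodity].get(county, 0.0)
                activity[commodity][county] = previous + jobs * share
    # Normalize so that each commodity's factors sum to 1 across counties.
    return {
        commodity: {
            county: value / sum(by_county.values())
            for county, value in by_county.items()
        }
        for commodity, by_county in activity.items()
    }


factors = origin_factors(employment, make_share)
for commodity, by_county in factors.items():
    print(commodity, by_county)
```

The destination (Use) side would follow the same pattern with Use shares in place of Make shares.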
Import carrier activity
The economic activity of imports to the domestic destination
is the same as the domestic economic activity above. The domestic origin for imports
is a Port of Entry, POE. For truck and rail, the Bureau of Transportation Statistics, BTS, Transborder flows can identify
the relative share of carrier activity at each truck and train border crossing in a FAF region.
For water, the US Army Corps of Engineers Waterborne Commerce Statistics Center
reports import tonnages by commodity. It uses the Public Group commodity classification
codes, but those can be crosswalked to SCTG2 commodities. The relative share by
water POE for each commodity for each FAF region can be calculated from this
information. There is no information on the domestic mode which is used at
these POEs. For all other foreign modes there is no information that can be used
to develop disaggregation factors, so the POE is merely relabeled with its
foreign mode and domestic region.
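The water POE calculation amounts to dividing each port's tonnage by the total for its region and commodity. The sketch below assumes the tonnages have already been crosswalked to the FAF's commodities; the region, port, and commodity labels, the tonnages, and the `poe_shares` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical import tonnage records: (FAF region, port, commodity, tons).
records = [
    ("Region 1", "Port P", "COM_X", 600.0),
    ("Region 1", "Port Q", "COM_X", 400.0),
    ("Region 1", "Port P", "COM_Y", 100.0),
    ("Region 2", "Port S", "COM_X", 250.0),
]


def poe_shares(records):
    """Relative share of each port within its (region, commodity) group."""
    totals = defaultdict(float)
    for region, _port, commodity, tons in records:
        totals[(region, commodity)] += tons
    # Assumes every (region, commodity) group has a positive total.
    shares = defaultdict(float)
    for region, port, commodity, tons in records:
        shares[(region, commodity, port)] += tons / totals[(region, commodity)]
    return dict(shares)


shares = poe_shares(records)
for key, share in shares.items():
    print(key, share)
```

A region and commodity with only one reporting port gives that port a factor of 100%.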
Export carrier activity
The economic activity of exports from the domestic origin
is the same as the domestic economic activity above. The domestic destination for
exports, however, is a Port of Exit, POE. For truck and rail, the Bureau of
Transportation Statistics, BTS, Transborder flows can identify the relative share
of carrier import activity at each truck and train border crossing, and it is
assumed that carrier export activity has the same share. For water, the US Army
Corps of Engineers Waterborne Commerce Statistics Center reports export
tonnages by commodity by port waterway. It uses the Public Group commodity classification
codes, but these can be crosswalked to SCTG2 commodities as used by the FAF. The
relative share by water POE for each commodity for each region can be calculated
from this information. There is no information on the domestic mode that is
used at these POEs. For all other foreign modes there is no information that
can be used to develop disaggregation factors, so the POE is merely relabeled
with its foreign mode and domestic region.