Forecasting traffic volume using machine learning and kriging methods
University of Southern California
Department of Preventive Medicine

Forecasting Traffic Volume Using Machine Learning and Kriging Methods

By
Menglin Wang

Master of Science (BIOSTATISTICS)

May 2019

Contents
Abstract
1.      Introduction
2.      Methods
2.1      Data Description and Preprocessing
2.2      Feature Engineering
2.2.1 Time Delay Embedding
2.2.2 Spatiotemporal Distance and Neighborhood
2.2.3 Tendency Indicator
2.2.4 Summary of Feature Engineering
2.3      Machine Learning Methods
2.3.1 Elastic Net Regression
2.3.2 Gradient Boosting Regression Tree (GBRT)
2.4      Kriging
3.      Results
4.      Discussion and Further Approach
5.      References
6.      Appendix

Abstract


Spatiotemporally rich data on traffic volume can provide valuable information for other fields such
as city management and environmental research. Using a massive database of highway and arterial
traffic data provided by the University of Southern California TransDec Data System, machine
learning and kriging methods were used to predict traffic volumes in Long Beach, California. Time
delay embedding, spatiotemporal neighborhood averaging, and tendency indicators served
as feature engineering approaches. The results indicate that the temporal machine learning
prediction technique can provide accurate forecasts of traffic volume at sensor locations, while the
predictive performance of spatial kriging of traffic volumes is poor. The potential reasons for the
failure of kriging and other limitations are also discussed.
 


1.      Introduction

Traffic is an important element of daily life for city dwellers, and understanding the
spatial and temporal patterns in traffic flow can help both individuals and city planners.
Technological advances over the past several decades, including GPS, more accurate
roadway sensors, and mobile devices, have provided researchers with better data, which has
in turn led to improved traffic modeling and prediction.
Timely and accurate traffic flow prediction can not only provide indispensable
information for commuters but can benefit research in many other areas such as air
pollution and noise mitigation [1] and smart city planning [2]. Long-term (days to
months) traffic prediction can provide a valuable means of evaluating future capacity
requirements, resulting in improved planning and decision making. Short-term prediction
(milliseconds to minutes) can inform dynamic resource allotment such as Quality of
Service (QoS) mechanisms and congestion control [3]. With technological advances in
measuring and recording traffic, the quantity of traffic data has exploded, and concurrent
advances in computational tools and capabilities have provided a practical foundation for
managing and analyzing these data.
In this study, we harness a rich database of traffic sensor measurements with the
objective of estimating, characterizing, and predicting the temporal and spatial
distribution of traffic flow on roads in the relatively large urban area of Long Beach,
California. Traffic flow is quantified as a volume, the number of cars that pass
over a sensor every 30 seconds. The approach to predicting traffic volume is carried out
in two parts: 1) temporal forecasting and 2) spatial interpolation. Machine learning
methods, which provide an algorithmic approach, are used to predict future traffic
volumes at the sensor locations. Once we have the forecasted volumes, kriging is applied
to estimate traffic volume where there are no sensors collecting data. A method that
enables accurate estimates and predictions of traffic volumes along roads in a large urban
area would be a valuable tool for researchers interested in the environmental and
health impacts of traffic.



2.      Methods
2.1      Data Description and Preprocessing
The University of Southern California TransDec [6] data system is a collection of traffic
data from sensors placed on highway and arterial (high capacity, non-highway) roads
throughout Los Angeles County. We selected Long Beach, California as our study area
(Figure 1).



Figure 1.  Sensor locations in area of Long Beach, CA  

The sensors are operated by state and local departments of transportation (Caltrans and
LA Metro) and record traffic information including occupancy, speed, volume, time,
HOV lane speed, and sensor status every half minute from 2012 to the present, making the
database massive. The structure of the data from both highway and arterial sensors is
similar, with a configuration table that provides the spatial locations of the sensors and a
history database where the collected traffic data are stored (Figure 2). Documentation that
provides explanations of the columns in the databases can be found in Appendix 1.


HIGHWAY_CONGESTION_CONFIG
    Primary key: CONFIG_ID, LINK_ID, AGENCY
    Columns: CITY, AFFECTED_NUMBEROF_LANES, DATE_AND_TIME, LINK_TYPE, FROMSTREET,
             START_LAT_LONG, ONSTREET, DIRECTION, TOSTREET, POSTMILE
HIGHWAY_CONGESTION_DATA and HIGHWAY_CONGESTION_HISTORY
    Primary key: DATE_AND_TIME, LINK_ID, AGENCY
    Columns: CONFIG_ID, SPEED, HOVSPEED, OCCUPANCY, LINK_STATUS, VOLUME

ARTERIAL_CONGESTION_CONFIG
    Primary key: CONFIG_ID, LINK_ID, AGENCY
    Columns: CITY, AFFECTED_NUMBEROF_LANES, DATE_AND_TIME, LINK_TYPE, FROMSTREET,
             START_LAT_LONG, ONSTREET, DIRECTION, TOSTREET, POSTMILE
ARTERIAL_CONGESTION_DATA and ARTERIAL_CONGESTION_HISTORY
    Primary key: DATE_AND_TIME, LINK_ID, AGENCY
    Columns: CONFIG_ID, SPEED, HOVSPEED, OCCUPANCY, LINK_STATUS, VOLUME

Figure 2. TransDec configuration diagrams: highway database (top) and arterial database (bottom).


The objective of this study is to temporally and spatially characterize longer-term traffic
volume that captures patterns over days and months; therefore, the half-minute data were
averaged to half-hour intervals and all units were consistent after averaging.
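
The averaging step described above can be sketched as follows. This is a minimal illustration, not the preprocessing code used in the study: the record layout `(link_id, timestamp, volume)` and the function name `half_hour_means` are assumptions, and the readings are synthetic.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def half_hour_means(records):
    """Average 30-second readings into 30-minute bins per sensor.

    records: iterable of (link_id, timestamp, volume) tuples (hypothetical layout).
    Returns {(link_id, bin_start): mean volume in cars per half minute}.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for link_id, ts, volume in records:
        # Floor the timestamp to the start of its half-hour bin.
        bin_start = ts.replace(minute=(ts.minute // 30) * 30, second=0, microsecond=0)
        acc = sums[(link_id, bin_start)]
        acc[0] += volume
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

# Synthetic example: one sensor, one hour of 30-second readings.
start = datetime(2012, 6, 1, 0, 0, 0)
records = [("sensor_a", start + timedelta(seconds=30 * k), float(k % 4)) for k in range(120)]
means = half_hour_means(records)
```

Because the values are averaged rather than summed, the unit stays "cars per half minute", which is consistent with the units reported later in Table 1.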


2.2      Feature Engineering
The original data cannot be used as our machine learning training dataset, since the way it
is organized does not reflect spatiotemporal relationships; thus, constructing appropriate
features to capture spatial and temporal information among data points is essential. To
make use of temporal information, lagged features were constructed by time delay
embedding, described in section 2.2.1. In addition, spatiotemporal neighborhood
averaging features described in 2.2.2 provide information not only about temporal
patterns, but also include spatial relations within a neighborhood of a sensor. Finally,
tendency indicators, described in 2.2.3, based on different neighborhoods around a sensor
can reflect the tendency change along a prescribed spatiotemporal distance.
2.2.1 Time Delay Embedding
One task is to forecast the future value of a time series at a certain location, based on
historical data. The most common way to accomplish this is to construct lagged features,
where the target variable is the future value of the series and the predictors are earlier
values. This method converts a time series problem into a multivariable regression problem,
which can be addressed by machine learning. This technique is usually known as time
delay embedding [7]. Four one-step lagged features were constructed, which means we
used the historical traffic data of the previous two hours to predict the volume one half-hour ahead.
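
As a small illustration of the lagging step, the sketch below builds four lagged predictors from a single series. The function name `time_delay_embed` and the example volumes are hypothetical; the study's own feature construction may differ in detail.

```python
def time_delay_embed(series, n_lags=4):
    """Turn a 1-D series into (X, y) pairs with n_lags lagged predictors.

    Row k of X holds series[k : k + n_lags] (oldest first) and y[k] is the
    value one step after that window.
    """
    X, y = [], []
    for k in range(len(series) - n_lags):
        X.append(list(series[k:k + n_lags]))
        y.append(series[k + n_lags])
    return X, y

volumes = [3.0, 4.0, 6.0, 9.0, 8.0, 7.0, 5.0]  # made-up half-hour volumes
X, y = time_delay_embed(volumes, n_lags=4)
# X[0] is the first two hours of data; y[0] is the half-hour that follows.
```

With half-hour data, `n_lags=4` corresponds to the two-hour history mentioned in the text.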
2.2.2 Spatiotemporal Distance and Neighborhood
Future traffic volume is associated not only with the historical volumes at the same
location but also with traffic volumes at nearby locations. Therefore, we
also need to describe the behavior of the time series within the area surrounding the
target location. For this purpose, we define a spatiotemporal distance similar to Ming-yao et
al. [8],

where $d_{i,j}$ is the Euclidean distance between the locations in kilometers, $t_{i,j}$ is
the time distance between the objects (i and j) in hours, and α is a weighting
parameter. In this study, α = 0.5 was used, similar to Ming-yao's original method.
With this definition of spatiotemporal distance, we define the neighborhood, D, of a
point, o, at a certain location and time. The neighborhood can be imagined as a cone, as
illustrated in Figure 3.



Figure 3. Illustration of the space-time neighborhood of point o [9]
 
The neighborhood can provide both temporal and spatial information to our model. As
the spatial distance decreases, the time distance will cover more historical data within a
smaller spatial area, and vice versa. It can be defined as,

where A is all available data, $D_{i,o}$ is the distance between data point i and target point o,
and d is the width of the neighborhood.
In addition to the lagged features in section 2.2.1, neighborhood-based averages with
different window widths (5, 1 and 0.2 km) were also constructed to capture associations
around the sensors.
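
The distance and neighborhood average can be sketched in a few lines of Python. This sketch assumes an additive form, D = d + α·t, which is one reading of the definitions above and not necessarily the exact expression used in this study. It also assumes that the neighborhood contains only earlier observations, that coordinates are planar and in kilometers, and that records use the hypothetical field names `x_km`, `y_km`, `t_h` and `volume`.

```python
import math

ALPHA = 0.5  # weight on the time gap, as stated in the text

def st_distance(p, q, alpha=ALPHA):
    """Assumed additive form: spatial distance (km) + alpha * time gap (hours)."""
    spatial = math.hypot(p["x_km"] - q["x_km"], p["y_km"] - q["y_km"])
    temporal = abs(p["t_h"] - q["t_h"])
    return spatial + alpha * temporal

def neighborhood_mean(points, target, width):
    """Mean volume of earlier points within `width` of `target`, or None if empty."""
    values = [p["volume"] for p in points
              if p["t_h"] < target["t_h"] and st_distance(p, target) < width]
    return sum(values) / len(values) if values else None

# Made-up observations around a target at the origin at 10:00.
points = [
    {"x_km": 0.0, "y_km": 0.0, "t_h": 9.5, "volume": 10.0},
    {"x_km": 0.3, "y_km": 0.4, "t_h": 8.5, "volume": 14.0},
    {"x_km": 3.0, "y_km": 4.0, "t_h": 9.5, "volume": 30.0},
]
target = {"x_km": 0.0, "y_km": 0.0, "t_h": 10.0}
averages = {w: neighborhood_mean(points, target, w) for w in (5.0, 1.0, 0.2)}
```

Narrower widths keep only observations that are both close in space and recent in time, so the three widths give progressively more local averages (and may be empty for the narrowest one).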
2.2.3 Tendency Indicator
Based on the moving averages with different widths, we construct tendency indicators, a
technique commonly used in finance. The tendency indicator is a ratio of two moving
averages at the same spatiotemporal location that captures the general tendency
among the neighborhoods described in 2.2.2 [9]. For instance, if a narrower moving average,
which has a smaller neighborhood distance and covers less spatiotemporal area, surpasses a
wider neighborhood average, the ratio tells us that volumes are on an upward tendency.
The tendency indicator, $T_{N,P}$, can be defined as follows,

$$T_{N,P} = \frac{\bar{M}(N)}{\bar{M}(P)}$$

where $\bar{M}(N)$ is the moving average of the set of points belonging to neighborhood N and
$\bar{M}(P)$ is the moving average of set P, where P is a neighborhood with a different width.
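
A minimal sketch of the ratio, assuming the two neighborhoods have already been collected as plain lists of volumes (the function name and the numbers are illustrative only):

```python
def tendency_indicator(narrow_values, wide_values):
    """Ratio of the narrow-neighborhood mean to the wide-neighborhood mean."""
    narrow_mean = sum(narrow_values) / len(narrow_values)
    wide_mean = sum(wide_values) / len(wide_values)
    if wide_mean == 0:
        return float("nan")  # undefined when the wider average is zero
    return narrow_mean / wide_mean

recent_nearby = [12.0, 14.0]          # volumes in the narrow neighborhood
broader = [12.0, 14.0, 8.0, 6.0]      # volumes in the wide neighborhood
t = tendency_indicator(recent_nearby, broader)
```

A ratio above 1 means the narrower neighborhood has the higher average; a ratio below 1 means the opposite.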


2.2.4 Summary of Feature Engineering
Given the aforementioned feature construction techniques, the original TransDec data
were reconstructed for the purpose of forecasting future (30-minute) traffic volumes at
the sensor locations. In summary, lagged features provide information about earlier
traffic and moving averages within spatiotemporal neighborhoods describe the
information of associated points within a spatiotemporal area. General traffic trends are
reflected through the constructed tendency indicators. The formulated features are
shown below,

where $V_t$ is the data vector at time t including volume, speed and occupancy, $\bar{M}(N_5)$ is the
moving average over the neighborhood of width 5 and $T(N_5, N_1)$ is the tendency indicator of $N_5$ and $N_1$.

2.3      Machine Learning Methods
After feature engineering, two machine learning methods were applied and compared for
forecasting traffic volume: Elastic Net Regression (Elastic Net) and Gradient Boosting
Regression Trees (GBRT). The Python package scikit-learn (sklearn) was used for both methods [11].
2.3.1 Elastic Net Regression
In LASSO, if there is a group of highly correlated features, the method tends to select one
of the variables and shrink the other coefficients to zero; empirically, in this scenario the
predictive performance of LASSO is dominated by that of ridge regression [10]. This property is
determined by the L1 regularization term:

$$\lambda_1 \|\beta\|_1 = \lambda_1 \sum_{j=1}^{p} |\beta_j|$$

To avoid dropping related features and to improve predictive performance, the elastic net
adds an L2 norm to the penalty, which gives ridge regression when used alone:

$$\lambda_2 \|\beta\|_2^2 = \lambda_2 \sum_{j=1}^{p} \beta_j^2$$

The quadratic regularization term makes the loss function strictly convex and thus
improves the performance of the algorithm. The loss function of the elastic net is
expressed as:

$$L(\beta) = \|y - X\beta\|_2^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2^2$$

It has been shown to overcome several limitations of LASSO (L1 penalty only) and ridge
methods (L2 penalty only) [15].
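
The behavior with correlated features can be illustrated with a small from-scratch solver. This is a sketch only: the study used scikit-learn, whereas the routine below is a plain coordinate-descent implementation written for illustration. It assumes centered data with no intercept and the scaled objective (1/(2n))·‖y − Xβ‖² + λ1·‖β‖₁ + (λ2/2)·‖β‖², a common convenience scaling. The names `elastic_net_cd`, `lam1` and `lam2` are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def elastic_net_cd(X, y, lam1, lam2, n_iter=200):
    """Coordinate descent for the scaled elastic net objective described above."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = 0.0
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam1) / (z + lam2)
    return beta

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
X = [[v, v] for v in x]            # two identical (perfectly correlated) features
y = [2.0 * v for v in x]
lasso = elastic_net_cd(X, y, lam1=0.1, lam2=0.0)   # L1 penalty only
enet = elastic_net_cd(X, y, lam1=0.1, lam2=0.5)    # L1 + L2 penalty
```

In this toy case the L1-only fit puts essentially all the weight on the first copy of the feature, while adding the L2 term splits the weight evenly between the two copies, which is the grouping behavior the text describes.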
2.3.2 Gradient Boosting Regression Tree (GBRT)
Gradient boosting (GB) is an ensemble technique that produces a strong prediction
model by combining many weak learners, each of which performs only slightly better than
random guessing. Empirically, gradient boosting methods
usually have good predictive performance and are robust to outliers [11]. The gradient
boosting regression tree is one member of the GB family, and its
algorithm can be expressed as:

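Input:    Dataset D = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}
          Loss function L(y, f(x))
Output:   learner $\hat{f}(x)$
1. Initialization:
   $$f_0(x) = \arg\min_{c} \sum_{i=1}^{N} L(y_i, c)$$
2. For m = 1, 2, ..., M:
   1) For i = 1, 2, ..., N, calculate the negative gradient:
      $$r_{mi} = -\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f = f_{m-1}}$$
   2) Fit a regression tree to $r_{mi}$ and get the leaf nodes $R_{mj}$, j = 1, 2, ..., J
   3) For j = 1, 2, ..., J, calculate:
      $$c_{mj} = \arg\min_{c} \sum_{x_i \in R_{mj}} L(y_i, f_{m-1}(x_i) + c)$$
   4) Update the learner:
      $$f_m(x) = f_{m-1}(x) + \sum_{j=1}^{J} c_{mj} I(x \in R_{mj})$$
3. Get the regression tree learner:
   $$\hat{f}(x) = f_M(x)$$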
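
The boosting loop can be made concrete with a short from-scratch sketch. It is not the scikit-learn implementation used in the study. It assumes squared-error loss (so the negative gradient is the ordinary residual and the best leaf constant is the leaf mean), a single input feature with at least two distinct values, and two-leaf trees (stumps). The `learning_rate` shrinkage factor is an addition that does not appear in the algorithm above. All function names are hypothetical.

```python
def fit_stump(x, r):
    """Best single-split regression tree (two leaves) on 1-D inputs x for targets r."""
    best = None
    for s in sorted(set(x))[:-1]:
        left = [ri for xi, ri in zip(x, r) if xi <= s]
        right = [ri for xi, ri in zip(x, r) if xi > s]
        c_left, c_right = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((ri - c_left) ** 2 for ri in left)
               + sum((ri - c_right) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, s, c_left, c_right)
    _, s, c_left, c_right = best
    return lambda xi: c_left if xi <= s else c_right

def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    f0 = sum(y) / len(y)                 # step 1: constant minimizing squared loss
    stumps = []
    pred = [f0] * len(y)
    for _ in range(n_rounds):            # step 2: one tree per round
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient
        stump = fit_stump(x, residuals)  # leaf means are the optimal leaf constants
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
    # step 3: the final learner is the initial constant plus all tree updates
    return lambda xi: f0 + learning_rate * sum(s(xi) for s in stumps)

def mse(predict, x, y):
    return sum((predict(xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(y)

x = [float(v) for v in range(10)]
y = [v ** 2 for v in x]
model = gradient_boost(x, y)
baseline = sum(y) / len(y)
```

Comparing `mse(model, x, y)` with `mse(lambda xi: baseline, x, y)` shows how much the boosted stumps improve on the constant initial fit for this toy series.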
 
2.4      Kriging
Because sensor data are limited but it is desirable to estimate traffic volumes along the spatial
continuum of roads, we test spatial prediction of the forecasted traffic volumes using
kriging. Kriging is a standard geostatistical technique for spatial prediction and
interpolation that forms a weighted average of the observations with minimum mean-squared error.
Kriging weights are determined by the semi-variogram function and satisfy two
statistical optimality criteria: 1) unbiasedness and 2) minimum mean-squared prediction
error [12]. Statistically, using observed spatial locations $s_i$ (i.e., georeferenced traffic
sensor locations in latitude and longitude) to interpolate at an unobserved location $s_0$
with the predictor $\hat{Z}(s_0) = \sum_i \lambda_i Z(s_i)$, these
two principles can be expressed as

$$E[\hat{Z}(s_0) - Z(s_0)] = 0$$

$$\min_{\lambda} \; E\left[(\hat{Z}(s_0) - Z(s_0))^2\right]$$

We used ordinary kriging with an exponential variogram model using Python's pykrige
package [14]. Ordinary kriging does not introduce any trend or covariate spatial
information to predict spatial traffic volume; it only uses the spatial variance-covariance
information through the exponential function of distance between sensor points for
interpolation.
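
To show what the ordinary kriging system looks like, here is a small from-scratch sketch. It is not the pykrige code used in the study. It assumes planar coordinates, no nugget, and an exponential variogram of the form sill·(1 − exp(−h/range)) with made-up parameters (pykrige parameterizes its models differently). The function names and sensor values are hypothetical.

```python
import math

def exp_variogram(h, sill=1.0, rng=2.0):
    """Assumed exponential variogram with no nugget: zero at h = 0."""
    return sill * (1.0 - math.exp(-h / rng))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ordinary_kriging(sites, values, target):
    """Return (prediction, weights) at target from observed sites and values."""
    n = len(sites)
    # Variogram matrix bordered by ones: the extra row forces weights to sum to 1.
    A = [[exp_variogram(math.dist(sites[i], sites[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    A.append([1.0] * n + [0.0])
    b = [exp_variogram(math.dist(s, target)) for s in sites] + [1.0]
    weights = solve(A, b)[:n]  # last unknown is the Lagrange multiplier
    return sum(w * z for w, z in zip(weights, values)), weights

sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
values = [10.0, 12.0, 8.0, 20.0]
estimate, weights = ordinary_kriging(sites, values, (0.5, 0.5))
at_sensor, _ = ordinary_kriging(sites, values, sites[0])
```

The sum-to-one constraint on the weights is what gives unbiasedness, and with no nugget the predictor reproduces the observed value when the target coincides with a sensor.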









3.      Results
TransDec data for the Long Beach, CA area from the entire month of June 2012 were used as the
training dataset, and data from the first five hours (00:00:00 – 05:00:00) of July 1,
2012 for the same area were used as the test set. Descriptive statistics of the datasets can be found in
Table 1.

Table 1. Descriptive statistics of training and test data

                   Highway (108 sensors)         Arterial (18 sensors)         Overall (126 sensors)
Set    Statistic   Occupancy*  Speed   Volume    Occupancy   Speed   Volume    Occupancy   Speed   Volume
Train  Mean        5.99        63.54   11.55     2.86        31.72   4.37      5.66        60.26   10.81
       Std         7.99        8.12    12.67     3.27        7.15    2.87      7.70        12.57   12.23
       Max         75.00       70.00   107.57    39.57       50.00   15.71     75.00       70.00   107.57
       Min         0.00        4.33    0.00      0.00        1.17    0.00      0.00        1.17    0.00
Test   Mean        1.99        66.60   3.23      0.25        28.09   1.25      1.85        63.75   3.08
       Std         7.39        3.66    3.35      0.26        7.07    0.81      7.13        10.86   3.27
       Max         75.00       70.00   28.52     1.04        41.25   3.81      75.00       70.00   28.52
       Min         0.00        42.81   0.00      0.00        11.79   0.09      0.00        11.79   0.00

* Units of occupancy, speed and volume are percentage of time, miles/h and number of cars per half minute, respectively


On the training and test datasets, ElasticNet (Figure 4) and GBRT (Figure 5) have
satisfactory predictive performance in forecasting 30-minute traffic volumes. Both
methods have similar test $R^2$ (0.91 and 0.90, respectively), RMSE (0.63 for both) and
error rates (Table 2). The error rate is defined as the ratio of the RMSE to the mean traffic
volume, and reflects the extent to which forecasted volumes deviate from the ground
truth.
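
The three reported statistics can be computed as in the sketch below. The function name `forecast_metrics` and the numbers are illustrative and are not taken from the study's data.

```python
import math

def forecast_metrics(y_true, y_pred):
    """Return R^2, RMSE and error rate (RMSE divided by the mean observed value)."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    return {"r2": 1.0 - ss_res / ss_tot, "rmse": rmse, "error_rate": rmse / mean_true}

observed = [2.0, 4.0, 6.0, 8.0]   # made-up volumes
forecast = [3.0, 3.0, 7.0, 7.0]
m = forecast_metrics(observed, forecast)
```

Because the error rate divides by the mean volume, it is comparable across datasets with different traffic levels. Note also that $R^2$ computed this way becomes negative whenever the forecasts fit worse than simply predicting the mean.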


   
Figure 4.  ElasticNet forecasting results

Figure 5. GBRT forecasting results

Table 2. Performance statistics of ElasticNet and GBRT for 30-minute forecasting  


Method                    Data       R^2        RMSE       Error Rate
Elastic Net Regression    Training   0.967602   2.200636   0.203502
                          Testing    0.905806   0.625012   0.294160
GBRT                      Training   0.983217   1.583909   0.146471
                          Testing    0.903354   0.633095   0.297964



Surprisingly, the RMSEs of the test sets are lower than those of the training sets, likely due to
the low average volume and variance of the test set. Unlike RMSE, the error rate
reflects the relative extent of deviation from the true volumes, adjusted for the scale of the data used.


Figure 6. Forecasting performance of ElasticNet (top) and GBRT (bottom)
on highway (left) and arterial (right).

While elastic net regression and GBRT had similar performance over the combined road
types, when we evaluate them separately ElasticNet substantially outperforms GBRT,
particularly on arterial roads (Table 3).

Table 3. ElasticNet and GBRT performance statistics separated by road type.

Methods        Road Type   R^2        RMSE       Error Rate
Elastic Net    Highway     0.98841    0.252489   0.102247
               Arterial    0.947385   0.129079   0.125385
GBRT           Highway     0.88333    0.801078   0.3244
               Arterial    -0.03702   0.573053   0.556654



Figure 7. Kriging prediction for ElasticNet and GBRT forecasted traffic volumes  
at 7/1/2012 01:30:00

Leave-one-sensor-out cross-validation was used to evaluate the performance of spatial
prediction of the forecasted traffic volumes. For comparison, forecasts from both ElasticNet
and GBRT were used in spatial prediction. Figure 7 and Table 4 show the general predictive
deviations from the ground truth observed at the left-out sensor. Among the three source
datasets in Table 4, kriging of ElasticNet forecasts performed similarly to kriging
of the true volumes, while kriging of GBRT forecasts performed worse. Moreover, GBRT kriging
had a notably low $R^2$ of 0.386551 on arterial sensors, which might be due to the poor
GBRT forecasts of arterial volumes ($R^2$ = -0.03702) used as input data. In general,
kriging performance on highways surpassed the performance on arterial roads.
This might be because far more highway sensors were involved (108 highway sensors vs 18
arterial sensors). However, the kriging performance
was not sufficient to capture the traffic distribution at different locations, even with the
true volumes as the data source, which means we might need to improve the kriging method
to get a better result.
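
The leave-one-sensor-out procedure can be sketched as follows. To keep the example short and self-contained, it uses inverse-distance weighting as a simple stand-in for the kriging predictor. This is a deliberate substitution and not the method evaluated in the study. Any predictor with the same call signature could be passed in instead. The function names and sensor values are hypothetical.

```python
import math

def idw_predict(sites, values, target, power=2.0):
    """Inverse-distance-weighted average; a simple stand-in for the kriging step."""
    num = den = 0.0
    for s, z in zip(sites, values):
        d = math.dist(s, target)
        if d == 0.0:
            return z
        w = 1.0 / d ** power
        num += w * z
        den += w
    return num / den

def leave_one_sensor_out(sites, values, predict=idw_predict):
    """Predict each sensor from all the others; return the held-out predictions."""
    preds = []
    for k in range(len(sites)):
        train_sites = sites[:k] + sites[k + 1:]
        train_values = values[:k] + values[k + 1:]
        preds.append(predict(train_sites, train_values, sites[k]))
    return preds

sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]   # made-up sensor coordinates (km)
values = [4.0, 6.0, 8.0]                        # made-up volumes
preds = leave_one_sensor_out(sites, values)
rmse = math.sqrt(sum((p - z) ** 2 for p, z in zip(preds, values)) / len(values))
```

Each sensor's prediction uses only the remaining sensors, so the resulting errors estimate how well the interpolation would do at a location without a sensor.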





Table 4. Kriging performance statistics separated by source data and road type

Source Data        Road Type   Error Rate   R^2        RMSE
ElasticNet         Highway     0.776725     0.682805   1.918056
                   Arterial    0.616184     0.707667   0.634336
                   Overall     0.783046     0.684257   1.850142
GBRT               Highway     0.793890     0.668630   1.960445
                   Arterial    0.892607     0.386551   0.918904
                   Overall     0.805134     0.666193   1.902331
Observed Volume    Highway     0.777091     0.682506   1.918960
                   Arterial    0.637294     0.687294   0.656268
                   Overall     0.784051     0.683446   1.852518


Figure 8. ElasticNet kriging predicted volumes vs observed volumes





4.      Discussion and Further Approach
There are several challenges in predicting traffic flow. First, the distribution of measured
traffic volume is inherently spatiotemporally dynamic, varying from sensor to sensor.
Moreover, the distribution at a given time contains complex interactions and
nonlinearities [13], and it might be impossible to build a statistical model that aptly
captures these characteristics. Finally, even with a significant amount of spatiotemporal
data, relying on fixed sensors can be a limitation, especially when one is trying to address
the spatial distribution on a mix of road types. The correlations between these sensors are
also not strong, which makes prediction at new locations difficult.
In this study, the performance of temporal forecasting at 30-minute intervals is
promising, while spatial interpolation by kriging is less so. Difficulty in spatial
interpolation could be due to the highly non-linear nature of traffic distribution, the
non-random placement of sensors only on highways and arterial roads, and the prescribed
along-road directionality of sensors. These spatial features of the data make the
continuous Gaussian random field assumption of kriging untenable for roadway
predictions. Future work could incorporate the concept of network-based kriging, which
rather than taking all pairwise Euclidean distances uses non-Euclidean linked distances along a
network (road) [16].
Overall, when we consider the spatial and temporal distribution of traffic volume,
traditional theoretical approaches may be inefficient due to the complexity of the data.
Considering the high nonlinearity and rich temporal information of the data, deep
learning may prove to be a promising means of prediction and forecasting. Similar to what
we did for temporal prediction, an algorithmic approach might be a possible way to solve
the spatial problem. However, from the view of the data, the spatial component is still
limited, even though we had a massive amount of temporal information at each location.
To model this complex distribution over a larger area, we should consider increasing the
spatial domain and linking the sensor data along the roads by a network in order to
provide a more realistic spatial interpolation of traffic volumes.





5.      References
1.     Franklin, M., & Fruin, S. (2017). The role of traffic noise on the association between air
pollution and children’s lung function. Environmental Research, 157(April), 153–159.
http://doi.org/10.1016/j.envres.2017.05.024
2.   Batty, M. (2013). Big data, smart cities and city planning. Dialogues in Human
Geography, 3(3), 274–279. http://doi.org/10.1177/2043820613513390
3.    Joshi, M., & Hadi, T. H. (2015). A Review of Network Traffic Analysis and Prediction
Techniques. Retrieved from http://arxiv.org/abs/1507.05722
4.    Munoz, L., Xiaotian Sun, Horowitz, R., & Alvarez, L. (2003). Traffic density estimation
with the cell transmission model. Proceedings of the 2003 American Control Conference,
2003., 5(July), 3750–3755. http://doi.org/10.1109/ACC.2003.1240418
5.    Lv, Y., Duan, Y., Kang, W., Li, Z., & Wang, F. Y. (2015). Traffic Flow Prediction with
Big Data: A Deep Learning Approach. IEEE Transactions on Intelligent Transportation
Systems, 16(2), 865–873. https://doi.org/10.1109/TITS.2014.2345663
6.    Integrated Media Systems Center, USC. (2017) Our cutting-edge, interdisciplinary
research solves real-world problems both within and beyond the domain of
transportation. Retrieved from https://imsc.usc.edu/platforms/transdec/
7.    F. Takens, ‘Detecting strange attractors in turbulence’, Dynamical systems and
turbulence Warwick 1980, 898(1), 366–381, (1981)
8.    Q. Ming-yao, M. Li-xin, and S. Jie, ‘A spatio-temporal distance based two-phase
heuristic algorithm for vehicle routing problem’, in Fifth International Conference on
Natural Computation, ICNC’09, pp. 352–357. IEEE, (2009).
9.    Ohashi, O., & Torgo, L. (2012). Wind speed forecasting using spatio-temporal indicators.
Frontiers in Artificial Intelligence and Applications, 242, 975–980.
https://doi.org/10.3233/978-1-61499-098-7-975
10. Zou, H., & Hastie, T. (2005). elasticnet: Elastic Net regularization and variable selection.
R Package Version, 94305. https://doi.org/10.1037/h0100860
11. Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830,
2011.
12. Diggle, P. (2009). Applied Spatial Statistics for Public Health Data. Journal of the
American Statistical Association (Vol. 100). https://doi.org/10.1198/jasa.2005.s15
13. Dixon, M. F., Polson, N. G., & Sokolov, V. O. (2018). Deep learning for spatio-temporal
modeling: Dynamic traffic flows and high frequency trading. Applied Stochastic Models
in Business and Industry, 1–35. http://doi.org/10.1039/AN9941901659
14. PyKrige developers Revision, (2017). PyKrige Contents. Retrieved from
https://pykrige.readthedocs.io/en/latest/index.html
15. Hastie. (2009). Springer Series in Statistics The Elements of Statistical Learning, 27(2),
83–85. https://doi.org/10.1007/b94608
16. Zou H, Yue Y, Li Q, Yeh AGO. 2012. An improved distance metric for the interpolation
of link-based traffic data using kriging: A case study of a large-scale urban road network.
Int J Geogr Inf Sci 26:667–689; doi:10.1080/13658816.2011.609488.









6.      Appendix
Appendix 1 Highway and Arterial Database Dictionary

Column          Description
CONFIG_ID       Configuration version of inventory information
AGENCY          Agency that provided the record
DATE_AND_TIME   Date and time
LINK_ID         Sensor ID
OCCUPANCY       The percentage of time a sensor detects a vehicle in 30 seconds
                ●  For example, an occupancy of 5% means that of those 30 seconds,
                   vehicle presence was detected for an aggregate 1.5 seconds
SPEED           Distance traveled per unit time, as used in traffic operations (mile/h)
                ●  mean speeds within a given roadway section on a freeway
                ●  mean speeds within a given roadway section on an arterial
VOLUME          The number of vehicles that passed a sensor every 30 seconds
HOVSPEED        Speed on HOV lane
LINK_STATUS     Status of sensor: "OK", "failed" or "unknown"


Appendix 2 Highway and Arterial Configuration Dictionary  

Column                    Description
CONFIG_ID                 Configuration version of inventory information
AGENCY                    Agency that provided the record
CITY                      City name
DATE_AND_TIME             Update date and time
LINK_ID                   Sensor ID
LINK_TYPE                 Sensor type
ONSTREET                  Street that sensor is on
START_LAT_LONG            Latitude and longitude of sensor
DIRECTION                 0->North, 1->South, 2->East, 3->West
POSTMILE                  Distance from a specific end of the road
AFFECTED_NUMBEROF_LANES   Affected number of lanes
Asset Metadata
Creator Wang, Menglin (author) 
Core Title Forecasting traffic volume using machine learning and kriging methods 
Contributor Electronically uploaded by the author (provenance) 
School Keck School of Medicine 
Degree Master of Science 
Degree Program Biostatistics 
Publication Date 04/25/2019 
Defense Date 04/24/2019 
Publisher University of Southern California (original), University of Southern California. Libraries (digital) 
Tag kriging,machine learning,oai:digitallibrary.usc.edu:usctheses,OAI-PMH Harvest,spatiotemporal prediction,Traffic 
Format application/pdf (imt) 
Language English
Advisor Franklin, Meredith (committee chair), Berhane, Kiros (committee member), Marjoram, Paul (committee member) 
Creator Email 963491922@qq.com,menglinw@usc.edu 
Permanent Link (DOI) https://doi.org/10.25549/usctheses-c89-146508 
Unique identifier UC11662792 
Identifier etd-WangMengli-7265.pdf (filename),usctheses-c89-146508 (legacy record id) 
Legacy Identifier etd-WangMengli-7265.pdf 
Dmrecord 146508 
Document Type Thesis 
Format application/pdf (imt) 
Rights Wang, Menglin 
Type texts
Source University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection) 
Access Conditions The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law.  Electronic access is being provided by the USC Libraries in agreement with the a... 
Repository Name University of Southern California Digital Library
Repository Location USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA