Order Demand Prediction and Anomaly-point Identification for Online Car-hailing Orders Based on Hybrid Machine Learning Framework
Abstract: Urban ride-hailing order demand reflects residents' travel activity and reveals travel patterns and their intrinsic characteristics. Accurately identifying anomalies in complex, dynamic, time-varying data, and scheduling accordingly, is a key step in optimizing the capacity of a ride-hailing platform. A time series of ride-hailing order demand is constructed and its dynamic characteristics are analyzed. On this basis, a hybrid machine learning model for predicting ride-hailing order demand, ARIMA-BPNN-DSR (ABD), is proposed. The model fuses an autoregressive integrated moving average (ARIMA) model and a back propagation neural network (BPNN) through the dynamic selection of regression (DSR) algorithm. It combines the robustness of statistical methods with the efficiency of machine learning methods, and it takes into account how each base model performs in the local region of the data space.
Experiments and comparative analyses are conducted on Didi ride-hailing order data from Xiamen City for 2019 (without the epidemic) and 2020 (with the epidemic). The results show that: ① The ABD model achieves the best prediction performance among the models compared, predicts peak demand accurately, and also performs well under the external disturbance of the epidemic, which suggests that the ensemble strategy improves prediction accuracy. ② Ablation experiments show that, on the regular (2019) series, BPNN contributes more to the performance of the fused model. Compared with the standalone ARIMA and BPNN models, ABD reduces the mean absolute error (MAE) by 22.77% and 13.50% and the mean absolute percentage error (MAPE) by 21.71% and 12.37%, respectively. Under the external disturbance of 2020, the stability provided by ARIMA is essential. ③ Applying the 3-sigma anomaly detection criterion to the residuals between predictions and observations automatically identifies demand-surge anomalies in the order data, which can improve the efficiency of traffic management. Overall, the proposed ABD model shows good prediction accuracy and robustness.
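The fusion step described in the abstract can be illustrated with a minimal sketch of dynamic selection. This is not the authors' implementation: the distance metric (Euclidean), the local error measure (mean absolute error), the choice to select a single model rather than weight several, and all names (`dsr_predict`, `model_preds`, the toy data) are assumptions made for illustration. For each query point, the sketch looks up the K nearest training samples and returns the prediction of whichever base model was most accurate on those neighbours.

```python
import math


def dsr_predict(x_query, train_X, train_y, model_preds, k=5):
    """Select the base model with the lowest error near x_query.

    train_X:     list of training feature vectors (lists of floats)
    train_y:     observed targets for the training vectors
    model_preds: dict name -> (predictions on train_X, prediction for x_query)
    Returns (name of the selected model, its prediction for x_query).
    """
    # Indices of the k training samples closest to the query point.
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], x_query))
    neighbours = order[:k]

    best_name, best_err = None, math.inf
    for name, (train_hat, _) in model_preds.items():
        # Mean absolute error of this model on the local neighbourhood only.
        err = sum(abs(train_y[i] - train_hat[i]) for i in neighbours) / len(neighbours)
        if err < best_err:
            best_name, best_err = name, err
    return best_name, model_preds[best_name][1]


# Toy data (hypothetical): "arima" is accurate at low values, "bpnn" at high values.
train_X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
train_y = [10.0, 20.0, 30.0, 100.0, 110.0, 120.0]
arima_train = [10.0, 20.0, 30.0, 90.0, 95.0, 100.0]
bpnn_train = [15.0, 25.0, 35.0, 100.0, 110.0, 120.0]

low = dsr_predict([2.5], train_X, train_y,
                  {"arima": (arima_train, 24.0), "bpnn": (bpnn_train, 30.0)}, k=3)
high = dsr_predict([11.5], train_X, train_y,
                   {"arima": (arima_train, 97.0), "bpnn": (bpnn_train, 115.0)}, k=3)
print(low, high)
```

Selecting per query point is what lets the hybrid follow whichever base model is locally stronger, which is consistent with the abstract's remark that BPNN contributes more on regular series while ARIMA provides stability under disturbance.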
Table 1. Results of the normality test

| Indicator | Shapiro-Wilk statistic | df | Sig. |
|---|---|---|---|
| Daily order volume | 0.994 | 286 | 0.363 |
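Table 2. Hyperparameters of the fusion model

| Model | Parameter | Value | Definition |
|---|---|---|---|
| ARIMA | p | 1 | Partial autocorrelation (autoregressive) order |
| ARIMA | d | 0 | Differencing order |
| ARIMA | q | 0 | Autocorrelation (moving-average) order |
| BPNN | Learning rate | 0.01 | Step size of each update |
| BPNN | Hidden units | 3 | Dimension to which the features are scaled |
| BPNN | Backpropagation algorithm | Adam | Method used to update the network parameters |
| BPNN | Epochs | 200 | Number of full passes of the network over the training set |
| DSR | K | 5 | Number of training samples nearest to the test sample that are selected |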
Table 3. Prediction accuracy of the fusion model (ABD) and the baseline models

| Metric | ABD | RF (baseline) | XGBoost (baseline) |
|---|---|---|---|
| MAE (×10^4) | 1.73 | 1.98 | 2.21 |
| MAPE (%) | 5.95 | 6.83 | 7.55 |
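For reference, the two metrics reported in the tables have the following standard definitions. The symbols are assumptions, since this section does not define them: $y_t$ is the observed order volume on day $t$, $\hat{y}_t$ is the predicted value, and $n$ is the number of test days.

```latex
\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| y_t - \hat{y}_t \right|,
\qquad
\mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|
```

MAE is expressed in the units of the data (here, orders scaled by $10^4$), while MAPE is scale-free, which is why the tables report both.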
Table 4. Prediction accuracy in the ablation experiment on the 2019 data

| Metric | BPNN | ARIMA | ABD |
|---|---|---|---|
| MAE (×10^4) | 2.00 | 2.24 | 1.73 |
| MAPE (%) | 6.79 | 7.60 | 5.95 |
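The percentage reductions quoted in the abstract can be recomputed from the 2019 ablation values. The short check below assumes the reduction is measured relative to the standalone model, that is, (baseline - ABD) / baseline; the function and variable names are illustrative.

```python
def relative_reduction(baseline, hybrid):
    """Percentage by which the hybrid error is lower than the baseline error."""
    return (baseline - hybrid) / baseline * 100.0


# Values copied from the 2019 ablation table (MAE in units of 10^4, MAPE in %).
table4 = {
    "MAE": {"BPNN": 2.00, "ARIMA": 2.24, "ABD": 1.73},
    "MAPE": {"BPNN": 6.79, "ARIMA": 7.60, "ABD": 5.95},
}

reductions = {
    (metric, base): round(relative_reduction(row[base], row["ABD"]), 2)
    for metric, row in table4.items()
    for base in ("ARIMA", "BPNN")
}

for (metric, base), value in reductions.items():
    print(f"{metric} vs {base}: {value:.2f}%")
```

Under that assumption the results are 22.77% and 13.50% for MAE and 21.71% and 12.37% for MAPE, matching the figures in the abstract.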
Table 5. Prediction accuracy in the ablation experiments on the 2019 and 2020 data

| Metric | Period | BPNN | ARIMA | ABD |
|---|---|---|---|---|
| MAE (×10^4) | 2019 | 2.00 | 2.24 | 1.73 |
| MAE (×10^4) | 2020 | 4.30 | 2.15 | 2.07 |
| MAPE (%) | 2019 | 6.79 | 7.60 | 5.95 |
| MAPE (%) | 2020 | 15.29 | 7.45 | 7.15 |
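The anomaly-identification step mentioned in the abstract (the 3-sigma criterion applied to prediction residuals) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' procedure: the residual is taken as observed minus predicted, only positive deviations are flagged because the abstract speaks of demand surges, the mean and standard deviation are estimated from the same residuals being tested, and the function name and toy data are hypothetical.

```python
import statistics


def three_sigma_surges(observed, predicted, n_sigma=3.0):
    """Return indices whose residual exceeds the mean residual by n_sigma std devs."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    mu = statistics.mean(residuals)
    sigma = statistics.pstdev(residuals)  # population standard deviation
    if sigma == 0:
        return []  # all residuals identical, nothing stands out
    # One-sided test: only unusually high demand relative to the forecast.
    return [i for i, r in enumerate(residuals) if r - mu > n_sigma * sigma]


# Toy series (hypothetical): flat forecast, small noise, one surge on day 12.
predicted = [100.0] * 20
observed = [101.0 if i % 2 == 0 else 99.0 for i in range(20)]
observed[12] = 160.0

surge_days = three_sigma_surges(observed, predicted)
print(surge_days)
```

On very short series a single extreme value inflates the standard deviation and can hide itself, so in practice the mean and standard deviation might be estimated from a reference window rather than from the points under test.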