Evolving a Bayesian network model with information flow for time series interpolation of multiple ocean variables

Ming Li, Ren Zhang, Kefeng Liu

Ming Li, Ren Zhang, Kefeng Liu. Evolving a Bayesian network model with information flow for time series interpolation of multiple ocean variables[J]. Acta Oceanologica Sinica, 2021, 40(7): 249-262. doi: 10.1007/s13131-021-1734-1

Funds: The National Natural Science Foundation of China under contract Nos 41875061 and 41976188; the “Double First-Class” Research Program of National University of Defense Technology under contract No. xslw05.
    Figure 1.  Technical flowchart of IFBN.

    Figure 2.  0/1 optimization solution of the network.

    Figure 3.  Optimization result of the GS algorithm.

    Figure 4.  Interpolation results with a data missing rate of 40%: WD (a), SST (b), SLP (c), SAL (d), RH (e), DEN (f).

    Figure 5.  MRE of each model under different missing rates.

    Figure 6.  R of each model with different lengths of consecutive missing data.

    Figure 7.  MRE of each model with different lengths of consecutive missing data.

    Figure A1.  Interpolation results with a data missing rate of 50%: WD (a), SST (b), SLP (c), SAL (d), RH (e), DEN (f).

    Figure A2.  Interpolation results with a data missing rate of 60%: WD (a), SST (b), SLP (c), SAL (d), RH (e), DEN (f).

    Algorithm 1 GS algorithm
    Input: $ V $ is the variable set, $ D $ is the complete data set of $ V $, $ {G}_{0} $ is an initial structure
    Output: $ G $ is the optimal structure
    Step 1: Score the initial network structure, $ {G}_{0}\to oldscore $;
    Step 2: Perform arc addition and arc reduction, determine the arc direction by IF, and score the new network structure, $ {G}'\to tempscore $;
    if $ tempscore>oldscore $
        $ newscore\equiv tempscore $ and keep the corresponding arc operation;
    else
        $ newscore\equiv oldscore $ and discard the corresponding arc operation;
    end if
    Step 3: If $ newscore\to \max $
        return $ G\equiv {G}' $
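
    The steps of Algorithm 1 can be sketched in Python as a hill-climbing loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `greedy_search`, `score` and `allowed` are hypothetical, the scoring function is supplied by the caller (the listing does not say which score is used), and the IF-based direction test is reduced to a caller-supplied predicate.

```python
from itertools import permutations


def has_cycle(nodes, arcs):
    """Return True if the directed graph (nodes, arcs) contains a cycle."""
    children = {n: [v for (u, v) in arcs if u == n] for n in nodes}
    state = {n: 0 for n in nodes}  # 0 = unvisited, 1 = on stack, 2 = finished

    def visit(n):
        state[n] = 1
        for c in children[n]:
            if state[c] == 1 or (state[c] == 0 and visit(c)):
                return True
        state[n] = 2
        return False

    return any(state[n] == 0 and visit(n) for n in nodes)


def greedy_search(nodes, score, allowed, initial_arcs=()):
    """Apply the best single-arc change until no change improves the score.

    score(arcs)   -> float, higher is better (assumed, caller-supplied)
    allowed(u, v) -> bool, True if the direction u -> v is admissible
                     (stands in for the IF-based direction test)
    """
    arcs = frozenset(initial_arcs)
    old_score = score(arcs)                      # Step 1: score G0
    while True:
        best_arcs, best_score = arcs, old_score
        for u, v in permutations(nodes, 2):      # Step 2: try each arc operation
            if (u, v) in arcs:
                candidate = arcs - {(u, v)}      # arc reduction
            elif allowed(u, v):
                candidate = arcs | {(u, v)}      # arc addition
                if has_cycle(nodes, candidate):
                    continue                     # a Bayesian network must stay acyclic
            else:
                continue
            temp_score = score(candidate)
            if temp_score > best_score:          # keep only improving operations
                best_arcs, best_score = candidate, temp_score
        if best_arcs == arcs:                    # Step 3: score can no longer grow
            return arcs, old_score
        arcs, old_score = best_arcs, best_score
```

    Taking the best improving operation per pass, rather than the first one found, is a design choice of this sketch; the listing does not specify the order in which arc operations are tried.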

    Table  1.   Discretization standard of ocean variables

    | Ocean variable | WD | SST | SLP | SAL | RH | DEN |
    |---|---|---|---|---|---|---|
    | Interval | 1 m/s | 0.1°C | 1 Pa | 0.1‰ | 1% | 0.1 kg/m³ |
    | State label | 1–10 | 1–26 | 1–9 | 1–17 | 1–24 | 1–19 |
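
    A discretization of this kind can be sketched as equal-width binning. The interval widths and state counts below are taken from Table 1; the lower bound of the first interval is not given there, so `lower_bound` is a hypothetical parameter that the caller must supply, and `to_state` is an illustrative name.

```python
import math

# Interval widths and numbers of states from Table 1.
INTERVALS = {"WD": 1.0, "SST": 0.1, "SLP": 1.0, "SAL": 0.1, "RH": 1.0, "DEN": 0.1}
N_STATES = {"WD": 10, "SST": 26, "SLP": 9, "SAL": 17, "RH": 24, "DEN": 19}


def to_state(value, variable, lower_bound):
    """Map a continuous value to a 1-based state label by equal-width binning.

    lower_bound is the assumed left edge of state 1 (not given in Table 1).
    Values outside the covered range are clipped to the first or last state.
    """
    # The small epsilon guards against floating-point error at bin edges.
    index = math.floor((value - lower_bound) / INTERVALS[variable] + 1e-9) + 1
    return min(max(index, 1), N_STATES[variable])
```

    For example, with an assumed lower bound of 0 m/s, a wind speed of 3.4 m/s falls in state 4.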

    Table  2.   Discrete training time series

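    Training data:
    | Variable | 1 d | 2 d | 3 d | 4 d | 5 d | ··· | 500 d | 600 d |
    |---|---|---|---|---|---|---|---|---|
    | WD/(m·s⁻¹) | 3 | 2 | 3 | 3 | 5 | ··· | 2 | 5 |
    | SST/°C | 25 | 24 | 24 | 22 | 24 | ··· | 19 | 18 |
    | SLP/Pa | 2 | 3 | 2 | 3 | 3 | ··· | 6 | 7 |
    | SAL/‰ | 15 | 15 | 15 | 10 | 8 | ··· | 15 | 15 |
    | RH/% | 8 | 12 | 14 | 17 | 10 | ··· | 9 | 7 |
    | DEN/(kg·m⁻³) | 10 | 11 | 11 | 7 | 6 | ··· | 12 | 12 |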

    Table  3.   Standardized IF matrix

    | Variable | WD | SST | SLP | SAL | RH | DEN |
    |---|---|---|---|---|---|---|
    | WD | \ | 0.1336 | –0.0025 | –0.0126 | 0.0238 | 0.0870 |
    | SST | –0.0003 | \ | 0.0052 | 0.1947 | 0.1099 | 0.2391 |
    | SLP | 0.0119 | –0.0118 | \ | 0.0272 | 0.0060 | –0.0239 |
    | SAL | 0.0030 | 0.1193 | 0.0371 | \ | 0.0171 | 0.0886 |
    | RH | 0.0394 | 0.0962 | 0.0067 | –0.0202 | \ | –0.0186 |
    | DEN | –0.0052 | 0.2593 | 0.0281 | 0.3474 | 0.0669 | \ |
    Note: \ means it cannot be calculated.
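
    For readers unfamiliar with information flow (IF), the sketch below computes the bivariate maximum-likelihood estimate of Liang (2014) from two time series. It is an illustration only: it returns the unnormalized flow, whereas Table 3 reports standardized values whose normalization is not reproduced here, and the table does not state which of row or column is the source. The function names are hypothetical.

```python
def covariance(a, b):
    """Sample covariance of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def information_flow(target, source, dt=1.0):
    """Estimate the information flow from `source` to `target`.

    With index 1 = target and index 2 = source, this evaluates
        T_{2->1} = (C11*C12*C2d1 - C12^2*C1d1) / (C11^2*C22 - C11*C12^2),
    where Cij are sample covariances and Cid1 is the covariance between
    series i and the forward-difference derivative of the target.
    """
    d_target = [(b - a) / dt for a, b in zip(target, target[1:])]
    x1, x2 = target[:-1], source[:-1]   # align with the difference series
    c11 = covariance(x1, x1)
    c12 = covariance(x1, x2)
    c22 = covariance(x2, x2)
    c1d1 = covariance(x1, d_target)
    c2d1 = covariance(x2, d_target)
    return (c11 * c12 * c2d1 - c12 ** 2 * c1d1) / (c11 ** 2 * c22 - c11 * c12 ** 2)
```

    The estimate is asymmetric: swapping the arguments gives the flow in the opposite direction, which is what allows IF to suggest an arc direction between two variables.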

    Table  4.   Conditional probability distribution of node $ {\rm{SST}} $

    | $P(\mathrm{SST}\mid\mathrm{WD})$ | WD = 1 | WD = 2 | WD = 3 | WD = 4 | WD = 5 | WD = 6 | WD = 7 | WD = 8 | WD = 9 | WD = 10 |
    |---|---|---|---|---|---|---|---|---|---|---|
    | SST = 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0323 | 0 | 0 | 0 |
    | SST = 2 | 0 | 0 | 0 | 0 | 0 | 0.0290 | 0.0645 | 0.0769 | 0 | 0 |
    | SST = 3 | 0 | 0 | 0 | 0 | 0.0143 | 0.0145 | 0 | 0 | 0 | 0 |
    | SST = 4 | 0 | 0 | 0 | 0 | 0.0286 | 0.0580 | 0.0323 | 0 | 0 | 0 |
    | SST = 5 | 0 | 0 | 0 | 0.0122 | 0.0143 | 0.0145 | 0.0323 | 0 | 0 | 0 |
    | SST = 6 | 0 | 0 | 0.0213 | 0.0244 | 0.0286 | 0.0290 | 0.0323 | 0.1538 | 0 | 0 |
    | SST = 7 | 0 | 0.0625 | 0 | 0.0488 | 0.0429 | 0.0580 | 0.0968 | 0.0769 | 0 | 0 |
    | SST = 8 | 0 | 0.0313 | 0.0213 | 0.0122 | 0.0429 | 0.0435 | 0.0323 | 0.0769 | 0 | 0 |
    | SST = 9 | 0 | 0.0313 | 0.0426 | 0.1098 | 0.0143 | 0.0145 | 0 | 0.1538 | 0 | 0 |
    | SST = 10 | 0.1 | 0.0313 | 0.0426 | 0.0244 | 0.0286 | 0.0145 | 0 | 0.0769 | 0 | 0 |
    | SST = 11 | 0 | 0.0313 | 0.0213 | 0.0122 | 0.0286 | 0.0725 | 0.0323 | 0 | 0 | 0 |
    | SST = 12 | 0 | 0 | 0.0638 | 0.0488 | 0.0714 | 0.0870 | 0 | 0.0769 | 0 | 0 |
    | SST = 13 | 0.1 | 0 | 0.0851 | 0.0488 | 0.0286 | 0.0435 | 0.0323 | 0 | 0 | 0 |
    | SST = 14 | 0.1 | 0 | 0 | 0.0366 | 0.0571 | 0.0580 | 0.0968 | 0.0769 | 0 | 0 |
    | SST = 15 | 0 | 0.0313 | 0.0638 | 0.0488 | 0.0571 | 0.0145 | 0.0645 | 0.0769 | 0.3333 | 0 |
    | SST = 16 | 0 | 0.0313 | 0.0426 | 0.0488 | 0.0143 | 0.0145 | 0.0645 | 0 | 0 | 0 |
    | SST = 17 | 0 | 0 | 0.0426 | 0.0122 | 0.0286 | 0.0435 | 0 | 0 | 0 | 0 |
    | SST = 18 | 0.2 | 0.0313 | 0.0851 | 0.0732 | 0.1143 | 0.0145 | 0.0640 | 0 | 0.3333 | 0 |
    | SST = 19 | 0.1 | 0.1563 | 0.0213 | 0.1220 | 0.1286 | 0.0725 | 0 | 0.0769 | 0 | 0 |
    | SST = 20 | 0 | 0.0625 | 0.0638 | 0.0854 | 0.0143 | 0.0580 | 0.0968 | 0 | 0 | 0 |
    | SST = 21 | 0.2 | 0.1875 | 0.1489 | 0.1341 | 0.0714 | 0.0580 | 0.0323 | 0.0769 | 0 | 1 |
    | SST = 22 | 0 | 0.0313 | 0.0426 | 0.0366 | 0.0286 | 0.0580 | 0.0645 | 0 | 0 | 0 |
    | SST = 23 | 0.1 | 0.1250 | 0.0638 | 0.0122 | 0.0286 | 0.0725 | 0.0323 | 0 | 0 | 0 |
    | SST = 24 | 0 | 0.0938 | 0.0213 | 0.0488 | 0.0857 | 0.0290 | 0.0968 | 0 | 0.3333 | 0 |
    | SST = 25 | 0.1 | 0.0313 | 0.0851 | 0 | 0.0286 | 0.0290 | 0 | 0 | 0 | 0 |
    | SST = 26 | 0 | 0.0313 | 0.0213 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
    Note: WD = 1, 2, ···, 10 and SST = 1, 2, ···, 26 denote the discrete states of WD and SST.
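
    A conditional probability table such as Table 4 can be estimated from paired discrete series by relative-frequency counting. The sketch below assumes maximum-likelihood counting with no smoothing, and picking the most probable state as the fill-in value is only one possible inference rule; neither is confirmed by the table. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_cpt(parent_states, child_states):
    """Return {parent: {child: P(child | parent)}} from paired discrete samples."""
    counts = defaultdict(Counter)
    for parent, child in zip(parent_states, child_states):
        counts[parent][child] += 1
    return {
        parent: {child: n / sum(counter.values()) for child, n in counter.items()}
        for parent, counter in counts.items()
    }


def most_probable_child(cpt, parent_state):
    """Return the child state with the highest conditional probability."""
    distribution = cpt[parent_state]
    return max(distribution, key=distribution.get)
```

    Under this counting scheme, three samples with WD = 9 that fall into three different SST states would each receive probability 1/3, which is consistent with the 0.3333 entries in the WD = 9 column.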

    Table 5.   TIC of each model under different missing rates

    | Missing rate | Interpolation model | WD | SST | SLP | SAL | RH | DEN |
    |---|---|---|---|---|---|---|---|
    | 40% | IFBN | 0.0915 | 0.0011 | 0.0014 | 0.0009 | 0.0115 | 0.0017 |
    | 40% | CSI | 0.1829 | 0.0053 | 0.0027 | 0.0022 | 0.0221 | 0.0043 |
    | 40% | BP | 0.0956 | 0.0033 | 0.0025 | 0.0016 | 0.0141 | 0.0035 |
    | 40% | CBN | 0.0897 | 0.0036 | 0.0022 | 0.0018 | 0.0139 | 0.0033 |
    | 50% | IFBN | 0.1171 | 0.0014 | 0.0015 | 0.0011 | 0.0175 | 0.0013 |
    | 50% | CSI | 0.1776 | 0.0132 | 0.0061 | 0.0056 | 0.0395 | 0.0098 |
    | 50% | BP | 0.1194 | 0.0058 | 0.0043 | 0.0034 | 0.0252 | 0.0143 |
    | 50% | CBN | 0.1121 | 0.0056 | 0.0045 | 0.0037 | 0.0311 | 0.0127 |
    | 60% | IFBN | 0.1065 | 0.0037 | 0.0024 | 0.0022 | 0.0189 | 0.0018 |
    | 60% | CSI | 0.2158 | 0.0154 | 0.0109 | 0.0093 | 0.0629 | 0.0131 |
    | 60% | BP | 0.1382 | 0.0083 | 0.0074 | 0.0065 | 0.0407 | 0.0121 |
    | 60% | CBN | 0.1412 | 0.0079 | 0.0075 | 0.0061 | 0.0396 | 0.0119 |
    Note: The IFBN rows are the results obtained by the proposed model.

    Table 6.   R of each model under different missing rates

    | Missing rate | Interpolation model | WD | SST | SLP | SAL | RH | DEN |
    |---|---|---|---|---|---|---|---|
    | 40% | IFBN | 0.8115 | 0.9895 | 0.9872 | 0.9871 | 0.7881 | 0.9927 |
    | 40% | CSI | 0.7315 | 0.9269 | 0.9412 | 0.9759 | 0.7199 | 0.9681 |
    | 40% | BP | 0.6366 | 0.9157 | 0.9353 | 0.9859 | 0.7205 | 0.9736 |
    | 40% | CBN | 0.7011 | 0.9148 | 0.9398 | 0.9846 | 0.7316 | 0.9754 |
    | 50% | IFBN | 0.6786 | 0.9888 | 0.9723 | 0.9935 | 0.6908 | 0.9904 |
    | 50% | CSI | 0.6510 | 0.8072 | 0.8381 | 0.8670 | 0.5784 | 0.8399 |
    | 50% | BP | 0.5127 | 0.9438 | 0.9057 | 0.9755 | 0.5961 | 0.8165 |
    | 50% | CBN | 0.6114 | 0.9152 | 0.9021 | 0.9698 | 0.6013 | 0.8241 |
    | 60% | IFBN | 0.6678 | 0.9342 | 0.9231 | 0.9291 | 0.6547 | 0.9827 |
    | 60% | CSI | 0.5859 | 0.7164 | 0.7936 | 0.8602 | 0.5170 | 0.7680 |
    | 60% | BP | 0.6183 | 0.8832 | 0.8265 | 0.9738 | 0.5744 | 0.7983 |
    | 60% | CBN | 0.6097 | 0.8921 | 0.7998 | 0.9761 | 0.5842 | 0.8011 |
    Note: The IFBN rows are the results obtained by the proposed model.
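
    The three evaluation indicators used in the tables (MRE, TIC and R) are not defined on this page. The sketch below uses their common textbook forms, which is an assumption: mean relative error, Theil's inequality coefficient, and the Pearson correlation coefficient. The function names are illustrative.

```python
import math


def mre(observed, predicted):
    """Mean relative error: mean of |p - o| / |o| (observations must be non-zero)."""
    return sum(abs(p - o) / abs(o) for o, p in zip(observed, predicted)) / len(observed)


def tic(observed, predicted):
    """Theil inequality coefficient: 0 is a perfect fit, 1 is the worst case."""
    n = len(observed)
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)
    rms_o = math.sqrt(sum(o ** 2 for o in observed) / n)
    rms_p = math.sqrt(sum(p ** 2 for p in predicted) / n)
    return rmse / (rms_o + rms_p)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observations and predictions."""
    n = len(observed)
    mean_o, mean_p = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)
```

    With these definitions, lower MRE and TIC and higher R indicate a better interpolation, which matches how the tables are read.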

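    Table 7.   Maximum and average length of consecutive missing data for each variable at different missing rates

    | Missing rate | Statistic | WD | SST | SLP | SAL | RH | DEN |
    |---|---|---|---|---|---|---|---|
    | 40% | maximum/d | 4.00 | 5.00 | 4.00 | 4.00 | 4.00 | 3.00 |
    | 40% | average/d | 1.59 | 1.92 | 1.78 | 1.78 | 1.48 | 1.32 |
    | 50% | maximum/d | 6.00 | 5.00 | 7.00 | 7.00 | 6.00 | 7.00 |
    | 50% | average/d | 2.32 | 2.46 | 3.56 | 3.91 | 3.64 | 2.98 |
    | 60% | maximum/d | 8.00 | 8.00 | 7.00 | 9.00 | 8.00 | 9.00 |
    | 60% | average/d | 3.78 | 5.23 | 4.16 | 5.35 | 4.62 | 5.01 |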
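
    Statistics of this kind can be obtained by scanning a series for runs of consecutive gaps. The sketch below assumes missing days are marked with `None` and that the average is taken over runs rather than over days; the table does not state either convention, and the function names are hypothetical.

```python
def missing_run_lengths(series):
    """Return the lengths of all runs of consecutive missing values (None)."""
    runs, current = [], 0
    for value in series:
        if value is None:
            current += 1          # extend the current gap
        elif current:
            runs.append(current)  # a gap just ended
            current = 0
    if current:                   # series ends inside a gap
        runs.append(current)
    return runs


def run_statistics(series):
    """Return (maximum, average) gap length in time steps; (0, 0.0) if no gaps."""
    runs = missing_run_lengths(series)
    if not runs:
        return 0, 0.0
    return max(runs), sum(runs) / len(runs)
```

    For a daily series, the returned lengths are in days, matching the units of Table 7.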

    Table 8.   Information on the new experimental data

    | Position name | Period | Latitude-longitude coordinate |
    |---|---|---|
    | A | Jan. 1, 2013 to Jun. 30, 2015 | 5°N, 110°W |
    | B | Apr. 1, 2009 to Jul. 31, 2012 | 12°N, 70°W |
    | C | May 1, 2010 to Jan. 31, 2013 | 15°S, 90°E |

    Table 9.   Comparative analysis of interpolation results with different models at Position A

    | Evaluation indicator | Interpolation model | WD | SST | SLP | SAL | RH | DEN |
    |---|---|---|---|---|---|---|---|
    | MRE | IFBN | 0.1677 | 0.0365 | 0.0379 | 0.0271 | 0.1344 | 0.0287 |
    | MRE | CSI | 0.2458 | 0.0654 | 0.0939 | 0.0423 | 0.2291 | 0.0621 |
    | MRE | BP | 0.1915 | 0.0564 | 0.0412 | 0.0359 | 0.1799 | 0.0381 |
    | TIC | IFBN | 0.1409 | 0.0026 | 0.0022 | 0.0005 | 0.0239 | 0.0027 |
    | TIC | CSI | 0.1794 | 0.0058 | 0.0043 | 0.0044 | 0.0752 | 0.0143 |
    | TIC | BP | 0.1476 | 0.0032 | 0.0031 | 0.0026 | 0.0495 | 0.0098 |
    | R | IFBN | 0.6186 | 0.9688 | 0.9602 | 0.9508 | 0.6333 | 0.9103 |
    | R | CSI | 0.5827 | 0.9238 | 0.8357 | 0.8655 | 0.5144 | 0.7365 |
    | R | BP | 0.6012 | 0.9469 | 0.9081 | 0.9171 | 0.5771 | 0.8199 |
    Note: The IFBN rows are the results obtained by the proposed model.

    Table 10.   Comparative analysis of interpolation results with different models at Position B

    | Evaluation indicator | Interpolation model | WD | SST | SLP | SAL | RH | DEN |
    |---|---|---|---|---|---|---|---|
    | MRE | IFBN | 0.1698 | 0.0318 | 0.0400 | 0.0255 | 0.1349 | 0.0272 |
    | MRE | CSI | 0.2411 | 0.0648 | 0.0964 | 0.0432 | 0.2255 | 0.0591 |
    | MRE | BP | 0.1893 | 0.0552 | 0.0390 | 0.0331 | 0.1764 | 0.0356 |
    | TIC | IFBN | 0.1364 | 0.0053 | 0.0021 | 0.0003 | 0.0215 | 0.0039 |
    | TIC | CSI | 0.1754 | 0.0088 | 0.0059 | 0.0020 | 0.0786 | 0.0140 |
    | TIC | BP | 0.1508 | 0.0031 | 0.0029 | 0.0027 | 0.0470 | 0.0083 |
    | R | IFBN | 0.6205 | 0.9687 | 0.9564 | 0.9528 | 0.6364 | 0.9136 |
    | R | CSI | 0.5809 | 0.9233 | 0.8357 | 0.8694 | 0.5118 | 0.7374 |
    | R | BP | 0.6057 | 0.9484 | 0.9127 | 0.9217 | 0.5814 | 0.8204 |
    Note: The IFBN rows are the results obtained by the proposed model.

    Table 11.   Comparative analysis of interpolation results with different models at Position C

    | Evaluation indicator | Interpolation model | WD | SST | SLP | SAL | RH | DEN |
    |---|---|---|---|---|---|---|---|
    | MRE | IFBN | 0.1719 | 0.0393 | 0.0360 | 0.0229 | 0.1305 | 0.0317 |
    | MRE | CSI | 0.2437 | 0.0697 | 0.0942 | 0.0396 | 0.2337 | 0.0614 |
    | MRE | BP | 0.1941 | 0.0527 | 0.0379 | 0.0400 | 0.1749 | 0.0422 |
    | TIC | IFBN | 0.1434 | 0.0033 | 0.0032 | 0.0004 | 0.0266 | 0.0035 |
    | TIC | CSI | 0.1782 | 0.0055 | 0.0019 | 0.0077 | 0.0784 | 0.0119 |
    | TIC | BP | 0.1483 | 0.0037 | 0.0046 | 0.0031 | 0.0532 | 0.0063 |
    | R | IFBN | 0.6144 | 0.9672 | 0.9621 | 0.9558 | 0.6291 | 0.9067 |
    | R | CSI | 0.5782 | 0.9204 | 0.8382 | 0.8613 | 0.5134 | 0.7402 |
    | R | BP | 0.6015 | 0.9498 | 0.9076 | 0.9165 | 0.5747 | 0.8207 |
    Note: The IFBN rows are the results obtained by the proposed model.
Publication history
  • Received:  2020-06-01
  • Accepted:  2020-09-21
  • Published online:  2021-06-09
  • Issue date:  2021-07-25
