Journal of Yangtze River Scientific Research Institute, 2019, Vol. 36, Issue (6): 139-145. DOI: 10.11988/ckyyb.20171374

• Information Technology Application •

Design and Implementation of a Big Data Platform for Grouting in Water Conservancy Projects

RAO Xiao-kang

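  1. Key Laboratory of Geotechnical Mechanics and Engineering of the Ministry of Water Resources, Yangtze River Scientific Research Institute, Wuhan 430010, China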
  • Received: 2017-12-24  Published: 2019-06-01  Online: 2019-06-12
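  • About the author: RAO Xiao-kang (born 1985), male, from Huanggang, Hubei; engineer, master's degree; research interests include digitalization of water resources and hydropower construction and data mining. E-mail: 283139246@qq.com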
  • Funding:
    National Key Research and Development Program of China (2017YFC1502600)

A Big Data Platform for Grouting of Water Conservancy Project: Design and Implementation

RAO Xiao-kang   

  1. Key Laboratory of Geotechnical Mechanics and Engineering of Ministry of Water Resources, Yangtze River Scientific Research Institute, Wuhan 430010, China
  • Received: 2017-12-24 Published: 2019-06-01 Online: 2019-06-12

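Abstract: With the development of cloud computing, big data and the Internet of Things, the volume of data collected in water conservancy projects grows day by day, and traditional storage and computing theories and methods can no longer handle the access and processing of such massive, multi-source and heterogeneous data. For grouting big data in water conservancy projects, this study designs the overall platform architecture, builds a Hadoop distributed cluster, designs parallelized data mining algorithms, and implements a grouting big data platform whose presentation, application and management are delivered in a browser/server (B/S) service mode. The main functional modules are data resource download, data set upload and execution, custom algorithms, running status and results, and big data visualization. Using the Baihetan water conservancy project as an application demonstration, a random-forest model for predicting unit grout take and a K-Means-clustering model for detecting anomalies in grouting results are built. The platform integrates structured and unstructured water conservancy data and applies parallel cluster computing and data mining to them: rather than relying on random sampling and a single mining model, it analyses the full data set with multi-granularity, multi-level and multi-channel models to extract information useful for management, decision-making and production. The platform achieves integrated sharing of data resources, efficient business processing and knowledge discovery from data, improves the efficiency and accuracy of data storage and processing, and offers a new approach to big data storage and computing for water conservancy projects.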
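The abstract does not state which features, distance measure or threshold the K-Means anomaly-detection model uses, and on the platform the model runs in parallel on the Hadoop/Spark cluster. As a rough single-machine illustration only, the sketch below clusters grouting records with plain Lloyd's K-Means and flags records lying unusually far from their cluster centroid (more than the mean distance plus z standard deviations). The record fields (injection pressure, unit grout take), the z-score scaling, the threshold rule and all function names are assumptions made for this example, not the author's method.

```python
import math
import random
import statistics


def standardize(rows):
    """Z-score every column so that large-unit fields do not dominate the distance."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [tuple((v - m) / s for v, m, s in zip(r, means, stds)) for r in rows]


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [statistics.fmean(col) for col in zip(*members)]
    return centroids, labels


def detect_anomalies(records, k=2, z=2.0, seed=0):
    """Hypothetical rule: indices whose centroid distance exceeds mean + z * std."""
    points = standardize(records)
    centroids, labels = kmeans(points, k, seed=seed)
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    threshold = statistics.fmean(dists) + z * statistics.pstdev(dists)
    return [i for i, d in enumerate(dists) if d > threshold]
```

With eleven made-up records (five near 1.0 MPa and 50 kg/m, five near 3.0 MPa and 150 kg/m, and a final one at 1.0 MPa with 120 kg/m), `detect_anomalies` returns `[10]`, flagging only the last record, whose grout take is high for its pressure. Distance-to-centroid scoring is only one common choice: an outlier extreme enough to capture a cluster of its own would be missed, so cluster sizes are often checked as well.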

Keywords: big data platform, water conservancy project, grouting, Hadoop, Spark, random forest, K-Means

Abstract: The ever-increasing volume of data collected in water conservancy projects, together with the development of cloud computing, big data and the Internet of Things, places demands on the storage and processing of massive, multi-source and heterogeneous data that traditional theories and methods cannot meet. In this research, a big data platform for the grouting data of water conservancy projects is designed, with display, operation and management delivered in browser/server (B/S) mode. The main functional modules of the platform are data resource download, data set upload and execution, custom algorithms, running status and results, and big data visualization. The platform was demonstrated on the Baihetan water conservancy project as a case study, for which a random-forest model predicting unit grout take and a K-Means-clustering model detecting anomalies in grouting results were built. By integrating structured and unstructured data and adopting a Hadoop distributed cluster with parallelized data mining algorithms, the platform achieves integrated sharing of data resources, efficient processing and knowledge discovery from data, and improves the efficiency and accuracy of data storage and processing. This research offers a new approach to big data storage and computing for water conservancy projects.
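The abstract also leaves the inputs and hyperparameters of the random-forest prediction model unstated, and the platform presumably runs it as a parallel job on its Hadoop/Spark cluster. The following is a minimal single-machine sketch of the technique itself, assuming numeric feature tuples and a numeric target such as unit grout take: each regression tree is grown on a bootstrap sample, only a random subset of features is tried at every split, and the forest averages the tree outputs. The function names, defaults (25 trees, depth 6, one third of the features per split) and toy data are hypothetical.

```python
import random
import statistics


def _sse(ys):
    """Sum of squared deviations from the mean (variance-reduction split criterion)."""
    m = statistics.fmean(ys)
    return sum((y - m) ** 2 for y in ys)


def _grow(X, y, depth, max_depth, min_leaf, n_feats, rng):
    """Grow one regression tree; a leaf is a mean target, a node is (f, t, left, right)."""
    if depth >= max_depth or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return statistics.fmean(y)
    best = None  # (score, feature, threshold)
    # Random-forest step: only a random subset of features is tried at each node.
    for f in rng.sample(range(len(X[0])), n_feats):
        for t in sorted({row[f] for row in X})[:-1]:
            left = [v for row, v in zip(X, y) if row[f] <= t]
            right = [v for row, v in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return statistics.fmean(y)
    _, f, t = best
    go_left = [row[f] <= t for row in X]
    args = (depth + 1, max_depth, min_leaf, n_feats, rng)
    left = _grow([r for r, g in zip(X, go_left) if g],
                 [v for v, g in zip(y, go_left) if g], *args)
    right = _grow([r for r, g in zip(X, go_left) if not g],
                  [v for v, g in zip(y, go_left) if not g], *args)
    return (f, t, left, right)


def _predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf's mean target."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=6, min_leaf=2, n_feats=None, seed=0):
    """Bagging plus per-node feature sampling: a basic random-forest regressor."""
    rng = random.Random(seed)
    n_feats = n_feats or max(1, len(X[0]) // 3)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(_grow([X[i] for i in idx], [y[i] for i in idx],
                            0, max_depth, min_leaf, n_feats, rng))
    return forest


def predict_forest(forest, x):
    """Average the predictions of all trees."""
    return statistics.fmean(_predict_tree(tree, x) for tree in forest)
```

For example, on toy data where the target jumps from 50 to 150 at x0 = 10 and the second feature is pure noise, `predict_forest(forest, (2.0, 14.0))` lands near 50 and `predict_forest(forest, (17.0, 19.0))` near 150. Averaging many decorrelated trees is what distinguishes the forest from a single tree, which would follow the noise feature whenever it happened to split on it.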

Key words: big data platform, water conservancy project, grouting, Hadoop, Spark, random forest, K-Means
