Flood Inundation Range Extraction and Comparative Analysis Based on Segformer Model
ZHANG Yun-kang, LIU Yi, XIAO Wan, QIN Yang-yang, CHENG Cong, PENG Xu
Journal of Changjiang River Scientific Research Institute, 2025, Vol. 42, Issue 10: 165-173.
[Objective] With the development of deep learning and computer vision technologies, machine vision has become a research hotspot in flood monitoring. This study aims to overcome the limitations of traditional manual and satellite remote sensing methods for flood monitoring and early warning, namely insufficient accuracy and high cost, to examine the advantages and limitations of the Segformer model in extracting flood boundaries, and to propose future research directions and improvement strategies. [Methods] Using deep learning and machine vision techniques, a dedicated flood monitoring dataset, the “RiverDataset”, was constructed, and the performance of the Segformer model in extracting the flood inundation range was evaluated on it. The Segformer model was also compared with U-Net models built on ResNet50 and VGG16 backbones to assess their performance in water body segmentation. Taking Shashi District of Jingzhou City, Hubei Province as a case study, UAV remote sensing imagery was used to extract the flood contours of the area. [Results] The U-Net (VGG16) model performed excellently on the training set but was slightly inferior to the Segformer model on the validation and test sets. The Segformer model achieved superior performance on most indicators and particularly outperformed the U-Net (ResNet50) model in complex scenarios. Although the U-Net (ResNet50) model achieved a slightly higher IoU, its higher loss and lower mIoU and mAP indicated that its overall performance was inferior to that of the Segformer and U-Net (VGG16) models. The U-Net (VGG16) model scored well on all evaluation indicators and showed strong fitting capability during training; however, when processing complex water bodies, its limited receptive field prevented it from capturing contextual information in narrow regions and from establishing long-range continuity, producing frequent information voids, and it failed to suppress gridding effects, which disrupted the local consistency of feature information. Lacking structures for perceiving local regional information, the U-Net model also could not eliminate the influence of interfering objects in complex aquatic environments, particularly in scenarios requiring precise boundary extraction. The Segformer model did not exhibit these problems: unconstrained by the limited receptive field of conventional convolutional kernels, it captured broader contextual information across the image. [Conclusions] In complex aquatic environments, the Segformer model demonstrates superior segmentation performance and robustness. This study validates the efficiency of the Segformer model in extracting the flood inundation range and highlights its potential for practical water body monitoring applications.
Future research should further optimize the model, expand the dataset, and explore its potential for real-time application to enhance the efficiency and accuracy of flood early warning systems.
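The paper does not state which implementation or framework was used, so the following is only a minimal, hedged sketch of the kind of inference step the Methods describe: running a Segformer model on a single UAV image via the Hugging Face transformers implementation of the architecture. The public checkpoint name and the image path are illustrative assumptions, not the authors' RiverDataset-trained setup.

```python
# Minimal sketch (assumed setup): Segformer inference on one UAV frame using the
# Hugging Face "transformers" implementation; the checkpoint and file path are
# illustrative placeholders, not the weights trained on the RiverDataset.
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import SegformerForSemanticSegmentation, SegformerImageProcessor

checkpoint = "nvidia/segformer-b0-finetuned-ade-512-512"  # public demo weights
processor = SegformerImageProcessor.from_pretrained(checkpoint)
model = SegformerForSemanticSegmentation.from_pretrained(checkpoint).eval()

image = Image.open("uav_frame.jpg")                 # hypothetical UAV image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits                 # (1, num_labels, H/4, W/4)

# Upsample logits to the original resolution and take the per-pixel argmax as the mask.
logits = F.interpolate(logits, size=image.size[::-1], mode="bilinear", align_corners=False)
pred_mask = logits.argmax(dim=1)[0].cpu().numpy()
```

In the comparison reported above, such a model (and the two U-Net baselines) would first be fine-tuned on the RiverDataset before evaluation.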
Key words: flood monitoring / visual segmentation / Transformer structure / Segformer model / U-Net model / precise boundary segmentation
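To make the metric comparison in the Results concrete, here is a small, self-contained sketch of how per-class IoU and mIoU are conventionally computed from predicted and ground-truth label maps. Because mIoU averages over all classes, a model can post a slightly higher IoU for the water class yet a lower mIoU overall, as noted for U-Net (ResNet50). The NumPy helpers and the two-class toy masks are illustrative assumptions, not the paper's evaluation code.

```python
# Illustrative sketch (assumed): per-class IoU and mIoU from integer label maps.
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray, cls: int) -> float:
    """Intersection over Union for a single class."""
    pred_c, gt_c = pred == cls, gt == cls
    inter = np.logical_and(pred_c, gt_c).sum()
    union = np.logical_or(pred_c, gt_c).sum()
    return inter / union if union > 0 else float("nan")

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """mIoU: IoU averaged over all classes (ignoring classes absent from both maps)."""
    return float(np.nanmean([iou(pred, gt, c) for c in range(num_classes)]))

# Toy 4x4 label maps with two classes: 0 = background, 1 = water (hypothetical values).
gt = np.array([[0, 0, 1, 1],
               [0, 0, 1, 1],
               [0, 1, 1, 1],
               [0, 0, 0, 1]])
pred = np.array([[0, 0, 1, 1],
                 [0, 1, 1, 1],
                 [0, 1, 1, 0],
                 [0, 0, 0, 1]])

print("water IoU:", iou(pred, gt, 1))   # overlap of the water masks only
print("mIoU:", mean_iou(pred, gt, 2))   # averages background and water IoU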