Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding


Date: 2025-05-31 11:01  Source: http://www.java1234.com  Author: reposted
Main content:
 

 

Introduction
Pretrained backbones with fine-tuning have been widely
applied to various 2D vision and NLP tasks [132103],
where a backbone network pretrained on a large dataset is
concatenated with a task-specific back-end and then fine-tuned
for different downstream tasks. This approach demonstrates
superior performance and great advantages in reducing
the workload of network design and training, as well as the
amount of labeled data required for different vision tasks.

* Interns at Microsoft Research Asia. † Contact person.
In this work, we present a pretrained 3D backbone, named
SWIN3D, for 3D indoor scene understanding tasks. Our
method represents the 3D point cloud of an input 3D scene as
sparse voxels in 3D space and adapts the Swin Transformer
[30], designed for regular 2D images, to unorganized 3D
points as the 3D backbone. We analyze the key issues that
prevent the naïve 3D extension of Swin Transformer from
exploring large models and achieving high performance,
i.e., the high memory complexity and the ignorance of signal
irregularity. Based on our analysis, we develop a novel
3D self-attention operator to compute the self-attentions of
sparse voxels within each local window, which reduces the
memory cost of self-attention from quadratic to linear with
respect to the number of sparse voxels within a window,
computes efficiently, and enhances self-attention by capturing
various signal irregularities through our generalized contextual
relative positional embedding [4826].
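
To make the sparse-voxel and local-window ideas above concrete, here is a minimal pure-Python sketch. It is not the SWIN3D implementation: the function names (`voxelize`, `partition_windows`, `dense_attention_entries`) and the parameters `voxel_size` and `window_size` are hypothetical, chosen only for illustration. It quantizes points into occupied voxels, groups those voxels into non-overlapping cubic windows, and counts how many attention scores a dense per-window attention matrix would hold, which is the quadratic cost the paragraph refers to.

```python
import math
from collections import defaultdict

def voxelize(points, voxel_size):
    """Quantize 3D points to integer voxel coordinates, keeping only occupied voxels."""
    voxels = defaultdict(list)
    for x, y, z in points:
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        voxels[key].append((x, y, z))
    return dict(voxels)

def partition_windows(voxel_keys, window_size):
    """Group occupied voxels into non-overlapping cubic windows.

    window_size is the window edge length measured in voxels (an assumed parameter).
    """
    windows = defaultdict(list)
    for i, j, k in voxel_keys:
        window = (i // window_size, j // window_size, k // window_size)
        windows[window].append((i, j, k))
    return dict(windows)

def dense_attention_entries(windows):
    """Count scores stored if each window keeps a full n-by-n attention matrix."""
    return sum(len(members) ** 2 for members in windows.values())

# Toy point cloud in arbitrary units (illustrative values only).
points = [
    (0.1, 0.2, 0.3),
    (0.4, 0.1, 0.2),   # falls in the same voxel as the first point
    (0.7, 0.2, 0.3),
    (2.6, 0.2, 0.1),
    (11.2, 9.1, 4.3),
]

voxels = voxelize(points, voxel_size=0.5)
windows = partition_windows(voxels.keys(), window_size=5)

print(len(voxels), "occupied voxels")
print(len(windows), "non-empty windows")
print(dense_attention_entries(windows), "dense attention scores")
```

Only occupied voxels are stored, so empty space costs nothing. The number of voxels per window varies from window to window, which is why the per-window cost is expressed in terms of the number of sparse voxels rather than a fixed window size.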
The novel design of our SWIN3D backbone enables us to
scale up the backbone model and the amount of data used
for pretraining. To this end, we pretrained a large SWIN3D
model with 60M parameters via a 3D semantic segmentation
task over a synthetic 3D indoor scene dataset [60] that
includes 21K rooms and is about ten times larger than the
ScanNet dataset. After pretraining, we cascade the pretrained
SWIN3D backbone with task-specific back-end decoders
and fine-tune the models for various downstream 3D indoor
scene understanding tasks.
 


 
