Migrating from Alibaba Cloud OSS to AWS S3

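Note: this article was originally written in September 2020. As of early 2023, the S3-resumable-upload tool described here is no longer maintained (see the GitHub project linked at the end of this article), and its latest code and features have been merged into Data Transfer Hub. Data Transfer Hub is an S3 migration tool with a graphical user interface (GUI). It supports the object storage services of several public clouds, including Alibaba Cloud, can create replication tasks in one click, runs multi-threaded copies in the background on Amazon ECS container services, and supports incremental replication. For details, see the Data Transfer Hub website. The tool itself is free; resources launched for copying, such as ECS containers, are billed at standard cloud rates.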

I. Migration Tool

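Amazon-s3-resumable-upload is a migration tool for AWS S3. It comes in a single-node version, a cluster version, and a serverless (Lambda) version, and covers scenarios such as uploading from on-premises to S3 and transferring data between S3 in China regions and global regions. When the source is Alibaba Cloud, only the single-node version is supported at the time of writing.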

To deploy the tool, create an EC2 instance on AWS running Amazon Linux 2, and install Python 3 and the required libraries.

II. Environment Setup

1. Choosing an EC2 instance

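  • For testing, a t2.medium instance is sufficient
  • For production, an m5.large or c5.large instance is recommended
  • The default 8 GB disk is enough: the tool does not write the migrated data to local disk, so no large EBS volume is needed
  • In the security group, opening TCP port 22 for remote login is sufficient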
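Why a small disk is enough can be illustrated with a minimal sketch of the "small file" path that appears in the logs in section III (download, then upload). This is not the tool's own code: it assumes an `oss2.Bucket` object for the source and a boto3 S3 client for the target, and the helper name `copy_small_object` is hypothetical. Each object passes through memory only; larger files, which the tool's log indicates are handled through multipart uploads, are not covered by this sketch.

```python
from typing import Any


def copy_small_object(oss_bucket: Any, s3_client: Any, key: str, dest_bucket: str) -> int:
    """Copy one OSS object to S3 in memory and return the number of bytes copied.

    Hypothetical helper, not taken from s3_upload.py.
    oss_bucket: assumed to be an oss2.Bucket for the source bucket.
    s3_client:  assumed to be a boto3 S3 client for the target region.
    """
    # Download the whole object into memory; nothing is written to local disk.
    body = oss_bucket.get_object(key).read()
    # Upload the same bytes under the same key in the target bucket.
    s3_client.put_object(Bucket=dest_bucket, Key=key, Body=body)
    return len(body)


# Example wiring with the real SDKs (placeholder values from the config in step 5):
#   import boto3, oss2
#   auth = oss2.Auth("yourkeyid", "yourkeysecret")
#   src = oss2.Bucket(auth, "https://oss-cn-beijing.aliyuncs.com", "ossbucketname")
#   copy_small_object(src, boto3.client("s3"), "ABCDWallpaper-20200809.jpeg", "s3bucketname")
```

Passing the two clients in as parameters keeps the sketch testable with stand-in objects and makes the memory-only data path explicit.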

2. Installing the base environment

On Amazon Linux 2, run the following commands. Python 3 is required; oss2 is the Alibaba Cloud OSS SDK for Python. Note: some downloads can be slow, so allow some time.

yum update -y
yum install git
amazon-linux-extras install python3
pip3 install --upgrade pip --user
pip3 install boto3
pip3 install oss2
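To confirm that boto3 and oss2 landed in the interpreter that will actually run s3_upload.py, you can ask python3 itself. A minimal sketch using only the standard library; the function name `missing_modules` is hypothetical. Save it to any file and run it with python3 as the same user that will run the migration.

```python
import importlib.util
from typing import List


def missing_modules(names: List[str]) -> List[str]:
    """Return the top-level module names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["boto3", "oss2"])
    print(("Missing: " + ", ".join(missing)) if missing else "boto3 and oss2 are installed")
```

Checking by user matters because `pip3 install ... --user` installs into a per-user location, so a package installed by one user may be invisible to another.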

3. Configuring AWS CLI credentials

Amazon Linux 2 comes with the AWS CLI preinstalled, so simply run:

aws configure

Enter a valid Access Key ID and Secret Access Key. Set the region to the same region as the target S3 bucket, and set the output format to json.
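Because the region must match the target bucket, it can be worth reading back what `aws configure` stored. A minimal sketch, assuming the standard file layout in which the default profile is the `[default]` section of ~/.aws/config and named profiles are `[profile NAME]`; the function name `configured_region` is hypothetical. The `DesProfileName = default` setting in step 5 refers to this same profile.

```python
import configparser
import os
from typing import Optional


def configured_region(profile: str = "default", config_path: Optional[str] = None) -> Optional[str]:
    """Return the region that `aws configure` stored for a profile, or None.

    In ~/.aws/config the default profile is the [default] section, while a
    named profile is stored as [profile NAME]. Environment variables such as
    AWS_DEFAULT_REGION, which take precedence, are not considered here.
    """
    path = config_path or os.path.expanduser("~/.aws/config")
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing file just yields an empty parser
    section = "default" if profile == "default" else f"profile {profile}"
    # fallback=None also covers a missing section, not just a missing key.
    return parser.get(section, "region", fallback=None)
```

Compare the result with the target bucket's region (for example `cn-north-1` for Beijing) before starting the copy.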

4. Downloading the script

Run the following commands to download the script and its configuration file into the same directory.

wget https://s3.cn-north-1.amazonaws.com.cn/myworkshop-lxy/oss-to-s3/s3_upload.py
wget https://s3.cn-north-1.amazonaws.com.cn/myworkshop-lxy/oss-to-s3/s3_upload_config.ini

5. Setting the OSS source and S3 target bucket

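Edit s3_upload_config.ini and find the section below. Change DesBucket to the name of the target S3 bucket. Leave S3Prefix empty, which means no prefix filtering is applied during the copy; SrcFileIndex = * means all files are included. The other fields do not need to be changed.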

[Basic]
JobType = ALIOSS_TO_S3
DesBucket = s3bucketname
S3Prefix =
SrcFileIndex = *
DesProfileName = default

The following section holds the OSS source settings. Change the OSS bucket name, AccessKey ID, AccessKey secret, and endpoint.

[ALIOSS_TO_S3]
ali_SrcBucket = ossbucketname
ali_access_key_id = yourkeyid
ali_access_key_secret = yourkeysecret
ali_endpoint = oss-cn-beijing.aliyuncs.com

Save the file and exit when done.

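Note that the rest of the configuration file, including the parameters for modes you are not using such as Local-to-S3 and S3-to-S3, must be left as is: do not delete them or comment them out.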
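Since a mistake in these two sections only surfaces once the job starts, a short pre-flight check can help. A minimal sketch using Python's standard configparser: the key list is taken from the two sections above, while the placeholder detection and the function name `check_config` are hypothetical additions, not part of the tool. It only reads the file and leaves the other sections untouched.

```python
import configparser

# Keys edited in this article; other sections of the file are left alone.
REQUIRED = {
    "Basic": ["JobType", "DesBucket", "S3Prefix", "SrcFileIndex", "DesProfileName"],
    "ALIOSS_TO_S3": ["ali_SrcBucket", "ali_access_key_id", "ali_access_key_secret", "ali_endpoint"],
}
# Sample values from the downloaded template that must be replaced.
PLACEHOLDERS = {"s3bucketname", "ossbucketname", "yourkeyid", "yourkeysecret"}


def check_config(text: str) -> list:
    """Return a list of problems found in s3_upload_config.ini content."""
    cfg = configparser.ConfigParser(interpolation=None)  # secrets may contain '%'
    cfg.read_string(text)
    problems = []
    for section, keys in REQUIRED.items():
        if not cfg.has_section(section):
            problems.append(f"missing section [{section}]")
            continue
        for key in keys:
            if not cfg.has_option(section, key):
                problems.append(f"missing {section}.{key}")
            elif cfg.get(section, key).strip() in PLACEHOLDERS:
                problems.append(f"{section}.{key} still has a placeholder value")
    if cfg.has_section("Basic") and cfg.get("Basic", "JobType", fallback="") != "ALIOSS_TO_S3":
        problems.append("Basic.JobType should be ALIOSS_TO_S3")
    return problems


# Usage, next to the downloaded file:
#   with open("s3_upload_config.ini", encoding="utf-8-sig") as f:
#       print(check_config(f.read()) or "config looks complete")
```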

III. Migrating the Data

1. Initial full copy

First, verify that the AWS CLI works. You can run it as root or as a regular user, as long as it is the same account under which you configured the AWS CLI credentials above. Run:

aws s3 ls

Confirm that the target S3 bucket appears in the output.

Run the following command to start copying.

python3 s3_upload.py --nogui

The output looks like this:

Reading config file: s3_upload_config.ini
Logging to file: /root/log/s3_upload-2020-08-24T04-17-55.log
Logging level: INFO
2020-08-24 04:17:55,895 INFO - Found credentials in shared credentials file: ~/.aws/credentials
2020-08-24 04:17:55,936 INFO - Checking write permission for: oss-to-s3-lxy
2020-08-24 04:17:56,004 INFO - Get source file list
2020-08-24 04:17:56,110 INFO - Get oss file list oss-to-s3-lxy
2020-08-24 04:17:56,228 INFO - Get s3 file list oss-to-s3-lxy
2020-08-24 04:17:56,253 INFO - Bucket list length:1
2020-08-24 04:17:56,253 INFO - Get unfinished multipart upload
2020-08-24 04:17:56,269 INFO - Start file: ABCDWallpaper-20200809.jpeg
--->Downloading ABCDWallpaper-20200809.jpeg - small file
2020-08-24 04:17:56,271 INFO - Start file: ABCDWallpaper-20200810.jpeg
--->Downloading ABCDWallpaper-20200810.jpeg - small file
2020-08-24 04:17:56,273 INFO - Start file: ABCDWallpaper-20200811.jpeg
--->Downloading ABCDWallpaper-20200811.jpeg - small file
2020-08-24 04:17:56,274 INFO - Start file: ABCDWallpaper-20200812.jpeg
--->Downloading ABCDWallpaper-20200812.jpeg - small file
2020-08-24 04:17:56,276 INFO - Start file: ABCDWallpaper-20200813.jpeg
--->Downloading ABCDWallpaper-20200813.jpeg - small file
    --->Uploading ABCDWallpaper-20200809.jpeg - small file
    --->Uploading ABCDWallpaper-20200811.jpeg - small file
        --->Complete ABCDWallpaper-20200809.jpeg - small file - 652.2 KB/s
2020-08-24 04:17:57,197 INFO - Start file: ABCDWallpaper-20200814.jpeg
--->Downloading ABCDWallpaper-20200814.jpeg - small file
    --->Uploading ABCDWallpaper-20200812.jpeg - small file
    --->Uploading ABCDWallpaper-20200813.jpeg - small file
        --->Complete ABCDWallpaper-20200812.jpeg - small file - 697.0 KB/s
2020-08-24 04:17:57,285 INFO - Start file: ABCDWallpaper-20200815.jpeg
--->Downloading ABCDWallpaper-20200815.jpeg - small file
        --->Complete ABCDWallpaper-20200811.jpeg - small file - 640.7 KB/s
2020-08-24 04:17:57,319 INFO - Start file: ABCDWallpaper-20200816.jpeg
--->Downloading ABCDWallpaper-20200816.jpeg - small file
        --->Complete ABCDWallpaper-20200813.jpeg - small file - 676.7 KB/s
2020-08-24 04:17:57,373 INFO - Start file: ABCDWallpaper-20200817.jpeg
--->Downloading ABCDWallpaper-20200817.jpeg - small file
    --->Uploading ABCDWallpaper-20200810.jpeg - small file
        --->Complete ABCDWallpaper-20200810.jpeg - small file - 727.8 KB/s
2020-08-24 04:17:57,977 INFO - Start file: ABCDWallpaper-20200818.jpeg
--->Downloading ABCDWallpaper-20200818.jpeg - small file
    --->Uploading ABCDWallpaper-20200814.jpeg - small file
    --->Uploading ABCDWallpaper-20200816.jpeg - small file
        --->Complete ABCDWallpaper-20200814.jpeg - small file - 752.6 KB/s
2020-08-24 04:17:58,587 INFO - Start file: ABCDWallpaper-20200819.jpeg
--->Downloading ABCDWallpaper-20200819.jpeg - small file
        --->Complete ABCDWallpaper-20200816.jpeg - small file - 729.6 KB/s
2020-08-24 04:17:58,620 INFO - Start file: ABCDWallpaper-20200820.jpeg
--->Downloading ABCDWallpaper-20200820.jpeg - small file
    --->Uploading ABCDWallpaper-20200817.jpeg - small file
        --->Complete ABCDWallpaper-20200817.jpeg - small file - 756.5 KB/s
2020-08-24 04:17:58,790 INFO - Start file: ABCDWallpaper-20200821.jpeg
--->Downloading ABCDWallpaper-20200821.jpeg - small file
    --->Uploading ABCDWallpaper-20200815.jpeg - small file
    --->Uploading ABCDWallpaper-20200821.jpeg - small file
        --->Complete ABCDWallpaper-20200815.jpeg - small file - 753.6 KB/s
        --->Complete ABCDWallpaper-20200821.jpeg - small file - 503.2 KB/s
    --->Uploading ABCDWallpaper-20200819.jpeg - small file
    --->Uploading ABCDWallpaper-20200818.jpeg - small file
        --->Complete ABCDWallpaper-20200819.jpeg - small file - 969.3 KB/s
        --->Complete ABCDWallpaper-20200818.jpeg - small file - 907.0 KB/s
    --->Uploading ABCDWallpaper-20200820.jpeg - small file
        --->Complete ABCDWallpaper-20200820.jpeg - small file - 1.3 MB/s
2020-08-24 04:17:59,832 INFO - Comparing destination and source ...
2020-08-24 04:17:59,832 INFO - Get s3 file list oss-to-s3-lxy
2020-08-24 04:17:59,853 INFO - Bucket list length:14
2020-08-24 04:17:59,853 INFO - Get oss file list oss-to-s3-lxy
2020-08-24 04:17:59,881 WARNING - All source files are in destination Bucket/Prefix. Job well done.
MISSION ACCOMPLISHED - Time: 0:00:03.956390  - FROM: oss-to-s3-lxy/ TO oss-to-s3-lxy/
Logged to file: /root/log/s3_upload-2020-08-24T04-17-55.log
PRESS ENTER TO QUIT

Press Enter to exit.

Note: the scope of the full copy is determined by the SrcFileIndex = * setting in s3_upload_config.ini (together with S3Prefix, which is left empty here).

2. Incremental transfer

Incremental transfer uses exactly the same command as the full copy:

python3 s3_upload.py --nogui

The output is as follows.

Reading config file: s3_upload_config.ini
Logging to file: /root/log/s3_upload-2020-08-24T04-25-22.log
Logging level: INFO
2020-08-24 04:25:22,553 INFO - Found credentials in shared credentials file: ~/.aws/credentials
2020-08-24 04:25:22,595 INFO - Checking write permission for: oss-to-s3-lxy
2020-08-24 04:25:22,665 INFO - Get source file list
2020-08-24 04:25:22,773 INFO - Get oss file list oss-to-s3-lxy

2020-08-24 04:25:22,907 INFO - Get s3 file list oss-to-s3-lxy
2020-08-24 04:25:22,936 INFO - Bucket list length:13
2020-08-24 04:25:22,936 INFO - Get unfinished multipart upload
2020-08-24 04:25:22,954 INFO - Start file: ABCDWallpaper-20200809.jpeg
2020-08-24 04:25:22,954 INFO - Duplicated. ABCDWallpaper-20200809.jpeg same size, goto next file.
2020-08-24 04:25:22,954 INFO - Start file: ABCDWallpaper-20200810.jpeg
2020-08-24 04:25:22,954 INFO - Duplicated. ABCDWallpaper-20200810.jpeg same size, goto next file.
2020-08-24 04:25:22,955 INFO - Start file: ABCDWallpaper-20200811.jpeg
2020-08-24 04:25:22,955 INFO - Duplicated. ABCDWallpaper-20200811.jpeg same size, goto next file.
2020-08-24 04:25:22,955 INFO - Start file: ABCDWallpaper-20200812.jpeg
2020-08-24 04:25:22,955 INFO - Duplicated. ABCDWallpaper-20200812.jpeg same size, goto next file.
2020-08-24 04:25:22,956 INFO - Start file: ABCDWallpaper-20200813.jpeg
2020-08-24 04:25:22,956 INFO - Duplicated. ABCDWallpaper-20200813.jpeg same size, goto next file.
2020-08-24 04:25:22,956 INFO - Start file: ABCDWallpaper-20200814.jpeg
2020-08-24 04:25:22,956 INFO - Duplicated. ABCDWallpaper-20200814.jpeg same size, goto next file.
2020-08-24 04:25:22,956 INFO - Start file: ABCDWallpaper-20200815.jpeg
2020-08-24 04:25:22,956 INFO - Duplicated. ABCDWallpaper-20200815.jpeg same size, goto next file.
2020-08-24 04:25:22,956 INFO - Start file: ABCDWallpaper-20200816.jpeg
2020-08-24 04:25:22,956 INFO - Duplicated. ABCDWallpaper-20200816.jpeg same size, goto next file.
2020-08-24 04:25:22,956 INFO - Start file: ABCDWallpaper-20200817.jpeg
2020-08-24 04:25:22,956 INFO - Duplicated. ABCDWallpaper-20200817.jpeg same size, goto next file.
2020-08-24 04:25:22,957 INFO - Start file: ABCDWallpaper-20200818.jpeg
2020-08-24 04:25:22,957 INFO - Duplicated. ABCDWallpaper-20200818.jpeg same size, goto next file.
2020-08-24 04:25:22,957 INFO - Start file: ABCDWallpaper-20200819.jpeg
2020-08-24 04:25:22,957 INFO - Duplicated. ABCDWallpaper-20200819.jpeg same size, goto next file.
2020-08-24 04:25:22,957 INFO - Start file: ABCDWallpaper-20200820.jpeg
2020-08-24 04:25:22,957 INFO - Duplicated. ABCDWallpaper-20200820.jpeg same size, goto next file.
2020-08-24 04:25:22,957 INFO - Start file: ABCDWallpaper-20200821.jpeg
--->Downloading ABCDWallpaper-20200821.jpeg - small file
    --->Uploading ABCDWallpaper-20200821.jpeg - small file
        --->Complete ABCDWallpaper-20200821.jpeg - small file - 976.7 KB/s
2020-08-24 04:25:23,230 INFO - Comparing destination and source ...
2020-08-24 04:25:23,230 INFO - Get s3 file list oss-to-s3-lxy
2020-08-24 04:25:23,277 INFO - Bucket list length:14
2020-08-24 04:25:23,277 INFO - Get oss file list oss-to-s3-lxy
2020-08-24 04:25:23,314 WARNING - All source files are in destination Bucket/Prefix. Job well done.
MISSION ACCOMPLISHED - Time: 0:00:00.696466  - FROM: oss-to-s3-lxy/ TO oss-to-s3-lxy/
Logged to file: /root/log/s3_upload-2020-08-24T04-25-22.log

The copy is complete. After listing and comparing both buckets, the tool copied only the one new file.
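Judging from the log, an object that already exists in the target with the same key and the same size is skipped ("Duplicated ... same size, goto next file") and everything else is copied. A minimal sketch of that rule, assuming both bucket listings have already been turned into key-to-size dictionaries; `keys_to_copy` is a hypothetical name. A size-only check cannot detect a file that was modified without changing its size, and in this sketch objects deleted from the source are not removed from the target.

```python
from typing import Dict, List


def keys_to_copy(source: Dict[str, int], destination: Dict[str, int]) -> List[str]:
    """Return source keys that are missing from the destination or differ in size.

    Both arguments map object key -> size in bytes (e.g. built from the OSS
    and S3 listings). destination.get() returns None for a missing key, which
    never equals a size, so missing keys are always selected.
    """
    return sorted(key for key, size in source.items() if destination.get(key) != size)
```

With the thirteen sample files above, where twelve are already in S3 with matching sizes, the function would return only `ABCDWallpaper-20200821.jpeg`, matching the second run's log.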

3. Additional notes

(1) Use a virtual terminal

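We recommend running the copy inside a screen virtual terminal so that a network disconnection does not interrupt it. Amazon Linux 2 already includes screen.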

Run screen to start a new virtual terminal. If the network connection drops, run screen -x to reattach to the previous virtual terminal.

(2) Copy by directory (prefix)

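To avoid the unnecessary load of traversing the entire bucket on every incremental copy, we recommend switching the OSS bucket to a new prefix (that is, a new directory) during the migration. For example, if the current data lives under pic1, change your application code before starting the full copy so that new files in the OSS bucket are written to pic2 and pic1 receives no further changes. The next incremental sync then does not need to traverse every file.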

To copy a single directory in the bucket, change the following settings in s3_upload_config.ini:

S3Prefix = pic1
SrcFileIndex = *

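Together, these two settings sync only the files under pic1. Once the full sync is complete, sync the pic2 directory in subsequent runs, which effectively reduces the overhead.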
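A rough way to quantify the saving is to count list requests. A minimal sketch, assuming each list request returns at most 1,000 keys (the per-page maximum of S3 ListObjectsV2; OSS ListObjects also caps max-keys at 1,000) and one listing of source and target per run; the function names and object counts are invented for illustration.

```python
import math

PAGE_SIZE = 1000  # assumed maximum number of keys returned per list request


def list_requests(object_count: int, page_size: int = PAGE_SIZE) -> int:
    """Paginated list calls needed to enumerate object_count keys (at least one)."""
    return max(1, math.ceil(object_count / page_size))


def requests_per_sync(source_count: int, target_count: int) -> int:
    """List calls for one pass over the source prefix and the target prefix."""
    return list_requests(source_count) + list_requests(target_count)


# Illustrative numbers only: 2,000,000 objects already copied from pic1,
# plus 5,000 new objects written to pic2 since the switch.
whole_bucket = requests_per_sync(2_005_000, 2_000_000)  # empty S3Prefix: lists everything
pic2_only = requests_per_sync(5_000, 0)                 # S3Prefix = pic2, nothing copied yet
print(whole_bucket, pic2_only)  # prints: 4005 6
```

The per-object comparison work scales the same way, so restricting each run to the active prefix keeps incremental syncs short.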

(3) Sync only files whose names start with a given string

Suppose the root of the bucket contains several files named ABCDWallpaper-xxxx.jpg, all starting with ABCD. Modify s3_upload_config.ini as follows:

S3Prefix = ABCD
SrcFileIndex = *

When you run the copy, only files starting with ABCD are synced.
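Both prefix examples above (pic1 and ABCD) behave like a plain string-prefix filter on object keys. A minimal sketch under that assumption; `select_keys` is a hypothetical name, and only SrcFileIndex = *, the value used throughout this article, is modelled.

```python
from typing import Iterable, List


def select_keys(keys: Iterable[str], s3_prefix: str = "", src_file_index: str = "*") -> List[str]:
    """Pick the object keys a run would cover under the article's two settings.

    s3_prefix:      plain string-prefix match, so "pic1" selects "pic1/a.jpg"
                    and "ABCD" selects "ABCDWallpaper-20200809.jpeg".
                    An empty prefix selects everything.
    src_file_index: only "*" (all files) is modelled in this sketch.
    """
    if src_file_index != "*":
        raise NotImplementedError("only SrcFileIndex = * is modelled here")
    return [key for key in keys if key.startswith(s3_prefix)]
```

Note that a plain string prefix such as `pic1` also matches keys under `pic10/`; the sketch makes that property visible rather than hiding it.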

IV. References

Amazon-s3-resumable-upload project on GitHub:

https://github.com/aws-samples/amazon-s3-resumable-upload

Data Transfer Hub (online data transfer solution):

https://www.amazonaws.cn/solutions/data-transfer-hub/