Convert a Replica Set to a Replicated Sharded Cluster

Overview

This tutorial converts a single three-member replica set into a sharded cluster with two shards. Each shard is itself a three-member replica set.

The tutorial uses a test environment running on a local UNIX-like system. You are encouraged to "follow along at home." If you need to perform this process in a production environment, notes throughout the document indicate procedural differences.

The overall procedure is as follows:

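1. Create a three-member replica set and insert some data into a collection.
2. Start the config servers and create a cluster with a single shard.
3. Create a second replica set from three new mongod instances.
4. Add the second replica set to the cluster as a new shard.
5. Enable sharding on the target database and collection.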

Process

Install MongoDB as described in the MongoDB installation tutorials.

Deploy a Replica Set with Test Data

If you already have a deployed replica set, you can skip this step and continue from Deploy Sharding Infrastructure.

Follow these steps to deploy a replica set and insert test data:

Create the following directories for the first replica set, which is named firstset:
- /data/example/firstset1
- /data/example/firstset2
- /data/example/firstset3
Create the directories with the following command:
```
mkdir -p /data/example/firstset1 /data/example/firstset2 /data/example/firstset3
```
In separate terminal windows or GNU Screen sessions, start three mongod instances with the following commands:
```
mongod --dbpath /data/example/firstset1 --port 10001 --replSet firstset --oplogSize 700 --rest
mongod --dbpath /data/example/firstset2 --port 10002 --replSet firstset --oplogSize 700 --rest
mongod --dbpath /data/example/firstset3 --port 10003 --replSet firstset --oplogSize 700 --rest
```
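The three commands above differ only in the member number (the directory suffix and the port). The following sketch uses a hypothetical helper, `mongodCommands`, which is not part of MongoDB. It makes that naming pattern explicit by generating the command lines for a set from its name and first port. It assumes the /data/example layout and flags used in this tutorial.

```javascript
// Hypothetical helper (not a MongoDB tool): generates the mongod command
// lines used in this tutorial for one replica set. Member i uses
// /data/example/<setName><i> and port basePort + i - 1.
function mongodCommands(setName, basePort, count, oplogSizeMB) {
  const commands = [];
  for (let i = 1; i <= count; i++) {
    commands.push(
      `mongod --dbpath /data/example/${setName}${i}` +
        ` --port ${basePort + i - 1}` +
        ` --replSet ${setName} --oplogSize ${oplogSizeMB} --rest`
    );
  }
  return commands;
}

// Reproduces the three commands for firstset shown above.
console.log(mongodCommands("firstset", 10001, 3, 700).join("\n"));
```

Calling `mongodCommands("secondset", 10004, 3, 700)` yields the commands used later for the second replica set.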
Note

The --oplogSize 700 option restricts the size of the operation log (i.e. oplog) for each mongod instance to 700MB. Without the --oplogSize option, each mongod reserves approximately 5% of the free disk space on the volume. By limiting the size of the oplog, each instance starts more quickly. Omit this setting in production environments.
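To put numbers on the note above, the following rough sketch applies only the "approximately 5% of free disk space" rule it quotes. Assumption: actual MongoDB releases also clamp the default to minimum and maximum sizes that depend on version and storage engine, and this sketch ignores them. The 100 GB of free space is an invented example value.

```javascript
// Rough model of the default-oplog rule quoted in the note: without
// --oplogSize, each mongod reserves about 5% of the volume's free space.
// Assumption: real releases also apply min/max bounds, ignored here.
function approxDefaultOplogMB(freeDiskMB) {
  return Math.round(freeDiskMB * 0.05);
}

const freeDiskMB = 100 * 1024; // assumed example: 100 GB free on the volume
const perMember = approxDefaultOplogMB(freeDiskMB);
console.log(`~${perMember} MB per mongod, ~${3 * perMember} MB for three members`);
console.log(`with --oplogSize 700: ${3 * 700} MB for three members`);
```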
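From a mongo shell, connect to the mongod instance on port 10001 with the following command. If you are working in a production environment, first read the note below.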
```
mongo localhost:10001/admin
```
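Note

If you are working in a production environment, or in a test environment that spans multiple systems, replace "localhost" with a resolvable domain name, hostname, or IP address.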

In the mongo shell, initialize the first replica set with the following command:

```
db.runCommand({"replSetInitiate" :
                {"_id" : "firstset",
                 "members" : [{"_id" : 1, "host" : "localhost:10001"},
                              {"_id" : 2, "host" : "localhost:10002"},
                              {"_id" : 3, "host" : "localhost:10003"}
                             ]}})
{
        "info" : "Config now saved locally.  Should come online in about a minute.",
        "ok" : 1
}
```
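The argument to replSetInitiate is an ordinary document: the set name as `_id`, plus a `members` array of `{_id, host}` entries. The following sketch uses a hypothetical helper, `replSetConfig`, which is not a MongoDB API. It builds that document from a host list and numbers members from 1, as the command above does. In the mongo shell, a document of this shape can also be passed to `rs.initiate()`.

```javascript
// Hypothetical helper: builds the replica set configuration document
// passed to replSetInitiate, numbering members from 1 like the example.
function replSetConfig(setName, hosts) {
  return {
    _id: setName,
    members: hosts.map((host, i) => ({ _id: i + 1, host: host })),
  };
}

const config = replSetConfig("firstset", [
  "localhost:10001",
  "localhost:10002",
  "localhost:10003",
]);
// Print the full command document in the same shape as the example above.
console.log(JSON.stringify({ replSetInitiate: config }, null, 2));
```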

In the mongo shell, create and populate a collection with the following JavaScript operations:

```
use test
// Names to pick from at random for the "name" field.
var people = ["Marc", "Bill", "George", "Eliot", "Matt", "Trey", "Tracy", "Greg", "Steve", "Kristina", "Katie", "Jeff"];
// Insert one million documents, one per iteration.
for (var i = 0; i < 1000000; i++) {
    var name = people[Math.floor(Math.random() * people.length)];
    var user_id = i;
    var boolean = [true, false][Math.floor(Math.random() * 2)];  // random true/false
    var added_at = new Date();
    var number = Math.floor(Math.random() * 10001);              // integer in [0, 10000]
    db.test_collection.save({"name": name, "user_id": user_id, "boolean": boolean, "added_at": added_at, "number": number});
}
```

These operations insert one million documents into the test_collection collection. How long this takes depends on your system's performance.

The script adds documents of the following form:

```
{ "_id" : ObjectId("4ed5420b8fc1dd1df5886f70"), "name" : "Greg", "user_id" : 4, "boolean" : true, "added_at" : ISODate("2011-11-29T20:35:23.121Z"), "number" : 74 }
```
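The shell loop above mixes document generation with one insert round trip per document. The following sketch separates the two so the generator can run and be tested outside the mongo shell. The `random` parameter and the batching helper are assumptions added for illustration. Each batch could be sent with a single `insertMany()` call, which the mongo shell provides from MongoDB 3.2 onward, instead of a million individual inserts.

```javascript
// Same field layout as the shell loop above, as a pure function.
// Assumption: `random` is injectable so results can be reproduced.
const PEOPLE = ["Marc", "Bill", "George", "Eliot", "Matt", "Trey",
                "Tracy", "Greg", "Steve", "Kristina", "Katie", "Jeff"];

function makeDoc(i, random = Math.random) {
  return {
    name: PEOPLE[Math.floor(random() * PEOPLE.length)],
    user_id: i,
    boolean: [true, false][Math.floor(random() * 2)],
    added_at: new Date(),
    number: Math.floor(random() * 10001), // integer in [0, 10000]
  };
}

// Split `total` documents into batches of at most `batchSize`,
// keeping user_id values consecutive across batches.
function makeBatches(total, batchSize, random = Math.random) {
  const batches = [];
  for (let start = 0; start < total; start += batchSize) {
    const batch = [];
    for (let i = start; i < Math.min(start + batchSize, total); i++) {
      batch.push(makeDoc(i, random));
    }
    batches.push(batch);
  }
  return batches;
}

const batches = makeBatches(25, 10);
console.log(batches.map((b) => b.length)); // [ 10, 10, 5 ]
```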

Deploy Sharding Infrastructure

This procedure creates the config servers, which store the cluster's metadata.

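Note

In a test environment, a single config server is sufficient. In production environments, use three config servers. Because config servers store only the cluster's metadata, they need very few resources.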

Create the following data directories for the three config servers:
- /data/example/config1
- /data/example/config2
- /data/example/config3
Issue the following command:
```
mkdir -p /data/example/config1 /data/example/config2 /data/example/config3
```

In separate terminal windows or GNU Screen sessions, start the config servers with the following commands:

```
mongod --configsvr --dbpath /data/example/config1 --port 20001
mongod --configsvr --dbpath /data/example/config2 --port 20002
mongod --configsvr --dbpath /data/example/config3 --port 20003
```

In a separate terminal window or GNU Screen session, start mongos with the following command:
```
mongos --configdb localhost:20001,localhost:20002,localhost:20003 --port 27017 --chunkSize 1
```
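Note

If you are using the collection created earlier, or just want to experiment with sharding, you can use a small --chunkSize (1MB is enough). The default chunkSize is 64MB, which means your cluster must hold at least 64MB of data before the automatic balancing process begins.

In production environments, do not use a small chunk size.

The --configdb option specifies the config servers (localhost:20001, localhost:20002, and localhost:20003). mongos runs on the default MongoDB port (27017), and the shard members in this tutorial run on ports 10001 through 10006. Because 27017 is the default port, you can omit the --port 27017 option from the mongos command in this example.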
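A quick calculation shows why the small chunk size matters for this exercise. If every chunk were filled to the maximum size, a data set would need at least data size ÷ chunk size chunks. The data size below is an assumption: one million documents of about 100 bytes each, which matches the avgObjSize that db.stats() reports later in this tutorial. In practice chunks are rarely full, so real chunk counts are higher.

```javascript
// Idealized lower bound: the fewest chunks a data set can occupy if every
// chunk were filled to the configured maximum chunk size.
function minChunks(dataSizeBytes, chunkSizeMB) {
  return Math.ceil(dataSizeBytes / (chunkSizeMB * 1024 * 1024));
}

// Assumed data size: 1,000,000 documents x ~100 bytes, i.e. about 100 MB.
const approxDataSize = 1000000 * 100;
console.log(minChunks(approxDataSize, 64)); // 2
console.log(minChunks(approxDataSize, 1));  // 96
```

With the 64MB default, the whole test collection fits in about two chunks, which gives the balancer little to move. With 1MB chunks, there are dozens of chunks to spread across the shards.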
Add the first shard in mongos. In a new terminal window or GNU Screen session, follow these steps:
1. Connect to mongos with the following command:
```
mongo localhost:27017/admin
```
2. Add the first shard with the addShard command:
```
db.runCommand( { addShard : "firstset/localhost:10001,localhost:10002,localhost:10003" } )
```
3. The following response indicates success:
```
{ "shardAdded" : "firstset", "ok" : 1 }
```
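For a replica-set shard, addShard takes a seed string of the form `<setName>/<host>,<host>,...`. The following sketch uses a hypothetical helper, `shardSeed`, which is not a MongoDB API, to build that string and the command document shown in step 2. The mongo shell's `sh.addShard()` helper accepts the same seed string.

```javascript
// Hypothetical helper: formats the "<setName>/<host>,<host>,..." seed
// string that addShard expects for a replica-set shard.
function shardSeed(setName, hosts) {
  if (hosts.length === 0) throw new Error("at least one host is required");
  return `${setName}/${hosts.join(",")}`;
}

const addShardCmd = {
  addShard: shardSeed("firstset", ["localhost:10001", "localhost:10002", "localhost:10003"]),
};
console.log(JSON.stringify(addShardCmd));
// {"addShard":"firstset/localhost:10001,localhost:10002,localhost:10003"}
```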

Deploy a Second Replica Set

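This procedure deploys a second replica set. It closely mirrors the deployment of the first one, except that it does not insert test data.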

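Create the following data directories for the members of the second replica set, which is named secondset (for example with mkdir -p, as for the first set):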
- /data/example/secondset1
- /data/example/secondset2
- /data/example/secondset3

In three separate terminal windows, start mongod with the following commands:

```
mongod --dbpath /data/example/secondset1 --port 10004 --replSet secondset --oplogSize 700 --rest
mongod --dbpath /data/example/secondset2 --port 10005 --replSet secondset --oplogSize 700 --rest
mongod --dbpath /data/example/secondset3 --port 10006 --replSet secondset --oplogSize 700 --rest
```

Note

As before, the second replica set uses a small --oplogSize setting. Omit this setting in production environments.

Use the mongo shell to connect to one member of the second replica set:
```
mongo localhost:10004/admin
```

In the mongo shell, initialize the second replica set with the following command:

```
db.runCommand({"replSetInitiate" :
                {"_id" : "secondset",
                 "members" : [{"_id" : 1, "host" : "localhost:10004"},
                              {"_id" : 2, "host" : "localhost:10005"},
                              {"_id" : 3, "host" : "localhost:10006"}
                             ]}})

{
     "info" : "Config now saved locally.  Should come online in about a minute.",
     "ok" : 1
}
```

Connect to mongos and add the second replica set to the cluster as a shard with the following commands:

```
use admin
db.runCommand( { addShard : "secondset/localhost:10004,localhost:10005,localhost:10006" } )
```

The following response indicates success:

```
{ "shardAdded" : "secondset", "ok" : 1 }
```

Verify that both shards are configured correctly with the listShards command. The command and example output:

```
db.runCommand({listShards:1})
{
       "shards" : [
              {
                     "_id" : "firstset",
                     "host" : "firstset/localhost:10001,localhost:10003,localhost:10002"
              },
              {
                     "_id" : "secondset",
                     "host" : "secondset/localhost:10004,localhost:10006,localhost:10005"
              }
      ],
     "ok" : 1
}
```
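The listShards reply may list the members in a different order from the order given to addShard. In the sample output above, 10003 comes before 10002. A check should therefore compare the host lists as sets. The following sketch uses hypothetical helpers, not MongoDB APIs, and works on the sample reply pasted as a plain object. Assumption: as in this tutorial, each shard's `_id` equals its replica set name.

```javascript
// Hypothetical helpers for checking a listShards reply.
// Split "setName/host1,host2" into its parts.
function parseShardHost(hostString) {
  const slash = hostString.indexOf("/");
  if (slash < 0) return { setName: null, hosts: [hostString] }; // standalone shard
  return {
    setName: hostString.slice(0, slash),
    hosts: hostString.slice(slash + 1).split(","),
  };
}

// True if every expected shard exists with exactly the expected members,
// ignoring member order.
function shardsMatch(listShardsResult, expected) {
  return Object.entries(expected).every(([id, hosts]) => {
    const shard = listShardsResult.shards.find((s) => s._id === id);
    if (!shard) return false;
    const parsed = parseShardHost(shard.host);
    return parsed.setName === id &&
      parsed.hosts.length === hosts.length &&
      hosts.every((h) => parsed.hosts.includes(h));
  });
}

// The listShards sample output above, pasted as a plain object.
const result = {
  shards: [
    { _id: "firstset", host: "firstset/localhost:10001,localhost:10003,localhost:10002" },
    { _id: "secondset", host: "secondset/localhost:10004,localhost:10006,localhost:10005" },
  ],
  ok: 1,
};

console.log(shardsMatch(result, {
  firstset: ["localhost:10001", "localhost:10002", "localhost:10003"],
  secondset: ["localhost:10004", "localhost:10005", "localhost:10006"],
})); // true
```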

Enable Sharding

In MongoDB, sharding must be enabled at both the database level and the collection level.

Enable Sharding at the Database Level

Use the enableSharding command. The following example enables sharding on the "test" database:

```
db.runCommand( { enableSharding : "test" } )
{ "ok" : 1 }
```

Create an Index on the Shard Key

MongoDB uses the shard key to distribute documents among shards. Once you select a shard key, you cannot change it. A good shard key:

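- has values that spread data evenly across all shards,
- groups the documents a single query needs into contiguous chunks, and
- allows chunks to migrate efficiently between shards.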

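Shard keys are often compound keys that combine some kind of hash with another primary key. Choosing a shard key depends on your data, your application's architecture and usage patterns, and other considerations. This example uses "number" as the shard key, which would usually not be a good shard key in production.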
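With a range-based key such as `{number: 1}`, each chunk covers a contiguous range of key values, with the lower bound inclusive and the upper bound exclusive. A document belongs to whichever chunk's range contains its key. The following minimal model finds that chunk by binary search over sorted split points. The split points are invented for illustration, because real boundaries come from the cluster's metadata. The model also hints at one weakness of `number`: it has only 10,001 distinct values, and a chunk cannot be split between documents that share one key value.

```javascript
// Minimal model of range-based chunk lookup. splitPoints are sorted
// chunk boundaries; chunk k covers [splitPoints[k-1], splitPoints[k]),
// with chunk 0 starting at MinKey and the last chunk ending at MaxKey.
function chunkIndex(key, splitPoints) {
  let lo = 0, hi = splitPoints.length;
  while (lo < hi) {            // binary search: count split points <= key
    const mid = (lo + hi) >> 1;
    if (splitPoints[mid] <= key) lo = mid + 1;
    else hi = mid;
  }
  return lo;                   // 0 .. splitPoints.length
}

const splits = [2500, 5000, 7500]; // hypothetical: 4 chunks over 0..10000
console.log(chunkIndex(0, splits));    // 0  -> [MinKey, 2500)
console.log(chunkIndex(2500, splits)); // 1  -> [2500, 5000)
console.log(chunkIndex(9999, splits)); // 3  -> [7500, MaxKey)
```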

Create the index as follows:

```
use test
db.test_collection.ensureIndex({number:1})  // MongoDB 3.0 and later prefer createIndex()
```

See also

The Shard Key Overview and the Shard Keys section.

Shard the Collection

Issue the following command:

```
use admin
db.runCommand( { shardCollection : "test.test_collection", key : {"number":1} })
{ "collectionsharded" : "test.test_collection", "ok" : 1 }
```

The test_collection collection is now sharded!

Over the next few minutes, the balancer begins migrating chunks between the shards. To confirm this, switch to the test database and run db.stats() or db.printShardingStatus().

As clients insert more documents into the collection, mongos distributes the data among the shards.

In the mongo shell, run the following commands to return statistics for each shard in the cluster:

```
use test
db.stats()
db.printShardingStatus()
```

Example output of the db.stats() command:

```
{
     "raw" : {
             "firstset/localhost:10001,localhost:10003,localhost:10002" : {
                     "db" : "test",
                     "collections" : 3,
                     "objects" : 973887,
                     "avgObjSize" : 100.33173458522396,
                     "dataSize" : 97711772,
                     "storageSize" : 141258752,
                     "numExtents" : 15,
                     "indexes" : 2,
                     "indexSize" : 56978544,
                     "fileSize" : 1006632960,
                     "nsSizeMB" : 16,
                     "ok" : 1
             },
             "secondset/localhost:10004,localhost:10006,localhost:10005" : {
                     "db" : "test",
                     "collections" : 3,
                     "objects" : 26125,
                     "avgObjSize" : 100.33286124401914,
                     "dataSize" : 2621196,
                     "storageSize" : 11194368,
                     "numExtents" : 8,
                     "indexes" : 2,
                     "indexSize" : 2093056,
                     "fileSize" : 201326592,
                     "nsSizeMB" : 16,
                     "ok" : 1
             }
     },
     "objects" : 1000012,
     "avgObjSize" : 100.33176401883178,
     "dataSize" : 100332968,
     "storageSize" : 152453120,
     "numExtents" : 23,
     "indexes" : 4,
     "indexSize" : 59071600,
     "fileSize" : 1207959552,
     "ok" : 1
}
```
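The top-level totals in this output are the sums of the per-shard values under "raw". For example, 973887 + 26125 = 1000012 objects. The per-shard share shows how far balancing has progressed. The following sketch computes both from the sample numbers, copied into a plain object. The `shares` helper is hypothetical and not part of the shell.

```javascript
// Per-shard figures copied from the "raw" section of the sample above.
const raw = {
  "firstset/localhost:10001,localhost:10003,localhost:10002": { objects: 973887, dataSize: 97711772 },
  "secondset/localhost:10004,localhost:10006,localhost:10005": { objects: 26125, dataSize: 2621196 },
};

// Hypothetical helper: total of one field across shards, plus each
// shard's percentage share, keyed by replica set name.
function shares(rawStats, field) {
  const total = Object.values(rawStats).reduce((sum, s) => sum + s[field], 0);
  const percent = {};
  for (const [shard, stats] of Object.entries(rawStats)) {
    percent[shard.split("/")[0]] = (100 * stats[field] / total).toFixed(1) + "%";
  }
  return { total, percent };
}

console.log(shares(raw, "objects"));
console.log(shares(raw, "dataSize"));
```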

Example output of the db.printShardingStatus() command:

```
--- Sharding Status ---
sharding version: { "_id" : 1, "version" : 3 }
shards:
       {  "_id" : "firstset",  "host" : "firstset/localhost:10001,localhost:10003,localhost:10002" }
       {  "_id" : "secondset",  "host" : "secondset/localhost:10004,localhost:10006,localhost:10005" }
databases:
       {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
       {  "_id" : "test",  "partitioned" : true,  "primary" : "firstset" }
                  test.test_collection chunks:
                                               secondset     5
                                               firstset      186

[...]
```
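For a sense of how much work remains in the sample above (186 chunks on firstset, 5 on secondset), the following idealized sketch computes the fewest single-chunk migrations that would leave every shard within one chunk of the others. Assumption: the real balancer stops once the imbalance falls below a version-dependent migration threshold, and new inserts keep splitting chunks, so actual migration counts will differ.

```javascript
// Idealized estimate: fewest one-chunk migrations that bring every shard
// to floor(total / n) or floor(total / n) + 1 chunks.
function migrationsToBalance(counts) {
  const shards = Object.keys(counts);
  const total = shards.reduce((n, s) => n + counts[s], 0);
  const base = Math.floor(total / shards.length);
  const extra = total % shards.length; // this many shards keep base + 1
  // Let the currently largest shards keep the extra chunks.
  const sorted = shards.slice().sort((a, b) => counts[b] - counts[a]);
  let moves = 0;
  sorted.forEach((s, i) => {
    const target = base + (i < extra ? 1 : 0);
    if (counts[s] > target) moves += counts[s] - target;
  });
  return { total, moves };
}

console.log(migrationsToBalance({ firstset: 186, secondset: 5 }));
// { total: 191, moves: 90 }
```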

After a while, run these commands again to confirm that chunks are migrating from firstset to secondset.

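When this procedure is complete, you will have converted a replica set into a sharded cluster in which each shard is itself a replica set.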
