OPTIONS
翻译或纠错本页面

FAQ: MongoDB 关于开发者

这篇文档解答了使用MongoDB进行应用开发的常见问题。

如果没有找到你需要的信息,查看:doc:complete list of FAQs </faq> or post your question to the MongoDB User Mailing List <https://groups.google.com/forum/?fromgroups#!forum/mongodb-user>  .

在MongoDB中命名空间是什么?

A “namespace” is the concatenation of the database name and the collection names [1] with a period character in between.

Collections are containers for documents that share one or more indexes. Databases are groups of collections stored on disk using a single set of data files. [2]

For an example acme.users namespace, acme is the database name and users is the collection name. Period characters can occur in collection names, so that acme.user.history is a valid namespace, with acme as the database name, and user.history as the collection name.

While data models like this appear to support nested collections, the collection namespace is flat, and there is no difference from the perspective of MongoDB between acme, acme.users, and acme.records.

[1]

每个索引都有自己的命名空间。

[2]MongoDB database have a configurable limit on the number of namespaces in a database.

如何将所有的对象从一个集合复制到另一个?

mongo 命令行程序中,你可以使用以下操作来复制所有集合:

db.source.copyTo(newCollection)

警告

When using db.collection.copyTo() check field types to ensure that the operation does not remove type information from documents during the translation from BSON to JSON. Consider using cloneCollection() to maintain type fidelity.

The db.collection.copyTo() method uses the eval command internally. As a result, the db.collection.copyTo() operation takes a global lock that blocks all other read and write operations until the db.collection.copyTo() completes.

Also consider the cloneCollection command that may provide some of this functionality.

如果你删除一个文档,MongoDB会将其从硬盘上删除吗?

是的。

当你使用 remove() 函数,MongoDB在硬盘上将不再存储该对象。

MongoDB什么时候将更新的内容写入硬盘?

MongoDB flushes writes to disk on a regular interval. In the default configuration, MongoDB writes data to the main data files on disk every 60 seconds and commits the journal roughly every 100 milliseconds. These values are configurable with the commitIntervalMs and syncPeriodSecs.

These values represent the maximum amount of time between the completion of a write operation and the point when the write is durable in the journal, if enabled, and when MongoDB flushes data to the disk. In many cases MongoDB and the operating system flush data to disk more frequently, so that the above values represents a theoretical maximum.

However, by default, MongoDB uses a “lazy” strategy to write to disk. This is advantageous in situations where the database receives a thousand increments to an object within one second, MongoDB only needs to flush this data to disk once. In addition to the aforementioned configuration options, you can also use fsync and 安全写级别参考 to modify this strategy.

在MongoDB中如何交易和锁定?

MongoDB does not have support for traditional locking or complex transactions with rollback. MongoDB aims to be lightweight, fast, and predictable in its performance. This is similar to the MySQL MyISAM autocommit model. By keeping transaction support extremely simple, MongoDB can provide greater performance especially for partitioned or replicated systems with a number of database server processes.

MongoDB does have support for atomic operations within a single document. Given the possibilities provided by nested documents, this feature provides support for a large number of use-cases.

在MongoDB中如何聚合数据?

在2.1及以上版本中,你可以通过 aggregate 命令使用新的 aggregation framework

MongoDB also supports map-reduce with the mapReduce command, as well as basic aggregation with the group, count, and distinct. commands.

参见

聚合 页面。

为什么MongoDB日志中有许多 “Connection Accepted” 的事件记录?

If you see a very large number connection and re-connection messages in your MongoDB log, then clients are frequently connecting and disconnecting to the MongoDB server. This is normal behavior for applications that do not use request pooling, such as CGI. Consider using FastCGI, an Apache Module, or some other kind of persistent application server to decrease the connection overhead.

If these connections do not impact your performance you can use the run-time quiet option or the command-line option --quiet to suppress these messages from the log.

Amazon EBS上可以运行MongoDB吗?

是的。

MongoDB users of all sizes have had a great deal of success using MongoDB on the EC2 platform using EBS disks.

参见

Amazon EC2

为什么MongoDB的数据文件这么大?

MongoDB aggressively preallocates data files to reserve space and avoid file system fragmentation. You can use the storage.smallFiles setting to modify the file preallocation strategy.

对于小文档,如何优化存储?

Each MongoDB document contains a certain amount of overhead. This overhead is normally insignificant but becomes significant if all documents are just a few bytes, as might be the case if the documents in your collection only have one or two fields.

Consider the following suggestions and strategies for optimizing storage utilization for these collections:

  • 明确使用 “_id” 字段。

    MongoDB clients automatically add an _id field to each document and generate a unique 12-byte ObjectId for the _id field. Furthermore, MongoDB always indexes the _id field. For smaller documents this may account for a significant amount of space.

    To optimize storage use, users can specify a value for the _id field explicitly when inserting documents into the collection. This strategy allows applications to store a value in the _id field that would have occupied space in another portion of the document.

    You can store any value in the _id field, but because this value serves as a primary key for documents in the collection, it must uniquely identify them. If the field’s value is not unique, then it cannot serve as a primary key as there would be collisions in the collection.

  • 使用较短的字段名。

    MongoDB stores all field names in every document. For most documents, this represents a small fraction of the space used by a document; however, for small documents the field names may represent a proportionally large amount of space. Consider a collection of documents that resemble the following:

    { last_name : "Smith", best_score: 3.9 }
    

    If you shorten the field named last_name to lname and the field named best_score to score, as follows, you could save 9 bytes per document.

    { lname : "Smith", score : 3.9 }
    

    Shortening field names reduces expressiveness and does not provide considerable benefit for larger documents and where document overhead is not of significant concern. Shorter field names do not reduce the size of indexes, because indexes have a predefined structure.

    一般来说,使用短的字段名不是必要的。

  • 嵌入文档。

    在某些情况下,你可能想在一些文档中嵌入文档并在每个文档上面保存。

什么时候我应该使用GridFS?

For documents in a MongoDB collection, you should always use GridFS for storing files larger than 16 MB.

In some situations, storing large files may be more efficient in a MongoDB database than on a system-level filesystem.

  • If your filesystem limits the number of files in a directory, you can use GridFS to store as many files as needed.
  • When you want to keep your files and metadata automatically synced and deployed across a number of systems and facilities. When using geographically distributed replica sets MongoDB can distribute files and their metadata automatically to a number of mongod instances and facilities.
  • When you want to access information from portions of large files without having to load whole files into memory, you can use GridFS to recall sections of files without reading the entire file into memory.

Do not use GridFS if you need to update the content of the entire file atomically. As an alternative you can store multiple versions of each file and specify the current version of the file in the metadata. You can update the metadata field that indicates “latest” status in an atomic update after uploading the new version of the file, and later remove previous versions if needed.

Furthermore, if your files are all smaller the 16 MB BSON Document Size limit, consider storing the file manually within a single document. You may use the BinData data type to store the binary data. See your drivers documentation for details on using BinData.

GridFS的更多信息,查看 GridFS

How does MongoDB address SQL or Query injection?

BSON

As a client program assembles a query in MongoDB, it builds a BSON object, not a string. Thus traditional SQL injection attacks are not a problem. More details and some nuances are covered below.

MongoDB represents queries as BSON objects. Typically client libraries provide a convenient, injection free, process to build these objects. Consider the following C++ example:

BSONObj my_query = BSON( "name" << a_name );
auto_ptr<DBClientCursor> cursor = c.query("tutorial.persons", my_query);

Here, my_query then will have a value such as { name : "Joe" }. If my_query contained special characters, for example ,, :, and {, the query simply wouldn’t match any documents. For example, users cannot hijack a query and convert it to a delete.

JavaScript

注解

You can disable all server-side execution of JavaScript, by passing the --noscripting option on the command line or setting security.javascriptEnabled in a configuration file.

All of the following MongoDB operations permit you to run arbitrary JavaScript expressions directly on the server:

You must exercise care in these cases to prevent users from submitting malicious JavaScript.

Fortunately, you can express most queries in MongoDB without JavaScript and for queries that require JavaScript, you can mix JavaScript and non-JavaScript in a single query. Place all the user-supplied fields directly in a BSON field and pass JavaScript code to the $where field.

  • If you need to pass user-supplied values in a $where clause, you may escape these values with the CodeWScope mechanism. When you set user-submitted values as variables in the scope document, you can avoid evaluating them on the database server.

  • If you need to use db.eval() with user supplied values, you can either use a CodeWScope or you can supply extra arguments to your function. For instance:

    db.eval(function(userVal){...},
            user_value);
    

    This will ensure that your application sends user_value to the database server as data rather than code.

Dollar Sign Operator Escaping

Field names in MongoDB’s query language have semantic meaning. The dollar sign (i.e $) is a reserved character used to represent operators (i.e. $inc.) Thus, you should ensure that your application’s users cannot inject operators into their inputs.

In some cases, you may wish to build a BSON object with a user-provided key. In these situations, keys will need to substitute the reserved $ and . characters. Any character is sufficient, but consider using the Unicode full width equivalents: U+FF04 (i.e. “$”) and U+FF0E (i.e. “.”).

Consider the following example:

BSONObj my_object = BSON( a_key << a_name );

The user may have supplied a $ value in the a_key value. At the same time, my_object might be { $where : "things" }. Consider the following cases:

  • Insert. Inserting this into the database does no harm. The insert process does not evaluate the object as a query.

    注解

    MongoDB client drivers, if properly implemented, check for reserved characters in keys on inserts.

  • Update. The update() operation permits $ operators in the update argument but does not support the $where operator. Still, some users may be able to inject operators that can manipulate a single document only. Therefore your application should escape keys, as mentioned above, if reserved characters are possible.

  • Query Generally this is not a problem for queries that resemble { x : user_obj }: dollar signs are not top level and have no effect. Theoretically it may be possible for the user to build a query themselves. But checking the user-submitted content for $ characters in key names may help protect against this kind of injection.

Driver-Specific Issues

See the “PHP MongoDB Driver Security Notes” page in the PHP driver documentation for more information

How does MongoDB provide concurrency?

MongoDB implements a readers-writer lock. This means that at any one time, only one client may be writing or any number of clients may be reading, but that reading and writing cannot occur simultaneously.

In standalone and replica sets the lock’s scope applies to a single mongod instance or primary instance. In a sharded cluster, locks apply to each individual shard, not to the whole cluster.

更多信息,查看 FAQ: Concurrency

What is the compare order for BSON types?

MongoDB permits documents within a single collection to have fields with different BSON types. For instance, the following documents may exist within a single collection.

{ x: "string" }
{ x: 42 }

When comparing values of different BSON types, MongoDB uses the following comparison order, from lowest to highest:

  1. MinKey (internal type)
  2. 数字(整形,长整形,浮点型)

  3. Symbol, String
  4. 对象

  5. 数组

  6. BinData
  7. ObjectId
  8. 布尔类型

  9. 日期,时间

  10. Regular Expression
  11. MaxKey (internal type)

MongoDB treats some types as equivalent for comparison purposes. For instance, numeric types undergo conversion before comparison.

The comparison treats a non-existent field as it would an empty BSON Object. As such, a sort on the a field in documents { } and { a: null } would treat the documents as equivalent in sort order.

With arrays, a less-than comparison or an ascending sort compares the smallest element of arrays, and a greater-than comparison or a descending sort compares the largest element of the arrays. As such, when comparing a field whose value is a single-element array (e.g. [ 1 ]) with non-array fields (e.g. 2), the comparison is between 1 and 2. A comparison of an empty array (e.g. [ ]) treats the empty array as less than null or a missing field.

MongoDB sorts BinData in the following order:

  1. 首先,数据的长度或大小

  2. Then, by the BSON one-byte subtype.
  3. Finally, by the data, performing a byte-by-byte comparison.

Consider the following mongo example:

db.test.insert( {x : 3 } );
db.test.insert( {x : 2.9 } );
db.test.insert( {x : new Date() } );
db.test.insert( {x : true } );

db.test.find().sort({x:1});
{ "_id" : ObjectId("4b03155dce8de6586fb002c7"), "x" : 2.9 }
{ "_id" : ObjectId("4b03154cce8de6586fb002c6"), "x" : 3 }
{ "_id" : ObjectId("4b031566ce8de6586fb002c9"), "x" : true }
{ "_id" : ObjectId("4b031563ce8de6586fb002c8"), "x" : "Tue Nov 17 2009 16:28:03 GMT-0500 (EST)" }

The $type operator provides access to BSON type comparison in the MongoDB query syntax. See the documentation on BSON types and the $type operator for additional information.

警告

Storing values of the different types in the same field in a collection is strongly discouraged.

参见

When multiplying values of mixed types, what type conversion rules apply?

The $mul multiplies the numeric value of a field by a number. For multiplication with values of mixed numeric types (32-bit integer, 64-bit integer, float), the following type conversion rules apply:

 

32位整型

64位整型

浮点型

32位整型

32位或64位整型

64位整型

浮点型

64位整型

64位整型

64位整型

浮点型

浮点型

浮点型

浮点型

浮点型

注解

  • If the product of two 32-bit integers exceeds the maximum value for a 32-bit integer, the result is a 64-bit integer.
  • Integer operations of any type that exceed the maximum value for a 64-bit integer produce an error.

如何查询空值字段?

文档中的字段可能会是空值,就像下面的抽象集合 “test” 中的文档:

{ _id: 1, cancelDate: null }
{ _id: 2 }

Different query operators treat null values differently:

  • The { cancelDate : null } query matches documents that either contains the cancelDate field whose value is null or that do not contain the cancelDate field:

    db.test.find( { cancelDate: null } )
    

    The query returns both documents:

    { "_id" : 1, "cancelDate" : null }
    { "_id" : 2 }
    
  • The { cancelDate : { $type: 10 } } query matches documents that contains the cancelDate field whose value is null only; i.e. the value of the cancelDate field is of BSON Type Null (i.e. 10) :

    db.test.find( { cancelDate : { $type: 10 } } )
    

    The query returns only the document that contains the null value:

    { "_id" : 1, "cancelDate" : null }
    
  • The { cancelDate : { $exists: false } } query matches documents that do not contain the cancelDate field:

    db.test.find( { cancelDate : { $exists: false } } )
    

    The query returns only the document that does not contain the cancelDate field:

    { "_id" : 2 }
    

参见

The reference documentation for the $type and $exists operators.

Are there any restrictions on the names of Collections?

Collection names can be any UTF-8 string with the following exceptions:

  • A collection name should begin with a letter or an underscore.
  • 空字符串 ("") 是一个无效的集合名。

  • Collection names cannot contain the $ character. (version 2.2 only)
  • Collection names cannot contain the null character: \0
  • Do not name a collection using the system. prefix. MongoDB reserves system. for system collections, such as the system.indexes collection.
  • The maximum size of a collection name is 128 characters, including the name of the database. However, for maximum flexibility, collections should have names less than 80 characters.

If your collection name includes special characters, such as the underscore character, then to access the collection use the db.getCollection() method or a similar method for your driver.

例子

To create a collection _foo and insert the { a : 1 } document, use the following operation:

db.getCollection("_foo").insert( { a : 1 } )

To perform a query, use the find() method, in as the following:

db.getCollection("_foo").find()

How do I isolate cursors from intervening write operations?

MongoDB cursors can return the same document more than once in some situations. [3] You can use the snapshot() method on a cursor to isolate the operation for a very specific case.

snapshot() traverses the index on the _id field and guarantees that the query will return each document (with respect to the value of the _id field) no more than once. [4]

The snapshot() does not guarantee that the data returned by the query will reflect a single moment in time nor does it provide isolation from insert or delete operations.

警告

As an alternative, if your collection has a field or fields that are never modified, you can use a unique index on this field or these fields to achieve a similar result as the snapshot(). Query with hint() to explicitly force the query to use that index.

[3]As a cursor returns documents other operations may interleave with the query: if some of these operations are updates that cause the document to move (in the case of a table scan, caused by document growth) or that change the indexed field on the index used by the query; then the cursor will return the same document more than once.
[4]MongoDB does not permit changes to the value of the _id field; it is not possible for a cursor that transverses this index to pass the same document more than once.

什么时候应该把文档嵌入其它文档?

When modeling data in MongoDB, embedding is frequently the choice for:

  • “contains” relationships between entities.
  • one-to-many relationships when the “many” objects always appear with or are viewed in the context of their parents.

You should also consider embedding for performance reasons if you have a collection with a large number of small documents. Nevertheless, if small, separate documents represent the natural model for the data, then you should maintain that model.

If, however, you can group these small documents by some logical relationship and you frequently retrieve the documents by this grouping, you might consider “rolling-up” the small documents into larger documents that contain an array of subdocuments. Keep in mind that if you often only need to retrieve a subset of the documents within the group, then “rolling-up” the documents may not provide better performance.

“Rolling up” these small documents into logical groupings means that queries to retrieve a group of documents involve sequential reads and fewer random disk accesses.

Additionally, “rolling up” documents and moving common fields to the larger document benefit the index on these fields. There would be fewer copies of the common fields and there would be fewer associated key entries in the corresponding index. See 索引概念 for more information on indexes.

在哪可以学习MongoDB的数据模型?

首先阅读 数据模型 章节的文档。这部分文档包括高级数据模型的介绍,还有数据模型的实战训练和详细问题。

另外,学习一些下面外部资源中提供的例子:

Can I manually pad documents to prevent moves during updates?

An update can cause a document to move on disk if the document grows in size. To minimize document movements, MongoDB uses padding.

You should not have to pad manually because MongoDB adds padding automatically and can adaptively adjust the amount of padding added to documents to prevent document relocations following updates. You can change the default paddingFactor calculation by using the collMod command with the usePowerOf2Sizes flag. The usePowerOf2Sizes flag ensures that MongoDB allocates document space in sizes that are powers of 2, which helps ensure that MongoDB can efficiently reuse free space created by document deletion or relocation.

However, if you must pad a document manually, you can add a temporary field to the document and then $unset the field, as in the following example.

警告

Do not manually pad documents in a capped collection. Applying manual padding to a document in a capped collection can break replication. Also, the padding is not preserved if you re-sync the MongoDB instance.

var myTempPadding = [ "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
                      "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
                      "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
                      "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"];

db.myCollection.insert( { _id: 5, paddingField: myTempPadding } );

db.myCollection.update( { _id: 5 },
                        { $unset: { paddingField: "" } }
                      )

db.myCollection.update( { _id: 5 },
                        { $set: { realField: "Some text that I might have needed padding for" } }
                      )