Data store size is a perennial discussion item in vSphere design. A large data store is great for simplicity of management, but there's the risk of performance overload. Small data stores make isolating performance easy, but increase the complexity of management.
The right data store size is a balance of manageability and performance, and it depends on the workload that the data store needs to satisfy. Most customers have several types of workloads, which leads to several data store sizes, each designed to accommodate its workload.
How to choose data store sizes
First, let's look at the possible range and limits. The smallest Virtual Machine File System (VMFS) data store you can have is 1.6 GB, which is ridiculously small, so you will definitely be able to create a small data store.
At the other extreme, there are two options. If your environment doesn't contain ESXi versions prior to 5.0, then you may use VMFS v5 data stores. These can have a single logical unit number (LUN) providing a data store of up to 64 TB, which is an absurdly large size that won't limit your vSphere design. However, you should consider how long recovery would take if such a large data store were destroyed.
If you still have older, pre-version 5 ESX hosts, the 64 TB data store size limit still applies, but each LUN can be no larger than 2 TB, so a sensible maximum data store size is 2 TB. While you can join multiple LUNs -- referred to as extents -- to make a data store larger than your largest LUN, it is best to avoid this. Using extents increases the risk of breaking the data store and is seldom required with vSphere 5.
There is another set of limits to consider, which is the maximum number of LUNs per ESXi host. Each host can have no more than 255 LUNs and no more than 256 NFS data stores. Also, the data stores must be visible to every host in the Distributed Resource Scheduler and High Availability clusters. With all these factors, larger environments should have fewer, larger data stores, rather than many small data stores. I have seen environments with a large number of small data stores, which leads to a large number of small clusters. Since the clusters only differ in the data stores they access, the decision of which cluster to use for a new virtual machine (VM) is more complex than it should be.
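To see how quickly small data stores run into the per-host limit, here is a minimal Python sketch. It assumes one LUN per data store (no extents) and uses the 255-LUN figure quoted above; the function names and the 200 TB example capacity are hypothetical, chosen only for illustration.

```python
import math

# Per-host LUN limit as quoted in the text above.
MAX_LUNS_PER_HOST = 255


def datastores_needed(total_capacity_tb: float, datastore_size_tb: float) -> int:
    """Number of equally sized data stores needed to hold the given capacity.

    Assumes one LUN per data store (no extents).
    """
    if total_capacity_tb < 0 or datastore_size_tb <= 0:
        raise ValueError("capacity must be >= 0 and data store size must be > 0")
    return math.ceil(total_capacity_tb / datastore_size_tb)


def fits_host_limit(total_capacity_tb: float, datastore_size_tb: float,
                    max_luns: int = MAX_LUNS_PER_HOST) -> bool:
    """True if every host could see all the data stores within the LUN limit."""
    return datastores_needed(total_capacity_tb, datastore_size_tb) <= max_luns


if __name__ == "__main__":
    # Hypothetical environment: 200 TB of VM storage.
    for size_tb in (0.5, 2.0):
        count = datastores_needed(200, size_tb)
        print(f"{size_tb} TB data stores: {count} needed, "
              f"fits on one host: {fits_host_limit(200, size_tb)}")
```

With 500 GB data stores, the hypothetical 200 TB environment would need 400 LUNs, which no single host could see; with 2 TB data stores it needs 100.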
For data stores, less is more
Another reason for fewer and larger data stores is the requirement to keep free space. The default data store free-space alarm will turn amber at 25% free space and red at 15% free space. A larger data store holds more VMs -- assuming the VMs are the same size -- so growth averages out across them, and there is less chance of running short of free space on one data store while another has ample free space. Another benefit of larger data stores is a longer notice period if a single VM is growing to consume the free space, since it has a larger pool to consume.
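The notice-period argument is easy to check with a little arithmetic. The sketch below is a rough model, not a vSphere API: it classifies a data store using the 25% and 15% free-space thresholds mentioned above and estimates how many days a steadily growing VM takes to trigger the amber alarm. The function names, data store sizes and the 5 GB/day growth rate are assumptions made up for the example.

```python
def free_space_status(capacity_gb: float, free_gb: float,
                      warn_free_pct: float = 25.0,
                      alert_free_pct: float = 15.0) -> str:
    """Classify a data store as 'green', 'amber' or 'red' by free space."""
    if capacity_gb <= 0 or not 0 <= free_gb <= capacity_gb:
        raise ValueError("need capacity > 0 and 0 <= free <= capacity")
    free_pct = 100.0 * free_gb / capacity_gb
    if free_pct <= alert_free_pct:
        return "red"
    if free_pct <= warn_free_pct:
        return "amber"
    return "green"


def days_until_amber(capacity_gb: float, free_gb: float,
                     growth_gb_per_day: float,
                     warn_free_pct: float = 25.0) -> float:
    """Days of steady growth before free space falls to the amber threshold."""
    if growth_gb_per_day <= 0:
        raise ValueError("growth rate must be > 0")
    threshold_gb = capacity_gb * warn_free_pct / 100.0
    # Already at or below the threshold means no notice period is left.
    return max(0.0, (free_gb - threshold_gb) / growth_gb_per_day)


if __name__ == "__main__":
    # Both hypothetical data stores start 40% free; one VM grows by 5 GB/day.
    print(days_until_amber(500, 200, 5))    # small data store
    print(days_until_amber(2000, 800, 5))   # large data store
```

In this example both data stores start at 40% free, but the 500 GB data store reaches the amber threshold after 15 days while the 2 TB data store takes 60 days.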
Fewer, larger data stores make your environment easier to manage, provided they can meet the performance requirements of the VMs they hold. This is where we start to look at smaller data stores. All of the VMs on a data store share the performance of the data store, so a large data store must provide enough bandwidth and transactions for a larger number of VMs.
Each ESXi server has a limit, called the queue depth, on the number of outstanding transactions (reads or writes) it can have in flight to each LUN. If only one VM on an ESXi server is using the LUN, that VM gets the whole queue. If there are multiple VMs, they share the queue, which may result in a bottleneck. The larger the data store, the more VMs it is likely to hold and the greater the chance of filling queues or saturating the LUN's performance.
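As a rough illustration of queue sharing, the sketch below divides one host's LUN queue evenly among the VMs using it and checks whether their combined outstanding I/O would overfill it. Even sharing is a simplification, and the queue depth of 32 is an assumed example value rather than a figure from this article; the real depth depends on the storage adapter and its configuration. The function names are hypothetical.

```python
from typing import Sequence

# Assumed example value only; check the actual queue depth for your adapter.
EXAMPLE_QUEUE_DEPTH = 32


def queue_slots_per_vm(queue_depth: int, active_vms: int) -> float:
    """Average queue slots per VM if the LUN queue is shared evenly."""
    if queue_depth <= 0 or active_vms <= 0:
        raise ValueError("queue depth and VM count must be > 0")
    return queue_depth / active_vms


def queue_is_oversubscribed(outstanding_io_per_vm: Sequence[int],
                            queue_depth: int) -> bool:
    """True if the VMs' combined outstanding I/O exceeds the LUN queue."""
    return sum(outstanding_io_per_vm) > queue_depth


if __name__ == "__main__":
    for vms in (1, 8, 16):
        share = queue_slots_per_vm(EXAMPLE_QUEUE_DEPTH, vms)
        print(f"{vms} VMs on the LUN: about {share} queue slots each")
    # Eight VMs each keeping six I/Os in flight overfill a queue of 32.
    print(queue_is_oversubscribed([6] * 8, EXAMPLE_QUEUE_DEPTH))
```

Under these assumptions a single VM has all 32 slots to itself, while 16 VMs on the same LUN from the same host average two slots each, so the queue can fill long before any one VM looks busy.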