Vertical scaling evolves increasing the capacity and power of a single server by adding more RAM and or increasing CPU processing power which may have hard ceilings on cloud-based providers so in most cases there is a practical maximum for most vertical scaling
Horizontal scaling involves dividing the dataset and load to multiple servers. While the speed and storage capacity on any given machine might be relatively small, each machine only handles a subset of the overall workload. In concert these machines can potentially provide much more speed and capacity than any single server architecture
Mongo supports horizontal scaling through sharding, a sharding cluster is made up of three components
- Shard - a subset of the sharded data, each shard can be deployed as a replica set
- Mongos - acts as the query router, serving as the interface between the apps and the sharded cluster
- Config servers - store meta data and config settings for the cluster.
Each shard is identified by a unique shard key which is immutable. The choice of a shard key affects how the cluster balancer creates and distributes chunks across all shards. A Custer with the best hardware can be bottlenecked by choosing a weak shard key. The ideal shard key allows mongo to distribute chunks evenly
Chunks are contiguous ranges of chard key values and are inclusive of the lower boundary and exclusive of the upper boundary.
Advantages
Mongo will automatically distribute the workload across the shards allowing each shard to do its subset of work including both reads and write workloads.
Increased storage can be scaled up by adding more shards
Clusters can perform partial reads and write operations even if one or more shards are unavailable. While the subset of data on an unavailable shard is not accessible during downtime reads and writes directed at the available shards can still succeed as long as a majority of the replica set is available