With so many applications being built as SAAS offerings, an important decision that must be made when defining the data architecture is: Should each customers’ data be in a separate database, or can a shared database (keyed by a customer identifier) be used for all customers.
Pros of database per customer
There’s a variety of reasons to isolate a customer’s data into its own database. Here’s a few I’ve seen:
- Parameter sniffing
- Deletions
- Cross contamination of data
- Restoring due to a data issue
- Security and access audit
- Azure
Parameter sniffing
Parameter sniffing is not as much of an issue. Parameter sniffing occurs when parameterized SQL generates and caches different plans for different parameter sets. This is a common cause of performance problems when different customers have large variances in the amount of data they generate. An execution plan generated for a small customer easily cause a large customer’s query to stare off into space.
Deletions
In the unfortunate case that you lose a customer, selectively deleting large amounts of customer data can generate a variety of performance issues in a single database. If you have customer data isolated in its own databases it is a fairly simple drop of the whole database.
Cross contamination of data
Cross contamination of data is not an issue. I’ve seen a large commonly used software vendor write an extremely complex approach to allowing customers to generate their own SSRS reports against their own data. As a result, they needed to ensure that the customer reporting platform did not allow customer A query data against customer B. As a result, they were parsing queries submitted via their reporting platform and adding predicates (where clause) to a customer written query. This was an extremely complex fix for something that would have been easily solved if customer data was split per database.
Restore due to a data issue
I’ve seen scenarios where a customer will do something to mangle their data in a way that could not easily be fixed via the application. The fix for this was restoring from a backup. This is a simple process if the customer’s data is isolated in its own database, this is not a solution if their data is in a database with other customers.
Security and access audit
Having a separate database per customer also provides a smaller surface area when it comes to security. Different credentials can be used per customers’ data, and auditing of logins or access to customer data is made simpler.
Azure
The Azure SQL Database and Elastic Pool model fits nicely with the model of having a database per customer. If you plan to target Azure, this should definitely be a consideration.
Cons of database per customer
Of course there are drawbacks to having a separate database per customer. Here are some of them:
- Deployment of updates
- Indexing
- On-boarding of new customers
- Cross customer reporting
- Scheduled tasks/backups
- Maintenance
Deployment of updates
Deployments of updates can be tricky when attempting to perform deployments on a large number of databases. The deployment process should be fairly automated in order to allow for updating large numbers of databases.
Indexing
Indexing can get tricky if you are troubleshooting for one particular customer and are in a a putting out a fire scenario. If you add an index to one customer’s database, should that be applied across all? And if so, it’s not that simple to deploy that same index across all databases that might not need it.
On-boarding of new customers
With a single database per customer, on-boarding of a new customer requires the creation of a new database. With automation the complexity of this could be mitigated.
Cross customer reporting
Cross customer reporting becomes difficult. Generating reports that span multiple databases is quite difficult, and almost necessitates a separate reporting database which contains data across customers.
Scheduled tasks/backups
Scheduled tasks can be more complex as they need to run for each database and any failures that occur must be resolved on each individual database. If you are on Azure SQL Database backups are less of an issue as it is part of the platform.
Maintenance
Similar to scheduled tasks, performing index maintenance gets more complex in that you have to perform it across multiple databases.
Final thoughts
If you do intend, or have already decided to use a single database for multiple customers, something to consider is modifying key queries to not use parameterized values on the “customer identifier”. If an end user never enters or is aware of their customer identifier, this prevents the possibility of SQL injection.