With so many applications being built as SaaS offerings, an important decision to make when defining the data architecture is: should each customer’s data be in a separate database, or can a shared database (keyed by a customer identifier) be used for all customers?
Pros of database per customer
There are a variety of reasons to isolate a customer’s data in its own database. Here are a few I’ve seen:
- Parameter sniffing
- Cross contamination of data
- Restoring due to a data issue
- Security and access audit
Parameter sniffing
Parameter sniffing is much less of an issue. It occurs when SQL Server compiles a parameterized query using the parameter values supplied on the first execution, caches that plan, and then reuses it for later executions with very different values. This is a common cause of performance problems when customers vary widely in the amount of data they generate. An execution plan generated for a small customer can easily cause a large customer’s query to stare off into space.
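To make the mechanism concrete, here is a toy Python model of a plan cache. It is not how SQL Server’s optimizer actually works; the customer names, row counts, threshold, and plan names are all made up for illustration. It only shows the shape of the problem: the first caller’s parameter value decides the plan that everyone else inherits.

```python
from dataclasses import dataclass, field

# Toy model only: row counts per customer are invented for illustration.
ROWS_PER_CUSTOMER = {"small_co": 50, "big_co": 5_000_000}
SEEK_THRESHOLD = 10_000  # hypothetical cutoff between the two plan shapes


def choose_plan(row_estimate: int) -> str:
    """Pick a plan shape from the estimated row count."""
    if row_estimate < SEEK_THRESHOLD:
        return "index_seek_with_lookups"
    return "scan_with_hash_join"


@dataclass
class PlanCache:
    plans: dict = field(default_factory=dict)

    def get_plan(self, query_text: str, customer: str) -> str:
        # The plan is compiled once, from the first caller's parameter
        # value, and reused for every later caller of the same query text.
        if query_text not in self.plans:
            self.plans[query_text] = choose_plan(ROWS_PER_CUSTOMER[customer])
        return self.plans[query_text]


QUERY = "SELECT * FROM orders WHERE customer_id = @customer_id"

# Shared database: one cache entry serves both customers.
shared = PlanCache()
plan_for_small = shared.get_plan(QUERY, "small_co")
plan_for_big = shared.get_plan(QUERY, "big_co")  # inherits small_co's plan

# Database per customer: each database compiles its own plan.
per_customer = {name: PlanCache() for name in ROWS_PER_CUSTOMER}
isolated_plan_for_big = per_customer["big_co"].get_plan(QUERY, "big_co")
```

In the shared cache, `big_co` ends up with the seek plan that was compiled for `small_co`. With one cache per database, it gets a plan built from its own data volume.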
In the unfortunate case that you lose a customer, selectively deleting a large amount of that customer’s data from a shared database can cause a variety of performance issues. If the customer’s data is isolated in its own database, removing it is a fairly simple drop of the whole database.
Cross contamination of data
Cross contamination of data is not an issue. I’ve seen a large, commonly used software vendor build an extremely complex approach to letting customers write their own SSRS reports against their own data. They needed to ensure that the reporting platform did not allow customer A to query customer B’s data, so they parsed every query submitted through the platform and added predicates (WHERE clauses) to the customer-written query. This was an extremely complex fix for something that would have been easily solved if customer data had been split per database.
Restore due to a data issue
I’ve seen scenarios where a customer did something to mangle their data in a way that could not easily be fixed via the application. The fix was restoring from a backup. This is a simple process if the customer’s data is isolated in its own database. It is not a practical option if their data shares a database with other customers, because a restore would roll back every other customer’s data as well.
Security and access audit
Having a separate database per customer also provides a smaller surface area when it comes to security. Different credentials can be used for each customer’s data, and auditing of logins or access to customer data is made simpler.
The Azure SQL Database and Elastic Pool model fits nicely with having a database per customer. If you plan to target Azure, this should definitely be a consideration.
Cons of database per customer
Of course there are drawbacks to having a separate database per customer. Here are some of them:
- Deployment of updates
- On-boarding of new customers
- Cross customer reporting
- Scheduled tasks/backups
Deployment of updates
Deploying updates can be tricky when there are a large number of databases to update. The deployment process should be largely automated so that the same change can be rolled out to every database.
Indexing can get tricky if you are troubleshooting for one particular customer in a putting-out-a-fire scenario. If you add an index to one customer’s database, should it be applied across all of them? If so, it’s not that simple to deploy the same index across all databases, some of which might not need it.
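As a rough sketch of what an automated rollout might look like, the Python below applies one statement to every customer database and records which ones failed. It uses in-memory SQLite purely as a stand-in so the example runs anywhere; the customer names, the `orders` table, and the index name are assumptions for illustration, not anything from a real system.

```python
import sqlite3

# Hypothetical migration; table and index names are illustrative only.
MIGRATION = "CREATE INDEX IF NOT EXISTS ix_orders_created ON orders (created_at)"


def deploy_to_all(connections: dict, statement: str) -> dict:
    """Apply one statement to every customer database.

    Returns a mapping of customer name to "ok" or an error message, so a
    failure on one database does not stop the rollout to the others.
    """
    results = {}
    for customer, conn in connections.items():
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(statement)
            results[customer] = "ok"
        except sqlite3.Error as exc:
            results[customer] = f"failed: {exc}"
    return results


# Demo setup: two healthy databases and one that is missing the table.
connections = {
    name: sqlite3.connect(":memory:") for name in ("acme", "globex", "initech")
}
for name in ("acme", "globex"):
    connections[name].execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)"
    )

results = deploy_to_all(connections, MIGRATION)
```

Collecting a per-database result instead of stopping at the first error is a design choice: it leaves you with a list of databases to retry, which matters once the count gets large.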
On-boarding of new customers
With a single database per customer, on-boarding a new customer requires the creation of a new database. With automation, the complexity of this can be mitigated.
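A minimal sketch of that automation, again using SQLite files as a stand-in for real databases. The baseline schema, the `schema_version` table, and the `onboard_customer` helper are hypothetical names chosen for this example.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical baseline schema applied to every new customer database.
BASELINE_SCHEMA = """
CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT, total REAL);
CREATE TABLE schema_version (version INTEGER NOT NULL);
INSERT INTO schema_version (version) VALUES (1);
"""


def onboard_customer(root: Path, customer: str) -> Path:
    """Create a new database for `customer` and apply the baseline schema."""
    # The name becomes part of a file name, so keep it strictly alphanumeric.
    if not customer.isalnum():
        raise ValueError(f"unexpected customer name: {customer!r}")
    db_path = root / f"{customer}.db"
    if db_path.exists():
        raise FileExistsError(f"{customer} already has a database")
    conn = sqlite3.connect(db_path)
    try:
        conn.executescript(BASELINE_SCHEMA)
        conn.commit()
    finally:
        conn.close()
    return db_path


root = Path(tempfile.mkdtemp())
new_db = onboard_customer(root, "acme")
```

Recording a schema version in each new database is one way to let a later deployment step tell which databases still need a given update.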
Cross customer reporting
Cross customer reporting becomes difficult. Generating reports that span multiple databases is awkward, and almost necessitates a separate reporting database that contains data across customers.
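One way to picture the separate reporting database is a job that visits each customer database, pulls a small aggregate, and writes it into a shared summary table. The sketch below does that with in-memory SQLite; the `orders` and `order_summary` tables and the helper names are assumptions made up for the example.

```python
import sqlite3


def make_customer_db(totals):
    """Demo helper: build an in-memory customer database with some orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO orders (total) VALUES (?)", [(t,) for t in totals])
    conn.commit()
    return conn


def refresh_report(reporting, customer_dbs):
    """Rebuild the cross-customer summary from each customer database."""
    reporting.execute(
        "CREATE TABLE IF NOT EXISTS order_summary ("
        "customer TEXT PRIMARY KEY, order_count INTEGER, revenue REAL)"
    )
    for customer, conn in customer_dbs.items():
        count, revenue = conn.execute(
            "SELECT COUNT(*), COALESCE(SUM(total), 0) FROM orders"
        ).fetchone()
        # Replace any earlier row so the job can be re-run safely.
        reporting.execute(
            "INSERT OR REPLACE INTO order_summary VALUES (?, ?, ?)",
            (customer, count, revenue),
        )
    reporting.commit()


customer_dbs = {
    "acme": make_customer_db([10.0, 15.0]),
    "globex": make_customer_db([100.0]),
}
reporting = sqlite3.connect(":memory:")
refresh_report(reporting, customer_dbs)
summary = reporting.execute(
    "SELECT customer, order_count, revenue FROM order_summary ORDER BY customer"
).fetchall()
```

Cross-customer questions can then be answered with a single query against `order_summary` instead of a query per database.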
Scheduled tasks/backups
Scheduled tasks can be more complex, as they need to run for each database, and any failures must be resolved on each individual database. If you are on Azure SQL Database, backups are less of an issue because they are part of the platform.
Similar to scheduled tasks, performing index maintenance gets more complex in that you have to perform it across multiple databases.
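If you intend to use, or have already decided to use, a single database for multiple customers, something to consider is modifying key queries so that the “customer identifier” is embedded as a literal rather than passed as a parameter. Each customer then gets its own execution plan, which sidesteps parameter sniffing on that value. As long as end users never enter or even see their customer identifier, and the application supplies it itself, doing this does not open the door to SQL injection.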
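Here is a hedged sketch of what that could look like in application code. The `orders_query_for` function, the table and column names, and the `@from_date` parameter are hypothetical. The important assumption is that `customer_id` comes from server-side state (such as the session) and never from user input, and the code still insists on a plain integer before inlining it.

```python
def orders_query_for(customer_id) -> str:
    """Render the orders query with the customer ID inlined as a literal."""
    # Accept only a plain int (bool is a subclass of int, so reject it too).
    if isinstance(customer_id, bool) or not isinstance(customer_id, int):
        raise TypeError("customer_id must be an int from server-side state")
    if customer_id <= 0:
        raise ValueError("customer_id must be positive")
    # The ID is a literal, so each customer produces distinct query text.
    # Anything a user can type still goes through a normal parameter.
    return (
        "SELECT id, total FROM orders "
        f"WHERE customer_id = {customer_id} AND created_at >= @from_date"
    )


query_for_42 = orders_query_for(42)
```

The trade-off is that the plan cache will typically hold one plan per customer for each such query rather than one shared plan, so this is best kept to the handful of queries where data volumes really do differ.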