With sharding all the ids have to be generated for global uniqueness. There are three strategies for this.
1. Use GUIDs as described here http://blogs.msdn.com/b/cbiyikoglu/archive/2011/06/20/id-generation-in-federations-identity-sequences-and-guids-uniqueidentifier.aspx
2. Having a central table that is accessed with a second connection to generate sequential ids
3. Using natural keys from the domain.
The second approach has the benefit of having numerical primary keys, however also a central failure location. The third strategy can seldom be used, because the domains don't allow this. Identity columns cannot be used at all.
@@@ php
<?php
use Doctrine\DBAL\DriverManager;
use Doctrine\DBAL\Id\TableHiLoIdGenerator;
$dbParams = array(
'dbname' => 'dbname.database.windows.net',
// ...
);
$conn = DriverManager::getConnection($dbParams);
$idGenerator = new TableHiLoIdGenerator($conn, 'id_table_name', $multiplicator = 1);
Doctrine Extension to support horizontal sharding in the Doctrine ORM.
## Idea
Implement sharding inside Doctrine at a level that is as unobtrusive to the developer as possible.
Problems to tackle:
1. Where to send INSERT statements?
2. How to generate primary keys?
3. How to pick shards for update, delete statements?
4. How to pick shards for select operations?
5. How to merge select queries that span multiple shards?
6. How to handle/prevent multi-shard queries that cannot be merged (GROUP BY)?
7. How to handle non-sharded data? (static metadata tables for example)
8. How to handle multiple connections?
9. Implementation on the DBAL or ORM level?
## Roadmap
Version 1: DBAL 2.3 (Multi-Tenant Apps)
1. ID Generation support (in DBAL + ORM done)
2. Multi-Tenant Support: Either pick a global metadata database or exactly one shard.
3. Fan-out queries over all shards (or a subset) by result appending
Version 2: ORM related (complex):
4. ID resolving (Pick shard for a new ID)
5. Query resolving (Pick shards a query should send to)
6. Shard resolving (Pick shards an ID could be on)
7. Transactions
8. Read Only objects
## Technical Requirements for Database Schemas
Sharded tables require the sharding-distribution key as one of their columns. This will affect your code compared to a normalized db-schema. If you have a Blog <-> BlogPost <-> PostComments entity setup sharded by `blog_id` then even the PostComment table needs this column, even if an "unsharded", normalized DB-Schema does not need this information.
## Implementation Details
Assumptions:
* For querying you either want to query ALL or just exactly one shard.
* IDs for ALL sharded tables have to be unique across all shards.
* Non-sharded data is replicated between all shards. They redundantly keep the information available. This is necessary so join queries on shards to reference data work.
* If you retrieve an object A from a shard, then all references and collections of this object reside on the same shard.
* The database schema on all shards is the same (or compatible)
### SQL Azure Federations
SQL Azure is a special case, points 1, 2, 3, 4, 7 and 8 are partly handled on the database level. This makes it a perfect test-implementation for just the subset of features in points 5-6. However there needs to be a way to configure SchemaTool to generate the correct Schema on SQL Azure.
* SELECT Operations: The most simple assumption is to always query all shards unless the user specifies otherwise explicitly.
* Queries can be merged in PHP code, this obviously does not work for DISTINCT, GROUP BY and ORDER BY queries.
### Generic Sharding
More features are necessary to implement sharding on the PHP level, independent from database support:
1. Configuration of multiple connections, one connection = one shard.
2. Primary Key Generation mechanisms (UUID, central table, sequence emulation)
## Primary Use-Cases
1. Multi-Tenant Applications
These are easier to support as you have some value to determine the shard id for the whole request very early on.
Here also queries can always be limited to a single shard.
2. Scale-Out by some attribute (Round-Robin?)
This strategy requires access to multiple shards in a single request based on the data accessed.
The sharding extension is currently in transition from a separate Project
into DBAL. Class names may differ.
This tutorial builds upon the `Brian Swans tutorial <http://blogs.msdn.com/b/silverlining/archive/2012/01/18/using-sql-azure-federations-via-php.aspx>`_
on SQLAzure Sharding and turns all the examples into examples using the Doctrine Sharding support.
It introduces SQL Azure Sharding, which is an abstraction layer in SQL Azure to
support sharding. Many features for sharding are implemented on the database
level, which makes it much easier to work with than generic sharding
implementations.
For this tutorial you need an Azure account. You don't need to deploy the code
on Azure, you can run it from your own machine against the remote database.
.. note::
You can look at the code from the 'examples/sharding' directory.
Install Doctrine
----------------
For this tutorial we will install Doctrine and the Sharding Extension through
`Composer <http://getcomposer.org>`_ which is the easiest way to install
Doctrine. Composer is a new package manager for PHP. Download the
``composer.phar`` from their website and put it into a newly created folder for
this tutorial. Now create a ``composer.json`` file in this project root with
the following content:
{
"require": {
"doctrine/dbal": "2.2.2",
"doctrine/shards": "0.2"
}
}
Open up the commandline and switch to your tutorial root directory, then call
``php composer.phar install``. It will grab the code and install it into the
``vendor`` subdirectory of your project. It also creates an autoloader, so that
we don't have to care about this.
Setup Connection
----------------
The first thing to start with is setting up Doctrine and the database connection:
.. code-block:: php
<?php
// bootstrap.php
use Doctrine\DBAL\DriverManager;
use Doctrine\Shards\DBAL\SQLAzure\SQLAzureShardManager;
require_once "vendor/autoload.php";
$conn = DriverManager::getConnection(array(
'driver' => 'pdo_sqlsrv',
'dbname' => 'SalesDB',
'host' => 'tcp:dbname.windows.net',
'user' => 'user@dbname',
'password' => 'XXX',
'platform' => new \Doctrine\DBAL\Platforms\SQLAzurePlatform(),
'CREATE TABLE Products (ProductID INT NOT NULL, SupplierID INT NOT NULL, ProductName NVARCHAR(255) NOT NULL, Price NUMERIC(12, 2) NOT NULL, PRIMARY KEY (ProductID))',
'CREATE TABLE Customers (CustomerID INT NOT NULL, CompanyName NVARCHAR(255) NOT NULL, FirstName NVARCHAR(255) NOT NULL, LastName NVARCHAR(255) NOT NULL, PRIMARY KEY (CustomerID))',
'CREATE TABLE Orders (CustomerID INT NOT NULL, OrderID INT NOT NULL, OrderDate DATETIME2(6) NOT NULL, PRIMARY KEY (CustomerID, OrderID))',
'CREATE TABLE OrderItems (CustomerID INT NOT NULL, OrderID INT NOT NULL, ProductID INT NOT NULL, Quantity INT NOT NULL, PRIMARY KEY (CustomerID, OrderID, ProductID))',