Commit 9b2fbbae authored by Benjamin Eberlei's avatar Benjamin Eberlei

Add Sharding documentation from doctrine/Shards

parent f0ccb4d2
......@@ -24,6 +24,8 @@ Contents:
reference/schema-representation
reference/events
reference/security
reference/sharding
reference/sharding_azure_tutorial
reference/supporting-other-databases
reference/portability
reference/known-vendor-issues
......
Sharding
========
Starting with 2.3 Doctrine DBAL contains some functionality to simplify the
development of horizontally sharded applications. In this first release it
contains a ``ShardManager`` interface. This interface allows to programatically
select a shard to send queries to. At the moment there are no functionalities
yet to dynamically pick a shard based on ID, query or database row yet. That
means the sharding extension is primarily suited for:
1. multi-tenant applications or
2. applications with completly separated datasets (example: weather data).
Both kind of application will work with both DBAL and ORM.
.. note::
Horizontal sharding is an evasive archicture that will affect your application code and using this
extension to Doctrine will not make it work "magically".
You have to understand and integrate the following drawbacks:
1. Pre-generation of IDs that are unique across all shards required.
2. No transaction support across shards.
3. No foreign key support across shards (meaning no "real" relations).
4. Very complex (or impossible) to query aggregates across shards.
5. Denormalization: Composite keys required where normalized non-sharded db schemas don't need them.
6. Schema Operations have to be done on all shards.
The primary questions in a sharding architecture are:
* Where is my data located?
* Where should I save this new data to find it later?
To answer these questions you generally have to craft a function that will tell
you for a given ID, on which shard the data for this ID is located. To simplify
this approach you will generally just pick a table which is the root of a set of
related data and decide for the IDs of this table. All the related data that
belong to this table are saved on the same shard.
Take for example a multi-user blog application with the following tables:
* Blog [id, name]
* Post [id, blog_id, subject, body, author_id]
* Comment [id, post_id, comment, author_id]
* User [id, username]
A sensible sharding architecture will split the application by blog. That means
all the data for a particular blog will be on a single shard and scaling is
done by putting the amound of blogs on many different database servers.
Now users can post and comment on different blogs that reside on different
shards. This makes the database schema above slightly tricky, because both
`author_id` columns cannot have foreign keys to `User (id)`. Instead the User
table is located in an entirely different "dimension" of the application in
terms of the sharding architecture.
To simplify working with this kind of multi-dimensional database schema, you
can replace the author_ids with something more "meaningful", for example the
e-mail address of the users if that is always known. The "user" table can then
be seperated from the database schema above and put on a second horizontally
scaled sharding architecture.
As you can see, even with just the four tables above, sharding actually becomes
quite complex to think about.
The rest of this section discusses Doctrine sharding functionality in technical
detail.
ID Generation
-------------
To solve the issue of unique ID-generation across all shards are several
approaches you should evaluate:
Use GUID/UUIDs
~~~~~~~~~~~~~~
The most simple ID-generation mechanism for sharding are
universally unique identifiers. These are 16-byte
(128-bit) numbers that are guaranteed to be unique across different servers.
You can `read up on UUIDs on Wikipedia
<http://en.wikipedia.org/wiki/Universally_unique_identifier>`_.
The drawback of UUIDs is the segmentation they cause on indexes. Because UUIDs
are not sequentially generated, they can have negative impact on index access
performance. Additionally they are much bigger
than numerical primary keys (which are normally 4-bytes in length).
At the moment Doctrine DBAL drivers MySQL and SQL Server support the generation
of UUID/GUIDs. You can use the following bit of code to generate them across
platforms:
.. code-block:: php
<?php
use Doctrine\DBAL\DriverManager;
$conn = DriverManager::getConnection(/**..**/);
$guid = $conn->fetchColumn('SELECT ' . $conn->getDatabasePlatform()->getGuidExpression());
$conn->insert("my_table", array("id" => $guid, "foo" => "bar"));
In your application you should hide this details in Id-Generation services:
.. code-block:: php
<?php
namespace MyApplication;
class IdGenerationService
{
private $conn;
public function generateCustomerId()
{
return $this->conn->fetchColumn('SELECT ' .
$this->conn->getDatabasePlatform()->getGuidExpression()
);
}
}
A good starting point to read up on GUIDs (vs numerical ids) is this blog post
`Coding Horror: Primary Keys: IDs vs GUIDs
<http://www.codinghorror.com/blog/2007/03/primary-keys-ids-versus-guids.html>`_.
Table Generator
~~~~~~~~~~~~~~~
In some scenarios there is no way around a numerical, automatically
incrementing id. The way Auto incrementing IDs are implemented in MySQL and SQL
Server however is completly unsuitable for sharding. Remember in a sharding
architecture you have to know where the row for a specific ID is located and
IDs have to be globally unique across all servers. Auto-Increment Primary Keys
are missing both properties.
To get around this issue you can use the so-called "table-generator" strategy.
In this case you define a single database that is responsible for the
generation of auto-incremented ids. You create a table on this database and
through the use of locking create new sequential ids.
There are three important drawbacks to this strategy:
1. Single point of failure
2. Bottleneck when application is write-heavy
3. A second independent database connection is needed to guarantee transaction
safety.
If you can live with this drawbacks then you can use table-generation with the
following code in Doctrine:
.. code-block:: php
<?php
use Doctrine\DBAL\DriverManager;
use Doctrine\DBAL\Id\TableGenerator;
$conn = DriverManager::getConnection(/**..**); // connection 1
// creating the TableGenerator automatically opens a second connection.
$tableGenerator = new TableGenerator($conn, "sequences_tbl_name");
$id1 = $tableGenerator->nextValue("sequence_name1");
$id2 = $tableGenerator->nextValue("sequence_name2");
The table generator obviously needs a table to work. The schema of this table
is described in the ``TableGenerator`` class-docblock. Alternatively you
can use the ``Doctrine\DBAL\Id\TableGeneratorSchemaVisitor`` and apply it to your
``Doctrine\DBAL\Schema\Schema`` instance. It will automatically add the required
sequence table.
Natural Identifiers
~~~~~~~~~~~~~~~~~~~
Sometimes you are lucky and your application data-model comes with a natural
id. This is mostly the case for applications who get their IDs generated
somewhere else (exogeneous ID-generation) or that work with temporal data. In
that case you can just define the natural primary key and shard your
application based on this data.
Transactions
------------
Transactions in sharding can only work for data that is located on a single
shard. If you need transactions in your sharding architecture then you have to
make sure that the data updated during a transaction is located on a single
shard.
Foreign Keys
------------
Since you cannot create foreign keys between remote database servers, in a
sharding architecture you should put the data on a shard that belongs to each
other. But even if you can isolate most of the rows on a single shard there may
exist relations between tables that exist on different shards. In this case
your application should be aware of the potential inconsistencies and handle
them graciously.
Complex Queries
---------------
GROUP BY, DISTINCT and ORDER BY are clauses that cannot be easily used in a
sharding architecture. If you have to execute these queries against multiple
shards then you cannot just append the different results to each other.
You have to be aware of this problem and design your queries accordingly or
shard the data in a way that you never have to query multiple shards to
calculate a result.
ShardManager Interface
----------------------
The central API of the sharding extension is the ``ShardManager`` interface.
It contains two different groups of functions with regard to sharding.
First, it contains the Shard Selection API. You can pick a shard based on a
so-called "distribution-value" or reset the connection to the "global" shard,
a necessary database that often contains heavily cached, sharding independent
data such as meta tables or the "user/tenant" table.
.. code-block:: php
<?php
use Doctrine\DBAL\DriverManager;
use Doctrine\Shards\DBAL\SQLAzure\SQLAzureShardManager;
$conn = DriverManager::getConnection(array(
'sharding' => array(
'federationName' => 'my_database',
'distributionKey' => 'customer_id',
)
));
$shardManager = new SQLAzureShardManager($conn);
$currentCustomerId = 1234;
$shardManager->selectShard($currentCustomerId);
// all queries after this call hit the shard
// where customer with id 1234 is on.
$shardManager->selectGlobal();
// the global database is selected.
To access the currently selected distribution value use the following API
method:
.. code-block:: php
<?php
$value = $shardManager->getCurrentDistributionValue();
The shard manager will prevent you switching shards when a transaction is open.
This is especially important when using sharding with the ORM. Because the ORM
uses a single transaction during the flush-operation this means that you can
only ever use one ``EntityManager`` with data from a single shard.
The second API is the "fan-out" query API. This allows you to execute queries against
ALL shards. The order of the results of this operation is undefined, that means
your query has to return the data in a way that works for the application, or
you have to sort the data in the application.
.. code-block:: php
<?php
$sql = "SELECT * FROM customers";
$rows = $shardManager->queryAll($sql, $params);
Schema Operations: SchemaSynchronizer Interface
-----------------------------------------------
Schema Operations in a sharding architecture are tricky. You have to perform
them on all databases instances (shards) at the same time. Also Doctrine
has problems with this in particular as you cannot generate an SQL file with
changes on any development machine anymore and apply this on production. The
required changes depend on the amount of shards.
To allow the Doctrine Schema API operations on a sharding architecture we
performed a refactored from code inside ORM ``Doctrine\ORM\Tools\SchemaTool``
class and extracted the code for operations on Schema instances into a new
``Doctrine\Shards\DBAL\SchemaSynchronizer`` interface.
Every sharding implementation can implement this interface and allow schema
operations to take part on multiple shards.
SQL Azure Federations
---------------------
Doctrine Shards ships with a custom implementation for Microsoft SQL
Azure. The Azure platform provides a native sharding functionality. In SQL
Azure the sharding functionality is called Federations. This
functionality applies the following restrictions (in line with the ones listed
above):
1. IDENTITY columns are not allowed on sharded tables (federated tables)
2. Each table may only have exactly one clustered index and this index has to
have the distribution key/sharding-id as one column.
3. Every unique index (or primary key) has to contain the
distribution-key/sharding-id.
Especially the requirements 2 and 3 prevent normalized database schemas. You
have to put the distribution key on every sharded table, which can affect your
application code quite a bit. This may lead to the creation of composite keys
where you normally wouldn't need them.
The benefit of SQL Azure Federations is that they implement all the
shard-picking logic on the server. You only have to make use of the ``USE
FEDERATION`` statement. You don't have to maintain a list of all the shards
inside your application and more importantly, resizing shards is done
transparently on the server.
Features of SQL Azure are:
- Central server to log into federations architecture. No need to know all
connection details of all shards.
- Database level operation to split shards, taking away the tediousness of this
operation for application developers.
- A global tablespace that can contain global data to all shards.
- One or many different federations (this library only supports working with
one)
- Sharded or non-sharded tables inside federations
- Allows filtering SELECT queries on the database based on the selected
sharding key value. This allows to implement sharded Multi-Tenant Apps very easily.
To setup an SQL Azure ShardManager use the following code:
.. code-block:: php
<?php
use Doctrine\DBAL\DriverManager;
use Doctrine\Shards\DBAL\SQLAzure\SQLAzureShardManager;
$conn = DriverManager::getConnection(array(
'dbname' => 'my_database',
'host' => 'tcp:dbname.windows.net',
'user' => 'user@dbname',
'password' => 'XXX',
'sharding' => array(
'federationName' => 'my_federation',
'distributionKey' => 'customer_id',
'distributionType' => 'integer',
)
));
$shardManager = new SQLAzureShardManager($conn);
Currently you are limited to one federation in your application.
You can inspect all the currently known shards on SQL Azure using the
``ShardManager#getShards()`` function:
.. code-block:: php
<?php
foreach ($shardManager->getShards() as $shard) {
echo $shard['id'] . " " . $shard['rangeLow'] . " - " . $shard['rangeHigh'];
}
Schema Operations
~~~~~~~~~~~~~~~~~
Schema Operations on SQL Azure Federations are possible with the
``SQLAzureSchemaSynchronizer``. You can instantiate this from your code:
.. code-block:: php
<?php
use Doctrine\Shards\DBAL\SQLAzure\SQLAzureSchemaSynchronizer;
$synchronizer = new SQLAzureSchemaSynchronizer($conn, $shardManager);
You can use the API such as ``createSchema($schema)`` then and it will be
distributed across all shards. The assumptions are:
- Using ``SchemaSynchronizer#createSchema()`` assumes the database is empty.
The federation is created during this operation.
- Using ``SchemaSynchronizer#updateSchema()`` assumes the database and the
federation exists. All shards of the federation are iterated and update is
applied to all shards consecutively.
For a schema with tables in the global or federated sub-schema you have to use
the Schema API to mark tables:
.. code-block:: php
<?php
use Doctrine\DBAL\Schema\Schema;
$schema = new Schema();
// no options set, this table will be on the federation root
$users = $schema->createTable('Users');
//...
// marked as sharded, but no distribution column given:
// non-federated table inside the federation
$products = $schema->createTable('Products');
$products->addOption('azure.federated', true);
//...
// shared + distribution column:
// federated table
$customers = $schema->createTable('Customers');
$customers->addColumn('CustomerID', 'integer');
//...
$customers->addOption('azure.federated', true);
$customers->addOption('azure.federatedOnColumnName', 'CustomerID');
SQLAzure Filtering
~~~~~~~~~~~~~~~~~~
SQL Azure comes with a powerful filtering feature, that allows you to
automatically implement a multi-tenant application for a formerly single-tenant
application. The restriction to make this work is that your application does not work with
IDENTITY columns.
Normally when you select a shard using ``ShardManager#selectShard()`` any query
executed against this shard will return data from ALL the tenants located on
this shard. With the "FILTERING=ON" flag on the ``USE FEDERATION`` query
however SQL Azure can automatically filter all SELECT queries with the chosen
distribution value. Additionally you can automatically set the currently
selected distribution value in every INSERT statement using a function for this
value as the ``DEFAULT`` part of the column. If you are using GUIDs for every
row then UPDATE and DELETE statements using only GUIDs will work out perfectly
as well, as they are by definition for unique rows. This feature allows you to
build multi-tenant applications, even though they were not originally designed
that way.
To enable filtering you can use the
``SQLAzureShardManager#setFilteringEnabled()`` method. This method is not part
of the interface. You can also set a default value for filtering by passing it
as the "sharding.filteringEnabled" parameter to
``DriverManager#getConnection()``.
Generic SQL Sharding Support
----------------------------
Besides the custom SQL Azure support there is a generic implementation that
works with all database drivers. It requires to specify all database
connections and will switch between the different connections under the hood
when using the ``ShardManager`` API. This is also the biggest drawback of this
approach, since fan-out queries need to connect to all databases in a single
request.
See the configuration for a sample sharding connection:
.. code-block:: php
<?php
use Doctrine\DBAL\DriverManager;
$conn = DriverManager::getConnection(array(
'wrapperClass' => 'Doctrine\Shards\DBAL\PoolingShardConnection',
'driver' => 'pdo_sqlite',
'global' => array('memory' => true),
'shards' => array(
array('id' => 1, 'memory' => true),
array('id' => 2, 'memory' => true),
),
'shardChoser' => 'Doctrine\Shards\DBAL\ShardChoser\MultiTenantShardChoser',
));
You have to configure the following options:
- 'wrapperClass' - Selecting the PoolingShardConnection as above.
- 'global' - An array of database parameters that is used for connecting to the
global database.
- 'shards' - An array of of shard database parameters. You have to specifiy an
'id' parameter for each of the shard configurations.
- 'shardChoser' - Implementation of the
``Doctrine\Shards\DBAL\ShardChoser\ShardChoser`` interface.
The Shard Choser interface maps the distribution value to a shard-id. This
gives you the freedom to implement your own strategy for sharding the data
horizontally.
SQLAzure Sharding Tutorial
==========================
This tutorial builds upon the `Brian Swans tutorial
<http://blogs.msdn.com/b/silverlining/archive/2012/01/18/using-sql-azure-federations-via-php.aspx>`_
on SQLAzure Sharding and turns all the examples into examples using the Doctrine Sharding support.
It introduces SQL Azure Sharding, which is an abstraction layer in SQL Azure to
support sharding. Many features for sharding are implemented on the database
level, which makes it much easier to work with than generic sharding
implementations.
For this tutorial you need an Azure account. You don't need to deploy the code
on Azure, you can run it from your own machine against the remote database.
.. note::
You can look at the code from the 'examples/sharding' directory.
Install Doctrine
----------------
For this tutorial we will install Doctrine and the Sharding Extension through
`Composer <http://getcomposer.org>`_ which is the easiest way to install
Doctrine. Composer is a new package manager for PHP. Download the
``composer.phar`` from their website and put it into a newly created folder for
this tutorial. Now create a ``composer.json`` file in this project root with
the following content:
{
"require": {
"doctrine/dbal": "2.2.2",
"doctrine/shards": "0.2"
}
}
Open up the commandline and switch to your tutorial root directory, then call
``php composer.phar install``. It will grab the code and install it into the
``vendor`` subdirectory of your project. It also creates an autoloader, so that
we don't have to care about this.
Setup Connection
----------------
The first thing to start with is setting up Doctrine and the database connection:
.. code-block:: php
<?php
// bootstrap.php
use Doctrine\DBAL\DriverManager;
use Doctrine\Shards\DBAL\SQLAzure\SQLAzureShardManager;
require_once "vendor/autoload.php";
$conn = DriverManager::getConnection(array(
'driver' => 'pdo_sqlsrv',
'dbname' => 'SalesDB',
'host' => 'tcp:dbname.windows.net',
'user' => 'user@dbname',
'password' => 'XXX',
'platform' => new \Doctrine\DBAL\Platforms\SQLAzurePlatform(),
'driverOptions' => array('MultipleActiveResultSets' => false),
'sharding' => array(
'federationName' => 'Orders_Federation',
'distributionKey' => 'CustId',
'distributionType' => 'integer',
)
));
$shardManager = new SQLAzureShardManager($conn);
Create Database
---------------
Create a new database using the Azure/SQL Azure management console.
Create Schema
-------------
Doctrine has a powerful schema API. We don't need to use low-level DDL
statements to generate the database schema. Instead you can use an Object-Oriented API
to create the database schema and then have Doctrine turn it into DDL
statements.
We will recreate Brians example schema with Doctrine DBAL. Instead of having to
create federations and schema seperately as in his example, Doctrine will do it
all in one step:
.. code-block:: php
<?php
// create_schema.php
use Doctrine\DBAL\Schema\Schema;
use Doctrine\Shards\DBAL\SQLAzure\SQLAzureSchemaSynchronizer;
require_once 'bootstrap.php';
$schema = new Schema();
$products = $schema->createTable('Products');
$products->addColumn('ProductID', 'integer');
$products->addColumn('SupplierID', 'integer');
$products->addColumn('ProductName', 'string');
$products->addColumn('Price', 'decimal', array('scale' => 2, 'precision' => 12));
$products->setPrimaryKey(array('ProductID'));
$products->addOption('azure.federated', true);
$customers = $schema->createTable('Customers');
$customers->addColumn('CustomerID', 'integer');
$customers->addColumn('CompanyName', 'string');
$customers->addColumn('FirstName', 'string');
$customers->addColumn('LastName', 'string');
$customers->setPrimaryKey(array('CustomerID'));
$customers->addOption('azure.federated', true);
$customers->addOption('azure.federatedOnColumnName', 'CustomerID');
$orders = $schema->createTable('Orders');
$orders->addColumn('CustomerID', 'integer');
$orders->addColumn('OrderID', 'integer');
$orders->addColumn('OrderDate', 'datetime');
$orders->setPrimaryKey(array('CustomerID', 'OrderID'));
$orders->addOption('azure.federated', true);
$orders->addOption('azure.federatedOnColumnName', 'CustomerID');
$orderItems = $schema->createTable('OrderItems');
$orderItems->addColumn('CustomerID', 'integer');
$orderItems->addColumn('OrderID', 'integer');
$orderItems->addColumn('ProductID', 'integer');
$orderItems->addColumn('Quantity', 'integer');
$orderItems->setPrimaryKey(array('CustomerID', 'OrderID', 'ProductID'));
$orderItems->addOption('azure.federated', true);
$orderItems->addOption('azure.federatedOnColumnName', 'CustomerID');
// Create the Schema + Federation:
$synchronizer = new SQLAzureSchemaSynchronizer($conn, $shardManager);
$synchronizer->createSchema($schema);
// Or jut look at the SQL:
echo implode("\n", $synchronizer->getCreateSchema($schema));
View Federation Members
-----------------------
To see how many shard instances (called Federation Members) your SQLAzure database currently has
you can ask the ``ShardManager`` to enumerate all shards:
.. code-block:: php
<?php
// view_federation_members.php
require_once "bootstrap.php";
$shards = $shardManager->getShards();
foreach ($shards as $shard) {
print_r($shard);
}
Insert Data
-----------
Now we want to insert some test data into the database to see the behavior when
we split the shards. We use the same test data as Brian, but use the Doctrine
API to insert them. To insert data into federated tables we have to select the
shard we want to put the data into. We can use the ShardManager to execute this
operation for us:
.. code-block:: php
<?php
// insert_data.php
require_once "bootstrap.php";
$shardManager->selectShard(0);
$conn->insert("Products", array(
"ProductID" => 386,
"SupplierID" => 1001,
"ProductName" => 'Titanium Extension Bracket Left Hand',
"Price" => 5.25,
));
$conn->insert("Products", array(
"ProductID" => 387,
"SupplierID" => 1001,
"ProductName" => 'Titanium Extension Bracket Right Hand',
"Price" => 5.25,
));
$conn->insert("Products", array(
"ProductID" => 388,
"SupplierID" => 1001,
"ProductName" => 'Fusion Generator Module 5 kV',
"Price" => 10.50,
));
$conn->insert("Products", array(
"ProductID" => 388,
"SupplierID" => 1001,
"ProductName" => 'Bypass Filter 400 MHz Low Pass',
"Price" => 10.50,
));
$conn->insert("Customers", array(
'CustomerID' => 10,
'CompanyName' => 'Van Nuys',
'FirstName' => 'Catherine',
'LastName' => 'Abel',
));
$conn->insert("Customers", array(
'CustomerID' => 20,
'CompanyName' => 'Abercrombie',
'FirstName' => 'Kim',
'LastName' => 'Branch',
));
$conn->insert("Customers", array(
'CustomerID' => 30,
'CompanyName' => 'Contoso',
'FirstName' => 'Frances',
'LastName' => 'Adams',
));
$conn->insert("Customers", array(
'CustomerID' => 40,
'CompanyName' => 'A. Datum Corporation',
'FirstName' => 'Mark',
'LastName' => 'Harrington',
));
$conn->insert("Customers", array(
'CustomerID' => 50,
'CompanyName' => 'Adventure Works',
'FirstName' => 'Keith',
'LastName' => 'Harris',
));
$conn->insert("Customers", array(
'CustomerID' => 60,
'CompanyName' => 'Alpine Ski House',
'FirstName' => 'Wilson',
'LastName' => 'Pais',
));
$conn->insert("Customers", array(
'CustomerID' => 70,
'CompanyName' => 'Baldwin Museum of Science',
'FirstName' => 'Roger',
'LastName' => 'Harui',
));
$conn->insert("Customers", array(
'CustomerID' => 80,
'CompanyName' => 'Blue Yonder Airlines',
'FirstName' => 'Pilar',
'LastName' => 'Pinilla',
));
$conn->insert("Customers", array(
'CustomerID' => 90,
'CompanyName' => 'City Power & Light',
'FirstName' => 'Kari',
'LastName' => 'Hensien',
));
$conn->insert("Customers", array(
'CustomerID' => 100,
'CompanyName' => 'Coho Winery',
'FirstName' => 'Peter',
'LastName' => 'Brehm',
));
$conn->executeUpdate("DECLARE @orderId INT
DECLARE @customerId INT
SET @orderId = 10
SELECT @customerId = CustomerId FROM Customers WHERE LastName = 'Hensien' and FirstName = 'Kari'
INSERT INTO Orders (CustomerId, OrderId, OrderDate)
VALUES (@customerId, @orderId, GetDate())
INSERT INTO OrderItems (CustomerID, OrderID, ProductID, Quantity)
VALUES (@customerId, @orderId, 388, 4)
SET @orderId = 20
SELECT @customerId = CustomerId FROM Customers WHERE LastName = 'Harui' and FirstName = 'Roger'
INSERT INTO Orders (CustomerId, OrderId, OrderDate)
VALUES (@customerId, @orderId, GetDate())
INSERT INTO OrderItems (CustomerID, OrderID, ProductID, Quantity)
VALUES (@customerId, @orderId, 389, 2)
SET @orderId = 30
SELECT @customerId = CustomerId FROM Customers WHERE LastName = 'Brehm' and FirstName = 'Peter'
INSERT INTO Orders (CustomerId, OrderId, OrderDate)
VALUES (@customerId, @orderId, GetDate())
INSERT INTO OrderItems (CustomerID, OrderID, ProductID, Quantity)
VALUES (@customerId, @orderId, 387, 3)
SET @orderId = 40
SELECT @customerId = CustomerId FROM Customers WHERE LastName = 'Pais' and FirstName = 'Wilson'
INSERT INTO Orders (CustomerId, OrderId, OrderDate)
VALUES (@customerId, @orderId, GetDate())
INSERT INTO OrderItems (CustomerID, OrderID, ProductID, Quantity)
VALUES (@customerId, @orderId, 388, 1)"
);
This puts the data into the currently only existing federation member. We
selected that federation member by picking 0 as distribution value, which is by
definition part of the only existing federation.
Split Federation
----------------
Now lets split the federation, creating a second federation member. SQL Azure
will automatically redistribute the data into the two federations after you
executed this command.
.. code-block:: php
<?php
// split_federation.php
require_once 'bootstrap.php';
$shardManager->splitFederation(60);
This little script uses the shard manager with a special method only existing
on the SQL AZure implementation ``splitFederation``. It accepts a value at
at which the split is executed.
If you reexecute the ``view_federation_members.php`` script you can now see
that there are two federation members instead of just one as before. You can
see with the ``rangeLow`` and ``rangeHigh`` parameters what customers and
related entries are now served by which federation.
Inserting Data after Split
--------------------------
Now after we splitted the data we now have to make sure to be connected to the
right federation before inserting data. Lets add a new customer with ID 55 and
have him create an order.
.. code-block:: php
<?php
// insert_data_aftersplit.php
require_once 'bootstrap.php';
$newCustomerId = 55;
$shardManager->selectShard($newCustomerId);
$conn->insert("Customers", array(
"CustomerID" => $newCustomerId,
"CompanyName" => "Microsoft",
"FirstName" => "Brian",
"LastName" => "Swan",
));
$conn->insert("Orders", array(
"CustomerID" => 55,
"OrderID" => 37,
"OrderDate" => date('Y-m-d H:i:s'),
));
$conn->insert("OrderItems", array(
"CustomerID" => 55,
"OrderID" => 37,
"ProductID" => 387,
"Quantity" => 1,
));
As you can see its very important to pick the right distribution key in your
sharded application. Otherwise you have to switch the shards very often, which
is not really easy to work with. If you pick the sharding key right then it
should be possible to select the shard only once per request for the major
number of use-cases.
Fan-out the queries accross multiple shards should only be necessary for a
small number of queries, because these kind of queries are complex.
Querying data with filtering off
--------------------------------
To access the data you have to pick a shard again and then start selecting data
from it.
.. code-block:: php
<?php
// query_filtering_off.php
require_once "bootstrap.php";
$shardManager->selectShard(0);
$data = $conn->fetchAll('SELECT * FROM Customers');
print_r($data);
This returns all customers from the shard with distribution value 0. This will
be all customers with id 10 to less than 60, since we split federations at 60.
Querying data with filtering on
-------------------------------
One special feature of SQL Azure is the possibility to database level filtering
based on the sharding distribution values. This means that SQL Azure will add
WHERE clauses with distributionkey=current distribution value conditions to
each distribution key.
.. code-block:: php
<?php
// query_filtering_on.php
require_once "bootstrap.php";
$shardManager->setFilteringEnabled(true);
$shardManager->selectShard(55);
$data = $conn->fetchAll('SELECT * FROM Customers');
print_r($data);
Now you only get the customer with id = 55. The same holds for queries on the
``Orders`` and ``OrderItems`` table, which are restricted by customer id = 55.
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment