Lookup as Primary Vindex
It is likely that a customer order goes through a life cycle of events. This would best be represented in a separate corder_event
table that will contain a corder_id
column as a foreign key into corder.corder_id
. It would also be beneficial to co-locate the event rows with their associated order.
Just like we shared the hash
vindex between customer
and corder
, we can share corder_keyspace_idx
between corder
and corder_event
. We can also make it the Primary Vindex for corder_event
. When an order is created, the lookup row for it is also created. Subsequently, an insert into corder_event
will request the vindex to compute the keyspace_id
for that corder_id
, and that will succeed because the lookup entry for it already exists. This is where the significance of the owner table comes into play: The owner table creates the entries, whereas other tables only read those entries.
Inserting a corder_event
row without creating a corresponding corder
entry will result in an error. This behavior is in line with the traditional foreign key constraint enforced by relational databases.
Sharing the lookup vindex also has the additional benefit of saving space because we avoid creating separate entries for the new table.
We start with creating the sequence table in the product
keyspace.
Schema:
create table corder_event_seq(id bigint, next_id bigint, cache bigint, primary key(id)) comment 'vitess_sequence';
insert into corder_event_seq(id, next_id, cache) values(0, 1, 3);
VSchema:
"corder_event_seq": { "type": "sequence" }
We then create the corder_event
table in customer
:
create table corder_event(corder_event_id bigint, corder_id bigint, ename varchar(128), primary key(corder_id, corder_event_id));
In the VSchema, there is no need to create a vindex because we are going to reuse the existing one:
"corder_event": {
"column_vindexes": [{
"column": "corder_id",
"name": "corder_keyspace_idx"
}],
"auto_increment": {
"column": "corder_event_id",
"sequence": "product.corder_event_seq"
}
}
Alternate VSchema DDL:
alter vschema add sequence product.corder_event_seq;
alter vschema on customer.corder_event add vindex corder_keyspace_idx(corder_id);
alter vschema on customer.corder_event add auto_increment corder_event_id using product.corder_event_seq;
We can now insert rows in corder_event
against rows in corder
:
mysql> insert into corder(customer_id, product_id, oname) values (1,1,'gift'),(1,2,'gift'),(2,1,'work'),(3,2,'personal'),(4,1,'personal');
Query OK, 5 rows affected (0.04 sec)
mysql> insert into corder_event(corder_id, ename) values(1, 'paid'), (5, 'delivered');
Query OK, 2 rows affected (0.01 sec)
mysql> insert into corder_event(corder_id, ename) values(6, 'expect failure');
ERROR 1105 (HY000): vtgate: http://sougou-lap1:12345/: execInsertSharded: getInsertShardedRoute: could not map [INT64(6)] to a keyspace id
As expected, inserting a row for a non-existent order results in an error.
Reversible Vindexes #
In Vitess, it is insufficient for a table to only have a Lookup Vindex. This is because it is not practical to reshard such a table. The overhead of performing a lookup before redirecting every row event to a new shard would be prohibitively expensive.
To overcome this limitation, we must add a column with a non-lookup vindex, also known as Functional Vindex to the table. By rule, the Primary Vindex computes the keyspace id of the row. This means that the value of the column should also be such that it yields the same keyspace id.
A Reversible Vindex is one that can back-compute the column value from a given keyspace id. If such a vindex is used for this new column, then Vitess will automatically perform this work and fill in the correct value for it. The list of vindex properties, like Functional, Reversible, etc. are listed in the Vindexes Reference.
In other words, adding a column with a vindex that is both Functional and Reversible allows Vitess to auto-fill the values, thereby avoiding any impact to the application logic.
The binary
vindex is one that yields the input value itself as the keyspace_id
, and is naturally reversible. Using this Vindex will generate the keyspace_id
as the column value. The modified schema for the table will be as follows:
create table corder_event(corder_event_id bigint, corder_id bigint, ename varchar(128), keyspace_id varbinary(10), primary key(corder_id, corder_event_id));
We create a vindex instantiation for binary
:
"binary": {
"type": "binary"
}
Modify the table VSchema:
"corder_event": {
"column_vindexes": [{
"column": "corder_id",
"name": "corder_keyspace_idx"
}, {
"column": "keyspace_id",
"name": "binary"
}],
"auto_increment": {
"column": "corder_event_id",
"sequence": "product.corder_event_seq"
}
}
Alternate VSchema DDL:
alter vschema on customer.corder_event add vindex `binary`(keyspace_id) using `binary`;
Note that binary
needs to be backticked because it is a keyword.
After these modifications, we can now observe that the keyspace_id
column is getting automatically populated:
mysql> insert into corder(customer_id, product_id, oname) values (1,1,'gift'),(1,2,'gift'),(2,1,'work'),(3,2,'personal'),(4,1,'personal');
Query OK, 5 rows affected (0.01 sec)
mysql> insert into corder_event(corder_id, ename) values(1, 'paid'), (5, 'delivered');
Query OK, 2 rows affected (0.01 sec)
mysql> select corder_event_id, corder_id, ename, hex(keyspace_id) from corder_event;
+-----------------+-----------+-----------+------------------+
| corder_event_id | corder_id | ename | hex(keyspace_id) |
+-----------------+-----------+-----------+------------------+
| 1 | 1 | paid | 166B40B44ABA4BD6 |
| 2 | 5 | delivered | D2FD8867D50D2DFE |
+-----------------+-----------+-----------+------------------+
2 rows in set (0.00 sec)
There is no support for backfilling the reversible vindex column yet. This will be added soon.
keyspace_id
for all these rows came from customer_id
. Since hash
is also a reversible vindex, reversing the keyspace_id
using hash
will yield the customer_id
. We could instead leverage this knowledge to replace keyspace_id+binary
with customer_id+hash
. Vitess will auto-populate the correct value. Using this approach may be more beneficial because customer_id
is a value the application can understand and make use of.