Monday, March 19, 2012

Key/Index Questions

I have some simple questions about database design in MSSQL2005. My background
is primarily writing code, but I do understand databases, however I'm (fairly)
new to MSSQL.
My database is to be a single primary table (we`ll call it P) with other tables
(say, about 10-15) being related like this: P <-->> X
Every table has an Identity column which is also the defined Primary Key for
that table.
It's possible that Table P could have up to a million rows, and each Table X
could have up to between 3-10 rows each related to a single row in Table P.
It's a simple database really, but could get large in scope.
My question has to do with Keys and Indexes on these tables, and I guess I'm
having difficulty (in MSSQL terms) understanding what is a "rule" of the
database and what is a "part" of the database. I'm confused when declaring a
KEY (CONSTRAINT) - does that actually creates a "key/index"?
It seems when I declare a Primary Key constraint, then an Index gets created
automatically. But I'm not sure about declaring a Foreign Key on a child table
- does that create an Index or not?
I also want to design it best for performance too<g>. Each and every row will
be INSERTed and/or UPDATEd individually and very few (if any) DELETES will take
place.
Do I absolutely need to specify a FOREIGN KEY on every child table? My
understanding is that if I do have a FOREGN KEY, then any UPDATE on a child row
will cause a referential check to be made on the parent...maybe causing a
performance issue if the tables were large. Will the database perform well, evn
if I do not specify Foreign Keys?
Whether or not, I specify FOREIGN KEYS on tables X, do I need INDEXes on the
columns that relate to the parent table P? I am pretty sure I do, especially if
I execute a SELECT statement like this:
SELECT columns FROM TableX where FK_Column = 'value'
Thanks BrianBrian, see inline
Brian Staff wrote:
> I have some simple questions about database design in MSSQL2005. My background
> is primarily writing code, but I do understand databases, however I'm (fairly)
> new to MSSQL.
> My database is to be a single primary table (we`ll call it P) with other tables
> (say, about 10-15) being related like this: P <-->> X
> Every table has an Identity column which is also the defined Primary Key for
> that table.
Yuck. Different people have different opinions about this. My opinion is
that an Identity column should not be added out of convention. IMO, it
should be added if no other (proper) key is available, which is highly
unlikely to be the case for all your 10-15 tables...
> It's possible that Table P could have up to a million rows, and each Table X
> could have up to between 3-10 rows each related to a single row in Table P.
> It's a simple database really, but could get large in scope.
> My question has to do with Keys and Indexes on these tables, and I guess I'm
> having difficulty (in MSSQL terms) understanding what is a "rule" of the
> database and what is a "part" of the database. I'm confused when declaring a
> KEY (CONSTRAINT) - does that actually creates a "key/index"?
A "rule" is a type of generic check constraint that can be tagged to any
table. Rules are deprecated and in my experience nobody uses them
(because they are a hassle). Best to forget about them.
I have no idea what "part" refers to. I don't think there is an MSSQL
concept called "part".
> It seems when I declare a Primary Key constraint, then an Index gets created
> automatically. But I'm not sure about declaring a Foreign Key on a child table
> - does that create an Index or not?
Only a Primary Key constraint and a Unique constraint will automatically
create a corresponding (unique) index. If you want your Foreign Key
indexed, then you have to add such an index yourself.
> I also want to design it best for performance too<g>. Each and every row will
> be INSERTed and/or UPDATEd individually and very few (if any) DELETES will take
> place.
> Do I absolutely need to specify a FOREIGN KEY on every child table?
I don't know. If you want to ensure data integrity, then you need it. If
you don't care, or think all data modifications will be done through
your application, and you handle the data integrity in your application,
then you can choose to omit the foreign key constraint - at your own
risk.
> My
> understanding is that if I do have a FOREGN KEY, then any UPDATE on a child row
> will cause a referential check to be made on the parent...
Correct.
> maybe causing a performance issue if the tables were large.
Possible, but unlikely. Although you will see more reads (and therefore
potential I/O) and more locking/blocking.
> Will the database perform well, evn if I do not specify Foreign Keys?
Foreign Keys are not primarily for performance. In most cases they don't
help SELECT performance, but they never hinder either. They mostly
affect the performance of data modifications (negatively).
> Whether or not, I specify FOREIGN KEYS on tables X, do I need INDEXes on the
> columns that relate to the parent table P? I am pretty sure I do, especially if
> I execute a SELECT statement like this:
> SELECT columns FROM TableX where FK_Column = 'value'
If this FK_Column is not the first column of the Primary Key, then such
a query would definitely benefit from an index.
Also, such an index helps if you have the Foreign Key in place, and you
delete a row from the referenced table (in your case table P).
HTH,
Gert-Jan
> Thanks Brian
>|||Nice reply Gert-Jan, after reading yours, I deleted mine.
One note: DOUBLE & TRIPLE YUCK on making all PK's IDENTITY.
Besides what Gert-Jan said (which I agree with), if you ever decide to use
Distributed Partition Views across multiple nodes of a cluster, you won't be
able to select all relevant data in a select from a single server - and
performance will suck.
The only time I even consider IDENTITY for a PK on something other than a
parent table is when I'm getting into many part compound PK's - and even
then I look real hard for something better.
Jay
PS. Sorry, pet peeve.
"Gert-Jan Strik" <sorry@.toomuchspamalready.nl> wrote in message
news:46C9ECDD.A2606551@.toomuchspamalready.nl...
> Brian, see inline
> Brian Staff wrote:
>> I have some simple questions about database design in MSSQL2005. My
>> background
>> is primarily writing code, but I do understand databases, however I'm
>> (fairly)
>> new to MSSQL.
>> My database is to be a single primary table (we`ll call it P) with other
>> tables
>> (say, about 10-15) being related like this: P <-->> X
>> Every table has an Identity column which is also the defined Primary Key
>> for
>> that table.
> Yuck. Different people have different opinions about this. My opinion is
> that an Identity column should not be added out of convention. IMO, it
> should be added if no other (proper) key is available, which is highly
> unlikely to be the case for all your 10-15 tables...
>> It's possible that Table P could have up to a million rows, and each
>> Table X
>> could have up to between 3-10 rows each related to a single row in Table
>> P.
>> It's a simple database really, but could get large in scope.
>> My question has to do with Keys and Indexes on these tables, and I guess
>> I'm
>> having difficulty (in MSSQL terms) understanding what is a "rule" of the
>> database and what is a "part" of the database. I'm confused when
>> declaring a
>> KEY (CONSTRAINT) - does that actually creates a "key/index"?
> A "rule" is a type of generic check constraint that can be tagged to any
> table. Rules are deprecated and in my experience nobody uses them
> (because they are a hassle). Best to forget about them.
> I have no idea what "part" refers to. I don't think there is an MSSQL
> concept called "part".
>> It seems when I declare a Primary Key constraint, then an Index gets
>> created
>> automatically. But I'm not sure about declaring a Foreign Key on a child
>> table
>> - does that create an Index or not?
> Only a Primary Key constraint and a Unique constraint will automatically
> create a corresponding (unique) index. If you want your Foreign Key
> indexed, then you have to add such an index yourself.
>> I also want to design it best for performance too<g>. Each and every row
>> will
>> be INSERTed and/or UPDATEd individually and very few (if any) DELETES
>> will take
>> place.
>> Do I absolutely need to specify a FOREIGN KEY on every child table?
> I don't know. If you want to ensure data integrity, then you need it. If
> you don't care, or think all data modifications will be done through
> your application, and you handle the data integrity in your application,
> then you can choose to omit the foreign key constraint - at your own
> risk.
>> My
>> understanding is that if I do have a FOREGN KEY, then any UPDATE on a
>> child row
>> will cause a referential check to be made on the parent...
> Correct.
>> maybe causing a performance issue if the tables were large.
> Possible, but unlikely. Although you will see more reads (and therefore
> potential I/O) and more locking/blocking.
>> Will the database perform well, evn if I do not specify Foreign Keys?
> Foreign Keys are not primarily for performance. In most cases they don't
> help SELECT performance, but they never hinder either. They mostly
> affect the performance of data modifications (negatively).
>> Whether or not, I specify FOREIGN KEYS on tables X, do I need INDEXes on
>> the
>> columns that relate to the parent table P? I am pretty sure I do,
>> especially if
>> I execute a SELECT statement like this:
>> SELECT columns FROM TableX where FK_Column = 'value'
> If this FK_Column is not the first column of the Primary Key, then such
> a query would definitely benefit from an index.
> Also, such an index helps if you have the Foreign Key in place, and you
> delete a row from the referenced table (in your case table P).
> HTH,
> Gert-Jan
>> Thanks Brian|||> Yuck. Different people have different opinions about this. My opinion is
> that an Identity column should not be added out of convention. IMO, it
> should be added if no other (proper) key is available, which is highly
> unlikely to be the case for all your 10-15 tables...
Hmmm! I see your YUCK was "raised"...twice by Jay
That's taken the wind out of my sails. I thought I was doing well in my design
especially with that part<g>. My theory on Primary Keys is that they should be
preferably be one column and numeric, since searching by alpha and/or multiple
columns would undoubtedly be slower.
Obviously a PK needs to be unique, so I'll re-examine all of the child tables
again, but I have to say that advice does not sit well with my understanding of how
databases work. One question...why is having the identity column be the PK such a
bad idea? - apart from just "yuck!"
BTW - thanks on all of the other advice - that helps a lot.
Brian|||Brian Staff wrote:
> > Yuck. Different people have different opinions about this. My opinion is
> > that an Identity column should not be added out of convention. IMO, it
> > should be added if no other (proper) key is available, which is highly
> > unlikely to be the case for all your 10-15 tables...
> Hmmm! I see your YUCK was "raised"...twice by Jay
> That's taken the wind out of my sails. I thought I was doing well in my design
> especially with that part<g>. My theory on Primary Keys is that they should be
> preferably be one column and numeric, since searching by alpha and/or multiple
> columns would undoubtedly be slower.
> Obviously a PK needs to be unique, so I'll re-examine all of the child tables
> again, but I have to say that advice does not sit well with my understanding of how
> databases work. One question...why is having the identity column be the PK such a
> bad idea? - apart from just "yuck!"
> BTW - thanks on all of the other advice - that helps a lot.
> Brian
There are reasons why many (including me) prefer a natural key over a
surrogate key. IMO, an Identity is a good choice for a surrogate key. If
you google natural vs surrogate key you will probably find a whole lot
of information on that debate.
So I am not saying that having an Identity column as the Primary Key is
not bad per se. However, a table with a foreign key very often has the
meaning of a relation table. If you take the classic example of the
table that describes which book was written by which author. This table
would have a foreign key to the Authors table, and a foreign key to the
Books table. You want the combination of author-book to be unique, so
what simpler choice that to make (author, book) the primary key of this
table? No need to add a surrogate key, very useful index on the primary
key, no need to add another constraint to make the combination
author-book unique, etc.
Now you might argue that in the example above, the primary key of the
BookAuthors table might have been an Identity column and that that key
is narrower and therefore better for performance. But in practice that
is not the case. Because you would join to this table most of the time
(or rather: almost all of the time), and the join requires access to
either/both column author_id and/or book_id. In this example, the
Identity primary key would only perform better if you were to query the
exact Primary Key value.
So if you are going to join a lot of your 10-15 tables to your table P,
then the choice of your primary key and/or the indexes on your foreign
keys is very important.
Gert-Jan|||Gert-Jan,
Thanks for your explanation. I will look carefully at my table data and
re-evaluate the PK choice.
Brian|||> There are reasons why many (including me) prefer a natural key over a
> surrogate key. IMO, an Identity is a good choice for a surrogate key. If
> you google natural vs surrogate key you will probably find a whole lot
> of information on that debate.
> So I am not saying that having an Identity column as the Primary Key is
> not bad per se. However, a table with a foreign key very often has the
> meaning of a relation table. If you take the classic example of the
> table that describes which book was written by which author. This table
> would have a foreign key to the Authors table, and a foreign key to the
> Books table. You want the combination of author-book to be unique, so
> what simpler choice that to make (author, book) the primary key of this
> table? No need to add a surrogate key, very useful index on the primary
> key, no need to add another constraint to make the combination
> author-book unique, etc.
> Now you might argue that in the example above, the primary key of the
> BookAuthors table might have been an Identity column and that that key
> is narrower and therefore better for performance. But in practice that
> is not the case. Because you would join to this table most of the time
> (or rather: almost all of the time), and the join requires access to
> either/both column author_id and/or book_id. In this example, the
> Identity primary key would only perform better if you were to query the
> exact Primary Key value.
> So if you are going to join a lot of your 10-15 tables to your table P,
> then the choice of your primary key and/or the indexes on your foreign
> keys is very important.
> Gert-Jan
My primary objection is that an IDENTITY PK usually does not describe the
data in the column, that function is taken up by indexes and non-identifying
FK's. In Gert-Jan's example a row in BookAuthors is described by the
combination of book_id and author_id, not some arbitrary IDENTITY column.
This BTW, is in large part, the definition of 3NF.
For a good description on database normilaztion:
http://www.datamodel.org/NormalizationRules.html
Jay|||This model that I was working on, as well as another one that I work
on daily, are the reasons that I have come to be wary of the surrogate
key.
create table dbo.items (
item_num varchar(16) not null,
item_desc varchar(32) not null,
constraint pk_items
primary key(item_num),
constraint u_nc_item_desc
unique(item_desc))
create table dbo.customers (
cust_name varchar(32) not null,
constraint pk_customers
primary key(cust_name))
create table dbo.cust_items (
cust_item_num varchar(16) not null,
cust_name varchar(32) not null,
constraint pk_cust_items
primary key(cust_item_num, cust_name),
constraint fk_cust_items_customers
foreign key(cust_name)
references customers(cust_name)
on update cascade)
create table dbo.cust_item_cross_ref (
cust_item_num varchar(16) not null,
cust_name varchar(32) not null,
item_num varchar(16) not null,
constraint pk_cust_item_cross_ref
primary key(cust_name, cust_item_num),
constraint u_nc_item_customer
unique(cust_name, item_num),
constraint fk_cust_item_cross_ref_items
foreign key(item_num)
references dbo.items(item_num)
on update cascade,
constraint fk_cust_item_cross_ref_cust_items
foreign key(cust_item_num, cust_name)
references dbo.cust_items(cust_item_num, cust_name)
on update cascade)
The cust_item_cross_ref table was the source of my difficulty.
The business rules surrounding this table are simply this:
1. Each of our items can be referenced by a customer once.
- covered by u_nc_item_customer
2. Each customer_item can be referenced once.
- covered by pk_cust_item_cross_ref
3. Each item can be referenced to multiple customer_items
If I had used a surrogate key in the cust_items table then I would
have a very interesting time in trying to enforce my business rules at
the cross_ref table. Example below:
create table dbo.cust_items_id (
cust_item_id int identity(1,1),
cust_item_num varchar(16) not null,
cust_name varchar(32) not null,
constraint pk_cust_items
primary key (cust_item_id),
constraint nk_cust_items
primary key(cust_item_num, cust_name),
constraint fk_cust_items_customers
foreign key(cust_name)
references dbo.customers(cust_name)
on update cascade)
create table dbo.cust_item_cross_ref_id (
cust_item_id int not null,
item_num varchar(16) not null,
constraint pk_cust_item_cross_ref
primary key(cust_item_id),
constraint fk_cust_item_cross_ref_items
foreign key(item_num)
references dbo.items(item_num)
on update cascade,
constraint fk_cust_item_cross_ref_cust_items
foreign key(cust_item_id)
references dbo.cust_items(cust_item_id)
on update cascade)
This does not allow me to enforce my second requirement. See the
sample data.
insert into items (item_num, item_desc) values ('Item A', 'Item A')
insert into items (item_num, item_desc) values ('Item B', 'Item B')
insert into customers (cust_name) values ('Cust A')
insert into cust_items (cust_item_num, cust_name) values ('AA', 'Cust
A')
insert into cust_items (cust_item_num, cust_name) values ('AB', 'Cust
A')
insert into cust_items_id (cust_item_num, cust_name) values ('AA',
'Cust A') -- id = 1
insert into cust_items_id (cust_item_num, cust_name) values ('AB',
'Cust A') -- id = 2
In this example I have two items and one customer who has two items.
Both methods prevent me from assigning a customer item to two of my
items. That rule is well enforced. However, in the surrogate key
method, I cannot enforce the other rule. I can freely assign 'Item A'
to cust_id = 1 and to cust_id = 2.
The method that I came up with to solve the problem was either a
trigger (bad...) or to create an indexed view.
create view dbo.bandaid_view with schemabinding
as
select
ci.cust_name,
cr.item_num
from
dbo.cust_item_cross_ref_id as cr
inner join dbo.cust_items_id as ci
on cr.cust_item_id = ci.cust_item_id
go
create unique clustered index u_bandaid on dbo.bandaid_view(cust_name,
item_num)
That solution worked, and would could be used in future. Since I'm
not sold on the performance issue of natural keys vs surrogate keys,
I'll have to stick with the natural keys for now. That may change in
the future, but it will stay this way for now.
Cheers,
Jason Lepack
On Aug 20, 9:34 pm, "JayKon" <s...@.nospam.org> wrote:
> > There are reasons why many (including me) prefer a natural key over a
> > surrogate key. IMO, an Identity is a good choice for a surrogate key. If
> > you google natural vs surrogate key you will probably find a whole lot
> > of information on that debate.
> > So I am not saying that having an Identity column as the Primary Key is
> > not bad per se. However, a table with a foreign key very often has the
> > meaning of a relation table. If you take the classic example of the
> > table that describes which book was written by which author. This table
> > would have a foreign key to the Authors table, and a foreign key to the
> > Books table. You want the combination of author-book to be unique, so
> > what simpler choice that to make (author, book) the primary key of this
> > table? No need to add a surrogate key, very useful index on the primary
> > key, no need to add another constraint to make the combination
> > author-book unique, etc.
> > Now you might argue that in the example above, the primary key of the
> > BookAuthors table might have been an Identity column and that that key
> > is narrower and therefore better for performance. But in practice that
> > is not the case. Because you would join to this table most of the time
> > (or rather: almost all of the time), and the join requires access to
> > either/both column author_id and/or book_id. In this example, the
> > Identity primary key would only perform better if you were to query the
> > exact Primary Key value.
> > So if you are going to join a lot of your 10-15 tables to your table P,
> > then the choice of your primary key and/or the indexes on your foreign
> > keys is very important.
> > Gert-Jan
> My primary objection is that an IDENTITY PK usually does not describe the
> data in the column, that function is taken up by indexes and non-identifying
> FK's. In Gert-Jan's example a row in BookAuthors is described by the
> combination of book_id and author_id, not some arbitrary IDENTITY column.
> This BTW, is in large part, the definition of 3NF.
> For a good description on database normilaztion:
> http://www.datamodel.org/NormalizationRules.html
> Jay- Hide quoted text -
> - Show quoted text -

No comments:

Post a Comment