Friday, February 24, 2012

database design for fast client updates

I'm trying to work out a database design to make it quicker for my client
program to read and display updates to the data set. Currently it reads in
the entire data set again after each change, which was acceptable when the
data set was small but now it's large enough to start causing noticeable
delays. I've come up with a possible solution but am looking for others'
input on its suitability to the problem.

Here is the DDL for one of the tables:

create table epl_packages
(
customer varchar(8) not null, -- \
package_type char not null, -- primary key
package_no int not null, -- /
dimensions varchar(50) not null default(0),
weight_kg int not null,
despatch_id int, -- filled in on despatch
loaded bit not null default(0),
item_count int not null default(0)
)

alter table epl_packages
add constraint pk_epl_packages
primary key (customer, package_type, package_no)

My first thought was to add a datetime column to each table to record the
time of the last change, but that would only work for inserts and updates.
So I figured that a separate table for deletions would make this complete.
DDL would be something like:

create table epl_packages
(
customer varchar(8) not null,
package_type char not null,
package_no int not null,
dimensions varchar(50) not null default(0),
weight_kg int not null,
despatch_id int,
loaded bit not null default(0),
item_count int not null default(0),
last_update_time datetime default(getdate()) -- new column
)

alter table epl_packages
add constraint pk_epl_packages
primary key (customer, package_type, package_no)

create table epl_packages_deletions
(
delete_time datetime,
customer varchar(8) not null,
package_type char not null,
package_no int not null
)

And then these triggers on update and delete (insert is handled automatically
by the default constraint on last_update_time):

create trigger tr_upd_epl_packages
on epl_packages
for update
as
-- check for primary key change
if (columns_updated() & 1792) > 0 -- key columns 1-3 (9-column table, so a 2-byte bitmap: 256+512+1024)
insert epl_packages_deletions
select
getdate(),
customer,
package_type,
package_no
from deleted

update A
set last_update_time = getdate()
from epl_packages A
join inserted B
on A.customer = B.customer and
A.package_type = B.package_type and
A.package_no = B.package_no
go

create trigger tr_del_epl_packages
on epl_packages
for delete
as
insert epl_packages_deletions
select
getdate(),
customer,
package_type,
package_no
from deleted
go

The client program would then do the initial read as follows:

select getdate()

select
customer,
package_type,
package_no,
dimensions,
weight_kg,
despatch_id,
loaded,
item_count
from epl_packages
where
customer = {current customer}
order by
customer,
package_type,
package_no

It would store the output of getdate() for use in subsequent refreshes, which
would read just the changes from the server as follows:

select getdate()

select
customer,
package_type,
package_no,
dimensions,
weight_kg,
despatch_id,
loaded,
item_count
from epl_packages
where
customer = {current customer} and
last_update_time > {output of getdate() from previous read}
order by
customer,
package_type,
package_no

select
customer,
package_type,
package_no
from epl_packages_deletions
where
customer = {current customer} and
delete_time > {output of getdate() from previous read}

The client program will then apply the deletions and the updated/inserted
rows, in that order. This would be done for each table displayed in the
client.
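
If it's easier to see in one place, the two change queries could also be
wrapped in a single stored procedure that returns the deletions first and then
the changed rows, i.e. in the order the client applies them. This is only a
sketch and the proc name is made up:

create procedure get_epl_packages_changes
@customer varchar(8),
@since datetime
as
-- result set 1: rows deleted since the client's last refresh
select
customer,
package_type,
package_no
from epl_packages_deletions
where
customer = @customer and
delete_time > @since

-- result set 2: rows inserted or updated since the client's last refresh
select
customer,
package_type,
package_no,
dimensions,
weight_kg,
despatch_id,
loaded,
item_count
from epl_packages
where
customer = @customer and
last_update_time > @since
order by
customer,
package_type,
package_no
go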

Any critical comments on this approach and any improvements that could
be made would be much appreciated!

T.H.N. wrote:
> I'm trying to work out a database design to make it quicker for my client
> program to read and display updates to the data set. Currently it reads in
> the entire data set again after each change, which was acceptable when the
> data set was small but now it's large enough to start causing noticeable
> delays. I've come up with a possible solution but am looking for others'
> input on its suitability to the problem.

Use stored procedures for your updates, inserts, deletes and selects.
As far as I can see you aren't doing that now. Is there a reason why
not?

If you use procs then you won't need the triggers, you probably won't
need two tables and you'll probably see a performance improvement along
with all the other benefits of procs.
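
For example (just a sketch - the names are illustrative, and whether you keep
the deletions table at all is a separate question), a delete proc can do any
logging itself instead of relying on a trigger:

create procedure delete_epl_package
@customer varchar(8),
@package_type char(1),
@package_no int
as
begin transaction

-- record the deletion for the clients...
insert epl_packages_deletions
(delete_time, customer, package_type, package_no)
values (getdate(), @customer, @package_type, @package_no)

-- ...and remove the row, all in one place
delete epl_packages
where
customer = @customer and
package_type = @package_type and
package_no = @package_no

commit transaction
go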

Thanks for including the DDL. Always tell us what version you are using
as well - it does help.

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/...US,SQL.90).aspx

David Portas wrote:

> Use stored procedures for your updates, inserts, deletes and selects.
> As far as I can see you aren't doing that now. Is there a reason why
> not?
> If you use procs then you won't need the triggers, you probably won't
> need two tables and you'll probably see a performance improvement along
> with all the other benefits of procs.
> Thanks for including the DDL. Always tell us what version you are using
> as well - it does help.

Thanks for your advice. The reason I used triggers for this is that it was
easier to test this approach out without modifying the existing client
program, which uses a mix of stored procs and direct SQL statements -
something I do need to fix!

Regarding performance improvements - my goal is to make the time it takes
to refresh the client's displayed data set proportional to the amount of change
since the last refresh, rather than to the size of the entire set. So I think
I do need an auxiliary table, as I can't see any other way of logging the
deletions.

I'm using SQL Server 2000, sorry for not including that in my original post!

T.H.N. (newsgroup.replies@spam.la) writes:
> I'm trying to work out a database design to make it quicker for my client
> program to read and display updates to the data set. Currently it reads in
> the entire data set again after each change, which was acceptable when the
> data set was small but now it's large enough to start causing noticeable
> delays. I've come up with a possible solution but am looking for others'
> input on its suitability to the problem.

I assume that these updates are not performed by your client program,
but by some other process?

Rather than using a datetime column, you could use a timestamp column.
A timestamp column is automatically updated with a monotonically
increasing value each time a row is inserted or updated. Each value is
unique in the database. Note that the name of the type is misleading.
The value is a binary(8) and has no relation to time.
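
A quick illustration (a throwaway table, just to show the behaviour):

create table ts_demo
(
id int not null primary key,
ts timestamp
)

insert ts_demo (id) values (1)

select id, ts from ts_demo -- ts was filled in automatically
select @@dbts -- the last timestamp value used in the database

update ts_demo set id = 1 where id = 1

select id, ts from ts_demo -- ts now holds a new, larger value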

You would still need that table for deleted rows. (Unless you add a
"deleted" bit to the table; the client program would then actually
perform the deletion once it has consumed the update.)
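
A sketch of that alternative (the column name is just a suggestion):

alter table epl_packages
add is_deleted bit not null default(0)

-- the writer marks the row instead of deleting it; the row's timestamp
-- changes, so clients pick it up on their next incremental read
update epl_packages
set is_deleted = 1
where
customer = {current customer} and
package_type = {package type} and
package_no = {package no}

-- a client that has consumed the change then removes the row for real
delete epl_packages
where
customer = {current customer} and
package_type = {package type} and
package_no = {package no} and
is_deleted = 1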

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/pr...oads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodin...ions/books.mspx

Erland Sommarskog wrote:
> I assume that these updates are not performed by your client program,
> but by some other process?
> Rather than using a datetime column, you could use a timestamp column.
> A timestamp column is automatically updated with a monotonically
> increasing value each time a row is inserted or updated. Each value is
> unique in the database. Note that the name of the type is misleading.
> The value is a binary(8) and has to relation to time.
> You would still need that table for deleted rows. (Unless you add a
> "deleted" bit to the table; the client program would then actually
> perform the deletion once it has consumed the update.

Yes, sort of - there are several client programs all operating on the same
data set.

Thanks for pointing me to the timestamp data type, it's much better for this!
As well as being unique per change, I find it preferable to datetime because it's
independent of the system clock. My DDL now looks like this:

create table epl_packages
(
customer varchar(8) not null,
package_type char not null,
package_no int not null,
dimensions varchar(50) not null default(0),
weight_kg int not null,
despatch_id int,
loaded bit not null default(0),
item_count int not null default(0),
ts_last_update timestamp
)

alter table epl_packages
add constraint pk_epl_packages
primary key (customer, package_type, package_no)

create table epl_packages_deletions
(
ts_delete timestamp primary key, -- generated on insert. ok for PK as
-- there will be no updates to this table
customer varchar(8) not null,
package_type char not null,
package_no int not null
)

The triggers have less to do:

create trigger tr_del_epl_packages
on epl_packages
for delete
as
insert epl_packages_deletions
(customer, package_type, package_no)
select
customer,
package_type,
package_no
from deleted
go

create trigger tr_upd_epl_packages
on epl_packages
for update
as
-- check for primary key change
if (columns_updated() & 1792) > 0 -- key columns 1-3 (9-column table, so a 2-byte bitmap: 256+512+1024)
insert epl_packages_deletions
(customer, package_type, package_no)
select
customer,
package_type,
package_no
from deleted
go

And the client program uses "select @@dbts" prior to reading the data set or
changes to it to determine the point at which its displayed data was current.
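
Roughly like this, mirroring the getdate() version above:

select @@dbts

select
customer,
package_type,
package_no,
dimensions,
weight_kg,
despatch_id,
loaded,
item_count
from epl_packages
where
customer = {current customer} and
ts_last_update > {output of @@dbts from previous read}
order by
customer,
package_type,
package_no

select
customer,
package_type,
package_no
from epl_packages_deletions
where
customer = {current customer} and
ts_delete > {output of @@dbts from previous read}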
