Saturday, February 25, 2012

database design question

Database design is a lot harder than I thought. I'm going through a few
books, but I can't figure this one out...
Imagine a database to track employee timecards. We'd have information on
employees, job descriptions, hours worked, and things like that. Pretty
basic stuff.
Now imagine, for example, a dotcom which wants to store timecard infomation
for thousands of companies. The table with the employee ID, company name,
and hours worked would get huge since there'd be just one table with
information about all employees, regardless of their company. This seems
like a problem.
Is the solution to partition the table so that rows pertaining to each
company are stored on the same server? Or, horizontally partition the huge
table so that a year's worth of timecard info is split into 12 partitions,
one for each month? Or ...?
Or, how about making a new database for each company which subscribes to the
service? SS2000 can support 32,000+ databases, but the overhead for running
thousands of databases simultaneously must be huge.
any thoughts?
Josh> Now imagine, for example, a dotcom which wants to store timecard
infomation
> for thousands of companies. The table with the employee ID, company name,
> and hours worked would get huge since there'd be just one table with
> information about all employees, regardless of their company. This seems
> like a problem.
Not really. Any table that is properly designed and indexed will work fine
with tens of millions of rows.
You might however want to create a separate database for each customer for
security reasons (timecard information is a sensitive issue, just ask
British Airways), and that might solve some potential performance problems.
I don't think the overhead for running multiple databases on one server is
very noticable.
Jacco Schalkwijk MCDBA, MCSD, MCSE
Database Administrator
Eurostop Ltd.
"Josh Meyer" <jmeyer@.msg.ucsf.edu> wrote in message
news:#LAiZfNVDHA.2368@.TK2MSFTNGP09.phx.gbl...
> Database design is a lot harder than I thought. I'm going through a few
> books, but I can't figure this one out...
> Imagine a database to track employee timecards. We'd have information on
> employees, job descriptions, hours worked, and things like that. Pretty
> basic stuff.
> Now imagine, for example, a dotcom which wants to store timecard
infomation
> for thousands of companies. The table with the employee ID, company name,
> and hours worked would get huge since there'd be just one table with
> information about all employees, regardless of their company. This seems
> like a problem.
> Is the solution to partition the table so that rows pertaining to each
> company are stored on the same server? Or, horizontally partition the
huge
> table so that a year's worth of timecard info is split into 12 partitions,
> one for each month? Or ...?
> Or, how about making a new database for each company which subscribes to
the
> service? SS2000 can support 32,000+ databases, but the overhead for
running
> thousands of databases simultaneously must be huge.
> any thoughts?
> Josh
>|||I would imagine the the "huge" table would consist in the main of integer
datatypes (FKs back top the main tables). This should keep the table Size,
physical down.
Indexing is Key. Choose your clustered index carefully. Michelle Poolet
had a good article in SQL Mag a while back about it
http://www.sqlmag.com/Articles/Index.cfm?ArticleID=27334
Before Andrew gives you a huge KB list of resources <grin> here is something
you will also need to consider
Disk Subsystem. Make it a fast RAID array (0+1)
Put logs on a another array
TempDB on Another
Database design plays a large part in application efficiency. It is good
you are reading up on things before just throwing something at a database
server.
--
--
Allan Mitchell (Microsoft SQL Server MVP)
MCSE,MCDBA
www.SQLDTS.com
I support PASS - the definitive, global community
for SQL Server professionals - http://www.sqlpass.org
"Josh Meyer" <jmeyer@.msg.ucsf.edu> wrote in message
news:#LAiZfNVDHA.2368@.TK2MSFTNGP09.phx.gbl...
> Database design is a lot harder than I thought. I'm going through a few
> books, but I can't figure this one out...
> Imagine a database to track employee timecards. We'd have information on
> employees, job descriptions, hours worked, and things like that. Pretty
> basic stuff.
> Now imagine, for example, a dotcom which wants to store timecard
infomation
> for thousands of companies. The table with the employee ID, company name,
> and hours worked would get huge since there'd be just one table with
> information about all employees, regardless of their company. This seems
> like a problem.
> Is the solution to partition the table so that rows pertaining to each
> company are stored on the same server? Or, horizontally partition the
huge
> table so that a year's worth of timecard info is split into 12 partitions,
> one for each month? Or ...?
> Or, how about making a new database for each company which subscribes to
the
> service? SS2000 can support 32,000+ databases, but the overhead for
running
> thousands of databases simultaneously must be huge.
> any thoughts?
> Josh
>|||Jacco's post deserves to be co-signed. Definitely
consider the security of separate databases. The
additional overhead should be neglegible, but the security
benefits are potentially huge.
Matthew Bando
BandoM@.CSCTechnologies.com
>--Original Message--
>> Now imagine, for example, a dotcom which wants to store
timecard
>infomation
>> for thousands of companies. The table with the
employee ID, company name,
>> and hours worked would get huge since there'd be just
one table with
>> information about all employees, regardless of their
company. This seems
>> like a problem.
>Not really. Any table that is properly designed and
indexed will work fine
>with tens of millions of rows.
>You might however want to create a separate database for
each customer for
>security reasons (timecard information is a sensitive
issue, just ask
>British Airways), and that might solve some potential
performance problems.
>I don't think the overhead for running multiple databases
on one server is
>very noticable.
>
>--
>Jacco Schalkwijk MCDBA, MCSD, MCSE
>Database Administrator
>Eurostop Ltd.
>
>"Josh Meyer" <jmeyer@.msg.ucsf.edu> wrote in message
>news:#LAiZfNVDHA.2368@.TK2MSFTNGP09.phx.gbl...
>> Database design is a lot harder than I thought. I'm
going through a few
>> books, but I can't figure this one out...
>> Imagine a database to track employee timecards. We'd
have information on
>> employees, job descriptions, hours worked, and things
like that. Pretty
>> basic stuff.
>> Now imagine, for example, a dotcom which wants to store
timecard
>infomation
>> for thousands of companies. The table with the
employee ID, company name,
>> and hours worked would get huge since there'd be just
one table with
>> information about all employees, regardless of their
company. This seems
>> like a problem.
>> Is the solution to partition the table so that rows
pertaining to each
>> company are stored on the same server? Or,
horizontally partition the
>huge
>> table so that a year's worth of timecard info is split
into 12 partitions,
>> one for each month? Or ...?
>> Or, how about making a new database for each company
which subscribes to
>the
>> service? SS2000 can support 32,000+ databases, but the
overhead for
>running
>> thousands of databases simultaneously must be huge.
>> any thoughts?
>> Josh
>>
>
>.
>|||> Now imagine, for example, a dotcom which wants to store timecard
infomation
> for thousands of companies. The table with the employee ID, company
name,
> and hours worked would get huge since there'd be just one table with
> information about all employees, regardless of their company. This
seems
> like a problem.
1000000 employees x 250 days = 250000000 rows/year. This is not large
by today's standards. Assuming you store about 100 bytes per row,
you'll only have 25GB. Horizontal partitioning is usually employed to
facilitate scale-out and administration, not performance on a single
server. I suggest that you avoid addressing performance problems in
your database design unless you are certain you will have a problem.
--
Hope this helps.
Dan Guzman
SQL Server MVP
--
SQL FAQ links (courtesy Neil Pike):
http://www.ntfaq.com/Articles/Index.cfm?DepartmentID=800
http://www.sqlserverfaq.com
http://www.mssqlserver.com/faq
--
"Josh Meyer" <jmeyer@.msg.ucsf.edu> wrote in message
news:%23LAiZfNVDHA.2368@.TK2MSFTNGP09.phx.gbl...
> Database design is a lot harder than I thought. I'm going through a
few
> books, but I can't figure this one out...
> Imagine a database to track employee timecards. We'd have information
on
> employees, job descriptions, hours worked, and things like that.
Pretty
> basic stuff.
> Now imagine, for example, a dotcom which wants to store timecard
infomation
> for thousands of companies. The table with the employee ID, company
name,
> and hours worked would get huge since there'd be just one table with
> information about all employees, regardless of their company. This
seems
> like a problem.
> Is the solution to partition the table so that rows pertaining to each
> company are stored on the same server? Or, horizontally partition the
huge
> table so that a year's worth of timecard info is split into 12
partitions,
> one for each month? Or ...?
> Or, how about making a new database for each company which subscribes
to the
> service? SS2000 can support 32,000+ databases, but the overhead for
running
> thousands of databases simultaneously must be huge.
> any thoughts?
> Josh
>|||"Dan Guzman" <danguzman@.nospam-earthlink.net> wrote in message
news:%23vS9U2QVDHA.2340@.TK2MSFTNGP10.phx.gbl...
> 1000000 employees x 250 days = 250000000 rows/year. This is not large
> by today's standards. Assuming you store about 100 bytes per row,
> you'll only have 25GB. Horizontal partitioning is usually employed to
> facilitate scale-out and administration, not performance on a single
> server.
Yup, I was thinking I'd have to scale-out in the future thinking that
thousands of HR departments would hit the server simultaneously to look up
timecard information for their employees. As the load on all federated
servers, I could just add another machine and partition things a little
more.
250 million rows a year might not be very large, but how about 10,000
clients hitting the database at the same time. am i correct in assuming i'd
need several servers to support that?
Josh

No comments:

Post a Comment