Wednesday, March 7, 2012

Database Design Theory

I have been tasked with creating a Data Warehouse.

Problem is that old storage vs reporting debate.

I have determined that the data that I will recieve and store will be like follows (simplified form) for expandability

KEY FldKEy FldData DateTime AuditTrail

Daily I will use this data based on use input process this data into the following format and say
if fldkey/ flddata open a cycle.
populate row with null close date
if fldkey/ flddata closes cycle
update row with date

If fldkey/ flddata changes a cutable value
update row

if fldkey/ flddata changes a cutable value (type 2 table)
insert a row into detail update value and obsolete previous row.

KEY DateStart DateEnd FLDDATA1 FLDDATE2 Op_Cl_IND HEADER Record

KEY EFFdate OBSDATE FLDdata3 FLDData4 Detail Records
KEY EFFdate OBSDATE FLDdata3 FLDData4
KEY EFFdate OBSDATE FLDdata3 FLDData4

Problem: FLDKey is a finite count however the max is undefined.

IS there any way to solve the problem of not being able to nail down users to tell you what they want to cut by. What I have been instructed by mgr (old IDMS) is that they wish to see all on the FldData and have the ability to cut by all of it. However the Flddata could be anything (cannot be indexed).

400,000,000 rows at least.

Do I need to nail the users down or am I am missing something.

Sorry if so cryptic

:(Too cryptic, and too much jargon. (Cut?, Cycle?)

But this sure doesn't look like a data warehouse project to me.|||Thought so.

User requirement store a undetermined amount of data about an item

Cannot create a structure with
Key Make Model Year Cost Sold Junked
1 Ford Taurus 97 10000 1/1/98 2/28/06
2 Ford Ranger 97 20000 1/12/99 null

Need to create
Key Fieldname data
1 Make Ford
1 Model Taurus
1 Year 97
1 Cost 10000
1 Sold 1/1/98
1 Junked 2/28/06

As the Number of fields captured is not fixed.

Now I need to sum the that in the data field by a cut of that data.
Total Cost of All 1997 Ford Cars and grouped by the age in months of the auto.

Work of a small table to do this however 400 m rows of data over 18 months Tring to find the data 1997 and Ford in the data field then determining the total dollars of those

Well.

Question is there some thing that I am missing or do I need to get the user to define a more static structure. And transform this into a star schema.

The table structure to store the data (Audit trail)

perhaps datamart is a more appropriate term.|||So you're trying to do the logical vertical table thingee...

Why?

There are sooo many bad reasons for doing this...

The concept is that you want to be able to add new "columns" on the fly

I'll tell you this, if you do go this route, you better make sure to use a partitioned view where the tables are in their own file groups and distributed across many physical disks...

I wonder what you would partition on though|||That is my thought that this a bad idea.

I need to convince my MGR and Users to give me more Defined specs.

Was just asking if there is something that I am missing.

The Goal to the Audit Trail is to capture every poteitial detail of a thing. And to get that info If you have that things ID you can do this. However any and look at every detail about this. the other reason for the data structure of the audit trail is scalability of data storage.

If the users can define the key of a thing and details about that I can use the daily audit file to transpose this information.

I just wanted to make sure that my thoughts were not clouded by preconcieved notions or by lack of knowledge about new techniques.

Thank you|||What you are talking about is known as an EAV (Entity/Attribute/Value) model, and while it is occasionally appropriate, it is very difficult to write code around and I would NEVER recommend it for a "data warehouse".

If you need an unstructured data schema, consider developing in SQL Server 2005 and using the new XML datatype instead of an EAV design.|||Thank You BlindMan for putting a name to my problem. (EAV)
Thank You Brett

I am only using this struture to store an audit trail of a dynamic data structure.

I will persuade the users better to define there business needs so that I may make a true Data Warehouse.

I will use the Audit trail data and evaluate the rows of data that I recieve nighlty to populate the Warehouse. From predifined specs.

I cannot force a change in the data entering the Database.

But once here it is mine all mine.. AH HA HA HA :D|||I think you need to rethink your audit trail. If you are trying to use a single table to store all data changes from any other table, you are going to find the resulting dataset quite unwieldy. Either just store a description of the change that occured, or create separate archive tables for each production to table to store a history of modifications.|||Thank You BlindMan for putting a name to my problem. (EAV)have a look at tony andrews' article OTLT and EAV: the two big design mistakes all beginners make (http://tonyandrews.blogspot.com/2004/10/otlt-and-eav-two-big-design-mistakes.html)

tony's another dbforums stallwart|||To tell the truth...for me, doing audits...simpler is better

Just have an exact copy of the table.

Add 3 columns, HIST_ADD_BY, HIST_ADD_DT, HIST_ADD_TYPE

Create a trigger that inserts into this table for all UPDATE and DELETE Operations and move the entire row

Put no indexes on this table

Boom, audit done

If you want performance on analysis, I'd reccomend that you bcp the data out and load it to a table with indexes (applied after the load)

I would not want to interfere with the triggers at all

MOO|||Thanks all for the reference to the column.

The EAV I am forced to use is a decision of My Management and cannot push them off of that concept. Oh the fate of a peon. I have explained the pros and cons of this decision but this is what has been decided.

The other option that they suggested was to dynamically evaluate the key fields and if a new one appears generate the DDL to add a field to a table of 800+ fields and transpose this data.

They wish to make the data flexible for any forseeable senario without further IT involvment.

If I had total control ....|||They wish to make the data flexible for any forseeable senario without further IT involvment.lol
I've been enjoying the evolution of this thread. I love it when threads like this end with the management concluding that they can conceive of a model that will require no "further IT involvment".

I've never had the misfortune to run into a full blown EAV model (appropriately implemented or not) in the flesh but it's pretty clear with only a little reading and even a substandard imagination like mine that this is precisly where they do not lead you.

Pro - can stick any data in the db
Con - can stick any data in the db
Con - Try getting it out again

"for any forseeable senario" lol.

Ah well - you tried. Best of luck :D|||Put it in writing/e-mail right now that you think XML is the way to go with this. Then save a copy of the document in your CYA file.
Some very complex coding is ahead of you.|||committing to writing that XML is the way to go may backfire on ya...

not that i've got any XML experience, but i've heard horror stories about performance, and since XML has such cachet with management, they may take you up on it and then you could be cooked|||Yes, the performance sucks compared with a standard normalized database, but is probably no worse than an EAV design and with a helluva lot less programming. I would not place data in an XML column unnecessarily, but would reserve it only for data that could not be predefined in the business model.

No comments:

Post a Comment