Saturday, February 25, 2012

Database Design Question

Anyone know a good database scheme for storing formatted text? The text will be coming from XML in this type of format:

<text id="01">
<headline>Here is the title</headline>
<subhead>Here is a subhead</subhead>
<paragraph>Here is some text</paragraph>
</text>
<text id="02">
<headline>Here is the title</headline>
<subhead>Here is a subhead</subhead>
<paragraph>Here is some text</paragraph>
<subhead>Here is a subhead</subhead>
<paragraph>Here is some text</paragraph>
</text>
... etc

How can I store in a database and keep the headlines, subheads, and paragraphs delimited without just storing the XML tags and keeping everything in order (each "text" may contain multiple paragraphs and subheadings ... )?

I'm just wondering if anyone has done this before and what has worked for them ... just looking for opinions on the most efficient way to accomplish ... thanks!If the data length will be less than 8000 characters drop the data into an a varchar column. If the data will exceed 8000 characters you will have to drop the data into a text column.|||Originally posted by smorton
If the data length will be less than 8000 characters drop the data into an a varchar column. If the data will exceed 8000 characters you will have to drop the data into a text column.

Thanks for the reply. I'm looking for the most efficient way to set up relationships that will keep my text formatted the same way it is in the XML doc and in the same order. Any ideas?|||Again, why don't you just drop this into a varchar or text field? Do you need to be able to query on different Tags? I don't get why you need to establish relationships to the XML.

blindman|||I need to be able to store other properties about each piece of text ... I guess this could be done with attributes of the XML tags. Is that the best way? Just dump the XML right into the db? Thanks!|||What other attributes?

How do you define a piece of text? One line? All the text between two tags?

What about nested tags?

blindman|||There will be no nested tags ... no need for them at this point. Attributes will be font size, family, color, etc ... and a transition id.

(I'm using ASP to generate XML for Flash ... but I also need to save the everything I write the XML in a database).

So I guess I could just store in the db as follows:

<text id="1">
<head font="arial" size="12" color="000000" transition_id="1">This is the title</head>
<sub font="arial" size="11" color="000000" transition_id="2">This is a sub title</sub>
<para font="arial" size="10" color="000000" transition_id="3">this is a paragraph</para>
<para font="arial" size="10" color="000000" transition_id="3">this is a paragraph</para>
<sub font="arial" size="11" color="FF0000" transition_id="2">This is a sub title</sub>
<para font="arial" size="10" color="FF0000" transition_id="3">this is a paragraph</para>
</text>
... etc

Does that sound right to you? Thanks!|||(follow up) ... text is everything between 2 tags.|||Well, you could have each line be an independent record containing both the text and the common attributes as separate fields. To keep the lines in order, use an incrementing key.

blindman|||you can also identify all the attributes that define a piece of text and separate them into a set of tables that would contain key-based references, all grouped together by a page reference (if applicable).|||Thanks guys. This is kind of what I was thinking ... just wanted to get an expert's opinion. If I use both the incrementing key and the page id with several related tables then I can keep the data all seperated out into organized tables and carry all the required attributes. Then when I am ready to write to XML, for each page I just do a select across the related tables by page id and order it by the incremented order id.

That brings the text out for each page in the right order with all of the formatting in place.

Sounds right to me. Thanks again!

No comments:

Post a Comment