Encoding Guidelines

Below is an outline of the guidelines for encoding cookbooks in MSU's digitization project "Feeding America: The Historic American Cookbook Project."

Also available is the instruction manual, which was written to train our cookbook coders. This document explains the coding process in more depth than the outline does, and it defines XML-related terms you may be unfamiliar with.

The DTD for Feeding America is also available.


Outline



Formal Public Identifier for cookbook.dtd

A customized tagset has been devised for this project:

System identifier: "cookbook.dtd"
Public identifier: "-//Michigan State University, Digital & Multimedia Center//DTD -//cookbook 1.0//EN"

The DTD may be examined here. However, this is probably not the final version of the DTD. Contact Ruth Ann Jones, MSU Digital & Multimedia Center, for more information.




PREPARING TO ENCODE

The first set of cookbooks was typed using the basic instructions for Sunday School books, that is, using the TEI tagset instead of the customized Cookbook tagset. Because of this, all italicized text is tagged as <emph rend="italic">. The cookbook DTD uses a slightly different model for indicating font shifts. <emph> still exists as an element, but should be used only for italics used to indicate linguistic emphasis (as in this sentence). All low-level elements have attributes to indicate font shift as well as the descriptive content of the tag name:
 
 

Attribute    Allowable values

align        center, right, indent1, indent2
rend         bold, italic, ornate
size         larger, smaller (than surrounding text)
placement    heading (on a line by itself) or inline

 

Without these values, it might be necessary to use two sets of tags on the same word or phrase, for example:

<ingredient><emph rend="italic">vanilla flavouring</emph></ingredient>

With these values, the tagging is a bit more streamlined:

<ingredient rend="italic">vanilla flavouring</ingredient>

The older <emph rend="italic"> tagging can be fixed with a global change if you determine that all instances of <emph rend="italic"> occur for the same reason: that is, to mark ingredients. Be cautious with global operations! If you make an error, it can take longer to correct the error than it would have taken to tag the sections manually. Before you experiment to see whether a situation can be corrected with a global search-and-replace, always make a backup copy of the document.




THE WRAPPER ELEMENT <cookbook>

A <cookbook> consists of four sections:

<meta>

metadata, expressed as Dublin Core elements: required

 

<front>

frontmatter: required

 

<body>

main body of book: required

 

<back>

backmatter: not required, but normally it will be used at least minimally to hold the <pb> reference for the back cover image

 




ATTRIBUTES ESTABLISHED FOR THE WRAPPER ELEMENT <cookbook>

type= Required. This contains general categories which can characterize an entire cookbook, a chapter or section, or (infrequently) individual recipes or formulas. Allowed values are general, charity, famous, frugal, restaurant, invalid, histperiod, and encyclopedia. See Table 3 for definitions.
chefschool= Optional, but should be used if a cookbook is identified as type=famous. Fill in the name of the chef or cooking school. 
histperiod= Optional, but should be used if a cookbook is identified as type=histperiod. Fill in the name of the historical period (such as "Temperance movement") as given in the cookbook.
class1= Required. These are categories for foods and other types of activities described in the cookbooks. Allowed values are fruitveg, meatfishgame, eggscheesedairy, breadsweets, soups, seasoningmisc, beverages, generalfood, menus, medhealth, household, farmgarden, childrear, etiquette, restaurant, servants, generalnonfood, foodandnonfood. The values shown in italics will probably be the ones used most often at this level, when you are describing an entire cookbook. However, some books will focus on certain types of foods, and one of the other values will be appropriate. The same values are used for individual recipes. See below for definitions.
class2= Optional. Same allowed values as class1; use this if necessary to represent a secondary focus of a particular cookbook.
region= Required. If a recipe is identified with a specific place or region in the U.S. or with a particular ethnic group, use this attribute. Allowed values are northeast, south, midwest, west, ethnic, and general. Use the U.S. Census map to decide which region a place is in. 
subregion= Optional but should be used if the region= attribute is used. Fill in the more specific region as identified by the cookbook.
ethnicgroup= Optional, but should be used if a cookbook or portion of a cookbook is identified as <element region="ethnic">. Fill in the name of the group as identified in the cookbook. 
occasion= Optional. If a cookbook is identified with a special occasion, use this attribute. Allowed values are Thanksgiving, Christmas, wedding, birthday, patriotic, spring, summer, fall, winter, other. 
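
As a rough illustration of how these attributes combine, the start tag for a hypothetical community fundraising cookbook from Michigan might look like the sketch below. The attribute names and the values of type=, class1=, and region= are taken from the lists above; the subregion= value is invented for the example, and the DTD remains the authority on what actually validates.

```xml
<!-- Hypothetical example: a charity cookbook covering all kinds of food -->
<cookbook type="charity"
          class1="generalfood"
          region="midwest"
          subregion="Michigan">
    <meta></meta>
    <front></front>
    <body></body>
</cookbook>
```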




VALUES FOR THE TYPE= ATTRIBUTE OF THE WRAPPER ELEMENT <cookbook>

"general"

General works that do not fall into one of the special categories listed below.

 

"famous"

Cookbooks by a famous chef (Julia Child, Fannie Farmer) or produced by a well-known cooking school. Put the chef's or school's name in the attribute chefschool=.

 

"charity"

Cookbooks produced by church or community groups for fundraising.

 

"frugal"

Works on cooking economically or using inexpensive ingredients.

 

"restaurant"

Works featuring large-scale recipes for restaurants.

 

"invalid"

Works on cooking for invalids or treating various conditions through diet (e.g. diabetes cookbooks).

 

"histperiod"

Works based on the cooking of a specific historical period. Put the name of the period in the attribute histperiod=. For example, a Civil War cookbook:
<cookbook type="histperiod" histperiod="Civil War">

 

"encyclopedia"

Works organized as a dictionary or encyclopedia: that is, articles arranged alphabetically by topic.

 




METADATA

The metadata section makes use of unqualified Dublin Core elements. The element names are preceded with "dc:" to distinguish them from the non-metadata elements.

A template with instructions for each element is available at K:\cookery\metadata.txt. Open the document in NoteTab or WordPad, copy the entire template, and paste it into the xml document. Delete the duplicate "meta" tags if necessary. The metadata template will validate against the cookbook.dtd.
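
For orientation only, a filled-in metadata section might look something like the sketch below. The three element names are standard unqualified Dublin Core elements and the values describe Amelia Simmons' American Cookery, but which elements are required, their order, and the expected form of each value are set by the template and the DTD, not by this sketch.

```xml
<meta>
    <!-- Unqualified Dublin Core elements, each prefixed with "dc:" -->
    <dc:title>American Cookery</dc:title>
    <dc:creator>Simmons, Amelia</dc:creator>
    <dc:date>1796</dc:date>
</meta>
```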



FRONTMATTER AND BACKMATTER

Divide the <front> and <back> of each book into <div> sections based on their content. Each <div> must have a type indicated. Allowable values are: advertisement, appendix, backcover, contents, copyrightstmt, dedication, frontcover, glossary, halftitlepage, illustration, introduction, index, preface, titlepage, and other.

This list is meant to be fairly comprehensive, so don't use "other" unless none of the other values fit at all. For example, an editor's note is pretty similar to a preface, so tag it as "type=preface." Use "type=contents" for lists of illustrations, figures, and other special items as well as the usual table of contents.

If necessary, a <div> can be divided into <subdiv> elements. This is similar to the <div1>, <div2> relationship in TEI. However, it shouldn't be necessary very often. An exception might be a lengthy introduction that is divided into two parts, each with their own heading. Normally, the paragraph tag <p> will be all you need within a <div>.
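
The division rules above can be sketched as follows. This is an illustrative skeleton rather than a validated sample: the type values come from the list above, the paragraph text is placeholder wording, and the <subdiv> elements are shown bare because this outline does not say whether they take attributes of their own.

```xml
<front>
    <div type="titlepage">
        <p>Text of the title page.</p>
    </div>
    <div type="contents">
        <p>Text of the table of contents.</p>
    </div>
    <div type="introduction">
        <!-- <subdiv> is used only because this introduction has two headed parts -->
        <subdiv>
            <p>First part of the introduction.</p>
        </subdiv>
        <subdiv>
            <p>Second part of the introduction.</p>
        </subdiv>
    </div>
</front>
```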



THE BODY: STRUCTURAL ELEMENTS

The <body> structure is also limited in the number of text division levels, compared to the TEI structure that runs from <div1> to <div7>. The body can be divided into chapters, which can be divided into sections, which can be divided into subsections, which can be divided into recipes.

It is not necessary to use all of these levels; only use as many as necessary to reflect the actual structure of the book. The DTD allows the <recipe> element to be located immediately within the <body> element or within <chapter>, <section>, or <subsection>. For example, Amelia Simmons' American Cookery has no chapter divisions at all, simply a title page and a series of recipes, so the <recipe> elements would be placed immediately within the <body> element.

This means the following structures are possible, from simplest to most layered:
 

<cookbook>
     <meta></meta>
     <front></front>
     <body>
           <recipe></recipe>
     </body>
</cookbook>

<cookbook>
     <meta></meta>
     <front></front>
     <body>
           <chapter>
                <recipe></recipe>
           </chapter>
           <chapter>
                <recipe></recipe>
           </chapter>
     </body>
</cookbook>

<cookbook>
     <meta></meta>
     <front></front>
     <body>
           <chapter>
                <section>
                     <recipe></recipe>
                </section>
                <section>
                     <recipe></recipe>
                </section>
           </chapter>
     </body>
</cookbook>

<cookbook>
     <meta></meta>
     <front></front>
     <body>
           <chapter>
                <section>
                     <subsection>
                          <recipe></recipe>
                     </subsection>
                     <subsection>
                          <recipe></recipe>
                     </subsection>
                </section>
           </chapter>
     </body>
</cookbook>




WHEN TO USE THE "CLASS=" ATTRIBUTE FOR <chapter>, <section>, and <subsection>
 

The "class=" attribute is optional for the <chapter>, <section>, and <subsection> elements. As of May 2002, "class=" should be used only for the smallest of these three elements in use in a particular book or portion of a book. Theoretically, this will avoid having the XPAT search engine produce duplicate search results.
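
A minimal sketch of that rule, assuming a chapter that is divided into sections but not subsections: the sections are the smallest divisions in use, so they carry the attribute and the chapter does not. The attribute is written as class= to follow this section's wording; check the DTD for the exact attribute name, since <cookbook> and <recipe> use class1= and class2=.

```xml
<chapter>
    <!-- No class= on the chapter: the sections below are the smallest divisions in use -->
    <section class="soups">
        <recipe><p>Text of a soup recipe.</p></recipe>
    </section>
    <section class="breadsweets">
        <recipe><p>Text of a bread recipe.</p></recipe>
    </section>
</chapter>
```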




THE BODY: RECIPES AND FORMULAS

Within the <chapter> or <section> or <subsection> elements (which hold the major portions of the text) the majority of the text should be tagged as one of three types:

<recipe>

Directions for making something edible, and intended as a food or beverage. This category does not include medicines taken internally.

 

<formula>

Directions for making something non-edible (or not intended as a food or beverage), such as laundry starch, fabric dyes, or medicines.

 

<p>

General commentary that is not part of a recipe or formula, such as advice on how to choose foods in the market place, foods that go together well, table manners, etc. Some of the books will also have extensive sections on other domestic matters such as childrearing, care of invalids, advice on household management, etc. 

 

Some recipes and formulas will contain more than one paragraph, and (as one might expect) these are indicated with <p> tags. However, although many recipes are complete in a single paragraph, these must also have a set of <p> tags immediately within the <recipe> tags.

Although this is somewhat repetitive, it serves two purposes. In order for the style sheet to produce a consistent screen display, the coding also needs to be consistent, whether a recipe has one paragraph or two. This means we must either use the extra <p> tags, or have no <p> tags at all within recipes and use <lb> or some other construction to indicate a second or third paragraph. Since the typists are already inserting <p> tags (following the SSB typing conventions) the first choice makes more sense.
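
The paragraph convention described above can be sketched like this. The text inside the tags is placeholder wording, and the <formula> case assumes that formulas follow the same pattern as recipes.

```xml
<!-- Single-paragraph recipe: the <p> tags are still required -->
<recipe>
    <p>Text of the whole recipe.</p>
</recipe>

<!-- Two-paragraph recipe: same pattern, one <p> per paragraph -->
<recipe>
    <p>First paragraph of the recipe.</p>
    <p>Second paragraph of the recipe.</p>
</recipe>

<!-- Assumed: a formula is coded the same way -->
<formula>
    <p>Directions for making laundry starch.</p>
</formula>
```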



LOW-LEVEL ELEMENTS FOR RECIPES AND FORMULAS

purpose

This is the title of the recipe or the statement of what the directions will produce. In older cookbooks, this is often located in the first sentence: "To make a bread pudding..." In later cookbooks this is usually a heading located before a list of ingredients. When the <purpose> appears as a heading on the page, use <purpose placement="heading">.
(Don't "double-tag" it as <purpose><head>Chocolate Cake</head></purpose>)

 

process

This would be used for verbs: braise, boil, etc. Use this very sparingly, for actions that are uncommon in 20th century cooking. Don't tag words like "stir" or "roast." Do tag words or phrases like "let the batter sweat overnight".

 

ingredient

This would be used for ingredients in a recipe or the items used to make a formula: things like madder root or walnut hulls would be ingredients for a fabric dye formula. Use only for uncommon ingredients.

 

implement

Objects used to perform some action in a recipe or formula. Ignore common items like spoons, bowls, pots and pans.

 

measurement  

Use this to flag unusual measurement terms such as gill or teacup-full.

 

contributor

Use in church and charity cookbooks when contributors of individual recipes are listed. 

 

attribution

Use when a recipe is attributed to someone other than the editor or author of the book being tagged. <attribution>"This is based on Julia Child's recipe for boiled turnips."</attribution>

 

variation

Use for variations on a recipe. Usually this means an instruction to follow the same cooking directions and set of ingredients but with one or two substitutions.
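
To show several of these elements together, here is a sketch of one short recipe. The recipe wording and the contributor's name are invented for illustration, and the placement of <purpose> and <contributor> directly inside <recipe> (rather than inside a <p>) is an assumption to be checked against the DTD. Note that common words such as "pour" and "bread" are left untagged.

```xml
<recipe class1="breadsweets">
    <purpose placement="heading">Bread Pudding</purpose>
    <p>Pour a <measurement>gill</measurement> of milk over the bread,
    <process>let the batter sweat overnight</process>, and bake it in a
    <implement>Dutch oven</implement>.</p>
    <!-- Invented name, as it might appear in a charity cookbook -->
    <contributor>Mrs. A. B. Smith</contributor>
</recipe>
```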

 




DEFINITIONS OF "CLASS=" VALUES FOR FOOD TOPICS

"fruitveg"

Preparing and preserving fruits, vegetables, beans, and legumes of all kinds; selecting these foods at market; proper storage conditions; nutritional value of these foods.

 

"meatfishgame"

Preparing or preserving beef, lamb, mutton, poultry, seafood, and wild game such as venison, squirrel, buffalo, etc. Include organ meats such as kidney, brains, tripe, etc. Also, selecting and storing these foods; nutritional value of these foods.

 

"eggscheesedairy"

Making cheese or other dairy products (e.g. yogurt) and recipes which have eggs, cheese, or dairy products as the major ingredients (e.g. puddings, custards, quiche). Also, selecting and storing these foods; nutritional value of these foods.

 

"breadsweets"

Breads and baked goods: crackers, muffins, tarts, pies, cakes, pancakes, etc. Also, sweets or desserts even if they are not baked, such as fudge, boiled sugar candies, icings for cakes, etc. Also, selecting and storing these foods; nutritional value of these foods.

 

"soups"

Soup recipes. This category takes precedence over "fruitveg" and "meatfishgame" -- i.e. asparagus soup goes here, not in "fruitveg"; beef broth goes here, not in "meatfishgame". Also, selecting and storing these foods; nutritional value of these foods.

 

"accompaniments"

This category encompasses foods meant to season or flavor other foods, rather than being eaten alone. This includes recipes for sauces, jams and preserves, and condiments such as mustard or pesto, as well as directions for using or preparing herbs or spices. Also, selecting and storing these foods; nutritional value of these foods.

 

"beverages"

Anything meant to be drunk instead of eaten. Milk or eggnog goes here, not in "eggscheesedairy." Fruit juice goes here, not in "fruitveg." Also, selecting and storing these foods; nutritional value of these foods.

 

"generalfood"

Applies only to <cookbook>, <chapter>, and <section>. This is to be used when two or more categories are covered by the material. Most cookbooks will be "generalfood" because they cover all types of food.

 

"menus"

Restaurant menus, "bills of fare" and other portions of text that list foods that go together well. This can only be used with <cookbook>, <chapter>, <section>, or <passage>.

 

Each <cookbook> and each <recipe> may have two of these terms applied: one as the value of the class1= attribute and one as the value of the class2= attribute. Even so, some recipes will be hard to classify. If in doubt:

In any kind of classification system, there always will be differences of opinion about what category a certain thing should go in; this is normal and to be expected. Try to be consistent, but don't agonize over individual recipe classifications.





DEFINITIONS OF "CLASS=" VALUES FOR NON-FOOD TOPICS

"medhealth"

Information about health, nutrition, hygiene, or care of the sick. Examples might include: "a tincture for mouth and gums" (i.e. toothpaste), "tonics" or nutritional supplements like cod liver oil, or poultices for dressing wounds.

 

"household"

Information about household management. Examples of <formula> under this category would include directions for preparing things like laundry starch or fabric dyes. A <passage> under this category might discuss ways to heat a home more efficiently.

 

"farmgarden"

Anything related to the raising of food or livestock. Examples might include advice on caring for an orphaned calf or making a spray to ward off potato beetles.

 

"childrear"

Advice on raising children.

 

"etiquette"

Advice on good manners, how to behave in social situations, etc.

 

"restaurant"

Advice on managing a restaurant or hotel, or (in the case of Tunis Campbell) training employees of a restaurant or hotel.

 

"servants"

Use this for servants in private homes. Hotel employees should be listed under "restaurant" (because that is actually shorthand for "restaurants and hotels").

 

"generalnonfood"

Anything that doesn't fit in one of the categories above.

 


 
 

And, a very general category, probably only applicable at the <cookbook> level: 

"foodandnonfood"

For those encyclopedic works that address many sorts of foods and many sorts of noncooking topics such as gardening, nursing the sick, organizing household work, etc.

 





GENERAL FORMATTING ELEMENTS

Use these low-level elements to format the text as necessary.

<p>

Paragraph: to subdivide <recipe> or <formula> or <passage>, as needed.

 

<pb>

Page break. Follow the same rules as for other typing projects, i.e. <pb n="pagenumber" id="book079.jpg">

 

<lb>

Line break. Can be used any time there are special line breaks, as on the title page.

 

<emph>

Use only for linguistic emphasis (see top of instructions for explanation).

 

<alt>

Use to give the 20th-century equivalents of archaic terms. For example,
"Add 3 egg <alt synonym="yolks">yelks</alt> to the cake batter."

 


 
Formatting Elements Used in <list>
<list>

Use to indicate a list of items. A <list> contains a series of <item>s.

 

<item>

The individual lines or sections of a list. Can contain <term>, <definition>, and <ref>.

 

<term>

Use only when <list> is being used to encode a glossary-type section.

 

<definition>

Use only when <list> is being used to encode a glossary-type section.

 

<ref>

Use for footnotes, or for page numbers in a table of contents. More explanation below.
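
A glossary-type list might be coded along the lines of the sketch below. The two entries are made up from terms mentioned elsewhere in these guidelines; whether <list> needs any attributes for a glossary is not stated here, so none are shown.

```xml
<list>
    <item>
        <term>Gill</term>
        <definition>A quarter of a pint.</definition>
    </item>
    <item>
        <term>Yelks</term>
        <definition>Yolks.</definition>
    </item>
</list>
```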

 


 
 
Formatting attributes that occur in most other elements in the cookbook DTD.
align=

Allowed values are center, right, indent1, and indent2.

 

rend=

Allowed values are bold, italic, and ornate.

 

size=

Allowed values are larger and smaller. (This means larger or smaller than the text immediately surrounding the tagged text.)

 

placement=

Allowed values are heading and inline. Heading means on a line by itself. Inline means not on a line by itself, like the text in a paragraph.

 

height=

This is an attribute for <ref>, for marking up footnotes. Allowed values are subscript (below the line of text) and superscript (above the line of text).

 




TARGET AND ID PAIRS

Target and ID "pairs" are used to refer a reader from one part of a document to another. They're referred to as pairs because whenever you use <ref target="example"> in one part of a book, there has to be another portion tagged as <element id="example">. The "id=example" can be used in many different elements in the cookbook DTD. It is required in <pb>, so that is the most frequent use. It can also be used in lists, chapter headings, illustrations, etc.
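
As a sketch of one such pair, a page number in a table of contents can point at the page break that carries the matching id. The id value here is invented, modeled on the "book079.jpg" pattern shown for <pb> under General Formatting Elements, and <pb> is written as an empty element so that the fragment is well-formed.

```xml
<!-- In the table of contents: the page number refers to a target -->
<list>
    <item>Soups <ref target="book015.jpg">15</ref></item>
</list>

<!-- Later in the body: the page break that carries the matching id -->
<pb n="15" id="book015.jpg"/>
```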



EXTERNAL REFERENCES

The cookbook collection will be accompanied by several groups of supplementary material: author biographies, essays on individual books and cooking genres, a glossary of cooking terms, and a gallery of museum objects. We will need to create links to this material from within the cookbook texts. The attributes xref= and item= will be used to do that. These attributes can be used with most elements.

The allowed values for xref= are authors, essays, objects, glossary (the same four categories named above).
The value for item= is a code referring to the particular author, essay, object, or glossary entry. We'll need to create standardized lists for these.

Example: one of the museum objects being photographed is a Dutch oven. When a recipe mentions using this item, it would be tagged like this:

"Let the stew simmer for three hours in a <implement xref="objects" item="dutchoven">Dutch oven</implement> placed among the coals."
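
A link to the glossary would presumably follow the same pattern. In this sketch the sentence is invented and the item code "gill" is hypothetical, since the standardized lists of codes have not been created yet.

```xml
<p>Stir in a <measurement xref="glossary" item="gill">gill</measurement> of cream.</p>
```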



ILLUSTRATIONS

Follow the same guidelines as in the Sunday school books to decide if something is an illustration. If you see an abstract design used to fill a little space at the end of a chapter, don't tag it at all. If you see something decorative that is an identifiable object (a little row of spoons, perhaps) tag it as an illustration even if it's only there to fill space.

Illustration tags.
<illustration>

Contains <caption> and <description>. The caption is optional, since there might not be one. The description is required. 

 

<caption>

Wrap this tag around the picture's caption.

 

<description>

Write a brief description of the picture. See the Sunday school books website for examples of good descriptions.
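
Put together, the illustration tags might be used as in the sketch below. Both pictures are invented examples: the first has a printed caption, the second is a decorative object with none, so only the required <description> appears.

```xml
<illustration>
    <caption>Fig. 1. The proper way to carve a turkey.</caption>
    <description>Line drawing of a roast turkey on a platter, with dotted lines showing where to cut.</description>
</illustration>

<!-- No printed caption: <description> alone is enough -->
<illustration>
    <description>A row of three small spoons used as decoration at the end of a chapter.</description>
</illustration>
```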

 



EDITORIAL NOTES

The TEI tagset, which we used to code the Sunday school books, has numerous tags for indicating unusual characteristics of a book that cannot be clearly understood through the transcription of the text. For example, there is a tag <inscription> for marking up handwritten inscriptions, a tag for <gap> to indicate that words are missing because of damage to a book, and <unclear> for words that cannot be accurately transcribed because (for example) the ink is badly smudged.

The cookbook tagset attempts to reduce the number of tags needed for situations like these by including an element <ednote>. When you encounter something that needs a bit of explanation, go ahead and put it in, wrapped in <ednote> tags. The XSL stylesheet will be written to display <ednote> material inside square brackets with a heading [Note: ] so it will be clear to the reader that this is an addition to the original text. In general, put the <ednote> after the place that needs explanation. For example, on the title page of Amelia Simmons:

<div type="titlepage">

    <p>John Hammond</p>
    <ednote>Handwritten inscription.</ednote>

    <docTitle>American Cookery...</docTitle>

</div>
 



Updated: 05/21/04