Version4 Mrda Manual
Page Count: 132
Meeting Recorder Project: Dialog Act Labeling Guide
ICSI Technical Report TR-04-002
February 9, 2004

Rajdip Dhillon
Sonali Bhagat
Hannah Carvey
Elizabeth Shriberg

ACKNOWLEDGEMENTS

We especially thank Jeremy Ang for processing our data and Chuck Wooters for providing us with the TableTrans software. We are also grateful to Don Baron and Chris Oei for their assistance in preparing data for annotation. We are thankful to Ashley Krupski for her annotation efforts; Barbara Peskin and Jane Edwards for their assistance in using the corpus; and Dan Jurafsky and Andrei Popescu-Belis for supplying us with their input. This work was supported by a DARPA Communicator project, ICSI NSF ITR Award IIS0121396, SRI NASA Award NCC2-1256, SRI NSF IRI-9619921, SRI DARPA ROAR project N66001-99-D-8504, and by an ICSI award from the Swiss National Science Foundation through the research network IM2. The views represented herein are those of the authors and do not represent the views of the funding agencies.

TABLE OF CONTENTS

Introduction
Section 1: Quick Reference Information
    1.1 Terminology
    1.2 Mapping Meeting Recorder DA (MRDA) Tags to SWBD-DAMSL Tags
    1.3 Meeting Recorder DA (MRDA) Tagset
Section 2: Segmentation
Section 3: How to Label
    3.1 Basic Format of DAs and Labels
    3.2 Label Construction
    3.3 Annotating Utterances Containing Multiple DAs
    3.4 Disruption Forms
    3.5 Quotes
    3.6 Using TableTrans (Annotation Interface)
Section 4: Adjacency Pairs
    4.1 Purpose and Definition
    4.2 Labeling Adjacency Pairs
    4.3 Labeling Conventions
    4.4 Restrictions on Using Adjacency Pairs
Section 5: Tag Descriptions
    5.1 Preliminaries
    5.2 Group 1: Statements
    5.3 Group 2: Questions
    5.4 Group 3: Floor Mechanisms
    5.5 Group 4: Backchannels and Acknowledgments
    5.6 Group 5: Responses
    5.7 Group 6: Action Motivators
    5.8 Group 7: Checks
    5.9 Group 8: Restated Information
    5.10 Group 9: Supportive Functions
    5.11 Group 10: Politeness Mechanisms
    5.12 Group 11: Further Descriptions
    5.13 Group 12: Disruption Forms
    5.14 Group 13: Nonlabeled
Appendix 1: Labeled Meeting Sample
Appendix 2: Unused/Merged SWBD-DAMSL Tags
Appendix 3: Unique MRDA Tags
Appendix 4: Final MRDA Tagset Revisions
Bibliography
Index of Tags

INTRODUCTION

This labeling guide is adapted from work on the Switchboard recordings and the accompanying manual (Jurafsky et al. 1997). The Switchboard-DAMSL (SWBD-DAMSL) manual for labeling one-on-one phone conversations provided a useful starting point for the types of dialog acts (DAs) that arose in the ICSI meeting corpus.
However, the tagset for labeling meetings presented here has been modified as necessary to better reflect the types of interaction we observed in multiparty face-to-face meetings. This guide consists of five major sections: Quick Reference Information, Segmentation, How to Label, Adjacency Pairs, and Tag Descriptions. The first section supplies definitions for terms used throughout this guide and contains the correspondence of the Meeting Recorder DA (MRDA) tagset, which is the tagset detailed within this guide, to the SWBD-DAMSL tagset. This section also contains the entire MRDA tagset organized into groups according to syntactic, semantic, pragmatic, and functional similarities of the utterances they mark. The section entitled “Segmentation,” as its name indicates, details the rules and guidelines governing what constitutes an utterance along with how to determine utterance boundaries. The third section, “How to Label,” provides instruction regarding label construction, the management of utterances requiring additional DAs or containing quotes, and the use of the annotation software. The section entitled “Adjacency Pairs” details how adjacency pairs are constructed and the rules governing their usage. The section entitled “Tag Descriptions” provides explanations of each tag within the MRDA tagset. Two appendices are also found within this guide. The first provides a labeled portion of a meeting and the second contains information regarding tags used for a select number of meetings. With regard to the examples from meeting data found throughout this guide, it must be noted that the start and end times for each utterance within the examples do not reflect the most recent time alignments. However, the start and end times are accurate to a point which allows for them to be located within their corresponding audio files without difficulty. 

SECTION 1: QUICK REFERENCE INFORMATION

1.1 Terminology

Below is some rudimentary terminology used in dialog act labeling:

utterance: a segment of speech occupying one line in the transcript by a single speaker which is prosodically and/or syntactically significant within the conversational context

speech: a group of successive utterances or successive portions of an utterance

turn: the period during which a speaker has the floor

label: the entire set of DAs and/or other tags applicable to an utterance

dialog act (DA): the tag or sequence of tags pertaining to the function of an utterance or portion of an utterance. Each DA contains at least one general tag and may contain one or more specific tags, depending upon the nature of the utterance

tag: the individual component(s) of a DA or label

general tag: the tag which represents the basic form of an utterance (e.g., statement, question, backchannel, etc.)

specific tag: the tag which represents the function or a characteristic of an utterance and is appended to the general tag (e.g., accepting, rejecting, acknowledging, rising tone, etc.)

disruption form: the tag which represents a disruption or otherwise indiscernible utterance

1.2 Mapping Meeting Recorder DA (MRDA) Tags to SWBD-DAMSL Tags

The following table shows the correspondence between Switchboard-DAMSL (SWBD-DAMSL) dialog tags and those used to label Meeting Recorder DA (MRDA) data. The tags within the table are ordered according to the categorical structure within the SWBD-DAMSL manual, with tags unique to the MRDA tagset being inserted in accordance with this categorical structure. The SWBD-DAMSL categories are not explicitly marked within this table in order to avoid confusion with the categories of the MRDA tagset. Tags listed in italics are based upon SWBD-DAMSL tags but have had their meanings altered for the purposes of the MRDA data.
Tags in boldface are not in the original SWBD-DAMSL manual but have been added to accurately characterize the MRDA data. Tag titles in boldface correspond to names of MRDA tags. All other tag titles correspond to names of SWBD-DAMSL tags. Additionally, the reasoning behind why certain SWBD-DAMSL tags are not used in the MRDA tagset is found in Appendix 2. Explanations regarding the presence of tags unique to the MRDA tagset are found in Appendix 3.

TAG TITLE                         SWBD-DAMSL    MRDA
Uninterpretable                   %             %
Abandoned                         %-            %--
Interruption                      not marked    %-
Nonspeech                         x             x
Self-talk                         t1            t1
3rd-party-talk                    t3            t3
About-task                        t             t
About-communication               c             not marked
Statement-non-opinion             sd            s
Statement-opinion                 sv            s
Open-option                       oo            not marked
Yes-No-question                   qy            qy
Wh-Question                       qw            qw
Open-Question                     qo            qo
Or-Question                       qr            qr
Or-Clause                         qrr           qrr
Rhetorical-Question               qh            qh
Declarative-Question              d             d
Tag-Question                      g             g
Action-directive                  ad            co
Offer                             co            cs
Commit                            cc            cc
Conventional-opening              fp            not marked
Conventional-closing              fc            not marked
Explicit-performative             fx            not marked
Exclamation                       fe            fe
Other-forward-function            fo            not marked
Thanks                            ft            ft
Welcome                           fw            fw
Apology                           fa            fa
Topic Change                      not marked    tc
Floor Holder                      not marked    fh
Floor Grabber                     not marked    fg
Accept                            aa            aa
Accept-part                       aap           aap
Maybe                             am            am
Reject-part                       arp           arp
Reject                            ar            ar
Hold before answer/agreement      h             h
Signal-non-understanding          br            br
Continuer                         b             b
Rhetorical-question continuer     bh            bh
Acknowledge-answer                bk            bk
Mimic other                       m             m
Repeat                            not marked    r
Collaborative completion          2             2
Reformulate/summarize             bf            bs
Assessment/appreciation           ba            ba
Sympathy                          by            by
Downplayer                        bd            bd
Correct-misspeaking               bc            bc
Misspeak Self-Correction          not marked    bsc
Understanding Check               not marked    bu
Defending/Explanation             not marked    df
"Follow Me"                       not marked    f
Yes answers                       ny            aa
No answers                        nn            ar
Affirmative non-yes answers       na            na
Negative non-no answers           ng            ng
Other answers                     no            no
Expansions of y/n answers         e             e
Dispreferred answers              nd            nd
Quoted Material                   q             not marked
Hedge                             h             not marked
Continued from previous line      +             not marked
Humorous Material                 not marked    j
Rising Tone                       not marked    rt
Nonlabeled                        not marked    z

1.3 Meeting Recorder DA (MRDA) Tagset

The categorization scheme for the Meeting Recorder DA (MRDA) tagset differs from the scheme employed for the SWBD-DAMSL tags seen above. The reasoning behind this is that, in the process of adjusting the definitions of previously established SWBD-DAMSL tags and creating new tags to assist in adequately assessing the MRDA data, the resulting MRDA tagset could not be appropriately characterized when placed in direct relation to the SWBD-DAMSL tagset, given the nature of the data for which the MRDA tagset was employed. Consequently, the tags are not organized on a dimensional level, but rather the correspondences for the MRDA tagset are listed on the tag level. Descriptions of the individual tags within the MRDA tagset are found in Section 5.

Group 1: Statements
    s      Statement

Group 2: Questions
    qy     Y/N Question
    qw     Wh-Question
    qr     Or Question
    qrr    Or Clause After Y/N Question
    qo     Open-ended Question
    qh     Rhetorical Question

Group 3: Floor Mechanisms
    fg     Floor Grabber
    fh     Floor Holder
    h      Hold

Group 4: Backchannels and Acknowledgements
    b      Backchannel
    bk     Acknowledgement
    ba     Assessment/Appreciation
    bh     Rhetorical Question Backchannel

Group 5: Responses
    Positive
    aa     Accept
    aap    Partial Accept
    na     Affirmative Answer
    Negative
    ar     Reject
    arp    Partial Reject
    nd     Dispreferred Answer
    ng     Negative Answer
    Uncertain
    am     Maybe
    no     No Knowledge

Group 6: Action Motivators
    co     Command
    cs     Suggestion
    cc     Commitment

Group 7: Checks
    f      "Follow Me"
    br     Repetition Request
    bu     Understanding Check

Group 8: Restated Information
    Repetition
    r      Repeat
    m      Mimic
    bs     Summary
    Correction
    bc     Correct Misspeaking
    bsc    Self-Correct Misspeaking

Group 9: Supportive Functions
    df     Defending/Explanation
    e      Elaboration
    2      Collaborative Completion

Group 10: Politeness Mechanisms
    bd     Downplayer
    by     Sympathy
    fa     Apology
    ft     Thanks
    fw     Welcome

Group 11: Further Descriptions
    fe     Exclamation
    t      About-Task
    tc     Topic Change
    j      Joke
    t1     Self Talk
    t3     Third Party Talk
    d      Declarative Question
    g      Tag Question
    rt     Rising Tone

Group 12: Disruption Forms
    %      Indecipherable
    %-     Interrupted
    %--    Abandoned
    x      Nonspeech

Group 13: Nonlabeled
    z      Nonlabeled

SECTION 2: SEGMENTATION

Utterance segmentation is one of the most debated topics in discourse analysis. The function of dialog must always be considered when determining utterance boundaries. Lengthy utterances containing multiple conjunctions, speaker rambling, and floorholding are just a few factors complicating the decisions regarding utterance boundaries. In order to segment transcribed speech into distinguishable utterances, the following factors are taken into consideration within the context of the conversation: syntax, pragmatic function, and prosody.

Prior to determining how to segment transcribed speech, knowledge of how utterance boundaries are marked within the transcript is necessary. There are two ways to mark utterance boundaries within the transcript. When a speaker trails off or is interrupted and consequently does not complete his utterance, an utterance boundary in the form of <==> is marked at the end of the corresponding utterance in the transcript. In Example 1 below, speaker c2 does not finish his utterance (speaker c3 adds the remainder of c2's utterance shortly after) and an utterance boundary is signaled by the <==> in the transcript. If a speaker's utterance is complete, an utterance boundary in the form of < . > is marked at the end of the corresponding utterance in the transcript.

Returning to the factors involved in segmentation, in terms of syntax, utterance boundaries are primarily derived on a phrasal level. This is not to say that an utterance consists only of a noun phrase or a verb phrase, but rather that it is permitted for a complete utterance to consist only of a noun phrase, a verb phrase, or both.
In Example 1¹, the noun phrase "jose" constitutes a complete utterance:

Example 1: Bmr010
280.000-284.762    c2   s.%--   and i did some training on - on one dialogue which was transcribed by ==
284.762-288.568    c2   s       yeah we - we did a nons- - s- speech nonspeech transcription .
287.474-288.294    c3   s^2     jose.

¹ Examples take a format in which the numerical values of the first column represent start and end times of utterances, the second column indicates the channel, the third indicates the DA, and the fourth presents the transcript.

Examples 2 and 3 depict instances where verb phrases, "got it" and "wants to conserve" in Example 2 and "confused" in Example 3, behave as complete utterances:

Example 2: Bed011
114.007-116.680    c2   s       and um - i - i told it to stay on forever and ever .
116.680-119.347    c2   s       but if it's not plugged in it just doesn't obey my commands .
119.120-119.320    c1   s^bk    okay .
119.726-120.386    c2   s       it has a mind .
121.961-122.331    c1   s^bk    got it .
122.160-123.170    c4   s       wants to conserve .

Example 3: Bed003
2950.850-2957.110  c3   s       yeah the only like - possible interpretation is that they are - like come here just to rob the museum or something to that effect .
2952.260-2953.830  c2   s^2     confused .

The pragmatic function of an utterance is also an important consideration for utterance boundary identification. Phrases or clauses that do not appear complete grammatically may actually form complete utterances on account of having unique functions within conversation. Although it may seem peculiar to segment utterances on a phrasal and clausal level, such a method of segmentation is utilized for the purpose of maximizing the amount of information derived from DAs. Example 4 presents an utterance that appears complete grammatically, yet does not maximize the amount of information which can be derived from DAs.

Example 4: Bmr010
217.921-227.363    c6   s^cs    that uh - if we had something that worked for many cases before maybe starting from there a little bit because ultimately we're going to end up with some s- - kind of structure like that.

In Example 5, the same utterance from Example 4 is shown; however, the utterance is segmented at the clausal level so that more information may be provided by the DAs that otherwise would not be present had the utterance not been segmented.

Example 5: Bmr010
217.921-222.161    c6   s^cs    that uh - if we had something that worked for many cases before maybe starting from there a little bit .
222.161-227.363    c6   s^df    because ultimately we're going to end up with some s- - kind of structure like that.

Syntax and pragmatic function are both taken into account when encountering conjunctions. Conjunctions such as "and," "or," "but," and "so" often behave as cues to locations where a string of clauses might be segmented into separate utterances. Rather than simply start a new utterance, a speaker might use one of these conjunctions as a connection between two complete utterances, as seen in a presegmented utterance in Example 6:

Example 6: Bmr020
595.187-608.363    c6   s       that's somewhat - that's somewhat subject to error but still we - we uh don did some ha- - hand checking and - and we think that - based on that we think that the results are you know valid although of course some error is going to be in there .

Example 7 depicts a correctly segmented version of Example 6:

Example 7: Bmr020
595.187-596.880    c6   s       that's somewhat - that's somewhat subject to error .
596.880-601.180    c6   s       but still we - we uh don did some ha- - hand checking .
601.310-604.837    c6   s       and - and we think that - based on that we think that the results are you know valid .
604.837-608.363    c6   s       although of course some error is going to be in there .

Caution must be taken not to segment utterances upon the appearance of conjunctions in every instance.
Quite often, conjunctions are used to simply connect noun phrases or verb phrases that would not constitute separate utterances in the context in which they are used. In these cases, the utterance is not segmented at the conjunction. Example 8 and Example 9 demonstrate instances when an utterance is not segmented upon the appearance of a conjunction:

Example 8: Bro014
238.387-240.098    c2   s^e     i mean it's like one little text file you edit and change those numbers .

Example 9: Bro014
302.417-305.275    c2   s       now h t k's compiled for both the linux and for um the sparcs .

On occasion, a speaker may have an extremely lengthy utterance with many conjunctive clauses and parentheticals. In such situations, each clause or parenthetical is segmented into a separate utterance. As with segmenting on a clausal or phrasal level, segmenting parentheticals in such a way allows for the maximization of information provided by DAs. In deciding how to segment such instances within transcribed speech, it is helpful to determine whether a speaker actually had the whole string of speech in mind or else unintentionally diverged from his original thoughts. Example 10 depicts a rather lengthy utterance prior to segmentation and Example 11 presents a segmented version of the same utterance.

Example 10: Bmr005
1012.960-1033.300  c4   s       but i - i mean - i think also to some extent its just educating the human subjects people in a way because there's if uh - you know - there's court transcripts there's - there's transcripts of radio shows i mean - people say people's names all the time so i think it - it can't be bad to say people's names it's just that i mean - you're right that there's more poten- - if we never say anybody's name then there's no chance of - of - of slandering anybody .

Example 11: Bmr005
1012.960-1019.350  c4   s       but i - i mean - i think also to some extent its just educating the human subjects people in a way .
1019.350-1025.740  c4   s^df    because there's if uh - you know - there's court transcripts there's - there's transcripts of radio shows i mean - people say people's names all the time .
1026.390-1028.940  c4   s       so i think it - it can't be bad to say people's names .
1029.270-1033.300  c4   s^df    it's just that i mean - you're right that there's more poten- - if we never say anybody's name then there's no chance of - of - of slandering anybody .

Prosody is also of considerable importance in detecting utterance boundaries. Taking the prosody of an utterance into consideration means attending to aural cues such as the rise and fall of pitch, the energy level, and the duration of the individual words as well as of the complete utterance. Utterances that appear complete syntactically, whether they are quite lengthy or consist of short phrases or clauses, may be incomplete prosodically. If the prosody of the end of an utterance consists of a pitch, energy level, or duration that is incongruent with that of a complete utterance, then that particular utterance is considered incomplete. General prosodic patterns found within complete utterances and prosodic patterns specific to certain speakers are necessary factors in determining how to assess the prosody of a complete utterance.

Prosody is of use in determining whether an utterance is interrupted or abandoned. If a speaker begins trailing off in pitch and the energy level begins to decrease, the speaker's utterance is most likely to be marked as abandoned. Prosody can also help distinguish between floor grabbers and backchannels, as floor grabbers tend to have a higher energy level in contrast to the surrounding speech and backchannels do not.

Pauses also behave as signifiers to utterance boundaries. Oftentimes, the appearance of a lengthy pause indicates that the segment of speech following the pause constitutes a new utterance.
If the portion of speech immediately preceding the pause is incomplete, that portion may either be an abandoned utterance or the beginning of an utterance of which the portion of speech following the pause is the end. If the former applies, and the portion preceding the pause is actually abandoned, a change in DAs, prosody, or both is an obvious signal that the pause is indicative of a boundary. However, if the latter case is applicable, no such drastic change in the prosody between the segment preceding and the segment following the pause will be present and both portions of speech are to comprise one utterance. To reiterate with regard to the latter case, an utterance boundary will not be marked at the pause.

As a side note, it must be mentioned that some speakers tend to speak slowly in such a manner that their utterances are filled with frequent pauses. In such instances, pauses are not indicators of utterance boundaries unless the segment of speech following a pause is incongruent with the segment preceding.

As difficulty in determining utterance boundaries is encountered when considering the factors of syntax, pragmatic function, prosody, and pauses, additional segmentation issues occasionally arise with the applicability of certain tags, namely, , , , , , and . Regarding , , and , often the problem at hand is whether to segment an utterance in which a speaker utters a string of s, s, or s, as seen in Example 12. If there exist significant pauses between each portion of the string of s, s, or s, the utterance is segmented upon each pause and each resulting utterance is labeled appropriately as , , or , depending upon its nature. However, if no such significant pauses exist, then the entire utterance remains intact and receives a suitable label. Additionally, it is far more difficult to judge if a pause actually signifies an utterance boundary within strings of s, s, or s than within strings of fluent speech.

Example 12: Bmr012
1886.800-1891.310  c1   s^cs    and then just sort of have that as the and then you can have groups of twenty people or whatever .
1891.310-1892.080  c1   fh      and - and uh ==

As a general convention, unless an utterance is comprised solely of floor holders, it is not to end with a floor holder <fh>. In the case that a floor holder is found at the end of an utterance, it is split from the utterance and either receives its own line or is merged with the following utterance of the same speaker, depending primarily upon its prosody and its temporal proximity to the following utterance. If the length of the floor holder is incongruent to the length of the words of the following utterance, the floor holder is of a different intonation in relation to the following utterance, or a significant pause exists between the floor holder and the following utterance, the floor holder is not merged with the following utterance.

If the floor holder is merged with the following utterance and the following utterance is not a floor holder, then it is permissible for the resulting utterance, which consists of a floor holder and another DA, to contain multiple DAs. Additionally, although a floor grabber and a hold do not occur mid-speech as a floor holder does, these tags may also be merged with the following utterance if deemed necessary and the resulting utterance will also contain multiple DAs. Section 3.3 specifies the manner in which utterances with multiple DAs are treated.

After splitting a floor holder from an utterance, it must be decided whether the portion which originally preceded the floor holder is complete or incomplete. Example 13 depicts an utterance ending with a floor holder and the same utterance is seen in Example 14 with the exception that the utterance has been segmented so that the floor holder receives its own line.

Example 13: Bmr010
601.519-604.014    c0   s       and if it's good enough we'll arrange windows machines to be available so ==

Example 14: Bmr010
601.519-602.707    c0   s       and if it's good enough we'll arrange windows machines to be available .
603.465-604.014    c0   fh      so ==

Regarding the tags , , , and , the largest problem is determining whether or not an utterance boundary exists after speech labeled with the tag , , or , that is if speech from the same speaker immediately follows, or if a boundary exists before speech labeled with the tag , that is if speech from the same speaker immediately precedes the portion labeled with the tag . This problem only emerges if the speech surrounding the portions labeled with the tags previously specified is such that the prosody bears no indication of a boundary between utterances, the speaker speaks so quickly that a boundary cannot be discerned, or else no significant pause is found to mark a boundary. When the issue arises that a boundary cannot be marked between speech labeled with the previously mentioned tags and the surrounding speech, then it is permissible for an utterance to have multiple DAs. Section 3.3 details the format of labels for utterances which have multiple DAs.

Another issue regarding segmentation concerns otherwise complete utterances being segmented in such a way that yields abandoned utterances. For instance, a complete utterance may be quite lengthy and appear as though it ought to be segmented. However, segmenting the utterance may yield incomplete utterances that would be marked as abandoned. As the original intact utterance is complete and some of the segmented portions are marked as being abandoned, it is clear that segmenting the utterance in a way that yields abandoned utterances is incorrect.

As an addendum to the aforementioned system of segmentation, if uncertainty exists as to whether or not to segment an utterance, a general guideline is to segment the utterance regardless.
Also, portions of speech that constitute one utterance but for some reason, perhaps mistakenly, are segmented as multiple utterances are merged to form one utterance.

SECTION 3: HOW TO LABEL

3.1 Basic Format of DAs and Labels

The basic format of a DA is as follows²:

< general tag > [ ^ < specific tag > ]

The basic format of a label is as follows (depending upon the utterance, the portions enclosed in brackets may or may not be necessary):

[ < general tag > [ ^ < specific tag > ] [ | < general tag > [ ^ < specific tag > ] ] [ . < disruption form > ] ]

² Throughout this manual, when discussing format, the convention of enclosing portions in brackets denotes that, depending upon an utterance, those portions may or may not be necessary.

3.2 Label Construction

The general tag is a mandatory component of every label. Only one general tag is present in each DA. Specific tags and disruption forms (which indicate when a speaker has been interrupted, trails off, or else is indecipherable) are included within a label only when an utterance cannot be sufficiently characterized by a general tag and when further characterization is needed. Specific tags are appended to general tags when necessary and are not used alone. For the purpose of uniformity among annotators, when multiple specific tags are appended to a general tag, they are attached in alphabetical order³.

³ As specific tags are attached in alphabetical order, the tag <2> is the last tag within the alphabetically ordered hierarchy, rather than the first.

In the following sets of tags, the first set contains general tags, the second set contains specific tags, and the third set contains disruption forms. Detailed descriptions of the tags in the three sets can be found in Section 5. Note that the tags found in Set 1 are only used as general tags, the tags found in Set 2 are only used as specific tags (in conjunction with a general tag), and tags in Set 3 are only used as disruption forms.

Set 1: General Tags
s qy qw qr qrr qo qh b fg fh h

Set 2: Specific Tags
aa aap am ar arp ba bc bd bh bk br bs bsc bu by cc co cs d df e f fa fe ft fw g j m na nd ng no r rt t tc t1 t3 2

Set 3: Disruption Forms
Disruptions: %- %--
Indecipherable: x %

Within a DA, when specific tags are necessary, they are attached to the general tag with a caret (^), thus rendering the following depiction of a DA:

< general tag >^< specific tag 1 >^< specific tag 2 >^< specific tag 3 > ...^< specific tag n >

Disruption forms are attached to and separated from the end of a DA with a period < . >, as seen in the following representation:

< general tag > [ ^ < specific tag 1 > ...^< specific tag n > ] . < disruption form >

It must be noted that, in some cases, a disruption form is present within an utterance without sufficient information to assign a DA to that utterance. In such instances, a label comprised solely of a disruption form is necessary.

Additionally, if for some reason an utterance is not to be labeled with a DA, then that particular utterance receives a label consisting only of the tag <z>. For instance, if an utterance contains data that is not to be labeled on account of it containing digits, containing pre- or post-meeting chatter, pertaining to a "bleeped" portion in the corresponding audio file, or else is simply not relevant to the labeling task, a label comprised solely of the tag <z> is used. As the tag <z> is used to mark utterances which otherwise would be labeled with DAs but instead are intentionally not to be labeled, it is clear why the tag <z> is not included within the other groups of tags (i.e. general tags, specific tags, and disruption forms). The tag <z> does not provide any information regarding the characteristics and functions of utterances as the tags of the other groups do, and for this reason it is separated from those groups.
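
The construction conventions above can be sketched as a small structural checker. This is only an illustration, not part of the annotation toolchain: the function names are hypothetical, the listing order of Set 2 is assumed to be the required attachment order (alphabetical, with <2> last), and the tag-specific restrictions described in Section 5 are not modeled, so some labels that pass this check may still be unacceptable.

```python
# Tag inventories copied from Sets 1-3 of this section.
GENERAL_TAGS = {"s", "qy", "qw", "qr", "qrr", "qo", "qh", "b", "fg", "fh", "h"}
SPECIFIC_TAGS = (
    "aa aap am ar arp ba bc bd bh bk br bs bsc bu by cc co cs d df e f fa fe "
    "ft fw g j m na nd ng no r rt t tc t1 t3 2"
).split()
DISRUPTION_FORMS = {"%", "%-", "%--", "x"}
NONLABELED = "z"

# Assumption: the position of a tag in the Set 2 listing defines the
# attachment order (alphabetical, with <2> last).
_RANK = {tag: i for i, tag in enumerate(SPECIFIC_TAGS)}


def is_valid_da(da: str) -> bool:
    """Check one DA: a general tag followed by ordered specific tags."""
    general, *specifics = da.split("^")
    if general not in GENERAL_TAGS:
        return False
    if any(tag not in _RANK for tag in specifics):
        return False
    ranks = [_RANK[tag] for tag in specifics]
    # Strictly increasing ranks mean correct order and no repeated tag.
    return all(a < b for a, b in zip(ranks, ranks[1:]))


def is_valid_label(label: str) -> bool:
    """Check the structure of a whole label (DAs, pipe bar, disruption form)."""
    # A label may consist solely of <z> or solely of a disruption form.
    if label == NONLABELED or label in DISRUPTION_FORMS:
        return True
    da_part = label
    if "." in label:
        # At most one disruption form, attached at the end with a period.
        da_part, disruption = label.rsplit(".", 1)
        if disruption not in DISRUPTION_FORMS:
            return False
    # DAs within one label are separated by a pipe bar.
    return all(is_valid_da(da) for da in da_part.split("|"))
```

For instance, `is_valid_label("s^aa^rt.%--")` passes, while `is_valid_label("s^rt^aa")` fails because the specific tags are out of order.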
The following is a partial list of sample labels that are acceptable within the previously established conventions for label construction:

s  qy  qr  b  fg  %  s^bk  qy^d^f^g^rt  qr^rt  b.%  fh^rt  %-  s^nd  qy^bh  qrr.%--  b.x  h  %--  s^aa^rt.%--  qy^bu.%-  qh^rt.%  b^rt  z  x

Listed below is a partial list of sample labels that are not acceptable within the previously established conventions for label construction:

s^s  aa^bk  x.%--  %--.s^qy^d  s^z  s^s^aa  %.%--  %--.x  b.%-  z.%--

Other restrictions also apply in constructing labels. These include specific tags which may only appear with certain general tags, general tags which have a limited set of applicable specific tags, and sets of specific tags which are prohibited from appearing in the same DA. Restrictions applying to the usage of tags are discussed in the individual tag descriptions in Section 5.

3.3 Annotating Utterances Containing Multiple DAs

In cases where one DA does not suffice to represent an utterance, two DAs are used. Such a need arises in cases such as those described in Section 2, usually with tags such as , , , , , , and which correspond to short utterances. Often, an utterance requires multiple DAs when a floor grabber or floor holder is uttered at the beginning of a statement or question, when a short answer of the nature, , or is followed by a longer explanation, or when a statement is followed by a tag question <g>. In some cases, an utterance requires multiple DAs when a statement is followed by a short answer of the nature, , or . In such cases, the DAs are separated in both the label and the portion of the transcript containing the utterance with a pipe bar < | >.

The pipe bar < | > is only used when sequential portions of an utterance that operate closely together require different characterizations. For instance, a pipe bar is not used for an agreement and a question that immediately follows it.
In fact, an agreement followed by a question does not constitute one utterance but two separate utterances. By contrast, an agreement immediately followed by an explanation of the agreement, a longer, narrative form of agreement, or a direct reference to what the agreement regards would require a pipe bar, so long as the prosody and lack of significant pauses warrant such usage. The use of a pipe bar indicates that segmenting an utterance is not necessary, even though the initial portion of an utterance, or last portion in the case of , has a different DA than the rest of the utterance.

The pipe bar is indicated in the appropriate location within the label as well as within the transcription. Within the label, the pipe bar separates the DAs. Within the transcript, the pipe bar separates the portions of an utterance to which the different DAs apply. The DA to the left of the pipe bar in the label pertains to the portion of the utterance to the left of the pipe bar in the transcript, and the DA to the right of the pipe bar in the label pertains to the portion of the utterance to the right of the pipe bar in the transcript. Example 1 demonstrates the correct usage of a pipe bar, whereas Example 2 and Example 3 depict the incorrect usage of a pipe bar.

Example 1: Bmr012
94.861-99.771  c4  fg|s^t  um - | everyone should have at least two forms possibly three in front of you depending on who you are .

Example 2: Bmr012
94.861-99.771  c4  s^t|fg  um - | everyone should have at least two forms possibly three in front of you depending on who you are .

Example 3: Bmr012
94.861-99.771  c4  fg|s^t  um - everyone | should have at least two forms possibly three in front of you depending on who you are .

3.4 Disruption Forms

Disruption forms are used to mark utterances that are indecipherable, abandoned, or interrupted. Only one disruption form may be used per utterance.
Disruption forms are included in a label in one of three formats, depending upon the nature of an utterance. When a DA is not detected, a disruption form alone may comprise an entire label. When used in conjunction with a DA, disruption forms are marked using either a period < . > or a pipe bar < | >.

If an utterance contains a disruption form and is too short to determine which DA applies to it, then only the disruption form is marked in the label. An utterance that is indecipherable may actually be quite lengthy, but because it cannot be deciphered, an appropriate DA cannot be assigned to it and only the disruption form is marked. Example 4 depicts a disrupted utterance which contains insufficient information to provide a DA:

Example 4: Bro014
1207.310-1207.880  c1  %-  but i- ==

Exceptions occasionally apply to short utterances deemed indecipherable. Utterances which appear to be backchannels, for instance, yet are indecipherable may be labeled with the appropriate DA along with a period and the applicable disruption form. Such treatment of indecipherable utterances is only employed when there is a high probability that the specific DA applies to the utterance based upon the surrounding context of the short utterance and the speaker's speech patterns. The following are two sample labels pertaining to short indecipherable utterances:

b.%  b.x

A period or a pipe bar is used in conjunction with a disruption form if a disruption form is indeed applicable to an utterance and if the utterance contains sufficient information to assign a DA to it.
For instance, if an utterance, such as a statement, is interrupted or abandoned, the DA is marked and then followed by a period and the appropriate disruption form, as seen in Example 5: Example 5: Bro014 495.681-499.134 c4 s.%-- some people are arguing that it would be better to have weights on == In the case of Example 5, the utterance contains sufficient information to determine that it is indeed a statement, despite being abandoned. If an utterance does not contain adequate information to decide which DA applies to it, then a DA is not marked. Two types of instances exist in which an utterance containing a pipe bar requires a disruption form. In the first, an utterance requiring a pipe bar, such as what is discussed in Section 3.3, is either abandoned or incomplete. To the left of the pipe bar is a DA containing a tag such as or and to the right is a statement or explanation of some sort that is either incomplete or abandoned. Note that the disruption form only applies to the DA to the right of the pipe bar. Keeping in mind that the portion of the utterance to the right of the pipe bar contains sufficient information to assign to it a DA and is also abandoned or incomplete, its DA is followed by a period and the appropriate disruption form, as seen in Example 6: Example 6: Bro014 1897.760-1904.500 c0 s^bk|s.%-- yeah | hopefully i think what we want to have is to put these features in s- some kind of == In the second instance in which an utterance containing a pipe bar requires a disruption form, the portion of the utterance to the right of the pipe bar does not contain sufficient information to assign to it a DA. This portion may be abandoned, interrupted, or indecipherable. The DA designated to the portion of the utterance to the left of the pipe bar clearly begins upon the onset of the utterance and ends at the point where the pipe bar is placed. 
The DA pertaining to the initial portion of the utterance is marked, a pipe bar is placed after the DA in the label and at the point where that particular DA ends in the transcript, and a disruption form is marked after the pipe bar, as seen in Example 7 and Example 8:

Example 7: Bmr028
1187.370-1188.240  c1  fg|%-  yeah | he ==

Example 8: Bro014
403.710-405.428  c2  s^aa|%--  yeah | it's uh ==

The distinction between the use of the pipe bar and a period lies in how an utterance can be divided. An utterance divided by a pipe bar behaves in some ways as two separate utterances. The segment of the utterance to the left of the pipe bar is annotated with a particular DA that is different from the DA used to annotate the right, provided that a DA can be assigned to the right. The pipe bar exists as a clear boundary which marks where one DA ends and another begins in a single utterance. The portion to the right of the pipe bar behaves as a separate utterance in that it alone is the specific segment which is interrupted, abandoned, or indecipherable. The portion to the left is complete.

With regard to periods, and even labels consisting solely of disruption forms, no clear and comparable boundary exists. The exact region within an utterance where the disruption form occurs does not behave as a separate segment of the utterance that can be marked clearly with a mechanism such as a pipe bar. It is also unnecessary to use a pipe bar to mark where an interruption begins or where a speaker abandons his utterance, since the DA to the left of the pipe bar may also apply to the other side where the disruption form is marked.

Additionally, the reasoning behind why a disruption form is not used as a tag within a DA is that the tags used within a DA apply primarily to the function of an entire utterance. Disruption forms, however, usually apply only to the end of the utterance.
For this reason, the use of periods with disruption forms is deemed necessary.

3.5 Quotes

Utterances that contain quoted material are to end with punctuation that reflects the DA of the utterance overall. If a quoted question is embedded within a statement, a period, rather than a question mark, is used at the end of the utterance in the transcript and no other punctuation is used.

A colon in the label signifies that there is quoted material in the transcription. The DA to the left of the colon characterizes the function of the entire utterance and the DA to the right of the colon characterizes only the quote. If the quoted material only consists of a few words, such as a noun phrase, DA annotation of the quotation is unnecessary. Example 9 demonstrates the manner in which quotes are handled:

Example 9: Bmr026
941.984-944.924  c1  s^cs:qw  and just say an e- - just ask him that you know wha- - what should you do .
945.464-947.864  c1  s:qy     and in my answer back was are you sure you just want one .

3.6 Using TableTrans (Annotation Interface)

A. The Interface

There are three sections of TableTrans: the labeling and transcription section located at the top, the time-segmented transcription located in the middle, and the waveform located at the bottom.

In the labeling and transcription section, the first and second columns on the left provide the start and end times for each utterance and the third column denotes the speaker or channel number. DA and adjacency pair (AP) labels are entered in the fourth and fifth columns. The comment field is located in the sixth column and is primarily for an annotator's notes regarding an utterance. The last column on the right, under the "Trans" heading, provides the transcript of the utterances.

In order to label a meeting, the "Open Annotation File" command must be selected from the "File" menu. A sub-menu will appear providing three formats that can be used. "Table Format" is the format that is most widely used.
A window will appear with a "Feature List" and a "Delimiter" to which clicking the "OK" button is necessary. Shortly after, the segment of the meeting to be annotated will appear. Although the data within the fourth, fifth, sixth, and seventh columns may be altered within the interface, the Time-Segmented section, which is the first two columns and shows the annotator a series of utterances in chronological order, and the third column denoting the speaker cannot be modified. B. TableTrans Commands COMMAND ACTION Changing the Transcript Ctrl-s Splits the current row at the location of the cursor in the TRANS field. Ctrl-m Merges the current row with the next row by the same speaker. Moving within a Field Ctrl-f or left-arrow Moves forward one character in a field. Ctrl-b or right arrow Moves backward one character in a field. Ctrl-p or up-arrow Moves up to previous row. Ctrl-n or down-arrow Moves down to next row. Shift + left-arrow Moves to previous field in the same row. Shift + right-arrow Moves to next field in the same row. Ctrl-1 (In the Time-Segmented Transcription window) Opens up Comment Field Window Plays a segment Ctrl-a Moves cursor to the beginning of a field Ctrl-e Moves cursor to the end of a field right-click 23 C. Printing Commands Annotators can print out their comments using the program "csvcomment." The command "csvcomment " is entered in the terminal window, where is the name of the ".csv" file to print. D. Playing the Sound File To open up the wave file of a meeting to be labeled, a link command can be made from the location where the sound file is saved in the annotator's home directory. After returning to the TableTrans interface, "Open Sound File" is selected from the "File" menu. The file can then be opened after browsing through the annotator's home directory. 
SECTION 4: ADJACENCY PAIRS

4.1 Purpose and Definition

Labeling adjacency pairs (APs) in meetings provides a means to extract the information provided by the interaction between speakers. Adjacency pairs reflect the structure of conversation and are paired utterances such as question-answer, greeting-greeting, offer-acceptance, and apology-downplay (Levinson 1983). APs are defined as sequences of two utterances that are:

1. produced by different speakers
2. ordered with a first part (marked with “a”) and a second part (marked with “b”) (Levinson 1983)

An example of an AP is shown below:

Example 1: Bro016
113.976-116.502  c4  s^bu  but you were looking at mel cepstrum .
116.883-117.850  c5  s^aa  yes .

In Example 1, the utterances depict direct interaction between the two speakers.

4.2 Labeling Adjacency Pairs

Adjacency pairs consist of two parts, where each part is produced by a different speaker. The basic form of an AP is a numerical value followed by the AP part ("a" or "b"). This format allows APs to be enumerated as: 1a, 1b, 2a, 2b, and so on. A different number is assigned for each AP, yet every AP will contain an "a" part and a "b" part. A labeled AP is seen in Example 2:

Example 2: Bmr023
312.382-314.770  c2  qy^rt  30a  are you implying that it's currently disorganized ?
314.770-318.470  c3  s^na   30b  in my mind .

Although APs are to be marked sequentially in ascending order, the numerical value of an AP may jump ahead of the numerical value of the previous AP by more than one (e.g., an AP has a numerical value of 5 and the following AP has a numerical value of 7 instead of 6). However, this is only permitted so long as the sequential order of the APs is preserved and the numerical values are not repeated or used cyclically for entirely different APs.
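The numbering constraints of Section 4.2 (ascending order, gaps allowed, no reuse of a number, and an "a" part paired with a "b" part) can be illustrated with a short check. This is a minimal sketch for basic-form APs only; the function name `check_ap_numbering` and the wording of the messages are hypothetical, and the labels are assumed to be supplied in transcript order.

```python
import re

# Basic form only: a number followed by the part letter, e.g. "30a".
_BASIC_AP = re.compile(r"(\d+)([ab])")


def check_ap_numbering(ap_labels):
    """Return a list of problems found in basic-form AP labels.

    ap_labels lists the labels in the order they occur in the transcript,
    e.g. ["30a", "30b", "31a", "31b"].
    """
    problems = []
    highest_opened = 0   # largest AP number seen so far on an "a" part
    opened = set()       # AP numbers whose "a" part has been seen
    answered = set()     # AP numbers whose "b" part has been seen
    for label in ap_labels:
        match = _BASIC_AP.fullmatch(label)
        if match is None:
            problems.append(f"{label}: not in basic AP form")
            continue
        number, part = int(match.group(1)), match.group(2)
        if part == "a":
            # Gaps are fine (5 then 7); repeats and backward jumps are not.
            if number <= highest_opened:
                problems.append(f"{label}: number repeated or out of sequence")
            highest_opened = max(highest_opened, number)
            opened.add(number)
        else:
            if number not in opened:
                problems.append(f"{label}: 'b' part without a preceding 'a' part")
            answered.add(number)
    # Every AP is expected to have both parts.
    for number in sorted(opened - answered):
        problems.append(f"{number}a: no 'b' part found")
    return problems
```

For instance, `check_ap_numbering(["5a", "5b", "7a", "7b"])` reports no problems, since skipping 6 preserves the ascending order.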
4.3 Labeling Conventions

Specific labeling conventions have been established for marking APs in instances in which an utterance contains multiple AP parts, an AP part consists of multiple utterances, multiple speakers pertain to the same AP part, and an AP is overlooked.

A. Multiple AP Parts per Utterance

If an utterance functions as a "b" part of one AP and an "a" part of another AP, then both APs are marked with a period < . > separating the two APs (e.g., 4b.5a). A portion of a conversation in which APs are labeled is seen in Example 3:

Example 3: Bro021
66.555-68.227  c2  s^rt        4a     well the first thing maybe is that the p- eurospeech paper is uh accepted .
69.904-70.928  c2  fh                 um ==
70.928-71.952  c2  fh                 yeah .
72.059-74.710  c5  qw^rt       4b.5a  this is - what - what do you uh - what's in the paper there ?
74.702-81.090  c2  s^rt        5b.6a  so it's the paper that describe basically the um system that were proposed for the aurora .
80.320-82.794  c5  qy^bu^d^rt  6b.7a  the one that we s- - we submitted the last round ?
82.614-83.700  c2  s^aa        7b.8a  right yeah .
83.110-83.750  c5  s^bk        8b     uhhuh .

Refer to Section E for details regarding the treatment of utterances requiring three AP parts.

B. Continued AP Parts

A continued AP part is an AP part consisting of multiple utterances by the same speaker. When a continued AP part arises, a plus sign <+> is placed at the end of the AP for each additional utterance (e.g., 20b+, 20b++). Example 4 depicts an instance where an AP part consists of multiple utterances:

Example 4: Bro016
1494.110-1499.560  c1  qy^rt      20a     do you have something simple in mind for - i mean vocal tract length normalization ?
1497.570-1501.320  c5  s^ar|s^nd  20b     uh no | i hadn't - i hadn't thought - it was - thought too much about it really .
1501.320-1503.200  c5  s^df^nd    20b+    it just - something that popped into my head just now .
1503.200-1505.070  c5  s.%--      20b++   and so i - i ==
1505.690-1509.900  c5  s^cs       20b+++  i mean you could maybe use the ideas - a similar idea to what they do in vocal tract length normalization .
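AP fields such as 4b.5a and 20b++ in the examples above can be broken into their components mechanically. The following is a hedged sketch covering only conventions A and B (period-separated parts and plus signs); the names `APPart` and `parse_ap_field` are hypothetical, and the multi-speaker and overlooked-AP forms described elsewhere in Section 4.3 are intentionally not handled.

```python
import re
from typing import List, NamedTuple


class APPart(NamedTuple):
    number: int        # numerical value of the AP
    part: str          # "a" (first part) or "b" (second part)
    continuation: int  # number of plus signs; 0 for the first utterance


# A number, a part letter, then zero or more plus signs, e.g. "20b++".
_PART = re.compile(r"(\d+)([ab])(\+*)")


def parse_ap_field(field: str) -> List[APPart]:
    """Split an AP field on periods and parse each AP part."""
    parts = []
    for piece in field.split("."):
        match = _PART.fullmatch(piece)
        if match is None:
            raise ValueError(f"unrecognized AP part: {piece!r}")
        parts.append(
            APPart(int(match.group(1)), match.group(2), len(match.group(3)))
        )
    return parts
```

Under these assumptions, `parse_ap_field("4b.5a")` yields a "b" part of AP 4 followed by an "a" part of AP 5, each with a continuation count of zero.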
Additionally, an utterance consisting of a tag question is included within an AP part, assuming the utterance containing the statement preceding it is a portion of the AP part. In that case, the utterance containing the tag question receives the appropriate number of plus signs when labeled with an AP.

If an utterance contains multiple APs, where one or both is a continued AP part, a period < . > is inserted between the two APs to separate them (e.g., 5b++.6a+).

C. Multiple Speakers per AP Part

In some cases, an AP part consists of utterances by two or more speakers. This occurs most often with the "b" part and quite rarely with the "a" part. When such an occurrence arises, the corresponding AP number and AP part are marked. Then each speaker contributing to the same AP part receives a numerical value based upon the order in which the speakers make their utterances: the first speaker to contribute to an AP part receives a value of 1, the second a 2, and so on. A hyphen <-> followed by the speaker's numerical value is then appended to the AP (e.g., 9b-1). AP parts containing multiple speakers are seen in Example 5:

Example 5: Btr001
150.780-152.664  c5  s^bu  9a    parentheses meaning uncertainty .
151.730-152.365  c3  s^aa  9b-1  yeah .
152.467-153.164  c2  s^aa  9b-2  uhhuh .

If, for instance, the speaker designated as c2 in Example 5 continued speaking so that a continued AP part resulted, then his next utterance would be labeled as 9b-2+, the next 9b-2++, and so on as necessary. When continued AP parts occur within AP parts consisting of multiple speakers, each speaker retains his designated numerical value and plus signs <+> are appended after the numerical values as necessary.

Additionally, if an utterance contains multiple APs, where one or both is an AP part consisting of multiple speakers, a period < . > is inserted between the two APs to separate them (e.g., 5b-1.6a+, 1b-3+.2a).

D.
Handling Overlooked APs

As stated in Section 4.2, APs are to be marked sequentially in ascending order. Occasionally, an AP is overlooked. If marking an overlooked AP with the next numerical value in sequence results in a non-sequential ordering of APs, then an additional convention is implemented to handle the overlooked AP.

For instance, if a meeting is labeled with APs in sequence starting with a numerical value of 1 and ending with a value of 50 and an overlooked AP exists between an AP with a numerical value of 34 and an AP with a numerical value of 35, the overlooked AP is not to receive a numerical value of 51. Instead, the AP receives a numerical value of 34 followed by an underscore <_> and the appropriate AP part. The AP part is followed by a hyphen with a numerical value and plus signs when necessary. An overlooked AP located between two APs has the following format:

_ [ - ][+1, +2, …+n]

If a number of overlooked APs exist in sequence, for instance if three APs exist between APs 34 and 35, then a slight modification of the above convention is necessary. The first overlooked AP receives an AP in the format detailed above. The second overlooked AP receives an AP in the same format but with two underscore <_> symbols instead of one. The third overlooked AP receives an AP in the same format but with three underscore symbols, and so on, thus yielding the following format:

_1, _2, …_n [ - ][+1, +2, …+n]

E. Labeled Meeting Sample

Example 6 depicts the labeling conventions discussed in Sections A through C. What is particularly notable about this example is that it contains an utterance requiring two “a” parts. This utterance requires a total of three AP parts (two “a” parts and one “b” part), whereas utterances usually require at most two.
Example 6: Bmr003
1594.720-1595.830  c3  qy^d       47b.48a        you've already - you've already done some ?
1595.360-1596.610  c2  s.%-       48b-1          she - she's done one – she's one ==
1595.400-1595.950  c4  s^aa|s^na  48b-2          yes | i have .
1595.570-1597.070  c0  s^na       48b-3.49a.50a  she's - she's done about half a meeting .
1596.530-1597.570  c3  s^bk       49b-1          oh- - oh i see .
1597.130-1597.510  c2  s^bk       49b-2          right .
1597.570-1597.840  c3  s^bk       49b-1+         o_k .
1597.840-1598.100  c3  s^ba       49b-1++        good .
1597.760-1597.990  c2  s^bk       49b-2+         right .
1598.170-1598.360  c0  qy^d^g^rt  50a+           right ?
1598.580-1598.950  c2  s.%-                      i'm go- ==
1598.580-1598.980  c0  qy^d^rt    50a++          about half ?
1599.150-1600.160  c4  s^no       50b.51a        s- - i'm not sure if it's that's much .

The utterance labeled 48b-3.49a.50a requires a “b” part as it contains the response to an earlier utterance, which constitutes the “a” part of the AP with a numerical value of 48. The “a” part of the AP with a numerical value of 49 only consists of one utterance and receives a number of responses. The utterance requires another “a” part for the AP with a numerical value of 50, as this utterance, along with the speaker’s following two utterances, comprises the “a” part for yet another AP.

F. Complex Form of an AP

The following is a complex form of an AP, taking into account the aforementioned conventions:

[ _1 , _2 , …_n ] [- ][ +1, +2, …+n ][ . >[ _1 , _2 , …_n ] … ]

4.4 Restrictions on Using Adjacency Pairs

Certain restrictions apply to which tags can or cannot be labeled with an AP. APs denote direct interaction between speakers. Backchannels <b>, which serve simply to encourage the current speaker, are never marked with APs. Backchannels are not uttered directly to a speaker as a response and do not function in a way that elicits a response either. Rhetorical question backchannels <bh> receive APs when uttered as acknowledgments and do not receive APs when uttered as backchannels.
Floor holders <fh> and floor grabbers <fg> are also never marked with APs, since they, like backchannels, are not said directly to anyone. Holds <h>, however, are marked. The definition of a hold entails that a speaker is given the floor and is expected to speak in response to something and "holds-off" prior to making an utterance. As the speaker is expected to speak and then utters a hold, which is usually followed by a response, the hold is considered part of the response. Mimics <m> and collaborative completions <2> are always marked with APs, as they are always in direct reference to another speaker's utterance.

When indecipherable utterances appear, if the utterance can be characterized with a DA and it appears as though the utterance functions within an AP, then an AP is marked accordingly. Otherwise, no AP is marked.

In some cases, it is quite difficult to determine to which utterance a response refers. If such difficulty arises, then an AP is not marked. For instance, a scenario may arise where two or three speakers utter statements simultaneously and another speaker utters an acknowledgment. As an acknowledgment by one speaker to another speaker is usually marked with an AP, if it cannot be determined whom a speaker is acknowledging, then an AP is not marked.

SECTION 5: TAG DESCRIPTIONS

5.1 Preliminaries

This section provides a detailed description of each tag and the rules governing the usage of each tag. The tags are categorized into thirteen groups according to syntactic, semantic, pragmatic, and functional similarities of the utterances they mark. Beneath a group heading will be a general description of the group along with explanations of the tags within the group. Most tag descriptions will contain examples [4] from data to further elucidate a tag's usage. With regard to the examples provided within this section, it is of much use to listen to the corresponding audio portions, as some examples cannot be fully comprehended otherwise.
In particular, utterances marked as floor grabbers <fg>, floor holders <fh>, holds <h>, backchannels <b>, acknowledgments <bk>, and accepts <aa> share a common vocabulary, which renders text examples of these tags insufficient to fully communicate how utterances marked as such are identified.

5.2 Group 1: Statements

This group contains only one tag, <s>, which serves as the default general tag.

Statement <s>

The <s> tag is the most widely used tag in the MRDA tagset. Unless an utterance is completely indecipherable or can be further described by a general tag as being a type of question, backchannel, floor grabber, floor holder, or hold, its default status as a statement remains. When necessary, specific tags are appended to the <s> tag to further characterize utterances. The use of the <s> tag is seen in Example 1 through Example 4:

Example 1: Bro004
578.567-585.527  c3  s  if we exclude english um - there is not much difference with the data .

Example 2: Bed016
70.600-71.470  c5  s^ba  it's a great story .

Example 3: Bro021
3201.960-3204.850  c1  s^bu  so this changes the whole mapping for every utterance .

Example 4: Bro021
3204.850-3205.490  c1  s^bk  okay .

[4] In some examples, when displaying surrounding context, unnecessary lines, such as those which are irrelevant to characterizing a particular tag within the tag descriptions, may be edited out. The content of utterances within the examples remains unchanged.

5.3 Group 2: Questions

This group contains all general tags pertaining to questions. The tag description for elaborations <e> provides instructions regarding the treatment of questions followed by elaborations.

Y/N Question <qy>

This tag marks utterances in the form of yes/no questions if and only if they have the pragmatic force along with the syntactic and prosodic indications of a yes/no question (i.e., subject-auxiliary inversion or question intonation). Essentially, an utterance is considered a yes/no question if it sounds as if it elicits a yes or no answer.
This is not to say that all yes/no questions will receive yes or no answers. A question may be asked in a yes/no manner, but the response it receives may not be a simple yes or no. Regardless of the answer, the utterance is still considered a yes/no question. Basic yes/no questions are seen in Example 5 through Example 8:

Example 5: Bro016
58.863-61.782  c4  qy^rt  do you think that would be the case for next week also ?

Example 6: Bmr027
2049.340-2051.730  c5  qy^rt  did i say that ?

Example 7: Bmr027
1836.000-1838.580  c4  qy^bu^rt  didn't they want to do language modeling on you know recognition compatible transcripts ?

Example 8: Bmr012
6.805-17.875  c1  qy^rt  is this channel one ?

The <qy> tag is also used as the general tag for tag questions (e.g., "Yeah?", "Isn't it?", etc.) and rhetorical question backchannels (e.g., "Really?", "Isn't that interesting?", etc.). Many declarative questions are also in the form of yes/no questions. Example 9 through Example 11 exhibit these characteristics:

Example 9: Bro016
513.765-514.316  c4  qy^d^g^rt  right ?

Example 10: Bmr027
2016.230-2017.440  c5  qy^bh  oh really ?

Example 11: Bmr027
514.316-514.867  c4  qy^bu^d^rt  the insertion number is quite high ?

Additionally, a convention has been established for handling instances when a yes/no question is followed by an elaboration which requires its own line. In such cases, the elaboration could be considered a declarative yes/no question <qy^d>. Instead, the elaboration receives a DA of <s^e>, along with any other necessary specific tags. An instance of a yes/no question followed by an elaboration is seen in Example 12:

Example 12: Bro021
316.709-319.202  c5  qy^rt    wasn't there some experiment you were going to try ?
319.202-325.216  c5  s^e.%--  where you did something differently for each um uh - i don't know whether it was each mel band or each uh um f f t bin or someth- ==

In some cases, it may be difficult to determine whether an utterance is a yes/no question or an "or" question.
The tag description for details how to distinguish between the two tags in certain scenarios.

Wh-Question <qw>

Wh-questions are questions that require a specific answer. These usually contain "wh" words such as the following: what, which, where, when, who, why, or how. However, not all questions containing a "wh" word are considered wh-questions. The section on open-ended questions elucidates this point. Wh-questions are shown in Example 13 and Example 14:

Example 13: Bmr012
62.153-64.053  c3  qw^r^t3  why didn't you get the same results and the unadapted ?

Example 14: Bmr012
231.944-233.704  c2  qw^t3  i guess - what time do we have to leave ?

Declarative wh-questions often appear as wh-questions prior to wh-movement. An instance in which a declarative wh-question is used is seen in Example 15.

Example 15: Bed003
2889.130-2890.200  c1  qw       what's the technical term ?
2890.330-2890.750  c3  qw^d^rt  for which ?
2891.010-2892.820  c1  s^rt     for the uh - nodes that are observable .

In some cases, utterances that do not contain wh-words are labeled as wh-questions because they function as wh-questions. Such an instance is seen in Example 16:

Example 16: Bmr012
61.563-61.713  c0  qw^br^t3  hm ?

In Example 16, the utterance functions as a wh-question, in that "hm?" is akin to "what?" as a request for repetition. "Huh?", "excuse me?", and "pardon?" also appear as wh-questions in that they can function in the same manner as the utterance in Example 16. Caution must be taken to distinguish whether such utterances are indeed wh-questions or if they are floor grabbers, floor holders, holds, backchannels, yes/no questions that are rhetorical question backchannels, or acknowledgments.

Declarative wh-questions that do not contain "wh" words are often confused with declarative forms of other questions because they appear the same syntactically. Despite this syntactic similarity, they differ functionally based upon the response that the question seeks.
In determining whether an utterance is a declarative wh-question that does not contain a "wh" word, it is crucial to note the surrounding context, in particular the response the question generates. Most often, declarative wh-questions that do not contain "wh" words are requests for repetition, such as those seen in Example 17 through Example 19.

Example 17: Bmr031
947.610-948.925  c8  s         it's still yeah two or three d v ds .
948.925-950.240  c8  %         but ==
949.569-951.874  c2  fg|s      yeah | not if you have to distribute the video also .
949.941-950.878  c5  qw^br^d   two or three ?
951.125-953.860  c8  s^df      if you use both sides and the two layer and all that .

Example 18: Bro003
3193.230-3198.820  c2  fh|s^cc      and um | for the broader class nets we're - we're going to increase that .
3198.820-3204.400  c2  s^df         because the um the digits nets only correspond to about twenty phonemes .
3205.460-3208.780  c2  fh           so .
3207.200-3207.840  c8  qw^br^d^rt   broader class ?
3208.780-3210.430  c2  h|s          um | the broader - broader training corpus nets .

Example 19: Bro003
3400.840-3402.950  c8  qw^br^d^rt   and - and you're saying about the spanish ?
3403.290-3404.350  c4  s            the spanish labels .
3405.000-3409.590  c4  s            that was in different format .

Or Question

"Or" questions offer the listener at least two answers or options from which to choose. Section 2 and Section 3.3, which deal with segmentation and multiple DAs within an utterance, are quite helpful in determining whether a question is actually an "or" question or a yes/no question followed by an "or" clause. Selected "or" questions can be seen in Example 20 through Example 23:

Example 20: Bmr001
305.466-307.826    c0  qr^rt  are we going to - i mean - is it going to be over there or is it going to be in there ?

Example 21: Bed003
1214.120-1215.140  c4  qr     are you assuming that or not ?

Example 22: Bmr001
339.042-342.612    c1  qr^rt  do we have like a cabinet on order or do we just need to do that ?

Example 23: Bmr007
165.987-167.447    cB  qr     is this the same as the e mail or different ?

In terms of the responses "or" questions receive, the obvious response is one in which a speaker selects one of the options posed within the "or" question. Sometimes the "or" question is interrupted and answered as if it is a yes/no question. In these cases, the question is marked as an "or" question if it seems as if the speaker would have continued the question in an "or" question format if he had not been interrupted. In other instances, the speaker asking the question might abandon his utterance, and the speaker answering the question may respond as if the question were a yes/no question without having interrupted the question at all.

If a speaker abandons a question that is seemingly an "or" question, determining whether the question is indeed an "or" question is a rather cumbersome task. The point where the speaker abandons his question is of crucial importance. If the speaker abandons while posing at least a second option or after having posed at least two options, the question can be considered an "or" question. If the speaker abandons after saying the word "or" and has not issued a second option, the question could either be an abandoned "or" question or a yes/no question followed by an "or" clause, as mentioned above. If the speaker abandons at the word "or" abruptly, the utterance is most likely an "or" question. If the speaker trails off at the word "or" so that the word "or" is lengthened and sounds reminiscent of a floor holder, the "or" is segmented from the utterance or else separated by a pipe bar and is labeled as an abandoned "or" clause after a yes/no question, and the remainder of the utterance is labeled as a yes/no question.
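The treatment of abandoned "or" questions described above can be summarized as a small decision procedure. The following Python sketch is illustrative only and is not part of any labeling tool: the class AbandonedQuestion, its fields, and the function label_abandoned_or are hypothetical names introduced here, the tag strings qr, qy, and qrr are taken from the labels shown in the surrounding examples, and disruption forms are deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AbandonedQuestion:
    """Hypothetical summary of what was heard before the speaker abandoned."""
    complete_options: int        # options fully posed before the abandonment
    second_option_started: bool  # speaker had begun posing a second option
    said_or: bool                # the word "or" was uttered
    or_trailed_off: bool         # "or" was lengthened, reminiscent of a floor holder


def label_abandoned_or(q: AbandonedQuestion) -> Optional[List[str]]:
    """Return the general tag(s) for the segment(s), or None if this
    paragraph of the guide does not decide the case."""
    # Two options posed, or abandonment while posing a second option.
    if q.complete_options >= 2 or q.second_option_started:
        return ["qr"]
    # Trailing "or": split it off; the remainder is a yes/no question.
    if q.said_or and q.or_trailed_off:
        return ["qy", "qrr"]
    # Abrupt stop at "or": most likely an "or" question.
    if q.said_or:
        return ["qr"]
    return None


if __name__ == "__main__":
    trailing = AbandonedQuestion(1, False, True, True)
    print(label_abandoned_or(trailing))
```

The sketch returns a list because the trailing "or" case yields two segments, each with its own label. The "most likely" wording in the guide means the abrupt case remains a judgment call that depends on the audio, so the returned tag is a default rather than a rule.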
Example 24 through Example 31 depict instances of interrupted and abandoned "or" questions:

Example 24: Bed011
2776.460-2779.490  c1  qr.%-       is that roughly the equivalent of - of what i've seen in english or is it ?==

Example 25: Bmr005
2018.090-2023.710  c5  qr.%-       you know - did she miss some overlaps or did she ?==

Example 26: Bmr007
369.570-372.515    cB  qr.%-       is this uh just raw counts or is it ==?

Example 27: Bmr013
1987.000-1989.000  c2  qr.%--      well - oh wa- - in terms of the speakers or the conditions or the ?==

Example 28: Bmr013
2064.000-2069.000  c1  qr^rt.%--   do the transcribers actually start wi- with uh - transcribing new meetings or are they ?==

Example 29: Bmr014
582.763-585.270    c8  qr.%--      has that started or is that ?==

Example 30: Bmr001
944.512-945.412    c8  qr^rt.%--   per channel or ?==

Example 31: Bmr009
1748.000-1751.000  c2  qr.%--      and north midland like like - uh illinois or ?==

If an utterance is suspected to be an "or" question but the speaker abandons or is interrupted before saying "or" and has not posed a second option, the utterance cannot be considered an "or" question, since there is insufficient evidence to label it as such.

Furthermore, even with the presence of the word "or" along with a second option, it may be difficult to determine whether an utterance is an "or" question or a yes/no question, wh-question, or an open-ended question. If the question is actually presenting two specific options, the question is an "or" question. The question is not an "or" question if it presents one option and ends with a clause such as "or something." If a question ends with such a clause, the clause is not labeled separately with the tag <qrr>. Example 32 through Example 34 show instances in which questions that are seemingly "or" questions are to be labeled otherwise:

Example 32: Bmr005
3550.080-3551.680  c2  qy^d^rt^2  lapel mikes or something ?

Example 33: Bmr006
2057.610-2061.670  c0  qw         what if there was a door slam or something ?

Example 34: Bmr010
425.800-429.800    c6  qy         is there a - a transformation uh - like principal components transformation or something ?

Or Clause After Y/N Question

This tag marks when a speaker adds an "or" clause to a yes/no question. The previous description of "or" questions, in conjunction with Section 2 and Section 3.3, which deal with segmentation and multiple DAs within an utterance, is also quite useful in determining whether a segment is an "or" clause and how to treat it.

As with the description of the tag <qr>, utterances marked with <qrr> must actually be posing some sort of option, rather than being a wh-question, for instance, preceded by the word "or." Oftentimes, "or" clauses following yes/no questions are abandoned or else interrupted and the entire utterance consists of the word "or." In these cases, the label for such an utterance contains the tag <qrr> along with the appropriate disruption form. Example 35 through Example 39 display in context instances where the tag <qrr> is used:

Example 35: Bed003
1867.670-1868.970  c1  qy^rt     do you have the true source files ?
1868.970-1870.270  c1  qrr       or just the class ?

Example 36: Bmr018
405.920-411.860    c1  qy^rt     the - i guess the question on my mind is do we wait for the transcribers to adjust the marks for the whole meeting before we give anything to i b m ?
411.860-413.440    c1  qrr       or do we go ahead and send them a sample ?

Example 37: Bmr001
2178.450-2179.950  c0  qy^d^rt   so - is it - it's going to disk ?
2179.950-2180.340  c0  qrr.%--   or is this ?==

Example 38: Bmr018
2722.490-2727.000  c1  qr        did they ever try going - going the other direction from simpler task to more complicated tasks ?
2727.000-2728.000  c1  qrr.%--   or ?==

Example 39: Bro004
1922.810-1928.020  c1  qy        so do you - are you - w- - did you have something going on - on the side with uh - or on - on this ?
1928.020-1928.130  c1  qrr.%-    or ?==

Open-ended Question

An open-ended question places few syntactic or semantic constraints on the form of the answer it elicits. A question containing a "wh" word and consequently appearing to be a wh-question may actually be an open-ended question instead. Additionally, a question that is seemingly a yes/no question or an "or" question may actually be an open-ended question. Whereas a wh-question, a yes/no question, and an "or" question require a specific answer, an open-ended question, as its name suggests, does not seek a specific answer at all. Rather, an open-ended question is asked in a broad sense. Open-ended questions are seen in Example 40 through Example 48:

Example 40: Bmr007
112.365-116.868    c3  fh|qo^d^rt  um | and anything else ?

Example 41: Bmr007
117.088-118.018    c3  qo^d        nothing else ?

Example 42: Bmr007
92.862-98.798      c3  fh|qo^d^rt  um | and anything else anyone wants to talk about ?

Example 43: Bmr013
654.000-657.000    c3  qo^rt       d- e- - anybody do you have any anybody have any opinion about that ?

Example 44: Bmr026
2307.190-2309.690  c5  qo          anybody have any intuitions or suggestions ?

Example 45: Bmr007
1681.390-1683.180  c3  fg|qo       but - | what - what do you think about that ?

Example 46: Bmr014
2691.750-2693.090  c4  qo^j        how about them energy crises ?

Example 47: Bmr007
100.580-102.340    c0  qo^t        what about the um - your trip yesterday ?

Example 48: Bed006
666.870-667.530    c2  qo^d        questions ?

Rhetorical Question

The tag <qh> marks questions to which no answer is expected. Such questions are used by the speaker for rhetorical effect; they are essentially statements formulated as questions. Although rhetorical questions and rhetorical question backchannels are similar, a rhetorical question backchannel lacks semantic content, functions mostly as a continuer, and is not used by a speaker who has the floor. Rhetorical questions are seen in Example 49 through Example 55:

Example 49: Bed011
2204.540-2206.420  c2  qh^rt  i mean is this realistic ?

Example 50: Bmr005
3802.380-3802.680  c4  qh^aa  why not ?

Example 51: Bmr005
525.596-530.188    c4  qh^cs  so why don't you - you start with that ?

Example 52: Bmr009
2089.900-2090.800  c3  qh     s- - i mean who cares ?

Example 53: Bmr009
2512.610-2513.290  c1  qh^ba  isn't that wonderful ?

Example 54: Bmr009
2778.960-2779.800  c0  qh^co  why don't you read the digits ?

Example 55: Bmr012
1414.430-1415.430  c1  fh|qh  uh - | but who knows ?

5.4 Group 3: Floor Mechanisms

This group contains all general tags pertaining to mechanisms of grabbing or maintaining the floor. The only disruption forms that can be appended to tags within this section are the indecipherable tag <%> and the nonspeech tag <x>. Additionally, no specific tag may be appended to the tags denoted as floor mechanisms. Section 2 and Section 3.3 detail the issues regarding segmentation with floor mechanisms.

Floor Grabber

Floor grabbers usually mark instances in which a speaker has not been speaking and wants to gain the floor so that he may commence speaking. They are often repeated by the speaker to gain attention and are used by speakers to interrupt the current speaker who has the floor. Most often, floor grabbers tend to occur at the beginning of a speaker's turn.

In some cases, none of the speakers will have the floor, resulting in multiple speakers vying for the floor and consequently using floor grabbers to attain it. During such occurrences, many speakers talk over one another without actually having the floor.

Floor grabbers are also used to mark instances in which a speaker who has the floor begins losing energy during his turn and then uses a floor grabber either to regain the attention of his audience or because it seems as though he is relinquishing the floor, which he does not wish to do. Such mid-speech floor grabbers are usually followed by a change in topic.

Floor grabbers are generally louder than the surrounding speech.
Although the energy of a floor grabber is relative to the energy of the surrounding speech, it is also relative to the energy of a speaker's normal speech.

Common floor grabbers include, but are not limited to, the following: "well," "and," "but," "so," "um," "uh," "I mean," "okay," and "yeah." It is worth mentioning that the identification of floor grabbers is not based purely on the vocabulary used, but rather on the speaker's actual attempt, whether successful or not, to gain the floor.

As previously mentioned, floor grabbers are not to be identified solely based upon the vocabulary used, as floor grabbers, floor holders, holds, backchannels, acknowledgements, and accepts share a very similar vocabulary. In order to properly distinguish whether an utterance is performing as a floor grabber, floor holder, hold, backchannel, acknowledgement, or accept, it is necessary to take into account the details provided within the individual tag descriptions and to listen to the audio portions corresponding to the examples within those tag descriptions. Utterances labeled with these tags tend to appear very similar in text yet emerge exceedingly different in sound.

Although floor grabbers and backchannels are often confused because they share a similar vocabulary, they are quite distinct in sound. The main distinctions between the two are that backchannels have a lower energy level in relation to the surrounding speech and are not used by someone who has or is attempting to gain the floor. Also, backchannels are considered "background" speech.

The floor grabbers seen in Example 56 through Example 60 are shown merely to illustrate how they appear in text. The surrounding context has been omitted for each example, as it provides little to no information regarding how to identify floor grabbers.

Example 56: Bed004
1017.990-1018.180  c4  fg        but uh ==

Example 57: Bed004
1052.310-1052.620  c2  fg        okay .

Example 58: Bed004
2264.780-2265.060  c2  fg        yeah but ==

Example 59: Bmr012
1814.65-1817.01    c2  fg|s.%-   well | or also for you know - if people are not ==

Example 60: Bmr012
1822.12-1824.17    c4  fg|qy^df  well i mean - | is the - is the handheld really any better ?

Floor Holder

A floor holder is produced mid-speech by a speaker who has the floor. A floor holder is usually an utterance such as "uh" or "so" and is used as a means to pause and continue holding the floor. In some cases, a speaker will utter a floor holder at the end of his turn as a means to relinquish the floor.

The duration of a floor holder is usually longer than that of the other words spoken by a speaker. Also, the energy of a floor holder is often similar to that of the surrounding speech by the same speaker.

Common floor holders include, but are not limited to, the following: "so," "and," "or," "um," "uh," "let's see," "well," "and what else," "anyway," "I mean," "okay," and "yeah."

In terms of placement, floor holders do not occur at the beginning of a speaker's turn, but rather occur throughout the middle and at the end (see footnote 5) of a speaker's turn. Although floor holders do not occur at the beginning of a speaker's turn or speech, they may occur at the beginning of a speaker's utterance. If a speaker begins his turn with a floor grabber followed by a floor holder, it is permissible to label the suspected floor holder as such. Section 2 discusses the treatment of floor holders in succession.

Floor holders are often found mid-utterance. In such cases, if an utterance is complete and splitting it to mark the floor holder would yield an incomplete utterance, the utterance remains intact and the floor holder is not marked.
In some cases, an utterance will end with a typical floor-holding word such as "um" or "uh" and, despite the presence of a common floor-holding word, a floor holder is not actually present, since the floor-holding word lacks the duration or "pause" property common to most floor holders. If this occurs, the utterance, while containing the floor-holding word, is simply marked as incomplete and the floor-holding word is not marked as an actual floor holder.

As previously mentioned, floor holders are not to be identified solely based upon the vocabulary used, as floor holders, floor grabbers, holds, backchannels, acknowledgements, and accepts share a very similar vocabulary. In order to properly distinguish whether an utterance is performing as a floor holder, floor grabber, hold, backchannel, acknowledgement, or accept, it is necessary to take into account the details provided within the individual tag descriptions and to listen to the audio portions corresponding to the examples within those tag descriptions. Utterances labeled with these tags tend to appear very similar in text yet emerge exceedingly different in sound.

Footnote 5: As mentioned in Section 2, floor holders are not permitted to occur at the end of utterances. The treatment of floor holders within the transcript is discussed in Section 2 and Section 3.3.

Example 61 through Example 65 present floor holders in context:

Example 61: Bed003
2524.030-2526.510  c1  s           so it's a - it's a rather huge huge thing .
2526.510-2531.970  c1  fh|s.%--    but um - um - | we can sort of ==

Example 62: Bed003
2579.930-2581.760  c4  s           like all the different sort of general schemas that they might be following .
2581.760-2583.600  c4  fh          okay .

Example 63: Bed004
1336.010-1339.280  c2  s           i think we got plenty of stuff to talk about .
1340.180-1344.840  c2  fh|s        and then um - | just see how a discussion goes .

Example 64: Bed004
1596.700-1598.000  c2  s^arp       no i understand that .
1598.000-1599.540  c2  fh          but i- - but um ==

Example 65: Bed004
1672.310-1673.880  c2  fg          okay so so ==
1673.880-1675.440  c2  fh          uh ==

Hold

The tag <h> is used when a speaker who is given the floor and is expected to speak "holds off" prior to making an utterance. The tag <h> is predominantly used when a speaker is responding to a question that he in particular was asked, and that speaker pauses or "holds off" prior to answering the question.

Common holds include, but are not limited to, the following: "so," "um," "uh," "let's see," "well," "I mean," "okay," and "yeah."

Holds are very similar to floor holders in the way that they sound; however, holds occur at the beginning of a speaker's turn, as opposed to floor holders, which occur in the middle or at the end of a speaker's turn.

Although the primary distinction between holds and floor holders is location, holds are not collapsed with floor holders, as they provide explicit information regarding a speaker's turn. Utterances marked as holds explicitly indicate that a speaker is given the floor, whereas utterances marked as floor holders indicate that a speaker merely has the floor.

If a speaker's initial utterance is marked as a hold and his following utterances appear to be either holds or floor holders, those following utterances are marked as holds. In other words, if a speaker's initial utterance is a hold and his following utterances are seemingly floor holders, those utterances appearing as floor holders are marked as holds until an utterance is encountered that is to be marked with a question tag or with the statement tag. After such a question or statement is encountered, any following segment within that same speaker's speech that appears to be a floor holder is marked as a floor holder and not as a hold.

As previously mentioned, holds are not to be identified solely based upon the vocabulary used, as holds, floor grabbers, floor holders, backchannels, acknowledgements, and accepts share a very similar vocabulary.
In order to properly distinguish whether an utterance is performing as a hold, floor grabber, floor holder, backchannel, acknowledgement, or accept, it is necessary to take into account the details provided within the individual tag descriptions and to listen to the audio portions corresponding to the examples within those tag descriptions.

Example 66 (see footnote 6) through Example 68 present instances of holds in context:

Example 66: Bro021
817.043-821.220    c1  qw          i mean what was the rest of the system ?
820.060-821.922    c2  h           um ==
823.605-827.084    c2  s           yeah it was - it was uh the same system .
828.960-829.683    c2  fh          uhhuh .
830.079-831.107    c2  s^r         it was the same system .
838.050-839.197    c2  fh          huh ==

Example 67: Bro021
3238.590-3243.580  c1  qy^d^rt     so you estimated uh f- completely forgetting what you had before ?
3244.200-3248.840  c4  h           um ==
3248.840-3251.170  c4  s^ar|s^nd   no no no | it's not completely noise .

Example 68: Bro018
1542.550-1546.120  c5  qy^rt       does there some kind of a distance metric that they use ?
1546.120-1549.520  c5  qw          or how do they for cla- - what do they do for classification ?
1550.050-1550.740  c0  h           um ==
1550.740-1551.150  c0  h           right .
1551.150-1559.900  c0  s           so the - the simple idea behind a support vector machine is um - you have - you have this feature space .

Footnote 6: In Example 66, the word “uhhuh” is used as a floor holder. Although the word “uhhuh” is not commonly used as a floor holder, this instance exemplifies the need to listen to corresponding audio portions in order to correctly assess the function of an utterance and not to label utterances according to the vocabulary used alone.

5.5 Group 4: Backchannels and Acknowledgments

This group contains the general tag for backchannels and the specific tags for acknowledgments, assessments/appreciations, and rhetorical question backchannels.
The commonality among the tags of this group is that they are most often used to mark utterances that are responses, in the form of acknowledgments or backchannels, to a speaker who has the floor as that speaker is talking. Such responses generally do not elicit feedback. Also, utterances marked with these tags generally do not serve the purpose of halting the speaker who has the floor.

Although it may seem as though the tags and could be grouped with the tags in Group 5, since they are responses of a sort, they are instead placed in Group 4 due to the nature of the utterances they mark. The tags in Group 5 are limited to being orthogonally categorized as positive, negative, or uncertain. Utterances marked with are perceived as being neutral, whereas utterances marked with can be either positive or negative. Thus the tag is not included within Group 5, as its dynamic nature would prevent the preservation of the orthogonal categorization scheme within Group 5. Additionally, utterances marked with the tag generally tend to have more in common with utterances marked with the tag than with the tags in Group 5. These similarities are discussed in the tag description for .

Backchannel

Utterances which function as backchannels are not made by the speaker who has the floor. Instead, backchannels are utterances made in the background that simply indicate that a listener is following along or at least is giving the impression that he is paying attention. When uttering backchannels, a speaker is not speaking directly to anyone in particular or even to anyone at all.

Common backchannels include the following: "uhhuh," "okay," "right," "oh," "yes," "yeah," "oh yeah," "uh yeah," "huh," "sure," and "hm."
The nature of backchannels does not usually permit utterances such as "uh," "um," and "well" to be perceived as backchannels, since these utterances do not indicate that a speaker is following along, but rather that a speaker has something to say or else is attempting to say something.

As previously mentioned, backchannels are not to be identified solely based upon the vocabulary used, as backchannels, floor grabbers, floor holders, holds, acknowledgements, and accepts share a very similar vocabulary. In order to properly distinguish whether an utterance is performing as a floor grabber, floor holder, hold, backchannel, acknowledgement, or accept, it is necessary to take into account the details provided within the individual tag descriptions and to listen to the audio portions corresponding to the examples within those tag descriptions. Utterances labeled with these tags tend to appear very similar in text yet emerge exceedingly different in sound.

Furthermore, backchannels are more often confused with acknowledgments and accepts than with floor grabbers, floor holders, and holds. One method of distinguishing whether the backchannel, acknowledgment, or accept tag is appropriate lies in the point at which the utterance occurs relative to the utterance of the speaker who has the floor. Acknowledgments generally appear after another speaker has completed a phrase or an utterance, as they are acknowledging the semantic significance of what is said. Accepts usually occur at the end of another speaker's utterances, as they are agreeing with what is said. Backchannels, although they can occur in the same locations as acknowledgments and accepts, can also be found in the middle of another speaker's phrase. Such mid-phrasal placement is a strong indicator that an utterance is a backchannel, rather than an acknowledgment or an accept, as the speaker uttering the backchannel lacks adequate semantic information from the other speaker's utterance to acknowledge it or agree to it.
Additionally, backchannels are usually uttered with a significantly lower energy level than the surrounding speech, while acknowledgments tend not to be quite so low as backchannels and accepts are generally at the same level or else higher. Additionally, the only specific tag that may be appended to a backchannel is the rising tone tag