Difference between revisions of "TreeTagger"
(5 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
The '''TreeTagger''' is a tool for annotating text with part-of-speech and lemma information which has been developed within the [http://www.ims.uni-stuttgart.de/projekte/tc TC project] at the Institute for Computational Linguistics of the University of Stuttgart. The TreeTagger has been successfully used to tag German, English, French, Italian, Dutch, Spanish, Bulgarian, Russian, Greek, Portuguese, and old French texts and is easily adaptable to other languages if a lexicon and a manually tagged training corpus are available. | The '''TreeTagger''' is a tool for annotating text with part-of-speech and lemma information which has been developed within the [http://www.ims.uni-stuttgart.de/projekte/tc TC project] at the Institute for Computational Linguistics of the University of Stuttgart. The TreeTagger has been successfully used to tag German, English, French, Italian, Dutch, Spanish, Bulgarian, Russian, Greek, Portuguese, and old French texts and is easily adaptable to other languages if a lexicon and a manually tagged training corpus are available. | ||
+ | |||
+ | ==Background== | ||
+ | '''Part-of-speech tagging''' ('''POS tagging''' or '''POST'''), also called '''grammatical tagging''', is the process of marking up each word in a text as corresponding to a particular [[wikipedia:Lexical category|part of speech]], based on both its definition and its context, i.e., its relationship with adjacent and related words in a phrase, sentence, or paragraph. A simplified form of this is commonly taught to school-age children as the identification of words as [[noun]]s, [[verb]]s, [[adjective]]s, [[adverb]]s, etc. (see: [[:Category:Grammar]]). | ||
==Example Usage== | ==Example Usage== | ||
Line 13: | Line 16: | ||
finished. | finished. | ||
− | ==Part-of-speech tags used== | + | ==Part-of-speech tags (POST) used== |
− | . sentence closer (. ; ? *) | + | *. sentence closer (. ; ? *) |
*( left paren | *( left paren | ||
*) right paren | *) right paren | ||
Line 96: | Line 99: | ||
*WQL wh- qualifier (how) | *WQL wh- qualifier (how) | ||
*WRB wh- adverb (how, where, when) | *WRB wh- adverb (how, where, when) | ||
+ | |||
+ | ===Modified POST=== | ||
+ | <div style="float:left; margin:0px 20px 20px 0px;"> | ||
+ | {| align="center" style="border: 1px solid #999; background-color:#FFFFFF" | ||
+ | |- | ||
+ | |-align="center" bgcolor="#1188ee" | ||
+ | !Tag | ||
+ | !Description | ||
+ | !Examples | ||
+ | |- | ||
+ | |CC || Conjunction; coordinating || and, or | ||
+ | |--bgcolor="#eeeeee" | ||
+ | |CD || Adjective; cardinal number || 3, fifteen | ||
+ | |- | ||
+ | |DET || Determiner || this, each, some | ||
+ | |--bgcolor="#eeeeee" | ||
+ | |EX || Pronoun, existential there || there | ||
+ | |- | ||
+ | |FW || Foreign words || | ||
+ | |--bgcolor="#eeeeee" | ||
+ | |IN || Preposition / Conjunction || for, of, although, that | ||
+ | |- | ||
+ | |JJ || Adjective || happy, bad | ||
+ | |--bgcolor="#eeeeee" | ||
+ | |JJR || Adjective; comparative || happier, worse | ||
+ | |- | ||
+ | |JJS || Adjective; superlative || happiest, worst | ||
+ | |--bgcolor="#eeeeee" | ||
+ | |LS || Symbol, list item || A, A. | ||
+ | |- | ||
+ | |MD || Verb; modal || can, could, 'll | ||
+ | |--bgcolor="#eeeeee" | ||
+ | |NN || Noun || aircraft, data | ||
+ | |- | ||
+ | |NNP || Noun; proper || London, Michael | ||
+ | |--bgcolor="#eeeeee" | ||
+ | |NNPS || Noun, proper, plural || Australians, Methodists | ||
+ | |- | ||
+ | |NNS || Noun; plural || women, books | ||
+ | |--bgcolor="#eeeeee" | ||
+ | |PDT || Determiner; prequalifier || quite, all, half | ||
+ | |- | ||
+ | |POS || Possessive || 's, ' | ||
+ | |--bgcolor="#eeeeee" | ||
+ | |PRP || Determiner; possessive second || mine, yours | ||
+ | |- | ||
+ | |PRPS || Determiner; possessive || their, your | ||
+ | |--bgcolor="#eeeeee" | ||
+ | |RB || Adverb || often, not, very, here | ||
+ | |- | ||
+ | |RBR || Adverb; comparative || faster | ||
+ | |--bgcolor="#eeeeee" | ||
+ | |RBS || Adverb; superlative || fastest | ||
+ | |- | ||
+ | |RP || Adverb; particle || up, off, out | ||
+ | |--bgcolor="#eeeeee" | ||
+ | |SYM || Symbol || * | ||
+ | |- | ||
+ | |TO || Preposition || to | ||
+ | |--bgcolor="#eeeeee" | ||
+ | |UH || Interjection || oh, yes, mmm | ||
+ | |- | ||
+ | |VB || Verb; infinitive || take, live | ||
+ | |--bgcolor="#eeeeee" | ||
+ | |VBD || Verb; past tense || took, lived | ||
+ | |- | ||
+ | |VBG || Verb; gerund || taking, living | ||
+ | |--bgcolor="#eeeeee" | ||
+ | |VBN || Verb; past/passive participle || taken, lived | ||
+ | |- | ||
+ | |VBP || Verb; base present form || take, live | ||
+ | |--bgcolor="#eeeeee" | ||
+ | |VBZ || Verb; present 3SG -s form || takes, lives | ||
+ | |- | ||
+ | |WDT || Determiner; question || which, whatever | ||
+ | |--bgcolor="#eeeeee" | ||
+ | |WP || Pronoun; question || who, whoever | ||
+ | |- | ||
+ | |WPS || Determiner; possessive and question || whose | ||
+ | |--bgcolor="#eeeeee" | ||
+ | |WRB || Adverb; question || when, how, however | ||
+ | |- | ||
+ | ! colspan="4" bgcolor="#fff" | '''Punctuation''' | ||
+ | |--bgcolor="#eeeeee" | ||
+ | |PP || Punctuation; sentence ender || ., !, ? | ||
+ | |- | ||
+ | |PPC || Punctuation; comma || , | ||
+ | |--bgcolor="#eeeeee" | ||
+ | |PPD || Punctuation; dollar sign || $ | ||
+ | |- | ||
+ | |PPL || Punctuation; quotation mark left || `` | ||
+ | |--bgcolor="#eeeeee" | ||
+ | |PPR || Punctuation; quotation mark right || <nowiki>''</nowiki> | ||
+ | |- | ||
+ | |PPS || Punctuation; colon, semicolon, ellipsis || :, ..., - | ||
+ | |--bgcolor="#eeeeee" | ||
+ | |LRB || Punctuation; left bracket || (, {, [ | ||
+ | |- | ||
+ | |RRB || Punctuation; right bracket || ), }, ] | ||
+ | |} | ||
==See also== | ==See also== | ||
*[[wikipedia:Part-of-speech tagging]] | *[[wikipedia:Part-of-speech tagging]] | ||
*[[wikipedia:Phrase chunking]] | *[[wikipedia:Phrase chunking]] | ||
− | *[[http://search.cpan.org/~acoburn/Lingua-EN-Tagger/ Lingua-EN-Tagger | + | *[[Perl/Modules/Lingua]] |
+ | *[http://search.cpan.org/~acoburn/Lingua-EN-Tagger/ Lingua-EN-Tagger] — a [[Perl]] module | ||
+ | **[http://overstated.net/2004/10/01/presidential-debate-analysis Presidential Debate Analysis] (uses Lingua-EN-Tagger to parse [http://www.usingenglish.com/glossary/noun-phrase.html noun phrases]) | ||
+ | *CLAWS: | ||
+ | **[http://www.comp.lancs.ac.uk/ucrel/claws1tags.html UCREL CLAWS1 (LOB) Tagset] | ||
+ | **[http://www.comp.lancs.ac.uk/ucrel/claws2tags.html UCREL CLAWS2 Tagset] | ||
+ | **[http://www.comp.lancs.ac.uk/ucrel/claws5tags.html UCREL CLAWS5 Tagset] | ||
+ | **[http://www.comp.lancs.ac.uk/ucrel/claws6tags.html UCREL CLAWS6 Tagset] | ||
+ | **[http://www.comp.lancs.ac.uk/ucrel/claws7tags.html UCREL CLAWS7 Tagset] | ||
+ | **[http://www.comp.lancs.ac.uk/ucrel/claws/mapC7toC5.txt mapC7toC5] | ||
==External links== | ==External links== |
Latest revision as of 03:37, 18 June 2007
The TreeTagger is a tool for annotating text with part-of-speech and lemma information which has been developed within the TC project at the Institute for Computational Linguistics of the University of Stuttgart. The TreeTagger has been successfully used to tag German, English, French, Italian, Dutch, Spanish, Bulgarian, Russian, Greek, Portuguese, and old French texts and is easily adaptable to other languages if a lexicon and a manually tagged training corpus are available.
Background
Part-of-speech tagging (POS tagging or POST), also called grammatical tagging, is the process of marking up each word in a text as corresponding to a particular part of speech, based on both its definition and its context, i.e., its relationship with adjacent and related words in a phrase, sentence, or paragraph. A simplified form of this is commonly taught to school-age children as the identification of words as nouns, verbs, adjectives, adverbs, etc. (see: Category:Grammar).
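The idea of combining a word's definition with its context can be illustrated with a toy sketch. This is not how TreeTagger itself works; the lexicon, the `PREFERRED_AFTER` rule table, and the `tag` function are hypothetical names invented for this illustration, and the tags are borrowed from the Modified POST table on this page.

```python
# Toy illustration only: a lexicon gives each word its possible tags
# ("definition"), and a small rule table picks among them based on the
# previous tag ("context"). All names and rules here are assumptions.

LEXICON = {
    "the": ["DET"],
    "to": ["TO"],
    "book": ["NN", "VB"],   # ambiguous: noun or verb
    "flight": ["NN"],
}

# If the previous tag is the key, prefer the value when it is a candidate.
PREFERRED_AFTER = {
    "TO": "VB",    # "to book" -> verb
    "DET": "NN",   # "the book" -> noun
}


def tag(words):
    """Return (word, tag) pairs using the lexicon plus one word of left context."""
    tags = []
    for word in words:
        candidates = LEXICON.get(word.lower(), ["NN"])  # unknown words default to NN
        previous = tags[-1] if tags else None
        preferred = PREFERRED_AFTER.get(previous)
        tags.append(preferred if preferred in candidates else candidates[0])
    return list(zip(words, tags))


if __name__ == "__main__":
    print(tag("the book".split()))
    print(tag("to book the flight".split()))
```

The same word, "book", receives a different tag in each sentence because the preceding tag differs; real taggers learn such preferences from a manually tagged training corpus rather than from hand-written rules.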
Example Usage
% echo 'The three big red dogs.' | cmd/tree-tagger-english
reading parameters ...
tagging ...
The	DT	the
three	CD	three
big	JJ	big
red	JJ	red
dogs	NNS	dog
.	SENT	.
finished.
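The output in the example above has one token per line with three columns: word, tag, and lemma. A minimal sketch of reading that output into Python follows. It assumes the columns are tab-separated and that status lines such as "reading parameters ..." can be skipped because they do not have three fields; `TaggedToken` and `parse_tagger_output` are hypothetical names, not part of TreeTagger.

```python
from typing import List, NamedTuple


class TaggedToken(NamedTuple):
    word: str
    pos: str
    lemma: str


def parse_tagger_output(text: str) -> List[TaggedToken]:
    """Parse 'word<TAB>tag<TAB>lemma' lines, ignoring any other lines."""
    tokens = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # status messages and blank lines are skipped
        tokens.append(TaggedToken(*(field.strip() for field in fields)))
    return tokens


# Sample text mirroring the example session (tab separation is an assumption).
SAMPLE = (
    "reading parameters ...\n"
    "tagging ...\n"
    "The\tDT\tthe\n"
    "three\tCD\tthree\n"
    "big\tJJ\tbig\n"
    "red\tJJ\tred\n"
    "dogs\tNNS\tdog\n"
    ".\tSENT\t.\n"
    "finished.\n"
)

if __name__ == "__main__":
    for token in parse_tagger_output(SAMPLE):
        print(token.word, token.pos, token.lemma)
```

Using a named tuple keeps the three columns addressable by name, so later processing can refer to `token.lemma` rather than a positional index.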
Part-of-speech tags (POST) used
- . sentence closer (. ; ? *)
- ( left paren
- ) right paren
- * not, n't
- -- dash
- , comma
- : colon
- ABL pre-qualifier (quite, rather)
- ABN pre-quantifier (half, all)
- ABX pre-quantifier (both)
- AP post-determiner (many, several, next)
- AT article (a, the, no)
- BE be
- BED were
- BEDZ was
- BEG being
- BEM am
- BEN been
- BER are, art
- BEZ is
- CC coordinating conjunction (and, or)
- CD cardinal numeral (one, two, 2, etc.)
- CS subordinating conjunction (if, although)
- DO do
- DOD did
- DOZ does
- DT singular determiner/quantifier (this, that)
- DTI singular or plural determiner/quantifier (some, any)
- DTS plural determiner (these, those)
- DTX determiner/double conjunction (either)
- EX existential there
- FW foreign word (hyphenated before regular tag)
- HV have
- HVD had (past tense)
- HVG having
- HVN had (past participle)
- IN preposition
- JJ adjective
- JJR comparative adjective
- JJS semantically superlative adjective (chief, top)
- JJT morphologically superlative adjective (biggest)
- MD modal auxiliary (can, should, will)
- NC cited word (hyphenated after regular tag)
- NN singular or mass noun
- NN$ possessive singular noun
- NNS plural noun
- NNS$ possessive plural noun
- NP proper noun or part of name phrase
- NP$ possessive proper noun
- NPS$ possessive plural proper noun
- NR adverbial noun (home, today, west)
- OD ordinal numeral (first, 2nd)
- PN nominal pronoun (everybody, nothing)
- PN$ possessive nominal pronoun
- PP$ possessive personal pronoun (my, our)
- PP$$ second (nominal) possessive pronoun (mine, ours)
- PPL singular reflexive/intensive personal pronoun (myself)
- PPLS plural reflexive/intensive personal pronoun (ourselves)
- PPO objective personal pronoun (me, him, it, them)
- PPS 3rd. singular nominative pronoun (he, she, it, one)
- PPSS other nominative personal pronoun (I, we, they, you)
- QL qualifier (very, fairly)
- QLP post-qualifier (enough, indeed)
- RB adverb
- RBR comparative adverb
- RBT superlative adverb
- RN nominal adverb (here, then, indoors)
- RP adverb/particle (about, off, up)
- TO infinitive marker to
- UH interjection, exclamation
- VB verb, base form
- VBD verb, past tense
- VBG verb, present participle/gerund
- VBN verb, past participle
- VBZ verb, 3rd. singular present
- WDT wh- determiner (what, which)
- WP$ possessive wh- pronoun (whose)
- WPO objective wh- pronoun (whom, which, that)
- WPS nominative wh- pronoun (who, which, that)
- WQL wh- qualifier (how)
- WRB wh- adverb (how, where, when)
Modified POST
Tag | Description | Examples
---|---|---
CC | Conjunction; coordinating | and, or
CD | Adjective; cardinal number | 3, fifteen
DET | Determiner | this, each, some
EX | Pronoun, existential there | there
FW | Foreign words |
IN | Preposition / Conjunction | for, of, although, that
JJ | Adjective | happy, bad
JJR | Adjective; comparative | happier, worse
JJS | Adjective; superlative | happiest, worst
LS | Symbol, list item | A, A.
MD | Verb; modal | can, could, 'll
NN | Noun | aircraft, data
NNP | Noun; proper | London, Michael
NNPS | Noun, proper, plural | Australians, Methodists
NNS | Noun; plural | women, books
PDT | Determiner; prequalifier | quite, all, half
POS | Possessive | 's, '
PRP | Determiner; possessive second | mine, yours
PRPS | Determiner; possessive | their, your
RB | Adverb | often, not, very, here
RBR | Adverb; comparative | faster
RBS | Adverb; superlative | fastest
RP | Adverb; particle | up, off, out
SYM | Symbol | *
TO | Preposition | to
UH | Interjection | oh, yes, mmm
VB | Verb; infinitive | take, live
VBD | Verb; past tense | took, lived
VBG | Verb; gerund | taking, living
VBN | Verb; past/passive participle | taken, lived
VBP | Verb; base present form | take, live
VBZ | Verb; present 3SG -s form | takes, lives
WDT | Determiner; question | which, whatever
WP | Pronoun; question | who, whoever
WPS | Determiner; possessive and question | whose
WRB | Adverb; question | when, how, however
Punctuation | |
PP | Punctuation; sentence ender | ., !, ?
PPC | Punctuation; comma | ,
PPD | Punctuation; dollar sign | $
PPL | Punctuation; quotation mark left | ``
PPR | Punctuation; quotation mark right | ''
PPS | Punctuation; colon, semicolon, ellipsis | :, ..., -
LRB | Punctuation; left bracket | (, {, [
RRB | Punctuation; right bracket | ), }, ]
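Many descriptions in the table above follow a "Class; detail" pattern, which makes it easy to group tags into coarse word classes. The sketch below copies a handful of rows from the table into a dictionary and derives the class from the text before the semicolon. `MODIFIED_POST` and `coarse_class` are hypothetical names, and only rows that use the semicolon convention are included.

```python
# A subset of the Modified POST table, copied as tag -> description.
# The dictionary name and the helper below are illustrative assumptions.
MODIFIED_POST = {
    "CC": "Conjunction; coordinating",
    "JJ": "Adjective",
    "JJR": "Adjective; comparative",
    "NN": "Noun",
    "NNS": "Noun; plural",
    "VB": "Verb; infinitive",
    "VBD": "Verb; past tense",
    "VBZ": "Verb; present 3SG -s form",
    "WRB": "Adverb; question",
}


def coarse_class(tag: str) -> str:
    """Return the word class before the first ';' of the tag's description."""
    description = MODIFIED_POST.get(tag)
    if description is None:
        return "Unknown"
    return description.split(";", 1)[0].strip()


if __name__ == "__main__":
    for tag in ("JJR", "NNS", "VBD", "XYZ"):
        print(tag, "->", coarse_class(tag))
```

A lookup like this is useful when downstream code only needs to know whether a token is a noun, verb, or adjective and does not care about the finer distinctions.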
See also
- wikipedia:Part-of-speech tagging
- wikipedia:Phrase chunking
- Perl/Modules/Lingua
- Lingua-EN-Tagger — a Perl module
- Presidential Debate Analysis (uses Lingua-EN-Tagger to parse noun phrases)
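- CLAWS:
- UCREL CLAWS1 (LOB) Tagset
- UCREL CLAWS2 Tagset
- UCREL CLAWS5 Tagset
- UCREL CLAWS6 Tagset
- UCREL CLAWS7 Tagset
- mapC7toC5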
External links
- TreeTagger Chunker
- TermExtractor
- CLAWS part-of-speech tagger for English
- The XTAG Project — an on-going project to develop a wide-coverage grammar for English using a lexicalized Tree Adjoining Grammar (TAG) formalism.
- Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources
- Part Of Speech Tagging — Google Directory