Difference between revisions of "Perl/Modules/Lingua"
From Christoph's Personal Wiki
Revision as of 05:16, 28 May 2007
This article will list my favourite Perl modules related to linguistics (or "Lingua"). Linguistics is a hobby of mine.
Modules
- Lingua::EN::Tagger — a Perl module
- Presidential Debate Analysis (uses Lingua-EN-Tagger to parse noun phrases)
- Lingua::EN::Inflect
- Lingua::Translate — Translate text from one language to another
- Lingua::EN::StopWords — Typical stop words for an English corpus (see below)
Stop words
see: wikipedia:Stop words, List of 319 stop words
a about above across adj after again against all almost alone along also although always am among an and another any anybody anyone anything anywhere apart are around as aside at away be because been before behind being below besides between beyond both but by can cannot could deep did do does doing done down downwards during each either else enough etc even ever every everybody everyone except far few for forth from get gets got had hardly has have having her here herself him himself his how however i if in indeed instead into inward is it its itself just kept many maybe might mine more most mostly much must myself near neither next no nobody none nor not nothing nowhere of off often on only onto or other others ought our ours out outside over own p per please plus pp quite rather really said seem self selves several shall she should since so some somebody somewhat still such than that the their theirs them themselves then there therefore these they this thorough thoroughly those through thus to together too toward towards under until up upon v very was well were what whatever when whenever where whether which while who whom whose will with within without would yet young your yourself
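A stop word list like the one above is typically used to filter a text before indexing or counting words. The following is a minimal sketch in Python rather than Perl; it does not use Lingua::EN::StopWords. The names `STOP_WORDS` and `remove_stop_words` are made up for this illustration, and only a small subset of the list is included to keep it short.

```python
import re

# A small subset of the stop words listed above (assumption: a real program
# would load the complete list from a file or a module instead).
STOP_WORDS = frozenset(
    "a about above across after again against all an and are as at be "
    "but by for from in is it of on or that the this to was with".split()
)


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Return the lower-cased words of `text` that are not stop words."""
    # Very simple tokeniser: runs of letters and apostrophes count as words.
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in stop_words]


print(remove_stop_words("The cat sat on the mat with a hat."))
# ['cat', 'sat', 'mat', 'hat']
```

A frozenset is used so that each membership test is a constant-time lookup, which matters once the full list and a large corpus are involved.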
Spelling rules
- E Ending Rule
- When a base word ends with an E and you add an ending that begins with a vowel, drop the E (for example, hope + ing = hoping).
- The CVC Rule
- When a word of four letters or fewer ends CVC (consonant, vowel, consonant), and the next ending begins with a vowel (V), you must double the final consonant. CVC + V = CVCCV
- Letters that double: b, d, f, g, m, n, p, r, and t
- Letters that do not double: c, h, j, k, q, v, w, and x
- Note: The C-V-C rule can be used only if the first letter of the suffix (the letters you are adding) is a vowel. For example:
- Regret
- add the suffix –ed, and we get regretted (double t).
- add the suffix –ful, and we get regretful (NOT a double t).
- The Vowel Changers
- When the letter Y or the letter W is at the end of a word, it acts like a vowel letter.
- The Flighty Y Rule
- When a word ends in a consonant and a Y (C+Y) and you add an ending, there must be an I in the word. Either the Y changes to I (C+Y+___ = C+I+___) or the ending has an I and it becomes (C+Y+I = C+Y+I).
- Plural Rules
- To make a regular word plural, add an "S". If the word ends in the letter S, Z, X, SH, or CH, or in "C+Y", add "ES" (for "C+Y", the Y first changes to I).
- The Disappearing E Rule
- When a word ends with a W, and you add the ending EN, drop the E.
- The Appearing AL Rule
- When a word ends with "IC" and you add the ending "LY", it must become "ICALLY".
- Double the Fun Rule
- When a word ends in a short CVC base, follow the CVC doubling rule.
- The ION Rule
- When a word ends in ION, another form of the word may end in OR.
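Several of the rules above are mechanical enough to express as code. The following Python sketch is only an illustration under simplifying assumptions, not a complete English inflector: the function names `add_suffix` and `pluralize` are made up here, Y and W are treated as consonants when testing for CVC, and the `final_stress` flag is a hypothetical way to opt longer words such as "regret" into doubling, because the sketch cannot detect stress by itself.

```python
VOWELS = set("aeiou")
# Letters the CVC rule above lists as ones that double.
DOUBLING_LETTERS = set("bdfgmnprt")


def is_vowel(ch):
    return ch in VOWELS


def ends_cvc(word):
    """True if the word ends consonant-vowel-consonant with a doubling letter."""
    if len(word) < 3:
        return False
    c1, v, c2 = word[-3], word[-2], word[-1]
    return (not is_vowel(c1)) and is_vowel(v) and c2 in DOUBLING_LETTERS


def ends_consonant_y(word):
    return len(word) >= 2 and word[-1] == "y" and not is_vowel(word[-2])


def add_suffix(word, suffix, final_stress=False):
    """Join word and suffix using the rules above (simplified sketch)."""
    word, suffix = word.lower(), suffix.lower()
    if not suffix:
        return word
    starts_with_vowel = is_vowel(suffix[0])
    # Disappearing E rule: W + EN drops the E (know + en = known).
    if word.endswith("w") and suffix == "en":
        return word + "n"
    # Appearing AL rule: IC + LY becomes ICALLY.
    if word.endswith("ic") and suffix == "ly":
        return word + "ally"
    # Flighty Y rule: keep the Y if the ending supplies an I, else Y becomes I.
    if ends_consonant_y(word):
        if suffix[0] == "i":
            return word + suffix
        return word[:-1] + "i" + suffix
    # E ending rule: drop the E before an ending that begins with a vowel.
    if word.endswith("e") and starts_with_vowel:
        return word[:-1] + suffix
    # CVC rule: double the final consonant before a vowel ending.
    if (len(word) <= 4 or final_stress) and ends_cvc(word) and starts_with_vowel:
        return word + word[-1] + suffix
    return word + suffix


def pluralize(word):
    """Regular plurals only, following the plural rule above."""
    word = word.lower()
    if ends_consonant_y(word):
        return word[:-1] + "ies"
    if word.endswith(("s", "z", "x", "sh", "ch")):
        return word + "es"
    return word + "s"


print(add_suffix("hope", "ing"))                       # hoping
print(add_suffix("hop", "ing"))                        # hopping
print(add_suffix("regret", "ed", final_stress=True))   # regretted
print(add_suffix("regret", "ful", final_stress=True))  # regretful
print(pluralize("city"))                               # cities
```

The rules are checked from most specific to most general, so the W + EN and IC + LY cases are handled before the broader E and CVC rules get a chance to apply. Irregular forms and words such as "see" or "be" are outside what this sketch handles correctly.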
External links
- List of Lingua modules on CPAN.org
Resources
- A List of English Stop Words (about 3kb).
- A list of stop words in English and other languages
- The snowball project — currently provides lists of stopwords for English, French, Spanish, German, Portuguese, Italian, Dutch, Swedish, Norwegian, Danish, Russian, Finnish and Hungarian as part of a software stemmer project. These lists are used in other software such as the Perl Lingua::StopWords module.