Style and Diction

From Christoph's Personal Wiki

style and diction are two Linux command-line linguistic utilities. They are part of the GNU software suite.

style 
analyses surface characteristics of a document, including sentence length and other readability measures.
diction 
identifies wordy and commonly misused phrases.

These programs cannot help you structure a document well, but they can help you avoid poor wording and compare the readability (not the understandability!) of your documents with that of other documents. Both commands support English and German documents.

Theory

Generally, readability is a measure of the accessibility of a piece of writing and/or associated page layout, indicating how effectively it will reach a given reading audience.

Readability is a judgement of how easy a text is to understand for a given, well-characterised population of readers.

The understandability of a text results from an interaction between the reader and the text: the reader's prior knowledge of aspects of the content influences how easily they can access the text, as do the fixed features of the text itself. Presentation factors unrelated to the language of the text, such as choice of typeface, text size, layout and colours, also affect readability.

Indices

Kincaid / Flesch-Kincaid Readability Test

The Flesch/Flesch–Kincaid Readability Tests are designed to indicate how difficult a reading passage is to understand. There are two tests, the Flesch Reading Ease and the Flesch–Kincaid Grade Level. Although they use the same core measures (syllables per word and words per sentence), they weight them differently, so the results of the two tests do not always correlate: a text that scores better than another on the Reading Ease test may score worse on the Grade Level test. The Reading Ease formula was devised by Rudolf Flesch; the Grade Level formula was developed by J. P. Kincaid and colleagues (1975) for the U.S. Navy.

Flesch Reading Ease

In the Flesch Reading Ease test, higher scores indicate material that is easier to read; lower numbers mark harder-to-read passages. The formula for the Flesch Reading Ease Score (FRES) test is:

FRES = 206.835 - 1.015*(total_words/total_sentences) - 84.6*(total_syllables/total_words)

where total syllables/total words = average number of syllables per word (ASW) and total words/total sentences = average sentence length (ASL).
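The FRES formula can be sketched in Python as follows. This is a minimal illustration, not style's own implementation: the function name `flesch_reading_ease` is hypothetical, the word, sentence and syllable totals are assumed to have been counted already (syllable counting is the hard part and is not shown), and the sample passage is invented.

```python
def flesch_reading_ease(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Return the Flesch Reading Ease Score (higher means easier to read)."""
    asl = total_words / total_sentences  # average sentence length (ASL)
    asw = total_syllables / total_words  # average number of syllables per word (ASW)
    return 206.835 - 1.015 * asl - 84.6 * asw


# Invented passage: 100 words, 5 sentences, 150 syllables (ASL = 20, ASW = 1.5)
print(f"{flesch_reading_ease(100, 5, 150):.1f}")
```

The invented passage scores about 59.6, just below the 60–70 band. Nothing in the formula caps the result: very short sentences of one-syllable words (for example ASL = 10, ASW = 1.0) push the score above 100.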

As a rule of thumb, scores of 90.0–100.0 are considered easily understandable by an average 5th grader, scores of 60–70 by 8th and 9th grade students, and passages scoring 0–30 are best understood by college graduates. Reader's Digest magazine has a readability index of about 65, Time magazine scores about 52, and the Harvard Law Review has a general readability score in the low 30s.

This test has become a U.S. governmental standard. Many government agencies require documents or forms to meet specific readability levels. The U.S. Department of Defense uses the Reading Ease test as the standard test of readability for its documents and forms.

Most states require insurance forms to score 40–50 on the test.

Use of this scale is so ubiquitous that it is bundled with popular word processing programs such as KWord and OpenOffice.

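Long words affect this score significantly more than they do the grade level score: the ratio of the syllables-per-word coefficient to the sentence-length coefficient is about 83 (84.6/1.015) in the Reading Ease formula, but only about 30 (11.8/0.39) in the grade level formula.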

Flesch–Kincaid Grade Level

An obvious use for readability tests is in the field of education. The "Flesch–Kincaid Grade Level Formula" expresses readability as a U.S. school grade level instead of a 0–100 score, making it easier for teachers, parents, librarians, and others to judge the readability level of various books and texts. The result can also be read as the number of years of education required to understand the text, which is the more relevant interpretation when the formula yields a number greater than 12.[1] The grade level is calculated with the following formula:

Kincaid = 0.39*(total_words/total_sentences) + 11.8*(total_syllables/total_words) - 15.59

The result is a number that corresponds to a U.S. grade level. For example, a score of 6.1 indicates that the text is understandable by an average student in 6th grade.
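The grade-level formula combines the same two averages as the Reading Ease formula but weights them differently, which is why the two tests can rank texts differently (see the introduction to this section). The Python sketch below makes this concrete. The function names are hypothetical, word, sentence and syllable counts are assumed to be given, and both 100-word texts are invented.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher means easier."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate U.S. school grade (lower means easier)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Two invented 100-word texts, given as (words, sentences, syllables):
#   A: short sentences (10 of them) but longer words (160 syllables)
#   B: long sentences (4 of them) but shorter words (140 syllables)
texts = {"A": (100, 10, 160), "B": (100, 4, 140)}
for name, counts in texts.items():
    print(name,
          f"Reading Ease {flesch_reading_ease(*counts):.1f}",
          f"Grade Level {flesch_kincaid_grade(*counts):.1f}")
```

Text A gets Reading Ease 61.3 and Grade Level 7.2; text B gets Reading Ease 63.0 and Grade Level 10.7. So B looks easier on the Reading Ease scale yet is assigned a higher grade, because the Reading Ease formula penalises A's longer words more heavily.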

Automated Readability Index (ARI)

The Automated Readability Index (ARI) is a readability test designed to gauge the understandability of a text. Like the Flesch-Kincaid Grade Level, Gunning-Fog Index, SMOG Index, and Coleman-Liau Index, its output is an approximate representation of the U.S. grade level needed to comprehend the text.

Unlike the other indices, the ARI, along with the Coleman-Liau, relies on a factor of characters per word, instead of the usual syllables per word. Although opinion varies on its accuracy as compared to the syllables/word and complex words indices, characters/word is often easier to calculate, as the number of characters is more readily and accurately counted by computer programs than syllables.

To calculate the Automated Readability Index:

  1. Divide the number of characters by the number of words, and multiply by 4.71. This is #1.
  2. Divide the number of words by the number of sentences, and multiply by 0.5. This is #2.
  3. Add #1 and #2 together, and subtract 21.43.
ARI = 4.71*(characters/words) + 0.5*(words/sentences) - 21.43
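The three steps map directly onto code. The sketch below uses a hypothetical helper that takes pre-counted totals; what counts as a "character" (letters only, or also digits and punctuation) is left to the caller, since the steps above do not specify it. The sample counts are invented.

```python
def automated_readability_index(characters: int, words: int, sentences: int) -> float:
    """ARI: approximate U.S. grade level, based on characters per word."""
    step1 = 4.71 * (characters / words)  # #1: characters per word, times 4.71
    step2 = 0.5 * (words / sentences)    # #2: words per sentence, times 0.5
    return step1 + step2 - 21.43         # add #1 and #2, then subtract 21.43


# Invented text: 500 characters, 100 words, 5 sentences
print(f"{automated_readability_index(500, 100, 5):.2f}")
```

For these counts the sketch gives about 12.12, i.e. roughly a 12th-grade text.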

Coleman-Liau Index

The Coleman-Liau Index is a readability test designed by Meri Coleman and T. L. Liau to gauge the understandability of a text.[2] Like the Flesch-Kincaid Grade Level, Gunning-Fog Index, SMOG Index, and Automated Readability Index, its output approximates the U.S. grade level thought necessary to comprehend the text.

Like the ARI but unlike most of the other indices, Coleman-Liau relies on characters instead of syllables per word. Although opinion varies on its accuracy as compared to the syllable/word and complex word indices, characters are more readily and accurately counted by computer programs than are syllables.

To calculate the Coleman-Liau:

  1. Divide the number of characters by the number of words, and multiply by 5.89. Call this A.
  2. Take the number of sentences per 100 words, and multiply by 0.3. Call this B.
  3. Subtract B from A, and subtract 15.8.
Coleman-Liau = 5.89*(characters/words) - 30*(sentences/words) - 15.8
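A sketch of the three steps, assuming pre-counted totals and a hypothetical function name. Step 2 is implemented as the average number of sentences per 100 words (sentences / words x 100); multiplying that by 0.3 is where the compact formula's 30*(sentences/words) term comes from.

```python
def coleman_liau_index(characters: int, words: int, sentences: int) -> float:
    """Coleman-Liau Index, following the three steps above."""
    a = 5.89 * (characters / words)            # A: characters per word, times 5.89
    b = 0.3 * (sentences / words * 100)        # B: sentences per 100 words, times 0.3
    return a - b - 15.8                        # subtract B from A, then 15.8


# Invented text: 500 characters, 100 words, 5 sentences
print(f"{coleman_liau_index(500, 100, 5):.2f}")
```

For these counts the result is about 12.15, the same value the one-line formula gives.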

Fog Index / Gunning fog index

In linguistics, the Gunning fog index is a test designed to measure the readability of a sample of English writing. The resulting number is an indication of the number of years of formal education that a person requires in order to easily understand the text on the first reading. That is, if a passage has a fog index of 12, it has the reading level of a US high school senior. The test was developed by Robert Gunning, an American businessman, in 1952.[3]

The fog index is generally used by people who want their writing to be read easily by a large segment of the population. Texts that are designed for a wide audience generally require a fog index of less than 12.

  • Typical Gunning fog indices of selected magazines:
    • 12 — Atlantic Monthly
    • 11 — TIME, Harper's
    • 10 — Newsweek
    • 9 — Reader's Digest
    • 8 — Ladies' Home Journal
    • 7 — True Confessions
    • 6 — comic books

Calculating the Gunning fog index

The Gunning fog index can be calculated with the following algorithm:

  1. Take a full passage that is around 100 words (do not omit any sentences).
  2. Find the average sentence length (divide the number of words by the number of sentences).
  3. Count the words with three or more syllables (complex words). Do not include proper nouns (for example, Djibouti), compound words, or familiar jargon, and do not count common suffixes such as -es, -ed, or -ing as a syllable.
  4. Add the average sentence length and the percentage of complex words (for example, add 13.37 for 13.37%, not 0.1337).
  5. Multiply the result by 0.4.
Fog = 0.4*[(words/sentences) + 100*(complex_words/words)]
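A sketch of steps 2, 4 and 5, assuming the complex-word count from step 3 is supplied by the caller: applying the exclusions in step 3 (proper nouns, compound words, suffixes, familiar jargon) takes judgement that a simple syllable counter cannot provide. The function name and sample counts are hypothetical.

```python
def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Gunning fog index from pre-counted totals."""
    avg_sentence_length = words / sentences          # step 2
    percent_complex = 100 * complex_words / words    # step 4 uses a percentage, e.g. 13.37
    return 0.4 * (avg_sentence_length + percent_complex)  # steps 4 and 5


# Invented passage: 100 words, 5 sentences, 10 complex words
print(f"{gunning_fog(100, 5, 10):.1f}")
```

The invented passage scores 12.0, the level described above as that of a US high school senior.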

While the index is a good indication of reading difficulty, it still has flaws. Not all multisyllabic words are difficult. For example, the word "asparagus" is generally not considered to be a difficult word, even though it has four syllables.

Fog index example

The following paragraph, from the Wikipedia article on "logorrhea", has a Gunning-Fog Index of 17.5.

The word logorrhoea is often used pejoratively to describe prose that is highly abstract and contains little concrete language. Since abstract writing is hard to visualize, it often seems as though it makes no sense and all the words are excessive. Writers in academic fields that concern themselves mostly with the abstract, such as philosophy and especially postmodernism, often fail to include extensive concrete examples of their ideas, and so a superficial examination of their work might lead one to believe that it is all nonsense.

Lix

SMOG-Grading / SMOG Index

The SMOG Index is an approximate version of SMOG (Simple Measure Of Gobbledygook) that is easy to calculate by hand.[4] SMOG can be calculated more accurately with an online SMOG calculator. The output estimates the number of years of US education needed to fully comprehend the text.

To calculate the SMOG Index:

  1. Count the number of complex words (words containing 3 or more syllables).
  2. Multiply the number of complex words by a factor of (30/number of sentences).
  3. Take the square root of the resultant number.
  4. Add 3 to the resultant number.
SMOG = sqrt[total_complex_words * (30/total_sentences)] + 3
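A sketch of the four steps, with a hypothetical function name and invented counts. The factor 30/sentences scales the complex-word count to a 30-sentence sample, so a text with twice as many sentences and twice as many complex words receives the same grade.

```python
import math


def smog_index(complex_words: int, sentences: int) -> float:
    """SMOG Index from a count of complex words (3+ syllables) and sentences."""
    scaled = complex_words * (30 / sentences)  # step 2: scale to 30 sentences
    return math.sqrt(scaled) + 3               # steps 3 and 4


print(smog_index(9, 30))   # 9 complex words in 30 sentences
print(smog_index(18, 60))  # same density over twice the text
```

Both calls print 6.0: the square root of 9 is 3, plus 3.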

Style example

Below is the output of style with William Shakespeare's "The Merchant of Venice" as the input:

readability grades:
        Kincaid: 2.8
        ARI: 2.9
        Coleman-Liau: 7.8
        Flesch Index: 94.7
        Fog Index: 5.5
        Lix: 24.2 = below school year 5
        SMOG-Grading: 6.1
sentence info:
        89249 characters
        22274 words, average length 4.01 characters = 1.20 syllables
        2043 sentences, average length 10.9 words
        55% (1131) short sentences (at most 6 words)
        15% (308) long sentences (at least 21 words)
        647 paragraphs, average length 3.2 sentences
        9% (191) questions
        23% (473) passive sentences
        longest sent 132 wds at sent 1396; shortest sent 1 wds at sent 8
word usage:
        verb types:
        to be (634) auxiliary (492)
        types as % of total:
        conjunctions 6% (1315) pronouns 16% (3612) prepositions 9% (1997)
        nominalizations 1% (130)
sentence beginnings:
        pronoun (366) interrogative pronoun (125) article (87)
        subordinating conjunction (37) conjunction (77) preposition (64)

The readability grades above are based on the following formulae:

Kincaid = 0.39*(total_words/total_sentences) + 11.8*(total_syllables/total_words) - 15.59
ARI = 4.71*(characters/words) + 0.5*(words/sentences) - 21.43
Coleman-Liau = 5.89*(characters/words) - 30*(sentences/words) - 15.8
Flesch = 206.835 - 1.015*(total_words/total_sentences) - 84.6*(total_syllables/total_words)
Fog = 0.4*[(words/sentences) + 100*(complex_words/words)]
SMOG = sqrt[total_complex_words * (30/total_sentences)] + 3
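As a rough check, the ARI and Kincaid formulae can be applied to the counts in the style output for "The Merchant of Venice". This is a hypothetical verification script, not part of style. It assumes exactly 1.20 syllables per word, whereas style prints that average rounded, so the Kincaid value is only approximate. Only these two grades are checked here.

```python
# Counts copied from the "sentence info" part of the style output above.
characters = 89249
words = 22274
sentences = 2043
syllables_per_word = 1.20  # style prints this value rounded to two decimals

words_per_sentence = words / sentences  # style: "average length 10.9 words"

ari = 4.71 * (characters / words) + 0.5 * words_per_sentence - 21.43
kincaid = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59

print(f"ARI: {ari:.1f}")          # style reports 2.9
print(f"Kincaid: {kincaid:.1f}")  # style reports 2.8
```

Both values round to the grades that style printed.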

See also

CPAN

Wikipedia

References

  1. Readability statistics — accessed 2007-02-10.
  2. Coleman M, Liau TL (1975). A computer readability formula designed for machine scoring. Journal of Applied Psychology 60:283-284.
  3. Plain Language At Work Newsletter — 2004-03-23 (accessed 2007-02-10).
  4. McLaughlin GH (1969). SMOG grading: A new readability formula (PDF). Journal of Reading 12(8):639-646.

Further reading

  • Flesch R (1948). A new readability yardstick. Journal of Applied Psychology 32:221-233.
  • Kincaid JP, Fishburne Jr RP, Rogers RL, Chissom BS (1975). Derivation of new readability formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy enlisted personnel. Research Branch Report 8-75, Millington, TN: Naval Technical Training, U. S. Naval Air Station, Memphis, TN.
  • Farr JN, Jenkins JJ, Paterson DG (1951). Simplification of Flesch Reading Ease Formula. Journal of Applied Psychology 35(5):333-337.
