Paraphrase algorithm
From Christoph's Personal Wiki
Latest revision as of 03:53, 18 June 2007
This article describes my work on a paraphrase algorithm that uses Project Gutenberg as its corpus. Paraphrasing is a natural language processing (NLP) task in computational linguistics.
The idea is to first extract comparable sub-corpora from the main corpus and use them as the training set. To start, I will build a small parallel subset and use it for sentence clustering. The aim is to collect data for inferring templates from sentences that appear similar on a word-by-word level.
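A minimal sketch of what "similar on a word-by-word level" could mean in practice. The page does not name a similarity measure, so this assumes Python's standard `difflib.SequenceMatcher` ratio over lowercased word lists, plus a hypothetical cut-off `threshold = 0.5`. Sentences are grouped greedily: each one joins the first cluster whose seed sentence is similar enough, otherwise it starts a new cluster. The Isaiah, Micah and Joel verses from the example serve as input.

```python
import difflib
import re


def words(sentence: str) -> list[str]:
    """Lowercased word tokens; punctuation is dropped."""
    return re.findall(r"[a-z]+", sentence.lower())


def word_similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1]: difflib's ratio over the two token lists.

    Assumption: difflib is only a stand-in for whatever measure the project uses.
    """
    return difflib.SequenceMatcher(None, words(a), words(b), autojunk=False).ratio()


def cluster(sentences: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Greedy one-pass clustering; `threshold` is a hypothetical cut-off."""
    clusters: list[list[str]] = []
    for s in sentences:
        for c in clusters:
            # Compare only against the cluster's first (seed) sentence.
            if word_similarity(c[0], s) >= threshold:
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters


A1 = ("And he shall judge among the nations, and shall rebuke many people: "
      "and they shall beat their swords into plowshares, and their spears into "
      "pruninghooks: nation shall not lift up sword against nation, neither "
      "shall they learn war any more.")
B1 = ("And he shall judge among many people, and rebuke strong nations afar off; "
      "and they shall beat their swords into plowshares, and their spears into "
      "pruninghooks: nation shall not lift up a sword against nation, neither "
      "shall they learn war any more.")
C1 = ("Beat your plowshares into swords, and your pruninghooks into spears: "
      "let the weak say, I am strong.")

groups = cluster([A1, B1, C1])
print(len(groups))  # 2: A1 and B1 together, C1 alone
```

A1 and B1 share long runs of identical words, so their ratio is well above the cut-off, while C1 reuses some of the same vocabulary in a different order and stays in its own cluster.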
Examples
Sem Experiment
See: Statistical Paraphrasing Project from the Cornell Natural Language Processing Group
Let A1 = Isaiah 2:4, B1 = Micah 4:3, and C1 = Joel 3:10, with the following texts:
- A1
- And he shall judge among the nations, and shall rebuke many people: and they shall beat their swords into plowshares, and their spears into pruninghooks: nation shall not lift up sword against nation, neither shall they learn war any more.
- B1
- And he shall judge among many people, and rebuke strong nations afar off; and they shall beat their swords into plowshares, and their spears into pruninghooks: nation shall not lift up a sword against nation, neither shall they learn war any more.
- C1
- Beat your plowshares into swords, and your pruninghooks into spears: let the weak say, I am strong.
- sentence clustering:
A1a: And he shall judge among the nations,
B1a: And he shall judge among many people,

A1b: and shall rebuke many people:
B1b: and rebuke strong nations afar off;

A1c: and they shall beat their swords into plowshares,
B1c: and they shall beat their swords into plowshares,

A1d: and their spears into pruninghooks:
B1d: and their spears into pruninghooks:

A1e: nation shall not lift up sword against nation,
B1e: nation shall not lift up a sword against nation,

A1f: neither shall they learn war any more.
B1f: neither shall they learn war any more.
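The clause pairs above can also be produced mechanically. This sketch is an assumption about how the step could be automated, not the procedure used on this page: it cuts each verse after every `,`, `:`, `;` or `.`, then diffs the two lists of normalized clauses with Python's `difflib.SequenceMatcher`. Identical clauses come back as `equal` runs; differing clauses inside an equal-length `replace` run are paired by position. `split_clauses`, `norm` and `align_clauses` are hypothetical names.

```python
import difflib
import re
import string


def split_clauses(verse: str) -> list[str]:
    """Cut a verse after every , : ; or . (keeping the punctuation)."""
    return [c.strip() for c in re.findall(r"[^,:;.]+[,:;.]?", verse) if c.strip()]


def norm(clause: str) -> str:
    """Case- and punctuation-insensitive form used when comparing clauses."""
    return " ".join(re.findall(r"[a-z]+", clause.lower()))


def align_clauses(a: list[str], b: list[str]):
    """Return (identical, paired) lists of (i, j) clause-index pairs.

    identical: clauses that are the same after normalization.
    paired:    differing clauses matched by position inside an equal-length
               'replace' run; other runs (inserts, deletes, unequal
               replaces) are left unpaired in this hypothetical sketch.
    """
    matcher = difflib.SequenceMatcher(
        None, [norm(c) for c in a], [norm(c) for c in b], autojunk=False)
    identical, paired = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            identical.extend(zip(range(i1, i2), range(j1, j2)))
        elif tag == "replace" and i2 - i1 == j2 - j1:
            paired.extend(zip(range(i1, i2), range(j1, j2)))
    return identical, paired


A1 = ("And he shall judge among the nations, and shall rebuke many people: "
      "and they shall beat their swords into plowshares, and their spears into "
      "pruninghooks: nation shall not lift up sword against nation, neither "
      "shall they learn war any more.")
B1 = ("And he shall judge among many people, and rebuke strong nations afar off; "
      "and they shall beat their swords into plowshares, and their spears into "
      "pruninghooks: nation shall not lift up a sword against nation, neither "
      "shall they learn war any more.")

a, b = split_clauses(A1), split_clauses(B1)
identical, paired = align_clauses(a, b)
letters = string.ascii_lowercase
for i, j in identical:
    print(f"A1{letters[i]} = B1{letters[j]}")  # A1c = B1c, A1d = B1d, A1f = B1f
for i, j in paired:
    print(f"A1{letters[i]}: {a[i]}  |  B1{letters[j]}: {b[j]}")
```

Keeping the alignment monotone (clauses are matched in order) matters here: a best-match search per clause would pair A1b ("and shall rebuke many people:") with B1a, because both end in "many people".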
- inducing patterns (arguments in square brackets):
{A1c, A1d, A1f} = {B1c, B1d, B1f}

A1a: And he shall judge among [the nations],
B1a: And he shall judge among [many people],

A1b: and [shall rebuke] [many people]:
B1b: and [rebuke] [strong nations] [afar off];

A1e: nation shall not lift up [sword] against nation,
B1e: nation shall not lift up [a sword] against nation,
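A rough way to induce such patterns automatically, sketched under the assumption that a word-level diff is an acceptable stand-in for the hand analysis: diff two aligned clauses word by word with `difflib.SequenceMatcher` and put square brackets around every span where they differ. `bracket_differences` and `_words_and_final_punct` are hypothetical helpers, not part of any existing tool.

```python
import difflib


def _words_and_final_punct(clause: str) -> tuple[list[str], str]:
    """Split a clause into its words and its trailing , : ; or . (if any)."""
    body = clause.rstrip(",:;.")
    return body.split(), clause[len(body):]


def bracket_differences(x: str, y: str) -> tuple[str, str]:
    """Bracket the word spans where two aligned clauses differ.

    Words are compared case-insensitively; the original spelling is kept in
    the output. A side with nothing to contribute to a differing span (a pure
    insertion or deletion) gets no brackets for that span.
    """
    xw, xp = _words_and_final_punct(x)
    yw, yp = _words_and_final_punct(y)
    matcher = difflib.SequenceMatcher(
        None, [w.lower() for w in xw], [w.lower() for w in yw], autojunk=False)
    xout: list[str] = []
    yout: list[str] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        xs, ys = " ".join(xw[i1:i2]), " ".join(yw[j1:j2])
        if tag == "equal":
            xout.append(xs)
            yout.append(ys)
            continue
        # 'replace', 'delete' or 'insert': bracket whatever each side has.
        if xs:
            xout.append(f"[{xs}]")
        if ys:
            yout.append(f"[{ys}]")
    return " ".join(xout) + xp, " ".join(yout) + yp


pairs = [
    ("And he shall judge among the nations,", "And he shall judge among many people,"),
    ("and shall rebuke many people:", "and rebuke strong nations afar off;"),
    ("nation shall not lift up sword against nation,",
     "nation shall not lift up a sword against nation,"),
]
for x, y in pairs:
    for line in bracket_differences(x, y):
        print(line)
```

Because it only looks at surface words, its brackets do not always match the hand-induced ones above. It reproduces A1a/B1a exactly; for A1b/B1b it gives `and [shall] rebuke [many people]:` and `and rebuke [strong nations afar off];`; for A1e/B1e it leaves A1e unbracketed and marks only the extra word in B1e, `nation shall not lift up [a] sword against nation,`.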
References
- Barzilay R, Lee L (2003). "Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment". Proceedings of HLT-NAACL, pp 16-23.
See also
- wikipedia:Natural language processing
- wikipedia:Computational linguistics
- wikipedia:Corpus linguistics
- wikipedia:Part-of-speech tagging (POS tagging or POST, also called grammatical tagging)
- Computational Linguistics (journal)
External links
- COMPUTATIONAL LINGUISTICS: Models, Resources, Applications (free online book)
- An informal explanation of "Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment"
- Javascript Diff Algorithm
- DIRT Paraphrase Collection
- Distributional Hypothesis
- Statistical Semantics
- VerbOcean