Wikipedia:Wikipedia Signpost/2009-04-13/Dispatches: Difference between revisions

From Wikipedia, the free encyclopedia

Revision as of 21:15, 3 April 2009

The Signpost
 
Dispatches

Let's get serious about plagiarism

Plagiarism, as Wikipedia's article on the topic explains, "is the use or close imitation of the language and ideas of another author and representation of them as one's own original work". It is an important topic for the project, both because one of the common complaints made against Wikipedia is that it is easily plagiarized and because the encyclopedia itself contains instances of plagiarized material.

This dispatch is concerned with the second problem: the presence of plagiarized material on Wikipedia—how to recognize it and how to avoid adding to it. These issues are not as simple as they may at first appear. Plagiarism is often accidental or inadvertent, but it is still plagiarism. The best way to address it is to understand clearly what it is and how to avoid it.

Understanding plagiarism

The problem with plagiarism is not that it involves the use of other people's ideas—that is only to be expected, as Wikipedia is not meant to be a primary source, nor to contain original research; indeed, everything that appears on Wikipedia should be rooted in a reliable source. The problem is when other people's words or ideas are misrepresented, specifically when they are presented as though they were "an editor's own original work". Even if a contributor provides a citation for a sentence, it may still be plagiarism if he or she does not clearly note the use of the same wording as appears in the source. Citations are universally understood as indicating a source for information; lacking quotation marks, readers do not expect a citation to indicate that language has been taken from the source as well.

Plagiarism is not the same as copyright infringement. Copyright infringement is sometimes a form of plagiarism, but even the reproduction of public domain material without proper attribution is plagiarism and must be avoided. For instance, one report about a plagiarism scandal on Wikipedia claimed that "Wikipedia editors ... declared a handful [of the allegedly plagiarized articles] to be OK because copied passages came from the public domain."[1] If this was indeed the reaction of Wikipedia editors, they were mistaken. To make this clear, think of the famous opening line of Jane Austen's novel Pride and Prejudice (1813): "It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife."[2] The text of this novel, like the text of the 1911 Encyclopedia Britannica, is in the public domain in the United States, because it was published before 1923. However, this does not mean that Wikipedia's editors could insert this sentence into the plot summary of the article on Pride and Prejudice without quotation marks. These are Austen's words and even though she no longer owns the copyright to them, others need to acknowledge that the precise diction (word choice) and syntax (word order) are hers. If they do not do so, they are plagiarizing Austen. Apart from this moral dimension, as a matter of precision and accuracy, Wikipedia has a duty to inform its readers of the source of such a sentence.

Wikipedia policies have much to say about copyright violations; however, they have far less to say about plagiarism. The guideline on the topic was written only last year and has yet to be fully adopted by the community. Wikipedia's co-founder Jimmy Wales took a clear stand on the issue in 2005: "Let me say quite firmly that for me, the legal issues are important, but far far far more important are the moral issues. We want to be able, all of us, to point at Wikipedia and say: we made it ourselves, fair and square."[3]

Knowing what isn't plagiarism

From a plagiarism standpoint, not everything requires attribution. When a fact is "common knowledge"—that is, generally known—it is not plagiarism to repeat it, even if a contributor him or herself learned it from a specific reference. For example, that Vientiane is the capital of Laos is common knowledge. Also not plagiarism is the reproduction of non-creative lists of basic information, such as an alphabetical directory of actors appearing in a film. While Wikipedia's verifiability policy encourages citing such information, a failure to do so is not plagiarism.

However, though common knowledge and non-creative lists of basic facts do not "belong" to a source and do not require credit to avoid plagiarism, less commonly known information and creative lists do. Likewise, creative presentation even of common knowledge "belongs" to its original author. Contributors can safely reuse the fact, but not the language, unless it is utterly devoid of creativity, like a title or a common phrase.[4] Less-commonly known facts or interpretations of facts must be cited to avoid plagiarism, and creative text must either be quoted or properly revised.

Avoiding plagiarism

To construct articles that read smoothly and naturally while remaining faithful to their sources, it is essential to learn how to properly use other people's ideas and words. Wikipedia's contributors need to know when to give credit, how to revise text, how to best use sources, and when to use quotations (that is, with quotation marks).

Revising

The two primary forms of revision are paraphrasing and summarizing. These generally differ in the amount of detail. Summaries are often used for longer expanses of text and cover only the major points in a passage, omitting or touching lightly on examples or definitions. A summary is generally considerably shorter than the original source. Paraphrasing remains closer to the original and may be nearly as long as or even longer than the source.

Revising text, whether by paraphrasing or summarizing, is a difficult skill, and much inadvertent plagiarism arises as a result. Commonly, for instance, editors believe that by changing a few words here or there—or even by changing a great number of the words found in the original source text—they have avoided plagiarism. It ain't necessarily so.

Problems with revision: Example 1

In the following example, a writer attempting to stay close to the source's information also remained too close to the source's language. Although sections of the original source have been removed, almost all of the language in the Wikipedia section is copied directly from the source.

Wikipedia article:

The Spokane Falls and its surroundings were a gathering place and focus for settlement for the area's indigenous people due to the fertile hunting grounds and abundance of salmon in the Spokane River. For unrecorded millennia, the Spokane tribe lived in the area around the Spokane River and led a seasonal way of life that consisted of fishing, hunting, and gathering. The Spokane Falls were the tribe's center of trade and fishing. Early in the 19th century, white fur trappers from the east came into the northern Columbia Plateau forests. They were friendly with the native people they encountered. In 1810, the Spokane commenced major trading with white men when the North West Company's Spokane House was established on their lands.

Source

For unrecorded millennia, the Spokane tribe lived in the area around the Spokane River, leading a seasonal way of life consisting of fishing, hunting and gathering endeavors.
The Spokane people shared their territory and language with several other tribes, including the Colville, Flathead, and Kalispel tribes. The Spokane consisted of three bands that lived along the Spokane River. The Spokane Falls were the tribe's center of trade and fishing....
Early in the 19th century, Indian and white fur trappers out of the east came into the northern Columbia Plateau forests. They were friendly with the native people they encountered. They often lived with them, took on their customs, and intermarriage was not uncommon. In 1810, the Spokane commenced major trading with white men. The Northwest Company's Spokane House was established on their lands; it was moved to Fort Colville in 1826.

Problems with revision: Example 2

In this example, the writer has attempted to paraphrase the source, but retained much of the diction (word choice) and syntax (word order and sentence structure) of the original passage. Wikipedia's article very closely resembles the original; the small changes are not enough to constitute an "original" rewriting of the passage.

Wikipedia:

"A statement issued by the receiver, Deloitte's David Carson, confirmed that, of the 670 employees, 480 of them would be laid off. The workers responded angrily to this unexpected decision and at least 100 of them began an unofficial sit-in in the visitors' gallery at the factory that night. They insisted they would refuse to leave until they had met with Carson. Following the revelations, there was a minor scuffle during which the main door to the visitors' centre was damaged. Local Sinn Féin Councillor Joe Kelly was amongst those who occupied the visitors' gallery."

Source

"A statement from the receiver, David Carson of Deloitte, confirmed that 480 of the 670 employees have been made redundant....At least 100 Waterford Crystal employees are refusing to leave the visitors' gallery at the factory tonight and are staging an unofficial sit-in. The employees say they will not be leaving until they meet with Mr Carson. There were some scuffles at one point and a main door to the visitors' centre was damaged....Local Sinn Féin Councillor Joe Kelly, who is one of those currently occupying the visitors' gallery, said the receiver had told staff he would not close the company while there were interested investors."

Analysis:

  • "A statement issued by the receiver, Deloitte's David Carson, confirmed that, of the 670 employees, 480 of them would be laid off" vs. "A statement from the receiver, David Carson of Deloitte, confirmed that 480 of the 670 employees have been made redundant" - The structure of Wikipedia's statement is essentially the same as the original. Changing a single word and slightly reordering one phrase is not enough to constitute a paraphrase.
  • "They insisted they would refuse to leave until they had met with Carson" vs. "The employees say they will not be leaving until they meet with Mr Carson" - The structure of this sentence is the same.
  • "there was a minor scuffle during which the main door to the visitors' centre was damaged" vs. "There were some scuffles at one point and a main door to the visitors' centre was damaged" - The structure of the sentence and the language is the same.
  • "Local Sinn Féin Councillor Joe Kelly was amongst those who occupied the visitors' gallery" vs. "Local Sinn Féin Councillor Joe Kelly, who is one of those currently occupying the visitors' gallery" - This slight rewording does not change the fact that the underlying structure and language is the same.

Revision done right

In terms of both plagiarism and copyright, the author of a text not only "owns" the precise, creative language he or she uses, but also less tangible creative features of presentation, which may incorporate the structure of the piece and the choice of facts. In terms of plagiarism, if not copyright, the author also "owns" the facts or his or her interpretation of them, unless these are, as mentioned above, common knowledge. Revising to avoid plagiarism means completely restructuring a source in word choice and arrangement while making sure to give due credit for the ideas and information taken from it. Editors should always compare their final drafts with the sources they've used, just to make sure that they haven't accidentally come too close in language and structure or failed to attribute when necessary.
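
The comparison of a draft against its source can be partly mechanized. The following is a minimal Python sketch, not an established tool: the function names `word_ngrams` and `shared_phrases`, and the choice of four-word sequences as the unit of comparison, are assumptions made for illustration. It lists every run of four consecutive words that a draft shares with a source. An empty result does not clear a draft, since structure and choice of facts can be copied without repeating any exact phrase, but a long list is a strong hint that the revision stayed too close.

```python
import re


def word_ngrams(text, n=4):
    """Return the set of lowercase n-word sequences found in text."""
    # Simplified tokenization: runs of word characters and apostrophes.
    words = re.findall(r"[\w']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_phrases(draft, source, n=4):
    """Return, sorted, the n-word sequences present in both draft and source."""
    common = word_ngrams(draft, n) & word_ngrams(source, n)
    return sorted(" ".join(gram) for gram in common)


# Text taken from Example 1 above.
draft = "They were friendly with the native people they encountered."
source = (
    "Early in the 19th century, Indian and white fur trappers out of the east "
    "came into the northern Columbia Plateau forests. They were friendly with "
    "the native people they encountered."
)

for phrase in shared_phrases(draft, source):
    print(phrase)
```

Here every four-word run in the draft also occurs in the source, which is what one would expect from a sentence copied verbatim.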

Use of sources

One way editors may minimize the tendency to reuse text is not to copy and paste it on their computers as the basis for working drafts. Instead, they should print out source materials, organize them, and then create a working draft in their own words and with their own scheme of organization. Taking notes, organizing them according to a new outline, and only then writing a draft reduces the temptation (and makes it harder) to adopt verbatim language from the sources.

At the same time, when taking notes from a source for their own use, editors may find it useful to take them verbatim, with quotation marks, if they will not have access to that source as they are writing their final draft. If different language is used in note-taking, an editor may find him or herself accidentally restoring some of the author's original words when constructing a draft. Being able to see at a glance exactly how the source was written can help avoid this.

Use multiple sources, if possible. If writing an article for Wikipedia based on a single text, an editor may find it more difficult to avoid following too closely, as he or she will necessarily be limited to those details selected by the author of that original source. It's not impossible to revise and reorganize a single source sufficiently to avoid plagiarism or copyright infringement, but it is more difficult. Compiling information from multiple sources helps to avoid following too closely on one.

Quoting

When editors do want to use verbatim excerpts, there is one very simple way to avoid plagiarism: use direct quotations. The words should be reproduced exactly as they appear in the original source, enclosed within quotation marks and identified by an inline citation after the quotation. However, direct quotations should not be overused. They run the risk of copyright infringement if the sources used are not free content, that is, public domain or permissively licensed for reuse. Wikipedia's non-free content guidelines offer some guidance on when to use direct quotations and also remind editors that "Extensive quotation of copyrighted text is prohibited." But even when free sources are used, the overuse of direct quotation produces articles that are simply collections of quotations. These articles fail to explain the broader context of the material presented in the quotation, and readers are left to piece together the story the article is trying to tell.

Spotting plagiarism

Wikipedia's editors should, of course, be careful not to add plagiarized material to the site. However, they can also help protect Wikipedia by spotting plagiarism and helping to correct it. When large sections of a source are copied word-for-word into Wikipedia, it is often easy to spot and repair. The use of ideas or uncommon facts without credit, possibly the most common form of plagiarism on Wikipedia, can be repaired by sourcing. Detecting and dealing with subtler forms of plagiarism may be more of a challenge, but it's not impossible.

Certain red flags for plagiarism include:

  • Inconsistent authorial voice. Although many articles on Wikipedia are multi-authored and thus have several authorial voices, sudden, jarring switches in tone throughout an article may indicate plagiarism. For example, if the "History" section in an article on a city sounds like a tourist brochure and the "Climate and geography" section is filled with highly-technical and jargon-filled language, readers might suspect that specialized sources have been followed too closely. A particular tip-off is the sudden introduction in a single passage of sophisticated text or ideas that do not seem to mesh with the authorial voice of the material around it.
  • Inconsistent language. If the language of an article or passage is colloquial or otherwise feels "off"—for example if jargon or idioms are used incorrectly—readers might have reason to wonder if a contributor has improperly used a source.
  • Atypical elegance. A reader may have cause for concern if a section or an article seems to have been written "too well". Most writing on Wikipedia is not at the level of professional publications. Therefore, when readers suddenly come across professional-level writing on Wikipedia, with no spelling or grammar errors, they may want to investigate further.
  • Hasty construction. Wikipedia has certain processes and contests that reward editors with barnstars or similar accolades for creating and expanding articles. The desire to obtain these rewards may prompt editors, in their haste, to inadequately revise source text. Readers may wish to consider the time frame and context in which an article was created or expanded.

If a Wikipedian suspects plagiarism, he or she might want to begin by checking the article's history. If it doesn't sound like it was written by the same person but the contributor history suggests that it was, there could be good reason for concern. It may be worth checking the contribution history of an editor across a number of articles, to see if there is a discernible authorial voice or if there is a pattern of such inconsistency. There may also be a history of problems, with older notes on the editor's talk page concerning plagiarism or copyright infringement.

Another good starting point is to review the article's sources. Particularly when plagiarism results from misunderstanding rather than intent to deceive, a contributor may clearly identify the sources from which he or she has plagiarized, and even link to them. Concerned readers can also utilize search engines and plagiarism checkers for plagiarism detection. When searching manually, it is helpful to isolate small sections of text from an article. But, nota bene, many of the results found in this fashion may be from mirrors and forks of Wikipedia itself, particularly if the article has been around a while.
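
Isolating small sections of text for a manual search can be sketched in a few lines of Python. This is only an illustration: the function name `search_snippets` and its parameters (eight words per phrase, at most three phrases) are assumptions, not part of any Wikipedia tool. It takes the opening words of each sufficiently long sentence and wraps them in quotation marks, the form most search engines treat as an exact-phrase query. The phrases still have to be pasted into a search engine by hand, and hits from mirrors and forks of Wikipedia need to be discounted as described above.

```python
import re


def search_snippets(text, words_per_snippet=8, max_snippets=3):
    """Return quoted opening phrases of the longer sentences in text."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    snippets = []
    for sentence in sentences:
        words = sentence.split()
        if len(words) >= words_per_snippet:
            # Keep the first few words and drop trailing punctuation.
            phrase = " ".join(words[:words_per_snippet]).rstrip(".,;:!?")
            snippets.append('"' + phrase + '"')
    return snippets[:max_snippets]


# Text taken from Example 1 above.
passage = (
    "For unrecorded millennia, the Spokane tribe lived in the area around "
    "the Spokane River. The Spokane Falls were the tribe's center of trade "
    "and fishing."
)

for query in search_snippets(passage):
    print(query)
```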

Addressing plagiarism

If on investigation an article does seem to follow too closely on the language and structure of another work, the first point to consider is whether it is a matter of copyright infringement or plagiarism. If the source is not free and the text may represent a legal concern for Wikipedia, the procedures set out at Wikipedia's copyright violations policy should be followed. However, even if the source is free, steps should be taken to remedy plagiarism. Wikipedia's proposed guideline on plagiarism suggests politely discussing concerns with the contributor or repairing the plagiarism. If it can be attributed, revised or turned into a usable quotation, it should be. If the contributor who discovers the problem is unable to repair it or uncertain of how it should be addressed, it should be brought to the attention of other contributors. There are templates such as {{Copypaste}} or {{Close paraphrase}} that may draw assistance; concerns might be noted at an appropriate forum or WikiProject.

As the main page says, Wikipedia is "the free encyclopedia that anyone can edit." Anyone can, and should, repair plagiarism.

Notes

  1. ^ Jesdanun, Anick (2006-11-04). "Wikipedia Critic Finds Copied Passages". Associated Press. Retrieved 2007-12-26.
  2. ^ Austen, Jane. Pride and Prejudice. 1813. Chicago: Charles Scribner's Sons, 1914, pg. 1. Google Books. Retrieved 18 March 2009.
  3. ^ Wales, Jimmy (2005-12-28). "Comment". Wikipedia. Retrieved 2009-03-31.
  4. ^ From a copyright standpoint, the level of creativity required is minimal. The U.S. Supreme Court has indicated that "[t]he vast majority of works make the grade quite easily, as they possess some creative spark, 'no matter how crude, humble or obvious' it might be." (Feist Publications v. Rural Telephone Service, 499 U.S. 340 (United States Supreme Court, 1991)). Similarly, most text will be creative enough that its replication will be plagiarism.