Plagiarism Detection Techniques






Plagiarism is “the act of copying materials without actually acknowledging the original source”. It has become widespread in recent times, driven by the growing amount of material available in electronic form and by easy access to the internet. Manual detection of plagiarism is difficult and time-consuming because of the vast amount of content available. Techniques are now available to help detect plagiarism. As the amount of programming code being created increases, techniques have also become available to detect plagiarism in source code. Current research focuses on developing algorithms that can compare documents and detect plagiarism. This paper presents a few techniques used in plagiarism detection, along with some tools that are in use.

Plagiarism has become a worldwide problem and is increasing day by day. The problem is getting worse mainly because of the growing volume of online publications. Relying only on exact word or phrase matching is no longer sufficient for plagiarism detection: people paraphrase or rearrange words to give their sentences a new look and then present themselves as the authors of the material. Plagiarism detection techniques let us compare a given document with target material, which may be a particular document or the contents of a repository. The different techniques used in plagiarism detection algorithms are discussed in detail here. I have given more emphasis to source code plagiarism. A few case studies show that detection can be done within a large repository. The efficiency of detection and the time taken to produce a result depend on the algorithms used.
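To illustrate why exact matching falls short, here is a minimal Python sketch that is not taken from any particular tool. It compares two sentences by the overlap of their word bigrams (Jaccard similarity), which is one simple way to tolerate rearranged wording. The function names, the example sentences, and the bigram size are all assumptions chosen for illustration.

```python
import re


def words(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shingles(text, n=2):
    """Return the set of overlapping n-word sequences (shingles) in the text."""
    tokens = words(text)
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def exact_match(source, suspect):
    """Naive check: does the suspect text appear verbatim in the source?"""
    return suspect.lower() in source.lower()


def similarity(source, suspect, n=2):
    """Shingle-based similarity between two texts, in the range 0.0 to 1.0."""
    return jaccard(shingles(source, n), shingles(suspect, n))


# Hypothetical example sentences: the second rearranges the first.
source = "Manual detection of plagiarism is time consuming"
suspect = "Detection of plagiarism is time consuming when done manually"

print(exact_match(source, suspect))           # False
print(round(similarity(source, suspect), 2))  # 0.56
```

The exact substring check fails because the words were rearranged, while 5 of the 9 distinct bigrams are shared, giving a score of about 0.56. A real detector would need a decision threshold and more robust preprocessing; the sketch only shows the idea.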

According to the Merriam-Webster Online Dictionary, to plagiarize means:

• to steal and pass off (the ideas or words of another) as one's own

• to use (another's production) without crediting the source

• to commit literary theft

• to present as new and original an idea or product derived from an existing source

The expression of original ideas is considered intellectual property and is protected by copyright laws, just like original inventions. Almost all forms of expression fall under copyright protection as long as they are recorded in some way (such as a book or a computer file). In other words, plagiarism is an act of fraud. It involves both stealing someone else's work and lying about it afterward.

The following are all considered plagiarism:

• turning in someone else's work as your own

• copying words or ideas from someone else without giving credit

• failing to put a quotation in quotation marks

• giving incorrect information about the source of a quotation

• changing words but copying the sentence structure of a source without giving credit

• copying so many words or ideas from a source that it makes up the majority of your work, whether you give credit or not

Plagiarism can be deliberate or accidental. Figure 1 shows the range between deliberate and accidental plagiarism. Deliberate plagiarism may occur when a person’s self-esteem is very low: the person knowingly takes somebody else’s work and claims it as their own, or hires somebody else to do the work. Accidental plagiarism occurs when somebody unknowingly uses a phrase or copies words without acknowledging the author of the material.


Plagiarism is rampant now. With most data available to us in digital format, the avenues for plagiarism are opening up. To prevent this kind of cheating and to acknowledge the originality of authors, new detection techniques need to be created. We need not only fast systems but also systems that can collect information about plagiarism on the web or in large repositories. Because a large number of detection tools are available for text-based plagiarism, the number of copying incidents in this field has reduced considerably. We now use many computer-based applications, so new techniques must also be developed and implemented to protect the intellectual property in source code.

