Looking at the title of this article, readers may assume I am advocating "non-original" content. I am not. But an ordinary person has limited mental capacity, and even by racking one's brain nobody can produce a high-quality original article every day. That is why pseudo-original writing is a skill worth studying.

The detection algorithm is based on an inverted index, and a page signature can be added to the index as an extra parameter. The signature algorithm has to be cheap to compute so that it suits large-scale processing. It can combine several signals, such as keyword positions, keyword weights, or word order.
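As a rough illustration of such a cheap signature, the sketch below picks the highest-weighted words of a page, keeps them in their order of first appearance, and hashes the result. It is only a sketch under stated assumptions: word frequency stands in for keyword weight, MD5 is used as an inexpensive fingerprint, and the names `page_signature`, `signature_index`, and `add_page` are hypothetical, not taken from any real search engine.

```python
import hashlib
import re
from collections import Counter, defaultdict


def page_signature(text, top_k=5):
    """Return a short fingerprint built from the top_k weighted words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(words)  # assumption: frequency stands in for weight

    # Remember where each word first appears (the "position" signal).
    first_pos = {}
    for i, word in enumerate(words):
        first_pos.setdefault(word, i)

    # Highest weight first, ties broken by earliest position.
    top = sorted(counts, key=lambda w: (-counts[w], first_pos[w]))[:top_k]

    # Keep the chosen words in document order (the "word order" signal).
    ordered = sorted(top, key=lambda w: first_pos[w])
    return hashlib.md5(" ".join(ordered).encode("utf-8")).hexdigest()


# Hypothetical index: signature -> ids of pages that share it.
signature_index = defaultdict(set)


def add_page(page_id, text):
    """Store the page under its signature and return the signature."""
    sig = page_signature(text)
    signature_index[sig].add(page_id)
    return sig
```

Two pages that land under the same signature are candidates for duplicate content. A real system would also filter out common function words before weighting, which this sketch skips.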

Two approaches help explain how a search engine judges page similarity. The core idea of the first is to fingerprint the content itself, and there are many ways to do this: for example, take runs of consecutive words, shift along the text, and compare them, or take the i-th word of the N-th line, and so on. The second approach compares the few highest-weighted keywords of each page, and can even use the weights themselves as an additional condition. Both approaches require a huge amount of computation. They have only been realized in experiments and are not used in commercial operation because the cost is too high.
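The two approaches can be sketched roughly as follows. This is a minimal illustration, not how any particular search engine does it: the first function compares runs of consecutive words (often called shingles), the second compares the top keywords of each page, and both score the overlap with the Jaccard ratio. Word frequency stands in for keyword weight, and all function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shingles(text, n=3):
    """Approach 1: every run of n consecutive words in the text."""
    words = tokenize(text)
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shingle_similarity(a, b, n=3):
    """Jaccard overlap of the two pages' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def top_keywords(text, k=5):
    """Approach 2: the k most frequent words (frequency as weight)."""
    counts = Counter(tokenize(text))
    return {word for word, _ in counts.most_common(k)}


def keyword_similarity(a, b, k=5):
    """Jaccard overlap of the two pages' top-keyword sets."""
    ka, kb = top_keywords(a, k), top_keywords(b, k)
    if not ka and not kb:
        return 1.0
    return len(ka & kb) / len(ka | kb)
```

Comparing every page against every other page this way grows quadratically with the number of pages, which is consistent with the point that these methods are too expensive to run at commercial scale.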

Before looking at how pseudo-original content works, let us first see how a search engine delivers non-duplicate, valuable information to users. Pages carrying the same information usually sit on different domains and different pages, and their source code is not necessarily identical. So, from the crawling system through content extraction to page analysis, one important job of the search engine is to judge the similarity of web pages and determine the nature of each page. If a page is judged non-original, it is treated as a second-class citizen in the index, and its weight cannot compare with that of the original page. This gap is hard to make up in other ways.
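To see why differing source code does not hide identical content, here is a small sketch of the content-extraction step, assuming the engine simply strips the markup and compares what is left. It uses only Python's standard `html.parser` module. The class and function names are hypothetical, and real extraction is far more elaborate.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style blocks."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def extract_text(html):
    """Return the page text with markup removed and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())
```

Two pages with different templates but the same article body yield the same extracted text, so changing only the surrounding markup does not make a copy look original.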


Having read this far, do you not find that pseudo-original writing also takes a great deal of knowledge, and even feels more complex than writing an original? Indeed it does. But Dan believes that a truly perfect pseudo-original, written with the search engine's crawl rules in mind, discards the dregs of the source and keeps its essence. A perfect pseudo-original may be more readable and more attractive than the original.

A pseudo-original succeeds once you have analyzed and eliminated whatever the search engine can recognize. In other words, the key to success is whether the pseudo-original can get through this check. Of course, this is skilled work, because even an original article is not always recognized as such by the spider.

