mirror of
https://github.com/yacy/yacy_search_server.git
synced 2024-09-21 00:00:13 +02:00
60078cf322
To use this, you must first run the -incollection command (see SVN 5687), which produces the used.dump file that this step needs. You can then use that file to compare its URL hashes with the URLs in the URL-DB. To do that, execute java -Xmx1000m -cp classes de.anomic.data.URLAnalysis -diffurlcol DATA/INDEX/freeworld/TEXT used.dump diffurlcol.dump (you may choose different names for the dump files or assign more memory). The result is the file diffurlcol.dump, which contains all URL hashes that occur in the URL database but not in the collections. The file has the format {hash-12}*, which means that 12-byte hashes are listed one after another without any separator. The next step could be to process this file and delete all URLs with the listed hashes, or to export them before deletion. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5692 6c8d7289-2bf4-0310-a012-ef5d649a1542 |
||
---|---|---|
AbstractBlacklist.java | ||
Blacklist.java | ||
DefaultBlacklist.java | ||
Document.java | ||
Index.java | ||
IndexCache.java | ||
IndexCell.java | ||
IndexCollection.java | ||
IndexReader.java | ||
MetadataRepository.java | ||
MetadataRowContainer.java | ||
Phrase.java | ||
Reference.java | ||
ReferenceContainer.java | ||
ReferenceContainerArray.java | ||
ReferenceContainerCache.java | ||
ReferenceContainerOrder.java | ||
ReferenceOrder.java | ||
ReferenceRow.java | ||
ReferenceVars.java | ||
URLMetadata.java | ||
Word.java |
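
The commit message above describes diffurlcol.dump as a plain concatenation of 12-byte hashes ({hash-12}*). As a minimal sketch of how such a file could be read back for the suggested follow-up step, the class below splits the file into 12-byte records. The class name `HashDumpReader` and its method are hypothetical and not part of YaCy; the sketch also assumes the hashes are ASCII-compatible text and that the file is small enough to load into memory at once.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the YaCy code base.
final class HashDumpReader {

    // Record length taken from the {hash-12}* format description.
    static final int HASH_LENGTH = 12;

    // Reads the whole dump and returns one string per 12-byte record.
    // Assumption: each hash can be decoded as US-ASCII text.
    static List<String> readHashes(Path dumpFile) throws IOException {
        byte[] data = Files.readAllBytes(dumpFile);
        if (data.length % HASH_LENGTH != 0) {
            throw new IOException("file length " + data.length
                    + " is not a multiple of " + HASH_LENGTH);
        }
        List<String> hashes = new ArrayList<>(data.length / HASH_LENGTH);
        for (int offset = 0; offset < data.length; offset += HASH_LENGTH) {
            hashes.add(new String(data, offset, HASH_LENGTH, StandardCharsets.US_ASCII));
        }
        return hashes;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HashDumpReader <dump file>");
            return;
        }
        List<String> hashes = readHashes(Paths.get(args[0]));
        System.out.println(hashes.size() + " hashes read");
    }
}
```

Because the format has no separators, the only structural check available is that the file length is a multiple of 12; for very large dumps, a streaming read of fixed-size records would be the more memory-friendly choice.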