mirror of
https://github.com/yacy/yacy_search_server.git
synced 2024-09-19 00:01:41 +02:00
- documentation changes (removed old forum links)
- different handling of link quotation
- different handling of link normalization
- enhanced html/unicode en/de-coding

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3993 6c8d7289-2bf4-0310-a012-ef5d649a1542
This commit is contained in:
parent
dcb8687904
commit
40b0547611
|
@ -68,7 +68,7 @@ If you download the software, you must accept the <a href="License.html">License
|
|||
<li><a href="http://www.yacy.net/yacy/release/yacy_v0.52_20070512_3715.exe"><tt>yacy_v0.52_20070512_3715.exe</tt></a></li>
|
||||
</ul>
|
||||
</ul></p>
|
||||
<p>Fresh builds from compiles out of SVN can be obtained from <a href="http://latest.yacy-forum.net">http://latest.yacy-forum.net/</a>.</p>
|
||||
<p>Fresh builds from compiles out of SVN can be obtained <a href="http://www.findenstattsuchen.info/YaCy/latest/index.php">here</a>.</p>
|
||||
|
||||
<br><h3>Installation</h3>
|
||||
<p><ul>
|
||||
|
|
|
@ -37,7 +37,6 @@ Example:
|
|||
# first published on http://www.anomic.de
|
||||
# Frankfurt, Germany, 2005
|
||||
#
|
||||
# This file is maintained by Roland Ramthun <admin@yacy-forum.de>
|
||||
# This file is written by (chronological order) Roland Ramthun <admin@yacy-forum.de>, Oliver Wunder <webmaster@daburna.de>, Jan Sandbrink
|
||||
|
||||
# If you find any mistakes or untranslated strings in this file please don't hesitate to email them to the maintainer.
|
||||
|
@ -94,7 +93,6 @@ Full example:
|
|||
# first published on http://www.anomic.de
|
||||
# Frankfurt, Germany, 2005
|
||||
#
|
||||
# This file is maintained by Roland Ramthun <admin@yacy-forum.de>
|
||||
# This file is written by (chronological order) Roland Ramthun <admin@yacy-forum.de>, Oliver Wunder <webmaster@daburna.de>, Jan Sandbrink
|
||||
|
||||
# If you find any mistakes or untranslated strings in this file please don't hesitate to email them to the maintainer.
|
||||
|
@ -106,7 +104,7 @@ Full example:
|
|||
# Thank you for your help!
|
||||
<!-- lang -->default\(english\)==Deutsch
|
||||
<!-- author -->==Roland Ramthun, Oliver Wunder, Jan Sandbrink
|
||||
<!-- maintainer -->==<admin@yacy-forum.de>
|
||||
<!-- maintainer -->==
|
||||
#-----------------------------
|
||||
|
||||
#File: Blacklist_p.html
|
||||
|
|
|
@ -54,10 +54,9 @@ globalheader();
|
|||
|
||||
<p>Other YaCy Project Sites
|
||||
<ul>
|
||||
<li><a href="http://www.yacy-websuche.de/wiki"><b>YaCy Wiki</b></a> - administrated by Alexander Schier</li>
|
||||
<li><a href="http://www.yacy-websuche.de"><b>German documentation</b></a> - initiated and administrated by Alexander Schier</li>
|
||||
<li><a href="http://www.yacy-forum.de"><b>Deutsches Forum</b></a>, administrated by Roland Ramthun</li>
|
||||
<li><a href="http://sourceforge.net/forum/?group_id=116142"><b>English Forum</b></a></li>
|
||||
<li><a href="http://www.yacy-websearch.net/wiki"><b>YaCy Wiki</b></a></li>
|
||||
<li><a href="http://forum.yacy.de"><b>Deutsches Forum</b></a></li>
|
||||
<li><a href="http://yacy-forum.huzzaar.com/"><b>English Forum</b></a></li>
|
||||
<li><a href="http://developer.berlios.de/projects/yacy/"><b>YaCy at BerliOS</b> - our SVN hosting service</li>
|
||||
<li><a href="http://freshmeat.net/projects/yacyproxy/"><b>YaCy at fresmeat.net</b></a> - Project Announcement Page (please click here to support the project and enhance Rating/Popularity)</li>
|
||||
<li><a href="http://sourceforge.net/projects/yacy/"><b>YaCy at sourceforge.net</b></a> - Project Services; Forum and (in the future) CVS Hosting.</li>
|
||||
|
@ -65,10 +64,10 @@ globalheader();
|
|||
|
||||
<p>Public Interfaces to YaCy Services and Statistics
|
||||
<ul>
|
||||
<li><a href="http://www.yacystats.de/"><b>Statistics about the YaCy network and indexed pages</b></a> - from Alexander Fieger</li>
|
||||
<li><a href="http://yacy.naggel.info/"><b>PHP-based Interface to YaCy Search</b> using YaCys RSS Search Result Output</a> - from Hendrik Richter</li>
|
||||
<li><a href="http://www.deruwe.de/yacy.html"><b>Public Interface for Crawl-Start Entry</b> - from <a href="http://www.deruwe.de/">slick</a></li>
|
||||
<li><a href="http://yacy.naggel.info/stats.php"><b>Stats about the YaCy network and indexed pages</b></a> - from Hendrik Richter</li>
|
||||
<li><a href="http://borg-0300.dyndns.org:3000/"><b>Statistics about the YaCy network and indexed pages</b></a> - from Thomas/Borg-0300</li>
|
||||
</ul></p><br>
|
||||
|
||||
<p>Publications about YaCy
|
||||
|
|
|
@ -342,7 +342,7 @@ location.</li>
|
|||
<li>enhancements to YaCyWiki</li>
|
||||
<li>added interface for customised blacklist classes</li>
|
||||
<li>enhancements for dir.html application: dirlisting for all empty directories, new place in htroot/htdocsdefault</li>
|
||||
<li>Interface YPStats_p.html for http://ypstats.yacy-forum.de/index.php to collect statistics</li>
|
||||
<li>Interface YPStats_p.html to collect statistics</li>
|
||||
</ul>
|
||||
<li>Enhanced Stability</li>
|
||||
<ul>
|
||||
|
@ -746,7 +746,6 @@ location.</li>
|
|||
<li>auto-heal of seed.db - fail</li>
|
||||
<li>many minor bug fixed</li>
|
||||
</ul>
|
||||
<li>new <a href="http://www.yacy-forum.de">german forum at http://www.yacy-forum.de</a>, provided by Roland Ramthun</li>
|
||||
</ul>
|
||||
|
||||
<br><p>v0.33_build20050107
|
||||
|
|
|
@ -59,13 +59,11 @@ globalheader();
|
|||
<li><b>Timo Leise</b> suggested and implemented an extension to the blacklist feature: part-of-domain matching.</li>
|
||||
<li><b>Marc Nause</b> made many major enhancements to the YaCyWiki, the Message- and User-Profile menues and functions.</li>
|
||||
<li><b>Thomas Quella</b> designed the Kaskelix mascot. He also made a large number of bug fixes.</li>
|
||||
<li><b>Roland Ramthun</b> owns and administrates the <a href="http://www.yacy-forum.de/">German YaCy-Forum</a>. He publishes a monthly YaCy newsletter, cares for correct English spelling and a German translation of the YaCy user interface. Roland and other forum participants extended the PHPForum code to make it usable as bug- and feature-tracking system..</li>
|
||||
<li><b>Wolfgang Sander-Beuermann</b>, executive board member of the German search-engine association <a href="http://www.suma-ev.de/">SuMa-eV</a>
|
||||
and manager of the meta-search-engine <a href="http://www.metager.de">metaGer</a> provided computing resources for a <a href="http://www.suma-lab.de:8080">demo peer</a>. He also pushed the project by arranging promotional events.</li>
|
||||
<li><b>Alexander Schier</b> did much alpha-testing from beginning of project, and suggested many features; implemented the blacklist feature, bookmarks, log-menu, user-db, skin-feature, windows-installer and provided first implementation of the yacybar Firefox extension; admin of yacy-websuche.de and the media-wiki at yacy-websuche.de/wiki.</li>
|
||||
<li><b>Alexander Schier</b> did much alpha-testing from beginning of project, and suggested many features; implemented the blacklist feature, bookmarks, log-menu, user-db, skin-feature, windows-installer and provided first implementation of the yacybar Firefox extension.</li>
|
||||
<li><b>Matthias Söhnholz</b> added the offline-browsing feature</li>
|
||||
<li><b>slick</b> helps as packager (.rpm, .deb etc)</li>
|
||||
<li><b>Martin Thelian</b> made system-wide performance enhancement by introducing thread pools; he added ICAP and SOAP support, most of external parser integration, maintains the http protocol implementation, added squid compatibility, robots protocol, better logging and many index protocol, import/export and transfer enhancements. He created a YaCy screensaver and coded major parts of the yacybar Firefox extension.</li>
|
||||
<li><b>Oliver Wunder</b> provided some german translation. He also made bittorrent-releases</li>
|
||||
</ul>
|
||||
|
||||
|
|
|
@ -59,7 +59,7 @@ In case you don't know how to make such a file please read <a href="http://www.r
|
|||
<br>
|
||||
After some hours all yacybots will obey your instructions.
|
||||
<h3>This didn't help me.</h3>
|
||||
If there are any questions left please visit our <a href="http://www.yacy-forum.net">forum</a> and ask for help.
|
||||
If there are any questions left please visit our <a href="http://forum.yacy.net">forum</a> and ask for help.
|
||||
<!-- ----- HERE ENDS CONTENT PART ----- -->
|
||||
<SCRIPT LANGUAGE="JavaScript1.1"><!--
|
||||
globalfooter();
|
||||
|
|
|
@ -77,7 +77,7 @@ globalheader();
|
|||
<tr><td height="20" class="white" bgcolor="#BDCDD4" valign="middle"> <a href="Contact.html" class="dark">Contact</a></td></tr>
|
||||
<tr><td height="2"></td></tr><tr><td height="20" class="white" bgcolor="#FFFFFF" valign="middle"> </td></tr>
|
||||
<tr><td height="2"></td></tr>
|
||||
<tr><td height="20" class="white" bgcolor="#BDCDD4" valign="middle"> <a href="http://www.yacy-forum.de" class="dark"><nobr>Deutsches Forum</nobr></a></td></tr>
|
||||
<tr><td height="20" class="white" bgcolor="#BDCDD4" valign="middle"> <a href="http://forum.yacy.de" class="dark"><nobr>Deutsches Forum</nobr></a></td></tr>
|
||||
<tr><td height="2"></td></tr>
|
||||
<tr><td height="20" class="white" bgcolor="#BDCDD4" valign="middle"> <a href="http://sourceforge.net/forum/?group_id=116142" class="dark">English Forum</a></td></tr>
|
||||
<tr><td height="2"></td></tr><tr><td height="20" class="white" bgcolor="#FFFFFF" valign="middle"> </td></tr>
|
||||
|
|
|
@ -174,7 +174,7 @@ public class Bookmarks {
|
|||
indexURLEntry.Components comp = urlentry.comp();
|
||||
document = switchboard.snippetCache.retrieveDocument(comp.url(), true, 5000, true);
|
||||
prop.put("mode_edit", 0); // create mode
|
||||
prop.put("mode_url", comp.url().toNormalform());
|
||||
prop.put("mode_url", comp.url().toNormalform(false, true));
|
||||
prop.put("mode_title", comp.title());
|
||||
prop.put("mode_description", (document == null) ? comp.title(): document.getTitle());
|
||||
prop.put("mode_author", comp.author());
|
||||
|
@ -270,9 +270,9 @@ public class Bookmarks {
|
|||
bookmark=switchboard.bookmarksDB.getBookmark((String)it.next());
|
||||
if(bookmark!=null){
|
||||
if(bookmark.getFeed() && isAdmin)
|
||||
prop.put("bookmarks_"+count+"_link", "/FeedReader_p.html?url="+de.anomic.data.htmlTools.replaceXMLEntities(bookmark.getUrl()));
|
||||
prop.put("bookmarks_"+count+"_link", "/FeedReader_p.html?url="+de.anomic.data.htmlTools.encodeUnicode2html(bookmark.getUrl(), false));
|
||||
else
|
||||
prop.put("bookmarks_"+count+"_link", de.anomic.data.htmlTools.replaceXMLEntities(bookmark.getUrl()));
|
||||
prop.put("bookmarks_"+count+"_link", de.anomic.data.htmlTools.encodeUnicode2html(bookmark.getUrl(), false));
|
||||
prop.put("bookmarks_"+count+"_title", bookmark.getTitle());
|
||||
prop.put("bookmarks_"+count+"_description", bookmark.getDescription());
|
||||
prop.put("bookmarks_"+count+"_date", serverDate.dateToiso8601(new Date(bookmark.getTimeStamp())));
|
||||
|
|
|
@ -127,7 +127,7 @@ public class CacheAdmin_p {
|
|||
// path.append((pathString.length() == 0) ? linkPathString("/", true) : linkPathString(pathString, false));
|
||||
linkPathString(prop, ((pathString.length() == 0) ? ("/") : (pathString)), true);
|
||||
|
||||
urlstr = url.toNormalform();
|
||||
urlstr = url.toNormalform(true, true);
|
||||
prop.put("info_url", urlstr);
|
||||
|
||||
info.ensureCapacity(10000);
|
||||
|
@ -286,9 +286,9 @@ public class CacheAdmin_p {
|
|||
descr = ((String) entry.getValue()).trim();
|
||||
if (descr.length() == 0) { descr = "-"; }
|
||||
prop.put("info_type_use." + extension + "_" + extension + "_" + i + "_name",
|
||||
de.anomic.data.htmlTools.replaceXMLEntities(descr.replaceAll("\n", "").trim()));
|
||||
de.anomic.data.htmlTools.encodeUnicode2html(descr.replaceAll("\n", "").trim(), true));
|
||||
prop.put("info_type_use." + extension + "_" + extension + "_" + i + "_link",
|
||||
de.anomic.data.htmlTools.replaceXMLEntities(entry.getKey().toString()));
|
||||
de.anomic.data.htmlTools.encodeUnicode2html(entry.getKey().toString(), true));
|
||||
i++;
|
||||
}
|
||||
prop.put("info_type_use." + extension, (i == 0) ? 0 : 1);
|
||||
|
@ -303,7 +303,7 @@ public class CacheAdmin_p {
|
|||
ie = (htmlFilterImageEntry) iter.next();
|
||||
prop.put("info_type_use.images_images_" + i + "_name", ie.alt().replaceAll("\n", "").trim());
|
||||
prop.put("info_type_use.images_images_" + i + "_link",
|
||||
de.anomic.data.htmlTools.replaceXMLEntities(ie.url().toNormalform()));
|
||||
de.anomic.data.htmlTools.encodeUnicode2html(ie.url().toNormalform(false, true), false));
|
||||
i++;
|
||||
}
|
||||
prop.put("info_type_use.images", (i == 0) ? 0 : 1);
|
||||
|
|
|
@ -171,7 +171,7 @@ public class CrawlResults {
|
|||
initiatorSeed = yacyCore.seedDB.getConnected(initiatorHash);
|
||||
executorSeed = yacyCore.seedDB.getConnected(executorHash);
|
||||
|
||||
urlstr = comp.url().toNormalform();
|
||||
urlstr = comp.url().toNormalform(false, true);
|
||||
urltxt = nxTools.shortenURLString(urlstr, 72); // shorten the string text like a URL
|
||||
cachepath = cacheManager.getCachePath(new URL(urlstr)).toString().replace('\\', '/').substring(cacheManager.cachePath.toString().length() + 1);
|
||||
|
||||
|
|
|
@ -143,8 +143,8 @@ public class CrawlURLFetch_p {
|
|||
if (post.get("source", "").equals("url")) {
|
||||
try {
|
||||
url = new URL(post.get("host", null));
|
||||
if (!savedURLs.contains(url.toNormalform()))
|
||||
savedURLs.add(url.toNormalform());
|
||||
if (!savedURLs.contains(url.toNormalform(true, true)))
|
||||
savedURLs.add(url.toNormalform(true, true));
|
||||
prop.put("host", post.get("host", url.toString()));
|
||||
} catch (MalformedURLException e) {
|
||||
prop.put("host", post.get("host", ""));
|
||||
|
|
|
@ -283,7 +283,7 @@ public class DetailedSearch {
|
|||
prop.put("type_results_" + i + "_former", results.getFormerSearch());
|
||||
prop.put("type_results_" + i + "_rankingprops", result.getUrlentry().word().toPropertyForm() + ", domLengthEstimated=" + plasmaURL.domLengthEstimation(result.getUrlhash()) +
|
||||
((plasmaURL.probablyRootURL(result.getUrlhash())) ? ", probablyRootURL" : "") +
|
||||
(((wordURL = plasmaURL.probablyWordURL(result.getUrlhash(), query[0])) != null) ? ", probablyWordURL=" + wordURL.toNormalform() : ""));
|
||||
(((wordURL = plasmaURL.probablyWordURL(result.getUrlhash(), query[0])) != null) ? ", probablyWordURL=" + wordURL.toNormalform(false, false) : ""));
|
||||
// adding snippet if available
|
||||
if (result.hasSnippet()) {
|
||||
prop.put("type_results_" + i + "_snippet", 1);
|
||||
|
|
|
@ -188,7 +188,7 @@ public class IndexControl_p {
|
|||
if (entry == null) {
|
||||
prop.put("result", "No Entry for URL hash " + urlhash + "; nothing deleted.");
|
||||
} else {
|
||||
urlstring = entry.comp().url().toNormalform();
|
||||
urlstring = entry.comp().url().toNormalform(false, true);
|
||||
prop.put("urlstring", "");
|
||||
switchboard.urlRemove(urlhash);
|
||||
prop.put("result", "Removed URL " + urlstring);
|
||||
|
@ -328,7 +328,7 @@ public class IndexControl_p {
|
|||
if (entry == null) {
|
||||
prop.put("result", "No Entry for URL hash " + urlhash);
|
||||
} else {
|
||||
prop.put("urlstring", entry.comp().url().toNormalform());
|
||||
prop.put("urlstring", entry.comp().url().toNormalform(false, true));
|
||||
prop.putAll(genUrlProfile(switchboard, entry, urlhash));
|
||||
}
|
||||
}
|
||||
|
@ -464,7 +464,7 @@ public class IndexControl_p {
|
|||
if (le == null) {
|
||||
referrer = "<unknown>";
|
||||
} else {
|
||||
referrer = le.comp().url().toNormalform();
|
||||
referrer = le.comp().url().toNormalform(false, true);
|
||||
}
|
||||
if (comp.url() == null) {
|
||||
prop.put("genUrlProfile", 1);
|
||||
|
@ -472,7 +472,7 @@ public class IndexControl_p {
|
|||
return prop;
|
||||
}
|
||||
prop.put("genUrlProfile", 2);
|
||||
prop.put("genUrlProfile_urlNormalform", comp.url().toNormalform());
|
||||
prop.put("genUrlProfile_urlNormalform", comp.url().toNormalform(false, true));
|
||||
prop.put("genUrlProfile_urlhash", urlhash);
|
||||
prop.put("genUrlProfile_urlDescr", comp.title());
|
||||
prop.put("genUrlProfile_moddate", entry.moddate());
|
||||
|
@ -513,7 +513,7 @@ public class IndexControl_p {
|
|||
if (le == null) {
|
||||
tm.put(uh[0], uh);
|
||||
} else {
|
||||
us = le.comp().url().toNormalform();
|
||||
us = le.comp().url().toNormalform(false, true);
|
||||
tm.put(us, uh);
|
||||
|
||||
}
|
||||
|
|
|
@ -140,11 +140,11 @@ public class IndexCreateIndexingQueue_p {
|
|||
totalSize += entrySize;
|
||||
initiator = yacyCore.seedDB.getConnected(pcentry.initiator());
|
||||
prop.put("indexing-queue_list_"+entryCount+"_dark", (inProcess)? 2: ((dark) ? 1 : 0));
|
||||
prop.put("indexing-queue_list_"+entryCount+"_initiator", ((initiator == null) ? "proxy" : htmlTools.replaceHTML(initiator.getName())));
|
||||
prop.put("indexing-queue_list_"+entryCount+"_initiator", ((initiator == null) ? "proxy" : htmlTools.encodeUnicode2html(initiator.getName(), true)));
|
||||
prop.put("indexing-queue_list_"+entryCount+"_depth", pcentry.depth());
|
||||
prop.put("indexing-queue_list_"+entryCount+"_modified", pcentry.getModificationDate());
|
||||
prop.put("indexing-queue_list_"+entryCount+"_anchor", (pcentry.anchorName()==null)?"":htmlTools.replaceHTML(pcentry.anchorName()));
|
||||
prop.put("indexing-queue_list_"+entryCount+"_url", htmlTools.replaceHTML(pcentry.normalizedURLString()));
|
||||
prop.put("indexing-queue_list_"+entryCount+"_anchor", (pcentry.anchorName()==null)?"":htmlTools.encodeUnicode2html(pcentry.anchorName(), true));
|
||||
prop.put("indexing-queue_list_"+entryCount+"_url", htmlTools.encodeUnicode2html(pcentry.url().toNormalform(false, true), false));
|
||||
prop.put("indexing-queue_list_"+entryCount+"_size", bytesToString(entrySize));
|
||||
prop.put("indexing-queue_list_"+entryCount+"_inProcess", (inProcess)?1:0);
|
||||
prop.put("indexing-queue_list_"+entryCount+"_inProcess_hash", pcentry.urlHash());
|
||||
|
@ -187,9 +187,9 @@ public class IndexCreateIndexingQueue_p {
|
|||
executorHash = entry.executor();
|
||||
initiatorSeed = yacyCore.seedDB.getConnected(initiatorHash);
|
||||
executorSeed = yacyCore.seedDB.getConnected(executorHash);
|
||||
prop.put("rejected_list_"+j+"_initiator", ((initiatorSeed == null) ? "proxy" : htmlTools.replaceHTML(initiatorSeed.getName())));
|
||||
prop.put("rejected_list_"+j+"_executor", ((executorSeed == null) ? "proxy" : htmlTools.replaceHTML(executorSeed.getName())));
|
||||
prop.put("rejected_list_"+j+"_url", htmlTools.replaceHTML(url.toString()));
|
||||
prop.put("rejected_list_"+j+"_initiator", ((initiatorSeed == null) ? "proxy" : htmlTools.encodeUnicode2html(initiatorSeed.getName(), true)));
|
||||
prop.put("rejected_list_"+j+"_executor", ((executorSeed == null) ? "proxy" : htmlTools.encodeUnicode2html(executorSeed.getName(), true)));
|
||||
prop.put("rejected_list_"+j+"_url", htmlTools.encodeUnicode2html(url.toNormalform(false, true), false));
|
||||
prop.put("rejected_list_"+j+"_failreason", entry.anycause());
|
||||
prop.put("rejected_list_"+j+"_dark", ((dark) ? 1 : 0));
|
||||
dark = !dark;
|
||||
|
|
|
@ -80,9 +80,9 @@ public class IndexCreateLoaderQueue_p {
|
|||
|
||||
initiator = yacyCore.seedDB.getConnected(theMsg.initiator);
|
||||
prop.put("loader-set_list_"+count+"_dark", ((dark) ? 1 : 0) );
|
||||
prop.put("loader-set_list_"+count+"_initiator", ((initiator == null) ? "proxy" : htmlTools.replaceHTML(initiator.getName())) );
|
||||
prop.put("loader-set_list_"+count+"_initiator", ((initiator == null) ? "proxy" : htmlTools.encodeUnicode2html(initiator.getName(), true)) );
|
||||
prop.put("loader-set_list_"+count+"_depth", theMsg.depth );
|
||||
prop.put("loader-set_list_"+count+"_url", htmlTools.replaceHTML(theMsg.url.toString())); // null pointer exception here !!! maybe url = null; check reason.
|
||||
prop.put("loader-set_list_"+count+"_url", htmlTools.encodeUnicode2html(theMsg.url.toNormalform(false, true), false)); // null pointer exception here !!! maybe url = null; check reason.
|
||||
dark = !dark;
|
||||
count++;
|
||||
}
|
||||
|
|
|
@ -120,12 +120,12 @@ public class IndexCreateWWWGlobalQueue_p {
|
|||
profileHandle = urle.profileHandle();
|
||||
profileEntry = (profileHandle == null) ? null : switchboard.profiles.getEntry(profileHandle);
|
||||
prop.put("crawler-queue_list_"+showNum+"_dark", ((dark) ? 1 : 0) );
|
||||
prop.put("crawler-queue_list_"+showNum+"_initiator", ((initiator == null) ? "proxy" : htmlTools.replaceHTML(initiator.getName())) );
|
||||
prop.put("crawler-queue_list_"+showNum+"_profile", ((profileEntry == null) ? "unknown" : htmlTools.replaceHTML(profileEntry.name())));
|
||||
prop.put("crawler-queue_list_"+showNum+"_initiator", ((initiator == null) ? "proxy" : htmlTools.encodeUnicode2html(initiator.getName(), true)) );
|
||||
prop.put("crawler-queue_list_"+showNum+"_profile", ((profileEntry == null) ? "unknown" : htmlTools.encodeUnicode2html(profileEntry.name(), true)));
|
||||
prop.put("crawler-queue_list_"+showNum+"_depth", urle.depth());
|
||||
prop.put("crawler-queue_list_"+showNum+"_modified", daydate(urle.loaddate()) );
|
||||
prop.put("crawler-queue_list_"+showNum+"_anchor", htmlTools.replaceHTML(urle.name()));
|
||||
prop.put("crawler-queue_list_"+showNum+"_url", htmlTools.replaceHTML(urle.url().toString()));
|
||||
prop.put("crawler-queue_list_"+showNum+"_anchor", htmlTools.encodeUnicode2html(urle.name(), true));
|
||||
prop.put("crawler-queue_list_"+showNum+"_url", htmlTools.encodeUnicode2html(urle.url().toNormalform(false, true), false));
|
||||
prop.put("crawler-queue_list_"+showNum+"_hash", urle.urlhash());
|
||||
dark = !dark;
|
||||
showNum++;
|
||||
|
|
|
@ -135,7 +135,7 @@ public class IndexCreateWWWLocalQueue_p {
|
|||
case ANCHOR: value = entry.name(); break;
|
||||
case DEPTH: value = Integer.toString(entry.depth()); break;
|
||||
case INITIATOR:
|
||||
value = (entry.initiator() == null) ? "proxy" : htmlTools.replaceHTML(entry.initiator());
|
||||
value = (entry.initiator() == null) ? "proxy" : htmlTools.encodeUnicode2html(entry.initiator(), true);
|
||||
break;
|
||||
case MODIFIED: value = daydate(entry.loaddate()); break;
|
||||
default: value = null;
|
||||
|
@ -184,12 +184,12 @@ public class IndexCreateWWWLocalQueue_p {
|
|||
profileHandle = urle.profileHandle();
|
||||
profileEntry = (profileHandle == null) ? null : switchboard.profiles.getEntry(profileHandle);
|
||||
prop.put("crawler-queue_list_"+showNum+"_dark", ((dark) ? 1 : 0) );
|
||||
prop.put("crawler-queue_list_"+showNum+"_initiator", ((initiator == null) ? "proxy" : htmlTools.replaceHTML(initiator.getName())) );
|
||||
prop.put("crawler-queue_list_"+showNum+"_initiator", ((initiator == null) ? "proxy" : htmlTools.encodeUnicode2html(initiator.getName(), true)) );
|
||||
prop.put("crawler-queue_list_"+showNum+"_profile", ((profileEntry == null) ? "unknown" : profileEntry.name()));
|
||||
prop.put("crawler-queue_list_"+showNum+"_depth", urle.depth());
|
||||
prop.put("crawler-queue_list_"+showNum+"_modified", daydate(urle.loaddate()) );
|
||||
prop.put("crawler-queue_list_"+showNum+"_anchor", htmlTools.replaceHTML(urle.name()));
|
||||
prop.put("crawler-queue_list_"+showNum+"_url", htmlTools.replaceHTML(urle.url().toString()));
|
||||
prop.put("crawler-queue_list_"+showNum+"_anchor", htmlTools.encodeUnicode2html(urle.name(), true));
|
||||
prop.put("crawler-queue_list_"+showNum+"_url", htmlTools.encodeUnicode2html(urle.url().toNormalform(false, true), false));
|
||||
prop.put("crawler-queue_list_"+showNum+"_hash", urle.urlhash());
|
||||
dark = !dark;
|
||||
showNum++;
|
||||
|
|
|
@ -53,6 +53,7 @@ import java.net.MalformedURLException;
|
|||
import java.net.URLDecoder;
|
||||
import java.util.Date;
|
||||
|
||||
import de.anomic.data.htmlTools;
|
||||
import de.anomic.http.httpHeader;
|
||||
import de.anomic.plasma.plasmaURL;
|
||||
import de.anomic.net.URL;
|
||||
|
@ -121,23 +122,13 @@ public class QuickCrawlLink_p {
|
|||
boolean xsstopw = post.get("xsstopw", "").equals("on");
|
||||
boolean xdstopw = post.get("xdstopw", "").equals("on");
|
||||
boolean xpstopw = post.get("xpstopw", "").equals("on");
|
||||
|
||||
String escapedTitle = (title==null)?"unknown":title.replaceAll("&","&amp;")
|
||||
.replaceAll("<", "&lt;")
|
||||
.replaceAll(">", "&gt;")
|
||||
.replaceAll("\"", "&quot;");
|
||||
|
||||
String escapedURL = (crawlingStart==null)?"unknown":crawlingStart.replaceAll("&","&amp;")
|
||||
.replaceAll("<", "&lt;")
|
||||
.replaceAll(">", "&gt;")
|
||||
.replaceAll("\"", "&quot;");
|
||||
|
||||
prop.put("mode_url",escapedURL);
|
||||
prop.put("mode_title",escapedTitle);
|
||||
prop.put("mode_url", (crawlingStart == null) ? "unknown" : htmlTools.encodeUnicode2html(crawlingStart, false));
|
||||
prop.put("mode_title", (title == null) ? "unknown" : htmlTools.encodeUnicode2html(title, true));
|
||||
|
||||
if (crawlingStart != null) {
|
||||
crawlingStart = crawlingStart.trim();
|
||||
try {crawlingStart = new URL(crawlingStart).toNormalform();} catch (MalformedURLException e1) {}
|
||||
try {crawlingStart = new URL(crawlingStart).toNormalform(true, true);} catch (MalformedURLException e1) {}
|
||||
|
||||
// check if url is proper
|
||||
URL crawlingStartURL = null;
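
The hunk above replaces a hand-written `replaceAll` chain with calls to `htmlTools.encodeUnicode2html(text, flag)`. The diff does not show what the boolean flag means. The sketch below assumes it controls whether `&` is escaped, which would fit the pattern in this commit of passing `false` for URLs and `true` for display text. The class `EscapeSketch` and its method are hypothetical and are not YaCy code.

```java
// Hypothetical illustration only; not the YaCy htmlTools implementation.
final class EscapeSketch {
    private EscapeSketch() {}

    // Escapes <, > and " as HTML entities and non-ASCII code points as
    // numeric character references. The ampersand is escaped only when
    // includingAmpersand is true (assumed meaning of the flag).
    static String escape(String text, boolean includingAmpersand) {
        if (text == null) return "";
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);      // read a full code point,
            i += Character.charCount(cp);      // so surrogate pairs stay intact
            switch (cp) {
                case '&': sb.append(includingAmpersand ? "&amp;" : "&"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:
                    if (cp > 127) sb.append("&#").append(cp).append(';');
                    else sb.append((char) cp);
            }
        }
        return sb.toString();
    }
}
```

Under this assumption, a title would be passed with `true`, while a URL with a query string would be passed with `false` so that its `&` separators are left untouched.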
|
||||
|
|
|
@ -39,8 +39,8 @@
|
|||
#(warningGoOnline)#::
|
||||
<dt class="hintIcon"><img src="env/grafics/bad.png" width="32" height="32" alt="bad"/></dt>
|
||||
<dd class="hint">The peer must go online to get a peer address.
|
||||
If you don't know how to configure your system to use a proxy,
|
||||
see the <a href="http://www.yacy.net/yacy/Installation.html#wininst">installation instructions</a>.
|
||||
If you don't know how to configure your system,
|
||||
see the <a href="http://www.yacy.net/yacy/Installation.html">installation instructions</a>.
|
||||
</dd>
|
||||
#(/warningGoOnline)#
|
||||
|
||||
|
|
|
@ -147,7 +147,7 @@ public class Supporter {
|
|||
prop.put("supporter_results_" + i + "_authorized_recommend_showScore", (showScore ? 1 : 0));
|
||||
|
||||
prop.put("supporter_results_" + i + "_authorized_urlhash", urlhash);
|
||||
prop.put("supporter_results_" + i + "_url", de.anomic.data.htmlTools.replaceXMLEntities(url));
|
||||
prop.put("supporter_results_" + i + "_url", de.anomic.data.htmlTools.encodeUnicode2html(url, false));
|
||||
prop.put("supporter_results_" + i + "_urlname", nxTools.shortenURLString(url, 60));
|
||||
prop.put("supporter_results_" + i + "_urlhash", urlhash);
|
||||
prop.put("supporter_results_" + i + "_title", (showScore) ? ("(" + ranking.getScore(urlhash) + ") " + title) : title);
|
||||
|
|
|
@ -155,7 +155,7 @@ public class Surftips {
|
|||
prop.put("surftips_results_" + i + "_authorized_recommend_showScore", (showScore ? 1 : 0));
|
||||
|
||||
prop.put("surftips_results_" + i + "_authorized_urlhash", urlhash);
|
||||
prop.put("surftips_results_" + i + "_url", de.anomic.data.htmlTools.replaceXMLEntities(url));
|
||||
prop.put("surftips_results_" + i + "_url", de.anomic.data.htmlTools.encodeUnicode2html(url, false));
|
||||
prop.put("surftips_results_" + i + "_urlname", nxTools.shortenURLString(url, 60));
|
||||
prop.put("surftips_results_" + i + "_urlhash", urlhash);
|
||||
prop.put("surftips_results_" + i + "_title", (showScore) ? ("(" + ranking.getScore(urlhash) + ") " + title) : title);
|
||||
|
|
|
@ -270,7 +270,7 @@ public class ViewFile {
|
|||
|
||||
} else if (viewMode.equals("iframe")) {
|
||||
prop.put("viewMode", VIEW_MODE_AS_IFRAME);
|
||||
prop.put("viewMode_url", url.toNormalform());
|
||||
prop.put("viewMode_url", url.toNormalform(false, true));
|
||||
|
||||
} else if (viewMode.equals("parsed") || viewMode.equals("sentences") || viewMode.equals("links")) {
|
||||
// parsing the resource content
|
||||
|
@ -348,8 +348,8 @@ public class ViewFile {
|
|||
prop.put("viewMode_links_" + i + "_dark", ((dark) ? 1 : 0));
|
||||
prop.put("viewMode_links_" + i + "_type", "image");
|
||||
prop.putASIS("viewMode_links_" + i + "_text", markup(wordArray, entry.alt()));
|
||||
prop.put("viewMode_links_" + i + "_url", (String) entry.url().toNormalform());
|
||||
prop.putASIS("viewMode_links_" + i + "_link", markup(wordArray, (String) entry.url().toNormalform()));
|
||||
prop.put("viewMode_links_" + i + "_url", (String) entry.url().toNormalform(false, true));
|
||||
prop.putASIS("viewMode_links_" + i + "_link", markup(wordArray, (String) entry.url().toNormalform(false, true)));
|
||||
if (entry.width() > 0 && entry.height() > 0)
|
||||
prop.putASIS("viewMode_links_" + i + "_attr", entry.width() + "x" + entry.height() + " Pixel");
|
||||
else
|
||||
|
@ -365,7 +365,7 @@ public class ViewFile {
|
|||
if (document != null) document.close();
|
||||
}
|
||||
prop.put("error", 0);
|
||||
prop.put("error_url", url.toNormalform());
|
||||
prop.put("error_url", url.toNormalform(false, true));
|
||||
prop.put("error_hash", urlHash);
|
||||
prop.put("error_wordCount", Integer.toString(wordCount));
|
||||
prop.put("error_desc", descr);
|
||||
|
@ -386,7 +386,7 @@ public class ViewFile {
|
|||
}
|
||||
|
||||
private static final String markup(String[] wordArray, String message) {
|
||||
message = htmlTools.replaceXMLEntities(message);
|
||||
message = htmlTools.encodeUnicode2html(message, true);
|
||||
if (wordArray != null)
|
||||
for (int j = 0; j < wordArray.length; j++) {
|
||||
String currentWord = wordArray[j].trim();
|
||||
|
|
|
@ -152,7 +152,7 @@ public class WatchCrawler_p {
|
|||
if (pos == -1) crawlingStart = "http://" + crawlingStart;
|
||||
|
||||
// normalizing URL
|
||||
try {crawlingStart = new URL(crawlingStart).toNormalform();} catch (MalformedURLException e1) {}
|
||||
try {crawlingStart = new URL(crawlingStart).toNormalform(true, true);} catch (MalformedURLException e1) {}
|
||||
|
||||
// check if url is proper
|
||||
URL crawlingStartURL = null;
|
||||
|
@ -276,7 +276,7 @@ public class WatchCrawler_p {
|
|||
nexturlstring = nexturlstring.trim();
|
||||
|
||||
// normalizing URL
|
||||
nexturlstring = new URL(nexturlstring).toNormalform();
|
||||
nexturlstring = new URL(nexturlstring).toNormalform(true, true);
|
||||
|
||||
// generating an url object
|
||||
URL nexturlURL = null;
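
Throughout this commit `toNormalform()` becomes `toNormalform(boolean, boolean)`. The diff does not document the two flags. The sketch below assumes the first one strips the `#fragment` reference and the second one turns `&amp;` back into `&`. That reading fits the call sites, where URLs entering the crawler use `(true, true)` and URLs prepared for display use `(false, true)`, but it remains an assumption. `NormalformSketch` is a hypothetical class built on `java.net.URI`, not the `de.anomic.net.URL` implementation.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical illustration only; not the de.anomic.net.URL implementation.
final class NormalformSketch {
    private NormalformSketch() {}

    // stripReference: drop the "#fragment" part (assumed meaning).
    // stripAmpersand: replace "&amp;" with "&" before parsing (assumed meaning).
    static String toNormalform(String url, boolean stripReference, boolean stripAmpersand) {
        String s = url.trim();
        if (stripAmpersand) s = s.replace("&amp;", "&");
        URI u = URI.create(s).normalize();     // resolves "." and ".." segments
        if (u.getScheme() == null || u.getHost() == null) {
            throw new IllegalArgumentException("absolute URL with host required: " + url);
        }
        String scheme = u.getScheme().toLowerCase(Locale.ROOT);
        String host = u.getHost().toLowerCase(Locale.ROOT);
        int port = u.getPort();
        boolean defaultPort = port == -1
                || (scheme.equals("http") && port == 80)
                || (scheme.equals("https") && port == 443);
        String path = u.getRawPath().isEmpty() ? "/" : u.getRawPath();

        StringBuilder sb = new StringBuilder();
        sb.append(scheme).append("://").append(host);
        if (!defaultPort) sb.append(':').append(port);
        sb.append(path);
        if (u.getRawQuery() != null) sb.append('?').append(u.getRawQuery());
        if (!stripReference && u.getRawFragment() != null) {
            sb.append('#').append(u.getRawFragment());
        }
        return sb.toString();
    }
}
```

With these assumptions, two spellings of the same crawl start URL that differ only in case, default port, or fragment map to one string. That is the property the `savedURLs.contains(...)` and `newURL.equals(...)` checks elsewhere in this diff appear to rely on.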
|
||||
|
|
|
@ -62,8 +62,8 @@ public class config_p {
|
|||
int count=0;
|
||||
while(keys.hasNext()){
|
||||
key = (String) keys.next();
|
||||
prop.put("options_"+count+"_key", htmlTools.replaceXMLEntities(key));
|
||||
prop.put("options_"+count+"_value", htmlTools.replaceXMLEntities(env.getConfig(key, "ERROR")));
|
||||
prop.put("options_"+count+"_key", htmlTools.encodeUnicode2html(key, true));
|
||||
prop.put("options_"+count+"_value", htmlTools.encodeUnicode2html(env.getConfig(key, "ERROR"), true));
|
||||
count++;
|
||||
}
|
||||
prop.put("options", count);
|
||||
|
|
|
@@ -125,11 +125,11 @@ public class queues_p {
 totalSize += entrySize;
 initiator = yacyCore.seedDB.getConnected(pcentry.initiator());
 prop.put("list-indexing_"+i+"_profile", (pcentry.profile() != null) ? pcentry.profile().name() : "deleted");
-prop.putSafeXML("list-indexing_"+i+"_initiator", ((initiator == null) ? "proxy" : htmlTools.replaceHTML(initiator.getName())));
+prop.putSafeXML("list-indexing_"+i+"_initiator", ((initiator == null) ? "proxy" : htmlTools.encodeUnicode2html(initiator.getName(), true)));
 prop.put("list-indexing_"+i+"_depth", pcentry.depth());
 prop.put("list-indexing_"+i+"_modified", pcentry.getModificationDate());
-prop.putSafeXML("list-indexing_"+i+"_anchor", (pcentry.anchorName()==null)?"":htmlTools.replaceHTML(pcentry.anchorName()));
-prop.putSafeXML("list-indexing_"+i+"_url", pcentry.normalizedURLString());
+prop.putSafeXML("list-indexing_"+i+"_anchor", (pcentry.anchorName()==null)?"":htmlTools.encodeUnicode2html(pcentry.anchorName(), true));
+prop.putSafeXML("list-indexing_"+i+"_url", pcentry.url().toNormalform(false, true));
 prop.put("list-indexing_"+i+"_size", entrySize);
 prop.put("list-indexing_"+i+"_inProcess", (inProcess)?1:0);
 prop.put("list-indexing_"+i+"_hash", pcentry.urlHash());
@@ -199,7 +199,7 @@ public class queues_p {
 prop.put(tableName + "_" + showNum + "_depth", urle.depth());
 prop.put(tableName + "_" + showNum + "_modified", daydate(urle.loaddate()));
 prop.putSafeXML(tableName + "_" + showNum + "_anchor", urle.name());
-prop.putSafeXML(tableName + "_" + showNum + "_url", urle.url().toString());
+prop.putSafeXML(tableName + "_" + showNum + "_url", urle.url().toNormalform(false, true));
 prop.put(tableName + "_" + showNum + "_hash", urle.urlhash());
 showNum++;
 }
@@ -182,11 +182,11 @@ public final class crawlOrder {
 // old method: only one url

 // normalizing URL
-String newURL = new URL((String) urlv.get(0)).toNormalform();
+String newURL = new URL((String) urlv.get(0)).toNormalform(true, true);
 if (!newURL.equals(urlv.get(0))) {
 env.getLog().logWarning("crawlOrder: Received not normalized URL " + urlv.get(0));
 }
-String refURL = (refv.get(0) == null) ? null : new URL((String) refv.get(0)).toNormalform();
+String refURL = (refv.get(0) == null) ? null : new URL((String) refv.get(0)).toNormalform(true, true);
 if ((refURL != null) && (!refURL.equals(refv.get(0)))) {
 env.getLog().logWarning("crawlOrder: Received not normalized Referer URL " + refv.get(0) + " of URL " + urlv.get(0));
 }
|
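The crawlOrder hunk above normalizes each received URL and logs a warning when the normal form differs from what the peer sent. A minimal sketch of that "normalize, then compare" check follows. It uses `java.net.URI` rather than YaCy's own `URL` class, and the class name, the `stripFragment` flag, and the chosen normal form (lower-case scheme and host, resolved dot segments, optional fragment removal) are assumptions made for illustration. The diff does not show what the two boolean arguments of `toNormalform(boolean, boolean)` mean, so this should not be read as YaCy's implementation.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper, not part of YaCy: illustrates the "normalize, then compare" check.
class NormalformSketch {

    // Assumed normal form: lower-case scheme and host, dot segments resolved,
    // empty path replaced by "/", fragment optionally removed.
    // User info is dropped to keep the sketch short.
    static String normalize(String url, boolean stripFragment) {
        URI u = URI.create(url).normalize();
        if (u.getScheme() == null || u.getHost() == null) {
            throw new IllegalArgumentException("not an absolute URL with a host: " + url);
        }
        StringBuilder sb = new StringBuilder();
        sb.append(u.getScheme().toLowerCase(Locale.ROOT)).append("://");
        sb.append(u.getHost().toLowerCase(Locale.ROOT));
        if (u.getPort() >= 0) sb.append(':').append(u.getPort());
        String path = u.getRawPath();
        sb.append((path == null || path.isEmpty()) ? "/" : path);
        if (u.getRawQuery() != null) sb.append('?').append(u.getRawQuery());
        if (!stripFragment && u.getRawFragment() != null) sb.append('#').append(u.getRawFragment());
        return sb.toString();
    }

    // True when the string is already in the assumed normal form.
    static boolean isNormalized(String url) {
        return normalize(url, true).equals(url);
    }

    public static void main(String[] args) {
        String received = "HTTP://Example.COM/a/../b#top";
        if (!isNormalized(received)) {
            // Mirrors the warning pattern in the hunk; the message text is illustrative.
            System.out.println("Received not normalized URL " + received);
        }
        System.out.println(normalize(received, true));
    }
}
```

Comparing the normalized string with the original, as the hunk does, only detects a deviation. The caller still has to decide whether to reject the URL or continue with the normalized value.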
@@ -151,7 +151,7 @@ public final class crawlReceipt {
 switchboard.wordIndex.loadedURL.store(entry);
 switchboard.wordIndex.loadedURL.stack(entry, youare, iam, 1);
 switchboard.delegatedURL.remove(entry.hash()); // the delegated work has been done
-log.logInfo("crawlReceipt: RECEIVED RECEIPT from " + otherPeerName + " for URL " + entry.hash() + ":" + comp.url().toNormalform());
+log.logInfo("crawlReceipt: RECEIVED RECEIPT from " + otherPeerName + " for URL " + entry.hash() + ":" + comp.url().toNormalform(false, true));

 // ready for more
 prop.putASIS("delay", "10");
@@ -125,7 +125,7 @@ public final class list {
 int cnt = 0;
 for (int i=0; i<count; i++) {
 if ((url = db.pop()) == null) continue;
-b.append(htmlTools.deReplaceHTMLEntities(url.toNormalform())).append("\n");
+b.append(htmlTools.decodeHtml2Unicode(url.toNormalform(false, true))).append("\n");
 cnt++;
 }
 prop.put("list", b);
@@ -135,7 +135,7 @@ public final class transferURL {
 // check if the entry is blacklisted
 if ((blockBlacklist) && (plasmaSwitchboard.urlBlacklist.isListed(plasmaURLPattern.BLACKLIST_DHT, lEntry.hash(), comp.url()))) {
 int deleted = sb.wordIndex.tryRemoveURLs(lEntry.hash());
-yacyCore.log.logFine("transferURL: blocked blacklisted URL '" + comp.url().toNormalform() + "' from peer " + otherPeerName + "; deleted " + deleted + " URL entries from RWIs");
+yacyCore.log.logFine("transferURL: blocked blacklisted URL '" + comp.url().toNormalform(false, true) + "' from peer " + otherPeerName + "; deleted " + deleted + " URL entries from RWIs");
 lEntry = null;
 blocked++;
 continue;
@@ -145,7 +145,7 @@ public final class transferURL {
 try {
 sb.wordIndex.loadedURL.store(lEntry);
 sb.wordIndex.loadedURL.stack(lEntry, iam, iam, 3);
-yacyCore.log.logFine("transferURL: received URL '" + comp.url().toNormalform() + "' from peer " + otherPeerName);
+yacyCore.log.logFine("transferURL: received URL '" + comp.url().toNormalform(false, true) + "' from peer " + otherPeerName);
 received++;
 } catch (IOException e) {
 e.printStackTrace();
@@ -250,7 +250,7 @@ public class yacysearch {
 if (document != null) {
 // create a news message
 HashMap map = new HashMap();
-map.put("url", comp.url().toNormalform().replace(',', '|'));
+map.put("url", comp.url().toNormalform(false, true).replace(',', '|'));
 map.put("title", comp.title().replace(',', ' '));
 map.put("description", ((document == null) ? comp.title() : document.getTitle()).replace(',', ' '));
 map.put("author", ((document == null) ? "" : document.getAuthor()));
@ -314,8 +314,6 @@ public class yacysearch {
|
|||
for(int i=0;i<results.numResults();i++){
|
||||
plasmaSearchResults.searchResult result=results.getResult(i);
|
||||
prop.put("type_results_" + i + "_authorized_recommend", (yacyCore.newsPool.getSpecific(yacyNewsPool.OUTGOING_DB, yacyNewsPool.CATEGORY_SURFTIPP_ADD, "url", result.getUrl()) == null) ? 1 : 0);
|
||||
//prop.put("type_results_" + i + "_authorized_recommend_deletelink", "/yacysearch.html?search=" + results.getFormerSearch() + "&Enter=Search&count=" + results.getQuery().wantedResults + "&order=" + crypt.simpleEncode(results.getRanking().toExternalString()) + "&resource=local&time=3&deleteref=" + result.getUrlhash() + "&urlmaskfilter=.*");
|
||||
//prop.put("type_results_" + i + "_authorized_recommend_recommendlink", "/yacysearch.html?search=" + results.getFormerSearch() + "&Enter=Search&count=" + results.getQuery().wantedResults + "&order=" + crypt.simpleEncode(results.getRanking().toExternalString()) + "&resource=local&time=3&recommendref=" + result.getUrlhash() + "&urlmaskfilter=.*");
|
||||
prop.put("type_results_" + i + "_authorized_recommend_deletelink", "/yacysearch.html?search=" + results.getFormerSearch() + "&Enter=Search&count=" + results.getQuery().wantedResults + "&order=" + crypt.simpleEncode(results.getRanking().toExternalString()) + "&resource=local&time=3&deleteref=" + result.getUrlhash() + "&urlmaskfilter=.*");
|
||||
prop.put("type_results_" + i + "_authorized_recommend_recommendlink", "/yacysearch.html?search=" + results.getFormerSearch() + "&Enter=Search&count=" + results.getQuery().wantedResults + "&order=" + crypt.simpleEncode(results.getRanking().toExternalString()) + "&resource=local&time=3&recommendref=" + result.getUrlhash() + "&urlmaskfilter=.*");
|
||||
prop.put("type_results_" + i + "_authorized_urlhash", result.getUrlhash());
|
||||
|
@@ -339,7 +337,7 @@ public class yacysearch {
 prop.put("type_results_" + i + "_former", results.getFormerSearch());
 prop.put("type_results_" + i + "_rankingprops", result.getUrlentry().word().toPropertyForm() + ", domLengthEstimated=" + plasmaURL.domLengthEstimation(result.getUrlhash()) +
 ((plasmaURL.probablyRootURL(result.getUrlhash())) ? ", probablyRootURL" : "") +
-(((wordURL = plasmaURL.probablyWordURL(result.getUrlhash(), query[0])) != null) ? ", probablyWordURL=" + wordURL.toNormalform() : ""));
+(((wordURL = plasmaURL.probablyWordURL(result.getUrlhash(), query[0])) != null) ? ", probablyWordURL=" + wordURL.toNormalform(false, true) : ""));
 // adding snippet if available
 if (result.hasSnippet()) {
 prop.put("type_results_" + i + "_snippet", 1);
@ -1859,7 +1859,7 @@ and set an administration password.==und geben Sie ein Administrator Passwort ei
|
|||
You have not published your peer seed yet. This happens automatically, just wait.==Ihr Peer ist dem Netzwerk noch nicht bekannt. Warten Sie noch ein wenig, dies geschieht automatisch.
|
||||
While you have this status you are not allowed to search other peers.==Während Sie diesen Status haben, ist es Ihnen nicht erlaubt andere Peers zu durchsuchen.
|
||||
The peer must go online to get a peer address.==Ihr Peer muss online gehen, um eine Adresse zu bekommen.
|
||||
If you don't know how to configure your system to use a proxy,==Wenn Sie nicht wissen, wie Sie Ihr System konfigurieren, sodass es einen Proxy benutzt,
|
||||
If you don't know how to configure your system,==Wenn Sie nicht wissen, wie Sie Ihr System konfigurieren,
|
||||
see the <a==lesen Sie die <a
|
||||
installation instructions</a>.==Installationsanleitung</a>.
|
||||
You cannot be reached from outside.==Ihr Peer kann nicht von außen erreicht werden.
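The locale entries in this hunk pair an English source string with its translation, separated by `==`. Here is a minimal sketch of how such lines could be read into a lookup table. The class and method names are hypothetical, and the rules for `#` lines and blank lines are inferred from the excerpts in this diff rather than taken from YaCy's translator code. The source side in the real files may contain regular-expression escapes (for example `\(english\)`), which this sketch treats as plain text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reader for "source==translation" lines; not YaCy's actual translator.
class TranslationFileSketch {

    // Returns source -> translation in file order.
    // Assumption: lines starting with '#' are comments or "#File:" section markers.
    static Map<String, String> parse(String[] lines) {
        Map<String, String> pairs = new LinkedHashMap<String, String>();
        for (String line : lines) {
            if (line.isEmpty() || line.startsWith("#")) continue;
            int p = line.indexOf("==");
            if (p < 0) continue; // no separator: not a translation entry
            pairs.put(line.substring(0, p), line.substring(p + 2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[] sample = {
            "#File: Steering.html",
            "",
            "Steering==Ovladanie",
            "<!-- maintainer -->=="
        };
        for (Map.Entry<String, String> e : parse(sample).entrySet()) {
            System.out.println("[" + e.getKey() + "] -> [" + e.getValue() + "]");
        }
    }
}
```

Splitting at the first `==` keeps an empty right-hand side valid, which matches the `<!-- maintainer -->==` entry that this commit leaves blank.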
|
||||
|
|
|
@ -7,7 +7,6 @@
|
|||
# first published on http://www.anomic.de
|
||||
# Frankfurt, Germany, 2005
|
||||
#
|
||||
# This file is maintained by Roland Ramthun <admin@yacy-forum.de>
|
||||
# This file is written by (chronological order) Riccardo Lemmi <riccardo@reflab.it>
|
||||
|
||||
# If you find any mistakes or untranslated strings in this file please don't hesitate to email them to the maintainer.
|
||||
|
@ -19,11 +18,10 @@
|
|||
#Thank you for your help!
|
||||
<!-- lang -->default\(english\)==Italian
|
||||
<!-- author -->==Riccardo Lemmi
|
||||
<!-- maintainer -->==<admin@yacy-forum.de>
|
||||
<!-- maintainer -->==
|
||||
|
||||
#-----------------------------------------------------------
|
||||
#File: Blacklist_p.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
Blacklist Manager==Blacklist Manager
|
||||
Blacklist==Blacklist
|
||||
|
|
|
@ -23,7 +23,6 @@
|
|||
|
||||
#-----------------------------------------------------------
|
||||
#File: Blacklist_p.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
Blacklist Manager==Spravca blacklistu
|
||||
Blacklist==Blacklist
|
||||
|
@ -152,7 +151,6 @@ The maximum cache size is==Maximalna velkost cache je
|
|||
|
||||
#-----------------------------------------------------------
|
||||
#File: Config_p.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
Advanced Config==Pokrocile nastavenia
|
||||
Here are all configuration options from YaCy.==Tu sa nachadzaju vsetky konfiguracne nastavenia YaCy.
|
||||
|
@ -164,7 +162,6 @@ You can change anything, but some options need a restart, and some options can c
|
|||
|
||||
#-----------------------------------------------------------
|
||||
#File: ConfigAdvanced_p.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
Advanced Config==Pokrocile nastavenia
|
||||
Here are all configuration options from YaCy.==Tu sa nachadzaju vsetky konfiguracne nastavenia YaCy.
|
||||
|
@ -216,7 +213,6 @@ location</a> in 10 seconds.==adresu</a> za 10 sekund.
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: ConfigLanguage_p.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
Language selection==Výber jazyka
|
||||
Language selection==Výber jazyka
|
||||
|
@ -254,7 +250,6 @@ Comment==Komentár
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: ConfigSkins_p.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
Skin Selection==Vyber skinov
|
||||
You can change the appearance of YaCy with skins. Select one of the default skins, download new skins, or create your own skin.==Vzhlad YaCy mozete zmenit pomocou skinov. Zvolte jeden z predvytvorenych skinov, stiahnite si nove, alebo vytvorte vlastne skiny.
|
||||
|
@ -270,7 +265,6 @@ Error saving the skin.==Chyba pri stahovani skinu.
|
|||
|
||||
#-----------------------------------------------------------
|
||||
#File: Connections_p.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
Connection Tracking==Stav spojenia
|
||||
Incoming Connections==Prichadzajuce spojenia
|
||||
|
@ -286,7 +280,6 @@ Waiting for new request nr.==Caka sa na poziadavku cislo.
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: CookieMonitorIncoming_p.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
Incoming Cookies Monitor==Sledovanie prichadzajucich cookies
|
||||
Cookie Monitor: Incoming Cookies==Sledovanie cookies: Prichadzajuce cookies
|
||||
|
@ -301,7 +294,6 @@ Cookie==Cookie
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: CookieMonitorOutgoing_p.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
Outgoing Cookies Monitor==Sledovanie odchadzajucich cookies
|
||||
Cookie Monitor: Outgoing Cookies==Sledovanie cookies: Odchadzajuce cookies
|
||||
|
@ -397,7 +389,6 @@ There is ".html" at the end, which is not included with the Regular Expression.=
|
|||
|
||||
#-----------------------------------------------------------
|
||||
#File: index.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
YaCy \'\#\[clientname\]\#\': Search Page==YaCy '#[clientname]#': Vyhladavacia stranka
|
||||
# NOT USED
|
||||
|
@ -499,7 +490,6 @@ show all==zobrazit vsetko
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: IndexCleaner_p.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
# NOT USED
|
||||
#Index Control==Kontrola indexu
|
||||
|
@ -558,7 +548,6 @@ Word-Hash:</td>==Hash-slovo:</td>
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: IndexCreate_p.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
Index Creation==Tvorba indexu
|
||||
Start Crawling Job:==Odstartuj crawling:
|
||||
|
@ -717,7 +706,6 @@ Busy Peers==Vytazeni peeri
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: IndexCreateIndexingQueue_p.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
Index Creation/Indexing Queue==Vytvorenie indexu/Cakacia listina indexu
|
||||
Index Creation: Indexing Queue==Vytvorenie indexu: Cakacia listina indexu
|
||||
|
@ -745,7 +733,6 @@ Fail-Reason==Dovod zlyhania
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: IndexCreateLoaderQueue_p.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
Index Creation / Loader Queue==Vytvorenie indexu / Cakacia listina nahravaca
|
||||
Index Creation: Loader Queue==Vytvorenie indexu: Cakacia listina nahravaca
|
||||
|
@ -757,7 +744,6 @@ URL==URL adresa
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: IndexCreateWWWGlobalQueue_p.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
# NOT USED
|
||||
#YaCy '\#\[clientname\]\#': Index Creation / WWW Global Crawl Crawl Queue==YaCy '#[clientname]#': Vytvorenie indexu / Globalna WWW cakacia listina
|
||||
|
@ -779,7 +765,6 @@ Anchor Name==Meno kotvy
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: IndexCreateWWWLocalQueue_p.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
YaCy '\#\[clientname\]\#': Index Creation / WWW Local Crawl Queue==YaCy '#[clientname]#': Vytvorenie indexu / Lokalna WWW cakacia listina
|
||||
Index Creation: WWW Local Crawl Queue==Vytvorenie indexu: Lokalna WWW cakacia listina
|
||||
|
@ -808,7 +793,6 @@ This may take a quite long time.==Toto moze chvilku trvat.
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: IndexImport_p.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
YaCy \'\#\[clientname\]\#\': Index Import==YaCy '#[clientname]#': Import indexu
|
||||
Index DB Import==Import databazoveho indexu
|
||||
|
@ -873,7 +857,6 @@ Continue==Pokracuj
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: IndexMonitor.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
#YaCy '#[clientname]#': Index Monitor
|
||||
Index Monitor Menu==Menu monitoringu indexu
|
||||
|
@ -950,7 +933,6 @@ URL==URL adresa
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: IndexTransfer_p.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
The local index currently consists of \(at least\) \#\[wcount\]\# reverse word indexes and \#\[ucount\]\# URL references.== Lokalny index momentalne pozostava z (priblizne) #[wcount]# slov a #[ucount]# URL adries.
|
||||
# NOT USED
|
||||
|
@ -975,7 +957,6 @@ Start/Stop Transfer==Start/Stop prenosu
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: Language_p.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
Language selection==Vyber jazyka
|
||||
Language selection==Vyber jazyka
|
||||
|
@ -995,7 +976,6 @@ Error saving the language file.==Pri ukladani jazykoveho suboru doslo k chybe.
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: Lab.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
# NOT USED
|
||||
#YaCy \'\#\[clientname\]\#\': Lab==YaCy '#[clientname]#': Laboratorium
|
||||
|
@ -1010,7 +990,6 @@ Configuration</a>==Nastavenia</a>
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: Messages_p.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
>Messages==>Spravy
|
||||
>Date==>Datum
|
||||
|
@ -1028,7 +1007,6 @@ I/O error reading message table: ==Vstupno/Vystupna chyba pri citani tabulky spr
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: MessageSend_p.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
Send message==Posli spravu
|
||||
You cannot send a message to==Nemozete poslat spravu pre
|
||||
|
@ -1053,7 +1031,6 @@ Network</a> page.==stranku siete</a>.
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: Network.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
Network Overview==Prehlad stavu siete
|
||||
Network Menu==Menu siet
|
||||
|
@ -1167,7 +1144,6 @@ add Peer==Pridaj peera
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: News.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
Network Menu==Menu siete
|
||||
News Overview==Prehlad sprav
|
||||
|
@ -1421,7 +1397,6 @@ The network picture below shows how the latest search query was solved by asking
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: ProxyIndexingMonitor_p.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
Index Monitor for Proxy Indexing==Monitor indexu pre indexaciu proxy
|
||||
This is the control page for web pages that your peer has indexed during the current application run-time==Toto je kontrolna stranke pre web stranky, ktore Vas peer indexoval pocas aktualneho behu aplikacie
|
||||
|
@ -1467,7 +1442,6 @@ Page.==stranke.
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: QuickCrawlLink_p.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
YaCy \'\#\[clientname\]\#\': Quick Crawl Link==YaCy '#[clientname]#''#[clientname]#': Rychly Crawl Link
|
||||
Quick Crawl Link==Rychly Crawl Link
|
||||
|
@ -1485,7 +1459,6 @@ Unable to add URL to crawler queue:==Nie je mozne pridat URL adresu do cakacej l
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: Settings_p.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
YaCy \'\#\[clientname\]\#\': Settings==YaCy '#[clientname]#': Nastavenia
|
||||
<h2>Settings</h2>==<h2>Nastavenia</h2>
|
||||
|
@ -1762,7 +1735,6 @@ You can reach your YaCy server under the new location==Vas YaCy server je pristu
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: Settings_Admin.inc
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
Administration Account Settings==Nastavenia konta administratora
|
||||
This is the account that restricts access to this 'Settings' page. If you have not customized it yet, you should do so now:==Toto je konto ktore obmedzuje pristum na tuto stranku 'Nastaveni'. Ak ste toto konte este nevytvorili, mali by ste teraz tak urobit.
|
||||
|
@ -1773,7 +1745,6 @@ value="submit">==value="Uloz">
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: Settings_SystemBehaviour.inc
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
System Behaviour Settings==Systemove nastavenia
|
||||
Auto pop-up of status page on start-up:==Automaticky pop-up stranky stavu pri starte YaCy:
|
||||
|
@ -1782,7 +1753,6 @@ Auto pop-up of status page on start-up:==Automaticky pop-up stranky stavu pri st
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: simple_search.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
YaCy \'\#\[clientname\]\#\': Search Page==YaCy '#[clientname]#': Vyhladavacia stranka
|
||||
"Search for \#\[former\]\#"=="Hladaj #[former]#"
|
||||
|
@ -1815,7 +1785,6 @@ from 'late' peers to enrich this search result.==z pomalych peerov na zlepsenie
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: Status.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
System-, Index- and Peer-Status==Stav systemu, indexu a peera
|
||||
Welcome to YaCy!==Vitajte v YaCy!
|
||||
|
@ -1835,7 +1804,7 @@ Not assigned. The peer must go online to get an address.==Nepriradena. Vas pees
|
|||
The peer does not go online until you use the proxy to surf the internet,==Vas peer neprejde do online modu pokym nepouzijete proxy na surfovanie v internete,
|
||||
thus proving that you <i>want</i> to go online.==cim signalizujete ze <i>chcete</i> prejst do online modu.
|
||||
#---
|
||||
If you don't know how to configure your system to use a proxy,==Navod ako nakonfigurovat system tak aby ste pouzivali proxy,
|
||||
If you don't know how to configure your system,==Navod ako nakonfigurovat system,
|
||||
see the <a==precitajte si <a
|
||||
installation instructions</a>.==instalacne instrukcie</a>.
|
||||
#---
|
||||
|
@ -1893,7 +1862,6 @@ Last Refresh:==Posledna aktualizacia:
|
|||
|
||||
#--------------------------------------------------------
|
||||
#File: Status_p.inc
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
Private System Properties==Sukromne systemove vlastnosti
|
||||
System Resources==Systemove zdroje
|
||||
|
@ -1936,7 +1904,6 @@ Global Crawl Trigger==odchadzajuce vzialene crawly
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: Steering.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
Steering==Ovladanie
|
||||
Steering Receipt:==Navod na ovladanie
|
||||
|
@ -1974,7 +1941,6 @@ user</a> page.==stranky pouzivatelov</a>.
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: ViewFile.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
YaCy \'\#\[clientname\]\#\': View URL Content==YaCy '#[clientname]#': Zobraz obsah URL adresy
|
||||
View URL Content==Zobraz obsah URL adresy
|
||||
|
@ -2001,7 +1967,6 @@ Original Resource Content==Originalny obsah zdroja
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: ViewLog_p.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
Lines==Riadkov
|
||||
reversed order==v prevratenom poradi
|
||||
|
@ -2009,7 +1974,6 @@ reversed order==v prevratenom poradi
|
|||
|
||||
#-------------------------------------------------------
|
||||
#File: ViewProfile.html
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
Remote Peer Profile==Profil vzdialeneho peera
|
||||
Remote Peer Profile:==Profil vzdialeneho peera:
|
||||
|
@ -2090,7 +2054,6 @@ Architecture \(C\) by Michael Peter Christen==Architektur (C) von Michael Peter
|
|||
|
||||
#--------------------------------------------------------
|
||||
#File: env/templates/simpleheader.template
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
Project Home==Domovská stránka
|
||||
Help / Wiki==Pomoc / Wiki
|
||||
|
@ -2098,7 +2061,6 @@ Peer Owner Profile==Profi vlastníka peera
|
|||
|
||||
#--------------------------------------------------------
|
||||
#File: env/templates/header.template
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
# NOT USED
|
||||
#YaCy - Distributed Web Indexing - Administration==YaCy - Indexovanie Distribuovaného Internetu - Administrácia
|
||||
|
@ -2156,7 +2118,6 @@ Interface Skins==Nastavenie vzhladu
|
|||
|
||||
#--------------------------------------------------------
|
||||
#File: env/templates/submenuCookie.template
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
Cookie Menu==Cookie Menu
|
||||
Incoming Cookies==Prichadzajuce cookies
|
||||
|
@ -2164,7 +2125,6 @@ Outgoing Cookies==Odchadzajuce cookies
|
|||
|
||||
#--------------------------------------------------------
|
||||
#File: env/templates/submenuIndexControl.template
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
Index Control Menu==Menu spravy indexu
|
||||
#Index Administration==Administracia indexu
|
||||
|
@ -2173,7 +2133,6 @@ Index Control Menu==Menu spravy indexu
|
|||
|
||||
#--------------------------------------------------------
|
||||
#File: env/templates/submenuIndexCreate.template
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
Index Creation Menu==Menu vytvorenia indexu
|
||||
Control Queues==Kontrola cakacej listiny
|
||||
|
@ -2191,7 +2150,6 @@ Media Crawl Queues==Cakacia listina Media crawlu
|
|||
|
||||
#--------------------------------------------------------
|
||||
#File: env/templates/submenuPerformance.template
|
||||
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
|
||||
|
||||
Performance Menu==Menu vykonu
|
||||
Queues Performance Settings==Cakacia listina nastaveny vykonu
|
||||
|
|
12
readme.txt
|
@ -17,7 +17,6 @@ WHERE IS THE DOCUMENTATION?
|
|||
|
||||
The complete documentation can be found at:
|
||||
(English) http://yacy.net/yacy
|
||||
(Deutsch) http://www.yacy-websuche.de
|
||||
(Wiki:de) http://www.yacy-websuche.de/wiki/index.php/De:Start
|
||||
(Wiki:en) http://www.yacy-websearch.net/wiki/index.php/En:Start
|
||||
|
||||
|
@ -72,17 +71,16 @@ ANY MORE CONFIGURATIONS?
|
|||
- after startup, you see the configuration page in your web browser.
|
||||
just open http://localhost:8080
|
||||
all you have to do (should do) is to enter a password for your peer
|
||||
- You can use YaCy as your web proxy. But you don't need to do that.
|
||||
|
||||
- You can use YaCy as your web proxy. This is an option, you don't need to do that.
|
||||
Simply configure your internet connection to use a proxy at port 8080.
|
||||
- You can add a YaCy toolbar to your Firefox web browser.
|
||||
This release contains the yacybar.xpi file from Alexander Schier
|
||||
and Martin Thelian. Please install this file as a Firefox extension.
|
||||
|
||||
|
||||
|
||||
CONTACT:
|
||||
|
||||
If you have any questions, please do not hesitate to contact the author:
|
||||
Send an email to Michael Christen (mc@anomic.de) with a meaningful subject
|
||||
Send an email to Michael Christen (mc@yacy.net) with a meaningful subject
|
||||
including the word 'yacy' to prevent that your email gets stuck
|
||||
in my anti-spam filter.
|
||||
|
||||
|
@ -91,5 +89,5 @@ feel free to ask the author for a business proposal to customize YaCy
|
|||
according to your needs. We also provide integration solutions if the
|
||||
software is about to be integrated into your enterprise application.
|
||||
|
||||
Germany, Frankfurt a.M., 02.12.2006
|
||||
Germany, Frankfurt a.M., 19.07.2007
|
||||
Michael Peter Christen
|
||||
|
|
|
@@ -87,7 +87,7 @@ public class URLFetcherStack {
 public boolean push(URL url) {
 try {
 this.db.push(this.db.row().newEntry(
-new byte[][] { url.toNormalform().getBytes() }
+new byte[][] { url.toNormalform(true, true).getBytes() }
 ));
 this.pushed++;
 return true;
@@ -731,7 +731,7 @@ public class bookmarksDB {
 public Bookmark(String urlHash, URL url){
 super();
 this.urlHash=urlHash;
-entry.put(BOOKMARK_URL, url.toString());
+entry.put(BOOKMARK_URL, url.toNormalform(false, true));
 tags=new HashSet();
 timestamp=System.currentTimeMillis();
 }
@@ -144,7 +144,7 @@ public class diff {
 * <code> ,{__,_1,__} </code><br>
 * <code> ,{__,__,_1} </code><br>
 * <ul>
-* TODO: some optimisation ideas see the discusion <a href="http://www.yacy-forum.de/viewtopic.php?t=3557">Diff.findDiagonal(..) buggy????</a>
+* TODO: some optimisation ideas
 * <li>search for a better algorithm on the inet!!! :) </li>
 * <li>pass only the part of the matrix where the search takes place - not the whole matrix everytime</li>
 * <li>break the inner loop if the rest of the matrix is smaller than minLength (and no diagonal has been found yet) </li>
@@ -272,7 +272,7 @@ public class diff {
 case diff.Part.ADDED: sb.append("added"); break;
 case diff.Part.DELETED: sb.append("deleted"); break;
 }
-sb.append("\">").append(htmlTools.replaceXMLEntities(ps[j].getString()).replaceAll("\n", "<br />"));
+sb.append("\">").append(htmlTools.encodeUnicode2html(ps[j].getString(), true).replaceAll("\n", "<br />"));
 sb.append("</span>");
 }
 sb.append("</p>");
@ -2,99 +2,65 @@ package de.anomic.data;
|
|||
|
||||
public class htmlTools {
|
||||
|
||||
/** Replaces special characters from a string. Avoids XSS attacks and ensures correct display of
|
||||
* special characters in non UTF-8 capable browsers.
|
||||
* @param text a string that possibly contains HTML
|
||||
* @return the string with all special characters encoded
|
||||
*/
|
||||
//[MN]
|
||||
public static String replaceHTML(String text) {
|
||||
text = replace(text, xmlentities);
|
||||
text = replace(text, htmlentities);
|
||||
return text;
|
||||
}
|
||||
|
||||
/** Replaces special characters from a string. Ensures correct display of
|
||||
* special characters in non UTF-8 capable browsers.
|
||||
* @param text a string that possibly contains special characters
|
||||
* @return the string with all special characters encoded
|
||||
*/
|
||||
//[MN]
|
||||
public static String replaceHTMLEntities(String text) {
|
||||
text = replace(text, htmlentities);
|
||||
return text;
|
||||
}
|
||||
|
||||
/** Replaces special characters from a string. Avoids XSS attacks.
|
||||
* @param text a string that possibly contains HTML
|
||||
* @return the string without any HTML-tags that can be used for XSS
|
||||
*/
|
||||
//[MN]
|
||||
public static String replaceXMLEntities(String text) {
|
||||
text = replace(text, xmlentities);
|
||||
return text;
|
||||
}
|
||||
|
||||
/** Replaces characters in a string with other characters defined in an array.
|
||||
* @param text a string that possibly contains special characters
|
||||
* @param entities array that contains characters to be replaced and characters it will be replaced by
|
||||
* @return the string with all characters replaced by the corresponding character from array
|
||||
*/
|
||||
//[FB], changes by [MN]
|
||||
public static String replace(String text, String[] entities) {
|
||||
if (text==null) { return null; }
|
||||
for (int x=0;x<=entities.length-1;x=x+2) {
|
||||
int p=0;
|
||||
while ((p=text.indexOf(entities[x],p))>=0) {
|
||||
text=text.substring(0,p)+entities[x+1]+text.substring(p+entities[x].length());
|
||||
p+=entities[x+1].length();
|
||||
}
|
||||
}
|
||||
return text;
|
||||
}
|
||||
|
||||
public static String deReplaceHTML(String text) {
|
||||
text = deReplaceHTMLEntities(text);
|
||||
text = deReplaceXMLEntities(text);
|
||||
return text;
|
||||
}
|
||||
|
||||
public static String deReplaceHTMLEntities(String text) {
|
||||
return deReplace(text, htmlentities);
|
||||
}
|
||||
|
||||
public static String deReplaceXMLEntities(String text) {
|
||||
return deReplace(text, xmlentities);
|
||||
}
|
||||
|
||||
public static String deReplace(String text, String[] entities) {
|
||||
//[FB], changes by [MN], re-implemented by [MC]
|
||||
public static String encodeUnicode2html(String text, boolean includingAmpersand) {
|
||||
if (text == null) return null;
|
||||
for (int i=entities.length-1; i>0; i-=2) {
|
||||
int p = 0;
|
||||
while ((p = text.indexOf(entities[i])) >= 0) {
|
||||
text = text.substring(0, p) + entities[i - 1] + text.substring(p + entities[i].length());
|
||||
p += entities[i - 1].length();
|
||||
int pos = 0;
|
||||
StringBuffer sb = new StringBuffer(text.length());
|
||||
search: while (pos < text.length()) {
|
||||
// find a (forward) mapping
|
||||
loop: for (int i = (includingAmpersand) ? 0 : 2; i < mapping.length; i += 2) {
|
||||
if (text.charAt(pos) != mapping[i].charAt(0)) continue loop;
|
||||
// found match
|
||||
sb.append(mapping[i + 1]);
|
||||
pos++;
|
||||
continue search;
|
||||
}
|
||||
// not found match
|
||||
sb.append(text.charAt(pos));
|
||||
pos++;
|
||||
}
|
||||
return text;
|
||||
return new String(sb);
|
||||
}
|
||||
|
||||
public static String decodeHtml2Unicode(String text) {
|
||||
if (text == null) return null;
|
||||
int pos = 0;
|
||||
StringBuffer sb = new StringBuffer(text.length());
|
||||
search: while (pos < text.length()) {
|
||||
// find a reverse mapping. TODO: replace matching with hashtable(s)
|
||||
loop: for (int i = 0; i < mapping.length; i += 2) {
|
||||
if (pos + mapping[i + 1].length() > text.length()) continue loop;
|
||||
for (int j = mapping[i + 1].length() - 1; j >= 0; j--) {
|
||||
if (text.charAt(pos + j) != mapping[i + 1].charAt(j)) continue loop;
|
||||
}
|
||||
// found match
|
||||
sb.append(mapping[i]);
|
||||
pos = pos + mapping[i + 1].length();
|
||||
continue search;
|
||||
}
|
||||
// not found match
|
||||
sb.append(text.charAt(pos));
|
||||
pos++;
|
||||
}
|
||||
return new String(sb);
|
||||
}
|
||||
|
||||
//This array contains codes (see http://mindprod.com/jgloss/unicode.html for details)
|
||||
//that will be replaced. To add new codes or patterns, just put them at the end
|
||||
//of the list. Codes or patterns in this list can not be escaped with [= or <pre>
|
||||
public static final String[] xmlentities={
|
||||
private static final String[] mapping = {
|
||||
// Ampersands _have_ to be replaced first. If they were replaced later,
|
||||
// other replaced characters containing ampersands would get messed up.
"\u0026","&amp;", //ampersand
"\"","&quot;", //quotation mark
"\u003C","&lt;", //less than
"\u003E","&gt;", //greater than
|
||||
};
|
||||
|
||||
//This array contains codes (see http://mindprod.com/jgloss/unicode.html for details) and
|
||||
//patterns that will be replaced. To add new codes or patterns, just put them at the end
|
||||
//of the list. Codes or patterns in this list can not be escaped with [= or <pre>
|
||||
public static final String[] htmlentities={
"\\", "&#092;", // Backslash
"\u005E","&#094;", // Caret
|
||||
|
||||
|
@ -230,4 +196,12 @@ public class htmlTools {
"\u00FE","&thorn;",
"\u00FF","&yuml;"
|
||||
};
|
||||
|
||||
public static void main(String[] args) {
|
||||
String text = "Test-Text mit & um zyklische ü & Ersetzungen auszuschliessen ŠšŸ";
|
||||
String txet = encodeUnicode2html(text, true);
|
||||
System.out.println(txet);
|
||||
System.out.println(decodeHtml2Unicode(txet));
|
||||
if (decodeHtml2Unicode(txet).equals(text)) System.out.println("correct");
|
||||
}
|
||||
}
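The re-implemented `encodeUnicode2html` / `decodeHtml2Unicode` pair above walks the input once instead of applying one replacement after another, which is what keeps an ampersand that was just produced by an encoding step from being rewritten again. A minimal, self-contained sketch of that single-pass idea follows. The class name `EntityCodecSketch`, the shortened `MAPPING` table, and the use of `StringBuilder` and `String.startsWith` are assumptions made for illustration; this is not the code of the commit.

```java
final class EntityCodecSketch {
    // Pairs of (character, entity). The ampersand pair sits at index 0
    // so that it can be skipped when includingAmpersand is false.
    private static final String[] MAPPING = {
        "&", "&amp;",
        "\"", "&quot;",
        "<", "&lt;",
        ">", "&gt;",
        "\u00FC", "&uuml;"
    };

    // Encodes each input character at most once, so output that already
    // contains '&' is never scanned again.
    static String encode(String text, boolean includingAmpersand) {
        if (text == null) return null;
        StringBuilder sb = new StringBuilder(text.length());
        search:
        for (int pos = 0; pos < text.length(); pos++) {
            char c = text.charAt(pos);
            for (int i = includingAmpersand ? 0 : 2; i < MAPPING.length; i += 2) {
                if (c == MAPPING[i].charAt(0)) {
                    sb.append(MAPPING[i + 1]);
                    continue search;
                }
            }
            sb.append(c); // no mapping found, copy the character unchanged
        }
        return sb.toString();
    }

    // Decodes entities left to right; each matched entity is consumed whole.
    static String decode(String text) {
        if (text == null) return null;
        StringBuilder sb = new StringBuilder(text.length());
        int pos = 0;
        search:
        while (pos < text.length()) {
            for (int i = 0; i < MAPPING.length; i += 2) {
                if (text.startsWith(MAPPING[i + 1], pos)) {
                    sb.append(MAPPING[i]);
                    pos += MAPPING[i + 1].length();
                    continue search;
                }
            }
            sb.append(text.charAt(pos)); // not an entity, copy unchanged
            pos++;
        }
        return sb.toString();
    }
}
```

With this sketch, encoding the literal text `&uuml;` yields `&amp;uuml;`, and decoding that result gives back `&uuml;` rather than `ü`, which is the kind of cyclic replacement the test string in `main` is meant to rule out.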
|
||||
|
|
|
@ -400,7 +400,7 @@ public final class robotsParser{
|
|||
httpHeader reqHeaders = new httpHeader();
|
||||
|
||||
// adding referer
|
||||
reqHeaders.put(httpHeader.REFERER, (new URL(robotsURL,"/")).toString());
|
||||
reqHeaders.put(httpHeader.REFERER, (URL.newURL(robotsURL,"/")).toNormalform(true, true));
|
||||
|
||||
if (entry != null) {
|
||||
oldEtag = entry.getETag();
|
||||
|
@ -455,7 +455,7 @@ public final class robotsParser{
|
|||
redirectionUrlString = redirectionUrlString.trim();
|
||||
|
||||
// generating the new URL object
|
||||
URL redirectionUrl = new URL(robotsURL, redirectionUrlString);
|
||||
URL redirectionUrl = URL.newURL(robotsURL, redirectionUrlString);
|
||||
|
||||
// returning the used httpc
|
||||
httpc.returnInstance(con);
|
||||
|
|
|
@ -91,7 +91,6 @@ public class wikiCode extends abstractWikiParser implements wikiParser {
|
|||
private boolean preformatted = false; //needed for preformatted text
|
||||
private boolean preformattedSpan = false; //needed for <pre> and </pre> spanning over several lines
|
||||
private boolean replacedHTML = false; //indicates if method replaceHTML has been used with line already
|
||||
private boolean replacedCharacters = false; //indicates if method replaceCharachters has been used with line
|
||||
private boolean table = false; //needed for tables, because they reach over several lines
|
||||
private int preindented = 0; //needed for indented <pre>s
|
||||
private int escindented = 0; //needed for indented [=s
|
||||
|
@ -178,7 +177,7 @@ public class wikiCode extends abstractWikiParser implements wikiParser {
|
|||
else {
|
||||
line+=parseTableProperties(result.substring(lenCellDivider,propEnd-lenAttribDivider).trim()).toString();
|
||||
}
|
||||
// quick&dirty fix for http://www.yacy-forum.de/viewtopic.php?t=2825 [MN]
|
||||
// quick&dirty fix [MN]
|
||||
if(propEnd > cellEnd){
|
||||
propEnd = lenCellDivider;
|
||||
}
|
||||
|
@ -707,7 +706,7 @@ public class wikiCode extends abstractWikiParser implements wikiParser {
|
|||
}
|
||||
directory = "<table><tr><td><div class=\"WikiTOCBox\">\n" + directory + "</div></td></tr></table>\n";
|
||||
}
|
||||
//(http://www.yacy-forum.de/viewtopic.php?t=4034) [MN]
|
||||
// [MN]
|
||||
if(!dirElements.isEmpty()){
|
||||
dirElements.clear();
|
||||
headlines = 0;
|
||||
|
@ -777,14 +776,9 @@ public class wikiCode extends abstractWikiParser implements wikiParser {
|
|||
public String transformLine(String result, String publicAddress, plasmaSwitchboard switchboard) {
|
||||
//If HTML has not been replaced yet (can happen if method gets called in recursion), replace now!
|
||||
if (!replacedHTML || preformattedSpan){
|
||||
result = htmlTools.replaceXMLEntities(result);
|
||||
result = htmlTools.encodeUnicode2html(result, true);
|
||||
replacedHTML = true;
|
||||
}
|
||||
//If special characters have not been replaced yet, replace now!
|
||||
if (!replacedCharacters || preformattedSpan){
|
||||
result = htmlTools.replaceHTMLEntities(result);
|
||||
replacedCharacters = true;
|
||||
}
|
||||
|
||||
//check if line contains escape symbols([= =]) or if we are in an escape sequence already.
|
||||
if ((result.indexOf("[=")>=0)||(result.indexOf("=]")>=0)||(escapeSpan)){
|
||||
|
@ -837,7 +831,6 @@ public class wikiCode extends abstractWikiParser implements wikiParser {
|
|||
}
|
||||
|
||||
if (!preformatted) replacedHTML = false;
|
||||
replacedCharacters = false;
|
||||
if ((result.endsWith("</li>"))||(defList)||(escape)||(preformatted)||(table)||(cellprocessing)) return result;
|
||||
return result + "<br />";
|
||||
}
|
||||
|
|
|
@ -161,7 +161,7 @@ public class htmlFilterContentScraper extends htmlFilterAbstractScraper implemen
|
|||
|
||||
private String absolutePath(String relativePath) {
|
||||
try {
|
||||
return new URL(root, relativePath).toString();
|
||||
return URL.newURL(root, relativePath).toNormalform(false, true);
|
||||
} catch (Exception e) {
|
||||
return "";
|
||||
}
|
||||
|
|
|
@ -93,7 +93,7 @@ public class htmlFilterImageEntry implements Comparable {
|
|||
// create a total ordering on images with respect on the image size
|
||||
assert (url != null);
|
||||
assert (h instanceof htmlFilterImageEntry);
|
||||
if (this.url.toString().equals(((htmlFilterImageEntry) h).url.toString())) return 0;
|
||||
if (this.url.toNormalform(true, true).equals(((htmlFilterImageEntry) h).url.toNormalform(true, true))) return 0;
|
||||
int thc = this.hashCode();
|
||||
int ohc = ((htmlFilterImageEntry) h).hashCode();
|
||||
if (thc < ohc) return -1;
|
||||
|
|
|
@ -900,12 +900,7 @@ public final class httpd implements serverHandler {
|
|||
// 06.01.2007: decode HTML entities by [FB]
|
||||
public static String decodeHtmlEntities(String s) {
|
||||
// replace all entities defined in wikiCode.characters and htmlentities
|
||||
for (int i=1; i<htmlTools.htmlentities.length; i+=2) {
|
||||
s = s.replaceAll(htmlTools.htmlentities[i], htmlTools.htmlentities[i - 1]);
|
||||
}
|
||||
for (int i=1; i<htmlTools.xmlentities.length; i+=2) {
|
||||
s = s.replaceAll(htmlTools.xmlentities[i], htmlTools.xmlentities[i - 1]);
|
||||
}
|
||||
s = htmlTools.decodeHtml2Unicode(s);
|
||||
|
||||
// replace all other
|
||||
CharArrayWriter b = new CharArrayWriter(s.length());
|
||||
|
|
|
@ -344,7 +344,7 @@ public final class httpdProxyHandler extends httpdAbstractHandler implements htt
|
|||
//redirector
|
||||
if (redirectorEnabled){
|
||||
synchronized(redirectorProcess){
|
||||
redirectorWriter.println(url.toString());
|
||||
redirectorWriter.println(url.toNormalform(false, true));
|
||||
redirectorWriter.flush();
|
||||
}
|
||||
String newUrl=redirectorReader.readLine();
|
||||
|
|
|
@ -172,7 +172,7 @@ public class indexURLEntry {
|
|||
|
||||
public static byte[] encodeComp(URL url, String descr, String author, String tags, String ETag) {
|
||||
serverCharBuffer s = new serverCharBuffer(200);
|
||||
s.append(url.toNormalform()).append(10);
|
||||
s.append(url.toNormalform(false, true)).append(10);
|
||||
s.append(descr).append(10);
|
||||
s.append(author).append(10);
|
||||
s.append(tags).append(10);
|
||||
|
@ -248,7 +248,7 @@ public class indexURLEntry {
|
|||
//System.out.println("author=" + comp.author());
|
||||
try {
|
||||
s.append("hash=").append(hash());
|
||||
s.append(",url=").append(crypt.simpleEncode(comp.url().toNormalform()));
|
||||
s.append(",url=").append(crypt.simpleEncode(comp.url().toNormalform(false, true)));
|
||||
s.append(",descr=").append(crypt.simpleEncode(comp.title()));
|
||||
s.append(",author=").append(crypt.simpleEncode(comp.author()));
|
||||
s.append(",tags=").append(crypt.simpleEncode(comp.tags()));
|
||||
|
|
|
@ -95,6 +95,8 @@ public class kelondroObjects {
|
|||
|
||||
protected synchronized kelondroObjectsEntry get(final String key, final boolean storeCache) throws IOException {
|
||||
// load map from cache
|
||||
assert cache != null;
|
||||
assert key != null;
|
||||
kelondroObjectsEntry map = (kelondroObjectsEntry) cache.get(key);
|
||||
if (map != null) return map;
|
||||
|
||||
|
|
|
@ -104,50 +104,76 @@ public class URL {
|
|||
this("file", "", -1, file.getAbsolutePath());
|
||||
}
|
||||
|
||||
public URL(URL baseURL, String relPath) throws MalformedURLException {
|
||||
public static URL newURL(String baseURL, String relPath) throws MalformedURLException {
|
||||
if ((baseURL == null) ||
|
||||
(relPath.startsWith("http://")) ||
|
||||
(relPath.startsWith("https://")) ||
|
||||
(relPath.startsWith("ftp://")) ||
|
||||
(relPath.startsWith("file://")) ||
|
||||
(relPath.startsWith("smb://"))) {
|
||||
return new URL(relPath);
|
||||
} else {
|
||||
return new URL(new URL(baseURL), relPath);
|
||||
}
|
||||
}
|
||||
|
||||
public static URL newURL(URL baseURL, String relPath) throws MalformedURLException {
|
||||
if ((baseURL == null) ||
|
||||
(relPath.startsWith("http://")) ||
|
||||
(relPath.startsWith("https://")) ||
|
||||
(relPath.startsWith("ftp://")) ||
|
||||
(relPath.startsWith("file://")) ||
|
||||
(relPath.startsWith("smb://"))) {
|
||||
return new URL(relPath);
|
||||
} else {
|
||||
return new URL(baseURL, relPath);
|
||||
}
|
||||
}
|
||||
|
||||
private URL(URL baseURL, String relPath) throws MalformedURLException {
|
||||
if (baseURL == null) throw new MalformedURLException("base URL is null");
|
||||
if (relPath == null) throw new MalformedURLException("relPath is null");
|
||||
int p = relPath.indexOf(':');
|
||||
String relprotocol = (p < 0) ? null : relPath.substring(0, p).toLowerCase();
|
||||
if (relprotocol != null && "http.https.ftp.mailto".indexOf(relprotocol) >= 0) {
|
||||
parseURLString(relPath);
|
||||
} else if (relprotocol == null || relprotocol.equals("javascript")) {
|
||||
this.protocol = baseURL.protocol;
|
||||
this.host = baseURL.host;
|
||||
this.port = baseURL.port;
|
||||
this.userInfo = baseURL.userInfo;
|
||||
if (relPath.toLowerCase().startsWith("javascript:")) {
|
||||
this.path = baseURL.path;
|
||||
} else if (relPath.startsWith("/")) {
|
||||
this.path = relPath;
|
||||
} else if (baseURL.path.endsWith("/")) {
|
||||
if (relPath.startsWith("#") || relPath.startsWith("?")) {
|
||||
throw new MalformedURLException("relative path malformed: " + relPath);
|
||||
} else {
|
||||
this.path = baseURL.path + relPath;
|
||||
}
|
||||
|
||||
this.protocol = baseURL.protocol;
|
||||
this.host = baseURL.host;
|
||||
this.port = baseURL.port;
|
||||
this.userInfo = baseURL.userInfo;
|
||||
if (relPath.toLowerCase().startsWith("javascript:")) {
|
||||
this.path = baseURL.path;
|
||||
} else if (
|
||||
(relPath.startsWith("http://")) ||
|
||||
(relPath.startsWith("https://")) ||
|
||||
(relPath.startsWith("ftp://")) ||
|
||||
(relPath.startsWith("file://")) ||
|
||||
(relPath.startsWith("smb://"))) {
|
||||
this.path = baseURL.path;
|
||||
} else if (relPath.startsWith("/")) {
|
||||
this.path = relPath;
|
||||
} else if (baseURL.path.endsWith("/")) {
|
||||
if (relPath.startsWith("#") || relPath.startsWith("?")) {
|
||||
throw new MalformedURLException("relative path malformed: " + relPath);
|
||||
} else {
|
||||
if (relPath.startsWith("#") || relPath.startsWith("?")) {
|
||||
this.path = baseURL.path + relPath;
|
||||
this.path = baseURL.path + relPath;
|
||||
}
|
||||
} else {
|
||||
if (relPath.startsWith("#") || relPath.startsWith("?")) {
|
||||
this.path = baseURL.path + relPath;
|
||||
} else {
|
||||
int q = baseURL.path.lastIndexOf('/');
|
||||
if (q < 0) {
|
||||
this.path = relPath;
|
||||
} else {
|
||||
int q = baseURL.path.lastIndexOf('/');
|
||||
if (q < 0) {
|
||||
this.path = relPath;
|
||||
} else {
|
||||
this.path = baseURL.path.substring(0, q + 1) + relPath;
|
||||
}
|
||||
this.path = baseURL.path.substring(0, q + 1) + relPath;
|
||||
}
|
||||
}
|
||||
this.quest = baseURL.quest;
|
||||
this.ref = baseURL.ref;
|
||||
|
||||
path = resolveBackpath(path);
|
||||
identRef();
|
||||
identQuest();
|
||||
escape();
|
||||
} else {
|
||||
throw new MalformedURLException("unknown protocol: " + relprotocol);
|
||||
}
|
||||
this.quest = baseURL.quest;
|
||||
this.ref = baseURL.ref;
|
||||
|
||||
path = resolveBackpath(path);
|
||||
identRef();
|
||||
identQuest();
|
||||
escape();
|
||||
}
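The hunk above makes the two-argument constructor private and routes callers through the static `newURL` factories, which return the second argument as a stand-alone URL when there is no base or when it already starts with a known scheme. The sketch below reproduces that decision with the JDK's `java.net.URI` instead of `de.anomic.net.URL`. The class name `UrlFactorySketch`, the helper `isAbsolute`, and the use of `URI.resolve` for the relative case are assumptions made for illustration; the commit's own constructor resolves paths by hand.

```java
import java.net.URI;

final class UrlFactorySketch {
    // Scheme prefixes that mark the second argument as already absolute.
    private static final String[] ABSOLUTE_PREFIXES = {
        "http://", "https://", "ftp://", "file://", "smb://"
    };

    static boolean isAbsolute(String path) {
        for (String prefix : ABSOLUTE_PREFIXES) {
            if (path.startsWith(prefix)) return true;
        }
        return false;
    }

    // Mirrors the factory's branch: ignore the base when it is missing
    // or when relPath is a complete URL on its own.
    static URI newURI(URI base, String relPath) {
        if (base == null || isAbsolute(relPath)) {
            return URI.create(relPath);
        }
        return base.resolve(relPath);
    }
}
```

A factory is a reasonable choice here because a constructor cannot decide to ignore one of its arguments and delegate to a different parsing path as cleanly as a static method can.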
|
||||
|
||||
public URL(String protocol, String host, int port, String path) throws MalformedURLException {
|
||||
|
@ -182,8 +208,6 @@ public class URL {
|
|||
matcher.reset(path);
|
||||
}
|
||||
|
||||
/* another version at http://www.yacy-forum.de/viewtopic.php?p=26871#26871 */
|
||||
|
||||
return path.equals("")?"/":path;
|
||||
}
|
||||
|
||||
|
@ -228,7 +252,7 @@ public class URL {
|
|||
quest = qtmp.substring((qtmp.length() > 0) ? 1 : 0);
|
||||
}
|
||||
|
||||
final static String[] hex = {
|
||||
private final static String[] hex = {
|
||||
"%00", "%01", "%02", "%03", "%04", "%05", "%06", "%07",
|
||||
"%08", "%09", "%0A", "%0B", "%0C", "%0D", "%0E", "%0F",
|
||||
"%10", "%11", "%12", "%13", "%14", "%15", "%16", "%17",
|
||||
|
@ -301,7 +325,8 @@ public class URL {
|
|||
sbuf.append((char)ch);
|
||||
} else if (ch == ' ') { // space
|
||||
sbuf.append("%20");
|
||||
} else if (ch == '-' || ch == '_' // unreserved
|
||||
} else if (ch == '&' || ch == ':' // unreserved
|
||||
|| ch == '-' || ch == '_'
|
||||
|| ch == '.' || ch == '!'
|
||||
|| ch == '~' || ch == '*'
|
||||
|| ch == '\'' || ch == '('
|
||||
|
@ -462,15 +487,18 @@ public class URL {
|
|||
return quest;
|
||||
}
|
||||
|
||||
public String toNormalform() {
|
||||
return toString(false);
|
||||
}
|
||||
|
||||
public String toString() {
|
||||
return toString(true);
|
||||
return toNormalform(false, true);
|
||||
}
|
||||
|
||||
public String toString(boolean includeReference) {
|
||||
public String toNormalform(boolean stripReference, boolean stripAmp) {
|
||||
if (stripAmp)
|
||||
return toNormalform(!stripReference).replaceAll("&amp;", "&");
|
||||
else
|
||||
return toNormalform(!stripReference);
|
||||
}
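The two flags of `toNormalform(boolean stripReference, boolean stripAmp)` can be pictured with a small string-level sketch. It assumes that `stripReference` drops the `#fragment` part and that `stripAmp` turns an HTML-quoted `&amp;` back into a plain `&`. The class and method names are hypothetical, and the real method works on the parsed URL fields rather than on a raw string.

```java
final class NormalformSketch {
    // String-level approximation of the two normalization flags.
    static String normalize(String url, boolean stripReference, boolean stripAmp) {
        String result = url;
        if (stripReference) {
            int hash = result.indexOf('#');
            if (hash >= 0) result = result.substring(0, hash); // cut "#ref"
        }
        if (stripAmp) {
            result = result.replace("&amp;", "&"); // undo HTML quoting of '&'
        }
        return result;
    }
}
```

Throughout this commit the call sites use `(true, true)` where the result feeds a hash or a comparison, and `(false, true)` where the URL is stored or shown.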
|
||||
|
||||
private String toNormalform(boolean includeReference) {
|
||||
// generates a normal form of the URL
|
||||
boolean defaultPort = false;
|
||||
if (this.protocol.equals("mailto")) {
|
||||
|
@ -537,21 +565,24 @@ public class URL {
|
|||
new String[]{"http://www.anomic.de/home", "ftp://ftp.delegate.org/"},
|
||||
new String[]{"http://www.anomic.de","mailto:yacy@weltherrschaft.org"},
|
||||
new String[]{"http://www.anomic.de","javascipt:temp"},
|
||||
new String[]{null,"http://yacy-websuche.de/wiki/index.php?title=De:IntroInformationFreedom&action=history"},
|
||||
new String[]{null, "http://diskusjion.no/index.php?s=5bad5f431a106d9a8355429b81bb0ca5&showuser=23585"},
|
||||
new String[]{null, "http://diskusjion.no/index.php?s=5bad5f431a106d9a8355429b81bb0ca5&showuser=23585"}
|
||||
};
|
||||
String environment, url;
|
||||
de.anomic.net.URL aURL = null;
|
||||
java.net.URL jURL = null;
|
||||
de.anomic.net.URL aURL, aURL1;
|
||||
java.net.URL jURL;
|
||||
for (int i = 0; i < test.length; i++) {
|
||||
environment = test[i][0];
|
||||
url = test[i][1];
|
||||
try {aURL = de.anomic.net.URL.newURL(environment, url);} catch (MalformedURLException e) {aURL = null;}
|
||||
if (environment == null) {
|
||||
try {aURL = new de.anomic.net.URL(url);} catch (MalformedURLException e) {aURL = null;}
|
||||
try {jURL = new java.net.URL(url);} catch (MalformedURLException e) {jURL = null;}
|
||||
} else {
|
||||
try {aURL = new de.anomic.net.URL(new de.anomic.net.URL(environment), url);} catch (MalformedURLException e) {aURL = null;}
|
||||
try {jURL = new java.net.URL(new java.net.URL(environment), url);} catch (MalformedURLException e) {jURL = null;}
|
||||
}
|
||||
|
||||
// check equality to java.net.URL
|
||||
if (((aURL == null) && (jURL != null)) ||
|
||||
((aURL != null) && (jURL == null)) ||
|
||||
((aURL != null) && (jURL != null) && (!(jURL.toString().equals(aURL.toString()))))) {
|
||||
|
@ -559,6 +590,20 @@ public class URL {
|
|||
System.out.println((jURL == null) ? "jURL rejected input" : "jURL=" + jURL.toString());
|
||||
System.out.println((aURL == null) ? "aURL rejected input" : "aURL=" + aURL.toString());
|
||||
}
|
||||
|
||||
// check stability: the normalform of the normalform must be equal to the normalform
|
||||
if (aURL != null) try {
|
||||
aURL1 = new de.anomic.net.URL(aURL.toNormalform(false, true));
|
||||
if (!(aURL1.toNormalform(false, true).equals(aURL.toNormalform(false, true)))) {
|
||||
System.out.println("no stability for url:");
|
||||
System.out.println("aURL0=" + aURL.toString());
|
||||
System.out.println("aURL1=" + aURL1.toString());
|
||||
}
|
||||
} catch (MalformedURLException e) {
|
||||
System.out.println("no stability for url:");
|
||||
System.out.println("aURL0=" + aURL.toString());
|
||||
System.out.println("aURL1 cannot be computed:" + e.getMessage());
|
||||
}
|
||||
}
|
||||
}
|
||||
}
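The stability check added to `main` above asserts that normalizing an already normalized URL changes nothing. The same property can be tested for any string normalizer, as in the sketch below. `StabilityCheckSketch`, `isStable` and `uriNormalize` are hypothetical names, and `java.net.URI.normalize()` stands in for `toNormalform(false, true)` purely as an assumption for the example.

```java
import java.net.URI;
import java.util.function.UnaryOperator;

final class StabilityCheckSketch {
    // A normalizer is stable on an input if applying it twice gives
    // the same result as applying it once.
    static boolean isStable(String input, UnaryOperator<String> normalizer) {
        String once = normalizer.apply(input);
        String twice = normalizer.apply(once);
        return once.equals(twice);
    }

    // Stand-in normalizer: removes "." and ".." path segments.
    static String uriNormalize(String url) {
        return URI.create(url).normalize().toString();
    }
}
```

A normalizer that appends a slash on every call, for example, fails this check, which is the kind of drift the new test loop is meant to catch.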
|
||||
|
|
|
@ -77,7 +77,7 @@ public class ResourceInfo implements IResourceInfo {
|
|||
|
||||
// generating the url hash
|
||||
this.url = objectURL;
|
||||
this.urlHash = plasmaURL.urlHash(this.url.toNormalform());
|
||||
this.urlHash = plasmaURL.urlHash(this.url.toNormalform(true, true));
|
||||
|
||||
// create the http header object
|
||||
this.propertyMap = new HashMap(objectInfo);
|
||||
|
@ -88,7 +88,7 @@ public class ResourceInfo implements IResourceInfo {
|
|||
|
||||
// generating the url hash
|
||||
this.url = objectURL;
|
||||
this.urlHash = plasmaURL.urlHash(this.url.toNormalform());
|
||||
this.urlHash = plasmaURL.urlHash(this.url.toNormalform(true, true));
|
||||
|
||||
// create the http header object
|
||||
this.propertyMap = new HashMap();
|
||||
|
|
|
@ -76,7 +76,7 @@ public class ResourceInfo implements IResourceInfo {
|
|||
|
||||
// generating the url hash
|
||||
this.url = objectURL;
|
||||
this.urlHash = plasmaURL.urlHash(this.url.toNormalform());
|
||||
this.urlHash = plasmaURL.urlHash(this.url.toNormalform(true, true));
|
||||
|
||||
// create the http header object
|
||||
this.responseHeader = new httpHeader(null, objectInfo);
|
||||
|
@ -88,7 +88,7 @@ public class ResourceInfo implements IResourceInfo {
|
|||
|
||||
// generating the url hash
|
||||
this.url = objectURL;
|
||||
this.urlHash = plasmaURL.urlHash(this.url.toNormalform());
|
||||
this.urlHash = plasmaURL.urlHash(this.url.toNormalform(true, true));
|
||||
|
||||
this.requestHeader = requestHeaders;
|
||||
this.responseHeader = responseHeaders;
|
||||
|
|
|
@ -188,7 +188,7 @@ public class CrawlWorker extends AbstractCrawlWorker implements plasmaCrawlWorke
|
|||
if (isFolder) {
|
||||
fullPath = fullPath + "/";
|
||||
file = "";
|
||||
this.url = new URL(this.url,fullPath);
|
||||
this.url = URL.newURL(this.url,fullPath);
|
||||
}
|
||||
}
|
||||
|
||||
|
|
|
@ -318,7 +318,7 @@ public final class CrawlWorker extends AbstractCrawlWorker {
|
|||
}
|
||||
|
||||
// normalizing URL
|
||||
redirectionUrlString = new URL(this.url, redirectionUrlString).toNormalform();
|
||||
redirectionUrlString = new URL(redirectionUrlString).toNormalform(true, true);
|
||||
|
||||
// generating the new URL object
|
||||
URL redirectionUrl = new URL(redirectionUrlString);
|
||||
|
@ -351,16 +351,15 @@ public final class CrawlWorker extends AbstractCrawlWorker {
|
|||
if (redirectedEntry != null) {
|
||||
// TODO: Here we can store the content of the redirection
|
||||
// as content of the original URL if some criteria are met
|
||||
// See: http://www.yacy-forum.de/viewtopic.php?t=1719
|
||||
//
|
||||
// plasmaHTCache.Entry newEntry = (plasmaHTCache.Entry) redirectedEntry.clone();
|
||||
// newEntry.url = url;
|
||||
// TODO: which http header should we store here?
|
||||
// TODO: which http header should we store here?
|
||||
//
|
||||
// // enQueue new entry with response header
|
||||
// if (profile != null) {
|
||||
// cacheManager.push(newEntry);
|
||||
// }
|
||||
// cacheManager.push(newEntry);
|
||||
// }
|
||||
// htCache = newEntry;
|
||||
}
|
||||
}
|
||||
|
|
|
@ -153,7 +153,7 @@ public abstract class AbstractParser implements Parser{
|
|||
if (file.isDirectory()) {
|
||||
result += parseDir(location, prefix, file, doc);
|
||||
} else try {
|
||||
URL url = new URL(location, "/" + prefix + "/"
|
||||
URL url = URL.newURL(location, "/" + prefix + "/"
|
||||
// XXX: workaround for relative paths within document
|
||||
+ file.getPath().substring(file.getPath().indexOf(File.separatorChar) + 1)
|
||||
+ "/" + file.getName());
|
||||
|
|
|
@ -117,7 +117,7 @@ public class SZParserExtractCallback extends ArchiveExtractCallback {
|
|||
plasmaParserDocument theDoc;
|
||||
// workaround for relative links in file, normally '#' shall be used behind the location, see
|
||||
// below for reversion of the effects
|
||||
URL url = new URL(doc.getLocation(), this.prefix + "/" + super.filePath);
|
||||
URL url = URL.newURL(doc.getLocation(), this.prefix + "/" + super.filePath);
|
||||
String mime = plasmaParser.getMimeTypeByFileExt(super.filePath.substring(super.filePath.lastIndexOf('.') + 1));
|
||||
if (this.cfos.isFallback()) {
|
||||
theDoc = this.parser.parseSource(url, mime, null, this.cfos.getContentFile());
|
||||
|
@ -129,7 +129,7 @@ public class SZParserExtractCallback extends ArchiveExtractCallback {
|
|||
Map nanchors = new HashMap(theDoc.getAnchors().size(), 1f);
|
||||
Iterator it = theDoc.getAnchors().entrySet().iterator();
|
||||
Map.Entry entry;
|
||||
String base = doc.getLocation().toNormalform();
|
||||
String base = doc.getLocation().toNormalform(false, true);
|
||||
while (it.hasNext()) {
|
||||
entry = (Map.Entry)it.next();
|
||||
if (((String)entry.getKey()).startsWith(base + "/")) {
|
||||
|
|
|
@ -166,7 +166,7 @@ public class tarParser extends AbstractParser implements Parser {
|
|||
checkInterruption();
|
||||
|
||||
// parsing the content
|
||||
subDoc = theParser.parseSource(new URL(location,"#" + entryName),entryMime,null,subDocTempFile);
|
||||
subDoc = theParser.parseSource(URL.newURL(location,"#" + entryName),entryMime,null,subDocTempFile);
|
||||
} catch (ParserException e) {
|
||||
this.theLogger.logInfo("Unable to parse tar file entry '" + entryName + "'. " + e.getMessage());
|
||||
} finally {
|
||||
|
|
|
@ -149,7 +149,7 @@ public class zipParser extends AbstractParser implements Parser {
|
|||
serverFileUtils.copy(zippedContent,subDocTempFile,entry.getSize());
|
||||
|
||||
// parsing the zip file entry
|
||||
subDoc = theParser.parseSource(new URL(location,"#" + entryName),entryMime,null, subDocTempFile);
|
||||
subDoc = theParser.parseSource(URL.newURL(location,"#" + entryName),entryMime,null, subDocTempFile);
|
||||
} catch (ParserException e) {
|
||||
this.theLogger.logInfo("Unable to parse zip file entry '" + entryName + "'. " + e.getMessage());
|
||||
} finally {
|
||||
|
|
|
@ -208,7 +208,7 @@ public final class plasmaCondenser {
|
|||
htmlFilterImageEntry ientry;
|
||||
while (i.hasNext()) {
|
||||
ientry = (htmlFilterImageEntry) i.next();
|
||||
insertTextToWords((String) ientry.url().toNormalform(), 99, flag_cat_hasimage, wflags);
|
||||
insertTextToWords((String) ientry.url().toNormalform(false, true), 99, flag_cat_hasimage, wflags);
|
||||
insertTextToWords((String) ientry.alt(), 99, flag_cat_hasimage, wflags);
|
||||
}
|
||||
|
||||
|
|
|
@ -63,7 +63,7 @@ public class plasmaCrawlEntry {
|
|||
|
||||
private String initiator; // the initiator hash, is NULL or "" if it is the own proxy;
|
||||
// if this is generated by a crawl, the own peer hash is entered
|
||||
private String urlhash; // the url's hash
|
||||
private String urlhash; // the url's hash
|
||||
private String referrer; // the url's referrer hash
|
||||
private URL url; // the url as string
|
||||
private String name; // the name of the url, from anchor tag <a>name</a>
|
||||
|
|
|
@ -454,7 +454,6 @@ public final class plasmaCrawlLURL {
|
|||
}
|
||||
|
||||
// The Cleaner class was provided as "UrldbCleaner" by Hydrox
|
||||
// see http://www.yacy-forum.de/viewtopic.php?p=18093#18093
|
||||
public Cleaner makeCleaner() {
|
||||
return new Cleaner();
|
||||
}
|
||||
|
@ -502,15 +501,15 @@ public final class plasmaCrawlLURL {
|
|||
remove(entry.hash());
|
||||
} else if (plasmaSwitchboard.urlBlacklist.isListed(plasmaURLPattern.BLACKLIST_CRAWLER, comp.url()) ||
|
||||
plasmaSwitchboard.urlBlacklist.isListed(plasmaURLPattern.BLACKLIST_DHT, comp.url())) {
|
||||
lastBlacklistedUrl = comp.url().toNormalform();
|
||||
lastBlacklistedUrl = comp.url().toNormalform(true, true);
|
||||
lastBlacklistedHash = entry.hash();
|
||||
serverLog.logFine("URLDBCLEANER", ++blacklistedUrls + " blacklisted (" + ((double) blacklistedUrls / totalSearchedUrls) * 100 + "%): " + entry.hash() + " " + comp.url().toNormalform());
|
||||
serverLog.logFine("URLDBCLEANER", ++blacklistedUrls + " blacklisted (" + ((double) blacklistedUrls / totalSearchedUrls) * 100 + "%): " + entry.hash() + " " + comp.url().toNormalform(false, true));
|
||||
remove(entry.hash());
|
||||
if (blacklistedUrls % 100 == 0) {
|
||||
serverLog.logInfo("URLDBCLEANER", "Deleted " + blacklistedUrls + " URLs until now. Last deleted URL-Hash: " + lastBlacklistedUrl);
|
||||
}
|
||||
}
|
||||
lastUrl = comp.url().toNormalform();
|
||||
lastUrl = comp.url().toNormalform(true, true);
|
||||
lastHash = entry.hash();
|
||||
}
|
||||
}
|
||||
|
|
|
@ -223,7 +223,7 @@ public final class plasmaCrawlStacker {
|
|||
}
|
||||
|
||||
return stackCrawl(
|
||||
theMsg.url().toString(),
|
||||
theMsg.url().toNormalform(true, true),
|
||||
theMsg.referrerhash(),
|
||||
theMsg.initiator(),
|
||||
theMsg.name(),
|
||||
|
|
|
@ -303,7 +303,7 @@ public final class plasmaHTCache {
|
|||
if (deleteFileandDirs(key, getCachePath(url), msg)) {
|
||||
try {
|
||||
// As the file is gone, the entry in responseHeader.db is not needed anymore
|
||||
this.log.logFinest("Trying to remove responseHeader from URL: " + url.toString());
|
||||
this.log.logFinest("Trying to remove responseHeader from URL: " + url.toNormalform(false, true));
|
||||
this.responseHeaderDB.remove(plasmaURL.urlHash(url));
|
||||
} catch (IOException e) {
|
||||
resetResponseHeaderDB();
|
||||
|
@ -365,7 +365,7 @@ public final class plasmaHTCache {
|
|||
} else {
|
||||
URL url = getURL(file);
|
||||
if (url != null) {
|
||||
this.log.logFinest("Trying to remove responseHeader for URL: " + url.toString());
|
||||
this.log.logFinest("Trying to remove responseHeader for URL: " + url.toNormalform(false, true));
|
||||
this.responseHeaderDB.remove(plasmaURL.urlHash(url));
|
||||
}
|
||||
}
|
||||
|
@ -507,7 +507,7 @@ public final class plasmaHTCache {
|
|||
public IResourceInfo loadResourceInfo(URL url) throws UnsupportedProtocolException, IllegalAccessException {
|
||||
|
||||
// getting the URL hash
|
||||
String urlHash = plasmaURL.urlHash(url.toNormalform());
|
||||
String urlHash = plasmaURL.urlHash(url.toNormalform(true, true));
|
||||
|
||||
// loading data from database
|
||||
Map hdb = this.responseHeaderDB.getMap(urlHash);
|
||||
|
@ -976,7 +976,7 @@ public final class plasmaHTCache {
|
|||
|
||||
|
||||
// normalize url
|
||||
this.nomalizedURLString = url.toNormalform();
|
||||
this.nomalizedURLString = url.toNormalform(true, true);
|
||||
|
||||
try {
|
||||
this.url = new URL(this.nomalizedURLString);
|
||||
|
|
|
@@ -756,7 +756,7 @@ public final class plasmaParser {
 int p = 0;
 for (int i = 1; i <= 4; i++) for (int j = 0; j < scraper.getHeadlines(i).length; j++) sections[p++] = scraper.getHeadlines(i)[j];
 plasmaParserDocument ppd = new plasmaParserDocument(
-new URL(location.toNormalform()),
+new URL(location.toNormalform(true, true)),
 mimeType,
 charSet,
 scraper.getKeywords(),
@@ -841,7 +841,7 @@ public final class plasmaParser {
 loop: while (i.hasNext()) {
 o = i.next();
 if (o instanceof String) url = (String) o;
-else if (o instanceof htmlFilterImageEntry) url = ((htmlFilterImageEntry) o).url().toNormalform();
+else if (o instanceof htmlFilterImageEntry) url = ((htmlFilterImageEntry) o).url().toNormalform(true, true);
 else {
 assert false;
 continue;
@@ -874,7 +874,7 @@ public final class plasmaParser {
 while (i.hasNext()) {
 o = i.next();
 if (o instanceof String) url = (String) o;
-else if (o instanceof htmlFilterImageEntry) url = ((htmlFilterImageEntry) o).url().toNormalform();
+else if (o instanceof htmlFilterImageEntry) url = ((htmlFilterImageEntry) o).url().toNormalform(true, true);
 else {
 assert false;
 continue;
@@ -331,7 +331,7 @@ public class plasmaParserDocument {
 }
 try {
 url = new URL(u);
-u = url.toNormalform();
+u = url.toNormalform(true, true);
 if (plasmaParser.mediaExtContains(ext)) {
 // this is not a normal anchor, its a media link
 if (plasmaParser.imageExtContains(ext)) {
@@ -87,7 +87,7 @@ public final class plasmaSearchImages {
 Map.Entry e = (Map.Entry) i.next();
 String nexturlstring;
 try {
-nexturlstring = new URL((String) e.getKey()).toNormalform();
+nexturlstring = new URL((String) e.getKey()).toNormalform(true, true);
 addAll(new plasmaSearchImages(sc, serverDate.remainingTime(start, maxTime, 10), new URL(nexturlstring), depth - 1));
 } catch (MalformedURLException e1) {
 e1.printStackTrace();
@@ -112,7 +112,7 @@ public final class plasmaSearchPostOrder {
 // take out relevant information for reference computation
 indexURLEntry.Components comp = page.comp();
 if ((comp.url() == null) || (comp.title() == null)) return;
-String[] urlcomps = htmlFilterContentScraper.urlComps(comp.url().toNormalform()); // word components of the url
+String[] urlcomps = htmlFilterContentScraper.urlComps(comp.url().toNormalform(true, true)); // word components of the url
 String[] descrcomps = comp.title().toLowerCase().split(htmlFilterContentScraper.splitrex); // words in the description

 // store everything
@@ -173,7 +173,7 @@ public final class plasmaSearchPostOrder {
 // first scan all entries and find all urls that are referenced
 while (i.hasNext()) {
 entry = (Map.Entry) i.next();
-paths.put(((indexURLEntry) entry.getValue()).comp().url().toNormalform(), entry.getKey());
+paths.put(((indexURLEntry) entry.getValue()).comp().url().toNormalform(true, true), entry.getKey());
 //if (path != null) path = shortenPath(path);
 //if (path != null) paths.put(path, entry.getKey());
 }
@@ -183,7 +183,7 @@ public final class plasmaSearchPostOrder {
 String shorten;
 while (i.hasNext()) {
 entry = (Map.Entry) i.next();
-shorten = shortenPath(((indexURLEntry) entry.getValue()).comp().url().toNormalform());
+shorten = shortenPath(((indexURLEntry) entry.getValue()).comp().url().toNormalform(true, true));
 // scan all subpaths of the url
 while (shorten != null) {
 if (pageAcc.size() <= query.wantedResults) break;
@@ -259,7 +259,7 @@ public final class plasmaSearchPostOrder {
 String hash, fill;
 String[] paths1 = new String[urls.length]; for (int i = 0; i < urls.length; i++) {
 fill = ""; for (int j = 0; j < 35 - urls[i].toString().length(); j++) fill +=" ";
-paths1[i] = urls[i].toNormalform();
+paths1[i] = urls[i].toNormalform(true, true);
 hash = plasmaURL.urlHash(urls[i]);
 System.out.println("paths1[" + urls[i] + fill +"] = " + hash + ", typeID=" + plasmaURL.flagTypeID(hash) + ", tldID=" + plasmaURL.flagTLDID(hash) + ", lengthID=" + plasmaURL.flagLengthID(hash) + " / " + paths1[i]);
 }
@@ -308,7 +308,7 @@ public class plasmaSearchRankingProfile {

 // prefer hit with 'prefer' pattern
 indexURLEntry.Components comp = page.comp();
-if (comp.url().toNormalform().matches(query.prefer)) ranking += 256 << coeff_prefer;
+if (comp.url().toNormalform(true, true).matches(query.prefer)) ranking += 256 << coeff_prefer;
 if (comp.title().matches(query.prefer)) ranking += 256 << coeff_prefer;

 // apply 'common-sense' heuristic using references
@@ -682,7 +682,7 @@ public class plasmaSnippetCache {
 ArrayList result = new ArrayList();
 while (i.hasNext()) {
 ientry = (htmlFilterImageEntry) i.next();
-url = (String) ientry.url().toNormalform();
+url = (String) ientry.url().toNormalform(true, true);
 desc = (String) ientry.alt();
 //result.add(new MediaSnippet("image", url, (desc.length() == 0) ? url : desc, ientry.width() + " x " + ientry.height()));
 s = removeAppearanceHashes(url, queryhashes);
@@ -882,12 +882,12 @@ public class plasmaSnippetCache {
 (snippet.getErrorCode() == ERROR_RESOURCE_LOADING) ||
 (snippet.getErrorCode() == ERROR_PARSER_FAILED) ||
 (snippet.getErrorCode() == ERROR_PARSER_NO_LINES)) {
-log.logInfo("error: '" + snippet.getError() + "', remove url = " + snippet.getUrl().toNormalform() + ", cause: " + snippet.getError());
+log.logInfo("error: '" + snippet.getError() + "', remove url = " + snippet.getUrl().toNormalform(false, true) + ", cause: " + snippet.getError());
 sb.wordIndex.loadedURL.remove(urlHash);
 sb.wordIndex.removeHashReferences(queryhashes, urlHash);
 }
 if (snippet.getErrorCode() == ERROR_NO_MATCH) {
-log.logInfo("error: '" + snippet.getError() + "', remove words '" + querystring + "' for url = " + snippet.getUrl().toNormalform() + ", cause: " + snippet.getError());
+log.logInfo("error: '" + snippet.getError() + "', remove words '" + querystring + "' for url = " + snippet.getUrl().toNormalform(false, true) + ", cause: " + snippet.getError());
 sb.wordIndex.removeHashReferences(snippet.remaingHashes, urlHash);
 }
 return snippet.getError();
@@ -2444,7 +2444,7 @@ public final class plasmaSwitchboard extends serverAbstractSwitch implements ser
 sbStackCrawlThread.enqueue(nextUrl, entry.urlHash(), initiatorPeerHash, (String) nextEntry.getValue(), docDate, entry.depth() + 1, entry.profile());
 } catch (MalformedURLException e1) {}
 }
-log.logInfo("CRAWL: ADDED " + hl.size() + " LINKS FROM " + entry.normalizedURLString() +
+log.logInfo("CRAWL: ADDED " + hl.size() + " LINKS FROM " + entry.url().toNormalform(false, true) +
 ", NEW CRAWL STACK SIZE IS " + noticeURL.stackSize(plasmaCrawlNURL.STACK_TYPE_CORE));
 }
 stackEndTime = System.currentTimeMillis();
@@ -2471,7 +2471,7 @@ public final class plasmaSwitchboard extends serverAbstractSwitch implements ser
 indexingStartTime = System.currentTimeMillis();

 checkInterruption();
-log.logFine("Condensing for '" + entry.normalizedURLString() + "'");
+log.logFine("Condensing for '" + entry.url().toNormalform(false, true) + "'");
 plasmaCondenser condenser = new plasmaCondenser(document, entry.profile().indexText(), entry.profile().indexMedia());

 // generate citation reference
@@ -2575,8 +2575,8 @@ public final class plasmaSwitchboard extends serverAbstractSwitch implements ser
 String language = plasmaURL.language(entry.url());
 char doctype = plasmaURL.docType(document.getMimeType());
 indexURLEntry.Components comp = newEntry.comp();
-int urlLength = comp.url().toNormalform().length();
-int urlComps = htmlFilterContentScraper.urlComps(comp.url().toNormalform()).length;
+int urlLength = comp.url().toNormalform(true, true).length();
+int urlComps = htmlFilterContentScraper.urlComps(comp.url().toNormalform(true, true)).length;

 // iterate over all words
 Iterator i = condenser.words().entrySet().iterator();
@@ -2672,12 +2672,12 @@ public final class plasmaSwitchboard extends serverAbstractSwitch implements ser

 // if this was performed for a remote crawl request, notify requester
 if ((processCase == PROCESSCASE_6_GLOBAL_CRAWLING) && (initiatorPeer != null)) {
-log.logInfo("Sending crawl receipt for '" + entry.normalizedURLString() + "' to " + initiatorPeer.getName());
+log.logInfo("Sending crawl receipt for '" + entry.url().toNormalform(false, true) + "' to " + initiatorPeer.getName());
 if (clusterhashes != null) initiatorPeer.setAlternativeAddress((String) clusterhashes.get(initiatorPeer.hash));
 yacyClient.crawlReceipt(initiatorPeer, "crawl", "fill", "indexed", newEntry, "");
 }
 } else {
-log.logFine("Not Indexed Resource '" + entry.normalizedURLString() + "': process case=" + processCase);
+log.logFine("Not Indexed Resource '" + entry.url().toNormalform(false, true) + "': process case=" + processCase);
 addURLtoErrorDB(entry.url(), referrerUrlHash, initiatorPeerHash, docDescription, plasmaCrawlEURL.DENIED_UNKNOWN_INDEXING_PROCESS_CASE, new kelondroBitfield());
 }
 } catch (Exception ee) {
@@ -2956,7 +2956,7 @@ public final class plasmaSwitchboard extends serverAbstractSwitch implements ser
 urlname = "http://share." + seed.getName() + ".yacy" + filename;
 if ((p = urlname.indexOf("?")) > 0) urlname = urlname.substring(0, p);
 } else {
-urlstring = comp.url().toNormalform();
+urlstring = comp.url().toNormalform(false, true);
 urlname = urlstring;
 }

@@ -268,10 +268,6 @@ public class plasmaSwitchboardQueue {
 return url;
 }

-public String normalizedURLString() {
-return url.toNormalform();
-}
-
 public String urlHash() {
 return plasmaURL.urlHash(url);
 }
@@ -365,7 +361,7 @@ public class plasmaSwitchboardQueue {
 return "Indexing_Not_Allowed";
 }

-String nURL = normalizedURLString();
+String nURL = url.toNormalform(true, true);
 // -CGI access in request
 // CGI access makes the page very individual, and therefore not usable in caches
 if (!profile().crawlingQ()) {
@@ -420,7 +416,7 @@ public class plasmaSwitchboardQueue {
 return "Indexing_Not_Allowed";
 }

-final String nURL = normalizedURLString();
+final String nURL = url().toNormalform(true, true);
 // -CGI access in request
 // CGI access makes the page very individual, and therefore not usable in caches
 if (!profile().crawlingQ()) {
@@ -460,7 +460,7 @@ public class plasmaURL {
 // combine the attributes
 StringBuffer hash = new StringBuffer(12);
 // form the 'local' part of the hash
-hash.append(kelondroBase64Order.enhancedCoder.encode(serverCodings.encodeMD5Raw(url.toNormalform())).substring(0, 5)); // 5 chars
+hash.append(kelondroBase64Order.enhancedCoder.encode(serverCodings.encodeMD5Raw(url.toNormalform(true, true))).substring(0, 5)); // 5 chars
 hash.append(subdomPortPath(subdom, port, rootpath)); // 1 char
 // form the 'global' part of the hash
 hash.append(protocolHostPort(url.getProtocol(), host, port)); // 5 chars
@@ -279,7 +279,7 @@ public final class plasmaWordIndex implements indexRI {
 // use all the words in one condenser object to simultanous create index entries

 int wordCount = 0;
-int urlLength = url.toString().length();
+int urlLength = url.toNormalform(true, true).length();
 int urlComps = htmlFilterContentScraper.urlComps(url.toString()).length;

 // iterate over all words of context text
@@ -542,7 +542,6 @@ public final class plasmaWordIndex implements indexRI {
 }

 // The Cleaner class was provided as "UrldbCleaner" by Hydrox
-// see http://www.yacy-forum.de/viewtopic.php?p=18093#18093
 public synchronized Cleaner makeCleaner(plasmaCrawlLURL lurl, String startHash) {
 return new Cleaner(lurl, startHash);
 }
@@ -167,8 +167,6 @@ public final class serverCharBuffer extends Writer {
 // do not use/implement the following method, a
 // "overridden method is a bridge method"
 // will occur
-// see also: http://www.yacy-forum.de/viewtopic.php?p=26407#26407
-// and http://www.yacy-forum.de/viewtopic.php?t=2833
 // public serverCharBuffer append(char b) {
 // write(b);
 // return this;
@@ -90,7 +90,7 @@ public class serverObjects extends Hashtable implements Cloneable {
 * like put, but it replaces any HTML special chars.
 */
 public Object putSafeXML(Object key, String value){
-return put(key, htmlTools.replaceXMLEntities(value));
+return put(key, htmlTools.encodeUnicode2html(value, true));
 }

 // new put takes also null values
@@ -169,9 +169,9 @@ public class SearchService extends AbstractService
 // Postprocess search ...
 int count = Integer.valueOf(searchResult.get("type_results","0")).intValue();
 for (int i=0; i < count; i++) {
-searchResult.put("type_results_" + i + "_url",htmlTools.replaceXMLEntities(searchResult.get("type_results_" + i + "_url","")));
-searchResult.put("type_results_" + i + "_description",htmlTools.replaceXMLEntities(searchResult.get("type_results_" + i + "_description","")));
-searchResult.put("type_results_" + i + "_urlname",htmlTools.replaceXMLEntities(searchResult.get("type_results_" + i + "_urlname","")));
+searchResult.put("type_results_" + i + "_url",htmlTools.encodeUnicode2html(searchResult.get("type_results_" + i + "_url",""), false));
+searchResult.put("type_results_" + i + "_description",htmlTools.encodeUnicode2html(searchResult.get("type_results_" + i + "_description",""), true));
+searchResult.put("type_results_" + i + "_urlname",htmlTools.encodeUnicode2html(searchResult.get("type_results_" + i + "_urlname",""), true));
 }

 // format the result
@@ -749,12 +749,12 @@ public final class yacyClient {
 yacyNetwork.enrichRequestPost(post, plasmaSwitchboard.getSwitchboard(), target.hash);
 post.put("process", "crawl");
 if (url.length == 1) {
-post.put("url", crypt.simpleEncode(url[0].toString()));
-post.put("referrer", crypt.simpleEncode((referrer[0] == null) ? "" : referrer[0].toString()));
+post.put("url", crypt.simpleEncode(url[0].toNormalform(true, true)));
+post.put("referrer", crypt.simpleEncode((referrer[0] == null) ? "" : referrer[0].toNormalform(true, true)));
 } else {
 for (int i=0; i< url.length; i++) {
-post.put("url" + i, crypt.simpleEncode(url[i].toString()));
-post.put("ref" + i, crypt.simpleEncode((referrer[i] == null) ? "" : referrer[i].toString()));
+post.put("url" + i, crypt.simpleEncode(url[i].toNormalform(true, true)));
+post.put("ref" + i, crypt.simpleEncode((referrer[i] == null) ? "" : referrer[i].toNormalform(true, true)));
 }
 }
 post.put("depth", "0");
@@ -144,16 +144,16 @@ public final class yacyVersion implements Comparator, Comparable {

 public boolean equals(Object obj) {
 yacyVersion v = (yacyVersion) obj;
-return (this.svn == v.svn) && (this.url.toNormalform().equals(v.url.toNormalform()));
+return (this.svn == v.svn) && (this.url.toNormalform(true, true).equals(v.url.toNormalform(true, true)));
 }

 public int hashCode() {
-return this.url.toNormalform().hashCode();
+return this.url.toNormalform(true, true).hashCode();
 }

 public String toAnchor() {
 // generates an anchor string that can be used to embed in an html for direct download
-return "<a href=" + this.url.toNormalform() + ">YaCy " + ((this.proRelease) ? "pro release" : "standard release") + " v" + this.releaseNr + ", SVN " + this.svn + "</a>";
+return "<a href=" + this.url.toNormalform(true, true) + ">YaCy " + ((this.proRelease) ? "pro release" : "standard release") + " v" + this.releaseNr + ", SVN " + this.svn + "</a>";
 }

 // static methods:
@@ -215,36 +215,54 @@ public final class yacyVersion implements Comparator, Comparable {
 // check if we know that there is a release that is more recent than that which we are using
 TreeSet[] releasess = yacyVersion.allReleases(true); // {0=promain, 1=prodev, 2=stdmain, 3=stddev}
 boolean pro = new File(sb.getRootPath(), "libx").exists();
-yacyVersion latestmain = (yacyVersion) releasess[(pro) ? 0 : 2].last();
-yacyVersion latestdev = (yacyVersion) releasess[(pro) ? 1 : 3].last();
+yacyVersion latestmain = (releasess[(pro) ? 0 : 2].size() == 0) ? null : (yacyVersion) releasess[(pro) ? 0 : 2].last();
+yacyVersion latestdev = (releasess[(pro) ? 1 : 3].size() == 0) ? null : (yacyVersion) releasess[(pro) ? 1 : 3].last();
 String concept = sb.getConfig("update.concept", "any");
 String blacklist = sb.getConfig("update.blacklist", ".\\...[123]");

 if ((manual) || (concept.equals("any"))) {
 // return a dev-release or a main-release
-if ((latestdev.compareTo(latestmain) > 0) && (!(Float.toString(latestdev.releaseNr).matches(blacklist)))) {
-if (latestdev.compareTo(thisVersion()) > 0) return latestdev; else {
-yacyCore.log.logInfo("rulebasedUpdateInfo: latest dev " + latestdev.name + " is not more recent than installed release " + thisVersion().name);
+if ((latestdev != null) &&
+((latestmain == null) || (latestdev.compareTo(latestmain) > 0)) &&
+(!(Float.toString(latestdev.releaseNr).matches(blacklist)))) {
+// consider a dev-release
+if (latestdev.compareTo(thisVersion()) > 0) {
+return latestdev;
+} else {
+yacyCore.log.logInfo(
+"rulebasedUpdateInfo: latest dev " + latestdev.name +
+" is not more recent than installed release " + thisVersion().name);
 return null;
 }
-} else {
+}
+if (latestmain != null) {
+// consider a main release
 if ((Float.toString(latestmain.releaseNr).matches(blacklist))) {
-yacyCore.log.logInfo("rulebasedUpdateInfo: latest dev " + latestdev.name + " matches with blacklist '" + blacklist + "'");
+yacyCore.log.logInfo(
+"rulebasedUpdateInfo: latest dev " + latestdev.name +
+" matches with blacklist '" + blacklist + "'");
 return null;
 }
 if (latestmain.compareTo(thisVersion()) > 0) return latestmain; else {
-yacyCore.log.logInfo("rulebasedUpdateInfo: latest main " + latestmain.name + " is not more recent than installed release (1) " + thisVersion().name);
+yacyCore.log.logInfo(
+"rulebasedUpdateInfo: latest main " + latestmain.name +
+" is not more recent than installed release (1) " + thisVersion().name);
 return null;
 }
 }
 }
-if (concept.equals("main")) {
+if ((concept.equals("main")) && (latestmain != null)) {
 // return a main-release
 if ((Float.toString(latestmain.releaseNr).matches(blacklist))) {
-yacyCore.log.logInfo("rulebasedUpdateInfo: latest main " + latestmain.name + " matches with blacklist'" + blacklist + "'");
+yacyCore.log.logInfo(
+"rulebasedUpdateInfo: latest main " + latestmain.name +
+" matches with blacklist'" + blacklist + "'");
 return null;
 }
 if (latestmain.compareTo(thisVersion()) > 0) return latestmain; else {
-yacyCore.log.logInfo("rulebasedUpdateInfo: latest main " + latestmain.name + " is not more recent than installed release (2) " + thisVersion().name);
+yacyCore.log.logInfo(
+"rulebasedUpdateInfo: latest main " + latestmain.name +
+" is not more recent than installed release (2) " + thisVersion().name);
 return null;
 }
 }
@@ -906,10 +906,10 @@ public final class yacy {
 indexURLEntry.Components comp = entry.comp();
 if ((entry != null) && (comp.url() != null)) {
 if (html) {
-bos.write(("<a href=\"" + comp.url().toNormalform() + "\">" + comp.title() + "</a><br>").getBytes("UTF-8"));
+bos.write(("<a href=\"" + comp.url().toNormalform(false, true) + "\">" + comp.title() + "</a><br>").getBytes("UTF-8"));
 bos.write(serverCore.crlf);
 } else {
-bos.write(comp.url().toNormalform().getBytes());
+bos.write(comp.url().toNormalform(false, true).getBytes());
 bos.write(serverCore.crlf);
 }
 }
@@ -1037,9 +1037,8 @@ public final class yacy {
 }

 /**
-* Searching for peers affected by Bug documented in <a href="http://www.yacy-forum.de/viewtopic.php?p=16056#16056">YaCy-Forum Posting 16056</a>
+* Searching for peers affected by Bug
 * @param homePath
-* @see <a href="http://www.yacy-forum.de/viewtopic.php?p=16056#16056">YaCy-Forum Posting 16056</a>
 */
 public static void testPeerDB(String homePath) {

@@ -27,7 +27,7 @@ Echo **** (C) by Michael Peter Christen, usage granted under the GPL Version 2
 Echo **** USE AT YOUR OWN RISK! Project home and releases: http://yacy.net/yacy ****
 Echo ** LOG of YaCy: DATA/LOG/yacy00.log (and yacy^<xx^>.log) **
 Echo ** STOP YaCy: execute stopYACY.bat and wait some seconds **
-Echo ** GET HELP for YaCy: see www.yacy-websearch.net/wiki and www.yacy-forum.de **
+Echo ** GET HELP for YaCy: see www.yacy-websearch.net/wiki and forum.yacy.de **
 Echo *******************************************************************************
 Echo ^>^> YaCy started as daemon process. Administration at http://localhost:%port% ^<^<

@@ -5,7 +5,7 @@ echo "**** (C) by Michael Peter Christen, usage granted under the GPL Version 2
 echo "**** USE AT YOUR OWN RISK! Project home and releases: http://yacy.net/yacy ****"
 echo "** LOG of YaCy: DATA/LOG/yacy00.log (and yacy<xx>.log) **"
 echo "** STOP YaCy: execute stopYACY.sh and wait some seconds **"
-echo "** GET HELP for YaCy: see www.yacy-websearch.net/wiki and www.yacy-forum.de **"
+echo "** GET HELP for YaCy: see www.yacy-websearch.net/wiki and forum.yacy.de **"
 echo "*******************************************************************************"
 echo " >> YaCy started as daemon process. Administration at http://localhost:8080 <<"
 echo " You can close this window now, this will NOT shut down your YaCy peer."
@@ -124,7 +124,7 @@ else
 echo "**** USE AT YOUR OWN RISK! Project home and releases: http://yacy.net/yacy ****"
 echo "** LOG of YaCy: DATA/LOG/yacy00.log (and yacy<xx>.log) **"
 echo "** STOP YaCy: execute stopYACY.sh and wait some seconds **"
-echo "** GET HELP for YaCy: see www.yacy-websearch.net/wiki and www.yacy-forum.de **"
+echo "** GET HELP for YaCy: see www.yacy-websearch.net/wiki and forum.yacy.de **"
 echo "*******************************************************************************"
 echo " >> YaCy started as daemon process. Administration at http://localhost:8080 << "
 eval $cmdline
@@ -5,7 +5,6 @@ public class ParseVersion extends TestCase {
 /**
 * Test method for 'yacy.combinedVersionString2PrettyString(String)'
 * @author Bost
-* @link <a href="http://www.yacy-forum.de/viewtopic.php?t=2717">yacy-forum.de: ne Verbesserung von combinedVersionString2PrettyString(...)</a>
 */
 public void testCombinedVersionString2PrettyString() {
 assertEquals("dev/00000", yacy.combined2prettyVersion("")); // not a number