- documentation changes (removed old forum links)

- different handling of link quotation
- different handling of link normalization
- enhanced HTML/Unicode encoding and decoding
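
The call sites changed below all follow one pattern: `toNormalform()` becomes `toNormalform(boolean, boolean)`, and `replaceXMLEntities(...)` / `replaceHTML(...)` become `encodeUnicode2html(String, boolean)`. The diff does not show what the flags mean, so the following is only a sketch under stated assumptions: the first normalization flag is assumed to strip the `#fragment`, the second to turn `&amp;` back into `&`, and the encoder flag is assumed to decide whether `&` itself is escaped (which would explain why URLs are passed `false` and free text `true`). `LinkHandlingSketch`, `normalize` and `encode` are hypothetical names, not the YaCy implementation.

```java
// Sketch only: hypothetical stand-ins for the two-flag calls seen in this diff.
// The flag meanings are assumptions, not taken from the YaCy sources.
final class LinkHandlingSketch {

    // Assumed behaviour: optionally drop the "#fragment" part and
    // optionally replace the entity "&amp;" with a plain '&'.
    static String normalize(String url, boolean stripReference, boolean stripAmp) {
        String result = url.trim();
        if (stripReference) {
            int hash = result.indexOf('#');
            if (hash >= 0) result = result.substring(0, hash);
        }
        if (stripAmp) result = result.replace("&amp;", "&");
        return result;
    }

    // Assumed behaviour: escape markup characters and non-ASCII code points;
    // '&' is escaped only when includingAmpersand is true.
    static String encode(String text, boolean includingAmpersand) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '&': sb.append(includingAmpersand ? "&amp;" : "&"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:
                    if (cp > 127) sb.append("&#").append(cp).append(';'); // numeric character reference
                    else sb.append((char) cp);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A crawl start URL: fragment removed, "&amp;" undone.
        System.out.println(normalize("http://example.org/a?x=1&amp;y=2#top", true, true));
        // A URL written into a page: the '&' between parameters is left alone.
        System.out.println(encode("http://example.org/?a=1&b=2", false));
        // Free text written into a page: everything is escaped.
        System.out.println(encode("Tom & Jerry <b>", true));
    }
}
```

Under these assumptions, the `(true, true)` calls in the crawl-start code paths would yield a canonical URL for comparison and storage, while the `(false, true)` calls in the display code would keep the fragment visible to the user.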

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3993 6c8d7289-2bf4-0310-a012-ef5d649a1542
orbiter 2007-07-19 15:32:10 +00:00
parent dcb8687904
commit 40b0547611
80 changed files with 334 additions and 381 deletions

View File

@ -68,7 +68,7 @@ If you download the software, you must accept the <a href="License.html">License
<li><a href="http://www.yacy.net/yacy/release/yacy_v0.52_20070512_3715.exe"><tt>yacy_v0.52_20070512_3715.exe</tt></a></li>
</ul>
</ul></p>
<p>Fresh builds from compiles out of SVN can be obtained from <a href="http://latest.yacy-forum.net">http://latest.yacy-forum.net/</a>.</p>
<p>Fresh builds from compiles out of SVN can be obtained <a href="http://www.findenstattsuchen.info/YaCy/latest/index.php">here</a>.</p>
<br><h3>Installation</h3>
<p><ul>

View File

@ -37,7 +37,6 @@ Example:
# first published on http://www.anomic.de
# Frankfurt, Germany, 2005
#
# This file is maintained by Roland Ramthun <admin@yacy-forum.de>
# This file is written by (chronological order) Roland Ramthun <admin@yacy-forum.de>, Oliver Wunder <webmaster@daburna.de>, Jan Sandbrink
# If you find any mistakes or untranslated strings in this file please don't hesitate to email them to the maintainer.
@ -94,7 +93,6 @@ Full example:
# first published on http://www.anomic.de
# Frankfurt, Germany, 2005
#
# This file is maintained by Roland Ramthun <admin@yacy-forum.de>
# This file is written by (chronological order) Roland Ramthun <admin@yacy-forum.de>, Oliver Wunder <webmaster@daburna.de>, Jan Sandbrink
# If you find any mistakes or untranslated strings in this file please don't hesitate to email them to the maintainer.
@ -106,7 +104,7 @@ Full example:
# Thank you for your help!
<!-- lang -->default\(english\)==Deutsch
<!-- author -->==Roland Ramthun, Oliver Wunder, Jan Sandbrink
<!-- maintainer -->==&lt;admin@yacy-forum.de&gt;
<!-- maintainer -->==
#-----------------------------
#File: Blacklist_p.html

View File

@ -54,10 +54,9 @@ globalheader();
<p>Other YaCy Project Sites
<ul>
<li><a href="http://www.yacy-websuche.de/wiki"><b>YaCy Wiki</b></a> - administrated by Alexander Schier</li>
<li><a href="http://www.yacy-websuche.de"><b>German documentation</b></a> - initiated and administrated by Alexander Schier</li>
<li><a href="http://www.yacy-forum.de"><b>Deutsches Forum</b></a>, administrated by Roland Ramthun</li>
<li><a href="http://sourceforge.net/forum/?group_id=116142"><b>English Forum</b></a></li>
<li><a href="http://www.yacy-websearch.net/wiki"><b>YaCy Wiki</b></a></li>
<li><a href="http://forum.yacy.de"><b>Deutsches Forum</b></a></li>
<li><a href="http://yacy-forum.huzzaar.com/"><b>English Forum</b></a></li>
<li><a href="http://developer.berlios.de/projects/yacy/"><b>YaCy at BerliOS</b> - our SVN hosting service</li>
<li><a href="http://freshmeat.net/projects/yacyproxy/"><b>YaCy at fresmeat.net</b></a> - Project Announcement Page (please click here to support the project and enhance Rating/Popularity)</li>
<li><a href="http://sourceforge.net/projects/yacy/"><b>YaCy at sourceforge.net</b></a> - Project Services; Forum and (in the future) CVS Hosting.</li>
@ -65,10 +64,10 @@ globalheader();
<p>Public Interfaces to YaCy Services and Statistics
<ul>
<li><a href="http://www.yacystats.de/"><b>Statistics about the YaCy network and indexed pages</b></a> - from Alexander Fieger</li>
<li><a href="http://yacy.naggel.info/"><b>PHP-based Interface to YaCy Search</b> using YaCys RSS Search Result Output</a> - from Hendrik Richter</li>
<li><a href="http://www.deruwe.de/yacy.html"><b>Public Interface for Crawl-Start Entry</b> - from <a href="http://www.deruwe.de/">slick</a></li>
<li><a href="http://yacy.naggel.info/stats.php"><b>Stats about the YaCy network and indexed pages</b></a> - from Hendrik Richter</li>
<li><a href="http://borg-0300.dyndns.org:3000/"><b>Statistics about the YaCy network and indexed pages</b></a> - from Thomas/Borg-0300</li>
</ul></p><br>
<p>Publications about YaCy

View File

@ -342,7 +342,7 @@ location.</li>
<li>enhancements to YaCyWiki</li>
<li>added interface for customised blacklist classes</li>
<li>enhancements for dir.html application: dirlisting for all empty directories, new place in htroot/htdocsdefault</li>
<li>Interface YPStats_p.html for http://ypstats.yacy-forum.de/index.php to collect statistics</li>
<li>Interface YPStats_p.html to collect statistics</li>
</ul>
<li>Enhanced Stability</li>
<ul>
@ -746,7 +746,6 @@ location.</li>
<li>auto-heal of seed.db - fail</li>
<li>many minor bug fixed</li>
</ul>
<li>new <a href="http://www.yacy-forum.de">german forum at http://www.yacy-forum.de</a>, provided by Roland Ramthun</li>
</ul>
<br><p>v0.33_build20050107

View File

@ -59,13 +59,11 @@ globalheader();
<li><b>Timo Leise</b> suggested and implemented an extension to the blacklist feature: part-of-domain matching.</li>
<li><b>Marc Nause</b> made many major enhancements to the YaCyWiki, the Message- and User-Profile menues and functions.</li>
<li><b>Thomas Quella</b> designed the Kaskelix mascot. He also made a large number of bug fixes.</li>
<li><b>Roland Ramthun</b> owns and administrates the <a href="http://www.yacy-forum.de/">German YaCy-Forum</a>. He publishes a monthly YaCy newsletter, cares for correct English spelling and a German translation of the YaCy user interface. Roland and other forum participants extended the PHPForum code to make it usable as bug- and feature-tracking system..</li>
<li><b>Wolfgang Sander-Beuermann</b>, executive board member of the German search-engine association <a href="http://www.suma-ev.de/">SuMa-eV</a>
and manager of the meta-search-engine <a href="http://www.metager.de">metaGer</a> provided computing resources for a <a href="http://www.suma-lab.de:8080">demo peer</a>. He also pushed the project by arranging promotional events.</li>
<li><b>Alexander Schier</b> did much alpha-testing from beginning of project, and suggested many features; implemented the blacklist feature, bookmarks, log-menu, user-db, skin-feature, windows-installer and provided first implementation of the yacybar Firefox extension; admin of yacy-websuche.de and the media-wiki at yacy-websuche.de/wiki.</li>
<li><b>Alexander Schier</b> did much alpha-testing from beginning of project, and suggested many features; implemented the blacklist feature, bookmarks, log-menu, user-db, skin-feature, windows-installer and provided first implementation of the yacybar Firefox extension.</li>
<li><b>Matthias S&ouml;hnholz</b> added the offline-browsing feature</li>
<li><b>slick</b> helps as packager (.rpm, .deb etc)</li>
<li><b>Martin Thelian</b> made system-wide performance enhancement by introducing thread pools; he added ICAP and SOAP support, most of external parser integration, maintains the http protocol implementation, added squid compatibility, robots protocol, better logging and many index protocol, import/export and transfer enhancements. He created a YaCy screensaver and coded major parts of the yacybar Firefox extension.</li>
<li><b>Oliver Wunder</b> provided some german translation. He also made bittorrent-releases</li>
</ul>

View File

@ -59,7 +59,7 @@ In case you don't know how to make such a file please read <a href="http://www.r
<br>
After some hours all yacybots will obey your instructions.
<h3>This didn't help me.</h3>
If there are any questions left please visit our <a href="http://www.yacy-forum.net">forum</a> and ask for help.
If there are any questions left please visit our <a href="http://forum.yacy.net">forum</a> and ask for help.
<!-- ----- HERE ENDS CONTENT PART ----- -->
<SCRIPT LANGUAGE="JavaScript1.1"><!--
globalfooter();

View File

@ -77,7 +77,7 @@ globalheader();
<tr><td height="20" class="white" bgcolor="#BDCDD4" valign="middle">&nbsp;<a href="Contact.html" class="dark">Contact</a></td></tr>
<tr><td height="2"></td></tr><tr><td height="20" class="white" bgcolor="#FFFFFF" valign="middle">&nbsp;</td></tr>
<tr><td height="2"></td></tr>
<tr><td height="20" class="white" bgcolor="#BDCDD4" valign="middle">&nbsp;<a href="http://www.yacy-forum.de" class="dark"><nobr>Deutsches Forum</nobr></a></td></tr>
<tr><td height="20" class="white" bgcolor="#BDCDD4" valign="middle">&nbsp;<a href="http://forum.yacy.de" class="dark"><nobr>Deutsches Forum</nobr></a></td></tr>
<tr><td height="2"></td></tr>
<tr><td height="20" class="white" bgcolor="#BDCDD4" valign="middle">&nbsp;<a href="http://sourceforge.net/forum/?group_id=116142" class="dark">English Forum</a></td></tr>
<tr><td height="2"></td></tr><tr><td height="20" class="white" bgcolor="#FFFFFF" valign="middle">&nbsp;</td></tr>

View File

@ -174,7 +174,7 @@ public class Bookmarks {
indexURLEntry.Components comp = urlentry.comp();
document = switchboard.snippetCache.retrieveDocument(comp.url(), true, 5000, true);
prop.put("mode_edit", 0); // create mode
prop.put("mode_url", comp.url().toNormalform());
prop.put("mode_url", comp.url().toNormalform(false, true));
prop.put("mode_title", comp.title());
prop.put("mode_description", (document == null) ? comp.title(): document.getTitle());
prop.put("mode_author", comp.author());
@ -270,9 +270,9 @@ public class Bookmarks {
bookmark=switchboard.bookmarksDB.getBookmark((String)it.next());
if(bookmark!=null){
if(bookmark.getFeed() && isAdmin)
prop.put("bookmarks_"+count+"_link", "/FeedReader_p.html?url="+de.anomic.data.htmlTools.replaceXMLEntities(bookmark.getUrl()));
prop.put("bookmarks_"+count+"_link", "/FeedReader_p.html?url="+de.anomic.data.htmlTools.encodeUnicode2html(bookmark.getUrl(), false));
else
prop.put("bookmarks_"+count+"_link", de.anomic.data.htmlTools.replaceXMLEntities(bookmark.getUrl()));
prop.put("bookmarks_"+count+"_link", de.anomic.data.htmlTools.encodeUnicode2html(bookmark.getUrl(), false));
prop.put("bookmarks_"+count+"_title", bookmark.getTitle());
prop.put("bookmarks_"+count+"_description", bookmark.getDescription());
prop.put("bookmarks_"+count+"_date", serverDate.dateToiso8601(new Date(bookmark.getTimeStamp())));

View File

@ -127,7 +127,7 @@ public class CacheAdmin_p {
// path.append((pathString.length() == 0) ? linkPathString("/", true) : linkPathString(pathString, false));
linkPathString(prop, ((pathString.length() == 0) ? ("/") : (pathString)), true);
urlstr = url.toNormalform();
urlstr = url.toNormalform(true, true);
prop.put("info_url", urlstr);
info.ensureCapacity(10000);
@ -286,9 +286,9 @@ public class CacheAdmin_p {
descr = ((String) entry.getValue()).trim();
if (descr.length() == 0) { descr = "-"; }
prop.put("info_type_use." + extension + "_" + extension + "_" + i + "_name",
de.anomic.data.htmlTools.replaceXMLEntities(descr.replaceAll("\n", "").trim()));
de.anomic.data.htmlTools.encodeUnicode2html(descr.replaceAll("\n", "").trim(), true));
prop.put("info_type_use." + extension + "_" + extension + "_" + i + "_link",
de.anomic.data.htmlTools.replaceXMLEntities(entry.getKey().toString()));
de.anomic.data.htmlTools.encodeUnicode2html(entry.getKey().toString(), true));
i++;
}
prop.put("info_type_use." + extension, (i == 0) ? 0 : 1);
@ -303,7 +303,7 @@ public class CacheAdmin_p {
ie = (htmlFilterImageEntry) iter.next();
prop.put("info_type_use.images_images_" + i + "_name", ie.alt().replaceAll("\n", "").trim());
prop.put("info_type_use.images_images_" + i + "_link",
de.anomic.data.htmlTools.replaceXMLEntities(ie.url().toNormalform()));
de.anomic.data.htmlTools.encodeUnicode2html(ie.url().toNormalform(false, true), false));
i++;
}
prop.put("info_type_use.images", (i == 0) ? 0 : 1);

View File

@ -171,7 +171,7 @@ public class CrawlResults {
initiatorSeed = yacyCore.seedDB.getConnected(initiatorHash);
executorSeed = yacyCore.seedDB.getConnected(executorHash);
urlstr = comp.url().toNormalform();
urlstr = comp.url().toNormalform(false, true);
urltxt = nxTools.shortenURLString(urlstr, 72); // shorten the string text like a URL
cachepath = cacheManager.getCachePath(new URL(urlstr)).toString().replace('\\', '/').substring(cacheManager.cachePath.toString().length() + 1);

View File

@ -143,8 +143,8 @@ public class CrawlURLFetch_p {
if (post.get("source", "").equals("url")) {
try {
url = new URL(post.get("host", null));
if (!savedURLs.contains(url.toNormalform()))
savedURLs.add(url.toNormalform());
if (!savedURLs.contains(url.toNormalform(true, true)))
savedURLs.add(url.toNormalform(true, true));
prop.put("host", post.get("host", url.toString()));
} catch (MalformedURLException e) {
prop.put("host", post.get("host", ""));

View File

@ -283,7 +283,7 @@ public class DetailedSearch {
prop.put("type_results_" + i + "_former", results.getFormerSearch());
prop.put("type_results_" + i + "_rankingprops", result.getUrlentry().word().toPropertyForm() + ", domLengthEstimated=" + plasmaURL.domLengthEstimation(result.getUrlhash()) +
((plasmaURL.probablyRootURL(result.getUrlhash())) ? ", probablyRootURL" : "") +
(((wordURL = plasmaURL.probablyWordURL(result.getUrlhash(), query[0])) != null) ? ", probablyWordURL=" + wordURL.toNormalform() : ""));
(((wordURL = plasmaURL.probablyWordURL(result.getUrlhash(), query[0])) != null) ? ", probablyWordURL=" + wordURL.toNormalform(false, false) : ""));
// adding snippet if available
if (result.hasSnippet()) {
prop.put("type_results_" + i + "_snippet", 1);

View File

@ -188,7 +188,7 @@ public class IndexControl_p {
if (entry == null) {
prop.put("result", "No Entry for URL hash " + urlhash + "; nothing deleted.");
} else {
urlstring = entry.comp().url().toNormalform();
urlstring = entry.comp().url().toNormalform(false, true);
prop.put("urlstring", "");
switchboard.urlRemove(urlhash);
prop.put("result", "Removed URL " + urlstring);
@ -328,7 +328,7 @@ public class IndexControl_p {
if (entry == null) {
prop.put("result", "No Entry for URL hash " + urlhash);
} else {
prop.put("urlstring", entry.comp().url().toNormalform());
prop.put("urlstring", entry.comp().url().toNormalform(false, true));
prop.putAll(genUrlProfile(switchboard, entry, urlhash));
}
}
@ -464,7 +464,7 @@ public class IndexControl_p {
if (le == null) {
referrer = "<unknown>";
} else {
referrer = le.comp().url().toNormalform();
referrer = le.comp().url().toNormalform(false, true);
}
if (comp.url() == null) {
prop.put("genUrlProfile", 1);
@ -472,7 +472,7 @@ public class IndexControl_p {
return prop;
}
prop.put("genUrlProfile", 2);
prop.put("genUrlProfile_urlNormalform", comp.url().toNormalform());
prop.put("genUrlProfile_urlNormalform", comp.url().toNormalform(false, true));
prop.put("genUrlProfile_urlhash", urlhash);
prop.put("genUrlProfile_urlDescr", comp.title());
prop.put("genUrlProfile_moddate", entry.moddate());
@ -513,7 +513,7 @@ public class IndexControl_p {
if (le == null) {
tm.put(uh[0], uh);
} else {
us = le.comp().url().toNormalform();
us = le.comp().url().toNormalform(false, true);
tm.put(us, uh);
}

View File

@ -140,11 +140,11 @@ public class IndexCreateIndexingQueue_p {
totalSize += entrySize;
initiator = yacyCore.seedDB.getConnected(pcentry.initiator());
prop.put("indexing-queue_list_"+entryCount+"_dark", (inProcess)? 2: ((dark) ? 1 : 0));
prop.put("indexing-queue_list_"+entryCount+"_initiator", ((initiator == null) ? "proxy" : htmlTools.replaceHTML(initiator.getName())));
prop.put("indexing-queue_list_"+entryCount+"_initiator", ((initiator == null) ? "proxy" : htmlTools.encodeUnicode2html(initiator.getName(), true)));
prop.put("indexing-queue_list_"+entryCount+"_depth", pcentry.depth());
prop.put("indexing-queue_list_"+entryCount+"_modified", pcentry.getModificationDate());
prop.put("indexing-queue_list_"+entryCount+"_anchor", (pcentry.anchorName()==null)?"":htmlTools.replaceHTML(pcentry.anchorName()));
prop.put("indexing-queue_list_"+entryCount+"_url", htmlTools.replaceHTML(pcentry.normalizedURLString()));
prop.put("indexing-queue_list_"+entryCount+"_anchor", (pcentry.anchorName()==null)?"":htmlTools.encodeUnicode2html(pcentry.anchorName(), true));
prop.put("indexing-queue_list_"+entryCount+"_url", htmlTools.encodeUnicode2html(pcentry.url().toNormalform(false, true), false));
prop.put("indexing-queue_list_"+entryCount+"_size", bytesToString(entrySize));
prop.put("indexing-queue_list_"+entryCount+"_inProcess", (inProcess)?1:0);
prop.put("indexing-queue_list_"+entryCount+"_inProcess_hash", pcentry.urlHash());
@ -187,9 +187,9 @@ public class IndexCreateIndexingQueue_p {
executorHash = entry.executor();
initiatorSeed = yacyCore.seedDB.getConnected(initiatorHash);
executorSeed = yacyCore.seedDB.getConnected(executorHash);
prop.put("rejected_list_"+j+"_initiator", ((initiatorSeed == null) ? "proxy" : htmlTools.replaceHTML(initiatorSeed.getName())));
prop.put("rejected_list_"+j+"_executor", ((executorSeed == null) ? "proxy" : htmlTools.replaceHTML(executorSeed.getName())));
prop.put("rejected_list_"+j+"_url", htmlTools.replaceHTML(url.toString()));
prop.put("rejected_list_"+j+"_initiator", ((initiatorSeed == null) ? "proxy" : htmlTools.encodeUnicode2html(initiatorSeed.getName(), true)));
prop.put("rejected_list_"+j+"_executor", ((executorSeed == null) ? "proxy" : htmlTools.encodeUnicode2html(executorSeed.getName(), true)));
prop.put("rejected_list_"+j+"_url", htmlTools.encodeUnicode2html(url.toNormalform(false, true), false));
prop.put("rejected_list_"+j+"_failreason", entry.anycause());
prop.put("rejected_list_"+j+"_dark", ((dark) ? 1 : 0));
dark = !dark;

View File

@ -80,9 +80,9 @@ public class IndexCreateLoaderQueue_p {
initiator = yacyCore.seedDB.getConnected(theMsg.initiator);
prop.put("loader-set_list_"+count+"_dark", ((dark) ? 1 : 0) );
prop.put("loader-set_list_"+count+"_initiator", ((initiator == null) ? "proxy" : htmlTools.replaceHTML(initiator.getName())) );
prop.put("loader-set_list_"+count+"_initiator", ((initiator == null) ? "proxy" : htmlTools.encodeUnicode2html(initiator.getName(), true)) );
prop.put("loader-set_list_"+count+"_depth", theMsg.depth );
prop.put("loader-set_list_"+count+"_url", htmlTools.replaceHTML(theMsg.url.toString())); // null pointer exception here !!! maybe url = null; check reason.
prop.put("loader-set_list_"+count+"_url", htmlTools.encodeUnicode2html(theMsg.url.toNormalform(false, true), false)); // null pointer exception here !!! maybe url = null; check reason.
dark = !dark;
count++;
}

View File

@ -120,12 +120,12 @@ public class IndexCreateWWWGlobalQueue_p {
profileHandle = urle.profileHandle();
profileEntry = (profileHandle == null) ? null : switchboard.profiles.getEntry(profileHandle);
prop.put("crawler-queue_list_"+showNum+"_dark", ((dark) ? 1 : 0) );
prop.put("crawler-queue_list_"+showNum+"_initiator", ((initiator == null) ? "proxy" : htmlTools.replaceHTML(initiator.getName())) );
prop.put("crawler-queue_list_"+showNum+"_profile", ((profileEntry == null) ? "unknown" : htmlTools.replaceHTML(profileEntry.name())));
prop.put("crawler-queue_list_"+showNum+"_initiator", ((initiator == null) ? "proxy" : htmlTools.encodeUnicode2html(initiator.getName(), true)) );
prop.put("crawler-queue_list_"+showNum+"_profile", ((profileEntry == null) ? "unknown" : htmlTools.encodeUnicode2html(profileEntry.name(), true)));
prop.put("crawler-queue_list_"+showNum+"_depth", urle.depth());
prop.put("crawler-queue_list_"+showNum+"_modified", daydate(urle.loaddate()) );
prop.put("crawler-queue_list_"+showNum+"_anchor", htmlTools.replaceHTML(urle.name()));
prop.put("crawler-queue_list_"+showNum+"_url", htmlTools.replaceHTML(urle.url().toString()));
prop.put("crawler-queue_list_"+showNum+"_anchor", htmlTools.encodeUnicode2html(urle.name(), true));
prop.put("crawler-queue_list_"+showNum+"_url", htmlTools.encodeUnicode2html(urle.url().toNormalform(false, true), false));
prop.put("crawler-queue_list_"+showNum+"_hash", urle.urlhash());
dark = !dark;
showNum++;

View File

@ -135,7 +135,7 @@ public class IndexCreateWWWLocalQueue_p {
case ANCHOR: value = entry.name(); break;
case DEPTH: value = Integer.toString(entry.depth()); break;
case INITIATOR:
value = (entry.initiator() == null) ? "proxy" : htmlTools.replaceHTML(entry.initiator());
value = (entry.initiator() == null) ? "proxy" : htmlTools.encodeUnicode2html(entry.initiator(), true);
break;
case MODIFIED: value = daydate(entry.loaddate()); break;
default: value = null;
@ -184,12 +184,12 @@ public class IndexCreateWWWLocalQueue_p {
profileHandle = urle.profileHandle();
profileEntry = (profileHandle == null) ? null : switchboard.profiles.getEntry(profileHandle);
prop.put("crawler-queue_list_"+showNum+"_dark", ((dark) ? 1 : 0) );
prop.put("crawler-queue_list_"+showNum+"_initiator", ((initiator == null) ? "proxy" : htmlTools.replaceHTML(initiator.getName())) );
prop.put("crawler-queue_list_"+showNum+"_initiator", ((initiator == null) ? "proxy" : htmlTools.encodeUnicode2html(initiator.getName(), true)) );
prop.put("crawler-queue_list_"+showNum+"_profile", ((profileEntry == null) ? "unknown" : profileEntry.name()));
prop.put("crawler-queue_list_"+showNum+"_depth", urle.depth());
prop.put("crawler-queue_list_"+showNum+"_modified", daydate(urle.loaddate()) );
prop.put("crawler-queue_list_"+showNum+"_anchor", htmlTools.replaceHTML(urle.name()));
prop.put("crawler-queue_list_"+showNum+"_url", htmlTools.replaceHTML(urle.url().toString()));
prop.put("crawler-queue_list_"+showNum+"_anchor", htmlTools.encodeUnicode2html(urle.name(), true));
prop.put("crawler-queue_list_"+showNum+"_url", htmlTools.encodeUnicode2html(urle.url().toNormalform(false, true), false));
prop.put("crawler-queue_list_"+showNum+"_hash", urle.urlhash());
dark = !dark;
showNum++;

View File

@ -53,6 +53,7 @@ import java.net.MalformedURLException;
import java.net.URLDecoder;
import java.util.Date;
import de.anomic.data.htmlTools;
import de.anomic.http.httpHeader;
import de.anomic.plasma.plasmaURL;
import de.anomic.net.URL;
@ -121,23 +122,13 @@ public class QuickCrawlLink_p {
boolean xsstopw = post.get("xsstopw", "").equals("on");
boolean xdstopw = post.get("xdstopw", "").equals("on");
boolean xpstopw = post.get("xpstopw", "").equals("on");
String escapedTitle = (title==null)?"unknown":title.replaceAll("&","&amp;")
.replaceAll("<", "&lt;")
.replaceAll(">", "&gt;")
.replaceAll("\"", "&quot;");
String escapedURL = (crawlingStart==null)?"unknown":crawlingStart.replaceAll("&","&amp;")
.replaceAll("<", "&lt;")
.replaceAll(">", "&gt;")
.replaceAll("\"", "&quot;");
prop.put("mode_url",escapedURL);
prop.put("mode_title",escapedTitle);
prop.put("mode_url", (crawlingStart == null) ? "unknown" : htmlTools.encodeUnicode2html(crawlingStart, false));
prop.put("mode_title", (title == null) ? "unknown" : htmlTools.encodeUnicode2html(title, true));
if (crawlingStart != null) {
crawlingStart = crawlingStart.trim();
try {crawlingStart = new URL(crawlingStart).toNormalform();} catch (MalformedURLException e1) {}
try {crawlingStart = new URL(crawlingStart).toNormalform(true, true);} catch (MalformedURLException e1) {}
// check if url is proper
URL crawlingStartURL = null;

View File

@ -39,8 +39,8 @@
#(warningGoOnline)#::
<dt class="hintIcon"><img src="env/grafics/bad.png" width="32" height="32" alt="bad"/></dt>
<dd class="hint">The peer must go online to get a peer address.
If you don't know how to configure your system to use a proxy,
see the <a href="http://www.yacy.net/yacy/Installation.html#wininst">installation instructions</a>.
If you don't know how to configure your system,
see the <a href="http://www.yacy.net/yacy/Installation.html">installation instructions</a>.
</dd>
#(/warningGoOnline)#

View File

@ -147,7 +147,7 @@ public class Supporter {
prop.put("supporter_results_" + i + "_authorized_recommend_showScore", (showScore ? 1 : 0));
prop.put("supporter_results_" + i + "_authorized_urlhash", urlhash);
prop.put("supporter_results_" + i + "_url", de.anomic.data.htmlTools.replaceXMLEntities(url));
prop.put("supporter_results_" + i + "_url", de.anomic.data.htmlTools.encodeUnicode2html(url, false));
prop.put("supporter_results_" + i + "_urlname", nxTools.shortenURLString(url, 60));
prop.put("supporter_results_" + i + "_urlhash", urlhash);
prop.put("supporter_results_" + i + "_title", (showScore) ? ("(" + ranking.getScore(urlhash) + ") " + title) : title);

View File

@ -155,7 +155,7 @@ public class Surftips {
prop.put("surftips_results_" + i + "_authorized_recommend_showScore", (showScore ? 1 : 0));
prop.put("surftips_results_" + i + "_authorized_urlhash", urlhash);
prop.put("surftips_results_" + i + "_url", de.anomic.data.htmlTools.replaceXMLEntities(url));
prop.put("surftips_results_" + i + "_url", de.anomic.data.htmlTools.encodeUnicode2html(url, false));
prop.put("surftips_results_" + i + "_urlname", nxTools.shortenURLString(url, 60));
prop.put("surftips_results_" + i + "_urlhash", urlhash);
prop.put("surftips_results_" + i + "_title", (showScore) ? ("(" + ranking.getScore(urlhash) + ") " + title) : title);

View File

@ -270,7 +270,7 @@ public class ViewFile {
} else if (viewMode.equals("iframe")) {
prop.put("viewMode", VIEW_MODE_AS_IFRAME);
prop.put("viewMode_url", url.toNormalform());
prop.put("viewMode_url", url.toNormalform(false, true));
} else if (viewMode.equals("parsed") || viewMode.equals("sentences") || viewMode.equals("links")) {
// parsing the resource content
@ -348,8 +348,8 @@ public class ViewFile {
prop.put("viewMode_links_" + i + "_dark", ((dark) ? 1 : 0));
prop.put("viewMode_links_" + i + "_type", "image");
prop.putASIS("viewMode_links_" + i + "_text", markup(wordArray, entry.alt()));
prop.put("viewMode_links_" + i + "_url", (String) entry.url().toNormalform());
prop.putASIS("viewMode_links_" + i + "_link", markup(wordArray, (String) entry.url().toNormalform()));
prop.put("viewMode_links_" + i + "_url", (String) entry.url().toNormalform(false, true));
prop.putASIS("viewMode_links_" + i + "_link", markup(wordArray, (String) entry.url().toNormalform(false, true)));
if (entry.width() > 0 && entry.height() > 0)
prop.putASIS("viewMode_links_" + i + "_attr", entry.width() + "x" + entry.height() + " Pixel");
else
@ -365,7 +365,7 @@ public class ViewFile {
if (document != null) document.close();
}
prop.put("error", 0);
prop.put("error_url", url.toNormalform());
prop.put("error_url", url.toNormalform(false, true));
prop.put("error_hash", urlHash);
prop.put("error_wordCount", Integer.toString(wordCount));
prop.put("error_desc", descr);
@ -386,7 +386,7 @@ public class ViewFile {
}
private static final String markup(String[] wordArray, String message) {
message = htmlTools.replaceXMLEntities(message);
message = htmlTools.encodeUnicode2html(message, true);
if (wordArray != null)
for (int j = 0; j < wordArray.length; j++) {
String currentWord = wordArray[j].trim();

View File

@ -152,7 +152,7 @@ public class WatchCrawler_p {
if (pos == -1) crawlingStart = "http://" + crawlingStart;
// normalizing URL
try {crawlingStart = new URL(crawlingStart).toNormalform();} catch (MalformedURLException e1) {}
try {crawlingStart = new URL(crawlingStart).toNormalform(true, true);} catch (MalformedURLException e1) {}
// check if url is proper
URL crawlingStartURL = null;
@ -276,7 +276,7 @@ public class WatchCrawler_p {
nexturlstring = nexturlstring.trim();
// normalizing URL
nexturlstring = new URL(nexturlstring).toNormalform();
nexturlstring = new URL(nexturlstring).toNormalform(true, true);
// generating an url object
URL nexturlURL = null;

View File

@ -62,8 +62,8 @@ public class config_p {
int count=0;
while(keys.hasNext()){
key = (String) keys.next();
prop.put("options_"+count+"_key", htmlTools.replaceXMLEntities(key));
prop.put("options_"+count+"_value", htmlTools.replaceXMLEntities(env.getConfig(key, "ERROR")));
prop.put("options_"+count+"_key", htmlTools.encodeUnicode2html(key, true));
prop.put("options_"+count+"_value", htmlTools.encodeUnicode2html(env.getConfig(key, "ERROR"), true));
count++;
}
prop.put("options", count);

View File

@ -125,11 +125,11 @@ public class queues_p {
totalSize += entrySize;
initiator = yacyCore.seedDB.getConnected(pcentry.initiator());
prop.put("list-indexing_"+i+"_profile", (pcentry.profile() != null) ? pcentry.profile().name() : "deleted");
prop.putSafeXML("list-indexing_"+i+"_initiator", ((initiator == null) ? "proxy" : htmlTools.replaceHTML(initiator.getName())));
prop.putSafeXML("list-indexing_"+i+"_initiator", ((initiator == null) ? "proxy" : htmlTools.encodeUnicode2html(initiator.getName(), true)));
prop.put("list-indexing_"+i+"_depth", pcentry.depth());
prop.put("list-indexing_"+i+"_modified", pcentry.getModificationDate());
prop.putSafeXML("list-indexing_"+i+"_anchor", (pcentry.anchorName()==null)?"":htmlTools.replaceHTML(pcentry.anchorName()));
prop.putSafeXML("list-indexing_"+i+"_url", pcentry.normalizedURLString());
prop.putSafeXML("list-indexing_"+i+"_anchor", (pcentry.anchorName()==null)?"":htmlTools.encodeUnicode2html(pcentry.anchorName(), true));
prop.putSafeXML("list-indexing_"+i+"_url", pcentry.url().toNormalform(false, true));
prop.put("list-indexing_"+i+"_size", entrySize);
prop.put("list-indexing_"+i+"_inProcess", (inProcess)?1:0);
prop.put("list-indexing_"+i+"_hash", pcentry.urlHash());
@ -199,7 +199,7 @@ public class queues_p {
prop.put(tableName + "_" + showNum + "_depth", urle.depth());
prop.put(tableName + "_" + showNum + "_modified", daydate(urle.loaddate()));
prop.putSafeXML(tableName + "_" + showNum + "_anchor", urle.name());
prop.putSafeXML(tableName + "_" + showNum + "_url", urle.url().toString());
prop.putSafeXML(tableName + "_" + showNum + "_url", urle.url().toNormalform(false, true));
prop.put(tableName + "_" + showNum + "_hash", urle.urlhash());
showNum++;
}

View File

@ -182,11 +182,11 @@ public final class crawlOrder {
// old method: only one url
// normalizing URL
String newURL = new URL((String) urlv.get(0)).toNormalform();
String newURL = new URL((String) urlv.get(0)).toNormalform(true, true);
if (!newURL.equals(urlv.get(0))) {
env.getLog().logWarning("crawlOrder: Received not normalized URL " + urlv.get(0));
}
String refURL = (refv.get(0) == null) ? null : new URL((String) refv.get(0)).toNormalform();
String refURL = (refv.get(0) == null) ? null : new URL((String) refv.get(0)).toNormalform(true, true);
if ((refURL != null) && (!refURL.equals(refv.get(0)))) {
env.getLog().logWarning("crawlOrder: Received not normalized Referer URL " + refv.get(0) + " of URL " + urlv.get(0));
}

View File

@ -151,7 +151,7 @@ public final class crawlReceipt {
switchboard.wordIndex.loadedURL.store(entry);
switchboard.wordIndex.loadedURL.stack(entry, youare, iam, 1);
switchboard.delegatedURL.remove(entry.hash()); // the delegated work has been done
log.logInfo("crawlReceipt: RECEIVED RECEIPT from " + otherPeerName + " for URL " + entry.hash() + ":" + comp.url().toNormalform());
log.logInfo("crawlReceipt: RECEIVED RECEIPT from " + otherPeerName + " for URL " + entry.hash() + ":" + comp.url().toNormalform(false, true));
// ready for more
prop.putASIS("delay", "10");

View File

@ -125,7 +125,7 @@ public final class list {
int cnt = 0;
for (int i=0; i<count; i++) {
if ((url = db.pop()) == null) continue;
b.append(htmlTools.deReplaceHTMLEntities(url.toNormalform())).append("\n");
b.append(htmlTools.decodeHtml2Unicode(url.toNormalform(false, true))).append("\n");
cnt++;
}
prop.put("list", b);

View File

@ -135,7 +135,7 @@ public final class transferURL {
// check if the entry is blacklisted
if ((blockBlacklist) && (plasmaSwitchboard.urlBlacklist.isListed(plasmaURLPattern.BLACKLIST_DHT, lEntry.hash(), comp.url()))) {
int deleted = sb.wordIndex.tryRemoveURLs(lEntry.hash());
yacyCore.log.logFine("transferURL: blocked blacklisted URL '" + comp.url().toNormalform() + "' from peer " + otherPeerName + "; deleted " + deleted + " URL entries from RWIs");
yacyCore.log.logFine("transferURL: blocked blacklisted URL '" + comp.url().toNormalform(false, true) + "' from peer " + otherPeerName + "; deleted " + deleted + " URL entries from RWIs");
lEntry = null;
blocked++;
continue;
@ -145,7 +145,7 @@ public final class transferURL {
try {
sb.wordIndex.loadedURL.store(lEntry);
sb.wordIndex.loadedURL.stack(lEntry, iam, iam, 3);
yacyCore.log.logFine("transferURL: received URL '" + comp.url().toNormalform() + "' from peer " + otherPeerName);
yacyCore.log.logFine("transferURL: received URL '" + comp.url().toNormalform(false, true) + "' from peer " + otherPeerName);
received++;
} catch (IOException e) {
e.printStackTrace();

@ -250,7 +250,7 @@ public class yacysearch {
if (document != null) {
// create a news message
HashMap map = new HashMap();
map.put("url", comp.url().toNormalform().replace(',', '|'));
map.put("url", comp.url().toNormalform(false, true).replace(',', '|'));
map.put("title", comp.title().replace(',', ' '));
map.put("description", ((document == null) ? comp.title() : document.getTitle()).replace(',', ' '));
map.put("author", ((document == null) ? "" : document.getAuthor()));
@ -314,8 +314,6 @@ public class yacysearch {
for(int i=0;i<results.numResults();i++){
plasmaSearchResults.searchResult result=results.getResult(i);
prop.put("type_results_" + i + "_authorized_recommend", (yacyCore.newsPool.getSpecific(yacyNewsPool.OUTGOING_DB, yacyNewsPool.CATEGORY_SURFTIPP_ADD, "url", result.getUrl()) == null) ? 1 : 0);
//prop.put("type_results_" + i + "_authorized_recommend_deletelink", "/yacysearch.html?search=" + results.getFormerSearch() + "&amp;Enter=Search&amp;count=" + results.getQuery().wantedResults + "&amp;order=" + crypt.simpleEncode(results.getRanking().toExternalString()) + "&amp;resource=local&amp;time=3&amp;deleteref=" + result.getUrlhash() + "&amp;urlmaskfilter=.*");
//prop.put("type_results_" + i + "_authorized_recommend_recommendlink", "/yacysearch.html?search=" + results.getFormerSearch() + "&amp;Enter=Search&amp;count=" + results.getQuery().wantedResults + "&amp;order=" + crypt.simpleEncode(results.getRanking().toExternalString()) + "&amp;resource=local&amp;time=3&amp;recommendref=" + result.getUrlhash() + "&amp;urlmaskfilter=.*");
prop.put("type_results_" + i + "_authorized_recommend_deletelink", "/yacysearch.html?search=" + results.getFormerSearch() + "&Enter=Search&count=" + results.getQuery().wantedResults + "&order=" + crypt.simpleEncode(results.getRanking().toExternalString()) + "&resource=local&time=3&deleteref=" + result.getUrlhash() + "&urlmaskfilter=.*");
prop.put("type_results_" + i + "_authorized_recommend_recommendlink", "/yacysearch.html?search=" + results.getFormerSearch() + "&Enter=Search&count=" + results.getQuery().wantedResults + "&order=" + crypt.simpleEncode(results.getRanking().toExternalString()) + "&resource=local&time=3&recommendref=" + result.getUrlhash() + "&urlmaskfilter=.*");
prop.put("type_results_" + i + "_authorized_urlhash", result.getUrlhash());
@ -339,7 +337,7 @@ public class yacysearch {
prop.put("type_results_" + i + "_former", results.getFormerSearch());
prop.put("type_results_" + i + "_rankingprops", result.getUrlentry().word().toPropertyForm() + ", domLengthEstimated=" + plasmaURL.domLengthEstimation(result.getUrlhash()) +
((plasmaURL.probablyRootURL(result.getUrlhash())) ? ", probablyRootURL" : "") +
(((wordURL = plasmaURL.probablyWordURL(result.getUrlhash(), query[0])) != null) ? ", probablyWordURL=" + wordURL.toNormalform() : ""));
(((wordURL = plasmaURL.probablyWordURL(result.getUrlhash(), query[0])) != null) ? ", probablyWordURL=" + wordURL.toNormalform(false, true) : ""));
// adding snippet if available
if (result.hasSnippet()) {
prop.put("type_results_" + i + "_snippet", 1);

@ -1859,7 +1859,7 @@ and set an administration password.==und geben Sie ein Administrator Passwort ei
You have not published your peer seed yet. This happens automatically, just wait.==Ihr Peer ist dem Netzwerk noch nicht bekannt. Warten Sie noch ein wenig, dies geschieht automatisch.
While you have this status you are not allowed to search other peers.==W&auml;hrend Sie diesen Status haben, ist es Ihnen nicht erlaubt andere Peers zu durchsuchen.
The peer must go online to get a peer address.==Ihr Peer muss online gehen, um eine Adresse zu bekommen.
If you don't know how to configure your system to use a proxy,==Wenn Sie nicht wissen, wie Sie Ihr System konfigurieren, sodass es einen Proxy benutzt,
If you don't know how to configure your system,==Wenn Sie nicht wissen, wie Sie Ihr System konfigurieren,
see the <a==lesen Sie die <a
installation instructions</a>.==Installationsanleitung</a>.
You cannot be reached from outside.==Ihr Peer kann nicht von au&szlig;en erreicht werden.

@ -7,7 +7,6 @@
# first published on http://www.anomic.de
# Frankfurt, Germany, 2005
#
# This file is maintained by Roland Ramthun <admin@yacy-forum.de>
# This file is written by (chronological order) Riccardo Lemmi <riccardo@reflab.it>
# If you find any mistakes or untranslated strings in this file please don't hesitate to email them to the maintainer.
@ -19,11 +18,10 @@
#Thank you for your help!
<!-- lang -->default\(english\)==Italian
<!-- author -->==Riccardo Lemmi
<!-- maintainer -->==&lt;admin@yacy-forum.de&gt;
<!-- maintainer -->==
#-----------------------------------------------------------
#File: Blacklist_p.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
Blacklist Manager==Blacklist Manager
Blacklist==Blacklist

@ -23,7 +23,6 @@
#-----------------------------------------------------------
#File: Blacklist_p.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
Blacklist Manager==Spravca blacklistu
Blacklist==Blacklist
@ -152,7 +151,6 @@ The maximum cache size is==Maximalna velkost cache je
#-----------------------------------------------------------
#File: Config_p.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
Advanced Config==Pokrocile nastavenia
Here are all configuration options from YaCy.==Tu sa nachadzaju vsetky konfiguracne nastavenia YaCy.
@ -164,7 +162,6 @@ You can change anything, but some options need a restart, and some options can c
#-----------------------------------------------------------
#File: ConfigAdvanced_p.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
Advanced Config==Pokrocile nastavenia
Here are all configuration options from YaCy.==Tu sa nachadzaju vsetky konfiguracne nastavenia YaCy.
@ -216,7 +213,6 @@ location</a> in 10 seconds.==adresu</a> za 10 sekund.
#-------------------------------------------------------
#File: ConfigLanguage_p.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
Language selection==V&yacute;ber jazyka
Language selection==V&yacute;ber jazyka
@ -254,7 +250,6 @@ Comment==Koment&aacute;r
#-------------------------------------------------------
#File: ConfigSkins_p.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
Skin Selection==Vyber skinov
You can change the appearance of YaCy with skins. Select one of the default skins, download new skins, or create your own skin.==Vzhlad YaCy mozete zmenit pomocou skinov. Zvolte jeden z predvytvorenych skinov, stiahnite si nove, alebo vytvorte vlastne skiny.
@ -270,7 +265,6 @@ Error saving the skin.==Chyba pri stahovani skinu.
#-----------------------------------------------------------
#File: Connections_p.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
Connection Tracking==Stav spojenia
Incoming Connections==Prichadzajuce spojenia
@ -286,7 +280,6 @@ Waiting for new request nr.==Caka sa na poziadavku cislo.
#-------------------------------------------------------
#File: CookieMonitorIncoming_p.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
Incoming Cookies Monitor==Sledovanie prichadzajucich cookies
Cookie Monitor: Incoming Cookies==Sledovanie cookies: Prichadzajuce cookies
@ -301,7 +294,6 @@ Cookie==Cookie
#-------------------------------------------------------
#File: CookieMonitorOutgoing_p.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
Outgoing Cookies Monitor==Sledovanie odchadzajucich cookies
Cookie Monitor: Outgoing Cookies==Sledovanie cookies: Odchadzajuce cookies
@ -397,7 +389,6 @@ There is ".html" at the end, which is not included with the Regular Expression.=
#-----------------------------------------------------------
#File: index.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
YaCy \'\#\[clientname\]\#\': Search Page==YaCy '#[clientname]#': Vyhladavacia stranka
# NOT USED
@ -499,7 +490,6 @@ show all==zobrazit vsetko
#-------------------------------------------------------
#File: IndexCleaner_p.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
# NOT USED
#Index Control==Kontrola indexu
@ -558,7 +548,6 @@ Word-Hash:</td>==Hash-slovo:</td>
#-------------------------------------------------------
#File: IndexCreate_p.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
Index Creation==Tvorba indexu
Start Crawling Job:==Odstartuj crawling:
@ -717,7 +706,6 @@ Busy Peers==Vytazeni peeri
#-------------------------------------------------------
#File: IndexCreateIndexingQueue_p.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
Index Creation/Indexing Queue==Vytvorenie indexu/Cakacia listina indexu
Index Creation: Indexing Queue==Vytvorenie indexu: Cakacia listina indexu
@ -745,7 +733,6 @@ Fail-Reason==Dovod zlyhania
#-------------------------------------------------------
#File: IndexCreateLoaderQueue_p.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
Index Creation / Loader Queue==Vytvorenie indexu / Cakacia listina nahravaca
Index Creation: Loader Queue==Vytvorenie indexu: Cakacia listina nahravaca
@ -757,7 +744,6 @@ URL==URL adresa
#-------------------------------------------------------
#File: IndexCreateWWWGlobalQueue_p.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
# NOT USED
#YaCy '\#\[clientname\]\#': Index Creation / WWW Global Crawl Crawl Queue==YaCy '#[clientname]#': Vytvorenie indexu / Globalna WWW cakacia listina
@ -779,7 +765,6 @@ Anchor Name==Meno kotvy
#-------------------------------------------------------
#File: IndexCreateWWWLocalQueue_p.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
YaCy '\#\[clientname\]\#': Index Creation / WWW Local Crawl Queue==YaCy '#[clientname]#': Vytvorenie indexu / Lokalna WWW cakacia listina
Index Creation: WWW Local Crawl Queue==Vytvorenie indexu: Lokalna WWW cakacia listina
@ -808,7 +793,6 @@ This may take a quite long time.==Toto moze chvilku trvat.
#-------------------------------------------------------
#File: IndexImport_p.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
YaCy \'\#\[clientname\]\#\': Index Import==YaCy '#[clientname]#': Import indexu
Index DB Import==Import databazoveho indexu
@ -873,7 +857,6 @@ Continue==Pokracuj
#-------------------------------------------------------
#File: IndexMonitor.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
#YaCy '#[clientname]#': Index Monitor
Index Monitor Menu==Menu monitoringu indexu
@ -950,7 +933,6 @@ URL==URL adresa
#-------------------------------------------------------
#File: IndexTransfer_p.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
The local index currently consists of \(at least\) \#\[wcount\]\# reverse word indexes and \#\[ucount\]\# URL references.== Lokalny index momentalne pozostava z (priblizne) #[wcount]# slov a #[ucount]# URL adries.
# NOT USED
@ -975,7 +957,6 @@ Start/Stop Transfer==Start/Stop prenosu
#-------------------------------------------------------
#File: Language_p.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
Language selection==Vyber jazyka
Language selection==Vyber jazyka
@ -995,7 +976,6 @@ Error saving the language file.==Pri ukladani jazykoveho suboru doslo k chybe.
#-------------------------------------------------------
#File: Lab.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
# NOT USED
#YaCy \'\#\[clientname\]\#\': Lab==YaCy '#[clientname]#': Laboratorium
@ -1010,7 +990,6 @@ Configuration</a>==Nastavenia</a>
#-------------------------------------------------------
#File: Messages_p.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
>Messages==>Spravy
>Date==>Datum
@ -1028,7 +1007,6 @@ I/O error reading message table: ==Vstupno/Vystupna chyba pri citani tabulky spr
#-------------------------------------------------------
#File: MessageSend_p.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
Send message==Posli spravu
You cannot send a message to==Nemozete poslat spravu pre
@ -1053,7 +1031,6 @@ Network</a> page.==stranku siete</a>.
#-------------------------------------------------------
#File: Network.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
Network Overview==Prehlad stavu siete
Network Menu==Menu siet
@ -1167,7 +1144,6 @@ add Peer==Pridaj peera
#-------------------------------------------------------
#File: News.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
Network Menu==Menu siete
News&nbsp;Overview==Prehlad sprav
@ -1421,7 +1397,6 @@ The network picture below shows how the latest search query was solved by asking
#-------------------------------------------------------
#File: ProxyIndexingMonitor_p.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
Index Monitor for Proxy Indexing==Monitor indexu pre indexaciu proxy
This is the control page for web pages that your peer has indexed during the current application run-time==Toto je kontrolna stranke pre web stranky, ktore Vas peer indexoval pocas aktualneho behu aplikacie
@ -1467,7 +1442,6 @@ Page.==stranke.
#-------------------------------------------------------
#File: QuickCrawlLink_p.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
YaCy \'\#\[clientname\]\#\': Quick Crawl Link==YaCy '#[clientname]#': Rychly Crawl Link
Quick Crawl Link==Rychly Crawl Link
@ -1485,7 +1459,6 @@ Unable to add URL to crawler queue:==Nie je mozne pridat URL adresu do cakacej l
#-------------------------------------------------------
#File: Settings_p.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
YaCy \'\#\[clientname\]\#\': Settings==YaCy '#[clientname]#': Nastavenia
<h2>Settings</h2>==<h2>Nastavenia</h2>
@ -1762,7 +1735,6 @@ You can reach your YaCy server under the new location==Vas YaCy server je pristu
#-------------------------------------------------------
#File: Settings_Admin.inc
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
Administration Account Settings==Nastavenia konta administratora
This is the account that restricts access to this 'Settings' page. If you have not customized it yet, you should do so now:==Toto je konto ktore obmedzuje pristum na tuto stranku 'Nastaveni'. Ak ste toto konte este nevytvorili, mali by ste teraz tak urobit.
@ -1773,7 +1745,6 @@ value="submit">==value="Uloz">
#-------------------------------------------------------
#File: Settings_SystemBehaviour.inc
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
System Behaviour Settings==Systemove nastavenia
Auto pop-up of status page on start-up:==Automaticky pop-up stranky stavu pri starte YaCy:
@ -1782,7 +1753,6 @@ Auto pop-up of status page on start-up:==Automaticky pop-up stranky stavu pri st
#-------------------------------------------------------
#File: simple_search.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
YaCy \'\#\[clientname\]\#\': Search Page==YaCy '#[clientname]#': Vyhladavacia stranka
"Search for \#\[former\]\#"=="Hladaj #[former]#"
@ -1815,7 +1785,6 @@ from 'late' peers to enrich this search result.==z pomalych peerov na zlepsenie
#-------------------------------------------------------
#File: Status.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
System-, Index- and Peer-Status==Stav systemu, indexu a peera
Welcome to YaCy!==Vitajte v YaCy!
@ -1835,7 +1804,7 @@ Not assigned. The peer must go online to get an address.==Nepriradena. Vas pees
The peer does not go online until you use the proxy to surf the internet,==Vas peer neprejde do online modu pokym nepouzijete proxy na surfovanie v internete,
thus proving that you <i>want</i> to go online.==cim signalizujete ze <i>chcete</i> prejst do online modu.
#---
If you don't know how to configure your system to use a proxy,==Navod ako nakonfigurovat system tak aby ste pouzivali proxy,
If you don't know how to configure your system,==Navod ako nakonfigurovat system,
see the <a==precitajte si <a
installation instructions</a>.==instalacne instrukcie</a>.
#---
@ -1893,7 +1862,6 @@ Last Refresh:==Posledna aktualizacia:
#--------------------------------------------------------
#File: Status_p.inc
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
Private System Properties==Sukromne systemove vlastnosti
System Resources==Systemove zdroje
@ -1936,7 +1904,6 @@ Global Crawl Trigger==odchadzajuce vzialene crawly
#-------------------------------------------------------
#File: Steering.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
Steering==Ovladanie
Steering Receipt:==Navod na ovladanie
@ -1974,7 +1941,6 @@ user</a> page.==stranky pouzivatelov</a>.
#-------------------------------------------------------
#File: ViewFile.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
YaCy \'\#\[clientname\]\#\': View URL Content==YaCy '#[clientname]#': Zobraz obsah URL adresy
View URL Content==Zobraz obsah URL adresy
@ -2001,7 +1967,6 @@ Original Resource Content==Originalny obsah zdroja
#-------------------------------------------------------
#File: ViewLog_p.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
Lines==Riadkov
reversed order==v prevratenom poradi
@ -2009,7 +1974,6 @@ reversed order==v prevratenom poradi
#-------------------------------------------------------
#File: ViewProfile.html
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
Remote Peer Profile==Profil vzdialeneho peera
Remote Peer Profile:==Profil vzdialeneho peera:
@ -2090,7 +2054,6 @@ Architecture \(C\) by Michael Peter Christen==Architektur (C) von Michael Peter
#--------------------------------------------------------
#File: env/templates/simpleheader.template
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
Project Home==Domovsk&aacute; str&aacute;nka
Help / Wiki==Pomoc / Wiki
@ -2098,7 +2061,6 @@ Peer Owner Profile==Profi vlastn&iacute;ka peera
#--------------------------------------------------------
#File: env/templates/header.template
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
# NOT USED
#YaCy&nbsp;-&nbsp;Distributed&nbsp;Web&nbsp;Indexing&nbsp;-&nbsp;Administration==YaCy - Indexovanie Distribuovan&eacute;ho Internetu - Administr&aacute;cia
@ -2156,7 +2118,6 @@ Interface Skins==Nastavenie vzhladu
#--------------------------------------------------------
#File: env/templates/submenuCookie.template
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
Cookie Menu==Cookie Menu
Incoming&nbsp;Cookies==Prichadzajuce cookies
@ -2164,7 +2125,6 @@ Outgoing&nbsp;Cookies==Odchadzajuce cookies
#--------------------------------------------------------
#File: env/templates/submenuIndexControl.template
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
Index Control Menu==Menu spravy indexu
#Index Administration==Administracia indexu
@ -2173,7 +2133,6 @@ Index Control Menu==Menu spravy indexu
#--------------------------------------------------------
#File: env/templates/submenuIndexCreate.template
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
Index Creation Menu==Menu vytvorenia indexu
Control Queues==Kontrola cakacej listiny
@ -2191,7 +2150,6 @@ Media Crawl Queues==Cakacia listina Media crawlu
#--------------------------------------------------------
#File: env/templates/submenuPerformance.template
#Completely translated. If you find any untranslated string in the webinterface, send it to <admin@yacy-forum.de>
Performance Menu==Menu vykonu
Queues Performance Settings==Cakacia listina nastaveny vykonu

@ -17,7 +17,6 @@ WHERE IS THE DOCUMENTATION?
The complete documentation can be found at:
(English) http://yacy.net/yacy
(Deutsch) http://www.yacy-websuche.de
(Wiki:de) http://www.yacy-websuche.de/wiki/index.php/De:Start
(Wiki:en) http://www.yacy-websearch.net/wiki/index.php/En:Start
@ -72,17 +71,16 @@ ANY MORE CONFIGURATIONS?
- after startup, you see the configuration page in your web browser.
just open http://localhost:8080
all you have to do (should do) is to enter a password for your peer
- You can use YaCy as your web proxy. But you don't need to do that.
- You can use YaCy as your web proxy. This is an option, you don't need to do that.
Simply configure your internet connection to use a proxy at port 8080.
- You can add a YaCy toolbar to your Firefox web browser.
This release contains the yacybar.xpi file from Alexander Schier
and Martin Thelian. Please install this file as a Firefox extension.
CONTACT:
If you have any questions, please do not hesitate to contact the author:
Send an email to Michael Christen (mc@anomic.de) with a meaningful subject
Send an email to Michael Christen (mc@yacy.net) with a meaningful subject
including the word 'yacy' to keep your email from getting stuck
in my anti-spam filter.
@ -91,5 +89,5 @@ feel free to ask the author for a business proposal to customize YaCy
according to your needs. We also provide integration solutions if the
software is about to be integrated into your enterprise application.
Germany, Frankfurt a.M., 02.12.2006
Germany, Frankfurt a.M., 19.07.2007
Michael Peter Christen

@ -87,7 +87,7 @@ public class URLFetcherStack {
public boolean push(URL url) {
try {
this.db.push(this.db.row().newEntry(
new byte[][] { url.toNormalform().getBytes() }
new byte[][] { url.toNormalform(true, true).getBytes() }
));
this.pushed++;
return true;

@ -731,7 +731,7 @@ public class bookmarksDB {
public Bookmark(String urlHash, URL url){
super();
this.urlHash=urlHash;
entry.put(BOOKMARK_URL, url.toString());
entry.put(BOOKMARK_URL, url.toNormalform(false, true));
tags=new HashSet();
timestamp=System.currentTimeMillis();
}

@ -144,7 +144,7 @@ public class diff {
* <code>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;,{__,_1,__} </code><br>
* <code>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;,{__,__,_1} </code><br>
* <ul>
* TODO: some optimisation ideas see the discusion <a href="http://www.yacy-forum.de/viewtopic.php?t=3557">Diff.findDiagonal(..) buggy????</a>
* TODO: some optimisation ideas
* <li>search for a better algorithm on the inet!!! :) </li>
* <li>pass only the part of the matrix where the search takes place - not the whole matrix every time</li>
* <li>break the inner loop if the rest of the matrix is smaller than minLength (and no diagonal has been found yet) </li>
@ -272,7 +272,7 @@ public class diff {
case diff.Part.ADDED: sb.append("added"); break;
case diff.Part.DELETED: sb.append("deleted"); break;
}
sb.append("\">").append(htmlTools.replaceXMLEntities(ps[j].getString()).replaceAll("\n", "<br />"));
sb.append("\">").append(htmlTools.encodeUnicode2html(ps[j].getString(), true).replaceAll("\n", "<br />"));
sb.append("</span>");
}
sb.append("</p>");

@ -2,99 +2,65 @@ package de.anomic.data;
public class htmlTools {
/** Replaces special characters from a string. Avoids XSS attacks and ensures correct display of
* special characters in non UTF-8 capable browsers.
* @param text a string that possibly contains HTML
* @return the string with all special characters encoded
*/
//[MN]
public static String replaceHTML(String text) {
text = replace(text, xmlentities);
text = replace(text, htmlentities);
return text;
}
/** Replaces special characters from a string. Ensures correct display of
* special characters in non UTF-8 capable browsers.
* @param text a string that possibly contains special characters
* @return the string with all special characters encoded
*/
//[MN]
public static String replaceHTMLEntities(String text) {
text = replace(text, htmlentities);
return text;
}
/** Replaces special characters from a string. Avoids XSS attacks.
* @param text a string that possibly contains HTML
* @return the string without any HTML-tags that can be used for XSS
*/
//[MN]
public static String replaceXMLEntities(String text) {
text = replace(text, xmlentities);
return text;
}
/** Replaces characters in a string with other characters defined in an array.
* @param text a string that possibly contains special characters
* @param entities array that contains characters to be replaced and characters it will be replaced by
* @return the string with all characters replaced by the corresponding character from array
*/
//[FB], changes by [MN]
public static String replace(String text, String[] entities) {
if (text==null) { return null; }
for (int x=0;x<=entities.length-1;x=x+2) {
int p=0;
while ((p=text.indexOf(entities[x],p))>=0) {
text=text.substring(0,p)+entities[x+1]+text.substring(p+entities[x].length());
p+=entities[x+1].length();
}
}
return text;
}
public static String deReplaceHTML(String text) {
text = deReplaceHTMLEntities(text);
text = deReplaceXMLEntities(text);
return text;
}
public static String deReplaceHTMLEntities(String text) {
return deReplace(text, htmlentities);
}
public static String deReplaceXMLEntities(String text) {
return deReplace(text, xmlentities);
}
public static String deReplace(String text, String[] entities) {
//[FB], changes by [MN], re-implemented by [MC]
public static String encodeUnicode2html(String text, boolean includingAmpersand) {
if (text == null) return null;
for (int i=entities.length-1; i>0; i-=2) {
int p = 0;
while ((p = text.indexOf(entities[i])) >= 0) {
text = text.substring(0, p) + entities[i - 1] + text.substring(p + entities[i].length());
p += entities[i - 1].length();
int pos = 0;
StringBuffer sb = new StringBuffer(text.length());
search: while (pos < text.length()) {
// find a (forward) mapping
loop: for (int i = (includingAmpersand) ? 0 : 2; i < mapping.length; i += 2) {
if (text.charAt(pos) != mapping[i].charAt(0)) continue loop;
// found match
sb.append(mapping[i + 1]);
pos++;
continue search;
}
// not found match
sb.append(text.charAt(pos));
pos++;
}
return text;
return new String(sb);
}
public static String decodeHtml2Unicode(String text) {
if (text == null) return null;
int pos = 0;
StringBuffer sb = new StringBuffer(text.length());
search: while (pos < text.length()) {
// find a reverse mapping. TODO: replace matching with hashtable(s)
loop: for (int i = 0; i < mapping.length; i += 2) {
if (pos + mapping[i + 1].length() > text.length()) continue loop;
for (int j = mapping[i + 1].length() - 1; j >= 0; j--) {
if (text.charAt(pos + j) != mapping[i + 1].charAt(j)) continue loop;
}
// found match
sb.append(mapping[i]);
pos = pos + mapping[i + 1].length();
continue search;
}
// not found match
sb.append(text.charAt(pos));
pos++;
}
return new String(sb);
}
//This array contains codes (see http://mindprod.com/jgloss/unicode.html for details)
//that will be replaced. To add new codes or patterns, just put them at the end
//of the list. Codes or patterns in this list can not be escaped with [= or <pre>
public static final String[] xmlentities={
private static final String[] mapping = {
// Ampersands _have_ to be replaced first. If they were replaced later,
// other replaced characters containing ampersands would get messed up.
"\u0026","&amp;", //ampersand
"\"","&quot;", //quotation mark
"\u003C","&lt;", //less than
"\u003E","&gt;", //greater than
};
//This array contains codes (see http://mindprod.com/jgloss/unicode.html for details) and
//patterns that will be replaced. To add new codes or patterns, just put them at the end
//of the list. Codes or patterns in this list can not be escaped with [= or <pre>
public static final String[] htmlentities={
"\\", "&#092;", // Backslash
"\u005E","&#094;", // Caret
@ -230,4 +196,12 @@ public class htmlTools {
"\u00FE","&thorn;",
"\u00FF","&yuml;"
};
public static void main(String[] args) {
String text = "Test-Text mit & um zyklische &uuml; &amp; Ersetzungen auszuschliessen äöü";
String txet = encodeUnicode2html(text, true);
System.out.println(txet);
System.out.println(decodeHtml2Unicode(txet));
if (decodeHtml2Unicode(txet).equals(text)) System.out.println("correct");
}
}
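
The single-pass encode/decode idea in the new `htmlTools` methods can be sketched as a stand-alone class. This is only an illustration under stated assumptions: the class name `EntityRoundTrip`, the method names `encode` and `decode`, and the shortened `MAPPING` table are hypothetical and not part of the commit. It shows why scanning the input once, rather than running repeated replace passes, keeps an already-escaped `&uuml;` in the input from being turned into a real character on the way back.

```java
public class EntityRoundTrip {

    // Hypothetical, shortened table: even index = character, odd index = entity.
    // The ampersand pair comes first so it can be skipped by starting at index 2.
    private static final String[] MAPPING = {
        "&", "&amp;",
        "\"", "&quot;",
        "<", "&lt;",
        ">", "&gt;",
        "\u00FC", "&uuml;"
    };

    // Encode each input character at most once; output is never rescanned.
    public static String encode(String text, boolean includingAmpersand) {
        if (text == null) return null;
        StringBuilder sb = new StringBuilder(text.length());
        for (int pos = 0; pos < text.length(); pos++) {
            char c = text.charAt(pos);
            String entity = null;
            for (int i = includingAmpersand ? 0 : 2; i < MAPPING.length; i += 2) {
                if (MAPPING[i].charAt(0) == c) {
                    entity = MAPPING[i + 1];
                    break;
                }
            }
            if (entity != null) sb.append(entity); else sb.append(c);
        }
        return sb.toString();
    }

    // Decode by matching an entity at the current position, then jumping past it.
    public static String decode(String text) {
        if (text == null) return null;
        StringBuilder sb = new StringBuilder(text.length());
        int pos = 0;
        search:
        while (pos < text.length()) {
            for (int i = 0; i < MAPPING.length; i += 2) {
                if (text.startsWith(MAPPING[i + 1], pos)) {
                    sb.append(MAPPING[i]);
                    pos += MAPPING[i + 1].length();
                    continue search;
                }
            }
            // no entity starts here, copy the character unchanged
            sb.append(text.charAt(pos));
            pos++;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The literal "&uuml;" in the input must survive the round trip as text.
        String text = "a & b &uuml; <\u00FC>";
        String encoded = encode(text, true);
        System.out.println(encoded);
        System.out.println(decode(encoded).equals(text));
    }
}
```

Because the decoder appends the decoded character and moves on, the `&` produced from `&amp;` is never re-examined together with the following `uuml;`, which is the cyclic replacement the test string in `main` above is meant to rule out.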

@ -400,7 +400,7 @@ public final class robotsParser{
httpHeader reqHeaders = new httpHeader();
// adding referer
reqHeaders.put(httpHeader.REFERER, (new URL(robotsURL,"/")).toString());
reqHeaders.put(httpHeader.REFERER, (URL.newURL(robotsURL,"/")).toNormalform(true, true));
if (entry != null) {
oldEtag = entry.getETag();
@ -455,7 +455,7 @@ public final class robotsParser{
redirectionUrlString = redirectionUrlString.trim();
// generating the new URL object
URL redirectionUrl = new URL(robotsURL, redirectionUrlString);
URL redirectionUrl = URL.newURL(robotsURL, redirectionUrlString);
// returning the used httpc
httpc.returnInstance(con);

@ -91,7 +91,6 @@ public class wikiCode extends abstractWikiParser implements wikiParser {
private boolean preformatted = false; //needed for preformatted text
private boolean preformattedSpan = false; //needed for <pre> and </pre> spanning over several lines
private boolean replacedHTML = false; //indicates if method replaceHTML has been used with line already
private boolean replacedCharacters = false; //indicates if method replaceCharachters has been used with line
private boolean table = false; //needed for tables, because they reach over several lines
private int preindented = 0; //needed for indented <pre>s
private int escindented = 0; //needed for indented [=s
@ -178,7 +177,7 @@ public class wikiCode extends abstractWikiParser implements wikiParser {
else {
line+=parseTableProperties(result.substring(lenCellDivider,propEnd-lenAttribDivider).trim()).toString();
}
// quick&dirty fix for http://www.yacy-forum.de/viewtopic.php?t=2825 [MN]
// quick&dirty fix [MN]
if(propEnd > cellEnd){
propEnd = lenCellDivider;
}
@ -707,7 +706,7 @@ public class wikiCode extends abstractWikiParser implements wikiParser {
}
directory = "<table><tr><td><div class=\"WikiTOCBox\">\n" + directory + "</div></td></tr></table>\n";
}
//(http://www.yacy-forum.de/viewtopic.php?t=4034) [MN]
// [MN]
if(!dirElements.isEmpty()){
dirElements.clear();
headlines = 0;
@ -777,14 +776,9 @@ public class wikiCode extends abstractWikiParser implements wikiParser {
public String transformLine(String result, String publicAddress, plasmaSwitchboard switchboard) {
//If HTML has not been replaced yet (can happen if method gets called in recursion), replace now!
if (!replacedHTML || preformattedSpan){
result = htmlTools.replaceXMLEntities(result);
result = htmlTools.encodeUnicode2html(result, true);
replacedHTML = true;
}
//If special characters have not been replaced yet, replace now!
if (!replacedCharacters || preformattedSpan){
result = htmlTools.replaceHTMLEntities(result);
replacedCharacters = true;
}
//check if line contains escape symbols([= =]) or if we are in an escape sequence already.
if ((result.indexOf("[=")>=0)||(result.indexOf("=]")>=0)||(escapeSpan)){
@ -837,7 +831,6 @@ public class wikiCode extends abstractWikiParser implements wikiParser {
}
if (!preformatted) replacedHTML = false;
replacedCharacters = false;
if ((result.endsWith("</li>"))||(defList)||(escape)||(preformatted)||(table)||(cellprocessing)) return result;
return result + "<br />";
}

View File

@ -161,7 +161,7 @@ public class htmlFilterContentScraper extends htmlFilterAbstractScraper implemen
private String absolutePath(String relativePath) {
try {
return new URL(root, relativePath).toString();
return URL.newURL(root, relativePath).toNormalform(false, true);
} catch (Exception e) {
return "";
}

View File

@ -93,7 +93,7 @@ public class htmlFilterImageEntry implements Comparable {
// create a total ordering on images with respect to the image size
assert (url != null);
assert (h instanceof htmlFilterImageEntry);
if (this.url.toString().equals(((htmlFilterImageEntry) h).url.toString())) return 0;
if (this.url.toNormalform(true, true).equals(((htmlFilterImageEntry) h).url.toNormalform(true, true))) return 0;
int thc = this.hashCode();
int ohc = ((htmlFilterImageEntry) h).hashCode();
if (thc < ohc) return -1;

View File

@ -900,12 +900,7 @@ public final class httpd implements serverHandler {
// 06.01.2007: decode HTML entities by [FB]
public static String decodeHtmlEntities(String s) {
// replace all entities defined in wikiCode.characters and htmlentities
for (int i=1; i<htmlTools.htmlentities.length; i+=2) {
s = s.replaceAll(htmlTools.htmlentities[i], htmlTools.htmlentities[i - 1]);
}
for (int i=1; i<htmlTools.xmlentities.length; i+=2) {
s = s.replaceAll(htmlTools.xmlentities[i], htmlTools.xmlentities[i - 1]);
}
s = htmlTools.decodeHtml2Unicode(s);
// replace all other
CharArrayWriter b = new CharArrayWriter(s.length());

View File

@ -344,7 +344,7 @@ public final class httpdProxyHandler extends httpdAbstractHandler implements htt
//redirector
if (redirectorEnabled){
synchronized(redirectorProcess){
redirectorWriter.println(url.toString());
redirectorWriter.println(url.toNormalform(false, true));
redirectorWriter.flush();
}
String newUrl=redirectorReader.readLine();

View File

@ -172,7 +172,7 @@ public class indexURLEntry {
public static byte[] encodeComp(URL url, String descr, String author, String tags, String ETag) {
serverCharBuffer s = new serverCharBuffer(200);
s.append(url.toNormalform()).append(10);
s.append(url.toNormalform(false, true)).append(10);
s.append(descr).append(10);
s.append(author).append(10);
s.append(tags).append(10);
@ -248,7 +248,7 @@ public class indexURLEntry {
//System.out.println("author=" + comp.author());
try {
s.append("hash=").append(hash());
s.append(",url=").append(crypt.simpleEncode(comp.url().toNormalform()));
s.append(",url=").append(crypt.simpleEncode(comp.url().toNormalform(false, true)));
s.append(",descr=").append(crypt.simpleEncode(comp.title()));
s.append(",author=").append(crypt.simpleEncode(comp.author()));
s.append(",tags=").append(crypt.simpleEncode(comp.tags()));

View File

@ -95,6 +95,8 @@ public class kelondroObjects {
protected synchronized kelondroObjectsEntry get(final String key, final boolean storeCache) throws IOException {
// load map from cache
assert cache != null;
assert key != null;
kelondroObjectsEntry map = (kelondroObjectsEntry) cache.get(key);
if (map != null) return map;

View File

@ -104,50 +104,76 @@ public class URL {
this("file", "", -1, file.getAbsolutePath());
}
public URL(URL baseURL, String relPath) throws MalformedURLException {
public static URL newURL(String baseURL, String relPath) throws MalformedURLException {
if ((baseURL == null) ||
(relPath.startsWith("http://")) ||
(relPath.startsWith("https://")) ||
(relPath.startsWith("ftp://")) ||
(relPath.startsWith("file://")) ||
(relPath.startsWith("smb://"))) {
return new URL(relPath);
} else {
return new URL(new URL(baseURL), relPath);
}
}
public static URL newURL(URL baseURL, String relPath) throws MalformedURLException {
if ((baseURL == null) ||
(relPath.startsWith("http://")) ||
(relPath.startsWith("https://")) ||
(relPath.startsWith("ftp://")) ||
(relPath.startsWith("file://")) ||
(relPath.startsWith("smb://"))) {
return new URL(relPath);
} else {
return new URL(baseURL, relPath);
}
}
private URL(URL baseURL, String relPath) throws MalformedURLException {
if (baseURL == null) throw new MalformedURLException("base URL is null");
if (relPath == null) throw new MalformedURLException("relPath is null");
int p = relPath.indexOf(':');
String relprotocol = (p < 0) ? null : relPath.substring(0, p).toLowerCase();
if (relprotocol != null && "http.https.ftp.mailto".indexOf(relprotocol) >= 0) {
parseURLString(relPath);
} else if (relprotocol == null || relprotocol.equals("javascript")) {
this.protocol = baseURL.protocol;
this.host = baseURL.host;
this.port = baseURL.port;
this.userInfo = baseURL.userInfo;
if (relPath.toLowerCase().startsWith("javascript:")) {
this.path = baseURL.path;
} else if (relPath.startsWith("/")) {
this.path = relPath;
} else if (baseURL.path.endsWith("/")) {
if (relPath.startsWith("#") || relPath.startsWith("?")) {
throw new MalformedURLException("relative path malformed: " + relPath);
} else {
this.path = baseURL.path + relPath;
}
this.protocol = baseURL.protocol;
this.host = baseURL.host;
this.port = baseURL.port;
this.userInfo = baseURL.userInfo;
if (relPath.toLowerCase().startsWith("javascript:")) {
this.path = baseURL.path;
} else if (
(relPath.startsWith("http://")) ||
(relPath.startsWith("https://")) ||
(relPath.startsWith("ftp://")) ||
(relPath.startsWith("file://")) ||
(relPath.startsWith("smb://"))) {
this.path = baseURL.path;
} else if (relPath.startsWith("/")) {
this.path = relPath;
} else if (baseURL.path.endsWith("/")) {
if (relPath.startsWith("#") || relPath.startsWith("?")) {
throw new MalformedURLException("relative path malformed: " + relPath);
} else {
if (relPath.startsWith("#") || relPath.startsWith("?")) {
this.path = baseURL.path + relPath;
this.path = baseURL.path + relPath;
}
} else {
if (relPath.startsWith("#") || relPath.startsWith("?")) {
this.path = baseURL.path + relPath;
} else {
int q = baseURL.path.lastIndexOf('/');
if (q < 0) {
this.path = relPath;
} else {
int q = baseURL.path.lastIndexOf('/');
if (q < 0) {
this.path = relPath;
} else {
this.path = baseURL.path.substring(0, q + 1) + relPath;
}
this.path = baseURL.path.substring(0, q + 1) + relPath;
}
}
this.quest = baseURL.quest;
this.ref = baseURL.ref;
path = resolveBackpath(path);
identRef();
identQuest();
escape();
} else {
throw new MalformedURLException("unknown protocol: " + relprotocol);
}
this.quest = baseURL.quest;
this.ref = baseURL.ref;
path = resolveBackpath(path);
identRef();
identQuest();
escape();
}
public URL(String protocol, String host, int port, String path) throws MalformedURLException {
@ -182,8 +208,6 @@ public class URL {
matcher.reset(path);
}
/* another version at http://www.yacy-forum.de/viewtopic.php?p=26871#26871 */
return path.equals("")?"/":path;
}
@ -228,7 +252,7 @@ public class URL {
quest = qtmp.substring((qtmp.length() > 0) ? 1 : 0);
}
final static String[] hex = {
private final static String[] hex = {
"%00", "%01", "%02", "%03", "%04", "%05", "%06", "%07",
"%08", "%09", "%0A", "%0B", "%0C", "%0D", "%0E", "%0F",
"%10", "%11", "%12", "%13", "%14", "%15", "%16", "%17",
@ -301,7 +325,8 @@ public class URL {
sbuf.append((char)ch);
} else if (ch == ' ') { // space
sbuf.append("%20");
} else if (ch == '-' || ch == '_' // unreserved
} else if (ch == '&' || ch == ':' // unreserved
|| ch == '-' || ch == '_'
|| ch == '.' || ch == '!'
|| ch == '~' || ch == '*'
|| ch == '\'' || ch == '('
@ -462,15 +487,18 @@ public class URL {
return quest;
}
public String toNormalform() {
return toString(false);
}
public String toString() {
return toString(true);
return toNormalform(false, true);
}
public String toString(boolean includeReference) {
public String toNormalform(boolean stripReference, boolean stripAmp) {
if (stripAmp)
return toNormalform(!stripReference).replaceAll("&amp;", "&");
else
return toNormalform(!stripReference);
}
private String toNormalform(boolean includeReference) {
// generates a normal form of the URL
boolean defaultPort = false;
if (this.protocol.equals("mailto")) {
@ -537,21 +565,24 @@ public class URL {
new String[]{"http://www.anomic.de/home", "ftp://ftp.delegate.org/"},
new String[]{"http://www.anomic.de","mailto:yacy@weltherrschaft.org"},
new String[]{"http://www.anomic.de","javascipt:temp"},
new String[]{null,"http://yacy-websuche.de/wiki/index.php?title=De:IntroInformationFreedom&action=history"},
new String[]{null, "http://diskusjion.no/index.php?s=5bad5f431a106d9a8355429b81bb0ca5&showuser=23585"},
new String[]{null, "http://diskusjion.no/index.php?s=5bad5f431a106d9a8355429b81bb0ca5&amp;showuser=23585"}
};
String environment, url;
de.anomic.net.URL aURL = null;
java.net.URL jURL = null;
de.anomic.net.URL aURL, aURL1;
java.net.URL jURL;
for (int i = 0; i < test.length; i++) {
environment = test[i][0];
url = test[i][1];
try {aURL = de.anomic.net.URL.newURL(environment, url);} catch (MalformedURLException e) {aURL = null;}
if (environment == null) {
try {aURL = new de.anomic.net.URL(url);} catch (MalformedURLException e) {aURL = null;}
try {jURL = new java.net.URL(url);} catch (MalformedURLException e) {jURL = null;}
} else {
try {aURL = new de.anomic.net.URL(new de.anomic.net.URL(environment), url);} catch (MalformedURLException e) {aURL = null;}
try {jURL = new java.net.URL(new java.net.URL(environment), url);} catch (MalformedURLException e) {jURL = null;}
}
// check equality to java.net.URL
if (((aURL == null) && (jURL != null)) ||
((aURL != null) && (jURL == null)) ||
((aURL != null) && (jURL != null) && (!(jURL.toString().equals(aURL.toString()))))) {
@ -559,6 +590,20 @@ public class URL {
System.out.println((jURL == null) ? "jURL rejected input" : "jURL=" + jURL.toString());
System.out.println((aURL == null) ? "aURL rejected input" : "aURL=" + aURL.toString());
}
// check stability: the normalform of the normalform must be equal to the normalform
if (aURL != null) try {
aURL1 = new de.anomic.net.URL(aURL.toNormalform(false, true));
if (!(aURL1.toNormalform(false, true).equals(aURL.toNormalform(false, true)))) {
System.out.println("no stability for url:");
System.out.println("aURL0=" + aURL.toString());
System.out.println("aURL1=" + aURL1.toString());
}
} catch (MalformedURLException e) {
System.out.println("no stability for url:");
System.out.println("aURL0=" + aURL.toString());
System.out.println("aURL1 cannot be computed:" + e.getMessage());
}
}
}
}
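
Note on the stability check added to the test loop above: it requires that normalizing an already normalized URL changes nothing. The sketch below works on plain strings and only models the two flags of `toNormalform(stripReference, stripAmp)`; it is an assumption-laden stand-in (class `NormalFormSketch`, no escaping, no port handling), not the `URL` class itself.

```java
public class NormalFormSketch {
    // String-only stand-in for the two flags: optionally drop the "#reference",
    // optionally turn the HTML-quoted "&amp;" back into a plain "&".
    static String toNormalform(String url, boolean stripReference, boolean stripAmp) {
        String s = url;
        if (stripReference) {
            int p = s.indexOf('#');
            if (p >= 0) s = s.substring(0, p);
        }
        if (stripAmp) s = s.replace("&amp;", "&");
        return s;
    }

    // Stability: the normal form of the normal form equals the normal form.
    static boolean isStable(String url) {
        String once = toNormalform(url, true, true);
        return toNormalform(once, true, true).equals(once);
    }

    public static void main(String[] args) {
        System.out.println(toNormalform("http://example.org/a?x=1&amp;y=2#top", true, true));
        System.out.println(isStable("http://example.org/a?x=1&amp;y=2#top"));
        System.out.println(isStable("http://example.org/a?x=1&amp;amp;y=2"));
    }
}
```

In this sketch a doubly quoted `&amp;amp;` is not stable, because each pass removes one level of quoting. Whether such input needs handling in the real class is left open here.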

View File

@ -77,7 +77,7 @@ public class ResourceInfo implements IResourceInfo {
// generating the url hash
this.url = objectURL;
this.urlHash = plasmaURL.urlHash(this.url.toNormalform());
this.urlHash = plasmaURL.urlHash(this.url.toNormalform(true, true));
// create the http header object
this.propertyMap = new HashMap(objectInfo);
@ -88,7 +88,7 @@ public class ResourceInfo implements IResourceInfo {
// generating the url hash
this.url = objectURL;
this.urlHash = plasmaURL.urlHash(this.url.toNormalform());
this.urlHash = plasmaURL.urlHash(this.url.toNormalform(true, true));
// create the http header object
this.propertyMap = new HashMap();

View File

@ -76,7 +76,7 @@ public class ResourceInfo implements IResourceInfo {
// generating the url hash
this.url = objectURL;
this.urlHash = plasmaURL.urlHash(this.url.toNormalform());
this.urlHash = plasmaURL.urlHash(this.url.toNormalform(true, true));
// create the http header object
this.responseHeader = new httpHeader(null, objectInfo);
@ -88,7 +88,7 @@ public class ResourceInfo implements IResourceInfo {
// generating the url hash
this.url = objectURL;
this.urlHash = plasmaURL.urlHash(this.url.toNormalform());
this.urlHash = plasmaURL.urlHash(this.url.toNormalform(true, true));
this.requestHeader = requestHeaders;
this.responseHeader = responseHeaders;

View File

@ -188,7 +188,7 @@ public class CrawlWorker extends AbstractCrawlWorker implements plasmaCrawlWorke
if (isFolder) {
fullPath = fullPath + "/";
file = "";
this.url = new URL(this.url,fullPath);
this.url = URL.newURL(this.url,fullPath);
}
}

View File

@ -318,7 +318,7 @@ public final class CrawlWorker extends AbstractCrawlWorker {
}
// normalizing URL
redirectionUrlString = new URL(this.url, redirectionUrlString).toNormalform();
redirectionUrlString = new URL(redirectionUrlString).toNormalform(true, true);
// generating the new URL object
URL redirectionUrl = new URL(redirectionUrlString);
@ -351,16 +351,15 @@ public final class CrawlWorker extends AbstractCrawlWorker {
if (redirectedEntry != null) {
// TODO: Here we can store the content of the redirection
// as content of the original URL if some criteria are met
// See: http://www.yacy-forum.de/viewtopic.php?t=1719
//
// plasmaHTCache.Entry newEntry = (plasmaHTCache.Entry) redirectedEntry.clone();
// newEntry.url = url;
// TODO: which http header should we store here?
// TODO: which http header should we store here?
//
// // enQueue new entry with response header
// if (profile != null) {
// cacheManager.push(newEntry);
// }
// cacheManager.push(newEntry);
// }
// htCache = newEntry;
}
}

View File

@ -153,7 +153,7 @@ public abstract class AbstractParser implements Parser{
if (file.isDirectory()) {
result += parseDir(location, prefix, file, doc);
} else try {
URL url = new URL(location, "/" + prefix + "/"
URL url = URL.newURL(location, "/" + prefix + "/"
// XXX: workaround for relative paths within document
+ file.getPath().substring(file.getPath().indexOf(File.separatorChar) + 1)
+ "/" + file.getName());

View File

@ -117,7 +117,7 @@ public class SZParserExtractCallback extends ArchiveExtractCallback {
plasmaParserDocument theDoc;
// workaround for relative links in file, normally '#' shall be used behind the location, see
// below for reversion of the effects
URL url = new URL(doc.getLocation(), this.prefix + "/" + super.filePath);
URL url = URL.newURL(doc.getLocation(), this.prefix + "/" + super.filePath);
String mime = plasmaParser.getMimeTypeByFileExt(super.filePath.substring(super.filePath.lastIndexOf('.') + 1));
if (this.cfos.isFallback()) {
theDoc = this.parser.parseSource(url, mime, null, this.cfos.getContentFile());
@ -129,7 +129,7 @@ public class SZParserExtractCallback extends ArchiveExtractCallback {
Map nanchors = new HashMap(theDoc.getAnchors().size(), 1f);
Iterator it = theDoc.getAnchors().entrySet().iterator();
Map.Entry entry;
String base = doc.getLocation().toNormalform();
String base = doc.getLocation().toNormalform(false, true);
while (it.hasNext()) {
entry = (Map.Entry)it.next();
if (((String)entry.getKey()).startsWith(base + "/")) {

View File

@ -166,7 +166,7 @@ public class tarParser extends AbstractParser implements Parser {
checkInterruption();
// parsing the content
subDoc = theParser.parseSource(new URL(location,"#" + entryName),entryMime,null,subDocTempFile);
subDoc = theParser.parseSource(URL.newURL(location,"#" + entryName),entryMime,null,subDocTempFile);
} catch (ParserException e) {
this.theLogger.logInfo("Unable to parse tar file entry '" + entryName + "'. " + e.getMessage());
} finally {

View File

@ -149,7 +149,7 @@ public class zipParser extends AbstractParser implements Parser {
serverFileUtils.copy(zippedContent,subDocTempFile,entry.getSize());
// parsing the zip file entry
subDoc = theParser.parseSource(new URL(location,"#" + entryName),entryMime,null, subDocTempFile);
subDoc = theParser.parseSource(URL.newURL(location,"#" + entryName),entryMime,null, subDocTempFile);
} catch (ParserException e) {
this.theLogger.logInfo("Unable to parse zip file entry '" + entryName + "'. " + e.getMessage());
} finally {

View File

@ -208,7 +208,7 @@ public final class plasmaCondenser {
htmlFilterImageEntry ientry;
while (i.hasNext()) {
ientry = (htmlFilterImageEntry) i.next();
insertTextToWords((String) ientry.url().toNormalform(), 99, flag_cat_hasimage, wflags);
insertTextToWords((String) ientry.url().toNormalform(false, true), 99, flag_cat_hasimage, wflags);
insertTextToWords((String) ientry.alt(), 99, flag_cat_hasimage, wflags);
}

View File

@ -63,7 +63,7 @@ public class plasmaCrawlEntry {
private String initiator; // the initiator hash, is NULL or "" if it is the own proxy;
// if this is generated by a crawl, the own peer hash is entered
private String urlhash; // the url's hash
private String urlhash; // the url's hash
private String referrer; // the url's referrer hash
private URL url; // the url as string
private String name; // the name of the url, from anchor tag <a>name</a>

View File

@ -454,7 +454,6 @@ public final class plasmaCrawlLURL {
}
// The Cleaner class was provided as "UrldbCleaner" by Hydrox
// see http://www.yacy-forum.de/viewtopic.php?p=18093#18093
public Cleaner makeCleaner() {
return new Cleaner();
}
@ -502,15 +501,15 @@ public final class plasmaCrawlLURL {
remove(entry.hash());
} else if (plasmaSwitchboard.urlBlacklist.isListed(plasmaURLPattern.BLACKLIST_CRAWLER, comp.url()) ||
plasmaSwitchboard.urlBlacklist.isListed(plasmaURLPattern.BLACKLIST_DHT, comp.url())) {
lastBlacklistedUrl = comp.url().toNormalform();
lastBlacklistedUrl = comp.url().toNormalform(true, true);
lastBlacklistedHash = entry.hash();
serverLog.logFine("URLDBCLEANER", ++blacklistedUrls + " blacklisted (" + ((double) blacklistedUrls / totalSearchedUrls) * 100 + "%): " + entry.hash() + " " + comp.url().toNormalform());
serverLog.logFine("URLDBCLEANER", ++blacklistedUrls + " blacklisted (" + ((double) blacklistedUrls / totalSearchedUrls) * 100 + "%): " + entry.hash() + " " + comp.url().toNormalform(false, true));
remove(entry.hash());
if (blacklistedUrls % 100 == 0) {
serverLog.logInfo("URLDBCLEANER", "Deleted " + blacklistedUrls + " URLs until now. Last deleted URL-Hash: " + lastBlacklistedUrl);
}
}
lastUrl = comp.url().toNormalform();
lastUrl = comp.url().toNormalform(true, true);
lastHash = entry.hash();
}
}

View File

@ -223,7 +223,7 @@ public final class plasmaCrawlStacker {
}
return stackCrawl(
theMsg.url().toString(),
theMsg.url().toNormalform(true, true),
theMsg.referrerhash(),
theMsg.initiator(),
theMsg.name(),

View File

@ -303,7 +303,7 @@ public final class plasmaHTCache {
if (deleteFileandDirs(key, getCachePath(url), msg)) {
try {
// As the file is gone, the entry in responseHeader.db is not needed anymore
this.log.logFinest("Trying to remove responseHeader from URL: " + url.toString());
this.log.logFinest("Trying to remove responseHeader from URL: " + url.toNormalform(false, true));
this.responseHeaderDB.remove(plasmaURL.urlHash(url));
} catch (IOException e) {
resetResponseHeaderDB();
@ -365,7 +365,7 @@ public final class plasmaHTCache {
} else {
URL url = getURL(file);
if (url != null) {
this.log.logFinest("Trying to remove responseHeader for URL: " + url.toString());
this.log.logFinest("Trying to remove responseHeader for URL: " + url.toNormalform(false, true));
this.responseHeaderDB.remove(plasmaURL.urlHash(url));
}
}
@ -507,7 +507,7 @@ public final class plasmaHTCache {
public IResourceInfo loadResourceInfo(URL url) throws UnsupportedProtocolException, IllegalAccessException {
// getting the URL hash
String urlHash = plasmaURL.urlHash(url.toNormalform());
String urlHash = plasmaURL.urlHash(url.toNormalform(true, true));
// loading data from database
Map hdb = this.responseHeaderDB.getMap(urlHash);
@ -976,7 +976,7 @@ public final class plasmaHTCache {
// normalize url
this.nomalizedURLString = url.toNormalform();
this.nomalizedURLString = url.toNormalform(true, true);
try {
this.url = new URL(this.nomalizedURLString);

View File

@ -756,7 +756,7 @@ public final class plasmaParser {
int p = 0;
for (int i = 1; i <= 4; i++) for (int j = 0; j < scraper.getHeadlines(i).length; j++) sections[p++] = scraper.getHeadlines(i)[j];
plasmaParserDocument ppd = new plasmaParserDocument(
new URL(location.toNormalform()),
new URL(location.toNormalform(true, true)),
mimeType,
charSet,
scraper.getKeywords(),
@ -841,7 +841,7 @@ public final class plasmaParser {
loop: while (i.hasNext()) {
o = i.next();
if (o instanceof String) url = (String) o;
else if (o instanceof htmlFilterImageEntry) url = ((htmlFilterImageEntry) o).url().toNormalform();
else if (o instanceof htmlFilterImageEntry) url = ((htmlFilterImageEntry) o).url().toNormalform(true, true);
else {
assert false;
continue;
@ -874,7 +874,7 @@ public final class plasmaParser {
while (i.hasNext()) {
o = i.next();
if (o instanceof String) url = (String) o;
else if (o instanceof htmlFilterImageEntry) url = ((htmlFilterImageEntry) o).url().toNormalform();
else if (o instanceof htmlFilterImageEntry) url = ((htmlFilterImageEntry) o).url().toNormalform(true, true);
else {
assert false;
continue;

View File

@ -331,7 +331,7 @@ public class plasmaParserDocument {
}
try {
url = new URL(u);
u = url.toNormalform();
u = url.toNormalform(true, true);
if (plasmaParser.mediaExtContains(ext)) {
// this is not a normal anchor, its a media link
if (plasmaParser.imageExtContains(ext)) {

View File

@ -87,7 +87,7 @@ public final class plasmaSearchImages {
Map.Entry e = (Map.Entry) i.next();
String nexturlstring;
try {
nexturlstring = new URL((String) e.getKey()).toNormalform();
nexturlstring = new URL((String) e.getKey()).toNormalform(true, true);
addAll(new plasmaSearchImages(sc, serverDate.remainingTime(start, maxTime, 10), new URL(nexturlstring), depth - 1));
} catch (MalformedURLException e1) {
e1.printStackTrace();

View File

@ -112,7 +112,7 @@ public final class plasmaSearchPostOrder {
// take out relevant information for reference computation
indexURLEntry.Components comp = page.comp();
if ((comp.url() == null) || (comp.title() == null)) return;
String[] urlcomps = htmlFilterContentScraper.urlComps(comp.url().toNormalform()); // word components of the url
String[] urlcomps = htmlFilterContentScraper.urlComps(comp.url().toNormalform(true, true)); // word components of the url
String[] descrcomps = comp.title().toLowerCase().split(htmlFilterContentScraper.splitrex); // words in the description
// store everything
@ -173,7 +173,7 @@ public final class plasmaSearchPostOrder {
// first scan all entries and find all urls that are referenced
while (i.hasNext()) {
entry = (Map.Entry) i.next();
paths.put(((indexURLEntry) entry.getValue()).comp().url().toNormalform(), entry.getKey());
paths.put(((indexURLEntry) entry.getValue()).comp().url().toNormalform(true, true), entry.getKey());
//if (path != null) path = shortenPath(path);
//if (path != null) paths.put(path, entry.getKey());
}
@ -183,7 +183,7 @@ public final class plasmaSearchPostOrder {
String shorten;
while (i.hasNext()) {
entry = (Map.Entry) i.next();
shorten = shortenPath(((indexURLEntry) entry.getValue()).comp().url().toNormalform());
shorten = shortenPath(((indexURLEntry) entry.getValue()).comp().url().toNormalform(true, true));
// scan all subpaths of the url
while (shorten != null) {
if (pageAcc.size() <= query.wantedResults) break;
@ -259,7 +259,7 @@ public final class plasmaSearchPostOrder {
String hash, fill;
String[] paths1 = new String[urls.length]; for (int i = 0; i < urls.length; i++) {
fill = ""; for (int j = 0; j < 35 - urls[i].toString().length(); j++) fill +=" ";
paths1[i] = urls[i].toNormalform();
paths1[i] = urls[i].toNormalform(true, true);
hash = plasmaURL.urlHash(urls[i]);
System.out.println("paths1[" + urls[i] + fill +"] = " + hash + ", typeID=" + plasmaURL.flagTypeID(hash) + ", tldID=" + plasmaURL.flagTLDID(hash) + ", lengthID=" + plasmaURL.flagLengthID(hash) + " / " + paths1[i]);
}

View File

@ -308,7 +308,7 @@ public class plasmaSearchRankingProfile {
// prefer hit with 'prefer' pattern
indexURLEntry.Components comp = page.comp();
if (comp.url().toNormalform().matches(query.prefer)) ranking += 256 << coeff_prefer;
if (comp.url().toNormalform(true, true).matches(query.prefer)) ranking += 256 << coeff_prefer;
if (comp.title().matches(query.prefer)) ranking += 256 << coeff_prefer;
// apply 'common-sense' heuristic using references

View File

@ -682,7 +682,7 @@ public class plasmaSnippetCache {
ArrayList result = new ArrayList();
while (i.hasNext()) {
ientry = (htmlFilterImageEntry) i.next();
url = (String) ientry.url().toNormalform();
url = (String) ientry.url().toNormalform(true, true);
desc = (String) ientry.alt();
//result.add(new MediaSnippet("image", url, (desc.length() == 0) ? url : desc, ientry.width() + " x " + ientry.height()));
s = removeAppearanceHashes(url, queryhashes);
@ -882,12 +882,12 @@ public class plasmaSnippetCache {
(snippet.getErrorCode() == ERROR_RESOURCE_LOADING) ||
(snippet.getErrorCode() == ERROR_PARSER_FAILED) ||
(snippet.getErrorCode() == ERROR_PARSER_NO_LINES)) {
log.logInfo("error: '" + snippet.getError() + "', remove url = " + snippet.getUrl().toNormalform() + ", cause: " + snippet.getError());
log.logInfo("error: '" + snippet.getError() + "', remove url = " + snippet.getUrl().toNormalform(false, true) + ", cause: " + snippet.getError());
sb.wordIndex.loadedURL.remove(urlHash);
sb.wordIndex.removeHashReferences(queryhashes, urlHash);
}
if (snippet.getErrorCode() == ERROR_NO_MATCH) {
log.logInfo("error: '" + snippet.getError() + "', remove words '" + querystring + "' for url = " + snippet.getUrl().toNormalform() + ", cause: " + snippet.getError());
log.logInfo("error: '" + snippet.getError() + "', remove words '" + querystring + "' for url = " + snippet.getUrl().toNormalform(false, true) + ", cause: " + snippet.getError());
sb.wordIndex.removeHashReferences(snippet.remaingHashes, urlHash);
}
return snippet.getError();

View File

@ -2444,7 +2444,7 @@ public final class plasmaSwitchboard extends serverAbstractSwitch implements ser
sbStackCrawlThread.enqueue(nextUrl, entry.urlHash(), initiatorPeerHash, (String) nextEntry.getValue(), docDate, entry.depth() + 1, entry.profile());
} catch (MalformedURLException e1) {}
}
log.logInfo("CRAWL: ADDED " + hl.size() + " LINKS FROM " + entry.normalizedURLString() +
log.logInfo("CRAWL: ADDED " + hl.size() + " LINKS FROM " + entry.url().toNormalform(false, true) +
", NEW CRAWL STACK SIZE IS " + noticeURL.stackSize(plasmaCrawlNURL.STACK_TYPE_CORE));
}
stackEndTime = System.currentTimeMillis();
@ -2471,7 +2471,7 @@ public final class plasmaSwitchboard extends serverAbstractSwitch implements ser
indexingStartTime = System.currentTimeMillis();
checkInterruption();
log.logFine("Condensing for '" + entry.normalizedURLString() + "'");
log.logFine("Condensing for '" + entry.url().toNormalform(false, true) + "'");
plasmaCondenser condenser = new plasmaCondenser(document, entry.profile().indexText(), entry.profile().indexMedia());
// generate citation reference
@ -2575,8 +2575,8 @@ public final class plasmaSwitchboard extends serverAbstractSwitch implements ser
String language = plasmaURL.language(entry.url());
char doctype = plasmaURL.docType(document.getMimeType());
indexURLEntry.Components comp = newEntry.comp();
int urlLength = comp.url().toNormalform().length();
int urlComps = htmlFilterContentScraper.urlComps(comp.url().toNormalform()).length;
int urlLength = comp.url().toNormalform(true, true).length();
int urlComps = htmlFilterContentScraper.urlComps(comp.url().toNormalform(true, true)).length;
// iterate over all words
Iterator i = condenser.words().entrySet().iterator();
@ -2672,12 +2672,12 @@ public final class plasmaSwitchboard extends serverAbstractSwitch implements ser
// if this was performed for a remote crawl request, notify requester
if ((processCase == PROCESSCASE_6_GLOBAL_CRAWLING) && (initiatorPeer != null)) {
log.logInfo("Sending crawl receipt for '" + entry.normalizedURLString() + "' to " + initiatorPeer.getName());
log.logInfo("Sending crawl receipt for '" + entry.url().toNormalform(false, true) + "' to " + initiatorPeer.getName());
if (clusterhashes != null) initiatorPeer.setAlternativeAddress((String) clusterhashes.get(initiatorPeer.hash));
yacyClient.crawlReceipt(initiatorPeer, "crawl", "fill", "indexed", newEntry, "");
}
} else {
log.logFine("Not Indexed Resource '" + entry.normalizedURLString() + "': process case=" + processCase);
log.logFine("Not Indexed Resource '" + entry.url().toNormalform(false, true) + "': process case=" + processCase);
addURLtoErrorDB(entry.url(), referrerUrlHash, initiatorPeerHash, docDescription, plasmaCrawlEURL.DENIED_UNKNOWN_INDEXING_PROCESS_CASE, new kelondroBitfield());
}
} catch (Exception ee) {
@ -2956,7 +2956,7 @@ public final class plasmaSwitchboard extends serverAbstractSwitch implements ser
urlname = "http://share." + seed.getName() + ".yacy" + filename;
if ((p = urlname.indexOf("?")) > 0) urlname = urlname.substring(0, p);
} else {
urlstring = comp.url().toNormalform();
urlstring = comp.url().toNormalform(false, true);
urlname = urlstring;
}

View File

@ -268,10 +268,6 @@ public class plasmaSwitchboardQueue {
return url;
}
public String normalizedURLString() {
return url.toNormalform();
}
public String urlHash() {
return plasmaURL.urlHash(url);
}
@ -365,7 +361,7 @@ public class plasmaSwitchboardQueue {
return "Indexing_Not_Allowed";
}
String nURL = normalizedURLString();
String nURL = url.toNormalform(true, true);
// -CGI access in request
// CGI access makes the page very individual, and therefore not usable in caches
if (!profile().crawlingQ()) {
@ -420,7 +416,7 @@ public class plasmaSwitchboardQueue {
return "Indexing_Not_Allowed";
}
final String nURL = normalizedURLString();
final String nURL = url().toNormalform(true, true);
// -CGI access in request
// CGI access makes the page very individual, and therefore not usable in caches
if (!profile().crawlingQ()) {

View File

@ -460,7 +460,7 @@ public class plasmaURL {
// combine the attributes
StringBuffer hash = new StringBuffer(12);
// form the 'local' part of the hash
hash.append(kelondroBase64Order.enhancedCoder.encode(serverCodings.encodeMD5Raw(url.toNormalform())).substring(0, 5)); // 5 chars
hash.append(kelondroBase64Order.enhancedCoder.encode(serverCodings.encodeMD5Raw(url.toNormalform(true, true))).substring(0, 5)); // 5 chars
hash.append(subdomPortPath(subdom, port, rootpath)); // 1 char
// form the 'global' part of the hash
hash.append(protocolHostPort(url.getProtocol(), host, port)); // 5 chars
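
Note on the hash computation above: because the 'local' part is derived from `toNormalform(true, true)`, two spellings of the same link (with or without a `#reference`, with `&amp;` or `&`) should end up with the same prefix. The sketch below illustrates that effect with standard library classes only. It is not the YaCy hash: `kelondroBase64Order.enhancedCoder` is replaced by the JDK's URL-safe Base64 encoder, and the class name `UrlHashSketch` and its string-level `normalize` helper are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class UrlHashSketch {
    // Hypothetical string-level normalization: drop the reference, unquote "&amp;".
    static String normalize(String url) {
        int p = url.indexOf('#');
        String s = (p >= 0) ? url.substring(0, p) : url;
        return s.replace("&amp;", "&");
    }

    // First 5 characters of the Base64-encoded MD5 digest of the normalized URL.
    static String localHashPart(String url) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(normalize(url).getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(digest).substring(0, 5);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String a = localHashPart("http://example.org/a?x=1&amp;y=2#top");
        String b = localHashPart("http://example.org/a?x=1&y=2");
        System.out.println(a.equals(b) ? "same hash prefix" : "different hash prefix");
    }
}
```

Since stored hashes depend on the normal form, a change to normalization such as the one in this commit can yield different hashes for URLs that were indexed before it.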

View File

@ -279,7 +279,7 @@ public final class plasmaWordIndex implements indexRI {
// use all the words in one condenser object to simultaneously create index entries
int wordCount = 0;
int urlLength = url.toString().length();
int urlLength = url.toNormalform(true, true).length();
int urlComps = htmlFilterContentScraper.urlComps(url.toString()).length;
// iterate over all words of context text
@ -542,7 +542,6 @@ public final class plasmaWordIndex implements indexRI {
}
// The Cleaner class was provided as "UrldbCleaner" by Hydrox
// see http://www.yacy-forum.de/viewtopic.php?p=18093#18093
public synchronized Cleaner makeCleaner(plasmaCrawlLURL lurl, String startHash) {
return new Cleaner(lurl, startHash);
}

View File

@ -167,8 +167,6 @@ public final class serverCharBuffer extends Writer {
// do not use/implement the following method, a
// "overridden method is a bridge method"
// will occur
// see also: http://www.yacy-forum.de/viewtopic.php?p=26407#26407
// and http://www.yacy-forum.de/viewtopic.php?t=2833
// public serverCharBuffer append(char b) {
// write(b);
// return this;


@ -90,7 +90,7 @@ public class serverObjects extends Hashtable implements Cloneable {
* like put, but it replaces any HTML special chars.
*/
public Object putSafeXML(Object key, String value){
return put(key, htmlTools.replaceXMLEntities(value));
return put(key, htmlTools.encodeUnicode2html(value, true));
}
// new put takes also null values
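
The hunk above swaps one escaping helper for another inside `putSafeXML`. As a rough illustration of what such a helper has to do, here is a minimal sketch of XML special-character escaping. `XmlEscapeSketch` and `escape` are hypothetical names made up for this example; this is not the implementation of `htmlTools.encodeUnicode2html`, whose behavior (including the meaning of its boolean argument) is not visible in this diff.

```java
// Hypothetical helper, not part of YaCy: escapes the five XML special characters.
class XmlEscapeSketch {
    static String escape(String text) {
        if (text == null) return null;
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                // a single pass over the input means '&' is never escaped twice
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

Escaping character by character, rather than chaining `replace` calls, avoids the ordering problem where an already produced `&lt;` would be escaped again.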


@ -169,9 +169,9 @@ public class SearchService extends AbstractService
// Postprocess search ...
int count = Integer.valueOf(searchResult.get("type_results","0")).intValue();
for (int i=0; i < count; i++) {
searchResult.put("type_results_" + i + "_url",htmlTools.replaceXMLEntities(searchResult.get("type_results_" + i + "_url","")));
searchResult.put("type_results_" + i + "_description",htmlTools.replaceXMLEntities(searchResult.get("type_results_" + i + "_description","")));
searchResult.put("type_results_" + i + "_urlname",htmlTools.replaceXMLEntities(searchResult.get("type_results_" + i + "_urlname","")));
searchResult.put("type_results_" + i + "_url",htmlTools.encodeUnicode2html(searchResult.get("type_results_" + i + "_url",""), false));
searchResult.put("type_results_" + i + "_description",htmlTools.encodeUnicode2html(searchResult.get("type_results_" + i + "_description",""), true));
searchResult.put("type_results_" + i + "_urlname",htmlTools.encodeUnicode2html(searchResult.get("type_results_" + i + "_urlname",""), true));
}
// format the result


@ -749,12 +749,12 @@ public final class yacyClient {
yacyNetwork.enrichRequestPost(post, plasmaSwitchboard.getSwitchboard(), target.hash);
post.put("process", "crawl");
if (url.length == 1) {
post.put("url", crypt.simpleEncode(url[0].toString()));
post.put("referrer", crypt.simpleEncode((referrer[0] == null) ? "" : referrer[0].toString()));
post.put("url", crypt.simpleEncode(url[0].toNormalform(true, true)));
post.put("referrer", crypt.simpleEncode((referrer[0] == null) ? "" : referrer[0].toNormalform(true, true)));
} else {
for (int i=0; i< url.length; i++) {
post.put("url" + i, crypt.simpleEncode(url[i].toString()));
post.put("ref" + i, crypt.simpleEncode((referrer[i] == null) ? "" : referrer[i].toString()));
post.put("url" + i, crypt.simpleEncode(url[i].toNormalform(true, true)));
post.put("ref" + i, crypt.simpleEncode((referrer[i] == null) ? "" : referrer[i].toNormalform(true, true)));
}
}
post.put("depth", "0");


@ -144,16 +144,16 @@ public final class yacyVersion implements Comparator, Comparable {
public boolean equals(Object obj) {
yacyVersion v = (yacyVersion) obj;
return (this.svn == v.svn) && (this.url.toNormalform().equals(v.url.toNormalform()));
return (this.svn == v.svn) && (this.url.toNormalform(true, true).equals(v.url.toNormalform(true, true)));
}
public int hashCode() {
return this.url.toNormalform().hashCode();
return this.url.toNormalform(true, true).hashCode();
}
public String toAnchor() {
// generates an anchor string that can be used to embed in an html for direct download
return "<a href=" + this.url.toNormalform() + ">YaCy " + ((this.proRelease) ? "pro release" : "standard release") + " v" + this.releaseNr + ", SVN " + this.svn + "</a>";
return "<a href=" + this.url.toNormalform(true, true) + ">YaCy " + ((this.proRelease) ? "pro release" : "standard release") + " v" + this.releaseNr + ", SVN " + this.svn + "</a>";
}
// static methods:
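
The hunk above keeps `equals` and `hashCode` consistent by deriving both from the same normalized URL string. A minimal sketch of that idea follows, assuming `java.net.URI.normalize()` as a stand-in for YaCy's `toNormalform(true, true)`; the two are not equivalent, and `ReleaseRefSketch` is a hypothetical class made up for this example.

```java
import java.net.URI;

// Hypothetical class, not part of YaCy: identity is based on a normalized URL.
class ReleaseRefSketch {
    final int svn;
    final String normalUrl;

    ReleaseRefSketch(int svn, String url) {
        this.svn = svn;
        // normalize once, so equals and hashCode always see the same string
        this.normalUrl = URI.create(url).normalize().toString();
    }

    @Override
    public boolean equals(Object obj) {
        // type check instead of a blind cast, so foreign objects do not throw
        if (!(obj instanceof ReleaseRefSketch)) return false;
        ReleaseRefSketch other = (ReleaseRefSketch) obj;
        return svn == other.svn && normalUrl.equals(other.normalUrl);
    }

    @Override
    public int hashCode() {
        // equal objects have equal normalUrl, hence equal hash codes
        return normalUrl.hashCode();
    }
}
```

Whatever normalization is chosen, the point is that both methods must use the same one; otherwise two objects that compare equal could land in different hash buckets.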
@ -215,36 +215,54 @@ public final class yacyVersion implements Comparator, Comparable {
// check if we know that there is a release that is more recent than that which we are using
TreeSet[] releasess = yacyVersion.allReleases(true); // {0=promain, 1=prodev, 2=stdmain, 3=stddev}
boolean pro = new File(sb.getRootPath(), "libx").exists();
yacyVersion latestmain = (yacyVersion) releasess[(pro) ? 0 : 2].last();
yacyVersion latestdev = (yacyVersion) releasess[(pro) ? 1 : 3].last();
yacyVersion latestmain = (releasess[(pro) ? 0 : 2].size() == 0) ? null : (yacyVersion) releasess[(pro) ? 0 : 2].last();
yacyVersion latestdev = (releasess[(pro) ? 1 : 3].size() == 0) ? null : (yacyVersion) releasess[(pro) ? 1 : 3].last();
String concept = sb.getConfig("update.concept", "any");
String blacklist = sb.getConfig("update.blacklist", ".\\...[123]");
if ((manual) || (concept.equals("any"))) {
// return a dev-release or a main-release
if ((latestdev.compareTo(latestmain) > 0) && (!(Float.toString(latestdev.releaseNr).matches(blacklist)))) {
if (latestdev.compareTo(thisVersion()) > 0) return latestdev; else {
yacyCore.log.logInfo("rulebasedUpdateInfo: latest dev " + latestdev.name + " is not more recent than installed release " + thisVersion().name);
if ((latestdev != null) &&
((latestmain == null) || (latestdev.compareTo(latestmain) > 0)) &&
(!(Float.toString(latestdev.releaseNr).matches(blacklist)))) {
// consider a dev-release
if (latestdev.compareTo(thisVersion()) > 0) {
return latestdev;
} else {
yacyCore.log.logInfo(
"rulebasedUpdateInfo: latest dev " + latestdev.name +
" is not more recent than installed release " + thisVersion().name);
return null;
}
} else {
}
if (latestmain != null) {
// consider a main release
if ((Float.toString(latestmain.releaseNr).matches(blacklist))) {
yacyCore.log.logInfo("rulebasedUpdateInfo: latest dev " + latestdev.name + " matches with blacklist '" + blacklist + "'");
yacyCore.log.logInfo(
"rulebasedUpdateInfo: latest main " + latestmain.name +
" matches with blacklist '" + blacklist + "'");
return null;
}
if (latestmain.compareTo(thisVersion()) > 0) return latestmain; else {
yacyCore.log.logInfo("rulebasedUpdateInfo: latest main " + latestmain.name + " is not more recent than installed release (1) " + thisVersion().name);
yacyCore.log.logInfo(
"rulebasedUpdateInfo: latest main " + latestmain.name +
" is not more recent than installed release (1) " + thisVersion().name);
return null;
}
}
}
if (concept.equals("main")) {
if ((concept.equals("main")) && (latestmain != null)) {
// return a main-release
if ((Float.toString(latestmain.releaseNr).matches(blacklist))) {
yacyCore.log.logInfo("rulebasedUpdateInfo: latest main " + latestmain.name + " matches with blacklist'" + blacklist + "'");
yacyCore.log.logInfo(
"rulebasedUpdateInfo: latest main " + latestmain.name +
" matches with blacklist '" + blacklist + "'");
return null;
}
if (latestmain.compareTo(thisVersion()) > 0) return latestmain; else {
yacyCore.log.logInfo("rulebasedUpdateInfo: latest main " + latestmain.name + " is not more recent than installed release (2) " + thisVersion().name);
yacyCore.log.logInfo(
"rulebasedUpdateInfo: latest main " + latestmain.name +
" is not more recent than installed release (2) " + thisVersion().name);
return null;
}
}
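
The hunk above guards against empty release sets, since `TreeSet.last()` throws `NoSuchElementException` on an empty set. The selection order can be sketched as below, under simplifying assumptions: releases are plain `Integer` revision numbers, the blacklist check is left out, and `ReleasePickerSketch`, `latestOrNull` and `pick` are hypothetical names that do not exist in YaCy.

```java
import java.util.TreeSet;

// Hypothetical sketch, not part of YaCy: null-safe choice between dev and main.
class ReleasePickerSketch {
    // Returns the greatest element, or null when the set is empty.
    static <T> T latestOrNull(TreeSet<T> releases) {
        return releases.isEmpty() ? null : releases.last();
    }

    // Prefers the dev release when it exists and is newer than main (or main is
    // missing); otherwise falls back to main. Returns null if nothing is newer
    // than the installed revision.
    static Integer pick(TreeSet<Integer> main, TreeSet<Integer> dev, int installed) {
        Integer latestMain = latestOrNull(main);
        Integer latestDev = latestOrNull(dev);
        if (latestDev != null && (latestMain == null || latestDev.compareTo(latestMain) > 0)) {
            return latestDev > installed ? latestDev : null;
        }
        if (latestMain != null) {
            return latestMain > installed ? latestMain : null;
        }
        return null;
    }
}
```

As in the diff, a dev release that is newer than main but not newer than the installed version ends the search with `null` rather than falling through to main.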


@ -906,10 +906,10 @@ public final class yacy {
indexURLEntry.Components comp = entry.comp();
if ((entry != null) && (comp.url() != null)) {
if (html) {
bos.write(("<a href=\"" + comp.url().toNormalform() + "\">" + comp.title() + "</a><br>").getBytes("UTF-8"));
bos.write(("<a href=\"" + comp.url().toNormalform(false, true) + "\">" + comp.title() + "</a><br>").getBytes("UTF-8"));
bos.write(serverCore.crlf);
} else {
bos.write(comp.url().toNormalform().getBytes());
bos.write(comp.url().toNormalform(false, true).getBytes());
bos.write(serverCore.crlf);
}
}
@ -1037,9 +1037,8 @@ public final class yacy {
}
/**
* Searching for peers affected by Bug documented in <a href="http://www.yacy-forum.de/viewtopic.php?p=16056#16056">YaCy-Forum Posting 16056</a>
* Searching for peers affected by Bug
* @param homePath
* @see <a href="http://www.yacy-forum.de/viewtopic.php?p=16056#16056">YaCy-Forum Posting 16056</a>
*/
public static void testPeerDB(String homePath) {


@ -27,7 +27,7 @@ Echo **** (C) by Michael Peter Christen, usage granted under the GPL Version 2
Echo **** USE AT YOUR OWN RISK! Project home and releases: http://yacy.net/yacy ****
Echo ** LOG of YaCy: DATA/LOG/yacy00.log (and yacy^<xx^>.log) **
Echo ** STOP YaCy: execute stopYACY.bat and wait some seconds **
Echo ** GET HELP for YaCy: see www.yacy-websearch.net/wiki and www.yacy-forum.de **
Echo ** GET HELP for YaCy: see www.yacy-websearch.net/wiki and forum.yacy.de **
Echo *******************************************************************************
Echo ^>^> YaCy started as daemon process. Administration at http://localhost:%port% ^<^<


@ -5,7 +5,7 @@ echo "**** (C) by Michael Peter Christen, usage granted under the GPL Version 2
echo "**** USE AT YOUR OWN RISK! Project home and releases: http://yacy.net/yacy ****"
echo "** LOG of YaCy: DATA/LOG/yacy00.log (and yacy<xx>.log) **"
echo "** STOP YaCy: execute stopYACY.sh and wait some seconds **"
echo "** GET HELP for YaCy: see www.yacy-websearch.net/wiki and www.yacy-forum.de **"
echo "** GET HELP for YaCy: see www.yacy-websearch.net/wiki and forum.yacy.de **"
echo "*******************************************************************************"
echo " >> YaCy started as daemon process. Administration at http://localhost:8080 <<"
echo " You can close this window now, this will NOT shut down your YaCy peer."


@ -124,7 +124,7 @@ else
echo "**** USE AT YOUR OWN RISK! Project home and releases: http://yacy.net/yacy ****"
echo "** LOG of YaCy: DATA/LOG/yacy00.log (and yacy<xx>.log) **"
echo "** STOP YaCy: execute stopYACY.sh and wait some seconds **"
echo "** GET HELP for YaCy: see www.yacy-websearch.net/wiki and www.yacy-forum.de **"
echo "** GET HELP for YaCy: see www.yacy-websearch.net/wiki and forum.yacy.de **"
echo "*******************************************************************************"
echo " >> YaCy started as daemon process. Administration at http://localhost:8080 << "
eval $cmdline


@ -5,7 +5,6 @@ public class ParseVersion extends TestCase {
/**
* Test method for 'yacy.combinedVersionString2PrettyString(String)'
* @author Bost
* @link <a href="http://www.yacy-forum.de/viewtopic.php?t=2717">yacy-forum.de: an improvement to combinedVersionString2PrettyString(...)</a>
*/
public void testCombinedVersionString2PrettyString() {
assertEquals("dev/00000", yacy.combined2prettyVersion("")); // not a number