yacy_search_server/doc/News.html
orbiter 578c2ef130 release 0.52
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3715 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-11 22:12:29 +00:00


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<title>YaCy: News</title>
<meta http-equiv="content-type" content="text/html;charset=iso-8859-1">
<!-- <meta name="Content-Language" content="German, Deutsch, de, at, ch"> -->
<meta name="Content-Language" content="English, Englisch">
<meta name="keywords" content="YaCy HTTP Proxy search engine spider indexer java network open free download Mac Windows Software development">
<meta name="description" content="YaCy P2P Web Search: News and Change-Log">
<meta name="copyright" content="Michael Christen">
<script src="navigation.js" type="text/javascript"></script>
<link rel="stylesheet" media="all" href="style.css">
<link rel="alternate" type="application/rss+xml" title="RSS" href="News.rss" />
<!-- Realisation: Michael Christen; Contact: mc<at>anomic.de-->
</head>
<body bgcolor="#fefefe" marginheight="0" marginwidth="0" leftmargin="0" topmargin="0">
<SCRIPT LANGUAGE="JavaScript1.1"><!--
globalheader();
//--></SCRIPT>
<NOSCRIPT>
<table border="0" cellspacing="0" cellpadding="0" width="100%">
<tr><td></td></tr>
<tr><td height="1" bgcolor="#000000"></td></tr>
<tr><td>
<!-- start headline -->
<table bgcolor="#4070A0" border="0" cellspacing="0" cellpadding="0" width="100%">
<tr><td width="180" height="80" rowspan="3"><a href="http://www.yacy.net"><img border="0" src="grafics/yacy.gif" align="top"></a></td>
<td></td><td width="120"></td></tr>
</table>
<!-- end headline -->
</td></tr>
<tr><td height="2"></td></tr>
<tr><td>
<table border="0" cellspacing="0" cellpadding="0" width="100%">
<tr>
<td width="100" valign="top">
<!-- start lmenue -->
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr><td height="2"></td></tr>
<tr><td height="20" class="white" bgcolor="#BDCDD4" valign="middle">&nbsp;<a href="index.html" class="dark">Main Index</a></td></tr>
</table>
<!-- end lmenue -->
</td>
<td width="10" valign="top"></td>
<td valign="top">
<table border="0" cellspacing="0" cellpadding="0" width="100%">
<tr><td height="2"></td></tr>
<tr><td><br>
</NOSCRIPT>
<!-- ----- HERE STARTS CONTENT PART ----- -->
<h2>News</h2>
<p>This is essentially the release change-log. We have a <a href="http://www.yacy-websuche.de/wiki/index.php/Dev:Roadmap">release roadmap</a> and releases published here will (hopefully) match the milestones from the roadmap's vision.
<p>Release list in reverse order:
<!--
<br><p>
<ul>
<li></li>
<ul>
<li></li>
<li></li>
<li></li>
<li></li>
<li></li>
</ul>
<li></li>
<ul>
<li></li>
<li></li>
<li></li>
<li></li>
<li></li>
</ul>
<li></li>
<ul>
<li></li>
<li></li>
<li></li>
<li></li>
<li></li>
</ul>
</ul>
-->
<br><p><a name="3715">v0.52_20070512_3715</a>
<ul>
<li>New Functions</li>
<ul>
<li>Added exclusion-search (a search with '-' to exclude specific words from the search results)</li>
<li>Added extraction of sitemap-url from robots.txt, which can be used for crawl starts</li>
<li>Added a network configuration menu for new cluster configuration functions: a set of peers may now operate as an island within the YaCy network, without exchanging index data across the island's border. Peers within the cluster can trigger internal remote crawls and search only within their own cluster.</li>
<li>Added a postscript parser</li>
</ul>
<li>Interface Enhancements</li>
<ul>
<li>Redesigned the status page; it now also shows hints and warnings</li>
<li>Better layout for image search results</li>
<li>The peer profile can now be displayed as vcard, e.g. http://localhost:8080/ViewProfile.vcf?hash=localhash</li>
</ul>
<li>Performance Enhancements</li>
<ul>
<li>Added an option to configure a path to a secondary index location.
This can be used to store a fragment of the index on another physical device,
to split IO load and enhance access speed. The index is split in such a way
that the LURLs are stored in the secondary location, and the RWIs in the primary
location.</li>
<li>Optimized memory allocation when accessing the web-index (memory throughput is now half of what it was before)</li>
<li>Fixed bugs in database engine that corrupted the data when entries had been removed</li>
</ul>
</ul>
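<p>The sitemap extraction mentioned in the 0.52 list can be illustrated with a minimal sketch. It is written in Python for brevity and is not YaCy's actual (Java) implementation; the function name <tt>extract_sitemap_urls</tt> is hypothetical, and the sketch assumes that sitemaps are announced with <tt>Sitemap:</tt> lines in robots.txt.

```python
from typing import List


def extract_sitemap_urls(robots_txt: str) -> List[str]:
    """Return the URLs named by 'Sitemap:' lines in a robots.txt body.

    Hypothetical helper for illustration only, not YaCy's parser.
    """
    urls = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        # Split on the first colon only, so the "://" of the URL survives.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            value = value.strip()
            if value:
                urls.append(value)
    return urls
```

<p>Each returned URL could then be offered as a crawl start point. The field name is matched case-insensitively here, which is a choice of this sketch.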
<br><p><a name="3501">v0.51_20070321_3501</a>
<ul>
<li>Better Crawling</li>
<ul>
<li>Higher crawling speed is possible thanks to better RAM cache flush methods</li>
<li>The crawl balancer now has a security function which prevents remote web servers from being accessed more than twice per second. When a single domain is crawled, this restricts the crawling speed to at most 120 pages per minute</li>
<li>The crawl balancer chooses better urls. Newly added urls are now prevented from being hidden by masses of links generated by the crawler. The effect is that in most cases the security function described above is not needed.</li>
<li>Added a crawling speed button on the crawling monitor page.</li>
<li>Crawl targets get informed about the yacy bot; a link to http://yacy.net/yacy/bot.html is attached to each crawl request; the page explains YaCy and that YaCy respects robots.txt</li>
</ul>
<li>Better Monitoring</li>
<ul>
<li>New search result page SearchStatistics_p.html shows local and remote search requests; remote requests are anonymized</li>
<li>Added network-wide QPM (queries per minute) computation to show how much the network is used for web search. The statistics are not reported from searching peers, but from searched peers; therefore the accumulation preserves privacy of the searcher</li>
<li>New page LogStatistics_p.html which shows an evaluation of entries from the log.</li>
<li>New page BlacklistCleaner_p.html to clean up wrong blacklist entries. The page allows categorization of blacklist error cases, correction of errors and optional deletion of the blacklist entry.</li>
<li>Added RSS feed for YaCyNews</li>
</ul>
<li>Enhanced User Interface</li>
<ul>
<li>Added a robots.txt configuration menu to enable/disable external crawlers to access the yacy user interface</li>
<li>New wiki-parser</li>
<li>Blog entries may now have user-comments</li>
<li>The network list page now provides links to the users' blog pages</li>
<li>The menu items have been rearranged</li>
</ul>
<li>Less Memory Usage and Better Memory Management</li>
<ul>
<li>All caches (node cache, object cache) now have enhanced self-organization and don't need fixed-size assignments</li>
<li>Memory protection by disallowing collection arrays beyond kca-7. Collections larger than those are written to 'common' files.</li>
<li>The network picture uses less memory</li>
</ul>
<li>Bugfixes: a very large number of bugfixes were made.</li>
</ul>
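<p>The access limit of the 0.51 crawl balancer (at most two accesses per second to one server, hence at most 120 pages per minute from a single domain) can be sketched as a per-host minimum delay. This is an illustrative Python sketch, not YaCy's Java code; the class <tt>HostThrottle</tt> and all other names are hypothetical.

```python
from typing import Dict

# Two accesses per second at most means 0.5 s between accesses to one host.
MIN_DELAY_SECONDS = 0.5


class HostThrottle:
    """Tracks the earliest time each host may be accessed again."""

    def __init__(self, min_delay: float = MIN_DELAY_SECONDS) -> None:
        self.min_delay = min_delay
        self._next_allowed: Dict[str, float] = {}

    def wait_time(self, host: str, now: float) -> float:
        """Seconds the caller should still wait before accessing host."""
        return max(0.0, self._next_allowed.get(host, now) - now)

    def record_access(self, host: str, now: float) -> None:
        """Note an access at time now and block the host for min_delay."""
        self._next_allowed[host] = now + self.min_delay


def max_pages_per_minute(min_delay: float = MIN_DELAY_SECONDS) -> int:
    """Upper bound on pages per minute fetched from a single host."""
    return int(60 / min_delay)
```

<p>A crawler loop would call <tt>wait_time</tt> before each fetch and <tt>record_access</tt> after it. Choosing urls from many different hosts keeps the wait time at zero in most cases, which matches the remark that the limit is then rarely needed.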
<br><p><a name="3124">v0.50_20061222_3124</a>
<ul>
<li>Added Media Search</li>
<ul>
<li>Added search pages for Images, Audio, Video and Application search.</li>
<li>Added media link presentation during snippet fetch; the Image Search presents search results as image thumbnails.</li>
<li>Better recognition of search hits for text snippet generation.</li>
<li>Media search results are indexed again after remote search results are collected; only media links are used to update the index.</li>
</ul>
<li>Better Result Ranking</li>
<ul>
<li>New ranking parameters and appearance attributes are now considered.</li>
<li>Faster ranking; more references can be ranked and sorted within given search time.</li>
<li>Ranking Parameters can be handed over to remote peers and are applied there.</li>
<li>Adopted Detailed Search to new ranking parameters.</li>
<li>Coefficients from detailed search can be set as default ranking for search page; this replaces the old ranking alternatives.</li>
</ul>
<li>Better Crawl Monitoring</li>
<ul>
<li>After a crawl start was initialized, the Crawler Monitor is shown.</li>
<li>The Crawl Monitor now shows all queue elements in one table.</li>
<li>Monitoring of index size.</li>
<li>The Crawl Profiles are shown; crawls can be interrupted within the profile table.</li>
<li>A crawl may now distinguish between text indexing and media link indexing.</li>
</ul>
<li>Migration to new Database Structure</li>
<ul>
<li>The new Collection Database is now the only database structure that can be used; Assortments are switched off.</li>
<li>Added functions to migrate Assortment databases and WORDS databases to Collection database.</li>
<li>Removed all methods to write Assortment data structures.</li>
<li>Migrated DHT position computation to base64-decoded values; this changes the DHT structure slightly and closes the gaps in the old DHT structure.</li>
</ul>
</ul>
<br><p><a name="3040">v0.49_20061202_3040</a>
<ul>
<li>Enhanced search service</li>
<ul>
<li>Web searches are faster because of the new data structures implemented in this version (see below)
and because bugs have been found and fixed.</li>
<li>Searches can be re-done with changed search properties. Please use the 'more options' link at the search page.</li>
<li>Added search constraints. These are search restrictions to web searches which are applied to information that is scraped
from the web pages during page parsing. The first application of search constraints is a search restricted to
index pages ('index of'). Please use the flag at the extended search functions.</li>
<li>Enabled index-abstracts search; this should solve the distributed-combined search challenge (still being tested).</li>
</ul>
<li>New Database Structures for Index and URL storage</li>
<ul>
<li>The new 'Collections' Data Structure is now the default data structure.</li>
<li>Index entries and URL entries carry more ranking and selection attributes, e.g. for image, video, audio and application search.</li>
<li>Enhanced Storage of URLs: they are now divided into different creation times. This enables easy deletion
of outdated URLs, enables an index-limitation function and solves the problem that the URL database was too
big to fit into a 2 GB file.</li>
<li>Search requests can now be answered in less time.</li>
<li>The index organization needs less IO.</li>
<li>Index transfers will now only be done to latest peers supporting the collection data structure.</li>
<li>Index transfers from old peers to new peers are translated automatically to new data format.</li>
<li>Assortments are no longer supported.</li>
</ul>
<li>Enhanced SOAP support</li>
<ul>
<li>Added protocol for peer administration, custom services, status queries, blacklist management,
file share management, support for outgoing transfer- and content-encoding, better error handling,
function to get and set message forwarding, handling of YaCy bookmarks, log display,
manage peer messages, get and set peer profile, query peer status, query the pause/resume state of the crawling queues,
and a check if a specific URL is blacklisted.</li>
<li>Added new ANT target to allow generation of client stub classes for YaCy SOAP api.</li>
</ul>
<li>Other new Features</li>
<ul>
<li>Added DNS-cache-miss caching.</li>
<li>Added Flash (experimental), MS Excel and Powerpoint parser.</li>
<li>New mint-green and dark skin.</li>
<li>Better non-7bit ascii character support.</li>
<li>Added ant support for rpms.</li>
<li>Added ant target for windows installer.</li>
<li>Added template to display file share in xml format.</li>
<li>Better object caching for kelondro database (combined read/write object cache with synergy effects).</li>
<li>More anonymization in logging.</li>
<li>New HTCACHE layout using file hashes; tree- and hash-layout can be used simultaneously; hash-layout is now default.</li>
<li>Access to wiki is now limited to administrator, if wanted. This can be configured at the wiki page.</li>
<li>... and many bugfixes.</li>
</ul>
<li>New 'satellite' Projects: these applications work as service applications for the YaCy application (start-up/experimental status)</li>
<ul>
<li>YaCy admin: a swing-based client, that is able to administrate yacy using the SOAP interface.</li>
<li>YaCy Screen Saver: presentation of the peer status in a screen saver</li>
<li>YaCy Updater: automated downloads/updates</li>
<li>YaCy logalizer: analyzer for the YaCy log</li>
</ul>
</ul>
<br><p><a name="2743">v0.48_20061010_2743</a>
<ul>
<li>New Features</li>
<ul>
<li>Optional web cache organisation using url-hashes</li>
<li>Optional filter in online log and different colors in log messages</li>
<li>Indexing of files that are loaded for snippet-generation</li>
<li>New delete button for crawl profiles</li>
<li>New -t(aillog) option, to start monitoring the log after startup which can be stopped with ctrl+c, without stopping yacy</li>
<li>Added protocol for YaCy Bookmarks export/input: posts/get from the del.icio.us api</li>
</ul>
<li>Enhanced Features</li>
<ul>
<li>Better snippet - generation</li>
<li>Faster web search</li>
<li>Surftipps appear on their own web page</li>
</ul>
<li>Bugfixes</li>
<ul>
<li>Remote search did not work because of a bug in abstract generation</li>
<li>Better handling of large files and files with exotic charsets during parsing</li>
<li>Better UTF-8 handling</li>
<li>Files fetched for snippets can now be monitored in Cache Monitor</li>
<li>Better handling of objects that can be held in memory instead of writing them to file caches</li>
</ul>
</ul>
<br><p><a name="2665">v0.47_20060927_2665</a>
<ul>
<li>New Features</li>
<ul>
<li>Replaced the complete HTML (table-based) interface with new XHTML interface pages</li>
<li>Added Crawler/Indexing Monitor</li>
<li>Added Surftipps and voting for surftipps, public bookmarks and other public links</li>
</ul>
<li>Technical Enhancements</li>
<ul>
<li>Added IndexAbstracts Interface for join-Search (not yet working)</li>
<li>Added Crawler for FTP</li>
<li>Added fully controlled DHT cache</li>
<li>Added write cache for LURLs</li>
<li>Enabled new database structures; they are switched off until testing is finished</li>
<li>Refactoring of parser and charset recognition</li>
<li>Updated versions of external parser</li>
<li>Enhanced YaCy SOAP services</li>
</ul>
<li>Bugfixes</li>
<ul>
<li>Fixed wrong ranking calculation</li>
<li>Fixed version number presentation</li>
<li>UTF-8 Bugfixes, better overall handling of non-ascii characters</li>
<li>Many minor bugfixes</li>
</ul>
</ul>
<br><p><a name="2442">v0.46_20060823_2442</a>
<ul>
<li>Web Interface Enhancements</li>
<ul>
<li>added localisation: Italian language file</li>
<li>added localisation: Slovak language file</li>
<li>enhancements to YaCyBlog; added news generation in case of new blog entries; added xml export</li>
<li>enhancements to YaCyWiki</li>
<li>added interface for customised blacklist classes</li>
<li>enhancements for dir.html application: dirlisting for all empty directories, new place in htroot/htdocsdefault</li>
<li>Interface YPStats_p.html for http://ypstats.yacy-forum.de/index.php to collect statistics</li>
</ul>
<li>Enhanced Stability</li>
<ul>
<li>enhanced p2p connection stability; no disconnections from peers that are simply busy</li>
<li>better synchronisation and protection against deadlocks</li>
<li>better memory management; overall protection against OutOfMemoryError occurrences</li>
</ul>
<li>Enhanced Security</li>
<ul>
<li>protection against too long authentication strings</li>
<li>applied blacklists to URLs that are received by DHT</li>
<li>added additional user authentication login methods (form-login, html-login) to httpd</li>
<li>added native SSL support to http; implemented PKCS12 certificate support for https</li>
<li>more restrictions on index receive to protect against index overload from other peers; overloaded peers send a
timeout message which the other peer respects before starting new index transmissions</li>
</ul>
<li>Database and Indexing Enhancements</li>
<ul>
<li>added new caching structures that use less RAM (row-collections)</li>
<li>added object hit cache and a key-miss cache to kelondroTree databases</li>
<li>better queueing and synchronisation of threads to omit database locks</li>
<li>bulk read method for database iterators; added pre-load of node cache to kelondroTree databases</li>
<li>removed high/med/low priority management from node cache management because of too much computing overhead</li>
<li>refactoring of indexing classes to enable new database structures (to come with release 0.47)</li>
<li>better abstraction of database entry values</li>
<li>replaced java.net.URL by own class de.anomic.net.URL. The original class made DNS lookups, which slowed down processing.</li>
</ul>
<li>New Supported Standards</li>
<ul>
<li>added OpenSearch compatibility</li>
<li>added support for UPnP; added option for router configuration in web interface</li>
<li>added virtual host support; YaCy sends &lt;peer-hexhash&gt;.yacyh inside Host tag</li>
</ul>
</ul>
<br><p><a name="2046">v0.45_20060501_2046</a>
<ul>
<li>Enhanced Search Functions</li>
<ul>
<li>Re-designed search page: it now shows a simplified page without the left navigation column. The column is only shown if the user is logged in.</li>
<li>Added Picture-option to search results: this shows only pictures that are on the found page in thumbnail size.</li>
<li>Added 'prefer' option to search: this will move search results with matching headlines to the top of the search results.</li>
</ul>
<li>Enhanced Crawler Options</li>
<ul>
<li>Introduced a re-crawl option for crawl starts of already indexed web pages.</li>
<li>Introduced an auto-dom-filter for special steering of the crawler in case the crawl start is a link-list and crawls should stay within the domains of the link-list only.</li>
<li>Introduced a link-per-domain limitation for the crawler.</li>
</ul>
<li>Other User Interface Enhancements</li>
<ul>
<li>Re-designed (re-ordered) left navigation bar.</li>
<li>Added blog to Communication/Publication section of navigation bar.</li>
<li>New index cleaner job and interface to delete blacklisted indexes from the database.</li>
<li>The Profile-Viewer may now also show the local profile. A link to a profile-presentation of the local peer is included on the new simplified search page.</li>
</ul>
<li>Performance Enhancements</li>
<ul>
<li>Introduced a second RAM cache for indexes that are received from DHT transmission and removed second cache flush limit from normal index cache.</li>
<li>Added a period of rest to the cache flush process: this prevents indexes from being flushed to disk if they could be candidates for DHT transmission.</li>
</ul>
<li>Bugfixes</li>
<ul>
<li>Simplified installation instructions (removed proxy config).</li>
<li>Enhanced html parser to support image tags.</li>
<li>Added checks to protect peers from wrong seeds.</li>
<li>Re-design of kelondroTree-iterator to prevent concurrent modification exceptions.</li>
<li>Blacklists are now also applied to urls during DHT receives.</li>
<li>Fixed bugs with index cache flushes.</li>
<li>Fixed behavior of remote crawl trigger that caused long pauses in case of local index queue overflow</li>
<li>Better robots.txt support.</li>
<li>Many minor bugfixes concerning file type handling, url normal form computation, crawl profile storage, network peer counting, parser errors, wiki code, htcache storage etc.</li>
</ul>
</ul>
<br><p><a name="1844">v0.44_20060307_1844</a>
<ul>
<li>New Features</li>
<ul>
<li>New simplified search page</li>
<li>ajax-driven search result enrichment with snippet post-fetch. Snippets that are not available on page load time will be fetched using ajax requests.</li>
<li>New 1-2-3(-4) configuration menu makes it easier to configure YaCy for first-time-users</li>
<li>New yacy.badwords filters bad topwords</li>
<li>Show public Bookmarks in Bookmarks.html, private ones, if the user is logged in.</li>
<li>Added a yacybar.xpi to the release</li>
</ul>
<li>Bugfixes</li>
<ul>
<li>fixed conjunctive search; was broken because of wrong data structures</li>
<li>special chars (like german umlauts) are now allowed in bookmark tagNames</li>
<li>/xml/bookmarks/* now uses one file for private/public entries. private only with password</li>
<li>disabled write cache to avoid database corruption in case of crash</li>
<li>bugfixes to HTTP/0.9 header handling</li>
<li>fixed re-search bug</li>
</ul>
<li>Enhancements</li>
<ul>
<li>index write access (dht transmission, indexing, dht deletion) is now completely synchronized, which increases speed and reduces IO</li>
<li>there is now real streaming support for large files</li>
<li>support of chunked transfer-encoding for http/1.1 clients</li>
<li>support of gzip content-encoding for suitable clients</li>
<li>automatic TOC generation for pages in wiki</li>
<li>changed user-agent string for yacy crawl access to 'yacybot'</li>
<li>added default-skins</li>
</ul>
</ul>
<br><p><a name="1593">v0.43_20060210_1593</a>
<ul>
<li>New Features</li>
<ul>
<li>Better result ranking due to many new ranking attributes</li>
<li>nearby-search in general and nearby-1 for queries enclosed in doublequotes</li>
<li>new DetailedSearch page for ranking testing</li>
<li>new Bookmark manager; search results can easily be added to bookmark collection</li>
<li>UTF-8 encoded characters can now be used in built-in wiki and messages</li>
<li>new import for external crawling queues</li>
<li>additional shutdown method to run YaCy as Windows Service</li>
</ul>
<li>Feature Enhancements</li>
<ul>
<li>more templates in yacy wiki</li>
<li>the yacy httpd is now able to set cookies using custom http headers</li>
<li>beautification of many pages</li>
<li>added majority voting for peer type decision, reduced the number of peer pings sent out</li>
<li>new database handling of index entries, less io overhead</li>
<li>re-organization of HTCACHE files, better file structure</li>
<li>many architecture changes to enhance database speed, stability and capacity to hold new ranking parameters</li>
<li>backup-option for lost assortment files and import interface for these files</li>
<li>enhancements for index distribution (better selection, less blocking, bugfixes)</li>
<li>security bugfixes: UserDB Passwordcheck, YBR transmission protocol path selection</li>
<li>enhanced German translations</li>
</ul>
<li>Important Bugfixes</li>
<ul>
<li>database bugfixes (iteration, peer-listing)</li>
<li>several thread-lockings solved</li>
<li>enhanced html cleaner for better security in wiki and messages</li>
<li>Memory settings now also work on Windows</li>
<li>fixed some file modes that were set wrong</li>
<li>minor bug-fix in Cache for some rare URLs</li>
<li>Translations work now with readonly htroot</li>
</ul>
</ul>
<br><p><a name="1219">v0.42_20051216_1219</a>
<ul>
<li>New Features</li>
<ul>
<li>Introduction of Block Rank; Generation, transmission and collection of block rank statistics; computation of block rank tables for new search ranking YBR (YaCy Block Rank)</li>
<li>New network picture on network page</li>
<li>New Connection tracking page</li>
<li>New ICAP support and SQUID-like redirectors</li>
<li>Many small changes</li>
</ul>
<li>Improvements</li>
<ul>
<li>New search behavior with better search results and less search time.</li>
<li>Index transfer with less computation overhead.</li>
<li>Better robots.txt parser.</li>
<li>New write cache for less IO resulting in better database speed.</li>
<li>Asynchronous queueing of crawl jobs and better crawl+indexing performance</li>
<li>More document parsers. The following document formats are supported:</li>
<ul>
<li>Acrobat Portable Document</li>
<li>Word Document</li>
<li>MimeType</li>
<li>Rich Site Summary/Atom Feed</li>
<li>OASIS OpenDocument V2 Text Document</li>
<li>Bzip 2 UNIX Compressed File</li>
<li>GNU Zip Compressed Archive</li>
<li>Rich Text Format</li>
<li>Tape Archive File</li>
<li>rpm Parser</li>
<li>Compressed Archive File</li>
<li>vCard</li>
</ul>
</ul>
<li>very large number of bugfixes</li>
</ul>
<br><p><a name="848">v0.41_20051004_848</a>
<ul>
<li>Prevention of unwanted DDoS effects caused by YaCy crawls by doing a target-server load balancing; further prevention is done by robots.txt</li>
<li>New setting of proxy cache size and storage path in submenu 'Proxy Indexing'</li>
<li>Modified cleanup of HTCache</li>
<li>Sorted config list in Config_p.html and sorted file list in Cache Admin menu</li>
<li>Code clean-up: added finals and constants</li>
<li>New Network menu design; showing peer status, Index Receive and Crawl Receive properties as images</li>
<li>Added ICAP support; now an experimental ICAP Server is embedded and YaCy allows other proxies to use the indexing service via icap response modification requests</li>
<li>Single-Peer permanent index transfer (full flush to other peer)</li>
<li>Added a templateCache to httpFileHandler</li>
<li>Added blacklist support for https requests and crawler</li>
<li>Adding functionality to delete entries from Indexing and Crawler Queue</li>
<li>Adding functionality to clear the whole indexing queue</li>
<li>Proxy now supports the X-Forwarded-For Header</li>
<li>Indexing queue now displays total size of enqueued content in kb</li>
<li>Remembering Crawler-isPaused setting by storing status into config file</li>
<li>Splitting of status page into a private and a public accessible part</li>
<li>Adding Queue overview to status page</li>
<li>New symbols for Peer Status, connection status, Index-Receive-Granted and Crawl-Receive-Granted in Network menu, replacing old separate columns</li>
<li>Support for robots.txt</li>
<ul>
<li>Implementation of a robots.txt parser</li>
<li>Control of remote crawls with robots.txt</li>
</ul>
<li>Performance enhancements</li>
<ul>
<li>Better Database Caching</li>
<ul>
<li>Better usage of memory in kelondro Record-Nodes and less IO access</li>
<li>New cache-control menu within the performance menu</li>
<li>Better cache-size default values</li>
</ul>
<li>Accelerated Blacklists import; makes big lists possible</li>
<li>Content-Encoding GZIP support for http post requests on index transfer/distribution</li>
</ul>
<li>Bugfixes</li>
<ul>
<li>Normalization problems: prevent URLs with ':80'</li>
<li>Many bug fixes for NULL pointer occurrences</li>
<li>Display of a proxy error page instead of a white page if the server has closed the connection before yacy was able to receive the http response line</li>
<li>Crawler Redirection bug fixed</li>
<li>Indexer now gets the mimeType from the parsed document instead of the responseHeader (this is especially necessary if mimeType has to be detected by the MimeType parser)</li>
<li>URLs pointing to a server having a private ip address will not be indexed anymore</li>
<li>Unsupported MimeTypes and fileExtensions will not be queued by the cachemanager in the indexer queue anymore (to reduce unnecessary IO)</li>
</ul>
<li>... many more small changes and bugfixes. For details please see the <a href="http://svn.berlios.de/wsvn/yacy/?op=log&rev=0&sc=0&isdir=1">SVN history</a></li>
</ul>
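<p>Two of the 0.41 bugfixes, the removal of ':80' during URL normalization and the exclusion of servers with private ip addresses, can be sketched as follows. This is a simplified Python illustration and not YaCy's Java code; the function names are hypothetical, the treatment of port 443 for https is an assumption added here, and user-info and IPv6 literals are deliberately ignored.

```python
import ipaddress
from urllib.parse import urlsplit, urlunsplit

# Port 80 for http is from the change-log; 443 for https is an assumption.
DEFAULT_PORTS = {"http": 80, "https": 443}


def strip_default_port(url: str) -> str:
    """Return url without an explicit port that equals the scheme default."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # note: hostname is lower-cased by urlsplit
    if parts.port is None or parts.port == DEFAULT_PORTS.get(parts.scheme):
        netloc = host
    else:
        netloc = f"{host}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


def is_private_host(host: str) -> bool:
    """True if host is an IP literal in a private or loopback range."""
    try:
        return ipaddress.ip_address(host).is_private
    except ValueError:
        # Not an IP literal; a real crawler would resolve the name first.
        return False
```

<p>With such a normal form, 'http://example.org:80/' and 'http://example.org/' map to the same url and are stored only once.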
<br><p><a name="548">v0.40_20050816_548</a>
<ul>
<li>Index distribution to DHT now fully active</li>
<ul>
<li>All peers, principal, senior and junior peers distribute their index to other principal or senior peers. Juniors send their index to only one principal/senior peer, while principal/senior peers distribute to 3 redundant other peers.</li>
<li>The index distribution and the index receive flag on Index-Control must be set to enable global search on the own peer</li>
<li>For a global search, only relevant peers according to DHT rules are selected</li>
</ul>
<li>New YaCyNews feature</li>
<ul>
<li>A new menu 'News' shows the news processing queues for incoming and outgoing news. However, news is not meant to be monitored here; instead it influences the presentation of other information throughout the system:</li>
<li>The Index-Create menu now shows a list of previously started crawls of other peers. They are distinguished between Crawls in progress and finished crawls.</li>
<li>When a Crawl in the Index-Create menu is started with the 'RemoteIndexing' flag set, a YaCyNews is automatically generated to inform other peers about that crawl start. A message can be attached to explain why this crawl was started.</li>
<li>When a personal profile is changed, a News Message is generated.</li>
<li>When a Wiki entry is changed, a Message is generated.</li>
<li>Within the Network menu, Alerts for Profile Updates, Wiki Updates and Crawls in Progress may appear.</li>
</ul>
<li>The YaCy wiki has been enhanced and was moved out from the 'Lab' to the main menu. The Wiki-System now supports embedded images and can show a preview.</li>
<li>New logging policy: logs are now exclusively written to the rotating logs in &lt;app-home&gt;/log/</li>
<li>Search was enhanced using the intermission feature (all other processes are paused while search is in progress)</li>
<li>Enhancements to Translation/Localization</li>
<li>Proxy-Caution-Delay (forced idle time of the crawler after a proxy access) is now configurable in the Performance menu</li>
<li>Many bugfixes for time-out problems, database crashes, DHT management, version numbers</li>
</ul>
<br><p><a name="424">v0.39_20050722_424</a>
<ul>
<li>New Features:</li>
<ul>
<li>Added snippets to search results. Snippets are fetched by searching peer from original web sites and are also transported during result transmission from remote search results.</li>
<li>Proxy now shows an error page in case of errors.</li>
<li>Preparation for localization: started (not finished) German translation</li>
<li>Status page now shows memory amount, transfer volume and indexing speed as PPM (pages per minute). A global PPM (sum over all peers) is also computed.</li>
<li>Re-Structuring of Index-Creation Menu: added more submenus and queue monitors</li>
<li>Added feature to start crawling on bookmark files</li>
<li>Added blocking of blacklisted URLs in indexReceive (remote DHT index transmissions)</li>
<li>Added port forwarding for remote peer connections (the peer may now be connected to a configurable address)</li>
<li>Added bbCode for Profiles</li>
<li>Memory Management in Performance Menu: a memory-limit can be set as condition for queue execution.</li>
<li>Added option to do performance-limited remote crawls (use this instead of switching off remote indexing if you are worried about too much performance loss on your machine)</li>
<li>Enhanced logging, configuration with yacy.logging</li>
</ul>
<li>Performance: enhanced indexing speed</li>
<ul>
<li>Implemented indexing/loading multithreading</li>
<li>Enhanced caching in database (less memory occupation)</li>
<li>Replaced RAM-queue after indexing by a file-based queue (makes long queues possible)</li>
<li>Changed assortment cache-flush procedure: words may now appear in any assortment, not only one assortment. This prevents assortment-flushes, increases the capacity and prevents creation of files in DATA/PLASMADB/WORDS, which further speeds up indexing.</li>
<li>Speed-up of start-up and shut-down by replacement of stack by array. The dumped index also takes less space on disk now. Because dumping is faster, the cache may be bigger, which also increases indexing speed.</li>
</ul>
<li>Bugfixes:</li>
<ul>
<li>Better shut-down behavior, time-out on sockets, fewer exceptions</li>
<li>Fixed gzip decoding and content-length in http-client</li>
<li>Better httpd header validation</li>
<li>Fixed possible memory leaks</li>
<li>Fixed 100% CPU bug (caused by repeated GC when memory was low)</li>
<li>Fixed UTF8-decoding for parser</li>
</ul>
</ul>
<br><p><a name="208">v0.38_20050603_208</a>
<ul>
<li>Enhanced Crawling:
<ul>
<li>There are now 3 different crawl threads: local crawling, global crawl trigger and remote-triggered crawl jobs.</li>
<li>The thread pools can now be configured through the Performance-Menu and a customized number of crawling threads is possible.</li>
<li>Crawling can be paused and resumed.</li>
<li>Changed method of index caching; this speeds up crawling and provides a more economic data structure.</li>
</ul>
</li>
<li>Enhanced Proxy: added transparent proxy support. It is now possible to route http traffic through yacy without setting a proxy configuration in browsers. Example: set your iptables configuration with <br><tt>iptables -t nat -A PREROUTING -p tcp -s 192.168.0.0/16 --dport 80 -j DNAT --to 192.168.0.1:8080</tt></li>
<li>Extended seed-upload methods for principal peers: more configuration options, better extensibility. Added support for scp.</li>
<li>More external parsers. YaCy now supports tar, zip, gzip, bzip, rss, rtf, pdf, doc. To use these parsers, an additional libx-library must be installed, which is distributed separately from the YaCy core distribution.</li>
<li>Enhanced Shutdown procedure: many unnecessary threads have been removed and a shutdown hook has been added. Missing file closings have been added. The new index caching method flushes the cache faster.</li>
<li>Added support for localization: it is now possible to extend YaCy with localization data; added languages can be accessed with the new Language-Menu</li>
</ul>
<br><p>v0.37_build20050502
<ul>
<li>YaCy's source code is now hosted in a Subversion/svn version control system on developer.berlios.de: <a href="http://developer.berlios.de/projects/yacy/">yacy@berlios.de</a></li>
<li>overall speed enhancements:</li>
<ul>
<li>new Thread-Pools and performance enhancements from Martin Thelian: much faster http-server and more responsive web interface</li>
<li>fixed bug in database caching that prevented caching altogether; the database is now much faster. This also sped up proxy mode (which must read the http-header from the database)</li>
<li>modified thread control for non-blocking dequeueing</li>
<li>increased cache memory settings</li>
</ul>
<li>added a concept for external parsers; pdf and doc parsers are integrated but not active yet.</li>
<li>fixed several bugs that caused thread-locks and 100% CPU load</li>
<li>fixed bug with cookie storage; changed handling of multiple cookies</li>
<li>fixed brute-force password attack denial</li>
<li>check on new peer names: the name must not already be in use and may only contain letters, numbers and '_' or '-'.</li>
<li>many minor bug fixes and spell corrections in web-interface</li>
</ul>
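<p>The peer-name check listed for v0.37 above can be illustrated with a short sketch. This is not YaCy's own code: the function name <tt>is_valid_peer_name</tt> and the <tt>known_names</tt> set are hypothetical, and "letters" is assumed here to mean ASCII letters only.</p>

```python
import re

# Assumption: "letters, numbers and '_' or '-'" is read as ASCII-only.
_PEER_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_peer_name(name, known_names):
    """Return True if `name` uses only allowed characters and is not taken.

    `known_names` is a hypothetical set of peer names already in use.
    """
    if not _PEER_NAME.fullmatch(name):
        return False  # empty name or a forbidden character
    return name not in known_names
```

<p>A caller would pass the names it already knows from its seed list, for example <tt>is_valid_peer_name("my_peer-1", {"other"})</tt>.</p>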
<br><p>v0.36_build20050326
<ul>
<li>Enhanced thread control and added performance menu: this can be used to steer scheduling tasks and for profiling.</li>
<li>Enhanced search result ranking.</li>
</ul>
<br><p>v0.35_build20050306
<ul>
<li>new Features</li>
<ul>
<li>new user-profile management and remote access of profiles through the network-page</li>
<li>new cookie-monitor. Will be used to manage cookie-filter</li>
<li>new template engine and re-design of many administration pages as preparation for upcoming localization</li>
<li>now permanent storage of passive peers</li>
<li>enabled switch-off of proxy-cache</li>
<li>new proxy-indexing monitor and moved proxy-indexing configuration to that new page</li>
<li>more functions to DHT-management:</li>
<ul>
<li>remote indexing targets now selected by DHT rule</li>
<li>remote search now selects hierarchically with DHT-rule</li>
</ul>
<li>enhanced access control to YaCy administration</li>
<ul>
<li>passwords are now encoded to MD5-Hashes before being stored in httpProxy.conf</li>
<li>brute-force password-hack prevention by additional delays</li>
<li>added new 'steering' servlets for automated processes that need authorization</li>
</ul>
<li>re-design</li>
<ul>
<li>re-designed main menu: new sub-menu for proxy functions</li>
<li>re-design of Network Monitor page</li>
<li>re-design of seed database management and implementation of seed-action interface</li>
</ul>
</ul>
<li>fixed bugs:</li>
<ul>
<li>fixed a bug with cache-control</li>
<li>fixed a bug with peer-list uploading</li>
<li>fixed a bug that provoked indexing of YaCy's own web pages</li>
<li>fixed a bug that prevented loading of some web pages: (JavaScript bug) doublequote/singlequote mixture removed</li>
<li>better binary-check on files before indexing</li>
<li>fixed misbehavior of Network-Page: re-design of enumeration method and auto-heal function in kelondroTree</li>
</ul>
</ul>
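<p>The MD5 password encoding mentioned for v0.35 above can be sketched as follows. This only illustrates the idea of storing a hash instead of the clear-text password; the exact input string and output format written to httpProxy.conf are not described here, so UTF-8 input and a hexadecimal digest are assumptions, and both function names are hypothetical.</p>

```python
import hashlib


def md5_hex(password):
    """Return the MD5 digest of `password` as a hex string.

    Assumption: UTF-8 input and hex output; the stored format may differ.
    """
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def check_password(candidate, stored_hash):
    """Compare a submitted password against the stored MD5 hash."""
    return md5_hex(candidate) == stored_hash
```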
<br><p>v0.34_build20050208
<ul>
<li>Remote transmission of index (RWI) information to other peers with correct DHT position</li>
<ul>
<li>implemented two new yacy-protocol - commands: yacy/transferRWI and yacy/transferURL for RWI partition transfer</li>
<li>selection of DHT positions and selection of correct RWI partitions for transmission</li>
<li>performing full flush of index if peer is running in junior mode: now these juniors can contribute to the global index.</li>
<li>default full receive of index transmission in senior peers; these peers will currently not transfer indexes. This is a test configuration and senior2senior RWI transmission will be enabled in future releases.</li>
<li>Configuration flags (grant/do not grant) in 'Index Control' menu.</li>
</ul>
<li>Enhanced remote search</li>
<ul>
<li>selection of fewer result values: less traffic, faster response.</li>
<li>pre-sorting of results in remote peers before transmission: better results</li>
</ul>
<li>more properties in seeds</li>
<ul>
<li>Flags for "accept remote crawls" and "accept remote indexes"</li>
<li>Flags for "grant index distribution" and "grant index receive"</li>
<li>Control values for received/sent RWI/URL</li>
<li>All flag values are shown on Network page</li>
</ul>
<li>Bug-fixes:</li>
<ul>
<li>no re-set of remote crawl delay after re-connect</li>
<li>proxy fail (shows white pages) fixed: better timeout value</li>
<li>local indexing = off did not work, fixed.</li>
<li>auto-heal of seed.db - fail</li>
<li>many minor bugs fixed</li>
</ul>
<li>new <a href="http://www.yacy-forum.de">German forum at http://www.yacy-forum.de</a>, provided by Roland Ramthun</li>
</ul>
<br><p>v0.33_build20050107
<ul>
<li>Support for Stop-Words; default stopwords are included; stopwords are excluded for indexing and in search query results</li>
<li>Skin support</li>
<li>New start/stop-script for unix/linux daemon init process</li>
<li>File-Share entries can now have description entries</li>
<li>Enhanced File-Sharing Menu</li>
<ul>
<li>Every entry can have a comment attached</li>
<li>Comments or picture preview visible in file list</li>
<li>File name and comment field can be indexed and globally searched</li>
<li>Files found with search interface are dynamically linked to the actual IP of the peer hosting the file</li>
</ul>
</ul>
<br><p>v0.32_build20041221
<ul>
<li>New Crawling-Profiles for Crawl-Threads
<ul>
<li>every crawl start now defines its own crawl job; new crawls do not interfere with previously started and still running jobs; all started jobs may run concurrently</li>
<li>new crawl properties: accept urls containing '?'; flag for storage of pages in proxy cache; flags for local and remote indexing</li>
</ul>
</li>
<li>New Design, new documentation, new mascot 'Kaskelix' (appears on search page), new home page location <a href="http://www.yacy.net/yacy">http://www.yacy.net/yacy</a></li>
<li>Promotion-String on search page</li>
<li>New shutdown-trigger (no more file polling, new stop scripts)</li>
<li>Principal-peer gaining after file generation</li>
<li>New 'Log'-menu: view the application log on the web interface</li>
<li>Bug-fixes
<ul>
<li>Termination process should succeed now.</li>
<li>Cross-Site-Scripting bug removed</li>
<li>Removed a deadlock that occurred during concurrent crawl job starts</li>
</ul>
</li>
</ul>
<br><p>v0.31_build20041209
<ul>
<li>Integrated url filter for crawl jobs (Index Creation - page) and search requests (Search Page).</li>
<li>Removed a bug that caused sudden termination when a not-valid url was crawled.</li>
<li>Massively enhanced indexing speed by implementation of an additional word index cache.</li>
<li>Added button to delete/empty the crawl url stack.</li>
<li>Many minor changes.</li>
</ul>
<br><p>v0.30_build20041125
<ul>
<li>Implemented Remote Crawling
<ul>
<li>Every Senior and Principal Peer may now start Remote Crawls: The initiating peer starts with the crawl and may assign URLs to qualified other peers. Those peers load the assigned resources, index them and return the index statistics to the initiator. The executing peer may only be a Senior or Principal peer.</li>
<li>Extended URL management: URLs are now organized in three different sets: Noticed-URLs (not loaded but possibly queued for crawling), Error-URLs (not loaded but may be re-loaded to avoid index loss in case of temporary target server downtime or network problems) and Loaded-URLs. The Loaded-URLs are again divided into six categories:</li>
<ol>
<li>remote index: retrieved by other peers</li>
<li>partly remote/local index: result of search queries</li>
<li>partly remote/local index: result of index transfer (to be implemented soon)</li>
<li>local index: result of proxy fetch/prefetch</li>
<li>local index: result of local crawling</li>
<li>local index: result of remote crawling requests</li>
</ol>
<li>New monitoring pages: Local Index Monitor for results of LURL's (see above), cases 1-5 and the Global Index Monitor for case 6. Because the results of global crawls are not personal to the peer owner, the monitor page is not protected.</li>
<li>Options to allow or disallow remote crawling; either as initiating or executing peer.</li>
<li>Idle/Due-Time - management for each peer: to organize remote-crawl load-balancing, a delay time is used to schedule remote crawls. The seed management was extended to store and maintain these delay times.</li>
</ul>
</li>
<li>Proxy Performance Enhancements
<ul>
<li>changed+enhanced caching algorithm; re-implemented routines</li>
<li>process enhancements in httpc and httpd classes</li>
<li>gzip-load mode in httpc fixed</li>
<li>removed DNS bottleneck (the Java DNS blocks when accessed simultaneously)</li>
<li>integrated DNS-prefetch</li>
</ul>
</li>
<li>Implemented Shut-Down Procedure
<ul>
<li>Integrated notifier procedure in all threads.</li>
<li>The application now creates a file 'yacyProxy.control' after start-up.</li>
<li>To stop the yacyProxy, remove the control file.</li>
<li>Integrated a 'Shutdown' - button on the 'Status'-page which also triggers shut-down</li>
<li>After shut-down is initiated the application first processes all scheduled crawling- and indexing tasks which may last some minutes in the worst case.</li>
</ul>
</li>
<li>Removed bugs
<ul>
<li>URL normalization</li>
<li>many minor bugs</li>
</ul>
</li>
</ul>
<br><p>v0.29_build20041022
<ul>
<li>New option to start explicit crawling jobs: a start url and a crawling depth
(different from the prefetch depth) can be set.</li>
<li>Integrated monitoring interface for prefetch/crawling activities.
The user can now observe the crawling and indexing activity in detail.
There is also a report page that lists all newly indexed pages with the option
to delete these indexes again. The interface also reports the initiator
of the crawling/indexing tasks which can be currently either the prefetch mechanism
or explicit crawling requests. In future releases the initiator may also refer to
remote crawling requests.
</li>
<li>New caching procedure for database requests on file-system level.
</li>
<li>Extended blacklist url matching: parts of a domain may now be matched with wildcards '*'. (the URL's path may be matched with regular expressions)</li>
<li>The application will be re-named. Many parts now refer to the new application name 'yacy', but not all.</li>
</ul>
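<p>The extended blacklist matching listed for v0.29 above (wildcards '*' in the domain part, regular expressions for the path) can be sketched as follows. This is a minimal illustration, not the YaCy implementation: the entry format "domain-pattern/path-regex" and the function names <tt>host_matches</tt> and <tt>is_blacklisted</tt> are assumptions made for this example.</p>

```python
import re


def host_matches(pattern, host):
    """Match `host` against a domain pattern in which '*' is a wildcard."""
    # Escape everything literally, then turn the escaped '*' into '.*'.
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.fullmatch(regex, host, re.IGNORECASE) is not None


def is_blacklisted(entry, host, path):
    """Check one blacklist entry against a host and URL path.

    Assumption: an entry looks like 'domain-pattern/path-regex'; an entry
    without a path part matches every path on the matching hosts.
    """
    domain_pattern, _, path_regex = entry.partition("/")
    if not host_matches(domain_pattern, host):
        return False
    return re.fullmatch(path_regex or ".*", path.lstrip("/")) is not None
```

<p>For example, the hypothetical entry <tt>*.example.com/banner/.*</tt> would block <tt>/banner/1.gif</tt> on <tt>ads.example.com</tt> but leave <tt>/index.html</tt> on the same host untouched.</p>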
<br><p>v0.28_build20041001
<ul>
<li>Search results are now searched again for characteristic word patterns.
The patterns are statistically evaluated and are used to generate
"search associations",
shown as hints for further combined search.
</li>
<li>Parallelized peer propagation process. This results in very rapid bootstrapping.
</li>
<li>Integrated new 'score' library for rapid element sorting - used for search
patterns and rapid bootstrapping. May help in future releases to speed up indexing.
</li>
<li>Minor bug-fixes.</li>
</ul>
<br><p>v0.27_build20040924
<ul>
<li>Bug fix in remote search result preparation.</li>
<li>Speed enhancements on search client when doing remote search.</li>
<li>Small changes in file sharing interface.</li>
</ul>
<br><p>v0.26_build20040916
<ul>
<li>Introduced new 'virtual' TLD (top-level-domain) '.yacy' that the proxy resolves into the peer's IP and port numbers:</li>
<ul>
<li>Every yacy-peer can now be contacted using the peer's name as domain name:
Proxy users can obtain any other proxy-hosted pages using the URL 'http://&lt;peer-name&gt;.yacy'.</li>
<li>Implemented sub-level domains for yacy TLD's: they are matched to subdirectories of the peer's individual web root HTDOCS. (see below)</li>
</ul>
<li>Support for individual web pages:</li>
<ul>
<li>Every proxy host can serve its individual web page. We implemented two paths for each server: one default path pointing to &lt;application-root&gt;/htroot for administrative pages and an alternative path for individual use at &lt;application-root&gt;/DATA/HTDOCS.</li>
<li>The individual web pages may be accessed either using the new '.yacy' TLD's through another proxy, or optionally by using the peer's IP:port address. The recommended default address of a proxy is 'http://www.&lt;peer-name&gt;.yacy', which is mapped to &lt;application-root&gt;/DATA/HTDOCS/www/. </li>
<li>Integrated an upload/download interface for individual web pages: additional accounts for uploaders and downloaders ensure appropriate authorization. The file-sharing web space can be browsed with a directory servlet. A default sub-domain is assigned to 'http://share.&lt;peer-name&gt;.yacy', which is mapped into &lt;application-root&gt;/DATA/HTDOCS/share/.</li>
<li>Web clients not using the proxy may contact the new individual default subdomains using the URLs http://&lt;peer-IP&gt;:&lt;peer-port&gt;/www/ and http://&lt;peer-IP&gt;:&lt;peer-port&gt;/share/.</li>
</ul>
<li>Several Bug-fixes:</li>
<ul>
<li>Date bug appearing when accessing the proxy httpd with the proxy.</li>
<li>Additional Time-out catch-up at httpc when a file is submitted without length tag. Also extended the general retrieve time-out.</li>
<li>Terminal line restriction of 1000 bytes was too tight (cookies may have 4kb length).</li>
<li>Introduced global general time measurement for peer synchronization and balanced hello - round-robin.</li>
</ul>
<li>Enhanced proxy-proxy mode: 'no-proxy' setting as a list of patterns that exceptionally disallow usage of remote proxies.</li>
<li>Implemented multiple default-paths for URLs pointing to directories.</li>
<li>Re-design of front-end menu structure.</li>
<li>Integrated Interface for principal configuration in Settings page</li>
<li>Re-named the release: integrated YACY name to emphasize that the proxy is mutating into the YACY Search Engine Node</li>
</ul>
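<p>The '.yacy' host mapping described for v0.26 above can be sketched as a small lookup function. This is an illustration only: the function name <tt>resolve_yacy_host</tt> and the default application root <tt>/opt/yacy</tt> are hypothetical, only a single sub-level label is handled, and mapping the bare peer host to the HTDOCS root is an assumption.</p>

```python
from pathlib import PurePosixPath


def resolve_yacy_host(host, app_root="/opt/yacy"):
    """Split a '.yacy' host name into (peer name, HTDOCS directory).

    'www.<peer>.yacy' maps to DATA/HTDOCS/www and 'share.<peer>.yacy' to
    DATA/HTDOCS/share, as stated in the release notes. Returning the HTDOCS
    root for '<peer>.yacy' is an assumption of this sketch.
    """
    labels = host.lower().split(".")
    if len(labels) not in (2, 3) or labels[-1] != "yacy" or "" in labels:
        raise ValueError(f"not a supported .yacy host: {host!r}")
    peer_name = labels[-2]
    htdocs = PurePosixPath(app_root) / "DATA" / "HTDOCS"
    if len(labels) == 3:
        # The sub-level domain selects a subdirectory of HTDOCS.
        htdocs = htdocs / labels[0]
    return peer_name, str(htdocs)
```

<p>The proxy would then still have to look up the peer's current IP and port for the returned peer name, which is outside the scope of this sketch.</p>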
<br><p>v0.25_build20040822
<ul>
<li>New Index Administration Menu Item: RWI's (Reverse Word Indexes) may now be inspected.
Each reference in a word index can be displayed in detail, and optionally be deleted.</li>
<li>Minor bug fixes in Bootstrapping. Major Bug fixes in Index Storage (better Normal Form of URLs).</li>
<li>Better display of cache content in the Cache Administration.</li>
</ul>
<br><p>v0.24_build20040816
<ul>
<li>New 'Cache' Menu item: The proxy cache can now be inspected. It shows a directory list with http response headers and content to each file in the proxy cache.</li>
<li>Faster Bootstrapping: The connection policy was changed: as long as the proxy status is 'virgin', the most recent known connection is used for bootstrapping; then later the least recent for peer distribution.</li>
<li>Better Formatting in Network Menu.</li>
</ul>
<br><p>v0.23_build20040808
<ul>
<li>Blacklists now provide management of several lists and more import options.</li>
<li>code cleanup + many minor bugs</li>
<ul>
<li>Messages now work (corrected POST implementation, this also cleaned the way to index distribution); improved message sending, displaying etc.</li>
<li>double links / unchecked '#', headlines wrong</li>
<li>httpd-speedup (no more temporary files, template prefetch without double-load)</li>
<li>much better Bootstrapping and more intelligent yacy-peer updating</li>
<li>auto-migration of new settings from httpProxy.init</li>
<li>much better logging; extensive log configuration options for all parts of the application now in httpProxy.init</li>
<li>better search requesting (more results)</li>
<li>yacy protocol may now also use other proxies in proxy-proxy-mode</li>
</ul>
<li>more documentation</li>
<ul>
<li>permanent demo-page at yacy.net/home.html with wiki</li>
<li>new FAQ at http://www.yacy.net/yacy/FAQ.html</li>
<li>first step to move YACY to new home http://sourceforge.net/projects/yacy/</li>
</ul>
</ul>
<br><p>v0.22_build20040711
<ul>
<li>More security bug fixes (dementia accountia, '..' usage in server path, server blacklist too tight for local clients)</li>
<li>Another advance in better peer distribution and recognition (distinguishes between 'real' disconnected peers and 'hearsay' disconnected peers. Keeps track of online time. No preferences of principal peers in link distribution)</li>
<li>An option to switch the peer to online mode without using the proxy. This makes life much easier for newbies.</li>
<li>A new message function. Within the Network page, one can hit the 'm' and may then send a message to the other peer. The owner of that
peer can read the message in his/her private message inbox. This function is only in alpha stage; it works only in rare cases and
we don't know why. Only for testing.</li>
<li>Cleaned up the mess of different database and configuration files. All run-time data is now accumulated in the new folder 'DATA'. If you previously generated an index and want to migrate, you simply need to put your old PLASMADB folder into the new DATA folder.</li>
<li>Cleaned up the source code and partitioned it into separate packages</li>
<li>Some design enhancements of the online interface</li>
</ul>
<br><p>v0.21_build20040627
<p>After an announcement on freshmeat.net we got many hits in the newly built p2p-network. We learned from the p2p-propagation behavior and
implemented a lot of new routines to stabilize the YACY network.
<ul>
<li>Better peer analysis, statistics, propagation/distribution (more properties and bug fixes).</li>
<li>No more JavaScript in online Interface. New template logic for httpd and new online interface look-and-feel, using the new features.</li>
<li>New FAQ in documentation.</li>
<li>Protection against hacker and virus attacks: new self-configuring client-IP blocking in serverCore.java</li>
<li>More information and warnings about security settings for the operator, to help protect their own peer</li>
<li>Network statistics and monitor shows status of remote peers and the distributed index</li>
</ul>
<br><p>v0.20_build20040614
<p>The first step into the p2p-world: introduction of the YACY (yet another cyberspace) p2p network propagation and information wares distribution system. In this release, YACY enables a rudimentary index exchange so that you can use YACY to bootstrap a world-wide distributed search engine.</p>
<ul>
<li>Added status page on web interface and automatic opening of web browser on status page. Can be switched off on the status page.</li>
<li>Implemented still missing element removal and AVL balancing for element inserts in the kelondro database. This ensures logarithmic efforts on database access, which influences the proxy and the search service. Now only AVL balancing after removal must be implemented, but its absence is not critical.</li>
<li>Added blacklist enhancements and web interface for blacklist editing from Alexander Schier.</li>
<li>More and better documentation.</li>
<li>Many minor bug fixes, e.g. non-cacheability of web interface, exception catch-up on startup when proxy is used before coloured lists are loaded.</li>
<li>First p2p elements implemented: every peer on startup looks for other peers and announces its own startup. The function does not yet actively implement an index exchange, but can respond to remote index queries.</li>
</ul>
<br><p>v0.16_build20040503
<p>This release is a major step to make the proxy enterprise-ready: we introduced several security mechanisms and access
restrictions for the proxy and the server. Every security setting can be configured through a web page. Thanks to the new
HTTPS proxy, the proxy can now be considered as 'complete'.
<ul>
<li>implemented a HTTPS proxy, sharing the same proxy port with http;
this does not help for more/better indexing since the SSL data is simply passed through.
But we can now claim to be a 'full' http and https proxy, usable in enterprise environments and internet cafes.</li>
<li>two security layers for web server and proxy access: implemented Client-IP - filtering, which adds a virtual Firewall to
the application. Every client that does not match the client-IP-filter is blocked. The second layer is a PROXY password
protection. All attributes can be configured through a new web page at http://localhost:8080 (standard configuration).</li>
<li>to protect the configuration pages of the web server, we introduced a password protection for special pages on the web server.
Every page that ends with '_p.html' has a protection; the corresponding account can also be set through the local web server.
Users shall be encouraged to set this administration account first.</li>
</ul>
<br><p>v0.15_build20040318
<ul>
<li>Extensive code re-engineering</li>
<ul>
<li>Inserted and further generalized the proxy's genericServer into the AnomicFTPD project. After further enhancements within that project,
it was re-inserted in the HTTPProxy. The Switchboard interface now belongs to the genericServer, which is now called the serverCore.</li>
<li>Removed the old html parser and replaced it by the new htmlFilter library, which now parses the html files during reading from
the remote server. Real-time parsing during streaming html pages is done extremely fast and does not slow down file passing through
the proxy. The new htmlFile provides a filter interface, which is now used to filter out content that is defined by keywords.
Currently the bluelist 'httpProxy.blue' is used to define these words.</li>
<li>Re-engineered the crawler interface and implemented a crawler. Since the crawler does not work in all cases, it is still
disabled in this release. You can switch it on by setting the prefetchDepth in the configuration file httpProxy.init</li>
</ul>
<li>Implemented a 304 response. This speeds up all responses in the case of a cache hit combined with a conditional request.
Since this combination is fairly common, it noticeably speeds up the proxy.</li>
<li>New documentation design</li>
<li>New Search Page design</li>
</ul>
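<p>The 304 response introduced in v0.15 above covers the case of a cache hit combined with a conditional request. A minimal sketch of that decision is given below; it is not the proxy's actual code, the function name <tt>cache_hit_status</tt> is hypothetical, and only the <tt>If-Modified-Since</tt> form of a conditional request is considered.</p>

```python
from email.utils import parsedate_to_datetime


def cache_hit_status(if_modified_since, cached_last_modified):
    """Pick the status code for a request that is served from the cache.

    Both arguments are HTTP date strings (or None for the request header).
    Returns 304 if the cached copy is not newer than the client's copy,
    otherwise 200 so that the full body is sent.
    """
    if if_modified_since is None:
        return 200  # unconditional request: always send the body
    try:
        since = parsedate_to_datetime(if_modified_since)
        modified = parsedate_to_datetime(cached_last_modified)
        return 304 if modified <= since else 200
    except (TypeError, ValueError):
        return 200  # unparsable or incomparable dates: fall back to a full response
```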
<br><p>v0.14-build20040213
<ul>
<li>More Structure to the whole system to lay the basis for the Crawler</li>
<ul>
<li>The new structure will distinguish between the <i>httpd</i> with its servlets, the file-servlet and proxy-servlet;
the <i>crawler</i> which also holds responsibility for the http cache that is used by the http proxy and the <i>indexing</i>
engine 'PLASMA', which is again accessed by the http file server. But even with the crawler concept on board here, we still don't have prefetch now.</li>
<li>Moved plasmaTextProvider to httpCrawlerScraper, httpdProxyCache to httpCrawlerCache and httpdSwitchboard to httpCrawlerSwitchboard</li>
<li>New configuration value proxyCacheSize: limits the memory amount of the cache; if the cache exceeds this value the oldest entries are deleted</li>
</ul>
<li>Bug fixes:</li>
<ul>
<li>Found and eliminated nasty bug that prevented using yahoo mail. (they send several cookies at once)</li>
<li>No more indexing of URLs with 'cgi' in name or ending with '.js', '.ico', or '.css' (checking content-type for 'text' is not enough; some servers do not transfer right value)</li>
<li>Fixed search for words containing numbers and German umlauts</li>
<li>adapted acrypt.java to not use javax.crypt, which was not supported by Debian Blackdown Java 1.3.1. Furthermore, removed the -server flag from httpProxy.sh, which also made Blackdown crash. (you probably want to insert that flag again in your installation)</li>
</ul>
<li>The proxy can now be configured to access another proxy</li>
</ul>
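<p>The indexing exclusion rule from the v0.14 bug fixes above can be sketched as a simple predicate. This is an illustration under assumptions: the function name <tt>should_index</tt> is hypothetical, and 'cgi' is looked for in the URL path only, which is one possible reading of "in name".</p>

```python
from urllib.parse import urlsplit

# Suffixes named in the release note as never to be indexed.
SKIPPED_SUFFIXES = (".js", ".ico", ".css")


def should_index(url, content_type):
    """Decide whether a fetched resource should be passed to the indexer."""
    path = urlsplit(url).path.lower()
    if "cgi" in path:
        return False  # assumption: 'cgi' is checked in the path only
    if path.endswith(SKIPPED_SUFFIXES):
        return False
    # The content-type check alone is not enough, but it is still applied.
    return content_type.lower().startswith("text")
```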
<br><p>v0.13-build20040210
<ul>
<li>Bug fixes:</li>
<ul>
<li>removed forced unzipping for special cases: either if the file to be transported is 'naturally' in gzip format (.gz, .tgz and .zip) or if zipping would not make sense because it would not yield any compression, as for images. Now the 'Accept-Encoding', created by the browser and sent to the server, has its gzip attributes omitted in these cases. This should lead to less overhead (no gzip en/de-coding) and thus to more speed.</li>
<li>now transport of httpc failure response body (especially 404; seemed to be unnecessary, but is not)</li>
<li>search result bug (mixed up appearance) removed</li>
</ul>
<li>Performance and structure enhancements:</li>
<ul>
<li>Extended database capabilities to hold content of dynamic size; new files kelondroRA.java, kelondroAbstractRA.java, kelondroFileRA.java, kelondroDyn.java</li>
<li>Used new database features to store the response header information for all files in the cache into one database file. This saves 50% of the number of files in the cache (no more need for the .header - files)</li>
<li>Implemented a scheduling that moved the time of cache creation into a proxy-idle time. This reduces the file operations on a single-user system by 50% during web page retrieval.</li>
</ul>
</ul>
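<p>The Accept-Encoding handling from the first v0.13 bug fix above can be sketched as follows. This is not the proxy's code: the function name <tt>filter_accept_encoding</tt> is hypothetical, and the list of image suffixes is an assumption, since the release note only names .gz, .tgz and .zip explicitly.</p>

```python
# .gz, .tgz and .zip come from the release note; the image suffixes are assumed.
NO_GZIP_SUFFIXES = (".gz", ".tgz", ".zip", ".gif", ".jpg", ".jpeg", ".png")


def filter_accept_encoding(url_path, accept_encoding):
    """Drop gzip from the Accept-Encoding header for already compressed files."""
    if not url_path.lower().endswith(NO_GZIP_SUFFIXES):
        return accept_encoding  # leave the browser's header untouched
    kept = []
    for token in accept_encoding.split(","):
        # A token may carry a quality value, e.g. "gzip;q=1.0".
        coding = token.split(";")[0].strip().lower()
        if coding and coding not in ("gzip", "x-gzip"):
            kept.append(token.strip())
    return ", ".join(kept)
```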
<br><p>v0.12-build20040204
<ul>
<li>now a <a href="roadmap.txt">release roadmap</a> exists</li>
<li>enhanced proxy and caching:</li>
<ul>
<li>integrated blacklist 'httpProxy.black' idea and data from Alexander Schier: forced 404 response for blacklisted hosts. This can be used to 'switch off' specific domains, especially AGIS servers. Can also be used for child protection/parental control. Does not filter content!</li>
<li>cache write bug if same file and directory name is used (can be done in URL, but not in cache file system) removed. </li>
<li>detailed 404 debugging response in case of failure</li>
<li>new config value maxSessions to limit the number of concurrent connections to the proxy</li>
<li>Host property bug in httpc for HTTP/1.1 servers removed: now better access to more servers</li>
</ul>
<li>enhanced indexing and searching:</li>
<ul>
<li>implemented rudimentary ranking and ordering of search results either by quality or by date</li>
<li>implemented bluelist 'httpProxy.blue': filtering of all blue-listed words in search expression, result-url and result-description</li>
<li>bugfix for combined search, fixed date attached to search results</li>
</ul>
<li>first contact with <a href="http://gnugle.sourceforge.net">Gnugle</a> project and knowledge exchange</li>
</ul>
<br><p>v0.11-build20040124
<ul>
<li>non-empty field servlet bug in index.java</li>
<li>greatly enhanced indexing</li>
<ul>
<li>better structure: new classes plasmaIndexEntry, plasmaSearch, plasmaIndex, plasmaIndexCache, plasmaURL</li>
<li>index entry caching and transparent flushing implemented</li>
</ul>
<li>catch-up of sleeping connections, enhanced idle check in genericServer.java</li>
</ul>
<br><p>v0.1-build20040119
<ul>
<li>first time published on www.anomic.de!</li>
<li>client user agent forwarding according to 'yellow'-list</li>
<li>plasma database</li>
<ul>
<li>new database sub-path DATABASE</li>
<li>new file kelondroRecords.java + kelondroTree.java</li>
<li>plasmaStore now saves and retrieves transparently urls in the kelondro database</li>
<li>no more XSUMP path, was not necessary for condenser; url attributes will be stored in new DB</li>
<li>indexing implemented; still slow since that needs caching (later)</li>
<li>rudimentary index access through new web page index.html and servlet index.java</li>
</ul>
<li>better client timeout -> better idle check -> no job queue blocking</li>
<li>new interface genericServerHandler.java</li>
</ul>
<br><p>build20040110
<ul>
<li>blackboard as global configuration set for all threads with global function scheduler</li>
<ul>
<li>new files httpdSwitchboard.java and plasmaSwitchboard with job control and global config</li>
<li>new file plasmaStore.java</li>
<li>the plasma blackboard saves its data into plasmaBlackboard.conf</li>
<li>cgi control over blackboard (try http://proxy/test.html)</li>
<li>new test file HTROOT/switchboard.[html,java]; try http://proxy/switchboard.html</li>
</ul>
<li>condensement on cache (indexing pre-process)</li>
<li>renamed file htmlParser to plasmaTextProvider</li>
<li>new file plasmaCondenser.java</li>
<li>test output of word list per page</li>
</ul>
</ul>
<br><p>build20040107
<ul>
<li>better/more configuration</li>
<ul>
<li>moved httpd.conf to httpProxy.conf</li>
<li>new loglevel attribute for server and proxy in httpProxy.conf</li>
<li>new clientTimeout attribute for client-proxy connections in httpProxy.conf</li>
</ul>
<li>advanced cache-control</li>
<ul>
<li>transparent gunzip upon loading of gzip-encoded streams in httpc, all cache files are now unzipped</li>
<li>much better cache control, according to RFC standards and recommendations, really usable now</li>
</ul>
<li>implemented scheduler</li>
<ul>
<li>idle check in genericServer as scheduler trigger</li>
<li>new experimental scheduler in httpProxy for new cache arrivals</li>
<li>added acrypt.java for different encoding tools</li>
<li>added htmlParser.java and implemented scheduled parsing of selected html resources</li>
<li>a subdirectory XSUMP is now filled with to-be-indexed text files</li>
</ul>
</ul>
<br><p>build20040105
<ul>
<li>advanced header transport transparency</li>
<ul>
<li>added CaseInsensitiveMap.java, a TreeMap with case-insensitive comparator</li>
<li>management of reverse mapping of header symbols</li>
<li>better handling of cookies (the yahoo-bug was attacked, but still not eliminated)</li>
</ul>
<li>advanced error case behavior</li>
<ul>
<li>implemented 404 response when server or files unreachable</li>
<li>fixed behavior when file download is interrupted and broken file is in cache</li>
<li>fixed session termination bug</li>
</ul>
<li>advanced cache behavior</li>
<ul>
<li>fixed loading of stale files from cache in some cases</li>
<li>203 response instead of 200 for files that come from cache</li>
</ul>
</ul>
<br><p>build20031229
<ul>
<li>minor bugs ("#!/bin/sh" in shell scripts; +%Y instead of +%y; 755, 644 access rights)</li>
<li>major changes in caching load/transport</li>
<li>added wishlist.txt</li>
<li>added changelog.txt</li>
<li>implemented automatic webinterface access if no host given</li>
<ul>
<li>extended httpd.java to access FileServlets from httpdFileServlet.java</li>
</ul>
<li>file servlet with examples, parameter hand-over via get and post, text and multipart</li>
<ul>
<li>added GPL lib 'Template.java' from JavaBY Template Engine from Alexey Popov</li>
<li>made changes in Template.java</li>
<li>added httpdFileServlet.java and classProvider.java to implement template-based CGI file serving</li>
<li>added subdir HTROOT with example files test.{java,http}</li>
</ul>
</ul>
<br><p>build20031218
<ul>
<li>first public release of YACY as AnomicHTTPProxy_20031218.tar.gz</li>
<li>basic httpd proxy functions only</li>
<li>first alpha-tester Alexander</li>
</ul>
<br><p>build20031215
<ul>
<li>first public announcement of project idea on<br>
<a href="http://www.heise.de/newsticker/foren/go.shtml?read=1&msg_id=4744034&forum_id=50682">
<font size="1">http://www.heise.de/newsticker/foren/go.shtml?read=1&msg_id=4744034&forum_id=50682</font>
</a><br>
by Michael Christen</li>
</ul>
<!-- ----- HERE ENDS CONTENT PART ----- -->
<SCRIPT LANGUAGE="JavaScript1.1"><!--
globalfooter();
//--></SCRIPT>
<NOSCRIPT>
<br><br></td></tr></table>
</td>
<td width="10" valign="top">
</td>
</tr></table>
</td></tr></table>
</NOSCRIPT>
</body>
</html>