Commit Graph

231 Commits

Author SHA1 Message Date
orbiter
d28f8040e0 removed unnecessary recording function that caused also a performance problem after serving too much files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7512 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-22 13:33:28 +00:00
orbiter
addbd5b482 moved the main update url - because of the many languages we support now on yacy.net
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7487 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-17 01:22:22 +00:00
orbiter
6c52e31993 new methods to open a browser
- if YaCy is started with the option -gui, it is not in headless mode. Then the java 1.6 browse method is used if all other methods fail
- in linux, the path /etc/alternatives/www-browser is used if no firefox is installed

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7480 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-14 16:15:14 +00:00
orbiter
5892fff51f introduction of dht-burst modes: this can expand the number of target peers in some cases where a better heuristic is needed. The problematic cases are either when a muti-word search is made (still a hard case for our term-oriented DHT) or when a network operator wants that all robinson peers are asked. We therefore introduced two new network steering values that switch on more peers during the peer selection. Because the number of peers can now be very large, the number of maximum httpc connections was also increased.
Please see new coments in yacy.network.freeworld.unit for details of the new DHT selection methods.
The number of maximum peers is now not fixed to a specific number but may increase with
- the partition exponent
- the number of redundant peers
- the robinson burst percentage
- the multiword burst percentage
The maximum can then be the number of senior peers (all visible peers).

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7479 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-13 17:37:28 +00:00
orbiter
4588b5a291 - fixed document number limitation for crawls that restrict the number of documents per domain
- some restructuring of the document counting and logging structures was necessary
- better abstraction of CrawlProfiles
- added deletion of logs to the index deletion option (if the index is deleted using the servlets) which is necessary to reset the domain counters for the page limitation
- more refactoring to get the LibraryProvider more clean
- some refactoring of the Condenser class

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7478 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-12 00:01:40 +00:00
low012
64f32e8f00 *) replaced all IPs in IP filters for proxy with the proper regular expression
*) some cleanup

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7477 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-11 23:37:13 +00:00
orbiter
fe93caac5a added flags and administration options to show advanced search and to show search result attributes (for each search result)
Administration can be done at ConfigPortal.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7466 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-02 15:54:13 +00:00
orbiter
88773e4daa changed the default port from 8080 to 8090
see also: http://forum.yacy-websuche.de/viewtopic.php?p=21683#p21683

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7454 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-28 10:54:13 +00:00
orbiter
6c35b68f17 - removed 'peerName' property from the yacy settings file because this information is stored in the yacy seed file
- the own seed file gets the lead for storage of the peer name
- exchanged default peer name generation method with one that does not use the local ip
- default peer names are now strings starting with '_anon'
- added another switch to suppress forwarding to ConfigBasic if the name was already changed
- replaced all usages of the yacy.conf peerName with access to the local seed
- changes to the peer name are now applied directly and not after the next peer ping


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7453 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-28 10:12:17 +00:00
orbiter
786166041a - added recording of all accessed and submitted servlets
- this recording is then used to redirect from the Status.html page to BasicConfig in case that servlet was never submitted
- this acts as an addition to the new default pop-up page 'index.html' which offers an administration link to Status.html. For a first-time user this then redirects directly to the former start page BasicConfig.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7451 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-27 11:17:11 +00:00
orbiter
3fe03f153d - search page becomes default start page (new users are not forced to do configuration since this is not necessary)
- adjusted top menu on search page (shows less stuff and now also the network graphics)
- adjusted the network page (looks better in when showing no other navigation on top)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7448 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-26 14:58:28 +00:00
orbiter
59d9fe1bd7 added more php mime types
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7443 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-22 09:52:36 +00:00
orbiter
3ae8f40fc8 removed yacy.network.group - this feature was never used
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7442 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-22 09:50:36 +00:00
orbiter
efb4ca8fa8 modified auto-delete of search failure-words:
- words are now not deleted from the search index automatically if index receive is switched off
- a flag in the network definition defines if this feature is switched on at all
- the search filter for not-found word references is switched off for server-side remote searches

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7441 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-22 09:46:00 +00:00
f1ori
4e29e9712a * create cleanupjob for cached failed urls
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7437 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-17 15:04:00 +00:00
lotus
b1484299b2 same units for memory observer configuration (MiB)
old setting for DHT (RAM) will be lost after update
can be set on /Performance_p.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7418 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-02 20:38:01 +00:00
low012
11ea966f9e *) added SID file (Commodore 64) sound file parser
*) minor changes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7403 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-28 12:06:04 +00:00
low012
936e976c23 *) added FreeMind (http://freemind.sourceforge.net/) mindmap parser
*) minor changes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7397 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-27 20:13:31 +00:00
orbiter
4565b2f2c0 removed the display option from index.html, yacysearch.html and yacyinteractive.html
instead, a setting at ConfigPortal.html can be made to define if the topmenu shall be shown at these pages or if there is no naviagtion at all. 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7366 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-08 10:50:23 +00:00
orbiter
fc2e41e691 added a forwarder for the default page. The forwarder forwards a browser to a different page if the root file index.html is accessed. This can be done by setting the name of the forwarder page to the field
"Default index.html Page (by forwarder)" in /ConfigPortal.html
The purpose is to forward to /yacyinteractive.html for the 27C3 FTP search plattform

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7365 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-07 15:46:04 +00:00
orbiter
cc6499bf8d - added http://blekko.com as search heuristic (like scroogle). This was easy since they deliver their search results also as rss feed
- renamed YaCys search result modifications keywords for RECENT, NEAR and language: to the blekko slashtag naming scheme. YaCy now supports the following blekko-like slash built-in slashtags:
/date
 - for search results ordered by date (most recent up)
 /near
 - for search results where search words appear near to each other (closest up)
 /language/<lang>
 - for a sorting by language where the wanted language gets up. Example: /language/de
  

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7350 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-29 18:08:20 +00:00
orbiter
a9f754c45f removed unused CR accumulation and distribution process
this was never used and extended in the last years. The resulting YBR ranking criteria
is still a good idea and will be used in the future. Possible generation methods for YBR
ranking are:
- "trust-rank" using the link structure as can be discovered in a single crawl (idea from FSCONS)
- "block-rank" calculated from the local link structure
- a distributed "block-rank" using the xml API to the link structure from other peers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7349 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-29 11:07:42 +00:00
f1ori
442bebca2b * %0 does not belong to the IPv6-Address -> entry does not work on some systems
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7310 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-06 15:09:28 +00:00
f1ori
6ac4f8142e * allow proxy requests from localhost via ipv6
(%0 does not belong to the address)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7303 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-04 10:52:54 +00:00
orbiter
917d715374 lulabad found his signature
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7287 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-01 23:32:20 +00:00
f1ori
def4253555 * add option to network definition to provide a domainlist (syntax like in blacklists)
* crawler and search allow only urls matching one in domainlist (if list is provided)
* this may be useful to prevent dedicated networks from being "polluted"
* FilterEngine is improved Backlist-object, Blacklist may inherit from FilterEngine in the future

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7285 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-30 14:44:33 +00:00
orbiter
482127e777 removed release key from location 2 because the signature of that source can not be verified. But the source is ok.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7283 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-29 09:22:56 +00:00
orbiter
facfd204e9 added a parent configuration option.
see /ConfigPortal.html
requested here:
http://forum.yacy-websuche.de/viewtopic.php?p=21099#p21099

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7271 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-25 22:16:07 +00:00
orbiter
6a166c2040 patches for bad proxy behaviour
- accept ipv6 localhost clients
- index media files (url only)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7238 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-11 11:38:36 +00:00
orbiter
45b1ab3d07 custom + generic skins:
- added a generic skin which is filled with actual color assignment using a servlet
- enabled css servlets
- added a generic color scheme in configuration file
- added configuration input in Customization/Appearance servlet
- added a jquery color picker widget
- placed color picked widget to input field of generic colour definition input fields

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7235 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-11 00:00:10 +00:00
orbiter
2c549ae341 fixed a number of small bugs:
- better crawl star for files paths and smb paths
- added time-out wrapper for dns resolving and reverse resolving to prevent blockings
- fixed intranet scanner result list check boxes
- prevented htcache usage in case of file and smb crawling (not necessary, documents are locally available)
- fixed rss feed loader
- fixes sitemap loader which had not been restricted to single files (crawl-depth must be zero)
- clearing of crawl result lists when a network switch was done
- higher maximum file size for crawler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7214 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-30 23:57:58 +00:00
orbiter
37baa8bae3 - fixes for concurrency exceptions and failed database integrity verification
- added link to yacystats peer when peer is more than one day old

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7164 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-17 10:20:04 +00:00
orbiter
461a2a6ec7 enhanced remote crawling:
- 300 ppm is default now (but this is switched off by default; if you switch it on you may want more traffic?)
- better timing for busy queue
- better amount of remote url retrieval
- better time-out values
- better tracking of availability of remote crawl urls
- more logging for result of receipt sending

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7159 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-16 09:34:17 +00:00
orbiter
670ba4d52b - removed the remote crawl option from the network configuration submenu and
- added a remote crawl menu item to the index create menu. This menu also shows a list of peers that provide remote crawl urls
- set remote crawl option by default to off. This option may be important but it also confuses first-time users


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7158 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-16 00:39:05 +00:00
orbiter
4c21d8dc9d - changed default values for online caution (the pausing may not be necessary any more)
- fixed bug in WeakPriorityBlockingQueue
- show favicon faster using pre-loading (same technique as used for fast image search)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7130 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-09 23:25:19 +00:00
orbiter
0ab6a462ee - added a missing entry in YaCy interface robots.txt for bookmarks
- changed default robots.txt deny list to include some more interface pages because the loading of such pages are a peer load issue for YaCy when crawlers come by and information on these pages are not useful for public search. 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7112 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-06 09:58:54 +00:00
orbiter
3f1d5a061f by default store crawled pages to HTCache to support verify=false snippet generation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7087 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-31 09:28:01 +00:00
lotus
23ba107834 UPnP port forwarding default on now. This also displays a message on the entry settings page if not successful, so the user gets an extra hint to open his ports.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7077 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-27 08:45:00 +00:00
orbiter
70dd26ec95 added the new crawl scheduling function to the crawl start menu:
- the scheduler extends the option for re-crawl timing. Many people misunderstood the re-crawl timing feature because that was just a criteria for the url double-check and not a scheduler. Now the scheduler setting is combined with the re-crawl setting and people will have the choice between no re-crawl, re-crawl as was possible so far and a scheduled re-crawl. The 'classic' re-crawl time is set automatically when the scheduling function is selected
- removed the bookmark-based scheduler. This scheduler was not able to transport all attributes of a crawl start and did therefore not support special crawling starts i.e. for forums and wikis
- since the old scheduler was not aber to crawl special forums and wikis, the must-not-match filter was statically fixed to all bad pages for these special use cases. Since the new scheduler can handle these filters, it is possible to remove the default settings for the filters
- removed the busy thread that was used to trigger the bookmark-based scheduler
- removed the crontab for the bookmark-based scheduler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7051 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-19 23:52:38 +00:00
orbiter
59c035c40b changed explanation of Xmx and Xms
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7038 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-13 20:03:10 +00:00
orbiter
171f2bd84e - removed unused network oanet
- added new network definition 'allip' which can be used in networks where intranet and internet-addresses shall be indexed
- added a auto-switch-off for global search if there are no global peers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7030 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-09 23:41:17 +00:00
low012
8e88fa4a62 *) fixed indetion (tab vs. spaces)
*) added Android packages MIME type

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6956 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-29 21:31:22 +00:00
orbiter
b6fb239e74 redesign of parser interface:
some file types are containers for several files. These containers had been parsed in such a way that the set of resulting parsed content was merged into one single document before parsing. Using this parser infrastructure it is not possible to parse document containers that contain individual files. An example is a rss file where the rss messages can be treated as individual documents with their own url reference. Another example is a surrogate file which was treated with a special operation outside of the parser infrastructure.
This commit introduces a redesigned parser interface and a new abstract parser implementation. The new parser interface has now only one entry point and returns always a set of parsed documents. In case of single documents the parser method returns a set of one documents.
To be compliant with the new interface, the zip and tar parser had been also completely redesigned. All parsers are now much more simple and cleaner in its structure. The switchboard operations had been extended to operate with sets of parsed files, not single parsed files.
additionally, parsing of jar manifest files had been added.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6955 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-29 19:20:45 +00:00
orbiter
11b7853940 added a configuration page for search heuristics. currently you can switch on there:
- a site-operation heuristic that loads all direct links from a portal page if the site-operator is used
- a direct crawl for search results from scroogle for the given search terms
The configuration page can be found directly beside the network configuration page


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6951 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-27 21:38:16 +00:00
orbiter
353a924760 - changed default memory to 500m
- now xms is lower than xmx (lets try what happens)
- removed default path for intranet crawl starts to avoid confusion as seen on linuxtag
- added time-out to upnp request (i have a new router which may need that)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6916 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-14 21:36:40 +00:00
orbiter
11639aef35 - added new protocol loader for 'file'-type URLs
- it is now possible to crawl the local file system with an intranet peer
- redesign of URL handling
- refactoring: created LGPLed package cora: 'content retrieval api' which may be used externally by other applications without yacy core elements because it has no dependencies to other parts of yacy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6902 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-25 12:54:57 +00:00
orbiter
90fa8fd4d4 - support gpx file extension
- non-blocking location search (time-out handling was wrong)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6871 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-12 08:49:20 +00:00
orbiter
cf43bdc87e This is a large bugfix and enhancement commit to support a better location detection for data
- fixes to http file server session handling
- fixes and enhancements to metadata date/time handling
- added dc:publisher metadata field and updated all document parser
- fixed bug in metdata read procedure
- enhanced dublin core and rss parser to understand more fields more properly
- enhanced url selection in case that multiple urls are given in surrogates
- fix for condenser; failure when last word does not end with termination symbol

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6863 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-11 11:14:05 +00:00
orbiter
a7d038bb7a The oai ListFriends source list becomes configurable: just write them into defaults/oaiListFriendsSource.xml
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6857 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-06 10:01:37 +00:00
orbiter
cf13c65bdd added another network definition file for the open access (decentral OAI) search network
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6856 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-05 22:47:03 +00:00
orbiter
5efc0dce0b fix for domain options in search box
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6848 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-30 21:53:20 +00:00
orbiter
f83b1b91b9 increased dht busy sleep time to 10 seconds to reduce TCP/IP traffic for default settings. 2 seconds had been too much traffic for home-use routers.
Please try to set your dht busy sleep time in existing installations also to 10 seconds.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6779 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-27 23:04:00 +00:00
orbiter
9623d9e6d2 added a smb loader component for the YaCy crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6737 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-10 08:55:29 +00:00
orbiter
72f00dee59 removed never-used server access account function
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6731 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-08 22:30:45 +00:00
orbiter
30c8185139 fix for sid check
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6673 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-15 23:31:32 +00:00
orbiter
ef62d017e5 integrated session id filtering for crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6672 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-15 23:15:17 +00:00
orbiter
d8d9984913 added framework for session id filtering (not ready yet)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6671 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-15 22:30:41 +00:00
lotus
945e0ba5a5 allow global search if res. observer disabled index transmission
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6658 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-09 17:14:16 +00:00
lotus
5cbef63c37 fixed bad ip pattern
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6624 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-26 14:25:15 +00:00
orbiter
8df1694288 - added options to switch on/off search domains (text, image, audio, video, app)
- more memory by default


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6605 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-21 22:03:02 +00:00
orbiter
dff4f95c78 some patches to get the torrent parser working
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6551 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-07 00:42:12 +00:00
lotus
12dd8ece3e enabled memory protection from 6459 with 50000kb (disables dht-in)
this should only apply if there is really little memory available because it is checked by threads explictly requesting memory

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6479 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-13 16:26:45 +00:00
orbiter
2bab0679e0 lost my key :-(
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6466 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-06 23:46:29 +00:00
lotus
6edc168cfe option to disable dht by memory limit:
memory.acceptDHT in kbytes
not yet pre-enabled, will clear on every startup
please review since this could break dht in freeworld

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6459 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-06 19:13:30 +00:00
lotus
79251e6f60 configurable disk space hardlimit for dht
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6441 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-31 19:12:53 +00:00
orbiter
8a1046feaa less maximum file size, too many problems with larger size
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6439 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-30 20:21:45 +00:00
orbiter
3d5eeb842a new default skin 'pdblue'
The old default skin named 'default' is renamed to 'classic-blue'.
All users will keep their current default skin named default, but YaCy will copy the classic-blue also to the skin folder.
For all new peers, the new skin pdblue is used.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6416 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-15 12:59:44 +00:00
orbiter
c864901087 - moved httpd.mime to defaults path
- some documentation fixes
- adopted a default setting for the search window: moves css setting to base.css
- some enhancements for the DocumentIndex class

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6410 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-14 13:29:09 +00:00
orbiter
735e2737e3 * added index segments
This is a major change in the organization of indexes.
Please consider a back-up of your data before you run this update.
All existing index files will be moved and renamed to a new position.
With this change, it will be possible to maintain different indexes for different purposes and it will be possible to have a distinction between DHT-in and DHT-out specific indexes. Tenants may also have their own index, and it may be possible to have histories and back-ups of indexes. This is just the beginning, many servlets must be adopted after this change, but all functions that had been there should still work.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6389 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-09 14:44:20 +00:00
orbiter
6e0dc39a7d - some fixes to prevent blocking situations
- better logging for the crawler
- better default values for the crawler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6377 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-06 21:52:55 +00:00
orbiter
23ab6fbca4 - navigation appear at correct position when opengeodb-results are also presented after a search
- show an about box if about.headline and about.body is set

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6332 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-20 22:10:45 +00:00
orbiter
721b88efbd - fixed a problem loading blacklists with new yacycore.jar
- fixed badwords and stopwords initialization

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6315 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-15 11:46:02 +00:00
orbiter
573d03c7d7 added configuration to enable ram table copy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6304 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-07 20:30:57 +00:00
orbiter
d656a94f55 fix for bad paths in dictionary processing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6285 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-02 18:24:41 +00:00
orbiter
39ae96450b draw more peers in network picture
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6248 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-08-07 08:36:15 +00:00
orbiter
c6c97f23ad - added cache usage properties to crawl start
- added special rule to balancer to omit forced delays if cache is used exclusively
- extended the htCache size by default to 32GB

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6241 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-24 11:54:04 +00:00
orbiter
5bb8074150 removed the indexing queue. This queue was superfluous since the introduction of the blocking queues last year, where documents are parsed, analysed and stored in the index with concurrency.
- The indexing queue was a historic data structure that was introduced at the very beginning at the project as a part of the switchboard organisation object structure. Without the indexing queue the switchboard queue becomes also superfluous. It has been removed as well.
- Removing the switchboard queue requires that all servlets are called without a opaque generic ('<?>'). That caused that all serlets had to be modified.
- Many servlets displayed the indexing queue or the size of that queue. In the past months the indexer was so fast that mostly the indexing queue appeared empty, so there was no use of it any more. Because the queue has been removed, the display in the servlets had also to be removed.
- The surrogate work task had been a part of the indexing queue control structure. Without the indexing queue the surrogates needed its own task management. That has been integrated here.
- Because the indexing queue had a special queue entry object and properties attached to this object, the propterties had to be moved to the queue entry object which is part of the new indexing queue withing the blocking queue, the Response Object. That object has now also the new properties of the removed indexing queue entry object.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6225 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-17 13:59:21 +00:00
orbiter
b2263bc720 enhanced document type recognition
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6209 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-14 11:01:05 +00:00
orbiter
57a88d435b redesign of parser mime type detection and parser steering
There is now a mime-blacklist instead of a mime-whitelist

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6190 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-10 14:22:17 +00:00
orbiter
8ca1f5d400 - some work to integrate the html parser the same way as the other parsers are integrated (not finished)
- added migration of code of settings pages (hmm.. does not work correctly yet, sorry)
- more refactoring
- removed more unused code

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6187 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-09 20:56:30 +00:00
orbiter
801aa08162 added f1oris update location
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6174 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-04 20:56:25 +00:00
lotus
ec2970cc76 higher dht distribution speed by default
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6168 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-03 13:28:19 +00:00
orbiter
995da28c73 all stack/heap files that had been stored in DATA/PLASMA are now stored in the network-specific QUEUES path
There is no migration. All crawls must be restarted.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6167 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-02 17:01:23 +00:00
low012
457b6c0d6d *) updated Apache POI library to be able to parse Visio files
*) updated PPT and XLS parsers to use new Apache POI library
*) added new Visio (VSD) parser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6145 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-27 09:33:09 +00:00
lotus
db70badcf0 possibility to set remote host on upnp device
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6097 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-19 15:32:59 +00:00
lotus
aec3e7995a autoconfig.pac can be used to browse .yacy-domains only
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6077 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-15 19:48:11 +00:00
orbiter
f348190566 tried to insert a database dump import method to the phpBB3 import function. Reason: imports or large database dumps are cannot be handled with phpMyAdmin and this should be an easy way to the database dumps into a mySQL database where it can be exported again with the phpBB3 content integration adapter. Completion or removal of this function stub will follow before next main release.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6065 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-13 23:03:40 +00:00
orbiter
d50be59088 - added a automatic re-construction of the domain stack after 10 minutes. this includes then urls to the domain stack that were left over in case of stack size limitations when the domain stack was created the last time
- changed the busy sleep time for the crawl thread to 30 millisecons. This is sufficient to crawl with 2000 PPM.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6028 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-06 09:34:44 +00:00
orbiter
5fdba0fa51 - fixed a not working selection rule in balancer
- more security about crawl-delay, be more fail-save
- better logging in case of long forced crawl-delays

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6027 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-06 08:46:59 +00:00
orbiter
4522c13ee7 added option for a table prefix when importing phpbb3
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5996 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-29 14:29:02 +00:00
orbiter
4b4bddca00 added new submenu to crawler menu: import of phpbb3 forum postings from mysql
- yacy can import phpbb3 posts without crawling
- all data is written as surrogate
- indexed surrogate files can be re-used

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5985 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-27 14:53:23 +00:00
orbiter
26a46b5521 increased default maximum file size for database files to 2GB
Other file sizes can now be configured with the attributes
filesize.max.win and filesize.max.other
the default maximum file size for non-windows OS is now 32GB

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5974 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-25 06:59:21 +00:00
orbiter
27eb8d62cb - new development cycle
- removed temporary configuration with safe setting for indexer threads (=1) and replaced it with best value computed during performance tests (1/2 of number of processors)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5963 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-18 21:20:06 +00:00
lotus
13fb84ab81 you can define your default number of search results displayed by search.items
this applies only to requests through the classic-style page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5953 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-15 14:48:34 +00:00
orbiter
a642d6a7b5 - added navigation icons for search result pages
- modified result page rendering to use new icons instead of numbers
- set different default values in yacy.init for higher indexing performance; removed pro-values
- modified WatchCrawler to accept 30000 PPM instead of only a maximum of 6000 PPM

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5952 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-14 23:11:10 +00:00
lotus
bad7ce9286 experimental option trayIcon.force for unsupported platforms. java 1.6 needed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5936 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-09 18:35:02 +00:00
f1ori
bde88b684a * splitt off yacyRelease from yacyVersion
* added some gui infos about signatures


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5916 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-02 12:12:22 +00:00
orbiter
d2ac0aa682 - fixed possible bugs in Stack (may affect Crawler reset) and RandomAccess handling
- increased default memory size to 180MB
- fixed possible bug in http client reset (there was a deadlock)
- bug in BOBHeap marked, but not solved, cause is still unknown.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5912 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-02 01:40:03 +00:00
orbiter
c10c257255 attempt to fix a deadlock situation where the IODispatcher did not work.
I suspect the dispatcher thread has crashed and queues filled so no indexing process was able to write data.
This fix tries to heal the problem, but I am unsure if it helps. To get a better view of the problem, some more log outputs had been inserted.
Added also a new attribut indexer.threads to get a control over the number of default threads for the indexer (default is 1)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5866 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-24 11:55:39 +00:00
orbiter
54773ad4d4 added release keys
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5857 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-22 22:46:42 +00:00