Commit Graph

1290 Commits

Author SHA1 Message Date
hermens
d732840f8a Avoid ConcurrentModificationException when accessing the PerformanceQueues page while yacy is indexing.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4170 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-19 23:36:40 +00:00
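The log does not say how this commit avoided the exception. One common approach, shown here only as a sketch, is to iterate over a copy taken under the collection's lock instead of the live collection. The class `QueueSnapshot` and its methods are hypothetical names for illustration and are not taken from the YaCy source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a queue that an indexing thread keeps modifying
// while a status page wants to list its entries.
final class QueueSnapshot {
    private final List<String> entries =
            Collections.synchronizedList(new ArrayList<>());

    // Called by the (assumed) indexing thread.
    void add(String entry) {
        entries.add(entry);
    }

    // Copies the entries while holding the list's own lock, so the caller
    // can iterate over the result without racing against add().
    List<String> snapshot() {
        synchronized (entries) {
            return new ArrayList<>(entries);
        }
    }
}
```

A page renderer would loop over `snapshot()` rather than over the live list. Entries added during rendering then show up on the next page load instead of breaking the current iteration.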
fuchsi
35303f9504 add real size values (KBytes) of the DHT-In/Out-RAM-Caches to the PerformanceQueues page. A lot of users seem to tweak this value and it might help in finding the best size in relation to the peer's memory resources.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4169 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-19 21:47:07 +00:00
fuchsi
f717beecb1 - Changed yFormatter handling to be more flexible and produce more readable code for server pages. There are serverObject.putNum() methods to allow adding of number type values in a formatted form, and put() methods for number types that add them without formatting. This reduces the need to transform them into Strings in server pages and removes the HTML encoding step which is unnecessary for numbers.
- some minor code cleanups (mostly unnecessary casts, null checks)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4166 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-19 04:13:46 +00:00
fuchsi
ca83f5a8d9 Add external lib FontBox which is part of the PDFBox (they extracted the font handling code into this package in 0.7.3).
Add the packages to the eclipse .classpath.
Closes: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=453

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4165 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-18 19:53:52 +00:00
fuchsi
e77aec8c9d fix handling of encrypted PDF-Documents (with default user password "")
- update PDFBox package to current version 0.7.3
- use new security model in PDFBox to "guess" whether we can decrypt a document or not
NOTE: When upgrading to this version make sure the old PDFBox-0.7.2.jar is removed from libx/

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4161 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-15 13:18:38 +00:00
low012
fdb0b861f8 *) fixed wrong calculation of network words, network links, network PPM if peer is senior or principal peer
*) added network QPH
*) banner is cached for 1 second to avoid DOS
*) still no logo


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4154 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-09 21:47:37 +00:00
fuchsi
508de558f7 sbStackCrawlThread is null during first cleanProfiles() run at startup.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4152 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-08 15:56:40 +00:00
fuchsi
70614385ef Attempt to fix the "lost profile handle" bug.
It seems improbable, but it might happen, that during a crawl all queues (indexing, crawling, ...) except the crawl URL stacker ran empty. This commit adds an additional check for an empty crawl stacker queue before executing the profile cleaner.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4151 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-08 15:11:26 +00:00
low012
507ecd8afa *) added banner that can be displayed like this: http://localhost:8080/Banner.png
possible arguments: textcolor, bgcolor, bordercolor
   example: http://localhost:8000/Banner.png?textcolor=ffffff&bgcolor=121212&bordercolor=ffffff
   take care: YaCy uses CMY color model!
*) there are still some known bugs, but I can't continue coding right now


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4149 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-07 21:59:36 +00:00
orbiter
33fb2f756d added emergency fail case in remote crawls
in extreme situations this will mean that no remote crawls are sent out any more
this is bad, but it protects against the case where failing remote crawls fill up the local queue too much,
which is even worse

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4141 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-04 10:40:30 +00:00
fuchsi
6b00fe0c4e fix ArrayIndexOutOfBoundsException
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4139 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-04 08:50:33 +00:00
orbiter
3e60ae93b9 modified remote search snippet fetch behavior: do not fetch snippets for more than 300 milliseconds, even if the snippets can be found locally without online fetch
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4137 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-03 16:42:11 +00:00
orbiter
97f1ca52bd fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=390
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4136 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-03 15:45:12 +00:00
orbiter
143fa40d77 fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=394&p=2382#p2382
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4135 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-03 15:34:16 +00:00
orbiter
b19bb6e5b1 - reverted svn 4132; it did not solve the problem, and it removed the emergency method, which was sure to cause a production failure within some hours
- removed and added some debugging lines

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4133 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-03 14:34:05 +00:00
fuchsi
03c5b4ad68 more fixes to the yacysearch.rss, it's now 100% valid according to http://feedvalidator.org
- RFC-822 date time had to include the time instead of date only
- <opensearch:link> doesn't exist -> <atom:link>, see http://www.opensearch.org/Specifications/OpenSearch/1.1
- <link> elements are mandatory for <channel> and <item>

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4131 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-03 04:00:52 +00:00
orbiter
c1440d2241 fixed problem with redirection: redirected URLs had not been tested with the double-check
see also: http://forum.yacy-websuche.de/viewtopic.php?f=6&t=348

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4126 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-02 22:40:53 +00:00
fuchsi
7404f2c35c Fix some of the issues with the RSS search interface, see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=392
Note: the new DateFormatter822 in the plasmaSwitchboard is just a copy of the DateFormatter that always uses the US locale to allow formatting of a locale-independent date String.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4124 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-02 21:28:29 +00:00
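To illustrate why a fixed US locale matters for RSS dates: day and month names in an RFC-822 date must be English regardless of the server's default locale. The following is a minimal sketch using the standard `java.text.SimpleDateFormat`. The class name `Rfc822Dates`, the pattern string, and the choice of GMT are assumptions for illustration and do not reproduce the actual DateFormatter822 code.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not the YaCy DateFormatter822 itself.
final class Rfc822Dates {
    // Formats a date with English day/month names and a numeric zone offset,
    // independent of the JVM's default locale.
    static String format(Date date) {
        // SimpleDateFormat is not thread-safe, so a new instance is created per call.
        SimpleDateFormat f =
                new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss Z", Locale.US);
        f.setTimeZone(TimeZone.getTimeZone("GMT"));
        return f.format(date);
    }
}
```

For example, `Rfc822Dates.format(new Date(0L))` yields `Thu, 01 Jan 1970 00:00:00 +0000`, even on a JVM whose default locale is German.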
orbiter
98abe0804d another enhancement to crawl starts with link files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4123 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-02 20:30:42 +00:00
orbiter
1b42152a76 fixed and enhanced some details in crawl start with file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4120 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-02 00:49:38 +00:00
orbiter
4465db7399 removed debug information from network graphic
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4118 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-01 12:32:10 +00:00
orbiter
01e0669264 re-designed some parts of DHT position calculation (effect is the same as before)
and replaced old first hash computation by new method that tries to find a gap in the current dht
to do this, it is necessary that the network bootstrapping is done before the own hash is computed
this made further redesigns in peer initialization order necessary

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4117 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-01 12:30:23 +00:00
orbiter
5b1a937ed8 fix for crawl stack database format change, introduced in SVN 4113
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4115 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-28 08:17:08 +00:00
orbiter
af25c98306 enhanced local search performance in case of a remote search:
there is no waiting until the local search terminates to show the result page.
the local search results appear like all other results from remote peers, using a separate thread.
This has an especially strong effect if the local index for a specific word is large.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4114 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-28 01:36:22 +00:00
orbiter
842308ea97 - redesigned crawl start menu, integrated monitoring pages
- removed web structure picture from indexing menu and grouped it together with htcache monitor
- added a database for terminated crawls, when a crawl is finished it is automatically moved to the new database
- extended crawl profile edit servlet, shows now also terminated crawls
- option that was used to delete profiles is now redesigned to a function that moves the current crawl to the terminated crawls and removes all urls from the current queues!
- fixed here and there problems with indexing queues
- enhanced indexing speed by changing cache flush sizes.
- changed behaviour of crawl result servlet: the list of crawled urls is shown if there is one, otherwise the overview window is shown

attention: the new profile databases are not compatible with the old one. current crawls will be lost! the web index is not touched.
next steps: the database of terminated crawls can be used to start a new crawl from them. This is useful if one wants to re-crawl specific pages and wants to use an old crawl profile.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4113 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-28 01:21:31 +00:00
orbiter
341f7cb327 steps to enhance remote search performance:
- added a file size limitation, that disallows parsing of large documents during (offline-) remote search
- added profiling information to search result computation, visible at search access tracker. this info shows used time for URL fetch and snippet computation

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4112 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-26 10:11:50 +00:00
orbiter
2f1ff048ba some fixes to socket connection time-out
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4111 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-25 23:45:05 +00:00
orbiter
3c74014004 automatic deletion of dead client connections
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4110 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-25 22:46:11 +00:00
orbiter
11b4f80bde - fixed non-closing client connections
- added client connection tracker in connections servlet

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4108 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-25 21:36:08 +00:00
orbiter
1488769e1f cleanup of unmaintained and outdated performance methods:
removed object pools in httpc. Object pooling is not recommended,
if the creation of the object is not time-intensive. Object pools are only useful,
if there is much computation necessary to create some basic data that is stored
in the object pool and can be re-used. This does not apply to object pools in YaCy.
Object pooling of client sessions would make sense if they would allow re-use of
living connections to other yacy clients. But every connection is closed after usage
of an object in the client pool, therefore the YaCy server client objects are not such
that hold hardware/network-allocated entities.
See:
http://www.javaperformancetuning.com/news/qotm033.shtml
http://java.sun.com/docs/hotspot/HotSpotFAQ.html#gc_pooling
http://docs.sun.com/source/816-7159-10/pt_chap5.html
http://www.microjava.com/articles/techtalk/recylcle2


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4106 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-23 20:49:52 +00:00
orbiter
4275727d69 fix for peer ping problem (implemented a 3-time re-ping); cause for 'Connection reset' still unknown
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4095 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-12 00:42:53 +00:00
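The re-ping idea can be sketched as a bounded retry loop. This is an illustration under assumptions, not the YaCy implementation. The class `Retry` and method `withRetries` are hypothetical names, and reading "3-time re-ping" as three attempts in total (rather than three retries after the first failure) is a guess.

```java
import java.util.concurrent.Callable;

// Hypothetical helper illustrating a bounded retry loop.
final class Retry {
    // Calls the action up to maxAttempts times and returns the first
    // successful result; rethrows the last failure if every attempt fails.
    static <T> T withRetries(int maxAttempts, Callable<T> action) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }
}
```

A caller would wrap the ping in a `Callable`, for example `Retry.withRetries(3, () -> ping(peer))`, where `ping` and `peer` are placeholders. A retry loop like this only masks a sporadic 'Connection reset'. It does not explain it, which matches the commit's remark that the cause is still unknown.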
fuchsi
6601e37512 clear caches after changing blacklists, closes http://forum.yacy-websuche.de/viewtopic.php?f=6&t=241&p=1964#p1964
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4088 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-10 08:15:25 +00:00
fuchsi
5b0c1449e1 various fixes and cleanups for blacklist handling:
1. avoid adding duplicate file name entries in config properties for lists, 
2. correctly merge all path masks from all list files for the same host masks,
3. rewrite helper methods to use standard java methods for Collection transformations,
4. merged various methods with identical functionality for different Collection implementations into one,
5. minor refactoring to improve code readability.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4087 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-10 06:20:27 +00:00
orbiter
841cf71022 fix for NPE in DHT transfer selection, see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=327
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4085 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-09 19:08:13 +00:00
orbiter
f2a3434407 fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=238&p=1341#p1341
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4082 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-09 17:31:29 +00:00
orbiter
f4a5c287fe re-implemented post-ranking of search results
(should enhance search result quality)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4080 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-08 11:50:19 +00:00
orbiter
8ff5e2c283 - fixed/re-implemented media search
- fixed search tips (topwords, now appearing at the bottom of the page)
- added search consequences execution (deletion of bad references some time after the search happened)
- added some formatting at network table

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4078 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-07 11:45:38 +00:00
orbiter
6c819a6fd9 added cache to favicon display
added better synchronization for simultaneous search requests

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4076 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-06 01:28:35 +00:00
borg-0300
d69013f66a added patch from Fuchs - http://forum.yacy-websuche.de/viewtopic.php?f=6&t=241
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4075 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-05 11:51:02 +00:00
orbiter
daf0f74361 joined anomic.net.URL, plasmaURL and url hash computation:
search profiling showed that a major amount of time is wasted by computing url hashes. The computation does an intranet check, which needs a DNS lookup. As a result, each urlhash computation needed 100-200 milliseconds, which delayed remote searches by at least 1 second more than necessary. The solution to this problem is to attach a URL hash to the URL data structure, because that means that the url hash value can be filled after retrieval of the URL from the database. The redesign of the url/urlhash management caused a major redesign of many parts of the software. Some parts that had already been slated for removal were removed during this change to avoid unnecessary maintenance of unused code.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4074 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-05 09:01:35 +00:00
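The idea of attaching the hash to the URL object can be sketched as follows. This is a simplified illustration under assumptions. `HashedUrl`, its fields, and the placeholder hash function are hypothetical, and the real YaCy hash (including its DNS-based intranet check) is not reproduced here.

```java
// Hypothetical URL wrapper that carries its hash along with the URL string.
final class HashedUrl {
    private final String url;
    private String hash;   // null until supplied or computed
    int computeCount = 0;  // counts expensive computations, for demonstration only

    // Used when the hash is not yet known, e.g. for a freshly discovered link.
    HashedUrl(String url) {
        this(url, null);
    }

    // Used when the hash was stored next to the URL, e.g. read from a database row.
    HashedUrl(String url, String knownHash) {
        this.url = url;
        this.hash = knownHash;
    }

    // Returns the attached hash, computing it at most once per object.
    synchronized String hash() {
        if (hash == null) {
            hash = expensiveHash(url);
        }
        return hash;
    }

    // Placeholder for the costly computation described in the commit message.
    private String expensiveHash(String u) {
        computeCount++;
        return Integer.toHexString(u.hashCode());
    }
}
```

With this shape, a URL loaded from the database together with its stored hash never triggers the expensive path, and a URL without a stored hash pays the cost only once.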
orbiter
e90afa9483 fixed search access tracker
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4072 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-04 09:04:47 +00:00
orbiter
4779f314fe first version of next-generation search interface:
- snippets are not fetched by browser using ajax, they are now fetched internally
- YaCy-internal threads control existence of snippets and sort out bad results
- search results are prepared using SSI includes
- the search result page is visible right after the search request, the results drop in when they are detected
- no more time-out strategy during search processes, results are shifted within queues when they arrive from remote peers
- added result page switching! after the first 10 results, the next page can be retrieved
- number of remote results is updated online on the result page as they drop in
- removed old snippet servlet (which had also been a security leak btw)
- media search is broken now, will be redesigned and fixed in another step


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4071 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-03 23:43:55 +00:00
orbiter
f9e6cf6a3d more refactoring of search:
integrated first version of ssi-using search interface,
but the function is currently disabled


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4063 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-28 12:15:46 +00:00
orbiter
f81ef40cc4 no dht activity for small networks; this is not needed if the network is small
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4062 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-26 22:35:26 +00:00
orbiter
d9472b6a3a * fixed problem with watch crawler
* added new column to network table (remote crawl urls):
  the new value for provided URLs will be used for new remote crawl method


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4061 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-26 22:06:58 +00:00
orbiter
e332b844b2 - enhanced remote search: during waiting time for remote crawls
some urls are fetched so the url cache can be filled with these urls
- the url-prefetch is used to sort out some unresolved urls
- the snippet-fetcher is triggered with the search event id. This is used
  to remove missing snippets from the search cache so they will not be displayed again


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4060 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-26 18:18:35 +00:00
orbiter
a34d9b8609 * added a search history cache that maintains search results for 10 minutes
it is necessary for the new search process that will do automatic re-searches
a positive effect is, that when a re-search is done it can be monitored how many
results had been contributed from other peers. The message for this contribution
was moved from the end of the result page to the top.
* enhanced re-search time when a global search was done and the local index
already has a great number of results for this word
* re-organised presearch computation; must be further enhanced

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4059 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-24 23:12:59 +00:00
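A search history cache that keeps results for a fixed time could look roughly like the sketch below. It is not the YaCy code. The class `ExpiringCache` and its API are assumptions, and the current time is passed in explicitly so the expiry rule is easy to follow and to test.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical time-limited cache; entries older than maxAgeMillis are dropped on access.
final class ExpiringCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long storedAt;

        Entry(T value, long storedAt) {
            this.value = value;
            this.storedAt = storedAt;
        }
    }

    private final long maxAgeMillis;
    private final Map<K, Entry<V>> map = new HashMap<>();

    ExpiringCache(long maxAgeMillis) {
        this.maxAgeMillis = maxAgeMillis;
    }

    // Stores a value together with the time at which it was stored.
    synchronized void put(K key, V value, long nowMillis) {
        map.put(key, new Entry<>(value, nowMillis));
    }

    // Returns the value if it is still fresh, otherwise removes it and returns null.
    synchronized V get(K key, long nowMillis) {
        Entry<V> e = map.get(key);
        if (e == null) {
            return null;
        }
        if (nowMillis - e.storedAt > maxAgeMillis) {
            map.remove(key);
            return null;
        }
        return e.value;
    }
}
```

For the 10-minute lifetime mentioned in the commit, the cache would be created as `new ExpiringCache<String, Object>(10L * 60L * 1000L)` and called with `System.currentTimeMillis()` as the time argument. Entries that are never requested again stay in the map in this sketch, so a real implementation would also need a periodic cleanup.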
orbiter
ae86d010bb more refactoring of search processes; also some small speed enhancements
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4058 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-24 08:41:52 +00:00
orbiter
bb426565f0 added new yacy protocol for mass url-pull for better remote crawling distribution
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4056 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-22 00:59:05 +00:00
orbiter
72752bb503 because of a new database structure handling, the memory need for accessing
collection objects has been reduced to 50%:
- set new memory calculation functions for indexing process
- adjusted guessed memory amount
-> Testing needed:
   try new recommended value (see performanceQueues) and see if OOMs occur.
-> report maximum recommended value, so we can set new default values.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4053 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-20 17:36:43 +00:00