Commit Graph

5221 Commits

Author SHA1 Message Date
orbiter
13cb0916ee changes to statistics and content of thread dump servlet
(points now more directly to performance leaks without mentioning class calls inside of sun/java calls that cannot be changed anyway)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5390 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-11 20:13:14 +00:00
orbiter
db6b3bf5a3 speed enhancement for integrated http server:
- tuning hacks in template engine
- bypassing the template engine if no servlet present

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5389 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-11 20:10:37 +00:00
orbiter
7cd08bd5fb fix for NPE in BLOBCompressor
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5388 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-11 13:33:24 +00:00
orbiter
5b94498643 fine-tuning of cache usage from SVN 5386 and a bug fix for overflow in available() method
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5387 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-10 14:35:01 +00:00
orbiter
1779c3c507 - added a read cache to the RAFile interface to RandomAccessFile
- added a write buffer to BLOBHeap
- modified the BLOBBuffer (is now only to buffer non-compressed content)
- added content compression to the HTCache
The new read cache will decrease the start/initialization time of BLOB files,
like the HTCache, RobotsTxt and other BLOBHeap structures.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5386 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-10 11:15:19 +00:00
orbiter
e1acdb952c fix for problem with userDB and bookmarksDB which was caused by changes in kelondroRA in SVN 5376
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5385 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-08 00:17:45 +00:00
lotus
2c682d649b - no stop shortcut (-> stop via tray)
- store registry keys on current profile

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5384 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-07 19:37:49 +00:00
lotus
e918d64c23 show hand-cursor on labels
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5383 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-06 17:32:53 +00:00
orbiter
4a2dac659e more speed hacks:
- modified and activated write buffer
- increased cache flush factor
- fixed a problem with deadlocking of indexing process

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5382 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-05 13:55:48 +00:00
lotus
07d7653de1 update to JRE 6u11
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5381 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-05 11:23:01 +00:00
lotus
1fb518a5b4 display <String> etc.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5380 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-04 20:21:53 +00:00
orbiter
47292e696a more performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5379 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-04 12:54:16 +00:00
orbiter
759cef23dd fix for bug in kelondroAbstractRA.readFully
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5378 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-03 23:32:07 +00:00
orbiter
bd1dc9cd5d thread dump with statistics, a little bit of profiling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5377 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-03 23:26:25 +00:00
orbiter
d39d420b39 performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5376 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-03 15:38:29 +00:00
lotus
5280ad638d added basic performance page
other performance settings can be found on advanced settings

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5375 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-03 14:10:01 +00:00
lotus
1a51d9fcfd display proper values
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5374 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-02 17:57:30 +00:00
orbiter
0b4808ba3d added new interactive search feature:
- while the user types search queries, the local database is searched
- results are presented interactively

This was implemented using a new JSON result format for search results in YaCy
- added JSON as file format for servlets
- refactoring of current search servlets (xml and html)
- added JSON output format for search results
- added AJAX-based search page that uses the yacysearch.json servlet to print results as a query is typed

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5373 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-02 15:24:25 +00:00
orbiter
74a3d86114 fixed an error response that might present classified information
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5372 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-01 23:14:42 +00:00
orbiter
c6525ab75f fix for NPE in seed handling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5371 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-01 23:08:27 +00:00
lotus
fea82b54ef more contrast on search snippets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5370 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-26 19:57:13 +00:00
lotus
1951d30a62 addendum to last commit
handle words with length < 3 correctly

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5369 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-26 19:43:40 +00:00
lotus
325ba7bfb8 only query words with length > 2
this is not complete yet

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5368 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-26 16:41:38 +00:00
lotus
489edb4473 improved pattern selection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5367 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-26 10:06:38 +00:00
low012
e423fa9846 *) added method to only get file names in directory listing which match a filter
*) only files which end with .black will be listed as blacklists
*) added a little bit of Javadoc

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5366 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-25 20:26:06 +00:00
lotus
577b53aee6 added more search engines
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5365 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-24 13:05:20 +00:00
lotus
7f4d411c0d npe-fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5364 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-24 13:04:57 +00:00
orbiter
513179f404 changed interface to collectionIndex and adapted all implementing classes:
do not return a result of a double-check when adding entries with addUnique

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5363 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-23 23:55:08 +00:00
orbiter
9d64693cfb reverting again the changes to new concurrent chunkIterator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5362 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-23 22:22:44 +00:00
orbiter
45ad1c3dd5 - re-activated concurrent iterator for EcoFiles
- added javadoc for new concurrent initialization in kelondroBytesLongMap
- switched default value for commons storage to false
- version step

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5361 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-23 18:25:40 +00:00
orbiter
2e2120046f speed enhancement for BLOBHeap opening process
using concurrency of FileIO and content processing

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5360 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-23 17:38:01 +00:00
lotus
1545e5440a * index deletion: checkbox-confirmation
* watch crawler: less load on exhausted peers; wait for data before reloading again

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5359 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-23 12:02:58 +00:00
orbiter
fa26a8f25a fix for deadlock-like behavior in balancer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5358 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-22 11:25:01 +00:00
orbiter
1918a0173e added more exception handling during crawling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5357 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-22 00:40:18 +00:00
orbiter
10f5ec1040 reverted last commit (more testing needed)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5356 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-22 00:12:50 +00:00
f1ori
5af8923f37 * distribute forgotten jar-file in parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5355 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-22 00:05:04 +00:00
orbiter
b0f2003792 fast database initialization and fast start-up of YaCy:
- applied knowledge about concurrent file stream reading and index processing from the wikimedia reader
   to the EcoTable initialization process: the file reader is now concurrent to the index generation
- changed also some initialization processes to avoid some pauses during initialization

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5354 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-21 23:21:33 +00:00
daburna
ba5b274b8c #translation update:
-blacklist
-crawlstart
...

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5353 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-21 16:45:45 +00:00
orbiter
0ca4bc7b79 - added reader and visualization for mediawiki-export files:
files exported from mediawiki using the xml schema according to
http://www.mediawiki.org/xml/export-0.3/
can be processed to be viewed in a YaCy servlet.
To access such a file, place it into
DATA/HTCACHE/mediawiki/
e.g. the export from the German Wikipedia would be:
DATA/HTCACHE/mediawiki/wikipedia.de.xml
This file can then be accessed using the URL
http://localhost:8080/mediawiki_p.html?dump=wikipedia.de.xml&title=YaCy
If this is done for the first time, an index file is created
(for this case: more than 4 million lines must be written, which takes about 15 minutes).
Then try the same URL again.

- enhanced also the md5 computation speed


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5352 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-20 18:31:52 +00:00
danielr
2e63f03ca5 forgot copy&paste :/
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5351 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-20 11:41:11 +00:00
danielr
cd8082b4e3 fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1111#p11166
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5350 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-20 11:18:19 +00:00
lotus
4f996a7651 fix for logparser pattern
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5349 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-17 16:23:17 +00:00
f1ori
d18c18971e * dirlisting in UTF-8 encoding
* fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1550&hilit=#p11108


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5348 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-15 20:49:03 +00:00
lotus
bb570716e6 added more testfiles
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5347 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-15 09:00:24 +00:00
orbiter
867d0f2f56 removed some unnecessary pause delays
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5346 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-14 23:36:33 +00:00
f1ori
d49ffcd818 * files distributed by yacy are utf-8, files from repository use the system default charset
* fixes http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1564#p11092
  and http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1550


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5345 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-14 20:49:16 +00:00
orbiter
8c96bc2ac1 do not use proxy caching rules for crawling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5344 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-14 16:31:04 +00:00
lotus
fd83e59f8e new remote search average
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5343 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-14 11:50:46 +00:00
orbiter
dba7ef5144 extended crawling constraints:
- removed never-used secondary crawl depth
- added a must-not-match filter that can be used to exclude urls from a crawl
- added stub for crawl tags which will be used to identify search results that had been produced from specific crawls
Please update the yacybar: replace property name 'crawlFilter' with 'mustmatch'.
Additionally, a new parameter named 'mustnotmatch' can be used, which should by default be the empty string (match-never)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5342 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-14 09:58:56 +00:00
orbiter
96174b2b56 more debugging / better result status logging for parser/caching errors
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5341 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-13 23:41:43 +00:00