Commit Graph

5074 Commits

Author SHA1 Message Date
lotus
18513e2ee2 npe fix: http://forum.yacy-websuche.de/viewtopic.php?t=1646
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5393 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-16 13:36:13 +00:00
orbiter
2802138787 - refactoring of CrawlStacker (to prepare it for new multi-Threading to remove DNS lookup bottleneck)
- fix of shallBeOwnWord target computation heuristic


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5392 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-15 00:02:58 +00:00
lotus
b1e211b258 no error-alert: http://forum.yacy-websuche.de/viewtopic.php?t=1639
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5391 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-13 12:04:08 +00:00
orbiter
13cb0916ee changes to statistics and content of thread dump servlet
(points now more directly to performance leaks without mentioning class calls inside of sun/java calls that cannot be changed anyway)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5390 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-11 20:13:14 +00:00
orbiter
db6b3bf5a3 speed enhancement for integrated http server:
- tuning hacks in template engine
- bypassing the template engine if no servlet present

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5389 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-11 20:10:37 +00:00
orbiter
7cd08bd5fb fix for NPE in BLOBCompressor
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5388 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-11 13:33:24 +00:00
orbiter
5b94498643 fine-tuning of cache usage from SVN 5386 and a bug fix for overflow in available() method
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5387 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-10 14:35:01 +00:00
orbiter
1779c3c507 - added a read cache to the RAFile interface to RandomAccessFile
- added a write buffer to BLOBHeap
- modified the BLOBBuffer (is now only to buffer non-compressed content)
- added content compression to the HTCache
The new read cache will decrease the start/initialization time of BLOB files,
like the HTCache, RobotsTxt and other BLOBHeap structures.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5386 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-10 11:15:19 +00:00
orbiter
e1acdb952c fix for problem with userDB and bookmarksDB which was caused by changes in kelondroRA in SVN 5376
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5385 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-08 00:17:45 +00:00
lotus
2c682d649b - no stop shortcut (-> stop via tray)
- store registry keys on current profile

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5384 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-07 19:37:49 +00:00
lotus
e918d64c23 show hand-cursor on labels
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5383 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-06 17:32:53 +00:00
orbiter
4a2dac659e more speed hacks:
- modified and activated write buffer
- increased cache flush factor
- fixed a problem with deadlocking of indexing process

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5382 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-05 13:55:48 +00:00
lotus
07d7653de1 update to JRE 6u11
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5381 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-05 11:23:01 +00:00
lotus
1fb518a5b4 display <String> etc.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5380 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-04 20:21:53 +00:00
orbiter
47292e696a more performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5379 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-04 12:54:16 +00:00
orbiter
759cef23dd fix for bug in kelondroAbstractRA.readFully
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5378 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-03 23:32:07 +00:00
orbiter
bd1dc9cd5d thread dump with statistics, a little bit of profiling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5377 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-03 23:26:25 +00:00
orbiter
d39d420b39 performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5376 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-03 15:38:29 +00:00
lotus
5280ad638d added basic performance page
other performance settings can be found on advanced settings

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5375 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-03 14:10:01 +00:00
lotus
1a51d9fcfd display proper values
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5374 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-02 17:57:30 +00:00
orbiter
0b4808ba3d added new interactive search feature:
- while the user types search queries, the local database is searched
- results are presented interactively

This was implemented using a new JSON result format for search results in YaCy
- added JSON as file format for servlets
- refactoring of current search servlets (xml and html)
- added JSON output format for search results
- added AJAX-based search page that uses the yacysearch.json servlet to print results as a query is typed

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5373 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-02 15:24:25 +00:00
orbiter
74a3d86114 fixed an error response that might expose classified information
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5372 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-01 23:14:42 +00:00
orbiter
c6525ab75f fix for NPE in seed handling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5371 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-01 23:08:27 +00:00
lotus
fea82b54ef more contrast on search snippets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5370 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-26 19:57:13 +00:00
lotus
1951d30a62 addendum to last commit
handle words with length < 3 correctly

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5369 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-26 19:43:40 +00:00
lotus
325ba7bfb8 only query words with length > 2
this is not complete yet

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5368 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-26 16:41:38 +00:00
lotus
489edb4473 improved pattern selection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5367 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-26 10:06:38 +00:00
low012
e423fa9846 *) added method to only get file names in directory listing which match a filter
*) only files which end with .black will be listed as blacklists
*) added a little bit of Javadoc

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5366 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-25 20:26:06 +00:00
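
The filtered directory listing described in commit e423fa9846 could look roughly like the following Java sketch. It is not taken from the YaCy source: the class name `BlacklistFiles`, the constant `BLACKLIST_FILTER`, and the method `listFileNames` are hypothetical names chosen for illustration. Only `java.io.File.list(FilenameFilter)` is a real JDK call.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not the YaCy implementation.
class BlacklistFiles {

    // Assumption: a blacklist is any entry whose name ends with ".black".
    static final FilenameFilter BLACKLIST_FILTER =
            (dir, name) -> name.endsWith(".black");

    // Returns the sorted names in 'dir' that are accepted by 'filter'.
    static List<String> listFileNames(File dir, FilenameFilter filter) {
        // File.list returns null if 'dir' is not a directory or an I/O error occurs.
        String[] names = dir.list(filter);
        if (names == null) {
            return Collections.emptyList();
        }
        Arrays.sort(names);
        return Arrays.asList(names);
    }

    public static void main(String[] args) {
        // Directory to scan; defaults to the current directory.
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (String name : listFileNames(dir, BLACKLIST_FILTER)) {
            System.out.println(name);
        }
    }
}
```

Keeping the filter as a separate value means the same listing method can be reused for other suffixes without changing its code.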
lotus
577b53aee6 added more search engines
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5365 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-24 13:05:20 +00:00
lotus
7f4d411c0d npe-fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5364 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-24 13:04:57 +00:00
orbiter
513179f404 changed interface to collectionIndex and adapted all implementing classes:
do not return a result of a double-check when adding entries with addUnique

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5363 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-23 23:55:08 +00:00
orbiter
9d64693cfb reverting again the changes to new concurrent chunkIterator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5362 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-23 22:22:44 +00:00
orbiter
45ad1c3dd5 - re-activated concurrent iterator for EcoFiles
- added javadoc for new concurrent initialization in kelondroBytesLongMap
- switched default value for commons storage to false
- version step

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5361 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-23 18:25:40 +00:00
orbiter
2e2120046f speed enhancement for BLOBHeap opening process
using concurrency of FileIO and content processing

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5360 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-23 17:38:01 +00:00
lotus
1545e5440a * index deletion: checkbox-confirmation
* watch crawler: less load on exhausted peers; wait for data before reloading again

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5359 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-23 12:02:58 +00:00
orbiter
fa26a8f25a fix for deadlock-like behavior in balancer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5358 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-22 11:25:01 +00:00
orbiter
1918a0173e added more exception handling during crawling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5357 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-22 00:40:18 +00:00
orbiter
10f5ec1040 reverted last commit (more testing needed)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5356 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-22 00:12:50 +00:00
f1ori
5af8923f37 * distribute forgotten jar-file in parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5355 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-22 00:05:04 +00:00
orbiter
b0f2003792 fast database initialization and fast start-up of YaCy:
- applied knowledge about concurrent file stream reading and index processing from the wikimedia reader
   to the EcoTable initialization process: the file reader now runs concurrently with the index generation
- also changed some initialization processes to avoid some pauses during initialization

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5354 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-21 23:21:33 +00:00
daburna
ba5b274b8c #translation update:
-blacklist
-crawlstart
...

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5353 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-21 16:45:45 +00:00
orbiter
0ca4bc7b79 - added reader and visualization for mediawiki-export files:
files exported from mediawiki using the xml schema according to
http://www.mediawiki.org/xml/export-0.3/
can be processed to be viewed in a YaCy servlet.
To access such a file, place it into
DATA/HTCACHE/mediawiki/
e.g. the export from the German Wikipedia would be:
DATA/HTCACHE/mediawiki/wikipedia.de.xml
This file can then be accessed using the URL
http://localhost:8080/mediawiki_p.html?dump=wikipedia.de.xml&title=YaCy
If this is done for the first time, an index file is created
(in this case more than 4 million lines must be written, which takes about 15 minutes).
Then try the same URL again.

- enhanced also the md5 computation speed


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5352 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-20 18:31:52 +00:00
danielr
2e63f03ca5 forgot copy&paste :/
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5351 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-20 11:41:11 +00:00
danielr
cd8082b4e3 fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1111#p11166
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5350 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-20 11:18:19 +00:00
lotus
4f996a7651 fix for logparser pattern
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5349 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-17 16:23:17 +00:00
f1ori
d18c18971e * dirlisting in UTF-8 encoding
* fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1550&hilit=#p11108


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5348 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-15 20:49:03 +00:00
lotus
bb570716e6 added more testfiles
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5347 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-15 09:00:24 +00:00
orbiter
867d0f2f56 removed some unnecessary pause delays
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5346 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-14 23:36:33 +00:00
f1ori
d49ffcd818 * files distributed by yacy are utf-8, files from repository use the system default charset
* fixes http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1564#p11092
  and http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1550


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5345 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-14 20:49:16 +00:00
orbiter
8c96bc2ac1 do not use proxy caching rules for crawling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5344 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-14 16:31:04 +00:00