Commit Graph

3674 Commits

Author SHA1 Message Date
orbiter
876746602d catch problems of file hash computation, see also:
http://forum.yacy-websuche.de/viewtopic.php?p=15245#p15245

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5989 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-28 10:08:36 +00:00
orbiter
fec6f9054f some refactoring of search methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5988 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-27 23:51:34 +00:00
orbiter
3d4b826ca5 migration of all databases that use the deprecated BLOBTree format into the BLOBHeap format. Old databases are migrated automatically.
This removes the last very IO-intensive data structures which were still used for Wiki, Blog and Bookmarks. Old database files will still remain in the DATA subdirectory but can be deleted manually if no major bugs appear during migration. There is no need for any user action, all migration is done automatically.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5986 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-27 15:04:04 +00:00
orbiter
4b4bddca00 added new submenu to crawler menu: import of phpbb3 forum postings from mysql
- yacy can import phpbb3 posts without crawling
- all data is written as surrogate
- indexed surrogate files can be re-used

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5985 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-27 14:53:23 +00:00
orbiter
d8284046b0 enhanced speed of site navigation computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5980 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-26 22:30:20 +00:00
orbiter
c72a5cf326 added stub for PHPBB3 extraction code using direct access to mySQL
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5979 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-26 15:58:49 +00:00
orbiter
e735d3a69f fix for http://forum.yacy-websuche.de/viewtopic.php?p=15175#p15175
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5978 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-26 15:03:50 +00:00
orbiter
63a0255166 - refactoring: added new content package, which will contain connector classes for different types of data sources to import texts into the YaCy index
- refactoring: migrated data objects for the new connector classes
- added a DAO interface class to specify an abstract interface for database retrieval connector methods

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5977 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-26 07:44:22 +00:00
orbiter
f246928c20 first attempt to add 'real' Navigation to yacy search results: host navigation
- after a search is started, it is analysed how many hits are in each site
- this can be done really efficiently, because the navigation information is hidden in the url hash and can be computed very fast
- the search result shows a column on the right with the hosts and the hits per host
- after a click on a host the search is modified using the efficient site: - operator

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5976 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-25 22:27:34 +00:00
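
The host navigation described in this commit boils down to counting hits per host. A minimal sketch of that counting step follows. Note the swap: the commit says YaCy derives the host from the url hash, whereas this sketch simply parses the host out of each result URL. The class and method names are hypothetical and are not taken from the YaCy code base.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not YaCy code: counts search hits per host.
final class HostNavigatorSketch {

    // Returns host -> number of hits, in order of first appearance.
    static Map<String, Integer> hitsPerHost(List<String> resultUrls) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String url : resultUrls) {
            String host = URI.create(url).getHost();
            if (host == null) continue; // skip URLs without a host part
            counts.merge(host, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> hits = List.of(
            "http://example.org/a",
            "http://example.org/b",
            "http://example.net/c");
        // Each entry could back one row of the navigation column.
        System.out.println(hitsPerHost(hits));
    }
}
```

Clicking a host in the navigation column would then re-run the query with a site: constraint for that host, as the commit describes.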
orbiter
54b9e99c01 - more information about peer tags
- peer tag is by default '*'

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5975 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-25 21:43:33 +00:00
orbiter
26a46b5521 increased default maximum file size for database files to 2GB
Other file sizes can now be configured with the attributes
filesize.max.win and filesize.max.other
the default maximum file size for non-windows OS is now 32GB

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5974 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-25 06:59:21 +00:00
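
A minimal sketch of how the two attributes from this commit might be resolved. Only the keys filesize.max.win and filesize.max.other and the 2GB / 32GB defaults come from the commit message. It is an assumption here that 2GB is the Windows default and that values are given in bytes. The class name and the OS check are also hypothetical.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical helper, not YaCy code: picks the maximum database file size.
final class MaxFileSizeSketch {
    static final long GB = 1024L * 1024L * 1024L;

    // Assumption: configured values are plain byte counts.
    static long maxFileSize(Properties config, String osName) {
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        String key = windows ? "filesize.max.win" : "filesize.max.other";
        long fallback = windows ? 2L * GB : 32L * GB; // defaults named in the commit
        String value = config.getProperty(key);
        if (value == null) return fallback;
        try {
            return Long.parseLong(value.trim());
        } catch (NumberFormatException e) {
            return fallback; // unreadable value: fall back to the default
        }
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        System.out.println(maxFileSize(config, System.getProperty("os.name")));
    }
}
```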
orbiter
addecdb18c simplified code, removed one unused method in all implementing classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5972 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-21 23:53:01 +00:00
borg-0300
47fce9020c small change (Orbiter's wish)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5971 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-21 17:51:52 +00:00
borg-0300
e07b14e5d7 finally a working fix for 5960
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5970 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-21 16:07:04 +00:00
borg-0300
3ebb904d2c fix for 5960, http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2119
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5969 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-21 11:47:57 +00:00
lotus
734680dc70 initialize the ResourceObserver in its own thread
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5968 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-21 08:30:34 +00:00
orbiter
e005cfea37 fix for bug in -incell option of URLAnalysis
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5967 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-21 06:57:03 +00:00
orbiter
a7e392f31b The collection index will not be supported any more.
Existing indexes based on the old index collections must be migrated with YaCy 0.8
- removed index collection classes and all migration tools
- added an 'incell' reference collection feature in URL analysis


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5966 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-20 14:51:26 +00:00
orbiter
a2f48863fc - added prototype for navigation index
- refactoring of word index prototype
(no functional changes so far)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5965 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-20 09:00:24 +00:00
lotus
47fd226bdb proper parsing of sentences
does not affect tokens/words

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5964 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-19 16:41:27 +00:00
orbiter
27eb8d62cb - new development cycle
- removed temporary configuration with safe setting for indexer threads (=1) and replaced it with best value computed during performance tests (1/2 of number of processors)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5963 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-18 21:20:06 +00:00
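
The "1/2 of number of processors" rule from this commit can be sketched as below. The lower bound of one thread on single-core machines is an assumption, as are the class and method names. The commit only states the ratio.

```java
// Hypothetical helper, not YaCy code: default number of indexer threads.
final class IndexerThreadsSketch {

    // Half of the available processors, but never fewer than one thread (assumed floor).
    static int defaultIndexerThreads(int processors) {
        return Math.max(1, processors / 2);
    }

    public static void main(String[] args) {
        int processors = Runtime.getRuntime().availableProcessors();
        System.out.println(defaultIndexerThreads(processors));
    }
}
```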
orbiter
b7457d3807 patch for http://forum.yacy-websuche.de/viewtopic.php?p=14720#p14720
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5960 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-17 21:44:02 +00:00
orbiter
bffbe43e09 fix for http://forum.yacy-websuche.de/viewtopic.php?p=14522#p14522
fix for http://forum.yacy-websuche.de/viewtopic.php?p=14955#p14955

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5959 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-17 21:15:06 +00:00
orbiter
f133d6065c fix for http://forum.yacy-websuche.de/viewtopic.php?p=14955#p14955
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5958 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-17 18:28:33 +00:00
lotus
82af994041 added missing loglevel
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5956 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-16 08:21:51 +00:00
orbiter
ad9762746d no exception in case of uniq() time-out, see also
http://forum.yacy-websuche.de/viewtopic.php?p=13177#p13177

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5955 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-15 23:07:10 +00:00
orbiter
1efe686e3f fix for http://forum.yacy-websuche.de/viewtopic.php?p=13960#p13960
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5954 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-15 22:51:15 +00:00
lotus
13fb84ab81 you can define your default number of search results displayed by search.items
this applies only to requests through the classic-style page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5953 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-15 14:48:34 +00:00
orbiter
f2e4d156e8 removed debug messages
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5950 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-12 22:03:33 +00:00
orbiter
709bfc2cd4 added a memory check in http post protocol
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5949 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-12 20:23:55 +00:00
orbiter
c01d6f43e1 - fixed problem with thread dump if no arguments are given
- rejecting peers that are older than 6 hours (not-seen during 6 hours)
- 0.78, targeting 0.8 at the end of the week

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5948 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-11 22:26:21 +00:00
orbiter
a49edd9415 fix for bug in search with site: constraint
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5947 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-11 21:20:23 +00:00
orbiter
c1e5fad9a7 fix for http://forum.yacy-websuche.de/viewtopic.php?p=14767#p14767
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5944 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-10 20:50:46 +00:00
orbiter
8ee3a94e82 fix for non-caching of sitehash, see http://forum.yacy-websuche.de/viewtopic.php?p=14440#p14440
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5942 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-10 11:44:17 +00:00
borg-0300
21930d05ed fix for [B@...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5941 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-10 10:54:06 +00:00
orbiter
b6ba387e01 fix for http://forum.yacy-websuche.de/viewtopic.php?p=14751#p14751
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5940 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-10 10:46:26 +00:00
orbiter
4338dcf936 fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2093&hilit=
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5937 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-09 19:07:34 +00:00
lotus
bad7ce9286 experimental option trayIcon.force for unsupported platforms. java 1.6 needed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5936 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-09 18:35:02 +00:00
low012
ea27853c59 *) some refactoring
*) added one assertion
*) no functional changes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5935 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-09 13:43:38 +00:00
low012
d164b42604 *) cosmetics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5934 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-08 19:26:36 +00:00
orbiter
17150b2950 fixed bug in snippet computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5932 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-08 15:26:32 +00:00
orbiter
89aeb318d3 enhanced the wikimedia dump import process
enhanced the wiki parser and condenser speed

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5931 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-08 10:36:13 +00:00
orbiter
5fb77116c6 added a submenu to index administration to import a wikimedia dump (i.e. a dump from wikipedia) into the YaCy index: see
http://localhost:8080/IndexImportWikimedia_p.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5930 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-08 07:54:10 +00:00
hermens
df733af4fa Try not to lose content from ram during IndexCell.delete by moving ram.delete after the dangerous operations on the array (array.get and array.delete)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5929 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-06 12:53:17 +00:00
hermens
ac72005f2f Let IndexCell.remove remove entries from the ram portion of the DB as well.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5928 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-06 12:32:34 +00:00
orbiter
8ba7ff5353 a fix and another speed enhancement for the RWI cache
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5927 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-05 22:40:40 +00:00
orbiter
05f077e85f added stack trace output to solve problem in
http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2076&hilit=&p=14612#p14612

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5926 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-05 20:24:20 +00:00
orbiter
71a4cadf31 better and more performant synchronization in SimpleARC, the caching object for word hashes. Speeds up indexing.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5925 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-05 20:19:51 +00:00
orbiter
e6773cbb33 better handling of RWI cache for concurrency and less overhead when writing new entries -> even more indexing speed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5924 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-05 20:08:23 +00:00
orbiter
c097531e3d added a catch Exception to all threads to check if any of them silently dies without any other notification
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5922 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-05 06:31:35 +00:00
orbiter
083533e5ec fix for bugs in IODispatcher
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5921 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-04 21:37:59 +00:00
orbiter
21fbca0410 better scaling of HEAP dump writer for small memory configurations;
should prevent OOMs during cache dumps

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5920 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-04 08:29:44 +00:00
orbiter
6e0b57284d better care for states of the IODispatcher
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5919 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-03 22:54:47 +00:00
orbiter
1db9cdd4e4 fixed bug in writing of robots.txt entries in case that host names exceeded 64 characters and some other problems
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5918 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-03 19:35:10 +00:00
f1ori
bde88b684a * split off yacyRelease from yacyVersion
* added some gui infos about signatures


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5916 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-02 12:12:22 +00:00
orbiter
057ce14c8e more fixes (character encoding, parser exceptions, http client failure, blob writing)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5914 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-02 07:43:03 +00:00
orbiter
d2ac0aa682 - fixed possible bugs in Stack (may affect Crawler reset) and RandomAccess handling
- increased default memory size to 180MB
- fixed possible bug in http client reset (there was a deadlock)
- bug in BLOBHeap marked, but not solved; the cause is still unknown.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5912 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-02 01:40:03 +00:00
lotus
1351d903a1 don't follow links like mailto:
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5909 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-01 08:53:50 +00:00
orbiter
e88a66bcae temporarily disabled computation of all sublinks (check needed)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5908 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-01 07:30:53 +00:00
low012
ff5f82d780 *) removed description of removed commands from wikiHelp ([= =])
*) used format function of Netbeans for wikiCode to make it more readable, no functional changes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5907 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-01 07:28:59 +00:00
orbiter
eacf95213a fix for crawling of mailto-links
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5906 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-01 07:25:55 +00:00
orbiter
9c6ac43f66 fixes for wiki parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5905 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-30 22:03:35 +00:00
orbiter
3a64c9d02f - fix for problem with concurrency when computing word hashes
- fix for search in case that a urlfilter was used and zero results were returned

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5904 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-29 22:14:12 +00:00
orbiter
d3f8aa5a2a set of small fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5903 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-29 21:36:20 +00:00
low012
78ffb61297 *) got rid of unnecessary variable which might also fix IndexOutOfBoundsException
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5902 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-29 21:08:44 +00:00
orbiter
d31e6f9c14 fix for http://forum.yacy-websuche.de/viewtopic.php?p=14457#p14457
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5899 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-29 11:18:17 +00:00
orbiter
8d6212233b fix for IODispatcher
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5896 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-28 07:24:28 +00:00
orbiter
f678472f46 fix for quote problem in json output
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5895 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-27 22:27:02 +00:00
orbiter
d079d6dfdb small changes in surrogate reader, wiki code and portal test
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5894 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-27 20:30:43 +00:00
orbiter
07f09742bb set of small fixes and comments
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5893 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-27 15:29:50 +00:00
borg-0300
06ed4ef7b3 * better picture handling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5891 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-27 11:19:15 +00:00
orbiter
5a634cab23 removed generation of anchor link sets in document types that describe container formats.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5890 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-27 08:46:11 +00:00
low012
f1244264b8 *) hopefully fixed bug reported in http://forum.yacy-websuche.de/viewtopic.php?t=2057
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5882 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-26 16:18:14 +00:00
orbiter
2e3186189b fix for mediawikiIndex surrogate producer + added concurrency
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5880 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-25 21:52:21 +00:00
apfelmaennchen
6f5ea7b1a8 small fix for previous post
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5879 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-25 21:28:08 +00:00
apfelmaennchen
138a0747e3 added serverObjects.putJSON as JSON has very particular encoding requirements
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5877 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-25 20:56:29 +00:00
orbiter
d977dd9a96 fix for surrogate loader
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5870 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-24 22:54:40 +00:00
orbiter
9cb68353da fix for bug in ProfilingGraph for ppm >> 10000 ppm (!)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5868 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-24 13:18:20 +00:00
orbiter
9e4db75aac reduced internal logging and reduced memory that internal logging can use
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5867 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-24 12:09:04 +00:00
orbiter
c10c257255 attempt to fix a deadlock situation where the IODispatcher did not work.
I suspect the dispatcher thread has crashed and queues filled so no indexing process was able to write data.
This fix tries to heal the problem, but I am unsure if it helps. To get a better view of the problem, some more log output has been added.
Also added a new attribute indexer.threads to control the default number of threads for the indexer (default is 1)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5866 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-24 11:55:39 +00:00
orbiter
09987e93fd fixed some more bad handling of byte[]
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5865 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-23 22:02:12 +00:00
orbiter
1bcc1450cb more explanatory error message in case of IOExceptions during html parsing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5864 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-23 21:18:01 +00:00
orbiter
fe51f4d668 less synchronization may help to prevent deadlocks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5863 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-23 20:54:13 +00:00
orbiter
58802e4201 added missing success test in storeDocumentIndex,
see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1922&hilit=

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5862 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-23 20:38:56 +00:00
orbiter
171e62bee5 addition to the fix from last commit (which did not work)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5860 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-23 16:36:21 +00:00
orbiter
059949a0d1 tried to fix problem with snippet fetch for second search page when verify=false
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5859 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-23 15:29:30 +00:00
lotus
b08991e278 moved some constants, rename of Tray class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5858 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-23 13:18:59 +00:00
orbiter
138422990a - removed useCell option: the indexCell data structure is now the default index structure; old collection data is still migrated
- added some debugging output to balancer to find a bug
- removed unused classes for index collection handling
- changed some default values for the process handling: more memory needed to prevent OOM

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5856 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-22 22:39:12 +00:00
orbiter
1b9e532c87 some concurrency for wikipedia dump reader
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5855 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-22 17:43:27 +00:00
lotus
25d2160288 small fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5853 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-22 13:19:37 +00:00
orbiter
16baa7ad24 To translate a mediawiki dump into the YaCy surrogate format do the following:
- download a wikipedia dump, i.e. dewiki-20090311-pages-articles.xml.bz2
from http://download.wikimedia.org/dewiki/20090311/
- move dewiki-20090311-pages-articles.xml.bz2 to DATA/HTCACHE/
- start the conversion; open a command shell, move to the yacy home directory and execute
java -Xmx2000m -cp classes:lib/bzip2.jar de.anomic.tools.mediawikiIndex -convert DATA/HTCACHE/dewiki-20090311-pages-articles.xml.bz2 DATA/SURROGATES/in/ http://de.wikipedia.org/wiki/

this generates a series of files to DATA/SURROGATES/in

if YaCy is running (it may run concurrently), it fetches all new dumps in the surrogate-in directory. The export process is transaction-safe, which means YaCy will not start reading a dump until the dump is completely finished.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5851 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-21 22:12:19 +00:00
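
The commit does not say how the export is kept transaction-safe. One common way to get that property, shown here purely as an assumption, is to write each dump under a temporary name and rename it into place once it is complete. The class name, the ".tmp" suffix and the file names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not YaCy code: publishes a finished dump with one rename.
final class SurrogateWriterSketch {

    // Writes the content to "<name>.tmp" first, then renames it to "<name>".
    static Path writeThenPublish(Path inDir, String fileName, String xml) throws IOException {
        Files.createDirectories(inDir);
        Path tmp = inDir.resolve(fileName + ".tmp");
        Path target = inDir.resolve(fileName);
        Files.write(tmp, xml.getBytes(StandardCharsets.UTF_8));
        // The rename stays inside one directory, so it can be done as a single step.
        return Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("surrogates-in");
        System.out.println(writeThenPublish(dir, "example-dump.xml", "<surrogates/>"));
    }
}
```

With this scheme a reader only has to skip files that still carry the temporary suffix.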
orbiter
0b2c98edc9 some more work on the wikipedia-dump exporter (not finished yet)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5850 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-21 15:19:32 +00:00
orbiter
5195c94838 two patches for performance enhancements of the index handover process from documents to the index cache:
- one word prototype is generated for each document, that is re-used when a specific word is stored.
- the index cache now uses ByteArray objects to reference the RWI instead of byte[]. This enhances access to the map that stores the cache. To dump the cache to the FS, the content must be sorted, but sorting takes less time than maintenance of a sorted map during caching.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5849 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-21 14:23:04 +00:00
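
Why a wrapper object helps here: a plain byte[] uses identity-based equals and hashCode, so it cannot serve as a HashMap key by content. A small wrapper with content-based equality can. The sketch below illustrates that idea only. The class is hypothetical, and YaCy's actual ByteArray class may look different.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper, not YaCy's ByteArray: makes byte[] usable as a hash key.
final class ByteArrayKeySketch {
    private final byte[] bytes;

    ByteArrayKeySketch(byte[] bytes) {
        this.bytes = bytes.clone(); // defensive copy keeps the key immutable
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof ByteArrayKeySketch
            && Arrays.equals(bytes, ((ByteArrayKeySketch) other).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes); // content-based, unlike byte[].hashCode()
    }

    public static void main(String[] args) {
        Map<ByteArrayKeySketch, String> cache = new HashMap<>();
        cache.put(new ByteArrayKeySketch(new byte[] {1, 2, 3}), "entry");
        // A second key with the same content finds the stored entry.
        System.out.println(cache.get(new ByteArrayKeySketch(new byte[] {1, 2, 3})));
    }
}
```

An unsorted map keyed this way has to be sorted once when the cache is dumped, which matches the trade-off the commit describes.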
orbiter
9416f5c26f more speed test cases: kelondro provides map functions that are more than 20% faster than standard java classes and use less than half the memory of java classes:
just start IndexTest (here with 1000000 test objects)

Performance test: comparing HashMap, TreeMap and kelondroRow
generated 1000000 test data entries 


STANDARD JAVA CLASS MAPS 

sorted map
time   for TreeMap<byte[]> generation: 2110
time   for TreeMap<byte[]> test: 2516, 0 bugs
memory for TreeMap<byte[]>: 29 MB

unsorted map
time   for HashMap<String> generation: 1157
time   for HashMap<String> test: 1516, 0 bugs
memory for HashMap<String>: 61 MB


KELONDRO-ENHANCED MAPS 

sorted map
time   for kelondroMap<byte[]> generation: 1781
time   for kelondroMap<byte[]> test: 2452, 0 bugs
memory for kelondroMap<byte[]>: 15 MB

unsorted map
time   for HashMap<ByteArray> generation: 828
time   for HashMap<ByteArray> test: 953, 0 bugs
memory for HashMap<ByteArray>: 9 MB

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5847 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-21 09:29:08 +00:00
orbiter
b53790abb1 more performance hacks: 10% more speed for Base64.compare() which is really often used in YaCy code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5846 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-21 07:39:21 +00:00
orbiter
8ffb9889e1 some fixes and performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5845 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-20 23:01:44 +00:00
orbiter
dfb96ecb72 more fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5844 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-20 22:08:38 +00:00
orbiter
1b8d346b4c fixes in connection with transition to byte[] hashes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5843 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-20 21:54:00 +00:00
f1ori
0b0a46d35a * fix transferRWI as suggested by celle (thanks!)
see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2000#p14023


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5842 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-20 19:51:20 +00:00
orbiter
996572de95 quickfix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5841 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-20 16:11:35 +00:00