Commit Graph

1819 Commits

Author SHA1 Message Date
orbiter
733385cdd7 improved database access times by removing unnecessary synchronization.
also added more hacks that resulted from high-volume query testing

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6047 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-10 23:02:42 +00:00
orbiter
2c5554c912 small enhancements in search result computation speed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6039 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-09 15:22:23 +00:00
orbiter
e0b3984805 added navigation keys for site and author facets to remote search interface
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6038 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-09 09:07:52 +00:00
orbiter
27fa6a66ad - completed the author navigation
- removed some unused variables

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6037 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-08 23:30:12 +00:00
orbiter
a9a8b8d161 - added display of author navigation (using that navigator is not yet implemented)
- added synchronization to the pdf parser, which should help avoid deadlocks that occur when displaying several search results pointing to pdf sources
- fixed minor bugs in navigation

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6036 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-08 22:01:26 +00:00
orbiter
c879783008 added control of navigator computation:
- by default, navigator computation is off for the servlet yacysearch.html, but
- the servlet is called by default with an option to switch navigator results on
This prevents metasearch users from getting slow results caused by unnecessary computations

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6035 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-07 22:51:15 +00:00
orbiter
c079b18ee7 - refactoring of IntegerHandleIndex and LongHandleIndex: both classes have been merged into the new HandleMap class, which handles (key<byte[]>,n-byte-long) pairs with arbitrary key and value length. This will be useful for memory-optimized database table indexing.
- added an analysis method that counts the bytes that could be saved if the new HandleMap is applied in the most efficient way. Look for log messages beginning with "HeapReader saturation": in most cases we could save about 30% RAM!
- removed the old FlexTable database structure. It was no longer used.
- removed memory statistics in PerformanceMemory about flex tables and node caches (node caches were used by Tree Tables, which are also no longer used)
- added a stub for controlling navigation functions. This should help to switch off navigation computation when a client does not request it

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6034 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-07 21:48:01 +00:00
orbiter
bead0006da replaced tmp file extensions with prt
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6033 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-06 18:09:58 +00:00
orbiter
95e8cbd1c3 new fully redesigned balancer and bugfixes regarding lost profile handles and killed crawls
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6025 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-06 01:56:31 +00:00
orbiter
42ae40b9f6 some bugfixes to database close() methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6023 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-04 22:43:46 +00:00
orbiter
a0c53abbe1 - wait until local results are computed during search, see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2167&hilit=&p=15521#p15521
- show only x+1 pages in page navigator

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6022 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-04 20:58:47 +00:00
orbiter
1c77db670f re-designed response format for navigation:
- changed json and rss response templates

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6019 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-04 10:54:49 +00:00
orbiter
15fad767c0 some refactoring of topic generation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6018 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-03 23:49:06 +00:00
orbiter
cc49aedf12 - fixed NPE in remote search
- more abstraction for search requests

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6015 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-03 08:49:54 +00:00
orbiter
ab06a6edd2 renamed topwords to topics and enhanced the computation methods for topics
topics are now computed only from the document title, not the document url,
because the host navigator is now responsible for the statistical effects of urls.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6011 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-02 15:20:10 +00:00
orbiter
a5d481eab1 enhanced navigation
- fixed too early computation of navigation
- moved navigation rendering to yacysearchtrailer
- added more asserts

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6006 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-01 22:45:28 +00:00
orbiter
1c69d9b8b6 more refactoring of the index classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5995 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-29 14:16:41 +00:00
orbiter
4d4315687f fix for a concurrency problem in the host navigator, bug reported by wsb
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5993 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-29 10:52:50 +00:00
orbiter
88426912ad more refactoring to make the segment object easier to use and to be prepared to integrate author navigation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5992 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-29 10:03:35 +00:00
orbiter
99bf0b8e41 refactoring of plasmaWordIndex:
divided that class into three parts:
- the peers object is now hosted by the plasmaSwitchboard
- the crawler elements are now in a new class, crawler.CrawlerSwitchboard
- the index elements are the core of the new segment data structure, which is a bundle of different indexes for the full text and (in the future) navigation indexes and the metadata store. The new class is now in kelondro.text.Segment

The refactoring is inspired by the roadmap to create index segments, the option to host different indexes on one peer.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5990 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-28 14:26:05 +00:00
orbiter
fec6f9054f some refactoring of search methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5988 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-27 23:51:34 +00:00
orbiter
3d4b826ca5 migration of all databases that use the deprecated BLOBTree format into the BLOBHeap format. Old databases are migrated automatically.
This removes the last very IO-intensive data structures, which were still used for Wiki, Blog and Bookmarks. Old database files will remain in the DATA subdirectory but can be deleted manually if no major bugs appear during migration. No user action is needed; all migration is done automatically.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5986 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-27 15:04:04 +00:00
orbiter
4b4bddca00 added new submenu to crawler menu: import of phpbb3 forum postings from mysql
- yacy can import phpbb3 posts without crawling
- all data is written as surrogate
- indexed surrogate files can be re-used

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5985 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-27 14:53:23 +00:00
orbiter
d8284046b0 enhanced speed of site navigation computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5980 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-26 22:30:20 +00:00
orbiter
63a0255166 - refactoring: added a new content package, which will contain connector classes for different types of data sources to import texts into the YaCy index
- refactoring: migrated data objects for the new connector classes
- added a DAO interface class to specify an abstract interface for database retrieval connector methods

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5977 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-26 07:44:22 +00:00
orbiter
f246928c20 first attempt to add 'real' navigation to yacy search results: host navigation
- after a search is started, the number of hits in each site is counted
- this can be done very efficiently, because the navigation information is hidden in the url hash and can be computed very fast
- the search result shows a column on the right with the hosts and the hits per host
- clicking a host modifies the search using the efficient site: operator

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5976 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-25 22:27:34 +00:00
orbiter
26a46b5521 increased default maximum file size for database files to 2GB
Other file sizes can now be configured with the attributes
filesize.max.win and filesize.max.other
the default maximum file size for non-windows OS is now 32GB

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5974 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-25 06:59:21 +00:00
lotus
734680dc70 initialize the ResourceObserver in its own thread
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5968 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-21 08:30:34 +00:00
orbiter
a7e392f31b The collection index will not be supported any more.
Existing indexes based on the old index collections must be migrated with YaCy 0.8
- removed index collection classes and all migration tools
- added an 'incell' reference collection feature in URL analysis

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5966 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-20 14:51:26 +00:00
lotus
47fd226bdb proper parsing of sentences
does not affect tokens/words

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5964 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-19 16:41:27 +00:00
orbiter
27eb8d62cb - new development cycle
- removed the temporary configuration with a safe setting for indexer threads (=1) and replaced it with the best value determined during performance tests (half the number of processors)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5963 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-18 21:20:06 +00:00
orbiter
bffbe43e09 fix for http://forum.yacy-websuche.de/viewtopic.php?p=14522#p14522
fix for http://forum.yacy-websuche.de/viewtopic.php?p=14955#p14955

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5959 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-17 21:15:06 +00:00
lotus
13fb84ab81 you can now define the default number of displayed search results with search.items
this applies only to requests through the classic-style page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5953 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-15 14:48:34 +00:00
orbiter
a49edd9415 fix for bug in search with site: constraint
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5947 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-11 21:20:23 +00:00
orbiter
8ee3a94e82 fix for non-caching of sitehash, see http://forum.yacy-websuche.de/viewtopic.php?p=14440#p14440
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5942 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-10 11:44:17 +00:00
lotus
bad7ce9286 experimental option trayIcon.force for unsupported platforms. Java 1.6 required
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5936 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-09 18:35:02 +00:00
low012
d164b42604 *) cosmetics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5934 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-08 19:26:36 +00:00
orbiter
17150b2950 fixed bug in snippet computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5932 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-08 15:26:32 +00:00
orbiter
89aeb318d3 enhanced the wikimedia dump import process
enhanced the wiki parser and condenser speed

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5931 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-08 10:36:13 +00:00
orbiter
5fb77116c6 added a submenu to index administration to import a wikimedia dump (e.g. a dump from wikipedia) into the YaCy index: see
http://localhost:8080/IndexImportWikimedia_p.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5930 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-08 07:54:10 +00:00
orbiter
c097531e3d added a catch for Exception to all threads to check whether any of them dies silently without any other notification
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5922 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-05 06:31:35 +00:00
orbiter
083533e5ec fix for bugs in IODispatcher
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5921 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-04 21:37:59 +00:00
orbiter
21fbca0410 better scaling of HEAP dump writer for small memory configurations;
should prevent OOMs during cache dumps

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5920 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-04 08:29:44 +00:00
orbiter
6e0b57284d better handling of IODispatcher states
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5919 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-03 22:54:47 +00:00
f1ori
bde88b684a * split off yacyRelease from yacyVersion
* added some GUI information about signatures

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5916 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-02 12:12:22 +00:00
orbiter
057ce14c8e more fixes (character encoding, parser exceptions, http client failure, blob writing)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5914 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-02 07:43:03 +00:00
orbiter
e88a66bcae temporarily disabled computation of all sublinks (check needed)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5908 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-01 07:30:53 +00:00
orbiter
eacf95213a fix for crawling of mailto-links
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5906 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-01 07:25:55 +00:00
orbiter
3a64c9d02f - fix for a concurrency problem when computing word hashes
- fix for search when a urlfilter was used and zero results were returned

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5904 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-29 22:14:12 +00:00
orbiter
d3f8aa5a2a set of small fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5903 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-29 21:36:20 +00:00