Commit Graph

239 Commits

Author SHA1 Message Date
orbiter
541b817502 refactoring of switchboard queueing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4591 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-22 01:28:37 +00:00
orbiter
7150b463ff changed handling of default values and database paths:
- the default files yacy.init and for the network definition is now moved to the path defaults
- the httpProxy.conf is renamed to yacy.conf
- the DATA/INDEX/PUBLIC is renamed to the actual network nickname, which should be freeworld or sciencenet
more menu entries
- added apfelmaennchens alternative search page to the menu
- added the new thread dump page to the server log menu point as submenu
modifications
- modified the thread dump page: sorting by thread type

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4575 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-16 22:31:54 +00:00
orbiter
9c989fe5f7 fixed deadlock
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4562 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-15 00:49:16 +00:00
orbiter
1dce2f1079 more multithreading support:
- replaced some synchronized classes by classes from util.concurrent
- used a util.concurrent.SynchronousQueue to implement a persistent sorting thread in
  the very basic kelondroRowCollection which supports sorting with a second thread
  in case that a double-core processing CPU is used

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4517 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-27 15:16:47 +00:00
orbiter
f4c73d8c68 - fixed highslide usage
- some enhancement to index management, better types

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4497 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-19 14:13:35 +00:00
orbiter
2327451653 - changed order of database initialisation (index first)
- removed mainly unused init-time for databases (was only used for tree tables, which are not used any more)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4496 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-19 09:14:07 +00:00
orbiter
bd63999801 - faster search: using different data structures that avoid multiplr calculations
- no more table copy for error-eco table
- optional table copy for lurl-entries
- more abstractions (less single constant strings)
- better logging (using host names instead of ips)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4459 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-07 22:16:36 +00:00
orbiter
ff6b69b37e fix for NPE in access tracker
fix for NPE in word index


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4439 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-03 21:47:27 +00:00
orbiter
42c1e11f2b added another link double-check
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4434 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-03 12:40:40 +00:00
orbiter
a1e9e6e2e6 fix for search result page navigation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4431 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-03 02:23:04 +00:00
orbiter
7404256997 - no more search time-out!
- fixed a bug with last commit

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4430 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-02 23:53:39 +00:00
orbiter
fa3b8f0ae1 fixed bug in remote search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4419 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-30 00:15:43 +00:00
orbiter
2485681002 added termination control for RotateIterator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4399 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-25 11:44:27 +00:00
orbiter
efd0b8371a - added parsing of Dublin Core - compliant metadata (see RFC 5013 and ISO 15836) to html parser
- refactoring of plasmaParserDocument to use Dublin Core - compatible property names
- redesign of url handling in parser and condenser (less String-to-yacyURL conversion)
- more generics

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4352 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-22 11:51:43 +00:00
orbiter
f4e9ff6ce9 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4343 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-19 00:40:19 +00:00
orbiter
a5054c038d - added large number of generics
- redesign of ordering structures in kelondro (old did not work with strict generics)
- 50% IO reduction during read access on kelondroFlex (ommiting of read on index table)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4320 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-11 00:12:01 +00:00
orbiter
9d8b17188a more generics, bugfixes for wrong cast
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4294 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-28 03:39:36 +00:00
orbiter
4dc438f7e7 moved to Java 1.5:
- changed build script to use java 1.5 compiler
- first stept to resolve missing generics definition (about 400 from over 4100 'missing'-warnings)
- added key-iterator to kelondro databases (for rapid from-memory enumerations, will be used for domain name collection, not used yet)

please set your development environment to use java 1.5!


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4292 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-27 17:56:59 +00:00
orbiter
c48b73cda2 redesign of ranking data structure
- the index administration now uses the same code base for url selection and collection
  as the search interface. The index administration is therefore a good test environment for
  ranking order control
- removed old postsorting-algorithms, will be replaced with new one
- fixed many bugs occurred before during ranking; especially the contraint filtering method
  removed too many links
- fixed media search flags; had been attached to too many urls. The effect should be a better
  pre-sorting before media load within snippet fetch

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4223 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-21 23:14:57 +00:00
orbiter
6f1308da2f - some enhancements to IndexControlURLs (shows more links, connects referrer to another query)
- some refactoring to search process

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4222 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-17 01:53:02 +00:00
orbiter
c527969185 - enhanced monitoring of ranking parameters
for details, please try http://localhost:8080/IndexControlRWIs_p.html
- fixed computation of ranking ordering in some cases

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4220 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-16 14:48:09 +00:00
orbiter
55da871211 preparations for better ranking: better debugging of index properties
to do this, the index administration interface was extended.
It is now possible to select parts of a index.
See properties shown in interface after a word search for details.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4218 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-15 03:03:18 +00:00
orbiter
0abf33ed03 - tried to remove deadlock
- enhanced searchtime in kelondroRowSets
- enhanced uniq() - reverse enumeration causes less time in case of mass removal of doubles

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4207 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-12 01:14:51 +00:00
orbiter
ecba35de72 enhanced computing speed of kelondro core function: sorting
the enhancement was made by using better organized data structures and
multi-threading during the sort. A sort can be divided into two separate
processes when the first partition of the quicksort algorithm was done.
Generating a separate thread and starting the thread takes only 10 milliseconds,
so using a separate thread makes only sense if the data amount is large.
statistics about the speed-up:
without ehancement: 250 milliseconds for 100000 entries
with data structure enhancement: 170 milliseconds for 100000 entries
with additional second thread (if second processor is present): 130 milliseconds.

For dual-processor systems, this means about 100% speed-up
a test can be made with the following command:
java -classpath classes de.anomic.kelondro.kelondroRowCollection


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4198 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-09 00:51:38 +00:00
orbiter
6eaa5a0e64 enhanced local search speed. The ranking process is now 6 times faster that before.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4197 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-07 22:38:09 +00:00
fuchsi
425e4ead66 Allow absolute paths in configuration settings.
- before absolute paths would be expanded incorrectly, e.g.: fooPath=/a/b/c would become /path/to/yacy/root/a/b/c. Now you can put nearly every dynamically generated data with a configurable path to a location outside of yacys root dir without having to use symlinks (probably good for third party distribution packaging).
- abstractServerSwitch.getConfigPath(setting, default) returns a File instance, either with an absolute path or relative to the applications root path.

- exceptions (hardcoded): 
  DATA/LOG/yacy.logging
  DATA/SETTINGS/httpProxy.conf
  DATA/SETTINGS/user.db
TODO: all of these are the global configuration files and they should probably be put into _one_ command line configurable settings path, so it would be possible to package them in /etc/ for example.

- add missing workPath to yacy.init (it was used in code, but there was no default in the file)
- fix broken skinPath (was skinsPath in yacy.init but skinsPath in the code) + a few other broken config reading caused by typos.
- replaced path setting names and their default values with the related static fields in plasmaSwitchboard where not already done/existing

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4196 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-04 10:36:25 +00:00
hermens
8f9d65da67 Small corrections to dhtFlushControl()
- Test wCacheMaxChunk against maxURLinCache(), not getMaxWordCount(). This triggered a flush everytime dhtFlushControl() was called.
- If triggered, flush at least 1 entry.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4187 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-31 14:21:58 +00:00
orbiter
55c87b3b12 changed behavior of crawl stacker
- final flush only when tabletype = RAM
- prestacker (dns prefetch) only if tabletype = RAM and busytime <= 100
- number of maximun entries in stacker is configurable in yacy.init (stacker.slots)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4186 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-31 11:32:40 +00:00
hermens
d732840f8a Avoid ConcurrentModificationException when accessing the PerformanceQueues page while yacy is indexing.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4170 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-19 23:36:40 +00:00
fuchsi
35303f9504 add real size values (KBytes) of the DHT-In/Out-RAM-Caches to the PerformanceQueues page. A lot of users seem to tweak this value and it might help in finding the best size in relation to the peer's memory ressources.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4169 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-19 21:47:07 +00:00
orbiter
143fa40d77 fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=394&p=2382#p2382
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4135 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-03 15:34:16 +00:00
orbiter
842308ea97 - redesigned crawl start menu, integrated monitoring pages
- removed web structure picture from indexing menu and grouped it together with htcache monitor
- added a database for terminated crawls, when a crawl is finished it is automatically moved to the new database
- extended crawl profile edit servlet, shows now also terminated crawls
- option that was used to delete profiles is now redesigned to a function that moves the current crawl to the terminated crawls and removes all urls from the current queues!
- fixed here and there problems with indexing queues
- enhances indexing speed by changing cache flush sizes.
- changed behaviour of crawl result servlet: the list of crawled urls is shown if there is one, othevise the overview window is shown

attention: the new profile databases are not compatible with the old one. current crawls will be lost! the web index is not touched.
next steps: the database of terminated crawls can be used to start with them a new crawl. This is useful if one wants to re-crawl specific pages and wants to use a old crawl profile.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4113 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-28 01:21:31 +00:00
orbiter
daf0f74361 joined anomic.net.URL, plasmaURL and url hash computation:
search profiling showed, that a major amount of time is wasted by computing url hashes. The computation does an intranet-check, which needs a DNS lookup. This caused that each urlhash computation needed 100-200 milliseconds, which caused remote searches to delay at least 1 second more that necessary. The solution to this problem is to attach a URL hash to the URL data structure, because that means that the url hash value can be filled after retrieval of the URL from the database. The redesign of the url/urlhash management caused a major redesign of many parts of the software. Since some parts had been decided to be given up they had been removed during this change to avoid unnecessary maintenance of unused code.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4074 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-05 09:01:35 +00:00
orbiter
4779f314fe first version of next-generation search interface:
- snippets are not fetched by browser using ajax, they are now fetched internally
- YaCy-internat threads control existence of snippets and sort out bad results
- search results are prepared using SSI includes
- the search result page is visible right after the search request, the results drop in when they are detected
- no more time-out strategy during search processes, results are shifted within queues when they arrive from remote peers
- added result page switching! after the first 10 results, the next page can be retrieved
- number of remote results is updated online on the result page as they drop in
- removed old snippet servelet (which had been also a security leak btw)
- media search is broken now, will be redesigned and fixed in another step


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4071 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-03 23:43:55 +00:00
orbiter
e332b844b2 - enhanced remote search: during waiting time for remote crawls
some urls are fetched so the url cache can be filled with these urls
- the url-prefetch is used to sort out some unresolved urls
- the snippet-fetcher is triggered with the search event id. This is used
  to remove missing snippets from the search cache so they will not be displayed again


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4060 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-26 18:18:35 +00:00
orbiter
a34d9b8609 * added a search history cache that maintains search results for 10 minutes
it is necessary for the new search process that will do automatic re-searches
a positive effect is, that when a re-search is done it can be monitored how many
results had been contributed from other peers. The message for this contribution
was moved from the end of the result page to the top.
* enhanced re-search time when a global search was done an the local index has
already a great number of results for this word
* re-organised presearch computation; must be further enhanced

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4059 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-24 23:12:59 +00:00
orbiter
72752bb503 because of a new database structure handling, the memory need for accessing
collection objects has been reduced to 50%:
- set new memory calculation functions for indexing process
- adjusted guessed memory amount
-> Testing needed:
   try new recommended value (see performanceQueues) and see if OOMs occur.
-> report maximum recommended value, so we can set new default values.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4053 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-20 17:36:43 +00:00
orbiter
947fc46904 refactoring of search process:
- re-designed remote request result processing
- re-designed local result accumulation, will be further enhanced with snippet fetcher
- removed search process handling in switchboad
- made snippet class static (there is no need for multiple snippet objects)
- removed some redundant tasks in server-side search process, should be a little bit faster now


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4043 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-15 11:36:59 +00:00
orbiter
5605887571 refactoring of search processes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4030 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-05 23:57:25 +00:00
orbiter
9ca46a8c69 indexing of local (intranet) urls enabled
To do this, one must create a separate YaCy network that has a local URL domain
A description how to do this is here: http://www.yacy-websuche.de/wiki/index.php/De:Netzdefinition

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4001 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-24 00:46:17 +00:00
orbiter
40b0547611 - documentaton changes (removed old forum links)
- different handling of link quotation
- different handling of link normalization
- enhanced html/unicode en/de-coding

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3993 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-19 15:32:10 +00:00
michitux
a695c93662 Fix of the Status-page for IE:
- reverted revision 3982 and 3979
- added a real fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=163


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3988 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-17 22:06:06 +00:00
orbiter
b28e5d0ee9 protection against wrong word hash length
see http://www.yacy-forum.de/viewtopic.php?p=35657#35657

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3723 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-14 10:00:23 +00:00
orbiter
b9add5cf37 some bugfixes:
- dht iterator start point
- wordIndex synchronization
- surftipps url check

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3633 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-02 14:20:43 +00:00
orbiter
47e90f31b2 fix for deadlock in plasmaWordIndex.addPageIndex
synchronization for class method not necessary
see also: http://www.yacy-forum.de/viewtopic.php?p=34959#34959

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3628 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-30 22:30:09 +00:00
orbiter
62c947b4aa next try to fix deadlock in plasmaWordIndex
see also:
http://www.yacy-forum.de/viewtopic.php?p=34821#34821

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3607 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-27 12:26:36 +00:00
orbiter
fa012789b2 tried to fix a deadlock problem durin shutdown
see also:
http://www.yacy-forum.de/viewtopic.php?p=34753#34753

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3601 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-26 15:15:40 +00:00
orbiter
063063aa0c fix for 100% cpu bug during dht selection
see also: http://www.yacy-forum.de/viewtopic.php?p=34068#34068

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3570 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-13 13:40:19 +00:00
orbiter
159bd0cab5 diverses; b.o. fix for http://www.yacy-forum.de/viewtopic.php?p=33914#33914
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3549 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-05 14:58:29 +00:00
orbiter
ba2c307ab3 optimized memory allocation in kelondroRow.Entry
such an entry cannot be instantiated without allocation of new byte[]; instead
it can re-use memory from other kelondroRow.Entry objects.
during bugfixing also other bugs may have been solved, maybe the INCONSISTENCY problem
could have been solved. One cause can be missing synchronization during bulk storage
when a R/W-path optimization is done. To test this case, the optimization is currently
switched off.
More memory enhancements can be done after this initial change to the allocation scheme.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3536 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-03 12:10:12 +00:00