Commit Graph

19 Commits

Author SHA1 Message Date
orbiter
194da25a2f moved kelondro index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6393 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-09 23:32:08 +00:00
orbiter
4446acc8cd moved kelondro order
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6392 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-09 23:22:22 +00:00
orbiter
f677d534b1 start of a really extensive refactoring which will produce a hierarchical package structure with the domain yacy.net as package root
- moved here the logging classes as part of the new net.yacy.kelondro package

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6391 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-09 23:13:30 +00:00
orbiter
735e2737e3 * added index segments
This is a major change in the organization of indexes.
Please consider a back-up of your data before you run this update.
All existing index files will be moved and renamed to a new position.
With this change, it will be possible to maintain different indexes for different purposes and it will be possible to have a distinction between DHT-in and DHT-out specific indexes. Tenants may also have their own index, and it may be possible to have histories and back-ups of indexes. This is just the beginning, many servlets must be adopted after this change, but all functions that had been there should still work.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6389 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-09 14:44:20 +00:00
orbiter
6aa474f529 - better logging for web cache access and fail reasons
- better Exception handling for web cache access
- distinction between access of web cache for proxy and crawler


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6367 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-01 13:08:19 +00:00
orbiter
3671c37989 added experimental oai-pmh reader and integrated it with the existing dublin core parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6366 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-30 22:11:00 +00:00
low012
f65bfaa9af *) Removed base tag from errror page. This has been added by myself a long time ago as a workaround for some weird behavior of my router, but as it turns out, it does more bad than good in general: If HTTPS is used for communication with YaCy, entering a wrong passwort led to an errror page with a form which would send username and password unencrypted with the user possibly being unaware of this.
*) changed some comments, added some annotations, added SVN properties here and there

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6340 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-23 21:26:14 +00:00
orbiter
3e9dcfc204 fix for http://forum.yacy-websuche.de/viewtopic.php?p=17504#p17504
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6337 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-22 14:39:06 +00:00
orbiter
44579fa06d - fixed a problem loading images through yacy's document loader,
this denied non-parseable documents which excluded all images
- fixed url of osm tile server

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6287 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-03 11:46:08 +00:00
orbiter
72e5407115 refactoring of snippet cache
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6268 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-08-27 14:34:41 +00:00
orbiter
8e56c2ace6 fix for fixes from this afternoon
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6253 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-08-07 22:53:49 +00:00
orbiter
c0e17de2fb - fixes for some problems with the new crawling/caching strategies
- speed enhancements for the cache-only cache policy by using special no-delay rules in the balancer
- fixed some deadlock- and 100% CPU problems in the balancer

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6243 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-25 21:38:57 +00:00
orbiter
634a01a9a4 replaced wget-requests with caching requests
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6242 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-24 14:52:27 +00:00
orbiter
161d2fd2ef redesign of access to the HTCache (now http.client.Cache):
- better control to the cache by using combined request-header and content access methods
- refactoring of many classes to comply to this new access method
- make shure that the cache is always written if something was loaded
- some redesign of the process how http response results are feeded into the new indexing queue
- introduction of a cache read policy:
 * never use the cache
 * use the cache if entry exist
 * use the cache if the proxy freshness rule confirmes
 * use only the cache and go never online
- added configuration options for the crawl profiles to use the new cache policies. There is not yet a input during crawl start to set the policy but this will be added in another step.
- set the default policies for the existing crawl profiles. If you want them to appear in your default profiles you must delete the crawl profiles database; othervise the policy is 'proxy freshness rule'
- enhanced some cache access methods in such a way that unnecessary retrievals are omitted (i.e. for size computation). That should reduce some IO but also a lot of CPU computation because sizes were computed after decompression of content after retrieval of the content from the disc.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6239 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-23 21:31:51 +00:00
orbiter
4da9042e8a code simplification
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6233 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-19 21:59:29 +00:00
orbiter
1d8d51075c refactoring:
- removed the plasma package. The name of that package came from a very early pre-version of YaCy, even before YaCy was named AnomicHTTPProxy. The Proxy project introduced search for cache contents using class files that had been developed during the plasma project. Information from 2002 about plasma can be found here:
http://web.archive.org/web/20020802110827/http://anomic.de/AnomicPlasma/index.html
We stil have one class that comes mostly unchanged from the plasma project, the Condenser class. But this is now part of the document package and all other classes in the plasma package can be assigned to other packages.
- cleaned up the http package: better structure of that class and clean isolation of server and client classes. The old HTCache becomes part of the client sub-package of http.
- because the plasmaSwitchboard is now part of the search package all servlets had to be touched to declare a different package source.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6232 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-19 20:37:44 +00:00
orbiter
5bb8074150 removed the indexing queue. This queue was superfluous since the introduction of the blocking queues last year, where documents are parsed, analysed and stored in the index with concurrency.
- The indexing queue was a historic data structure that was introduced at the very beginning at the project as a part of the switchboard organisation object structure. Without the indexing queue the switchboard queue becomes also superfluous. It has been removed as well.
- Removing the switchboard queue requires that all servlets are called without a opaque generic ('<?>'). That caused that all serlets had to be modified.
- Many servlets displayed the indexing queue or the size of that queue. In the past months the indexer was so fast that mostly the indexing queue appeared empty, so there was no use of it any more. Because the queue has been removed, the display in the servlets had also to be removed.
- The surrogate work task had been a part of the indexing queue control structure. Without the indexing queue the surrogates needed its own task management. That has been integrated here.
- Because the indexing queue had a special queue entry object and properties attached to this object, the propterties had to be moved to the queue entry object which is part of the new indexing queue withing the blocking queue, the Response Object. That object has now also the new properties of the removed indexing queue entry object.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6225 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-17 13:59:21 +00:00
orbiter
b332dfad67 - inserted request object into response object which carries this now instead generating new objects
- fixed a problem with the crawler introduced in SVN 6216

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6222 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-15 23:08:35 +00:00
orbiter
ca72ed7526 -removed superfluous crawl cache
-refactoring of crawler classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6221 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-15 21:07:46 +00:00