Commit Graph

336 Commits

Author SHA1 Message Date
orbiter
f1528672b1 filtering of non-index pages during index-of search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3004 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-24 02:46:02 +00:00
orbiter
0a0c3edeb6 fixed a bug in index transfer
- the encoding within the new entry format for binary data was wrong
- the string parser of RWI receive had to be enhanced

added some more debugging tools
- a target peer for index transfer can now be selected by typing in the peer name
- the RWI result list has an entry counter

enhanced routing
- if communication is between two peers that have the same IP address,
  the loopback address 127.0.0.1 is used instead of the public IP
  to contact the peer

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3003 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-24 01:12:14 +00:00
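
The routing rule in the commit above can be illustrated with a minimal sketch. It is not taken from the YaCy sources: the class name `PeerContactSketch` and the method `contactAddress` are hypothetical, and IP addresses are assumed to be plain strings.

```java
// Hypothetical sketch; class and method names are not taken from YaCy.
final class PeerContactSketch {
    static final String LOOPBACK = "127.0.0.1";

    /** Returns the address to use when contacting a peer. */
    static String contactAddress(String ownPublicIp, String peerPublicIp) {
        // Rule from the commit message: if both peers have the same IP
        // address, contact the peer via loopback instead of the public IP.
        if (ownPublicIp != null && ownPublicIp.equals(peerPublicIp)) {
            return LOOPBACK;
        }
        // Otherwise the peer's public address is used unchanged.
        return peerPublicIp;
    }
}
```

With this sketch, `contactAddress("203.0.113.5", "203.0.113.5")` yields the loopback address, while two different addresses leave the peer address untouched.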
orbiter
30888e7a2f implementation of search constraints
Such constraints may formulate specific restrictions to web searches.
This is implemented by scraping information for constraints from a web
page during parsing, and storing flags for the pages within the web index.

In this first step, only information for index pages ("index of", directory listings)
is scraped and stored in flags
- added new flag class kelondroBitfield
- added scraper method in condenser
- added bitfield structure for all scrape types (see also condenser)
- added bitfield structure for appearance locations (see RWIEntry)
- added handover protocol for remote search and index distribution
- extended kelondroColumn class to hold bitfield types
- added another search attribute on search page (index.html)
- extended search-filter to enable filtering of non-matching constraints
- set all new database types to be default
- refactoring: moved word hash generation to condenser class

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2999 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-23 02:16:30 +00:00
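
The flag idea described in the commit above can be sketched roughly as follows. This is an illustration only and not the kelondroBitfield API: the class `ConstraintFlagsSketch`, the constant `FLAG_INDEX_OF` and its bit position are assumptions made for the example.

```java
// Hypothetical sketch; this is not the kelondroBitfield API.
final class ConstraintFlagsSketch {
    // Assumed bit position for "page is an index-of / directory listing".
    static final int FLAG_INDEX_OF = 0;

    private final byte[] bits;

    ConstraintFlagsSketch(int byteLength) {
        this.bits = new byte[byteLength];
    }

    /** Sets or clears the flag at the given bit position. */
    void set(int pos, boolean value) {
        int mask = 1 << (pos % 8);
        if (value) {
            bits[pos / 8] |= mask;
        } else {
            bits[pos / 8] &= ~mask;
        }
    }

    /** Reads the flag at the given bit position. */
    boolean get(int pos) {
        return (bits[pos / 8] & (1 << (pos % 8))) != 0;
    }

    /** True if every flag set in the constraint is also set on this page. */
    boolean matches(ConstraintFlagsSketch constraint) {
        for (int i = 0; i < constraint.bits.length; i++) {
            int want = constraint.bits[i] & 0xFF;
            int have = i < bits.length ? bits[i] & 0xFF : 0;
            if ((have & want) != want) {
                return false;
            }
        }
        return true;
    }
}
```

A search filter built on such a structure would drop every result whose stored flags do not satisfy `matches` for the constraint chosen on the search page.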
orbiter
e55ef0df28 - automatic migration of old RWI entries to new format during remote search
if new collections are activated
- one more assert in RowSet, control of removeMarker

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2993 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-20 22:55:27 +00:00
orbiter
10a4ab5195 disabled some (more) write caches
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2987 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-20 00:27:02 +00:00
orbiter
09bcc10344 bugfix for some problems of last change with assortments
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2986 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-19 23:10:58 +00:00
orbiter
e3d75f42bd final version of collection entry type definition
- the test phase of the new collection data structure is finished
- test data that had been generated is void. There will be no migration
- the new collection files are located in DATA/INDEX/PUBLIC/TEXT/RICOLLECTION
- the index dump is void. There will be no migration
- the new index dump is in DATA/INDEX/PUBLIC/TEXT/RICACHE

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2983 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-19 20:05:25 +00:00
orbiter
c9364246cc introduced new RWI-Object.
This will be used for the final version of the collections.
The new object is not yet used.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2966 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-17 14:17:20 +00:00
orbiter
e628d34e16 patches for bad data
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2951 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-11 14:35:36 +00:00
orbiter
76fceb9997 refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2945 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-09 16:32:34 +00:00
orbiter
bdc9216366 - more asserts
- some bugfixes
- some patches for bugs that are already in the database

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2935 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-08 02:08:33 +00:00
orbiter
1751a799ac - deactivated all write buffers
- fixed a storage bug


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2933 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-07 10:56:36 +00:00
orbiter
ba967c4875 - bugfixes and debug code
- new generalized index class indexCachedRI

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2930 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-07 01:09:02 +00:00
orbiter
eaad91d84f fixed wrong RAM calculation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2928 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-06 15:53:42 +00:00
orbiter
114a76a86e - added flag to urlhash that shows that domain is a local domain
- enhanced local domain detection
- bugfixing for memory assignment in kelondroFlexSplit
- automatic memory assignment to caches according to available RAM
- bugfixes for details during search process

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2924 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-06 02:05:39 +00:00
orbiter
eafb5ecd22 - better usage of memory resources for kelondroFlexSplit
- kelondroFlexTables always loads a RAM cache if it has enough
  RAM assigned. Otherwise it creates a kelondroTree file-index.
  If more memory is re-assigned, the file-index is deleted again,
  and RAM is used. Beware that assigning too little RAM forces
  creation of file indexes, and start-up time may then last for hours.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2923 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-05 21:30:53 +00:00
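
The decision described in the commit above might look like the following sketch. The names `FlexIndexChoiceSketch`, `IndexKind` and `choose` are hypothetical, and the size estimate (row count times bytes per index entry) is an assumption; the commit message does not say how the required RAM is computed.

```java
// Hypothetical sketch; names and the size estimate are assumptions.
final class FlexIndexChoiceSketch {
    enum IndexKind { RAM_CACHE, FILE_INDEX }

    /** Picks a RAM cache when the assigned memory covers the estimated index size. */
    static IndexKind choose(long assignedBytes, long rowCount, int bytesPerIndexEntry) {
        long needed = rowCount * bytesPerIndexEntry;
        return assignedBytes >= needed ? IndexKind.RAM_CACHE : IndexKind.FILE_INDEX;
    }
}
```

Re-running such a check after more memory has been assigned is what would let a table switch back from the file-index to RAM.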
orbiter
d454ca44ee update of cache logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2917 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-05 14:48:21 +00:00
orbiter
8fdefd5c68 generalization of payload definition of index storage
this is one step forward to the migration to a new collection data format

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2912 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-05 02:10:40 +00:00
orbiter
46a712e195 - more asserts
- simplified indexURLEntry

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2891 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-01 14:00:15 +00:00
orbiter
215c4e65f1 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2887 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-31 22:10:25 +00:00
orbiter
bd4f43cd66 - fixed a null pointer exception bug
- switched off more write caches
- re-enabled index-abstracts search

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2885 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-31 02:45:41 +00:00
orbiter
fe8afaf426 switched off usage of write cache for important databases
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2883 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-30 02:59:22 +00:00
orbiter
985fd807cc bugfixing in collection methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2882 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-30 02:39:39 +00:00
orbiter
e6044e5198 bugfix for
http://www.yacy-forum.de/viewtopic.php?p=27207#27207
and
http://www.yacy-forum.de/viewtopic.php?p=27219#27219

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2875 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-28 21:43:12 +00:00
orbiter
ebd2d629d8 added missing file for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2866 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-26 13:53:00 +00:00
orbiter
147d88cf23 re-design of database caching
this should reduce IO a lot, because write caches are now activated for all databases
- added new caching class that combines a read- and write-cache.
- removed old read and write cache classes
- removed superfluous RAM index (can be replaced by kelondroRowSet)
- adapted all current classes that used the old caching methods
- more asserts, more bugfixes


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2865 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-26 13:50:50 +00:00
orbiter
f21ede312e bugfixes for internals of database organization
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2860 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-25 01:21:05 +00:00
orbiter
eb4bfb0e9d fixed problem with cache.profile()
see also: http://www.yacy-forum.de/viewtopic.php?p=27109#27109

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2859 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-24 22:34:13 +00:00
orbiter
2a9d868f6d - removed object cache from kelondroTree
- generalized object caching and added new object caching class
- added object caching wherever kelondroTree was used
- added object caching also to usage of kelondroFlex
- added object buffering (a write cache) to NURLs
- added many assert statements; fixed bugs here and there
- added missing close methods to latest added classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2858 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-24 13:48:16 +00:00
orbiter
dc056fabf3 small bugfix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2847 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-23 01:22:50 +00:00
orbiter
278d8c3c7e - more asserts
- bugfix for reading of previously deleted nodes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2845 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-23 00:59:55 +00:00
orbiter
83a0efc65a better assert statements and fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2833 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-21 10:50:30 +00:00
orbiter
2025e885d6 a fix for problems with remove situations in kelondroFlexSplitTable
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2831 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-21 00:19:00 +00:00
orbiter
06854988da - full integration of new LURL database in INDEX
- added migration method for urlHash.db into INDEX

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2819 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-19 21:14:37 +00:00
orbiter
b79e06615d - added new LURL.Entry class for next database migration
- refactoring of affected classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2802 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-18 22:25:07 +00:00
orbiter
77a59a115d refactoring of indexing methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2787 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-16 15:04:16 +00:00
orbiter
14490f0a83 added missing flush statement
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2786 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-16 09:42:35 +00:00
orbiter
688cbfb776 - bugfixing for flextable bug
- bugfixing for collection index bug
- several other bugfixes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2785 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-16 00:27:25 +00:00
orbiter
29a1318ef9 bugfixes for wrong database accesses that did not consider deleted entries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2767 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-13 22:57:47 +00:00
orbiter
50f2578c55 - some bugfixing and code cleanup
- assortments can now be left out completely if they do not exist
  before startup and the collection index is selected.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2757 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-13 01:19:26 +00:00
orbiter
bdf4c7c51e added missing files for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2756 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-12 23:17:16 +00:00
orbiter
a5dd0d41af - refactoring of plasmaCrawlLURL.Entry to prepare new Entry format
- added test migration method to migrate the old LURL to a new LURL
the new LURL will be split into different tables for each month
this solves several problems:
- the biggest table in YaCy is split into different parts and can
  also be managed in filesystems that are limited to 2GB
- the oldest entries can easily be identified, used for re-crawl and
  deleted
- The complete database can be limited to a specific size (as wanted many times)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2755 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-12 23:14:41 +00:00
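
The monthly partitioning described in the commit above can be pictured with a short sketch. Everything named here is hypothetical: the class `MonthlyTableSketch`, the `lurl.` table name prefix, and the use of a maximum table count as a stand-in for the size limit mentioned in the commit message.

```java
// Hypothetical sketch; table naming and the limit policy are assumptions.
final class MonthlyTableSketch {
    /** Table name for the month containing the given date, e.g. "lurl.200610". */
    static String tableNameFor(java.time.LocalDate date) {
        return String.format(java.util.Locale.ROOT, "lurl.%04d%02d",
                date.getYear(), date.getMonthValue());
    }

    /** Returns the oldest table names that exceed the allowed number of tables. */
    static java.util.List<String> tablesToDrop(
            java.util.Collection<String> tableNames, int maxTables) {
        java.util.List<String> sorted = new java.util.ArrayList<>(tableNames);
        // The yyyyMM suffix sorts chronologically as plain text.
        java.util.Collections.sort(sorted);
        int excess = Math.max(0, sorted.size() - maxTables);
        return new java.util.ArrayList<>(sorted.subList(0, excess));
    }
}
```

Because each table covers one month, the oldest entries are simply the tables that sort first, which is what makes re-crawl selection and deletion cheap in this layout.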
orbiter
130cc76927 loop detection and termination in deletedHandles method
see also: http://www.yacy-forum.de/viewtopic.php?p=26655#26655

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2754 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-12 19:50:09 +00:00
orbiter
6396f5971e bugfixes and migration attempt toward new kelondroFlex db
- more synchronization
- bugfix for remove in collections
- bugfix in kelondroFlex (wrong exception condition!)
- options to use RAM, FLEX and TREE tables for Crawl URL stacker
- default for Crawl URL stacker is now FLEX (!)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2746 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-11 00:46:45 +00:00
orbiter
86047f439d removed very bad bug that prevented production of any remote search result
:-(((
Please update!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2724 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-09 04:04:00 +00:00
orbiter
43614f1b36 bugfix in collection index. the index for collections was not created correctly
The bugfix includes a migration function which starts automatically
after startup of yacy.
This applies to you only if you are using the new collection index.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2711 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-05 23:47:08 +00:00
orbiter
db294687ea enhanced logging
- more logging output
- fix in log line preparation
- added filter to log page
- some small bugfixes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2707 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-03 22:55:59 +00:00
orbiter
d4c239e4be - fixed problem in collection index with deletion of single url references
- added automatic deletion of not-found snippets after search

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2689 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 01:40:52 +00:00
orbiter
b033a80750 better control of failure in node seek of kelondroTree
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2686 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 00:13:19 +00:00
orbiter
df1629b05a - code cleanup
- version 0.471
- moved surftipps to own web page


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2676 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-29 22:27:20 +00:00