Commit Graph

276 Commits

Author SHA1 Message Date
orbiter
80b6c90d54 enhancements to prevent blocking during dht transfer receive
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2362 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 21:49:39 +00:00
theli
9f298083cd *) adding more urls to the error url
- old error strings were replaced with their corresponding constants
   See: http://www.yacy-forum.de/viewtopic.php?t=2638

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2360 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 15:11:14 +00:00
orbiter
279b1d969d Integrated new indexing data structure 'collections' into the main class
for indexing, the plasmaWordIndex.

The new data structure is ready-to-use, but currently disabled.
It can be activated by setting the static
plasmaWordIndex.useCollectionIndex
to true. This shall be done for testing purpose.

The new index is stored to
DATA/INDEX/PUBLIC/TEXT
The directory PLASMA shall be used only for crawler in the future.

Attention: during testing the data structure in INDEX may change,
and indexes created with the new data structure may become useless.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2348 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-05 22:22:14 +00:00
orbiter
ebc2233092 * implemented (finished) class indexRowSetContainer
* replaced indexTreeMapContainer by indexRowSetContainer
* deleted indexTreeMapContainer and abstract class
This is another step to the new database structure

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2343 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-02 23:20:03 +00:00
orbiter
9183d21f25 renamed new index class to old name
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2342 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-02 20:01:59 +00:00
orbiter
c4e922885a replaced indexURLEntry by new class that uses a kelondroRow.Entry object
to store the index entry. This is another step to move to the new database structure.
A side effect of this change is that index storage uses much less RAM space,
which affects the index RAM cache.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2341 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-02 19:59:28 +00:00
orbiter
e357599f92 * fixed problem with indexContainer iteration from RAM:
indexContainers from RAM must be cloned explicitly to prevent
  side-effects on stored indexContainer objects in Cache
* changed behaviour of urlReference deletion from indexContainers:
  deletion does not use retrieval of all elements from the assortments
* added textual configuration of kelondroRow and kelondroColumn definition
* update of kelondroRow usage in yacyNews
* modified kelondroAttrSeq to use modified kelondroColumn parser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2339 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-01 10:30:55 +00:00
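
The cloning fix in the commit above can be illustrated with a minimal sketch. It assumes a cache that maps a word hash to a list of URL hashes; RamContainerCache and its methods are hypothetical names and are not taken from the YaCy source. The point is only that the getter hands out a copy, so changes made by the caller do not reach the cached object.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a RAM cache of index containers (not YaCy's classes).
class RamContainerCache {
    private final Map<String, List<String>> containers = new HashMap<>();

    // Adds one URL hash to the container of the given word hash.
    void add(String wordHash, String urlHash) {
        containers.computeIfAbsent(wordHash, k -> new ArrayList<>()).add(urlHash);
    }

    // Returns a copy, so the caller can modify the result without
    // side effects on the list stored in the cache.
    List<String> get(String wordHash) {
        List<String> cached = containers.get(wordHash);
        return cached == null ? new ArrayList<>() : new ArrayList<>(cached);
    }

    // Number of URL hashes currently cached for the word hash.
    int size(String wordHash) {
        List<String> cached = containers.get(wordHash);
        return cached == null ? 0 : cached.size();
    }
}
```

Removing an element from the list returned by get() leaves size() of the cached container unchanged.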
orbiter
5f72be2a95 some redesign of EURL storage
* store() is now called explicitly
* more urls are written to the EURL table
* the EURL stack does not store the complete entry any more, now only the URL hash


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2323 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-24 15:25:47 +00:00
orbiter
e4f1820b58 protection against too long authentication strings in switchboard
see also: http://www.yacy-forum.de/viewtopic.php?p=23943#23943

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2312 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-20 11:30:10 +00:00
orbiter
3879a0ecd0 replaced java.net.URL usage by use of new class de.anomic.net.URL
This shall be seen as an experiment to exclude all cases where
there could be a DNS lookup during URL comparison.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2290 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-13 01:21:53 +00:00
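
Background for the commit above: java.net.URL.equals() and hashCode() may resolve host names, so comparing two URLs can block on a DNS lookup. The sketch below shows one way to compare URLs purely on their textual parts. PlainUrl is a hypothetical class written for illustration; it is not the de.anomic.net.URL implementation, and the choice of compared fields is an assumption.

```java
import java.util.Locale;
import java.util.Objects;

// Hypothetical value class that never touches the network (not de.anomic.net.URL).
final class PlainUrl {
    private final String protocol;
    private final String host;
    private final int port;
    private final String path;

    PlainUrl(String protocol, String host, int port, String path) {
        // Protocol and host are case-insensitive, so normalize them once.
        this.protocol = protocol.toLowerCase(Locale.ROOT);
        this.host = host.toLowerCase(Locale.ROOT);
        this.port = port;
        this.path = path;
    }

    // Compares the stored strings only; no host name resolution takes place.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof PlainUrl)) return false;
        PlainUrl u = (PlainUrl) other;
        return protocol.equals(u.protocol) && host.equals(u.host)
                && port == u.port && path.equals(u.path);
    }

    @Override
    public int hashCode() {
        return Objects.hash(protocol, host, port, path);
    }
}
```

A trade-off of this approach is that two different host names pointing to the same IP address are treated as different URLs.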
orbiter
671fd9a5c9 work towards new indexing database structure
(no effect on current functionality yet)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2277 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-04 14:47:27 +00:00
orbiter
92f4cb4d73 added option to configure the start-up delay time for kelondro database files.
the start-up delay is used to pre-load the database node cache

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2276 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-03 23:57:33 +00:00
orbiter
66964dc015 removed high/med/low from kelondroRecords cache control.
this was done because testing showed that cache-delete operations
slowed down record access most, even more than actual IO operations.
Cache-delete operations appeared when entries were shifted from low-priority
positions to high-priority positions. During a fill of x entries to a database,
x/2 delete situations occur, which cause two or more delete operations.
removing the cache control means that these delete operations are not
necessary any more, but it is more difficult to decide which cache elements
shall be removed in case the cache is full. There is not yet a stable
solution for this case, but the advantage of a faster cache is more important
than the flush problem.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2244 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-25 10:31:38 +00:00
allo
67a8c74be3 Fix for dynamic login with static password.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2210 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-18 08:04:51 +00:00
allo
ef9eb50c3c fix for adminlogin
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2209 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-17 11:15:16 +00:00
allo
6fe2fed87e cookieauth works with static Admin.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2208 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-16 08:04:02 +00:00
theli
4ca0857c0c *) Index transfer now considers the pause time sent by busy peers during
index transfer / index distribution
   See: http://www.yacy-forum.de/viewtopic.php?p=22647#22491

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2205 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-14 09:40:42 +00:00
orbiter
c75cacda95 added a flex-width-array: this is a table where it is
possible to add columns to an existing table

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2163 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-01 16:01:24 +00:00
orbiter
5041d330ce refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2150 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-28 11:44:50 +00:00
orbiter
bd057b44dd - automatic setting of peer-does-not-accept-remote-crawl
- increased percentage of object cache to node cache to 30%

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2136 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-23 22:03:09 +00:00
orbiter
cda087f43b - integrated cache miss storage into object cache
- removed cache-miss handling from indexURL
todo: new Monitoring in PerformanceMemory_p

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2132 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-23 16:43:28 +00:00
theli
61078b3885 *) adding support for delayed shutdown
- needed by Ismael to receive the Steering page properly on shutdown
   - now the steering page should always be displayed properly in the web browser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2129 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-22 08:02:35 +00:00
orbiter
90d569d70f refactoring of index management:
url storage is part of index management; moved plasmaURL to indexURL

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2122 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-19 23:50:55 +00:00
orbiter
a930be4ba3 refactoring of index management:
generalized the index entry

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2121 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-19 23:19:20 +00:00
hermens
df7e1d9df3 Changes to plasmaURL and subclasses:
- Improve performance of plasmaURL.exists() by remembering URL-hashes that are not present
- Use a more realistic estimation of memory usage by the existsIndex cache
- Routine cleanup of the existsIndex to limit its memory usage



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2113 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-17 13:08:57 +00:00
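
The idea behind the plasmaURL.exists() change above, remembering URL hashes that are known to be absent and cleaning that memory up so it stays bounded, can be sketched as follows. ExistsIndex, its fields, and the clear-when-full cleanup policy are assumptions made for illustration; the commit message does not say how the real existsIndex is organized.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a bounded negative cache (not the plasmaURL code).
class ExistsIndex {
    private final Set<String> stored = new HashSet<>();       // stands in for the database
    private final Set<String> knownMissing = new HashSet<>(); // hashes known to be absent
    private final int maxMissing;                              // limit for knownMissing
    int lookups = 0;                                           // simulated database lookups

    ExistsIndex(int maxMissing) {
        this.maxMissing = maxMissing;
    }

    void store(String urlHash) {
        stored.add(urlHash);
        knownMissing.remove(urlHash); // the hash is present now
    }

    boolean exists(String urlHash) {
        if (knownMissing.contains(urlHash)) {
            return false; // answered from memory, no database lookup
        }
        lookups++;
        boolean found = stored.contains(urlHash);
        if (!found) {
            // Crude cleanup (assumption): drop everything once the limit is reached.
            if (knownMissing.size() >= maxMissing) {
                knownMissing.clear();
            }
            knownMissing.add(urlHash);
        }
        return found;
    }
}
```

Asking twice for the same missing hash triggers only one lookup; storing the hash afterwards invalidates the remembered miss.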
orbiter
a474669338 start with refactoring of index management
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2110 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-16 16:11:55 +00:00
theli
f331def5d8 *) Bugfix for distribution. Incorrect behavior if peerCount == selectedCount
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2098 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-15 10:03:24 +00:00
theli
bcc950c533 *) Bugfix for Index Transfer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2088 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-13 15:28:57 +00:00
orbiter
461548698c configuration of index transfer chunk size
see http://www.yacy-forum.de/viewtopic.php?p=20951#20951
new properties in yacy.init:
indexDistribution.minChunkSize = 5
indexDistribution.maxChunkSize = 1000
indexDistribution.startChunkSize = 50

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2073 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-09 11:43:10 +00:00
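
A minimal sketch of how the three yacy.init properties from the commit above might be read. Only the property names and the values 5, 1000 and 50 come from the commit message; ChunkSizeConfig, the use of those values as fallbacks, and the clamping of a requested chunk size to the min/max range are assumptions, since the message does not describe how the values are applied.

```java
import java.util.Properties;

// Hypothetical helper; only the three property names come from the commit message.
class ChunkSizeConfig {
    final int min;
    final int max;
    final int start;

    ChunkSizeConfig(Properties p) {
        // Fallback values mirror the ones listed in the commit message.
        min = read(p, "indexDistribution.minChunkSize", "5");
        max = read(p, "indexDistribution.maxChunkSize", "1000");
        start = read(p, "indexDistribution.startChunkSize", "50");
    }

    private static int read(Properties p, String key, String fallback) {
        return Integer.parseInt(p.getProperty(key, fallback).trim());
    }

    // Assumption: a requested chunk size is kept within [min, max].
    int clamp(int requested) {
        return Math.max(min, Math.min(max, requested));
    }
}
```

With the values from the commit message, clamp(1) yields the minimum and clamp(5000) yields the maximum, while values inside the range pass through unchanged.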
hermens
51e3bb576f Don't increase dhtTransferIndexCount when the last transferred index was smaller
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2064 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-07 17:44:33 +00:00
hermens
a0ca4c5fb8 Remove a possible race condition between DHT transfer and deQueue
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2059 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-05 13:17:00 +00:00
orbiter
60e5aff9fc some enhancements to the remote crawl trigger
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2030 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-20 11:53:15 +00:00
orbiter
14d6e476c9 tried to solve some problems with new picture viewer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2019 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-10 22:34:47 +00:00
orbiter
f0833b0328 introduced simple search interface
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2007 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-06 21:48:24 +00:00
orbiter
83e0e765ec redesigned some parts of the html scanner & parser
to better support image tags

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1995 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-04 14:36:01 +00:00
orbiter
e2e8d0c188 some kind of refactoring of yacysearch:
made 'room' for new picture search result presentation

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1993 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-03 22:47:59 +00:00
rramthun
250864406f ...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1955 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-23 20:24:53 +00:00
orbiter
63f39ac7b5 added 3 new crawling steering options:
- re-crawl by age of page (enter in minutes)
- auto-domain-filter
- maximum number of pages per domain
NOT YET TESTED!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1949 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-23 16:05:16 +00:00
orbiter
1fc3b34be6 some pre-work (without function yet) to implement:
- re-crawl (by age of last crawl)
- auto-crawl-filter by crawl depth (to be explained..)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1948 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-22 15:28:17 +00:00
theli
c9e6b5e391 *) check size of indexing-queue and crawler pool before processing remote triggered crawl jobs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1946 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-22 14:19:03 +00:00
orbiter
1f4412a146 adapted isListed to the new behavior as discussed (url, getFile)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1940 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-20 22:31:59 +00:00
orbiter
063ef4660a bug?
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1936 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-19 22:06:15 +00:00
orbiter
3286b1f498 re-organisation of lurl-creation and -stacking
this was necessary to prevent useless writes to the database
when the url appears in the blacklist

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1905 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-17 10:16:07 +00:00
hydrox
8da13088e9 *) removed multiple DHT_Distribution_Threads
*) boosted DHT_Distribution by sending chunks in parallel to multiple peers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1890 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-15 11:27:43 +00:00
orbiter
bcd99fe83e introduced a second RAM cache for DHT transfer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1880 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-13 10:43:12 +00:00
orbiter
bae3783d38 added a snippet marking
(search words are now bold in snippets)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1823 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-05 01:11:06 +00:00
orbiter
f0a38873eb * added yacysearch page with better view on search results
the old search page is obsolete and will be removed
* ConfigBasic.html is now the default page instead of index.html
  as long as no password is set

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1815 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-04 18:52:04 +00:00
theli
759800f543 *) Bugfix for storeHTCache problem
- content was not indexed if storeHTCache was off
   See: http://www.yacy-forum.de/viewtopic.php?p=18269
   See: http://www.yacy-forum.de/viewtopic.php?t=1882
   See: http://www.yacy-forum.de/viewtopic.php?t=241

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1800 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-03 08:30:08 +00:00
orbiter
1b9b8922d9 * fixed problems with new basic 1-2-3 configuration (now authentication required)
* fixed graphics problem
* fixed some other problems with default values
* 1-2-3 config now appears automatically on start-up if no password is set
* added new config menu
* moved profile to new config menu


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1792 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-01 22:27:20 +00:00
auron_x
8c6f38fe70 *) added Blog to YaCy (atm not reachable through interface) -> Blog.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1790 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-01 07:40:25 +00:00