theli
4ca0857c0c
*) Index transfer now considers the pause time sent by busy peers during
...
index transfer / index distribution
See: http://www.yacy-forum.de/viewtopic.php?p=22647#22491
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2205 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-14 09:40:42 +00:00
orbiter
c75cacda95
added a flex-width-array: this is a table where it is
...
possible to add columns to an existing table
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2163 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-01 16:01:24 +00:00
orbiter
5041d330ce
refactoring
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2150 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-28 11:44:50 +00:00
orbiter
bd057b44dd
- automatic setting of peer-does-not-accept-remote-crawl
...
- increased percentage of object cache to node cache to 30%
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2136 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-23 22:03:09 +00:00
orbiter
cda087f43b
- integrated cache miss storage into object cache
...
- removed cache-miss handling from indexURL
todo: new Monitoring in PerformanceMemory_p
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2132 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-23 16:43:28 +00:00
theli
61078b3885
*) adding support for delayed shutdown
...
- needed by Ismael to receive the Steering page properly on shutdown
- now the steering page should always be displayed properly in the web browser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2129 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-22 08:02:35 +00:00
orbiter
90d569d70f
refactoring of index management:
...
url storage is part of index management; moved plasmaURL to indexURL
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2122 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-19 23:50:55 +00:00
orbiter
a930be4ba3
refactoring of index management:
...
generalized the index entry
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2121 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-19 23:19:20 +00:00
hermens
df7e1d9df3
Changes to plasmaURL and subclasses:
...
- Improve performance of plasmaURL.exists() by remembering URL-hashes that are not present
- Use a more realistic estimation of memory usage by the existsIndex cache
- Routine cleanup of the existsIndex to limit its memory usage
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2113 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-17 13:08:57 +00:00
orbiter
a474669338
start with refactoring of index management
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2110 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-16 16:11:55 +00:00
theli
f331def5d8
*) Bugfix for distribution. Incorrect behavior if peerCount == selectedCount
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2098 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-15 10:03:24 +00:00
theli
bcc950c533
*) Bugfix for Index Transfer
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2088 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-13 15:28:57 +00:00
orbiter
461548698c
configuration of index transfer chunk size
...
see http://www.yacy-forum.de/viewtopic.php?p=20951#20951
new properties in yacy.init:
indexDistribution.minChunkSize = 5
indexDistribution.maxChunkSize = 1000
indexDistribution.startChunkSize = 50
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2073 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-09 11:43:10 +00:00
hermens
51e3bb576f
Don't increase dhtTransferIndexCount when the last transferred index was smaller
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2064 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-07 17:44:33 +00:00
hermens
a0ca4c5fb8
Remove a possible race condition between DHT transfer and deQueue
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2059 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-05 13:17:00 +00:00
orbiter
60e5aff9fc
some enhancements to the remote crawl trigger
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2030 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-20 11:53:15 +00:00
orbiter
14d6e476c9
tried to solve some problems with new picture viewer
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2019 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-10 22:34:47 +00:00
orbiter
f0833b0328
introduced simple search interface
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2007 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-06 21:48:24 +00:00
orbiter
83e0e765ec
redesigned some parts of the html scanner & parser
...
to better support image tags
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1995 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-04 14:36:01 +00:00
orbiter
e2e8d0c188
some kind of refactoring of yacysearch:
...
made 'room' for new picture search result presentation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1993 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-03 22:47:59 +00:00
rramthun
250864406f
...
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1955 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-23 20:24:53 +00:00
orbiter
63f39ac7b5
added 3 new crawling steering options:
...
- re-crawl by age of page (enter in minutes)
- auto-domain-filter
- maximum number of pages per domain
NOT YET TESTED!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1949 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-23 16:05:16 +00:00
orbiter
1fc3b34be6
some pre-work (without function yet) to implement:
...
- re-crawl (by age of last crawl)
- auto-crawl-filter by crawl depth (to be explained..)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1948 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-22 15:28:17 +00:00
theli
c9e6b5e391
*) check size of indexing-queue and crawler pool before processing remote triggered crawl jobs
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1946 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-22 14:19:03 +00:00
orbiter
1f4412a146
adapted isListed to the new behavior as discussed (url, getFile)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1940 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-20 22:31:59 +00:00
orbiter
063ef4660a
bug?
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1936 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-19 22:06:15 +00:00
orbiter
3286b1f498
re-organisation of lurl-creation and -stacking
...
this was necessary to prevent useless writes to the database
when the url appears in the blacklist
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1905 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-17 10:16:07 +00:00
hydrox
8da13088e9
*) removed multiple DHT_Distribution_Threads
...
*) boosted DHT_Distribution by sending chunks to multiple peers in parallel
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1890 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-15 11:27:43 +00:00
orbiter
bcd99fe83e
introduced a second RAM cache for DHT transfer
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1880 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-13 10:43:12 +00:00
orbiter
bae3783d38
added a snippet marking
...
(search words are now bold in snippets)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1823 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-05 01:11:06 +00:00
orbiter
f0a38873eb
* added yacysearch page with better view on search results
...
the old search page is obsolete and will be removed
* ConfigBasic.html is now the default page instead of index.html
as long as no password is set
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1815 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-04 18:52:04 +00:00
theli
759800f543
*) Bugfix for storeHTCache problem
...
- content was not indexed if storeHTCache was off
See: http://www.yacy-forum.de/viewtopic.php?p=18269
See: http://www.yacy-forum.de/viewtopic.php?t=1882
See: http://www.yacy-forum.de/viewtopic.php?t=241
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1800 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-03 08:30:08 +00:00
orbiter
1b9b8922d9
* fixed problems with new basic 1-2-3 configuration (authentication is now required)
...
* fixed graphics problem
* fixed some other problems with default values
* 1-2-3 config now appears automatically on start-up if no password is set
* added new config menu
* moved profile to new config menu
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1792 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-01 22:27:20 +00:00
auron_x
8c6f38fe70
*) added Blog to YaCy (atm not reachable through interface) -> Blog.html
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1790 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-01 07:40:25 +00:00
orbiter
eaffcfefe2
* added more ranking attributes (without function; this will be added later)
...
* added ranking coefficient transmission to remote peer (without evaluation on server side, will be added later)
* changed ranking coefficients slightly
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1770 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-26 11:30:37 +00:00
orbiter
3703f76866
- fixed re-search bug: after a search with several words, a second search could not
...
find the same words as before. This happened because indexContaines stored the url references
with a hashtable. A tree was needed to work with the index conjunction-by-numeration
- added permanent ram cache flush (again)
- removed direct flush of ram cache after a large container is added.
This happens especially during DHT transmission and therefore this fix should
speed up DHT transmission on server side.
- removed unused and out-dated methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1765 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-25 08:42:45 +00:00
theli
fbbbf5f411
*) remote trigger for proxy-crawl
...
- remote crawling can now be enabled for the proxy crawling profile
See: http://www.yacy-forum.de/viewtopic.php?p=17753#17753
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1758 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-24 09:35:54 +00:00
orbiter
1d8ca6e082
serialized dhtChunk deletion with indexing
...
The dht selection, transmission and deletion is now completely serialized with indexing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1731 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-21 23:08:07 +00:00
theli
2336f0f013
*) allow pausing/resuming of crawlJob Threads separately
...
- pausing/resuming localCrawls
- pausing/resuming remoteTriggeredCrawls
- pausing/resuming globalCrawlTrigger
See: http://www.yacy-forum.de/viewtopic.php?t=1591
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1723 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-21 11:18:48 +00:00
orbiter
60dac4325e
serialized indexing with dht selection
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1719 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-20 23:57:50 +00:00
orbiter
a840755964
moved parts of index transfer logic back to switchboard
...
this is needed to merge the dht selection with the indexing thread
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1718 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-20 23:27:11 +00:00
borg-0300
64441b1f78
ADDED: yacy.badwords list to filter the topwords
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1711 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-20 17:50:42 +00:00
orbiter
2c4e4ae6a2
further refactoring of dht selection, transfer and flushing
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1707 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-19 23:47:45 +00:00
orbiter
73dad68cf1
moved theli's DHT flush class into its own file
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1706 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-19 21:54:46 +00:00
theli
42a5f56723
*) Bugfix for broken dht thread configuration
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1695 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-18 14:51:01 +00:00
hydrox
e2af2a3f45
*) it's now possible to run more than one indexDistribution-Thread
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1673 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-16 15:22:25 +00:00
theli
980e986b64
*) Re-enabling short cycle for already removed nurl entries
...
See: http://www.yacy-forum.de/viewtopic.php?p=17147#17147
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1660 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-16 06:59:34 +00:00
allo
a26574c894
Migration from tagName as key to wordhash(tagName) as key for bookmarkTags.db
...
(just deleting the old db, rebuildTags does the rest)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1637 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-14 10:00:12 +00:00
orbiter
1e4578aab6
VERY EXPERIMENTAL removal of index ram cache flushing thread.
...
The cache will fill up and be flushed explicitly when it is full.
This shall remove double-access of assortments (indexing and flush)
during the indexing process. Hopefully this should reduce IO.
The main idea is: the cache shall mainly be flushed by DHT transfer, and
only indexes that shall be hosted by the own peer are flushed to the
assortments. This needs further work.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1617 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-11 23:19:01 +00:00
orbiter
d98418390b
- introduced rankingProfile Class
...
- selection of ranking and timing profiles for each search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1539 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-04 23:51:00 +00:00