Commit Graph

300 Commits

Author SHA1 Message Date
theli
f331def5d8 *) Bugfix for distribution. Incorrect behavior if peerCount == selectedCount
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2098 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-15 10:03:24 +00:00
theli
bcc950c533 *) Bugfix for Index Transfer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2088 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-13 15:28:57 +00:00
orbiter
461548698c configuration of index transfer chunk size
see http://www.yacy-forum.de/viewtopic.php?p=20951#20951
new properties in yacy.init:
indexDistribution.minChunkSize = 5
indexDistribution.maxChunkSize = 1000
indexDistribution.startChunkSize = 50

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2073 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-09 11:43:10 +00:00
hermens
51e3bb576f Don't increase dhtTransferIndexCount when the last transferred index was smaller
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2064 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-07 17:44:33 +00:00
hermens
a0ca4c5fb8 Remove a possible race condition between DHT transfer and deQueue
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2059 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-05 13:17:00 +00:00
orbiter
60e5aff9fc some enhancements to the remote crawl trigger
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2030 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-20 11:53:15 +00:00
orbiter
14d6e476c9 tried to solve some problems with new picture viewer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2019 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-10 22:34:47 +00:00
orbiter
f0833b0328 introduced simple search interface
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2007 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-06 21:48:24 +00:00
orbiter
83e0e765ec redesigned some parts of the html scanner & parser
to better support image tags

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1995 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-04 14:36:01 +00:00
orbiter
e2e8d0c188 some kind of refactoring of yacysearch:
made 'room' for new picture search result presentation

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1993 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-03 22:47:59 +00:00
rramthun
250864406f ...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1955 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-23 20:24:53 +00:00
orbiter
63f39ac7b5 added 3 new crawling steering options:
- re-crawl by age of page (enter in minutes)
- auto-domain-filter
- maximum number of pages per domain
NOT YET TESTED!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1949 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-23 16:05:16 +00:00
orbiter
1fc3b34be6 some pre-work (without function yet) to implement:
- re-crawl (by age of last crawl)
- auto-crawl-filter by crawl depth (to be explained..)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1948 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-22 15:28:17 +00:00
theli
c9e6b5e391 *) check size of indexing-queue and crawler pool before processing remote triggered crawl jobs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1946 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-22 14:19:03 +00:00
orbiter
1f4412a146 adopted isListed to discussed new behavior as discussed (url, getFile)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1940 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-20 22:31:59 +00:00
orbiter
063ef4660a bug?
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1936 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-19 22:06:15 +00:00
orbiter
3286b1f498 re-organisation of lurl-creation and -stacking
this was necessary to prevent useless write to the database
in case of blacklist appearance of the url

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1905 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-17 10:16:07 +00:00
hydrox
8da13088e9 *)removed multiple DHT_Distribution_Threads
*)boosted DHT_Distribution sending chunk parallel to multiple peers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1890 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-15 11:27:43 +00:00
orbiter
bcd99fe83e introduced a second RAM cache for DHT transfer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1880 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-13 10:43:12 +00:00
orbiter
bae3783d38 added a snippet marking
(search words are now bold in snippets)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1823 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-05 01:11:06 +00:00
orbiter
f0a38873eb * added yacysearch page with better view on search results
the old search page is obsolete and will be removed
* ConfigBasic.html is now the default page instead of index.html
  as long as no password is set

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1815 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-04 18:52:04 +00:00
theli
759800f543 *) Bugfix for storeHTCache problem
- content was not indexed if storeHTCache was off
   See: http://www.yacy-forum.de/viewtopic.php?p=18269
   See: http://www.yacy-forum.de/viewtopic.php?t=1882
   See: http://www.yacy-forum.de/viewtopic.php?t=241

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1800 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-03 08:30:08 +00:00
orbiter
1b9b8922d9 * fixed problems with new basic 1-2-3 configuration (now authentication required)
* fixed graphics problem
* fixed some other problems with default values
* 1-2-3 config now appears automatically on start-up if no password is set
* added new config menu
* moved profile to new config menu


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1792 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-01 22:27:20 +00:00
auron_x
8c6f38fe70 *) added Blog to YaCy (atm not reachable through interface) -> Blog.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1790 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-01 07:40:25 +00:00
orbiter
eaffcfefe2 * added more ranking attributes (without function; this will be added later)
* added ranking coefficient transmission to remote peer (without evaluation on server side, will be added later)
* changed ranking coefficients slightly

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1770 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-26 11:30:37 +00:00
orbiter
3703f76866 - fixed re-search bug: after a search with several words, a second search could not
find the same words as before. This was caused because indexContaines stored the url references
  with a hashtable. A tree was needed to work with the index conjunction-by-numeration
- added permanent ram cache flush (again)
- removed direct flush of ram cache after a large container is added.
  this happens especially during DHT transmission and therefore this fix should
  speed up DHT transmission on server side.
- removed unused and out-dated methods

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1765 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-25 08:42:45 +00:00
theli
fbbbf5f411 *) remote trigger for proxy-crawl
- remote crawling can now be enabled for the proxy crawling profile
   See: http://www.yacy-forum.de/viewtopic.php?p=17753#17753
   

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1758 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-24 09:35:54 +00:00
orbiter
1d8ca6e082 serialized dhtChunk deletion with indexing
The dht selection, transmission and deletion is now completely serialized with indexing


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1731 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-21 23:08:07 +00:00
theli
2336f0f013 *) allow pausing/resuming of crawlJob Threads separately
- pausing/resuming localCrawls
   - pausing/resuming remoteTriggeredCrawls
   - pausing/resuming globalCrawlTrigger
   See: http://www.yacy-forum.de/viewtopic.php?t=1591

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1723 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-21 11:18:48 +00:00
orbiter
60dac4325e serialized indexing with dht selection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1719 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-20 23:57:50 +00:00
orbiter
a840755964 moved parts of index transfer logic back to switchboard
this is needed to merge the dht selection with the indexing thread

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1718 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-20 23:27:11 +00:00
borg-0300
64441b1f78 ADDED: yacy.badwords list to filter the topwords
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1711 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-20 17:50:42 +00:00
orbiter
2c4e4ae6a2 further refactoring of dht selection, transfer and flushing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1707 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-19 23:47:45 +00:00
orbiter
73dad68cf1 outsourced thelis DHT flush class into own file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1706 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-19 21:54:46 +00:00
theli
42a5f56723 *) Bugfix for broken dht thread configuration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1695 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-18 14:51:01 +00:00
hydrox
e2af2a3f45 *) it's now possible to run more then one indexDistribution-Thread
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1673 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-16 15:22:25 +00:00
theli
980e986b64 *) Re enabling short cycle for already removed nurl entries
See: http://www.yacy-forum.de/viewtopic.php?p=17147#17147 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1660 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-16 06:59:34 +00:00
allo
a26574c894 Migration from tagName as key to wordhash(tagName) as key for bookmarkTags.db
(just deleting the old db, rebuildTags does the rest)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1637 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-14 10:00:12 +00:00
orbiter
1e4578aab6 VERY EXPERIMENTAL removal of index ram cache flushing thread.
The cache will fill up and flushed explicitely when it is full.
This shall remove double-access of assortments (indexing and flush)
during indexing process. Hopefully this should reduce IO.
The main idea is: the cache shall mainly be flushed by DHT transfer, and
only indexes that shall be hosted by the own peer are flushed to the
assortments. This needs further work.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1617 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-11 23:19:01 +00:00
orbiter
d98418390b - introduced rankingProfile Class
- selection of ranking and timing profiles for each search


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1539 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-04 23:51:00 +00:00
theli
6a99304b2b *) Redesign of db import functionality
- restructuring to allow different import tasks to be controlled via one gui 
   - adding possibility to import a single assortment file
   - adding possibility to set the cache size that should be used

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1504 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-31 12:30:24 +00:00
orbiter
fa90c3ca7a - removed some usage of indexEntity
- changed index collection process: indexes are not first flushed to indexEntity,
  but now collected directly from ram cache and assortments

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1489 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-30 12:42:06 +00:00
orbiter
03c65742ba changes towards the new index storage scheme:
- replaced usage of temporary IndexEntity by EntryContainer
- added more attributes to word index
- added exact-string search (using quotes in query)
- disabled writing into WORDS during search; EntryContainers are used instead


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1485 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-30 00:42:38 +00:00
orbiter
b946e28e61 some ranking enhancements
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1460 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-27 02:48:27 +00:00
orbiter
eabf4a0386 fix for null pointer exception during shut-down
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1415 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-23 13:45:14 +00:00
orbiter
f14d49fae9 enhancements, bugfixes and additions to word index attribute storage
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1392 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-22 00:07:00 +00:00
allo
4d33020f56 Migration to WORK
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1389 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-21 16:39:57 +00:00
rramthun
1e5feedf0e Fix for http://www.yacy-forum.de/viewtopic.php?p=15547#15547
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1388 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-21 11:20:26 +00:00
orbiter
f4ffa9aee5 - implemented more attributes to index entries
- implemented hand-over of new word index attributes during remote search
- implemented word-distance computation during search

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1382 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-20 15:14:21 +00:00
orbiter
0371494010 tried to add word position to index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1377 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-19 14:13:39 +00:00
borg-0300
c5b6154136 added CRDistOn = true/false
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1372 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-18 02:18:23 +00:00
orbiter
e2ff1767b5 fix for last DHT distribution bug-fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1330 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-13 16:58:27 +00:00
allo
6822dce57b Using Orbiters function for auth
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1315 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-12 16:21:08 +00:00
orbiter
2028403670 - consolidated different orderings to kelondroNaturalOrder
- added another iteration method to rwihash-enumeration


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1309 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-11 00:32:44 +00:00
orbiter
9086261476 refactoring of base64 encoding:
the kelondro database needs specific information about the order of
base64-encoded keys. Since no other package depends on base64
(only the httpd uses base64 for encryption, but does not need to encode these strings)
it is good to move base64 encoding to the new ordering classes in kelondro.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1284 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-04 00:39:00 +00:00
allo
351fffc129 DATA/WORK for user-created content
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1274 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-31 11:47:52 +00:00
allo
a81cc9d969 no DATA/DATA to avoid confusion.
increasing version number

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1273 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-31 11:13:26 +00:00
allo
9cce3c5709 dates Table for bookmarksdb(needed for del.icio.us api)
Files in DATA/DATA
Migration: move bookmarks.db from SETTINGS in DATA

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1270 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-30 12:34:44 +00:00
allo
4ac0fd328a First Version of the Bookmarksmanager
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1248 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-26 14:21:01 +00:00
orbiter
4500506735 fixed some bugs concerning url entry retrieval and intexControl interface
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1212 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-15 10:31:00 +00:00
orbiter
83a34b838d * added Object allocation monitor on performanceMemory page
* added some final statements
* changed shutdown sequence order

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1211 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-14 13:04:43 +00:00
orbiter
0c762daf4b better startup failure handling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1205 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-12 23:59:58 +00:00
orbiter
f27f9ecf15 * activated write buffer for databases.
This should increase IO performance and reduce HD activity
* bugfixes for new exception-on-failure policy
* bugfixes for new IOChunks
* new Object pool for database write-buffer


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1204 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-12 14:11:59 +00:00
orbiter
c59d1b2f5e - Tests with write buffer (new class kelondroBufferedIOChunks, not yet active)
- minor bugfixes


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1203 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-12 00:19:28 +00:00
orbiter
bb79fb5d91 - changed handling of error cases retrieving urls from database
(no more NULL values are returned, instead, an IOException is thrown)
- removed ugly damagedURLS implementation from plasmaCrawlLURL.java
  (this inserted a static value into the Object which is not really a good style)
- re-coded damagedURLS collection in yacy.java by catching an exception and evaluating the exception message
to do:
- the urldbcleanup feature must be re-tested


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1200 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-11 00:25:02 +00:00
theli
386d9e45d8 *) Bugfix for code cleanup
- Code must be in finally block, otherwise it does not work if an error occurs!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1193 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-08 22:16:49 +00:00
rramthun
a1061495d4 Fixed some spelling mistakes and added some text which (should) make it easier to understand the options.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1187 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-07 19:47:21 +00:00
orbiter
0cdc58aaea fixed indexing of local domains.
see http://www.yacy-forum.de/viewtopic.php?p=13680#13680

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1186 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-07 14:26:43 +00:00
theli
e1c2d8ec5f *) Speedup "removed from queue"
See: http://www.yacy-forum.de/viewtopic.php?p=13442#12188

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1183 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-07 11:27:44 +00:00
orbiter
13fdebc50d added authentication for link deletion in search result
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1177 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-07 00:36:05 +00:00
orbiter
37f88b4017 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1176 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-06 23:51:29 +00:00
orbiter
ec2b39c1ce code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1175 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-06 22:30:15 +00:00
orbiter
8f1f2daa5e implemented interactive link deletion of search results.
next steps: attach voting and restrict to administrator
to see the deletion button, move the mouse pointer to the left of a search result

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1172 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-06 16:15:21 +00:00
theli
44fa94ac52 *) Modifications for dbImport functionality
- dbImporter threads are now shutdown by the switchboard on server shutdown
   - adding possibility to pause a importer thread via GUI
   - Bugfix for abort function
     See: http://www.yacy-forum.de/viewtopic.php?p=13363#13363

*) Modification of content parser configuration
   - now it's possible to configure which parsers should be enabled for the proxy,
     crawler, icap, etc. separately
   - 

*) htmlFilterContentScraper.java
   - adding regular expression to normalize URLs containing /../ and /./ parts

*) httpc.java
   - adding functionality to unzip gzipped content
   - requested by roland: should be used later to allow gzipped seed lists

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1170 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-06 10:41:19 +00:00
orbiter
3d8a5ae652 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1166 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-05 14:24:13 +00:00
orbiter
f57e2d67f5 shortened network overview (less columns fit easier on page)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1124 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-23 11:57:30 +00:00
orbiter
85282b1d98 enhanced YBR recognition and search result heuristics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1121 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-23 01:40:02 +00:00
orbiter
0e25020f51 added first generation and usage of YBR index-files. Enhanced overall ranking of search results.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1118 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-22 15:17:05 +00:00
orbiter
0ec54d9c5f enhanced CR-file handling and added first RCI-evaluation tests
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1110 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-20 18:55:35 +00:00
orbiter
24dc0e0760 implemented cr-file processing and further transmission steps
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1099 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-17 01:59:01 +00:00
theli
86a9210264 *) indexing queue slots are now configurable via config file
See: http://www.yacy-forum.de/viewtopic.php?t=1480

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1081 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 08:25:46 +00:00
borg-0300
ebac51df52 restore defaultRemoteProfile
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1063 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-12 11:38:35 +00:00
borg-0300
5778428455 move cutUrlText to nxTools,
max length from URLs(title) on searchpage now 120 chars


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1060 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-11 13:40:53 +00:00
borg-0300
9158845c3b bugfix for snippet text null bytes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1059 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-11 13:27:36 +00:00
orbiter
f763923e0a added missing files for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1057 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-11 08:02:46 +00:00
orbiter
79818a320f introduced citation-rank transmission protocol and activate transport for anonymisation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1055 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-10 23:48:20 +00:00
theli
7e0647f692 *) Bugfix for userDB usage during authentication
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1052 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-08 10:17:12 +00:00
orbiter
d2731418bf added creation of global ranking files and changed url normal form usage
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1046 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-07 12:33:02 +00:00
theli
fb766413d1 *) Changes on httpc dns caching
- Bugfix: old dns cache did not handle case insensitive hostnames correctly. 
   - adding a possibility to set domain name patterns defining hostnames that should not be cached by the httpc dns cache
     e.g. borg-300.dyndns.org
     This can be done by setting the new httpc.nameCacheNoCachingPatterns property
   - using httpc.dnsResolve wherever possible within the sourcecode
     [httpd.java,plasmaCrawlStacker.java]

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1044 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-07 10:57:54 +00:00
theli
dd24f0252f *) Searchword highlighting for info page
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1036 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-06 06:27:17 +00:00
theli
b8ceb1ffde *) Adding better https support for crawler
- solving problems with unkown certificates by implementing a dummy trust Manager
   - adding https support to robots-parser 
   - Seed File can now be downloaded from https resources
   - adapting plasmaHTCache.java to support https URLs properly

*) URL Normalization
   - sub URLs are now normalized properly during indexing
   - pointing urlNormalForm function of plasmaParser to htmlFilterContentScraper function
   - normalizing URLs which were received by a crawlOrder request

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1024 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-03 15:28:37 +00:00
borg-0300
e3179a6394 added getOwnSeedFile()
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1022 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-03 14:07:58 +00:00
hydrox
cb69047b91 *)cleanup access static methods and fields
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1016 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-02 17:56:26 +00:00
allo
92c49b406b adminAuth with userDB and adminAuthenticated (fix for statuspage)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1001 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-28 18:06:36 +00:00
theli
ec3af327f7 *) Bugfix for Proxy-Authentication against remote proxy
See: http://www.yacy-forum.de/viewtopic.php?p=11804#11804

*) Adding first version of db test for mysql
   NOTES:
   - db user + db + db table must be created before starting the test
   - db table must be empty. Entries can not be updated at the moment
   - db connection properties must be changed in the sourcecode at the moment
   TODOs:
   - accepting connection properties via command line
   - implementing update + remove + read operations
   - 'maybe' adding code to create db + table if it doesn't exists

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@991 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-27 11:28:37 +00:00
orbiter
1aa4ba8b62 added post-search filtering of redundant urls (longer than existing cited)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@982 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-25 09:06:00 +00:00
orbiter
4dcbc26ef1 introduction of search profiles; very experimental
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@976 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-23 17:50:27 +00:00
theli
9a5ab62928 *) Adding yacy specific X-YACY-Index-Control header which can be used by clients
to disallow yacy to index the response that belongs to the request where 
   X-YACY-Index-Contro is set to "no-index"

*) Bugfix for Seed-List download via Remote Proxy.
   Now the pragma and cache-control http headers of the request are properly set to "no-cache" 
   See: http://www.yacy-forum.de/viewtopic.php?p=11639#11639

*) Bugfix for http-Proxy
   yacy has ignored "no-cache"- pragma and cache-control http headers that were send in requests.
   Now, these request headers are evaluated properly

TODO: Missing evaluation of "no-store" request headers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@971 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-23 10:35:05 +00:00
theli
02d9af1a70 *) Restructuring and extending of Remote Proxy Support
- remote proxy configuration can now be "really" changed on the fly and takes effect immediately
   - adding possibility to disable remote proxy usage for yacy->yacy communication
   - adding possibility to disable remote proxy usage for ssl
   - restructuring proxy configuration so that it is stored in a single place now

*) Adding possibility to import a foreign word DB (or even more of them in parallel) 
   at runtime into the peers DB
   - this can be done by calling IndexImport_p.html 
   - ATTENTION: please not that at the moment this thread must be aborted via gui
     before a normal server shutdown is done. 
   - TODO: integrating IndexImport Thread into normal server shutdown
   - TODO: Adding posibility to import crawl-queues, etc. from foreign peers
   - TODO: removing old import function from yacy.java and calling the new routines instead

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@968 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-22 13:28:04 +00:00
borg-0300
58b670201d now, changed HTCacheSize needs no restart
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@961 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-19 17:59:54 +00:00