Commit Graph

109 Commits

Author SHA1 Message Date
orbiter
f188611fc6 apply blacklist on rwis during dht receive
very experimental!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1865 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-09 10:46:02 +00:00
theli
5ee0125046 *) adding the possibility to configure the server port for seed uploading via scp.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1861 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-08 16:34:05 +00:00
allo
7afa5c1b8e staticIP fix
tried to solve http://www.yacy-forum.de/viewtopic.php?p=18663#18663
D 2006/03/08 07:08:20 YACY yacyClient.publishMySeed mySeed error - not proper: IP is not proper: -UNRESOLVED_PATTERN-


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1859 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-08 12:23:26 +00:00
theli
f108048a2c *) Bugfix for NullPointerException in hello.java
*) Correcting for-loop in hello.java

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1854 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-08 06:40:38 +00:00
orbiter
bae3783d38 added a snippet marking
(search words are now bold in snippets)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1823 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-05 01:11:06 +00:00
allo
f73d51f94b reverted last change
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1810 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-03 19:20:35 +00:00
allo
8997b83806 store the staticIP(dyndns) in seed, not the real IP
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1809 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-03 17:33:05 +00:00
allo
7c5f8f997a some more staticIP fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1784 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-28 12:20:19 +00:00
orbiter
d31a4e0b4f some small enhancements with cache flushing parameters and data structures
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1767 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-25 16:10:31 +00:00
hermens
3208fe14ed *) log exceptions in crawlOrder.java to the logfile instead of stdout
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1735 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-22 01:04:38 +00:00
orbiter
7eb10675b3 re-organization of index management
this was done to be prepared for new storage algorithms


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1635 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-14 00:12:07 +00:00
theli
d0f76fc9bc *) setting logging level for thread pools to info
*) new layout for bookmark list 
   (Allo: please take a look if it's acceptable for you)
*) crawlReceipt.java: displaying peer name in logging message
*) Network.html: adding button for manual peer ping

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1584 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-09 08:29:07 +00:00
orbiter
fb7411d7bb re-structuring of ranking application:
concentration of all ranking attributes in the
plasmaSearchRankingProfile

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1541 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-05 01:47:51 +00:00
orbiter
d98418390b - introduced rankingProfile Class
- selection of ranking and timing profiles for each search


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1539 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-04 23:51:00 +00:00
allo
1f3eaf9f8e use DATA/HTDOCS for notifier.gif. Works even if htroot is read-only
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1526 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-03 21:21:42 +00:00
orbiter
fa90c3ca7a - removed some usage of indexEntity
- changed index collection process: indexes are no longer flushed to indexEntity first,
  but are now collected directly from the RAM cache and assortments

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1489 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-30 12:42:06 +00:00
orbiter
03c65742ba changes towards the new index storage scheme:
- replaced usage of temporary IndexEntity by EntryContainer
- added more attributes to word index
- added exact-string search (using quotes in query)
- disabled writing into WORDS during search; EntryContainers are used instead


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1485 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-30 00:42:38 +00:00
rramthun
5942f6334c Some language fixes.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1386 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-20 20:32:25 +00:00
orbiter
f4ffa9aee5 - added more attributes to index entries
- implemented hand-over of new word index attributes during remote search
- implemented word-distance computation during search

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1382 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-20 15:14:21 +00:00
borg-0300
c5b6154136 added CRDistOn = true/false
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1372 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-18 02:18:23 +00:00
borg-0300
8d8a40c2d9 added properties
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1369 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-18 00:03:28 +00:00
orbiter
cfd1e5e376 more security for index transfer protocol:
- allow only specific file names
- log IP number of accessing peer in case of attack attempts

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1367 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-17 22:19:18 +00:00
orbiter
423ce9bf59 quickfix for http://www.yacy-forum.de/viewtopic.php?p=15336#15336
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1366 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-17 21:50:40 +00:00
allo
5eba6c66c6 theli's fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1364 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-17 19:01:20 +00:00
rramthun
c59027e520 Translated status_p.inc a bit further, but it didn't work.
See http://www.yacy-forum.de/viewtopic.php?p=15180#15180

Added my seed to superseed.txt as I am now the proud owner of a PC which runs YaCy most of the day.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1343 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-14 22:01:21 +00:00
orbiter
9544c47684 added some UTF-8 handling.
Hope this will help somehow, though it is surely not THE solution to our UTF-8 problem.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1308 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-10 16:48:59 +00:00
orbiter
9086261476 refactoring of base64 encoding:
the kelondro database needs specific information about the order of
base64-encoded keys. Since no other package depends on base64
(only the httpd uses base64 for encryption, but does not need to encode these strings)
it is good to move base64 encoding to the new ordering classes in kelondro.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1284 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-04 00:39:00 +00:00
orbiter
b3dca06bb1 added location column to network pages.
The location is computed from the userAgent string of connecting peers.
Therefore this information is not available right after start-up.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1241 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-22 01:01:46 +00:00
orbiter
bb79fb5d91 - changed handling of error cases when retrieving URLs from the database
  (no more NULL values are returned; instead, an IOException is thrown)
- removed the ugly damagedURLS implementation from plasmaCrawlLURL.java
  (it inserted a static value into the Object, which is not good style)
- re-coded the damagedURLS collection in yacy.java by catching an exception and evaluating the exception message
to do:
- the urldbcleanup feature must be re-tested


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1200 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-11 00:25:02 +00:00
orbiter
37f88b4017 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1176 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-06 23:51:29 +00:00
orbiter
8f1f2daa5e implemented interactive link deletion of search results.
next steps: attach voting and restrict to administrator
to see the deletion button, move the mouse pointer to the left of a search result

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1172 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-06 16:15:21 +00:00
orbiter
7920e1547d code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1163 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-05 09:13:13 +00:00
orbiter
1d6a6d1f85 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1159 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-05 00:17:12 +00:00
orbiter
a04930f025 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1158 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-04 23:51:28 +00:00
orbiter
b9cc9029e3 added ybr selection for remote search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1119 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-22 16:10:24 +00:00
theli
89fab9f200 *) Correcting Problems with lURLEntries containing null URLs.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1104 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-17 21:37:24 +00:00
theli
23dc904e0e *) Correcting Problems with lURLEntries containing null URLs.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1102 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-17 15:43:21 +00:00
theli
0610ff4fe9 *) small changes to crawlReceipt.java
   - we do not know whether the URL was stored in the noticeURL-DB with the old or the new hash;
     therefore we now try to remove the URL from the noticeURL-DB using both hash values

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1082 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 08:40:49 +00:00
orbiter
e9d6defce6 quickfix for http://www.yacy-forum.de/viewtopic.php?p=12638#12638
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1073 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-14 09:48:39 +00:00
orbiter
f763923e0a added missing files for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1057 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-11 08:02:46 +00:00
orbiter
d2731418bf added creation of global ranking files and changed url normal form usage
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1046 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-07 12:33:02 +00:00
theli
fb766413d1 *) Changes on httpc dns caching
   - Bugfix: the old DNS cache did not handle case-insensitive hostnames correctly.
   - adding a possibility to set domain name patterns defining hostnames that should not be cached by the httpc dns cache
     e.g. borg-300.dyndns.org
     This can be done by setting the new httpc.nameCacheNoCachingPatterns property
   - using httpc.dnsResolve wherever possible within the sourcecode
     [httpd.java,plasmaCrawlStacker.java]

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1044 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-07 10:57:54 +00:00
borg-0300
440e6ed747 see http://www.yacy-forum.de/viewtopic.php?t=1416
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1025 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-03 23:49:50 +00:00
theli
b8ceb1ffde *) Adding better https support for crawler
   - solving problems with unknown certificates by implementing a dummy trust manager
   - adding https support to robots-parser 
   - Seed File can now be downloaded from https resources
   - adapting plasmaHTCache.java to support https URLs properly

*) URL Normalization
   - sub URLs are now normalized properly during indexing
   - pointing urlNormalForm function of plasmaParser to htmlFilterContentScraper function
   - normalizing URLs which were received by a crawlOrder request

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1024 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-03 15:28:37 +00:00
theli
f871408729 *) sharedBlacklist_p.java
   - Setting Pragma: no-cache
   - increasing timeout to 12 sec.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1019 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-03 08:32:43 +00:00
theli
8194fde340 *) trying to continue transferRWI processing even if this error occurs:
|> Caused by: de.anomic.kelondro.kelondroException: kelondroTree.searchproc: nullpointernull in db '.../urlHash.db' 
   - if URL existence can not be determined, we request it from the remote peer

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@997 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-28 07:47:30 +00:00
orbiter
4dcbc26ef1 introduction of search profiles; very experimental
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@976 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-23 17:50:27 +00:00
theli
7256bea45f *) Bugfix for nameLookup parameter handling
*) Bugfix for Received xx Words [xxxxxxx .. null] Bug



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@953 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-19 05:38:04 +00:00
theli
40777556c5 *) Connection Tracking
   - adding automatic refresh
   - accepts new parameter nameLookup which can be used to deactivate 
     yacy-peer name lookup (because we have problems with this on large seed-dbs)

*) ViewFile
   New page that can be used to view 
   - original content 
   - plain text content 
   - parsed content
   - parsed sentences 
   of a webpage specified by its URL hash
   Mainly for debugging purposes at the moment

*) Robots.txt 
   Bugfix for if-modified-since usage
   TODO: synchronization of downloads to avoid loading the same robots-file 
   multiple times in parallel by different threads

*) Shutdown
   Cleaner aborting of transferRWI and transferURL sessions on server shutdown

*) Status Page
   Adding icon to start/stop crawling via status page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@950 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-18 07:45:27 +00:00
borg-0300
e642a5d8b7 more constants
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@947 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-17 15:46:12 +00:00