Commit Graph

483 Commits

Author SHA1 Message Date
allo
351fffc129 DATA/WORK for user-created content
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1274 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-31 11:47:52 +00:00
allo
a81cc9d969 no DATA/DATA to avoid confusion.
increasing version number

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1273 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-31 11:13:26 +00:00
borg-0300
b95c5d5781 BUGFIX for URLs like "/../" ...;
new port handling;

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1271 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-30 12:58:36 +00:00
allo
9cce3c5709 dates Table for bookmarksdb(needed for del.icio.us api)
Files in DATA/DATA
Migration: move bookmarks.db from SETTINGS in DATA

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1270 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-30 12:34:44 +00:00
hermens
11fe95832e avoid division by zero when index transfer is extremely fast
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1269 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-29 20:01:05 +00:00
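
The guard described in this commit can be pictured with a minimal sketch. The class and method names (`TransferRate`, `entriesPerSecond`) are hypothetical and not taken from the YaCy source; the idea is only that an elapsed time of 0 ms is clamped to 1 ms before it is used as a divisor.

```java
class TransferRate {
    // Hypothetical helper: entries transferred per second.
    // The elapsed time is clamped to at least 1 ms so that an
    // extremely fast transfer cannot cause a division by zero.
    static long entriesPerSecond(long entries, long elapsedMillis) {
        long safeMillis = Math.max(1L, elapsedMillis);
        return entries * 1000L / safeMillis;
    }

    public static void main(String[] args) {
        System.out.println(entriesPerSecond(500, 0));   // clamped divisor
        System.out.println(entriesPerSecond(500, 250)); // normal case
    }
}
```

Clamping keeps the reported rate finite; an alternative would be to skip the rate display entirely when no measurable time has passed.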
allo
4ac0fd328a First Version of the Bookmarksmanager
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1248 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-26 14:21:01 +00:00
theli
d7b6dcbe2e *) Bugfix for MalformedURL problem if Location header is empty.
See: http://www.yacy-forum.de/viewtopic.php?p=14325#14325

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1247 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-25 13:56:11 +00:00
hermens
5b3e01bd3c avoid division by zero when importing very small indexes (<100 entries)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1238 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-20 12:03:34 +00:00
borg-0300
b7f9adc2c9 new filters added
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1231 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-18 01:15:25 +00:00
theli
79667a172e *) Bugfix for additional parser problem
See: http://www.yacy-forum.de/viewtopic.php?p=14146#14146

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1221 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-16 09:10:15 +00:00
theli
8c594841a8 *) Bugfix for incorrect indexing of URLs that were requested with Cookies in the
Request header
   See: http://www.yacy-forum.de/viewtopic.php?p=14077

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1214 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-15 15:30:24 +00:00
orbiter
b5d02d649a fixed bug that caused strange search result behaviour
(results from remote peers had not been saved properly after search)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1213 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-15 13:21:42 +00:00
orbiter
4500506735 fixed some bugs concerning url entry retrieval and indexControl interface
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1212 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-15 10:31:00 +00:00
orbiter
83a34b838d * added Object allocation monitor on performanceMemory page
* added some final statements
* changed shutdown sequence order

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1211 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-14 13:04:43 +00:00
orbiter
4ff3d219e8 increased delay for cacheScan start and slowed down scan process
to provide more time to other tasks

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1210 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-13 21:15:52 +00:00
orbiter
3031903d50 re-design of RAM cache flush into assortment cluster
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1209 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-13 16:00:20 +00:00
orbiter
0c762daf4b better startup failure handling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1205 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-12 23:59:58 +00:00
orbiter
f27f9ecf15 * activated write buffer for databases.
This should increase IO performance and reduce HD activity
* bugfixes for new exception-on-failure policy
* bugfixes for new IOChunks
* new Object pool for database write-buffer


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1204 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-12 14:11:59 +00:00
orbiter
c59d1b2f5e - Tests with write buffer (new class kelondroBufferedIOChunks, not yet active)
- minor bugfixes


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1203 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-12 00:19:28 +00:00
orbiter
bb79fb5d91 - changed handling of error cases retrieving urls from database
(no more NULL values are returned, instead, an IOException is thrown)
- removed ugly damagedURLS implementation from plasmaCrawlLURL.java
  (this inserted a static value into the Object which is not really a good style)
- re-coded damagedURLS collection in yacy.java by catching an exception and evaluating the exception message
to do:
- the urldbcleanup feature must be re-tested


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1200 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-11 00:25:02 +00:00
theli
e7d16ef831 *) Corrections in jMimeMagic MagicRule-file to detect some special rss feeds
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1196 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-08 23:25:43 +00:00
theli
386d9e45d8 *) Bugfix for code cleanup
- Code must be in finally block, otherwise it does not work if an error occurs!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1193 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-08 22:16:49 +00:00
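
The point made in this commit message, that cleanup code only runs reliably when it sits in a `finally` block, can be illustrated with a short sketch. The class `FinallyCleanup` and its method are hypothetical and unrelated to the code the commit changed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class FinallyCleanup {
    // Hypothetical example: reads the first byte of a stream and
    // always closes the stream, even when read() throws.
    static int firstByte(InputStream in) throws IOException {
        try {
            return in.read();
        } finally {
            in.close(); // runs on normal return and on exception
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstByte(new ByteArrayInputStream(new byte[] {42})));
    }
}
```

If `in.close()` were placed after the `try` block instead, an exception from `read()` would skip it and the stream would stay open.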
theli
5a1d45715d *) Bugfix for parser configuration bug
- it was not possible to disable all parsers
   See: http://www.yacy-forum.de/viewtopic.php?t=1579

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1191 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-08 21:35:40 +00:00
rramthun
a1061495d4 Fixed some spelling mistakes and added some text which (should) make it easier to understand the options.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1187 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-07 19:47:21 +00:00
orbiter
0cdc58aaea fixed indexing of local domains.
see http://www.yacy-forum.de/viewtopic.php?p=13680#13680

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1186 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-07 14:26:43 +00:00
theli
e1c2d8ec5f *) Speedup "removed from queue"
See: http://www.yacy-forum.de/viewtopic.php?p=13442#12188

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1183 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-07 11:27:44 +00:00
hydrox
96930f0d2b *)added function to remove malformed URLs from urlHash.db
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1182 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-07 11:10:08 +00:00
theli
8862b6ba4b *) Corrections for code cleanup 1175
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1179 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-07 08:15:39 +00:00
orbiter
13fdebc50d added authentication for link deletion in search result
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1177 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-07 00:36:05 +00:00
orbiter
37f88b4017 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1176 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-06 23:51:29 +00:00
orbiter
ec2b39c1ce code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1175 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-06 22:30:15 +00:00
orbiter
8f1f2daa5e implemented interactive link deletion of search results.
next steps: attach voting and restrict to administrator
to see the deletion button, move the mouse pointer to the left of a search result

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1172 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-06 16:15:21 +00:00
theli
6d0f7e6988 *) Adding missing file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1171 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-06 11:20:06 +00:00
theli
44fa94ac52 *) Modifications for dbImport functionality
- dbImporter threads are now shutdown by the switchboard on server shutdown
   - adding possibility to pause an importer thread via GUI
   - Bugfix for abort function
     See: http://www.yacy-forum.de/viewtopic.php?p=13363#13363

*) Modification of content parser configuration
   - now it's possible to configure which parsers should be enabled for the proxy,
     crawler, icap, etc. separately

*) htmlFilterContentScraper.java
   - adding regular expression to normalize URLs containing /../ and /./ parts

*) httpc.java
   - adding functionality to unzip gzipped content
   - requested by roland: should be used later to allow gzipped seed lists

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1170 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-06 10:41:19 +00:00
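
A regular-expression approach to collapsing `/./` and `/../` parts, as mentioned for htmlFilterContentScraper.java above, might look like the following sketch. This is not the expression used in YaCy; the class `PathNormalizer` and both patterns are assumptions for illustration, and the method is meant for the path component only, not a full URL with scheme and host.

```java
import java.util.regex.Pattern;

class PathNormalizer {
    // "/./" collapses to "/"
    private static final Pattern DOT = Pattern.compile("/\\./");
    // "/segment/../" collapses to "/", where segment is neither "." nor ".."
    private static final Pattern DOT_DOT =
            Pattern.compile("/(?!\\.{1,2}/)[^/]+/\\.\\./");

    // Applies both rewrites repeatedly until the path stops changing.
    static String normalize(String path) {
        String previous;
        String current = path;
        do {
            previous = current;
            current = DOT.matcher(current).replaceAll("/");
            current = DOT_DOT.matcher(current).replaceFirst("/");
        } while (!current.equals(previous));
        return current;
    }

    public static void main(String[] args) {
        System.out.println(normalize("/a/b/../c/./d"));
    }
}
```

The loop is needed because one pass can expose a new match, for example `/a/b/../../c` first becomes `/a/../c` and only then `/c`. A leading `/../` with no segment to remove is left unchanged.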
orbiter
dc778659fb fixed problem with time-out during result join which caused OR behavior instead of AND behavior
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1167 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-05 15:48:45 +00:00
orbiter
3d8a5ae652 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1166 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-05 14:24:13 +00:00
theli
64478b1f02 *) Adding possibility to delete crawler queue entries using regular expressions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1160 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-05 09:11:28 +00:00
orbiter
a04930f025 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1158 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-04 23:51:28 +00:00
low012
90b0eb144e just a typo...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1155 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-03 09:58:00 +00:00
theli
129b15f3e1 *) Correcting logging output of db importer thread
See: http://www.yacy-forum.de/viewtopic.php?t=1555

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1154 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-02 11:56:12 +00:00
orbiter
420d56ce79 extended db-testing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1152 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-02 01:51:37 +00:00
orbiter
ecf765ec33 temporary fix to make jrpm extension compilable with my netbeans environment
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1151 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-01 23:03:54 +00:00
theli
8ed0aaae8d *) Adding content Parser for RPM Files
- at the moment only the metadata is extracted

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1147 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-01 10:47:00 +00:00
theli
818d37ce44 *) Removing getSimpleName
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1143 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-29 12:50:13 +00:00
theli
b35c5a48bf *) First version of urlRedirector.pl script
- with this script it's possible to pass URLs from squid
     to yacy via the squid redirector interface
   - these URLs are then used by YaCy to feed the crawler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1141 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-29 12:27:03 +00:00
theli
bdf30117c1 *) Redesign of parser configuration
- restructuring of mimeTypes based on the parsers
   - displaying parser usage count
   - displaying human-readable parser names
   - displaying parser version information

*) httpdFileHandler.java
   - adding possibility to support "streaming" servlets
     which are special servlets that can communicate with
     the client via the connection streams autonomously
   - the name of these new servlet types must end with the 
     file extension .stream
   - this feature will be needed by the yacy ScreenSaver
     class to fetch statistic data from the peer without the
     need to reconnect to the server all the time

*) Adding human readable names and version information for
   all supported parsers

*) plasmaParser.java
   - adding new structure to store parser statistic data

*) Adding openDocument parser
   - can be used to parse odt files

*) jmimemagic
   - adding rules to detect openDocument formats properly

*) serverLog.java
   - adding functions that can be used to query if a given
     logging level is enabled or not.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1140 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-29 07:27:58 +00:00
theli
d4ac3e25b1 *) Bugfix for file system link bug during detection of invalid URLs
See: http://www.yacy-forum.de/viewtopic.php?p=13301

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1134 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-28 07:17:43 +00:00
orbiter
adf75bc9fa better logging for invalid file path detection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1133 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-27 22:55:30 +00:00
orbiter
40621a5663 enhancements in ranking preparation and fixed problem with parser/mime recognition
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1132 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-27 11:55:24 +00:00
theli
c650b112ea *) Bugfix for relative URL Bug in Crawler
See: http://www.yacy-forum.de/viewtopic.php?p=13266#13266

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1130 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-27 06:35:23 +00:00
theli
4e73035aef *) Bugfix for "too many open files" during index distribution
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1128 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-24 21:47:16 +00:00
orbiter
f57e2d67f5 shortened network overview (less columns fit easier on page)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1124 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-23 11:57:30 +00:00
orbiter
85282b1d98 enhanced YBR recognition and search result heuristics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1121 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-23 01:40:02 +00:00
orbiter
b9cc9029e3 added ybr selection for remote search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1119 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-22 16:10:24 +00:00
orbiter
0e25020f51 added first generation and usage of YBR index-files. Enhanced overall ranking of search results.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1118 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-22 15:17:05 +00:00
theli
90d6c6223b *) Adding color codes to network graphic legend
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1114 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-21 08:13:01 +00:00
orbiter
bfe51c7228 added generation of domain-list
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1112 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-21 01:30:30 +00:00
orbiter
0ec54d9c5f enhanced CR-file handling and added first RCI-evaluation tests
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1110 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-20 18:55:35 +00:00
theli
c2fe3a1670 *) Updating jMimeMagic Ruleset
- to detect some specially formatted html documents correctly
   - adding rule to detect vCards
*) plasmaParser now supports parsing of files that have a supported fileExtension
   but an unsupported mimeType because the webserver has set it incorrectly to text/plain
*) Adding new vCard Parser


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1107 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-20 14:39:58 +00:00
orbiter
88e3234393 fine-tuning of rci-generation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1105 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-18 02:00:25 +00:00
orbiter
a12759c1bf first try to implement a rci-computation from cr-files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1103 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-17 16:17:56 +00:00
orbiter
4a8e8f269e refactoring of cr-processing; new kelondro class to handle the attribute file format
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1100 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-17 12:08:04 +00:00
orbiter
24dc0e0760 implemented cr-file processing and further transmission steps
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1099 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-17 01:59:01 +00:00
orbiter
9d9a87f445 limited htcache storage length
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1096 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-16 18:40:44 +00:00
theli
d0dfccdb77 *) Making CrawlStacker pool configurable via GUI and config file
See: http://www.yacy-forum.de/viewtopic.php?t=1448

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1087 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 12:46:22 +00:00
theli
3631cb1f6d *) deleting empty entities during index selection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1086 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 12:23:46 +00:00
theli
ca26aab9b1 *) More debugging output for migrateWords
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1085 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 11:55:09 +00:00
theli
9b35ae9027 *) Correcting wrong % values on IndexTransfer_p page
See: http://www.yacy-forum.de/viewtopic.php?p=12646 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1084 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 09:52:50 +00:00
theli
e6bf9d90a5 *) Fixing Problems with MalformedURLs during Word Selection
- removing (lurl.toString() == null) comparison because toString() is never null
   - adding (lurl.url() == null) condition because url() is null if we have selected a word entry with
     a malformed URL

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1083 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 09:07:00 +00:00
theli
86a9210264 *) indexing queue slots are now configurable via config file
See: http://www.yacy-forum.de/viewtopic.php?t=1480

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1081 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 08:25:46 +00:00
theli
3c11d7b81c *) Bugfix for minimizeUrlDB
- function didn't work correctly because of new url hash structure
   See: http://www.yacy-forum.de/viewtopic.php?p=12753#12753

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1080 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 07:35:04 +00:00
orbiter
9913049009 fixed outOfMemory bug caused by loops in kelondroTree during enumeration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1079 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 01:20:05 +00:00
theli
bbb936b9ea *) Bugfix for not human readable content of PDFs while viewing the URL Content via GUI
- This Bug also affects the snippet generation on non html/text documents
   See: http://www.yacy-forum.de/viewtopic.php?t=1472

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1075 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-14 10:25:43 +00:00
theli
445e3a620f *) Avoid rejecting of html content by the crawler when the file extension is not set properly
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1074 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-14 10:19:40 +00:00
theli
444a5a9368 *) Bugfix for Entries with null url in GlobalQueue
See: http://www.yacy-forum.de/viewtopic.php?p=12675#12675

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1069 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-13 14:59:38 +00:00
borg-0300
ebac51df52 restore defaultRemoteProfile
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1063 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-12 11:38:35 +00:00
borg-0300
5778428455 move cutUrlText to nxTools,
max length from URLs(title) on searchpage now 120 chars


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1060 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-11 13:40:53 +00:00
borg-0300
9158845c3b bugfix for snippet text null bytes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1059 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-11 13:27:36 +00:00
orbiter
f763923e0a added missing files for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1057 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-11 08:02:46 +00:00
orbiter
79818a320f introduced citation-rank transmission protocol and activate transport for anonymisation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1055 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-10 23:48:20 +00:00
theli
7e0647f692 *) Bugfix for userDB usage during authentication
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1052 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-08 10:17:12 +00:00
orbiter
02f8013013 auto-delete of corrupted word files during word-migration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1047 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-07 14:57:37 +00:00
orbiter
d2731418bf added creation of global ranking files and changed url normal form usage
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1046 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-07 12:33:02 +00:00
theli
6f9f8ed8f8 *) Automatic Reset of Stack Crawler DB on startup errors
See: http://www.yacy-forum.de/viewtopic.php?t=1432

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1045 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-07 12:19:05 +00:00
theli
fb766413d1 *) Changes on httpc dns caching
- Bugfix: old dns cache did not handle case insensitive hostnames correctly. 
   - adding a possibility to set domain name patterns defining hostnames that should not be cached by the httpc dns cache
     e.g. borg-300.dyndns.org
     This can be done by setting the new httpc.nameCacheNoCachingPatterns property
   - using httpc.dnsResolve wherever possible within the sourcecode
     [httpd.java,plasmaCrawlStacker.java]

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1044 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-07 10:57:54 +00:00
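
The two behaviours listed in this commit, case-insensitive hostname lookup and an exclusion list of hostname patterns, can be sketched as follows. Everything here is hypothetical: the class `NameCache`, its methods, and the assumption that the `httpc.nameCacheNoCachingPatterns` value is a comma-separated list of regular expressions are illustrative and not confirmed by the commit message.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

class NameCache {
    private final Map<String, String> cache = new HashMap<>();
    private final List<Pattern> noCaching = new ArrayList<>();

    // Assumed format: comma-separated regular expressions.
    NameCache(String commaSeparatedPatterns) {
        for (String raw : commaSeparatedPatterns.split(",")) {
            String trimmed = raw.trim();
            if (!trimmed.isEmpty()) {
                noCaching.add(Pattern.compile(trimmed, Pattern.CASE_INSENSITIVE));
            }
        }
    }

    // Hostnames are case-insensitive, so keys are stored in lower case.
    private static String key(String host) {
        return host.toLowerCase(Locale.ROOT);
    }

    // A host is cacheable unless it fully matches an exclusion pattern.
    boolean isCacheable(String host) {
        for (Pattern p : noCaching) {
            if (p.matcher(host).matches()) return false;
        }
        return true;
    }

    void put(String host, String address) {
        if (isCacheable(host)) cache.put(key(host), address);
    }

    // Returns the cached address, or null if the host is not cached.
    String get(String host) {
        return cache.get(key(host));
    }
}
```

With a pattern such as `.*\.dyndns\.org`, a dynamic-DNS host like the one named in the commit would be resolved afresh on every request, while `Example.ORG` and `example.org` would share one cache entry.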
orbiter
bc420c62f6 fixed htcache path generation (never change a running system)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1041 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-07 01:31:11 +00:00
theli
dd24f0252f *) Searchword highlighting for info page
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1036 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-06 06:27:17 +00:00
borg-0300
72cde1d894 getCachePath: no logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1033 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-04 22:47:13 +00:00
borg-0300
1fbd72f9e0 rename "index.html" to "ndx"
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1032 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-04 22:39:33 +00:00
borg-0300
cd1107d85e added support for URLs with '?&'
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1030 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-04 17:25:15 +00:00
borg-0300
5fb2b017cb small change
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1029 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-04 16:37:56 +00:00
borg-0300
544e4ea90e small change
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1027 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-04 14:11:46 +00:00
borg-0300
00ab4d8723 cleaned, small change, Properties
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1026 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-04 13:41:51 +00:00
theli
b8ceb1ffde *) Adding better https support for crawler
- solving problems with unknown certificates by implementing a dummy trust Manager
   - adding https support to robots-parser 
   - Seed File can now be downloaded from https resources
   - adapting plasmaHTCache.java to support https URLs properly

*) URL Normalization
   - sub URLs are now normalized properly during indexing
   - pointing urlNormalForm function of plasmaParser to htmlFilterContentScraper function
   - normalizing URLs which were received by a crawlOrder request

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1024 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-03 15:28:37 +00:00
borg-0300
e3179a6394 added getOwnSeedFile()
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1022 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-03 14:07:58 +00:00
borg-0300
a803a509ae bugfix: port handling in HTCache
program flow cleaned up


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1021 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-03 12:39:24 +00:00
hydrox
cb69047b91 *)cleanup access static methods and fields
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1016 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-02 17:56:26 +00:00
hydrox
56b9f34411 *)removed unused imports
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1015 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-02 16:30:45 +00:00
orbiter
5f68b6886b introduced new url-hashes for better ranking computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1013 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-02 00:54:55 +00:00
orbiter
aadace1285 fixed network image in search performance monitor
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1012 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-01 00:49:13 +00:00