Commit Graph

743 Commits

Author SHA1 Message Date
theli
b35c5a48bf *) First version of urlRedirector.pl script
   - with this script it's possible to pass URLs from Squid
     to YaCy via the Squid redirector interface
   - these URLs are then used by YaCy to feed the crawler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1141 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-29 12:27:03 +00:00
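The redirector interface described in this commit can be sketched in Java (the actual urlRedirector.pl is a Perl script). The sketch assumes the classic Squid 2.x redirector format: Squid writes one request per line as `URL ip-address/fqdn ident method` and expects back either a rewritten URL or an empty line meaning "unchanged". The class name and the in-memory `crawlQueue` are hypothetical stand-ins for however the script actually hands URLs to YaCy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java analogue of the urlRedirector.pl idea; not the real script.
class RedirectorSketch {
    // URLs collected for the crawler (stand-in for the real hand-over to YaCy).
    final List<String> crawlQueue = new ArrayList<>();

    // Handles one line "URL ip-address/fqdn ident method" and returns the reply for Squid.
    // An empty reply tells Squid to leave the request URL unchanged.
    String handleLine(String line) {
        String trimmed = line.trim();
        if (!trimmed.isEmpty()) {
            String url = trimmed.split("\\s+", 2)[0]; // first field is the requested URL
            crawlQueue.add(url);                      // feed it to the crawler queue
        }
        return "";
    }
}
```

A real redirector would call `handleLine` in a loop over standard input and write each reply followed by a newline, flushing after every line so Squid is never left waiting.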
theli
bdf30117c1 *) Redesign of parser configuration
   - restructuring of mimeTypes based on the parsers
   - displaying parser usage count
   - displaying human-readable parser names
   - displaying parser version information

*) httpdFileHandler.java
   - adding support for "streaming" servlets, special servlets
     that communicate with the client autonomously via the
     connection streams
   - the names of these new servlets must end with the
     file extension .stream
   - this feature will be needed by the YaCy ScreenSaver
     class to fetch statistics from the peer without having
     to reconnect to the server every time

*) Adding human readable names and version information for
   all supported parsers

*) plasmaParser.java
   - adding new structure to store parser statistics

*) Adding OpenDocument parser
   - can be used to parse .odt files

*) jmimemagic
   - adding rules to detect OpenDocument formats properly

*) serverLog.java
   - adding functions that can be used to query if a given
     logging level is enabled or not.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1140 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-29 07:27:58 +00:00
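A minimal sketch of the serverLog level queries from this commit, assuming serverLog wraps a `java.util.logging.Logger`. The class and method names (`LevelQuerySketch`, `isFine`, `isInfo`, `logFine`) are hypothetical and not serverLog's actual API. The usual reason for such queries is to skip building an expensive log message when its level is disabled.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: assumes serverLog wraps java.util.logging; names are illustrative.
class LevelQuerySketch {
    private final Logger logger;

    LevelQuerySketch(String name) {
        this.logger = Logger.getLogger(name);
    }

    void setLevel(Level level) {
        logger.setLevel(level);
    }

    // True if a FINE message would pass this logger's level check.
    boolean isFine() {
        return logger.isLoggable(Level.FINE);
    }

    // True if an INFO message would pass this logger's level check.
    boolean isInfo() {
        return logger.isLoggable(Level.INFO);
    }

    // Builds the message only when FINE is enabled, so disabled debug output costs almost nothing.
    void logFine(Supplier<String> message) {
        if (isFine()) {
            logger.fine(message.get());
        }
    }
}
```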
allo
b86d1085e2 passwordAuth
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1138 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-28 22:35:41 +00:00
theli
5bf70e6e14 *) Bugfix for serverClassLoader.java
   - Class loading didn't work properly if there were multiple classes with the same name
   - This could occur because the YaCy servlets have no package name defined and therefore
     are all in the same (default) package.

*) Bugfix for Duplicated Class Error
   See: http://www.yacy-forum.de/viewtopic.php?t=1341

  

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1135 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-28 10:15:25 +00:00
theli
d4ac3e25b1 *) Bugfix for file system link bug during detection of invalid URLs
See: http://www.yacy-forum.de/viewtopic.php?p=13301

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1134 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-28 07:17:43 +00:00
orbiter
adf75bc9fa better logging for invalid file path detection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1133 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-27 22:55:30 +00:00
orbiter
40621a5663 enhancements in ranking preparation and fixed a problem with parser/MIME recognition
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1132 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-27 11:55:24 +00:00
theli
c650b112ea *) Bugfix for relative URL Bug in Crawler
See: http://www.yacy-forum.de/viewtopic.php?p=13266#13266

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1130 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-27 06:35:23 +00:00
theli
7e670894d9 *) Suppressing stack traces in proxyError messages for "connect timed out" errors
See: http://www.yacy-forum.de/viewtopic.php?t=1504
*) Increasing the default HTTP client timeout

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1129 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-25 00:40:35 +00:00
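The stack-trace suppression could look roughly like this. It is a sketch under assumptions: the class and method names are hypothetical, and the rule (treat a `java.net.SocketTimeoutException` whose message mentions "connect timed out" as harmless) is inferred from the commit message rather than taken from the proxy code. The match ignores case because the exact message wording may differ between JDK versions.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.net.SocketTimeoutException;
import java.util.Locale;

// Hypothetical sketch: names and the exact rule are assumptions inferred from the commit message.
class ProxyErrorSketch {
    // A connect timeout is considered harmless; the case-insensitive match tolerates wording differences.
    static boolean isConnectTimeout(Throwable t) {
        String msg = t.getMessage();
        return t instanceof SocketTimeoutException
                && msg != null
                && msg.toLowerCase(Locale.ROOT).contains("connect timed out");
    }

    // One line for connect timeouts, the full stack trace for everything else.
    static String describe(Throwable t) {
        if (isConnectTimeout(t)) {
            return "proxy error: " + t.getMessage();
        }
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw));
        return "proxy error:\n" + sw;
    }
}
```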
theli
4e73035aef *) Bugfix for "too many open files" during index distribution
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1128 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-24 21:47:16 +00:00
allo
d8afe60e07 Bugfix for last Bugfix ;-).
host/port were set to the original address instead of the correct values for the new URL.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1126 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-23 14:05:25 +00:00
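Assuming this fix concerns URLs rewritten by the redirectors introduced in the preceding commits, the point is that host and port must be derived from the new URL rather than copied from the original address. A hypothetical sketch using `java.net.URI`, which reports -1 for a missing port, so the scheme default has to be filled in explicitly:

```java
import java.net.URI;

// Hypothetical sketch: host and port of a rewritten request come from the new URL,
// never from the original address.
class RewrittenTargetSketch {
    final String host;
    final int port;

    RewrittenTargetSketch(String newUrl) {
        URI uri = URI.create(newUrl);
        this.host = uri.getHost();
        // URI.getPort() returns -1 when no port is given, so fall back to the scheme's default port.
        int p = uri.getPort();
        if (p == -1) {
            p = "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
        }
        this.port = p;
    }
}
```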
orbiter
1b656f6b31 correction of a bug from svn 1123
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1125 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-23 12:07:07 +00:00
orbiter
f57e2d67f5 shortened network overview (fewer columns fit more easily on the page)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1124 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-23 11:57:30 +00:00
allo
24d15eb0e8 moving the redirector code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1123 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-23 07:52:36 +00:00
allo
787c368696 synchronized redirectors and using the port.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1122 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-23 07:37:15 +00:00
orbiter
85282b1d98 enhanced YBR recognition and search result heuristics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1121 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-23 01:40:02 +00:00
allo
4776f3f815 Squid-like redirectors
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1120 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-22 22:07:29 +00:00
orbiter
b9cc9029e3 added ybr selection for remote search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1119 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-22 16:10:24 +00:00
orbiter
0e25020f51 added first generation and usage of YBR index-files. Enhanced overall ranking of search results.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1118 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-22 15:17:05 +00:00
allo
52a0237bf2 using Filetemplates for #[metas]# and other static includes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1116 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-21 08:33:54 +00:00
theli
90d6c6223b *) Adding color codes to network graphic legend
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1114 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-21 08:13:01 +00:00
orbiter
bfe51c7228 added generation of domain-list
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1112 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-21 01:30:30 +00:00
orbiter
0ec54d9c5f enhanced CR-file handling and added first RCI-evaluation tests
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1110 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-20 18:55:35 +00:00
theli
99fb26e499 *) Suppressing stack traces in proxyError messages for harmless errors
See: http://www.yacy-forum.de/viewtopic.php?t=1504

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1108 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-20 15:37:23 +00:00
theli
c2fe3a1670 *) Updating jMimeMagic Ruleset
   - to detect some specially formatted HTML documents correctly
   - adding rule to detect vCards
*) plasmaParser now supports parsing of files that have a supported file extension
   but an unsupported mimeType because the web server has incorrectly set it to text/plain
*) Adding new vCard parser


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1107 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-20 14:39:58 +00:00
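The text/plain fallback in this commit can be sketched as follows. The extension table and all names are illustrative assumptions, not plasmaParser's real configuration; the sketch only shows the rule "if the server says text/plain but the file extension belongs to a supported parser, trust the extension". MIME parameters such as `charset` are ignored when comparing.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; the extension table is illustrative, not plasmaParser's configuration.
class MimeFallbackSketch {
    static final Map<String, String> SUPPORTED_EXTENSIONS = Map.of(
            "pdf", "application/pdf",
            "odt", "application/vnd.oasis.opendocument.text",
            "vcf", "text/x-vcard");

    // Returns the MIME type to parse with.
    static String effectiveMimeType(String fileName, String serverMime) {
        if (serverMime == null) {
            return null;
        }
        // Compare only the base type, e.g. "text/plain" out of "text/plain; charset=UTF-8".
        String base = serverMime.split(";", 2)[0].trim();
        if (!base.equalsIgnoreCase("text/plain")) {
            return serverMime; // server sent a specific type: keep it
        }
        int dot = fileName.lastIndexOf('.');
        int slash = fileName.lastIndexOf('/');
        if (dot <= slash || dot == fileName.length() - 1) {
            return serverMime; // no usable extension in the last path segment
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return SUPPORTED_EXTENSIONS.getOrDefault(ext, serverMime);
    }
}
```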
orbiter
88e3234393 fine-tuning of rci-generation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1105 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-18 02:00:25 +00:00
orbiter
a12759c1bf first try to implement an rci-computation from cr-files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1103 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-17 16:17:56 +00:00
orbiter
4a8e8f269e refactoring of cr-processing; new kelondro class to handle the attribute file format
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1100 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-17 12:08:04 +00:00
orbiter
24dc0e0760 implemented cr-file processing and further transmission steps
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1099 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-17 01:59:01 +00:00
low012
5cd1e9cef4 *) fixed some dirty code, using an idea analogous to bit stuffing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1098 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-16 23:58:13 +00:00
orbiter
022530df7e small bugfix in kelondroTree
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1097 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-16 22:36:36 +00:00
orbiter
9d9a87f445 limited htcache storage length
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1096 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-16 18:40:44 +00:00
theli
8e308cf50e *) Possibility to change the server port on-the-fly.
   - Now it's possible to change the server port without having to restart the whole server.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1089 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 15:03:15 +00:00
theli
d0dfccdb77 *) Making CrawlStacker pool configurable via GUI and config file
See: http://www.yacy-forum.de/viewtopic.php?t=1448

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1087 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 12:46:22 +00:00
theli
3631cb1f6d *) deleting empty entities during index selection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1086 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 12:23:46 +00:00
theli
ca26aab9b1 *) More debugging output for migrateWords
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1085 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 11:55:09 +00:00
theli
9b35ae9027 *) Correcting wrong % values on IndexTransfer_p page
See: http://www.yacy-forum.de/viewtopic.php?p=12646 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1084 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 09:52:50 +00:00
theli
e6bf9d90a5 *) Fixing Problems with MalformedURLs during Word Selection
   - removing (lurl.toString() == null) comparison because toString() is never null
   - adding (lurl.url() == null) condition because url() is null if we have selected a word entry with
     a malformed URL

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1083 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 09:07:00 +00:00
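The difference between the two checks in this commit can be illustrated with a hypothetical entry class whose `url()` is null for a malformed URL while `toString()` always returns a string. Parsing with `java.net.URI` is an assumption made for the sketch; the original code may use a different URL class.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a URL database entry; not YaCy's actual class.
class UrlEntrySketch {
    private final String raw;
    private final URI url;

    UrlEntrySketch(String raw) {
        this.raw = raw;
        URI parsed;
        try {
            parsed = new URI(raw);
        } catch (URISyntaxException e) {
            parsed = null; // malformed URL: keep the entry but expose no URL
        }
        this.url = parsed;
    }

    URI url() {
        return url;
    }

    // Never returns null, so a toString() == null test can never filter anything.
    @Override
    public String toString() {
        return "UrlEntry[" + raw + "]";
    }

    // Word selection keeps only entries whose url() could be parsed.
    static List<UrlEntrySketch> selectValid(List<UrlEntrySketch> entries) {
        List<UrlEntrySketch> out = new ArrayList<>();
        for (UrlEntrySketch e : entries) {
            if (e.url() != null) {
                out.add(e);
            }
        }
        return out;
    }
}
```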
theli
86a9210264 *) indexing queue slots are now configurable via config file
See: http://www.yacy-forum.de/viewtopic.php?t=1480

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1081 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 08:25:46 +00:00
theli
3c11d7b81c *) Bugfix for minimizeUrlDB
   - function didn't work correctly because of the new URL hash structure
   See: http://www.yacy-forum.de/viewtopic.php?p=12753#12753

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1080 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 07:35:04 +00:00
orbiter
9913049009 fixed outOfMemory bug caused by loops in kelondroTree during enumeration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1079 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 01:20:05 +00:00
allo
f8f9d509d5 removed dead Code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1078 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-14 12:48:14 +00:00
allo
5918d3985e removed Debug Statements
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1076 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-14 11:47:46 +00:00
theli
bbb936b9ea *) Bugfix for non-human-readable PDF content when viewing URL content via the GUI
   - This bug also affects snippet generation for non-HTML/text documents
   See: http://www.yacy-forum.de/viewtopic.php?t=1472

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1075 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-14 10:25:43 +00:00
theli
445e3a620f *) Avoid rejection of HTML content by the crawler when the file extension is not set properly
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1074 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-14 10:19:40 +00:00
orbiter
a3fd0069f5 fixed bug in kelondroTree node iteration (rotation did not work)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1072 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-14 00:23:20 +00:00
theli
fd58d5f8e6 *) Adding the possibility to specify the interface / IP address YaCy should bind to
   - e.g. Port = 192.168.0.1:8080
          Port = #eth0:8080
          Port = 8080

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1071 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-13 17:03:52 +00:00
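The three `Port` forms in this commit could be parsed along these lines. This is a hypothetical sketch, not YaCy's implementation: a bare number binds to all interfaces, `address:port` binds to one address, and `#interface:port` takes the first IPv4 address of the named network interface. IPv6 literals are not handled, since host and port are split at the last colon.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.Collections;

// Hypothetical parser for "8080", "192.168.0.1:8080" and "#eth0:8080"; not YaCy's code.
class BindAddressSketch {
    static InetSocketAddress parse(String value) {
        String v = value.trim();
        int colon = v.lastIndexOf(':');
        if (colon < 0) {
            // "8080": wildcard address, i.e. bind to all interfaces
            return new InetSocketAddress(Integer.parseInt(v));
        }
        String hostPart = v.substring(0, colon);
        int port = Integer.parseInt(v.substring(colon + 1));
        try {
            if (hostPart.startsWith("#")) {
                // "#eth0:8080": first IPv4 address of the named network interface
                String ifName = hostPart.substring(1);
                NetworkInterface nif = NetworkInterface.getByName(ifName);
                if (nif == null) {
                    throw new IllegalArgumentException("unknown interface: " + ifName);
                }
                for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
                    if (addr instanceof Inet4Address) {
                        return new InetSocketAddress(addr, port);
                    }
                }
                throw new IllegalArgumentException("no IPv4 address on interface: " + ifName);
            }
            // "192.168.0.1:8080": bind to this address only
            return new InetSocketAddress(InetAddress.getByName(hostPart), port);
        } catch (SocketException | UnknownHostException e) {
            throw new IllegalArgumentException("cannot resolve bind address: " + value, e);
        }
    }
}
```

The resulting `InetSocketAddress` can be passed directly to `java.net.ServerSocket.bind`.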
allo
889de6686c Migration in yacyVersion
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1070 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-13 15:12:48 +00:00
theli
444a5a9368 *) Bugfix for Entries with null url in GlobalQueue
See: http://www.yacy-forum.de/viewtopic.php?p=12675#12675

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1069 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-13 14:59:38 +00:00
allo
3bbb932fa2 Bugfix for a NullPointerException.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1067 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-13 09:55:14 +00:00