theli
386d9e45d8
*) Bugfix for code cleanup
...
- Code must be in finally block, otherwise it does not work if an error occurs!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1193 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-08 22:16:49 +00:00
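The point of this fix can be illustrated with a minimal sketch. The class and method names below are hypothetical and not taken from the YaCy sources; the sketch only shows that cleanup placed in a finally block still runs when the guarded code throws.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not YaCy code.
final class FinallyCleanupSketch {

    // Performs some work and records each step in the given log.
    static void doWork(List<String> log, boolean fail) {
        log.add("start");
        try {
            if (fail) {
                throw new IllegalStateException("simulated error");
            }
            log.add("done");
        } finally {
            // Executed on normal completion and when an exception propagates.
            log.add("cleanup");
        }
    }

    // Runs doWork and records whether an exception reached the caller.
    static List<String> run(boolean fail) {
        List<String> log = new ArrayList<>();
        try {
            doWork(log, fail);
        } catch (IllegalStateException e) {
            log.add("caught");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

If the cleanup statement were placed after the try block instead, it would be skipped in the failing case.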
theli
5a1d45715d
*) Bugfix for parser configuration bug
...
- it was not possible to disable all parsers
See: http://www.yacy-forum.de/viewtopic.php?t=1579
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1191 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-08 21:35:40 +00:00
rramthun
a1061495d4
Fixed some spelling mistakes and added some text that should make the options easier to understand.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1187 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-07 19:47:21 +00:00
orbiter
0cdc58aaea
fixed indexing of local domains.
...
see http://www.yacy-forum.de/viewtopic.php?p=13680#13680
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1186 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-07 14:26:43 +00:00
theli
e1c2d8ec5f
*) Speedup "removed from queue"
...
See: http://www.yacy-forum.de/viewtopic.php?p=13442#12188
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1183 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-07 11:27:44 +00:00
hydrox
96930f0d2b
*) added function to remove malformed URLs from urlHash.db
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1182 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-07 11:10:08 +00:00
theli
8862b6ba4b
*) Corrections for code cleanup 1175
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1179 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-07 08:15:39 +00:00
orbiter
13fdebc50d
added authentication for link deletion in search result
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1177 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-07 00:36:05 +00:00
orbiter
37f88b4017
code cleanup
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1176 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-06 23:51:29 +00:00
orbiter
ec2b39c1ce
code cleanup
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1175 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-06 22:30:15 +00:00
orbiter
8f1f2daa5e
implemented interactive link deletion of search results.
...
next steps: attach voting and restrict to administrator
to see the deletion button, move the mouse pointer to the left of a search result
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1172 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-06 16:15:21 +00:00
theli
6d0f7e6988
*) Adding missing file
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1171 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-06 11:20:06 +00:00
theli
44fa94ac52
*) Modifications for dbImport functionality
...
- dbImporter threads are now shut down by the switchboard on server shutdown
- adding possibility to pause an importer thread via GUI
- Bugfix for abort function
See: http://www.yacy-forum.de/viewtopic.php?p=13363#13363
*) Modification of content parser configuration
- now it's possible to configure which parsers should be enabled for the proxy,
crawler, icap, etc. separately
*) htmlFilterContentScraper.java
- adding regular expression to normalize URLs containing /../ and /./ parts
*) httpc.java
- adding functionality to unzip gzipped content
- requested by roland: should be used later to allow gzipped seed lists
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1170 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-06 10:41:19 +00:00
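The commit does not show the regular expression added to htmlFilterContentScraper.java. The following is only a sketch of how /./ and /../ parts could be collapsed with regular expressions; the class name, method name and patterns are assumptions, and the method expects the path portion of a URL, not a full URL.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not the expression used in YaCy.
final class UrlPathNormalizerSketch {

    // Matches a "/./" part.
    private static final Pattern CURRENT_DIR = Pattern.compile("/\\./");

    // Matches "/segment/../" where the segment itself is not "..".
    private static final Pattern PARENT_DIR =
            Pattern.compile("/(?!\\.\\./)[^/]+/\\.\\./");

    // Repeats both replacements until the path no longer changes.
    static String normalizePath(String path) {
        String previous;
        do {
            previous = path;
            path = CURRENT_DIR.matcher(path).replaceAll("/");
            path = PARENT_DIR.matcher(path).replaceFirst("/");
        } while (!path.equals(previous));
        return path;
    }

    public static void main(String[] args) {
        System.out.println(normalizePath("/a/b/../c/./d.html"));
    }
}
```

A leading /../ that has no preceding segment is left untouched in this sketch.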
orbiter
dc778659fb
fixed problem with time-out during result join which caused OR behavior instead of AND behavior
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1167 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-05 15:48:45 +00:00
orbiter
3d8a5ae652
code cleanup
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1166 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-05 14:24:13 +00:00
theli
64478b1f02
*) Adding possibility to delete crawler queue entries using regular expressions
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1160 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-05 09:11:28 +00:00
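As a rough sketch of regex-based queue deletion: the queue below is a plain collection of URL strings and the helper name is hypothetical. Whether YaCy matches the whole URL or only a part of it is not stated in the commit; this sketch assumes a full match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not the YaCy crawler queue.
final class CrawlQueueFilterSketch {

    // Removes every entry that fully matches the regex; returns the number removed.
    static int deleteMatching(Collection<String> queue, String regex) {
        Pattern pattern = Pattern.compile(regex);
        int before = queue.size();
        queue.removeIf(url -> pattern.matcher(url).matches());
        return before - queue.size();
    }

    public static void main(String[] args) {
        List<String> queue = new ArrayList<>(Arrays.asList(
                "http://example.org/a.html",
                "http://example.org/tmp/1"));
        System.out.println(deleteMatching(queue, ".*/tmp/.*") + " removed, left: " + queue);
    }
}
```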
orbiter
a04930f025
code cleanup
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1158 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-04 23:51:28 +00:00
low012
90b0eb144e
just a typo...
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1155 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-03 09:58:00 +00:00
theli
129b15f3e1
*) Correcting logging output of db importer thread
...
See: http://www.yacy-forum.de/viewtopic.php?t=1555
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1154 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-02 11:56:12 +00:00
orbiter
420d56ce79
extended db-testing
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1152 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-02 01:51:37 +00:00
orbiter
ecf765ec33
temporary fix to make jrpm extension compilable with my netbeans environment
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1151 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-01 23:03:54 +00:00
theli
8ed0aaae8d
*) Adding content Parser for RPM Files
...
- at the moment only the metadata is extracted
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1147 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-01 10:47:00 +00:00
theli
818d37ce44
*) Removing getSimpleName
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1143 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-29 12:50:13 +00:00
theli
b35c5a48bf
*) First version of urlRedirector.pl script
...
- with this script it's possible to pass URLs from squid
to yacy via the squid redirector interface
- these URLs are then used by YaCy to feed the crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1141 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-29 12:27:03 +00:00
theli
bdf30117c1
*) Redesign of parser configuration
...
- restructuring of mimeTypes based on the parsers
- displaying parser usage count
- displaying human-readable parser names
- displaying parser version information
*) httpdFileHandler.java
- adding possibility to support "streaming" servlets,
which are special servlets that can communicate
autonomously with the client via the connection streams
- the name of these new servlet types must end with the
file extension .stream
- this feature will be needed by the yacy ScreenSaver
class to fetch statistic data from the peer without the
need to reconnect to the server all the time
*) Adding human readable names and version information for
all supported parsers
*) plasmaParser.java
- adding new structure to store parser statistic data
*) Adding openDocument parser
- can be used to parse odt files
*) jmimemagic
- adding rules to detect openDocument formats properly
*) serverLog.java
- adding functions that can be used to query if a given
logging level is enabled or not.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1140 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-29 07:27:58 +00:00
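The signatures of the new serverLog.java functions are not shown in the commit. As a sketch of why such level queries are useful, the following uses the standard java.util.logging API; the wrapper class and its methods are hypothetical. The expensive message is built only when the level is actually enabled.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch built on java.util.logging, not the serverLog API.
final class LogLevelQuerySketch {

    // Counts how often the expensive message was built.
    static int buildCount = 0;

    // Returns true if FINE messages would be logged by this logger.
    static boolean isFine(Logger log) {
        return log.isLoggable(Level.FINE);
    }

    // Stands in for an expensive message construction.
    static String buildDetails() {
        buildCount++;
        return "details #" + buildCount;
    }

    // Builds and logs the message only if the FINE level is enabled.
    static void logDetails(Logger log) {
        if (isFine(log)) {
            log.fine(buildDetails());
        }
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("sketch");
        log.setLevel(Level.INFO);
        logDetails(log);
        System.out.println("messages built: " + buildCount);
    }
}
```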
theli
d4ac3e25b1
*) Bugfix for file system link bug during detection of invalid URLs
...
See: http://www.yacy-forum.de/viewtopic.php?p=13301
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1134 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-28 07:17:43 +00:00
orbiter
adf75bc9fa
better logging for invalid file path detection
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1133 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-27 22:55:30 +00:00
orbiter
40621a5663
enhancements in ranking preparation and fixed problem with parser/mime recognition
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1132 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-27 11:55:24 +00:00
theli
c650b112ea
*) Bugfix for relative URL Bug in Crawler
...
See: http://www.yacy-forum.de/viewtopic.php?p=13266#13266
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1130 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-27 06:35:23 +00:00
theli
4e73035aef
*) Bugfix for "too many open files" during index distribution
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1128 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-24 21:47:16 +00:00
orbiter
f57e2d67f5
shortened network overview (less columns fit easier on page)
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1124 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-23 11:57:30 +00:00
orbiter
85282b1d98
enhanced YBR recognition and search result heuristics
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1121 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-23 01:40:02 +00:00
orbiter
b9cc9029e3
added ybr selection for remote search
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1119 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-22 16:10:24 +00:00
orbiter
0e25020f51
added first generation and usage of YBR index-files. Enhanced overall ranking of search results.
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1118 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-22 15:17:05 +00:00
theli
90d6c6223b
*) Adding color codes to network graphic legend
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1114 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-21 08:13:01 +00:00
orbiter
bfe51c7228
added generation of domain-list
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1112 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-21 01:30:30 +00:00
orbiter
0ec54d9c5f
enhanced CR-file handling and added first RCI-evaluation tests
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1110 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-20 18:55:35 +00:00
theli
c2fe3a1670
*) Updating jMimeMagic Ruleset
...
- to detect some specially formatted html documents correctly
- adding rule to detect vCards
*) plasmaParser now supports parsing of files that have a supported fileExtension
but an unsupported mimeType because the webserver has incorrectly set it to text/plain
*) Adding new vCard parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1107 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-20 14:39:58 +00:00
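The extension fallback described for plasmaParser can be sketched as follows. The lookup tables, parser names and mime types are illustrative assumptions and not the YaCy configuration; the sketch only shows the order of the two lookups.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not the plasmaParser implementation.
final class ParserSelectionSketch {

    // Illustrative tables; the real mappings are configured elsewhere.
    static final Map<String, String> BY_MIME = Map.of(
            "text/html", "html",
            "text/x-vcard", "vcard");
    static final Map<String, String> BY_EXTENSION = Map.of(
            "html", "html",
            "vcf", "vcard");

    // Tries the mime type first, then falls back to the file extension.
    // Returns null if neither lookup finds a parser.
    static String selectParser(String mimeType, String path) {
        String byMime = BY_MIME.get(mimeType);
        if (byMime != null) {
            return byMime;
        }
        int dot = path.lastIndexOf('.');
        if (dot < 0 || dot == path.length() - 1) {
            return null;
        }
        String extension = path.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.get(extension);
    }

    public static void main(String[] args) {
        // A vCard delivered as text/plain is still matched via its extension.
        System.out.println(selectParser("text/plain", "/contacts/alice.vcf"));
    }
}
```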
orbiter
88e3234393
fine-tuning of rci-generation
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1105 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-18 02:00:25 +00:00
orbiter
a12759c1bf
first try to implement a rci-computation from cr-files
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1103 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-17 16:17:56 +00:00
orbiter
4a8e8f269e
refactoring of cr-processing; new kelondro class to handle the attribute file format
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1100 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-17 12:08:04 +00:00
orbiter
24dc0e0760
implemented cr-file processing and further transmission steps
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1099 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-17 01:59:01 +00:00
orbiter
9d9a87f445
limited htcache storage length
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1096 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-16 18:40:44 +00:00
theli
d0dfccdb77
*) Making CrawlStacker pool configurable via GUI and config file
...
See: http://www.yacy-forum.de/viewtopic.php?t=1448
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1087 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 12:46:22 +00:00
theli
3631cb1f6d
*) deleting empty entities during index selection
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1086 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 12:23:46 +00:00
theli
ca26aab9b1
*) More debugging output for migrateWords
...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1085 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 11:55:09 +00:00
theli
9b35ae9027
*) Correcting wrong % values on IndexTransfer_p page
...
See: http://www.yacy-forum.de/viewtopic.php?p=12646
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1084 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 09:52:50 +00:00
theli
e6bf9d90a5
*) Fixing Problems with MalformedURLs during Word Selection
...
- removing (lurl.toString() == null) comparison because toString() is never null
- adding (lurl.url() == null) condition because url() is null if we have selected a word entry with
a malformed URL
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1083 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 09:07:00 +00:00
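The difference between the two checks can be sketched with a hypothetical entry type. The Entry class below is an assumption and not the YaCy lurl class: its toString() always returns a string, while url() returns null for a malformed URL, so only the second check filters anything.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the YaCy word selection code.
final class MalformedUrlFilterSketch {

    static final class Entry {
        private final String raw;

        Entry(String raw) {
            this.raw = raw;
        }

        // Returns null if the stored string is not a well-formed URL.
        URL url() {
            try {
                return new URI(raw).toURL();
            } catch (URISyntaxException | MalformedURLException | IllegalArgumentException e) {
                return null;
            }
        }

        // Never null, so comparing it with null filters nothing.
        @Override
        public String toString() {
            return "Entry[" + raw + "]";
        }
    }

    // Keeps only entries whose URL could be parsed.
    static List<Entry> usable(List<Entry> entries) {
        List<Entry> result = new ArrayList<>();
        for (Entry entry : entries) {
            if (entry.url() == null) {
                continue; // malformed URL, skip this entry
            }
            result.add(entry);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry("http://example.org/a"));
        entries.add(new Entry("not a url"));
        System.out.println(usable(entries));
    }
}
```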
theli
86a9210264
*) indexing queue slots are now configurable via config file
...
See: http://www.yacy-forum.de/viewtopic.php?t=1480
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1081 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 08:25:46 +00:00
theli
3c11d7b81c
*) Bugfix for minimizeUrlDB
...
- function didn't work correctly because of new url hash structure
See: http://www.yacy-forum.de/viewtopic.php?p=12753#12753
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1080 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 07:35:04 +00:00