1719 Commits

Author SHA1 Message Date
orbiter
5ff77612ac bugfix for old WORDS storage method
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2722 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-09 02:20:27 +00:00
orbiter
0f10bdde22 more generic cache methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2721 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-09 02:13:13 +00:00
orbiter
72482b1426 fixed scraper
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2720 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-09 01:24:01 +00:00
hermens
6557112d8f small fix for plasmaURLPool.getURL() needed for new alternative htcache layout
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2719 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-08 17:32:01 +00:00
hermens
440c6ee657 Implement alternative htcache layout
mostly according to: http://www.yacy-forum.de/viewtopic.php?p=26205#26205



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2718 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-08 17:25:19 +00:00
allo
226f2c5b2c first version of the Servlet Debugger
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2717 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-08 14:25:54 +00:00
orbiter
adf1f74ab2 bugfix for java 1.5 compile problem with serverCharBuffer.append(char)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2716 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-08 10:35:35 +00:00
orbiter
fd61209797 lines inside tags without punctuation are extended by a single dot.
This enables the condenser to distinguish the lines in a better way.
The result is a better preparation of snippets.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2715 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-08 01:24:00 +00:00
allo
1d0c0edda3 first version of posts/get from the del.icio.us api
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2713 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-07 22:16:09 +00:00
orbiter
1969522dc1 removed lowercase of snippets (and other things):
- added new sentence parser to condenser
- sentence parsing can now handle charsets

to do: charsets must be handed over to new sentence parser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2712 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-07 00:06:09 +00:00
orbiter
43614f1b36 bugfix in collection index. the index for collections was not created correctly
The bugfix includes a migration function which starts automatically
after startup of yacy.
This applies to you only if you are using the new collection index.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2711 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-05 23:47:08 +00:00
orbiter
1dfab1abe3 more control for seed receive
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2709 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-04 08:55:01 +00:00
theli
1c0e65f55f *) Bugfix for problems with charset detection
See: http://www.yacy-forum.de/viewtopic.php?p=26196

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2708 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-04 04:54:21 +00:00
orbiter
db294687ea enhanced logging
- more logging output
- fix in log line preparation
- added filter to log page
- some small bugfixes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2707 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-03 22:55:59 +00:00
theli
a9a0f51303 *) suppressing InterruptedException errormessage
See: http://www.yacy-forum.de/viewtopic.php?t=2915

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2705 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-03 15:40:18 +00:00
theli
ce7ee74316 *) better errorhandling in filehandler (try catch block now starts before argument parsing)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2704 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-03 14:21:46 +00:00
theli
1d4fb680ce *) CrawlWorker.java: only keep content in memory if its size is less than or equal to 5MB
TODO: make this limit configurable

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2703 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-03 12:16:25 +00:00
theli
1586d57187 *) odtParser: better handling of large files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2702 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-03 12:00:26 +00:00
theli
f17ce28b6d *) plasmaHTCache:
   - method loadResourceContent marked as deprecated.
     Please avoid this function to prevent OutOfMemory Exceptions
     when loading large files
   - new function getResourceContentStream to get an inputstream of a cache file
   - new function getResourceContentLength to get the size of a cached file
*) httpc.java:
   - Bugfix: resource content was loaded into memory even if this was not requested
*) Crawler:
   - new option to hold loaded resource content in memory
   - adding option to use the worker class without the worker pool 
     (needed by the snippet fetcher)
*) plasmaSnippetCache
   - snippet loader does not use a crawl-worker from pool but uses
     a newly created instance to avoid blocking by normal crawling
     activity.
   - now operates on streams instead of byte arrays to avoid OutOfMemory 
     Exceptions when operating on large files 
   - snippet loader now forces the crawl-worker to keep the loaded
     resource in memory to avoid IO 
*) plasmaCondenser: adding new function getWords that can directly operate on input streams
*) Parsers
   - keep resource in memory whenever possible (to avoid IO)
   - when parsing from a stream, the content length must now be passed to the parser function.
     This length value is needed by the parsers to decide whether the parsed resource content is too large
     to hold in memory and must be stored in a file
   - AbstractParser.java: new function to pass the contentLength of a resource to the parsers
   


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2701 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-03 11:05:48 +00:00
orbiter
630a955674 read snippets from cache in case they are not provided in RAM
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2700 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 17:18:24 +00:00
orbiter
bcf2b800b4 applied UTF-8 encoding parameter to yacy-internal protocol communication
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2694 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 13:35:38 +00:00
orbiter
c40fca08a2 fixed bad handling of string separation
you can now use a new encoding attribute to create strings from byte arrays

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2693 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 10:21:14 +00:00
orbiter
5a40ea7866 refactoring of wget string list generation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2692 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 09:59:20 +00:00
orbiter
dbc2e039bb added time-out option parameter to call hierarchy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2691 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 09:40:18 +00:00
orbiter
d4c239e4be - fixed problem in collection index with deletion of single url references
- added automatic deletion of not-found snippets after search

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2689 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 01:40:52 +00:00
orbiter
00746ca232 identified and fixed search performance problem caused by
snippet loading. Some header-db accesses were performed twice or even
more often in some cases. Snippet resource loading fixed.
Furthermore the snippet loading during remote search within the
remote peer has been disabled, but can be switched on remotely by
new flag 'includesnippet=true'

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2688 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 01:15:02 +00:00
orbiter
b033a80750 better control of failure in node seek of kelondroTree
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2686 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 00:13:19 +00:00
orbiter
310f1c41cd added option to see ranking scores in surftipps
and some cleanups

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2684 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-30 23:28:03 +00:00
theli
a2e3095044 *) Bugfix. Add missing plasmaParserDocument.close() calls
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2680 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-30 10:09:01 +00:00
theli
cd5f349666 *) Better handling of large files during parsing
Extracted text of files that are larger than 5MB is stored in a temp file instead of keeping it in memory
*) plasmaParserDocument.java: getText now returns an InputStream instead of a byte array
*) plasmaParserDocument.java: new function getTextBytes returns the parsed content as byte array
   Attention: the caller of this function has to ensure that enough memory is available to do this 
   to avoid OutOfMemory Exceptions
*) httpd.java: better error handling if the SOAP handler is not installed
*) pdfParser.java: 
   - better handling of documents with exotic charsets
   - better handling of large documents
   - better error logging of encrypted documents
*) rtfParser.java: Bugfix for UTF-8 support
*) tarParser.java: better handling of large documents
*) zipParser.java: better handling of large documents
*) plasmaCrawlEURL.java: new errorcode for encrypted documents
*) plasmaParserDocument.java: the extracted text can now be passed
   to this object as byte array or temp file   

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2679 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-30 09:31:53 +00:00
theli
8b2ceddb91 *) Displaying severe and warning logging messages in different colors on ViewLog_p.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2678 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-30 08:12:22 +00:00
low012
f8ac694e51 *) fixed a bug where search words in snippets were not displayed in bold in front of a punctuation mark (see http://www.yacy-forum.de/viewtopic.php?p=25998)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2677 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-30 00:27:42 +00:00
orbiter
df1629b05a - code cleanup
- version 0.471
- moved surftipps to own web page


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2676 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-29 22:27:20 +00:00
theli
c665f6cddb *) handling of quotes in charset string
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2674 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-28 06:29:15 +00:00
theli
b73efd5565 *) missing changes needed because of last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2673 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-28 05:48:28 +00:00
theli
140ddba93f *) adding soap functions to pause and resume the crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2668 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-27 05:22:43 +00:00
orbiter
2463e5624a 'quick' release 0.47
- documentation update
- necessary bugfixes (missing css for new peers)
- reduced effect of search result redundancy filter
- removed some debug output, but not all

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2665 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-26 23:41:54 +00:00
theli
49fbb688df *) SOAP: old urlInfo renamed to urlInfoByHash, new urlInfo Function added.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2662 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-26 15:14:33 +00:00
theli
8f143d516b *) make snippet fetcher accessible via soap api
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2661 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-26 15:07:16 +00:00
theli
97615af406 *) Restructuring of YaCy SOAP services
   - general functions moved to abstract service class
   - service class split into SearchService, CrawlService, StatusService
*) Bugfix for SOAP search services
   - Attention: some XML tags were renamed
   See: http://www.yacy-forum.de/viewtopic.php?p=25877
*) New SOAP service function urlInfo to view the parsed content of a URL
   See: http://www.yacy-forum.de/viewtopic.php?p=25869

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2660 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-26 14:47:44 +00:00
theli
241b881560 *) Redesign of YaCy SOAP handler
   - should be more fail-safe now
   - better handling of compressed request bodies
   - better handling of persistent connections
   - better handling of AxisFaults

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2659 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-26 12:24:40 +00:00
theli
009a33170b *) Content-Location header added
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2658 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-26 04:32:01 +00:00
theli
1aa07a52cd *) Bugfix for UnsupportedEncodingException if the media type contains multiple parameters
See: http://www.yacy-forum.de/viewtopic.php?p=25832#25826

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2654 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-24 15:50:51 +00:00
theli
625c2ce6b1 *) bugfix for snippet fetching problem if content but not http header is available in cache
See: http://www.yacy-forum.de/viewtopic.php?p=25748

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2651 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-22 11:55:28 +00:00
theli
813a8a8179 *) migration of mimeTypeParser to jmimemagic 0.1
   - better mimetype detection for rss feeds
   - better mimetype detection for odt documents (less memory consuming)
   - two new detector classes implementing MagicDetector interface of jmimemagic

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2650 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-22 11:40:46 +00:00
hermens
3f5a4153a0 Make Peers more receptive to transferred indexes
- Set MaxWordCount for dhtInCache to indexDistribution.dhtReceiptLimit
  so that the inCache gets flushed when the limit is passed
- Modify flushCacheSome to flush enough words to get below MaxWordCount immediately



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2649 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-22 10:58:58 +00:00
theli
57415b6889 *) Bugfix for surftipps UTF-8 problem
See: http://www.yacy-forum.de/viewtopic.php?t=2864

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2647 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-22 05:40:29 +00:00
allo
b0a4fcce8c fix from theli
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2642 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-20 18:03:24 +00:00
theli
b6c7b91582 *) Parser now throws a ParserException instead of returning null on parsing errors (e.g. needed by snippet fetcher)
*) better logging of parser failures
*) simplified usage of plasmaparser through switchboard
*) restructuring of crawler
   - crawler now returns an error message if it is used in sync mode (e.g. by snippet fetcher)
*) snippet-fetcher: more verbose error messages
*) serverByteBuffer.java: adding new function append(String,encoding)
*) serverFileUtils.java: adding functions to copy only a given number of bytes between streams


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2641 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-20 12:25:07 +00:00
orbiter
e03427871e enhanced surftipps:
- added switch to show or hide surftipps
- more news contribute to surftipps
- added voting system for surftipps

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2638 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-20 07:17:41 +00:00