Commit Graph

216 Commits

Author SHA1 Message Date
theli
813a8a8179 *) migration of mimeTypeParser to jmimemagic 0.1
   - better mimetype detection for rss feeds
   - better mimetype detection for odt documents (less memory consuming)
   - two new detector classes implementing MagicDetector interface of jmimemagic

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2650 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-22 11:40:46 +00:00
allo
b0a4fcce8c fix from theli
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2642 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-20 18:03:24 +00:00
theli
b6c7b91582 *) Parser now throws a ParserException instead of returning null on parsing errors (e.g. needed by snippet fetcher)
*) better logging of parser failures
*) simplified usage of plasmaparser through switchboard
*) restructuring of crawler
   - crawler now returns an error message if it is used in sync mode (e.g. by snippet fetcher)
*) snippet-fetcher: more verbose error messages
*) serverByteBuffer.java: adding new function append(String,encoding)
*) serverFileUtils.java: adding functions to copy only a given number of bytes between streams


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2641 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-20 12:25:07 +00:00
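The bounded stream copy mentioned in the commit above could look roughly like the following. This is a minimal sketch, not the actual serverFileUtils code: the class name BoundedCopySketch, the method name copy and the 4096-byte buffer are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not taken from the YaCy sources.
final class BoundedCopySketch {

    /**
     * Copies at most {@code count} bytes from source to dest.
     * Returns the number of bytes actually copied, which is smaller
     * than {@code count} if the source ends early.
     */
    static long copy(InputStream source, OutputStream dest, long count) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        while (total < count) {
            // Never request more than the remaining byte budget.
            int chunk = (int) Math.min(buffer.length, count - total);
            int read = source.read(buffer, 0, chunk);
            if (read < 0) {
                break; // source exhausted
            }
            dest.write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```

Capping each read at the remaining budget keeps the source stream positioned exactly after the last copied byte, so a caller can continue reading from it.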
orbiter
e03427871e enhanced surftipps:
- added switch to show or hide surftipps
- more news contribute to surftipps
- added voting system for surftipps

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2638 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-20 07:17:41 +00:00
theli
cc667b0aa5 *) htmlFilterContentScraper.java: adding support for link tag
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2633 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-19 16:13:13 +00:00
orbiter
f453c14b5d removed unreachable catch blocks and unused imports
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2619 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-18 11:23:58 +00:00
theli
ad7f600f25 *) Bugfix. re-enabling inheritance of serverCharBuffer from writer class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2618 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-18 11:04:16 +00:00
theli
97d2a08ef1 *) restructuring needed to support parsing of documents using various charsets
   - serverFileUtils.java:
   -- adding methods to copy from stream to writer and readers to writers
   -- moving httpc writeX methods into serverFileUtils class
   - serverCharBuffer.java: removing inheritance from Writer class
   - replacing htmlFilterOutputStream by htmlFilterWriter class which handles
     content as char stream
   - htmlFilterContentTransformer.java: deactivating getText mode 
    (still needs to be migrated to use char streams instead of byte streams)
   - changes in several classes to use htmlFilterWriter instead of htmlFilterOutputStream
   - changes in Scraper and Transformer classes to operate on chars instead of bytes
   - httpdProxyHandler.java: bugfix. clientTimeout setting was missing in config file

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2617 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-18 10:12:11 +00:00
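Copying from a byte stream to a writer, as described in the commit above, needs a charset to decode the bytes. The following is a minimal sketch of that idea using only standard java.io classes; the class name CharCopySketch and both method signatures are assumptions and do not reproduce the real serverFileUtils methods.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper, not taken from the YaCy sources.
final class CharCopySketch {

    /** Copies all characters from a reader to a writer; returns the char count. */
    static long copy(Reader source, Writer dest) throws IOException {
        char[] buffer = new char[4096];
        long total = 0;
        int read;
        while ((read = source.read(buffer)) >= 0) {
            dest.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    /** Decodes the byte stream with the given charset while copying to the writer. */
    static long copy(InputStream source, Writer dest, Charset charset) throws IOException {
        return copy(new InputStreamReader(source, charset), dest);
    }
}
```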
orbiter
3aac5b26da - added automatic tag generation when a web page from the search results is added
- added new image 'B' in front of search results for bookmark generation
- added news generation when a public bookmark is added
- the '+' in front of search results has new meaning: positive rating for that result
- added news generation when a '+' is hit

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2613 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-18 00:37:02 +00:00
theli
0e84a969d6 *) Bugfix for serverCharBuffer read from file operation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2607 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-16 13:11:32 +00:00
theli
90ef19d778 *) first version of a serverCharBuffer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2606 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-16 12:56:03 +00:00
orbiter
1b48473bc5 bugfix to utf8 recognition
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2603 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-15 23:55:06 +00:00
orbiter
90f7241b59 serverByteBuffer.trim() can now recognize utf-8 characters
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2602 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-15 23:52:26 +00:00
theli
8115ac47b5 *) charset aware metadata parsing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2598 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-15 15:01:25 +00:00
theli
74c3e7cf29 *) storing document charset into plasmaParserDocument object (is needed later by the condenser)
*) htmlFilterContentScraper.java: using proper charset for document title
*) serverByteBuffer.java: adding a new toString that allows specifying the charset used to decode the bytes


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2593 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-15 13:18:12 +00:00
theli
e2f8339827 *) some bugfixes for UTF-8 related problems
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2577 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-14 05:16:36 +00:00
orbiter
82a6054275 - fixed bug with new indexAbstract generation
- added partial evaluation of indexAbstracts during remote searches

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2544 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-11 10:39:25 +00:00
orbiter
309accb983 memory control for ymage generation:
the ymageMatrix initializer throws a RuntimeException if there is not
enough memory available to generate a new ymage of the requested size

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2541 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-11 07:01:39 +00:00
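A memory check of the kind described in the commit above might be sketched as follows. This does not reproduce the ymageMatrix code: the class name ImageMemoryGuardSketch, the cost of 4 bytes per pixel and the way available memory is estimated are all assumptions. The sketch also calls Runtime.maxMemory directly, which is only reasonable on JVMs that are not affected by the JRE 1.4.2 bug mentioned elsewhere in this log.

```java
// Hypothetical guard, not taken from the YaCy sources.
final class ImageMemoryGuardSketch {

    /** Assumption: one int (4 bytes) per pixel. */
    static long requiredBytes(int width, int height) {
        return 4L * width * height;
    }

    /** Rough estimate of the heap this JVM can still obtain. */
    static long availableBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.maxMemory() - rt.totalMemory() + rt.freeMemory();
    }

    /** Throws a RuntimeException if the request exceeds the available amount. */
    static void requireMemory(long needed, long available) {
        if (needed > available) {
            throw new RuntimeException(
                "not enough memory: need " + needed + " bytes, have " + available);
        }
    }

    /** Allocates the pixel array only after the memory check has passed. */
    static int[] allocate(int width, int height) {
        requireMemory(requiredBytes(width, height), availableBytes());
        return new int[width * height];
    }
}
```

Failing early with an ordinary exception lets the caller skip the image instead of running into an OutOfMemoryError in the middle of the allocation.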
orbiter
c2e6cc8c6b small part of Bost's patch
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2517 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-08 01:40:23 +00:00
orbiter
a2525072f2 bugfix for kelondroRow - property generation
this bug affected ranking parameters :-(

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2506 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 10:55:34 +00:00
theli
f3ac4dbbb9 *) better handling of server shutdown
See: e.g. http://www.yacy-forum.de/viewtopic.php?t=2584

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2468 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-03 14:59:00 +00:00
orbiter
39b4c26bdc more memory control:
- catching of OutOfMemoryError in server threads
- automatic adaptation of word cache size after a Short Mem Cycle

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2426 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-19 00:06:39 +00:00
orbiter
eb633c0a4f server threads must now supply a method that can be called in case
of short memory. This has been implemented for the indexing thread.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2421 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-18 02:07:03 +00:00
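The short-memory callback described in the commit above can be pictured as a small handler interface. The sketch below is purely illustrative: the names ShortMemoryHandler, onShortMemory, ShortMemorySketch and signalIfShort are hypothetical, as is the threshold-based trigger.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface, not taken from the YaCy sources.
interface ShortMemoryHandler {
    /** Called when memory runs short; implementations should release caches. */
    void onShortMemory();
}

// Hypothetical registry that notifies all handlers when memory is short.
final class ShortMemorySketch {
    private final List<ShortMemoryHandler> handlers = new ArrayList<>();

    void register(ShortMemoryHandler handler) {
        handlers.add(handler);
    }

    /**
     * Notifies every handler if the available memory is below the threshold.
     * Returns the number of handlers that were called.
     */
    int signalIfShort(long availableBytes, long thresholdBytes) {
        if (availableBytes >= thresholdBytes) {
            return 0;
        }
        for (ShortMemoryHandler handler : handlers) {
            handler.onShortMemory();
        }
        return handlers.size();
    }
}
```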
orbiter
0187c60010 because of a bug in the JRE 1.4.2 there was no memory protection
see http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4686462
this commit works around the bug by using a memory-computation patch.
All uses of Runtime.maxMemory have been replaced by serverMemory.max.
The bug is no longer present in Java 1.5.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2419 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-18 01:33:54 +00:00
orbiter
314021453f * more logging
* option in yacy.init to set useCollectionIndex usage

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2374 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-10 21:21:50 +00:00
theli
c09f734d06 *) offer router configuration on ConfigBasic.html
- checkbox to allow router configuration is shown if
   - a) the UPnP forwarder is installed
   - b) a UPnP enabled router was found
   - c) no other forwarder was configured
   See: http://www.yacy-forum.de/viewtopic.php?p=24264

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2358 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 11:31:18 +00:00
orbiter
d468d665c9 some changes that may help to prevent deadlocks that cause an OutOfMemoryError
as described in
http://www.yacy-forum.de/viewtopic.php?p=24359

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2353 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 00:19:01 +00:00
orbiter
8b77afd72c some fixes to new container merger
and some code cleanup

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2336 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 22:40:11 +00:00
theli
839806a775 *) serverPortForwardingUpnp.java: code cleanup, license header added
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2332 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 15:32:35 +00:00
theli
03230cd887 *) removing old port forwarding classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2330 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 14:42:12 +00:00
theli
6e676224d0 *) adding support for upnp
   A new port forwarding method for UPnP was added.
   If this method is enabled, yacy automatically detects a UPnP-capable
   internet gateway and configures the gateway's port forwarding
   settings accordingly.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2328 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 14:26:45 +00:00
orbiter
1ed3e2daef added option to extract domains and/or urls from the eurl database
when extracting from eurl, the html output format is recommended, since
this format also adds the failure reason to the domain/url.
The complete syntax for domain extraction is now
java -Xmx<megabytes>m -classpath classes yacy -domlist [ -source { lurl | eurl } ] [ -format { text  | zip | gzip | html } ] [ <path to DATA folder> ]


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2322 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-24 08:08:33 +00:00
orbiter
58df8b7bbf a large collection of different changes
* mainly for the transition to the new indexing database structure
* a bugfix for an endless loop inside kelondroTree iteration
* a bugfix for bulk read inside a kelondroTree iteration; the bug caused some elements to be iterated twice
* major speed improvement for url/domain extraction

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2320 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-23 22:39:41 +00:00
orbiter
b3f7e62e03 better handling of whitespace
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2311 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-19 23:53:27 +00:00
orbiter
4149939c02 better handling of whitespace for gettext quotation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2310 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-19 23:18:06 +00:00
orbiter
97fa6788a1 added gettext support:
automatic replacement of string occurrences in html files by
gettext quotes.
see also: http://www.yacy-forum.de/viewtopic.php?p=23901#23901

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2309 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-19 22:35:36 +00:00
orbiter
67edd80884 removed tabs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2305 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-19 11:13:14 +00:00
orbiter
3879a0ecd0 replaced java.net.URL usage by use of new class de.anomic.net.URL
This shall be seen as an experiment to exclude all cases where
there could be a DNS lookup during URL comparison.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2290 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-13 01:21:53 +00:00
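The motivation in the commit above is that java.net.URL.equals and hashCode may resolve host names, which can block on DNS. The sketch below shows one way to compare URLs purely by their textual components. It uses java.net.URI, which is a different technique from the custom de.anomic.net.URL class the commit introduces; the class name UrlCompareSketch and the normalization rules (lower-cased scheme and host, default ports 80 and 443, empty path treated as "/") are assumptions.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper, not the de.anomic.net.URL class from the commit.
final class UrlCompareSketch {

    /** Compares two URLs by their normalized text; no host resolution happens. */
    static boolean sameLocation(String a, String b) {
        return key(URI.create(a)).equals(key(URI.create(b)));
    }

    /** Builds a normalized comparison key from the parsed URI components. */
    private static String key(URI uri) {
        String scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase(Locale.ROOT);
        String host = uri.getHost() == null ? "" : uri.getHost().toLowerCase(Locale.ROOT);
        int port = uri.getPort();
        if (port < 0) {
            // Fill in the well-known default port so ":80" and no port compare equal.
            port = "https".equals(scheme) ? 443 : ("http".equals(scheme) ? 80 : -1);
        }
        String rawPath = uri.getRawPath();
        String path = (rawPath == null || rawPath.isEmpty()) ? "/" : rawPath;
        String query = uri.getRawQuery() == null ? "" : "?" + uri.getRawQuery();
        return scheme + "://" + host + ":" + port + path + query;
    }
}
```

Two host names that point to the same IP address are treated as different by this comparison, which is the trade-off for never touching the network.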
allo
6acb6a4d8f tiny performance optimization
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2285 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-09 15:37:45 +00:00
theli
fe617d7e54 *) adding function to return the protocol type of a ssl connection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2274 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-03 14:16:46 +00:00
orbiter
018b3e0832 added pause option to server threads.
The pause is started by calling intermission(Long.MAX_VALUE)
and can be stopped by calling intermission(0)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2272 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-03 13:20:14 +00:00
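The pause mechanism from the commit above can be illustrated with a small sketch. Only the method name intermission and the two argument values come from the commit message; the class name IntermissionSketch, the isPaused method and the deadline-based bookkeeping are assumptions, not the actual server thread code.

```java
// Hypothetical illustration, not taken from the YaCy sources.
final class IntermissionSketch {

    // Wall-clock time (epoch millis) until which the thread should stay idle.
    private long pauseUntil = 0L;

    /** Requests a pause of the given length; 0 ends any running pause. */
    synchronized void intermission(long pauseMillis) {
        long now = System.currentTimeMillis();
        // Saturate instead of overflowing when Long.MAX_VALUE is passed.
        pauseUntil = (pauseMillis > Long.MAX_VALUE - now) ? Long.MAX_VALUE : now + pauseMillis;
    }

    /** A worker loop would check this before starting its next job. */
    synchronized boolean isPaused() {
        return System.currentTimeMillis() < pauseUntil;
    }
}
```

With this bookkeeping, intermission(Long.MAX_VALUE) pauses indefinitely and intermission(0) moves the deadline to the current time, which resumes the worker on its next check.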
allo
0621106ef3
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2214 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-18 12:15:26 +00:00
orbiter
12af69dd86 cosmetics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2212 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-18 11:49:31 +00:00
rramthun
bc94a714b2 Better explanation for the auto-dom-filter.
Some javadoc.
Small change to DetailedSearch.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2146 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-26 12:18:12 +00:00
theli
a520ab2e8c *) adding possibility to use an existing PKCS12 certificate for https
instead of creating a new one.

   Notes:
   This import is done automatically on startup if the following properties 
   are set in the config file:
     pkcs12ImportFile = 
     pkcs12ImportPwd = 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2139 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-24 07:15:42 +00:00
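Loading an existing PKCS12 certificate, as described in the commit above, can be done with the standard java.security.KeyStore API. The following is a minimal sketch under that assumption; it is not the import code from the commit, and the class name Pkcs12ImportSketch and the method load are hypothetical. In a real setup the stream would be opened on the file named by pkcs12ImportFile and the password taken from pkcs12ImportPwd.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;

// Hypothetical helper, not taken from the YaCy sources.
final class Pkcs12ImportSketch {

    /**
     * Reads a PKCS12 key store from the given stream.
     * Passing a null stream yields an empty key store, which is
     * standard KeyStore.load behaviour.
     */
    static KeyStore load(InputStream pkcs12Data, char[] password)
            throws GeneralSecurityException, IOException {
        KeyStore store = KeyStore.getInstance("PKCS12");
        store.load(pkcs12Data, password);
        return store;
    }
}
```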
orbiter
cda087f43b - integrated cache miss storage into object cache
- removed cache-miss handling from indexURL
todo: new Monitoring in PerformanceMemory_p

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2132 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-23 16:43:28 +00:00
theli
8b7626f8d1 *) Automatic redirection of browser if user changes port settings in ConfigBasic
   See: http://www.yacy-forum.de/viewtopic.php?t=2415
*) If ssl is available, the browser connects to yacy via https on yacy startup
   See: http://www.yacy-forum.de/viewtopic.php?p=21649#21649

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2127 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-20 14:05:49 +00:00
theli
cea6e416d9 *) wrong name in header
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2097 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-15 09:52:22 +00:00
theli
df068cf23c *) adding first version of native SSL support for yacy
VERY EXPERIMENTAL!
   See: http://www.yacy-forum.de/viewtopic.php?p=18516

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2096 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-15 09:41:29 +00:00
orbiter
83e0e765ec redesigned some parts of the html scanner & parser
to better support image tags

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1995 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-04 14:36:01 +00:00