Commit Graph

3366 Commits

Author SHA1 Message Date
lotus
c8451614f3 fix for overflow
http://forum.yacy-websuche.de/viewtopic.php?p=11696#p11696

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5440 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-05 18:28:27 +00:00
orbiter
c4c4c223b9 fixed a problem with attribute flags on RWI entries that prevented proper selection of index-of constraint
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5437 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-04 02:27:29 +00:00
orbiter
6072831235 no cr transmission for robinson peers
see also: http://forum.yacy-websuche.de/viewtopic.php?p=10290#p10290

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5436 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-03 23:44:42 +00:00
low012
afe98bc11c *) added changes as proposed by Halborinda in http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1674
*) changed indention

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5434 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-03 08:24:08 +00:00
orbiter
07fc115e90 removed active profiling in kelondroRowSet
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5433 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-02 12:33:06 +00:00
orbiter
be4c458951 refactoring (implemented Iterable in kelondroRowCollection)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5432 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-02 11:38:20 +00:00
low012
bb5c2cd12e *) ISINDEX parameters will not be put on commandline anymore to prevent possible security hazards (better safe than sorry). Parmeters will have to be read from QUERY_STRING in ISINDEX case too which does not seem to be uncommon behaviour for web servers: http://vms.pdv-systeme.de/users/martinv/cgi_basics/cgi_basics.html#Datenuebergabe
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5431 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-02 11:18:26 +00:00
orbiter
b6bba18c37 replaced the storing procedure for the index ram cache with a method that generates BLOBHeap-compatible dumps
this is a migration step to support a new method to store the web index, which will also based on the same data structure. made also a lot of refactoring for a better structuring of the BLOBHeap class.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5430 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-01 22:31:16 +00:00
low012
db1cfae3e7 *) cleaning up after myself
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5429 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-01 19:45:15 +00:00
low012
f547f9a78c *) added CGI capabilities (run Perl scripts and other software via HTTP GET and POST)
*) set cgi.allow to true in yacy.conf to enable CGI (CGI is disabled by default)
*) edit cgi.suffixes in yacy.conf if necessary to use additional script types

ATTENTION: This is a rather experimental feature, not all environment variables are set yet. 

Only enable CGI if you know what you are doing. Poorly implemented CGI scripts can put a system's integrity at risk!

Implementation of more environment variables and documentation due for the next days.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5428 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-01 19:40:06 +00:00
f1ori
bdc380cd84 * add lastModified to templateCache
-> no outdated files from cache anymore...


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5427 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-01 14:56:53 +00:00
f1ori
025094675f * remove empty directory
* add necessary dependency for pdfParser


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5424 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-31 19:39:02 +00:00
f1ori
c5691180cb * skip style-tags in HTML-files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5423 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-31 19:34:24 +00:00
orbiter
3567c58b18 added another filed information for BLOBHeap dumps: the gaps
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5420 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-30 10:49:43 +00:00
orbiter
abdd4aa414 added a index dump for blob heaps:
this will increase the shutdown time for at most some seconds, but will speed up the start-up

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5419 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-29 21:36:27 +00:00
orbiter
8c3205b62e fix for OOB Exception
see http://forum.yacy-websuche.de/viewtopic.php?p=11598#p11598

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5417 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-29 17:36:53 +00:00
orbiter
78c568331e added test channel to /xml/feed.rss
can be obtained with 
http://localhost:8080/xml/feed.rss?set=TEST
returns always a single feed entry with a fresh date

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5416 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-29 12:39:07 +00:00
orbiter
e004da48d3 - added fast fingerprint computation for files (any). Will be used in new index dump method
- refactoring

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5415 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-29 12:22:13 +00:00
f1ori
2d2ce24011 * remove all encoding-stuff from proxy
encoding is handled by parsers or browser, proxy only passes through


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5410 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-23 19:14:54 +00:00
f1ori
73c8a0839c * abort download, when proxy connection is closed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5409 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-23 11:30:24 +00:00
orbiter
bb935fdbb0 less organization overhead for DNS caching and prefetching
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5408 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-23 10:06:49 +00:00
f1ori
4907697cfa * make fileuploads through proxy bigger than 65500 bytes possible
* remove gzip-encoding for files from cache


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5407 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-22 23:04:00 +00:00
orbiter
fc8189f3fb better self-healing of corrupted databases
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5406 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-22 16:43:49 +00:00
f1ori
963da8c3f9 * updated tm-extractors to new version 1.0
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5405 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-21 14:51:03 +00:00
orbiter
e34ac22fbd - added new monitoring servlet at
http://localhost:8080/PerformanceConcurrency_p.html
- used the new monitoring to do some fine-tuning of the indexing queue

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5402 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-19 15:26:01 +00:00
lotus
449e697436 fix for null-seed in seedfile
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1653

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5401 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-19 12:10:01 +00:00
orbiter
d376d81fc4 replaced busy thread control of crawl stacker by blocking threads
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5400 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-18 23:18:34 +00:00
orbiter
f29b48d9ff patch for IndexOutOfBoundsException
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5399 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-18 22:05:26 +00:00
f1ori
0881190b19 * Robots.txt: don't interpret Crawl-Delays for other robots
fixes: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1647


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5398 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-18 15:35:41 +00:00
orbiter
243e73f53b removed unnecessary usage of kelondroBLOBTree
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5397 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-18 00:18:37 +00:00
orbiter
8cb7170b75 - set status of kelondroTree, kelondroBLOBTree and kelondroFlexTable to deprecated
- removed initialization and/or usage of kelondroFlexTable (should meanwhile not be used any more)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5396 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-18 00:08:17 +00:00
orbiter
7535fd7447 - refactoring of CrawlEntry and CrawlStacker
- introduced blocking queues in CrawlStacker to make it ready for concurrency
- added a second busy thread for the CrawlStacker
The CrawlStacker is multithreaded. It shall be transformed into a BlockingThread in another step.
The concurrency of the stacker will hopefully solve some problems with cases where DNS blocks.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5395 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-17 22:53:06 +00:00
lotus
18513e2ee2 npe fix: http://forum.yacy-websuche.de/viewtopic.php?t=1646
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5393 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-16 13:36:13 +00:00
orbiter
2802138787 - refactoring of CrawlStacker (to prepare it for new multi-Threading to remove DNS lookup bottleneck)
- fix of shallBeOwnWord target computation heuristic


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5392 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-15 00:02:58 +00:00
orbiter
db6b3bf5a3 speed enhancement for integrated http server:
- tuning hacks in template engine
- bypassing the template engine if no servlet present

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5389 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-11 20:10:37 +00:00
orbiter
7cd08bd5fb fix for NPE in BLOBCompressor
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5388 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-11 13:33:24 +00:00
orbiter
5b94498643 fine-tuning of cache usage from SVN 5386 and a bug fix for overflow in available() method
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5387 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-10 14:35:01 +00:00
orbiter
1779c3c507 - added a read cache to the RAFile interface to RandomAccessFile
- added a write buffer to BLOBHeap
- modified the BLOBBuffer (is now only to buffer non-compressed content)
- added content compression to the HTCache
The new read cache will decrease the start/initialization time of BLOB files,
like the HTCache, RobotsTxt and other BLOBHeap structures.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5386 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-10 11:15:19 +00:00
orbiter
e1acdb952c fix for problem with userDB and bookmarksDB which was caused by changes in kelondroRA in SVN 5376
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5385 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-08 00:17:45 +00:00
orbiter
4a2dac659e more speed hacks:
- modified and activated write buffer
- increased cache flush factor
- fixed a problem with deadlocking of indexing process

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5382 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-05 13:55:48 +00:00
orbiter
47292e696a more performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5379 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-04 12:54:16 +00:00
orbiter
759cef23dd fix for bug in kelondroAbstractRA.readFully
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5378 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-03 23:32:07 +00:00
orbiter
d39d420b39 performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5376 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-03 15:38:29 +00:00
orbiter
0b4808ba3d added new interactive search feature:
- during the user types search queries, the local database is searched
- results are presented interactively

This was implemented using a new JSON result format for search results in YaCy
- added JSON as file format for servlets
- refactoring of current search servlets (xml and html)
- added JSON output format for search results
- added AJAX-based search page, that uses the yacysearch.json selrvlet to print results as a query is typed

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5373 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-02 15:24:25 +00:00
orbiter
74a3d86114 fixed a error response that might present classified information
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5372 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-01 23:14:42 +00:00
orbiter
c6525ab75f fix for NPE in seed handling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5371 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-01 23:08:27 +00:00
lotus
1951d30a62 addendum to last commit
handle words with length < 3 correctly

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5369 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-26 19:43:40 +00:00
lotus
325ba7bfb8 only query words with length > 2
this is not complete, yet

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5368 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-26 16:41:38 +00:00
low012
e423fa9846 *) added method to only get file names in directory listing which match a filter
*) only files which end with .black will be listed as blacklists
*) added a little bit of Javadoc

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5366 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-25 20:26:06 +00:00
orbiter
513179f404 changed interface to colletctionIndex and adopted all implementing classes:
do not return a result of a double-check when adding entries with addUnique

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5363 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-23 23:55:08 +00:00
orbiter
9d64693cfb reverting again the changes to new concurrent chunkIterator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5362 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-23 22:22:44 +00:00
orbiter
45ad1c3dd5 - re-activated concurrent iterator for EcoFiles
- added javadoc for new concurrent intialization in kelondroBytesLongMap
- switched default value for commons storage to false
- version step

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5361 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-23 18:25:40 +00:00
orbiter
2e2120046f speed enhancement for BLOBHeap opening process
using concurrency of FileIO and content processing

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5360 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-23 17:38:01 +00:00
orbiter
fa26a8f25a fix for deadlock-like behavior in balancer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5358 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-22 11:25:01 +00:00
orbiter
1918a0173e added more exception handling during crawling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5357 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-22 00:40:18 +00:00
orbiter
10f5ec1040 reverted last commit (more testing needed)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5356 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-22 00:12:50 +00:00
f1ori
5af8923f37 * distribute forgotten jar-file in parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5355 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-22 00:05:04 +00:00
orbiter
b0f2003792 fast database initialization and fast start.up of yacy:
- applied knowledge about concurrent files stream reading and index processing from the wikimedia reader
   to the EcoTable initialization process: the file reader is now concurrent to the index generation
- changed also some initialization processes to avoid some pauses during initialization

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5354 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-21 23:21:33 +00:00
orbiter
0ca4bc7b79 - added reader and visualization for mediawiki-export files:
files exported from mediawiki using the xml schema according to
http://www.mediawiki.org/xml/export-0.3/
can be processed to be viewed in a YaCy servlet.
To acces such a file, place it into
DATA/HTCACHE/mediawiki/
i.e. the export from german wikipedia would be:
DATA/HTCACHE/mediawiki/wikipedia.de.xml
This file can then be accessed using the URL
http://localhost:8080/mediawiki_p.html?dump=wikipedia.de.xml&title=YaCy
if this is done the first time, an index file is created
(for this case: more than 4 million lines must be written, this takes about 15 minutes)
Then try the same url again.

- enhanced also the md5 computation speed


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5352 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-20 18:31:52 +00:00
danielr
2e63f03ca5 copy&paste vergessen :/
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5351 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-20 11:41:11 +00:00
danielr
cd8082b4e3 fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1111#p11166
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5350 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-20 11:18:19 +00:00
lotus
4f996a7651 fix for logparser pattern
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5349 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-17 16:23:17 +00:00
f1ori
d18c18971e * dirlisting in UTF-8 encoding
* fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1550&hilit=#p11108


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5348 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-15 20:49:03 +00:00
orbiter
867d0f2f56 removed some unnecessary pause delays
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5346 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-14 23:36:33 +00:00
f1ori
d49ffcd818 * files distributed by yacy are utf-8, files from repository use the system default charset
* fixes http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1564#p11092
  and http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1550


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5345 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-14 20:49:16 +00:00
orbiter
8c96bc2ac1 do not use proxy caching rules for crawling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5344 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-14 16:31:04 +00:00
orbiter
dba7ef5144 extended crawling constraints:
- removed never-used secondary crawl depth
- added a must-not-match filter that can be used to exclude urls from a crawl
- added stub for crawl tags which will be used to identify search results that had been produced from specific crawls
please update the yacybar: replace property name 'crawlFilter' with 'mustmatch'.
Additionally, a new parameter named 'mustnotmatch' can be used, which should be by default the empty sring (match-never)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5342 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-14 09:58:56 +00:00
orbiter
96174b2b56 more debugging / better result status logging for parser/caching errors
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5341 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-13 23:41:43 +00:00
f1ori
90e78b2cf6 * improve encoding detection of http service
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5337 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-12 21:06:32 +00:00
orbiter
ef66438662 - more space in error db to store larger error messages
- added hash to HTCACHE storage files which will make it possible to join separate caches by just copying files

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5329 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-11 21:42:12 +00:00
orbiter
674ad2d55b different handling of error cases that occur during loading files with http or ftp:
methods throw exception instead of returning an error string

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5328 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-11 21:33:40 +00:00
danielr
538359a0ff simple fix to get DHT working again (maybe something more has to be done ;)
fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1578



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5327 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-11 18:55:16 +00:00
f1ori
7e1fe05e3c * added utf8-encoding to many getBytes-calls
* utf8 should work now


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5323 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-08 20:24:31 +00:00
lotus
fad044fb54 update to snippet marker:
- do not display indexed html (solves xss issues)
the single words are analyzed for already marked parts. this is needed to avoid false encoding of the marker (<b>) tags.
- improved speed for existing routine
heavy used regex pattern are precompiled now

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5322 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-08 10:08:53 +00:00
lotus
16723d0fa6 ask another peer if crawljob loading fails
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5321 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-06 14:14:34 +00:00
orbiter
1b18d4bcf3 enhancement to crawling and remote crawling:
- for redirector and  remote crawling place crawling url on notice queue instead of direct enqueueing in crawler queue
- when a request to a remote crawl provider fails, remove the peer from the network to prevent that the url fetcher gets stuck another time again

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5320 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-06 12:30:55 +00:00
orbiter
3f746be5d4 - consolidation and refactoring of many DHT target - computing methods
- implemented vertical DHT acceptance ("my own DHT") to accept new targets
- added new target computation for global search: addresses vertical targets also
- enhanced remote crawling: collection of remote crawl urls if queue has less than 100 entries (was: 0 entries)
- better performance value computations for PPM selection in network configuration

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5319 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-06 10:07:53 +00:00
orbiter
d014b2728a Design-check, Extension and Refactoring of DHT target position computation:
- two different computations (but mathematical equivalent) of the DHT distance had been consolidated
- moved from 0.0 .. 1.0 double-range position computation to 0 .. Long.Max range for DHT targets
- added fast Long - to - hash computation
- high-precision target computation of gaps for new peers
- added new target computation for horizontal and vertical DHT targets (not yet in use)
- old horizontal-only DHT targets will be upwards compatible to new horizontal and vertical DHT positions

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5318 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-03 00:27:23 +00:00
orbiter
dd27ce7216 added control logic to ECO tables that deletes ram copies of the tables if they get too large
table copies in ram are now abandoned if less than 20 MB ram is left

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5317 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-02 23:53:09 +00:00
orbiter
38e6ba5d00 forgot to re-rename commonsPath
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5316 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-02 23:39:02 +00:00
orbiter
22989d0d8a added property index.storeCommons to switch commons storage on or off
with index.storeCommons=false all currently stored commons are deleted!
Default is now 'true', but in future full releases it will be switched to 'false'

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5315 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-02 23:30:09 +00:00
f1ori
4b4ce75396 * http-server: submit charset from html metatags
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5314 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-01 23:17:51 +00:00
f1ori
69e695bd4b * detect charset for directory index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5313 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-01 22:14:31 +00:00
f1ori
340ecd919d * include non ascii characters in visible characters
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5312 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-01 21:13:57 +00:00
lotus
5cf0cbb47e javadoc
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5311 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-01 08:56:58 +00:00
lotus
8d07607d1d update to resource observer:
- returns high/medium/low disk space
- pauses crawling on medium disk space
- disables index receive on low disk space

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5310 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-31 11:33:17 +00:00
f1ori
d0543a7c39 * fix the debug ant-target
* fix yacy-subdomain handling (http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1556)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5307 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-27 22:16:56 +00:00
low012
baae3d91b1 *) fixed warning when compiling listManager
*) fixed display of values of information for which part of YaCy (crawler, proxy, ...) blacklist is activated for
*) replaced regular put() with putXML() in several cases

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5305 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-27 16:56:19 +00:00
danielr
a4fb76e93c undo r5300 (not fixed as seen after longer run)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5303 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-25 23:20:09 +00:00
low012
a99a629ed4 *) quick fix to prevent comments for blog entries which don't exist (http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1554)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5302 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-25 12:04:10 +00:00
low012
00e27e5050 *) fixed bug which made it possible to write files outside of the DATA/LIST directory when creating a new blacklist
*) a blacklist will only be created if no blacklist with same name exists (some refactoring has been necessary for this)
*) further minor fixes
*) to be continued...

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5301 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-25 00:11:03 +00:00
danielr
0f9c0bd0d5 fix for ConcurrentModificationException at de.anomic.index.indexContainerHeap$heapCacheIterator.next(indexContainerHeap.java:324)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5300 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-24 14:00:41 +00:00
danielr
103ad2a437 some javadoc
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5299 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-24 13:58:26 +00:00
orbiter
b098522977 some very small advances to index utf-8 (not working yet), inserted also debugging code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5298 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-22 22:04:13 +00:00
orbiter
2f49666908 integrated the character decoding into the parser, removed old code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5297 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-22 20:56:13 +00:00
orbiter
49293c1358 fix for deadlock in new encoder :-(
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5296 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-22 19:36:34 +00:00
orbiter
0edec2b760 FULL redesign of algorithms in htmlTools to encode/decode strings from/to unicode and html.
The old process used a not really efficient way to detect html encoding strings in texts.
All calling methods had been adoped to call the new class in an enhanced way with less parameters.

Many classes in interfaces used a XML encoding only (instead of full html conversion from unicode to html); this behavior was not changed with this commit but should be controlled again since it points out possible XSS leaks

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5295 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-22 18:59:04 +00:00
orbiter
958ec20cd0 removed specialized umlaute-handling in html parser. This has to be replaced by something that is able to transfer all possible html encodings into utf-8. Please see SVN 5293 for test cases.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5294 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-22 11:11:55 +00:00
f1ori
2e53cbc66a should compile now
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5292 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-22 08:50:30 +00:00
f1ori
f3bf2e379e should compile again
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5291 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-22 07:35:49 +00:00
f1ori
dd8441f102 fix bug: data from plasmaParser is allready converted to UTF-8
After removing the restrictions in the code, YaCy should be able to index Unicode-charaters!


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5290 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-21 20:19:10 +00:00
orbiter
6941bf42b1 performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5288 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-20 14:07:09 +00:00
orbiter
9b0c4b1063 redesign of parts of the new BLOB buffer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5287 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-19 22:30:44 +00:00
orbiter
1778fb420d - added some performance tweaks to the new BLOB buffer
- removed the now superfluous HT storage thread
- reduced number of file decompression by shifting the compression moment to the future


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5286 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-19 18:10:42 +00:00
orbiter
9663e61449 added another class to handle BLOB writings to the new HTCACHE data storage:
- entries are buffered and written as stream with many entries at once (saves many IO accesses)
- entries are compressed with gzip: increases capacity of cache
- concurrency for stream-writing and compression: all writings to the cache are non-blocking

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5284 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-18 08:57:48 +00:00
orbiter
382226da94 fix for bug introduced in SVN 5281: parameters were switched
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5282 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-17 12:14:57 +00:00
danielr
f2fd043797 refactoring (moved duplicate code into methods)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5281 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-17 10:39:32 +00:00
danielr
c612046e5e r5278 java 1.5 compatible
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5280 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-17 09:59:59 +00:00
f1ori
af71ec93bf ops, forgot to import something
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5279 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-16 22:44:25 +00:00
f1ori
9e65e9141c * always use UTF-8 for encoding hashes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5278 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-16 22:35:27 +00:00
orbiter
826ca79735 refactoring and new architecture to store the files of the web cache:
- files are not stored any more as individual files
- a new database structure using BLOBHeap files stores many cache entries in common files
- all file-writing procedures had been migrated to generate byte[] objects which are written with the new database methods

this is only an intermediate step to the final architecture, where cached files are written together with their metadata in one single database structure.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5276 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-16 21:24:09 +00:00
danielr
f095137238 - respecting httpdMaxBusySessions (refusing new connections if limit is hit)
- comments in serverBusyThread converted to JavaDoc
- better debug output for npe-case in diskUsage


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5274 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-16 10:53:32 +00:00
orbiter
8ba33f104e fix for npe
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5269 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-13 21:59:53 +00:00
orbiter
998861acfd - some refactoring in BLOBHeap to enable more gap processing functions
- better gap merging in BLOBHeap
- shrinking of heap file if gap is at end of file when file is closed

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5268 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-13 21:15:54 +00:00
lotus
9d50bfd0b3 fix for npe: http://forum.yacy-websuche.de/viewtopic.php?p=10562
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5267 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-13 09:09:53 +00:00
orbiter
766cad6e93 enhancement in memory management of BLOB Heap files / merging of deleted entries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5266 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-12 22:15:01 +00:00
orbiter
7860d5d632 fix for bug in seed list management (cause was bad class overloading, only visual effects!)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5265 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-12 19:51:53 +00:00
orbiter
ffed5fc415 fixed problem with lost peers in database
migrated seedDB from BLOBTree to BLOBHeap

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5263 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-10 14:40:02 +00:00
orbiter
6fb865fbdc - fix of bug in iterator in kelondroBLOBHeap which caused bug in crawl profile listing
- some refactoring of classes that use kelondroMap (Map instead of HashMap)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5262 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-10 08:39:11 +00:00
orbiter
2d65887723 - fix for bug in new profile handling
- added a new feature in ymageChart (cannot be seen yet, just wait... will be used in profiling chart)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5261 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-09 22:31:43 +00:00
orbiter
ff68f394dd fix for problem with balancer and lost crawl profiles:
if crawl profile ist lost, no robots.txt is loaded any more

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5258 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-08 18:26:36 +00:00
lotus
fb8d9850ea fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1462
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5248 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-05 10:03:02 +00:00
lotus
0d1a2f6183 fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1461
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5247 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-04 12:36:11 +00:00
orbiter
9ac16f565b - fixed several bugs in database management functions
- fixed a display bug for the performance graph
- fixed deadlock when initialization of awt happens simultanously
- removed some debugging output

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5245 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-03 18:57:02 +00:00
orbiter
820a03f9d6 - removed some warnings
- used fix in SVN 5233 for ysearch.java and search.java

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5237 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-01 20:20:39 +00:00
lotus
fe2792e9ce use accept-language header instead of user agent for language detection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5235 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-01 17:47:11 +00:00
orbiter
c8bdd965ec - larger update time for status page
- balancer writes cause of robots.txt in log file for crawl delay
- removed log output for forced GC
- smaller RAM flush for RWI cache, should cause more usage of cache and faster crawling

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5228 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-30 11:09:46 +00:00
lotus
dda771db9d - search result layout
- tray only for windows

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5222 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-29 12:39:57 +00:00
orbiter
ce4715e305 removed indexing of anchor links and tagging such words as part of urls (that was wrong)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5219 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-28 21:12:26 +00:00
orbiter
ce57de6cb3 - fixed re-setting of DHT Send/Receive settings
- small change to network grafics: smaller circles / more URLs necessary for full radius; more PPM necessary for full crawling circles
- fixed exclusion search ('-' did not work any more)
- fixed NPE bug when FTP loader wrote to the error-db

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5218 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-28 20:01:10 +00:00
lotus
31c31e54e4 new tray icon image for different icon sizes (e.g. linux)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5216 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-28 08:54:33 +00:00
f1ori
9589dfe080 * removed trayicon popupmenu title
* added some menu items to trayicon


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5213 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-27 08:25:16 +00:00
lotus
5a637f004d localized tray
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5212 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-26 11:09:54 +00:00
lotus
9d4f0325e1 - removed shutdown from search page (we have it in tray now!)
- fixed doubleclick action for tray

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5211 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-26 10:55:08 +00:00
lotus
214277dad6 - revert r5202
- cleanup
- installer checks for JRE 1.6 only

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5210 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-26 06:52:36 +00:00
f1ori
7afa084207 * add nativ java trayicon, using reflections
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5209 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-25 19:36:49 +00:00
apfelmaennchen
b97ff24b43 bookmarksDB / xbel.xml:
- added support for folder=/foldername
- it crashes if foldername ends with /

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5207 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-24 21:16:13 +00:00
orbiter
6e7d113eac fix for wrong index initialization after network switch
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5203 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-23 23:30:25 +00:00
lotus
0a0cc3bf67 added missing classes to build target "run"
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5201 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-23 15:54:12 +00:00
orbiter
7b35d54c6c fixed some problems with network switching (was not completely 'clean')
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5200 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-23 12:11:19 +00:00
orbiter
f0b42e5a98 fixed NPE
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5199 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-22 14:04:38 +00:00
orbiter
8e0de7f180 update to language statistic evaluation:
- the condenser does not abandon too small words any more before feeding the statistics
- for text indexing no more urls are used to feed the index (this was wrong, but in contrast the indexing of urls for media search is necessary)
- urls are not used any more to feed the statistics

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5197 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-21 20:25:47 +00:00
orbiter
1198eeecc7 added language selection to search query:
- the language can be selected using a LANGUAGE:<language> element in the query line, i.e.:
java LANGUAGE:en
- the language can be selected with a post element in google-style syntax with the 'rl' element:
?lr=lang_en&query=java

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5193 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-21 07:28:57 +00:00
orbiter
00c1535f84 added ranking and evaluation of language type in a search
the wanted language is taken from the browser user-agent string

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5192 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-21 00:04:42 +00:00
lotus
a81cb78211 finally some putHTML on htroot/xml/
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5188 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-20 07:55:30 +00:00
orbiter
bfcf9b7aa3 - added language detection using metadata from documents: html and odt documents provide this information
- metadata and results from statistical analysis are compared and result is printed out as debug lines
- added ranking profile for wanted language
- added class with ISO 639 table, a list of all valid country codes that will be used for the language identification

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5187 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-19 22:19:11 +00:00
apfelmaennchen
5e8bd0f29c small fixes to getpageinfo_p.xml and htmlFilterContentScraper.java with respect to keyword extraction
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5185 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-19 14:27:44 +00:00
apfelmaennchen
5b2a57bfd0 - /xml/util/getpageinfo_p.xml added <desc> and <lang> tags
- changed htmlFilterContentScraper.getKeywords() to split either space or comma charater not both

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5183 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-18 21:01:23 +00:00
orbiter
e1f67262f7 - added and removed some debugging output
- fixed a bug with merge method
- patched wrong output of language identification (not fixed, only patched!)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5181 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-18 14:12:15 +00:00
orbiter
ce2a7ed116 integrated language detection classes into condenser environment
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5180 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-18 13:12:33 +00:00