2945 Commits

Author SHA1 Message Date
danielr
7bd8601f04 delete old releases compatible with java 1.5 ;)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4728 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-23 07:22:20 +00:00
danielr
da386a1924 fixed deleteOldDownloads if there are no downloads
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4726 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-22 21:36:52 +00:00
danielr
21418a22a3 removed DEBUG output
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4725 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-22 17:14:34 +00:00
danielr
79a3edeeef deleting downloaded releases after x days (default 30)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4724 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-22 16:53:53 +00:00
danielr
763f9d4f5d serverCore: setting timeout for new connection before SSLDetect
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4723 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-22 09:03:16 +00:00
orbiter
1995faef8d - refactoring of the Collage back-end: moved to plasma package
- also renamed plasmaCrawlResults to get consistent naming for the URL and image queues
- added a double-check for the images
- added additional queues for the images: all lower-quality images go there, so the queue can also be used if no sizes are given; no image is lost
- added a cleanup for the stacks so they cannot flood the memory

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4722 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-21 22:42:49 +00:00
orbiter
d7e89c2aca fixed near-deadlock situation when deleting crawl profiles
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4721 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-20 22:10:26 +00:00
orbiter
5e3ce46339 - better logging when rejecting a url because it is not in declared domain
- more XSS attack protection

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4720 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-20 21:36:25 +00:00
danielr
48ffd61e6a changed "patched wrong" to warning, so it goes to the logfile
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4716 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-19 07:54:44 +00:00
orbiter
2f629d20a7 - tried to fix the '4217666-problem'
- removed more unused code

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4715 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-19 04:24:29 +00:00
orbiter
512f48e7d6 - removed unused methods
- fixed xss attack on peer list in CrawlStartSimple

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4714 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-19 03:33:07 +00:00
orbiter
3c76342619 - added servlet to configure the search page greeting line
- added information output about the current network definition in the network servlet
- better description and usage of profile entries in User Profile servlet regarding FOAF format
- reformatting of menus on the status page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4710 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-18 13:58:56 +00:00
danielr
d1ee231866 HTTPC close more unused connections
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4702 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-15 16:37:51 +00:00
danielr
181796cffb - HTTPC: remove ConnectionInfo on exceptions, removed unnecessary code
- FTPC: always close (GET) connections on errors


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4701 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-15 15:27:32 +00:00
orbiter
04c1226c80 added/fixed the missing else-case of the integrity test during deploy when updating from a tar file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4700 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-15 15:20:35 +00:00
orbiter
45ae3da7e7 another patch to prevent NPE in EcoTable
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4698 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-14 05:33:32 +00:00
danielr
96e39b297a reduced stack traces (on connection timeouts)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4696 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-14 03:50:49 +00:00
orbiter
93376acdca fixed a bad chunkcache limit check which could have caused ArrayIndexOutOfBoundsExceptions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4695 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-14 03:49:02 +00:00
orbiter
1cab240198 patch for possible NPE in EcoTable iterator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4694 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-14 03:20:37 +00:00
orbiter
9a32a4c328 fixed concurrentModificationException during hello-process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4693 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-14 03:04:28 +00:00
danielr
64c33e717f caught ConcurrentModificationException in ConnectionInfo.cleanUp so cleanUp is not interrupted
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4692 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-14 03:02:44 +00:00
danielr
d8677ba611 fixed ConcurrentModificationException in HttpConnectionInfos
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4690 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-13 11:25:41 +00:00
orbiter
c7021c14bb patch for ArrayIndexOutOfBoundsException in BMP parser
(may occur in case of malformed BMPs)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4689 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-13 03:28:26 +00:00
orbiter
8dd35f74c8 fixed redirect problem (does not work for POST)
see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1068&hilit=

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4687 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-12 16:35:09 +00:00
orbiter
8313d58ae7 - integrated the collage into the Web Visualization menu
- added a counter for the public and private queue on the page (testing..)
- fixed wrong public/private categorization

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4686 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-12 15:45:57 +00:00
danielr
2617f4dcdb Connections_p.html: better formatting and remove very old entries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4684 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-12 15:19:18 +00:00
orbiter
82bf9ac1c8 - added Collage servlet from datengrab and modified it:
* all images are queued
* private/public is respected
* inserted into switchboard
* added collageQueue class that stores all the queued images

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4683 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-12 13:24:21 +00:00
danielr
959f448e5f - disabled redirects in proxy (so client sees real path)
- added connection stats (only connections currently in use)
- remove "old" connections (closed or idle for some time)
- synchronized shared parts of proxyHandler


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4682 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-12 11:39:48 +00:00
orbiter
8fe39ebd74 -fixed file transmission with POST. The only usage was in ranking transmission, therefore:
-fixed ranking transmission

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4681 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-12 08:12:51 +00:00
orbiter
82a9861779 fix for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4680 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-11 12:55:43 +00:00
orbiter
5d1fbb25e7 fix for bad deploy:
- the name of downloaded release files is adapted if the httpc delivers uncompressed tar.gz files (the .gz is removed from the file name)
- the deploy method can also handle plain tar files (not just tar.gz files)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4679 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-11 12:37:17 +00:00
orbiter
202a3adb3e refactoring of HttpClient Writer processes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4678 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-10 22:47:05 +00:00
danielr
8aa9fd8f24 HTTPC with only 1 retry
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4677 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-10 16:47:57 +00:00
orbiter
444dce7e81 more performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4676 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-10 15:28:58 +00:00
orbiter
2c2dcd12a2 - enhanced performance of Eco-Tables: less time-consuming size() operations
- will increase the speed of indexing and collection.index creation


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4675 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-10 13:24:55 +00:00
orbiter
e356625b22 - refactoring of stream copy handling to support time-consuming operations
- made the usage of BufferedStreams explicit to distinguish the different copy methods in serverFileUtils (byte-by-byte and using its own buffer)
- introduced another timeout setting (java internal property)
- more restrictions to clients accessing a single host (a security setting to prevent DoS by mistake)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4674 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-10 09:53:07 +00:00
danielr
f01c50cf8d Proxy logging error (first step to resolution!?)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4673 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-10 06:56:06 +00:00
orbiter
c3342e1178 - removed class with only one static method
- removed connection method with too long time-out

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4672 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-09 23:35:20 +00:00
orbiter
f97971b63b fixed NPE problems doing a shutdown from command-line
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4671 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-09 22:59:17 +00:00
danielr
7a35126e91 restored the http timeouts from the old httpc
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4670 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-09 11:02:14 +00:00
orbiter
2c1c3bb6eb - some refactoring (sorry Daniel, hab in deinem Code rumgewütet)
- fixed broken downloads (flush was missing)
- different problem handling when download is corrupted
- different default values in yacy.init

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4669 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-08 21:36:33 +00:00
danielr
d96e2badc7 - fixed POST in proxy
- prepared http connection tracking
- refactoring (mainly moving StreamTools to serverFileUtils)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4668 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-08 21:17:40 +00:00
orbiter
14404d31a8 - enhanced performance graph (more info)
- added conditions for rarely used logging lines to prevent unnecessary CPU usage for non-printed info

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4667 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-08 14:44:39 +00:00
orbiter
696b8ee3f5 fix for http://forum.yacy-websuche.de/viewtopic.php?p=6806#p6806
- removed all InputStream.available() calls because this does not work for files > 2GB
- iterators terminate when an IOException occurs
- added handling of non-executing index.add methods to enhance assert usage
- added index for file indexes > 2GB, to be used in new indexHeap

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4666 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-08 11:55:59 +00:00
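InputStream.available() returns an int and, per its Javadoc, only an estimate of the bytes readable without blocking, so it cannot be relied on to size files above 2 GB (Integer.MAX_VALUE bytes). A minimal sketch of the alternative, assuming the goal is to measure a stream's length: read until end of stream and accumulate into a long. The class and method names are hypothetical and are not YaCy's serverFileUtils API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not part of YaCy.
class StreamLengthSketch {
    /**
     * Counts the bytes of a stream by reading it to the end.
     * The result is a long, so lengths above Integer.MAX_VALUE (about 2 GB) fit,
     * unlike the int returned by InputStream.available().
     */
    static long countBytes(InputStream in) {
        byte[] buffer = new byte[8192];
        long total = 0;
        try {
            int n;
            // read() returns -1 at end of stream; available() is never consulted
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

For files on disk, java.io.File.length() also returns a long and avoids reading the data at all.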
danielr
94d3d3a86f fixed Proxy (for GET, POST still does not work!)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4665 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-08 09:34:20 +00:00
danielr
081ed1d3ec HTTPLoader: reduced stackTraces
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4664 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-07 16:56:15 +00:00
danielr
8b2efb6f8c fixed garbage in HTCACHE
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4663 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-07 16:46:45 +00:00
orbiter
225f9fd429 various fixes
- shutdown behavior (killing of client sessions)
- EcoFS reading better
- another synchronization in balancer.size()


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4662 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-07 13:12:58 +00:00
orbiter
6e36c156e8 added more logging to EcoFS
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4661 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-07 09:52:25 +00:00
danielr
fb541f9162 HTTPC: default timeout of half an hour
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4660 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-07 09:48:49 +00:00
danielr
a94f6cdca4 HTTPC: allowed self-signed certs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4659 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-07 09:21:43 +00:00
danielr
ab330cfdca Network.html: removed ; from location
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4658 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-07 08:13:38 +00:00
orbiter
319144f4b2 fix for out-of-bounds exception in EcoFS chunk iterator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4657 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-06 22:28:17 +00:00
orbiter
a9cf6cf2f4 generalization of index container-heap class.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4654 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-06 20:31:16 +00:00
orbiter
f099061944 protection against bad dht-flush word selection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4653 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-06 20:25:05 +00:00
orbiter
5e4fddc1e6 more logging for new EcoFS.ChunkIterator to find bug for
http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1024&hilit=&p=6806#p6806

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4652 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-06 18:47:49 +00:00
orbiter
117ae78001 speed enhancement for reading of eco-table indexes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4647 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-06 11:50:15 +00:00
danielr
7c149a4ee8 - undo less 'binary data found'
- removed duplicate stackTrace


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4643 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-05 17:46:11 +00:00
danielr
96cce8bed9 reduced 'Binary data found' errors
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4642 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-05 14:20:01 +00:00
danielr
2aef1414f5 removed test (in yacy.init)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4641 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-05 13:49:25 +00:00
danielr
5c3c1fdf41 replaced httpc with Apache Jakarta Commons HttpClient (includes some refactoring ;)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4640 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-05 13:17:16 +00:00
orbiter
daa04f5db9 added an additional check in the file handler to prevent URL attacks from being hidden in URL path encodings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4637 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-04 12:15:27 +00:00
orbiter
783a4c9edb major speed enhancements for the index cache dump and restore:
storage and loading are 30 times faster! A cache of 100000 RWIs needed 180 seconds
to store and 100 seconds to restore; now the same cache needs only 6 seconds to store and
3 seconds to restore. The cache size has also decreased by about 37% (95 MB instead of 150 MB).

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4634 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-02 13:18:23 +00:00
orbiter
442204a1c8 fix for concurrentModificationException
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4633 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-01 21:21:37 +00:00
orbiter
d2f4926951 - more logging for balancer to get a hint where the problem is
- fix for new concurrency method in kelondroSplitTable

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4631 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-31 18:45:27 +00:00
orbiter
20dadba426 - added a deadlock prevention function in cache flushing
- removed unused methods in collection index

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4630 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-31 17:51:51 +00:00
orbiter
764a40e37d speed enhancements for crawler and url retrieval (affects also search speed)
- concurrency for LURL-fetching: this can be done using a concurrent lookup into the separated URL databases. Concurrency is possible because there is no IO during lookup. The more LURL-tables are present, the greater the speedup; more CPUs will increase speed
- because a large number of LURL-lookups are made during crawling (for double-checking), the LURL-lookup speed enhancement also improves crawling speed
- search speed also profits from LURL-lookup enhancement
- changed some flushing parameters in word index caching which should make better use of large word index caches and should speed up indexing
- removed flush chunksize parameter, because this was only useful for IO path enhancement feature which was removed some weeks ago to prevent blocking and deadlocks during search requests

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4628 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-31 15:41:19 +00:00
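The concurrent LURL lookup described in this entry can be sketched as one lookup task per table, returning whichever table holds the entry. In this sketch in-memory maps stand in for the separated URL databases, each URL hash is assumed to be stored in at most one table, and the class and method names are hypothetical rather than YaCy's.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch; plain maps stand in for the separated LURL tables.
class ConcurrentLookupSketch {
    /**
     * Looks up a URL hash in several independent tables in parallel and returns
     * the entry found, or null if no table contains the hash.
     * Safe only if the tables are not modified during the lookup.
     */
    static String lookup(List<Map<String, String>> tables, String urlHash) {
        return tables.parallelStream()          // lookups run on the common fork-join pool
                .map(table -> table.get(urlHash))
                .filter(Objects::nonNull)
                .findAny()                      // assumes a hash lives in at most one table
                .orElse(null);
    }
}
```

As the commit notes, this only pays off because a lookup does no IO; with more tables there are more independent tasks, and with more CPUs more of them run at once.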
orbiter
3ce3a4a3a1 added stub for new index container heap data structure (purpose: index folding)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4627 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-30 22:58:42 +00:00
orbiter
2c34038912 addition/correction to last commit: usage of concurrent-classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4626 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-30 21:17:12 +00:00
orbiter
b2150057d2 removed unnecessary cleanup method
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4625 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-30 20:32:08 +00:00
lulabad
c4c0d54b22 * added regex extended blacklistengine
* removed my own engines

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4618 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-30 08:50:09 +00:00
orbiter
368593e449 enhanced the concurrency handling of indexing process (better queue size control, better data concept, better shutdown behavior)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4617 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-30 00:03:44 +00:00
orbiter
be58135b3e possible fix for deadlock in search execution
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4612 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-29 07:50:37 +00:00
orbiter
0241d070bc added concurrency to indexing process:
- the methods {parsing, semantic analysis (condensing), structure analysis (web structure)} in the serialized indexing path have been made concurrent.
- four BlockingQueues handle concurrency and hand-over of the indexing objects; the last object in the queue is stored into a BlockingQueue of maximum size 1 to serialize the process for storage (which uses IO and therefore should not run concurrently here)
- a concurrency of (CPUs + 1) is the default. Single-CPU users will also profit from the change because large files can no longer block the indexing process.
- removed the secondary indexing thread, which is superfluous now. Concurrency is default for all users.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4609 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-28 11:56:28 +00:00
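The hand-over in this entry can be illustrated with a simplified sketch: a single concurrent stage with (CPUs + 1) workers instead of the three stages named in the commit, feeding one storage thread through a BlockingQueue of capacity 1 so that the IO-bound storage step stays serial. The class name, the sentinel value and the toUpperCase() stand-in for parsing are hypothetical; this is not YaCy's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical two-stage sketch, not YaCy's actual indexing code.
class IndexingPipelineSketch {
    // Stop signal; assumed never to occur as a real document.
    private static final String END = "\u0000END";

    static List<String> run(List<String> documents) {
        int workers = Runtime.getRuntime().availableProcessors() + 1; // CPUs + 1
        BlockingQueue<String> toParse = new ArrayBlockingQueue<>(100);
        // Capacity 1: parsed documents are handed to storage one at a time.
        BlockingQueue<String> toStore = new ArrayBlockingQueue<>(1);
        List<String> stored = new ArrayList<>(); // written only by the storage thread

        // Single storage thread: the IO-bound step is not parallelized.
        Thread storage = new Thread(() -> {
            try {
                String item;
                while (!(item = toStore.take()).equals(END)) {
                    stored.add(item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        storage.start();

        ExecutorService parsers = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            parsers.execute(() -> {
                try {
                    String doc;
                    while (!(doc = toParse.take()).equals(END)) {
                        // stand-in for parsing/condensing
                        toStore.put(doc.toUpperCase(Locale.ROOT));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        try {
            for (String doc : documents) {
                toParse.put(doc);
            }
            for (int i = 0; i < workers; i++) {
                toParse.put(END); // one stop signal per parser
            }
            parsers.shutdown();
            if (!parsers.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("parsers did not finish");
            }
            toStore.put(END); // all parsers are done, so storage can stop
            storage.join();   // join() also makes 'stored' safely visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return stored;
    }
}
```

Because the workers run concurrently, the order of the stored documents is not guaranteed; only the storage step is serialized.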
lulabad
9fb5d661f2 added my Blacklistengines
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4608 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-28 08:18:21 +00:00
orbiter
bca87f1e38 - refactoring of serverThreads: renaming to distinguish busy-threads and blocking-threads
- added blockingThreads which are threads that are not driven by pause times but by BlockingQueue lookup

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4606 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-27 12:03:16 +00:00
orbiter
968c775025 - preparation of parsing/indexing queue for concurrent execution
- remote crawl receipts are now transmitted concurrently in separate threads (makes remote crawls much faster!)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4605 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-26 22:43:38 +00:00
orbiter
9b0e20fb06 next refactoring step in document indexing to prepare concurrency environment for document parsing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4604 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-26 19:51:05 +00:00
orbiter
7f9f639d20 - refactoring and abstraction of index reference (urls) handling: blacklisting is part of reference filtering
- refactoring of word/phrase handling: word abstraction from condenser becomes part of index element handling
- removed unused code parts from condenser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4603 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-26 15:37:49 +00:00
orbiter
d6050b9ffb - separated the LURL data storage and Crawl result stack for process supervision.
this is another step to enable multiple, concurrent fulltext-indexes
- another try to make the yacy-httpc more stable

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4602 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-26 14:13:05 +00:00
orbiter
8d6a13bc07 refactoring of parsing-condensing-indexing process:
- separated parts
- removed storagePeer function
next step will be parallelization of processes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4600 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-24 22:51:26 +00:00
orbiter
d3b06913ec protection against seed-db failure during enumeration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4598 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-23 23:47:41 +00:00
orbiter
5aa96dbc36 fix for shutdown configuration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4596 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-23 13:14:57 +00:00
orbiter
93633abed8 - removed some debugging code from search process - should speed up now
- added some profiling code to search event - more time details in PerformanceSearch_p.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4594 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-23 00:55:04 +00:00
orbiter
fba46c51d7 fixed non-termination bug in qsort
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4593 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-22 23:15:28 +00:00
orbiter
541b817502 refactoring of switchboard queueing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4591 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-22 01:28:37 +00:00
orbiter
fc94fbe224 another improvement to the collection sorting
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4589 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-20 23:11:04 +00:00
orbiter
11270d450e better quicksort-pivot computation: 30% faster (measured with test program)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4588 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-20 22:01:12 +00:00
orbiter
3e44293f07 - fixed a problem with thread pools in row collection
- added a line-viewing feature in threaddump	

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4587 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-20 14:21:58 +00:00
danielr
e43051b125 - fixed Threaddump output (HTML-escaped, e.g. <init>)
- in EcoFS converted comments to javadoc


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4586 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-20 10:20:55 +00:00
orbiter
433ff855f7 - fixed another concurrency problem in collection sorting
- fixed a typing problem that was introduced in svn 4579 and caused the crawl monitor to fail

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4585 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-19 23:47:24 +00:00
orbiter
19286fa2d1 tried to fix seed2.old.db-problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4584 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-19 22:35:19 +00:00
orbiter
f3996e63b8 tried to fix more deadlocks:
- changed connection modes in ftpc
- replaced the sort thread pool in row collections by a new one using util.concurrent. The old pool had caused blocking

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4582 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-19 11:23:43 +00:00
danielr
7008a218b3 avoid ConcurrentModificationException in plasmaCrawlerQueues
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4579 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-17 13:51:56 +00:00
orbiter
7150b463ff changed handling of default values and database paths:
- the default files yacy.init and the network definition file are now moved to the path defaults
- the httpProxy.conf is renamed to yacy.conf
- the DATA/INDEX/PUBLIC is renamed to the actual network nickname, which should be freeworld or sciencenet
more menu entries
- added apfelmaennchens alternative search page to the menu
- added the new thread dump page to the server log menu point as submenu
modifications
- modified the thread dump page: sorting by thread type

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4575 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-16 22:31:54 +00:00
orbiter
7fd094fcbe small bug in ftpc: did not compile in Java 1.5
Please set the compiler to Java 1.5 compliance

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4570 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-16 13:41:49 +00:00
danielr
f51bad8ae5 FTP:
- report connection status (to break if no connection possible)
- fixed isFolder()
- additional error output
- fixed paths with encoded symbols (e.g. a%20file.txt)
- refactoring


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4567 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-15 21:57:55 +00:00
danielr
820641938e ftpc: fixed date parsing, some refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4566 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-15 10:56:47 +00:00
orbiter
4c584dff87 disabled soLinger to prevent that too many connections stay open (it's a TEST!)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4565 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-15 10:46:55 +00:00
orbiter
9c989fe5f7 fixed deadlock
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4562 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-15 00:49:16 +00:00
danielr
c565906050 FTP:
- added maxFileSize-check
- added timeout for download
- fixed dirlist (when all filenames have spaces, change to absolute links)
- enhanced isFolder()
- make sure the data connection is closed, so a new one can be opened
- refactoring


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4561 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-14 16:28:27 +00:00
danielr
1a7870df0d FTP: source cleanup (added finals, indentation for easier diffs)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4559 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-14 12:35:53 +00:00
orbiter
fa1090113d - next try to fix the networking problem:
set the maximum transfer size to less than MTU=1500-52: buffer size <= 1448
- some refactoring of transfer methods (naming)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4558 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-14 00:16:04 +00:00
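The buffer limit in this entry is simple arithmetic: 1500 - 52 = 1448 bytes. The 52 bytes presumably correspond to a 20-byte IPv4 header, a 20-byte TCP header and a 12-byte TCP timestamp option, which is why 1448 is a common TCP payload size on Ethernet; the commit itself only states the numbers. A minimal sketch of a copy loop that never writes more than that per call, with hypothetical class and method names (not YaCy's serverFileUtils):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not YaCy's serverFileUtils.
class ChunkedCopySketch {
    // Ethernet MTU (1500) minus 52 bytes of IP/TCP headers, as in the commit message.
    static final int MAX_CHUNK = 1500 - 52; // 1448

    /** Copies in to out in writes of at most MAX_CHUNK bytes; returns the bytes copied. */
    static long copy(InputStream in, OutputStream out) {
        byte[] buffer = new byte[MAX_CHUNK];
        long total = 0;
        try {
            int n;
            while ((n = in.read(buffer, 0, buffer.length)) != -1) {
                out.write(buffer, 0, n); // each write is at most MAX_CHUNK bytes
                total += n;
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

This only caps the size of application-level writes; the operating system's TCP stack still decides how data is split into segments.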
orbiter
d87d295c68 one more try to fix the connection problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4556 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-12 13:13:11 +00:00
orbiter
a3dadcd89b prevent peers that return a false search result from being disconnected
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4555 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-12 00:56:18 +00:00
orbiter
ba622bb240 addendum to svn 4553
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4554 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-12 00:24:20 +00:00
orbiter
5530b8e1ca reverted changes to yacy protocol classes: they caused the sciencenet to lose connections
a comparison with the main release 0.57 had been made: this showed a stable network
This is an emergency operation to ensure availability of the sciencenet network.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4553 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-12 00:05:18 +00:00
orbiter
b664a53553 fix for NPE during search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4552 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-11 15:30:26 +00:00
orbiter
b4ed937f1e - modified zone navigation (still does not work correctly)
- added dht switch in network definition
- 0.574

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4550 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-11 11:09:38 +00:00
orbiter
8d0470a5c6 new method to compute search history IDs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4549 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-10 23:40:56 +00:00
orbiter
65785da8f2 new method for best hash computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4548 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-10 23:28:05 +00:00
orbiter
9eddc1506b - one try to fix the httpd problem
- fix for handling of collection index that appears when removing elements
- added another navigation method (stub, not working yet)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4543 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-09 23:58:22 +00:00
orbiter
7cc4ff05c9 some code enhancements and bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4542 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-09 23:48:24 +00:00
danielr
6788f8f7c1 fixed error 'FTPC cannot change directory'
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4531 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-06 11:59:23 +00:00
orbiter
7ce76c8ff8 added missing file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4530 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-05 22:57:53 +00:00
orbiter
bfed9c2da6 - some refactoring in search process
- separated sidebars in new search interface and placed them in their own files
  which can be put into the search page like plug-ins

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4529 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-05 21:46:55 +00:00
borg-0300
3445b1e10b *better logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4526 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-05 13:41:54 +00:00
borg-0300
4b0339fec0 *fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=927
*removed some casts
*Properties added

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4525 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-05 13:29:42 +00:00
orbiter
275a226cc5 refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4524 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-04 22:45:45 +00:00
apfelmaennchen
bc3d3b4c97 fixed rebuildTags() to correctly rebuild folders...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4523 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-03 22:36:27 +00:00
danielr
fbe335db73 consistent use of de.anomic.server.serverMemory to get information about memory statistics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4522 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-02 15:42:50 +00:00
orbiter
8c06436c4a remove the error-db at each start-up.
This is necessary because the table uses a lot of RAM and the content is never re-used after Start-Up.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4520 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-01 09:44:33 +00:00
orbiter
4fdf695064 - fixed a bug in remote search that prevented any results from being generated (!)
- added a great number of printStackTrace and new exceptions that shall be used to find the cause
  for a bug in yacy client-server communication which causes the interruption of data transfer
  which then causes the parser bug for the seed strings.
- tried to fix the communication bug on server-side (copy functions)
Be aware that the log may be full of errors and bugs - there should not be more bugs but there is more to see


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4519 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-27 23:12:43 +00:00
borg-0300
0ddbed9451 Less memory consumption at start
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4518 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-27 20:09:22 +00:00
orbiter
1dce2f1079 more multithreading support:
- replaced some synchronized classes with classes from util.concurrent
- used a util.concurrent.SynchronousQueue to implement a persistent sorting thread in
  the very basic kelondroRowCollection, which supports sorting with a second thread
  when a dual-core CPU is used

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4517 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-27 15:16:47 +00:00
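A minimal sketch of the two-thread sorting idea from the commit above, assuming a plain int[] instead of kelondroRowCollection rows: a persistent helper thread takes jobs from a java.util.concurrent.SynchronousQueue and sorts one half while the caller sorts the other, then the halves are merged. The names TwoThreadSorter, Job, sort and shutdown are hypothetical and not YaCy's API.

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;

/** Hypothetical sketch, not kelondroRowCollection: sorts one half on a persistent helper thread. */
public class TwoThreadSorter {
    // One unit of work for the helper thread: sort a[from, to), then signal done.
    private static final class Job {
        final int[] a; final int from; final int to; final CountDownLatch done;
        Job(int[] a, int from, int to, CountDownLatch done) {
            this.a = a; this.from = from; this.to = to; this.done = done;
        }
    }

    // A SynchronousQueue has no capacity: put() blocks until the helper take()s the job.
    private final SynchronousQueue<Job> queue = new SynchronousQueue<>();
    private final Thread helper;

    public TwoThreadSorter() {
        helper = new Thread(() -> {
            try {
                while (true) {
                    Job job = queue.take();
                    Arrays.sort(job.a, job.from, job.to);
                    job.done.countDown();
                }
            } catch (InterruptedException e) {
                // interrupted by shutdown(): let the helper thread end
            }
        });
        helper.setDaemon(true); // do not keep the JVM alive
        helper.start();
    }

    public void sort(int[] a) throws InterruptedException {
        int mid = a.length / 2;
        CountDownLatch done = new CountDownLatch(1);
        queue.put(new Job(a, 0, mid, done)); // helper sorts the lower half
        Arrays.sort(a, mid, a.length);       // caller sorts the upper half meanwhile
        done.await();                        // wait until the helper's half is sorted

        // Merge: copy the lower half out, then merge back into a from index 0.
        // The write index k never overtakes the read index j, so no unread element is overwritten.
        int[] left = Arrays.copyOfRange(a, 0, mid);
        int i = 0, j = mid, k = 0;
        while (i < left.length && j < a.length) {
            a[k++] = (left[i] <= a[j]) ? left[i++] : a[j++];
        }
        while (i < left.length) {
            a[k++] = left[i++];
        }
        // any remaining a[j..] elements are already in their final place
    }

    public void shutdown() {
        helper.interrupt();
    }
}
```

The CountDownLatch gives the caller a happens-before edge on the helper's writes, so the merged result is safe to read without further synchronization. On a single-core machine the same code still works; it just gains nothing from the second thread.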
orbiter
6779b455d7 another fix for the punycode parser/generator (should work now!)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4516 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-26 23:00:20 +00:00
orbiter
1b127406d0 update to punycode encoding (still not working)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4515 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-26 22:37:23 +00:00
orbiter
83860507c9 - added punycode class from gnu idn library
- added parser for international domains in yacyURL

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4514 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-26 22:18:40 +00:00
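The commit imported a punycode class from the GNU IDN library. As a hedged illustration of the same host-name conversion, the sketch below uses the JDK's own java.net.IDN instead (a swapped-in standard-library API, not the class YaCy added); IdnHostDemo and its methods are hypothetical names.

```java
import java.net.IDN;

/** Illustration with the JDK's java.net.IDN, not the GNU IDN punycode class from the commit. */
public class IdnHostDemo {
    /** Converts an internationalized host name to its ASCII ("xn--" punycode) form. */
    public static String toAsciiHost(String host) {
        return IDN.toASCII(host);
    }

    /** Converts an ASCII host name back to Unicode, e.g. for display. */
    public static String toUnicodeHost(String host) {
        return IDN.toUnicode(host);
    }

    public static void main(String[] args) {
        String ascii = toAsciiHost("b\u00fccher.example"); // "bücher.example"
        System.out.println(ascii);                         // xn--bcher-kva.example
        System.out.println(toUnicodeHost(ascii));          // bücher.example
    }
}
```

Only labels containing non-ASCII characters are converted; pure ASCII labels such as "example" pass through unchanged.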
orbiter
253a453413 removed possible synchronization deadlock
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4511 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-26 14:05:43 +00:00
orbiter
3f321ece7d added a search history to the new search page.
The history distinguishes between different users and identifies them by their IP address;
a history is only shown to the user who submitted the search.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4510 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-25 21:26:49 +00:00
orbiter
c48e25d784 - fixed selection box for topwords
- fixed parser detail in condenser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4509 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-25 19:00:11 +00:00
orbiter
87a8747ce3 - enhanced recognition, parsing, management and double-occurrence-handling of image tags
- enhanced text parser (condenser): found and eliminated bad code parts, increasing speed
- added handling of image preview using the image cache from HTCACHE
- some other minor changes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4507 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-25 14:08:15 +00:00
low012
652086159a *) Replaced System.err.println() calls with a logging function. Left the System.err.println() calls as comments so the change can be reverted quickly, since gzip is an application with its own main method and Orbiter may want to keep it that way.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4505 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-24 19:56:19 +00:00
orbiter
677ee2ea04 added remove operation to collection index (re-activation)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4503 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-23 00:14:11 +00:00
orbiter
d477483373 stronger criteria to use RAM copy to use table copy
(should use less RAM)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4502 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-22 23:46:27 +00:00
orbiter
a7abee3578 - fixed some data types in new search stack
- added image domain presentation to image preview
- added new search page to menu
- added automatic re-search when an old search profile is requested and a crawl is ongoing,
  to fetch newly crawled entries

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4501 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-21 23:40:38 +00:00
orbiter
81687b6bd5 added missing hashCode computation for the previous feature;
this also solves the missing image double-check feature!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4500 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-21 15:37:46 +00:00
orbiter
bedd8dfbe2 - added image sorting by image size. This is now the default.
  This is performed using a 3-stage sorting process:
    - sort by relevance, then do snippet-fetch
    - sort snippets by relevance, then do image link extraction
    - sort image links by image size; unknown sizes are handled like small sizes
- only exactly as many images as requested are shown

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4499 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-21 14:53:51 +00:00
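The last stage of the 3-stage process above can be sketched as follows, assuming image links arrive already in relevance order and carry a width and height (-1 when unknown). ImageSizeRanking, ImageLink and topBySize are hypothetical names, not YaCy classes; unknown sizes get area 0 so they rank like the smallest images.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of the final stage: order image links by size, keep exactly n. */
public class ImageSizeRanking {
    public static final class ImageLink {
        final String url;
        final int width;  // -1 if unknown
        final int height; // -1 if unknown

        public ImageLink(String url, int width, int height) {
            this.url = url; this.width = width; this.height = height;
        }

        // Unknown sizes count as area 0, i.e. they are handled like small images.
        long area() {
            return (width < 0 || height < 0) ? 0L : (long) width * height;
        }
    }

    /** Sorts by descending area and returns at most n links. */
    public static List<ImageLink> topBySize(List<ImageLink> relevanceOrdered, int n) {
        List<ImageLink> sorted = new ArrayList<>(relevanceOrdered);
        // List.sort is stable, so images of equal area keep their relevance order.
        sorted.sort(Comparator.comparingLong(ImageLink::area).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }
}
```

Treating unknown sizes as area 0 (rather than dropping them) keeps such images available whenever fewer sized images than requested exist.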
orbiter
727feb4358 - fixed some bugs in ranking computation
- introduced generalized method to organize ranked results (2 new classes)
- added a post-ranking after snippet-fetch (before: only listed) using the new ranking data structures
- fixed some missing data fields in RWI ranking attributes and corrected the hand-over between data structures

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4498 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-21 10:06:57 +00:00
orbiter
f4c73d8c68 - fixed highslide usage
- some enhancements to index management, better types

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4497 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-19 14:13:35 +00:00
orbiter
2327451653 - changed order of database initialisation (index first)
- removed the mostly unused init-time for databases (it was only used for tree tables, which are no longer used)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4496 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-19 09:14:07 +00:00
orbiter
3441ec3928 - some small changes to highslide integration to get it working... (does not work yet)
- performance enhancement for url list parser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4495 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-18 23:49:03 +00:00
orbiter
6c3cd2b4f2 - added a new way to view images from the image search:
  they appear in a separate, floating window above the search results,
  not in a new window
- added highslide javascript library for the feature mentioned above
- removed the dir servlet. It was not used as intended (as an example applet)
  and was a major problem for intranet indexing when files are hosted on the same peer.
- added yacy-httpd-internal directory listing. Because YaCy is a search engine,
  directory listings are similar to search result listings. Intranet indexing from the same peer
  will get nice index pages for document collections.
- removed unused test applet

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4494 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-18 16:38:06 +00:00
orbiter
61a81820e3 - refactoring of search tracker
- added link to search history to repeat the search

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4493 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-17 23:35:48 +00:00
lulabad
9ecc17baef fixed duplicate Blog entries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4492 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-17 13:03:40 +00:00
orbiter
36b898ca7a - successfully tested z-presentation of yacy seed encoding
- added an alternative switch that uses the shortest representation as the yacy seed string encoding

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4491 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-17 12:36:43 +00:00
orbiter
066c88140f quickfix for OOM, see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=875&hilit=&p=5686#p5686
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4488 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-16 12:16:53 +00:00
orbiter
4079c38ce0 - probably slightly better default ranking
- added experimental right column to new search page (no function, only container)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4487 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-16 12:13:38 +00:00
orbiter
8fd5e52f04 added basket icons and experimental gif animation class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4485 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-16 10:40:00 +00:00
lulabad
94e256e13b * removed single Blogview, now links directly to BlogComments.html
* some other small changes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4483 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-16 09:32:29 +00:00