Commit Graph

388 Commits

Author SHA1 Message Date
orbiter
e8228fba09 less locking in time format computation, caching and during secondary (remote) search evaluation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7106 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-05 11:13:12 +00:00
orbiter
9c0c94683c because of a bug in search result caching count search results had not been generated as fast as possible.
with this fix search results are (even) faster.
Also enhanced: image search. This is now sped up using an image search result look-ahead.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7105 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-04 22:57:12 +00:00
orbiter
b3f0d06444 fixed a problem with restarts in YaCy mac applications: the DATA directory path was not submitted when doing a restart. This solves the problem by:
- storing the startup properties when yacy is started
- using the properties again in the restart script. This also transports the DATA directory location as a parameter of the -gui option that is used when the Mac version of YaCy is started

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7102 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-03 23:08:43 +00:00
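The restart fix in b3f0d06444 comes down to persisting the startup arguments and replaying them on restart. A minimal Java sketch of that round trip, assuming a hypothetical properties file and a hypothetical `dataPath` key (neither name is taken from YaCy):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical helper: persist startup parameters so that a restart can reuse them. */
final class StartupProperties {
    // Hypothetical key name; YaCy's real property names may differ.
    static final String DATA_PATH_KEY = "dataPath";

    /** Called once at startup: remember where the DATA directory lives. */
    static void store(Path file, Path dataPath) throws IOException {
        Properties p = new Properties();
        p.setProperty(DATA_PATH_KEY, dataPath.toString());
        try (Writer w = Files.newBufferedWriter(file)) {
            p.store(w, "startup properties");
        }
    }

    /** Called by the restart logic: rebuild the -gui arguments from the stored path. */
    static String[] restartArgs(Path file) throws IOException {
        Properties p = new Properties();
        try (Reader r = Files.newBufferedReader(file)) {
            p.load(r);
        }
        String data = p.getProperty(DATA_PATH_KEY);
        // without a stored path, fall back to a plain -gui start
        return data == null ? new String[] {"-gui"} : new String[] {"-gui", data};
    }
}
```

`Properties` escapes and unescapes special characters such as Windows backslashes, so the stored path survives the round trip unchanged.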
sixcooler
ca0a03e9ea ... migrating to HttpComponents-Client-4.x ...
ssl-stuff: accept almost everything

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7097 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-03 16:02:52 +00:00
orbiter
3988a95fb5 added ability in rss reader to parse atom feeds
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7094 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-03 08:53:24 +00:00
orbiter
9d080f387e change in handling of the all-visible home path for storage in YaCy:
the home path can now be distinguished between
- data home; the path where the DATA directory is created
- application home; everything else
This will make it possible to store application data on Mac releases within the
~/Library/YaCy
directory; a place where Mac applications write their data.
Similar techniques will be possible for debian and windows.
To use the new data path, YaCy can be started with
-start <data path>
or
-gui <data path>


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7092 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-02 19:24:22 +00:00
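The split between data home and application home in 9d080f387e can be pictured as two separate paths, where the data path is taken from the argument after `-start` or `-gui`. A minimal sketch; the class and method names are hypothetical, and falling back to the application home when no path is given is an assumption that mirrors the previous single-home behaviour:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch: separate the application home from the data home. */
final class HomePaths {
    final Path appHome;   // where the application files live
    final Path dataHome;  // where the DATA directory is created

    private HomePaths(Path appHome, Path dataHome) {
        this.appHome = appHome;
        this.dataHome = dataHome;
    }

    /**
     * Reads an optional data path following -start or -gui; without one,
     * the data home falls back to the application home.
     */
    static HomePaths fromArgs(Path appHome, String[] args) {
        Path dataHome = appHome;
        for (int i = 0; i + 1 < args.length; i++) {
            if (args[i].equals("-start") || args[i].equals("-gui")) {
                dataHome = Paths.get(args[i + 1]);
                break;
            }
        }
        return new HomePaths(appHome, dataHome);
    }

    /** The DATA directory always lives below the data home. */
    Path dataDir() {
        return dataHome.resolve("DATA");
    }
}
```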
orbiter
65eaf30f77 redesign of crawl profiles data structure. target will be:
- permanent storage of auto-dom statistics in profile
- storage of profiles in WorkTable data structure
not finished yet. No functional change yet.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7088 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-31 15:47:47 +00:00
f1ori
938676265f fix shutdown command, close HttpClient connection pool
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7085 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-30 17:48:20 +00:00
orbiter
4f22e2df41 bugfixes for
- next-execution-time in scheduler
- deletion of scheduled rss feed loading (now also deletes the scheduling entry)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7075 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-26 16:42:00 +00:00
orbiter
42414a6ae3 added two more tables in rss reader interface:
- freshly recorded rss feeds (not yet loaded or in scheduler)
- rss feeds in scheduler
The first list has a button that can be used to place rss feeds into the scheduler
The second list has a button to delete rss feeds from the scheduler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7074 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-26 16:01:45 +00:00
orbiter
0010cd9db1 Support for indexing of RSS feeds!
- added scanning for rss feeds to the html parser
- storage of rss feed addresses, can be viewed with http://localhost:8080/Tables_p.html?table=rss
- rss items retrieved by http://localhost:8080/Load_RSS_p.html (in Index Creation menu) can be selected and indexed
- an rss feed retrieved in http://localhost:8080/Load_RSS_p.html can now be fully indexed
- indexing of rss feeds can be placed in scheduler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7073 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-25 18:24:54 +00:00
orbiter
0f276dd63f - MapHeap now implements Map<byte[], Map<String, String>>
- refactoring of method names to comply with Map method names

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7072 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-24 12:36:56 +00:00
orbiter
cf07b34c2d implemented the Map interface in the ARC classes so it will be possible to instantiate ARCs as
Map<byte[], Map<String, byte[]>>
Because such Maps with byte[] keys cannot be stored in hash maps (bad hashing on byte[])
another ARC with comparable Maps has been added

This will make it possible to move the HTCache class 'Cache' into the cora package because that
class may be used either with RAM caches (ARCs) or with file-based caches (BEncodedHeaps)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7071 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 23:38:03 +00:00
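The "bad hashing on byte[]" mentioned in cf07b34c2d is easy to reproduce: Java arrays inherit identity-based `equals()` and `hashCode()` from `Object`, so a hash map never finds a key passed in as a fresh array with the same content, while a sorted map with a content-based comparator does. A minimal illustration; the unsigned lexicographic comparator is a stand-in chosen for this sketch, not YaCy's own key order:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Demonstrates why byte[] keys need a comparator-based map instead of a hash map. */
final class ByteKeyMaps {
    /** Lexicographic comparison of unsigned bytes (illustrative, not YaCy's key order). */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    static Map<byte[], String> hashMap() {
        // byte[] uses identity-based equals()/hashCode() inherited from Object
        return new HashMap<byte[], String>();
    }

    static Map<byte[], String> sortedMap() {
        // key equality is decided by the comparator, i.e. by array content
        return new TreeMap<byte[], String>(ByteKeyMaps::compare);
    }

    public static void main(String[] args) {
        Map<byte[], String> h = hashMap();
        Map<byte[], String> t = sortedMap();
        h.put("key".getBytes(), "v");
        t.put("key".getBytes(), "v");
        // a fresh array with the same content is a different key for the HashMap
        System.out.println(h.get("key".getBytes()));  // null
        System.out.println(t.get("key".getBytes()));  // v
    }
}
```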
orbiter
c60d0282fd more abstraction for tables stored in heaps:
the BEncodedHeap now implements Map<byte[], Map<String, byte[]>>
This will make it possible to add different database storage types that implement the same Map<byte[], Map<String, byte[]>> interface.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7070 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 21:27:58 +00:00
orbiter
d1be64d491 removed wrong assert
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7069 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 21:02:28 +00:00
orbiter
3197ca42ed preparations to move the HTCache into cora:
- move the header framework classes to cora
- move the ARC caching classes to cora
- refactoring of code to call these classes from cora

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7068 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 12:32:02 +00:00
orbiter
844f158686 - removed dependencies in header framework:
  moved http date methods from DateFormatter to HeaderFramework
  changed logging to log4j
- added ftp load access to MultiProtocolURI
- ensured termination of RSS feed iteration

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7067 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 11:41:12 +00:00
orbiter
80ba543d4c svn fix for uppercase problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7066 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 01:16:17 +00:00
orbiter
5e7081cd19 refactoring towards a unified loading mechanism for MultiProtocolURIs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7065 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 01:08:56 +00:00
orbiter
caece04f26 removed System.err and System.out usage from FTPClient; changed logging to log4j (preferred in yacy.cora)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7064 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-22 22:51:31 +00:00
orbiter
90531f78ff refactoring of the cora package to get subpackages for http and ftp (smb to come)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7063 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-22 22:32:39 +00:00
sixcooler
661867923a ... migrating to HttpComponents-Client-4.x ...
The Client is dead, long live the Client!
(no references to the old client)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7060 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-22 17:38:27 +00:00
orbiter
7aa860c505 - more logging
- more stability for database heap in case of buffer failure

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7058 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-21 10:16:05 +00:00
orbiter
4d5446d641 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7057 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-21 00:08:36 +00:00
orbiter
66ac3a7d9d corrected database row iteration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7055 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-20 23:33:56 +00:00
orbiter
dfd416e3fb removed a mysterious image buffer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7054 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-20 23:13:59 +00:00
orbiter
e10cd115a9 - added a new RSS reader interface. This is not finished but you can now load and look at RSS feeds. It will be used to index RSS feeds in a way that is appropriate for such kind of data.
- refactoring of Mediawiki and PHPBB3 loader interface names (just renamed)
- removed two old, unused RSS loader interfaces
- fixed a bug in RSS parser library of cora
- added a new RSS parser component to the set of yacy document parsers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7053 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-20 11:30:02 +00:00
orbiter
933dc1a600 removed old rss parser (will be replaced with parser from cora package)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7052 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-20 07:42:38 +00:00
orbiter
70dd26ec95 added the new crawl scheduling function to the crawl start menu:
- the scheduler extends the option for re-crawl timing. Many people misunderstood the re-crawl timing feature because it was just a criterion for the url double-check and not a scheduler. Now the scheduler setting is combined with the re-crawl setting, and people have the choice between no re-crawl, re-crawl as was possible so far, and a scheduled re-crawl. The 'classic' re-crawl time is set automatically when the scheduling function is selected
- removed the bookmark-based scheduler. This scheduler was not able to transport all attributes of a crawl start and therefore did not support special crawl starts, e.g. for forums and wikis
- since the old scheduler was not able to crawl special forums and wikis, the must-not-match filter was statically fixed to all bad pages for these special use cases. Since the new scheduler can handle these filters, it is possible to remove the default settings for the filters
- removed the busy thread that was used to trigger the bookmark-based scheduler
- removed the crontab for the bookmark-based scheduler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7051 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-19 23:52:38 +00:00
orbiter
5a994c9796 added a scheduler based on API actions
- every process that is monitored with the API Steering interface can now be scheduled!
- added input methods in Steering interface to set a scheduling time
- added a view on the steering api that shows only crawl jobs inside the Crawl Profile servlet
- added a scheduling call process in the cleanup process handler that triggers the scheduled processes
As a result, the cleanup process now also looks for scheduled processes. Such processes are therefore not executed exactly at
their target execution time but within the cleanup process time window.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7050 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-19 12:13:54 +00:00
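The consequence described in 5a994c9796, that scheduled calls run within the cleanup window rather than exactly on time, can be sketched as a check performed by the periodic cleanup job. All names here are hypothetical and only illustrate the idea; advancing the next execution time past "now" is an assumption made so that a late cleanup run does not fire a call several times:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: scheduled API calls are checked whenever the cleanup job runs. */
final class CleanupScheduler {
    static final class ApiCall {
        final String id;           // identifies the recorded API call to replay
        final long periodMillis;   // repeat interval
        long nextExecution;        // epoch millis of the target execution time

        ApiCall(String id, long periodMillis, long nextExecution) {
            if (periodMillis <= 0) throw new IllegalArgumentException("period must be positive");
            this.id = id;
            this.periodMillis = periodMillis;
            this.nextExecution = nextExecution;
        }
    }

    private final List<ApiCall> calls = new ArrayList<>();

    void schedule(ApiCall call) {
        calls.add(call);
    }

    /**
     * Invoked from the periodic cleanup job. Calls whose target time has passed
     * are returned for execution, so they run within the cleanup time window.
     */
    List<String> dueCalls(long now) {
        List<String> due = new ArrayList<>();
        for (ApiCall c : calls) {
            if (c.nextExecution <= now) {
                due.add(c.id);
                // advance past 'now' so a late cleanup does not fire the call repeatedly
                while (c.nextExecution <= now) c.nextExecution += c.periodMillis;
            }
        }
        return due;
    }
}
```

The trade-off is precision for simplicity: no extra timer thread is needed, and the scheduling accuracy is bounded by the cleanup interval.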
orbiter
189a986ebd - modified api-call interface to record api calls with references to api-call database (carries pk)
- added recording date, last execution date and next execution date for a scheduler (scheduler to be implemented next)
- extended database access methods for more data formats, especially for date insert/retrieval
- extended 'Steering' interface to show new database fields
- migrated Steering to new http client
- extended cora http client to transmit authentication and also added some convenience methods (http response code)
- simplified database back-end (fewer specialized methods for multiple properties)
- extended date formatter to produce a special format to show dates in html (&nbsp; in spaces of date format)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7049 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-18 15:56:38 +00:00
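The special html date format from 189a986ebd replaces the spaces of a formatted date with `&nbsp;`, so that a browser does not wrap the date inside a table cell. A minimal sketch; the class name, the date pattern, and the choice of UTC are assumptions for illustration:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical sketch: render a date so that browsers never break it across lines. */
final class HtmlDate {
    static String format(Date date, String pattern) {
        // SimpleDateFormat is not thread-safe, so a new instance is created per call
        SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.US);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        // replace every space with a non-breaking space entity
        return f.format(date).replace(" ", "&nbsp;");
    }
}
```

For example, `HtmlDate.format(new Date(0L), "yyyy/MM/dd HH:mm:ss")` yields `1970/01/01&nbsp;00:00:00`.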
orbiter
054c22e2c6 added TLDs from http://www.opennicproject.org
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7047 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-18 10:39:49 +00:00
orbiter
86d7f8a989 - the web visualization can now be generated in custom color
- added input fields in WatchWebStructure_p.html
- introduced enum classes for Draw Mode and Filter Mode

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7044 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-17 10:44:00 +00:00
orbiter
7fdb17bb96 redirect uncaught exceptions to logging + small other changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7042 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-16 12:33:06 +00:00
orbiter
a82a93f2fc - better url double check in crawler
- more logging for error urls

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7032 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-11 09:54:18 +00:00
sixcooler
a6ed6e8cb9 ... migrating to HttpComponents-Client-4.x ...
make the occurrence of multiple header-keys possible

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7031 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-10 21:22:30 +00:00
orbiter
171f2bd84e - removed unused network oanet
- added new network definition 'allip' which can be used in networks where intranet and internet-addresses shall be indexed
- added an automatic switch-off for global search if there are no global peers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7030 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-09 23:41:17 +00:00
sixcooler
1802c54317 LGPL-Header
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7029 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-09 14:38:49 +00:00
orbiter
a835a22b32 fixed isLocal() property (better recognition of intranet hosts)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7028 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-09 11:22:56 +00:00
orbiter
670c746dc5 dual-licensed HttpConnectionInfo for LGPL
original GPL license holder granted dual-licensing by email

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7024 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-07 23:03:10 +00:00
orbiter
301a59e07f moved browser access method from kelondro/util/OS to gui/framework/Browser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7022 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-05 10:49:58 +00:00
orbiter
ec72387165 added a very early test version of a YaCy gui component.
The gui currently does nothing else than providing a search window that sends the search string to the browser
The gui is started when YaCy is started with the option -g or --gui, like
./startYACY.sh -g
The gui will primarily be used to provide a 'real' macintosh version that can be started and operated like any other macintosh application. A special mac application wrapper will follow.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7021 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-05 10:43:03 +00:00
sixcooler
d88b9606d1 fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2923
+ some client fine tune

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7020 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-04 16:58:33 +00:00
orbiter
6388a58fc7 better memory management and slightly less (in total and temporary) RAM allocation:
- ensured that database objects that are not supposed to grow do not use an index memory management that is designed for growth
- changed the index sorting method so that it allocates fewer objects during quicksort
- renamed database classes (shorter names that reflect which objects are held in RAM)
- added a large number of asserts to check if objects actually take the RAM that they should have


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7019 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-04 13:33:12 +00:00
orbiter
5924a0d851 - enhanced concurrency in database index access for multicore
- added statistics about database index caches in PerformanceMemory_p.html
- adapted many classes to use the new statistics
- added missing close statements

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7018 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-03 04:58:48 +00:00
orbiter
55a2536bcf enhancement in drawing speed and reduction of object allocation during drawing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7017 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-03 02:44:08 +00:00
orbiter
9ab06bc333 enhancement in sorting efficiency (database root operation): less object allocation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7015 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-03 02:42:28 +00:00
sixcooler
39d96abbb5 fix yacyRelease download
(http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2920&p=20545#p20545)
better cookie policy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7014 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-02 20:13:20 +00:00
sixcooler
349e4dee9d ... migrating to HttpComponents-Client-4.x ...
added cookie policy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7012 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-02 14:16:44 +00:00
sixcooler
c29f24a519 ... migrating to HttpComponents-Client-4.x ...
- Proxy
- Release-download

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7011 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-01 22:35:11 +00:00
orbiter
d5c65b17a6 added another network activity visualization: show strong query activity as radiation around peer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7006 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-28 11:40:58 +00:00
orbiter
989948e1a9 fixed generic image parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7005 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-27 07:13:15 +00:00
orbiter
e1015ead2c static access to constants
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7004 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-27 06:52:58 +00:00
orbiter
27d8a8b53e removed wrong com.sun.codec class access in generic image parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7003 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-27 06:49:09 +00:00
orbiter
bbf887d879 added generics to UPnP classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7002 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-27 06:48:01 +00:00
sixcooler
15e8c13526 ... migrating to HttpComponents-Client-4.x ...
(gzip decompression, httploader, robots, ...)

+ enable proxy-crawling while log is fine

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7001 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-27 01:16:26 +00:00
mikeworks
b12db14b9f Added Generics to new net.yacy.upnp.* classes to eliminate compiler warnings
Added @Deprecated for deprecated functions getIPDevices and getPPPDevices in class InternetGatewayDevice
Changed debug statement in Domains.java and corrected filename in comments header

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6993 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-24 13:48:45 +00:00
sixcooler
b7102eff92 ... migrating to HttpComponents-Client-4.x ...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6989 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-22 23:08:37 +00:00
mikeworks
572e429eff - fixes UPnP not working discussion on forum: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2881
SVN 6987 fixed net.yacy.upnp.devices.UPNPRootDevice for usage with JxPath > 1.3 by using a default namespace (xmlns="urn:schemas-upnp-org:device-1-0")
This commit now fixes the same problem for net.yacy.upnp.devices.UPNPService with default namespace (xmlns="urn:schemas-upnp-org:service-1-0")

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6988 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-22 19:13:37 +00:00
mikeworks
2a20282505 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6987 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-22 19:07:02 +00:00
lotus
965aa97993 including sbbi upnplib as source again
http://www.sbbi.net/site/upnp/index.html

renamed package to yacy
all options are also named "yacy" instead of "sbbi"

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6986 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-22 18:02:16 +00:00
orbiter
60caade056 removed debug output
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6984 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-22 07:59:22 +00:00
sixcooler
52718e6dcb ... migrating to HttpComponents-Client-4.x ...
monitoring: replaced unused 'idletime' by uploading bytes
added some kind of 'upload-throttling' at dht-out :-)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6983 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-22 00:51:41 +00:00
sixcooler
5fa8038f10 ... migrating to HttpComponents-Client-4.x ...
monitoring and first try to use remoteProxy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6979 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-20 01:14:28 +00:00
orbiter
dec1419bc3 ;-)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6978 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-18 20:18:32 +00:00
orbiter
22dbbcfa56 better (and corrected) recognition of intranet and internet addresses. This corrects the isLocal property that is used by network definitions to restrict index ranges to local or global addresses. Address locations (intranet or internet) had partly been identified by the top-level domain of the host address. Since intranet hosts can also be addressed using a host name within a country domain, a DNS lookup is necessary for each check. The check is backed by a local DNS cache, so the intranet/internet check should not affect network traffic too much. To ensure that the cache works properly, the cache class was upgraded to better concurrent data structures.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6977 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-18 20:14:20 +00:00
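The approach in 22dbbcfa56 (resolve the host, then decide from the address instead of the top-level domain, with a concurrent DNS cache in front) can be sketched with the standard `InetAddress` predicates. This is not YaCy's implementation: the class is hypothetical, the cache never evicts entries or honours DNS TTLs, and treating unresolvable hosts as non-local is an assumption:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch: classify a host as intranet or internet by resolving it. */
final class LocalCheck {
    // resolved addresses per host name; concurrent so many threads can share it.
    // Simplification: entries are never evicted and DNS TTLs are ignored.
    private static final ConcurrentMap<String, InetAddress> DNS_CACHE = new ConcurrentHashMap<>();

    static InetAddress resolve(String host) throws UnknownHostException {
        InetAddress cached = DNS_CACHE.get(host);
        if (cached != null) return cached;
        InetAddress a = InetAddress.getByName(host);  // DNS lookup unless host is an IP literal
        DNS_CACHE.putIfAbsent(host, a);
        return a;
    }

    /** The host name's top-level domain is ignored; only the resolved address decides. */
    static boolean isLocal(String host) {
        try {
            InetAddress a = resolve(host);
            return a.isLoopbackAddress() || a.isSiteLocalAddress()
                    || a.isLinkLocalAddress() || a.isAnyLocalAddress();
        } catch (UnknownHostException e) {
            return false;  // assumption: an unresolvable host is treated as non-local
        }
    }
}
```

Note that `isSiteLocalAddress()` covers the IPv4 private ranges (10/8, 172.16/12, 192.168/16) but, for IPv6, only the deprecated fec0::/10 prefix, so a real check would need extra rules for IPv6 unique local addresses.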
orbiter
8674a65488 removed override directive which caused a compile error in eclipse helios
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6974 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-18 18:37:20 +00:00
low012
dc5f0e357c *) fixed SVN properties
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6972 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-18 10:02:03 +00:00
low012
01d6b952f0 *) minor changes for easier to read code, no functional changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6971 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-18 10:00:43 +00:00
sixcooler
0e56d29335 ... migrating to HttpComponents-Client-4.x ...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6968 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-15 00:59:53 +00:00
sixcooler
2ad5829b26 corrected timeout parameter in HttpComponents-Client-4.x
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6967 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-14 02:47:40 +00:00
sixcooler
e1316d12d0 ... migrating to HttpComponents-Client-4.x ...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6966 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-13 22:10:24 +00:00
sixcooler
c5c67f0504 start migrating to HttpComponents-Client-4.x
see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2872

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6965 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-12 23:07:05 +00:00
orbiter
25024d6ab2 fix for a problem when accessing the metadata index: the index was not available for peers without a RAM table copy.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6957 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-30 07:22:50 +00:00
orbiter
b6fb239e74 redesign of parser interface:
some file types are containers for several files. These containers had been parsed in such a way that the set of resulting parsed content was merged into one single document before parsing. With this parser infrastructure it was not possible to treat the files inside a container as individual documents. An example is an rss file, where the rss messages can be treated as individual documents with their own url reference. Another example is a surrogate file, which was handled by a special operation outside of the parser infrastructure.
This commit introduces a redesigned parser interface and a new abstract parser implementation. The new parser interface now has only one entry point and always returns a set of parsed documents. For a single document, the parser method returns a set containing one document.
To be compliant with the new interface, the zip and tar parsers have also been completely redesigned. All parsers are now much simpler and cleaner in structure. The switchboard operations have been extended to operate on sets of parsed files instead of single parsed files.
Additionally, parsing of jar manifest files has been added.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6955 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-29 19:20:45 +00:00
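The redesigned parser interface in b6fb239e74 has a single entry point that always returns a collection of documents: a plain file yields one document, a container yields one document per entry. A minimal sketch of that shape; all type names are hypothetical, and the tab-separated "container" format is a toy stand-in for rss or surrogate files:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of a single-entry-point parser API returning a set of documents. */
final class ParserSketch {
    static final class Document {
        final String url;
        final String text;

        Document(String url, String text) {
            this.url = url;
            this.text = text;
        }
    }

    interface Parser {
        /** The only entry point: one input may yield one or many documents. */
        List<Document> parse(String location, byte[] content);
    }

    /** Plain text: the result is a set containing exactly one document. */
    static final class TextParser implements Parser {
        public List<Document> parse(String location, byte[] content) {
            String text = new String(content, StandardCharsets.UTF_8);
            return Collections.singletonList(new Document(location, text));
        }
    }

    /**
     * Toy container format with one "url<TAB>text" entry per line; each entry becomes
     * its own document with its own url, as rss items can.
     */
    static final class LineContainerParser implements Parser {
        public List<Document> parse(String location, byte[] content) {
            List<Document> docs = new ArrayList<>();
            for (String line : new String(content, StandardCharsets.UTF_8).split("\n")) {
                int tab = line.indexOf('\t');
                if (tab > 0) docs.add(new Document(line.substring(0, tab), line.substring(tab + 1)));
            }
            return docs;
        }
    }
}
```

Because every parser returns a collection, the caller can handle single files and containers with the same loop.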
low012
d4851441b0 *) Added Android packages to parser in order to be able to create a decentralized search for direct downloads of Android apps.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6953 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-28 20:41:08 +00:00
orbiter
150cf42a1b migrated all my LGPL 3-licensed files to the LGPL 2.1 because LGPL 3 is not compatible with the GPL 2
see http://www.gnu.org/licenses/license-list.html for explanation
Since (as far as I know) nobody else has ever contributed to these files I may be allowed to just apply an older license.
You may consider this as a dual-licensing and may use and optionally replicate the older files under GPL 3.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6952 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-28 16:25:14 +00:00
orbiter
5d00888c95 - added animated visualization for DHT-in and DHT-out in network graphic
- found and fixed a possible memory leak in YaCy internal RSS feed system
- some refactoring in RSS feed mechanisms to make this possible

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6950 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-27 10:45:20 +00:00
orbiter
bf25407fdd added peer hash to internal RSSFeed. The hash will be used to display news activities in the network graphic.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6949 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-26 23:10:57 +00:00
orbiter
1557e0f2d0 - some refactoring for internal RSSFeed (protocol of all actions as seen on status page)
- added dht-out to internal RSSFeed (you can see now messages about distributed indexes on status page)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6948 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-26 22:39:27 +00:00
orbiter
5a4684f21f allow words with length >= 2 (you can't search for 'wm' with 3-letter words...)
let's try that. If we run into a memory problem because of too many 2-letter words, then we must introduce whitelists for 2-letter words.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6947 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-26 16:31:26 +00:00
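The change in 5a4684f21f lowers the minimum indexable word length from 3 to 2 so that terms like 'wm' become searchable. A minimal tokenizer sketch; the class name and the splitting rule (anything that is not a letter or digit separates words) are assumptions, not YaCy's tokenizer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: keep only words of a minimum length when tokenizing for the index. */
final class WordFilter {
    static final int MIN_WORD_LENGTH = 2;  // was 3 before this commit, per its message

    static List<String> indexableWords(String text) {
        List<String> words = new ArrayList<>();
        // split on runs of characters that are neither letters nor digits
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (w.length() >= MIN_WORD_LENGTH) words.add(w);
        }
        return words;
    }
}
```

With this filter, `indexableWords("WM 2010: a final")` keeps `wm`, `2010` and `final` and drops the single letter `a`.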
orbiter
37b8827a7a - removed the UPnP library sources from sbbi and added the jar library again. The library was included to get support for fedora releases, but by now the fact that the sbbi library cannot be part of fedora should be re-discussed. If this is still not possible, then we may integrate the sbbi UPnP package using reflection.
- cleaned up the code. The new Eclipse Helios reports new warnings for dead code. This change cleans up most of these warnings

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6945 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-26 10:32:47 +00:00
orbiter
103c848af8 enhancements in image drawing speed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6942 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-24 13:20:45 +00:00
orbiter
777195e8d1 more abstraction for access of LoaderDispatcher and cache
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6937 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-22 12:28:53 +00:00
orbiter
7bcfa033c9 more abstraction of the htcache when using the LoaderDispatcher:
cache access shall no longer be made directly to the cache; all loading attempts shall use the LoaderDispatcher.
To control the usage of the cache, an enum instance from CrawlProfile.CacheStrategy shall be used.
Some direct loading methods without the usage of a cache strategy have been removed. This also affects the verify option
of the yacysearch servlet. After this commit, 'verify=false' does not necessarily mean that no snippets
are generated. Instead, all snippets that can be retrieved from the cache alone are presented. The search hit is still unverified, because the snippet was generated from the cache. If cache-based snippet generation is not possible, verify=false causes the link not to be rejected.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6936 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-21 14:54:54 +00:00
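The rule in 7bcfa033c9 that every load goes through the LoaderDispatcher with an explicit cache strategy can be sketched as an enum that answers two questions: may the cache serve this request, and may the loader go to the network. The constant names and decision rules below are illustrative assumptions, not the actual definition of CrawlProfile.CacheStrategy:

```java
/** Hypothetical sketch of a cache strategy that every load request must carry. */
enum CacheStrategy {
    NOCACHE,    // always load from the web
    IFFRESH,    // use the cache only if the cached entry is still fresh
    IFEXIST,    // use any cached copy, otherwise load from the web
    CACHEONLY;  // never touch the network

    /** Decides whether a request with this strategy may be answered from the cache. */
    boolean mayUseCache(boolean cached, boolean fresh) {
        switch (this) {
            case NOCACHE: return false;
            case IFFRESH: return cached && fresh;
            default: return cached;  // IFEXIST and CACHEONLY accept any cached copy
        }
    }

    /** Whether the loader may go to the network when the cache cannot answer. */
    boolean mayLoadFromWeb() {
        return this != CACHEONLY;
    }
}
```

Under these assumptions, the `verify=false` snippet behaviour described above corresponds to loading with a cache-only strategy: snippets come from cached copies, and no network request is made.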
orbiter
7e2d6fac12 patch for bad values during local search join
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6934 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-20 00:31:00 +00:00
orbiter
986d4f34d9 added a consistency check for new queues
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6931 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-18 18:59:42 +00:00
orbiter
73f03e05ee fixed a bug in snippet fetch strategy: cache only does not help if resource can only be found in web
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6930 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-18 15:25:25 +00:00
orbiter
fbf021bb50 redesign of index abstract processing - currently disabled until enough peers have fix in SVN 6928
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6929 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-18 09:44:21 +00:00
orbiter
87087f12fe - scanned remote search process and enhanced some data structure and synchronizations here and there
- removed concurrency overhead for small number of index normalizations as it happens during remote search
- removed the 'load only parseable' constraint for snippet fetch, because some resources have no url file extension; these had therefore not been considered parseable and searchable, although they may become parseable after loading, once their mime type is known
- this partly fixes some problems with http://forum.yacy-websuche.de/viewtopic.php?p=20300#p20300 but more changes are necessary to get all expected search results

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6926 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-17 11:59:40 +00:00
orbiter
7ddb70e7c6 new license for ai.greedy component: LGPL (nobody else than me modified that code)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6925 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-16 22:16:03 +00:00
orbiter
de4f30bb2e UTF-8 fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6923 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-16 15:22:31 +00:00
orbiter
3a1cebb598 bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6922 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-16 15:11:21 +00:00
orbiter
51332b787d reverted SVN 6869 as discussed with dulcedo in car after LinuxTag:
missing time-out may be cause of locks during DHT-out

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6920 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-15 20:30:53 +00:00
orbiter
b03caaa57a better handling of OOM situations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6918 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-15 19:44:05 +00:00
orbiter
353a924760 - changed default memory to 500m
- now xms is lower than xmx (let's see what happens)
- removed default path for intranet crawl starts to avoid confusion as seen on linuxtag
- added time-out to upnp request (i have a new router which may need that)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6916 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-14 21:36:40 +00:00
orbiter
60e71876ad - more abstraction (HashMap -> Map)
- more concurrency-awareness (HashMap -> ConcurrentHashMap)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6910 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-01 13:02:11 +00:00
orbiter
a83772c71b fixes and enhancements for balancer:
- crawl lists for each domain now use a HandleSet, which should use less memory than LinkedLists
- but: fill more entries into the domain lists (all available entries)
- fixes to selection criteria (best domain selection)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6909 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-01 09:30:23 +00:00
orbiter
9cde05418f fixed url crawl list display
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6908 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-31 00:27:00 +00:00
orbiter
2eea806005 less errors in image parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6907 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-30 11:18:05 +00:00