Commit Graph

7011 Commits

Author SHA1 Message Date
orbiter
1da5241c2d do not block server session if maximum number of sessions is reached, just try to clean up once
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7095 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-03 12:05:37 +00:00
orbiter
3988a95fb5 added ability in rss reader to parse atom feeds
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7094 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-03 08:53:24 +00:00
orbiter
5de70c3d7c changed way of storage for search requests:
- the search request cache can now get as large as 1000 entries
- if more entries arrive, unused are deleted
- the elements may stay in the cache up to 10 minutes and longer if they are used
- the elements are deleted earlier than 10 minutes if the memory gets low
This commit was mainly done for metager-feeding peers that have a query load of 50000 queries each day. Also added:
- a monitor for cache hit/cache miss in PerformanceMemory_p.html (see at bottom of page)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7093 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-02 21:52:45 +00:00
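Note on r7093: the cache policy described above (size limit, eviction of unused entries, a lifetime that is extended on use, early deletion when memory is low, and hit/miss counters) could look roughly like the following. This is a minimal sketch, not the code from the commit: the class name SearchRequestCache, its methods, the explicit `now` parameter and the "clear everything when memory is low" rule are all assumptions made for illustration.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; names are not taken from the YaCy sources.
final class SearchRequestCache<K, V> {

    private static final class Entry<V> {
        final V value;
        long lastUsed; // refreshed on every hit, so used entries live longer
        Entry(V value, long now) { this.value = value; this.lastUsed = now; }
    }

    private final int maxEntries;     // 1000 in the commit message
    private final long maxIdleMillis; // 10 minutes in the commit message
    private long hits = 0;
    private long misses = 0;

    // accessOrder = true: iteration starts at the least recently used entry
    private final LinkedHashMap<K, Entry<V>> map = new LinkedHashMap<>(16, 0.75f, true);

    SearchRequestCache(int maxEntries, long maxIdleMillis) {
        this.maxEntries = maxEntries;
        this.maxIdleMillis = maxIdleMillis;
    }

    synchronized V get(K key, long now) {
        Entry<V> e = map.get(key);
        if (e != null && now - e.lastUsed > maxIdleMillis) {
            map.remove(key); // idle for too long: count as a miss
            e = null;
        }
        if (e == null) { misses++; return null; }
        e.lastUsed = now;
        hits++;
        return e.value;
    }

    synchronized void put(K key, V value, long now) {
        map.put(key, new Entry<>(value, now));
        Iterator<K> it = map.keySet().iterator();
        while (map.size() > maxEntries && it.hasNext()) {
            it.next();
            it.remove(); // drop the least recently used entries first
        }
    }

    // Assumption: under memory pressure the whole cache is dropped.
    synchronized void cleanup(long now, boolean memoryLow) {
        if (memoryLow) { map.clear(); return; }
        Iterator<Map.Entry<K, Entry<V>>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (now - it.next().getValue().lastUsed > maxIdleMillis) it.remove();
        }
    }

    synchronized long hits() { return hits; }
    synchronized long misses() { return misses; }
    synchronized int size() { return map.size(); }
}
```

Passing the clock value in as `now` keeps the sketch easy to test; a real cache would more likely read System.currentTimeMillis() itself.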
orbiter
9d080f387e change in handling of the all-visible home path for storage in YaCy:
the home path can now be distinguished between
- data home; the path where the DATA directory is created
- application home; everything else
This will make it possible to store application data on Mac releases within the
~/Library/YaCy
directory; a place where Mac applications write their data.
Similar techniques will be possible for Debian and Windows.
To use the new data path, YaCy can be started with
-start <data path>
or
-gui <data path>


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7092 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-02 19:24:22 +00:00
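Note on r7092: the split between data home and application home can be pictured as a small lookup over the start arguments. The sketch below only illustrates the idea. The class HomePaths and the method dataHome are hypothetical, and the fallback to the application home when no data path is given is an assumption, not something the commit message states.

```java
import java.io.File;

// Hypothetical sketch; not the actual YaCy start-up code.
final class HomePaths {

    // Returns the directory in which the DATA directory would be created.
    static File dataHome(String[] args, File applicationHome) {
        boolean hasMode = args.length >= 1
                && (args[0].equals("-start") || args[0].equals("-gui"));
        if (hasMode && args.length >= 2) {
            return new File(args[1]); // explicit <data path>
        }
        return applicationHome;       // assumption: default to application home
    }
}
```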
orbiter
fa5683adfe create a mac dmg file (a disc image) for mac releases in ant
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7091 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-02 19:11:49 +00:00
orbiter
875741bcff fix for http://forum.yacy-websuche.de/viewtopic.php?p=20657#p20657
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7090 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-02 10:05:04 +00:00
lotus
091281c9f2 Mac app ant task building a ready-to-distribute zip file
extending r7080

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7089 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-02 08:01:01 +00:00
orbiter
65eaf30f77 redesign of crawl profiles data structure. target will be:
- permanent storage of auto-dom statistics in profile
- storage of profiles in WorkTable data structure
not finished yet. No functional change yet.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7088 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-31 15:47:47 +00:00
orbiter
3f1d5a061f by default store crawled pages to HTCache to support verify=false snippet generation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7087 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-31 09:28:01 +00:00
lotus
2009999162 show landing page after installation finished
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7086 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-30 20:04:19 +00:00
f1ori
938676265f fix shutdown command, close HttpClient connection pool
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7085 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-30 17:48:20 +00:00
f1ori
55da979291 disable revision detection for git
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7084 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-30 17:11:19 +00:00
f1ori
6d2e0f5fb4 always kill shutdown java instance, even if yacy succeeded,
in future, the TERM-signal should be used, but currently not all threads are joined during shutdown


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7083 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-29 23:26:03 +00:00
f1ori
be0abd92cd always use kill command in initscript after the timeout has elapsed and yacy has not finished
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7082 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-29 18:15:22 +00:00
lotus
2a4ddc48bb adjustment for new java download method
see http://forum.yacy-websuche.de/viewtopic.php?p=20616#p20616

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7081 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-27 18:55:44 +00:00
lotus
e9160ea1e5 Mac ant task according to r7023
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7080 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-27 18:40:32 +00:00
lotus
93d2c22e60 adapted memory for first run to current standard values
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7079 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-27 18:38:02 +00:00
orbiter
104318d58a - added nice colors to feed indexing state messages
- added a 'remove all' button for new and scheduled rss feed list
- made adding of new rss feeds concurrent so the interface is more responsive

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7078 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-27 11:56:51 +00:00
lotus
23ba107834 UPnP port forwarding default on now. This also displays a message on the entry settings page if not successful, so the user gets an extra hint to open his ports.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7077 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-27 08:45:00 +00:00
lotus
d5ccbb99f9 the Windows installer now always requires admin level for installation (Vista/7)
unfortunately some users seem to forget to manually install the downloaded Java runtime and therefore could not start YaCy
- added concept to always distribute the latest Java version via external php script

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7076 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-26 16:53:20 +00:00
orbiter
4f22e2df41 bugfixes for
- next-execution-time in scheduler
- deletion of scheduled rss feed loading (now also deletes the scheduling entry)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7075 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-26 16:42:00 +00:00
orbiter
42414a6ae3 added two more tables in rss reader interface:
- fresh recorded rss feeds (not yet loaded or in scheduler)
- rss feeds in scheduler
The first list has a button that can be used to place rss feeds into the scheduler
The second list has a button to delete rss feeds from the scheduler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7074 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-26 16:01:45 +00:00
orbiter
0010cd9db1 Support for indexing of RSS feeds!
- added a scanning in html parser for rss feeds
- storage of rss feed addresses, can be viewed with http://localhost:8080/Tables_p.html?table=rss
- rss items retrieved by http://localhost:8080/Load_RSS_p.html (in Index Creation menu) can be selected and indexed
- a rss feed retrieved in http://localhost:8080/Load_RSS_p.html can now be fully indexed
- indexing of rss feeds can be placed in scheduler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7073 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-25 18:24:54 +00:00
orbiter
0f276dd63f - MapHeap now implements Map<byte[], Map<String, String>>
- refactoring of method names to comply with Map method names

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7072 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-24 12:36:56 +00:00
orbiter
cf07b34c2d implemented the Map interface in the ARC classes so it will be possible to instantiate ARCs as
Map<byte[], Map<String, byte[]>>
Because such Maps with byte[] keys cannot be stored in hash maps (bad hashing on byte[])
another ARC with comparable Maps has been added

This will make it possible to move the HTCache class 'Cache' into the cora package because that
class may be used either with RAM caches (ARCs) or with file-based caches (BEncodedHeaps)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7071 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 23:38:03 +00:00
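Note on r7071: the remark about "bad hashing on byte[]" refers to the fact that Java arrays inherit identity-based hashCode() and equals() from Object, so two arrays with the same content are different keys in a HashMap. A map ordered by a Comparator does not have that problem. The sketch below demonstrates the difference. The class ByteArrayKeys and the comparator BYTE_ORDER are hypothetical names, and the unsigned lexicographic order is one possible choice, not necessarily the order used in the ARC classes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not the ARC implementation from the commit.
final class ByteArrayKeys {

    // Unsigned lexicographic order on byte arrays.
    static final Comparator<byte[]> BYTE_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    };

    private static byte[] key() {
        return "key".getBytes(StandardCharsets.UTF_8); // new array on every call
    }

    // HashMap compares array keys by identity, so an equal copy is not found.
    static boolean foundInHashMap() {
        Map<byte[], String> m = new HashMap<>();
        m.put(key(), "value");
        return m.containsKey(key());
    }

    // TreeMap uses the comparator, so an equal copy is found.
    static boolean foundInTreeMap() {
        Map<byte[], String> m = new TreeMap<>(BYTE_ORDER);
        m.put(key(), "value");
        return m.containsKey(key());
    }
}
```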
orbiter
c60d0282fd more abstraction for tables stored in heaps:
the BEncodedHeap now implements Map<byte[], Map<String, byte[]>>
This will make it possible to add other database storage types that implement the same Map<byte[], Map<String, byte[]>> interface.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7070 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 21:27:58 +00:00
orbiter
d1be64d491 removed wrong assert
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7069 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 21:02:28 +00:00
orbiter
3197ca42ed preparations to move the HTCache into cora:
- move the header framework classes to cora
- move the ARC caching classes to cora
- refactoring of code to call these classes from cora

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7068 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 12:32:02 +00:00
orbiter
844f158686 - removed dependencies in header framework:
  moved http date methods from DateFormatter to HeaderFramework
  changed logging to log4j
- added ftp load access to MultiProtocolURI
- ensured termination of RSS feed iteration

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7067 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 11:41:12 +00:00
orbiter
80ba543d4c svn fix for uppercase problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7066 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 01:16:17 +00:00
orbiter
5e7081cd19 refactoring towards a unified loading mechanism for MultiProtocolURIs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7065 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 01:08:56 +00:00
orbiter
caece04f26 removed System.err and System.out usage from FTPClient; changed logging to log4j (preferred in yacy.cora)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7064 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-22 22:51:31 +00:00
orbiter
90531f78ff refactoring of the cora package to get subpackages for http and ftp (smb to come)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7063 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-22 22:32:39 +00:00
orbiter
d0fb6bc2bc cleaned up superfluous classes after sixcooler's migration to HttpComponents-Client-4.x
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7062 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-22 22:04:31 +00:00
orbiter
dcd9065c84 next try to fix loading of network picture
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7061 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-22 22:02:54 +00:00
sixcooler
661867923a ... migrating to HttpComponents-Client-4.x ...
The Client is dead, long live the Client!
(no references to the old client)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7060 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-22 17:38:27 +00:00
orbiter
6e4d2f0800 fix for the network image sync bug
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7059 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-21 10:59:21 +00:00
orbiter
7aa860c505 - more logging
- more stability for database heap in case of buffer failure

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7058 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-21 10:16:05 +00:00
orbiter
4d5446d641 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7057 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-21 00:08:36 +00:00
sixcooler
6b06e94c8c make searched word(s) in search results visible in dark themes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7056 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-20 23:54:46 +00:00
orbiter
66ac3a7d9d corrected database row iteration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7055 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-20 23:33:56 +00:00
orbiter
dfd416e3fb removed a mysterious image buffer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7054 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-20 23:13:59 +00:00
orbiter
e10cd115a9 - added a new RSS reader interface. This is not finished but you can now load and look at RSS feeds. It will be used to index RSS feeds in a way that is appropriate for this kind of data.
- refactoring of Mediawiki and PHPBB3 loader interface names (just renamed)
- removed two old, unused RSS loader interfaces
- fixed a bug in RSS parser library of cora
- added a new RSS parser component to the set of yacy document parsers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7053 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-20 11:30:02 +00:00
orbiter
933dc1a600 removed old rss parser (will be replaced with parser from cora package)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7052 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-20 07:42:38 +00:00
orbiter
70dd26ec95 added the new crawl scheduling function to the crawl start menu:
- the scheduler extends the option for re-crawl timing. Many people misunderstood the re-crawl timing feature because that was just a criterion for the url double-check and not a scheduler. Now the scheduler setting is combined with the re-crawl setting and people will have the choice between no re-crawl, re-crawl as was possible so far and a scheduled re-crawl. The 'classic' re-crawl time is set automatically when the scheduling function is selected
- removed the bookmark-based scheduler. This scheduler was not able to transport all attributes of a crawl start and therefore did not support special crawl starts, e.g. for forums and wikis
- since the old scheduler was not able to crawl special forums and wikis, the must-not-match filter was statically fixed to all bad pages for these special use cases. Since the new scheduler can handle these filters, it is possible to remove the default settings for the filters
- removed the busy thread that was used to trigger the bookmark-based scheduler
- removed the crontab for the bookmark-based scheduler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7051 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-19 23:52:38 +00:00
orbiter
5a994c9796 added a scheduler based on API actions
- every process that is monitored with the API Steering interface can now be scheduled!
- added input methods in Steering interface to set a scheduling time
- added a view on the steering api that shows only crawl jobs inside the Crawl Profile servlet
- added a scheduling call process in the cleanup process handler that triggers the scheduled processes
As a result, the cleanup now also looks for scheduled processes. Such processes are therefore not executed exactly at
the target execution time; they are executed within the cleanup process time window.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7050 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-19 12:13:54 +00:00
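Note on r7050: the timing behaviour described above (scheduled processes run when the cleanup next fires, not exactly at their target time) can be sketched as follows. The class CleanupScheduler, its methods and the fixed repeat period are hypothetical, chosen only to illustrate the idea. The sketch returns the names of the due jobs instead of actually triggering an API call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the scheduler code from the commit.
final class CleanupScheduler {

    private static final class Job {
        final String name;
        final long periodMillis;
        long nextExec; // target execution time
        Job(String name, long nextExec, long periodMillis) {
            this.name = name;
            this.nextExec = nextExec;
            this.periodMillis = periodMillis;
        }
    }

    private final List<Job> jobs = new ArrayList<>();

    void schedule(String name, long firstExec, long periodMillis) {
        jobs.add(new Job(name, firstExec, periodMillis));
    }

    // Called from the periodic cleanup. A job whose target time has passed
    // runs now, so its real start time is the cleanup time, not nextExec.
    List<String> cleanup(long now) {
        List<String> executed = new ArrayList<>();
        for (Job j : jobs) {
            if (j.nextExec <= now) {
                executed.add(j.name); // a real scheduler would trigger the recorded call here
                j.nextExec = now + j.periodMillis;
            }
        }
        return executed;
    }
}
```

With a job scheduled for time 100 and cleanups at 50 and 160, the job first runs at 160, which is the delay the commit message describes.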
orbiter
189a986ebd - modified api-call interface to record api calls with references to api-call database (carries pk)
- added recording date, last execution date and next execution date for a scheduler (scheduler to be implemented next)
- extended database access methods for more data formats, especially for date insert/retrieval
- extended 'Steering' interface to show new database fields
- migrated Steering to new http client
- extended cora http client to transmit authentication and also added some convenience methods (http response code)
- simplified database back-end (fewer specialized methods for multiple properties)
- extended date formatter to produce a special format to show dates in html (&nbsp; in spaces of date format)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7049 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-18 15:56:38 +00:00
f1ori
1bc08e1416 support debconf in debian package
* now you are asked some questions to preconfigure yacy after installing the debian package


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7048 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-18 13:30:57 +00:00
orbiter
054c22e2c6 added TLDs from http://www.opennicproject.org
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7047 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-18 10:39:49 +00:00
orbiter
f616cdfce4 better resistance of NetworkImage generation against heavy load
this is needed for the network image on the yacy.net home page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7046 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-18 09:51:00 +00:00