Commit Graph

356 Commits

Author SHA1 Message Date
lotus
d2a3d08c44 avoid div. by zero
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7136 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-11 10:58:33 +00:00
orbiter
2c7edea35e - better shutdown behavior for the GUI (waits until data is written if GUI is killed)
- release 0.97

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7135 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-10 12:47:24 +00:00
orbiter
34a25856a5 - added navigation to next/prev search page using arrow keys (left/right)
- better information text for YaCy GUI application

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7134 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-10 10:42:01 +00:00
orbiter
32f73d1aaa added copy for Info.plist for Mac application release updates (this file contains class paths and start parameters)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7133 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-10 09:48:09 +00:00
orbiter
4c21d8dc9d - changed default values for online caution (the pausing may not be necessary any more)
- fixed bug in WeakPriorityBlockingQueue
- show favicon faster using pre-loading (same technique as used for fast image search)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7130 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-09 23:25:19 +00:00
orbiter
570ca577c6 performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7129 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-09 22:42:54 +00:00
orbiter
348dece62f redesign of the SortStack and SortStore classes:
created a WeakPriorityBlockingQueue as special implementation
of a PriorityBlockingQueue with a weak object binding.
- better abstraction of ordering technique
- fixed some bugs regarding result numbering (distinguish different counters in Queue)
- fixed an ordering bug in post-ranking (ordering was decreased instead of increased)
- reversed the ordering numbering using a reversed ordering. The higher the ranking number, the better (now).

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7128 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-09 15:30:25 +00:00
hermens
03eb021568 Fix for byte[] Objects as keys
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7127 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-09 14:41:20 +00:00
orbiter
114bdd8ba7 fixed old sitemap importer which was not able to parse urls containing post elements
- removed old parser
- removed old importer framework (was only used by removed old parser)
- added a new sitemap parser in parser framework
- linked new parser with parser access in old sitemap processing routines

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7126 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-08 14:13:15 +00:00
orbiter
c0b08ac59b slightly changed way of pdf parser integration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7124 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-08 07:32:47 +00:00
orbiter
6d83c7cb62 removed unnecessary Override statements (produces errors in strict validation)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7123 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-08 07:15:41 +00:00
orbiter
5fe828fa06 - replaced pdfbox and fontbox version 1.1.0 with 1.2.1
- added some clear statements that shall clear static cache size within the pdfbox library
- the pdfbox library contains a memory leak; it is unsafe to run a peer with pdf parser permanently on.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7120 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-07 17:13:47 +00:00
orbiter
24502fe3de performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7116 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-06 12:59:33 +00:00
orbiter
ffaa9a1c51 avoiding double-loading of the same resource from the web in case that a second attempt to load the resource is started while the first attempt is still loading the content from the web. This will delay the second attempt to the time when the first attempt has finished, with the possible result that the second attempt reads only from the web cache, not from the web.
This will also enhance the process of image result display from SVN 7105

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7114 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-06 10:23:30 +00:00
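
The double-loading avoidance described in the entry above can be sketched as follows. This is a minimal illustration and not the YaCy code: the class `SingleFlightLoader`, its `fetcher` function, and the in-memory `cache` map are hypothetical stand-ins for the real loader and web cache. A second caller for the same URL waits on a per-URL lock until the first caller has finished, then finds the content in the cache instead of fetching it again.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: names and structure are assumptions, not YaCy classes.
public class SingleFlightLoader {

    // Stand-in for the web cache: url -> loaded content.
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    // One lock object per url; never removed here, which a real loader would handle.
    private final Map<String, Object> locks = new ConcurrentHashMap<>();
    // Stand-in for the actual web access; assumed to return non-null content.
    private final Function<String, byte[]> fetcher;
    // Counts how often the fetcher was really invoked.
    final AtomicInteger fetchCount = new AtomicInteger();

    public SingleFlightLoader(Function<String, byte[]> fetcher) {
        this.fetcher = fetcher;
    }

    public byte[] load(String url) {
        byte[] cached = cache.get(url);
        if (cached != null) return cached; // fast path: already in the cache

        Object lock = locks.computeIfAbsent(url, k -> new Object());
        synchronized (lock) {
            // A concurrent first attempt may have filled the cache while we waited.
            cached = cache.get(url);
            if (cached != null) return cached;

            byte[] content = fetcher.apply(url);
            fetchCount.incrementAndGet();
            cache.put(url, content);
            return content;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SingleFlightLoader loader =
            new SingleFlightLoader(url -> url.getBytes(StandardCharsets.UTF_8));
        Runnable task = () -> loader.load("http://example.org/image.png");
        Thread first = new Thread(task);
        Thread second = new Thread(task);
        first.start();
        second.start();
        first.join();
        second.join();
        // Both threads asked for the same url, but only one fetch happened.
        System.out.println("fetches: " + loader.fetchCount.get());
    }
}
```

The per-URL lock keeps loads of different resources independent; only attempts for the same resource are serialized.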
orbiter
d865ef77a8 removed re-read of index in case of a bad index. This may not solve the problem but it addresses a 100% CPU problem on the peer. I'm afraid bad index files must be abandoned, and cannot be fixed this way.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7111 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-06 09:55:04 +00:00
orbiter
b2c9db48ea Performance enhancement
- introduced byte[] - based ARC method for MapHeap which avoids a String generation each time the cache is accessed
- bugfixing in required class ComparableARC

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7110 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-06 09:53:33 +00:00
orbiter
ae07e11bc5 enhanced image search result display: concurrent loading of images before they are displayed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7109 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-05 23:02:46 +00:00
orbiter
22047ffad5 enhanced computation speed of many replaceAll string operations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7107 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-05 13:19:42 +00:00
orbiter
e8228fba09 less locking in time format computation, caching and during secondary (remote) search evaluation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7106 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-05 11:13:12 +00:00
orbiter
9c0c94683c because of a bug in the search result caching count, search results had not been generated as fast as possible.
With this fix search results are (even) faster.
Also enhanced: image search. This is now sped up using an image search result look-ahead

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7105 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-04 22:57:12 +00:00
orbiter
b3f0d06444 fixed a problem with restarts in YaCy mac applications: the DATA directory path was not submitted when doing a restart. This solves the problem by:
- storing the startup properties when yacy is started
- using the properties in the restart-script again. This also transports the DATA directory location as parameter of the -gui option that is used when the Mac version of YaCy is started

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7102 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-03 23:08:43 +00:00
sixcooler
ca0a03e9ea ... migrating to HttpComponents-Client-4.x ...
ssl-stuff: accept almost everything

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7097 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-03 16:02:52 +00:00
orbiter
3988a95fb5 added ability in rss reader to parse atom feeds
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7094 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-03 08:53:24 +00:00
orbiter
9d080f387e change in handling of the all-visible home path for storage in YaCy:
the home path can now be distinguished between
- data home; the path where the DATA directory is created
- application home; everything else
This will make it possible to store application data on Mac releases within the
~/Library/YaCy
directory; a place where Mac applications write their data.
Similar techniques will be possible for debian and windows.
To use the new data path, YaCy can be started with
-start <data path>
or
-gui <data path>


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7092 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-02 19:24:22 +00:00
orbiter
65eaf30f77 redesign of crawl profiles data structure. target will be:
- permanent storage of auto-dom statistics in profile
- storage of profiles in WorkTable data structure
not finished yet. No functional change yet.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7088 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-31 15:47:47 +00:00
f1ori
938676265f fix shutdown command, close HttpClient connection pool
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7085 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-30 17:48:20 +00:00
orbiter
4f22e2df41 bugfixes for
- next-execution-time in scheduler
- deletion of scheduled rss feed loading (now deletes also the scheduling entry)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7075 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-26 16:42:00 +00:00
orbiter
42414a6ae3 added two more tables in rss reader interface:
- fresh recorded rss feeds (not yet loaded or in scheduler)
- rss feeds in scheduler
The first list has a button that can be used to place rss feeds into the scheduler
The second list has a button to delete rss feeds from the scheduler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7074 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-26 16:01:45 +00:00
orbiter
0010cd9db1 Support for indexing of RSS feeds!
- added a scanning in html parser for rss feeds
- storage of rss feed addresses, can be viewed with http://localhost:8080/Tables_p.html?table=rss
- rss items retrieved by http://localhost:8080/Load_RSS_p.html (in Index Creation menu) can be selected and indexed
- an rss feed retrieved in http://localhost:8080/Load_RSS_p.html can now be fully indexed
- indexing of rss feeds can be placed in scheduler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7073 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-25 18:24:54 +00:00
orbiter
0f276dd63f - MapHeap now implements Map<byte[], Map<String, String>>
- refactoring of method names to comply with Map method names

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7072 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-24 12:36:56 +00:00
orbiter
cf07b34c2d implemented the Map interface in the ARC classes so it will be possible to instantiate ARCs as
Map<byte[], Map<String, byte[]>>
Because such Maps with byte[] keys cannot be stored in hash maps (bad hashing on byte[])
another ARC with comparable Maps has been added

This will make it possible to move the HTCache class 'Cache' into the cora package because that
class may be used either with RAM caches (ARCs) or with file-based caches (BEncodedHeaps)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7071 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 23:38:03 +00:00
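
The remark in the entry above about byte[] keys and hash maps can be illustrated with a small sketch. The class `ByteArrayKeyDemo` and the comparator `BYTE_ORDER` are hypothetical names for this illustration and are not the ARC classes from the commit. Java arrays inherit identity-based `hashCode()` and `equals()` from `Object`, so a `HashMap` does not find an entry when probed with a different array of equal content; a map ordered by a content-based comparator does.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: not the YaCy ARC implementation.
public class ByteArrayKeyDemo {

    // Lexicographic order on array content, treating bytes as unsigned values.
    static final Comparator<byte[]> BYTE_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = (a[i] & 0xff) - (b[i] & 0xff);
            if (c != 0) return c;
        }
        return a.length - b.length;
    };

    // HashMap relies on hashCode()/equals(), which are identity-based for arrays.
    static boolean hashMapFinds(byte[] stored, byte[] probe) {
        Map<byte[], String> map = new HashMap<>();
        map.put(stored, "value");
        return map.containsKey(probe);
    }

    // TreeMap relies only on the supplied comparator, which looks at the content.
    static boolean treeMapFinds(byte[] stored, byte[] probe) {
        Map<byte[], String> map = new TreeMap<>(BYTE_ORDER);
        map.put(stored, "value");
        return map.containsKey(probe);
    }

    public static void main(String[] args) {
        byte[] stored = "abc".getBytes(StandardCharsets.UTF_8);
        byte[] probe = "abc".getBytes(StandardCharsets.UTF_8); // same content, different object
        System.out.println("HashMap finds key: " + hashMapFinds(stored, probe)); // false
        System.out.println("TreeMap finds key: " + treeMapFinds(stored, probe)); // true
    }
}
```

Wrapping the key in an object with content-based `hashCode()` and `equals()` would be another option; the comparator approach avoids allocating a wrapper per access.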
orbiter
c60d0282fd more abstraction for tables stored in heaps:
the BEncodedHeap now implements Map<byte[], Map<String, byte[]>>
This will make it possible to add different database storage types that implement the same Map<byte[], Map<String, byte[]>> interface.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7070 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 21:27:58 +00:00
orbiter
d1be64d491 removed wrong assert
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7069 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 21:02:28 +00:00
orbiter
3197ca42ed preparations to move the HTCache into cora:
- move the header framework classes to cora
- move the ARC caching classes to cora
- refactoring of code to call these classes from cora

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7068 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 12:32:02 +00:00
orbiter
844f158686 - removed dependencies in header framework:
  moved http date methods from DateFormatter to HeaderFramework
  changed logging to log4j
- added ftp load access to MultiProtocolURI
- ensured termination of RSS feed iteration

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7067 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 11:41:12 +00:00
orbiter
80ba543d4c svn fix for uppercase problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7066 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 01:16:17 +00:00
orbiter
5e7081cd19 refactoring towards a unified loading mechanism for MultiProtocolURIs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7065 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 01:08:56 +00:00
orbiter
caece04f26 removed System.err and System.out usage from FTPClient; changed logging to log4j (preferred in yacy.cora)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7064 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-22 22:51:31 +00:00
orbiter
90531f78ff refactoring of the cora package to get subpackages for http and ftp (smb to come)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7063 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-22 22:32:39 +00:00
sixcooler
661867923a ... migrating to HttpComponents-Client-4.x ...
The Client is dead, long live the Client!
(no references to the old client)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7060 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-22 17:38:27 +00:00
orbiter
7aa860c505 - more logging
- more stability for database heap in case of buffer failure

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7058 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-21 10:16:05 +00:00
orbiter
4d5446d641 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7057 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-21 00:08:36 +00:00
orbiter
66ac3a7d9d corrected database row iteration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7055 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-20 23:33:56 +00:00
orbiter
dfd416e3fb removed a mysterious image buffer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7054 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-20 23:13:59 +00:00
orbiter
e10cd115a9 - added a new RSS reader interface. This is not finished but you can now load and look at RSS feeds. It will be used to index RSS feeds in a way that is appropriate for such kind of data.
- refactoring of Mediawiki and PHPBB3 loader interface names (just renamed)
- removed two old not used RSS loader interfaces
- fixed a bug in RSS parser library of cora
- added a new RSS parser component to the set of yacy document parsers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7053 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-20 11:30:02 +00:00
orbiter
933dc1a600 removed old rss parser (will be replaced with parser from cora package)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7052 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-20 07:42:38 +00:00
orbiter
70dd26ec95 added the new crawl scheduling function to the crawl start menu:
- the scheduler extends the option for re-crawl timing. Many people misunderstood the re-crawl timing feature because that was just a criterion for the url double-check and not a scheduler. Now the scheduler setting is combined with the re-crawl setting and people will have the choice between no re-crawl, re-crawl as was possible so far and a scheduled re-crawl. The 'classic' re-crawl time is set automatically when the scheduling function is selected
- removed the bookmark-based scheduler. This scheduler was not able to transport all attributes of a crawl start and therefore did not support special crawling starts, e.g. for forums and wikis
- since the old scheduler was not able to crawl special forums and wikis, the must-not-match filter was statically fixed to all bad pages for these special use cases. Since the new scheduler can handle these filters, it is possible to remove the default settings for the filters
- removed the busy thread that was used to trigger the bookmark-based scheduler
- removed the crontab for the bookmark-based scheduler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7051 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-19 23:52:38 +00:00
orbiter
5a994c9796 added a scheduler based on API actions
- every process that is monitored with the API Steering interface can now be scheduled!
- added input methods in Steering interface to set a scheduling time
- added a view on the steering api that shows only crawl jobs inside the Crawl Profile servlet
- added a scheduling call process in the cleanup process handler that triggers the scheduled processes
As a result, the cleanup now also looks for scheduled processes. Such processes are therefore not executed exactly at
the given target execution time; they are executed within the cleanup process time window.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7050 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-19 12:13:54 +00:00
orbiter
189a986ebd - modified api-call interface to record api calls with references to api-call database (carries pk)
- added recording date, last execution date and next execution date for a scheduler (scheduler to be implemented next)
- extended database access methods for more data formats, especially for date insert/retrieval
- extended 'Steering' interface to show new database fields
- migrated Steering to new http client
- extended cora http client to transmit authentication and also added some convenience methods (http response code)
- simplified database back-end (fewer specialized methods for multiple properties)
- extended date formatter to produce a special format to show dates in html (&nbsp; in spaces of date format)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7049 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-18 15:56:38 +00:00
orbiter
054c22e2c6 added TLDs from http://www.opennicproject.org
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7047 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-18 10:39:49 +00:00