Commit Graph

572 Commits

Author SHA1 Message Date
orbiter
ac73072924 added a demonstration class: integrate the YaCy search results in own applications
This class requests a YaCy peer remotely and produces search result objects.
The class was implemented in such a way that it is as short as possible. To get a
better integration of search results, use the cora package.
This class is fully stand-alone, it does not need any other external library other than already contained in JRE.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7173 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-20 09:57:36 +00:00
orbiter
8da4eb5de6 addition to patch in SVN 7111
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7170 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-19 23:12:50 +00:00
orbiter
37baa8bae3 - fixes for concurrency exceptions and failed database integrity verification
- added link to yacystats peer when peer is more than one day old

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7164 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-17 10:20:04 +00:00
orbiter
461a2a6ec7 enhanced remote crawling:
- 300 ppm is default now (but this is switched off by default; if you switch it on you may want more traffic?)
- better timing for busy queue
- better amount of remote url retrieval
- better time-out values
- better tracking of availability of remote crawl urls
- more logging for result of receipt sending

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7159 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-16 09:34:17 +00:00
orbiter
0cf006865e refactoring and enhanced concurrency
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7155 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-15 11:38:03 +00:00
orbiter
83ac07874f - corrected return value of put() methods (not used anywhere, so it did not harm before)
- added use of LookAheadIterator which should prevent mistakes when coding iterators with embedded iterators
- added a fail-safe reaction in case of database corruption using iterators over database elements (no interruption then)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7154 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-15 10:43:14 +00:00
orbiter
5702419194 fixed a bug in HTTPClient: keep-alive must be set to false, otherwise servers hold connections 2 seconds open until response.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7151 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-14 22:25:35 +00:00
orbiter
5870b13f3a - code cleanup / added debug line for further investigation in HTTPDemon.parseMultipart
- changed data structure for sorting in search which performs better in that specific case (too many updates)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7150 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-14 21:03:50 +00:00
orbiter
ac1c08924e more performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7149 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-14 15:27:27 +00:00
orbiter
14c843d364 more performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7148 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-14 15:00:34 +00:00
orbiter
39f409a7bb performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7147 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-14 14:32:24 +00:00
orbiter
3c0e07ba72 removed all delays in shutdown process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7143 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-14 09:13:28 +00:00
orbiter
906c572621 - enhanced index create menu structure
- clear search log caches each time a search is done

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7142 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-14 09:06:27 +00:00
orbiter
64860dc1bb enhanced search event logging (to be used for further improvements)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7140 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-13 09:33:04 +00:00
orbiter
7dbc357593 patch to identify corrupted database files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7139 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-13 07:20:53 +00:00
sixcooler
17eebd4ef8 counting crawler traffic again:
fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2808

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7138 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-11 15:58:15 +00:00
lotus
d2a3d08c44 avoid div. by zero
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7136 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-11 10:58:33 +00:00
orbiter
2c7edea35e - better shutdown behavior for the GUI (waits until data is written if GUI is killed)
- release 0.97

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7135 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-10 12:47:24 +00:00
orbiter
34a25856a5 - added navigation to next/prev search page using arrow keys (left/right)
- better information text for YaCy GUI application

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7134 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-10 10:42:01 +00:00
orbiter
32f73d1aaa added copy for Info.plist for Mac application release updates (this file contains class paths and start parameters)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7133 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-10 09:48:09 +00:00
orbiter
4c21d8dc9d - changed default values for online caution (the pausing may not be necessary any more)
- fixed bug in WeakPriorityBlockingQueue
- show favicon faster using pre-loading (same technique as used for fast image search)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7130 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-09 23:25:19 +00:00
orbiter
570ca577c6 performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7129 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-09 22:42:54 +00:00
orbiter
348dece62f redesign of the SortStack and SortStore classes:
created a WeakPriorityBlockingQueue as special implementation
of a PriorityBlockingQueue with a weak object binding.
- better abstraction of ordering technique
- fixed some bugs according to result numbering (distinguish different counters in Queue)
- fixed a ordering bug in post-ranking (ordering was decreased instead of increased)
- reversed ordering numbering using a reversed ordering. The higher the ranking number the better (now).

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7128 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-09 15:30:25 +00:00
hermens
03eb021568 Fix for byte[] Objects as keys
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7127 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-09 14:41:20 +00:00
orbiter
114bdd8ba7 fixed old sitemap importer which was not able to parse urls containing post elements
- removed old parser
- removed old importer framework (was only used by removed old parser)
- added a new sitemap parser in parser framework
- linked new parser with parser access in old sitemap processing routines

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7126 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-08 14:13:15 +00:00
orbiter
c0b08ac59b slighlty changed way of pdf parser integration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7124 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-08 07:32:47 +00:00
orbiter
6d83c7cb62 removed unnecessary Override statements (produces errors in strict validation)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7123 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-08 07:15:41 +00:00
orbiter
5fe828fa06 - replaced pdfbox and fontbox version 1.1.0 with 1.2.1
- added some clear statements that shall clear static cache size within the pdfbox library
- the pdfbox library contains a memory leak; it is unsafe to run a peer with pdf parser permanently on.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7120 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-07 17:13:47 +00:00
orbiter
24502fe3de performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7116 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-06 12:59:33 +00:00
orbiter
ffaa9a1c51 avoiding double-loading of the same resource from the web in case that a seond attempt to load the resource is started while the first attempt is still loading the content from the web. This will delay the second attempt to the time when the first attempt has finished with the possible result that the second attempt reads only from the web cache, not from the web.
This will also enhance the process of image result display from SVN 7105

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7114 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-06 10:23:30 +00:00
orbiter
d865ef77a8 removed re-read of index in case of a bad index. This may not solve the problem but it applies a 100% CPU problem on the peer. I'm afraid bad index files must be abandoned, and cannot be fixed this way.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7111 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-06 09:55:04 +00:00
orbiter
b2c9db48ea Performance enhancement
- introduced byte[] - based ARC method for MapHeap which avoids a String generation each time the cache is accessed
- bugfixing in required class ComparableARC

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7110 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-06 09:53:33 +00:00
orbiter
ae07e11bc5 enhanced image search result display: concurrent loading of images before they are displayed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7109 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-05 23:02:46 +00:00
orbiter
22047ffad5 enhanced computation speed of many replaceAll string operations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7107 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-05 13:19:42 +00:00
orbiter
e8228fba09 less locking in time format computation, caching and during secondary (remote) search evaluation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7106 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-05 11:13:12 +00:00
orbiter
9c0c94683c because of a bug in search result caching count search results had not been generated as fast as possible.
with this fix search results are (even) faster.
Also enhanced: image search. This is now speeded up using a image search result look-ahead

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7105 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-04 22:57:12 +00:00
orbiter
b3f0d06444 fixed a problem with restarts in YaCy mac applications: the DATA directory path was not submitted when doing a restart. This solves the problem by:
- storing the startup properties when yacy is started
- using the properties in the restart-script again. this transports also the DATA directory location as parameter of the -gui option that is used when the Mac version of YaCy is started

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7102 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-03 23:08:43 +00:00
sixcooler
ca0a03e9ea ... migrating to HttpComponents-Client-4.x ...
ssl-stuff: accept almost everything

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7097 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-03 16:02:52 +00:00
orbiter
3988a95fb5 added ability in rss reader to parse atom feeds
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7094 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-03 08:53:24 +00:00
orbiter
9d080f387e change in handling of the all-visible home path for storage in YaCy:
the home path can now be distinguished between
- data home; the path where the DATA directory is created
- application home; everything else
This will make it possible to store application data on Mac releases within the
~/Library/YaCy
directory; a place where Mac applications write their data.
Similar techniques will be possible for debian and windows.
To use the new data path, YaCy can be started with
-start <data path>
or
-gui <data path>


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7092 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-02 19:24:22 +00:00
orbiter
65eaf30f77 redesign of crawl profiles data structure. target will be:
- permanent storage of auto-dom statistics in profile
- storage of profiles in WorkTable data structure
not finished yet. No functional change yet.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7088 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-31 15:47:47 +00:00
f1ori
938676265f fix shutdown command, close HttpClient connection pool
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7085 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-30 17:48:20 +00:00
orbiter
4f22e2df41 bugfixes for
- next-execution-time in scheduler
- deletion of scheduled rss feed loading (now deletes also the scheduling entry)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7075 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-26 16:42:00 +00:00
orbiter
42414a6ae3 added two more tables in rss reader interface:
- fresh recorded rss feeds (not yet loaded or in scheduler)
- rss feeds in scheduler
The first list has a button that can be used to place rss feeds into the scheduler
The second list has a button to delete rss feeds from the scheduler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7074 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-26 16:01:45 +00:00
orbiter
0010cd9db1 Support for indexing of RSS feeds!
- added a scanning in html parser for rss feeds
- storage of rss feed addresses, can be viewed with http://localhost:8080/Tables_p.html?table=rss
- rss items retrieved by http://localhost:8080/Load_RSS_p.html (in Index Creation menu) can be selected and indexed
- a rss feed retrieved in http://localhost:8080/Load_RSS_p.html can now be fully indexed
- indexing of rss feeds can be placed in scheduler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7073 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-25 18:24:54 +00:00
orbiter
0f276dd63f - MapHeap now implements Map<byte[], Map<String, String>>
- refactoring of method names to comply with Map method names

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7072 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-24 12:36:56 +00:00
orbiter
cf07b34c2d implemented the Map interface in the ARC classes so it will be possible to instantiate ARCs as
Map<byte[], Map<String, byte[]>>
Because such Maps with byte[] keys cannot be stored in hash maps (bad hashing on byte[])
another ARC with comparable Maps has been added

This will make it possible to move the HTCache class 'Cache' into the cora package because that
class may be used either with RAM caches (ARCs) or with file-based caches (BEncodedHeaps)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7071 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 23:38:03 +00:00
orbiter
c60d0282fd more abstraction for tables stored in heaps:
the BEncodedHeap now implements Map<byte[], Map<String, byte[]>>
This will make it possible that also different database storage types may be added that implement also the same Map<byte[], Map<String, byte[]>> interface.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7070 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 21:27:58 +00:00
orbiter
d1be64d491 removed wrong assert
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7069 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 21:02:28 +00:00
orbiter
3197ca42ed preparations to move the HTCache into cora:
- move the header framework classes to cora
- move the ARC caching classes to cora
- refactoring of code to call these classes from cora

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7068 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 12:32:02 +00:00
orbiter
844f158686 - removed dependencies in header framework:
moved http date methods from DateFormatter to HeaderFramework
  changed logging to log4j
- added ftp load access to MultiProtocolURI
- ensured termination of RSS feed iteration

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7067 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 11:41:12 +00:00
orbiter
80ba543d4c svn fix for uppercase problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7066 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 01:16:17 +00:00
orbiter
5e7081cd19 refactoring towards a unified loading mechanism for MultiProtocolURIs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7065 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 01:08:56 +00:00
orbiter
caece04f26 removed System.err and System.out usage from FTPClient; changed logging to log4j (preferred in yacy.cora)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7064 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-22 22:51:31 +00:00
orbiter
90531f78ff refactoring of the cora package to get subpackages for http and ftp (smb to come)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7063 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-22 22:32:39 +00:00
sixcooler
661867923a ... migrating to HttpComponents-Client-4.x ...
The Client is dead, long live the Client!
(no references to the old client)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7060 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-22 17:38:27 +00:00
orbiter
7aa860c505 - more logging
- more stability for database heap in case of buffer failure

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7058 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-21 10:16:05 +00:00
orbiter
4d5446d641 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7057 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-21 00:08:36 +00:00
orbiter
66ac3a7d9d corrected database row iteration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7055 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-20 23:33:56 +00:00
orbiter
dfd416e3fb removed a mysterious image buffer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7054 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-20 23:13:59 +00:00
orbiter
e10cd115a9 - added a new RSS reader interface. This is not finished but you can now load and look at RSS feeds. It will be used to index RSS feeds in a way that is appropriate for such kind of data.
- refactoring of Mediawiki and PHPBB3 loader interface names (just renamed)
- removed two old not used RSS loader interfaces
- fixed a bug in RSS parser library of cora
- added a new RSS parser component to the set of yacy document parsers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7053 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-20 11:30:02 +00:00
orbiter
933dc1a600 removed old rss parser (will be replaced with parser from cora package)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7052 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-20 07:42:38 +00:00
orbiter
70dd26ec95 added the new crawl scheduling function to the crawl start menu:
- the scheduler extends the option for re-crawl timing. Many people misunderstood the re-crawl timing feature because that was just a criteria for the url double-check and not a scheduler. Now the scheduler setting is combined with the re-crawl setting and people will have the choice between no re-crawl, re-crawl as was possible so far and a scheduled re-crawl. The 'classic' re-crawl time is set automatically when the scheduling function is selected
- removed the bookmark-based scheduler. This scheduler was not able to transport all attributes of a crawl start and did therefore not support special crawling starts i.e. for forums and wikis
- since the old scheduler was not aber to crawl special forums and wikis, the must-not-match filter was statically fixed to all bad pages for these special use cases. Since the new scheduler can handle these filters, it is possible to remove the default settings for the filters
- removed the busy thread that was used to trigger the bookmark-based scheduler
- removed the crontab for the bookmark-based scheduler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7051 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-19 23:52:38 +00:00
orbiter
5a994c9796 added a scheduler based on API actions
- every process that is monitored with the API Steering interface can now be scheduled!
- added input methods in Steering interface to set a scheduling time
- added a view on the steering api that shows only crawl jobs inside the Crawl Profile servlet
- added a scheduling call process in the cleanup process handler that triggers the scheduled processes
This causes that the cleanup now also looks for scheduled processes. Such processes are therefore not executed at
the same time as given in the target execution time but they will be executed within the cleanup process time window.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7050 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-19 12:13:54 +00:00
orbiter
189a986ebd - modified api-call interface to record api calls with references to api-call database (carries pk)
- added recording date, last execution date and next execution date for a scheduler (scheduler to be implemented next)
- extended database access methods for more data formats, especially for date insert/retrieval
- extended 'Steering' interface to show new database fields
- migrated Steering to new http client
- extended cora http client to transmit authentication and also added some convenience methods (http response code)
- simplified database back-end (not so much specialized methods for multiple properties)
- extended date formatter to produce a special format to show dates in html (&nbsp; in spaces of date format)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7049 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-18 15:56:38 +00:00
orbiter
054c22e2c6 added TLDs from http://www.opennicproject.org
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7047 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-18 10:39:49 +00:00
orbiter
86d7f8a989 - the web visualization can now be generated in custom color
- added input fields in WatchWebStructure_p.html
- introduced enum classes for Draw Mode and Filter Mode

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7044 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-17 10:44:00 +00:00
orbiter
7fdb17bb96 redirect uncaught exceptions to logging + small other changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7042 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-16 12:33:06 +00:00
orbiter
a82a93f2fc - better url double check in crawler
- more logging for error urls

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7032 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-11 09:54:18 +00:00
sixcooler
a6ed6e8cb9 ... migrating to HttpComponents-Client-4.x ...
make the occurrence of multiple header-keys possible

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7031 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-10 21:22:30 +00:00
orbiter
171f2bd84e - removed unused network oanet
- added new network definition 'allip' which can be used in networks where intranet and internet-addresses shall be indexed
- added a auto-switch-off for global search if there are no global peers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7030 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-09 23:41:17 +00:00
sixcooler
1802c54317 LGPL-Header
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7029 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-09 14:38:49 +00:00
orbiter
a835a22b32 fixed isLocal() property (better recognition of intranet hosts)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7028 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-09 11:22:56 +00:00
orbiter
670c746dc5 dual-licensed HttpConnectionInfo for LGPL
original GPL license holder granted dual-licensing by email

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7024 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-07 23:03:10 +00:00
orbiter
301a59e07f moved browser access method from kelondro/util/OS to gui/framework/Browser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7022 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-05 10:49:58 +00:00
orbiter
ec72387165 added a very early test version of a YaCy gui component.
The gui currently does nothing else than providing a search window that sends the search string to the browser
The gui is started when YaCy is started with the option -g or --gui, like
./startYACY.sh -g
The gui will primary be used to provide a 'real' macintosh version that can be started and operated like any other macintosh application. A special mac application wrapper will follow.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7021 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-05 10:43:03 +00:00
sixcooler
d88b9606d1 fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2923
+ some client fine tune

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7020 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-04 16:58:33 +00:00
orbiter
6388a58fc7 better memory management and slightly less (in total and temporary) RAM allocation:
- confirm that database objects that are not supposed to grow do not have a index memory management that is designed for growth
- changed index sorting method in such a way that it allocates less objects during quicksort
- database classes classes renaming (shorter, naming addresses that objects hold in RAM)
- added a large number of asserts to check if objects actually take the RAM that they should have


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7019 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-04 13:33:12 +00:00
orbiter
5924a0d851 - enhanced concurrency in database index access for multicore
- added statistics about database index caches in PerformanceMemory_p.html
- adoped many classes to use the new statistics
- added missing close statements

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7018 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-03 04:58:48 +00:00
orbiter
55a2536bcf enhancement in drawing speed and reduction of object allocation during drawing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7017 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-03 02:44:08 +00:00
orbiter
9ab06bc333 enhancement in sorting efficiency (database root operation): less object allocation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7015 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-03 02:42:28 +00:00
sixcooler
39d96abbb5 fix yacyRelease download
(http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2920&p=20545#p20545)
better cookie policy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7014 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-02 20:13:20 +00:00
sixcooler
349e4dee9d ... migrating to HttpComponents-Client-4.x ...
added cookie policy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7012 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-02 14:16:44 +00:00
sixcooler
c29f24a519 ... migrating to HttpComponents-Client-4.x ...
- Proxy
- Release-download

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7011 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-01 22:35:11 +00:00
orbiter
d5c65b17a6 added another network activity visualization: show strong query activity as radiation around peer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7006 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-28 11:40:58 +00:00
orbiter
989948e1a9 fixed generic image parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7005 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-27 07:13:15 +00:00
orbiter
e1015ead2c static access to constants
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7004 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-27 06:52:58 +00:00
orbiter
27d8a8b53e removed wrong com.sun.codec class access in generic image parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7003 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-27 06:49:09 +00:00
orbiter
bbf887d879 added generics to UPnP classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7002 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-27 06:48:01 +00:00
sixcooler
15e8c13526 ... migrating to HttpComponents-Client-4.x ...
(gzip decompression, httploader, robots, ...)

+ enable proxy-crawling while log is fine

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7001 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-27 01:16:26 +00:00
mikeworks
b12db14b9f Added Generics to new net.yacy.upnp.* classes to eliminate compiler warnings
Added @Deprecated for deprecated functions getIPDevices and getPPPDevices in class InternetGatewayDevice
Changed debug statement in Domains.java and corrected filename in comments header

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6993 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-24 13:48:45 +00:00
sixcooler
b7102eff92 ... migrating to HttpComponents-Client-4.x ...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6989 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-22 23:08:37 +00:00
mikeworks
572e429eff - fixes UPnP not working discussion on forum: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2881
SVN 6987 fixed net.yacy.upnp.devices.UPNPRootDevice for usage with JxPath > 1.3 by using a default namespace (xmlns="urn:schemas-upnp-org:device-1-0")
This commit now fixes the same problem for net.yacy.upnp.devices.UPNPService with default namespace (xmlns="urn:schemas-upnp-org:service-1-0")

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6988 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-22 19:13:37 +00:00
mikeworks
2a20282505 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6987 6c8d7289-2bf4-0310-a012-ef5d649a1542 2010-07-22 19:07:02 +00:00
lotus
965aa97993 including sbbi upnplib as source again
http://www.sbbi.net/site/upnp/index.html

renamed package to yacy
all options are also named "yacy" instead of "sbbi"

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6986 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-22 18:02:16 +00:00
orbiter
60caade056 removed debug output
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6984 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-22 07:59:22 +00:00
sixcooler
52718e6dcb ... migrating to HttpComponents-Client-4.x ...
monitoring: replaced unused 'idletime' by uploading bytes
added some kind of 'upload-throttling' at dht-out :-)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6983 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-22 00:51:41 +00:00
sixcooler
5fa8038f10 ... migrating to HttpComponents-Client-4.x ...
monitoring and first try to use remoteProxy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6979 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-20 01:14:28 +00:00
orbiter
dec1419bc3 ;-)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6978 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-18 20:18:32 +00:00
orbiter
22dbbcfa56 better (and corrected) recognition of intranet and internet-addresses. This corrects the isLocal property that is used by network definitions to restrict index ranges to local and global addresses. Address locations (intranet or internet) had been partly identified by the top level domain of the host address. Since intranet addresses can also be addressed using a host name that is in a country domain it is necessary to do a dns resolving for each check. The check is supported by a local dns cache so the intranet/internet check should not affect network traffic too much. To ensure that the cache works properly the cache class was upgraded to better concurrency data structures.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6977 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-18 20:14:20 +00:00
orbiter
8674a65488 removed override directive which caused a compile error in eclipse helios
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6974 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-18 18:37:20 +00:00
low012
dc5f0e357c *) fixed SVN properties
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6972 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-18 10:02:03 +00:00
low012
01d6b952f0 *) minor changes for easier to read code, no functional changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6971 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-18 10:00:43 +00:00
sixcooler
0e56d29335 ... migrating to HttpComponents-Client-4.x ...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6968 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-15 00:59:53 +00:00
sixcooler
2ad5829b26 correct Timeoutparamter at HttpComponents-Client-4.x
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6967 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-14 02:47:40 +00:00
sixcooler
e1316d12d0 ... migrating to HttpComponents-Client-4.x ...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6966 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-13 22:10:24 +00:00
sixcooler
c5c67f0504 start migrating to HttpComponents-Client-4.x
see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2872

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6965 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-12 23:07:05 +00:00
orbiter
25024d6ab2 fix for problen when accessing the metadata index. The index was not available for all peers with no RAM table copy.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6957 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-30 07:22:50 +00:00
orbiter
b6fb239e74 redesign of parser interface:
some file types are containers for several files. These containers had been parsed in such a way that the set of resulting parsed content was merged into one single document before parsing. Using this parser infrastructure it is not possible to parse document containers that contain individual files. An example is a rss file where the rss messages can be treated as individual documents with their own url reference. Another example is a surrogate file which was treated with a special operation outside of the parser infrastructure.
This commit introduces a redesigned parser interface and a new abstract parser implementation. The new parser interface has now only one entry point and returns always a set of parsed documents. In case of single documents the parser method returns a set of one documents.
To be compliant with the new interface, the zip and tar parser had been also completely redesigned. All parsers are now much more simple and cleaner in its structure. The switchboard operations had been extended to operate with sets of parsed files, not single parsed files.
additionally, parsing of jar manifest files had been added.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6955 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-29 19:20:45 +00:00
low012
d4851441b0 *) Added Android packages to parser in order to be able to create a decentralized search for direct downloads of Android apps.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6953 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-28 20:41:08 +00:00
orbiter
150cf42a1b migrated all my LGPL 3 -licensed files to the LGPL 2.1 because LGPL 3 is not compatible to the GPL 2
see http://www.gnu.org/licenses/license-list.html for explanation
Since (as far as I know) nobody else has ever contributed to these files I may be allowed to just apply an older license.
You may consider this as a dual-licensing and may use and optionally replicate the older files under GPL 3.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6952 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-28 16:25:14 +00:00
orbiter
5d00888c95 - added animated visualization for DHT-in and DHT-out in network graphic
- found and fixed a possible memory leak in YaCy internal RSS feed system
- some refactoring in RSS feed mechanisms to make this possible

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6950 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-27 10:45:20 +00:00
orbiter
bf25407fdd added peer hash to internal RSSFeed. The hash will be used to display news activities in the network graphic.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6949 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-26 23:10:57 +00:00
orbiter
1557e0f2d0 - some refactoring for internal RSSFeed (protocol of all actions as seen on status page)
- added dht-out to internal RSSFeed (you can see now messages about distributed indexes on status page)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6948 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-26 22:39:27 +00:00
orbiter
5a4684f21f allow words with length >= 2 (you can't search for 'wm' with 3-letter words...)
lets try that. If we run into a memory problem because of too many 2-letter-words, then we must introduce whitelists for 2-letter words.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6947 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-26 16:31:26 +00:00
orbiter
37b8827a7a - removed the UPnP library sources from sbbi and added the jar library again. The library was included to get support for fedora releases, but after this time the fact that the sbbi cannot be part of fedora should be re-discussed. If this will still not be possible, then we may integrate the sbbi UPnP package using reflection.
- cleaned uo the code. The new eclipse helios provided new warnings for dead code. This change cleans up most of these warnings

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6945 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-26 10:32:47 +00:00
orbiter
103c848af8 enhancements in image drawing speed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6942 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-24 13:20:45 +00:00
orbiter
777195e8d1 more abstraction for access of LoaderDispatcher and cache
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6937 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-22 12:28:53 +00:00
orbiter
7bcfa033c9 more abstraction of the htcache when using the LoaderDispatcher:
a cache access shall not made directly to the cache any more, all loading attempts shall use the LoaderDispatcher.
To control the usage of the cache, a enum instance from CrawlProfile.CacheStrategy shall be used.
Some direct loading methods without the usage of a cache strategy have been removed. This affects also the verify-option
of the yacysearch servlet. If there is a 'verify=false' now after this commit this does not necessarily mean that no snippets
are generated. Instead, all snippets that can be retrieved using the cache only are presented. This still means that the search hit was not verified because the snippet was generated using the cache. If a cache-based generation of snippets is not possible, then the verify=false causes that the link is not rejected.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6936 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-21 14:54:54 +00:00
orbiter
7e2d6fac12 patch for bad values during local search join
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6934 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-20 00:31:00 +00:00
orbiter
986d4f34d9 added a consistency check for new queues
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6931 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-18 18:59:42 +00:00
orbiter
73f03e05ee fixed a bug in snippet fetch strategy: cache only does not help if resource can only be found in web
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6930 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-18 15:25:25 +00:00
orbiter
fbf021bb50 redesign of index abstract processing - currently disabled until enough peers have fix in SVN 6928
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6929 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-18 09:44:21 +00:00
orbiter
87087f12fe - scanned remote search process and enhanced some data structure and synchronizations here and there
- removed concurrency overhead for small number of index normalizations as it happens during remote search
- removed 'load only parseable' constraint for snippet fetch because some resources may not have any url file extension and these had therefore not been parseable and searcheable since they may become parseable after loading when their mime type is known
- this partly fixes some problems with http://forum.yacy-websuche.de/viewtopic.php?p=20300#p20300 but more changes are necessary to get all expected search results

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6926 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-17 11:59:40 +00:00
orbiter
7ddb70e7c6 new license for ai.greedy component: LGPL (nobody else than me modified that code)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6925 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-16 22:16:03 +00:00
orbiter
de4f30bb2e UTF-8 fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6923 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-16 15:22:31 +00:00
orbiter
3a1cebb598 bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6922 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-16 15:11:21 +00:00
orbiter
51332b787d reverted SVN 6869 as discussed with dulcedo in car after LinuxTag:
missing time-out may be cause of locks during DHT-out

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6920 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-15 20:30:53 +00:00
orbiter
b03caaa57a better handling of OOM situations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6918 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-15 19:44:05 +00:00
orbiter
353a924760 - changed default memory to 500m
- now xms is lower than xmx (lets try what happens)
- removed default path for intranet crawl starts to avoid confusion as seen on linuxtag
- added time-out to upnp request (i have a new router which may need that)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6916 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-14 21:36:40 +00:00
orbiter
60e71876ad - more abstraction (HashMap -> Map)
- more concurrency-awareness (HashMap -> ConcurrentHashMap)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6910 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-01 13:02:11 +00:00
orbiter
a83772c71b fixes and enhancements for balancer:
- crawl lists for each domain now uses a HandleSet which should use less memory than LinkedLists
- but: fill more entries into the domain lists (all available entries)
- fixes to selection criteria (best domain selection)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6909 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-01 09:30:23 +00:00
orbiter
9cde05418f fixed url crawl list display
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6908 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-31 00:27:00 +00:00
orbiter
2eea806005 less errors in image parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6907 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-30 11:18:05 +00:00
orbiter
3f93a0cc8f redesign of remote proxy settings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6903 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-26 00:01:16 +00:00
orbiter
11639aef35 - added new protocol loader for 'file'-type URLs
- it is now possible to crawl the local file system with an intranet peer
- redesign of URL handling
- refactoring: created LGPLed package cora: 'content retrieval api' which may be used externally by other applications without yacy core elements because it has no dependencies to other parts of yacy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6902 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-25 12:54:57 +00:00
orbiter
6950d8a33d fixes to SMB crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6900 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-23 01:17:44 +00:00
orbiter
98c1d65415 - show up to 10 locations (maps) after search (instead of a max of 5)
- order locations by (primary) population and (secondary) longitude (reverse ordering, both)
- added population from GeoNames, OpenGeoDB does not have that information
- changed default viewpoint of map to (30,15); shows more land and europe in the center

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6893 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-21 08:18:04 +00:00
orbiter
9842fab6e4 - fixes to query parameter
- replaced/removed search query attribute (was old style, new is 'query' according to SRU)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6892 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-20 22:05:04 +00:00
orbiter
1defd580bc - added option to localization search to distinguish between a search for a location according to the search word only or for the relation between a web search results and locations found in the metadata fields
- used that to display two layers on map: cities and search result locations
- added many marker grafics for the display of the markers on the map
- some refactoring of the yacy news code plus bugfixes for latest move from Tree to Table data structure

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6889 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-19 12:53:09 +00:00
orbiter
bd0a9df895 fix for bad location double check
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6884 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-18 11:54:30 +00:00
orbiter
e43e61e502 added another geolocalization data source: GeoNames
- added downloader option in DictionaryLoader
- added generalization (interfaces and overarching localization)
- more abstraction using the libraries

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6879 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-15 23:49:30 +00:00
orbiter
118d589eff replaced the very very old data structure 'Records' with a simple table to fix the problem from
http://forum.yacy-websuche.de/viewtopic.php?p=20066#p20066

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6876 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-15 00:59:02 +00:00
orbiter
2a8f70f0ca - fix for caching of OSM tiles. if you want that this fix applies to your peer, please delete the crawl profiles
- fix for initial generation of crawl profiles (one more reason to remove your crawl profiles)
- more String -> byte[] migration
- more logging for cache store/hit

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6874 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-14 23:50:07 +00:00
orbiter
2126c03a62 - removed download-limit that can be given for the crawler for non-crawler download tasks. This was necessary because the same procedure was used for other downloads like for the download of dictionary files where a limit is not useful. The limit still stays for the indexer
- migrated the opengeodb downloader to a new version of the opengeodb-dump


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6873 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-14 18:30:11 +00:00
orbiter
439b44be9e removed exit from computation in ReferenceContainerArray.get merge method
an warning is still given, but method computes at normal operation
see also: http://forum.yacy-websuche.de/viewtopic.php?p=20038#p20038

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6869 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-11 23:36:40 +00:00
orbiter
789c6b26ce added a location search service: using the following servlet/example:
http://localhost:8080/yacysearch_location.kml?query=berlin&maximumTime=2000&maximumRecords=100

This will open any application that can consume kml data (which will probably be google earth) on your computer and displays the search result as positions on a map


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6865 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-11 12:58:05 +00:00
orbiter
f23cbd2dab more bugfixes to date parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6864 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-11 11:32:46 +00:00
orbiter
cf43bdc87e This is a large bugfix and enhancement commit to support a better location detection for data
- fixes to http file server session handling
- fixes and enhancements to metadata date/time handling
- added dc:publisher metadata field and updated all document parser
- fixed bug in metdata read procedure
- enhanced dublin core and rss parser to understand more fields more properly
- enhanced url selection in case that multiple urls are given in surrogates
- fix for condenser; failure when last word does not end with termination symbol

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6863 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-11 11:14:05 +00:00
orbiter
6eba2cb96b fix in bmp parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6862 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-09 13:27:58 +00:00
orbiter
c45117f81f fixed dates in metadata
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6860 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-08 22:09:36 +00:00
orbiter
0a5fd15703 :-(
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6859 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-06 22:06:31 +00:00
orbiter
ac16f582aa fix for http://forum.yacy-websuche.de/viewtopic.php?p=20017#p20017
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6858 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-06 22:04:30 +00:00
orbiter
a7d038bb7a The oai ListFriends source list becomes configurable: just write them into defaults/oaiListFriendsSource.xml
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6857 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-06 10:01:37 +00:00
orbiter
5fbf866cae - fixed resumption token generation for oai-pmh import
- relaxed dublin core parsing: the dc:reference tag may replace dc:identifier if this does not contain a valid url
- parsing of completeRecords number and presentation in the download list of oai import

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6850 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-02 22:20:24 +00:00
orbiter
fc5efcc05a enhanced and fixed OAI-PMH import
- now importing OAI-PMH server list fron two sources
- simultanous import from several servers (even > 2000)
- check buttons on OAI-PMH server list to select multiple servers for import start
- it is possible to select all servers at once for import
- imported XML data is gzipped after import from surrogate reader

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6847 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-30 14:03:51 +00:00
orbiter
455a763d7c performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6845 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-28 08:38:57 +00:00
orbiter
b6cce08019 fixed a bug in rwi storage data size allocation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6843 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-27 22:22:16 +00:00
orbiter
90c3e5d6f6 - cleanup, removed unused imports
- added crawling queue sizes to /api/status_p.xml, syntax same as in queues_p.html
- fixed a bug in queue enumeration that caused a out of bounds exception

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6842 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-27 21:47:41 +00:00
orbiter
b18a7606a0 some performance hacks and fixed after reading dump in
http://forum.yacy-websuche.de/viewtopic.php?p=19920#p19920

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6837 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-25 21:37:36 +00:00
orbiter
2bc3cba6f1 - fix for 'do not write to cache' rule.
- do not read from cache if byte[] array is still filled from response object (will do less IO)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6836 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-24 08:22:45 +00:00
orbiter
4cd5418963 removed finalize methods because of a hint in
http://java.sun.com/javase/6/webnotes/trouble/TSG-VM/html/memleaks.html#gbyvh

The finalize method prevents that the memory, used by the objects containing the finalize method, is collected and available for the garbage collector. Instead, the memory allocated by such classes are enqueued to a java-internal finalize queue runner. This slows down all operations that uses a lot of object containing finalize methods.

this fix does not remove all finalize method, but such that may be used for throw-away objects that are allocated many times. This should cause a better run-time performance and less OutOfMemoryErrors 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6835 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-23 09:32:29 +00:00
orbiter
cff8ed134f added index check to prevent blocking in synchronization
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6832 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-22 22:16:38 +00:00
orbiter
5ab5ac80fe fix for NPE in TextParser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6830 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-21 22:35:47 +00:00
orbiter
b95ae2518b fix for assert
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6829 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-21 17:59:22 +00:00
orbiter
3247f0e901 fix for deadlocks caused by self-blocking access to TreeMap in concurrent environments. The TreeMap was replaced by a ConcurrentHashMap and additional care that the strings are compared all in lowercase
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6828 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-21 13:46:02 +00:00
orbiter
027b971bde fix for concurrent quicksort: catch jobs from ThreadPoolExecutor that had been rejected because of full processing queues.
Non-catched jobs may have been the cause for blockings and freezes in case of overloading during strong processing

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6827 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-21 13:44:59 +00:00
orbiter
8c40f1cb8e self-healing for broken table files (may cause other problems, but better than nothing)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6826 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-21 11:29:27 +00:00
orbiter
7b69d79727 enhanced remove() operation: in many cases it is not necessary to return the removed object to the called.
for such cases the delete() operation was introduced which is sometimes much cheaper in operation since it does not need to create objects to hold the removed content and it does not need to read those objects.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6824 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-20 14:47:41 +00:00
orbiter
93ea0a4789 enhanced remove operation in search consequences (which are triggered when the snippet fetch proves that the word has disappeared from the page that was stored in the index)
- no direct deletion of referenced during search (shifted to time after search)
- bundling of all deletions for the references of a single word into one remove operation
- enhanced remove operation by caring that the collection is stored sorted (experimental)
- more String -> byte[] transition for search word lists
- clean up of unused code
- enhanced memory allocation of RowSet Objects (will use a little bit less memory which was wasted before)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6823 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-20 13:45:22 +00:00
orbiter
7a59012632 fix for NPE
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6822 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-20 07:43:48 +00:00
orbiter
1a6c2f77b4 fix for NPE in statistic servlet
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6821 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-20 00:08:43 +00:00
orbiter
64f29f990e a collection of performance hacks and code cleanup:
- removed usage of URL-Caches which could have been a memory leak
- removed unused classes and methods
- removed not necessary synchronizations
- added synchronization hacks where possible
- fine-tuned crawling speed to prevent IO of balancer
- fixed a bug in IODispatcher that may have caused that no merges were done
- reduced number of parameters in very often called methods (compare methods)
- reduced complexity of data structures of now massively used HandleSet class
- reduction of new String() and getBytes() usage / new methods to support this transition

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6820 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-19 16:42:37 +00:00
orbiter
8b8107b2a3 reduced IO-load and synchronization/blocking
- enhanced the Balancer performance when building new domain stacks using a new Table buffer
- added the new Table buffer BufferedObjectIndex class
- changed order of access to LURL-read (prefereing segment over Crawl Queues) will reduced blocking time on balancer
- fixed PPM setting in Crawler_p servlet (had doubled values)
- reduced synchronization in IndexCell because it is not necessary: reduced blocking during indexing/merging/dumping
- removed did-you-mean cache in IndexCell because that caused too much overhead and more memory usage but was not very useful. This reduced also deadlocks that could be causes when searched are performed during indexing.




git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6819 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-18 21:55:20 +00:00
orbiter
ed07046870 flush only when > 3000 RWIs present + code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6817 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-16 16:07:19 +00:00
orbiter
3a50b5aa04 enhanced object hash computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6816 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-15 14:19:29 +00:00
orbiter
1a8a134e0c continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775 and continued in SVN 6790
The result should be a less usage of new String() and less memory usage (since a String-encapsulated byte[] has 40 bytes overhead)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6815 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-15 13:22:59 +00:00
orbiter
dde394a977 - shifted some computation out of synchronization to allow more concurrency
- removed synchronization where not necessary

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6814 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-14 23:22:06 +00:00
orbiter
f204076d25 removed usage of temporary files: causes too much IO
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6813 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-14 22:17:18 +00:00
orbiter
650be3599f added a time-out to the RWI cache to flush the cache if it has not been written for ten minutes. This additional dump criteria is necessary because some data sources repeat their vocabulary and may cause that the number of words in a RWI does not increase while the number of references in the RWI set increases. Now the RWI Buffer is flushed every 10 minutes or later if at that time already a dump is ongoing.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6811 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-14 20:30:34 +00:00
orbiter
ff6cf24b80 replaced RowSetArray in ObjectIndexCache with RowSet to reduce complexity in MergeIterator. This complexity caused too much computing overhead when the RowSetArray had become very large.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6810 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-14 19:26:51 +00:00
orbiter
55d8e686ea performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6807 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 23:29:55 +00:00
orbiter
2f181d0027 introduced concurrency in HTCACHE storage compression
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6806 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 16:22:09 +00:00
orbiter
2e26744f4e more concurrency when normalizing RWI entries + cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6805 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 14:47:57 +00:00
orbiter
aa083fc45c try to get a fix for OOM problem in case that there is no real problem with missing memory.
See also http://forum.yacy-websuche.de/viewtopic.php?p=19835#p19835

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6802 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 11:39:54 +00:00
orbiter
70e6222978 more concurrency during search requests
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6801 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 11:12:36 +00:00
low012
dc93cec3a8 *) Java 1.5 compatibility (see http://forum.yacy-websuche.de/viewtopic.php?f=8&t=2764)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6796 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-13 00:25:46 +00:00
orbiter
67ec58d8e7 search performance enhancement
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6795 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-12 07:31:43 +00:00
hermens
ef467a0303 Another workaround for the second part of http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2770
This should prevent URLs with bad referrer entries from being dropped by transferURL or even crashing the whole Transmission$Chunk


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6792 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-10 13:57:46 +00:00
orbiter
25aef069a6 continuing String-hash - to - byte[]-hash redesign that was started in SVN 6775
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6790 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-04-08 00:11:32 +00:00
orbiter
1e8e79b9ef redesign of reference hash (URL-hash) parameter hand-over:
pass value as byte[], not as String. This should cause that less
byte[] <-> String conversions are made during time-critical tasks.
This redesign is not yet complete, more to come ..

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6775 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 18:33:20 +00:00
orbiter
72d8e9897b removed unnecessary cache flush call in backend of BufferedRecords
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6774 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 12:44:13 +00:00
orbiter
749ffbd642 - added another catch case for the index dump and index merge process that should cause non-blocking behavior in case that index dump and/or index merge caused any unexpected exception.
- reverted SVN 6766, this is too dangerous (may cause unexpected memory usage) and should not be necessary

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6773 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 10:46:40 +00:00
orbiter
9ddb8e4a43 set an option for the java-internal image parser that prevents that the image is cached using the file-system in a temporary file. This should speed up image parsing during image indexing dramatically and should also cause better performance when showing the yacy banner and OSM tiles.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6772 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 10:43:31 +00:00
orbiter
312ca5d917 removed flush at end of every rwi entry since this reduces the write performance.
This should speed up RWI cache dump and RWI merge operations and should cause less blocking time during these processes for the indexer.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6771 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 10:41:20 +00:00
orbiter
0018163c07 moved table row/column matching method from front-end to back-end
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6770 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-26 10:01:27 +00:00
orbiter
31e29a8831 - removed synchronization during index dump and index cleaning
- added semaphores to synchronize index dump and index cleaning for each process separately

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6767 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-25 07:09:53 +00:00
orbiter
6c093d6aed - enhanced domain navigator computation
- fixed domain navigator content in case that a mustmatch constraint was given

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6763 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-23 13:41:41 +00:00
orbiter
bb63c5d075 using a Pattern object with precompiled regular expressions to apply must-match constraints to search results: should speed up pre-sorting of search results and should cause richer search result sets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6762 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-23 10:17:28 +00:00
orbiter
e0da0a84b0 performance fix in http parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6760 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-22 09:12:52 +00:00
orbiter
90dd197ae7 - no latency for local crawls
- catch interrupted exception during 'fast' crawls in workflow processor

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6759 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-22 09:12:18 +00:00
orbiter
bfb518cd47 some refactoring to get the LoaderDispatcher a little bit more independent from the switchboard
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6755 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-20 10:28:03 +00:00
orbiter
36bd843ece for for RFC5322 comformance as suggested by Quix0r in http://forum.yacy-websuche.de/viewtopic.php?p=19585#p19585
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6754 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-20 10:23:47 +00:00
orbiter
748abfcffa added patches to prevent yacy-protocol DoS settings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6751 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-19 15:31:15 +00:00
orbiter
e820ed061a avoiding excessive DNS lookups to determine localhost
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6750 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-19 14:28:25 +00:00
orbiter
11983bc936 redesigned some parts of the parser entry point:
- in all cases that the parser is entered it is a whole set of possible parsers computed according to given mime type and file extension,
that means that all parsers are considered where the registered mime acceptance and extension acceptions matches.
that may cause that several parsers are tried for the same file which will cause a success in cases where there was only the mime type was used to choose the right parser and the mime type was given wrongly by the host httpd.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6749 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-19 13:04:42 +00:00
orbiter
89b4fff1c2 adopted ant script for new exif library
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6746 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-12 12:36:38 +00:00
orbiter
24e5faee75 added exif parsing for jpg images
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6745 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-12 12:23:38 +00:00
orbiter
82f76e1296 removed log line
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6744 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-11 20:31:38 +00:00
orbiter
0f8004f9da enhanced html parser to recognize a href tags inside header tags
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6743 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-11 17:52:07 +00:00
orbiter
3300930fc5 - (almost) fixed FTP crawler
- integrated/fixed SMB crawler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6742 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-11 15:43:06 +00:00
orbiter
1198b9989d bugfixes, more sorttable
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6739 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-10 15:39:36 +00:00
orbiter
9623d9e6d2 added a smb loader component for the YaCy crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6737 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-10 08:55:29 +00:00
orbiter
ae2f3f000f better handling of table copy abandon .. prevent memory leak
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6734 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-09 13:32:15 +00:00
orbiter
0769517129 added a robots.txt monitor in the crawler monitor submenu
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6733 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-09 11:31:15 +00:00
orbiter
72f00dee59 removed never-used server access account function
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6731 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-08 22:30:45 +00:00
orbiter
de01fe0e6d fix for bug in url parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6722 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-07 01:33:18 +00:00
orbiter
c4bdb1e7f2 added one more option in ViewFile to show an iframe like for the orginal web page content but using the cache than the direct link to the content in the web. Upgraded the very old and previously not any more used CacheResource_p servlet to a new and working version.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6719 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-06 23:41:51 +00:00
orbiter
1bbe14d23f SVN 6716 unfortunately contained parts of the unfinished SMB integration. To fix compile errors the remaining parts of the SMB implementation stub is added with this commit.
This adds the jcifs smb library.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6717 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-05 21:46:22 +00:00
orbiter
884b262130 - added a new Wiki Namespace Navigator
- some redesign of Navigator data structures

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6716 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-05 21:25:49 +00:00
orbiter
270fb38674 - fixed some bugs in Table viewer
- added 'select all' feature in Tables_p
- enhanced ViewFile.html: has now an input field to load arbitrary resources from the web and analyze them (!!!)
- included the ViewFile servlet into the Index Administration menu
- show in ViewFile if ressource is in url-db and/or in Web cache
- bugfixes to BEncodedHeap and Tables management

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6713 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-05 15:41:15 +00:00
orbiter
727dd9b193 - fixed a bug in robots.txt parser
- moved storage of robots.txt entries to WorkTables, so it is now possible to browse the robots entries with the table browser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6710 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-04 11:58:07 +00:00
orbiter
54af9e6b49 - added parsing of robots meta-tag in html headers to detect a noindexing request
- added evaluation and indexing prevention in case that a noindexing is given in a html file

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6709 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-03-03 23:32:56 +00:00
sixcooler
cd6de83905 next try for for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2703
(reverted 6692)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6694 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-23 15:59:58 +00:00
sixcooler
bfe4693e9a fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2703
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6693 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-23 13:46:56 +00:00
orbiter
564927ce72 redesign of CrawlResult data structures because of OOM occurrences during URL deletion processes.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6675 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-16 23:06:04 +00:00
orbiter
30c8185139 fix for sid check
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6673 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-15 23:31:32 +00:00
orbiter
ef62d017e5 integrated session id filtering for crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6672 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-15 23:15:17 +00:00
orbiter
d8d9984913 added framework for session id filtering (not ready yet)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6671 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-15 22:30:41 +00:00
orbiter
2bc36de336 - fix for bug in svn 6669
- cleanup

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6670 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-15 22:06:13 +00:00
orbiter
d378ca4604 better handling of concurrency in seed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6669 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-15 15:57:35 +00:00
orbiter
6538043d89 fix for http://forum.yacy-websuche.de/viewtopic.php?p=19189#p19189
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6668 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-15 15:45:31 +00:00
sixcooler
e071d71f19 fix for yacy-banner-network-values
http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2521

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6659 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-09 18:22:36 +00:00
sixcooler
787b588c33 reverted a part of svn6636:
- didn't work on blobs >2GB
- should be obsolete since svn6651
http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2652&sid=7fa98fd3edfc2a03f26394d545e3e3c1&p=19172#p19172

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6655 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-07 19:32:46 +00:00
lotus
11188cd7eb resource observer now uses the Java 6 method to check for free space. thus, disk observing now needs Java 6 installed.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6652 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-06 18:48:06 +00:00
sixcooler
089877f32c my first commit - hopefully fix for merge problem
- http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2652

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6651 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-05 19:38:00 +00:00
orbiter
d6391f2537 better handling of rewrite cases where the resulting rewrite blob entry is equal in size
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6648 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-04 23:17:47 +00:00
orbiter
ef9473d92c added another sixcooler suggestion: recycle corrupted records
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6647 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-04 16:25:05 +00:00
orbiter
fe78edac32 - view API calls in correct date-order
- execute recorded API calls in date-order

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6646 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-04 15:51:54 +00:00
orbiter
308a973503 refactoring of tables data organisation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6644 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-04 11:26:23 +00:00
lotus
85ca96227f fix for re-enable parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6643 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-03 19:33:59 +00:00
orbiter
ada0ce9de3 refactoring of bookmarks: there is a big performance problem in the bookmarks code and furthermore the bookmarks
will loose its leading role for the re-crawl funtion when the new api tables will work. To be prepared for a replacement
of such functions the bookmark class is re-organised.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6637 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-01 22:18:56 +00:00
orbiter
3751ab4ae2 added sixcoolers patch and more checks/removed unnecessary code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6636 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-01 16:11:00 +00:00
orbiter
d8d8562c59 fill key with zeros during normalization
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6635 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-01 15:40:16 +00:00
orbiter
24060885b6 - added Tables abstraction in data.Tables.java
fix for
http://forum.yacy-websuche.de/viewtopic.php?p=18910#p18910
http://forum.yacy-websuche.de/viewtopic.php?p=18894#p18894
http://forum.yacy-websuche.de/viewtopic.php?p=18814#p18814


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6631 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-29 18:02:09 +00:00
orbiter
7fdf59a77f misc NPE check
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6630 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-29 15:59:24 +00:00
lotus
38a3d55afd added more possible php extensions for html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6621 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-24 20:04:31 +00:00
orbiter
4403304957 bugfix for list()
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6616 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-22 16:00:56 +00:00
orbiter
69c29acb6e no exception thread dump if parser cannot parse becuase that mime-type/extension is in the deny-set
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6611 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-22 13:21:37 +00:00
orbiter
0098e6e859 bugfix for heap iterator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6610 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-22 10:26:50 +00:00