6648 Commits

Author SHA1 Message Date
mikeworks
b143f6b169 ConfigHeuristics_p.html: XHTML 1.0 Strict Changes
- added empty action attribute to form
- replaced name attributes with id (name is not a valid attribute there in XHTML 1.0 Strict)
- changed the "for" target of the labels (so clicking a label now also activates its checkbox)
de.lng: Test with Subversion properties #2

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6982 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-21 22:40:34 +00:00
mikeworks
a9474b3caa German language file de.lng updated
- Removed one obsolete line in Blacklist_p.html
- Testing the new SVN Properties http://forum.yacy-websuche.de/viewtopic.php?f=15&t=2906

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6981 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-21 22:05:46 +00:00
orbiter
89b0f5bce8 fix for exception in http://forum.yacy-websuche.de/viewtopic.php?p=20418#p20418
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6980 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-21 11:26:08 +00:00
sixcooler
5fa8038f10 ... migrating to HttpComponents-Client-4.x ...
monitoring and first try to use remoteProxy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6979 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-20 01:14:28 +00:00
orbiter
dec1419bc3 ;-)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6978 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-18 20:18:32 +00:00
orbiter
22dbbcfa56 better (and corrected) recognition of intranet and internet-addresses. This corrects the isLocal property that is used by network definitions to restrict index ranges to local and global addresses. Address locations (intranet or internet) had been partly identified by the top level domain of the host address. Since intranet addresses can also be addressed using a host name that is in a country domain it is necessary to do a dns resolving for each check. The check is supported by a local dns cache so the intranet/internet check should not affect network traffic too much. To ensure that the cache works properly the cache class was upgraded to better concurrency data structures.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6977 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-18 20:14:20 +00:00
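The isLocal check described in commit 22dbbcfa56 above can be sketched roughly as follows. This is a minimal illustration, not the YaCy implementation: the class name LocalAddressCheck, the methods resolve and isLocal, and the choice to treat unresolvable hosts as not local are all assumptions. It classifies a host by its resolved address instead of its top-level domain and keeps resolved addresses in a concurrent map so repeated checks do not resolve again.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch; names are illustrative and not taken from the YaCy sources.
class LocalAddressCheck {

    // Concurrent cache (host name -> resolved address) that threads can share safely.
    private static final ConcurrentMap<String, InetAddress> CACHE = new ConcurrentHashMap<>();

    // Resolve a host once and remember the result for later checks.
    static InetAddress resolve(String host) throws UnknownHostException {
        InetAddress cached = CACHE.get(host);
        if (cached != null) return cached;
        InetAddress resolved = InetAddress.getByName(host); // IP literals need no DNS query
        InetAddress previous = CACHE.putIfAbsent(host, resolved);
        return previous != null ? previous : resolved;
    }

    // Classify by the resolved address, not by the top-level domain of the name.
    static boolean isLocal(String host) {
        try {
            InetAddress a = resolve(host);
            return a.isLoopbackAddress() || a.isSiteLocalAddress()
                || a.isLinkLocalAddress() || a.isAnyLocalAddress();
        } catch (UnknownHostException e) {
            return false; // assumption: unresolvable hosts are treated as not local
        }
    }

    public static void main(String[] args) {
        System.out.println("192.168.1.10 local? " + isLocal("192.168.1.10"));
        System.out.println("8.8.8.8 local? " + isLocal("8.8.8.8"));
    }
}
```

putIfAbsent is used instead of computeIfAbsent because the resolver throws a checked exception; two threads may resolve the same host at the same time, but only one result is kept.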
low012
2d2771a12e *) more HTML fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6976 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-18 19:21:59 +00:00
low012
eb8550526d *) fixed small HTML bug
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6975 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-18 18:40:41 +00:00
orbiter
8674a65488 removed override directive which caused a compile error in eclipse helios
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6974 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-18 18:37:20 +00:00
mikeworks
b4d5bb6a3e Steering.html: Changed link from Settings_p.html to ConfigAccounts_p.html for setting an Administrator password that does not exist yet
de.lng: Added missing translations for Steering.html during restart/update

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6973 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-18 13:31:44 +00:00
low012
dc5f0e357c *) fixed SVN properties
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6972 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-18 10:02:03 +00:00
low012
01d6b952f0 *) minor changes for easier-to-read code, no functional changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6971 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-18 10:00:43 +00:00
low012
0e6fed1fb6 *) fewer HTML errors (according to https://addons.mozilla.org/de/firefox/addon/249/)
*) followed some suggestions by PMD

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6970 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-18 09:10:46 +00:00
low012
2d263a7157 *) fewer HTML errors (according to https://addons.mozilla.org/de/firefox/addon/249/)
*) Is line 112 there on purpose or can it be deleted?

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6969 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-17 19:00:54 +00:00
sixcooler
0e56d29335 ... migrating to HttpComponents-Client-4.x ...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6968 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-15 00:59:53 +00:00
sixcooler
2ad5829b26 corrected timeout parameter for HttpComponents-Client-4.x
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6967 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-14 02:47:40 +00:00
sixcooler
e1316d12d0 ... migrating to HttpComponents-Client-4.x ...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6966 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-13 22:10:24 +00:00
sixcooler
c5c67f0504 start migrating to HttpComponents-Client-4.x
see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2872

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6965 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-12 23:07:05 +00:00
low012
2de0ded377 *) trying to fix bug described in http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2900
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6964 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-10 18:03:07 +00:00
mikeworks
d851758dc6 Added German translation for ConfigHeuristics_p.html to de.lng
Fixed Network -> Heuristics title tag of the page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6963 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-05 22:58:51 +00:00
orbiter
43e6ce62af use heuristics only if user is authenticated
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6962 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-05 21:52:02 +00:00
mikeworks
dcfb5b942d Updated German translation for Network.html in de.lng
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6961 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-05 21:26:37 +00:00
suessthomas
7feb549ce6 Small HTML-Fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6960 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-07-04 22:16:58 +00:00
orbiter
aa66da5135 corrected hint for debian installation update
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6959 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-30 14:31:16 +00:00
orbiter
7188c54ddb patch to get dht access to developer peers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6958 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-30 08:42:29 +00:00
orbiter
25024d6ab2 fix for problem when accessing the metadata index. The index was not available for all peers with no RAM table copy.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6957 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-30 07:22:50 +00:00
low012
8e88fa4a62 *) fixed indentation (tab vs. spaces)
*) added Android packages MIME type

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6956 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-29 21:31:22 +00:00
orbiter
b6fb239e74 redesign of parser interface:
some file types are containers for several files. These containers had been parsed in such a way that the set of resulting parsed content was merged into one single document before parsing. Using this parser infrastructure it is not possible to parse document containers that contain individual files. An example is a rss file where the rss messages can be treated as individual documents with their own url reference. Another example is a surrogate file which was treated with a special operation outside of the parser infrastructure.
This commit introduces a redesigned parser interface and a new abstract parser implementation. The new parser interface now has only one entry point and always returns a set of parsed documents. In case of a single document the parser method returns a set of one document.
To be compliant with the new interface, the zip and tar parsers have also been completely redesigned. All parsers are now much simpler and cleaner in their structure. The switchboard operations have been extended to operate with sets of parsed files, not single parsed files.
Additionally, parsing of jar manifest files has been added.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6955 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-29 19:20:45 +00:00
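The single-entry-point idea from commit b6fb239e74 above might look roughly like this. It is a sketch under assumptions: ParsedDocument, DocumentParser, PlainTextParser and LineContainerParser are hypothetical names, the toy "container" format (one document per non-empty line) stands in for zip, tar or rss, and a List is used where the commit message speaks of a set.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical types for illustration; not the actual YaCy classes.
class ParsedDocument {
    final String url;
    final String text;
    ParsedDocument(String url, String text) { this.url = url; this.text = text; }
}

interface DocumentParser {
    // Single entry point: always returns a collection, even for one document.
    List<ParsedDocument> parse(String url, String content);
}

class PlainTextParser implements DocumentParser {
    // A single document is returned as a collection of exactly one element.
    public List<ParsedDocument> parse(String url, String content) {
        return Collections.singletonList(new ParsedDocument(url, content));
    }
}

class LineContainerParser implements DocumentParser {
    // Toy container format: every non-empty line becomes its own document
    // with its own url reference (container url plus an index fragment).
    public List<ParsedDocument> parse(String url, String content) {
        List<ParsedDocument> docs = new ArrayList<>();
        int index = 0;
        for (String line : content.split("\n")) {
            if (line.trim().isEmpty()) continue;
            docs.add(new ParsedDocument(url + "#" + index++, line.trim()));
        }
        return docs;
    }
}

class ParserDemo {
    public static void main(String[] args) {
        DocumentParser parser = new LineContainerParser();
        for (ParsedDocument d : parser.parse("http://example.org/feed", "first\n\nsecond\n")) {
            System.out.println(d.url + " -> " + d.text);
        }
    }
}
```

Callers such as an indexing step can then loop over the returned collection without caring whether the source was a single file or a container.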
orbiter
59c894029b removed confusing double set button in ConfigHeuristics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6954 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-28 22:27:20 +00:00
low012
d4851441b0 *) Added Android packages to parser in order to be able to create a decentralized search for direct downloads of Android apps.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6953 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-28 20:41:08 +00:00
orbiter
150cf42a1b migrated all my LGPL 3-licensed files to the LGPL 2.1 because LGPL 3 is not compatible with the GPL 2
see http://www.gnu.org/licenses/license-list.html for explanation
Since (as far as I know) nobody else has ever contributed to these files I may be allowed to just apply an older license.
You may consider this as a dual-licensing and may use and optionally replicate the older files under GPL 3.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6952 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-28 16:25:14 +00:00
orbiter
11b7853940 added a configuration page for search heuristics. currently you can switch on there:
- a site-operation heuristic that loads all direct links from a portal page if the site-operator is used
- a direct crawl for search results from scroogle for the given search terms
The configuration page can be found directly beside the network configuration page


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6951 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-27 21:38:16 +00:00
orbiter
5d00888c95 - added animated visualization for DHT-in and DHT-out in network graphic
- found and fixed a possible memory leak in YaCy internal RSS feed system
- some refactoring in RSS feed mechanisms to make this possible

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6950 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-27 10:45:20 +00:00
orbiter
bf25407fdd added peer hash to internal RSSFeed. The hash will be used to display news activities in the network graphic.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6949 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-26 23:10:57 +00:00
orbiter
1557e0f2d0 - some refactoring for internal RSSFeed (protocol of all actions as seen on status page)
- added dht-out to internal RSSFeed (you can see now messages about distributed indexes on status page)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6948 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-26 22:39:27 +00:00
orbiter
5a4684f21f allow words with length >= 2 (you can't search for 'wm' with 3-letter words...)
Let's try that. If we run into a memory problem because of too many 2-letter words, then we must introduce whitelists for 2-letter words.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6947 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-26 16:31:26 +00:00
orbiter
b5e190099d - updated pdfbox and fontbox to 1.1.0
- added license file to sbbi-upnplib

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6946 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-26 10:58:07 +00:00
orbiter
37b8827a7a - removed the UPnP library sources from sbbi and added the jar library again. The library was included to get support for fedora releases, but after this time the fact that the sbbi cannot be part of fedora should be re-discussed. If this will still not be possible, then we may integrate the sbbi UPnP package using reflection.
- cleaned up the code. The new eclipse helios provided new warnings for dead code. This change cleans up most of these warnings

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6945 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-26 10:32:47 +00:00
orbiter
dcd01698b4 added a 'transition feature' that shall lower the barrier to move from g**gle to yacy (yes!):
Here a new concept called 'search heuristics' is introduced. A heuristic is a kind of 'shortcut' to good results in IT, here for good search results. In this case it will be used to get a very transparent way to compare what YaCy is able to produce as search result and what g**gle produces as search result. Here is what you can do now:
- add the phrase 'heuristic:scroogle' to your search query, like 'oil spill heuristic:scroogle' and then a call to scroogle is made to get anonymous search results from g**gle.
- these results are _not_ taken as meta-search results, but are used to instantly feed a crawling and indexing process. This happens very fast: 20 results from scroogle are taken and loaded simultaneously, parsed and indexed immediately, and the search result is fed from the parsed content, alongside the normal p2p search
- when new results from that heuristic (more to come) get part of the search results, then it is verified if such results are redundant to existing (they had been part of the normal YaCy search result anyway) or if they had been completely new to YaCy.
- in the search results the new search results from heuristics are marked with a 'H ++' and search results from heuristics that had been already found by YaCy are marked with a 'H ='. That means:
- you can now see YaCy and Scroogle search results in one result page but you also see that you would not have 'missed' the g**gle results when you would only have used YaCy.

- to make it short: YaCy now subsumes g**gle results. If you use only YaCy, you miss nothing.

to come: a configuration page that lets you configure the usage of heuristics and get this feature by default.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6944 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-25 16:44:57 +00:00
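The 'H ++' / 'H =' marking described in commit dcd01698b4 above can be illustrated with a small sketch. The class HeuristicMarker and its mark method are hypothetical; the sketch only assumes that the urls already found by the normal YaCy search are available as a set and that the heuristic delivers a list of urls.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the actual YaCy search code.
class HeuristicMarker {

    // Returns url -> marker, keeping the order of the heuristic results.
    // "H =" : the url was already part of the normal search result (redundant)
    // "H ++": the url is new and was only found through the heuristic
    static Map<String, String> mark(Set<String> foundByNormalSearch, List<String> fromHeuristic) {
        Map<String, String> marks = new LinkedHashMap<>();
        for (String url : fromHeuristic) {
            marks.put(url, foundByNormalSearch.contains(url) ? "H =" : "H ++");
        }
        return marks;
    }

    public static void main(String[] args) {
        Set<String> known = new HashSet<>(Arrays.asList("http://a.example/"));
        List<String> heuristic = Arrays.asList("http://a.example/", "http://b.example/");
        for (Map.Entry<String, String> e : mark(known, heuristic).entrySet()) {
            System.out.println(e.getValue() + "  " + e.getKey());
        }
    }
}
```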
orbiter
d5d48b8dc7 enhanced network animation (smooth loading, reload not all 4 animation phases at once)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6943 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-24 15:01:26 +00:00
orbiter
103c848af8 enhancements in image drawing speed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6942 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-24 13:20:45 +00:00
orbiter
3a9dc52ac2 added a fascinating new way to search _and_ start a web crawl at the same time:
implemented a hint from dulcedo "use site: - operator as crawl start point".
YaCy was already able to search using a site-constraint. This function is now extended with an instant crawling feature.
When you now use the site-operator, the landing page of the site and every page that is linked from this page are loaded, indexed and selected for the search result within that search request. When the remote server responds quickly enough, this process can deliver search results during the normal search result preparation, within just a few seconds.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6941 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-23 11:19:32 +00:00
orbiter
8e3cbbb6a9 more animation: update of network image every 10 seconds
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6940 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-23 10:29:04 +00:00
orbiter
2b4f8f6c06 animated network graphic!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6939 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-23 10:05:08 +00:00
orbiter
d7767e7589 IFFRESH is too strong, IFEXIST sufficient for cache policy when doing a link verification (this is as it was two commits before)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6938 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-22 19:16:26 +00:00
orbiter
777195e8d1 more abstraction for access of LoaderDispatcher and cache
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6937 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-22 12:28:53 +00:00
orbiter
7bcfa033c9 more abstraction of the htcache when using the LoaderDispatcher:
a cache access shall not be made directly to the cache any more, all loading attempts shall use the LoaderDispatcher.
To control the usage of the cache, an enum instance from CrawlProfile.CacheStrategy shall be used.
Some direct loading methods without the usage of a cache strategy have been removed. This affects also the verify-option
of the yacysearch servlet. If there is a 'verify=false' now after this commit this does not necessarily mean that no snippets
are generated. Instead, all snippets that can be retrieved using the cache only are presented. This still means that the search hit was not verified because the snippet was generated using the cache. If a cache-based generation of snippets is not possible, then verify=false means that the link is not rejected.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6936 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-21 14:54:54 +00:00
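How a loader might consult a cache strategy, as described in commit 7bcfa033c9 above, is sketched below. This is not the LoaderDispatcher code: the class LoaderSketch, its methods and the simulated fetch are hypothetical, and the enum only mirrors the idea of CrawlProfile.CacheStrategy. IFFRESH and IFEXIST are named in this log; NOCACHE and CACHEONLY are assumed here for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the actual YaCy LoaderDispatcher.
class LoaderSketch {

    // IFFRESH and IFEXIST appear in the commit log; the other two values are assumptions.
    enum CacheStrategy { NOCACHE, IFFRESH, IFEXIST, CACHEONLY }

    private static final class Entry {
        final String content;
        final boolean fresh;
        Entry(String content, boolean fresh) { this.content = content; this.fresh = fresh; }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    int networkLoads = 0; // counts simulated network fetches

    void store(String url, String content, boolean fresh) {
        cache.put(url, new Entry(content, fresh));
    }

    // All loading goes through this one method; the strategy decides whether the cache may answer.
    String load(String url, CacheStrategy strategy) {
        Entry e = cache.get(url);
        switch (strategy) {
            case CACHEONLY:
                return e == null ? null : e.content; // never touches the network
            case IFEXIST:
                if (e != null) return e.content;     // any cached copy is good enough
                break;
            case IFFRESH:
                if (e != null && e.fresh) return e.content; // only a fresh copy is accepted
                break;
            case NOCACHE:
                break;                               // always load from the network
        }
        return fetch(url);
    }

    // Stand-in for a real download; stores the result as a fresh cache entry.
    private String fetch(String url) {
        networkLoads++;
        String content = "fetched:" + url;
        cache.put(url, new Entry(content, true));
        return content;
    }

    public static void main(String[] args) {
        LoaderSketch loader = new LoaderSketch();
        loader.store("http://example.org/", "old copy", false);
        System.out.println(loader.load("http://example.org/", CacheStrategy.IFEXIST));
        System.out.println(loader.load("http://example.org/", CacheStrategy.IFFRESH));
    }
}
```

In this sketch a snippet generator that must not load anything would call load with CACHEONLY and treat a null result as "no snippet available", which matches the verify=false behaviour described in the commit message.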
orbiter
fd9f0714a3 added link verification, global search and navigation to opensearch description.
Hint: the YaCy search can easily be integrated into the firefox search window:
Just start a search, then open the pop-up menu inside the firefox search input window and select "add search engine"

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6935 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-20 11:04:11 +00:00
orbiter
7e2d6fac12 patch for bad values during local search join
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6934 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-20 00:31:00 +00:00
orbiter
2ddb952a5c added the (fixed and enhanced) secondary search process. The process had been disabled for some time.
The search process for more than one word should be enhanced now and produce much more results.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6933 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-20 00:11:12 +00:00