Commit Graph

7502 Commits

Author SHA1 Message Date
orbiter
98c4d25185 fix for endless loop in FTP crawling, see http://bugs.yacy.net/view.php?id=32
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7736 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-24 10:06:20 +00:00
orbiter
d1dbbd956a always use a template method cache, even if the template cache flag is set to false. This flag only controls dynamic updates to the template files, not dynamic updates to the rewrite methods (which are not possible without recompiling). Low memory usage is guaranteed by the use of soft references, which are dropped before an OOM is thrown
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7735 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-24 09:31:07 +00:00
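The soft-reference caching described in the commit above can be illustrated with a minimal sketch. The class name `SoftMethodCache` and its methods are hypothetical and are not taken from the YaCy source; the sketch only shows the general technique of holding cached values through `java.lang.ref.SoftReference`, which the JVM clears before it throws an `OutOfMemoryError`.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch; names are illustrative and not part of YaCy.
final class SoftMethodCache<K, V> {
    // Values are held softly, so the garbage collector may reclaim them under memory pressure.
    private final ConcurrentHashMap<K, SoftReference<V>> map = new ConcurrentHashMap<>();

    /** Returns the cached value, or loads and caches it if it is missing or was collected. */
    V get(K key, Function<K, V> loader) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = loader.apply(key);
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }

    /** Number of keys currently tracked, including entries whose value was already collected. */
    int size() {
        return map.size();
    }
}
```

A caller would pass the expensive lookup as the loader, for example `cache.get(name, n -> resolve(n))`, where `resolve` stands for whatever computes the cached object.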
orbiter
0d040ff6bb fix for bug 0000036: no crawling of https pages
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7734 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-24 09:14:32 +00:00
orbiter
3ed4a09368 small features, some bug fixes and performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7733 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-23 21:08:04 +00:00
orbiter
e55c254f7b enhanced logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7732 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-22 20:12:13 +00:00
orbiter
3ec94d87c4 show dom counter only for active crawls where the dom counter is enabled within the crawl profile
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7731 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-22 19:34:20 +00:00
orbiter
e3ee43e6ed these YBR files are not needed any more
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7730 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-18 14:27:24 +00:00
orbiter
b45701d20f this is a re-implementation of the YaCy Block Rank feature
This time it works like this:
- each peer provides its ranking information using the yacy/idx.json servlet
- peers with more than 1 GB RAM will load this information from all other peers, combine it into one ranking table and store it locally. This happens concurrently during the start-up of the peer. The newly generated file with the ranking information is at DATA/INDEX/<network>/QUEUES/hostIndex.blob
- this index is then processed to generate a fresh ranking table. Peers which can calculate their own ranking table will do that at every start-up to get the latest feature updates until the feature is stable
- I computed new ranking tables as part of the distribution and commit them here as well
- the YBR feature must be enabled manually by setting the YBR value in the ranking servlet to level 15. A default configuration for that is also in the commit, but it affects only fresh peers, not your current installation
- a recursive block rank refinement is implemented but disabled at this point; it needs more testing

Please play around with the ranking settings and see if this helps to make search results better.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7729 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-18 14:26:28 +00:00
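The "combine into one ranking table" step from the commit above might look roughly like the following sketch. It assumes, purely for illustration, that each peer delivers a map from a host to the set of hosts that link to it; the actual format returned by yacy/idx.json and the actual Block Rank computation are not described in this log. The class `HostIndexJoin` and the referrer count used as a score are hypothetical stand-ins.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch; the data layout is an assumption, not the YaCy format.
final class HostIndexJoin {
    /** Unions the referrer sets of every host across all per-peer tables. */
    static Map<String, Set<String>> join(Collection<Map<String, Set<String>>> peerTables) {
        Map<String, Set<String>> joined = new TreeMap<>();
        for (Map<String, Set<String>> table : peerTables) {
            for (Map.Entry<String, Set<String>> e : table.entrySet()) {
                joined.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
            }
        }
        return joined;
    }

    /** Counts distinct referring hosts per host; a stand-in score, not the real block rank. */
    static Map<String, Integer> referrerCounts(Map<String, Set<String>> joined) {
        Map<String, Integer> counts = new TreeMap<>();
        joined.forEach((host, referrers) -> counts.put(host, referrers.size()));
        return counts;
    }
}
```

Calling `HostIndexJoin.join(List.of(tableFromPeerA, tableFromPeerB))` yields one table in which a host referenced by both peers carries the union of both referrer sets.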
orbiter
d27a0a67ff fix in log initialization according to hint from Dominic
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7728 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-17 15:53:59 +00:00
orbiter
205cc75157 abstraction of surrogate main element (xmlns:geo was missing for wiki extracts)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7727 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-17 08:57:49 +00:00
orbiter
021840e5ba removed (almost) deadlocks and unnecessary CPU load
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7726 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-17 00:00:01 +00:00
orbiter
3d879e0995 test file not needed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7725 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-16 23:19:11 +00:00
orbiter
123375bfba added a new yacy protocol servlet 'idx'. This returns an index to one of the data entities that is stored in YaCy.
This servlet currently only serves indexes of the web structure hosts. It can be tested by calling
http://localhost:8090/yacy/idx.json?object=host
This yacy protocol servlet is the first one that returns JSON code and that also shows index entries in a readable format. This will make the development of API applications much easier. This is also an example implementation for possible JSON versions of the other existing YaCy protocol interfaces.

The main purpose of this new feature is to provide a distributed block rank collection feature. Creating a block rank is very difficult if the forward-link data is first collected and then one peer must create a backward-link index. This interface already provides a partial backward index, so a collection of all these indexes only needs to be joined, which is very easy. The result should be the computation of new block rank tables that all peers can perform.

To reduce load on peers, this servlet buffers all data and refreshes it only once every 12 hours. This very slow update cycle is needed because the interface will be called round-robin by all peers once after start-up.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7724 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-15 22:57:31 +00:00
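The 12-hour buffering mentioned in the commit above can be sketched as a small time-based cache. This is only an illustration of the idea; `BufferedResponse`, its constructor parameters and the injected clock are hypothetical names and do not reflect how the idx servlet is actually implemented.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch of a response buffer that is rebuilt at most once per interval.
final class BufferedResponse<T> {
    static final long TWELVE_HOURS_MS = 12L * 60 * 60 * 1000;

    private final Supplier<T> producer;   // builds the expensive response
    private final LongSupplier clock;     // current time in milliseconds, injectable for tests
    private final long maxAgeMs;
    private T buffered;
    private long producedAt;
    private boolean filled = false;

    BufferedResponse(Supplier<T> producer, LongSupplier clock, long maxAgeMs) {
        this.producer = producer;
        this.clock = clock;
        this.maxAgeMs = maxAgeMs;
    }

    /** Returns the buffered value and rebuilds it only if it is missing or older than maxAgeMs. */
    synchronized T get() {
        long now = clock.getAsLong();
        if (!filled || now - producedAt >= maxAgeMs) {
            buffered = producer.get();
            producedAt = now;
            filled = true;
        }
        return buffered;
    }
}
```

In production code the clock would typically be `System::currentTimeMillis`; passing it in keeps the refresh logic testable without waiting.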
orbiter
d326f1486a added timeout setting to scanner interface
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7723 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-14 11:30:41 +00:00
low012
f0d5ddfa92 *) preventing a potential NPE which occurred if the user deleted DATA/RELEASE manually and then opened ConfigureUpdate_p.java
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7722 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-14 09:23:19 +00:00
orbiter
5c981762c6 added bigrange option for network scan
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7721 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-14 09:13:16 +00:00
low012
c55787d07c *) revert of r7667
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7720 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-14 09:03:18 +00:00
orbiter
bade61696f speed-up of network port scanner
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7719 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-14 09:03:16 +00:00
orbiter
b04382bc59 added topmenu as defined for search to wiki
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7718 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-14 08:29:16 +00:00
lotus
229df8b626 restart link after memory changed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7717 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-13 17:24:03 +00:00
orbiter
1d8b0f74f4 one more fix for SVN 7713
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7716 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-13 15:31:24 +00:00
orbiter
0960261769 fix for svn 7713
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7715 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-13 15:20:57 +00:00
apfelmaennchen
7e368000c8 transparent progress bar
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7714 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-13 13:40:23 +00:00
orbiter
5b579e21a3 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7713 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-13 06:21:40 +00:00
orbiter
fcd4b03892 show progress of search after display of results is finished
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7712 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-13 06:20:00 +00:00
lotus
8b63d7637d revert 7710,
configure java in firewall for full access. this is the usual config if the user manually accepts java for the firewall.
otherwise, if a port is specified, windows will not ask on any port change. this would break yacy and other java applications if they do not run on the specified port. this would not be expected behaviour for the user.

firewall config may fail for win64 systems (system32 is specified in path)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7711 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-09 20:06:05 +00:00
pca
440e3ba887 Windows Installer:
- remove firewall-handling for WinXP (can only open for JRE not for 
  special port)
- Vista/Win 7: open port 1900 for communication with router (uPnP)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7710 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-09 06:16:35 +00:00
orbiter
039126cfaf better handling of on/off switched solr indexing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7709 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-08 22:47:20 +00:00
orbiter
dc54915df4 fix for very bad compare
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7708 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-08 08:45:58 +00:00
lotus
f123dbec79 fix in heuristics config
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7707 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-07 18:52:20 +00:00
orbiter
897b4e8b9c another hack to prevent black images
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7706 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-07 07:45:02 +00:00
orbiter
9248a4eef4 reduce the effect of 'Bildersuche findet generierte HTML-Seiten als Bilder' (image search finds generated HTML pages as images)
see http://bugs.yacy.net/view.php?id=9

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7705 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-07 07:37:46 +00:00
orbiter
0621a15f89 fix for wrong search result counter: added a counter for all filtered out entities
see also http://bugs.yacy.net/view.php?id=5

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7704 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-06 23:04:27 +00:00
apfelmaennchen
61c9a791c4 YMarks: sidebar with tabs for tags and folders
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7703 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-06 21:36:35 +00:00
orbiter
9c33b2fb58 fix for String Matcher in case that no snippet is returned (NPE)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7702 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-05 23:11:03 +00:00
orbiter
76f2817e00 a fix for the snippet computation and hopefully better snippets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7701 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-05 23:05:38 +00:00
orbiter
deda54d684 - relaxed matching of string-search (this is now case-insensitive)
- added transport of string-search pattern to remote search protocol
- fixed a problem parsing snippets with a '-' inside

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7700 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-05 22:37:06 +00:00
lotus
8fd4e8ea98 proper jre version (without -s in filename)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7699 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-05 20:03:27 +00:00
orbiter
15e3a57b4e removed unused functions in condenser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7698 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-05 09:23:10 +00:00
orbiter
6e42d4de88 - added full-String search function: find things that match exactly what is quoted in the query
- re-structuring authentication methods to fix a problem with API steering

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7697 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-05 00:25:14 +00:00
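The full-string search from the commit above ("find things that match exactly what is quoted in the query") can be sketched as follows. This is a simplified illustration under the assumption that quoted phrases are delimited by plain double quotes and that a result must contain every phrase literally; `QuotedPhraseFilter` and its methods are hypothetical and not taken from YaCy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the parsing rules are an assumption for illustration.
final class QuotedPhraseFilter {
    // Matches a non-empty run of characters between two double quotes.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]+)\"");

    /** Extracts every double-quoted phrase from the query, in order of appearance. */
    static List<String> quotedPhrases(String query) {
        List<String> phrases = new ArrayList<>();
        Matcher m = QUOTED.matcher(query);
        while (m.find()) {
            phrases.add(m.group(1));
        }
        return phrases;
    }

    /** Returns true only if the text contains every phrase as an exact, case-sensitive substring. */
    static boolean matchesAll(String text, List<String> phrases) {
        for (String phrase : phrases) {
            if (!text.contains(phrase)) {
                return false;
            }
        }
        return true;
    }
}
```

A result filter would first call `quotedPhrases(query)` once and then apply `matchesAll` to each candidate text.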
orbiter
8e10b82280 small fix for solr export
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7696 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-03 22:21:45 +00:00
apfelmaennchen
8b8db2aaba YMarks: some small changes/fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7695 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-03 21:21:06 +00:00
apfelmaennchen
441035f1f4 YMarks: some improvements to flexigrid quick search on YMarks.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7694 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-02 20:11:58 +00:00
orbiter
6fa439c82b - refactoring of robots
- added option to crawler to send error-URLs to solr
- changed solr scheme slightly (no multi-valued fields where there are no multiple values)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7693 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-02 14:05:51 +00:00
sixcooler
1ea0bc775c @apfelmaenchen:
is this the expected, but forgotten change?
Please correct if I'm wrong
(this lets me build YaCy again)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7692 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-02 10:46:05 +00:00
apfelmaennchen
e7c2ea193b YMark:
- general improvements on importers, especially on auto tagging
- added get_tags (needed for tag clouds etc.)
- improved flexigrid support
- added YMarks.html (not fully working) that will eventually replace Bookmarks.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7691 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-01 21:42:48 +00:00
orbiter
e3d19d0a90 fix in Document inboundlinks/outboundlinks sorting
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7690 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-01 15:49:04 +00:00
pca
5e2d38ef19 Windows Installer:
- fix for firewall Vista/Win7
- update to JRE 1.6 u25
- TODO: fix for firewall WinXP and setting for uPnP (Port 1900)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7689 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-30 19:32:07 +00:00
orbiter
4e8fa03514 added more attributes to html evaluation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7688 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-29 15:36:44 +00:00
orbiter
3b578a28ef some patches to prevent empty or bad IP information from being broadcast
- on client-side: fix bad IP reports from remote Peers by replacing their reported IP with their server IP if the reported IP is bad, broken or disallowed
- on server-side: the same during a peer ping (here the pinged server also acts as client during the back-ping) and also when receiving a message or a search where the client also sends its seed. Here the IP is replaced by the client IP if the reported IP is broken or bad

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7687 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-29 10:58:12 +00:00
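The replacement rule from the commit above (use the address of the actual connection when the reported IP is bad, broken or disallowed) can be sketched as follows. The class `SeedIpSanitizer` is hypothetical, the sketch handles IPv4 only, and treating loopback and `0.x.x.x` addresses as disallowed is an assumption made for illustration; the log does not say which addresses YaCy rejects.

```java
// Hypothetical sketch; the validity rules are assumptions, not YaCy's actual checks.
final class SeedIpSanitizer {
    /** Returns the reported IP if it looks usable, otherwise the IP of the actual connection. */
    static String effectiveIp(String reportedIp, String connectionIp) {
        return isUsable(reportedIp) ? reportedIp.trim() : connectionIp;
    }

    /** Accepts only well-formed dotted-quad IPv4 addresses that are neither 0.x.x.x nor loopback. */
    static boolean isUsable(String ip) {
        if (ip == null) {
            return false;
        }
        String[] parts = ip.trim().split("\\.", -1);
        if (parts.length != 4) {
            return false;
        }
        int[] octets = new int[4];
        for (int i = 0; i < 4; i++) {
            if (!parts[i].matches("\\d{1,3}")) {
                return false;
            }
            octets[i] = Integer.parseInt(parts[i]);
            if (octets[i] > 255) {
                return false;
            }
        }
        boolean unspecified = octets[0] == 0;   // 0.0.0.0/8
        boolean loopback = octets[0] == 127;    // 127.0.0.0/8
        return !unspecified && !loopback;
    }
}
```

On the server side, `connectionIp` would be the remote address of the incoming request; on the client side, it would be the address the client used to reach the remote peer.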