Commit Graph

3327 Commits

Author SHA1 Message Date
orbiter
08108f0ece fix for http://bugs.yacy.net/view.php?id=12
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7665 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-17 22:53:15 +00:00
apfelmaennchen
62855f9567 YMark: code clean up and some small fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7661 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-16 21:19:42 +00:00
apfelmaennchen
667e912b19 YMark:
- some improvements to firefox json bookmark importer
- test import with: /api/ymarks/test_import.html
- view ymarks with: /api/ymarks/test_treeview.html


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7660 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-16 09:09:33 +00:00
apfelmaennchen
a0e4960a4d YMark:
- first attempt for a firefox json bookmark importer
- added JSON library json-simple-1.1.jar

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7658 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-15 20:58:58 +00:00
orbiter
958ff4778e enhanced location search:
search is now done using verify=false (instead of verify=cacheonly) which will cause that much more targets can be found.
This showed a bug where no location information was used from the metadata (and other metadata information) if cache=false is requested. The bug was fixed.

Added also location parsing from wikimedia dumps. A wikipedia dump can now also be a source for a location search.
Fixed many smaller bugs in connection with location search.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7657 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-15 15:54:19 +00:00
orbiter
19fd13d3bc Added federated index storage to solr.
YaCy supports now the storage to remote solr indexes.
More federated storage (and search) methods may follow.

The remote index scheme is the same as produced by the SolrCell; see
http://wiki.apache.org/solr/ExtractingRequestHandler
Because this default scheme is used, the default example scheme can be used as solr configuration
This is also the same scheme that solr uses if documents are imported with apache tika.

federated solr storage is switched off by default.

To use this, do the following:
- set federated.service.solr.indexing.enabled = true
- download solr from http://www.apache.org/dyn/closer.cgi/lucene/solr/
- extract the solr (3.1) package, 'cd example' and start solr with 'java -jar start.jar'
- start yacy and then start a crawler. The crawler will fill both, YaCy and solr indexes.
- to check whats in solr after indexing, open http://localhost:8983/solr/admin/

Until now it is not possible to use the solr index to search with YaCy in that solr index.
This functionality is now available for two reasons:
1) to compare the functionality of Solr and YaCy and to compare the search speed
2) to use YaCy as a search appliance for people who need a crawler or other source harvesting methods
   that YaCy provides (like dublin core reading, wikimedia dump reading, rss feed reader etc) if people still
   want to use solr instead of YaCy.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7654 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-14 20:05:04 +00:00
orbiter
4c013d9088 more UTF8 getBytes() performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7649 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-12 05:02:36 +00:00
apfelmaennchen
78d6d6ca06 refactoring for ymarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7648 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-08 21:15:10 +00:00
orbiter
b2fe4b7b1a added a handling of appearances of yacy bot entries in robots.txt if this entry addresses the yacy peer
(directly or indirectly) and it grants a crawl-delay of 0. Then all forced pause mechanisms in YaCy are switched off and the domain is crawled at full speed.
crawl delay values can be assigned to either
- all yacy peers using the user-agent yacybot
- a specific peer with peer name <peer-name>.yacy or
- a specific peer with peer hash <peer-hash>.yacyh


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7639 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-03 23:39:45 +00:00
low012
e25c1f2ea3 *) preventing whitespace keys in config file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7637 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-02 09:25:07 +00:00
orbiter
cb6f709a16 - enhancements in surrogate reading
- better display of map in location search

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7636 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-02 00:11:37 +00:00
low012
1ff9947f91 *) added new user right: extended search right (allows to define users who can query more results than anonymous users)
*) cleaned up code a little bit

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7635 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-01 23:32:40 +00:00
orbiter
156cf02703 - added an index constraint 'has location' to the condenser
- added evaluation of the 'has location' constraint to search using the /location operator


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7633 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-31 09:41:30 +00:00
orbiter
0430a94eaa the location search shows now not re-evaluated locations but only such locations that are attached as metadata to web pages
- added parser for in-text appearing geo-locations
- added geo-locations to rss search result
- added evaluation of metadata-attached geo-locations in yacysearch_location to show search results within a map


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7631 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-30 23:26:36 +00:00
orbiter
9b25d07295 - added geo information parsing to html parser
- extended metadata information in index with geolocalisation
- added display of location in yacydoc and ViewFile

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7629 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-30 00:49:47 +00:00
lotus
06afa94f9d hups
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7626 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-24 06:24:37 +00:00
lotus
a9a9db98c8 better rename modified version
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7625 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-23 20:22:08 +00:00
lotus
e19ca27004 do not autocomplete on mouseover. this has resulted in unwanted autocomplete.
fixes bug #3

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7624 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-23 20:13:43 +00:00
low012
16cd919795 *) fixed Exceptions which caused 500 error when entering invalid URL mask or invalid prefer mask, invalid masks are ignored, error message is displayed on yacysearch.html (what about yacysearch.rss and yacysearch.json?)
*) fixed "more options" link on yacysearch.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7623 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-23 00:48:19 +00:00
orbiter
01b968d836 better concurrency in ViewImage icon cache and OOM protection for too large icon caches
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7621 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-22 11:00:55 +00:00
orbiter
b1a8d0c020 enhancements to web cache and less strict caching rules
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7620 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-22 10:35:26 +00:00
orbiter
f3baaca920 - enhancements to DNS IP caching and crawler speed
- bugfixes (NPEs)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7619 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-22 09:34:10 +00:00
orbiter
ba03ca8620 added more configuration options for search:
- removed configuration button for 'search only for admin' from index.html and added this to ConfigPortal
- added configuration of link verification options (iffresh, cacheonly, nocache, ifexist) to ConfigPortal
- added configuration of navigation options to ConfigPortal
- added an option to switch off automatic index cleaning in case that a link verification method fails


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7613 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-21 07:50:34 +00:00
f1ori
e0c7d490f9 * fix bug #6
* exclude signature files from auto-deletion of unknown files in DATA/RELEASE


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7612 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-20 17:59:58 +00:00
orbiter
a50f28e6e7 - fixed missing save operation for peer name change
- fixed import of mediawiki dump files
- added script to add mediawiki dump files

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7609 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-19 23:52:09 +00:00
orbiter
43e1660512 fix/enhancement in Crawler: do not generate domain match pattern if crawl depth is 0
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7607 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-17 21:07:44 +00:00
orbiter
b1d133b69f another anhancement to the ThreadDump function: better multiple dumps and filtering out of not interesting dump parts
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7606 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-17 20:48:39 +00:00
orbiter
c2a968c23f fix for bug in formatting in ThreadDump
and added hint for linux/Mac users that they may use the LOCKED feature using the start option -l

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7601 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-15 08:39:05 +00:00
low012
2861d0888a *) simplified code\n*) fixed potential NumberFormatExceptions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7600 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-15 01:03:35 +00:00
orbiter
0a11727374 added new feature for Thread dump:
"THREADS WITH STATES: LOCK FOR OTHERS"
will show only such threads that lock other threads. This is the 'opposite part' of the blocked threads.
Because that this uses a thread dump that is produced with a kill -3 on the PID of the process and such thread dumps are written by the Java core outside of System.out and Sytem.err it is necessary to read the dump from a log in the file system. Such a log is only written if YaCy is started with startYACY.sh on a linux system. That means:
this feature is only available on linux and Mac OS X if YaCy is started with ./startYACY.sh -l


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7595 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-14 21:32:20 +00:00
orbiter
f9e5c21083 update to thread dump logs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7590 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-14 20:46:04 +00:00
orbiter
8f11d3a5bb redesigned the ScoreMap classes:
- new concurrent score map using atom operation from java concurrency classes
- redesigned difference beween StaticScore and Dynamic Score into ScoreMap and ReversibleScoreMap allowed that many classes can now use simple ScoreMap Objects which can be used better in concurrent environments using the ConcurrentScoreMap
- switched from DynamicScore to ConcurrentScoreMap usage wherever possible

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7586 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-13 01:41:44 +00:00
orbiter
dc0db3550e avoid string conversion
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7584 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-11 00:59:27 +00:00
orbiter
694fa3a2a5 - replaced more direct string-based UTF-8 conversions by predefined UTF-8 conversion
- changed menu structure slightly

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7583 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-10 23:25:07 +00:00
lotus
bbb7aea8f3 fix basic config change in portal mode
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7582 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-10 20:04:15 +00:00
orbiter
30aed9824a moved getBytes() to UTF8.getBytes() to use a default String encoding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7580 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-10 12:35:32 +00:00
orbiter
4d733608fb fix for broken JSON, see: http://forum.yacy-websuche.de/viewtopic.php?p=22162#p22162
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7577 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-09 20:08:20 +00:00
orbiter
7962d35425 - removed file upload function in crawl start and replaced it with an input field for a file path where the crawl start file is loaded. This was necessary to support the API steering for file crawl starts, for two reasons:
1) if the file is changed for a re-crawl this is not reflected in the steering because it would take the previously uploaded crawl start file
2) browsers do not submit the full path of the selected file even if this path is shown in the input field because of security reasons. There is no work-around or hack to make the submission of the full path possible

- fixed deletion of crawl start point urls in crawl stack and balancer double-check
- fixed a problem with steering self-call (no resolving of localhost)
- added more logging for the crawler to supervise why crawl urls are not taken by the loader
- added a javascript onload-function to select domain restriction in all cases where a crawl is started from a file or from a url
- fixed the restrict-to-domain pattern computation, added a 'www.'-prefix and added this functionality also to a crawl start from file 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7574 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-09 12:50:39 +00:00
orbiter
e1b6916423 always try to guess the size of a StringBuilder to prevent too many memory re-allocations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7572 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-09 09:29:05 +00:00
low012
bea8137997 *) minor changes
*) fixed potential NPE in suggest.java

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7571 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-08 23:27:41 +00:00
low012
3e03963b1c *) minor changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7570 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-08 22:37:17 +00:00
low012
0da3b6489e *) added the only changes from r7557 which actualy made sense
*) caught potential exception (occured when user entered a string which did not contain digits only for the maximum number of lines)
*) use prop.putHTML to avoid potential XSS attack in case an attacker manages to cause something to end up in the logs which contains a string which was defined by the attacker

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7562 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-07 21:44:58 +00:00
orbiter
29acd2f108 reverted also changes in ViewLog from SVN 7557 because the ThreadDump submenu was not visible any more.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7560 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-07 21:19:47 +00:00
orbiter
cb1f49d0f2 replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7558 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-07 20:36:40 +00:00
low012
9d366ee9d7 *) removed unused code (I assume that most of the code was really dead, but if you need any of the classes, tell me and I will put it back in.)
*) minor code cleanup in ViewLog

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7557 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-07 18:55:11 +00:00
orbiter
7138f4036b less synchronization, better thread dump tool
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7556 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-07 15:29:45 +00:00
orbiter
f8d0454c53 small bug fixes and experiments with search speed enhancement
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7549 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-04 14:29:22 +00:00
orbiter
bed79402be introduction of a new remote search load control: the remote search has taken 10 results per peer with a time-out of 3 seconds so far. The attributes of number of results per peer and time-out time can now be configured.
This has two aspects: the user who searches may want to increase these values to get more results and more load on the remote side and the user of the server which is accessed for this search may want to restrict the load. Both sides can now be configured. The server-site maximum load parameters are defined by a network definition and the client-side search request load can be defined by each user individually but when the remote search is done the requested service is limited to the network definition.

You can find now in the network definition file:
network.unit.remotesearch.maxcount and network.unit.remotesearch.maxtime
and in the yacy.conf file:
remotesearch.maxcount and remotesearch.maxtime

There is currently no web interface to define the client-side remote search attributes, please set them manually
    

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7548 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-04 13:44:00 +00:00
orbiter
42d90664f3 - fixed a memory leak in the httpc.post method (no finish)
- patched some more memory-saving relevant code
- some more minor bug fixes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7541 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-01 09:03:33 +00:00
mikeworks
85f5c02deb de.lng: Updated German translation and removed old unused strings, e.g. 8080 -> 8090 and Search Portal translations
Bookmarks.html, Ranking_p.html, base.css: Fixed XHTML errors to make pages compatible again - switched div -> span inside dt and replaced css definition of id (unique) with class (class of elements)
header.template: Fixed link to berliOS changelog by replacing & -> &amp; and adding translation for German page by refering to German berliOS UI ;-]

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7539 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-28 00:22:05 +00:00