Commit Graph

308 Commits

Author SHA1 Message Date
Michael Peter Christen
8dd469b9dd added option to configure the autocommit delay time of solr on-the-fly 2012-06-25 14:59:46 +02:00
Michael Peter Christen
b9dfca4b0a - fixed IndexFederated servlet: an embedded Solr can now be selected
- added code stub for an embedded Solr, but generation of the Solr store is
still commented out (it works but is not yet ready for use)
2012-06-25 11:34:38 +02:00
Michael Peter Christen
1be0025a9c - added test for EmbeddedSolrConnector
- added needed libraries for this test
this includes most (if not all) files needed for an embedded Solr
2012-06-22 00:36:49 +02:00
Michael Peter Christen
dbdd697f4d moved RDFaParser.xsl configuration file to defaults 2012-06-21 16:09:12 +02:00
Michael Peter Christen
8738336408 set Xms lower than Xmx 2012-06-19 08:45:49 +02:00
Michael Peter Christen
96f6a5869f more robust OAI-PMH client (longer time-out, three retries). OAI-PMH
servers appear to be very slow sometimes
2012-06-16 22:30:31 +02:00
Michael Peter Christen
6d17686258 made triplestore persistent by default
added a size display in triplestore servlet
2012-06-15 19:13:07 +02:00
cominch
3c255c025b Show tags in search results (if activated in ConfigPortal_p.html) 2012-06-15 10:43:05 +02:00
Michael Peter Christen
a5cdfb91de - fixed Cache link (below snippet)
- added 'Augmented Proxy' link below snippet
- added configuration options for augmented proxy
2012-06-14 19:55:34 +02:00
Roland 'Quix0r' Haeder
af5a597e47 Scroogle is not coming back, remove dead code
Conflicts:
	source/net/yacy/search/Switchboard.java
2012-06-10 23:38:41 +02:00
cominch
90512640bf Added config switches for custom parser
Conflicts:
	source/net/yacy/document/TextParser.java
2012-06-10 12:49:36 +02:00
cominch
5d20cd324a Add Triplestore and RDF query interface
Conflicts:
	build.xml
	defaults/yacy.init
	source/net/yacy/interaction/AugmentHtmlStream.java
2012-06-10 10:35:59 +02:00
cominch
a32943b382 add json mimetype 2012-06-10 09:29:09 +02:00
Michael Peter Christen
41c02cb10e - fewer restrictions on the use of the Table RAM copy
- new limit to use the table copy (instead of a flag): 400MB available. If
less is available, a copy is never used. If more is available, the copy
can be used if at least 200MB of space remains
- flush caches more often: flush the Digest cache
2012-06-08 12:48:25 +02:00
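The table-copy rule described in the commit above can be sketched as a small decision function. This is a minimal sketch, not YaCy's actual code: the function name `may_use_table_copy` is hypothetical, and it assumes that "remaining space" means the available memory minus the size of the table copy.

```python
MB = 1024 * 1024

# Thresholds from the commit message above; the function name is hypothetical
# and does not correspond to an actual YaCy method.
MIN_AVAILABLE = 400 * MB  # below this much free memory, never use a RAM copy
MIN_REMAINING = 200 * MB  # free memory that must remain after making the copy


def may_use_table_copy(available: int, copy_size: int) -> bool:
    """Return True if a table may be held as a RAM copy.

    Assumption: "remaining space" means the available memory minus the
    size of the table copy.
    """
    if available < MIN_AVAILABLE:
        return False
    return available - copy_size >= MIN_REMAINING
```

For example, with 500MB available, a 250MB copy leaves 250MB and is allowed, while a 350MB copy would leave only 150MB and is refused.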
Michael Peter Christen
8002fd2578 use less cache space since a large cache would cause more memory usage
in index files.
2012-06-06 14:17:42 +02:00
Michael Peter Christen
5aee19daa4 added 'show from cache' to search results (not yet finished) 2012-06-04 23:44:26 +02:00
Michael Peter Christen
0d32a766ed relax verify attribute for search widget to make it faster:
set to "cacheonly"
2012-05-20 00:50:54 +02:00
Michael Peter Christen
7eece0256f moved yacy.logging to defaults as requested in
http://bugs.yacy.net/view.php?id=55
2012-05-17 04:26:03 +02:00
Michael Peter Christen
db9d81cb7a oops 2012-05-16 01:04:08 +02:00
Michael Peter Christen
e7e381d110 added configuration to switch off redirection following in crawler 2012-05-15 12:25:46 +02:00
Michael Peter Christen
2be327b5ab update location update 2012-04-19 11:49:43 +02:00
Michael Peter Christen
99c74699de removed scroogle (scroogle is dead) 2012-02-25 12:57:59 +01:00
Michael Peter Christen
8bee1472c9 there is no noindex, only nofollow in links 2012-01-31 23:46:35 +01:00
Michael Peter Christen
4c5edab1ec added option to have exception search result windows 2012-01-26 15:32:30 +01:00
Michael Peter Christen
696ee5fc16 removed pdf from default parser deny list 2012-01-23 17:27:58 +01:00
Lotus
c73af39e54 refactoring of tray icon class,
now uses Java 6 methods natively
2012-01-18 20:47:09 +01:00
Michael Peter Christen
987b412491 updated solr scheme: generic declaration of solr schemes 2012-01-13 11:25:15 +01:00
Michael Peter Christen
0bcef2d156 added feature as requested in
http://forum.yacy-websuche.de/viewtopic.php?f=18&t=3461
The search can now be configured with a non-display host list.
The search will always exclude the hosts in this list unless they are
requested directly using the host navigation
2011-12-13 00:16:05 +01:00
Michael Christen
17f962fceb translator updates:
- config string for Chinese
- do not copy the language files to DATA/LOCALE any more (and do not use
them there; this is really confusing for new translators)
2011-12-08 10:25:26 +01:00
Michael Christen
c715d19c09 fixes for dependency on svn 2011-12-06 22:05:22 +01:00
Michael Christen
f62e6fb438 less frequent DHT distribution to reduce the load a bit on every peer 2011-12-05 15:45:33 +01:00
Michael Christen
9dbc93613e now that the whole world knows that we actually do P2P and not
metasearch, we can support a default look-up to Scroogle to respond to
people who say that the search results are incomplete
2011-12-05 11:52:24 +01:00
orbiter
f9216e388c - faster ping to clean up old peers faster
- clean up more news

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8125 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-30 21:21:16 +00:00
orbiter
ac5bda205f - removed lower page navigation (it never looks nice)
- added visibility of metadata and parser in search results since that shows what YaCy can do in a nice way

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8091 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-24 13:30:42 +00:00
orbiter
c659310e89 - removed option to search for audio, video and applications. These things are still experimental and should not be shown to new users since this would lead them to conclude that YaCy does not work. The functions are still available, because:
- added a configuration option in ConfigPortal to switch the search media types on or off

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8090 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-24 13:07:03 +00:00
orbiter
6cd27473f5 - better default values for caching and cache usage
- set new caching and verification behavior according to use case automatically

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8087 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-24 10:22:02 +00:00
orbiter
5866c73a09 fix for compare search: use scroogle instead of bing and fall back to a default search if the configured search engine is not available
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8074 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-23 15:17:46 +00:00
orbiter
e4a82ddd8b produce a bookmark entry from every crawl start. These bookmarks are always private.
They will be used to get a source reference for the search in the case of intranet or portal searches.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8062 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-21 23:10:29 +00:00
orbiter
f183d3822c added a default Accept header to http requests since some http fraud detection functions check that this header field exists
see also: http://bad-behavior.ioerror.us/ (source file browser.inc.php)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8048 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-16 15:27:43 +00:00
orbiter
78ce3b13be typo
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8027 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-10 11:57:26 +00:00
suessthomas
887f088dad The IP address of the YaCy-Demo portal was added to the whitelist.
This is only a temporary workaround.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8013 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-03 23:44:49 +00:00
orbiter
1b45e33f04 added robots tag parser to solr scheme
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7986 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-30 13:39:01 +00:00
orbiter
cf4fd525ee added directDocByURL attribute in crawl profile
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7985 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-30 12:38:28 +00:00
orbiter
5ad7f9612b added crawl settings for three new filters for each crawl:
must-match for IPs (IPs that are known after DNS resolving for each URL in the crawl queue)
must-not-match for IPs
must-match against a list of country codes (allows loading only from hosts that are hosted in the given countries)

note: the settings and input fields are part of this commit, but the values are not yet evaluated

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7976 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-27 21:58:18 +00:00
orbiter
2c3161b4ac refactoring:
RankingProcess -> RWIProcess
ResultFetcher -> SnippetProcess


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7974 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-26 21:42:28 +00:00
orbiter
6b22865dbc - removed some warnings
- removed a dead update location

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7970 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-24 01:58:54 +00:00
orbiter
e48ce5d80e - style change for search box: larger font, selected by default
- style change for search results: by default no parser, size, image info

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7949 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-14 09:05:06 +00:00
sixcooler
ecb4986b38 refactored stuff from last commit to ReferenceContainer
see: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=3353&p=23163#p23163
the limiting of references is disabled by default
to enable it, set index.maxReferences in yacy.conf to a value such as 100000

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7935 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-07 18:55:16 +00:00
orbiter
49e5ca579f added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (this is now the default), all embedded image, audio and video links from all parsed documents are added to the search index as individual documents. This will increase the search index size dramatically but will also enable us to create a much faster image, audio and video search. If the flag is switched on, the index entries are also stored in a Solr index, if Solr indexing is enabled as well.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7931 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-07 10:08:57 +00:00
orbiter
9a8937f8b6 be more liberal when evaluating search results. This may make it possible to inject fraudulent content on fresh peers, but that is better than very long waiting times for the verification of every link, which cause everybody to reject YaCy as 'too slow'. That impression only arises because of the high standards that YaCy sets for itself. If we are able to gain more users by lowering the standard, then that is useful. The option to set the flag that verifies each link is still there.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7918 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-01 16:02:15 +00:00
orbiter
1c007188ad bugfixes in html parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7912 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-31 16:02:06 +00:00
orbiter
5dd2efc9a2 - bugfixes in html parser
- new fields in solr
- extended file viewer to debug parser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7897 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-25 15:52:25 +00:00
sixcooler
4fec99115b Implementation of strategies for controlling memory resources.
You can toggle between the previous (standard) and the new (generation) strategy at PerformanceMemory_p.html.
The generation memory strategy is designed to run more robustly,
at the cost of stopping some tasks (e.g. DHT) early while running low on memory.
This new strategy respects the generational way a heap is organized in the most commonly used JVMs.
These changes have been running fine on my 3 peers for weeks now, but as I'm human, I may have made mistakes.
Please be careful when using the generation memory strategy and report errors, naming
OS, JVM and java_args.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7886 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-22 17:50:03 +00:00
orbiter
77a9af99f1 same values for Xmx and Xms: extending the memory may be difficult if the OS does not have the remaining memory available, and the OS may then kill the JVM. If the memory is reserved at startup but never used, the OS can handle that as well and leave the unused space in the swap area (and never actually swap)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7867 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-11 21:54:27 +00:00
orbiter
768c59740c - replaced solrj 3.1 with solrj 3.3
- updated also slf4j
- added authentication for solrj


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7829 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-04 16:35:30 +00:00
orbiter
e7c7598923 docfix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7828 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-04 10:48:01 +00:00
orbiter
b84089ff04 fix for solr scheme list definition
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7826 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-03 22:59:43 +00:00
orbiter
2d4bb139d3 - added counting of links with noindex tag for solr index
- bugfixes for solr index

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7820 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-03 06:40:05 +00:00
lotus
fa6f2c2b44 use proxy accounts by default for more security
http://bugs.yacy.net/view.php?id=45

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7815 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-02 17:16:00 +00:00
orbiter
bda3eec0ff added parsing of canonical link element to html parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7812 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-01 16:38:01 +00:00
orbiter
b6f09a475d - added an index profile editor in the /indexFederated_p.html servlet for solr indexes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7811 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-30 15:49:21 +00:00
orbiter
6deef60bc0 added keyword list for solr index attributes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7807 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-29 15:33:27 +00:00
f1ori
fdc84d8319 small pi link on index page to administration pages
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7804 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-29 09:32:00 +00:00
orbiter
84c9658644 added a file type navigator
added a protocol navigator

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7795 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-23 15:39:52 +00:00
suessthomas
66c477129e Creates a new network definition, yacy.networks.metager.unit.
This network definition uses the YaCy freeworld network; minor enhancements for the MetaGer feed were integrated.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7771 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-03 22:34:42 +00:00
f1ori
900dacbf97 * improve link rewriting in proxy-url
* only rewrites links that are in the current search domain

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7765 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-01 13:27:04 +00:00
orbiter
cc239b18cd fix for IPv6 localhost proxy client
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7744 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-26 16:24:11 +00:00
orbiter
10e2f588f8 - enhanced ybr ranking computation
- many speed/performance hacks
- added solr sharding and a new sharding web interface
- added option to switch off the yacy index when using solr
- added new fail-url categories which are used to decide which fail-urls are sent to solr
- refactoring/renaming of some method names to distinguish host/url hashes better
- a large number of bug/npe fixes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7738 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-26 10:57:02 +00:00
orbiter
3ed4a09368 small features, some bug fixes and performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7733 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-23 21:08:04 +00:00
orbiter
d8e934c085 better abstraction of http client identification
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7675 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-26 13:35:29 +00:00
orbiter
b77b8cac0c - enhanced html parser: recognizes many more details in the content
- added more properties to solr index
- refactoring
- more constants in switchboard
- fix for some NPEs
- recognition of more images
- removed synchronization in HandleMap (obviously not necessary?)
- added a nolocal configuration to remove excessive DNS lookups (works only on allip - default off). Indexes produced with this setting are all flagged as 'local' and are (on purpose) not usable for freeworld because they will be rejected as being local.



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7672 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-21 13:58:49 +00:00
orbiter
19fd13d3bc Added federated index storage to solr.
YaCy now supports storage to remote solr indexes.
More federated storage (and search) methods may follow.

The remote index scheme is the same as produced by the SolrCell; see
http://wiki.apache.org/solr/ExtractingRequestHandler
Because this default scheme is used, the default example scheme can be used as solr configuration
This is also the same scheme that solr uses if documents are imported with apache tika.

federated solr storage is switched off by default.

To use this, do the following:
- set federated.service.solr.indexing.enabled = true
- download solr from http://www.apache.org/dyn/closer.cgi/lucene/solr/
- extract the solr (3.1) package, 'cd example' and start solr with 'java -jar start.jar'
- start yacy and then start a crawler. The crawler will fill both the YaCy and the solr index.
- to check what's in solr after indexing, open http://localhost:8983/solr/admin/

So far it is not possible to use YaCy to search in that solr index.
This functionality is now available for two reasons:
1) to compare the functionality of Solr and YaCy and to compare the search speed
2) to use YaCy as a search appliance for people who need a crawler or other source harvesting methods
   that YaCy provides (like dublin core reading, wikimedia dump reading, rss feed reader etc) if people still
   want to use solr instead of YaCy.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7654 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-14 20:05:04 +00:00
orbiter
b1a8d0c020 enhancements to web cache and less strict caching rules
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7620 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-22 10:35:26 +00:00
orbiter
ba03ca8620 added more configuration options for search:
- removed configuration button for 'search only for admin' from index.html and added this to ConfigPortal
- added configuration of link verification options (iffresh, cacheonly, nocache, ifexist) to ConfigPortal
- added configuration of navigation options to ConfigPortal
- added an option to switch off automatic index cleaning in case that a link verification method fails


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7613 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-21 07:50:34 +00:00
orbiter
bed79402be introduction of a new remote search load control: so far, the remote search has fetched 10 results per peer with a time-out of 3 seconds. The number of results per peer and the time-out can now be configured.
This has two aspects: the user who searches may want to increase these values to get more results (causing more load on the remote side), and the operator of the server which is accessed for this search may want to restrict the load. Both sides can now be configured. The server-side maximum load parameters are defined by the network definition, and the client-side search request load can be defined by each user individually; when the remote search is done, however, the requested service is limited to the values in the network definition.

You can now find in the network definition file:
network.unit.remotesearch.maxcount and network.unit.remotesearch.maxtime
and in the yacy.conf file:
remotesearch.maxcount and remotesearch.maxtime

There is currently no web interface to define the client-side remote search attributes; please set them manually

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7548 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-04 13:44:00 +00:00
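The rule that a client's request is limited by the network definition amounts to taking, for each attribute, the smaller of the two values. The sketch below is illustrative only: the names `RemoteSearchLimits` and `effective_limits` are hypothetical, and the unit of `maxtime` (milliseconds here) is an assumption not stated in the commit.

```python
from dataclasses import dataclass


@dataclass
class RemoteSearchLimits:
    # Hypothetical container; mirrors remotesearch.maxcount / maxtime.
    maxcount: int  # maximum number of results per peer
    maxtime: int   # maximum time-out; unit assumed to be milliseconds


def effective_limits(client: RemoteSearchLimits,
                     network: RemoteSearchLimits) -> RemoteSearchLimits:
    """Cap the client's requested values (yacy.conf remotesearch.*)
    by the network definition (network.unit.remotesearch.*)."""
    return RemoteSearchLimits(
        maxcount=min(client.maxcount, network.maxcount),
        maxtime=min(client.maxtime, network.maxtime),
    )
```

A client asking for 50 results within 10 seconds from a network that allows 10 results within 3 seconds would thus receive at most 10 results within 3 seconds, while a more modest request passes through unchanged.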
f1ori
59dea3a284 * implement url proxy, a proxy via the url http://peer:port/proxy.html?url=http://domain.tld/path
* enable with proxyURL = true
* could be useful for browsing specific pages through the proxy or for applying your own improvements in the proxy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7538 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-27 21:39:38 +00:00
orbiter
e3ef4e3021 - reduced the default peer ping interval from 2 minutes to 1 minute
- filtering out too old peers when reading seed lists (limit is now 240 minutes)
- added concurrent host name resolution in front of the http client because the http client uses the Java built-in DNS resolver, which is not multithreading-safe (I have seen deadlocks in thread dumps showing that this JDK bug is still there)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7515 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-23 09:42:01 +00:00
orbiter
d28f8040e0 removed an unnecessary recording function that also caused a performance problem after serving too many files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7512 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-22 13:33:28 +00:00
orbiter
addbd5b482 moved the main update URL because of the many languages we now support on yacy.net
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7487 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-17 01:22:22 +00:00
orbiter
6c52e31993 new methods to open a browser
- if YaCy is started with the option -gui, it is not in headless mode. Then the java 1.6 browse method is used if all other methods fail
- in linux, the path /etc/alternatives/www-browser is used if no firefox is installed

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7480 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-14 16:15:14 +00:00
orbiter
5892fff51f introduction of dht-burst modes: these can expand the number of target peers in cases where a better heuristic is needed. The problematic cases are either a multi-word search (still a hard case for our term-oriented DHT) or a network operator who wants all robinson peers to be asked. We therefore introduced two new network steering values that switch on more peers during the peer selection. Because the number of peers can now be very large, the maximum number of httpc connections was also increased.
Please see the new comments in yacy.network.freeworld.unit for details of the new DHT selection methods.
The number of maximum peers is now not fixed to a specific number but may increase with
- the partition exponent
- the number of redundant peers
- the robinson burst percentage
- the multiword burst percentage
The maximum can then be the number of senior peers (all visible peers).

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7479 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-13 17:37:28 +00:00
orbiter
4588b5a291 - fixed document number limitation for crawls that restrict the number of documents per domain
- some restructuring of the document counting and logging structures was necessary
- better abstraction of CrawlProfiles
- added deletion of logs to the index deletion option (if the index is deleted using the servlets) which is necessary to reset the domain counters for the page limitation
- more refactoring to make the LibraryProvider cleaner
- some refactoring of the Condenser class

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7478 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-12 00:01:40 +00:00
low012
64f32e8f00 *) replaced all IPs in IP filters for proxy with the proper regular expression
*) some cleanup

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7477 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-11 23:37:13 +00:00
orbiter
fe93caac5a added flags and administration options to show advanced search and to show search result attributes (for each search result)
Administration can be done at ConfigPortal.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7466 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-02 15:54:13 +00:00
orbiter
88773e4daa changed the default port from 8080 to 8090
see also: http://forum.yacy-websuche.de/viewtopic.php?p=21683#p21683

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7454 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-28 10:54:13 +00:00
orbiter
6c35b68f17 - removed 'peerName' property from the yacy settings file because this information is stored in the yacy seed file
- the peer's own seed file is now the primary storage for the peer name
- exchanged default peer name generation method with one that does not use the local ip
- default peer names are now strings starting with '_anon'
- added another switch to suppress forwarding to ConfigBasic if the name was already changed
- replaced all usages of the yacy.conf peerName with access to the local seed
- changes to the peer name are now applied directly and not after the next peer ping


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7453 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-28 10:12:17 +00:00
orbiter
786166041a - added recording of all accessed and submitted servlets
- this recording is then used to redirect from the Status.html page to BasicConfig in case that servlet was never submitted
- this acts as an addition to the new default pop-up page 'index.html' which offers an administration link to Status.html. For a first-time user this then redirects directly to the former start page BasicConfig.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7451 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-27 11:17:11 +00:00
orbiter
3fe03f153d - search page becomes default start page (new users are not forced to do configuration since this is not necessary)
- adjusted top menu on search page (shows less stuff and now also the network graphics)
- adjusted the network page (looks better when no other navigation is shown on top)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7448 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-26 14:58:28 +00:00
orbiter
59d9fe1bd7 added more php mime types
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7443 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-22 09:52:36 +00:00
orbiter
3ae8f40fc8 removed yacy.network.group - this feature was never used
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7442 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-22 09:50:36 +00:00
orbiter
efb4ca8fa8 modified auto-delete of search failure-words:
- words are now not deleted from the search index automatically if index receive is switched off
- a flag in the network definition defines if this feature is switched on at all
- the search filter for not-found word references is switched off for server-side remote searches

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7441 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-22 09:46:00 +00:00
f1ori
4e29e9712a * create cleanupjob for cached failed urls
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7437 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-17 15:04:00 +00:00
lotus
b1484299b2 same units for memory observer configuration (MiB)
old setting for DHT (RAM) will be lost after update
can be set on /Performance_p.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7418 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-02 20:38:01 +00:00
low012
11ea966f9e *) added SID file (Commodore 64) sound file parser
*) minor changes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7403 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-28 12:06:04 +00:00
low012
936e976c23 *) added FreeMind (http://freemind.sourceforge.net/) mindmap parser
*) minor changes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7397 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-27 20:13:31 +00:00
orbiter
4565b2f2c0 removed the display option from index.html, yacysearch.html and yacyinteractive.html
instead, a setting at ConfigPortal.html can be made to define whether the topmenu is shown on these pages or whether there is no navigation at all.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7366 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-08 10:50:23 +00:00
orbiter
fc2e41e691 added a forwarder for the default page. The forwarder forwards a browser to a different page if the root file index.html is accessed. This is configured by entering the name of the forwarder page in the field
"Default index.html Page (by forwarder)" in /ConfigPortal.html
The purpose is to forward to /yacyinteractive.html for the 27C3 FTP search platform

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7365 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-07 15:46:04 +00:00
orbiter
cc6499bf8d - added http://blekko.com as search heuristic (like scroogle). This was easy since they also deliver their search results as an rss feed
- renamed YaCy's search result modifier keywords for RECENT, NEAR and language: to the blekko slashtag naming scheme. YaCy now supports the following blekko-like built-in slashtags:
  /date
  - for search results ordered by date (most recent first)
  /near
  - for search results where search words appear near to each other (closest first)
  /language/<lang>
  - for a sorting by language where the wanted language comes first. Example: /language/de

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7350 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-29 18:08:20 +00:00
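To illustrate how the slashtags listed above separate from ordinary search words, here is a small parser sketch. It is not YaCy's query parser: the function name `parse_slashtags` is hypothetical, and it assumes slashtags appear as whitespace-separated tokens anywhere in the query.

```python
from typing import List, Optional, Tuple


def parse_slashtags(query: str) -> Tuple[List[str], Optional[str], Optional[str]]:
    """Split a query into plain search words, an ordering modifier
    ("date" or "near") and an optional language code.

    Hypothetical helper; assumes slashtags are whitespace-separated tokens.
    """
    prefix = "/language/"
    words: List[str] = []
    order: Optional[str] = None
    language: Optional[str] = None
    for token in query.split():
        if token in ("/date", "/near"):
            order = token[1:]  # strip the leading slash
        elif token.startswith(prefix) and len(token) > len(prefix):
            language = token[len(prefix):]  # e.g. "de"
        else:
            words.append(token)
    return words, order, language
```

For example, `parse_slashtags("yacy p2p /date")` yields the words `["yacy", "p2p"]` with ordering `"date"`, and `"suche /language/de"` yields the word `"suche"` with language `"de"`.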
orbiter
a9f754c45f removed unused CR accumulation and distribution process
this was never used or extended in recent years. The resulting YBR ranking criterion
is still a good idea and will be used in the future. Possible generation methods for YBR
ranking are:
- "trust-rank" using the link structure as can be discovered in a single crawl (idea from FSCONS)
- "block-rank" calculated from the local link structure
- a distributed "block-rank" using the xml API to the link structure from other peers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7349 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-29 11:07:42 +00:00
f1ori
442bebca2b * %0 does not belong to the IPv6 address -> the entry does not work on some systems
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7310 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-06 15:09:28 +00:00