Commit Graph

564 Commits

Author SHA1 Message Date
Michael Peter Christen
d6b82840f8 added a feature to find similarities in documents.
This uses an enhanced version of the Nutch/Solr TextProfileSignatue.
As a result, a signature of the document is written to the solr search
index. Additionally for each time when a signature is written, it is
checked if the singature exists already in the index. If the signature
does not exist, the document is marked as unique. The unique attribute
can now be used to sort document lists and bring duplicates to the end
of a result list.
To enable this, a large portion of the search api to Solr had to be
changed. This affected mainly caching of 'exists' searches to enhance
the check for existing signatures and do this without actually doing a
solr query.
Because here the first time a long number is used as value in the Solr
store, also the value naming in the YaCySchema had to be adopted and
normalized. This caused that many files had to be changed.
2012-11-21 18:46:49 +01:00
Michael Peter Christen
952e143580 FINALLY YaCy can now search for full strings using double- or
singlequoted strings in the search query line!!!
2012-11-18 16:03:34 +01:00
orbiter
5dfd6359cb redesign of the QueryParams class: introduced QueryGoal which holds the
query string parser. This shall be used to create a proper full-string
matching which is handled then by QueryGoal.
2012-11-18 01:22:41 +01:00
Michael Peter Christen
d64445c3cb because we have the inurl:<term> - searchmodifier, we don't actually
need regular expressions as search attributes. They had now been removed
from the advanced search page while they are still created internally.
The filter is then expressed against solr as regular expression filter
query. If the expression points out a selection of an specific protocol,
host or filetype this is then translated into a facetted query.
2012-11-13 11:45:56 +01:00
Michael Peter Christen
61995d508e do the commit anyway before calling a search interface 2012-11-07 17:27:50 +01:00
Michael Peter Christen
570e42c4e3 fix for filetype naviagtor 2012-11-07 13:53:29 +01:00
Michael Peter Christen
2371ef031c added solr faceted search support to YaCy search results
added solr highlighting / YaCy snippets to YaCy search results
- facets are now much more complete
- facets are computed and searched much faster
- snippet computation is done by solr if solr knows the snippet
2012-11-06 14:32:08 +01:00
Michael Peter Christen
8fb370d9f8 renovated the way how search results are count. should be correct now... 2012-11-05 03:19:28 +01:00
Michael Peter Christen
1168d09de8 more refactoring - integrated the code of SnippetProcess into
SearchEvent
2012-11-01 17:40:06 +01:00
Michael Peter Christen
6629e37685 tried to clean up the search process mess 2012-11-01 17:16:43 +01:00
Michael Peter Christen
c5f67a5d6d fixed a problem with local search from solr results: now all results
from solr are shown (again)
2012-11-01 10:22:22 +01:00
Michael Peter Christen
a87811bc38 more auto-commit calls when a search interface is opened, but not when a
search is done there to prevent blocking during search-time.
2012-10-29 11:27:13 +01:00
Michael Peter Christen
8e1248ffe3 force a commit in advance of a search for the administrator to get most
recent results even if commit time is high and an indexing is ongoing.
2012-10-26 15:35:42 +02:00
Michael Peter Christen
43f3345c90 - removed dependencies from URIMetadataRow and made direct access to
URIMetadataNode which creates the opportunity to access Solr objects
directly and use their information richness
- lazy initialization of the URIMetadataNode object - should cause less
computation and memory usage during search.
- removed dead code
2012-10-16 18:11:57 +02:00
Michael Peter Christen
5f0ab25382 removed the option to prevent removal of &amp; parts inside of the
MultiProtocolURI during normalform computation because that should
always be done and also be done during initialization of the
MultiProtocolURI Object. The new normalform method takes only one
argument which should be 'true' unless you know exactly what you are
doing.
2012-10-10 11:46:22 +02:00
Michael Peter Christen
24d2ee3c52 - better date ranking
- more protection against NPE and time travel effects
2012-09-26 18:36:32 +02:00
Michael Peter Christen
1533bfd63b refactoring 2012-09-25 21:20:03 +02:00
Michael Peter Christen
e49359cc95 removed tenant query attribute since it is not used any more and is
replaced by the site-operator in the GSA interface. This operator can
also be simulated in the Solr interface using the collections_sxt field.
2012-09-25 21:09:06 +02:00
Michael Peter Christen
8219a445f3 refactoring 2012-09-21 16:46:57 +02:00
Michael Peter Christen
00c1c777fa refactoring 2012-09-21 15:48:16 +02:00
orbiter
63762d8f89 removed kelondro dependencies from cora 2012-09-20 19:38:22 +02:00
orbiter
a55e77a115 added twitter search heuristic 2012-09-13 23:53:53 +02:00
Michael Peter Christen
31d4d38804 - extended the solr interface by a references-by-word-count method
- reduced danger that a non-existing RWI database causes NPEs
- added Solr queries to did-you-mean: this makes it possible that our
did-you-mean algorithm works together with only Solr and without RWIs
2012-08-31 13:03:00 +02:00
Michael Peter Christen
0cab06c47c refactoring 2012-08-17 15:52:33 +02:00
Michael Peter Christen
18f989dfb1 - refactoring (load -> getMetadata)
- added getDocument to retrieve Solr documents which shall replace
getMetadata
2012-08-17 01:34:38 +02:00
Michael Peter Christen
6197caf698 added clear-text search words in query params 2012-08-16 23:05:37 +02:00
Michael Peter Christen
24d9db1613 snippet retrieval loading processes may use a smaller minimum load time
value than crawling processes. This speeds up the search result
preparation dramatically.
2012-07-30 10:38:23 +02:00
Michael Peter Christen
1687737771 Abstraction of HandleMap and HandleSet 2012-07-27 12:13:53 +02:00
Michael Peter Christen
315d83cfa0 cleanup 2012-07-24 22:16:56 +02:00
reger
36c9875b6e removed localized number formatting from num-results_totalcount response (this is only used in xml and json where localized format is not valid) 2012-07-23 00:00:40 +02:00
orbiter
69e743d9e3 - more abstraction for the RWI index as preparation for solr integration
- added options in search index to switch parts of the index on or off
2012-07-22 13:18:45 +02:00
orbiter
c00a3cf74d less usage of generic logger to avoid logger generation overhead 2012-07-12 19:54:54 +02:00
orbiter
62202e2d71 refactoring of query attribute variable names for better consistency
with (next) stored query words
2012-07-09 11:14:50 +02:00
Michael Peter Christen
b0c408788b made class methods static where possible 2012-07-05 12:38:41 +02:00
Michael Peter Christen
7c1ba99755 removed more unused method parameters 2012-07-05 10:44:30 +02:00
Michael Peter Christen
241dd8410a removed snippet pattern filter - it was not used 2012-07-05 09:21:27 +02:00
Michael Peter Christen
1825f165b8 better integration of blacklist according to use case 2012-07-02 13:57:29 +02:00
Michael Peter Christen
03280fb161 removed segments-concept and the Segments class:
the segments had been there to create a tenant-infrastructure but were
never be used since that was all much too complex. There will be a
replacement using a solr navigation using a segment field in the search
index.
2012-06-28 14:27:29 +02:00
Michael Peter Christen
96aeb127e3 generalized localhost naming.
this is also a preparation for a better IPv6 implementation.
2012-06-26 00:08:25 +02:00
Michael Peter Christen
77f795756c fixing redirects and status codes: storing of status code in
ResponseHeader to make it available for late evaluations, like storage
in solr.
2012-06-25 18:17:31 +02:00
Michael Peter Christen
9264d8b4af removed old navigation practice using subject tags in favor of
triplestore-tags
2012-06-17 00:33:40 +02:00
Michael Peter Christen
eeb4fd8b8c refactoring (geolocalzation -> geolocation) 2012-06-16 22:09:32 +02:00
Michael Peter Christen
8b53771db2 changed behavior of navigation processing:
- vocabulary annotation is not done any more into the metadata of urldb
- vocabularies are written into the jena triplestore using a rdf
vocabulary
- vocabularies for rdf tripel must be updated; refactoring done
- with the new navigation tags in the triplestore a faster
pre-urldb-lookup is possible: navigation is processed now within the RWI
during pre-ranking retrieval
- added also a Owl vocabulary stub to add the plain-text url to the
triplestore using the owl:sameas predicate
2012-06-11 23:49:30 +02:00
Michael Peter Christen
dd14b19c26 lazy initialization of block rank table ... only normal web search uses
this. When interactive search or location search is used, the block rank
is switched off
2012-06-08 09:41:29 +02:00
Michael Peter Christen
a61f44f9e4 lazy initialization of block rank table.
this causes that the table is not initialized when there is no search is
done. the effect is most strong if YaCy is started headless which causes
no browser pop-up which otherwise would load the search page and
therefore trigger the initialization of the table.
2012-06-07 13:16:38 +02:00
Michael Peter Christen
96c8119b50 added GeoLocation / GeoPoint classes which uses less memory than
Location/Coordinates and has initializers with correct order of lat,lon
coordinates
2012-06-06 12:57:42 +02:00
Michael Peter Christen
a1fe65b115 performance hacks 2012-06-05 12:06:26 +02:00
Michael Peter Christen
2fe207f813 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-06-04 23:44:38 +02:00
Michael Peter Christen
5aee19daa4 added show from cache in search results (not yet finished) 2012-06-04 23:44:26 +02:00
Michael Peter Christen
e0d8643226 - performance hacks
- added log warnings in case that search processes run into time-out
situations
- better concurrency for Integer formatter (used a non-synchronized
formatter before)
- bugfix for search termination (a poison pill was missing)
- added timeout parameters for search (again) -> target is, that they
are never reached.
2012-06-04 15:37:39 +02:00
Michael Peter Christen
9b4c699526 ehanced location search:
- search request are now made using a map boundary
- search results are only computed for the map boundary
- the number of results is adopted to the results in the visible range
- added a double-buffering for the search result markers
- added a search query option for the search results:
/radius/<lat>/<lon>/<radius>
2012-05-31 22:39:53 +02:00
Michael Peter Christen
8b974905ee changed log-in text for all servlets with authentication:
- added hint how to set the password using a shell script
- added a shell script to change the password
2012-05-24 13:24:31 +02:00
Michael Peter Christen
7bf421b9dd - fixed image search page navigation
- removed some deadlocks and ConcurrentModificationExceptions during
DidYouMean collection
2012-05-21 01:58:29 +02:00
Roland 'Quix0r' Haeder
5f983faef9 No &amp; in JavaScript-embeded URLs, added ability to stop focus in
ConfigPortal.html preview (is this not secured with _p????)

Conflicts:
	htroot/yacyinteractive.java
	htroot/yacysearch.java
2012-05-17 05:49:25 +02:00
Michael Peter Christen
77f8e9fb9b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-05-04 17:29:16 +02:00
Michael Peter Christen
ba6aaabc51 refactoring + parser bugfixes 2012-05-04 17:28:27 +02:00
Michael Peter Christen
a18b6dee04 Merge remote branch 'bbyacy-rc1/master' 2012-04-29 21:22:23 +02:00
reger
ea932f841c changed link to opensearchdescription document to a absolute uri (in yacysearch.html and yacysearch.rss)
see http://www.opensearch.org/Specifications/OpenSearch/1.1/Draft_5#The_.22Description.22_element
2012-04-29 05:50:56 +02:00
Michael Peter Christen
5c66880be2 fix for search result selection in case that contentdom is not set 2012-04-26 00:04:23 +02:00
Michael Peter Christen
4aa0eedead one more scroogle... 2012-04-24 12:05:37 +02:00
Michael Peter Christen
347612ddd4 removed scroogle parser 2012-04-24 12:04:44 +02:00
Michael Peter Christen
f8cd57c92f new indexing strategy: ALL links that appear anywhere are indexed, not
only links where the content can be parsed. All non-parseable links are
placed into the noload queue. The search process must therefore be able
to filter out non-text search results.
- This fixes the problem that image search results appeared in the text
search.
- The interactive search can retrieve now ALL types of links
- The p2p interface is now extended to retrieve only certain types of
links (text, image, video, apps)
- The search process has an extension to filter the right document type
according to the search query
2012-04-22 02:05:17 +02:00
Michael Peter Christen
14f67f217c refactoring of ContentDomain: now subclass of Classification 2012-04-22 00:04:36 +02:00
Lotus
78f0d8f046 no focus on preview frames for search integration
fixes bug http://bugs.yacy.net/view.php?id=161
2012-03-17 21:10:29 +01:00
Michael Peter Christen
e2f8f263e8 changed storage of search words: keep order 2012-02-01 18:13:31 +01:00
Michael Peter Christen
88b86afc89 no DoS protection for intranet mode 2012-01-25 12:13:03 +01:00
Michael Peter Christen
e8d24fd802 author navigator can be switched off 2012-01-25 11:11:42 +01:00
Michael Peter Christen
d5ead5314d changed navigation links: now using checkboxes.
This looks better and allows that negative checkboxes (such that remove
the navigation) are possible. These are not yet implemented (comming
next)
2012-01-24 19:03:47 +01:00
Michael Peter Christen
dc165275ad bugfix for usage of multiple vocabulary navigators 2012-01-18 01:00:30 +01:00
Michael Peter Christen
83009d86f7 added the vocabulary navigator. It can be very simply tested by
switching on the locale dictionaries.
2012-01-17 01:53:08 +01:00
stbrumm
d18095dc48 Patch fuer Issue 0000102
and fixes to Patch (private peer status is a property of a peer, not a
status)
2012-01-03 17:49:37 +01:00
Michael Christen
5bfb287753 make a bad fix even worse 2011-12-27 20:59:03 +01:00
Michael Christen
9e5894c784 Removed handling of components objects for URIMetadataRows.
This is a preparation to replace this rows with nodes from the node
store.
2011-12-17 01:27:08 +01:00
Michael Peter Christen
0bcef2d156 added feature as requested in
http://forum.yacy-websuche.de/viewtopic.php?f=18&t=3461
The search can now be configured with a non-display host list.
the search will always exlude the given list of host unless they are
requested directly using the host navigation
2011-12-13 00:16:05 +01:00
orbiter
4b8ff84705 - search bugfixes (page counter and number of results per page; recognition of new search)
- experiments to speed-up the network image production (commented out)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8130 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-12-02 09:16:24 +00:00
orbiter
6c9320e82a fix for latest navigation feature...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8121 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-27 22:11:39 +00:00
orbiter
5b2e68b60d fixed page navigation counter
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8113 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-27 15:38:37 +00:00
orbiter
ebd840ebf6 - enhanced description on search front page
- fixed language and heuristic modifier
- added hint to crawl start that we can do also ftp and smb crawls
- added a protocol extension to remote crawls to transport all search modifiers to remote peers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8108 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-26 13:40:33 +00:00
orbiter
e22f8497c9 - tested the ARC methods
- removed strict authentication (if password is empty; this was buggy and not useful; can be switched on if necessary globally and not for each interface method)
- increased speed of CrawlResults page (no dns lookup any more)
- increased speed of favicon display (removed dns lookup)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8104 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-25 14:09:25 +00:00
orbiter
5a55397f99 some last-minute performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8101 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-25 11:23:52 +00:00
orbiter
0cf9ebc3b0 speed enhancements when parsing RWI rows (makes search slightly faster)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8096 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-24 17:37:04 +00:00
orbiter
ee8b1d4de1 fixed unresolved pattern and unwanted local/global switch when using votes on search results
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8093 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-24 16:05:09 +00:00
orbiter
c584db991f creating a bookmark from the search results now works again .. with new YMarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8092 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-24 14:57:09 +00:00
orbiter
1019c36dad bug fixes and speed enhancements for search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8085 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-24 01:30:12 +00:00
orbiter
8e0b2c5832 fixed cluster search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8083 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-23 22:21:14 +00:00
orbiter
804e48888b smaller bug fixes for search behavior; should produce less unnecessary removals and an exact number of results as shown in counter
should also be a little bit faster

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8057 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-18 13:09:07 +00:00
orbiter
84c3fc9d97 local/global fixes in search, better abstraction
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8054 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-17 01:05:45 +00:00
orbiter
0d858d48ec replaced String with StringBuilder in suggestion process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8020 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-09 14:42:55 +00:00
orbiter
94eab08794 - updated opensearchdescription text and icon
- removed automatic setting of maxitems during search (can be set now elsewhere)
- updated RSSMessage.java

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8009 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-10-30 01:09:38 +00:00
orbiter
9e4875230f performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8001 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-10-20 23:06:49 +00:00
orbiter
a7df70221e refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7987 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-10-04 09:06:24 +00:00
orbiter
d2ea250d99 refactoring:
- moved many classes from de.anomic to net.yacy
- made more sub-packages for search classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7973 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-25 16:59:06 +00:00
orbiter
594d8f546a #cccamp11 maintenance fix: anons may find up to 1000 items in interactive search (was: 100)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7866 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-11 21:37:35 +00:00
sixcooler
59b767eebd stop loading via http at defined maximum of bytes - even size is unknown before loading
using max-file-size of type int for parsing documents
(since content is used as byte-arrays, 'integer' should be maximum)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7855 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-01 23:28:23 +00:00
orbiter
11dc653de3 added a visualization of peer pings to the performance graphic
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7837 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-14 07:07:06 +00:00
orbiter
6d9e5865ee faster appearance of search result page (but complete search time is the same)
this was inspired by http://bugs.yacy.net/view.php?id=37

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7801 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-26 21:17:02 +00:00
orbiter
31283ecd07 - added a search option to filter only specific network protocols. i.e. get only results from ftp servers. Just add '/ftp' to your search.
for example search for "passwd /ftp". This can also be done with /http /https and /smb
- fixed some search throttling processes that should protect your peer against search DoS or strong search load

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7794 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-23 11:57:17 +00:00
orbiter
115abc8917 - more attributes for search progress bar
- moved cache strategy to cora package

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7778 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-13 21:44:03 +00:00
orbiter
87082f407e less String object creation during search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7756 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-30 04:19:20 +00:00
orbiter
4bea3f9714 hack to reduce resource contention caused by massive UTF8 decodings which use java.nio resources:
used a ASCII String <-> byte[] conversion wherever possible. Many Strings in YaCy are hashes which are pure ASCII (base64 hashes).
The new ASCII String <-> byte[] conversion method have less computation overhead than the UTF8 conversion.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7746 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-27 08:24:54 +00:00
orbiter
e28bd0d038 fix for some possible causes of memory leaks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7741 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-26 14:35:32 +00:00
orbiter
8d9b5dda3b disabled did-you-mean computation for json and rss search results where this info is not used
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7739 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-26 12:35:24 +00:00
orbiter
5b579e21a3 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7713 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-13 06:21:40 +00:00
orbiter
0621a15f89 fix for wrong search result counter: added a counter for all filtered out entities
see also http://bugs.yacy.net/view.php?id=5

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7704 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-06 23:04:27 +00:00
orbiter
6e42d4de88 - added full-String search function: find things that match exactly what is quoted in the query
- re-structuring authentification methods to fix a problem with API steering

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7697 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-05 00:25:14 +00:00
low012
1ff9947f91 *) added new user right: extended search right (allows to define users who can query more results than anonymous users)
*) cleaned up code a little bit

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7635 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-01 23:32:40 +00:00
orbiter
156cf02703 - added an index constraint 'has location' to the condenser
- added evaluation of the 'has location' constraint to search using the /location operator


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7633 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-31 09:41:30 +00:00
low012
16cd919795 *) fixed Exceptions which caused 500 error when entering invalid URL mask or invalid prefer mask, invalid masks are ignored, error message is displayed on yacysearch.html (what about yacysearch.rss and yacysearch.json?)
*) fixed "more options" link on yacysearch.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7623 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-23 00:48:19 +00:00
orbiter
ba03ca8620 added more configuration options for search:
- removed configuration button for 'search only for admin' from index.html and added this to ConfigPortal
- added configuration of link verification options (iffresh, cacheonly, nocache, ifexist) to ConfigPortal
- added configuration of navigation options to ConfigPortal
- added an option to switch off automatic index cleaning in case that a link verification method fails


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7613 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-21 07:50:34 +00:00
low012
2861d0888a *) simplified code\n*) fixed potential NumberFormatExceptions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7600 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-15 01:03:35 +00:00
low012
bea8137997 *) minor changes
*) fixed potential NPE in suggest.java

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7571 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-08 23:27:41 +00:00
low012
3e03963b1c *) minor changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7570 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-08 22:37:17 +00:00
orbiter
cb1f49d0f2 replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7558 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-07 20:36:40 +00:00
orbiter
bed79402be introduction of a new remote search load control: the remote search has taken 10 results per peer with a time-out of 3 seconds so far. The attributes of number of results per peer and time-out time can now be configured.
This has two aspects: the user who searches may want to increase these values to get more results and more load on the remote side and the user of the server which is accessed for this search may want to restrict the load. Both sides can now be configured. The server-site maximum load parameters are defined by a network definition and the client-side search request load can be defined by each user individually but when the remote search is done the requested service is limited to the network definition.

You can find now in the network definition file:
network.unit.remotesearch.maxcount and network.unit.remotesearch.maxtime
and in the yacy.conf file:
remotesearch.maxcount and remotesearch.maxtime

There is currently no web interface to define the client-side remote search attributes, please set them manually
    

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7548 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-04 13:44:00 +00:00
orbiter
24909b3006 slightly less restrictive values for DoS
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7509 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-21 15:24:09 +00:00
orbiter
311f57d360 DoS to prevent online snippet fetch: allow read from cache.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7508 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-21 15:11:03 +00:00
orbiter
5892fff51f introduction of dht-burst modes: this can expand the number of target peers in some cases where a better heuristic is needed. The problematic cases are either when a muti-word search is made (still a hard case for our term-oriented DHT) or when a network operator wants that all robinson peers are asked. We therefore introduced two new network steering values that switch on more peers during the peer selection. Because the number of peers can now be very large, the number of maximum httpc connections was also increased.
Please see new coments in yacy.network.freeworld.unit for details of the new DHT selection methods.
The number of maximum peers is now not fixed to a specific number but may increase with
- the partition exponent
- the number of redundant peers
- the robinson burst percentage
- the multiword burst percentage
The maximum can then be the number of senior peers (all visible peers).

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7479 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-13 17:37:28 +00:00
orbiter
4588b5a291 - fixed document number limitation for crawls that restrict the number of documents per domain
- some restructuring of the document counting and logging structures was necessary
- better abstraction of CrawlProfiles
- added deletion of logs to the index deletion option (if the index is deleted using the servlets) which is necessary to reset the domain counters for the page limitation
- more refactoring to get the LibraryProvider more clean
- some refactoring of the Condenser class

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7478 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-12 00:01:40 +00:00
orbiter
0cdfb82963 replaced more appearance of double values by float values
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7461 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-02 00:06:29 +00:00
orbiter
88773e4daa changed the default port from 8080 to 8090
see also: http://forum.yacy-websuche.de/viewtopic.php?p=21683#p21683

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7454 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-28 10:54:13 +00:00
orbiter
efb4ca8fa8 modified auto-delete of search failure-words:
- words are now not deleted from the search index automatically if index receive is switched off
- a flag in the network definition defines if this feature is switched on at all
- the search filter for not-found word references is switched off for server-side remote searches

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7441 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-22 09:46:00 +00:00
orbiter
c93f4dda72 - cleaned up yacy news
- removed unused methods
- avoid news generation in case that the peer runs in robinson mode

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7431 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-01-12 00:00:14 +00:00
orbiter
a4c9d27287 - moved some variables from Stwitchboard to new class AccessTracker
- added a limitation in access tracking to delete queries which are older than 10 minutes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7410 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-29 01:54:27 +00:00
orbiter
b2ed4cfaf8 more small bugfixes and light refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7401 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-28 01:57:05 +00:00
orbiter
1b6702146f remove '*' from query string (people believe that this is a wild card)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7400 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-28 01:27:36 +00:00
orbiter
4565b2f2c0 removed the display option from index.html, yacysearch.html and yacyinteractive.html
instead, a setting at ConfigPortal.html can be made to define if the topmenu shall be shown at these pages or if there is no naviagtion at all. 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7366 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-08 10:50:23 +00:00
orbiter
db99db4be9 some redesign of the search-fail-response mechanism:
when a search fails for a single url because the snippet cannot be generated, then the url reference is deleted from the index. This mechanism was redesign and enhanced. The process now also writes into the work tables into the table searchfl to prepare a re-indexing mechanism.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7364 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-06 14:34:58 +00:00
orbiter
18d33b5c6d fixed several search result navigation bugs
fixed bad behaviours during search result collection

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7362 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-05 23:54:00 +00:00
orbiter
49b5a206cd - better caclculation of search result size
- predefined search recommendations

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7361 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-02 12:19:59 +00:00
orbiter
f0651e5f2f added image search to yacyinteractive.html
this causes that the search result view switches from list format to image preview format when a search is restricted to png, gif or jpg documents

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7358 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-01 18:48:21 +00:00
orbiter
cc6499bf8d - added http://blekko.com as search heuristic (like scroogle). This was easy since they deliver their search results also as rss feed
- renamed YaCys search result modifications keywords for RECENT, NEAR and language: to the blekko slashtag naming scheme. YaCy now supports the following blekko-like slash built-in slashtags:
/date
 - for search results ordered by date (most recent up)
 /near
 - for search results where search words appear near to each other (closest up)
 /language/<lang>
 - for a sorting by language where the wanted language gets up. Example: /language/de
  

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7350 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-29 18:08:20 +00:00
orbiter
d4a1a1850b removed warnings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7347 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-29 07:52:10 +00:00
low012
9b3fae9496 *) cleaning up the code a little bit
*) program to interface, not implementation

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7345 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-28 02:57:31 +00:00
orbiter
7bb4b001ed - view image files from cache
- fixed generic header settings; affects CORS functionality

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7344 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-27 09:16:16 +00:00
orbiter
70c95608d4 Added CORS Access header for yacysearch.rss output
used some of the recommendations from Copro:
http://forum.yacy-websuche.de/viewtopic.php?p=21015#p21015
Original Request:
http://forum.yacy-websuche.de/viewtopic.php?p=20829#p20829

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7288 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-02 16:28:40 +00:00
orbiter
facfd204e9 added a parent configuration option.
see /ConfigPortal.html
requested here:
http://forum.yacy-websuche.de/viewtopic.php?p=21099#p21099

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7271 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-25 22:16:07 +00:00
orbiter
863065abc4 added user agent logging to access tracker
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7256 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-18 08:09:59 +00:00
orbiter
de722090b5 enhancements in did-you-mean guessing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7243 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-12 09:45:15 +00:00
orbiter
a59c885ee0 autocomplete and did-you-mean can now understand _all_ languages and can generate suggestions in all languages and character types
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7242 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-12 08:36:33 +00:00
orbiter
45b1ab3d07 custom + generic skins:
- added a generic skin which is filled with actual color assignment using a servlet
- enabled css servlets
- added a generic color scheme in configuration file
- added configuration input in Customization/Appearance servlet
- added a jquery color picker widget
- placed color picked widget to input field of generic colour definition input fields

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7235 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-11 00:00:10 +00:00
orbiter
0d363a94d7 more performance hacks
this makes YaCy search results VERY fast for all verify=false search cases
and it enhances the search speed also for all other snippet-fetch cases.
With this change my peer performed 100 Queries Per Second (!!!) while doing 10 queries simultanously (!!!)
in an intranet index of 20000 URLs on my 16-core Mac

Check this yourself by doing:
cd bin
./searchtestmulti.sh
after finishing the run, divide 1000 by the given time per query (which is the qps for one thread)
and then multiply again by 10 (because 10 search threads has been started)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7231 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-09 08:55:57 +00:00
orbiter
10a9cb1971 simplified snippet computation process and separated the algorithm into two classes
also enhances selection criteria for best snippet line computation

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7182 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-22 20:50:02 +00:00
orbiter
84a023cbc8 fixed several search bugs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7180 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-21 21:48:42 +00:00
lotus
937dd956d3 save default number of search items via web interface
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7179 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-21 19:45:49 +00:00
low012
f32bb5e51f *) Changed image in Steering.html from linked image to embedded image because shutdown is so fast now, browsers can't load image before Yacy instance is gone already. Had to make image smaller since IE does not accept large Base64 encoded images.
*) Decreases wait time in Steering.html before first check since 
*) HTML fixes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7165 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-18 00:18:52 +00:00
orbiter
906c572621 - enhanced index create menu structure
- clear search log caches each time a search is done

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7142 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-14 09:06:27 +00:00
orbiter
64860dc1bb enhanced search event logging (to be used for further improvements)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7140 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-13 09:33:04 +00:00
orbiter
34a25856a5 - added navigation to next/prev search page using arrow keys (left/right)
- better information text for YaCy GUI application

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7134 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-10 10:42:01 +00:00
lotus
b73ea6581d fix json in case of query includes "
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7125 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-08 11:54:25 +00:00
orbiter
3197ca42ed preparations to move the HTCache into cora:
- move the header framework classes to cora
- move the ARC caching classes to cora
- refactoring of code to call these classes from cora

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7068 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 12:32:02 +00:00