2145 Commits

Author SHA1 Message Date
Michael Peter Christen
85b1922244 activated image type navigation for image search 2013-09-03 13:34:01 +02:00
Michael Peter Christen
9e12fdff23 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-09-03 12:22:57 +02:00
Michael Peter Christen
ab1201fdfd fixed wrong facet count 2013-09-03 12:22:29 +02:00
Michael Peter Christen
049c3b3f2e added an option to exclude image search results from text search. This
is on by default.
2013-09-03 11:14:23 +02:00
Michael Peter Christen
69f85265e1 added an option to put image links to the crawl queue and handle these
like normal documents. Using this option (by default on at this moment;
this might change soon) it is possible to get the exif data into the
search index to be used in image search.
2013-09-03 11:13:45 +02:00
Michael Peter Christen
e8e558a9b7 fix for content domain classification in URIMetadataNode 2013-09-03 10:49:09 +02:00
Michael Peter Christen
a8c5bfcf58 avoid creating unnecessary objects 2013-09-03 09:48:05 +02:00
Michael Peter Christen
5a0de1b77d moving image description text to image text field 2013-09-03 09:47:27 +02:00
Michael Peter Christen
dc179bd61f fix for catchall query goal for image search 2013-09-03 07:55:21 +02:00
reger
392174de8c remove all_words, all_strings lists from QueryGoal
- only used for text highlighting in parser text (ViewFile.html) which can be done with include_strings only
2013-09-02 23:09:43 +02:00
Michael Peter Christen
169ef8963d one more fix for image search 2013-09-02 20:02:26 +02:00
Michael Peter Christen
cb85b22725 redesign of the image search process (with much better results,
unfortunately the index schema has changed and p2p image search will not
be much better until many people update)
2013-09-02 18:55:38 +02:00
reger
29967102a2 optimized QueryGoal (reducing mem and computation by removing all_hashes)
- all_hashes used for text highlighting and word distance computation which can be done with include_hashes only
2013-09-02 04:19:53 +02:00
orbiter
f106345eef link strings should not be tokenized 2013-09-01 14:35:36 +02:00
orbiter
deadeb406e image alt tag strings should be tokenized 2013-09-01 13:48:10 +02:00
reger
d0e78082d1 return field names in index instead of in schema for SolrServerConnector.getFields 2013-08-31 06:25:12 +02:00
Michael Peter Christen
1a3e42eca4 index migration to lucene 4.4 2013-08-26 12:49:39 +02:00
Michael Peter Christen
a88a62f7aa added a feature to set a collection for a crawl result based on a
regular expression on the url: the collection attribute for a crawl start
may now be either a token or a list of tokens, separated by ',' where a
token is either a string or a pair <string,pattern> where the string is
separated from the pattern by a ':' and the string is assigned to the
document as collection only if the pattern matches the url.
2013-08-25 00:13:48 +02:00
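The token syntax described in this commit could be evaluated roughly as follows. This is a minimal sketch, not YaCy's actual code: the class name `CollectionRules`, the method `collectionsFor`, and the choice to require a full match of the URL are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper (not a YaCy class): evaluates a collection attribute against a URL. */
final class CollectionRules {

    /**
     * @param attribute comma-separated tokens, each either "name" or "name:pattern"
     * @param url       the URL of the crawled document
     * @return the collection names that apply to this URL
     */
    static List<String> collectionsFor(String attribute, String url) {
        List<String> result = new ArrayList<>();
        for (String token : attribute.split(",")) {
            token = token.trim();
            if (token.isEmpty()) continue;
            int sep = token.indexOf(':');
            if (sep < 0) {
                // plain token: the collection is always assigned
                result.add(token);
            } else {
                // pair: everything after the first ':' is the pattern
                String name = token.substring(0, sep);
                String regex = token.substring(sep + 1);
                // assumption: the pattern must match the whole URL
                if (Pattern.matches(regex, url)) result.add(name);
            }
        }
        return result;
    }
}
```

With the attribute `user,news:.*/news/.*`, a URL under `/news/` would get both `user` and `news`, any other URL only `user`. A limitation of this sketch is that a pattern containing a ',' would be split apart, and an invalid pattern throws `PatternSyntaxException`.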
Michael Peter Christen
3c5abedabf NPE during shutdown fix 2013-08-24 23:36:50 +02:00
Michael Peter Christen
e4cbe9232d fixed a crawler bug where a double-occurring url was not re-crawled
because the double-check error was written to the error-db and never
deleted. Now the error-db is cleared on every start and these
double-messages are not written to the error-db any more.
2013-08-22 15:56:09 +02:00
Michael Peter Christen
765943a4b7 Redesign of crawler identification and robots steering. A non-p2p user
in intranets and the internet can now choose to appear as Googlebot.
This is an essential necessity to be able to compete in the field of
commercial search appliances, since most web pages are these days
optimized only for Google and no other search platform any more. All
commercial search engine providers have a built-in fake-Google User
Agent to be able to get the same search index as Google can do. Without
the resistance against obeying to robots.txt in this case, no
competition is possible any more. YaCy will always obey the robots.txt
when it is used for crawling the web in a peer-to-peer network, but to
establish a Search Appliance (like a Google Search Appliance, GSA) it is
necessary to be able to behave exactly like a Google crawler.
With this change, you will be able to switch the user agent when portal
or intranet mode is selected on per-crawl-start basis. Every crawl start
can have a different user agent.
2013-08-22 14:23:47 +02:00
Michael Peter Christen
0f3d8890db removed an assert which causes a shortcut call circuit 2013-08-22 10:12:25 +02:00
Michael Peter Christen
6d5fefe060 added missing files :( 2013-08-20 16:31:34 +02:00
Michael Peter Christen
554c0351dd fix for http://bugs.yacy.net/view.php?id=286 2013-08-20 16:10:26 +02:00
Michael Peter Christen
47b1c81d08 - refactoring
- generalized writing of url attributes to solr documents
- added more url attributes to error documents
2013-08-20 15:46:04 +02:00
Michael Peter Christen
1c62fa7698 fix for bad snippets in gsa api 2013-08-18 10:37:25 +02:00
Michael Peter Christen
697613170d less logging for postprocessing (this was a debugging logging with high
CPU load)
2013-08-17 09:25:32 +02:00
reger
b4016ff324 - remove possible double initialization of rdfa parser
- use ordered list to use preferred parser for mime/extension first (relates to html, rdfa, argument parser)
- harmonize xhtml extension config for the 3 html base parsers
2013-08-14 21:12:10 +02:00
reger
f0575bd44b FieldReIndex: omit active vocabulary fields from reindex detection 2013-08-14 00:00:30 +02:00
reger
a5019bc470 make Vocabulary Navigator tags a hard result entry filter
by checking vocabulary tags also for rwi results (currently a filter is applied to the solr query)

TODO: as vocabularies are only locally valid, auto-switch to Searchdom.LOCAL could be considered.
2013-08-13 03:07:25 +02:00
reger
a67a4b7d86 improve tld: query modifier filter pattern (to prevent tld:net accepting www.abcinet.org) 2013-08-12 21:20:23 +02:00
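The kind of problem named in this commit can be illustrated with a host pattern that anchors the tld after a literal dot and at the end of the host. The sketch below assumes this is the intent; the class `TldFilter` and its methods are hypothetical, and the pattern actually used in YaCy may differ.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration of a tld filter pattern; not the pattern used in YaCy. */
final class TldFilter {

    /** Builds a pattern that accepts only hosts ending in ".<tld>". */
    static Pattern hostPatternFor(String tld) {
        // "\\." requires a literal dot before the tld, Pattern.quote() keeps
        // the tld from being interpreted as a regex, "$" anchors at the end
        return Pattern.compile(".*\\." + Pattern.quote(tld) + "$", Pattern.CASE_INSENSITIVE);
    }

    static boolean accepts(String tld, String host) {
        return hostPatternFor(tld).matcher(host).matches();
    }
}
```

Here `accepts("net", "www.abcinet.org")` is false because the host does not end in `.net`, while `accepts("net", "www.example.net")` is true.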
reger
02fe8b43ba Field Re-Indexing: display list of fields in reindex queue
change servlet to display statistic on 1st click (instead after refresh)
2013-08-11 04:51:29 +02:00
sixcooler
7f501b7c38 clear some caches before reporting low Memory
do not break lines in Network-table-rows
2013-08-08 14:38:26 +02:00
reger
b355dd52c6 Index Administration - Field Re-Indexing: exclude internal Solr _version_ field from obsolete field check 2013-08-08 00:55:21 +02:00
sixcooler
8a96140f92 fix / workaround for
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4750
+ Seed.hash should be final
2013-08-01 16:40:58 +02:00
Michael Peter Christen
2857499467 fix to collection schema; bug appeared for _txt fields with empty String
as content
2013-07-31 13:32:05 +02:00
Michael Peter Christen
dbfa865700 added a stub of a class for crawler redesign 2013-07-31 13:16:32 +02:00
Michael Peter Christen
76afcccaaf fix for default boolean post values: the default value MUST NOT be TRUE,
because it's normal that a boolean value is missing in the post argument
if a checkbox is not selected.
Added also some style enhancements to IndexFederated, removed the Solr
attachment manual and replaced it with a link to the wiki which explains
this in more detail.
2013-07-31 10:49:26 +02:00
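The reasoning behind the boolean default: a browser leaves an unchecked checkbox out of the submitted form entirely, so a missing key has to be read as false. A minimal sketch, assuming the post arguments are available as a plain `Map`; the class `PostArgs` and the accepted value strings are assumptions, not YaCy's servlet API.

```java
import java.util.Map;

/** Hypothetical helper (not YaCy's servlet API) for reading checkbox values. */
final class PostArgs {

    /** Reads a checkbox-style boolean; an absent key means "not checked". */
    static boolean getBoolean(Map<String, String> post, String key) {
        String value = post.get(key);
        // unchecked checkboxes are not transmitted at all, so absence is false
        if (value == null) return false;
        // browsers send "on" for a checked checkbox without a value attribute;
        // "true" and "1" are accepted here as an assumption
        return value.equals("on") || value.equalsIgnoreCase("true") || value.equals("1");
    }
}
```

If the default were true, a user who deselects a checkbox could never switch the option off.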
orbiter
252c525709 fixed feed api servlet and enhanced RSSReader class 2013-07-31 06:18:30 +02:00
orbiter
d38c3c14d8 fix for CGI test 2013-07-31 05:43:58 +02:00
Michael Peter Christen
31902f54df fix for NPE which happens within solr code at MultiMapSolrParams.java,
line 52 in case that the array arr.length == 0
2013-07-30 14:32:59 +02:00
Michael Peter Christen
f13df9dbb6 migration to solr 4.4.0 2013-07-30 14:01:16 +02:00
Michael Peter Christen
58fe986cca Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-07-30 12:49:14 +02:00
Michael Peter Christen
cf12835f20 replaced the single-text description solr field with a multi-value
description_txt text field
2013-07-30 12:48:57 +02:00
sixcooler
7d53ac86a3 fix for Blacklist (-Administration) 2013-07-29 19:09:28 +02:00
reger
f2d99053ed Field Re-Indexing: prevent endless error loop in ReindexSolrBusyThread on Solr exception (by skipping query causing the exception)
(occurred during testing while working on q=store:[* TO *])
2013-07-29 01:32:02 +02:00
reger
92d3f71b16 htmlParser: closes input stream -> changed it to leave it open for a reset (used by AugmentParser - even if this is practically not used),
note: stream.close is done by caller (Textparser.parseSource)
- removed unnecessary reset in AugmentParser
- added stream.mark in tdfatripleimpl. to make stream.reset work here
2013-07-28 03:41:09 +02:00
orbiter
87cfeaa4f3 fix for npe 2013-07-27 15:20:09 +02:00
orbiter
268a36aaff emergency fix for crawler: this will otherwise cause loss of complete
crawl queue if latency of remote system is too low
2013-07-27 11:59:07 +02:00
orbiter
d05e0c5368 wait a bit longer before doing the first peer ping 2013-07-27 11:00:35 +02:00
orbiter
b8f57f7703 don't be noisy when doing background tasks that may be allowed to fail 2013-07-27 10:51:58 +02:00
Roland Haeder
0343f0668c Fix for NPE:
E 2013/07/26 20:29:29 BUSYTHREAD Runtime Error in
serverInstantThread.job, thread
'net.yacy.search.Switchboard.cleanupJob': null; target exception: null
java.lang.NullPointerException
        at
net.yacy.search.schema.CollectionConfiguration.convergenceStep(CollectionConfiguration.java:1116)
        at
net.yacy.search.schema.CollectionConfiguration.postprocessing(CollectionConfiguration.java:897)
        at net.yacy.search.Switchboard.cleanupJob(Switchboard.java:2296)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at
net.yacy.kelondro.workflow.InstantBusyThread.job(InstantBusyThread.java:107)
        at
net.yacy.kelondro.workflow.AbstractBusyThread.run(AbstractBusyThread.java:165)

Conflicts:
	source/net/yacy/search/schema/CollectionConfiguration.java
2013-07-27 10:19:46 +02:00
Roland Haeder
b58ca8622d Some cleanups:
- added SKINS_PATH_DEFAULT in the same way as LISTS_PATH_DEFAULT was added
- Added 'final' keyword to a string
2013-07-27 10:13:57 +02:00
Roland Haeder
7263bb82fb Fix for NPE on shutdown:
java.lang.NullPointerException
        at net.yacy.search.Switchboard.storeDocumentIndex(Switchboard.java:2732)
        at net.yacy.search.Switchboard.access00(Switchboard.java:207)
        at net.yacy.search.Switchboard.run(Switchboard.java:3049)
2013-07-27 09:55:43 +02:00
Roland Haeder
13433d41a1 Log this exception better
Conflicts:
	source/net/yacy/kelondro/blob/Tables.java
2013-07-27 09:54:51 +02:00
orbiter
080d80c9de do not write an empty failreason in case that there is no fail. Because
of the lazy instantiation rule this value was not actually written, but
if lazy instantiation is switched on, then this causes that all crawl
starts delete all crawl-start-hosts completely because this looks for
filled error reasons.
2013-07-26 17:53:28 +02:00
Michael Peter Christen
4c242f9af9 always use a default value for boolean options to have transparency for
the outcome if the attribute is missing in servlets
2013-07-25 12:17:29 +02:00
Michael Peter Christen
61e015268b fix in forced deletion: forced commit needed 2013-07-25 09:53:19 +02:00
Michael Peter Christen
83e2921b39 new test case for http://bugs.yacy.net/view.php?id=141 2013-07-25 09:31:48 +02:00
Michael Peter Christen
304aacb2cc fix for http://bugs.yacy.net/view.php?id=267 2013-07-25 09:26:24 +02:00
Michael Peter Christen
c3b2301b2f fix for http://bugs.yacy.net/view.php?id=268 2013-07-25 09:21:37 +02:00
reger
aa1a1f1d2c - small adjustment to make sure genericParser is tried last
-- for some documents genericParser grabs document instead of specific available parser due to unordered pick of 1st to try parser
      (like .ps .rdf files and other)
- remove redundant file extension registration
2013-07-23 20:24:13 +02:00
orbiter
3e901dcb06 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-07-23 19:33:07 +02:00
orbiter
f50b596e0b do not run dht distribution if system load is over 2.5 2013-07-23 19:32:32 +02:00
orbiter
056b42f5aa - added information about segment count to status_p.xml
- also moved this information from the old index structure, which is
still in use for the RWI/DHT index to that front-end
2013-07-23 18:03:33 +02:00
orbiter
6fb2811e68 fixes for problems with remote solr and non-activated webgraph index 2013-07-23 16:46:44 +02:00
sixcooler
af740f3058 changed optimization to a segment-size of index-size/5.000.000
+ one if not idle
+ one (and force) if postprocessing
2013-07-23 14:21:12 +02:00
Michael Peter Christen
336f86394c replaced StringBuffer with StringBuilder 2013-07-23 12:21:27 +02:00
Michael Peter Christen
aeac2fb763 replaced more containsKey() -> get() usages by a simple get(), followed
by a test for NULL. This should increase the application speed and
reduce the lookup time for the affected methods by 50%
2013-07-23 12:16:51 +02:00
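The replacement described in this commit follows a common pattern, sketched below with hypothetical names (`LookupDemo` and both methods are illustrative only). The first method hashes the key twice, the second only once.

```java
import java.util.Map;

/** Hypothetical illustration of the containsKey()/get() replacement. */
final class LookupDemo {

    /** Before: two hash lookups for every hit. */
    static int lengthTwoLookups(Map<String, String> map, String key) {
        if (map.containsKey(key)) return map.get(key).length();
        return -1;
    }

    /** After: one lookup, followed by a test for null. */
    static int lengthOneLookup(Map<String, String> map, String key) {
        String value = map.get(key);
        return value == null ? -1 : value.length();
    }
}
```

The two forms are only equivalent if the map never stores null values, since `get()` cannot distinguish a missing key from a key mapped to null.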
orbiter
5364c4dcc9 delayed first peer-ping to send the first ping out after the http got
up; if the ping comes before the http is up, it cannot be recognized as
senior peer (if at all). See also: http://bugs.yacy.net/view.php?id=266
2013-07-22 18:21:37 +02:00
orbiter
e24016e30a added the property federated.service.solr.indexing.timeout to yacy.init
to provide a configurable time-out for solr; see also:
http://bugs.yacy.net/view.php?id=254
2013-07-22 17:45:12 +02:00
orbiter
c124037f19 removed forced non-soft commits to prevent index fragmentation 2013-07-22 17:28:20 +02:00
Michael Peter Christen
31483c47e1 fixed problem with remote luke requests 2013-07-22 15:55:20 +02:00
Michael Peter Christen
c15aa758dc removed failreason_t removal patch because that causes too much
confusion using an external solr. to clean up the index after a schema
change, use the index cleaner function from the online servlet
2013-07-22 14:17:38 +02:00
reger
2b7a38640a extend content type detection on file extension for .tif .tiff .htm 2013-07-21 22:57:21 +02:00
Michael Peter Christen
ac1aad5064 added a getSegmentCount method and use it to skip optimize if the
current segment count is already below the wanted optimization level
2013-07-18 14:31:42 +02:00
Michael Peter Christen
36035e0a0a - used reger's LukeRequest to generalize the index info in
SolrServerConnector
- used the LukeRequest in SolrServerConnector to replace the index size
method by a getNumDocs request to a LukeRequest result
2013-07-18 13:26:07 +02:00
Michael Peter Christen
39fceb5ccf fix for NPE & bug #264 2013-07-18 12:37:32 +02:00
Michael Peter Christen
735a66eff3 enhancements to crawler 2013-07-18 12:29:04 +02:00
Roland Haeder
be0ff6018f Removed trailing spaces + some more final 2013-07-17 18:44:24 +02:00
Roland Haeder
aaedc0405d Fixes and avoidance of catching bad exceptions (some):
- Rewrote usage of HashMap/Map to concurrent versions (to avoid a
CME=ConcurrentModificationException)
- Rewrote ConnectionInfo (as an example) to use a synchronized iterator
instead of synchronizing an
  already synced HashSet (see Collections call)
- This avoids catching CMEs again
- Commented out noisy ConcurrentLog.logException() call

Conflicts:
	source/net/yacy/repository/LoaderDispatcher.java
2013-07-17 18:37:34 +02:00
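Why a concurrent map avoids the CME: removing entries from a plain `HashMap` while iterating over it can throw `ConcurrentModificationException`, whereas the iterators of `ConcurrentHashMap` are weakly consistent and tolerate modification. A minimal sketch with hypothetical names (`CmeDemo`, `removeExpired`); it does not reproduce the code touched by this commit.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration; not the ConnectionInfo code from the commit. */
final class CmeDemo {

    /**
     * Removes all entries whose timestamp is older than 'now'.
     * Safe only if 'connections' is a concurrent map such as ConcurrentHashMap;
     * with a plain HashMap the remove() inside the loop may throw a
     * ConcurrentModificationException on the next iteration step.
     */
    static int removeExpired(Map<String, Long> connections, long now) {
        int removed = 0;
        for (Map.Entry<String, Long> entry : connections.entrySet()) {
            if (entry.getValue() < now) {
                connections.remove(entry.getKey());
                removed++;
            }
        }
        return removed;
    }

    /** Convenience factory so callers get the concurrent variant. */
    static Map<String, Long> newConnectionMap() {
        return new ConcurrentHashMap<>();
    }
}
```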
Roland Haeder
841a28ae76 Added 'final' for all exception blocks as this helps the Java compiler
to optimize memory usage

Conflicts:
	source/net/yacy/search/Switchboard.java
2013-07-17 18:31:30 +02:00
Felix Ableitner
03044589dd Fixed (?i) appearing in entries, fixed multiple equal lines in file. 2013-07-17 16:42:10 +02:00
Michael Peter Christen
89c0aa0e74 added collection_sxt to error documents 2013-07-17 15:20:56 +02:00
Michael Peter Christen
0df5195cb0 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-07-17 12:42:06 +02:00
Michael Peter Christen
1fd006cc56 fixes using the embedded connector 2013-07-17 12:41:54 +02:00
orbiter
d0dc86cf3d logging of deadlocks (if any) during cleanup process 2013-07-17 12:38:58 +02:00
Michael Peter Christen
c6a6f159e8 fix for crawl stack domain counter 2013-07-16 18:18:55 +02:00
Michael Peter Christen
93d1bac140 do a more frequent optimization, reduces IO after optimization 2013-07-16 17:16:48 +02:00
orbiter
b71d13a014 added load and deadlock detector in Memory util 2013-07-16 10:49:20 +02:00
orbiter
290e24564b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-07-14 17:41:32 +02:00
orbiter
5533fc8e01 fix for bug 260 2013-07-14 17:40:28 +02:00
Michael Peter Christen
b79471ee67 grr 2013-07-14 10:15:47 +02:00
Michael Peter Christen
a79f288ac1 automatically running optimize on solr if user/search is idle for some
time
2013-07-14 10:02:08 +02:00
orbiter
a9c8046c87 do a light optimization at the end of a crawl postprocessing 2013-07-13 19:09:46 +02:00
orbiter
a548354c71 replaced type of solr schema object sku of text_en_splitting_tight by
string
2013-07-13 18:54:09 +02:00
orbiter
2f1ec8d4a2 npe fix 2013-07-13 11:10:05 +02:00
Michael Peter Christen
bcc623a843 refactoring of load_delay: this is a matter of client identification 2013-07-12 16:24:56 +02:00
orbiter
0d0b3a30f5 activate api actions after postprocessing of crawls 2013-07-12 16:05:48 +02:00
orbiter
3978c5ca5d fix for http://bugs.yacy.net/view.php?id=255 2013-07-12 14:38:30 +02:00
orbiter
2be456e7fb added a postprocessing field into api/status_p.xml to show if the
postprocessing task is running at that time (status: busy) or not
(status:idle)
2013-07-12 14:29:22 +02:00
orbiter
dac88561ae minimum access time has a tight connection to ClientIdentification,
therefore it is defined there.
2013-07-11 17:04:24 +02:00
Michael Peter Christen
9a29ab469e another patch to prevent CLOSE_WAIT status on solr connections 2013-07-11 12:53:39 +02:00
Michael Peter Christen
5091d627bc fixed parsing of peer flags 2013-07-11 12:53:16 +02:00
Michael Peter Christen
87e9052081 added Connection:close to all http requests in our http client to
prevent CLOSE_WAIT states (as seen in lsof)
2013-07-11 11:54:11 +02:00
Michael Peter Christen
5c6946dd5f replaced usage of log4j by ConcurrentLog where possible 2013-07-09 14:42:39 +02:00
Michael Peter Christen
5878c1d599 - refactoring of log to ConcurrentLog:
jdk-based loggers tend to block
at java.util.logging.Logger.log(Logger.java:476) in concurrent
environments. This makes logging a main performance issue. To overcome
this problem, this is an add-on to jdk logging to put log entries on a
concurrent message queue and log the messages one by one using a
separate process.
- FTPClient uses the concurrent logging instead of the log4j logger
2013-07-09 14:28:25 +02:00
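The idea described here, a concurrent message queue drained one entry at a time by a separate worker, can be sketched as follows. This is not the ConcurrentLog implementation: the class `QueuedLog`, its methods, and the use of a worker thread with a shutdown marker are assumptions for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical sketch of queue-based logging; not YaCy's ConcurrentLog. */
final class QueuedLog {

    // identity marker that tells the worker to stop (compared with ==)
    private static final String POISON = new String("poison");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Thread worker;

    /** @param sink receives the messages one by one, e.g. a jdk logger */
    QueuedLog(Consumer<String> sink) {
        worker = new Thread(() -> {
            try {
                while (true) {
                    String message = queue.take(); // blocks until an entry arrives
                    if (message == POISON) break;
                    sink.accept(message);          // only this thread touches the sink
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "queued-log-worker");
        worker.setDaemon(true);
        worker.start();
    }

    /** Callers only enqueue, so they never block on the underlying logger. */
    void info(String message) {
        queue.add(message);
    }

    /** Writes out everything queued so far and stops the worker. */
    void shutdown() {
        queue.add(POISON);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The sink could be something like `java.util.logging.Logger.getLogger("demo")::info`. Because a single worker drains the queue, messages keep the order in which they were enqueued.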
orbiter
f4f6551c66 better handling of time-out at solrj in case that a commit is done in a
fail-over case during add
2013-07-09 11:01:37 +02:00
Michael Peter Christen
07261fe274 Merge remote-tracking branch 'nutomics/blacklist_structure' 2013-07-08 23:32:15 +02:00
Michael Peter Christen
dea71851d2 - better concurrency for network scanner
- network scanner can now start from the list of all hosts in the search
index
2013-07-08 16:29:30 +02:00
Michael Peter Christen
a34e137e27 fix for citation index generation in case that entry.referrerhash() is
null. This is especially the case if ftp sites are crawled
2013-07-08 16:26:11 +02:00
Michael Peter Christen
a2c8116a8f accept (but ignore) a '+' sign in front of search words 2013-07-08 16:20:40 +02:00
orbiter
9f0cc9b401 enhanced network scanner
- textarea input field can now be used to paste in a large list of hosts
- /31er subnet is possible (only one host)
- auto-detect subdomains for ftp and www subdomains
2013-07-08 13:17:09 +02:00
sixcooler
308d73f855 do not use remote proxy if not switched on - regardless of the proto 2013-07-04 19:16:13 +02:00
sixcooler
69906b1d2e Revert "do not use remote proxy if not switched on - regardless of the proto"
This reverts commit 20f452d228.
2013-07-04 19:13:51 +02:00
sixcooler
20f452d228 do not use remote proxy if not switched on - regardless of the proto 2013-07-04 19:12:50 +02:00
sixcooler
9551720d5c re-enable saved setting for proxy-crawl-profile 2013-07-04 19:10:57 +02:00
sixcooler
d5d8936f9d For indexes that are changing rapidly in NRT situations, fcs (stands for
Field Cache per Segment) may be a better choice than the default fc.
(saves memory)
see: http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
2013-07-04 19:08:53 +02:00
Felix Ableitner
44f8fcf62e Changed class structure of Blacklist. 2013-07-04 18:37:57 +02:00
Michael Peter Christen
57ffdfad4c added a crawl option to obey html-meta-robots-noindex. This is on by
default.
2013-07-03 14:50:06 +02:00
Michael Peter Christen
5a5d411ec0 new robots_i attribute fields 2013-07-02 14:29:13 +02:00
Michael Peter Christen
fa08bd9d5a hack to prevent long waiting times in crawler 2013-07-01 13:24:52 +02:00
Michael Peter Christen
f1c5338210 preparation for greedy crawl profiles and refactoring 2013-07-01 13:10:09 +02:00
Michael Peter Christen
e6f361f474 adding the canonical tag to crawl queues 2013-07-01 13:09:41 +02:00
reger
a6bf44212e bugfix: location (lat/lon) meta data retrieval (Double.NaN check) 2013-06-30 03:50:07 +02:00
Michael Peter Christen
203921006a redesign of citation index storage 2013-06-30 02:11:46 +02:00
reger
83763ee4a4 jpeg parser: extract GPS location from meta data 2013-06-29 00:35:43 +02:00
Michael Peter Christen
32aa1d4569 removed unused option for queries 2013-06-28 15:32:36 +02:00
Michael Peter Christen
9d291764d1 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-06-28 15:03:25 +02:00
sixcooler
e5abccdfe4 added optimize-option 2013-06-28 14:51:37 +02:00
Michael Peter Christen
64140f35cd fix for solr requests if no query part is given (prevent npe) 2013-06-28 13:16:25 +02:00
Michael Peter Christen
8caaf6203a fixed false multiple-generation of remote facet search which
caused high cpu usage on remote side.
2013-06-28 12:39:36 +02:00
Michael Peter Christen
823ae4d6a7 added url_protocol_s to error documents 2013-06-26 16:51:36 +02:00
Michael Peter Christen
660a196989 refactoring 2013-06-26 09:27:22 +02:00
Michael Peter Christen
c4538d8d91 added metadata-extractor-2.6.2.jar to eclipse classpath, removed old lib 2013-06-26 09:26:34 +02:00
reger
3760e2616b bump up lib/metadata-extractor-2.6.2.jar (used for image parser) with needed code adjustments 2013-06-25 23:24:02 +02:00
Michael Peter Christen
9a6fcdf597 npe fix 2013-06-25 16:36:16 +02:00
Michael Peter Christen
16d1d744fa added url_file_name_s in default collection schema for the file name
without the file extension. This part of the file path is removed from
the multi-field url_paths_sxt, which has now not the file name as last
part of the path list.

The same applies to the new fields source_file_name_s and
target_file_name_s in the webgraph schema.
2013-06-25 16:27:20 +02:00
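The split between the path list and the file name described in this commit might look like the sketch below. The class `UrlPathParts` is hypothetical, and treating the text after the last dot as the extension is an assumption; the actual schema code may handle edge cases differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the path/file-name split; not YaCy's schema code. */
final class UrlPathParts {

    final List<String> paths;  // directory parts only, analogous to url_paths_sxt
    final String fileName;     // file name without extension, analogous to url_file_name_s
    final String fileExt;      // extension without the dot, empty if there is none

    UrlPathParts(String path) {
        List<String> segments = new ArrayList<>();
        for (String s : path.split("/")) {
            if (!s.isEmpty()) segments.add(s);
        }
        // a path ending in '/' names a directory, so it has no file part
        String last = (!segments.isEmpty() && !path.endsWith("/"))
                ? segments.remove(segments.size() - 1)
                : "";
        // assumption: the extension is whatever follows the last dot
        int dot = last.lastIndexOf('.');
        fileName = dot > 0 ? last.substring(0, dot) : last;
        fileExt = dot > 0 ? last.substring(dot + 1) : "";
        paths = segments;
    }
}
```

For `/docs/2013/report.pdf` this yields the path list `[docs, 2013]`, the file name `report` and the extension `pdf`.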
reger
8d1c4c423d make imageparser fileextension detection case insensitive (extensions are often upper case) 2013-06-23 00:39:15 +02:00
Michael Peter Christen
f9d859f5dc now writing image alt texts and (camelcase-)parsed urls into a text
search field for a better image retrieval
2013-06-18 16:51:56 +02:00
Michael Peter Christen
e441a9d4c8 to avoid confusion, the gsa api is available at /search? and
/searchresult?
2013-06-18 16:22:06 +02:00
orbiter
8792e6c6e9 stub for better image indexing 2013-06-18 13:28:30 +02:00
orbiter
97f2ac9091 added hint to gsa response writer that the result comes from a yacy peer 2013-06-17 13:29:03 +02:00
Michael Peter Christen
14186e815e npe fix 2013-06-13 22:42:21 +02:00
Michael Peter Christen
bdf306e0a7 increased time-out for loading of seed-lists 2013-06-13 22:32:06 +02:00
Michael Peter Christen
374d2e2a52 removed warning message during crawling 2013-06-13 13:03:56 +02:00
Michael Peter Christen
570511f3c8 removed fields references_internal_id_sxt and
references_internal_url_sxt because they had been shown to be
superfluous. The citation of referrer in the host browser is possible
without them. Therefore now the host browser does not only show
internal, but also external referrer to each link.
2013-06-13 13:01:28 +02:00
Michael Peter Christen
fd1776a3b0 added a new 'Citations' function: each search result item can now be
explored for citations within other documents. A click on the
'Citations' link shows an analysis with all text lines in the document
each with a complete list of documents which contain the same line. A
second section shows the linking documents in ascending order of number
of citations from the original document. Because documents from
different hosts are most interesting here, they are listed at the top of
the page as possible 'copypasta' source.
2013-06-12 15:02:49 +02:00
Michael Peter Christen
fc3ff92c69 npe fix 2013-06-12 13:23:58 +02:00
Michael Peter Christen
1762911f57 added synchronizations and timeouts in solr api; missing
synchronizations in index modification methods cause deadlocks inside
solr.
2013-06-12 02:13:18 +02:00