Michael Peter Christen
22e1f68c0b
solrj user authentication patch
2012-04-27 17:53:45 +02:00
Michael Peter Christen
09484955dc
added new entry class for embed tags
2012-04-27 17:48:51 +02:00
Michael Peter Christen
62f2554a01
- fixed build problems (deprecated methods using httpclient 3.1)
- removed httpclient 3.1 lib which was used by solrj (solrj now uses
httpclient 4)
2012-04-27 17:46:08 +02:00
Michael Peter Christen
a6d60fc21f
concurrency enhancement in ConfigurationSet
2012-04-27 17:20:18 +02:00
Michael Peter Christen
453010bd68
- solved problems with backpath normalization
- redesigned in/outbound link handover
- removed iframe links from inbound/outbound in solr scheme
2012-04-27 16:48:51 +02:00
Michael Peter Christen
5f5ed33ed8
patch for media search (audio, video apps)
2012-04-27 14:18:02 +02:00
Michael Peter Christen
7860c1df80
fix needed for new solrj library
2012-04-27 14:13:59 +02:00
Michael Peter Christen
248299d10f
updated solrj lib
2012-04-27 11:22:34 +02:00
Michael Peter Christen
0e13022147
- enhanced solr field documentation
- added xml api button to IndexFederated_p - the solr schema.xml file
can be generated by YaCy
2012-04-26 15:25:07 +02:00
Michael Peter Christen
08dcf3e5d1
hack to get all results if the actual number is between 10 and 64
2012-04-26 00:27:21 +02:00
Michael Peter Christen
19efbf1b0f
- apply directDocByURL to NOLOAD Queue
- choose pushing to NOLOAD as default for site crawl
2012-04-26 00:23:18 +02:00
Michael Peter Christen
5c66880be2
fix for search result selection in case that contentdom is not set
2012-04-26 00:04:23 +02:00
Michael Peter Christen
659178942f
- Redesigned crawler and parser to accept embedded links from the NOLOAD
queue and not from virtual documents generated by the parser.
- The parser now generates nice description texts for NOLOAD entries
which shall make it possible to find media content using the search
index and not using the media prefetch algorithm during search (which
was costly)
- Removed the media-search prefetch process from image search
2012-04-24 16:07:03 +02:00
Michael Peter Christen
3bea25c513
increased image preview size
2012-04-24 16:04:13 +02:00
Michael Peter Christen
a3badd3205
changed search process for images: no more media snippet load process,
show only links from index which had been on the text search page
before. This creates a superfast search process for images!
2012-04-24 12:55:58 +02:00
Michael Peter Christen
f5efdb21fd
refactoring
2012-04-24 12:54:41 +02:00
Michael Peter Christen
4aa0eedead
one more scroogle...
2012-04-24 12:05:37 +02:00
Michael Peter Christen
347612ddd4
removed scroogle parser
2012-04-24 12:04:44 +02:00
Michael Peter Christen
f8cd57c92f
new indexing strategy: ALL links that appear anywhere are indexed, not
only links where the content can be parsed. All non-parseable links are
placed into the noload queue. The search process must therefore be able
to filter out non-text search results.
- This fixes the problem that image search results appeared in the text
search.
- The interactive search can now retrieve ALL types of links
- The p2p interface is now extended to retrieve only certain types of
links (text, image, video, apps)
- The search process has an extension to filter the right document type
according to the search query
2012-04-22 02:05:17 +02:00
Michael Peter Christen
14f67f217c
refactoring of ContentDomain: now subclass of Classification
2012-04-22 00:04:36 +02:00
Michael Peter Christen
8a08c96a82
removed dependency on logging
2012-04-21 21:32:31 +02:00
Michael Peter Christen
a1a5b015d8
refactoring: moved document Classification to cora package
2012-04-21 21:31:13 +02:00
Michael Peter Christen
a5d7da68a0
refactoring: removed dependency on switchboard in Balancer/CrawlQueues
2012-04-21 13:47:48 +02:00
Michael Peter Christen
33d1062c79
refactoring: the cache belongs to the crawler
2012-04-21 13:34:07 +02:00
Michael Peter Christen
8429967ea7
no more SVN
2012-04-19 13:29:08 +02:00
Michael Peter Christen
0466bb0ddf
no more SVN..
2012-04-19 13:28:12 +02:00
Michael Peter Christen
4844e124b1
one more warning in case that crawling is paused because of low disk
space
2012-04-19 12:35:11 +02:00
Michael Peter Christen
0ec2713af8
'download'
2012-04-19 11:50:24 +02:00
Michael Peter Christen
2be327b5ab
update location update
2012-04-19 11:49:43 +02:00
Michael Peter Christen
f30c577fdb
add hint to speed up search results
2012-04-19 11:11:14 +02:00
Michael Peter Christen
6b133de3e9
add hint for consulting support
2012-04-19 11:10:48 +02:00
Michael Peter Christen
4d5da75814
fix for parser problem if an <a> tag is 'within' html tags with unclosed
tags. That prevented the <a> tags from being recognized. This is a fix
for http://forum.yacy-websuche.de/viewtopic.php?p=25516#p25516
2012-04-18 10:30:04 +02:00
Michael Peter Christen
eb2c8ffa62
display is not used any more
2012-04-17 12:30:14 +02:00
Michael Peter Christen
91a86f0b06
fixes to network graph testing
2012-04-17 11:46:14 +02:00
Michael Peter Christen
f31ad84d98
automatic generation of blacklist pattern, see
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=2685&p=25305#p25305
2012-04-17 11:22:19 +02:00
Michael Peter Christen
7b5b9baee0
added citation rank to ranking profile
2012-04-16 23:43:50 +02:00
Michael Peter Christen
046f3a7e8d
check if httpc has decompressed the release file and rename the file
from .tar.gz to .tar if that happened
2012-04-16 09:50:55 +02:00
reger
06951ef751
remove heuristic scroogle from search option help text in index.html
2012-04-16 04:00:04 +02:00
Michael Peter Christen
e377092198
fix to xml output format
2012-04-13 09:02:18 +02:00
Michael Christen
41be98dc9d
extended webstructure api to show together with incoming links also
outgoing links
2012-04-13 11:53:34 +02:00
Michael Christen
02e4dedff2
fix to url citation collection
2012-04-13 11:52:59 +02:00
Michael Christen
e32055aa15
added stub classes for
- a new database for url reference data ('seen links')
- a new database extending the references to the full url metadata
attributes set, which shall replace the old metadata database once it is
finished
- migration help classes stub to use old and new metadata databases
simultaneously
2012-04-13 07:09:15 +02:00
Michael Christen
ac5d124ee0
experimental implementation of a citation ranking as post-ranking
method (the ranking coefficient is fixed and needs to be made configurable)
2012-04-13 06:47:33 +02:00
Michael Christen
8f89c8ef07
added information about inbound, outbound and citation links into
yacydoc api servlet
2012-03-31 07:38:49 +02:00
Michael Christen
71649a1296
added an api to retrieve the new citation.index with the
webstructure.xml api. This api will respond with details about a single
URL if requested with 'webstructure.xml?about=[url|urlhash|host]'.
2012-03-29 17:22:31 +02:00
Michael Christen
8fc86fe397
added storage of full anchor link structure:
the links between all pages are now stored. The same index structure as
used for the word index is used to make a reverse link index.
The new file(s) in SEGMENT/default/citation.index.*.blob store the
citation index. This will be used to create much more detailed link
structures for the YaCy apis and to create a better ranking. A ranking
using the citation.index should provide better results, especially for
portal indexes and intranets.
2012-03-29 17:20:14 +02:00
Michael Christen
22f05c83ff
fixed default must-match filter for full domain crawls - the old filter
was too restrictive and did not allow intranet crawls
2012-03-28 21:50:00 +02:00
Lotus
3e61287326
some better feedback on properties change
2012-03-25 22:21:42 +02:00
Lotus
96ac95cff9
added hint how to change integration options
2012-03-23 17:02:50 +01:00
Thomas
4f61b8fd82
Fixes for compare-search
2012-03-21 21:43:47 +01:00