Commit Graph

180 Commits

Author SHA1 Message Date
Michael Peter Christen
788288eb9e added the generation of 50 (!!) new Solr fields in the core 'webgraph'.
The default schema uses only some of them and the resulting search index
now has the following properties:
- the webgraph will have about 40 times as many entries as the default
index
- the complete index size will increase and may be about double its
current size
As testing showed, not much indexing performance is lost. The default
index will be smaller (fields were moved out of it); thus searching
can be faster.
The new index allows some old parts of YaCy to be removed,
i.e. specialized webgraph data and the noload crawler. The new index
will make it possible to:
- search within link texts of linked but not indexed documents (about 20
times the size of the document index!!)
- get a very detailed link graph
- enhance ranking using a complete link graph

To get full access to the new index, the API to Solr now has two
access points: one with attribute core=collection1 for the default
search index and one with core=webgraph for the new webgraph search index.
This is also available for p2p operation but client access is not yet
implemented.
2013-02-22 15:45:15 +01:00
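The two access points described in the commit above can be sketched as follows. This is a minimal illustration, not the project's client code: only the core attribute and the two core names (collection1, webgraph) come from the commit message. The peer address, the "/solr/select" path and the helper name solr_query_url are assumptions made for the example; q, rows and wt are ordinary Solr query parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: peer address and servlet path are placeholders for illustration.
BASE_URL = "http://localhost:8090/solr/select"

# Core names taken from the commit message.
KNOWN_CORES = ("collection1", "webgraph")


def solr_query_url(core, query, rows=10):
    """Build a query URL that selects a Solr core via the core attribute.

    Hypothetical helper; it only assembles the URL and sends no request.
    """
    if core not in KNOWN_CORES:
        raise ValueError("unknown core: " + core)
    params = {"core": core, "q": query, "rows": rows, "wt": "json"}
    return BASE_URL + "?" + urlencode(params)


# One URL per access point, using Solr's match-all query.
default_url = solr_query_url("collection1", "*:*")
webgraph_url = solr_query_url("webgraph", "*:*")
```

The only difference between the two URLs is the value of the core attribute, which is what the commit message describes.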
orbiter
594ed63f2a fixed interactive search which caused an error if pubDate is not present
in a search result
2013-02-16 20:33:27 +01:00
Michael Peter Christen
de58043205 Added image license generation for solr image search results when
results are generated within yjson result writer. This makes it possible
to view images in yacyinteractive from solr.
2013-02-13 00:33:53 +01:00
Michael Peter Christen
02fa31b5bf better filesearch layout 2013-02-12 12:21:29 +01:00
Michael Peter Christen
e55ec3071d reduced number of facets in yacyinteractive (only filetype necessary) 2013-02-12 12:00:54 +01:00
Michael Peter Christen
c34af7fe94 extended JSON Response Writer and Opensearch Response Writer for the
Solr search interface in such a way that it is possible to use this
interface for the yacyinteractive search. This search interface is now
much faster using the Solr search directly. For the Solr interface it
was necessary to create a translation from the YaCy search modifiers to
the Solr facet selection. This was added in such a way that it becomes
generic for the normal YaCy search and as an on-top evaluation for Solr
queries.
2013-02-12 03:42:46 +01:00
Michael Peter Christen
e1f89efd0d - made image search in interactive search use the ViewImage servlet;
this enables viewing of images for intranet SMB servers.
- added a filter search for protocol, tld and ext again; otherwise p2p
search produces a lot of rubbish
2012-12-26 21:25:27 +01:00
Michael Peter Christen
7ad5457db0 using the Solr facets as navigation in yacyinteractive.html instead of
counting result types locally
2012-12-19 00:59:40 +01:00
Michael Peter Christen
b7004043ea - added a field cache for Solr queries which ask for only a single
value
- fixed a version conflict exception within a solr add request
2012-11-24 22:30:05 +01:00
Michael Peter Christen
86ec199126 using a better file name 2012-11-07 16:39:49 +01:00
apfelmaennchen
d31a632951 - added dmoz RDF dump importer
- added indexing to Tables columns to support larger bookmark
collections
- added RDF output (HTTP) for public bookmarks at /YMarks.rdf
- YMarkRDF also provides a Jena RDF Model as "internal" API
- various other changes/fixes for YMarks (mainly backend)
2012-09-09 09:53:58 +02:00
Michael Peter Christen
6fc5400f91 added a tooltip for search navigation to mention that search pages can
be navigated using the TAB key
2012-08-20 13:02:29 +02:00
sixcooler
f64e78497a fix for reload-feature in Crawler_p 2012-06-14 02:13:23 +02:00
cominch
a120ef660b RDF demo servlet 2012-06-10 13:02:11 +02:00
Michael Peter Christen
638390930d another patch to fix the Crawler_p layout 2012-05-25 15:56:21 +02:00
Michael Peter Christen
c846e9ca14 redesign of the crawler monitor page: show crawled pages instead of
queue of urls that shall be crawled
2012-05-25 01:45:38 +02:00
Michael Peter Christen
08dcf3e5d1 hack to get all results if the actual number is between 10 and 64 2012-04-26 00:27:21 +02:00
Michael Peter Christen
f8cd57c92f new indexing strategy: ALL links that appear anywhere are indexed, not
only links where the content can be parsed. All non-parseable links are
placed into the noload queue. The search process must therefore be able
to filter out non-text search results.
- This fixes the problem that image search results appeared in the text
search.
- The interactive search can now retrieve ALL types of links
- The p2p interface is now extended to retrieve only certain types of
links (text, image, video, apps)
- The search process has an extension to filter the right document type
according to the search query
2012-04-22 02:05:17 +02:00
Michael Peter Christen
fa7b3481b3 better navigation in file search: fewer results on the first try, but much
faster. After the first search is done, buttons appear to get more
results for the same search
2012-02-26 17:32:45 +01:00
Michael Peter Christen
6e51a00a2f Revert "fix for page navigation: show only as many pages as are available for given navigation constraints, not as given by total results size"
This reverts commit 73f5a9e8b3.
2012-02-24 02:46:56 +01:00
Michael Peter Christen
73f5a9e8b3 fix for page navigation: show only as many pages as are available for
given navigation constraints, not as given by total results size
2012-02-24 02:31:03 +01:00
Michael Peter Christen
9ad1d8dde2 complete redesign of crawl queue monitoring: do not look at a
ready-prepared crawl list but at the stacks of the domains that are
stored for balanced crawling. This also affects the balancer since it
does not need to prepare the pre-selected crawl list for monitoring. As
an effect:
- it is no longer possible to see the correct order of next to-be-crawled
links, since that depends on the actual state of the balancer stack the
next time another url is requested for loading
- the balancer works better since the next url can be selected according
to the current situation and not according to a pre-selected order.
2012-02-02 21:33:42 +01:00
apfelmaennchen
c7f88f3fd1 fix for http://bugs.yacy.net/view.php?id=101 - the default crawl
depth for bookmarks is now editable.
2012-01-12 23:30:23 +01:00
Michael Peter Christen
f214f6ebb4 added no-load queues to the crawler monitor 2012-01-07 17:17:11 +01:00
Michael Christen
1cf0f35621 the link to the path shall be the path 2011-12-28 01:12:44 +01:00
apfelmaennchen
77317a88e0 Added nice jquery tagsinput to bookmarks dialog - similar to delicious.com ;-)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8133 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-12-03 00:07:07 +00:00
orbiter
9b0879c184 added a hint that the interactive search is only searching in the local index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8116 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-27 16:17:31 +00:00
orbiter
5b2e68b60d fixed page navigation counter
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8113 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-27 15:38:37 +00:00
apfelmaennchen
77a080ced9 smaller fixes for YMarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8105 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-25 17:33:03 +00:00
apfelmaennchen
dd1482aaf5 further update to YMarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8100 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-25 00:32:18 +00:00
apfelmaennchen
564374d1fe - included YMarks in addition to old bookmarks in yacysearchitem.html; don't get confused by the old bookmark dialog, the ymark is automatically added silently beforehand.
- reworked bookmark creation on crawlstart
- many smaller adjustments to ymarks


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8072 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-22 23:50:49 +00:00
apfelmaennchen
6287c2b4a9 YMarks:
- introduced tag manager - a quite powerful tool (still not 100% stable, so be careful)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8060 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-20 22:42:15 +00:00
apfelmaennchen
5581be12fb YMarks:
- added backend and api for tag management


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8058 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-19 20:59:21 +00:00
apfelmaennchen
a3eebfdcba YMarks:
- show active/running crawls
- execute crawls (works currently only if API entry is available)
- various smaller fixes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8056 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-17 23:11:27 +00:00
apfelmaennchen
4f95f72124 YMarks:
- working direct importer for YaCy Crawl Starts
- working direct import for old bookmarks.db

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8052 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-16 23:10:53 +00:00
apfelmaennchen
a8dfe787ed - updated to jquery flexigrid 1.1
- YMarks.html automatically recognizes if a bookmark is a crawl start


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8040 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-15 21:45:17 +00:00
orbiter
f8b8c82421 - refactoring of getpageinfo_p.xml (moved out of util)
- added more logging in getpageinfo_p.xml

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8037 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-15 00:22:40 +00:00
orbiter
ff32469272 added a link to /api/util/getpageinfo_p.xml as API to crawl start info and to ViewFile.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8035 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-14 20:19:41 +00:00
apfelmaennchen
5f7dbe1c42 - some refactoring (ymarks)
- improvement for autotagger (is now able to create/detect multi-word tags e.g. 'open source')



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8031 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-13 23:19:47 +00:00
orbiter
2adc30d335 suppressing size if size unknown
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8005 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-10-27 23:21:39 +00:00
orbiter
b5b09b329c BOOSTED the image search function. The result page now shows the images as embedded image links from the original source and not from the
built-in image buffering and re-sizing servlet. The result is shown much faster now, not because YaCy no longer needs to re-size the images but
for a very strange other reason: because of the RFC specification (http://tools.ietf.org/html/rfc2616#section-8.1.4) a browser does not open more than
two connections to the same server at the same time. If the YaCy image servlet is used, then the target host is the YaCy host for all images
and that prevents the images from being loaded in parallel.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7998 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-10-12 22:59:58 +00:00
orbiter
30d340563e fix in result count display
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7967 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-21 11:01:01 +00:00
orbiter
e48ce5d80e - style change for search box: larger font, selected by default
- style change for search results: by default no parser, size, image info

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7949 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-14 09:05:06 +00:00
orbiter
b0b4886618 try to avoid the unresolved pattern in search result
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7940 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-08 18:47:00 +00:00
orbiter
656286347e fix for javascript error during search (not ready yet)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7923 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-03 07:10:47 +00:00
orbiter
0229029dcf a bit of protection against search result bugs in interactive search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7920 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-01 16:08:33 +00:00
orbiter
ca09081341 better interaction
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7875 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-12 17:13:34 +00:00
orbiter
8e03b8ee8b better integration of server list in interactive search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7870 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-12 12:25:45 +00:00
orbiter
594d8f546a #cccamp11 maintenance fix: anons may find up to 1000 items in interactive search (was: 100)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7866 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-11 21:37:35 +00:00
orbiter
115abc8917 - more attributes for search progress bar
- moved cache strategy to cora package

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7778 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-13 21:44:03 +00:00