455 Commits

Author SHA1 Message Date
Michael Peter Christen
b4f0cac102 added the reindexing job servlet to the submenu structure 2013-05-20 11:02:21 +02:00
Michael Peter Christen
f965d04496 added new peer icons for Mentor peers and Mentee peers (not used yet) 2013-05-10 17:33:02 +02:00
Michael Peter Christen
1b102d98d8 - added index deletion to index administration submenu
- added index deletion processes to the process scheduler/recorder
2013-04-30 02:11:28 +02:00
Michael Peter Christen
e4f7e5bcfe fixed bad css change 2013-04-28 20:09:45 +02:00
Michael Peter Christen
25499eead5 - added a new field for the regular expression in crawl start
- added the field in crawl profile
- adapted logging and error management
- adapted duplicate document detection
- added a new rule to the indexing process to reject non-matching
content
- full redesign of the expert crawl start servlet
The new filter field can now be seen in /CrawlStartExpert_p.html at
Section "Document Filter", subsection item "Filter on Content of
Document"
2013-04-26 10:49:55 +02:00
reger
40b3f2c5fe comment out dead menu link 2013-04-06 02:34:56 +02:00
Michael Peter Christen
addba047e2 changes in ranking computation
- an existing ranking servlet for solr was extended. It is now possible
to set boost values for fields, boost functions and boost queries.
- The ranking can have different instances, but currently only the first
one is used
- added an abstraction layer for fields which can be used for search and
those fields can be edited in the solr ranking configuration
- the ranking value from solr within the field score is used to combine
remote search requests, which all are created using the same locally
defined boost values
- reduced the number of fields which are used for search (makes it
faster)
- replaced some text fields by string fields (makes indexing faster)
- removed classes which had no use
- made a large number of experiments for a better ranking and created a
temporary setting which prefers hits inside titles
- adjusted also the RWI-based ranking computation to 'prefer title'
- made special cases like for portal search where no post-processing and
post-ranking is wanted: this keeps the original ranking order as done by
Solr
- fixed many bugs with old settings for ranking
2013-03-13 14:47:00 +01:00
Michael Peter Christen
56d5946a59 - added flags in IndexFederated_p.html to switch on or off the webgraph
index (new solr core webgraph) .. this is now off by default
- completely redesigned this servlet
- added description how to attach a remote solr
- adjusted naming of servlet and menus
- moved 'lazy initialization' attribute from IndexSchema to
IndexFederated (this is a general option) back again.
2013-02-24 18:09:34 +01:00
Michael Peter Christen
788288eb9e added the generation of 50 (!!) new solr fields in the core 'webgraph'.
The default schema uses only some of them and the resulting search index
has now the following properties:
- the webgraph index will have about 40 times as many entries as the
default index
- the complete index size will increase and may be about double its
current size
As testing showed, not much indexing performance is lost. The default
index will be smaller (moved fields out of it); thus searching
can be faster.
The new index allows some old parts of YaCy to be removed,
i.e. specialized webgraph data and the noload crawler. The new index
will make it possible to:
- search within link texts of linked but not indexed documents (about 20
times of document index in size!!)
- get a very detailed link graph
- enhance ranking using a complete link graph

To get the full access to the new index, the API to solr has now two
access points: one with attribute core=collection1 for the default
search index and core=webgraph to the new webgraph search index. This is
also available for p2p operation but client access is not yet
implemented.
2013-02-22 15:45:15 +01:00
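The two access points described in the commit above can be illustrated with a minimal sketch. Only the `core=collection1` and `core=webgraph` values come from the commit message; the host, port, `/solr/select` path, the other query parameters and the helper name `solr_select_url` are assumptions for illustration and may not match the actual servlet.

```python
from urllib.parse import urlencode


def solr_select_url(base: str, core: str, query: str, rows: int = 10) -> str:
    """Build a select URL that addresses one solr core via the 'core' attribute.

    The path '/solr/select' and the parameters other than 'core' are
    assumptions, not taken from the commit message.
    """
    params = {"core": core, "q": query, "rows": rows, "wt": "json"}
    return f"{base}/solr/select?{urlencode(params)}"


# Hypothetical local peer address
BASE = "http://localhost:8090"

# Default search index
default_url = solr_select_url(BASE, "collection1", "*:*")
# New webgraph search index
webgraph_url = solr_select_url(BASE, "webgraph", "*:*")

print(default_url)
print(webgraph_url)
```

The same request shape is used for both indexes; only the `core` value changes.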
Michael Peter Christen
b6de1f42dc Full redesign of solr connection architecture. This was done to support
multiple solr cores instead of just one. Therefore it is now necessary
to distinguish between solr server connections (called an 'Instance')
and a connection to a single solr core. One Instance may now have
multiple connector classes assigned to it, each connecting to a single
core.
To support multiple cores it is also necessary to distinguish between
the connection configuration and the configuration of the index schema.
We will have multiple schema configurations in the future, one for
each solr core. As a result, the IndexFederated servlet had to be
split into two parts; the new Servlet for the Schema editor is now in
the IndexSchema Servlet.
2013-02-15 01:38:10 +01:00
Michael Peter Christen
51e7ab4f70 moved bookmarks back to more prominent location (even if this does not
fit to the 'Search Interfaces' headline)
2013-02-09 06:57:20 +01:00
reger
3b6e08b49f prevent checking of urldb if empty
- disconnect urlIndexFile if empty
- add missing lock class in submenuSearchConfiguration
2013-01-12 15:20:23 +01:00
reger
f143804382 fix configuration for search page navigators
- added additional config page (ConfigSearchPage_p) for easy setup of search page layout (to not overload ConfigPortal page)
   - currently redundant setting with part of ConfigPortal page
- added missing config for filetype and protocol navigator
- adjusted init of SearchEvent to check navigation config setting
- renamed RankingProcess.getTopicNavigator to getTopics (to distinguish it from the added SearchEvent.getTopicNavigator)
2013-01-05 19:00:54 +01:00
Michael Peter Christen
8ae08a2cac moved HTCache, Heuristics and Parser servlet to a more appropriate menu
location
2013-01-03 01:27:16 +01:00
Michael Peter Christen
908ad2f174 Added a new servlet to configure the solr ranking using field boosts 2012-12-03 17:01:19 +01:00
Michael Peter Christen
a598fb6227 renamed Ranking_p.html to RankingRWI_p.html
because another Ranking servlet will be added next
2012-12-03 00:01:41 +01:00
Michael Peter Christen
074dfd297b added icons and a selection for hosts with urls pending for crawler or
with errors
2012-11-09 16:24:56 +01:00
Michael Peter Christen
4c4e0eece2 added new submenu 'Target Analysis' with three servlets which are useful
to analyse the target servers: robots.txt table, mass target analysis
and a regex tester
2012-11-07 21:26:01 +01:00
Michael Peter Christen
29fbbb49dc better colors for host browser and corrected document count 2012-11-07 12:23:21 +01:00
Michael Peter Christen
51f420e4f5 removed location search because it is only working in special cases 2012-11-07 02:04:41 +01:00
Michael Peter Christen
d481abd087 added the visualization of error-urls to host browser
- only visible for admins
- a faceted search generates a huge list for all hosts in the host list
- the faceted search algorithms had to be modified for that
- within the browsing of the directory path, the error cause is written
to the url which is presented as error-url
- the errors are also accumulated for directory sums
2012-11-06 00:29:37 +01:00
Michael Peter Christen
a15819fbec fix for some interface problems 2012-11-05 22:14:52 +01:00
Michael Peter Christen
64ac2b7b7d new submenu template 2012-11-05 15:36:42 +01:00
Michael Peter Christen
5e77801aac update to web interface structure 2012-11-05 15:23:03 +01:00
Michael Peter Christen
40df2fd193 added the host browser as link to search results. that means you can
select a browsing position after a search is done on the search results.
2012-11-01 21:38:05 +01:00
Michael Peter Christen
ce3fed8882 added the Google Search Appliance (GSA) api interface to the main menu.
See:
https://developers.google.com/search-appliance/documentation/68/xml_reference#request_overview
2012-10-30 12:27:22 +01:00
Michael Peter Christen
3d3d654e88 if a network configuration is chosen which does not allow DHT and
P2P communication (i.e. the peer is in robinson mode) then some menu
entries which have no use in this mode are disabled.
2012-10-29 01:51:19 +01:00
Michael Peter Christen
1baf498d59 - show more lines in online log
- reverse order is default now
2012-10-25 18:38:39 +02:00
Michael Peter Christen
cc98496ff3 enhanced the HostBrowser:
- showing also outbound links to other domains if there are any
- the outbound links browser shows also the link structure image
- showing even inbound links if the web structure graph has information
about that
- removed the left menu and made the HostBrowser a part of the top menu
for search
- moved the file search also to the top menu
- added hover information in the HostBrowser to explain what the click
means
- because the HostBrowser also links to the Metadata viewer ViewFile,
there should be a button to switch back to the HostBrowser: added that
also.
2012-10-16 17:13:18 +02:00
Michael Peter Christen
abebb3b124 added a crawl start checker which makes a simple analysis on the list of
all given urls: shows if the url can be loaded and if there is a robots
and/or a sitemap.
2012-10-10 02:02:17 +02:00
Michael Peter Christen
941873fba4 moved the index deletion functions from IndexControlRWIs to
IndexControlURLs where it appears more naturally. Because the RWI
administration is less important in the presence of Solr, the
IndexControlURL is now the default servlet when the Index Administration
button on the main menu is selected.
2012-10-10 00:09:27 +02:00
orbiter
be4c96f3b1 The HostBrowser now offers to index files that are discovered because
they are linked in the web interface.
2012-09-30 13:23:06 +02:00
Michael Peter Christen
c4a3d8870f fixed computation of links in host browser which are not indexed but
known by the crawler. Such links are now displayed in grey color.
2012-09-29 02:13:11 +02:00
Michael Peter Christen
97a47319c8 added nice links to the host browser:
- click on the file icon to get the metadata of the file
- click on the link icon behind the link to open the original file in
the browser
2012-09-28 23:09:21 +02:00
Michael Peter Christen
f45f7fc12e added new Host Browser to main menu:
this new search interface is something completely new for search, but
completely common on desktops: browse a web space like one would browse
a file system in a file browser. The file listing is created using the
search index and a faceted restriction to specific domains.
2012-09-28 22:45:16 +02:00
Michael Peter Christen
00c1c777fa refactoring 2012-09-21 15:48:16 +02:00
Michael Peter Christen
a30653a864 added a regular expression test servlet which is linked within the
parser/crawler error page whenever a problem with regular expression
occurs.
This makes it easy to correct and enhance the must-match and
must-not-match patterns just by trying out which pattern could be
correct.
2012-09-14 12:04:54 +02:00
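How a must-match and a must-not-match pattern combine can be tried out with a small sketch like the following. It assumes both patterns are tested against the whole URL string; the function name `url_passes` and the sample patterns are hypothetical and not taken from the servlet.

```python
import re


def url_passes(url: str, must_match: str, must_not_match: str = "") -> bool:
    """Accept a URL only if it matches must_match and not must_not_match.

    Assumption: both patterns must match the complete URL, not a substring.
    An empty must_not_match pattern rejects nothing.
    """
    if re.fullmatch(must_match, url) is None:
        return False
    if must_not_match and re.fullmatch(must_not_match, url) is not None:
        return False
    return True


# Hypothetical patterns: stay on one host, skip image files
MUST_MATCH = r".*example\.org.*"
MUST_NOT_MATCH = r".*\.(jpg|png)"

for candidate in (
    "http://example.org/docs/a.html",
    "http://example.org/img/a.png",
    "http://other.net/index.html",
):
    print(candidate, url_passes(candidate, MUST_MATCH, MUST_NOT_MATCH))
```

Trying several candidate URLs against a pattern pair in this way shows quickly whether a pattern is too strict or too loose.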
Michael Peter Christen
4b36a2c3b4 small style changes 2012-09-04 11:23:41 +02:00
Michael Peter Christen
174530a9e0 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-09-03 00:46:17 +02:00
apfelmaennchen
43f3a932fd removed jquery.slider as it is already included as part of jquery-ui
package
2012-09-01 14:17:20 +02:00
apfelmaennchen
a01eb1b7fe removed unused jquery plugin slider as it is part of jquery-ui package 2012-09-01 10:25:22 +02:00
cominch
dc468dad01 add content control features for custom filter lists 2012-08-29 09:04:28 +02:00
orbiter
7ac259477f added a direct access to solr search api to enhance the visibility of
the embedded solr
2012-08-24 23:04:19 +02:00
Michael Peter Christen
3bcd9d622b cleaned up classes and methods which are either superfluous at this time
or will be superfluous or subject of complete redesign after the
migration to solr. Removing these things now will make the transition to
solr more simple.
2012-07-25 14:31:54 +02:00
Michael Peter Christen
d3964253ae - added @SuppressWarnings to unused servlet method parameters
- removed unnecessary casts
- removed unnecessary throw statements
2012-07-05 09:14:04 +02:00
cominch
c63c3a4495 Show additional interaction elements in footer section on each page, if
activated in ConfigPortal.html.
This footer is also visible in augmented browsing proxy mode.
2012-06-20 18:04:23 +02:00
cominch
011f8a5818 Auto Tagging: Add hyperlinks to tags (provisional) 2012-06-19 01:24:06 +02:00
Michael Peter Christen
fbded1f466 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-06-16 12:42:43 +02:00
Michael Peter Christen
e806106b10 jquery bugfix 2012-06-16 08:25:28 +02:00
Michael Peter Christen
a0f1decd82 - added loading of the dbpedia pnd triplestore in the dictionary loader
- renamed the dictionary loader to knowledge loader
- some refactoring in the library provider method names
2012-06-15 19:19:18 +02:00