yacy_search_server

mirror of https://github.com/yacy/yacy_search_server.git synced 2024-09-19 00:01:41 +02:00

Author	SHA1	Message	Date
orbiter	ae246c30c3	fixed interpretation of directDocByURL attribute during crawl start	2012-10-09 23:11:31 +02:00
Michael Peter Christen	b2b516cc3e	added a collection attribute to crawls and searches: - a solr field collection_sxt can be used to store a set of crawl tags - when this field is activated, a crawl tag can be assigned when crawls are started - the content of the collection field can be comma-separated, all of them are assigned to the documents when they are indexed as result of such a crawl start - a search result can be drilled down to a specific collection; this is currently only available in the solr interface and also in the gsa interface using the 'site' option - this adds a mandatory field for gsa queries (the google api demands that field all the time)	2012-09-03 15:26:08 +02:00
Michael Peter Christen	19efbf1b0f	- apply directDocByURL to NOLOAD Queue - choose pushing to NOLOAD as default for site crawl	2012-04-26 00:23:18 +02:00
Michael Peter Christen	8bfc987374	enhanced hint how to enter file:// urls	2012-02-24 02:14:54 +01:00
Michael Peter Christen	9aa73a13a8	stop words are on by default in site crawl. This normally has no effect since the stopwords are empty by default.	2011-12-08 18:48:30 +01:00
orbiter	ebd840ebf6	- enhanced description on search front page - fixed language and heuristic modifier - added hint to crawl start that we can do also ftp and smb crawls - added a protocol extension to remote crawls to transport all search modifiers to remote peers git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8108 6c8d7289-2bf4-0310-a012-ef5d649a1542	2011-11-26 13:40:33 +00:00
orbiter	e4a82ddd8b	produce a bookmark entry from every crawl start. these bookmarks are always private. these bookmarks will be used to get a source reference for the search in case of intranet or portal searches. git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8062 6c8d7289-2bf4-0310-a012-ef5d649a1542	2011-11-21 23:10:29 +00:00
orbiter	ff32469272	added a link to /api/util/getpageinfo_p.xml as API to crawl start info and to ViewFile.html git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8035 6c8d7289-2bf4-0310-a012-ef5d649a1542	2011-11-14 20:19:41 +00:00
orbiter	11bebe356b	fixed crawl start: with SVN 7225 the name of the crawl start url was not given in input field and therefore all crawl starts had contained the empty string as crawl start url git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7229 6c8d7289-2bf4-0310-a012-ef5d649a1542	2010-10-08 22:02:24 +00:00
mikeworks	70576e88d2	de.lng: Added some more untranslated strings I found and uncommented old ones that were removed terminal_p.html: Put back the old ID which was really easy to find IndexCreate.js: Because XHTML 1.0 Strict does not allow name tags for some elements rewrote most element access functions to use getElementById Table_API_p.html and all other html pages: Some XHTML 1.0 Strict fixes, changed checkAll javascript, marked the first row with checkboxes as unsortable where applicable Table_API_p.java and all other java pages: URLencoded lines with possible ampersands & -> &amp; for validation XHTML 1.0 Strict sourcecode --> All Index Create pages should validate now. Hope I did not break anything else (too much :-) git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7225 6c8d7289-2bf4-0310-a012-ef5d649a1542	2010-10-06 00:00:23 +00:00
orbiter	2c549ae341	fixed a number of small bugs: - better crawl start for file paths and smb paths - added time-out wrapper for dns resolving and reverse resolving to prevent blockings - fixed intranet scanner result list check boxes - prevented htcache usage in case of file and smb crawling (not necessary, documents are locally available) - fixed rss feed loader - fixed sitemap loader which had not been restricted to single files (crawl-depth must be zero) - clearing of crawl result lists when a network switch was done - higher maximum file size for crawler git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7214 6c8d7289-2bf4-0310-a012-ef5d649a1542	2010-09-30 23:57:58 +00:00
orbiter	f6eebb6f99	replaced auto-dom filter with easy-to-understand Site Link-List crawler option - nobody understands the auto-dom filter without a lengthy introduction about the function of a crawler - nobody ever used the auto-dom filter other than with a crawl depth of 1 - the auto-dom filter was buggy since the filter did not survive a restart and then a search index contained waste - the function of the auto-dom filter was in fact to just load a link list from the given start url and then start separate crawls for all these urls restricted by their domain - the new Site Link-List option shows the target urls in real-time during input of the start url (like the robots check) and gives a transparent feed-back what it does before it can be used - the new option also fits into the easy site-crawl start menu git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7213 6c8d7289-2bf4-0310-a012-ef5d649a1542	2010-09-30 12:50:34 +00:00
orbiter	daeea96aea	renamed servlet CrawlStart_p.html to CrawlStartSite_p.html to circumvent problem with translation which still showed old expert crawl start page git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7183 6c8d7289-2bf4-0310-a012-ef5d649a1542	2010-09-22 21:46:31 +00:00

13 Commits