Commit Graph

16 Commits

Author SHA1 Message Date
orbiter
f8b8c82421 - refactoring of getpageinfo_p.xml (moved out of util)
- added more logging in getpageinfo_p.xml

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8037 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-15 00:22:40 +00:00
orbiter
ff32469272 added a link to /api/util/getpageinfo_p.xml as API to crawl start info and to ViewFile.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8035 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-14 20:19:41 +00:00
orbiter
c36da90261 added a very fast ftp file list generator to site crawler:
- when a site-crawl for an ftp site is started, a special directory-tree harvester now fetches the complete directory structure of the ftp server at once
- the harvester runs concurrently and feeds into the normal crawl queue

also in this:
- fixed the 'start from file' crawl function
- added a link detector for the html parser. The html parser can now also extract links that are not included in <a> tags.
- as a result, a crawl can now also be started from plain-text link files

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7367 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-12-09 17:17:25 +00:00
mikeworks
70576e88d2 de.lng: Added some more untranslated strings I found and uncommented old ones that were removed
terminal_p.html: Put back the old ID which was really easy to find
IndexCreate.js: Because XHTML 1.0 Strict does not allow name tags for some elements rewrote most element access functions to use getElementById
Table_API_p.html and all other html pages: Some XHTML 1.0 Strict fixes, changed checkAll javascript, marked the first row with checkboxes as unsortable where applicable
Table_API_p.java and all other java pages: URLencoded lines with possible ampersands & -> &amp; for validation XHTML 1.0 Strict sourcecode
--> All Index Create pages should validate now. Hope I did not break anything else (too much :-)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7225 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-06 00:00:23 +00:00
orbiter
2c549ae341 fixed a number of small bugs:
- better crawl start for file paths and smb paths
- added time-out wrapper for dns resolving and reverse resolving to prevent blockings
- fixed intranet scanner result list check boxes
- prevented htcache usage in case of file and smb crawling (not necessary, documents are locally available)
- fixed rss feed loader
- fixed sitemap loader which had not been restricted to single files (crawl-depth must be zero)
- clearing of crawl result lists when a network switch was done
- higher maximum file size for crawler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7214 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-30 23:57:58 +00:00
orbiter
f6eebb6f99 replaced auto-dom filter with easy-to-understand Site Link-List crawler option
- nobody understands the auto-dom filter without a lengthy introduction about the function of a crawler
- nobody ever used the auto-dom filter other than with a crawl depth of 1
- the auto-dom filter was buggy since the filter did not survive a restart and then a search index contained waste
- the function of the auto-dom filter was in fact to just load a link list from the given start url and then start separate crawls for all these urls restricted by their domain
- the new Site Link-List option shows the target urls in real-time during input of the start url (like the robots check) and gives transparent feedback on what it does before it can be used
- the new option also fits into the easy site-crawl start menu

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7213 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-30 12:50:34 +00:00
orbiter
d126d6c1b5 renamed the servlet WatchCrawler_p to Crawler_p
this was done because the servlet may be used for wget/cronjob
triggered crawl starts, and it is confusing when the name of the
crawl start servlet suggests a pure monitoring tool.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6568 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-12 10:05:28 +00:00
orbiter
c6c97f23ad - added cache usage properties to crawl start
- added special rule to balancer to omit forced delays if cache is used exclusively
- extended the htCache size by default to 32GB

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6241 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-24 11:54:04 +00:00
lotus
187ee4d06e another IE fix (also same names in html and js)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6116 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-22 11:24:01 +00:00
orbiter
6663365720 adapted many calls to the new api path
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5498 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-16 00:02:55 +00:00
apfelmaennchen
8d1bedfc3a - added bookmarkTitle to CrawlStart_p.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5068 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-21 21:07:21 +00:00
f1ori
76eac114ed * define global javascript-variable with var to get rid of warnings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4624 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-30 19:51:19 +00:00
theli
e75ca857c3 *) Bugfix for problem with ajax graphic
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3815 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-07 07:40:32 +00:00
orbiter
a3ecfe0a45 replaced failed-icon by new 'bad'-icon
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3680 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-07 14:05:49 +00:00
theli
6f46245a51 *) Bookmarks: Ajax icon is displayed while loading title
*) First version of a sitemap parser added
   - currently only autodetection of sitemap files is supported
*) DB-Import restructured
   - pause/resume should work again now


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3666 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-06 09:52:04 +00:00
allo
91b78d9f04 missing File for IndexCreate
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1694 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-18 12:01:52 +00:00