Commit Graph

31 Commits

Author SHA1 Message Date
karlchenofhell
601fc7d1c5 - added source to J7Zip-modifed.jar and it's license (changelog is still to come)
- moved HTML-*replace-methods from wikiCode to de.anomic.data.htmlTools
- prepared use of different wiki parsers as suggested here: http://www.yacy-forum.de/viewtopic.php?p=34444#34444

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3741 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-20 13:29:12 +00:00
orbiter
861f41e67e redesigned NURL-handling:
- the general NURL-index for all crawl stack types was splitted into separate indexes for these stacks
- the new NURL-index is managed by the crawl balancer
- the crawl balancer does not need an internal index any more, it is replaced by the NURL-index
- the NURL.Entry was generalized and is now a new class plasmaCrawlEntry
- the new class plasmaCrawlEntry replaces also the preNURL.Entry class, and will also replace the switchboardEntry class in the future
- the new class plasmaCrawlEntry is more accurate for date entries (holds milliseconds) and can contain larger 'name' entries (anchor tag names)
- the EURL object was replaced by a new ZURL object, which is a container for the plasmaCrawlEntry and some tracking information
- the EURL index is now filled with ZURL objects
- a new index delegatedURL holds ZURL objects about plasmaCrawlEntry obects to track which url is handed over to other peers
- redesigned handling of plasmaCrawlEntry - handover, because there is no need any more to convert one entry object into another
- found and fixed numerous bugs in the context of crawl state handling
- fixed a serious bug in kelondroCache which caused that entries could not be removed
- fixed some bugs in online interface and adopted monitor output to new entry objects
- adopted yacy protocol to handle new delegatedURL entries
all old crawl queues will disappear after this update!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3483 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-16 13:25:56 +00:00
karlchenofhell
0c7b8cf632 - added first version of new wiki-parser
- added blacklist support to manual URLFetcher stack fill
- fix for NPE: http://www.yacy-forum.de/viewtopic.php?t=3559

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3385 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-21 22:31:36 +00:00
orbiter
109ed0a0bb - cleaned up code; removed methods to write the old data structures
- added an assortment importer. the old database structures can
  be imported with
  java -classpath classes yacy -migrateassortments
- modified wordmigration. The indexes from WORDS are now imported
  to the collection database. The call is
  java -classpath classes yacy -migratewords
  (as it was)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3044 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-05 02:47:51 +00:00
orbiter
df1629b05a - code cleanup
- version 0.471
- moved surftipps to own web page


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2676 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-29 22:27:20 +00:00
theli
ed8227d222 *) Bugfix for NullpoinerException in IndexCreateIndexingQueue_p.java
See: http://www.yacy-forum.de/viewtopic.php?p=25874

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2667 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-27 04:35:02 +00:00
orbiter
3aac5b26da - added automatic tag generation when a web page from the search results is added
- added new image 'B' in front of search results for bookmark generation
- added news generation when a public bookmark is added
- the '+' in front of search results has new meaning: positive rating for that result
- added news generation when a '+' is hit

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2613 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-18 00:37:02 +00:00
theli
7a35b8e237 *) direct access to responseheaders of sbQueue.Entry removed to make it more http independent
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2487 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 15:36:19 +00:00
orbiter
5f72be2a95 some redesign of EURL storage
* store() is now called explicitely
* more urls are written to the EURL table
* the EURL stack does not store the complete entry any more, now only the URL hash


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2323 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-24 15:25:47 +00:00
theli
b1b8ba719e *) adding links to specify the amount of entries of a queue that should be displayed on the gui
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1360 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-17 08:45:12 +00:00
orbiter
37f88b4017 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1176 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-06 23:51:29 +00:00
low012
d5c36c8e2e *) now showing the total number of entries in the queue in addition to the number of entries in the list
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1168 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-05 23:24:46 +00:00
low012
edaa820bec resubmitting Allos patch after accidentally removing it
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1137 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-28 21:42:01 +00:00
low012
c45e47bf91 fixed vulnerability, see http://www.yacy-forum.de/viewtopic.php?t=1535
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1136 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-28 16:51:59 +00:00
allo
38d915e24c limiting the Number of Items displayed to 100 max.
http://www.yacy-forum.de/viewtopic.php?p=13275


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1131 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-27 09:50:23 +00:00
hydrox
cb69047b91 *)cleanup access static methods and fields
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1016 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-02 17:56:26 +00:00
hydrox
56b9f34411 *)removed unused imports
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1015 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-02 16:30:45 +00:00
theli
40777556c5 *) Connection Tracking
- adding automatic refresh
   - accepts new parameter nameLookup which can be used to deactivate 
     yacy-peer name lookup (because we have problems with this on large seed-dbs)

*) ViewFile
   New page that can be used to view 
   - original content 
   - plain text content 
   - parsed content
   - parsed sentences 
   of a webpage specified by there url hash
   Mainly for debugging purpose at the moment

*) Robots.txt 
   Bugfix for if-modified-since usage
   TODO: synchronization of downloads to avoid loading the same robots-file 
   multiple times in parallel by different threads

*) Shutdown
   Better abortion of transferRWI and transferURL sessions on server shutdown

*) Status Page
   Adding icon to start/stop crawling via status page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@950 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-18 07:45:27 +00:00
low012
4dbc871524 *) Trying to get rid of possibility of exploits in IndexCreate* through HTML and JavaSkript in peernames, URLs, <title>-tags etc. (see http://www.yacy-forum.de/viewtopic.php?t=1181) I hope I got them all and did not overdo it.
*) Just a tiny bit of cleanig up in News.java. (I messed it up myself some time ago.)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@749 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-19 20:36:29 +00:00
theli
d666630bad *) Bugfix for Status class not found
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@683 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-07 22:23:53 +00:00
theli
dc85b1021e *) Bugfix of Delete-Indexqueue-Entry, Clear Indexing Queue functionality
- htcache files will now also be deleted if entry should not be stored into cache

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@667 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-06 08:03:04 +00:00
theli
dc0a2d4c11 *) Bugfix for Loader Queue:
Job count was not displayed correctly
*) IndexingQueue:
- now it's possible to delete single entries from the queue
- now it's possible to clear the whole queue
  See: http://www.yacy-forum.de/viewtopic.php?t=995

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@641 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-02 11:40:40 +00:00
theli
732a107160 *) Bugfix for "-UNRESOLVED_PATTERN-" Bug on IndexCreateWWWLocalQueue_p.html and "urlEntry.url() == null" Bug
- Logging message for "urlEntry.url() == null" is now displayed as info
   - IndexCreateWWWLocalQueue_p.html now detects null entries while looping throug the list and removes them automatically
   See: 
   - http://www.yacy-forum.de/viewtopic.php?t=532#8781
   - http://www.yacy-forum.de/viewtopic.php?t=639
   - http://www.yacy-forum.de/viewtopic.php?t=1071
   - http://www.yacy-forum.de/viewtopic.php?t=338
   - http://www.yacy-forum.de/viewtopic.php?t=980

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@640 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-02 09:33:05 +00:00
theli
33aaffbfc6 *) Displaying content size of each entry in indexing queue
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@639 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-02 08:22:11 +00:00
theli
0471019606 *) IndexCreateIndexingQueue_p.html now also shows indexing jobs that are currently in process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@633 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-01 22:05:20 +00:00
theli
bead8a32aa *) IndexCreate_p.java:
Crawler StartURLs will now also added to the errorURL-DB if an error occures on this url
*) kelondroStack.java, plasmaSwitchboardQueue.java
   Adding method which returns a list of all entries in the queue. This list is used by IndexCreate_p.java 
   instead of an iterator to display the indexing-list. 
   Advantages: avoid concurrent modifications of the list while displaying it. 
               Speedup because now we have to access only one sync function instead of multiple ones 
               (one for each entry)
*) IndexCreateIndexingQueue_p.java
   Using new list() function of plasmaSwitchboardQueue
*) httpdFileHandler.java
   If a servelet returns the special value "LOCATION" the httpFileHandler does a Redirection of 
   the Browser to the URL specified by the servelet. This can e.g. be used when a http get request is
   used insead of a post request, but a refresh should not be allowed.
*) IndexCreateWWWLocalQueue_p.html
   Now it's possible to delete single entries of the local crawler queue

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@626 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-01 07:52:46 +00:00
orbiter
19dbed7cc8 code clean-up
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@401 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-12 15:09:35 +00:00
orbiter
419f8fb398 fixed bugs/missing code regarding new crawl stack
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@384 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-07 01:38:49 +00:00
orbiter
858cd94299 replaced indexing ram-queue by file-based stack-queue
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@381 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-06 14:48:41 +00:00
orbiter
252c6e4869 added crawl queue monitor for global crawls
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@372 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-04 15:07:33 +00:00
orbiter
9a3f80403e redesigned IndexCreate menu -- introduced submenues to enable more crawl queue control pages
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@370 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-04 12:42:29 +00:00