Commit Graph

5594 Commits

Author SHA1 Message Date
theli
351c86d5d9 *) Migration of optional Content Parser integration
- each additional parser must be in a subpackage 
  of plasma.parser
- each parser must have its own ant build file (which will 
  be called automatically from the main build file)
- Calling the main build file results in building a separate 
  zip file for each optional parser. This zip file includes:
  + sources of the Parser.java
  + compiled classes of the Parser.java
  + needed additional libs (libx)
- To install an additional parser the user simply needs to
  extract the zip file listed above into his/her yacy directory.
- The configuration (enabling/disabling) of a parser can be done
  via the web interface (currently the settings dialog) and takes
  effect "on-the-fly". Installation cannot be done "on-the-fly"
  at the moment because of classpath issues.
- The classpath of the linux startup/stop scripts is generated 
  automatically now (including all libraries from lib and libx).

*) Bugfix: the file extension was not calculated correctly by the crawler,
   e.g. the extension was accidentally computed as: .php?param=value
   Corrected.

*) Adding an additional parser for RSS/Atom feeds
- added the libs needed to do this.

TODO:
- automatically build the classpath for the Windows startup scripts


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@78 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-05-03 09:47:56 +00:00
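The extension bug fixed above (a query string leaking into `.php?param=value`) comes from taking the extension from the whole URL string instead of from its path. A minimal Java sketch of the corrected behaviour, assuming the extension is taken from the last path segment only and that dotfiles such as `.htaccess` count as having no extension; `FileExtension.of` is a hypothetical helper, not the crawler's actual code:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

public class FileExtension {

    // Hypothetical helper (not YaCy's code): returns the lower-case extension of
    // the last path segment of a URL, or "" if there is none. The query ("?...")
    // and fragment ("#...") are not part of the path, so they can never leak in.
    public static String of(String url) {
        String path;
        try {
            path = new URI(url).getRawPath(); // excludes "?query" and "#fragment"
        } catch (URISyntaxException e) {
            return ""; // unparsable URL: report no extension
        }
        if (path == null) return ""; // opaque URIs such as "mailto:..." have no path
        String segment = path.substring(path.lastIndexOf('/') + 1);
        int dot = segment.lastIndexOf('.');
        // no dot, a leading dot (".htaccess") or a trailing dot ("file.") => no extension
        if (dot <= 0 || dot == segment.length() - 1) return "";
        return segment.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(of("http://example.com/index.php?param=value")); // prints php
        System.out.println(of("http://example.com/"));                     // prints an empty line
        System.out.println(of("http://example.com/doc.PDF#page=2"));        // prints pdf
    }
}
```

Parsing with `java.net.URI` rather than cutting at the last `.` of the raw string also avoids mistaking the host name for a file name (`http://example.com` yields no extension, not `com`).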
orbiter
d0010ff0b0 last changes for release 0.37
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@76 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-05-02 12:23:15 +00:00
orbiter
f99930c04b fixed brute-force + peer-disconnect bug
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@75 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-05-01 23:31:21 +00:00
orbiter
c7c6aaf06e many bug-fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@73 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-30 01:22:46 +00:00
orbiter
48650c082c fixed 100%-CPU-Bug in plasmaCondenser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@72 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-29 12:07:13 +00:00
orbiter
995673d795 several bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@71 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-28 22:04:57 +00:00
orbiter
2de90020ed fixed caching+synchronization+brute-force-denial
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@67 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-27 21:09:40 +00:00
orbiter
9156fd53bc fixed bugs in last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@65 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-26 15:47:33 +00:00
orbiter
e25f2354c2 removed synchronization and thread blockings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@63 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-26 14:19:44 +00:00
theli
3756e6d20f *) "Httpc object was not returned to object pool." bug fixed.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@62 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-26 10:38:35 +00:00
theli
47e426ff7e *) one possible deadlock (because of nested object locks) removed in class kelondroMap
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@61 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-26 08:33:59 +00:00
theli
58a65b60bd *) synchronized keyword removed from function processLocalCrawling to avoid deadlocks.
   This synchronized keyword is no longer needed because the crawler job queue
   is now responsible for the synchronization.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@60 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-26 06:59:36 +00:00
theli
65fc650109 *) plasmaCrawlLoader shutdown problem fixed (hopefully)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@59 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-25 16:34:16 +00:00
orbiter
ba16da72b4 fixed non-working kelondroRecords cache
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@56 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-25 14:46:59 +00:00
orbiter
d03d60f8b5 separated yacy-core from yacy-libx; fixed makerelease
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@55 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-25 12:42:14 +00:00
allo
c09c54c652 staticIP Property, for people with dyndns aliases ;-)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@54 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-25 12:34:11 +00:00
allo
d005d7484e yacyDebugMode - allow LAN IPs for testing
Where was the code from 0.25 lost?


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@53 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-25 12:13:49 +00:00
orbiter
7fb645b0ab enhanced crawling performance, changed memory settings, new performance options
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@51 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-24 23:15:40 +00:00
theli
10078bb354 *) date string was accidentally replaced with the current value
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@50 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-24 21:55:52 +00:00
theli
fd584c113c *) some minor changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@49 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-24 21:52:11 +00:00
theli
f44b219e44 *) Eclipse accidentally inserted the wrong file header into the new files (the headers had been set as the default for the whole workspace instead of for the project).
Fixed.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@48 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-24 21:47:34 +00:00
theli
081ebd5517 *) I've accidentally used Java 5.0 syntax for enumerations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@47 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-24 21:42:02 +00:00
theli
58b1a0ba40 *) adding a new package for extra content parsers
*) adding content parsers for
- pdf (using the pdf-box library)
- doc (using the textmining.org library)
*) adding an interface for content parsers
*) adding a configuration file which can be used to configure which parser is used for which mimeType
*) Semaphore class was moved and renamed to serverSemaphore
*) Changing YaCy shutdown behaviour
   The busy-waiting loop for shutdown was removed and replaced with a blocking call (using the semaphore class mentioned above) to the new switchboard.waitForShutdown method.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@46 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-24 21:24:53 +00:00
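The shutdown change above replaces a busy-waiting loop with a blocking wait on a semaphore. A minimal sketch of that pattern, assuming the standard `java.util.concurrent.Semaphore` (Java 5+) in place of YaCy's own `serverSemaphore`, whose code is not shown here; `ShutdownSketch` and `terminate` are hypothetical names, and `waitForShutdown` only borrows the method name from the commit message:

```java
import java.util.concurrent.Semaphore;

public class ShutdownSketch {
    // Starts with zero permits, so acquire() blocks until a permit is released.
    private final Semaphore shutdownSignal = new Semaphore(0);

    // Hypothetical: called (e.g. from a web handler) to request shutdown.
    public void terminate() {
        shutdownSignal.release();
    }

    // Blocks the calling thread without spinning until terminate() has been called.
    public void waitForShutdown() throws InterruptedException {
        shutdownSignal.acquire();
    }

    public static void main(String[] args) throws InterruptedException {
        ShutdownSketch sb = new ShutdownSketch();
        Thread requester = new Thread(() -> {
            try {
                Thread.sleep(100); // simulate work before the shutdown request arrives
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
            sb.terminate();
        });
        requester.start();
        sb.waitForShutdown(); // returns only after terminate() released a permit
        requester.join();
        System.out.println("shut down cleanly");
    }
}
```

Because a released permit is kept until it is acquired, a `terminate()` that happens before `waitForShutdown()` is not lost; the waiting call then simply returns at once.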
orbiter
8b31f9e202 enhanced shut-down behaviour & added experimental nio-wrapper for kelondroRA (not active yet)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@44 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-23 13:00:56 +00:00
orbiter
87a61a01c2 fixed bad-gzip-trailer behaviour (now cuts off trailer)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@42 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-22 13:45:07 +00:00
orbiter
00f223cfc1 fixed post-parsing (a case when the bluelist is empty)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@41 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-21 17:13:43 +00:00
theli
c9c0a1f11c *) Trying to speedup local crawling
- introduction of a threadpool for crawling
- introduction of a job queue to avoid busy waiting for a free crawler slot

*) New classes added
- queue for receiving of crawler jobs
- semaphore class to do reader/writer synchronization (mutual exclusion)
- message object to hold all needed data about a crawler job

*) Trying to solve session-thread shutdown problem
- session thread stopped variable is now set from outside before interrupting the
  session thread.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@39 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-21 10:31:40 +00:00
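The commit above combines a crawler thread pool with a job queue so that workers block until a job arrives instead of busy waiting for a free slot. A minimal sketch of that producer/consumer pattern, assuming `java.util.concurrent.LinkedBlockingQueue` (Java 5+). The 2005 code base still avoided Java 5 features (see commit 081ebd5517), so the original queue and semaphore were hand-written; `CrawlJobQueueSketch`, `CrawlJob` and `crawlAll` are hypothetical names, not YaCy classes:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class CrawlJobQueueSketch {

    // Hypothetical message object holding the data for one crawl job.
    static final class CrawlJob {
        final String url;
        final int depth;
        CrawlJob(String url, int depth) { this.url = url; this.depth = depth; }
    }

    // Sentinel ("poison pill") telling a worker thread to exit.
    static final CrawlJob STOP = new CrawlJob(null, -1);

    // Enqueues one job per URL, lets `workers` threads drain the queue,
    // and returns how many jobs were processed.
    static int crawlAll(List<String> urls, int workers) throws InterruptedException {
        BlockingQueue<CrawlJob> queue = new LinkedBlockingQueue<>();
        AtomicInteger processed = new AtomicInteger();
        List<Thread> pool = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        CrawlJob job = queue.take(); // blocks until a job is available: no busy waiting
                        if (job == STOP) return;
                        // a real crawler would fetch and parse job.url here
                        processed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // restore the flag and exit
                }
            });
            pool.add(t);
            t.start();
        }
        for (String url : urls) queue.put(new CrawlJob(url, 0));
        for (int i = 0; i < workers; i++) queue.put(STOP); // one pill per worker, queued after all jobs
        for (Thread t : pool) t.join();
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> urls = Arrays.asList("http://a.example/", "http://b.example/", "http://c.example/");
        System.out.println(crawlAll(urls, 2)); // prints 3
    }
}
```

Since the queue is FIFO and the stop markers are enqueued after every job, no worker exits while jobs are still waiting.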
(no author)
942914ffd2 *) Adding additional functions to serverByteBuffer so that it
   can be used instead of a ByteArrayOutputStream
*) Using a serverByteBuffer for lineBuffering in class httpc
   instead of a ByteArrayOutputStream

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@35 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-20 07:39:40 +00:00
(no author)
432e01910b *) Bugfix: Image falsification
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@34 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-20 06:41:52 +00:00
orbiter
97ec8d65e4 fixed makerelease & clean-up of dead code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@33 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-19 14:04:16 +00:00
(no author)
4a76ccc6d6 *) Some minor bugfixes
- httpc: wrong error-message on 404
- httpc: error message was accidentally shown when object 
  was released from pool


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@31 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-19 10:42:48 +00:00
(no author)
1fec00bc24 *) Bugfix to avoid NullPointerExceptions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@30 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-19 10:39:58 +00:00
(no author)
f39812da91 *) Some performance improvements
- many classes set to final
- implementation of a session-thread pool
- reuse of the server handler class (normally the httpd object)
  within the session thread
- implementation of an httpc object pool
- introduction of a line buffer in httpd which can be reused
- reusing the properties table in the httpc
- added two Apache libs (commons-collections, commons-pool) which
  are needed for the object/thread pool implementation

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@26 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-19 06:55:57 +00:00
orbiter
b9203bdb50 bug fixes and code cleaning
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@22 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-15 14:18:14 +00:00
allo
c13411c198 Buildfile which inserts the Date.
The version is set in the source, so it will be correct if you check out old versions.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@21 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-15 13:50:42 +00:00
(no author)
b7d4389e4b *) support for Proxy Auto-Config File generation added.
The file is accessible at:
   http://proxy:8080/autoconfig.pac

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@20 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-15 09:06:15 +00:00
orbiter
c0807abd33 new crawl/proxy/cache design + fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@18 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-13 23:00:20 +00:00
orbiter
e7d055b98e very experimental integration of the new generic parser and optional disabling of bluelist filtering in proxy. Does not yet work properly. To disable the disable-feature, the presence of a non-empty bluelist is necessary
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@17 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-13 15:52:00 +00:00
orbiter
96516fc9d8 fixed bugs (search+kelondroException, dns)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@16 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-13 11:16:31 +00:00
orbiter
a87a17a3c8 prepared generic text parser environment
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@15 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-12 22:57:54 +00:00
orbiter
e374aca2cd enhanced exception handling in kelondro
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@14 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-12 15:45:50 +00:00
orbiter
072052f150 fixed bugs (dns, seedDB)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@13 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-11 22:44:40 +00:00
orbiter
89eb9a2292 fixed bug with crawl profiles
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@12 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-10 23:51:42 +00:00
orbiter
248077d3f0 initial load with yacy 0.36
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-04-07 19:19:42 +00:00