Commit Graph

12237 Commits

Author SHA1 Message Date
reger
756c55e6d1 upd to Solr 5.4.1 2016-02-06 21:32:54 +01:00
reger
937fbb0b9f correct isHidden() for smb from last commit 2016-02-04 19:20:27 +01:00
reger
535d4bf75f respect hidden attribute for file and smb directory listing
(hidden directories are not listed, which affects crawling of the local file system)
2016-02-04 19:16:00 +01:00
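A minimal sketch of the listing behavior described in this commit, for the local file case only. It assumes plain `java.io.File`; the names `VisibleListing` and `listVisible` are hypothetical and not taken from the YaCy sources.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not YaCy code): lists a directory but skips hidden entries.
final class VisibleListing {

    static List<String> listVisible(File dir) {
        File[] entries = dir.listFiles(); // null if dir is not a readable directory
        if (entries == null) {
            return Collections.emptyList();
        }
        List<String> names = new ArrayList<>();
        for (File entry : entries) {
            if (entry.isHidden()) {
                continue; // hidden files and directories are left out of the listing
            }
            names.add(entry.getName());
        }
        Collections.sort(names);
        return names;
    }
}
```

What counts as hidden is platform dependent: on Unix-like systems `File.isHidden()` checks for a leading dot in the name, on Windows it checks the file system's hidden attribute.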
reger
cc79ad8de6 compare search page, remove diminished search target
(romso.de, dbpedia.neofonie.de )
2016-02-04 00:47:42 +01:00
reger
375d49d536 upd classpath in batches (remove unnecessary htroot)
see prev commit
2016-02-03 21:50:50 +01:00
reger
c28142095a add findClass() to servlet class loader (used in YaCyDefaultServlet)
In the 2 cases where a servlet calls a servlet, the JVM classloader chain is
invoked and the servlet class is loaded by the JVM loader (which succeeds only
if htroot is in the system classpath). This patch uses the standard override design
for loaders to handle these cases (making it no longer crucial to have htroot
in the system classpath, as this classLoader is mainly used for servlets and
looks in this case for the class in the configured path).
+ As the default classloader is parallel capable, we should register this one too.
2016-02-02 03:44:01 +01:00
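The "standard override design" mentioned in this commit can be sketched as follows. This is an illustration under assumptions, not the YaCy servlet class loader: `DirClassLoader` and its `root` directory are hypothetical names, and the real loader may locate class files differently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical loader (not YaCy's): looks up classes in one configured directory.
final class DirClassLoader extends ClassLoader {

    static {
        // ClassLoader itself is parallel capable; a subclass has to opt in explicitly.
        ClassLoader.registerAsParallelCapable();
    }

    private final Path root; // assumed directory holding compiled .class files

    DirClassLoader(Path root, ClassLoader parent) {
        super(parent);
        this.root = root;
    }

    // loadClass() calls this only after the parent chain failed to find the class.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        Path file = root.resolve(name.replace('.', '/') + ".class");
        try {
            byte[] bytes = Files.readAllBytes(file);
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}
```

Because only `findClass()` is overridden, the usual parent-first delegation of `loadClass()` stays intact, and the configured directory is consulted only as a fallback.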
reger
8e60788c8f fix json date facet displayname 2016-01-31 02:38:39 +01:00
reger
46772e08d0 upd to pdfbox 1.8.11 2016-01-31 00:30:39 +01:00
reger
a6617ad887 expand initRemoteCrawler() to terminate worker threads if called to deactivate
remote crawl.
On startup we already save the resources for the remote crawler if it is disabled. Once started,
however, the threads kept running idle after remote crawl was disabled. Now the threads are
terminated to save the resources when disabling during runtime as well.
+ remove empty class Channels
2016-01-28 23:14:09 +01:00
reger
2048b7e057 support scraping start-/enddate from html tag with property "datetime"
This may be used in the html5 <time> tag (which we don't explicitly support yet for dates in content scraping).
2016-01-26 21:27:44 +01:00
reger
900d4584ba complete resource cleanup of lists in contentscraper's close() 2016-01-25 23:54:20 +01:00
reger
06e5cd6164 add support parsing swf-metadata to swfparser
flash supports metadata tag in swf file with metadata in xmp (xml) format.
parse some common data to include it in the head section of the html string
of converttohtml.
2016-01-25 22:13:04 +01:00
reger
11b1587067 replace remaining use of java.util.Vector by ArrayList (WebCat-swf) 2016-01-24 02:30:27 +01:00
reger
9331acdb18 add support for DEFINEFONT3 (swf8) to webcat parser
experienced issue with JPEGTABLE tag (with length=0) causing abort of parsing (ioexception);
as we don't use/need it for text parsing, skip this tag.
2016-01-23 22:46:22 +01:00
reger
bf5fca5d99 add missing swf tag constants according to latest spec
reduce use of synced vector in webcat parser
2016-01-23 20:19:01 +01:00
reger
1f18653de0 pass parsed swf content through htmlscraper
Swf may contain a subset of html tags which shouldn't appear as text.
Especially <font> tag may totally screw up metadata servlet if not filtered out.
2016-01-21 02:55:05 +01:00
reger
18ecf57792 add support of compressed swf to swfParser
from JavaSWF2 (source compatible to WebCat).
Moved swf file signature check to parser
Changed use of synced vector to list swf InStream
2016-01-20 00:58:29 +01:00
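For context, a compressed swf is recognised by its signature: `FWS` marks an uncompressed file and `CWS` a file whose body after the 8-byte header is zlib-compressed. The sketch below illustrates that check with the JDK's `InflaterInputStream`. It is not the JavaSWF2 or WebCat code; `SwfBody` and `open` are hypothetical names.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper (not the WebCat/JavaSWF2 code).
final class SwfBody {

    // Consumes the 8-byte header (signature, version, length) and
    // returns a stream over the uncompressed remainder of the file.
    static InputStream open(InputStream in) throws IOException {
        byte[] header = new byte[8];
        new DataInputStream(in).readFully(header); // EOFException on short input
        if (header[1] != 'W' || header[2] != 'S') {
            throw new IOException("not a SWF file");
        }
        if (header[0] == 'F') {
            return in; // uncompressed body follows directly
        }
        if (header[0] == 'C') {
            return new InflaterInputStream(in); // zlib-compressed body
        }
        throw new IOException("unsupported SWF signature");
    }
}
```

Other signatures (for example the LZMA variant `ZWS`) are rejected in this sketch.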
sixcooler
5cb7ba0dc4 fix for connections not getting closed to get favicon.ico during search 2016-01-19 20:57:22 +01:00
sixcooler
e1dd808e1c fix for 'move test classes to test/java' 2016-01-19 20:50:26 +01:00
reger
6c25710a34 replace bugfixed webcat-swf.jar 2016-01-18 23:36:18 +01:00
reger
4213ff84d4 import WebCat swf parser custom source package
This package is not available as jar (used jar is a custom compile as we 
use just a portion of the package) 
WebCat package is not maintained. To be able to fix bugs, source extract 
of swf parser imported here.
2016-01-18 22:41:49 +01:00
reger
bceb779414 refactor libbuild/GitRevMavenTask (mavenize)
to be able to add additional modules to build
2016-01-18 22:29:17 +01:00
reger
730fb43ab1 add translation DE,FR submenuRanking.template
upd translation DE RankingSolr_p
2016-01-17 14:52:24 +01:00
reger
84c970eaec move test classes to test/java (subdirectory as in Maven standard subdir layout)
because ViewImage*Test.java breaks test run
2016-01-16 19:22:27 +01:00
reger
9f91e6124f add DE translation for submenuCrawler.template
+ upd submenuIndexControl.template
2016-01-16 17:55:20 +01:00
reger
ed3e16e092 apply remote result count config value to Bookmark Autosearch
+ prepare to make the widely unused Bookmark feature optional
2016-01-15 02:10:10 +01:00
Michael Peter Christen
5d635879f8 Merge pull request #40 from Scarfmonster/autocrawl
Automatic crawling
2016-01-14 22:19:55 +01:00
Ryszard Goń
7d6e0d8470 Add missing settings to autocrawl settings page 2016-01-14 03:27:33 +01:00
Ryszard Goń
7a7a1277bd Add autocrawl settings page 2016-01-14 02:40:46 +01:00
Ryszard Goń
a98c395023 Add the Autocrawl thread 2016-01-14 00:50:23 +01:00
reger
4765e374e6 altered calc. of search result items per page to display
taking the existing limits into account but make it consistent with search option screen for admin and public user
changes:
  - configured default number of items per page (ConfigPortal_p.html) is used as is (no hardcoded limit)
  - otherwise requests are limited to 100 results per page ( = search option, index.html)
      (this basically is the major change, inc. limit from 20 to 100 for public user)
P.S. - the older grant of more (1000), if no online snippet calculation, is kept (for the time being)

see http://mantis.tokeek.de/view.php?id=627
2016-01-13 01:30:49 +01:00
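One possible reading of the limits listed in this commit, as a sketch. The class `ItemsPerPage`, the method `resolve` and its parameters are hypothetical and only mirror the description above; the actual YaCy code may be structured differently.

```java
// Hypothetical helper illustrating the limits described above; not YaCy's code.
final class ItemsPerPage {

    static final int PAGE_LIMIT = 100;        // limit matching the search option screen
    static final int NO_SNIPPET_LIMIT = 1000; // older grant when no online snippets are calculated

    // requested == null is assumed to mean the request carried no explicit count.
    static int resolve(Integer requested, int configuredDefault, boolean onlineSnippets) {
        if (requested == null) {
            return configuredDefault; // configured default is used as is, no hardcoded limit
        }
        int limit = onlineSnippets ? PAGE_LIMIT : NO_SNIPPET_LIMIT;
        return Math.max(0, Math.min(requested, limit));
    }
}
```

Under these assumptions, `resolve(500, 10, true)` yields 100 and `resolve(null, 250, true)` yields 250.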
Ryszard Goń
1728cd30c6 Create autocrawl profiles 2016-01-12 16:28:34 +01:00
reger
abd8ecb503 remove contentdom-dependent override of search result items per page
initially introduced e4570bffaf (diff-ae6c130fc11088c830b00ed9256ab56b)
(as one part of unexpected difference in actual vs requested results, partial bugfix for http://mantis.tokeek.de/view.php?id=627 )
2016-01-12 01:04:10 +01:00
reger
8271f783ca upd pom javadoc goal
to not fail a build on javadoc errors
2016-01-11 01:38:45 +01:00
reger
ff27824964 fix swfParser reading file signature
before passing to library (current version expects data w/o signature)
2016-01-10 01:16:31 +01:00
reger
b29db4640c update Maven pom - add release-profile
to create the release archive only if profile is activated (speeding up normal compilation)
- bind install of the 2 jar's not available in repository to validate phase (was clean)
  to automatically add these to local repository (with disadvantage it's done every time)
2016-01-09 20:32:47 +01:00
reger
04161912a5 fix tray icon switch
(using predefined/correct config name)
2016-01-09 01:19:06 +01:00
reger
e3d53f0248 add de translation for IndexExport_p 2016-01-08 02:46:29 +01:00
reger
9f5b768d84 fix typo in translation (de,hi) for AccessTracker_p
- rem some not translated in ru (-> currently best maintained translation)
2016-01-08 01:10:09 +01:00
reger
c91e712178 further refactor using standard java / (one) utf-8 charset variable
extending initiative of commit 9a25751850
2016-01-07 16:17:37 +01:00
Michael Peter Christen
e3e8015306 Merge pull request #28 from Stepanov-Sergey/patch-1
fixed typos
2016-01-06 14:57:26 +01:00
Michael Peter Christen
3dbd3caecf Merge pull request #37 from sudheesh001/LogFix
Log files are commitable and shouldn't be
2016-01-06 14:56:55 +01:00
Michael Peter Christen
9a25751850 Merge pull request #38 from luccioman/master
Refactoring : use StandardCharsets instead of hardcoded charset names
2016-01-06 14:55:54 +01:00
reger
bfcca6bfee update on translation files
- delete removed servlets
 AugmentedBrowsingFilters_p.html (de)
 CrawlStartIntranet_p.html (de)
 IndexCreateWWW***Queue_p.html (de)
 Ranking_p.html (de)
- add
 IndexCreateQueues_p.html
- rename
 Settings_Http.inc -> Settings_ProxyAccess.inc
 Language_p.html -> ConfigLanguage_p.html
2016-01-06 11:36:57 +01:00
reger
c283efdd6d remove obsolete css style for removed file CacheAdmin_p.html
and remove from translations
2016-01-06 00:51:49 +01:00
luc
571bc55937 Refactoring : use StandardCharsets constants instead of hard-coded
charset names.
2016-01-05 23:37:05 +01:00
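The kind of change this refactoring describes can be illustrated with a small before/after sketch. `CharsetExample` and its methods are hypothetical names chosen for the example, not code from the commit.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not taken from the commit.
final class CharsetExample {

    // Before: charset looked up by name, so a checked exception has to be handled
    // even though UTF-8 is guaranteed to exist on every Java platform.
    static byte[] encodeByName(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // After: the constant avoids both the name lookup and the checked exception.
    static byte[] encodeByConstant(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

Both methods return the same bytes; the constant variant also removes the risk of a misspelled charset name surfacing only at runtime.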
reger
218061752e add missing quote chars in sk.lng translation file
+ minor: del one redundancy
2016-01-05 10:48:51 +01:00
reger
e8256bb3b1 remove blekko from opensearch config (not available)
see https://blekko.com/
http://searchengineland.com/goodbye-blekko-search-engine-joins-ibms-watson-team-217633
2016-01-04 04:49:10 +01:00
sudheesh001
23ac8d186e Log files are commitable and shouldn't be 2016-01-04 07:45:08 +05:30
reger
1af0e9ef74 remove workaround for Solr bug regarding multivalued date fields
fixed in 5.4.0
http://issues.apache.org/jira/browse/SOLR-8050
2016-01-03 01:11:27 +01:00