yacy_search_server/source/net/yacy/document/parser
2011-09-09 23:00:45 +00:00
html bugfixes in html parser 2011-08-31 16:02:06 +00:00
images protection against OOM cases in image parser. See also bugs.yacy.net/view.php?id=54 2011-09-09 23:00:45 +00:00
xml more UTF8 getBytes() performance hacks 2011-04-12 05:02:36 +00:00
bzipParser.java added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (now the default), all embedded image, audio and video links from all parsed documents are added to the search index as individual documents. This increases the search index size dramatically but also enables a much faster image, audio and video search. If the flag is switched on and a solr index is also enabled, the index entries are stored to the solr index as well. 2011-09-07 10:08:57 +00:00
csvParser.java - enhanced html parser: recognizes many more details in the content 2011-04-21 13:58:49 +00:00
docParser.java - enhanced html parser: recognizes many more details in the content 2011-04-21 13:58:49 +00:00
genericParser.java - enhanced html parser: recognizes many more details in the content 2011-04-21 13:58:49 +00:00
gzipParser.java added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (now the default), all embedded image, audio and video links from all parsed documents are added to the search index as individual documents. This increases the search index size dramatically but also enables a much faster image, audio and video search. If the flag is switched on and a solr index is also enabled, the index entries are stored to the solr index as well. 2011-09-07 10:08:57 +00:00
htmlParser.java added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (now the default), all embedded image, audio and video links from all parsed documents are added to the search index as individual documents. This increases the search index size dramatically but also enables a much faster image, audio and video search. If the flag is switched on and a solr index is also enabled, the index entries are stored to the solr index as well. 2011-09-07 10:08:57 +00:00
mmParser.java - enhanced html parser: recognizes many more details in the content 2011-04-21 13:58:49 +00:00
odtParser.java - enhanced html parser: recognizes many more details in the content 2011-04-21 13:58:49 +00:00
ooxmlParser.java - enhanced html parser: recognizes many more details in the content 2011-04-21 13:58:49 +00:00
pdfParser.java added jempbox-1.5.0.jar which is required by pdfbox-1.5 as stated in http://pdfbox.apache.org/dependencies.html 2011-06-05 20:04:41 +00:00
pptParser.java - enhanced html parser: recognizes many more details in the content 2011-04-21 13:58:49 +00:00
psParser.java - enhanced html parser: recognizes many more details in the content 2011-04-21 13:58:49 +00:00
rssParser.java - enhanced html parser: recognizes many more details in the content 2011-04-21 13:58:49 +00:00
rtfParser.java - enhanced html parser: recognizes many more details in the content 2011-04-21 13:58:49 +00:00
sevenzipParser.java added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (now the default), all embedded image, audio and video links from all parsed documents are added to the search index as individual documents. This increases the search index size dramatically but also enables a much faster image, audio and video search. If the flag is switched on and a solr index is also enabled, the index entries are stored to the solr index as well. 2011-09-07 10:08:57 +00:00
sidAudioParser.java - enhanced html parser: recognizes many more details in the content 2011-04-21 13:58:49 +00:00
sitemapParser.java better abstraction of http client identification 2011-04-26 13:35:29 +00:00
swfParser.java - enhanced html parser: recognizes many more details in the content 2011-04-21 13:58:49 +00:00
tarParser.java added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (now the default), all embedded image, audio and video links from all parsed documents are added to the search index as individual documents. This increases the search index size dramatically but also enables a much faster image, audio and video search. If the flag is switched on and a solr index is also enabled, the index entries are stored to the solr index as well. 2011-09-07 10:08:57 +00:00
torrentParser.java - enhanced html parser: recognizes many more details in the content 2011-04-21 13:58:49 +00:00
vcfParser.java - enhanced html parser: recognizes many more details in the content 2011-04-21 13:58:49 +00:00
vsdParser.java - enhanced html parser: recognizes many more details in the content 2011-04-21 13:58:49 +00:00
xlsParser.java - enhanced html parser: recognizes many more details in the content 2011-04-21 13:58:49 +00:00
zipParser.java added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (now the default), all embedded image, audio and video links from all parsed documents are added to the search index as individual documents. This increases the search index size dramatically but also enables a much faster image, audio and video search. If the flag is switched on and a solr index is also enabled, the index entries are stored to the solr index as well. 2011-09-07 10:08:57 +00:00