Search in Attachment contents (PDF, Word, Excel etc.)

This section guides you through each step of indexing and searching attachment contents.

Please note: Parsing useful file contents with PHP is extremely difficult, because most non-plain-text document types (docx, PDF, rtf etc.) were designed for specific software/APIs, and PHP has no official readers for them. The plugin uses redundant methods to try to extract as much useful information as possible, but some information may still be missing, and some useless information may be indexed as well.

Prerequisites

To search attachment contents, the index table engine is required. Before you start this tutorial, I highly suggest reading the introduction to the index table.

Minimum requirements & supported formats

Only some of the parser scripts require standard libraries to be installed/enabled. These modules are enabled by default on most server hosts.

  • For the Smalot PDF parser script - PHP 5.3 or later is required, otherwise the secondary PDF2Txt library is used

  • For Microsoft Office and Open Office documents - ZipArchive and php-xml PHP modules (enabled on most hosts). Supported file types: .docx, .xlsx, .pptx, .odt, .ott, .odm, .ods, .odp

Older MS Office 97-2003 file formats may not work! These include .xls, .doc and .ppt

Indexing other documents (RTF, TXT, CSV etc.) is still possible without meeting these requirements.
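To illustrate why ZIP and XML support are needed for the newer Office formats: a .docx file is a ZIP archive that stores its text as XML. The sketch below is written in Python purely for illustration and is not the plugin's parser; the function name extract_docx_text is hypothetical. It also shows why the older binary formats are a different case, as they are not ZIP archives at all.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by paragraphs and text runs in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(data: bytes) -> str:
    """Return the paragraph text of a .docx file given its raw bytes.

    Illustration only (hypothetical helper), not the plugin's implementation.
    """
    if not zipfile.is_zipfile(io.BytesIO(data)):
        # Legacy .doc/.xls/.ppt files are binary, not ZIP, so they end up here.
        raise ValueError("not a ZIP-based Office document")

    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # Raises KeyError if the archive is not a Word document (e.g. an .xlsx).
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        # Each paragraph is split into runs; join the text of every run.
        runs = [node.text or "" for node in paragraph.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Real documents also carry text in headers, footers, footnotes and other parts of the archive, which this sketch ignores.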

Possible limitations - fair use

The external indexing libraries are highly optimized, and their performance depends mostly on the actual server performance. On an average server, however, a few things can affect performance greatly:

  • Document length - documents over 30-60 pages can be very difficult to index and may fail, especially PDF files. Therefore it is not recommended to use this feature to index long books/documents.

  • File size - documents with large images/attachments can be difficult and costly to read from the server's perspective. Optimally, the document should only contain the text to be indexed, although some graphics should not be an issue at all.

  • Secured or password protected documents - secured or password protected documents cannot be parsed.
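The limitations above can be checked roughly before uploading files. The following Python sketch is only an illustration under stated assumptions: the function name preflight_warnings and the 10 MB threshold are hypothetical and do not come from the plugin, and searching for the /Encrypt marker is a simple heuristic for encrypted PDFs, not a full PDF parse. Page count is not checked here.

```python
import os
from typing import List

# Older binary Office formats that may not be indexed
LEGACY_OFFICE_EXTENSIONS = {".doc", ".xls", ".ppt"}

# Hypothetical size threshold; pick a value that suits your server
MAX_BYTES = 10 * 1024 * 1024


def preflight_warnings(filename: str, data: bytes, max_bytes: int = MAX_BYTES) -> List[str]:
    """Return a list of human-readable warnings for one document."""
    warnings = []
    extension = os.path.splitext(filename)[1].lower()

    if extension in LEGACY_OFFICE_EXTENSIONS:
        warnings.append("legacy Office format, may not be indexed")

    if len(data) > max_bytes:
        warnings.append("large file, may be costly to parse")

    # Heuristic: encrypted PDFs reference an /Encrypt dictionary in their trailer.
    if extension == ".pdf" and b"/Encrypt" in data:
        warnings.append("PDF appears to be encrypted, cannot be parsed")

    return warnings
```

An empty list means none of these rough checks raised a concern; it does not guarantee that the file will be indexed successfully.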

Step 1 - Index table configuration

Open up the Index table submenu, located under the Ajax Search Pro main menu.

Choosing the Attachment post type to index

On the General panel, under the Post types to index option, choose the Attachment - Media post type. This unlocks the File indexing options.

Choosing the Attachment - Media post type
Unlocked options fieldset afterwards

Choosing the file mime types to index

Each attachment has a so-called mime type. The mime type tells the system what kind of file it is dealing with.

The Attachment mime types to index option lets you input a comma separated list of the desired mime types. After entering the desired mime types, the additional options below will unlock.
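If you would rather build the comma separated list from the file types you actually use, the idea can be sketched as follows. This Python snippet is only an illustration: the function name mime_types_field and the small extension map are assumptions made for this example, covering just a few common types, and are not part of the plugin.

```python
import os

# A small, hand-picked map of file extensions to their standard mime types
EXTENSION_TO_MIME = {
    ".pdf": "application/pdf",
    ".txt": "text/plain",
    ".csv": "text/csv",
    ".rtf": "application/rtf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".odt": "application/vnd.oasis.opendocument.text",
}


def mime_types_field(filenames):
    """Build the comma separated mime type list for the given file names."""
    seen = []
    for name in filenames:
        extension = os.path.splitext(name)[1].lower()
        mime = EXTENSION_TO_MIME.get(extension)
        # Skip unknown extensions and avoid listing a mime type twice.
        if mime and mime not in seen:
            seen.append(mime)
    return ",".join(seen)


print(mime_types_field(["manual.pdf", "notes.txt", "prices.csv"]))
# prints: application/pdf,text/plain,text/csv
```

The printed value can be pasted into the Attachment mime types to index field as-is.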

Here you can find the list of supported mime types.

Below you will find pre-defined configurations that you can copy and paste into the field.

Attachment mime types input

PDF

application/pdf

Text (html, csv, txt, css etc.)

text/plain,text/csv,text/tab-separated-values,text/calendar,text/css,text/html

RTF

text/richtext,application/rtf

Office Word

Please note: .doc files (old Microsoft Office) are not supported. Make sure to convert them to .docx first.

application/vnd.openxmlformats-officedocument.wordprocessingml.document,application/vnd.ms-word.document.macroEnabled.12,application/vnd.openxmlformats-officedocument.wordprocessingml.template,application/vnd.ms-word.template.macroEnabled.12,application/vnd.oasis.opendocument.text

Office Excel

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet,application/vnd.ms-excel.sheet.macroEnabled.12,application/vnd.ms-excel.sheet.binary.macroEnabled.12,application/vnd.openxmlformats-officedocument.spreadsheetml.template,application/vnd.ms-excel.template.macroEnabled.12,application/vnd.ms-excel.addin.macroEnabled.12,application/vnd.oasis.opendocument.spreadsheet,application/vnd.oasis.opendocument.chart,application/vnd.oasis.opendocument.database,application/vnd.oasis.opendocument.formula

Office PowerPoint

application/vnd.ms-powerpoint,application/vnd.openxmlformats-officedocument.presentationml.presentation,application/vnd.ms-powerpoint.presentation.macroEnabled.12,application/vnd.openxmlformats-officedocument.presentationml.slideshow,application/vnd.ms-powerpoint.slideshow.macroEnabled.12,application/vnd.openxmlformats-officedocument.presentationml.template,application/vnd.ms-powerpoint.template.macroEnabled.12,application/vnd.ms-powerpoint.addin.macroEnabled.12,application/vnd.openxmlformats-officedocument.presentationml.slide,application/vnd.ms-powerpoint.slide.macroEnabled.12,application/vnd.oasis.opendocument.presentation,application/vnd.oasis.opendocument.graphics

All-in-one: PDF + TXT + RTF + All Office (word, excel, powerpoint)

application/pdf,text/plain,text/csv,text/tab-separated-values,text/calendar,text/css,text/html,text/richtext,application/rtf,application/vnd.openxmlformats-officedocument.wordprocessingml.document,application/vnd.ms-word.document.macroEnabled.12,application/vnd.openxmlformats-officedocument.wordprocessingml.template,application/vnd.ms-word.template.macroEnabled.12,application/vnd.oasis.opendocument.text,application/vnd.openxmlformats-officedocument.spreadsheetml.sheet,application/vnd.ms-excel.sheet.macroEnabled.12,application/vnd.ms-excel.sheet.binary.macroEnabled.12,application/vnd.openxmlformats-officedocument.spreadsheetml.template,application/vnd.ms-excel.template.macroEnabled.12,application/vnd.ms-excel.addin.macroEnabled.12,application/vnd.oasis.opendocument.spreadsheet,application/vnd.oasis.opendocument.chart,application/vnd.oasis.opendocument.database,application/vnd.oasis.opendocument.formula,application/vnd.ms-powerpoint,application/vnd.openxmlformats-officedocument.presentationml.presentation,application/vnd.ms-powerpoint.presentation.macroEnabled.12,application/vnd.openxmlformats-officedocument.presentationml.slideshow,application/vnd.ms-powerpoint.slideshow.macroEnabled.12,application/vnd.openxmlformats-officedocument.presentationml.template,application/vnd.ms-powerpoint.template.macroEnabled.12,application/vnd.ms-powerpoint.addin.macroEnabled.12,application/vnd.openxmlformats-officedocument.presentationml.slide,application/vnd.ms-powerpoint.slide.macroEnabled.12,application/vnd.oasis.opendocument.presentation,application/vnd.oasis.opendocument.graphics

All-in-one + Images

application/pdf,text/plain,text/csv,text/tab-separated-values,text/calendar,text/css,text/html,text/richtext,application/rtf,application/vnd.openxmlformats-officedocument.wordprocessingml.document,application/vnd.ms-word.document.macroEnabled.12,application/vnd.openxmlformats-officedocument.wordprocessingml.template,application/vnd.ms-word.template.macroEnabled.12,application/vnd.oasis.opendocument.text,application/vnd.openxmlformats-officedocument.spreadsheetml.sheet,application/vnd.ms-excel.sheet.macroEnabled.12,application/vnd.ms-excel.sheet.binary.macroEnabled.12,application/vnd.openxmlformats-officedocument.spreadsheetml.template,application/vnd.ms-excel.template.macroEnabled.12,application/vnd.ms-excel.addin.macroEnabled.12,application/vnd.oasis.opendocument.spreadsheet,application/vnd.oasis.opendocument.chart,application/vnd.oasis.opendocument.database,application/vnd.oasis.opendocument.formula,application/vnd.ms-powerpoint,application/vnd.openxmlformats-officedocument.presentationml.presentation,application/vnd.ms-powerpoint.presentation.macroEnabled.12,application/vnd.openxmlformats-officedocument.presentationml.slideshow,application/vnd.ms-powerpoint.slideshow.macroEnabled.12,application/vnd.openxmlformats-officedocument.presentationml.template,application/vnd.ms-powerpoint.template.macroEnabled.12,application/vnd.ms-powerpoint.addin.macroEnabled.12,application/vnd.openxmlformats-officedocument.presentationml.slide,application/vnd.ms-powerpoint.slide.macroEnabled.12,application/vnd.oasis.opendocument.presentation,application/vnd.oasis.opendocument.graphics,image/jpeg,image/gif,image/png,image/bmp,image/tiff,image/x-icon

Everything (every mime type supported by WordPress)

application/pdf,text/plain,text/csv,text/tab-separated-values,text/calendar,text/css,text/html,text/richtext,application/rtf,application/vnd.openxmlformats-officedocument.wordprocessingml.document,application/vnd.ms-word.document.macroEnabled.12,application/vnd.openxmlformats-officedocument.wordprocessingml.template,application/vnd.ms-word.template.macroEnabled.12,application/vnd.oasis.opendocument.text,application/vnd.openxmlformats-officedocument.spreadsheetml.sheet,application/vnd.ms-excel.sheet.macroEnabled.12,application/vnd.ms-excel.sheet.binary.macroEnabled.12,application/vnd.openxmlformats-officedocument.spreadsheetml.template,application/vnd.ms-excel.template.macroEnabled.12,application/vnd.ms-excel.addin.macroEnabled.12,application/vnd.oasis.opendocument.spreadsheet,application/vnd.oasis.opendocument.chart,application/vnd.oasis.opendocument.database,application/vnd.oasis.opendocument.formula,application/vnd.ms-powerpoint,application/vnd.openxmlformats-officedocument.presentationml.presentation,application/vnd.ms-powerpoint.presentation.macroEnabled.12,application/vnd.openxmlformats-officedocument.presentationml.slideshow,application/vnd.ms-powerpoint.slideshow.macroEnabled.12,application/vnd.openxmlformats-officedocument.presentationml.template,application/vnd.ms-powerpoint.template.macroEnabled.12,application/vnd.ms-powerpoint.addin.macroEnabled.12,application/vnd.openxmlformats-officedocument.presentationml.slide,application/vnd.ms-powerpoint.slide.macroEnabled.12,application/vnd.oasis.opendocument.presentation,application/vnd.oasis.opendocument.graphics,image/jpeg,image/gif,image/png,image/bmp,image/tiff,image/x-icon,video/x-ms-asf,video/x-ms-wmv,video/x-ms-wmx,video/x-ms-wm,video/avi,video/divx,video/x-flv,video/quicktime,video/mpeg,video/mp4,video/ogg,video/webm,video/x-matroska,application/wordperfect,application/vnd.apple.keynote,application/vnd.apple.numbers,application/vnd.apple.pages

Enabling file content indexing

After entering the desired mime types, the file content indexing options will unlock (based on which mime types are entered).

Click on the On/Off buttons to switch which file type contents should be indexed.

After entering the mime types, the corresponding options are unlocked

Save and Index

After choosing all the desired options, it is time to Save the configuration at the bottom of the page, and then generate the index.

Save the options
Scroll back up, and click the Create new index button, then wait

Step 2 - Search instance configuration

We are almost done. Now the desired search instance needs to be configured to use the index table for attachments. If you have not created a search instance yet, make sure to do that first.

In the search instance options, go to the General Options -> Attachments panel, then change the first two options:

  • Search engine for attachments: Index table engine

  • Return attachments as results: ON

Selecting the Index table for attachments, and enabling attachments as results.

Save the options, and you are done. The search should now return attachments based on their content.