Adding custom dictionaries

Creating custom dictionary files

One custom dictionary can be created for each language already supported by the spell checker (see supported languages), or for any other language added through additional Hunspell dictionary files in the Hunspell Dictionary Path (see Add Hunspell dictionaries to Spell Checker Pro). It’s also possible to define an additional "global" dictionary containing words that are valid across all languages, such as trademarks.

A custom dictionary file for a particular language must be named with that language's code (see supported languages for language code examples) plus the suffix .txt, for example en.txt, en_gb.txt, fr.txt or de.txt.

The "global" dictionary file for language-independent words must be called "global.txt".

The server scans the configured dictionary directory (see Configuring the custom dictionary feature below) and picks up the .txt file for each language, plus the global file if present.

Custom dictionary file format

A dictionary file must be a simple text file with:

  • one word on each line,

  • either Windows-style or Linux-style line endings (CR+LF or LF),

  • no comments or blank lines, and

  • saved in UTF-8 encoding, with or without BOM (byte-order mark).

The last point is important for files created or edited on Windows or Mac systems, where editors may default to a different encoding. Many editors, including Windows Notepad, can save files in UTF-8 if asked to do so; please check your editor of choice for this functionality. Failure to choose the correct encoding will result in problems with non-English letters such as umlauts and accented characters.
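The format rules above can be checked before a file is deployed. The following is a minimal Python sketch, not part of the spelling service: the function name check_dictionary_file is hypothetical, and treating a single trailing newline as acceptable and whitespace inside a line as "more than one word" are assumptions of this example.

```python
from pathlib import Path


def check_dictionary_file(path):
    """Return a list of format problems found in a custom dictionary file."""
    raw = Path(path).read_bytes()
    try:
        # "utf-8-sig" accepts UTF-8 with or without a leading BOM.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError as err:
        return [f"not valid UTF-8: {err}"]

    # Accept CR+LF or LF line endings.
    lines = text.replace("\r\n", "\n").split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # assumption: one trailing newline is not a blank line

    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append(f"line {number}: blank line")
        elif len(line.split()) > 1:
            # assumption: whitespace inside a line means several words
            problems.append(f"line {number}: more than one word")
    return problems
```

For example, check_dictionary_file("/opt/ephox/dictionaries/de.txt") returns an empty list when the file meets the rules listed above.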

German and Finnish languages - Spell checking in German and Finnish uses compound word checking. Compound words such as "Fußballtennis" are assumed correct as long as the root words "Fußball" and "Tennis" are individually present in the dictionary, so it is not necessary to add "Fußballtennis" separately.

Configuring the custom dictionary feature

Additional configuration in your application.conf file is required. (Don’t forget to restart the Java application server after updating the configuration.)

The ephox.spelling.custom-dictionaries-path element is used to define the location of the custom dictionaries. When the setting is not provided, no custom dictionaries are loaded.

Requirements:

  • The directory containing the custom dictionaries must be on the same server machine as the Java service.

  • The directory should not contain subdirectories or non-dictionary files.

Tiny recommends storing the custom dictionaries in a similar location to the application.conf file. For example, if application.conf is in a directory called /opt/ephox, the dictionary files could be stored in the subdirectory /opt/ephox/dictionaries.

Example:

ephox {
  spelling {
    custom-dictionaries-path = "/opt/ephox/dictionaries"
  }
}
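The directory requirements and file naming rules can be checked before restarting the service. This is a minimal Python sketch under stated assumptions: the function name check_dictionary_directory is hypothetical, the check does not confirm that a file stem is a language code the spell checker actually supports, and it does not reproduce how the server itself scans the directory.

```python
from pathlib import Path


def check_dictionary_directory(directory):
    """Return (dictionaries, problems) for a custom dictionary directory."""
    dictionaries = []  # file stems: "global" or a language code such as "en_gb"
    problems = []
    for entry in sorted(Path(directory).iterdir()):
        if entry.is_dir():
            problems.append(f"subdirectory: {entry.name}")
        elif entry.suffix != ".txt":
            problems.append(f"non-dictionary file: {entry.name}")
        else:
            dictionaries.append(entry.stem)
    return dictionaries, problems
```

Calling check_dictionary_directory("/opt/ephox/dictionaries") on a directory holding only en.txt, fr.txt and global.txt would return those three stems and an empty problem list.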

Dynamic custom dictionaries

Adding the ephox.spelling.dynamic-custom-dictionaries element and setting it to true instructs the spelling service to periodically check the custom-dictionaries-path for changes, and update the custom dictionaries accordingly. This allows updates to the custom dictionaries without restarting the spelling service. The default value is false.

Example:

ephox {
  spelling {
    custom-dictionaries-path = "/opt/ephox/dictionaries"
    dynamic-custom-dictionaries = true
  }
}

Verifying custom dictionary functionality

When configured successfully, the custom dictionary feature reports the dictionaries it found in the application server’s log at service startup.

Example:

2017-06-12 17:46:00 [main] INFO  com.ephox.ironbark.IronbarkBoot - Starting task (booting Ironbark)
2017-06-12 17:46:00 [main] INFO  com.ephox.ironbark.IronbarkBoot - using custom dictionary: [global] = 1 words
2017-06-12 17:46:00 [main] INFO  com.ephox.ironbark.IronbarkBoot - using custom dictionary: "en" = 3 words
2017-06-12 17:46:00 [main] INFO  com.ephox.ironbark.IronbarkBoot - using custom dictionary: "fr" = 2 words
2017-06-12 17:46:01 [main] INFO  com.ephox.ironbark.IronbarkBoot - Finished task (booting Ironbark)

The above log shows that 3 custom dictionaries were found: one "global", language-independent dictionary and one each for English and French, containing 1, 3 and 2 words respectively. Please check that this report matches your expectations.
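Comparing the report against expectations can be scripted. The sketch below is a minimal Python example whose regular expression is derived only from the sample log lines above; other versions of the service may word these lines differently, and the function name custom_dictionaries_in_log is hypothetical.

```python
import re

# Matches lines such as:
#   ... using custom dictionary: [global] = 1 words
#   ... using custom dictionary: "en" = 3 words
# Assumption: the format follows the sample log shown above.
DICTIONARY_LINE = re.compile(
    r'using custom dictionary: (?:\[(\w+)\]|"([^"]+)") = (\d+) words'
)


def custom_dictionaries_in_log(log_text):
    """Map each dictionary name reported in the log to its word count."""
    found = {}
    for match in DICTIONARY_LINE.finditer(log_text):
        name = match.group(1) or match.group(2)
        found[name] = int(match.group(3))
    return found
```

The resulting mapping can then be compared with the number of lines in each dictionary file to confirm that every word was loaded.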

Ongoing dictionary maintenance

Unless the ephox.spelling.dynamic-custom-dictionaries setting is set to true, future additions/changes to dictionaries after the initial deployment will require a restart of the spell check service each time.