textproc/sonic: Make tokenizer features optional via OPTIONS, adopt port
This patch makes the Japanese and Chinese word segmentation features
optional via FreeBSD OPTIONS helpers, and adopts the port.
Currently, the port unconditionally downloads a ~100 MB UniDic Japanese
dictionary (unidic-mecab-2.1.2_src.zip) for every build, regardless of
whether the user needs Japanese tokenization. Upstream removed
tokenizer-japanese from the default cargo features in v1.4.2 because it
increased the final binary size roughly tenfold. This patch brings the
port in line with upstream's intent.
Changes:
- MAINTAINER changed to wadegimpbc@tuta.com
- Added CHINESE and JAPANESE OPTIONS using OPTIONS helpers
- OPTIONS_DEFAULT includes CHINESE (matching upstream's default features)
- UniDic download now conditional on JAPANESE option
- CARGO_FEATURES uses --no-default-features with allocator-jemalloc as
  the base, per cargo.mk convention (lines 23-26, 192, 197-200)
- Added missing zstd dependency
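
For reviewers, the changes above could look roughly like the following
Makefile fragment. This is a minimal sketch, not the actual diff. The
feature name tokenizer-chinese is assumed by analogy with
tokenizer-japanese, the *_DESC strings are illustrative, and the master
site setup for the UniDic archive is omitted.

```make
# Sketch only; see the attached patch for the real change.
LIB_DEPENDS=	libzstd.so:archivers/zstd

# Disable upstream's default features and name the base feature explicitly.
CARGO_FEATURES=	--no-default-features allocator-jemalloc

OPTIONS_DEFINE=		CHINESE JAPANESE
OPTIONS_DEFAULT=	CHINESE

# Description strings are illustrative.
CHINESE_DESC=	Chinese word segmentation
JAPANESE_DESC=	Japanese word segmentation (fetches UniDic dictionary)

# OPTIONS helpers: each cargo feature is appended only when its option is on.
# tokenizer-chinese is an assumed feature name.
CHINESE_VARS=	CARGO_FEATURES+=tokenizer-chinese
JAPANESE_VARS=	CARGO_FEATURES+=tokenizer-japanese

# The UniDic archive is fetched only when JAPANESE is enabled.
# The matching JAPANESE_MASTER_SITES entry is left out of this sketch.
JAPANESE_DISTFILES=	unidic-mecab-2.1.2_src.zip
```

With this layout, a default build pulls in neither the Japanese feature
nor the dictionary download; enabling JAPANESE via "make config" adds
both.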
PR: 293943