lionsoul2014/friso

High-performance Chinese tokenizer written in ANSI C and based on the MMSEG algorithm, with support for both GBK and UTF-8 charsets. It is fully modular and can be easily embedded in other programs such as MySQL, PostgreSQL, and PHP.

Language: C
License: Apache License 2.0
Topics: c, chinese-tokenizer, chinese-word-segmentation, cjk-tokenizer, full-text-search, japanese-tokenizer, korean-tokenizer, php-tokenizer, tokenizer
Stars: 505
Updated: Oct 31, 2025
