* FIX: Handle unicode in tokenizer

  Our fast-track code broke when strings contained characters that encode to more tokens than their UTF-8 byte length. Admins can set `DISCOURSE_AI_STRICT_TOKEN_COUNTING: true` in app.yml to force strict token counting, at the cost of some speed.

  Co-authored-by: wozulong <sidle.pax_0e@icloud.com>
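A minimal sketch of the app.yml change described above, assuming the standard `env:` section of a Discourse `containers/app.yml`:

```yaml
env:
  ## force strict (slower but exact) token counting in the discourse-ai plugin
  DISCOURSE_AI_STRICT_TOKEN_COUNTING: true
```

After editing app.yml, rebuild the container for the setting to take effect.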
Tokenizer files:

- `all_mpnet_base_v2_tokenizer.rb`
- `anthropic_tokenizer.rb`
- `basic_tokenizer.rb`
- `bert_tokenizer.rb`
- `bge_large_en_tokenizer.rb`
- `llama2_tokenizer.rb`
- `mixtral_tokenizer.rb`
- `multilingual_e5_large_tokenizer.rb`
- `open_ai_tokenizer.rb`