fast WordPiece tokenization