[dgit import tarball sentencepiece 0.1.97-3 sentencepiece_0.1.97-3.debian.tar.xz]
--- /dev/null
+# sentencepiece for Debian
+
+Upstream sentencepiece 0.1.97 was first released around June 6, 2022,
+but it was withdrawn and re-released under the same version number on Aug 7, 2022.
+
+As a result, some upstream commits were not included in 0.1.97-1.
+
+To fix this, the commits since 5e5adf2f851a1514ccc435aae11ee830c438321b
+were applied as the following patch files.
+
+NOTE: Drop these patch files when a newer upstream version is released.
+
+0001-update-python-wrapper.patch
+0002-remove-debug-symbols-from-wheel-package.patch
+0003-allow-tab-character-to-be-used-in-user_defined_symbo.patch
+0004-add-test-to-use-tab-as-user-defined-symbols.patch
+0005-Uses-C-17-by-default.patch
+0006-Uses-std-atomic-to-define-global-variable.patch
+0007-Fix-a-typo.patch
+0008-Uses-absl-string_view-as-much-as-possible.patch
+0009-Fixed-build-break.patch
+0010-Added-ImmutableSentencePiece-class.patch
+0011-add-verbose-option.patch
+0012-Supports-ImmutableSentencePieceText-from-python-modu.patch
+0013-Adds-more-unittests.patch
+0014-Adds-SWIGPYTHON-flag.patch
+0015-remove-unused-ifdef-SWIG-macro.patch
+0016-Fixed-test-failure.patch
+0017-Uses-property-in-immutable-proto.patch
+0018-automatically-detect-the-number-of-CPUs-in-batch-pro.patch
+0019-support-slice-in-pieces-nbests-objects.patch
+0020-Updated-the-document.patch
+0021-Fixed-errors-in-example-notebook.patch
+0022-Fix-dead-links.patch
+0023-added-ShutdownLibrary-function-to-uninitialize-globa.patch
+0024-Fixed-the-issue-of-concatinating-paths-for-pkg-confi.patch
+
+
--- /dev/null
+sentencepiece (0.1.97-3) unstable; urgency=medium
+
+ * debian/patches/0001-update-python-wrapper.patch
+ debian/patches/0002-remove-debug-symbols-from-wheel-package.patch
+ debian/patches/0003-allow-tab-character-to-be-used-in-user_defined_symbo.patch
+ debian/patches/0004-add-test-to-use-tab-as-user-defined-symbols.patch
+ debian/patches/0005-Uses-C-17-by-default.patch
+ debian/patches/0006-Uses-std-atomic-to-define-global-variable.patch
+ debian/patches/0007-Fix-a-typo.patch
+ debian/patches/0008-Uses-absl-string_view-as-much-as-possible.patch
+ debian/patches/0009-Fixed-build-break.patch
+ debian/patches/0010-Added-ImmutableSentencePiece-class.patch
+ debian/patches/0011-add-verbose-option.patch
+ debian/patches/0012-Supports-ImmutableSentencePieceText-from-python-modu.patch
+ debian/patches/0013-Adds-more-unittests.patch
+ debian/patches/0014-Adds-SWIGPYTHON-flag.patch
+ debian/patches/0015-remove-unused-ifdef-SWIG-macro.patch
+ debian/patches/0016-Fixed-test-failure.patch
+ debian/patches/0017-Uses-property-in-immutable-proto.patch
+ debian/patches/0018-automatically-detect-the-number-of-CPUs-in-batch-pro.patch
+ debian/patches/0019-support-slice-in-pieces-nbests-objects.patch
+ debian/patches/0020-Updated-the-document.patch
+ debian/patches/0021-Fixed-errors-in-example-notebook.patch
+ debian/patches/0022-Fix-dead-links.patch
+ debian/patches/0023-added-ShutdownLibrary-function-to-uninitialize-globa.patch
+ debian/patches/0024-Fixed-the-issue-of-concatinating-paths-for-pkg-confi.patch
+ - Add missing patch files for 0.1.97.
+ * debian/README.Debian
+ - Add explanation of debian/patches.
+
+ -- Kentaro Hayashi <kenhys@xdump.org> Mon, 21 Nov 2022 22:43:46 +0900
+
+sentencepiece (0.1.97-2) unstable; urgency=medium
+
+ * Team upload
+
+ [ Steve Langasek ]
+ * debian/patches/header-dependencies.patch: include necessary headers
+ to ensure IS_BIG_ENDIAN is defined, see #1017360.
+
+ -- Graham Inggs <ginggs@debian.org> Sun, 18 Sep 2022 05:30:57 +0000
+
+sentencepiece (0.1.97-1) unstable; urgency=medium
+
+ * New upstream version 0.1.97
+ * debian/copyright
+ - Update maintainer E-mail address
+ * debian/control
+ - Bump Standards-Version to 4.6.1. No other changes are required.
+ * debian/patches/support-python-module-in-place.patch
+ - Refresh path to build python module.
+
+ -- Kentaro Hayashi <kenhys@xdump.org> Tue, 14 Jun 2022 20:19:58 +0900
+
+sentencepiece (0.1.96-1) unstable; urgency=medium
+
+ * New upstream version 0.1.96
+ * debian/control
+ - Bump standard-version to 4.5.1. No changes are required.
+
+ -- Kentaro Hayashi <kenhys@xdump.org> Wed, 18 Aug 2021 20:52:46 +0900
+
+sentencepiece (0.1.95-1) unstable; urgency=medium
+
+ * New upstream version 0.1.95
+ * debian/patches/support-python-module-in-place.patch
+ - Fix undefined symbol when importing python module (Closes: #979040)
+
+ -- Kentaro Hayashi <kenhys@xdump.org> Thu, 11 Feb 2021 17:36:23 +0900
+
+sentencepiece (0.1.94-2) unstable; urgency=medium
+
+ * Fix FTBFS on armel/mipsel (Closes: #977235)
+
+ -- Kentaro Hayashi <kenhys@xdump.org> Wed, 16 Dec 2020 21:18:15 +0900
+
+sentencepiece (0.1.94-1) unstable; urgency=medium
+
+ * New upstream version 0.1.94
+ * debian/patches/support-python-module-in-place.patch
+ - Refresh path to build python module.
+ * debian/patches/fix-ftbfs-ports.patch
+ debian/patches/mutiarch-support.patch
+ - Remove needless patches because they were merged
+ into google/sentencepiece.
+
+ -- Kentaro Hayashi <kenhys@xdump.org> Wed, 28 Oct 2020 21:02:07 +0900
+
+sentencepiece (0.1.93-1) unstable; urgency=medium
+
+ * New upstream version 0.1.93
+ * debian/source/lintian-overrides
+ - Remove needless override.
+
+ -- Kentaro Hayashi <kenhys@xdump.org> Thu, 15 Oct 2020 21:32:05 +0900
+
+sentencepiece (0.1.92-3) unstable; urgency=medium
+
+ * debian/patches/fix-ftbfs-ports.patch
+ - Fix FTBFS on powerpc
+
+ -- Kentaro Hayashi <kenhys@xdump.org> Sat, 03 Oct 2020 20:48:27 +0900
+
+sentencepiece (0.1.92-2) unstable; urgency=medium
+
+ * debian/patches/0002-Change-in-order-to-build-Python-modules-in-place.patch
+ - Fix FTBFS on hurd-i386
+ * debian/patches/0004-Fix-FTBFS-on-armel-and-mipsel.patch
+ - Fix missing dependency to atomic library (powerpc,m68k,sh4)
+
+ -- Kentaro Hayashi <kenhys@xdump.org> Sat, 26 Sep 2020 20:27:17 +0900
+
+sentencepiece (0.1.92-1) unstable; urgency=medium
+
+ * New upstream version 0.1.92
+
+ -- Kentaro Hayashi <kenhys@xdump.org> Fri, 19 Jun 2020 19:38:49 +0900
+
+sentencepiece (0.1.91-1) unstable; urgency=medium
+
+ * New upstream version 0.1.91
+
+ -- Kentaro Hayashi <kenhys@xdump.org> Fri, 22 May 2020 15:17:42 +0900
+
+sentencepiece (0.1.90-3) unstable; urgency=medium
+
+ * debian/patches/0004-Fix-FTBFS-on-armel-and-mipsel.patch
+ - Refresh patch to fix FTBFS.
+
+ -- Kentaro Hayashi <kenhys@xdump.org> Sun, 17 May 2020 09:02:23 +0900
+
+sentencepiece (0.1.90-2) unstable; urgency=medium
+
+ * debian/patches/0004-Fix-FTBFS-on-armel-and-mipsel.patch
+ - Add patch to fix FTBFS on mipsel and armel
+
+ -- Kentaro Hayashi <kenhys@xdump.org> Sat, 16 May 2020 16:16:45 +0900
+
+sentencepiece (0.1.90-1) unstable; urgency=medium
+
+ * New upstream version 0.1.90
+ * debian/control
+ - Update Uploaders:
+ - Bump standard-version to 4.5.0
+ - Bump compat version to 13.
+ * debian/source/lintian-overrides
+ - Fix false positive source-is-missing
+ * debian/patches/0003-Disable-static-library-explicitly.patch
+ - Disable building of the static library
+
+ -- Kentaro Hayashi <kenhys@xdump.org> Wed, 13 May 2020 19:09:34 +0900
+
+sentencepiece (0.1.84-1) unstable; urgency=medium
+
+ * New upstream version 0.1.84 (Closes: #939860)
+
+ [ TSUCHIYA Masatoshi ]
+ * Initial packaging tasks.
+ * Remove pipeline configurations for BitBucket.
+
+ [ Kentaro Hayashi ]
+ * debian/gbp.conf
+ - Add basic configuration about debian-branch
+ * debian/watch
+ - Add missing watch file to detect a new release
+ * debian/control
+ - Update deprecated Priority: to optional
+ - Add Vcs-* fields
+ - Fix W: sentencepiece: description-synopsis-starts-with-article
+ - Bump standard version to 4.4.1
+ - Update Vcs-* under science-team
+ - Bump up compatibility level
+ - Drop python2 support
+ * debian/copyright
+ - Use https://
+ - Update copyright about third party modules
+ * debian/rules
+ - Enable hardening
+ * debian/salsa-ci.yml
+ - Add Salsa CI configuration
+
+ -- Kentaro Hayashi <hayashi@clear-code.com> Thu, 17 Oct 2019 13:33:34 +0900
--- /dev/null
+Source: sentencepiece
+Section: science
+Priority: optional
+Maintainer: Debian Science Maintainers <debian-science-maintainers@lists.alioth.debian.org>
+Uploaders:
+ TSUCHIYA Masatoshi <tsuchiya@namazu.org>,
+ Kentaro Hayashi <kenhys@xdump.org>
+Build-Depends:
+ debhelper-compat (= 13),
+ protobuf-compiler,
+ libprotobuf-dev,
+ dh-python,
+ python3-all-dev,
+ quilt,
+ cmake,
+ python3-setuptools
+Standards-Version: 4.6.1
+Homepage: https://github.com/google/sentencepiece
+Vcs-Browser: https://salsa.debian.org/science-team/sentencepiece
+Vcs-Git: https://salsa.debian.org/science-team/sentencepiece.git
+Rules-Requires-Root: no
+
+Package: sentencepiece
+Architecture: any
+Depends: ${shlibs:Depends}, ${misc:Depends}
+Description: Unsupervised text tokenizer and detokenizer
+ SentencePiece is an unsupervised text tokenizer/detokenizer mainly
+ designed for Neural Network-based text generation systems where the
+ vocabulary size is predetermined prior to the neural model training.
+
+Package: libsentencepiece0
+Section: libs
+Architecture: any
+Depends: ${shlibs:Depends}, ${misc:Depends}
+Description: Library files of SentencePiece
+ SentencePiece is an unsupervised text tokenizer/detokenizer mainly
+ designed for Neural Network-based text generation systems where the
+ vocabulary size is predetermined prior to the neural model training.
+
+Package: libsentencepiece-dev
+Section: libdevel
+Architecture: any
+Depends: libsentencepiece0 (= ${binary:Version}), ${misc:Depends}
+Description: Header files of SentencePiece
+ SentencePiece is an unsupervised text tokenizer/detokenizer mainly
+ designed for Neural Network-based text generation systems where the
+ vocabulary size is predetermined prior to the neural model training.
+
+Package: python3-sentencepiece
+Section: python
+Architecture: any
+Depends:
+ ${shlibs:Depends},
+ ${misc:Depends},
+ ${python3:Depends}
+Description: SentencePiece binding for Python3
+ SentencePiece is an unsupervised text tokenizer/detokenizer mainly
+ designed for Neural Network-based text generation systems where the
+ vocabulary size is predetermined prior to the neural model training.
+ .
+ python3-sentencepiece is its binding for Python3.
--- /dev/null
+Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
+Upstream-Name: sentencepiece
+Source: https://github.com/google/sentencepiece
+
+Files: *
+Copyright: 2017 Taku Kudo <taku@chasen.org>
+License: Apache-2.0
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+ .
+ http://www.apache.org/licenses/LICENSE-2.0
+ .
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied. See the License for the specific language governing
+ permissions and limitations under the License.
+
+Files: debian/*
+Copyright:
+ 2016 TSUCHIYA Masatoshi <tsuchiya@namazu.org>
+ 2019-2022 Kentaro Hayashi <kenhys@xdump.org>
+License: GPL-2+
+ This package is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+ .
+ This package is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+ .
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>
+ .
+ On Debian systems, the complete text of the GNU General
+ Public License version 2 can be found in "/usr/share/common-licenses/GPL-2".
+
+Files: third_party/esaxx/*
+Copyright: 2010 Daisuke Okanohara
+License: MIT
+
+Files: third_party/darts_clone/*
+Copyright: 2008-2011, Susumu Yata
+License: BSD-3-clause
+
+Files: third_party/protobuf-lite/*
+Copyright: 2008 Google Inc.
+License: BSD-3-clause
+
+Files: data/Scripts.txt
+Copyright: 1991-2016 Unicode, Inc.
+License: Unicode
+ COPYRIGHT AND PERMISSION NOTICE
+ .
+ Copyright © 1991-2016 Unicode, Inc. All rights reserved.
+ Distributed under the Terms of Use in https://www.unicode.org/copyright.html.
+ .
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of the Unicode data files and any associated documentation
+ (the "Data Files") or Unicode software and any associated documentation
+ (the "Software") to deal in the Data Files or Software
+ without restriction, including without limitation the rights to use,
+ copy, modify, merge, publish, distribute, and/or sell copies of
+ the Data Files or Software, and to permit persons to whom the Data Files
+ or Software are furnished to do so, provided that either
+ (a) this copyright and permission notice appear with all copies
+ of the Data Files or Software, or
+ (b) this copyright and permission notice appear in associated
+ Documentation.
+ .
+ THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF
+ ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE
+ WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ NONINFRINGEMENT OF THIRD PARTY RIGHTS.
+ IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS
+ NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL
+ DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE,
+ DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
+ TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
+ PERFORMANCE OF THE DATA FILES OR SOFTWARE.
+ .
+ Except as contained in this notice, the name of a copyright holder
+ shall not be used in advertising or otherwise to promote the sale,
+ use or other dealings in these Data Files or Software without prior
+ written authorization of the copyright holder.
+
+Files: data/botchan.txt
+Copyright: Kin-nosuke Natsume
+License: public-domain
+ Written by Kin-nosuke Natsume and put into the public domain.
+ It was translated by Yasotaro Morri and published by Project Gutenberg.
+
+Files: data/wagahaiwa_nekodearu.txt
+Copyright: Kin-nosuke Natsume
+License: public-domain
+ Written by Kin-nosuke Natsume and put into the public domain.
+ It was digitized by an Aozora Bunko collaborator and published by Aozora Bunko.
+
+License: MIT
+ Permission is hereby granted, free of charge, to any person
+ obtaining a copy of this software and associated documentation
+ files (the "Software"), to deal in the Software without
+ restriction, including without limitation the rights to use,
+ copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the
+ Software is furnished to do so, subject to the following
+ conditions:
+ .
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+ .
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ OTHER DEALINGS IN THE SOFTWARE.
+
+License: BSD-3-clause
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions
+ are met:
+ .
+ - Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+ - Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in the
+ documentation and/or other materials provided with the
+ distribution.
+ - Neither the name of the <ORGANIZATION> nor the names of its
+ contributors may be used to endorse or promote products derived
+ from this software without specific prior written permission.
+ .
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
--- /dev/null
+[DEFAULT]
+debian-branch = master
+
--- /dev/null
+usr/lib/*/lib*.so
+usr/lib/*/pkgconfig/*
+usr/include/*
--- /dev/null
+usr/lib/*/lib*.so.*
--- /dev/null
+From: Taku Kudo <taku@google.com>
+Date: Wed, 8 Jun 2022 02:22:21 +0900
+Subject: update python wrapper.
+
+Signed-off-by: Kentaro Hayashi <kenhys@gmail.com>
+---
+ python/make_py_wheel.sh | 73 -
+ python/make_py_wheel_mac.sh | 89 -
+ python/once.h | 157 --
+ python/src/sentencepiece/__init__.py | 293 ++-
+ python/src/sentencepiece/sentencepiece.i | 648 +++++-
+ python/src/sentencepiece/sentencepiece_wrap.cxx | 2383 +++++++++++++++--------
+ python/test/sentencepiece_test.py | 424 ++--
+ 7 files changed, 2575 insertions(+), 1492 deletions(-)
+ delete mode 100755 python/make_py_wheel.sh
+ delete mode 100755 python/make_py_wheel_mac.sh
+ delete mode 100644 python/once.h
+
+diff --git a/python/make_py_wheel.sh b/python/make_py_wheel.sh
+deleted file mode 100755
+index 2e123ce..0000000
+--- a/python/make_py_wheel.sh
++++ /dev/null
+@@ -1,73 +0,0 @@
+-#!/bin/bash
+-# Copyright 2018 Google Inc.
+-#
+-# Licensed under the Apache License, Version 2.0 (the "License");
+-# you may not use this file except in compliance with the License.
+-# You may obtain a copy of the License at
+-#
+-# http://www.apache.org/licenses/LICENSE-2.0
+-#
+-# Unless required by applicable law or agreed to in writing, software
+-# distributed under the License is distributed on an "AS IS" BASIS,
+-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-# See the License for the specific language governing permissions and
+-# limitations under the License.!
+-set -e # exit immediately on error
+-set -x # display all commands
+-
+-CMAKE_VERSION=3.12.0
+-
+-run_docker() {
+- cd `dirname $0`
+- docker pull $1
+- docker run --rm -ti --name py_sentencepiece \
+- -v `pwd`/../:/sentencepiece -w /sentencepiece/python \
+- -td $1 /bin/bash
+- docker exec py_sentencepiece bash -c "./make_py_wheel.sh native $2"
+- docker stop py_sentencepiece
+-}
+-
+-build() {
+- TRG=$1
+- rm -fr build
+- mkdir -p build
+- cd build
+-
+- # Install sentencepiece
+- cmake ../.. -DSPM_ENABLE_SHARED=OFF
+- make -j4
+- make install
+- cd ..
+-
+- for i in /opt/python/*
+- do
+- export LD_LIBRARY_PATH=/usr/local/lib:/usr/lib
+- $i/bin/python setup.py clean
+- $i/bin/python setup.py bdist
+- strip build/*/*/*.so
+- $i/bin/python setup.py bdist_wheel
+- $i/bin/python setup.py test
+- rm -fr build
+- rm -fr *.so
+- done
+-
+- cd dist
+- for i in *${TRG}.whl
+- do
+- auditwheel repair $i
+- done
+-
+- mv -f wheelhouse/*${TRG}.whl .
+-
+- cd ..
+- rm -fr build
+-}
+-
+-if [ "$1" = "native" ]; then
+- build $2
+-elif [ "$#" -eq 1 ]; then
+- run_docker quay.io/pypa/manylinux2014_${1} ${1}
+-else
+- run_docker quay.io/pypa/manylinux2014_i686 i686
+- run_docker quay.io/pypa/manylinux2014_x86_64 x86_64
+-fi
+diff --git a/python/make_py_wheel_mac.sh b/python/make_py_wheel_mac.sh
+deleted file mode 100755
+index bed7366..0000000
+--- a/python/make_py_wheel_mac.sh
++++ /dev/null
+@@ -1,89 +0,0 @@
+-#!/bin/bash
+-# Copyright 2018 Google Inc.
+-#
+-# Licensed under the Apache License, Version 2.0 (the "License");
+-# you may not use this file except in compliance with the License.
+-# You may obtain a copy of the License at
+-#
+-# http://www.apache.org/licenses/LICENSE-2.0
+-#
+-# Unless required by applicable law or agreed to in writing, software
+-# distributed under the License is distributed on an "AS IS" BASIS,
+-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-# See the License for the specific language governing permissions and
+-# limitations under the License.!
+-
+-set -e # exit immediately on error
+-set -x # display all commands
+-
+-build_python() {
+- VERSION=$1
+- URL=$2
+- INSTALL_PATH="/Library/Frameworks/Python.framework/Versions/${VERSION}/bin"
+- CURRENT_PATH=${PATH}
+-
+- curl -L -o python.pkg ${URL}
+- sudo installer -pkg python.pkg -target /
+-
+- if [ -f "${INSTALL_PATH}/python3" ]; then
+- ln -s ${INSTALL_PATH}/python3 ${INSTALL_PATH}/python
+- ln -s ${INSTALL_PATH}/python3-config ${INSTALL_PATH}/python-config
+- ln -s ${INSTALL_PATH}/pip3 ${INSTALL_PATH}/pip
+- fi
+-
+- export PATH="${INSTALL_PATH}:${CURRENT_PATH}"
+- ls -l ${INSTALL_PATH}
+- which python
+- which pip
+- python --version
+- curl -L -o get-pip.py https://bootstrap.pypa.io/pip/3.6/get-pip.py
+- sudo python ./get-pip.py --no-setuptools --no-wheel --ignore-installed
+- pip install --upgrade setuptools
+- pip install wheel
+- pip install delocate
+- python setup.py clean
+- python setup.py bdist_wheel --plat-name=macosx_10_6_x86_64
+- python setup.py test
+- delocate-listdeps dist/*.whl
+- delocate-wheel -w dist/delocated_wheel dist/*.whl
+- export PATH="${CURRENT_PATH}"
+-
+- ls -l dist/delocated_wheel
+- rm -fr build
+- rm -fr *.so
+- rm -fr dist/*.whl
+- rm -fr python.pkg
+-}
+-
+-build() {
+- cd python
+- rm -fr build
+- mkdir -p build
+- cd build
+-
+- # Install sentencepiece
+- cmake ../.. -DSPM_ENABLE_SHARED=OFF -DSPM_NO_THREADLOCAL=ON
+- make -j4 VERBOSE=1
+- make install
+- cd ..
+-
+- mkdir -p dist/delocated_wheel
+-
+-# build_python 2.7 https://www.python.org/ftp/python/2.7.15/python-2.7.15-macosx10.6.pkg
+-# latest pip doesn't support Py3.4
+- # build_python 3.4 https://www.python.org/ftp/python/3.4.4/python-3.4.4-macosx10.6.pkg
+- curl -L -O https://bootstrap.pypa.io/pip/3.5/get-pip.py
+- build_python 3.5 https://www.python.org/ftp/python/3.5.4/python-3.5.4-macosx10.6.pkg
+-
+- curl -L -O https://bootstrap.pypa.io/get-pip.py
+- build_python 3.6 https://www.python.org/ftp/python/3.6.6/python-3.6.6-macosx10.6.pkg
+- build_python 3.7 https://www.python.org/ftp/python/3.7.9/python-3.7.9-macosx10.9.pkg
+- build_python 3.8 https://www.python.org/ftp/python/3.8.6/python-3.8.6-macosx10.9.pkg
+- build_python 3.9 https://www.python.org/ftp/python/3.9.0/python-3.9.0-macosx10.9.pkg
+-
+- cd ..
+-
+- rm -fr build
+-}
+-
+-build
+diff --git a/python/once.h b/python/once.h
+deleted file mode 100644
+index fc7553a..0000000
+--- a/python/once.h
++++ /dev/null
+@@ -1,157 +0,0 @@
+-// Protocol Buffers - Google's data interchange format
+-// Copyright 2008 Google Inc. All rights reserved.
+-// https://developers.google.com/protocol-buffers/
+-//
+-// Redistribution and use in source and binary forms, with or without
+-// modification, are permitted provided that the following conditions are
+-// met:
+-//
+-// * Redistributions of source code must retain the above copyright
+-// notice, this list of conditions and the following disclaimer.
+-// * Redistributions in binary form must reproduce the above
+-// copyright notice, this list of conditions and the following disclaimer
+-// in the documentation and/or other materials provided with the
+-// distribution.
+-// * Neither the name of Google Inc. nor the names of its
+-// contributors may be used to endorse or promote products derived from
+-// this software without specific prior written permission.
+-//
+-// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+-// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+-// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+-// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+-// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+-// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+-// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+-// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+-// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+-// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+-// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+-
+-// Author: kenton@google.com (Kenton Varda)
+-//
+-// emulates google3/base/once.h
+-//
+-// This header is intended to be included only by internal .cc files and
+-// generated .pb.cc files. Users should not use this directly.
+-//
+-// This is basically a portable version of pthread_once().
+-//
+-// This header declares:
+-// * A type called ProtobufOnceType.
+-// * A macro GOOGLE_PROTOBUF_DECLARE_ONCE() which declares a variable of type
+-// ProtobufOnceType. This is the only legal way to declare such a variable.
+-// The macro may only be used at the global scope (you cannot create local or
+-// class member variables of this type).
+-// * A function GoogleOnceInit(ProtobufOnceType* once, void (*init_func)()).
+-// This function, when invoked multiple times given the same ProtobufOnceType
+-// object, will invoke init_func on the first call only, and will make sure
+-// none of the calls return before that first call to init_func has finished.
+-// * The user can provide a parameter which GoogleOnceInit() forwards to the
+-// user-provided function when it is called. Usage example:
+-// int a = 10;
+-// GoogleOnceInit(&my_once, &MyFunctionExpectingIntArgument, &a);
+-// * This implementation guarantees that ProtobufOnceType is a POD (i.e. no
+-// static initializer generated).
+-//
+-// This implements a way to perform lazy initialization. It's more efficient
+-// than using mutexes as no lock is needed if initialization has already
+-// happened.
+-//
+-// Example usage:
+-// void Init();
+-// GOOGLE_PROTOBUF_DECLARE_ONCE(once_init);
+-//
+-// // Calls Init() exactly once.
+-// void InitOnce() {
+-// GoogleOnceInit(&once_init, &Init);
+-// }
+-//
+-// Note that if GoogleOnceInit() is called before main() has begun, it must
+-// only be called by the thread that will eventually call main() -- that is,
+-// the thread that performs dynamic initialization. In general this is a safe
+-// assumption since people don't usually construct threads before main() starts,
+-// but it is technically not guaranteed. Unfortunately, Win32 provides no way
+-// whatsoever to statically-initialize its synchronization primitives, so our
+-// only choice is to assume that dynamic initialization is single-threaded.
+-
+-#ifndef GOOGLE_PROTOBUF_STUBS_ONCE_H__
+-#define GOOGLE_PROTOBUF_STUBS_ONCE_H__
+-
+-#include <sched.h>
+-#include <atomic>
+-#include <mutex>
+-#include <utility>
+-
+-namespace google {
+-namespace protobuf {
+-namespace internal {
+-
+-using once_flag = std::atomic<int>;
+-
+-template <typename Callable, typename... Args>
+-void my_call_once(once_flag& once, Callable&& fn, Args&&... args) {
+- enum CallOnceState {
+- ONCE_INIT = 0,
+- ONCE_RUNNING = 1,
+- ONCE_DONE = 2,
+- };
+-
+- int expected_state = ONCE_INIT;
+- if (once.compare_exchange_strong(expected_state, ONCE_RUNNING)) {
+- fn(std::forward<Args>(args)...);
+- once.store(ONCE_DONE);
+- return;
+- }
+-
+- if (expected_state == ONCE_DONE) {
+- return;
+- }
+-
+- while (once.load() == ONCE_RUNNING) {
+- sched_yield();
+- }
+-}
+-
+-template <typename... Args>
+-void call_once(Args&&... args) {
+- my_call_once(std::forward<Args>(args)...);
+-}
+-} // namespace internal
+-
+-// TODO(gerbens) remove this once third_party is fully extracted
+-using ProtobufOnceType = internal::once_flag;
+-
+-inline void GoogleOnceInit(ProtobufOnceType* once, void (*init_func)()) {
+- internal::my_call_once(*once, init_func);
+-}
+-
+-template <typename Arg>
+-inline void GoogleOnceInitArg(ProtobufOnceType* once, void (*init_func)(Arg*),
+- Arg* arg) {
+- internal::my_call_once(*once, init_func, arg);
+-}
+-
+-class GoogleOnceDynamic {
+- public:
+- // If this->Init() has not been called before by any thread,
+- // execute (*func_with_arg)(arg) then return.
+- // Otherwise, wait until that prior invocation has finished
+- // executing its function, then return.
+- template <typename T>
+- void Init(void (*func_with_arg)(T*), T* arg) {
+- GoogleOnceInitArg<T>(&this->state_, func_with_arg, arg);
+- }
+-
+- private:
+- ProtobufOnceType state_;
+-};
+-
+-#define GOOGLE_PROTOBUF_ONCE_TYPE ::google::protobuf::ProtobufOnceType
+-#define GOOGLE_PROTOBUF_DECLARE_ONCE(NAME) \
+- ::google::protobuf::ProtobufOnceType NAME
+-
+-} // namespace protobuf
+-} // namespace google
+-
+-#endif // GOOGLE_PROTOBUF_STUBS_ONCE_H__
+diff --git a/python/src/sentencepiece/__init__.py b/python/src/sentencepiece/__init__.py
+index fdb5976..cba3b70 100644
+--- a/python/src/sentencepiece/__init__.py
++++ b/python/src/sentencepiece/__init__.py
+@@ -87,48 +87,15 @@ class SentencePieceProcessor(object):
+ def LoadVocabulary(self, filename, threshold):
+ return _sentencepiece.SentencePieceProcessor_LoadVocabulary(self, filename, threshold)
+
+- def EncodeAsPieces(self, input):
+- return _sentencepiece.SentencePieceProcessor_EncodeAsPieces(self, input)
+-
+- def EncodeAsIds(self, input):
+- return _sentencepiece.SentencePieceProcessor_EncodeAsIds(self, input)
+-
+- def NBestEncodeAsPieces(self, input, nbest_size):
+- return _sentencepiece.SentencePieceProcessor_NBestEncodeAsPieces(self, input, nbest_size)
+-
+- def NBestEncodeAsIds(self, input, nbest_size):
+- return _sentencepiece.SentencePieceProcessor_NBestEncodeAsIds(self, input, nbest_size)
+-
+- def SampleEncodeAsPieces(self, input, nbest_size, alpha):
+- return _sentencepiece.SentencePieceProcessor_SampleEncodeAsPieces(self, input, nbest_size, alpha)
+-
+- def SampleEncodeAsIds(self, input, nbest_size, alpha):
+- return _sentencepiece.SentencePieceProcessor_SampleEncodeAsIds(self, input, nbest_size, alpha)
+-
+ def SampleEncodeAndScoreAsPieces(self, input, num_samples, theta, wor, include_best):
+ return _sentencepiece.SentencePieceProcessor_SampleEncodeAndScoreAsPieces(self, input, num_samples, theta, wor, include_best)
+
+ def SampleEncodeAndScoreAsIds(self, input, num_samples, theta, wor, include_best):
+ return _sentencepiece.SentencePieceProcessor_SampleEncodeAndScoreAsIds(self, input, num_samples, theta, wor, include_best)
+
+- def DecodePieces(self, pieces):
+- return _sentencepiece.SentencePieceProcessor_DecodePieces(self, pieces)
+-
+ def CalculateEntropy(self, text, theta):
+ return _sentencepiece.SentencePieceProcessor_CalculateEntropy(self, text, theta)
+
+- def EncodeAsSerializedProto(self, input):
+- return _sentencepiece.SentencePieceProcessor_EncodeAsSerializedProto(self, input)
+-
+- def SampleEncodeAsSerializedProto(self, input, nbest_size, alpha):
+- return _sentencepiece.SentencePieceProcessor_SampleEncodeAsSerializedProto(self, input, nbest_size, alpha)
+-
+- def NBestEncodeAsSerializedProto(self, input, nbest_size):
+- return _sentencepiece.SentencePieceProcessor_NBestEncodeAsSerializedProto(self, input, nbest_size)
+-
+- def DecodePiecesAsSerializedProto(self, pieces):
+- return _sentencepiece.SentencePieceProcessor_DecodePiecesAsSerializedProto(self, pieces)
+-
+ def GetPieceSize(self):
+ return _sentencepiece.SentencePieceProcessor_GetPieceSize(self)
+
+@@ -171,30 +138,69 @@ class SentencePieceProcessor(object):
+ def LoadFromFile(self, arg):
+ return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
+
+- def DecodeIdsWithCheck(self, ids):
+- return _sentencepiece.SentencePieceProcessor_DecodeIdsWithCheck(self, ids)
++ def _EncodeAsIds(self, text, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece):
++ return _sentencepiece.SentencePieceProcessor__EncodeAsIds(self, text, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece)
++
++ def _EncodeAsPieces(self, text, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece):
++ return _sentencepiece.SentencePieceProcessor__EncodeAsPieces(self, text, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece)
++
++ def _EncodeAsSerializedProto(self, text, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece):
++ return _sentencepiece.SentencePieceProcessor__EncodeAsSerializedProto(self, text, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece)
++
++ def _EncodeAsIdsBatch(self, ins, num_threads, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece):
++ return _sentencepiece.SentencePieceProcessor__EncodeAsIdsBatch(self, ins, num_threads, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece)
++
++ def _EncodeAsPiecesBatch(self, ins, num_threads, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece):
++ return _sentencepiece.SentencePieceProcessor__EncodeAsPiecesBatch(self, ins, num_threads, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece)
++
++ def _EncodeAsSerializedProtoBatch(self, ins, num_threads, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece):
++ return _sentencepiece.SentencePieceProcessor__EncodeAsSerializedProtoBatch(self, ins, num_threads, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece)
+
+- def DecodeIdsAsSerializedProtoWithCheck(self, ids):
+- return _sentencepiece.SentencePieceProcessor_DecodeIdsAsSerializedProtoWithCheck(self, ids)
++ def _DecodeIds(self, ids):
++ return _sentencepiece.SentencePieceProcessor__DecodeIds(self, ids)
+
+- def _EncodeAsIds(self, text, enabele_sampling, nbest_size, alpha, add_bos, add_eos, reverse):
+- return _sentencepiece.SentencePieceProcessor__EncodeAsIds(self, text, enabele_sampling, nbest_size, alpha, add_bos, add_eos, reverse)
++ def _DecodePieces(self, pieces):
++ return _sentencepiece.SentencePieceProcessor__DecodePieces(self, pieces)
+
+- def _EncodeAsPieces(self, text, enabele_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece):
+- return _sentencepiece.SentencePieceProcessor__EncodeAsPieces(self, text, enabele_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece)
++ def _DecodeIdsAsSerializedProto(self, ids):
++ return _sentencepiece.SentencePieceProcessor__DecodeIdsAsSerializedProto(self, ids)
+
+- def _NBestEncodeAsIds(self, text, nbest_size, add_bos, add_eos, reverse):
+- return _sentencepiece.SentencePieceProcessor__NBestEncodeAsIds(self, text, nbest_size, add_bos, add_eos, reverse)
++ def _DecodePiecesAsSerializedProto(self, pieces):
++ return _sentencepiece.SentencePieceProcessor__DecodePiecesAsSerializedProto(self, pieces)
++
++ def _DecodeIdsBatch(self, ins, num_threads):
++ return _sentencepiece.SentencePieceProcessor__DecodeIdsBatch(self, ins, num_threads)
++
++ def _DecodeIdsAsSerializedProtoBatch(self, ins, num_threads):
++ return _sentencepiece.SentencePieceProcessor__DecodeIdsAsSerializedProtoBatch(self, ins, num_threads)
++
++ def _DecodePiecesBatch(self, ins, num_threads):
++ return _sentencepiece.SentencePieceProcessor__DecodePiecesBatch(self, ins, num_threads)
++
++ def _DecodePiecesAsSerializedProtoBatch(self, ins, num_threads):
++ return _sentencepiece.SentencePieceProcessor__DecodePiecesAsSerializedProtoBatch(self, ins, num_threads)
++
++ def _NBestEncodeAsIds(self, text, nbest_size, add_bos, add_eos, reverse, emit_unk_piece):
++ return _sentencepiece.SentencePieceProcessor__NBestEncodeAsIds(self, text, nbest_size, add_bos, add_eos, reverse, emit_unk_piece)
+
+ def _NBestEncodeAsPieces(self, text, nbest_size, add_bos, add_eos, reverse, emit_unk_piece):
+ return _sentencepiece.SentencePieceProcessor__NBestEncodeAsPieces(self, text, nbest_size, add_bos, add_eos, reverse, emit_unk_piece)
+
+- def _SampleEncodeAndScoreAsIds(self, text, num_samples, theta, wor, include_best, add_bos, add_eos, reverse):
+- return _sentencepiece.SentencePieceProcessor__SampleEncodeAndScoreAsIds(self, text, num_samples, theta, wor, include_best, add_bos, add_eos, reverse)
++ def _NBestEncodeAsSerializedProto(self, text, nbest_size, add_bos, add_eos, reverse, emit_unk_piece):
++ return _sentencepiece.SentencePieceProcessor__NBestEncodeAsSerializedProto(self, text, nbest_size, add_bos, add_eos, reverse, emit_unk_piece)
++
++ def _SampleEncodeAndScoreAsIds(self, text, num_samples, theta, wor, include_best, add_bos, add_eos, reverse, emit_unk_piece):
++ return _sentencepiece.SentencePieceProcessor__SampleEncodeAndScoreAsIds(self, text, num_samples, theta, wor, include_best, add_bos, add_eos, reverse, emit_unk_piece)
+
+ def _SampleEncodeAndScoreAsPieces(self, text, num_samples, theta, wor, include_best, add_bos, add_eos, reverse, emit_unk_piece):
+ return _sentencepiece.SentencePieceProcessor__SampleEncodeAndScoreAsPieces(self, text, num_samples, theta, wor, include_best, add_bos, add_eos, reverse, emit_unk_piece)
+
++ def _CalculateEntropy(self, text, theta):
++ return _sentencepiece.SentencePieceProcessor__CalculateEntropy(self, text, theta)
++
++ def _CalculateEntropyBatch(self, ins, theta, num_threads):
++ return _sentencepiece.SentencePieceProcessor__CalculateEntropyBatch(self, ins, theta, num_threads)
++
+ def Init(self,
+ model_file=None,
+ model_proto=None,
+@@ -205,7 +211,8 @@ class SentencePieceProcessor(object):
+ emit_unk_piece=False,
+ enable_sampling=False,
+ nbest_size=-1,
+- alpha=0.1):
++ alpha=0.1,
++ num_threads=1):
+ """Initialzie sentencepieceProcessor.
+
+ Args:
+@@ -225,6 +232,7 @@ class SentencePieceProcessor(object):
+ forward-filtering-and-backward-sampling algorithm.
+ alpha: Soothing parameter for unigram sampling, and dropout probability of
+ merge operations for BPE-dropout.
++ num_threads: number of threads in batch processing.
+ """
+
+ _sentencepiece_processor_init_native(self)
+@@ -236,6 +244,7 @@ class SentencePieceProcessor(object):
+ self._enable_sampling = enable_sampling
+ self._nbest_size = nbest_size
+ self._alpha = alpha
++ self._num_threads = num_threads
+ if model_file or model_proto:
+ self.Load(model_file=model_file, model_proto=model_proto)
+
+@@ -249,7 +258,8 @@ class SentencePieceProcessor(object):
+ emit_unk_piece=None,
+ enable_sampling=None,
+ nbest_size=None,
+- alpha=None):
++ alpha=None,
++ num_threads=None):
+ """Encode text input to segmented ids or tokens.
+
+ Args:
+@@ -268,6 +278,7 @@ class SentencePieceProcessor(object):
+ forward-filtering-and-backward-sampling algorithm.
+ alpha: Soothing parameter for unigram sampling, and merge probability for
+ BPE-dropout (probablity 'p' in BPE-dropout paper).
++ num_threads: the number of threads used in the batch processing (Default = 1).
+ """
+
+ if out_type is None:
+@@ -286,6 +297,8 @@ class SentencePieceProcessor(object):
+ nbest_size = self._nbest_size
+ if alpha is None:
+ alpha = self._alpha
++ if num_threads is None:
++ num_threads = self._num_threads
+
+ if enable_sampling == True and (nbest_size is None or nbest_size == 0 or
+ nbest_size == 1 or alpha is None):
+@@ -296,18 +309,59 @@ class SentencePieceProcessor(object):
+ 'instead of nbest segmentations.'
+ )
+
+- def _encode(text):
+- if out_type is int:
+- return self._EncodeAsIds(text, enable_sampling, nbest_size,
+- alpha, add_bos, add_eos, reverse)
+- else:
+- return self._EncodeAsPieces(text, enable_sampling, nbest_size,
+- alpha, add_bos, add_eos, reverse, emit_unk_piece)
++ if num_threads is None or type(num_threads) is not int:
++ raise RuntimeError('num_threads must be int')
+
+ if type(input) is list:
+- return [_encode(n) for n in input]
++ if out_type is int:
++ return self._EncodeAsIdsBatch(input, num_threads, enable_sampling, nbest_size,
++ alpha, add_bos, add_eos, reverse, emit_unk_piece)
++ if out_type is str:
++ return self._EncodeAsPiecesBatch(input, num_threads, enable_sampling, nbest_size,
++ alpha, add_bos, add_eos, reverse, emit_unk_piece)
++ if out_type == 'proto':
++ return self._EncodeAsSerializedProtoBatch(input, num_threads, enable_sampling, nbest_size,
++ alpha, add_bos, add_eos, reverse, emit_unk_piece)
++
++ if out_type is int:
++ return self._EncodeAsIds(input, enable_sampling, nbest_size,
++ alpha, add_bos, add_eos, reverse, emit_unk_piece)
++ if out_type is str:
++ return self._EncodeAsPieces(input, enable_sampling, nbest_size,
++ alpha, add_bos, add_eos, reverse, emit_unk_piece)
++ if out_type == 'proto':
++ return self._EncodeAsSerializedProto(input, enable_sampling, nbest_size,
++ alpha, add_bos, add_eos, reverse, emit_unk_piece)
++
++ raise RuntimeError('unknown out_type={}'.format(out_type))
++ return None
+
+- return _encode(input)
++
++ def EncodeAsPieces(self, input, **kwargs):
++ return self.Encode(input=input, out_type=str, **kwargs)
++
++
++ def EncodeAsIds(self, input, **kwargs):
++ return self.Encode(input=input, out_type=int, **kwargs)
++
++
++ def EncodeAsSerializedProto(self, input, **kwargs):
++ return self.Encode(input=input, out_type='proto', **kwargs)
++
++
++ def SampleEncodeAsPieces(self, input, nbest_size=None, alpha=None, **kwargs):
++ return self.Encode(input=input, nbest_size=nbest_size, alpha=alpha,
++ out_type=str, enable_sampling=True, **kwargs)
++
++
++ def SampleEncodeAsIds(self, input, nbest_size=None, alpha=None,**kwargs):
++ return self.Encode(input=input, nbest_size=nbest_size, alpha=alpha,
++ out_type=int, enable_sampling=True, **kwargs)
++
++
++ def SampleEncodeAsSerializedProto(self, input, nbest_size=None, alpha=None, **kwargs):
++ return self.Encode(input=input, nbest_size=nbest_size, alpha=alpha,
++ out_type='proto', enable_sampling=True, **kwargs)
+
+
+ def NBestEncode(self,
+@@ -348,9 +402,14 @@ class SentencePieceProcessor(object):
+
+ def _encode(text):
+ if out_type is int:
+- return self._NBestEncodeAsIds(text, nbest_size, add_bos, add_eos, reverse)
+- else:
+- return self._NBestEncodeAsPieces(text, nbest_size, add_bos, add_eos, reverse, emit_unk_piece)
++ return self._NBestEncodeAsIds(text, nbest_size,
++ add_bos, add_eos, reverse, emit_unk_piece)
++ if out_type is str:
++ return self._NBestEncodeAsPieces(text, nbest_size,
++ add_bos, add_eos, reverse, emit_unk_piece)
++ if out_type == 'proto':
++ return self._NBestEncodeAsSerializedProto(text, nbest_size,
++ add_bos, add_eos, reverse, emit_unk_piece)
+
+ if type(input) is list:
+ return [_encode(n) for n in input]
+@@ -358,6 +417,21 @@ class SentencePieceProcessor(object):
+ return _encode(input)
+
+
++ def NBestEncodeAsPieces(self, input, nbest_size=None, **kwargs):
++ return self.NBestEncode(input=input, nbest_size=nbest_size,
++ out_type=str, **kwargs)
++
++
++ def NBestEncodeAsIds(self, input, nbest_size=None, **kwargs):
++ return self.NBestEncode(input=input, nbest_size=nbest_size,
++ out_type=int, **kwargs)
++
++
++ def NBestEncodeAsSerializedProto(self, input, nbest_size=None, **kwargs):
++ return self.NBestEncode(input=input, nbest_size=nbest_size,
++ out_type='proto', **kwargs)
++
++
+ def SampleEncodeAndScore(self,
+ input,
+ out_type=None,
+@@ -373,7 +447,7 @@ class SentencePieceProcessor(object):
+
+ Args:
+ input: input string. accepsts list of string.
+- out_type: output type. int or str.
++ out_type: output type. int or str or 'proto'.
+ add_bos: Add <s> to the result (Default = false)
+ add_eos: Add </s> to the result (Default = false) <s>/</s> is added after reversing (if enabled).
+ reverse: Reverses the tokenized sequence (Default = false)
+@@ -413,7 +487,7 @@ class SentencePieceProcessor(object):
+ def _encode(text):
+ if out_type is int:
+ return self._SampleEncodeAndScoreAsIds(text, num_samples, theta, wor, include_best,
+- add_bos, add_eos, reverse)
++ add_bos, add_eos, reverse, emit_unk_piece)
+ else:
+ return self._SampleEncodeAndScoreAsPieces(text, num_samples, theta, wor, include_best,
+ add_bos, add_eos, reverse, emit_unk_piece)
+@@ -424,35 +498,90 @@ class SentencePieceProcessor(object):
+ return _encode(input)
+
+
+- def Decode(self, input):
+- """Decode processed id or token sequences."""
++ def Decode(self, input, out_type=str, num_threads=None):
++ """Decode processed id or token sequences.
++
++ Args:
++ out_type: output type. str or 'proto' (Default = str)
++ num_threads: the number of threads used in the batch processing (Default = 1).
++ """
++
++ if num_threads is None:
++ num_threads = self._num_threads
++
++ if num_threads is None or type(num_threads) is not int:
++ raise RuntimeError('num_threads must be int')
+
+ if not input:
+- return self.DecodeIds([])
+- elif type(input) is int:
+- return self.DecodeIdsWithCheck([input])
+- elif type(input) is str:
+- return self.DecodePieces([input])
++ return ''
++
++ if out_type is str:
++ if type(input) is int:
++ return self._DecodeIds([input])
++ if type(input) is str:
++ return self._DecodePieces([input])
++
++ if type(input) is list:
++ if len(input) == 0 or type(input[0]) is int:
++ return self._DecodeIds(input)
++ if type(input[0]) is str:
++ return self._DecodePieces(input)
++
++ if type(input[0]) is list:
++ if len(input[0]) == 0 or type(input[0][0]) is int:
++ return self._DecodeIdsBatch(input, num_threads)
++ if type(input[0][0]) is str:
++ return self._DecodePiecesBatch(input, num_threads)
++
++ if out_type == 'proto':
++ if type(input) is int:
++ return self._DecodeIdsAsSerializedProto([input])
++ if type(input) is str:
++ return self._DecodePiecesAsSerializedProto([input])
++
++ if type(input) is list:
++ if len(input) == 0 or type(input[0]) is int:
++ return self._DecodeIdsAsSerializedProto(input)
++ if type(input[0]) is str:
++ return self._DecodePiecesAsSerializedProto(input)
++
++ if type(input[0]) is list:
++ if len(input[0]) == 0 or type(input[0][0]) is int:
++ return self._DecodeIdsAsSerializedProtoBatch(input, num_threads)
++ if type(input[0][0]) is str:
++ return self._DecodePiecesAsSerializedProtoBatch(input, num_threads)
++
++
++ raise RuntimeError('unknown output or input type')
++ return None
+
+- def _decode(input):
+- if not input:
+- return self.DecodeIds([])
+- if type(input[0]) is int:
+- return self.DecodeIdsWithCheck(input)
+- return self.DecodePieces(input)
+
+- if type(input[0]) is list:
+- return [_decode(n) for n in input]
++ def DecodePieces(self, input, out_type=str, **kwargs):
++ return self.Decode(input=input, out_type=out_type, **kwargs)
+
+- return _decode(input)
+
++ def DecodeIds(self, input, out_type=str, **kwargs):
++ return self.Decode(input=input, out_type=out_type, **kwargs)
++
++
++ def DecodePiecesAsSerializedProto(self, input, out_type='proto', **kwargs):
++ return self.Decode(input=input, out_type=out_type, **kwargs)
+
+- def Entropy(self, input, theta):
+- """Calculate sentence entropy"""
+
++ def DecodeIdsAsSerializedProto(self, input, out_type='proto', **kwargs):
++ return self.Decode(input=input, out_type=out_type, **kwargs)
++
++
++ def CalculateEntropy(self, input, theta, num_threads=None):
++ """Calculate sentence entropy"""
+ if type(input) is list:
+- return [self.CalculateEntropy(n, theta) for n in input]
+- return self.CalculateEntropy(input, theta)
++ if num_threads is None:
++ num_threads = self._num_threads
++ if num_threads is None or type(num_threads) is not int:
++ raise RuntimeError('num_threads must be int')
++ return self._CalculateEntropyBatch(input, theta, num_threads)
++
++ return self._CalculateEntropy(input, theta)
+
+
+ def piece_size(self):
+@@ -642,8 +771,6 @@ setattr(SentencePieceProcessor, '__init__', SentencePieceProcessor.Init)
+
+ SentencePieceProcessor.Tokenize = SentencePieceProcessor.Encode
+ SentencePieceProcessor.Detokenize = SentencePieceProcessor.Decode
+-SentencePieceProcessor.DecodeIds = SentencePieceProcessor.DecodeIdsWithCheck
+-SentencePieceProcessor.DecodeIdsAsSerializedProto = SentencePieceProcessor.DecodeIdsAsSerializedProtoWithCheck
+
+ for m in [
+ 'PieceToId', 'IdToPiece', 'GetScore', 'IsUnknown', 'IsControl', 'IsUnused',
+diff --git a/python/src/sentencepiece/sentencepiece.i b/python/src/sentencepiece/sentencepiece.i
+index 21bb7cf..3a822bc 100644
+--- a/python/src/sentencepiece/sentencepiece.i
++++ b/python/src/sentencepiece/sentencepiece.i
+@@ -2,9 +2,13 @@
+ %include exception.i
+
+ %{
++#include <iostream>
+ #include <algorithm>
++#include <functional>
+ #include <limits>
+ #include <cmath>
++#include <thread>
++#include <vector>
+ #include <sentencepiece_processor.h>
+ #include <sentencepiece_trainer.h>
+
+@@ -12,6 +16,8 @@ namespace {
+ PyObject* kUnicodeInput = reinterpret_cast<PyObject* >(0x1);
+ PyObject* kByteInput = reinterpret_cast<PyObject* >(0x2);
+
++using BytesArray = std::vector<sentencepiece::util::bytes>;
++
+ inline void ReleaseResultObject(PyObject *obj) {
+ if (obj != nullptr && obj != kUnicodeInput && obj != kByteInput) {
+ Py_XDECREF(obj);
+@@ -54,7 +60,7 @@ PyObject* MakePyOutputString(const std::string& output,
+ return PyBytes_FromStringAndSize(output.data(), output.size());
+ }
+
+-PyObject* MakePyOutputBytes(const std::string& output) {
++PyObject* MakePyOutputBytes(const sentencepiece::util::bytes& output) {
+ return PyBytes_FromStringAndSize(output.data(), output.size());
+ }
+
+@@ -126,18 +132,18 @@ class PySentenceIterator : public sentencepiece::SentenceIterator {
+ sentencepiece::util::Status status_;
+ };
+
+-void RewriteIds(const sentencepiece::SentencePieceProcessor &sp,
+- std::vector<int> *ids,
+- bool add_bos, bool add_eos, bool reverse) {
++inline void RewriteIds(const sentencepiece::SentencePieceProcessor &sp,
++ std::vector<int> *ids,
++ bool add_bos, bool add_eos, bool reverse, bool emit_unk_piece) {
+ if (!add_bos && !add_eos && !reverse) return;
+ if (reverse) std::reverse(ids->begin(), ids->end());
+ if (add_bos) ids->insert(ids->begin(), sp.bos_id());
+ if (add_eos) ids->push_back(sp.eos_id());
+ }
+
+-void RewritePieces(const sentencepiece::SentencePieceProcessor &sp,
+- std::vector<std::string> *pieces,
+- bool add_bos, bool add_eos, bool reverse, bool emit_unk_piece) {
++inline void RewriteIds(const sentencepiece::SentencePieceProcessor &sp,
++ std::vector<std::string> *pieces,
++ bool add_bos, bool add_eos, bool reverse, bool emit_unk_piece) {
+ if (!add_bos && !add_eos && !reverse && !emit_unk_piece) return;
+ if (reverse) std::reverse(pieces->begin(), pieces->end());
+ if (add_bos) pieces->insert(pieces->begin(), sp.IdToPiece(sp.bos_id()));
+@@ -152,6 +158,98 @@ void RewritePieces(const sentencepiece::SentencePieceProcessor &sp,
+ }
+ }
+ }
++
++inline void RewriteIds(const sentencepiece::SentencePieceProcessor &sp,
++ sentencepiece::util::bytes *proto,
++ bool add_bos, bool add_eos, bool reverse, bool emit_unk_piece) {
++ if (add_bos || add_eos || reverse || emit_unk_piece) {
++ throw sentencepiece::util::Status(
++ sentencepiece::util::StatusCode::kUnimplemented,
++ "add_bos, add_eos, reverse, and emit_unk_piece are not supported in AsSerialize API");
++ }
++}
++
++inline void CheckIds(const std::vector<int> &ids, int num_pieces) {
++ for (int id : ids) {
++ if (id < 0 || id >= num_pieces) {
++ throw sentencepiece::util::Status(
++ sentencepiece::util::StatusCode::kOutOfRange,
++ "piece id is out of range.");
++ }
++ }
++}
++
++inline void CheckIds(const std::vector<std::string> &ids, int num_pieces) {}
++
++class ThreadPool {
++ public:
++ explicit ThreadPool(size_t request_size) :
++ request_size_(request_size) {}
++
++ virtual ~ThreadPool() {
++ for (auto &task : tasks_) {
++ task.join();
++ }
++ }
++
++ void Schedule(std::function<void()> closure) {
++ static constexpr size_t kMinThreadSize = 2;
++ if (request_size_ < kMinThreadSize) {
++ closure();
++ } else {
++ tasks_.emplace_back(closure);
++ }
++ }
++
++ private:
++ size_t request_size_ = 0;
++ std::vector<std::thread> tasks_;
++};
++
++template <typename T>
++inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
++ *num_threads = std::max<int>(1,
++ std::min<int>({*num_threads,
++ static_cast<int>(ins.size()), 256}));
++}
++
++#define DEFINE_ENCODE_BATCH_FUNC_IMPL(FuncName, InType, OutType) \
++ std::vector<OutType> outs(ins.size()); \
++ InitNumThreads(ins, &num_threads); \
++ { \
++ ThreadPool pool(ins.size()); \
++ for (int n = 0; n < num_threads; ++n) { \
++ pool.Schedule([&, n]() { \
++ for (size_t i = n; i < ins.size(); i += num_threads) { \
++ auto out = enable_sampling ? \
++ self->Sample##FuncName(ins[i], \
++ nbest_size, alpha) : \
++ self->FuncName(ins[i]); \
++ RewriteIds(*self, &out, add_bos, add_eos, reverse, \
++ emit_unk_piece); \
++ outs[i] = std::move(out); \
++ } \
++ }); \
++ } \
++ } \
++ return outs;
++
++#define DEFINE_DECODE_BATCH_FUNC_IMPL(FuncName, InType, OutType) \
++ std::vector<OutType> outs(ins.size()); \
++ InitNumThreads(ins, &num_threads); \
++ { \
++ ThreadPool pool(ins.size()); \
++ for (int n = 0; n < num_threads; ++n) { \
++ pool.Schedule([&, n]() { \
++ for (size_t i = n; i < ins.size(); i += num_threads) { \
++ CheckIds(ins[i], self->GetPieceSize()); \
++ outs[i] = self->FuncName(ins[i]); \
++ } \
++ }); \
++ } \
++ } \
++ return outs;
++
+ } // namespace
+ %}
+
+@@ -171,15 +269,28 @@ void RewritePieces(const sentencepiece::SentencePieceProcessor &sp,
+ %ignore sentencepiece::SentencePieceText;
+ %ignore sentencepiece::NormalizerSpec;
+ %ignore sentencepiece::TrainerSpec;
+-
+ %ignore sentencepiece::SentencePieceProcessor::status;
++
+ %ignore sentencepiece::SentencePieceProcessor::Encode;
++%ignore sentencepiece::SentencePieceProcessor::EncodeAsPieces;
++%ignore sentencepiece::SentencePieceProcessor::EncodeAsIds;
++%ignore sentencepiece::SentencePieceProcessor::EncodeAsSerializedProto;
+ %ignore sentencepiece::SentencePieceProcessor::SampleEncode;
++%ignore sentencepiece::SentencePieceProcessor::SampleEncodeAsIds;
++%ignore sentencepiece::SentencePieceProcessor::SampleEncodeAsPieces;
++%ignore sentencepiece::SentencePieceProcessor::SampleEncodeAsSerializedProto;
+ %ignore sentencepiece::SentencePieceProcessor::NBestEncode;
++%ignore sentencepiece::SentencePieceProcessor::NBestEncodeAsPieces;
++%ignore sentencepiece::SentencePieceProcessor::NBestEncodeAsIds;
++%ignore sentencepiece::SentencePieceProcessor::NBestEncodeAsSerializedProto;
+ %ignore sentencepiece::SentencePieceProcessor::SampleEncodeAndScore;
++
+ %ignore sentencepiece::SentencePieceProcessor::Decode;
+ %ignore sentencepiece::SentencePieceProcessor::DecodeIds;
++%ignore sentencepiece::SentencePieceProcessor::DecodePieces;
++%ignore sentencepiece::SentencePieceProcessor::DecodePiecesAsSerializedProto;
+ %ignore sentencepiece::SentencePieceProcessor::DecodeIdsAsSerializedProto;
++
+ %ignore sentencepiece::SentencePieceProcessor::model_proto;
+ %ignore sentencepiece::SentencePieceProcessor::Load;
+ %ignore sentencepiece::SentencePieceProcessor::LoadOrDie;
+@@ -200,62 +311,131 @@ void RewritePieces(const sentencepiece::SentencePieceProcessor &sp,
+ return $self->Load(arg);
+ }
+
+- std::string DecodeIdsWithCheck(
+- const std::vector<int> &ids) const {
+- const int num_pieces = $self->GetPieceSize();
+- for (int id : ids) {
+- if (id < 0 || id >= num_pieces) {
+- throw sentencepiece::util::Status(
+- sentencepiece::util::StatusCode::kOutOfRange,
+- "piece id is out of range.");
+- }
+- }
+- return $self->DecodeIds(ids);
+- }
+-
+- util::bytes DecodeIdsAsSerializedProtoWithCheck(
+- const std::vector<int> &ids) const {
+- const int num_pieces = $self->GetPieceSize();
+- for (int id : ids) {
+- if (id < 0 || id >= num_pieces) {
+- throw sentencepiece::util::Status(
+- sentencepiece::util::StatusCode::kOutOfRange,
+- "piece id is out of range.");
+- }
+- }
+- return $self->DecodeIdsAsSerializedProto(ids);
+- }
+-
++ /////////////////////////////////////////////////////////////////////////////
++ // EncodeAs* (Single request)
+ std::vector<int> _EncodeAsIds(absl::string_view text,
+- bool enabele_sampling,
++ bool enable_sampling,
+ int nbest_size, float alpha,
+- bool add_bos, bool add_eos, bool reverse) {
+- auto ids = enabele_sampling ?
++ bool add_bos, bool add_eos, bool reverse,
++ bool emit_unk_piece) const {
++ auto ids = enable_sampling ?
+ $self->SampleEncodeAsIds(text, nbest_size, alpha) :
+ $self->EncodeAsIds(text);
+- RewriteIds(*$self, &ids, add_bos, add_eos, reverse);
++ RewriteIds(*$self, &ids, add_bos, add_eos, reverse, emit_unk_piece);
+ return ids;
+ }
+
+ std::vector<std::string> _EncodeAsPieces(absl::string_view text,
+- bool enabele_sampling,
++ bool enable_sampling,
+ int nbest_size, float alpha,
+ bool add_bos, bool add_eos, bool reverse,
+- bool emit_unk_piece) {
+- auto pieces = enabele_sampling ?
++ bool emit_unk_piece) const {
++ auto pieces = enable_sampling ?
+ $self->SampleEncodeAsPieces(text, nbest_size, alpha) :
+ $self->EncodeAsPieces(text);
+- RewritePieces(*$self, &pieces, add_bos, add_eos, reverse, emit_unk_piece);
++ RewriteIds(*$self, &pieces, add_bos, add_eos, reverse, emit_unk_piece);
+ return pieces;
+ }
+
++ sentencepiece::util::bytes _EncodeAsSerializedProto(absl::string_view text,
++ bool enable_sampling,
++ int nbest_size, float alpha,
++ bool add_bos, bool add_eos, bool reverse,
++ bool emit_unk_piece) const {
++ auto proto = enable_sampling ?
++ $self->SampleEncodeAsSerializedProto(text, nbest_size, alpha) :
++ $self->EncodeAsSerializedProto(text);
++ RewriteIds(*$self, &proto, add_bos, add_eos, reverse, emit_unk_piece);
++ return proto;
++ }
++
++ /////////////////////////////////////////////////////////////////////////////
++ // EncodeAs* (Batch request)
++ std::vector<std::vector<int>> _EncodeAsIdsBatch(
++ const std::vector<absl::string_view> &ins, int num_threads,
++ bool enable_sampling, int nbest_size, float alpha,
++ bool add_bos, bool add_eos, bool reverse,
++ bool emit_unk_piece) const {
++ DEFINE_ENCODE_BATCH_FUNC_IMPL(EncodeAsIds,
++ absl::string_view, std::vector<int>);
++ }
++
++ std::vector<std::vector<std::string>> _EncodeAsPiecesBatch(
++ const std::vector<absl::string_view> &ins, int num_threads,
++ bool enable_sampling, int nbest_size, float alpha,
++ bool add_bos, bool add_eos, bool reverse,
++ bool emit_unk_piece) const {
++ DEFINE_ENCODE_BATCH_FUNC_IMPL(EncodeAsPieces,
++ absl::string_view, std::vector<std::string>);
++ }
++
++ BytesArray _EncodeAsSerializedProtoBatch(
++ const std::vector<absl::string_view> &ins, int num_threads,
++ bool enable_sampling, int nbest_size, float alpha,
++ bool add_bos, bool add_eos, bool reverse,
++ bool emit_unk_piece) const {
++ DEFINE_ENCODE_BATCH_FUNC_IMPL(EncodeAsSerializedProto,
++ absl::string_view,
++ sentencepiece::util::bytes);
++ }
++
++ /////////////////////////////////////////////////////////////////////////////
++ // DecodeAs* (Single request)
++ std::string _DecodeIds(const std::vector<int> &ids) const {
++ CheckIds(ids, $self->GetPieceSize());
++ return $self->DecodeIds(ids);
++ }
++
++ std::string _DecodePieces(const std::vector<std::string> &pieces) const {
++ return $self->DecodePieces(pieces);
++ }
++
++ sentencepiece::util::bytes _DecodeIdsAsSerializedProto(
++ const std::vector<int> &ids) const {
++ CheckIds(ids, $self->GetPieceSize());
++ return $self->DecodeIdsAsSerializedProto(ids);
++ }
++
++ sentencepiece::util::bytes _DecodePiecesAsSerializedProto(
++ const std::vector<std::string> &pieces) const {
++ CheckIds(pieces, $self->GetPieceSize());
++ return $self->DecodePiecesAsSerializedProto(pieces);
++ }
++
++ /////////////////////////////////////////////////////////////////////////////
++ // DecodeAs* (Batch request)
++ std::vector<std::string> _DecodeIdsBatch(
++ const std::vector<std::vector<int>> &ins, int num_threads) const {
++ DEFINE_DECODE_BATCH_FUNC_IMPL(DecodeIds, int, std::string);
++ }
++
++ BytesArray _DecodeIdsAsSerializedProtoBatch(
++ const std::vector<std::vector<int>> &ins, int num_threads) const {
++ DEFINE_DECODE_BATCH_FUNC_IMPL(DecodeIdsAsSerializedProto, int,
++ sentencepiece::util::bytes);
++ }
++
++ std::vector<std::string> _DecodePiecesBatch(
++ const std::vector<std::vector<std::string>> &ins, int num_threads) const {
++ DEFINE_DECODE_BATCH_FUNC_IMPL(DecodePieces, std::string, std::string);
++ }
++
++ BytesArray _DecodePiecesAsSerializedProtoBatch(
++ const std::vector<std::vector<std::string>> &ins, int num_threads) const {
++ DEFINE_DECODE_BATCH_FUNC_IMPL(DecodePiecesAsSerializedProto, std::string,
++ sentencepiece::util::bytes);
++ }
++
++ ////////////////////////////////////////////////////////////////////////////
++ // NBestEncodeAs* (Single request)
+ std::vector<std::vector<int>>
+ _NBestEncodeAsIds(absl::string_view text,
+ int nbest_size,
+- bool add_bos, bool add_eos, bool reverse) {
++ bool add_bos, bool add_eos, bool reverse,
++ bool emit_unk_piece) const {
+ auto idss = $self->NBestEncodeAsIds(text, nbest_size);
+ for (auto &ids : idss) {
+- RewriteIds(*$self, &ids, add_bos, add_eos, reverse);
++ RewriteIds(*$self, &ids, add_bos, add_eos, reverse, emit_unk_piece);
+ }
+ return idss;
+ }
+@@ -264,40 +444,74 @@ void RewritePieces(const sentencepiece::SentencePieceProcessor &sp,
+ _NBestEncodeAsPieces(absl::string_view text,
+ int nbest_size,
+ bool add_bos, bool add_eos, bool reverse,
+- bool emit_unk_piece) {
++ bool emit_unk_piece) const {
+ auto piecess = $self->NBestEncodeAsPieces(text, nbest_size);
+ for (auto &pieces : piecess) {
+- RewritePieces(*$self, &pieces, add_bos, add_eos, reverse, emit_unk_piece);
++ RewriteIds(*$self, &pieces, add_bos, add_eos, reverse, emit_unk_piece);
+ }
+ return piecess;
+ }
+
++ sentencepiece::util::bytes _NBestEncodeAsSerializedProto(absl::string_view text,
++ int nbest_size,
++ bool add_bos, bool add_eos, bool reverse,
++ bool emit_unk_piece) const {
++ RewriteIds(*$self, static_cast<sentencepiece::util::bytes *>(nullptr),
++ add_bos, add_eos, reverse, emit_unk_piece);
++ return $self->NBestEncodeAsSerializedProto(text, nbest_size);
++ }
++
++ /////////////////////////////////////////////////////////////////////////////
++ // SampleEncodeAndScoreAs* (Single request)
+ std::vector<std::pair<std::vector<int>, float>>
+ _SampleEncodeAndScoreAsIds(absl::string_view text,
+ int num_samples, float theta, bool wor,
+ bool include_best,
+- bool add_bos, bool add_eos, bool reverse) {
++ bool add_bos, bool add_eos, bool reverse,
++ bool emit_unk_piece) const {
+ auto idss = $self->SampleEncodeAndScoreAsIds(text, num_samples,
+ theta, wor, include_best);
+ for (auto &ids : idss) {
+- RewriteIds(*$self, &ids.first, add_bos, add_eos, reverse);
++ RewriteIds(*$self, &ids.first, add_bos, add_eos, reverse, emit_unk_piece);
+ }
+ return idss;
+ }
+
+- std::vector<std::pair<std::vector<std::string>, float>>
++ std::vector<std::pair<std::vector<std::string>, float>>
+ _SampleEncodeAndScoreAsPieces(absl::string_view text,
+ int num_samples, float theta, bool wor,
+ bool include_best,
+ bool add_bos, bool add_eos, bool reverse,
+- bool emit_unk_piece) {
++ bool emit_unk_piece) const {
+ auto piecess = $self->SampleEncodeAndScoreAsPieces(text, num_samples,
+ theta, wor, include_best);
+ for (auto &pieces : piecess) {
+- RewritePieces(*$self, &pieces.first, add_bos, add_eos, reverse, emit_unk_piece);
++ RewriteIds(*$self, &pieces.first, add_bos, add_eos, reverse, emit_unk_piece);
+ }
+ return piecess;
+- }
++ }
++
++ // Calculate Entropy
++ float _CalculateEntropy(absl::string_view text, float theta) {
++ return $self->CalculateEntropy(text, theta);
++ }
++
++ std::vector<float> _CalculateEntropyBatch(const std::vector<absl::string_view> &ins,
++ float theta, int num_threads) {
++ std::vector<float> outs(ins.size());
++ InitNumThreads(ins, &num_threads);
++ {
++ ThreadPool pool(ins.size());
++ for (int n = 0; n < num_threads; ++n) {
++ pool.Schedule([&, n]() {
++ for (size_t i = n; i < ins.size(); i += num_threads) {
++ outs[i] = self->CalculateEntropy(ins[i], theta);
++ }
++ });
++ }
++ }
++ return outs;
++ }
+
+ %pythoncode {
+ def Init(self,
+@@ -310,7 +524,8 @@ void RewritePieces(const sentencepiece::SentencePieceProcessor &sp,
+ emit_unk_piece=False,
+ enable_sampling=False,
+ nbest_size=-1,
+- alpha=0.1):
++ alpha=0.1,
++ num_threads=1):
+ """Initialzie sentencepieceProcessor.
+
+ Args:
+@@ -330,6 +545,7 @@ void RewritePieces(const sentencepiece::SentencePieceProcessor &sp,
+ forward-filtering-and-backward-sampling algorithm.
+ alpha: Soothing parameter for unigram sampling, and dropout probability of
+ merge operations for BPE-dropout.
++ num_threads: number of threads in batch processing.
+ """
+
+ _sentencepiece_processor_init_native(self)
+@@ -341,6 +557,7 @@ void RewritePieces(const sentencepiece::SentencePieceProcessor &sp,
+ self._enable_sampling = enable_sampling
+ self._nbest_size = nbest_size
+ self._alpha = alpha
++ self._num_threads = num_threads
+ if model_file or model_proto:
+ self.Load(model_file=model_file, model_proto=model_proto)
+
+@@ -354,7 +571,8 @@ void RewritePieces(const sentencepiece::SentencePieceProcessor &sp,
+ emit_unk_piece=None,
+ enable_sampling=None,
+ nbest_size=None,
+- alpha=None):
++ alpha=None,
++ num_threads=None):
+ """Encode text input to segmented ids or tokens.
+
+ Args:
+@@ -373,6 +591,7 @@ void RewritePieces(const sentencepiece::SentencePieceProcessor &sp,
+ forward-filtering-and-backward-sampling algorithm.
+ alpha: Soothing parameter for unigram sampling, and merge probability for
+ BPE-dropout (probablity 'p' in BPE-dropout paper).
++ num_threads: the number of threads used in the batch processing (Default = 1).
+ """
+
+ if out_type is None:
+@@ -391,6 +610,8 @@ void RewritePieces(const sentencepiece::SentencePieceProcessor &sp,
+ nbest_size = self._nbest_size
+ if alpha is None:
+ alpha = self._alpha
++ if num_threads is None:
++ num_threads = self._num_threads
+
+ if enable_sampling == True and (nbest_size is None or nbest_size == 0 or
+ nbest_size == 1 or alpha is None):
+@@ -401,18 +622,59 @@ void RewritePieces(const sentencepiece::SentencePieceProcessor &sp,
+ 'instead of nbest segmentations.'
+ )
+
+- def _encode(text):
+- if out_type is int:
+- return self._EncodeAsIds(text, enable_sampling, nbest_size,
+- alpha, add_bos, add_eos, reverse)
+- else:
+- return self._EncodeAsPieces(text, enable_sampling, nbest_size,
+- alpha, add_bos, add_eos, reverse, emit_unk_piece)
++ if num_threads is None or type(num_threads) is not int:
++ raise RuntimeError('num_threads must be int')
+
+ if type(input) is list:
+- return [_encode(n) for n in input]
++ if out_type is int:
++ return self._EncodeAsIdsBatch(input, num_threads, enable_sampling, nbest_size,
++ alpha, add_bos, add_eos, reverse, emit_unk_piece)
++ if out_type is str:
++ return self._EncodeAsPiecesBatch(input, num_threads, enable_sampling, nbest_size,
++ alpha, add_bos, add_eos, reverse, emit_unk_piece)
++ if out_type == 'proto':
++ return self._EncodeAsSerializedProtoBatch(input, num_threads, enable_sampling, nbest_size,
++ alpha, add_bos, add_eos, reverse, emit_unk_piece)
++
++ if out_type is int:
++ return self._EncodeAsIds(input, enable_sampling, nbest_size,
++ alpha, add_bos, add_eos, reverse, emit_unk_piece)
++ if out_type is str:
++ return self._EncodeAsPieces(input, enable_sampling, nbest_size,
++ alpha, add_bos, add_eos, reverse, emit_unk_piece)
++ if out_type == 'proto':
++ return self._EncodeAsSerializedProto(input, enable_sampling, nbest_size,
++ alpha, add_bos, add_eos, reverse, emit_unk_piece)
++
++ raise RuntimeError('unknown out_type={}'.format(out_type))
++ return None
+
+- return _encode(input)
++
++ def EncodeAsPieces(self, input, **kwargs):
++ return self.Encode(input=input, out_type=str, **kwargs)
++
++
++ def EncodeAsIds(self, input, **kwargs):
++ return self.Encode(input=input, out_type=int, **kwargs)
++
++
++ def EncodeAsSerializedProto(self, input, **kwargs):
++ return self.Encode(input=input, out_type='proto', **kwargs)
++
++
++ def SampleEncodeAsPieces(self, input, nbest_size=None, alpha=None, **kwargs):
++ return self.Encode(input=input, nbest_size=nbest_size, alpha=alpha,
++ out_type=str, enable_sampling=True, **kwargs)
++
++
++ def SampleEncodeAsIds(self, input, nbest_size=None, alpha=None,**kwargs):
++ return self.Encode(input=input, nbest_size=nbest_size, alpha=alpha,
++ out_type=int, enable_sampling=True, **kwargs)
++
++
++ def SampleEncodeAsSerializedProto(self, input, nbest_size=None, alpha=None, **kwargs):
++ return self.Encode(input=input, nbest_size=nbest_size, alpha=alpha,
++ out_type='proto', enable_sampling=True, **kwargs)
+
+
+ def NBestEncode(self,
+@@ -453,9 +715,14 @@ void RewritePieces(const sentencepiece::SentencePieceProcessor &sp,
+
+ def _encode(text):
+ if out_type is int:
+- return self._NBestEncodeAsIds(text, nbest_size, add_bos, add_eos, reverse)
+- else:
+- return self._NBestEncodeAsPieces(text, nbest_size, add_bos, add_eos, reverse, emit_unk_piece)
++ return self._NBestEncodeAsIds(text, nbest_size,
++ add_bos, add_eos, reverse, emit_unk_piece)
++ if out_type is str:
++ return self._NBestEncodeAsPieces(text, nbest_size,
++ add_bos, add_eos, reverse, emit_unk_piece)
++ if out_type == 'proto':
++ return self._NBestEncodeAsSerializedProto(text, nbest_size,
++ add_bos, add_eos, reverse, emit_unk_piece)
+
+ if type(input) is list:
+ return [_encode(n) for n in input]
+@@ -463,6 +730,21 @@ void RewritePieces(const sentencepiece::SentencePieceProcessor &sp,
+ return _encode(input)
+
+
++ def NBestEncodeAsPieces(self, input, nbest_size=None, **kwargs):
++ return self.NBestEncode(input=input, nbest_size=nbest_size,
++ out_type=str, **kwargs)
++
++
++ def NBestEncodeAsIds(self, input, nbest_size=None, **kwargs):
++ return self.NBestEncode(input=input, nbest_size=nbest_size,
++ out_type=int, **kwargs)
++
++
++ def NBestEncodeAsSerializedProto(self, input, nbest_size=None, **kwargs):
++ return self.NBestEncode(input=input, nbest_size=nbest_size,
++ out_type='proto', **kwargs)
++
++
+ def SampleEncodeAndScore(self,
+ input,
+ out_type=None,
+@@ -478,7 +760,7 @@ void RewritePieces(const sentencepiece::SentencePieceProcessor &sp,
+
+ Args:
+ input: input string. accepsts list of string.
+- out_type: output type. int or str.
++ out_type: output type. int or str or 'proto'.
+ add_bos: Add <s> to the result (Default = false)
+ add_eos: Add </s> to the result (Default = false) <s>/</s> is added after reversing (if enabled).
+ reverse: Reverses the tokenized sequence (Default = false)
+@@ -513,12 +795,12 @@ void RewritePieces(const sentencepiece::SentencePieceProcessor &sp,
+
+ if include_best and not wor:
+ raise RuntimeError('When include_best is True, We must specify "wor = True".')
+-
++
+
+ def _encode(text):
+ if out_type is int:
+ return self._SampleEncodeAndScoreAsIds(text, num_samples, theta, wor, include_best,
+- add_bos, add_eos, reverse)
++ add_bos, add_eos, reverse, emit_unk_piece)
+ else:
+ return self._SampleEncodeAndScoreAsPieces(text, num_samples, theta, wor, include_best,
+ add_bos, add_eos, reverse, emit_unk_piece)
+@@ -529,35 +811,90 @@ void RewritePieces(const sentencepiece::SentencePieceProcessor &sp,
+ return _encode(input)
+
+
+- def Decode(self, input):
+- """Decode processed id or token sequences."""
++ def Decode(self, input, out_type=str, num_threads=None):
++ """Decode processed id or token sequences.
++
++ Args:
++ out_type: output type. str or 'proto' (Default = str)
++ num_threads: the number of threads used in the batch processing (Default = 1).
++ """
++
++ if num_threads is None:
++ num_threads = self._num_threads
++
++ if num_threads is None or type(num_threads) is not int:
++ raise RuntimeError('num_threads must be int')
+
+ if not input:
+- return self.DecodeIds([])
+- elif type(input) is int:
+- return self.DecodeIdsWithCheck([input])
+- elif type(input) is str:
+- return self.DecodePieces([input])
++ return ''
++
++ if out_type is str:
++ if type(input) is int:
++ return self._DecodeIds([input])
++ if type(input) is str:
++ return self._DecodePieces([input])
++
++ if type(input) is list:
++ if len(input) == 0 or type(input[0]) is int:
++ return self._DecodeIds(input)
++ if type(input[0]) is str:
++ return self._DecodePieces(input)
++
++ if type(input[0]) is list:
++ if len(input[0]) == 0 or type(input[0][0]) is int:
++ return self._DecodeIdsBatch(input, num_threads)
++ if type(input[0][0]) is str:
++ return self._DecodePiecesBatch(input, num_threads)
++
++ if out_type == 'proto':
++ if type(input) is int:
++ return self._DecodeIdsAsSerializedProto([input])
++ if type(input) is str:
++ return self._DecodePiecesAsSerializedProto([input])
++
++ if type(input) is list:
++ if len(input) == 0 or type(input[0]) is int:
++ return self._DecodeIdsAsSerializedProto(input)
++ if type(input[0]) is str:
++ return self._DecodePiecesAsSerializedProto(input)
++
++ if type(input[0]) is list:
++ if len(input[0]) == 0 or type(input[0][0]) is int:
++ return self._DecodeIdsAsSerializedProtoBatch(input, num_threads)
++ if type(input[0][0]) is str:
++ return self._DecodePiecesAsSerializedProtoBatch(input, num_threads)
++
++
++ raise RuntimeError('unknown output or input type')
++ return None
+
+- def _decode(input):
+- if not input:
+- return self.DecodeIds([])
+- if type(input[0]) is int:
+- return self.DecodeIdsWithCheck(input)
+- return self.DecodePieces(input)
+
+- if type(input[0]) is list:
+- return [_decode(n) for n in input]
++ def DecodePieces(self, input, out_type=str, **kwargs):
++ return self.Decode(input=input, out_type=out_type, **kwargs)
+
+- return _decode(input)
+
++ def DecodeIds(self, input, out_type=str, **kwargs):
++ return self.Decode(input=input, out_type=out_type, **kwargs)
++
++
++ def DecodePiecesAsSerializedProto(self, input, out_type='proto', **kwargs):
++ return self.Decode(input=input, out_type=out_type, **kwargs)
+
+- def Entropy(self, input, theta):
+- """Calculate sentence entropy"""
+
++ def DecodeIdsAsSerializedProto(self, input, out_type='proto', **kwargs):
++ return self.Decode(input=input, out_type=out_type, **kwargs)
++
++
++ def CalculateEntropy(self, input, theta, num_threads=None):
++ """Calculate sentence entropy"""
+ if type(input) is list:
+- return [self.CalculateEntropy(n, theta) for n in input]
+- return self.CalculateEntropy(input, theta)
++ if num_threads is None:
++ num_threads = self._num_threads
++ if num_threads is None or type(num_threads) is not int:
++ raise RuntimeError('num_threads must be int')
++ return self._CalculateEntropyBatch(input, theta, num_threads)
++
++ return self._CalculateEntropy(input, theta)
+
+
+ def piece_size(self):
+@@ -696,6 +1033,13 @@ void RewritePieces(const sentencepiece::SentencePieceProcessor &sp,
+ }
+ }
+
++%typemap(out) std::vector<float> {
++ $result = PyList_New($1.size());
++ for (size_t i = 0; i < $1.size(); ++i) {
++ PyList_SetItem($result, i, PyFloat_FromDouble(static_cast<double>($1[i])));
++ }
++}
++
+ %typemap(out) std::vector<std::vector<int>> {
+ $result = PyList_New($1.size());
+ for (size_t i = 0; i < $1.size(); ++i) {
+@@ -715,6 +1059,13 @@ void RewritePieces(const sentencepiece::SentencePieceProcessor &sp,
+ }
+ }
+
++%typemap(out) BytesArray {
++ $result = PyList_New($1.size());
++ for (size_t i = 0; i < $1.size(); ++i) {
++ PyList_SetItem($result, i, MakePyOutputBytes($1[i]));
++ }
++}
++
+ %typemap(out) std::vector<std::vector<std::string>> {
+ PyObject *input_type = resultobj;
+ $result = PyList_New($1.size());
+@@ -778,7 +1129,51 @@ void RewritePieces(const sentencepiece::SentencePieceProcessor &sp,
+ for (size_t i = 0; i < size; ++i) {
+ const PyInputString ustring(PyList_GetItem($input, i));
+ if (ustring.IsAvalable()) {
+- (*out)[i] = std::string(ustring.data(), ustring.size());
++ (*out)[i].assign(ustring.data(), ustring.size());
++ } else {
++ PyErr_SetString(PyExc_TypeError, "list must contain strings");
++ SWIG_fail;
++ }
++ resultobj = ustring.input_type();
++ }
++ } else {
++ PyErr_SetString(PyExc_TypeError, "not a list");
++ SWIG_fail;
++ }
++ $1 = out;
++}
++
++%typemap(in) const std::vector<absl::string_view>& {
++ std::vector<absl::string_view> *out = nullptr;
++ if (PyList_Check($input)) {
++ const size_t size = PyList_Size($input);
++ out = new std::vector<absl::string_view>(size);
++ for (size_t i = 0; i < size; ++i) {
++ const PyInputString ustring(PyList_GetItem($input, i));
++ if (ustring.IsAvalable()) {
++ (*out)[i] = absl::string_view(ustring.data(), ustring.size());
++ } else {
++ PyErr_SetString(PyExc_TypeError, "list must contain strings");
++ SWIG_fail;
++ }
++ resultobj = ustring.input_type();
++ }
++ } else {
++ PyErr_SetString(PyExc_TypeError, "not a list");
++ SWIG_fail;
++ }
++ $1 = out;
++}
++
++%typemap(in) const std::vector<absl::string_view>& {
++ std::vector<absl::string_view> *out = nullptr;
++ if (PyList_Check($input)) {
++ const size_t size = PyList_Size($input);
++ out = new std::vector<absl::string_view>(size);
++ for (size_t i = 0; i < size; ++i) {
++ const PyInputString ustring(PyList_GetItem($input, i));
++ if (ustring.IsAvalable()) {
++ (*out)[i] = absl::string_view(ustring.data(), ustring.size());
+ } else {
+ PyErr_SetString(PyExc_TypeError, "list must contain strings");
+ SWIG_fail;
+@@ -813,6 +1208,69 @@ void RewritePieces(const sentencepiece::SentencePieceProcessor &sp,
+ $1 = out;
+ }
+
++%typemap(in) const std::vector<std::vector<std::string>>& {
++ std::vector<std::vector<std::string>> *out = nullptr;
++ if (PyList_Check($input)) {
++ const size_t size = PyList_Size($input);
++ out = new std::vector<std::vector<std::string>>(size);
++ for (size_t i = 0; i < size; ++i) {
++ PyObject *o = PyList_GetItem($input, i);
++ if (PyList_Check(o)) {
++ const size_t size2 = PyList_Size(o);
++ (*out)[i].resize(size2);
++ for (size_t j = 0; j < size2; ++j) {
++ const PyInputString ustring(PyList_GetItem(o, j));
++ if (ustring.IsAvalable()) {
++ (*out)[i][j].assign(ustring.data(), ustring.size());
++ } else {
++ PyErr_SetString(PyExc_TypeError,"list must contain strings");
++ SWIG_fail;
++ }
++ resultobj = ustring.input_type();
++ }
++ } else {
++ PyErr_SetString(PyExc_TypeError,"not a list");
++ SWIG_fail;
++ }
++ }
++ } else {
++ PyErr_SetString(PyExc_TypeError,"not a list");
++ SWIG_fail;
++ }
++ $1 = out;
++}
++
++%typemap(in) const std::vector<std::vector<int>>& {
++ std::vector<std::vector<int>> *out = nullptr;
++ if (PyList_Check($input)) {
++ const size_t size = PyList_Size($input);
++ out = new std::vector<std::vector<int>>(size);
++ for (size_t i = 0; i < size; ++i) {
++ PyObject *o = PyList_GetItem($input, i);
++ if (PyList_Check(o)) {
++ const size_t size2 = PyList_Size(o);
++ (*out)[i].resize(size2);
++ for (size_t j = 0; j < size2; ++j) {
++ PyObject *o2 = PyList_GetItem(o, j);
++ if (PyInt_Check(o2)) {
++ (*out)[i][j] = static_cast<int>(PyInt_AsLong(o2));
++ } else {
++ PyErr_SetString(PyExc_TypeError, "list must contain integers");
++ SWIG_fail;
++ }
++ }
++ } else {
++ PyErr_SetString(PyExc_TypeError, "not a list");
++ SWIG_fail;
++ }
++ }
++ } else {
++ PyErr_SetString(PyExc_TypeError,"not a list");
++ SWIG_fail;
++ }
++ $1 = out;
++}
++
+ %typemap(in) const std::unordered_map<std::string, std::string> & {
+ std::unordered_map<std::string, std::string> *out = nullptr;
+ if (PyDict_Check($input)) {
+@@ -880,6 +1338,10 @@ void RewritePieces(const sentencepiece::SentencePieceProcessor &sp,
+ delete $1;
+ }
+
++%typemap(freearg) const std::vector<absl::string_view>& {
++ delete $1;
++}
++
+ %typemap(freearg) const std::vector<std::vector<std::string>>& {
+ delete $1;
+ }
+@@ -888,6 +1350,10 @@ void RewritePieces(const sentencepiece::SentencePieceProcessor &sp,
+ delete $1;
+ }
+
++%typemap(freearg) const std::vector<float>& {
++ delete $1;
++}
++
+ %typemap(freearg) const std::vector<std::vector<int>>& {
+ delete $1;
+ }
+@@ -948,8 +1414,6 @@ setattr(SentencePieceProcessor, '__init__', SentencePieceProcessor.Init)
+
+ SentencePieceProcessor.Tokenize = SentencePieceProcessor.Encode
+ SentencePieceProcessor.Detokenize = SentencePieceProcessor.Decode
+-SentencePieceProcessor.DecodeIds = SentencePieceProcessor.DecodeIdsWithCheck
+-SentencePieceProcessor.DecodeIdsAsSerializedProto = SentencePieceProcessor.DecodeIdsAsSerializedProtoWithCheck
+
+ for m in [
+ 'PieceToId', 'IdToPiece', 'GetScore', 'IsUnknown', 'IsControl', 'IsUnused',
+diff --git a/python/src/sentencepiece/sentencepiece_wrap.cxx b/python/src/sentencepiece/sentencepiece_wrap.cxx
+index 36b3a0e..6df3880 100644
+--- a/python/src/sentencepiece/sentencepiece_wrap.cxx
++++ b/python/src/sentencepiece/sentencepiece_wrap.cxx
+@@ -2698,10 +2698,13 @@ SWIGINTERN PyObject *SWIG_PyStaticMethod_New(PyObject *SWIGUNUSEDPARM(self), PyO
+ #define SWIGTYPE_p_sentencepiece__SentencePieceTrainer swig_types[3]
+ #define SWIGTYPE_p_std__string swig_types[4]
+ #define SWIGTYPE_p_std__unordered_mapT_std__string_std__string_t swig_types[5]
+-#define SWIGTYPE_p_std__vectorT_int_t swig_types[6]
+-#define SWIGTYPE_p_std__vectorT_std__string_t swig_types[7]
+-static swig_type_info *swig_types[9];
+-static swig_module_info swig_module = {swig_types, 8, 0, 0, 0, 0};
++#define SWIGTYPE_p_std__vectorT_absl__string_view_t swig_types[6]
++#define SWIGTYPE_p_std__vectorT_int_t swig_types[7]
++#define SWIGTYPE_p_std__vectorT_std__string_t swig_types[8]
++#define SWIGTYPE_p_std__vectorT_std__vectorT_int_t_t swig_types[9]
++#define SWIGTYPE_p_std__vectorT_std__vectorT_std__string_t_t swig_types[10]
++static swig_type_info *swig_types[12];
++static swig_module_info swig_module = {swig_types, 11, 0, 0, 0, 0};
+ #define SWIG_TypeQuery(name) SWIG_TypeQueryModule(&swig_module, &swig_module, name)
+ #define SWIG_MangledTypeQuery(name) SWIG_MangledTypeQueryModule(&swig_module, &swig_module, name)
+
+@@ -2805,9 +2808,13 @@ namespace swig {
+ }
+
+
++#include <iostream>
+ #include <algorithm>
++#include <functional>
+ #include <limits>
+ #include <cmath>
++#include <thread>
++#include <vector>
+ #include <sentencepiece_processor.h>
+ #include <sentencepiece_trainer.h>
+
+@@ -2815,6 +2822,8 @@ namespace {
+ PyObject* kUnicodeInput = reinterpret_cast<PyObject* >(0x1);
+ PyObject* kByteInput = reinterpret_cast<PyObject* >(0x2);
+
++using BytesArray = std::vector<sentencepiece::util::bytes>;
++
+ inline void ReleaseResultObject(PyObject *obj) {
+ if (obj != nullptr && obj != kUnicodeInput && obj != kByteInput) {
+ Py_XDECREF(obj);
+@@ -2857,7 +2866,7 @@ PyObject* MakePyOutputString(const std::string& output,
+ return PyBytes_FromStringAndSize(output.data(), output.size());
+ }
+
+-PyObject* MakePyOutputBytes(const std::string& output) {
++PyObject* MakePyOutputBytes(const sentencepiece::util::bytes& output) {
+ return PyBytes_FromStringAndSize(output.data(), output.size());
+ }
+
+@@ -2929,18 +2938,18 @@ class PySentenceIterator : public sentencepiece::SentenceIterator {
+ sentencepiece::util::Status status_;
+ };
+
+-void RewriteIds(const sentencepiece::SentencePieceProcessor &sp,
+- std::vector<int> *ids,
+- bool add_bos, bool add_eos, bool reverse) {
++inline void RewriteIds(const sentencepiece::SentencePieceProcessor &sp,
++ std::vector<int> *ids,
++ bool add_bos, bool add_eos, bool reverse, bool emit_unk_piece) {
+ if (!add_bos && !add_eos && !reverse) return;
+ if (reverse) std::reverse(ids->begin(), ids->end());
+ if (add_bos) ids->insert(ids->begin(), sp.bos_id());
+ if (add_eos) ids->push_back(sp.eos_id());
+ }
+
+-void RewritePieces(const sentencepiece::SentencePieceProcessor &sp,
+- std::vector<std::string> *pieces,
+- bool add_bos, bool add_eos, bool reverse, bool emit_unk_piece) {
++inline void RewriteIds(const sentencepiece::SentencePieceProcessor &sp,
++ std::vector<std::string> *pieces,
++ bool add_bos, bool add_eos, bool reverse, bool emit_unk_piece) {
+ if (!add_bos && !add_eos && !reverse && !emit_unk_piece) return;
+ if (reverse) std::reverse(pieces->begin(), pieces->end());
+ if (add_bos) pieces->insert(pieces->begin(), sp.IdToPiece(sp.bos_id()));
+@@ -2955,6 +2964,98 @@ void RewritePieces(const sentencepiece::SentencePieceProcessor &sp,
+ }
+ }
+ }
++
++inline void RewriteIds(const sentencepiece::SentencePieceProcessor &sp,
++ sentencepiece::util::bytes *proto,
++ bool add_bos, bool add_eos, bool reverse, bool emit_unk_piece) {
++ if (add_bos || add_eos || reverse || emit_unk_piece) {
++ throw sentencepiece::util::Status(
++ sentencepiece::util::StatusCode::kUnimplemented,
++ "add_bos, add_eos, reverse, and emit_unk_piece is not supported in AsSerialize API");
++ }
++}
++
++inline void CheckIds(const std::vector<int> &ids, int num_pieces) {
++ for (int id : ids) {
++ if (id < 0 || id >= num_pieces) {
++ throw sentencepiece::util::Status(
++ sentencepiece::util::StatusCode::kOutOfRange,
++ "piece id is out of range.");
++ }
++ }
++}
++
++inline void CheckIds(const std::vector<std::string> &ids, int num_pieces) {}
++
++class ThreadPool {
++ public:
++ explicit ThreadPool(size_t request_size) :
++ request_size_(request_size) {}
++
++ virtual ~ThreadPool() {
++ for (auto &task : tasks_) {
++ task.join();
++ }
++ }
++
++ void Schedule(std::function<void()> closure) {
++ static constexpr size_t kMinThreadSize = 2;
++ if (request_size_ < kMinThreadSize) {
++ closure();
++ } else {
++ tasks_.emplace_back(closure);
++ }
++ }
++
++ private:
++ size_t request_size_ = 0;
++ std::vector<std::thread> tasks_;
++};
++
++template <typename T>
++inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
++ *num_threads = std::max<int>(1,
++ std::min<int>({*num_threads,
++ static_cast<int>(ins.size()), 256}));
++}
++
++#define DEFINE_ENCODE_BATCH_FUNC_IMPL(FuncName, InType, OutType) \
++ std::vector<OutType> outs(ins.size()); \
++ InitNumThreads(ins, &num_threads); \
++ { \
++ ThreadPool pool(ins.size()); \
++ for (int n = 0; n < num_threads; ++n) { \
++ pool.Schedule([&, n]() { \
++ for (size_t i = n; i < ins.size(); i += num_threads) { \
++ auto out = enable_sampling ? \
++ self->Sample##FuncName(ins[i], \
++ nbest_size, alpha) : \
++ self->FuncName(ins[i]); \
++ RewriteIds(*self, &out, add_bos, add_eos, reverse, \
++ emit_unk_piece); \
++ outs[i] = std::move(out); \
++ } \
++ }); \
++ } \
++ } \
++ return outs;
++
++#define DEFINE_DECODE_BATCH_FUNC_IMPL(FuncName, InType, OutType) \
++ std::vector<OutType> outs(ins.size()); \
++ InitNumThreads(ins, &num_threads); \
++ { \
++ ThreadPool pool(ins.size()); \
++ for (int n = 0; n < num_threads; ++n) { \
++ pool.Schedule([&, n]() { \
++ for (size_t i = n; i < ins.size(); i += num_threads) { \
++ CheckIds(ins[i], self->GetPieceSize()); \
++ outs[i] = self->FuncName(ins[i]); \
++ } \
++ }); \
++ } \
++ } \
++ return outs;
++
+ } // namespace
+
+
+@@ -3334,72 +3435,122 @@ SWIGINTERNINLINE PyObject*
+ SWIGINTERN sentencepiece::util::Status sentencepiece_SentencePieceProcessor_LoadFromFile(sentencepiece::SentencePieceProcessor *self,absl::string_view arg){
+ return self->Load(arg);
+ }
+-SWIGINTERN std::string sentencepiece_SentencePieceProcessor_DecodeIdsWithCheck(sentencepiece::SentencePieceProcessor const *self,std::vector< int > const &ids){
+- const int num_pieces = self->GetPieceSize();
+- for (int id : ids) {
+- if (id < 0 || id >= num_pieces) {
+- throw sentencepiece::util::Status(
+- sentencepiece::util::StatusCode::kOutOfRange,
+- "piece id is out of range.");
+- }
+- }
+- return self->DecodeIds(ids);
+- }
+-SWIGINTERN sentencepiece::util::bytes sentencepiece_SentencePieceProcessor_DecodeIdsAsSerializedProtoWithCheck(sentencepiece::SentencePieceProcessor const *self,std::vector< int > const &ids){
+- const int num_pieces = self->GetPieceSize();
+- for (int id : ids) {
+- if (id < 0 || id >= num_pieces) {
+- throw sentencepiece::util::Status(
+- sentencepiece::util::StatusCode::kOutOfRange,
+- "piece id is out of range.");
+- }
+- }
+- return self->DecodeIdsAsSerializedProto(ids);
+- }
+-SWIGINTERN std::vector< int > sentencepiece_SentencePieceProcessor__EncodeAsIds(sentencepiece::SentencePieceProcessor *self,absl::string_view text,bool enabele_sampling,int nbest_size,float alpha,bool add_bos,bool add_eos,bool reverse){
+- auto ids = enabele_sampling ?
++SWIGINTERN std::vector< int > sentencepiece_SentencePieceProcessor__EncodeAsIds(sentencepiece::SentencePieceProcessor const *self,absl::string_view text,bool enable_sampling,int nbest_size,float alpha,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
++ auto ids = enable_sampling ?
+ self->SampleEncodeAsIds(text, nbest_size, alpha) :
+ self->EncodeAsIds(text);
+- RewriteIds(*self, &ids, add_bos, add_eos, reverse);
++ RewriteIds(*self, &ids, add_bos, add_eos, reverse, emit_unk_piece);
+ return ids;
+ }
+-SWIGINTERN std::vector< std::string > sentencepiece_SentencePieceProcessor__EncodeAsPieces(sentencepiece::SentencePieceProcessor *self,absl::string_view text,bool enabele_sampling,int nbest_size,float alpha,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
+- auto pieces = enabele_sampling ?
++SWIGINTERN std::vector< std::string > sentencepiece_SentencePieceProcessor__EncodeAsPieces(sentencepiece::SentencePieceProcessor const *self,absl::string_view text,bool enable_sampling,int nbest_size,float alpha,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
++ auto pieces = enable_sampling ?
+ self->SampleEncodeAsPieces(text, nbest_size, alpha) :
+ self->EncodeAsPieces(text);
+- RewritePieces(*self, &pieces, add_bos, add_eos, reverse, emit_unk_piece);
++ RewriteIds(*self, &pieces, add_bos, add_eos, reverse, emit_unk_piece);
+ return pieces;
+ }
+-SWIGINTERN std::vector< std::vector< int > > sentencepiece_SentencePieceProcessor__NBestEncodeAsIds(sentencepiece::SentencePieceProcessor *self,absl::string_view text,int nbest_size,bool add_bos,bool add_eos,bool reverse){
++SWIGINTERN sentencepiece::util::bytes sentencepiece_SentencePieceProcessor__EncodeAsSerializedProto(sentencepiece::SentencePieceProcessor const *self,absl::string_view text,bool enable_sampling,int nbest_size,float alpha,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
++ auto proto = enable_sampling ?
++ self->SampleEncodeAsSerializedProto(text, nbest_size, alpha) :
++ self->EncodeAsSerializedProto(text);
++ RewriteIds(*self, &proto, add_bos, add_eos, reverse, emit_unk_piece);
++ return proto;
++ }
++SWIGINTERN std::vector< std::vector< int > > sentencepiece_SentencePieceProcessor__EncodeAsIdsBatch(sentencepiece::SentencePieceProcessor const *self,std::vector< absl::string_view > const &ins,int num_threads,bool enable_sampling,int nbest_size,float alpha,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
++ DEFINE_ENCODE_BATCH_FUNC_IMPL(EncodeAsIds,
++ absl::string_view, std::vector<int>);
++ }
++SWIGINTERN std::vector< std::vector< std::string > > sentencepiece_SentencePieceProcessor__EncodeAsPiecesBatch(sentencepiece::SentencePieceProcessor const *self,std::vector< absl::string_view > const &ins,int num_threads,bool enable_sampling,int nbest_size,float alpha,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
++ DEFINE_ENCODE_BATCH_FUNC_IMPL(EncodeAsPieces,
++ absl::string_view, std::vector<std::string>);
++ }
++SWIGINTERN BytesArray sentencepiece_SentencePieceProcessor__EncodeAsSerializedProtoBatch(sentencepiece::SentencePieceProcessor const *self,std::vector< absl::string_view > const &ins,int num_threads,bool enable_sampling,int nbest_size,float alpha,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
++ DEFINE_ENCODE_BATCH_FUNC_IMPL(EncodeAsSerializedProto,
++ absl::string_view,
++ sentencepiece::util::bytes);
++ }
++SWIGINTERN std::string sentencepiece_SentencePieceProcessor__DecodeIds(sentencepiece::SentencePieceProcessor const *self,std::vector< int > const &ids){
++ CheckIds(ids, self->GetPieceSize());
++ return self->DecodeIds(ids);
++ }
++SWIGINTERN std::string sentencepiece_SentencePieceProcessor__DecodePieces(sentencepiece::SentencePieceProcessor const *self,std::vector< std::string > const &pieces){
++ return self->DecodePieces(pieces);
++ }
++SWIGINTERN sentencepiece::util::bytes sentencepiece_SentencePieceProcessor__DecodeIdsAsSerializedProto(sentencepiece::SentencePieceProcessor const *self,std::vector< int > const &ids){
++ CheckIds(ids, self->GetPieceSize());
++ return self->DecodeIdsAsSerializedProto(ids);
++ }
++SWIGINTERN sentencepiece::util::bytes sentencepiece_SentencePieceProcessor__DecodePiecesAsSerializedProto(sentencepiece::SentencePieceProcessor const *self,std::vector< std::string > const &pieces){
++ CheckIds(pieces, self->GetPieceSize());
++ return self->DecodePiecesAsSerializedProto(pieces);
++ }
++SWIGINTERN std::vector< std::string > sentencepiece_SentencePieceProcessor__DecodeIdsBatch(sentencepiece::SentencePieceProcessor const *self,std::vector< std::vector< int > > const &ins,int num_threads){
++ DEFINE_DECODE_BATCH_FUNC_IMPL(DecodeIds, int, std::string);
++ }
++SWIGINTERN BytesArray sentencepiece_SentencePieceProcessor__DecodeIdsAsSerializedProtoBatch(sentencepiece::SentencePieceProcessor const *self,std::vector< std::vector< int > > const &ins,int num_threads){
++ DEFINE_DECODE_BATCH_FUNC_IMPL(DecodeIdsAsSerializedProto, int,
++ sentencepiece::util::bytes);
++ }
++SWIGINTERN std::vector< std::string > sentencepiece_SentencePieceProcessor__DecodePiecesBatch(sentencepiece::SentencePieceProcessor const *self,std::vector< std::vector< std::string > > const &ins,int num_threads){
++ DEFINE_DECODE_BATCH_FUNC_IMPL(DecodePieces, std::string, std::string);
++ }
++SWIGINTERN BytesArray sentencepiece_SentencePieceProcessor__DecodePiecesAsSerializedProtoBatch(sentencepiece::SentencePieceProcessor const *self,std::vector< std::vector< std::string > > const &ins,int num_threads){
++ DEFINE_DECODE_BATCH_FUNC_IMPL(DecodePiecesAsSerializedProto, std::string,
++ sentencepiece::util::bytes);
++ }
++SWIGINTERN std::vector< std::vector< int > > sentencepiece_SentencePieceProcessor__NBestEncodeAsIds(sentencepiece::SentencePieceProcessor const *self,absl::string_view text,int nbest_size,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
+ auto idss = self->NBestEncodeAsIds(text, nbest_size);
+ for (auto &ids : idss) {
+- RewriteIds(*self, &ids, add_bos, add_eos, reverse);
++ RewriteIds(*self, &ids, add_bos, add_eos, reverse, emit_unk_piece);
+ }
+ return idss;
+ }
+-SWIGINTERN std::vector< std::vector< std::string > > sentencepiece_SentencePieceProcessor__NBestEncodeAsPieces(sentencepiece::SentencePieceProcessor *self,absl::string_view text,int nbest_size,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
++SWIGINTERN std::vector< std::vector< std::string > > sentencepiece_SentencePieceProcessor__NBestEncodeAsPieces(sentencepiece::SentencePieceProcessor const *self,absl::string_view text,int nbest_size,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
+ auto piecess = self->NBestEncodeAsPieces(text, nbest_size);
+ for (auto &pieces : piecess) {
+- RewritePieces(*self, &pieces, add_bos, add_eos, reverse, emit_unk_piece);
++ RewriteIds(*self, &pieces, add_bos, add_eos, reverse, emit_unk_piece);
+ }
+ return piecess;
+ }
+-SWIGINTERN std::vector< std::pair< std::vector< int >,float > > sentencepiece_SentencePieceProcessor__SampleEncodeAndScoreAsIds(sentencepiece::SentencePieceProcessor *self,absl::string_view text,int num_samples,float theta,bool wor,bool include_best,bool add_bos,bool add_eos,bool reverse){
++SWIGINTERN sentencepiece::util::bytes sentencepiece_SentencePieceProcessor__NBestEncodeAsSerializedProto(sentencepiece::SentencePieceProcessor const *self,absl::string_view text,int nbest_size,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
++ RewriteIds(*self, static_cast<sentencepiece::util::bytes *>(nullptr),
++ add_bos, add_eos, reverse, emit_unk_piece);
++ return self->NBestEncodeAsSerializedProto(text, nbest_size);
++ }
++SWIGINTERN std::vector< std::pair< std::vector< int >,float > > sentencepiece_SentencePieceProcessor__SampleEncodeAndScoreAsIds(sentencepiece::SentencePieceProcessor const *self,absl::string_view text,int num_samples,float theta,bool wor,bool include_best,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
+ auto idss = self->SampleEncodeAndScoreAsIds(text, num_samples,
+ theta, wor, include_best);
+ for (auto &ids : idss) {
+- RewriteIds(*self, &ids.first, add_bos, add_eos, reverse);
++ RewriteIds(*self, &ids.first, add_bos, add_eos, reverse, emit_unk_piece);
+ }
+ return idss;
+ }
+-SWIGINTERN std::vector< std::pair< std::vector< std::string >,float > > sentencepiece_SentencePieceProcessor__SampleEncodeAndScoreAsPieces(sentencepiece::SentencePieceProcessor *self,absl::string_view text,int num_samples,float theta,bool wor,bool include_best,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
++SWIGINTERN std::vector< std::pair< std::vector< std::string >,float > > sentencepiece_SentencePieceProcessor__SampleEncodeAndScoreAsPieces(sentencepiece::SentencePieceProcessor const *self,absl::string_view text,int num_samples,float theta,bool wor,bool include_best,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
+ auto piecess = self->SampleEncodeAndScoreAsPieces(text, num_samples,
+ theta, wor, include_best);
+ for (auto &pieces : piecess) {
+- RewritePieces(*self, &pieces.first, add_bos, add_eos, reverse, emit_unk_piece);
++ RewriteIds(*self, &pieces.first, add_bos, add_eos, reverse, emit_unk_piece);
+ }
+ return piecess;
+ }
++SWIGINTERN float sentencepiece_SentencePieceProcessor__CalculateEntropy(sentencepiece::SentencePieceProcessor *self,absl::string_view text,float theta){
++ return self->CalculateEntropy(text, theta);
++ }
++SWIGINTERN std::vector< float > sentencepiece_SentencePieceProcessor__CalculateEntropyBatch(sentencepiece::SentencePieceProcessor *self,std::vector< absl::string_view > const &ins,float theta,int num_threads){
++ std::vector<float> outs(ins.size());
++ InitNumThreads(ins, &num_threads);
++ {
++ ThreadPool pool(ins.size());
++ for (int n = 0; n < num_threads; ++n) {
++ pool.Schedule([&, n]() {
++ for (size_t i = n; i < ins.size(); i += num_threads) {
++ outs[i] = self->CalculateEntropy(ins[i], theta);
++ }
++ });
++ }
++ }
++ return outs;
++ }
+
+ SWIGINTERN int
+ SWIG_AsVal_unsigned_SS_long (PyObject *obj, unsigned long *val)
+@@ -3703,7 +3854,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_SetVocabulary(PyObject *SWIGUN
+ for (size_t i = 0; i < size; ++i) {
+ const PyInputString ustring(PyList_GetItem(swig_obj[1], i));
+ if (ustring.IsAvalable()) {
+- (*out)[i] = std::string(ustring.data(), ustring.size());
++ (*out)[i].assign(ustring.data(), ustring.size());
+ } else {
+ PyErr_SetString(PyExc_TypeError, "list must contain strings");
+ SWIG_fail;
+@@ -3832,19 +3983,31 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_EncodeAsPieces(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor_SampleEncodeAndScoreAsPieces(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+ absl::string_view arg2 ;
++ int arg3 ;
++ float arg4 ;
++ bool arg5 ;
++ bool arg6 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- PyObject *swig_obj[2] ;
+- std::vector< std::string > result;
++ int val3 ;
++ int ecode3 = 0 ;
++ float val4 ;
++ int ecode4 = 0 ;
++ bool val5 ;
++ int ecode5 = 0 ;
++ bool val6 ;
++ int ecode6 = 0 ;
++ PyObject *swig_obj[6] ;
++ std::vector< std::pair< std::vector< std::string >,float > > result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_EncodeAsPieces", 2, 2, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_SampleEncodeAndScoreAsPieces", 6, 6, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_EncodeAsPieces" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsPieces" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+ {
+@@ -3856,9 +4019,29 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_EncodeAsPieces(PyObject *SWIGU
+ resultobj = ustring.input_type();
+ arg2 = absl::string_view(ustring.data(), ustring.size());
+ }
++ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
++ if (!SWIG_IsOK(ecode3)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsPieces" "', argument " "3"" of type '" "int""'");
++ }
++ arg3 = static_cast< int >(val3);
++ ecode4 = SWIG_AsVal_float(swig_obj[3], &val4);
++ if (!SWIG_IsOK(ecode4)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsPieces" "', argument " "4"" of type '" "float""'");
++ }
++ arg4 = static_cast< float >(val4);
++ ecode5 = SWIG_AsVal_bool(swig_obj[4], &val5);
++ if (!SWIG_IsOK(ecode5)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsPieces" "', argument " "5"" of type '" "bool""'");
++ }
++ arg5 = static_cast< bool >(val5);
++ ecode6 = SWIG_AsVal_bool(swig_obj[5], &val6);
++ if (!SWIG_IsOK(ecode6)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsPieces" "', argument " "6"" of type '" "bool""'");
++ }
++ arg6 = static_cast< bool >(val6);
+ {
+ try {
+- result = ((sentencepiece::SentencePieceProcessor const *)arg1)->EncodeAsPieces(arg2);
++ result = ((sentencepiece::SentencePieceProcessor const *)arg1)->SampleEncodeAndScoreAsPieces(arg2,arg3,arg4,arg5,arg6);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+@@ -3869,7 +4052,11 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_EncodeAsPieces(PyObject *SWIGU
+ PyObject *input_type = resultobj;
+ resultobj = PyList_New((&result)->size());
+ for (size_t i = 0; i < (&result)->size(); ++i) {
+- PyList_SetItem(resultobj, i, MakePyOutputString(result[i], input_type));
++ PyObject *obj = PyList_New(result[i].first.size());
++ for (size_t j = 0; j < result[i].first.size(); ++j) {
++ PyList_SetItem(obj, j, MakePyOutputString(result[i].first[j], input_type));
++ }
++ PyList_SetItem(resultobj, i, PyTuple_Pack(2, obj, PyFloat_FromDouble(static_cast<double>(result[i].second))));
+ }
+ }
+ return resultobj;
+@@ -3878,19 +4065,31 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_EncodeAsIds(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor_SampleEncodeAndScoreAsIds(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+ absl::string_view arg2 ;
++ int arg3 ;
++ float arg4 ;
++ bool arg5 ;
++ bool arg6 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- PyObject *swig_obj[2] ;
+- std::vector< int > result;
++ int val3 ;
++ int ecode3 = 0 ;
++ float val4 ;
++ int ecode4 = 0 ;
++ bool val5 ;
++ int ecode5 = 0 ;
++ bool val6 ;
++ int ecode6 = 0 ;
++ PyObject *swig_obj[6] ;
++ std::vector< std::pair< std::vector< int >,float > > result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_EncodeAsIds", 2, 2, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_SampleEncodeAndScoreAsIds", 6, 6, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_EncodeAsIds" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsIds" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+ {
+@@ -3902,9 +4101,29 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_EncodeAsIds(PyObject *SWIGUNUS
+ resultobj = ustring.input_type();
+ arg2 = absl::string_view(ustring.data(), ustring.size());
+ }
++ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
++ if (!SWIG_IsOK(ecode3)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsIds" "', argument " "3"" of type '" "int""'");
++ }
++ arg3 = static_cast< int >(val3);
++ ecode4 = SWIG_AsVal_float(swig_obj[3], &val4);
++ if (!SWIG_IsOK(ecode4)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsIds" "', argument " "4"" of type '" "float""'");
++ }
++ arg4 = static_cast< float >(val4);
++ ecode5 = SWIG_AsVal_bool(swig_obj[4], &val5);
++ if (!SWIG_IsOK(ecode5)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsIds" "', argument " "5"" of type '" "bool""'");
++ }
++ arg5 = static_cast< bool >(val5);
++ ecode6 = SWIG_AsVal_bool(swig_obj[5], &val6);
++ if (!SWIG_IsOK(ecode6)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsIds" "', argument " "6"" of type '" "bool""'");
++ }
++ arg6 = static_cast< bool >(val6);
+ {
+ try {
+- result = ((sentencepiece::SentencePieceProcessor const *)arg1)->EncodeAsIds(arg2);
++ result = ((sentencepiece::SentencePieceProcessor const *)arg1)->SampleEncodeAndScoreAsIds(arg2,arg3,arg4,arg5,arg6);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+@@ -3914,7 +4133,11 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_EncodeAsIds(PyObject *SWIGUNUS
+ {
+ resultobj = PyList_New((&result)->size());
+ for (size_t i = 0; i < (&result)->size(); ++i) {
+- PyList_SetItem(resultobj, i, PyInt_FromLong(static_cast<long>(result[i])));
++ PyObject *obj = PyList_New(result[i].first.size());
++ for (size_t j = 0; j < result[i].first.size(); ++j) {
++ PyList_SetItem(obj, j, PyInt_FromLong(static_cast<long>(result[i].first[j])));
++ }
++ PyList_SetItem(resultobj, i, PyTuple_Pack(2, obj, PyFloat_FromDouble(static_cast<double>(result[i].second))));
+ }
+ }
+ return resultobj;
+@@ -3923,22 +4146,22 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_NBestEncodeAsPieces(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor_CalculateEntropy(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+ absl::string_view arg2 ;
+- int arg3 ;
++ float arg3 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- int val3 ;
++ float val3 ;
+ int ecode3 = 0 ;
+ PyObject *swig_obj[3] ;
+- std::vector< std::vector< std::string > > result;
++ float result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_NBestEncodeAsPieces", 3, 3, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_CalculateEntropy", 3, 3, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_NBestEncodeAsPieces" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_CalculateEntropy" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+ {
+@@ -3950,113 +4173,71 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_NBestEncodeAsPieces(PyObject *
+ resultobj = ustring.input_type();
+ arg2 = absl::string_view(ustring.data(), ustring.size());
+ }
+- ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
++ ecode3 = SWIG_AsVal_float(swig_obj[2], &val3);
+ if (!SWIG_IsOK(ecode3)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor_NBestEncodeAsPieces" "', argument " "3"" of type '" "int""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor_CalculateEntropy" "', argument " "3"" of type '" "float""'");
+ }
+- arg3 = static_cast< int >(val3);
++ arg3 = static_cast< float >(val3);
+ {
+ try {
+- result = ((sentencepiece::SentencePieceProcessor const *)arg1)->NBestEncodeAsPieces(arg2,arg3);
++ result = (float)((sentencepiece::SentencePieceProcessor const *)arg1)->CalculateEntropy(arg2,arg3);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
+ }
+ }
+- {
+- PyObject *input_type = resultobj;
+- resultobj = PyList_New((&result)->size());
+- for (size_t i = 0; i < (&result)->size(); ++i) {
+- PyObject *obj = PyList_New(result[i].size());
+- for (size_t j = 0; j < result[i].size(); ++j) {
+- PyList_SetItem(obj, j, MakePyOutputString(result[i][j], input_type));
+- }
+- PyList_SetItem(resultobj, i, obj);
+- }
+- }
++ resultobj = SWIG_From_float(static_cast< float >(result));
+ return resultobj;
+ fail:
+ return NULL;
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_NBestEncodeAsIds(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor_GetPieceSize(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- absl::string_view arg2 ;
+- int arg3 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- int val3 ;
+- int ecode3 = 0 ;
+- PyObject *swig_obj[3] ;
+- std::vector< std::vector< int > > result;
++ PyObject *swig_obj[1] ;
++ int result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_NBestEncodeAsIds", 3, 3, swig_obj)) SWIG_fail;
++ if (!args) SWIG_fail;
++ swig_obj[0] = args;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_NBestEncodeAsIds" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_GetPieceSize" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+- {
+- const PyInputString ustring(swig_obj[1]);
+- if (!ustring.IsAvalable()) {
+- PyErr_SetString(PyExc_TypeError, "not a string");
+- SWIG_fail;
+- }
+- resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
+- }
+- ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
+- if (!SWIG_IsOK(ecode3)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor_NBestEncodeAsIds" "', argument " "3"" of type '" "int""'");
+- }
+- arg3 = static_cast< int >(val3);
+ {
+ try {
+- result = ((sentencepiece::SentencePieceProcessor const *)arg1)->NBestEncodeAsIds(arg2,arg3);
++ result = (int)((sentencepiece::SentencePieceProcessor const *)arg1)->GetPieceSize();
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
+ }
+ }
+- {
+- resultobj = PyList_New((&result)->size());
+- for (size_t i = 0; i < (&result)->size(); ++i) {
+- PyObject *obj = PyList_New(result[i].size());
+- for (size_t j = 0; j < result[i].size(); ++j) {
+- PyList_SetItem(obj, j, PyInt_FromLong(static_cast<long>(result[i][j])));
+- }
+- PyList_SetItem(resultobj, i, obj);
+- }
+- }
++ resultobj = SWIG_From_int(static_cast< int >(result));
+ return resultobj;
+ fail:
+ return NULL;
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_SampleEncodeAsPieces(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor_PieceToId(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+ absl::string_view arg2 ;
+- int arg3 ;
+- float arg4 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- int val3 ;
+- int ecode3 = 0 ;
+- float val4 ;
+- int ecode4 = 0 ;
+- PyObject *swig_obj[4] ;
+- std::vector< std::string > result;
++ PyObject *swig_obj[2] ;
++ int result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_SampleEncodeAsPieces", 4, 4, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_PieceToId", 2, 2, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_SampleEncodeAsPieces" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_PieceToId" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+ {
+@@ -4068,81 +4249,47 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_SampleEncodeAsPieces(PyObject
+ resultobj = ustring.input_type();
+ arg2 = absl::string_view(ustring.data(), ustring.size());
+ }
+- ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
+- if (!SWIG_IsOK(ecode3)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor_SampleEncodeAsPieces" "', argument " "3"" of type '" "int""'");
+- }
+- arg3 = static_cast< int >(val3);
+- ecode4 = SWIG_AsVal_float(swig_obj[3], &val4);
+- if (!SWIG_IsOK(ecode4)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor_SampleEncodeAsPieces" "', argument " "4"" of type '" "float""'");
+- }
+- arg4 = static_cast< float >(val4);
+ {
+ try {
+- result = ((sentencepiece::SentencePieceProcessor const *)arg1)->SampleEncodeAsPieces(arg2,arg3,arg4);
++ result = (int)((sentencepiece::SentencePieceProcessor const *)arg1)->PieceToId(arg2);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
+ }
+ }
+- {
+- PyObject *input_type = resultobj;
+- resultobj = PyList_New((&result)->size());
+- for (size_t i = 0; i < (&result)->size(); ++i) {
+- PyList_SetItem(resultobj, i, MakePyOutputString(result[i], input_type));
+- }
+- }
++ resultobj = SWIG_From_int(static_cast< int >(result));
+ return resultobj;
+ fail:
+ return NULL;
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_SampleEncodeAsIds(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor_IdToPiece(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- absl::string_view arg2 ;
+- int arg3 ;
+- float arg4 ;
++ int arg2 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- int val3 ;
+- int ecode3 = 0 ;
+- float val4 ;
+- int ecode4 = 0 ;
+- PyObject *swig_obj[4] ;
+- std::vector< int > result;
++ int val2 ;
++ int ecode2 = 0 ;
++ PyObject *swig_obj[2] ;
++ std::string *result = 0 ;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_SampleEncodeAsIds", 4, 4, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_IdToPiece", 2, 2, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_SampleEncodeAsIds" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_IdToPiece" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+- {
+- const PyInputString ustring(swig_obj[1]);
+- if (!ustring.IsAvalable()) {
+- PyErr_SetString(PyExc_TypeError, "not a string");
+- SWIG_fail;
+- }
+- resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
+- }
+- ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
+- if (!SWIG_IsOK(ecode3)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor_SampleEncodeAsIds" "', argument " "3"" of type '" "int""'");
+- }
+- arg3 = static_cast< int >(val3);
+- ecode4 = SWIG_AsVal_float(swig_obj[3], &val4);
+- if (!SWIG_IsOK(ecode4)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor_SampleEncodeAsIds" "', argument " "4"" of type '" "float""'");
++ ecode2 = SWIG_AsVal_int(swig_obj[1], &val2);
++ if (!SWIG_IsOK(ecode2)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode2), "in method '" "SentencePieceProcessor_IdToPiece" "', argument " "2"" of type '" "int""'");
+ }
+- arg4 = static_cast< float >(val4);
++ arg2 = static_cast< int >(val2);
+ {
+ try {
+- result = ((sentencepiece::SentencePieceProcessor const *)arg1)->SampleEncodeAsIds(arg2,arg3,arg4);
++ result = (std::string *) &((sentencepiece::SentencePieceProcessor const *)arg1)->IdToPiece(arg2);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+@@ -4150,10 +4297,8 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_SampleEncodeAsIds(PyObject *SW
+ }
+ }
+ {
+- resultobj = PyList_New((&result)->size());
+- for (size_t i = 0; i < (&result)->size(); ++i) {
+- PyList_SetItem(resultobj, i, PyInt_FromLong(static_cast<long>(result[i])));
+- }
++ PyObject *input_type = resultobj;
++ resultobj = MakePyOutputString(*result, input_type);
+ }
+ return resultobj;
+ fail:
+@@ -4161,489 +4306,290 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_SampleEncodeAndScoreAsPieces(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor_GetScore(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- absl::string_view arg2 ;
+- int arg3 ;
+- float arg4 ;
+- bool arg5 ;
+- bool arg6 ;
++ int arg2 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- int val3 ;
+- int ecode3 = 0 ;
+- float val4 ;
+- int ecode4 = 0 ;
+- bool val5 ;
+- int ecode5 = 0 ;
+- bool val6 ;
+- int ecode6 = 0 ;
+- PyObject *swig_obj[6] ;
+- std::vector< std::pair< std::vector< std::string >,float > > result;
++ int val2 ;
++ int ecode2 = 0 ;
++ PyObject *swig_obj[2] ;
++ float result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_SampleEncodeAndScoreAsPieces", 6, 6, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_GetScore", 2, 2, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsPieces" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_GetScore" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+- {
+- const PyInputString ustring(swig_obj[1]);
+- if (!ustring.IsAvalable()) {
+- PyErr_SetString(PyExc_TypeError, "not a string");
+- SWIG_fail;
+- }
+- resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
+- }
+- ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
+- if (!SWIG_IsOK(ecode3)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsPieces" "', argument " "3"" of type '" "int""'");
+- }
+- arg3 = static_cast< int >(val3);
+- ecode4 = SWIG_AsVal_float(swig_obj[3], &val4);
+- if (!SWIG_IsOK(ecode4)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsPieces" "', argument " "4"" of type '" "float""'");
+- }
+- arg4 = static_cast< float >(val4);
+- ecode5 = SWIG_AsVal_bool(swig_obj[4], &val5);
+- if (!SWIG_IsOK(ecode5)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsPieces" "', argument " "5"" of type '" "bool""'");
+- }
+- arg5 = static_cast< bool >(val5);
+- ecode6 = SWIG_AsVal_bool(swig_obj[5], &val6);
+- if (!SWIG_IsOK(ecode6)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsPieces" "', argument " "6"" of type '" "bool""'");
++ ecode2 = SWIG_AsVal_int(swig_obj[1], &val2);
++ if (!SWIG_IsOK(ecode2)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode2), "in method '" "SentencePieceProcessor_GetScore" "', argument " "2"" of type '" "int""'");
+ }
+- arg6 = static_cast< bool >(val6);
++ arg2 = static_cast< int >(val2);
+ {
+ try {
+- result = ((sentencepiece::SentencePieceProcessor const *)arg1)->SampleEncodeAndScoreAsPieces(arg2,arg3,arg4,arg5,arg6);
++ result = (float)((sentencepiece::SentencePieceProcessor const *)arg1)->GetScore(arg2);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
+ }
+ }
+- {
+- PyObject *input_type = resultobj;
+- resultobj = PyList_New((&result)->size());
+- for (size_t i = 0; i < (&result)->size(); ++i) {
+- PyObject *obj = PyList_New(result[i].first.size());
+- for (size_t j = 0; j < result[i].first.size(); ++j) {
+- PyList_SetItem(obj, j, MakePyOutputString(result[i].first[j], input_type));
+- }
+- PyList_SetItem(resultobj, i, PyTuple_Pack(2, obj, PyFloat_FromDouble(static_cast<double>(result[i].second))));
+- }
+- }
++ resultobj = SWIG_From_float(static_cast< float >(result));
+ return resultobj;
+ fail:
+ return NULL;
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_SampleEncodeAndScoreAsIds(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor_IsUnknown(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- absl::string_view arg2 ;
+- int arg3 ;
+- float arg4 ;
+- bool arg5 ;
+- bool arg6 ;
++ int arg2 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- int val3 ;
+- int ecode3 = 0 ;
+- float val4 ;
+- int ecode4 = 0 ;
+- bool val5 ;
+- int ecode5 = 0 ;
+- bool val6 ;
+- int ecode6 = 0 ;
+- PyObject *swig_obj[6] ;
+- std::vector< std::pair< std::vector< int >,float > > result;
++ int val2 ;
++ int ecode2 = 0 ;
++ PyObject *swig_obj[2] ;
++ bool result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_SampleEncodeAndScoreAsIds", 6, 6, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_IsUnknown", 2, 2, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsIds" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_IsUnknown" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+- {
+- const PyInputString ustring(swig_obj[1]);
+- if (!ustring.IsAvalable()) {
+- PyErr_SetString(PyExc_TypeError, "not a string");
+- SWIG_fail;
+- }
+- resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
+- }
+- ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
+- if (!SWIG_IsOK(ecode3)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsIds" "', argument " "3"" of type '" "int""'");
+- }
+- arg3 = static_cast< int >(val3);
+- ecode4 = SWIG_AsVal_float(swig_obj[3], &val4);
+- if (!SWIG_IsOK(ecode4)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsIds" "', argument " "4"" of type '" "float""'");
+- }
+- arg4 = static_cast< float >(val4);
+- ecode5 = SWIG_AsVal_bool(swig_obj[4], &val5);
+- if (!SWIG_IsOK(ecode5)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsIds" "', argument " "5"" of type '" "bool""'");
+- }
+- arg5 = static_cast< bool >(val5);
+- ecode6 = SWIG_AsVal_bool(swig_obj[5], &val6);
+- if (!SWIG_IsOK(ecode6)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsIds" "', argument " "6"" of type '" "bool""'");
++ ecode2 = SWIG_AsVal_int(swig_obj[1], &val2);
++ if (!SWIG_IsOK(ecode2)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode2), "in method '" "SentencePieceProcessor_IsUnknown" "', argument " "2"" of type '" "int""'");
+ }
+- arg6 = static_cast< bool >(val6);
++ arg2 = static_cast< int >(val2);
+ {
+ try {
+- result = ((sentencepiece::SentencePieceProcessor const *)arg1)->SampleEncodeAndScoreAsIds(arg2,arg3,arg4,arg5,arg6);
++ result = (bool)((sentencepiece::SentencePieceProcessor const *)arg1)->IsUnknown(arg2);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
+ }
+ }
+- {
+- resultobj = PyList_New((&result)->size());
+- for (size_t i = 0; i < (&result)->size(); ++i) {
+- PyObject *obj = PyList_New(result[i].first.size());
+- for (size_t j = 0; j < result[i].first.size(); ++j) {
+- PyList_SetItem(obj, j, PyInt_FromLong(static_cast<long>(result[i].first[j])));
+- }
+- PyList_SetItem(resultobj, i, PyTuple_Pack(2, obj, PyFloat_FromDouble(static_cast<double>(result[i].second))));
+- }
+- }
++ resultobj = SWIG_From_bool(static_cast< bool >(result));
+ return resultobj;
+ fail:
+ return NULL;
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_DecodePieces(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor_IsControl(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- std::vector< std::string > *arg2 = 0 ;
++ int arg2 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
++ int val2 ;
++ int ecode2 = 0 ;
+ PyObject *swig_obj[2] ;
+- std::string result;
++ bool result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_DecodePieces", 2, 2, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_IsControl", 2, 2, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_DecodePieces" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_IsControl" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+- {
+- std::vector<std::string> *out = nullptr;
+- if (PyList_Check(swig_obj[1])) {
+- const size_t size = PyList_Size(swig_obj[1]);
+- out = new std::vector<std::string>(size);
+- for (size_t i = 0; i < size; ++i) {
+- const PyInputString ustring(PyList_GetItem(swig_obj[1], i));
+- if (ustring.IsAvalable()) {
+- (*out)[i] = std::string(ustring.data(), ustring.size());
+- } else {
+- PyErr_SetString(PyExc_TypeError, "list must contain strings");
+- SWIG_fail;
+- }
+- resultobj = ustring.input_type();
+- }
+- } else {
+- PyErr_SetString(PyExc_TypeError, "not a list");
+- SWIG_fail;
+- }
+- arg2 = out;
+- }
++ ecode2 = SWIG_AsVal_int(swig_obj[1], &val2);
++ if (!SWIG_IsOK(ecode2)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode2), "in method '" "SentencePieceProcessor_IsControl" "', argument " "2"" of type '" "int""'");
++ }
++ arg2 = static_cast< int >(val2);
+ {
+ try {
+- result = ((sentencepiece::SentencePieceProcessor const *)arg1)->DecodePieces((std::vector< std::string > const &)*arg2);
++ result = (bool)((sentencepiece::SentencePieceProcessor const *)arg1)->IsControl(arg2);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
+ }
+ }
+- {
+- PyObject *input_type = resultobj;
+- resultobj = MakePyOutputString(result, input_type);
+- }
+- {
+- delete arg2;
+- }
++ resultobj = SWIG_From_bool(static_cast< bool >(result));
+ return resultobj;
+ fail:
+- {
+- delete arg2;
+- }
+ return NULL;
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_CalculateEntropy(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor_IsUnused(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- absl::string_view arg2 ;
+- float arg3 ;
++ int arg2 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- float val3 ;
+- int ecode3 = 0 ;
+- PyObject *swig_obj[3] ;
+- float result;
++ int val2 ;
++ int ecode2 = 0 ;
++ PyObject *swig_obj[2] ;
++ bool result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_CalculateEntropy", 3, 3, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_IsUnused", 2, 2, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_CalculateEntropy" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_IsUnused" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+- {
+- const PyInputString ustring(swig_obj[1]);
+- if (!ustring.IsAvalable()) {
+- PyErr_SetString(PyExc_TypeError, "not a string");
+- SWIG_fail;
+- }
+- resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
+- }
+- ecode3 = SWIG_AsVal_float(swig_obj[2], &val3);
+- if (!SWIG_IsOK(ecode3)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor_CalculateEntropy" "', argument " "3"" of type '" "float""'");
++ ecode2 = SWIG_AsVal_int(swig_obj[1], &val2);
++ if (!SWIG_IsOK(ecode2)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode2), "in method '" "SentencePieceProcessor_IsUnused" "', argument " "2"" of type '" "int""'");
+ }
+- arg3 = static_cast< float >(val3);
++ arg2 = static_cast< int >(val2);
+ {
+ try {
+- result = (float)((sentencepiece::SentencePieceProcessor const *)arg1)->CalculateEntropy(arg2,arg3);
++ result = (bool)((sentencepiece::SentencePieceProcessor const *)arg1)->IsUnused(arg2);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
+ }
+ }
+- resultobj = SWIG_From_float(static_cast< float >(result));
++ resultobj = SWIG_From_bool(static_cast< bool >(result));
+ return resultobj;
+ fail:
+ return NULL;
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_EncodeAsSerializedProto(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor_IsByte(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- absl::string_view arg2 ;
++ int arg2 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
++ int val2 ;
++ int ecode2 = 0 ;
+ PyObject *swig_obj[2] ;
+- sentencepiece::util::bytes result;
++ bool result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_EncodeAsSerializedProto", 2, 2, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_IsByte", 2, 2, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_EncodeAsSerializedProto" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_IsByte" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+- {
+- const PyInputString ustring(swig_obj[1]);
+- if (!ustring.IsAvalable()) {
+- PyErr_SetString(PyExc_TypeError, "not a string");
+- SWIG_fail;
+- }
+- resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
+- }
++ ecode2 = SWIG_AsVal_int(swig_obj[1], &val2);
++ if (!SWIG_IsOK(ecode2)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode2), "in method '" "SentencePieceProcessor_IsByte" "', argument " "2"" of type '" "int""'");
++ }
++ arg2 = static_cast< int >(val2);
+ {
+ try {
+- result = ((sentencepiece::SentencePieceProcessor const *)arg1)->EncodeAsSerializedProto(arg2);
++ result = (bool)((sentencepiece::SentencePieceProcessor const *)arg1)->IsByte(arg2);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
+ }
+ }
+- {
+- resultobj = MakePyOutputBytes(result);
+- }
++ resultobj = SWIG_From_bool(static_cast< bool >(result));
+ return resultobj;
+ fail:
+ return NULL;
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_SampleEncodeAsSerializedProto(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor_unk_id(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- absl::string_view arg2 ;
+- int arg3 ;
+- float arg4 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- int val3 ;
+- int ecode3 = 0 ;
+- float val4 ;
+- int ecode4 = 0 ;
+- PyObject *swig_obj[4] ;
+- sentencepiece::util::bytes result;
++ PyObject *swig_obj[1] ;
++ int result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_SampleEncodeAsSerializedProto", 4, 4, swig_obj)) SWIG_fail;
++ if (!args) SWIG_fail;
++ swig_obj[0] = args;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_SampleEncodeAsSerializedProto" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_unk_id" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+- {
+- const PyInputString ustring(swig_obj[1]);
+- if (!ustring.IsAvalable()) {
+- PyErr_SetString(PyExc_TypeError, "not a string");
+- SWIG_fail;
+- }
+- resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
+- }
+- ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
+- if (!SWIG_IsOK(ecode3)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor_SampleEncodeAsSerializedProto" "', argument " "3"" of type '" "int""'");
+- }
+- arg3 = static_cast< int >(val3);
+- ecode4 = SWIG_AsVal_float(swig_obj[3], &val4);
+- if (!SWIG_IsOK(ecode4)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor_SampleEncodeAsSerializedProto" "', argument " "4"" of type '" "float""'");
+- }
+- arg4 = static_cast< float >(val4);
+ {
+ try {
+- result = ((sentencepiece::SentencePieceProcessor const *)arg1)->SampleEncodeAsSerializedProto(arg2,arg3,arg4);
++ result = (int)((sentencepiece::SentencePieceProcessor const *)arg1)->unk_id();
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
+ }
+ }
+- {
+- resultobj = MakePyOutputBytes(result);
+- }
++ resultobj = SWIG_From_int(static_cast< int >(result));
+ return resultobj;
+ fail:
+ return NULL;
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_NBestEncodeAsSerializedProto(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor_bos_id(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- absl::string_view arg2 ;
+- int arg3 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- int val3 ;
+- int ecode3 = 0 ;
+- PyObject *swig_obj[3] ;
+- sentencepiece::util::bytes result;
++ PyObject *swig_obj[1] ;
++ int result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_NBestEncodeAsSerializedProto", 3, 3, swig_obj)) SWIG_fail;
++ if (!args) SWIG_fail;
++ swig_obj[0] = args;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_NBestEncodeAsSerializedProto" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_bos_id" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+- {
+- const PyInputString ustring(swig_obj[1]);
+- if (!ustring.IsAvalable()) {
+- PyErr_SetString(PyExc_TypeError, "not a string");
+- SWIG_fail;
+- }
+- resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
+- }
+- ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
+- if (!SWIG_IsOK(ecode3)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor_NBestEncodeAsSerializedProto" "', argument " "3"" of type '" "int""'");
+- }
+- arg3 = static_cast< int >(val3);
+ {
+ try {
+- result = ((sentencepiece::SentencePieceProcessor const *)arg1)->NBestEncodeAsSerializedProto(arg2,arg3);
++ result = (int)((sentencepiece::SentencePieceProcessor const *)arg1)->bos_id();
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
+ }
+ }
+- {
+- resultobj = MakePyOutputBytes(result);
+- }
++ resultobj = SWIG_From_int(static_cast< int >(result));
+ return resultobj;
+ fail:
+ return NULL;
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_DecodePiecesAsSerializedProto(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor_eos_id(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- std::vector< std::string > *arg2 = 0 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- PyObject *swig_obj[2] ;
+- sentencepiece::util::bytes result;
++ PyObject *swig_obj[1] ;
++ int result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_DecodePiecesAsSerializedProto", 2, 2, swig_obj)) SWIG_fail;
++ if (!args) SWIG_fail;
++ swig_obj[0] = args;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_DecodePiecesAsSerializedProto" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_eos_id" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+- {
+- std::vector<std::string> *out = nullptr;
+- if (PyList_Check(swig_obj[1])) {
+- const size_t size = PyList_Size(swig_obj[1]);
+- out = new std::vector<std::string>(size);
+- for (size_t i = 0; i < size; ++i) {
+- const PyInputString ustring(PyList_GetItem(swig_obj[1], i));
+- if (ustring.IsAvalable()) {
+- (*out)[i] = std::string(ustring.data(), ustring.size());
+- } else {
+- PyErr_SetString(PyExc_TypeError, "list must contain strings");
+- SWIG_fail;
+- }
+- resultobj = ustring.input_type();
+- }
+- } else {
+- PyErr_SetString(PyExc_TypeError, "not a list");
+- SWIG_fail;
+- }
+- arg2 = out;
+- }
+ {
+ try {
+- result = ((sentencepiece::SentencePieceProcessor const *)arg1)->DecodePiecesAsSerializedProto((std::vector< std::string > const &)*arg2);
++ result = (int)((sentencepiece::SentencePieceProcessor const *)arg1)->eos_id();
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
+ }
+ }
+- {
+- resultobj = MakePyOutputBytes(result);
+- }
+- {
+- delete arg2;
+- }
++ resultobj = SWIG_From_int(static_cast< int >(result));
+ return resultobj;
+ fail:
+- {
+- delete arg2;
+- }
+ return NULL;
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_GetPieceSize(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor_pad_id(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+ void *argp1 = 0 ;
+@@ -4655,12 +4601,12 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_GetPieceSize(PyObject *SWIGUNU
+ swig_obj[0] = args;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_GetPieceSize" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_pad_id" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+ {
+ try {
+- result = (int)((sentencepiece::SentencePieceProcessor const *)arg1)->GetPieceSize();
++ result = (int)((sentencepiece::SentencePieceProcessor const *)arg1)->pad_id();
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+@@ -4674,71 +4620,66 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_PieceToId(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor_serialized_model_proto(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- absl::string_view arg2 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- PyObject *swig_obj[2] ;
+- int result;
++ PyObject *swig_obj[1] ;
++ sentencepiece::util::bytes result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_PieceToId", 2, 2, swig_obj)) SWIG_fail;
++ if (!args) SWIG_fail;
++ swig_obj[0] = args;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_PieceToId" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_serialized_model_proto" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+- {
+- const PyInputString ustring(swig_obj[1]);
+- if (!ustring.IsAvalable()) {
+- PyErr_SetString(PyExc_TypeError, "not a string");
+- SWIG_fail;
+- }
+- resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
+- }
+ {
+ try {
+- result = (int)((sentencepiece::SentencePieceProcessor const *)arg1)->PieceToId(arg2);
++ result = ((sentencepiece::SentencePieceProcessor const *)arg1)->serialized_model_proto();
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
+ }
+ }
+- resultobj = SWIG_From_int(static_cast< int >(result));
++ {
++ resultobj = MakePyOutputBytes(result);
++ }
+ return resultobj;
+ fail:
+ return NULL;
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_IdToPiece(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor_LoadFromFile(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- int arg2 ;
++ absl::string_view arg2 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- int val2 ;
+- int ecode2 = 0 ;
+ PyObject *swig_obj[2] ;
+- std::string *result = 0 ;
++ sentencepiece::util::Status result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_IdToPiece", 2, 2, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_LoadFromFile", 2, 2, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_IdToPiece" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_LoadFromFile" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+- ecode2 = SWIG_AsVal_int(swig_obj[1], &val2);
+- if (!SWIG_IsOK(ecode2)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode2), "in method '" "SentencePieceProcessor_IdToPiece" "', argument " "2"" of type '" "int""'");
+- }
+- arg2 = static_cast< int >(val2);
++ {
++ const PyInputString ustring(swig_obj[1]);
++ if (!ustring.IsAvalable()) {
++ PyErr_SetString(PyExc_TypeError, "not a string");
++ SWIG_fail;
++ }
++ resultobj = ustring.input_type();
++ arg2 = absl::string_view(ustring.data(), ustring.size());
++ }
+ {
+ try {
+- result = (std::string *) &((sentencepiece::SentencePieceProcessor const *)arg1)->IdToPiece(arg2);
++ result = sentencepiece_SentencePieceProcessor_LoadFromFile(arg1,arg2);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+@@ -4746,8 +4687,10 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_IdToPiece(PyObject *SWIGUNUSED
+ }
+ }
+ {
+- PyObject *input_type = resultobj;
+- resultobj = MakePyOutputString(*result, input_type);
++ if (!(&result)->ok()) {
++ SWIG_exception(ToSwigError((&result)->code()), (&result)->ToString().c_str());
++ }
++ resultobj = SWIG_From_bool((&result)->ok());
+ }
+ return resultobj;
+ fail:
+@@ -4755,338 +4698,916 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_GetScore(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsIds(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- int arg2 ;
++ absl::string_view arg2 ;
++ bool arg3 ;
++ int arg4 ;
++ float arg5 ;
++ bool arg6 ;
++ bool arg7 ;
++ bool arg8 ;
++ bool arg9 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- int val2 ;
+- int ecode2 = 0 ;
+- PyObject *swig_obj[2] ;
+- float result;
++ bool val3 ;
++ int ecode3 = 0 ;
++ int val4 ;
++ int ecode4 = 0 ;
++ float val5 ;
++ int ecode5 = 0 ;
++ bool val6 ;
++ int ecode6 = 0 ;
++ bool val7 ;
++ int ecode7 = 0 ;
++ bool val8 ;
++ int ecode8 = 0 ;
++ bool val9 ;
++ int ecode9 = 0 ;
++ PyObject *swig_obj[9] ;
++ std::vector< int > result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_GetScore", 2, 2, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__EncodeAsIds", 9, 9, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_GetScore" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__EncodeAsIds" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+- ecode2 = SWIG_AsVal_int(swig_obj[1], &val2);
+- if (!SWIG_IsOK(ecode2)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode2), "in method '" "SentencePieceProcessor_GetScore" "', argument " "2"" of type '" "int""'");
+- }
+- arg2 = static_cast< int >(val2);
+ {
+- try {
+- result = (float)((sentencepiece::SentencePieceProcessor const *)arg1)->GetScore(arg2);
+- ReleaseResultObject(resultobj);
+- }
+- catch (const sentencepiece::util::Status &status) {
+- SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ const PyInputString ustring(swig_obj[1]);
++ if (!ustring.IsAvalable()) {
++ PyErr_SetString(PyExc_TypeError, "not a string");
++ SWIG_fail;
+ }
++ resultobj = ustring.input_type();
++ arg2 = absl::string_view(ustring.data(), ustring.size());
+ }
+- resultobj = SWIG_From_float(static_cast< float >(result));
+- return resultobj;
+-fail:
+- return NULL;
+-}
+-
+-
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_IsUnknown(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ ecode3 = SWIG_AsVal_bool(swig_obj[2], &val3);
++ if (!SWIG_IsOK(ecode3)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__EncodeAsIds" "', argument " "3"" of type '" "bool""'");
++ }
++ arg3 = static_cast< bool >(val3);
++ ecode4 = SWIG_AsVal_int(swig_obj[3], &val4);
++ if (!SWIG_IsOK(ecode4)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor__EncodeAsIds" "', argument " "4"" of type '" "int""'");
++ }
++ arg4 = static_cast< int >(val4);
++ ecode5 = SWIG_AsVal_float(swig_obj[4], &val5);
++ if (!SWIG_IsOK(ecode5)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor__EncodeAsIds" "', argument " "5"" of type '" "float""'");
++ }
++ arg5 = static_cast< float >(val5);
++ ecode6 = SWIG_AsVal_bool(swig_obj[5], &val6);
++ if (!SWIG_IsOK(ecode6)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor__EncodeAsIds" "', argument " "6"" of type '" "bool""'");
++ }
++ arg6 = static_cast< bool >(val6);
++ ecode7 = SWIG_AsVal_bool(swig_obj[6], &val7);
++ if (!SWIG_IsOK(ecode7)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode7), "in method '" "SentencePieceProcessor__EncodeAsIds" "', argument " "7"" of type '" "bool""'");
++ }
++ arg7 = static_cast< bool >(val7);
++ ecode8 = SWIG_AsVal_bool(swig_obj[7], &val8);
++ if (!SWIG_IsOK(ecode8)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode8), "in method '" "SentencePieceProcessor__EncodeAsIds" "', argument " "8"" of type '" "bool""'");
++ }
++ arg8 = static_cast< bool >(val8);
++ ecode9 = SWIG_AsVal_bool(swig_obj[8], &val9);
++ if (!SWIG_IsOK(ecode9)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode9), "in method '" "SentencePieceProcessor__EncodeAsIds" "', argument " "9"" of type '" "bool""'");
++ }
++ arg9 = static_cast< bool >(val9);
++ {
++ try {
++ result = sentencepiece_SentencePieceProcessor__EncodeAsIds((sentencepiece::SentencePieceProcessor const *)arg1,arg2,arg3,arg4,arg5,arg6,arg7,arg8,arg9);
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ {
++ resultobj = PyList_New((&result)->size());
++ for (size_t i = 0; i < (&result)->size(); ++i) {
++ PyList_SetItem(resultobj, i, PyInt_FromLong(static_cast<long>(result[i])));
++ }
++ }
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsPieces(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- int arg2 ;
++ absl::string_view arg2 ;
++ bool arg3 ;
++ int arg4 ;
++ float arg5 ;
++ bool arg6 ;
++ bool arg7 ;
++ bool arg8 ;
++ bool arg9 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- int val2 ;
+- int ecode2 = 0 ;
+- PyObject *swig_obj[2] ;
+- bool result;
++ bool val3 ;
++ int ecode3 = 0 ;
++ int val4 ;
++ int ecode4 = 0 ;
++ float val5 ;
++ int ecode5 = 0 ;
++ bool val6 ;
++ int ecode6 = 0 ;
++ bool val7 ;
++ int ecode7 = 0 ;
++ bool val8 ;
++ int ecode8 = 0 ;
++ bool val9 ;
++ int ecode9 = 0 ;
++ PyObject *swig_obj[9] ;
++ std::vector< std::string > result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_IsUnknown", 2, 2, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__EncodeAsPieces", 9, 9, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_IsUnknown" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__EncodeAsPieces" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+- ecode2 = SWIG_AsVal_int(swig_obj[1], &val2);
+- if (!SWIG_IsOK(ecode2)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode2), "in method '" "SentencePieceProcessor_IsUnknown" "', argument " "2"" of type '" "int""'");
++ {
++ const PyInputString ustring(swig_obj[1]);
++ if (!ustring.IsAvalable()) {
++ PyErr_SetString(PyExc_TypeError, "not a string");
++ SWIG_fail;
++ }
++ resultobj = ustring.input_type();
++ arg2 = absl::string_view(ustring.data(), ustring.size());
++ }
++ ecode3 = SWIG_AsVal_bool(swig_obj[2], &val3);
++ if (!SWIG_IsOK(ecode3)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__EncodeAsPieces" "', argument " "3"" of type '" "bool""'");
+ }
+- arg2 = static_cast< int >(val2);
++ arg3 = static_cast< bool >(val3);
++ ecode4 = SWIG_AsVal_int(swig_obj[3], &val4);
++ if (!SWIG_IsOK(ecode4)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor__EncodeAsPieces" "', argument " "4"" of type '" "int""'");
++ }
++ arg4 = static_cast< int >(val4);
++ ecode5 = SWIG_AsVal_float(swig_obj[4], &val5);
++ if (!SWIG_IsOK(ecode5)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor__EncodeAsPieces" "', argument " "5"" of type '" "float""'");
++ }
++ arg5 = static_cast< float >(val5);
++ ecode6 = SWIG_AsVal_bool(swig_obj[5], &val6);
++ if (!SWIG_IsOK(ecode6)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor__EncodeAsPieces" "', argument " "6"" of type '" "bool""'");
++ }
++ arg6 = static_cast< bool >(val6);
++ ecode7 = SWIG_AsVal_bool(swig_obj[6], &val7);
++ if (!SWIG_IsOK(ecode7)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode7), "in method '" "SentencePieceProcessor__EncodeAsPieces" "', argument " "7"" of type '" "bool""'");
++ }
++ arg7 = static_cast< bool >(val7);
++ ecode8 = SWIG_AsVal_bool(swig_obj[7], &val8);
++ if (!SWIG_IsOK(ecode8)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode8), "in method '" "SentencePieceProcessor__EncodeAsPieces" "', argument " "8"" of type '" "bool""'");
++ }
++ arg8 = static_cast< bool >(val8);
++ ecode9 = SWIG_AsVal_bool(swig_obj[8], &val9);
++ if (!SWIG_IsOK(ecode9)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode9), "in method '" "SentencePieceProcessor__EncodeAsPieces" "', argument " "9"" of type '" "bool""'");
++ }
++ arg9 = static_cast< bool >(val9);
+ {
+ try {
+- result = (bool)((sentencepiece::SentencePieceProcessor const *)arg1)->IsUnknown(arg2);
++ result = sentencepiece_SentencePieceProcessor__EncodeAsPieces((sentencepiece::SentencePieceProcessor const *)arg1,arg2,arg3,arg4,arg5,arg6,arg7,arg8,arg9);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
+ }
+ }
+- resultobj = SWIG_From_bool(static_cast< bool >(result));
++ {
++ PyObject *input_type = resultobj;
++ resultobj = PyList_New((&result)->size());
++ for (size_t i = 0; i < (&result)->size(); ++i) {
++ PyList_SetItem(resultobj, i, MakePyOutputString(result[i], input_type));
++ }
++ }
+ return resultobj;
+ fail:
+ return NULL;
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_IsControl(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsSerializedProto(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- int arg2 ;
++ absl::string_view arg2 ;
++ bool arg3 ;
++ int arg4 ;
++ float arg5 ;
++ bool arg6 ;
++ bool arg7 ;
++ bool arg8 ;
++ bool arg9 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- int val2 ;
+- int ecode2 = 0 ;
+- PyObject *swig_obj[2] ;
+- bool result;
++ bool val3 ;
++ int ecode3 = 0 ;
++ int val4 ;
++ int ecode4 = 0 ;
++ float val5 ;
++ int ecode5 = 0 ;
++ bool val6 ;
++ int ecode6 = 0 ;
++ bool val7 ;
++ int ecode7 = 0 ;
++ bool val8 ;
++ int ecode8 = 0 ;
++ bool val9 ;
++ int ecode9 = 0 ;
++ PyObject *swig_obj[9] ;
++ sentencepiece::util::bytes result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_IsControl", 2, 2, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__EncodeAsSerializedProto", 9, 9, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_IsControl" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__EncodeAsSerializedProto" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+- ecode2 = SWIG_AsVal_int(swig_obj[1], &val2);
+- if (!SWIG_IsOK(ecode2)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode2), "in method '" "SentencePieceProcessor_IsControl" "', argument " "2"" of type '" "int""'");
++ {
++ const PyInputString ustring(swig_obj[1]);
++ if (!ustring.IsAvalable()) {
++ PyErr_SetString(PyExc_TypeError, "not a string");
++ SWIG_fail;
++ }
++ resultobj = ustring.input_type();
++ arg2 = absl::string_view(ustring.data(), ustring.size());
++ }
++ ecode3 = SWIG_AsVal_bool(swig_obj[2], &val3);
++ if (!SWIG_IsOK(ecode3)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__EncodeAsSerializedProto" "', argument " "3"" of type '" "bool""'");
+ }
+- arg2 = static_cast< int >(val2);
++ arg3 = static_cast< bool >(val3);
++ ecode4 = SWIG_AsVal_int(swig_obj[3], &val4);
++ if (!SWIG_IsOK(ecode4)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor__EncodeAsSerializedProto" "', argument " "4"" of type '" "int""'");
++ }
++ arg4 = static_cast< int >(val4);
++ ecode5 = SWIG_AsVal_float(swig_obj[4], &val5);
++ if (!SWIG_IsOK(ecode5)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor__EncodeAsSerializedProto" "', argument " "5"" of type '" "float""'");
++ }
++ arg5 = static_cast< float >(val5);
++ ecode6 = SWIG_AsVal_bool(swig_obj[5], &val6);
++ if (!SWIG_IsOK(ecode6)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor__EncodeAsSerializedProto" "', argument " "6"" of type '" "bool""'");
++ }
++ arg6 = static_cast< bool >(val6);
++ ecode7 = SWIG_AsVal_bool(swig_obj[6], &val7);
++ if (!SWIG_IsOK(ecode7)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode7), "in method '" "SentencePieceProcessor__EncodeAsSerializedProto" "', argument " "7"" of type '" "bool""'");
++ }
++ arg7 = static_cast< bool >(val7);
++ ecode8 = SWIG_AsVal_bool(swig_obj[7], &val8);
++ if (!SWIG_IsOK(ecode8)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode8), "in method '" "SentencePieceProcessor__EncodeAsSerializedProto" "', argument " "8"" of type '" "bool""'");
++ }
++ arg8 = static_cast< bool >(val8);
++ ecode9 = SWIG_AsVal_bool(swig_obj[8], &val9);
++ if (!SWIG_IsOK(ecode9)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode9), "in method '" "SentencePieceProcessor__EncodeAsSerializedProto" "', argument " "9"" of type '" "bool""'");
++ }
++ arg9 = static_cast< bool >(val9);
+ {
+ try {
+- result = (bool)((sentencepiece::SentencePieceProcessor const *)arg1)->IsControl(arg2);
++ result = sentencepiece_SentencePieceProcessor__EncodeAsSerializedProto((sentencepiece::SentencePieceProcessor const *)arg1,arg2,arg3,arg4,arg5,arg6,arg7,arg8,arg9);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
+ }
+ }
+- resultobj = SWIG_From_bool(static_cast< bool >(result));
++ {
++ resultobj = MakePyOutputBytes(result);
++ }
+ return resultobj;
+ fail:
+ return NULL;
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_IsUnused(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsIdsBatch(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- int arg2 ;
++ std::vector< absl::string_view > *arg2 = 0 ;
++ int arg3 ;
++ bool arg4 ;
++ int arg5 ;
++ float arg6 ;
++ bool arg7 ;
++ bool arg8 ;
++ bool arg9 ;
++ bool arg10 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- int val2 ;
+- int ecode2 = 0 ;
+- PyObject *swig_obj[2] ;
+- bool result;
++ int val3 ;
++ int ecode3 = 0 ;
++ bool val4 ;
++ int ecode4 = 0 ;
++ int val5 ;
++ int ecode5 = 0 ;
++ float val6 ;
++ int ecode6 = 0 ;
++ bool val7 ;
++ int ecode7 = 0 ;
++ bool val8 ;
++ int ecode8 = 0 ;
++ bool val9 ;
++ int ecode9 = 0 ;
++ bool val10 ;
++ int ecode10 = 0 ;
++ PyObject *swig_obj[10] ;
++ std::vector< std::vector< int > > result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_IsUnused", 2, 2, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__EncodeAsIdsBatch", 10, 10, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_IsUnused" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__EncodeAsIdsBatch" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+- ecode2 = SWIG_AsVal_int(swig_obj[1], &val2);
+- if (!SWIG_IsOK(ecode2)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode2), "in method '" "SentencePieceProcessor_IsUnused" "', argument " "2"" of type '" "int""'");
++ {
++ std::vector<absl::string_view> *out = nullptr;
++ if (PyList_Check(swig_obj[1])) {
++ const size_t size = PyList_Size(swig_obj[1]);
++ out = new std::vector<absl::string_view>(size);
++ for (size_t i = 0; i < size; ++i) {
++ const PyInputString ustring(PyList_GetItem(swig_obj[1], i));
++ if (ustring.IsAvalable()) {
++ (*out)[i] = absl::string_view(ustring.data(), ustring.size());
++ } else {
++ PyErr_SetString(PyExc_TypeError, "list must contain strings");
++ SWIG_fail;
++ }
++ resultobj = ustring.input_type();
++ }
++ } else {
++ PyErr_SetString(PyExc_TypeError, "not a list");
++ SWIG_fail;
++ }
++ arg2 = out;
++ }
++ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
++ if (!SWIG_IsOK(ecode3)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__EncodeAsIdsBatch" "', argument " "3"" of type '" "int""'");
+ }
+- arg2 = static_cast< int >(val2);
++ arg3 = static_cast< int >(val3);
++ ecode4 = SWIG_AsVal_bool(swig_obj[3], &val4);
++ if (!SWIG_IsOK(ecode4)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor__EncodeAsIdsBatch" "', argument " "4"" of type '" "bool""'");
++ }
++ arg4 = static_cast< bool >(val4);
++ ecode5 = SWIG_AsVal_int(swig_obj[4], &val5);
++ if (!SWIG_IsOK(ecode5)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor__EncodeAsIdsBatch" "', argument " "5"" of type '" "int""'");
++ }
++ arg5 = static_cast< int >(val5);
++ ecode6 = SWIG_AsVal_float(swig_obj[5], &val6);
++ if (!SWIG_IsOK(ecode6)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor__EncodeAsIdsBatch" "', argument " "6"" of type '" "float""'");
++ }
++ arg6 = static_cast< float >(val6);
++ ecode7 = SWIG_AsVal_bool(swig_obj[6], &val7);
++ if (!SWIG_IsOK(ecode7)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode7), "in method '" "SentencePieceProcessor__EncodeAsIdsBatch" "', argument " "7"" of type '" "bool""'");
++ }
++ arg7 = static_cast< bool >(val7);
++ ecode8 = SWIG_AsVal_bool(swig_obj[7], &val8);
++ if (!SWIG_IsOK(ecode8)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode8), "in method '" "SentencePieceProcessor__EncodeAsIdsBatch" "', argument " "8"" of type '" "bool""'");
++ }
++ arg8 = static_cast< bool >(val8);
++ ecode9 = SWIG_AsVal_bool(swig_obj[8], &val9);
++ if (!SWIG_IsOK(ecode9)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode9), "in method '" "SentencePieceProcessor__EncodeAsIdsBatch" "', argument " "9"" of type '" "bool""'");
++ }
++ arg9 = static_cast< bool >(val9);
++ ecode10 = SWIG_AsVal_bool(swig_obj[9], &val10);
++ if (!SWIG_IsOK(ecode10)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode10), "in method '" "SentencePieceProcessor__EncodeAsIdsBatch" "', argument " "10"" of type '" "bool""'");
++ }
++ arg10 = static_cast< bool >(val10);
+ {
+ try {
+- result = (bool)((sentencepiece::SentencePieceProcessor const *)arg1)->IsUnused(arg2);
++ result = sentencepiece_SentencePieceProcessor__EncodeAsIdsBatch((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< absl::string_view > const &)*arg2,arg3,arg4,arg5,arg6,arg7,arg8,arg9,arg10);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
+ }
+ }
+- resultobj = SWIG_From_bool(static_cast< bool >(result));
++ {
++ resultobj = PyList_New((&result)->size());
++ for (size_t i = 0; i < (&result)->size(); ++i) {
++ PyObject *obj = PyList_New(result[i].size());
++ for (size_t j = 0; j < result[i].size(); ++j) {
++ PyList_SetItem(obj, j, PyInt_FromLong(static_cast<long>(result[i][j])));
++ }
++ PyList_SetItem(resultobj, i, obj);
++ }
++ }
++ {
++ delete arg2;
++ }
+ return resultobj;
+ fail:
++ {
++ delete arg2;
++ }
+ return NULL;
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_IsByte(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsPiecesBatch(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- int arg2 ;
++ std::vector< absl::string_view > *arg2 = 0 ;
++ int arg3 ;
++ bool arg4 ;
++ int arg5 ;
++ float arg6 ;
++ bool arg7 ;
++ bool arg8 ;
++ bool arg9 ;
++ bool arg10 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- int val2 ;
+- int ecode2 = 0 ;
+- PyObject *swig_obj[2] ;
+- bool result;
++ int val3 ;
++ int ecode3 = 0 ;
++ bool val4 ;
++ int ecode4 = 0 ;
++ int val5 ;
++ int ecode5 = 0 ;
++ float val6 ;
++ int ecode6 = 0 ;
++ bool val7 ;
++ int ecode7 = 0 ;
++ bool val8 ;
++ int ecode8 = 0 ;
++ bool val9 ;
++ int ecode9 = 0 ;
++ bool val10 ;
++ int ecode10 = 0 ;
++ PyObject *swig_obj[10] ;
++ std::vector< std::vector< std::string > > result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_IsByte", 2, 2, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__EncodeAsPiecesBatch", 10, 10, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_IsByte" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__EncodeAsPiecesBatch" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+- ecode2 = SWIG_AsVal_int(swig_obj[1], &val2);
+- if (!SWIG_IsOK(ecode2)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode2), "in method '" "SentencePieceProcessor_IsByte" "', argument " "2"" of type '" "int""'");
++ {
++ std::vector<absl::string_view> *out = nullptr;
++ if (PyList_Check(swig_obj[1])) {
++ const size_t size = PyList_Size(swig_obj[1]);
++ out = new std::vector<absl::string_view>(size);
++ for (size_t i = 0; i < size; ++i) {
++ const PyInputString ustring(PyList_GetItem(swig_obj[1], i));
++ if (ustring.IsAvalable()) {
++ (*out)[i] = absl::string_view(ustring.data(), ustring.size());
++ } else {
++ PyErr_SetString(PyExc_TypeError, "list must contain strings");
++ SWIG_fail;
++ }
++ resultobj = ustring.input_type();
++ }
++ } else {
++ PyErr_SetString(PyExc_TypeError, "not a list");
++ SWIG_fail;
++ }
++ arg2 = out;
++ }
++ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
++ if (!SWIG_IsOK(ecode3)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__EncodeAsPiecesBatch" "', argument " "3"" of type '" "int""'");
+ }
+- arg2 = static_cast< int >(val2);
++ arg3 = static_cast< int >(val3);
++ ecode4 = SWIG_AsVal_bool(swig_obj[3], &val4);
++ if (!SWIG_IsOK(ecode4)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor__EncodeAsPiecesBatch" "', argument " "4"" of type '" "bool""'");
++ }
++ arg4 = static_cast< bool >(val4);
++ ecode5 = SWIG_AsVal_int(swig_obj[4], &val5);
++ if (!SWIG_IsOK(ecode5)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor__EncodeAsPiecesBatch" "', argument " "5"" of type '" "int""'");
++ }
++ arg5 = static_cast< int >(val5);
++ ecode6 = SWIG_AsVal_float(swig_obj[5], &val6);
++ if (!SWIG_IsOK(ecode6)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor__EncodeAsPiecesBatch" "', argument " "6"" of type '" "float""'");
++ }
++ arg6 = static_cast< float >(val6);
++ ecode7 = SWIG_AsVal_bool(swig_obj[6], &val7);
++ if (!SWIG_IsOK(ecode7)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode7), "in method '" "SentencePieceProcessor__EncodeAsPiecesBatch" "', argument " "7"" of type '" "bool""'");
++ }
++ arg7 = static_cast< bool >(val7);
++ ecode8 = SWIG_AsVal_bool(swig_obj[7], &val8);
++ if (!SWIG_IsOK(ecode8)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode8), "in method '" "SentencePieceProcessor__EncodeAsPiecesBatch" "', argument " "8"" of type '" "bool""'");
++ }
++ arg8 = static_cast< bool >(val8);
++ ecode9 = SWIG_AsVal_bool(swig_obj[8], &val9);
++ if (!SWIG_IsOK(ecode9)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode9), "in method '" "SentencePieceProcessor__EncodeAsPiecesBatch" "', argument " "9"" of type '" "bool""'");
++ }
++ arg9 = static_cast< bool >(val9);
++ ecode10 = SWIG_AsVal_bool(swig_obj[9], &val10);
++ if (!SWIG_IsOK(ecode10)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode10), "in method '" "SentencePieceProcessor__EncodeAsPiecesBatch" "', argument " "10"" of type '" "bool""'");
++ }
++ arg10 = static_cast< bool >(val10);
+ {
+ try {
+- result = (bool)((sentencepiece::SentencePieceProcessor const *)arg1)->IsByte(arg2);
++ result = sentencepiece_SentencePieceProcessor__EncodeAsPiecesBatch((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< absl::string_view > const &)*arg2,arg3,arg4,arg5,arg6,arg7,arg8,arg9,arg10);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
+ }
+ }
+- resultobj = SWIG_From_bool(static_cast< bool >(result));
++ {
++ PyObject *input_type = resultobj;
++ resultobj = PyList_New((&result)->size());
++ for (size_t i = 0; i < (&result)->size(); ++i) {
++ PyObject *obj = PyList_New(result[i].size());
++ for (size_t j = 0; j < result[i].size(); ++j) {
++ PyList_SetItem(obj, j, MakePyOutputString(result[i][j], input_type));
++ }
++ PyList_SetItem(resultobj, i, obj);
++ }
++ }
++ {
++ delete arg2;
++ }
+ return resultobj;
+ fail:
++ {
++ delete arg2;
++ }
+ return NULL;
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_unk_id(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsSerializedProtoBatch(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
++ std::vector< absl::string_view > *arg2 = 0 ;
++ int arg3 ;
++ bool arg4 ;
++ int arg5 ;
++ float arg6 ;
++ bool arg7 ;
++ bool arg8 ;
++ bool arg9 ;
++ bool arg10 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- PyObject *swig_obj[1] ;
+- int result;
++ int val3 ;
++ int ecode3 = 0 ;
++ bool val4 ;
++ int ecode4 = 0 ;
++ int val5 ;
++ int ecode5 = 0 ;
++ float val6 ;
++ int ecode6 = 0 ;
++ bool val7 ;
++ int ecode7 = 0 ;
++ bool val8 ;
++ int ecode8 = 0 ;
++ bool val9 ;
++ int ecode9 = 0 ;
++ bool val10 ;
++ int ecode10 = 0 ;
++ PyObject *swig_obj[10] ;
++ BytesArray result;
+
+- if (!args) SWIG_fail;
+- swig_obj[0] = args;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__EncodeAsSerializedProtoBatch", 10, 10, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_unk_id" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__EncodeAsSerializedProtoBatch" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
++ {
++ std::vector<absl::string_view> *out = nullptr;
++ if (PyList_Check(swig_obj[1])) {
++ const size_t size = PyList_Size(swig_obj[1]);
++ out = new std::vector<absl::string_view>(size);
++ for (size_t i = 0; i < size; ++i) {
++ const PyInputString ustring(PyList_GetItem(swig_obj[1], i));
++ if (ustring.IsAvalable()) {
++ (*out)[i] = absl::string_view(ustring.data(), ustring.size());
++ } else {
++ PyErr_SetString(PyExc_TypeError, "list must contain strings");
++ SWIG_fail;
++ }
++ resultobj = ustring.input_type();
++ }
++ } else {
++ PyErr_SetString(PyExc_TypeError, "not a list");
++ SWIG_fail;
++ }
++ arg2 = out;
++ }
++ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
++ if (!SWIG_IsOK(ecode3)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__EncodeAsSerializedProtoBatch" "', argument " "3"" of type '" "int""'");
++ }
++ arg3 = static_cast< int >(val3);
++ ecode4 = SWIG_AsVal_bool(swig_obj[3], &val4);
++ if (!SWIG_IsOK(ecode4)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor__EncodeAsSerializedProtoBatch" "', argument " "4"" of type '" "bool""'");
++ }
++ arg4 = static_cast< bool >(val4);
++ ecode5 = SWIG_AsVal_int(swig_obj[4], &val5);
++ if (!SWIG_IsOK(ecode5)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor__EncodeAsSerializedProtoBatch" "', argument " "5"" of type '" "int""'");
++ }
++ arg5 = static_cast< int >(val5);
++ ecode6 = SWIG_AsVal_float(swig_obj[5], &val6);
++ if (!SWIG_IsOK(ecode6)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor__EncodeAsSerializedProtoBatch" "', argument " "6"" of type '" "float""'");
++ }
++ arg6 = static_cast< float >(val6);
++ ecode7 = SWIG_AsVal_bool(swig_obj[6], &val7);
++ if (!SWIG_IsOK(ecode7)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode7), "in method '" "SentencePieceProcessor__EncodeAsSerializedProtoBatch" "', argument " "7"" of type '" "bool""'");
++ }
++ arg7 = static_cast< bool >(val7);
++ ecode8 = SWIG_AsVal_bool(swig_obj[7], &val8);
++ if (!SWIG_IsOK(ecode8)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode8), "in method '" "SentencePieceProcessor__EncodeAsSerializedProtoBatch" "', argument " "8"" of type '" "bool""'");
++ }
++ arg8 = static_cast< bool >(val8);
++ ecode9 = SWIG_AsVal_bool(swig_obj[8], &val9);
++ if (!SWIG_IsOK(ecode9)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode9), "in method '" "SentencePieceProcessor__EncodeAsSerializedProtoBatch" "', argument " "9"" of type '" "bool""'");
++ }
++ arg9 = static_cast< bool >(val9);
++ ecode10 = SWIG_AsVal_bool(swig_obj[9], &val10);
++ if (!SWIG_IsOK(ecode10)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode10), "in method '" "SentencePieceProcessor__EncodeAsSerializedProtoBatch" "', argument " "10"" of type '" "bool""'");
++ }
++ arg10 = static_cast< bool >(val10);
+ {
+ try {
+- result = (int)((sentencepiece::SentencePieceProcessor const *)arg1)->unk_id();
++ result = sentencepiece_SentencePieceProcessor__EncodeAsSerializedProtoBatch((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< absl::string_view > const &)*arg2,arg3,arg4,arg5,arg6,arg7,arg8,arg9,arg10);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
+ }
+ }
+- resultobj = SWIG_From_int(static_cast< int >(result));
++ {
++ resultobj = PyList_New((&result)->size());
++ for (size_t i = 0; i < (&result)->size(); ++i) {
++ PyList_SetItem(resultobj, i, MakePyOutputBytes(result[i]));
++ }
++ }
++ {
++ delete arg2;
++ }
+ return resultobj;
+ fail:
++ {
++ delete arg2;
++ }
+ return NULL;
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_bos_id(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodeIds(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
++ std::vector< int > *arg2 = 0 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- PyObject *swig_obj[1] ;
+- int result;
++ PyObject *swig_obj[2] ;
++ std::string result;
+
+- if (!args) SWIG_fail;
+- swig_obj[0] = args;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__DecodeIds", 2, 2, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_bos_id" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__DecodeIds" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
++ {
++ std::vector<int> *out = nullptr;
++ if (PyList_Check(swig_obj[1])) {
++ const size_t size = PyList_Size(swig_obj[1]);
++ out = new std::vector<int>(size);
++ for (size_t i = 0; i < size; ++i) {
++ PyObject *o = PyList_GetItem(swig_obj[1], i);
++ if (PyInt_Check(o)) {
++ (*out)[i] = static_cast<int>(PyInt_AsLong(o));
++ } else {
++ PyErr_SetString(PyExc_TypeError,"list must contain integers");
++ SWIG_fail;
++ }
++ }
++ } else {
++ PyErr_SetString(PyExc_TypeError,"not a list");
++ SWIG_fail;
++ }
++ arg2 = out;
++ }
+ {
+ try {
+- result = (int)((sentencepiece::SentencePieceProcessor const *)arg1)->bos_id();
++ result = sentencepiece_SentencePieceProcessor__DecodeIds((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< int > const &)*arg2);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
+ }
+ }
+- resultobj = SWIG_From_int(static_cast< int >(result));
++ {
++ PyObject *input_type = resultobj;
++ resultobj = MakePyOutputString(result, input_type);
++ }
++ {
++ delete arg2;
++ }
+ return resultobj;
+ fail:
++ {
++ delete arg2;
++ }
+ return NULL;
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_eos_id(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodePieces(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
++ std::vector< std::string > *arg2 = 0 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- PyObject *swig_obj[1] ;
+- int result;
++ PyObject *swig_obj[2] ;
++ std::string result;
+
+- if (!args) SWIG_fail;
+- swig_obj[0] = args;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__DecodePieces", 2, 2, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_eos_id" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__DecodePieces" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
++ {
++ std::vector<std::string> *out = nullptr;
++ if (PyList_Check(swig_obj[1])) {
++ const size_t size = PyList_Size(swig_obj[1]);
++ out = new std::vector<std::string>(size);
++ for (size_t i = 0; i < size; ++i) {
++ const PyInputString ustring(PyList_GetItem(swig_obj[1], i));
++ if (ustring.IsAvalable()) {
++ (*out)[i].assign(ustring.data(), ustring.size());
++ } else {
++ PyErr_SetString(PyExc_TypeError, "list must contain strings");
++ SWIG_fail;
++ }
++ resultobj = ustring.input_type();
++ }
++ } else {
++ PyErr_SetString(PyExc_TypeError, "not a list");
++ SWIG_fail;
++ }
++ arg2 = out;
+ }
+- arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+ {
+ try {
+- result = (int)((sentencepiece::SentencePieceProcessor const *)arg1)->eos_id();
++ result = sentencepiece_SentencePieceProcessor__DecodePieces((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< std::string > const &)*arg2);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
+ }
+ }
+- resultobj = SWIG_From_int(static_cast< int >(result));
++ {
++ PyObject *input_type = resultobj;
++ resultobj = MakePyOutputString(result, input_type);
++ }
++ {
++ delete arg2;
++ }
+ return resultobj;
+ fail:
++ {
++ delete arg2;
++ }
+ return NULL;
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_pad_id(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodeIdsAsSerializedProto(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
++ std::vector< int > *arg2 = 0 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- PyObject *swig_obj[1] ;
+- int result;
++ PyObject *swig_obj[2] ;
++ sentencepiece::util::bytes result;
+
+- if (!args) SWIG_fail;
+- swig_obj[0] = args;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__DecodeIdsAsSerializedProto", 2, 2, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_pad_id" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__DecodeIdsAsSerializedProto" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
++ {
++ std::vector<int> *out = nullptr;
++ if (PyList_Check(swig_obj[1])) {
++ const size_t size = PyList_Size(swig_obj[1]);
++ out = new std::vector<int>(size);
++ for (size_t i = 0; i < size; ++i) {
++ PyObject *o = PyList_GetItem(swig_obj[1], i);
++ if (PyInt_Check(o)) {
++ (*out)[i] = static_cast<int>(PyInt_AsLong(o));
++ } else {
++ PyErr_SetString(PyExc_TypeError,"list must contain integers");
++ SWIG_fail;
++ }
++ }
++ } else {
++ PyErr_SetString(PyExc_TypeError,"not a list");
++ SWIG_fail;
++ }
++ arg2 = out;
++ }
+ {
+ try {
+- result = (int)((sentencepiece::SentencePieceProcessor const *)arg1)->pad_id();
++ result = sentencepiece_SentencePieceProcessor__DecodeIdsAsSerializedProto((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< int > const &)*arg2);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
+ }
+ }
+- resultobj = SWIG_From_int(static_cast< int >(result));
++ {
++ resultobj = MakePyOutputBytes(result);
++ }
++ {
++ delete arg2;
++ }
+ return resultobj;
+ fail:
++ {
++ delete arg2;
++ }
+ return NULL;
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_serialized_model_proto(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodePiecesAsSerializedProto(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
++ std::vector< std::string > *arg2 = 0 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- PyObject *swig_obj[1] ;
++ PyObject *swig_obj[2] ;
+ sentencepiece::util::bytes result;
+
+- if (!args) SWIG_fail;
+- swig_obj[0] = args;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__DecodePiecesAsSerializedProto", 2, 2, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_serialized_model_proto" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__DecodePiecesAsSerializedProto" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
++ {
++ std::vector<std::string> *out = nullptr;
++ if (PyList_Check(swig_obj[1])) {
++ const size_t size = PyList_Size(swig_obj[1]);
++ out = new std::vector<std::string>(size);
++ for (size_t i = 0; i < size; ++i) {
++ const PyInputString ustring(PyList_GetItem(swig_obj[1], i));
++ if (ustring.IsAvalable()) {
++ (*out)[i].assign(ustring.data(), ustring.size());
++ } else {
++ PyErr_SetString(PyExc_TypeError, "list must contain strings");
++ SWIG_fail;
++ }
++ resultobj = ustring.input_type();
++ }
++ } else {
++ PyErr_SetString(PyExc_TypeError, "not a list");
++ SWIG_fail;
++ }
++ arg2 = out;
++ }
+ {
+ try {
+- result = ((sentencepiece::SentencePieceProcessor const *)arg1)->serialized_model_proto();
++ result = sentencepiece_SentencePieceProcessor__DecodePiecesAsSerializedProto((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< std::string > const &)*arg2);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+@@ -5096,39 +5617,74 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_serialized_model_proto(PyObjec
+ {
+ resultobj = MakePyOutputBytes(result);
+ }
++ {
++ delete arg2;
++ }
+ return resultobj;
+ fail:
++ {
++ delete arg2;
++ }
+ return NULL;
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_LoadFromFile(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodeIdsBatch(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- absl::string_view arg2 ;
++ std::vector< std::vector< int > > *arg2 = 0 ;
++ int arg3 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- PyObject *swig_obj[2] ;
+- sentencepiece::util::Status result;
++ int val3 ;
++ int ecode3 = 0 ;
++ PyObject *swig_obj[3] ;
++ std::vector< std::string > result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_LoadFromFile", 2, 2, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__DecodeIdsBatch", 3, 3, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_LoadFromFile" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__DecodeIdsBatch" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+ {
+- const PyInputString ustring(swig_obj[1]);
+- if (!ustring.IsAvalable()) {
+- PyErr_SetString(PyExc_TypeError, "not a string");
++ std::vector<std::vector<int>> *out = nullptr;
++ if (PyList_Check(swig_obj[1])) {
++ const size_t size = PyList_Size(swig_obj[1]);
++ out = new std::vector<std::vector<int>>(size);
++ for (size_t i = 0; i < size; ++i) {
++ PyObject *o = PyList_GetItem(swig_obj[1], i);
++ if (PyList_Check(o)) {
++ const size_t size2 = PyList_Size(o);
++ (*out)[i].resize(size2);
++ for (size_t j = 0; j < size2; ++j) {
++ PyObject *o2 = PyList_GetItem(o, j);
++ if (PyInt_Check(o2)) {
++ (*out)[i][j] = static_cast<int>(PyInt_AsLong(o2));
++ } else {
++ PyErr_SetString(PyExc_TypeError, "list must contain integers");
++ SWIG_fail;
++ }
++ }
++ } else {
++ PyErr_SetString(PyExc_TypeError, "not a list");
++ SWIG_fail;
++ }
++ }
++ } else {
++ PyErr_SetString(PyExc_TypeError,"not a list");
+ SWIG_fail;
+ }
+- resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
++ arg2 = out;
+ }
++ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
++ if (!SWIG_IsOK(ecode3)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__DecodeIdsBatch" "', argument " "3"" of type '" "int""'");
++ }
++ arg3 = static_cast< int >(val3);
+ {
+ try {
+- result = sentencepiece_SentencePieceProcessor_LoadFromFile(arg1,arg2);
++ result = sentencepiece_SentencePieceProcessor__DecodeIdsBatch((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< std::vector< int > > const &)*arg2,arg3);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+@@ -5136,43 +5692,63 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_LoadFromFile(PyObject *SWIGUNU
+ }
+ }
+ {
+- if (!(&result)->ok()) {
+- SWIG_exception(ToSwigError((&result)->code()), (&result)->ToString().c_str());
++ PyObject *input_type = resultobj;
++ resultobj = PyList_New((&result)->size());
++ for (size_t i = 0; i < (&result)->size(); ++i) {
++ PyList_SetItem(resultobj, i, MakePyOutputString(result[i], input_type));
+ }
+- resultobj = SWIG_From_bool((&result)->ok());
++ }
++ {
++ delete arg2;
+ }
+ return resultobj;
+ fail:
++ {
++ delete arg2;
++ }
+ return NULL;
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_DecodeIdsWithCheck(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodeIdsAsSerializedProtoBatch(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- std::vector< int > *arg2 = 0 ;
++ std::vector< std::vector< int > > *arg2 = 0 ;
++ int arg3 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- PyObject *swig_obj[2] ;
+- std::string result;
++ int val3 ;
++ int ecode3 = 0 ;
++ PyObject *swig_obj[3] ;
++ BytesArray result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_DecodeIdsWithCheck", 2, 2, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__DecodeIdsAsSerializedProtoBatch", 3, 3, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_DecodeIdsWithCheck" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__DecodeIdsAsSerializedProtoBatch" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+ {
+- std::vector<int> *out = nullptr;
++ std::vector<std::vector<int>> *out = nullptr;
+ if (PyList_Check(swig_obj[1])) {
+ const size_t size = PyList_Size(swig_obj[1]);
+- out = new std::vector<int>(size);
++ out = new std::vector<std::vector<int>>(size);
+ for (size_t i = 0; i < size; ++i) {
+ PyObject *o = PyList_GetItem(swig_obj[1], i);
+- if (PyInt_Check(o)) {
+- (*out)[i] = static_cast<int>(PyInt_AsLong(o));
++ if (PyList_Check(o)) {
++ const size_t size2 = PyList_Size(o);
++ (*out)[i].resize(size2);
++ for (size_t j = 0; j < size2; ++j) {
++ PyObject *o2 = PyList_GetItem(o, j);
++ if (PyInt_Check(o2)) {
++ (*out)[i][j] = static_cast<int>(PyInt_AsLong(o2));
++ } else {
++ PyErr_SetString(PyExc_TypeError, "list must contain integers");
++ SWIG_fail;
++ }
++ }
+ } else {
+- PyErr_SetString(PyExc_TypeError,"list must contain integers");
++ PyErr_SetString(PyExc_TypeError, "not a list");
+ SWIG_fail;
+ }
+ }
+@@ -5182,9 +5758,14 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_DecodeIdsWithCheck(PyObject *S
+ }
+ arg2 = out;
+ }
++ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
++ if (!SWIG_IsOK(ecode3)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__DecodeIdsAsSerializedProtoBatch" "', argument " "3"" of type '" "int""'");
++ }
++ arg3 = static_cast< int >(val3);
+ {
+ try {
+- result = sentencepiece_SentencePieceProcessor_DecodeIdsWithCheck((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< int > const &)*arg2);
++ result = sentencepiece_SentencePieceProcessor__DecodeIdsAsSerializedProtoBatch((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< std::vector< int > > const &)*arg2,arg3);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+@@ -5192,8 +5773,10 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_DecodeIdsWithCheck(PyObject *S
+ }
+ }
+ {
+- PyObject *input_type = resultobj;
+- resultobj = MakePyOutputString(result, input_type);
++ resultobj = PyList_New((&result)->size());
++ for (size_t i = 0; i < (&result)->size(); ++i) {
++ PyList_SetItem(resultobj, i, MakePyOutputBytes(result[i]));
++ }
+ }
+ {
+ delete arg2;
+@@ -5207,32 +5790,46 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_DecodeIdsAsSerializedProtoWithCheck(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodePiecesBatch(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- std::vector< int > *arg2 = 0 ;
++ std::vector< std::vector< std::string > > *arg2 = 0 ;
++ int arg3 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- PyObject *swig_obj[2] ;
+- sentencepiece::util::bytes result;
++ int val3 ;
++ int ecode3 = 0 ;
++ PyObject *swig_obj[3] ;
++ std::vector< std::string > result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_DecodeIdsAsSerializedProtoWithCheck", 2, 2, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__DecodePiecesBatch", 3, 3, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_DecodeIdsAsSerializedProtoWithCheck" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__DecodePiecesBatch" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+ {
+- std::vector<int> *out = nullptr;
++ std::vector<std::vector<std::string>> *out = nullptr;
+ if (PyList_Check(swig_obj[1])) {
+ const size_t size = PyList_Size(swig_obj[1]);
+- out = new std::vector<int>(size);
++ out = new std::vector<std::vector<std::string>>(size);
+ for (size_t i = 0; i < size; ++i) {
+ PyObject *o = PyList_GetItem(swig_obj[1], i);
+- if (PyInt_Check(o)) {
+- (*out)[i] = static_cast<int>(PyInt_AsLong(o));
++ if (PyList_Check(o)) {
++ const size_t size2 = PyList_Size(o);
++ (*out)[i].resize(size2);
++ for (size_t j = 0; j < size2; ++j) {
++ const PyInputString ustring(PyList_GetItem(o, j));
++ if (ustring.IsAvalable()) {
++ (*out)[i][j].assign(ustring.data(), ustring.size());
++ } else {
++ PyErr_SetString(PyExc_TypeError,"list must contain strings");
++ SWIG_fail;
++ }
++ resultobj = ustring.input_type();
++ }
+ } else {
+- PyErr_SetString(PyExc_TypeError,"list must contain integers");
++ PyErr_SetString(PyExc_TypeError,"not a list");
+ SWIG_fail;
+ }
+ }
+@@ -5242,9 +5839,14 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_DecodeIdsAsSerializedProtoWith
+ }
+ arg2 = out;
+ }
++ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
++ if (!SWIG_IsOK(ecode3)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__DecodePiecesBatch" "', argument " "3"" of type '" "int""'");
++ }
++ arg3 = static_cast< int >(val3);
+ {
+ try {
+- result = sentencepiece_SentencePieceProcessor_DecodeIdsAsSerializedProtoWithCheck((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< int > const &)*arg2);
++ result = sentencepiece_SentencePieceProcessor__DecodePiecesBatch((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< std::vector< std::string > > const &)*arg2,arg3);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+@@ -5252,7 +5854,11 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_DecodeIdsAsSerializedProtoWith
+ }
+ }
+ {
+- resultobj = MakePyOutputBytes(result);
++ PyObject *input_type = resultobj;
++ resultobj = PyList_New((&result)->size());
++ for (size_t i = 0; i < (&result)->size(); ++i) {
++ PyList_SetItem(resultobj, i, MakePyOutputString(result[i], input_type));
++ }
+ }
+ {
+ delete arg2;
+@@ -5266,81 +5872,63 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsIds(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodePiecesAsSerializedProtoBatch(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- absl::string_view arg2 ;
+- bool arg3 ;
+- int arg4 ;
+- float arg5 ;
+- bool arg6 ;
+- bool arg7 ;
+- bool arg8 ;
++ std::vector< std::vector< std::string > > *arg2 = 0 ;
++ int arg3 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- bool val3 ;
++ int val3 ;
+ int ecode3 = 0 ;
+- int val4 ;
+- int ecode4 = 0 ;
+- float val5 ;
+- int ecode5 = 0 ;
+- bool val6 ;
+- int ecode6 = 0 ;
+- bool val7 ;
+- int ecode7 = 0 ;
+- bool val8 ;
+- int ecode8 = 0 ;
+- PyObject *swig_obj[8] ;
+- std::vector< int > result;
++ PyObject *swig_obj[3] ;
++ BytesArray result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__EncodeAsIds", 8, 8, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__DecodePiecesAsSerializedProtoBatch", 3, 3, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__EncodeAsIds" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor *""'");
+- }
+- arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+- {
+- const PyInputString ustring(swig_obj[1]);
+- if (!ustring.IsAvalable()) {
+- PyErr_SetString(PyExc_TypeError, "not a string");
+- SWIG_fail;
+- }
+- resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__DecodePiecesAsSerializedProtoBatch" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+- ecode3 = SWIG_AsVal_bool(swig_obj[2], &val3);
+- if (!SWIG_IsOK(ecode3)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__EncodeAsIds" "', argument " "3"" of type '" "bool""'");
+- }
+- arg3 = static_cast< bool >(val3);
+- ecode4 = SWIG_AsVal_int(swig_obj[3], &val4);
+- if (!SWIG_IsOK(ecode4)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor__EncodeAsIds" "', argument " "4"" of type '" "int""'");
+- }
+- arg4 = static_cast< int >(val4);
+- ecode5 = SWIG_AsVal_float(swig_obj[4], &val5);
+- if (!SWIG_IsOK(ecode5)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor__EncodeAsIds" "', argument " "5"" of type '" "float""'");
+- }
+- arg5 = static_cast< float >(val5);
+- ecode6 = SWIG_AsVal_bool(swig_obj[5], &val6);
+- if (!SWIG_IsOK(ecode6)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor__EncodeAsIds" "', argument " "6"" of type '" "bool""'");
+- }
+- arg6 = static_cast< bool >(val6);
+- ecode7 = SWIG_AsVal_bool(swig_obj[6], &val7);
+- if (!SWIG_IsOK(ecode7)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode7), "in method '" "SentencePieceProcessor__EncodeAsIds" "', argument " "7"" of type '" "bool""'");
+- }
+- arg7 = static_cast< bool >(val7);
+- ecode8 = SWIG_AsVal_bool(swig_obj[7], &val8);
+- if (!SWIG_IsOK(ecode8)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode8), "in method '" "SentencePieceProcessor__EncodeAsIds" "', argument " "8"" of type '" "bool""'");
++ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
++ {
++ std::vector<std::vector<std::string>> *out = nullptr;
++ if (PyList_Check(swig_obj[1])) {
++ const size_t size = PyList_Size(swig_obj[1]);
++ out = new std::vector<std::vector<std::string>>(size);
++ for (size_t i = 0; i < size; ++i) {
++ PyObject *o = PyList_GetItem(swig_obj[1], i);
++ if (PyList_Check(o)) {
++ const size_t size2 = PyList_Size(o);
++ (*out)[i].resize(size2);
++ for (size_t j = 0; j < size2; ++j) {
++ const PyInputString ustring(PyList_GetItem(o, j));
++ if (ustring.IsAvalable()) {
++ (*out)[i][j].assign(ustring.data(), ustring.size());
++ } else {
++ PyErr_SetString(PyExc_TypeError,"list must contain strings");
++ SWIG_fail;
++ }
++ resultobj = ustring.input_type();
++ }
++ } else {
++ PyErr_SetString(PyExc_TypeError,"not a list");
++ SWIG_fail;
++ }
++ }
++ } else {
++ PyErr_SetString(PyExc_TypeError,"not a list");
++ SWIG_fail;
++ }
++ arg2 = out;
++ }
++ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
++ if (!SWIG_IsOK(ecode3)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__DecodePiecesAsSerializedProtoBatch" "', argument " "3"" of type '" "int""'");
+ }
+- arg8 = static_cast< bool >(val8);
++ arg3 = static_cast< int >(val3);
+ {
+ try {
+- result = sentencepiece_SentencePieceProcessor__EncodeAsIds(arg1,arg2,arg3,arg4,arg5,arg6,arg7,arg8);
++ result = sentencepiece_SentencePieceProcessor__DecodePiecesAsSerializedProtoBatch((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< std::vector< std::string > > const &)*arg2,arg3);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+@@ -5350,49 +5938,49 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsIds(PyObject *SWIGUNU
+ {
+ resultobj = PyList_New((&result)->size());
+ for (size_t i = 0; i < (&result)->size(); ++i) {
+- PyList_SetItem(resultobj, i, PyInt_FromLong(static_cast<long>(result[i])));
++ PyList_SetItem(resultobj, i, MakePyOutputBytes(result[i]));
+ }
+ }
++ {
++ delete arg2;
++ }
+ return resultobj;
+ fail:
++ {
++ delete arg2;
++ }
+ return NULL;
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsPieces(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__NBestEncodeAsIds(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+ absl::string_view arg2 ;
+- bool arg3 ;
+- int arg4 ;
+- float arg5 ;
++ int arg3 ;
++ bool arg4 ;
++ bool arg5 ;
+ bool arg6 ;
+ bool arg7 ;
+- bool arg8 ;
+- bool arg9 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- bool val3 ;
++ int val3 ;
+ int ecode3 = 0 ;
+- int val4 ;
++ bool val4 ;
+ int ecode4 = 0 ;
+- float val5 ;
++ bool val5 ;
+ int ecode5 = 0 ;
+ bool val6 ;
+ int ecode6 = 0 ;
+ bool val7 ;
+ int ecode7 = 0 ;
+- bool val8 ;
+- int ecode8 = 0 ;
+- bool val9 ;
+- int ecode9 = 0 ;
+- PyObject *swig_obj[9] ;
+- std::vector< std::string > result;
++ PyObject *swig_obj[7] ;
++ std::vector< std::vector< int > > result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__EncodeAsPieces", 9, 9, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__NBestEncodeAsIds", 7, 7, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__EncodeAsPieces" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__NBestEncodeAsIds" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+ {
+@@ -5404,44 +5992,34 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsPieces(PyObject *SWIG
+ resultobj = ustring.input_type();
+ arg2 = absl::string_view(ustring.data(), ustring.size());
+ }
+- ecode3 = SWIG_AsVal_bool(swig_obj[2], &val3);
++ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
+ if (!SWIG_IsOK(ecode3)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__EncodeAsPieces" "', argument " "3"" of type '" "bool""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__NBestEncodeAsIds" "', argument " "3"" of type '" "int""'");
+ }
+- arg3 = static_cast< bool >(val3);
+- ecode4 = SWIG_AsVal_int(swig_obj[3], &val4);
++ arg3 = static_cast< int >(val3);
++ ecode4 = SWIG_AsVal_bool(swig_obj[3], &val4);
+ if (!SWIG_IsOK(ecode4)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor__EncodeAsPieces" "', argument " "4"" of type '" "int""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor__NBestEncodeAsIds" "', argument " "4"" of type '" "bool""'");
+ }
+- arg4 = static_cast< int >(val4);
+- ecode5 = SWIG_AsVal_float(swig_obj[4], &val5);
++ arg4 = static_cast< bool >(val4);
++ ecode5 = SWIG_AsVal_bool(swig_obj[4], &val5);
+ if (!SWIG_IsOK(ecode5)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor__EncodeAsPieces" "', argument " "5"" of type '" "float""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor__NBestEncodeAsIds" "', argument " "5"" of type '" "bool""'");
+ }
+- arg5 = static_cast< float >(val5);
++ arg5 = static_cast< bool >(val5);
+ ecode6 = SWIG_AsVal_bool(swig_obj[5], &val6);
+ if (!SWIG_IsOK(ecode6)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor__EncodeAsPieces" "', argument " "6"" of type '" "bool""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor__NBestEncodeAsIds" "', argument " "6"" of type '" "bool""'");
+ }
+ arg6 = static_cast< bool >(val6);
+ ecode7 = SWIG_AsVal_bool(swig_obj[6], &val7);
+ if (!SWIG_IsOK(ecode7)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode7), "in method '" "SentencePieceProcessor__EncodeAsPieces" "', argument " "7"" of type '" "bool""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode7), "in method '" "SentencePieceProcessor__NBestEncodeAsIds" "', argument " "7"" of type '" "bool""'");
+ }
+ arg7 = static_cast< bool >(val7);
+- ecode8 = SWIG_AsVal_bool(swig_obj[7], &val8);
+- if (!SWIG_IsOK(ecode8)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode8), "in method '" "SentencePieceProcessor__EncodeAsPieces" "', argument " "8"" of type '" "bool""'");
+- }
+- arg8 = static_cast< bool >(val8);
+- ecode9 = SWIG_AsVal_bool(swig_obj[8], &val9);
+- if (!SWIG_IsOK(ecode9)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode9), "in method '" "SentencePieceProcessor__EncodeAsPieces" "', argument " "9"" of type '" "bool""'");
+- }
+- arg9 = static_cast< bool >(val9);
+ {
+ try {
+- result = sentencepiece_SentencePieceProcessor__EncodeAsPieces(arg1,arg2,arg3,arg4,arg5,arg6,arg7,arg8,arg9);
++ result = sentencepiece_SentencePieceProcessor__NBestEncodeAsIds((sentencepiece::SentencePieceProcessor const *)arg1,arg2,arg3,arg4,arg5,arg6,arg7);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+@@ -5449,10 +6027,13 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsPieces(PyObject *SWIG
+ }
+ }
+ {
+- PyObject *input_type = resultobj;
+ resultobj = PyList_New((&result)->size());
+ for (size_t i = 0; i < (&result)->size(); ++i) {
+- PyList_SetItem(resultobj, i, MakePyOutputString(result[i], input_type));
++ PyObject *obj = PyList_New(result[i].size());
++ for (size_t j = 0; j < result[i].size(); ++j) {
++ PyList_SetItem(obj, j, PyInt_FromLong(static_cast<long>(result[i][j])));
++ }
++ PyList_SetItem(resultobj, i, obj);
+ }
+ }
+ return resultobj;
+@@ -5461,7 +6042,7 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor__NBestEncodeAsIds(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__NBestEncodeAsPieces(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+ absl::string_view arg2 ;
+@@ -5469,6 +6050,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__NBestEncodeAsIds(PyObject *SW
+ bool arg4 ;
+ bool arg5 ;
+ bool arg6 ;
++ bool arg7 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+ int val3 ;
+@@ -5479,13 +6061,15 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__NBestEncodeAsIds(PyObject *SW
+ int ecode5 = 0 ;
+ bool val6 ;
+ int ecode6 = 0 ;
+- PyObject *swig_obj[6] ;
+- std::vector< std::vector< int > > result;
++ bool val7 ;
++ int ecode7 = 0 ;
++ PyObject *swig_obj[7] ;
++ std::vector< std::vector< std::string > > result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__NBestEncodeAsIds", 6, 6, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__NBestEncodeAsPieces", 7, 7, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__NBestEncodeAsIds" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__NBestEncodeAsPieces" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+ {
+@@ -5499,27 +6083,32 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__NBestEncodeAsIds(PyObject *SW
+ }
+ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
+ if (!SWIG_IsOK(ecode3)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__NBestEncodeAsIds" "', argument " "3"" of type '" "int""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__NBestEncodeAsPieces" "', argument " "3"" of type '" "int""'");
+ }
+ arg3 = static_cast< int >(val3);
+ ecode4 = SWIG_AsVal_bool(swig_obj[3], &val4);
+ if (!SWIG_IsOK(ecode4)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor__NBestEncodeAsIds" "', argument " "4"" of type '" "bool""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor__NBestEncodeAsPieces" "', argument " "4"" of type '" "bool""'");
+ }
+ arg4 = static_cast< bool >(val4);
+ ecode5 = SWIG_AsVal_bool(swig_obj[4], &val5);
+ if (!SWIG_IsOK(ecode5)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor__NBestEncodeAsIds" "', argument " "5"" of type '" "bool""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor__NBestEncodeAsPieces" "', argument " "5"" of type '" "bool""'");
+ }
+ arg5 = static_cast< bool >(val5);
+ ecode6 = SWIG_AsVal_bool(swig_obj[5], &val6);
+ if (!SWIG_IsOK(ecode6)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor__NBestEncodeAsIds" "', argument " "6"" of type '" "bool""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor__NBestEncodeAsPieces" "', argument " "6"" of type '" "bool""'");
+ }
+ arg6 = static_cast< bool >(val6);
++ ecode7 = SWIG_AsVal_bool(swig_obj[6], &val7);
++ if (!SWIG_IsOK(ecode7)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode7), "in method '" "SentencePieceProcessor__NBestEncodeAsPieces" "', argument " "7"" of type '" "bool""'");
++ }
++ arg7 = static_cast< bool >(val7);
+ {
+ try {
+- result = sentencepiece_SentencePieceProcessor__NBestEncodeAsIds(arg1,arg2,arg3,arg4,arg5,arg6);
++ result = sentencepiece_SentencePieceProcessor__NBestEncodeAsPieces((sentencepiece::SentencePieceProcessor const *)arg1,arg2,arg3,arg4,arg5,arg6,arg7);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+@@ -5527,11 +6116,12 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__NBestEncodeAsIds(PyObject *SW
+ }
+ }
+ {
++ PyObject *input_type = resultobj;
+ resultobj = PyList_New((&result)->size());
+ for (size_t i = 0; i < (&result)->size(); ++i) {
+ PyObject *obj = PyList_New(result[i].size());
+ for (size_t j = 0; j < result[i].size(); ++j) {
+- PyList_SetItem(obj, j, PyInt_FromLong(static_cast<long>(result[i][j])));
++ PyList_SetItem(obj, j, MakePyOutputString(result[i][j], input_type));
+ }
+ PyList_SetItem(resultobj, i, obj);
+ }
+@@ -5542,7 +6132,7 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor__NBestEncodeAsPieces(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__NBestEncodeAsSerializedProto(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+ absl::string_view arg2 ;
+@@ -5564,12 +6154,12 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__NBestEncodeAsPieces(PyObject
+ bool val7 ;
+ int ecode7 = 0 ;
+ PyObject *swig_obj[7] ;
+- std::vector< std::vector< std::string > > result;
++ sentencepiece::util::bytes result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__NBestEncodeAsPieces", 7, 7, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__NBestEncodeAsSerializedProto", 7, 7, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__NBestEncodeAsPieces" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__NBestEncodeAsSerializedProto" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+ {
+@@ -5583,32 +6173,32 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__NBestEncodeAsPieces(PyObject
+ }
+ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
+ if (!SWIG_IsOK(ecode3)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__NBestEncodeAsPieces" "', argument " "3"" of type '" "int""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__NBestEncodeAsSerializedProto" "', argument " "3"" of type '" "int""'");
+ }
+ arg3 = static_cast< int >(val3);
+ ecode4 = SWIG_AsVal_bool(swig_obj[3], &val4);
+ if (!SWIG_IsOK(ecode4)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor__NBestEncodeAsPieces" "', argument " "4"" of type '" "bool""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor__NBestEncodeAsSerializedProto" "', argument " "4"" of type '" "bool""'");
+ }
+ arg4 = static_cast< bool >(val4);
+ ecode5 = SWIG_AsVal_bool(swig_obj[4], &val5);
+ if (!SWIG_IsOK(ecode5)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor__NBestEncodeAsPieces" "', argument " "5"" of type '" "bool""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor__NBestEncodeAsSerializedProto" "', argument " "5"" of type '" "bool""'");
+ }
+ arg5 = static_cast< bool >(val5);
+ ecode6 = SWIG_AsVal_bool(swig_obj[5], &val6);
+ if (!SWIG_IsOK(ecode6)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor__NBestEncodeAsPieces" "', argument " "6"" of type '" "bool""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor__NBestEncodeAsSerializedProto" "', argument " "6"" of type '" "bool""'");
+ }
+ arg6 = static_cast< bool >(val6);
+ ecode7 = SWIG_AsVal_bool(swig_obj[6], &val7);
+ if (!SWIG_IsOK(ecode7)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode7), "in method '" "SentencePieceProcessor__NBestEncodeAsPieces" "', argument " "7"" of type '" "bool""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode7), "in method '" "SentencePieceProcessor__NBestEncodeAsSerializedProto" "', argument " "7"" of type '" "bool""'");
+ }
+ arg7 = static_cast< bool >(val7);
+ {
+ try {
+- result = sentencepiece_SentencePieceProcessor__NBestEncodeAsPieces(arg1,arg2,arg3,arg4,arg5,arg6,arg7);
++ result = sentencepiece_SentencePieceProcessor__NBestEncodeAsSerializedProto((sentencepiece::SentencePieceProcessor const *)arg1,arg2,arg3,arg4,arg5,arg6,arg7);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+@@ -5616,15 +6206,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__NBestEncodeAsPieces(PyObject
+ }
+ }
+ {
+- PyObject *input_type = resultobj;
+- resultobj = PyList_New((&result)->size());
+- for (size_t i = 0; i < (&result)->size(); ++i) {
+- PyObject *obj = PyList_New(result[i].size());
+- for (size_t j = 0; j < result[i].size(); ++j) {
+- PyList_SetItem(obj, j, MakePyOutputString(result[i][j], input_type));
+- }
+- PyList_SetItem(resultobj, i, obj);
+- }
++ resultobj = MakePyOutputBytes(result);
+ }
+ return resultobj;
+ fail:
+@@ -5643,6 +6225,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__SampleEncodeAndScoreAsIds(PyO
+ bool arg7 ;
+ bool arg8 ;
+ bool arg9 ;
++ bool arg10 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+ int val3 ;
+@@ -5659,13 +6242,15 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__SampleEncodeAndScoreAsIds(PyO
+ int ecode8 = 0 ;
+ bool val9 ;
+ int ecode9 = 0 ;
+- PyObject *swig_obj[9] ;
++ bool val10 ;
++ int ecode10 = 0 ;
++ PyObject *swig_obj[10] ;
+ std::vector< std::pair< std::vector< int >,float > > result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__SampleEncodeAndScoreAsIds", 9, 9, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__SampleEncodeAndScoreAsIds", 10, 10, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__SampleEncodeAndScoreAsIds" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__SampleEncodeAndScoreAsIds" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+ {
+@@ -5712,9 +6297,14 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__SampleEncodeAndScoreAsIds(PyO
+ SWIG_exception_fail(SWIG_ArgError(ecode9), "in method '" "SentencePieceProcessor__SampleEncodeAndScoreAsIds" "', argument " "9"" of type '" "bool""'");
+ }
+ arg9 = static_cast< bool >(val9);
++ ecode10 = SWIG_AsVal_bool(swig_obj[9], &val10);
++ if (!SWIG_IsOK(ecode10)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode10), "in method '" "SentencePieceProcessor__SampleEncodeAndScoreAsIds" "', argument " "10"" of type '" "bool""'");
++ }
++ arg10 = static_cast< bool >(val10);
+ {
+ try {
+- result = sentencepiece_SentencePieceProcessor__SampleEncodeAndScoreAsIds(arg1,arg2,arg3,arg4,arg5,arg6,arg7,arg8,arg9);
++ result = sentencepiece_SentencePieceProcessor__SampleEncodeAndScoreAsIds((sentencepiece::SentencePieceProcessor const *)arg1,arg2,arg3,arg4,arg5,arg6,arg7,arg8,arg9,arg10);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+@@ -5773,7 +6363,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__SampleEncodeAndScoreAsPieces(
+ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__SampleEncodeAndScoreAsPieces", 10, 10, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__SampleEncodeAndScoreAsPieces" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__SampleEncodeAndScoreAsPieces" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+ {
+@@ -5827,7 +6417,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__SampleEncodeAndScoreAsPieces(
+ arg10 = static_cast< bool >(val10);
+ {
+ try {
+- result = sentencepiece_SentencePieceProcessor__SampleEncodeAndScoreAsPieces(arg1,arg2,arg3,arg4,arg5,arg6,arg7,arg8,arg9,arg10);
++ result = sentencepiece_SentencePieceProcessor__SampleEncodeAndScoreAsPieces((sentencepiece::SentencePieceProcessor const *)arg1,arg2,arg3,arg4,arg5,arg6,arg7,arg8,arg9,arg10);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+@@ -5851,6 +6441,133 @@ fail:
+ }
+
+
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__CalculateEntropy(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
++ absl::string_view arg2 ;
++ float arg3 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ float val3 ;
++ int ecode3 = 0 ;
++ PyObject *swig_obj[3] ;
++ float result;
++
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__CalculateEntropy", 3, 3, swig_obj)) SWIG_fail;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__CalculateEntropy" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
++ {
++ const PyInputString ustring(swig_obj[1]);
++ if (!ustring.IsAvalable()) {
++ PyErr_SetString(PyExc_TypeError, "not a string");
++ SWIG_fail;
++ }
++ resultobj = ustring.input_type();
++ arg2 = absl::string_view(ustring.data(), ustring.size());
++ }
++ ecode3 = SWIG_AsVal_float(swig_obj[2], &val3);
++ if (!SWIG_IsOK(ecode3)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__CalculateEntropy" "', argument " "3"" of type '" "float""'");
++ }
++ arg3 = static_cast< float >(val3);
++ {
++ try {
++ result = (float)sentencepiece_SentencePieceProcessor__CalculateEntropy(arg1,arg2,arg3);
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ resultobj = SWIG_From_float(static_cast< float >(result));
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__CalculateEntropyBatch(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
++ std::vector< absl::string_view > *arg2 = 0 ;
++ float arg3 ;
++ int arg4 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ float val3 ;
++ int ecode3 = 0 ;
++ int val4 ;
++ int ecode4 = 0 ;
++ PyObject *swig_obj[4] ;
++ std::vector< float > result;
++
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__CalculateEntropyBatch", 4, 4, swig_obj)) SWIG_fail;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__CalculateEntropyBatch" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
++ {
++ std::vector<absl::string_view> *out = nullptr;
++ if (PyList_Check(swig_obj[1])) {
++ const size_t size = PyList_Size(swig_obj[1]);
++ out = new std::vector<absl::string_view>(size);
++ for (size_t i = 0; i < size; ++i) {
++ const PyInputString ustring(PyList_GetItem(swig_obj[1], i));
++ if (ustring.IsAvalable()) {
++ (*out)[i] = absl::string_view(ustring.data(), ustring.size());
++ } else {
++ PyErr_SetString(PyExc_TypeError, "list must contain strings");
++ SWIG_fail;
++ }
++ resultobj = ustring.input_type();
++ }
++ } else {
++ PyErr_SetString(PyExc_TypeError, "not a list");
++ SWIG_fail;
++ }
++ arg2 = out;
++ }
++ ecode3 = SWIG_AsVal_float(swig_obj[2], &val3);
++ if (!SWIG_IsOK(ecode3)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__CalculateEntropyBatch" "', argument " "3"" of type '" "float""'");
++ }
++ arg3 = static_cast< float >(val3);
++ ecode4 = SWIG_AsVal_int(swig_obj[3], &val4);
++ if (!SWIG_IsOK(ecode4)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor__CalculateEntropyBatch" "', argument " "4"" of type '" "int""'");
++ }
++ arg4 = static_cast< int >(val4);
++ {
++ try {
++ result = sentencepiece_SentencePieceProcessor__CalculateEntropyBatch(arg1,(std::vector< absl::string_view > const &)*arg2,arg3,arg4);
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ {
++ resultobj = PyList_New((&result)->size());
++ for (size_t i = 0; i < (&result)->size(); ++i) {
++ PyList_SetItem(resultobj, i, PyFloat_FromDouble(static_cast<double>(result[i])));
++ }
++ }
++ {
++ delete arg2;
++ }
++ return resultobj;
++fail:
++ {
++ delete arg2;
++ }
++ return NULL;
++}
++
++
+ SWIGINTERN PyObject *SentencePieceProcessor_swigregister(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *obj;
+ if (!SWIG_Python_UnpackTuple(args, "swigregister", 1, 1, &obj)) return NULL;
+@@ -6191,20 +6908,9 @@ static PyMethodDef SwigMethods[] = {
+ { "SentencePieceProcessor_SetVocabulary", _wrap_SentencePieceProcessor_SetVocabulary, METH_VARARGS, NULL},
+ { "SentencePieceProcessor_ResetVocabulary", _wrap_SentencePieceProcessor_ResetVocabulary, METH_O, NULL},
+ { "SentencePieceProcessor_LoadVocabulary", _wrap_SentencePieceProcessor_LoadVocabulary, METH_VARARGS, NULL},
+- { "SentencePieceProcessor_EncodeAsPieces", _wrap_SentencePieceProcessor_EncodeAsPieces, METH_VARARGS, NULL},
+- { "SentencePieceProcessor_EncodeAsIds", _wrap_SentencePieceProcessor_EncodeAsIds, METH_VARARGS, NULL},
+- { "SentencePieceProcessor_NBestEncodeAsPieces", _wrap_SentencePieceProcessor_NBestEncodeAsPieces, METH_VARARGS, NULL},
+- { "SentencePieceProcessor_NBestEncodeAsIds", _wrap_SentencePieceProcessor_NBestEncodeAsIds, METH_VARARGS, NULL},
+- { "SentencePieceProcessor_SampleEncodeAsPieces", _wrap_SentencePieceProcessor_SampleEncodeAsPieces, METH_VARARGS, NULL},
+- { "SentencePieceProcessor_SampleEncodeAsIds", _wrap_SentencePieceProcessor_SampleEncodeAsIds, METH_VARARGS, NULL},
+ { "SentencePieceProcessor_SampleEncodeAndScoreAsPieces", _wrap_SentencePieceProcessor_SampleEncodeAndScoreAsPieces, METH_VARARGS, NULL},
+ { "SentencePieceProcessor_SampleEncodeAndScoreAsIds", _wrap_SentencePieceProcessor_SampleEncodeAndScoreAsIds, METH_VARARGS, NULL},
+- { "SentencePieceProcessor_DecodePieces", _wrap_SentencePieceProcessor_DecodePieces, METH_VARARGS, NULL},
+ { "SentencePieceProcessor_CalculateEntropy", _wrap_SentencePieceProcessor_CalculateEntropy, METH_VARARGS, NULL},
+- { "SentencePieceProcessor_EncodeAsSerializedProto", _wrap_SentencePieceProcessor_EncodeAsSerializedProto, METH_VARARGS, NULL},
+- { "SentencePieceProcessor_SampleEncodeAsSerializedProto", _wrap_SentencePieceProcessor_SampleEncodeAsSerializedProto, METH_VARARGS, NULL},
+- { "SentencePieceProcessor_NBestEncodeAsSerializedProto", _wrap_SentencePieceProcessor_NBestEncodeAsSerializedProto, METH_VARARGS, NULL},
+- { "SentencePieceProcessor_DecodePiecesAsSerializedProto", _wrap_SentencePieceProcessor_DecodePiecesAsSerializedProto, METH_VARARGS, NULL},
+ { "SentencePieceProcessor_GetPieceSize", _wrap_SentencePieceProcessor_GetPieceSize, METH_O, NULL},
+ { "SentencePieceProcessor_PieceToId", _wrap_SentencePieceProcessor_PieceToId, METH_VARARGS, NULL},
+ { "SentencePieceProcessor_IdToPiece", _wrap_SentencePieceProcessor_IdToPiece, METH_VARARGS, NULL},
+@@ -6219,14 +6925,27 @@ static PyMethodDef SwigMethods[] = {
+ { "SentencePieceProcessor_pad_id", _wrap_SentencePieceProcessor_pad_id, METH_O, NULL},
+ { "SentencePieceProcessor_serialized_model_proto", _wrap_SentencePieceProcessor_serialized_model_proto, METH_O, NULL},
+ { "SentencePieceProcessor_LoadFromFile", _wrap_SentencePieceProcessor_LoadFromFile, METH_VARARGS, NULL},
+- { "SentencePieceProcessor_DecodeIdsWithCheck", _wrap_SentencePieceProcessor_DecodeIdsWithCheck, METH_VARARGS, NULL},
+- { "SentencePieceProcessor_DecodeIdsAsSerializedProtoWithCheck", _wrap_SentencePieceProcessor_DecodeIdsAsSerializedProtoWithCheck, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__EncodeAsIds", _wrap_SentencePieceProcessor__EncodeAsIds, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__EncodeAsPieces", _wrap_SentencePieceProcessor__EncodeAsPieces, METH_VARARGS, NULL},
++ { "SentencePieceProcessor__EncodeAsSerializedProto", _wrap_SentencePieceProcessor__EncodeAsSerializedProto, METH_VARARGS, NULL},
++ { "SentencePieceProcessor__EncodeAsIdsBatch", _wrap_SentencePieceProcessor__EncodeAsIdsBatch, METH_VARARGS, NULL},
++ { "SentencePieceProcessor__EncodeAsPiecesBatch", _wrap_SentencePieceProcessor__EncodeAsPiecesBatch, METH_VARARGS, NULL},
++ { "SentencePieceProcessor__EncodeAsSerializedProtoBatch", _wrap_SentencePieceProcessor__EncodeAsSerializedProtoBatch, METH_VARARGS, NULL},
++ { "SentencePieceProcessor__DecodeIds", _wrap_SentencePieceProcessor__DecodeIds, METH_VARARGS, NULL},
++ { "SentencePieceProcessor__DecodePieces", _wrap_SentencePieceProcessor__DecodePieces, METH_VARARGS, NULL},
++ { "SentencePieceProcessor__DecodeIdsAsSerializedProto", _wrap_SentencePieceProcessor__DecodeIdsAsSerializedProto, METH_VARARGS, NULL},
++ { "SentencePieceProcessor__DecodePiecesAsSerializedProto", _wrap_SentencePieceProcessor__DecodePiecesAsSerializedProto, METH_VARARGS, NULL},
++ { "SentencePieceProcessor__DecodeIdsBatch", _wrap_SentencePieceProcessor__DecodeIdsBatch, METH_VARARGS, NULL},
++ { "SentencePieceProcessor__DecodeIdsAsSerializedProtoBatch", _wrap_SentencePieceProcessor__DecodeIdsAsSerializedProtoBatch, METH_VARARGS, NULL},
++ { "SentencePieceProcessor__DecodePiecesBatch", _wrap_SentencePieceProcessor__DecodePiecesBatch, METH_VARARGS, NULL},
++ { "SentencePieceProcessor__DecodePiecesAsSerializedProtoBatch", _wrap_SentencePieceProcessor__DecodePiecesAsSerializedProtoBatch, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__NBestEncodeAsIds", _wrap_SentencePieceProcessor__NBestEncodeAsIds, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__NBestEncodeAsPieces", _wrap_SentencePieceProcessor__NBestEncodeAsPieces, METH_VARARGS, NULL},
++ { "SentencePieceProcessor__NBestEncodeAsSerializedProto", _wrap_SentencePieceProcessor__NBestEncodeAsSerializedProto, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__SampleEncodeAndScoreAsIds", _wrap_SentencePieceProcessor__SampleEncodeAndScoreAsIds, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__SampleEncodeAndScoreAsPieces", _wrap_SentencePieceProcessor__SampleEncodeAndScoreAsPieces, METH_VARARGS, NULL},
++ { "SentencePieceProcessor__CalculateEntropy", _wrap_SentencePieceProcessor__CalculateEntropy, METH_VARARGS, NULL},
++ { "SentencePieceProcessor__CalculateEntropyBatch", _wrap_SentencePieceProcessor__CalculateEntropyBatch, METH_VARARGS, NULL},
+ { "SentencePieceProcessor_swigregister", SentencePieceProcessor_swigregister, METH_O, NULL},
+ { "SentencePieceProcessor_swiginit", SentencePieceProcessor_swiginit, METH_VARARGS, NULL},
+ { "SetRandomGeneratorSeed", _wrap_SetRandomGeneratorSeed, METH_O, NULL},
+@@ -6252,8 +6971,11 @@ static swig_type_info _swigt__p_sentencepiece__SentencePieceProcessor = {"_p_sen
+ static swig_type_info _swigt__p_sentencepiece__SentencePieceTrainer = {"_p_sentencepiece__SentencePieceTrainer", "sentencepiece::SentencePieceTrainer *", 0, 0, (void*)0, 0};
+ static swig_type_info _swigt__p_std__string = {"_p_std__string", "sentencepiece::util::bytes *|std::string *", 0, 0, (void*)0, 0};
+ static swig_type_info _swigt__p_std__unordered_mapT_std__string_std__string_t = {"_p_std__unordered_mapT_std__string_std__string_t", "std::unordered_map< std::string,std::string > *", 0, 0, (void*)0, 0};
++static swig_type_info _swigt__p_std__vectorT_absl__string_view_t = {"_p_std__vectorT_absl__string_view_t", "std::vector< absl::string_view > *", 0, 0, (void*)0, 0};
+ static swig_type_info _swigt__p_std__vectorT_int_t = {"_p_std__vectorT_int_t", "std::vector< int > *", 0, 0, (void*)0, 0};
+ static swig_type_info _swigt__p_std__vectorT_std__string_t = {"_p_std__vectorT_std__string_t", "std::vector< std::string > *", 0, 0, (void*)0, 0};
++static swig_type_info _swigt__p_std__vectorT_std__vectorT_int_t_t = {"_p_std__vectorT_std__vectorT_int_t_t", "std::vector< std::vector< int > > *", 0, 0, (void*)0, 0};
++static swig_type_info _swigt__p_std__vectorT_std__vectorT_std__string_t_t = {"_p_std__vectorT_std__vectorT_std__string_t_t", "std::vector< std::vector< std::string > > *", 0, 0, (void*)0, 0};
+
+ static swig_type_info *swig_type_initial[] = {
+ &_swigt__p_char,
+@@ -6262,8 +6984,11 @@ static swig_type_info *swig_type_initial[] = {
+ &_swigt__p_sentencepiece__SentencePieceTrainer,
+ &_swigt__p_std__string,
+ &_swigt__p_std__unordered_mapT_std__string_std__string_t,
++ &_swigt__p_std__vectorT_absl__string_view_t,
+ &_swigt__p_std__vectorT_int_t,
+ &_swigt__p_std__vectorT_std__string_t,
++ &_swigt__p_std__vectorT_std__vectorT_int_t_t,
++ &_swigt__p_std__vectorT_std__vectorT_std__string_t_t,
+ };
+
+ static swig_cast_info _swigc__p_char[] = { {&_swigt__p_char, 0, 0, 0},{0, 0, 0, 0}};
+@@ -6272,8 +6997,11 @@ static swig_cast_info _swigc__p_sentencepiece__SentencePieceProcessor[] = { {&_
+ static swig_cast_info _swigc__p_sentencepiece__SentencePieceTrainer[] = { {&_swigt__p_sentencepiece__SentencePieceTrainer, 0, 0, 0},{0, 0, 0, 0}};
+ static swig_cast_info _swigc__p_std__string[] = { {&_swigt__p_std__string, 0, 0, 0},{0, 0, 0, 0}};
+ static swig_cast_info _swigc__p_std__unordered_mapT_std__string_std__string_t[] = { {&_swigt__p_std__unordered_mapT_std__string_std__string_t, 0, 0, 0},{0, 0, 0, 0}};
++static swig_cast_info _swigc__p_std__vectorT_absl__string_view_t[] = { {&_swigt__p_std__vectorT_absl__string_view_t, 0, 0, 0},{0, 0, 0, 0}};
+ static swig_cast_info _swigc__p_std__vectorT_int_t[] = { {&_swigt__p_std__vectorT_int_t, 0, 0, 0},{0, 0, 0, 0}};
+ static swig_cast_info _swigc__p_std__vectorT_std__string_t[] = { {&_swigt__p_std__vectorT_std__string_t, 0, 0, 0},{0, 0, 0, 0}};
++static swig_cast_info _swigc__p_std__vectorT_std__vectorT_int_t_t[] = { {&_swigt__p_std__vectorT_std__vectorT_int_t_t, 0, 0, 0},{0, 0, 0, 0}};
++static swig_cast_info _swigc__p_std__vectorT_std__vectorT_std__string_t_t[] = { {&_swigt__p_std__vectorT_std__vectorT_std__string_t_t, 0, 0, 0},{0, 0, 0, 0}};
+
+ static swig_cast_info *swig_cast_initial[] = {
+ _swigc__p_char,
+@@ -6282,8 +7010,11 @@ static swig_cast_info *swig_cast_initial[] = {
+ _swigc__p_sentencepiece__SentencePieceTrainer,
+ _swigc__p_std__string,
+ _swigc__p_std__unordered_mapT_std__string_std__string_t,
++ _swigc__p_std__vectorT_absl__string_view_t,
+ _swigc__p_std__vectorT_int_t,
+ _swigc__p_std__vectorT_std__string_t,
++ _swigc__p_std__vectorT_std__vectorT_int_t_t,
++ _swigc__p_std__vectorT_std__vectorT_std__string_t_t,
+ };
+
+
+diff --git a/python/test/sentencepiece_test.py b/python/test/sentencepiece_test.py
+index b747e81..99e36f3 100755
+--- a/python/test/sentencepiece_test.py
++++ b/python/test/sentencepiece_test.py
+@@ -15,7 +15,6 @@
+ # See the License for the specific language governing permissions and
+ # limitations under the License.!
+
+-import codecs
+ import io
+ import sentencepiece as spm
+ import unittest
+@@ -62,6 +61,17 @@ class TestSentencepieceProcessor(unittest.TestCase):
+ piece = self.sp_.IdToPiece(i)
+ self.assertEqual(i, self.sp_.PieceToId(piece))
+
++ self.assertEqual(1000, self.sp_.get_piece_size())
++ self.assertEqual(0, self.sp_.piece_to_id('<unk>'))
++ self.assertEqual(1, self.sp_.piece_to_id('<s>'))
++ self.assertEqual(2, self.sp_.piece_to_id('</s>'))
++ self.assertEqual('<unk>', self.sp_.id_to_piece(0))
++ self.assertEqual('<s>', self.sp_.id_to_piece(1))
++ self.assertEqual('</s>', self.sp_.id_to_piece(2))
++ for i in range(self.sp_.get_piece_size()):
++ piece = self.sp_.id_to_piece(i)
++ self.assertEqual(i, self.sp_.piece_to_id(piece))
++
+ def test_roundtrip(self):
+ text = 'I saw a girl with a telescope.'
+ ids = self.sp_.EncodeAsIds(text)
+@@ -82,6 +92,34 @@ class TestSentencepieceProcessor(unittest.TestCase):
+ self.assertEqual(
+ text, self.sp_.DecodeIds(self.sp_.SampleEncodeAsIds(text, -1, 0.5)))
+
++ ids2 = self.sp_.encode_as_ids(text)
++ pieces3 = self.sp_.encode_as_pieces(text)
++ pieces4 = self.sp_.nbest_encode_as_pieces(text, 10)[0]
++ self.assertEqual(pieces3, pieces4)
++ self.assertEqual(pieces1, pieces3)
++ self.assertEqual(ids, ids2)
++ self.assertEqual(text, self.sp_.decode_pieces(pieces3))
++ self.assertEqual(text, self.sp_.decode_ids(ids2))
++ for n in range(100):
++ self.assertEqual(
++ text,
++ self.sp_.decode_pieces(
++ self.sp_.sample_encode_as_pieces(text, 64, 0.5)))
++ self.assertEqual(
++ text,
++ self.sp_.decode_pieces(
++ self.sp_.sample_encode_as_pieces(text, -1, 0.5)))
++ self.assertEqual(
++ text,
++ self.sp_.decode_ids(self.sp_.sample_encode_as_ids(text, 64, 0.5)))
++ self.assertEqual(
++ text,
++ self.sp_.decode_ids(self.sp_.sample_encode_as_ids(text, -1, 0.5)))
++
++ self.assertEqual(
++ self.sp_.calculate_entropy(text, 0.1),
++ self.sp_.CalculateEntropy(text, 0.1))
++
+ def test_ja_load(self):
+ self.assertEqual(8000, self.jasp_.GetPieceSize())
+ self.assertEqual(0, self.jasp_.PieceToId('<unk>'))
+@@ -94,6 +132,17 @@ class TestSentencepieceProcessor(unittest.TestCase):
+ piece = self.jasp_.IdToPiece(i)
+ self.assertEqual(i, self.jasp_.PieceToId(piece))
+
++ self.assertEqual(8000, self.jasp_.get_piece_size())
++ self.assertEqual(0, self.jasp_.piece_to_id('<unk>'))
++ self.assertEqual(1, self.jasp_.piece_to_id('<s>'))
++ self.assertEqual(2, self.jasp_.piece_to_id('</s>'))
++ self.assertEqual('<unk>', self.jasp_.id_to_piece(0))
++ self.assertEqual('<s>', self.jasp_.id_to_piece(1))
++ self.assertEqual('</s>', self.jasp_.id_to_piece(2))
++ for i in range(self.jasp_.get_piece_size()):
++ piece = self.jasp_.id_to_piece(i)
++ self.assertEqual(i, self.jasp_.piece_to_id(piece))
++
+ def test_ja_roundtrip(self):
+ text = '清水寺は京都にある。'
+ ids = self.jasp_.EncodeAsIds(text)
+@@ -112,40 +161,27 @@ class TestSentencepieceProcessor(unittest.TestCase):
+ self.jasp_.DecodePieces(
+ self.jasp_.SampleEncodeAsPieces(text, -1, 0.5)))
+
+- def test_unicode_roundtrip(self):
+- text = u'I saw a girl with a telescope.'
+- ids = self.sp_.EncodeAsIds(text)
+- pieces = self.sp_.EncodeAsPieces(text)
+- self.assertEqual(text, self.sp_.DecodePieces(pieces))
+- self.assertEqual(text, self.sp_.DecodeIds(ids))
+- # python2 returns `str`.
+- if sys.version_info < (3, 0, 0):
+- text = text.encode('utf-8')
+- self.assertEqual(text, self.sp_.DecodeIds(ids))
+- self.assertEqual(text, self.sp_.DecodePieces(pieces))
+-
+- def test_unicode_ja_roundtrip(self):
+- text = u'清水寺は京都にある。'
+- ids = self.jasp_.EncodeAsIds(text)
+- pieces = self.jasp_.EncodeAsPieces(text)
+- self.assertEqual(text, self.jasp_.DecodePieces(pieces))
+- # python2 returns `str`.
+- if sys.version_info < (3, 0, 0):
+- text = text.encode('utf-8')
+- self.assertEqual(text, self.jasp_.DecodeIds(ids))
+-
+- def test_pickle(self):
+- with open('sp.pickle', 'wb') as f:
+- pickle.dump(self.sp_, f)
+-
+- id1 = self.sp_.encode('hello world.', out_type=int)
+-
+- with open('sp.pickle', 'rb') as f:
+- sp = pickle.load(f)
+-
+- id2 = sp.encode('hello world.', out_type=int)
++ ids2 = self.jasp_.encode_as_ids(text)
++ pieces3 = self.jasp_.encode_as_pieces(text)
++ pieces4 = self.jasp_.nbest_encode_as_pieces(text, 10)[0]
++ self.assertEqual(pieces3, pieces4)
++ self.assertEqual(pieces1, pieces3)
++ self.assertEqual(ids, ids2)
++ self.assertEqual(text, self.jasp_.decode_pieces(pieces1))
++ self.assertEqual(text, self.jasp_.decode_ids(ids2))
++ for n in range(100):
++ self.assertEqual(
++ text,
++ self.jasp_.decode_pieces(
++ self.jasp_.sample_encode_as_pieces(text, 64, 0.5)))
++ self.assertEqual(
++ text,
++ self.jasp_.decode_pieces(
++ self.jasp_.sample_encode_as_pieces(text, -1, 0.5)))
+
+- self.assertEqual(id1, id2)
++ self.assertEqual(
++ self.jasp_.calculate_entropy(text, 0.1),
++ self.jasp_.CalculateEntropy(text, 0.1))
+
+ def test_train(self):
+ spm.SentencePieceTrainer.Train('--input=' +
+@@ -153,37 +189,45 @@ class TestSentencepieceProcessor(unittest.TestCase):
+ ' --model_prefix=m --vocab_size=1000')
+ sp = spm.SentencePieceProcessor()
+ sp.Load('m.model')
+- with codecs.open(
+- os.path.join(data_dir, 'botchan.txt'), 'r', encoding='utf-8') as file:
++ with open(os.path.join(data_dir, 'botchan.txt'), 'r', encoding='utf-8') as file:
+ for line in file:
+ sp.DecodePieces(sp.EncodeAsPieces(line))
+ sp.DecodeIds(sp.EncodeAsIds(line))
+
+- def test_train(self):
++ def test_train_iterator(self):
+ spm.SentencePieceTrainer.Train('--input=' +
+ os.path.join(data_dir, 'botchan.txt') +
+ ' --model_prefix=m --vocab_size=1000')
+ # Load as 'rb' for Python3.5/2.7.
+- is1 = open(os.path.join(data_dir, 'botchan.txt'), 'rb')
+- is2 = open(os.path.join(data_dir, 'botchan.txt'), 'rb')
+ os1 = io.BytesIO()
+ os2 = io.BytesIO()
+
++ # suppress logging (redirect to /dev/null)
+ spm.SentencePieceTrainer.train(
+ input=os.path.join(data_dir, 'botchan.txt'),
+ model_prefix='m',
+- vocab_size=1000)
++ vocab_size=1000,
++ logstream=open(os.devnull, 'w'))
+
+- spm.SentencePieceTrainer.train(
+- sentence_iterator=is1, model_prefix='m', vocab_size=1000)
++ with open(os.path.join(data_dir, 'botchan.txt'), 'rb') as is1:
++ spm.SentencePieceTrainer.train(
++ sentence_iterator=is1,
++ model_prefix='m',
++ vocab_size=1000,
++ logstream=open(os.devnull, 'w'))
+
+ spm.SentencePieceTrainer.train(
+ input=os.path.join(data_dir, 'botchan.txt'),
+ model_writer=os1,
+- vocab_size=1000)
++ vocab_size=1000,
++ logstream=open(os.devnull, 'w'))
+
+- spm.SentencePieceTrainer.train(
+- sentence_iterator=is2, model_writer=os2, vocab_size=1000)
++ with open(os.path.join(data_dir, 'botchan.txt'), 'rb') as is2:
++ spm.SentencePieceTrainer.train(
++ sentence_iterator=is2,
++ model_writer=os2,
++ vocab_size=1000,
++ logstream=open(os.devnull, 'w'))
+
+ sp1 = spm.SentencePieceProcessor(model_proto=os1.getvalue())
+ sp2 = spm.SentencePieceProcessor(model_proto=os2.getvalue())
+@@ -200,127 +244,37 @@ class TestSentencepieceProcessor(unittest.TestCase):
+ logstream=open(os.devnull, 'w'))
+ sp = spm.SentencePieceProcessor()
+ sp.Load('m.model')
+- with codecs.open(
++ with open(
+ os.path.join(data_dir, 'botchan.txt'), 'r', encoding='utf-8') as file:
+ for line in file:
+ sp.DecodePieces(sp.EncodeAsPieces(line))
+ sp.DecodeIds(sp.EncodeAsIds(line))
+
+- # snake case API.
+- def test_load_snake(self):
+- self.assertEqual(1000, self.sp_.get_piece_size())
+- self.assertEqual(0, self.sp_.piece_to_id('<unk>'))
+- self.assertEqual(1, self.sp_.piece_to_id('<s>'))
+- self.assertEqual(2, self.sp_.piece_to_id('</s>'))
+- self.assertEqual('<unk>', self.sp_.id_to_piece(0))
+- self.assertEqual('<s>', self.sp_.id_to_piece(1))
+- self.assertEqual('</s>', self.sp_.id_to_piece(2))
+- for i in range(self.sp_.get_piece_size()):
+- piece = self.sp_.id_to_piece(i)
+- self.assertEqual(i, self.sp_.piece_to_id(piece))
+-
+- def test_roundtrip_snake(self):
+- text = 'I saw a girl with a telescope.'
+- ids = self.sp_.encode_as_ids(text)
+- pieces1 = self.sp_.encode_as_pieces(text)
+- pieces2 = self.sp_.nbest_encode_as_pieces(text, 10)[0]
+- self.assertEqual(pieces1, pieces2)
+- self.assertEqual(text, self.sp_.decode_pieces(pieces1))
+- self.assertEqual(text, self.sp_.decode_ids(ids))
+- for n in range(100):
+- self.assertEqual(
+- text,
+- self.sp_.decode_pieces(
+- self.sp_.sample_encode_as_pieces(text, 64, 0.5)))
+- self.assertEqual(
+- text,
+- self.sp_.decode_pieces(
+- self.sp_.sample_encode_as_pieces(text, -1, 0.5)))
+- self.assertEqual(
+- text,
+- self.sp_.decode_ids(self.sp_.sample_encode_as_ids(text, 64, 0.5)))
+- self.assertEqual(
+- text,
+- self.sp_.decode_ids(self.sp_.sample_encode_as_ids(text, -1, 0.5)))
+-
+- def test_ja_load_snake(self):
+- self.assertEqual(8000, self.jasp_.get_piece_size())
+- self.assertEqual(0, self.jasp_.piece_to_id('<unk>'))
+- self.assertEqual(1, self.jasp_.piece_to_id('<s>'))
+- self.assertEqual(2, self.jasp_.piece_to_id('</s>'))
+- self.assertEqual('<unk>', self.jasp_.id_to_piece(0))
+- self.assertEqual('<s>', self.jasp_.id_to_piece(1))
+- self.assertEqual('</s>', self.jasp_.id_to_piece(2))
+- for i in range(self.jasp_.get_piece_size()):
+- piece = self.jasp_.id_to_piece(i)
+- self.assertEqual(i, self.jasp_.piece_to_id(piece))
+-
+- def test_ja_roundtrip_snake(self):
+- text = '清水寺は京都にある。'
+- ids = self.jasp_.encode_as_ids(text)
+- pieces1 = self.jasp_.encode_as_pieces(text)
+- pieces2 = self.jasp_.nbest_encode_as_pieces(text, 10)[0]
+- self.assertEqual(pieces1, pieces2)
+- self.assertEqual(text, self.jasp_.decode_pieces(pieces1))
+- self.assertEqual(text, self.jasp_.decode_ids(ids))
+- for n in range(100):
+- self.assertEqual(
+- text,
+- self.jasp_.decode_pieces(
+- self.jasp_.sample_encode_as_pieces(text, 64, 0.5)))
+- self.assertEqual(
+- text,
+- self.jasp_.decode_pieces(
+- self.jasp_.sample_encode_as_pieces(text, -1, 0.5)))
+-
+- def test_unicode_roundtrip_snake(self):
+- text = u'I saw a girl with a telescope.'
+- ids = self.sp_.encode_as_ids(text)
+- pieces = self.sp_.encode_as_pieces(text)
+- self.assertEqual(text, self.sp_.decode_pieces(pieces))
+- # python2 returns `str`.
+- if sys.version_info < (3, 0, 0):
+- text = text.encode('utf-8')
+- self.assertEqual(text, self.sp_.decode_ids(ids))
+-
+- def test_unicode_ja_roundtrip_snake(self):
+- text = u'清水寺は京都にある。'
+- ids = self.jasp_.encode_as_ids(text)
+- pieces = self.jasp_.encode_as_pieces(text)
+- self.assertEqual(text, self.jasp_.decode_pieces(pieces))
+- # python2 returns `str`.
+- if sys.version_info < (3, 0, 0):
+- text = text.encode('utf-8')
+- self.assertEqual(text, self.jasp_.decode_ids(ids))
+-
+- def test_train_snake(self):
+- spm.SentencePieceTrainer.train('--input=' +
+- os.path.join(data_dir, 'botchan.txt') +
+- ' --model_prefix=m --vocab_size=1000')
+- sp = spm.SentencePieceProcessor()
+- sp.load('m.model')
+- with codecs.open(
+- os.path.join(data_dir, 'botchan.txt'), 'r', encoding='utf-8') as file:
+- for line in file:
+- sp.decode_pieces(sp.encode_as_pieces(line))
+- sp.decode_ids(sp.encode_as_ids(line))
+-
+ def test_serialized_proto(self):
+- text = u'I saw a girl with a telescope.'
+- self.assertNotEqual('', self.sp_.EncodeAsSerializedProto(text))
+- self.assertNotEqual('',
+- self.sp_.SampleEncodeAsSerializedProto(text, 10, 0.2))
+- self.assertNotEqual('', self.sp_.NBestEncodeAsSerializedProto(text, 10))
+- self.assertNotEqual('',
+- self.sp_.DecodePiecesAsSerializedProto(['foo', 'bar']))
+- self.assertNotEqual('', self.sp_.DecodeIdsAsSerializedProto([20, 30]))
+- self.assertNotEqual('', self.sp_.encode_as_serialized_proto(text))
+- self.assertNotEqual(
+- '', self.sp_.sample_encode_as_serialized_proto(text, 10, 0.2))
+- self.assertNotEqual('', self.sp_.nbest_encode_as_serialized_proto(text, 10))
+- self.assertNotEqual(
+- '', self.sp_.decode_pieces_as_serialized_proto(['foo', 'bar']))
+- self.assertNotEqual('', self.sp_.decode_ids_as_serialized_proto([20, 30]))
++ text = 'I saw a girl with a telescope.'
++ s1 = self.sp_.EncodeAsSerializedProto(text)
++ s2 = self.sp_.SampleEncodeAsSerializedProto(text, 10, 0.2)
++ s3 = self.sp_.NBestEncodeAsSerializedProto(text, 10)
++ s4 = self.sp_.DecodePiecesAsSerializedProto(['foo', 'bar'])
++ s5 = self.sp_.DecodeIdsAsSerializedProto([20, 30])
++
++ t1 = self.sp_.encode_as_serialized_proto(text)
++ t2 = self.sp_.sample_encode_as_serialized_proto(text, 10, 0.2)
++ t3 = self.sp_.nbest_encode_as_serialized_proto(text, 10)
++ t4 = self.sp_.decode_pieces_as_serialized_proto(['foo', 'bar'])
++ t5 = self.sp_.decode_ids_as_serialized_proto([20, 30])
++
++ self.assertEqual(type(s1), bytes)
++ self.assertEqual(type(s2), bytes)
++ self.assertEqual(type(t2), bytes)
++ self.assertEqual(type(s3), bytes)
++ self.assertEqual(type(s4), bytes)
++ self.assertEqual(type(s5), bytes)
++
++ self.assertEqual(s1, t1)
++ self.assertEqual(s3, t3)
++ self.assertEqual(s4, t4)
++ self.assertEqual(s5, t5)
+
+ def test_new_api(self):
+ sp = spm.SentencePieceProcessor(
+@@ -331,19 +285,33 @@ class TestSentencepieceProcessor(unittest.TestCase):
+ ids2 = self.sp_.EncodeAsIds(text2)
+ pieces = self.sp_.EncodeAsPieces(text)
+ pieces2 = self.sp_.EncodeAsPieces(text2)
+- self.assertEqual(sp.encode(text), ids)
++ protos = self.sp_.EncodeAsSerializedProto(text)
++ proto2 = self.sp_.EncodeAsSerializedProto(text2)
++
++ self.assertEqual(sp.encode(text, out_type=int), ids)
+ self.assertEqual(sp.encode(text, out_type=str), pieces)
++ self.assertEqual(sp.encode(text, out_type='proto'), protos)
++
++ self.assertEqual(sp.encode([text], out_type=int), [ids])
++ self.assertEqual(sp.encode([text], out_type=str), [pieces])
++ self.assertEqual(sp.encode([text], out_type='proto'), [protos])
++
+ detok_ids = self.sp_.DecodeIds(ids)
+ detok_pieces = self.sp_.DecodePieces(pieces)
+ self.assertEqual(sp.decode(ids), detok_ids)
+ self.assertEqual(sp.decode(pieces), detok_pieces)
++ self.assertEqual(sp.decode([]), '')
++ self.assertEqual(sp.decode([[]]), [''])
+
+ # add_bos, add_eos, reverse
+ self.assertEqual([sp.bos_id()] + ids, sp.encode(text, add_bos=True))
+ self.assertEqual(ids + [sp.eos_id()], sp.encode(text, add_eos=True))
++ self.assertEqual(ids + [sp.eos_id()], sp.EncodeAsIds(text, add_eos=True))
+ rids = ids[:]
+ rids.reverse()
++
+ self.assertEqual(rids, sp.encode(text, reverse=True))
++ self.assertEqual(rids, sp.EncodeAsIds(text, reverse=True))
+
+ # different shape.
+ self.assertEqual([ids, ids2], sp.encode([text, text2]))
+@@ -351,6 +319,29 @@ class TestSentencepieceProcessor(unittest.TestCase):
+ self.assertEqual([text, text2], sp.decode([ids, ids2]))
+ self.assertEqual([text, text2], sp.decode([pieces, pieces2]))
+
++ pieces = list(reversed(self.sp_.EncodeAsPieces(text)))
++ self.assertEqual(pieces, sp.encode(text, reverse=True, out_type=str))
++
++ # emit unk piece
++ unk_char = '藤'
++ pieces = self.sp_.EncodeAsIds(unk_char, emit_unk_piece=True)
++ pieces2 = self.sp_.encode(unk_char, out_type=int, emit_unk_piece=True)
++ self.assertEqual(pieces[1], sp.unk_id())
++ self.assertEqual(pieces2[1], sp.unk_id())
++ self.assertEqual(pieces, pieces2)
++
++ pieces = self.sp_.EncodeAsPieces(unk_char, emit_unk_piece=True)
++ pieces2 = self.sp_.encode(unk_char, out_type=str, emit_unk_piece=True)
++ self.assertEqual(pieces[1], '<unk>')
++ self.assertEqual(pieces2[1], '<unk>')
++ self.assertEqual(pieces, pieces2)
++
++ pieces = self.sp_.EncodeAsPieces(unk_char, emit_unk_piece=False)
++ pieces2 = self.sp_.encode(unk_char, out_type=str, emit_unk_piece=False)
++ self.assertEqual(pieces[1], unk_char)
++ self.assertEqual(pieces2[1], unk_char)
++ self.assertEqual(pieces, pieces2)
++
+ def test_new_api_init(self):
+ sp = spm.SentencePieceProcessor(
+ model_file=os.path.join('test', 'test_model.model'),
+@@ -361,7 +352,10 @@ class TestSentencepieceProcessor(unittest.TestCase):
+ pieces = ['<s>'] + self.sp_.EncodeAsPieces(text) + ['</s>']
+ self.assertEqual(pieces, sp.encode(text))
+
+- def test_new_api_sampling(self):
++ pieces = self.sp_.EncodeAsPieces(text) + ['</s>']
++ self.assertEqual(pieces, sp.encode(text, add_bos=False, add_eos=True))
++
++ def test_sampling(self):
+ sp = spm.SentencePieceProcessor(
+ model_file=os.path.join('test', 'test_model.model'),
+ out_type=str,
+@@ -376,25 +370,35 @@ class TestSentencepieceProcessor(unittest.TestCase):
+ ++ids2[' '.join(sp.encode('hello world', enable_sampling=False))]
+ self.assertEqual(len(ids2), 1)
+
+- def test_new_api_nbest(self):
++ def test_nbest(self):
+ sp = spm.SentencePieceProcessor(
+ model_file=os.path.join('test', 'test_model.model'))
+- results = sp.nbest_encode('hello world', nbest_size=10, out_type=str)
++ text = 'hello world'
++ results = sp.nbest_encode(text, nbest_size=10, out_type=str)
++ self.assertEqual(results, sp.NBestEncode(text, nbest_size=10, out_type=str))
+ for n in results:
+- self.assertEqual(sp.decode(n), 'hello world')
+- results = sp.nbest_encode('hello world', nbest_size=10, out_type=int)
++ self.assertEqual(sp.decode(n), text)
++ decoded = sp.decode(results)
++ for n in decoded:
++ self.assertEqual(n, text)
++ results = sp.nbest_encode(text, nbest_size=10, out_type=int)
++ self.assertEqual(results, sp.NBestEncode(text, nbest_size=10, out_type=int))
+ for n in results:
+- self.assertEqual(sp.decode(n), 'hello world')
++ self.assertEqual(sp.decode(n), text)
++ decoded = sp.decode(results)
++ for n in decoded:
++ self.assertEqual(n, text)
+
+- def test_new_api_sample_and_score(self):
++ def test_sample_and_score(self):
+ sp = spm.SentencePieceProcessor(
+ model_file=os.path.join('test', 'test_model.model'))
+- results = sp.sample_encode_and_score('hello world', wor=True, out_type=str)
++ text = 'hello world'
++ results = sp.sample_encode_and_score(text, wor=True, out_type=str)
+ for n in results:
+- self.assertEqual(sp.decode(n[0]), 'hello world')
+- results = sp.sample_encode_and_score('hello world', wor=True, out_type=int)
++ self.assertEqual(sp.decode(n[0]), text)
++ results = sp.sample_encode_and_score(text, wor=True, out_type=int)
+ for n in results:
+- self.assertEqual(sp.decode(n[0]), 'hello world')
++ self.assertEqual(sp.decode(n[0]), text)
+
+ def test_valid_range(self):
+ size = self.sp_.piece_size()
+@@ -412,6 +416,82 @@ class TestSentencepieceProcessor(unittest.TestCase):
+ except:
+ self.assertTrue(True)
+
++ def test_batch(self):
++ sp = spm.SentencePieceProcessor(
++ model_file=os.path.join('test', 'test_model.model'))
++ with open(
++ os.path.join(data_dir, 'botchan.txt'), 'r', encoding='utf-8') as file:
++ texts = file.readlines()
++
++ r1 = sp.encode(texts, out_type=str, num_threads=None)
++ r2 = sp.encode(texts, out_type=str, num_threads=1)
++ r3 = sp.encode(texts, out_type=str, num_threads=-1)
++ r4 = sp.encode(texts, out_type=str, num_threads=8)
++ r5 = [sp.encode(s, out_type=str) for s in texts]
++ self.assertEqual(r1, r2)
++ self.assertEqual(r1, r3)
++ self.assertEqual(r1, r4)
++ self.assertEqual(r1, r5)
++
++ d1 = sp.decode(r1, num_threads=None)
++ d2 = sp.decode(r2, num_threads=1)
++ d3 = sp.decode(r3, num_threads=-1)
++ d4 = sp.decode(r4, num_threads=8)
++ d5 = [sp.decode(s) for s in r5]
++ self.assertEqual(d1, d2)
++ self.assertEqual(d1, d3)
++ self.assertEqual(d1, d4)
++ self.assertEqual(d1, d5)
++
++ r1 = sp.encode(texts, out_type=int, num_threads=None)
++ r2 = sp.encode(texts, out_type=int, num_threads=1)
++ r3 = sp.encode(texts, out_type=int, num_threads=-1)
++ r4 = sp.encode(texts, out_type=int, num_threads=8)
++ r5 = [sp.encode(s, out_type=int) for s in texts]
++ self.assertEqual(r1, r2)
++ self.assertEqual(r1, r3)
++ self.assertEqual(r1, r4)
++ self.assertEqual(r1, r5)
++
++ d1 = sp.decode(r1, num_threads=None)
++ d2 = sp.decode(r2, num_threads=1)
++ d3 = sp.decode(r3, num_threads=-1)
++ d4 = sp.decode(r4, num_threads=8)
++ d5 = [sp.decode(s) for s in r5]
++ self.assertEqual(d1, d2)
++ self.assertEqual(d1, d3)
++ self.assertEqual(d1, d4)
++ self.assertEqual(d1, d5)
++
++ r1 = sp.encode(texts, out_type='proto', num_threads=None)
++ r2 = sp.encode(texts, out_type='proto', num_threads=1)
++ r3 = sp.encode(texts, out_type='proto', num_threads=-1)
++ r4 = sp.encode(texts, out_type='proto', num_threads=8)
++ r5 = [sp.encode(s, out_type='proto') for s in texts]
++ self.assertEqual(r1, r2)
++ self.assertEqual(r1, r3)
++ self.assertEqual(r1, r4)
++ self.assertEqual(r1, r5)
++
++ e1 = sp.calculate_entropy(texts, theta=1.0, num_threads=10)
++ e2 = sp.CalculateEntropy(texts, theta=1.0, num_threads=10)
++ e3 = [sp.calculate_entropy(s, theta=1.0) for s in texts]
++ self.assertEqual(e1, e2)
++ self.assertEqual(e1, e3)
++
++ def test_pickle(self):
++ with open('sp.pickle', 'wb') as f:
++ pickle.dump(self.sp_, f)
++
++ id1 = self.sp_.encode('hello world.', out_type=int)
++
++ with open('sp.pickle', 'rb') as f:
++ sp = pickle.load(f)
++
++ id2 = sp.encode('hello world.', out_type=int)
++
++ self.assertEqual(id1, id2)
++
+
+ def suite():
+ suite = unittest.TestSuite()
--- /dev/null
+From: Taku Kudo <taku@google.com>
+Date: Wed, 8 Jun 2022 16:38:21 +0900
+Subject: remove debug symbols from wheel package
+
+Signed-off-by: Kentaro Hayashi <kenhys@gmail.com>
+---
+ python/setup.py | 3 +++
+ 1 file changed, 3 insertions(+)
+
+diff --git a/python/setup.py b/python/setup.py
+index cfbf0db..198cba7 100755
+--- a/python/setup.py
++++ b/python/setup.py
+@@ -93,6 +93,9 @@ class build_ext(_build_ext):
+ # See: https://github.com/neulab/xnmt/issues/199
+ if sys.platform == 'darwin':
+ cflags.append('-mmacosx-version-min=10.9')
++ else:
++ cflags.append('-Wl,-strip-all')
++ libs.append('-Wl,-strip-all')
+ print('## cflags={}'.format(' '.join(cflags)))
+ print('## libs={}'.format(' '.join(libs)))
+ ext.extra_compile_args = cflags
--- /dev/null
+From: Taku Kudo <taku@google.com>
+Date: Mon, 13 Jun 2022 03:20:23 +0900
+Subject: allow tab character to be used in user_defined_symbols.
+
+Signed-off-by: Kentaro Hayashi <kenhys@gmail.com>
+---
+ src/trainer_interface.cc | 11 ++++++++++-
+ src/util.cc | 5 ++---
+ 2 files changed, 12 insertions(+), 4 deletions(-)
+
+diff --git a/src/trainer_interface.cc b/src/trainer_interface.cc
+index ef0c370..5e26b75 100644
+--- a/src/trainer_interface.cc
++++ b/src/trainer_interface.cc
+@@ -12,6 +12,8 @@
+ // See the License for the specific language governing permissions and
+ // limitations under the License.!
+
++#include "trainer_interface.h"
++
+ #include <algorithm>
+ #include <cstdlib>
+ #include <memory>
+@@ -35,7 +37,6 @@
+ #include "third_party/absl/strings/str_format.h"
+ #include "third_party/absl/strings/str_join.h"
+ #include "third_party/absl/strings/str_split.h"
+-#include "trainer_interface.h"
+ #include "unicode_script.h"
+ #include "util.h"
+
+@@ -699,6 +700,14 @@ util::Status TrainerInterface::SaveVocab(absl::string_view filename) const {
+ auto output = filesystem::NewWritableFile(filename);
+ RETURN_IF_ERROR(output->status());
+
++ for (const auto &piece : model_proto.pieces()) {
++ if (piece.piece().find_first_of(" \t\r\n") != std::string::npos) {
++ LOG(WARNING) << "The piece [" << piece.piece()
++ << "] contains escaped characters that break the format of "
++ << filename;
++ }
++ }
++
+ if (trainer_spec_.vocabulary_output_piece_score()) {
+ for (const auto &piece : model_proto.pieces()) {
+ std::ostringstream os;
+diff --git a/src/util.cc b/src/util.cc
+index 8424448..8da16c4 100644
+--- a/src/util.cc
++++ b/src/util.cc
+@@ -12,10 +12,10 @@
+ // See the License for the specific language governing permissions and
+ // limitations under the License.!
+
+-#include <iostream>
+-
+ #include "util.h"
+
++#include <iostream>
++
+ namespace sentencepiece {
+
+ namespace {
+@@ -217,7 +217,6 @@ std::vector<std::string> StrSplitAsCSV(absl::string_view text) {
+
+ std::vector<std::string> result;
+ for (; str < eos; ++str) {
+- while (*str == ' ' || *str == '\t') ++str;
+ if (*str == '"') {
+ start = ++str;
+ end = start;
--- /dev/null
+From: Taku Kudo <taku@google.com>
+Date: Mon, 13 Jun 2022 16:46:18 +0900
+Subject: add test to use tab as user defined symbols.
+
+Signed-off-by: Kentaro Hayashi <kenhys@gmail.com>
+---
+ python/test/sentencepiece_test.py | 11 ++++++-----
+ 1 file changed, 6 insertions(+), 5 deletions(-)
+
+diff --git a/python/test/sentencepiece_test.py b/python/test/sentencepiece_test.py
+index 99e36f3..6c48bcd 100755
+--- a/python/test/sentencepiece_test.py
++++ b/python/test/sentencepiece_test.py
+@@ -240,16 +240,18 @@ class TestSentencepieceProcessor(unittest.TestCase):
+ input=[os.path.join(data_dir, 'botchan.txt')],
+ model_prefix='m',
+ vocab_size=1002,
+- user_defined_symbols=['foo', 'bar', ','],
++ user_defined_symbols=['foo', 'bar', ',', ' ', '\t', '\b', '\n', '\r'],
+ logstream=open(os.devnull, 'w'))
+ sp = spm.SentencePieceProcessor()
+ sp.Load('m.model')
+- with open(
+- os.path.join(data_dir, 'botchan.txt'), 'r', encoding='utf-8') as file:
++ with open(os.path.join(data_dir, 'botchan.txt'), 'r') as file:
+ for line in file:
+ sp.DecodePieces(sp.EncodeAsPieces(line))
+ sp.DecodeIds(sp.EncodeAsIds(line))
+
++ s = 'hello\tworld\r\nthis\tis a \b pen'
++ self.assertEqual(s, sp.decode(sp.encode(s)))
++
+ def test_serialized_proto(self):
+ text = 'I saw a girl with a telescope.'
+ s1 = self.sp_.EncodeAsSerializedProto(text)
+@@ -419,8 +421,7 @@ class TestSentencepieceProcessor(unittest.TestCase):
+ def test_batch(self):
+ sp = spm.SentencePieceProcessor(
+ model_file=os.path.join('test', 'test_model.model'))
+- with open(
+- os.path.join(data_dir, 'botchan.txt'), 'r', encoding='utf-8') as file:
++ with open(os.path.join(data_dir, 'botchan.txt'), 'r') as file:
+ texts = file.readlines()
+
+ r1 = sp.encode(texts, out_type=str, num_threads=None)
--- /dev/null
+From: Taku Kudo <taku@google.com>
+Date: Tue, 14 Jun 2022 01:18:09 +0900
+Subject: Uses C++17 by default
+
+Signed-off-by: Kentaro Hayashi <kenhys@gmail.com>
+---
+ CMakeLists.txt | 4 +-
+ python/setup.py | 6 +-
+ src/CMakeLists.txt | 1 -
+ src/sentencepiece_processor.h | 26 +-
+ third_party/absl/strings/string_view.cc | 267 -----------------
+ third_party/absl/strings/string_view.h | 508 --------------------------------
+ 6 files changed, 9 insertions(+), 803 deletions(-)
+ delete mode 100644 third_party/absl/strings/string_view.cc
+
+diff --git a/CMakeLists.txt b/CMakeLists.txt
+index a791f08..78379a3 100644
+--- a/CMakeLists.txt
++++ b/CMakeLists.txt
+@@ -28,7 +28,7 @@ option(SPM_NO_THREADLOCAL "Disable thread_local operator" OFF)
+ option(SPM_USE_BUILTIN_PROTOBUF "Use built-in protobuf" ON)
+ option(SPM_USE_EXTERNAL_ABSL "Use external abseil" OFF)
+
+-set(CMAKE_CXX_STANDARD 11)
++set(CMAKE_CXX_STANDARD 17)
+ set(CMAKE_CXX_STANDARD_REQUIRED ON)
+
+ if((CMAKE_CXX_COMPILER_ID STREQUAL "Clang" AND
+@@ -98,6 +98,8 @@ configure_file("${PROJECT_SOURCE_DIR}/config.h.in" "config.h")
+ configure_file("${PROJECT_SOURCE_DIR}/sentencepiece.pc.in" "sentencepiece.pc" @ONLY)
+
+ if (NOT MSVC)
++ # suppress warning for C++11 features.
++# add_definitions("-Wno-deprecated-declarations -Wno-deprecated-enum-enum-conversion")
+ install(FILES "${CMAKE_CURRENT_BINARY_DIR}/sentencepiece.pc" DESTINATION ${CMAKE_INSTALL_LIBDIR}/pkgconfig)
+ endif()
+
+diff --git a/python/setup.py b/python/setup.py
+index 198cba7..fdf9394 100755
+--- a/python/setup.py
++++ b/python/setup.py
+@@ -58,7 +58,7 @@ def is_sentencepiece_installed():
+
+
+ def get_cflags_and_libs(root):
+- cflags = ['-std=c++11', '-I' + os.path.join(root, 'include')]
++ cflags = ['-std=c++17', '-I' + os.path.join(root, 'include')]
+ libs = []
+ if os.path.exists(os.path.join(root, 'lib/pkgconfig/sentencepiece.pc')):
+ libs = [
+@@ -109,13 +109,13 @@ if os.name == 'nt':
+ if sys.maxsize > 2**32:
+ arch = 'amd64'
+ if os.path.exists('..\\build\\root_{}\\lib'.format(arch)):
+- cflags = ['/MT', '/I..\\build\\root_{}\\include'.format(arch)]
++ cflags = ['/std:c++17', '/MT', '/I..\\build\\root_{}\\include'.format(arch)]
+ libs = [
+ '..\\build\\root_{}\\lib\\sentencepiece.lib'.format(arch),
+ '..\\build\\root_{}\\lib\\sentencepiece_train.lib'.format(arch)
+ ]
+ else:
+- cflags = ['/MT', '/I..\\build\\root\\include']
++ cflags = ['/std:c++17', '/MT', '/I..\\build\\root\\include']
+ libs = [
+ '..\\build\\root\\lib\\sentencepiece.lib',
+ '..\\build\\root\\lib\\sentencepiece_train.lib'
+diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
+index 8b7fb76..6cb3922 100644
+--- a/src/CMakeLists.txt
++++ b/src/CMakeLists.txt
+@@ -25,7 +25,6 @@ if (SPM_USE_EXTERNAL_ABSL)
+ endif()
+ else()
+ set(ABSL_FLAGS_SRCS ${CMAKE_CURRENT_SOURCE_DIR}/../third_party/absl/flags/flag.cc)
+- set(ABSL_STRINGS_SRCS ${CMAKE_CURRENT_SOURCE_DIR}/../third_party/absl/strings/string_view.cc)
+ endif()
+
+ if (SPM_USE_BUILTIN_PROTOBUF)
+diff --git a/src/sentencepiece_processor.h b/src/sentencepiece_processor.h
+index 3f9c20d..9d38214 100644
+--- a/src/sentencepiece_processor.h
++++ b/src/sentencepiece_processor.h
+@@ -18,33 +18,13 @@
+ #include <cstring>
+ #include <memory>
+ #include <string>
++#include <string_view>
+ #include <utility>
+ #include <vector>
+
+-#if defined(_USE_INTERNAL_STRING_VIEW)
+-#include "third_party/absl/strings/string_view.h"
+-#elif defined(_USE_TF_STRING_VIEW)
+-#include "absl/strings/string_view.h"
+-#else
+-// Minimum absl::string_view class that is used only for
+-// the argument of public APIs.
+ namespace absl {
+-class string_view {
+- public:
+- string_view() : ptr_(nullptr), length_(0) {}
+- string_view(const std::string &str) : ptr_(str.data()), length_(str.size()) {}
+- string_view(const char *str) : ptr_(str), length_(std::strlen(str)) {}
+- string_view(const char *data, size_t len) : ptr_(data), length_(len) {}
+-
+- const char *data() const { return ptr_; }
+- size_t size() const { return length_; }
+-
+- private:
+- const char *ptr_ = nullptr;
+- size_t length_ = 0;
+-};
+-} // namespace absl
+-#endif
++using std::string_view;
++}
+
+ namespace sentencepiece {
+
+diff --git a/third_party/absl/strings/string_view.cc b/third_party/absl/strings/string_view.cc
+deleted file mode 100644
+index dce208d..0000000
+--- a/third_party/absl/strings/string_view.cc
++++ /dev/null
+@@ -1,267 +0,0 @@
+-// Copyright 2017 The Abseil Authors.
+-//
+-// Licensed under the Apache License, Version 2.0 (the "License");
+-// you may not use this file except in compliance with the License.
+-// You may obtain a copy of the License at
+-//
+-// http://www.apache.org/licenses/LICENSE-2.0
+-//
+-// Unless required by applicable law or agreed to in writing, software
+-// distributed under the License is distributed on an "AS IS" BASIS,
+-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-// See the License for the specific language governing permissions and
+-// limitations under the License.
+-
+-#include "third_party/absl/strings/string_view.h"
+-
+-#ifndef ABSL_HAVE_STD_STRING_VIEW
+-
+-#include <algorithm>
+-#include <climits>
+-#include <cstring>
+-#include <ostream>
+-
+-// #include "absl/strings/internal/memutil.h"
+-
+-namespace absl {
+-
+-namespace {
+-void WritePadding(std::ostream& o, size_t pad) {
+- char fill_buf[32];
+- memset(fill_buf, o.fill(), sizeof(fill_buf));
+- while (pad) {
+- size_t n = std::min(pad, sizeof(fill_buf));
+- o.write(fill_buf, n);
+- pad -= n;
+- }
+-}
+-
+-class LookupTable {
+- public:
+- // For each character in wanted, sets the index corresponding
+- // to the ASCII code of that character. This is used by
+- // the find_.*_of methods below to tell whether or not a character is in
+- // the lookup table in constant time.
+- explicit LookupTable(string_view wanted) {
+- for (char c : wanted) {
+- table_[Index(c)] = true;
+- }
+- }
+- bool operator[](char c) const { return table_[Index(c)]; }
+-
+- private:
+- static unsigned char Index(char c) { return static_cast<unsigned char>(c); }
+- bool table_[UCHAR_MAX + 1] = {};
+-};
+-
+-} // namespace
+-
+-std::ostream& operator<<(std::ostream& o, string_view piece) {
+- std::ostream::sentry sentry(o);
+- if (sentry) {
+- size_t lpad = 0;
+- size_t rpad = 0;
+- if (static_cast<size_t>(o.width()) > piece.size()) {
+- size_t pad = o.width() - piece.size();
+- if ((o.flags() & o.adjustfield) == o.left) {
+- rpad = pad;
+- } else {
+- lpad = pad;
+- }
+- }
+- if (lpad) WritePadding(o, lpad);
+- o.write(piece.data(), piece.size());
+- if (rpad) WritePadding(o, rpad);
+- o.width(0);
+- }
+- return o;
+-}
+-
+-string_view::size_type string_view::copy(char* buf, size_type n,
+- size_type pos) const {
+- size_type ulen = length_;
+- assert(pos <= ulen);
+- size_type rlen = std::min(ulen - pos, n);
+- if (rlen > 0) {
+- const char* start = ptr_ + pos;
+- std::copy(start, start + rlen, buf);
+- }
+- return rlen;
+-}
+-
+-namespace {
+-const char* memmatch(const char* phaystack, size_t haylen, const char* pneedle,
+- size_t neelen) {
+- if (0 == neelen) {
+- return phaystack; // even if haylen is 0
+- }
+- if (haylen < neelen) {
+- return nullptr;
+- }
+- const char* match;
+- const char* hayend = phaystack + haylen - neelen + 1;
+- while ((match = (const char*)(memchr(phaystack, pneedle[0],
+- hayend - phaystack)))) {
+- if (memcmp(match, pneedle, neelen) == 0) {
+- return match;
+- } else {
+- phaystack = match + 1;
+- }
+- }
+- return nullptr;
+-}
+-} // namespace
+-
+-string_view::size_type string_view::find(string_view s, size_type pos) const
+- noexcept {
+- if (empty() || pos > length_) {
+- if (empty() && pos == 0 && s.empty()) return 0;
+- return npos;
+- }
+- const char* result = memmatch(ptr_ + pos, length_ - pos, s.ptr_, s.length_);
+- return result ? result - ptr_ : npos;
+-}
+-
+-string_view::size_type string_view::find(char c, size_type pos) const noexcept {
+- if (empty() || pos >= length_) {
+- return npos;
+- }
+- const char* result =
+- static_cast<const char*>(memchr(ptr_ + pos, c, length_ - pos));
+- return result != nullptr ? result - ptr_ : npos;
+-}
+-
+-string_view::size_type string_view::rfind(string_view s, size_type pos) const
+- noexcept {
+- if (length_ < s.length_) return npos;
+- if (s.empty()) return std::min(length_, pos);
+- const char* last = ptr_ + std::min(length_ - s.length_, pos) + s.length_;
+- const char* result = std::find_end(ptr_, last, s.ptr_, s.ptr_ + s.length_);
+- return result != last ? result - ptr_ : npos;
+-}
+-
+-// Search range is [0..pos] inclusive. If pos == npos, search everything.
+-string_view::size_type string_view::rfind(char c, size_type pos) const
+- noexcept {
+- // Note: memrchr() is not available on Windows.
+- if (empty()) return npos;
+- for (size_type i = std::min(pos, length_ - 1);; --i) {
+- if (ptr_[i] == c) {
+- return i;
+- }
+- if (i == 0) break;
+- }
+- return npos;
+-}
+-
+-string_view::size_type string_view::find_first_of(string_view s,
+- size_type pos) const
+- noexcept {
+- if (empty() || s.empty()) {
+- return npos;
+- }
+- // Avoid the cost of LookupTable() for a single-character search.
+- if (s.length_ == 1) return find_first_of(s.ptr_[0], pos);
+- LookupTable tbl(s);
+- for (size_type i = pos; i < length_; ++i) {
+- if (tbl[ptr_[i]]) {
+- return i;
+- }
+- }
+- return npos;
+-}
+-
+-string_view::size_type string_view::find_first_not_of(string_view s,
+- size_type pos) const
+- noexcept {
+- if (empty()) return npos;
+- // Avoid the cost of LookupTable() for a single-character search.
+- if (s.length_ == 1) return find_first_not_of(s.ptr_[0], pos);
+- LookupTable tbl(s);
+- for (size_type i = pos; i < length_; ++i) {
+- if (!tbl[ptr_[i]]) {
+- return i;
+- }
+- }
+- return npos;
+-}
+-
+-string_view::size_type string_view::find_first_not_of(char c,
+- size_type pos) const
+- noexcept {
+- if (empty()) return npos;
+- for (; pos < length_; ++pos) {
+- if (ptr_[pos] != c) {
+- return pos;
+- }
+- }
+- return npos;
+-}
+-
+-string_view::size_type string_view::find_last_of(string_view s,
+- size_type pos) const noexcept {
+- if (empty() || s.empty()) return npos;
+- // Avoid the cost of LookupTable() for a single-character search.
+- if (s.length_ == 1) return find_last_of(s.ptr_[0], pos);
+- LookupTable tbl(s);
+- for (size_type i = std::min(pos, length_ - 1);; --i) {
+- if (tbl[ptr_[i]]) {
+- return i;
+- }
+- if (i == 0) break;
+- }
+- return npos;
+-}
+-
+-string_view::size_type string_view::find_last_not_of(string_view s,
+- size_type pos) const
+- noexcept {
+- if (empty()) return npos;
+- size_type i = std::min(pos, length_ - 1);
+- if (s.empty()) return i;
+- // Avoid the cost of LookupTable() for a single-character search.
+- if (s.length_ == 1) return find_last_not_of(s.ptr_[0], pos);
+- LookupTable tbl(s);
+- for (;; --i) {
+- if (!tbl[ptr_[i]]) {
+- return i;
+- }
+- if (i == 0) break;
+- }
+- return npos;
+-}
+-
+-string_view::size_type string_view::find_last_not_of(char c,
+- size_type pos) const
+- noexcept {
+- if (empty()) return npos;
+- size_type i = std::min(pos, length_ - 1);
+- for (;; --i) {
+- if (ptr_[i] != c) {
+- return i;
+- }
+- if (i == 0) break;
+- }
+- return npos;
+-}
+-
+-// MSVC has non-standard behavior that implicitly creates definitions for static
+-// const members. These implicit definitions conflict with explicit out-of-class
+-// member definitions that are required by the C++ standard, resulting in
+-// LNK1169 "multiply defined" errors at link time. __declspec(selectany) asks
+-// MSVC to choose only one definition for the symbol it decorates. See details
+-// at http://msdn.microsoft.com/en-us/library/34h23df8(v=vs.100).aspx
+-#ifdef _MSC_VER
+-#define ABSL_STRING_VIEW_SELECTANY __declspec(selectany)
+-#else
+-#define ABSL_STRING_VIEW_SELECTANY
+-#endif
+-
+-ABSL_STRING_VIEW_SELECTANY
+-constexpr string_view::size_type string_view::npos;
+-ABSL_STRING_VIEW_SELECTANY
+-constexpr string_view::size_type string_view::kMaxSize;
+-
+-} // namespace absl
+-
+-#endif // ABSL_HAVE_STD_STRING_VIEW
+diff --git a/third_party/absl/strings/string_view.h b/third_party/absl/strings/string_view.h
+index 68d46e3..9bb8b1c 100644
+--- a/third_party/absl/strings/string_view.h
++++ b/third_party/absl/strings/string_view.h
+@@ -28,518 +28,10 @@
+ #define ABSL_STRINGS_STRING_VIEW_H_
+
+ #include <algorithm>
+-// #include "absl/base/config.h"
+-
+-#ifdef ABSL_HAVE_STD_STRING_VIEW
+-
+ #include <string_view>
+
+ namespace absl {
+ using std::string_view;
+-} // namespace absl
+-
+-#else // ABSL_HAVE_STD_STRING_VIEW
+-
+-#include <cassert>
+-#include <cstddef>
+-#include <cstring>
+-#include <iosfwd>
+-#include <iterator>
+-#include <limits>
+-#include <string>
+-
+-#ifdef __has_builtin
+-#define ABSL_HAVE_BUILTIN(x) __has_builtin(x)
+-#else
+-#define ABSL_HAVE_BUILTIN(x) 0
+-#endif
+-
+-// #include "absl/base/internal/throw_delegate.h"
+-// #include "absl/base/macros.h"
+-// #include "absl/base/port.h"
+-
+-namespace absl {
+-
+-// absl::string_view
+-//
+-// A `string_view` provides a lightweight view into the std::string data
+-// provided by a `std::string`, double-quoted std::string literal, character
+-// array, or even another `string_view`. A `string_view` does *not* own the
+-// std::string to which it points, and that data cannot be modified through the
+-// view.
+-//
+-// You can use `string_view` as a function or method parameter anywhere a
+-// parameter can receive a double-quoted std::string literal, `const char*`,
+-// `std::string`, or another `absl::string_view` argument with no need to copy
+-// the std::string data. Systematic use of `string_view` within function
+-// arguments reduces data copies and `strlen()` calls.
+-//
+-// Because of its small size, prefer passing `string_view` by value:
+-//
+-// void MyFunction(absl::string_view arg);
+-//
+-// If circumstances require, you may also pass one by const reference:
+-//
+-// void MyFunction(const absl::string_view& arg); // not preferred
+-//
+-// Passing by value generates slightly smaller code for many architectures.
+-//
+-// In either case, the source data of the `string_view` must outlive the
+-// `string_view` itself.
+-//
+-// A `string_view` is also suitable for local variables if you know that the
+-// lifetime of the underlying object is longer than the lifetime of your
+-// `string_view` variable. However, beware of binding a `string_view` to a
+-// temporary value:
+-//
+-// // BAD use of string_view: lifetime problem
+-// absl::string_view sv = obj.ReturnAString();
+-//
+-// // GOOD use of string_view: str outlives sv
+-// std::string str = obj.ReturnAString();
+-// absl::string_view sv = str;
+-//
+-// Due to lifetime issues, a `string_view` is sometimes a poor choice for a
+-// return value and usually a poor choice for a data member. If you do use a
+-// `string_view` this way, it is your responsibility to ensure that the object
+-// pointed to by the `string_view` outlives the `string_view`.
+-//
+-// A `string_view` may represent a whole std::string or just part of a
+-// std::string. For example, when splitting a std::string,
+-// `std::vector<absl::string_view>` is a natural data type for the output.
+-//
+-//
+-// When constructed from a source which is nul-terminated, the `string_view`
+-// itself will not include the nul-terminator unless a specific size (including
+-// the nul) is passed to the constructor. As a result, common idioms that work
+-// on nul-terminated strings do not work on `string_view` objects. If you write
+-// code that scans a `string_view`, you must check its length rather than test
+-// for nul, for example. Note, however, that nuls may still be embedded within
+-// a `string_view` explicitly.
+-//
+-// You may create a null `string_view` in two ways:
+-//
+-// absl::string_view sv();
+-// absl::string_view sv(nullptr, 0);
+-//
+-// For the above, `sv.data() == nullptr`, `sv.length() == 0`, and
+-// `sv.empty() == true`. Also, if you create a `string_view` with a non-null
+-// pointer then `sv.data() != nullptr`. Thus, you can use `string_view()` to
+-// signal an undefined value that is different from other `string_view` values
+-// in a similar fashion to how `const char* p1 = nullptr;` is different from
+-// `const char* p2 = "";`. However, in practice, it is not recommended to rely
+-// on this behavior.
+-//
+-// Be careful not to confuse a null `string_view` with an empty one. A null
+-// `string_view` is an empty `string_view`, but some empty `string_view`s are
+-// not null. Prefer checking for emptiness over checking for null.
+-//
+-// There are many ways to create an empty string_view:
+-//
+-// const char* nullcp = nullptr;
+-// // string_view.size() will return 0 in all cases.
+-// absl::string_view();
+-// absl::string_view(nullcp, 0);
+-// absl::string_view("");
+-// absl::string_view("", 0);
+-// absl::string_view("abcdef", 0);
+-// absl::string_view("abcdef" + 6, 0);
+-//
+-// All empty `string_view` objects whether null or not, are equal:
+-//
+-// absl::string_view() == absl::string_view("", 0)
+-// absl::string_view(nullptr, 0) == absl:: string_view("abcdef"+6, 0)
+-class string_view {
+- public:
+- using traits_type = std::char_traits<char>;
+- using value_type = char;
+- using pointer = char*;
+- using const_pointer = const char*;
+- using reference = char&;
+- using const_reference = const char&;
+- using const_iterator = const char*;
+- using iterator = const_iterator;
+- using const_reverse_iterator = std::reverse_iterator<const_iterator>;
+- using reverse_iterator = const_reverse_iterator;
+- using size_type = size_t;
+- using difference_type = std::ptrdiff_t;
+-
+- static constexpr size_type npos = static_cast<size_type>(-1);
+-
+- // Null `string_view` constructor
+- constexpr string_view() noexcept : ptr_(nullptr), length_(0) {}
+-
+- // Implicit constructors
+-
+- template <typename Allocator>
+- string_view( // NOLINT(runtime/explicit)
+- const std::basic_string<char, std::char_traits<char>, Allocator>&
+- str) noexcept
+- : ptr_(str.data()), length_(CheckLengthInternal(str.size())) {}
+-
+- // Implicit constructor of a `string_view` from nul-terminated `str`. When
+- // accepting possibly null strings, use `absl::NullSafeStringView(str)`
+- // instead (see below).
+- constexpr string_view(const char* str) // NOLINT(runtime/explicit)
+- : ptr_(str), length_(CheckLengthInternal(StrLenInternal(str))) {}
+-
+- // Implicit constructor of a `string_view` from a `const char*` and length.
+- constexpr string_view(const char* data, size_type len)
+- : ptr_(data), length_(CheckLengthInternal(len)) {}
+-
+- // NOTE: Harmlessly omitted to work around gdb bug.
+- // constexpr string_view(const string_view&) noexcept = default;
+- // string_view& operator=(const string_view&) noexcept = default;
+-
+- // Iterators
+-
+- // string_view::begin()
+- //
+- // Returns an iterator pointing to the first character at the beginning of the
+- // `string_view`, or `end()` if the `string_view` is empty.
+- constexpr const_iterator begin() const noexcept { return ptr_; }
+-
+- // string_view::end()
+- //
+- // Returns an iterator pointing just beyond the last character at the end of
+- // the `string_view`. This iterator acts as a placeholder; attempting to
+- // access it results in undefined behavior.
+- constexpr const_iterator end() const noexcept { return ptr_ + length_; }
+-
+- // string_view::cbegin()
+- //
+- // Returns a const iterator pointing to the first character at the beginning
+- // of the `string_view`, or `end()` if the `string_view` is empty.
+- constexpr const_iterator cbegin() const noexcept { return begin(); }
+-
+- // string_view::cend()
+- //
+- // Returns a const iterator pointing just beyond the last character at the end
+- // of the `string_view`. This pointer acts as a placeholder; attempting to
+- // access its element results in undefined behavior.
+- constexpr const_iterator cend() const noexcept { return end(); }
+-
+- // string_view::rbegin()
+- //
+- // Returns a reverse iterator pointing to the last character at the end of the
+- // `string_view`, or `rend()` if the `string_view` is empty.
+- const_reverse_iterator rbegin() const noexcept {
+- return const_reverse_iterator(end());
+- }
+-
+- // string_view::rend()
+- //
+- // Returns a reverse iterator pointing just before the first character at the
+- // beginning of the `string_view`. This pointer acts as a placeholder;
+- // attempting to access its element results in undefined behavior.
+- const_reverse_iterator rend() const noexcept {
+- return const_reverse_iterator(begin());
+- }
+-
+- // string_view::crbegin()
+- //
+- // Returns a const reverse iterator pointing to the last character at the end
+- // of the `string_view`, or `crend()` if the `string_view` is empty.
+- const_reverse_iterator crbegin() const noexcept { return rbegin(); }
+-
+- // string_view::crend()
+- //
+- // Returns a const reverse iterator pointing just before the first character
+- // at the beginning of the `string_view`. This pointer acts as a placeholder;
+- // attempting to access its element results in undefined behavior.
+- const_reverse_iterator crend() const noexcept { return rend(); }
+-
+- // Capacity Utilities
+-
+- // string_view::size()
+- //
+- // Returns the number of characters in the `string_view`.
+- constexpr size_type size() const noexcept { return length_; }
+-
+- // string_view::length()
+- //
+- // Returns the number of characters in the `string_view`. Alias for `size()`.
+- constexpr size_type length() const noexcept { return size(); }
+-
+- // string_view::max_size()
+- //
+- // Returns the maximum number of characters the `string_view` can hold.
+- constexpr size_type max_size() const noexcept { return kMaxSize; }
+-
+- // string_view::empty()
+- //
+- // Checks if the `string_view` is empty (refers to no characters).
+- constexpr bool empty() const noexcept { return length_ == 0; }
+-
+- // std::string:view::operator[]
+- //
+- // Returns the ith element of an `string_view` using the array operator.
+- // Note that this operator does not perform any bounds checking.
+- constexpr const_reference operator[](size_type i) const { return ptr_[i]; }
+-
+- // string_view::front()
+- //
+- // Returns the first element of a `string_view`.
+- constexpr const_reference front() const { return ptr_[0]; }
+-
+- // string_view::back()
+- //
+- // Returns the last element of a `string_view`.
+- constexpr const_reference back() const { return ptr_[size() - 1]; }
+-
+- // string_view::data()
+- //
+- // Returns a pointer to the underlying character array (which is of course
+- // stored elsewhere). Note that `string_view::data()` may contain embedded nul
+- // characters, but the returned buffer may or may not be nul-terminated;
+- // therefore, do not pass `data()` to a routine that expects a nul-terminated
+- // std::string.
+- constexpr const_pointer data() const noexcept { return ptr_; }
+-
+- // Modifiers
+-
+- // string_view::remove_prefix()
+- //
+- // Removes the first `n` characters from the `string_view`. Note that the
+- // underlying std::string is not changed, only the view.
+- void remove_prefix(size_type n) {
+- assert(n <= length_);
+- ptr_ += n;
+- length_ -= n;
+- }
+-
+- // string_view::remove_suffix()
+- //
+- // Removes the last `n` characters from the `string_view`. Note that the
+- // underlying std::string is not changed, only the view.
+- void remove_suffix(size_type n) {
+- assert(n <= length_);
+- length_ -= n;
+- }
+-
+- // string_view::swap()
+- //
+- // Swaps this `string_view` with another `string_view`.
+- void swap(string_view& s) noexcept {
+- auto t = *this;
+- *this = s;
+- s = t;
+- }
+-
+- // Explicit conversion operators
+-
+- // Converts to `std::basic_string`.
+- template <typename A>
+- explicit operator std::basic_string<char, traits_type, A>() const {
+- if (!data()) return {};
+- return std::basic_string<char, traits_type, A>(data(), size());
+- }
+-
+- // string_view::copy()
+- //
+- // Copies the contents of the `string_view` at offset `pos` and length `n`
+- // into `buf`.
+- size_type copy(char* buf, size_type n, size_type pos = 0) const;
+-
+- // string_view::substr()
+- //
+- // Returns a "substring" of the `string_view` (at offset `pos` and length
+- // `n`) as another string_view. This function throws `std::out_of_bounds` if
+- // `pos > size'.
+- string_view substr(size_type pos, size_type n = npos) const {
+- n = std::min(n, length_ - pos);
+- return string_view(ptr_ + pos, n);
+- }
+-
+- // string_view::compare()
+- //
+- // Performs a lexicographical comparison between the `string_view` and
+- // another `absl::string_view), returning -1 if `this` is less than, 0 if
+- // `this` is equal to, and 1 if `this` is greater than the passed std::string
+- // view. Note that in the case of data equality, a further comparison is made
+- // on the respective sizes of the two `string_view`s to determine which is
+- // smaller, equal, or greater.
+- int compare(string_view x) const noexcept {
+- auto min_length = std::min(length_, x.length_);
+- if (min_length > 0) {
+- int r = memcmp(ptr_, x.ptr_, min_length);
+- if (r < 0) return -1;
+- if (r > 0) return 1;
+- }
+- if (length_ < x.length_) return -1;
+- if (length_ > x.length_) return 1;
+- return 0;
+- }
+-
+- // Overload of `string_view::compare()` for comparing a substring of the
+- // 'string_view` and another `absl::string_view`.
+- int compare(size_type pos1, size_type count1, string_view v) const {
+- return substr(pos1, count1).compare(v);
+- }
+-
+- // Overload of `string_view::compare()` for comparing a substring of the
+- // `string_view` and a substring of another `absl::string_view`.
+- int compare(size_type pos1, size_type count1, string_view v, size_type pos2,
+- size_type count2) const {
+- return substr(pos1, count1).compare(v.substr(pos2, count2));
+- }
+-
+- // Overload of `string_view::compare()` for comparing a `string_view` and a
+- // a different C-style std::string `s`.
+- int compare(const char* s) const { return compare(string_view(s)); }
+-
+- // Overload of `string_view::compare()` for comparing a substring of the
+- // `string_view` and a different std::string C-style std::string `s`.
+- int compare(size_type pos1, size_type count1, const char* s) const {
+- return substr(pos1, count1).compare(string_view(s));
+- }
+-
+- // Overload of `string_view::compare()` for comparing a substring of the
+- // `string_view` and a substring of a different C-style std::string `s`.
+- int compare(size_type pos1, size_type count1, const char* s,
+- size_type count2) const {
+- return substr(pos1, count1).compare(string_view(s, count2));
+- }
+-
+- // Find Utilities
+-
+- // string_view::find()
+- //
+- // Finds the first occurrence of the substring `s` within the `string_view`,
+- // returning the position of the first character's match, or `npos` if no
+- // match was found.
+- size_type find(string_view s, size_type pos = 0) const noexcept;
+-
+- // Overload of `string_view::find()` for finding the given character `c`
+- // within the `string_view`.
+- size_type find(char c, size_type pos = 0) const noexcept;
+-
+- // string_view::rfind()
+- //
+- // Finds the last occurrence of a substring `s` within the `string_view`,
+- // returning the position of the first character's match, or `npos` if no
+- // match was found.
+- size_type rfind(string_view s, size_type pos = npos) const noexcept;
+-
+- // Overload of `string_view::rfind()` for finding the last given character `c`
+- // within the `string_view`.
+- size_type rfind(char c, size_type pos = npos) const noexcept;
+-
+- // string_view::find_first_of()
+- //
+- // Finds the first occurrence of any of the characters in `s` within the
+- // `string_view`, returning the start position of the match, or `npos` if no
+- // match was found.
+- size_type find_first_of(string_view s, size_type pos = 0) const noexcept;
+-
+- // Overload of `string_view::find_first_of()` for finding a character `c`
+- // within the `string_view`.
+- size_type find_first_of(char c, size_type pos = 0) const noexcept {
+- return find(c, pos);
+- }
+-
+- // string_view::find_last_of()
+- //
+- // Finds the last occurrence of any of the characters in `s` within the
+- // `string_view`, returning the start position of the match, or `npos` if no
+- // match was found.
+- size_type find_last_of(string_view s, size_type pos = npos) const noexcept;
+-
+- // Overload of `string_view::find_last_of()` for finding a character `c`
+- // within the `string_view`.
+- size_type find_last_of(char c, size_type pos = npos) const noexcept {
+- return rfind(c, pos);
+- }
+-
+- // string_view::find_first_not_of()
+- //
+- // Finds the first occurrence of any of the characters not in `s` within the
+- // `string_view`, returning the start position of the first non-match, or
+- // `npos` if no non-match was found.
+- size_type find_first_not_of(string_view s, size_type pos = 0) const noexcept;
+-
+- // Overload of `string_view::find_first_not_of()` for finding a character
+- // that is not `c` within the `string_view`.
+- size_type find_first_not_of(char c, size_type pos = 0) const noexcept;
+-
+- // string_view::find_last_not_of()
+- //
+- // Finds the last occurrence of any of the characters not in `s` within the
+- // `string_view`, returning the start position of the last non-match, or
+- // `npos` if no non-match was found.
+- size_type find_last_not_of(string_view s,
+- size_type pos = npos) const noexcept;
+-
+- // Overload of `string_view::find_last_not_of()` for finding a character
+- // that is not `c` within the `string_view`.
+- size_type find_last_not_of(char c, size_type pos = npos) const noexcept;
+-
+- private:
+- static constexpr size_type kMaxSize =
+- std::numeric_limits<difference_type>::max();
+-
+- // check whether __builtin_strlen is provided by the compiler.
+- // GCC doesn't have __has_builtin()
+- // (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66970),
+- // but has __builtin_strlen according to
+- // https://gcc.gnu.org/onlinedocs/gcc-4.7.0/gcc/Other-Builtins.html.
+-#if ABSL_HAVE_BUILTIN(__builtin_strlen) || \
+- (defined(__GNUC__) && !defined(__clang__))
+- static constexpr size_type StrLenInternal(const char* str) {
+- return str ? __builtin_strlen(str) : 0;
+- }
+-#else
+- static constexpr size_type StrLenInternal(const char* str) {
+- return str ? strlen(str) : 0;
+- }
+-#endif
+-
+- static constexpr size_type CheckLengthInternal(size_type len) { return len; }
+-
+- const char* ptr_;
+- size_type length_;
+-};
+-
+-// This large function is defined inline so that in a fairly common case where
+-// one of the arguments is a literal, the compiler can elide a lot of the
+-// following comparisons.
+-inline bool operator==(string_view x, string_view y) noexcept {
+- auto len = x.size();
+- if (len != y.size()) {
+- return false;
+- }
+- return x.data() == y.data() || len <= 0 ||
+- memcmp(x.data(), y.data(), len) == 0;
+-}
+-
+-inline bool operator!=(string_view x, string_view y) noexcept {
+- return !(x == y);
+-}
+-
+-inline bool operator<(string_view x, string_view y) noexcept {
+- auto min_size = std::min(x.size(), y.size());
+- const int r = min_size == 0 ? 0 : memcmp(x.data(), y.data(), min_size);
+- return (r < 0) || (r == 0 && x.size() < y.size());
+-}
+-
+-inline bool operator>(string_view x, string_view y) noexcept { return y < x; }
+-
+-inline bool operator<=(string_view x, string_view y) noexcept {
+- return !(y < x);
+-}
+-
+-inline bool operator>=(string_view x, string_view y) noexcept {
+- return !(x < y);
+-}
+-
+-// IO Insertion Operator
+-std::ostream& operator<<(std::ostream& o, string_view piece);
+-
+-} // namespace absl
+-
+-#endif // ABSL_HAVE_STD_STRING_VIEW
+-
+-namespace absl {
+
+ // ClippedSubstr()
+ //
--- /dev/null
+From: Taku Kudo <taku@google.com>
+Date: Tue, 14 Jun 2022 02:00:43 +0900
+Subject: Uses std::atomic to define global variable
+
+Signed-off-by: Kentaro Hayashi <kenhys@gmail.com>
+---
+ src/common.h | 13 -------------
+ src/util.cc | 13 +++++++------
+ 2 files changed, 7 insertions(+), 19 deletions(-)
+
+diff --git a/src/common.h b/src/common.h
+index 7595634..6ec4c09 100644
+--- a/src/common.h
++++ b/src/common.h
+@@ -50,19 +50,6 @@ typedef uint32_t char32;
+ typedef uint32_t uint32;
+ typedef uint64_t uint64;
+
+-static constexpr uint8 kuint8max = ((uint8)0xFF);
+-static constexpr uint16 kuint16max = ((uint16)0xFFFF);
+-static constexpr uint32 kuint32max = ((uint32)0xFFFFFFFF);
+-static constexpr uint64 kuint64max = ((uint64)(0xFFFFFFFFFFFFFFFF));
+-static constexpr int8 kint8min = ((int8)~0x7F);
+-static constexpr int8 kint8max = ((int8)0x7F);
+-static constexpr int16 kint16min = ((int16)~0x7FFF);
+-static constexpr int16 kint16max = ((int16)0x7FFF);
+-static constexpr int32 kint32min = ((int32)~0x7FFFFFFF);
+-static constexpr int32 kint32max = ((int32)0x7FFFFFFF);
+-static constexpr int64 kint64min = ((int64)(~0x7FFFFFFFFFFFFFFF));
+-static constexpr int64 kint64max = ((int64)(0x7FFFFFFFFFFFFFFF));
+-
+ static constexpr uint32 kUnicodeError = 0xFFFD;
+
+ #if defined(OS_WIN) && defined(UNICODE) && defined(_UNICODE)
+diff --git a/src/util.cc b/src/util.cc
+index 8da16c4..f99c73a 100644
+--- a/src/util.cc
++++ b/src/util.cc
+@@ -14,27 +14,28 @@
+
+ #include "util.h"
+
++#include <atomic>
+ #include <iostream>
+
+ namespace sentencepiece {
+
+ namespace {
+ constexpr unsigned int kDefaultSeed = static_cast<unsigned int>(-1);
+-static unsigned int g_seed = kDefaultSeed;
+-static int g_minloglevel = 0;
++static std::atomic<unsigned int> g_seed = kDefaultSeed;
++static std::atomic<int> g_minloglevel = 0;
+ } // namespace
+
+ void SetRandomGeneratorSeed(unsigned int seed) {
+- if (seed != kDefaultSeed) g_seed = seed;
++ if (seed != kDefaultSeed) g_seed.store(seed);
+ }
+
+ uint32 GetRandomGeneratorSeed() {
+- return g_seed == kDefaultSeed ? std::random_device{}() : g_seed;
++ return g_seed == kDefaultSeed ? std::random_device{}() : g_seed.load();
+ }
+
+ namespace logging {
+-int GetMinLogLevel() { return g_minloglevel; }
+-void SetMinLogLevel(int v) { g_minloglevel = v; }
++int GetMinLogLevel() { return g_minloglevel.load(); }
++void SetMinLogLevel(int v) { g_minloglevel.store(v); }
+ } // namespace logging
+
+ namespace string_util {
--- /dev/null
+From: Kentaro Hayashi <kenhys@gmail.com>
+Date: Tue, 14 Jun 2022 20:40:59 +0900
+Subject: Fix a typo
+
+gurantees -> guarantees
+
+Signed-off-by: Kentaro Hayashi <kenhys@gmail.com>
+---
+ src/trainer_interface.cc | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/src/trainer_interface.cc b/src/trainer_interface.cc
+index 5e26b75..7270f29 100644
+--- a/src/trainer_interface.cc
++++ b/src/trainer_interface.cc
+@@ -460,11 +460,11 @@ END:
+ }
+ if (trainer_spec_.differential_privacy_noise_level() <= 0) {
+ LOG(WARNING) << "Private version with <=0 noise level will give "
+- "infinity epsilon gurantees.";
++ "infinity epsilon guarantees.";
+ }
+ if (trainer_spec_.differential_privacy_clipping_threshold() <= 0) {
+ LOG(WARNING) << "Private version with <=0 clipping threshold will give "
+- "infinity epsilon gurantees.";
++ "infinity epsilon guarantees.";
+ }
+
+ // Add noise to all the sentences via threadpool.
--- /dev/null
+From: Taku Kudo <taku@google.com>
+Date: Wed, 15 Jun 2022 01:29:55 +0900
+Subject: Uses absl::string_view as much as possible
+
+Signed-off-by: Kentaro Hayashi <kenhys@gmail.com>
+---
+ python/src/sentencepiece/__init__.py | 4 +-
+ python/src/sentencepiece/sentencepiece.i | 92 ++-----
+ python/src/sentencepiece/sentencepiece_wrap.cxx | 329 ++++++++++++++++--------
+ src/builder.cc | 2 +-
+ src/builder.h | 2 +-
+ src/common.h | 3 +-
+ src/error.cc | 9 +-
+ src/sentencepiece_processor.cc | 36 ++-
+ src/sentencepiece_processor.h | 28 +-
+ src/sentencepiece_trainer.h | 8 +-
+ src/spec_parser.h | 16 +-
+ src/spm_encode_main.cc | 22 +-
+ src/util.cc | 27 +-
+ 13 files changed, 333 insertions(+), 245 deletions(-)
+
+diff --git a/python/src/sentencepiece/__init__.py b/python/src/sentencepiece/__init__.py
+index cba3b70..1543d32 100644
+--- a/python/src/sentencepiece/__init__.py
++++ b/python/src/sentencepiece/__init__.py
+@@ -93,8 +93,8 @@ class SentencePieceProcessor(object):
+ def SampleEncodeAndScoreAsIds(self, input, num_samples, theta, wor, include_best):
+ return _sentencepiece.SentencePieceProcessor_SampleEncodeAndScoreAsIds(self, input, num_samples, theta, wor, include_best)
+
+- def CalculateEntropy(self, text, theta):
+- return _sentencepiece.SentencePieceProcessor_CalculateEntropy(self, text, theta)
++ def CalculateEntropy(self, *args):
++ return _sentencepiece.SentencePieceProcessor_CalculateEntropy(self, *args)
+
+ def GetPieceSize(self):
+ return _sentencepiece.SentencePieceProcessor_GetPieceSize(self)
+diff --git a/python/src/sentencepiece/sentencepiece.i b/python/src/sentencepiece/sentencepiece.i
+index 3a822bc..40373ce 100644
+--- a/python/src/sentencepiece/sentencepiece.i
++++ b/python/src/sentencepiece/sentencepiece.i
+@@ -37,6 +37,7 @@ class PyInputString {
+ str_ = nullptr;
+ }
+ }
++ absl::string_view str() const { return absl::string_view(data(), size()); }
+ const char* data() const { return str_; }
+ Py_ssize_t size() const { return size_; }
+ bool IsAvalable() const { return str_ != nullptr; }
+@@ -179,7 +180,7 @@ inline void CheckIds(const std::vector<int> &ids, int num_pieces) {
+ }
+ }
+
+-inline void CheckIds(const std::vector<std::string> &ids, int num_pieces) {}
++inline void CheckIds(const std::vector<absl::string_view> &ids, int num_pieces) {}
+
+ class ThreadPool {
+ public:
+@@ -266,6 +267,7 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ %ignore sentencepiece::util::Status;
+ %ignore sentencepiece::util::StatusCode;
+ %ignore absl::string_view;
++%ignore std::string_view;
+ %ignore sentencepiece::SentencePieceText;
+ %ignore sentencepiece::NormalizerSpec;
+ %ignore sentencepiece::TrainerSpec;
+@@ -386,7 +388,7 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ return $self->DecodeIds(ids);
+ }
+
+- std::string _DecodePieces(const std::vector<std::string> &pieces) const {
++ std::string _DecodePieces(const std::vector<absl::string_view> &pieces) const {
+ return $self->DecodePieces(pieces);
+ }
+
+@@ -397,7 +399,7 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ }
+
+ sentencepiece::util::bytes _DecodePiecesAsSerializedProto(
+- const std::vector<std::string> &pieces) const {
++ const std::vector<absl::string_view> &pieces) const {
+ CheckIds(pieces, $self->GetPieceSize());
+ return $self->DecodePiecesAsSerializedProto(pieces);
+ }
+@@ -416,12 +418,12 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ }
+
+ std::vector<std::string> _DecodePiecesBatch(
+- const std::vector<std::vector<std::string>> &ins, int num_threads) const {
++ const std::vector<std::vector<absl::string_view>> &ins, int num_threads) const {
+ DEFINE_DECODE_BATCH_FUNC_IMPL(DecodePieces, std::string, std::string);
+ }
+
+ BytesArray _DecodePiecesAsSerializedProtoBatch(
+- const std::vector<std::vector<std::string>> &ins, int num_threads) const {
++ const std::vector<std::vector<absl::string_view>> &ins, int num_threads) const {
+ DEFINE_DECODE_BATCH_FUNC_IMPL(DecodePiecesAsSerializedProto, std::string,
+ sentencepiece::util::bytes);
+ }
+@@ -1029,14 +1031,14 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ %typemap(out) std::vector<int> {
+ $result = PyList_New($1.size());
+ for (size_t i = 0; i < $1.size(); ++i) {
+- PyList_SetItem($result, i, PyInt_FromLong(static_cast<long>($1[i])));
++ PyList_SET_ITEM($result, i, PyInt_FromLong(static_cast<long>($1[i])));
+ }
+ }
+
+ %typemap(out) std::vector<float> {
+ $result = PyList_New($1.size());
+ for (size_t i = 0; i < $1.size(); ++i) {
+- PyList_SetItem($result, i, PyFloat_FromDouble(static_cast<double>($1[i])));
++ PyList_SET_ITEM($result, i, PyFloat_FromDouble(static_cast<double>($1[i])));
+ }
+ }
+
+@@ -1045,9 +1047,9 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ for (size_t i = 0; i < $1.size(); ++i) {
+ PyObject *obj = PyList_New($1[i].size());
+ for (size_t j = 0; j < $1[i].size(); ++j) {
+- PyList_SetItem(obj, j, PyInt_FromLong(static_cast<long>($1[i][j])));
++ PyList_SET_ITEM(obj, j, PyInt_FromLong(static_cast<long>($1[i][j])));
+ }
+- PyList_SetItem($result, i, obj);
++ PyList_SET_ITEM($result, i, obj);
+ }
+ }
+
+@@ -1055,14 +1057,14 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ PyObject *input_type = resultobj;
+ $result = PyList_New($1.size());
+ for (size_t i = 0; i < $1.size(); ++i) {
+- PyList_SetItem($result, i, MakePyOutputString($1[i], input_type));
++ PyList_SET_ITEM($result, i, MakePyOutputString($1[i], input_type));
+ }
+ }
+
+ %typemap(out) BytesArray {
+ $result = PyList_New($1.size());
+ for (size_t i = 0; i < $1.size(); ++i) {
+- PyList_SetItem($result, i, MakePyOutputBytes($1[i]));
++ PyList_SET_ITEM($result, i, MakePyOutputBytes($1[i]));
+ }
+ }
+
+@@ -1072,9 +1074,9 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ for (size_t i = 0; i < $1.size(); ++i) {
+ PyObject *obj = PyList_New($1[i].size());
+ for (size_t j = 0; j < $1[i].size(); ++j) {
+- PyList_SetItem(obj, j, MakePyOutputString($1[i][j], input_type));
++ PyList_SET_ITEM(obj, j, MakePyOutputString($1[i][j], input_type));
+ }
+- PyList_SetItem($result, i, obj);
++ PyList_SET_ITEM($result, i, obj);
+ }
+ }
+
+@@ -1118,51 +1120,7 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ SWIG_fail;
+ }
+ resultobj = ustring.input_type();
+- $1 = absl::string_view(ustring.data(), ustring.size());
+-}
+-
+-%typemap(in) const std::vector<std::string>& {
+- std::vector<std::string> *out = nullptr;
+- if (PyList_Check($input)) {
+- const size_t size = PyList_Size($input);
+- out = new std::vector<std::string>(size);
+- for (size_t i = 0; i < size; ++i) {
+- const PyInputString ustring(PyList_GetItem($input, i));
+- if (ustring.IsAvalable()) {
+- (*out)[i].assign(ustring.data(), ustring.size());
+- } else {
+- PyErr_SetString(PyExc_TypeError, "list must contain strings");
+- SWIG_fail;
+- }
+- resultobj = ustring.input_type();
+- }
+- } else {
+- PyErr_SetString(PyExc_TypeError, "not a list");
+- SWIG_fail;
+- }
+- $1 = out;
+-}
+-
+-%typemap(in) const std::vector<absl::string_view>& {
+- std::vector<absl::string_view> *out = nullptr;
+- if (PyList_Check($input)) {
+- const size_t size = PyList_Size($input);
+- out = new std::vector<std::string>(size);
+- for (size_t i = 0; i < size; ++i) {
+- const PyInputString ustring(PyList_GetItem($input, i));
+- if (ustring.IsAvalable()) {
+- (*out)[i] = absl::string_view(ustring.data(), ustring.size());
+- } else {
+- PyErr_SetString(PyExc_TypeError, "list must contain strings");
+- SWIG_fail;
+- }
+- resultobj = ustring.input_type();
+- }
+- } else {
+- PyErr_SetString(PyExc_TypeError, "not a list");
+- SWIG_fail;
+- }
+- $1 = out;
++ $1 = ustring.str();
+ }
+
+ %typemap(in) const std::vector<absl::string_view>& {
+@@ -1173,7 +1131,7 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ for (size_t i = 0; i < size; ++i) {
+ const PyInputString ustring(PyList_GetItem($input, i));
+ if (ustring.IsAvalable()) {
+- (*out)[i] = absl::string_view(ustring.data(), ustring.size());
++ (*out)[i] = ustring.str();
+ } else {
+ PyErr_SetString(PyExc_TypeError, "list must contain strings");
+ SWIG_fail;
+@@ -1208,11 +1166,11 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ $1 = out;
+ }
+
+-%typemap(in) const std::vector<std::vector<std::string>>& {
+- std::vector<std::vector<std::string>> *out = nullptr;
++%typemap(in) const std::vector<std::vector<absl::string_view>>& {
++ std::vector<std::vector<absl::string_view>> *out = nullptr;
+ if (PyList_Check($input)) {
+ const size_t size = PyList_Size($input);
+- out = new std::vector<std::vector<std::string>>(size);
++ out = new std::vector<std::vector<absl::string_view>>(size);
+ for (size_t i = 0; i < size; ++i) {
+ PyObject *o = PyList_GetItem($input, i);
+ if (PyList_Check(o)) {
+@@ -1221,7 +1179,7 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ for (size_t j = 0; j < size2; ++j) {
+ const PyInputString ustring(PyList_GetItem(o, j));
+ if (ustring.IsAvalable()) {
+- (*out)[i][j].assign(ustring.data(), ustring.size());
++ (*out)[i][j] = ustring.str();
+ } else {
+ PyErr_SetString(PyExc_TypeError,"list must contain integers");
+ SWIG_fail;
+@@ -1302,9 +1260,9 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ for (size_t i = 0; i < $1.size(); ++i) {
+ PyObject *obj = PyList_New($1[i].first.size());
+ for (size_t j = 0; j < $1[i].first.size(); ++j) {
+- PyList_SetItem(obj, j, MakePyOutputString($1[i].first[j], input_type));
++ PyList_SET_ITEM(obj, j, MakePyOutputString($1[i].first[j], input_type));
+ }
+- PyList_SetItem($result, i, PyTuple_Pack(2, obj, PyFloat_FromDouble(static_cast<double>($1[i].second))));
++ PyList_SET_ITEM($result, i, PyTuple_Pack(2, obj, PyFloat_FromDouble(static_cast<double>($1[i].second))));
+ }
+ }
+
+@@ -1313,9 +1271,9 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ for (size_t i = 0; i < $1.size(); ++i) {
+ PyObject *obj = PyList_New($1[i].first.size());
+ for (size_t j = 0; j < $1[i].first.size(); ++j) {
+- PyList_SetItem(obj, j, PyInt_FromLong(static_cast<long>($1[i].first[j])));
++ PyList_SET_ITEM(obj, j, PyInt_FromLong(static_cast<long>($1[i].first[j])));
+ }
+- PyList_SetItem($result, i, PyTuple_Pack(2, obj, PyFloat_FromDouble(static_cast<double>($1[i].second))));
++ PyList_SET_ITEM($result, i, PyTuple_Pack(2, obj, PyFloat_FromDouble(static_cast<double>($1[i].second))));
+ }
+ }
+
+diff --git a/python/src/sentencepiece/sentencepiece_wrap.cxx b/python/src/sentencepiece/sentencepiece_wrap.cxx
+index 6df3880..36ce38c 100644
+--- a/python/src/sentencepiece/sentencepiece_wrap.cxx
++++ b/python/src/sentencepiece/sentencepiece_wrap.cxx
+@@ -2693,16 +2693,16 @@ SWIGINTERN PyObject *SWIG_PyStaticMethod_New(PyObject *SWIGUNUSEDPARM(self), PyO
+ /* -------- TYPES TABLE (BEGIN) -------- */
+
+ #define SWIGTYPE_p_char swig_types[0]
+-#define SWIGTYPE_p_sentencepiece__SentenceIterator swig_types[1]
+-#define SWIGTYPE_p_sentencepiece__SentencePieceProcessor swig_types[2]
+-#define SWIGTYPE_p_sentencepiece__SentencePieceTrainer swig_types[3]
+-#define SWIGTYPE_p_std__string swig_types[4]
+-#define SWIGTYPE_p_std__unordered_mapT_std__string_std__string_t swig_types[5]
+-#define SWIGTYPE_p_std__vectorT_absl__string_view_t swig_types[6]
+-#define SWIGTYPE_p_std__vectorT_int_t swig_types[7]
+-#define SWIGTYPE_p_std__vectorT_std__string_t swig_types[8]
+-#define SWIGTYPE_p_std__vectorT_std__vectorT_int_t_t swig_types[9]
+-#define SWIGTYPE_p_std__vectorT_std__vectorT_std__string_t_t swig_types[10]
++#define SWIGTYPE_p_float swig_types[1]
++#define SWIGTYPE_p_sentencepiece__SentenceIterator swig_types[2]
++#define SWIGTYPE_p_sentencepiece__SentencePieceProcessor swig_types[3]
++#define SWIGTYPE_p_sentencepiece__SentencePieceTrainer swig_types[4]
++#define SWIGTYPE_p_std__string swig_types[5]
++#define SWIGTYPE_p_std__unordered_mapT_std__string_std__string_t swig_types[6]
++#define SWIGTYPE_p_std__vectorT_absl__string_view_t swig_types[7]
++#define SWIGTYPE_p_std__vectorT_int_t swig_types[8]
++#define SWIGTYPE_p_std__vectorT_std__vectorT_absl__string_view_t_t swig_types[9]
++#define SWIGTYPE_p_std__vectorT_std__vectorT_int_t_t swig_types[10]
+ static swig_type_info *swig_types[12];
+ static swig_module_info swig_module = {swig_types, 11, 0, 0, 0, 0};
+ #define SWIG_TypeQuery(name) SWIG_TypeQueryModule(&swig_module, &swig_module, name)
+@@ -2843,6 +2843,7 @@ class PyInputString {
+ str_ = nullptr;
+ }
+ }
++ absl::string_view str() const { return absl::string_view(data(), size()); }
+ const char* data() const { return str_; }
+ Py_ssize_t size() const { return size_; }
+ bool IsAvalable() const { return str_ != nullptr; }
+@@ -2985,7 +2986,7 @@ inline void CheckIds(const std::vector<int> &ids, int num_pieces) {
+ }
+ }
+
+-inline void CheckIds(const std::vector<std::string> &ids, int num_pieces) {}
++inline void CheckIds(const std::vector<absl::string_view> &ids, int num_pieces) {}
+
+ class ThreadPool {
+ public:
+@@ -3473,14 +3474,14 @@ SWIGINTERN std::string sentencepiece_SentencePieceProcessor__DecodeIds(sentencep
+ CheckIds(ids, self->GetPieceSize());
+ return self->DecodeIds(ids);
+ }
+-SWIGINTERN std::string sentencepiece_SentencePieceProcessor__DecodePieces(sentencepiece::SentencePieceProcessor const *self,std::vector< std::string > const &pieces){
++SWIGINTERN std::string sentencepiece_SentencePieceProcessor__DecodePieces(sentencepiece::SentencePieceProcessor const *self,std::vector< absl::string_view > const &pieces){
+ return self->DecodePieces(pieces);
+ }
+ SWIGINTERN sentencepiece::util::bytes sentencepiece_SentencePieceProcessor__DecodeIdsAsSerializedProto(sentencepiece::SentencePieceProcessor const *self,std::vector< int > const &ids){
+ CheckIds(ids, self->GetPieceSize());
+ return self->DecodeIdsAsSerializedProto(ids);
+ }
+-SWIGINTERN sentencepiece::util::bytes sentencepiece_SentencePieceProcessor__DecodePiecesAsSerializedProto(sentencepiece::SentencePieceProcessor const *self,std::vector< std::string > const &pieces){
++SWIGINTERN sentencepiece::util::bytes sentencepiece_SentencePieceProcessor__DecodePiecesAsSerializedProto(sentencepiece::SentencePieceProcessor const *self,std::vector< absl::string_view > const &pieces){
+ CheckIds(pieces, self->GetPieceSize());
+ return self->DecodePiecesAsSerializedProto(pieces);
+ }
+@@ -3491,10 +3492,10 @@ SWIGINTERN BytesArray sentencepiece_SentencePieceProcessor__DecodeIdsAsSerialize
+ DEFINE_DECODE_BATCH_FUNC_IMPL(DecodeIdsAsSerializedProto, int,
+ sentencepiece::util::bytes);
+ }
+-SWIGINTERN std::vector< std::string > sentencepiece_SentencePieceProcessor__DecodePiecesBatch(sentencepiece::SentencePieceProcessor const *self,std::vector< std::vector< std::string > > const &ins,int num_threads){
++SWIGINTERN std::vector< std::string > sentencepiece_SentencePieceProcessor__DecodePiecesBatch(sentencepiece::SentencePieceProcessor const *self,std::vector< std::vector< absl::string_view > > const &ins,int num_threads){
+ DEFINE_DECODE_BATCH_FUNC_IMPL(DecodePieces, std::string, std::string);
+ }
+-SWIGINTERN BytesArray sentencepiece_SentencePieceProcessor__DecodePiecesAsSerializedProtoBatch(sentencepiece::SentencePieceProcessor const *self,std::vector< std::vector< std::string > > const &ins,int num_threads){
++SWIGINTERN BytesArray sentencepiece_SentencePieceProcessor__DecodePiecesAsSerializedProtoBatch(sentencepiece::SentencePieceProcessor const *self,std::vector< std::vector< absl::string_view > > const &ins,int num_threads){
+ DEFINE_DECODE_BATCH_FUNC_IMPL(DecodePiecesAsSerializedProto, std::string,
+ sentencepiece::util::bytes);
+ }
+@@ -3718,7 +3719,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_LoadFromSerializedProto(PyObje
+ SWIG_fail;
+ }
+ resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
++ arg2 = ustring.str();
+ }
+ {
+ try {
+@@ -3763,7 +3764,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_SetEncodeExtraOptions(PyObject
+ SWIG_fail;
+ }
+ resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
++ arg2 = ustring.str();
+ }
+ {
+ try {
+@@ -3808,7 +3809,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_SetDecodeExtraOptions(PyObject
+ SWIG_fail;
+ }
+ resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
++ arg2 = ustring.str();
+ }
+ {
+ try {
+@@ -3834,7 +3835,7 @@ fail:
+ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_SetVocabulary(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- std::vector< std::string > *arg2 = 0 ;
++ std::vector< absl::string_view > *arg2 = 0 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+ PyObject *swig_obj[2] ;
+@@ -3847,14 +3848,14 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_SetVocabulary(PyObject *SWIGUN
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+ {
+- std::vector<std::string> *out = nullptr;
++ std::vector<absl::string_view> *out = nullptr;
+ if (PyList_Check(swig_obj[1])) {
+ const size_t size = PyList_Size(swig_obj[1]);
+- out = new std::vector<std::string>(size);
++ out = new std::vector<absl::string_view>(size);
+ for (size_t i = 0; i < size; ++i) {
+ const PyInputString ustring(PyList_GetItem(swig_obj[1], i));
+ if (ustring.IsAvalable()) {
+- (*out)[i].assign(ustring.data(), ustring.size());
++ (*out)[i] = ustring.str();
+ } else {
+ PyErr_SetString(PyExc_TypeError, "list must contain strings");
+ SWIG_fail;
+@@ -3869,7 +3870,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_SetVocabulary(PyObject *SWIGUN
+ }
+ {
+ try {
+- result = (arg1)->SetVocabulary((std::vector< std::string > const &)*arg2);
++ result = (arg1)->SetVocabulary((std::vector< absl::string_view > const &)*arg2);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+@@ -3955,7 +3956,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_LoadVocabulary(PyObject *SWIGU
+ SWIG_fail;
+ }
+ resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
++ arg2 = ustring.str();
+ }
+ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
+ if (!SWIG_IsOK(ecode3)) {
+@@ -3983,6 +3984,66 @@ fail:
+ }
+
+
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor_CalculateEntropy__SWIG_0(PyObject *SWIGUNUSEDPARM(self), Py_ssize_t nobjs, PyObject **swig_obj) {
++ PyObject *resultobj = 0;
++ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
++ absl::string_view arg2 ;
++ float arg3 ;
++ float *arg4 = (float *) 0 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ float val3 ;
++ int ecode3 = 0 ;
++ void *argp4 = 0 ;
++ int res4 = 0 ;
++ sentencepiece::util::Status result;
++
++ if ((nobjs < 4) || (nobjs > 4)) SWIG_fail;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_CalculateEntropy" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
++ {
++ const PyInputString ustring(swig_obj[1]);
++ if (!ustring.IsAvalable()) {
++ PyErr_SetString(PyExc_TypeError, "not a string");
++ SWIG_fail;
++ }
++ resultobj = ustring.input_type();
++ arg2 = ustring.str();
++ }
++ ecode3 = SWIG_AsVal_float(swig_obj[2], &val3);
++ if (!SWIG_IsOK(ecode3)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor_CalculateEntropy" "', argument " "3"" of type '" "float""'");
++ }
++ arg3 = static_cast< float >(val3);
++ res4 = SWIG_ConvertPtr(swig_obj[3], &argp4,SWIGTYPE_p_float, 0 | 0 );
++ if (!SWIG_IsOK(res4)) {
++ SWIG_exception_fail(SWIG_ArgError(res4), "in method '" "SentencePieceProcessor_CalculateEntropy" "', argument " "4"" of type '" "float *""'");
++ }
++ arg4 = reinterpret_cast< float * >(argp4);
++ {
++ try {
++ result = ((sentencepiece::SentencePieceProcessor const *)arg1)->CalculateEntropy(arg2,arg3,arg4);
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ {
++ if (!(&result)->ok()) {
++ SWIG_exception(ToSwigError((&result)->code()), (&result)->ToString().c_str());
++ }
++ resultobj = SWIG_From_bool((&result)->ok());
++ }
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
+ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_SampleEncodeAndScoreAsPieces(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+@@ -4017,7 +4078,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_SampleEncodeAndScoreAsPieces(P
+ SWIG_fail;
+ }
+ resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
++ arg2 = ustring.str();
+ }
+ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
+ if (!SWIG_IsOK(ecode3)) {
+@@ -4054,9 +4115,9 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_SampleEncodeAndScoreAsPieces(P
+ for (size_t i = 0; i < (&result)->size(); ++i) {
+ PyObject *obj = PyList_New(result[i].first.size());
+ for (size_t j = 0; j < result[i].first.size(); ++j) {
+- PyList_SetItem(obj, j, MakePyOutputString(result[i].first[j], input_type));
++ PyList_SET_ITEM(obj, j, MakePyOutputString(result[i].first[j], input_type));
+ }
+- PyList_SetItem(resultobj, i, PyTuple_Pack(2, obj, PyFloat_FromDouble(static_cast<double>(result[i].second))));
++ PyList_SET_ITEM(resultobj, i, PyTuple_Pack(2, obj, PyFloat_FromDouble(static_cast<double>(result[i].second))));
+ }
+ }
+ return resultobj;
+@@ -4099,7 +4160,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_SampleEncodeAndScoreAsIds(PyOb
+ SWIG_fail;
+ }
+ resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
++ arg2 = ustring.str();
+ }
+ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
+ if (!SWIG_IsOK(ecode3)) {
+@@ -4135,9 +4196,9 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_SampleEncodeAndScoreAsIds(PyOb
+ for (size_t i = 0; i < (&result)->size(); ++i) {
+ PyObject *obj = PyList_New(result[i].first.size());
+ for (size_t j = 0; j < result[i].first.size(); ++j) {
+- PyList_SetItem(obj, j, PyInt_FromLong(static_cast<long>(result[i].first[j])));
++ PyList_SET_ITEM(obj, j, PyInt_FromLong(static_cast<long>(result[i].first[j])));
+ }
+- PyList_SetItem(resultobj, i, PyTuple_Pack(2, obj, PyFloat_FromDouble(static_cast<double>(result[i].second))));
++ PyList_SET_ITEM(resultobj, i, PyTuple_Pack(2, obj, PyFloat_FromDouble(static_cast<double>(result[i].second))));
+ }
+ }
+ return resultobj;
+@@ -4146,7 +4207,7 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_CalculateEntropy(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor_CalculateEntropy__SWIG_1(PyObject *SWIGUNUSEDPARM(self), Py_ssize_t nobjs, PyObject **swig_obj) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+ absl::string_view arg2 ;
+@@ -4155,10 +4216,9 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_CalculateEntropy(PyObject *SWI
+ int res1 = 0 ;
+ float val3 ;
+ int ecode3 = 0 ;
+- PyObject *swig_obj[3] ;
+ float result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_CalculateEntropy", 3, 3, swig_obj)) SWIG_fail;
++ if ((nobjs < 3) || (nobjs > 3)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_CalculateEntropy" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+@@ -4171,7 +4231,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_CalculateEntropy(PyObject *SWI
+ SWIG_fail;
+ }
+ resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
++ arg2 = ustring.str();
+ }
+ ecode3 = SWIG_AsVal_float(swig_obj[2], &val3);
+ if (!SWIG_IsOK(ecode3)) {
+@@ -4194,6 +4254,67 @@ fail:
+ }
+
+
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor_CalculateEntropy(PyObject *self, PyObject *args) {
++ Py_ssize_t argc;
++ PyObject *argv[5] = {
++ 0
++ };
++
++ if (!(argc = SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_CalculateEntropy", 0, 4, argv))) SWIG_fail;
++ --argc;
++ if (argc == 3) {
++ int _v;
++ void *vptr = 0;
++ int res = SWIG_ConvertPtr(argv[0], &vptr, SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0);
++ _v = SWIG_CheckState(res);
++ if (_v) {
++ int res = SWIG_AsCharPtrAndSize(argv[1], 0, NULL, 0);
++ _v = SWIG_CheckState(res);
++ if (_v) {
++ {
++ int res = SWIG_AsVal_float(argv[2], NULL);
++ _v = SWIG_CheckState(res);
++ }
++ if (_v) {
++ return _wrap_SentencePieceProcessor_CalculateEntropy__SWIG_1(self, argc, argv);
++ }
++ }
++ }
++ }
++ if (argc == 4) {
++ int _v;
++ void *vptr = 0;
++ int res = SWIG_ConvertPtr(argv[0], &vptr, SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0);
++ _v = SWIG_CheckState(res);
++ if (_v) {
++ int res = SWIG_AsCharPtrAndSize(argv[1], 0, NULL, 0);
++ _v = SWIG_CheckState(res);
++ if (_v) {
++ {
++ int res = SWIG_AsVal_float(argv[2], NULL);
++ _v = SWIG_CheckState(res);
++ }
++ if (_v) {
++ void *vptr = 0;
++ int res = SWIG_ConvertPtr(argv[3], &vptr, SWIGTYPE_p_float, 0);
++ _v = SWIG_CheckState(res);
++ if (_v) {
++ return _wrap_SentencePieceProcessor_CalculateEntropy__SWIG_0(self, argc, argv);
++ }
++ }
++ }
++ }
++ }
++
++fail:
++ SWIG_Python_RaiseOrModifyTypeError("Wrong number or type of arguments for overloaded function 'SentencePieceProcessor_CalculateEntropy'.\n"
++ " Possible C/C++ prototypes are:\n"
++ " sentencepiece::SentencePieceProcessor::CalculateEntropy(absl::string_view,float,float *) const\n"
++ " sentencepiece::SentencePieceProcessor::CalculateEntropy(absl::string_view,float) const\n");
++ return 0;
++}
++
++
+ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_GetPieceSize(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+@@ -4247,7 +4368,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_PieceToId(PyObject *SWIGUNUSED
+ SWIG_fail;
+ }
+ resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
++ arg2 = ustring.str();
+ }
+ {
+ try {
+@@ -4675,7 +4796,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_LoadFromFile(PyObject *SWIGUNU
+ SWIG_fail;
+ }
+ resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
++ arg2 = ustring.str();
+ }
+ {
+ try {
+@@ -4741,7 +4862,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsIds(PyObject *SWIGUNU
+ SWIG_fail;
+ }
+ resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
++ arg2 = ustring.str();
+ }
+ ecode3 = SWIG_AsVal_bool(swig_obj[2], &val3);
+ if (!SWIG_IsOK(ecode3)) {
+@@ -4790,7 +4911,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsIds(PyObject *SWIGUNU
+ {
+ resultobj = PyList_New((&result)->size());
+ for (size_t i = 0; i < (&result)->size(); ++i) {
+- PyList_SetItem(resultobj, i, PyInt_FromLong(static_cast<long>(result[i])));
++ PyList_SET_ITEM(resultobj, i, PyInt_FromLong(static_cast<long>(result[i])));
+ }
+ }
+ return resultobj;
+@@ -4842,7 +4963,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsPieces(PyObject *SWIG
+ SWIG_fail;
+ }
+ resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
++ arg2 = ustring.str();
+ }
+ ecode3 = SWIG_AsVal_bool(swig_obj[2], &val3);
+ if (!SWIG_IsOK(ecode3)) {
+@@ -4892,7 +5013,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsPieces(PyObject *SWIG
+ PyObject *input_type = resultobj;
+ resultobj = PyList_New((&result)->size());
+ for (size_t i = 0; i < (&result)->size(); ++i) {
+- PyList_SetItem(resultobj, i, MakePyOutputString(result[i], input_type));
++ PyList_SET_ITEM(resultobj, i, MakePyOutputString(result[i], input_type));
+ }
+ }
+ return resultobj;
+@@ -4944,7 +5065,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsSerializedProto(PyObj
+ SWIG_fail;
+ }
+ resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
++ arg2 = ustring.str();
+ }
+ ecode3 = SWIG_AsVal_bool(swig_obj[2], &val3);
+ if (!SWIG_IsOK(ecode3)) {
+@@ -5046,7 +5167,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsIdsBatch(PyObject *SW
+ for (size_t i = 0; i < size; ++i) {
+ const PyInputString ustring(PyList_GetItem(swig_obj[1], i));
+ if (ustring.IsAvalable()) {
+- (*out)[i] = absl::string_view(ustring.data(), ustring.size());
++ (*out)[i] = ustring.str();
+ } else {
+ PyErr_SetString(PyExc_TypeError, "list must contain strings");
+ SWIG_fail;
+@@ -5113,9 +5234,9 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsIdsBatch(PyObject *SW
+ for (size_t i = 0; i < (&result)->size(); ++i) {
+ PyObject *obj = PyList_New(result[i].size());
+ for (size_t j = 0; j < result[i].size(); ++j) {
+- PyList_SetItem(obj, j, PyInt_FromLong(static_cast<long>(result[i][j])));
++ PyList_SET_ITEM(obj, j, PyInt_FromLong(static_cast<long>(result[i][j])));
+ }
+- PyList_SetItem(resultobj, i, obj);
++ PyList_SET_ITEM(resultobj, i, obj);
+ }
+ }
+ {
+@@ -5177,7 +5298,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsPiecesBatch(PyObject
+ for (size_t i = 0; i < size; ++i) {
+ const PyInputString ustring(PyList_GetItem(swig_obj[1], i));
+ if (ustring.IsAvalable()) {
+- (*out)[i] = absl::string_view(ustring.data(), ustring.size());
++ (*out)[i] = ustring.str();
+ } else {
+ PyErr_SetString(PyExc_TypeError, "list must contain strings");
+ SWIG_fail;
+@@ -5245,9 +5366,9 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsPiecesBatch(PyObject
+ for (size_t i = 0; i < (&result)->size(); ++i) {
+ PyObject *obj = PyList_New(result[i].size());
+ for (size_t j = 0; j < result[i].size(); ++j) {
+- PyList_SetItem(obj, j, MakePyOutputString(result[i][j], input_type));
++ PyList_SET_ITEM(obj, j, MakePyOutputString(result[i][j], input_type));
+ }
+- PyList_SetItem(resultobj, i, obj);
++ PyList_SET_ITEM(resultobj, i, obj);
+ }
+ }
+ {
+@@ -5309,7 +5430,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsSerializedProtoBatch(
+ for (size_t i = 0; i < size; ++i) {
+ const PyInputString ustring(PyList_GetItem(swig_obj[1], i));
+ if (ustring.IsAvalable()) {
+- (*out)[i] = absl::string_view(ustring.data(), ustring.size());
++ (*out)[i] = ustring.str();
+ } else {
+ PyErr_SetString(PyExc_TypeError, "list must contain strings");
+ SWIG_fail;
+@@ -5374,7 +5495,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsSerializedProtoBatch(
+ {
+ resultobj = PyList_New((&result)->size());
+ for (size_t i = 0; i < (&result)->size(); ++i) {
+- PyList_SetItem(resultobj, i, MakePyOutputBytes(result[i]));
++ PyList_SET_ITEM(resultobj, i, MakePyOutputBytes(result[i]));
+ }
+ }
+ {
+@@ -5452,7 +5573,7 @@ fail:
+ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodePieces(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- std::vector< std::string > *arg2 = 0 ;
++ std::vector< absl::string_view > *arg2 = 0 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+ PyObject *swig_obj[2] ;
+@@ -5465,14 +5586,14 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodePieces(PyObject *SWIGUN
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+ {
+- std::vector<std::string> *out = nullptr;
++ std::vector<absl::string_view> *out = nullptr;
+ if (PyList_Check(swig_obj[1])) {
+ const size_t size = PyList_Size(swig_obj[1]);
+- out = new std::vector<std::string>(size);
++ out = new std::vector<absl::string_view>(size);
+ for (size_t i = 0; i < size; ++i) {
+ const PyInputString ustring(PyList_GetItem(swig_obj[1], i));
+ if (ustring.IsAvalable()) {
+- (*out)[i].assign(ustring.data(), ustring.size());
++ (*out)[i] = ustring.str();
+ } else {
+ PyErr_SetString(PyExc_TypeError, "list must contain strings");
+ SWIG_fail;
+@@ -5487,7 +5608,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodePieces(PyObject *SWIGUN
+ }
+ {
+ try {
+- result = sentencepiece_SentencePieceProcessor__DecodePieces((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< std::string > const &)*arg2);
++ result = sentencepiece_SentencePieceProcessor__DecodePieces((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< absl::string_view > const &)*arg2);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+@@ -5572,7 +5693,7 @@ fail:
+ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodePiecesAsSerializedProto(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- std::vector< std::string > *arg2 = 0 ;
++ std::vector< absl::string_view > *arg2 = 0 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+ PyObject *swig_obj[2] ;
+@@ -5585,14 +5706,14 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodePiecesAsSerializedProto
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+ {
+- std::vector<std::string> *out = nullptr;
++ std::vector<absl::string_view> *out = nullptr;
+ if (PyList_Check(swig_obj[1])) {
+ const size_t size = PyList_Size(swig_obj[1]);
+- out = new std::vector<std::string>(size);
++ out = new std::vector<absl::string_view>(size);
+ for (size_t i = 0; i < size; ++i) {
+ const PyInputString ustring(PyList_GetItem(swig_obj[1], i));
+ if (ustring.IsAvalable()) {
+- (*out)[i].assign(ustring.data(), ustring.size());
++ (*out)[i] = ustring.str();
+ } else {
+ PyErr_SetString(PyExc_TypeError, "list must contain strings");
+ SWIG_fail;
+@@ -5607,7 +5728,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodePiecesAsSerializedProto
+ }
+ {
+ try {
+- result = sentencepiece_SentencePieceProcessor__DecodePiecesAsSerializedProto((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< std::string > const &)*arg2);
++ result = sentencepiece_SentencePieceProcessor__DecodePiecesAsSerializedProto((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< absl::string_view > const &)*arg2);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+@@ -5695,7 +5816,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodeIdsBatch(PyObject *SWIG
+ PyObject *input_type = resultobj;
+ resultobj = PyList_New((&result)->size());
+ for (size_t i = 0; i < (&result)->size(); ++i) {
+- PyList_SetItem(resultobj, i, MakePyOutputString(result[i], input_type));
++ PyList_SET_ITEM(resultobj, i, MakePyOutputString(result[i], input_type));
+ }
+ }
+ {
+@@ -5775,7 +5896,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodeIdsAsSerializedProtoBat
+ {
+ resultobj = PyList_New((&result)->size());
+ for (size_t i = 0; i < (&result)->size(); ++i) {
+- PyList_SetItem(resultobj, i, MakePyOutputBytes(result[i]));
++ PyList_SET_ITEM(resultobj, i, MakePyOutputBytes(result[i]));
+ }
+ }
+ {
+@@ -5793,7 +5914,7 @@ fail:
+ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodePiecesBatch(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- std::vector< std::vector< std::string > > *arg2 = 0 ;
++ std::vector< std::vector< absl::string_view > > *arg2 = 0 ;
+ int arg3 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+@@ -5809,10 +5930,10 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodePiecesBatch(PyObject *S
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+ {
+- std::vector<std::vector<std::string>> *out = nullptr;
++ std::vector<std::vector<absl::string_view>> *out = nullptr;
+ if (PyList_Check(swig_obj[1])) {
+ const size_t size = PyList_Size(swig_obj[1]);
+- out = new std::vector<std::vector<std::string>>(size);
++ out = new std::vector<std::vector<absl::string_view>>(size);
+ for (size_t i = 0; i < size; ++i) {
+ PyObject *o = PyList_GetItem(swig_obj[1], i);
+ if (PyList_Check(o)) {
+@@ -5821,7 +5942,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodePiecesBatch(PyObject *S
+ for (size_t j = 0; j < size2; ++j) {
+ const PyInputString ustring(PyList_GetItem(o, j));
+ if (ustring.IsAvalable()) {
+- (*out)[i][j].assign(ustring.data(), ustring.size());
++ (*out)[i][j] = ustring.str();
+ } else {
+ PyErr_SetString(PyExc_TypeError,"list must contain integers");
+ SWIG_fail;
+@@ -5846,7 +5967,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodePiecesBatch(PyObject *S
+ arg3 = static_cast< int >(val3);
+ {
+ try {
+- result = sentencepiece_SentencePieceProcessor__DecodePiecesBatch((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< std::vector< std::string > > const &)*arg2,arg3);
++ result = sentencepiece_SentencePieceProcessor__DecodePiecesBatch((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< std::vector< absl::string_view > > const &)*arg2,arg3);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+@@ -5857,17 +5978,11 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodePiecesBatch(PyObject *S
+ PyObject *input_type = resultobj;
+ resultobj = PyList_New((&result)->size());
+ for (size_t i = 0; i < (&result)->size(); ++i) {
+- PyList_SetItem(resultobj, i, MakePyOutputString(result[i], input_type));
++ PyList_SET_ITEM(resultobj, i, MakePyOutputString(result[i], input_type));
+ }
+ }
+- {
+- delete arg2;
+- }
+ return resultobj;
+ fail:
+- {
+- delete arg2;
+- }
+ return NULL;
+ }
+
+@@ -5875,7 +5990,7 @@ fail:
+ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodePiecesAsSerializedProtoBatch(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- std::vector< std::vector< std::string > > *arg2 = 0 ;
++ std::vector< std::vector< absl::string_view > > *arg2 = 0 ;
+ int arg3 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+@@ -5891,10 +6006,10 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodePiecesAsSerializedProto
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+ {
+- std::vector<std::vector<std::string>> *out = nullptr;
++ std::vector<std::vector<absl::string_view>> *out = nullptr;
+ if (PyList_Check(swig_obj[1])) {
+ const size_t size = PyList_Size(swig_obj[1]);
+- out = new std::vector<std::vector<std::string>>(size);
++ out = new std::vector<std::vector<absl::string_view>>(size);
+ for (size_t i = 0; i < size; ++i) {
+ PyObject *o = PyList_GetItem(swig_obj[1], i);
+ if (PyList_Check(o)) {
+@@ -5903,7 +6018,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodePiecesAsSerializedProto
+ for (size_t j = 0; j < size2; ++j) {
+ const PyInputString ustring(PyList_GetItem(o, j));
+ if (ustring.IsAvalable()) {
+- (*out)[i][j].assign(ustring.data(), ustring.size());
++ (*out)[i][j] = ustring.str();
+ } else {
+ PyErr_SetString(PyExc_TypeError,"list must contain integers");
+ SWIG_fail;
+@@ -5928,7 +6043,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodePiecesAsSerializedProto
+ arg3 = static_cast< int >(val3);
+ {
+ try {
+- result = sentencepiece_SentencePieceProcessor__DecodePiecesAsSerializedProtoBatch((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< std::vector< std::string > > const &)*arg2,arg3);
++ result = sentencepiece_SentencePieceProcessor__DecodePiecesAsSerializedProtoBatch((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< std::vector< absl::string_view > > const &)*arg2,arg3);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+@@ -5938,17 +6053,11 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodePiecesAsSerializedProto
+ {
+ resultobj = PyList_New((&result)->size());
+ for (size_t i = 0; i < (&result)->size(); ++i) {
+- PyList_SetItem(resultobj, i, MakePyOutputBytes(result[i]));
++ PyList_SET_ITEM(resultobj, i, MakePyOutputBytes(result[i]));
+ }
+ }
+- {
+- delete arg2;
+- }
+ return resultobj;
+ fail:
+- {
+- delete arg2;
+- }
+ return NULL;
+ }
+
+@@ -5990,7 +6099,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__NBestEncodeAsIds(PyObject *SW
+ SWIG_fail;
+ }
+ resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
++ arg2 = ustring.str();
+ }
+ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
+ if (!SWIG_IsOK(ecode3)) {
+@@ -6031,9 +6140,9 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__NBestEncodeAsIds(PyObject *SW
+ for (size_t i = 0; i < (&result)->size(); ++i) {
+ PyObject *obj = PyList_New(result[i].size());
+ for (size_t j = 0; j < result[i].size(); ++j) {
+- PyList_SetItem(obj, j, PyInt_FromLong(static_cast<long>(result[i][j])));
++ PyList_SET_ITEM(obj, j, PyInt_FromLong(static_cast<long>(result[i][j])));
+ }
+- PyList_SetItem(resultobj, i, obj);
++ PyList_SET_ITEM(resultobj, i, obj);
+ }
+ }
+ return resultobj;
+@@ -6079,7 +6188,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__NBestEncodeAsPieces(PyObject
+ SWIG_fail;
+ }
+ resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
++ arg2 = ustring.str();
+ }
+ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
+ if (!SWIG_IsOK(ecode3)) {
+@@ -6121,9 +6230,9 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__NBestEncodeAsPieces(PyObject
+ for (size_t i = 0; i < (&result)->size(); ++i) {
+ PyObject *obj = PyList_New(result[i].size());
+ for (size_t j = 0; j < result[i].size(); ++j) {
+- PyList_SetItem(obj, j, MakePyOutputString(result[i][j], input_type));
++ PyList_SET_ITEM(obj, j, MakePyOutputString(result[i][j], input_type));
+ }
+- PyList_SetItem(resultobj, i, obj);
++ PyList_SET_ITEM(resultobj, i, obj);
+ }
+ }
+ return resultobj;
+@@ -6169,7 +6278,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__NBestEncodeAsSerializedProto(
+ SWIG_fail;
+ }
+ resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
++ arg2 = ustring.str();
+ }
+ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
+ if (!SWIG_IsOK(ecode3)) {
+@@ -6260,7 +6369,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__SampleEncodeAndScoreAsIds(PyO
+ SWIG_fail;
+ }
+ resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
++ arg2 = ustring.str();
+ }
+ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
+ if (!SWIG_IsOK(ecode3)) {
+@@ -6316,9 +6425,9 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__SampleEncodeAndScoreAsIds(PyO
+ for (size_t i = 0; i < (&result)->size(); ++i) {
+ PyObject *obj = PyList_New(result[i].first.size());
+ for (size_t j = 0; j < result[i].first.size(); ++j) {
+- PyList_SetItem(obj, j, PyInt_FromLong(static_cast<long>(result[i].first[j])));
++ PyList_SET_ITEM(obj, j, PyInt_FromLong(static_cast<long>(result[i].first[j])));
+ }
+- PyList_SetItem(resultobj, i, PyTuple_Pack(2, obj, PyFloat_FromDouble(static_cast<double>(result[i].second))));
++ PyList_SET_ITEM(resultobj, i, PyTuple_Pack(2, obj, PyFloat_FromDouble(static_cast<double>(result[i].second))));
+ }
+ }
+ return resultobj;
+@@ -6373,7 +6482,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__SampleEncodeAndScoreAsPieces(
+ SWIG_fail;
+ }
+ resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
++ arg2 = ustring.str();
+ }
+ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
+ if (!SWIG_IsOK(ecode3)) {
+@@ -6430,9 +6539,9 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__SampleEncodeAndScoreAsPieces(
+ for (size_t i = 0; i < (&result)->size(); ++i) {
+ PyObject *obj = PyList_New(result[i].first.size());
+ for (size_t j = 0; j < result[i].first.size(); ++j) {
+- PyList_SetItem(obj, j, MakePyOutputString(result[i].first[j], input_type));
++ PyList_SET_ITEM(obj, j, MakePyOutputString(result[i].first[j], input_type));
+ }
+- PyList_SetItem(resultobj, i, PyTuple_Pack(2, obj, PyFloat_FromDouble(static_cast<double>(result[i].second))));
++ PyList_SET_ITEM(resultobj, i, PyTuple_Pack(2, obj, PyFloat_FromDouble(static_cast<double>(result[i].second))));
+ }
+ }
+ return resultobj;
+@@ -6466,7 +6575,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__CalculateEntropy(PyObject *SW
+ SWIG_fail;
+ }
+ resultobj = ustring.input_type();
+- arg2 = absl::string_view(ustring.data(), ustring.size());
++ arg2 = ustring.str();
+ }
+ ecode3 = SWIG_AsVal_float(swig_obj[2], &val3);
+ if (!SWIG_IsOK(ecode3)) {
+@@ -6518,7 +6627,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__CalculateEntropyBatch(PyObjec
+ for (size_t i = 0; i < size; ++i) {
+ const PyInputString ustring(PyList_GetItem(swig_obj[1], i));
+ if (ustring.IsAvalable()) {
+- (*out)[i] = absl::string_view(ustring.data(), ustring.size());
++ (*out)[i] = ustring.str();
+ } else {
+ PyErr_SetString(PyExc_TypeError, "list must contain strings");
+ SWIG_fail;
+@@ -6553,7 +6662,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__CalculateEntropyBatch(PyObjec
+ {
+ resultobj = PyList_New((&result)->size());
+ for (size_t i = 0; i < (&result)->size(); ++i) {
+- PyList_SetItem(resultobj, i, PyFloat_FromDouble(static_cast<double>(result[i])));
++ PyList_SET_ITEM(resultobj, i, PyFloat_FromDouble(static_cast<double>(result[i])));
+ }
+ }
+ {
+@@ -6623,7 +6732,7 @@ SWIGINTERN PyObject *_wrap_SentencePieceTrainer__TrainFromString(PyObject *SWIGU
+ SWIG_fail;
+ }
+ resultobj = ustring.input_type();
+- arg1 = absl::string_view(ustring.data(), ustring.size());
++ arg1 = ustring.str();
+ }
+ {
+ try {
+@@ -6966,6 +7075,7 @@ static PyMethodDef SwigMethods_proxydocs[] = {
+ /* -------- TYPE CONVERSION AND EQUIVALENCE RULES (BEGIN) -------- */
+
+ static swig_type_info _swigt__p_char = {"_p_char", "char *", 0, 0, (void*)0, 0};
++static swig_type_info _swigt__p_float = {"_p_float", "float *", 0, 0, (void*)0, 0};
+ static swig_type_info _swigt__p_sentencepiece__SentenceIterator = {"_p_sentencepiece__SentenceIterator", "sentencepiece::SentenceIterator *", 0, 0, (void*)0, 0};
+ static swig_type_info _swigt__p_sentencepiece__SentencePieceProcessor = {"_p_sentencepiece__SentencePieceProcessor", "sentencepiece::SentencePieceProcessor *", 0, 0, (void*)0, 0};
+ static swig_type_info _swigt__p_sentencepiece__SentencePieceTrainer = {"_p_sentencepiece__SentencePieceTrainer", "sentencepiece::SentencePieceTrainer *", 0, 0, (void*)0, 0};
+@@ -6973,12 +7083,12 @@ static swig_type_info _swigt__p_std__string = {"_p_std__string", "sentencepiece:
+ static swig_type_info _swigt__p_std__unordered_mapT_std__string_std__string_t = {"_p_std__unordered_mapT_std__string_std__string_t", "std::unordered_map< std::string,std::string > *", 0, 0, (void*)0, 0};
+ static swig_type_info _swigt__p_std__vectorT_absl__string_view_t = {"_p_std__vectorT_absl__string_view_t", "std::vector< absl::string_view > *", 0, 0, (void*)0, 0};
+ static swig_type_info _swigt__p_std__vectorT_int_t = {"_p_std__vectorT_int_t", "std::vector< int > *", 0, 0, (void*)0, 0};
+-static swig_type_info _swigt__p_std__vectorT_std__string_t = {"_p_std__vectorT_std__string_t", "std::vector< std::string > *", 0, 0, (void*)0, 0};
++static swig_type_info _swigt__p_std__vectorT_std__vectorT_absl__string_view_t_t = {"_p_std__vectorT_std__vectorT_absl__string_view_t_t", "std::vector< std::vector< absl::string_view > > *", 0, 0, (void*)0, 0};
+ static swig_type_info _swigt__p_std__vectorT_std__vectorT_int_t_t = {"_p_std__vectorT_std__vectorT_int_t_t", "std::vector< std::vector< int > > *", 0, 0, (void*)0, 0};
+-static swig_type_info _swigt__p_std__vectorT_std__vectorT_std__string_t_t = {"_p_std__vectorT_std__vectorT_std__string_t_t", "std::vector< std::vector< std::string > > *", 0, 0, (void*)0, 0};
+
+ static swig_type_info *swig_type_initial[] = {
+ &_swigt__p_char,
++ &_swigt__p_float,
+ &_swigt__p_sentencepiece__SentenceIterator,
+ &_swigt__p_sentencepiece__SentencePieceProcessor,
+ &_swigt__p_sentencepiece__SentencePieceTrainer,
+@@ -6986,12 +7096,12 @@ static swig_type_info *swig_type_initial[] = {
+ &_swigt__p_std__unordered_mapT_std__string_std__string_t,
+ &_swigt__p_std__vectorT_absl__string_view_t,
+ &_swigt__p_std__vectorT_int_t,
+- &_swigt__p_std__vectorT_std__string_t,
++ &_swigt__p_std__vectorT_std__vectorT_absl__string_view_t_t,
+ &_swigt__p_std__vectorT_std__vectorT_int_t_t,
+- &_swigt__p_std__vectorT_std__vectorT_std__string_t_t,
+ };
+
+ static swig_cast_info _swigc__p_char[] = { {&_swigt__p_char, 0, 0, 0},{0, 0, 0, 0}};
++static swig_cast_info _swigc__p_float[] = { {&_swigt__p_float, 0, 0, 0},{0, 0, 0, 0}};
+ static swig_cast_info _swigc__p_sentencepiece__SentenceIterator[] = { {&_swigt__p_sentencepiece__SentenceIterator, 0, 0, 0},{0, 0, 0, 0}};
+ static swig_cast_info _swigc__p_sentencepiece__SentencePieceProcessor[] = { {&_swigt__p_sentencepiece__SentencePieceProcessor, 0, 0, 0},{0, 0, 0, 0}};
+ static swig_cast_info _swigc__p_sentencepiece__SentencePieceTrainer[] = { {&_swigt__p_sentencepiece__SentencePieceTrainer, 0, 0, 0},{0, 0, 0, 0}};
+@@ -6999,12 +7109,12 @@ static swig_cast_info _swigc__p_std__string[] = { {&_swigt__p_std__string, 0, 0
+ static swig_cast_info _swigc__p_std__unordered_mapT_std__string_std__string_t[] = { {&_swigt__p_std__unordered_mapT_std__string_std__string_t, 0, 0, 0},{0, 0, 0, 0}};
+ static swig_cast_info _swigc__p_std__vectorT_absl__string_view_t[] = { {&_swigt__p_std__vectorT_absl__string_view_t, 0, 0, 0},{0, 0, 0, 0}};
+ static swig_cast_info _swigc__p_std__vectorT_int_t[] = { {&_swigt__p_std__vectorT_int_t, 0, 0, 0},{0, 0, 0, 0}};
+-static swig_cast_info _swigc__p_std__vectorT_std__string_t[] = { {&_swigt__p_std__vectorT_std__string_t, 0, 0, 0},{0, 0, 0, 0}};
++static swig_cast_info _swigc__p_std__vectorT_std__vectorT_absl__string_view_t_t[] = { {&_swigt__p_std__vectorT_std__vectorT_absl__string_view_t_t, 0, 0, 0},{0, 0, 0, 0}};
+ static swig_cast_info _swigc__p_std__vectorT_std__vectorT_int_t_t[] = { {&_swigt__p_std__vectorT_std__vectorT_int_t_t, 0, 0, 0},{0, 0, 0, 0}};
+-static swig_cast_info _swigc__p_std__vectorT_std__vectorT_std__string_t_t[] = { {&_swigt__p_std__vectorT_std__vectorT_std__string_t_t, 0, 0, 0},{0, 0, 0, 0}};
+
+ static swig_cast_info *swig_cast_initial[] = {
+ _swigc__p_char,
++ _swigc__p_float,
+ _swigc__p_sentencepiece__SentenceIterator,
+ _swigc__p_sentencepiece__SentencePieceProcessor,
+ _swigc__p_sentencepiece__SentencePieceTrainer,
+@@ -7012,9 +7122,8 @@ static swig_cast_info *swig_cast_initial[] = {
+ _swigc__p_std__unordered_mapT_std__string_std__string_t,
+ _swigc__p_std__vectorT_absl__string_view_t,
+ _swigc__p_std__vectorT_int_t,
+- _swigc__p_std__vectorT_std__string_t,
++ _swigc__p_std__vectorT_std__vectorT_absl__string_view_t_t,
+ _swigc__p_std__vectorT_std__vectorT_int_t_t,
+- _swigc__p_std__vectorT_std__vectorT_std__string_t_t,
+ };
+
+
+diff --git a/src/builder.cc b/src/builder.cc
+index 0fc7f24..822f6fc 100644
+--- a/src/builder.cc
++++ b/src/builder.cc
+@@ -272,7 +272,7 @@ util::Status Builder::DecompileCharsMap(absl::string_view blob,
+ }
+
+ // static
+-util::Status Builder::GetPrecompiledCharsMap(const std::string &name,
++util::Status Builder::GetPrecompiledCharsMap(absl::string_view name,
+ std::string *output) {
+ CHECK_OR_RETURN(output);
+
+diff --git a/src/builder.h b/src/builder.h
+index 95c5168..094da72 100644
+--- a/src/builder.h
++++ b/src/builder.h
+@@ -51,7 +51,7 @@ class Builder {
+ CharsMap *chars_map);
+
+ // Returns a pre-compiled binary index with `name`.
+- static util::Status GetPrecompiledCharsMap(const std::string &name,
++ static util::Status GetPrecompiledCharsMap(absl::string_view name,
+ std::string *output);
+
+ // Makes a normalization mapping based on NFKC.
+diff --git a/src/common.h b/src/common.h
+index 6ec4c09..ab07d85 100644
+--- a/src/common.h
++++ b/src/common.h
+@@ -71,8 +71,7 @@ char (&ArraySizeHelper(const T (&array)[N]))[N];
+ namespace sentencepiece {
+ #ifdef OS_WIN
+ namespace win32 {
+-std::wstring Utf8ToWide(const std::string &input);
+-std::string WideToUtf8(const std::wstring &input);
++std::wstring Utf8ToWide(const absl::string_view input);
+ } // namespace win32
+ #endif
+
+diff --git a/src/error.cc b/src/error.cc
+index a226d98..10faa2d 100644
+--- a/src/error.cc
++++ b/src/error.cc
+@@ -61,15 +61,10 @@ struct Status::Rep {
+ std::string error_message;
+ };
+
+-Status::Status(StatusCode code, const char* error_message) : rep_(new Rep) {
+- rep_->code = code;
+- rep_->error_message = error_message;
+-}
+-
+-Status::Status(StatusCode code, const std::string& error_message)
++Status::Status(StatusCode code, absl::string_view error_message)
+ : rep_(new Rep) {
+ rep_->code = code;
+- rep_->error_message = error_message;
++ rep_->error_message = std::string(error_message);
+ }
+
+ Status::Status(const Status& s)
+diff --git a/src/sentencepiece_processor.cc b/src/sentencepiece_processor.cc
+index 4d697be..331fc90 100644
+--- a/src/sentencepiece_processor.cc
++++ b/src/sentencepiece_processor.cc
+@@ -48,6 +48,12 @@ const char kDefaultUnknownSymbol[] = " \xE2\x81\x87 ";
+
+ // REPLACEMENT CHARACTER (U+FFFD) in UTF-8.
+ const char kReplacementCharacter[] = "\xef\xbf\xbd";
++
++std::vector<absl::string_view> ToPieceArray(const std::vector<std::string> &v) {
++ std::vector<absl::string_view> out(v.size());
++ for (int i = 0; i < v.size(); ++i) out[i] = v[i];
++ return out;
++}
+ } // namespace
+
+ SentencePieceProcessor::SentencePieceProcessor() {}
+@@ -146,7 +152,7 @@ util::Status SentencePieceProcessor::status() const {
+ }
+
+ util::Status SentencePieceProcessor::SetVocabulary(
+- const std::vector<std::string> &valid_vocab) {
++ const std::vector<absl::string_view> &valid_vocab) {
+ RETURN_IF_ERROR(status());
+
+ // TODO(taku): supports vocabulary constraint in BPE model.
+@@ -154,7 +160,8 @@ util::Status SentencePieceProcessor::SetVocabulary(
+ CHECK_OR_RETURN(type == TrainerSpec::UNIGRAM || type == TrainerSpec::BPE)
+ << "Vocabulary constraint is only enabled in subword units.";
+
+- const std::set<std::string> vocab(valid_vocab.begin(), valid_vocab.end());
++ const std::set<absl::string_view> vocab(valid_vocab.begin(),
++ valid_vocab.end());
+
+ for (int i = 0; i < model_proto_->pieces_size(); ++i) {
+ auto *piece = model_proto_->mutable_pieces(i);
+@@ -207,7 +214,7 @@ util::Status SentencePieceProcessor::LoadVocabulary(absl::string_view filename,
+ }
+ }
+
+- return SetVocabulary(vocab);
++ return SetVocabulary(ToPieceArray(vocab));
+ }
+
+ #define CHECK_OR_RETURN_STATUS_STL(container) \
+@@ -250,6 +257,12 @@ util::Status SentencePieceProcessor::Encode(absl::string_view input,
+
+ util::Status SentencePieceProcessor::Decode(
+ const std::vector<std::string> &pieces, std::string *detokenized) const {
++ return Decode(ToPieceArray(pieces), detokenized);
++}
++
++util::Status SentencePieceProcessor::Decode(
++ const std::vector<absl::string_view> &pieces,
++ std::string *detokenized) const {
+ CHECK_OR_RETURN_STATUS_STL(detokenized);
+
+ SentencePieceText spt;
+@@ -593,6 +606,12 @@ util::Status SentencePieceProcessor::CalculateEntropy(absl::string_view input,
+
+ util::Status SentencePieceProcessor::Decode(
+ const std::vector<std::string> &pieces, SentencePieceText *spt) const {
++ return Decode(ToPieceArray(pieces), spt);
++}
++
++util::Status SentencePieceProcessor::Decode(
++ const std::vector<absl::string_view> &pieces,
++ SentencePieceText *spt) const {
+ CHECK_OR_RETURN_STATUS_PROTO(spt);
+
+ const char *unk_surface = kDefaultUnknownSymbol;
+@@ -637,9 +656,9 @@ util::Status SentencePieceProcessor::Decode(
+ has_bos_ws);
+ };
+
+- for (const std::string &w : pieces) {
++ for (absl::string_view w : pieces) {
+ auto *sp = spt->add_pieces();
+- sp->set_piece(w);
++ sp->mutable_piece()->assign(w.data(), w.size());
+ sp->set_id(PieceToId(w));
+ }
+
+@@ -779,6 +798,13 @@ std::string SentencePieceProcessor::DecodePiecesAsSerializedProto(
+ return spt.SerializeAsString();
+ }
+
++std::string SentencePieceProcessor::DecodePiecesAsSerializedProto(
++ const std::vector<absl::string_view> &pieces) const {
++ SentencePieceText spt;
++ if (!Decode(pieces, &spt).ok()) return "";
++ return spt.SerializeAsString();
++}
++
+ std::string SentencePieceProcessor::DecodeIdsAsSerializedProto(
+ const std::vector<int> &ids) const {
+ SentencePieceText spt;
+diff --git a/src/sentencepiece_processor.h b/src/sentencepiece_processor.h
+index 9d38214..8c72656 100644
+--- a/src/sentencepiece_processor.h
++++ b/src/sentencepiece_processor.h
+@@ -22,9 +22,11 @@
+ #include <utility>
+ #include <vector>
+
++#ifndef SWIG
+ namespace absl {
+ using std::string_view;
+ }
++#endif // SWIG
+
+ namespace sentencepiece {
+
+@@ -58,8 +60,7 @@ class Status {
+ public:
+ Status();
+ ~Status();
+- Status(StatusCode code, const char *error_message);
+- Status(StatusCode code, const std::string &error_message);
++ Status(StatusCode code, absl::string_view error_message);
+ Status(const Status &s);
+ void operator=(const Status &s);
+ bool operator==(const Status &s) const;
+@@ -204,7 +205,7 @@ class SentencePieceProcessor {
+ // Restricts the vocabulary set.
+ // The input sentences are encoded into the tokens in `valid_vocab`.
+ virtual util::Status SetVocabulary(
+- const std::vector<std::string> &valid_vocab);
++ const std::vector<absl::string_view> &valid_vocab);
+
+ // Reverts the vocabulary restriction.
+ virtual util::Status ResetVocabulary();
+@@ -230,6 +231,10 @@ class SentencePieceProcessor {
+ virtual util::Status Decode(const std::vector<std::string> &pieces,
+ std::string *detokenized) const;
+
++ // Given a sequence of pieces, decodes it into a detokenized output.
++ virtual util::Status Decode(const std::vector<absl::string_view> &pieces,
++ std::string *detokenized) const;
++
+ // Given a sequence of ids, decodes it into a detokenized output.
+ virtual util::Status Decode(const std::vector<int> &ids,
+ std::string *detokenized) const;
+@@ -320,16 +325,19 @@ class SentencePieceProcessor {
+ absl::string_view input, int samples, float theta, bool wor,
+ bool include_best, NBestSentencePieceText *samples_spt) const;
+
+-#ifndef SWIG
+ // Calculate entropy of possible tokenisations
+ virtual util::Status CalculateEntropy(absl::string_view input, float theta,
+ float *entropy) const;
+-#endif
+
+ // Given a sequence of pieces, decodes it into SentencePieceText.
++ // TODO(taku): Remove this API and use std::vector<std::string_view>
+ virtual util::Status Decode(const std::vector<std::string> &pieces,
+ SentencePieceText *spt) const;
+
++ // Given a sequence of pieces, decodes it into SentencePieceText.
++ virtual util::Status Decode(const std::vector<absl::string_view> &pieces,
++ SentencePieceText *spt) const;
++
+ // Given a sequence of ids, decodes it into SentencePieceText.
+ virtual util::Status Decode(const std::vector<int> &ids,
+ SentencePieceText *spt) const;
+@@ -401,11 +409,17 @@ class SentencePieceProcessor {
+ theta, wor, include_best);
+ }
+
++ // TODO(taku): Remove this API and use std::vector<std::string_view>
+ virtual std::string DecodePieces(
+ const std::vector<std::string> &pieces) const {
+ DEFINE_SPP_DIRECT_FUNC_IMPL(Decode, std::string, pieces);
+ }
+
++ virtual std::string DecodePieces(
++ const std::vector<absl::string_view> &pieces) const {
++ DEFINE_SPP_DIRECT_FUNC_IMPL(Decode, std::string, pieces);
++ }
++
+ virtual std::string DecodeIds(const std::vector<int> &ids) const {
+ DEFINE_SPP_DIRECT_FUNC_IMPL(Decode, std::string, ids);
+ }
+@@ -428,9 +442,13 @@ class SentencePieceProcessor {
+ virtual util::bytes NBestEncodeAsSerializedProto(absl::string_view input,
+ int nbest_size) const;
+
++ // TODO(taku): Remove this API and use std::vector<std::string_view>
+ virtual util::bytes DecodePiecesAsSerializedProto(
+ const std::vector<std::string> &pieces) const;
+
++ virtual util::bytes DecodePiecesAsSerializedProto(
++ const std::vector<absl::string_view> &pieces) const;
++
+ virtual util::bytes DecodeIdsAsSerializedProto(
+ const std::vector<int> &ids) const;
+
+diff --git a/src/sentencepiece_trainer.h b/src/sentencepiece_trainer.h
+index bb74ab9..b4af6f0 100644
+--- a/src/sentencepiece_trainer.h
++++ b/src/sentencepiece_trainer.h
+@@ -129,12 +129,12 @@ class SentencePieceTrainer {
+ // with comma-separated values. `field_name` must not be a nested message.
+ // The body of these functions are automatically generated with
+ // data/gen_spec_parser.pl
+- static util::Status SetProtoField(const std::string &name,
+- const std::string &value,
++ static util::Status SetProtoField(absl::string_view name,
++ absl::string_view value,
+ TrainerSpec *message);
+
+- static util::Status SetProtoField(const std::string &name,
+- const std::string &value,
++ static util::Status SetProtoField(absl::string_view name,
++ absl::string_view value,
+ NormalizerSpec *message);
+
+ // Populates model type from string representation, e.g., "bpe".
+diff --git a/src/spec_parser.h b/src/spec_parser.h
+index b5713fb..de8f72f 100644
+--- a/src/spec_parser.h
++++ b/src/spec_parser.h
+@@ -25,10 +25,10 @@
+
+ namespace sentencepiece {
+
+-#define PARSE_STRING(param_name) \
+- if (name == #param_name) { \
+- message->set_##param_name(value); \
+- return util::OkStatus(); \
++#define PARSE_STRING(param_name) \
++ if (name == #param_name) { \
++ message->set_##param_name(std::string(value)); \
++ return util::OkStatus(); \
+ }
+
+ #define PARSE_REPEATED_STRING(param_name) \
+@@ -189,8 +189,8 @@ inline std::string PrintProto(const NormalizerSpec &message,
+ return os.str();
+ }
+
+-util::Status SentencePieceTrainer::SetProtoField(const std::string &name,
+- const std::string &value,
++util::Status SentencePieceTrainer::SetProtoField(absl::string_view name,
++ absl::string_view value,
+ TrainerSpec *message) {
+ CHECK_OR_RETURN(message);
+
+@@ -249,8 +249,8 @@ util::Status SentencePieceTrainer::SetProtoField(const std::string &name,
+ << "unknown field name \"" << name << "\" in TrainerSpec.";
+ }
+
+-util::Status SentencePieceTrainer::SetProtoField(const std::string &name,
+- const std::string &value,
++util::Status SentencePieceTrainer::SetProtoField(absl::string_view name,
++ absl::string_view value,
+ NormalizerSpec *message) {
+ CHECK_OR_RETURN(message);
+
+diff --git a/src/spm_encode_main.cc b/src/spm_encode_main.cc
+index 4d12a38..b0e508d 100644
+--- a/src/spm_encode_main.cc
++++ b/src/spm_encode_main.cc
+@@ -92,13 +92,13 @@ int main(int argc, char *argv[]) {
+ absl::flat_hash_map<std::string, int> vocab;
+ sentencepiece::SentencePieceText spt;
+ sentencepiece::NBestSentencePieceText nbest_spt;
+- std::function<void(const std::string &line)> process;
++ std::function<void(absl::string_view line)> process;
+
+ const int nbest_size = absl::GetFlag(FLAGS_nbest_size);
+ const float alpha = absl::GetFlag(FLAGS_alpha);
+
+ if (absl::GetFlag(FLAGS_generate_vocabulary)) {
+- process = [&](const std::string &line) {
++ process = [&](absl::string_view line) {
+ CHECK_OK(sp.Encode(line, &spt));
+ for (const auto &piece : spt.pieces()) {
+ if (!sp.IsUnknown(piece.id()) && !sp.IsControl(piece.id()))
+@@ -106,47 +106,47 @@ int main(int argc, char *argv[]) {
+ }
+ };
+ } else if (absl::GetFlag(FLAGS_output_format) == "piece") {
+- process = [&](const std::string &line) {
++ process = [&](absl::string_view line) {
+ CHECK_OK(sp.Encode(line, &sps));
+ output->WriteLine(absl::StrJoin(sps, " "));
+ };
+ } else if (absl::GetFlag(FLAGS_output_format) == "id") {
+- process = [&](const std::string &line) {
++ process = [&](absl::string_view line) {
+ CHECK_OK(sp.Encode(line, &ids));
+ output->WriteLine(absl::StrJoin(ids, " "));
+ };
+ } else if (absl::GetFlag(FLAGS_output_format) == "proto") {
+- process = [&](const std::string &line) { CHECK_OK(sp.Encode(line, &spt)); };
++ process = [&](absl::string_view line) { CHECK_OK(sp.Encode(line, &spt)); };
+ } else if (absl::GetFlag(FLAGS_output_format) == "sample_piece") {
+- process = [&](const std::string &line) {
++ process = [&](absl::string_view line) {
+ CHECK_OK(sp.SampleEncode(line, nbest_size, alpha, &sps));
+ output->WriteLine(absl::StrJoin(sps, " "));
+ };
+ } else if (absl::GetFlag(FLAGS_output_format) == "sample_id") {
+- process = [&](const std::string &line) {
++ process = [&](absl::string_view line) {
+ CHECK_OK(sp.SampleEncode(line, nbest_size, alpha, &ids));
+ output->WriteLine(absl::StrJoin(ids, " "));
+ };
+ } else if (absl::GetFlag(FLAGS_output_format) == "sample_proto") {
+- process = [&](const std::string &line) {
++ process = [&](absl::string_view line) {
+ CHECK_OK(sp.SampleEncode(line, nbest_size, alpha, &spt));
+ };
+ } else if (absl::GetFlag(FLAGS_output_format) == "nbest_piece") {
+- process = [&](const std::string &line) {
++ process = [&](absl::string_view line) {
+ CHECK_OK(sp.NBestEncode(line, nbest_size, &nbest_sps));
+ for (const auto &result : nbest_sps) {
+ output->WriteLine(absl::StrJoin(result, " "));
+ }
+ };
+ } else if (absl::GetFlag(FLAGS_output_format) == "nbest_id") {
+- process = [&](const std::string &line) {
++ process = [&](absl::string_view line) {
+ CHECK_OK(sp.NBestEncode(line, nbest_size, &nbest_ids));
+ for (const auto &result : nbest_ids) {
+ output->WriteLine(absl::StrJoin(result, " "));
+ }
+ };
+ } else if (absl::GetFlag(FLAGS_output_format) == "nbest_proto") {
+- process = [&](const std::string &line) {
++ process = [&](absl::string_view line) {
+ CHECK_OK(sp.NBestEncode(line, nbest_size, &nbest_spt));
+ };
+ } else {
+diff --git a/src/util.cc b/src/util.cc
+index f99c73a..f54e8ba 100644
+--- a/src/util.cc
++++ b/src/util.cc
+@@ -244,15 +244,16 @@ std::vector<std::string> StrSplitAsCSV(absl::string_view text) {
+
+ #ifdef OS_WIN
+ namespace win32 {
+-std::wstring Utf8ToWide(const std::string &input) {
+- int output_length =
+- ::MultiByteToWideChar(CP_UTF8, 0, input.c_str(), -1, nullptr, 0);
++std::wstring Utf8ToWide(absl::string_view input) {
++ int output_length = ::MultiByteToWideChar(
++ CP_UTF8, 0, input.data(), static_cast<int>(input.size()), nullptr, 0);
+ output_length = output_length <= 0 ? 0 : output_length - 1;
+ if (output_length == 0) {
+ return L"";
+ }
+ std::unique_ptr<wchar_t[]> input_wide(new wchar_t[output_length + 1]);
+- const int result = ::MultiByteToWideChar(CP_UTF8, 0, input.c_str(), -1,
++ const int result = ::MultiByteToWideChar(CP_UTF8, 0, input.data(),
++ static_cast<int>(input.size()),
+ input_wide.get(), output_length + 1);
+ std::wstring output;
+ if (result > 0) {
+@@ -260,24 +261,6 @@ std::wstring Utf8ToWide(const std::string &input) {
+ }
+ return output;
+ }
+-
+-std::string WideToUtf8(const std::wstring &input) {
+- const int output_length = ::WideCharToMultiByte(CP_UTF8, 0, input.c_str(), -1,
+- nullptr, 0, nullptr, nullptr);
+- if (output_length == 0) {
+- return "";
+- }
+-
+- std::unique_ptr<char[]> input_encoded(new char[output_length + 1]);
+- const int result =
+- ::WideCharToMultiByte(CP_UTF8, 0, input.c_str(), -1, input_encoded.get(),
+- output_length + 1, nullptr, nullptr);
+- std::string output;
+- if (result > 0) {
+- output.assign(input_encoded.get());
+- }
+- return output;
+-}
+ } // namespace win32
+ #endif
+ } // namespace sentencepiece
--- /dev/null
+From: Taku Kudo <taku@google.com>
+Date: Wed, 15 Jun 2022 02:22:05 +0900
+Subject: Fixed build break.
+
+Signed-off-by: Kentaro Hayashi <kenhys@gmail.com>
+---
+ src/common.h | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/src/common.h b/src/common.h
+index ab07d85..c27c352 100644
+--- a/src/common.h
++++ b/src/common.h
+@@ -26,6 +26,7 @@
+ #include <vector>
+
+ #include "config.h"
++#include "third_party/absl/strings/string_view.h"
+
+ #if defined(_WIN32) && !defined(__CYGWIN__)
+ #define OS_WIN
--- /dev/null
+From: Taku Kudo <taku@google.com>
+Date: Mon, 20 Jun 2022 00:55:46 +0900
+Subject: Added ImmutableSentencePiece class
+
+Signed-off-by: Kentaro Hayashi <kenhys@gmail.com>
+---
+ src/bpe_model.cc | 6 +-
+ src/model_interface.h | 26 +--
+ src/model_interface_test.cc | 19 +--
+ src/sentencepiece_processor.cc | 173 ++++++++++++-------
+ src/sentencepiece_processor.h | 332 ++++++++++++++++++++++++++----------
+ src/sentencepiece_processor_test.cc | 102 ++++++++---
+ src/unigram_model.cc | 70 ++++----
+ src/unigram_model.h | 15 ++
+ src/unigram_model_test.cc | 114 +++++++------
+ src/util.h | 11 --
+ 10 files changed, 557 insertions(+), 311 deletions(-)
+
+diff --git a/src/bpe_model.cc b/src/bpe_model.cc
+index 22cd115..bc7ada1 100644
+--- a/src/bpe_model.cc
++++ b/src/bpe_model.cc
+@@ -12,6 +12,8 @@
+ // See the License for the specific language governing permissions and
+ // limitations under the License.!
+
++#include "bpe_model.h"
++
+ #include <functional>
+ #include <memory>
+ #include <queue>
+@@ -19,7 +21,6 @@
+ #include <utility>
+ #include <vector>
+
+-#include "bpe_model.h"
+ #include "freelist.h"
+ #include "third_party/absl/container/flat_hash_map.h"
+ #include "util.h"
+@@ -71,8 +72,7 @@ std::vector<std::pair<absl::string_view, int>> Model::SampleEncode(
+ // Reverse merge rules.
+ // key: merged symbol, value: pair of original symbols.
+ absl::flat_hash_map<absl::string_view,
+- std::pair<absl::string_view, absl::string_view>,
+- string_util::string_view_hash>
++ std::pair<absl::string_view, absl::string_view>>
+ rev_merge;
+
+ // Pre-allocates SymbolPair for efficiency.
+diff --git a/src/model_interface.h b/src/model_interface.h
+index 06b3a65..06e9243 100644
+--- a/src/model_interface.h
++++ b/src/model_interface.h
+@@ -53,8 +53,8 @@ class ModelProto;
+ // Given a normalized string, returns a sequence of sentence pieces with ids.
+ class ModelInterface {
+ public:
+- using PieceToIdMap = absl::flat_hash_map<absl::string_view, int,
+- string_util::string_view_hash>;
++ using PieceToIdMap = absl::flat_hash_map<absl::string_view, int>;
++ // string_util::string_view_hash>;
+
+ absl::string_view unk_piece() const;
+ absl::string_view bos_piece() const;
+@@ -77,19 +77,6 @@ class ModelInterface {
+ return matcher_.get();
+ }
+
+- // Sets the encoder version. Currently only unigram has an optimized encoder.
+- // The optimized version is always used by default if there is one, so
+- // normally users do not need to call this function. This function is provided
+- // just in case that a user want to manually choose which encoder version to
+- // use.
+- virtual util::Status SetEncoderVersion(EncoderVersion encoder_version) {
+- encoder_version_ = encoder_version;
+- return util::OkStatus();
+- }
+-
+- // Returns the current encoder version in use.
+- virtual EncoderVersion GetEncoderVersion() const { return encoder_version_; }
+-
+ // Given a normalized string, returns a sequence of sentence pieces with ids.
+ // The concatenation of pieces must be the same as `normalized`.
+ virtual EncodeResult Encode(absl::string_view normalized) const = 0;
+@@ -123,10 +110,9 @@ class ModelInterface {
+ }
+
+ // Calculates the entropy of the segmentation lattice with inverse temperature
+- // `theta`.
+- // Uses a novel dynamic program to calculate the entropy.
++ // `alpha`. Uses a novel dynamic program to calculate the entropy.
+ virtual float CalculateEntropy(absl::string_view normalized,
+- float theta) const {
++ float alpha) const {
+ LOG(ERROR) << "Not implemented.";
+ return 0.0;
+ }
+@@ -256,10 +242,6 @@ class ModelInterface {
+ // unknown id.
+ int unk_id_ = 0;
+
+- // The encoder version. Currently it is only effective for unigram model but
+- // ignored by other models.
+- EncoderVersion encoder_version_ = EncoderVersion::kOptimized;
+-
+ // status.
+ util::Status status_;
+ };
+diff --git a/src/model_interface_test.cc b/src/model_interface_test.cc
+index 69ee4e6..09e41d3 100644
+--- a/src/model_interface_test.cc
++++ b/src/model_interface_test.cc
+@@ -12,8 +12,9 @@
+ // See the License for the specific language governing permissions and
+ // limitations under the License.!
+
+-#include "model_factory.h"
+ #include "model_interface.h"
++
++#include "model_factory.h"
+ #include "testharness.h"
+ #include "third_party/absl/container/flat_hash_map.h"
+ #include "util.h"
+@@ -481,22 +482,6 @@ TEST(ModelInterfaceTest, PieceToByteTest) {
+ EXPECT_EQ(PieceToByte("a"), -1);
+ }
+
+-TEST(ModelInterfaceTest, SetEncoderVersion) {
+- for (const auto type : kModelTypes) {
+- ModelProto model_proto = MakeBaseModelProto(type);
+- AddPiece(&model_proto, "a");
+- AddPiece(&model_proto, "b");
+- auto model = ModelFactory::Create(model_proto);
+-
+- // Verify the default encoder version.
+- EXPECT_EQ(EncoderVersion::kOptimized, model->GetEncoderVersion());
+-
+- // Set the encoder version to original and verify.
+- EXPECT_TRUE(model->SetEncoderVersion(EncoderVersion::kOriginal).ok());
+- EXPECT_EQ(EncoderVersion::kOriginal, model->GetEncoderVersion());
+- }
+-}
+-
+ TEST(ModelInterfaceTest, VerifyOutputsEquivalent) {
+ for (const auto type : kModelTypes) {
+ ModelProto model_proto = MakeBaseModelProto(type);
+diff --git a/src/sentencepiece_processor.cc b/src/sentencepiece_processor.cc
+index 331fc90..a6f5395 100644
+--- a/src/sentencepiece_processor.cc
++++ b/src/sentencepiece_processor.cc
+@@ -56,6 +56,112 @@ std::vector<absl::string_view> ToPieceArray(const std::vector<std::string> &v) {
+ }
+ } // namespace
+
++ImmutableSentencePieceText::ImmutableSentencePieceText() {}
++ImmutableSentencePieceText::~ImmutableSentencePieceText() {}
++
++ImmutableSentencePieceText::ImmutableSentencePieceText(
++ const SentencePieceText &spt)
++ : spt_(&spt) {}
++
++ImmutableSentencePieceText::ImmutableSentencePiece::ImmutableSentencePiece(
++ const SentencePieceText_SentencePiece &sp)
++ : sp_(&sp) {}
++
++absl::string_view ImmutableSentencePieceText::ImmutableSentencePiece::piece()
++ const {
++ return sp_->piece();
++}
++
++absl::string_view ImmutableSentencePieceText::ImmutableSentencePiece::surface()
++ const {
++ return sp_->surface();
++}
++
++uint32_t ImmutableSentencePieceText::ImmutableSentencePiece::id() const {
++ return sp_->id();
++}
++
++uint32_t ImmutableSentencePieceText::ImmutableSentencePiece::begin() const {
++ return sp_->begin();
++}
++
++uint32_t ImmutableSentencePieceText::ImmutableSentencePiece::end() const {
++ return sp_->end();
++}
++
++std::vector<ImmutableSentencePieceText::ImmutableSentencePiece>
++ImmutableSentencePieceText::pieces() const {
++ std::vector<ImmutableSentencePieceText::ImmutableSentencePiece> pieces;
++ if (spt_ == nullptr) return pieces;
++ pieces.reserve(spt_->pieces_size());
++ for (int i = 0; i < spt_->pieces_size(); ++i)
++ pieces[i] = ImmutableSentencePiece(spt_->pieces(i));
++ return pieces;
++}
++
++size_t ImmutableSentencePieceText::pieces_size() const {
++ return spt_ ? spt_->pieces_size() : 0;
++}
++
++ImmutableSentencePieceText::ImmutableSentencePiece
++ImmutableSentencePieceText::pieces(int index) const {
++ return ImmutableSentencePieceText::ImmutableSentencePiece(
++ spt_->pieces(index));
++}
++
++absl::string_view ImmutableSentencePieceText::text() const {
++ return spt_ ? spt_->text() : "";
++}
++
++float ImmutableSentencePieceText::score() const {
++ return spt_ ? spt_->score() : 0.0;
++}
++
++SentencePieceText *ImmutableSentencePieceText::mutable_proto() {
++ if (rep_ == nullptr) {
++ rep_ = std::make_shared<SentencePieceText>();
++ spt_ = rep_.get();
++ }
++ return rep_.get();
++}
++
++std::string ImmutableSentencePieceText::SerializeAsString() const {
++ return spt_ ? spt_->SerializeAsString() : "";
++}
++
++ImmutableNBestSentencePieceText::ImmutableNBestSentencePieceText() {}
++ImmutableNBestSentencePieceText::~ImmutableNBestSentencePieceText() {}
++
++size_t ImmutableNBestSentencePieceText::nbests_size() const {
++ return rep_ ? rep_->nbests_size() : 0;
++}
++
++ImmutableSentencePieceText ImmutableNBestSentencePieceText::nbests(
++ int index) const {
++ return ImmutableSentencePieceText(rep_->nbests(index));
++}
++
++std::vector<ImmutableSentencePieceText>
++ImmutableNBestSentencePieceText::nbests() const {
++ std::vector<ImmutableSentencePieceText> nbests;
++ if (rep_ == nullptr) return nbests;
++ nbests.reserve(rep_->nbests_size());
++ for (int i = 0; i < rep_->nbests_size(); ++i)
++ nbests[i] = ImmutableSentencePieceText(rep_->nbests(i));
++ return nbests;
++}
++
++NBestSentencePieceText *ImmutableNBestSentencePieceText::mutable_proto() {
++ if (rep_ == nullptr) {
++ rep_ = std::make_shared<NBestSentencePieceText>();
++ }
++ return rep_.get();
++}
++
++std::string ImmutableNBestSentencePieceText::SerializeAsString() const {
++ return rep_ ? rep_->SerializeAsString() : "";
++}
++
+ SentencePieceProcessor::SentencePieceProcessor() {}
+ SentencePieceProcessor::~SentencePieceProcessor() {}
+
+@@ -124,15 +230,6 @@ util::Status SentencePieceProcessor::Load(
+ return util::OkStatus();
+ }
+
+-util::Status SentencePieceProcessor::SetEncoderVersion(
+- EncoderVersion encoder_version) {
+- return model_->SetEncoderVersion(encoder_version);
+-}
+-
+-EncoderVersion SentencePieceProcessor::GetEncoderVersion() const {
+- return model_->GetEncoderVersion();
+-}
+-
+ util::Status SentencePieceProcessor::SetEncodeExtraOptions(
+ absl::string_view extra_options) {
+ return ParseExtraOptions(extra_options, &encode_extra_options_);
+@@ -348,14 +445,14 @@ util::Status SentencePieceProcessor::SampleEncode(absl::string_view input,
+ }
+
+ util::Status SentencePieceProcessor::SampleEncodeAndScore(
+- absl::string_view input, int num_samples, float theta, bool wor,
++ absl::string_view input, int num_samples, float alpha, bool wor,
+ bool include_best,
+ std::vector<std::pair<std::vector<std::string>, float>> *pieces) const {
+ CHECK_OR_RETURN_STATUS_STL(pieces);
+
+ NBestSentencePieceText spt;
+ RETURN_IF_ERROR(
+- SampleEncodeAndScore(input, num_samples, theta, wor, include_best, &spt));
++ SampleEncodeAndScore(input, num_samples, alpha, wor, include_best, &spt));
+
+ pieces->clear();
+ pieces->reserve(spt.nbests_size());
+@@ -373,14 +470,14 @@ util::Status SentencePieceProcessor::SampleEncodeAndScore(
+ }
+
+ util::Status SentencePieceProcessor::SampleEncodeAndScore(
+- absl::string_view input, int num_samples, float theta, bool wor,
++ absl::string_view input, int num_samples, float alpha, bool wor,
+ bool include_best,
+ std::vector<std::pair<std::vector<int>, float>> *ids) const {
+ CHECK_OR_RETURN_STATUS_STL(ids);
+
+ NBestSentencePieceText spt;
+ RETURN_IF_ERROR(
+- SampleEncodeAndScore(input, num_samples, theta, wor, include_best, &spt));
++ SampleEncodeAndScore(input, num_samples, alpha, wor, include_best, &spt));
+
+ ids->clear();
+ ids->reserve(spt.nbests_size());
+@@ -568,7 +665,7 @@ util::Status SentencePieceProcessor::SampleEncode(
+ }
+
+ util::Status SentencePieceProcessor::SampleEncodeAndScore(
+- absl::string_view input, int samples, float theta, bool wor,
++ absl::string_view input, int samples, float alpha, bool wor,
+ bool include_best, NBestSentencePieceText *samples_spt) const {
+ CHECK_OR_RETURN(model_->IsSampleEncodeAndScoreAvailable())
+ << "SampleEncodeAndScore is not available for the current model.";
+@@ -576,7 +673,7 @@ util::Status SentencePieceProcessor::SampleEncodeAndScore(
+ std::vector<size_t> norm_to_orig;
+ RETURN_IF_ERROR(normalizer_->Normalize(input, &normalized, &norm_to_orig));
+
+- const auto results = model_->SampleEncodeAndScore(normalized, theta, samples,
++ const auto results = model_->SampleEncodeAndScore(normalized, alpha, samples,
+ wor, include_best);
+ CHECK_OR_RETURN(!results.empty())
+ << "SampleEncodeAndScore returns empty result.";
+@@ -592,7 +689,7 @@ util::Status SentencePieceProcessor::SampleEncodeAndScore(
+ }
+
+ util::Status SentencePieceProcessor::CalculateEntropy(absl::string_view input,
+- float theta,
++ float alpha,
+ float *entropy) const {
+ CHECK_OR_RETURN(model_->IsCalculateEntropyAvailable())
+ << "CalculateEntropy is not available for the current model.";
+@@ -600,7 +697,7 @@ util::Status SentencePieceProcessor::CalculateEntropy(absl::string_view input,
+ std::vector<size_t> norm_to_orig;
+ RETURN_IF_ERROR(normalizer_->Normalize(input, &normalized, &norm_to_orig));
+
+- *entropy = model_->CalculateEntropy(normalized, theta);
++ *entropy = model_->CalculateEntropy(normalized, alpha);
+ return util::OkStatus();
+ }
+
+@@ -770,48 +867,6 @@ util::Status SentencePieceProcessor::Decode(const std::vector<int> &ids,
+ return Decode(pieces, spt);
+ }
+
+-std::string SentencePieceProcessor::EncodeAsSerializedProto(
+- absl::string_view input) const {
+- SentencePieceText spt;
+- if (!Encode(input, &spt).ok()) return "";
+- return spt.SerializeAsString();
+-}
+-
+-std::string SentencePieceProcessor::SampleEncodeAsSerializedProto(
+- absl::string_view input, int nbest_size, float alpha) const {
+- SentencePieceText spt;
+- if (!SampleEncode(input, nbest_size, alpha, &spt).ok()) return "";
+- return spt.SerializeAsString();
+-}
+-
+-std::string SentencePieceProcessor::NBestEncodeAsSerializedProto(
+- absl::string_view input, int nbest_size) const {
+- NBestSentencePieceText spt;
+- if (!NBestEncode(input, nbest_size, &spt).ok()) return "";
+- return spt.SerializeAsString();
+-}
+-
+-std::string SentencePieceProcessor::DecodePiecesAsSerializedProto(
+- const std::vector<std::string> &pieces) const {
+- SentencePieceText spt;
+- if (!Decode(pieces, &spt).ok()) return "";
+- return spt.SerializeAsString();
+-}
+-
+-std::string SentencePieceProcessor::DecodePiecesAsSerializedProto(
+- const std::vector<absl::string_view> &pieces) const {
+- SentencePieceText spt;
+- if (!Decode(pieces, &spt).ok()) return "";
+- return spt.SerializeAsString();
+-}
+-
+-std::string SentencePieceProcessor::DecodeIdsAsSerializedProto(
+- const std::vector<int> &ids) const {
+- SentencePieceText spt;
+- if (!Decode(ids, &spt).ok()) return "";
+- return spt.SerializeAsString();
+-}
+-
+ #define CHECK_STATUS_OR_RETURN_DEFAULT(value) \
+ if (!status().ok()) { \
+ LOG(ERROR) << status().message() << "\nReturns default value " << value; \
+diff --git a/src/sentencepiece_processor.h b/src/sentencepiece_processor.h
+index 8c72656..51c5b3b 100644
+--- a/src/sentencepiece_processor.h
++++ b/src/sentencepiece_processor.h
+@@ -29,11 +29,6 @@ using std::string_view;
+ #endif // SWIG
+
+ namespace sentencepiece {
+-
+-#ifndef SWIG
+-using EncodeResult = std::vector<std::pair<absl::string_view, int>>;
+-#endif // SWIG
+-
+ namespace util {
+
+ enum class StatusCode : int {
+@@ -107,17 +102,17 @@ class Status {
+ // sp.Load("//path/to/model");
+ //
+ // vector<string> sps;
+-// sp.Encode("hello world.", &sps);
++// sp.Encode("hello world.", &sps).IgnoreError();
+ //
+ // vector<int> ids;
+-// sp.Encode("hello world.", &ids);
++// sp.Encode("hello world.", &ids).IgnoreError();
+ //
+ // string detok;
+ // sp.Decode(sps, &detok);
+-// CHECK_EQ("hello world.", detok);
++// CHECK_EQ("hello world.", detok);
+ //
+ // sp.Decode(ids, &detok);
+-// CHECK_EQ("hello world.", detok);
++// CHECK_EQ("hello world.", detok);
+ //
+ // We can also use SentencePieceText which manages the byte-offsets
+ // between user input (output) and internal sentence pieces.
+@@ -144,16 +139,6 @@ namespace normalizer {
+ class Normalizer;
+ } // namespace normalizer
+
+-#ifndef SWIG
+-// Defines the multiple versions of encoder within each model. Currently only
+-// the Unigram model has an optimized encoder.
+-enum class EncoderVersion {
+- kOptimized, // The optimized encoder (default).
+- kOriginal // The original encoder (user may choose to fall back to this
+- // just in case).
+-};
+-#endif
+-
+ #ifndef SWIGGO
+ namespace util {
+ // Redefine std::string for serialized_proto interface as Python's string is
+@@ -161,7 +146,87 @@ namespace util {
+ // with SWIG's typemap.
+ using bytes = std::string;
+ } // namespace util
+-#endif
++#endif // SWIGGO
++
++class NBestSentencePieceText;
++class ModelInterface;
++class SentencePieceText;
++class SentencePieceText_SentencePiece;
++
++// Wrapper class of SentencePieceText
++// This wrapper only allows an immutable access to the proto and
++// hides the actual implementation of protobuf.
++// See sentencepiece.proto for the details of this class.
++class ImmutableSentencePieceText {
++ public:
++ ImmutableSentencePieceText();
++ virtual ~ImmutableSentencePieceText();
++
++ class ImmutableSentencePiece {
++ public:
++ ~ImmutableSentencePiece() = default;
++ absl::string_view piece() const;
++ absl::string_view surface() const;
++ uint32_t id() const;
++ uint32_t begin() const;
++ uint32_t end() const;
++
++ friend class ImmutableSentencePieceText;
++
++ private:
++ ImmutableSentencePiece() = default;
++ explicit ImmutableSentencePiece(const SentencePieceText_SentencePiece &sp);
++ const SentencePieceText_SentencePiece *sp_ = nullptr;
++ };
++
++ std::vector<ImmutableSentencePiece> pieces() const;
++ size_t pieces_size() const;
++ ImmutableSentencePiece pieces(int index) const;
++ absl::string_view text() const;
++ float score() const;
++
++ std::string SerializeAsString() const;
++
++ // Returns the actual mutable proto.
++ // Do not use this outside of SentencePieceProcessor, as
++ // it returns the raw pointer managed by the shared_ptr.
++ SentencePieceText *mutable_proto();
++
++ friend class ImmutableNBestSentencePieceText;
++ friend class SentencePieceProcessor;
++
++ private:
++ explicit ImmutableSentencePieceText(const SentencePieceText &spt);
++ const SentencePieceText *spt_ = nullptr;
++ std::shared_ptr<SentencePieceText> rep_;
++};
++
++// Wrapper class of NBestSentencePieceText
++// This wrapper only allows an immutable access to the proto and
++// hides the actual implementation of protobuf.
++// See sentencepiece.proto for the details of this class.
++class ImmutableNBestSentencePieceText {
++ public:
++ ImmutableNBestSentencePieceText();
++ virtual ~ImmutableNBestSentencePieceText();
++
++ std::vector<ImmutableSentencePieceText> nbests() const;
++
++ size_t nbests_size() const;
++ ImmutableSentencePieceText nbests(int index) const;
++
++ std::string SerializeAsString() const;
++
++ // Returns the actual mutable proto.
++ // Do not use this outside of SentencePieceProcessor, as
++ // it returns the raw pointer managed by the shared_ptr.
++ NBestSentencePieceText *mutable_proto();
++
++ friend class SentencePieceProcessor;
++
++ private:
++ std::shared_ptr<NBestSentencePieceText> rep_;
++};
+
+ class SentencePieceProcessor {
+ public:
+@@ -217,7 +282,7 @@ class SentencePieceProcessor {
+ int threshold);
+
+ //////////////////////////////////////////////////////////////
+- // Simple API.
++ // Simple Encode and Decode API.
+ //
+ // Given a UTF8 input, encodes it into a sequence of sentence pieces.
+ virtual util::Status Encode(absl::string_view input,
+@@ -239,18 +304,9 @@ class SentencePieceProcessor {
+ virtual util::Status Decode(const std::vector<int> &ids,
+ std::string *detokenized) const;
+
+-#ifndef SWIG
+- // Sets the encoder version. Normally users do not need to call this function.
+- // But they can call this fucntion just in case if they want to fall back to
+- // the original encoder.
+- virtual util::Status SetEncoderVersion(EncoderVersion encoder_version);
+-
+- // Returns the current encoder version in use.
+- virtual EncoderVersion GetEncoderVersion() const;
+-#endif
+-
+ //////////////////////////////////////////////////////////////
+ // NBest API.
++ //
+ // Same as Encode, but returns nbest results.
+ virtual util::Status NBestEncode(
+ absl::string_view input, int nbest_size,
+@@ -262,24 +318,24 @@ class SentencePieceProcessor {
+
+ //////////////////////////////////////////////////////////////
+ // Sampling API.
++ //
+ // Unigram and BPE support sampling mode.
+ // - Unigram (--model_type=unigram):
+- // When `nbest_size` is positive value, approximately samples one
+- // segmentation from nbest candidates. When `nbest_size` is negative value,
+- // samples one segmentation from the hypotheses (Lattice) according to the
+- // generation probabilities using forward-filtering and backward-sampling
+- // algorithm. `alpha` is a smoothing parameter. The best segmentation
+- // (Viterbi segmentation) is more likely sampled when setting larger
+- // alpha. When alpha is 0.0, one segmentation is uniformly sampled from the
+- // nbest or lattice.
+- // `nbest_size` and `alpha` correspond to parameters `l` and `alpha`
++ // `nbest_size`: When `nbest_size` is positive value, approximately samples
++ // one segmentation from nbest candidates. When `nbest_size` is negative
++ // value, samples one segmentation from the hypotheses (Lattice) according to
++ // the generation probabilities using forward-filtering and backward-sampling
++ // algorithm.
++ // `alpha`: Smoothing parameter (inverse temperature). The best segmentation
++ // (Viterbi segmentation) is more likely sampled when setting larger alpha.
++ // When alpha is 0.0, one segmentation is uniformly sampled from the nbest or
++ // lattice. `nbest_size` and `alpha` correspond to parameters `l` and `alpha`
+ // in https://arxiv.org/abs/1804.10959 (nbest_size < 0 means l = infinity)
+ //
+ // - BPE (--model_type=bpe):
+- // `alpha` is the dropout probability `p` of bpe merge operations
+- // in https://arxiv.org/abs/1910.13267
+- // Nbest-based sampling is not supported so nbest_size parameter is ignored in
+- // BPE.
++ // `alpha`: The dropout probability `p` of bpe merge operations in
++ // https://arxiv.org/abs/1910.13267 Nbest-based sampling is not supported so
++ // nbest_size parameter is ignored in BPE.
+ virtual util::Status SampleEncode(absl::string_view input, int nbest_size,
+ float alpha,
+ std::vector<std::string> *pieces) const;
+@@ -290,74 +346,104 @@ class SentencePieceProcessor {
+
+ //////////////////////////////////////////////////////////////
+ // SampleEncodeAndScore API.
+- // Similar to SampleEncode, but returns samples results.
++ //
++ // Sample `samples` many tokenisations from the segmentation lattice.
++ // These methods are only available in model_type=unigram.
++ //
++ // `alpha`: smoothing parameter (inverse temperature). The same as `alpha` in
++ // the `SampleEncode` method.
++ // `wor`: If `wor` is true, the samples are taken without replacement, and the
++ // scores are the inclusion probabilities of the elements in the sample;
++ // otherwise the samples are taken with replacement and the scores are the
++ // log-probs of sample elements.
++ // `include_best`: If `include_best` is true, the best tokenisation is always
++ // included in the sample, and the remaining elements are sampled excluding
++ // the best.
+ virtual util::Status SampleEncodeAndScore(
+- absl::string_view input, int num_samples, float theta, bool wor,
++ absl::string_view input, int num_samples, float alpha, bool wor,
+ bool include_best,
+ std::vector<std::pair<std::vector<std::string>, float>> *pieces) const;
+
+ // Same as above, but returns a sequence of ids.
+ virtual util::Status SampleEncodeAndScore(
+- absl::string_view input, int num_samples, float theta, bool wor,
++ absl::string_view input, int num_samples, float alpha, bool wor,
+ bool include_best,
+ std::vector<std::pair<std::vector<int>, float>> *ids) const;
+
++ //////////////////////////////////////////////////////////////
++ // Entropy API.
++ //
++ // This is only available in model_type=unigram.
++ // Calculates the entropy of possible tokenisations.
++ virtual util::Status CalculateEntropy(absl::string_view input, float alpha,
++ float *entropy) const;
++
+ //////////////////////////////////////////////////////////////
+ // Advanced API returning SentencePieceText, which manages
+ // utf8-byte alignments between user-input/detokenized text
+ // and internal sentencepiece sequence.
+ //
+ // Given a UTF8 input, encodes it into SentencePieceText.
++ //
++ // When using these APIs, the sentencepiece.pb.h header file must be included.
++ // We can also use ImmutableSentencePieceText as follows.
++ //
++ // ImmutableSentencePieceText spt;
++ // Encode("hello", spt.mutable_proto()).IgnoreError();
++ // std::cout << spt.pieces_size() << std::endl;
+ virtual util::Status Encode(absl::string_view input,
+ SentencePieceText *spt) const;
+
+- // Same as above, but returns NBestSentencePieceText.
+ virtual util::Status NBestEncode(absl::string_view input, int nbest_size,
+ NBestSentencePieceText *nbest_spt) const;
+
+- // Same as above, but samples one segmentation from the hypotheses
+- // (Lattice).
+ virtual util::Status SampleEncode(absl::string_view input, int nbest_size,
+ float alpha, SentencePieceText *spt) const;
+
+- // Samples N segmentation and returns the scores as well
+ virtual util::Status SampleEncodeAndScore(
+- absl::string_view input, int samples, float theta, bool wor,
++ absl::string_view input, int samples, float alpha, bool wor,
+ bool include_best, NBestSentencePieceText *samples_spt) const;
+
+- // Calculate entropy of possible tokenisations
+- virtual util::Status CalculateEntropy(absl::string_view input, float theta,
+- float *entropy) const;
+-
+- // Given a sequence of pieces, decodes it into SentencePieceText.
+- // TODO(taku): Remove this API and use std::vector<std::string_view>
++ // DEPRECATED: Remove this API and use std::vector<std::string_view>
+ virtual util::Status Decode(const std::vector<std::string> &pieces,
+ SentencePieceText *spt) const;
+
+- // Given a sequence of pieces, decodes it into SentencePieceText.
+ virtual util::Status Decode(const std::vector<absl::string_view> &pieces,
+ SentencePieceText *spt) const;
+
+- // Given a sequence of ids, decodes it into SentencePieceText.
+ virtual util::Status Decode(const std::vector<int> &ids,
+ SentencePieceText *spt) const;
+
+- //////////////////////////////////////////////////////////////
+- // Handy methods that return the result directly.
+- // These functions ignore internal errors.
+ #ifdef SWIG
+-#define DEFINE_SPP_DIRECT_FUNC_IMPL(FuncName, OutType, ...) \
+- OutType output; \
+- const auto _status = FuncName(__VA_ARGS__, &output); \
+- if (!_status.ok()) throw _status; \
+- return output;
++#define SPP_SWIG_CHECK_AND_THROW \
++ if (!status.ok()) throw status;
+ #else
++#define SPP_SWIG_CHECK_AND_THROW \
++ if (!status.ok()) { \
++ }
++#endif // SWIG
++
+ #define DEFINE_SPP_DIRECT_FUNC_IMPL(FuncName, OutType, ...) \
+ OutType output; \
+- FuncName(__VA_ARGS__, &output).IgnoreError(); \
++ const auto status = FuncName(__VA_ARGS__, &output); \
++ SPP_SWIG_CHECK_AND_THROW; \
++ return output;
++
++#define DEFINE_SPP_SERIALIZED_PROTO_IMPL(FuncName, OutType, ...) \
++ OutType output; \
++ const auto status = FuncName(__VA_ARGS__, output.mutable_proto()); \
++ SPP_SWIG_CHECK_AND_THROW; \
++ return output.SerializeAsString();
++
++#define DEFINE_SPP_IMMUTABLE_PROTO_IMPL(FuncName, OutType, ...) \
++ OutType output; \
++ const auto status = FuncName(__VA_ARGS__, output.mutable_proto()); \
++ SPP_SWIG_CHECK_AND_THROW; \
+ return output;
+-#endif
+
++ //////////////////////////////////////////////////////////////
++ // Handy methods that return the result directly.
++ // These functions ignore internal errors.
+ virtual std::vector<std::string> EncodeAsPieces(
+ absl::string_view input) const {
+ DEFINE_SPP_DIRECT_FUNC_IMPL(Encode, std::vector<std::string>, input);
+@@ -395,21 +481,21 @@ class SentencePieceProcessor {
+
+ virtual std::vector<std::pair<std::vector<std::string>, float>>
+ SampleEncodeAndScoreAsPieces(absl::string_view input, int num_samples,
+- float theta, bool wor, bool include_best) const {
++ float alpha, bool wor, bool include_best) const {
+ using _T = std::vector<std::pair<std::vector<std::string>, float>>;
+ DEFINE_SPP_DIRECT_FUNC_IMPL(SampleEncodeAndScore, _T, input, num_samples,
+- theta, wor, include_best);
++ alpha, wor, include_best);
+ }
+
+ virtual std::vector<std::pair<std::vector<int>, float>>
+ SampleEncodeAndScoreAsIds(absl::string_view input, int num_samples,
+- float theta, bool wor, bool include_best) const {
++ float alpha, bool wor, bool include_best) const {
+ using _T = std::vector<std::pair<std::vector<int>, float>>;
+ DEFINE_SPP_DIRECT_FUNC_IMPL(SampleEncodeAndScore, _T, input, num_samples,
+- theta, wor, include_best);
++ alpha, wor, include_best);
+ }
+
+- // TODO(taku): Remove this API and use std::vector<std::string_view>
++ // DEPRECATED: Remove this API and use std::vector<std::string_view>
+ virtual std::string DecodePieces(
+ const std::vector<std::string> &pieces) const {
+ DEFINE_SPP_DIRECT_FUNC_IMPL(Decode, std::string, pieces);
+@@ -424,33 +510,104 @@ class SentencePieceProcessor {
+ DEFINE_SPP_DIRECT_FUNC_IMPL(Decode, std::string, ids);
+ }
+
+- virtual float CalculateEntropy(absl::string_view text, float theta) const {
+- DEFINE_SPP_DIRECT_FUNC_IMPL(CalculateEntropy, float, text, theta);
++ virtual float CalculateEntropy(absl::string_view text, float alpha) const {
++ DEFINE_SPP_DIRECT_FUNC_IMPL(CalculateEntropy, float, text, alpha);
+ }
+
+-#undef DEFINE_SPP_DIRECT_FUNC_IMPL
+-
++ //////////////////////////////////////////////////////////////
++ // SerializedProto API (DEPRECATED). Use the ImmutableProto API instead.
+ // They are used in Python interface. Returns serialized proto.
+ // In python module, we can get access to the full Proto after
+ // deserialzing the returned byte sequence.
+- virtual util::bytes EncodeAsSerializedProto(absl::string_view input) const;
++ virtual util::bytes EncodeAsSerializedProto(absl::string_view input) const {
++ DEFINE_SPP_SERIALIZED_PROTO_IMPL(Encode, ImmutableSentencePieceText, input);
++ }
+
+ virtual util::bytes SampleEncodeAsSerializedProto(absl::string_view input,
+ int nbest_size,
+- float alpha) const;
++ float alpha) const {
++ DEFINE_SPP_SERIALIZED_PROTO_IMPL(SampleEncode, ImmutableSentencePieceText,
++ input, nbest_size, alpha);
++ }
+
+ virtual util::bytes NBestEncodeAsSerializedProto(absl::string_view input,
+- int nbest_size) const;
++ int nbest_size) const {
++ DEFINE_SPP_SERIALIZED_PROTO_IMPL(
++ NBestEncode, ImmutableNBestSentencePieceText, input, nbest_size);
++ }
++
++ virtual util::bytes SampleEncodeAndScoreAsSerializedProto(
++ absl::string_view input, int samples, float alpha, bool wor,
++ bool include_best, int nbest_size) const {
++ DEFINE_SPP_SERIALIZED_PROTO_IMPL(SampleEncodeAndScore,
++ ImmutableNBestSentencePieceText, input,
++ samples, alpha, wor, include_best);
++ }
+
+ // TODO(taku): Remove this API and use std::vector<std::string_view>
+ virtual util::bytes DecodePiecesAsSerializedProto(
+- const std::vector<std::string> &pieces) const;
++ const std::vector<std::string> &pieces) const {
++ DEFINE_SPP_SERIALIZED_PROTO_IMPL(Decode, ImmutableSentencePieceText,
++ pieces);
++ }
+
+ virtual util::bytes DecodePiecesAsSerializedProto(
+- const std::vector<absl::string_view> &pieces) const;
++ const std::vector<absl::string_view> &pieces) const {
++ DEFINE_SPP_SERIALIZED_PROTO_IMPL(Decode, ImmutableSentencePieceText,
++ pieces);
++ }
+
+ virtual util::bytes DecodeIdsAsSerializedProto(
+- const std::vector<int> &ids) const;
++ const std::vector<int> &ids) const {
++ DEFINE_SPP_SERIALIZED_PROTO_IMPL(Decode, ImmutableSentencePieceText, ids);
++ }
++
++ //////////////////////////////////////////////////////////////
++ // ImmutableProto API.
++ virtual ImmutableSentencePieceText EncodeAsImmutableProto(
++ absl::string_view input) const {
++ DEFINE_SPP_IMMUTABLE_PROTO_IMPL(Encode, ImmutableSentencePieceText, input);
++ }
++
++ virtual ImmutableSentencePieceText SampleEncodeAsImmutableProto(
++ absl::string_view input, int nbest_size, float alpha) const {
++ DEFINE_SPP_IMMUTABLE_PROTO_IMPL(SampleEncode, ImmutableSentencePieceText,
++ input, nbest_size, alpha);
++ }
++
++ virtual ImmutableNBestSentencePieceText NBestEncodeAsImmutableProto(
++ absl::string_view input, int nbest_size) const {
++ DEFINE_SPP_IMMUTABLE_PROTO_IMPL(
++ NBestEncode, ImmutableNBestSentencePieceText, input, nbest_size);
++ }
++
++ virtual ImmutableNBestSentencePieceText SampleEncodeAndScoreAsImmutableProto(
++ absl::string_view input, int samples, float alpha, bool wor,
++ bool include_best, int nbest_size) const {
++ DEFINE_SPP_IMMUTABLE_PROTO_IMPL(SampleEncodeAndScore,
++ ImmutableNBestSentencePieceText, input,
++ samples, alpha, wor, include_best);
++ }
++
++ // TODO(taku): Remove this API and use std::vector<std::string_view>
++ virtual ImmutableSentencePieceText DecodePiecesAsImmutableProto(
++ const std::vector<std::string> &pieces) const {
++ DEFINE_SPP_IMMUTABLE_PROTO_IMPL(Decode, ImmutableSentencePieceText, pieces);
++ }
++
++ virtual ImmutableSentencePieceText DecodePiecesAsImmutableProto(
++ const std::vector<absl::string_view> &pieces) const {
++ DEFINE_SPP_IMMUTABLE_PROTO_IMPL(Decode, ImmutableSentencePieceText, pieces);
++ }
++
++ virtual ImmutableSentencePieceText DecodeIdsAsImmutableProto(
++ const std::vector<int> &ids) const {
++ DEFINE_SPP_IMMUTABLE_PROTO_IMPL(Decode, ImmutableSentencePieceText, ids);
++ }
++
++#undef DEFINE_SPP_DIRECT_FUNC_IMPL
++#undef DEFINE_SPP_SERIALIZED_PROTO_IMPL
++#undef DEFINE_SPP_IMMUTABLE_PROTO_IMPL
+
+ //////////////////////////////////////////////////////////////
+ // Vocabulary management methods.
+@@ -467,7 +624,8 @@ class SentencePieceProcessor {
+ virtual const std::string &IdToPiece(int id) const;
+
+ // Returns the score of `id`.
+- // Usually score is an emission log probability of unigram language model.
++ // Usually score is an emission log probability of unigram language
++ // model.
+ virtual float GetScore(int id) const;
+
+ // Returns true if `id` is unknown symbol.
+@@ -506,7 +664,7 @@ class SentencePieceProcessor {
+
+ // Allows injection of a normalizer instance. `normalizer` is moved.
+ void SetNormalizer(std::unique_ptr<normalizer::Normalizer> &&normalizer);
+-#endif
++#endif // SWIG
+
+ // Returns immutable model proto. Useful to obtain extended
+ // or experimental parameters encoded in model_proto.
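The header diff above builds every "handy" method from one macro body and swaps only the error check depending on whether SWIG is defined. The following is a minimal sketch of that pattern, not the upstream code: `Status`, `Tokenizer`, `Split`, `SplitAsPieces`, and the `SKETCH_*` macro names are all hypothetical stand-ins, and `SKETCH_THROW_ON_ERROR` plays the role that `SWIG` plays in the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for util::Status; only ok() is modelled.
class Status {
 public:
  explicit Status(bool ok = true) : ok_(ok) {}
  bool ok() const { return ok_; }

 private:
  bool ok_;
};

// When the (assumed) flag is set, errors are thrown to the caller;
// otherwise the error is silently dropped and the default output is returned.
#ifdef SKETCH_THROW_ON_ERROR
#define SKETCH_CHECK_AND_THROW \
  if (!status.ok()) throw status;
#else
#define SKETCH_CHECK_AND_THROW \
  if (!status.ok()) {          \
  }
#endif

// Shared body: call the status-returning overload, check, return the output.
#define SKETCH_DIRECT_FUNC_IMPL(FuncName, OutType, ...) \
  OutType output;                                       \
  const auto status = FuncName(__VA_ARGS__, &output);   \
  SKETCH_CHECK_AND_THROW;                               \
  return output;

class Tokenizer {
 public:
  // Status-returning API: splits on single spaces, fails on empty input.
  Status Split(const std::string &input, std::vector<std::string> *out) const {
    if (input.empty()) return Status(false);
    std::string cur;
    for (char c : input) {
      if (c == ' ') {
        if (!cur.empty()) out->push_back(cur);
        cur.clear();
      } else {
        cur.push_back(c);
      }
    }
    if (!cur.empty()) out->push_back(cur);
    return Status();
  }

  // Direct API generated from the shared macro body.
  std::vector<std::string> SplitAsPieces(const std::string &input) const {
    SKETCH_DIRECT_FUNC_IMPL(Split, std::vector<std::string>, input);
  }
};

#undef SKETCH_DIRECT_FUNC_IMPL
```

Without the flag, `Tokenizer().SplitAsPieces("")` returns an empty vector rather than reporting the failure, which mirrors the "These functions ignore internal errors" note in the header.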
+diff --git a/src/sentencepiece_processor_test.cc b/src/sentencepiece_processor_test.cc
+index d57ab5a..ed651f7 100644
+--- a/src/sentencepiece_processor_test.cc
++++ b/src/sentencepiece_processor_test.cc
+@@ -12,6 +12,8 @@
+ // See the License for the specific language governing permissions and
+ // limitations under the License.!
+
++#include "sentencepiece_processor.h"
++
+ #include <utility>
+
+ #include "builder.h"
+@@ -20,7 +22,6 @@
+ #include "normalizer.h"
+ #include "sentencepiece.pb.h"
+ #include "sentencepiece_model.pb.h"
+-#include "sentencepiece_processor.h"
+ #include "sentencepiece_trainer.h"
+ #include "testharness.h"
+ #include "third_party/absl/container/flat_hash_map.h"
+@@ -551,10 +552,9 @@ TEST(SentencepieceProcessorTest, DecodeTest) {
+ int GetPieceSize() const override { return 7; }
+
+ int PieceToId(absl::string_view piece) const override {
+- static absl::flat_hash_map<absl::string_view, int,
+- string_util::string_view_hash>
+- kMap = {{"<unk>", 0}, {"<s>", 1}, {"</s>", 2}, {WS "ABC", 3},
+- {WS "DE", 4}, {"F", 5}, {"G" WS "H", 6}};
++ static absl::flat_hash_map<absl::string_view, int> kMap = {
++ {"<unk>", 0}, {"<s>", 1}, {"</s>", 2}, {WS "ABC", 3},
++ {WS "DE", 4}, {"F", 5}, {"G" WS "H", 6}};
+ return port::FindWithDefault(kMap, piece, 0);
+ }
+
+@@ -719,10 +719,9 @@ TEST(SentencepieceProcessorTest, DummyPrefixDecodeTest) {
+ int GetPieceSize() const override { return 7; }
+
+ int PieceToId(absl::string_view piece) const override {
+- static absl::flat_hash_map<absl::string_view, int,
+- string_util::string_view_hash>
+- kMap = {{"<unk>", 0}, {"<s>", 1}, {"</s>", 2}, {WS "ABC", 3},
+- {WS "DE", 4}, {"F", 5}, {"G" WS "H", 6}, {WS, 7}};
++ static absl::flat_hash_map<absl::string_view, int> kMap = {
++ {"<unk>", 0}, {"<s>", 1}, {"</s>", 2}, {WS "ABC", 3},
++ {WS "DE", 4}, {"F", 5}, {"G" WS "H", 6}, {WS, 7}};
+ return port::FindWithDefault(kMap, piece, 0);
+ }
+
+@@ -1058,18 +1057,6 @@ TEST(SentencePieceProcessorTest, EndToEndTest) {
+ EXPECT_EQ(2, sp.eos_id());
+ EXPECT_EQ(-1, sp.pad_id());
+
+- {
+- // Verify the default encoder version.
+- EXPECT_EQ(EncoderVersion::kOptimized, sp.GetEncoderVersion());
+-
+- // Set the encoder version to original and verify.
+- EXPECT_TRUE(sp.SetEncoderVersion(EncoderVersion::kOriginal).ok());
+- EXPECT_EQ(EncoderVersion::kOriginal, sp.GetEncoderVersion());
+-
+- // Set back to the default encoder version.
+- EXPECT_TRUE(sp.SetEncoderVersion(EncoderVersion::kOptimized).ok());
+- }
+-
+ {
+ std::vector<std::string> sps;
+ const std::vector<std::string> expected_str = {WS, "ab", "c"};
+@@ -1574,4 +1561,77 @@ TEST(SentencePieceProcessorTest, VocabularyTest) {
+ EXPECT_FALSE(sp.IsUnused(6));
+ EXPECT_FALSE(sp.IsUnused(7));
+ }
++
++TEST(SentencePieceProcessorTest, ImmutableSentencePieceTextTest) {
++ ImmutableSentencePieceText spt;
++ auto *v = spt.mutable_proto();
++
++ v->set_text("hello world");
++ v->set_score(1.0);
++ for (int i = 0; i < 10; ++i) {
++ auto *p = v->add_pieces();
++ p->set_surface(absl::StrCat("surface_", i));
++ p->set_piece(absl::StrCat("surface_", i));
++ p->set_id(i);
++ p->set_begin(i + 10);
++ p->set_end(i + 20);
++ }
++
++ EXPECT_EQ(v->pieces_size(), spt.pieces_size());
++ for (int i = 0; i < spt.pieces_size(); ++i) {
++ EXPECT_EQ(v->pieces(i).surface(), spt.pieces(i).surface());
++ EXPECT_EQ(v->pieces(i).piece(), spt.pieces(i).piece());
++ EXPECT_EQ(v->pieces(i).id(), spt.pieces(i).id());
++ EXPECT_EQ(v->pieces(i).begin(), spt.pieces(i).begin());
++ EXPECT_EQ(v->pieces(i).end(), spt.pieces(i).end());
++ }
++
++ int n = 0;
++ for (auto &p : spt.pieces()) {
++ EXPECT_EQ(v->pieces(n).surface(), p.surface());
++ EXPECT_EQ(v->pieces(n).piece(), p.piece());
++ EXPECT_EQ(v->pieces(n).id(), p.id());
++ EXPECT_EQ(v->pieces(n).begin(), p.begin());
++ EXPECT_EQ(v->pieces(n).end(), p.end());
++ ++n;
++ }
++
++ EXPECT_EQ(v->text(), spt.text());
++ EXPECT_EQ(v->score(), spt.score());
++ EXPECT_EQ(v->SerializeAsString(), spt.SerializeAsString());
++
++ // test copy.
++ auto spt2 = spt;
++ EXPECT_EQ(spt2.pieces_size(), spt.pieces_size());
++ for (int i = 0; i < spt.pieces_size(); ++i) {
++ EXPECT_EQ(spt2.pieces(i).surface(), spt.pieces(i).surface());
++ EXPECT_EQ(spt2.pieces(i).piece(), spt.pieces(i).piece());
++ EXPECT_EQ(spt2.pieces(i).id(), spt.pieces(i).id());
++ EXPECT_EQ(spt2.pieces(i).begin(), spt.pieces(i).begin());
++ EXPECT_EQ(spt2.pieces(i).end(), spt.pieces(i).end());
++ }
++}
++
++TEST(SentencePieceProcessorTest, ImmutableNBestSentencePieceTextTest) {
++ ImmutableNBestSentencePieceText spt;
++ auto *v = spt.mutable_proto();
++ for (int i = 0; i < 10; ++i) {
++ auto *p = v->add_nbests();
++ p->set_text(absl::StrCat("text_", i));
++ p->set_score(2.0 * i);
++ }
++
++ EXPECT_EQ(v->nbests_size(), spt.nbests_size());
++ for (int i = 0; i < v->nbests_size(); ++i) {
++ EXPECT_EQ(v->nbests(i).text(), spt.nbests(i).text());
++ EXPECT_EQ(v->nbests(i).score(), spt.nbests(i).score());
++ }
++ EXPECT_EQ(v->SerializeAsString(), spt.SerializeAsString());
++
++ // test copy.
++ auto spt2 = spt;
++ EXPECT_EQ(spt2.nbests_size(), spt.nbests_size());
++ EXPECT_EQ(spt2.SerializeAsString(), spt.SerializeAsString());
++}
++
+ } // namespace sentencepiece
+diff --git a/src/unigram_model.cc b/src/unigram_model.cc
+index ea48912..d9f1ce9 100644
+--- a/src/unigram_model.cc
++++ b/src/unigram_model.cc
+@@ -198,16 +198,17 @@ Lattice::LatticePathWithScore Lattice::Viterbi() {
+ return retval;
+ }
+
+-std::vector<float> Lattice::ForwardAlgorithm(float theta) const {
++std::vector<float> Lattice::ForwardAlgorithm(float inv_theta) const {
+ const int len = size();
+ std::vector<float> alpha(node_allocator_.size(), 0.0);
+
+ for (int pos = 0; pos <= len; ++pos) {
+ for (Node *rnode : begin_nodes_[pos]) {
+ for (Node *lnode : end_nodes_[pos]) {
+- alpha[rnode->node_id] = LogSumExp(
+- alpha[rnode->node_id], theta * lnode->score + alpha[lnode->node_id],
+- lnode == end_nodes_[pos][0]);
++ alpha[rnode->node_id] =
++ LogSumExp(alpha[rnode->node_id],
++ inv_theta * lnode->score + alpha[lnode->node_id],
++ lnode == end_nodes_[pos][0]);
+ }
+ }
+ }
+@@ -215,7 +216,7 @@ std::vector<float> Lattice::ForwardAlgorithm(float theta) const {
+ return alpha;
+ }
+
+-std::vector<float> Lattice::BackwardAlgorithm(float theta) const {
++std::vector<float> Lattice::BackwardAlgorithm(float inv_theta) const {
+ const int len = size();
+ std::vector<float> beta(node_allocator_.size(), 0.0);
+
+@@ -260,17 +261,16 @@ float Lattice::PopulateMarginal(float freq,
+ return freq * Z;
+ }
+
+-float Lattice::CalculateEntropy(float theta) const {
++float Lattice::CalculateEntropy(float inv_theta) const {
+ const int len = size();
+
+ // alpha[node_id] is the marginal prob of sequence up to start of node
+ // H is entropy of sequence
+ // the index of alpha/H is Node::node_id.
+- std::vector<float> alpha(node_allocator_.size(), 0.0);
+ std::vector<float> H(node_allocator_.size(), 0.0);
+
+ // Populate the forward marginals to get the normalising constant
+- alpha = ForwardAlgorithm(theta);
++ const auto alpha = ForwardAlgorithm(inv_theta);
+
+ // Now populate the forward entropies
+ for (int pos = 0; pos <= len; ++pos) {
+@@ -280,7 +280,7 @@ float Lattice::CalculateEntropy(float theta) const {
+
+ // We have to normalise p(lnode) by the marginal contribution it makes
+ const float lnode_transition_prob =
+- ((theta * lnode->score) + alpha[lnode->node_id] -
++ ((inv_theta * lnode->score) + alpha[lnode->node_id] -
+ alpha[rnode->node_id]);
+ H[rnode->node_id] += std::exp(lnode_transition_prob) *
+ (H[lnode->node_id] + lnode_transition_prob);
+@@ -345,7 +345,7 @@ Hypothesis *CloneHypAndDependents(
+
+ std::vector<Lattice::LatticePathWithScore> Lattice::NBest(size_t nbest_size,
+ bool sample,
+- float theta) {
++ float inv_theta) {
+ if (nbest_size < 1) {
+ LOG(WARNING) << "nbest_size >= 1. Returns empty result.";
+ return {};
+@@ -391,7 +391,7 @@ std::vector<Lattice::LatticePathWithScore> Lattice::NBest(size_t nbest_size,
+
+ if (sample) {
+ // Run forwards algorithm to get normalising constants
+- alpha = ForwardAlgorithm(theta);
++ alpha = ForwardAlgorithm(inv_theta);
+ // f(eos) = Gumbel(0), as it is the perturbed score of the entire lattice.
+ eos->fx = Gumbel();
+ } else {
+@@ -432,7 +432,8 @@ std::vector<Lattice::LatticePathWithScore> Lattice::NBest(size_t nbest_size,
+ for (int i = 0; i < end_nodes(node->pos).size(); i++) {
+ Node *lnode = end_nodes(node->pos)[i];
+ // Calculate backwards transition score
+- probs[i] = top->gx + alpha[lnode->node_id] + (theta * lnode->score) - Z;
++ probs[i] =
++ top->gx + alpha[lnode->node_id] + (inv_theta * lnode->score) - Z;
+ perturbed_probs[i] = probs[i] + Gumbel();
+ if (perturbed_probs[i] > max_score) {
+ max_score = perturbed_probs[i];
+@@ -508,13 +509,13 @@ std::vector<Lattice::LatticePathWithScore> Lattice::NBest(size_t nbest_size,
+ return results;
+ }
+
+-std::vector<Lattice::Node *> Lattice::Sample(float theta) {
++std::vector<Lattice::Node *> Lattice::Sample(float inv_theta) {
+ const int len = size();
+ if (len == 0) return {};
+
+ std::vector<float> alpha(node_allocator_.size(), 0.0);
+
+- alpha = ForwardAlgorithm(theta);
++ alpha = ForwardAlgorithm(inv_theta);
+
+ auto *mt = random::GetRandomGenerator();
+
+@@ -526,8 +527,8 @@ std::vector<Lattice::Node *> Lattice::Sample(float theta) {
+ while (true) {
+ probs.clear();
+ for (const Node *lnode : end_nodes_[node->pos]) {
+- probs.push_back(std::exp(static_cast<double>(alpha[lnode->node_id] +
+- theta * lnode->score - Z)));
++ probs.push_back(std::exp(static_cast<double>(
++ alpha[lnode->node_id] + inv_theta * lnode->score - Z)));
+ }
+ std::discrete_distribution<int> dist(probs.begin(), probs.end());
+ node = end_nodes_[node->pos][dist(*mt)];
+@@ -721,7 +722,7 @@ NBestEncodeResult Model::NBestEncode(absl::string_view normalized,
+ }
+
+ EncodeResult Model::SampleEncode(absl::string_view normalized,
+- float theta) const {
++ float inv_theta) const {
+ if (!status().ok() || normalized.empty()) {
+ return {};
+ }
+@@ -731,7 +732,7 @@ EncodeResult Model::SampleEncode(absl::string_view normalized,
+ PopulateNodes(&lattice);
+
+ EncodeResult results;
+- for (const auto *node : lattice.Sample(theta)) {
++ for (const auto *node : lattice.Sample(inv_theta)) {
+ results.emplace_back(node->piece, node->id);
+ }
+
+@@ -739,7 +740,7 @@ EncodeResult Model::SampleEncode(absl::string_view normalized,
+ }
+
+ NBestEncodeResult Model::SampleEncodeAndScore(absl::string_view normalized,
+- float theta, int samples,
++ float inv_theta, int samples,
+ bool wor,
+ bool include_best) const {
+ if (!status().ok() || normalized.empty()) {
+@@ -750,16 +751,16 @@ NBestEncodeResult Model::SampleEncodeAndScore(absl::string_view normalized,
+ lattice.SetSentence(normalized);
+ PopulateNodes(&lattice);
+
+- std::vector<float> alpha = lattice.ForwardAlgorithm(theta);
+- float marginal = alpha[lattice.eos_node()->node_id];
++ const std::vector<float> alpha = lattice.ForwardAlgorithm(inv_theta);
++ const float marginal = alpha[lattice.eos_node()->node_id];
+
+ if (include_best) {
+ if (!wor) {
+- LOG(FATAL) << "include_best not supported for wor false";
++ LOG(ERROR) << "include_best not supported for wor false";
++ return {};
+ }
+ EncodeResult result;
+- Lattice::LatticePathWithScore best_path = lattice.Viterbi();
+-
++ const auto best_path = lattice.Viterbi();
+ for (const auto *node : best_path.first) {
+ result.emplace_back(node->piece, node->id);
+ }
+@@ -770,8 +771,7 @@ NBestEncodeResult Model::SampleEncodeAndScore(absl::string_view normalized,
+
+ if (wor) {
+ // Draw k+1 samples as we need perturbed score of k+1th element
+- std::vector<Lattice::LatticePathWithScore> nbest_samples =
+- lattice.NBest(samples + 1, true, theta);
++ auto nbest_samples = lattice.NBest(samples + 1, true, inv_theta);
+
+ if (include_best) {
+ std::vector<std::vector<Lattice::Node *>> nbest_paths(
+@@ -780,14 +780,13 @@ NBestEncodeResult Model::SampleEncodeAndScore(absl::string_view normalized,
+ nbest_paths[i] = nbest_samples[i].first;
+ }
+ // Remove the best result from the samples if necessary
+- Lattice::LatticePathWithScore best_path = lattice.Viterbi();
++ const auto best_path = lattice.Viterbi();
+
+ const int index_of_best =
+ (std::find(nbest_paths.begin(), nbest_paths.end(), best_path.first) -
+ nbest_paths.begin());
+
+ if (index_of_best != nbest_samples.size()) {
+- LOG(INFO) << "removing best path from samples";
+ nbest_samples.erase(nbest_samples.begin() + index_of_best);
+ } else {
+ nbest_samples.pop_back();
+@@ -803,7 +802,7 @@ NBestEncodeResult Model::SampleEncodeAndScore(absl::string_view normalized,
+ float score = 0.0;
+
+ for (const auto *node : nbest.first) {
+- score += (theta * node->score);
++ score += (inv_theta * node->score);
+ result.emplace_back(node->piece, node->id);
+ }
+
+@@ -814,8 +813,8 @@ NBestEncodeResult Model::SampleEncodeAndScore(absl::string_view normalized,
+ for (auto &it : results) {
+ // Only modify non best sample inclusion probabilities.
+ if (it.second != 0.0) {
+- double x = it.second - kappa;
+- double y = std::exp(x);
++ const double x = it.second - kappa;
++ const double y = std::exp(x);
+ double inclusion_prob;
+ if (x <= -10) {
+ // Series expansion of the log Gumbel survival function up to eps.
+@@ -835,10 +834,10 @@ NBestEncodeResult Model::SampleEncodeAndScore(absl::string_view normalized,
+
+ float score = 0.0;
+ EncodeResult result;
+- std::vector<Lattice::Node *> sample = lattice.Sample(theta);
++ const std::vector<Lattice::Node *> sample = lattice.Sample(inv_theta);
+ for (const auto *node : sample) {
+ result.emplace_back(node->piece, node->id);
+- score += (theta * node->score);
++ score += (inv_theta * node->score);
+ }
+ results.emplace_back(result, score - marginal);
+ }
+@@ -847,12 +846,13 @@ NBestEncodeResult Model::SampleEncodeAndScore(absl::string_view normalized,
+ return results;
+ }
+
+-float Model::CalculateEntropy(absl::string_view normalized, float theta) const {
++float Model::CalculateEntropy(absl::string_view normalized,
++ float inv_theta) const {
+ Lattice lattice;
+ lattice.SetSentence(normalized);
+ PopulateNodes(&lattice);
+
+- return lattice.CalculateEntropy(theta);
++ return lattice.CalculateEntropy(inv_theta);
+ }
+
+ bool Model::VerifyOutputsEquivalent(absl::string_view expected,
+diff --git a/src/unigram_model.h b/src/unigram_model.h
+index 448e489..aa4f28f 100644
+--- a/src/unigram_model.h
++++ b/src/unigram_model.h
+@@ -173,6 +173,18 @@ class Model : public ModelInterface {
+ bool VerifyOutputsEquivalent(absl::string_view expected,
+ absl::string_view actual) const override;
+
++ enum EncoderVersion {
++ kOptimized, // The optimized encoder.
++ kOriginal // The original encoder.
++ };
++
++ void SetEncoderVersion(EncoderVersion encoder_version) {
++ encoder_version_ = encoder_version;
++ }
++
++ // Returns the current encoder version in use.
++ EncoderVersion GetEncoderVersion() const { return encoder_version_; }
++
+ protected:
+ // Builds a Trie index.
+ void BuildTrie(std::vector<std::pair<absl::string_view, int>> *pieces);
+@@ -195,6 +207,9 @@ class Model : public ModelInterface {
+ // Maximum size of the return value of Trie, which corresponds
+ // to the maximum size of shared common prefix in the sentence pieces.
+ int trie_results_size_;
++
++ // Encoder version currently in use; defaults to the optimized encoder.
++ EncoderVersion encoder_version_ = kOptimized;
+ };
+
+ } // namespace unigram
+diff --git a/src/unigram_model_test.cc b/src/unigram_model_test.cc
+index 8049d20..221bac2 100644
+--- a/src/unigram_model_test.cc
++++ b/src/unigram_model_test.cc
+@@ -12,6 +12,8 @@
+ // See the License for the specific language governing permissions and
+ // limitations under the License.!
+
++#include "unigram_model.h"
++
+ #include <cmath>
+ #include <map>
+ #include <string>
+@@ -22,7 +24,6 @@
+ #include "testharness.h"
+ #include "third_party/absl/strings/str_cat.h"
+ #include "third_party/absl/strings/str_join.h"
+-#include "unigram_model.h"
+ #include "util.h"
+
+ namespace sentencepiece {
+@@ -249,14 +250,14 @@ TEST(LatticeTest, NBestSampleTest) {
+
+ // Calculate expected probabilities of each path
+ // Note that sampling without replacement affects the expected frequencies!
+- const std::vector<double> kTheta = {0.0, 0.01, 0.5, 0.7, 1.0};
+- for (const auto theta : kTheta) {
++ const std::vector<double> kInv_Theta = {0.0, 0.01, 0.5, 0.7, 1.0};
++ for (const auto inv_theta : kInv_Theta) {
+ std::vector<std::string> strings = {"ABC", "AB C", "A BC", "A B C"};
+ std::map<std::string, float> probs;
+- probs["ABC"] = std::exp(theta * 1.0);
+- probs["AB C"] = std::exp(theta * (0.2 + 0.1));
+- probs["A BC"] = std::exp(theta * (0.0 + 0.5));
+- probs["A B C"] = std::exp(theta * (0.0 + 0.0 + 0.1));
++ probs["ABC"] = std::exp(inv_theta * 1.0);
++ probs["AB C"] = std::exp(inv_theta * (0.2 + 0.1));
++ probs["A BC"] = std::exp(inv_theta * (0.0 + 0.5));
++ probs["A B C"] = std::exp(inv_theta * (0.0 + 0.0 + 0.1));
+
+ for (const auto &it : strings) {
+ EXPECT_EQ(1, probs.count(it));
+@@ -298,7 +299,7 @@ TEST(LatticeTest, NBestSampleTest) {
+ for (const auto num_samples : kNumSamples) {
+ std::map<std::string, int> counts;
+ for (int i = 0; i < kTrials; i++) {
+- auto nbests = lattice.NBest(num_samples, true, theta);
++ auto nbests = lattice.NBest(num_samples, true, inv_theta);
+ for (const auto &nbest : nbests) {
+ counts[GetTokenized(nbest.first)]++;
+ }
+@@ -329,14 +330,14 @@ TEST(LatticeTest, CalculateEntropyTest) {
+ InsertWithScore(&lattice, 0, 3, 1.0); // ABC
+
+ // Calculate expected probabilities of each path
+- const std::vector<double> kTheta = {0.0, 0.01, 0.5, 0.7, 1.0};
+- for (const auto theta : kTheta) {
++ const std::vector<double> kInv_Theta = {0.0, 0.01, 0.5, 0.7, 1.0};
++ for (const auto inv_theta : kInv_Theta) {
+ std::vector<std::string> strings = {"ABC", "AB C", "A BC", "A B C"};
+ std::map<std::string, float> probs;
+- probs["ABC"] = std::exp(theta * 1.0);
+- probs["AB C"] = std::exp(theta * (0.2 + 0.1));
+- probs["A BC"] = std::exp(theta * (0.0 + 0.5));
+- probs["A B C"] = std::exp(theta * (0.0 + 0.0 + 0.1));
++ probs["ABC"] = std::exp(inv_theta * 1.0);
++ probs["AB C"] = std::exp(inv_theta * (0.2 + 0.1));
++ probs["A BC"] = std::exp(inv_theta * (0.0 + 0.5));
++ probs["A B C"] = std::exp(inv_theta * (0.0 + 0.0 + 0.1));
+
+ double Z = 0.0;
+ for (const auto &it : probs) Z += it.second;
+@@ -349,7 +350,7 @@ TEST(LatticeTest, CalculateEntropyTest) {
+ for (const auto &it : probs) {
+ entropy += (it.second * std::log(it.second));
+ }
+- EXPECT_NEAR(-entropy, lattice.CalculateEntropy(theta), 0.02);
++ EXPECT_NEAR(-entropy, lattice.CalculateEntropy(inv_theta), 0.02);
+ }
+ }
+
+@@ -364,9 +365,9 @@ TEST(LatticeTest, ForwardAlgorithmTest) {
+ InsertWithScore(&lattice, 1, 2, 0.5); // BC
+ InsertWithScore(&lattice, 0, 3, 1.0); // ABC
+
+- const std::vector<float> kTheta = {0.0, 0.01, 0.5, 0.7, 1.0};
+- for (const auto theta : kTheta) {
+- std::vector<float> alpha = lattice.ForwardAlgorithm(theta);
++ const std::vector<float> kInv_Theta = {0.0, 0.01, 0.5, 0.7, 1.0};
++ for (const auto inv_theta : kInv_Theta) {
++ std::vector<float> alpha = lattice.ForwardAlgorithm(inv_theta);
+ EXPECT_EQ(alpha.size(), 8); // 6 nodes, plus BOS, EOS
+ // only alpha[C], alpha[EOS] have non-zero alpha
+ for (int i : {0, 1, 2, 3}) {
+@@ -374,14 +375,15 @@ TEST(LatticeTest, ForwardAlgorithmTest) {
+ if (i < 2) {
+ EXPECT_EQ(alpha[node->node_id], 0.0);
+ } else if (i == 2) {
+- float Z =
+- std::log(std::exp(theta * (0.0 + 0.0)) + std::exp(theta * 0.2));
++ float Z = std::log(std::exp(inv_theta * (0.0 + 0.0)) +
++ std::exp(inv_theta * 0.2));
+ EXPECT_EQ(alpha[node->node_id], Z);
+ } else if (i == 3) {
+- float Z = std::log(std::exp(theta * (0.0 + 0.0 + 0.1)) + // A + B + C
+- std::exp(theta * (0.2 + 0.1)) + // AB + C
+- std::exp(theta * (0.0 + 0.5)) + // A + BC
+- std::exp(theta * 1.0)); // ABC
++ float Z =
++ std::log(std::exp(inv_theta * (0.0 + 0.0 + 0.1)) + // A + B + C
++ std::exp(inv_theta * (0.2 + 0.1)) + // AB + C
++ std::exp(inv_theta * (0.0 + 0.5)) + // A + BC
++ std::exp(inv_theta * 1.0)); // ABC
+ EXPECT_EQ(Z, alpha[node->node_id]);
+ }
+ }
+@@ -435,14 +437,14 @@ TEST(LatticeTest, SampleTest) {
+ InsertWithScoreAndId(&lattice, 1, 2, 1.7, 4); // BC
+ InsertWithScoreAndId(&lattice, 0, 3, 1.8, 5); // ABC
+
+- const std::vector<double> kTheta = {0.0, 0.01, 0.5, 0.7, 1.0};
+- for (int i = 0; i < kTheta.size(); ++i) {
++ const std::vector<double> kInv_Theta = {0.0, 0.01, 0.5, 0.7, 1.0};
++ for (int i = 0; i < kInv_Theta.size(); ++i) {
+ std::map<std::string, double> probs;
+ // Expands all paths in the lattice.
+- probs["A B C"] = exp(kTheta[i] * (1.0 + 1.2 + 1.5)); // A B C
+- probs["AB C"] = exp(kTheta[i] * (1.6 + 1.5)); // AB C
+- probs["A BC"] = exp(kTheta[i] * (1.0 + 1.7)); // A BC
+- probs["ABC"] = exp(kTheta[i] * 1.8); // ABC
++ probs["A B C"] = exp(kInv_Theta[i] * (1.0 + 1.2 + 1.5)); // A B C
++ probs["AB C"] = exp(kInv_Theta[i] * (1.6 + 1.5)); // AB C
++ probs["A BC"] = exp(kInv_Theta[i] * (1.0 + 1.7)); // A BC
++ probs["ABC"] = exp(kInv_Theta[i] * 1.8); // ABC
+
+ // Computes expected probabilities.
+ double Z = 0.0;
+@@ -453,7 +455,7 @@ TEST(LatticeTest, SampleTest) {
+ constexpr int kTrial = 100000;
+ std::map<std::string, int> freq;
+ for (int n = 0; n < kTrial; ++n) {
+- freq[GetTokenized(lattice.Sample(kTheta[i]))]++;
++ freq[GetTokenized(lattice.Sample(kInv_Theta[i]))]++;
+ }
+
+ EXPECT_EQ(probs.size(), freq.size());
+@@ -480,18 +482,18 @@ ModelProto MakeBaseModelProto() {
+ }
+
+ // Returns model protos in parameterized tests.
+-const std::vector<EncoderVersion> &GetEncoderVersions() {
+- static const std::vector<EncoderVersion> &v =
+- *new std::vector<EncoderVersion>{EncoderVersion::kOptimized,
+- EncoderVersion::kOriginal};
++const std::vector<Model::EncoderVersion> &GetEncoderVersions() {
++ static const std::vector<Model::EncoderVersion> &v =
++ *new std::vector<Model::EncoderVersion>{Model::kOptimized,
++ Model::kOriginal};
+ return v;
+ }
+
+-class UnigramModelTest : public test::TestWithParam<EncoderVersion> {
++class UnigramModelTest : public test::TestWithParam<Model::EncoderVersion> {
+ protected:
+ void SetUp() override { encoder_version_ = GetParam(); }
+ void TearDown() override {}
+- EncoderVersion encoder_version_;
++ Model::EncoderVersion encoder_version_;
+ };
+
+ void AddPiece(ModelProto *model_proto, const std::string &piece,
+@@ -530,15 +532,15 @@ TEST(UnigramModelTest, SampleEncodeAndScoreTest) {
+ lattice.SetSentence("ABC");
+ model.PopulateNodes(&lattice);
+
+- std::vector<float> kTheta = {0.0, 1.0};
++ std::vector<float> kInv_Theta = {0.0, 1.0};
+
+- for (const auto theta : kTheta) {
++ for (const auto inv_theta : kInv_Theta) {
+ std::vector<std::string> strings = {"ABC", "AB C", "A BC", "A B C"};
+ std::map<std::string, float> probs;
+- probs["ABC"] = std::exp(theta * 1.0);
+- probs["AB C"] = std::exp(theta * (0.2 + 0.1));
+- probs["A BC"] = std::exp(theta * (0.0 + 0.5));
+- probs["A B C"] = std::exp(theta * (0.0 + 0.0 + 0.1));
++ probs["ABC"] = std::exp(inv_theta * 1.0);
++ probs["AB C"] = std::exp(inv_theta * (0.2 + 0.1));
++ probs["A BC"] = std::exp(inv_theta * (0.0 + 0.5));
++ probs["A B C"] = std::exp(inv_theta * (0.0 + 0.0 + 0.1));
+
+ for (const auto &it : strings) {
+ EXPECT_EQ(1, probs.count(it));
+@@ -579,8 +581,8 @@ TEST(UnigramModelTest, SampleEncodeAndScoreTest) {
+ std::map<std::string, float> scores;
+ int kTrials = 50000;
+ for (int i = 0; i < kTrials; i++) {
+- NBestEncodeResult sample =
+- model.SampleEncodeAndScore("ABC", theta, num_samples, true, false);
++ NBestEncodeResult sample = model.SampleEncodeAndScore(
++ "ABC", inv_theta, num_samples, true, false);
+
+ for (const auto &it : sample) {
+ std::vector<std::string> tokens;
+@@ -619,7 +621,7 @@ TEST_P(UnigramModelTest, PieceToIdTest) {
+ AddPiece(&model_proto, "d", 0.4);
+
+ Model model(model_proto);
+- EXPECT_TRUE(model.SetEncoderVersion(encoder_version_).ok());
++ model.SetEncoderVersion(encoder_version_);
+
+ EXPECT_EQ(model_proto.SerializeAsString(),
+ model.model_proto().SerializeAsString());
+@@ -677,7 +679,7 @@ TEST_P(UnigramModelTest, PopulateNodesAllUnknownsTest) {
+ ModelProto model_proto = MakeBaseModelProto();
+ AddPiece(&model_proto, "x");
+ Model model(model_proto);
+- EXPECT_TRUE(model.SetEncoderVersion(encoder_version_).ok());
++ model.SetEncoderVersion(encoder_version_);
+
+ Lattice lattice;
+ lattice.SetSentence("abc");
+@@ -701,7 +703,7 @@ TEST_P(UnigramModelTest, PopulateNodesTest) {
+ AddPiece(&model_proto, "bc", 0.4); // 6
+
+ Model model(model_proto);
+- EXPECT_TRUE(model.SetEncoderVersion(encoder_version_).ok());
++ model.SetEncoderVersion(encoder_version_);
+
+ Lattice lattice;
+ lattice.SetSentence("abc");
+@@ -736,7 +738,7 @@ TEST_P(UnigramModelTest, PopulateNodesWithUnusedTest) {
+ model_proto.mutable_pieces(6)->set_type(ModelProto::SentencePiece::UNUSED);
+
+ Model model(model_proto);
+- EXPECT_TRUE(model.SetEncoderVersion(encoder_version_).ok());
++ model.SetEncoderVersion(encoder_version_);
+
+ Lattice lattice;
+ lattice.SetSentence("abc");
+@@ -761,7 +763,7 @@ TEST_P(UnigramModelTest, ModelNBestTest) {
+ AddPiece(&model_proto, "abc", 10.0); // 8
+
+ Model model(model_proto);
+- EXPECT_TRUE(model.SetEncoderVersion(encoder_version_).ok());
++ model.SetEncoderVersion(encoder_version_);
+
+ auto nbest = model.NBestEncode("", 10);
+ EXPECT_EQ(1, nbest.size());
+@@ -800,7 +802,7 @@ TEST_P(UnigramModelTest, EncodeTest) {
+ ModelProto::SentencePiece::USER_DEFINED);
+
+ Model model(model_proto);
+- EXPECT_TRUE(model.SetEncoderVersion(encoder_version_).ok());
++ model.SetEncoderVersion(encoder_version_);
+
+ EncodeResult result;
+
+@@ -883,7 +885,7 @@ TEST_P(UnigramModelTest, EncodeWithUnusedTest) {
+ // No unused.
+ {
+ Model model(model_proto);
+- EXPECT_TRUE(model.SetEncoderVersion(encoder_version_).ok());
++ model.SetEncoderVersion(encoder_version_);
+ const auto result = model.Encode("abcd");
+ EXPECT_EQ(1, result.size());
+ EXPECT_EQ("abcd", result[0].first);
+@@ -892,7 +894,7 @@ TEST_P(UnigramModelTest, EncodeWithUnusedTest) {
+ {
+ model_proto.mutable_pieces(3)->set_type(ModelProto::SentencePiece::UNUSED);
+ Model model(model_proto);
+- EXPECT_TRUE(model.SetEncoderVersion(encoder_version_).ok());
++ model.SetEncoderVersion(encoder_version_);
+ const auto result = model.Encode("abcd");
+ EXPECT_EQ(2, result.size());
+ EXPECT_EQ("abc", result[0].first);
+@@ -903,7 +905,7 @@ TEST_P(UnigramModelTest, EncodeWithUnusedTest) {
+ model_proto.mutable_pieces(3)->set_type(ModelProto::SentencePiece::UNUSED);
+ model_proto.mutable_pieces(5)->set_type(ModelProto::SentencePiece::UNUSED);
+ Model model(model_proto);
+- EXPECT_TRUE(model.SetEncoderVersion(encoder_version_).ok());
++ model.SetEncoderVersion(encoder_version_);
+ const auto result = model.Encode("abcd");
+ EXPECT_EQ(2, result.size());
+ EXPECT_EQ("abc", result[0].first);
+@@ -917,7 +919,7 @@ TEST_P(UnigramModelTest, EncodeWithUnusedTest) {
+ model_proto.mutable_pieces(4)->set_type(ModelProto::SentencePiece::UNUSED);
+ model_proto.mutable_pieces(5)->set_type(ModelProto::SentencePiece::NORMAL);
+ Model model(model_proto);
+- EXPECT_TRUE(model.SetEncoderVersion(encoder_version_).ok());
++ model.SetEncoderVersion(encoder_version_);
+ const auto result = model.Encode("abcd");
+ EXPECT_EQ(2, result.size());
+ EXPECT_EQ("ab", result[0].first);
+@@ -937,7 +939,7 @@ TEST_P(UnigramModelTest, VerifyOutputsEquivalent) {
+ AddPiece(&model_proto, "c", 2.0); // 9
+ AddPiece(&model_proto, "d", 1.0); // 10
+ Model model(model_proto);
+- EXPECT_TRUE(model.SetEncoderVersion(encoder_version_).ok());
++ model.SetEncoderVersion(encoder_version_);
+ // Equivalent outputs.
+ EXPECT_TRUE(model.VerifyOutputsEquivalent("", ""));
+ EXPECT_TRUE(model.VerifyOutputsEquivalent("a b", "a b"));
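Note on the theta -> inv_theta rename in the lattice tests above: the expected values treat each segmentation path as having weight exp(inv_theta * score), normalized over all paths. The following is a minimal Python sketch of that calculation, not part of the patch. The path scores are copied from the "ABC" test fixture; the names PATH_SCORES, path_distribution and entropy are hypothetical and exist only for this illustration.

```python
import math

# Path scores copied from the "ABC" lattice fixture used in the tests.
PATH_SCORES = {
    "ABC": 1.0,
    "AB C": 0.2 + 0.1,
    "A BC": 0.0 + 0.5,
    "A B C": 0.0 + 0.0 + 0.1,
}


def path_distribution(inv_theta, scores=PATH_SCORES):
    """Return p(path) = exp(inv_theta * score) / Z for every path."""
    weights = {path: math.exp(inv_theta * s) for path, s in scores.items()}
    z = sum(weights.values())
    return {path: w / z for path, w in weights.items()}


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0.0)


if __name__ == "__main__":
    # Same inv_theta grid as the tests.
    for inv_theta in (0.0, 0.01, 0.5, 0.7, 1.0):
        probs = path_distribution(inv_theta)
        print(inv_theta, round(entropy(probs), 4))
```

With inv_theta = 0 every path gets the same weight, so the distribution is uniform and the entropy is log(4). Larger values concentrate the mass on the highest-scoring path, which is consistent with reading the parameter as an inverse temperature.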
+diff --git a/src/util.h b/src/util.h
+index 285676d..fb312f1 100644
+--- a/src/util.h
++++ b/src/util.h
+@@ -60,17 +60,6 @@ uint32 GetRandomGeneratorSeed();
+ // String utilities
+ namespace string_util {
+
+-struct string_view_hash {
+- // DJB hash function.
+- inline size_t operator()(const absl::string_view &sp) const {
+- size_t hash = 5381;
+- for (size_t i = 0; i < sp.size(); ++i) {
+- hash = ((hash << 5) + hash) + sp[i];
+- }
+- return hash;
+- }
+-};
+-
+ template <typename Target>
+ inline bool lexical_cast(absl::string_view arg, Target *result) {
+ std::stringstream ss;
--- /dev/null
+From: Taku Kudo <taku@google.com>
+Date: Mon, 20 Jun 2022 01:35:11 +0900
+Subject: add verbose option
+
+Signed-off-by: Kentaro Hayashi <kenhys@gmail.com>
+---
+ .github/workflows/cmake.yml | 2 +-
+ src/common.h | 13 -------------
+ src/normalizer.cc | 7 +++----
+ src/sentencepiece_processor.cc | 10 ++++++----
+ src/sentencepiece_processor.h | 9 +++------
+ src/util.h | 1 -
+ 6 files changed, 13 insertions(+), 29 deletions(-)
+
+diff --git a/.github/workflows/cmake.yml b/.github/workflows/cmake.yml
+index 7f19083..5108074 100644
+--- a/.github/workflows/cmake.yml
++++ b/.github/workflows/cmake.yml
+@@ -45,7 +45,7 @@ jobs:
+
+ - name: Test
+ working-directory: ${{github.workspace}}/build
+- run: ctest -C Release
++ run: ctest -C Release --output-on-failure
+
+ - name: Package
+ working-directory: ${{github.workspace}}/build
+diff --git a/src/common.h b/src/common.h
+index c27c352..ba951d6 100644
+--- a/src/common.h
++++ b/src/common.h
+@@ -98,15 +98,6 @@ class Die {
+ private:
+ bool die_;
+ };
+-
+-template <typename T>
+-T &&CheckNotNull(const char *file, int line, const char *exprtext, T &&t) {
+- if (t == nullptr) {
+- std::cerr << file << "(" << line << ") " << exprtext;
+- Abort();
+- }
+- return std::forward<T>(t);
+-}
+ } // namespace error
+
+ namespace logging {
+@@ -158,10 +149,6 @@ inline const char *BaseName(const char *path) {
+ #define CHECK_LE(a, b) CHECK((a) <= (b))
+ #define CHECK_GT(a, b) CHECK((a) > (b))
+ #define CHECK_LT(a, b) CHECK((a) < (b))
+-#define CHECK_NOTNULL(val) \
+- ::sentencepiece::error::CheckNotNull( \
+- ::sentencepiece::logging::BaseName(__FILE__), __LINE__, \
+- "'" #val "' Must be non NULL", (val))
+
+ #define FRIEND_TEST(a, b) friend class a##_Test_##b;
+
+diff --git a/src/normalizer.cc b/src/normalizer.cc
+index d87f89b..2ab8084 100644
+--- a/src/normalizer.cc
++++ b/src/normalizer.cc
+@@ -12,11 +12,12 @@
+ // See the License for the specific language governing permissions and
+ // limitations under the License.!
+
++#include "normalizer.h"
++
+ #include <utility>
+ #include <vector>
+
+ #include "common.h"
+-#include "normalizer.h"
+ #include "third_party/absl/memory/memory.h"
+ #include "third_party/absl/strings/match.h"
+ #include "third_party/absl/strings/string_view.h"
+@@ -46,9 +47,7 @@ Normalizer::~Normalizer() {}
+
+ void Normalizer::Init() {
+ absl::string_view index = spec_->precompiled_charsmap();
+- if (index.empty()) {
+- LOG(INFO) << "precompiled_charsmap is empty. use identity normalization.";
+- } else {
++ if (!index.empty()) {
+ absl::string_view trie_blob, normalized;
+ #ifdef IS_BIG_ENDIAN
+ status_ = DecodePrecompiledCharsMap(index, &trie_blob, &normalized,
+diff --git a/src/sentencepiece_processor.cc b/src/sentencepiece_processor.cc
+index a6f5395..805e0f9 100644
+--- a/src/sentencepiece_processor.cc
++++ b/src/sentencepiece_processor.cc
+@@ -67,12 +67,12 @@ ImmutableSentencePieceText::ImmutableSentencePiece::ImmutableSentencePiece(
+ const SentencePieceText_SentencePiece &sp)
+ : sp_(&sp) {}
+
+-absl::string_view ImmutableSentencePieceText::ImmutableSentencePiece::piece()
++const std::string &ImmutableSentencePieceText::ImmutableSentencePiece::piece()
+ const {
+ return sp_->piece();
+ }
+
+-absl::string_view ImmutableSentencePieceText::ImmutableSentencePiece::surface()
++const std::string &ImmutableSentencePieceText::ImmutableSentencePiece::surface()
+ const {
+ return sp_->surface();
+ }
+@@ -109,8 +109,10 @@ ImmutableSentencePieceText::pieces(int index) const {
+ spt_->pieces(index));
+ }
+
+-absl::string_view ImmutableSentencePieceText::text() const {
+- return spt_ ? spt_->text() : "";
++const std::string &ImmutableSentencePieceText::text() const {
++ if (spt_) return spt_->text();
++ static std::string *kEmptyString = new std::string();
++ return *kEmptyString;
+ }
+
+ float ImmutableSentencePieceText::score() const {
+diff --git a/src/sentencepiece_processor.h b/src/sentencepiece_processor.h
+index 51c5b3b..8124c59 100644
+--- a/src/sentencepiece_processor.h
++++ b/src/sentencepiece_processor.h
+@@ -165,8 +165,8 @@ class ImmutableSentencePieceText {
+ class ImmutableSentencePiece {
+ public:
+ ~ImmutableSentencePiece() = default;
+- absl::string_view piece() const;
+- absl::string_view surface() const;
++ const std::string &piece() const;
++ const std::string &surface() const;
+ uint32_t id() const;
+ uint32_t begin() const;
+ uint32_t end() const;
+@@ -182,7 +182,7 @@ class ImmutableSentencePieceText {
+ std::vector<ImmutableSentencePiece> pieces() const;
+ size_t pieces_size() const;
+ ImmutableSentencePiece pieces(int index) const;
+- absl::string_view text() const;
++ const std::string &text() const;
+ float score() const;
+
+ std::string SerializeAsString() const;
+@@ -193,7 +193,6 @@ class ImmutableSentencePieceText {
+ SentencePieceText *mutable_proto();
+
+ friend class ImmutableNBestSentencePieceText;
+- friend class SentencePieceProcessor;
+
+ private:
+ explicit ImmutableSentencePieceText(const SentencePieceText &spt);
+@@ -222,8 +221,6 @@ class ImmutableNBestSentencePieceText {
+ // it returns the raw pointer managed by the shared_ptr.
+ NBestSentencePieceText *mutable_proto();
+
+- friend class SentencePieceProcessor;
+-
+ private:
+ std::shared_ptr<NBestSentencePieceText> rep_;
+ };
+diff --git a/src/util.h b/src/util.h
+index fb312f1..01a561f 100644
+--- a/src/util.h
++++ b/src/util.h
+@@ -94,7 +94,6 @@ inline bool lexical_cast(absl::string_view arg, std::string *result) {
+
+ template <typename T>
+ inline bool DecodePOD(absl::string_view str, T *result) {
+- CHECK_NOTNULL(result);
+ if (sizeof(*result) != str.size()) {
+ return false;
+ }
--- /dev/null
+From: Taku Kudo <taku@google.com>
+Date: Mon, 1 Aug 2022 17:19:09 +0900
+Subject: Supports ImmutableSentencePieceText from python module
+
+Signed-off-by: Kentaro Hayashi <kenhys@gmail.com>
+---
+ python/src/sentencepiece/__init__.py | 228 ++-
+ python/src/sentencepiece/sentencepiece.i | 310 ++-
+ python/src/sentencepiece/sentencepiece_wrap.cxx | 2310 ++++++++++++++++++-----
+ python/test/sentencepiece_test.py | 62 +-
+ src/sentencepiece_processor.cc | 87 +-
+ src/sentencepiece_processor.h | 61 +-
+ src/sentencepiece_processor_test.cc | 137 +-
+ 7 files changed, 2524 insertions(+), 671 deletions(-)
+
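Note on the Python side of this patch: the generated wrapper classes in python/src/sentencepiece/__init__.py gain __len__, __getitem__ and __eq__, which delegate to pieces_size(), pieces(index) and SerializeAsString(). The sketch below illustrates that delegation pattern with a pure-Python stand-in. FakeImmutableText is hypothetical, it is not the SWIG class, and its SerializeAsString() is an assumed placeholder for real protobuf serialization.

```python
class FakeImmutableText:
    """Hypothetical pure-Python stand-in for the SWIG-generated wrapper."""

    def __init__(self, text, pieces):
        self._text = text
        self._pieces = tuple(pieces)

    def text(self):
        return self._text

    def pieces_size(self):
        return len(self._pieces)

    def pieces(self, index):
        # Raises IndexError past the end, like a tuple.
        return self._pieces[index]

    def SerializeAsString(self):
        # Placeholder for protobuf serialization: any deterministic
        # encoding is enough to demonstrate equality by value.
        return repr((self._text, self._pieces)).encode("utf-8")

    # The three methods below mirror the ones added by the patch.
    def __len__(self):
        return self.pieces_size()

    def __getitem__(self, i):
        return self.pieces(i)

    def __eq__(self, other):
        return self.SerializeAsString() == other.SerializeAsString()


if __name__ == "__main__":
    a = FakeImmutableText("hello world", ["\u2581hello", "\u2581world"])
    b = FakeImmutableText("hello world", ["\u2581hello", "\u2581world"])
    print(len(a), a[0], a == b)
```

Because __getitem__ accepts integer indexes and the stand-in raises IndexError past the end, plain iteration such as list(a) also works through Python's legacy sequence protocol. Whether the real wrapper raises IndexError in the same way is not shown in this patch and is assumed here.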
+diff --git a/python/src/sentencepiece/__init__.py b/python/src/sentencepiece/__init__.py
+index 1543d32..69a9825 100644
+--- a/python/src/sentencepiece/__init__.py
++++ b/python/src/sentencepiece/__init__.py
+@@ -61,6 +61,98 @@ class _SwigNonDynamicMeta(type):
+ __setattr__ = _swig_setattr_nondynamic_class_variable(type.__setattr__)
+
+
++class ImmutableSentencePieceText_ImmutableSentencePiece(object):
++ thisown = property(lambda x: x.this.own(), lambda x, v: x.this.own(v), doc="The membership flag")
++ __repr__ = _swig_repr
++
++ def __init__(self):
++ _sentencepiece.ImmutableSentencePieceText_ImmutableSentencePiece_swiginit(self, _sentencepiece.new_ImmutableSentencePieceText_ImmutableSentencePiece())
++ __swig_destroy__ = _sentencepiece.delete_ImmutableSentencePieceText_ImmutableSentencePiece
++
++ def piece(self):
++ return _sentencepiece.ImmutableSentencePieceText_ImmutableSentencePiece_piece(self)
++
++ def surface(self):
++ return _sentencepiece.ImmutableSentencePieceText_ImmutableSentencePiece_surface(self)
++
++ def id(self):
++ return _sentencepiece.ImmutableSentencePieceText_ImmutableSentencePiece_id(self)
++
++ def begin(self):
++ return _sentencepiece.ImmutableSentencePieceText_ImmutableSentencePiece_begin(self)
++
++ def end(self):
++ return _sentencepiece.ImmutableSentencePieceText_ImmutableSentencePiece_end(self)
++
++# Register ImmutableSentencePieceText_ImmutableSentencePiece in _sentencepiece:
++_sentencepiece.ImmutableSentencePieceText_ImmutableSentencePiece_swigregister(ImmutableSentencePieceText_ImmutableSentencePiece)
++
++class ImmutableSentencePieceText(object):
++ thisown = property(lambda x: x.this.own(), lambda x, v: x.this.own(v), doc="The membership flag")
++ __repr__ = _swig_repr
++
++ def __init__(self):
++ _sentencepiece.ImmutableSentencePieceText_swiginit(self, _sentencepiece.new_ImmutableSentencePieceText())
++ __swig_destroy__ = _sentencepiece.delete_ImmutableSentencePieceText
++
++ def pieces_size(self):
++ return _sentencepiece.ImmutableSentencePieceText_pieces_size(self)
++
++ def text(self):
++ return _sentencepiece.ImmutableSentencePieceText_text(self)
++
++ def score(self):
++ return _sentencepiece.ImmutableSentencePieceText_score(self)
++
++ def SerializeAsString(self):
++ return _sentencepiece.ImmutableSentencePieceText_SerializeAsString(self)
++
++ def pieces(self, index):
++ return _sentencepiece.ImmutableSentencePieceText_pieces(self, index)
++
++ def __len__(self):
++ return self.pieces_size()
++
++ def __getitem__(self, i):
++ return self.pieces(i)
++
++ def __eq__(self, other):
++ return self.SerializeAsString() == other.SerializeAsString()
++
++
++# Register ImmutableSentencePieceText in _sentencepiece:
++_sentencepiece.ImmutableSentencePieceText_swigregister(ImmutableSentencePieceText)
++
++class ImmutableNBestSentencePieceText(object):
++ thisown = property(lambda x: x.this.own(), lambda x, v: x.this.own(v), doc="The membership flag")
++ __repr__ = _swig_repr
++
++ def __init__(self):
++ _sentencepiece.ImmutableNBestSentencePieceText_swiginit(self, _sentencepiece.new_ImmutableNBestSentencePieceText())
++ __swig_destroy__ = _sentencepiece.delete_ImmutableNBestSentencePieceText
++
++ def nbests_size(self):
++ return _sentencepiece.ImmutableNBestSentencePieceText_nbests_size(self)
++
++ def SerializeAsString(self):
++ return _sentencepiece.ImmutableNBestSentencePieceText_SerializeAsString(self)
++
++ def nbests(self, index):
++ return _sentencepiece.ImmutableNBestSentencePieceText_nbests(self, index)
++
++ def __len__(self):
++ return self.nbests_size()
++
++ def __getitem__(self, i):
++ return self.nbests(i)
++
++ def __eq__(self, other):
++ return self.SerializeAsString() == other.SerializeAsString()
++
++
++# Register ImmutableNBestSentencePieceText in _sentencepiece:
++_sentencepiece.ImmutableNBestSentencePieceText_swigregister(ImmutableNBestSentencePieceText)
++
+ class SentencePieceProcessor(object):
+ thisown = property(lambda x: x.this.own(), lambda x, v: x.this.own(v), doc="The membership flag")
+ __repr__ = _swig_repr
+@@ -87,12 +179,6 @@ class SentencePieceProcessor(object):
+ def LoadVocabulary(self, filename, threshold):
+ return _sentencepiece.SentencePieceProcessor_LoadVocabulary(self, filename, threshold)
+
+- def SampleEncodeAndScoreAsPieces(self, input, num_samples, theta, wor, include_best):
+- return _sentencepiece.SentencePieceProcessor_SampleEncodeAndScoreAsPieces(self, input, num_samples, theta, wor, include_best)
+-
+- def SampleEncodeAndScoreAsIds(self, input, num_samples, theta, wor, include_best):
+- return _sentencepiece.SentencePieceProcessor_SampleEncodeAndScoreAsIds(self, input, num_samples, theta, wor, include_best)
+-
+ def CalculateEntropy(self, *args):
+ return _sentencepiece.SentencePieceProcessor_CalculateEntropy(self, *args)
+
+@@ -147,6 +233,9 @@ class SentencePieceProcessor(object):
+ def _EncodeAsSerializedProto(self, text, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece):
+ return _sentencepiece.SentencePieceProcessor__EncodeAsSerializedProto(self, text, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece)
+
++ def _EncodeAsImmutableProto(self, text, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece):
++ return _sentencepiece.SentencePieceProcessor__EncodeAsImmutableProto(self, text, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece)
++
+ def _EncodeAsIdsBatch(self, ins, num_threads, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece):
+ return _sentencepiece.SentencePieceProcessor__EncodeAsIdsBatch(self, ins, num_threads, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece)
+
+@@ -156,6 +245,9 @@ class SentencePieceProcessor(object):
+ def _EncodeAsSerializedProtoBatch(self, ins, num_threads, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece):
+ return _sentencepiece.SentencePieceProcessor__EncodeAsSerializedProtoBatch(self, ins, num_threads, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece)
+
++ def _EncodeAsImmutableProtoBatch(self, ins, num_threads, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece):
++ return _sentencepiece.SentencePieceProcessor__EncodeAsImmutableProtoBatch(self, ins, num_threads, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece)
++
+ def _DecodeIds(self, ids):
+ return _sentencepiece.SentencePieceProcessor__DecodeIds(self, ids)
+
+@@ -168,6 +260,12 @@ class SentencePieceProcessor(object):
+ def _DecodePiecesAsSerializedProto(self, pieces):
+ return _sentencepiece.SentencePieceProcessor__DecodePiecesAsSerializedProto(self, pieces)
+
++ def _DecodeIdsAsImmutableProto(self, ids):
++ return _sentencepiece.SentencePieceProcessor__DecodeIdsAsImmutableProto(self, ids)
++
++ def _DecodePiecesAsImmutableProto(self, pieces):
++ return _sentencepiece.SentencePieceProcessor__DecodePiecesAsImmutableProto(self, pieces)
++
+ def _DecodeIdsBatch(self, ins, num_threads):
+ return _sentencepiece.SentencePieceProcessor__DecodeIdsBatch(self, ins, num_threads)
+
+@@ -180,6 +278,9 @@ class SentencePieceProcessor(object):
+ def _DecodePiecesAsSerializedProtoBatch(self, ins, num_threads):
+ return _sentencepiece.SentencePieceProcessor__DecodePiecesAsSerializedProtoBatch(self, ins, num_threads)
+
++ def _DecodePiecesAsImmutableProtoBatch(self, ins, num_threads):
++ return _sentencepiece.SentencePieceProcessor__DecodePiecesAsImmutableProtoBatch(self, ins, num_threads)
++
+ def _NBestEncodeAsIds(self, text, nbest_size, add_bos, add_eos, reverse, emit_unk_piece):
+ return _sentencepiece.SentencePieceProcessor__NBestEncodeAsIds(self, text, nbest_size, add_bos, add_eos, reverse, emit_unk_piece)
+
+@@ -189,17 +290,26 @@ class SentencePieceProcessor(object):
+ def _NBestEncodeAsSerializedProto(self, text, nbest_size, add_bos, add_eos, reverse, emit_unk_piece):
+ return _sentencepiece.SentencePieceProcessor__NBestEncodeAsSerializedProto(self, text, nbest_size, add_bos, add_eos, reverse, emit_unk_piece)
+
+- def _SampleEncodeAndScoreAsIds(self, text, num_samples, theta, wor, include_best, add_bos, add_eos, reverse, emit_unk_piece):
+- return _sentencepiece.SentencePieceProcessor__SampleEncodeAndScoreAsIds(self, text, num_samples, theta, wor, include_best, add_bos, add_eos, reverse, emit_unk_piece)
++ def _NBestEncodeAsImmutableProto(self, text, nbest_size, add_bos, add_eos, reverse, emit_unk_piece):
++ return _sentencepiece.SentencePieceProcessor__NBestEncodeAsImmutableProto(self, text, nbest_size, add_bos, add_eos, reverse, emit_unk_piece)
++
++ def _SampleEncodeAndScoreAsIds(self, text, num_samples, alpha, wor, include_best, add_bos, add_eos, reverse, emit_unk_piece):
++ return _sentencepiece.SentencePieceProcessor__SampleEncodeAndScoreAsIds(self, text, num_samples, alpha, wor, include_best, add_bos, add_eos, reverse, emit_unk_piece)
+
+- def _SampleEncodeAndScoreAsPieces(self, text, num_samples, theta, wor, include_best, add_bos, add_eos, reverse, emit_unk_piece):
+- return _sentencepiece.SentencePieceProcessor__SampleEncodeAndScoreAsPieces(self, text, num_samples, theta, wor, include_best, add_bos, add_eos, reverse, emit_unk_piece)
++ def _SampleEncodeAndScoreAsPieces(self, text, num_samples, alpha, wor, include_best, add_bos, add_eos, reverse, emit_unk_piece):
++ return _sentencepiece.SentencePieceProcessor__SampleEncodeAndScoreAsPieces(self, text, num_samples, alpha, wor, include_best, add_bos, add_eos, reverse, emit_unk_piece)
+
+- def _CalculateEntropy(self, text, theta):
+- return _sentencepiece.SentencePieceProcessor__CalculateEntropy(self, text, theta)
++ def _SampleEncodeAndScoreAsSerializedProto(self, text, num_samples, alpha, wor, include_best, add_bos, add_eos, reverse, emit_unk_piece):
++ return _sentencepiece.SentencePieceProcessor__SampleEncodeAndScoreAsSerializedProto(self, text, num_samples, alpha, wor, include_best, add_bos, add_eos, reverse, emit_unk_piece)
+
+- def _CalculateEntropyBatch(self, ins, theta, num_threads):
+- return _sentencepiece.SentencePieceProcessor__CalculateEntropyBatch(self, ins, theta, num_threads)
++ def _SampleEncodeAndScoreAsImmutableProto(self, text, num_samples, alpha, wor, include_best, add_bos, add_eos, reverse, emit_unk_piece):
++ return _sentencepiece.SentencePieceProcessor__SampleEncodeAndScoreAsImmutableProto(self, text, num_samples, alpha, wor, include_best, add_bos, add_eos, reverse, emit_unk_piece)
++
++ def _CalculateEntropy(self, text, alpha):
++ return _sentencepiece.SentencePieceProcessor__CalculateEntropy(self, text, alpha)
++
++ def _CalculateEntropyBatch(self, ins, alpha, num_threads):
++ return _sentencepiece.SentencePieceProcessor__CalculateEntropyBatch(self, ins, alpha, num_threads)
+
+ def Init(self,
+ model_file=None,
+@@ -319,9 +429,12 @@ class SentencePieceProcessor(object):
+ if out_type is str:
+ return self._EncodeAsPiecesBatch(input, num_threads, enable_sampling, nbest_size,
+ alpha, add_bos, add_eos, reverse, emit_unk_piece)
+- if out_type == 'proto':
++ if out_type == 'serialized_proto' or out_type == 'proto':
+ return self._EncodeAsSerializedProtoBatch(input, num_threads, enable_sampling, nbest_size,
+ alpha, add_bos, add_eos, reverse, emit_unk_piece)
++ if out_type == 'immutable_proto':
++ return self._EncodeAsImmutableProtoBatch(input, num_threads, enable_sampling, nbest_size,
++ alpha, add_bos, add_eos, reverse, emit_unk_piece)
+
+ if out_type is int:
+ return self._EncodeAsIds(input, enable_sampling, nbest_size,
+@@ -329,9 +442,12 @@ class SentencePieceProcessor(object):
+ if out_type is str:
+ return self._EncodeAsPieces(input, enable_sampling, nbest_size,
+ alpha, add_bos, add_eos, reverse, emit_unk_piece)
+- if out_type == 'proto':
++ if out_type == 'serialized_proto' or out_type == 'proto':
+ return self._EncodeAsSerializedProto(input, enable_sampling, nbest_size,
+ alpha, add_bos, add_eos, reverse, emit_unk_piece)
++ if out_type == 'immutable_proto':
++ return self._EncodeAsImmutableProto(input, enable_sampling, nbest_size,
++ alpha, add_bos, add_eos, reverse, emit_unk_piece)
+
+ raise RuntimeError('unknown out_type={}'.format(out_type))
+ return None
+@@ -346,7 +462,11 @@ class SentencePieceProcessor(object):
+
+
+ def EncodeAsSerializedProto(self, input, **kwargs):
+- return self.Encode(input=input, out_type='proto', **kwargs)
++ return self.Encode(input=input, out_type='serialized_proto', **kwargs)
++
++
++ def EncodeAsImmutableProto(self, input, **kwargs):
++ return self.Encode(input=input, out_type='immutable_proto', **kwargs)
+
+
+ def SampleEncodeAsPieces(self, input, nbest_size=None, alpha=None, **kwargs):
+@@ -361,7 +481,12 @@ class SentencePieceProcessor(object):
+
+ def SampleEncodeAsSerializedProto(self, input, nbest_size=None, alpha=None, **kwargs):
+ return self.Encode(input=input, nbest_size=nbest_size, alpha=alpha,
+- out_type='proto', enable_sampling=True, **kwargs)
++ out_type='serialized_proto', enable_sampling=True, **kwargs)
++
++
++ def SampleEncodeAsImmutableProto(self, input, nbest_size=None, alpha=None, **kwargs):
++ return self.Encode(input=input, nbest_size=nbest_size, alpha=alpha,
++ out_type='immutable_proto', enable_sampling=True, **kwargs)
+
+
+ def NBestEncode(self,
+@@ -407,9 +532,12 @@ class SentencePieceProcessor(object):
+ if out_type is str:
+ return self._NBestEncodeAsPieces(text, nbest_size,
+ add_bos, add_eos, reverse, emit_unk_piece)
+- if out_type == 'proto':
++ if out_type == 'serialized_proto' or out_type == 'proto':
+ return self._NBestEncodeAsSerializedProto(text, nbest_size,
+ add_bos, add_eos, reverse, emit_unk_piece)
++ if out_type == 'immutable_proto':
++ return self._NBestEncodeAsImmutableProto(text, nbest_size,
++ add_bos, add_eos, reverse, emit_unk_piece)
+
+ if type(input) is list:
+ return [_encode(n) for n in input]
+@@ -429,7 +557,12 @@ class SentencePieceProcessor(object):
+
+ def NBestEncodeAsSerializedProto(self, input, nbest_size=None, **kwargs):
+ return self.NBestEncode(input=input, nbest_size=nbest_size,
+- out_type='proto', **kwargs)
++ out_type='serialized_proto', **kwargs)
++
++
++ def NBestEncodeAsImmutableProto(self, input, nbest_size=None, **kwargs):
++ return self.NBestEncode(input=input, nbest_size=nbest_size,
++ out_type='immutable_proto', **kwargs)
+
+
+ def SampleEncodeAndScore(self,
+@@ -440,20 +573,20 @@ class SentencePieceProcessor(object):
+ reverse=None,
+ emit_unk_piece=None,
+ num_samples=None,
+- theta=None,
++ alpha=None,
+ wor=None,
+ include_best=None):
+ """SampleEncodeAndScore text input to segmented ids or tokens.
+
+ Args:
+ input: input string. accepsts list of string.
+- out_type: output type. int or str or 'proto'.
++ out_type: output type. int or str or 'serialized_proto' or 'immutable_proto'
+ add_bos: Add <s> to the result (Default = false)
+ add_eos: Add </s> to the result (Default = false) <s>/</s> is added after reversing (if enabled).
+ reverse: Reverses the tokenized sequence (Default = false)
+ emit_unk_piece: Emits the unk literal string (Default = false)
+ num_samples: How many samples to return (Default = 1)
+- theta: inverse temperature for sampling
++ alpha: inverse temperature for sampling
+ wor: whether to sample without replacement (Default = false)
+ include_best: whether to include the best tokenization, requires wor=True (Default = false)
+ """
+@@ -470,8 +603,8 @@ class SentencePieceProcessor(object):
+ emit_unk_piece = self._emit_unk_piece
+ if num_samples is None:
+ num_samples = 1
+- if theta is None:
+- theta = 1.
++ if alpha is None:
++ alpha = 1.
+ if wor is None:
+ wor = False
+ if include_best is None:
+@@ -486,10 +619,10 @@ class SentencePieceProcessor(object):
+
+ def _encode(text):
+ if out_type is int:
+- return self._SampleEncodeAndScoreAsIds(text, num_samples, theta, wor, include_best,
++ return self._SampleEncodeAndScoreAsIds(text, num_samples, alpha, wor, include_best,
+ add_bos, add_eos, reverse, emit_unk_piece)
+ else:
+- return self._SampleEncodeAndScoreAsPieces(text, num_samples, theta, wor, include_best,
++ return self._SampleEncodeAndScoreAsPieces(text, num_samples, alpha, wor, include_best,
+ add_bos, add_eos, reverse, emit_unk_piece)
+
+ if type(input) is list:
+@@ -502,7 +635,7 @@ class SentencePieceProcessor(object):
+ """Decode processed id or token sequences.
+
+ Args:
+- out_type: output type. str or 'proto' (Default = str)
++ out_type: output type. str or 'serialized_proto' or 'immutable_proto' (Default = str)
+ num_threads: the number of threads used in the batch processin (Default = 1).
+ """
+
+@@ -533,7 +666,7 @@ class SentencePieceProcessor(object):
+ if type(input[0][0]) is str:
+ return self._DecodePiecesBatch(input, num_threads)
+
+- if out_type == 'proto':
++ if out_type == 'serialized_proto':
+ if type(input) is int:
+ return self._DecodeIdsAsSerializedProto([input])
+ if type(input) is str:
+@@ -552,6 +685,25 @@ class SentencePieceProcessor(object):
+ return self._DecodePiecesAsSerializedProtoBatch(input, num_threads)
+
+
++ if out_type == 'immutable_proto':
++ if type(input) is int:
++ return self._DecodeIdsAsImmutableProto([input])
++ if type(input) is str:
++ return self._DecodePiecesAsImmutableProto([input])
++
++ if type(input) is list:
++ if len(input) == 0 or type(input[0]) is int:
++ return self._DecodeIdsAsImmutableProto(input)
++ if type(input[0]) is str:
++ return self._DecodePiecesAsImmutableProto(input)
++
++ if type(input[0]) is list:
++ if len(input[0]) == 0 or type(input[0][0]) is int:
++ return self._DecodeIdsAsImmutableProtoBatch(input, num_threads)
++ if type(input[0][0]) is str:
++ return self._DecodePiecesAsImmutableProtoBatch(input, num_threads)
++
++
+ raise RuntimeError('unknown output or input type')
+ return None
+
+@@ -564,24 +716,32 @@ class SentencePieceProcessor(object):
+ return self.Decode(input=input, out_type=out_type, **kwargs)
+
+
+- def DecodePiecesAsSerializedProto(self, input, out_type='proto', **kwargs):
++ def DecodePiecesAsSerializedProto(self, input, out_type='serialized_proto', **kwargs):
++ return self.Decode(input=input, out_type=out_type, **kwargs)
++
++
++ def DecodeIdsAsSerializedProto(self, input, out_type='serialized_proto', **kwargs):
++ return self.Decode(input=input, out_type=out_type, **kwargs)
++
++
++ def DecodePiecesAsImmutableProto(self, input, out_type='immutable_proto', **kwargs):
+ return self.Decode(input=input, out_type=out_type, **kwargs)
+
+
+- def DecodeIdsAsSerializedProto(self, input, out_type='proto', **kwargs):
++ def DecodeIdsAsImmutableProto(self, input, out_type='immutable_proto', **kwargs):
+ return self.Decode(input=input, out_type=out_type, **kwargs)
+
+
+- def CalculateEntropy(self, input, theta, num_threads=None):
++ def CalculateEntropy(self, input, alpha, num_threads=None):
+ """Calculate sentence entropy"""
+ if type(input) is list:
+ if num_threads is None:
+ num_threads = self._num_threads
+ if num_threads is None or type(num_threads) is not int:
+ raise RuntimeError('num_threads must be int')
+- return self._CalculateEntropyBatch(input, theta, num_threads)
++ return self._CalculateEntropyBatch(input, alpha, num_threads)
+
+- return self._CalculateEntropy(input, theta)
++ return self._CalculateEntropy(input, alpha)
+
+
+ def piece_size(self):
+diff --git a/python/src/sentencepiece/sentencepiece.i b/python/src/sentencepiece/sentencepiece.i
+index 40373ce..1e2e1e0 100644
+--- a/python/src/sentencepiece/sentencepiece.i
++++ b/python/src/sentencepiece/sentencepiece.i
+@@ -166,7 +166,17 @@ inline void RewriteIds(const sentencepiece::SentencePieceProcessor &sp,
+ if (add_bos || add_eos || reverse || emit_unk_piece) {
+ throw sentencepiece::util::Status(
+ sentencepiece::util::StatusCode::kUnimplemented,
+- "add_bos, add_eos, reverse, and emit_unk_piece is not supported in AsSerialize API");
++ "add_bos, add_eos, reverse, and emit_unk_piece are not supported in the proto API");
++ }
++}
++
++inline void RewriteIds(const sentencepiece::SentencePieceProcessor &sp,
++ sentencepiece::ImmutableSentencePieceText *proto,
++ bool add_bos, bool add_eos, bool reverse, bool emit_unk_piece) {
++ if (add_bos || add_eos || reverse || emit_unk_piece) {
++ throw sentencepiece::util::Status(
++ sentencepiece::util::StatusCode::kUnimplemented,
++ "add_bos, add_eos, reverse, and emit_unk_piece are not supported in the proto API");
+ }
+ }
+
+@@ -216,7 +226,7 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+
+ #define DEFINE_ENCODE_BATCH_FUNC_IMPL(FuncName, InType, OutType) \
+ std::vector<OutType> outs(ins.size()); \
+- InitNumThreads(ins, &num_threads); \
++ InitNumThreads(ins, &num_threads); \
+ { \
+ ThreadPool pool(ins.size()); \
+ for (int n = 0; n < num_threads; ++n) { \
+@@ -237,7 +247,7 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+
+ #define DEFINE_DECODE_BATCH_FUNC_IMPL(FuncName, InType, OutType) \
+ std::vector<OutType> outs(ins.size()); \
+- InitNumThreads(ins, &num_threads); \
++ InitNumThreads(ins, &num_threads); \
+ { \
+ ThreadPool pool(ins.size()); \
+ for (int n = 0; n < num_threads; ++n) { \
+@@ -264,6 +274,8 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ }
+ }
+
++%apply unsigned int { uint32_t }
++
+ %ignore sentencepiece::util::Status;
+ %ignore sentencepiece::util::StatusCode;
+ %ignore absl::string_view;
+@@ -272,32 +284,48 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ %ignore sentencepiece::NormalizerSpec;
+ %ignore sentencepiece::TrainerSpec;
+ %ignore sentencepiece::SentencePieceProcessor::status;
++%ignore sentencepiece::ImmutableSentencePieceText::mutable_proto;
++%ignore sentencepiece::ImmutableSentencePieceText::pieces() const;
++%ignore sentencepiece::ImmutableNBestSentencePieceText::mutable_proto;
++%ignore sentencepiece::ImmutableNBestSentencePieceText::nbests() const;
+
+ %ignore sentencepiece::SentencePieceProcessor::Encode;
++%ignore sentencepiece::SentencePieceProcessor::SampleEncode;
++%ignore sentencepiece::SentencePieceProcessor::NBestEncode;
++%ignore sentencepiece::SentencePieceProcessor::SampleEncodeAndScore;
++%ignore sentencepiece::SentencePieceProcessor::Decode;
++
+ %ignore sentencepiece::SentencePieceProcessor::EncodeAsPieces;
+ %ignore sentencepiece::SentencePieceProcessor::EncodeAsIds;
+-%ignore sentencepiece::SentencePieceProcessor::EncodeAsSerializedProto;
+-%ignore sentencepiece::SentencePieceProcessor::SampleEncode;
+ %ignore sentencepiece::SentencePieceProcessor::SampleEncodeAsIds;
+ %ignore sentencepiece::SentencePieceProcessor::SampleEncodeAsPieces;
+-%ignore sentencepiece::SentencePieceProcessor::SampleEncodeAsSerializedProto;
+-%ignore sentencepiece::SentencePieceProcessor::NBestEncode;
+-%ignore sentencepiece::SentencePieceProcessor::NBestEncodeAsPieces;
+ %ignore sentencepiece::SentencePieceProcessor::NBestEncodeAsIds;
+-%ignore sentencepiece::SentencePieceProcessor::NBestEncodeAsSerializedProto;
+-%ignore sentencepiece::SentencePieceProcessor::SampleEncodeAndScore;
+-
+-%ignore sentencepiece::SentencePieceProcessor::Decode;
++%ignore sentencepiece::SentencePieceProcessor::NBestEncodeAsPieces;
++%ignore sentencepiece::SentencePieceProcessor::SampleEncodeAndScoreAsIds;
++%ignore sentencepiece::SentencePieceProcessor::SampleEncodeAndScoreAsPieces;
+ %ignore sentencepiece::SentencePieceProcessor::DecodeIds;
+ %ignore sentencepiece::SentencePieceProcessor::DecodePieces;
++
++%ignore sentencepiece::SentencePieceProcessor::EncodeAsSerializedProto;
++%ignore sentencepiece::SentencePieceProcessor::SampleEncodeAsSerializedProto;
++%ignore sentencepiece::SentencePieceProcessor::NBestEncodeAsSerializedProto;
++%ignore sentencepiece::SentencePieceProcessor::SampleEncodeAndScoreAsSerializedProto;
+ %ignore sentencepiece::SentencePieceProcessor::DecodePiecesAsSerializedProto;
+ %ignore sentencepiece::SentencePieceProcessor::DecodeIdsAsSerializedProto;
+
++%ignore sentencepiece::SentencePieceProcessor::EncodeAsImmutableProto;
++%ignore sentencepiece::SentencePieceProcessor::SampleEncodeAsImmutableProto;
++%ignore sentencepiece::SentencePieceProcessor::NBestEncodeAsImmutableProto;
++%ignore sentencepiece::SentencePieceProcessor::SampleEncodeAndScoreAsImmutableProto;
++%ignore sentencepiece::SentencePieceProcessor::DecodePiecesAsImmutableProto;
++%ignore sentencepiece::SentencePieceProcessor::DecodeIdsAsImmutableProto;
++
+ %ignore sentencepiece::SentencePieceProcessor::model_proto;
+ %ignore sentencepiece::SentencePieceProcessor::Load;
+ %ignore sentencepiece::SentencePieceProcessor::LoadOrDie;
+ %ignore sentencepiece::pretokenizer::PretokenizerForTrainingInterface;
+ %ignore sentencepiece::SentenceIterator;
++%ignore sentencepiece::ConvertToUnicodeSpans;
+ %ignore sentencepiece::SentencePieceTrainer::Train;
+ %ignore sentencepiece::SentencePieceTrainer::GetNormalizerSpec;
+ %ignore sentencepiece::SentencePieceTrainer::PopulateNormalizerSpec;
+@@ -351,6 +379,19 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ return proto;
+ }
+
++ sentencepiece::ImmutableSentencePieceText
++ _EncodeAsImmutableProto(absl::string_view text,
++ bool enable_sampling,
++ int nbest_size, float alpha,
++ bool add_bos, bool add_eos, bool reverse,
++ bool emit_unk_piece) const {
++ auto proto = enable_sampling ?
++ $self->SampleEncodeAsImmutableProto(text, nbest_size, alpha) :
++ $self->EncodeAsImmutableProto(text);
++ RewriteIds(*$self, &proto, add_bos, add_eos, reverse, emit_unk_piece);
++ return proto;
++ }
++
+ /////////////////////////////////////////////////////////////////////////////
+ // EncodeAs* (Batch request)
+ std::vector<std::vector<int>> _EncodeAsIdsBatch(
+@@ -381,6 +422,17 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ sentencepiece::util::bytes);
+ }
+
++ std::vector<sentencepiece::ImmutableSentencePieceText>
++ _EncodeAsImmutableProtoBatch(
++ const std::vector<absl::string_view> &ins, int num_threads,
++ bool enable_sampling, int nbest_size, float alpha,
++ bool add_bos, bool add_eos, bool reverse,
++ bool emit_unk_piece) const {
++ DEFINE_ENCODE_BATCH_FUNC_IMPL(EncodeAsImmutableProto,
++ absl::string_view,
++ sentencepiece::ImmutableSentencePieceText);
++ }
++
+ /////////////////////////////////////////////////////////////////////////////
+ // DecodeAs* (Single request)
+ std::string _DecodeIds(const std::vector<int> &ids) const {
+@@ -404,6 +456,18 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ return $self->DecodePiecesAsSerializedProto(pieces);
+ }
+
++ sentencepiece::ImmutableSentencePieceText _DecodeIdsAsImmutableProto(
++ const std::vector<int> &ids) const {
++ CheckIds(ids, $self->GetPieceSize());
++ return $self->DecodeIdsAsImmutableProto(ids);
++ }
++
++ sentencepiece::ImmutableSentencePieceText _DecodePiecesAsImmutableProto(
++ const std::vector<absl::string_view> &pieces) const {
++ CheckIds(pieces, $self->GetPieceSize());
++ return $self->DecodePiecesAsImmutableProto(pieces);
++ }
++
+ /////////////////////////////////////////////////////////////////////////////
+ // DecodeAs* (Batch request)
+ std::vector<std::string> _DecodeIdsBatch(
+@@ -428,6 +492,13 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ sentencepiece::util::bytes);
+ }
+
++ std::vector<sentencepiece::ImmutableSentencePieceText>
++ _DecodePiecesAsImmutableProtoBatch(
++ const std::vector<std::vector<absl::string_view>> &ins, int num_threads) const {
++ DEFINE_DECODE_BATCH_FUNC_IMPL(DecodePiecesAsImmutableProto, std::string,
++ sentencepiece::ImmutableSentencePieceText);
++ }
++
+ ////////////////////////////////////////////////////////////////////////////
+ // NBestEncodeAs* (Single request)
+ std::vector<std::vector<int>>
+@@ -454,25 +525,37 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ return piecess;
+ }
+
+- sentencepiece::util::bytes _NBestEncodeAsSerializedProto(absl::string_view text,
+- int nbest_size,
+- bool add_bos, bool add_eos, bool reverse,
+- bool emit_unk_piece) const {
++ sentencepiece::util::bytes
++ _NBestEncodeAsSerializedProto(absl::string_view text,
++ int nbest_size,
++ bool add_bos, bool add_eos, bool reverse,
++ bool emit_unk_piece) const {
+ RewriteIds(*$self, static_cast<sentencepiece::util::bytes *>(nullptr),
+ add_bos, add_eos, reverse, emit_unk_piece);
+ return $self->NBestEncodeAsSerializedProto(text, nbest_size);
+ }
+
++ sentencepiece::ImmutableNBestSentencePieceText
++ _NBestEncodeAsImmutableProto(absl::string_view text,
++ int nbest_size,
++ bool add_bos, bool add_eos, bool reverse,
++ bool emit_unk_piece) const {
++ RewriteIds(*$self, static_cast<sentencepiece::ImmutableSentencePieceText *>(nullptr),
++ add_bos, add_eos, reverse, emit_unk_piece);
++ return $self->NBestEncodeAsImmutableProto(text, nbest_size);
++ }
++
++
+ /////////////////////////////////////////////////////////////////////////////
+ // SampleEncodeAndScoreAs* (Single request)
+ std::vector<std::pair<std::vector<int>, float>>
+ _SampleEncodeAndScoreAsIds(absl::string_view text,
+- int num_samples, float theta, bool wor,
++ int num_samples, float alpha, bool wor,
+ bool include_best,
+ bool add_bos, bool add_eos, bool reverse,
+ bool emit_unk_piece) const {
+ auto idss = $self->SampleEncodeAndScoreAsIds(text, num_samples,
+- theta, wor, include_best);
++ alpha, wor, include_best);
+ for (auto &ids : idss) {
+ RewriteIds(*$self, &ids.first, add_bos, add_eos, reverse, emit_unk_piece);
+ }
+@@ -481,25 +564,50 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+
+ std::vector<std::pair<std::vector<std::string>, float>>
+ _SampleEncodeAndScoreAsPieces(absl::string_view text,
+- int num_samples, float theta, bool wor,
++ int num_samples, float alpha, bool wor,
+ bool include_best,
+ bool add_bos, bool add_eos, bool reverse,
+ bool emit_unk_piece) const {
+ auto piecess = $self->SampleEncodeAndScoreAsPieces(text, num_samples,
+- theta, wor, include_best);
++ alpha, wor, include_best);
+ for (auto &pieces : piecess) {
+ RewriteIds(*$self, &pieces.first, add_bos, add_eos, reverse, emit_unk_piece);
+ }
+ return piecess;
+ }
+
++ sentencepiece::util::bytes
++ _SampleEncodeAndScoreAsSerializedProto(absl::string_view text,
++ int num_samples, float alpha, bool wor,
++ bool include_best,
++ bool add_bos, bool add_eos, bool reverse,
++ bool emit_unk_piece) const {
++ RewriteIds(*$self, static_cast<sentencepiece::util::bytes *>(nullptr),
++ add_bos, add_eos, reverse, emit_unk_piece);
++ return $self->SampleEncodeAndScoreAsSerializedProto(text, num_samples,
++ alpha, wor, include_best);
++ }
++
++ sentencepiece::ImmutableNBestSentencePieceText
++ _SampleEncodeAndScoreAsImmutableProto(absl::string_view text,
++ int num_samples, float alpha, bool wor,
++ bool include_best,
++ bool add_bos, bool add_eos, bool reverse,
++ bool emit_unk_piece) const {
++ RewriteIds(*$self, static_cast<sentencepiece::util::bytes *>(nullptr),
++ add_bos, add_eos, reverse, emit_unk_piece);
++ return $self->SampleEncodeAndScoreAsImmutableProto(text, num_samples,
++ alpha, wor, include_best);
++ }
++
++
+ // Calculate Entropy
+- float _CalculateEntropy(absl::string_view text, float theta) {
+- return $self->CalculateEntropy(text, theta);
++ float _CalculateEntropy(absl::string_view text, float alpha) {
++ return $self->CalculateEntropy(text, alpha);
+ }
+
+ std::vector<float> _CalculateEntropyBatch(const std::vector<absl::string_view> &ins,
+- float theta, int num_threads) {
++ float alpha, int num_threads) {
+ std::vector<float> outs(ins.size());
+ InitNumThreads(ins, &num_threads);
+ {
+@@ -507,7 +615,7 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ for (int n = 0; n < num_threads; ++n) {
+ pool.Schedule([&, n]() {
+ for (size_t i = n; i < ins.size(); i += num_threads) {
+- outs[i] = self->CalculateEntropy(ins[i], theta);
++ outs[i] = self->CalculateEntropy(ins[i], alpha);
+ }
+ });
+ }
+@@ -634,9 +742,12 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ if out_type is str:
+ return self._EncodeAsPiecesBatch(input, num_threads, enable_sampling, nbest_size,
+ alpha, add_bos, add_eos, reverse, emit_unk_piece)
+- if out_type == 'proto':
++ if out_type == 'serialized_proto' or out_type == 'proto':
+ return self._EncodeAsSerializedProtoBatch(input, num_threads, enable_sampling, nbest_size,
+ alpha, add_bos, add_eos, reverse, emit_unk_piece)
++ if out_type == 'immutable_proto':
++ return self._EncodeAsImmutableProtoBatch(input, num_threads, enable_sampling, nbest_size,
++ alpha, add_bos, add_eos, reverse, emit_unk_piece)
+
+ if out_type is int:
+ return self._EncodeAsIds(input, enable_sampling, nbest_size,
+@@ -644,9 +755,12 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ if out_type is str:
+ return self._EncodeAsPieces(input, enable_sampling, nbest_size,
+ alpha, add_bos, add_eos, reverse, emit_unk_piece)
+- if out_type == 'proto':
++ if out_type == 'serialized_proto' or out_type == 'proto':
+ return self._EncodeAsSerializedProto(input, enable_sampling, nbest_size,
+ alpha, add_bos, add_eos, reverse, emit_unk_piece)
++ if out_type == 'immutable_proto':
++ return self._EncodeAsImmutableProto(input, enable_sampling, nbest_size,
++ alpha, add_bos, add_eos, reverse, emit_unk_piece)
+
+ raise RuntimeError('unknown out_type={}'.format(out_type))
+ return None
+@@ -661,7 +775,11 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+
+
+ def EncodeAsSerializedProto(self, input, **kwargs):
+- return self.Encode(input=input, out_type='proto', **kwargs)
++ return self.Encode(input=input, out_type='serialized_proto', **kwargs)
++
++
++ def EncodeAsImmutableProto(self, input, **kwargs):
++ return self.Encode(input=input, out_type='immutable_proto', **kwargs)
+
+
+ def SampleEncodeAsPieces(self, input, nbest_size=None, alpha=None, **kwargs):
+@@ -676,7 +794,12 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+
+ def SampleEncodeAsSerializedProto(self, input, nbest_size=None, alpha=None, **kwargs):
+ return self.Encode(input=input, nbest_size=nbest_size, alpha=alpha,
+- out_type='proto', enable_sampling=True, **kwargs)
++ out_type='serialized_proto', enable_sampling=True, **kwargs)
++
++
++ def SampleEncodeAsImmutableProto(self, input, nbest_size=None, alpha=None, **kwargs):
++ return self.Encode(input=input, nbest_size=nbest_size, alpha=alpha,
++ out_type='immutable_proto', enable_sampling=True, **kwargs)
+
+
+ def NBestEncode(self,
+@@ -722,9 +845,12 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ if out_type is str:
+ return self._NBestEncodeAsPieces(text, nbest_size,
+ add_bos, add_eos, reverse, emit_unk_piece)
+- if out_type == 'proto':
++ if out_type == 'serialized_proto' or out_type == 'proto':
+ return self._NBestEncodeAsSerializedProto(text, nbest_size,
+ add_bos, add_eos, reverse, emit_unk_piece)
++ if out_type == 'immutable_proto':
++ return self._NBestEncodeAsImmutableProto(text, nbest_size,
++ add_bos, add_eos, reverse, emit_unk_piece)
+
+ if type(input) is list:
+ return [_encode(n) for n in input]
+@@ -744,7 +870,12 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+
+ def NBestEncodeAsSerializedProto(self, input, nbest_size=None, **kwargs):
+ return self.NBestEncode(input=input, nbest_size=nbest_size,
+- out_type='proto', **kwargs)
++ out_type='serialized_proto', **kwargs)
++
++
++ def NBestEncodeAsImmutableProto(self, input, nbest_size=None, **kwargs):
++ return self.NBestEncode(input=input, nbest_size=nbest_size,
++ out_type='immutable_proto', **kwargs)
+
+
+ def SampleEncodeAndScore(self,
+@@ -755,20 +886,20 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ reverse=None,
+ emit_unk_piece=None,
+ num_samples=None,
+- theta=None,
++ alpha=None,
+ wor=None,
+ include_best=None):
+ """SampleEncodeAndScore text input to segmented ids or tokens.
+
+ Args:
+ input: input string. accepsts list of string.
+- out_type: output type. int or str or 'proto'.
++ out_type: output type. int or str or 'serialized_proto' or 'immutable_proto'
+ add_bos: Add <s> to the result (Default = false)
+ add_eos: Add </s> to the result (Default = false) <s>/</s> is added after reversing (if enabled).
+ reverse: Reverses the tokenized sequence (Default = false)
+ emit_unk_piece: Emits the unk literal string (Default = false)
+ num_samples: How many samples to return (Default = 1)
+- theta: inverse temperature for sampling
++ alpha: inverse temperature for sampling
+ wor: whether to sample without replacement (Default = false)
+ include_best: whether to include the best tokenization, requires wor=True (Default = false)
+ """
+@@ -785,8 +916,8 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ emit_unk_piece = self._emit_unk_piece
+ if num_samples is None:
+ num_samples = 1
+- if theta is None:
+- theta = 1.
++ if alpha is None:
++ alpha = 1.
+ if wor is None:
+ wor = False
+ if include_best is None:
+@@ -801,10 +932,10 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+
+ def _encode(text):
+ if out_type is int:
+- return self._SampleEncodeAndScoreAsIds(text, num_samples, theta, wor, include_best,
++ return self._SampleEncodeAndScoreAsIds(text, num_samples, alpha, wor, include_best,
+ add_bos, add_eos, reverse, emit_unk_piece)
+ else:
+- return self._SampleEncodeAndScoreAsPieces(text, num_samples, theta, wor, include_best,
++ return self._SampleEncodeAndScoreAsPieces(text, num_samples, alpha, wor, include_best,
+ add_bos, add_eos, reverse, emit_unk_piece)
+
+ if type(input) is list:
+@@ -817,7 +948,7 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ """Decode processed id or token sequences.
+
+ Args:
+- out_type: output type. str or 'proto' (Default = str)
++ out_type: output type. str or 'serialized_proto' or 'immutable_proto' (Default = str)
+ num_threads: the number of threads used in the batch processin (Default = 1).
+ """
+
+@@ -848,7 +979,7 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ if type(input[0][0]) is str:
+ return self._DecodePiecesBatch(input, num_threads)
+
+- if out_type == 'proto':
++ if out_type == 'serialized_proto':
+ if type(input) is int:
+ return self._DecodeIdsAsSerializedProto([input])
+ if type(input) is str:
+@@ -867,6 +998,25 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ return self._DecodePiecesAsSerializedProtoBatch(input, num_threads)
+
+
++ if out_type == 'immutable_proto':
++ if type(input) is int:
++ return self._DecodeIdsAsImmutableProto([input])
++ if type(input) is str:
++ return self._DecodePiecesAsImmutableProto([input])
++
++ if type(input) is list:
++ if len(input) == 0 or type(input[0]) is int:
++ return self._DecodeIdsAsImmutableProto(input)
++ if type(input[0]) is str:
++ return self._DecodePiecesAsImmutableProto(input)
++
++ if type(input[0]) is list:
++ if len(input[0]) == 0 or type(input[0][0]) is int:
++ return self._DecodeIdsAsImmutableProtoBatch(input, num_threads)
++ if type(input[0][0]) is str:
++ return self._DecodePiecesAsImmutableProtoBatch(input, num_threads)
++
++
+ raise RuntimeError('unknown output or input type')
+ return None
+
+@@ -879,24 +1029,32 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ return self.Decode(input=input, out_type=out_type, **kwargs)
+
+
+- def DecodePiecesAsSerializedProto(self, input, out_type='proto', **kwargs):
++ def DecodePiecesAsSerializedProto(self, input, out_type='serialized_proto', **kwargs):
++ return self.Decode(input=input, out_type=out_type, **kwargs)
++
++
++ def DecodeIdsAsSerializedProto(self, input, out_type='serialized_proto', **kwargs):
++ return self.Decode(input=input, out_type=out_type, **kwargs)
++
++
++ def DecodePiecesAsImmutableProto(self, input, out_type='immutable_proto', **kwargs):
+ return self.Decode(input=input, out_type=out_type, **kwargs)
+
+
+- def DecodeIdsAsSerializedProto(self, input, out_type='proto', **kwargs):
++ def DecodeIdsAsImmutableProto(self, input, out_type='immutable_proto', **kwargs):
+ return self.Decode(input=input, out_type=out_type, **kwargs)
+
+
+- def CalculateEntropy(self, input, theta, num_threads=None):
++ def CalculateEntropy(self, input, alpha, num_threads=None):
+ """Calculate sentence entropy"""
+ if type(input) is list:
+ if num_threads is None:
+ num_threads = self._num_threads
+ if num_threads is None or type(num_threads) is not int:
+ raise RuntimeError('num_threads must be int')
+- return self._CalculateEntropyBatch(input, theta, num_threads)
++ return self._CalculateEntropyBatch(input, alpha, num_threads)
+
+- return self._CalculateEntropy(input, theta)
++ return self._CalculateEntropy(input, alpha)
+
+
+ def piece_size(self):
+@@ -1028,6 +1186,50 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ }
+ }
+
++%extend sentencepiece::ImmutableSentencePieceText {
++ ImmutableSentencePieceText_ImmutableSentencePiece pieces(int index) const {
++ if (index < 0 || index >= static_cast<int>($self->pieces_size())) {
++ throw sentencepiece::util::Status(
++ sentencepiece::util::StatusCode::kOutOfRange,
++ "piece index is out of range.");
++ }
++ return $self->pieces(index);
++ }
++
++%pythoncode {
++ def __len__(self):
++ return self.pieces_size()
++
++ def __getitem__(self, i):
++ return self.pieces(i)
++
++ def __eq__(self, other):
++ return self.SerializeAsString() == other.SerializeAsString()
++}
++}
++
++%extend sentencepiece::ImmutableNBestSentencePieceText {
++ ImmutableSentencePieceText nbests(int index) const {
++ if (index < 0 || index >= static_cast<int>($self->nbests_size())) {
++ throw sentencepiece::util::Status(
++ sentencepiece::util::StatusCode::kOutOfRange,
++ "nbest index is out of range.");
++ }
++ return $self->nbests(index);
++ }
++
++%pythoncode {
++ def __len__(self):
++ return self.nbests_size()
++
++ def __getitem__(self, i):
++ return self.nbests(i)
++
++ def __eq__(self, other):
++ return self.SerializeAsString() == other.SerializeAsString()
++}
++}
++
+ %typemap(out) std::vector<int> {
+ $result = PyList_New($1.size());
+ for (size_t i = 0; i < $1.size(); ++i) {
+@@ -1277,6 +1479,14 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ }
+ }
+
++%typemap(out) std::vector<sentencepiece::ImmutableSentencePieceText> {
++ $result = PyList_New($1.size());
++ for (size_t i = 0; i < $1.size(); ++i) {
++ PyObject *obj = SWIG_NewPointerObj(new sentencepiece::ImmutableSentencePieceText($1.at(i)), SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText, SWIG_POINTER_OWN | 0);
++ PyList_SET_ITEM($result, i, obj);
++ }
++}
++
+ %typemap(in) sentencepiece::SentenceIterator * {
+ sentencepiece::SentenceIterator *out = nullptr;
+ if (PyIter_Check($input)) {
+@@ -1324,6 +1534,18 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ delete $1;
+ }
+
++%typemap(freearg) sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece {
++ delete $1;
++}
++
++%typemap(freearg) sentencepiece::ImmutableSentencePieceText {
++ delete $1;
++}
++
++%typemap(freearg) sentencepiece::ImmutableNBestSentencePieceText {
++ delete $1;
++}
++
+ %include <sentencepiece_processor.h>
+ %include <sentencepiece_trainer.h>
+
+diff --git a/python/src/sentencepiece/sentencepiece_wrap.cxx b/python/src/sentencepiece/sentencepiece_wrap.cxx
+index 36ce38c..9776b0f 100644
+--- a/python/src/sentencepiece/sentencepiece_wrap.cxx
++++ b/python/src/sentencepiece/sentencepiece_wrap.cxx
+@@ -2694,17 +2694,20 @@ SWIGINTERN PyObject *SWIG_PyStaticMethod_New(PyObject *SWIGUNUSEDPARM(self), PyO
+
+ #define SWIGTYPE_p_char swig_types[0]
+ #define SWIGTYPE_p_float swig_types[1]
+-#define SWIGTYPE_p_sentencepiece__SentenceIterator swig_types[2]
+-#define SWIGTYPE_p_sentencepiece__SentencePieceProcessor swig_types[3]
+-#define SWIGTYPE_p_sentencepiece__SentencePieceTrainer swig_types[4]
+-#define SWIGTYPE_p_std__string swig_types[5]
+-#define SWIGTYPE_p_std__unordered_mapT_std__string_std__string_t swig_types[6]
+-#define SWIGTYPE_p_std__vectorT_absl__string_view_t swig_types[7]
+-#define SWIGTYPE_p_std__vectorT_int_t swig_types[8]
+-#define SWIGTYPE_p_std__vectorT_std__vectorT_absl__string_view_t_t swig_types[9]
+-#define SWIGTYPE_p_std__vectorT_std__vectorT_int_t_t swig_types[10]
+-static swig_type_info *swig_types[12];
+-static swig_module_info swig_module = {swig_types, 11, 0, 0, 0, 0};
++#define SWIGTYPE_p_sentencepiece__ImmutableNBestSentencePieceText swig_types[2]
++#define SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText swig_types[3]
++#define SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText_ImmutableSentencePiece swig_types[4]
++#define SWIGTYPE_p_sentencepiece__SentenceIterator swig_types[5]
++#define SWIGTYPE_p_sentencepiece__SentencePieceProcessor swig_types[6]
++#define SWIGTYPE_p_sentencepiece__SentencePieceTrainer swig_types[7]
++#define SWIGTYPE_p_std__string swig_types[8]
++#define SWIGTYPE_p_std__unordered_mapT_std__string_std__string_t swig_types[9]
++#define SWIGTYPE_p_std__vectorT_absl__string_view_t swig_types[10]
++#define SWIGTYPE_p_std__vectorT_int_t swig_types[11]
++#define SWIGTYPE_p_std__vectorT_std__vectorT_absl__string_view_t_t swig_types[12]
++#define SWIGTYPE_p_std__vectorT_std__vectorT_int_t_t swig_types[13]
++static swig_type_info *swig_types[15];
++static swig_module_info swig_module = {swig_types, 14, 0, 0, 0, 0};
+ #define SWIG_TypeQuery(name) SWIG_TypeQueryModule(&swig_module, &swig_module, name)
+ #define SWIG_MangledTypeQuery(name) SWIG_MangledTypeQueryModule(&swig_module, &swig_module, name)
+
+@@ -2972,7 +2975,17 @@ inline void RewriteIds(const sentencepiece::SentencePieceProcessor &sp,
+ if (add_bos || add_eos || reverse || emit_unk_piece) {
+ throw sentencepiece::util::Status(
+ sentencepiece::util::StatusCode::kUnimplemented,
+- "add_bos, add_eos, reverse, and emit_unk_piece is not supported in AsSerialize API");
++ "add_bos, add_eos, reverse, and emit_unk_piece is not supported in proto API");
++ }
++}
++
++inline void RewriteIds(const sentencepiece::SentencePieceProcessor &sp,
++ sentencepiece::ImmutableSentencePieceText *proto,
++ bool add_bos, bool add_eos, bool reverse, bool emit_unk_piece) {
++ if (add_bos || add_eos || reverse || emit_unk_piece) {
++ throw sentencepiece::util::Status(
++ sentencepiece::util::StatusCode::kUnimplemented,
++ "add_bos, add_eos, reverse, and emit_unk_piece is not supported in proto API");
+ }
+ }
+
+@@ -3022,7 +3035,7 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+
+ #define DEFINE_ENCODE_BATCH_FUNC_IMPL(FuncName, InType, OutType) \
+ std::vector<OutType> outs(ins.size()); \
+- InitNumThreads(ins, &num_threads); \
++ InitNumThreads(ins, &num_threads); \
+ { \
+ ThreadPool pool(ins.size()); \
+ for (int n = 0; n < num_threads; ++n) { \
+@@ -3043,7 +3056,7 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+
+ #define DEFINE_DECODE_BATCH_FUNC_IMPL(FuncName, InType, OutType) \
+ std::vector<OutType> outs(ins.size()); \
+- InitNumThreads(ins, &num_threads); \
++ InitNumThreads(ins, &num_threads); \
+ { \
+ ThreadPool pool(ins.size()); \
+ for (int n = 0; n < num_threads; ++n) { \
+@@ -3060,131 +3073,24 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ } // namespace
+
+
+-SWIGINTERN swig_type_info*
+-SWIG_pchar_descriptor(void)
++SWIGINTERNINLINE PyObject*
++ SWIG_From_unsigned_SS_int (unsigned int value)
+ {
+- static int init = 0;
+- static swig_type_info* info = 0;
+- if (!init) {
+- info = SWIG_TypeQuery("_p_char");
+- init = 1;
+- }
+- return info;
++ return PyInt_FromSize_t((size_t) value);
+ }
+
+
+-SWIGINTERN int
+-SWIG_AsCharPtrAndSize(PyObject *obj, char** cptr, size_t* psize, int *alloc)
+-{
+-#if PY_VERSION_HEX>=0x03000000
+-#if defined(SWIG_PYTHON_STRICT_BYTE_CHAR)
+- if (PyBytes_Check(obj))
+-#else
+- if (PyUnicode_Check(obj))
+-#endif
+-#else
+- if (PyString_Check(obj))
+-#endif
+- {
+- char *cstr; Py_ssize_t len;
+- int ret = SWIG_OK;
+-#if PY_VERSION_HEX>=0x03000000
+-#if !defined(SWIG_PYTHON_STRICT_BYTE_CHAR)
+- if (!alloc && cptr) {
+- /* We can't allow converting without allocation, since the internal
+- representation of string in Python 3 is UCS-2/UCS-4 but we require
+- a UTF-8 representation.
+- TODO(bhy) More detailed explanation */
+- return SWIG_RuntimeError;
+- }
+- obj = PyUnicode_AsUTF8String(obj);
+- if (!obj)
+- return SWIG_TypeError;
+- if (alloc)
+- *alloc = SWIG_NEWOBJ;
+-#endif
+- if (PyBytes_AsStringAndSize(obj, &cstr, &len) == -1)
+- return SWIG_TypeError;
+-#else
+- if (PyString_AsStringAndSize(obj, &cstr, &len) == -1)
+- return SWIG_TypeError;
+-#endif
+- if (cptr) {
+- if (alloc) {
+- if (*alloc == SWIG_NEWOBJ) {
+- *cptr = reinterpret_cast< char* >(memcpy(new char[len + 1], cstr, sizeof(char)*(len + 1)));
+- *alloc = SWIG_NEWOBJ;
+- } else {
+- *cptr = cstr;
+- *alloc = SWIG_OLDOBJ;
+- }
+- } else {
+-#if PY_VERSION_HEX>=0x03000000
+-#if defined(SWIG_PYTHON_STRICT_BYTE_CHAR)
+- *cptr = PyBytes_AsString(obj);
+-#else
+- assert(0); /* Should never reach here with Unicode strings in Python 3 */
+-#endif
+-#else
+- *cptr = SWIG_Python_str_AsChar(obj);
+- if (!*cptr)
+- ret = SWIG_TypeError;
+-#endif
+- }
+- }
+- if (psize) *psize = len + 1;
+-#if PY_VERSION_HEX>=0x03000000 && !defined(SWIG_PYTHON_STRICT_BYTE_CHAR)
+- Py_XDECREF(obj);
+-#endif
+- return ret;
+- } else {
+-#if defined(SWIG_PYTHON_2_UNICODE)
+-#if defined(SWIG_PYTHON_STRICT_BYTE_CHAR)
+-#error "Cannot use both SWIG_PYTHON_2_UNICODE and SWIG_PYTHON_STRICT_BYTE_CHAR at once"
+-#endif
+-#if PY_VERSION_HEX<0x03000000
+- if (PyUnicode_Check(obj)) {
+- char *cstr; Py_ssize_t len;
+- if (!alloc && cptr) {
+- return SWIG_RuntimeError;
+- }
+- obj = PyUnicode_AsUTF8String(obj);
+- if (!obj)
+- return SWIG_TypeError;
+- if (PyString_AsStringAndSize(obj, &cstr, &len) != -1) {
+- if (cptr) {
+- if (alloc) *alloc = SWIG_NEWOBJ;
+- *cptr = reinterpret_cast< char* >(memcpy(new char[len + 1], cstr, sizeof(char)*(len + 1)));
+- }
+- if (psize) *psize = len + 1;
++ #define SWIG_From_long PyInt_FromLong
+
+- Py_XDECREF(obj);
+- return SWIG_OK;
+- } else {
+- Py_XDECREF(obj);
+- }
+- }
+-#endif
+-#endif
+
+- swig_type_info* pchar_descriptor = SWIG_pchar_descriptor();
+- if (pchar_descriptor) {
+- void* vptr = 0;
+- if (SWIG_ConvertPtr(obj, &vptr, pchar_descriptor, 0) == SWIG_OK) {
+- if (cptr) *cptr = (char *) vptr;
+- if (psize) *psize = vptr ? (strlen((char *)vptr) + 1) : 0;
+- if (alloc) *alloc = SWIG_OLDOBJ;
+- return SWIG_OK;
+- }
+- }
+- }
+- return SWIG_TypeError;
++SWIGINTERNINLINE PyObject*
++SWIG_From_unsigned_SS_long (unsigned long value)
++{
++ return (value > LONG_MAX) ?
++ PyLong_FromUnsignedLong(value) : PyInt_FromLong(static_cast< long >(value));
+ }
+
+
+-
+-
+-
+ #include <limits.h>
+ #if !defined(SWIG_NO_LLONG_MAX)
+ # if !defined(LLONG_MAX) && defined(__GNUC__) && defined (__LONG_LONG_MAX__)
+@@ -3195,6 +3101,47 @@ SWIG_AsCharPtrAndSize(PyObject *obj, char** cptr, size_t* psize, int *alloc)
+ #endif
+
+
++#if defined(LLONG_MAX) && !defined(SWIG_LONG_LONG_AVAILABLE)
++# define SWIG_LONG_LONG_AVAILABLE
++#endif
++
++
++#ifdef SWIG_LONG_LONG_AVAILABLE
++SWIGINTERNINLINE PyObject*
++SWIG_From_unsigned_SS_long_SS_long (unsigned long long value)
++{
++ return (value > LONG_MAX) ?
++ PyLong_FromUnsignedLongLong(value) : PyInt_FromLong(static_cast< long >(value));
++}
++#endif
++
++
++SWIGINTERNINLINE PyObject *
++SWIG_From_size_t (size_t value)
++{
++#ifdef SWIG_LONG_LONG_AVAILABLE
++ if (sizeof(size_t) <= sizeof(unsigned long)) {
++#endif
++ return SWIG_From_unsigned_SS_long (static_cast< unsigned long >(value));
++#ifdef SWIG_LONG_LONG_AVAILABLE
++ } else {
++ /* assume sizeof(size_t) <= sizeof(unsigned long long) */
++ return SWIG_From_unsigned_SS_long_SS_long (static_cast< unsigned long long >(value));
++ }
++#endif
++}
++
++
++ #define SWIG_From_double PyFloat_FromDouble
++
++
++SWIGINTERNINLINE PyObject *
++SWIG_From_float (float value)
++{
++ return SWIG_From_double (value);
++}
++
++
+ SWIGINTERN int
+ SWIG_AsVal_double (PyObject *obj, double *val)
+ {
+@@ -3335,98 +3282,215 @@ SWIG_AsVal_int (PyObject * obj, int *val)
+ return res;
+ }
+
+-
+-/* Getting isfinite working pre C99 across multiple platforms is non-trivial. Users can provide SWIG_isfinite on older platforms. */
+-#ifndef SWIG_isfinite
+-/* isfinite() is a macro for C99 */
+-# if defined(isfinite)
+-# define SWIG_isfinite(X) (isfinite(X))
+-# elif defined(__cplusplus) && __cplusplus >= 201103L
+-/* Use a template so that this works whether isfinite() is std::isfinite() or
+- * in the global namespace. The reality seems to vary between compiler
+- * versions.
+- *
+- * Make sure namespace std exists to avoid compiler warnings.
+- *
+- * extern "C++" is required as this fragment can end up inside an extern "C" { } block
+- */
+-namespace std { }
+-extern "C++" template<typename T>
+-inline int SWIG_isfinite_func(T x) {
+- using namespace std;
+- return isfinite(x);
+-}
+-# define SWIG_isfinite(X) (SWIG_isfinite_func(X))
+-# elif defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 2))
+-# define SWIG_isfinite(X) (__builtin_isfinite(X))
+-# elif defined(__clang__) && defined(__has_builtin)
+-# if __has_builtin(__builtin_isfinite)
+-# define SWIG_isfinite(X) (__builtin_isfinite(X))
+-# endif
+-# elif defined(_MSC_VER)
+-# define SWIG_isfinite(X) (_finite(X))
+-# elif defined(__sun) && defined(__SVR4)
+-# include <ieeefp.h>
+-# define SWIG_isfinite(X) (finite(X))
+-# endif
+-#endif
+-
+-
+-/* Accept infinite as a valid float value unless we are unable to check if a value is finite */
+-#ifdef SWIG_isfinite
+-# define SWIG_Float_Overflow_Check(X) ((X < -FLT_MAX || X > FLT_MAX) && SWIG_isfinite(X))
+-#else
+-# define SWIG_Float_Overflow_Check(X) ((X < -FLT_MAX || X > FLT_MAX))
+-#endif
+-
+-
+-SWIGINTERN int
+-SWIG_AsVal_float (PyObject * obj, float *val)
+-{
+- double v;
+- int res = SWIG_AsVal_double (obj, &v);
+- if (SWIG_IsOK(res)) {
+- if (SWIG_Float_Overflow_Check(v)) {
+- return SWIG_OverflowError;
+- } else {
+- if (val) *val = static_cast< float >(v);
++SWIGINTERN sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece sentencepiece_ImmutableSentencePieceText_pieces(sentencepiece::ImmutableSentencePieceText const *self,int index){
++ if (index < 0 || index >= static_cast<int>(self->pieces_size())) {
++ throw sentencepiece::util::Status(
++ sentencepiece::util::StatusCode::kOutOfRange,
++ "piece index is out of range.");
+ }
+- }
+- return res;
+-}
+-
++ return self->pieces(index);
++ }
++SWIGINTERN sentencepiece::ImmutableSentencePieceText sentencepiece_ImmutableNBestSentencePieceText_nbests(sentencepiece::ImmutableNBestSentencePieceText const *self,int index){
++ if (index < 0 || index >= static_cast<int>(self->nbests_size())) {
++ throw sentencepiece::util::Status(
++ sentencepiece::util::StatusCode::kOutOfRange,
++ "nbest index is out of range.");
++ }
++ return self->nbests(index);
++ }
+
+-SWIGINTERN int
+-SWIG_AsVal_bool (PyObject *obj, bool *val)
++SWIGINTERN swig_type_info*
++SWIG_pchar_descriptor(void)
+ {
+- int r;
+- if (!PyBool_Check(obj))
+- return SWIG_ERROR;
+- r = PyObject_IsTrue(obj);
+- if (r == -1)
+- return SWIG_ERROR;
+- if (val) *val = r ? true : false;
+- return SWIG_OK;
+-}
+-
+-
+- #define SWIG_From_double PyFloat_FromDouble
+-
+-
+-SWIGINTERNINLINE PyObject *
+-SWIG_From_float (float value)
+-{
+- return SWIG_From_double (value);
++ static int init = 0;
++ static swig_type_info* info = 0;
++ if (!init) {
++ info = SWIG_TypeQuery("_p_char");
++ init = 1;
++ }
++ return info;
+ }
+
+
+-SWIGINTERNINLINE PyObject*
+- SWIG_From_int (int value)
++SWIGINTERN int
++SWIG_AsCharPtrAndSize(PyObject *obj, char** cptr, size_t* psize, int *alloc)
+ {
+- return PyInt_FromLong((long) value);
+-}
+-
+-
++#if PY_VERSION_HEX>=0x03000000
++#if defined(SWIG_PYTHON_STRICT_BYTE_CHAR)
++ if (PyBytes_Check(obj))
++#else
++ if (PyUnicode_Check(obj))
++#endif
++#else
++ if (PyString_Check(obj))
++#endif
++ {
++ char *cstr; Py_ssize_t len;
++ int ret = SWIG_OK;
++#if PY_VERSION_HEX>=0x03000000
++#if !defined(SWIG_PYTHON_STRICT_BYTE_CHAR)
++ if (!alloc && cptr) {
++ /* We can't allow converting without allocation, since the internal
++ representation of string in Python 3 is UCS-2/UCS-4 but we require
++ a UTF-8 representation.
++ TODO(bhy) More detailed explanation */
++ return SWIG_RuntimeError;
++ }
++ obj = PyUnicode_AsUTF8String(obj);
++ if (!obj)
++ return SWIG_TypeError;
++ if (alloc)
++ *alloc = SWIG_NEWOBJ;
++#endif
++ if (PyBytes_AsStringAndSize(obj, &cstr, &len) == -1)
++ return SWIG_TypeError;
++#else
++ if (PyString_AsStringAndSize(obj, &cstr, &len) == -1)
++ return SWIG_TypeError;
++#endif
++ if (cptr) {
++ if (alloc) {
++ if (*alloc == SWIG_NEWOBJ) {
++ *cptr = reinterpret_cast< char* >(memcpy(new char[len + 1], cstr, sizeof(char)*(len + 1)));
++ *alloc = SWIG_NEWOBJ;
++ } else {
++ *cptr = cstr;
++ *alloc = SWIG_OLDOBJ;
++ }
++ } else {
++#if PY_VERSION_HEX>=0x03000000
++#if defined(SWIG_PYTHON_STRICT_BYTE_CHAR)
++ *cptr = PyBytes_AsString(obj);
++#else
++ assert(0); /* Should never reach here with Unicode strings in Python 3 */
++#endif
++#else
++ *cptr = SWIG_Python_str_AsChar(obj);
++ if (!*cptr)
++ ret = SWIG_TypeError;
++#endif
++ }
++ }
++ if (psize) *psize = len + 1;
++#if PY_VERSION_HEX>=0x03000000 && !defined(SWIG_PYTHON_STRICT_BYTE_CHAR)
++ Py_XDECREF(obj);
++#endif
++ return ret;
++ } else {
++#if defined(SWIG_PYTHON_2_UNICODE)
++#if defined(SWIG_PYTHON_STRICT_BYTE_CHAR)
++#error "Cannot use both SWIG_PYTHON_2_UNICODE and SWIG_PYTHON_STRICT_BYTE_CHAR at once"
++#endif
++#if PY_VERSION_HEX<0x03000000
++ if (PyUnicode_Check(obj)) {
++ char *cstr; Py_ssize_t len;
++ if (!alloc && cptr) {
++ return SWIG_RuntimeError;
++ }
++ obj = PyUnicode_AsUTF8String(obj);
++ if (!obj)
++ return SWIG_TypeError;
++ if (PyString_AsStringAndSize(obj, &cstr, &len) != -1) {
++ if (cptr) {
++ if (alloc) *alloc = SWIG_NEWOBJ;
++ *cptr = reinterpret_cast< char* >(memcpy(new char[len + 1], cstr, sizeof(char)*(len + 1)));
++ }
++ if (psize) *psize = len + 1;
++
++ Py_XDECREF(obj);
++ return SWIG_OK;
++ } else {
++ Py_XDECREF(obj);
++ }
++ }
++#endif
++#endif
++
++ swig_type_info* pchar_descriptor = SWIG_pchar_descriptor();
++ if (pchar_descriptor) {
++ void* vptr = 0;
++ if (SWIG_ConvertPtr(obj, &vptr, pchar_descriptor, 0) == SWIG_OK) {
++ if (cptr) *cptr = (char *) vptr;
++ if (psize) *psize = vptr ? (strlen((char *)vptr) + 1) : 0;
++ if (alloc) *alloc = SWIG_OLDOBJ;
++ return SWIG_OK;
++ }
++ }
++ }
++ return SWIG_TypeError;
++}
++
++
++
++
++
++/* Getting isfinite working pre C99 across multiple platforms is non-trivial. Users can provide SWIG_isfinite on older platforms. */
++#ifndef SWIG_isfinite
++/* isfinite() is a macro for C99 */
++# if defined(isfinite)
++# define SWIG_isfinite(X) (isfinite(X))
++# elif defined(__cplusplus) && __cplusplus >= 201103L
++/* Use a template so that this works whether isfinite() is std::isfinite() or
++ * in the global namespace. The reality seems to vary between compiler
++ * versions.
++ *
++ * Make sure namespace std exists to avoid compiler warnings.
++ *
++ * extern "C++" is required as this fragment can end up inside an extern "C" { } block
++ */
++namespace std { }
++extern "C++" template<typename T>
++inline int SWIG_isfinite_func(T x) {
++ using namespace std;
++ return isfinite(x);
++}
++# define SWIG_isfinite(X) (SWIG_isfinite_func(X))
++# elif defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 2))
++# define SWIG_isfinite(X) (__builtin_isfinite(X))
++# elif defined(__clang__) && defined(__has_builtin)
++# if __has_builtin(__builtin_isfinite)
++# define SWIG_isfinite(X) (__builtin_isfinite(X))
++# endif
++# elif defined(_MSC_VER)
++# define SWIG_isfinite(X) (_finite(X))
++# elif defined(__sun) && defined(__SVR4)
++# include <ieeefp.h>
++# define SWIG_isfinite(X) (finite(X))
++# endif
++#endif
++
++
++/* Accept infinite as a valid float value unless we are unable to check if a value is finite */
++#ifdef SWIG_isfinite
++# define SWIG_Float_Overflow_Check(X) ((X < -FLT_MAX || X > FLT_MAX) && SWIG_isfinite(X))
++#else
++# define SWIG_Float_Overflow_Check(X) ((X < -FLT_MAX || X > FLT_MAX))
++#endif
++
++
++SWIGINTERN int
++SWIG_AsVal_float (PyObject * obj, float *val)
++{
++ double v;
++ int res = SWIG_AsVal_double (obj, &v);
++ if (SWIG_IsOK(res)) {
++ if (SWIG_Float_Overflow_Check(v)) {
++ return SWIG_OverflowError;
++ } else {
++ if (val) *val = static_cast< float >(v);
++ }
++ }
++ return res;
++}
++
++
++SWIGINTERNINLINE PyObject*
++ SWIG_From_int (int value)
++{
++ return PyInt_FromLong((long) value);
++}
++
++
+ SWIGINTERNINLINE PyObject*
+ SWIG_From_bool (bool value)
+ {
+@@ -3436,6 +3500,20 @@ SWIGINTERNINLINE PyObject*
+ SWIGINTERN sentencepiece::util::Status sentencepiece_SentencePieceProcessor_LoadFromFile(sentencepiece::SentencePieceProcessor *self,absl::string_view arg){
+ return self->Load(arg);
+ }
++
++SWIGINTERN int
++SWIG_AsVal_bool (PyObject *obj, bool *val)
++{
++ int r;
++ if (!PyBool_Check(obj))
++ return SWIG_ERROR;
++ r = PyObject_IsTrue(obj);
++ if (r == -1)
++ return SWIG_ERROR;
++ if (val) *val = r ? true : false;
++ return SWIG_OK;
++}
++
+ SWIGINTERN std::vector< int > sentencepiece_SentencePieceProcessor__EncodeAsIds(sentencepiece::SentencePieceProcessor const *self,absl::string_view text,bool enable_sampling,int nbest_size,float alpha,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
+ auto ids = enable_sampling ?
+ self->SampleEncodeAsIds(text, nbest_size, alpha) :
+@@ -3457,6 +3535,13 @@ SWIGINTERN sentencepiece::util::bytes sentencepiece_SentencePieceProcessor__Enco
+ RewriteIds(*self, &proto, add_bos, add_eos, reverse, emit_unk_piece);
+ return proto;
+ }
++SWIGINTERN sentencepiece::ImmutableSentencePieceText sentencepiece_SentencePieceProcessor__EncodeAsImmutableProto(sentencepiece::SentencePieceProcessor const *self,absl::string_view text,bool enable_sampling,int nbest_size,float alpha,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
++ auto proto = enable_sampling ?
++ self->SampleEncodeAsImmutableProto(text, nbest_size, alpha) :
++ self->EncodeAsImmutableProto(text);
++ RewriteIds(*self, &proto, add_bos, add_eos, reverse, emit_unk_piece);
++ return proto;
++ }
+ SWIGINTERN std::vector< std::vector< int > > sentencepiece_SentencePieceProcessor__EncodeAsIdsBatch(sentencepiece::SentencePieceProcessor const *self,std::vector< absl::string_view > const &ins,int num_threads,bool enable_sampling,int nbest_size,float alpha,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
+ DEFINE_ENCODE_BATCH_FUNC_IMPL(EncodeAsIds,
+ absl::string_view, std::vector<int>);
+@@ -3470,6 +3555,11 @@ SWIGINTERN BytesArray sentencepiece_SentencePieceProcessor__EncodeAsSerializedPr
+ absl::string_view,
+ sentencepiece::util::bytes);
+ }
++SWIGINTERN std::vector< sentencepiece::ImmutableSentencePieceText > sentencepiece_SentencePieceProcessor__EncodeAsImmutableProtoBatch(sentencepiece::SentencePieceProcessor const *self,std::vector< absl::string_view > const &ins,int num_threads,bool enable_sampling,int nbest_size,float alpha,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
++ DEFINE_ENCODE_BATCH_FUNC_IMPL(EncodeAsImmutableProto,
++ absl::string_view,
++ sentencepiece::ImmutableSentencePieceText);
++ }
+ SWIGINTERN std::string sentencepiece_SentencePieceProcessor__DecodeIds(sentencepiece::SentencePieceProcessor const *self,std::vector< int > const &ids){
+ CheckIds(ids, self->GetPieceSize());
+ return self->DecodeIds(ids);
+@@ -3485,6 +3575,14 @@ SWIGINTERN sentencepiece::util::bytes sentencepiece_SentencePieceProcessor__Deco
+ CheckIds(pieces, self->GetPieceSize());
+ return self->DecodePiecesAsSerializedProto(pieces);
+ }
++SWIGINTERN sentencepiece::ImmutableSentencePieceText sentencepiece_SentencePieceProcessor__DecodeIdsAsImmutableProto(sentencepiece::SentencePieceProcessor const *self,std::vector< int > const &ids){
++ CheckIds(ids, self->GetPieceSize());
++ return self->DecodeIdsAsImmutableProto(ids);
++ }
++SWIGINTERN sentencepiece::ImmutableSentencePieceText sentencepiece_SentencePieceProcessor__DecodePiecesAsImmutableProto(sentencepiece::SentencePieceProcessor const *self,std::vector< absl::string_view > const &pieces){
++ CheckIds(pieces, self->GetPieceSize());
++ return self->DecodePiecesAsImmutableProto(pieces);
++ }
+ SWIGINTERN std::vector< std::string > sentencepiece_SentencePieceProcessor__DecodeIdsBatch(sentencepiece::SentencePieceProcessor const *self,std::vector< std::vector< int > > const &ins,int num_threads){
+ DEFINE_DECODE_BATCH_FUNC_IMPL(DecodeIds, int, std::string);
+ }
+@@ -3499,6 +3597,10 @@ SWIGINTERN BytesArray sentencepiece_SentencePieceProcessor__DecodePiecesAsSerial
+ DEFINE_DECODE_BATCH_FUNC_IMPL(DecodePiecesAsSerializedProto, std::string,
+ sentencepiece::util::bytes);
+ }
++SWIGINTERN std::vector< sentencepiece::ImmutableSentencePieceText > sentencepiece_SentencePieceProcessor__DecodePiecesAsImmutableProtoBatch(sentencepiece::SentencePieceProcessor const *self,std::vector< std::vector< absl::string_view > > const &ins,int num_threads){
++ DEFINE_DECODE_BATCH_FUNC_IMPL(DecodePiecesAsImmutableProto, std::string,
++ sentencepiece::ImmutableSentencePieceText);
++ }
+ SWIGINTERN std::vector< std::vector< int > > sentencepiece_SentencePieceProcessor__NBestEncodeAsIds(sentencepiece::SentencePieceProcessor const *self,absl::string_view text,int nbest_size,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
+ auto idss = self->NBestEncodeAsIds(text, nbest_size);
+ for (auto &ids : idss) {
+@@ -3518,26 +3620,43 @@ SWIGINTERN sentencepiece::util::bytes sentencepiece_SentencePieceProcessor__NBes
+ add_bos, add_eos, reverse, emit_unk_piece);
+ return self->NBestEncodeAsSerializedProto(text, nbest_size);
+ }
+-SWIGINTERN std::vector< std::pair< std::vector< int >,float > > sentencepiece_SentencePieceProcessor__SampleEncodeAndScoreAsIds(sentencepiece::SentencePieceProcessor const *self,absl::string_view text,int num_samples,float theta,bool wor,bool include_best,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
++SWIGINTERN sentencepiece::ImmutableNBestSentencePieceText sentencepiece_SentencePieceProcessor__NBestEncodeAsImmutableProto(sentencepiece::SentencePieceProcessor const *self,absl::string_view text,int nbest_size,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
++ RewriteIds(*self, static_cast<sentencepiece::ImmutableSentencePieceText *>(nullptr),
++ add_bos, add_eos, reverse, emit_unk_piece);
++ return self->NBestEncodeAsImmutableProto(text, nbest_size);
++ }
++SWIGINTERN std::vector< std::pair< std::vector< int >,float > > sentencepiece_SentencePieceProcessor__SampleEncodeAndScoreAsIds(sentencepiece::SentencePieceProcessor const *self,absl::string_view text,int num_samples,float alpha,bool wor,bool include_best,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
+ auto idss = self->SampleEncodeAndScoreAsIds(text, num_samples,
+- theta, wor, include_best);
++ alpha, wor, include_best);
+ for (auto &ids : idss) {
+ RewriteIds(*self, &ids.first, add_bos, add_eos, reverse, emit_unk_piece);
+ }
+ return idss;
+ }
+-SWIGINTERN std::vector< std::pair< std::vector< std::string >,float > > sentencepiece_SentencePieceProcessor__SampleEncodeAndScoreAsPieces(sentencepiece::SentencePieceProcessor const *self,absl::string_view text,int num_samples,float theta,bool wor,bool include_best,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
++SWIGINTERN std::vector< std::pair< std::vector< std::string >,float > > sentencepiece_SentencePieceProcessor__SampleEncodeAndScoreAsPieces(sentencepiece::SentencePieceProcessor const *self,absl::string_view text,int num_samples,float alpha,bool wor,bool include_best,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
+ auto piecess = self->SampleEncodeAndScoreAsPieces(text, num_samples,
+- theta, wor, include_best);
++ alpha, wor, include_best);
+ for (auto &pieces : piecess) {
+ RewriteIds(*self, &pieces.first, add_bos, add_eos, reverse, emit_unk_piece);
+ }
+ return piecess;
+ }
+-SWIGINTERN float sentencepiece_SentencePieceProcessor__CalculateEntropy(sentencepiece::SentencePieceProcessor *self,absl::string_view text,float theta){
+- return self->CalculateEntropy(text, theta);
++SWIGINTERN sentencepiece::util::bytes sentencepiece_SentencePieceProcessor__SampleEncodeAndScoreAsSerializedProto(sentencepiece::SentencePieceProcessor const *self,absl::string_view text,int num_samples,float alpha,bool wor,bool include_best,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
++ RewriteIds(*self, static_cast<sentencepiece::util::bytes *>(nullptr),
++ add_bos, add_eos, reverse, emit_unk_piece);
++ return self->SampleEncodeAndScoreAsSerializedProto(text, num_samples,
++ alpha, wor, include_best);
++ }
++SWIGINTERN sentencepiece::ImmutableNBestSentencePieceText sentencepiece_SentencePieceProcessor__SampleEncodeAndScoreAsImmutableProto(sentencepiece::SentencePieceProcessor const *self,absl::string_view text,int num_samples,float alpha,bool wor,bool include_best,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
++ RewriteIds(*self, static_cast<sentencepiece::util::bytes *>(nullptr),
++ add_bos, add_eos, reverse, emit_unk_piece);
++ return self->SampleEncodeAndScoreAsImmutableProto(text, num_samples,
++ alpha, wor, include_best);
++ }
++SWIGINTERN float sentencepiece_SentencePieceProcessor__CalculateEntropy(sentencepiece::SentencePieceProcessor *self,absl::string_view text,float alpha){
++ return self->CalculateEntropy(text, alpha);
+ }
+-SWIGINTERN std::vector< float > sentencepiece_SentencePieceProcessor__CalculateEntropyBatch(sentencepiece::SentencePieceProcessor *self,std::vector< absl::string_view > const &ins,float theta,int num_threads){
++SWIGINTERN std::vector< float > sentencepiece_SentencePieceProcessor__CalculateEntropyBatch(sentencepiece::SentencePieceProcessor *self,std::vector< absl::string_view > const &ins,float alpha,int num_threads){
+ std::vector<float> outs(ins.size());
+ InitNumThreads(ins, &num_threads);
+ {
+@@ -3545,7 +3664,7 @@ SWIGINTERN std::vector< float > sentencepiece_SentencePieceProcessor__CalculateE
+ for (int n = 0; n < num_threads; ++n) {
+ pool.Schedule([&, n]() {
+ for (size_t i = n; i < ins.size(); i += num_threads) {
+- outs[i] = self->CalculateEntropy(ins[i], theta);
++ outs[i] = self->CalculateEntropy(ins[i], alpha);
+ }
+ });
+ }
+@@ -3596,56 +3715,672 @@ SWIG_AsVal_unsigned_SS_long (PyObject *obj, unsigned long *val)
+ }
+ }
+ }
+-#endif
+- return SWIG_TypeError;
++#endif
++ return SWIG_TypeError;
++}
++
++
++SWIGINTERN int
++SWIG_AsVal_unsigned_SS_int (PyObject * obj, unsigned int *val)
++{
++ unsigned long v;
++ int res = SWIG_AsVal_unsigned_SS_long (obj, &v);
++ if (SWIG_IsOK(res)) {
++ if ((v > UINT_MAX)) {
++ return SWIG_OverflowError;
++ } else {
++ if (val) *val = static_cast< unsigned int >(v);
++ }
++ }
++ return res;
++}
++
++SWIGINTERN void sentencepiece_SentencePieceTrainer__TrainFromString(absl::string_view arg){
++ const auto _status = sentencepiece::SentencePieceTrainer::Train(arg);
++ if (!_status.ok()) throw _status;
++ return;
++ }
++SWIGINTERN void sentencepiece_SentencePieceTrainer__TrainFromMap(std::unordered_map< std::string,std::string > const &args){
++ const auto _status = sentencepiece::SentencePieceTrainer::Train(args);
++ if (!_status.ok()) throw _status;
++ return;
++ }
++SWIGINTERN void sentencepiece_SentencePieceTrainer__TrainFromMap2(std::unordered_map< std::string,std::string > const &args,sentencepiece::SentenceIterator *iter){
++ const auto _status = sentencepiece::SentencePieceTrainer::Train(args, iter);
++ if (!_status.ok()) throw _status;
++ return;
++ }
++SWIGINTERN sentencepiece::util::bytes sentencepiece_SentencePieceTrainer__TrainFromMap3(std::unordered_map< std::string,std::string > const &args){
++ sentencepiece::util::bytes model_proto;
++ const auto _status = sentencepiece::SentencePieceTrainer::Train(args, nullptr, &model_proto);
++ if (!_status.ok()) throw _status;
++ return model_proto;
++ }
++SWIGINTERN sentencepiece::util::bytes sentencepiece_SentencePieceTrainer__TrainFromMap4(std::unordered_map< std::string,std::string > const &args,sentencepiece::SentenceIterator *iter){
++ sentencepiece::util::bytes model_proto;
++ const auto _status = sentencepiece::SentencePieceTrainer::Train(args, iter, &model_proto);
++ if (!_status.ok()) throw _status;
++ return model_proto;
++ }
++#ifdef __cplusplus
++extern "C" {
++#endif
++SWIGINTERN PyObject *_wrap_new_ImmutableSentencePieceText_ImmutableSentencePiece(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece *result = 0 ;
++
++ if (!SWIG_Python_UnpackTuple(args, "new_ImmutableSentencePieceText_ImmutableSentencePiece", 0, 0, 0)) SWIG_fail;
++ {
++ try {
++ result = (sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece *)new sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece();
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ resultobj = SWIG_NewPointerObj(SWIG_as_voidptr(result), SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText_ImmutableSentencePiece, SWIG_POINTER_NEW | 0 );
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
++SWIGINTERN PyObject *_wrap_delete_ImmutableSentencePieceText_ImmutableSentencePiece(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece *arg1 = (sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece *) 0 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ PyObject *swig_obj[1] ;
++
++ if (!args) SWIG_fail;
++ swig_obj[0] = args;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText_ImmutableSentencePiece, SWIG_POINTER_DISOWN | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "delete_ImmutableSentencePieceText_ImmutableSentencePiece" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece * >(argp1);
++ {
++ try {
++ delete arg1;
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ resultobj = SWIG_Py_Void();
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
++SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_ImmutableSentencePiece_piece(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece *arg1 = (sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece *) 0 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ PyObject *swig_obj[1] ;
++ std::string *result = 0 ;
++
++ if (!args) SWIG_fail;
++ swig_obj[0] = args;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText_ImmutableSentencePiece, 0 | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText_ImmutableSentencePiece_piece" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece const *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece * >(argp1);
++ {
++ try {
++ result = (std::string *) &((sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece const *)arg1)->piece();
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ {
++ PyObject *input_type = resultobj;
++ resultobj = MakePyOutputString(*result, input_type);
++ }
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
++SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_ImmutableSentencePiece_surface(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece *arg1 = (sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece *) 0 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ PyObject *swig_obj[1] ;
++ std::string *result = 0 ;
++
++ if (!args) SWIG_fail;
++ swig_obj[0] = args;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText_ImmutableSentencePiece, 0 | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText_ImmutableSentencePiece_surface" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece const *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece * >(argp1);
++ {
++ try {
++ result = (std::string *) &((sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece const *)arg1)->surface();
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ {
++ PyObject *input_type = resultobj;
++ resultobj = MakePyOutputString(*result, input_type);
++ }
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
++SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_ImmutableSentencePiece_id(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece *arg1 = (sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece *) 0 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ PyObject *swig_obj[1] ;
++ uint32_t result;
++
++ if (!args) SWIG_fail;
++ swig_obj[0] = args;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText_ImmutableSentencePiece, 0 | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText_ImmutableSentencePiece_id" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece const *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece * >(argp1);
++ {
++ try {
++ result = ((sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece const *)arg1)->id();
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ resultobj = SWIG_From_unsigned_SS_int(static_cast< unsigned int >(result));
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
++SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_ImmutableSentencePiece_begin(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece *arg1 = (sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece *) 0 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ PyObject *swig_obj[1] ;
++ uint32_t result;
++
++ if (!args) SWIG_fail;
++ swig_obj[0] = args;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText_ImmutableSentencePiece, 0 | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText_ImmutableSentencePiece_begin" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece const *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece * >(argp1);
++ {
++ try {
++ result = ((sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece const *)arg1)->begin();
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ resultobj = SWIG_From_unsigned_SS_int(static_cast< unsigned int >(result));
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
++SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_ImmutableSentencePiece_end(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece *arg1 = (sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece *) 0 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ PyObject *swig_obj[1] ;
++ uint32_t result;
++
++ if (!args) SWIG_fail;
++ swig_obj[0] = args;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText_ImmutableSentencePiece, 0 | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText_ImmutableSentencePiece_end" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece const *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece * >(argp1);
++ {
++ try {
++ result = ((sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece const *)arg1)->end();
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ resultobj = SWIG_From_unsigned_SS_int(static_cast< unsigned int >(result));
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
++SWIGINTERN PyObject *ImmutableSentencePieceText_ImmutableSentencePiece_swigregister(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *obj;
++ if (!SWIG_Python_UnpackTuple(args, "swigregister", 1, 1, &obj)) return NULL;
++ SWIG_TypeNewClientData(SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText_ImmutableSentencePiece, SWIG_NewClientData(obj));
++ return SWIG_Py_Void();
++}
++
++SWIGINTERN PyObject *ImmutableSentencePieceText_ImmutableSentencePiece_swiginit(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ return SWIG_Python_InitShadowInstance(args);
++}
++
++SWIGINTERN PyObject *_wrap_new_ImmutableSentencePieceText(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::ImmutableSentencePieceText *result = 0 ;
++
++ if (!SWIG_Python_UnpackTuple(args, "new_ImmutableSentencePieceText", 0, 0, 0)) SWIG_fail;
++ {
++ try {
++ result = (sentencepiece::ImmutableSentencePieceText *)new sentencepiece::ImmutableSentencePieceText();
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ resultobj = SWIG_NewPointerObj(SWIG_as_voidptr(result), SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText, SWIG_POINTER_NEW | 0 );
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
++SWIGINTERN PyObject *_wrap_delete_ImmutableSentencePieceText(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::ImmutableSentencePieceText *arg1 = (sentencepiece::ImmutableSentencePieceText *) 0 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ PyObject *swig_obj[1] ;
++
++ if (!args) SWIG_fail;
++ swig_obj[0] = args;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText, SWIG_POINTER_DISOWN | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "delete_ImmutableSentencePieceText" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::ImmutableSentencePieceText * >(argp1);
++ {
++ try {
++ delete arg1;
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ resultobj = SWIG_Py_Void();
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
++SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_pieces_size(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::ImmutableSentencePieceText *arg1 = (sentencepiece::ImmutableSentencePieceText *) 0 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ PyObject *swig_obj[1] ;
++ size_t result;
++
++ if (!args) SWIG_fail;
++ swig_obj[0] = args;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText, 0 | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText_pieces_size" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText const *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::ImmutableSentencePieceText * >(argp1);
++ {
++ try {
++ result = ((sentencepiece::ImmutableSentencePieceText const *)arg1)->pieces_size();
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ resultobj = SWIG_From_size_t(static_cast< size_t >(result));
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
++SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_text(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::ImmutableSentencePieceText *arg1 = (sentencepiece::ImmutableSentencePieceText *) 0 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ PyObject *swig_obj[1] ;
++ std::string *result = 0 ;
++
++ if (!args) SWIG_fail;
++ swig_obj[0] = args;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText, 0 | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText_text" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText const *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::ImmutableSentencePieceText * >(argp1);
++ {
++ try {
++ result = (std::string *) &((sentencepiece::ImmutableSentencePieceText const *)arg1)->text();
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ {
++ PyObject *input_type = resultobj;
++ resultobj = MakePyOutputString(*result, input_type);
++ }
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
++SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_score(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::ImmutableSentencePieceText *arg1 = (sentencepiece::ImmutableSentencePieceText *) 0 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ PyObject *swig_obj[1] ;
++ float result;
++
++ if (!args) SWIG_fail;
++ swig_obj[0] = args;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText, 0 | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText_score" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText const *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::ImmutableSentencePieceText * >(argp1);
++ {
++ try {
++ result = (float)((sentencepiece::ImmutableSentencePieceText const *)arg1)->score();
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ resultobj = SWIG_From_float(static_cast< float >(result));
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
++SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_SerializeAsString(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::ImmutableSentencePieceText *arg1 = (sentencepiece::ImmutableSentencePieceText *) 0 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ PyObject *swig_obj[1] ;
++ sentencepiece::util::bytes result;
++
++ if (!args) SWIG_fail;
++ swig_obj[0] = args;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText, 0 | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText_SerializeAsString" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText const *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::ImmutableSentencePieceText * >(argp1);
++ {
++ try {
++ result = ((sentencepiece::ImmutableSentencePieceText const *)arg1)->SerializeAsString();
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ {
++ resultobj = MakePyOutputBytes(result);
++ }
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
++SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_pieces(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::ImmutableSentencePieceText *arg1 = (sentencepiece::ImmutableSentencePieceText *) 0 ;
++ int arg2 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ int val2 ;
++ int ecode2 = 0 ;
++ PyObject *swig_obj[2] ;
++ sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece result;
++
++ if (!SWIG_Python_UnpackTuple(args, "ImmutableSentencePieceText_pieces", 2, 2, swig_obj)) SWIG_fail;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText, 0 | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText_pieces" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText const *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::ImmutableSentencePieceText * >(argp1);
++ ecode2 = SWIG_AsVal_int(swig_obj[1], &val2);
++ if (!SWIG_IsOK(ecode2)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode2), "in method '" "ImmutableSentencePieceText_pieces" "', argument " "2"" of type '" "int""'");
++ }
++ arg2 = static_cast< int >(val2);
++ {
++ try {
++ result = sentencepiece_ImmutableSentencePieceText_pieces((sentencepiece::ImmutableSentencePieceText const *)arg1,arg2);
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ resultobj = SWIG_NewPointerObj((new sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece(static_cast< const sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece& >(result))), SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText_ImmutableSentencePiece, SWIG_POINTER_OWN | 0 );
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
++SWIGINTERN PyObject *ImmutableSentencePieceText_swigregister(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *obj;
++ if (!SWIG_Python_UnpackTuple(args, "swigregister", 1, 1, &obj)) return NULL;
++ SWIG_TypeNewClientData(SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText, SWIG_NewClientData(obj));
++ return SWIG_Py_Void();
++}
++
++SWIGINTERN PyObject *ImmutableSentencePieceText_swiginit(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ return SWIG_Python_InitShadowInstance(args);
++}
++
++SWIGINTERN PyObject *_wrap_new_ImmutableNBestSentencePieceText(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::ImmutableNBestSentencePieceText *result = 0 ;
++
++ if (!SWIG_Python_UnpackTuple(args, "new_ImmutableNBestSentencePieceText", 0, 0, 0)) SWIG_fail;
++ {
++ try {
++ result = (sentencepiece::ImmutableNBestSentencePieceText *)new sentencepiece::ImmutableNBestSentencePieceText();
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ resultobj = SWIG_NewPointerObj(SWIG_as_voidptr(result), SWIGTYPE_p_sentencepiece__ImmutableNBestSentencePieceText, SWIG_POINTER_NEW | 0 );
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
++SWIGINTERN PyObject *_wrap_delete_ImmutableNBestSentencePieceText(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::ImmutableNBestSentencePieceText *arg1 = (sentencepiece::ImmutableNBestSentencePieceText *) 0 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ PyObject *swig_obj[1] ;
++
++ if (!args) SWIG_fail;
++ swig_obj[0] = args;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableNBestSentencePieceText, SWIG_POINTER_DISOWN | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "delete_ImmutableNBestSentencePieceText" "', argument " "1"" of type '" "sentencepiece::ImmutableNBestSentencePieceText *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::ImmutableNBestSentencePieceText * >(argp1);
++ {
++ try {
++ delete arg1;
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ resultobj = SWIG_Py_Void();
++ return resultobj;
++fail:
++ return NULL;
+ }
+
+
+-SWIGINTERN int
+-SWIG_AsVal_unsigned_SS_int (PyObject * obj, unsigned int *val)
+-{
+- unsigned long v;
+- int res = SWIG_AsVal_unsigned_SS_long (obj, &v);
+- if (SWIG_IsOK(res)) {
+- if ((v > UINT_MAX)) {
+- return SWIG_OverflowError;
+- } else {
+- if (val) *val = static_cast< unsigned int >(v);
++SWIGINTERN PyObject *_wrap_ImmutableNBestSentencePieceText_nbests_size(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::ImmutableNBestSentencePieceText *arg1 = (sentencepiece::ImmutableNBestSentencePieceText *) 0 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ PyObject *swig_obj[1] ;
++ size_t result;
++
++ if (!args) SWIG_fail;
++ swig_obj[0] = args;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableNBestSentencePieceText, 0 | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableNBestSentencePieceText_nbests_size" "', argument " "1"" of type '" "sentencepiece::ImmutableNBestSentencePieceText const *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::ImmutableNBestSentencePieceText * >(argp1);
++ {
++ try {
++ result = ((sentencepiece::ImmutableNBestSentencePieceText const *)arg1)->nbests_size();
++ ReleaseResultObject(resultobj);
+ }
+- }
+- return res;
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ resultobj = SWIG_From_size_t(static_cast< size_t >(result));
++ return resultobj;
++fail:
++ return NULL;
+ }
+
+-SWIGINTERN void sentencepiece_SentencePieceTrainer__TrainFromString(absl::string_view arg){
+- const auto _status = sentencepiece::SentencePieceTrainer::Train(arg);
+- if (!_status.ok()) throw _status;
+- return;
++
++SWIGINTERN PyObject *_wrap_ImmutableNBestSentencePieceText_SerializeAsString(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::ImmutableNBestSentencePieceText *arg1 = (sentencepiece::ImmutableNBestSentencePieceText *) 0 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ PyObject *swig_obj[1] ;
++ sentencepiece::util::bytes result;
++
++ if (!args) SWIG_fail;
++ swig_obj[0] = args;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableNBestSentencePieceText, 0 | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableNBestSentencePieceText_SerializeAsString" "', argument " "1"" of type '" "sentencepiece::ImmutableNBestSentencePieceText const *""'");
+ }
+-SWIGINTERN void sentencepiece_SentencePieceTrainer__TrainFromMap(std::unordered_map< std::string,std::string > const &args){
+- const auto _status = sentencepiece::SentencePieceTrainer::Train(args);
+- if (!_status.ok()) throw _status;
+- return;
++ arg1 = reinterpret_cast< sentencepiece::ImmutableNBestSentencePieceText * >(argp1);
++ {
++ try {
++ result = ((sentencepiece::ImmutableNBestSentencePieceText const *)arg1)->SerializeAsString();
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
+ }
+-SWIGINTERN void sentencepiece_SentencePieceTrainer__TrainFromMap2(std::unordered_map< std::string,std::string > const &args,sentencepiece::SentenceIterator *iter){
+- const auto _status = sentencepiece::SentencePieceTrainer::Train(args, iter);
+- if (!_status.ok()) throw _status;
+- return;
++ {
++ resultobj = MakePyOutputBytes(result);
+ }
+-SWIGINTERN sentencepiece::util::bytes sentencepiece_SentencePieceTrainer__TrainFromMap3(std::unordered_map< std::string,std::string > const &args){
+- sentencepiece::util::bytes model_proto;
+- const auto _status = sentencepiece::SentencePieceTrainer::Train(args, nullptr, &model_proto);
+- if (!_status.ok()) throw _status;
+- return model_proto;
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
++SWIGINTERN PyObject *_wrap_ImmutableNBestSentencePieceText_nbests(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::ImmutableNBestSentencePieceText *arg1 = (sentencepiece::ImmutableNBestSentencePieceText *) 0 ;
++ int arg2 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ int val2 ;
++ int ecode2 = 0 ;
++ PyObject *swig_obj[2] ;
++ sentencepiece::ImmutableSentencePieceText result;
++
++ if (!SWIG_Python_UnpackTuple(args, "ImmutableNBestSentencePieceText_nbests", 2, 2, swig_obj)) SWIG_fail;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableNBestSentencePieceText, 0 | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableNBestSentencePieceText_nbests" "', argument " "1"" of type '" "sentencepiece::ImmutableNBestSentencePieceText const *""'");
+ }
+-SWIGINTERN sentencepiece::util::bytes sentencepiece_SentencePieceTrainer__TrainFromMap4(std::unordered_map< std::string,std::string > const &args,sentencepiece::SentenceIterator *iter){
+- sentencepiece::util::bytes model_proto;
+- const auto _status = sentencepiece::SentencePieceTrainer::Train(args, iter, &model_proto);
+- if (!_status.ok()) throw _status;
+- return model_proto;
++ arg1 = reinterpret_cast< sentencepiece::ImmutableNBestSentencePieceText * >(argp1);
++ ecode2 = SWIG_AsVal_int(swig_obj[1], &val2);
++ if (!SWIG_IsOK(ecode2)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode2), "in method '" "ImmutableNBestSentencePieceText_nbests" "', argument " "2"" of type '" "int""'");
++ }
++ arg2 = static_cast< int >(val2);
++ {
++ try {
++ result = sentencepiece_ImmutableNBestSentencePieceText_nbests((sentencepiece::ImmutableNBestSentencePieceText const *)arg1,arg2);
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
+ }
+-#ifdef __cplusplus
+-extern "C" {
+-#endif
++ resultobj = SWIG_NewPointerObj((new sentencepiece::ImmutableSentencePieceText(static_cast< const sentencepiece::ImmutableSentencePieceText& >(result))), SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText, SWIG_POINTER_OWN | 0 );
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
++SWIGINTERN PyObject *ImmutableNBestSentencePieceText_swigregister(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *obj;
++ if (!SWIG_Python_UnpackTuple(args, "swigregister", 1, 1, &obj)) return NULL;
++ SWIG_TypeNewClientData(SWIGTYPE_p_sentencepiece__ImmutableNBestSentencePieceText, SWIG_NewClientData(obj));
++ return SWIG_Py_Void();
++}
++
++SWIGINTERN PyObject *ImmutableNBestSentencePieceText_swiginit(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ return SWIG_Python_InitShadowInstance(args);
++}
++
+ SWIGINTERN PyObject *_wrap_new_SentencePieceProcessor(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *result = 0 ;
+@@ -3992,165 +4727,16 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_CalculateEntropy__SWIG_0(PyObj
+ float *arg4 = (float *) 0 ;
+ void *argp1 = 0 ;
+ int res1 = 0 ;
+- float val3 ;
+- int ecode3 = 0 ;
+- void *argp4 = 0 ;
+- int res4 = 0 ;
+- sentencepiece::util::Status result;
+-
+- if ((nobjs < 4) || (nobjs > 4)) SWIG_fail;
+- res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+- if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_CalculateEntropy" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+- }
+- arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+- {
+- const PyInputString ustring(swig_obj[1]);
+- if (!ustring.IsAvalable()) {
+- PyErr_SetString(PyExc_TypeError, "not a string");
+- SWIG_fail;
+- }
+- resultobj = ustring.input_type();
+- arg2 = ustring.str();
+- }
+- ecode3 = SWIG_AsVal_float(swig_obj[2], &val3);
+- if (!SWIG_IsOK(ecode3)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor_CalculateEntropy" "', argument " "3"" of type '" "float""'");
+- }
+- arg3 = static_cast< float >(val3);
+- res4 = SWIG_ConvertPtr(swig_obj[3], &argp4,SWIGTYPE_p_float, 0 | 0 );
+- if (!SWIG_IsOK(res4)) {
+- SWIG_exception_fail(SWIG_ArgError(res4), "in method '" "SentencePieceProcessor_CalculateEntropy" "', argument " "4"" of type '" "float *""'");
+- }
+- arg4 = reinterpret_cast< float * >(argp4);
+- {
+- try {
+- result = ((sentencepiece::SentencePieceProcessor const *)arg1)->CalculateEntropy(arg2,arg3,arg4);
+- ReleaseResultObject(resultobj);
+- }
+- catch (const sentencepiece::util::Status &status) {
+- SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
+- }
+- }
+- {
+- if (!(&result)->ok()) {
+- SWIG_exception(ToSwigError((&result)->code()), (&result)->ToString().c_str());
+- }
+- resultobj = SWIG_From_bool((&result)->ok());
+- }
+- return resultobj;
+-fail:
+- return NULL;
+-}
+-
+-
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_SampleEncodeAndScoreAsPieces(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+- PyObject *resultobj = 0;
+- sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- absl::string_view arg2 ;
+- int arg3 ;
+- float arg4 ;
+- bool arg5 ;
+- bool arg6 ;
+- void *argp1 = 0 ;
+- int res1 = 0 ;
+- int val3 ;
+- int ecode3 = 0 ;
+- float val4 ;
+- int ecode4 = 0 ;
+- bool val5 ;
+- int ecode5 = 0 ;
+- bool val6 ;
+- int ecode6 = 0 ;
+- PyObject *swig_obj[6] ;
+- std::vector< std::pair< std::vector< std::string >,float > > result;
+-
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_SampleEncodeAndScoreAsPieces", 6, 6, swig_obj)) SWIG_fail;
+- res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+- if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsPieces" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+- }
+- arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+- {
+- const PyInputString ustring(swig_obj[1]);
+- if (!ustring.IsAvalable()) {
+- PyErr_SetString(PyExc_TypeError, "not a string");
+- SWIG_fail;
+- }
+- resultobj = ustring.input_type();
+- arg2 = ustring.str();
+- }
+- ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
+- if (!SWIG_IsOK(ecode3)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsPieces" "', argument " "3"" of type '" "int""'");
+- }
+- arg3 = static_cast< int >(val3);
+- ecode4 = SWIG_AsVal_float(swig_obj[3], &val4);
+- if (!SWIG_IsOK(ecode4)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsPieces" "', argument " "4"" of type '" "float""'");
+- }
+- arg4 = static_cast< float >(val4);
+- ecode5 = SWIG_AsVal_bool(swig_obj[4], &val5);
+- if (!SWIG_IsOK(ecode5)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsPieces" "', argument " "5"" of type '" "bool""'");
+- }
+- arg5 = static_cast< bool >(val5);
+- ecode6 = SWIG_AsVal_bool(swig_obj[5], &val6);
+- if (!SWIG_IsOK(ecode6)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsPieces" "', argument " "6"" of type '" "bool""'");
+- }
+- arg6 = static_cast< bool >(val6);
+- {
+- try {
+- result = ((sentencepiece::SentencePieceProcessor const *)arg1)->SampleEncodeAndScoreAsPieces(arg2,arg3,arg4,arg5,arg6);
+- ReleaseResultObject(resultobj);
+- }
+- catch (const sentencepiece::util::Status &status) {
+- SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
+- }
+- }
+- {
+- PyObject *input_type = resultobj;
+- resultobj = PyList_New((&result)->size());
+- for (size_t i = 0; i < (&result)->size(); ++i) {
+- PyObject *obj = PyList_New(result[i].first.size());
+- for (size_t j = 0; j < result[i].first.size(); ++j) {
+- PyList_SET_ITEM(obj, j, MakePyOutputString(result[i].first[j], input_type));
+- }
+- PyList_SET_ITEM(resultobj, i, PyTuple_Pack(2, obj, PyFloat_FromDouble(static_cast<double>(result[i].second))));
+- }
+- }
+- return resultobj;
+-fail:
+- return NULL;
+-}
+-
+-
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor_SampleEncodeAndScoreAsIds(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+- PyObject *resultobj = 0;
+- sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+- absl::string_view arg2 ;
+- int arg3 ;
+- float arg4 ;
+- bool arg5 ;
+- bool arg6 ;
+- void *argp1 = 0 ;
+- int res1 = 0 ;
+- int val3 ;
++ float val3 ;
+ int ecode3 = 0 ;
+- float val4 ;
+- int ecode4 = 0 ;
+- bool val5 ;
+- int ecode5 = 0 ;
+- bool val6 ;
+- int ecode6 = 0 ;
+- PyObject *swig_obj[6] ;
+- std::vector< std::pair< std::vector< int >,float > > result;
++ void *argp4 = 0 ;
++ int res4 = 0 ;
++ sentencepiece::util::Status result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor_SampleEncodeAndScoreAsIds", 6, 6, swig_obj)) SWIG_fail;
++ if ((nobjs < 4) || (nobjs > 4)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsIds" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor_CalculateEntropy" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+ {
+@@ -4162,29 +4748,19 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_SampleEncodeAndScoreAsIds(PyOb
+ resultobj = ustring.input_type();
+ arg2 = ustring.str();
+ }
+- ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
++ ecode3 = SWIG_AsVal_float(swig_obj[2], &val3);
+ if (!SWIG_IsOK(ecode3)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsIds" "', argument " "3"" of type '" "int""'");
+- }
+- arg3 = static_cast< int >(val3);
+- ecode4 = SWIG_AsVal_float(swig_obj[3], &val4);
+- if (!SWIG_IsOK(ecode4)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsIds" "', argument " "4"" of type '" "float""'");
+- }
+- arg4 = static_cast< float >(val4);
+- ecode5 = SWIG_AsVal_bool(swig_obj[4], &val5);
+- if (!SWIG_IsOK(ecode5)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsIds" "', argument " "5"" of type '" "bool""'");
+- }
+- arg5 = static_cast< bool >(val5);
+- ecode6 = SWIG_AsVal_bool(swig_obj[5], &val6);
+- if (!SWIG_IsOK(ecode6)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor_SampleEncodeAndScoreAsIds" "', argument " "6"" of type '" "bool""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor_CalculateEntropy" "', argument " "3"" of type '" "float""'");
+ }
+- arg6 = static_cast< bool >(val6);
++ arg3 = static_cast< float >(val3);
++ res4 = SWIG_ConvertPtr(swig_obj[3], &argp4,SWIGTYPE_p_float, 0 | 0 );
++ if (!SWIG_IsOK(res4)) {
++ SWIG_exception_fail(SWIG_ArgError(res4), "in method '" "SentencePieceProcessor_CalculateEntropy" "', argument " "4"" of type '" "float *""'");
++ }
++ arg4 = reinterpret_cast< float * >(argp4);
+ {
+ try {
+- result = ((sentencepiece::SentencePieceProcessor const *)arg1)->SampleEncodeAndScoreAsIds(arg2,arg3,arg4,arg5,arg6);
++ result = ((sentencepiece::SentencePieceProcessor const *)arg1)->CalculateEntropy(arg2,arg3,arg4);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+@@ -4192,14 +4768,10 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor_SampleEncodeAndScoreAsIds(PyOb
+ }
+ }
+ {
+- resultobj = PyList_New((&result)->size());
+- for (size_t i = 0; i < (&result)->size(); ++i) {
+- PyObject *obj = PyList_New(result[i].first.size());
+- for (size_t j = 0; j < result[i].first.size(); ++j) {
+- PyList_SET_ITEM(obj, j, PyInt_FromLong(static_cast<long>(result[i].first[j])));
+- }
+- PyList_SET_ITEM(resultobj, i, PyTuple_Pack(2, obj, PyFloat_FromDouble(static_cast<double>(result[i].second))));
++ if (!(&result)->ok()) {
++ SWIG_exception(ToSwigError((&result)->code()), (&result)->ToString().c_str());
+ }
++ resultobj = SWIG_From_bool((&result)->ok());
+ }
+ return resultobj;
+ fail:
+@@ -5112,15 +5684,242 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsSerializedProto(PyObj
+ }
+ }
+ {
+- resultobj = MakePyOutputBytes(result);
++ resultobj = MakePyOutputBytes(result);
++ }
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsImmutableProto(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
++ absl::string_view arg2 ;
++ bool arg3 ;
++ int arg4 ;
++ float arg5 ;
++ bool arg6 ;
++ bool arg7 ;
++ bool arg8 ;
++ bool arg9 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ bool val3 ;
++ int ecode3 = 0 ;
++ int val4 ;
++ int ecode4 = 0 ;
++ float val5 ;
++ int ecode5 = 0 ;
++ bool val6 ;
++ int ecode6 = 0 ;
++ bool val7 ;
++ int ecode7 = 0 ;
++ bool val8 ;
++ int ecode8 = 0 ;
++ bool val9 ;
++ int ecode9 = 0 ;
++ PyObject *swig_obj[9] ;
++ sentencepiece::ImmutableSentencePieceText result;
++
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__EncodeAsImmutableProto", 9, 9, swig_obj)) SWIG_fail;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__EncodeAsImmutableProto" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
++ {
++ const PyInputString ustring(swig_obj[1]);
++ if (!ustring.IsAvalable()) {
++ PyErr_SetString(PyExc_TypeError, "not a string");
++ SWIG_fail;
++ }
++ resultobj = ustring.input_type();
++ arg2 = ustring.str();
++ }
++ ecode3 = SWIG_AsVal_bool(swig_obj[2], &val3);
++ if (!SWIG_IsOK(ecode3)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__EncodeAsImmutableProto" "', argument " "3"" of type '" "bool""'");
++ }
++ arg3 = static_cast< bool >(val3);
++ ecode4 = SWIG_AsVal_int(swig_obj[3], &val4);
++ if (!SWIG_IsOK(ecode4)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor__EncodeAsImmutableProto" "', argument " "4"" of type '" "int""'");
++ }
++ arg4 = static_cast< int >(val4);
++ ecode5 = SWIG_AsVal_float(swig_obj[4], &val5);
++ if (!SWIG_IsOK(ecode5)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor__EncodeAsImmutableProto" "', argument " "5"" of type '" "float""'");
++ }
++ arg5 = static_cast< float >(val5);
++ ecode6 = SWIG_AsVal_bool(swig_obj[5], &val6);
++ if (!SWIG_IsOK(ecode6)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor__EncodeAsImmutableProto" "', argument " "6"" of type '" "bool""'");
++ }
++ arg6 = static_cast< bool >(val6);
++ ecode7 = SWIG_AsVal_bool(swig_obj[6], &val7);
++ if (!SWIG_IsOK(ecode7)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode7), "in method '" "SentencePieceProcessor__EncodeAsImmutableProto" "', argument " "7"" of type '" "bool""'");
++ }
++ arg7 = static_cast< bool >(val7);
++ ecode8 = SWIG_AsVal_bool(swig_obj[7], &val8);
++ if (!SWIG_IsOK(ecode8)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode8), "in method '" "SentencePieceProcessor__EncodeAsImmutableProto" "', argument " "8"" of type '" "bool""'");
++ }
++ arg8 = static_cast< bool >(val8);
++ ecode9 = SWIG_AsVal_bool(swig_obj[8], &val9);
++ if (!SWIG_IsOK(ecode9)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode9), "in method '" "SentencePieceProcessor__EncodeAsImmutableProto" "', argument " "9"" of type '" "bool""'");
++ }
++ arg9 = static_cast< bool >(val9);
++ {
++ try {
++ result = sentencepiece_SentencePieceProcessor__EncodeAsImmutableProto((sentencepiece::SentencePieceProcessor const *)arg1,arg2,arg3,arg4,arg5,arg6,arg7,arg8,arg9);
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ resultobj = SWIG_NewPointerObj((new sentencepiece::ImmutableSentencePieceText(static_cast< const sentencepiece::ImmutableSentencePieceText& >(result))), SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText, SWIG_POINTER_OWN | 0 );
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsIdsBatch(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
++ std::vector< absl::string_view > *arg2 = 0 ;
++ int arg3 ;
++ bool arg4 ;
++ int arg5 ;
++ float arg6 ;
++ bool arg7 ;
++ bool arg8 ;
++ bool arg9 ;
++ bool arg10 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ int val3 ;
++ int ecode3 = 0 ;
++ bool val4 ;
++ int ecode4 = 0 ;
++ int val5 ;
++ int ecode5 = 0 ;
++ float val6 ;
++ int ecode6 = 0 ;
++ bool val7 ;
++ int ecode7 = 0 ;
++ bool val8 ;
++ int ecode8 = 0 ;
++ bool val9 ;
++ int ecode9 = 0 ;
++ bool val10 ;
++ int ecode10 = 0 ;
++ PyObject *swig_obj[10] ;
++ std::vector< std::vector< int > > result;
++
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__EncodeAsIdsBatch", 10, 10, swig_obj)) SWIG_fail;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__EncodeAsIdsBatch" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
++ {
++ std::vector<absl::string_view> *out = nullptr;
++ if (PyList_Check(swig_obj[1])) {
++ const size_t size = PyList_Size(swig_obj[1]);
++ out = new std::vector<absl::string_view>(size);
++ for (size_t i = 0; i < size; ++i) {
++ const PyInputString ustring(PyList_GetItem(swig_obj[1], i));
++ if (ustring.IsAvalable()) {
++ (*out)[i] = ustring.str();
++ } else {
++ PyErr_SetString(PyExc_TypeError, "list must contain strings");
++ SWIG_fail;
++ }
++ resultobj = ustring.input_type();
++ }
++ } else {
++ PyErr_SetString(PyExc_TypeError, "not a list");
++ SWIG_fail;
++ }
++ arg2 = out;
++ }
++ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
++ if (!SWIG_IsOK(ecode3)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__EncodeAsIdsBatch" "', argument " "3"" of type '" "int""'");
++ }
++ arg3 = static_cast< int >(val3);
++ ecode4 = SWIG_AsVal_bool(swig_obj[3], &val4);
++ if (!SWIG_IsOK(ecode4)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor__EncodeAsIdsBatch" "', argument " "4"" of type '" "bool""'");
++ }
++ arg4 = static_cast< bool >(val4);
++ ecode5 = SWIG_AsVal_int(swig_obj[4], &val5);
++ if (!SWIG_IsOK(ecode5)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor__EncodeAsIdsBatch" "', argument " "5"" of type '" "int""'");
++ }
++ arg5 = static_cast< int >(val5);
++ ecode6 = SWIG_AsVal_float(swig_obj[5], &val6);
++ if (!SWIG_IsOK(ecode6)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor__EncodeAsIdsBatch" "', argument " "6"" of type '" "float""'");
++ }
++ arg6 = static_cast< float >(val6);
++ ecode7 = SWIG_AsVal_bool(swig_obj[6], &val7);
++ if (!SWIG_IsOK(ecode7)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode7), "in method '" "SentencePieceProcessor__EncodeAsIdsBatch" "', argument " "7"" of type '" "bool""'");
++ }
++ arg7 = static_cast< bool >(val7);
++ ecode8 = SWIG_AsVal_bool(swig_obj[7], &val8);
++ if (!SWIG_IsOK(ecode8)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode8), "in method '" "SentencePieceProcessor__EncodeAsIdsBatch" "', argument " "8"" of type '" "bool""'");
++ }
++ arg8 = static_cast< bool >(val8);
++ ecode9 = SWIG_AsVal_bool(swig_obj[8], &val9);
++ if (!SWIG_IsOK(ecode9)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode9), "in method '" "SentencePieceProcessor__EncodeAsIdsBatch" "', argument " "9"" of type '" "bool""'");
++ }
++ arg9 = static_cast< bool >(val9);
++ ecode10 = SWIG_AsVal_bool(swig_obj[9], &val10);
++ if (!SWIG_IsOK(ecode10)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode10), "in method '" "SentencePieceProcessor__EncodeAsIdsBatch" "', argument " "10"" of type '" "bool""'");
++ }
++ arg10 = static_cast< bool >(val10);
++ {
++ try {
++ result = sentencepiece_SentencePieceProcessor__EncodeAsIdsBatch((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< absl::string_view > const &)*arg2,arg3,arg4,arg5,arg6,arg7,arg8,arg9,arg10);
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ {
++ resultobj = PyList_New((&result)->size());
++ for (size_t i = 0; i < (&result)->size(); ++i) {
++ PyObject *obj = PyList_New(result[i].size());
++ for (size_t j = 0; j < result[i].size(); ++j) {
++ PyList_SET_ITEM(obj, j, PyInt_FromLong(static_cast<long>(result[i][j])));
++ }
++ PyList_SET_ITEM(resultobj, i, obj);
++ }
++ }
++ {
++ delete arg2;
+ }
+ return resultobj;
+ fail:
++ {
++ delete arg2;
++ }
+ return NULL;
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsIdsBatch(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsPiecesBatch(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+ std::vector< absl::string_view > *arg2 = 0 ;
+@@ -5151,12 +5950,12 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsIdsBatch(PyObject *SW
+ bool val10 ;
+ int ecode10 = 0 ;
+ PyObject *swig_obj[10] ;
+- std::vector< std::vector< int > > result;
++ std::vector< std::vector< std::string > > result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__EncodeAsIdsBatch", 10, 10, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__EncodeAsPiecesBatch", 10, 10, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__EncodeAsIdsBatch" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__EncodeAsPiecesBatch" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+ {
+@@ -5182,47 +5981,47 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsIdsBatch(PyObject *SW
+ }
+ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
+ if (!SWIG_IsOK(ecode3)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__EncodeAsIdsBatch" "', argument " "3"" of type '" "int""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__EncodeAsPiecesBatch" "', argument " "3"" of type '" "int""'");
+ }
+ arg3 = static_cast< int >(val3);
+ ecode4 = SWIG_AsVal_bool(swig_obj[3], &val4);
+ if (!SWIG_IsOK(ecode4)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor__EncodeAsIdsBatch" "', argument " "4"" of type '" "bool""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor__EncodeAsPiecesBatch" "', argument " "4"" of type '" "bool""'");
+ }
+ arg4 = static_cast< bool >(val4);
+ ecode5 = SWIG_AsVal_int(swig_obj[4], &val5);
+ if (!SWIG_IsOK(ecode5)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor__EncodeAsIdsBatch" "', argument " "5"" of type '" "int""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor__EncodeAsPiecesBatch" "', argument " "5"" of type '" "int""'");
+ }
+ arg5 = static_cast< int >(val5);
+ ecode6 = SWIG_AsVal_float(swig_obj[5], &val6);
+ if (!SWIG_IsOK(ecode6)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor__EncodeAsIdsBatch" "', argument " "6"" of type '" "float""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor__EncodeAsPiecesBatch" "', argument " "6"" of type '" "float""'");
+ }
+ arg6 = static_cast< float >(val6);
+ ecode7 = SWIG_AsVal_bool(swig_obj[6], &val7);
+ if (!SWIG_IsOK(ecode7)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode7), "in method '" "SentencePieceProcessor__EncodeAsIdsBatch" "', argument " "7"" of type '" "bool""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode7), "in method '" "SentencePieceProcessor__EncodeAsPiecesBatch" "', argument " "7"" of type '" "bool""'");
+ }
+ arg7 = static_cast< bool >(val7);
+ ecode8 = SWIG_AsVal_bool(swig_obj[7], &val8);
+ if (!SWIG_IsOK(ecode8)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode8), "in method '" "SentencePieceProcessor__EncodeAsIdsBatch" "', argument " "8"" of type '" "bool""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode8), "in method '" "SentencePieceProcessor__EncodeAsPiecesBatch" "', argument " "8"" of type '" "bool""'");
+ }
+ arg8 = static_cast< bool >(val8);
+ ecode9 = SWIG_AsVal_bool(swig_obj[8], &val9);
+ if (!SWIG_IsOK(ecode9)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode9), "in method '" "SentencePieceProcessor__EncodeAsIdsBatch" "', argument " "9"" of type '" "bool""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode9), "in method '" "SentencePieceProcessor__EncodeAsPiecesBatch" "', argument " "9"" of type '" "bool""'");
+ }
+ arg9 = static_cast< bool >(val9);
+ ecode10 = SWIG_AsVal_bool(swig_obj[9], &val10);
+ if (!SWIG_IsOK(ecode10)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode10), "in method '" "SentencePieceProcessor__EncodeAsIdsBatch" "', argument " "10"" of type '" "bool""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode10), "in method '" "SentencePieceProcessor__EncodeAsPiecesBatch" "', argument " "10"" of type '" "bool""'");
+ }
+ arg10 = static_cast< bool >(val10);
+ {
+ try {
+- result = sentencepiece_SentencePieceProcessor__EncodeAsIdsBatch((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< absl::string_view > const &)*arg2,arg3,arg4,arg5,arg6,arg7,arg8,arg9,arg10);
++ result = sentencepiece_SentencePieceProcessor__EncodeAsPiecesBatch((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< absl::string_view > const &)*arg2,arg3,arg4,arg5,arg6,arg7,arg8,arg9,arg10);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+@@ -5230,11 +6029,12 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsIdsBatch(PyObject *SW
+ }
+ }
+ {
++ PyObject *input_type = resultobj;
+ resultobj = PyList_New((&result)->size());
+ for (size_t i = 0; i < (&result)->size(); ++i) {
+ PyObject *obj = PyList_New(result[i].size());
+ for (size_t j = 0; j < result[i].size(); ++j) {
+- PyList_SET_ITEM(obj, j, PyInt_FromLong(static_cast<long>(result[i][j])));
++ PyList_SET_ITEM(obj, j, MakePyOutputString(result[i][j], input_type));
+ }
+ PyList_SET_ITEM(resultobj, i, obj);
+ }
+@@ -5251,7 +6051,7 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsPiecesBatch(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsSerializedProtoBatch(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+ std::vector< absl::string_view > *arg2 = 0 ;
+@@ -5282,12 +6082,12 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsPiecesBatch(PyObject
+ bool val10 ;
+ int ecode10 = 0 ;
+ PyObject *swig_obj[10] ;
+- std::vector< std::vector< std::string > > result;
++ BytesArray result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__EncodeAsPiecesBatch", 10, 10, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__EncodeAsSerializedProtoBatch", 10, 10, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__EncodeAsPiecesBatch" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__EncodeAsSerializedProtoBatch" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+ {
+@@ -5313,47 +6113,47 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsPiecesBatch(PyObject
+ }
+ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
+ if (!SWIG_IsOK(ecode3)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__EncodeAsPiecesBatch" "', argument " "3"" of type '" "int""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__EncodeAsSerializedProtoBatch" "', argument " "3"" of type '" "int""'");
+ }
+ arg3 = static_cast< int >(val3);
+ ecode4 = SWIG_AsVal_bool(swig_obj[3], &val4);
+ if (!SWIG_IsOK(ecode4)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor__EncodeAsPiecesBatch" "', argument " "4"" of type '" "bool""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor__EncodeAsSerializedProtoBatch" "', argument " "4"" of type '" "bool""'");
+ }
+ arg4 = static_cast< bool >(val4);
+ ecode5 = SWIG_AsVal_int(swig_obj[4], &val5);
+ if (!SWIG_IsOK(ecode5)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor__EncodeAsPiecesBatch" "', argument " "5"" of type '" "int""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor__EncodeAsSerializedProtoBatch" "', argument " "5"" of type '" "int""'");
+ }
+ arg5 = static_cast< int >(val5);
+ ecode6 = SWIG_AsVal_float(swig_obj[5], &val6);
+ if (!SWIG_IsOK(ecode6)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor__EncodeAsPiecesBatch" "', argument " "6"" of type '" "float""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor__EncodeAsSerializedProtoBatch" "', argument " "6"" of type '" "float""'");
+ }
+ arg6 = static_cast< float >(val6);
+ ecode7 = SWIG_AsVal_bool(swig_obj[6], &val7);
+ if (!SWIG_IsOK(ecode7)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode7), "in method '" "SentencePieceProcessor__EncodeAsPiecesBatch" "', argument " "7"" of type '" "bool""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode7), "in method '" "SentencePieceProcessor__EncodeAsSerializedProtoBatch" "', argument " "7"" of type '" "bool""'");
+ }
+ arg7 = static_cast< bool >(val7);
+ ecode8 = SWIG_AsVal_bool(swig_obj[7], &val8);
+ if (!SWIG_IsOK(ecode8)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode8), "in method '" "SentencePieceProcessor__EncodeAsPiecesBatch" "', argument " "8"" of type '" "bool""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode8), "in method '" "SentencePieceProcessor__EncodeAsSerializedProtoBatch" "', argument " "8"" of type '" "bool""'");
+ }
+ arg8 = static_cast< bool >(val8);
+ ecode9 = SWIG_AsVal_bool(swig_obj[8], &val9);
+ if (!SWIG_IsOK(ecode9)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode9), "in method '" "SentencePieceProcessor__EncodeAsPiecesBatch" "', argument " "9"" of type '" "bool""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode9), "in method '" "SentencePieceProcessor__EncodeAsSerializedProtoBatch" "', argument " "9"" of type '" "bool""'");
+ }
+ arg9 = static_cast< bool >(val9);
+ ecode10 = SWIG_AsVal_bool(swig_obj[9], &val10);
+ if (!SWIG_IsOK(ecode10)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode10), "in method '" "SentencePieceProcessor__EncodeAsPiecesBatch" "', argument " "10"" of type '" "bool""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode10), "in method '" "SentencePieceProcessor__EncodeAsSerializedProtoBatch" "', argument " "10"" of type '" "bool""'");
+ }
+ arg10 = static_cast< bool >(val10);
+ {
+ try {
+- result = sentencepiece_SentencePieceProcessor__EncodeAsPiecesBatch((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< absl::string_view > const &)*arg2,arg3,arg4,arg5,arg6,arg7,arg8,arg9,arg10);
++ result = sentencepiece_SentencePieceProcessor__EncodeAsSerializedProtoBatch((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< absl::string_view > const &)*arg2,arg3,arg4,arg5,arg6,arg7,arg8,arg9,arg10);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+@@ -5361,14 +6161,9 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsPiecesBatch(PyObject
+ }
+ }
+ {
+- PyObject *input_type = resultobj;
+ resultobj = PyList_New((&result)->size());
+ for (size_t i = 0; i < (&result)->size(); ++i) {
+- PyObject *obj = PyList_New(result[i].size());
+- for (size_t j = 0; j < result[i].size(); ++j) {
+- PyList_SET_ITEM(obj, j, MakePyOutputString(result[i][j], input_type));
+- }
+- PyList_SET_ITEM(resultobj, i, obj);
++ PyList_SET_ITEM(resultobj, i, MakePyOutputBytes(result[i]));
+ }
+ }
+ {
+@@ -5383,7 +6178,7 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsSerializedProtoBatch(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsImmutableProtoBatch(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+ std::vector< absl::string_view > *arg2 = 0 ;
+@@ -5414,12 +6209,12 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsSerializedProtoBatch(
+ bool val10 ;
+ int ecode10 = 0 ;
+ PyObject *swig_obj[10] ;
+- BytesArray result;
++ SwigValueWrapper< std::vector< sentencepiece::ImmutableSentencePieceText > > result;
+
+- if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__EncodeAsSerializedProtoBatch", 10, 10, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__EncodeAsImmutableProtoBatch", 10, 10, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__EncodeAsSerializedProtoBatch" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__EncodeAsImmutableProtoBatch" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
+ {
+@@ -5445,47 +6240,47 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsSerializedProtoBatch(
+ }
+ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
+ if (!SWIG_IsOK(ecode3)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__EncodeAsSerializedProtoBatch" "', argument " "3"" of type '" "int""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__EncodeAsImmutableProtoBatch" "', argument " "3"" of type '" "int""'");
+ }
+ arg3 = static_cast< int >(val3);
+ ecode4 = SWIG_AsVal_bool(swig_obj[3], &val4);
+ if (!SWIG_IsOK(ecode4)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor__EncodeAsSerializedProtoBatch" "', argument " "4"" of type '" "bool""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor__EncodeAsImmutableProtoBatch" "', argument " "4"" of type '" "bool""'");
+ }
+ arg4 = static_cast< bool >(val4);
+ ecode5 = SWIG_AsVal_int(swig_obj[4], &val5);
+ if (!SWIG_IsOK(ecode5)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor__EncodeAsSerializedProtoBatch" "', argument " "5"" of type '" "int""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor__EncodeAsImmutableProtoBatch" "', argument " "5"" of type '" "int""'");
+ }
+ arg5 = static_cast< int >(val5);
+ ecode6 = SWIG_AsVal_float(swig_obj[5], &val6);
+ if (!SWIG_IsOK(ecode6)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor__EncodeAsSerializedProtoBatch" "', argument " "6"" of type '" "float""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor__EncodeAsImmutableProtoBatch" "', argument " "6"" of type '" "float""'");
+ }
+ arg6 = static_cast< float >(val6);
+ ecode7 = SWIG_AsVal_bool(swig_obj[6], &val7);
+ if (!SWIG_IsOK(ecode7)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode7), "in method '" "SentencePieceProcessor__EncodeAsSerializedProtoBatch" "', argument " "7"" of type '" "bool""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode7), "in method '" "SentencePieceProcessor__EncodeAsImmutableProtoBatch" "', argument " "7"" of type '" "bool""'");
+ }
+ arg7 = static_cast< bool >(val7);
+ ecode8 = SWIG_AsVal_bool(swig_obj[7], &val8);
+ if (!SWIG_IsOK(ecode8)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode8), "in method '" "SentencePieceProcessor__EncodeAsSerializedProtoBatch" "', argument " "8"" of type '" "bool""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode8), "in method '" "SentencePieceProcessor__EncodeAsImmutableProtoBatch" "', argument " "8"" of type '" "bool""'");
+ }
+ arg8 = static_cast< bool >(val8);
+ ecode9 = SWIG_AsVal_bool(swig_obj[8], &val9);
+ if (!SWIG_IsOK(ecode9)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode9), "in method '" "SentencePieceProcessor__EncodeAsSerializedProtoBatch" "', argument " "9"" of type '" "bool""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode9), "in method '" "SentencePieceProcessor__EncodeAsImmutableProtoBatch" "', argument " "9"" of type '" "bool""'");
+ }
+ arg9 = static_cast< bool >(val9);
+ ecode10 = SWIG_AsVal_bool(swig_obj[9], &val10);
+ if (!SWIG_IsOK(ecode10)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode10), "in method '" "SentencePieceProcessor__EncodeAsSerializedProtoBatch" "', argument " "10"" of type '" "bool""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode10), "in method '" "SentencePieceProcessor__EncodeAsImmutableProtoBatch" "', argument " "10"" of type '" "bool""'");
+ }
+ arg10 = static_cast< bool >(val10);
+ {
+ try {
+- result = sentencepiece_SentencePieceProcessor__EncodeAsSerializedProtoBatch((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< absl::string_view > const &)*arg2,arg3,arg4,arg5,arg6,arg7,arg8,arg9,arg10);
++ result = sentencepiece_SentencePieceProcessor__EncodeAsImmutableProtoBatch((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< absl::string_view > const &)*arg2,arg3,arg4,arg5,arg6,arg7,arg8,arg9,arg10);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+@@ -5495,7 +6290,8 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__EncodeAsSerializedProtoBatch(
+ {
+ resultobj = PyList_New((&result)->size());
+ for (size_t i = 0; i < (&result)->size(); ++i) {
+- PyList_SET_ITEM(resultobj, i, MakePyOutputBytes(result[i]));
++ PyObject *obj = SWIG_NewPointerObj(new sentencepiece::ImmutableSentencePieceText((&result)->at(i)), SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText, SWIG_POINTER_OWN | 0);
++ PyList_SET_ITEM(resultobj, i, obj);
+ }
+ }
+ {
+@@ -5750,6 +6546,121 @@ fail:
+ }
+
+
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodeIdsAsImmutableProto(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
++ std::vector< int > *arg2 = 0 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ PyObject *swig_obj[2] ;
++ sentencepiece::ImmutableSentencePieceText result;
++
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__DecodeIdsAsImmutableProto", 2, 2, swig_obj)) SWIG_fail;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__DecodeIdsAsImmutableProto" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
++ {
++ std::vector<int> *out = nullptr;
++ if (PyList_Check(swig_obj[1])) {
++ const size_t size = PyList_Size(swig_obj[1]);
++ out = new std::vector<int>(size);
++ for (size_t i = 0; i < size; ++i) {
++ PyObject *o = PyList_GetItem(swig_obj[1], i);
++ if (PyInt_Check(o)) {
++ (*out)[i] = static_cast<int>(PyInt_AsLong(o));
++ } else {
++ PyErr_SetString(PyExc_TypeError,"list must contain integers");
++ SWIG_fail;
++ }
++ }
++ } else {
++ PyErr_SetString(PyExc_TypeError,"not a list");
++ SWIG_fail;
++ }
++ arg2 = out;
++ }
++ {
++ try {
++ result = sentencepiece_SentencePieceProcessor__DecodeIdsAsImmutableProto((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< int > const &)*arg2);
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ resultobj = SWIG_NewPointerObj((new sentencepiece::ImmutableSentencePieceText(static_cast< const sentencepiece::ImmutableSentencePieceText& >(result))), SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText, SWIG_POINTER_OWN | 0 );
++ {
++ delete arg2;
++ }
++ return resultobj;
++fail:
++ {
++ delete arg2;
++ }
++ return NULL;
++}
++
++
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodePiecesAsImmutableProto(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
++ std::vector< absl::string_view > *arg2 = 0 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ PyObject *swig_obj[2] ;
++ sentencepiece::ImmutableSentencePieceText result;
++
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__DecodePiecesAsImmutableProto", 2, 2, swig_obj)) SWIG_fail;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__DecodePiecesAsImmutableProto" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
++ {
++ std::vector<absl::string_view> *out = nullptr;
++ if (PyList_Check(swig_obj[1])) {
++ const size_t size = PyList_Size(swig_obj[1]);
++ out = new std::vector<absl::string_view>(size);
++ for (size_t i = 0; i < size; ++i) {
++ const PyInputString ustring(PyList_GetItem(swig_obj[1], i));
++ if (ustring.IsAvalable()) {
++ (*out)[i] = ustring.str();
++ } else {
++ PyErr_SetString(PyExc_TypeError, "list must contain strings");
++ SWIG_fail;
++ }
++ resultobj = ustring.input_type();
++ }
++ } else {
++ PyErr_SetString(PyExc_TypeError, "not a list");
++ SWIG_fail;
++ }
++ arg2 = out;
++ }
++ {
++ try {
++ result = sentencepiece_SentencePieceProcessor__DecodePiecesAsImmutableProto((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< absl::string_view > const &)*arg2);
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ resultobj = SWIG_NewPointerObj((new sentencepiece::ImmutableSentencePieceText(static_cast< const sentencepiece::ImmutableSentencePieceText& >(result))), SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText, SWIG_POINTER_OWN | 0 );
++ {
++ delete arg2;
++ }
++ return resultobj;
++fail:
++ {
++ delete arg2;
++ }
++ return NULL;
++}
++
++
+ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodeIdsBatch(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+@@ -6043,7 +6954,82 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodePiecesAsSerializedProto
+ arg3 = static_cast< int >(val3);
+ {
+ try {
+- result = sentencepiece_SentencePieceProcessor__DecodePiecesAsSerializedProtoBatch((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< std::vector< absl::string_view > > const &)*arg2,arg3);
++ result = sentencepiece_SentencePieceProcessor__DecodePiecesAsSerializedProtoBatch((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< std::vector< absl::string_view > > const &)*arg2,arg3);
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ {
++ resultobj = PyList_New((&result)->size());
++ for (size_t i = 0; i < (&result)->size(); ++i) {
++ PyList_SET_ITEM(resultobj, i, MakePyOutputBytes(result[i]));
++ }
++ }
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodePiecesAsImmutableProtoBatch(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
++ std::vector< std::vector< absl::string_view > > *arg2 = 0 ;
++ int arg3 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ int val3 ;
++ int ecode3 = 0 ;
++ PyObject *swig_obj[3] ;
++ SwigValueWrapper< std::vector< sentencepiece::ImmutableSentencePieceText > > result;
++
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__DecodePiecesAsImmutableProtoBatch", 3, 3, swig_obj)) SWIG_fail;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__DecodePiecesAsImmutableProtoBatch" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
++ {
++ std::vector<std::vector<absl::string_view>> *out = nullptr;
++ if (PyList_Check(swig_obj[1])) {
++ const size_t size = PyList_Size(swig_obj[1]);
++ out = new std::vector<std::vector<absl::string_view>>(size);
++ for (size_t i = 0; i < size; ++i) {
++ PyObject *o = PyList_GetItem(swig_obj[1], i);
++ if (PyList_Check(o)) {
++ const size_t size2 = PyList_Size(o);
++ (*out)[i].resize(size2);
++ for (size_t j = 0; j < size2; ++j) {
++ const PyInputString ustring(PyList_GetItem(o, j));
++ if (ustring.IsAvalable()) {
++ (*out)[i][j] = ustring.str();
++ } else {
++ PyErr_SetString(PyExc_TypeError,"list must contain strings");
++ SWIG_fail;
++ }
++ resultobj = ustring.input_type();
++ }
++ } else {
++ PyErr_SetString(PyExc_TypeError,"not a list");
++ SWIG_fail;
++ }
++ }
++ } else {
++ PyErr_SetString(PyExc_TypeError,"not a list");
++ SWIG_fail;
++ }
++ arg2 = out;
++ }
++ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
++ if (!SWIG_IsOK(ecode3)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__DecodePiecesAsImmutableProtoBatch" "', argument " "3"" of type '" "int""'");
++ }
++ arg3 = static_cast< int >(val3);
++ {
++ try {
++ result = sentencepiece_SentencePieceProcessor__DecodePiecesAsImmutableProtoBatch((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< std::vector< absl::string_view > > const &)*arg2,arg3);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+@@ -6053,7 +7039,8 @@ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodePiecesAsSerializedProto
+ {
+ resultobj = PyList_New((&result)->size());
+ for (size_t i = 0; i < (&result)->size(); ++i) {
+- PyList_SET_ITEM(resultobj, i, MakePyOutputBytes(result[i]));
++ PyObject *obj = SWIG_NewPointerObj(new sentencepiece::ImmutableSentencePieceText((&result)->at(i)), SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText, SWIG_POINTER_OWN | 0);
++ PyList_SET_ITEM(resultobj, i, obj);
+ }
+ }
+ return resultobj;
+@@ -6323,6 +7310,86 @@ fail:
+ }
+
+
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__NBestEncodeAsImmutableProto(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
++ absl::string_view arg2 ;
++ int arg3 ;
++ bool arg4 ;
++ bool arg5 ;
++ bool arg6 ;
++ bool arg7 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ int val3 ;
++ int ecode3 = 0 ;
++ bool val4 ;
++ int ecode4 = 0 ;
++ bool val5 ;
++ int ecode5 = 0 ;
++ bool val6 ;
++ int ecode6 = 0 ;
++ bool val7 ;
++ int ecode7 = 0 ;
++ PyObject *swig_obj[7] ;
++ sentencepiece::ImmutableNBestSentencePieceText result;
++
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__NBestEncodeAsImmutableProto", 7, 7, swig_obj)) SWIG_fail;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__NBestEncodeAsImmutableProto" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
++ {
++ const PyInputString ustring(swig_obj[1]);
++ if (!ustring.IsAvalable()) {
++ PyErr_SetString(PyExc_TypeError, "not a string");
++ SWIG_fail;
++ }
++ resultobj = ustring.input_type();
++ arg2 = ustring.str();
++ }
++ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
++ if (!SWIG_IsOK(ecode3)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__NBestEncodeAsImmutableProto" "', argument " "3"" of type '" "int""'");
++ }
++ arg3 = static_cast< int >(val3);
++ ecode4 = SWIG_AsVal_bool(swig_obj[3], &val4);
++ if (!SWIG_IsOK(ecode4)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor__NBestEncodeAsImmutableProto" "', argument " "4"" of type '" "bool""'");
++ }
++ arg4 = static_cast< bool >(val4);
++ ecode5 = SWIG_AsVal_bool(swig_obj[4], &val5);
++ if (!SWIG_IsOK(ecode5)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor__NBestEncodeAsImmutableProto" "', argument " "5"" of type '" "bool""'");
++ }
++ arg5 = static_cast< bool >(val5);
++ ecode6 = SWIG_AsVal_bool(swig_obj[5], &val6);
++ if (!SWIG_IsOK(ecode6)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor__NBestEncodeAsImmutableProto" "', argument " "6"" of type '" "bool""'");
++ }
++ arg6 = static_cast< bool >(val6);
++ ecode7 = SWIG_AsVal_bool(swig_obj[6], &val7);
++ if (!SWIG_IsOK(ecode7)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode7), "in method '" "SentencePieceProcessor__NBestEncodeAsImmutableProto" "', argument " "7"" of type '" "bool""'");
++ }
++ arg7 = static_cast< bool >(val7);
++ {
++ try {
++ result = sentencepiece_SentencePieceProcessor__NBestEncodeAsImmutableProto((sentencepiece::SentencePieceProcessor const *)arg1,arg2,arg3,arg4,arg5,arg6,arg7);
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ resultobj = SWIG_NewPointerObj((new sentencepiece::ImmutableNBestSentencePieceText(static_cast< const sentencepiece::ImmutableNBestSentencePieceText& >(result))), SWIGTYPE_p_sentencepiece__ImmutableNBestSentencePieceText, SWIG_POINTER_OWN | 0 );
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
+ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__SampleEncodeAndScoreAsIds(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+@@ -6550,6 +7617,216 @@ fail:
+ }
+
+
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__SampleEncodeAndScoreAsSerializedProto(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
++ absl::string_view arg2 ;
++ int arg3 ;
++ float arg4 ;
++ bool arg5 ;
++ bool arg6 ;
++ bool arg7 ;
++ bool arg8 ;
++ bool arg9 ;
++ bool arg10 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ int val3 ;
++ int ecode3 = 0 ;
++ float val4 ;
++ int ecode4 = 0 ;
++ bool val5 ;
++ int ecode5 = 0 ;
++ bool val6 ;
++ int ecode6 = 0 ;
++ bool val7 ;
++ int ecode7 = 0 ;
++ bool val8 ;
++ int ecode8 = 0 ;
++ bool val9 ;
++ int ecode9 = 0 ;
++ bool val10 ;
++ int ecode10 = 0 ;
++ PyObject *swig_obj[10] ;
++ sentencepiece::util::bytes result;
++
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__SampleEncodeAndScoreAsSerializedProto", 10, 10, swig_obj)) SWIG_fail;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__SampleEncodeAndScoreAsSerializedProto" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
++ {
++ const PyInputString ustring(swig_obj[1]);
++ if (!ustring.IsAvalable()) {
++ PyErr_SetString(PyExc_TypeError, "not a string");
++ SWIG_fail;
++ }
++ resultobj = ustring.input_type();
++ arg2 = ustring.str();
++ }
++ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
++ if (!SWIG_IsOK(ecode3)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__SampleEncodeAndScoreAsSerializedProto" "', argument " "3"" of type '" "int""'");
++ }
++ arg3 = static_cast< int >(val3);
++ ecode4 = SWIG_AsVal_float(swig_obj[3], &val4);
++ if (!SWIG_IsOK(ecode4)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor__SampleEncodeAndScoreAsSerializedProto" "', argument " "4"" of type '" "float""'");
++ }
++ arg4 = static_cast< float >(val4);
++ ecode5 = SWIG_AsVal_bool(swig_obj[4], &val5);
++ if (!SWIG_IsOK(ecode5)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor__SampleEncodeAndScoreAsSerializedProto" "', argument " "5"" of type '" "bool""'");
++ }
++ arg5 = static_cast< bool >(val5);
++ ecode6 = SWIG_AsVal_bool(swig_obj[5], &val6);
++ if (!SWIG_IsOK(ecode6)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor__SampleEncodeAndScoreAsSerializedProto" "', argument " "6"" of type '" "bool""'");
++ }
++ arg6 = static_cast< bool >(val6);
++ ecode7 = SWIG_AsVal_bool(swig_obj[6], &val7);
++ if (!SWIG_IsOK(ecode7)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode7), "in method '" "SentencePieceProcessor__SampleEncodeAndScoreAsSerializedProto" "', argument " "7"" of type '" "bool""'");
++ }
++ arg7 = static_cast< bool >(val7);
++ ecode8 = SWIG_AsVal_bool(swig_obj[7], &val8);
++ if (!SWIG_IsOK(ecode8)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode8), "in method '" "SentencePieceProcessor__SampleEncodeAndScoreAsSerializedProto" "', argument " "8"" of type '" "bool""'");
++ }
++ arg8 = static_cast< bool >(val8);
++ ecode9 = SWIG_AsVal_bool(swig_obj[8], &val9);
++ if (!SWIG_IsOK(ecode9)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode9), "in method '" "SentencePieceProcessor__SampleEncodeAndScoreAsSerializedProto" "', argument " "9"" of type '" "bool""'");
++ }
++ arg9 = static_cast< bool >(val9);
++ ecode10 = SWIG_AsVal_bool(swig_obj[9], &val10);
++ if (!SWIG_IsOK(ecode10)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode10), "in method '" "SentencePieceProcessor__SampleEncodeAndScoreAsSerializedProto" "', argument " "10"" of type '" "bool""'");
++ }
++ arg10 = static_cast< bool >(val10);
++ {
++ try {
++ result = sentencepiece_SentencePieceProcessor__SampleEncodeAndScoreAsSerializedProto((sentencepiece::SentencePieceProcessor const *)arg1,arg2,arg3,arg4,arg5,arg6,arg7,arg8,arg9,arg10);
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ {
++ resultobj = MakePyOutputBytes(result);
++ }
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__SampleEncodeAndScoreAsImmutableProto(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
++ absl::string_view arg2 ;
++ int arg3 ;
++ float arg4 ;
++ bool arg5 ;
++ bool arg6 ;
++ bool arg7 ;
++ bool arg8 ;
++ bool arg9 ;
++ bool arg10 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ int val3 ;
++ int ecode3 = 0 ;
++ float val4 ;
++ int ecode4 = 0 ;
++ bool val5 ;
++ int ecode5 = 0 ;
++ bool val6 ;
++ int ecode6 = 0 ;
++ bool val7 ;
++ int ecode7 = 0 ;
++ bool val8 ;
++ int ecode8 = 0 ;
++ bool val9 ;
++ int ecode9 = 0 ;
++ bool val10 ;
++ int ecode10 = 0 ;
++ PyObject *swig_obj[10] ;
++ sentencepiece::ImmutableNBestSentencePieceText result;
++
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__SampleEncodeAndScoreAsImmutableProto", 10, 10, swig_obj)) SWIG_fail;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__SampleEncodeAndScoreAsImmutableProto" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
++ {
++ const PyInputString ustring(swig_obj[1]);
++ if (!ustring.IsAvalable()) {
++ PyErr_SetString(PyExc_TypeError, "not a string");
++ SWIG_fail;
++ }
++ resultobj = ustring.input_type();
++ arg2 = ustring.str();
++ }
++ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
++ if (!SWIG_IsOK(ecode3)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__SampleEncodeAndScoreAsImmutableProto" "', argument " "3"" of type '" "int""'");
++ }
++ arg3 = static_cast< int >(val3);
++ ecode4 = SWIG_AsVal_float(swig_obj[3], &val4);
++ if (!SWIG_IsOK(ecode4)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode4), "in method '" "SentencePieceProcessor__SampleEncodeAndScoreAsImmutableProto" "', argument " "4"" of type '" "float""'");
++ }
++ arg4 = static_cast< float >(val4);
++ ecode5 = SWIG_AsVal_bool(swig_obj[4], &val5);
++ if (!SWIG_IsOK(ecode5)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode5), "in method '" "SentencePieceProcessor__SampleEncodeAndScoreAsImmutableProto" "', argument " "5"" of type '" "bool""'");
++ }
++ arg5 = static_cast< bool >(val5);
++ ecode6 = SWIG_AsVal_bool(swig_obj[5], &val6);
++ if (!SWIG_IsOK(ecode6)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode6), "in method '" "SentencePieceProcessor__SampleEncodeAndScoreAsImmutableProto" "', argument " "6"" of type '" "bool""'");
++ }
++ arg6 = static_cast< bool >(val6);
++ ecode7 = SWIG_AsVal_bool(swig_obj[6], &val7);
++ if (!SWIG_IsOK(ecode7)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode7), "in method '" "SentencePieceProcessor__SampleEncodeAndScoreAsImmutableProto" "', argument " "7"" of type '" "bool""'");
++ }
++ arg7 = static_cast< bool >(val7);
++ ecode8 = SWIG_AsVal_bool(swig_obj[7], &val8);
++ if (!SWIG_IsOK(ecode8)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode8), "in method '" "SentencePieceProcessor__SampleEncodeAndScoreAsImmutableProto" "', argument " "8"" of type '" "bool""'");
++ }
++ arg8 = static_cast< bool >(val8);
++ ecode9 = SWIG_AsVal_bool(swig_obj[8], &val9);
++ if (!SWIG_IsOK(ecode9)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode9), "in method '" "SentencePieceProcessor__SampleEncodeAndScoreAsImmutableProto" "', argument " "9"" of type '" "bool""'");
++ }
++ arg9 = static_cast< bool >(val9);
++ ecode10 = SWIG_AsVal_bool(swig_obj[9], &val10);
++ if (!SWIG_IsOK(ecode10)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode10), "in method '" "SentencePieceProcessor__SampleEncodeAndScoreAsImmutableProto" "', argument " "10"" of type '" "bool""'");
++ }
++ arg10 = static_cast< bool >(val10);
++ {
++ try {
++ result = sentencepiece_SentencePieceProcessor__SampleEncodeAndScoreAsImmutableProto((sentencepiece::SentencePieceProcessor const *)arg1,arg2,arg3,arg4,arg5,arg6,arg7,arg8,arg9,arg10);
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ resultobj = SWIG_NewPointerObj((new sentencepiece::ImmutableNBestSentencePieceText(static_cast< const sentencepiece::ImmutableNBestSentencePieceText& >(result))), SWIGTYPE_p_sentencepiece__ImmutableNBestSentencePieceText, SWIG_POINTER_OWN | 0 );
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
+ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__CalculateEntropy(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+@@ -7009,6 +8286,31 @@ SWIGINTERN PyObject *SentencePieceTrainer_swigregister(PyObject *SWIGUNUSEDPARM(
+
+ static PyMethodDef SwigMethods[] = {
+ { "SWIG_PyInstanceMethod_New", SWIG_PyInstanceMethod_New, METH_O, NULL},
++ { "new_ImmutableSentencePieceText_ImmutableSentencePiece", _wrap_new_ImmutableSentencePieceText_ImmutableSentencePiece, METH_NOARGS, NULL},
++ { "delete_ImmutableSentencePieceText_ImmutableSentencePiece", _wrap_delete_ImmutableSentencePieceText_ImmutableSentencePiece, METH_O, NULL},
++ { "ImmutableSentencePieceText_ImmutableSentencePiece_piece", _wrap_ImmutableSentencePieceText_ImmutableSentencePiece_piece, METH_O, NULL},
++ { "ImmutableSentencePieceText_ImmutableSentencePiece_surface", _wrap_ImmutableSentencePieceText_ImmutableSentencePiece_surface, METH_O, NULL},
++ { "ImmutableSentencePieceText_ImmutableSentencePiece_id", _wrap_ImmutableSentencePieceText_ImmutableSentencePiece_id, METH_O, NULL},
++ { "ImmutableSentencePieceText_ImmutableSentencePiece_begin", _wrap_ImmutableSentencePieceText_ImmutableSentencePiece_begin, METH_O, NULL},
++ { "ImmutableSentencePieceText_ImmutableSentencePiece_end", _wrap_ImmutableSentencePieceText_ImmutableSentencePiece_end, METH_O, NULL},
++ { "ImmutableSentencePieceText_ImmutableSentencePiece_swigregister", ImmutableSentencePieceText_ImmutableSentencePiece_swigregister, METH_O, NULL},
++ { "ImmutableSentencePieceText_ImmutableSentencePiece_swiginit", ImmutableSentencePieceText_ImmutableSentencePiece_swiginit, METH_VARARGS, NULL},
++ { "new_ImmutableSentencePieceText", _wrap_new_ImmutableSentencePieceText, METH_NOARGS, NULL},
++ { "delete_ImmutableSentencePieceText", _wrap_delete_ImmutableSentencePieceText, METH_O, NULL},
++ { "ImmutableSentencePieceText_pieces_size", _wrap_ImmutableSentencePieceText_pieces_size, METH_O, NULL},
++ { "ImmutableSentencePieceText_text", _wrap_ImmutableSentencePieceText_text, METH_O, NULL},
++ { "ImmutableSentencePieceText_score", _wrap_ImmutableSentencePieceText_score, METH_O, NULL},
++ { "ImmutableSentencePieceText_SerializeAsString", _wrap_ImmutableSentencePieceText_SerializeAsString, METH_O, NULL},
++ { "ImmutableSentencePieceText_pieces", _wrap_ImmutableSentencePieceText_pieces, METH_VARARGS, NULL},
++ { "ImmutableSentencePieceText_swigregister", ImmutableSentencePieceText_swigregister, METH_O, NULL},
++ { "ImmutableSentencePieceText_swiginit", ImmutableSentencePieceText_swiginit, METH_VARARGS, NULL},
++ { "new_ImmutableNBestSentencePieceText", _wrap_new_ImmutableNBestSentencePieceText, METH_NOARGS, NULL},
++ { "delete_ImmutableNBestSentencePieceText", _wrap_delete_ImmutableNBestSentencePieceText, METH_O, NULL},
++ { "ImmutableNBestSentencePieceText_nbests_size", _wrap_ImmutableNBestSentencePieceText_nbests_size, METH_O, NULL},
++ { "ImmutableNBestSentencePieceText_SerializeAsString", _wrap_ImmutableNBestSentencePieceText_SerializeAsString, METH_O, NULL},
++ { "ImmutableNBestSentencePieceText_nbests", _wrap_ImmutableNBestSentencePieceText_nbests, METH_VARARGS, NULL},
++ { "ImmutableNBestSentencePieceText_swigregister", ImmutableNBestSentencePieceText_swigregister, METH_O, NULL},
++ { "ImmutableNBestSentencePieceText_swiginit", ImmutableNBestSentencePieceText_swiginit, METH_VARARGS, NULL},
+ { "new_SentencePieceProcessor", _wrap_new_SentencePieceProcessor, METH_NOARGS, NULL},
+ { "delete_SentencePieceProcessor", _wrap_delete_SentencePieceProcessor, METH_O, NULL},
+ { "SentencePieceProcessor_LoadFromSerializedProto", _wrap_SentencePieceProcessor_LoadFromSerializedProto, METH_VARARGS, NULL},
+@@ -7017,8 +8319,6 @@ static PyMethodDef SwigMethods[] = {
+ { "SentencePieceProcessor_SetVocabulary", _wrap_SentencePieceProcessor_SetVocabulary, METH_VARARGS, NULL},
+ { "SentencePieceProcessor_ResetVocabulary", _wrap_SentencePieceProcessor_ResetVocabulary, METH_O, NULL},
+ { "SentencePieceProcessor_LoadVocabulary", _wrap_SentencePieceProcessor_LoadVocabulary, METH_VARARGS, NULL},
+- { "SentencePieceProcessor_SampleEncodeAndScoreAsPieces", _wrap_SentencePieceProcessor_SampleEncodeAndScoreAsPieces, METH_VARARGS, NULL},
+- { "SentencePieceProcessor_SampleEncodeAndScoreAsIds", _wrap_SentencePieceProcessor_SampleEncodeAndScoreAsIds, METH_VARARGS, NULL},
+ { "SentencePieceProcessor_CalculateEntropy", _wrap_SentencePieceProcessor_CalculateEntropy, METH_VARARGS, NULL},
+ { "SentencePieceProcessor_GetPieceSize", _wrap_SentencePieceProcessor_GetPieceSize, METH_O, NULL},
+ { "SentencePieceProcessor_PieceToId", _wrap_SentencePieceProcessor_PieceToId, METH_VARARGS, NULL},
+@@ -7037,22 +8337,30 @@ static PyMethodDef SwigMethods[] = {
+ { "SentencePieceProcessor__EncodeAsIds", _wrap_SentencePieceProcessor__EncodeAsIds, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__EncodeAsPieces", _wrap_SentencePieceProcessor__EncodeAsPieces, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__EncodeAsSerializedProto", _wrap_SentencePieceProcessor__EncodeAsSerializedProto, METH_VARARGS, NULL},
++ { "SentencePieceProcessor__EncodeAsImmutableProto", _wrap_SentencePieceProcessor__EncodeAsImmutableProto, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__EncodeAsIdsBatch", _wrap_SentencePieceProcessor__EncodeAsIdsBatch, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__EncodeAsPiecesBatch", _wrap_SentencePieceProcessor__EncodeAsPiecesBatch, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__EncodeAsSerializedProtoBatch", _wrap_SentencePieceProcessor__EncodeAsSerializedProtoBatch, METH_VARARGS, NULL},
++ { "SentencePieceProcessor__EncodeAsImmutableProtoBatch", _wrap_SentencePieceProcessor__EncodeAsImmutableProtoBatch, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__DecodeIds", _wrap_SentencePieceProcessor__DecodeIds, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__DecodePieces", _wrap_SentencePieceProcessor__DecodePieces, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__DecodeIdsAsSerializedProto", _wrap_SentencePieceProcessor__DecodeIdsAsSerializedProto, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__DecodePiecesAsSerializedProto", _wrap_SentencePieceProcessor__DecodePiecesAsSerializedProto, METH_VARARGS, NULL},
++ { "SentencePieceProcessor__DecodeIdsAsImmutableProto", _wrap_SentencePieceProcessor__DecodeIdsAsImmutableProto, METH_VARARGS, NULL},
++ { "SentencePieceProcessor__DecodePiecesAsImmutableProto", _wrap_SentencePieceProcessor__DecodePiecesAsImmutableProto, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__DecodeIdsBatch", _wrap_SentencePieceProcessor__DecodeIdsBatch, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__DecodeIdsAsSerializedProtoBatch", _wrap_SentencePieceProcessor__DecodeIdsAsSerializedProtoBatch, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__DecodePiecesBatch", _wrap_SentencePieceProcessor__DecodePiecesBatch, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__DecodePiecesAsSerializedProtoBatch", _wrap_SentencePieceProcessor__DecodePiecesAsSerializedProtoBatch, METH_VARARGS, NULL},
++ { "SentencePieceProcessor__DecodePiecesAsImmutableProtoBatch", _wrap_SentencePieceProcessor__DecodePiecesAsImmutableProtoBatch, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__NBestEncodeAsIds", _wrap_SentencePieceProcessor__NBestEncodeAsIds, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__NBestEncodeAsPieces", _wrap_SentencePieceProcessor__NBestEncodeAsPieces, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__NBestEncodeAsSerializedProto", _wrap_SentencePieceProcessor__NBestEncodeAsSerializedProto, METH_VARARGS, NULL},
++ { "SentencePieceProcessor__NBestEncodeAsImmutableProto", _wrap_SentencePieceProcessor__NBestEncodeAsImmutableProto, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__SampleEncodeAndScoreAsIds", _wrap_SentencePieceProcessor__SampleEncodeAndScoreAsIds, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__SampleEncodeAndScoreAsPieces", _wrap_SentencePieceProcessor__SampleEncodeAndScoreAsPieces, METH_VARARGS, NULL},
++ { "SentencePieceProcessor__SampleEncodeAndScoreAsSerializedProto", _wrap_SentencePieceProcessor__SampleEncodeAndScoreAsSerializedProto, METH_VARARGS, NULL},
++ { "SentencePieceProcessor__SampleEncodeAndScoreAsImmutableProto", _wrap_SentencePieceProcessor__SampleEncodeAndScoreAsImmutableProto, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__CalculateEntropy", _wrap_SentencePieceProcessor__CalculateEntropy, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__CalculateEntropyBatch", _wrap_SentencePieceProcessor__CalculateEntropyBatch, METH_VARARGS, NULL},
+ { "SentencePieceProcessor_swigregister", SentencePieceProcessor_swigregister, METH_O, NULL},
+@@ -7076,6 +8384,9 @@ static PyMethodDef SwigMethods_proxydocs[] = {
+
+ static swig_type_info _swigt__p_char = {"_p_char", "char *", 0, 0, (void*)0, 0};
+ static swig_type_info _swigt__p_float = {"_p_float", "float *", 0, 0, (void*)0, 0};
++static swig_type_info _swigt__p_sentencepiece__ImmutableNBestSentencePieceText = {"_p_sentencepiece__ImmutableNBestSentencePieceText", "sentencepiece::ImmutableNBestSentencePieceText *", 0, 0, (void*)0, 0};
++static swig_type_info _swigt__p_sentencepiece__ImmutableSentencePieceText = {"_p_sentencepiece__ImmutableSentencePieceText", "sentencepiece::ImmutableSentencePieceText *", 0, 0, (void*)0, 0};
++static swig_type_info _swigt__p_sentencepiece__ImmutableSentencePieceText_ImmutableSentencePiece = {"_p_sentencepiece__ImmutableSentencePieceText_ImmutableSentencePiece", "sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece *", 0, 0, (void*)0, 0};
+ static swig_type_info _swigt__p_sentencepiece__SentenceIterator = {"_p_sentencepiece__SentenceIterator", "sentencepiece::SentenceIterator *", 0, 0, (void*)0, 0};
+ static swig_type_info _swigt__p_sentencepiece__SentencePieceProcessor = {"_p_sentencepiece__SentencePieceProcessor", "sentencepiece::SentencePieceProcessor *", 0, 0, (void*)0, 0};
+ static swig_type_info _swigt__p_sentencepiece__SentencePieceTrainer = {"_p_sentencepiece__SentencePieceTrainer", "sentencepiece::SentencePieceTrainer *", 0, 0, (void*)0, 0};
+@@ -7089,6 +8400,9 @@ static swig_type_info _swigt__p_std__vectorT_std__vectorT_int_t_t = {"_p_std__ve
+ static swig_type_info *swig_type_initial[] = {
+ &_swigt__p_char,
+ &_swigt__p_float,
++ &_swigt__p_sentencepiece__ImmutableNBestSentencePieceText,
++ &_swigt__p_sentencepiece__ImmutableSentencePieceText,
++ &_swigt__p_sentencepiece__ImmutableSentencePieceText_ImmutableSentencePiece,
+ &_swigt__p_sentencepiece__SentenceIterator,
+ &_swigt__p_sentencepiece__SentencePieceProcessor,
+ &_swigt__p_sentencepiece__SentencePieceTrainer,
+@@ -7102,6 +8416,9 @@ static swig_type_info *swig_type_initial[] = {
+
+ static swig_cast_info _swigc__p_char[] = { {&_swigt__p_char, 0, 0, 0},{0, 0, 0, 0}};
+ static swig_cast_info _swigc__p_float[] = { {&_swigt__p_float, 0, 0, 0},{0, 0, 0, 0}};
++static swig_cast_info _swigc__p_sentencepiece__ImmutableNBestSentencePieceText[] = { {&_swigt__p_sentencepiece__ImmutableNBestSentencePieceText, 0, 0, 0},{0, 0, 0, 0}};
++static swig_cast_info _swigc__p_sentencepiece__ImmutableSentencePieceText[] = { {&_swigt__p_sentencepiece__ImmutableSentencePieceText, 0, 0, 0},{0, 0, 0, 0}};
++static swig_cast_info _swigc__p_sentencepiece__ImmutableSentencePieceText_ImmutableSentencePiece[] = { {&_swigt__p_sentencepiece__ImmutableSentencePieceText_ImmutableSentencePiece, 0, 0, 0},{0, 0, 0, 0}};
+ static swig_cast_info _swigc__p_sentencepiece__SentenceIterator[] = { {&_swigt__p_sentencepiece__SentenceIterator, 0, 0, 0},{0, 0, 0, 0}};
+ static swig_cast_info _swigc__p_sentencepiece__SentencePieceProcessor[] = { {&_swigt__p_sentencepiece__SentencePieceProcessor, 0, 0, 0},{0, 0, 0, 0}};
+ static swig_cast_info _swigc__p_sentencepiece__SentencePieceTrainer[] = { {&_swigt__p_sentencepiece__SentencePieceTrainer, 0, 0, 0},{0, 0, 0, 0}};
+@@ -7115,6 +8432,9 @@ static swig_cast_info _swigc__p_std__vectorT_std__vectorT_int_t_t[] = { {&_swig
+ static swig_cast_info *swig_cast_initial[] = {
+ _swigc__p_char,
+ _swigc__p_float,
++ _swigc__p_sentencepiece__ImmutableNBestSentencePieceText,
++ _swigc__p_sentencepiece__ImmutableSentencePieceText,
++ _swigc__p_sentencepiece__ImmutableSentencePieceText_ImmutableSentencePiece,
+ _swigc__p_sentencepiece__SentenceIterator,
+ _swigc__p_sentencepiece__SentencePieceProcessor,
+ _swigc__p_sentencepiece__SentencePieceTrainer,
+diff --git a/python/test/sentencepiece_test.py b/python/test/sentencepiece_test.py
+index 6c48bcd..2f2c84a 100755
+--- a/python/test/sentencepiece_test.py
++++ b/python/test/sentencepiece_test.py
+@@ -287,16 +287,44 @@ class TestSentencepieceProcessor(unittest.TestCase):
+ ids2 = self.sp_.EncodeAsIds(text2)
+ pieces = self.sp_.EncodeAsPieces(text)
+ pieces2 = self.sp_.EncodeAsPieces(text2)
+- protos = self.sp_.EncodeAsSerializedProto(text)
+- proto2 = self.sp_.EncodeAsSerializedProto(text2)
++ sprotos = self.sp_.EncodeAsSerializedProto(text)
++ sproto2 = self.sp_.EncodeAsSerializedProto(text2)
++ iprotos = self.sp_.EncodeAsImmutableProto(text)
++ iprotos2 = self.sp_.EncodeAsImmutableProto(text2)
+
+ self.assertEqual(sp.encode(text, out_type=int), ids)
+ self.assertEqual(sp.encode(text, out_type=str), pieces)
+- self.assertEqual(sp.encode(text, out_type='proto'), protos)
++ self.assertEqual(sp.encode(text, out_type='serialized_proto'), sprotos)
++ self.assertEqual(sp.encode(text, out_type='immutable_proto'), iprotos)
+
+ self.assertEqual(sp.encode([text], out_type=int), [ids])
+ self.assertEqual(sp.encode([text], out_type=str), [pieces])
+- self.assertEqual(sp.encode([text], out_type='proto'), [protos])
++ self.assertEqual(sp.encode([text], out_type='serialized_proto'), [sprotos])
++ self.assertEqual(sp.encode([text], out_type='immutable_proto'), [iprotos])
++
++ self.assertEqual(len(iprotos), len(pieces))
++ self.assertEqual(len(iprotos), len(ids))
++ self.assertEqual(iprotos.text(), text)
++
++ self.assertEqual(len(iprotos2), len(pieces2))
++ self.assertEqual(len(iprotos2), len(ids2))
++ self.assertEqual(iprotos2.text(), text2)
++
++ for i in range(len(iprotos)):
++ self.assertEqual(ids[i], iprotos.pieces(i).id())
++ self.assertEqual(pieces[i], iprotos.pieces(i).piece())
++
++ for i, piece in enumerate(iprotos):
++ self.assertEqual(ids[i], piece.id())
++ self.assertEqual(pieces[i], piece.piece())
++
++ for i in range(len(iprotos2)):
++ self.assertEqual(ids2[i], iprotos2.pieces(i).id())
++ self.assertEqual(pieces2[i], iprotos2.pieces(i).piece())
++
++ for i, piece in enumerate(iprotos2):
++ self.assertEqual(ids2[i], piece.id())
++ self.assertEqual(pieces2[i], piece.piece())
+
+ detok_ids = self.sp_.DecodeIds(ids)
+ detok_pieces = self.sp_.DecodePieces(pieces)
+@@ -464,19 +492,29 @@ class TestSentencepieceProcessor(unittest.TestCase):
+ self.assertEqual(d1, d4)
+ self.assertEqual(d1, d5)
+
+- r1 = sp.encode(texts, out_type='proto', num_threads=None)
+- r2 = sp.encode(texts, out_type='proto', num_threads=1)
+- r3 = sp.encode(texts, out_type='proto', num_threads=-1)
+- r4 = sp.encode(texts, out_type='proto', num_threads=8)
+- r5 = [sp.encode(s, out_type='proto') for s in texts]
++ r1 = sp.encode(texts, out_type='serialized_proto', num_threads=None)
++ r2 = sp.encode(texts, out_type='serialized_proto', num_threads=1)
++ r3 = sp.encode(texts, out_type='serialized_proto', num_threads=-1)
++ r4 = sp.encode(texts, out_type='serialized_proto', num_threads=8)
++ r5 = [sp.encode(s, out_type='serialized_proto') for s in texts]
++ self.assertEqual(r1, r2)
++ self.assertEqual(r1, r3)
++ self.assertEqual(r1, r4)
++ self.assertEqual(r1, r5)
++
++ r1 = sp.encode(texts, out_type='immutable_proto', num_threads=None)
++ r2 = sp.encode(texts, out_type='immutable_proto', num_threads=1)
++ r3 = sp.encode(texts, out_type='immutable_proto', num_threads=-1)
++ r4 = sp.encode(texts, out_type='immutable_proto', num_threads=8)
++ r5 = [sp.encode(s, out_type='immutable_proto') for s in texts]
+ self.assertEqual(r1, r2)
+ self.assertEqual(r1, r3)
+ self.assertEqual(r1, r4)
+ self.assertEqual(r1, r5)
+
+- e1 = sp.calculate_entropy(texts, theta=1.0, num_threads=10)
+- e2 = sp.CalculateEntropy(texts, theta=1.0, num_threads=10)
+- e3 = [sp.calculate_entropy(s, theta=1.0) for s in texts]
++ e1 = sp.calculate_entropy(texts, alpha=1.0, num_threads=10)
++ e2 = sp.CalculateEntropy(texts, alpha=1.0, num_threads=10)
++ e3 = [sp.calculate_entropy(s, alpha=1.0) for s in texts]
+ self.assertEqual(e1, e2)
+ self.assertEqual(e1, e3)
+
+diff --git a/src/sentencepiece_processor.cc b/src/sentencepiece_processor.cc
+index 805e0f9..482a45b 100644
+--- a/src/sentencepiece_processor.cc
++++ b/src/sentencepiece_processor.cc
+@@ -54,65 +54,70 @@ std::vector<absl::string_view> ToPieceArray(const std::vector<std::string> &v) {
+ for (int i = 0; i < v.size(); ++i) out[i] = v[i];
+ return out;
+ }
++
+ } // namespace
+
+-ImmutableSentencePieceText::ImmutableSentencePieceText() {}
+-ImmutableSentencePieceText::~ImmutableSentencePieceText() {}
++ImmutableSentencePieceText::ImmutableSentencePieceText()
++ : spt_(&SentencePieceText::default_instance()) {}
+
+ ImmutableSentencePieceText::ImmutableSentencePieceText(
+ const SentencePieceText &spt)
+ : spt_(&spt) {}
+
+-ImmutableSentencePieceText::ImmutableSentencePiece::ImmutableSentencePiece(
+- const SentencePieceText_SentencePiece &sp)
++ImmutableSentencePieceText::~ImmutableSentencePieceText() {}
++
++ImmutableSentencePieceText_ImmutableSentencePiece::
++ ImmutableSentencePieceText_ImmutableSentencePiece()
++ : sp_(&SentencePieceText_SentencePiece::default_instance()) {}
++
++ImmutableSentencePieceText_ImmutableSentencePiece::
++ ImmutableSentencePieceText_ImmutableSentencePiece(
++ const SentencePieceText_SentencePiece &sp)
+ : sp_(&sp) {}
+
+-const std::string &ImmutableSentencePieceText::ImmutableSentencePiece::piece()
++const std::string &ImmutableSentencePieceText_ImmutableSentencePiece::piece()
+ const {
+ return sp_->piece();
+ }
+
+-const std::string &ImmutableSentencePieceText::ImmutableSentencePiece::surface()
++const std::string &ImmutableSentencePieceText_ImmutableSentencePiece::surface()
+ const {
+ return sp_->surface();
+ }
+
+-uint32_t ImmutableSentencePieceText::ImmutableSentencePiece::id() const {
++uint32_t ImmutableSentencePieceText_ImmutableSentencePiece::id() const {
+ return sp_->id();
+ }
+
+-uint32_t ImmutableSentencePieceText::ImmutableSentencePiece::begin() const {
++uint32_t ImmutableSentencePieceText_ImmutableSentencePiece::begin() const {
+ return sp_->begin();
+ }
+
+-uint32_t ImmutableSentencePieceText::ImmutableSentencePiece::end() const {
++uint32_t ImmutableSentencePieceText_ImmutableSentencePiece::end() const {
+ return sp_->end();
+ }
+
+-std::vector<ImmutableSentencePieceText::ImmutableSentencePiece>
++std::vector<ImmutableSentencePieceText_ImmutableSentencePiece>
+ ImmutableSentencePieceText::pieces() const {
+- std::vector<ImmutableSentencePieceText::ImmutableSentencePiece> pieces;
+- if (spt_ == nullptr) return pieces;
+- pieces.reserve(spt_->pieces_size());
++ std::vector<ImmutableSentencePieceText_ImmutableSentencePiece> pieces(
++ spt_->pieces_size());
+ for (int i = 0; i < spt_->pieces_size(); ++i)
+- pieces[i] = ImmutableSentencePiece(spt_->pieces(i));
++ pieces[i] =
++ ImmutableSentencePieceText_ImmutableSentencePiece(spt_->pieces(i));
+ return pieces;
+ }
+
+ size_t ImmutableSentencePieceText::pieces_size() const {
+- return spt_ ? spt_->pieces_size() : 0;
++ return spt_->pieces_size();
+ }
+
+-ImmutableSentencePieceText::ImmutableSentencePiece
++ImmutableSentencePieceText_ImmutableSentencePiece
+ ImmutableSentencePieceText::pieces(int index) const {
+- return ImmutableSentencePieceText::ImmutableSentencePiece(
+- spt_->pieces(index));
++ return ImmutableSentencePieceText_ImmutableSentencePiece(spt_->pieces(index));
+ }
+
+ const std::string &ImmutableSentencePieceText::text() const {
+- if (spt_) return spt_->text();
+- static std::string *kEmptyString = new std::string();
+- return *kEmptyString;
++ return spt_->text();
+ }
+
+ float ImmutableSentencePieceText::score() const {
+@@ -127,8 +132,8 @@ SentencePieceText *ImmutableSentencePieceText::mutable_proto() {
+ return rep_.get();
+ }
+
+-std::string ImmutableSentencePieceText::SerializeAsString() const {
+- return spt_ ? spt_->SerializeAsString() : "";
++util::bytes ImmutableSentencePieceText::SerializeAsString() const {
++ return spt_->SerializeAsString();
+ }
+
+ ImmutableNBestSentencePieceText::ImmutableNBestSentencePieceText() {}
+@@ -145,9 +150,8 @@ ImmutableSentencePieceText ImmutableNBestSentencePieceText::nbests(
+
+ std::vector<ImmutableSentencePieceText>
+ ImmutableNBestSentencePieceText::nbests() const {
+- std::vector<ImmutableSentencePieceText> nbests;
+- if (rep_ == nullptr) return nbests;
+- nbests.reserve(rep_->nbests_size());
++ if (rep_ == nullptr) return {};
++ std::vector<ImmutableSentencePieceText> nbests(rep_->nbests_size());
+ for (int i = 0; i < rep_->nbests_size(); ++i)
+ nbests[i] = ImmutableSentencePieceText(rep_->nbests(i));
+ return nbests;
+@@ -160,7 +164,7 @@ NBestSentencePieceText *ImmutableNBestSentencePieceText::mutable_proto() {
+ return rep_.get();
+ }
+
+-std::string ImmutableNBestSentencePieceText::SerializeAsString() const {
++util::bytes ImmutableNBestSentencePieceText::SerializeAsString() const {
+ return rep_ ? rep_->SerializeAsString() : "";
+ }
+
+@@ -1044,8 +1048,35 @@ std::string SentencePieceProcessor::serialized_model_proto() const {
+ // std::random_device.
+ void SetRandomGeneratorSeed(unsigned int seed);
+
+-namespace io {
++void ConvertToUnicodeSpans(SentencePieceText *spt) {
++ if (spt == nullptr) return;
++
++ std::vector<int> utf8_to_unicode(spt->text().size() + 1, 0);
++ absl::string_view str = spt->text();
++ size_t prev = 0;
++ int ulen = 0;
++ while (!str.empty()) {
++ const size_t mblen = string_util::OneCharLen(str.data());
++ for (int i = prev; i < prev + mblen; ++i) {
++ utf8_to_unicode[i] = ulen;
++ }
++ ++ulen;
++ prev += mblen;
++ str.remove_prefix(mblen);
++ }
++ utf8_to_unicode[prev] = ulen;
++
++ auto clip = [&](int s) {
++ return std::min<int>(std::max<int>(0, s), utf8_to_unicode.size() - 1);
++ };
+
++ for (auto &piece : *(spt->mutable_pieces())) {
++ piece.set_begin(utf8_to_unicode[clip(piece.begin())]);
++ piece.set_end(utf8_to_unicode[clip(piece.end())]);
++ }
++}
++
++namespace io {
+ util::Status LoadModelProto(absl::string_view filename,
+ ModelProto *model_proto) {
+ if (filename.empty()) {
+diff --git a/src/sentencepiece_processor.h b/src/sentencepiece_processor.h
+index 8124c59..b7fae6a 100644
+--- a/src/sentencepiece_processor.h
++++ b/src/sentencepiece_processor.h
+@@ -157,35 +157,39 @@ class SentencePieceText_SentencePiece;
+ // This wrapper only allows an immutable access to the proto and
+ // hides the actual implementation of protobuf.
+ // See sentencepiece.proto for the details of this class.
++class ImmutableSentencePieceText_ImmutableSentencePiece {
++ public:
++ ImmutableSentencePieceText_ImmutableSentencePiece();
++ ~ImmutableSentencePieceText_ImmutableSentencePiece() = default;
++
++ const std::string &piece() const;
++ const std::string &surface() const;
++ uint32_t id() const;
++ uint32_t begin() const;
++ uint32_t end() const;
++
++ friend class ImmutableSentencePieceText;
++
++ private:
++ explicit ImmutableSentencePieceText_ImmutableSentencePiece(
++ const SentencePieceText_SentencePiece &sp);
++ const SentencePieceText_SentencePiece *sp_ = nullptr;
++};
++
+ class ImmutableSentencePieceText {
+ public:
+ ImmutableSentencePieceText();
+ virtual ~ImmutableSentencePieceText();
+
+- class ImmutableSentencePiece {
+- public:
+- ~ImmutableSentencePiece() = default;
+- const std::string &piece() const;
+- const std::string &surface() const;
+- uint32_t id() const;
+- uint32_t begin() const;
+- uint32_t end() const;
++ std::vector<ImmutableSentencePieceText_ImmutableSentencePiece> pieces() const;
+
+- friend class ImmutableSentencePieceText;
+-
+- private:
+- ImmutableSentencePiece() = default;
+- explicit ImmutableSentencePiece(const SentencePieceText_SentencePiece &sp);
+- const SentencePieceText_SentencePiece *sp_ = nullptr;
+- };
+-
+- std::vector<ImmutableSentencePiece> pieces() const;
+ size_t pieces_size() const;
+- ImmutableSentencePiece pieces(int index) const;
++ ImmutableSentencePieceText_ImmutableSentencePiece pieces(int index) const;
++
+ const std::string &text() const;
+ float score() const;
+
+- std::string SerializeAsString() const;
++ util::bytes SerializeAsString() const;
+
+ // Returns the actual mutable proto.
+ // Do not use this outside of SentencePieceProcessor, as
+@@ -214,7 +218,7 @@ class ImmutableNBestSentencePieceText {
+ size_t nbests_size() const;
+ ImmutableSentencePieceText nbests(int index) const;
+
+- std::string SerializeAsString() const;
++ util::bytes SerializeAsString() const;
+
+ // Returns the actual mutable proto.
+ // Do not use this outside of SentencePieceProcessor, as
+@@ -398,7 +402,7 @@ class SentencePieceProcessor {
+ float alpha, SentencePieceText *spt) const;
+
+ virtual util::Status SampleEncodeAndScore(
+- absl::string_view input, int samples, float alpha, bool wor,
++ absl::string_view input, int num_samples, float alpha, bool wor,
+ bool include_best, NBestSentencePieceText *samples_spt) const;
+
+ // DEPRECATED: Remove this API and use std::vector<std::string_view>
+@@ -534,11 +538,11 @@ class SentencePieceProcessor {
+ }
+
+ virtual util::bytes SampleEncodeAndScoreAsSerializedProto(
+- absl::string_view input, int samples, float alpha, bool wor,
+- bool include_best, int nbest_size) const {
++ absl::string_view input, int num_samples, float alpha, bool wor,
++ bool include_best) const {
+ DEFINE_SPP_SERIALIZED_PROTO_IMPL(SampleEncodeAndScore,
+ ImmutableNBestSentencePieceText, input,
+- samples, alpha, wor, include_best);
++ num_samples, alpha, wor, include_best);
+ }
+
+ // TODO(taku): Remove this API and use std::vector<std::string_view>
+@@ -579,11 +583,11 @@ class SentencePieceProcessor {
+ }
+
+ virtual ImmutableNBestSentencePieceText SampleEncodeAndScoreAsImmutableProto(
+- absl::string_view input, int samples, float alpha, bool wor,
+- bool include_best, int nbest_size) const {
++ absl::string_view input, int num_samples, float alpha, bool wor,
++ bool include_best) const {
+ DEFINE_SPP_IMMUTABLE_PROTO_IMPL(SampleEncodeAndScore,
+ ImmutableNBestSentencePieceText, input,
+- samples, alpha, wor, include_best);
++ num_samples, alpha, wor, include_best);
+ }
+
+ // TODO(taku): Remove this API and use std::vector<std::string_view>
+@@ -703,6 +707,9 @@ class SentencePieceProcessor {
+ // std::random_device.
+ void SetRandomGeneratorSeed(unsigned int seed);
+
++// Converts the utf8 byte spans into Unicode char span.
++void ConvertToUnicodeSpans(SentencePieceText *spt);
++
+ #ifndef SWIG
+ // IO related functions to absorb model formats.
+ namespace io {
+diff --git a/src/sentencepiece_processor_test.cc b/src/sentencepiece_processor_test.cc
+index ed651f7..ff55aeb 100644
+--- a/src/sentencepiece_processor_test.cc
++++ b/src/sentencepiece_processor_test.cc
+@@ -1564,6 +1564,10 @@ TEST(SentencePieceProcessorTest, VocabularyTest) {
+
+ TEST(SentencePieceProcessorTest, ImmutableSentencePieceTextTest) {
+ ImmutableSentencePieceText spt;
++ EXPECT_TRUE(spt.text().empty());
++ EXPECT_EQ(spt.score(), 0.0);
++ EXPECT_TRUE(spt.SerializeAsString().empty());
++
+ auto *v = spt.mutable_proto();
+
+ v->set_text("hello world");
+@@ -1586,52 +1590,123 @@ TEST(SentencePieceProcessorTest, ImmutableSentencePieceTextTest) {
+ EXPECT_EQ(v->pieces(i).end(), spt.pieces(i).end());
+ }
+
+- int n = 0;
+- for (auto &p : spt.pieces()) {
+- EXPECT_EQ(v->pieces(n).surface(), p.surface());
+- EXPECT_EQ(v->pieces(n).piece(), p.piece());
+- EXPECT_EQ(v->pieces(n).id(), p.id());
+- EXPECT_EQ(v->pieces(n).begin(), p.begin());
+- EXPECT_EQ(v->pieces(n).end(), p.end());
+- ++n;
+- }
+-
+- EXPECT_EQ(v->text(), spt.text());
+- EXPECT_EQ(v->score(), spt.score());
+- EXPECT_EQ(v->SerializeAsString(), spt.SerializeAsString());
++ auto check_proto = [&v](const ImmutableSentencePieceText &s) {
++ int n = 0;
++ for (auto &p : s.pieces()) {
++ EXPECT_EQ(v->pieces(n).surface(), p.surface());
++ EXPECT_EQ(v->pieces(n).piece(), p.piece());
++ EXPECT_EQ(v->pieces(n).id(), p.id());
++ EXPECT_EQ(v->pieces(n).begin(), p.begin());
++ EXPECT_EQ(v->pieces(n).end(), p.end());
++ ++n;
++ }
++ EXPECT_EQ(v->text(), s.text());
++ EXPECT_EQ(v->score(), s.score());
++ EXPECT_EQ(v->SerializeAsString(), s.SerializeAsString());
++ };
+
+ // test copy.
+- auto spt2 = spt;
+- EXPECT_EQ(spt2.pieces_size(), spt.pieces_size());
+- for (int i = 0; i < spt.pieces_size(); ++i) {
+- EXPECT_EQ(spt2.pieces(i).surface(), spt.pieces(i).surface());
+- EXPECT_EQ(spt2.pieces(i).piece(), spt.pieces(i).piece());
+- EXPECT_EQ(spt2.pieces(i).id(), spt.pieces(i).id());
+- EXPECT_EQ(spt2.pieces(i).begin(), spt.pieces(i).begin());
+- EXPECT_EQ(spt2.pieces(i).end(), spt.pieces(i).end());
+- }
++ const auto spt2 = spt;
++ check_proto(spt2);
++
++ // test assign.
++ const ImmutableSentencePieceText spt3(spt);
++ check_proto(spt3);
++
++ // default piece.
++ const ImmutableSentencePieceText_ImmutableSentencePiece piece;
++ EXPECT_TRUE(piece.surface().empty());
++ EXPECT_TRUE(piece.piece().empty());
++ EXPECT_EQ(piece.begin(), 0);
++ EXPECT_EQ(piece.end(), 0);
++ EXPECT_EQ(piece.id(), 0);
+ }
+
+ TEST(SentencePieceProcessorTest, ImmutableNBestSentencePieceTextTest) {
+ ImmutableNBestSentencePieceText spt;
++ EXPECT_EQ(spt.nbests_size(), 0);
++ EXPECT_TRUE(spt.SerializeAsString().empty());
++
+ auto *v = spt.mutable_proto();
++
+ for (int i = 0; i < 10; ++i) {
+ auto *p = v->add_nbests();
+ p->set_text(absl::StrCat("text_", i));
+ p->set_score(2.0 * i);
+ }
+
+- EXPECT_EQ(v->nbests_size(), spt.nbests_size());
+- for (int i = 0; i < v->nbests_size(); ++i) {
+- EXPECT_EQ(v->nbests(i).text(), spt.nbests(i).text());
+- EXPECT_EQ(v->nbests(i).score(), spt.nbests(i).score());
+- }
+- EXPECT_EQ(v->SerializeAsString(), spt.SerializeAsString());
++ auto check_proto = [&v](const ImmutableNBestSentencePieceText &s) {
++ EXPECT_EQ(v->nbests_size(), s.nbests_size());
++ for (int i = 0; i < v->nbests_size(); ++i) {
++ EXPECT_EQ(v->nbests(i).text(), s.nbests(i).text());
++ EXPECT_EQ(v->nbests(i).score(), s.nbests(i).score());
++ }
++ EXPECT_EQ(v->SerializeAsString(), s.SerializeAsString());
++ };
++
++ check_proto(spt);
+
+ // test copy.
+- auto spt2 = spt;
+- EXPECT_EQ(spt2.nbests_size(), spt.nbests_size());
+- EXPECT_EQ(spt2.SerializeAsString(), spt.SerializeAsString());
++ const auto spt2 = spt;
++ check_proto(spt2);
++
++ // test assign.
++ const ImmutableNBestSentencePieceText spt3(spt);
++ check_proto(spt3);
++}
++
++TEST(SentencePieceProcessorTest, ConvertToUnicodeSpansTest) {
++ auto make_spt = [&](const std::vector<std::string> &tokens) {
++ SentencePieceText spt;
++ int prev = 0;
++ std::string text;
++ for (const auto &tok : tokens) {
++ auto *piece = spt.add_pieces();
++ piece->set_surface(tok);
++ piece->set_piece(tok);
++ piece->set_begin(prev);
++ piece->set_end(prev + tok.size());
++ prev += tok.size();
++ text += tok;
++ }
++ spt.set_text(text);
++ ConvertToUnicodeSpans(&spt);
++ return spt;
++ };
++
++ {
++ const auto spt = make_spt({"hello", "_world", "."});
++ EXPECT_EQ(spt.pieces_size(), 3);
++ EXPECT_EQ(spt.pieces(0).begin(), 0);
++ EXPECT_EQ(spt.pieces(0).end(), 5);
++ EXPECT_EQ(spt.pieces(1).begin(), 5);
++ EXPECT_EQ(spt.pieces(1).end(), 11);
++ EXPECT_EQ(spt.pieces(2).begin(), 11);
++ EXPECT_EQ(spt.pieces(2).end(), 12);
++ }
++
++ {
++ const auto spt = make_spt({"これは", "test", "です"});
++ EXPECT_EQ(spt.pieces_size(), 3);
++ EXPECT_EQ(spt.pieces(0).begin(), 0);
++ EXPECT_EQ(spt.pieces(0).end(), 3);
++ EXPECT_EQ(spt.pieces(1).begin(), 3);
++ EXPECT_EQ(spt.pieces(1).end(), 7);
++
++ EXPECT_EQ(spt.pieces(2).begin(), 7);
++ EXPECT_EQ(spt.pieces(2).end(), 9);
++ }
++
++ {
++ const auto spt = make_spt({"いABは", "にほCD", "へと"});
++ EXPECT_EQ(spt.pieces_size(), 3);
++ EXPECT_EQ(spt.pieces(0).begin(), 0);
++ EXPECT_EQ(spt.pieces(0).end(), 4);
++ EXPECT_EQ(spt.pieces(1).begin(), 4);
++ EXPECT_EQ(spt.pieces(1).end(), 8);
++ EXPECT_EQ(spt.pieces(2).begin(), 8);
++ EXPECT_EQ(spt.pieces(2).end(), 10);
++ }
+ }
+
+ } // namespace sentencepiece
--- /dev/null
+From: Taku Kudo <taku@google.com>
+Date: Wed, 3 Aug 2022 02:24:53 +0900
+Subject: Adds more unittests
+
+Signed-off-by: Kentaro Hayashi <kenhys@gmail.com>
+---
+ python/src/sentencepiece/__init__.py | 48 +++-
+ python/src/sentencepiece/sentencepiece.i | 45 +++-
+ python/src/sentencepiece/sentencepiece_wrap.cxx | 213 +++++++++++++++--
+ python/test/sentencepiece_test.py | 301 ++++++++++++++++--------
+ src/sentencepiece_processor.cc | 67 +++---
+ src/sentencepiece_processor.h | 19 +-
+ src/sentencepiece_processor_test.cc | 11 +-
+ 7 files changed, 532 insertions(+), 172 deletions(-)
+
+diff --git a/python/src/sentencepiece/__init__.py b/python/src/sentencepiece/__init__.py
+index 69a9825..07acb94 100644
+--- a/python/src/sentencepiece/__init__.py
++++ b/python/src/sentencepiece/__init__.py
+@@ -98,6 +98,9 @@ class ImmutableSentencePieceText(object):
+ def pieces_size(self):
+ return _sentencepiece.ImmutableSentencePieceText_pieces_size(self)
+
++ def pieces(self, index):
++ return _sentencepiece.ImmutableSentencePieceText_pieces(self, index)
++
+ def text(self):
+ return _sentencepiece.ImmutableSentencePieceText_text(self)
+
+@@ -107,18 +110,24 @@ class ImmutableSentencePieceText(object):
+ def SerializeAsString(self):
+ return _sentencepiece.ImmutableSentencePieceText_SerializeAsString(self)
+
+- def pieces(self, index):
+- return _sentencepiece.ImmutableSentencePieceText_pieces(self, index)
++ def _pieces(self, index):
++ return _sentencepiece.ImmutableSentencePieceText__pieces(self, index)
++
++ def pieces(self, i):
++ return self._pieces(i)
+
+ def __len__(self):
+ return self.pieces_size()
+
+ def __getitem__(self, i):
+- return self.pieces(i)
++ return self._pieces(i)
+
+ def __eq__(self, other):
+ return self.SerializeAsString() == other.SerializeAsString()
+
++ def __hash__(self):
++ return hash(self.SerializeAsString())
++
+
+ # Register ImmutableSentencePieceText in _sentencepiece:
+ _sentencepiece.ImmutableSentencePieceText_swigregister(ImmutableSentencePieceText)
+@@ -134,21 +143,30 @@ class ImmutableNBestSentencePieceText(object):
+ def nbests_size(self):
+ return _sentencepiece.ImmutableNBestSentencePieceText_nbests_size(self)
+
++ def nbests(self, index):
++ return _sentencepiece.ImmutableNBestSentencePieceText_nbests(self, index)
++
+ def SerializeAsString(self):
+ return _sentencepiece.ImmutableNBestSentencePieceText_SerializeAsString(self)
+
+- def nbests(self, index):
+- return _sentencepiece.ImmutableNBestSentencePieceText_nbests(self, index)
++ def _nbests(self, index):
++ return _sentencepiece.ImmutableNBestSentencePieceText__nbests(self, index)
++
++ def __nbests__(self, i):
++ return self._nbests(i)
+
+ def __len__(self):
+ return self.nbests_size()
+
+ def __getitem__(self, i):
+- return self.nbests(i)
++ return self._nbests(i)
+
+ def __eq__(self, other):
+ return self.SerializeAsString() == other.SerializeAsString()
+
++ def __hash__(self):
++ return hash(self.SerializeAsString())
++
+
+ # Register ImmutableNBestSentencePieceText in _sentencepiece:
+ _sentencepiece.ImmutableNBestSentencePieceText_swigregister(ImmutableNBestSentencePieceText)
+@@ -272,6 +290,9 @@ class SentencePieceProcessor(object):
+ def _DecodeIdsAsSerializedProtoBatch(self, ins, num_threads):
+ return _sentencepiece.SentencePieceProcessor__DecodeIdsAsSerializedProtoBatch(self, ins, num_threads)
+
++ def _DecodeIdsAsImmutableProtoBatch(self, ins, num_threads):
++ return _sentencepiece.SentencePieceProcessor__DecodeIdsAsImmutableProtoBatch(self, ins, num_threads)
++
+ def _DecodePiecesBatch(self, ins, num_threads):
+ return _sentencepiece.SentencePieceProcessor__DecodePiecesBatch(self, ins, num_threads)
+
+@@ -539,6 +560,8 @@ class SentencePieceProcessor(object):
+ return self._NBestEncodeAsImmutableProto(text, nbest_size,
+ add_bos, add_eos, reverse, emit_unk_piece)
+
++ raise RuntimeError('unknown out_type')
++
+ if type(input) is list:
+ return [_encode(n) for n in input]
+
+@@ -621,10 +644,21 @@ class SentencePieceProcessor(object):
+ if out_type is int:
+ return self._SampleEncodeAndScoreAsIds(text, num_samples, alpha, wor, include_best,
+ add_bos, add_eos, reverse, emit_unk_piece)
+- else:
++ if out_type is str:
+ return self._SampleEncodeAndScoreAsPieces(text, num_samples, alpha, wor, include_best,
+ add_bos, add_eos, reverse, emit_unk_piece)
+
++ if out_type == 'serialized_proto' or out_type == 'proto':
++ return self._SampleEncodeAndScoreAsSerializedProto(text, num_samples, alpha, wor, include_best,
++ add_bos, add_eos, reverse, emit_unk_piece)
++
++ if out_type == 'immutable_proto':
++ return self._SampleEncodeAndScoreAsImmutableProto(text, num_samples, alpha, wor, include_best,
++ add_bos, add_eos, reverse, emit_unk_piece)
++
++ raise RuntimeError('unknown output type')
++
++
+ if type(input) is list:
+ return [_encode(n) for n in input]
+
+diff --git a/python/src/sentencepiece/sentencepiece.i b/python/src/sentencepiece/sentencepiece.i
+index 1e2e1e0..f3a4f30 100644
+--- a/python/src/sentencepiece/sentencepiece.i
++++ b/python/src/sentencepiece/sentencepiece.i
+@@ -2,6 +2,7 @@
+ %include exception.i
+
+ %{
++
+ #include <iostream>
+ #include <algorithm>
+ #include <functional>
+@@ -286,8 +287,10 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ %ignore sentencepiece::SentencePieceProcessor::status;
+ %ignore sentencepiece::ImmutableSentencePieceText::mutable_proto;
+ %ignore sentencepiece::ImmutableSentencePieceText::pieces() const;
++%ignore sentencepiece::ImmutableSentencePieceText::ConvertToUnicodeSpans;
+ %ignore sentencepiece::ImmutableNBestSentencePieceText::mutable_proto;
+ %ignore sentencepiece::ImmutableNBestSentencePieceText::nbests() const;
++%ignore sentencepiece::ImmutableNBestSentencePieceText::ConvertToUnicodeSpans;
+
+ %ignore sentencepiece::SentencePieceProcessor::Encode;
+ %ignore sentencepiece::SentencePieceProcessor::SampleEncode;
+@@ -481,6 +484,13 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ sentencepiece::util::bytes);
+ }
+
++ std::vector<sentencepiece::ImmutableSentencePieceText>
++ _DecodeIdsAsImmutableProtoBatch(
++ const std::vector<std::vector<int>> &ins, int num_threads) const {
++ DEFINE_DECODE_BATCH_FUNC_IMPL(DecodeIdsAsImmutableProto, int,
++ sentencepiece::ImmutableSentencePieceText);
++ }
++
+ std::vector<std::string> _DecodePiecesBatch(
+ const std::vector<std::vector<absl::string_view>> &ins, int num_threads) const {
+ DEFINE_DECODE_BATCH_FUNC_IMPL(DecodePieces, std::string, std::string);
+@@ -852,6 +862,8 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ return self._NBestEncodeAsImmutableProto(text, nbest_size,
+ add_bos, add_eos, reverse, emit_unk_piece)
+
++ raise RuntimeError('unknown out_type')
++
+ if type(input) is list:
+ return [_encode(n) for n in input]
+
+@@ -934,10 +946,21 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ if out_type is int:
+ return self._SampleEncodeAndScoreAsIds(text, num_samples, alpha, wor, include_best,
+ add_bos, add_eos, reverse, emit_unk_piece)
+- else:
++ if out_type is str:
+ return self._SampleEncodeAndScoreAsPieces(text, num_samples, alpha, wor, include_best,
+ add_bos, add_eos, reverse, emit_unk_piece)
+
++ if out_type == 'serialized_proto' or out_type == 'proto':
++ return self._SampleEncodeAndScoreAsSerializedProto(text, num_samples, alpha, wor, include_best,
++ add_bos, add_eos, reverse, emit_unk_piece)
++
++ if out_type == 'immutable_proto':
++ return self._SampleEncodeAndScoreAsImmutableProto(text, num_samples, alpha, wor, include_best,
++ add_bos, add_eos, reverse, emit_unk_piece)
++
++ raise RuntimeError('unknown output type')
++
++
+ if type(input) is list:
+ return [_encode(n) for n in input]
+
+@@ -1187,7 +1210,7 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ }
+
+ %extend sentencepiece::ImmutableSentencePieceText {
+- ImmutableSentencePieceText_ImmutableSentencePiece pieces(int index) const {
++ ImmutableSentencePieceText_ImmutableSentencePiece _pieces(int index) const {
+ if (index < 0 || index >= static_cast<int>($self->pieces_size())) {
+ throw sentencepiece::util::Status(
+ sentencepiece::util::StatusCode::kOutOfRange,
+@@ -1197,19 +1220,25 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ }
+
+ %pythoncode {
++ def pieces(self, i):
++ return self._pieces(i)
++
+ def __len__(self):
+ return self.pieces_size()
+
+ def __getitem__(self, i):
+- return self.pieces(i)
++ return self._pieces(i)
+
+ def __eq__(self, other):
+ return self.SerializeAsString() == other.SerializeAsString()
++
++ def __hash__(self):
++ return hash(self.SerializeAsString())
+ }
+ }
+
+ %extend sentencepiece::ImmutableNBestSentencePieceText {
+- ImmutableSentencePieceText nbests(int index) const {
++ ImmutableSentencePieceText _nbests(int index) const {
+ if (index < 0 || index >= static_cast<int>($self->nbests_size())) {
+ throw sentencepiece::util::Status(
+ sentencepiece::util::StatusCode::kOutOfRange,
+@@ -1219,14 +1248,20 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ }
+
+ %pythoncode {
++ def __nbests__(self, i):
++ return self._nbests(i)
++
+ def __len__(self):
+ return self.nbests_size()
+
+ def __getitem__(self, i):
+- return self.nbests(i)
++ return self._nbests(i)
+
+ def __eq__(self, other):
+ return self.SerializeAsString() == other.SerializeAsString()
++
++ def __hash__(self):
++ return hash(self.SerializeAsString())
+ }
+ }
+
+diff --git a/python/src/sentencepiece/sentencepiece_wrap.cxx b/python/src/sentencepiece/sentencepiece_wrap.cxx
+index 9776b0f..22e0708 100644
+--- a/python/src/sentencepiece/sentencepiece_wrap.cxx
++++ b/python/src/sentencepiece/sentencepiece_wrap.cxx
+@@ -2811,6 +2811,7 @@ namespace swig {
+ }
+
+
++
+ #include <iostream>
+ #include <algorithm>
+ #include <functional>
+@@ -3132,16 +3133,6 @@ SWIG_From_size_t (size_t value)
+ }
+
+
+- #define SWIG_From_double PyFloat_FromDouble
+-
+-
+-SWIGINTERNINLINE PyObject *
+-SWIG_From_float (float value)
+-{
+- return SWIG_From_double (value);
+-}
+-
+-
+ SWIGINTERN int
+ SWIG_AsVal_double (PyObject *obj, double *val)
+ {
+@@ -3282,7 +3273,17 @@ SWIG_AsVal_int (PyObject * obj, int *val)
+ return res;
+ }
+
+-SWIGINTERN sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece sentencepiece_ImmutableSentencePieceText_pieces(sentencepiece::ImmutableSentencePieceText const *self,int index){
++
++ #define SWIG_From_double PyFloat_FromDouble
++
++
++SWIGINTERNINLINE PyObject *
++SWIG_From_float (float value)
++{
++ return SWIG_From_double (value);
++}
++
++SWIGINTERN sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece sentencepiece_ImmutableSentencePieceText__pieces(sentencepiece::ImmutableSentencePieceText const *self,int index){
+ if (index < 0 || index >= static_cast<int>(self->pieces_size())) {
+ throw sentencepiece::util::Status(
+ sentencepiece::util::StatusCode::kOutOfRange,
+@@ -3290,7 +3291,7 @@ SWIGINTERN sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece sent
+ }
+ return self->pieces(index);
+ }
+-SWIGINTERN sentencepiece::ImmutableSentencePieceText sentencepiece_ImmutableNBestSentencePieceText_nbests(sentencepiece::ImmutableNBestSentencePieceText const *self,int index){
++SWIGINTERN sentencepiece::ImmutableSentencePieceText sentencepiece_ImmutableNBestSentencePieceText__nbests(sentencepiece::ImmutableNBestSentencePieceText const *self,int index){
+ if (index < 0 || index >= static_cast<int>(self->nbests_size())) {
+ throw sentencepiece::util::Status(
+ sentencepiece::util::StatusCode::kOutOfRange,
+@@ -3590,6 +3591,10 @@ SWIGINTERN BytesArray sentencepiece_SentencePieceProcessor__DecodeIdsAsSerialize
+ DEFINE_DECODE_BATCH_FUNC_IMPL(DecodeIdsAsSerializedProto, int,
+ sentencepiece::util::bytes);
+ }
++SWIGINTERN std::vector< sentencepiece::ImmutableSentencePieceText > sentencepiece_SentencePieceProcessor__DecodeIdsAsImmutableProtoBatch(sentencepiece::SentencePieceProcessor const *self,std::vector< std::vector< int > > const &ins,int num_threads){
++ DEFINE_DECODE_BATCH_FUNC_IMPL(DecodeIdsAsImmutableProto, int,
++ sentencepiece::ImmutableSentencePieceText);
++ }
+ SWIGINTERN std::vector< std::string > sentencepiece_SentencePieceProcessor__DecodePiecesBatch(sentencepiece::SentencePieceProcessor const *self,std::vector< std::vector< absl::string_view > > const &ins,int num_threads){
+ DEFINE_DECODE_BATCH_FUNC_IMPL(DecodePieces, std::string, std::string);
+ }
+@@ -4070,6 +4075,44 @@ fail:
+ }
+
+
++SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_pieces(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::ImmutableSentencePieceText *arg1 = (sentencepiece::ImmutableSentencePieceText *) 0 ;
++ int arg2 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ int val2 ;
++ int ecode2 = 0 ;
++ PyObject *swig_obj[2] ;
++ sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece result;
++
++ if (!SWIG_Python_UnpackTuple(args, "ImmutableSentencePieceText_pieces", 2, 2, swig_obj)) SWIG_fail;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText, 0 | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText_pieces" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText const *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::ImmutableSentencePieceText * >(argp1);
++ ecode2 = SWIG_AsVal_int(swig_obj[1], &val2);
++ if (!SWIG_IsOK(ecode2)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode2), "in method '" "ImmutableSentencePieceText_pieces" "', argument " "2"" of type '" "int""'");
++ }
++ arg2 = static_cast< int >(val2);
++ {
++ try {
++ result = ((sentencepiece::ImmutableSentencePieceText const *)arg1)->pieces(arg2);
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ resultobj = SWIG_NewPointerObj((new sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece(static_cast< const sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece& >(result))), SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText_ImmutableSentencePiece, SWIG_POINTER_OWN | 0 );
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
+ SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_text(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::ImmutableSentencePieceText *arg1 = (sentencepiece::ImmutableSentencePieceText *) 0 ;
+@@ -4168,7 +4211,7 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_pieces(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText__pieces(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::ImmutableSentencePieceText *arg1 = (sentencepiece::ImmutableSentencePieceText *) 0 ;
+ int arg2 ;
+@@ -4179,20 +4222,20 @@ SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_pieces(PyObject *SWIGUNUSE
+ PyObject *swig_obj[2] ;
+ sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece result;
+
+- if (!SWIG_Python_UnpackTuple(args, "ImmutableSentencePieceText_pieces", 2, 2, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "ImmutableSentencePieceText__pieces", 2, 2, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText_pieces" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText__pieces" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::ImmutableSentencePieceText * >(argp1);
+ ecode2 = SWIG_AsVal_int(swig_obj[1], &val2);
+ if (!SWIG_IsOK(ecode2)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode2), "in method '" "ImmutableSentencePieceText_pieces" "', argument " "2"" of type '" "int""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode2), "in method '" "ImmutableSentencePieceText__pieces" "', argument " "2"" of type '" "int""'");
+ }
+ arg2 = static_cast< int >(val2);
+ {
+ try {
+- result = sentencepiece_ImmutableSentencePieceText_pieces((sentencepiece::ImmutableSentencePieceText const *)arg1,arg2);
++ result = sentencepiece_ImmutableSentencePieceText__pieces((sentencepiece::ImmutableSentencePieceText const *)arg1,arg2);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+@@ -4299,6 +4342,44 @@ fail:
+ }
+
+
++SWIGINTERN PyObject *_wrap_ImmutableNBestSentencePieceText_nbests(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::ImmutableNBestSentencePieceText *arg1 = (sentencepiece::ImmutableNBestSentencePieceText *) 0 ;
++ int arg2 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ int val2 ;
++ int ecode2 = 0 ;
++ PyObject *swig_obj[2] ;
++ sentencepiece::ImmutableSentencePieceText result;
++
++ if (!SWIG_Python_UnpackTuple(args, "ImmutableNBestSentencePieceText_nbests", 2, 2, swig_obj)) SWIG_fail;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableNBestSentencePieceText, 0 | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableNBestSentencePieceText_nbests" "', argument " "1"" of type '" "sentencepiece::ImmutableNBestSentencePieceText const *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::ImmutableNBestSentencePieceText * >(argp1);
++ ecode2 = SWIG_AsVal_int(swig_obj[1], &val2);
++ if (!SWIG_IsOK(ecode2)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode2), "in method '" "ImmutableNBestSentencePieceText_nbests" "', argument " "2"" of type '" "int""'");
++ }
++ arg2 = static_cast< int >(val2);
++ {
++ try {
++ result = ((sentencepiece::ImmutableNBestSentencePieceText const *)arg1)->nbests(arg2);
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ resultobj = SWIG_NewPointerObj((new sentencepiece::ImmutableSentencePieceText(static_cast< const sentencepiece::ImmutableSentencePieceText& >(result))), SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText, SWIG_POINTER_OWN | 0 );
++ return resultobj;
++fail:
++ return NULL;
++}
++
++
+ SWIGINTERN PyObject *_wrap_ImmutableNBestSentencePieceText_SerializeAsString(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::ImmutableNBestSentencePieceText *arg1 = (sentencepiece::ImmutableNBestSentencePieceText *) 0 ;
+@@ -4332,7 +4413,7 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_ImmutableNBestSentencePieceText_nbests(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_ImmutableNBestSentencePieceText__nbests(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::ImmutableNBestSentencePieceText *arg1 = (sentencepiece::ImmutableNBestSentencePieceText *) 0 ;
+ int arg2 ;
+@@ -4343,20 +4424,20 @@ SWIGINTERN PyObject *_wrap_ImmutableNBestSentencePieceText_nbests(PyObject *SWIG
+ PyObject *swig_obj[2] ;
+ sentencepiece::ImmutableSentencePieceText result;
+
+- if (!SWIG_Python_UnpackTuple(args, "ImmutableNBestSentencePieceText_nbests", 2, 2, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "ImmutableNBestSentencePieceText__nbests", 2, 2, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableNBestSentencePieceText, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableNBestSentencePieceText_nbests" "', argument " "1"" of type '" "sentencepiece::ImmutableNBestSentencePieceText const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableNBestSentencePieceText__nbests" "', argument " "1"" of type '" "sentencepiece::ImmutableNBestSentencePieceText const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::ImmutableNBestSentencePieceText * >(argp1);
+ ecode2 = SWIG_AsVal_int(swig_obj[1], &val2);
+ if (!SWIG_IsOK(ecode2)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode2), "in method '" "ImmutableNBestSentencePieceText_nbests" "', argument " "2"" of type '" "int""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode2), "in method '" "ImmutableNBestSentencePieceText__nbests" "', argument " "2"" of type '" "int""'");
+ }
+ arg2 = static_cast< int >(val2);
+ {
+ try {
+- result = sentencepiece_ImmutableNBestSentencePieceText_nbests((sentencepiece::ImmutableNBestSentencePieceText const *)arg1,arg2);
++ result = sentencepiece_ImmutableNBestSentencePieceText__nbests((sentencepiece::ImmutableNBestSentencePieceText const *)arg1,arg2);
+ ReleaseResultObject(resultobj);
+ }
+ catch (const sentencepiece::util::Status &status) {
+@@ -6822,6 +6903,87 @@ fail:
+ }
+
+
++SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodeIdsAsImmutableProtoBatch(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++ PyObject *resultobj = 0;
++ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
++ std::vector< std::vector< int > > *arg2 = 0 ;
++ int arg3 ;
++ void *argp1 = 0 ;
++ int res1 = 0 ;
++ int val3 ;
++ int ecode3 = 0 ;
++ PyObject *swig_obj[3] ;
++ SwigValueWrapper< std::vector< sentencepiece::ImmutableSentencePieceText > > result;
++
++ if (!SWIG_Python_UnpackTuple(args, "SentencePieceProcessor__DecodeIdsAsImmutableProtoBatch", 3, 3, swig_obj)) SWIG_fail;
++ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__SentencePieceProcessor, 0 | 0 );
++ if (!SWIG_IsOK(res1)) {
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "SentencePieceProcessor__DecodeIdsAsImmutableProtoBatch" "', argument " "1"" of type '" "sentencepiece::SentencePieceProcessor const *""'");
++ }
++ arg1 = reinterpret_cast< sentencepiece::SentencePieceProcessor * >(argp1);
++ {
++ std::vector<std::vector<int>> *out = nullptr;
++ if (PyList_Check(swig_obj[1])) {
++ const size_t size = PyList_Size(swig_obj[1]);
++ out = new std::vector<std::vector<int>>(size);
++ for (size_t i = 0; i < size; ++i) {
++ PyObject *o = PyList_GetItem(swig_obj[1], i);
++ if (PyList_Check(o)) {
++ const size_t size2 = PyList_Size(o);
++ (*out)[i].resize(size2);
++ for (size_t j = 0; j < size2; ++j) {
++ PyObject *o2 = PyList_GetItem(o, j);
++ if (PyInt_Check(o2)) {
++ (*out)[i][j] = static_cast<int>(PyInt_AsLong(o2));
++ } else {
++ PyErr_SetString(PyExc_TypeError, "list must contain integers");
++ SWIG_fail;
++ }
++ }
++ } else {
++ PyErr_SetString(PyExc_TypeError, "not a list");
++ SWIG_fail;
++ }
++ }
++ } else {
++ PyErr_SetString(PyExc_TypeError,"not a list");
++ SWIG_fail;
++ }
++ arg2 = out;
++ }
++ ecode3 = SWIG_AsVal_int(swig_obj[2], &val3);
++ if (!SWIG_IsOK(ecode3)) {
++ SWIG_exception_fail(SWIG_ArgError(ecode3), "in method '" "SentencePieceProcessor__DecodeIdsAsImmutableProtoBatch" "', argument " "3"" of type '" "int""'");
++ }
++ arg3 = static_cast< int >(val3);
++ {
++ try {
++ result = sentencepiece_SentencePieceProcessor__DecodeIdsAsImmutableProtoBatch((sentencepiece::SentencePieceProcessor const *)arg1,(std::vector< std::vector< int > > const &)*arg2,arg3);
++ ReleaseResultObject(resultobj);
++ }
++ catch (const sentencepiece::util::Status &status) {
++ SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
++ }
++ }
++ {
++ resultobj = PyList_New((&result)->size());
++ for (size_t i = 0; i < (&result)->size(); ++i) {
++ PyObject *obj = SWIG_NewPointerObj(new sentencepiece::ImmutableSentencePieceText((&result)->at(i)), SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText, SWIG_POINTER_OWN | 0);
++ PyList_SET_ITEM(resultobj, i, obj);
++ }
++ }
++ {
++ delete arg2;
++ }
++ return resultobj;
++fail:
++ {
++ delete arg2;
++ }
++ return NULL;
++}
++
++
+ SWIGINTERN PyObject *_wrap_SentencePieceProcessor__DecodePiecesBatch(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::SentencePieceProcessor *arg1 = (sentencepiece::SentencePieceProcessor *) 0 ;
+@@ -8298,17 +8460,19 @@ static PyMethodDef SwigMethods[] = {
+ { "new_ImmutableSentencePieceText", _wrap_new_ImmutableSentencePieceText, METH_NOARGS, NULL},
+ { "delete_ImmutableSentencePieceText", _wrap_delete_ImmutableSentencePieceText, METH_O, NULL},
+ { "ImmutableSentencePieceText_pieces_size", _wrap_ImmutableSentencePieceText_pieces_size, METH_O, NULL},
++ { "ImmutableSentencePieceText_pieces", _wrap_ImmutableSentencePieceText_pieces, METH_VARARGS, NULL},
+ { "ImmutableSentencePieceText_text", _wrap_ImmutableSentencePieceText_text, METH_O, NULL},
+ { "ImmutableSentencePieceText_score", _wrap_ImmutableSentencePieceText_score, METH_O, NULL},
+ { "ImmutableSentencePieceText_SerializeAsString", _wrap_ImmutableSentencePieceText_SerializeAsString, METH_O, NULL},
+- { "ImmutableSentencePieceText_pieces", _wrap_ImmutableSentencePieceText_pieces, METH_VARARGS, NULL},
++ { "ImmutableSentencePieceText__pieces", _wrap_ImmutableSentencePieceText__pieces, METH_VARARGS, NULL},
+ { "ImmutableSentencePieceText_swigregister", ImmutableSentencePieceText_swigregister, METH_O, NULL},
+ { "ImmutableSentencePieceText_swiginit", ImmutableSentencePieceText_swiginit, METH_VARARGS, NULL},
+ { "new_ImmutableNBestSentencePieceText", _wrap_new_ImmutableNBestSentencePieceText, METH_NOARGS, NULL},
+ { "delete_ImmutableNBestSentencePieceText", _wrap_delete_ImmutableNBestSentencePieceText, METH_O, NULL},
+ { "ImmutableNBestSentencePieceText_nbests_size", _wrap_ImmutableNBestSentencePieceText_nbests_size, METH_O, NULL},
+- { "ImmutableNBestSentencePieceText_SerializeAsString", _wrap_ImmutableNBestSentencePieceText_SerializeAsString, METH_O, NULL},
+ { "ImmutableNBestSentencePieceText_nbests", _wrap_ImmutableNBestSentencePieceText_nbests, METH_VARARGS, NULL},
++ { "ImmutableNBestSentencePieceText_SerializeAsString", _wrap_ImmutableNBestSentencePieceText_SerializeAsString, METH_O, NULL},
++ { "ImmutableNBestSentencePieceText__nbests", _wrap_ImmutableNBestSentencePieceText__nbests, METH_VARARGS, NULL},
+ { "ImmutableNBestSentencePieceText_swigregister", ImmutableNBestSentencePieceText_swigregister, METH_O, NULL},
+ { "ImmutableNBestSentencePieceText_swiginit", ImmutableNBestSentencePieceText_swiginit, METH_VARARGS, NULL},
+ { "new_SentencePieceProcessor", _wrap_new_SentencePieceProcessor, METH_NOARGS, NULL},
+@@ -8350,6 +8514,7 @@ static PyMethodDef SwigMethods[] = {
+ { "SentencePieceProcessor__DecodePiecesAsImmutableProto", _wrap_SentencePieceProcessor__DecodePiecesAsImmutableProto, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__DecodeIdsBatch", _wrap_SentencePieceProcessor__DecodeIdsBatch, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__DecodeIdsAsSerializedProtoBatch", _wrap_SentencePieceProcessor__DecodeIdsAsSerializedProtoBatch, METH_VARARGS, NULL},
++ { "SentencePieceProcessor__DecodeIdsAsImmutableProtoBatch", _wrap_SentencePieceProcessor__DecodeIdsAsImmutableProtoBatch, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__DecodePiecesBatch", _wrap_SentencePieceProcessor__DecodePiecesBatch, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__DecodePiecesAsSerializedProtoBatch", _wrap_SentencePieceProcessor__DecodePiecesAsSerializedProtoBatch, METH_VARARGS, NULL},
+ { "SentencePieceProcessor__DecodePiecesAsImmutableProtoBatch", _wrap_SentencePieceProcessor__DecodePiecesAsImmutableProtoBatch, METH_VARARGS, NULL},
+diff --git a/python/test/sentencepiece_test.py b/python/test/sentencepiece_test.py
+index 2f2c84a..5e4af7f 100755
+--- a/python/test/sentencepiece_test.py
++++ b/python/test/sentencepiece_test.py
+@@ -266,6 +266,13 @@ class TestSentencepieceProcessor(unittest.TestCase):
+ t4 = self.sp_.decode_pieces_as_serialized_proto(['foo', 'bar'])
+ t5 = self.sp_.decode_ids_as_serialized_proto([20, 30])
+
++ y1 = self.sp_.encode(text, out_type='serialized_proto')
++ y2 = self.sp_.encode(
++ text, enable_sampling=True, out_type='serialized_proto')
++ y3 = self.sp_.nbest_encode(text, out_type='serialized_proto', nbest_size=10)
++ y4 = self.sp_.decode(['foo', 'bar'], out_type='serialized_proto')
++ y5 = self.sp_.decode([20, 30], out_type='serialized_proto')
++
+ self.assertEqual(type(s1), bytes)
+ self.assertEqual(type(s2), bytes)
+ self.assertEqual(type(t2), bytes)
+@@ -277,6 +284,92 @@ class TestSentencepieceProcessor(unittest.TestCase):
+ self.assertEqual(s3, t3)
+ self.assertEqual(s4, t4)
+ self.assertEqual(s5, t5)
++ self.assertEqual(s1, y1)
++ self.assertEqual(s3, y3)
++ self.assertEqual(s4, y4)
++ self.assertEqual(s5, y5)
++
++ ids = self.jasp_.EncodeAsIds(text)
++ pieces = self.jasp_.EncodeAsPieces(text)
++ s1 = self.jasp_.EncodeAsSerializedProto(text)
++ s2 = self.jasp_.DecodeIdsAsSerializedProto(ids)
++ s3 = self.jasp_.DecodePiecesAsSerializedProto(ids)
++ self.assertEqual(s2, s1)
++ self.assertEqual(s3, s1)
++
++ def test_immutable_proto(self):
++ text = 'I saw a girl with a telescope.'
++ s1 = self.sp_.EncodeAsImmutableProto(text)
++ s2 = self.sp_.SampleEncodeAsImmutableProto(text, 10, 0.2)
++ s3 = self.sp_.NBestEncodeAsImmutableProto(text, 10)
++ s4 = self.sp_.DecodePiecesAsImmutableProto(['foo', 'bar'])
++ s5 = self.sp_.DecodeIdsAsImmutableProto([20, 30])
++
++ t1 = self.sp_.encode_as_immutable_proto(text)
++ t2 = self.sp_.sample_encode_as_immutable_proto(text, 10, 0.2)
++ t3 = self.sp_.nbest_encode_as_immutable_proto(text, 10)
++ t4 = self.sp_.decode_pieces_as_immutable_proto(['foo', 'bar'])
++ t5 = self.sp_.decode_ids_as_immutable_proto([20, 30])
++
++ y1 = self.sp_.encode(text, out_type='immutable_proto')
++ y2 = self.sp_.encode(text, enable_sampling=True, out_type='immutable_proto')
++ y3 = self.sp_.nbest_encode(text, out_type='immutable_proto', nbest_size=10)
++ y4 = self.sp_.decode(['foo', 'bar'], out_type='immutable_proto')
++ y5 = self.sp_.decode([20, 30], out_type='immutable_proto')
++
++ self.assertEqual(s1, t1)
++ self.assertEqual(s3, t3)
++ self.assertEqual(s4, t4)
++ self.assertEqual(s5, t5)
++ self.assertEqual(s1, y1)
++ self.assertEqual(s3, y3)
++ self.assertEqual(s4, y4)
++ self.assertEqual(s5, y5)
++
++ x1 = self.sp_.encode_as_serialized_proto(text)
++ x2 = self.sp_.sample_encode_as_serialized_proto(text, 10, 0.2)
++ x3 = self.sp_.nbest_encode_as_serialized_proto(text, 10)
++ x4 = self.sp_.decode_pieces_as_serialized_proto(['foo', 'bar'])
++ x5 = self.sp_.decode_ids_as_serialized_proto([20, 30])
++
++ self.assertEqual(x1, t1.SerializeAsString())
++ self.assertEqual(x3, t3.SerializeAsString())
++ self.assertEqual(x4, t4.SerializeAsString())
++ self.assertEqual(x5, t5.SerializeAsString())
++
++ v1 = self.sp_.EncodeAsIds(text)
++ v2 = self.sp_.EncodeAsPieces(text)
++ self.assertEqual([x.id() for x in s1], v1)
++ self.assertEqual([x.piece() for x in s1], v2)
++ self.assertEqual(text, s1.text())
++
++ surfaces1 = [s1.text()[x.begin():x.end()] for x in s1]
++ surfaces2 = [x.surface() for x in s1]
++ self.assertEqual(surfaces1, surfaces2)
++
++ ids = []
++ for i in range(s1.pieces_size()):
++ ids.append(s1.pieces(i).id())
++ self.assertEqual(ids, v1)
++
++ pieces = []
++ for i in range(s1.pieces_size()):
++ pieces.append(s1.pieces(i).piece())
++ self.assertEqual(pieces, v2)
++
++ # Japanese offset
++ s1 = self.jasp_.EncodeAsImmutableProto('吾輩は猫である。Hello world. ABC 123')
++ surfaces1 = [s1.text()[x.begin():x.end()] for x in s1]
++ surfaces2 = [x.surface() for x in s1]
++ self.assertEqual(surfaces1, surfaces2)
++
++ ids = [x.id() for x in s1]
++ s2 = self.jasp_.DecodeIdsAsImmutableProto(ids)
++ self.assertEqual(s2, s1)
++
++ pieces = [x.piece() for x in s1]
++ s2 = self.jasp_.DecodePiecesAsImmutableProto(pieces)
++ self.assertEqual(s2, s1)
+
+ def test_new_api(self):
+ sp = spm.SentencePieceProcessor(
+@@ -386,49 +479,102 @@ class TestSentencepieceProcessor(unittest.TestCase):
+ self.assertEqual(pieces, sp.encode(text, add_bos=False, add_eos=True))
+
+ def test_sampling(self):
+- sp = spm.SentencePieceProcessor(
+- model_file=os.path.join('test', 'test_model.model'),
+- out_type=str,
+- enable_sampling=True)
+- ids = defaultdict(int)
+- for n in range(100):
+- ++ids[' '.join(sp.encode('hello world'))]
+- self.assertGreater(len(ids), 1)
+-
+- ids2 = defaultdict(int)
+- for n in range(100):
+- ++ids2[' '.join(sp.encode('hello world', enable_sampling=False))]
+- self.assertEqual(len(ids2), 1)
++ sp = self.sp_
++
++ for out_type in [str, int, 'serialized_proto', 'immutable_proto']:
++ ids = defaultdict(int)
++ for n in range(100):
++ out = sp.encode('hello world', out_type=out_type, enable_sampling=True)
++ if type(out) is list:
++ out = tuple(out)
++ ids[out] += 1
++ self.assertGreater(len(ids), 1)
++
++ ids2 = defaultdict(int)
++ for n in range(100):
++ out = sp.encode('hello world', out_type=out_type, enable_sampling=False)
++ if type(out) is list:
++ out = tuple(out)
++ ids2[out] += 1
++ self.assertEqual(len(ids2), 1)
++
++ out = sp.encode(['hello world', 'this is a test'],
++ out_type=out_type,
++ enable_sampling=True)
++ self.assertEqual(len(out), 2)
++ out = sp.encode(['hello world', 'this is a test'],
++ out_type=out_type,
++ enable_sampling=False)
++ self.assertEqual(len(out), 2)
+
+ def test_nbest(self):
+- sp = spm.SentencePieceProcessor(
+- model_file=os.path.join('test', 'test_model.model'))
++ sp = self.sp_
+ text = 'hello world'
+- results = sp.nbest_encode(text, nbest_size=10, out_type=str)
+- self.assertEqual(results, sp.NBestEncode(text, nbest_size=10, out_type=str))
+- for n in results:
+- self.assertEqual(sp.decode(n), text)
+- decoded = sp.decode(results)
+- for n in decoded:
+- self.assertEqual(n, text)
+- results = sp.nbest_encode(text, nbest_size=10, out_type=int)
+- self.assertEqual(results, sp.NBestEncode(text, nbest_size=10, out_type=int))
+- for n in results:
+- self.assertEqual(sp.decode(n), text)
+- decoded = sp.decode(results)
+- for n in decoded:
+- self.assertEqual(n, text)
++ text2 = 'I have a pen.'
++
++ for out_type in [str, int, 'serialized_proto', 'immutable_proto']:
++ results = sp.nbest_encode(text, nbest_size=10, out_type=out_type)
++ self.assertEqual(results,
++ sp.NBestEncode(text, nbest_size=10, out_type=out_type))
++
++ if out_type in [str, int]:
++ for n in results:
++ self.assertEqual(sp.decode(n), text)
++
++ for n in sp.decode(results):
++ self.assertEqual(n, text)
++
++ # batch test
++ results = sp.nbest_encode([text, text2], nbest_size=10, out_type=out_type)
++ self.assertEqual(
++ results,
++ sp.NBestEncode([text, text2], nbest_size=10, out_type=out_type))
++ self.assertEqual(len(results), 2)
++
++ if out_type in [str, int]:
++ for n in results[0]:
++ self.assertEqual(sp.decode(n), text)
++
++ for n in results[1]:
++ self.assertEqual(sp.decode(n), text2)
++
++ decoded = sp.decode(results[0])
++ self.assertEqual(len(decoded), 10)
++ for n in decoded:
++ self.assertEqual(n, text)
++ decoded = sp.decode(results[1])
++ self.assertEqual(len(decoded), 10)
++ for n in decoded:
++ self.assertEqual(n, text2)
+
+ def test_sample_and_score(self):
+- sp = spm.SentencePieceProcessor(
+- model_file=os.path.join('test', 'test_model.model'))
++ sp = self.sp_
+ text = 'hello world'
+- results = sp.sample_encode_and_score(text, wor=True, out_type=str)
+- for n in results:
+- self.assertEqual(sp.decode(n[0]), text)
+- results = sp.sample_encode_and_score(text, wor=True, out_type=int)
+- for n in results:
+- self.assertEqual(sp.decode(n[0]), text)
++ text2 = 'I have a pen.'
++ for out_type in [str, int, 'serialized_proto', 'immutable_proto']:
++ results = sp.sample_encode_and_score(
++ text, wor=True, num_samples=10, out_type=out_type)
++ results = sp.SampleEncodeAndScore(
++ text, wor=False, num_samples=10, out_type=out_type)
++
++ if out_type in [str, int]:
++ for n in results:
++ self.assertEqual(sp.decode(n[0]), text)
++
++ results = sp.sample_encode_and_score([text, text2],
++ wor=True,
++ num_samples=10,
++ out_type=out_type)
++ results = sp.SampleEncodeAndScore([text, text2],
++ wor=True,
++ num_samples=10,
++ out_type=out_type)
++
++ if out_type in [str, int]:
++ for n in results[0]:
++ self.assertEqual(sp.decode(n[0]), text)
++ for n in results[1]:
++ self.assertEqual(sp.decode(n[0]), text2)
+
+ def test_valid_range(self):
+ size = self.sp_.piece_size()
+@@ -452,65 +598,28 @@ class TestSentencepieceProcessor(unittest.TestCase):
+ with open(os.path.join(data_dir, 'botchan.txt'), 'r') as file:
+ texts = file.readlines()
+
+- r1 = sp.encode(texts, out_type=str, num_threads=None)
+- r2 = sp.encode(texts, out_type=str, num_threads=1)
+- r3 = sp.encode(texts, out_type=str, num_threads=-1)
+- r4 = sp.encode(texts, out_type=str, num_threads=8)
+- r5 = [sp.encode(s, out_type=str) for s in texts]
+- self.assertEqual(r1, r2)
+- self.assertEqual(r1, r3)
+- self.assertEqual(r1, r4)
+- self.assertEqual(r1, r5)
+-
+- d1 = sp.decode(r1, num_threads=None)
+- d2 = sp.decode(r2, num_threads=1)
+- d3 = sp.decode(r3, num_threads=-1)
+- d4 = sp.decode(r4, num_threads=8)
+- d5 = [sp.decode(s) for s in r5]
+- self.assertEqual(d1, d2)
+- self.assertEqual(d1, d3)
+- self.assertEqual(d1, d4)
+- self.assertEqual(d1, d5)
+-
+- r1 = sp.encode(texts, out_type=int, num_threads=None)
+- r2 = sp.encode(texts, out_type=int, num_threads=1)
+- r3 = sp.encode(texts, out_type=int, num_threads=-1)
+- r4 = sp.encode(texts, out_type=int, num_threads=8)
+- r5 = [sp.encode(s, out_type=int) for s in texts]
+- self.assertEqual(r1, r2)
+- self.assertEqual(r1, r3)
+- self.assertEqual(r1, r4)
+- self.assertEqual(r1, r5)
+-
+- d1 = sp.decode(r1, num_threads=None)
+- d2 = sp.decode(r2, num_threads=1)
+- d3 = sp.decode(r3, num_threads=-1)
+- d4 = sp.decode(r4, num_threads=8)
+- d5 = [sp.decode(s) for s in r5]
+- self.assertEqual(d1, d2)
+- self.assertEqual(d1, d3)
+- self.assertEqual(d1, d4)
+- self.assertEqual(d1, d5)
+-
+- r1 = sp.encode(texts, out_type='serialized_proto', num_threads=None)
+- r2 = sp.encode(texts, out_type='serialized_proto', num_threads=1)
+- r3 = sp.encode(texts, out_type='serialized_proto', num_threads=-1)
+- r4 = sp.encode(texts, out_type='serialized_proto', num_threads=8)
+- r5 = [sp.encode(s, out_type='serialized_proto') for s in texts]
+- self.assertEqual(r1, r2)
+- self.assertEqual(r1, r3)
+- self.assertEqual(r1, r4)
+- self.assertEqual(r1, r5)
+-
+- r1 = sp.encode(texts, out_type='immutable_proto', num_threads=None)
+- r2 = sp.encode(texts, out_type='immutable_proto', num_threads=1)
+- r3 = sp.encode(texts, out_type='immutable_proto', num_threads=-1)
+- r4 = sp.encode(texts, out_type='immutable_proto', num_threads=8)
+- r5 = [sp.encode(s, out_type='immutable_proto') for s in texts]
+- self.assertEqual(r1, r2)
+- self.assertEqual(r1, r3)
+- self.assertEqual(r1, r4)
+- self.assertEqual(r1, r5)
++ for out_type in [str, int, 'serialized_proto', 'immutable_proto']:
++ r1 = sp.encode(texts, out_type=out_type, num_threads=None)
++ r2 = sp.encode(texts, out_type=out_type, num_threads=1)
++ r3 = sp.encode(texts, out_type=out_type, num_threads=-1)
++ r4 = sp.encode(texts, out_type=out_type, num_threads=8)
++ r5 = [sp.encode(s, out_type=out_type) for s in texts]
++ self.assertEqual(r1, r2)
++ self.assertEqual(r1, r3)
++ self.assertEqual(r1, r4)
++ self.assertEqual(r1, r5)
++
++ if out_type in [str, int]:
++ d1 = sp.decode(r1, num_threads=None)
++ d2 = sp.decode(r2, num_threads=1)
++ d3 = sp.decode(r3, num_threads=-1)
++ d4 = sp.decode(r4, num_threads=8)
++ d5 = [sp.decode(s) for s in r5]
++
++ self.assertEqual(d1, d2)
++ self.assertEqual(d1, d3)
++ self.assertEqual(d1, d4)
++ self.assertEqual(d1, d5)
+
+ e1 = sp.calculate_entropy(texts, alpha=1.0, num_threads=10)
+ e2 = sp.CalculateEntropy(texts, alpha=1.0, num_threads=10)
+diff --git a/src/sentencepiece_processor.cc b/src/sentencepiece_processor.cc
+index 482a45b..2a5c399 100644
+--- a/src/sentencepiece_processor.cc
++++ b/src/sentencepiece_processor.cc
+@@ -55,6 +55,34 @@ std::vector<absl::string_view> ToPieceArray(const std::vector<std::string> &v) {
+ return out;
+ }
+
++void ConvertToUnicodeSpansInternal(SentencePieceText *spt) {
++ if (spt == nullptr) return;
++
++ std::vector<int> utf8_to_unicode(spt->text().size() + 1, 0);
++ absl::string_view str = spt->text();
++ size_t prev = 0;
++ int ulen = 0;
++ while (!str.empty()) {
++ const size_t mblen = string_util::OneCharLen(str.data());
++ for (int i = prev; i < prev + mblen; ++i) {
++ utf8_to_unicode[i] = ulen;
++ }
++ ++ulen;
++ prev += mblen;
++ str.remove_prefix(mblen);
++ }
++ utf8_to_unicode[prev] = ulen;
++
++ auto clip = [&](int s) {
++ return std::min<int>(std::max<int>(0, s), utf8_to_unicode.size() - 1);
++ };
++
++ for (auto &piece : *(spt->mutable_pieces())) {
++ piece.set_begin(utf8_to_unicode[clip(piece.begin())]);
++ piece.set_end(utf8_to_unicode[clip(piece.end())]);
++ }
++}
++
+ } // namespace
+
+ ImmutableSentencePieceText::ImmutableSentencePieceText()
+@@ -132,6 +160,10 @@ SentencePieceText *ImmutableSentencePieceText::mutable_proto() {
+ return rep_.get();
+ }
+
++void ImmutableSentencePieceText::ConvertToUnicodeSpans() {
++ ConvertToUnicodeSpansInternal(mutable_proto());
++}
++
+ util::bytes ImmutableSentencePieceText::SerializeAsString() const {
+ return spt_->SerializeAsString();
+ }
+@@ -164,6 +196,13 @@ NBestSentencePieceText *ImmutableNBestSentencePieceText::mutable_proto() {
+ return rep_.get();
+ }
+
++void ImmutableNBestSentencePieceText::ConvertToUnicodeSpans() {
++ if (!mutable_proto()) return;
++ for (auto &spt : *(mutable_proto()->mutable_nbests())) {
++ ConvertToUnicodeSpansInternal(&spt);
++ }
++}
++
+ util::bytes ImmutableNBestSentencePieceText::SerializeAsString() const {
+ return rep_ ? rep_->SerializeAsString() : "";
+ }
+@@ -1048,34 +1087,6 @@ std::string SentencePieceProcessor::serialized_model_proto() const {
+ // std::random_device.
+ void SetRandomGeneratorSeed(unsigned int seed);
+
+-void ConvertToUnicodeSpans(SentencePieceText *spt) {
+- if (spt == nullptr) return;
+-
+- std::vector<int> utf8_to_unicode(spt->text().size() + 1, 0);
+- absl::string_view str = spt->text();
+- size_t prev = 0;
+- int ulen = 0;
+- while (!str.empty()) {
+- const size_t mblen = string_util::OneCharLen(str.data());
+- for (int i = prev; i < prev + mblen; ++i) {
+- utf8_to_unicode[i] = ulen;
+- }
+- ++ulen;
+- prev += mblen;
+- str.remove_prefix(mblen);
+- }
+- utf8_to_unicode[prev] = ulen;
+-
+- auto clip = [&](int s) {
+- return std::min<int>(std::max<int>(0, s), utf8_to_unicode.size() - 1);
+- };
+-
+- for (auto &piece : *(spt->mutable_pieces())) {
+- piece.set_begin(utf8_to_unicode[clip(piece.begin())]);
+- piece.set_end(utf8_to_unicode[clip(piece.end())]);
+- }
+-}
+-
+ namespace io {
+ util::Status LoadModelProto(absl::string_view filename,
+ ModelProto *model_proto) {
+diff --git a/src/sentencepiece_processor.h b/src/sentencepiece_processor.h
+index b7fae6a..d107a2a 100644
+--- a/src/sentencepiece_processor.h
++++ b/src/sentencepiece_processor.h
+@@ -25,8 +25,8 @@
+ #ifndef SWIG
+ namespace absl {
+ using std::string_view;
+-}
+-#endif // SWIG
++} // namespace absl
++#endif
+
+ namespace sentencepiece {
+ namespace util {
+@@ -196,6 +196,9 @@ class ImmutableSentencePieceText {
+ // it returns the raw pointer managed by the shared_ptr.
+ SentencePieceText *mutable_proto();
+
++ // Converts the utf8 byte spans into Unicode char span.
++ void ConvertToUnicodeSpans();
++
+ friend class ImmutableNBestSentencePieceText;
+
+ private:
+@@ -225,6 +228,8 @@ class ImmutableNBestSentencePieceText {
+ // it returns the raw pointer managed by the shared_ptr.
+ NBestSentencePieceText *mutable_proto();
+
++ void ConvertToUnicodeSpans();
++
+ private:
+ std::shared_ptr<NBestSentencePieceText> rep_;
+ };
+@@ -415,14 +420,16 @@ class SentencePieceProcessor {
+ virtual util::Status Decode(const std::vector<int> &ids,
+ SentencePieceText *spt) const;
+
+-#ifdef SWIG
++#ifdef SWIGPYTHON
++#define CONVERT_TO_UNICODE_SPAN output.ConvertToUnicodeSpans();
+ #define SPP_SWIG_CHECK_AND_THROW \
+ if (!status.ok()) throw status;
+ #else
++#define CONVERT_TO_UNICODE_SPAN
+ #define SPP_SWIG_CHECK_AND_THROW \
+ if (!status.ok()) { \
+ }
+-#endif // SWIG
++#endif // SWIGPYTHON
+
+ #define DEFINE_SPP_DIRECT_FUNC_IMPL(FuncName, OutType, ...) \
+ OutType output; \
+@@ -439,6 +446,7 @@ class SentencePieceProcessor {
+ #define DEFINE_SPP_IMMUTABLE_PROTO_IMPL(FuncName, OutType, ...) \
+ OutType output; \
+ const auto status = FuncName(__VA_ARGS__, output.mutable_proto()); \
++ CONVERT_TO_UNICODE_SPAN; \
+ SPP_SWIG_CHECK_AND_THROW; \
+ return output;
+
+@@ -707,9 +715,6 @@ class SentencePieceProcessor {
+ // std::random_device.
+ void SetRandomGeneratorSeed(unsigned int seed);
+
+-// Converts the utf8 byte spans into Unicode char span.
+-void ConvertToUnicodeSpans(SentencePieceText *spt);
+-
+ #ifndef SWIG
+ // IO related functions to absorb model formats.
+ namespace io {
+diff --git a/src/sentencepiece_processor_test.cc b/src/sentencepiece_processor_test.cc
+index ff55aeb..f05dc5d 100644
+--- a/src/sentencepiece_processor_test.cc
++++ b/src/sentencepiece_processor_test.cc
+@@ -1657,11 +1657,12 @@ TEST(SentencePieceProcessorTest, ImmutableNBestSentencePieceTextTest) {
+
+ TEST(SentencePieceProcessorTest, ConvertToUnicodeSpansTest) {
+ auto make_spt = [&](const std::vector<std::string> &tokens) {
+- SentencePieceText spt;
++ ImmutableSentencePieceText ispt;
++ auto *spt = ispt.mutable_proto();
+ int prev = 0;
+ std::string text;
+ for (const auto &tok : tokens) {
+- auto *piece = spt.add_pieces();
++ auto *piece = spt->add_pieces();
+ piece->set_surface(tok);
+ piece->set_piece(tok);
+ piece->set_begin(prev);
+@@ -1669,9 +1670,9 @@ TEST(SentencePieceProcessorTest, ConvertToUnicodeSpansTest) {
+ prev += tok.size();
+ text += tok;
+ }
+- spt.set_text(text);
+- ConvertToUnicodeSpans(&spt);
+- return spt;
++ spt->set_text(text);
++ ispt.ConvertToUnicodeSpans();
++ return ispt;
+ };
+
+ {
--- /dev/null
+From: Taku Kudo <taku@google.com>
+Date: Wed, 3 Aug 2022 12:45:31 +0900
+Subject: Adds SWIGPYTHON flag
+
+Signed-off-by: Kentaro Hayashi <kenhys@gmail.com>
+---
+ python/setup.py | 3 ++-
+ python/src/sentencepiece/__init__.py | 2 +-
+ 2 files changed, 3 insertions(+), 2 deletions(-)
+
+diff --git a/python/setup.py b/python/setup.py
+index fdf9394..3438ddd 100755
+--- a/python/setup.py
++++ b/python/setup.py
+@@ -96,6 +96,7 @@ class build_ext(_build_ext):
+ else:
+ cflags.append('-Wl,-strip-all')
+ libs.append('-Wl,-strip-all')
++ cflags.append('-DSWIGPYTHON')
+ print('## cflags={}'.format(' '.join(cflags)))
+ print('## libs={}'.format(' '.join(libs)))
+ ext.extra_compile_args = cflags
+@@ -115,7 +116,7 @@ if os.name == 'nt':
+ '..\\build\\root_{}\\lib\\sentencepiece_train.lib'.format(arch)
+ ]
+ else:
+- cflags = ['/std:c++17', '/MT', '/I..\\build\\root\\include']
++ cflags = ['/std:c++17', '/MT', '/I..\\build\\root\\include', '/DSWIGPYTHON']
+ libs = [
+ '..\\build\\root\\lib\\sentencepiece.lib',
+ '..\\build\\root\\lib\\sentencepiece_train.lib'
+diff --git a/python/src/sentencepiece/__init__.py b/python/src/sentencepiece/__init__.py
+index 07acb94..2a91022 100644
+--- a/python/src/sentencepiece/__init__.py
++++ b/python/src/sentencepiece/__init__.py
+@@ -126,7 +126,7 @@ class ImmutableSentencePieceText(object):
+ return self.SerializeAsString() == other.SerializeAsString()
+
+ def __hash__(self):
+- return hash(self.SerializeAsString())
++ return hash(self.SerializeAsString())
+
+
+ # Register ImmutableSentencePieceText in _sentencepiece:
--- /dev/null
+From: Taku Kudo <taku@google.com>
+Date: Wed, 3 Aug 2022 15:45:09 +0900
+Subject: remove unused ifdef SWIG macro
+
+Signed-off-by: Kentaro Hayashi <kenhys@gmail.com>
+---
+ python/src/sentencepiece/sentencepiece.i | 5 ++++
+ src/sentencepiece_processor.h | 42 ++++++++++++++++++--------------
+ 2 files changed, 29 insertions(+), 18 deletions(-)
+
+diff --git a/python/src/sentencepiece/sentencepiece.i b/python/src/sentencepiece/sentencepiece.i
+index f3a4f30..75f62c8 100644
+--- a/python/src/sentencepiece/sentencepiece.i
++++ b/python/src/sentencepiece/sentencepiece.i
+@@ -326,6 +326,8 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ %ignore sentencepiece::SentencePieceProcessor::model_proto;
+ %ignore sentencepiece::SentencePieceProcessor::Load;
+ %ignore sentencepiece::SentencePieceProcessor::LoadOrDie;
++%ignore sentencepiece::SentencePieceProcessor::SetModel;
++%ignore sentencepiece::SentencePieceProcessor::SetNormalizer;
+ %ignore sentencepiece::pretokenizer::PretokenizerForTrainingInterface;
+ %ignore sentencepiece::SentenceIterator;
+ %ignore sentencepiece::ConvertToUnicodeSpans;
+@@ -339,6 +341,9 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ %ignore sentencepiece::SentencePieceTrainer::SetPretokenizerForTraining;
+ %ignore sentencepiece::SentencePieceTrainer::GetPretokenizerForTraining;
+
++%ignore sentencepiece::io::LoadModelProto;
++%ignore sentencepiece::io::SaveModelProto;
++
+ %extend sentencepiece::SentencePieceProcessor {
+ sentencepiece::util::Status LoadFromFile(absl::string_view arg) {
+ return $self->Load(arg);
+diff --git a/src/sentencepiece_processor.h b/src/sentencepiece_processor.h
+index d107a2a..be9449e 100644
+--- a/src/sentencepiece_processor.h
++++ b/src/sentencepiece_processor.h
+@@ -26,7 +26,7 @@
+ namespace absl {
+ using std::string_view;
+ } // namespace absl
+-#endif
++#endif // SWIG
+
+ namespace sentencepiece {
+ namespace util {
+@@ -420,36 +420,46 @@ class SentencePieceProcessor {
+ virtual util::Status Decode(const std::vector<int> &ids,
+ SentencePieceText *spt) const;
+
+-#ifdef SWIGPYTHON
+-#define CONVERT_TO_UNICODE_SPAN output.ConvertToUnicodeSpans();
+-#define SPP_SWIG_CHECK_AND_THROW \
+- if (!status.ok()) throw status;
++#ifndef SWIGPYTHON
++
++#define DEFINE_SPP_DIRECT_FUNC_IMPL(FuncName, OutType, ...) \
++ OutType output; \
++ const auto status = FuncName(__VA_ARGS__, &output); \
++ return output;
++
++#define DEFINE_SPP_SERIALIZED_PROTO_IMPL(FuncName, OutType, ...) \
++ OutType output; \
++ const auto status = FuncName(__VA_ARGS__, output.mutable_proto()); \
++ return output.SerializeAsString();
++
++#define DEFINE_SPP_IMMUTABLE_PROTO_IMPL(FuncName, OutType, ...) \
++ OutType output; \
++ const auto status = FuncName(__VA_ARGS__, output.mutable_proto()); \
++ return output;
++
+ #else
+-#define CONVERT_TO_UNICODE_SPAN
+-#define SPP_SWIG_CHECK_AND_THROW \
+- if (!status.ok()) { \
+- }
+-#endif // SWIGPYTHON
+
+ #define DEFINE_SPP_DIRECT_FUNC_IMPL(FuncName, OutType, ...) \
+ OutType output; \
+ const auto status = FuncName(__VA_ARGS__, &output); \
+- SPP_SWIG_CHECK_AND_THROW; \
++ if (!status.ok()) throw status; \
+ return output;
+
+ #define DEFINE_SPP_SERIALIZED_PROTO_IMPL(FuncName, OutType, ...) \
+ OutType output; \
+ const auto status = FuncName(__VA_ARGS__, output.mutable_proto()); \
+- SPP_SWIG_CHECK_AND_THROW; \
++ if (!status.ok()) throw status; \
+ return output.SerializeAsString();
+
+ #define DEFINE_SPP_IMMUTABLE_PROTO_IMPL(FuncName, OutType, ...) \
+ OutType output; \
+ const auto status = FuncName(__VA_ARGS__, output.mutable_proto()); \
+- CONVERT_TO_UNICODE_SPAN; \
+- SPP_SWIG_CHECK_AND_THROW; \
++ if (!status.ok()) throw status; \
++ output.ConvertToUnicodeSpans(); \
+ return output;
+
++#endif // SWIGPYTHON
++
+ //////////////////////////////////////////////////////////////
+ // Handy methods that return the result directly.
+ // These functions ignore internal errors.
+@@ -664,7 +674,6 @@ class SentencePieceProcessor {
+ // Returns PAD (<pad>) id.
+ virtual int pad_id() const;
+
+-#ifndef SWIG
+ //////////////////////////////////////////////////////////////
+ // Model management.
+ //
+@@ -673,7 +682,6 @@ class SentencePieceProcessor {
+
+ // Allows injection of a normalizer instance. `normalizer` is moved.
+ void SetNormalizer(std::unique_ptr<normalizer::Normalizer> &&normalizer);
+-#endif // SWIG
+
+ // Returns immutable model proto. Useful to obtain extended
+ // or experimental parameters encoded in model_proto.
+@@ -715,7 +723,6 @@ class SentencePieceProcessor {
+ // std::random_device.
+ void SetRandomGeneratorSeed(unsigned int seed);
+
+-#ifndef SWIG
+ // IO related functions to absorb model formats.
+ namespace io {
+ // Loads `model_proto` from `filename`.
+@@ -730,6 +737,5 @@ util::Status LoadModelProto(absl::string_view, ModelProto *model_proto);
+ // Saves `model_proto` as `filename`.
+ util::Status SaveModelProto(absl::string_view, const ModelProto &model_proto);
+ } // namespace io
+-#endif // SWIG
+ } // namespace sentencepiece
+ #endif // SENTENCEPIECE_PROCESSOR_H_
--- /dev/null
+From: Taku Kudo <taku@google.com>
+Date: Wed, 3 Aug 2022 17:20:01 +0900
+Subject: Fixed test failure.
+
+Signed-off-by: Kentaro Hayashi <kenhys@gmail.com>
+---
+ python/src/sentencepiece/sentencepiece.i | 35 +++++++++++++++++++++----
+ python/src/sentencepiece/sentencepiece_wrap.cxx | 35 +++++++++++++++++++++----
+ src/sentencepiece_processor.cc | 4 +--
+ src/sentencepiece_processor.h | 34 +++++++-----------------
+ 4 files changed, 72 insertions(+), 36 deletions(-)
+
+diff --git a/python/src/sentencepiece/sentencepiece.i b/python/src/sentencepiece/sentencepiece.i
+index 75f62c8..1a94fef 100644
+--- a/python/src/sentencepiece/sentencepiece.i
++++ b/python/src/sentencepiece/sentencepiece.i
+@@ -193,6 +193,19 @@ inline void CheckIds(const std::vector<int> &ids, int num_pieces) {
+
+ inline void CheckIds(const std::vector<absl::string_view> &ids, int num_pieces) {}
+
++template <typename T>
++inline void ConvertToUnicodeSpans(T *proto) {}
++
++template <>
++inline void ConvertToUnicodeSpans(sentencepiece::ImmutableSentencePieceText *proto) {
++ proto->ConvertToUnicodeSpans();
++}
++
++template <>
++inline void ConvertToUnicodeSpans(sentencepiece::ImmutableNBestSentencePieceText *proto) {
++ proto->ConvertToUnicodeSpans();
++}
++
+ class ThreadPool {
+ public:
+ explicit ThreadPool(size_t request_size) :
+@@ -239,6 +252,7 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ self->FuncName(ins[i]); \
+ RewriteIds(*self, &out, add_bos, add_eos, reverse, \
+ emit_unk_piece); \
++ ConvertToUnicodeSpans(&out); \
+ outs[i] = std::move(out); \
+ } \
+ }); \
+@@ -255,7 +269,9 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ pool.Schedule([&, n]() { \
+ for (size_t i = n; i < ins.size(); i += num_threads) { \
+ CheckIds(ins[i], self->GetPieceSize()); \
+- outs[i] = self->FuncName(ins[i]); \
++ auto out = self->FuncName(ins[i]); \
++ ConvertToUnicodeSpans(&out); \
++ outs[i] = std::move(out); \
+ } \
+ }); \
+ } \
+@@ -396,6 +412,7 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ auto proto = enable_sampling ?
+ $self->SampleEncodeAsImmutableProto(text, nbest_size, alpha) :
+ $self->EncodeAsImmutableProto(text);
++ proto.ConvertToUnicodeSpans();
+ RewriteIds(*$self, &proto, add_bos, add_eos, reverse, emit_unk_piece);
+ return proto;
+ }
+@@ -467,13 +484,17 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ sentencepiece::ImmutableSentencePieceText _DecodeIdsAsImmutableProto(
+ const std::vector<int> &ids) const {
+ CheckIds(ids, $self->GetPieceSize());
+- return $self->DecodeIdsAsImmutableProto(ids);
++ auto proto = $self->DecodeIdsAsImmutableProto(ids);
++ proto.ConvertToUnicodeSpans();
++ return proto;
+ }
+
+ sentencepiece::ImmutableSentencePieceText _DecodePiecesAsImmutableProto(
+ const std::vector<absl::string_view> &pieces) const {
+ CheckIds(pieces, $self->GetPieceSize());
+- return $self->DecodePiecesAsImmutableProto(pieces);
++ auto proto= $self->DecodePiecesAsImmutableProto(pieces);
++ proto.ConvertToUnicodeSpans();
++ return proto;
+ }
+
+ /////////////////////////////////////////////////////////////////////////////
+@@ -557,7 +578,9 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ bool emit_unk_piece) const {
+ RewriteIds(*$self, static_cast<sentencepiece::ImmutableSentencePieceText *>(nullptr),
+ add_bos, add_eos, reverse, emit_unk_piece);
+- return $self->NBestEncodeAsImmutableProto(text, nbest_size);
++ auto proto = $self->NBestEncodeAsImmutableProto(text, nbest_size);
++ proto.ConvertToUnicodeSpans();
++ return proto;
+ }
+
+
+@@ -611,8 +634,10 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ bool emit_unk_piece) const {
+ RewriteIds(*$self, static_cast<sentencepiece::util::bytes *>(nullptr),
+ add_bos, add_eos, reverse, emit_unk_piece);
+- return $self->SampleEncodeAndScoreAsImmutableProto(text, num_samples,
++ auto proto = $self->SampleEncodeAndScoreAsImmutableProto(text, num_samples,
+ alpha, wor, include_best);
++ proto.ConvertToUnicodeSpans();
++ return proto;
+ }
+
+
+diff --git a/python/src/sentencepiece/sentencepiece_wrap.cxx b/python/src/sentencepiece/sentencepiece_wrap.cxx
+index 22e0708..4b8b5ef 100644
+--- a/python/src/sentencepiece/sentencepiece_wrap.cxx
++++ b/python/src/sentencepiece/sentencepiece_wrap.cxx
+@@ -3002,6 +3002,19 @@ inline void CheckIds(const std::vector<int> &ids, int num_pieces) {
+
+ inline void CheckIds(const std::vector<absl::string_view> &ids, int num_pieces) {}
+
++template <typename T>
++inline void ConvertToUnicodeSpans(T *proto) {}
++
++template <>
++inline void ConvertToUnicodeSpans(sentencepiece::ImmutableSentencePieceText *proto) {
++ proto->ConvertToUnicodeSpans();
++}
++
++template <>
++inline void ConvertToUnicodeSpans(sentencepiece::ImmutableNBestSentencePieceText *proto) {
++ proto->ConvertToUnicodeSpans();
++}
++
+ class ThreadPool {
+ public:
+ explicit ThreadPool(size_t request_size) :
+@@ -3048,6 +3061,7 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ self->FuncName(ins[i]); \
+ RewriteIds(*self, &out, add_bos, add_eos, reverse, \
+ emit_unk_piece); \
++ ConvertToUnicodeSpans(&out); \
+ outs[i] = std::move(out); \
+ } \
+ }); \
+@@ -3064,7 +3078,9 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ pool.Schedule([&, n]() { \
+ for (size_t i = n; i < ins.size(); i += num_threads) { \
+ CheckIds(ins[i], self->GetPieceSize()); \
+- outs[i] = self->FuncName(ins[i]); \
++ auto out = self->FuncName(ins[i]); \
++ ConvertToUnicodeSpans(&out); \
++ outs[i] = std::move(out); \
+ } \
+ }); \
+ } \
+@@ -3540,6 +3556,7 @@ SWIGINTERN sentencepiece::ImmutableSentencePieceText sentencepiece_SentencePiece
+ auto proto = enable_sampling ?
+ self->SampleEncodeAsImmutableProto(text, nbest_size, alpha) :
+ self->EncodeAsImmutableProto(text);
++ proto.ConvertToUnicodeSpans();
+ RewriteIds(*self, &proto, add_bos, add_eos, reverse, emit_unk_piece);
+ return proto;
+ }
+@@ -3578,11 +3595,15 @@ SWIGINTERN sentencepiece::util::bytes sentencepiece_SentencePieceProcessor__Deco
+ }
+ SWIGINTERN sentencepiece::ImmutableSentencePieceText sentencepiece_SentencePieceProcessor__DecodeIdsAsImmutableProto(sentencepiece::SentencePieceProcessor const *self,std::vector< int > const &ids){
+ CheckIds(ids, self->GetPieceSize());
+- return self->DecodeIdsAsImmutableProto(ids);
++ auto proto = self->DecodeIdsAsImmutableProto(ids);
++ proto.ConvertToUnicodeSpans();
++ return proto;
+ }
+ SWIGINTERN sentencepiece::ImmutableSentencePieceText sentencepiece_SentencePieceProcessor__DecodePiecesAsImmutableProto(sentencepiece::SentencePieceProcessor const *self,std::vector< absl::string_view > const &pieces){
+ CheckIds(pieces, self->GetPieceSize());
+- return self->DecodePiecesAsImmutableProto(pieces);
++ auto proto= self->DecodePiecesAsImmutableProto(pieces);
++ proto.ConvertToUnicodeSpans();
++ return proto;
+ }
+ SWIGINTERN std::vector< std::string > sentencepiece_SentencePieceProcessor__DecodeIdsBatch(sentencepiece::SentencePieceProcessor const *self,std::vector< std::vector< int > > const &ins,int num_threads){
+ DEFINE_DECODE_BATCH_FUNC_IMPL(DecodeIds, int, std::string);
+@@ -3628,7 +3649,9 @@ SWIGINTERN sentencepiece::util::bytes sentencepiece_SentencePieceProcessor__NBes
+ SWIGINTERN sentencepiece::ImmutableNBestSentencePieceText sentencepiece_SentencePieceProcessor__NBestEncodeAsImmutableProto(sentencepiece::SentencePieceProcessor const *self,absl::string_view text,int nbest_size,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
+ RewriteIds(*self, static_cast<sentencepiece::ImmutableSentencePieceText *>(nullptr),
+ add_bos, add_eos, reverse, emit_unk_piece);
+- return self->NBestEncodeAsImmutableProto(text, nbest_size);
++ auto proto = self->NBestEncodeAsImmutableProto(text, nbest_size);
++ proto.ConvertToUnicodeSpans();
++ return proto;
+ }
+ SWIGINTERN std::vector< std::pair< std::vector< int >,float > > sentencepiece_SentencePieceProcessor__SampleEncodeAndScoreAsIds(sentencepiece::SentencePieceProcessor const *self,absl::string_view text,int num_samples,float alpha,bool wor,bool include_best,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
+ auto idss = self->SampleEncodeAndScoreAsIds(text, num_samples,
+@@ -3655,8 +3678,10 @@ SWIGINTERN sentencepiece::util::bytes sentencepiece_SentencePieceProcessor__Samp
+ SWIGINTERN sentencepiece::ImmutableNBestSentencePieceText sentencepiece_SentencePieceProcessor__SampleEncodeAndScoreAsImmutableProto(sentencepiece::SentencePieceProcessor const *self,absl::string_view text,int num_samples,float alpha,bool wor,bool include_best,bool add_bos,bool add_eos,bool reverse,bool emit_unk_piece){
+ RewriteIds(*self, static_cast<sentencepiece::util::bytes *>(nullptr),
+ add_bos, add_eos, reverse, emit_unk_piece);
+- return self->SampleEncodeAndScoreAsImmutableProto(text, num_samples,
++ auto proto = self->SampleEncodeAndScoreAsImmutableProto(text, num_samples,
+ alpha, wor, include_best);
++ proto.ConvertToUnicodeSpans();
++ return proto;
+ }
+ SWIGINTERN float sentencepiece_SentencePieceProcessor__CalculateEntropy(sentencepiece::SentencePieceProcessor *self,absl::string_view text,float alpha){
+ return self->CalculateEntropy(text, alpha);
+diff --git a/src/sentencepiece_processor.cc b/src/sentencepiece_processor.cc
+index 2a5c399..f0df2f6 100644
+--- a/src/sentencepiece_processor.cc
++++ b/src/sentencepiece_processor.cc
+@@ -56,14 +56,14 @@ std::vector<absl::string_view> ToPieceArray(const std::vector<std::string> &v) {
+ }
+
+ void ConvertToUnicodeSpansInternal(SentencePieceText *spt) {
+- if (spt == nullptr) return;
++ if (spt == nullptr || spt->text().empty()) return;
+
+ std::vector<int> utf8_to_unicode(spt->text().size() + 1, 0);
+ absl::string_view str = spt->text();
+ size_t prev = 0;
+ int ulen = 0;
+ while (!str.empty()) {
+- const size_t mblen = string_util::OneCharLen(str.data());
++ const size_t mblen = std::max<int>(1, string_util::OneCharLen(str.data()));
+ for (int i = prev; i < prev + mblen; ++i) {
+ utf8_to_unicode[i] = ulen;
+ }
+diff --git a/src/sentencepiece_processor.h b/src/sentencepiece_processor.h
+index be9449e..14b1e8c 100644
+--- a/src/sentencepiece_processor.h
++++ b/src/sentencepiece_processor.h
+@@ -419,47 +419,33 @@ class SentencePieceProcessor {
+
+ virtual util::Status Decode(const std::vector<int> &ids,
+ SentencePieceText *spt) const;
+-
+-#ifndef SWIGPYTHON
+-
+-#define DEFINE_SPP_DIRECT_FUNC_IMPL(FuncName, OutType, ...) \
+- OutType output; \
+- const auto status = FuncName(__VA_ARGS__, &output); \
+- return output;
+-
+-#define DEFINE_SPP_SERIALIZED_PROTO_IMPL(FuncName, OutType, ...) \
+- OutType output; \
+- const auto status = FuncName(__VA_ARGS__, output.mutable_proto()); \
+- return output.SerializeAsString();
+-
+-#define DEFINE_SPP_IMMUTABLE_PROTO_IMPL(FuncName, OutType, ...) \
+- OutType output; \
+- const auto status = FuncName(__VA_ARGS__, output.mutable_proto()); \
+- return output;
+-
++#ifdef SWIG
++#define SPP_SWIG_CHECK_AND_THROW \
++ if (!status.ok()) throw status;
+ #else
++#define SPP_SWIG_CHECK_AND_THROW \
++ if (!status.ok()) { \
++ }
++#endif // SWIG
+
+ #define DEFINE_SPP_DIRECT_FUNC_IMPL(FuncName, OutType, ...) \
+ OutType output; \
+ const auto status = FuncName(__VA_ARGS__, &output); \
+- if (!status.ok()) throw status; \
++ SPP_SWIG_CHECK_AND_THROW; \
+ return output;
+
+ #define DEFINE_SPP_SERIALIZED_PROTO_IMPL(FuncName, OutType, ...) \
+ OutType output; \
+ const auto status = FuncName(__VA_ARGS__, output.mutable_proto()); \
+- if (!status.ok()) throw status; \
++ SPP_SWIG_CHECK_AND_THROW; \
+ return output.SerializeAsString();
+
+ #define DEFINE_SPP_IMMUTABLE_PROTO_IMPL(FuncName, OutType, ...) \
+ OutType output; \
+ const auto status = FuncName(__VA_ARGS__, output.mutable_proto()); \
+- if (!status.ok()) throw status; \
+- output.ConvertToUnicodeSpans(); \
++ SPP_SWIG_CHECK_AND_THROW; \
+ return output;
+
+-#endif // SWIGPYTHON
+-
+ //////////////////////////////////////////////////////////////
+ // Handy methods that return the result directly.
+ // These functions ignore internal errors.
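Note on the patch that follows ("Uses property in immutable proto"): it renames the SWIG accessors with a leading underscore, re-exposes them as read-only properties, and returns pieces/nbests as a small view object that implements only __len__ and __getitem__. The sketch below reproduces that pattern in plain Python so it can be run without the extension module. Piece, Text and PieceView are hypothetical stand-ins, not the generated wrapper classes.

```python
class Piece:
    """Hypothetical stand-in for the generated piece wrapper."""

    def __init__(self, piece, begin, end):
        self._data = (piece, begin, end)

    # Underscore-prefixed accessors play the role of the renamed SWIG methods.
    def _piece(self):
        return self._data[0]

    def _begin(self):
        return self._data[1]

    def _end(self):
        return self._data[2]

    # property() with only a getter makes the attribute read-only.
    piece = property(_piece)
    begin = property(_begin)
    end = property(_end)


class Text:
    """Hypothetical stand-in for the immutable proto wrapper."""

    def __init__(self, pieces):
        self._items = list(pieces)

    def _pieces_size(self):
        return len(self._items)

    def _pieces(self, index):
        return self._items[index]

    class PieceView:
        def __init__(self, proto):
            self.proto = proto
            self.len = proto._pieces_size()

        def __len__(self):
            return self.len

        def __getitem__(self, index):
            # Same bounds check as in the patch: negative indices are rejected.
            if index < 0 or index >= self.len:
                raise IndexError('piece index is out of range')
            return self.proto._pieces(index)

    @property
    def pieces(self):
        return Text.PieceView(self)


text = Text([Piece("he", 0, 2), Piece("llo", 2, 5)])
# No __iter__ is defined: the for loop calls __getitem__ with 0, 1, 2, ...
# and stops at the first IndexError.
for p in text.pieces:
    print(p.piece, p.begin, p.end)
```

Because the view raises IndexError at the end of the range, plain iteration works without an __iter__ method, which is what the '\n'.join(... for x in self) calls in the patch rely on. The same check means that pieces[-1] raises instead of returning the last element.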
--- /dev/null
+From: Taku Kudo <taku@google.com>
+Date: Thu, 4 Aug 2022 16:03:31 +0900
+Subject: Uses property in immutable proto
+
+Signed-off-by: Kentaro Hayashi <kenhys@gmail.com>
+---
+ python/setup.py | 3 +-
+ python/src/sentencepiece/__init__.py | 128 ++++++++++++------
+ python/src/sentencepiece/sentencepiece.i | 143 ++++++++++++++------
+ python/src/sentencepiece/sentencepiece_wrap.cxx | 168 ++++++------------------
+ python/test/sentencepiece_test.py | 68 +++++-----
+ 5 files changed, 265 insertions(+), 245 deletions(-)
+
+diff --git a/python/setup.py b/python/setup.py
+index 3438ddd..fdf9394 100755
+--- a/python/setup.py
++++ b/python/setup.py
+@@ -96,7 +96,6 @@ class build_ext(_build_ext):
+ else:
+ cflags.append('-Wl,-strip-all')
+ libs.append('-Wl,-strip-all')
+- cflags.append('-DSWIGPYTHON')
+ print('## cflags={}'.format(' '.join(cflags)))
+ print('## libs={}'.format(' '.join(libs)))
+ ext.extra_compile_args = cflags
+@@ -116,7 +115,7 @@ if os.name == 'nt':
+ '..\\build\\root_{}\\lib\\sentencepiece_train.lib'.format(arch)
+ ]
+ else:
+- cflags = ['/std:c++17', '/MT', '/I..\\build\\root\\include', '/DSWIGPYTHON']
++ cflags = ['/std:c++17', '/MT', '/I..\\build\\root\\include']
+ libs = [
+ '..\\build\\root\\lib\\sentencepiece.lib',
+ '..\\build\\root\\lib\\sentencepiece_train.lib'
+diff --git a/python/src/sentencepiece/__init__.py b/python/src/sentencepiece/__init__.py
+index 2a91022..12dc631 100644
+--- a/python/src/sentencepiece/__init__.py
++++ b/python/src/sentencepiece/__init__.py
+@@ -69,20 +69,36 @@ class ImmutableSentencePieceText_ImmutableSentencePiece(object):
+ _sentencepiece.ImmutableSentencePieceText_ImmutableSentencePiece_swiginit(self, _sentencepiece.new_ImmutableSentencePieceText_ImmutableSentencePiece())
+ __swig_destroy__ = _sentencepiece.delete_ImmutableSentencePieceText_ImmutableSentencePiece
+
+- def piece(self):
+- return _sentencepiece.ImmutableSentencePieceText_ImmutableSentencePiece_piece(self)
++ def _piece(self):
++ return _sentencepiece.ImmutableSentencePieceText_ImmutableSentencePiece__piece(self)
+
+- def surface(self):
+- return _sentencepiece.ImmutableSentencePieceText_ImmutableSentencePiece_surface(self)
++ def _surface(self):
++ return _sentencepiece.ImmutableSentencePieceText_ImmutableSentencePiece__surface(self)
+
+- def id(self):
+- return _sentencepiece.ImmutableSentencePieceText_ImmutableSentencePiece_id(self)
++ def _id(self):
++ return _sentencepiece.ImmutableSentencePieceText_ImmutableSentencePiece__id(self)
+
+- def begin(self):
+- return _sentencepiece.ImmutableSentencePieceText_ImmutableSentencePiece_begin(self)
++ def _begin(self):
++ return _sentencepiece.ImmutableSentencePieceText_ImmutableSentencePiece__begin(self)
++
++ def _end(self):
++ return _sentencepiece.ImmutableSentencePieceText_ImmutableSentencePiece__end(self)
++
++ piece = property(_piece)
++ surface = property(_surface)
++ id = property(_id)
++ begin = property(_begin)
++ end = property(_end)
++
++ def __str__(self):
++ return ('piece: \"{}\"\n'
++ 'id: {}\n'
++ 'surface: \"{}\"\n'
++ 'begin: {}\n'
++ 'end: {}\n').format(self.piece, self.id, self.surface,
++ self.begin, self.end)
++ __repr__ = __str__
+
+- def end(self):
+- return _sentencepiece.ImmutableSentencePieceText_ImmutableSentencePiece_end(self)
+
+ # Register ImmutableSentencePieceText_ImmutableSentencePiece in _sentencepiece:
+ _sentencepiece.ImmutableSentencePieceText_ImmutableSentencePiece_swigregister(ImmutableSentencePieceText_ImmutableSentencePiece)
+@@ -95,32 +111,45 @@ class ImmutableSentencePieceText(object):
+ _sentencepiece.ImmutableSentencePieceText_swiginit(self, _sentencepiece.new_ImmutableSentencePieceText())
+ __swig_destroy__ = _sentencepiece.delete_ImmutableSentencePieceText
+
+- def pieces_size(self):
+- return _sentencepiece.ImmutableSentencePieceText_pieces_size(self)
++ def _pieces_size(self):
++ return _sentencepiece.ImmutableSentencePieceText__pieces_size(self)
+
+- def pieces(self, index):
+- return _sentencepiece.ImmutableSentencePieceText_pieces(self, index)
++ def _pieces(self, index):
++ return _sentencepiece.ImmutableSentencePieceText__pieces(self, index)
+
+- def text(self):
+- return _sentencepiece.ImmutableSentencePieceText_text(self)
++ def _text(self):
++ return _sentencepiece.ImmutableSentencePieceText__text(self)
+
+- def score(self):
+- return _sentencepiece.ImmutableSentencePieceText_score(self)
++ def _score(self):
++ return _sentencepiece.ImmutableSentencePieceText__score(self)
+
+ def SerializeAsString(self):
+ return _sentencepiece.ImmutableSentencePieceText_SerializeAsString(self)
+
+- def _pieces(self, index):
+- return _sentencepiece.ImmutableSentencePieceText__pieces(self, index)
++ text = property(_text)
++ score = property(_score)
+
+- def pieces(self, i):
+- return self._pieces(i)
++ class ImmutableSentencePieceIterator:
++ def __init__(self, proto):
++ self.proto = proto
++ self.len = self.proto._pieces_size()
+
+- def __len__(self):
+- return self.pieces_size()
++ def __len__(self):
++ return self.len
++
++ def __getitem__(self, index):
++ if index < 0 or index >= self.len:
++ raise IndexError('piece index is out of range')
++ return self.proto._pieces(index)
++
++ def __str__(self):
++ return '\n'.join(['pieces {{\n{}}}'.format(str(x)) for x in self])
++
++ __repr__ = __str__
+
+- def __getitem__(self, i):
+- return self._pieces(i)
++ @property
++ def pieces(self):
++ return ImmutableSentencePieceText.ImmutableSentencePieceIterator(self)
+
+ def __eq__(self, other):
+ return self.SerializeAsString() == other.SerializeAsString()
+@@ -128,6 +157,14 @@ class ImmutableSentencePieceText(object):
+ def __hash__(self):
+ return hash(self.SerializeAsString())
+
++ def __str__(self):
++ return ('text: \"{}\"\n'
++ 'score: {}\n'
++ '{}').format(self.text, self.score,
++ '\n'.join(['pieces {{\n{}}}'.format(str(x)) for x in self.pieces]))
++
++ __repr__ = __str__
++
+
+ # Register ImmutableSentencePieceText in _sentencepiece:
+ _sentencepiece.ImmutableSentencePieceText_swigregister(ImmutableSentencePieceText)
+@@ -140,26 +177,36 @@ class ImmutableNBestSentencePieceText(object):
+ _sentencepiece.ImmutableNBestSentencePieceText_swiginit(self, _sentencepiece.new_ImmutableNBestSentencePieceText())
+ __swig_destroy__ = _sentencepiece.delete_ImmutableNBestSentencePieceText
+
+- def nbests_size(self):
+- return _sentencepiece.ImmutableNBestSentencePieceText_nbests_size(self)
++ def _nbests_size(self):
++ return _sentencepiece.ImmutableNBestSentencePieceText__nbests_size(self)
+
+- def nbests(self, index):
+- return _sentencepiece.ImmutableNBestSentencePieceText_nbests(self, index)
++ def _nbests(self, index):
++ return _sentencepiece.ImmutableNBestSentencePieceText__nbests(self, index)
+
+ def SerializeAsString(self):
+ return _sentencepiece.ImmutableNBestSentencePieceText_SerializeAsString(self)
+
+- def _nbests(self, index):
+- return _sentencepiece.ImmutableNBestSentencePieceText__nbests(self, index)
++ class ImmutableSentencePieceTextIterator:
++ def __init__(self, proto):
++ self.proto = proto
++ self.len = self.proto._nbests_size()
+
+- def __nbests__(self, i):
+- return self._nbests(i)
++ def __len__(self):
++ return self.len
+
+- def __len__(self):
+- return self.nbests_size()
++ def __getitem__(self, index):
++ if index < 0 or index >= self.len:
++ raise IndexError('nbests index is out of range')
++ return self.proto._nbests(index)
++
++ def __str__(self):
++ return '\n'.join(['nbests {{\n{}}}'.format(str(x)) for x in self])
++
++ __repr__ = __str__
+
+- def __getitem__(self, i):
+- return self._nbests(i)
++ @property
++ def nbests(self):
++ return ImmutableNBestSentencePieceText.ImmutableSentencePieceTextIterator(self)
+
+ def __eq__(self, other):
+ return self.SerializeAsString() == other.SerializeAsString()
+@@ -167,6 +214,11 @@ class ImmutableNBestSentencePieceText(object):
+ def __hash__(self):
+ return hash(self.SerializeAsString())
+
++ def __str__(self):
++ return '\n'.join(['nbests {{\n{}}}'.format(str(x)) for x in self.nbests])
++
++ __repr__ = __str__
++
+
+ # Register ImmutableNBestSentencePieceText in _sentencepiece:
+ _sentencepiece.ImmutableNBestSentencePieceText_swigregister(ImmutableNBestSentencePieceText)
+diff --git a/python/src/sentencepiece/sentencepiece.i b/python/src/sentencepiece/sentencepiece.i
+index 1a94fef..8309fc2 100644
+--- a/python/src/sentencepiece/sentencepiece.i
++++ b/python/src/sentencepiece/sentencepiece.i
+@@ -1239,60 +1239,117 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ }
+ }
+
++%extend sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece {
++ %rename(_piece) piece;
++ %rename(_id) id;
++ %rename(_surface) surface;
++ %rename(_begin) begin;
++ %rename(_end) end;
++
++ %pythoncode %{
++ piece = property(_piece)
++ surface = property(_surface)
++ id = property(_id)
++ begin = property(_begin)
++ end = property(_end)
++
++ def __str__(self):
++ return ('piece: \"{}\"\n'
++ 'id: {}\n'
++ 'surface: \"{}\"\n'
++ 'begin: {}\n'
++ 'end: {}\n').format(self.piece, self.id, self.surface,
++ self.begin, self.end)
++ __repr__ = __str__
++ %}
++}
++
+ %extend sentencepiece::ImmutableSentencePieceText {
+- ImmutableSentencePieceText_ImmutableSentencePiece _pieces(int index) const {
+- if (index < 0 || index >= static_cast<int>($self->pieces_size())) {
+- throw sentencepiece::util::Status(
+- sentencepiece::util::StatusCode::kOutOfRange,
+- "piece index is out of range.");
+- }
+- return $self->pieces(index);
+- }
++ %rename(_text) text;
++ %rename(_score) score;
++ %rename(_pieces) pieces;
++ %rename(_pieces_size) pieces_size;
++
++ %pythoncode %{
++ text = property(_text)
++ score = property(_score)
++
++ class ImmutableSentencePieceIterator:
++ def __init__(self, proto):
++ self.proto = proto
++ self.len = self.proto._pieces_size()
++
++ def __len__(self):
++ return self.len
++
++ def __getitem__(self, index):
++ if index < 0 or index >= self.len:
++ raise IndexError('piece index is out of range')
++ return self.proto._pieces(index)
++
++ def __str__(self):
++ return '\n'.join(['pieces {{\n{}}}'.format(str(x)) for x in self])
++
++ __repr__ = __str__
++
++ @property
++ def pieces(self):
++ return ImmutableSentencePieceText.ImmutableSentencePieceIterator(self)
++
++ def __eq__(self, other):
++ return self.SerializeAsString() == other.SerializeAsString()
++
++ def __hash__(self):
++ return hash(self.SerializeAsString())
++
++ def __str__(self):
++ return ('text: \"{}\"\n'
++ 'score: {}\n'
++ '{}').format(self.text, self.score,
++ '\n'.join(['pieces {{\n{}}}'.format(str(x)) for x in self.pieces]))
++
++ __repr__ = __str__
++ %}
++}
+
+-%pythoncode {
+- def pieces(self, i):
+- return self._pieces(i)
++%extend sentencepiece::ImmutableNBestSentencePieceText {
++ %rename(_nbests) nbests;
++ %rename(_nbests_size) nbests_size;
+
+- def __len__(self):
+- return self.pieces_size()
++ %pythoncode %{
++ class ImmutableSentencePieceTextIterator:
++ def __init__(self, proto):
++ self.proto = proto
++ self.len = self.proto._nbests_size()
+
+- def __getitem__(self, i):
+- return self._pieces(i)
++ def __len__(self):
++ return self.len
+
+- def __eq__(self, other):
+- return self.SerializeAsString() == other.SerializeAsString()
++ def __getitem__(self, index):
++ if index < 0 or index >= self.len:
++ raise IndexError('nbests index is out of range')
++ return self.proto._nbests(index)
+
+- def __hash__(self):
+- return hash(self.SerializeAsString())
+-}
+-}
+-
+-%extend sentencepiece::ImmutableNBestSentencePieceText {
+- ImmutableSentencePieceText _nbests(int index) const {
+- if (index < 0 || index >= static_cast<int>($self->nbests_size())) {
+- throw sentencepiece::util::Status(
+- sentencepiece::util::StatusCode::kOutOfRange,
+- "nbest index is out of range.");
+- }
+- return $self->nbests(index);
+- }
++ def __str__(self):
++ return '\n'.join(['nbests {{\n{}}}'.format(str(x)) for x in self])
+
+-%pythoncode {
+- def __nbests__(self, i):
+- return self._nbests(i)
++ __repr__ = __str__
+
+- def __len__(self):
+- return self.nbests_size()
++ @property
++ def nbests(self):
++ return ImmutableNBestSentencePieceText.ImmutableSentencePieceTextIterator(self)
++
++ def __eq__(self, other):
++ return self.SerializeAsString() == other.SerializeAsString()
+
+- def __getitem__(self, i):
+- return self._nbests(i)
++ def __hash__(self):
++ return hash(self.SerializeAsString())
+
+- def __eq__(self, other):
+- return self.SerializeAsString() == other.SerializeAsString()
++ def __str__(self):
++ return '\n'.join(['nbests {{\n{}}}'.format(str(x)) for x in self.nbests])
+
+- def __hash__(self):
+- return hash(self.SerializeAsString())
+-}
++ __repr__ = __str__
++ %}
+ }
+
+ %typemap(out) std::vector<int> {
+diff --git a/python/src/sentencepiece/sentencepiece_wrap.cxx b/python/src/sentencepiece/sentencepiece_wrap.cxx
+index 4b8b5ef..0a8df5f 100644
+--- a/python/src/sentencepiece/sentencepiece_wrap.cxx
++++ b/python/src/sentencepiece/sentencepiece_wrap.cxx
+@@ -3299,22 +3299,6 @@ SWIG_From_float (float value)
+ return SWIG_From_double (value);
+ }
+
+-SWIGINTERN sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece sentencepiece_ImmutableSentencePieceText__pieces(sentencepiece::ImmutableSentencePieceText const *self,int index){
+- if (index < 0 || index >= static_cast<int>(self->pieces_size())) {
+- throw sentencepiece::util::Status(
+- sentencepiece::util::StatusCode::kOutOfRange,
+- "piece index is out of range.");
+- }
+- return self->pieces(index);
+- }
+-SWIGINTERN sentencepiece::ImmutableSentencePieceText sentencepiece_ImmutableNBestSentencePieceText__nbests(sentencepiece::ImmutableNBestSentencePieceText const *self,int index){
+- if (index < 0 || index >= static_cast<int>(self->nbests_size())) {
+- throw sentencepiece::util::Status(
+- sentencepiece::util::StatusCode::kOutOfRange,
+- "nbest index is out of range.");
+- }
+- return self->nbests(index);
+- }
+
+ SWIGINTERN swig_type_info*
+ SWIG_pchar_descriptor(void)
+@@ -3846,7 +3830,7 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_ImmutableSentencePiece_piece(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_ImmutableSentencePiece__piece(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece *arg1 = (sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece *) 0 ;
+ void *argp1 = 0 ;
+@@ -3858,7 +3842,7 @@ SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_ImmutableSentencePiece_pie
+ swig_obj[0] = args;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText_ImmutableSentencePiece, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText_ImmutableSentencePiece_piece" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText_ImmutableSentencePiece__piece" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece * >(argp1);
+ {
+@@ -3880,7 +3864,7 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_ImmutableSentencePiece_surface(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_ImmutableSentencePiece__surface(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece *arg1 = (sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece *) 0 ;
+ void *argp1 = 0 ;
+@@ -3892,7 +3876,7 @@ SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_ImmutableSentencePiece_sur
+ swig_obj[0] = args;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText_ImmutableSentencePiece, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText_ImmutableSentencePiece_surface" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText_ImmutableSentencePiece__surface" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece * >(argp1);
+ {
+@@ -3914,7 +3898,7 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_ImmutableSentencePiece_id(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_ImmutableSentencePiece__id(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece *arg1 = (sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece *) 0 ;
+ void *argp1 = 0 ;
+@@ -3926,7 +3910,7 @@ SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_ImmutableSentencePiece_id(
+ swig_obj[0] = args;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText_ImmutableSentencePiece, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText_ImmutableSentencePiece_id" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText_ImmutableSentencePiece__id" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece * >(argp1);
+ {
+@@ -3945,7 +3929,7 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_ImmutableSentencePiece_begin(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_ImmutableSentencePiece__begin(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece *arg1 = (sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece *) 0 ;
+ void *argp1 = 0 ;
+@@ -3957,7 +3941,7 @@ SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_ImmutableSentencePiece_beg
+ swig_obj[0] = args;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText_ImmutableSentencePiece, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText_ImmutableSentencePiece_begin" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText_ImmutableSentencePiece__begin" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece * >(argp1);
+ {
+@@ -3976,7 +3960,7 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_ImmutableSentencePiece_end(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_ImmutableSentencePiece__end(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece *arg1 = (sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece *) 0 ;
+ void *argp1 = 0 ;
+@@ -3988,7 +3972,7 @@ SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_ImmutableSentencePiece_end
+ swig_obj[0] = args;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText_ImmutableSentencePiece, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText_ImmutableSentencePiece_end" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText_ImmutableSentencePiece__end" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece * >(argp1);
+ {
+@@ -4069,7 +4053,7 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_pieces_size(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText__pieces_size(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::ImmutableSentencePieceText *arg1 = (sentencepiece::ImmutableSentencePieceText *) 0 ;
+ void *argp1 = 0 ;
+@@ -4081,7 +4065,7 @@ SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_pieces_size(PyObject *SWIG
+ swig_obj[0] = args;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText_pieces_size" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText__pieces_size" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::ImmutableSentencePieceText * >(argp1);
+ {
+@@ -4100,7 +4084,7 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_pieces(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText__pieces(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::ImmutableSentencePieceText *arg1 = (sentencepiece::ImmutableSentencePieceText *) 0 ;
+ int arg2 ;
+@@ -4111,15 +4095,15 @@ SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_pieces(PyObject *SWIGUNUSE
+ PyObject *swig_obj[2] ;
+ sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece result;
+
+- if (!SWIG_Python_UnpackTuple(args, "ImmutableSentencePieceText_pieces", 2, 2, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "ImmutableSentencePieceText__pieces", 2, 2, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText_pieces" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText__pieces" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::ImmutableSentencePieceText * >(argp1);
+ ecode2 = SWIG_AsVal_int(swig_obj[1], &val2);
+ if (!SWIG_IsOK(ecode2)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode2), "in method '" "ImmutableSentencePieceText_pieces" "', argument " "2"" of type '" "int""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode2), "in method '" "ImmutableSentencePieceText__pieces" "', argument " "2"" of type '" "int""'");
+ }
+ arg2 = static_cast< int >(val2);
+ {
+@@ -4138,7 +4122,7 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_text(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText__text(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::ImmutableSentencePieceText *arg1 = (sentencepiece::ImmutableSentencePieceText *) 0 ;
+ void *argp1 = 0 ;
+@@ -4150,7 +4134,7 @@ SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_text(PyObject *SWIGUNUSEDP
+ swig_obj[0] = args;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText_text" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText__text" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::ImmutableSentencePieceText * >(argp1);
+ {
+@@ -4172,7 +4156,7 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_score(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText__score(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::ImmutableSentencePieceText *arg1 = (sentencepiece::ImmutableSentencePieceText *) 0 ;
+ void *argp1 = 0 ;
+@@ -4184,7 +4168,7 @@ SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText_score(PyObject *SWIGUNUSED
+ swig_obj[0] = args;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText_score" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText__score" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::ImmutableSentencePieceText * >(argp1);
+ {
+@@ -4236,44 +4220,6 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_ImmutableSentencePieceText__pieces(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+- PyObject *resultobj = 0;
+- sentencepiece::ImmutableSentencePieceText *arg1 = (sentencepiece::ImmutableSentencePieceText *) 0 ;
+- int arg2 ;
+- void *argp1 = 0 ;
+- int res1 = 0 ;
+- int val2 ;
+- int ecode2 = 0 ;
+- PyObject *swig_obj[2] ;
+- sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece result;
+-
+- if (!SWIG_Python_UnpackTuple(args, "ImmutableSentencePieceText__pieces", 2, 2, swig_obj)) SWIG_fail;
+- res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText, 0 | 0 );
+- if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableSentencePieceText__pieces" "', argument " "1"" of type '" "sentencepiece::ImmutableSentencePieceText const *""'");
+- }
+- arg1 = reinterpret_cast< sentencepiece::ImmutableSentencePieceText * >(argp1);
+- ecode2 = SWIG_AsVal_int(swig_obj[1], &val2);
+- if (!SWIG_IsOK(ecode2)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode2), "in method '" "ImmutableSentencePieceText__pieces" "', argument " "2"" of type '" "int""'");
+- }
+- arg2 = static_cast< int >(val2);
+- {
+- try {
+- result = sentencepiece_ImmutableSentencePieceText__pieces((sentencepiece::ImmutableSentencePieceText const *)arg1,arg2);
+- ReleaseResultObject(resultobj);
+- }
+- catch (const sentencepiece::util::Status &status) {
+- SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
+- }
+- }
+- resultobj = SWIG_NewPointerObj((new sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece(static_cast< const sentencepiece::ImmutableSentencePieceText_ImmutableSentencePiece& >(result))), SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText_ImmutableSentencePiece, SWIG_POINTER_OWN | 0 );
+- return resultobj;
+-fail:
+- return NULL;
+-}
+-
+-
+ SWIGINTERN PyObject *ImmutableSentencePieceText_swigregister(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *obj;
+ if (!SWIG_Python_UnpackTuple(args, "swigregister", 1, 1, &obj)) return NULL;
+@@ -4336,7 +4282,7 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_ImmutableNBestSentencePieceText_nbests_size(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_ImmutableNBestSentencePieceText__nbests_size(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::ImmutableNBestSentencePieceText *arg1 = (sentencepiece::ImmutableNBestSentencePieceText *) 0 ;
+ void *argp1 = 0 ;
+@@ -4348,7 +4294,7 @@ SWIGINTERN PyObject *_wrap_ImmutableNBestSentencePieceText_nbests_size(PyObject
+ swig_obj[0] = args;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableNBestSentencePieceText, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableNBestSentencePieceText_nbests_size" "', argument " "1"" of type '" "sentencepiece::ImmutableNBestSentencePieceText const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableNBestSentencePieceText__nbests_size" "', argument " "1"" of type '" "sentencepiece::ImmutableNBestSentencePieceText const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::ImmutableNBestSentencePieceText * >(argp1);
+ {
+@@ -4367,7 +4313,7 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_ImmutableNBestSentencePieceText_nbests(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
++SWIGINTERN PyObject *_wrap_ImmutableNBestSentencePieceText__nbests(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *resultobj = 0;
+ sentencepiece::ImmutableNBestSentencePieceText *arg1 = (sentencepiece::ImmutableNBestSentencePieceText *) 0 ;
+ int arg2 ;
+@@ -4378,15 +4324,15 @@ SWIGINTERN PyObject *_wrap_ImmutableNBestSentencePieceText_nbests(PyObject *SWIG
+ PyObject *swig_obj[2] ;
+ sentencepiece::ImmutableSentencePieceText result;
+
+- if (!SWIG_Python_UnpackTuple(args, "ImmutableNBestSentencePieceText_nbests", 2, 2, swig_obj)) SWIG_fail;
++ if (!SWIG_Python_UnpackTuple(args, "ImmutableNBestSentencePieceText__nbests", 2, 2, swig_obj)) SWIG_fail;
+ res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableNBestSentencePieceText, 0 | 0 );
+ if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableNBestSentencePieceText_nbests" "', argument " "1"" of type '" "sentencepiece::ImmutableNBestSentencePieceText const *""'");
++ SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableNBestSentencePieceText__nbests" "', argument " "1"" of type '" "sentencepiece::ImmutableNBestSentencePieceText const *""'");
+ }
+ arg1 = reinterpret_cast< sentencepiece::ImmutableNBestSentencePieceText * >(argp1);
+ ecode2 = SWIG_AsVal_int(swig_obj[1], &val2);
+ if (!SWIG_IsOK(ecode2)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode2), "in method '" "ImmutableNBestSentencePieceText_nbests" "', argument " "2"" of type '" "int""'");
++ SWIG_exception_fail(SWIG_ArgError(ecode2), "in method '" "ImmutableNBestSentencePieceText__nbests" "', argument " "2"" of type '" "int""'");
+ }
+ arg2 = static_cast< int >(val2);
+ {
+@@ -4438,44 +4384,6 @@ fail:
+ }
+
+
+-SWIGINTERN PyObject *_wrap_ImmutableNBestSentencePieceText__nbests(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+- PyObject *resultobj = 0;
+- sentencepiece::ImmutableNBestSentencePieceText *arg1 = (sentencepiece::ImmutableNBestSentencePieceText *) 0 ;
+- int arg2 ;
+- void *argp1 = 0 ;
+- int res1 = 0 ;
+- int val2 ;
+- int ecode2 = 0 ;
+- PyObject *swig_obj[2] ;
+- sentencepiece::ImmutableSentencePieceText result;
+-
+- if (!SWIG_Python_UnpackTuple(args, "ImmutableNBestSentencePieceText__nbests", 2, 2, swig_obj)) SWIG_fail;
+- res1 = SWIG_ConvertPtr(swig_obj[0], &argp1,SWIGTYPE_p_sentencepiece__ImmutableNBestSentencePieceText, 0 | 0 );
+- if (!SWIG_IsOK(res1)) {
+- SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "ImmutableNBestSentencePieceText__nbests" "', argument " "1"" of type '" "sentencepiece::ImmutableNBestSentencePieceText const *""'");
+- }
+- arg1 = reinterpret_cast< sentencepiece::ImmutableNBestSentencePieceText * >(argp1);
+- ecode2 = SWIG_AsVal_int(swig_obj[1], &val2);
+- if (!SWIG_IsOK(ecode2)) {
+- SWIG_exception_fail(SWIG_ArgError(ecode2), "in method '" "ImmutableNBestSentencePieceText__nbests" "', argument " "2"" of type '" "int""'");
+- }
+- arg2 = static_cast< int >(val2);
+- {
+- try {
+- result = sentencepiece_ImmutableNBestSentencePieceText__nbests((sentencepiece::ImmutableNBestSentencePieceText const *)arg1,arg2);
+- ReleaseResultObject(resultobj);
+- }
+- catch (const sentencepiece::util::Status &status) {
+- SWIG_exception(ToSwigError(status.code()), status.ToString().c_str());
+- }
+- }
+- resultobj = SWIG_NewPointerObj((new sentencepiece::ImmutableSentencePieceText(static_cast< const sentencepiece::ImmutableSentencePieceText& >(result))), SWIGTYPE_p_sentencepiece__ImmutableSentencePieceText, SWIG_POINTER_OWN | 0 );
+- return resultobj;
+-fail:
+- return NULL;
+-}
+-
+-
+ SWIGINTERN PyObject *ImmutableNBestSentencePieceText_swigregister(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
+ PyObject *obj;
+ if (!SWIG_Python_UnpackTuple(args, "swigregister", 1, 1, &obj)) return NULL;
+@@ -8475,29 +8383,27 @@ static PyMethodDef SwigMethods[] = {
+ { "SWIG_PyInstanceMethod_New", SWIG_PyInstanceMethod_New, METH_O, NULL},
+ { "new_ImmutableSentencePieceText_ImmutableSentencePiece", _wrap_new_ImmutableSentencePieceText_ImmutableSentencePiece, METH_NOARGS, NULL},
+ { "delete_ImmutableSentencePieceText_ImmutableSentencePiece", _wrap_delete_ImmutableSentencePieceText_ImmutableSentencePiece, METH_O, NULL},
+- { "ImmutableSentencePieceText_ImmutableSentencePiece_piece", _wrap_ImmutableSentencePieceText_ImmutableSentencePiece_piece, METH_O, NULL},
+- { "ImmutableSentencePieceText_ImmutableSentencePiece_surface", _wrap_ImmutableSentencePieceText_ImmutableSentencePiece_surface, METH_O, NULL},
+- { "ImmutableSentencePieceText_ImmutableSentencePiece_id", _wrap_ImmutableSentencePieceText_ImmutableSentencePiece_id, METH_O, NULL},
+- { "ImmutableSentencePieceText_ImmutableSentencePiece_begin", _wrap_ImmutableSentencePieceText_ImmutableSentencePiece_begin, METH_O, NULL},
+- { "ImmutableSentencePieceText_ImmutableSentencePiece_end", _wrap_ImmutableSentencePieceText_ImmutableSentencePiece_end, METH_O, NULL},
++ { "ImmutableSentencePieceText_ImmutableSentencePiece__piece", _wrap_ImmutableSentencePieceText_ImmutableSentencePiece__piece, METH_O, NULL},
++ { "ImmutableSentencePieceText_ImmutableSentencePiece__surface", _wrap_ImmutableSentencePieceText_ImmutableSentencePiece__surface, METH_O, NULL},
++ { "ImmutableSentencePieceText_ImmutableSentencePiece__id", _wrap_ImmutableSentencePieceText_ImmutableSentencePiece__id, METH_O, NULL},
++ { "ImmutableSentencePieceText_ImmutableSentencePiece__begin", _wrap_ImmutableSentencePieceText_ImmutableSentencePiece__begin, METH_O, NULL},
++ { "ImmutableSentencePieceText_ImmutableSentencePiece__end", _wrap_ImmutableSentencePieceText_ImmutableSentencePiece__end, METH_O, NULL},
+ { "ImmutableSentencePieceText_ImmutableSentencePiece_swigregister", ImmutableSentencePieceText_ImmutableSentencePiece_swigregister, METH_O, NULL},
+ { "ImmutableSentencePieceText_ImmutableSentencePiece_swiginit", ImmutableSentencePieceText_ImmutableSentencePiece_swiginit, METH_VARARGS, NULL},
+ { "new_ImmutableSentencePieceText", _wrap_new_ImmutableSentencePieceText, METH_NOARGS, NULL},
+ { "delete_ImmutableSentencePieceText", _wrap_delete_ImmutableSentencePieceText, METH_O, NULL},
+- { "ImmutableSentencePieceText_pieces_size", _wrap_ImmutableSentencePieceText_pieces_size, METH_O, NULL},
+- { "ImmutableSentencePieceText_pieces", _wrap_ImmutableSentencePieceText_pieces, METH_VARARGS, NULL},
+- { "ImmutableSentencePieceText_text", _wrap_ImmutableSentencePieceText_text, METH_O, NULL},
+- { "ImmutableSentencePieceText_score", _wrap_ImmutableSentencePieceText_score, METH_O, NULL},
+- { "ImmutableSentencePieceText_SerializeAsString", _wrap_ImmutableSentencePieceText_SerializeAsString, METH_O, NULL},
++ { "ImmutableSentencePieceText__pieces_size", _wrap_ImmutableSentencePieceText__pieces_size, METH_O, NULL},
+ { "ImmutableSentencePieceText__pieces", _wrap_ImmutableSentencePieceText__pieces, METH_VARARGS, NULL},
++ { "ImmutableSentencePieceText__text", _wrap_ImmutableSentencePieceText__text, METH_O, NULL},
++ { "ImmutableSentencePieceText__score", _wrap_ImmutableSentencePieceText__score, METH_O, NULL},
++ { "ImmutableSentencePieceText_SerializeAsString", _wrap_ImmutableSentencePieceText_SerializeAsString, METH_O, NULL},
+ { "ImmutableSentencePieceText_swigregister", ImmutableSentencePieceText_swigregister, METH_O, NULL},
+ { "ImmutableSentencePieceText_swiginit", ImmutableSentencePieceText_swiginit, METH_VARARGS, NULL},
+ { "new_ImmutableNBestSentencePieceText", _wrap_new_ImmutableNBestSentencePieceText, METH_NOARGS, NULL},
+ { "delete_ImmutableNBestSentencePieceText", _wrap_delete_ImmutableNBestSentencePieceText, METH_O, NULL},
+- { "ImmutableNBestSentencePieceText_nbests_size", _wrap_ImmutableNBestSentencePieceText_nbests_size, METH_O, NULL},
+- { "ImmutableNBestSentencePieceText_nbests", _wrap_ImmutableNBestSentencePieceText_nbests, METH_VARARGS, NULL},
+- { "ImmutableNBestSentencePieceText_SerializeAsString", _wrap_ImmutableNBestSentencePieceText_SerializeAsString, METH_O, NULL},
++ { "ImmutableNBestSentencePieceText__nbests_size", _wrap_ImmutableNBestSentencePieceText__nbests_size, METH_O, NULL},
+ { "ImmutableNBestSentencePieceText__nbests", _wrap_ImmutableNBestSentencePieceText__nbests, METH_VARARGS, NULL},
++ { "ImmutableNBestSentencePieceText_SerializeAsString", _wrap_ImmutableNBestSentencePieceText_SerializeAsString, METH_O, NULL},
+ { "ImmutableNBestSentencePieceText_swigregister", ImmutableNBestSentencePieceText_swigregister, METH_O, NULL},
+ { "ImmutableNBestSentencePieceText_swiginit", ImmutableNBestSentencePieceText_swiginit, METH_VARARGS, NULL},
+ { "new_SentencePieceProcessor", _wrap_new_SentencePieceProcessor, METH_NOARGS, NULL},
+diff --git a/python/test/sentencepiece_test.py b/python/test/sentencepiece_test.py
+index 5e4af7f..ed792bd 100755
+--- a/python/test/sentencepiece_test.py
++++ b/python/test/sentencepiece_test.py
+@@ -305,6 +305,12 @@ class TestSentencepieceProcessor(unittest.TestCase):
+ s4 = self.sp_.DecodePiecesAsImmutableProto(['foo', 'bar'])
+ s5 = self.sp_.DecodeIdsAsImmutableProto([20, 30])
+
++ print(s1)
++ print(s2)
++ print(s3)
++ print(s4)
++ print(s5)
++
+ t1 = self.sp_.encode_as_immutable_proto(text)
+ t2 = self.sp_.sample_encode_as_immutable_proto(text, 10, 0.2)
+ t3 = self.sp_.nbest_encode_as_immutable_proto(text, 10)
+@@ -339,35 +345,35 @@ class TestSentencepieceProcessor(unittest.TestCase):
+
+ v1 = self.sp_.EncodeAsIds(text)
+ v2 = self.sp_.EncodeAsPieces(text)
+- self.assertEqual([x.id() for x in s1], v1)
+- self.assertEqual([x.piece() for x in s1], v2)
+- self.assertEqual(text, s1.text())
++ self.assertEqual([x.id for x in s1.pieces], v1)
++ self.assertEqual([x.piece for x in s1.pieces], v2)
++ self.assertEqual(text, s1.text)
+
+- surfaces1 = [s1.text()[x.begin():x.end()] for x in s1]
+- surfaces2 = [x.surface() for x in s1]
++ surfaces1 = [s1.text[x.begin:x.end] for x in s1.pieces]
++ surfaces2 = [x.surface for x in s1.pieces]
+ self.assertEqual(surfaces1, surfaces2)
+
+ ids = []
+- for i in range(s1.pieces_size()):
+- ids.append(s1.pieces(i).id())
++ for i in range(len(s1.pieces)):
++ ids.append(s1.pieces[i].id)
+ self.assertEqual(ids, v1)
+
+ pieces = []
+- for i in range(s1.pieces_size()):
+- pieces.append(s1.pieces(i).piece())
++ for i in range(len(s1.pieces)):
++ pieces.append(s1.pieces[i].piece)
+ self.assertEqual(pieces, v2)
+
+ # Japanese offset
+ s1 = self.jasp_.EncodeAsImmutableProto('吾輩は猫である。Hello world. ABC 123')
+- surfaces1 = [s1.text()[x.begin():x.end()] for x in s1]
+- surfaces2 = [x.surface() for x in s1]
++ surfaces1 = [s1.text[x.begin:x.end] for x in s1.pieces]
++ surfaces2 = [x.surface for x in s1.pieces]
+ self.assertEqual(surfaces1, surfaces2)
+
+- ids = [x.id() for x in s1]
++ ids = [x.id for x in s1.pieces]
+ s2 = self.jasp_.DecodeIdsAsImmutableProto(ids)
+ self.assertEqual(s2, s1)
+
+- pieces = [x.piece() for x in s1]
++ pieces = [x.piece for x in s1.pieces]
+ s2 = self.jasp_.DecodePiecesAsImmutableProto(pieces)
+ self.assertEqual(s2, s1)
+
+@@ -395,29 +401,29 @@ class TestSentencepieceProcessor(unittest.TestCase):
+ self.assertEqual(sp.encode([text], out_type='serialized_proto'), [sprotos])
+ self.assertEqual(sp.encode([text], out_type='immutable_proto'), [iprotos])
+
+- self.assertEqual(len(iprotos), len(pieces))
+- self.assertEqual(len(iprotos), len(ids))
+- self.assertEqual(iprotos.text(), text)
++ self.assertEqual(len(iprotos.pieces), len(pieces))
++ self.assertEqual(len(iprotos.pieces), len(ids))
++ self.assertEqual(iprotos.text, text)
+
+- self.assertEqual(len(iprotos2), len(pieces2))
+- self.assertEqual(len(iprotos2), len(ids2))
+- self.assertEqual(iprotos2.text(), text2)
++ self.assertEqual(len(iprotos2.pieces), len(pieces2))
++ self.assertEqual(len(iprotos2.pieces), len(ids2))
++ self.assertEqual(iprotos2.text, text2)
+
+- for i in range(len(iprotos)):
+- self.assertEqual(ids[i], iprotos.pieces(i).id())
+- self.assertEqual(pieces[i], iprotos.pieces(i).piece())
++ for i in range(len(iprotos.pieces)):
++ self.assertEqual(ids[i], iprotos.pieces[i].id)
++ self.assertEqual(pieces[i], iprotos.pieces[i].piece)
+
+- for i, piece in enumerate(iprotos):
+- self.assertEqual(ids[i], piece.id())
+- self.assertEqual(pieces[i], piece.piece())
++ for i, piece in enumerate(iprotos.pieces):
++ self.assertEqual(ids[i], piece.id)
++ self.assertEqual(pieces[i], piece.piece)
+
+- for i in range(len(iprotos2)):
+- self.assertEqual(ids2[i], iprotos2.pieces(i).id())
+- self.assertEqual(pieces2[i], iprotos2.pieces(i).piece())
++ for i in range(len(iprotos2.pieces)):
++ self.assertEqual(ids2[i], iprotos2.pieces[i].id)
++ self.assertEqual(pieces2[i], iprotos2.pieces[i].piece)
+
+- for i, piece in enumerate(iprotos2):
+- self.assertEqual(ids2[i], piece.id())
+- self.assertEqual(pieces2[i], piece.piece())
++ for i, piece in enumerate(iprotos2.pieces):
++ self.assertEqual(ids2[i], piece.id)
++ self.assertEqual(pieces2[i], piece.piece)
+
+ detok_ids = self.sp_.DecodeIds(ids)
+ detok_pieces = self.sp_.DecodePieces(pieces)
--- /dev/null
+From: Taku Kudo <taku@google.com>
+Date: Fri, 5 Aug 2022 14:47:02 +0900
+Subject: automatically detect the number of CPUs in batch processing.
+
+Signed-off-by: Kentaro Hayashi <kenhys@gmail.com>
+---
+ python/src/sentencepiece/__init__.py | 27 +++++++++++++--------
+ python/src/sentencepiece/sentencepiece.i | 32 ++++++++++++++++---------
+ python/src/sentencepiece/sentencepiece_wrap.cxx | 5 +++-
+ python/test/sentencepiece_test.py | 32 +++++++++++++++++++++++++
+ 4 files changed, 74 insertions(+), 22 deletions(-)
+
+diff --git a/python/src/sentencepiece/__init__.py b/python/src/sentencepiece/__init__.py
+index 12dc631..ce9d60d 100644
+--- a/python/src/sentencepiece/__init__.py
++++ b/python/src/sentencepiece/__init__.py
+@@ -97,6 +97,13 @@ class ImmutableSentencePieceText_ImmutableSentencePiece(object):
+ 'begin: {}\n'
+ 'end: {}\n').format(self.piece, self.id, self.surface,
+ self.begin, self.end)
++
++ def __eq__(self, other):
++ return self.piece == other.piece and self.id == other.id and self.surface == other.surface and self.begin == other.begin and self.end == other.end
++
++ def __hash__(self):
++ return hash(str(self))
++
+ __repr__ = __str__
+
+
+@@ -395,7 +402,7 @@ class SentencePieceProcessor(object):
+ enable_sampling=False,
+ nbest_size=-1,
+ alpha=0.1,
+- num_threads=1):
++ num_threads=-1):
+ """Initialzie sentencepieceProcessor.
+
+ Args:
+@@ -407,15 +414,15 @@ class SentencePieceProcessor(object):
+ reversing (if enabled).
+ reverse: Reverses the tokenized sequence (Default = false)
+ emit_unk_piece: Emits the unk literal string (Default = false)
+- nbest_size: sampling parameters for unigram. Invalid for BPE-Dropout.
++ nbest_size: sampling parameters for unigram. Invalid in BPE-Dropout.
+ nbest_size = {0,1}: No sampling is performed.
+ nbest_size > 1: samples from the nbest_size results.
+ nbest_size < 0: assuming that nbest_size is infinite and samples
+ from the all hypothesis (lattice) using
+ forward-filtering-and-backward-sampling algorithm.
+ alpha: Soothing parameter for unigram sampling, and dropout probability of
+- merge operations for BPE-dropout.
+- num_threads: number of threads in batch processing.
++ merge operations for BPE-dropout.
++ num_threads: number of threads in batch processing (Default = -1, auto-detected)
+ """
+
+ _sentencepiece_processor_init_native(self)
+@@ -450,18 +457,18 @@ class SentencePieceProcessor(object):
+ out_type: output type. int or str.
+ add_bos: Add <s> to the result (Default = false)
+ add_eos: Add </s> to the result (Default = false) <s>/</s> is added after
+- reversing (if enabled).
++ reversing (if enabled).
+ reverse: Reverses the tokenized sequence (Default = false)
+ emit_unk_piece: Emits the unk literal string (Default = false)
+- nbest_size: sampling parameters for unigram. Invalid for BPE-Dropout.
++ nbest_size: sampling parameters for unigram. Invalid in BPE-Dropout.
+ nbest_size = {0,1}: No sampling is performed.
+ nbest_size > 1: samples from the nbest_size results.
+ nbest_size < 0: assuming that nbest_size is infinite and samples
+- from the all hypothesis (lattice) using
+- forward-filtering-and-backward-sampling algorithm.
++ from the all hypothesis (lattice) using
++ forward-filtering-and-backward-sampling algorithm.
+ alpha: Soothing parameter for unigram sampling, and merge probability for
+ BPE-dropout (probablity 'p' in BPE-dropout paper).
+- num_threads: the number of threads used in the batch processin (Default = 1).
++ num_threads: the number of threads used in the batch processing (Default = -1).
+ """
+
+ if out_type is None:
+@@ -722,7 +729,7 @@ class SentencePieceProcessor(object):
+
+ Args:
+ out_type: output type. str or 'serialized_proto' or 'immutable_proto' (Default = str)
+- num_threads: the number of threads used in the batch processin (Default = 1).
++ num_threads: the number of threads used in the batch processing (Default = -1).
+ """
+
+ if num_threads is None:
+diff --git a/python/src/sentencepiece/sentencepiece.i b/python/src/sentencepiece/sentencepiece.i
+index 8309fc2..e22f763 100644
+--- a/python/src/sentencepiece/sentencepiece.i
++++ b/python/src/sentencepiece/sentencepiece.i
+@@ -233,9 +233,12 @@ class ThreadPool {
+
+ template <typename T>
+ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
++ if (*num_threads < 0) {
++ *num_threads = std::thread::hardware_concurrency();
++ }
+ *num_threads = std::max<int>(1,
+ std::min<int>({*num_threads,
+- static_cast<int>(ins.size()), 256}));
++ static_cast<int>(ins.size()), 256}));
+ }
+
+ #define DEFINE_ENCODE_BATCH_FUNC_IMPL(FuncName, InType, OutType) \
+@@ -675,7 +678,7 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ enable_sampling=False,
+ nbest_size=-1,
+ alpha=0.1,
+- num_threads=1):
++ num_threads=-1):
+ """Initialzie sentencepieceProcessor.
+
+ Args:
+@@ -687,15 +690,15 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ reversing (if enabled).
+ reverse: Reverses the tokenized sequence (Default = false)
+ emit_unk_piece: Emits the unk literal string (Default = false)
+- nbest_size: sampling parameters for unigram. Invalid for BPE-Dropout.
++ nbest_size: sampling parameters for unigram. Invalid in BPE-Dropout.
+ nbest_size = {0,1}: No sampling is performed.
+ nbest_size > 1: samples from the nbest_size results.
+ nbest_size < 0: assuming that nbest_size is infinite and samples
+ from the all hypothesis (lattice) using
+ forward-filtering-and-backward-sampling algorithm.
+ alpha: Soothing parameter for unigram sampling, and dropout probability of
+- merge operations for BPE-dropout.
+- num_threads: number of threads in batch processing.
++ merge operations for BPE-dropout.
++ num_threads: number of threads in batch processing (Default = -1, auto-detected)
+ """
+
+ _sentencepiece_processor_init_native(self)
+@@ -730,18 +733,18 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ out_type: output type. int or str.
+ add_bos: Add <s> to the result (Default = false)
+ add_eos: Add </s> to the result (Default = false) <s>/</s> is added after
+- reversing (if enabled).
++ reversing (if enabled).
+ reverse: Reverses the tokenized sequence (Default = false)
+ emit_unk_piece: Emits the unk literal string (Default = false)
+- nbest_size: sampling parameters for unigram. Invalid for BPE-Dropout.
++ nbest_size: sampling parameters for unigram. Invalid in BPE-Dropout.
+ nbest_size = {0,1}: No sampling is performed.
+ nbest_size > 1: samples from the nbest_size results.
+ nbest_size < 0: assuming that nbest_size is infinite and samples
+- from the all hypothesis (lattice) using
+- forward-filtering-and-backward-sampling algorithm.
++ from the all hypothesis (lattice) using
++ forward-filtering-and-backward-sampling algorithm.
+ alpha: Soothing parameter for unigram sampling, and merge probability for
+ BPE-dropout (probablity 'p' in BPE-dropout paper).
+- num_threads: the number of threads used in the batch processin (Default = 1).
++ num_threads: the number of threads used in the batch processing (Default = -1).
+ """
+
+ if out_type is None:
+@@ -1002,7 +1005,7 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+
+ Args:
+ out_type: output type. str or 'serialized_proto' or 'immutable_proto' (Default = str)
+- num_threads: the number of threads used in the batch processin (Default = 1).
++ num_threads: the number of threads used in the batch processing (Default = -1).
+ """
+
+ if num_threads is None:
+@@ -1260,6 +1263,13 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ 'begin: {}\n'
+ 'end: {}\n').format(self.piece, self.id, self.surface,
+ self.begin, self.end)
++
++ def __eq__(self, other):
++ return self.piece == other.piece and self.id == other.id and self.surface == other.surface and self.begin == other.begin and self.end == other.end
++
++ def __hash__(self):
++ return hash(str(self))
++
+ __repr__ = __str__
+ %}
+ }
+diff --git a/python/src/sentencepiece/sentencepiece_wrap.cxx b/python/src/sentencepiece/sentencepiece_wrap.cxx
+index 0a8df5f..1eac211 100644
+--- a/python/src/sentencepiece/sentencepiece_wrap.cxx
++++ b/python/src/sentencepiece/sentencepiece_wrap.cxx
+@@ -3042,9 +3042,12 @@ class ThreadPool {
+
+ template <typename T>
+ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
++ if (*num_threads < 0) {
++ *num_threads = std::thread::hardware_concurrency();
++ }
+ *num_threads = std::max<int>(1,
+ std::min<int>({*num_threads,
+- static_cast<int>(ins.size()), 256}));
++ static_cast<int>(ins.size()), 256}));
+ }
+
+ #define DEFINE_ENCODE_BATCH_FUNC_IMPL(FuncName, InType, OutType) \
+diff --git a/python/test/sentencepiece_test.py b/python/test/sentencepiece_test.py
+index ed792bd..6cbe077 100755
+--- a/python/test/sentencepiece_test.py
++++ b/python/test/sentencepiece_test.py
+@@ -332,6 +332,29 @@ class TestSentencepieceProcessor(unittest.TestCase):
+ self.assertEqual(s4, y4)
+ self.assertEqual(s5, y5)
+
++ hset_piece = defaultdict(int)
++
++ # eq test
++ for i in range(len(s1.pieces)):
++ self.assertEqual(s1.pieces[i], t1.pieces[i])
++ hset_piece[s1.pieces[i]] += 1
++ hset_piece[t1.pieces[i]] += 1
++
++ self.assertEqual(len(hset_piece), len(s1.pieces))
++
++ # hash test
++ hset = defaultdict(int)
++ hset[s1] += 1
++ hset[t1] += 1
++ hset[s3] += 1
++ hset[t3] += 1
++
++ self.assertEqual(len(hset), 2)
++ self.assertEqual(hset[s1], 2)
++ self.assertEqual(hset[s3], 2)
++ self.assertEqual(hset[t1], 2)
++ self.assertEqual(hset[t3], 2)
++
+ x1 = self.sp_.encode_as_serialized_proto(text)
+ x2 = self.sp_.sample_encode_as_serialized_proto(text, 10, 0.2)
+ x3 = self.sp_.nbest_encode_as_serialized_proto(text, 10)
+@@ -363,6 +386,15 @@ class TestSentencepieceProcessor(unittest.TestCase):
+ pieces.append(s1.pieces[i].piece)
+ self.assertEqual(pieces, v2)
+
++ for v in s3.nbests:
++ self.assertEqual(text, v.text)
++ self.assertEqual(self.sp_.Decode([x.id for x in v.pieces]), text)
++
++ for i in range(len(s3.nbests)):
++ self.assertEqual(text, s3.nbests[i].text)
++ self.assertEqual(
++ self.sp_.Decode([x.id for x in s3.nbests[i].pieces]), text)
++
+ # Japanese offset
+ s1 = self.jasp_.EncodeAsImmutableProto('吾輩は猫である。Hello world. ABC 123')
+ surfaces1 = [s1.text[x.begin:x.end] for x in s1.pieces]
--- /dev/null
+From: Taku Kudo <taku@google.com>
+Date: Fri, 5 Aug 2022 16:34:44 +0900
+Subject: support slice in pieces/nbests objects
+
+Signed-off-by: Kentaro Hayashi <kenhys@gmail.com>
+---
+ python/src/sentencepiece/__init__.py | 8 ++++++++
+ python/src/sentencepiece/sentencepiece.i | 8 ++++++++
+ python/test/sentencepiece_test.py | 4 ++++
+ 3 files changed, 20 insertions(+)
+
+diff --git a/python/src/sentencepiece/__init__.py b/python/src/sentencepiece/__init__.py
+index ce9d60d..cf06830 100644
+--- a/python/src/sentencepiece/__init__.py
++++ b/python/src/sentencepiece/__init__.py
+@@ -145,6 +145,10 @@ class ImmutableSentencePieceText(object):
+ return self.len
+
+ def __getitem__(self, index):
++ if isinstance(index, slice):
++ return [self.proto._pieces(i) for i in range(self.len)][index.start:index.stop:index.step]
++ if index < 0:
++ index = index + self.len
+ if index < 0 or index >= self.len:
+ raise IndexError('piece index is out of range')
+ return self.proto._pieces(index)
+@@ -202,6 +206,10 @@ class ImmutableNBestSentencePieceText(object):
+ return self.len
+
+ def __getitem__(self, index):
++ if isinstance(index, slice):
++ return [self.proto._nbests(i) for i in range(self.len)][index.start:index.stop:index.step]
++ if index < 0:
++ index = index + self.len
+ if index < 0 or index >= self.len:
+ raise IndexError('nbests index is out of range')
+ return self.proto._nbests(index)
+diff --git a/python/src/sentencepiece/sentencepiece.i b/python/src/sentencepiece/sentencepiece.i
+index e22f763..2ac68a8 100644
+--- a/python/src/sentencepiece/sentencepiece.i
++++ b/python/src/sentencepiece/sentencepiece.i
+@@ -1293,6 +1293,10 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ return self.len
+
+ def __getitem__(self, index):
++ if isinstance(index, slice):
++ return [self.proto._pieces(i) for i in range(self.len)][index.start:index.stop:index.step]
++ if index < 0:
++ index = index + self.len
+ if index < 0 or index >= self.len:
+ raise IndexError('piece index is out of range')
+ return self.proto._pieces(index)
+@@ -1336,6 +1340,10 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ return self.len
+
+ def __getitem__(self, index):
++ if isinstance(index, slice):
++ return [self.proto._nbests(i) for i in range(self.len)][index.start:index.stop:index.step]
++ if index < 0:
++ index = index + self.len
+ if index < 0 or index >= self.len:
+ raise IndexError('nbests index is out of range')
+ return self.proto._nbests(index)
+diff --git a/python/test/sentencepiece_test.py b/python/test/sentencepiece_test.py
+index 6cbe077..92327ac 100755
+--- a/python/test/sentencepiece_test.py
++++ b/python/test/sentencepiece_test.py
+@@ -395,6 +395,10 @@ class TestSentencepieceProcessor(unittest.TestCase):
+ self.assertEqual(
+ self.sp_.Decode([x.id for x in s3.nbests[i].pieces]), text)
+
++ # slice
++ self.assertEqual(s1.pieces[::-1], list(reversed(s1.pieces)))
++ self.assertEqual(s3.nbests[::-1], list(reversed(s3.nbests)))
++
+ # Japanese offset
+ s1 = self.jasp_.EncodeAsImmutableProto('吾輩は猫である。Hello world. ABC 123')
+ surfaces1 = [s1.text[x.begin:x.end] for x in s1.pieces]
--- /dev/null
+From: Taku Kudo <taku@google.com>
+Date: Fri, 5 Aug 2022 19:05:52 +0900
+Subject: Updated the document
+
+Signed-off-by: Kentaro Hayashi <kenhys@gmail.com>
+---
+ README.md | 1 -
+ doc/api.md | 22 ++--
+ doc/options.md | 102 ++++++++++---------
+ python/README.md | 168 +++++++++++++------------------
+ python/src/sentencepiece/__init__.py | 22 +++-
+ python/src/sentencepiece/sentencepiece.i | 22 +++-
+ python/test/sentencepiece_test.py | 20 +++-
+ 7 files changed, 199 insertions(+), 158 deletions(-)
+
+diff --git a/README.md b/README.md
+index dc71b64..1986047 100644
+--- a/README.md
++++ b/README.md
+@@ -276,6 +276,5 @@ Then segment train/test corpus with ```--vocabulary``` option
+ * [Use custom text normalization rules](doc/normalization.md)
+ * [Use custom symbols](doc/special_symbols.md)
+ * [Python Module](python/README.md)
+-* [TensorFlow Module](tensorflow/README.md)
+ * [Segmentation and training algorithms in detail]
+
+diff --git a/doc/api.md b/doc/api.md
+index 797074c..ebde880 100644
+--- a/doc/api.md
++++ b/doc/api.md
+@@ -14,9 +14,9 @@ if (!status.ok()) {
+ // error
+ }
+
+-// You can also load a model from std::ifstream.
+-// std::ifstream in("//path/to/model.model");
+-// auto status = processor.Load(in);
++// You can also load a serialized model from std::string.
++// const std::string str = // Load blob contents from a file.
++// auto status = processor.LoadFromSerializedProto(str);
+ ```
+
+ ## Tokenize text (preprocessing)
+@@ -75,16 +75,20 @@ Calls `SentencePieceTrainer::Train` function to train sentencepiece model. You c
+ sentencepiece::SentencePieceTrainer::Train("--input=test/botchan.txt --model_prefix=m --vocab_size=1000");
+ ```
+
+-## SentencePieceText proto
+-You will want to use `SentencePieceText` class to obtain the pieces and ids at the same time. This proto also encodes a utf8-byte offset of each piece over user input or detokenized text.
++## ImmutableSentencePieceText
++Use the `ImmutableSentencePieceText` class to obtain the pieces and ids at the same time.
++It also records the UTF-8 byte offset of each piece over the user input or detokenized text.
+
+ ```C++
+-#include <sentencepiece.pb.h>
++#include <sentencepiece_processor.h>
+
+-sentencepiece::SentencePieceText spt;
++sentencepiece::ImmutableSentencePieceText spt;
+
+ // Encode
+-processor.Encode("This is a test.", &spt);
++processor.Encode("This is a test.", spt.mutable_proto());
++
++// or
++// spt = processor.EncodeAsImmutableProto("This is a test.");
+
+ std::cout << spt.text() << std::endl; // This is the same as the input.
+ for (const auto &piece : spt.pieces()) {
+@@ -96,7 +100,7 @@ for (const auto &piece : spt.pieces()) {
+ }
+
+ // Decode
+-processor.Decode({10, 20, 30}, &spt);
++processor.Decode({10, 20, 30}, spt.mutable_proto());
+ std::cout << spt.text() << std::endl; // This is the same as the decoded string.
+ for (const auto &piece : spt.pieces()) {
+ // the same as above.
+diff --git a/doc/options.md b/doc/options.md
+index 26cf681..6cdc0f9 100644
+--- a/doc/options.md
++++ b/doc/options.md
+@@ -3,50 +3,60 @@
+ The training options for the `spm_train` can be listed using `spm_train --help`. Since the standard `pip install` of sentencepiece does not necessarily install `spm_train`, the options are also listed here.
+
+ ```
+---help (show help) type: bool default: false
+---version (show version) type: bool default: false
+---minloglevel (Messages logged at a lower level than this don't actually get logged anywhere) type: int default: 0
+---input (comma separated list of input sentences) type: std::string default: ""
+---input_format (Input format. Supported format is `text` or `tsv`.) type: std::string default: ""
+---model_prefix (output model prefix) type: std::string default: ""
+---model_type (model algorithm: unigram, bpe, word or char) type: std::string default: "unigram"
+---vocab_size (vocabulary size) type: int32 default: 8000
+---accept_language (comma-separated list of languages this model can accept) type: std::string default: ""
+---self_test_sample_size (the size of self test samples) type: int32 default: 0
+---character_coverage (character coverage to determine the minimum symbols) type: double default: 0.9995
+---input_sentence_size (maximum size of sentences the trainer loads) type: int32 default: 0
+---shuffle_input_sentence (Randomly sample input sentences in advance. Valid when --input_sentence_size > 0) type: bool default: true
+---seed_sentencepiece_size (the size of seed sentencepieces) type: int32 default: 1000000
+---shrinking_factor (Keeps top shrinking_factor pieces with respect to the loss) type: double default: 0.75
+---num_threads (number of threads for training) type: int32 default: 16
+---num_sub_iterations (number of EM sub-iterations) type: int32 default: 2
+---max_sentencepiece_length (maximum length of sentence piece) type: int32 default: 16
+---max_sentence_length (maximum length of sentence in byte) type: int32 default: 4192
+---split_by_unicode_script (use Unicode script to split sentence pieces) type: bool default: true
+---split_by_number (split tokens by numbers (0-9)) type: bool default: true
+---split_by_whitespace (use a white space to split sentence pieces) type: bool default: true
+---split_digits (split all digits (0-9) into separate pieces) type: bool default: false
+---treat_whitespace_as_suffix (treat whitespace marker as suffix instead of prefix.) type: bool default: false
+---control_symbols (comma separated list of control symbols) type: std::string default: ""
+---user_defined_symbols (comma separated list of user defined symbols) type: std::string default: ""
+---required_chars (UTF8 characters in this flag are always used in the character set regardless of --character_coverage) type: std::string default: ""
+---byte_fallback (decompose unknown pieces into UTF-8 byte pieces) type: bool default: false
+---vocabulary_output_piece_score (Define score in vocab file) type: bool default: true
+---normalization_rule_name (Normalization rule name. Choose from nfkc or identity) type: std::string default: "nmt_nfkc"
+---normalization_rule_tsv (Normalization rule TSV file. ) type: std::string default: ""
+---denormalization_rule_tsv (Denormalization rule TSV file.) type: std::string default: ""
+---add_dummy_prefix (Add dummy whitespace at the beginning of text) type: bool default: true
+---remove_extra_whitespaces (Removes leading, trailing, and duplicate internal whitespace) type: bool default: true
+---hard_vocab_limit (If set to false, --vocab_size is considered as a soft limit.) type: bool default: true
+---use_all_vocab (If set to true, use all tokens as vocab. Valid for word/char models.) type: bool default: false
+---unk_id (Override UNK (<unk>) id.) type: int32 default: 0
+---bos_id (Override BOS (<s>) id. Set -1 to disable BOS.) type: int32 default: 1
+---eos_id (Override EOS (</s>) id. Set -1 to disable EOS.) type: int32 default: 2
+---pad_id (Override PAD (<pad>) id. Set -1 to disable PAD.) type: int32 default: -1
+---unk_piece (Override UNK (<unk>) piece.) type: std::string default: "<unk>"
+---bos_piece (Override BOS (<s>) piece.) type: std::string default: "<s>"
+---eos_piece (Override EOS (</s>) piece.) type: std::string default: "</s>"
+---pad_piece (Override PAD (<pad>) piece.) type: std::string default: "<pad>"
+---unk_surface (Dummy surface string for <unk>. In decoding <unk> is decoded to `unk_surface`.) type: std::string default: " ⁇ "
+---train_extremely_large_corpus (Increase bit depth for unigram tokenization.) type: bool default: false
++Usage: ../build/src/spm_train [options] files
++
++ --input (comma separated list of input sentences) type: std::string default: ""
++ --input_format (Input format. Supported format is `text` or `tsv`.) type: std::string default: ""
++ --model_prefix (output model prefix) type: std::string default: ""
++ --model_type (model algorithm: unigram, bpe, word or char) type: std::string default: "unigram"
++ --vocab_size (vocabulary size) type: int32 default: 8000
++ --accept_language (comma-separated list of languages this model can accept) type: std::string default: ""
++ --self_test_sample_size (the size of self test samples) type: int32 default: 0
++ --character_coverage (character coverage to determine the minimum symbols) type: double default: 0.9995
++ --input_sentence_size (maximum size of sentences the trainer loads) type: std::uint64_t default: 0
++ --shuffle_input_sentence (Randomly sample input sentences in advance. Valid when --input_sentence_size > 0) type: bool default: true
++ --seed_sentencepiece_size (the size of seed sentencepieces) type: int32 default: 1000000
++ --shrinking_factor (Keeps top shrinking_factor pieces with respect to the loss) type: double default: 0.75
++ --num_threads (number of threads for training) type: int32 default: 16
++ --num_sub_iterations (number of EM sub-iterations) type: int32 default: 2
++ --max_sentencepiece_length (maximum length of sentence piece) type: int32 default: 16
++ --max_sentence_length (maximum length of sentence in byte) type: int32 default: 4192
++ --split_by_unicode_script (use Unicode script to split sentence pieces) type: bool default: true
++ --split_by_number (split tokens by numbers (0-9)) type: bool default: true
++ --split_by_whitespace (use a white space to split sentence pieces) type: bool default: true
++ --split_digits (split all digits (0-9) into separate pieces) type: bool default: false
++ --treat_whitespace_as_suffix (treat whitespace marker as suffix instead of prefix.) type: bool default: false
++ --allow_whitespace_only_pieces (allow pieces that only contain (consecutive) whitespace tokens) type: bool default: false
++ --control_symbols (comma separated list of control symbols) type: std::string default: ""
++ --control_symbols_file (load control_symbols from file.) type: std::string default: ""
++ --user_defined_symbols (comma separated list of user defined symbols) type: std::string default: ""
++ --user_defined_symbols_file (load user_defined_symbols from file.) type: std::string default: ""
++ --required_chars (UTF8 characters in this flag are always used in the character set regardless of --character_coverage) type: std::string default: ""
++ --required_chars_file (load required_chars from file.) type: std::string default: ""
++ --byte_fallback (decompose unknown pieces into UTF-8 byte pieces) type: bool default: false
++ --vocabulary_output_piece_score (Define score in vocab file) type: bool default: true
++ --normalization_rule_name (Normalization rule name. Choose from nfkc or identity) type: std::string default: "nmt_nfkc"
++ --normalization_rule_tsv (Normalization rule TSV file. ) type: std::string default: ""
++ --denormalization_rule_tsv (Denormalization rule TSV file.) type: std::string default: ""
++ --add_dummy_prefix (Add dummy whitespace at the beginning of text) type: bool default: true
++ --remove_extra_whitespaces (Removes leading, trailing, and duplicate internal whitespace) type: bool default: true
++ --hard_vocab_limit (If set to false, --vocab_size is considered as a soft limit.) type: bool default: true
++ --use_all_vocab (If set to true, use all tokens as vocab. Valid for word/char models.) type: bool default: false
++ --unk_id (Override UNK (<unk>) id.) type: int32 default: 0
++ --bos_id (Override BOS (<s>) id. Set -1 to disable BOS.) type: int32 default: 1
++ --eos_id (Override EOS (</s>) id. Set -1 to disable EOS.) type: int32 default: 2
++ --pad_id (Override PAD (<pad>) id. Set -1 to disable PAD.) type: int32 default: -1
++ --unk_piece (Override UNK (<unk>) piece.) type: std::string default: "<unk>"
++ --bos_piece (Override BOS (<s>) piece.) type: std::string default: "<s>"
++ --eos_piece (Override EOS (</s>) piece.) type: std::string default: "</s>"
++ --pad_piece (Override PAD (<pad>) piece.) type: std::string default: "<pad>"
++ --unk_surface (Dummy surface string for <unk>. In decoding <unk> is decoded to `unk_surface`.) type: std::string default: " ⁇ "
++ --train_extremely_large_corpus (Increase bit depth for unigram tokenization.) type: bool default: false
++ --random_seed (Seed value for random generator.) type: uint32 default: 4294967295
++ --enable_differential_privacy (Whether to add DP while training. Currently supported only by UNIGRAM model.) type: bool default: false
++ --differential_privacy_noise_level (Amount of noise to add for DP) type: float default: 0
++ --differential_privacy_clipping_threshold (Threshold for clipping the counts for DP) type: std::uint64_t default: 0
++ --help (show help) type: bool default: false
++ --version (show version) type: bool default: false
++ --minloglevel (Messages logged at a lower level than this don't actually get logged anywhere) type: int default: 0
+ ```
+diff --git a/python/README.md b/python/README.md
+index b683082..bc5a59a 100644
+--- a/python/README.md
++++ b/python/README.md
+@@ -9,10 +9,17 @@ For Linux (x64/i686), macOS, and Windows(win32/x64) environment, you can simply
+ % pip install sentencepiece
+ ```
+
+-To build and install the Python wrapper from source, please install [SentencePiece C++](https://github.com/google/sentencepiece#c-from-source) and try the following commands:
++To build and install the Python wrapper from source, run the following commands to build the wheel package and install it.
+ ```
+-% python setup.py build
+-% sudo python setup.py install
++% git clone https://github.com/google/sentencepiece.git
++% cd sentencepiece
++% mkdir build
++% cd build
++% cmake .. -DSPM_ENABLE_SHARED=OFF -DCMAKE_INSTALL_PREFIX=./root
++% make install
++% cd ../python
++% python setup.py bdist_wheel
++% pip install dist/sentencepiece*.whl
+ ```
+
+ If you don’t have write permission to the global site-packages directory or don’t want to install into it, please try:
+@@ -22,21 +29,50 @@ If you don’t have write permission to the global site-packages directory or do
+
+ ## Usage
+
+-See [this google colab page](https://github.com/google/sentencepiece/blob/master/python/sentencepiece_python_module_example.ipynb) to run sentencepiece interactively. (Note: this sample is written in old interface.)
++See [this google colab page](https://github.com/google/sentencepiece/blob/master/python/sentencepiece_python_module_example.ipynb) to run sentencepiece interactively.
+
+ ### Segmentation
+ ```
+ % python
+ >>> import sentencepiece as spm
+ >>> sp = spm.SentencePieceProcessor(model_file='test/test_model.model')
++
+ >>> sp.encode('This is a test')
+ [284, 47, 11, 4, 15, 400]
++
+ >>> sp.encode(['This is a test', 'Hello world'], out_type=int)
+ [[284, 47, 11, 4, 15, 400], [151, 88, 21, 887]]
++
++>>> sp.encode_as_ids(['This is a test', 'Hello world'])
++[[284, 47, 11, 4, 15, 400], [151, 88, 21, 887]]
++
+ >>> sp.encode('This is a test', out_type=str)
+ ['▁This', '▁is', '▁a', '▁', 't', 'est']
++
+ >>> sp.encode(['This is a test', 'Hello world'], out_type=str)
+ [['▁This', '▁is', '▁a', '▁', 't', 'est'], ['▁He', 'll', 'o', '▁world']]
++
++>>> sp.encode_as_pieces(['This is a test', 'Hello world'])
++[['▁This', '▁is', '▁a', '▁', 't', 'est'], ['▁He', 'll', 'o', '▁world']]
++
++>>> proto = sp.encode('This is a test', out_type='immutable_proto')
++>>> for n in proto.pieces:
++... print('piece="{}" surface="{}" id={} begin={} end={}'.format(n.piece, n.surface, n.id, n.begin, n.end))
++...
++piece="▁This" surface="This" id=284 begin=0 end=4
++piece="▁is" surface=" is" id=47 begin=4 end=7
++piece="▁a" surface=" a" id=11 begin=7 end=9
++piece="▁" surface=" " id=4 begin=9 end=10
++piece="t" surface="t" id=15 begin=10 end=11
++piece="est" surface="est" id=400 begin=11 end=14
++
++>>> [[x.id for x in proto.pieces], [x.piece for x in proto.pieces], [x.begin for x in proto.pieces], [x.end for x in proto.pieces]]
++[[284, 47, 11, 4, 15, 400], ['▁This', '▁is', '▁a', '▁', 't', 'est'], [0, 4, 7, 9, 10, 11], [4, 7, 9, 10, 11, 14]]
++
++>>> proto2 = sp.encode_as_immutable_proto('This is a test')
++>>> proto2 == proto
++True
++
+ >>> for _ in range(10):
+ ... sp.encode('This is a test', out_type=str, enable_sampling=True, alpha=0.1, nbest_size=-1)
+ ...
+@@ -50,26 +86,55 @@ See [this google colab page](https://github.com/google/sentencepiece/blob/master
+ ['▁', 'T', 'h', 'is', '▁', 'is', '▁', 'a', '▁', 'te', 'st']
+ ['▁', 'This', '▁', 'i', 's', '▁a', '▁', 't', 'e', 'st']
+ ['▁This', '▁', 'is', '▁a', '▁', 't', 'est']
++
++>>> sp.nbest_encode('This is a test', nbest_size=5, out_type=str)
++[['▁This', '▁is', '▁a', '▁', 't', 'est'],
++['▁This', '▁is', '▁a', '▁', 'te', 'st'],
++['▁This', '▁is', '▁a', '▁', 'te', 's', 't'],
++['▁This', '▁is', '▁a', '▁', 't', 'e', 'st'],
++['▁This', '▁is', '▁a', '▁', 't', 'es', 't']]
++
++>>> sp.sample_encode_and_score('This is a test', num_samples=5, alpha=0.1, out_type=str, wor=True)
++[(['▁This', '▁', 'i', 's', '▁a', '▁', 'te', 's', 't'], -3.043105125427246),
++(['▁This', '▁', 'i', 's', '▁a', '▁', 'te', 'st'], -2.8475849628448486),
++(['▁', 'This', '▁is', '▁', 'a', '▁', 'te', 'st'], -3.043248176574707),
++(['▁', 'This', '▁is', '▁a', '▁', 't', 'e', 'st'], -2.87727689743042),
++(['▁', 'This', '▁', 'i', 's', '▁', 'a', '▁', 't', 'est'], -3.6284031867980957)]
++
+ >>> sp.decode([284, 47, 11, 4, 15, 400])
+ 'This is a test'
++
+ >>> sp.decode([[284, 47, 11, 4, 15, 400], [151, 88, 21, 887]])
+ ['This is a test', 'Hello world']
++
++>>> proto = sp.decode([284, 47, 11, 4, 15, 400], out_type='immutable_proto')
++>>> proto.text
++'This is a test'
++
+ >>> sp.decode(['▁', 'This', '▁', 'is', '▁a', '▁', 't', 'e', 'st'])
+ 'This is a test'
++
+ >>> sp.decode([['▁This', '▁is', '▁a', '▁', 't', 'est'], ['▁He', 'll', 'o', '▁world']])
+ ['This is a test', 'Hello world']
++
+ >>> sp.get_piece_size()
+ 1000
++
+ >>> sp.id_to_piece(2)
+ '</s>'
++
+ >>> sp.id_to_piece([2, 3, 4])
+ ['</s>', '\r', '▁']
++
+ >>> sp.piece_to_id('<s>')
+ 1
++
+ >>> sp.piece_to_id(['</s>', '\r', '▁'])
+ [2, 3, 4]
++
+ >>> len(sp)
+ 1000
++
+ >>> sp['</s>']
+ 2
+ ```
+@@ -116,98 +181,3 @@ with urllib.request.urlopen(
+ sp = spm.SentencePieceProcessor(model_proto=model.getvalue())
+ print(sp.encode('this is test'))
+ ```
+-
+-
+-### Segmentation (old interface)
+-```
+-% python
+->>> import sentencepiece as spm
+->>> sp = spm.SentencePieceProcessor()
+->>> sp.Load("test/test_model.model")
+-True
+->>> sp.EncodeAsPieces("This is a test")
+-['\xe2\x96\x81This', '\xe2\x96\x81is', '\xe2\x96\x81a', '\xe2\x96\x81', 't', 'est']
+->>> sp.EncodeAsIds("This is a test")
+-[284, 47, 11, 4, 15, 400]
+->>> sp.DecodePieces(['\xe2\x96\x81This', '\xe2\x96\x81is', '\xe2\x96\x81a', '\xe2\x96\x81', 't', 'est'])
+-'This is a test'
+->>> sp.NBestEncodeAsPieces("This is a test", 5)
+-[['\xe2\x96\x81This', '\xe2\x96\x81is', '\xe2\x96\x81a', '\xe2\x96\x81', 't', 'est'], ['\xe2\x96\x81This', '\xe2\x96\x81is', '\xe2\x96\x81a', '\xe2\x96\x81', 'te', 'st'], ['\xe2\x96\x81This', '\xe2\x96\x81is', '\xe2\x96\x81a', '\xe2\x96\x81', 'te', 's', 't'], ['\xe2\x96\x81This', '\xe2\x96\x81is', '\xe2\x96\x81a', '\xe2\x96\x81', 't', 'e', 'st'], ['\xe2\x96\x81This', '\xe2\x96\x81is', '\xe2\x96\x81a', '\xe2\x96\x81', 't', 'es', 't']]
+->>> for x in range(10):
+-... sp.SampleEncodeAsPieces("This is a test", -1, 0.1)
+-...
+-['\xe2\x96\x81', 'T', 'h', 'i', 's', '\xe2\x96\x81', 'is', '\xe2\x96\x81a', '\xe2\x96\x81', 't', 'e', 's', 't']
+-['\xe2\x96\x81T', 'h', 'is', '\xe2\x96\x81is', '\xe2\x96\x81', 'a', '\xe2\x96\x81', 't', 'est']
+-['\xe2\x96\x81This', '\xe2\x96\x81is', '\xe2\x96\x81', 'a', '\xe2\x96\x81', 't', 'e', 'st']
+-['\xe2\x96\x81This', '\xe2\x96\x81is', '\xe2\x96\x81a', '\xe2\x96\x81', 't', 'e', 'st']
+-['\xe2\x96\x81This', '\xe2\x96\x81is', '\xe2\x96\x81a', '\xe2\x96\x81', 't', 'e', 's', 't']
+-['\xe2\x96\x81T', 'h', 'is', '\xe2\x96\x81', 'i', 's', '\xe2\x96\x81a', '\xe2\x96\x81', 'te', 's', 't']
+-['\xe2\x96\x81This', '\xe2\x96\x81', 'is', '\xe2\x96\x81a', '\xe2\x96\x81', 'te', 's', 't']
+-['\xe2\x96\x81This', '\xe2\x96\x81', 'i', 's', '\xe2\x96\x81a', '\xe2\x96\x81', 't', 'e', 'st']
+-['\xe2\x96\x81This', '\xe2\x96\x81', 'is', '\xe2\x96\x81', 'a', '\xe2\x96\x81', 't', 'e', 'st']
+-['\xe2\x96\x81This', '\xe2\x96\x81', 'i', 's', '\xe2\x96\x81', 'a', '\xe2\x96\x81', 'te', 's', 't']
+->>> sp.DecodeIds([284, 47, 11, 4, 15, 400])
+-'This is a test'
+->>> sp.GetPieceSize()
+-1000
+->>> sp.IdToPiece(2)
+-'</s>'
+->>> sp.PieceToId('</s>')
+-2
+->>> len(sp)
+-1000
+->>> sp['</s>']
+-2
+-```
+-
+-### Model Training (old interface)
+-Training is performed by passing parameters of [spm_train](https://github.com/google/sentencepiece#train-sentencepiece-model) to SentencePieceTrainer.Train() function.
+-
+-```
+->>> import sentencepiece as spm
+->>> spm.SentencePieceTrainer.Train('--input=test/botchan.txt --model_prefix=m --vocab_size=1000')
+-unigram_model_trainer.cc(494) LOG(INFO) Starts training with :
+-input: "test/botchan.txt"
+-model_prefix: "m"
+-model_type: UNIGRAM
+-..snip..
+-unigram_model_trainer.cc(529) LOG(INFO) EM sub_iter=0 size=1239 obj=10.4055 num_tokens=36256 num_tokens/piece=29.2623
+-unigram_model_trainer.cc(529) LOG(INFO) EM sub_iter=1 size=1239 obj=10.3187 num_tokens=36256 num_tokens/piece=29.2623
+-unigram_model_trainer.cc(529) LOG(INFO) EM sub_iter=0 size=1100 obj=10.5285 num_tokens=37633 num_tokens/piece=34.2118
+-unigram_model_trainer.cc(529) LOG(INFO) EM sub_iter=1 size=1100 obj=10.4973 num_tokens=37630 num_tokens/piece=34.2091
+-trainer_interface.cc(284) LOG(INFO) Saving model: m.model
+-trainer_interface.cc(293) LOG(INFO) Saving vocabs: m.vocab
+->>>
+-```
+-
+-## Python2/3 String/Unicode compatibility
+-Sentencepiece python wrapper accepts both Unicode string and legacy byte string.
+-The output string type is determined by the input string type.
+-The output type of IdToPiece/DecodeIds methods is *str*, but note that it is a legacy byte string in Python2 and Unicode string in Python3 respectively.
+-
+-* Python2:
+-```
+->>> sp.EncodeAsPieces('吾輩は猫である')
+-['\xe2\x96\x81', '\xe5\x90\xbe', '\xe8\xbc\xa9', '\xe3\x81\xaf', '\xe7\x8c\xab', '\xe3\x81\xa7\xe3\x81\x82\xe3\x82\x8b']
+->>> sp.EncodeAsPieces(u'吾輩は猫である')
+-[u'\u2581', u'\u543e', u'\u8f29', u'\u306f', u'\u732b', u'\u3067\u3042\u308b']
+->>> sp.EncodeAsPieces(u'吾輩は猫である'.encode('utf-8'))
+-['\xe2\x96\x81', '\xe5\x90\xbe', '\xe8\xbc\xa9', '\xe3\x81\xaf', '\xe7\x8c\xab', '\xe3\x81\xa7\xe3\x81\x82\xe3\x82\x8b']
+->>> sp.IdToPiece(10)
+-'\xe3\x81\xab'
+->>> type(sp.IdToPiece(10))
+-<type 'str'>
+-```
+-
+-* Python3:
+-```
+->>> sp.EncodeAsPieces('吾輩は猫である')
+-['▁', '吾', '輩', 'は', '猫', 'である']
+->>> sp.EncodeAsPieces('吾輩は猫である'.encode('utf-8'))
+-[b'\xe2\x96\x81', b'\xe5\x90\xbe', b'\xe8\xbc\xa9', b'\xe3\x81\xaf', b'\xe7\x8c\xab', b'\xe3\x81\xa7\xe3\x81\x82\xe3\x82\x8b']
+->>>
+->>> sp.IdToPiece(10)
+-'に'
+->>> type(sp.IdToPiece(10))
+-<class 'str'>
+-```
+diff --git a/python/src/sentencepiece/__init__.py b/python/src/sentencepiece/__init__.py
+index cf06830..911a2cb 100644
+--- a/python/src/sentencepiece/__init__.py
++++ b/python/src/sentencepiece/__init__.py
+@@ -635,7 +635,7 @@ class SentencePieceProcessor(object):
+ return _encode(input)
+
+
+- def NBestEncodeAsPieces(self, input, nbest_size=None, **kwargs):
++ def NBestEncodeAsPieces(self, input, nbest_size=None, **kwargs):
+ return self.NBestEncode(input=input, nbest_size=nbest_size,
+ out_type=str, **kwargs)
+
+@@ -732,6 +732,26 @@ class SentencePieceProcessor(object):
+ return _encode(input)
+
+
++ def SampleEncodeAndScoreAsPieces(self, input, num_samples=None, alpha=None, **kwargs):
++ return self.SampleEncodeAndScore(input=input, num_samples=num_samples, alpha=alpha,
++ out_type=str, **kwargs)
++
++
++ def SampleEncodeAndScoreAsIds(self, input, num_samples=None, alpha=None, **kwargs):
++ return self.SampleEncodeAndScore(input=input, num_samples=num_samples, alpha=alpha,
++ out_type=int, **kwargs)
++
++
++ def SampleEncodeAndScoreAsSerializedProto(self, input, num_samples=None, alpha=None, **kwargs):
++ return self.SampleEncodeAndScore(input=input, num_samples=num_samples, alpha=alpha,
++ out_type='serialized_proto', **kwargs)
++
++
++ def SampleEncodeAndScoreAsImmutableProto(self, input, num_samples=None, alpha=None, **kwargs):
++ return self.SampleEncodeAndScore(input=input, num_samples=num_samples, alpha=alpha,
++ out_type='immutable_proto', **kwargs)
++
++
+ def Decode(self, input, out_type=str, num_threads=None):
+ """Decode processed id or token sequences.
+
+diff --git a/python/src/sentencepiece/sentencepiece.i b/python/src/sentencepiece/sentencepiece.i
+index 2ac68a8..fc773e2 100644
+--- a/python/src/sentencepiece/sentencepiece.i
++++ b/python/src/sentencepiece/sentencepiece.i
+@@ -903,7 +903,7 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ return _encode(input)
+
+
+- def NBestEncodeAsPieces(self, input, nbest_size=None, **kwargs):
++ def NBestEncodeAsPieces(self, input, nbest_size=None, **kwargs):
+ return self.NBestEncode(input=input, nbest_size=nbest_size,
+ out_type=str, **kwargs)
+
+@@ -1000,6 +1000,26 @@ inline void InitNumThreads(const std::vector<T> &ins, int *num_threads) {
+ return _encode(input)
+
+
++ def SampleEncodeAndScoreAsPieces(self, input, num_samples=None, alpha=None, **kwargs):
++ return self.SampleEncodeAndScore(input=input, num_samples=num_samples, alpha=alpha,
++ out_type=str, **kwargs)
++
++
++ def SampleEncodeAndScoreAsIds(self, input, num_samples=None, alpha=None, **kwargs):
++ return self.SampleEncodeAndScore(input=input, num_samples=num_samples, alpha=alpha,
++ out_type=int, **kwargs)
++
++
++ def SampleEncodeAndScoreAsSerializedProto(self, input, num_samples=None, alpha=None, **kwargs):
++ return self.SampleEncodeAndScore(input=input, num_samples=num_samples, alpha=alpha,
++ out_type='serialized_proto', **kwargs)
++
++
++ def SampleEncodeAndScoreAsImmutableProto(self, input, num_samples=None, alpha=None, **kwargs):
++ return self.SampleEncodeAndScore(input=input, num_samples=num_samples, alpha=alpha,
++ out_type='immutable_proto', **kwargs)
++
++
+ def Decode(self, input, out_type=str, num_threads=None):
+ """Decode processed id or token sequences.
+
+diff --git a/python/test/sentencepiece_test.py b/python/test/sentencepiece_test.py
+index 92327ac..2b9ad28 100755
+--- a/python/test/sentencepiece_test.py
++++ b/python/test/sentencepiece_test.py
+@@ -566,7 +566,7 @@ class TestSentencepieceProcessor(unittest.TestCase):
+ for n in sp.decode(results):
+ self.assertEqual(n, text)
+
+- # batch test
++ # batch test
+ results = sp.nbest_encode([text, text2], nbest_size=10, out_type=out_type)
+ self.assertEqual(
+ results,
+@@ -589,6 +589,19 @@ class TestSentencepieceProcessor(unittest.TestCase):
+ for n in decoded:
+ self.assertEqual(n, text2)
+
++ self.assertEqual(
++ sp.nbest_encode(text, nbest_size=10, out_type=str),
++ sp.nbest_encode_as_pieces(text, nbest_size=10))
++ self.assertEqual(
++ sp.nbest_encode(text, nbest_size=10, out_type=int),
++ sp.nbest_encode_as_ids(text, nbest_size=10))
++ self.assertEqual(
++ sp.nbest_encode(text, nbest_size=10, out_type='serialized_proto'),
++ sp.nbest_encode_as_serialized_proto(text, nbest_size=10))
++ self.assertEqual(
++ sp.nbest_encode(text, nbest_size=10, out_type='immutable_proto'),
++ sp.nbest_encode_as_immutable_proto(text, nbest_size=10))
++
+ def test_sample_and_score(self):
+ sp = self.sp_
+ text = 'hello world'
+@@ -618,6 +631,11 @@ class TestSentencepieceProcessor(unittest.TestCase):
+ for n in results[1]:
+ self.assertEqual(sp.decode(n[0]), text2)
+
++ sp.sample_encode_and_score_as_pieces(text, 10)
++ sp.sample_encode_and_score_as_ids(text, 10)
++ sp.sample_encode_and_score_as_immutable_proto(text, 10)
++ sp.sample_encode_and_score_as_serialized_proto(text, 10)
++
+ def test_valid_range(self):
+ size = self.sp_.piece_size()
+ funcs = [
--- /dev/null
+From: Aleksey Morozov <36787333+amrzv@users.noreply.github.com>
+Date: Tue, 9 Aug 2022 15:15:30 +0300
+Subject: Fixed errors in example notebook
+
+Signed-off-by: Kentaro Hayashi <kenhys@gmail.com>
+---
+ python/sentencepiece_python_module_example.ipynb | 44 ++++++++++--------------
+ 1 file changed, 19 insertions(+), 25 deletions(-)
+
+diff --git a/python/sentencepiece_python_module_example.ipynb b/python/sentencepiece_python_module_example.ipynb
+index 78464d1..1eb0f9c 100644
+--- a/python/sentencepiece_python_module_example.ipynb
++++ b/python/sentencepiece_python_module_example.ipynb
+@@ -216,7 +216,7 @@
+ "import tensorflow as tf\n",
+ "\n",
+ "# Assumes that m.model is stored in non-Posix file system.\n",
+- "serialized_model_proto = tf.gfile.GFile('m.model', 'rb').read()\n",
++ "serialized_model_proto = tf.io.gfile.GFile('m.model', 'rb').read()\n",
+ "\n",
+ "sp = spm.SentencePieceProcessor()\n",
+ "sp.load_from_serialized_proto(serialized_model_proto)\n",
+@@ -265,7 +265,7 @@
+ },
+ "cell_type": "code",
+ "source": [
+- "## Example of user defined symbols\n",
++ "# Example of user defined symbols\n",
+ "spm.SentencePieceTrainer.train('--input=botchan.txt --model_prefix=m_user --user_defined_symbols=<sep>,<cls> --vocab_size=2000')\n",
+ "\n",
+ "sp_user = spm.SentencePieceProcessor()\n",
+@@ -307,7 +307,7 @@
+ },
+ "cell_type": "code",
+ "source": [
+- "## Example of control symbols\n",
++ "# Example of control symbols\n",
+ "spm.SentencePieceTrainer.train('--input=botchan.txt --model_prefix=m_ctrl --control_symbols=<sep>,<cls> --vocab_size=2000')\n",
+ "\n",
+ "sp_ctrl = spm.SentencePieceProcessor()\n",
+@@ -564,7 +564,7 @@
+ "spm.SentencePieceTrainer.train('--input=botchan.txt --vocab_size=2000 --model_prefix=m --unk_surface=__UNKNOWN__')\n",
+ "sp = spm.SentencePieceProcessor()\n",
+ "sp.load('m.model')\n",
+- "print(sp.decode_ids([sp.unk_id()])) "
++ "print(sp.decode_ids([sp.unk_id()]))"
+ ],
+ "execution_count": 0,
+ "outputs": [
+@@ -608,7 +608,7 @@
+ "# There are two hyperparamenters for sampling (nbest_size and inverse temperature). see the paper [kudo18] for detail.\n",
+ "for n in range(10):\n",
+ " print(sp.sample_encode_as_pieces('hello world', -1, 0.1))\n",
+- " \n",
++ "\n",
+ "for n in range(10):\n",
+ " print(sp.sample_encode_as_ids('hello world', -1, 0.1))"
+ ],
+@@ -858,8 +858,6 @@
+ },
+ "cell_type": "code",
+ "source": [
+- "import sentencepiece as spm\n",
+- "\n",
+ "# NFKC normalization and lower casing.\n",
+ "spm.SentencePieceTrainer.train('--input=botchan.txt --model_prefix=m --vocab_size=2000 --normalization_rule_name=nfkc_cf')\n",
+ "\n",
+@@ -903,11 +901,12 @@
+ },
+ "cell_type": "code",
+ "source": [
+- "def tocode(s): \n",
+- " out = [] \n",
+- " for c in s: \n",
+- " out.append(str(hex(ord(c))).replace('0x', 'U+')) \n",
+- " return ' '.join(out) \n",
++ "def tocode(s):\n",
++ " out = []\n",
++ " for c in s:\n",
++ " out.append(str(hex(ord(c))).replace('0x', 'U+'))\n",
++ " return ' '.join(out)\n",
++ "\n",
+ "\n",
+ "# TSV format: source Unicode code points <tab> target code points\n",
+ "# normalize \"don't => do not, I'm => I am\"\n",
+@@ -923,7 +922,7 @@
+ "# m.model embeds the normalization rule compiled into an FST.\n",
+ "sp.load('m.model')\n",
+ "print(sp.encode_as_pieces(\"I'm busy\")) # normalzied to `I am busy'\n",
+- "print(sp.encode_as_pieces(\"I don't know it.\")) # normalized to 'I do not know it.'\n"
++ "print(sp.encode_as_pieces(\"I don't know it.\")) # normalized to 'I do not know it.'"
+ ],
+ "execution_count": 0,
+ "outputs": [
+@@ -1029,9 +1028,9 @@
+ " for piece in sp.encode_as_pieces(line):\n",
+ " freq.setdefault(piece, 0)\n",
+ " freq[piece] += 1\n",
+- " \n",
++ "\n",
+ "# only uses the token appearing more than 1000 times in the training data.\n",
+- "vocabs = list(filter(lambda x : x in freq and freq[x] > 1000, vocabs))\n",
++ "vocabs = list(filter(lambda x: x in freq and freq[x] > 1000, vocabs))\n",
+ "sp.set_vocabulary(vocabs)\n",
+ "print(sp.encode_as_pieces('this is a test.'))\n",
+ "\n",
+@@ -1133,20 +1132,17 @@
+ },
+ "cell_type": "code",
+ "source": [
+- "freq={}\n",
++ "freq = {}\n",
+ "with open('botchan.txt', 'r') as f:\n",
+ " for line in f:\n",
+ " line = line.rstrip()\n",
+ " for piece in line.split():\n",
+ " freq.setdefault(piece, 0)\n",
+ " freq[piece] += 1\n",
+- " \n",
++ "\n",
+ "with open('word_freq_list.tsv', 'w') as f:\n",
+ " for k, v in freq.items():\n",
+ " f.write('%s\\t%d\\n' % (k, v))\n",
+- " \n",
+- "\n",
+- "import sentencepiece as spm\n",
+ "\n",
+ "spm.SentencePieceTrainer.train('--input=word_freq_list.tsv --input_format=tsv --model_prefix=m --vocab_size=2000')\n",
+ "sp = spm.SentencePieceProcessor()\n",
+@@ -1176,7 +1172,7 @@
+ "\n",
+ "Sentencepiece keeps track of byte offset (span) of each token, which is useful for highlighting the token on top of unnormalized text.\n",
+ "\n",
+- "We first need to install protobuf module and sentencepiece_pb2.py as the byte offsets and all other meta data for segementation are encoded in protocol buffer.\n",
++ "We first need to install the protobuf module, as the byte offsets and all other metadata for segmentation are encoded in protocol buffers.\n",
+ "**encode_as_serialized_proto** method resturns serialized SentencePieceText proto. You can get the deserialized object by calling ParseFromString method.\n",
+ "\n",
+ "The definition of SentencePieceText proto is found [here](https://github.com/google/sentencepiece/blob/3be3f2e11e2bb923c579c6be5e7335809341587f/src/sentencepiece.proto#L23).\n"
+@@ -1194,8 +1190,7 @@
+ },
+ "cell_type": "code",
+ "source": [
+- "!pip install protobuf\n",
+- "!wget https://raw.githubusercontent.com/google/sentencepiece/master/python/sentencepiece_pb2.py"
++ "!pip install protobuf"
+ ],
+ "execution_count": 0,
+ "outputs": [
+@@ -1233,8 +1228,7 @@
+ },
+ "cell_type": "code",
+ "source": [
+- "import sentencepiece_pb2\n",
+- "import sentencepiece as spm\n",
++ "from sentencepiece import sentencepiece_pb2\n",
+ "\n",
+ "spm.SentencePieceTrainer.train('--input=botchan.txt --model_prefix=m --vocab_size=2000')\n",
+ "\n",
--- /dev/null
+From: Aleksey Morozov <36787333+amrzv@users.noreply.github.com>
+Date: Tue, 9 Aug 2022 15:15:51 +0300
+Subject: Fix dead links
+
+Signed-off-by: Kentaro Hayashi <kenhys@gmail.com>
+---
+ README.md | 2 +-
+ doc/experiments.md | 2 +-
+ 2 files changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/README.md b/README.md
+index 1986047..84e853e 100644
+--- a/README.md
++++ b/README.md
+@@ -36,7 +36,7 @@ For those unfamiliar with SentencePiece as a software/algorithm, one can read [a
+ |:---|:---:|:---:|:---:|
+ |Supported algorithm|BPE, unigram, char, word|BPE|BPE*|
+ |OSS?|Yes|Yes|Google internal|
+-|Subword regularization|[Yes](#subword-regularization)|No|No|
++|Subword regularization|[Yes](#subword-regularization-and-bpe-dropout)|No|No|
+ |Python Library (pip)|[Yes](python/README.md)|No|N/A|
+ |C++ Library|[Yes](doc/api.md)|No|N/A|
+ |Pre-segmentation required?|[No](#whitespace-is-treated-as-a-basic-symbol)|Yes|Yes|
+diff --git a/doc/experiments.md b/doc/experiments.md
+index 5a58cd1..e088152 100644
+--- a/doc/experiments.md
++++ b/doc/experiments.md
+@@ -112,7 +112,7 @@ We have evaluated SentencePiece segmentation with the following configurations.
+ * [KFTT](http://www.phontron.com/kftt/index.html)
+ * [MultiUN](http://opus.lingfil.uu.se/MultiUN.php) (First 5M and next
+ 5k/5k sentences are used for training and development/testing respectively.)
+- * [WMT16](http://www.statmt.org/WMT16/)
++ * [WMT16](https://www.statmt.org/wmt16/)
+ * In-house: (Used 5M parallel sentences for training)
+
+ **NoPretok** and **WsPretok** do not use any language-dependent resources.
--- /dev/null
+From: Taku Kudo <taku@google.com>
+Date: Sat, 20 Aug 2022 23:34:37 +0900
+Subject: added ShutdownLibrary function to uninitialize global variables
+
+Signed-off-by: Kentaro Hayashi <kenhys@gmail.com>
+---
+ src/compile_charsmap_main.cc | 1 +
+ src/error.cc | 3 +++
+ src/init.h | 15 +++++++++++++++
+ src/spm_decode_main.cc | 1 +
+ src/spm_encode_main.cc | 1 +
+ src/spm_export_vocab_main.cc | 6 +++---
+ src/spm_normalize_main.cc | 1 +
+ src/spm_train_main.cc | 1 +
+ src/test_main.cc | 1 +
+ 9 files changed, 27 insertions(+), 3 deletions(-)
+
+diff --git a/src/compile_charsmap_main.cc b/src/compile_charsmap_main.cc
+index 13bf822..da15328 100644
+--- a/src/compile_charsmap_main.cc
++++ b/src/compile_charsmap_main.cc
+@@ -156,6 +156,7 @@ struct BinaryBlob {
+ } // namespace sentencepiece
+
+ int main(int argc, char **argv) {
++ sentencepiece::ScopedResourceDestructor cleaner;
+ sentencepiece::ParseCommandLineFlags(argv[0], &argc, &argv, true);
+
+ const std::vector<std::pair<
+diff --git a/src/error.cc b/src/error.cc
+index 10faa2d..d3792dc 100644
+--- a/src/error.cc
++++ b/src/error.cc
+@@ -15,6 +15,7 @@
+ #include <cstring>
+
+ #include "common.h"
++#include "init.h"
+ #include "sentencepiece_processor.h"
+
+ #ifdef _USE_EXTERNAL_ABSL
+@@ -35,6 +36,7 @@ void Abort() {
+ SetTestCounter(2);
+ } else {
+ std::cerr << "Program terminated with an unrecoverable error." << std::endl;
++ ShutdownLibrary();
+ exit(-1);
+ }
+ }
+@@ -43,6 +45,7 @@ void Exit(int code) {
+ if (GetTestCounter() == 1) {
+ SetTestCounter(2);
+ } else {
++ ShutdownLibrary();
+ exit(code);
+ }
+ }
+diff --git a/src/init.h b/src/init.h
+index 090a2d9..7c75db2 100644
+--- a/src/init.h
++++ b/src/init.h
+@@ -18,6 +18,7 @@
+ #include "common.h"
+ #include "third_party/absl/flags/flag.h"
+ #include "third_party/absl/flags/parse.h"
++#include "third_party/protobuf-lite/google/protobuf/message_lite.h"
+
+ ABSL_DECLARE_FLAG(int32, minloglevel);
+
+@@ -35,6 +36,20 @@ inline void ParseCommandLineFlags(const char *usage, int *argc, char ***argv,
+
+ logging::SetMinLogLevel(absl::GetFlag(FLAGS_minloglevel));
+ }
++
++inline void ShutdownLibrary() {
++ google::protobuf::ShutdownProtobufLibrary();
++#ifdef HAS_ABSL_CLEANUP_FLAGS
++ absl::CleanupFlags();
++#endif
++}
++
++class ScopedResourceDestructor {
++ public:
++ ScopedResourceDestructor() {}
++ ~ScopedResourceDestructor() { ShutdownLibrary(); }
++};
++
+ } // namespace sentencepiece
+
+ #endif // INIT_H_
+diff --git a/src/spm_decode_main.cc b/src/spm_decode_main.cc
+index 3382ddc..bc49bd3 100644
+--- a/src/spm_decode_main.cc
++++ b/src/spm_decode_main.cc
+@@ -34,6 +34,7 @@ ABSL_FLAG(std::string, extra_options, "",
+ "':' separated encoder extra options, e.g., \"reverse:bos:eos\"");
+
+ int main(int argc, char *argv[]) {
++ sentencepiece::ScopedResourceDestructor cleaner;
+ sentencepiece::ParseCommandLineFlags(argv[0], &argc, &argv, true);
+ std::vector<std::string> rest_args;
+
+diff --git a/src/spm_encode_main.cc b/src/spm_encode_main.cc
+index b0e508d..2fbb850 100644
+--- a/src/spm_encode_main.cc
++++ b/src/spm_encode_main.cc
+@@ -51,6 +51,7 @@ ABSL_FLAG(bool, generate_vocabulary, false,
+ "Generates vocabulary file instead of segmentation");
+
+ int main(int argc, char *argv[]) {
++ sentencepiece::ScopedResourceDestructor cleaner;
+ sentencepiece::ParseCommandLineFlags(argv[0], &argc, &argv, true);
+ std::vector<std::string> rest_args;
+
+diff --git a/src/spm_export_vocab_main.cc b/src/spm_export_vocab_main.cc
+index b5d93cb..e5b97df 100644
+--- a/src/spm_export_vocab_main.cc
++++ b/src/spm_export_vocab_main.cc
+@@ -1,11 +1,10 @@
+-
+-
+ // Copyright 2016 Google Inc.
+ //
+ // Licensed under the Apache License, Version 2.0 (the "License");
+ // you may not use this file except in compliance with the License.
+ // You may obtain a copy of the License at
+-// n// http://www.apache.org/licenses/LICENSE-2.0
++//
++// http://www.apache.org/licenses/LICENSE-2.0
+ //
+ // Unless required by applicable law or agreed to in writing, software
+ // distributed under the License is distributed on an "AS IS" BASIS,
+@@ -29,6 +28,7 @@ ABSL_FLAG(std::string, output_format, "vocab",
+ "and scores, syms outputs pieces and indices.");
+
+ int main(int argc, char *argv[]) {
++ sentencepiece::ScopedResourceDestructor cleaner;
+ sentencepiece::ParseCommandLineFlags(argv[0], &argc, &argv, true);
+
+ sentencepiece::SentencePieceProcessor sp;
+diff --git a/src/spm_normalize_main.cc b/src/spm_normalize_main.cc
+index 96da360..39f3ef9 100644
+--- a/src/spm_normalize_main.cc
++++ b/src/spm_normalize_main.cc
+@@ -46,6 +46,7 @@ using sentencepiece::normalizer::Builder;
+ using sentencepiece::normalizer::Normalizer;
+
+ int main(int argc, char *argv[]) {
++ sentencepiece::ScopedResourceDestructor cleaner;
+ sentencepiece::ParseCommandLineFlags(argv[0], &argc, &argv, true);
+ std::vector<std::string> rest_args;
+
+diff --git a/src/spm_train_main.cc b/src/spm_train_main.cc
+index c34ee02..6ab634d 100644
+--- a/src/spm_train_main.cc
++++ b/src/spm_train_main.cc
+@@ -157,6 +157,7 @@ ABSL_FLAG(std::uint64_t, differential_privacy_clipping_threshold, 0,
+ " clipping the counts for DP");
+
+ int main(int argc, char *argv[]) {
++ sentencepiece::ScopedResourceDestructor cleaner;
+ sentencepiece::ParseCommandLineFlags(argv[0], &argc, &argv, true);
+
+ sentencepiece::TrainerSpec trainer_spec;
+diff --git a/src/test_main.cc b/src/test_main.cc
+index b3170e2..38c978d 100644
+--- a/src/test_main.cc
++++ b/src/test_main.cc
+@@ -24,6 +24,7 @@ ABSL_FLAG(std::string, test_srcdir, "../data", "Data directory.");
+ ABSL_FLAG(std::string, test_tmpdir, "test_tmp", "Temporary directory.");
+
+ int main(int argc, char **argv) {
++ sentencepiece::ScopedResourceDestructor cleaner;
+ sentencepiece::ParseCommandLineFlags(argv[0], &argc, &argv, true);
+ sentencepiece::test::RunAllTests();
+ return 0;
--- /dev/null
+From: Taku Kudo <taku@google.com>
+Date: Sun, 21 Aug 2022 12:44:31 +0900
+Subject: Fixed the issue of concatinating paths for pkg-config
+
+Signed-off-by: Kentaro Hayashi <kenhys@gmail.com>
+---
+ CMakeLists.txt | 24 ++++++++++++++++++++++++
+ sentencepiece.pc.in | 4 ++--
+ third_party/absl/flags/flag.cc | 20 +++++++++++++++-----
+ third_party/absl/flags/flag.h | 10 ++++++++--
+ 4 files changed, 49 insertions(+), 9 deletions(-)
+
+diff --git a/CMakeLists.txt b/CMakeLists.txt
+index 78379a3..382103b 100644
+--- a/CMakeLists.txt
++++ b/CMakeLists.txt
+@@ -94,6 +94,30 @@ if (NOT DEFINED CMAKE_INSTALL_INCDIR)
+ set(CMAKE_INSTALL_INCDIR include)
+ endif()
+
++# SPDX-License-Identifier: (MIT OR CC0-1.0)
++# Copyright 2020 Jan Tojnar
++# https://github.com/jtojnar/cmake-snips
++#
++# Modelled after Python’s os.path.join
++# https://docs.python.org/3.7/library/os.path.html#os.path.join
++# Windows not supported
++function(join_paths joined_path first_path_segment)
++ set(temp_path "${first_path_segment}")
++ foreach(current_segment IN LISTS ARGN)
++ if(NOT ("${current_segment}" STREQUAL ""))
++ if(IS_ABSOLUTE "${current_segment}")
++ set(temp_path "${current_segment}")
++ else()
++ set(temp_path "${temp_path}/${current_segment}")
++ endif()
++ endif()
++ endforeach()
++ set(${joined_path} "${temp_path}" PARENT_SCOPE)
++endfunction()
++
++join_paths(libdir_for_pc_file "\${exec_prefix}" "${CMAKE_INSTALL_LIBDIR}")
++join_paths(includedir_for_pc_file "\${prefix}" "${CMAKE_INSTALL_INCLUDEDIR}")
++
+ configure_file("${PROJECT_SOURCE_DIR}/config.h.in" "config.h")
+ configure_file("${PROJECT_SOURCE_DIR}/sentencepiece.pc.in" "sentencepiece.pc" @ONLY)
+
+diff --git a/sentencepiece.pc.in b/sentencepiece.pc.in
+index ac7fef6..6a5ba56 100644
+--- a/sentencepiece.pc.in
++++ b/sentencepiece.pc.in
+@@ -1,7 +1,7 @@
+ prefix=@prefix@
+ exec_prefix=@exec_prefix@
+-libdir=@libdir@
+-includedir=@includedir@
++libdir=@libdir_for_pc_file@
++includedir=@includedir_for_pc_file@
+
+ Name: @PROJECT_NAME@
+ Description: Unsupervised text tokenizer and detokenizer for Neural Network-based text generation.
+diff --git a/third_party/absl/flags/flag.cc b/third_party/absl/flags/flag.cc
+index 8e99c0d..5d6642a 100644
+--- a/third_party/absl/flags/flag.cc
++++ b/third_party/absl/flags/flag.cc
+@@ -61,8 +61,8 @@ struct FlagFunc {
+
+ namespace {
+
+-using FlagMap = std::map<std::string, FlagFunc *>;
+-using FlagList = std::vector<FlagFunc *>;
++using FlagMap = std::map<std::string, std::shared_ptr<FlagFunc>>;
++using FlagList = std::vector<std::shared_ptr<FlagFunc>>;
+
+ FlagMap *GetFlagMap() {
+ static auto *flag_map = new FlagMap;
+@@ -111,7 +111,7 @@ std::string PrintHelp(const char *programname) {
+ os << PACKAGE_STRING << "\n\n";
+ os << "Usage: " << programname << " [options] files\n\n";
+
+- for (const auto *func : *GetFlagList()) {
++ for (auto func : *GetFlagList()) {
+ os << " --" << func->name << " (" << func->help << ")";
+ os << " type: " << func->type << " default: " << func->default_value
+ << '\n';
+@@ -123,7 +123,7 @@ std::string PrintHelp(const char *programname) {
+ }
+ } // namespace
+
+-void RegisterFlag(const std::string &name, FlagFunc *func) {
++void RegisterFlag(const std::string &name, std::shared_ptr<FlagFunc> func) {
+ GetFlagList()->emplace_back(func);
+ GetFlagMap()->emplace(name, func);
+ }
+@@ -140,7 +140,7 @@ Flag<T>::Flag(const char *name, const char *type, const char *help,
+ func_->set_value = [this](const std::string &value) {
+ this->set_value_as_str(value);
+ };
+- RegisterFlag(name, func_.get());
++ RegisterFlag(name, func_);
+ }
+
+ template <typename T>
+@@ -219,4 +219,14 @@ std::vector<char *> ParseCommandLine(int argc, char *argv[]) {
+
+ return output_args;
+ }
++
++void CleanupFlags() {
++ static bool is_shutdown = false;
++ if (!is_shutdown) {
++ delete internal::GetFlagList();
++ delete internal::GetFlagMap();
++ is_shutdown = true;
++ }
++}
++
+ } // namespace absl
+diff --git a/third_party/absl/flags/flag.h b/third_party/absl/flags/flag.h
+index e540edf..c522358 100644
+--- a/third_party/absl/flags/flag.h
++++ b/third_party/absl/flags/flag.h
+@@ -24,7 +24,8 @@ namespace absl {
+ namespace internal {
+ struct FlagFunc;
+
+-void RegisterFlag(const std::string &name, FlagFunc *func);
++void RegisterFlag(const std::string &name, std::shared_ptr<FlagFunc> func);
++
+ } // namespace internal
+
+ template <typename T>
+@@ -39,7 +40,7 @@ class Flag {
+
+ private:
+ T value_;
+- std::unique_ptr<internal::FlagFunc> func_;
++ std::shared_ptr<internal::FlagFunc> func_;
+ };
+
+ template <typename T>
+@@ -52,6 +53,11 @@ void SetFlag(Flag<T> *flag, const V &v) {
+ const T value(v);
+ flag->set_value(value);
+ }
++
++#define HAS_ABSL_CLEANUP_FLAGS
++
++void CleanupFlags();
++
+ } // namespace absl
+
+ #define ABSL_FLAG(Type, name, defautl_value, help) \
--- /dev/null
+From: Kentaro Hayashi <kenhys@xdump.org>
+Date: Wed, 28 Oct 2020 20:55:20 +0900
+Subject: Disable static library explicitly
+
+---
+ src/CMakeLists.txt | 11 +----------
+ 1 file changed, 1 insertion(+), 10 deletions(-)
+
+diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
+index 6cb3922..e4a23ac 100644
+--- a/src/CMakeLists.txt
++++ b/src/CMakeLists.txt
+@@ -204,12 +204,6 @@ if (SPM_ENABLE_SHARED)
+ add_library(sentencepiece_train SHARED ${SPM_TRAIN_SRCS})
+ endif()
+
+-add_library(sentencepiece-static STATIC ${SPM_SRCS})
+-add_library(sentencepiece_train-static STATIC ${SPM_TRAIN_SRCS})
+-
+-target_link_libraries(sentencepiece-static INTERFACE ${SPM_LIBS})
+-target_link_libraries(sentencepiece_train-static INTERFACE sentencepiece-static ${SPM_LIBS})
+-
+ if (SPM_ENABLE_SHARED)
+ target_link_libraries(sentencepiece ${SPM_LIBS})
+ target_link_libraries(sentencepiece_train ${SPM_LIBS} sentencepiece)
+@@ -220,7 +214,7 @@ if (SPM_ENABLE_SHARED)
+ (${CMAKE_SYSTEM_PROCESSOR} STREQUAL "sh4"))
+ list(APPEND SPM_LIBS "atomic")
+ endif()
+- set(SPM_INSTALLTARGETS sentencepiece sentencepiece_train sentencepiece-static sentencepiece_train-static)
++ set(SPM_INSTALLTARGETS sentencepiece sentencepiece_train)
+ set_target_properties(sentencepiece sentencepiece_train PROPERTIES SOVERSION 0 VERSION 0.0.0)
+ set_target_properties(sentencepiece PROPERTIES WINDOWS_EXPORT_ALL_SYMBOLS YES)
+ set_target_properties(sentencepiece_train PROPERTIES WINDOWS_EXPORT_ALL_SYMBOLS YES)
+@@ -237,9 +231,6 @@ else()
+ set(SPM_INSTALLTARGETS sentencepiece-static sentencepiece_train-static)
+ endif()
+
+-set_target_properties(sentencepiece-static PROPERTIES OUTPUT_NAME "sentencepiece")
+-set_target_properties(sentencepiece_train-static PROPERTIES OUTPUT_NAME "sentencepiece_train")
+-
+ if (NOT MSVC)
+ if (SPM_COVERAGE)
+ set(CMAKE_CXX_FLAGS "-O0 -Wall -fPIC -coverage ${CMAKE_CXX_FLAGS}")
--- /dev/null
+From: Kentaro Hayashi <kenhys@gmail.com>
+Date: Mon, 21 Nov 2022 22:17:18 +0900
+Subject: Include necessary headers to ensure IS_BIG_ENDIAN is defined
+
+normalizer.h uses IS_BIG_ENDIAN, which is defined in util.h.
+Include util.h here.
+
+Author: Steve Langasek <steve.langasek@ubuntu.com>
+Last-Update: 2022-08-27
+Forwarded: no
+Bug-Debian: https://bugs.debian.org/1017360
+---
+ src/normalizer.h | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/src/normalizer.h b/src/normalizer.h
+index c79813c..37fdb8a 100644
+--- a/src/normalizer.h
++++ b/src/normalizer.h
+@@ -22,6 +22,7 @@
+ #include <vector>
+
+ #include "common.h"
++#include "util.h"
+ #include "sentencepiece_model.pb.h"
+ #include "sentencepiece_processor.h"
+ #include "third_party/absl/strings/string_view.h"
--- /dev/null
+0001-update-python-wrapper.patch
+0002-remove-debug-symbols-from-wheel-package.patch
+0003-allow-tab-character-to-be-used-in-user_defined_symbo.patch
+0004-add-test-to-use-tab-as-user-defined-symbols.patch
+0005-Uses-C-17-by-default.patch
+0006-Uses-std-atomic-to-define-global-variable.patch
+0007-Fix-a-typo.patch
+0008-Uses-absl-string_view-as-much-as-possible.patch
+0009-Fixed-build-break.patch
+0010-Added-ImmutableSentencePiece-class.patch
+0011-add-verbose-option.patch
+0012-Supports-ImmutableSentencePieceText-from-python-modu.patch
+0013-Adds-more-unittests.patch
+0014-Adds-SWIGPYTHON-flag.patch
+0015-remove-unused-ifdef-SWIG-macro.patch
+0016-Fixed-test-failure.patch
+0017-Uses-property-in-immutable-proto.patch
+0018-automatically-detect-the-number-of-CPUs-in-batch-pro.patch
+0019-support-slice-in-pieces-nbests-objects.patch
+0020-Updated-the-document.patch
+0021-Fixed-errors-in-example-notebook.patch
+0022-Fix-dead-links.patch
+0023-added-ShutdownLibrary-function-to-uninitialize-globa.patch
+0024-Fixed-the-issue-of-concatinating-paths-for-pkg-confi.patch
+disable-static-library.patch
+support-python-module-in-place.patch
+header-dependencies.patch
--- /dev/null
+From: Kentaro Hayashi <kenhys@gmail.com>
+Date: Mon, 21 Nov 2022 22:13:33 +0900
+Subject: Support building the Python module without pkg-config
+
+---
+ python/setup.py | 36 ++++++++++++++++++++----------------
+ 1 file changed, 20 insertions(+), 16 deletions(-)
+
+diff --git a/python/setup.py b/python/setup.py
+index fdf9394..5170d9a 100755
+--- a/python/setup.py
++++ b/python/setup.py
+@@ -77,25 +77,29 @@ class build_ext(_build_ext):
+ """Override build_extension to run cmake."""
+
+ def build_extension(self, ext):
+- cflags, libs = get_cflags_and_libs('../build/root')
+- if len(libs) == 0:
+- cflags, libs = get_cflags_and_libs('./bundled/root')
+-
+- if len(libs) == 0:
+- if is_sentencepiece_installed():
+- cflags = cflags + run_pkg_config('cflags')
+- libs = run_pkg_config('libs')
+- else:
+- subprocess.check_call(['./build_bundled.sh', __version__])
+- cflags, libs = get_cflags_and_libs('./bundled/root')
++ # cflags, libs = get_cflags_and_libs('../build/root')
++ # if len(libs) == 0:
++ # cflags, libs = get_cflags_and_libs('./bundled/root')
++
++ # if len(libs) == 0:
++ # if is_sentencepiece_installed():
++ # cflags = cflags + run_pkg_config('cflags')
++ # libs = run_pkg_config('libs')
++ # else:
++ # subprocess.check_call(['./build_bundled.sh', __version__])
++ # cflags, libs = get_cflags_and_libs('./bundled/root')
+
+ # Fix compile on some versions of Mac OSX
+ # See: https://github.com/neulab/xnmt/issues/199
+- if sys.platform == 'darwin':
+- cflags.append('-mmacosx-version-min=10.9')
+- else:
+- cflags.append('-Wl,-strip-all')
+- libs.append('-Wl,-strip-all')
++ # if sys.platform == 'darwin':
++ # cflags.append('-mmacosx-version-min=10.9')
++ # else:
++ # cflags.append('-Wl,-strip-all')
++ # libs.append('-Wl,-strip-all')
++ cflags = ['-I../src']
++ cmd = "dpkg-architecture -q DEB_HOST_GNU_TYPE"
++ arch = subprocess.check_output(cmd, shell=True).decode("utf-8").strip().split()[0]
++ libs = ["-L../obj-%s/src" % arch, "-lsentencepiece", "-lsentencepiece_train"]
+ print('## cflags={}'.format(' '.join(cflags)))
+ print('## libs={}'.format(' '.join(libs)))
+ ext.extra_compile_args = cflags
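Note on the patch above: it points the extension build at the out-of-tree CMake build directory, which debhelper names obj-<GNU triplet>. The following is a minimal sketch of that derivation, not the code shipped in the patch. The helper names `query_gnu_type` and `link_flags` are hypothetical, and the dpkg-architecture variable name is left to the caller. Passing an argument list avoids spawning a shell.

```python
import subprocess


def query_gnu_type(variable):
    """Return one dpkg-architecture value, e.g. 'x86_64-linux-gnu'.

    `variable` is the dpkg-architecture variable name chosen by the
    caller (hypothetical helper, not part of the patch).
    """
    # An argument list avoids going through a shell, unlike shell=True.
    out = subprocess.check_output(["dpkg-architecture", "-q", variable])
    return out.decode("utf-8").strip()


def link_flags(gnu_type):
    """Build linker flags that point at the CMake build directory."""
    return [
        "-L../obj-%s/src" % gnu_type,
        "-lsentencepiece",
        "-lsentencepiece_train",
    ]
```

Keeping the subprocess call separate from the flag construction lets the flag logic be checked on machines where dpkg-architecture is not installed.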
--- /dev/null
+usr/lib/python3.*/
--- /dev/null
+#!/usr/bin/make -f
+# -*- makefile -*-
+# Sample debian/rules that uses debhelper.
+# This file was originally written by Joey Hess and Craig Small.
+# As a special exception, when this file is copied by dh-make into a
+# dh-make output file, you may use that output file without restriction.
+# This special exception was added by Craig Small in version 0.37 of dh-make.
+
+# Uncomment this to turn on verbose mode.
+#export DH_VERBOSE=1
+export DEB_BUILD_MAINT_OPTIONS = hardening=+all
+DPKG_EXPORT_BUILDFLAGS = 1
+include /usr/share/dpkg/buildflags.mk
+
+ifneq (,$(filter $(DEB_HOST_ARCH), armel mipsel m68k powerpc sh4))
+ export DEB_LDFLAGS_MAINT_APPEND += -Wl,--no-as-needed -latomic -Wl,--as-needed
+endif
+
+%:
+ dh $@ --with python3 --buildsystem=cmake
+
+override_dh_auto_configure:
+ dh_auto_configure --buildsystem=cmake
+ dh_auto_configure --sourcedirectory=python --buildsystem=pybuild
+
+override_dh_auto_build:
+ dh_auto_build --buildsystem=cmake
+ dh_auto_build --sourcedirectory=python --buildsystem=pybuild
+
+override_dh_auto_install: basedir=$(shell pwd)/debian
+override_dh_auto_install:
+ dh_auto_install --buildsystem=cmake
+ dh_auto_install --sourcedirectory=python --buildsystem=pybuild
+
+override_dh_auto_clean:
+ dh_auto_clean --buildsystem=cmake
+ -rm -rf .pybuild
+ -rm -rf .python/sentencepiece.egg-info
+
+# Do not run tests.
+override_dh_auto_test:
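Note on the debian/rules file above: the `ifneq (,$(filter ...))` idiom is true when DEB_HOST_ARCH appears in the listed architectures, and in that case libatomic is linked unconditionally. The following sketch restates the same decision in Python for illustration only. The names `NEEDS_LIBATOMIC` and `extra_ldflags` are hypothetical and do not exist in the package.

```python
# Architectures listed in the debian/rules conditional.
NEEDS_LIBATOMIC = {"armel", "mipsel", "m68k", "powerpc", "sh4"}


def extra_ldflags(host_arch):
    """Return the linker flags appended for the given DEB_HOST_ARCH."""
    if host_arch in NEEDS_LIBATOMIC:
        # --no-as-needed keeps -latomic even if the linker sees no direct
        # reference to it; --as-needed restores the default afterwards.
        return ["-Wl,--no-as-needed", "-latomic", "-Wl,--as-needed"]
    return []
```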
--- /dev/null
+---
+include:
+ - https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/salsa-ci.yml
+ - https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/pipeline-jobs.yml
+
+reprotest:
+ allow_failure: true
--- /dev/null
+<?xml version='1.0' encoding='UTF-8'?>
+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
+"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
+
+<!--
+
+`xsltproc -''-nonet \
+ -''-param man.charmap.use.subset "0" \
+ -''-param make.year.ranges "1" \
+ -''-param make.single.year.ranges "1" \
+ /usr/share/xml/docbook/stylesheet/nwalsh/manpages/docbook.xsl \
+ manpage.xml'
+
+A manual page <package>.<section> will be generated. You may view the
+manual page with: `nroff -man <package>.<section> | less'. A typical entry
+in a Makefile or Makefile.am is:
+
+DB2MAN = /usr/share/sgml/docbook/stylesheet/xsl/nwalsh/manpages/docbook.xsl
+XP = xsltproc -''-nonet -''-param man.charmap.use.subset "0"
+
+manpage.1: manpage.xml
+ $(XP) $(DB2MAN) $<
+
+The xsltproc binary is found in the xsltproc package. The XSL files are in
+docbook-xsl. A description of the parameters you can use can be found in the
+docbook-xsl-doc-* packages. Please remember that if you create the nroff
+version in one of the debian/rules file targets (such as build), you will need
+to include xsltproc and docbook-xsl in your Build-Depends control field.
+Alternatively use the xmlto command/package. That will also automatically
+pull in xsltproc and docbook-xsl.
+
+Notes for using docbook2x: docbook2x-man does not automatically create the
+AUTHOR(S) and COPYRIGHT sections. In this case, please add them manually as
+<refsect1> ... </refsect1>.
+
+To disable the automatic creation of the AUTHOR(S) and COPYRIGHT sections
+read /usr/share/doc/docbook-xsl/doc/manpages/authors.html. This file can be
+found in the docbook-xsl-doc-html package.
+
+Validation can be done using: `xmllint -''-noout -''-valid manpage.xml`
+
+General documentation about man-pages and man-page-formatting:
+man(1), man(7), http://www.tldp.org/HOWTO/Man-Page/
+
+-->
+
+ <!-- Fill in your name for FIRSTNAME and SURNAME. -->
+ <!ENTITY dhfirstname "FIRSTNAME">
+ <!ENTITY dhsurname "SURNAME">
+ <!-- dhusername could also be set to "&dhfirstname; &dhsurname;". -->
+ <!ENTITY dhusername "TSUCHIYA Masatoshi">
+ <!ENTITY dhemail "tsuchiya@namazu.org">
+ <!-- SECTION should be 1-8, maybe w/ subsection other parameters are
+ allowed: see man(7), man(1) and
+ http://www.tldp.org/HOWTO/Man-Page/q2.html. -->
+ <!ENTITY dhsection "SECTION">
+ <!-- TITLE should be something like "User commands" or similar (see
+ http://www.tldp.org/HOWTO/Man-Page/q2.html). -->
+ <!ENTITY dhtitle "sentencepiece User Manual">
+ <!ENTITY dhucpackage "SENTENCEPIECE">
+ <!ENTITY dhpackage "sentencepiece">
+]>
+
+<refentry>
+ <refentryinfo>
+ <title>&dhtitle;</title>
+ <productname>&dhpackage;</productname>
+ <authorgroup>
+ <author>
+ <firstname>&dhfirstname;</firstname>
+ <surname>&dhsurname;</surname>
+ <contrib>Wrote this manpage for the Debian system.</contrib>
+ <address>
+ <email>&dhemail;</email>
+ </address>
+ </author>
+ </authorgroup>
+ <copyright>
+ <year>2007</year>
+ <holder>&dhusername;</holder>
+ </copyright>
+ <legalnotice>
+ <para>This manual page was written for the Debian system
+ (but may be used by others).</para>
+ <para>Permission is granted to copy, distribute and/or modify this
+ document under the terms of the GNU General Public License,
+ Version 2 or (at your option) any later version published by
+ the Free Software Foundation.</para>
+ <para>On Debian systems, the complete text of the GNU General Public
+ License can be found in
+ <filename>/usr/share/common-licenses/GPL</filename>.</para>
+ </legalnotice>
+ </refentryinfo>
+ <refmeta>
+ <refentrytitle>&dhucpackage;</refentrytitle>
+ <manvolnum>&dhsection;</manvolnum>
+ </refmeta>
+ <refnamediv>
+ <refname>&dhpackage;</refname>
+ <refpurpose>program to do something</refpurpose>
+ </refnamediv>
+ <refsynopsisdiv>
+ <cmdsynopsis>
+ <command>&dhpackage;</command>
+ <!-- These are several examples, how syntaxes could look -->
+ <arg choice="plain"><option>-e <replaceable>this</replaceable></option></arg>
+ <arg choice="opt"><option>--example=<parameter>that</parameter></option></arg>
+ <arg choice="opt">
+ <group choice="req">
+ <arg choice="plain"><option>-e</option></arg>
+ <arg choice="plain"><option>--example</option></arg>
+ </group>
+ <replaceable class="option">this</replaceable>
+ </arg>
+ <arg choice="opt">
+ <group choice="req">
+ <arg choice="plain"><option>-e</option></arg>
+ <arg choice="plain"><option>--example</option></arg>
+ </group>
+ <group choice="req">
+ <arg choice="plain"><replaceable>this</replaceable></arg>
+ <arg choice="plain"><replaceable>that</replaceable></arg>
+ </group>
+ </arg>
+ </cmdsynopsis>
+ <cmdsynopsis>
+ <command>&dhpackage;</command>
+ <!-- Normally the help and version options make the programs stop
+ right after outputting the requested information. -->
+ <group choice="opt">
+ <arg choice="plain">
+ <group choice="req">
+ <arg choice="plain"><option>-h</option></arg>
+ <arg choice="plain"><option>--help</option></arg>
+ </group>
+ </arg>
+ <arg choice="plain">
+ <group choice="req">
+ <arg choice="plain"><option>-v</option></arg>
+ <arg choice="plain"><option>--version</option></arg>
+ </group>
+ </arg>
+ </group>
+ </cmdsynopsis>
+ </refsynopsisdiv>
+ <refsect1 id="description">
+ <title>DESCRIPTION</title>
+ <para>This manual page documents briefly the
+ <command>&dhpackage;</command> and <command>bar</command>
+ commands.</para>
+ <para>This manual page was written for the Debian distribution
+ because the original program does not have a manual page.
+ Instead, it has documentation in the GNU <citerefentry>
+ <refentrytitle>info</refentrytitle>
+ <manvolnum>1</manvolnum>
+ </citerefentry> format; see below.</para>
+ <para><command>&dhpackage;</command> is a program that...</para>
+ </refsect1>
+ <refsect1 id="options">
+ <title>OPTIONS</title>
+ <para>The program follows the usual GNU command line syntax,
+ with long options starting with two dashes (`--'). A summary of
+ options is included below. For a complete description, see the
+ <citerefentry>
+ <refentrytitle>info</refentrytitle>
+ <manvolnum>1</manvolnum>
+ </citerefentry> files.</para>
+ <variablelist>
+ <!-- Use the variablelist.term.separator and the
+ variablelist.term.break.after parameters to
+ control the term elements. -->
+ <varlistentry>
+ <term><option>-e <replaceable>this</replaceable></option></term>
+ <term><option>--example=<replaceable>that</replaceable></option></term>
+ <listitem>
+ <para>Does this and that.</para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-h</option></term>
+ <term><option>--help</option></term>
+ <listitem>
+ <para>Show summary of options.</para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-v</option></term>
+ <term><option>--version</option></term>
+ <listitem>
+ <para>Show version of program.</para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </refsect1>
+ <refsect1 id="files">
+ <title>FILES</title>
+ <variablelist>
+ <varlistentry>
+ <term><filename>/etc/foo.conf</filename></term>
+ <listitem>
+ <para>The system-wide configuration file to control the
+ behaviour of <application>&dhpackage;</application>. See
+ <citerefentry>
+ <refentrytitle>foo.conf</refentrytitle>
+ <manvolnum>5</manvolnum>
+ </citerefentry> for further details.</para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><filename>${HOME}/.foo.conf</filename></term>
+ <listitem>
+ <para>The per-user configuration file to control the
+ behaviour of <application>&dhpackage;</application>. See
+ <citerefentry>
+ <refentrytitle>foo.conf</refentrytitle>
+ <manvolnum>5</manvolnum>
+ </citerefentry> for further details.</para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </refsect1>
+ <refsect1 id="environment">
+ <title>ENVIRONMENT</title>
+ <variablelist>
+ <varlistentry>
+ <term><envar>FOO_CONF</envar></term>
+ <listitem>
+ <para>If used, the defined file is used as configuration
+ file (see also <xref linkend="files"/>).</para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </refsect1>
+ <refsect1 id="diagnostics">
+ <title>DIAGNOSTICS</title>
+ <para>The following diagnostics may be issued
+ on <filename class="devicefile">stderr</filename>:</para>
+ <variablelist>
+ <varlistentry>
+ <term><errortext>Bad configuration file. Exiting.</errortext></term>
+ <listitem>
+ <para>The configuration file seems to contain a broken configuration
+ line. Use the <option>--verbose</option> option to get more info.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ <para><command>&dhpackage;</command> provides some return codes that can
+ be used in scripts:</para>
+ <segmentedlist>
+ <segtitle>Code</segtitle>
+ <segtitle>Diagnostic</segtitle>
+ <seglistitem>
+ <seg><errorcode>0</errorcode></seg>
+ <seg>Program exited successfully.</seg>
+ </seglistitem>
+ <seglistitem>
+ <seg><errorcode>1</errorcode></seg>
+ <seg>The configuration file seems to be broken.</seg>
+ </seglistitem>
+ </segmentedlist>
+ </refsect1>
+ <refsect1 id="bugs">
+ <!-- Or use this section to tell about upstream BTS. -->
+ <title>BUGS</title>
+ <para>The program is currently limited to only work
+ with the <package>foobar</package> library.</para>
+ <para>The upstream's <acronym>BTS</acronym> can be found
+ at <ulink url="http://bugzilla.foo.tld"/>.</para>
+ </refsect1>
+ <refsect1 id="see_also">
+ <title>SEE ALSO</title>
+ <!-- In alphabetical order. -->
+ <para><citerefentry>
+ <refentrytitle>bar</refentrytitle>
+ <manvolnum>1</manvolnum>
+ </citerefentry>, <citerefentry>
+ <refentrytitle>baz</refentrytitle>
+ <manvolnum>1</manvolnum>
+ </citerefentry>, <citerefentry>
+ <refentrytitle>foo.conf</refentrytitle>
+ <manvolnum>5</manvolnum>
+ </citerefentry></para>
+ <para>The programs are documented fully by <citetitle>The Rise and
+ Fall of a Fooish Bar</citetitle> available via the <citerefentry>
+ <refentrytitle>info</refentrytitle>
+ <manvolnum>1</manvolnum>
+ </citerefentry> system.</para>
+ </refsect1>
+</refentry>
+
--- /dev/null
+3.0 (quilt)
--- /dev/null
+version=4
+opts="filenamemangle=s%(?:.*?)?v?(\d[\d.]*)\.tar\.gz%sentencepiece-$1-Source.tar.xz%" \
+ https://github.com/google/sentencepiece/tags \
+ (?:.*?/)?v(\d[\d.]*)\.tar\.gz debian uupdate
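Note on the watch file above: the `filenamemangle` option is a Perl substitution that turns the name of a downloaded tag archive into the name used for the orig tarball. The following Python sketch mirrors that substitution so its effect can be checked by hand. The function name `mangle_filename` is hypothetical, and Python's `re.sub` stands in for the Perl `s%...%...%` operator that uscan actually applies.

```python
import re

# Pattern and replacement copied from the filenamemangle rule; Perl's $1
# becomes \1 in Python replacement syntax.
MANGLE_PATTERN = r"(?:.*?)?v?(\d[\d.]*)\.tar\.gz"
MANGLE_REPLACEMENT = r"sentencepiece-\1-Source.tar.xz"


def mangle_filename(name):
    """Apply the substitution once, as a Perl s%%% without /g would."""
    return re.sub(MANGLE_PATTERN, MANGLE_REPLACEMENT, name, count=1)


print(mangle_filename("v0.1.97.tar.gz"))
# prints "sentencepiece-0.1.97-Source.tar.xz"
```

A name that does not contain a version followed by `.tar.gz` is returned unchanged.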